CN113724163B - Image correction method, device, equipment and medium based on neural network - Google Patents


Info

Publication number
CN113724163B
CN113724163B (application number CN202111012729.7A)
Authority
CN
China
Prior art keywords
image
sample
training
network
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111012729.7A
Other languages
Chinese (zh)
Other versions
CN113724163A (en)
Inventor
孙超
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Priority to CN202111012729.7A priority Critical patent/CN113724163B/en
Publication of CN113724163A publication Critical patent/CN113724163A/en
Application granted granted Critical
Publication of CN113724163B publication Critical patent/CN113724163B/en


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/70Denoising; Smoothing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computational Linguistics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to the field of artificial intelligence and provides an image correction method, device, equipment and medium based on a neural network. The method can generate a large number of distorted images from a small number of flat sample pictures, yielding sufficient samples for training. It fuses the original features of the pictures with their text line features, and uses the text line features as descriptors of the global features to guide training on the original pictures, thereby solving the problem of incompletely restored correction details and giving the trained model a more robust and smooth correction effect. An image to be corrected is rectified according to an optical flow information map to obtain a target image: the generated optical flow information map is applied directly to the original image to obtain the corrected flat image, so a better image correction effect is achieved by combining artificial intelligence means. In addition, the invention also relates to blockchain technology, and the trained model can be stored in a blockchain node.

Description

Image correction method, device, equipment and medium based on neural network
Technical Field
The present invention relates to the field of artificial intelligence technologies, and in particular, to a neural network-based image correction method, apparatus, device, and medium.
Background
With the continuous development of science and technology and the continuous improvement of living standard, taking a picture by using mobile equipment has become a common way for people to record document information.
Shooting with mobile devices is often affected by factors such as angular tilt and the physical warping and distortion of documents, which poses significant challenges for text recognition and structured information archiving. Automatically flattening distorted and deformed document images can therefore improve the accuracy of character recognition, reduce the difficulty of extracting structured information, and improve the overall accuracy of structured document archiving.
In the prior art, correction methods for distorted and tilted documents fall into two main categories: reconstruction based on a 3D (three-dimensional) model, and 2D (two-dimensional) deep learning algorithms. 3D model reconstruction is often limited by expensive calibration hardware and has seen little market adoption. In recent years, with the development of deep learning, new 2D correction methods have appeared in the industry that restore the image end to end by means of a convolutional neural network, turning the task into one of finding a suitable 2D mapping that restores the curved image. However, because the global information of the picture dominates, local details generally exhibit jitter, which easily leads to an unsmooth restoration.
Disclosure of Invention
The embodiment of the invention provides an image correction method, device, equipment and medium based on a neural network, and aims to solve the problem of poor image correction effect.
In a first aspect, an embodiment of the present invention provides an image correction method based on a neural network, including:
Responding to an image correction instruction, and acquiring an initial sample according to the image correction instruction;
Converting the initial sample to obtain a first training sample;
Training a DBNet network by using the first training sample to obtain a first model, and obtaining the output of the first model as a text line mask map;
performing fusion processing on the text line mask map and the first training sample to obtain a second training sample;
splicing the first model with a preset DocUNet network to obtain an initial network;
Training the initial network by using the second training sample to obtain a second model;
When an image to be corrected is received, inputting the image to be corrected into the second model, and acquiring the output of the second model as an optical flow information graph;
and correcting the image to be corrected according to the optical flow information graph to obtain a target image.
According to a preferred embodiment of the present invention, the converting the initial sample to obtain a first training sample includes:
randomly acquiring any point from each sample picture in the initial sample as a starting point of each sample picture;
starting from the starting point of each sample picture, moving according to randomly generated step lengths to obtain a movement track, wherein the step length of each next move is randomly generated, and applying twisting or turning at each point of the track determined by the randomly generated step lengths;
and constructing the first training sample according to each moved sample picture.
According to a preferred embodiment of the invention, the method further comprises:
extracting image features of each sample picture in the first training sample by using a backbone network of the first model;
Performing up-sampling processing on the image characteristics of each sample picture to obtain a characteristic picture with the same size as each sample picture;
predicting according to the feature images to obtain probability images and threshold images of each sample image;
And carrying out binarization processing according to the probability map and the threshold map of each sample picture to obtain a text line mask image of each sample picture.
According to a preferred embodiment of the present invention, the fusing the text line mask map and the first training sample to obtain a second training sample includes:
Determining a corresponding relation between each mask image in the text line mask images and each sample image in the first training sample;
Dividing the corresponding mask image and sample image into a group to obtain at least one image group;
Fusing the mask image and the sample image in each image group to obtain at least one fused image;
And integrating the at least one fusion picture to obtain the second training sample.
According to a preferred embodiment of the present invention, before the first model is spliced with the preset DocUNet network, the method further includes:
acquiring an initial DocUNet network;
identifying an encoding layer and a decoding layer in the initial DocUNet network;
Acquiring a convolution layer between the coding layer and the decoding layer;
And replacing the convolution layer with a dilated (atrous) convolution layer to obtain the preset DocUNet network.
According to a preferred embodiment of the present invention, training the initial network using the second training samples to obtain a second model includes:
freezing parameters of the first model in the initial network;
after the parameters of the first model are frozen, inputting the second training sample into the initial network for training;
monitoring the value of the loss function in the training process;
and stopping training when the value of the loss function reaches convergence, and obtaining the second model.
According to a preferred embodiment of the present invention, the correcting the image to be corrected according to the optical flow information map to obtain a target image includes:
For each point in the image to be corrected, acquiring the displacement of each point on an x-channel from the optical flow information graph, and acquiring the displacement of each point on a y-channel;
moving each corresponding point according to the displacement of each point on the x channel and the displacement of each point on the y channel;
and determining the moved image to be corrected as the target image.
In a second aspect, an embodiment of the present invention provides an image correction apparatus based on a neural network, including:
The acquisition unit is used for responding to the image correction instruction and acquiring an initial sample according to the image correction instruction;
The conversion unit is used for converting the initial sample to obtain a first training sample;
The training unit is used for training a DBNet network by using the first training sample to obtain a first model, and obtaining the output of the first model as a text line mask map;
the fusion unit is used for carrying out fusion processing on the text line mask graph and the first training sample to obtain a second training sample;
The splicing unit is used for splicing the first model and a preset DocUNet network to obtain an initial network;
The training unit is further configured to train the initial network by using the second training sample to obtain a second model;
The input unit is used for inputting the image to be corrected into the second model when the image to be corrected is received, and obtaining the output of the second model as an optical flow information graph;
And the correcting unit is used for correcting the image to be corrected according to the optical flow information graph to obtain a target image.
In a third aspect, an embodiment of the present invention further provides a computer apparatus, which includes a memory, a processor, and a computer program stored in the memory and capable of running on the processor, where the processor executes the computer program to implement the neural network-based image correction method described in the first aspect.
In a fourth aspect, an embodiment of the present invention further provides a computer readable storage medium, where the computer readable storage medium stores a computer program, where the computer program when executed by a processor causes the processor to perform the neural network based image correction method described in the first aspect above.
The embodiment of the invention provides an image correction method, device, equipment and medium based on a neural network. The method responds to an image correction instruction and acquires an initial sample according to it, then converts the initial sample to obtain a first training sample, generating a large number of distorted images from a small number of flat sample pictures so that sufficient samples are available for training. A DBNet network is trained with the first training sample to obtain a first model, whose output is acquired as a text line mask map; the text line mask map is fused with the first training sample to obtain a second training sample, so that the original features of the pictures are fused with their text line features. The first model is spliced with a preset DocUNet network to obtain an initial network, which is trained with the second training sample to obtain a second model; the text line features serve as descriptors of the global features to guide training on the original pictures, which solves the problem of incompletely restored correction details. Compared with correcting from the original picture alone, the method is more robust and the corrected result is smoother, so the trained model has a more robust and smooth correction effect. When an image to be corrected is received, it is input into the second model, whose output is obtained as an optical flow information map, and the image is corrected according to that map to obtain the target image; the generated optical flow information map is applied directly to the original image to obtain the corrected flat image, and a better image correction effect is thus achieved by combining artificial intelligence means.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a flow chart of an image correction method based on a neural network according to an embodiment of the present invention;
FIG. 2 is a schematic block diagram of an image correction device based on a neural network according to an embodiment of the present invention;
Fig. 3 is a schematic block diagram of a computer device according to an embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are some, but not all embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
It should be understood that the terms "comprises" and "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also to be understood that the terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in this specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be further understood that the term "and/or" as used in the present specification and the appended claims refers to any and all possible combinations of one or more of the associated listed items, and includes such combinations.
Referring to fig. 1, a flowchart of an image correction method based on a neural network according to an embodiment of the invention is shown.
S10, responding to an image correction instruction, and acquiring an initial sample according to the image correction instruction.
In this embodiment, the image correction instruction may be triggered by an associated staff member, such as a developer, etc., and the present invention is not limited thereto.
In at least one embodiment of the present invention, the acquiring an initial sample according to the image correction instruction includes:
analyzing the image correction instruction to obtain information carried by the image correction instruction;
Acquiring a preset label corresponding to the address;
constructing a regular expression according to the preset label;
Traversing information carried by the image correction instruction by using the regular expression, and determining the traversed information matched with the regular expression as a target address;
is connected to the target address and obtains data from the target address as the initial sample.
For example: the initial sample may include manifest invoice data, or the like.
The preset labels can be configured in a self-defined mode, and correspond to the addresses.
For example: the preset tag may be configured as ADD, and then the regular expression established according to the preset tag may be ADD ().
Further, the information carried by the image correction instruction is traversed through the regular expression ADD (), the traversed information matched with the regular expression ADD () is determined to be a target address, and data are further acquired from the target address to serve as the initial sample.
The target address may be a file address, a database address, etc.
According to the embodiment, the required data can be acquired based on the tag and the regular expression, and due to the uniqueness of the tag, the data acquisition efficiency is improved, and meanwhile, the accuracy of the acquired data is ensured.
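The tag-plus-regular-expression address extraction above could be sketched as follows; the `ADD(...)` payload layout and the instruction string format are illustrative assumptions, not the patent's exact wire format:

```python
import re
from typing import Optional

def extract_target_address(payload: str, tag: str = "ADD") -> Optional[str]:
    """Traverse the information carried by an image correction instruction
    and return the address wrapped in the preset tag, e.g. ADD(/srv/data).
    Returns None when no tagged address is found."""
    # Build the regular expression from the preset tag, as in the embodiment.
    pattern = re.compile(re.escape(tag) + r"\((.*?)\)")
    match = pattern.search(payload)
    return match.group(1) if match else None
```

For example, `extract_target_address("cmd=correct;ADD(/srv/invoices.db);user=42")` would yield `/srv/invoices.db`.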
S11, converting the initial sample to obtain a first training sample.
It will be appreciated that, because it is difficult to obtain a warped sample, the initial sample may be a small number of flat samples, and therefore, the initial sample needs to be converted first, that is, a large number of warped images are generated according to a small number of flat sample pictures to be used as training samples, so as to obtain sufficient samples for training.
In at least one embodiment of the present invention, the converting the initial sample to obtain a first training sample includes:
randomly acquiring any point from each sample picture in the initial sample as a starting point of each sample picture;
starting from the starting point of each sample picture, moving according to randomly generated step lengths to obtain a movement track, wherein the step length of each next move is randomly generated, and applying twisting or turning at each point of the track determined by the randomly generated step lengths;
and constructing the first training sample according to each moved sample picture.
Through the embodiment, a large number of training samples can be synthesized according to a small number of pictures, the problem of insufficient training data is solved, and further the accuracy and the training effect of model training are improved.
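The random-walk distortion synthesis of S11 could be sketched as below; the row-wise shift model, the step-size range, and the NumPy-based warping are illustrative assumptions rather than the patent's exact procedure:

```python
import numpy as np

def synthesize_warped_sample(flat_img: np.ndarray, max_step: int = 3,
                             seed: int = 0) -> np.ndarray:
    """Generate one distorted training image from a flat sample picture.
    A random walk (a movement track whose every next step length is randomly
    generated) yields a per-row horizontal shift, applied as a simple warp."""
    rng = np.random.default_rng(seed)
    h = flat_img.shape[0]
    # Each next step length is drawn independently, as in the embodiment.
    steps = rng.integers(-max_step, max_step + 1, size=h)
    shifts = np.cumsum(steps)  # the accumulated movement track
    warped = np.empty_like(flat_img)
    for y in range(h):
        warped[y] = np.roll(flat_img[y], int(shifts[y]), axis=0)
    return warped
```

Running this repeatedly with different seeds over each flat sample would give the large set of distorted pictures used as the first training sample.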
And S12, training a DBNet (Real-Time Scene Text Detection with Differentiable Binarization) network by using the first training sample to obtain a first model, and obtaining the output of the first model as a text line mask map.
In at least one embodiment of the present invention, training DBNet the network using the first training sample, to obtain the first model includes:
determining the first training sample as training data to train the DBNet network;
in the training process, stopping training when the DBNet network reaches convergence;
A current model is determined as the first model.
Of course, in other embodiments, text detection networks such as the EAST (Efficient and Accurate Scene Text detector) network, the PixelLink network, or PSENet (Progressive Scale Expansion Network) may also be used for training, and the invention is not limited thereto.
Through the implementation manner, the first model can be obtained through training and used for extracting the subsequent features.
In at least one embodiment of the invention, the method further comprises:
extracting image features of each sample picture in the first training sample by using a backbone network of the first model;
Performing up-sampling processing on the image characteristics of each sample picture to obtain a characteristic picture with the same size as each sample picture;
predicting according to the feature images to obtain probability images and threshold images of each sample image;
And carrying out binarization processing according to the probability map and the threshold map of each sample picture to obtain a text line mask image of each sample picture.
Through binarization processing, the probability map can be converted into bounding boxes and text regions: each value in the probability map is compared with the corresponding threshold in the threshold map so as to realize binarization.
Specifically, the network structure of the first model comprises a feature extraction module, an up-sampling fusion module, a feature map output module and the like. And inputting the picture into the first model, obtaining a feature map through a feature extraction module and an up-sampling fusion module, predicting a probability map and a threshold map by using the feature map in a feature map output module, and finally calculating a binary map and outputting the binary map.
A standard binarization algorithm may be adopted, or a differentiable binarization algorithm with an adaptive threshold; the invention is not limited in this respect.
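For the differentiable binarization option, the DBNet paper's approximate binary function can be written per pixel as B = 1 / (1 + exp(-k(P - T))), where P is the probability-map value and T the adaptive threshold-map value. A one-line sketch (k = 50 follows the paper's setting; the per-pixel scalar form here is for illustration):

```python
import math

def differentiable_binarization(p: float, t: float, k: float = 50.0) -> float:
    """Soft binary value for one pixel, given probability-map value p and
    adaptive threshold-map value t. The amplifying factor k = 50 follows the
    DBNet paper's setting; in practice this is applied element-wise to maps."""
    return 1.0 / (1.0 + math.exp(-k * (p - t)))
```

Because the function is differentiable, the threshold map can be learned jointly with the probability map, which is what distinguishes it from standard fixed-threshold binarization.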
Through this implementation, the text line mask map is extracted by the model as an image feature, which makes the features of the text region more salient and facilitates the subsequent targeted correction.
S13, fusing the text line mask graph and the first training sample to obtain a second training sample.
In at least one embodiment of the present invention, the fusing the text line mask map and the first training sample to obtain a second training sample includes:
Determining a corresponding relation between each mask image in the text line mask images and each sample image in the first training sample;
Dividing the corresponding mask image and sample image into a group to obtain at least one image group;
Fusing the mask image and the sample image in each image group to obtain at least one fused image;
And integrating the at least one fusion picture to obtain the second training sample.
For example: the mask image A in the text line mask image is obtained after a sample image B in the first training sample is input to the first model, namely, the sample image B is an original image of the mask image A, the mask image A and the sample image B have a corresponding relation, the mask image A and the sample image B are divided into a group to be used as an image group C, the mask image A and the sample image B are fused, and the fused image comprises characteristics of the original image and text line characteristics of the original image.
In the second training sample constructed by the above embodiment, the original features and the text line features of the picture are fused.
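One plausible realization of the mask/image fusion is channel concatenation, so the fused picture carries both the original image features and the text line features; the patent does not fix the exact fusion operation, so this is a sketch:

```python
import numpy as np

def fuse_mask_with_sample(sample_img: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Fuse a text line mask map with its corresponding sample picture
    (one image group) by stacking the mask as an additional channel."""
    if mask.ndim == 2:
        mask = mask[..., np.newaxis]
    assert sample_img.shape[:2] == mask.shape[:2], "mask must match image size"
    return np.concatenate([sample_img, mask], axis=-1)
```

Applying this to every image group and collecting the results would yield the second training sample.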
And S14, splicing the first model with a preset DocUNet (Document Image Unwarping via a Stacked U-Net) network to obtain an initial network.
In at least one embodiment of the present invention, before the first model is spliced with the preset DocUNet network, the method further includes:
acquiring an initial DocUNet network;
identifying an encoding layer and a decoding layer in the initial DocUNet network;
Acquiring a convolution layer between the coding layer and the decoding layer;
And replacing the convolution layer with a dilated (atrous) convolution layer to obtain the preset DocUNet network.
In this embodiment, dilated convolution is introduced on the basis of the traditional DocUNet network so as to obtain a larger receptive field, which is beneficial to predicting the pixel-by-pixel mapping matrix.
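Why dilated convolution enlarges the receptive field can be checked with standard receptive-field arithmetic; the three-layer configuration below is illustrative, not the network's actual layer list:

```python
def receptive_field(kernel_sizes, dilations, strides=None) -> int:
    """Receptive field of a stack of convolution layers:
    rf grows by (k - 1) * dilation * (product of earlier strides) per layer.
    Shows why swapping the convolutions between encoder and decoder for
    dilated ones widens the view at no extra parameter cost."""
    if strides is None:
        strides = [1] * len(kernel_sizes)
    rf, jump = 1, 1
    for k, d, s in zip(kernel_sizes, dilations, strides):
        rf += (k - 1) * d * jump
        jump *= s
    return rf
```

Three stride-1 3x3 convolutions see a 7x7 window; with dilation rates 1, 2, 4 the same three layers see 15x15, with an identical parameter count.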
And S15, training the initial network by using the second training sample to obtain a second model.
In at least one embodiment of the present invention, training the initial network using the second training sample to obtain a second model includes:
freezing parameters of the first model in the initial network;
after the parameters of the first model are frozen, inputting the second training sample into the initial network for training;
monitoring the value of the loss function in the training process;
and stopping training when the value of the loss function reaches convergence, and obtaining the second model.
In the above embodiment, since the original features of the picture are fused with the text line features in the second training sample, and the text line features serve as descriptors of the global features to guide training on the original picture, the problem that the correction details of the original picture are not completely restored is solved. Compared with correcting from the original picture alone, the method is more robust and the corrected result is smoother, so the trained model has a more robust and smooth correction effect and effectively improves the performance of text detection and text recognition.
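In a concrete implementation, the first model's parameters would be frozen (e.g. setting `requires_grad = False` in PyTorch) and the loss watched until it converges. A framework-agnostic sketch of the convergence monitoring is shown below; the patience/tolerance stopping criterion is an illustrative choice, since the patent only states that training stops when the loss converges:

```python
class ConvergenceMonitor:
    """Track the loss during training of the spliced network and signal
    convergence when the loss stops improving by more than `tol` over the
    last `patience` recorded steps."""

    def __init__(self, patience: int = 3, tol: float = 1e-3):
        self.patience = patience
        self.tol = tol
        self.history = []

    def update(self, loss: float) -> bool:
        """Record one loss value; return True once training should stop."""
        self.history.append(loss)
        if len(self.history) <= self.patience:
            return False
        window = self.history[-(self.patience + 1):]
        return max(window) - min(window) < self.tol
```

The training loop would call `update` after each step and break out once it returns True, then keep the current weights as the second model.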
S16, when an image to be corrected is received, the image to be corrected is input into the second model, and the output of the second model is obtained as an optical flow information graph.
In at least one embodiment of the present invention, the optical flow information map is a two-dimensional feature map, and includes two channels, namely an x channel and a y channel, where the displacement of the x channel and the displacement of the y channel are recorded in the optical flow information map.
In this embodiment, the image to be corrected may be uploaded by a related staff, such as a staff responsible for image recognition or text recognition.
And S17, correcting the image to be corrected according to the optical flow information graph to obtain a target image.
In at least one embodiment of the present invention, the correcting the image to be corrected according to the optical flow information map to obtain a target image includes:
For each point in the image to be corrected, acquiring the displacement of each point on an x-channel from the optical flow information graph, and acquiring the displacement of each point on a y-channel;
moving each corresponding point according to the displacement of each point on the x channel and the displacement of each point on the y channel;
and determining the moved image to be corrected as the target image.
Through this implementation, the generated optical flow information map can be applied directly to the original image to obtain the corrected flat image, and a better image correction effect is thereby achieved by combining artificial intelligence means.
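The per-point correction of S17 can be sketched as a warp driven by the two-channel optical flow information map; the sampling convention (backward warping, rounding to the nearest pixel, clamped borders) is an illustrative assumption:

```python
import numpy as np

def rectify_with_flow(img: np.ndarray, flow: np.ndarray) -> np.ndarray:
    """Apply a two-channel optical flow information map to an image.
    flow[..., 0] / flow[..., 1] hold each point's displacement on the
    x channel / y channel; every output pixel is sampled from its
    displaced source location, with coordinates clamped to the image."""
    h, w = img.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    src_x = np.clip(np.round(xs + flow[..., 0]).astype(int), 0, w - 1)
    src_y = np.clip(np.round(ys + flow[..., 1]).astype(int), 0, h - 1)
    return img[src_y, src_x]
```

With a zero flow map the image is returned unchanged; a nonzero map moves each point by its x-channel and y-channel displacement, producing the flattened target image.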
It should be noted that, in order to further improve the security of the data and avoid the data from being tampered maliciously, the model obtained by training may be stored in the blockchain node.
According to the technical scheme, the method responds to an image correction instruction and acquires an initial sample according to it, then converts the initial sample to obtain a first training sample, generating a large number of distorted images from a small number of flat sample pictures so that sufficient samples are available for training. A DBNet network is trained with the first training sample to obtain a first model, whose output is acquired as a text line mask map; the text line mask map is fused with the first training sample to obtain a second training sample, so that the original features of the pictures and their text line features are fused. The first model is spliced with a preset DocUNet network to obtain an initial network, which is trained with the second training sample to obtain a second model; the text line features act as descriptors of the global features to guide training on the original pictures, solving the problem that correction from the original picture alone does not completely restore the details. Compared with correcting the original picture directly, the method is more robust and its corrected result is smoother. When an image to be corrected is received, it is input into the second model and the output is obtained as an optical flow information map; the image is then corrected according to the optical flow information map to obtain the target image, the generated map being applied directly to the original image to yield the corrected flat image. A better image correction effect is thus achieved by combining artificial intelligence means.
The embodiment of the invention also provides an image correction device based on the neural network, which is used for executing any embodiment of the image correction method based on the neural network. Specifically, referring to fig. 2, fig. 2 is a schematic block diagram of an image correction device based on a neural network according to an embodiment of the present invention.
As shown in fig. 2, the neural network-based image correction apparatus 100 includes: the device comprises an acquisition unit 101, a conversion unit 102, a training unit 103, a fusion unit 104, a splicing unit 105, an input unit 106 and a correction unit 107.
In response to the image correction instruction, the acquisition unit 101 acquires an initial sample according to the image correction instruction.
In this embodiment, the image correction instruction may be triggered by an associated staff member, such as a developer, etc., and the present invention is not limited thereto.
In at least one embodiment of the present invention, the acquiring unit 101 acquires an initial sample according to the image correction instruction includes:
analyzing the image correction instruction to obtain information carried by the image correction instruction;
Acquiring a preset label corresponding to the address;
constructing a regular expression according to the preset label;
Traversing information carried by the image correction instruction by using the regular expression, and determining the traversed information matched with the regular expression as a target address;
is connected to the target address and obtains data from the target address as the initial sample.
For example: the initial sample may include manifest invoice data, or the like.
The preset labels can be configured in a self-defined mode, and correspond to the addresses.
For example: the preset tag may be configured as ADD, and then the regular expression established according to the preset tag may be ADD().
Further, the information carried by the image correction instruction is traversed through the regular expression ADD (), the traversed information matched with the regular expression ADD () is determined to be a target address, and data are further acquired from the target address to serve as the initial sample.
The target address may be a file address, a database address, etc.
According to the embodiment, the required data can be acquired based on the tag and the regular expression, and due to the uniqueness of the tag, the data acquisition efficiency is improved, and meanwhile, the accuracy of the acquired data is ensured.
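The tag-and-regular-expression lookup described above can be sketched as follows. This is a minimal illustration only: the payload layout, the ADD(...) carrying convention, and the function name are assumptions, not the patented implementation.

```python
import re

def extract_target_address(payload: str, tag: str = "ADD"):
    """Build a regular expression from the preset tag and traverse the
    information carried by the instruction; the first piece of information
    matching the expression is taken as the target address."""
    # Hypothetical convention: the address travels as TAG(<address>).
    pattern = re.compile(re.escape(tag) + r"\(([^)]*)\)")
    match = pattern.search(payload)
    return match.group(1) if match else None
```

For instance, `extract_target_address("op=rectify;ADD(/data/invoices.db)")` would yield the file address `/data/invoices.db`, from which the initial sample could then be loaded.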
The conversion unit 102 converts the initial sample to obtain a first training sample.
It will be appreciated that, because it is difficult to obtain a warped sample, the initial sample may be a small number of flat samples, and therefore, the initial sample needs to be converted first, that is, a large number of warped images are generated according to a small number of flat sample pictures to be used as training samples, so as to obtain sufficient samples for training.
In at least one embodiment of the present invention, the converting unit 102 converts the initial sample to obtain a first training sample includes:
randomly acquiring any point from each sample picture in the first training sample as a starting point of each sample picture;
Starting from the starting point of each sample picture, moving according to randomly generated step lengths to obtain a moving track, wherein the step length of each next movement is randomly generated, and twisting or folding is performed at each point of the moving track determined by the randomly generated step lengths;
and constructing the first training sample according to each moved sample picture.
Through the embodiment, a large number of training samples can be synthesized according to a small number of pictures, the problem of insufficient training data is solved, and further the accuracy and the training effect of model training are improved.
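The random-walk distortion above might look like the following NumPy sketch. The band width, the step bounds, and the use of `np.roll` as the local "twist" are illustrative assumptions; the embodiment does not fix the exact distortion operator.

```python
import numpy as np

def synthesize_warped_sample(flat_img, n_steps=8, max_step=5, seed=0):
    """Pick a random starting point, move by randomly generated step
    lengths, and apply a local distortion at each visited point of the
    moving track (here: rolling a small band of rows sideways)."""
    rng = np.random.default_rng(seed)
    h, w = flat_img.shape[:2]
    warped = flat_img.copy()
    y = int(rng.integers(0, h))
    x = int(rng.integers(0, w))
    for _ in range(n_steps):
        dy, dx = (int(v) for v in rng.integers(-max_step, max_step + 1, size=2))
        y = min(max(y + dy, 0), h - 1)
        x = min(max(x + dx, 0), w - 1)
        band = slice(max(0, y - 2), min(h, y + 3))  # rows near the visited point
        warped[band] = np.roll(warped[band], shift=dx, axis=1)
    return warped
```

Running this several times per flat picture with different seeds would yield the large set of distorted training samples the embodiment calls for.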
The training unit 103 trains a DBNet (Real-Time Scene Text Detection with Differentiable Binarization) network by using the first training sample to obtain a first model, and obtains the output of the first model as a text line mask map.
In at least one embodiment of the present invention, the training unit 103 training DBNet the network using the first training sample, to obtain a first model includes:
determining the first training sample as training data to train the DBNet network;
in the training process, stopping training when the DBNet network reaches convergence;
A current model is determined as the first model.
Of course, in other embodiments, text detection networks such as an EAST (Efficient and Accurate Scene Text detector) network, a PixelLink network, or a PSENet (Progressive Scale Expansion Network) may also be used for training; the invention is not limited thereto.
Through the implementation manner, the first model can be obtained through training and used for extracting the subsequent features.
In at least one embodiment of the present invention, the method further includes: extracting image features of each sample picture in the first training sample by using a backbone network of the first model;
Performing up-sampling processing on the image characteristics of each sample picture to obtain a characteristic picture with the same size as each sample picture;
predicting according to the feature images to obtain probability images and threshold images of each sample image;
And carrying out binarization processing according to the probability map and the threshold map of each sample picture to obtain a text line mask image of each sample picture.
The probability map can be converted into a boundary box and a text area through binarization processing, and the boundary box and the text area are further compared with a threshold value in the threshold value map, so that binarization is realized.
Specifically, the network structure of the first model comprises a feature extraction module, an up-sampling fusion module, a feature map output module and the like. And inputting the picture into the first model, obtaining a feature map through a feature extraction module and an up-sampling fusion module, predicting a probability map and a threshold map by using the feature map in a feature map output module, and finally calculating a binary map and outputting the binary map.
A standard binarization algorithm may be adopted, or a differentiable binarization algorithm with an adaptive threshold may be adopted; the invention is not limited thereto.
Through the implementation mode, the text line mask image is extracted based on the model to serve as the image feature, so that the feature of the text region is more obvious, and the subsequent targeted correction is facilitated.
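The adaptive-threshold differentiable binarization mentioned above is, in the DBNet formulation, an element-wise sigmoid over the difference between the probability map P and the threshold map T. A one-line sketch (the amplification factor k = 50 follows the DBNet paper's default and is an assumption here):

```python
import numpy as np

def differentiable_binarization(prob_map, thresh_map, k=50.0):
    """Approximate binary map B = 1 / (1 + exp(-k * (P - T))): pixels
    where P clearly exceeds T go to ~1, the rest to ~0, while the map
    stays differentiable for training."""
    return 1.0 / (1.0 + np.exp(-k * (prob_map - thresh_map)))
```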
And the fusion unit 104 performs fusion processing on the text line mask map and the first training sample to obtain a second training sample.
In at least one embodiment of the present invention, the fusing unit 104 performs a fusion process on the text line mask map and the first training sample, to obtain a second training sample includes:
Determining a corresponding relation between each mask image in the text line mask images and each sample image in the first training sample;
Dividing the corresponding mask image and sample image into a group to obtain at least one image group;
Fusing the mask image and the sample image in each image group to obtain at least one fused image;
And integrating the at least one fusion picture to obtain the second training sample.
For example: the mask image A in the text line mask image is obtained after a sample image B in the first training sample is input to the first model, namely, the sample image B is an original image of the mask image A, the mask image A and the sample image B have a corresponding relation, the mask image A and the sample image B are divided into a group to be used as an image group C, the mask image A and the sample image B are fused, and the fused image comprises characteristics of the original image and text line characteristics of the original image.
In the second training sample constructed by the above embodiment, the original features and the text line features of the picture are fused.
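One common way to realize such a fusion, shown here only as an assumption since the embodiment does not fix the operation, is to stack the mask as an extra channel of the sample picture:

```python
import numpy as np

def fuse_mask_and_sample(sample_rgb, mask):
    """Concatenate the text line mask as a fourth channel so the fused
    picture carries both the original features and the text line features."""
    mask = mask[..., np.newaxis].astype(sample_rgb.dtype)
    return np.concatenate([sample_rgb, mask], axis=-1)
```

The downstream network would then take a four-channel input instead of a three-channel RGB picture.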
The splicing unit 105 splices the first model and a preset DocUNet (Document Image Unwarping via a Stacked U-Net) network to obtain an initial network.
In at least one embodiment of the present invention, before the first model is spliced with the preset DocUNet network, an initial DocUNet network is acquired;
identifying an encoding layer and a decoding layer in the initial DocUNet network;
Acquiring a convolution layer between the coding layer and the decoding layer;
And replacing the convolution layer with a dilated (atrous) convolution layer to obtain the preset DocUNet network.
In this embodiment, dilated convolution is introduced on the basis of the traditional DocUNet network, so that a larger receptive field is obtained, which is beneficial to the prediction of the pixel-by-pixel mapping matrix.
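Why a dilated convolution enlarges the receptive field can be checked with the standard formula for its effective kernel size:

```python
def effective_kernel(kernel: int, dilation: int) -> int:
    """Effective kernel size (and hence per-layer receptive-field growth)
    of a dilated (atrous) convolution: k_eff = k + (k - 1) * (d - 1)."""
    return kernel + (kernel - 1) * (dilation - 1)
```

A 3x3 kernel with dilation 2 covers the same span as a 5x5 kernel at no extra parameter cost, which is why swapping the plain convolution layer between the encoding and decoding layers widens the receptive field available to the pixel-by-pixel mapping prediction.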
The training unit 103 trains the initial network by using the second training samples to obtain a second model.
In at least one embodiment of the present invention, the training unit 103 trains the initial network with the second training samples, to obtain a second model includes:
freezing parameters of the first model in the initial network;
after the parameters of the first model are frozen, inputting the second training sample into the initial network for training;
monitoring the value of the loss function in the training process;
and stopping training when the value of the loss function reaches convergence, and obtaining the second model.
In the above embodiment, the original features and the text line features of the picture are fused in the second training sample, and the text line features serve as descriptors of global features to guide training on the original picture. This solves the problem that correction details are not completely restored when only the original picture is used; compared with correcting the original picture alone, the method is more robust and the corrected result is smoother, so the trained model achieves a more robust and smooth correction effect and effectively improves the performance of text detection and text recognition.
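The "stop training when the value of the loss function reaches convergence" rule above can be sketched as a patience-based check; the tolerance and patience values are illustrative assumptions, not prescribed by the embodiment.

```python
def steps_until_converged(losses, tol=1e-4, patience=3):
    """Return the monitoring step at which training would stop: the loss
    change has stayed below `tol` for `patience` consecutive steps."""
    calm = 0
    for step in range(1, len(losses)):
        if abs(losses[step] - losses[step - 1]) < tol:
            calm += 1
            if calm >= patience:
                return step
        else:
            calm = 0
    return len(losses) - 1  # never converged within the recorded history
```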
When receiving an image to be corrected, the input unit 106 inputs the image to be corrected to the second model, and acquires an output of the second model as an optical flow information map.
In at least one embodiment of the present invention, the optical flow information map is a two-dimensional feature map, and includes two channels, namely an x channel and a y channel, where the displacement of the x channel and the displacement of the y channel are recorded in the optical flow information map.
In this embodiment, the image to be corrected may be uploaded by a related staff, such as a staff responsible for image recognition or text recognition.
The correcting unit 107 corrects the image to be corrected according to the optical flow information map, and obtains a target image.
In at least one embodiment of the present invention, the correcting unit 107 corrects the image to be corrected according to the optical flow information map, and obtaining the target image includes:
For each point in the image to be corrected, acquiring the displacement of each point on an x-channel from the optical flow information graph, and acquiring the displacement of each point on a y-channel;
moving each corresponding point according to the displacement of each point on the x channel and the displacement of each point on the y channel;
and determining the moved image to be corrected as the target image.
Through the implementation mode, the generated optical flow information graph can be directly acted on the original graph to obtain the corrected flat graph, and further, a better image correction effect is achieved by combining an artificial intelligence means.
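The per-point displacement step can be sketched as a forward warp in NumPy. Production code would more likely use a backward mapping (sampling the source for each target pixel, as `cv2.remap` does); this direct version simply mirrors the wording of the embodiment and is an assumption, not the patented implementation.

```python
import numpy as np

def apply_flow(image, flow):
    """Move each point of the image by its displacement on the x channel
    (flow[..., 0]) and y channel (flow[..., 1]); targets falling outside
    the image are clipped to the border."""
    h, w = image.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    tx = np.clip(xs + np.rint(flow[..., 0]).astype(int), 0, w - 1)
    ty = np.clip(ys + np.rint(flow[..., 1]).astype(int), 0, h - 1)
    target = np.zeros_like(image)
    target[ty, tx] = image[ys, xs]
    return target
```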
It should be noted that, in order to further improve the security of the data and prevent the data from being maliciously tampered with, the model obtained by training may be stored in a blockchain node.
According to the technical scheme, the method and the device respond to an image correction instruction and acquire an initial sample according to the instruction; the initial sample is converted to obtain a first training sample, a large number of distorted images being generated from a small number of flat sample pictures so that sufficient samples are available for training; a DBNet network is trained by using the first training sample to obtain a first model, and the output of the first model is obtained as a text line mask map; the text line mask map is fused with the first training sample to obtain a second training sample in which the original features and the text line features of each picture are combined; the first model is spliced with a preset DocUNet network to obtain an initial network, and the initial network is trained by using the second training sample to obtain a second model. Because the original features and the text line features of the pictures jointly guide the training, the problem that correction details are not completely restored when only the original pictures are used is solved, and compared with correcting the original pictures alone, the trained model has a more robust and smoother correction effect. When an image to be corrected is received, it is input into the second model, the output of the second model is acquired as an optical flow information map, and the image to be corrected is corrected according to the optical flow information map to obtain a target image, so that a better image correction effect is realized by combining artificial intelligence means.
The neural network-based image rectification apparatus described above may be implemented in the form of a computer program that is executable on a computer device as shown in fig. 3.
Referring to fig. 3, fig. 3 is a schematic block diagram of a computer device according to an embodiment of the present invention. The computer device 500 is a server, which may be a stand-alone server or a server cluster formed by a plurality of servers; it may also be a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communications, middleware services, domain name services, security services, content delivery networks (Content Delivery Network, CDN), big data, and artificial intelligence platforms.
Wherein artificial intelligence (Artificial Intelligence, AI) is the theory, method, technique, and application system that uses a digital computer or a digital-computer-controlled machine to simulate, extend, and expand human intelligence, sense the environment, acquire knowledge, and use knowledge to obtain optimal results.
Artificial intelligence infrastructure technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. The artificial intelligence software technology mainly comprises a computer vision technology, a robot technology, a biological recognition technology, a voice processing technology, a natural language processing technology, machine learning/deep learning and other directions.
With reference to FIG. 3, the computer device 500 includes a processor 502, a memory, and a network interface 505, connected by a system bus 501, where the memory may include a storage medium 503 and an internal memory 504.
The storage medium 503 may store an operating system 5031 and a computer program 5032. The computer program 5032, when executed, may cause the processor 502 to perform a neural network based image rectification method.
The processor 502 is used to provide computing and control capabilities to support the operation of the overall computer device 500.
The internal memory 504 provides an environment for the execution of a computer program 5032 in the storage medium 503, which computer program 5032, when executed by the processor 502, causes the processor 502 to perform a neural network based image rectification method.
The network interface 505 is used for network communication, such as providing for transmission of data information, etc. It will be appreciated by those skilled in the art that the architecture shown in fig. 3 is merely a block diagram of some of the architecture relevant to the present inventive arrangements and is not limiting of the computer device 500 to which the present inventive arrangements may be implemented, and that a particular computer device 500 may include more or fewer components than shown, or may combine some of the components, or have a different arrangement of components.
The processor 502 is configured to execute a computer program 5032 stored in a memory, so as to implement the neural network-based image correction method disclosed in the embodiment of the present invention.
Those skilled in the art will appreciate that the embodiment of the computer device shown in fig. 3 is not limiting of the specific construction of the computer device, and in other embodiments, the computer device may include more or less components than those shown, or certain components may be combined, or a different arrangement of components. For example, in some embodiments, the computer device may include only a memory and a processor, and in such embodiments, the structure and function of the memory and the processor are consistent with the embodiment shown in fig. 3, and will not be described again.
It should be appreciated that in embodiments of the present invention, the processor 502 may be a Central Processing Unit (CPU); the processor 502 may also be another general-purpose processor, a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), an off-the-shelf Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
In another embodiment of the invention, a computer-readable storage medium is provided. The computer readable storage medium may be a nonvolatile computer readable storage medium or a volatile computer readable storage medium. The computer readable storage medium stores a computer program, wherein the computer program when executed by a processor implements the neural network-based image correction method disclosed in the embodiment of the present invention.
The blockchain is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanism, encryption algorithm and the like. The blockchain (Blockchain), essentially a de-centralized database, is a string of data blocks that are generated in association using cryptographic methods, each of which contains information from a batch of network transactions for verifying the validity (anti-counterfeit) of its information and generating the next block. The blockchain may include a blockchain underlying platform, a platform product services layer, an application services layer, and the like.
It will be clearly understood by those skilled in the art that, for convenience and brevity of description, specific working procedures of the apparatus, device and unit described above may refer to corresponding procedures in the foregoing method embodiments, which are not repeated herein. Those of ordinary skill in the art will appreciate that the elements and algorithm steps described in connection with the embodiments disclosed herein may be embodied in electronic hardware, in computer software, or in a combination of the two, and that the elements and steps of the examples have been generally described in terms of function in the foregoing description to clearly illustrate the interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
In the several embodiments provided by the present invention, it should be understood that the disclosed apparatus, device and method may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, and for example, the division of the units is merely a logical function division, there may be another division manner in actual implementation, or units having the same function may be integrated into one unit, for example, multiple units or components may be combined or may be integrated into another system, or some features may be omitted, or not performed. In addition, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices, or elements, or may be an electrical, mechanical, or other form of connection.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the embodiment of the present invention.
The invention is operational with numerous general purpose or special purpose computer system environments or configurations. For example: personal computers, server computers, hand-held or portable devices, tablet devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like. The invention may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The invention may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
In addition, each functional unit in the embodiments of the present invention may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units may be stored in a storage medium if implemented in the form of software functional units and sold or used as stand-alone products. Based on such understanding, the technical solution of the present invention is essentially or a part contributing to the prior art, or all or part of the technical solution may be embodied in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a magnetic disk, an optical disk, or other media capable of storing program code.
While the invention has been described with reference to certain preferred embodiments, it will be understood by those skilled in the art that various changes and substitutions of equivalents may be made and equivalents will be apparent to those skilled in the art without departing from the scope of the invention. Therefore, the protection scope of the invention is subject to the protection scope of the claims.

Claims (7)

1. An image correction method based on a neural network, comprising:
Responding to an image correction instruction, and acquiring an initial sample according to the image correction instruction;
Converting the initial sample to obtain a first training sample;
Training a DBNet network by using the first training sample to obtain a first model, and obtaining the output of the first model as a text line mask map;
performing fusion processing on the text line mask map and the first training sample to obtain a second training sample;
splicing the first model with a preset DocUNet network to obtain an initial network;
Training the initial network by using the second training sample to obtain a second model;
When an image to be corrected is received, inputting the image to be corrected into the second model, and acquiring the output of the second model as an optical flow information graph;
correcting the image to be corrected according to the optical flow information graph to obtain a target image;
The obtaining an initial sample according to the image correction instruction comprises:
analyzing the image correction instruction to obtain information carried by the image correction instruction;
Acquiring a preset label corresponding to the address;
constructing a regular expression according to the preset label;
Traversing information carried by the image correction instruction by using the regular expression, and determining the traversed information matched with the regular expression as a target address;
connecting to the target address and acquiring data from the target address as the initial sample;
The converting the initial sample to obtain a first training sample includes:
randomly acquiring any point from each sample picture in the first training sample as a starting point of each sample picture;
Starting from the starting point of each sample picture, moving according to randomly generated step lengths to obtain a moving track, wherein the step length of each next movement is randomly generated, and twisting or folding is performed at each point of the moving track determined by the randomly generated step lengths;
constructing the first training sample according to each moved sample picture;
the fusing processing is performed on the text line mask map and the first training sample, and obtaining a second training sample comprises the following steps:
Determining a corresponding relation between each mask image in the text line mask images and each sample image in the first training sample;
Dividing the corresponding mask image and sample image into a group to obtain at least one image group;
Fusing the mask image and the sample image in each image group to obtain at least one fused image;
Integrating the at least one fusion picture to obtain the second training sample;
Before the first model is spliced with the preset DocUNet network, the method further includes:
acquiring an initial DocUNet network;
identifying an encoding layer and a decoding layer in the initial DocUNet network;
Acquiring a convolution layer between the coding layer and the decoding layer;
And replacing the convolution layer with a dilated (atrous) convolution layer to obtain the preset DocUNet network.
2. The neural network-based image rectification method of claim 1, further comprising:
extracting image features of each sample picture in the first training sample by using a backbone network of the first model;
Performing up-sampling processing on the image characteristics of each sample picture to obtain a characteristic picture with the same size as each sample picture;
predicting according to the feature images to obtain probability images and threshold images of each sample image;
And carrying out binarization processing according to the probability map and the threshold map of each sample picture to obtain a text line mask image of each sample picture.
3. The neural network-based image rectification method of claim 1, wherein said training the initial network with the second training samples to obtain a second model comprises:
freezing parameters of the first model in the initial network;
after the parameters of the first model are frozen, inputting the second training sample into the initial network for training;
monitoring the value of the loss function in the training process;
and stopping training when the value of the loss function reaches convergence, and obtaining the second model.
4. The neural network-based image correction method according to claim 1, wherein correcting the image to be corrected according to the optical flow information map, to obtain a target image, comprises:
For each point in the image to be corrected, acquiring the displacement of each point on an x-channel from the optical flow information graph, and acquiring the displacement of each point on a y-channel;
moving each corresponding point according to the displacement of each point on the x channel and the displacement of each point on the y channel;
and determining the moved image to be corrected as the target image.
5. An image correction device based on a neural network, comprising:
The acquisition unit is used for responding to the image correction instruction and acquiring an initial sample according to the image correction instruction;
The conversion unit is used for converting the initial sample to obtain a first training sample;
The training unit is used for training a DBNet network by using the first training sample to obtain a first model, and obtaining the output of the first model as a text line mask map;
the fusion unit is used for carrying out fusion processing on the text line mask graph and the first training sample to obtain a second training sample;
The splicing unit is used for splicing the first model and a preset DocUNet network to obtain an initial network;
The training unit is further configured to train the initial network by using the second training sample to obtain a second model;
The input unit is used for inputting the image to be corrected into the second model when the image to be corrected is received, and obtaining the output of the second model as an optical flow information graph;
the correcting unit is used for correcting the image to be corrected according to the optical flow information graph to obtain a target image;
the acquiring unit acquiring an initial sample according to the image correction instruction includes:
analyzing the image correction instruction to obtain information carried by the image correction instruction;
Acquiring a preset label corresponding to the address;
constructing a regular expression according to the preset label;
Traversing information carried by the image correction instruction by using the regular expression, and determining the traversed information matched with the regular expression as a target address;
connecting to the target address and acquiring data from the target address as the initial sample;
the converting unit converts the initial sample to obtain a first training sample, which includes:
randomly acquiring any point from each sample picture in the first training sample as a starting point of each sample picture;
Starting from the starting point of each sample picture, moving according to randomly generated step lengths to obtain a moving track, wherein the step length of each next movement is randomly generated, and twisting or folding is performed at each point of the moving track determined by the randomly generated step lengths;
constructing the first training sample according to each moved sample picture;
the fusion processing performed on the text line mask map and the first training sample to obtain the second training sample comprises the following steps:
determining the correspondence between each mask image in the text line mask map and each sample image in the first training sample;
dividing each corresponding mask image and sample image into a group to obtain at least one image group;
fusing the mask image and the sample image in each image group to obtain at least one fused picture;
integrating the at least one fused picture to obtain the second training sample;
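The grouping-and-fusion steps above could be sketched as follows (the index-based correspondence and the alpha-blend fusion are illustrative assumptions; the claim does not prescribe a particular fusion operator):

```python
import numpy as np

def build_second_training_sample(mask_images, sample_images, alpha=0.5):
    """Pair each mask image with its corresponding sample image (here, by
    index), fuse each image group, and integrate the fused pictures into
    a single stacked training set."""
    assert len(mask_images) == len(sample_images), "one mask per sample image"
    fused = []
    for mask, sample in zip(mask_images, sample_images):  # one image group per pair
        blended = alpha * mask.astype(float) + (1 - alpha) * sample.astype(float)
        fused.append(blended.astype(np.uint8))             # the fused picture
    return np.stack(fused)                                 # the second training sample

masks = [np.full((4, 4), 255, np.uint8), np.zeros((4, 4), np.uint8)]
samples = [np.zeros((4, 4), np.uint8), np.full((4, 4), 100, np.uint8)]
batch = build_second_training_sample(masks, samples)
```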
before the first model is spliced with the preset DocUNet network, the device is further configured to:
acquire an initial DocUNet network;
identify the encoding layer and the decoding layer in the initial DocUNet network;
acquire the convolution layer between the encoding layer and the decoding layer; and
replace the convolution layer with a dilated (atrous) convolution layer to obtain the preset DocUNet network.
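To illustrate what replacing a plain convolution with a dilated (atrous) one changes, the sketch below implements a minimal "valid" dilated 2-D convolution in NumPy (the function `dilated_conv2d` is an illustrative stand-in, not the patented network code): the kernel taps are spread apart by the dilation rate, enlarging the receptive field with no extra parameters.

```python
import numpy as np

def dilated_conv2d(image, kernel, dilation=2):
    """Minimal 'valid' dilated convolution: kernel taps are sampled
    `dilation` pixels apart, so a k x k kernel covers an effective
    extent of k + (k - 1) * (dilation - 1) pixels."""
    k = kernel.shape[0]
    eff = k + (k - 1) * (dilation - 1)          # effective kernel extent
    h, w = image.shape
    out = np.zeros((h - eff + 1, w - eff + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = image[i:i + eff:dilation, j:j + eff:dilation]
            out[i, j] = np.sum(patch * kernel)  # weighted sum over dilated taps
    return out

img = np.arange(36, dtype=float).reshape(6, 6)
res = dilated_conv2d(img, np.ones((3, 3)), dilation=2)  # 3x3 kernel spans 5x5 pixels
```

In a framework implementation, the padding of the replacement layer is typically enlarged to `dilation * (k - 1) // 2` so that the spatial resolution between the encoding and decoding layers is preserved.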
6. A computer device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements the neural network-based image correction method of any one of claims 1 to 4 when executing the computer program.
7. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program, which when executed by a processor causes the processor to perform the neural network-based image correction method of any one of claims 1 to 4.
CN202111012729.7A 2021-08-31 2021-08-31 Image correction method, device, equipment and medium based on neural network Active CN113724163B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111012729.7A CN113724163B (en) 2021-08-31 2021-08-31 Image correction method, device, equipment and medium based on neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111012729.7A CN113724163B (en) 2021-08-31 2021-08-31 Image correction method, device, equipment and medium based on neural network

Publications (2)

Publication Number Publication Date
CN113724163A CN113724163A (en) 2021-11-30
CN113724163B true CN113724163B (en) 2024-06-07

Family

ID=78679749

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111012729.7A Active CN113724163B (en) 2021-08-31 2021-08-31 Image correction method, device, equipment and medium based on neural network

Country Status (1)

Country Link
CN (1) CN113724163B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114202648B (en) * 2021-12-08 2024-04-16 北京百度网讯科技有限公司 Text image correction method, training device, electronic equipment and medium
CN115619678B (en) * 2022-10-31 2024-04-19 锋睿领创(珠海)科技有限公司 Correction method and device for image deformation, computer equipment and storage medium
CN117557447B (en) * 2024-01-11 2024-04-26 深圳智能思创科技有限公司 Image restoration method, device, equipment and storage medium

Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108229575A (en) * 2018-01-19 2018-06-29 百度在线网络技术(北京)有限公司 For detecting the method and apparatus of target
CN109961507A (en) * 2019-03-22 2019-07-02 腾讯科技(深圳)有限公司 A kind of Face image synthesis method, apparatus, equipment and storage medium
CN110543815A (en) * 2019-07-22 2019-12-06 平安科技(深圳)有限公司 Training method of face recognition model, face recognition method, device, equipment and storage medium
CN110866871A (en) * 2019-11-15 2020-03-06 深圳市华云中盛科技股份有限公司 Text image correction method and device, computer equipment and storage medium
US10599952B1 (en) * 2019-11-01 2020-03-24 Capital One Services, Llc Computer-based systems and methods for recognizing and correcting distorted text in facsimile documents
CN111223066A (en) * 2020-01-17 2020-06-02 上海联影医疗科技有限公司 Motion artifact correction method, motion artifact correction device, computer equipment and readable storage medium
CN111260586A (en) * 2020-01-20 2020-06-09 北京百度网讯科技有限公司 Method and device for correcting distorted document image
WO2021057848A1 (en) * 2019-09-29 2021-04-01 Oppo广东移动通信有限公司 Network training method, image processing method, network, terminal device and medium
CN112597998A (en) * 2021-01-07 2021-04-02 天津师范大学 Deep learning-based distorted image correction method and device and storage medium
CN112597918A (en) * 2020-12-25 2021-04-02 创新奇智(西安)科技有限公司 Text detection method and device, electronic equipment and storage medium
CN113012075A (en) * 2021-04-22 2021-06-22 中国平安人寿保险股份有限公司 Image correction method and device, computer equipment and storage medium
CN113034406A (en) * 2021-04-27 2021-06-25 中国平安人寿保险股份有限公司 Distorted document recovery method, device, equipment and medium
CN113033543A (en) * 2021-04-27 2021-06-25 中国平安人寿保险股份有限公司 Curved text recognition method, device, equipment and medium

Patent Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108229575A (en) * 2018-01-19 2018-06-29 百度在线网络技术(北京)有限公司 For detecting the method and apparatus of target
CN109961507A (en) * 2019-03-22 2019-07-02 腾讯科技(深圳)有限公司 A kind of Face image synthesis method, apparatus, equipment and storage medium
CN110543815A (en) * 2019-07-22 2019-12-06 平安科技(深圳)有限公司 Training method of face recognition model, face recognition method, device, equipment and storage medium
WO2021012526A1 (en) * 2019-07-22 2021-01-28 平安科技(深圳)有限公司 Face recognition model training method, face recognition method and apparatus, device, and storage medium
WO2021057848A1 (en) * 2019-09-29 2021-04-01 Oppo广东移动通信有限公司 Network training method, image processing method, network, terminal device and medium
US10599952B1 (en) * 2019-11-01 2020-03-24 Capital One Services, Llc Computer-based systems and methods for recognizing and correcting distorted text in facsimile documents
CN110866871A (en) * 2019-11-15 2020-03-06 深圳市华云中盛科技股份有限公司 Text image correction method and device, computer equipment and storage medium
CN111223066A (en) * 2020-01-17 2020-06-02 上海联影医疗科技有限公司 Motion artifact correction method, motion artifact correction device, computer equipment and readable storage medium
CN111260586A (en) * 2020-01-20 2020-06-09 北京百度网讯科技有限公司 Method and device for correcting distorted document image
CN112597918A (en) * 2020-12-25 2021-04-02 创新奇智(西安)科技有限公司 Text detection method and device, electronic equipment and storage medium
CN112597998A (en) * 2021-01-07 2021-04-02 天津师范大学 Deep learning-based distorted image correction method and device and storage medium
CN113012075A (en) * 2021-04-22 2021-06-22 中国平安人寿保险股份有限公司 Image correction method and device, computer equipment and storage medium
CN113034406A (en) * 2021-04-27 2021-06-25 中国平安人寿保险股份有限公司 Distorted document recovery method, device, equipment and medium
CN113033543A (en) * 2021-04-27 2021-06-25 中国平安人寿保险股份有限公司 Curved text recognition method, device, equipment and medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
DocUNet: Document Image Unwarping via A Stacked U-Net; Ke Ma et al.; CVPR; pp. 4700-4709 *
Real-Time Scene Text Detection with Differentiable Binarization; Minghui Liao et al.; The Thirty-Fourth AAAI Conference on Artificial Intelligence; pp. 11474-11481 *

Also Published As

Publication number Publication date
CN113724163A (en) 2021-11-30

Similar Documents

Publication Publication Date Title
CN113724163B (en) Image correction method, device, equipment and medium based on neural network
US20220114750A1 (en) Map constructing method, positioning method and wireless communication terminal
US8442307B1 (en) Appearance augmented 3-D point clouds for trajectory and camera localization
US8971641B2 (en) Spatial image index and associated updating functionality
US10599975B2 (en) Scalable parameter encoding of artificial neural networks obtained via an evolutionary process
WO2023273628A1 (en) Video loop recognition method and apparatus, computer device, and storage medium
CN109871736B (en) Method and device for generating natural language description information
CN113033543B (en) Curve text recognition method, device, equipment and medium
WO2021007846A1 (en) Method, apparatus and device for video similarity detection
CN112037142B (en) Image denoising method, device, computer and readable storage medium
CN109710705A (en) Map point of interest treating method and apparatus
CN114495128B (en) Subtitle information detection method, device, equipment and storage medium
CN116978011B (en) Image semantic communication method and system for intelligent target recognition
CN114329013A (en) Data processing method, data processing equipment and computer readable storage medium
CN115223166A (en) Picture pre-labeling method, picture labeling method and device, and electronic equipment
CN111651674A (en) Bidirectional searching method and device and electronic equipment
CN115131801A (en) Multi-modal-based document recognition method, device, equipment and storage medium
Zou et al. 360° image saliency prediction by embedding self-supervised proxy task
CN112052409B (en) Address resolution method, device, equipment and medium
US11328095B2 (en) Peceptual video fingerprinting
CN113781462A (en) Human body disability detection method, device, equipment and storage medium
CN117078970A (en) Picture identification method and device, electronic equipment and storage medium
CN114756837B (en) Block chain-based digital content tracing method and system
CN114463376B (en) Video text tracking method and device, electronic equipment and storage medium
CN115002196B (en) Data processing method and device and vehicle end acquisition equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant