CN115063813A - Training method and training device for an alignment model for character distortion - Google Patents
Training method and training device for an alignment model for character distortion
- Publication number: CN115063813A
- Application number: CN202210781749.9A
- Authority: CN (China)
- Prior art keywords: image, original, alignment, distorted, character
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06V30/19147 — Character recognition: obtaining sets of training patterns; bootstrap methods, e.g. bagging or boosting
- G06N3/08 — Neural networks: learning methods
- G06V10/82 — Image or video recognition or understanding using pattern recognition or machine learning, using neural networks
- G06V30/1444 — Character recognition, image acquisition: selective acquisition, locating or processing of specific regions, e.g. highlighted text, fiducial marks or predetermined fields
- G06V30/168 — Character recognition, image preprocessing: smoothing or thinning of the pattern; skeletonisation
- G06V30/40 — Document-oriented image-based pattern recognition
Abstract
The present disclosure describes a training method and a training apparatus for an alignment model for character distortion. The training method includes: obtaining a plurality of original documents, and performing geometric transformation and character-skeleton-based position alignment on the corresponding original images to obtain label images, i.e., aligned original images; taking the image blocks corresponding to the character areas in the original images, the distorted images, and the label images as a first image set, a second image set, and a third image set, respectively; obtaining a prediction set produced by the alignment model for the first image set and the second image set, and determining predicted aligned image blocks based on the prediction set; determining a first loss function based on the predicted aligned image blocks and the image blocks in the third image set, and a second loss function based on the character skeletons of the predicted aligned image blocks and the character skeletons of the image blocks in the second image set; and training the alignment model based on the first loss function and the second loss function to obtain a trained alignment model. Alignment precision and accuracy can thereby be improved.
Description
Technical Field
The present disclosure relates generally to the field of document processing, and in particular, to a training method and a training apparatus for an alignment model for character distortion.
Background
In recent years, deep learning methods have been widely applied in the field of document image analysis and processing. When a document image analysis model based on deep learning is trained, corresponding annotation data is often required to be collected as a gold standard for learning of the document image analysis model.
At present, training a document image analysis model faces great difficulty, most notably the high difficulty and cost of labeling training data. In particular, labeling training data often requires aligning the associated images in it (for example, aligning the original image corresponding to the original document with the distorted image), which greatly increases the cost and difficulty of data labeling. To address the high cost of labeling training data, the existing mainstream methods are: generating a corresponding distorted image from an original image based on an image distortion model, and then searching for the original image that can be aligned with the distorted image; or estimating the geometric distortion of the distorted image from its feature points and aligning the original image with the distorted image, so as to label training data acquired in a real environment.
However, the above mainstream schemes do not consider distortion at the character level in a distorted image (e.g., blurring and/or spreading of character edges). Therefore, alignment precision and accuracy still need to be improved.
Disclosure of Invention
The present disclosure has been made in view of the above circumstances, and an object thereof is to provide a training method and a training device for an alignment model for character distortion, which can improve alignment accuracy and precision.
To this end, a first aspect of the present disclosure provides a training method for an alignment model for character distortion, where the alignment model is a deep neural network for positionally aligning an undistorted original image and a distorted image corresponding to an original document. The training method includes: acquiring a plurality of original documents, and performing geometric transformation and character-skeleton-based position alignment on the original images of the original documents to acquire label images, where a label image is an aligned original image; taking the image blocks corresponding to the character areas in the original images, the distorted images, and the label images of the original documents as a first image set, a second image set, and a third image set, respectively; obtaining a prediction set produced by the alignment model for the first image set and the second image set, where the prediction set includes prediction results corresponding to the image blocks in the first image set, and determining, based on the prediction results, the predicted aligned image blocks corresponding to the image blocks in the first image set; determining a first loss function based on the predicted aligned image blocks and the corresponding image blocks in the third image set, and a second loss function based on the character skeletons of the predicted aligned image blocks and the character skeletons of the corresponding image blocks in the second image set; and training the alignment model based on the first loss function and the second loss function to obtain the trained alignment model. In this case, the first loss function drives the predicted aligned image blocks ever closer to the gold standard, while the second loss function reduces the risk that a predicted aligned image block deviates from the character form of the distorted image block and also reduces the influence of errors that may exist in the gold standard. As a result, each predicted aligned image block is aligned in position with its label image block while its character form stays closer to the distorted image block, the method can adapt to character-level distortion in the distorted image, and alignment precision and accuracy can be improved.
In addition, in the training method according to the first aspect of the present disclosure, optionally, in the geometric transformation, geometric transformation parameters are obtained based on the image blocks corresponding to the character areas in the original image and the distorted image of each original document, and the original image of each original document is transformed using these parameters so that the character shapes of the transformed original image and the distorted image are aligned. In this case, because the geometric transformation is based on the character regions, interference from non-character content in the original image and/or the distorted image can be reduced.
In addition, in the training method according to the first aspect of the present disclosure, optionally, in the position alignment, for each original document, the text skeleton of the geometrically transformed original image is taken as a first skeleton and the text skeleton of the distorted image as a second skeleton, the alignment position in the distorted image of a preset position in the geometrically transformed original image is determined based on the degree of overlap between the first skeleton and the second skeleton, and the positions of the geometrically transformed original image and the distorted image are aligned based on the alignment position. In this case, the character skeleton reduces the negative influence of character distortion on the alignment result, accommodating distorted characters whose strokes spread outward or whose edges are blurred, so that a more accurate alignment position can be obtained.
In addition, in the training method according to the first aspect of the present disclosure, optionally, each image block in the third image set is taken as a label image block, and the first loss function is determined based on the similarity between the label image block and its corresponding predicted aligned image block; and/or each image block in the second image set is taken as a distorted image block, and the second loss function is determined based on the degree of coincidence between the character skeleton of the distorted image block and the character skeleton of its corresponding predicted aligned image block. In this case, the first loss function drives the predicted aligned image blocks ever closer to the gold standard. In addition, the second loss function reduces the risk that a predicted aligned image block deviates from the character form of the distorted image block, and also reduces the influence of errors that may exist in the gold standard, so that the predicted aligned image block is aligned in position with the original image block while its character form stays closer to the distorted image block.
In addition, in the training method according to the first aspect of the present disclosure, optionally, the image blocks have the same size, and the number of characters in each image block is not less than 1.
Further, in the training method according to the first aspect of the present disclosure, optionally, the granularity of division of the image block includes at least one of a single word, a plurality of words, a single line of words, and a plurality of lines of words. In this case, the image blocks of the corresponding division granularity can be obtained as needed.
In addition, in the training method according to the first aspect of the present disclosure, optionally, the distorted image includes at least one of a legal image and a copied image, where the legal image is an image captured from the original image by a first capture device, and the copied image is an image obtained by printing the legal image on a physical carrier to obtain a printed copy and then capturing that printed copy with a second capture device.
In addition, in the training method according to the first aspect of the present disclosure, optionally, the input of the alignment model is the result of superimposing, in the channel dimension, each pair of position-matched image blocks from the first image set and the second image set; the output of the alignment model is the prediction result; the prediction result is predicted displacement data; and the predicted displacement data is used to move the pixel points of an image block of the first image set so as to determine the predicted aligned image block corresponding to that image block. In this case, given an image block in the first image set and its predicted displacement data, the corresponding predicted aligned image block can be determined.
In addition, in the training method according to the first aspect of the present disclosure, optionally, the predicted displacement data includes a first displacement image and a second displacement image respectively located in two channels, a pixel value of each position in the first displacement image represents a horizontal parameter for horizontally moving a pixel point of a corresponding position of an image block of the first image set, and a pixel value of each position in the second displacement image represents a vertical parameter for vertically moving a pixel point of a corresponding position of an image block of the first image set.
A second aspect of the present disclosure provides a training apparatus for an alignment model for text distortion, comprising a memory for non-transitory storage of computer readable instructions; and a processor for executing the computer-readable instructions, when executed by the processor, performing the training method according to the first aspect of the present disclosure.
According to the present disclosure, a training method and a training device for an alignment model for character distortion are provided, which can improve alignment accuracy and precision.
Drawings
The disclosure will now be explained in further detail by way of example only with reference to the accompanying drawings, in which:
fig. 1 is an exemplary schematic diagram illustrating an alignment environment to which examples of the present disclosure relate.
Fig. 2 is an exemplary flow chart illustrating a training method of an alignment model according to an example of the present disclosure.
Fig. 3 is a schematic diagram illustrating a text skeleton to which examples of the present disclosure relate.
Fig. 4 is an exemplary flow chart illustrating acquiring a label image according to an example of the present disclosure.
Fig. 5 is an exemplary flow chart illustrating a geometric transformation in accordance with an example of the present disclosure.
Fig. 6 is an exemplary flowchart showing step S104 in fig. 2.
Fig. 7 is an exemplary block diagram illustrating a UNet-based alignment model to which examples of the present disclosure relate.
Fig. 8 is an exemplary flow diagram illustrating the use of a trained alignment model in accordance with examples of the present disclosure.
Fig. 9 is a schematic diagram illustrating aligned image blocks generated by a trained alignment model according to examples of the present disclosure.
Detailed Description
Hereinafter, preferred embodiments of the present disclosure will be described in detail with reference to the accompanying drawings. In the following description, the same components are denoted by the same reference numerals, and redundant description thereof is omitted. The drawings are schematic, and the proportions and shapes of components may differ from the actual ones. It is noted that the terms "comprises," "comprising," and "having," and any variations thereof in this disclosure, are intended to cover a non-exclusive inclusion: for example, a process, method, system, article, or apparatus that comprises or has a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include or have other steps or elements not expressly listed or inherent to such process, method, article, or apparatus. All methods described in this disclosure can be performed in any suitable order unless otherwise indicated herein or otherwise clearly contradicted by context.
In the present disclosure, the position alignment between the original image and the distorted image may be the alignment of the original image to the distorted image, or the alignment of the distorted image to the original image. Preferably, the positional alignment between the original image and the distorted image may be the alignment of the original image to the distorted image. In this case, since the original image is relatively sharp, the risk of introducing new distortions can be reduced in the adjustment of the original image to align the distorted image.
The training method for the alignment model for character distortion according to the examples of the present disclosure may also be simply referred to as a training method or a model training method. The training method can conveniently align the original image with the distorted image, can adapt to the alignment of the distortion (such as the edge blurring and/or diffusion of the characters) at the character level in the distorted image, and further can improve the alignment precision and accuracy. The training method related to the present disclosure can be applied to any application scenario in which the original image and the distorted image are aligned.
It should be noted that the training method according to the example of the present disclosure is described by taking the original image aligned to the distorted image as an example, and does not represent a limitation to the present disclosure, and may also be adjusted to be used for aligning the distorted image to the original image.
In addition, the original image may be an undistorted image corresponding to the original document. In some examples, the original image may be generated by software. Thus, the original image can be easily acquired. In other examples, the original image may also be an image corresponding to the original document from which the scene was actually applied and via desensitization processing. For example, the original document may be a paper document, an ID card, or a credit card. In addition, the distortion may be any difference from the original image. For example, the types of distortion may include paper breakage distortion, paper folding distortion, optical distortion, down-sampling distortion of the acquisition device, and the like.
In addition, the distorted image may be a distorted image corresponding to the original document. In some examples, the distorted image may include at least one of a legal image and a reproduced image. The legal image may be an image obtained by capturing an original image by the first capturing apparatus. The copied image can be an image obtained by printing a legal image on a physical carrier to obtain a printed image and then collecting the printed image by second collecting equipment. It can be seen that the distorted image is distorted to some extent with respect to the original image. In particular, in the printing step, it is generally difficult to print image data onto paper without loss, and sometimes the quality of the paper also affects the print quality, thereby distorting or degrading the image. In the actual alignment scene, in addition to the down-sampling distortion caused by the acquisition process, the influence of the edge blurring and/or diffusion distortion of the text caused by the acquisition process (e.g., the printing process) on the alignment result of the distorted image and the original image needs to be considered.
In some examples, the acquisition devices involved in generating a distorted image may be different. For example, the models of the first and second capture devices may be different. In this case, the process of obtaining the distorted image can be made closer to the real distorted scene. This can improve the quality of the training data.
Additionally, the capture devices (e.g., the first capture device and/or the second capture device) may be used to print, photograph, or scan a corresponding document or image. Thereby, a distorted image can be acquired by the acquisition device. In some examples, the acquisition device may include at least one of a cell phone, a digital camera, a video camera, and a scanner.
Fig. 1 is an exemplary schematic diagram illustrating an alignment environment to which examples of the present disclosure relate. In some examples, the trained alignment model 20 to which the disclosed examples relate may be applied in an alignment environment as shown in fig. 1. In the alignment environment, the original image and the distorted image in the training data of the document analysis model 10 may be position-aligned by the alignment model 20. Specifically, the alignment model 20 may receive an original image and a distorted image corresponding to each original document and output an aligned original image. That is, the alignment model 20 may be used to positionally align the original image with the distorted image. After the document analysis model 10 is trained using the aligned original images and the distorted images obtained by the alignment model 20, the trained document analysis model 10 may receive and analyze the document images to be analyzed (i.e., the distorted images and/or the original images) to obtain an analysis result. In addition, the document analysis model 10 can be any model based on machine learning and requiring training based on the distorted image and the original image aligned with each other to achieve the corresponding purpose. For example, the document analysis model 10 may be a duplication detection network or an image content detection network.
In the training phase of the alignment model 20, parts of the text regions in the images may be extracted; the original image is processed for these parts to obtain an aligned original image (i.e., a label image), which serves as one part of the supervision information for training the alignment model 20, while the text skeleton of the distorted image is added as another part of the supervision information. During training, the original image and the distorted image may be segmented into blocks and input to the alignment model 20. In the test stage or the model application stage, the distorted image and the original image to be aligned may be cropped to the input size of the alignment model 20 and input into the model to obtain the alignment result.
Fig. 2 is an exemplary flow chart illustrating a training method of the alignment model 20 according to an example of the present disclosure. Fig. 3 is a schematic diagram illustrating a text skeleton to which examples of the present disclosure relate.
In some examples, as shown in fig. 2, the training method may include obtaining a sample set (step S102), determining a predicted aligned image block corresponding to an image block in the first image set using the sample set and the alignment model 20 (step S104), determining a plurality of loss functions based on the predicted aligned image block (step S106), and training the alignment model 20 based on the plurality of loss functions to obtain a trained alignment model 20 (step S108).
Referring to fig. 2, in step S102, in the present embodiment, a sample set may be acquired. The sample set may be used to train an alignment model 20 (described later). The sample set may be determined from a plurality of original documents. Specifically, original images, distorted images, and label images corresponding to a plurality of original documents may be acquired, and a sample set may be determined based on the original images, distorted images, and label images corresponding to the plurality of original documents.
In some examples, the sample set may include a first image set, a second image set, and a third image set. In some examples, the first set of images may be derived from original images, the second set of images may be derived from distorted images, and the third set of images may be derived from tagged images. Specifically, image blocks corresponding to character areas in original images, distorted images and label images corresponding to a plurality of original documents may be respectively used as the first image set, the second image set and the third image set. That is, the image blocks corresponding to the character areas of the original images corresponding to the plurality of original documents may be used as the first image set, the image blocks corresponding to the character areas of the distorted images corresponding to the plurality of original documents may be used as the second image set, and the image blocks corresponding to the character areas of the tag images corresponding to the plurality of original documents may be used as the third image set.
In some examples, in acquiring an image block corresponding to a text region of an original image, the text region in the original image may be acquired using an OCR (Optical Character Recognition) technique, and the image block may be acquired based on image data in the text region. In some examples, image data in a text region may be blocked to obtain image blocks. In this case, a sample set that meets the dimensional requirements of the alignment model 20 can be obtained while preserving the size of the original image.
In some examples, the image blocks may be uniform in size. This can facilitate training of the alignment model 20. In some examples, the size of the image block may be determined by the number of characters in the image block. In some examples, the number of words in the image block may be no less than 1. This enables characters to be included in each image block.
In some examples, the granularity of division of the image block may include at least one of a single word, a plurality of words, a single line of words, and a plurality of lines of words. Specifically, the image data in the text area may be blocked at the division granularity of the image block to obtain the image block. In this case, the image blocks of the corresponding division granularity can be obtained as needed. Preferably, the division granularity of the image block may be a single word. In this case, the granularity of alignment is accurate to each word, and thus the alignment accuracy and precision can be further improved.
In other examples, the partition granularity of the image block may also be a first preset ratio of the original image (e.g., the first preset ratio may be 1/3, 1/5, 1/7, 1/10, etc.). In other examples, the partition granularity of the image block may also be a second preset proportion of the single-row characters (e.g., the second preset proportion may be 1/2, 1/4, 1/5, 1/10, etc.).
It should be noted that the manner of obtaining the image blocks corresponding to the character areas of the distorted image and the label image is similar to that of the original image, and details are not repeated.
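As a concrete illustration of the blocking described above, the following is a minimal Python sketch of cutting word-level image blocks out of a page; it applies identically to the original, distorted, and label images. The OCR output format (a list of (x, y, w, h) word boxes) and the 64x64 block size are illustrative assumptions, not specified by the disclosure.

```python
import cv2
import numpy as np

BLOCK_SIZE = 64  # assumed uniform block size; the disclosure only requires equal sizes

def word_blocks(image: np.ndarray, ocr_word_boxes) -> list[np.ndarray]:
    """Crop each detected word region and resize it to one uniform block."""
    blocks = []
    for (x, y, w, h) in ocr_word_boxes:      # hypothetical OCR word boxes
        crop = image[y:y + h, x:x + w]
        # Pad to square before resizing so character shapes are not distorted.
        side = max(w, h)
        canvas = np.full((side, side, 3), 255, dtype=image.dtype)
        canvas[(side - h) // 2:(side - h) // 2 + h,
               (side - w) // 2:(side - w) // 2 + w] = crop
        blocks.append(cv2.resize(canvas, (BLOCK_SIZE, BLOCK_SIZE)))
    return blocks
```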
As described above, the third image set may be derived from the label images. A label image is the aligned original image, i.e., an image in which the original image of an original document has been aligned to the distorted image. The label image may also be referred to as the gold standard. In some examples, the label image may be acquired using image processing methods. In other examples, the label image may be obtained manually.
In some examples, in the image processing method, the label image may be acquired by text-skeleton-based position alignment. Specifically, the original image of each original document may be geometrically transformed and then position-aligned based on the text skeleton to acquire the label image. That is, the label image may be obtained by adjusting the original image. It should be noted that if the distorted image has no geometric distortion relative to the original image, the geometric transformation may be unnecessary.
Additionally, the text skeleton may be a centerline of the text. In some examples, the width of the text skeleton may be one pixel and the basic topology of the text shape is kept unchanged. In some examples, a morphological refinement algorithm may be employed to refine image blocks corresponding to a text region in an image to be refined (e.g., a geometrically transformed original image and a distorted image) to extract a text skeleton. In some examples, the morphology refinement algorithm may include, but is not limited to, a Hilditch refinement algorithm, a Pavlidis refinement algorithm, or a Rosenfeld refinement algorithm, among others. In some examples, the text skeleton may be fitted to obtain a smoother text skeleton. As an example of the character skeleton, fig. 3 shows a character skeleton P11 of an image block P10 corresponding to the letter "q", for example.
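As an illustrative sketch of skeleton extraction, the snippet below uses scikit-image's `skeletonize` as a stand-in for the Hilditch, Pavlidis, or Rosenfeld refinement algorithms named above; like them, it thins the text strokes to a one-pixel-wide skeleton while preserving the basic topology of the character shapes. The dark-text-on-light-background assumption and the threshold value are illustrative, and the optional smoothing fit is omitted.

```python
import numpy as np
from skimage.morphology import skeletonize

def text_skeleton(block_gray: np.ndarray, threshold: int = 128) -> np.ndarray:
    """Binarize a grayscale image block and thin the text strokes to a 1-px skeleton."""
    text_mask = block_gray < threshold   # foreground = dark text pixels (assumption)
    return skeletonize(text_mask)        # boolean skeleton, same shape as the block
```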
In some examples, prior to the position alignment, size unification may also be performed between the original image and the distorted image of each original document. This enables more accurate alignment. Preferably, the position alignment may be preceded by a geometric transformation. Specifically, prior to the geometric transformation, a global alignment parameter (e.g., a scaling size) may be calculated between the distorted image and the original image, based on which the distorted image or the original image is adjusted to achieve size unification between the distorted image and the original image.
Examples of the present disclosure also provide a method of obtaining a label image. Fig. 4 is an exemplary flow chart illustrating acquiring a tag image according to an example of the present disclosure. Fig. 5 is an exemplary flow chart illustrating a geometric transformation in accordance with an example of the present disclosure.
In some examples, referring to fig. 4, acquiring a label image may include step S202. In step S202, the original image of each original document may be geometrically transformed to obtain a transformed image (i.e., a geometrically transformed original image).
In some examples, in the geometric transformation, geometric transformation parameters may be obtained based on the image blocks corresponding to the character regions in the original image and the distorted image of each original document, and the original image of each original document may be transformed using the geometric transformation parameters to align the character shapes of the transformed original image and the distorted image. In this case, because the geometric transformation is based on the character regions, interference from non-character content in the original image and/or the distorted image can be reduced. Here, character-shape alignment means that, with respect to the same reference direction, the characters of the transformed original image and of the distorted image are substantially uniform in shape. In some examples, the geometric transformation parameters may be obtained by perspective transformation.
In some examples, image blocks corresponding to character regions in the original image and the distorted image may be taken as a whole to obtain a set of geometric transformation parameters. That is, all the texts in one image (i.e. the original image or the distorted image) can be taken as a whole to obtain a set of geometric transformation parameters. In other examples, a set of geometric transformation parameters may be obtained based on each pair of image blocks that are position-matched in text regions in the original image and the distorted image. That is, a plurality of sets of geometric transformation parameters respectively corresponding to a plurality of pairs of image blocks may be acquired. In some examples, each pair of image blocks and each set of geometric transformation parameters may have a one-to-one correspondence.
Taking multiple sets of geometric transformation parameters as an example, and referring to fig. 5, the process of geometric transformation may include the following steps (a code sketch follows the list):
Step S302: using OCR, acquire the image blocks of the character areas in the original image and the distorted image as the first image set and the second image set, respectively. The image blocks corresponding to the original image form the first image set, and the image blocks corresponding to the distorted image form the second image set.
Step S304: traverse the image blocks of the second image set, and use an image registration method to find the matching image block in the first image set for each of them, obtaining multiple pairs of image blocks.
Step S306: for each pair of image blocks, obtain the minimum rectangle containing the character area in the image block from the first image set as the first coordinates, and the minimum rectangle containing the character area in the image block from the second image set as the second coordinates.
Step S308: acquire a perspective transformation matrix based on the first coordinates and the second coordinates, where the first coordinates represent the coordinates before transformation and the second coordinates represent the coordinates after transformation.
Step S310: apply the perspective transformation matrix to the image blocks in the first image set to obtain the geometrically transformed image blocks, and hence the geometrically transformed original image. That is, the geometrically transformed original image can be obtained by perspective-transforming the image blocks corresponding to the character regions in the original image.
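A minimal sketch of steps S306–S310 for one matched pair of blocks is given below; the (x, y, w, h) rectangle representation and the helper names are illustrative assumptions.

```python
import cv2
import numpy as np

def rect_corners(x, y, w, h) -> np.ndarray:
    """Four corner points of a rectangle, in the order OpenCV expects."""
    return np.float32([[x, y], [x + w, y], [x + w, y + h], [x, y + h]])

def align_block_geometry(orig_block, orig_rect, dist_rect):
    src = rect_corners(*orig_rect)   # first coordinates: before transformation
    dst = rect_corners(*dist_rect)   # second coordinates: after transformation
    M = cv2.getPerspectiveTransform(src, dst)          # step S308
    h, w = orig_block.shape[:2]
    return cv2.warpPerspective(orig_block, M, (w, h))  # step S310
```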
Referring back to fig. 4, acquiring the label image may further include step S204. In step S204, the positions of the transformed image and the distorted image may be aligned based on the text skeleton to acquire the label image. In this case, the text skeleton reduces the negative influence of character distortion on the alignment result, accommodating distorted characters whose strokes spread outward or whose edges are blurred, so that a more accurate alignment position can be obtained.
In some examples, text skeletons of the transformed image and the distorted image may be extracted separately for image matching to obtain alignment positions, and the positions of the transformed image and the distorted image may be aligned based on the alignment positions. Specifically, for each original document, a text skeleton of the transformed image may be used as a first skeleton, a text skeleton of the distorted image may be used as a second skeleton, an alignment position of a preset position in the transformed image in the distorted image is determined based on an overlapping degree between the first skeleton and the second skeleton, and the positions of the transformed image and the distorted image are aligned based on the alignment position.
For example, a preset position in the transformed image may be chosen, and a position may be sought in the distorted image such that, if the pixel point at the preset position were moved there and all other pixel points were translated along with it, the degree of overlap between the text skeleton of the transformed image and that of the distorted image would be highest; that position may be taken as the alignment position.
In some examples, the position with the highest degree of overlap may be obtained as the alignment position by calculating a normalized correlation coefficient between the first skeleton and the second skeleton.
In some examples, the offset amount may be determined based on a preset position and the above-described alignment position. For example, if the coordinates of the preset position are the coordinates of the upper left corner of the transformed image, the offset amount may be the coordinates of the aligned position. In some examples, the transformed image may be aligned with the location of the distorted image based on the offset. For example, the transformed image or the distorted image may be moved to align the transformed image with the position of the distorted image based on the amount of shift.
In addition, the preset position may be a position of any pixel point in the original image. In some examples, the preset position may be a position of four corner points or a center point of the text skeleton corresponding region.
In some examples, the text skeleton for position alignment may be a text skeleton of the entire image. In other examples, the text skeleton for position alignment may be a text skeleton of respective image blocks in a text region in the image.
Fig. 6 is an exemplary flowchart showing step S104 in fig. 2.
Referring back to fig. 2, in step S104, the present embodiment may determine a predicted aligned image block corresponding to an image block in the first image set by using the sample set and the alignment model 20.
In some examples, referring to fig. 6, step S104 may include predicting the sample set using the alignment model 20 to obtain a prediction set (step S402). In particular, the prediction set may be obtained by the alignment model 20 predicting for the first image set and the second image set.
In some examples, each pair of position-matched image blocks from the first image set and the second image set may be superimposed, and the superimposed result may be used as the input of the alignment model 20 to obtain the prediction set. The prediction set may include multiple prediction results, with each prediction result corresponding one-to-one to a pair of image blocks. That is, the output of the alignment model 20 may be the prediction result for each pair of image blocks. Specifically, the pairs of image blocks from the first image set and the second image set may be traversed, the two image blocks in each pair superimposed and input to the alignment model 20 to obtain the corresponding prediction result, and the prediction set obtained from the resulting prediction results.
In some examples, each pair of image blocks may be superimposed in the channel dimension. Taking the image blocks of the RGB space as an example, each image block may have three color channels (i.e., R channel, G channel, and B channel), and after two image blocks in each pair of image blocks are superimposed in the channel dimension, the superimposed result may have six color channels. For example, for an image block in RGB space, the size of the superimposed result may be: the image block height x the image block width x 6.
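A one-line sketch of this superposition (block shapes assumed to be H x W x 3):

```python
import numpy as np

def superimpose(orig_block: np.ndarray, dist_block: np.ndarray) -> np.ndarray:
    """Stack a position-matched pair along the channel axis: two (H, W, 3) blocks -> (H, W, 6)."""
    assert orig_block.shape == dist_block.shape
    return np.concatenate([orig_block, dist_block], axis=-1)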
In addition, each pair of image blocks with matching positions may indicate that two image blocks in each pair of image blocks may correspond to the same position in the original document. In some examples, each pair of image blocks of the first image set that are position-matched with the second image set may be obtained by an image registration method.
In some examples, referring to fig. 6, step S104 may further include determining a predicted aligned image block corresponding to an image block in the first image set based on the prediction results in the prediction sets (step S404). The prediction aligned image block may be a predicted position-aligned image block corresponding to an image block in the first image set.
As described above, the prediction results may correspond one-to-one to the pairs of image blocks, and each pair of image blocks corresponds one-to-one to an image block in the first image set and one in the second image set. Thus, each image block in the first image set has a corresponding prediction result. That is, the prediction set may include the prediction results corresponding to the image blocks in the first image set.
In some examples, the prediction result may be a predicted aligned image block. That is, the output of the alignment model 20 may be a predicted aligned image block. Thereby, a predicted aligned image block can be obtained directly based on the prediction result.
In some examples, the prediction result may be predictive displacement data. The prediction displacement data may represent an offset of each pixel point between an image block in the first image set and a corresponding prediction aligned image block. In this case, the prediction aligned image block can be determined based on the prediction displacement data, knowing the image block and the prediction displacement data in the first image set.
In some examples, for each pair of image blocks, the prediction displacement data may be used to move pixel points in the image blocks of the first image set to determine a prediction aligned image block to which the image block of the first image set corresponds. That is, the prediction displacement data may be applied to the original image blocks (i.e. the image blocks of the first image set) to obtain corresponding prediction aligned image blocks.
In some examples, the predicted displacement data includes a first displacement image and a second displacement image respectively located at two channels. That is, the predicted displacement data may have two channels, each channel having corresponding displacement data. The size of the displacement data may be identical to the size of the image block. For example, the size of the predicted displacement data for two channels may be: the image block height x the image block width x 2.
In some examples, the pixel values of the respective positions of the first shift image may represent horizontal parameters for horizontally shifting the pixel points of the corresponding positions of the image blocks of the first image set, and the pixel values of the respective positions of the second shift image may represent vertical parameters for vertically shifting the pixel points of the corresponding positions of the image blocks of the first image set. That is, in each pair of image blocks, the first and second shifted images may be used to shift pixel points of the image blocks belonging to the first image set in the horizontal direction and the vertical direction, respectively, to obtain a predicted aligned image block.
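A minimal PyTorch sketch of applying the two-channel displacement data follows. `grid_sample` implements the move as backward warping (each output pixel samples the input at its own location plus the predicted shift), a common differentiable stand-in for moving pixel points; treating channel 0 as the horizontal shift and channel 1 as the vertical shift, in pixels, is an assumption consistent with the description above.

```python
import torch
import torch.nn.functional as F

def warp_block(orig: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
    """orig: (N, 3, H, W) image blocks; flow: (N, 2, H, W) pixel displacements
    (channel 0 = horizontal, channel 1 = vertical)."""
    n, _, h, w = orig.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    base = torch.stack((xs, ys), dim=-1).float().to(orig.device)  # (H, W, 2)
    target = base + flow.permute(0, 2, 3, 1)   # where each output pixel samples from
    # grid_sample expects sampling coordinates normalized to [-1, 1].
    target[..., 0] = 2.0 * target[..., 0] / (w - 1) - 1.0
    target[..., 1] = 2.0 * target[..., 1] / (h - 1) - 1.0
    return F.grid_sample(orig, target, align_corners=True)
```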
Referring back to fig. 2, in the present embodiment, in step S106, a plurality of loss functions may be determined based on the prediction aligned image block. In some examples, the plurality of loss functions may include a first loss function and a second loss function. In this case, a plurality of losses are comprehensively considered, the prediction result can be constrained from multiple dimensions, and the accuracy of the alignment result can be improved.
In some examples, the first loss function may be determined based on the predicted aligned image blocks corresponding to the image blocks in the first image set and the corresponding image blocks in the third image set. In some examples, the first loss function may represent the degree of difference between a predicted aligned image block and the corresponding image block in the third image set. That is, the first loss function may represent the loss of the predicted aligned image block relative to the gold standard (i.e., the image block in the third image set). In this case, by continuously constraining the prediction results of the alignment model 20 with the first loss function, the predicted aligned image blocks obtained from those results can be made ever closer to the gold standard.
In some examples, the degree of difference for the first loss function may be expressed as a similarity. Specifically, each image block in the third image set may be taken as a label image block, and the first loss function may be determined based on the similarity between the label image block and its corresponding predicted aligned image block. A loss of the predicted aligned image block relative to the gold standard can thereby be obtained.
In some examples, the second loss function may be determined based on the text skeletons of the predicted aligned image blocks corresponding to the first image set and the text skeletons of the corresponding image blocks in the second image set. In some examples, the second loss function may represent the degree of difference between the text skeleton of a predicted aligned image block and the text skeleton of the corresponding distorted image block (i.e., the image block in the second image set). In this case, the second loss function constrains the alignment model 20 to match the character skeleton of the predicted aligned image block with that of the distorted image block, reducing the risk that the predicted aligned image block deviates from the character form of the distorted image block and also reducing the influence of errors (e.g., distortion) that may exist in the gold standard. As a result, the predicted aligned image block is aligned in position with the label image block while its character form stays closer to the distorted image block, improving the accuracy of the alignment result.
In some examples, the degree of coincidence may be used to represent the degree of difference to which the second loss function corresponds. Specifically, each image block in the second image set may be taken as a distorted image block, and the second loss function may be determined based on a degree of coincidence between a character skeleton of the distorted image block and a character skeleton of a prediction-aligned image block corresponding to the distorted image block. Thereby, the second loss function can be determined based on the degree of overlap.
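The disclosure fixes neither the similarity measure nor the coincidence measure, so the sketch below makes two illustrative choices: L1 distance for the first loss and a soft Dice-style overlap for the second. For end-to-end training, the skeletons would need to come from a differentiable (soft) skeletonization; that detail is omitted here.

```python
import torch
import torch.nn.functional as F

def first_loss(pred_aligned: torch.Tensor, label_block: torch.Tensor) -> torch.Tensor:
    """Difference between the predicted aligned block and the gold standard."""
    return F.l1_loss(pred_aligned, label_block)

def second_loss(pred_skel: torch.Tensor, dist_skel: torch.Tensor,
                eps: float = 1e-6) -> torch.Tensor:
    """1 - Dice overlap between the two text skeletons (soft masks in [0, 1])."""
    inter = (pred_skel * dist_skel).sum()
    return 1.0 - 2.0 * inter / (pred_skel.sum() + dist_skel.sum() + eps)

def total_loss(pred_aligned, label_block, pred_skel, dist_skel) -> torch.Tensor:
    """Sum of the two losses, as used for the stop-training criterion below."""
    return first_loss(pred_aligned, label_block) + second_loss(pred_skel, dist_skel)
```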
Referring back to fig. 2, in step S108, in the present embodiment, the alignment model 20 may be trained based on a plurality of loss functions to obtain a trained alignment model 20. In this case, the trained alignment model 20 can be generalized to align the positions of a greater variety of original images and distorted images based on the supervised information by training the alignment model 20, and thus a more robust alignment result can be obtained.
In some examples, the alignment model 20 may be trained using the plurality of loss functions described above until a stop-training condition is met, to obtain the trained alignment model 20. For example, the stop-training condition may be that the total loss of the alignment model 20 no longer decreases, or that the number of training rounds reaches a preset number. The total loss may be the sum of the losses of the plurality of loss functions in one training round, and the plurality of loss functions may include the first loss function and the second loss function described above.
The alignment model 20 to which the disclosed examples relate may be any deep neural network that positionally aligns an original image with a distorted image. In some examples, the alignment model 20 may be a convolutional neural network based on deep learning. Preferably, the alignment model 20 may be based on a U-Net network. That is, the network structure of the alignment model 20 may be based on the network structure of the U-Net network. In some examples, the U-Net network based alignment model 20 may include an encoder and a decoder. The input to the encoder is the result of superimposing in the channel dimension each pair of image blocks that can be position matched in the first image set and the second image set. The output of the decoder may be the prediction result.
To this end, examples of the present disclosure also provide a U-Net network based alignment model 20. Fig. 7 is an exemplary block diagram illustrating a UNet-based alignment model 30 in accordance with examples of the present disclosure. Referring to fig. 7, the alignment model 20 is implemented here as a UNet-based alignment model 30 by way of example only, and this does not limit the disclosed examples. In the following, the original image block and the distorted image block denote the two image blocks in each pair, where the original image block belongs to the first image set and the distorted image block belongs to the second image set.
With continued reference to fig. 7, the encoder of the alignment model 30 may include a plurality of downsampling blocks, and the decoder of the alignment model 30 may include a plurality of upsampling blocks. A downsampling block may include a convolution layer, a rectified linear unit (ReLU) layer, a residual module, and a downsampling layer. An upsampling block may include an upsampling layer, a convolution layer, a ReLU layer, and a residual module.
With continued reference to fig. 7, the convolution layer of the first downsampling block may receive the result of superimposing the original image block and the distorted image block in the channel dimension. The output of the downsampling layer of each downsampling block may be the input of the convolution layer of the next downsampling block, and the output of the downsampling layer of the last downsampling block may be passed through a convolution layer, a ReLU layer, and a residual module as the input of the upsampling layer of the first upsampling block.
In addition, corresponding downsampling blocks and upsampling blocks may be connected by skip connection layers. Specifically, a skip connection layer may combine the output of the residual module of the corresponding downsampling block with the output of the previous upsampling block, and the combined result may be used as the input of the convolution layer of the upsampling block. This enables image features from deep layers to be combined with image features from shallow layers.
With continued reference to fig. 7, the output of the residual module of the last upsampling block may be passed through a convolution layer to output the prediction result.
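A compact PyTorch sketch of this topology follows. The channel widths, depth, and pooling/upsampling choices are assumptions; the disclosure fixes only the block structure (convolution, ReLU, residual, down/upsampling), the skip connections, and the six-channel input / two-channel displacement output.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Residual(nn.Module):
    """Two 3x3 convolutions with an additive shortcut."""
    def __init__(self, ch: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1))

    def forward(self, x):
        return torch.relu(x + self.body(x))

class AlignUNet(nn.Module):
    def __init__(self, widths=(32, 64, 128)):          # assumed channel widths
        super().__init__()
        self.downs = nn.ModuleList()
        in_ch = 6                                      # two superimposed RGB blocks
        for w in widths:                               # conv -> ReLU -> residual
            self.downs.append(nn.Sequential(
                nn.Conv2d(in_ch, w, 3, padding=1), nn.ReLU(inplace=True),
                Residual(w)))
            in_ch = w
        self.pool = nn.MaxPool2d(2)                    # downsampling layer
        self.mid = nn.Sequential(                      # conv -> ReLU -> residual bridge
            nn.Conv2d(widths[-1], widths[-1], 3, padding=1),
            nn.ReLU(inplace=True), Residual(widths[-1]))
        self.ups = nn.ModuleList()
        prev = widths[-1]
        for w in reversed(widths):                     # conv -> ReLU -> residual
            self.ups.append(nn.Sequential(
                nn.Conv2d(prev + w, w, 3, padding=1), nn.ReLU(inplace=True),
                Residual(w)))
            prev = w
        self.head = nn.Conv2d(widths[0], 2, 3, padding=1)  # 2-channel displacement

    def forward(self, x):                              # x: (N, 6, H, W), H, W % 8 == 0
        skips = []
        for down in self.downs:
            x = down(x)
            skips.append(x)                            # encoder feature for skip connection
            x = self.pool(x)
        x = self.mid(x)
        for up, skip in zip(self.ups, reversed(skips)):
            x = F.interpolate(x, scale_factor=2, mode="bilinear",
                              align_corners=False)     # upsampling layer
            x = up(torch.cat([x, skip], dim=1))        # skip connection merge
        return self.head(x)                            # predicted displacement data
```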
The present disclosure also provides a training apparatus for an alignment model 20 for text distortion that may include a memory and a processor. The memory is for non-transitory storage of computer readable instructions. The processor is configured to execute computer readable instructions, which when executed by the processor perform one or more steps of a training method provided by any of the examples of the disclosure. It should be noted that, for the detailed description of the process of performing model training by the training apparatus, reference may be made to the relevant description in the example of the training method, and details are not repeated here.
The present disclosure also relates to an electronic device, which may comprise at least one processing circuit. The at least one processing circuit is configured to perform one or more steps of the training method described above.
The present disclosure also relates to a computer-readable storage medium, which may store at least one instruction, which when executed by a processor, implements one or more steps of the training method described above.
In some examples, after the alignment model 20 is trained, the document analysis model 10 may be trained using the distorted images of the original documents and the aligned original images generated by the trained alignment model 20. In this case, a large amount of aligned data suitable for training the document analysis model 10 can be generated quickly, and a database for document image analysis can be constructed efficiently. This solves or alleviates the problems of insufficient training data and the excessive time and labor needed to produce it when applying deep networks to the document image field, while improving the quality of the training results of the document analysis model 10 and the generalization performance of the trained model.
Hereinafter, a process of using the trained alignment model 20 according to the present disclosure will be described in detail with reference to the accompanying drawings. Fig. 8 is an exemplary flow chart illustrating the use of a trained alignment model 20 in accordance with examples of the present disclosure. Fig. 9 is a schematic diagram illustrating aligned image blocks generated by the trained alignment model 20 according to an example of the present disclosure.
Referring to fig. 8, using the trained alignment model 20 may include:
Step S502: acquire an original image and a distorted image of each of a plurality of original documents. In some examples, image blocks corresponding to text regions in the original image and the distorted image may be extracted, with the size of each image block matching the input size required by the alignment model 20.
Step S504: generate an aligned original image for the distorted image and the original image of each original document using the trained alignment model 20. In some examples, the image blocks obtained in step S502 may be input into the alignment model 20 to obtain aligned image blocks (a code sketch of this step is given after step S506 below). As an example, fig. 9 shows an aligned image block P23 generated by the alignment model 20 for an original image block P20. For comparison, fig. 9 also shows the distorted image block P21 corresponding to original image block P20 and the corresponding manually aligned image block P22.
Step S506: train the document analysis model 10 using the distorted image and the aligned original image of each original document. In some examples, the document analysis model 10 may be trained using the image blocks corresponding to the distorted images obtained in step S502 and the aligned image blocks obtained in step S504.
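As a rough illustration of how steps S502 and S504 might look in code, the following sketch stacks an extracted original/distorted block pair on the channel dimension, predicts the two-channel displacement field described in claims 8 and 9, and warps the original block accordingly. `AlignmentNet` refers to the sketch above; the 256x256 block size, the commented-out checkpoint path, and the use of `grid_sample` (a backward-warping approximation of moving pixel points by the predicted displacements) are assumptions for illustration only.

```python
# Sketch of the inference path of steps S502-S504. Assumptions (not from
# the disclosure): 256x256 grayscale blocks, a hypothetical checkpoint
# file, and backward warping via grid_sample as the pixel-moving step.
import torch
import torch.nn.functional as F

def warp_with_displacement(block, disp):
    """Warp `block` by the predicted per-pixel (horizontal, vertical) shifts."""
    n, _, h, w = block.shape
    # Base sampling grid in normalized [-1, 1] coordinates.
    ys, xs = torch.meshgrid(torch.linspace(-1, 1, h),
                            torch.linspace(-1, 1, w), indexing="ij")
    base = torch.stack([xs, ys], dim=-1).expand(n, h, w, 2)
    # Convert pixel displacements (channel 0: horizontal, channel 1:
    # vertical) to normalized grid offsets.
    offset = torch.stack([disp[:, 0] * 2.0 / (w - 1),
                          disp[:, 1] * 2.0 / (h - 1)], dim=-1)
    return F.grid_sample(block, base + offset, align_corners=True)

model = AlignmentNet()
# model.load_state_dict(torch.load("alignment_model.pt"))  # hypothetical checkpoint
model.eval()

# Stand-ins for image blocks extracted from text regions in step S502.
original_block = torch.rand(1, 1, 256, 256)
distorted_block = torch.rand(1, 1, 256, 256)

with torch.no_grad():
    disp = model(torch.cat([original_block, distorted_block], dim=1))
    aligned_block = warp_with_displacement(original_block, disp)  # step S504 output
```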
The training method and training apparatus described above perform geometric transformation and character-skeleton-based position alignment on the original image of each original document to obtain a label image, and train the alignment model 20 using the image blocks corresponding to character regions in the original images, the distorted images, and the label images as training data, obtaining predicted aligned image blocks. The alignment model 20 is then continuously optimized using a first loss function determined from the predicted aligned image blocks and the label image blocks, and a second loss function determined from the character skeletons of the predicted aligned image blocks and the character skeletons of the distorted image blocks, yielding the trained alignment model 20. In this case, the first loss function draws the predicted aligned image blocks closer to the gold standard, while the second loss function reduces the risk that the predicted aligned image blocks deviate from the character shapes in the distorted image blocks and dampens the influence of errors that may exist in the gold standard. As a result, the predicted aligned image blocks are positionally aligned with the label image blocks while their character shapes remain close to those of the distorted image blocks, so the method can adapt to character-level distortions in distorted images and improve alignment precision and accuracy.
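The following is a minimal sketch of such a two-term loss. The disclosure specifies only a similarity term against the label image block and a coincidence term between character skeletons; the L1 distance, the Dice-style overlap, the differentiable soft ink mask standing in for true (non-differentiable) skeleton extraction, and the 0.5 weight are all illustrative assumptions.

```python
# Sketch of the two-term training loss described above. Assumptions (not
# from the disclosure): L1 for the first (similarity) term, a Dice-style
# overlap for the second (skeleton coincidence) term, a soft ink mask as
# a differentiable stand-in for skeleton extraction, and the 0.5 weight.
import torch

def soft_ink_mask(block, k=25.0, thresh=0.5):
    """Differentiable stand-in for a character-skeleton map: dark (ink)
    pixels of a [0, 1] grayscale block approach 1, background approaches 0."""
    return torch.sigmoid(k * (thresh - block))

def alignment_loss(pred_aligned, label_block, distorted_block, w2=0.5):
    # First loss: similarity between the prediction and the label (gold) block.
    loss1 = torch.mean(torch.abs(pred_aligned - label_block))
    # Second loss: coincidence between the character shapes of the prediction
    # and of the distorted block, as one minus a Dice-style overlap score.
    p, d = soft_ink_mask(pred_aligned), soft_ink_mask(distorted_block)
    dice = (2.0 * (p * d).sum() + 1e-6) / (p.sum() + d.sum() + 1e-6)
    loss2 = 1.0 - dice
    return loss1 + w2 * loss2
```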
In addition, the trained alignment model 20 according to examples of the present disclosure can conveniently align original and distorted images, replacing the tedious matching steps of conventional document image alignment. This greatly shortens the time required to align the text of document images and gives the document alignment method a wider application range and stronger robustness. If new document contents or types are encountered for which the original alignment model 20 performs poorly, only a small amount of data of the corresponding document types needs to be added to retrain the model; a more comprehensive alignment model 20 can then be obtained that quickly generates alignment data meeting the requirements in batches.
While the present disclosure has been described in detail in connection with the drawings and examples, it should be understood that the above description is not intended to limit the disclosure in any way. Those skilled in the art can make modifications and variations to the present disclosure as needed without departing from its true spirit, and such modifications and variations fall within the scope of the disclosure.
Claims (10)
1. A training method for an alignment model for character distortion, wherein the alignment model is a deep neural network for positionally aligning an undistorted original image and a distorted image corresponding to an original document, the training method comprising: acquiring a plurality of original documents, and performing geometric transformation and character-skeleton-based position alignment on the original image of each original document in the plurality of original documents to obtain a label image, wherein the label image is an aligned original image; taking image blocks corresponding to character regions in the original images, the distorted images, and the label images corresponding to the original documents as a first image set, a second image set, and a third image set, respectively; obtaining a prediction set produced by the alignment model for the first image set and the second image set, wherein the prediction set comprises prediction results corresponding to the image blocks in the first image set, and determining, based on the prediction results in the prediction set, predicted aligned image blocks corresponding to the image blocks in the first image set; determining a first loss function based on the predicted aligned image blocks corresponding to the image blocks in the first image set and the corresponding image blocks in the third image set, and determining a second loss function based on the character skeletons of the predicted aligned image blocks corresponding to the image blocks in the first image set and the character skeletons of the corresponding image blocks in the second image set; and training the alignment model based on the first loss function and the second loss function to obtain the trained alignment model.
2. Training method according to claim 1, characterized in that:
in the geometric transformation, geometric transformation parameters are obtained based on the image blocks corresponding to character regions in the original image and the distorted image of each original document, and the original image of each original document is transformed using the geometric transformation parameters so as to align the character shapes between the transformed original image and the distorted image.
3. Training method according to claim 1, characterized in that:
in the position alignment, for each original document, the character skeleton of the geometrically transformed original image is taken as a first skeleton and the character skeleton of the distorted image is taken as a second skeleton; an alignment position, within the distorted image, of a preset position in the geometrically transformed original image is determined based on the degree of overlap between the first skeleton and the second skeleton; and the geometrically transformed original image is positionally aligned with the distorted image based on the alignment position.
4. Training method according to claim 1, characterized in that:
each image block in the third image set is taken as a label image block, and the first loss function is determined based on the similarity between the label image block and the predicted aligned image block corresponding to the label image block; and/or
each image block in the second image set is taken as a distorted image block, and the second loss function is determined based on the degree of coincidence between the character skeleton of the distorted image block and the character skeleton of the predicted aligned image block corresponding to the distorted image block.
5. Training method according to claim 1, characterized in that:
the image blocks have a consistent size, and the number of characters in each image block is not less than 1.
6. Training method according to claim 5, characterized in that:
the division granularity of the image blocks includes at least one of a single character, a plurality of characters, a single line of characters, and a plurality of lines of characters.
7. Training method according to claim 1, characterized in that:
the distorted image comprises at least one of a legal image and a copied image, wherein the legal image is an image obtained by capturing the original image with first acquisition equipment, and the copied image is an image obtained by printing the legal image on a physical carrier and capturing the printed image with second acquisition equipment.
8. Training method according to claim 1, characterized in that:
the input of the alignment model is the result of superimposing, in the channel dimension, each positionally matched pair of image blocks from the first image set and the second image set; the output of the alignment model is the prediction result; the prediction result is predicted displacement data; and the predicted displacement data is used to move pixel points in the image blocks of the first image set so as to determine the predicted aligned image blocks corresponding to the image blocks in the first image set.
9. Training method according to claim 8, characterized in that:
the predicted displacement data comprises a first displacement image and a second displacement image located in two respective channels, wherein the pixel value at each position in the first displacement image represents a horizontal parameter for horizontally moving the pixel point at the corresponding position of an image block of the first image set, and the pixel value at each position in the second displacement image represents a vertical parameter for vertically moving the pixel point at the corresponding position of an image block of the first image set.
10. An apparatus for training an alignment model for character distortion, comprising: a memory for non-transitory storage of computer-readable instructions; and a processor for executing the computer-readable instructions, wherein the computer-readable instructions, when executed by the processor, perform the training method of any one of claims 1 to 9.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210781749.9A CN115063813B (en) | 2022-07-05 | 2022-07-05 | Training method and training device of alignment model aiming at character distortion |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115063813A (en) | 2022-09-16 |
CN115063813B (en) | 2023-03-24 |
Family
ID=83203562
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210781749.9A (CN115063813B, Expired - Fee Related) | Training method and training device of alignment model aiming at character distortion | 2022-07-05 | 2022-07-05 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115063813B (en) |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112215248A (en) * | 2019-07-11 | 2021-01-12 | 深圳先进技术研究院 | Deep learning model training method and device, electronic equipment and storage medium |
US20210383544A1 (en) * | 2020-06-03 | 2021-12-09 | Here Global B.V. | Semantic segmentation ground truth correction with spatial transformer networks |
CN113762117A (en) * | 2021-08-27 | 2021-12-07 | 深圳数联天下智能科技有限公司 | Training method of image processing model, image processing model and computer equipment |
CN114241493A (en) * | 2021-12-20 | 2022-03-25 | 深圳大学 | Training method and training device for training data of amplification document analysis model |
CN114298195A (en) * | 2021-12-21 | 2022-04-08 | 上海高德威智能交通系统有限公司 | Image quality evaluation method and device, electronic equipment and machine-readable storage medium |
Non-Patent Citations (2)
Title |
---|
Ishank Goel et al.: "Deep Convolutional Neural Network for Double-Identity Fingerprint Detection", IEEE Sensors Letters *
Ying Zilu et al.: "Document image layout analysis based on multi-feature fusion" (多特征融合的文档图像版面分析), Journal of Image and Graphics (中国图象图形学报) *
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117350909A (en) * | 2023-10-24 | 2024-01-05 | 江苏群杰物联科技有限公司 | Text watermark processing method and device, electronic equipment and storage medium |
CN117350909B (en) * | 2023-10-24 | 2024-05-14 | 江苏群杰物联科技有限公司 | Text watermark processing method and device, electronic equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN115063813B (en) | 2023-03-24 |
Similar Documents

| Publication | Title |
|---|---|
| CN108520247B (en) | Method, device, terminal and readable medium for identifying object node in image |
| EP3309703B1 (en) | Method and system for decoding QR code based on weighted average grey method |
| CN109753838A (en) | Two-dimensional code identification method, device, computer equipment and storage medium |
| WO2018233055A1 (en) | Method and apparatus for entering policy information, computer device and storage medium |
| CN114155527A (en) | Scene text recognition method and device |
| CN113392669B (en) | Image information detection method, detection device and storage medium |
| CN111104813A (en) | Two-dimensional code image key point detection method and device, electronic equipment and storage medium |
| CN115063813B (en) | Training method and training device of alignment model aiming at character distortion |
| CN111737478A (en) | Text detection method, electronic device and computer readable medium |
| CN110544202A (en) | Parallax image splicing method and system based on template matching and feature clustering |
| CN114155540B (en) | Character recognition method, device, equipment and storage medium based on deep learning |
| CN111783763A (en) | Text positioning box correction method and system based on convolutional neural network |
| CN112184533B (en) | Watermark synchronization method based on SIFT feature point matching |
| CN111814543B (en) | Depth video object repairing and tampering detection method |
| CN113159035A (en) | Image processing method, device, equipment and storage medium |
| CN111881914A (en) | License plate character segmentation method and system based on self-learning threshold |
| CN116050379A (en) | Document comparison method and storage medium |
| CN116311290A (en) | Handwriting and printing text detection method and device based on deep learning |
| US11423597B2 (en) | Method and system for removing scene text from images |
| CN111402281B (en) | Book edge detection method and device |
| CN112907533A (en) | Detection model training method, device, equipment and readable storage medium |
| CN111768333A (en) | Identification removing method, device, equipment and storage medium |
| CN111401365A (en) | OCR image automatic generation method and device |
| CN111626286A (en) | Method for quickly identifying arbitrary angle alignment of express delivery surface |
| CN116259050B (en) | Method, device, equipment and detection method for positioning and identifying label characters of filling barrel |
Legal Events

| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | GR01 | Patent grant | |
| | CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20230324 |