WO2021203832A1 - Method, device, and storage medium for removing handwritten content from a text image - Google Patents
Method, device, and storage medium for removing handwritten content from a text image
- Publication number: WO2021203832A1 (PCT/CN2021/076250)
- Authority: WIPO (PCT)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/14—Image acquisition
- G06V30/146—Aligning or centring of the image pick-up or image-field
- G06V30/147—Determination of region of interest
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/77—Retouching; Inpainting; Scratch removal
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/26—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
- G06V10/273—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion removing elements interfering with the pattern to be recognised
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/70—Denoising; Smoothing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/11—Region-based segmentation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/13—Edge detection
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/28—Quantising the image, e.g. histogram thresholding for discrimination between background and foreground patterns
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/22—Character recognition characterised by the type of writing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/24—Character recognition characterised by the processing or recognition method
- G06V30/242—Division of the character sequences into groups prior to recognition; Selection of dictionaries
- G06V30/244—Division of the character sequences into groups prior to recognition; Selection of dictionaries using graphical properties, e.g. alphabet type or font
- G06V30/2455—Discrimination between machine-print, hand-print and cursive writing
Definitions
- the invention relates to a method, a device and a storage medium for removing handwritten content in a text image.
- the present invention provides a method for removing handwritten content in a text image, including: obtaining an input image of a text page to be processed, wherein the input image includes a handwritten area, and the handwritten area includes handwritten content; recognizing the input image using an image segmentation model to obtain the initial handwritten pixels of the handwritten content; blurring the initial handwritten pixels to obtain a handwritten pixel mask area; determining the handwritten content according to the handwritten pixel mask area; and removing the handwritten content from the input image to obtain an output image.
- removing the handwritten content in the input image to obtain an output image includes: determining the non-handwritten pixels within the handwritten pixel mask area of the input image according to the pixel values of the initial handwritten pixels and the position of the handwritten pixel mask area; and removing the content of the handwritten pixel mask area from the input image to obtain an intermediate output image.
- removing the handwritten content in the input image to obtain an output image includes:
- the handwritten content in the input image is removed according to the non-handwritten pixels in the handwritten pixel mask area and the handwritten pixel mask area to obtain the output image.
- removing the handwritten content in the input image to obtain an output image includes: cutting and removing the handwritten content from the input image to obtain an intermediate output image; and performing binarization processing on the intermediate output image to obtain the output image.
- removing the handwritten content in the input image to obtain the output image includes: obtaining replacement pixels; and replacing the pixels of the handwritten content with the replacement pixels, so as to remove the handwritten content from the input image and obtain the output image.
- replacing the pixels of the handwritten content with the replacement pixels to remove the handwritten content from the input image to obtain the output image includes: replacing the pixels of the handwritten content with the replacement pixels to remove the handwritten content from the input image to obtain an intermediate output image; and performing binarization processing on the intermediate output image to obtain the output image.
- the replacement pixels are obtained according to the pixels of the handwritten content through an image restoration algorithm based on pixel neighborhood calculation.
- the obtaining of replacement pixels further includes recognizing the input image using a region recognition model to obtain the handwriting area, and the replacement pixel is any pixel in the handwriting area except the pixels of the handwritten content; or, the replacement pixel is the average value of the pixel values of all pixels in the handwriting area except the pixels of the handwritten content.
- obtaining the input image of the text page to be processed includes: obtaining an original image of the text page to be processed, wherein the original image includes the text area of the text page to be processed; performing edge detection on the original image to determine the text area to be processed in the original image; and performing normalization processing on the text area to be processed to obtain the input image.
- the image segmentation model is a pre-trained U-Net model for segmenting the input image.
- the initial handwritten pixels are blurred by a Gaussian filter function, and the area of the initial handwritten pixels is enlarged to obtain the handwritten pixel mask area.
- the present invention also provides an apparatus for removing handwritten content in a text image, including: a memory for non-transitorily storing computer-readable instructions; and a processor for running the computer-readable instructions, wherein, when the computer-readable instructions are executed by the processor, the method for removing handwritten content in a text image according to any one of the above embodiments is executed.
- the present invention also provides a storage medium that non-transitorily stores computer-readable instructions which, when executed by a computer, perform the above method for removing handwritten content in a text image.
- FIG. 1 is a schematic flowchart of a method for removing handwritten content in a text image according to an embodiment of the present invention
- FIG. 2A is a schematic diagram of an original image provided by an embodiment of the present invention.
- FIG. 2B is a schematic diagram of an output image provided by an embodiment of the present invention.
- FIG. 3 is a schematic block diagram of a device for removing handwritten content in a text image according to an embodiment of the present invention
- FIG. 4 is a schematic diagram of a storage medium provided by an embodiment of the present invention.
- Fig. 5 is a schematic diagram of a hardware environment provided by an embodiment of the present invention.
- At least one embodiment of the present invention provides a method, device and storage medium for removing handwritten content from a text image.
- the method for removing handwritten content from a text image includes: obtaining an input image of a text page to be processed, where the input image includes a handwritten area and the handwritten area includes handwritten content; using an image segmentation model to recognize the input image to obtain the initial handwritten pixels of the handwritten content; blurring the initial handwritten pixels to obtain the handwritten pixel mask area; determining the handwritten content according to the handwritten pixel mask area; and removing the handwritten content from the input image to obtain the output image.
- the method for removing handwritten content in a text image can effectively remove the handwritten content in the handwritten area in the input image, so as to facilitate the output of an image or file that only includes printed content.
- the method for removing the handwritten content in the text image can also convert the input image into a form that is convenient for printing, so that the user can print the input image into a paper form for storage or distribution.
- FIG. 1 is a schematic flowchart of a method for removing handwritten content in a text image provided by at least one embodiment of the present invention
- FIG. 2A is a schematic diagram of an original image provided by at least one embodiment of the present invention
- FIG. 2B is a schematic diagram of an output image provided by at least one embodiment of the present invention.
- the method for removing handwritten content in a text image provided by an embodiment of the present invention includes steps S10 to S14.
- the method for removing handwritten content in a text image obtains an input image of a text page to be processed in step S10.
- the input image includes a handwritten area, and the handwritten area includes handwritten content.
- the input image can be any image that includes handwritten content.
- the input image may be an image taken by an image acquisition device (for example, a digital camera or a mobile phone, etc.), and the input image may be a grayscale image or a color image. It should be noted that the input image refers to a form in which the text page to be processed is presented in a visual manner, such as a picture of the text page to be processed.
- the handwriting area does not have a fixed shape but depends on the handwritten content; that is, the area containing handwritten content is the handwriting area, and the handwriting area can be a regular shape (for example, a rectangle) or an irregular shape.
- the handwriting area may include a filled area, a handwritten draft, or other handwritten marked areas.
- the input image also includes a text printing area, and the text printing area includes printed content.
- the shape of the text printing area may also be a regular shape (for example, a rectangle, etc.) or an irregular shape.
- in the following description, the shape of each handwriting area and of each text printing area is taken to be a rectangle as an example.
- the present invention includes but is not limited to this.
- the text pages to be processed may include books, newspapers, periodicals, documents, forms, contracts, and so on.
- Books, newspapers, and periodicals include all kinds of document pages with articles or patterns
- documents include all kinds of invoices, receipts, express orders, etc.
- the forms can be various types of forms, such as year-end summary tables, entry lists, price summary tables, application forms, etc.
- contracts can include various forms of contract text pages, etc.
- the invention does not specifically limit the type of text page to be processed.
- the text page to be processed may be text in paper form or text in electronic form.
- the printed content can include the title text of each item, and the handwritten content can include information filled in by the user, such as name, address, and phone number (in this case, the information is personal information filled in by the user, not general information).
- the to-be-processed text page is article-type text
- the printed content may be article content
- the handwritten content may be user notes or other handwritten marks.
- the printed content can include item titles such as “name”, “gender”, “ethnicity”, and “work history”, and the handwritten content can include information handwritten by users (for example, employees) in the entry form, such as the user’s name, gender (male or female), ethnicity, and work experience.
- the printed content can also include various symbols, graphics, and so on.
- the shape of the text page to be processed may be a rectangle or the like, and the shape of the input image may be a regular shape (for example, a parallelogram, a rectangle, etc.) to facilitate printing.
- the present invention is not limited to this.
- the input image may also have an irregular shape.
- the size of the input image and the size of the text page to be processed are not the same.
- the present invention is not limited to this, and the size of the input image and the size of the text page to be processed may also be the same.
- the text page to be processed includes printed content and handwritten content
- the printed content may be printed content
- the handwritten content is user handwritten content
- the handwritten content may include handwritten characters.
- printed content does not only refer to text, characters, graphics and other content input on the electronic device through the input device.
- the text page to be processed is text such as notes
- the content of the notes can also be handwritten by the user.
- the printed content is the printed content on the blank notebook page used for handwriting, such as horizontal lines.
- the printed content may include characters in various languages, such as Chinese (for example, Chinese characters or pinyin), English, Japanese, French, Korean, etc.
- the printed content may also include numbers and various symbols (for example, check marks, crosses, and various operation symbols) and various graphics.
- the handwritten content may also include text, numbers, various symbols, and various graphics in various languages.
- the to-be-processed text page 100 is a form, and the area surrounded by four boundary lines (straight lines 101A-101D) represents the to-be-processed text area 100 corresponding to the to-be-processed text page.
- the printing area includes a form area
- the printed content can include the text of each item, such as name, birthday, etc.
- the printed content can also include the logo graphic in the upper right corner of the to-be-processed text area 100 (shown covered), etc.
- the handwritten area includes a handwritten information area
- the handwritten content may include personal information handwritten by the user, for example, the user’s handwritten name, birthday information, health information, tick marks, and so on.
- the input image may include multiple handwritten content and multiple printed content.
- multiple pieces of handwritten content are spaced apart from each other, and multiple pieces of printed content are also spaced apart from each other.
- part of the handwritten content in multiple handwritten content may be the same (that is, the characters of the handwritten content are the same, but the specific shape of the handwritten content is different); part of the printed content in the multiple printed content may also be the same.
- the present invention is not limited to this, a plurality of handwritten contents may also be different from each other, and a plurality of printed contents may also be different from each other.
- step S10 may include: obtaining an original image of a text page to be processed, where the original image includes a text area to be processed; performing edge detection on the original image to determine the text area to be processed in the original image; The text area to be processed is normalized to obtain the input image.
- a neural network or an OpenCV-based edge detection algorithm can be used to perform edge detection on the original image to determine the text area to be processed.
- OpenCV is an open source computer vision library.
- Edge detection algorithms based on OpenCV include Sobel, Scharr, Canny, Laplacian, Prewitt, Marr-Hildreth, and many other algorithms.
- performing edge detection on the original image to determine the text area to be processed in the original image may include: processing the original image to obtain a line drawing of the grey contours in the original image, where the line drawing includes multiple lines; merging similar lines in the line drawing to obtain multiple initial merged lines, and determining a boundary matrix according to the multiple initial merged lines; merging similar lines among the multiple initial merged lines to obtain target lines, with unmerged initial merged lines also taken as target lines, thereby obtaining multiple target lines; determining multiple reference boundary lines from the multiple target lines according to the boundary matrix; processing the original image through a pre-trained boundary line region recognition model to obtain multiple boundary line areas of the text page to be processed in the original image; for each boundary line area, determining the target boundary line corresponding to that boundary line area from the multiple reference boundary lines; and determining the edges of the text area to be processed in the original image according to the determined multiple target boundary lines.
- processing the original image to obtain a line drawing of the gray contour in the original image includes: processing the original image by an edge detection algorithm based on OpenCV to obtain a line drawing of the gray contour in the original image .
- merging similar lines in the line drawing to obtain multiple initial merged lines includes: obtaining long lines in the line drawing, where a long line is a line whose length exceeds the first preset threshold; obtaining multiple groups of first-type lines from the long lines, wherein a group of first-type lines includes at least two successively adjacent long lines, and the angle between any two adjacent long lines is less than the second preset threshold; and, for each group of first-type lines, sequentially merging each long line in that group to obtain an initial merged line.
- the boundary matrix is determined in the following way: the multiple initial merged lines and the unmerged long lines are redrawn, the position information of the pixels in all redrawn lines is mapped onto a matrix covering the entire original image, the values at the positions of these lines' pixels are set to the first value, and the values at positions other than these lines are set to the second value, thereby forming the boundary matrix.
- merging similar lines among the multiple initial merged lines to obtain a target line includes: obtaining multiple groups of second-type lines from the multiple initial merged lines, where a group of second-type lines includes at least two adjacent initial merged lines, and the angle between any two adjacent initial merged lines is less than the third preset threshold; for each group of second-type lines, each initial merged line in that group is merged in turn to obtain a target line.
- the first preset threshold may be 2 pixels in length, and the second preset threshold and the third preset threshold may be 15 degrees. It should be noted that the first preset threshold, the second preset threshold, and the third preset threshold can be set according to actual application requirements.
- multiple reference boundary lines are determined from the multiple target lines as follows: for each target line, the target line is extended and a line matrix is determined according to the extended target line; the line matrix is then compared with the boundary matrix, and the number of pixels on the extended target line that belong to the boundary matrix is calculated as the score of that target line. That is, comparing the line matrix with the boundary matrix determines how many pixels fall into the boundary matrix, i.e., how many positions at the same location in the two matrices share the same first value (for example, 255), which yields the score. The line matrix and the boundary matrix have the same size. According to the scores of the target lines, multiple reference boundary lines are determined among the multiple target lines. It should be noted that several target lines may share the best score; therefore, according to the score of each target line, the multiple best-scoring target lines are determined from the multiple target lines as the reference boundary lines.
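The scoring comparison above reduces to counting positions where both same-sized matrices hold the first value; a sketch with small illustrative matrices (sizes and pixel positions are made up):

```python
import numpy as np

h, w = 50, 50

# Boundary matrix: pixels of the redrawn merged lines set to the first value (255).
boundary_matrix = np.zeros((h, w), dtype=np.uint8)
boundary_matrix[10, 5:45] = 255

# Line matrix: an extended candidate target line, redrawn the same way.
line_matrix = np.zeros((h, w), dtype=np.uint8)
line_matrix[10, 0:50] = 255

# Score: number of positions where both matrices hold the first value.
score = int(np.count_nonzero((line_matrix == 255) & (boundary_matrix == 255)))
```

Here the extended line overlaps the boundary pixels over 40 positions, so its score is 40; the best-scoring lines become the reference boundary lines.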
- the line matrix is determined in the following way: the extended target line (or straight line) is redrawn, the position information of the pixels in the redrawn line is mapped onto a matrix covering the entire original image, the values at the positions of the line's pixels are set to the first value, and the values at positions other than the line are set to the second value, thereby forming the line matrix.
- determining the target boundary line corresponding to a boundary line area from the multiple reference boundary lines includes: calculating the slope of each reference boundary line; for each boundary line area, using the Hough transform to convert the boundary line area into multiple straight lines and calculating the average slope of these straight lines; and then judging whether there is a reference boundary line whose slope matches the average slope among the multiple reference boundary lines. If so, that reference boundary line is determined as the target boundary line corresponding to the boundary line area; if it is determined that there is no reference boundary line with a slope matching the average slope among the multiple reference boundary lines, then, for each straight line obtained by the conversion of the boundary line area, the line matrix formed by the straight line is compared with the boundary matrix, and the number of pixels on the line belonging to the boundary matrix is calculated as the score of the line; the line with the best score is determined as the target boundary line corresponding to the boundary line area. Here, the line matrix and the boundary matrix have the same size. It should be noted that if there are multiple straight lines with the best score, the first of them according to the sorting algorithm is used as the target boundary line.
- the boundary line region recognition model is a neural network-based model.
- the boundary line region recognition model can be established through machine learning training.
- the text area to be processed is determined by the multiple target boundary lines (for example, four target boundary lines); every two adjacent target boundary lines intersect to obtain an intersection point, and the multiple intersection points together with the multiple target boundary lines define the area where the text to be processed is located in the original image.
- the text area to be processed may be a text area surrounded by four target boundary lines.
- the four target boundary lines are all straight lines, and the four target boundary lines are respectively the first target boundary line 101A, the second target boundary line 101B, the third target boundary line 101C, and the fourth target boundary line 101D.
- the original image may also include a non-text area, for example, an area other than the area enclosed by the four border lines in FIG. 2A.
- performing normalization processing on the text area to be processed to obtain the input image includes: performing projection transformation on the text area to be processed to obtain a front view of the text area to be processed, and the front view is the input image.
- Projective transformation (Perspective Transformation) is the projection of an image onto a new viewing plane, and is also known as Projective Mapping.
- the true shape of the text to be processed has changed in the original image, that is, geometric distortion has occurred.
- the shape of the text to be processed (i.e., the form) was originally a rectangle, but the shape of the text to be processed in the original image has changed, becoming an irregular polygon.
- performing projection transformation on the text area to be processed in the original image can transform the text area to be processed from an irregular polygon into a rectangle or parallelogram, that is, correct the text area to be processed to remove the influence of geometric distortion and obtain the front view of the text to be processed in the original image.
- the projection transformation can process the pixels in the text area to be processed according to the space projection conversion coordinates to obtain the front view of the text to be processed, which will not be repeated here.
- the text area to be processed may not be normalized, and the text area to be processed may be directly cut from the original image to obtain a separate image of the text area to be processed.
- in this case, the image of the text area to be processed is the input image.
- the original image may be an image directly collected by the image acquisition device, or may be an image obtained after preprocessing the image directly collected by the image acquisition device.
- the original image can be a grayscale image or a color image.
- the method for removing handwritten content in the text image provided by the embodiment of the present invention may also include an operation of preprocessing the original image. Preprocessing can eliminate irrelevant information or noise in the original image, so that the original image can be processed better.
- the preprocessing may include, for example, processing such as scaling, cropping, gamma correction, image enhancement, or noise reduction filtering on the image directly collected by the image collection device.
- the original image can be used as the input image.
- the original image can be directly recognized to determine the handwritten content in the original image, and the handwritten content in the original image is then removed to obtain the output image; alternatively, the original image can be directly recognized to determine the handwritten content in the original image, the handwritten content in the original image is removed to obtain an intermediate output image, edge detection is then performed on the intermediate output image to determine the text area to be processed in the intermediate output image, and the text area to be processed is corrected to obtain the output image. That is, in some embodiments of the present invention, the handwritten content in the original image can be removed first to obtain the intermediate output image, and edge detection and normalization processing are then performed on the intermediate output image.
- step S11 the input image is recognized using an image segmentation model to obtain the initial handwritten pixels of the handwritten content.
- the image segmentation model refers to a model that performs region recognition (or division) on an input image.
- the image segmentation model may be implemented using machine learning technology (for example, convolutional neural network technology) and run on, for example, a general-purpose computing device or a dedicated computing device.
- the image segmentation model is a pre-trained model.
- the same function can also be achieved by applying other neural network models to the image segmentation model, including a deep convolutional neural network, a mask region-based convolutional neural network (Mask R-CNN), a deep residual network, an attention model, and the like; no particular restriction is imposed here.
- the U-Net model is an improved FCN (Fully Convolutional Network) structure. It follows the FCN idea for semantic image segmentation: convolutional layers and pooling layers are used for feature extraction, and deconvolutional (up-sampling) layers are then used to restore the image size.
- the U-Net model performs well for image segmentation. Deep learning is good at solving classification problems; applied to image segmentation, it essentially classifies each pixel in the image. Points of different categories are then marked with different channels, so that the characteristic information in the target area is classified and marked.
- the U-Net model can determine the initial handwritten pixels of the handwritten content in the input image.
- other neural network models such as Mask-RCNN can also be used to determine the initial handwritten pixels of the handwritten content.
- in step S12, blur processing is performed on the initial handwritten pixels to obtain a handwritten pixel mask area. When the input image is recognized by the image segmentation model, the obtained initial handwritten pixels may not cover all handwritten pixels; however, the missing handwritten pixels are generally adjacent to the initial handwritten pixels. Therefore, blur processing is performed on the initial handwritten pixels to expand the handwritten pixel area and obtain the handwritten pixel mask area.
- the handwritten pixel mask area basically contains all the handwritten pixels.
- Gaussian blurring can be performed on the initial handwritten pixels using the OpenCV Gaussian-filter function GaussianBlur to expand the initial handwritten pixel area, thereby obtaining the handwritten pixel mask area.
- Gaussian filtering performs a convolution of each point of the input array with a Gaussian filter template, and the results form the filtered output array. It is a weighted-averaging process over the image of the initial handwritten pixels: the value of each pixel is obtained by a weighted average of its own value and the values of other pixels in its neighborhood.
- after Gaussian blur processing, the handwritten pixel image becomes blurred, but its area is enlarged.
- any other blur processing technology can also be used to blur the initial handwritten pixels; no particular restriction is imposed here.
- in step S13, the handwritten content is determined according to the handwritten pixel mask area.
- according to the handwritten pixel mask area, combined with the initial handwritten pixels, substantially all handwritten pixels of the handwritten content are determined, thereby determining the handwritten content.
- in step S14, the handwritten content in the input image is removed to obtain an output image.
- the position of the handwritten pixel mask area in the input image can be determined, and non-handwritten pixels are then determined in the area of the input image corresponding to that position.
- according to the pixel values of the initial handwritten pixels, pixels whose values differ significantly are searched for in the area of the input image corresponding to the position of the handwritten pixel mask area, and such pixels are determined to be non-handwritten pixels. For example, a threshold for the pixel-value difference can be set, and a pixel in the area whose pixel-value difference falls outside the threshold range is determined to be a non-handwritten pixel.
- the OpenCV inpaint function can be used to remove the content of the handwritten pixel mask area.
- the OpenCV inpaint function repairs a selected area of an image from its neighborhood: the pixels in the area of the input image corresponding to the position of the handwritten pixel mask area are repaired using neighboring pixels, thereby removing the content of the handwritten pixel mask area from the input image and obtaining an intermediate output image.
- alternatively, the position of the handwritten pixel mask area in the input image can be determined, and non-handwritten pixels are then determined in the area of the input image corresponding to that position.
- as above, according to the pixel values of the initial handwritten pixels, pixels whose values differ significantly are searched for in the corresponding area and determined to be non-handwritten pixels, for example by setting a threshold for the pixel-value difference and treating pixels whose difference falls outside the threshold range as non-handwritten pixels.
- the handwritten content in the input image is removed according to the handwritten pixel mask area and the non-handwritten pixels within it, to obtain the output image. That is, the non-handwritten pixels are excluded from the handwritten pixel mask area and only the remaining pixels are removed; the non-handwritten pixels are retained and prevented from being removed by mistake, and the output image is finally obtained.
- the OpenCV inpaint function can be used to remove the content of the handwritten pixel mask area excluding the non-handwritten pixels.
- that is, the pixels in the area of the input image corresponding to the position of the handwritten pixel mask area, except for the non-handwritten pixels, are repaired using neighboring pixels, thereby removing the content of the handwritten pixel mask area from the input image.
- removing the handwritten content in the input image to obtain an output image may include: cutting the handwritten content out of the input image to obtain an intermediate output image; and binarizing the intermediate output image to obtain the output image.
- binarization sets the gray value of each pixel in the intermediate output image to 0 or 255, so that the entire intermediate output image shows a clear black-and-white effect. Binarization greatly reduces the amount of data in the intermediate output image, so that the outline of the target can be highlighted. It converts the intermediate output image into a grayscale image with obvious black-and-white contrast (that is, the output image); the converted image has less noise interference, which can effectively improve the recognition and printing of the content in the output image.
- the pixels in the area of the input image corresponding to the handwritten content are empty, that is, there are no pixels there.
- when the intermediate output image is binarized, the empty-pixel areas of the intermediate output image may be left unprocessed; alternatively, the empty-pixel areas may be filled with a gray value of 255. In this way, the processed text image forms a whole, without unsightly void areas left by the handwritten content.
- the final output image is thus convenient for the user to print in paper form.
- for example, the output image can be printed in paper form for other users to fill in.
- the method of binarization processing can be a threshold method.
- the threshold method includes: setting a binarization threshold and comparing the pixel value of each pixel in the intermediate output image with that threshold. If the pixel value of a pixel in the intermediate output image is greater than or equal to the binarization threshold, the pixel value is set to a gray level of 255; if it is less than the binarization threshold, the pixel value is set to a gray level of 0. In this way, the intermediate output image is binarized.
- methods for selecting the binarization threshold include the bimodal method, the P-parameter method, Otsu's method (the maximum between-class variance method), the maximum entropy method, the iterative method, and so on.
- performing binarization processing on the intermediate output image may include: obtaining the intermediate output image; performing grayscale processing on the intermediate output image to obtain a grayscale image; binarizing the grayscale image to obtain a binarized image of the intermediate output image; using the binarized image as the guide image, performing guided filtering on the grayscale image to obtain a filtered image; determining, according to a second threshold, the high-value pixels of the filtered image whose gray values are greater than the second threshold; expanding the gray values of the high-value pixels according to a preset expansion coefficient to obtain an expanded image; sharpening the expanded image to obtain a clear image; and adjusting the contrast of the clear image to obtain the output image.
- gray-scale processing methods include component method, maximum value method, average method, and weighted average method.
- the preset expansion factor is 1.2-1.5, for example, 1.3.
- the gray value of each high-value pixel is multiplied by a preset expansion coefficient to expand the gray value of the high-value pixel, thereby obtaining an expanded image with more obvious black and white contrast.
- the second threshold is the sum of the mean gray value of the filtered image and the standard deviation of the gray value.
- sharpening the expanded image to obtain a clear image includes: blurring the expanded image using a Gaussian filter to obtain a blurred image; and mixing the blurred image and the expanded image in proportion according to preset mixing coefficients to obtain the clear image.
- f1(i,j) is the gray value of the pixel at (i,j) in the expanded image, f2(i,j) is the gray value of the pixel at (i,j) in the blurred image, and f3(i,j) is the gray value of the pixel at (i,j) in the clear image; k1 is the preset mixing coefficient of the expanded image, and k2 is the preset mixing coefficient of the blurred image. The mixing is: f3(i,j) = k1·f1(i,j) + k2·f2(i,j).
- the preset mixing coefficient of the expanded image is 1.5, and the preset mixing coefficient of the blurred image is -0.5.
- adjusting the contrast of the clear image includes: adjusting the gray value of each pixel of the clear image according to the average gray value of the clear image.
- for example, the gray value of each pixel of the clear image can be adjusted by the formula f'(i,j) = f(i,j) + t·(f(i,j) − g), where f'(i,j) is the gray value of the pixel at (i,j) in the enhanced image, g is the average gray value of the clear image, f(i,j) is the gray value of the pixel at (i,j) in the clear image, and t is the intensity value.
- the intensity value may be 0.1-0.5, for example, 0.2. In practical applications, the intensity value can be selected according to the black-and-white enhancement effect to be achieved.
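The exact contrast formula was not preserved in the text above; the sketch below assumes a common form consistent with the surrounding definitions, pushing each gray value away from the image mean by the intensity t. This is an assumption, not necessarily the patented formula:

```python
import numpy as np

def adjust_contrast(clear_image, t=0.2):
    """Assumed form: f'(i,j) = f(i,j) + t * (f(i,j) - g), where g is the
    average gray value of the clear image; results rounded and clipped to [0, 255]."""
    f = clear_image.astype(np.float64)
    g = f.mean()
    out = f + t * (f - g)
    # Round before casting so values like 89.9999... do not truncate to 89.
    return np.clip(np.rint(out), 0, 255).astype(np.uint8)

# Mean is 150, so with t = 0.2: 100 -> 90 and 200 -> 210.
img = np.array([[100, 200], [200, 100]], dtype=np.uint8)
enhanced = adjust_contrast(img)
```

Pixels darker than the mean get darker and pixels lighter than the mean get lighter, which matches the black-and-white enhancement goal described above.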
- in some embodiments, step S14 includes: obtaining replacement pixels; and replacing the pixels of the handwritten content with the replacement pixels, so as to remove the handwritten content from the input image and obtain the output image.
- the replacement pixels can be adjacent pixels outside the handwritten pixel mask area, that is, the handwritten pixels to be replaced are replaced with adjacent pixels outside the handwritten pixel mask area.
- region recognition can also be used for the handwritten-pixel replacement processing.
- the replacement pixel can be any pixel in the handwriting area other than the pixels of the handwritten content; alternatively, the replacement pixel is the average (for example, the geometric mean) of the pixel values of all pixels in the handwriting area other than the pixels of the handwritten content; alternatively, the replacement pixel value may be a fixed value, for example, a gray value of 255.
- an image segmentation model such as the U-Net model can be used to directly extract any pixel in the handwriting area other than the handwritten content pixels to obtain the replacement pixel; alternatively, an image segmentation model such as the U-Net model can be used to extract all pixels in the handwriting area other than the pixels of the handwritten content, and the replacement pixel value is then obtained from the pixel values of all those pixels.
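One of the strategies above (averaging all non-handwritten pixels in the handwriting area) can be sketched as follows; an arithmetic mean is shown here, and the function names are illustrative:

```python
import numpy as np

def replacement_value(handwriting_region, handwritten_mask):
    """Average gray value of pixels in the handwriting area that are not
    handwritten-content pixels; falls back to white if none are available."""
    non_handwritten = handwriting_region[handwritten_mask == 0]
    if non_handwritten.size == 0:
        return 255  # the fixed-value strategy mentioned above
    return int(round(float(non_handwritten.mean())))

def replace_handwriting(image, handwritten_mask, value):
    """Overwrite every pixel of the handwritten content with the replacement value."""
    out = image.copy()
    out[handwritten_mask > 0] = value
    return out

# Background 240 with a dark stroke: the stroke is replaced by the background tone.
region = np.full((16, 16), 240, dtype=np.uint8)
mask = np.zeros_like(region)
region[5:8, 2:14] = 0
mask[5:8, 2:14] = 255
value = replacement_value(region, mask)
cleaned = replace_handwriting(region, mask, value)
```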
- replacing the pixels of the handwritten content with the replacement pixels to remove the handwritten content from the input image and obtain the output image may include: replacing the pixels of the handwritten content with the replacement pixels to obtain an intermediate output image; and binarizing the intermediate output image to obtain the output image.
- for the region recognition of the region recognition model and the binarization processing, reference may be made to the related description above; repeated details are omitted.
- thus, an output image as shown in FIG. 2B can be obtained; the output image is a binarized image. As shown in FIG. 2B, all handwritten content has been removed from the output image, thereby obtaining a blank form without user information.
- the model (for example, an arbitrary model such as a region recognition model, an image segmentation model, etc.) is not just a mathematical model, but a module that can receive input data, perform data processing, and output processing results.
- the module can be a software module, a hardware module (for example, a hardware neural network) or a combination of software and hardware.
- the region recognition model and/or image segmentation model includes codes and programs stored in a memory; the processor can execute the codes and programs to implement some or all of the functions of the region recognition model and/or the image segmentation model described above.
- the region recognition model and/or the image segmentation model may include one circuit board or a combination of multiple circuit boards for realizing the functions described above.
- the circuit board or combination of circuit boards may include: (1) one or more processors; (2) one or more non-transitory computer-readable memories; and (3) firmware stored in the memories and executable by the processors.
- the method for removing handwritten content in the text image further includes a training phase.
- the training phase includes the process of training the region recognition model and the image segmentation model. It should be noted that the region recognition model and the image segmentation model can be trained separately, or the region recognition model and the image segmentation model can be trained at the same time.
- the region recognition model may be obtained by training a region recognition model to be trained with first sample images marked with a text printing area (for example, at least one marked text printing area) and a handwriting area (for example, at least one marked handwriting area).
- the training process of the region recognition model to be trained may include: in the training phase, training the region recognition model to be trained using multiple first sample images marked with the text printing region and the handwritten region to obtain the region recognition model.
- using the multiple first sample images to train the region recognition model to be trained includes: obtaining the current first sample image from the multiple first sample images; processing the current first sample image with the region recognition model to be trained to obtain a training text printing area and a training handwriting area; calculating a first loss value of the region recognition model to be trained with a first loss function, according to the text printing area and handwriting area marked in the current first sample image and the training text printing area and training handwriting area; and correcting the parameters of the region recognition model to be trained according to the first loss value.
- when the first loss function of the region recognition model to be trained meets a first predetermined condition, the trained region recognition model is obtained; when the first loss function does not meet the first predetermined condition, first sample images continue to be input to repeat the above training process.
- the above-mentioned first predetermined condition corresponds to the convergence of the loss of the first loss function (that is, the first loss value is no longer significantly reduced) when a certain number of first sample images are input.
- the above-mentioned first predetermined condition is that the number of training times or the training period reaches a predetermined number (for example, the predetermined number may be millions).
- the image segmentation model may be obtained by training an image segmentation model to be trained with second sample images marked with the pixels of the handwritten content.
- the second sample image can be enlarged to accurately label all the handwritten content pixels.
- machine learning is performed on handwriting features (for example, pixel gray-level features, font features, and the like) to build the image segmentation model.
- the training process of the image segmentation model to be trained may include: in the training phase, training the image segmentation model to be trained using multiple second sample images marked with pixels of the handwritten content to obtain the image segmentation model.
- using the multiple second sample images to train the image segmentation model to be trained includes: obtaining the current second sample image from the multiple second sample images; processing the current second sample image with the image segmentation model to be trained to obtain training handwritten content pixels; calculating a second loss value of the image segmentation model to be trained with a second loss function, according to the handwritten content pixels marked in the current second sample image and the training handwritten content pixels; and correcting the parameters of the image segmentation model to be trained according to the second loss value.
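The text does not fix the form of the second loss function. A per-pixel binary cross-entropy, a common choice for segmenting handwritten versus non-handwritten pixels, could look like this (an illustrative assumption, not the patented loss):

```python
import numpy as np

def bce_loss(pred, target, eps=1e-7):
    """Mean per-pixel binary cross-entropy between predicted handwritten-pixel
    probabilities and the 0/1 labels marked in the sample image."""
    p = np.clip(pred, eps, 1 - eps)  # avoid log(0)
    return float(-(target * np.log(p) + (1 - target) * np.log(1 - p)).mean())

# The loss shrinks as predictions approach the marked labels.
labels = np.array([1.0, 0.0, 1.0, 0.0])
coarse = bce_loss(np.array([0.6, 0.4, 0.6, 0.4]), labels)
better = bce_loss(np.array([0.9, 0.1, 0.9, 0.1]), labels)
```

During training, the parameters would be corrected in the direction that reduces this loss value, until it satisfies the predetermined convergence condition.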
- when the second loss function of the image segmentation model to be trained meets a second predetermined condition, the trained image segmentation model is obtained; when the second loss function does not meet the second predetermined condition, second sample images continue to be input to repeat the above training process.
- the above-mentioned second predetermined condition corresponds to the convergence of the loss of the second loss function (that is, the second loss value is no longer significantly reduced) when a certain number of second sample images are input.
- the above-mentioned second predetermined condition is that the number of training times or the training period reaches a predetermined number (for example, the predetermined number may be millions).
- the multiple first training sample images and the multiple second training sample images may be the same or different.
- FIG. 3 is a schematic block diagram of an apparatus for removing handwritten content from a text image provided by at least one embodiment of the present invention.
- the device 300 for removing handwritten content in a text image includes a processor 302 and a memory 301.
- the components of the apparatus 300 for removing handwritten content in a text image shown in FIG. 3 are only exemplary and not restrictive; according to actual application requirements, the apparatus 300 may also have other components.
- the memory 301 is used for non-transitory storage of computer-readable instructions; the processor 302 is used for running the computer-readable instructions, and when run by the processor 302, the computer-readable instructions execute the method for removing handwritten content from a text image described above.
- the apparatus 300 for removing handwritten content from a text image provided by the embodiment of the present invention may be used to implement the method for removing handwritten content from a text image provided by the embodiment of the present invention.
- the apparatus 300 for removing handwritten content from a text image may be configured on an electronic device.
- the electronic device may be a personal computer, a mobile terminal, etc.
- the mobile terminal may be a hardware device with various operating systems such as a mobile phone or a tablet computer.
- the device 300 for removing handwritten content in a text image may further include an image acquiring component 303.
- the image obtaining part 303 is used to obtain a text image, for example, to obtain an image of a paper text.
- the memory 301 may also be used to store text images; the processor 302 is also used to read and process the text images to obtain input images.
- the text image may be the original image described in the embodiment of the method for removing handwritten content in the text image.
- the image acquisition component 303 is the image acquisition device described in the embodiment of the method for removing handwritten content in the text image.
- the image acquisition component 303 may be a camera of a smartphone, a camera of a tablet computer, a camera of a personal computer, a lens of a digital camera, a webcam, or any other device that can be used for image capture.
- the image acquisition component 303, the memory 301, and the processor 302 may be physically integrated in the same electronic device, and the image acquisition component 303 may be a camera configured on the electronic device.
- the memory 301 and the processor 302 receive the image sent from the image acquisition part 303 via the internal bus.
- the image acquisition component 303 and the memory 301/processor 302 may also be located separately: the memory 301 and the processor 302 may be integrated in a first user's electronic device (for example, the first user's computer or mobile phone), while the image acquisition component 303 is integrated in a second user's electronic device (the first user and the second user being different users). The first user's electronic device and the second user's electronic device may be located separately and may communicate with each other in a wired or wireless manner. That is, the second user's electronic device may send the original image to the first user's electronic device in a wired or wireless manner, and the first user's electronic device receives the original image and performs subsequent processing on it.
- the memory 301 and the processor 302 may also be integrated in a cloud server, and the cloud server receives the original image and processes the original image.
- the device 300 for removing handwritten content in a text image may further include an output device, and the output device is used to output the output image.
- the output device may include a display (for example, an organic light emitting diode display, a liquid crystal display), a projector, etc., and the display and the projector may be used to display the output image.
- the output device may also include a printer, and the printer is used to print the output image.
- the network may include a wireless network, a wired network, and/or any combination of a wireless network and a wired network.
- the network may include a local area network, the Internet, a telecommunications network, the Internet of Things (Internet of Things) based on the Internet and/or a telecommunications network, and/or any combination of the above networks, and the like.
- the wired network may, for example, use twisted pair, coaxial cable, or optical fiber transmission for communication, and the wireless network may use, for example, a 3G/4G/5G mobile communication network, Bluetooth, Zigbee, or WiFi.
- the invention does not limit the type and function of the network here.
- the processor 302 may control other components in the handwriting content removal apparatus 300 in the text image to perform desired functions.
- the processor 302 may be a central processing unit (CPU), a tensor processing unit (TPU), a graphics processing unit (GPU), or another device with data processing capability and/or program execution capability.
- the central processing unit (CPU) can be an X86 or ARM architecture.
- the GPU can be directly integrated on the motherboard alone or built into the north bridge chip of the motherboard.
- the GPU can also be built into the central processing unit (CPU).
- the memory 301 may include any combination of one or more computer program products, and the computer program products may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory.
- Volatile memory may include random access memory (RAM) and/or cache memory (cache), for example.
- Non-volatile memory may include, for example, read only memory (ROM), hard disk, erasable programmable read only memory (EPROM), portable compact disk read only memory (CD-ROM), USB memory, flash memory, etc.
- One or more computer-readable instructions may be stored on the computer-readable storage medium, and the processor 302 may run the computer-readable instructions to implement various functions of the apparatus 300 for removing handwritten content in a text image.
- Various application programs and various data can also be stored in the storage medium.
- FIG. 4 is a schematic diagram of a storage medium provided by at least one embodiment of the present invention.
- one or more computer-readable instructions 501 may be non-transitory stored on the storage medium 500.
- the computer-readable instruction 501 is executed by a computer, one or more steps in the method for removing handwritten content from a text image described above can be executed.
- the storage medium 500 may be applied to the apparatus 300 for removing handwritten content from a text image, for example, it may include the memory 301 in the apparatus 300 for removing handwritten content from a text image.
- Fig. 5 is a schematic diagram of a hardware environment provided by at least one embodiment of the present invention.
- the device for removing handwritten content in a text image provided by an embodiment of the present invention can be applied to an Internet system.
- the computer system provided in FIG. 5 can be used to implement the apparatus for removing handwritten content in a text image involved in the present invention.
- Such computer systems can include personal computers, notebook computers, tablet computers, mobile phones and any smart devices.
- the specific system in this embodiment uses a functional block diagram to explain a hardware platform including a user interface.
- Such a computer system may include a general purpose computer device, or a special purpose computer device. Both types of computer equipment can be used to implement the apparatus for removing handwritten content in a text image in this embodiment.
- the computer system can implement any component, described herein, needed to implement the method for removing handwritten content from text images.
- the computer system can be realized by a computer device through its hardware, software programs, firmware, and combinations thereof.
- the computer system may include a communication port 250, which is connected to a network that realizes data communication.
- the communication port 250 may communicate with the image acquisition component 403 described above.
- the computer system may also include a processor group 220 (ie, the processor described above) for executing program instructions.
- the processor group 220 may be composed of at least one processor (for example, a CPU).
- the computer system may include an internal communication bus 210.
- the computer system may include different forms of program storage units and data storage units (ie, the memory or storage medium described above), such as a hard disk 270, a read only memory (ROM) 230, and a random access memory (RAM) 240, which can be used for storage Various data files used for computer processing and/or communication, and possible program instructions executed by the processor group 220.
- the computer system may also include an input/output component 260, which may support input/output data flow between the computer system and other components (for example, the user interface 280, which may be the display described above).
- the computer system can also send and receive information and data through the communication port 250.
- the above-mentioned computer system may be used to form a server in an Internet communication system.
- the server of the Internet communication system can be a server hardware device or a server group. Each server in a server group can be connected through a wired or wireless network.
- a server group can be centralized, such as a data center.
- a server group can also be distributed, such as a distributed system.
- each block in the block diagrams and/or flowcharts of the present invention can be implemented by a dedicated hardware-based system that performs the specified functions or actions, or by a combination of dedicated hardware and computer program instructions. It is well known to those skilled in the art that implementation through hardware, implementation through software, and implementation through a combination of software and hardware are all equivalent.
Claims (13)
- A method for removing handwritten content from a text image, comprising: acquiring an input image of a text page to be processed, wherein the input image includes a handwritten area and the handwritten area includes handwritten content; recognizing the input image by using an image segmentation model to obtain initial handwritten pixels of the handwritten content; blurring the initial handwritten pixels to obtain a handwritten-pixel mask area; determining the handwritten content in the handwritten area according to the handwritten-pixel mask area; and removing the handwritten content from the input image to obtain an output image.
- The method for removing handwritten content from a text image according to claim 1, wherein removing the handwritten content from the input image to obtain an output image comprises: determining, in the input image, non-handwritten pixels in the handwritten-pixel mask area according to pixel values of the initial handwritten pixels and the position of the handwritten-pixel mask area; removing the content of the handwritten-pixel mask area from the input image to obtain an intermediate output image; and restoring the non-handwritten pixels of the handwritten-pixel mask area in the intermediate output image to obtain the output image.
- The method for removing handwritten content from a text image according to claim 1, wherein removing the handwritten content from the input image to obtain an output image comprises: determining, in the input image, non-handwritten pixels in the handwritten-pixel mask area according to pixel values of the initial handwritten pixels and the position of the handwritten-pixel mask area; and removing the handwritten content from the input image according to the non-handwritten pixels in the handwritten-pixel mask area and the handwritten-pixel mask area, to obtain the output image.
- The method for removing handwritten content from a text image according to claim 1, wherein removing the handwritten content from the input image to obtain an output image comprises: cutting the handwritten content out of the input image to obtain an intermediate output image; and binarizing the intermediate output image to obtain the output image.
- The method for removing handwritten content from a text image according to claim 1, wherein removing the handwritten content from the input image to obtain the output image comprises: acquiring a replacement pixel; and replacing the pixels of the handwritten content with the replacement pixel, so as to remove the handwritten content from the input image and obtain the output image.
- The method for removing handwritten content from a text image according to claim 5, wherein replacing the pixels of the handwritten content with the replacement pixel to remove the handwritten content from the input image and obtain the output image comprises: replacing the pixels of the handwritten content with the replacement pixel to remove the handwritten content from the input image and obtain an intermediate output image; and binarizing the intermediate output image to obtain the output image.
- The method for removing handwritten content from a text image according to claim 5, wherein the replacement pixel is obtained from the pixels of the handwritten content by an image inpainting algorithm based on pixel-neighborhood calculation.
- The method for removing handwritten content from a text image according to claim 5, wherein acquiring the replacement pixel further comprises recognizing the input image by using an area recognition model to obtain the handwritten area, wherein the replacement pixel is any one pixel in the handwritten area other than the pixels of the handwritten content; or the replacement pixel is the average of the pixel values of all pixels in the handwritten area other than the pixels of the handwritten content.
- The method for removing handwritten content from a text image according to any one of claims 1-8, wherein acquiring the input image of the text page to be processed comprises: acquiring an original image of the text page to be processed, wherein the original image includes a text area to be processed; performing edge detection on the original image to determine the text area to be processed in the original image; and rectifying the text area to be processed to obtain the input image.
- The method for removing handwritten content from a text image according to claim 1, wherein the image segmentation model is a pre-trained U-Net model for segmenting the input image.
- The method for removing handwritten content from a text image according to claim 1, wherein the initial handwritten pixels are blurred by a Gaussian filter function, expanding the area of the initial handwritten pixels to obtain the handwritten-pixel mask area.
- A device for removing handwritten content from a text image, comprising: a memory for non-transitory storage of computer-readable instructions; and a processor for executing the computer-readable instructions, wherein the computer-readable instructions, when executed by the processor, perform the method for removing handwritten content from a text image according to any one of claims 1-11.
- A storage medium that non-transitorily stores computer-readable instructions, wherein the computer-readable instructions, when executed by a computer, can perform the method for removing handwritten content from a text image according to any one of claims 1-11.
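Claim 11 describes blurring the initial handwritten pixels with a Gaussian filter function so that the mask area grows beyond the originally detected pixels. A minimal pure-Python sketch of that idea, assuming a separable Gaussian kernel and a keep-any-weight threshold (the patent does not disclose kernel size, sigma, or threshold; these are illustrative choices):

```python
import math

def gaussian_kernel(sigma, radius):
    """1-D Gaussian kernel, normalized to sum to 1."""
    vals = [math.exp(-(i * i) / (2 * sigma * sigma)) for i in range(-radius, radius + 1)]
    s = sum(vals)
    return [v / s for v in vals]

def blur_and_expand_mask(mask, sigma=1.0, radius=2):
    """Blur a binary mask with a separable Gaussian, then keep every pixel
    that received any blur weight, expanding the original mask region."""
    h, w = len(mask), len(mask[0])
    k = gaussian_kernel(sigma, radius)
    # Horizontal pass.
    tmp = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for i, kv in enumerate(k):
                xx = x + i - radius
                if 0 <= xx < w:
                    acc += kv * mask[y][xx]
            tmp[y][x] = acc
    # Vertical pass, then threshold: any nonzero response joins the mask.
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for i, kv in enumerate(k):
                yy = y + i - radius
                if 0 <= yy < h:
                    acc += kv * tmp[yy][x]
            out[y][x] = 1 if acc > 1e-6 else 0
    return out

# A single detected handwritten pixel expands into a surrounding mask area.
mask = [[0] * 5 for _ in range(5)]
mask[2][2] = 1
expanded = blur_and_expand_mask(mask)
```

With radius 2, the single seed pixel grows into the full 5x5 neighborhood, which is the point of the step: the mask tolerates segmentation near-misses around stroke edges.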
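Claims 5 and 8 remove the handwriting by substituting a replacement pixel, one claimed variant being the average pixel value of all non-handwritten pixels in the handwritten area. A minimal sketch of that averaging variant, on an invented grayscale region (illustrative only, not the patent's actual implementation):

```python
def remove_handwriting(region, hand_mask):
    """Replace pixels flagged as handwritten (mask value 1) with the mean
    value of all non-handwritten pixels in the same region."""
    pairs = [(v, m)
             for row_v, row_m in zip(region, hand_mask)
             for v, m in zip(row_v, row_m)]
    background = [v for v, m in pairs if not m]
    fill = sum(background) / len(background)  # average background value
    return [[fill if m else v for v, m in zip(rv, rm)]
            for rv, rm in zip(region, hand_mask)]

# Light paper (~250) with two dark handwritten pixels flagged in the mask.
region = [[250, 250, 10],
          [250, 12, 250]]
mask = [[0, 0, 1],
        [0, 1, 0]]
clean = remove_handwriting(region, mask)
```

The dark strokes are overwritten with the background average, so the printed content and page tone survive while the handwriting disappears.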
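Claims 4 and 6 binarize the intermediate output image to produce the final output. The claims do not name a thresholding method; Otsu's method is one common choice, sketched here in pure Python as an assumed implementation:

```python
def otsu_threshold(pixels):
    """Otsu's method: pick the threshold that maximizes the
    between-class variance of the two resulting pixel classes."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    total_sum = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0, sum0 = 0, 0.0
    for t in range(256):
        w0 += hist[t]          # pixels at or below t
        if w0 == 0:
            continue
        w1 = total - w0        # pixels above t
        if w1 == 0:
            break
        sum0 += t * hist[t]
        m0 = sum0 / w0
        m1 = (total_sum - sum0) / w1
        var = w0 * w1 * (m0 - m1) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t

def binarize(pixels):
    """Map every pixel to pure black (0) or white (255)."""
    t = otsu_threshold(pixels)
    return [255 if p > t else 0 for p in pixels]

# Dark print (~20) against light paper (~205) separates cleanly.
binary = binarize([20, 22, 25, 200, 205, 210])
```

Binarization normalizes the cleaned page to crisp black-on-white, hiding any residual tone differences left by the removal step.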
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR1020227037762A KR20220160660A (ko) | 2020-04-10 | 2021-02-09 | Method and device for removing handwritten content from text image, and storage medium |
JP2022560485A JP2023523152A (ja) | 2020-04-10 | 2021-02-09 | Method and device for removing handwritten content from text image, and storage medium |
US17/915,488 US20230222631A1 (en) | 2020-04-10 | 2021-02-09 | Method and device for removing handwritten content from text image, and storage medium |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010278143.4A CN111488881A (zh) | 2020-04-10 | 2020-04-10 | Method and device for removing handwritten content from text image, and storage medium |
CN202010278143.4 | 2020-04-10 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2021203832A1 true WO2021203832A1 (zh) | 2021-10-14 |
Family
ID=71794780
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2021/076250 WO2021203832A1 (zh) | 2021-02-09 | Method and device for removing handwritten content from text image, and storage medium |
Country Status (5)
Country | Link |
---|---|
US (1) | US20230222631A1 (zh) |
JP (1) | JP2023523152A (zh) |
KR (1) | KR20220160660A (zh) |
CN (1) | CN111488881A (zh) |
WO (1) | WO2021203832A1 (zh) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114048822A (zh) * | 2021-11-19 | 2022-02-15 | 辽宁工程技术大学 | Attention-mechanism feature-fusion image segmentation method |
CN117746214A (zh) * | 2024-02-07 | 2024-03-22 | 青岛海尔科技有限公司 | Text adjustment method and device for images generated by large models, and storage medium |
CN117746214B (zh) * | 2024-02-07 | 2024-05-24 | 青岛海尔科技有限公司 | Text adjustment method and device for images generated by large models, and storage medium |
Families Citing this family (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111275139B (zh) * | 2020-01-21 | 2024-02-23 | 杭州大拿科技股份有限公司 | Handwritten content removal method, handwritten content removal device, and storage medium |
CN111488881A (zh) * | 2020-04-10 | 2020-08-04 | 杭州睿琪软件有限公司 | Method and device for removing handwritten content from text image, and storage medium |
CN112070708B (zh) * | 2020-08-21 | 2024-03-08 | 杭州睿琪软件有限公司 | Image processing method, image processing device, electronic device, and storage medium |
CN112150394B (zh) * | 2020-10-12 | 2024-02-20 | 杭州睿琪软件有限公司 | Image processing method and device, electronic device, and storage medium |
CN112150365B (zh) * | 2020-10-15 | 2023-02-21 | 江西威力固智能设备有限公司 | Expansion and shrinkage processing method for jet-printed images, and jet-printing device |
CN113781356A (zh) * | 2021-09-18 | 2021-12-10 | 北京世纪好未来教育科技有限公司 | Training method for an image denoising model, image denoising method, device, and equipment |
CN114283156B (zh) * | 2021-12-02 | 2024-03-05 | 珠海移科智能科技有限公司 | Method and device for removing color and handwriting from document images |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102521516A (zh) * | 2011-12-20 | 2012-06-27 | 北京商纳科技有限公司 | Method and system for automatically generating a notebook of incorrectly answered questions |
CN109254711A (zh) * | 2018-09-29 | 2019-01-22 | 联想(北京)有限公司 | Information processing method and electronic device |
US20190066273A1 (en) * | 2013-07-24 | 2019-02-28 | Georgetown University | Enhancing the legibility of images using monochromatic light sources |
CN111275139A (zh) * | 2020-01-21 | 2020-06-12 | 杭州大拿科技股份有限公司 | Handwritten content removal method, handwritten content removal device, and storage medium |
CN111488881A (zh) * | 2020-04-10 | 2020-08-04 | 杭州睿琪软件有限公司 | Method and device for removing handwritten content from text image, and storage medium |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR20080055119A (ko) * | 2006-12-14 | 2008-06-19 | 삼성전자주식회사 | Image forming apparatus and control method thereof |
CN105898322A (zh) * | 2015-07-24 | 2016-08-24 | 乐视云计算有限公司 | Video watermark removal method and device |
- 2020
  - 2020-04-10 CN CN202010278143.4A patent/CN111488881A/zh active Pending
- 2021
  - 2021-02-09 WO PCT/CN2021/076250 patent/WO2021203832A1/zh active Application Filing
  - 2021-02-09 KR KR1020227037762A patent/KR20220160660A/ko unknown
  - 2021-02-09 US US17/915,488 patent/US20230222631A1/en active Pending
  - 2021-02-09 JP JP2022560485A patent/JP2023523152A/ja active Pending
Also Published As
Publication number | Publication date |
---|---|
JP2023523152A (ja) | 2023-06-02 |
KR20220160660A (ko) | 2022-12-06 |
US20230222631A1 (en) | 2023-07-13 |
CN111488881A (zh) | 2020-08-04 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2021203832A1 (zh) | Method and device for removing handwritten content from text image, and storage medium | |
WO2021147631A1 (zh) | Handwritten content removal method, handwritten content removal device, and storage medium | |
WO2021233266A1 (zh) | Edge detection method and device, electronic device, and storage medium | |
JP5972468B2 (ja) | 画像からのラベルの検出 | |
US11106891B2 (en) | Automated signature extraction and verification | |
WO2023284502A1 (zh) | Image processing method, apparatus, device, and storage medium | |
US9330331B2 (en) | Systems and methods for offline character recognition | |
US9235757B1 (en) | Fast text detection | |
US10423851B2 (en) | Method, apparatus, and computer-readable medium for processing an image with horizontal and vertical text | |
US10169650B1 (en) | Identification of emphasized text in electronic documents | |
CN113033558B (zh) | Text detection method and device for natural scenes, and storage medium | |
CN114283156B (zh) | Method and device for removing color and handwriting from document images | |
Bukhari et al. | The IUPR dataset of camera-captured document images | |
Susan et al. | Text area segmentation from document images by novel adaptive thresholding and template matching using texture cues | |
WO2022002002A1 (zh) | Image processing method, image processing device, electronic device, and storage medium | |
CN114581928A (zh) | Table recognition method and system | |
JP7364639B2 (ja) | デジタル化された筆記の処理 | |
Cai et al. | Bank card and ID card number recognition in Android financial APP | |
WO2019071476A1 (zh) | Express information entry method and entry system based on intelligent terminal | |
Konya et al. | Adaptive methods for robust document image understanding | |
Hengaju et al. | Improving the Recognition Accuracy of Tesseract-OCR Engine on Nepali Text Images via Preprocessing | |
Uyun et al. | Skew Correction and Image Cleaning Handwriting Recognition Using a Convolutional Neural Network | |
Mahajan et al. | Improving Classification of Scanned Document Images using a Novel Combination of Pre-Processing Techniques | |
US20240144711A1 (en) | Reliable determination of field values in documents with removal of static field elements | |
Tamirat | Customers Identity Card Data Detection and Recognition Using Image Processing |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 21785421 Country of ref document: EP Kind code of ref document: A1 |
|
ENP | Entry into the national phase |
Ref document number: 2022560485 Country of ref document: JP Kind code of ref document: A |
|
ENP | Entry into the national phase |
Ref document number: 20227037762 Country of ref document: KR Kind code of ref document: A |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 21785421 Country of ref document: EP Kind code of ref document: A1 |
|