CN112784850A - Method and device for removing penetrating print of notes - Google Patents

Method and device for removing penetrating print of notes

Info

Publication number
CN112784850A
CN112784850A CN201911065009.XA CN201911065009A CN112784850A CN 112784850 A CN112784850 A CN 112784850A CN 201911065009 A CN201911065009 A CN 201911065009A CN 112784850 A CN112784850 A CN 112784850A
Authority
CN
China
Prior art keywords
text image
pixel
clustering
color category
pixel points
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911065009.XA
Other languages
Chinese (zh)
Inventor
陈晓念
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Kingsoft Office Software Inc
Zhuhai Kingsoft Office Software Co Ltd
Original Assignee
Beijing Kingsoft Office Software Inc
Zhuhai Kingsoft Office Software Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Kingsoft Office Software Inc, Zhuhai Kingsoft Office Software Co Ltd
Priority to CN201911065009.XA
Publication of CN112784850A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/30 Noise filtering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/56 Extraction of image or video features relating to colour

Abstract

The embodiments of the invention provide a method and a device for removing note print-through. The method acquires a text image, clusters the pixel points in the text image to determine the color category to which each pixel point belongs, extracts the pixel value of the cluster center point of the background color category, and replaces the pixel values of all pixel points belonging to the background color category with that pixel value to obtain an updated text image. The embodiments of the invention can automatically and efficiently remove note print-through from a text image.

Description

Method and device for removing penetrating print of notes
Technical Field
The invention relates to the technical field of image processing, in particular to a method and a device for removing a penetrating print of a note.
Background
Digitizing paper documents is an important means of preserving them: a paper document is scanned into a text image, which is easier to store and copy on electronic equipment. Paper documents include printed documents and handwritten documents. Note print-through may occur on handwritten paper documents: notes written on the back of the paper show through and blur the document, and when such a document is scanned, the resulting text image also appears cluttered because of the print-through.
At present, note print-through in a text image is mainly removed by manually erasing it with image processing software. Such software is highly specialized, so only trained technicians can use it to remove note print-through, and if a text image contains many print-through regions, they can be removed only through complex operations, so the removal efficiency is low. How to automatically and efficiently remove note print-through from a text image has therefore become an urgent technical problem.
Disclosure of Invention
The embodiment of the invention aims to provide a method and a device for removing the note print-through, so as to automatically and efficiently remove the note print-through in a text image. The specific technical scheme is as follows:
In order to achieve the above purpose, the invention discloses a method for removing note print-through, which comprises the following steps:
acquiring a text image;
clustering all pixel points in the text image, and determining the color category of each pixel point in the text image;
and extracting the pixel value of the clustering center point of the background color category, and replacing the pixel values of all pixel points belonging to the background color category by using the pixel values to obtain an updated text image.
In an embodiment of the present invention, before clustering each pixel point in the text image and determining the color category to which each pixel point in the text image belongs, the method further includes:
sampling pixel points in the text image to obtain a preset number of pixel points;
clustering all pixel points in the text image, and determining the color category to which all the pixel points in the text image belong, wherein the clustering comprises the following steps:
and clustering the preset number of pixel points obtained by sampling, and determining the color categories to which the preset number of pixel points belong respectively.
In an embodiment of the present invention, before clustering each pixel point in the text image and determining the color category to which each pixel point in the text image belongs, the method further includes:
reducing the storage bit depth of each pixel point in the text image to obtain a first text image;
clustering all pixel points in the text image, and determining the color category to which all the pixel points in the text image belong, wherein the clustering comprises the following steps:
and clustering all pixel points in the first text image, and determining the color category to which all the pixel points in the first text image belong.
In an embodiment of the present invention, the step of clustering each pixel point in the text image and determining the color category to which each pixel point in the text image belongs includes:
determining the clustering center points of all color categories in the text image;
and respectively calculating the distance between the pixel point and each clustering central point aiming at any pixel point in the text image, and determining the color category to which the pixel point belongs as the color category of the clustering central point corresponding to the minimum distance in the distances between the pixel point and each clustering central point.
In one embodiment of the invention, the method further comprises:
and aiming at other color categories except the background color category, extracting and replacing the pixel values of all pixel points belonging to the color category by using the pixel value of the cluster center point of the color category to obtain an updated text image.
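As an illustration of this embodiment, the following minimal sketch replaces every pixel point with the cluster center of its own color category. It assumes the clustering has already produced one integer label per pixel and a (k, 3) array of RGB centers; the function name `quantize_to_centers` and its parameters are hypothetical and do not come from the patent.

```python
import numpy as np

def quantize_to_centers(image, labels, centers):
    """Replace each pixel with the center color of its category.

    image   : (H, W, 3) uint8 array
    labels  : (H*W,) integer array, category index of each pixel (assumed given)
    centers : (k, 3) array of cluster center colors (assumed given)
    """
    # Round the (possibly fractional) centers to valid 8-bit colors.
    palette = np.clip(np.rint(centers), 0, 255).astype(np.uint8)
    # Look up the center color of every pixel and restore the image shape.
    return palette[labels].reshape(image.shape)
```

The result contains at most k distinct colors, one per color category.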
In order to achieve the above object, the present invention also discloses a device for removing note print-through, which comprises:
the acquisition module is used for acquiring a text image;
the clustering module is used for clustering all pixel points in the text image and determining the color category of all the pixel points in the text image;
the extraction module is used for extracting the pixel value of the clustering center point of the background color category;
and the replacing module is used for replacing the pixel values of all the pixel points belonging to the background color category by using the pixel values to obtain an updated text image.
In one embodiment of the invention, the apparatus further comprises:
the sampling module is used for sampling pixel points in the text image to obtain a preset number of pixel points;
a clustering module specifically configured to:
and clustering the preset number of pixel points obtained by sampling, and determining the color categories to which the preset number of pixel points belong respectively.
In one embodiment of the invention, the apparatus further comprises:
the compression module is used for reducing the storage bit depth of each pixel point in the text image to obtain a first text image;
a clustering module specifically configured to:
and clustering all pixel points in the first text image, and determining the color category to which all the pixel points in the first text image belong.
In an embodiment of the present invention, the clustering module is specifically configured to:
determining the clustering center points of all color categories in the text image;
and respectively calculating the distance between the pixel point and each clustering central point aiming at any pixel point in the text image, and determining the color category to which the pixel point belongs as the color category of the clustering central point corresponding to the minimum distance in the distances between the pixel point and each clustering central point.
In an embodiment of the present invention, the replacement module is further configured to:
and aiming at other color categories except the background color category, extracting and replacing the pixel values of all pixel points belonging to the color category by using the pixel value of the cluster center point of the color category to obtain an updated text image.
In order to achieve the above object, an embodiment of the present invention further discloses an electronic device, which includes a processor, a memory, a display, a communication interface, and a communication bus,
a memory for storing a computer program;
a processor, configured to implement the method provided in the first aspect of the embodiments of the present invention when executing the computer program stored in the memory.
In order to achieve the above object, an embodiment of the present invention further discloses a computer-readable storage medium, in which instructions are stored, and when the instructions are executed on a computer, the method provided in the first aspect of the embodiment of the present invention is performed.
With the method and the device for removing note print-through provided by the embodiments of the invention, a text image is acquired, the pixel points in the text image are clustered to determine the color category to which each pixel point belongs, the pixel value of the cluster center point of the background color category is extracted, and the pixel values of all pixel points belonging to the background color category are replaced with that pixel value to obtain an updated text image. Clustering the pixel points of the text image determines the color category of each pixel point. Because the color of a note print-through region is generally light, the print-through pixel points are clustered into the background color category after clustering. The cluster center point of the background color category is a pixel point of the actual background, so replacing the pixel values of all pixel points in the background color category with the pixel value of that center point replaces the pixel values of the print-through pixel points with the pixel value of the actual background. The method thus removes note print-through from the text image automatically and efficiently, without complex manual operation.
Of course, it is not necessary for any product or method of practicing the invention to achieve all of the above-described advantages at the same time.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
FIG. 1 is a schematic flow chart of a method for removing print-through of a note according to an embodiment of the present invention;
FIG. 2a is an original image before a note print-through is removed according to an embodiment of the present invention;
FIG. 2b is a diagram illustrating an effect of the method for removing print-through of a note according to the embodiment of the present invention;
FIG. 3 is a schematic flow chart illustrating a method for print-through removal of a note according to an embodiment of the present invention;
FIG. 4 is a schematic flow chart illustrating a method for print-through removal of a note according to an embodiment of the present invention;
FIG. 5 is a schematic flow chart illustrating a method for print-through removal of a note according to an embodiment of the present invention;
FIG. 6 is a schematic flow chart illustrating a method for print-through removal of a note according to an embodiment of the present invention;
FIG. 7 is a schematic structural diagram of a note print-through removing apparatus according to an embodiment of the present invention;
FIG. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The embodiment of the invention discloses a method and a device for removing a penetrating print of a note, which are respectively explained in detail below.
Referring to fig. 1, fig. 1 is a schematic flow chart of a method for removing print-through of a note according to an embodiment of the present invention, including the following steps:
S101, acquiring a text image.
The text image may be a picture of a document taken by an imaging device such as a mobile phone or a camera, for example the text image shown in FIG. 2a, which exhibits obvious note print-through. Note print-through may be caused by pressing too hard with the pen, by paper that is too thin, or by a pen that releases too much ink.
S102, clustering all pixel points in the text image, and determining the color category to which all the pixel points in the text image belong.
Clustering is the process of grouping similar physical or abstract objects into the same class. For example, a group of people in a room can be divided into different categories by age, gender, weight, or other shared characteristics; the process of dividing them is called clustering.
A text image is a set of pixel points. The pixel points in the text image are clustered, and the color category of each pixel point can be determined by a suitable clustering algorithm.
The pixel clustering may use the K-means clustering algorithm, hierarchical clustering, a GMM (Gaussian Mixture Model), or another clustering method.
Optionally, S102 may specifically be implemented by the following steps:
The cluster centers and categories of the text image are determined using the K-means clustering algorithm. K objects may be randomly selected as initial cluster centers; the distance between each pixel point and each initial cluster center is then calculated, and each pixel point is assigned to the nearest initial cluster center. An initial cluster center and the pixel points assigned to it represent one cluster. Once all pixel points in the text image have been assigned to their initial cluster centers, a new cluster center is calculated for each cluster as the average of the clustered pixel points and the initial cluster center, and this average is used as the class center for the next round of clustering. The calculation is iterated until the preset number of iterations is reached or the error falls below a threshold, and the clustering result of the last calculation is taken as the final clustering result.
For example, setting K to 5 corresponds to randomly selecting 5 points in the text image as the initial cluster centers, calculating the distance between each pixel point in the text image and the 5 initial cluster centers, and assigning each pixel point to the closest one. Suppose the distances between pixel point A and the initial cluster centers B, C, D, E and F are calculated, and the distance between A and C is the smallest; pixel point A is then considered to belong to initial cluster center C. In this way, each initial cluster center and the pixel points belonging to it form a cluster. The clustered pixel points and the initial cluster center are averaged, and the average is used as the new cluster center: if 10 pixel points are assigned to initial cluster center B, the mean of these 11 points is the new cluster center. These steps are repeated iteratively. A number of iterations may be set, for example 10, so that the steps stop after being repeated 10 times; alternatively, the iteration ends when the error between the clustering result and the actual output value is less than a threshold, and continues otherwise.
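The iteration described above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the names `kmeans_pixels` and `_nearest`, the `tol` stopping threshold, and the fixed random seed are all hypothetical. The sketch also uses the standard K-means update (the mean of the assigned pixel points only), which differs slightly from the worked example above, where the old center is included in the average.

```python
import numpy as np

def _nearest(pixels, centers):
    """Index of the closest center for every pixel (Euclidean distance)."""
    dists = np.linalg.norm(pixels[:, None, :] - centers[None, :, :], axis=2)
    return dists.argmin(axis=1)

def kmeans_pixels(pixels, k=5, max_iter=10, tol=1e-3, seed=0):
    """Cluster an (N, 3) array of RGB values into k color categories."""
    pixels = np.asarray(pixels, dtype=np.float64)
    rng = np.random.default_rng(seed)
    # Randomly pick k pixels as the initial cluster centers.
    centers = pixels[rng.choice(len(pixels), size=k, replace=False)]
    for _ in range(max_iter):
        labels = _nearest(pixels, centers)
        new_centers = centers.copy()
        for j in range(k):
            members = pixels[labels == j]
            if len(members):  # an empty cluster keeps its old center
                new_centers[j] = members.mean(axis=0)
        shift = np.linalg.norm(new_centers - centers)
        centers = new_centers
        if shift < tol:  # centers stopped moving: treat as converged
            break
    # Final assignment against the final centers.
    return centers, _nearest(pixels, centers)
```

The function stops either after `max_iter` rounds or when the centers move by less than `tol`, mirroring the two stopping conditions mentioned in the text.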
The distance between each pixel point of the text image and each initial clustering center is determined by an Euclidean distance formula:
$$d(p, q) = \sqrt{(x - x_1)^2 + (y - y_1)^2}$$
where d(p, q) represents the distance between the pixel points p(x, y) and q(x_1, y_1).
When the distance between the pixel point and the center of a certain class is minimum, the pixel belongs to the class.
S103, extracting the pixel value of the clustering center point of the background color category, and replacing the pixel values of all pixel points belonging to the background color category with the pixel value to obtain an updated text image.
The background color category can be determined by calculating the color distribution of the sampled pixel points by adopting a color histogram, and taking a color with the highest color frequency as the background color, or taking a preset color as the background color.
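One simple way to realize the "highest frequency" rule is to count how many pixel points fall into each color category and take the largest one. The sketch below assumes integer cluster labels are already available and builds the histogram over those labels rather than over raw colors; the function name `background_category` is hypothetical.

```python
import numpy as np

def background_category(labels, k):
    """Return the index of the color category containing the most pixels."""
    # Histogram of category labels: counts[j] = number of pixels in category j.
    counts = np.bincount(np.asarray(labels).ravel(), minlength=k)
    return int(counts.argmax())
```

This relies on the assumption that the paper background covers more of the image than the text or any other color, which is typical for document images but is not guaranteed.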
Color filling replaces the original color of a pixel point with the color of the category to which it belongs in the clustering result. For example, if the color of pixel point A is pink, the pink of pixel point A is replaced by the closest category color in the clustering result; if the closest category is red, red is used in place of pink.
And replacing the pixel values of all the pixel points belonging to the color category by using the pixel value of the clustering center, namely replacing the pixel colors of all the pixel points belonging to the color category by using the pixel colors of the clustering center.
For example, when a person writes on white paper with a red pen, note print-through appears on the back of the paper. When pixel clustering is performed, the pixel color of the cluster center of the background color category is white, and the pixel colors of all pixel points belonging to that category are replaced with the pixel color of the cluster center point; that is, the color of the note print-through is replaced with white, which removes the note print-through, as shown in FIG. 2b.
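The replacement in S103 can be sketched as below, assuming the per-pixel labels, the cluster centers, and the index of the background category have already been computed. The function name `replace_background` and its parameters are hypothetical.

```python
import numpy as np

def replace_background(image, labels, centers, background):
    """Overwrite every background-category pixel with the background center.

    image      : (H, W, 3) uint8 array
    labels     : (H*W,) integer array of category indices (assumed given)
    centers    : (k, 3) array of cluster center colors (assumed given)
    background : index of the background color category (assumed given)
    """
    out = image.reshape(-1, 3).copy()  # work on a copy, keep the input intact
    bg_color = np.clip(np.rint(centers[background]), 0, 255).astype(np.uint8)
    # Print-through pixels clustered into the background category are
    # overwritten together with the genuine background pixels.
    out[labels == background] = bg_color
    return out.reshape(image.shape)
```

Pixel points of other categories, such as the text itself, are left unchanged.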
Therefore, with the method of this embodiment, after the text image is acquired, its pixel points are clustered and the color category to which each pixel point belongs is determined. Because the color of a note print-through region is generally light, the print-through pixel points are generally clustered into the background color category. The cluster center point of the background color category is a pixel point of the actual background, so replacing the pixel values of all pixel points in the background color category with the pixel value of that center point replaces the pixel values of the print-through pixel points with the pixel value of the actual background. Note print-through in the text image is thus removed automatically and efficiently, without complicated manual operation.
Based on the embodiment shown in fig. 1, another flow diagram of a note print-through removing method is further provided in the embodiment of the present invention, as shown in fig. 3, including the following steps:
S301, acquiring a text image.
S302, sampling pixel points in the text image to obtain a preset number of pixel points.
And S303, clustering the preset number of pixel points obtained by sampling, and determining the color categories to which the preset number of pixel points belong respectively.
S304, extracting the pixel value of the clustering center point of the background color category, and replacing the pixel values of all pixel points belonging to the background color category with the pixel values to obtain an updated text image.
The content described in step S301 is the same as the content described in S101 in fig. 1, and the content described in step S304 is the same as the content described in S103 in fig. 1, which is not repeated herein.
Generally, a text image obtained by an imaging device such as a mobile phone or a camera has a high resolution, and clustering all image pixels directly involves a huge amount of calculation. Therefore, before the pixel points in the text image are clustered, a preset number of pixel points can be randomly sampled and clustered to determine the color categories, which reduces the amount of calculation.
When the resolution of the text image is not high and the image is not large, random sampling may be unnecessary.
When the pixel points of the text image are sampled, a preset number of pixel points is obtained. For example, when an image captured by a mobile phone or another imaging device is sampled with 5 preset sampling points, 5 pixel points are obtained. These 5 pixel points are then clustered to obtain the color category to which each of them belongs.
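The sampling step might look like the following sketch, which draws a preset number of pixel points uniformly at random without replacement. The function name `sample_pixels`, the fixed seed, and the choice of uniform sampling are assumptions made for illustration.

```python
import numpy as np

def sample_pixels(image, num_samples, seed=0):
    """Randomly draw up to num_samples pixels from an (H, W, C) image."""
    pixels = image.reshape(-1, image.shape[-1])
    rng = np.random.default_rng(seed)
    # Never ask for more pixels than the image contains.
    n = min(num_samples, len(pixels))
    idx = rng.choice(len(pixels), size=n, replace=False)
    return pixels[idx]
```

In practice the preset number would be far larger than 5 (thousands of pixel points, for example) so that every color category in the image is likely to be represented in the sample.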
Based on the embodiment shown in fig. 1, another flow diagram of a note print-through removing method is further provided in the embodiment of the present invention, as shown in fig. 4, including the following steps:
S401, acquiring a text image.
S402, reducing the storage bit depth of each pixel point in the text image to obtain a first text image.
And S403, clustering all the pixel points in the first text image, and determining the color category to which all the pixel points in the first text image belong.
S404, extracting the pixel value of the cluster center point of the background color category, and replacing the pixel values of all pixel points belonging to the background color category with the pixel value to obtain an updated text image.
The content described in step S401 is the same as the content described in S101 in fig. 1, and the content described in step S404 is the same as the content described in S103 in fig. 1, which is not repeated herein.
Generally, a text image obtained by an imaging device such as a mobile phone or a camera has a high resolution and occupies a large amount of memory. The text image is therefore evaluated before its pixel points are clustered: when the quality of the text image is too high or the resolution is too large, the storage bit depth of each pixel point can be reduced, which reduces the memory occupied by each pixel point, lowers the quality of the text image, and reduces the subsequent amount of calculation.
When the quality of the text image is not high or the resolution is small, the storage bit depth of each pixel point does not need to be reduced.
Bit depth is the number of bits a computer uses to record the color of each pixel of an image. The greater the bit depth, the more colors are available and the richer the colors of the image can be.
At present, images photographed by mobile phones, including text images, have high resolution and rich colors. An RGB color image stores the three color channels of each pixel point as binary digits; the more binary digits are used, the more different colors each pixel point can express, and, for text images of the same dimensions, the more storage space each pixel point occupies. In 24-bit RGB, each channel of a pixel point of the text image is usually represented by 8 bits, and each channel of each pixel point can be reduced to 6 bits without affecting the visual effect.
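A common way to reduce an 8-bit channel to 6 bits is to discard the two least significant bits. The sketch below does this with bit shifts and keeps the values in the 0 to 255 range so that the image can still be displayed; the function name `reduce_bit_depth` is hypothetical, and the input is assumed to be a `uint8` array.

```python
import numpy as np

def reduce_bit_depth(image, bits=6):
    """Keep only the `bits` most significant bits of each 8-bit channel."""
    shift = 8 - bits
    # Shifting right drops the low bits; shifting back restores the scale.
    return (image >> shift) << shift
```

With `bits=6`, each channel takes one of 64 levels instead of 256, so nearby shades collapse to the same value before clustering.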
Reducing the storage bit depth of the text image compresses the size of the text image and reduces the degree of note print-through.
When the operation of reducing the storage bit depth is performed on the text image to obtain the first text image, clustering needs to be performed on each pixel point in the first text image to determine the color category of each pixel point in the first text image.
Therefore, with the method of this embodiment, the text image is acquired, the storage bit depth of each pixel point is reduced to obtain the first text image, the pixel points in the first text image are clustered to determine the color category to which each belongs, the pixel value of the cluster center point of the background color category is extracted, and the pixel values of all pixel points belonging to the background color category are replaced with that pixel value to obtain an updated text image. Reducing the storage bit depth reduces the memory occupied by each pixel point, lightens the colors of the text image, and reduces the degree of note print-through. After the pixel points of the first text image are clustered, the print-through pixel points are clustered into the background color category, and replacing the pixel values of all pixel points in that category with the pixel value of its cluster center point replaces the pixel values of the print-through pixel points with the pixel value of the actual background. Note print-through in the text image is thus removed automatically and efficiently, without complex manual operation.
Based on the embodiment shown in fig. 1, another flow diagram of a note print-through removing method is further provided in the embodiment of the present invention, as shown in fig. 5, including the following steps:
S501, acquiring a text image.
S502, sampling pixel points in the text image to obtain a preset number of pixel points, and obtaining a first text image.
S503, reducing the storage bit depth of each pixel point in the first text image to obtain a second text image.
S504, clustering is carried out on the preset number of pixel points obtained by sampling in the second text image, and the color categories of the preset number of pixel points in the second text image are determined.
And S505, extracting the pixel value of the clustering center point of the background color category, and replacing the pixel values of all pixel points belonging to the background color category by using the pixel values to obtain an updated text image.
The content described in step S501 is the same as the content described in S101 in fig. 1, the content described in S502 is the same as the content described in S302 in fig. 3, the content described in S503 is the same as the content described in S402 in fig. 4, the content described in S504 is the same as the content described in S403 in fig. 4, and the content described in step S505 is the same as the content described in S103 in fig. 1, which is not repeated here.
When the quality or the resolution of a text image obtained by an imaging device such as a mobile phone or a camera is high, the text image needs to be randomly sampled and its storage bit depth reduced in order to reduce the amount of calculation.
In this flow, the text image is first randomly sampled, the storage bit depth is then reduced, and finally the pixel points with reduced storage bit depth are clustered to determine the color category of each pixel point.
In addition, in another implementation manner of the embodiment of the present invention, the storage bit depth of the text image may be reduced, then the text image may be subjected to random sampling processing, and finally, each pixel point of the processed text image may be clustered to determine the color category to which each pixel point belongs.
Both approaches process the text image, reduce the subsequent amount of calculation, and determine the color category of each pixel point of the text image.
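The combined flow can be sketched end to end as follows. This is an illustrative sketch under several assumptions that are not stated in the patent: it follows the alternative order (bit depth first, then sampling), uses a standard K-means update, assigns every pixel of the full image to the nearest sampled center, and treats the most populated category as the background. The function name `remove_note_print_through` and all parameter defaults are hypothetical.

```python
import numpy as np

def remove_note_print_through(image, k=3, num_samples=2000, bits=6,
                              max_iter=10, seed=0):
    """Bit-depth reduction -> sampling -> K-means -> background replacement."""
    rng = np.random.default_rng(seed)
    # 1. Reduce the per-channel bit depth of the uint8 image.
    shift = 8 - bits
    reduced = (image >> shift) << shift
    pixels = reduced.reshape(-1, 3).astype(np.float64)
    # 2. Randomly sample a preset number of pixels for clustering.
    n = min(num_samples, len(pixels))
    sample = pixels[rng.choice(len(pixels), size=n, replace=False)]
    # 3. K-means on the sample (standard update: mean of assigned pixels).
    centers = sample[rng.choice(n, size=k, replace=False)]
    for _ in range(max_iter):
        d = np.linalg.norm(sample[:, None] - centers[None], axis=2)
        lab = d.argmin(axis=1)
        for j in range(k):
            if np.any(lab == j):
                centers[j] = sample[lab == j].mean(axis=0)
    # 4. Assign every pixel of the image to its nearest center.
    d = np.linalg.norm(pixels[:, None] - centers[None], axis=2)
    labels = d.argmin(axis=1)
    # 5. Assume the largest category is the background and overwrite it.
    background = np.bincount(labels, minlength=k).argmax()
    out = reduced.reshape(-1, 3).copy()
    bg = np.clip(np.rint(centers[background]), 0, 255).astype(np.uint8)
    out[labels == background] = bg
    return out.reshape(image.shape)
```

For large photographs, step 4 builds an (N, k, 3) intermediate array, so a real implementation would likely process the image in chunks.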
Based on the embodiment shown in fig. 1, another flow diagram of a note print-through removing method is further provided in the embodiment of the present invention, as shown in fig. 6, including the following steps:
S601, acquiring a text image.
S602, determining the clustering center point of each color category in the text image.
S603, aiming at any pixel point in the text image, respectively calculating the distance between the pixel point and each clustering center point, and determining the color category to which the pixel point belongs as the color category of the clustering center point corresponding to the minimum distance in the distances between the pixel point and each clustering center point.
S604, extracting the pixel value of the cluster center point of the background color category, and replacing the pixel values of all pixel points belonging to the background color category with the pixel value to obtain an updated text image.
The content described in step S601 is the same as the content described in S101 in fig. 1, and the content described in step S604 is the same as the content described in S103 in fig. 1, and is not repeated here.
Each pixel point in the text image is clustered, and the clustering center points can be obtained with an algorithm such as K-means. Once the final clustering center points are obtained, for any pixel point in the text image, the distance between that pixel point and each final clustering center point is calculated, and the pixel point is assigned the color category of the clustering center point at the minimum distance.
The distance between a pixel point of the text image and a cluster center can be calculated with a measure such as Euclidean distance, Manhattan distance, Mahalanobis distance, cosine distance, or Hamming distance. Specifically, each pixel point has three RGB channels; the difference from the cluster center pixel is calculated for each channel, and the Euclidean distance of the whole pixel is then calculated from the three differences.
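For the Euclidean case, the per-channel calculation can be written out explicitly. The notation is introduced here only for illustration: a pixel point is written as $p = (R_p, G_p, B_p)$ and a cluster center as $c = (R_c, G_c, B_c)$.

$$
d(p, c) = \sqrt{(R_p - R_c)^2 + (G_p - G_c)^2 + (B_p - B_c)^2}
$$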
For example, the distances between pixel point A in the text image and cluster centers B, C, and D are calculated. If the distance between pixel point A and cluster center C is the smallest, pixel point A is considered to belong to the category of cluster center C.
In this way, the color category of each pixel point is determined as the color category of the clustering center point at the minimum distance from that pixel point.
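The assignment rule can be illustrated with a short Python sketch that mirrors the example of pixel point A and cluster centers B, C, and D. The function names `euclidean` and `assign_category` and all RGB values are hypothetical, chosen only so that C turns out to be the nearest center.

```python
import math

def euclidean(p, q):
    """Euclidean distance between two RGB tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def assign_category(pixel, centers):
    """Return the name of the cluster center closest to `pixel`."""
    return min(centers, key=lambda name: euclidean(pixel, centers[name]))

# Hypothetical cluster centers B, C, D and a pixel point A.
centers = {"B": (250, 250, 250), "C": (200, 30, 30), "D": (20, 20, 20)}
pixel_a = (190, 45, 40)

category = assign_category(pixel_a, centers)  # "C" is nearest for these values
```

Another measure from the list above, such as Manhattan distance, could be swapped in by replacing `euclidean`; the minimum-distance rule itself stays the same.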
Meanwhile, the methods described in fig. 1, fig. 3, fig. 4, and fig. 5 may further include: for each color category other than the background color category, replacing the pixel values of all pixel points belonging to that color category with the pixel value of the cluster center point of that color category to obtain an updated text image.
Replacing the pixel values of all pixel points in the background color category with the pixel value of that category's clustering center point eliminates note print-through, while replacing the pixel values of all pixel points in each other color category with the pixel value of that category's clustering center point enhances the handwriting on the front side.
For example, when a red pen is used to write on white paper, the pixel color of the cluster center of the background color category is white, and the pixel color of the cluster center of each other color category is another color, such as red. The pixel color of each cluster center is used to replace the pixel color of all pixel points belonging to its category; that is, the color of the note print-through is replaced with white and the color of the characters is replaced with red. This not only removes the note print-through but also yields a clearer text image: as shown in fig. 2b, the print-through is removed and the handwriting becomes clearer.
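The red-pen example can be sketched as follows. The sketch assumes the two cluster centers (white paper, red ink) are already known, and the pixel values and the helper names `nearest` and `snap_to_centers` are invented for illustration.

```python
def nearest(pixel, centers):
    """Index of the cluster center with the smallest squared RGB distance."""
    return min(
        range(len(centers)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(pixel, centers[i])),
    )

def snap_to_centers(pixels, centers):
    """Replace every pixel with the center of the category it belongs to."""
    return [centers[nearest(p, centers)] for p in pixels]

# Hypothetical centers: index 0 is the white background, index 1 is red ink.
centers = [(255, 255, 255), (220, 20, 20)]
pixels = [
    (252, 250, 249),  # paper
    (235, 225, 225),  # faint print-through from the reverse side
    (205, 40, 35),    # red stroke
    (230, 15, 25),    # red stroke
]

cleaned = snap_to_centers(pixels, centers)
```

In this toy input the faint print-through pixel falls into the white category and is replaced with white, while both stroke pixels are replaced with the same red, which is the enhancement effect described above.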
Referring to fig. 7, fig. 7 is a schematic structural diagram of a note print-through removing device according to an embodiment of the present invention. The device comprises: an obtaining module 701, a clustering module 702, an extraction module 703, and a replacement module 704.
Wherein:
an obtaining module 701, configured to obtain a text image;
a clustering module 702, configured to cluster each pixel point in the text image, and determine a color category to which each pixel point in the text image belongs;
an extracting module 703, configured to extract a pixel value of a cluster center point of a background color category;
and the replacing module 704 is configured to replace the pixel values of all the pixel points belonging to the background color category with the pixel value of the cluster center point of the background color category to obtain an updated text image.
Further, the device can also comprise a sampling module.
And the sampling module is used for randomly sampling all pixel points in the text image and randomly sampling a preset number of pixel points for clustering.
Further, the device can also comprise a compression module.
And the compression module is used for reducing the storage bit depth of each pixel point in the text image to obtain a first text image.
The clustering module 702 is specifically configured to:
and clustering all pixel points in the first text image, and determining the color category to which all the pixel points in the first text image belong.
Further, the clustering module 702 is specifically configured to:
determining the clustering center points of all color categories in the text image;
and respectively calculating the distance between the pixel point and each clustering central point aiming at any pixel point in the text image, and determining the color category to which the pixel point belongs as the color category of the clustering central point corresponding to the minimum distance in the distances between the pixel point and each clustering central point.
Further, the replacing module 704 is further configured to:
and aiming at other color categories except the background color category, extracting and replacing the pixel values of all pixel points belonging to the color category by using the pixel value of the cluster center point of the color category to obtain an updated text image.
Therefore, by applying the embodiment of the invention, a text image is obtained, the pixel points in the text image are clustered, the color category to which each pixel point belongs is determined, the pixel value of the clustering center point of the background color category is extracted, and the pixel values of all pixel points belonging to the background color category are replaced with that pixel value to obtain an updated text image. Clustering the pixel points of the text image determines the color category of each pixel point. Because the color of a note print-through area is generally light, the print-through pixels are clustered into the background color category. The clustering center point of the background color category is a pixel in the actual background, so replacing the pixel values of all pixel points in the background color category with its pixel value replaces the print-through pixels with the actual background color. The method can thus remove note print-through from the text image automatically and efficiently, without complex manual operation.
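The overall flow summarized above can be sketched end to end in plain Python. Several things here are assumptions rather than details taken from the embodiment: the clustering is a basic Lloyd-style K-means, the background color category is taken to be the category with the most pixel points, the image is a flat list of RGB tuples with at least `k` distinct colors, and the function names are hypothetical. Note also that a K-means center is a mean of its members and need not coincide with an actual pixel.

```python
import random

def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def nearest(pixel, centers):
    return min(range(len(centers)), key=lambda i: sq_dist(pixel, centers[i]))

def kmeans(pixels, k, iterations=10, seed=0):
    """Basic K-means on RGB tuples; returns (centers, labels)."""
    rng = random.Random(seed)
    # Start from k distinct colors so no two initial centers coincide.
    centers = rng.sample(sorted(set(pixels)), k)
    for _ in range(iterations):
        labels = [nearest(p, centers) for p in pixels]
        for i in range(k):
            members = [p for p, label in zip(pixels, labels) if label == i]
            if members:  # keep the old center if a cluster becomes empty
                centers[i] = tuple(sum(ch) / len(members) for ch in zip(*members))
    labels = [nearest(p, centers) for p in pixels]
    return centers, labels

def remove_print_through(pixels, k=2):
    centers, labels = kmeans(pixels, k)
    # Assumption: the background is the most populous color category.
    background = max(range(k), key=labels.count)
    fill = tuple(round(c) for c in centers[background])
    return [fill if label == background else p for p, label in zip(pixels, labels)]

# Hypothetical page: 60 paper pixels, 20 light print-through pixels, 20 ink pixels.
pixels = [(250, 250, 250)] * 60 + [(225, 225, 230)] * 20 + [(30, 30, 120)] * 20
cleaned = remove_print_through(pixels)
```

With this input the light print-through pixels end up in the same category as the paper and are overwritten with the background center, while the ink pixels are left unchanged.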
Referring to fig. 8, fig. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present invention. The system comprises a memory 801, a processor 802, a display 803, a communication interface 804 and a communication bus 805, wherein the processor 802, the communication interface 804, the memory 801 and the display 803 are communicated with each other through the communication bus 805.
A memory 801 for storing a computer program;
the processor 802 is configured to execute the above-described print-through removing method when executing the computer program stored in the memory 801.
The Memory may include a RAM (Random Access Memory) or an NVM (Non-volatile Memory), such as at least one disk Memory. Optionally, the memory may also be at least one memory device located remotely from the processor.
The processor may be a general-purpose processor, such as a CPU (Central Processing Unit) or an NP (Network Processor); it may also be a DSP (Digital Signal Processor), an ASIC (Application Specific Integrated Circuit), an FPGA (Field-Programmable Gate Array) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
The communication bus mentioned in the electronic device may be a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown, but this does not mean that there is only one bus or one type of bus.
The communication interface is used for communication between the electronic equipment and other equipment.
The Display may be a CRT (Cathode Ray Tube) Display, an LCD (Liquid Crystal Display), a PDP (Plasma Display Panel), an OLED (Organic Light-Emitting Diode) Display, or the like.
In the embodiment of the present invention, the processor can realize the following: a text image is obtained, the pixel points in the text image are clustered, the color category to which each pixel point belongs is determined, the pixel value of the clustering center point of the background color category is extracted, and the pixel values of all pixel points belonging to the background color category are replaced with that pixel value to obtain an updated text image. Clustering the pixel points of the text image determines the color category of each pixel point. Because the color of a note print-through area is generally light, the print-through pixels are clustered into the background color category. The clustering center point of the background color category is a pixel in the actual background, so replacing the pixel values of all pixel points in the background color category with its pixel value replaces the print-through pixels with the actual background color. The method can thus remove note print-through from the text image automatically and efficiently, without complex manual operation.
The embodiment of the invention also discloses a computer-readable storage medium storing instructions; when the instructions are run on a computer, the above-described note print-through removing method is executed. The computer-readable storage medium may be an optical disc, a solid state drive, a mechanical hard drive, etc.
In the embodiment of the present invention, the machine-readable storage medium stores instructions that, when executed, perform the method provided in the embodiment of the present invention, so that the following can be realized: a text image is obtained, the pixel points in the text image are clustered, the color category to which each pixel point belongs is determined, the pixel value of the clustering center point of the background color category is extracted, and the pixel values of all pixel points belonging to the background color category are replaced with that pixel value to obtain an updated text image. Clustering the pixel points of the text image determines the color category of each pixel point. Because the color of a note print-through area is generally light, the print-through pixels are clustered into the background color category. The clustering center point of the background color category is a pixel in the actual background, so replacing the pixel values of all pixel points in the background color category with its pixel value replaces the print-through pixels with the actual background color. The method can thus remove note print-through from the text image automatically and efficiently, without complex manual operation.
For the embodiments of the electronic device and the machine-readable storage medium, since the method contents involved are substantially similar to the foregoing method embodiments, the description is relatively brief; for relevant points, refer to the description of the method embodiments.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
All the embodiments in the present specification are described in a related manner, and the same and similar parts among the embodiments may be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the apparatus, the electronic device, and the machine-readable storage medium embodiments, since they are substantially similar to the method embodiments, the description is relatively simple, and in relation to the description, reference may be made to some portions of the method embodiments.
The above description is only for the preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims (12)

1. A method for removing note print-through, the method comprising:
acquiring a text image;
clustering all pixel points in the text image, and determining the color category to which all the pixel points in the text image belong;
and extracting the pixel value of the clustering center point of the background color category, and replacing the pixel values of all pixel points belonging to the background color category by using the pixel value to obtain an updated text image.
2. The method according to claim 1, wherein before said clustering of pixels in said text image and determining a color class to which pixels in said text image belong, said method further comprises:
sampling pixel points in the text image to obtain a preset number of pixel points;
the clustering of the pixel points in the text image and the determination of the color category to which the pixel points in the text image belong comprises:
and clustering the preset number of pixel points obtained by sampling, and determining the color categories to which the preset number of pixel points belong respectively.
3. The method according to claim 1, wherein before said clustering of pixels in said text image and determining a color class to which pixels in said text image belong, said method further comprises:
reducing the storage bit depth of each pixel point in the text image to obtain a first text image;
the clustering of the pixel points in the text image and the determination of the color category to which the pixel points in the text image belong comprises:
and clustering all pixel points in the first text image, and determining the color category to which all the pixel points in the first text image belong.
4. The method of claim 1, wherein the clustering pixels in the text image to determine the color class to which the pixels in the text image belong comprises:
determining the clustering center point of each color category in the text image;
and respectively calculating the distance between the pixel point and each clustering center point aiming at any pixel point in the text image, and determining the color category of the pixel point as the color category of the clustering center point corresponding to the minimum distance in the distances between the pixel point and each clustering center point.
5. The method of claim 1, further comprising:
and aiming at other color categories except the background color category, extracting and replacing the pixel values of all pixel points belonging to the color category by using the pixel value of the clustering center point of the color category to obtain an updated text image.
6. A device for removing note print-through, the device comprising:
the acquisition module is used for acquiring a text image;
the clustering module is used for clustering all pixel points in the text image and determining the color category of all the pixel points in the text image;
the extraction module is used for extracting the pixel value of the clustering center point of the background color category;
and the replacing module is used for replacing the pixel values of all the pixel points belonging to the background color category by using the pixel values to obtain an updated text image.
7. The apparatus of claim 6, further comprising:
the sampling module is used for sampling pixel points in the text image to obtain a preset number of pixel points;
the clustering module is specifically configured to:
and clustering the preset number of pixel points obtained by sampling, and determining the color categories to which the preset number of pixel points belong respectively.
8. The apparatus of claim 6, further comprising:
the compression module is used for reducing the storage bit depth of each pixel point in the text image to obtain a first text image;
the clustering module is specifically configured to:
and clustering all pixel points in the first text image, and determining the color category to which all the pixel points in the first text image belong.
9. The apparatus of claim 6, wherein the clustering module is specifically configured to:
determining the clustering center point of each color category in the text image;
and respectively calculating the distance between the pixel point and each clustering center point aiming at any pixel point in the text image, and determining the color category of the pixel point as the color category of the clustering center point corresponding to the minimum distance in the distances between the pixel point and each clustering center point.
10. The apparatus of claim 6, wherein the replacement module is further configured to:
and aiming at other color categories except the background color category, extracting and replacing the pixel values of all pixel points belonging to the color category by using the pixel value of the clustering center point of the color category to obtain an updated text image.
11. An electronic device comprising a processor, a memory, a display, a communication interface, and a communication bus, wherein,
the memory is used for storing a computer program;
the processor, when executing the computer program stored on the memory, implementing the method of any of claims 1-5.
12. A computer-readable storage medium having stored therein instructions which, when executed on a computer, cause the computer to perform the method of any one of claims 1-5.
CN201911065009.XA 2019-11-04 2019-11-04 Method and device for removing penetrating print of notes Pending CN112784850A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911065009.XA CN112784850A (en) 2019-11-04 2019-11-04 Method and device for removing penetrating print of notes

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911065009.XA CN112784850A (en) 2019-11-04 2019-11-04 Method and device for removing penetrating print of notes

Publications (1)

Publication Number Publication Date
CN112784850A true CN112784850A (en) 2021-05-11

Family

ID=75747255

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911065009.XA Pending CN112784850A (en) 2019-11-04 2019-11-04 Method and device for removing penetrating print of notes

Country Status (1)

Country Link
CN (1) CN112784850A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2004336282A (en) * 2003-05-06 2004-11-25 Ricoh Co Ltd Image processor, image processing program and recording medium recorded with relevant program
US20100104163A1 (en) * 2008-10-28 2010-04-29 Ruiping Li Orientation detection for chest radiographic images
CN102523364A (en) * 2011-12-02 2012-06-27 方正国际软件有限公司 Document image strike-through eliminating method and system
CN109509196A (en) * 2018-12-24 2019-03-22 广东工业大学 A kind of lingual diagnosis image partition method of the fuzzy clustering based on improved ant group algorithm
CN109903210A (en) * 2019-01-04 2019-06-18 阿里巴巴集团控股有限公司 Minimizing technology, device and the server of watermark

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
郏宣耀 等: "一种基于聚类的彩色图像分色算法", 《计算技术与自动化》, vol. 25, no. 1, pages 110 - 113 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination