CN112651879A - Text image digital watermarking method capable of resisting printing and scanning - Google Patents


Info

Publication number
CN112651879A
Authority
CN
China
Prior art keywords
text
word
character
watermark
line
Legal status
Pending
Application number
CN202011555431.6A
Other languages
Chinese (zh)
Inventor
王志明
张烜
裴春红
Current Assignee
Shanxi Xidian Information Technology Research Institute Co ltd
Original Assignee
Shanxi Xidian Information Technology Research Institute Co ltd
Application filed by Shanxi Xidian Information Technology Research Institute Co ltd
Priority to CN202011555431.6A
Publication of CN112651879A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00 General purpose image data processing
    • G06T1/0021 Image watermarking
    • G06T1/005 Robust watermarking, e.g. average attack or collusion attack resistant
    • G06T5/00 Image enhancement or restoration
    • G06T5/10 Image enhancement or restoration using non-spatial domain filtering
    • G06T5/20 Image enhancement or restoration using local operators
    • G06T5/30 Erosion or dilatation, e.g. thinning
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/13 Edge detection
    • G06T2201/00 General purpose image data processing
    • G06T2201/005 Image watermarking
    • G06T2201/0065 Extraction of an embedded watermark; Reliable detection
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20048 Transform domain processing
    • G06T2207/20052 Discrete cosine transform [DCT]
    • G06T2207/20061 Hough transform
    • G06T2207/20064 Wavelet transform [DWT]
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30176 Document

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Processing (AREA)
  • Editing Of Facsimile Originals (AREA)

Abstract

The invention discloses a print-scan resistant digital watermarking method for text images, comprising the following steps: (1) text line recognition; (2) text character recognition; (3) watermark information preprocessing; (4) watermark embedding preprocessing; (5) watermark embedding; (6) image rectification; (7) watermark extraction preprocessing; and (8) watermark extraction. The invention addresses two problems of existing text watermarking methods: little redundancy is available for embedding the watermark, and the watermark resists print-scan attacks poorly. By exploiting the advantages of spatial-domain text watermarking, the method is highly robust to print-scan attacks while offering good watermark invisibility, simple computation, and high embedding capacity.

Description

Text image digital watermarking method capable of resisting printing and scanning
Technical Field
The invention belongs to the field of image processing and, more specifically, relates to a print-scan resistant digital watermarking method for text images in the field of information hiding. The invention can be used to embed watermark information into a text image when it is printed for output; copyright protection of printed matter is then achieved by scanning the printed page and extracting the embedded watermark information.
Background
In the Internet era, infringement of the copyright of paper books and other paper materials is increasingly serious and severely harms the legal rights and interests of authors and publishers. Text-based digital watermarking can hide information identifying an organization or an individual in electronic or paper text in a specific way, imperceptibly to the human eye, and thereby protect the copyright of textual works. Text digital watermarking methods fall into two main categories: spatial-domain methods and frequency-domain methods.
Huanghua et al., in "A new text digital watermark marking strategy and detection method" (Journal of Xi'an Jiaotong University, 2002, 36(2): 165-168), proposed a new line-shift marking strategy that uses no reference line: each text line is shifted up or down relative to the previous text line to embed the watermark. The method simplifies detection and achieves blind extraction, but it has the following shortcoming: because a Chinese text contains a limited number of lines, the watermark capacity of the algorithm is small.
Tan Zheng et al., in "Anti-printing-scanning digital watermarking technology based on document image" (Application Research of Computers, 2007, 24(12): 199-), proposed a text watermarking method based on a three-level DWT; however, text images have simple texture details, an uneven pixel distribution, and little redundant space, so it is difficult to balance the invisibility and robustness of the watermark.
Disclosure of Invention
The invention aims to provide a print-scan resistant digital watermarking method for text images, intended mainly for embedding watermarks into works at the time they are printed, thereby providing a basis for copyright protection. It mainly addresses the following problems of existing text watermarking methods: poor resistance to print-scan attacks, low watermark capacity, and complex implementation, and above all the difficulty of balancing the invisibility and robustness of the watermark.
The invention comprises two processes: watermark embedding and watermark extraction.
The watermark embedding process comprises the following steps:
(1) Text line recognition:
1a) for each pixel row of the carrier image, count the black pixels and compute the row width;
1b) compute the ratio of the black-pixel count to the row width; a row whose ratio is 0 is a blank pixel row, and any other row is a text pixel row;
1c) traverse all pixel rows from top to bottom: if the current row is a text row and the previous row is a blank row, the current row is the upper boundary of a text line; if the current row is a blank row and the previous row is a text row, the previous row is the lower boundary of a text line. Store the recognized text-line boundaries in a line boundary array;
(2) Text character recognition:
2a) count the black pixels in each pixel column of every recognized text line;
2b) traverse the pixel columns of the current text line from left to right: if the current column contains black pixels and the previous column contains none, the current column is the left boundary of a text character; if the current column contains no black pixels and the previous column does, the previous column is the right boundary of a text character. Store the recognized character boundaries in a character boundary array;
2c) sort the widths and the spacings of the characters delimited by these boundaries, and take the median width and the median spacing as thresholds $T_1$ and $T_2$ respectively. If two adjacent characters are both narrower than $T_1$ and the spacing between them is smaller than $T_2$, they are judged to form one Chinese character together: delete the right boundary of the left part and the left boundary of the right part from the character boundary array, and update the array;
2d) sort the widths of the updated characters and take the median width as threshold $T_3$. Split every character wider than $1.8 \times T_3$: starting from the middle of the character, search leftwards and rightwards for columns containing no black pixels, and store these columns in the character boundary array as new left and right boundaries;
(3) Watermark information preprocessing:
convert the ten-digit string of Arabic numerals that forms the watermark into the corresponding ASCII codes, and apply a cyclic operation to the codes to obtain the binary watermark sequence to be embedded;
(4) Watermark embedding preprocessing:
sort the widths and spacings of the characters in the character boundary array obtained in step (2), and take the median width and the median spacing as thresholds $T_4$ and $T_5$ respectively. For every character whose width is smaller than [formula image RE-GDA0002966304910000031], set its boundaries to 0 as a special mark;
(5) Watermark embedding:
5a) skip the first and last characters of each recognized text line and consider only the even-numbered characters of each line. If the left and right boundaries of the current even-numbered character are both nonzero, and the right boundary of the character to its left and the left boundary of the character to its right are also nonzero, the current character is an embeddable character; otherwise it is non-embeddable;
5b) compare the left and right spacings of each embeddable character: if both are greater than $4 \times T_5$, the character is reclassified as non-embeddable; otherwise it remains embeddable;
5c) if the current watermark bit is 0, shift the embeddable character so that its left spacing is smaller than its right spacing; if the bit is 1, shift it so that its left spacing is larger than its right spacing;
5d) embed the watermark into the embeddable characters of all text lines, then print and scan the watermarked text image to obtain the scanned watermarked image;
The watermark extraction process comprises the following steps:
(6) Image rectification:
6a) take the upper half of the scanned watermarked image, invert it into white text on a black background, and remove the white margin produced by skew during scanning;
6b) dilate the white-on-black image horizontally and vertically so that separate characters merge into longer line segments;
6c) perform edge detection on the dilated image and apply the Hough transform to the edge points to find the inclination angle θ of the longest line segment;
6d) rotate the scanned text image by θ and remove the black border introduced by the rotation;
(7) Watermark extraction preprocessing:
7a) recognize text lines as in step (1) and text characters as in step (2);
7b) sort the widths and spacings of the characters in the character boundary array obtained by step (2), and take the median width and the median spacing as thresholds $T'_4$ and $T'_5$ respectively. For every character whose width is smaller than [formula image RE-GDA0002966304910000041], set its boundaries to 0 as a special mark;
(8) Watermark extraction:
8a) skip the first and last characters of each recognized text line and consider only the even-numbered characters of each line. If the left and right boundaries of the current even-numbered character are both nonzero, and the right boundary of the character to its left and the left boundary of the character to its right are also nonzero, the current character is an extractable character; otherwise it is non-extractable;
8b) compare the left and right spacings of each extractable character: if both are greater than $4 \times T'_5$, the character is reclassified as non-extractable; otherwise it remains extractable;
8c) if the left spacing of the current extractable character is smaller than its right spacing, the extracted watermark bit is 0; if it is larger, the extracted bit is 1;
8d) extract bits from the extractable characters of all text lines, concatenate them into a binary watermark sequence, and convert the sequence, read as ASCII codes, back into Arabic-numeral characters to obtain the finally extracted watermark information.
Compared with the prior art, the invention has the following advantages:
First, the watermark is embedded in the spatial domain by slightly shifting text characters, which overcomes the poor print-scan resistance of the prior art and gives the invention strong robustness and good invisibility.
Second, the invention requires neither DWT nor DCT matrix transforms and does not partition the image into blocks, so the watermark can be extracted more quickly and accurately.
Third, the embedding scheme of the invention not only increases watermark capacity but also removes the need, present in similar prior techniques, to refer to the original image data when extracting the watermark, giving the invention high watermark capacity and blind extraction.
Description of the drawings:
FIG. 1 is a flow chart of the present invention;
FIG. 2 is a schematic of an experiment according to the present invention;
FIG. 3 is a schematic diagram of the system of the invention. Referring to FIG. 3, the print-scan resistant digital watermarking system of the invention first recognizes the text characters and text lines of the carrier image, then embeds the watermark using the ASCII codes converted from the watermark information, then prints and scans the watermarked image, and finally rectifies the scanned image and extracts the watermark information.
Detailed description of the embodiments:
The invention is described in detail below with reference to the accompanying drawings.
Referring to FIG. 1, the specific steps of the method are as follows.
Step 1, text line recognition.
Count the black pixels in each pixel row of the carrier image and compute the row width, then compute the ratio of the black-pixel count to the row width: a row whose ratio is 0 is a blank pixel row, and any other row is a text pixel row. Traverse all pixel rows from top to bottom: if the current row is a text row and the previous row is a blank row, the current row is the upper boundary of a text line; if the current row is a blank row and the previous row is a text row, the previous row is the lower boundary of a text line. Store the recognized text-line boundaries in a line boundary array.
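The row-projection test of Step 1 can be sketched in Python as follows. This is a minimal illustration, not the patent's own implementation: it assumes the carrier image has already been binarized into a list of rows of 0/1 pixels (1 = black), and the function name `find_text_lines` is hypothetical. Closing a text line that runs into the bottom edge of the image is an added assumption the source does not cover.

```python
def find_text_lines(img):
    """Return (top, bottom) row indices of each text line in a binary image.

    img is a list of rows; each row is a list of 0/1 pixels with 1 meaning black.
    """
    lines = []
    top = None
    prev_is_text = False
    for y, row in enumerate(img):
        ratio = sum(row) / len(row)         # black-pixel count divided by row width
        is_text = ratio != 0                # ratio 0 -> blank row, otherwise text row
        if is_text and not prev_is_text:
            top = y                         # current row is an upper boundary
        elif not is_text and prev_is_text:
            lines.append((top, y - 1))      # previous row is a lower boundary
        prev_is_text = is_text
    if prev_is_text:                        # assumption: close a line touching the bottom edge
        lines.append((top, len(img) - 1))
    return lines
```

For a five-row page whose rows 1, 2 and 4 contain ink, the function returns `[(1, 2), (4, 4)]`.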
Step 2, text character recognition.
Count the black pixels in each pixel column of every recognized text line. Traverse the columns from left to right: if the current column contains black pixels and the previous column contains none, the current column is the left boundary of a text character; if the current column contains no black pixels and the previous column does, the previous column is the right boundary of a text character. Store the recognized character boundaries in a character boundary array.
Sort the widths and the spacings of the characters, and take the median width and the median spacing as thresholds $T_1$ and $T_2$ respectively. If two adjacent characters are both narrower than $T_1$ and the spacing between them is smaller than $T_2$, they are judged to form one Chinese character together: delete the right boundary of the left part and the left boundary of the right part from the character boundary array, and update the array.
Sort the widths of the updated characters and take the median width as threshold $T_3$. Split every character wider than $1.8 \times T_3$: starting from the middle of the character, search leftwards and rightwards for columns containing no black pixels, and store these columns in the character boundary array as new left and right boundaries.
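A sketch of Step 2 under the same 0/1 image assumption follows. `column_spans`, `merge_parts` and `split_wide` are hypothetical names. Reading "both narrower than $T_1$" as a test on each of the two pieces, and leaving a wide character unsplit when no empty column is found, are interpretations rather than details stated in the source.

```python
from statistics import median


def column_spans(img, top, bottom):
    """Left/right column boundaries of the characters in rows top..bottom (1 = black)."""
    width = len(img[0])
    counts = [sum(img[y][x] for y in range(top, bottom + 1)) for x in range(width)]
    spans, left = [], None
    for x in range(width):
        prev = counts[x - 1] if x > 0 else 0
        if counts[x] and not prev:
            left = x                          # current column is a left boundary
        elif not counts[x] and prev:
            spans.append((left, x - 1))       # previous column is a right boundary
    if counts[-1]:
        spans.append((left, width - 1))       # character touching the right edge
    return spans, counts


def merge_parts(spans):
    """Join adjacent narrow, closely spaced pieces into one character (T1, T2 = medians)."""
    if len(spans) < 2:
        return list(spans)
    t1 = median(r - l + 1 for l, r in spans)
    t2 = median(spans[i + 1][0] - spans[i][1] - 1 for i in range(len(spans) - 1))
    merged = [spans[0]]
    for l, r in spans[1:]:
        pl, pr = merged[-1]
        if pr - pl + 1 < t1 and r - l + 1 < t1 and l - pr - 1 < t2:
            merged[-1] = (pl, r)              # drop the two inner boundaries
        else:
            merged.append((l, r))
    return merged


def split_wide(spans, counts, factor=1.8):
    """Split characters wider than factor * T3 at the empty columns nearest their middle."""
    t3 = median(r - l + 1 for l, r in spans)
    out = []
    for l, r in spans:
        if r - l + 1 > factor * t3:
            a = b = (l + r) // 2
            while a > l and counts[a]:
                a -= 1                        # search left for an empty column
            while b < r and counts[b]:
                b += 1                        # search right for an empty column
            if not counts[a] and not counts[b]:
                out += [(l, a - 1), (b + 1, r)]
                continue
        out.append((l, r))                    # not wide, or no empty column found
    return out
```

Note that `column_spans` never yields a span with an internal empty column, so in this sketch only spans produced by `merge_parts` can be split by `split_wide`.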
Step 3, watermark information preprocessing.
Convert the ten-digit string of Arabic numerals that forms the watermark into the corresponding ASCII codes, and apply a cyclic operation to the codes to obtain the binary watermark sequence to be embedded.
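Step 3 leaves the "cyclic operation" unspecified. The sketch below assumes it means repeating the bit string of the ten ASCII codes, written as 8-bit bytes (80 bits in total), until a given embedding capacity is filled. The 8-bit width, the repetition reading, and the function name `watermark_bits` are all assumptions.

```python
def watermark_bits(digits, capacity):
    """Ten digit characters -> 8-bit ASCII codes -> bits, repeated cyclically.

    Assumption: "cyclic operation" is read as repeating the 80-bit string
    until `capacity` bits are available.
    """
    if len(digits) != 10 or not digits.isdigit():
        raise ValueError("expected ten Arabic-numeral characters")
    bits = [int(b) for ch in digits for b in format(ord(ch), "08b")]
    return [bits[i % len(bits)] for i in range(capacity)]
```

For the experimental watermark 1803121134, the first eight bits are those of the character '1' (ASCII 0x31): 0, 0, 1, 1, 0, 0, 0, 1.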
Step 4, watermark embedding preprocessing.
Sort the widths and spacings of the characters in the character boundary array obtained in Step 2, and take the median width and the median spacing as thresholds $T_4$ and $T_5$ respectively. For every character whose width is smaller than [formula image RE-GDA0002966304910000071], set its boundaries to 0 as a special mark.
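The statistics of Step 4 can be sketched as below. The width cut-off is given only as a formula image in the source, so `min_width` is a caller-supplied stand-in rather than the patent's value. Computing the medians per text line rather than over the whole page is also an assumption, and `mark_narrow` is a hypothetical name.

```python
from statistics import median


def mark_narrow(spans, min_width):
    """Return (marked_spans, T4, T5) for one text line.

    spans: (left, right) column boundaries of the characters.
    min_width: stand-in for the unrecoverable width threshold of the source.
    """
    widths = [r - l + 1 for l, r in spans]
    gaps = [spans[i + 1][0] - spans[i][1] - 1 for i in range(len(spans) - 1)]
    t4 = median(widths)                     # median character width
    t5 = median(gaps) if gaps else 0        # median character spacing
    # Boundaries of characters narrower than the cut-off are set to 0 as a marker.
    marked = [(0, 0) if w < min_width else s for w, s in zip(widths, spans)]
    return marked, t4, t5
```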
Step 5, watermark embedding.
First, skip the first and last characters of each recognized text line and consider only the even-numbered characters of each line. If the left and right boundaries of the current even-numbered character are both nonzero, and the right boundary of the character to its left and the left boundary of the character to its right are also nonzero, the current character is an embeddable character; otherwise it is non-embeddable.
Second, compare the left and right spacings of each embeddable character: if both are greater than $4 \times T_5$, the character is reclassified as non-embeddable; otherwise it remains embeddable.
Third, if the current watermark bit is 0, shift the embeddable character so that its left spacing is smaller than its right spacing; if the bit is 1, shift it so that its left spacing is larger than its right spacing.
Fourth, embed the watermark into the embeddable characters of all text lines, then print and scan the watermarked text image to obtain the scanned watermarked image.
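A minimal sketch of the Step 5 embedding rule on one binarized text line follows. It assumes 1-based numbering for "even-numbered" characters, a hypothetical shift size `delta` in pixels (the source gives none), and that the character is placed so that its two spacings differ by about 2 × `delta` around the centred position; all function names are hypothetical. Only the character's own pixel columns are moved and its neighbours stay fixed, so the sum of its left and right spacings does not change.

```python
def embeddable_indices(spans, t5):
    """Indices of characters in one line that may carry a bit (first and second parts).

    spans holds (left, right) boundaries after Step 4, with (0, 0) marking unusable ones.
    "Even-numbered" is read with 1-based numbering, i.e. 0-based indices 1, 3, 5, ...
    """
    result = []
    for i in range(1, len(spans) - 1):            # skip first and last character
        if i % 2 == 0:                            # keep 0-based odd = 1-based even
            continue
        pr, (l, r), nl = spans[i - 1][1], spans[i], spans[i + 1][0]
        if 0 in (l, r, pr, nl):                   # character or a neighbour edge is marked
            continue
        if l - pr - 1 > 4 * t5 and nl - r - 1 > 4 * t5:
            continue                              # both spacings exceed 4 * T5
        result.append(i)
    return result


def target_left(spans, i, bit, delta):
    """New left column of character i: left gap < right gap for 0, > for 1.

    Needs at least one pixel of free space around the character.
    """
    pr, (l, r), nl = spans[i - 1][1], spans[i], spans[i + 1][0]
    total = (nl - pr - 1) - (r - l + 1)           # left gap + right gap, fixed by the move
    half = total // 2
    left_gap = max(0, half - delta) if bit == 0 else min(total, half + delta)
    return pr + 1 + left_gap


def move_character(img, top, bottom, l, r, new_l):
    """Shift the pixel block of columns l..r in rows top..bottom to start at new_l."""
    for y in range(top, bottom + 1):
        block = img[y][l:r + 1]
        img[y][l:r + 1] = [0] * len(block)        # erase at the old position (0 = white)
        img[y][new_l:new_l + len(block)] = block  # paste at the new position


def embed_line(img, top, bottom, spans, t5, bits, delta=1):
    """Embed bits into one text line in place; returns how many bits were used."""
    used = 0
    for i in embeddable_indices(spans, t5):
        if used == len(bits):
            break
        l, r = spans[i]
        new_l = target_left(spans, i, bits[used], delta)
        move_character(img, top, bottom, l, r, new_l)
        spans[i] = (new_l, new_l + r - l)
        used += 1
    return used
```

In a five-character line with boundaries `[(0, 3), (6, 9), (12, 15), (18, 21), (24, 27)]` and $T_5 = 2$, characters 2 and 4 are embeddable; embedding the bits 0 and 1 moves them to columns 5-8 and 19-22.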
Step 6, image rectification.
Take the upper half of the scanned watermarked image, invert it into white text on a black background, and remove the white margin produced by skew during scanning. Dilate the white-on-black image horizontally and vertically so that separate characters merge into longer line segments. Perform edge detection on the dilated image and apply the Hough transform to the edge points to find the inclination angle θ of the longest line segment. Rotate the scanned text image by θ and remove the black border introduced by the rotation.
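The skew estimation of Step 6 is sketched below with NumPy only, to keep it self-contained; a production version would more likely use an image library's dilation, edge detector, Hough transform and rotation. Assumptions: a grayscale page with dark text, a fixed binarization threshold of 128, a 5 × 15 dilation kernel, a simple boundary-pixel edge test instead of a named detector such as Canny, and an angle search restricted to ±10°. The strongest Hough peak stands in for "the longest line segment", and the final rotation and margin trimming are left out.

```python
import numpy as np


def dilate(mask, kh, kw):
    """Binary dilation of a boolean mask with a kh x kw rectangle (kh and kw odd)."""
    h, w = mask.shape
    padded = np.pad(mask, ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    out = np.zeros_like(mask)
    for dy in range(kh):
        for dx in range(kw):
            out |= padded[dy:dy + h, dx:dx + w]
    return out


def hough_skew_deg(edges, max_deg=10.0, step_deg=0.1):
    """Angle in degrees of the near-horizontal line with the most edge votes."""
    ys, xs = np.nonzero(edges)
    if xs.size == 0:
        raise ValueError("no edge pixels")
    alphas = np.deg2rad(np.arange(-max_deg, max_deg + step_deg / 2, step_deg))
    # Every point of a line y = x*tan(a) + c has the same rho = -x*sin(a) + y*cos(a).
    rhos = np.round(np.outer(-xs, np.sin(alphas)) + np.outer(ys, np.cos(alphas))).astype(int)
    rhos -= rhos.min()
    n = alphas.size
    # One flat accumulator over (rho, alpha) cells, filled with bincount.
    votes = np.bincount((rhos * n + np.arange(n)).ravel())
    return float(np.rad2deg(alphas[votes.argmax() % n]))


def estimate_skew(gray, thresh=128):
    """Steps 6a-6c on a grayscale page with dark text."""
    top = gray[: gray.shape[0] // 2]          # 6a: use the upper half only
    text = top < thresh                       # 6a: text pixels become True (white on black)
    blobs = dilate(text, 5, 15)               # 6b: merge characters into line segments
    edges = blobs & dilate(~blobs, 3, 3)      # 6c: blob pixels that touch the background
    return hough_skew_deg(edges)
```

The returned angle is in image coordinates (y pointing down), so it is positive when the text baseline descends to the right.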
Step 7, watermark extraction preprocessing.
Recognize text lines as in Step 1 and text characters as in Step 2. Sort the widths and spacings of the characters in the resulting character boundary array, and take the median width and the median spacing as thresholds $T'_4$ and $T'_5$ respectively. For every character whose width is smaller than [formula image RE-GDA0002966304910000081], set its boundaries to 0 as a special mark.
Step 8, watermark extraction.
First, skip the first and last characters of each recognized text line and consider only the even-numbered characters of each line. If the left and right boundaries of the current even-numbered character are both nonzero, and the right boundary of the character to its left and the left boundary of the character to its right are also nonzero, the current character is an extractable character; otherwise it is non-extractable.
Second, compare the left and right spacings of each extractable character: if both are greater than $4 \times T'_5$, the character is reclassified as non-extractable; otherwise it remains extractable.
Third, if the left spacing of the current extractable character is smaller than its right spacing, the extracted watermark bit is 0; if it is larger, the extracted bit is 1.
Fourth, extract bits from the extractable characters of all text lines, concatenate them into a binary watermark sequence, and convert the sequence, read as ASCII codes, back into Arabic-numeral characters to obtain the finally extracted watermark information.
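Step 8 mirrors the embedding rule. The sketch below uses the same assumptions as for embedding (1-based even numbering, hypothetical function names). The source does not say what to do when the two spacings are equal, so this sketch records an erasure (`None`) there. `bits_to_digits` decodes only the first 80 bits and does not exploit the cyclic repetition.

```python
def extract_line_bits(spans, t5_prime):
    """Read one bit from each extractable character of a line (first to third parts)."""
    bits = []
    for i in range(1, len(spans) - 1):            # skip first and last character
        if i % 2 == 0:                            # keep 1-based even positions
            continue
        pr, (l, r), nl = spans[i - 1][1], spans[i], spans[i + 1][0]
        if 0 in (l, r, pr, nl):                   # zero-marked boundary: not extractable
            continue
        left_gap, right_gap = l - pr - 1, nl - r - 1
        if left_gap > 4 * t5_prime and right_gap > 4 * t5_prime:
            continue                              # both spacings exceed 4 * T'5
        if left_gap != right_gap:
            bits.append(0 if left_gap < right_gap else 1)
        else:
            bits.append(None)                     # equal spacings: kept as an erasure
    return bits


def bits_to_digits(bits, n_chars=10):
    """Fourth part: regroup bits into 8-bit ASCII codes; '?' marks an unreadable code."""
    out = []
    for k in range(n_chars):
        group = bits[8 * k:8 * k + 8]
        if len(group) < 8 or None in group:
            out.append("?")
        else:
            out.append(chr(int("".join(map(str, group)), 2)))
    return "".join(out)
```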
The effects of the present invention will be further described with reference to the experimental drawings.
The experiments used an HP Color Laser MFP 178nw printer, and scanning was done with the printer's built-in scanner; both the printing and scanning resolutions were 300 dpi. The quality of the watermarked image is evaluated by SSIM (structural similarity), and the robustness of the watermark against attacks by DR (correct extraction rate). Watermark embedding and extraction on a text image with the method of the invention proceed as follows:
Referring to FIG. 2, FIG. 2(a) shows the carrier image, of size 4958 × 7017 pixels, and the watermark information to be embedded is 1803121134. The watermarked image obtained with the method is shown in FIG. 2(b), with an SSIM of 0.9429; the image after printing, scanning, and rectification is shown in FIG. 2(c); and the extracted watermark information is shown in FIG. 2(d), with a DR of 94.23%.
FIG. 2(b) shows that the watermark embedded in the carrier image has good invisibility. FIG. 2(d) shows that the watermark can still be extracted after the watermarked image is printed and scanned, with a DR of 94.23%, which indicates that the method is strongly robust against print-scan attacks.
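The two metrics used in the experiment can be written out as follows. The SSIM expression is the standard definition, in which $\mu_x, \mu_y$ are local means, $\sigma_x^2, \sigma_y^2$ local variances, $\sigma_{xy}$ the local covariance, and $C_1, C_2$ small stabilizing constants, usually evaluated over local windows and averaged. The source does not define DR, so reading it as the percentage of correctly extracted watermark bits is an assumption.

$$
\mathrm{SSIM}(x, y) = \frac{(2\mu_x\mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)},
\qquad
\mathrm{DR} = \frac{N_{\text{correct bits}}}{N_{\text{embedded bits}}} \times 100\%
$$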

Claims (3)

1. A print-scan resistant text image digital watermarking method, comprising two processes: watermark embedding and watermark extraction;
the watermark embedding process comprises the following steps:
(1) Text line recognition:
1a) for each pixel row of the carrier image, count the black pixels and compute the row width;
1b) compute the ratio of the black-pixel count to the row width; a row whose ratio is 0 is a blank pixel row, and any other row is a text pixel row;
1c) traverse all pixel rows from top to bottom: if the current row is a text row and the previous row is a blank row, the current row is the upper boundary of a text line; if the current row is a blank row and the previous row is a text row, the previous row is the lower boundary of a text line. Store the recognized text-line boundaries in a line boundary array;
(2) Text character recognition:
2a) count the black pixels in each pixel column of every recognized text line;
2b) traverse the pixel columns of the current text line from left to right: if the current column contains black pixels and the previous column contains none, the current column is the left boundary of a text character; if the current column contains no black pixels and the previous column does, the previous column is the right boundary of a text character. Store the recognized character boundaries in a character boundary array;
2c) sort the widths and the spacings of the characters delimited by these boundaries, and take the median width and the median spacing as thresholds $T_1$ and $T_2$ respectively. If two adjacent characters are both narrower than $T_1$ and the spacing between them is smaller than $T_2$, they are judged to form one Chinese character together: delete the right boundary of the left part and the left boundary of the right part from the character boundary array, and update the array;
2d) sort the widths of the updated characters and take the median width as threshold $T_3$. Split every character wider than $1.8 \times T_3$: starting from the middle of the character, search leftwards and rightwards for columns containing no black pixels, and store these columns in the character boundary array as new left and right boundaries;
(3) Watermark information preprocessing:
convert the ten-digit string of Arabic numerals that forms the watermark into the corresponding ASCII codes, and apply a cyclic operation to the codes to obtain the binary watermark sequence to be embedded;
(4) Watermark embedding preprocessing:
sort the widths and spacings of the characters in the character boundary array obtained in step (2), and take the median width and the median spacing as thresholds $T_4$ and $T_5$ respectively. For every character whose width is smaller than [formula image FDA0002858899060000021], set its boundaries to 0 as a special mark;
(5) Watermark embedding:
5a) skip the first and last characters of each recognized text line and consider only the even-numbered characters of each line. If the left and right boundaries of the current even-numbered character are both nonzero, and the right boundary of the character to its left and the left boundary of the character to its right are also nonzero, the current character is an embeddable character; otherwise it is non-embeddable;
5b) compare the left and right spacings of each embeddable character: if both are greater than $4 \times T_5$, the character is reclassified as non-embeddable; otherwise it remains embeddable;
5c) if the current watermark bit is 0, shift the embeddable character so that its left spacing is smaller than its right spacing; if the bit is 1, shift it so that its left spacing is larger than its right spacing;
5d) embed the watermark into the embeddable characters of all text lines, then print and scan the watermarked text image to obtain the scanned watermarked image;
the watermark extraction process comprises the following specific steps:
(6) image rectification:
6a) the upper half of the scanned watermarked image is taken and inverted into an image of white characters on a black background, and the white border introduced by skew during scanning is removed;
6b) the white-on-black character image is dilated in the horizontal and vertical directions so that separate characters merge into longer line segments;
6c) edge detection is performed on the dilated image, and a Hough transform is applied to the edge points to find the inclination angle θ of the longest line segment;
6d) the scanned text image is rotated by the angle θ, and the black border produced by the rotation is removed;
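The core of step 6c), Hough voting for the dominant line angle, is sketched below in plain Python. In practice, dilation, edge detection and the Hough transform would usually come from an image library, for example OpenCV's `cv2.dilate`, `cv2.Canny` and `cv2.HoughLinesP`. The sketch assumes the edge points are already given as `(x, y)` pairs. The search range `max_deg`, the angle step and the function name are hypothetical choices, not values from the text.

```python
import math
from collections import Counter


def estimate_skew(points, max_deg=5.0, step_deg=0.5):
    """Return the skew angle (degrees) of the line with the most Hough votes.

    For a candidate skew a, lines are parameterized by their normal at
    a + 90 degrees, so a point (x, y) votes for rho = -x*sin(a) + y*cos(a).
    Every point on y = x*tan(a) + c gives the same rho, so the (a, rho) bin
    with the most votes belongs to the longest line at that angle.
    """
    if not points:
        raise ValueError("no edge points")
    n_angles = int(round(2 * max_deg / step_deg)) + 1
    best_votes, best_angle = -1, 0.0
    for i in range(n_angles):
        a_deg = -max_deg + i * step_deg  # computed directly to avoid drift
        a = math.radians(a_deg)
        s, c = math.sin(a), math.cos(a)
        votes = Counter(round(-x * s + y * c) for x, y in points)  # 1-pixel rho bins
        top = max(votes.values())
        if top > best_votes:
            best_votes, best_angle = top, a_deg
    return best_angle


# Synthetic edge points along a line skewed by 3 degrees.
pts = [(x, 10 + x * math.tan(math.radians(3))) for x in range(200)]
print(estimate_skew(pts))  # 3.0
```

The sign of the returned angle follows the chosen coordinate convention; in image coordinates, y grows downward. The direction of the step 6d) rotation must be matched to it.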
(7) watermark extraction preprocessing:
7a) text line recognition is performed as in step (1), and text column recognition as in step (2);
7b) the word widths and word spacings in the word boundary array obtained in step (2) are statistically sorted, and the median word width and the median word spacing are taken as thresholds $T'_4$ and $T'_5$, respectively; for every text word whose width is smaller than [formula not reproduced in the source], its boundaries are set to 0 as a special mark;
(8) watermark extraction process:
8a) the first and last text words of each recognized text line are never extracted, and only the even-numbered text words of each line are candidates for extraction. If the left and right boundaries of the current even-numbered text word are both nonzero, and the right boundary of its left neighbor and the left boundary of its right neighbor are also nonzero, the current text word is judged extractable; otherwise it is not extractable;
8b) the left and right word spacings of each extractable text word are checked: if both the left spacing and the right spacing are greater than $4 \times T'_5$, the text word is reclassified as non-extractable; otherwise it remains extractable;
8c) if the left word spacing of the current extractable text word is smaller than its right word spacing, the extracted watermark bit is 0; if the left word spacing is larger than the right word spacing, the extracted bit is 1;
8d) watermark bits are extracted from the extractable text words of all text lines and concatenated into a binary watermark sequence. The sequence is converted back, one ASCII code at a time, into the corresponding Arabic numeral characters, which form the finally extracted digital watermark information.
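Steps 8a) to 8d) can be sketched in Python as follows. The sketch mirrors the assumptions used for embedding: inclusive `(left, right)` boundaries, 0 as the special mark, "even-numbered" counted from 1, and 8-bit most-significant-bit-first ASCII codes. The text does not say what to do when the two spacings are equal. Here such a word yields `None`, and any 8-bit group containing `None` decodes to `?`, so that later digits keep their alignment. A trailing incomplete group is ignored. All names are hypothetical.

```python
from __future__ import annotations


def bit_from_gaps(left_gap: int, right_gap: int) -> int | None:
    """Step 8c: left < right -> 0, left > right -> 1, equal -> None (undecided)."""
    if left_gap < right_gap:
        return 0
    if left_gap > right_gap:
        return 1
    return None


def extract_bits(lines, t5):
    """Steps 8a-8c on inclusive (left, right) boundaries, 0 = special mark."""
    bits = []
    for line in lines:
        for i in range(1, len(line) - 1, 2):  # skip first/last; even counted from 1
            (_, prev_right), (left, right), (next_left, _) = line[i - 1], line[i], line[i + 1]
            if 0 in (left, right) or prev_right == 0 or next_left == 0:
                continue  # 8a: not extractable
            left_gap, right_gap = left - prev_right - 1, next_left - right - 1
            if left_gap > 4 * t5 and right_gap > 4 * t5:
                continue  # 8b: both spacings wider than 4 * T'5
            bits.append(bit_from_gaps(left_gap, right_gap))
    return bits


def bits_to_digits(bits, bits_per_code=8):
    """Step 8d: regroup MSB-first bits into ASCII codes and decode them."""
    chars = []
    for start in range(0, len(bits) - bits_per_code + 1, bits_per_code):
        group = bits[start:start + bits_per_code]
        if None in group:
            chars.append("?")  # an undecidable bit spoils this character
            continue
        code = 0
        for b in group:
            code = code * 2 + b
        chars.append(chr(code))
    return "".join(chars)


scanned = [[(10, 19), (24, 33), (35, 44), (47, 56)]]
print(extract_bits(scanned, t5=2))  # [1]
print(bits_to_digits([0, 0, 1, 1, 0, 0, 1, 0]))  # 2
```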
2. The print-scan-resistant text image digital watermarking method of claim 1, wherein the black pixel in step 1a) is a pixel whose value is 0.
3. The print-scan-resistant text image digital watermarking method of claim 1, wherein the pixel line width in step 1a) is the number of pixels from the first black pixel to the last black pixel in the pixel line, counting both end pixels.
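Claims 2 and 3 together define the pixel line width. The small Python sketch below assumes a pixel row is a sequence of gray values, and it returns 0 for a row with no black pixels, a case the claims do not address. The function name is hypothetical.

```python
def pixel_line_width(row):
    """Pixels from the first to the last black (value 0) pixel, both included."""
    black = [i for i, value in enumerate(row) if value == 0]  # claim 2: black == 0
    if not black:
        return 0  # assumption: a row without black pixels has width 0
    return black[-1] - black[0] + 1  # claim 3: inclusive count


print(pixel_line_width([255, 0, 255, 255, 0, 255]))  # 4
```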

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011555431.6A CN112651879A (en) 2020-12-25 2020-12-25 Text image digital watermarking method capable of resisting printing and scanning

Publications (1)

Publication Number Publication Date
CN112651879A true CN112651879A (en) 2021-04-13

Family ID: 75362738

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011555431.6A Pending CN112651879A (en) 2020-12-25 2020-12-25 Text image digital watermarking method capable of resisting printing and scanning

Country Status (1)

Country Link
CN (1) CN112651879A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113393360A (en) * 2021-06-08 2021-09-14 陕西科技大学 Correction method for printing and scanning resistant digital watermark image
CN113393360B (en) * 2021-06-08 2022-10-21 陕西科技大学 Correction method for printing and scanning resistant digital watermark image

Similar Documents

Publication Publication Date Title
KR101016712B1 (en) Watermark information detection method
CN107248134B (en) Method and device for hiding information in text document
Bhattacharjya et al. Data embedding in text for a copier system
US20110052094A1 (en) Skew Correction for Scanned Japanese/English Document Images
Gebhardt et al. Document authentication using printing technique features and unsupervised anomaly detection
US8275168B2 (en) Orientation free watermarking message decoding from document scans
Kumar et al. Segmentation of printed text in devanagari script and gurmukhi script
US10949509B2 (en) Watermark embedding and extracting method for protecting documents
CN102495833A (en) Document watermark copyright information protection device based on Opentype vector outline fonts
Tan et al. Print-Scan Resilient Text Image Watermarking Based on Stroke Direction Modulation for Chinese Document Authentication.
CN100498834C (en) Digital water mark embedding and extracting method and device
Wu et al. A printer forensics method using halftone dot arrangement model
CN112651879A (en) Text image digital watermarking method capable of resisting printing and scanning
US20110170133A1 (en) Image forming apparatus, method of forming image and method of authenticating document
JP2011139449A (en) Method and system for embedding messages into structure shapes
Cu et al. Watermarking for security issue of handwritten documents with fully convolutional networks
EP1310940A1 (en) Color display device and method
US7221795B2 (en) Document processing method, recording medium having recorded thereon document processing program, document processing program, document processing apparatus, and character-input document
JP2003115031A (en) Image processor and its method
KR100814029B1 (en) Method for digital watermarking
JP5517028B2 (en) Image processing device
Cheng et al. Steganalysis of binary text images
Xia et al. Print-scan resilient watermarking for the Chinese text image
US8125691B2 (en) Information processing apparatus and method, computer program and computer-readable recording medium for embedding watermark information
CN115239605A (en) Anti-printing scanning method for text image based on pixel invariance

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination