CN112651879A - Text image digital watermarking method capable of resisting printing and scanning

Info

Publication number: CN112651879A
Application number: CN202011555431.6A
Authority: CN
Inventors: 王志明; 张烜; 裴春红
Original assignee: Shanxi Xidian Information Technology Research Institute Co ltd
Current assignee: Shanxi Xidian Information Technology Research Institute Co ltd
Priority date: 2020-12-25
Filing date: 2020-12-25
Publication date: 2021-04-13

Abstract

The invention discloses a print-scan-resistant digital watermarking method for text images, comprising the following steps: (1) text line recognition; (2) text character recognition; (3) watermark information preprocessing; (4) watermark embedding preprocessing; (5) watermark embedding; (6) image correction; (7) watermark extraction preprocessing; (8) watermark extraction. The invention addresses the problems that text watermarking methods have little redundancy available for embedding and that the resulting watermarks resist print-scan attacks poorly. By exploiting the advantages of spatial-domain text watermarking, the invention is strongly robust to print-scan attacks while offering good watermark invisibility, simple computation, and high embedding capacity.

Description

Text image digital watermarking method capable of resisting printing and scanning

Technical Field

The invention belongs to the technical field of image processing, and more specifically relates to a print-scan-resistant digital watermarking method for text images in the field of information hiding. The invention can embed watermark information into a text image when the image is printed, and can protect the copyright of printed products by scanning the printed image and extracting the embedded watermark information.

Background

In the internet era, infringement of the copyright of paper books and other paper materials is increasingly serious, gravely harming the legitimate rights and interests of authors and publishers. Text-based digital watermarking can hide information that identifies an organization or an individual in an electronic or paper text in a specific manner, imperceptibly to the human eye, thereby protecting the copyright of text works. Text digital watermarking methods fall mainly into two categories: spatial-domain methods and frequency-domain methods.

Huang Hua et al., in "A new text digital watermark marking strategy and detection method" (Journal of Xi'an Jiaotong University, 2002, 36(2):165-168), proposed a new line-shift marking strategy that uses no reference line: each text line is moved up or down relative to the previous text line to embed the watermark. The method simplifies detection and achieves blind extraction of the watermark, but it has a shortcoming: because a Chinese text has a limited number of lines, the watermark capacity of the algorithm is small.

Tan Zheng et al., in "Anti-printing-scanning digital watermarking technology based on document image" (Application Research of Computers, 2007, 24(12): 199-), proposed a text watermarking method based on a three-level DWT. However, text images have simple texture detail, uneven distribution, and little redundant space, so the invisibility and the robustness of the watermark are difficult to balance.

Disclosure of Invention

The invention aims to provide a print-scan-resistant digital watermarking method for text images, mainly used to embed a watermark into printed works at the time of printing and thereby provide a basis for copyright protection. The main problems it addresses are that existing text watermarking methods resist print-scan attacks poorly, have low watermark capacity, are complex to implement, and in particular struggle to balance the invisibility and the robustness of the watermark.

The invention comprises two processes: watermark embedding and watermark extraction.

The watermark embedding process comprises the following specific steps:

(1) Text line recognition:

1a) counting the black pixels in each pixel row of the carrier image and calculating the width of that pixel row;

1b) calculating, for each pixel row, the ratio of the number of black pixels to the row width; a row whose ratio is 0 is judged to be a blank pixel row, and a row whose ratio is not 0 is judged to be a text pixel row;

1c) traversing all pixel rows from top to bottom: if the current pixel row is a text pixel row and the previous pixel row is a blank pixel row, the current row is judged to be the upper boundary of a text line; if the current pixel row is a blank pixel row and the previous pixel row is a text pixel row, the previous row is judged to be the lower boundary of a text line. The recognized text line boundaries are stored in a line boundary array;

(2) Text character recognition:

2a) counting the black pixels in each pixel column of each recognized text line;

2b) traversing the pixel columns of the current text line from left to right: if the black pixel count of the current column is not 0 and that of the previous column is 0, the current column is judged to be the left boundary of a text character; if the black pixel count of the current column is 0 and that of the previous column is not 0, the previous column is judged to be the right boundary of a text character. The recognized text character boundaries are stored in a character boundary array;

2c) sorting the character widths and character spacings derived from the determined text character boundaries, and taking the median character width and the median character spacing as thresholds T₁ and T₂ respectively; if the widths of two adjacent text characters are both smaller than T₁ and the spacing between them is smaller than T₂, the two are judged to jointly form one Chinese character, the right boundary of the left character and the left boundary of the right character are deleted from the character boundary array, and the array is updated;

2d) sorting the character widths derived from the updated text character boundaries, and taking the median character width as threshold T₃; a text character whose width is larger than 1.8 × T₃ is split: starting from the middle position of the character, columns whose black pixel count is 0 are searched for to the left and to the right, and these columns are stored in the character boundary array as newly added left and right boundaries;

(3) Watermark information preprocessing:

a ten-digit string of Arabic numeral characters is converted into the corresponding ASCII codes, and a cyclic operation is applied to the ASCII codes to obtain the binary watermark information sequence to be embedded;

(4) Watermark embedding preprocessing:

sorting the character widths and character spacings derived from the character boundary array obtained in step (2), and taking the median character width and the median character spacing as thresholds T₄ and T₅ respectively; the boundaries of text characters whose width is smaller than […] are set to 0 as a special mark;

(5) Watermark embedding:

5a) the first and last text characters of each recognized text line are not used for embedding, and only the even-numbered text characters of each text line are candidates; if neither the left nor the right boundary of the current even-numbered text character is 0, and neither the right boundary of the text character to its left nor the left boundary of the text character to its right is 0, the current text character is judged to be an embeddable text character; otherwise it is non-embeddable;

5b) comparing the left and right character spacings of each embeddable text character: if the left spacing and the right spacing are both greater than 4 × T₅, the character is changed to a non-embeddable text character; otherwise it remains embeddable;

5c) if the watermark bit currently to be embedded is 0, the embeddable text character is moved so that its left-side character spacing is smaller than its right-side character spacing; if the bit is 1, the character is moved so that its left-side character spacing is larger than its right-side character spacing;

5d) embedding the watermark into the embeddable text characters of all text lines, then printing and scanning the watermarked text image to obtain a scanned watermarked image;

The watermark extraction process comprises the following specific steps:

(6) Image correction:

6a) taking the upper half of the scanned watermarked image, inverting it into a white-text-on-black-background image, and removing the white border produced by skew during scanning;

6b) dilating the white-on-black image in the horizontal and vertical directions so that separate characters are connected into longer line segments;

6c) performing edge detection on the dilated image, and applying the Hough transform to the edge points to find the inclination angle θ of the longest line segment;

6d) rotating the scanned text image by the angle θ and removing the black border produced by the rotation;

(7) Watermark extraction preprocessing:

7a) performing text line recognition according to step (1) and text character recognition according to step (2);

7b) sorting the character widths and character spacings derived from the character boundary array obtained in step (2), and taking the median character width and the median character spacing as thresholds T'₄ and T'₅ respectively; the boundaries of text characters whose width is smaller than […] are set to 0 as a special mark;

(8) Watermark extraction:

8a) the first and last text characters of each recognized text line are not used for extraction, and only the even-numbered text characters of each text line are candidates; if neither the left nor the right boundary of the current even-numbered text character is 0, and neither the right boundary of the text character to its left nor the left boundary of the text character to its right is 0, the current text character is judged to be an extractable text character; otherwise it is non-extractable;

8b) comparing the left and right character spacings of each extractable text character: if the left spacing and the right spacing are both larger than 4 × T'₅, the character is changed to a non-extractable text character; otherwise it remains extractable;

8c) if the left-side character spacing of the current extractable text character is smaller than its right-side character spacing, the extracted watermark bit is 0; if the left-side character spacing is larger than the right-side character spacing, the extracted watermark bit is 1;

8d) extracting the watermark from the extractable text characters of all text lines, concatenating the extracted bits into a binary watermark sequence, and converting the ASCII codes in the binary sequence into the corresponding Arabic numeral characters to obtain the finally extracted watermark digits.

Compared with the prior art, the invention has the following advantages:

First, the invention embeds the watermark in the spatial domain, through small displacements of text characters. This overcomes the poor resistance to print-scan attacks found in the prior art and gives the invention strong robustness and good invisibility.

Second, the invention requires neither complex matrix transforms such as the DWT or DCT nor partitioning of the image into blocks, so it extracts the watermark more quickly and accurately.

Third, the watermark embedding method of the invention not only increases watermark capacity but also removes the need, present in similar prior techniques, to consult the original image data during watermark extraction. The invention therefore offers both high watermark capacity and blind extraction.

Description of the Drawings

FIG. 1 is a flow chart of the invention;

FIG. 2 is a schematic diagram of an experiment using the invention;

FIG. 3 is a schematic diagram. Referring to FIG. 3, the print-scan-resistant digital watermarking system of the invention first recognizes the text lines and text characters of the carrier image, then embeds the watermark using the ASCII codes converted from the watermark information, then prints and scans the watermarked image, and finally corrects the scanned image and extracts the watermark to obtain the extracted watermark information.

Detailed Description

The invention is described in detail below with reference to the accompanying drawings.

The specific steps of the method are as follows, with reference to FIG. 1.

Step 1: text line recognition.

For each pixel row of the carrier image, the black pixels are counted and the row width is calculated. The ratio of the number of black pixels to the row width is then computed: a row whose ratio is 0 is judged to be a blank pixel row, and a row whose ratio is not 0 is judged to be a text pixel row. All pixel rows are traversed from top to bottom: if the current pixel row is a text pixel row and the previous pixel row is a blank pixel row, the current row is judged to be the upper boundary of a text line; if the current pixel row is a blank pixel row and the previous pixel row is a text pixel row, the previous row is judged to be the lower boundary of a text line. The recognized text line boundaries are stored in a line boundary array.
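As an illustration of Step 1, the following minimal Python sketch finds text line boundaries from a horizontal projection. It assumes a NumPy array in which black pixels have value 0 (as stated in claim 2) and the background is 255; the function name `find_text_lines` and the array layout are hypothetical and do not come from the patent. Because the ratio of black pixels to row width is 0 exactly when a row contains no black pixels, the sketch tests the count directly.

```python
import numpy as np


def find_text_lines(binary):
    """Return (top, bottom) row indices, inclusive, of each text line.

    binary: 2-D array, 0 = black ink, any other value = background.
    """
    black_per_row = (binary == 0).sum(axis=1)
    is_text_row = black_per_row > 0  # ratio is 0 iff the count is 0

    lines = []
    top = None
    for r, is_text in enumerate(is_text_row):
        if is_text and top is None:
            top = r                      # blank -> text: upper boundary
        elif not is_text and top is not None:
            lines.append((top, r - 1))   # text -> blank: lower boundary
            top = None
    if top is not None:                  # text reaches the last row
        lines.append((top, len(is_text_row) - 1))
    return lines
```

The returned list plays the role of the line boundary array; each pair can then be used to crop one text line for the character recognition step.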

Step 2: text character recognition.

The black pixels in each pixel column of each recognized text line are counted. The pixel columns are traversed from left to right: if the black pixel count of the current column is not 0 and that of the previous column is 0, the current column is judged to be the left boundary of a text character; if the black pixel count of the current column is 0 and that of the previous column is not 0, the previous column is judged to be the right boundary of a text character. The recognized text character boundaries are stored in a character boundary array.

The character widths and character spacings derived from the text character boundaries are sorted, and the median character width and the median character spacing are taken as thresholds T₁ and T₂ respectively. If the widths of two adjacent text characters are both smaller than T₁ and the spacing between them is smaller than T₂, the two are judged to jointly form one Chinese character; the right boundary of the left character and the left boundary of the right character are deleted from the character boundary array, and the array is updated.

The character widths derived from the updated text character boundaries are sorted, and the median character width is taken as threshold T₃. A text character whose width is larger than 1.8 × T₃ is split: starting from the middle position of the character, columns whose black pixel count is 0 are searched for to the left and to the right, and these columns are stored in the character boundary array as newly added left and right boundaries.
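The first two parts of Step 2 (column projection, then merging the halves of a left-right structured Chinese character) can be sketched as follows. This is a minimal illustration under stated assumptions: black pixels are 0, the "intermediate value" is taken to be the median, and the names `find_glyphs` and `merge_narrow_neighbors` are hypothetical. The splitting of over-wide characters (the 1.8 × T₃ rule) is not shown.

```python
import numpy as np


def find_glyphs(line_img):
    """Return [left, right] column indices (inclusive) of each ink run."""
    has_ink = (line_img == 0).any(axis=0)
    glyphs, left = [], None
    for c, ink in enumerate(has_ink):
        if ink and left is None:
            left = c                       # empty -> ink: left boundary
        elif not ink and left is not None:
            glyphs.append([left, c - 1])   # ink -> empty: right boundary
            left = None
    if left is not None:
        glyphs.append([left, len(has_ink) - 1])
    return glyphs


def merge_narrow_neighbors(glyphs):
    """Merge adjacent narrow parts separated by a narrow gap."""
    if len(glyphs) < 2:
        return [list(g) for g in glyphs]
    widths = [r - l + 1 for l, r in glyphs]
    gaps = [glyphs[i + 1][0] - glyphs[i][1] - 1
            for i in range(len(glyphs) - 1)]
    t1 = float(np.median(widths))   # width threshold T1 (assumed median)
    t2 = float(np.median(gaps))     # spacing threshold T2 (assumed median)

    merged = [list(glyphs[0])]
    for l, r in glyphs[1:]:
        prev_l, prev_r = merged[-1]
        both_narrow = (prev_r - prev_l + 1) < t1 and (r - l + 1) < t1
        if both_narrow and (l - prev_r - 1) < t2:
            merged[-1][1] = r       # drop the two inner boundaries
        else:
            merged.append([l, r])
    return merged
```

One design choice to note: the thresholds are computed once, before any merging, and a merged part is compared using its new combined width, so a chain of three narrow parts merges only if the first two together are still narrower than T₁.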

Step 3: watermark information preprocessing.

A ten-digit string of Arabic numeral characters is converted into the corresponding ASCII codes, and a cyclic operation is applied to the ASCII codes to obtain the binary watermark information sequence to be embedded.
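A minimal sketch of this preprocessing is given below. The patent does not spell out the "cyclic operation"; the sketch assumes it means repeating the bit sequence until every embeddable character has a bit, and it assumes 8 bits per ASCII code, most significant bit first. The function names and the `capacity` parameter are hypothetical.

```python
from itertools import cycle, islice


def digits_to_bits(digits, bits_per_char=8):
    """Convert a digit string to a flat list of bits via ASCII codes."""
    bits = []
    for ch in digits:
        code = ord(ch)  # e.g. '1' -> 49 -> 00110001
        bits.extend((code >> k) & 1
                    for k in range(bits_per_char - 1, -1, -1))
    return bits


def repeat_to_capacity(bits, capacity):
    """Cycle through the bit sequence until `capacity` bits are produced."""
    return list(islice(cycle(bits), capacity))
```

For the watermark used in the experiment, `digits_to_bits("1803121134")` yields 80 bits; `repeat_to_capacity` then pads or truncates that sequence to the number of embeddable characters.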

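Step 4: watermark embedding preprocessing.

The character widths and character spacings derived from the character boundary array obtained in Step 2 are sorted, and the median character width and the median character spacing are taken as thresholds T₄ and T₅ respectively. The boundaries of text characters whose width is smaller than […] are set to 0 as a special mark.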

Step 5: watermark embedding.

First, the first and last text characters of each recognized text line are not used for embedding, and only the even-numbered text characters of each text line are candidates. If neither the left nor the right boundary of the current even-numbered text character is 0, and neither the right boundary of the text character to its left nor the left boundary of the text character to its right is 0, the current text character is judged to be an embeddable text character; otherwise it is non-embeddable.

Second, the left and right character spacings of each embeddable text character are compared: if the left spacing and the right spacing are both larger than 4 × T₅, the character is changed to a non-embeddable text character; otherwise it remains embeddable.

Third, if the watermark bit currently to be embedded is 0, the embeddable text character is moved so that its left-side character spacing is smaller than its right-side character spacing; if the bit is 1, the character is moved so that its left-side character spacing is larger than its right-side character spacing.

Fourth, the watermark is embedded into the embeddable text characters of all text lines, and the watermarked text image is printed and scanned to obtain a scanned watermarked image.
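The spacing rule of Step 5 can be sketched on gap widths alone, without touching pixels. The patent does not say how far a character is moved; the sketch below assumes the total free space around the character is preserved and that the two gaps must differ by at least a hypothetical margin `delta`. The function names are likewise hypothetical.

```python
def is_embeddable(left_gap, right_gap, t5):
    """Spacing test as stated: reject when both gaps exceed 4 * T5."""
    return not (left_gap > 4 * t5 and right_gap > 4 * t5)


def embed_bit(left_gap, right_gap, bit, delta=2):
    """Return (new_left_gap, new_right_gap) after shifting the character.

    bit 0 -> left gap smaller than right gap
    bit 1 -> left gap larger than right gap
    `delta` (assumed, >= 1) is the minimum difference between the gaps.
    """
    total = left_gap + right_gap
    if total < 1:
        raise ValueError("no free space to shift the character")
    small = max(0, (total - delta) // 2)  # the narrower of the new gaps
    large = total - small
    return (small, large) if bit == 0 else (large, small)
```

For example, a character with 5 pixels of space on each side becomes `(4, 6)` for bit 0 and `(6, 4)` for bit 1, a one-pixel shift. A larger `delta` trades invisibility for robustness, which is the balance the disclosure describes.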

Step 6: image correction.

The upper half of the scanned watermarked image is inverted into a white-text-on-black-background image, and the white border produced by skew during scanning is removed. The white-on-black image is dilated in the horizontal and vertical directions so that separate characters are connected into long line segments. Edge detection is performed on the dilated image, and the Hough transform is applied to the edge points to find the inclination angle θ of the longest line segment. The scanned text image is rotated by the angle θ, and the black border produced by the rotation is removed.
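The angle-finding part of Step 6 can be illustrated with a small Hough-style accumulator written in plain NumPy. This is a sketch under assumptions, not the patent's implementation: it takes edge points as input (inversion, dilation, and edge detection are not shown), searches only near-horizontal angles, and returns the angle of the line with the most votes as a stand-in for "the longest line segment". The name `estimate_skew_degrees` and its parameters are hypothetical.

```python
import numpy as np


def estimate_skew_degrees(edge_points, max_skew=10.0, step=0.1):
    """Estimate the inclination (degrees) of the dominant text baseline.

    edge_points: (N, 2) array of (x, y) pixel coordinates, y pointing down.
    The angle is measured in image coordinates.
    """
    pts = np.asarray(edge_points, dtype=float)
    x, y = pts[:, 0], pts[:, 1]

    best_angle, best_votes = 0.0, -1
    for skew in np.arange(-max_skew, max_skew + step / 2, step):
        a = np.deg2rad(skew)
        # Points on a line y = x*tan(a) + c share the same value of
        # y*cos(a) - x*sin(a); this is the Hough normal-form distance.
        rho = y * np.cos(a) - x * np.sin(a)
        bins = np.round(rho).astype(int)
        votes = int(np.bincount(bins - bins.min()).max())
        if votes > best_votes:
            best_angle, best_votes = float(skew), votes
    return best_angle
```

The scanned image would then be rotated by the estimated angle to undo the skew; the sign of the rotation depends on the convention of the rotation routine used.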

Step 7: watermark extraction preprocessing.

Text line recognition is performed according to Step 1, and text character recognition according to Step 2. The character widths and character spacings derived from the resulting character boundary array are sorted, and the median character width and the median character spacing are taken as thresholds T'₄ and T'₅ respectively. The boundaries of text characters whose width is smaller than […] are set to 0 as a special mark.

Step 8: watermark extraction.

First, the first and last text characters of each recognized text line are not used for extraction, and only the even-numbered text characters of each text line are candidates. If neither the left nor the right boundary of the current even-numbered text character is 0, and neither the right boundary of the text character to its left nor the left boundary of the text character to its right is 0, the current text character is judged to be an extractable text character; otherwise it is non-extractable.

Second, the left and right character spacings of each extractable text character are compared: if the left spacing and the right spacing are both larger than 4 × T'₅, the character is changed to a non-extractable text character; otherwise it remains extractable.

Third, if the left-side character spacing of the current extractable text character is smaller than its right-side character spacing, the extracted watermark bit is 0; if the left-side character spacing is larger than the right-side character spacing, the extracted watermark bit is 1.

Fourth, the watermark is extracted from the extractable text characters of all text lines, the extracted bits are concatenated into a binary watermark sequence, and the ASCII codes in the binary watermark sequence are converted into the corresponding Arabic numeral characters to obtain the finally extracted watermark digits.
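The third and fourth parts of Step 8 can be sketched as below. The input is assumed to be a list of `(left_gap, right_gap)` pairs, one per extractable character, in reading order; 8 bits per ASCII code, most significant bit first, is also an assumption. The text does not say what to do when the two gaps are equal, so the sketch simply skips such characters, which would misalign later bits in practice. Function names are hypothetical.

```python
def extract_bits(gap_pairs):
    """Read one bit per character from its left and right spacing."""
    bits = []
    for left_gap, right_gap in gap_pairs:
        if left_gap < right_gap:
            bits.append(0)
        elif left_gap > right_gap:
            bits.append(1)
        # equal gaps: no rule is given in the text, skipped (assumption)
    return bits


def bits_to_digits(bits, bits_per_char=8):
    """Group bits into ASCII codes and decode them to characters."""
    chars = []
    for i in range(0, len(bits) - bits_per_char + 1, bits_per_char):
        code = 0
        for b in bits[i:i + bits_per_char]:
            code = (code << 1) | b
        chars.append(chr(code))
    return "".join(chars)
```

Because extraction compares only the two gaps of the same character, it needs neither the original image nor absolute distances, which is why uniform scaling introduced by printing and scanning does not change the decoded bit.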

The effects of the invention are further described below with reference to the experimental figures.

In the experiment, the printer was an HP Color Laser MFP 178nw, and the printer's built-in scanner was used for scanning; the printing resolution and the scanning resolution were both 300 dpi. The quality of the watermarked image is evaluated by SSIM (structural similarity), and the resistance of the watermark to attack is evaluated by DR (correct extraction rate). Watermark embedding and extraction on a text image using the method of the invention proceeded as follows.

Referring to FIG. 2, FIG. 2(a) shows the carrier image, of size 4958 × 7017; the watermark information to be embedded is 1803121134. Using the method of the invention, the resulting watermarked image is shown in FIG. 2(b), with SSIM = 0.9429; the image after printing, scanning, and correction is shown in FIG. 2(c); and the extracted watermark information is shown in FIG. 2(d), with DR = 94.23%.

As can be seen from FIG. 2(b), the carrier image retains good invisibility after the watermark is embedded. As can be seen from FIG. 2(d), the watermark can still be correctly extracted from the watermarked image after printing and scanning, which shows that the method of the invention is strongly robust against print-scan attacks.
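For reference, the two metrics can be written out as follows. The first expression is the standard definition of SSIM between two image windows x and y, where μ denotes the window mean, σ² the variance, σ_xy the covariance, and C₁, C₂ small stabilizing constants. The patent does not define DR; the second expression is only an assumed definition, the fraction of watermark units recovered correctly.

```latex
\mathrm{SSIM}(x, y) =
  \frac{(2\mu_x \mu_y + C_1)\,(2\sigma_{xy} + C_2)}
       {(\mu_x^2 + \mu_y^2 + C_1)\,(\sigma_x^2 + \sigma_y^2 + C_2)}

\mathrm{DR} = \frac{N_{\text{correct}}}{N_{\text{total}}} \times 100\%
```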

Claims

1. A print-scan-resistant text image digital watermarking method, comprising two processes, watermark embedding and watermark extraction;

the watermark embedding process comprises the following specific steps:

(1) text line recognition:

(2) text character recognition:

2a) counting the black pixels in each pixel column of each recognized text line;

(3) watermark information preprocessing:

(4) watermark embedding preprocessing:

setting 0 as a special mark at the boundaries of the text characters;

(5) watermark embedding:

the watermark extraction process comprises the following specific steps:

(6) image correction:

(7) watermark extraction preprocessing:

setting 0 as a special mark at the boundaries of the text characters;

(8) watermark extraction:

8b) comparing the left and right character spacings of each extractable text character: if the left spacing and the right spacing are both larger than 4 × T'₅, the character is changed to a non-extractable text character; otherwise it remains extractable;

2. The print-scan-resistant text image digital watermarking method of claim 1, wherein the black pixel in step 1a) is a pixel with a pixel value of 0.

3. The print-scan-resistant text image digital watermarking method of claim 1, wherein the pixel row width in step 1a) is the number of pixels from the first black pixel to the last black pixel in the pixel row, inclusive of both.