CN111861846A

CN111861846A - Electronic document digital watermark processing method and system

Info

Publication number: CN111861846A
Application number: CN202010660167.6A
Authority: CN
Inventors: 曾国坤
Original assignee: Harbin Institute of Technology Shenzhen
Current assignee: Harbin Institute of Technology Shenzhen
Priority date: 2020-07-10
Filing date: 2020-07-10
Publication date: 2020-10-30

Abstract

The invention relates to a digital watermark processing method and system for electronic documents. The method includes a watermark embedding step and a watermark extraction step. The watermark embedding step includes: acquiring user permissions; collecting watermark information, where the watermark information includes user information; uploading the watermark information to a server; receiving the watermark information and storing it in a database; extracting a plurality of features from the watermark information according to a predetermined algorithm and fusing the features to generate a digital watermark; and, after converting the electronic document into an image, embedding the digital watermark in the image and restoring the image to an electronic document. The watermark extraction step is the reverse of the watermark embedding step. The invention addresses the low embedding capacity and poor robustness of the prior art, and also offers a better encryption effect, a large key space, strong sensitivity, and resistance to exhaustive and statistical attacks.

Description

Electronic document digital watermark processing method and system

Technical field

The invention relates to the technical field of digital watermark processing, and in particular to a digital watermark processing method and system for electronic documents.

Background

The rapid development of computer networks and information technology has accelerated office automation and e-commerce. Large amounts of digital information such as videos, images, emails and documents are transmitted over networks, and the resulting problems of infringement and copyright protection have become increasingly prominent. Digital watermarking has therefore received extensive attention and research. Digital watermarking in multimedia data relies mainly on the large amount of redundant information in the carrier, but the data redundancy of electronic document images is very limited compared with other multimedia data. This has constrained the development of watermarking based on text images, which suffers from problems such as low embedding capacity and poor robustness.

Summary of the invention

In view of this, it is necessary to provide a digital watermark processing method and system for electronic documents that not only solves the above problems but also offers a better encryption effect, a large key space, strong sensitivity, and resistance to exhaustive and statistical attacks.

To achieve the above purpose, the present invention adopts the following technical solutions.

The present invention first provides a digital watermark processing method for electronic documents, including a watermark embedding step, which specifically includes:

obtaining user permissions;

collecting watermark information, where the watermark information includes user information;

uploading the watermark information to a server;

receiving the watermark information and storing it in a database;

extracting multiple features from the watermark information according to a predetermined algorithm and fusing the features to generate a digital watermark;

after converting the electronic document into an image, embedding the digital watermark in the image and then restoring the image to an electronic document.

The above method further includes a watermark extraction step, which specifically includes:

converting the electronic document with the embedded digital watermark into an image;

processing the image according to a predetermined algorithm to extract the digital watermark;

converting the digital watermark into a watermark image by an inverse operation.

In the above method, the watermark information is the user's biometric information, including one or more of the user's facial features, hand features, fingerprint features and voice features.

The present invention further provides a digital watermark processing system for electronic documents, including a watermark embedding unit, which specifically includes:

an authorization module for obtaining user permissions;

an information collection module for collecting watermark information, where the watermark information includes user information;

an information upload module for uploading the watermark information to a server;

a database access module for receiving the watermark information and storing it in the database of the server;

a watermark processing module for extracting multiple features from the watermark information according to a predetermined algorithm and fusing the features to generate a digital watermark;

a watermark embedding module for converting the electronic document into an image, embedding the digital watermark in the image, and then restoring the image to an electronic document.

The above system further includes a watermark extraction unit, which specifically includes:

a conversion module for converting the electronic document with the embedded digital watermark into an image;

an extraction module for processing the image according to a predetermined algorithm to extract the digital watermark;

a restoration module for converting the digital watermark into a watermark image by an inverse operation.

In the above system, the electronic document digital watermark processing system consists of a mobile client and a server side. The mobile client includes the authorization module, the information collection module and the information upload module. The server side includes the database, the database access module, the watermark processing module, the watermark embedding module, a watermark decryption module, a search module and the extraction module.

In the above system, the watermark information is the user's biometric information, including one or more of the user's facial features, hand features, fingerprint features and voice features;

the mobile client is installed on a mobile phone, and the information collection module includes the phone's camera, microphone, fingerprint reader and/or face recognizer.

The present invention also provides a digital watermark processing method for electronic documents, including a watermark embedding step of embedding a watermark image in an electronic document, which specifically includes:

converting the electronic document into a carrier image with a bit depth of 8;

selecting white pixels to reconstruct the carrier image, that is, using a discrete wavelet transform to decompose the carrier image into four frequency sub-bands, LL, HL, LH and HH, and selecting the HH sub-band as the watermark embedding area;

performing singular value decomposition on the watermark image;

embedding the watermark image in the watermark embedding area, that is, replacing the singular values of the watermark embedding area with the singular values of the watermark image and applying the inverse operation to obtain a reconstructed carrier sub-image with the embedded watermark;

restoring the reconstructed carrier sub-image to the electronic document image according to the original pixel positions.

The pixels of the watermarked electronic document image are extracted to obtain the watermarked carrier image;

a discrete wavelet transform is performed on the watermarked carrier image, and the HH sub-band is selected as the watermark extraction area;

singular value decomposition is performed on the watermark extraction area, and its singular values are extracted;

the watermark image is restored from the obtained singular values by an inverse operation.
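
The extraction steps can be written compactly as follows. This is a sketch under two assumptions that the text does not state: the orthogonal factors $U_s$ and $V_s$ from the singular value decomposition of the watermark image, $W = U_s S_s V_s^T$, are retained at embedding time, and $HH'$ denotes the HH sub-band of the watermarked carrier image.

$$HH' = U' S' V'^T, \qquad \hat{W} = U_s S' V_s^T$$

If neither rounding nor an attack has changed the sub-band since embedding, its singular values $S'$ equal $S_s$ and the recovered image $\hat{W}$ equals $W$. Otherwise $\hat{W}$ is only an approximation of $W$.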

In the above method, the watermark image contains the user's biometric information, including one or more of the user's facial features, hand features, fingerprint features and voice features.

Compared with traditional digital watermarking technology, the present invention has the following outstanding advantages:

1. Based on the pixel distribution of electronic document images, the present invention proposes a digital watermark processing method and system for electronic documents. Electronic documents contain a large number of white pixels interspersed between the lines of text, and this abundance of pixels with the same value is a characteristic that natural images do not have. These pixels are reconstructed into a new carrier image, the watermark is embedded in it, and the pixels are then restored to their original positions. This reconstruct-and-restore approach also scrambles the embedded watermark. The watermark embedding method operates in the frequency domain of the image and combines the discrete wavelet transform with singular value decomposition. Experimental results show that the method has good robustness.

2. By analyzing the capacity of the white pixels in an electronic document, the present invention can use two to three biometric features as watermark images. Experiments show that embedding multiple watermarks does not greatly affect the quality of the carrier image, and the watermarking algorithm retains good invisibility and robustness.

Description of drawings

Fig. 1 is a schematic flowchart of the watermark embedding step in Embodiment 1 of the present invention;

Fig. 2 is a schematic flowchart of the watermark extraction step in Embodiment 1 of the present invention;

Fig. 3 is a block diagram of the electronic document digital watermark processing system in Embodiment 2 of the present invention;

Fig. 4 is a schematic flowchart of the watermark embedding step in Embodiment 3 of the present invention;

Fig. 5 is a schematic flowchart of the watermark extraction step in Embodiment 3 of the present invention;

Fig. 6a shows the carrier image used in the simulation experiment in Embodiment 3 of the present invention;

Fig. 6b shows the watermark image used in the simulation experiment in Embodiment 3 of the present invention;

Fig. 6c shows the image after the watermark image is embedded in the simulation experiment in Embodiment 3 of the present invention;

Fig. 6d shows the watermark image extracted in the simulation experiment in Embodiment 3 of the present invention;

Fig. 7 shows the electronic document image after watermark embedding in Embodiment 3 of the present invention;

Fig. 8a shows the carrier image of the Chinese document used in the multi-feature watermarking test on electronic documents in Embodiment 3 of the present invention;

Figs. 8b, 8c and 8d show the watermark images to be embedded in the image shown in Fig. 8a;

Fig. 8e shows Fig. 8a after the watermark images of Figs. 8b, 8c and 8d are embedded;

Fig. 9a shows the carrier image of the English document used in the multi-feature watermarking test on electronic documents in Embodiment 3 of the present invention;

Figs. 9b, 9c and 9d show the watermark images to be embedded in the image shown in Fig. 9a;

Fig. 9e shows Fig. 9a after the watermark images of Figs. 9b, 9c and 9d are embedded.

The realization, functions and principles of the present invention are further explained in the detailed description with reference to the drawings.

Detailed description

The invention is further described below with reference to the drawings and specific embodiments.

Embodiment 1:

As shown in Fig. 1, this embodiment provides a digital watermark processing method for electronic documents, which mainly includes a watermark embedding step and a watermark extraction step. The watermark embedding step is the process of embedding a watermark image in an electronic document. The watermark extraction step is the process of extracting the original watermark image from an electronic document in which a digital watermark has been embedded, and is mainly used for purposes such as identity authentication.

The watermark embedding step mainly includes the following steps:

S11: obtain user permissions;

S12: collect watermark information, where the watermark information includes user information;

S13: upload the watermark information to a server;

S14: receive the watermark information and store it in a database;

S15: extract multiple features from the watermark information according to a predetermined algorithm and fuse the features to generate a digital watermark;

S16: after converting the electronic document into an image, embed the digital watermark in the image and then restore the image to an electronic document.

Specifically, obtaining user permissions in step S11 is the process of logging in with a registered user's account and password and, once verification succeeds, obtaining that user's operating permissions. The user may be the author of the electronic document, its copyright owner, or another interested party.

Step S12 collects the watermark information. In this embodiment, the watermark information is the original authentication or identification information to be embedded in the electronic document as a digital watermark. It may be the user's biometric information, including one or more of the user's facial features, hand features, fingerprint features and voice features.

After collection is complete, the watermark information is uploaded to the server.

The server then receives the watermark information and stores it in the database.

At the same time, multiple features are extracted from the watermark information according to a predetermined algorithm and fused to generate a digital watermark.

Finally, after the electronic document is converted into an image, the digital watermark is embedded in the image, and the image is restored to an electronic document.

The electronic documents referred to in this embodiment include, but are not limited to, Office documents, PDF documents and bitmap documents. Once the watermark has been embedded, the security and authenticity of the electronic document during transmission can be ensured.

Referring to Fig. 2, the watermark extraction step mainly includes the following steps:

S21: convert the electronic document with the embedded digital watermark into an image;

S22: process the image according to a predetermined algorithm to extract the digital watermark;

S23: convert the digital watermark into a watermark image by an inverse operation.

The watermark extraction step is mainly used to determine whether the digital watermark is intact or has been attacked. It is essentially the reverse of the watermark embedding step and is not described in detail here.

During transmission, digital images are inevitably affected by operations such as compression and scanning; these are called unintentional attacks. Some pirates deliberately damage digital products, tamper with image content or even forge watermark information; these are called malicious attacks. Either kind of attack affects detection of the extracted watermark. The watermark embedding and extraction technique of this embodiment can effectively protect electronic documents from being damaged or tampered with.

Embodiment 2:

As shown in Fig. 3, this embodiment provides an electronic document digital watermark processing system 100, which consists of two main parts. The first is a mobile client, which can be installed on mobile smart devices such as mobile phones. Its main functions are user registration and uploading the user's biometric pictures, which serve as the feature information for the watermark embedded later. The second is a server side, which can be deployed on a remote Internet server or a cloud server. It is mainly responsible for storing and managing the data and documents uploaded by the mobile client and for processing electronic documents. The electronic document digital watermark processing system 100 can be divided into a watermark embedding unit 10 and a watermark extraction unit 20, where the watermark embedding unit 10 includes:

an authorization module 110, located on the mobile client, for obtaining user permissions;

an information collection module 120, located on the mobile client, for collecting watermark information, where the watermark information includes user information;

an information upload module 130, located on the mobile client, for uploading the watermark information to the server;

a database access module 140, located on the server side, for receiving the watermark information and storing it in the database of the server;

a watermark processing module 150, located on the server side, for extracting multiple features from the watermark information according to a predetermined algorithm and fusing the features to generate a digital watermark;

a watermark embedding module 160, located on the server side, for converting the electronic document into an image, embedding the digital watermark in the image, and then restoring the image to an electronic document.

The watermark extraction unit 20 is located entirely on the server side and specifically includes:

a conversion module 210 for converting the electronic document with the embedded digital watermark into an image;

an extraction module 220 for processing the image according to a predetermined algorithm to extract the digital watermark;

a restoration module 230 for converting the digital watermark into a watermark image by an inverse operation.

The extracted watermark image can be returned to the mobile client for the user to view, save and authenticate.

This embodiment uses the user's biometric information as the watermark information, including but not limited to one or more of the user's facial features, hand features, fingerprint features and voice features.

The mobile client is installed on a smart mobile device such as a mobile phone. The information collection module includes the device's camera, microphone, fingerprint reader and/or face recognizer, which are used to collect facial features, hand features, voice features, fingerprint features and so on.

This embodiment combines multiple biometric features with digital watermarking, which improves the security of the document system. It makes it possible to protect the copyright of digital products, prove the authenticity of document content, trace piracy, and determine whether the carrier has been tampered with. If a user needs to verify the authenticity of an electronic document or trace its copyright, the watermark can be extracted from the document for further tamper detection and biometric authentication.

Embodiment 3:

As shown in Fig. 4, this embodiment provides a specific digital watermark processing method for electronic documents. Statistics show that electronic document images contain a large number of white pixels. Based on this finding, the method takes the electronic document as the carrier image and uses suitable techniques for watermark embedding and extraction. It consists of two main steps: a watermark embedding step and a watermark extraction step that reverses it.

The watermark embedding step specifically includes:

S31: convert the electronic document into a carrier image with a bit depth of 8;

S32: select white pixels to reconstruct the carrier image, that is, use a discrete wavelet transform to decompose the carrier image into four frequency sub-bands, LL, HL, LH and HH, and select the HH sub-band as the watermark embedding area;

S33: perform singular value decomposition on the watermark image;

S34: embed the watermark image in the watermark embedding area, that is, replace the singular values of the watermark embedding area with the singular values of the watermark image and apply the inverse operation to obtain a reconstructed carrier sub-image with the embedded watermark;

S35: restore the reconstructed carrier sub-image to the electronic document image according to the original pixel positions.

The pixel values of an image with a bit depth of 8 are generally concentrated in the middle and upper part of the gray-level range. Statistics over a large number of electronic documents show that their pixel values are concentrated between 80 and 255 and that they contain a large number of white pixels, a pattern that appears only in documents.
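
The statistic described above can be checked on any 8-bit grayscale page with a few lines of code. The sketch below is illustrative only: the function name `pixel_stats` and the list-of-lists image representation are assumptions made here, not part of the patent.

```python
def pixel_stats(image):
    """Return (fraction of pixels in 80..255, fraction equal to 255).

    `image` is a list of rows, each a list of 8-bit gray values (0..255).
    """
    pixels = [v for row in image for v in row]
    total = len(pixels)
    if total == 0:
        raise ValueError("image is empty")
    in_range = sum(1 for v in pixels if 80 <= v <= 255)
    white = sum(1 for v in pixels if v == 255)
    return in_range / total, white / total


# A tiny hypothetical "page": mostly white, with two darker text pixels.
page = [
    [255, 255, 255, 255],
    [255,   0,  90, 255],
    [255, 255, 255, 255],
]
in_range_fraction, white_fraction = pixel_stats(page)
```

A high `white_fraction` indicates how many pixels are available for building the carrier image.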

The watermark image contains the user's biometric information, including one or more of the user's facial features, hand features, fingerprint features and voice features.

In step S32, the HH sub-band is selected as the watermark embedding area because it mainly represents the edge and texture information of the original carrier image and contains little of the main content, so embedding the watermark image in this sub-band has little effect on the quality of the carrier image. The watermark embedding step of this embodiment uses the following algorithm:

Assume the input is an electronic document image $D$ and a watermark image $W_{p \times q}$. The pixels with value 255, that is, the white pixels, are extracted from the document image and reconstructed into a new carrier image $A_{m \times n}$, and a matrix $P = \{p_{ij}\}$ records the position of each extracted pixel in the original carrier image.
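
The white-pixel extraction can be sketched as follows. The row-major scan order, the function name `extract_white_pixels` and the choice to take the first m × n white pixels are assumptions made for illustration; the patent does not specify them.

```python
def extract_white_pixels(doc, m, n, white=255):
    """Collect the first m*n white pixels of `doc` in row-major order.

    Returns (carrier, positions): `carrier` is an m x n list of lists and
    positions[i][j] is the (row, col) in `doc` that carrier[i][j] came from,
    playing the role of the position matrix P.
    """
    coords = [(r, c)
              for r, row in enumerate(doc)
              for c, v in enumerate(row)
              if v == white]
    if len(coords) < m * n:
        raise ValueError("not enough white pixels for an m x n carrier")
    coords = coords[:m * n]
    positions = [coords[i * n:(i + 1) * n] for i in range(m)]
    carrier = [[doc[r][c] for (r, c) in row] for row in positions]
    return carrier, positions


# Hypothetical 3 x 3 document image with six white pixels.
doc = [
    [255,   0, 255],
    [255, 255,  30],
    [  0, 255, 255],
]
carrier, positions = extract_white_pixels(doc, 2, 2)
```

Keeping `positions` alongside `carrier` is what later allows the modified pixels to be written back to their original locations.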

A one-level discrete wavelet decomposition is applied to the reconstructed carrier image $A_{m \times n}$, producing four sub-images: LL, HL, LH and HH.
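
A one-level decomposition of this kind can be sketched with the Haar wavelet. The patent does not name the wavelet, so Haar is an assumption here, as are the function names. In this sketch the first letter of each sub-band name is the filter applied along the rows and the second is the filter applied along the columns.

```python
def haar_dwt2(image):
    """One-level 2-D Haar DWT of a list-of-lists image with even dimensions.

    Returns (LL, LH, HL, HH), each half the height and half the width.
    """
    rows, cols = len(image), len(image[0])
    if rows % 2 or cols % 2:
        raise ValueError("image dimensions must be even")
    LL, LH, HL, HH = [], [], [], []
    for r in range(0, rows, 2):
        ll, lh, hl, hh = [], [], [], []
        for c in range(0, cols, 2):
            a, b = image[r][c], image[r][c + 1]          # top row of the 2x2 block
            p, q = image[r + 1][c], image[r + 1][c + 1]  # bottom row of the block
            ll.append((a + b + p + q) / 2)  # low-pass in both directions
            lh.append((a + b - p - q) / 2)  # low along rows, high along columns
            hl.append((a - b + p - q) / 2)  # high along rows, low along columns
            hh.append((a - b - p + q) / 2)  # high-pass in both directions
        LL.append(ll)
        LH.append(lh)
        HL.append(hl)
        HH.append(hh)
    return LL, LH, HL, HH


def haar_idwt2(LL, LH, HL, HH):
    """Inverse of haar_dwt2: rebuild the full-size image from four sub-bands."""
    rows, cols = len(LL), len(LL[0])
    image = [[0.0] * (2 * cols) for _ in range(2 * rows)]
    for i in range(rows):
        for j in range(cols):
            s, v, h, d = LL[i][j], LH[i][j], HL[i][j], HH[i][j]
            image[2 * i][2 * j] = (s + v + h + d) / 2
            image[2 * i][2 * j + 1] = (s + v - h - d) / 2
            image[2 * i + 1][2 * j] = (s - v + h - d) / 2
            image[2 * i + 1][2 * j + 1] = (s - v - h + d) / 2
    return image


# A 4 x 4 carrier made only of white pixels, as produced by the extraction.
white_carrier = [[255] * 4 for _ in range(4)]
LL, LH, HL, HH = haar_dwt2(white_carrier)
```

With the Haar wavelet, a carrier made only of pixels of value 255 has all-zero detail sub-bands before embedding, so whatever appears in HH after embedding comes from the watermark.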

Singular value decomposition is applied to the HH sub-band using the following formula:

$$HH = U_h \times S_h \times V_h^T$$

where $U_h$ is an $m \times m$ orthogonal matrix, $V_h$ is an $n \times n$ orthogonal matrix, $S_h = \mathrm{diag}(\sigma_1, \sigma_2, \ldots, \sigma_n)$, and $\sigma_i$ are the singular values of the image.

Then singular value decomposition is applied to the watermark image $W$:

$$W = U_s \times S_s \times V_s^T$$

Next, the singular values $S_h$ of the reconstructed carrier image are replaced by the singular values $S_s$ of the watermark image.

The inverse of the singular value decomposition is then applied, as in the following formula, to obtain the modified sub-image:

$$HH_{new} = U_h \times S_s \times V_s^T$$

Finally, the inverse discrete wavelet transform generates the watermarked carrier image, and the pixels of this carrier image are written back to their positions in the original document image to obtain the watermarked electronic document image.
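
The final write-back step can be sketched as below. The function name `restore_pixels`, the position format (a grid of `(row, col)` pairs) and the rounding and clipping to the 8-bit range are assumptions made for illustration; the patent only states that pixels are restored to their original positions.

```python
def restore_pixels(doc, positions, carrier):
    """Return a copy of `doc` with carrier[i][j] written to positions[i][j].

    Values are rounded and clipped to 0..255 so the result stays 8-bit.
    """
    out = [list(row) for row in doc]
    for pos_row, val_row in zip(positions, carrier):
        for (r, c), v in zip(pos_row, val_row):
            out[r][c] = max(0, min(255, int(round(v))))
    return out


# Hypothetical values: a 2 x 3 document and a 2 x 2 watermarked carrier.
original_doc = [
    [255,   0, 255],
    [255, 255,  30],
]
pixel_positions = [[(0, 0), (0, 2)], [(1, 0), (1, 1)]]
watermarked_carrier = [[254.6, 255.0], [253.2, 260.0]]
restored = restore_pixels(original_doc, pixel_positions, watermarked_carrier)
```

Pixels that were not part of the carrier, such as the text pixels with values 0 and 30, are left unchanged.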

The concept of the wavelet transform was proposed by the French engineer Morlet in 1974, and the discrete wavelet transform is obtained by discretizing the continuous wavelet. Let the wavelet transform of an arbitrary function $x(t)$ be $W(a, b)$, where $a$ and $b$ are the scale factor and the translation factor respectively. The values of $a$ and $b$ in the wavelet basis function are restricted to a set of discrete points, that is, discrete sampling is performed as shown in the following formula:

The wavelet $\Psi_{a,b}(t)$ is then calculated as follows:

The discrete wavelet transform (DWT) is defined as:

$$DWT(m, n) = \int_R f(t)\, \Psi_{m,n}(t)\, dt$$
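
For reference, a common textbook form of the discrete sampling and of the resulting wavelet family is given below. The constants $a_0 > 1$ and $b_0 > 0$ and this exact form are assumptions; they are not taken from the patent text.

$$a = a_0^{m}, \qquad b = n\, b_0\, a_0^{m}, \qquad m, n \in \mathbb{Z}$$

$$\Psi_{a,b}(t) = |a|^{-1/2}\, \Psi\!\left(\frac{t - b}{a}\right), \qquad \Psi_{m,n}(t) = a_0^{-m/2}\, \Psi\!\left(a_0^{-m} t - n b_0\right)$$

Substituting the sampled values of $a$ and $b$ into $\Psi_{a,b}(t)$ gives $\Psi_{m,n}(t)$, the function that appears in the DWT integral.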

离散小波变换是基于有限时域和变换频率的小波变换，它是一个频域技术。图像是以二维矩阵的形式存储的，因此图像的小波变换是基于二维离散小波的变换，这种算法将一维小波变换分别应用在图像等二维信号的行和列上，以此构成二维小波变换，可以对图像进行有效的时频分解。离散小波变换首先将载体图像变换到频域，然后根据变换后的水印图像的频率系数修改原载体图像的频率系数，以此来得到鲁棒的嵌入水印的图像。Discrete wavelet transform is a wavelet transform based on finite time domain and transform frequency, which is a frequency domain technique. The image is stored in the form of a two-dimensional matrix, so the wavelet transform of the image is based on the two-dimensional discrete wavelet transform. This algorithm applies the one-dimensional wavelet transform to the rows and columns of the two-dimensional signals such as images, etc. Two-dimensional wavelet transform can perform effective time-frequency decomposition on images. Discrete wavelet transform firstly transforms the carrier image to frequency domain, and then modifies the frequency coefficient of the original carrier image according to the frequency coefficient of the transformed watermark image, so as to obtain a robust watermark-embedded image.

离散小波变换按层级分解图像，提供空间和频率的图像描述。它将图像按三个基本空间方向，即水平、垂直和对角，进行分解，结果得到四个不同的组成部分，分别为LL频带，LH频带，HL频带和HH频带。这里的第一个字母表示对载体图像的行进行低通频率操作或者高通频率操作，第二个字母指将滤波器应用于载体图像的列。LL频带表示水平和垂直方向经过低通滤波后所得的细节信息，LH频带表示水平方向低通，垂直方向高通滤波后所得的细节信息，HL频带表示水平方向高通，垂直方向低通滤波后所得的细节信息，HH频带表示水平和垂直方向经过高通滤波后所得的细节信息。LL频带是最低的分辨率水平，由载体图像近似部分组成，集中了其大部分信息，刻画了主体特征，其他三个频带则分别保持了载体图像各方向的边缘细节信息。Discrete wavelet transforms decompose images hierarchically, providing spatial and frequency image descriptions. It decomposes the image in three basic spatial directions, namely horizontal, vertical and diagonal, resulting in four different components, namely LL band, LH band, HL band and HH band. The first letter here refers to low-pass frequency operation or high-pass frequency operation on the rows of the carrier image, and the second letter refers to applying the filter to the columns of the carrier image. The LL band represents the detail information obtained by low-pass filtering in the horizontal and vertical directions, the LH band represents the detail information obtained after low-pass filtering in the horizontal direction and the high-pass filtering in the vertical direction, and the HL band represents the high-pass in the horizontal direction and the low-pass filtering in the vertical direction. Detail information, the HH band represents the detail information obtained after high-pass filtering in the horizontal and vertical directions. The LL band is the lowest resolution level and consists of the approximate part of the carrier image, which concentrates most of its information and depicts the main features. The other three bands maintain the edge detail information of the carrier image in each direction.

At the second level of decomposition, any sub-band can be decomposed further into four sub-bands; the higher the decomposition level, the more robust the watermarked image. At every decomposition level, the wavelet coefficients of the LL band are larger in magnitude than those of the other three bands. The human visual system is more sensitive to the low-frequency part, that is, the LL band, so the watermark is best embedded in the other three bands to better preserve the quality of the original image.

After a two-dimensional wavelet transform, the image is decomposed into four sub-blocks, each one quarter of the original size and containing the wavelet coefficients of the corresponding band; this is equivalent to sampling every other point in the horizontal and vertical directions. The algorithm of this embodiment selects the HH sub-band (the high-frequency sub-band) as the watermark embedding area, because this band represents the edge and texture information of the image, and embedding the watermark there has little effect on the quality of the original image.
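The one-level decomposition described above can be sketched as follows. This is a minimal illustration that assumes the Haar wavelet (the text does not name a wavelet basis) and uses hypothetical function names `haar_dwt2` and `haar_idwt2`; the band naming follows the convention in the text, where the first letter is the row filter and the second is the column filter.

```python
import numpy as np

SQRT2 = np.sqrt(2.0)


def haar_dwt2(image):
    """One level of a 2-D Haar DWT (assumed wavelet); returns LL, LH, HL, HH."""
    a = np.asarray(image, dtype=float)
    if a.shape[0] % 2 or a.shape[1] % 2:
        raise ValueError("image height and width must be even")
    # First letter: low-/high-pass filter along each row, keeping every other sample.
    row_lo = (a[:, 0::2] + a[:, 1::2]) / SQRT2
    row_hi = (a[:, 0::2] - a[:, 1::2]) / SQRT2
    # Second letter: low-/high-pass filter along each column.
    ll = (row_lo[0::2] + row_lo[1::2]) / SQRT2
    lh = (row_lo[0::2] - row_lo[1::2]) / SQRT2
    hl = (row_hi[0::2] + row_hi[1::2]) / SQRT2
    hh = (row_hi[0::2] - row_hi[1::2]) / SQRT2
    return ll, lh, hl, hh


def haar_idwt2(ll, lh, hl, hh):
    """Inverse of haar_dwt2: rebuild the image from its four sub-bands."""
    rows, cols = ll.shape
    row_lo = np.empty((2 * rows, cols))
    row_hi = np.empty((2 * rows, cols))
    row_lo[0::2], row_lo[1::2] = (ll + lh) / SQRT2, (ll - lh) / SQRT2
    row_hi[0::2], row_hi[1::2] = (hl + hh) / SQRT2, (hl - hh) / SQRT2
    out = np.empty((2 * rows, 2 * cols))
    out[:, 0::2] = (row_lo + row_hi) / SQRT2
    out[:, 1::2] = (row_lo - row_hi) / SQRT2
    return out


# Stand-in 8x8 "carrier image"; each sub-band is 4x4, one quarter of the original.
carrier = np.random.default_rng(0).integers(0, 256, size=(8, 8))
ll, lh, hl, hh = haar_dwt2(carrier)
restored = haar_idwt2(ll, lh, hl, hh)
```

For a perfectly flat image the HH band is all zeros, which illustrates why changes confined to HH affect only fine edge and texture detail.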

Singular value decomposition (SVD) is a tool from linear algebra. It is an asymmetric orthogonal transformation based on eigenvectors and is widely used in digital image processing. Singular values are stable: they do not change noticeably when the image is disturbed, and they represent the intrinsic properties of the image, so after a watermark is embedded the singular values change very little when the image undergoes a simple attack. Because singular values do not express the visual characteristics of the image, SVD offers good concealment and resistance to geometric attacks, which compensates for the DWT's weak resistance to such attacks. The two techniques therefore combine well: together they improve the robustness of the watermark and the execution speed of the watermarking algorithm, and they offer more choices of embedding coefficients.

The basic idea of image SVD is as follows. If a digital image is represented by a matrix $A \in \mathbb{R}^{m \times n}$ with $\operatorname{rank}(A) = r$, $r \le n$, then the singular value decomposition of $A$ is defined as:

$$A = U D V^{T}$$

where $U = [u_1, u_2, \ldots, u_m] \in \mathbb{R}^{m \times m}$ and $V = [v_1, v_2, \ldots, v_n] \in \mathbb{R}^{n \times n}$ are orthogonal matrices whose column vectors are $u_i$ and $v_i$ respectively; $U$ and $V$ are called the left and right singular matrices of $A$. $D = \operatorname{diag}(\sigma_1, \sigma_2, \ldots, \sigma_m)$ is a diagonal matrix, and $\sigma_i$ ($i = 1, 2, \ldots, m$) are called the singular values of $A$; they are the non-negative square roots of the eigenvalues of $AA^{T}$ or $A^{T}A$ and satisfy $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_r > \sigma_{r+1} = \cdots = \sigma_m = 0$. Geometrically, for a matrix of any size, SVD finds a set of pairwise orthogonal unit vectors such that the vectors obtained by applying the matrix to them are also mutually orthogonal.

The singular values of an image characterize its intrinsic properties and are stable: when the image changes slightly, its singular values do not change much. The singular values of a matrix are also invariant under transposition and rotation. These properties are important for making a digital watermark robust.
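The properties listed above can be checked numerically. The sketch below uses NumPy's `numpy.linalg.svd` on a small random matrix standing in for an image; the matrix, the noise level, and the variable names are illustrative assumptions. Rotation is demonstrated only for a 90-degree turn (`numpy.rot90`), which rearranges pixels exactly; rotation by an arbitrary angle involves interpolation and would preserve the singular values only approximately.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.integers(0, 256, size=(6, 4)).astype(float)  # stand-in "image" matrix

# A = U D V^T, with the singular values returned in descending order.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
reconstructed = U @ np.diag(s) @ Vt

# Transposition and 90-degree rotation leave the singular values unchanged.
s_transposed = np.linalg.svd(A.T, compute_uv=False)
s_rotated = np.linalg.svd(np.rot90(A), compute_uv=False)

# Stability: a small perturbation E moves each singular value by at most
# the spectral norm of E (Weyl's inequality).
E = rng.normal(scale=0.5, size=A.shape)
s_noisy = np.linalg.svd(A + E, compute_uv=False)
max_shift = float(np.max(np.abs(s_noisy - s)))
noise_norm = float(np.linalg.norm(E, 2))

print("largest singular-value shift:", max_shift, "bound:", noise_norm)
```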

As shown in Figure 5, the watermark extraction step is essentially the reverse of the watermark embedding step, as follows:

S41: Extract the pixels of the watermarked electronic document image to obtain the watermarked carrier image;

S42: Apply the discrete wavelet transform to the watermarked carrier image, and select the HH band as the watermark extraction area;

S43: Perform singular value decomposition on the watermark extraction area, and extract its singular values;

S44: Apply the inverse operation, using the extracted singular values to restore the watermark image.
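Steps S42 to S44, together with the matching embedding operation, can be sketched as below. This is an illustration under stated assumptions, not the full procedure of the embodiment: it assumes a one-level Haar DWT, skips the white-pixel reconstruction of S41, keeps pixel values as floats (saving as 8-bit would add rounding error), and assumes the watermark's singular vectors are kept as side information so that S44 can rebuild the image. The function names `dwt2`, `idwt2`, `embed`, and `extract` are hypothetical.

```python
import numpy as np

R2 = np.sqrt(2.0)


def dwt2(a):
    """One-level Haar DWT (assumed wavelet); returns LL, LH, HL, HH."""
    lo, hi = (a[:, 0::2] + a[:, 1::2]) / R2, (a[:, 0::2] - a[:, 1::2]) / R2
    return ((lo[0::2] + lo[1::2]) / R2, (lo[0::2] - lo[1::2]) / R2,
            (hi[0::2] + hi[1::2]) / R2, (hi[0::2] - hi[1::2]) / R2)


def idwt2(ll, lh, hl, hh):
    """Inverse of dwt2."""
    lo = np.empty((2 * ll.shape[0], ll.shape[1]))
    hi = np.empty_like(lo)
    lo[0::2], lo[1::2] = (ll + lh) / R2, (ll - lh) / R2
    hi[0::2], hi[1::2] = (hl + hh) / R2, (hl - hh) / R2
    out = np.empty((lo.shape[0], 2 * lo.shape[1]))
    out[:, 0::2], out[:, 1::2] = (lo + hi) / R2, (lo - hi) / R2
    return out


def embed(carrier, watermark):
    """Replace the singular values of the carrier's HH band with the watermark's."""
    ll, lh, hl, hh = dwt2(np.asarray(carrier, dtype=float))
    wm = np.asarray(watermark, dtype=float)
    if wm.shape != hh.shape or wm.shape[0] != wm.shape[1]:
        raise ValueError("watermark must be square and match the HH band size")
    uw, sw, vwt = np.linalg.svd(wm)
    uh, _, vht = np.linalg.svd(hh)
    hh_marked = uh @ np.diag(sw) @ vht
    # The watermark's singular vectors are kept as a key (an assumption here).
    return idwt2(ll, lh, hl, hh_marked), (uw, vwt)


def extract(marked, key):
    """S42-S44: DWT, SVD of the HH band, then rebuild the watermark."""
    hh = dwt2(np.asarray(marked, dtype=float))[3]      # S42: HH extraction area
    s = np.linalg.svd(hh, compute_uv=False)            # S43: singular values
    uw, vwt = key
    return uw @ np.diag(s) @ vwt                       # S44: inverse operation


rng = np.random.default_rng(2)
carrier = rng.integers(0, 256, size=(16, 16))
watermark = rng.integers(0, 256, size=(8, 8))
marked, key = embed(carrier, watermark)
recovered = extract(marked, key)
```

With no attack and no quantization, `recovered` matches `watermark` up to floating-point error, because the singular values read from the HH band are exactly the ones written during embedding.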

The algorithm of this embodiment was simulated in MATLAB R2012b. The carrier image is a 512×512 electronic document image, shown in Figure 6a; the watermark image is a 256×256 grayscale palm image, shown in Figure 6b; the watermarked image is shown in Figure 6c; and the extracted watermark image is shown in Figure 6d. The invisibility and robustness of the algorithm are evaluated experimentally below and compared with other watermarking algorithms.

A watermarked image is transmitted over a channel, where it is generally exposed to various types of attack, and the embedded watermark suffers varying degrees of damage. Some malicious attackers steal the watermark information, and others go further and try to tamper with or forge the data, so a digital watermarking algorithm must be secure against attacks. Common attacks include compression, noise, filtering, and rotation and scaling. To determine how well a watermarking algorithm resists attacks, it must be tested against defined criteria. The main evaluation criteria for a digital image watermarking algorithm are the invisibility and the stability of the watermark. Testing shows that the method of this embodiment gives the watermarked electronic document high stability and good watermark invisibility.

The evaluation and verification of watermark invisibility is described below.

Watermark invisibility is also called imperceptibility. Intuitively, the human eye cannot perceive a difference between the original image and the watermarked image. The better the invisibility, the harder it is to discover the hidden watermark, which better protects the watermark information; good invisibility also means that the quality of the original image is not greatly affected.

Peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) are generally used as evaluation metrics.

PSNR describes the change in image quality before and after watermark embedding. If the watermark embedding system is regarded as a communication system, the original carrier image is the signal to be transmitted and the embedded watermark is noise added to that signal. PSNR can then be understood as the ratio of the maximum value of the transmitted signal to the noise, and it is calculated as:

$$\mathrm{PSNR} = 10 \log_{10} \frac{I_{\max}^{2}}{\mathrm{MSE}}$$

where $I_{\max}$ is the maximum pixel value (255 for a carrier image with a bit depth of 8) and MSE is the mean square error, calculated as:

$$\mathrm{MSE} = \frac{1}{MN} \sum_{i=1}^{M} \sum_{j=1}^{N} \left( I(i,j) - I'(i,j) \right)^{2}$$

where $I_{M \times N}$ denotes the original carrier image and $I'_{M \times N}$ denotes the watermarked image. The mean square error measures the difference between the two images, and its magnitude indicates how strongly that difference fluctuates.

In general, when the PSNR is at least 36 dB, the damage that the watermarking algorithm does to the quality of the original carrier image is considered acceptable. The larger the PSNR, the more similar the tested image is to the original, the less the image is distorted, and the higher the quality of the watermarked image.
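As a worked illustration of the PSNR and MSE definitions, the sketch below assumes 8-bit images (peak value 255) and uses hypothetical function names `mse` and `psnr`. The example changes every pixel by one gray level, which gives an MSE of 1.

```python
import numpy as np


def mse(original, marked):
    """Mean square error between two images of the same size."""
    diff = np.asarray(original, dtype=float) - np.asarray(marked, dtype=float)
    return float(np.mean(diff ** 2))


def psnr(original, marked, peak=255.0):
    """Peak signal-to-noise ratio in dB; peak=255 assumes 8-bit images."""
    err = mse(original, marked)
    if err == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(peak ** 2 / err))


original = np.full((4, 4), 100, dtype=np.uint8)
marked = original + 1          # every pixel off by one gray level, so MSE = 1
value = psnr(original, marked)
print(round(value, 2))         # about 48.13 dB, above the 36 dB threshold
```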

Another measure of the similarity of two images is structural similarity (SSIM), calculated as follows:

$$\mathrm{SSIM}(x, y) = \frac{(2 u_x u_y + c_1)(2 \sigma_{xy} + c_2)}{(u_x^{2} + u_y^{2} + c_1)(\sigma_x^{2} + \sigma_y^{2} + c_2)}$$

where $u_x$ is the mean of $x$, $u_y$ is the mean of $y$, $\sigma_x^{2}$ is the variance of $x$, $\sigma_y^{2}$ is the variance of $y$, and $\sigma_{xy}$ is the covariance of $x$ and $y$. $c_1 = (k_1 L)^{2}$ and $c_2 = (k_2 L)^{2}$ are constants that keep the expression stable, and $L$ is the dynamic range of the pixel values. Based on experiment, $k_1 = 0.01$ and $k_2 = 0.03$ are used. The larger the SSIM value, the smaller the image distortion.
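The SSIM expression can be evaluated directly, as in the sketch below. It is a simplified illustration that computes the statistics once over the whole image; practical SSIM implementations usually compute them in local windows and average the results. The function name `ssim_global` and the default dynamic range of 255 (8-bit images) are assumptions.

```python
import numpy as np


def ssim_global(x, y, dynamic_range=255.0, k1=0.01, k2=0.03):
    """SSIM computed from whole-image statistics (single window)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    c1 = (k1 * dynamic_range) ** 2
    c2 = (k2 * dynamic_range) ** 2
    ux, uy = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = np.mean((x - ux) * (y - uy))   # covariance of x and y
    numerator = (2 * ux * uy + c1) * (2 * cov_xy + c2)
    denominator = (ux ** 2 + uy ** 2 + c1) * (var_x + var_y + c2)
    return float(numerator / denominator)


rng = np.random.default_rng(3)
image = rng.integers(0, 256, size=(32, 32)).astype(float)
noisy = image + rng.normal(scale=5.0, size=image.shape)

same = ssim_global(image, image)    # identical images give exactly the maximum
degraded = ssim_global(image, noisy)
```

Identical images score 1, and any difference between the two images lowers the score.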

Images are highly structured, and their pixels are strongly correlated, especially when they are spatially close. Most quality assessment methods based on error sensitivity, such as PSNR and mean square error, decompose the image signal with linear transforms, which does not capture the correlations specific to images. An evaluation metric that measures structural similarity can therefore be used in addition.

In this experiment, three different electronic document images were used as carrier images, and the same watermark image, shown in Figure 6b, was embedded in each for comparison. The results are shown in Figure 7. Table 1-1 lists the PSNR and SSIM values of the three watermarked images; the data show that the proposed algorithm has good invisibility and data fidelity.

Table 1-1 PSNR and SSIM values of the watermarking algorithm

In this experiment, the watermark image shown in Figure 6b was embedded in the carrier image of Figure 6a, its PSNR and SSIM were calculated, and the results were compared with those of other algorithms. Table 1-2 lists the comparison results. The PSNR of the proposed algorithm is 58.81 dB, higher than that of the other algorithms, and its SSIM value is the largest, so the proposed algorithm conceals the watermark better than the other algorithms.

Table 1-2 PSNR and SSIM comparison

The following experiments apply watermarking based on multiple features to electronic documents.

Two electronic document images were selected as carrier images, one entirely in Chinese and the other entirely in English. These are the most densely filled forms of a document in each language, that is, the cases with the fewest white pixels, so the two carrier images adequately demonstrate the capacity of the proposed watermarking algorithm. In this experiment, three different watermark images were embedded in each carrier image: three palm images in the Chinese document, and two face images and one palm image in the English document.

The carrier images used in the experiment are 1024×1024, the embedded watermark images are all 128×128, and the watermark images were captured with an Android phone. The results are shown in Figures 8e and 9e. They show that embedding multiple biometric images does not affect the readability of the original carrier image, and that the extracted watermark images remain highly similar to the original feature images, which provides a basis for biometric recognition and multi-biometric fusion.

For the embedding experiments above, with two different carrier images and different watermark images, the fidelity and invisibility of the watermarked images were evaluated by measuring their PSNR and SSIM. The data in Table 2-1 show that the PSNR of the proposed algorithm is high, so embedding multiple watermark images does not affect the readability of the original carrier.

Table 2-1 Experimental results of embedding multiple watermark images

The watermarks were then extracted from the two watermarked images, and the normalized correlation coefficient (NCC) between each extracted watermark and the original watermark image was calculated to judge whether the watermark image was distorted. The data are given in Table 2-2. For both the English and the Chinese document image, the three extracted watermark images remain highly correlated with the originals, which shows that the algorithm of this embodiment is also feasible and robust when used to embed multiple watermarks.
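The text does not give the NCC formula, so the sketch below assumes one common definition, the inner product of the two images divided by the product of their Euclidean norms; some works subtract the image means first. The function name `ncc` is hypothetical.

```python
import numpy as np


def ncc(original, extracted):
    """Normalized correlation coefficient (assumed definition, no mean removal)."""
    w = np.asarray(original, dtype=float).ravel()
    e = np.asarray(extracted, dtype=float).ravel()
    denom = np.sqrt(np.sum(w ** 2) * np.sum(e ** 2))
    if denom == 0:
        raise ValueError("NCC is undefined for an all-zero image")
    return float(np.sum(w * e) / denom)


rng = np.random.default_rng(4)
watermark = rng.integers(0, 256, size=(16, 16)).astype(float)
extracted = watermark + rng.normal(scale=3.0, size=watermark.shape)

score = ncc(watermark, extracted)   # close to 1 for a lightly distorted watermark
```

Under this definition a value of 1 means the extracted watermark is a positive multiple of the original, and values near 1 indicate little distortion.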

Table 2-2 NCC values of the extracted watermark images

In summary, the present invention proposes an electronic document digital watermark processing method and system based on the characteristics of the pixel distribution of electronic document images. An electronic document contains a large number of white pixels interspersed between the characters and lines; such a large number of pixels with the same value does not occur in natural images. These pixels are reconstructed into a new carrier image, the watermark is embedded in it, and the pixels are then restored to their original positions. This reconstruct-and-restore procedure also scrambles the embedded watermark. The watermark embedding method of the invention operates in the frequency domain of the image and combines the discrete wavelet transform with singular value decomposition; experimental results show that the method has good robustness. In addition, by analyzing the capacity of the white pixels in an electronic document, the invention can use two to three kinds of biometric features as watermark images. Experiments show that embedding multiple watermarks does not greatly affect the quality of the carrier image, and the watermarking algorithm retains good invisibility and robustness.
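The reconstruct-and-restore idea summarized above can be sketched as follows. The details are assumptions for illustration only: white is taken to be the value 255, white pixels are collected in row-major order, and the largest square with an even side length is used so that a DWT could halve it. The function names `gather_white` and `scatter_back` are hypothetical.

```python
import math

import numpy as np


def gather_white(document, white=255):
    """Collect white pixels into a square sub-image and remember their positions."""
    doc = np.asarray(document)
    rows, cols = np.nonzero(doc == white)      # row-major order
    side = math.isqrt(int(rows.size))
    side -= side % 2                           # even side length (assumption)
    count = side * side
    positions = (rows[:count], cols[:count])
    sub_image = doc[positions].reshape(side, side).astype(float)
    return sub_image, positions


def scatter_back(document, sub_image, positions):
    """Write the (modified) sub-image back to the original pixel positions."""
    out = np.asarray(document, dtype=float).copy()
    out[positions] = np.asarray(sub_image, dtype=float).ravel()
    return out


# Tiny stand-in document: white background (255) with a few black text pixels (0).
document = np.full((6, 6), 255, dtype=np.uint8)
document[1, 1:4] = 0
document[4, 2] = 0

sub_image, positions = gather_white(document)
# Stand-in for watermark embedding: change the gathered pixels slightly.
restored = scatter_back(document, sub_image - 1, positions)
```

Only the gathered white pixels change; the text pixels and any white pixels left over after forming the square keep their original values.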

The technical features of the embodiments described above may be combined arbitrarily. For brevity, not every possible combination of these technical features is described; however, any combination of them that involves no contradiction should be regarded as falling within the scope of this specification.

The embodiments described above represent only several implementations of the present invention. Their description is specific and detailed, but it should not be construed as limiting the scope of the patent. It should be noted that a person of ordinary skill in the art can make several variations and improvements without departing from the concept of the present invention, and these all fall within the protection scope of the present invention.

Claims

1. An electronic document digital watermark processing method, characterized in that it comprises a watermark embedding step, the watermark embedding step specifically comprising:

Obtain user permissions;

Collect watermark information, the watermark information includes user information;

uploading the watermark information to the server;

Receive the watermark information and store it in the database;

Extract multiple features in the watermark information according to a predetermined algorithm, and perform feature fusion to generate a digital watermark;

After the electronic document is converted into an image, the digital watermark is embedded in the image, and the image is restored into an electronic document.

2. The electronic document digital watermark processing method according to claim 1, further comprising a watermark extraction step, the watermark extraction step specifically comprising:

Convert digitally watermarked electronic documents into images;

Process the image according to a predetermined algorithm to extract the digital watermark;

Using an inverse operation, the digital watermark is converted into a watermarked image.

3. The electronic document digital watermark processing method according to claim 1 or 2, characterized in that the watermark information is the user's biometric information, including one or more of the user's facial features, hand features, fingerprint features and voice features.

4. An electronic document digital watermark processing system, characterized in that it comprises a watermark embedding unit, and the watermark embedding unit specifically comprises:

Authorization module, used to obtain user permissions;

an information collection module for collecting watermark information, where the watermark information includes user information;

an information uploading module for uploading the watermark information to the server;

a database access module for receiving the watermark information and storing it in the database of the server;

a watermark processing module, used for extracting multiple features in the watermark information according to a predetermined algorithm, and performing feature fusion to generate a digital watermark;

The watermark embedding module is used to embed the digital watermark in the image after converting the electronic document into an image, and then restore the image into an electronic document.

5. The electronic document digital watermark processing system of claim 4, further comprising a watermark extraction unit, the watermark extraction unit specifically comprising:

The conversion module is used to convert the digital watermark-embedded electronic document into an image;

an extraction module, configured to process the image according to a predetermined algorithm to extract the digital watermark;

The restoration module is used for converting the digital watermark into a watermark image by inverse operation.

6. The electronic document digital watermark processing system according to claim 5, wherein the electronic document digital watermark processing system is composed of a mobile client and a server, wherein the mobile client comprises the authorization module, the information collection module and the information uploading module, and the server comprises the database, the database access module, the watermark processing module, the watermark embedding module, a watermark decryption module, a search module and the extraction module.

7. The electronic document digital watermark processing system according to claim 6, wherein the watermark information is the biometric information of the user, including one or more of the user's facial features, hand features, fingerprint features and voice features;

The mobile client is installed on a mobile phone, and the information collection module includes a camera, a microphone, a fingerprint identifier and/or a face identifier of the mobile phone.

8. A method for processing digital watermarking of electronic documents, comprising a watermark embedding step of embedding a watermark image in an electronic document, the watermark embedding step specifically comprising:

Convert electronic documents into carrier images with a bit depth of 8;

Select the white pixels to reconstruct the carrier image, that is, use the discrete wavelet algorithm to decompose the carrier image into four frequency domains: LL, HL, LH, HH, and select the HH frequency domain part as the watermark embedding area;

performing singular value decomposition on the watermark image;

Embed the watermark image into the watermark embedding area, that is, replace the singular value of the watermark embedding area with the singular value of the watermark image, and use the inverse operation to obtain the reconstructed carrier sub-image embedded in the watermark;

The reconstructed carrier sub-image is restored to the electronic document image according to the original pixel position.

9. The electronic document digital watermark processing method according to claim 8, further comprising a watermark extraction step, the watermark extraction step specifically comprising:

Extracting the pixel points of the watermark-embedded electronic document image to obtain the watermark-embedded carrier image;

Discrete wavelet transform is performed on the carrier image embedded in the watermark, and the HH frequency domain part is selected as the watermark extraction area;

Perform singular value decomposition on the watermark extraction area, and extract the singular values of the watermark extraction area;

Using the inverse operation, the watermark image is restored by using the obtained singular values.

10. The electronic document digital watermark processing method according to claim 8 or 9, wherein the watermark image comprises the user's biometric information, including one or more of the user's facial features, hand features, fingerprint features and voice features.