CN117454335A - Watermark embedding method and watermark extracting method - Google Patents


Info

Publication number
CN117454335A
Authority
CN
China
Prior art keywords
character
current
font
watermark
graphic information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311301171.3A
Other languages
Chinese (zh)
Inventor
张怡桢
汪艺伟
刘永亮
汤敏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shupeng Information Technology Shenzhen Co ltd
Original Assignee
Shupeng Information Technology Shenzhen Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shupeng Information Technology Shenzhen Co ltd
Priority to CN202311301171.3A
Publication of CN117454335A


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/10 Protecting distributed programs or content, e.g. vending or licensing of copyrighted material; Digital rights management [DRM]
    • G06F21/16 Program or content traceability, e.g. by watermarking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/12 Use of codes for handling textual entities
    • G06F40/126 Character encoding
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • Technology Law (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Editing Of Facsimile Originals (AREA)
  • Image Processing (AREA)

Abstract

The application provides a watermark embedding method and a watermark extraction method. The watermark embedding method comprises: obtaining the current character code and the current character graphic information of a target character in a carrier object in which a watermark is to be embedded; judging, according to the current character code and the current character graphic information, whether the target character can correspond to watermark information; if the target character can correspond to watermark information, executing the step of embedding the watermark information in the carrier object, which comprises obtaining an updated character code of the target character, replacing the current character code with the updated character code, obtaining updated character graphic information of the target character, and replacing the current character graphic information with the updated character graphic information; and adding the updated character code and the updated character graphic information to a character attribute information file, wherein the character attribute information file is a file embedded in the carrier object. The method is widely applicable and offers better robustness, security and reliability.

Description

Watermark embedding method and watermark extracting method
Technical Field
The present application relates to the field of computer technologies, and in particular to a watermark embedding method, an electronic device, and a computer-readable storage medium. The application also relates to a watermark extraction method, an electronic device, and a computer-readable storage medium. The application also relates to another watermark extraction method, an electronic device, and a computer-readable storage medium. The application also relates to a carrier object modification method, an electronic device, and a computer-readable storage medium.
Background
At present, digital assets carried in documents frequently suffer loss of data rights, which seriously affects the development of the digital economy. In addition, much document data contains large amounts of sensitive or important information, such as the operating status of a business or the private information of users, so the leakage of document data can harm business operations or personal safety. Effective technical means are therefore required to prevent infringement or leakage of document data. Conventional encryption and access control techniques can effectively prevent misuse or leakage of document data, while digital watermarking is an effective way of identifying the source of an infringement or leak.
Existing schemes for embedding watermarks in documents are either limited in tracing data stolen by screen capture or photographing, or have a high barrier to use and are difficult to implement. There thus remains a need for an improved scheme for embedding watermarks in documents.
Disclosure of Invention
The application provides a watermark embedding method that aims to solve the problems that existing watermark embedding methods make tracing difficult and are difficult to implement. The application further provides a watermark embedding device, an electronic device and a computer-readable storage medium. The application also provides a watermark extraction method, a watermark extraction device, an electronic device and a computer-readable storage medium. The application also provides another watermark extraction method, device, electronic device and computer-readable storage medium. The application also provides a carrier object modification method, a carrier object modification device, an electronic device and a computer-readable storage medium.
The application provides a watermark embedding method, which comprises the following steps: obtaining the character code currently corresponding to a target character in a carrier object in which a watermark is to be embedded, as a current character code, and obtaining the character graphic information currently corresponding to the target character, as current character graphic information; judging, according to the current character code and the current character graphic information, whether the target character can correspond to watermark information; if the target character can correspond to watermark information, executing the step of embedding watermark information in the carrier object, wherein the step of embedding watermark information in the carrier object comprises: obtaining an updated character code for the target character and replacing the current character code with the updated character code, and obtaining updated character graphic information for the target character and replacing the current character graphic information with the updated character graphic information, wherein the updated character code represents the character code of the target character after its character graphic information has been updated; and adding the updated character code and the updated character graphic information to a character attribute information file, wherein the character attribute information file is a file embedded in the carrier object.
Optionally, the method further comprises: determining a first character set and a first font set, wherein the first character set comprises original character codes corresponding to preset characters, the first font set comprises font identifications of preset fonts, and a font is the font to which character graphic information belongs; the step of judging whether the target character can correspond to watermark information according to the current character code and the current character graphic information comprises: if the current character code belongs to the first character set and the font identification of the font to which the current character graphic information belongs is in the first font set, determining that the target character can correspond to watermark information; and if the current character code does not belong to the first character set and/or the font identification of the font to which the current character graphic information belongs is not in the first font set, determining that the target character cannot correspond to watermark information.
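The eligibility judgment just described can be sketched in a few lines. This is a minimal illustration, not the applicant's implementation: the concrete characters, the font names, and the function name `can_carry_watermark` are assumptions chosen for the example, since the application does not list specific values.

```python
# Hypothetical contents: the application does not name concrete characters or fonts.
FIRST_CHARACTER_SET = {ord("的"), ord("一"), ord("是")}  # original character codes
FIRST_FONT_SET = {"SimSun", "SimHei"}                    # preset font identifications


def can_carry_watermark(char_code: int, font_id: str) -> bool:
    """A character is eligible only if BOTH its code and its font are preset."""
    return char_code in FIRST_CHARACTER_SET and font_id in FIRST_FONT_SET
```

A character fails the check when either its code or its font falls outside the preset sets, which matches the "and/or" wording of the negative branch.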
Optionally, the method further comprises: setting custom character codes for the preset characters; obtaining a second character set according to the custom character codes corresponding to the preset characters; determining a first mapping relation between the custom character codes in the second character set and the original character codes in the first character set according to the same preset character; the obtaining updated character encoding for the target character includes: and obtaining updated character codes aiming at the target characters according to the current character codes and the first mapping relation.
Optionally, the obtaining the updated character code for the target character according to the current character code and the first mapping relation includes: and obtaining the custom character code corresponding to the current character code in the second character set according to the current character code and the first mapping relation, namely, the updated character code aiming at the target character.
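As an illustration of the first mapping relation, the sketch below pairs each original character code with a custom character code for the same preset character. Placing the custom codes in the Unicode Private Use Area (starting at U+E000) is an assumption made for the example; the application only states that custom character codes are set. The names `first_mapping` and `updated_code` are likewise hypothetical.

```python
# Hypothetical preset characters; the application does not enumerate them.
PRESET_CHARACTERS = ["的", "一", "是"]
PUA_START = 0xE000  # assumption: custom codes taken from the Private Use Area

first_character_set = [ord(c) for c in PRESET_CHARACTERS]                       # original codes
second_character_set = [PUA_START + i for i in range(len(PRESET_CHARACTERS))]  # custom codes

# First mapping relation: original code -> custom code of the same preset character.
first_mapping = dict(zip(first_character_set, second_character_set))


def updated_code(current_code: int) -> int:
    """Look up the custom character code that replaces the current code."""
    return first_mapping[current_code]
```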
Optionally, the method further comprises: obtaining custom character graphic information corresponding to the preset character according to original character graphic information corresponding to the preset character in a character graphic information base of the preset font, wherein the character graphic information base of the preset font is used for storing the original character graphic information belonging to the preset font, and for the same preset character, the difference between the custom character graphic information and the original character graphic information meets a preset difference condition; according to the custom character graphic information corresponding to the preset character, a character graphic information base of a custom font is obtained; obtaining a second font set according to the font identification of the custom font; determining a second mapping relation between the font identification of the custom font in the second font set and the font identification of the preset font in the first font set according to the difference between the custom character graphic information and the original character graphic information; the obtaining updated character graphic information for the target character includes: and obtaining updated character graphic information aiming at the target character according to the current character graphic information and the second mapping relation.
Optionally, the obtaining updated character graphic information for the target character according to the current character graphic information and the second mapping relationship includes: acquiring a font identifier of a font to which the current character graphic information belongs according to the current character graphic information; according to the font identification of the font to which the current character graphic information belongs and the second mapping relation, acquiring the font identification of the custom font corresponding to the font identification of the font to which the current character graphic information belongs in the second font set; and obtaining the custom character graphic information corresponding to the target character in the character graphic information base of the custom font represented by the font identification of the custom font according to the font identification of the custom font, namely the updated character graphic information for the target character.
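The lookup through the second mapping relation might look like the following sketch. Glyphs are modelled as tiny 0/1 bitmaps purely for illustration; real fonts store outlines, and the application does not say how the custom graphic is derived or what the preset difference condition is. The single-pixel flip, the `MAX_DIFF` threshold, and all font names are assumptions.

```python
MAX_DIFF = 2  # assumed "preset difference condition": 1..MAX_DIFF differing pixels


def pixel_difference(a, b) -> int:
    """Count the pixels that differ between two equally sized bitmaps."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def make_custom_glyph(original, row: int, col: int):
    """Derive a custom glyph by flipping one pixel of the original bitmap."""
    rows = [list(r) for r in original]
    rows[row][col] ^= 1
    return tuple(tuple(r) for r in rows)


# Character graphic information bases, keyed by font identification.
preset_font_library = {"SimSun": {"的": ((0, 1, 0), (1, 1, 1), (0, 1, 0))}}
custom_font_library = {
    "SimSun-WM": {ch: make_custom_glyph(g, 0, 0)
                  for ch, g in preset_font_library["SimSun"].items()}
}

# Second mapping relation: preset font identification -> custom font identification.
second_mapping = {"SimSun": "SimSun-WM"}


def updated_glyph(char: str, current_font_id: str):
    """Return the custom font identification and the custom glyph for a character."""
    custom_font_id = second_mapping[current_font_id]
    return custom_font_id, custom_font_library[custom_font_id][char]
```

The small difference is what lets the two glyphs look alike to a reader while remaining distinguishable to an extractor.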
Optionally, the method further comprises: setting custom character codes for preset characters corresponding to the custom character graphic information; obtaining a third character set according to the custom character codes corresponding to the preset characters; determining a third mapping relation between the custom character codes in the third character set and custom character graphic information in a character graphic information base of the custom fonts according to the same preset character; the obtaining updated character encoding for the target character includes: and obtaining updated character codes for the target characters according to the custom character graphic information corresponding to the target characters and the third mapping relation.
Optionally, the obtaining the updated character code for the target character according to the custom character graphic information corresponding to the target character and the third mapping relationship includes: and obtaining the custom character codes corresponding to the custom character graphic information in the third character set according to the custom character graphic information corresponding to the target character and the third mapping relation, namely the updated character codes for the target character.
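One way to picture the third mapping relation is to give every (custom font, preset character) pair its own custom code, so that the code alone identifies which custom graphic is in use. The sketch below does this with sequential Private Use Area codes; the fonts, characters, starting code point, and the use of a (font, character) key to stand in for the custom character graphic information are all assumptions for illustration.

```python
# Hypothetical custom fonts and preset characters.
CUSTOM_FONTS = ["SimSun-WM", "SimHei-WM"]
PRESET_CHARACTERS = ["的", "一"]
PUA_START = 0xE000  # assumption: custom codes taken from the Private Use Area

# Third mapping relation: custom graphic (identified here by font and character)
# -> custom character code.
third_mapping = {}
next_code = PUA_START
for font_id in CUSTOM_FONTS:
    for ch in PRESET_CHARACTERS:
        third_mapping[(font_id, ch)] = next_code
        next_code += 1

third_character_set = set(third_mapping.values())


def updated_code_for_glyph(custom_font_id: str, char: str) -> int:
    """Look up the custom code bound to a character's custom graphic."""
    return third_mapping[(custom_font_id, char)]
```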
Optionally, the method further comprises: obtaining a binary coding sequence corresponding to the watermark information to be embedded; the step of embedding watermark information in the carrier object specifically comprises: determining whether to update the current character code and the current character graphic information according to the current code value to be embedded in the binary coding sequence corresponding to the watermark information; and if it is determined that the current character code and the current character graphic information are to be updated, obtaining an updated character code for the target character and replacing the current character code with the updated character code, and obtaining updated character graphic information for the target character and replacing the current character graphic information with the updated character graphic information.
Optionally, the determining whether to update the current character code and the current character graphic information according to the current code value to be embedded in the binary code sequence corresponding to the watermark information includes: if the current code value to be embedded in the binary code sequence corresponding to the watermark information is a first code value, determining to update the current character code and the current character graphic information; and if the current code value to be embedded in the binary code sequence corresponding to the watermark information is the second code value, determining not to update the current character code and the current character graphic information.
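The bit-driven embedding loop can be sketched as follows. Treating the first code value as 1 and the second as 0 is an assumption, as are the `Char` type, the sample sets and mappings, and the use of a plain dictionary to stand in for the character attribute information file.

```python
from dataclasses import dataclass


@dataclass
class Char:
    code: int      # character code
    font_id: str   # font identification of the character graphic information


# Hypothetical sets and mapping relations for illustration only.
FIRST_CHARS = {ord("的"), ord("是")}
FIRST_FONTS = {"SimSun"}
CODE_MAP = {ord("的"): 0xE000, ord("是"): 0xE001}
FONT_MAP = {"SimSun": "SimSun-WM"}


def embed(chars, bits):
    """Walk the text; each eligible character consumes one watermark bit."""
    out = []
    attribute_file = {}  # stand-in for the embedded character attribute information file
    bit_iter = iter(bits)
    for ch in chars:
        eligible = ch.code in FIRST_CHARS and ch.font_id in FIRST_FONTS
        bit = next(bit_iter, None) if eligible else None
        if bit == 1:  # assumed first code value: update code and graphic
            new = Char(CODE_MAP[ch.code], FONT_MAP[ch.font_id])
            attribute_file[new.code] = new.font_id
            out.append(new)
        else:         # second code value, ineligible, or no bits left: unchanged
            out.append(ch)
    return out, attribute_file
```

In this sketch an eligible character left unchanged after the bit sequence runs out is indistinguishable from an embedded second code value, so a real scheme would need some way to delimit or repeat the sequence; the application text here does not specify one.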
The application also provides a watermark embedding device, comprising: a first obtaining unit, configured to obtain the character code currently corresponding to a target character in a carrier object in which a watermark is to be embedded, as a current character code, and to obtain the character graphic information currently corresponding to the target character, as current character graphic information; a first judging unit, configured to judge, according to the current character code and the current character graphic information, whether the target character can correspond to watermark information; a first embedding unit, configured to perform a step of embedding watermark information in the carrier object when the target character can correspond to watermark information, where the step of embedding watermark information in the carrier object includes: obtaining an updated character code for the target character and replacing the current character code with the updated character code, and obtaining updated character graphic information for the target character and replacing the current character graphic information with the updated character graphic information, wherein the updated character code represents the character code of the target character after its character graphic information has been updated; and a first adding unit, configured to add the updated character code and the updated character graphic information to a character attribute information file, where the character attribute information file is a file embedded in the carrier object.
The application also provides a watermark extraction method, which comprises the following steps: obtaining a character code currently corresponding to a target character in a carrier object embedded with a watermark as a current character code, and obtaining character graphic information currently corresponding to the target character as current character graphic information; judging whether the target character corresponds to watermark information according to the current character code and/or the current character graphic information; and if the target character corresponds to watermark information, executing the step of extracting watermark information from the carrier object, wherein the step of extracting watermark information from the carrier object comprises the following steps: and obtaining a current embedded coding value in a coding sequence corresponding to the watermark information according to the current character coding and/or the current character graphic information.
Optionally, the method further comprises: obtaining a first character set and a second character set, wherein the first character set comprises original character codes corresponding to preset characters, and the second character set comprises custom character codes corresponding to the preset characters; obtaining a first font set and a second font set, wherein the first font set comprises font identifications of preset fonts, and the second font set comprises font identifications of custom fonts; the step of judging whether the target character corresponds to watermark information according to the current character code and/or the current character graphic information comprises: if the current character code belongs to the first character set and the font identification of the font to which the current character graphic information belongs is in the first font set, determining that the target character corresponds to watermark information; or if the current character code belongs to the second character set and/or the font identification of the font to which the current character graphic information belongs is in the second font set, determining that the target character corresponds to watermark information.
Optionally, the coding sequence corresponding to the watermark information is a binary coding sequence; the step of obtaining the current embedded code value in the coding sequence corresponding to the watermark information according to the current character code and/or the current character graphic information comprises: if the current character code belongs to the second character set and/or the font identification of the font to which the current character graphic information belongs is in the second font set, determining the current embedded code value to be a first code value; and if the current character code does not belong to the second character set and the font identification of the font to which the current character graphic information belongs is not in the second font set, determining the current embedded code value to be a second code value.
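A minimal sketch of this extraction rule is given below, again assuming that the first code value is 1 and the second is 0. The sets and the function name `extract` are hypothetical. Because either a custom code or a custom font is enough to read a first code value, a character whose font has been reset but whose custom code survives still yields the right bit.

```python
# Hypothetical sets for illustration only.
FIRST_CHARS = {ord("的"), ord("是")}   # original character codes
FIRST_FONTS = {"SimSun"}               # preset font identifications
SECOND_CHARS = {0xE000, 0xE001}        # custom character codes
SECOND_FONTS = {"SimSun-WM"}           # custom font identifications


def extract(chars):
    """Recover the bit sequence from (character code, font identification) pairs."""
    bits = []
    for code, font_id in chars:
        if code in SECOND_CHARS or font_id in SECOND_FONTS:
            bits.append(1)   # assumed first code value: character was updated
        elif code in FIRST_CHARS and font_id in FIRST_FONTS:
            bits.append(0)   # assumed second code value: eligible but unchanged
        # any other character carries no watermark information and is skipped
    return bits
```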
Optionally, the method further comprises: obtaining a first font set and a second font set, wherein the first font set comprises font identifications of preset fonts, and the second font set comprises font identifications of custom fonts; obtaining a first character set and a third character set, wherein the first character set comprises original character codes corresponding to preset characters, and the third character set comprises the custom character codes corresponding to the preset characters when the preset characters are presented in the custom fonts; the step of judging whether the target character corresponds to watermark information according to the current character code and/or the current character graphic information comprises: if the current character code belongs to the first character set and the font identification of the font to which the current character graphic information belongs is in the first font set, determining that the target character corresponds to watermark information; or if the current character code belongs to the third character set and/or the font identification of the font to which the current character graphic information belongs is in the second font set, determining that the target character corresponds to watermark information.
Optionally, the coding sequence corresponding to the watermark information is a binary coding sequence; the step of obtaining the current embedded code value in the coding sequence corresponding to the watermark information according to the current character code and/or the current character graphic information comprises: if the current character code belongs to the third character set and/or the font identification of the font to which the current character graphic information belongs is in the second font set, determining the current embedded code value to be a first code value; and if the current character code does not belong to the third character set and the font identification of the font to which the current character graphic information belongs is not in the second font set, determining the current embedded code value to be a second code value.
The application also provides a watermark extraction device, comprising: the second obtaining unit is used for obtaining a character code currently corresponding to the target character in the carrier object embedded with the watermark, taking the character code as a current character code, and obtaining character graphic information currently corresponding to the target character, and taking the character graphic information as current character graphic information; the second judging unit is used for judging whether the target character corresponds to watermark information according to the current character code and/or the current character graphic information; a first extracting unit, configured to perform, when the target character corresponds to watermark information, a step of extracting watermark information from the carrier object, where the step of extracting watermark information from the carrier object includes: and obtaining a current embedded coding value in a coding sequence corresponding to the watermark information according to the current character coding and/or the current character graphic information.
The application also provides a watermark extraction method, which comprises the following steps: obtaining the character graphic currently corresponding to a target character in an image of a watermark-embedded carrier object, as a current character graphic; judging, according to the current character graphic, whether the target character corresponds to watermark information; and if the target character corresponds to watermark information, executing the step of extracting watermark information from the carrier object image, wherein the step of extracting watermark information from the carrier object image comprises: obtaining, according to the current character graphic, the current embedded code value in a coding sequence corresponding to the watermark information.
Optionally, the method further comprises: obtaining a character graphic reference table, wherein the character graphic reference table comprises the original character graphics corresponding to preset characters when presented in a preset font and the custom character graphics corresponding to the preset characters when presented in a custom font; the step of judging whether the target character corresponds to watermark information according to the current character graphic comprises: if the current character graphic belongs to the character graphic reference table, determining that the target character corresponds to watermark information.
Optionally, the coding sequence corresponding to the watermark information is a binary coding sequence; the step of obtaining the current embedded code value in the coding sequence corresponding to the watermark information according to the current character graphic comprises: if the current character graphic is the same as a custom character graphic in the character graphic reference table, determining the current embedded code value to be a first code value; and if the current character graphic is the same as an original character graphic in the character graphic reference table, determining the current embedded code value to be a second code value.
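For the image-based variant, the reference-table comparison could be sketched as below. Glyphs are again modelled as small 0/1 bitmaps and compared by exact equality, which is a simplification: graphics segmented from a real screenshot or photograph would need a tolerant similarity measure that the text here does not specify. The table contents and the mapping of the first and second code values to 1 and 0 are assumptions.

```python
# Hypothetical bitmaps standing in for original and custom character graphics.
ORIGINAL = ((0, 1, 0), (1, 1, 1), (0, 1, 0))
CUSTOM = ((1, 1, 0), (1, 1, 1), (0, 1, 0))

# Reference table: preset character -> its original and custom graphics.
reference_table = {"的": {"original": ORIGINAL, "custom": CUSTOM}}


def bit_from_glyph(char: str, glyph):
    """Return 1 for a custom graphic, 0 for an original one, None otherwise."""
    entry = reference_table.get(char)
    if entry is None:
        return None          # character is not a preset character
    if glyph == entry["custom"]:
        return 1             # assumed first code value
    if glyph == entry["original"]:
        return 0             # assumed second code value
    return None              # graphic not in the table: no watermark information
```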
Optionally, obtaining the character graphic currently corresponding to the target character in the image of the watermark-embedded carrier object comprises: performing image text recognition on the carrier object image to obtain the text information in the carrier object image; determining the target character according to the text information; and performing character graphic segmentation on the carrier object image and matching the segmented character graphics to the target character, to obtain the character graphic currently corresponding to the target character.
The application also provides a watermark extraction device, comprising: a third obtaining unit, configured to obtain the character graphic currently corresponding to a target character in an image of a watermark-embedded carrier object, as a current character graphic; a third judging unit, configured to judge, according to the current character graphic, whether the target character corresponds to watermark information; and a second extracting unit, configured to perform, when the target character corresponds to watermark information, a step of extracting watermark information from the carrier object image, where the step of extracting watermark information from the carrier object image includes: obtaining, according to the current character graphic, the current embedded code value in a coding sequence corresponding to the watermark information.
The application also provides a carrier object modification method, which comprises the following steps: in response to detecting a first operation for a character in a carrier object embedded with a watermark, obtaining a character code currently corresponding to the character as a current character code, and obtaining character graphic information currently corresponding to the character as current character graphic information, wherein the carrier object embedded with the watermark carries a character attribute information file, the character attribute information file contains updated character codes and updated character graphic information corresponding to preset characters, and the first operation is used for replacing the current character graphic information with first character graphic information; if the current character code is other character codes except the updated character code in the character attribute information file, replacing the current character graphic information with the first character graphic information; if the current character code is the updated character code in the character attribute information file, the current character graphic information is kept unchanged, or the current character graphic information is replaced by other character graphic information except the first character graphic information.
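The branch logic of the modification method can be illustrated with a short sketch. It implements only the "keep unchanged" option for watermarked characters (the text also allows substituting some other graphic information). The function name, the parameters, and the representation of graphic information by a font identification are assumptions for the example.

```python
def apply_font_change(current_code: int,
                      current_font_id: str,
                      requested_font_id: str,
                      updated_codes: set) -> str:
    """Return the font identification the character should have after the first operation.

    `updated_codes` stands in for the updated character codes recorded in the
    character attribute information file.
    """
    if current_code in updated_codes:
        # Watermarked character: keep its current graphic information unchanged.
        return current_font_id
    # Ordinary character: apply the requested (first) graphic information.
    return requested_font_id
```

Because watermarked characters carry a custom code, a bulk font change or format copy leaves their custom graphics in place while ordinary characters change as the user expects.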
Optionally, the watermark-embedded carrier object is obtained by: obtaining the character code currently corresponding to a target character in a carrier object in which a watermark is to be embedded, as a current first character code, and obtaining the character graphic information currently corresponding to the target character, as current first character graphic information; judging, according to the current first character code and the current first character graphic information, whether the target character can correspond to watermark information; if the target character can correspond to watermark information, executing the step of embedding the watermark information in the carrier object in which the watermark is to be embedded, wherein this step comprises: obtaining an updated character code for the target character and replacing the current first character code with the updated character code, and obtaining updated character graphic information for the target character and replacing the current first character graphic information with the updated character graphic information, wherein the updated character code represents the character code of the target character after its character graphic information has been updated; and adding the updated character code and the updated character graphic information to a character attribute information file, wherein the character attribute information file is a file embedded in the carrier object.
Optionally, the method further comprises: obtaining a carrier object embedded with a watermark; and displaying the characters in the watermark-embedded carrier object according to the character attribute information file carried by the watermark-embedded carrier object.
The present application also provides a carrier object modification device, comprising: a fourth obtaining unit, configured to obtain, in response to detecting a first operation on a character in a watermark-embedded carrier object, the character code currently corresponding to the character as a current character code, and to obtain the character graphic information currently corresponding to the character as current character graphic information, where the watermark-embedded carrier object carries a character attribute information file, the character attribute information file contains updated character codes and updated character graphic information corresponding to preset characters, and the first operation is used to replace the current character graphic information with first character graphic information; a first replacing unit, configured to replace the current character graphic information with the first character graphic information when the current character code is a character code other than the updated character codes in the character attribute information file; and a second replacing unit, configured to keep the current character graphic information unchanged, or to replace the current character graphic information with character graphic information other than the first character graphic information, when the current character code is an updated character code in the character attribute information file.
The application also provides an electronic device comprising a processor and a memory; the memory is used for storing programs and data, and the processor calls the programs stored in the memory to execute the watermark embedding method, the watermark extraction method or the carrier object modification method.
The present application also provides a computer-readable storage medium having stored thereon a program and data, the program being executable by a processor for implementing the above-described watermark embedding method, or the above-described watermark extraction method, or the above-described carrier object modification method.
Compared with the prior art, the application has the following advantages:
According to the watermark embedding method, replacing the current character graphic information with the updated character graphic information enables the watermark-embedded carrier object to resist screen-capture and photographing attacks; replacing the current character code with the updated character code enables it to resist format-brush (format painter) attacks; and adding the updated character code and the updated character graphic information to the character attribute information file and embedding that file in the carrier object ensures that the carrier object is displayed normally regardless of the operating system environment. The method therefore has a wide range of application and good robustness, security, and reliability.
Drawings
Fig. 1 is a flow chart of a preparation phase in a watermark embedding method provided in the present application.
Fig. 2 is a schematic diagram of a custom character graphic corresponding to a preset character provided in the present application under each custom font.
Fig. 3 is a flowchart of a watermark embedding stage in the watermark embedding method provided in the present application.
Fig. 4 is a schematic diagram of a document to be watermarked provided in the present application.
Fig. 5 is a schematic diagram of all alternative custom character encodings and custom character graphic information in the watermarked document provided herein.
Fig. 6 is a schematic diagram of a watermark information embedded document provided herein.
Fig. 7 is a flow chart of extracting a watermark from a watermarked document provided herein.
Fig. 8 is a schematic diagram of a binary coding sequence corresponding to watermark information extracted from a document provided in the present application.
Fig. 9 is a flowchart of extracting a watermark from a document image corresponding to a watermark-embedded document provided in the present application.
Fig. 10 is a schematic diagram of a character graphic reference table provided in the present application.
Fig. 11 is a flowchart of a watermark embedding method provided in the first embodiment of the present application.
Fig. 12 is a schematic diagram of a watermark embedding device according to a second embodiment of the present application.
Fig. 13 is a flowchart of a watermark extraction method provided in a third embodiment of the present application.
Fig. 14 is a schematic diagram of a watermark extraction apparatus according to a fourth embodiment of the present application.
Fig. 15 is a flowchart of a watermark extraction method provided in a fifth embodiment of the present application.
Fig. 16 is a schematic diagram of a watermark extraction apparatus according to a sixth embodiment of the present application.
Fig. 17 is a flowchart of a carrier object modification method provided in the seventh embodiment of the present application.
Fig. 18 is a schematic view of a carrier object modifying device provided in an eighth embodiment of the present application.
Fig. 19 is a schematic view of an electronic device provided in a ninth embodiment of the present application.
Detailed Description
To make the purposes, advantages, and features of the present application clearer, the present application is described in further detail below with reference to the drawings and specific embodiments. In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present application. The present application may, however, be embodied in many forms other than those described herein, and those skilled in the art may make similar generalizations without departing from its spirit or essential characteristics; the present application is therefore not limited to the specific embodiments disclosed below.
It should be noted that in the description of the present application, the terms "first," "second," and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or a particular order or sequence. The specific meaning of the terms in this application will be understood by those of ordinary skill in the art in the specific context. Furthermore, in the description of the present application, unless otherwise indicated, the term "plurality" refers to two or more. The term "and/or" describes an association relationship between associated objects and covers three cases; for example, "A and/or B" may represent: A exists alone, A and B exist together, or B exists alone. The character "/" generally indicates that the associated objects before and after it are in an "or" relationship. The terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
At present, digital assets that use documents as carriers frequently suffer loss of data rights and interests, such as infringement of online literary works, which seriously affects the development of the digital economy. In addition, much document data contains a large amount of sensitive or important information, such as business operation information or users' personal privacy information, so leakage of document data can affect business operations or personal safety. Effective technical means are therefore required to prevent infringement or leakage of document data. In this regard, conventional encryption and access control techniques can effectively prevent misuse or leakage of document data, while digital watermarking is an effective way of identifying the source of infringement or leakage.
Digital watermarking, also known as digital signature, is a technique for embedding a watermark in a digital carrier (including text, images, audio, video, etc.) in a perceptible or imperceptible form to verify ownership of the digital carrier. The watermark may be text, images, audio or video data representing ownership, etc.
The present application mainly relates to a method of embedding a watermark in a document, where a document refers specifically to a digital carrier, generated by text editing software, for recording text information. A document contains text attribute information, from which the text content, typesetting, and so on of the document can be determined. For example, a file recording text information generated by Word, PowerPoint, or Excel is a document in the sense of the present application. Word, PowerPoint, and Excel are office software developed by Microsoft Corporation for the Windows operating system; the text editing software is not limited thereto.
Most existing schemes for embedding a watermark in a document either modify the attributes of specific elements in the document or add a new object to the document to carry the watermark information, and then use certain settings to make the embedded watermark information less noticeable. This approach has limited robustness, especially when tracing data stolen by screenshot or photographing. For example, a text box is added to a document and watermark information is placed in it, with the size and position of the text box chosen so that the watermark information is not easily perceived; such a scheme cannot effectively trace data stolen by screen capture or photographing. Other schemes embed watermark information by adding images or graphics to the page background of the document. These schemes can resist screen-capture and photographing attacks to some extent, but they degrade the user's visual experience. In addition, an overly obvious watermark may distract users and also draws the attention of malicious leakers, who then try to destroy or remove the watermark to avoid being traced.
To address the above problems, some watermark embedding schemes based on font replacement have been proposed. For example, a new font is created and installed on the terminal device, and watermark information is embedded by replacing the original font with the new font. The main problem with this approach is that requiring installation of the new font gives the scheme a high usage threshold and limits its applicability. How to distribute the new font to terminal devices, and whether users are willing to install it, must both be considered; for users' personal mobile devices in particular, forced installation of a new font is difficult to achieve.
In view of the above problems, the present application provides a watermark embedding method. In a preparation stage, a character set is first determined, and a new character code is defined for each character code in the character set. A font set of fonts that support replacement is then determined, and, for each font in the font set, the character graphic of the character corresponding to each character code in the character set is slightly deformed to generate a similar character graphic. In the watermark embedding stage, the characters in the document are examined one by one; if the character code corresponding to the current character belongs to the character set and the font of the character belongs to the font set, the character is determined to support embedding watermark information. Whether to update the character attribute information is then determined according to the code value to be embedded in the binary code sequence corresponding to the watermark information. If the code value to be embedded is 1, the character attribute information is updated, that is, the current character code is replaced with the new character code and the current character graphic is replaced with the similar character graphic; if the code value to be embedded is 0, the character attribute information is not updated, that is, the current character code and the current character graphic remain unchanged. By performing this operation on each character in the document, the binary code sequence corresponding to the watermark information is embedded into the document. The new character codes and similar character graphics used as replacements in the document are added to a character attribute information file, and the character attribute information file is embedded into the document, so that the document carries the character attribute information file and can be displayed normally in other operating system environments.
The foregoing is a summary of a watermark embedding method provided herein, wherein the terms involved are explained as follows:
Characters are the collective name, in the computer field, for Chinese characters, letters, numbers, symbols, and the like. A character may be a Chinese character, an English letter, a Greek letter, an Arabic numeral, a punctuation mark, a graphic symbol, etc.
Character encoding refers to the code used when characters are stored in a computer or transferred through a communication network. Common character codes include ASCII, GB2312, and Unicode. Among them, Unicode (also called unified code, universal code, or single code) assigns a unified and unique binary code to each character in each language, and most existing software (including the text editing software referred to in this application) and platforms use Unicode to support the input and display of characters.
A character set, i.e. a set of one or more characters. In a computer, the characters contained in the character set are stored in the form of character codes, so in this application, the character set includes character codes corresponding to one or more characters, and the character codes may be Unicode codes.
A character graphic (also called a character pattern) is the specific glyph shape formed for a character. Character graphics are stored and processed in the computer in a form that the computer can understand; in this application, this stored form is referred to as character graphic information, and the computer displays character graphic information in the form of character graphics.
A font generally refers to the style of written characters. In this application, a font is the name for a group of character graphics having a uniform style. For example, common fonts in text editing software include Song, regular script, and bold type; when one character is presented in different fonts, the style of the corresponding character graphic differs. Text editing software includes character graphic information files for various fonts, where each character graphic information file contains the character graphic information corresponding to each character when presented in one font. In a character graphic information file, the character graphic information corresponding to each character is indexed by the Unicode code corresponding to the character; that is, for a given character, the corresponding character graphic information can be found in a character graphic information file through the character's Unicode code, thereby determining the character graphic with which the character is presented in a certain font.
A font set, i.e. a set of one or more fonts. In this application, a font set contains only the font identification of the font, which may be the font name.
Character attributes refer to characteristics that determine the appearance, style, size, etc. of a character. In the present application, character attribute information includes character codes and character graphic information corresponding to characters.
The watermark embedding method provided in the present application will be described below by taking a specific watermark embedding process as an example.
In the preparation phase, as shown in fig. 1, the steps are included as follows:
S101, determining a character set CS.
The character set CS includes at least one original character code corresponding to a preset character, and the number of the preset characters is not limited in the application, and commonly used characters or high-frequency characters can be selected as preset characters, and the preset characters can include Chinese characters, letters, numbers or symbols. The original character code herein is used to represent a character code common to preset characters in a general computer, and may be a Unicode code which has been defined in the prior art.
For example, the preset characters include the five characters "数", "蓬", "是", "安", and "的", and the character set CS obtained from the original character codes of the preset characters is as follows:
CS=【\u6570,\u84ec,\u662f,\u5b89,\u7684】。
S102, determining a custom character set CS'.
A new custom character code is set for each preset character. If the original character code corresponding to a preset character is an already defined Unicode code, the custom character code corresponding to the preset character is set to a Unicode code point that has not yet been defined.
The Unicode code space contains more than one million code points (1,114,112), but not every code point has a corresponding character or graphic. Code points with no assigned character are often referred to as "unassigned code points" or "reserved code points". These undefined Unicode codes can therefore be used as custom character codes for the preset characters.
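Whether a code point already has a standard character can be inspected with Python's standard `unicodedata` module. The following is a minimal sketch and not part of the application; the helper name `is_free_for_custom_use` is hypothetical. Note that the example codes used in this description (\ue001 to \ue005) lie in the Unicode Private Use Area (general category `Co`), which the standard sets aside for application-defined characters, whereas strictly unassigned code points have category `Cn`.

```python
import unicodedata


def is_free_for_custom_use(ch: str) -> bool:
    """Return True if no standard character is assigned to this code point.

    'Co' = private use, 'Cn' = unassigned, according to the Unicode
    version bundled with the running Python interpreter.
    """
    return unicodedata.category(ch) in ("Co", "Cn")


# An assigned CJK ideograph versus a Private Use Area code point.
print(is_free_for_custom_use("\u6570"))
print(is_free_for_custom_use("\ue001"))
```

Because the result depends on the Unicode database version, a code point that is unassigned today may be assigned in a later version; private-use code points are not subject to that change.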
And determining a custom character set CS 'according to the custom character codes corresponding to the preset characters, wherein the custom character set CS' comprises the custom character codes corresponding to the preset characters.
For the same preset character, a first mapping relation between the original character codes in the character set CS and the custom character codes in the custom character set CS' is determined. According to the original character codes in the character set CS, the custom character codes corresponding to the same preset character can be found in the custom character set CS' through the first mapping relation.
Continuing the above example, custom character codes are set for the preset characters "数", "蓬", "是", "安", and "的" respectively, and the custom character set CS' obtained from the custom character codes of the preset characters is as follows:
CS’=【\ue001,\ue002,\ue003,\ue004,\ue005】。
The first mapping relation, i.e., f(original Unicode) = new Unicode, is determined as shown in the following table.
Preset character | Character set CS | Custom character set CS'
数 | \u6570 | \ue001
蓬 | \u84ec | \ue002
是 | \u662f | \ue003
安 | \u5b89 | \ue004
的 | \u7684 | \ue005
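The first mapping relation in the table above can be sketched as a simple lookup table. This is an illustrative sketch only; the names `FIRST_MAPPING`, `FIRST_MAPPING_INV`, and `to_custom_code` are hypothetical and do not come from the application.

```python
# Original character codes (character set CS) and custom character codes
# (custom character set CS'), taken from the example in the description.
CS = ["\u6570", "\u84ec", "\u662f", "\u5b89", "\u7684"]
CS_CUSTOM = ["\ue001", "\ue002", "\ue003", "\ue004", "\ue005"]

# First mapping relation f: original character code -> custom character code.
FIRST_MAPPING = dict(zip(CS, CS_CUSTOM))

# Inverse mapping, e.g. for mapping a replaced character back to its original.
FIRST_MAPPING_INV = {custom: original for original, custom in FIRST_MAPPING.items()}


def to_custom_code(original_code: str) -> str:
    """Look up the custom code for a preset character's original code."""
    return FIRST_MAPPING[original_code]
```

A dictionary is used here because the mapping is one-to-one and looked up once per character during traversal.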
S103, determining a font set FS.
The font set FS includes at least one font identifier of a preset font, and the number of the preset fonts is not limited in the application, and commonly used fonts can be selected as the preset fonts. Each preset font is provided with a corresponding character graphic information file, the character graphic information file contains original character graphic information corresponding to characters when the characters are presented in the preset fonts, the preset fonts are fonts which are common to a general computer operating system, the character graphic information file corresponding to the preset fonts is a character graphic information file corresponding to the common fonts which are installed in the general computer operating system, the original character graphic information is used for representing the character graphic information corresponding to the characters in the general computer when the characters are presented in the common fonts, and the character graphic corresponding to the characters when the characters are displayed can be determined according to the character graphic information. In the character graphic information file, the character graphic information corresponding to each character takes the Unicode code corresponding to the character as an index, and the character graphic information corresponding to the character represented by the Unicode can be found through the Unicode.
Continuing the above example, the preset fonts may include Microsoft YaHei, Song Ti, DengXian, and regular script. The font names of the four fonts are used as font identifiers, and the font set FS obtained from the font identifiers of the preset fonts is as follows:
FS = [Microsoft YaHei, Song Ti, DengXian, regular script].
S104, determining a custom font set FS'.
According to the original character graph corresponding to each preset character when the preset characters are presented in the preset fonts, generating character graphs which are different from the original character graph but are not easy to be perceived by human eyes as custom character graphs. The generation of a custom character pattern may be obtained by adding or subtracting some features from the original character pattern, or by deforming the original character pattern.
According to the original character patterns corresponding to all preset characters respectively under a preset font, obtaining custom character patterns corresponding to all preset characters respectively, and according to the custom character patterns, obtaining a character pattern information file of a custom font, wherein the character pattern information file of the custom font comprises custom character pattern information corresponding to each preset character when the custom font is presented, and according to the custom character pattern information, the custom character patterns corresponding to the preset characters under the custom font can be determined. In the character graphic information file of the custom font, each custom character graphic information can be used as an index by corresponding original character codes of preset characters or custom character codes.
And carrying out the operation on each preset font to obtain a custom font similar to each preset font and a character graphic information file of the custom font. The approximation is used to indicate that the original character pattern corresponding to any character when presented in a predetermined font is different from the custom character pattern corresponding to the character when presented in a custom font, but is not easily perceived by human eyes. In practical applications, the similar concept can be determined by using the difference between the custom character pattern and the original character pattern to satisfy the preset difference condition. In fact, a custom character pattern is derived from an original character pattern, and the two character patterns are similar, and the fonts to which each belongs are also called similar.
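The idea of deriving a custom character graphic and checking it against a preset difference condition can be illustrated on a toy bitmap. This is only a sketch under strong assumptions: real font files usually store vector outlines rather than bitmaps, and the pixel-count threshold `max_diff` and all function names are hypothetical.

```python
def add_dot(glyph, row, col):
    """Return a copy of the bitmap with one extra pixel set (a 'dot')."""
    custom = [list(r) for r in glyph]
    custom[row][col] = 1
    return custom


def pixel_difference(a, b):
    """Count the pixels at which two equally sized bitmaps differ."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def meets_difference_condition(original, custom, max_diff=2):
    """Different from the original, but by no more than max_diff pixels."""
    return 0 < pixel_difference(original, custom) <= max_diff


# A toy 5x5 'original character graphic'.
ORIGINAL = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0],
]
# A dot added at the upper end gives the custom character graphic.
CUSTOM = add_dot(ORIGINAL, 0, 4)
```

The lower bound of the condition matters as much as the upper bound: a custom graphic identical to the original could not be told apart when the watermark is later read from an image.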
And determining a custom font set FS 'according to the font identification of the custom font to which the custom character graphic information corresponding to the preset character belongs, wherein the custom font set FS' comprises the font identification of the custom font close to each preset font.
And for the similar preset fonts and the custom fonts, determining a second mapping relation between the font identifications of the custom fonts in the custom font set FS' and the font identifications of the preset fonts in the font set FS. According to the font identification of the preset fonts in the font set FS, the font identification of the similar custom fonts can be found in the custom font set FS' through the second mapping relation.
Continuing the above example, the original character graphics corresponding to the preset characters "数", "蓬", "是", "安", and "的" under each preset font are slightly deformed, for example by adding dots at the upper or lower end of the original character graphics, to generate the custom character graphics.
Further, the custom character graphics generated from the character graphics in the Microsoft YaHei font are collectively named the custom character graphics of the Microsoft YaHei 1 font, and the character graphic information file corresponding to the Microsoft YaHei 1 font contains only the custom character graphic information corresponding to the preset characters "数", "蓬", "是", "安", and "的" under the Microsoft YaHei 1 font.
Similarly, from the three preset fonts Song Ti, DengXian, and regular script, the three custom fonts Song Ti 1, DengXian 1, and regular script 1 and their character graphic information files are obtained.
The custom fonts obtained in this way are the four fonts Microsoft YaHei 1, Song Ti 1, DengXian 1, and regular script 1; the custom character graphics corresponding to the preset characters under each custom font are shown in fig. 2. The font names of the four custom fonts are used as font identifications, and the custom font set FS' obtained from the font identifications of the custom fonts is as follows:
FS' = [Microsoft YaHei 1, Song Ti 1, DengXian 1, regular script 1].
The second mapping relation, i.e., F(FS) = FS', is determined as shown in the following table.
Font set FS | Custom font set FS'
Microsoft YaHei | Microsoft YaHei 1
Song Ti | Song Ti 1
DengXian | DengXian 1
Regular script | Regular script 1
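The second mapping relation can be sketched in the same way as a lookup from preset font identifiers to custom font identifiers. The font name strings below are English renderings assumed for illustration, and the "append 1" naming simply follows the example in the table; `SECOND_MAPPING` and `to_custom_font` are hypothetical names.

```python
# Font identifiers of the preset fonts (font set FS), as rendered in English.
FS = ["Microsoft YaHei", "Song Ti", "DengXian", "regular script"]

# Second mapping relation F: preset font identifier -> custom font identifier.
SECOND_MAPPING = {font_id: f"{font_id} 1" for font_id in FS}

# Custom font set FS', in the same order as FS.
FS_CUSTOM = [SECOND_MAPPING[font_id] for font_id in FS]


def to_custom_font(font_id: str) -> str:
    """Look up the similar custom font for a preset font identifier."""
    return SECOND_MAPPING[font_id]
```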
The preparation phase is thus completed.
In the watermark embedding stage, firstly, a document to be embedded with the watermark and watermark information to be embedded are obtained, the watermark information to be embedded is converted into a binary coding sequence, and then characters in the document are traversed. Referring to fig. 3, the method specifically includes the following steps:
S301, acquiring a character code C and a font identifier F corresponding to the current character.
And obtaining the character code C and character graphic information corresponding to the current character for the traversed current character, and determining the font identifier F of the font to which the character graphic information belongs according to the character graphic information.
S302, judging whether the current character supports embedding watermark information.
If C ∈ CS, i.e. the character code C corresponding to the current character is the same as an original character code in the character set CS, and F ∈ FS, i.e. the font identifier F corresponding to the current character is the same as the font identifier of a preset font in the font set FS, it is determined that the current character supports embedding watermark information, and step S303 is performed.
If C ∉ CS and/or F ∉ FS, i.e. the character code C corresponding to the current character differs from every original character code in the character set CS and/or the font identifier F corresponding to the current character differs from the font identifier of every preset font in the font set FS, it is determined that the current character does not support embedding watermark information; the next character is traversed and taken as the current character, and step S301 is executed.
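The membership test of step S302 can be sketched as a small predicate. This is an illustrative sketch; the function name `supports_embedding` and the English font name strings are assumptions, while the character codes are those of the example character set CS.

```python
# Character set CS and font set FS from the running example.
CS = frozenset(["\u6570", "\u84ec", "\u662f", "\u5b89", "\u7684"])
FS = frozenset(["Microsoft YaHei", "Song Ti", "DengXian", "regular script"])


def supports_embedding(code: str, font_id: str) -> bool:
    """S302: a character can carry a bit only if C is in CS and F is in FS."""
    return code in CS and font_id in FS


# A preset character in a preset font can carry a bit; a character outside
# CS, or a preset character in a font outside FS, cannot.
print(supports_embedding("\u6570", "Song Ti"))
print(supports_embedding("A", "Song Ti"))
print(supports_embedding("\u6570", "FangSong"))
```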
S303, embedding watermark information.
For the current character supporting the embedding of the watermark information, if the current code value to be embedded in the binary code sequence corresponding to the watermark information to be embedded is 1, the character code C corresponding to the current character is replaced by the corresponding custom character code, and the character graphic information corresponding to the current character is replaced by the corresponding custom character graphic information. That is, the currently-to-be-embedded code value 1 is embedded into the character by replacing the character code and character graphic information corresponding to the current character.
The replacement custom character code is the custom character code corresponding to the current character, obtained from the custom character set CS' according to the character code C corresponding to the current character and the first mapping relation.
The replacement custom character graphic information is obtained as follows: the font identification of the similar custom font is obtained from the custom font set FS' according to the second mapping relation and the font identification F of the font to which the character graphic information corresponding to the current character belongs; the custom character graphic information corresponding to the current character is then obtained from the character graphic information file of the custom font represented by that font identification.
For the current character supporting the embedding of watermark information, if the current coding value to be embedded in the binary coding sequence corresponding to the watermark information to be embedded is 0, the character coding and character graphic information corresponding to the current character are kept unchanged, and no processing is performed. That is, the current code value 0 to be embedded is embedded into the character by not replacing the character code and character graphic information corresponding to the current character.
After the current code value to be embedded is embedded into the character, the next code value in the binary code sequence corresponding to the watermark information is used as the current code value to be embedded, the next character is traversed, the next character is used as the current character, and the step S301 is executed.
When the characters in the document are all traversed or the binary code sequences corresponding to the watermark information are all embedded in the document, step S304 is performed.
If the number of bits in the binary code sequence corresponding to the watermark information is smaller than the number of characters in the document, the binary code sequence may be embedded cyclically to improve the stability of watermark embedding; in that case, every character in the document is traversed.
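Steps S301 to S303, including the optional cyclic embedding, can be sketched as a single loop. This is a simplified sketch, not the application's implementation: a document is modelled as a list of (character code, font identifier) pairs, replacing the font identifier stands in for replacing the character graphic information, and the names `embed_bits`, `FIRST_MAPPING`, and `SECOND_MAPPING` are hypothetical.

```python
from itertools import cycle

CS = ["\u6570", "\u84ec", "\u662f", "\u5b89", "\u7684"]
CS_CUSTOM = ["\ue001", "\ue002", "\ue003", "\ue004", "\ue005"]
FIRST_MAPPING = dict(zip(CS, CS_CUSTOM))
SECOND_MAPPING = {"Song Ti": "Song Ti 1", "DengXian": "DengXian 1"}


def embed_bits(chars, bits, cyclic=True):
    """Embed a string of '0'/'1' into a list of (code, font_id) pairs.

    Only characters with C in CS and F in FS consume a bit; a '1' replaces
    both the code and the font, a '0' leaves the character unchanged.
    """
    bit_source = cycle(bits) if cyclic else iter(bits)
    result = []
    for code, font in chars:
        if code in FIRST_MAPPING and font in SECOND_MAPPING:
            bit = next(bit_source, None)  # None once a non-cyclic sequence ends
            if bit == "1":
                code, font = FIRST_MAPPING[code], SECOND_MAPPING[font]
        result.append((code, font))
    return result


doc = [
    ("\u6570", "Song Ti"),   # supported, receives the first bit
    ("\u84ec", "Song Ti"),   # supported, receives the second bit
    ("A", "Song Ti"),        # code not in CS, skipped
    ("\u662f", "FangSong"),  # font not in FS, skipped
    ("\u7684", "DengXian"),  # supported, receives the third bit
]
marked = embed_bits(doc, "10110")
```

Characters that do not support embedding are skipped without consuming a code value, which matches the traversal described above; with `cyclic=False`, characters after the last bit are simply left unchanged.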
S304, outputting the document embedded with the watermark information.
The custom character codes and custom character graphic information used as replacements are added to a character attribute information file, and the character attribute information file is embedded into the document, so that the document embedded with watermark information can be displayed normally in any operating system environment. The character attribute information file may be of an OpenXML file type.
And outputting the document embedded with the watermark information, wherein the document carries a character attribute information file, and the characters replaced with the character codes and the character graphic information can be displayed according to the character attribute information file, so that a user-defined character graphic information file is not required to be additionally installed on an operating system.
The steps of the watermark embedding stage are described below using the above example.
The document to be embedded with the watermark is shown in fig. 4. The document illustrated in the figure comprises four lines of characters: the font of the first line is Song Ti, the font of the second line is FangSong, the font of the third line is Microsoft YaHei, and the font of the fourth line is DengXian.
The binary code sequence corresponding to the watermark information to be embedded is 10110.
Traversing to the first character "数" in the document as the current character. The character code C (\u6570) and the font identification F (Song Ti) corresponding to the current character are obtained.
It is then checked whether the character code C (\u6570) corresponding to the current character belongs to the character set CS = 【\u6570, \u84ec, \u662f, \u5b89, \u7684】, and whether the font identification F (Song Ti) corresponding to the current character belongs to the font set FS = [Microsoft YaHei, Song Ti, DengXian, regular script].
Since C ∈ CS and F ∈ FS, it is determined that the current character supports embedding watermark information.
And when the current code value to be embedded is 1, the character code C and the character graphic information corresponding to the current character are replaced, and the specific replacing process is not repeated here.
Traversing to the second character "蓬" in the document as the current character. The character code C (\u84ec) and the font identification F (Song Ti) corresponding to the current character are obtained.
Since C ∈ CS and F ∈ FS, it is determined that the current character supports embedding watermark information.
The current code value to be embedded is 0, and no processing is performed.
Traverse to the third character "family" in the document as the current character. The character code C and the font identifier F (Song Ti) corresponding to the current character are obtained.
Because C does not belong to CS, it is determined that the current character does not support embedding watermark information, and the traversal moves on to the next character.
Traverse to the fourth character "skill" in the document as the current character. The character code C and the font identifier F (Song Ti) corresponding to the current character are obtained.
Because C does not belong to CS, it is determined that the current character does not support embedding watermark information, and the traversal moves on to the next character.
The rest of the embedding process follows the steps of the watermark embedding stage and is not described again here. Because the binary code sequence corresponding to the watermark information has fewer bits than the document has characters, every character in the document is traversed and the binary code sequence is embedded cyclically.
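The walk-through above can be sketched in Python as follows. This is only an illustrative sketch: the names `CODE_MAP`, `FONT_MAP`, and `embed` are hypothetical, the custom code \ue002 is a placeholder, and \u79d1 and \u6280 merely stand in for the third and fourth characters, which the example places outside CS. Only the pairs \u6570 → \ue001 and Song Ti → Song Ti 1 appear in the example itself.

```python
from itertools import cycle

# Hypothetical preparation-stage data (assumption, not the patent's tables).
CODE_MAP = {"\u6570": "\ue001", "\u84ec": "\ue002"}  # original code -> custom code
FONT_MAP = {"Song Ti": "Song Ti 1"}                  # preset font -> custom font


def embed(chars, bits):
    """chars: list of (character code, font identifier); bits: e.g. "10110"."""
    bit_stream = cycle(bits)  # the bit sequence is reused cyclically
    marked = []
    for code, font in chars:
        # Only characters with C in CS and F in FS consume a bit.
        if code in CODE_MAP and font in FONT_MAP:
            if next(bit_stream) == "1":
                code, font = CODE_MAP[code], FONT_MAP[font]
            # bit 0: the character is left untouched
        marked.append((code, font))
    return marked


doc = [("\u6570", "Song Ti"), ("\u84ec", "Song Ti"),
       ("\u79d1", "Song Ti"), ("\u6280", "Song Ti")]
marked = embed(doc, "10110")
```

Here the first character is replaced (bit 1), the second is kept (bit 0), and the last two are skipped without consuming any bit, which mirrors the four steps traced above.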
Finally, all custom character codes and custom character graphic information used as replacements are added to a character attribute information file, and the character attribute information file is embedded in the document. As shown in FIG. 5, the content of the character attribute information file is determined jointly by the document content and the watermark information: when watermarks are embedded in different document contents, the custom character codes and custom character graphic information used as replacements are not necessarily the same, so the resulting character attribute information files are not necessarily the same either. The custom character codes and custom character graphic information contained in the character attribute information file are not necessarily all of those determined in the preparation stage. In general they are only a small part of them, namely a minimum set, which makes the character attribute information file easy to embed in the document and transmit with it. The receiving end can parse the document without installing a new font library, which lowers the threshold for use. In addition, an attacker cannot obtain all the custom character codes and custom character graphic information determined in the preparation stage, and therefore cannot tell which characters in the document support embedding watermark information but have not had their character codes and character graphic information replaced, and which characters do not support embedding watermark information at all. As a result, the attacker cannot accurately recover the watermark information.
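One way to picture the "minimum set" is the sketch below. It assumes, purely for illustration, that the custom character graphic information is held in a dictionary keyed by (custom character code, custom font identifier); `build_attribute_file` and `custom_library` are hypothetical names, and the glyph data are placeholder strings.

```python
def build_attribute_file(marked_chars, custom_library):
    """Keep only the custom entries that actually occur in the marked document."""
    used = {}
    for code, font in marked_chars:
        glyph = custom_library.get((code, font))
        if glyph is not None:          # the character was replaced during embedding
            used[(code, font)] = glyph
    return used


# Full preparation-stage library (placeholder glyph data).
custom_library = {
    ("\ue001", "Song Ti 1"): "<glyph data for U+E001>",
    ("\ue002", "Song Ti 1"): "<glyph data for U+E002>",
    ("\ue003", "Song Ti 1"): "<glyph data for U+E003>",
}
marked = [("\ue001", "Song Ti 1"), ("\u84ec", "Song Ti")]
attribute_file = build_attribute_file(marked, custom_library)
```

Only the single entry used by the document ends up in `attribute_file`, so the full library never travels with the document.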
Finally, the output document with the watermark information embedded is shown in FIG. 6. It can be seen that the character graphics of some characters differ slightly from the character graphics before the watermark was embedded. In actual operation, the difference between a custom character graphic and the original character graphic is not easily perceived by the naked eye, and only a party holding the relevant information can distinguish them, so screen-capture attacks can be resisted.
For a document obtained by the watermark embedding method described above, watermark information may be extracted by the following steps.
It should be noted that a watermark extracting party capable of normal watermark extraction must be authorized to obtain the character set CS, the custom character set CS', the font set FS, the custom font set FS', and the character graphic information file of the custom font determined in the preparation stage. It is assumed here by default that the watermark extracting party has obtained all of them.
Referring to FIG. 7, the watermark extraction stage includes the following steps:
S701, acquiring the character code C and the font identifier F corresponding to the current character.
Obtaining a document embedded with the watermark, and traversing characters in the document. And obtaining the character code C and character graphic information corresponding to the current character for the traversed current character, and determining the font identifier F of the font to which the character graphic information belongs according to the character graphic information.
S702, judging whether the current character is embedded with watermark information.
If C ∈ CS and F ∈ FS, that is, the character code C corresponding to the current character is the same as an original character code in the character set CS, and the font identifier F corresponding to the current character is the same as the font identifier of a preset font in the font set FS, it is determined that the current character has watermark information embedded.
Alternatively, if C ∈ CS' and/or F ∈ FS', that is, the character code C corresponding to the current character is the same as the character code of a custom character in the custom character set CS', or the font identifier F corresponding to the current character is the same as the font identifier of a custom font in the custom font set FS', or both, it is determined that the current character has watermark information embedded. In either case, step S703 is performed.
Although the watermark embedding stage replaces the character codes and character graphic information of some characters, an attacker may use the format painter to destroy the font style of the document, that is, apply an operation that changes the character graphic information. For a character whose character code and character graphic information have been replaced, the character graphic information under a general-purpose font cannot be indexed by the custom character code that the character currently carries, so the character is displayed as garbled text after the format painter operation. In this case the character graphic information of the character has changed and can no longer be used for decoding, but the character code of the character has not changed, so the current character can still be determined to carry watermark information as long as its character code is recognized as a custom character code. Therefore, the current character is determined to carry watermark information as long as either condition is satisfied: its character code is a custom character code, or its character graphic information is custom character graphic information.
If the character code C and the font identifier F corresponding to the current character do not satisfy the above conditions, determining that the watermark information is not embedded in the current character, traversing to the next character, and executing step S701 with the next character as the current character.
S703, extracting watermark information.
For a current character in which watermark information has been embedded, if C ∈ CS' and/or F ∈ FS', the current embedded code value is determined to be 1. Otherwise, that is, if C ∉ CS' and F ∉ FS' (so that C ∈ CS and F ∈ FS), the current embedded code value is determined to be 0.
After extracting the code value embedded by the current character, traversing to the next character, taking the next character as the current character, and executing step S701.
When the characters in the document are all traversed, the binary coding sequence corresponding to the embedded watermark information can be obtained. And analyzing the binary coding sequence to obtain embedded watermark information.
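Steps S701 to S703 can be summarized in the following sketch. The set contents are assumptions chosen to match the running example (the custom code \ue002 and the placeholder code points \u79d1 and \u6280 do not come from the text), and `extract_bits` is a hypothetical name.

```python
# Assumed preparation-stage sets (illustrative values only).
CS = {"\u6570", "\u84ec"}
CS_PRIME = {"\ue001", "\ue002"}
FS = {"Song Ti"}
FS_PRIME = {"Song Ti 1"}


def extract_bits(chars):
    """chars: list of (character code, font identifier) read from the document."""
    bits = []
    for code, font in chars:
        if code in CS_PRIME or font in FS_PRIME:   # C in CS' and/or F in FS'
            bits.append("1")
        elif code in CS and font in FS:            # C in CS and F in FS
            bits.append("0")
        # otherwise the character carries no watermark information
    return "".join(bits)


clean = [("\ue001", "Song Ti 1"), ("\u84ec", "Song Ti"),
         ("\u79d1", "Song Ti"), ("\u6280", "Song Ti")]
# After a format-painter attack the font is reset but the custom code survives.
attacked = [("\ue001", "Song Ti"), ("\u84ec", "Song Ti")]
clean_bits = extract_bits(clean)
attacked_bits = extract_bits(attacked)
```

Testing the custom condition with `or` is what lets the bit 1 survive when only the character code remains custom after the attack.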
The watermark extraction process will be described below using the above example.
The obtained document with embedded watermark information is shown in FIG. 6.
Traverse to the first character "number" (数) in the document as the current character. The character code C (\ue001) and the font identifier F (Song Ti 1) corresponding to the current character are obtained.
Since C ∈ CS' and/or F ∈ FS' (when the document has not suffered a format attack, both conditions hold), it is determined that the current character has watermark information embedded and that the current embedded code value is 1.
Traverse to the second character "tent" (蓬) in the document as the current character. The character code C (\u84ec) and the font identifier F (Song Ti) corresponding to the current character are obtained.
Since C ∈ CS and F ∈ FS, it is determined that the current character has watermark information embedded and that the current embedded code value is 0.
Traverse to the third character "family" in the document as the current character. The character code C and the font identifier F (Song Ti) corresponding to the current character are obtained.
Because the character code C and the font identifier F (Song Ti) corresponding to the current character satisfy neither C ∈ CS and F ∈ FS nor C ∈ CS' and/or F ∈ FS', it is determined that no watermark information is embedded in the current character.
Traverse to the fourth character "skill" in the document as the current character. The character code C and the font identifier F (Song Ti) corresponding to the current character are obtained.
Because the character code C and the font identifier F (Song Ti) corresponding to the current character satisfy neither C ∈ CS and F ∈ FS nor C ∈ CS' and/or F ∈ FS', it is determined that no watermark information is embedded in the current character.
The subsequent watermark extraction process only needs to be operated according to the steps of the watermark extraction stage for the document, and will not be described here. The binary code sequence corresponding to the extracted watermark information is shown in fig. 8, and the number below the character is the code value embedded by the character. And analyzing the extracted binary coding sequence to obtain the embedded watermark information.
For a document image obtained by screen capturing or photographing, watermark information can be extracted through the following steps.
It should be noted that the watermark extracting device obtains the character set CS, the custom character set CS', the font set FS, the custom font set FS', and the character graphic information file of the custom font determined in the preparation stage, and creates a character graphic reference table from them. The character graphic reference table contains, for each preset character, the original character graphic shown when the preset character is presented in a preset font and the custom character graphic shown when it is presented in the custom font. These character graphics can be determined from the character graphic information in the existing character graphic information files.
The document image may be obtained by photographing, scanning, screen capturing, printing, etc. the document content.
The watermark extraction stage, as shown in fig. 9, comprises the following steps:
S901, obtaining the character graphics corresponding to the characters in the document image.
A document image corresponding to the watermark-embedded document is obtained, and text recognition is performed on it using PaddleOCR to obtain all characters contained in the document image. PaddleOCR is a multilingual, multifunctional open-source OCR (Optical Character Recognition) tool library based on the PaddlePaddle deep learning platform. OCR is a technique that converts an image of printed or handwritten text into text that can be edited and searched. PaddleOCR provides rich OCR functionality, including text detection, text recognition, and text orientation detection. It can recognize text in multiple languages and supports a large number of pre-trained models to suit different scenarios and requirements. It also provides an easy-to-use API (application programming interface) and pre-trained models that allow developers to quickly integrate and deploy OCR functionality in their own applications.
A character graphic segmentation operation is then performed on the document image using a segmentation algorithm based on vertical projection or edge detection, yielding the position information of each character graphic in the document image. A rectangular block containing a single character graphic is cut out according to the position information, and the blocks are matched one-to-one with the characters obtained by text recognition, thereby obtaining the character graphic corresponding to each character.
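As a rough illustration of segmentation by vertical projection, the sketch below splits one binarized text line into column spans wherever an ink-free column separates two runs of ink. It is a simplified assumption rather than the algorithm actually used: the bitmap is a list of rows of 0/1 values, `split_columns` is a hypothetical name, and a practical system would also have to merge spans that belong to one character with a left-right structure.

```python
def split_columns(bitmap):
    """bitmap: list of rows, each a list of 0/1 values (1 = ink).

    Returns a list of (x_start, x_end) spans, with x_end exclusive.
    """
    width = len(bitmap[0])
    # Vertical projection: amount of ink in every pixel column.
    profile = [sum(row[x] for row in bitmap) for x in range(width)]
    spans, start = [], None
    for x, ink in enumerate(profile):
        if ink and start is None:
            start = x                      # a run of ink begins
        elif not ink and start is not None:
            spans.append((start, x))       # the run ends at an empty column
            start = None
    if start is not None:
        spans.append((start, width))       # run touching the right edge
    return spans


line = [[0, 1, 1, 0, 0, 1, 0],
        [0, 1, 0, 0, 0, 1, 1]]
spans = split_columns(line)
```

For this toy line the projection profile is [0, 2, 1, 0, 0, 2, 1], so two spans are found.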
S902, judging whether the current character is embedded with watermark information.
Traversing the characters identified by the text, and comparing the character graph corresponding to the current character with the character graph in the character graph reference table for the traversed current character.
If the character pattern corresponding to the current character is the same as one of the character patterns in the character pattern reference table, determining that the watermark information has been embedded in the current character, and executing step S903.
If the character pattern corresponding to the current character is different from the character pattern in the character pattern reference table, determining that the watermark information is not embedded in the current character, traversing to the next character, and executing step S902 by taking the next character as the current character.
Of course, it is relatively time-consuming to perform image comparison on the character pattern corresponding to each character, and therefore, it is also possible to determine whether the current character has embedded watermark information through character encoding.
Because text recognition attends only to the text content and not to the text style, the character graphic information of the characters in the document obtained by text recognition is no longer that of the original text. The character code corresponding to each character, however, can still be obtained from the recognized document.
It is therefore possible to first judge whether the character code of each character belongs to the character set CS, screen out the characters whose character codes belong to CS, and compare only the character graphics of those characters with the character graphics in the character graphic reference table, which greatly reduces the image comparison workload.
S903, extracting watermark information.
And for the current character embedded with the watermark information, if the character graph corresponding to the current character is the same as the custom character graph in the character graph reference table, determining that the current embedded coding value is 1. If the character pattern corresponding to the current character is the same as the original character pattern in the character pattern reference table, determining that the current embedded coding value is 0.
After extracting the code value embedded by the current character, traversing to the next character, taking the next character as the current character, and executing step S902.
And after the characters identified by the text are completely traversed, obtaining the binary coding sequence corresponding to the embedded watermark information. And analyzing the binary coding sequence to obtain embedded watermark information.
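Steps S902 and S903, together with the character-code pre-filter, might look like the sketch below. Several things here are assumptions made for illustration: glyphs are equal-sized 0/1 bitmaps, "the same" is modelled as a Hamming distance of at most `max_dist` (0 by default, meaning an exact match), the reference table is a dictionary keyed by the recognized character code, and all function names are hypothetical.

```python
def hamming(a, b):
    """Number of differing pixels between two equal-sized 0/1 bitmaps."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def classify(glyph, original, custom, max_dist=0):
    """Return "0", "1", or None when the glyph matches neither reference."""
    d_orig, d_cust = hamming(glyph, original), hamming(glyph, custom)
    if min(d_orig, d_cust) > max_dist:
        return None
    return "0" if d_orig <= d_cust else "1"


def extract_from_image(recognized, reference, max_dist=0):
    """recognized: list of (character code, glyph bitmap) from OCR and cropping.
    reference: character code -> (original bitmap, custom bitmap)."""
    bits = []
    for code, glyph in recognized:
        if code not in reference:      # cheap pre-filter by character code
            continue
        bit = classify(glyph, *reference[code], max_dist)
        if bit is not None:
            bits.append(bit)
    return "".join(bits)


ORIG_A, CUST_A = [[1, 0], [0, 1]], [[1, 1], [0, 1]]
ORIG_B, CUST_B = [[0, 1], [1, 0]], [[0, 1], [1, 1]]
reference = {"\u6570": (ORIG_A, CUST_A), "\u84ec": (ORIG_B, CUST_B)}
recognized = [("\u6570", CUST_A), ("\u84ec", ORIG_B), ("\u79d1", [[0, 0], [0, 0]])]
image_bits = extract_from_image(recognized, reference)
```

On real screenshots or photographs an exact pixel match is unlikely, so a non-zero tolerance or a more robust similarity measure would probably be needed.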
The watermark extraction process will be described below using the above example.
The document image in which the watermark information is embedded is shown in fig. 6, and the preset character pattern reference table is shown in fig. 10.
And carrying out text recognition processing on the document image to acquire all characters contained in the document image.
A character graphic segmentation operation is performed on the document image to obtain the position information of each character in the document image, in the following manner. The upper-left point of the image is selected as the anchor point, and the upper-left point A (x1, y1) and the lower-right point B (x2, y2) of the rectangle are taken as the position information of the image block, where x is the horizontal distance from the anchor point and y is the vertical distance from the anchor point. For example, if the anchor point is at (0, 0), the horizontal and vertical distances from the upper-left point A of the character "number" to the anchor point are 300 and 400 respectively, and the horizontal and vertical distances from the lower-right point B to the anchor point are 330 and 430 respectively, then the position information of the character "number" is recorded as (300, 400) and (330, 430). A rectangular block containing a single character graphic is cut out according to the position information, and the blocks are matched one-to-one with the characters obtained by text recognition, yielding the character graphic corresponding to each character contained in the document image.
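Cutting out a block from the recorded position information can be sketched as below. The image is assumed to be a list of pixel rows with the origin at the upper-left corner, and the lower-right point is treated as exclusive; both are assumptions of this sketch, the 640 by 480 image size is arbitrary, and `crop` is a hypothetical name.

```python
def crop(image, top_left, bottom_right):
    """Cut the rectangle A(x1, y1) to B(x2, y2) out of a row-major image."""
    (x1, y1), (x2, y2) = top_left, bottom_right
    # Rows are selected by y, columns by x; the end coordinates are exclusive.
    return [row[x1:x2] for row in image[y1:y2]]


# Blank placeholder image, 640 pixels wide and 480 pixels high.
image = [[0] * 640 for _ in range(480)]
block = crop(image, (300, 400), (330, 430))
```

With the coordinates from the example, the block for the character "number" is 30 pixels wide and 30 pixels high.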
Traverse to the first character "number" in the document as the current character, and compare the character graphic corresponding to the current character with the character graphics in the character graphic reference table.
Since the character graphic corresponding to the current character is the same as one of the custom character graphics in the character graphic reference table, it is determined that the current character has watermark information embedded and that the current embedded code value is 1.
Traverse to the second character "tent" in the document as the current character, and compare the character graphic corresponding to the current character with the character graphics in the character graphic reference table.
Since the character graphic corresponding to the current character is the same as one of the original character graphics in the character graphic reference table, it is determined that the current character has watermark information embedded and that the current embedded code value is 0.
Traverse to the third character "family" in the document as the current character, and compare the character graphic corresponding to the current character with the character graphics in the character graphic reference table.
Since the character graphic corresponding to the current character differs from every character graphic in the character graphic reference table, it is determined that no watermark information is embedded in the current character.
Traverse to the fourth character "skill" in the document as the current character, and compare the character graphic corresponding to the current character with the character graphics in the character graphic reference table.
Since the character graphic corresponding to the current character differs from every character graphic in the character graphic reference table, it is determined that no watermark information is embedded in the current character.
The subsequent watermark extraction process only needs to be operated according to the steps of the watermark extraction stage for the document image, and will not be described here. The binary code sequence corresponding to the extracted watermark information is shown in fig. 8, and the number below the character is the code value embedded by the character. And analyzing the extracted binary coding sequence to obtain the embedded watermark information.
The watermark embedding method, apparatus, electronic device, and computer-readable storage medium; the watermark extracting method, apparatus, electronic device, and computer-readable storage medium; and the carrier object modification method, apparatus, electronic device, and computer-readable storage medium provided by the present application are described in detail below.
First embodiment
A first embodiment of the present application provides a watermark embedding method, as shown in fig. 11, including the following steps:
S1101, obtaining the character code currently corresponding to a target character in a carrier object in which a watermark is to be embedded, as the current character code, and obtaining the character graphic information currently corresponding to the target character, as the current character graphic information.
The carrier object is the object in which the watermark is to be embedded. In this application it may be a document, which here refers specifically to a digital carrier for recording text information, generated by text editing software. The document contains character attribute information of the text. The character attribute information includes the character codes and character graphic information corresponding to the characters, and the text content, typesetting, and so on of the document can be determined from it.
Character encoding refers to a code when characters are stored in a computer or transferred through a communication network, and in this application, the character encoding may be Unicode encoding.
A character graphic is the specific glyph shape formed for a given character. The common style shared by a group of character graphics with a uniform style is defined as a font. Text editing software includes character graphic information files for various fonts, and each character graphic information file contains the character graphic information corresponding to each character when that character is presented in one font. In a character graphic information file, the character graphic information corresponding to each character is indexed by the Unicode code of that character. That is, for a given character, the corresponding character graphic information can be found in a character graphic information file through the character's Unicode code, which determines the character graphic shown when the character is presented in a particular font.
Since the watermark embedding method provided by the application needs to traverse characters in the document, the traversed current character is used as a target character.
And obtaining the character code and character graphic information corresponding to the target character currently according to the character attribute information of the text contained in the carrier object.
S1102, judging whether the target character can correspond to watermark information according to the current character code and the current character graphic information.
Before this step is performed, a first character set and a first font set are also determined.
The first character set includes the original character codes corresponding to preset characters. Typically, commonly used or high-frequency characters are selected as preset characters. The preset characters may include Chinese characters, letters, numbers, or symbols, and their number is not limited here. The original character code corresponding to a preset character may be a Unicode code already defined in the prior art.
The first font set includes the font identifiers of preset fonts, where a font is the font to which character graphic information belongs. Commonly used fonts, such as Song Ti, regular script, or boldface, may be selected as preset fonts, and the number of preset fonts is not limited here.
Judging whether the target character can correspond to watermark information according to the current character code and the current character graphic information, wherein the method specifically comprises the following steps of:
if the current character code belongs to the first character set and the font identifier of the font to which the current character graphic information belongs to the first font set, determining that the target character can correspond to watermark information;
and if the current character code does not belong to the first character set and/or the font identification of the font to which the current character graphic information belongs does not belong to the first font set, determining that the target character cannot correspond to watermark information.
Judging whether the target character can correspond to watermark information or not can be understood as judging whether the target character supports embedding watermark information or not. When the current character code and the current character graphic information of the target character meet certain conditions, determining that the target character supports watermark information embedding, and then performing the next watermark embedding operation on the target character. If the target character is determined not to support embedding watermark information, traversing to the next character, and judging the next character as the target character.
S1103, if the target character can correspond to watermark information, executing a step of embedding watermark information in the carrier object, where the step includes: obtaining an updated character code for the target character and replacing the current character code with it; and obtaining updated character graphic information for the target character and replacing the current character graphic information with it, wherein the updated character code represents the character code of the target character after the character graphic information has been updated.
Before this step is performed, custom character codes and custom character graphic information are also required to be set for the preset characters, specifically as follows:
Custom character codes are set for the preset characters. If the original character code corresponding to a preset character is an already defined Unicode code, the custom character code corresponding to that preset character may be set to a Unicode code that has not yet been defined. A second character set is obtained from the custom character codes corresponding to the preset characters, and the second character set includes the custom character code corresponding to each preset character.
And determining a first mapping relation between the custom character codes in the second character set and the original character codes in the first character set according to the same preset character. That is, according to the original character codes in the first character set, the custom character codes corresponding to the same preset character can be found in the second character set through the first mapping relation.
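A possible way to build the second character set and the first mapping relation is sketched below. It assumes that custom codes are allocated sequentially from the Unicode Private Use Area of the Basic Multilingual Plane (U+E000 to U+F8FF), starting at U+E001 to match the \ue001 seen in the earlier example. The allocation scheme and the name `build_first_mapping` are assumptions of this sketch, not something the text specifies.

```python
PUA_FIRST, PUA_LAST = 0xE000, 0xF8FF  # BMP Private Use Area


def build_first_mapping(preset_chars, start=0xE001):
    """Map each original character code to a custom code from the PUA."""
    mapping = {}
    for offset, original in enumerate(preset_chars):
        code_point = start + offset
        if not PUA_FIRST <= code_point <= PUA_LAST:
            raise ValueError("ran out of Private Use Area code points")
        mapping[original] = chr(code_point)
    return mapping


first_set = ["\u6570", "\u84ec", "\u662f", "\u5b89", "\u7684"]
first_mapping = build_first_mapping(first_set)
second_set = set(first_mapping.values())
```

Because the mapping is one-to-one, inverting `first_mapping` gives the original character code back from a custom code.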
And obtaining custom character graphic information corresponding to the preset character according to the original character graphic information corresponding to the preset character in the character graphic information base of the preset font.
The character graphic information base of the preset fonts is used for storing the original character graphic information belonging to the preset fonts, and the character graphic information base can be stored in a file form, namely the character graphic information file.
The original character graphic can be determined from the original character graphic information. Based on the original character graphic shown when a preset character is presented in a preset font, a character graphic that differs from the original but is not easily perceived by the human eye is generated as the custom character graphic. The common style of a group of custom character graphics with a uniform style is defined as a custom font, which yields a character graphic information base of the custom font. This information base contains the character graphic information corresponding to each preset character when presented in the custom font, that is, it is obtained from the custom character graphic information corresponding to the preset characters. A second font set is obtained from the font identifiers of the custom fonts.
According to the original character graphic information under a preset font, the customized character graphic information under a customized font is obtained, and for the same preset character, the original character graphic information is similar to the customized character graphic information, and in practical application, the similar concept can be defined as that the difference between the customized character graphic information and the original character graphic information meets the preset difference condition. Further, the preset font may be considered similar to the custom font.
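The text leaves the "preset difference condition" open. One conceivable form, shown purely as an assumption, is to require that the fraction of differing pixels between the two glyph bitmaps be non-zero but no larger than a threshold. The threshold of 5 percent and the name `meets_difference_condition` are illustrative only.

```python
def meets_difference_condition(original, custom, max_ratio=0.05):
    """True if the bitmaps differ, but in at most max_ratio of their pixels."""
    total = sum(len(row) for row in original)
    differing = sum(p != q
                    for row_o, row_c in zip(original, custom)
                    for p, q in zip(row_o, row_c))
    return 0 < differing / total <= max_ratio


original = [[0] * 10 for _ in range(10)]      # 10 x 10 placeholder glyph
slightly_changed = [row[:] for row in original]
slightly_changed[0][0] = slightly_changed[5][5] = 1   # 2 of 100 pixels differ
heavily_changed = [[1] * 10] + [row[:] for row in original[1:]]  # 10 of 100
```

A glyph identical to the original fails the condition because it could not encode anything, while a heavily altered one fails because it would be noticeable.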
A second mapping relation between the font identifiers of the custom fonts in the second font set and the font identifiers of the preset fonts in the first font set is determined according to the difference between the custom character graphic information and the original character graphic information, that is, according to which preset font and custom font are similar. In other words, given the font identifier of a preset font in the first font set, the font identifier of the similar custom font can be found in the second font set through the second mapping relation.
In addition, a binary coding sequence corresponding to the watermark information to be embedded needs to be obtained, that is, the watermark information is converted into the binary coding sequence, and a specific conversion method is the prior art and is not repeated here.
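The conversion itself is left to the prior art. As one common possibility, the sketch below encodes the watermark text as UTF-8 and writes each byte as eight bits. The choice of UTF-8 and the names `to_bits` and `from_bits` are assumptions of this sketch.

```python
def to_bits(text):
    """Encode text as UTF-8 and return its bits as a string of 0s and 1s."""
    return "".join(f"{byte:08b}" for byte in text.encode("utf-8"))


def from_bits(bits):
    """Inverse of to_bits; the length of bits must be a multiple of 8."""
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    return data.decode("utf-8")


sequence = to_bits("A")
```

The extraction side would apply `from_bits` to one full period of the cyclically embedded sequence.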
After the preparation is complete, the step of embedding watermark information in the carrier object is performed for each target character that can correspond to watermark information, specifically:
Whether to update the current character code and the current character graphic information is determined according to the current code value to be embedded in the binary coding sequence corresponding to the watermark information.
If the current code value to be embedded is a first code value, for example 1, it is determined that the current character code and the current character graphic information are to be updated.
In this case, an updated character code for the target character is obtained and used to replace the current character code, and updated character graphic information for the target character is obtained and used to replace the current character graphic information.
The updated character code for the target character may be obtained according to the current character code and the first mapping relation. Specifically, the custom character code in the second character set that corresponds to the current character code is looked up through the first mapping relation; this custom character code is the updated character code for the target character.
The updated character graphic information for the target character may be obtained according to the current character graphic information and the second mapping relation. Specifically, the font identification of the font to which the current character graphic information belongs is obtained first. Next, the font identification of the corresponding custom font in the second font set is obtained through the second mapping relation. Finally, the custom character graphic information corresponding to the target character is retrieved from the character graphic information base of the custom font represented by that font identification; this custom character graphic information is the updated character graphic information for the target character.
It can be seen that the updated character code represents the character code of the target character after updating the character graphic information.
If the current code value to be embedded in the binary coding sequence corresponding to the watermark information is a second code value, for example 0, it is determined that the current character code and the current character graphic information are not updated. That is, the current character is left unprocessed, the traversal moves to the next character, and the next character is taken as the target character.
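The embedding step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: a character is modeled as a `Glyph` holding a character code and a font identification, the mapping contents are placeholders, and a character is treated as able to carry watermark information when its code appears in the first mapping and its font appears in the second mapping. The names `Glyph`, `FIRST_MAPPING`, `SECOND_MAPPING` and `embed` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Glyph:
    code: str  # current character code
    font: str  # font identification of the current character graphic information


# Placeholder mappings (assumed values for illustration only).
FIRST_MAPPING = {"A": "\ue001", "B": "\ue002"}    # original code -> custom code
SECOND_MAPPING = {"PresetFont": "PresetFont-1"}   # preset font -> similar custom font


def embed(chars: list[Glyph], bits: list[int]) -> list[Glyph]:
    """Consume one bit per eligible character; replace code and font when the bit is 1."""
    out = []
    i = 0
    for ch in chars:
        eligible = ch.code in FIRST_MAPPING and ch.font in SECOND_MAPPING
        if eligible and i < len(bits):
            if bits[i] == 1:
                # First code value: update both the character code and the glyph font.
                ch = Glyph(FIRST_MAPPING[ch.code], SECOND_MAPPING[ch.font])
            # Second code value (0): leave the character untouched.
            i += 1
        out.append(ch)
    return out
```

Characters that cannot carry watermark information are passed through unchanged and do not consume a bit, which mirrors the traversal described in the text.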
In the method above, the custom character code set for each preset character is unique. A second method of setting custom character codes for preset characters is also provided.
First, custom character graphic information is set for the preset characters in the same way as described above, yielding the character graphic information base of the custom fonts and the second font set. The character graphic information base of the custom fonts stores the custom character graphic information corresponding to each preset character when presented in each custom font, and the second font set stores the font identifications of all the custom fonts.
Then, a custom character code is set for the preset character corresponding to each item of custom character graphic information under each custom font. That is, when a preset character is presented in different custom fonts, the corresponding custom character codes differ, so the custom character code set for a preset character is no longer unique. For the same character, the number of items of custom character graphic information equals the number of custom character codes. For example, for the preset character "number", the custom character code is "\ue001" under the custom font Microsoft YaHei 1, "\ue011" under the custom font SimSun 1, "\ue111" under the custom font DengXian 1, and "\ue110" under the custom font KaiTi 1. By analogy, four different custom character codes are set for each of the preset characters "tent", "yes" and "safe", and all the custom character codes corresponding to each preset character are stored in a third character set, so that the third character set comprises 20 different custom character codes. Thus four custom character codes are provided for the same character, each associated with a different custom font.
And obtaining a third character set according to the custom character codes corresponding to the preset characters, namely, obtaining the third character set according to the custom character codes of the preset characters corresponding to the graphic information of each custom character under each custom font. And determining a third mapping relation between the custom character codes in the third character set and custom character graphic information in the character graphic information base of the custom fonts according to the same preset character. That is, according to the custom character graphic information in the character graphic information base of the custom font, the custom character code corresponding to the same preset character can be found in the third character set through the third mapping relation.
Thus, the obtaining the updated character code for the target character may be obtaining the updated character code for the target character according to the custom character graphic information corresponding to the target character and the third mapping relationship. Specifically, according to the custom character graphic information corresponding to the target character and the third mapping relation, a custom character code corresponding to the custom character graphic information in the third character set is obtained, namely, the updated character code for the target character.
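The second method can be illustrated with a small lookup sketch. It assumes that an item of custom character graphic information can be keyed by the pair (preset character, custom font identification); the four code values are the ones given in the example above, while the font identifications and the names `SECOND_MAPPING`, `THIRD_MAPPING` and `updated_code_for` are hypothetical placeholders.

```python
# Preset font -> similar custom font (second mapping relation, placeholder ids).
SECOND_MAPPING = {
    "preset-font-1": "custom-font-1",
    "preset-font-2": "custom-font-2",
    "preset-font-3": "custom-font-3",
    "preset-font-4": "custom-font-4",
}

# (preset character, custom font) -> custom character code (third mapping relation).
THIRD_MAPPING = {
    ("number", "custom-font-1"): "\ue001",
    ("number", "custom-font-2"): "\ue011",
    ("number", "custom-font-3"): "\ue111",
    ("number", "custom-font-4"): "\ue110",
}


def updated_code_for(preset_char: str, current_font: str) -> tuple[str, str]:
    """Return (updated character code, custom font) for a character shown in current_font."""
    custom_font = SECOND_MAPPING[current_font]
    return THIRD_MAPPING[(preset_char, custom_font)], custom_font
```

Because each code is tied to one custom font, the code alone is enough to recover which custom glyph was used, even if the font attribute is later changed.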
S1104, adding the updated character codes and the corresponding updated character graphic information into a character attribute information file, wherein the character attribute information file is a file embedded in the carrier object.
The watermark-embedded carrier object obtained by the watermark embedding method can resist screen capturing and photographing attacks, that is, watermark information can still be accurately extracted from a carrier object image obtained by means of screen capturing and photographing, and tracing is carried out. In addition, the carrier object embedded with the watermark can resist format brush attack, that is, after an attacker performs an operation of changing character graphic information on the carrier object, the watermark information can still be accurately extracted from the carrier object. The carrier object embedded with the watermark carries a character attribute information file containing updated character codes and updated character graphic information, so that the display of the carrier object is not influenced by the environment of an operating system, and the operating system can decode the carrier object without installing a custom font library. In conclusion, the method has wide application range and better robustness, safety and reliability.
Second embodiment
A second embodiment of the present application provides a watermark embedding device, as shown in fig. 12. The device corresponds to the watermark embedding method provided in the first embodiment. Since the device embodiment is similar to the method embodiment, it is described only briefly; for the relevant details, refer to the first embodiment.
The watermark embedding device 1200 provided in this embodiment includes:
a first obtaining unit 1201, configured to obtain a character code currently corresponding to a target character in a carrier object to be embedded with a watermark, as a current character code, and obtain character graphic information currently corresponding to the target character, as current character graphic information;
a first judging unit 1202, configured to judge whether the target character can correspond to watermark information according to the current character code and the current character graphic information;
a first embedding unit 1203, configured to perform, when the target character can correspond to watermark information, a step of embedding watermark information in the carrier object, where the step of embedding watermark information in the carrier object includes: obtaining an updated character code for the target character and replacing the current character code with the updated character code, and obtaining updated character graphic information for the target character and replacing the current character graphic information with the updated character graphic information, wherein the updated character code represents the character code of the target character after the character graphic information is updated;
a first adding unit 1204, configured to add the updated character code and the corresponding updated character graphic information to a character attribute information file, where the character attribute information file is a file embedded in the carrier object.
Third embodiment
A third embodiment of the present application provides a watermark extraction method, that is, a method for extracting a watermark for a document embedded with the watermark, as shown in fig. 13, the method includes the steps of:
S1301, obtaining the character code currently corresponding to a target character in the watermark-embedded carrier object as a current character code, and obtaining the character graphic information currently corresponding to the target character as current character graphic information.
Since the watermark extraction method provided by the application also needs to traverse characters in the document, the traversed current character is taken as a target character.
And obtaining the character code and character graphic information corresponding to the target character currently according to the character attribute information of the text contained in the carrier object.
S1302, judging whether the target character corresponds to watermark information according to the current character code and/or the current character graphic information.
Before this step is performed, the first character set and the second character set, and the first font set and the second font set, described above, are also obtained.
The first character set comprises original character codes corresponding to preset characters, and the second character set comprises custom character codes corresponding to the preset characters. The first font set comprises font identification of preset fonts, and the second font set comprises font identification of custom fonts. Please refer to the related description of the first embodiment.
Judging whether the target character corresponds to watermark information according to the current character code and/or the current character graphic information, specifically:
If the current character code belongs to the first character set and the font identification of the font to which the current character graphic information belongs is in the first font set, it is determined that the target character corresponds to watermark information; or, if the current character code belongs to the second character set and/or the font identification of the font to which the current character graphic information belongs is in the second font set, it is determined that the target character corresponds to watermark information. In either case, the watermark extraction operation of the next step may be performed on the target character.
If the current character code and the current character graphic information satisfy neither condition, it is determined that the target character does not correspond to watermark information, the traversal moves to the next character, and the next character is taken as the target character.
For the second case of setting the custom character code for the preset character, correspondingly, the first font set and the second font set, and the first character set and the third character set are obtained first.
The first font set comprises font identification of preset fonts, and the second font set comprises font identification of custom fonts. The first character set comprises original character codes corresponding to preset characters, and the third character set comprises custom character codes corresponding to the preset characters when the preset characters are presented in the custom fonts. Please refer to the related description of the first embodiment.
Judging whether the target character corresponds to watermark information according to the current character code and/or the current character graphic information, specifically:
If the current character code belongs to the first character set and the font identification of the font to which the current character graphic information belongs is in the first font set, it is determined that the target character corresponds to watermark information; or, if the current character code belongs to the third character set and/or the font identification of the font to which the current character graphic information belongs is in the second font set, it is determined that the target character corresponds to watermark information.
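The membership test of S1302 can be sketched as a single predicate. The sketch is illustrative only: the sets are passed in as plain Python sets, `custom_codes` stands for the second character set in the first scheme or the third character set in the second scheme, and the function name `corresponds_to_watermark` is hypothetical.

```python
def corresponds_to_watermark(code, font, original_codes, custom_codes,
                             preset_fonts, custom_fonts):
    """Return True if the character carries a watermark bit (either value)."""
    # Untouched carrier character: original code shown in a preset font.
    untouched = code in original_codes and font in preset_fonts
    # Replaced carrier character: custom code and/or custom font is present.
    replaced = code in custom_codes or font in custom_fonts
    return untouched or replaced
```

The "and/or" in the second condition is what lets a character still be recognized after only one of its two attributes has been altered.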
S1303, if the target character corresponds to watermark information, executing a step of extracting watermark information from the carrier object, where the step of extracting watermark information from the carrier object includes: and obtaining a current embedded coding value in a coding sequence corresponding to the watermark information according to the current character coding and/or the current character graphic information.
The coding sequence corresponding to the watermark information is a binary coding sequence.
Obtaining the current embedded code value in the coding sequence corresponding to the watermark information according to the current character code and/or the current character graphic information is specifically:
For a target character corresponding to watermark information, if the current character code belongs to the second character set and/or the font identification of the font to which the current character graphic information belongs is in the second font set, that is, if at least one of the current character code and the current character graphic information has been replaced, the current embedded code value is determined to be the first code value, for example 1.
For a target character corresponding to watermark information, if the current character code does not belong to the second character set and the font identification of the font to which the current character graphic information belongs is not in the second font set, the current embedded code value is determined to be the second code value, for example 0.
At this point, the current embedded code value corresponding to the target character has been extracted, and the traversal moves to the next character.
For the second case of setting custom character codes for preset characters, obtaining the current embedded code value in the coding sequence corresponding to the watermark information according to the current character code and/or the current character graphic information is correspondingly:
If the current character code belongs to the third character set and/or the font identification of the font to which the current character graphic information belongs is in the second font set, the current embedded code value is determined to be the first code value.
If the current character code does not belong to the third character set and the font identification of the font to which the current character graphic information belongs is not in the second font set, the current embedded code value is determined to be the second code value.
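Steps S1302 and S1303 together amount to one pass over the characters of the document. The sketch below is an illustration under assumptions: each character is given as a (character code, font identification) pair, `custom_codes` stands for the second or third character set depending on the scheme, and `extract_bits` is a hypothetical name.

```python
def extract_bits(chars, original_codes, custom_codes, preset_fonts, custom_fonts):
    """Traverse (code, font) pairs and collect the embedded code values."""
    bits = []
    for code, font in chars:
        replaced = code in custom_codes or font in custom_fonts
        untouched = code in original_codes and font in preset_fonts
        if replaced:
            bits.append(1)   # first code value: code and/or glyph was replaced
        elif untouched:
            bits.append(0)   # second code value: carrier character left as-is
        # Otherwise the character carries no watermark information and is skipped.
    return bits
```

A character whose font was reset by a format brush but whose custom character code survived still yields 1, which is the robustness property claimed for the method.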
Fourth embodiment
A fourth embodiment of the present application provides a watermark extraction apparatus, as shown in fig. 14. The apparatus corresponds to the watermark extraction method provided in the third embodiment, and since the apparatus embodiment is similar to the method embodiment, the description is relatively simple, and please refer to the content of the third embodiment for relevant points.
The watermark extraction apparatus 1400 provided in this embodiment includes:
a second obtaining unit 1401, configured to obtain a character code currently corresponding to a target character in the watermark-embedded carrier object, as a current character code, and obtain character graphic information currently corresponding to the target character, as current character graphic information;
a second judging unit 1402, configured to judge whether the target character corresponds to watermark information according to the current character code and/or the current character graphic information;
a first extraction unit 1403, configured to perform, when the target character corresponds to watermark information, a step of extracting watermark information from the carrier object, where the step of extracting watermark information from the carrier object includes: and obtaining a current embedded coding value in a coding sequence corresponding to the watermark information according to the current character coding and/or the current character graphic information.
Fifth embodiment
A fifth embodiment of the present application provides a watermark extraction method, that is, a method for extracting a watermark for a document image corresponding to a document embedded with the watermark, as shown in fig. 15, where the method includes the following steps:
S1501, obtaining the character graph currently corresponding to a target character in the watermark-embedded carrier object image as the current character graph.
And carrying out image text recognition on the carrier object image, for example, carrying out text recognition processing on the carrier object image by adopting a PaddleOCR text recognition technology, so as to obtain text information in the carrier object image. Text information may be understood as characters contained in the carrier object corresponding to the carrier object image.
And determining the target character according to the text information. And traversing the text information, and taking the traversed current character as a target character.
Character graph segmentation is performed on the carrier object image; that is, a segmentation algorithm based on vertical projection or edge detection is applied to the carrier object image to obtain the position information of each character graph in the image, and a rectangular block containing a single character graph is cut out according to the position information.
The segmented character graphs, that is, the single character graph blocks, are matched one-to-one with the target characters, thereby obtaining the character graph currently corresponding to each target character.
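A vertical-projection segmentation of a single text line can be sketched as follows. This is a simplified illustration, assuming the line has already been binarized into rows of 0 (background) and 1 (ink) and that adjacent characters are separated by at least one blank column; the function name `segment_columns` is hypothetical.

```python
def segment_columns(image):
    """Return [start, end) column spans of consecutive inked columns in a binary line image."""
    width = len(image[0])
    # Vertical projection: amount of ink in each column.
    profile = [sum(row[x] for row in image) for x in range(width)]
    spans = []
    start = None
    for x, ink in enumerate(profile):
        if ink and start is None:
            start = x                  # a character block begins
        elif not ink and start is not None:
            spans.append((start, x))   # a blank column closes the block
            start = None
    if start is not None:
        spans.append((start, width))   # block touching the right edge
    return spans
```

Each span gives the horizontal extent of one block; cropping the rows to that extent yields the single character graph block. Characters made of horizontally separated components would need gap-width heuristics that this sketch omits.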
S1502, judging whether the target character corresponds to watermark information according to the current character graph.
Before this step, a character graph reference table is obtained. The character graph reference table is preset and comprises, for each preset character, the original character graph presented in a preset font and the custom character graph presented in a custom font.
Judging whether the target character corresponds to watermark information according to the current character graph is specifically:
If the current character graph belongs to the character graph reference table, that is, the current character graph in the single character graph block corresponding to the target character is the same as one of the character graphs in the reference table, it is determined that the target character corresponds to watermark information. In this case, the watermark extraction operation of the next step may be performed on the target character.
If the current character graph does not belong to the character graph reference table, it is determined that the target character does not correspond to watermark information, the traversal moves to the next character, and the next character is taken as the target character.
S1503, if the target character corresponds to watermark information, executing a step of extracting watermark information from the carrier object image, where the step of extracting watermark information from the carrier object image includes: and obtaining a current embedded coding value in a coding sequence corresponding to the watermark information according to the current character graph.
The coding sequence corresponding to the watermark information is a binary coding sequence.
Obtaining the current embedded code value in the coding sequence corresponding to the watermark information according to the current character graph is specifically:
For a target character corresponding to watermark information, if the current character graph is the same as the custom character graph in the character graph reference table, the current embedded code value is determined to be the first code value, for example 1.
For a target character corresponding to watermark information, if the current character graph is the same as the original character graph in the character graph reference table, the current embedded code value is determined to be the second code value, for example 0.
At this point, the current embedded code value corresponding to the target character has been extracted, and the traversal moves to the next character.
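The reference-table lookup of S1502 and S1503 can be sketched as below. The text speaks of a character graph being "the same as" a table entry; since a screenshot or photograph rarely reproduces pixels exactly, this sketch assumes a pixel-difference tolerance (`max_distance`), which is an addition for illustration and not stated in the application. Glyphs are modeled as equally sized tuples of 0/1 rows, and `REFERENCE_TABLE`, `hamming` and `classify` are hypothetical names.

```python
# Toy reference table: per preset character, original and custom glyph bitmaps.
REFERENCE_TABLE = {
    "A": {"original": ((1, 0), (0, 1)), "custom": ((1, 1), (0, 1))},
}


def hamming(a, b):
    """Number of differing pixels between two equally sized binary glyphs."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def classify(block, char, table, max_distance=0):
    """Return 1 (custom glyph), 0 (original glyph) or None (no watermark information)."""
    entry = table.get(char)
    if entry is None:
        return None
    d_orig = hamming(block, entry["original"])
    d_cust = hamming(block, entry["custom"])
    if min(d_orig, d_cust) > max_distance:
        return None   # matches neither table entry closely enough
    return 1 if d_cust < d_orig else 0
```

With `max_distance=0` the function reduces to the exact-match rule in the text; a real system would more likely compare normalized glyph images with a learned or feature-based similarity measure.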
Sixth embodiment
A sixth embodiment of the present application provides a watermark extraction apparatus, as shown in fig. 16. The apparatus corresponds to the watermark extraction method provided in the fifth embodiment, and since the apparatus embodiment is similar to the method embodiment, the description is relatively simple, and please refer to the content of the fifth embodiment for relevant points.
The watermark extraction apparatus 1600 provided in this embodiment includes:
a third obtaining unit 1601, configured to obtain the character graph currently corresponding to a target character in the watermark-embedded carrier object image as a current character graph;
a third judging unit 1602, configured to judge whether the target character corresponds to watermark information according to the current character graph;
a second extracting unit 1603, configured to perform, when the target character corresponds to watermark information, a step of extracting watermark information from the carrier object image, where the step of extracting watermark information from the carrier object image includes: and obtaining a current embedded coding value in a coding sequence corresponding to the watermark information according to the current character graph.
Seventh embodiment
A seventh embodiment of the present application provides a method for modifying a carrier object, that is, modifying a document embedded with a watermark, as shown in fig. 17, the method including the steps of:
S1701, in response to detecting a first operation on a character in a carrier object embedded with a watermark, obtaining a character code currently corresponding to the character as a current character code, and obtaining character graphic information currently corresponding to the character as current character graphic information, wherein the carrier object embedded with the watermark carries a character attribute information file, the character attribute information file contains updated character codes and updated character graphic information corresponding to preset characters, and the first operation is used for replacing the current character graphic information with the first character graphic information.
Before responding to the first operation, the watermark-embedded carrier object must also be obtained, and the characters in it are displayed according to the character attribute information file that it carries. This step mainly addresses characters whose current character code is a custom character code: such character codes can be parsed and displayed according to the character attribute information file, without additionally installing the character graphic information base of the custom font.
The first operation may be a format modification of some of the characters in the carrier object using a format brush, the format modification including a modification of the character graphic information corresponding to the characters. The first operation may also be a direct change of the font of some of the characters; changing the font of a character amounts to modifying the character graphic information corresponding to it.
The first character graphic information may be character graphic information under a general font in a general computer.
S1702, if the current character code is a character code other than the updated character code in the character attribute information file, replacing the current character graphic information with the first character graphic information.
If the current character code is a character code other than the updated character codes in the character attribute information file, that is, a character code commonly used in a general computer, the character graphic information base of the target font to which the character is to be changed is queried according to the current character code, and the first character graphic information corresponding to the character is obtained.
S1703, if the current character code is updated character code in the character attribute information file, keeping the current character graphic information unchanged or replacing the current character graphic information with other character graphic information except the first character graphic information.
If the current character code is an updated character code in the character attribute information file, that is, a custom character code, the computer cannot find corresponding character graphic information for the custom character code in the target font. The current character graphic information is therefore kept unchanged, or it is replaced with other character graphic information different from the first character graphic information, where the other character graphic information is displayed as meaningless garbled characters.
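The branch between S1702 and S1703 can be sketched as follows. The sketch models only the "keep unchanged" variant of S1703 and represents each character as a (character code, font identification) pair; `apply_first_operation` and the sample values are hypothetical.

```python
def apply_first_operation(chars, target_font, updated_codes):
    """Simulate a font change (e.g. format brush) on (code, font) pairs."""
    result = []
    for code, font in chars:
        if code in updated_codes:
            # S1703: a custom character code has no glyph in the target font,
            # so the current character graphic information is kept.
            result.append((code, font))
        else:
            # S1702: ordinary character code, glyph taken from the target font.
            result.append((code, target_font))
    return result
```

After the operation, every character that carried a first code value still has its custom character code, so the document-based extraction method continues to read it as 1.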
The above describes the process of the watermark embedded carrier object against a format brush attack.
The watermark embedded carrier object is obtained by the following method:
Obtaining the character code currently corresponding to a target character in a carrier object to be embedded with a watermark as a current first character code, and obtaining the character graphic information currently corresponding to the target character as current first character graphic information; judging whether the target character can correspond to watermark information according to the current first character code and the current first character graphic information; if the target character can correspond to watermark information, executing the step of embedding watermark information in the carrier object to be embedded with the watermark, wherein the step of embedding watermark information comprises: obtaining an updated character code for the target character and replacing the current first character code with the updated character code, and obtaining updated character graphic information for the target character and replacing the current first character graphic information with the updated character graphic information, wherein the updated character code represents the character code of the target character after the character graphic information is updated; and adding the updated character code and the corresponding updated character graphic information to a character attribute information file, wherein the character attribute information file is a file embedded in the carrier object. For details, refer to the first embodiment.
Eighth embodiment
An eighth embodiment of the present application provides a carrier object modifying device as shown in fig. 18. The apparatus corresponds to the carrier object modification method provided in the seventh embodiment, and since the apparatus embodiment is similar to the method embodiment, the description is relatively simple, and for the relevant points, please refer to the content of the seventh embodiment.
The carrier object modification apparatus 1800 provided in this embodiment includes:
a fourth obtaining unit 1801, configured to obtain, in response to detecting a first operation on a character in a carrier object embedded with a watermark, a character code currently corresponding to the character as a current character code, and obtain character graphic information currently corresponding to the character as current character graphic information, where the carrier object embedded with the watermark carries a character attribute information file, the character attribute information file contains updated character codes and updated character graphic information corresponding to preset characters, and the first operation is used to replace the current character graphic information with first character graphic information;
a first replacing unit 1802 configured to replace the current character graphic information with the first character graphic information when the current character code is other character codes than the updated character code in the character attribute information file;
A second replacing unit 1803, configured to, when the current character code is the updated character code in the character attribute information file, keep the current character graphic information unchanged, or replace the current character graphic information with other character graphic information except the first character graphic information.
Ninth embodiment
A ninth embodiment of the present application provides an electronic device, as shown in fig. 19. The electronic device includes: at least one processor 1901, at least one memory 1902, at least one communication interface 1903, and at least one communication bus 1904. Optionally, the processor 1901 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement embodiments of the present application. The memory 1902 may comprise high-speed RAM and may also comprise non-volatile memory, such as at least one disk memory. The communication interface 1903 may be an interface of a communication module, such as an interface of a GSM module. The memory 1902 stores a program and data, and the processor 1901 calls the program stored in the memory 1902 to implement the watermark embedding method, the watermark extraction method, or the carrier object modification method described above.
Tenth embodiment
A tenth embodiment of the present application provides a computer-readable storage medium having stored thereon a program and data, the program being executed by a processor for implementing the above-described watermark embedding method, or watermark extraction method, or carrier object modification method.
The foregoing describes merely specific embodiments of the present application, but the scope of protection of the present application is not limited thereto. Any modification, equivalent replacement, or improvement made by those skilled in the art within the technical scope, spirit, and principles of the present application shall fall within the scope of protection of the present application.

Claims (20)

1. A method of watermark embedding, comprising:
obtaining a character code currently corresponding to a target character in a carrier object into which a watermark is to be embedded, as a current character code, and obtaining character graphic information currently corresponding to the target character, as current character graphic information;
judging, according to the current character code and the current character graphic information, whether the target character can correspond to watermark information;
if the target character can correspond to watermark information, executing the step of embedding watermark information in the carrier object, wherein the step of embedding watermark information in the carrier object comprises: obtaining an updated character code for the target character and replacing the current character code with the updated character code, and obtaining updated character graphic information for the target character and replacing the current character graphic information with the updated character graphic information, wherein the updated character code represents the character code of the target character after its character graphic information has been updated;
and adding the updated character code and the corresponding updated character graphic information to a character attribute information file, wherein the character attribute information file is a file embedded in the carrier object.
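As an illustration only, and not part of the claims, the steps of claim 1 can be sketched as follows. The `Char` and `Carrier` types, the string glyph identifiers, and the `eligible` lookup table are hypothetical stand-ins for the carrier object, the character graphic information, and the eligibility judgment, none of which the claim specifies at this level of detail.

```python
from dataclasses import dataclass, field


@dataclass
class Char:
    code: int   # current character code
    glyph: str  # hypothetical identifier for the current character graphic information


@dataclass
class Carrier:
    chars: list
    # Character attribute information file: updated code -> updated glyph.
    attribute_file: dict = field(default_factory=dict)


def embed(carrier, eligible):
    """Update every eligible character and record the update in the attribute file.

    `eligible` maps a (current code, current glyph) pair to its
    (updated code, updated glyph) pair; membership in it stands in for the
    "can correspond to watermark information" judgment.
    """
    for ch in carrier.chars:
        key = (ch.code, ch.glyph)
        if key not in eligible:
            continue  # this character cannot carry watermark information
        new_code, new_glyph = eligible[key]
        ch.code, ch.glyph = new_code, new_glyph
        carrier.attribute_file[new_code] = new_glyph
    return carrier
```

In this sketch the attribute file travels with the carrier object, so a viewer can resolve each updated code to its updated glyph.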
2. The watermark embedding method according to claim 1, further comprising:
determining a first character set and a first font set, wherein the first character set comprises original character codes corresponding to preset characters, the first font set comprises font identification of preset fonts, and the fonts are fonts to which character graphic information belongs;
the step of judging whether the target character can correspond to watermark information according to the current character code and the current character graphic information comprises the following steps:
if the current character code belongs to the first character set and the font identifier of the font to which the current character graphic information belongs is in the first font set, determining that the target character can correspond to watermark information;
and if the current character code does not belong to the first character set and/or the font identification of the font to which the current character graphic information belongs does not belong to the first font set, determining that the target character cannot correspond to watermark information.
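The judgment in claim 2 amounts to two set-membership tests. The following is a minimal sketch; the set contents are example values chosen for illustration and do not come from the application.

```python
# Example values only: original codes of preset characters and
# font identifiers of preset fonts.
FIRST_CHARACTER_SET = {0x4E2D, 0x6587}
FIRST_FONT_SET = {"SimSun", "SimHei"}


def can_correspond_to_watermark(current_code, current_font_id):
    """True only if both the code and the font identifier are in the first sets."""
    return current_code in FIRST_CHARACTER_SET and current_font_id in FIRST_FONT_SET
```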
3. The watermark embedding method according to claim 2, further comprising:
setting custom character codes for the preset characters;
obtaining a second character set according to the custom character codes corresponding to the preset characters;
determining a first mapping relation between the custom character codes in the second character set and the original character codes in the first character set according to the same preset character;
the obtaining updated character encoding for the target character includes:
and obtaining a custom character code corresponding to the current character code in the second character set according to the current character code and the first mapping relation, and taking the custom character code as an updated character code aiming at the target character.
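One way to picture the first mapping relation of claim 3 is a dictionary from original codes to custom codes. The sketch below assigns custom codes from the Unicode Private Use Area (U+E000 to U+F8FF); that choice of code range is an assumption made for illustration, as the claim does not say where custom codes come from.

```python
PUA_START = 0xE000  # first code point of the BMP Private Use Area (assumed range)


def build_first_mapping(first_character_set):
    """Assign one custom code per original code, in sorted order."""
    return {orig: PUA_START + i for i, orig in enumerate(sorted(first_character_set))}


def updated_code(current_code, first_mapping):
    """Look up the custom code that replaces the current (original) code."""
    return first_mapping[current_code]
```

The second character set is then simply `set(first_mapping.values())`.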
4. The watermark embedding method according to claim 2, further comprising:
obtaining custom character graphic information corresponding to the preset character according to original character graphic information corresponding to the preset character in a character graphic information base of the preset font, wherein the character graphic information base of the preset font is used for storing the original character graphic information belonging to the preset font, and for the same preset character, the difference between the custom character graphic information and the original character graphic information meets a preset difference condition;
obtaining a character graphic information base of a custom font according to the custom character graphic information corresponding to the preset character;
obtaining a second font set according to the font identification of the custom font;
determining a second mapping relation between the font identification of the custom font in the second font set and the font identification of the preset font in the first font set according to the difference between the custom character graphic information and the original character graphic information;
the obtaining updated character graphic information for the target character includes:
acquiring a font identifier of a font to which the current character graphic information belongs according to the current character graphic information;
according to the font identification of the font to which the current character graphic information belongs and the second mapping relation, acquiring the font identification of the custom font corresponding to the font identification of the font to which the current character graphic information belongs in the second font set;
and obtaining the custom character graphic information corresponding to the target character in the character graphic information base of the custom font represented by the font identification of the custom font according to the font identification of the custom font, and taking the custom character graphic information as updated character graphic information for the target character.
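The lookup chain in claim 4 (font identifier, then custom font, then custom glyph) and the preset difference condition can be sketched as below. Glyphs are modelled as tiny bitmaps of '0'/'1' strings, the font names are hypothetical, and the pixel-count threshold is an assumed form of the difference condition, which the claim leaves open.

```python
# Hypothetical identifiers and bitmaps, for illustration only.
SECOND_MAPPING = {"SimSun": "SimSun-WM"}  # preset font id -> custom font id
ORIGINAL_LIBRARY = {"SimSun": {0x4E2D: ["0110", "1001"]}}
CUSTOM_LIBRARY = {"SimSun-WM": {0x4E2D: ["0110", "1011"]}}


def pixel_difference(a, b):
    """Count pixels that differ between two equally sized bitmaps."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def meets_difference_condition(original, custom, max_pixels=2):
    """Assumed condition: the glyphs differ, but by at most `max_pixels` pixels."""
    return 0 < pixel_difference(original, custom) <= max_pixels


def updated_glyph(target_char, current_font_id):
    """Follow the second mapping to the custom font and fetch its glyph."""
    custom_font_id = SECOND_MAPPING[current_font_id]
    return custom_font_id, CUSTOM_LIBRARY[custom_font_id][target_char]
```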
5. The watermark embedding method according to claim 4, further comprising:
setting custom character codes for preset characters corresponding to the custom character graphic information;
obtaining a third character set according to the custom character codes corresponding to the preset characters;
determining a third mapping relation between the custom character codes in the third character set and custom character graphic information in a character graphic information base of the custom fonts according to the same preset character;
the obtaining updated character encoding for the target character includes:
and obtaining the custom character codes corresponding to the custom character graphic information in the third character set according to the custom character graphic information corresponding to the target character and the third mapping relation, and taking the custom character codes as updated character codes for the target character.
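In claim 5 the custom code is keyed by the custom glyph rather than by the original code. A minimal sketch of such a third mapping relation follows; the glyph identifiers and the starting code `0xF000` (inside the Private Use Area) are assumptions made for illustration.

```python
def build_third_mapping(custom_glyph_library, start=0xF000):
    """Map each custom glyph id to a custom code.

    `custom_glyph_library` maps a preset character to the id of its custom
    glyph in one custom font; codes are assigned in sorted character order.
    """
    ordered = sorted(custom_glyph_library.items())
    return {glyph_id: start + i for i, (_, glyph_id) in enumerate(ordered)}
```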
6. The watermark embedding method according to claim 1, further comprising:
obtaining a binary coding sequence corresponding to watermark information to be embedded;
the step of embedding watermark information in the carrier object specifically comprises the following steps:
determining whether to update the current character code and the current character graphic information according to a current code value to be embedded in a binary code sequence corresponding to the watermark information;
if it is determined that the current character code and the current character graphic information are to be updated, obtaining an updated character code for the target character and replacing the current character code with the updated character code, and obtaining updated character graphic information for the target character and replacing the current character graphic information with the updated character graphic information.
7. The watermark embedding method according to claim 6, wherein determining whether to update the current character code and the current character graphic information according to a current code value to be embedded in a binary code sequence corresponding to the watermark information includes:
if the current code value to be embedded in the binary code sequence corresponding to the watermark information is a first code value, determining to update the current character code and the current character graphic information;
and if the current code value to be embedded in the binary code sequence corresponding to the watermark information is the second code value, determining not to update the current character code and the current character graphic information.
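Claims 6 and 7 can be read as consuming one bit of the watermark per eligible character. The sketch below tracks character codes only and omits the parallel glyph replacement for brevity; taking 1 as the first code value and using a byte-wise, most-significant-bit-first encoding are assumptions.

```python
def to_bits(watermark: bytes):
    """Binary code sequence of the watermark, most significant bit first."""
    return [(byte >> shift) & 1 for byte in watermark for shift in range(7, -1, -1)]


def embed_bits(codes, bits, first_mapping, first_code_value=1):
    """Replace the code of an eligible character only when the current bit
    equals the first code value; every eligible character consumes one bit."""
    out, i = [], 0
    for code in codes:
        if code in first_mapping and i < len(bits):
            out.append(first_mapping[code] if bits[i] == first_code_value else code)
            i += 1
        else:
            out.append(code)  # not eligible, or no bits left to embed
    return out
```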
8. A watermark extraction method, comprising:
obtaining a character code currently corresponding to a target character in a carrier object embedded with a watermark as a current character code, and obtaining character graphic information currently corresponding to the target character as current character graphic information;
judging whether the target character corresponds to watermark information according to the current character code and/or the current character graphic information;
and if the target character corresponds to watermark information, executing the step of extracting watermark information from the carrier object, wherein the step of extracting watermark information from the carrier object comprises the following steps: and obtaining a current embedded coding value in a coding sequence corresponding to the watermark information according to the current character coding and/or the current character graphic information.
9. The watermark extraction method according to claim 8, further comprising:
obtaining a first character set and a second character set, wherein the first character set comprises original character codes corresponding to preset characters, and the second character set comprises custom character codes corresponding to the preset characters;
obtaining a first font set and a second font set, wherein the first font set comprises font identifications of preset fonts, and the second font set comprises font identifications of custom fonts;
the step of judging whether the target character corresponds to watermark information according to the current character code and/or the current character graphic information comprises the following steps:
if the current character code belongs to the first character set and the font identification of the font to which the current character graphic information belongs is in the first font set, determining that the target character corresponds to watermark information;
or, if the current character code belongs to the second character set and/or the font identification of the font to which the current character graphic information belongs is in the second font set, determining that the target character corresponds to watermark information.
10. The watermark extraction method according to claim 9, wherein the code sequence corresponding to the watermark information is a binary code sequence;
the step of obtaining the current embedded coding value in the coding sequence corresponding to the watermark information according to the current character coding and/or the current character graphic information comprises the following steps:
if the current character code belongs to the second character set and/or the font identification of the font to which the current character graphic information belongs is in the second font set, determining the current embedded code value as a first code value;
and if the current character code does not belong to the second character set and the font identification of the font to which the current character graphic information belongs is not in the second font set, determining the current embedded code value as a second code value.
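The extraction logic of claims 9 and 10 can be sketched as a single pass over (code, font identifier) pairs. Taking 1 as the first code value and 0 as the second is an assumption, as is the pair representation of a character.

```python
def extract_bits(chars, first_chars, second_chars, first_fonts, second_fonts):
    """Recover the embedded bit sequence from (code, font_id) pairs.

    A character in the second sets carries the first code value (1 here);
    an unchanged eligible character carries the second code value (0 here);
    any other character carries no watermark information and is skipped.
    """
    bits = []
    for code, font_id in chars:
        if code in second_chars or font_id in second_fonts:
            bits.append(1)
        elif code in first_chars and font_id in first_fonts:
            bits.append(0)
    return bits
```

Because either the custom code or the custom font is enough to yield the first code value, a bit survives in this sketch even if one of the two attributes is lost.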
11. The watermark extraction method according to claim 8, further comprising:
obtaining a first font set and a second font set, wherein the first font set comprises font identifications of preset fonts, and the second font set comprises font identifications of custom fonts;
obtaining a first character set and a third character set, wherein the first character set comprises original character codes corresponding to preset characters, and the third character set comprises custom character codes corresponding to the preset characters when the preset characters are presented in the custom fonts;
the step of judging whether the target character corresponds to watermark information according to the current character code and/or the current character graphic information comprises the following steps:
if the current character code belongs to the first character set and the font identification of the font to which the current character graphic information belongs is in the first font set, determining that the target character corresponds to watermark information;
or, if the current character code belongs to the third character set and/or the font identification of the font to which the current character graphic information belongs is in the second font set, determining that the target character corresponds to watermark information.
12. The watermark extraction method according to claim 11, wherein the code sequence corresponding to the watermark information is a binary code sequence;
the step of obtaining the current embedded coding value in the coding sequence corresponding to the watermark information according to the current character coding and/or the current character graphic information comprises the following steps:
if the current character code belongs to the third character set and/or the font identification of the font to which the current character graphic information belongs is in the second font set, determining the current embedded code value as a first code value;
and if the current character code does not belong to the third character set and the font identification of the font to which the current character graphic information belongs is not in the second font set, determining the current embedded code value as a second code value.
13. A watermark extraction method, comprising:
obtaining a character pattern corresponding to a target character in the watermark-embedded carrier object image as a current character pattern;
judging whether the target character corresponds to watermark information according to the current character graph;
and if the target character corresponds to watermark information, executing the step of extracting watermark information from the carrier object image, wherein the step of extracting watermark information from the carrier object image comprises the following steps of: and obtaining a current embedded coding value in a coding sequence corresponding to the watermark information according to the current character graph.
14. The watermark extraction method according to claim 13, further comprising:
obtaining a character pattern reference table, wherein the character pattern reference table comprises original character patterns corresponding to preset characters when the preset characters are presented in a preset font and custom character patterns corresponding to the preset characters when the preset characters are presented in a custom font;
the step of judging whether the target character corresponds to watermark information according to the current character graph comprises the following steps:
and if the current character graph belongs to the character graph reference table, determining that the target character corresponds to watermark information.
15. The watermark extraction method according to claim 14, wherein the code sequence corresponding to the watermark information is a binary code sequence;
the step of obtaining the current embedded coding value in the coding sequence corresponding to the watermark information according to the current character graph comprises the following steps:
if the current character graph is the same as the custom character graph in the character graph reference table, determining the current embedded coding value as a first coding value;
and if the current character graph is the same as the original character graph in the character graph reference table, determining the current embedded coding value as a second coding value.
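For the image-based path of claims 14 and 15, the reference table can be pictured as a list of known glyph bitmaps, each tagged with the bit it represents. In the sketch below the bitmaps are toy '0'/'1' strings, 1 is assumed to be the first coding value, and `max_distance` is a hypothetical tolerance; its default of 0 reproduces the exact "is the same as" comparison in the claim.

```python
def hamming(a, b):
    """Number of differing pixels between two equally sized bitmaps."""
    return sum(x != y for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def classify(glyph, reference_table, max_distance=0):
    """Return the bit of the closest reference glyph, or None when the glyph
    is not in the table (no watermark information for this character)."""
    best = min(reference_table, key=lambda entry: hamming(glyph, entry["bitmap"]))
    if hamming(glyph, best["bitmap"]) > max_distance:
        return None
    return best["bit"]


# Toy reference table for one preset character: original glyph -> 0, custom glyph -> 1.
REFERENCE_TABLE = [
    {"bitmap": ["0110", "1001", "1001"], "bit": 0},
    {"bitmap": ["0110", "1111", "1001"], "bit": 1},
]
```

A real scan or screenshot would likely need a non-zero tolerance and size normalisation before comparison.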
16. The watermark extraction method according to claim 13, wherein obtaining the character pattern currently corresponding to the target character in the watermark-embedded carrier object image includes:
performing image text recognition on the carrier object image to obtain text information in the carrier object image;
determining the target character according to the text information;
and performing character pattern segmentation on the carrier object image, and matching each segmented character pattern to the target character to obtain the character pattern currently corresponding to the target character.
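The segmentation and matching steps of claim 16 could be approximated, for a single line of text, by splitting the image at blank columns and pairing the pieces with the recognized text. This is a simplified sketch: the image is a list of '0'/'1' strings, the text is assumed to come from a separate recognition step that is not shown, and column projection is a swapped-in technique that the claim does not name.

```python
def segment_columns(image):
    """Split a one-line binary image ('1' = ink) at columns that contain no ink."""
    width = len(image[0])
    ink = [any(row[x] == "1" for row in image) for x in range(width)]
    spans, start = [], None
    for x, has_ink in enumerate(ink):
        if has_ink and start is None:
            start = x                 # a glyph begins
        elif not has_ink and start is not None:
            spans.append((start, x))  # a glyph ends at the first blank column
            start = None
    if start is not None:
        spans.append((start, width))
    return [[row[a:b] for row in image] for a, b in spans]


def pair_with_text(text, image):
    """Match each recognized character to its segmented glyph, left to right."""
    glyphs = segment_columns(image)
    if len(glyphs) != len(text):
        raise ValueError("segment count does not match recognized text length")
    return list(zip(text, glyphs))
```

Characters with internal vertical gaps, such as many left-right structured Chinese characters, would be over-split by this approach, so a practical system would need a more robust segmenter.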
17. A method of modifying a carrier object, comprising:
in response to detecting a first operation for a character in a carrier object embedded with a watermark, obtaining a character code currently corresponding to the character as a current character code, and obtaining character graphic information currently corresponding to the character as current character graphic information, wherein the carrier object embedded with the watermark carries a character attribute information file, the character attribute information file contains updated character codes and updated character graphic information corresponding to preset characters, and the first operation is used for replacing the current character graphic information with first character graphic information;
if the current character code is a character code other than the updated character code in the character attribute information file, replacing the current character graphic information with the first character graphic information;
if the current character code is the updated character code in the character attribute information file, the current character graphic information is kept unchanged, or the current character graphic information is replaced by other character graphic information except the first character graphic information.
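The branch in claim 17 can be sketched as a guard around a glyph-change operation. The dictionary representation of a character and of the attribute file is hypothetical, and the sketch implements only the "keep unchanged" option of the second branch.

```python
def apply_first_operation(char, first_glyph, attribute_file):
    """Apply a glyph change unless the character carries an updated code.

    `char` is a dict with "code" and "glyph"; `attribute_file` maps updated
    character codes to updated glyphs.
    """
    if char["code"] in attribute_file:
        return dict(char)  # watermarked character: keep its glyph unchanged
    return {**char, "glyph": first_glyph}
```

Protecting watermarked characters in this way means an ordinary font change does not erase the embedded information.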
18. The carrier object modification method of claim 17, further comprising:
obtaining a carrier object embedded with a watermark;
and displaying the characters in the watermark-embedded carrier object according to the character attribute information file carried by the watermark-embedded carrier object.
19. An electronic device comprising a processor and a memory;
the memory is configured to store a program and data, and the processor invokes the program stored in the memory to implement the watermark embedding method of any one of claims 1 to 7, the watermark extraction method of any one of claims 8 to 16, or the carrier object modification method of any one of claims 17 to 18.
20. A computer readable storage medium having stored thereon a program and data, the program being executable by a processor for implementing the watermark embedding method of any one of claims 1 to 7, or the watermark extraction method of any one of claims 8 to 16, or the carrier object modification method of any one of claims 17 to 18.
CN202311301171.3A 2023-10-09 2023-10-09 Watermark embedding method and watermark extracting method Pending CN117454335A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311301171.3A CN117454335A (en) 2023-10-09 2023-10-09 Watermark embedding method and watermark extracting method

Publications (1)

Publication Number Publication Date
CN117454335A true CN117454335A (en) 2024-01-26

Family

ID=89592009

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311301171.3A Pending CN117454335A (en) 2023-10-09 2023-10-09 Watermark embedding method and watermark extracting method

Country Status (1)

Country Link
CN (1) CN117454335A (en)

Similar Documents

Publication Publication Date Title
US8494280B2 (en) Automated method for extracting highlighted regions in scanned source
CN101443790B (en) Efficient processing of non-reflow content in a digital image
US6782509B1 (en) Method and system for embedding information in document
US20040001606A1 (en) Watermark fonts
US6801673B2 (en) Section extraction tool for PDF documents
CN110532811B (en) PDF (Portable document Format) signature method and PDF signature system
US7408556B2 (en) System and method for using device dependent fonts in a graphical display interface
Al-Nofaie et al. Utilizing pseudo-spaces to improve Arabic text steganography for multimedia data communications
WO2018196661A1 (en) Image processing device and method
CN109492199B (en) PDF file conversion method based on OCR pre-judgment
JP2004158036A (en) Computer system for identifying area on instance of machine-readable form
CN102567938B (en) Watermark image blocking method and device for western language watermark processing
US10402471B2 (en) Method for obfuscating the display of text
Memon et al. EVALUATION OF STEGANOGRAPHY FOR URDU/ARABIC TEXT.
Stojanov et al. A new property coding in text steganography of Microsoft Word documents
JP2014063481A (en) Rendering supported by cloud
US20030025940A1 (en) Document filing apparatus for storing information added to a document file
US20150169508A1 (en) Obfuscating page-description language output to thwart conversion to an editable format
CN112417087B (en) Text-based tracing method and system
CN109800547B (en) Method for quickly embedding and extracting information for WORD document protection and distribution tracking
CN117454335A (en) Watermark embedding method and watermark extracting method
JP5159588B2 (en) Image processing apparatus, image processing method, and computer program
CN117391045B (en) Method for outputting file with portable file format capable of copying Mongolian
Guo et al. Information hiding in ooxml format data based on the splitting of text elements
CN111046096A (en) Method and device for generating image-text structured information

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination