CN110457873B - Watermark embedding and detecting method and device - Google Patents


Info

Publication number
CN110457873B
Authority
CN
China
Prior art keywords
watermark
sub
line
embedded
watermarks
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810432660.5A
Other languages
Chinese (zh)
Other versions
CN110457873A (en)
Inventor
董军
李莉
段云峰
王宝晗
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Mobile Communications Group Co Ltd
China Mobile Suzhou Software Technology Co Ltd
Original Assignee
China Mobile Communications Group Co Ltd
China Mobile Suzhou Software Technology Co Ltd
Application filed by China Mobile Communications Group Co Ltd and China Mobile Suzhou Software Technology Co Ltd
Priority to CN201810432660.5A
Publication of CN110457873A
Application granted
Publication of CN110457873B
Legal status: Active (current)
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/10 Protecting distributed programs or content, e.g. vending or licensing of copyrighted material; Digital rights management [DRM]
    • G06F21/16 Program or content traceability, e.g. by watermarking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00 General purpose image data processing
    • G06T1/0021 Image watermarking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2201/00 General purpose image data processing
    • G06T2201/005 Image watermarking
    • G06T2201/0062 Embedding of the watermark in text images, e.g. watermarking text documents using letter skew, letter distance or row distance
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2201/00 General purpose image data processing
    • G06T2201/005 Image watermarking
    • G06T2201/0065 Extraction of an embedded watermark; Reliable detection

Abstract

The application relates to the technical field of copyright protection, and in particular to a watermark embedding and detecting method and device. These address the problem that existing watermark embedding methods increase the capacity overhead of the watermark. The watermark embedding method provided by the embodiments of the application comprises the following steps. First, acquire a copyright file and copyright information provided by a copyright owner, where the copyright file is a text file whose lines have no mutual dependency on one another. Second, generate a watermark to be embedded into the copyright file based on the copyright information and the current timestamp. Third, embed at least one copy of the watermark into the copyright file. Each line carrying the watermark contains one sub-watermark. The sub-watermarks are obtained by dividing the watermark, and the sub-watermark for a line is determined from the line's text content, a hash algorithm and the number of sub-watermarks. Finally, use the hash algorithm to determine a character corresponding to each line's text content, and hide watermark information in the copyright file according to these characters, the character string corresponding to the watermark and a preset watermark hiding rule.

Description

Watermark embedding and detecting method and device
Technical Field
The present application relates to the field of copyright protection technologies, and in particular, to a watermark embedding and detecting method and device.
Background
As society progresses, people's awareness of copyright has steadily grown. Watermarking technology emerged to protect the legal rights and interests of copyright owners. It embeds information such as the identity of the legal copyright owner and the time from which ownership applies into a copyright file. This enables anti-counterfeiting, tracing and copyright protection.
In the prior art, the watermark to be embedded into a copyright file is divided into a plurality of sub-watermarks. These are then embedded into successive lines of the copyright file in the order in which they were divided, so that the complete watermark can later be detected in the correct order.
To preserve this order, each line must also reserve space for the line number of the next line carrying a sub-watermark. This increases the capacity overhead of the watermark and makes the method unsuitable for copyright files with limited redundant space. Moreover, a lawbreaker who obtains some of the sub-watermarks can easily recover the whole watermark by following the line numbers, so the concealment of the watermark is poor.
Disclosure of Invention
The embodiments of the application provide a watermark embedding and detecting method and device that address three problems of prior-art watermark embedding:
- it increases the capacity overhead of the watermark;
- it is unsuitable for copyright files with limited redundant space;
- the concealment of the watermark is poor.
In a first aspect, an embodiment of the present application provides a watermark embedding method, including:
acquiring a copyright file and copyright information provided by a copyright holder, where the copyright file is a text file whose lines have no mutual dependency on one another;
generating a watermark to be embedded in the copyright file based on the copyright information and the current timestamp information;
embedding at least one copy of the watermark into the copyright file, where each line carrying the watermark contains one sub-watermark. The sub-watermarks are obtained by dividing the watermark, and the sub-watermark for a line is determined from the line's text content, a hash algorithm and the number of sub-watermarks;
using the hash algorithm to determine a character corresponding to each line's text content, and hiding watermark information in the copyright file according to these characters, the character string corresponding to the watermark and a preset watermark hiding rule. During subsequent watermark detection, the correctness of the watermark extracted from the copyright file can then be verified against the hidden watermark information.
With this scheme, the watermark is first divided into a plurality of sub-watermarks. The sub-watermark embedded in a given line is determined by that line's text content, the hash algorithm and the number of sub-watermarks. The sub-watermarks therefore need not be embedded in their original order, and no space needs to be reserved in a line for line-number information. This saves watermark capacity overhead. In addition, the sub-watermarks embedded in successive lines have no sequential relationship. A lawbreaker who obtains part of the watermark therefore cannot easily reconstruct the complete watermark information, so the watermark is well concealed.
In a possible implementation manner, the copyright information and the timestamp information may be combined, and then the combined copyright information and timestamp information are encrypted and encoded to obtain the watermark to be embedded in the copyright file.
Because the watermark embedded into the copyright file is encrypted, its security is enhanced and its resistance to attacks is improved.
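The generation step above combines, encrypts and encodes the information. It can be sketched together with its inverse, which is used during detection. This is a minimal sketch under stated assumptions, since the source names no specific cipher or encoding:
- the `|` separator is hypothetical;
- a SHA-256 counter-mode keystream stands in for encryption;
- the ciphertext is encoded as a string of `0`/`1` characters, matching the binary watermark string in the embodiment's example.

The keystream is a placeholder only and is not a vetted cipher. A real deployment would use an authenticated cipher instead.

```python
import hashlib

SEPARATOR = "|"  # hypothetical separator between copyright info and timestamp


def _keystream(key: bytes, length: int) -> bytes:
    """Placeholder keystream (SHA-256 in counter mode); NOT a vetted cipher."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]


def generate_watermark(copyright_info: str, timestamp: int, key: bytes) -> str:
    """Combine -> encrypt (placeholder) -> encode as a string of '0'/'1' characters."""
    combined = f"{copyright_info}{SEPARATOR}{timestamp}".encode("utf-8")
    cipher = bytes(b ^ k for b, k in zip(combined, _keystream(key, len(combined))))
    return "".join(f"{byte:08b}" for byte in cipher)


def parse_watermark(watermark: str, key: bytes) -> tuple[str, int]:
    """Decode -> decrypt -> split back into copyright info and timestamp."""
    cipher = bytes(int(watermark[i:i + 8], 2) for i in range(0, len(watermark), 8))
    combined = bytes(b ^ k for b, k in zip(cipher, _keystream(key, len(cipher))))
    # rpartition: the copyright info itself may contain the separator.
    info, _, ts = combined.decode("utf-8").rpartition(SEPARATOR)
    return info, int(ts)
```

For example, `parse_watermark(generate_watermark("13800000000", 1525000000, b"secret"), b"secret")` returns `("13800000000", 1525000000)`. The phone number and key are made-up values.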
In a possible implementation manner, the watermark may be divided according to the length of the watermark embedded in each line, and the resulting sub-watermarks are numbered. Then, for each line of the copyright file in which a sub-watermark is to be embedded, the line's sub-watermark number is determined from the line's text content, the hash algorithm and the number of sub-watermarks. The sub-watermark with that number is embedded in the line.
Because the number of the sub-watermark embedded in each line is derived from the line itself, the sub-watermarks do not need to be embedded in numerical order. No line-number space needs to be reserved in a line either, which saves watermark capacity overhead.
In a possible implementation manner, for each line of the copyright file in which the sub-watermark is to be embedded, the number k of the sub-watermark to be embedded in the line may be determined according to the following formula:
$$k = C_{HASH} \bmod K$$
where $C_{HASH}$ is the hash value of the first S characters of the line's text content, S being an integer greater than zero, and K is the number of sub-watermarks obtained by dividing the watermark.
This distributes the sub-watermarks approximately uniformly across the copyright file. A well-distributed hash value taken modulo K selects each sub-watermark number with roughly equal frequency.
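The numbering formula takes only a few lines. The source does not name the hash algorithm, so MD5 from Python's `hashlib` is an assumed stand-in. The helper name `sub_watermark_number` is hypothetical.

```python
import hashlib


def sub_watermark_number(line: str, s: int, k_total: int) -> int:
    """k = C_HASH % K, where C_HASH is a hash of the first S characters of the line."""
    prefix = line[:s].encode("utf-8")
    c_hash = int.from_bytes(hashlib.md5(prefix).digest(), "big")  # MD5 is an assumed choice
    return c_hash % k_total
```

The sketch uses a deterministic digest rather than Python's built-in `hash()`. String hashing in `hash()` is salted per process, so embedding and detection would compute different numbers. Because k depends only on the first S characters, the detector can recompute it after embedding. This holds as long as the sub-watermark is inserted beyond those S characters, for example at the end of the line.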
In a possible implementation manner, the characters in each sub-watermark are visible characters. When the sub-watermark with a given number is embedded into the corresponding line, each visible character is converted into an invisible character according to a preset conversion rule between visible and invisible characters. This yields an invisible character string for the sub-watermark, which is then embedded at a specified position in the line.
Because all watermark characters embedded in the copyright file are invisible, the readability of the copyright file is unchanged and the impact on the file is small.
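The following sketch shows the visible-to-invisible conversion and the embedding step. The concrete rule is an assumption, because the source only states that a preset conversion rule exists. This sketch maps `0` to U+200B ZERO WIDTH SPACE and `1` to U+200C ZERO WIDTH NON-JOINER, which suits a binary watermark string. It also treats the end of the line as the specified embedding position. The function names are hypothetical.

```python
# Hypothetical conversion rule between visible and invisible characters.
TO_INVISIBLE = {
    "0": "\u200b",  # ZERO WIDTH SPACE
    "1": "\u200c",  # ZERO WIDTH NON-JOINER
}


def to_invisible(sub_watermark: str) -> str:
    """Convert each visible character of a sub-watermark to its invisible counterpart."""
    return "".join(TO_INVISIBLE[ch] for ch in sub_watermark)


def embed_sub_watermark(line: str, sub_watermark: str) -> str:
    """Embed the invisible string at the assumed 'specified position': the end of the line."""
    return line + to_invisible(sub_watermark)
```

`embed_sub_watermark("a,b,c", "1110")` returns `"a,b,c"` followed by four zero-width characters, so the line renders unchanged. Any trailing newline should be removed before the call and restored afterwards.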
In a possible implementation manner, for each line in the copyright file, a hash value of the line of text content may be determined by using a hash algorithm, and then an exclusive or operation is performed on the hash value and a preset value, and a result of the exclusive or operation is determined to be a character corresponding to the line of text content.
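The per-line character ("hash XOR preset value") can be sketched as follows. Several details are assumptions, because the source does not specify how the XOR result becomes a character:
- MD5 as the hash algorithm;
- the preset value `0x5A`;
- reducing the XOR result to a single `0`/`1` character, so that line characters share the alphabet of the binary watermark string.

The name `line_character` is hypothetical.

```python
import hashlib

PRESET_VALUE = 0x5A  # hypothetical preset value


def line_character(line: str) -> str:
    """Character for a line: (hash of the line XOR preset value), reduced to '0' or '1'."""
    h = int.from_bytes(hashlib.md5(line.encode("utf-8")).digest(), "big")  # MD5 assumed
    # Assumed reduction: keep the lowest bit so line characters use the same
    # binary alphabet as the watermark string in the embodiment's example.
    return str((h ^ PRESET_VALUE) & 1)
```

Presumably the same function must be applied at embedding and detection time. It would run on the lines as they stand after the sub-watermarks have been embedded, since character assignment follows embedding.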
In a possible implementation manner, the lines of the copyright file may be exchanged (reordered) according to the character corresponding to each line's text content, the character string corresponding to the watermark and a preset watermark hiding rule. The watermark hiding rule is as follows:
- in the reordered copyright file, the characters of a run of consecutive lines form the character string corresponding to the watermark;
- the characters of the m consecutive lines before the start line form a preset character string identifying the start position of the watermark;
- the characters of the n consecutive lines after the end line form a preset character string identifying the end position of the watermark.

Here the start line is the first line of the run, the end line is the last line of the run, and m and n are integers greater than zero.
With this scheme, each line of the copyright file is assigned one character. The lines can then be exchanged according to these characters and the character string corresponding to the watermark, so that the watermark actually embedded into the copyright file is also hidden in the line order. This allows the correctness of the extracted watermark to be verified during subsequent detection. Moreover, hiding the watermark adds no information to the copyright file, so it places little demand on the file's redundant space.
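A greedy reordering can illustrate the line-exchange hiding rule. In this sketch, the start and end markers `0110` and `1001` (so m = n = 4) are hypothetical. Placing the marked run at the top of the file, with the remaining lines after it, is only one possible arrangement. The source does not prescribe how lines are chosen or where the run is placed. The function name `hide_by_reordering` is hypothetical.

```python
from collections import defaultdict

START_MARK = "0110"  # hypothetical start marker (m = 4)
END_MARK = "1001"    # hypothetical end marker (n = 4)


def hide_by_reordering(lines: list[str], chars: list[str], watermark: str) -> list[str]:
    """Reorder lines so that consecutive line characters spell START_MARK + watermark + END_MARK.

    chars[i] is the character assigned to lines[i] (hash XOR preset value).
    Lines spelling the target are placed first; the others follow in original order.
    """
    target = START_MARK + watermark + END_MARK
    pool: dict[str, list[int]] = defaultdict(list)  # character -> unused line indices
    for i, c in enumerate(chars):
        pool[c].append(i)
    chosen = []
    for c in target:
        if not pool[c]:
            raise ValueError(f"not enough lines whose character is {c!r}")
        chosen.append(pool[c].pop(0))
    used = set(chosen)
    rest = [i for i in range(len(lines)) if i not in used]
    return [lines[i] for i in chosen + rest]
```

The sketch does not guard against the watermark string itself containing the end marker. It also does not guard against the markers occurring elsewhere by chance. A practical scheme would need to handle both cases.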
In a second aspect, an embodiment of the present application provides a watermark embedding apparatus, including:
the acquisition module is used for acquiring a copyright file and copyright information provided by a copyright owner, where the copyright file is a text file whose lines have no mutual dependency on one another;
the generating module is used for generating a watermark to be embedded into the copyright file based on the copyright information and the current timestamp information;
the embedding module is used for embedding at least one copy of the watermark into the copyright file, where each line carrying the watermark contains one sub-watermark. The sub-watermarks are obtained by dividing the watermark, and the sub-watermark for a line is determined from the line's text content, a hash algorithm and the number of sub-watermarks;
and the hiding module is used for determining, with the hash algorithm, a character corresponding to each line's text content. It hides the watermark information in the copyright file according to these characters, the character string corresponding to the watermark and a preset watermark hiding rule.
For technical effects brought by any design manner in the second aspect of the present application, reference may be made to technical effects brought by different implementation manners in the first aspect, and details are not described here.
In a third aspect, an embodiment of the present application provides a watermark detection method, including:
acquiring a copyright file, where the copyright file is a text file whose lines have no mutual dependency on one another;
determining, with a hash algorithm, a character corresponding to each line's text content;
determining the watermark hidden in the copyright file according to these characters and a preset watermark hiding rule;
extracting the watermark embedded in the copyright file, where the watermark extracted from each line is a sub-watermark. The sub-watermarks are obtained by dividing the watermark, and the sub-watermark for a line is determined from the line's text content, the hash algorithm and the number of sub-watermarks;
and, if the extracted watermark is the same as the hidden watermark, parsing the watermark to obtain the copyright information of the copyright owner and the timestamp information marking when the copyright owner began to own the copyright file.
With this scheme, two watermarks are obtained: the watermark intended to be embedded into the copyright file (the hidden watermark) and the watermark actually embedded in it (the extracted watermark). Detection is considered correct only when the two are the same. The watermark is parsed to obtain the copyright owner's information only in that case, which ensures that the traced propagation source of the copyright file is credible.
In a possible implementation manner, for each line in the copyright file, a hash value of the line of text content is determined by using a hash algorithm, then, an exclusive or operation is performed on the hash value and a preset value, and a result of the exclusive or operation is determined to be a character corresponding to the line of text content.
In a possible implementation manner, the hidden watermark is located from the characters corresponding to each line's text content and the preset watermark hiding rule. Suppose the characters of m consecutive lines in the copyright file form the character string identifying the watermark start position, and the characters of n consecutive lines form the character string identifying the watermark end position. The character string formed by the characters of the lines between those m lines and those n lines is then the watermark hidden in the copyright file. Here m and n are integers greater than zero.
In this way, each line of text content in the copyright file corresponds to one character. The watermark hidden in the copyright file can be found quickly and conveniently from these characters together with the preset character strings identifying the watermark start and end positions.
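Once each line has its character, locating the hidden watermark reduces to a substring search between the two markers. The markers `0110` and `1001` in this sketch are hypothetical and must match the ones used at embedding time. The function name `find_hidden_watermark` is also hypothetical.

```python
from typing import Optional

START_MARK = "0110"  # hypothetical start marker (m = 4)
END_MARK = "1001"    # hypothetical end marker (n = 4)


def find_hidden_watermark(chars: list[str]) -> Optional[str]:
    """Return the characters between the first start marker and the next end marker."""
    sequence = "".join(chars)  # one character per line, in file order
    start = sequence.find(START_MARK)
    if start < 0:
        return None
    body_start = start + len(START_MARK)
    end = sequence.find(END_MARK, body_start)
    if end < 0:
        return None
    return sequence[body_start:end]
```

For example, line characters spelling `0110` + `10` + `1001` yield the hidden watermark `10`.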
In a possible implementation manner, the following is done for each line of the copyright file in which a sub-watermark is embedded. The line's sub-watermark is located according to a preset watermark embedding position and the length of the watermark embedded in each line. Its number is determined from the line's text content, the hash algorithm and the number of sub-watermarks. The watermark embedded in the copyright file is then assembled from the sub-watermarks extracted from the lines and their numbers.
In a possible implementation manner, the characters in each sub-watermark embedded in the copyright file are all invisible characters. The sub-watermark's invisible character string can be cut out of each line according to the preset watermark embedding position and the length of the watermark embedded in each line. Each invisible character in the string is converted into a visible character according to the preset conversion rule between visible and invisible characters. The resulting visible character string is taken as the sub-watermark embedded in the line.
In a possible implementation manner, for each line of the copyright file in which the sub-watermark is embedded, the number k of the sub-watermark embedded in the line can be determined according to the following formula:
$$k = C_{HASH} \bmod K$$
where $C_{HASH}$ is the hash value of the first S characters of the line's text content, S being an integer greater than zero, and K is the number of sub-watermarks obtained by dividing the watermark.
In a possible implementation manner, for each number, the sub-watermark that occurs most often among the lines with that number is taken as the sub-watermark corresponding to the number. The sub-watermarks are then spliced in ascending order of number to obtain the watermark embedded in the copyright file.
Each sub-watermark is embedded in many lines. The watermark is therefore still extracted correctly even if part of it is tampered with or removed, as long as the correct sub-watermark remains the most frequent one for each number. The copyright file is therefore effectively protected.
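The sketch below covers extraction with per-number majority voting. It assumes, hypothetically, that each sub-watermark was appended to the end of its line as zero-width characters (U+200B for `0`, U+200C for `1`). It also assumes that $C_{HASH}$ is an MD5 digest of the first S characters. The function name and parameters are illustrative, not taken from the source.

```python
import hashlib
from collections import Counter, defaultdict
from typing import Optional

# Hypothetical inverse conversion rule: zero-width characters back to '0'/'1'.
TO_VISIBLE = {"\u200b": "0", "\u200c": "1"}


def extract_watermark(lines: list[str], s: int, k_total: int, sub_len: int) -> Optional[str]:
    """Rebuild the watermark by majority vote over the sub-watermarks found in each line."""
    votes = defaultdict(Counter)  # sub-watermark number -> Counter of candidate strings
    for line in lines:
        tail = line[-sub_len:]  # assumed embedding position: end of the line
        if len(tail) < sub_len or any(ch not in TO_VISIBLE for ch in tail):
            continue  # no intact sub-watermark in this line
        sub = "".join(TO_VISIBLE[ch] for ch in tail)
        # k = C_HASH % K, with C_HASH taken over the first S characters (MD5 assumed).
        c_hash = int.from_bytes(hashlib.md5(line[:s].encode("utf-8")).digest(), "big")
        votes[c_hash % k_total][sub] += 1
    if len(votes) < k_total:
        return None  # some sub-watermark number received no votes
    # Most frequent sub-watermark per number, spliced in ascending number order.
    return "".join(votes[k].most_common(1)[0][0] for k in range(k_total))
```

The function returns `None` when some number never receives a vote. In practice this means the file is too short or too damaged for the chosen K.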
In a possible implementation manner, the watermark is decoded and decrypted to obtain the combined copyright information and time stamp information, and then the combined copyright information and time stamp information are split to obtain the copyright information and time stamp information of the copyright owner.
In a fourth aspect, an embodiment of the present application provides a watermark detection apparatus, including:
the acquisition module is used for acquiring a copyright file, wherein the copyright file is a text file without mutual dependency relationship among contents of all rows;
the determining module is used for determining characters corresponding to each line of text content in the copyright file by utilizing a Hash algorithm, and determining the watermark hidden in the copyright file according to the characters corresponding to each line of text content and a preset watermark hiding rule;
the extraction module is used for extracting the watermark embedded in the copyright file, where the watermark extracted from each line of the copyright file is a sub-watermark. The sub-watermarks are obtained by dividing the watermark, and the sub-watermark for a line is determined from the line's text content, the hash algorithm and the number of sub-watermarks;
and the parsing module is used for parsing the watermark if the extracted watermark is the same as the hidden watermark. Parsing yields the copyright information of the copyright owner and the timestamp information marking when the copyright owner began to own the copyright file.
For technical effects brought by any design manner in the fourth aspect of the present application, reference may be made to technical effects brought by different implementation manners in the third aspect, and details are not described here.
In a fifth aspect, a computer provided in an embodiment of the present application includes at least one processing unit and at least one storage unit, where the storage unit stores program code, and when the program code is executed by the processing unit, the computer is caused to perform the steps of the above watermark embedding and/or watermark detecting method.
In a sixth aspect, an embodiment of the present application provides a computer-readable storage medium, which includes program code, and when the program code runs on a computer, the computer is caused to execute the steps of the above watermark embedding and/or watermark detection method.
These and other aspects of the present application will be more readily apparent from the following description of the embodiments.
Drawings
Fig. 1 is a schematic view of an application scenario of a watermark embedding method according to an embodiment of the present application;
fig. 2 is a flowchart of a watermark embedding method provided in an embodiment of the present application;
fig. 3 is a flowchart of another watermark embedding method provided in the embodiment of the present application;
fig. 4 is a flowchart of a watermark detection method provided in an embodiment of the present application;
fig. 5 is a schematic diagram of a watermark embedding and extracting system based on a big data platform according to an embodiment of the present application;
fig. 6 is a flowchart of a further watermark embedding method provided in an embodiment of the present application;
fig. 7 is a flowchart of another watermark detection method provided in an embodiment of the present application;
fig. 8 is a block diagram of a watermark embedding apparatus according to an embodiment of the present application;
fig. 9 is a block diagram of a watermark detection apparatus according to an embodiment of the present application;
fig. 10 is a hardware structural diagram of a computer for implementing a watermark embedding and/or watermark detection method according to an embodiment of the present application.
Detailed Description
In order to solve the problems that the watermark embedding method in the prior art can increase the capacity overhead of the watermark, is not suitable for copyright files with limited redundant space and has poor concealment of the watermark, the embodiment of the application provides a watermark embedding and detecting method and a watermark embedding and detecting device.
It should be noted that, in the embodiment of the present application, the copyright file is a text file, and text contents of different lines in the text file are independent from each other and have no mutual dependency relationship, for example, a csv file, and when text contents of different lines in the copyright file are interchanged, the correctness and readability of the text contents in the copyright file are not affected.
Referring to fig. 1, fig. 1 is a schematic view illustrating an application scenario of the watermark embedding method provided in the embodiment of the present application, where the application scenario includes a terminal 11 and a server 12, where the terminal is, for example, a personal computer, an iPad, a mobile phone, and the like, and the server may be any device capable of providing an internet service.
In a specific implementation, a user (the original owner of the copyright file) uses the terminal to upload the copyright file into which the watermark is to be embedded, together with the copyright information of a legal purchaser. The user requests the server to embed the purchaser's copyright information into the copyright file. After receiving the copyright file and the purchaser's copyright information, the server generates the watermark to be embedded from the copyright information and the current timestamp information. It then embeds at least one copy of the watermark into the copyright file.
Specifically, the watermark can be divided according to the length of the watermark embedded in each line to obtain a plurality of sub-watermarks. Then, for each line of the copyright file in which a sub-watermark is to be embedded, the sub-watermark for that line is determined from the line's text content, the hash algorithm and the number of sub-watermarks. Because of this, the sub-watermarks embedded in successive lines need not have any sequential relationship. A line carrying a sub-watermark does not need to record the line number of the next line carrying a sub-watermark. Embedding the watermark therefore requires less space in the copyright file, which makes the method better suited to copyright files with limited redundant space. Furthermore, a lawbreaker who obtains part of the sub-watermarks cannot easily obtain the complete watermark information, so the watermark is well concealed.
Moreover, the server can use the hash algorithm to determine a character corresponding to each line's text content. It can then hide the watermark information in the copyright file according to these characters and the character string corresponding to the watermark. When watermark detection is later performed on the copyright file, the correctness of the extracted watermark can be verified against the hidden watermark information. The watermark information intended for each copyright file is hidden in the file itself rather than recorded one by one on the server side, which reduces the load on the server.
In practical applications, big data platforms process data in a distributed way. As a result, a copyright file processed by such a platform cannot be guaranteed to keep its data in the original order. Existing watermark embedding and detecting methods strictly depend on the original record order and are therefore unsuitable for big data platforms. The watermark embedding and detecting method provided by the embodiments of the application places no requirement on the record order of the contents of the copyright file, and is therefore well suited to big data platforms.
It should be noted that the above application scenarios are only for facilitating the understanding of the spirit and principles of the present application by the relevant persons, and do not constitute a limitation to the application scenarios of the embodiments of the present application.
The preferred embodiments of the present application will be described below with reference to the accompanying drawings of the specification, it should be understood that the preferred embodiments described herein are merely for illustrating and explaining the present application, and are not intended to limit the present application, and that the embodiments and features of the embodiments in the present application may be combined with each other without conflict.
As shown in fig. 2, a flowchart of a watermark embedding method provided in an embodiment of the present application includes the following steps:
S201: Acquire a copyright file and copyright information provided by a copyright owner, where the copyright file is a text file whose lines have no mutual dependency on one another.
The copyright information provided by the copyright owner can be used to identify a purchaser, for example the purchaser's mobile phone number or identity card number.
S202: Generate the watermark to be embedded into the copyright file based on the copyright information and the current timestamp information.
Here, the current timestamp information can serve as the time at which the copyright owner begins to own the copyright file.
Specifically, the copyright information and the timestamp information are combined, then the combined copyright information and timestamp information are encrypted and encoded, and the obtained character string is used as a watermark to be embedded into the copyright file.
S203: Embed at least one copy of the watermark into the copyright file. Each line carrying the watermark contains one sub-watermark. The sub-watermarks are obtained by dividing the watermark, and the sub-watermark for a line is determined from the line's text content, the hash algorithm and the number of sub-watermarks.
To improve the accuracy and efficiency of subsequent watermark detection, a watermark redundancy can be specified. The redundancy determines the number of copies of the watermark embedded in the copyright file. Generally, it is set according to the number of lines of the copyright file: the more lines the file has, the greater the redundancy, and the fewer lines, the smaller the redundancy.
In specific implementation, firstly, the watermark is divided according to the length of the watermark embedded in each line, and the sub-watermarks obtained by division are numbered from zero.
For example, suppose the character string corresponding to the watermark is 111000011100 and each line embeds a watermark segment of fixed length 4. The watermark can then be divided into 3 sub-watermarks, 1110, 0001 and 1100, numbered 0, 1 and 2 respectively.
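A small helper reproduces the division in this example. The name `split_watermark` is hypothetical. The sketch assumes the watermark length is a multiple of the per-line length, because the source does not say how a remainder would be handled.

```python
def split_watermark(watermark: str, length: int) -> dict[int, str]:
    """Divide the watermark into fixed-length sub-watermarks numbered from zero."""
    if len(watermark) % length:
        raise ValueError("watermark length must be a multiple of the per-line length")
    return {i: watermark[j:j + length] for i, j in enumerate(range(0, len(watermark), length))}
```

`split_watermark("111000011100", 4)` returns `{0: "1110", 1: "0001", 2: "1100"}`, matching the numbering above.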
Second, for each line of the copyright file in which a sub-watermark is to be embedded, the line's sub-watermark number is determined from the line's text content, the hash algorithm and the number of sub-watermarks.
For example, for each line of the copyright file in which a sub-watermark is to be embedded, the number k of that sub-watermark can be determined by the following formula:
k = C_HASH % K;
where C_HASH is the hash value of the first S characters of text content in the line, S being an integer greater than zero, and K is the number of sub-watermarks obtained by dividing the watermark.
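The formula k = C_HASH % K can be sketched as follows. The hash algorithm and S are not fixed at this point in the text. The sketch borrows the optional choice described later for fig. 6: MD5 over the first 5 characters. Reading the digest as a big-endian integer and encoding the text as UTF-8 are further assumptions. Because only the prefix is hashed, appending an invisible sub-watermark to the end of a line leaves k unchanged, so the detector can recompute the same number.

```python
import hashlib


def sub_watermark_number(line_text: str, num_subs: int, prefix_len: int = 5) -> int:
    """Return k = C_HASH % K for one line.

    Assumptions: C_HASH is the MD5 digest (read as a big-endian integer) of the
    first prefix_len (S) characters of the line, UTF-8 encoded.
    """
    prefix = line_text[:prefix_len]
    c_hash = int.from_bytes(hashlib.md5(prefix.encode("utf-8")).digest(), "big")
    return c_hash % num_subs


# Only the prefix is hashed, so appending a sub-watermark later does not change k.
print(sub_watermark_number("10023,alice,2018-05-08", 3))
```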
The sub-watermark with that number is then embedded in the line.
In practice, the characters of each sub-watermark are visible characters. When the sub-watermark with a given number is embedded in a line, each of its visible characters can be converted into an invisible character according to a preset conversion rule between visible and invisible characters. This yields an invisible character string for the sub-watermark, which is then embedded at a specified position in the line. Doing so strengthens the concealment of the watermark and reduces its impact on the copyright file.
S204: Use the hash algorithm to determine a character for each line of text content in the copyright file. Then hide watermark information in the copyright file according to these per-line characters, the character string of the watermark, and a preset watermark hiding rule. During detection, the hidden watermark information is used to verify the correctness of the watermark extracted from the copyright file.
In a specific implementation, the character corresponding to each line of text content can be determined as follows:
for each line in the copyright file, compute the hash value of the line's text content with the hash algorithm, XOR the hash value with a preset value, and take the result of the XOR operation as the character corresponding to that line.
For example, the character i corresponding to each line of text content can be determined according to the following formula:
[Formula image not reproduced in the source text]
where L_HASH is the hash value of the line's text content.
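The formula itself is only an image in the source, so the following is a hypothetical sketch of the per-line character, not the actual formula. It assumes three things: L_HASH is the MD5 digest of the line read as an integer; the preset value is an arbitrary placeholder (0x5A); and the XOR result is reduced to its lowest bit so that every line maps to '0' or '1'. The caller should pass the line's text content without any embedded invisible sub-watermark. That way the character is stable whether S203 or S204 runs first.

```python
import hashlib

PRESET_VALUE = 0x5A  # hypothetical: the patent does not disclose the preset value


def line_character(line_text: str, preset: int = PRESET_VALUE) -> str:
    """Map a line's text content to the character '0' or '1'.

    Assumptions: L_HASH is the MD5 digest of the UTF-8 text read as an integer,
    and the XOR result is reduced to its lowest bit (the exact formula is not
    reproduced in the source text).
    """
    l_hash = int.from_bytes(hashlib.md5(line_text.encode("utf-8")).digest(), "big")
    return str((l_hash ^ preset) & 1)


print([line_character(t) for t in ["a,1", "b,2", "c,3"]])
```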
Then the lines of the copyright file are exchanged according to the per-line characters, the character string of the watermark, and a preset watermark hiding rule. The watermark hiding rule has three parts. First, in the exchanged copyright file, the characters of a run of consecutive lines form the character string of the watermark. Second, the characters of the m consecutive lines before the start line form a preset string that marks the start position of the watermark. Third, the characters of the n consecutive lines after the end line form a preset string that marks the end position of the watermark. Here the start line is the first line of the run and the end line is the last line of the run. m and n are integers greater than zero and may be equal.
The following example uses m 1s as the string that marks the start of the watermark and n 0s as the string that marks its end. In this case the process can be carried out as follows:
Step 1: a random number generator produces a pseudo-random number, and a start line i in the copyright file is selected according to it.
Step 2: starting from line i, exchange lines of the copyright file so that:
- the characters of lines i to i+m are all 1;
- the characters of lines i+m+1 to i+m+L_wm, read in order, form the character string of the watermark;
- the characters of lines i+m+L_wm+1 to i+m+n+L_wm+1 are all 0.
This completes the hiding of the watermark. Here L_wm is the watermark length.
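The two steps above can be sketched with a greedy line exchange, under stated assumptions. The text does not specify how lines are exchanged, so the greedy swap is a hypothetical strategy. The per-line character is the same hypothetical MD5-XOR-lowest-bit mapping used elsewhere in these sketches. The sketch uses a plain convention of m start-marker lines beginning at the start line, then L_wm watermark lines, then n end-marker lines. The text's own index ranges count the boundaries slightly differently.

```python
import hashlib
import random


def line_char(text: str, preset: int = 0x5A) -> str:
    # Hypothetical per-line character: lowest bit of (MD5(text) XOR preset).
    h = int.from_bytes(hashlib.md5(text.encode("utf-8")).digest(), "big")
    return str((h ^ preset) & 1)


def hide_watermark(lines, watermark, m, n, start=None, seed=None):
    """Reorder lines so that, from line `start`, the per-line characters read
    '1'*m + watermark + '0'*n. Returns (reordered_lines, start)."""
    target = "1" * m + watermark + "0" * n
    if len(target) > len(lines):
        raise ValueError("not enough lines to hide the watermark")
    lines = list(lines)  # work on a copy
    if start is None:
        # Step 1: pseudo-random start line, chosen so the whole pattern fits.
        start = random.Random(seed).randrange(len(lines) - len(target) + 1)
    chars = [line_char(t) for t in lines]
    # Step 2: fill positions start, start+1, ... one at a time with a line whose
    # character matches; lines already placed in [start, p) are never moved.
    for offset, want in enumerate(target):
        p = start + offset
        if chars[p] == want:
            continue  # already the right character; no exchange needed
        q = next((j for j in range(len(lines))
                  if (j < start or j > p) and chars[j] == want), None)
        if q is None:
            raise ValueError(f"not enough lines whose character is {want!r}")
        lines[p], lines[q] = lines[q], lines[p]
        chars[p], chars[q] = chars[q], chars[p]
    return lines, start


rows = [f"{k},user{k}" for k in range(200)]
hidden, i = hide_watermark(rows, "111000011100", m=4, n=4, seed=7)
# The 20 characters from line i now read '1111' + watermark + '0000'.
print(i, "".join(line_char(t) for t in hidden[i:i + 20]))
```

Only the line order changes, never the line contents. This relies on the copyright file being a text file whose lines do not depend on one another.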
It should be noted that steps S203 and S204 may be performed in either order.
The above process is described in detail below with a specific example.
The server first obtains the copyright file and the copyright information provided by the copyright owner. It concatenates the copyright information with the current timestamp generated automatically by the system to form a fixed-length plaintext string. It then symmetrically encrypts and encodes that plaintext to produce a binary string of 0s and 1s, and uses this binary string as the watermark to be embedded into the copyright file.
Next, the server determines the redundancy of the watermark and divides the watermark string into several sub-watermark strings according to the fixed per-line embedding length, numbering each one. Then, for each line in which the watermark is to be embedded, the server:
- determines the number of the sub-watermark string for that line;
- maps the sub-watermark string with that number to an invisible character string;
- embeds that invisible string at the end of the line.
Specifically, the above process may be performed according to the flow shown in fig. 3:
S301: Divide the watermark and number the resulting sub-watermarks in sequence.
Specifically, divide the watermark into K parts according to the formula K = L_wm / l, and number the K sub-watermarks. Here L_wm is the length of the watermark string and l is the length of watermark embedded in each line.
S302: Read a line of the copyright file in which a sub-watermark is to be embedded, and determine the number of the sub-watermark for that line.
Specifically, compute the hash value C_HASH of the text content before a fixed position in the line. Then take C_HASH modulo the number K of sub-watermarks to obtain the number of the sub-watermark to embed in the line: k = C_HASH % K.
S303: Map the sub-watermark string with that number to an invisible character string, and embed the invisible string at the end of the line.
For example, suppose the binary string of a sub-watermark is 1001 and the preset mapping rule between visible and invisible characters is 0 -> space and 1 -> Tab. Then 1001 maps to the invisible string Tab, space, space, Tab, which is embedded at the end of the line.
S304: Determine whether the copyright file still contains lines in which a sub-watermark is to be embedded. If so, return to S302; otherwise, watermark embedding is complete.
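The mapping in S303 can be sketched as follows, using the example rule 0 -> space and 1 -> Tab. The function name `embed_sub_watermark` is hypothetical. Appending to the end of the line follows the fig. 3 embodiment.

```python
TO_INVISIBLE = {"0": " ", "1": "\t"}  # example rule from the text: 0 -> space, 1 -> Tab


def embed_sub_watermark(line: str, sub_watermark: str) -> str:
    """Map each visible bit to its invisible character and append the result
    to the end of the line (the embedding position used in fig. 3)."""
    invisible = "".join(TO_INVISIBLE[bit] for bit in sub_watermark)
    return line + invisible


marked = embed_sub_watermark("10023,alice", "1001")
print(repr(marked))  # '10023,alice\t  \t'
```

One caveat for this sketch: a line that already ends in spaces or Tabs would make the embedded suffix ambiguous to a detector. Real data would need such trailing whitespace trimmed or otherwise handled before embedding.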
Here, assume the redundancy of the watermark is r. The total number of sub-watermarks embedded in the copyright file is then N = K * r, so it suffices to select that many lines from the copyright file according to a line selection rule and embed a sub-watermark in each of them.
For example, the line selection rule may be to embed a sub-watermark every 5 lines. Alternatively, after every 10 lines, sub-watermarks may be embedded in the following 3 lines. These are merely examples and do not limit how the lines that carry sub-watermarks are chosen in the present application.
In addition, the hash algorithm can be used to determine the character for each line of text content in the copyright file. The watermark information is then hidden in the copyright file according to these per-line characters, the character string of the watermark, and the preset watermark hiding rule.
For example, if the character string of the watermark is 111000011100, the watermark length L_wm is 12. Assume the string that marks the watermark start is 1111 and the string that marks the watermark end is 0000, so m = 4.
Then, once the start line i in the copyright file is determined, lines can be exchanged according to the per-line characters and the character string of the watermark. After the exchange:
- the characters of lines i to i+4 are all 1;
- the characters of lines i+4+1 to i+4+L_wm, read in order, form 111000011100;
- the characters of lines i+4+L_wm+1 to i+2*4+L_wm+1 are all 0.
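The first example rule, one sub-watermark every 5 lines until N = K * r lines are chosen, can be sketched as follows. The function name and the choice to start at line 0 are assumptions. Note that this rule fixes how many sub-watermarks are embedded, N, but not which numbers they carry. Because k = C_HASH % K depends on each line's content, individual numbers may appear more or fewer than r times.

```python
def select_embedding_rows(num_lines: int, num_subs: int, redundancy: int,
                          step: int = 5) -> list[int]:
    """Pick N = K * r line indices, one every `step` lines (example rule)."""
    total = num_subs * redundancy  # N = K * r
    rows = list(range(0, num_lines, step))[:total]
    if len(rows) < total:
        raise ValueError("file too short for the requested redundancy")
    return rows


print(select_embedding_rows(100, num_subs=3, redundancy=2))  # [0, 5, 10, 15, 20, 25]
```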
Fig. 4 is a flowchart of a watermark detection method provided in an embodiment of the present application. The method includes the following steps:
S401: Obtain a copyright file. The copyright file is a text file whose lines do not depend on one another.
S402: Use the hash algorithm to determine the character for each line of text content in the copyright file. Then determine the watermark hidden in the copyright file according to these per-line characters and the preset watermark hiding rule.
Specifically, determining the character corresponding to each line of text content in the copyright file by using a hash algorithm includes:
for each line in the copyright file, computing the hash value of the line's text content with the hash algorithm, XORing the hash value with a preset value, and taking the result of the XOR operation as the character corresponding to that line.
For example, the character i corresponding to each line of text content can be calculated according to the following formula:
[Formula image not reproduced in the source text]
where L_HASH is the hash value of the line's text content.
Further, suppose the characters of m lines of text content in the copyright file, read in order, form the string that marks the watermark start. Suppose also that the characters of n lines, read in order, form the string that marks the watermark end. In that case, the string formed by the characters of the lines between those m lines and those n lines is determined to be the watermark hidden in the copyright file. m and n are integers greater than zero and may be equal.
Take the case where the start string is m 1s and the end string is n 0s. The per-line characters are computed line by line. Suppose that, starting from some line i, two conditions hold:
- the characters of lines i to i+m are all 1;
- the characters of lines i+m+L_wm+1 to i+m+n+L_wm+1 are all 0.
Then the string formed in order by the characters of lines i+m+1 to i+m+L_wm is determined to be the character string of the watermark.
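The marker search can be sketched as follows, using the known watermark length L_wm as the text does. This sketch returns every candidate rather than only the first. A watermark that itself begins with 1s, preceded by a stray 1, can produce a spurious earlier match, as the usage line shows. Leaving the final choice to the comparison with the extracted watermark in S404 is an assumption of this sketch, not something the text specifies.

```python
def hidden_watermark_candidates(chars: str, m: int, n: int, wm_len: int) -> list[str]:
    """Return every wm_len-character run that is preceded by m '1's and
    followed by n '0's in the sequence of per-line characters."""
    span = m + wm_len + n
    found = []
    for i in range(len(chars) - span + 1):
        if chars[i:i + m] == "1" * m and chars[i + m + wm_len:i + span] == "0" * n:
            found.append(chars[i + m:i + m + wm_len])
    return found


line_chars = "0101" + "1111" + "111000011100" + "0000" + "10"
print(hidden_watermark_candidates(line_chars, 4, 4, 12))
# ['111100001110', '111000011100']
```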
S403: Extract the watermark embedded in the copyright file. The watermark extracted from each line of the copyright file is a sub-watermark. The sub-watermarks are obtained by dividing the watermark, and the sub-watermark in a line is determined by the line's text content, the hash algorithm, and the number of sub-watermarks.
In a specific implementation, consider each line of the copyright file in which a sub-watermark is embedded. The embedded sub-watermark can be determined from a preset watermark embedding position and the length of watermark embedded per line.
Specifically, for each such line, the invisible character string of the sub-watermark is cut out of the line according to the preset embedding position and the per-line watermark length. Each invisible character is converted back into a visible character according to the preset conversion rule between visible and invisible characters. The resulting visible string is taken as the sub-watermark embedded in the line.
In addition, the number of the sub-watermark embedded in the line can be determined from the line's text content, the hash algorithm, and the number of sub-watermarks.
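The cut-and-convert step can be sketched as follows. It assumes the embedding position is the end of the line, as in the fig. 3 and fig. 6 embodiments. It also assumes the per-line length l is known and the rule is the example 0 -> space, 1 -> Tab. The returned text content is what the line's number k would be recomputed from. The function name is hypothetical.

```python
TO_VISIBLE = {" ": "0", "\t": "1"}  # inverse of the assumed rule 0 -> space, 1 -> Tab


def extract_sub_watermark(line: str, per_line_len: int):
    """Split a line into (text_content, sub_watermark), or return None when the
    last per_line_len characters are not all invisible watermark characters."""
    if per_line_len <= 0 or len(line) < per_line_len:
        return None
    content, tail = line[:-per_line_len], line[-per_line_len:]
    if any(ch not in TO_VISIBLE for ch in tail):
        return None
    return content, "".join(TO_VISIBLE[ch] for ch in tail)


print(extract_sub_watermark("10023,alice\t  \t", 4))  # ('10023,alice', '1001')
```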
For example, for each line of the copyright file in which a sub-watermark is embedded, the number k of that sub-watermark can be determined by the following formula:
k = C_HASH % K;
where C_HASH is the hash value of the first S characters of text content in the line, S being an integer greater than zero, and K is the number of sub-watermarks obtained by dividing the watermark.
Then the watermark embedded in the copyright file is determined from the sub-watermarks embedded in the lines and their numbers.
Specifically, for each number, the sub-watermark that occurs most often is taken as the sub-watermark for that number. The sub-watermarks are then concatenated in ascending order of number to obtain the watermark embedded in the copyright file.
For example:
- Number 0 may have several candidate sub-watermarks, of which 1110 occurs most often, so 1110 is taken as the unique sub-watermark for number 0.
- Number 1 likewise has several candidates, of which 0001 occurs most often, so 0001 is taken for number 1.
- Number 2 also has several candidates, of which 1100 occurs most often, so 1100 is taken for number 2.
Concatenating 1110, 0001 and 1100 in ascending order of number gives the watermark string 111000011100.
Majority voting is used because, in practice, sub-watermarks embedded in the copyright file may be partially deleted, either maliciously or by mistake. As a result, more distinct sub-watermarks may be extracted for a number than were produced when the watermark was divided.
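The per-number majority vote and ascending concatenation can be sketched as follows. The input format, a list of (k, sub-watermark) pairs, is an assumption. Raising an error when a number has no surviving copy is a design choice of this sketch; the text does not cover that case.

```python
from collections import Counter


def assemble_watermark(observations, num_subs: int) -> str:
    """observations: iterable of (number k, sub_watermark) pairs from all lines.
    For each k keep the most frequent sub-watermark, then join in ascending k."""
    votes = {k: Counter() for k in range(num_subs)}
    for k, sub in observations:
        votes[k][sub] += 1
    missing = [k for k in range(num_subs) if not votes[k]]
    if missing:
        raise ValueError(f"no sub-watermark recovered for numbers {missing}")
    return "".join(votes[k].most_common(1)[0][0] for k in range(num_subs))


obs = [(0, "1110"), (0, "1110"), (0, "1010"),   # one damaged copy of number 0
       (1, "0001"),
       (2, "1100"), (2, "1100"), (2, "0100")]
print(assemble_watermark(obs, 3))  # 111000011100
```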
S404: if the extracted watermark is the same as the hidden watermark, analyzing any watermark to obtain copyright information of a copyright owner and timestamp information when the copyright owner starts to own the copyright file.
Specifically, any watermark is decoded and decrypted to obtain combined copyright information and timestamp information, and the combined copyright information and timestamp information are split to obtain copyright information of a copyright owner and timestamp information when the copyright owner starts to own the copyright file.
It should be noted that there is no specific order between the steps S402 and S403.
In the embodiment of the application, the watermark hidden in the copyright file is the watermark which is actually embedded into the copyright file, the extracted watermark is the watermark which is actually embedded into the copyright file, and the detected watermark is correct only when the two watermarks are the same, so that the watermark is analyzed to obtain the information of the copyright owner when the two watermarks are determined to be the same, and the reliability of the spreading source of the copyright file traced back according to the information of the copyright owner can be ensured.
The above process is described in detail with reference to specific embodiments.
After the server acquires the copyright file, the server can calculate the characters corresponding to each line of text content by utilizing a Hash algorithm, when the character string used for identifying the initial bit of the watermark is detected, the characters corresponding to each line of text content are recorded until the character string used for identifying the end bit of the watermark is detected, and the character string formed by the recorded character sequence is used as the watermark character string to be compared.
Taking the character strings for marking the initial bit of the watermark as m 1, and the character strings for marking the end bit of the watermark as m 0, when the server detects that the characters corresponding to m continuous lines in the copyright file are 1, recording the characters corresponding to the line from the next line, and stopping the detection until the characters corresponding to m continuous lines are 0, and further taking the character strings formed by the recorded character sequences as binary character strings to be compared.
For example, for each line in the copyright file, the character i corresponding to the text content of the line can be calculated according to the following formula:
[Formula image not reproduced in the source text]
where L_HASH is the hash value of the line's text content.
Further, for each line of the copyright file in which a sub-watermark is embedded, two things are done. The number of that sub-watermark is determined from the line's text content, the hash algorithm, and the number of sub-watermarks. The invisible characters at the end of the line are converted back into visible binary characters. Then, for each number, the sub-watermark that occurs most often is taken as the sub-watermark for that number. The sub-watermarks are concatenated in ascending order of number to obtain the watermark actually embedded in the copyright file.
In specific implementation, the above process may be performed according to the following steps:
Step 1: read the text content of the copyright file line by line, and determine the sub-watermark embedded in each line and its number.
Specifically, for each line in the copyright file, check whether there are invisible characters at the end of the line. If so:
- split the line into its text content part and the invisible watermark characters at its end;
- compute the hash value C_HASH of the first S characters of the text content;
- compute the number of the embedded sub-watermark as k = C_HASH % K, where K is the number of sub-watermarks obtained by dividing the watermark.
Convert the invisible watermark characters at the end of the line into a visible binary string and store it as the k-th sub-watermark.
Step 2: determine the watermark embedded in the copyright file from the sub-watermarks embedded in the lines and their numbers.
Specifically, for each number, the sub-watermark that occurs most often is taken as the sub-watermark for that number. The sub-watermarks are then concatenated in ascending order of number to obtain the watermark string embedded in the copyright file.
Further, suppose the watermark string to be compared is the same as the watermark string embedded in the copyright file. Either binary watermark string can then be inverse-encoded and decrypted with the symmetric encryption algorithm to obtain the plaintext watermark information. This information includes the copyright information provided by the copyright owner and the time at which the copyright owner began to own the copyright file. That time can be represented by the system timestamp taken when the watermark was embedded.
Fig. 5 is a schematic diagram of a watermark embedding and extraction system based on a big data platform, provided in an embodiment of the present application. The system includes a watermark embedding module and a watermark extraction module, where:
the watermark embedding module comprises a watermark generating unit and a watermark embedding unit, wherein the watermark generating unit is used for converting the watermark information string into a binary watermark character string; and the watermark embedding unit is used for embedding the binary watermark character string into the copyright file.
The watermark extraction module comprises a watermark extraction unit and a watermark recovery unit, wherein the watermark extraction unit is used for extracting the binary watermark character string embedded in the copyright file; and the watermark recovery unit is used for converting the binary watermark character string into a watermark information string to obtain the copyright information of the copyright file.
Corresponding to fig. 5, fig. 6 is a flowchart of another watermark embedding method provided in an embodiment of the present application. The method includes:
S601: Generate a binary watermark string from the copyright information provided by the copyright owner and the current timestamp.
Specifically, the copyright information provided by the copyright owner and the current timestamp generated automatically by the system are concatenated into a fixed-length plaintext string. This plaintext is symmetrically encrypted with the Advanced Encryption Standard (AES) and encoded with BASE64 to produce a binary watermark string consisting of the characters 0 and 1.
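The build and parse steps (S601 here, and the parsing in S705) can be sketched under several loud assumptions:
- Python's standard library has no AES, so `encrypt`/`decrypt` are pluggable callables that default to a do-nothing placeholder. Plugging in a real AES implementation is left to the reader.
- The fixed field widths (a 32-byte copyright field and a 10-digit Unix timestamp) are hypothetical.
- The step that turns BASE64 output into 0s and 1s is also hypothetical. The text does not specify it, so this sketch writes each BASE64 character as 8 bits.

Parsing reverses the order: BASE64 decoding first, then decryption.

```python
import base64

COPYRIGHT_WIDTH = 32  # hypothetical fixed width (bytes) of the copyright field
TIMESTAMP_WIDTH = 10  # hypothetical: Unix seconds, zero-padded to 10 digits


def identity(data: bytes) -> bytes:
    # Placeholder for AES: Python's standard library has no AES implementation.
    return data


def make_watermark(copyright_info: str, timestamp: int, encrypt=identity) -> str:
    field = copyright_info.encode("utf-8")
    if len(field) > COPYRIGHT_WIDTH:
        raise ValueError("copyright information too long for the fixed width")
    # Fixed-length plaintext: padded copyright field + zero-padded timestamp.
    ts = f"{timestamp:0{TIMESTAMP_WIDTH}d}".encode("ascii")
    plaintext = field.ljust(COPYRIGHT_WIDTH, b"\0") + ts
    encoded = base64.b64encode(encrypt(plaintext))
    return "".join(f"{byte:08b}" for byte in encoded)  # 8 bits per BASE64 character


def parse_watermark(bits: str, decrypt=identity) -> tuple[str, int]:
    encoded = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    plaintext = decrypt(base64.b64decode(encoded))  # decode first, then decrypt
    copyright_info = plaintext[:COPYRIGHT_WIDTH].rstrip(b"\0").decode("utf-8")
    timestamp = int(plaintext[COPYRIGHT_WIDTH:].decode("ascii"))
    return copyright_info, timestamp


bits = make_watermark("Example Owner", 1525737600)
print(len(bits), parse_watermark(bits))  # 448 ('Example Owner', 1525737600)
```

In use, the current timestamp would typically be `int(time.time())`.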
S602: Divide the watermark into multiple sub-watermarks and number each sub-watermark.
S603: Read a line of the copyright file in which a sub-watermark is to be embedded, and determine the number of the sub-watermark to embed in it.
Optionally, compute the MD5 value h_MD5 of the first 5 characters of the line's text content. Then take h_MD5 modulo the number K of sub-watermarks to obtain the number of the sub-watermark to embed in the line.
S604: Map the binary sub-watermark string with that number to an invisible character string, and embed the invisible string at the end of the line.
S605: Determine whether the copyright file still contains lines in which a sub-watermark is to be embedded. If so, return to S603; otherwise, proceed to S606.
S606: Determine the line number of the start line for the line exchange.
For example, a random number generator produces a pseudo-random number i, which is used as the start line number.
S607: Exchange lines of the copyright file based on the start line number.
Specifically, lines are exchanged so that:
- the per-line characters of lines i to i+m are all 1;
- the per-line characters of lines i+m+1 to i+m+L_wm equal the bits of the binary watermark;
- the per-line characters of lines i+m+L_wm+1 to i+2*m+L_wm+1 are all 0.
This completes the watermark embedding. Here m is the length of the start and end strings set to mark the watermark.
Corresponding to fig. 5, fig. 7 is a flowchart of another watermark detection method provided in an embodiment of the present application. The method includes:
S701: Determine the watermark hidden in the copyright file.
Specifically, the text content of the copyright file is read line by line and the character of each line is determined. If the characters of m consecutive lines are detected to be 1, the characters are recorded starting from the next line. Recording continues until the characters of m consecutive lines are detected to be 0, at which point detection stops. The recorded binary string is marked as watermark1 (i.e., the hidden watermark).
S702: Determine the sub-watermark embedded in each line and its number.
Optionally, while reading the text content of the copyright file line by line, check whether there are invisible characters at the end of each line. If there are, split the line into its text content part and the invisible watermark characters at its end.
Then compute the MD5 value h_MD5 of the first 5 characters of the text content. The number of the sub-watermark embedded in the line is k = h_MD5 % K, where K is the number of sub-watermarks.
Convert the invisible watermark characters of the line into a binary string and store it as the k-th sub-watermark. Continue until the whole file has been read.
S703: Determine the watermark embedded in the copyright file from the sub-watermarks embedded in all lines and their numbers.
Specifically, for each number, the sub-watermark that occurs most often is taken as the sub-watermark for that number. The sub-watermarks are then combined in ascending order of number into a complete binary watermark string, recorded as watermark2.
S704: Determine whether the hidden watermark is the same as the extracted watermark. If so, proceed to S705; otherwise, watermark detection is deemed to have failed.
That is, watermark1 is compared with watermark2, and if they are the same, watermark1 or watermark2 is output.
S705: Parse either watermark to obtain the watermark information.
Specifically, the output binary watermark string is BASE64-decoded and AES-decrypted to recover the original plaintext watermark information. This is then split into the copyright information and the timestamp information.
Plain-text files on big data platforms have no fixed record order and little redundant space. The watermark embedding method provided in this embodiment embeds invisible characters into the copyright text, does not depend on the record order of the data, and needs little space for the watermark. It is therefore well suited to plain-text files on big data platforms.
Based on the same inventive concept, an embodiment of the present application further provides a watermark embedding apparatus corresponding to the watermark embedding method. The apparatus solves the problem on a principle similar to that of the method, so its implementation can refer to the implementation of the method, and repeated details are omitted.
Fig. 8 is a structure diagram of a watermark embedding apparatus provided in an embodiment of the present application. The apparatus includes:
an obtaining module 801, configured to obtain a copyright file and copyright information provided by a copyright owner, where the copyright file is a text file whose lines do not depend on one another;
a generating module 802, configured to generate a watermark to be embedded in the copyright file based on the copyright information and the current timestamp information;
an embedding module 803, configured to embed at least one copy of the watermark into the copyright file, where each line carrying the watermark is embedded with one sub-watermark; the sub-watermarks are obtained by dividing the watermark, and the sub-watermark in a line is determined by the line's text content, a hash algorithm, and the number of sub-watermarks;
a hiding module 804, configured to determine, by using a hash algorithm, the character corresponding to each line of text content in the copyright file, and to hide watermark information in the copyright file according to these per-line characters, the character string of the watermark, and a preset watermark hiding rule.
In a possible implementation manner, the generating module 802 is specifically configured to:
combine the copyright information and the timestamp information;
encrypt and encode the combined copyright information and timestamp information to obtain the watermark to be embedded into the copyright file.
In a possible implementation, the embedding module 803 is specifically configured to:
divide the watermark according to the length of watermark embedded in each line, and number the resulting sub-watermarks;
for each line of the copyright file in which a sub-watermark is to be embedded, determine the number of the sub-watermark for that line from the line's text content, the hash algorithm, and the number of sub-watermarks, and embed the sub-watermark with that number into the line.
In a possible implementation manner, for each line of the copyright file in which the sub-watermark is to be embedded, the embedding module 803 is specifically configured to determine the number k of the sub-watermark to be embedded in the line according to the following formula:
k = C_HASH % K;
where C_HASH is the hash value of the first S characters of text content in the line, S being an integer greater than zero, and K is the number of sub-watermarks obtained by dividing the watermark.
In one possible implementation, the characters contained in each sub-watermark are visible characters; the embedding module 803 is specifically configured to:
convert each visible character in the sub-watermark into an invisible character according to a preset conversion rule between visible and invisible characters, obtaining the invisible character string of the sub-watermark;
embed the invisible string at a specified position in the line.
In a possible implementation manner, the hiding module 804 is specifically configured to:
for each line in the copyright file, determine the hash value of the line's text content by using a hash algorithm;
XOR the hash value with a preset value, and take the result of the XOR operation as the character corresponding to the line's text content.
In a possible implementation manner, the hiding module 804 is specifically configured to:
and performing line exchange on the copyright file according to characters corresponding to each line of text content, character strings corresponding to the watermarks and a preset watermark hiding rule, wherein the watermark hiding rule is as follows: and a character string consisting of characters corresponding to a plurality of continuous lines in the exchanged copyright file is a character string corresponding to the watermark, a character string consisting of characters of m continuous lines before the initial line is a preset character string used for identifying the initial position of the watermark, and a character string consisting of characters of n continuous lines after the ending line is a preset character string used for identifying the ending position of the watermark, wherein the initial line is the first line in the continuous lines, the ending line is the last line in the continuous lines, and m and n are integers larger than zero.
Similarly, an embodiment of the present application also provides a watermark detection apparatus corresponding to the watermark detection method. The apparatus solves the problem on a principle similar to that of the watermark detection method in the embodiments of the present application. Its implementation may therefore refer to the implementation of the method, and repeated details are not described again.
As shown in fig. 9, which is a structure diagram of a watermark detection apparatus provided in an embodiment of the present application, the apparatus includes:
an obtaining module 901, configured to obtain a copyright file, where the copyright file is a text file whose lines have no dependencies on one another;
a determining module 902, configured to determine, by using a hash algorithm, the character corresponding to each line of text content in the copyright file, and to determine the watermark hidden in the copyright file according to those characters and a preset watermark hiding rule;
an extracting module 903, configured to extract the watermark embedded in the copyright file, where the watermark extracted from any line of the copyright file is a sub-watermark. The sub-watermark is obtained by dividing the watermark and is determined according to the line's text content, the hash algorithm, and the number of sub-watermarks; and
an analyzing module 904, configured to, if the extracted watermark is determined to be the same as the hidden watermark, parse either watermark to obtain the copyright information of the copyright owner and the timestamp information of when the copyright owner began to own the copyright file.
In a possible implementation, the determining module 902 is specifically configured to:
for each line in the copyright file, determine the hash value of the line's text content by using a hash algorithm; and
perform an XOR operation on the hash value and a preset value, and take the result of the XOR operation as the character corresponding to the line's text content.
In a possible implementation, the determining module 902 is specifically configured to do the following, according to the characters corresponding to the text content of each line and a preset watermark hiding rule:
when the characters of m consecutive lines in the copyright file form the character string identifying the start position of the watermark, and the characters of n subsequent consecutive lines form the character string identifying the end position of the watermark, determine the character string formed by the characters of the lines between those m lines and those n lines as the watermark hidden in the copyright file, where m and n are integers greater than zero.
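The detection side of the hiding rule can be sketched as a marker search over the per-line characters. The per-line mapping and the markers `"a5"`/`"5a"` are the same hypothetical choices as on the embedding side, and both sides must agree on them.

```python
import hashlib
from typing import List, Optional


def line_char(line: str, preset: int = 0x5A) -> str:
    # Hypothetical mapping; it must match the one used when hiding the watermark.
    digest = hashlib.sha256(line.encode("utf-8")).digest()
    return "0123456789abcdef"[(digest[0] ^ preset) % 16]


def find_hidden_watermark(lines: List[str], start: str = "a5",
                          end: str = "5a") -> Optional[str]:
    """Return the characters between the first start marker and the next end marker."""
    chars = "".join(line_char(line) for line in lines)
    i = chars.find(start)
    if i < 0:
        return None
    j = chars.find(end, i + len(start))
    if j < 0:
        return None
    return chars[i + len(start):j]
```

Markers can also occur by chance in unmodified text. Longer markers (larger m and n) reduce such false matches.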
In a possible implementation, the extracting module 903 is specifically configured to:
for each line of the copyright file in which a sub-watermark is embedded, determine the sub-watermark embedded in the line according to a preset watermark embedding position and the watermark length embedded in each line, and determine the number of that sub-watermark by using the line's text content, the hash algorithm, and the total number of sub-watermarks; and
determine the watermark embedded in the copyright file according to the sub-watermarks embedded in the lines and their numbers.
In a possible implementation, the characters contained in each sub-watermark embedded in the copyright file are invisible characters, and the extracting module 903 is specifically configured to:
extract from the line the invisible character string corresponding to the sub-watermark, according to a preset watermark embedding position and the watermark length embedded in each line; and
convert each invisible character in the string into a visible character according to a preset conversion rule between visible and invisible characters, and take the resulting visible character string as the sub-watermark embedded in the line.
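Extraction reverses the embedding conversion. This sketch assumes the same hypothetical rule as on the embedding side: 8 zero-width characters per visible character, U+200B for bit 0 and U+200C for bit 1. `pos` and `sub_len` stand in for the preset embedding position and the per-line watermark length.

```python
ZERO, ONE = "\u200b", "\u200c"  # hypothetical bit encoding, as on the embedding side
BITS_PER_CHAR = 8


def extract_invisible(line: str, pos: int, sub_len: int) -> str:
    """Cut out the invisible string of a sub_len-character sub-watermark starting at pos."""
    hidden = line[pos:pos + BITS_PER_CHAR * sub_len]
    if len(hidden) != BITS_PER_CHAR * sub_len or any(c not in (ZERO, ONE) for c in hidden):
        raise ValueError("no well-formed sub-watermark at the expected position")
    return hidden


def to_visible(hidden: str) -> str:
    """Convert each group of 8 zero-width characters back into one visible character."""
    chars = []
    for i in range(0, len(hidden), BITS_PER_CHAR):
        bits = "".join("1" if c == ONE else "0" for c in hidden[i:i + BITS_PER_CHAR])
        chars.append(chr(int(bits, 2)))
    return "".join(chars)


def read_sub_watermark(line: str, pos: int, sub_len: int) -> str:
    return to_visible(extract_invisible(line, pos, sub_len))
```

If the sub-watermark was appended at the end of the line, `pos` is `len(line) - 8 * sub_len`.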
In a possible implementation, for each line of the copyright file in which a sub-watermark is embedded, the extracting module 903 is specifically configured to determine the number k of the sub-watermark embedded in the line according to the following formula:
$k = C_{HASH} \bmod K$;
where $C_{HASH}$ is the hash value of the first S characters of the text in the line, S is an integer greater than zero, and K is the number of sub-watermarks obtained by dividing the watermark.
In a possible implementation, the extracting module 903 is specifically configured to:
for each number, count how often each sub-watermark carrying that number occurs, and take the most frequent one as the sub-watermark corresponding to that number; and
concatenate the sub-watermarks in ascending order of number to obtain the watermark embedded in the copyright file.
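The voting and concatenation step can be sketched as follows. The input format (a list of `(number, sub-watermark)` pairs, one per marked line) and the tie-breaking rule are assumptions. The text only says the most frequent sub-watermark wins.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def assemble_watermark(observations: Iterable[Tuple[int, str]], k_total: int) -> str:
    """Rebuild the watermark from (number, sub-watermark) pairs read from the lines.

    For each number, the most frequent sub-watermark wins (ties go to the one seen
    first); the winners are then concatenated in ascending order of number.
    """
    votes: Dict[int, Counter] = defaultdict(Counter)
    for k, sub in observations:
        votes[k][sub] += 1
    missing = [k for k in range(k_total) if not votes.get(k)]
    if missing:
        raise ValueError(f"no line carried sub-watermark number(s) {missing}")
    return "".join(votes[k].most_common(1)[0][0] for k in range(k_total))
```

Voting lets the scheme tolerate a few edited lines. A corrupted copy of a sub-watermark is outvoted as long as most copies with the same number survive.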
In a possible implementation, the analyzing module 904 is specifically configured to:
decrypt and then decode the watermark to obtain the combined copyright information and timestamp information; and
split the combined copyright information and timestamp information to obtain the copyright information and the timestamp information.
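Parsing is the inverse of watermark generation (combine, encode, then encrypt), so a round trip shows both directions. Everything concrete here is an assumption:

- Base64 is used as the encoding.
- A toy SHA-256 keystream stands in for a real cipher. It is not secure; a real deployment would use an established authenticated cipher from a vetted library.
- The output is hex, so the watermark consists of visible characters.
- `|` is the separator.

```python
import base64
import hashlib
from typing import Tuple

SEP = "|"  # hypothetical separator between copyright info and timestamp


def _keystream(key: bytes, n: int) -> bytes:
    # Toy SHA-256 counter-mode keystream standing in for a real cipher; NOT secure.
    out = bytearray()
    counter = 0
    while len(out) < n:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:n])


def _xor(data: bytes, key: bytes) -> bytes:
    # XOR with the keystream; applying it twice restores the input.
    return bytes(a ^ b for a, b in zip(data, _keystream(key, len(data))))


def make_watermark(copyright_info: str, timestamp: int, key: bytes) -> str:
    combined = f"{copyright_info}{SEP}{timestamp}"           # combine
    encoded = base64.b64encode(combined.encode("utf-8"))     # encode
    return _xor(encoded, key).hex()                          # encrypt, as visible hex


def parse_watermark(watermark: str, key: bytes) -> Tuple[str, int]:
    encoded = _xor(bytes.fromhex(watermark), key)            # decrypt
    combined = base64.b64decode(encoded).decode("utf-8")     # then decode
    copyright_info, ts = combined.rsplit(SEP, 1)             # split
    return copyright_info, int(ts)
```

Splitting on the last separator with `rsplit` lets the copyright information itself contain `|`.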
As shown in fig. 10, which is a hardware structure diagram of a computer for implementing the watermark embedding method or the watermark detection method provided in the embodiments of the present application, the computer includes a processor 1010, a communication interface 1020, a memory 1030, and a communication bus 1040. The processor 1010, the communication interface 1020, and the memory 1030 communicate with one another through the communication bus 1040.
A memory 1030 for storing a computer program;
the processor 1010, when executing the program stored in the memory 1030, causes the computer to perform the steps of the above-described watermark embedding or watermark detection method.
A computer-readable storage medium provided in an embodiment of the present application includes program code, which, when executed on a computer, causes the computer to perform the steps of the above-mentioned watermark embedding and/or watermark detection method.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While the preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all alterations and modifications as fall within the scope of the application.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.

Claims (30)

1. A watermark embedding method, comprising:
acquiring a copyright file and copyright information provided by a copyright owner, wherein the copyright file is a text file whose lines have no dependencies on one another;
generating a watermark to be embedded into the copyright file based on the copyright information and the current timestamp information;
embedding at least one copy of the watermark into the copyright file, wherein any line of the copyright file is embedded with a sub-watermark obtained by dividing the watermark; for each line of the copyright file in which a sub-watermark is to be embedded, the number k of the sub-watermark to be embedded in the line is determined according to the formula $k = C_{HASH} \bmod K$, where $C_{HASH}$ is the hash value of the first S characters of the text in the line, S is an integer greater than zero, and K is the number of sub-watermarks obtained by dividing the watermark; and
determining, by using a hash algorithm, the character corresponding to each line of text content in the copyright file, and hiding watermark information in the copyright file according to those characters, the character string corresponding to the watermark, and a preset watermark hiding rule.
2. The method of claim 1, wherein generating the watermark to be embedded into the copyright file based on the copyright information and the current timestamp information comprises:
combining the copyright information and the timestamp information; and
encoding and then encrypting the combined copyright information and timestamp information to obtain the watermark to be embedded into the copyright file.
3. The method of claim 1, wherein embedding at least one copy of the watermark into the copyright file comprises:
dividing the watermark according to the watermark length to be embedded in each line, and numbering the resulting sub-watermarks; and
for each line of the copyright file in which a sub-watermark is to be embedded, determining the number of the sub-watermark to be embedded in the line by using the line's text content, the hash algorithm, and the number of sub-watermarks, and embedding the sub-watermark with that number into the line.
4. The method of claim 3, wherein the characters contained in each sub-watermark are visible characters, and embedding the sub-watermark with that number into the line comprises:
converting each visible character in the sub-watermark into an invisible character according to a preset conversion rule between visible and invisible characters, obtaining the invisible character string corresponding to the sub-watermark; and
embedding the invisible character string at a specified position in the line.
5. The method of claim 1, wherein determining the character corresponding to each line of text content in the copyright file by using a hash algorithm comprises:
for each line in the copyright file, determining the hash value of the line's text content by using a hash algorithm; and
performing an XOR operation on the hash value and a preset value, and taking the result of the XOR operation as the character corresponding to the line's text content.
6. The method of claim 1, wherein hiding watermark information in the copyright file according to the characters corresponding to each line of text content, the character string corresponding to the watermark, and a preset watermark hiding rule comprises:
swapping lines of the copyright file according to the characters corresponding to each line of text content, the character string corresponding to the watermark, and the preset watermark hiding rule, the watermark hiding rule being that, in the reordered copyright file: the characters corresponding to a run of consecutive lines form the character string corresponding to the watermark; the characters of the m consecutive lines before the start line form a preset character string identifying the start position of the watermark; and the characters of the n consecutive lines after the end line form a preset character string identifying the end position of the watermark, where the start line is the first line of the run, the end line is the last line of the run, and m and n are integers greater than zero.
7. A watermark detection method, comprising:
acquiring a copyright file, wherein the copyright file is a text file whose lines have no dependencies on one another;
determining, by using a hash algorithm, the character corresponding to each line of text content in the copyright file, and determining the watermark hidden in the copyright file according to those characters and a preset watermark hiding rule;
extracting the watermark embedded in the copyright file, wherein the watermark extracted from any line of the copyright file is a sub-watermark, the sub-watermark being obtained by dividing the hidden watermark and being determined according to the line's text content, the hash algorithm, and the number of sub-watermarks; and
if the extracted watermark is the same as the hidden watermark, parsing either watermark to obtain the copyright information of the copyright owner and the timestamp information of when the copyright owner began to own the copyright file.
8. The method of claim 7, wherein determining the character corresponding to each line of text content in the copyright file by using a hash algorithm comprises:
for each line in the copyright file, determining the hash value of the line's text content by using a hash algorithm; and
performing an XOR operation on the hash value and a preset value, and taking the result of the XOR operation as the character corresponding to the line's text content.
9. The method of claim 7, wherein determining the watermark hidden in the copyright file according to the characters corresponding to each line of text content and a preset watermark hiding rule comprises:
according to the characters corresponding to the text content of each line and the preset watermark hiding rule, when the characters of m consecutive lines in the copyright file form the character string identifying the start position of the watermark, and the characters of n subsequent consecutive lines form the character string identifying the end position of the watermark, determining the character string formed by the characters of the lines between those m lines and those n lines as the watermark hidden in the copyright file, where m and n are integers greater than zero.
10. The method of claim 7, wherein extracting the watermark embedded in the copyright file comprises:
for each line of the copyright file in which a sub-watermark is embedded, determining the sub-watermark embedded in the line according to a preset watermark embedding position and the watermark length embedded in each line, and determining the number of that sub-watermark by using the line's text content, the hash algorithm, and the total number of sub-watermarks; and
determining the watermark embedded in the copyright file according to the sub-watermarks embedded in the lines and their numbers.
11. The method of claim 10, wherein the characters contained in each sub-watermark embedded in the copyright file are invisible characters, and determining the sub-watermark embedded in each line according to a preset watermark embedding position and the watermark length embedded in each line comprises:
extracting from the line the invisible character string corresponding to the sub-watermark, according to the preset watermark embedding position and the watermark length embedded in each line; and
converting each invisible character in the string into a visible character according to a preset conversion rule between visible and invisible characters, and taking the resulting visible character string as the sub-watermark embedded in the line.
12. The method of claim 10, wherein, for each line of the copyright file in which a sub-watermark is embedded, the number k of the sub-watermark embedded in the line is determined according to the following formula:
$k = C_{HASH} \bmod K$;
where $C_{HASH}$ is the hash value of the first S characters of the text in the line, S is an integer greater than zero, and K is the number of sub-watermarks obtained by dividing the watermark.
13. The method of claim 10, wherein determining the watermark embedded in the copyright file according to the sub-watermarks embedded in the lines and their numbers comprises:
for each number, counting how often each sub-watermark carrying that number occurs, and taking the most frequent one as the sub-watermark corresponding to that number; and
concatenating the sub-watermarks in ascending order of number to obtain the watermark embedded in the copyright file.
14. The method of claim 7, wherein parsing either watermark to obtain the copyright information of the copyright owner and the timestamp information of when the copyright owner began to own the copyright file comprises:
decrypting and then decoding the watermark to obtain the combined copyright information and timestamp information; and
splitting the combined copyright information and timestamp information to obtain the copyright information and the timestamp information.
15. A watermark embedding apparatus, comprising:
an acquisition module, configured to acquire a copyright file and copyright information provided by a copyright owner, wherein the copyright file is a text file whose lines have no dependencies on one another;
a generating module, configured to generate a watermark to be embedded into the copyright file based on the copyright information and current timestamp information;
an embedding module, configured to embed at least one copy of the watermark into the copyright file, wherein any line of the copyright file is embedded with a sub-watermark obtained by dividing the watermark; for each line of the copyright file in which a sub-watermark is to be embedded, the number k of the sub-watermark to be embedded in the line is determined according to the formula $k = C_{HASH} \bmod K$, where $C_{HASH}$ is the hash value of the first S characters of the text in the line, S is an integer greater than zero, and K is the number of sub-watermarks obtained by dividing the watermark; and
a hiding module, configured to determine, by using a hash algorithm, the character corresponding to each line of text content in the copyright file, and to hide the watermark information in the copyright file according to those characters, the character string corresponding to the watermark, and a preset watermark hiding rule.
16. The apparatus of claim 15, wherein the generation module is specifically configured to:
combine the copyright information and the timestamp information; and
encode and then encrypt the combined copyright information and timestamp information to obtain the watermark to be embedded into the copyright file.
17. The apparatus of claim 15, wherein the embedding module is specifically configured to:
divide the watermark according to the watermark length to be embedded in each line, and number the resulting sub-watermarks; and
for each line of the copyright file in which a sub-watermark is to be embedded, determine the number of the sub-watermark to be embedded in the line by using the line's text content, the hash algorithm, and the number of sub-watermarks, and embed the sub-watermark with that number into the line.
18. The apparatus of claim 17, wherein the characters contained in each sub-watermark are visible characters, and the embedding module is specifically configured to:
convert each visible character in the sub-watermark into an invisible character according to a preset conversion rule between visible and invisible characters, obtaining the invisible character string corresponding to the sub-watermark; and
embed the invisible character string at a specified position in the line.
19. The apparatus of claim 15, wherein the hiding module is specifically configured to:
for each line in the copyright file, determine the hash value of the line's text content by using a hash algorithm; and
perform an XOR operation on the hash value and a preset value, and take the result of the XOR operation as the character corresponding to the line's text content.
20. The apparatus of claim 15, wherein the hiding module is specifically configured to:
swap lines of the copyright file according to the characters corresponding to each line of text content, the character string corresponding to the watermark, and the preset watermark hiding rule, the watermark hiding rule being that, in the reordered copyright file: the characters corresponding to a run of consecutive lines form the character string corresponding to the watermark; the characters of the m consecutive lines before the start line form a preset character string identifying the start position of the watermark; and the characters of the n consecutive lines after the end line form a preset character string identifying the end position of the watermark, where the start line is the first line of the run, the end line is the last line of the run, and m and n are integers greater than zero.
21. A watermark detection apparatus, comprising:
an acquisition module, configured to acquire a copyright file, wherein the copyright file is a text file whose lines have no dependencies on one another;
a determining module, configured to determine, by using a hash algorithm, the character corresponding to each line of text content in the copyright file, and to determine the watermark hidden in the copyright file according to those characters and a preset watermark hiding rule;
an extraction module, configured to extract the watermark embedded in the copyright file, wherein the watermark extracted from any line of the copyright file is a sub-watermark, the sub-watermark being obtained by dividing the hidden watermark and being determined according to the line's text content, the hash algorithm, and the number of sub-watermarks; and
an analysis module, configured to, if the extracted watermark is determined to be the same as the hidden watermark, parse either watermark to obtain the copyright information of the copyright owner and the timestamp information of when the copyright owner began to own the copyright file.
22. The apparatus of claim 21, wherein the determining module is specifically configured to:
for each line in the copyright file, determine the hash value of the line's text content by using a hash algorithm; and
perform an XOR operation on the hash value and a preset value, and take the result of the XOR operation as the character corresponding to the line's text content.
23. The apparatus of claim 21, wherein the determining module is specifically configured to do the following, according to the characters corresponding to the text content of each line and a preset watermark hiding rule:
when the characters of m consecutive lines in the copyright file form the character string identifying the start position of the watermark, and the characters of n subsequent consecutive lines form the character string identifying the end position of the watermark, determine the character string formed by the characters of the lines between those m lines and those n lines as the watermark hidden in the copyright file, where m and n are integers greater than zero.
24. The apparatus of claim 21, wherein the extraction module is specifically configured to:
for each line of the copyright file in which a sub-watermark is embedded, determine the sub-watermark embedded in the line according to a preset watermark embedding position and the watermark length embedded in each line, and determine the number of that sub-watermark by using the line's text content, the hash algorithm, and the total number of sub-watermarks; and
determine the watermark embedded in the copyright file according to the sub-watermarks embedded in the lines and their numbers.
25. The apparatus of claim 24, wherein the characters contained in each sub-watermark embedded in the copyright file are invisible characters, and the extraction module is specifically configured to:
extract from the line the invisible character string corresponding to the sub-watermark, according to a preset watermark embedding position and the watermark length embedded in each line; and
convert each invisible character in the string into a visible character according to a preset conversion rule between visible and invisible characters, and take the resulting visible character string as the sub-watermark embedded in the line.
26. The apparatus of claim 24, wherein, for each line of the copyright file in which a sub-watermark is embedded, the extraction module is specifically configured to determine the number k of the sub-watermark embedded in the line according to the following formula:
$k = C_{HASH} \bmod K$;
where $C_{HASH}$ is the hash value of the first S characters of the text in the line, S is an integer greater than zero, and K is the number of sub-watermarks obtained by dividing the watermark.
27. The apparatus of claim 24, wherein the extraction module is specifically configured to:
for each number, count how often each sub-watermark carrying that number occurs, and take the most frequent one as the sub-watermark corresponding to that number; and
concatenate the sub-watermarks in ascending order of number to obtain the watermark embedded in the copyright file.
28. The apparatus of claim 21, wherein the analysis module is specifically configured to:
decrypt and then decode the watermark to obtain the combined copyright information and timestamp information; and
split the combined copyright information and timestamp information to obtain the copyright information and the timestamp information.
29. A computer, comprising at least one processing unit and at least one memory unit, wherein the memory unit stores program code which, when executed by the processing unit, causes the computer to perform the steps of the method of any of claims 1 to 6 and/or 7 to 14.
30. A computer-readable storage medium comprising program code means for causing a computer to carry out the steps of the method as claimed in any one of claims 1 to 6 and/or 7 to 14 when said program code means is run on a computer.
CN201810432660.5A 2018-05-08 2018-05-08 Watermark embedding and detecting method and device Active CN110457873B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810432660.5A CN110457873B (en) 2018-05-08 2018-05-08 Watermark embedding and detecting method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810432660.5A CN110457873B (en) 2018-05-08 2018-05-08 Watermark embedding and detecting method and device

Publications (2)

Publication Number Publication Date
CN110457873A CN110457873A (en) 2019-11-15
CN110457873B CN110457873B (en) 2021-04-27

Family

ID=68480476

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810432660.5A Active CN110457873B (en) 2018-05-08 2018-05-08 Watermark embedding and detecting method and device

Country Status (1)

Country Link
CN (1) CN110457873B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111145069B (en) * 2019-12-03 2021-04-27 支付宝(杭州)信息技术有限公司 Image watermarking processing method and device based on block chain
CN112948895A (en) * 2019-12-10 2021-06-11 航天信息股份有限公司 Data watermark embedding method, watermark tracing method and device
US11669601B2 (en) 2020-09-18 2023-06-06 Huawei Cloud Computing Technologies Co., Ltd. Digital watermarking for textual data
CN112884631A (en) * 2021-02-24 2021-06-01 江苏保旺达软件技术有限公司 Watermark processing method, device, equipment and storage medium
CN113177193A (en) * 2021-04-23 2021-07-27 深圳依时货拉拉科技有限公司 Watermark adding method, watermark verifying method and terminal equipment
CN113255008B (en) * 2021-07-01 2021-10-22 支付宝(杭州)信息技术有限公司 Method and system for outputting multimedia file
CN117272333A (en) * 2022-10-28 2023-12-22 北京鸿鹄元数科技有限公司 Relational database watermark embedding and tracing method
CN116362953B (en) * 2023-05-30 2023-08-01 南京师范大学 High-precision map watermarking method based on invisible characters

Family Cites Families (2)

Publication number Priority date Publication date Assignee Title
CN101751656B (en) * 2008-12-22 2012-03-28 北京大学 Watermark embedding and extraction method and device
US20170329942A1 (en) * 2016-05-12 2017-11-16 Markany Inc. Method and apparatus of drm systems for protecting enterprise confidentiality

Also Published As

Publication number Publication date
CN110457873A (en) 2019-11-15

Similar Documents

Publication Publication Date Title
CN110457873B (en) Watermark embedding and detecting method and device
JP3749884B2 (en) Digital watermark embedding device, digital watermark analysis device, digital watermark embedding method, digital watermark analysis method, and program
US10482222B2 (en) Methods, apparatus, and articles of manufacture to encode auxiliary data into text data and methods, apparatus, and articles of manufacture to obtain encoded data from text data
US20190158296A1 (en) Redactable document signatures
CN1897522B (en) Water mark embedded and/or inspecting method, device and system
CN106203128B (en) Webpage data encryption and decryption method, device and system
US20120317421A1 (en) Fingerprinting Executable Code
CN112600665B (en) Hidden communication method, device and system based on block chain and encryption technology
KR20130007543A (en) Steganographic messaging system using code invariants
CN105303075B (en) Adaptive Text Watermarking method based on PDF format
CN106709853B (en) Image retrieval method and system
CN103778590A (en) Method and device for utilizing digital image to store and transmit information
US8307450B2 (en) Method and system for hiding information in the instruction processing pipeline
Melkundi et al. A robust technique for relational database watermarking and verification
CN111010490A (en) Watermark adding method, watermark adding device, electronic equipment and computer readable storage medium
JPWO2011077819A1 (en) Verification device, secret information restoration device, verification method, program, and secret sharing system
JP4025283B2 (en) Code embedding method, identification information restoring method and apparatus
CN112434319A (en) Data encryption method and device for electronic file
Khanduja et al. Identification and Proof of Ownership by Watermarking Relational Databases
CN112532379A (en) File protection method and device
KR102154897B1 (en) Method for supervising digital contents using block chain and fingerprinting, device and computer readable medium for performing the method
CN115935299A (en) Authorization control method, device, computer equipment and storage medium
Ahmad et al. Fingerprinting non-numeric datasets using row association and pattern generation
CN109922228B (en) Ciphertext preservation method under carrier damage
Kozachok et al. Estimation of Watermark Embedding Capacity with Line Space Shifting

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant