CN112613045B - Method and system for embedding data watermark of target data - Google Patents

Publication number
CN112613045B
Authority
CN
China
Prior art keywords
data
watermark
similarity
embedding
embedded
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011375206.4A
Other languages
Chinese (zh)
Other versions
CN112613045A (en)
Inventor
于鹏飞
石聪聪
陈磊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
State Grid Smart Grid Research Institute Co ltd
Original Assignee
State Grid Smart Grid Research Institute Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by State Grid Smart Grid Research Institute Co ltd filed Critical State Grid Smart Grid Research Institute Co ltd
Priority to CN202011375206.4A priority Critical patent/CN112613045B/en
Publication of CN112613045A publication Critical patent/CN112613045A/en
Application granted granted Critical
Publication of CN112613045B publication Critical patent/CN112613045B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60Protecting data

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Bioethics (AREA)
  • Editing Of Facsimile Originals (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a data watermark embedding method and system for target data. The method comprises: S1, dividing the target data in which the data watermark is to be embedded into a plurality of content blocks, and embedding the data watermark in each content block; S2, evaluating the data item similarity of the data items embedded with the data watermark by using a preset data similarity evaluation model; S3, evaluating the data watermark similarity of each content block based on the data item similarity of all data items constituting the content block, executing S4 when the data watermark similarity of each content block meets a first threshold range, and otherwise adjusting the embedding proportion and/or position of the data watermark in the content block and executing S2; S4, calculating the overall similarity of the target data based on the data watermark similarity of all content blocks forming the target data, and obtaining the target data embedded with the data watermark by adjusting the embedding proportion and/or position of the data watermark. High concealment and high simulation after the data watermark is embedded are thereby achieved.

Description

Method and system for embedding data watermark of target data
Technical Field
The invention relates to the field of data watermarking, in particular to a data watermarking embedding method and system for target data.
Background
With the continuous development of the digital economy, information exchange among different departments, regions and data subjects is steadily increasing, and data circulates, is recombined and is used ever more frequently across these links in the form of structured data. Because the data is used in a dynamic environment, the risk of a data leakage event is considerable. Once a leak occurs, the responsible link needs to be located accurately so that the security responsibility of the relevant personnel can be traced and the security controls of weak links can be strengthened in a targeted manner.
Data watermarking is one of the effective technical means for tracing responsibility after a data leak. A data watermark adds extra redundant identification information to the data content itself. The identification information closely imitates real data content and is associated with, and recorded against, the relevant responsibility link, so that once data is leaked the source can be located from the watermark information added in advance. High simulation and high concealment are the key indicators of an effective data watermark, since they prevent the watermark from being found and destroyed by malicious users. Achieving them requires that the similarity of the target data before and after the data watermark is added reach a threshold at which the change is not easily noticed by a user. How to embed the data watermark in the target data so as to achieve high concealment and high simulation is therefore a problem to be solved.
Disclosure of Invention
In order to solve the above-mentioned shortcomings existing in the prior art, the present invention provides a data watermark embedding method of target data, including:
S1, dividing target data to be embedded with a data watermark into a plurality of content blocks, and embedding the data watermark in a data entry of each content block;
S2, carrying out data item similarity evaluation on the data items embedded with the data watermarks by adopting a preset data similarity evaluation model;
S3, evaluating the data watermark similarity of the content block based on the data item similarity of all the data items forming the content block, when the data watermark similarity of each content block meets a first threshold range, executing S4, otherwise, adjusting the embedding proportion and/or position of the data watermark in the content block which does not meet the first threshold range, and executing S2;
S4, calculating the similarity of the whole target data based on the data watermark similarity of all content blocks forming the target data, finishing embedding of the data watermark when the similarity of the whole target data meets a second threshold range, otherwise, adjusting the embedding proportion and/or the position of the data watermark in one or more content blocks, and executing S2.
Preferably, adjusting the embedding proportion and/or position of the data watermark includes:
when the data item contains a single type field, adjusting the embedding proportion of the data watermark;
when the data entry contains multiple types of fields, the embedding proportion and/or the position of the data watermark are/is adjusted.
Preferably, the adjusting the embedding ratio of the data watermark includes:
when the similarity of the watermarks of the content block data is greater than the maximum value in the first threshold range, reducing the proportion of embedding the data watermarks in the data items of the content block data to a preset proportion;
when the similarity of the watermarks of the content block data is smaller than the minimum value in the first threshold range, increasing the proportion of embedding the data watermarks in the data items of the content block data to a preset proportion;
when the overall similarity of the target data is greater than the maximum value in the second threshold range, reducing the proportion of embedding the data watermark in one or more content blocks to a preset proportion;
and when the overall similarity of the target data is smaller than the minimum value in the second threshold range, increasing the proportion of embedding the data watermark in one or more content blocks to a preset proportion.
Preferably, the adjusting the embedding position of the data watermark includes:
removing original data watermarks in the data items, and embedding data watermarks matched with field types into various fields in the data items according to preset proportions;
and evaluating the similarity of the data items after the data watermarks matched with the field types are embedded, selecting the position of the field with the maximum similarity of the data items as the optimal position for embedding the data watermarks, and embedding the data watermarks at the optimal position.
Preferably, the field types in the data entry include any one or more of the following:
a numeric field, a text field, and a natural language field.
Preferably, embedding a data watermark in the data entry includes:
embedding a numeric data watermark in a numeric field when the numeric field is included in the data entry;
embedding a character text type data watermark in a text field when the text field is included in the data item;
when the data entry comprises a natural language field, embedding a natural language type data watermark in the natural language field.
Preferably, the data item similarity evaluation for the data item embedded with the data watermark by adopting a preset data similarity evaluation model includes:
when a numeric data watermark is embedded in a numeric field of the data item, deconstructing and segmenting the numeric values before and after the data watermark is embedded, and evaluating the data item similarity through a Euclidean distance vector data similarity evaluation model;
when a character text type data watermark is embedded in a text field of the data item, deconstructing the ASCII code values before and after the data watermark is embedded, and evaluating the data item similarity through a cosine vector data similarity evaluation model;
and when a natural language type data watermark is embedded in a natural language field of the data item, deconstructing and segmenting into words the natural language field before and after the data watermark is embedded by using a space vector model, and evaluating the data item similarity of the segmentation results through a cosine vector data similarity evaluation model.
Preferably, the content block data watermark similarity assessment is performed as follows:
Figure BDA0002807028260000031
wherein: δ represents the data watermark similarity of the content block; n represents the total number of data entries in the content block; C_i represents the data entry similarity of the i-th data entry.
Preferably, the similarity of the whole target data is evaluated as follows:
Figure BDA0002807028260000032
wherein: θ represents the similarity of the whole target data; m represents the total number of content blocks in the target data; δ_i represents the data watermark similarity of the i-th content block.
Based on the same inventive concept, the invention also provides a data watermark embedding system of target data, comprising:
the embedding module is used for dividing target data to be embedded with the data watermark into a plurality of content blocks, and embedding the data watermark into each content block;
the data item similarity evaluation module is used for evaluating the similarity of the data items embedded with the data watermark by adopting a preset data similarity evaluation model;
the content block similarity evaluation module is used for evaluating the data watermark similarity of the content block based on the data item similarity of all the data items forming the content block, executing the overall similarity evaluation module when the data watermark similarity of each content block meets a first threshold range, otherwise, adjusting the embedding proportion and/or the position of the data watermark in the content block which does not meet the first threshold range, and executing the data item similarity evaluation module;
and the overall similarity evaluation module is used for calculating the overall similarity of the target data based on the data watermark similarity of all content blocks forming the target data, finishing the embedding of the data watermark when the overall similarity of the target data meets a second threshold range, otherwise, adjusting the embedding proportion and/or the position of the data watermark in one or more content blocks, and executing the data item similarity evaluation module.
Preferably, the data item similarity evaluation module is specifically configured to:
when a numeric data watermark is embedded in a numeric field of the data item, deconstructing and segmenting the numeric values before and after the data watermark is embedded, and evaluating the data item similarity through a Euclidean distance vector data similarity evaluation model;
when a character text type data watermark is embedded in a text field of the data item, deconstructing the ASCII code values before and after the data watermark is embedded, and evaluating the data item similarity through a cosine vector data similarity evaluation model;
and when a natural language type data watermark is embedded in a natural language field of the data item, deconstructing and segmenting into words the natural language field before and after the data watermark is embedded by using a space vector model, and evaluating the data item similarity of the segmentation results through a cosine vector data similarity evaluation model.
Compared with the prior art, the invention has the beneficial effects that:
According to the technical scheme provided by the invention: S1, target data to be embedded with the data watermark is divided into a plurality of content blocks, and the data watermark is embedded in a data item of each content block; S2, data item similarity evaluation is carried out on the data items embedded with the data watermark by adopting a preset data similarity evaluation model; S3, the data watermark similarity of each content block is evaluated based on the data item similarity of all the data items forming the content block, S4 is executed when the data watermark similarity of each content block meets a first threshold range, and otherwise the embedding proportion and/or position of the data watermark in the content block which does not meet the first threshold range is adjusted and S2 is executed; S4, the similarity of the whole target data is calculated based on the data watermark similarity of all content blocks forming the target data, the embedding of the data watermark is finished when the similarity of the whole target data meets a second threshold range, and otherwise the embedding proportion and/or position of the data watermark in one or more content blocks is adjusted and S2 is executed. The data watermarks embedded in the content blocks are dynamically adjusted according to the similarity evaluation results at the data item, content block and whole-data levels, so that high concealment and high simulation after the data watermark is embedded are finally achieved.
Drawings
FIG. 1 is a flow chart of a method for embedding a data watermark into target data;
fig. 2 is a schematic diagram of a data watermark embedding system of target data according to an embodiment of the present invention.
Detailed Description
For a better understanding of the present invention, reference is made to the following description, drawings and examples.
Example 1: as shown in fig. 1, in order to meet the urgent needs in the prior art, the present invention provides a data watermark embedding method of target data, including:
S1, dividing target data to be embedded with a data watermark into a plurality of content blocks, and embedding the data watermark in a data entry of each content block;
S2, carrying out data item similarity evaluation on the data items embedded with the data watermarks by adopting a preset data similarity evaluation model;
S3, evaluating the data watermark similarity of the content block based on the data item similarity of all the data items forming the content block, when the data watermark similarity of each content block meets a first threshold range, executing S4, otherwise, adjusting the embedding proportion and/or position of the data watermark in the content block which does not meet the first threshold range, and executing S2;
S4, calculating the similarity of the whole target data based on the data watermark similarity of all content blocks forming the target data, finishing embedding of the data watermark when the similarity of the whole target data meets a second threshold range, otherwise, adjusting the embedding proportion and/or the position of the data watermark in one or more content blocks, and executing S2.
Wherein, adjust embedding proportion and/or position of data watermark, include:
when the data item contains a single type field, adjusting the embedding proportion of the data watermark;
when the data entry contains multiple types of fields, the embedding proportion and/or the position of the data watermark are/is adjusted.
According to the data watermark similarity evaluation results of the data items, the content blocks and the whole data, the data watermark of the embedded target data is dynamically adjusted, so that the similarity of the content blocks after the data watermark is embedded and the similarity of the whole target data respectively meet the set threshold range, and finally high concealment and high simulation after the data watermark is embedded are realized.
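The S1 to S4 feedback loop described above can be sketched in Python as follows. This is only a minimal sketch under stated assumptions: the names `embed_until_similar`, `evaluate_block` and `adjust_ratio` are hypothetical, the adjustment policy is supplied by the caller, the overall similarity is taken as the mean of the block similarities (the formula in the source is an image that is not reproduced here), and the `max_rounds` cap is an addition so that the loop always terminates.

```python
from typing import Callable, List, Tuple

Range = Tuple[float, float]  # (minimum, maximum) of a threshold range


def in_range(value: float, bounds: Range) -> bool:
    low, high = bounds
    return low <= value <= high


def embed_until_similar(
    n_blocks: int,
    evaluate_block: Callable[[int, float], float],         # S2 + S3: (block index, ratio) -> delta
    adjust_ratio: Callable[[float, float, Range], float],  # (ratio, similarity, range) -> new ratio
    first_range: Range,
    second_range: Range,
    max_rounds: int = 20,                                   # assumption: cap on iterations
) -> Tuple[List[float], float]:
    ratios = [1.0] * n_blocks  # S1: start with a 100% embedding proportion
    for _ in range(max_rounds):
        deltas = [evaluate_block(i, r) for i, r in enumerate(ratios)]
        failing = [i for i, d in enumerate(deltas) if not in_range(d, first_range)]
        if failing:
            # S3 not met: adjust only the failing blocks, then go back to S2
            for i in failing:
                ratios[i] = adjust_ratio(ratios[i], deltas[i], first_range)
            continue
        theta = sum(deltas) / n_blocks  # S4: mean aggregation is an assumption
        if in_range(theta, second_range):
            return ratios, theta
        # S4 not met: the text allows adjusting "one or more" blocks; here all are adjusted
        ratios = [adjust_ratio(r, theta, second_range) for r in ratios]
    raise RuntimeError("no embedding satisfied both threshold ranges")
```

Passing the adjustment policy in as a function keeps the loop independent of whether the proportion or the position is being changed.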
In this embodiment, S1 divides target data to be embedded with a data watermark into a plurality of content blocks, and embeds the data watermark in a data entry of each content block, including:
For each data item composing the content block, a data watermark of the corresponding type is selected according to the field type in the data item and embedded. To maximize the information capacity of the embedded data watermark, the proportion of data watermark embedding in the target data is 100%.
The method specifically comprises the following steps:
embedding a numeric data watermark in a numeric field when the numeric field is included in the data entry;
embedding a character text type data watermark in a text field when the text field is included in the data item;
when the data entry comprises a natural language field, embedding a natural language type data watermark in the natural language field.
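Selecting the watermark type by field type could look like the sketch below. The classification rules are purely illustrative assumptions (the source does not say how a field's type is determined), and `field_type` is a hypothetical helper name.

```python
def field_type(value: str) -> str:
    """Classify a field value as numeric, text or natural language.

    Assumed rules, for illustration only:
    - ASCII digits only         -> "numeric"
    - whitespace or non-ASCII   -> "natural_language"
    - anything else             -> "text"
    """
    if value.isascii() and value.isdigit():
        return "numeric"
    if not value.isascii() or any(ch.isspace() for ch in value):
        return "natural_language"
    return "text"
```

The returned label would then select which kind of data watermark (numeric, character text or natural language) is embedded in that field.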
S2, evaluating the data item similarity of the data items embedded with the data watermark by using a preset data similarity evaluation model, that is, selecting an appropriate data similarity evaluation model for each data watermark embedding algorithm to evaluate the similarity of the watermarked data items, includes the following:
In this embodiment, similarity requires that, for a given type of data, the data type characteristics remain unchanged after the data watermark is embedded; if the data type characteristics change, the similarity evaluation result of the watermarked data item is 0.
For example, mobile phone number data has 11 digits, in which the first 3 digits represent the network identification number, digits 4 to 7 represent the region code, and digits 8 to 11 represent the user number; after the data watermark is embedded, the value must still conform to the data type characteristics of a mobile phone number.
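The rule that a change of data type characteristics forces the similarity to 0 can be illustrated for the mobile phone number example. This sketch checks only the 11-digit shape; validating the network identification number or region code segments would need lookup tables that the source does not provide. The function names are hypothetical.

```python
import re

# Assumption: "data type characteristic" is reduced here to "exactly 11 ASCII digits".
MOBILE_PATTERN = re.compile(r"[0-9]{11}")


def keeps_mobile_format(value: str) -> bool:
    return MOBILE_PATTERN.fullmatch(value) is not None


def gated_similarity(original: str, watermarked: str, similarity: float) -> float:
    """Return 0 when watermarking broke the mobile-number format, else the given similarity."""
    if keeps_mobile_format(original) and not keeps_mobile_format(watermarked):
        return 0.0
    return similarity
```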
(1) After a numeric data watermark is embedded, the numeric values before and after embedding are deconstructed and segmented, and similarity is evaluated through the Euclidean distance vector data similarity evaluation model; the evaluation result is denoted D.
For example, let the values before and after the data watermark is embedded be P and P'. Deconstruction and segmentation treat each digit as an independent unit, that is, P = {N1, N2, ..., N11} and P' = {N'1, N'2, ..., N'11}. These are then substituted into the Euclidean data similarity evaluation model to calculate the similarity:
Figure BDA0002807028260000061
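A digit-wise Euclidean comparison of this kind can be sketched as follows. The distance itself is the standard Euclidean distance; the mapping from distance to a similarity score, 1 / (1 + d), is an assumption chosen for illustration, because the formula in the source is an image that is not reproduced here.

```python
import math
from typing import List, Sequence


def digits(value: str) -> List[int]:
    """Deconstruct a numeric string into one integer per digit."""
    return [int(ch) for ch in value]


def euclidean_distance(p: Sequence[int], q: Sequence[int]) -> float:
    if len(p) != len(q):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def euclidean_similarity(original: str, watermarked: str) -> float:
    d = euclidean_distance(digits(original), digits(watermarked))
    return 1.0 / (1.0 + d)  # assumed distance-to-similarity mapping
```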
(2) After a character text type data watermark is embedded, the ASCII code values before and after embedding are deconstructed, and similarity is evaluated through the cosine vector data similarity evaluation model; the evaluation result is denoted C.
For example, let the values of WeChat account type data before and after the data watermark is embedded be P and P'. Deconstruction into ASCII code values treats each character as an independent unit, that is, P = {N1, N2, ..., Nn} and P' = {N'1, N'2, ..., N'n}. These are then substituted into the cosine data similarity evaluation model to calculate the similarity:
Figure BDA0002807028260000071
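As an illustration, the standard cosine similarity over per-character code values can be computed as below. The function names are hypothetical, and the sketch assumes that the strings before and after embedding have the same length n, as in the example.

```python
import math
from typing import List, Sequence


def ascii_vector(text: str) -> List[int]:
    """Deconstruct a string into one code value per character."""
    return [ord(ch) for ch in text]


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0  # zero vectors are treated as dissimilar
```

Because ASCII code values are all positive and of similar magnitude, this measure tends to stay close to 1 for small edits, which may matter when choosing the threshold ranges.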
(3) After a natural language type data watermark is embedded, the data before and after embedding are deconstructed and segmented into words using a space vector model, and the data similarity of the segmentation results is evaluated through the cosine vector data similarity evaluation model.
Natural language data in the power business has distinct professional characteristics: address data such as maintenance addresses and expansion addresses; power technical terminology such as operation terms and electrical quantity terms; resident names; and so on. These can form a feature lexicon for natural language data of the power business.
The natural language data of the power business before and after the data watermark is added are segmented into words, the resulting vector representations are O = {O1, O2, ..., On} and O' = {O'1, O'2, ..., O'n}, and these are substituted into the cosine data similarity evaluation model to calculate the similarity:
Figure BDA0002807028260000072
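One common way to realise a space vector model is a term-frequency vector over the combined vocabulary, compared with cosine similarity. The sketch below assumes the word segmentation has already been done (the source relies on a power-business feature lexicon, which is not available here) and takes token lists as input; `term_vector_similarity` is a hypothetical name.

```python
import math
from collections import Counter
from typing import Sequence


def term_vector_similarity(before: Sequence[str], after: Sequence[str]) -> float:
    """Cosine similarity of term-frequency vectors built from two token lists."""
    a, b = Counter(before), Counter(after)
    vocab = sorted(set(a) | set(b))   # shared vector space
    u = [a[t] for t in vocab]         # Counter returns 0 for missing terms
    v = [b[t] for t in vocab]
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0
```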
S3, evaluating the data watermark similarity of each content block based on the data item similarity of all the data items forming the content block, executing S4 when the data watermark similarity of every content block meets the first threshold range, and otherwise adjusting the embedding proportion and/or position of the data watermark in the content blocks that do not meet the first threshold range and executing S2, includes the following:
A secondary similarity, namely the content block data watermark similarity, is calculated from the data item similarity of all the data items constituting the content block. The size of a content block is set by the user according to the specific service scenario; for example, for convenience of reference, the size may be set to 20, 50 or 100 rows.
Taking a content block of N rows as an example, the similarity of each watermarked data entry is evaluated by the method provided in S2 and the result is denoted C; the secondary similarity of the content block after the data watermark has been embedded in all of its entries is
Figure BDA0002807028260000081
Wherein: δ represents the data watermark similarity of the content block; n represents the total number of data entries in the content block; C_i represents the data entry similarity of the i-th data entry.
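The secondary similarity and the first-threshold check can be sketched as below. The aggregation is assumed to be the arithmetic mean of the C_i values; the source's formula is an image that is not reproduced here, so this choice is an assumption, as are the function names.

```python
from typing import Dict, List, Sequence


def block_similarity(entry_similarities: Sequence[float]) -> float:
    """delta for one content block; the arithmetic mean is an assumed aggregation."""
    if not entry_similarities:
        raise ValueError("a content block needs at least one data entry")
    return sum(entry_similarities) / len(entry_similarities)


def blocks_outside_range(
    blocks: Dict[str, Sequence[float]], low: float, high: float
) -> List[str]:
    """Names of the content blocks whose delta falls outside [low, high]."""
    return [
        name
        for name, sims in blocks.items()
        if not (low <= block_similarity(sims) <= high)
    ]
```

Only the blocks returned by `blocks_outside_range` would be adjusted before S2 is executed again.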
Whether the data watermark similarity of each content block meets the first threshold range is then judged. When every content block meets the first threshold range, S4 is executed; otherwise, the embedding proportion and/or position of the data watermark in the content blocks that do not meet the first threshold range is adjusted and S2 is executed.
This embodiment describes in detail the methods used when the similarity of a content block does not meet the threshold range:
Method I, dynamically adjusting the proportion of data watermark embedding, includes:
when the similarity of the watermarks of the content block data is greater than the maximum value in the first threshold range, reducing the proportion of embedding the data watermarks in the data items of the content block data to a preset proportion;
when the similarity of the watermarks of the content block data is smaller than the minimum value in the first threshold range, increasing the proportion of embedding the data watermarks in the data items of the content block data to a preset proportion;
Specifically: when the secondary similarity of a data content block with data watermarks fully embedded exceeds the maximum value of the first threshold range, the secondary similarity before and after embedding can be ensured by reducing the embedding proportion of the data watermark; for example, the embedding proportion may be set to 50%, 30% or 20%.
When the secondary similarity of a data content block after the data watermark is embedded is smaller than the minimum value of the first threshold range, the embedding capacity of the data watermark can be increased as far as possible by increasing the embedding proportion; for example, the embedding proportion may be set to 20%, 30% or 50%.
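Method I can be sketched as a step through a list of preset proportions. The direction of each step follows the text as written (reduce when the similarity is above the maximum, increase when it is below the minimum). The preset list and the name `next_ratio` are illustrative assumptions.

```python
from typing import Sequence

# Example presets taken from the proportions mentioned in the text, plus the 100% starting point.
PRESET_RATIOS = (0.2, 0.3, 0.5, 1.0)


def next_ratio(
    current: float,
    similarity: float,
    low: float,
    high: float,
    presets: Sequence[float] = PRESET_RATIOS,
) -> float:
    """Pick the neighbouring preset proportion, or keep the current one."""
    if similarity > high:   # above the maximum of the range: reduce the proportion
        lower = [p for p in presets if p < current]
        return max(lower) if lower else current
    if similarity < low:    # below the minimum of the range: increase the proportion
        higher = [p for p in presets if p > current]
        return min(higher) if higher else current
    return current          # already inside the threshold range
```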
When the data items forming the content block contain a plurality of field types, Method II, dynamically adjusting the position of the data watermark, may be adopted, and includes:
removing original data watermarks in the data items, and embedding data watermarks matched with field types into various fields in the data items according to preset proportions;
and evaluating the similarity of the data items after the data watermarks matched with the field types are embedded, selecting the position of the field with the maximum similarity of the data items as the optimal position for embedding the data watermarks, and embedding the data watermarks at the optimal position.
In this embodiment, adjusting the position of the data watermark specifically includes: when a data item in a data content block comprises numeric, text and natural language fields, the data watermark is added to the numeric field, the text field or the natural language field at a fixed embedding proportion; the data item similarity after embedding the numeric, text or natural language data watermark is calculated by the method provided in S2; the position with the maximum data item similarity is selected as the optimal watermark position, and the data watermark originally added to the data item is removed; the secondary similarity is then calculated from the data item similarity, and the embedding capacity of the data watermark is increased as far as possible on the premise that the secondary similarity after embedding meets the threshold.
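Method II amounts to a trial embedding in each candidate field followed by an arg-max over the resulting data item similarities. The following sketch assumes the caller supplies one hypothetical embedding function per field and a similarity function; none of these names come from the source.

```python
from typing import Callable, Dict, Tuple


def choose_watermark_field(
    entry: Dict[str, str],
    embedders: Dict[str, Callable[[str], str]],   # field name -> trial embedding function
    similarity: Callable[[str, str], float],      # (original, watermarked) -> similarity
) -> Tuple[str, float]:
    """Trial-embed in every candidate field and return the one with the highest similarity."""
    scores: Dict[str, float] = {}
    for field, embed in embedders.items():
        if field not in entry:
            continue
        candidate = embed(entry[field])           # the original entry is left untouched
        scores[field] = similarity(entry[field], candidate)
    if not scores:
        raise ValueError("no candidate field found in the data entry")
    best = max(scores, key=scores.get)
    return best, scores[best]
```

Because each trial works on a copy of the field value, the earlier watermark can be removed and the final watermark embedded only at the selected position.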
S4, calculating the similarity of the whole target data based on the data watermark similarity of all content blocks forming the target data, completing the embedding of the data watermark when the similarity of the whole target data meets a second threshold range, otherwise, adjusting the embedding proportion and/or the position of the data watermark in one or more content blocks, and executing S2, wherein the method comprises the following steps:
Once the secondary similarity after embedding the data watermark meets the threshold range, with the embedding capacity of the data watermark increased as far as possible, the similarity of the whole target data after embedding, namely the three-level similarity, is calculated from the secondary similarity of all the content blocks. When the three-level similarity meets the second threshold range, the embedding of the data watermark is completed; otherwise, the embedding proportion and/or position of the data watermark in one or more content blocks is adjusted and S2 is executed.
When the three-level similarity does not meet the second threshold range, the dynamic adjustment can be performed by the following way of adjusting the proportion:
when the overall similarity of the target data is greater than the maximum value in the second threshold range, reducing the proportion of embedding the data watermark in one or more content blocks to a preset proportion;
and when the overall similarity of the target data is smaller than the minimum value in the second threshold range, increasing the proportion of embedding the data watermark in one or more content blocks to a preset proportion.
I.e. when the overall similarity of the target data does not meet the second threshold range, the proportion of embedded data watermarks in one or more content blocks needs to be adjusted to a preset proportion.
When the three-level similarity does not meet the second threshold range and the data items in the content block to be adjusted contain fields of multiple types, the position of the data watermark embedded in the data items may be adjusted so that the three-level similarity meets the second threshold range, thereby completing the embedding of the data watermark.
In this embodiment, taking target data divided into M content blocks as an example, the data watermark is embedded in each content block, the similarity of the watermarked data entries is evaluated, and the similarity of each content block is evaluated from the entry similarities, with the result denoted δ; the three-level similarity of the target data after all data watermarks are embedded is
Figure BDA0002807028260000091
If θ exceeds the maximum value of the set second threshold range, the embedding proportion and/or position of the data watermark is adjusted to increase the δ values and thereby the θ value, which ultimately increases the similarity of the whole data after the data watermark is embedded.
If θ is far from the minimum value of the set second threshold range, the embedding proportion and/or position of the data watermark is adjusted and the embedding proportion is raised, so that the embedding capacity of the data watermark is increased as far as possible while the three-level similarity after embedding is still ensured.
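The three-level check can be sketched in the same way as the block-level one. Here θ is assumed to be the arithmetic mean of the δ values (the source's formula is an image that is not reproduced here), and the returned labels follow the proportion rule stated for the second threshold range. Both function names are hypothetical.

```python
from typing import Sequence


def overall_similarity(block_similarities: Sequence[float]) -> float:
    """theta for the whole target data; the arithmetic mean is an assumed aggregation."""
    if not block_similarities:
        raise ValueError("the target data needs at least one content block")
    return sum(block_similarities) / len(block_similarities)


def adjustment_direction(theta: float, low: float, high: float) -> str:
    """Map theta to the action stated for the second threshold range."""
    if theta > high:
        return "reduce"     # above the maximum: reduce the embedding proportion
    if theta < low:
        return "increase"   # below the minimum: increase the embedding proportion
    return "done"           # within range: embedding is complete
```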
To achieve high concealment and high simulation after the data watermark is embedded into the target data, this embodiment of the invention selects an appropriate watermarking proportion and distribution strategy according to the similarity evaluation results of the different data watermark algorithms.
Example 2: Based on the same inventive concept, the invention also provides a data watermark embedding system for target data which, as shown in fig. 2, comprises:
an embedding module, configured to divide the target data in which the data watermark is to be embedded into a plurality of content blocks, and to embed the data watermark in each content block;
a data entry similarity evaluation module, configured to evaluate the data entry similarity of the watermarked data entries using a preset data similarity evaluation model;
a content block similarity evaluation module, configured to evaluate the data watermark similarity of each content block based on the data entry similarity of all the data entries forming the content block, to invoke the overall similarity evaluation module when the data watermark similarity of every content block meets a first threshold range, and otherwise to adjust the embedding proportion and/or position of the data watermark in the content blocks that do not meet the first threshold range and invoke the data entry similarity evaluation module again;
and an overall similarity evaluation module, configured to calculate the overall similarity of the target data based on the data watermark similarity of all the content blocks forming the target data, to complete the embedding of the data watermark when the overall similarity of the target data meets a second threshold range, and otherwise to adjust the embedding proportion and/or position of the data watermark in one or more content blocks and invoke the data entry similarity evaluation module again.
The system uses a data similarity evaluation model to evaluate similarity at three levels: the watermarked data entries, the content blocks in which the data watermark is embedded, and the watermarked data as a whole. Based on the evaluation results, it dynamically adjusts the proportion and distribution positions of the embedded watermarks so as to meet the similarity threshold that the user sets for the circulating data, thereby ensuring the concealment and high simulation of the data watermark embedding overall.
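The interaction of the four modules can be pictured as a feedback loop. The following is only a minimal sketch under stated assumptions, not the patented implementation: the names `embed_target_data`, `embed`, `entry_similarity` and `adjust` are hypothetical, both aggregation steps are assumed to be arithmetic means, and the toy adjustment policy (whose direction is chosen only so that the toy model converges) is not taken from the text.

```python
def mean(values):
    return sum(values) / len(values)


def in_range(value, bounds):
    low, high = bounds
    return low <= value <= high


def embed_target_data(blocks, embed, entry_similarity, adjust,
                      block_range, overall_range, ratio=0.5, max_rounds=50):
    """Embed, evaluate at three levels, and adjust until both ranges are met.

    embed(block, ratio), entry_similarity(before, after) and
    adjust(ratio, similarity, bounds) are caller-supplied, hypothetical hooks.
    """
    ratios = [ratio] * len(blocks)
    for _ in range(max_rounds):
        marked = [embed(b, r) for b, r in zip(blocks, ratios)]
        # Level 1 and 2: entry similarities, averaged per content block (assumed mean).
        deltas = [mean([entry_similarity(x, y) for x, y in zip(b, m)])
                  for b, m in zip(blocks, marked)]
        failing = [i for i, d in enumerate(deltas) if not in_range(d, block_range)]
        if failing:
            # Adjust only the blocks outside the first threshold range, then re-evaluate.
            for i in failing:
                ratios[i] = adjust(ratios[i], deltas[i], block_range)
            continue
        # Level 3: overall similarity of the target data (assumed mean of the deltas).
        theta = mean(deltas)
        if in_range(theta, overall_range):
            return marked, ratios, theta
        ratios = [adjust(r, theta, overall_range) for r in ratios]
    raise RuntimeError("no embedding satisfied both threshold ranges")


# Toy model (all assumptions): a watermark changes an entry by +1, an unchanged
# entry has similarity 1.0 and a changed entry has similarity 0.0.
def toy_embed(block, ratio):
    k = round(ratio * len(block))
    return [x + 1 if i < k else x for i, x in enumerate(block)]


def toy_similarity(before, after):
    return 1.0 if before == after else 0.0


def toy_adjust(ratio, similarity, bounds):
    low, high = bounds
    if similarity < low:
        return max(0.0, ratio - 0.1)
    if similarity > high:
        return min(1.0, ratio + 0.1)
    return ratio


blocks = [list(range(10)), list(range(100, 110))]
marked, ratios, theta = embed_target_data(
    blocks, toy_embed, toy_similarity, toy_adjust,
    block_range=(0.7, 0.9), overall_range=(0.75, 0.9))
```

Keeping the adjustment policy as a callable leaves the choice between proportion and position adjustment to the caller, which mirrors the "and/or" wording of the module descriptions.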
In an embodiment, the system further comprises an adjustment module for adjusting the embedding ratio and/or the position of the data watermark.
The adjustment module includes:
the first adjusting unit is used for adjusting the embedding proportion of the data watermark when the data entry contains a single type field;
and the second adjusting unit is used for adjusting the embedding proportion and/or the position of the data watermark when the data entry contains multiple types of fields.
The adjustment module further includes a proportion adjusting unit, which is specifically configured to:
when the data watermark similarity of the content block is greater than the maximum value of the first threshold range, reduce the proportion of data watermarks embedded in the data entries of the content block to a preset proportion;
when the data watermark similarity of the content block is smaller than the minimum value of the first threshold range, increase the proportion of data watermarks embedded in the data entries of the content block to a preset proportion;
when the overall similarity of the target data is greater than the maximum value of the second threshold range, reduce the proportion of data watermarks embedded in one or more content blocks to a preset proportion;
and when the overall similarity of the target data is smaller than the minimum value of the second threshold range, increase the proportion of data watermarks embedded in one or more content blocks to a preset proportion.
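The four rules above share one shape: compare a similarity with a threshold range and move the embedding proportion to a preset value. A minimal sketch of that shape follows; the function name `adjust_ratio` and the parameters `lower_preset` and `higher_preset` are hypothetical, since the text does not say how the preset proportions are chosen.

```python
def adjust_ratio(similarity, threshold_range, current_ratio,
                 lower_preset, higher_preset):
    """Apply the stated rule to one similarity value.

    similarity > max of range  -> reduce the proportion to a preset value
    similarity < min of range  -> increase the proportion to a preset value
    otherwise                  -> keep the current proportion
    The two preset values are assumptions supplied by the caller.
    """
    low, high = threshold_range
    if similarity > high:
        return lower_preset
    if similarity < low:
        return higher_preset
    return current_ratio


# The same function serves both levels: pass the first threshold range with a
# content block similarity, or the second threshold range with the overall one.
first_range = (0.80, 0.95)
reduced = adjust_ratio(0.99, first_range, 0.10, lower_preset=0.05, higher_preset=0.20)
raised = adjust_ratio(0.70, first_range, 0.10, lower_preset=0.05, higher_preset=0.20)
kept = adjust_ratio(0.90, first_range, 0.10, lower_preset=0.05, higher_preset=0.20)
```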
The adjustment module further includes a position adjustment unit, which is specifically configured to:
remove the original data watermark from the data entry, and embed a data watermark matching the field type into each type of field in the data entry according to a preset proportion;
and evaluate the data entry similarity after each field-type-matched data watermark is embedded, select the position of the field with the maximum data entry similarity as the optimal position for embedding the data watermark, and embed the data watermark at the optimal position.
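The position adjustment amounts to a trial embedding per field followed by an arg-max over the resulting similarities. The sketch below illustrates this under assumptions: `choose_best_field`, the per-field `embedders` and the `difflib`-based toy similarity are hypothetical stand-ins, not the watermark algorithms or evaluation models of the patent.

```python
from difflib import SequenceMatcher


def choose_best_field(entry, embedders, similarity):
    """Try a watermark in each field of a clean entry and keep the best one.

    entry:      dict mapping field name to its unwatermarked value
    embedders:  dict mapping field name to a callable returning the watermarked value
    similarity: callable(before_entry, after_entry) returning a float
    """
    best_field, best_entry, best_score = None, None, float("-inf")
    for field, embed in embedders.items():
        if field not in entry:
            continue
        candidate = dict(entry)                 # start again from the clean entry
        candidate[field] = embed(entry[field])  # trial embedding in this field only
        score = similarity(entry, candidate)
        if score > best_score:
            best_field, best_entry, best_score = field, candidate, score
    return best_field, best_entry, best_score


def toy_entry_similarity(before, after):
    # Toy measure (assumption): mean character-level ratio over all fields.
    ratios = [SequenceMatcher(None, before[k], after[k]).ratio() for k in before]
    return sum(ratios) / len(ratios)


entry = {"reading": "1234.50", "note": "meter reading normal"}
embedders = {
    "reading": lambda v: v[:-1] + "7",  # hypothetical numeric watermark
    "note": lambda v: v + " ",          # hypothetical text watermark
}
best_field, best_entry, best_score = choose_best_field(
    entry, embedders, toy_entry_similarity)
```

In this toy run the trailing space in the longer `note` field disturbs the entry less than the changed digit in `reading`, so `note` is selected.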
In an embodiment, the field types in the data entry include any one or more of:
a numeric field, a text field, and a natural language field.
In an embodiment, the embedding module is specifically configured to:
embedding a numeric data watermark in a numeric field when the numeric field is included in the data entry;
embedding a character text type data watermark in a text field when the text field is included in the data item;
when the data entry comprises a natural language field, embedding a natural language type data watermark in the natural language field.
In an embodiment, the data entry similarity evaluation module is specifically configured to:
when a numeric data watermark is embedded in a numeric field of the data entry, deconstruct and segment the numeric values before and after the data watermark is embedded, and evaluate the data entry similarity through a Euclidean distance vector data similarity evaluation model;
when a text-type data watermark is embedded in a text field of the data entry, deconstruct the ASCII code values before and after the data watermark is embedded, and evaluate the data entry similarity through a cosine vector data similarity evaluation model;
and when a natural language type data watermark is embedded in a natural language field of the data entry, perform deconstruction and word segmentation on the natural language field before and after the data watermark is embedded using a space vector model, and evaluate the data entry similarity of the segmentation result through a cosine vector data similarity evaluation model.
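The three evaluation paths can be illustrated with standard-library Python. This is a sketch under assumptions rather than the models of the patent: numeric values are deconstructed into digit vectors, the Euclidean distance d is mapped to a similarity with 1 / (1 + d), text is compared as zero-padded vectors of character codes, and whitespace splitting stands in for word segmentation. All function names are hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def pad(u, v):
    # Zero-pad the shorter vector so both have the same dimension.
    n = max(len(u), len(v))
    return u + [0] * (n - len(u)), v + [0] * (n - len(v))


def numeric_similarity(before, after):
    # Deconstruct each numeric string into its digits (assumed deconstruction).
    u, v = pad([int(c) for c in before if c.isdigit()],
               [int(c) for c in after if c.isdigit()])
    # Map Euclidean distance to (0, 1]; the 1 / (1 + d) mapping is an assumption.
    return 1.0 / (1.0 + math.dist(u, v))


def text_similarity(before, after):
    # Compare the character code values of the two strings with cosine similarity.
    u, v = pad([ord(c) for c in before], [ord(c) for c in after])
    return cosine_similarity(u, v)


def natural_language_similarity(before, after):
    # Whitespace tokenisation stands in for word segmentation (assumption);
    # the term-frequency vectors are compared with cosine similarity.
    tf_before = Counter(before.lower().split())
    tf_after = Counter(after.lower().split())
    vocabulary = sorted(set(tf_before) | set(tf_after))
    return cosine_similarity([tf_before[w] for w in vocabulary],
                             [tf_after[w] for w in vocabulary])


same_number = numeric_similarity("1234.50", "1234.50")
changed_number = numeric_similarity("1234.50", "1234.57")
same_text = text_similarity("abc", "abc")
disjoint_sentences = natural_language_similarity("a b", "c d")
```

A production system would need a segmentation step suited to the language of the field; whitespace splitting is used here only to keep the sketch self-contained.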
In an embodiment, the content block data watermark similarity assessment is performed as follows:
Figure BDA0002807028260000121
wherein: δ represents the data watermark similarity of the content block; N represents the total number of data entries in the content block; C_i represents the data entry similarity of the i-th data entry.
In an embodiment, the overall similarity of the target data is evaluated as follows:
Figure BDA0002807028260000122
wherein: θ represents the overall similarity of the target data; M represents the total number of content blocks in the target data; δ_i represents the data watermark similarity of the i-th content block.
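The two formulas are reproduced here only as figure references, so their exact form cannot be confirmed from the text. Given the symbol definitions (N entries with similarities C_i, M blocks with similarities δ_i), one plausible reading is that each level is the arithmetic mean of the level below. The sketch assumes exactly that; the function names are hypothetical.

```python
from statistics import fmean


def block_similarity(entry_similarities):
    """delta, ASSUMED to be the arithmetic mean of C_1 .. C_N."""
    if not entry_similarities:
        raise ValueError("a content block needs at least one data entry")
    return fmean(entry_similarities)


def overall_similarity(block_similarities):
    """theta, ASSUMED to be the arithmetic mean of delta_1 .. delta_M."""
    if not block_similarities:
        raise ValueError("the target data needs at least one content block")
    return fmean(block_similarities)


# Two content blocks with made-up entry similarities, for illustration only.
entry_scores = [[0.9, 0.8, 1.0], [0.7, 0.9]]
deltas = [block_similarity(scores) for scores in entry_scores]
theta = overall_similarity(deltas)
```

If the original figures use a weighted or otherwise different aggregate, only the two function bodies would need to change.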
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The foregoing merely illustrates embodiments of the present invention and is not to be construed as limiting it. All modifications, equivalents, improvements and the like made to the present invention are intended to be included within the scope of the present invention as defined by the appended claims.

Claims (11)

1. A method of data watermark embedding for target data, comprising:
S1, dividing target data to be embedded with a data watermark into a plurality of content blocks, and embedding the data watermark in a data entry of each content block;
S2, using a preset data similarity evaluation model to evaluate the data entry similarity between the data entries before the data watermark is embedded and the data entries after the data watermark is embedded;
S3, evaluating the data watermark similarity of each content block based on all the data entries forming the content block before the data watermark is embedded and all the data entries forming the content block after the data watermark is embedded; when the data watermark similarity of every content block meets a first threshold range, executing S4; otherwise, adjusting the embedding proportion and/or the embedding position of the data watermark in the content blocks that do not meet the first threshold range, and executing S2;
S4, calculating the overall similarity of the target data based on the data as a whole formed by all content blocks of the target data before the data watermark is embedded and the data as a whole formed by all content blocks of the target data after the data watermark is embedded; when the overall similarity of the target data meets a second threshold range, completing the embedding of the data watermark; otherwise, adjusting the embedding proportion and/or the position of the data watermark in one or more content blocks, and executing S2.
2. The method of claim 1, wherein adjusting the embedding ratio and/or location of the data watermark comprises:
when the data item contains a single type field, adjusting the embedding proportion of the data watermark;
when the data entry contains multiple types of fields, the embedding proportion and/or the position of the data watermark are/is adjusted.
3. The method of claim 2, wherein said adjusting the embedding ratio of the data watermark comprises:
when the similarity of the watermarks of the content block data is greater than the maximum value in the first threshold range, reducing the proportion of embedding the data watermarks in the data items of the content block data to a preset proportion;
when the similarity of the watermarks of the content block data is smaller than the minimum value in the first threshold range, increasing the proportion of embedding the data watermarks in the data items of the content block data to a preset proportion;
when the overall similarity of the target data is greater than the maximum value in the second threshold range, reducing the proportion of embedding the data watermark in one or more content blocks to a preset proportion;
and when the overall similarity of the target data is smaller than the minimum value in the second threshold range, increasing the proportion of embedding the data watermark in one or more content blocks to a preset proportion.
4. The method of claim 2, wherein said adjusting the embedding location of the data watermark comprises:
removing original data watermarks in the data items, and embedding data watermarks matched with field types into various fields in the data items according to preset proportions;
and evaluating the similarity of the data items after the data watermarks matched with the field types are embedded, selecting the position of the field with the maximum similarity of the data items as the optimal position for embedding the data watermarks, and embedding the data watermarks at the optimal position.
5. The method of any of claims 2 or 4, wherein the field types in the data entry include any one or more of:
a numeric field, a text field, and a natural language field.
6. The method of claim 5, wherein embedding a data watermark in the data entry comprises:
embedding a numeric data watermark in a numeric field when the numeric field is included in the data entry;
embedding a character text type data watermark in a text field when the text field is included in the data item;
when the data entry comprises a natural language field, embedding a natural language type data watermark in the natural language field.
7. The method of claim 1, wherein the performing data entry similarity evaluation on the data entry embedded with the data watermark using a preset data similarity evaluation model comprises:
when a numerical data watermark is embedded in a numerical value field of the data item, deconstructing and word segmentation is carried out on numerical values before and after the data watermark is embedded, and the data item similarity is evaluated through a Euclidean distance vector data similarity evaluation model;
when a text field of the data item is embedded with a text-type character data watermark, deconstructing ASCII code values before and after the data watermark is embedded, and evaluating the similarity of the data item through a cosine vector data similarity evaluation model;
and when the natural language field of the data item is embedded with the natural language type data watermark, performing deconstructing word segmentation on the natural language field before and after the data watermark is embedded by using a space vector model, and performing data item similarity assessment on a deconstructing word segmentation result through a cosine vector data similarity assessment model.
8. The method of claim 1, wherein the content block data watermark similarity assessment is performed as follows:
Figure QLYQS_1
wherein: δ represents the data watermark similarity of the content block; N represents the total number of data entries in the content block; C_i represents the data entry similarity of the i-th data entry.
9. The method of claim 1, wherein the similarity of the target data overall is evaluated as follows:
Figure QLYQS_2
wherein: θ represents the overall similarity of the target data; M represents the total number of content blocks in the target data; δ_i represents the data watermark similarity of the i-th content block.
10. A data watermark embedding system for target data, comprising:
the embedding module is used for dividing target data to be embedded with the data watermark into a plurality of content blocks, and embedding the data watermark into a data item of each content block;
the data item similarity evaluation module is used for evaluating the similarity of the data items before the data watermark is embedded and the data items after the data watermark is embedded;
the content block similarity evaluation module is used for evaluating the similarity of the content block data watermarks of all data items which form the content block before embedding the data watermarks and all data items which form the content block after embedding the data watermarks, executing the overall similarity evaluation module when the similarity of the content block data watermarks meets a first threshold range, otherwise, adjusting the embedding proportion and/or the position of the data watermarks in the content block which does not meet the first threshold range, and executing the data item similarity evaluation module;
and the overall similarity evaluation module is used for evaluating overall similarity of the data overall based on all content blocks forming the target data before embedding the data watermark and the data overall based on all content blocks forming the target data after embedding the data watermark, completing embedding of the data watermark when the similarity of the data overall meets a second threshold range, otherwise, adjusting the embedding proportion and/or position of the data watermark in one or more content blocks, and executing the data item similarity evaluation module.
11. The system of claim 10, wherein the data item similarity evaluation module is specifically configured to:
when a numerical data watermark is embedded in a numerical value field of the data item, deconstructing and word segmentation is carried out on numerical values before and after the data watermark is embedded, and the data item similarity is evaluated through a Euclidean distance vector data similarity evaluation model;
when a text field of the data item is embedded with a text-type character data watermark, deconstructing ASCII code values before and after the data watermark is embedded, and evaluating the similarity of the data item through a cosine vector data similarity evaluation model;
and when the natural language field of the data item is embedded with the natural language type data watermark, performing deconstructing word segmentation on the natural language field before and after the data watermark is embedded by using a space vector model, and performing data item similarity assessment on a deconstructing word segmentation result through a cosine vector data similarity assessment model.
CN202011375206.4A 2020-11-30 2020-11-30 Method and system for embedding data watermark of target data Active CN112613045B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011375206.4A CN112613045B (en) 2020-11-30 2020-11-30 Method and system for embedding data watermark of target data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011375206.4A CN112613045B (en) 2020-11-30 2020-11-30 Method and system for embedding data watermark of target data

Publications (2)

Publication Number Publication Date
CN112613045A CN112613045A (en) 2021-04-06
CN112613045B true CN112613045B (en) 2023-06-06

Family

ID=75228159

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011375206.4A Active CN112613045B (en) 2020-11-30 2020-11-30 Method and system for embedding data watermark of target data

Country Status (1)

Country Link
CN (1) CN112613045B (en)

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN100364326C (en) * 2005-12-01 2008-01-23 北京北大方正电子有限公司 Method and apparatus for embedding and detecting digital watermark in text file
CN104599225B (en) * 2015-02-04 2017-11-21 河南师范大学 Based on singular value decomposition and the insertion of the image watermark of principal component analysis and extracting method
CN106612467A (en) * 2015-10-21 2017-05-03 上海文广互动电视有限公司 A video content protection method and apparatus based on watermarks
US10296999B2 (en) * 2017-02-13 2019-05-21 Macau University Of Science And Technology Methods and apparatus for color image watermarking
CN107240059A (en) * 2017-04-07 2017-10-10 广东精点数据科技股份有限公司 The modeling method of image digital watermark embedment strength regressive prediction model
CN109784006A (en) * 2019-01-04 2019-05-21 平安科技(深圳)有限公司 Watermark insertion and extracting method and terminal device
CN111861846A (en) * 2020-07-10 2020-10-30 哈尔滨工业大学(深圳) Electronic document digital watermark processing method and system

Also Published As

Publication number Publication date
CN112613045A (en) 2021-04-06

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 102209 18 Riverside Avenue, Changping District science and Technology City, Beijing

Applicant after: State Grid Smart Grid Research Institute Co.,Ltd.

Address before: 102209 18 Riverside Avenue, Changping District science and Technology City, Beijing

Applicant before: GLOBAL ENERGY INTERCONNECTION RESEARCH INSTITUTE Co.,Ltd.

GR01 Patent grant
GR01 Patent grant