CN112559984A - Data watermark embedding method and system

Info

Publication number: CN112559984A
Application number: CN202011375889.3A
Authority: CN
Inventors: 吴宁; 于鹏飞; 邹云峰; 单超; 沈文
Original assignee: State Grid Jiangsu Electric Power Co ltd Marketing Service Center; Global Energy Interconnection Research Institute; Anhui Jiyuan Software Co Ltd
Current assignee: State Grid Jiangsu Electric Power Co ltd Marketing Service Center; Anhui Jiyuan Software Co Ltd
Priority date: 2020-11-30
Filing date: 2020-11-30
Publication date: 2021-03-26

Abstract

The invention discloses a data watermark embedding method and system. The method comprises: embedding a data watermark in the data entries of target data, and evaluating the similarity of each watermarked data entry using a preset data similarity evaluation model; calculating the overall similarity of the target data from the data entry similarity evaluation results; completing the data watermark embedding when the overall similarity of the target data meets a threshold range; and otherwise adjusting the embedding proportion and/or position of the data watermark until the overall similarity of the target data meets the threshold range, and then completing the embedding. Because the data watermark embedded in the target data is dynamically adjusted according to the similarity evaluation results for both the individual data entries and the data as a whole, high concealment and high simulation are achieved after the data watermark is embedded.

Description

Data watermark embedding method and system

Technical Field

The invention relates to the field of data watermarks, in particular to a data watermark embedding method and a data watermark embedding system.

Background

With the continuous development of the digital economy, information exchange among different departments, regions and data owners is steadily increasing, and data circulates, is recombined and is used ever more frequently across all links of the chain in the form of structured data. Because the data is used in a dynamic environment, the risk of data leakage is high. Once a leak occurs, the responsible link needs to be located accurately, so that the relevant personnel can be held accountable and security controls on the weak links can be strengthened in a targeted way.

Data watermarking is one of the effective technical means for tracing responsibility after a data leak. It adds extra, redundant identification information to the data content: the identification information closely imitates real data content and is associated with, and recorded against, the relevant responsibility links, so that once data is leaked, the source can be located from the watermark information added in advance. High simulation and high concealment are the key indicators of an effective data watermark, because they keep the watermark from being discovered and destroyed by malicious users. Achieving them requires that the similarity of the target data before and after the data watermark is added reach a threshold at which the change is not easily noticed by a user. How to embed a data watermark in target data so that high concealment and high simulation are achieved after embedding is therefore a problem in urgent need of a solution.

Disclosure of Invention

In order to solve the above-mentioned deficiencies in the prior art, the present invention provides a data watermark embedding method, including:

embedding a data watermark in a data item of target data, and performing data item similarity evaluation on the data item embedded with the data watermark by adopting a preset data similarity evaluation model;

calculating the overall similarity of the target data based on the data item similarity evaluation result;

when the overall similarity of the target data meets a threshold range, finishing data watermark embedding; otherwise, the data watermark embedding is completed after the overall similarity of the target data meets the threshold range by adjusting the embedding proportion and/or position of the data watermark.

Preferably, the adjusting the embedding proportion and/or position of the data watermark to make the overall similarity of the target data meet the threshold range includes:

when the data entry contains fields of a single type, adjusting the embedding proportion of the data watermark so that the overall similarity of the target data meets the threshold range;

when the data entry contains fields of multiple types, adjusting the embedding proportion and/or position of the data watermark so that the overall similarity of the target data meets the threshold range.

Preferably, the adjusting the embedding proportion of the data watermark includes:

when the overall similarity of the target data is larger than the maximum value in the threshold range, reducing the proportion of embedding the data watermark in the data entry of the target data to a preset proportion;

and when the overall similarity of the target data is less than the minimum value in the threshold range, increasing the proportion of embedding the data watermark in the data entry of the target data to a preset proportion.

Preferably, the adjusting the embedding position of the data watermark includes:

removing the original data watermarks in the data entries, and respectively embedding the data watermarks matched with the field types into various types of fields in the data entries according to a preset proportion;

and carrying out data item similarity evaluation on the data items embedded with the data watermarks matched with the field types, selecting the position of the field with the maximum data item similarity as the optimal position for embedding the data watermarks, and embedding the data watermarks at the optimal position.

Preferably, the field types in the data entry include any one or more of the following:

a value field, a text field, and a natural language field.

Preferably, the embedding a data watermark in a data entry of target data includes:

when the data entry comprises a value field, embedding a numerical data watermark in the value field;

when the data entry comprises a text field, embedding a character text type data watermark in the text field;

when a natural language field is included in the data entry, a natural language type data watermark is embedded in the natural language field.

Preferably, the performing, by using a preset data similarity evaluation model, data entry similarity evaluation on the data entry embedded with the data watermark includes:

when a numerical data watermark is embedded in a value field of the data entry, deconstructing and segmenting the numerical values before and after the data watermark is embedded, and evaluating the data entry similarity with a Euclidean distance vector data similarity evaluation model;

when a character text type data watermark is embedded in a text field of the data entry, deconstructing the values before and after the data watermark is embedded into ASCII code values, and evaluating the data entry similarity with a cosine vector data similarity evaluation model;

and when a natural language type data watermark is embedded in a natural language field of the data entry, applying a vector space model to deconstruct and segment the natural language field before and after the data watermark is embedded, and evaluating the data entry similarity from the segmentation results with a cosine vector data similarity evaluation model.

Preferably, the overall similarity evaluation of the target data is performed according to the following formula:

In the formula, δ denotes the overall similarity of the target data; n denotes the total number of data entries in the target data; and c_i denotes the data entry similarity of the i-th data entry.
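The formula is not reproduced in this text. One reading that is consistent with the symbol definitions, offered as an assumption rather than as the published formula, is the arithmetic mean of the data entry similarities:

```latex
\delta = \frac{1}{n} \sum_{i=1}^{n} c_i
```

Under this reading, three data entries with similarities 0.9, 0.8 and 1.0 would give δ = (0.9 + 0.8 + 1.0) / 3 = 0.9.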

Based on the same inventive concept, the invention also provides a data watermark embedding system, which comprises:

the data item similarity evaluation module is used for embedding a data watermark into a data item of target data and carrying out data item similarity evaluation on the data item embedded with the data watermark by adopting a preset data similarity evaluation model;

the overall similarity calculation module is used for calculating the overall similarity of the target data based on the data item similarity evaluation result;

the adjusting module is used for completing data watermark embedding when the overall similarity of the target data meets a threshold range; and when the overall similarity of the target data does not meet the threshold range, the data watermark embedding is completed by adjusting the embedding proportion and/or position of the data watermark so that the overall similarity of the target data meets the threshold range.

Preferably, the adjusting module includes:

the first adjusting submodule is used for adjusting the embedding proportion of the data watermark when the data item contains a single type field, so that the overall similarity of the target data meets a threshold range;

and the second adjusting submodule is used for adjusting the embedding proportion and/or position of the data watermark when the data entry contains multiple types of fields, so that the overall similarity of the target data meets the threshold range.

Compared with the prior art, the invention has the beneficial effects that:

According to the technical scheme provided by the invention, a data watermark is embedded in the data entries of target data, and a preset data similarity evaluation model is used to evaluate the similarity of each data entry after the data watermark is embedded; the overall similarity of the target data is calculated from the data entry similarity evaluation results; when the overall similarity of the target data meets a threshold range, the data watermark embedding is complete; otherwise, the embedding proportion and/or position of the data watermark is adjusted until the overall similarity of the target data meets the threshold range, and the embedding is then complete. Because the data watermark embedded in the target data is dynamically adjusted according to the similarity evaluation results for both the individual data entries and the data as a whole, high concealment and high simulation are achieved after the data watermark is embedded.

Drawings

Fig. 1 is a flowchart of a data watermark embedding method provided by the present invention;

Fig. 2 is a schematic diagram of a data watermark embedding system according to an embodiment of the present invention.

Detailed Description

For a better understanding of the present invention, reference is made to the following description taken in conjunction with the accompanying drawings and examples.

Example 1: as shown in fig. 1, to meet the urgent needs in the prior art, the present invention provides a data watermark embedding method, including:

S1, embedding a data watermark in a data entry of target data, and performing data entry similarity evaluation on the data entry embedded with the data watermark by adopting a preset data similarity evaluation model;

S2, calculating the overall similarity of the target data based on the data entry similarity evaluation result;

S3, when the overall similarity of the target data meets the threshold range, finishing the data watermark embedding; otherwise, the data watermark embedding is completed after the overall similarity of the target data meets the threshold range by adjusting the embedding proportion and/or position of the data watermark.

On the one hand, the method uses the data similarity evaluation model to evaluate the similarity of the watermarked data entries and of the watermarked data as a whole. On the other hand, it dynamically adjusts the proportion of watermarked entries, the distribution position of the watermark, or both, according to the evaluation results, so as to meet the similarity threshold that the user sets for the circulating data. Together these ensure the concealment and high simulation of the embedded data watermark.
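Steps S1 to S3 can be read as a loop over candidate embedding proportions. The following is a minimal sketch under stated assumptions, not the patented implementation: the name `embed_until_in_range`, the candidate `ratios`, the choice to watermark the first entries of the list, and the use of the arithmetic mean as the overall similarity are all hypothetical, and the `embed` and `entry_similarity` callables stand in for the type-specific algorithms of S1.

```python
from typing import Callable, List, Sequence, Tuple


def embed_until_in_range(
    entries: Sequence[str],
    embed: Callable[[str], str],
    entry_similarity: Callable[[str, str], float],
    lower: float,
    upper: float,
    ratios: Sequence[float] = (1.0, 0.5, 0.3, 0.2),  # assumed candidate proportions
) -> Tuple[List[str], float, float]:
    """Try each candidate ratio until the overall similarity is in [lower, upper]."""
    if not entries:
        raise ValueError("target data has no entries")
    for ratio in ratios:
        count = round(len(entries) * ratio)
        # S1: watermark the first `count` entries (an assumption), keep the rest.
        marked = [embed(e) if i < count else e for i, e in enumerate(entries)]
        scores = [entry_similarity(a, b) for a, b in zip(entries, marked)]
        # S2: overall similarity, assumed here to be the mean entry similarity.
        overall = sum(scores) / len(scores)
        # S3: stop as soon as the threshold range is met.
        if lower <= overall <= upper:
            return marked, ratio, overall
    raise ValueError("no candidate ratio met the threshold range")
```

Calling the function with a toy `embed` that overwrites the last character and a toy `entry_similarity` that counts matching positions is enough to exercise the loop; real use would plug in the embedding and evaluation models chosen per field type.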

In this embodiment, the step S1 of embedding the data watermark in the data entry of the target data, and performing data entry similarity evaluation on the data entry after embedding the data watermark by using a preset data similarity evaluation model includes:

For each data entry forming the target data, a data watermark of the corresponding type is embedded according to the field types in the data entry. To maximize the information capacity of the data watermark, the proportion of data watermark embedding in the target data is 100%. The specific embedding process comprises the following steps:

A suitable data similarity evaluation model is then selected according to the data watermark embedding algorithm used, and the similarity of the data entries is evaluated. The data entry similarity evaluation methods used in this embodiment are listed below:

In this embodiment, data entry similarity presupposes that, for a given type of data, embedding the data watermark must not change the data type features of the data entry; if a data type feature is changed, the data entry similarity evaluation result is 0.

For example, a mobile phone number has 11 digits: the first 3 digits are the network identification number, digits 4 to 7 are the region code, and digits 8 to 11 are the user number. After the data watermark is embedded, the value must still satisfy these data type features of a mobile phone number.
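The rule that a changed data type feature forces the similarity to 0 can be sketched as a gate in front of whichever similarity model is used. This is an illustrative sketch only: `MOBILE_PATTERN` and `gated_similarity` are hypothetical names, and the check is reduced to "exactly 11 digits"; validating the network identification number or region code segments would need reference data that the text does not provide.

```python
import re

# Assumed format rule for the mobile phone number example: exactly 11 digits.
MOBILE_PATTERN = re.compile(r"[0-9]{11}")


def gated_similarity(watermarked: str, raw_similarity: float) -> float:
    """Return 0.0 if the watermarked value no longer has the mobile number format."""
    if MOBILE_PATTERN.fullmatch(watermarked) is None:
        return 0.0
    return raw_similarity
```

A watermarked value such as `1381234567X` would therefore score 0 regardless of how close it is to the original, while a value that keeps the 11-digit form passes its computed similarity through unchanged.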

(1) After a numerical data watermark is embedded, the values before and after embedding are deconstructed and segmented, and their similarity is evaluated with a Euclidean distance vector data similarity evaluation model; the evaluation result is denoted D.

For example, for mobile phone number data, let the values before and after the data watermark is embedded be P and P'. Structural segmentation treats each digit as an independent unit, that is, P = {N_1, N_2, ..., N_11} and P' = {N'_1, N'_2, ..., N'_11}. These are then substituted into the Euclidean data similarity evaluation model to calculate the similarity
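As a hedged illustration of the mobile phone number example, the sketch below splits each number into per-digit components and computes the Euclidean distance D between the two digit vectors. The function names are hypothetical, and how D is mapped onto the threshold range is not specified in the text, so the sketch stops at the distance itself.

```python
import math
from typing import List


def digit_vector(number: str) -> List[int]:
    """Split a numeric string into one integer component per digit."""
    return [int(ch) for ch in number]


def euclidean_distance(p: str, p_marked: str) -> float:
    """Euclidean distance D between the digit vectors of P and P'."""
    a, b = digit_vector(p), digit_vector(p_marked)
    if len(a) != len(b):
        raise ValueError("values must have the same number of digits")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
```

For instance, `13812345678` and `13812345687` differ by 1 in each of the last two digits, so D is the square root of 2, while identical numbers give D = 0.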

(2) After a character text type data watermark is embedded, the values before and after embedding are deconstructed into ASCII code values, and their similarity is evaluated with a cosine vector data similarity evaluation model; the evaluation result is denoted C.

For example, for WeChat account data, let the values before and after the data watermark is embedded be P and P'. ASCII code value deconstruction treats each character as an independent unit, that is, P = {N_1, N_2, ..., N_n} and P' = {N'_1, N'_2, ..., N'_n}.

These are then substituted into the cosine data similarity evaluation model to calculate the similarity
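The text field case can be sketched as follows, assuming the values before and after embedding have the same length so that their ASCII code vectors align component by component. `ascii_vector` and `cosine_similarity` are hypothetical names, and returning 0.0 for an all-zero vector is a choice made for this sketch.

```python
import math
from typing import List, Sequence


def ascii_vector(text: str) -> List[int]:
    """Map each character to its code value (ASCII for ASCII input)."""
    return [ord(ch) for ch in text]


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # assumed convention for empty or all-zero input
    return dot / (norm_u * norm_v)
```

Because printable ASCII codes are all positive and of similar magnitude, a single changed character moves this score only slightly below 1, which is worth keeping in mind when choosing the threshold range for text fields.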

(3) After a natural language type data watermark is embedded, a vector space model is applied to deconstruct and segment the data before and after embedding, and the similarity of the segmentation results is evaluated with a cosine vector data similarity evaluation model.

In this application, natural language refers to natural-language data in the power business, which has marked domain-specific characteristics: address data such as overhaul addresses and expansion addresses; electric power terminology such as operation terms and electric quantity terms; and resident-facing data such as names. Together these can form a feature lexicon of power-business natural-language data.

Word segmentation is performed on the natural-language power-business data before and after the data watermark is added, giving the vector expressions O = {O_1, O_2, ..., O_n} and O' = {O'_1, O'_2, ..., O'_n}, which are substituted into the cosine data similarity evaluation model to calculate the similarity
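A possible shape for the natural language case is sketched below. It assumes the word segmentation has already been done (for example by a segmenter using the power-business lexicon) and that term counts over the shared vocabulary are used as vector components; the text does not state the weighting, so that choice and the function names are assumptions.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def term_vectors(
    tokens_a: Sequence[str], tokens_b: Sequence[str]
) -> Tuple[List[int], List[int]]:
    """Build aligned term-count vectors over the union vocabulary."""
    vocab = sorted(set(tokens_a) | set(tokens_b))
    count_a, count_b = Counter(tokens_a), Counter(tokens_b)
    return [count_a[t] for t in vocab], [count_b[t] for t in vocab]


def token_cosine(tokens_a: Sequence[str], tokens_b: Sequence[str]) -> float:
    """Cosine similarity of two token lists under the vector space model."""
    u, v = term_vectors(tokens_a, tokens_b)
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # assumed convention when either side has no tokens
    return dot / (norm_u * norm_v)
```

With four tokens per address and one token replaced by the watermark, three of the four terms are shared and the similarity is 3 / (2 × 2) = 0.75.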

S2, calculating the overall similarity of the target data based on the data entry similarity evaluation results, includes:

When the target data comprises N rows of data entries, the similarity of each data entry is evaluated according to S1 and the evaluation result is denoted C; the overall similarity of the target data after all data watermarks have been embedded is then calculated from these results.

S3, when the overall similarity of the target data meets the threshold range, finishing the data watermark embedding; otherwise, the data watermark embedding is completed after the overall similarity of the target data meets the threshold range by adjusting the embedding proportion and/or position of the data watermark, and the method comprises the following steps:

In this embodiment, a method adopted when the overall similarity of the target data does not meet the threshold requirement is specifically described:

The first method dynamically adjusts the proportion of data watermark addition, and comprises the following steps:

when the overall similarity of the target data is larger than the maximum value in the threshold range, reducing the proportion of the data watermark embedded in the target data to a preset proportion;

and when the overall similarity of the target data is smaller than the minimum value in the threshold range, increasing the proportion of the data watermark embedded in the target data to a preset proportion.

Specifically, when the overall similarity of the target data after the data watermark is embedded exceeds the maximum value of the set threshold range, the embedding proportion of the data watermark is reduced so that the overall similarity before and after embedding stays within range; for example, the embedding proportion may be set to 50%, 30% or 20%.

If the overall similarity of the target data after the data watermark is embedded is below the minimum value of the threshold range, the embedding proportion of the data watermark is increased to raise the data watermark embedding capacity as far as possible; for example, the embedding proportion may be set to 20%, 30% or 50%.
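The proportion rule of the first method can be written as a small step function. This sketch follows the direction of adjustment exactly as the text states it (reduce when the score is above the maximum, increase when it is below the minimum). The preset list reuses the example values 20%, 30% and 50% plus the initial 100%; stepping to the nearest preset, and the name `next_ratio`, are assumptions.

```python
from typing import Sequence


def next_ratio(
    overall: float,
    lower: float,
    upper: float,
    current: float,
    presets: Sequence[float] = (0.2, 0.3, 0.5, 1.0),  # assumed preset proportions
) -> float:
    """Return the embedding proportion to try next, per the stated rule."""
    if lower <= overall <= upper:
        return current  # threshold range met: keep the current proportion
    if overall > upper:
        # Above the maximum: step down to the nearest smaller preset, if any.
        smaller = [r for r in presets if r < current]
        return max(smaller) if smaller else current
    # Below the minimum: step up to the nearest larger preset, if any.
    larger = [r for r in presets if r > current]
    return min(larger) if larger else current
```

Starting from 100% with a score above the maximum, the function moves to 50%; from 20% with a score below the minimum, it moves to 30%. When no further preset exists in the required direction it returns the current proportion, which a caller could treat as a signal to try the position adjustment instead.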

The second method dynamically adjusts the position of data watermark adding, and comprises the following steps:

In this embodiment, adjusting the position of data watermark addition specifically includes the following. When the data entries of the target data contain numerical, text and natural language fields, the data watermark may be added to the value field, the text field or the natural language field at a fixed embedding proportion. The data entry similarity after the data watermark is added to the value field, the text field or the natural language field is calculated according to the method provided in S1, the position with the maximum data entry similarity is selected as the optimal position for the watermark, and the data watermark originally added to the entry is deleted. The overall similarity is then calculated from the data entry similarities so that it meets the threshold requirement, which raises the data watermark embedding capacity as far as possible while preserving the overall similarity after the data watermark is embedded.
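The position selection of the second method amounts to trying each field in turn and keeping the one with the highest data entry similarity. The sketch below is illustrative only: `choose_embedding_field` is a hypothetical name, each candidate starts from the un-watermarked entry (which corresponds to deleting the original watermark), and a single `similarity` callable is used for brevity where the text would select a model per field type.

```python
from typing import Callable, Dict, Tuple


def choose_embedding_field(
    entry: Dict[str, str],
    embedders: Dict[str, Callable[[str], str]],
    similarity: Callable[[str, str], float],
) -> Tuple[str, Dict[str, str], float]:
    """Watermark one field at a time and keep the highest-similarity choice."""
    best_name, best_entry, best_score = "", dict(entry), float("-inf")
    for name, embed in embedders.items():
        candidate = dict(entry)  # start from the entry without any watermark
        candidate[name] = embed(entry[name])
        score = similarity(entry[name], candidate[name])
        if score > best_score:
            best_name, best_entry, best_score = name, candidate, score
    if not best_name:
        raise ValueError("no field available for embedding")
    return best_name, best_entry, best_score
```

The returned entry carries the watermark only in the selected field, so its similarity score can be fed directly into the overall similarity calculation of S2.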

In this embodiment, when the data entry includes multiple types of fields, the proportion or the position of embedding the data watermark may be adjusted according to the requirement, or the proportion and the position may be adjusted at the same time, so that the overall similarity of the target data meets the threshold requirement, and finally, high concealment and high simulation after embedding the data watermark are achieved.

In this embodiment, when the data entry includes a single type field, the proportion of embedding the data watermark may be adjusted according to the requirement, so that the overall similarity of the target data meets the threshold requirement, and finally, high concealment and high simulation after embedding the data watermark are achieved.

Example 2: based on the same inventive concept, the present invention further provides a data watermark embedding system, as shown in fig. 2, including:

In an embodiment, the adjusting module includes:

In an embodiment, the adjusting module further includes: an embedding ratio adjustment unit for:

In an embodiment, the adjusting module further includes: an embedded position adjusting unit for:

In this embodiment, the field types in the data entry include any one or more of the following:

a value field, a text field, and a natural language field.

In this embodiment, the data entry similarity evaluation module includes:

the data watermark embedding unit is specifically configured to:

The data entry similarity evaluation unit is specifically configured to:

In the examples, the overall similarity evaluation of the target data is performed as follows:

To achieve high concealment and high simulation after the data watermark is embedded in the target data, the embodiments of the invention select a suitable watermark proportion and distribution strategy according to the similarity evaluation results of the different data watermark algorithms.

As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

The present invention is not limited to the above embodiments; any modification, equivalent replacement or improvement made within the spirit and principle of the present invention falls within the scope of the pending claims of the present invention.

Claims

1. A method of data watermark embedding, comprising:

2. The method of claim 1, wherein the adjusting the embedding proportion and/or the embedding position of the data watermark to make the overall similarity of the target data meet the threshold range comprises:

3. The method of claim 2, wherein the adjusting the embedding ratio of the data watermark comprises:

4. The method of claim 2, wherein the adjusting the embedding location of the data watermark comprises:

5. The method of any of claims 2 or 4, wherein the field types in the data entry include any one or more of:

a value field, a text field, and a natural language field.

6. The method of claim 5, wherein embedding the data watermark in the data entry of the target data comprises:

7. The method of claim 1, wherein the performing data item similarity evaluation on the data item embedded with the data watermark by using a preset data similarity evaluation model comprises:

8. The method of claim 1, wherein the global similarity assessment of the target data is performed according to the following equation:

9. A data watermark embedding system, comprising:

10. The system of claim 9, wherein the adjustment module comprises: