CN112559984A - Data watermark embedding method and system - Google Patents

Info

Publication number
CN112559984A
Authority
CN
China
Prior art keywords
data
watermark
embedding
similarity
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011375889.3A
Other languages
Chinese (zh)
Inventor
吴宁
于鹏飞
邹云峰
单超
沈文
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
State Grid Jiangsu Electric Power Co ltd Marketing Service Center
Anhui Jiyuan Software Co Ltd
Original Assignee
State Grid Jiangsu Electric Power Co ltd Marketing Service Center
Global Energy Interconnection Research Institute
Anhui Jiyuan Software Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by State Grid Jiangsu Electric Power Co ltd Marketing Service Center, Global Energy Interconnection Research Institute, Anhui Jiyuan Software Co Ltd
Priority to CN202011375889.3A
Publication of CN112559984A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/10 Protecting distributed programs or content, e.g. vending or licensing of copyrighted material; Digital rights management [DRM]
    • G06F21/16 Program or content traceability, e.g. by watermarking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/194 Calculation of difference between files
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Hardware Design (AREA)
  • Technology Law (AREA)
  • Multimedia (AREA)
  • Editing Of Facsimile Originals (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a data watermark embedding method and system. The method comprises: embedding a data watermark in the data entries of target data, and evaluating the similarity of each watermarked data entry using a preset data similarity evaluation model; calculating the overall similarity of the target data from the data entry similarity results; completing the embedding when the overall similarity of the target data falls within a threshold range; and otherwise adjusting the embedding proportion and/or position of the data watermark until the overall similarity falls within the threshold range, then completing the embedding. Because the embedded watermark is adjusted dynamically according to the similarity results for both the individual data entries and the data as a whole, the watermarked data achieves high concealment and high simulation.

Description

Data watermark embedding method and system
Technical Field
The invention relates to the field of data watermarks, in particular to a data watermark embedding method and a data watermark embedding system.
Background
With the continuous development of the digital economy, information exchange among different departments, regions, and data owners is steadily increasing, and data circulates, is recombined, and is used ever more frequently as structured data across all links of the chain. Because the data is used in a dynamic environment, the risk of leakage is high. When a leak occurs, the responsible link needs to be located accurately so that the relevant personnel can be held accountable and security controls on weak links can be strengthened in a targeted way.
Data watermarking is one effective technical means of tracing responsibility after a data leak. A data watermark adds redundant identification information to the data content: the identification information closely imitates real data content and is associated with a record of the responsible link, so that if the data leaks, the source can be located from the watermark added in advance. High simulation and high concealment are the key indicators of an effective data watermark, because they keep malicious users from discovering and destroying it. Achieving them requires that the similarity of the target data before and after watermarking reach a threshold at which the change is not easily noticed by a user. How to embed a data watermark in target data so that the result is both highly concealed and highly simulated is therefore a problem in urgent need of a solution.
Disclosure of Invention
In order to solve the above-mentioned deficiencies in the prior art, the present invention provides a data watermark embedding method, including:
embedding a data watermark in a data item of target data, and performing data item similarity evaluation on the data item embedded with the data watermark by adopting a preset data similarity evaluation model;
calculating the overall similarity of the target data based on the data item similarity evaluation result;
when the overall similarity of the target data meets a threshold range, finishing data watermark embedding; otherwise, the data watermark embedding is completed after the overall similarity of the target data meets the threshold range by adjusting the embedding proportion and/or position of the data watermark.
Preferably, the adjusting the embedding proportion and/or position of the data watermark to make the overall similarity of the target data meet the threshold range includes:
when the data entry contains a single type field, the whole similarity of the target data meets a threshold range by adjusting the embedding proportion of the data watermark;
when the data entry contains multiple types of fields, the overall similarity of the target data meets a threshold range by adjusting the embedding proportion and/or position of the data watermark.
Preferably, the adjusting the embedding proportion of the data watermark includes:
when the overall similarity of the target data is larger than the maximum value in the threshold range, reducing the proportion of embedding the data watermark in the data entry of the target data to a preset proportion;
and when the overall similarity of the target data is less than the minimum value in the threshold range, increasing the proportion of embedding the data watermark in the data entry of the target data to a preset proportion.
Preferably, the adjusting the embedding position of the data watermark includes:
removing the original data watermarks in the data entries, and respectively embedding the data watermarks matched with the field types into various types of fields in the data entries according to a preset proportion;
and carrying out data item similarity evaluation on the data items embedded with the data watermarks matched with the field types, selecting the position of the field with the maximum data item similarity as the optimal position for embedding the data watermarks, and embedding the data watermarks at the optimal position.
Preferably, the field types in the data entry include any one or more of the following:
a value field, a text field, and a natural language field.
Preferably, the embedding a data watermark in a data entry of target data includes:
when the data entry comprises a value field, embedding a numerical data watermark in the value field;
when the data entry comprises a text field, embedding a character text type data watermark in the text field;
when a natural language field is included in the data entry, a natural language type data watermark is embedded in the natural language field.
Preferably, the performing, by using a preset data similarity evaluation model, data entry similarity evaluation on the data entry embedded with the data watermark includes:
when a numeric data watermark is embedded in a value field of the data entry, deconstructing the values before and after embedding into segments, and evaluating data entry similarity with a Euclidean distance vector similarity evaluation model;
when a character text data watermark is embedded in a text field of the data entry, deconstructing the text before and after embedding into ASCII code values, and evaluating data entry similarity with a cosine vector similarity evaluation model;
and when a natural language data watermark is embedded in a natural language field of the data entry, applying a vector space model to deconstruct and segment the natural language field before and after embedding, and evaluating data entry similarity on the segmentation result with a cosine vector similarity evaluation model.
Preferably, the overall similarity evaluation of the target data is performed according to the following formula:
[Formula image BDA0002807195680000031: overall similarity δ as a function of N and C_i; image not reproduced in this text]
In the formula: δ represents the overall similarity of the target data; N represents the total number of data entries in the target data; and C_i represents the data entry similarity of the i-th data entry.
Based on the same inventive concept, the invention also provides a data watermark embedding system, which comprises:
the data item similarity evaluation module is used for embedding a data watermark into a data item of target data and carrying out data item similarity evaluation on the data item embedded with the data watermark by adopting a preset data similarity evaluation model;
the overall similarity calculation module is used for calculating the overall similarity of the target data based on the data item similarity evaluation result;
the adjusting module is used for completing data watermark embedding when the overall similarity of the target data meets a threshold range; and when the overall similarity of the target data does not meet the threshold range, the data watermark embedding is completed by adjusting the embedding proportion and/or position of the data watermark so that the overall similarity of the target data meets the threshold range.
Preferably, the adjusting module includes:
the first adjusting submodule is used for adjusting the embedding proportion of the data watermark when the data item contains a single type field, so that the overall similarity of the target data meets a threshold range;
and the second adjusting submodule is used for adjusting the embedding proportion and/or position of the data watermark when the data entry contains multiple types of fields, so that the overall similarity of the target data meets the threshold range.
Compared with the prior art, the invention has the beneficial effects that:
according to the technical scheme provided by the invention, a data watermark is embedded in a data item of target data, and a preset data similarity evaluation model is adopted to evaluate the similarity of the data item after the data watermark is embedded; calculating the overall similarity of the target data based on the data item similarity evaluation result; when the overall similarity of the target data meets a threshold range, finishing data watermark embedding; otherwise, the data watermark embedding is completed after the overall similarity of the target data meets the threshold range by adjusting the embedding proportion and/or position of the data watermark. According to the data watermark similarity evaluation result of the data item and the data whole, the data watermark embedded into the target data is dynamically adjusted, so that high concealment and high simulation after the data watermark is embedded are finally realized.
Drawings
Fig. 1 is a flowchart of a data watermark embedding method provided by the present invention;
Fig. 2 is a schematic diagram of a data watermark embedding system according to an embodiment of the present invention.
Detailed Description
For a better understanding of the present invention, reference is made to the following description taken in conjunction with the accompanying drawings and examples.
Example 1: as shown in fig. 1, to meet the urgent needs in the prior art, the present invention provides a data watermark embedding method, including:
S1, embedding a data watermark in a data entry of target data, and performing data entry similarity evaluation on the data entry embedded with the data watermark by adopting a preset data similarity evaluation model;
S2, calculating the overall similarity of the target data based on the data entry similarity evaluation result;
S3, when the overall similarity of the target data meets the threshold range, finishing the data watermark embedding; otherwise, the data watermark embedding is completed after the overall similarity of the target data meets the threshold range by adjusting the embedding proportion and/or position of the data watermark.
In this method, a data similarity evaluation model is used to evaluate the similarity of both the watermarked data entries and the watermarked data as a whole. Based on the evaluation result, the proportion of watermark embedding, its distribution position, or both are adjusted dynamically to meet the similarity threshold that the user sets for the circulated data, which ensures the overall concealment and high simulation of the embedded data watermark.
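The S1 to S3 feedback loop can be sketched as follows. This is a minimal illustration, not the patented implementation: `embed_and_score` and `adjust` are hypothetical callbacks (the first embeds at a given ratio and returns the overall similarity, the second proposes the next ratio), and `max_rounds` is an assumed safety limit that the source does not mention.

```python
from typing import Callable, Tuple


def embed_with_feedback(
    embed_and_score: Callable[[float], float],
    adjust: Callable[[float, float], float],
    low: float,
    high: float,
    initial_ratio: float = 1.0,
    max_rounds: int = 10,
) -> Tuple[float, float]:
    """Repeat embedding until the overall similarity lies in [low, high].

    Returns the embedding ratio that satisfied the threshold range and the
    overall similarity measured at that ratio.
    """
    ratio = initial_ratio
    for _ in range(max_rounds):
        overall = embed_and_score(ratio)   # S1 + S2: embed, then score
        if low <= overall <= high:         # S3: threshold met, finish
            return ratio, overall
        ratio = adjust(ratio, overall)     # S3: otherwise adjust and retry
    raise RuntimeError("threshold range not met within max_rounds")
```

Keeping the adjustment strategy as a callback lets the same loop drive either a proportion change, a position change, or both.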
In this embodiment, the step S1 of embedding the data watermark in the data entry of the target data, and performing data entry similarity evaluation on the data entry after embedding the data watermark by using a preset data similarity evaluation model includes:
for each data entry in the target data, a data watermark of the corresponding type is embedded according to the field types in the data entry. To maximize the information capacity of the embedded watermark, the proportion of data watermark embedding in the target data is initially 100%. The specific embedding process is as follows:
when the data entry comprises a value field, embedding a numerical data watermark in the value field;
when the data entry comprises a text field, embedding a character text type data watermark in the text field;
when a natural language field is included in the data entry, a natural language type data watermark is embedded in the natural language field.
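The type-matched embedding above amounts to a dispatch on field type. The sketch below shows only that dispatch; the three embedder functions are toy stand-ins invented for illustration (the source does not specify the watermark algorithms), and the names `FieldType`, `EMBEDDERS`, and `embed_field` are hypothetical.

```python
from enum import Enum
from typing import Callable, Dict


class FieldType(Enum):
    NUMERIC = "numeric"
    TEXT = "text"
    NATURAL_LANGUAGE = "natural_language"


def _overwrite_tail(value: str, mark: str) -> str:
    # Toy stand-in: replace the trailing characters with the watermark.
    # Assumes the watermark is shorter than the value.
    return value[: len(value) - len(mark)] + mark


def embed_numeric(value: str, mark: str) -> str:
    return _overwrite_tail(value, mark)


def embed_text(value: str, mark: str) -> str:
    return _overwrite_tail(value, mark)


def embed_natural_language(value: str, mark: str) -> str:
    # Toy stand-in: append the watermark as an extra word.
    return f"{value} {mark}"


# One embedder per field type, so each field gets a watermark of its own kind.
EMBEDDERS: Dict[FieldType, Callable[[str, str], str]] = {
    FieldType.NUMERIC: embed_numeric,
    FieldType.TEXT: embed_text,
    FieldType.NATURAL_LANGUAGE: embed_natural_language,
}


def embed_field(field_type: FieldType, value: str, mark: str) -> str:
    """Embed `mark` into `value` using the embedder for its field type."""
    return EMBEDDERS[field_type](value, mark)
```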
An appropriate data similarity evaluation model is then selected according to the data watermark embedding algorithm used, and the similarity of the data entries is evaluated. The evaluation methods used in this embodiment are listed below.
In this embodiment, data entry similarity carries a precondition: for a given type of data, embedding the data watermark must not change the data type characteristics of the data entry. If the data type characteristics change, the data entry similarity evaluation result is 0.
For example, a mobile phone number has 11 digits, of which the first 3 digits are the network identification number, the 4th to 7th digits are the regional code, and the 8th to 11th digits are the subscriber number. After the data watermark is embedded, the value must still satisfy these data type characteristics of a mobile phone number.
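The rule that a changed data type characteristic yields a similarity of 0 can be sketched as a guard around any similarity function. This is an illustration under assumptions: the pattern below only checks for exactly 11 decimal digits and does not validate the network, regional, or subscriber segments, and `guarded_similarity` is a hypothetical name.

```python
import re
from typing import Callable

# Assumed format check: exactly 11 decimal digits.
MOBILE_PATTERN = re.compile(r"\d{11}")


def guarded_similarity(
    original: str,
    watermarked: str,
    similarity: Callable[[str, str], float],
) -> float:
    """Return 0 if the watermarked value no longer looks like a mobile number."""
    if not MOBILE_PATTERN.fullmatch(watermarked):
        return 0.0  # data type characteristic changed
    return similarity(original, watermarked)
```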
(1) After a numeric data watermark is embedded, the values before and after embedding are deconstructed into segments, and similarity is evaluated with the Euclidean distance vector similarity evaluation model; the evaluation result is denoted D.
For example, for mobile phone number data, let the values before and after embedding be P and P'. Structural segmentation treats each digit as an independent unit, that is, P = {N1, N2, ..., N11} and P' = {N'1, N'2, ..., N'11}. These are substituted into the Euclidean similarity evaluation model to calculate the similarity:
[Formula image BDA0002807195680000051: Euclidean distance model giving D from P and P'; image not reproduced in this text]
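A minimal sketch of the digit-wise Euclidean computation follows. It returns the plain Euclidean distance between the two digit vectors; whether the source uses this distance directly as D or transforms it further cannot be recovered from the text, so that mapping is left open. The function names are hypothetical.

```python
import math
from typing import List


def digit_vector(number: str) -> List[int]:
    """Split a numeric string into one integer per digit."""
    return [int(ch) for ch in number]


def euclidean_distance(p: str, p_marked: str) -> float:
    """Euclidean distance between the digit vectors of P and P'."""
    a, b = digit_vector(p), digit_vector(p_marked)
    if len(a) != len(b):
        raise ValueError("values must have the same number of digits")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
```

A distance of 0 means the value is unchanged; larger values mean the watermark altered the digits more.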
(2) After a character text data watermark is embedded, the text before and after embedding is deconstructed into ASCII code values, and similarity is evaluated with the cosine vector similarity evaluation model; the evaluation result is denoted C.
For example, for WeChat account data, let the values before and after embedding be P and P'. ASCII deconstruction treats the code value of each character as an independent unit, that is, P = {N1, N2, ..., Nn} and P' = {N'1, N'2, ..., N'n}.
These are then substituted into the cosine similarity evaluation model to calculate the similarity:
$$C = \frac{\sum_{i=1}^{n} N_i N'_i}{\sqrt{\sum_{i=1}^{n} N_i^{2}}\;\sqrt{\sum_{i=1}^{n} {N'_i}^{2}}}$$
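The ASCII-based cosine evaluation can be sketched as below, using the standard definition of cosine similarity. The function names are hypothetical, and the requirement that both strings have the same length is an assumption that follows from P and P' both having n components.

```python
import math
from typing import List, Sequence


def ascii_vector(text: str) -> List[int]:
    """Map each character to its code value."""
    return [ord(ch) for ch in text]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Standard cosine similarity of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # avoid division by zero for empty or all-zero input
    return dot / (norm_a * norm_b)
```

Because character code values are all positive, the result stays close to 1 for small edits, so a threshold range for text fields would need to be chosen with that in mind.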
(3) After a natural language data watermark is embedded, a vector space model is applied to deconstruct and segment the text before and after embedding, and similarity is evaluated on the segmentation result with the cosine vector similarity evaluation model.
In this application, natural language refers to natural-language data in the power business that has distinct professional characteristics, for example address data such as maintenance addresses and expansion addresses; power terminology such as operation terms and electricity-quantity terms; and resident-facing data such as names. Such data can form a feature lexicon of power business natural language data.
The power business natural language data before and after watermarking is segmented into words to obtain the vector representations O = {O1, O2, ..., On} and O' = {O'1, O'2, ..., O'n}, which are substituted into the cosine similarity evaluation model to calculate the similarity:
$$\cos(O, O') = \frac{\sum_{i=1}^{n} O_i O'_i}{\sqrt{\sum_{i=1}^{n} O_i^{2}}\;\sqrt{\sum_{i=1}^{n} {O'_i}^{2}}}$$
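A minimal sketch of the segment-then-compare step follows, assuming term-frequency vectors as the vector space representation. Whitespace splitting stands in for a real word segmenter backed by the power business feature lexicon, which the source does not detail; `segment` and `term_vector_cosine` are hypothetical names.

```python
import math
from collections import Counter
from typing import List


def segment(text: str) -> List[str]:
    # Stand-in for a domain-aware word segmenter: split on whitespace.
    return text.split()


def term_vector_cosine(before: str, after: str) -> float:
    """Cosine similarity of the term-frequency vectors of two texts."""
    a, b = Counter(segment(before)), Counter(segment(after))
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # one of the texts has no terms
    return dot / (norm_a * norm_b)
```

For two five-word addresses that differ in one word, four of five terms match and the function returns approximately 0.8.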
S2, calculating the overall similarity of the target data based on the data item similarity evaluation result, wherein the overall similarity comprises the following steps:
When the target data comprises N rows of data entries, the similarity of each data entry is evaluated according to S1, and the result for the i-th entry is denoted C_i. The overall similarity of the target data after the data watermark has been fully embedded is then given by:
[Formula image BDA0002807195680000063: overall similarity δ as a function of N and C_i; image not reproduced in this text]
In the formula: δ represents the overall similarity of the target data; N represents the total number of data entries in the target data; and C_i represents the data entry similarity of the i-th data entry.
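One plausible reading of the overall similarity, given that it depends only on N and the per-entry values C_i, is the arithmetic mean. The sketch below assumes that reading; the source formula itself is not available in this text, so the mean is an assumption and not a confirmed reconstruction.

```python
from typing import Sequence


def overall_similarity(entry_similarities: Sequence[float]) -> float:
    """Assumed aggregation: arithmetic mean of the per-entry similarities C_i."""
    n = len(entry_similarities)
    if n == 0:
        raise ValueError("target data must contain at least one data entry")
    return sum(entry_similarities) / n
```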
S3, when the overall similarity of the target data meets the threshold range, finishing the data watermark embedding; otherwise, the data watermark embedding is completed after the overall similarity of the target data meets the threshold range by adjusting the embedding proportion and/or position of the data watermark, and the method comprises the following steps:
when the data entry contains a single type field, the whole similarity of the target data meets a threshold range by adjusting the embedding proportion of the data watermark;
when the data entry contains multiple types of fields, the overall similarity of the target data meets a threshold range by adjusting the embedding proportion and/or position of the data watermark.
This embodiment describes in detail the methods used when the overall similarity of the target data does not meet the threshold requirement.
The first method dynamically adjusts the proportion of data watermark embedding, as follows:
when the overall similarity of the target data is larger than the maximum value in the threshold range, reducing the proportion of the data watermark embedded in the target data to a preset proportion;
and when the overall similarity of the target data is smaller than the minimum value in the threshold range, increasing the proportion of the data watermark embedded in the target data to a preset proportion.
Specifically, when the overall similarity of the target data with the data watermark embedded exceeds the maximum of the set threshold range, the embedding proportion is reduced so that the overall similarity before and after embedding is maintained; for example, the embedding proportion can be set to 50%, 30%, or 20%.
If the overall similarity of the target data with the data watermark embedded is below the minimum of the threshold range, the embedding proportion is increased so that the watermark embedding capacity is as large as possible; for example, the embedding proportion can be set to 20%, 30%, or 50%.
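The proportion rule can be sketched as a step function over a set of preset ratios. The direction of each step follows the text as written (above the maximum, lower the ratio; below the minimum, raise it). The preset list combines the example values 20%, 30%, and 50% with the initial 100%; treating them as a single ordered ladder, and the name `adjust_ratio`, are assumptions.

```python
from typing import Sequence

# Example ratios from the text plus the initial 100%, in ascending order.
PRESET_RATIOS = (0.2, 0.3, 0.5, 1.0)


def adjust_ratio(
    current: float,
    overall: float,
    low: float,
    high: float,
    presets: Sequence[float] = PRESET_RATIOS,
) -> float:
    """Pick the next embedding ratio according to the stated rule."""
    if low <= overall <= high:
        return current                      # threshold met, no change
    if overall > high:
        lower = [r for r in presets if r < current]
        return max(lower) if lower else current   # step down one preset
    higher = [r for r in presets if r > current]
    return min(higher) if higher else current     # step up one preset
```

When no further preset exists in the required direction, the function returns the current ratio unchanged, which signals that the position adjustment should be tried instead.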
The second method dynamically adjusts the position of data watermark adding, and comprises the following steps:
removing the original data watermarks in the data entries, and respectively embedding the data watermarks matched with the field types into various types of fields in the data entries according to a preset proportion;
and carrying out data item similarity evaluation on the data items embedded with the data watermarks matched with the field types, selecting the position of the field with the maximum data item similarity as the optimal position for embedding the data watermarks, and embedding the data watermarks at the optimal position.
In this embodiment, adjusting the position of data watermark embedding works as follows. When the data entries of the target data contain value, text, and natural language fields, the data watermark is embedded in the value field, the text field, and the natural language field respectively at a fixed embedding proportion, and the data entry similarity for each of these placements is calculated by the method of S1. The position with the highest data entry similarity is selected as the optimal embedding position, and the data watermark originally added to the entry is deleted. The overall similarity is then calculated from the data entry similarities and checked against the threshold requirement. In this way the watermark embedding capacity is made as large as possible while the overall similarity after embedding is preserved.
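Selecting the optimal position reduces to an argmax over candidate fields. In the sketch below, `candidates` maps each field name to the value that field would have after a type-matched watermark is embedded in it, and `similarity` is whichever per-field evaluation applies; both, along with the name `best_field`, are hypothetical. A single similarity function is used for all fields to keep the example short, whereas the method described uses a model matched to each field type.

```python
from typing import Callable, Dict


def best_field(
    original: Dict[str, str],
    candidates: Dict[str, str],
    similarity: Callable[[str, str], float],
) -> str:
    """Return the field whose watermarked value is most similar to the original."""
    if not candidates:
        raise ValueError("at least one candidate field is required")
    return max(
        candidates,
        key=lambda field: similarity(original[field], candidates[field]),
    )
```

The watermark is then embedded only in the returned field, and the entry similarity for that placement feeds into the overall similarity check.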
In this embodiment, when the data entry includes multiple types of fields, the proportion or the position of embedding the data watermark may be adjusted according to the requirement, or the proportion and the position may be adjusted at the same time, so that the overall similarity of the target data meets the threshold requirement, and finally, high concealment and high simulation after embedding the data watermark are achieved.
In this embodiment, when the data entry includes a single type field, the proportion of embedding the data watermark may be adjusted according to the requirement, so that the overall similarity of the target data meets the threshold requirement, and finally, high concealment and high simulation after embedding the data watermark are achieved.
Example 2: based on the same inventive concept, the present invention further provides a data watermark embedding system, as shown in fig. 2, including:
the data item similarity evaluation module is used for embedding a data watermark into a data item of target data and carrying out data item similarity evaluation on the data item embedded with the data watermark by adopting a preset data similarity evaluation model;
the overall similarity calculation module is used for calculating the overall similarity of the target data based on the data item similarity evaluation result;
the adjusting module is used for completing data watermark embedding when the overall similarity of the target data meets a threshold range; and when the overall similarity of the target data does not meet the threshold range, the data watermark embedding is completed by adjusting the embedding proportion and/or position of the data watermark so that the overall similarity of the target data meets the threshold range.
In this embodiment, when the data entry includes multiple types of fields, the proportion or the position of embedding the data watermark may be adjusted according to the requirement, or the proportion and the position may be adjusted at the same time, so that the overall similarity of the target data meets the threshold requirement, and finally, high concealment and high simulation after embedding the data watermark are achieved.
In this embodiment, when the data entry includes a single type field, the proportion of embedding the data watermark may be adjusted according to the requirement, so that the overall similarity of the target data meets the threshold requirement, and finally, high concealment and high simulation after embedding the data watermark are achieved.
In an embodiment, the adjusting module includes:
the first adjusting submodule is used for adjusting the embedding proportion of the data watermark when the data item contains a single type field, so that the overall similarity of the target data meets a threshold range;
and the second adjusting submodule is used for adjusting the embedding proportion and/or position of the data watermark when the data entry contains multiple types of fields, so that the overall similarity of the target data meets the threshold range.
In an embodiment, the adjusting module further includes: an embedding ratio adjustment unit for:
when the overall similarity of the target data is larger than the maximum value in the threshold range, reducing the proportion of embedding the data watermark in the data entry of the target data to a preset proportion;
and when the overall similarity of the target data is less than the minimum value in the threshold range, increasing the proportion of embedding the data watermark in the data entry of the target data to a preset proportion.
In an embodiment, the adjusting module further includes: an embedded position adjusting unit for:
removing the original data watermarks in the data entries, and respectively embedding the data watermarks matched with the field types into various types of fields in the data entries according to a preset proportion;
and carrying out data item similarity evaluation on the data items embedded with the data watermarks matched with the field types, selecting the position of the field with the maximum data item similarity as the optimal position for embedding the data watermarks, and embedding the data watermarks at the optimal position.
In this embodiment, the field types in the data entry include any one or more of the following:
a value field, a text field, and a natural language field.
In this embodiment, the data entry similarity evaluation module includes:
the data watermark embedding unit is specifically configured to:
when the data entry comprises a value field, embedding a numerical data watermark in the value field;
when the data entry comprises a text field, embedding a character text type data watermark in the text field;
when a natural language field is included in the data entry, a natural language type data watermark is embedded in the natural language field.
The data entry similarity evaluation unit is specifically configured to:
when a numeric data watermark is embedded in a value field of the data entry, deconstructing the values before and after embedding into segments, and evaluating data entry similarity with a Euclidean distance vector similarity evaluation model;
when a character text data watermark is embedded in a text field of the data entry, deconstructing the text before and after embedding into ASCII code values, and evaluating data entry similarity with a cosine vector similarity evaluation model;
and when a natural language data watermark is embedded in a natural language field of the data entry, applying a vector space model to deconstruct and segment the natural language field before and after embedding, and evaluating data entry similarity on the segmentation result with a cosine vector similarity evaluation model.
In this embodiment, the overall similarity evaluation of the target data is performed as follows:
[Formula image BDA0002807195680000101: overall similarity δ as a function of N and C_i; image not reproduced in this text]
In the formula: δ represents the overall similarity of the target data; N represents the total number of data entries in the target data; and C_i represents the data entry similarity of the i-th data entry.
To achieve high concealment and high simulation after a data watermark is embedded in target data, the embodiments of the invention select an appropriate watermark embedding proportion and distribution strategy according to the similarity evaluation results of the different data watermark algorithms.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The present invention is not limited to the above embodiments; any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention falls within the scope of the claims of the present invention, which is pending.

Claims (10)

1. A method of data watermark embedding, comprising:
embedding a data watermark in a data entry of target data, and performing data entry similarity evaluation on the data entry embedded with the data watermark by using a preset data similarity evaluation model;
calculating the overall similarity of the target data based on the data entry similarity evaluation result;
when the overall similarity of the target data meets a threshold range, completing data watermark embedding; otherwise, adjusting the embedding proportion and/or position of the data watermark until the overall similarity of the target data meets the threshold range, and then completing data watermark embedding.
2. The method of claim 1, wherein the adjusting the embedding proportion and/or the embedding position of the data watermark to make the overall similarity of the target data meet the threshold range comprises:
when the data entry contains fields of a single type, adjusting the embedding proportion of the data watermark so that the overall similarity of the target data meets the threshold range;
when the data entry contains fields of multiple types, adjusting the embedding proportion and/or position of the data watermark so that the overall similarity of the target data meets the threshold range.
3. The method of claim 2, wherein the adjusting the embedding proportion of the data watermark comprises:
when the overall similarity of the target data is larger than the maximum value in the threshold range, reducing the proportion of embedding the data watermark in the data entry of the target data to a preset proportion;
and when the overall similarity of the target data is less than the minimum value in the threshold range, increasing the proportion of embedding the data watermark in the data entry of the target data to a preset proportion.
4. The method of claim 2, wherein the adjusting the embedding location of the data watermark comprises:
removing the original data watermarks from the data entries, and embedding data watermarks matched with the field types into the respective types of fields in the data entries according to a preset proportion;
and performing data entry similarity evaluation on the data entries embedded with the data watermarks matched with the field types, selecting the position of the field with the maximum data entry similarity as the optimal position for embedding the data watermark, and embedding the data watermark at the optimal position.
5. The method of claim 2 or 4, wherein the field types in the data entry include any one or more of:
a value field, a text field, and a natural language field.
6. The method of claim 5, wherein embedding the data watermark in the data entry of the target data comprises:
when the data entry comprises a value field, embedding a numerical data watermark in the value field;
when the data entry comprises a text field, embedding a character text type data watermark in the text field;
when a natural language field is included in the data entry, a natural language type data watermark is embedded in the natural language field.
7. The method of claim 1, wherein the performing data entry similarity evaluation on the data entry embedded with the data watermark by using a preset data similarity evaluation model comprises:
when a numerical data watermark is embedded in a value field of the data entry, deconstructing and segmenting the numerical values before and after the data watermark is embedded, and performing data entry similarity evaluation through a Euclidean distance vector data similarity evaluation model;
when a character text type data watermark is embedded in a text field of the data entry, deconstructing the ASCII code values before and after the data watermark is embedded, and performing data entry similarity evaluation through a cosine vector data similarity evaluation model;
and when a natural language type data watermark is embedded in a natural language field of the data entry, applying a vector space model to deconstruct and segment the natural language field before and after the data watermark is embedded, and performing data entry similarity evaluation on the segmentation result through a cosine vector data similarity evaluation model.
8. The method of claim 1, wherein the overall similarity of the target data is calculated according to the following formula:
Figure FDA0002807195670000021
in the formula: δ represents the similarity of the entire target data; n represents the total number of data entries in the target data; c_i represents the data entry similarity of the i-th data entry.
9. A data watermark embedding system, comprising:
the data entry similarity evaluation module is used for embedding a data watermark in a data entry of target data and performing data entry similarity evaluation on the data entry embedded with the data watermark by using a preset data similarity evaluation model;
the overall similarity calculation module is used for calculating the overall similarity of the target data based on the data entry similarity evaluation result;
the adjusting module is used for completing data watermark embedding when the overall similarity of the target data meets a threshold range, and, when the overall similarity of the target data does not meet the threshold range, for adjusting the embedding proportion and/or position of the data watermark until the overall similarity of the target data meets the threshold range and then completing data watermark embedding.
10. The system of claim 9, wherein the adjustment module comprises:
the first adjusting submodule is used for adjusting the embedding proportion of the data watermark when the data entry contains fields of a single type, so that the overall similarity of the target data meets the threshold range;
and the second adjusting submodule is used for adjusting the embedding proportion and/or position of the data watermark when the data entry contains multiple types of fields, so that the overall similarity of the target data meets the threshold range.
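The three per-field evaluations named in claim 7 can be illustrated with a minimal sketch. Several details are assumptions that the claims do not specify: numeric values are deconstructed into digit vectors, the Euclidean distance d is mapped to a similarity as 1 / (1 + d), vectors of unequal length are zero-padded, and natural language is segmented on whitespace (Chinese text would need a real word segmenter). All function names are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def _pad(u: Sequence[float], v: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Zero-pad the shorter vector so both have the same length (assumption)."""
    size = max(len(u), len(v))
    return list(u) + [0] * (size - len(u)), list(v) + [0] * (size - len(v))


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def numeric_similarity(before: str, after: str) -> float:
    """Value field: Euclidean distance over digit vectors, mapped to (0, 1]."""
    digits = "0123456789"
    u, v = _pad([int(c) for c in before if c in digits],
                [int(c) for c in after if c in digits])
    distance = math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
    return 1.0 / (1.0 + distance)  # assumed distance-to-similarity mapping


def text_similarity(before: str, after: str) -> float:
    """Text field: cosine similarity over character code values."""
    u, v = _pad([ord(c) for c in before], [ord(c) for c in after])
    return cosine(u, v)


def language_similarity(before: str, after: str) -> float:
    """Natural language field: cosine similarity over term-frequency vectors."""
    counts_before = Counter(before.split())
    counts_after = Counter(after.split())
    vocabulary = sorted(set(counts_before) | set(counts_after))
    return cosine([counts_before[w] for w in vocabulary],
                  [counts_after[w] for w in vocabulary])


if __name__ == "__main__":
    print(numeric_similarity("100", "103"))
    print(text_similarity("abc", "abd"))
    print(language_similarity("meter reading normal", "meter reading abnormal"))
```

Each function returns 1.0 (up to floating-point rounding) when the field is unchanged, so the per-entry scores can be fed directly into the overall similarity calculation of claim 8.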
CN202011375889.3A 2020-11-30 2020-11-30 Data watermark embedding method and system Pending CN112559984A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011375889.3A CN112559984A (en) 2020-11-30 2020-11-30 Data watermark embedding method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011375889.3A CN112559984A (en) 2020-11-30 2020-11-30 Data watermark embedding method and system

Publications (1)

Publication Number Publication Date
CN112559984A true CN112559984A (en) 2021-03-26

Family

ID=75045523

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011375889.3A Pending CN112559984A (en) 2020-11-30 2020-11-30 Data watermark embedding method and system

Country Status (1)

Country Link
CN (1) CN112559984A (en)

Similar Documents

Publication Publication Date Title
CN107368259A (en) Method and apparatus for writing business data into a blockchain system
CN111079174A (en) Power consumption data desensitization method and system based on anonymization and differential privacy technology
CN107015985A (en) Data storage and acquisition method and device
CN103164393B (en) Report formula processing method and system
CN108509723A (en) Artificial neural network-based method for evaluating the performance gain of an LRU cache prefetch mechanism
CN113449753B (en) Service risk prediction method, device and system
CN114356919A (en) Watermark embedding method, tracing method and device for structured database
CN113807940A (en) Information processing and fraud identification method, device, equipment and storage medium
CN113343677B (en) Intention identification method and device, electronic equipment and storage medium
CN112990583B (en) Method and equipment for determining model entering characteristics of data prediction model
CN110069781A (en) Entity tag recognition method and related device
CN114036581A (en) Privacy calculation method based on neural network model
CN112559984A (en) Data watermark embedding method and system
CN112613045B (en) Method and system for embedding data watermark of target data
CN114511330B (en) Ethereum Ponzi scheme detection method and system based on improved CNN-RF
CN114298882A (en) Watermark embedding method and tracing method for CAD data and electronic equipment
CN111027307B (en) Method and device for judging content influencing judgment result in judgment document
CN107203545A (en) Data processing method and device
CN117272333B (en) Relational database watermark embedding and tracing method
CN110990869A (en) Electric power big data desensitization method applied to privacy protection
CN110517010A (en) Data processing method, system and storage medium
CN113742495B (en) Rating feature weight determining method and device based on prediction model and electronic equipment
CN113656266B (en) Performance prediction method and system for government enterprise service system
CN112685418B (en) Method and system for realizing intelligent scheduling charging engine
KR101963822B1 (en) Method and apparatus for catergorizing program

Legal Events

Date Code Title Description
PB01 Publication
CB02 Change of applicant information

Address after: 210000 9 Aoti street, Jianye District, Nanjing City, Jiangsu Province

Applicant after: State Grid Jiangsu Electric Power Co.,Ltd. Marketing Service Center

Applicant after: State Grid Smart Grid Research Institute Co.,Ltd.

Applicant after: ANHUI JIYUAN SOFTWARE Co.,Ltd.

Address before: 210000 9 Aoti street, Jianye District, Nanjing City, Jiangsu Province

Applicant before: State Grid Jiangsu Electric Power Co.,Ltd. Marketing Service Center

Applicant before: GLOBAL ENERGY INTERCONNECTION RESEARCH INSTITUTE Co.,Ltd.

Applicant before: ANHUI JIYUAN SOFTWARE Co.,Ltd.

TA01 Transfer of patent application right

Effective date of registration: 20230511

Address after: 210000 9 Aoti street, Jianye District, Nanjing City, Jiangsu Province

Applicant after: State Grid Jiangsu Electric Power Co.,Ltd. Marketing Service Center

Applicant after: ANHUI JIYUAN SOFTWARE Co.,Ltd.

Address before: 210000 9 Aoti street, Jianye District, Nanjing City, Jiangsu Province

Applicant before: State Grid Jiangsu Electric Power Co.,Ltd. Marketing Service Center

Applicant before: State Grid Smart Grid Research Institute Co.,Ltd.

Applicant before: ANHUI JIYUAN SOFTWARE Co.,Ltd.

SE01 Entry into force of request for substantive examination