CN117725565B - Data tracing method, device, equipment and medium based on digital watermark - Google Patents

Publication number
CN117725565B
Authority
CN
China
Prior art keywords
data
watermark
user
distributed
users
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202311649634.5A
Other languages
Chinese (zh)
Other versions
CN117725565A (en
Inventor
王齐
周爱华
王一清
郑真
蒋静
杨如侠
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
State Grid Smart Grid Research Institute Co ltd
Original Assignee
State Grid Smart Grid Research Institute Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by State Grid Smart Grid Research Institute Co ltd filed Critical State Grid Smart Grid Research Institute Co ltd
Priority to CN202311649634.5A priority Critical patent/CN117725565B/en
Publication of CN117725565A publication Critical patent/CN117725565A/en
Application granted granted Critical
Publication of CN117725565B publication Critical patent/CN117725565B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Editing Of Facsimile Originals (AREA)
  • Image Processing (AREA)

Abstract

The invention relates to the technical field of data tracing, and in particular to a data tracing method, device, equipment and medium based on digital watermarking. The method comprises: acquiring data to be distributed; when an upper-level user distributes the data, embedding a watermark into the data to be distributed according to the sensitivity level of the data and the user identity information, so as to obtain watermark-embedded data and a user watermark; splitting the watermark-embedded data based on the user identity information of the distributing user and the attribute column information of the watermark-embedded data, and distributing the split data to lower-level users; and, when data is leaked, locating the leaking user according to the attribute column information of the leaked data and the user watermark. By combining data sensitivity level evaluation, watermark embedding and database table splitting, all suspicious users can be identified quickly when a data leakage event occurs, and the exact identity of the leaker is obtained through split comparison and watermark extraction comparison. Accurate localization of data leakage is thereby achieved, and the method has broad application prospects.

Description

Data tracing method, device, equipment and medium based on digital watermark
Technical Field
The invention relates to the technical field of data tracing, and in particular to a data tracing method, device, equipment and medium based on digital watermarking.
Background
With the rapid development of informatization and digitalization, data security has become increasingly important to enterprises. The power grid is key national infrastructure, and its data security has become an important component of national security. Traditional data security solutions rely mainly on cryptography to protect the confidentiality and integrity of data, and cannot provide an effective data tracing and accountability mechanism when data is leaked unexpectedly.
Moreover, with the development of smart grids, data is growing larger in scale, wider in sources and more complex in type, which makes it almost impossible to avoid information leakage entirely. In addition, most grid data is stored and distributed in the form of relational databases with mixed data types, whereas most existing digital watermark tracing schemes are limited to particular data types and cannot fully meet the requirements of practical applications.
Disclosure of Invention
In view of the above, the invention provides a data tracing method, device, equipment and medium based on digital watermarking, which are used for solving the problem that most of the existing data tracing methods cannot completely meet the actual application requirements in the field of power grid data security.
In a first aspect, the present invention provides a data tracing method based on digital watermarking, the method comprising: acquiring data to be distributed, where the data to be distributed is distributed among multiple users at multiple levels and is sent by upper-level users to lower-level users; when an upper-level user distributes the data, embedding a watermark into the data to be distributed according to the sensitivity level of the data and the user identity information, to obtain watermark-embedded data and a user watermark; splitting the watermark-embedded data based on the user identity information of the distributing user and the attribute column information of the watermark-embedded data, and distributing the split data to lower-level users; and, when data leakage occurs, locating the leaking user according to the attribute column information of the leaked data and the user watermark.
With the data tracing method based on digital watermarking provided by the invention, data sensitivity level evaluation, watermark embedding and database table splitting are combined so that, when a data leakage event occurs, all suspicious users are identified quickly and the exact identity of the leaker is obtained through split comparison and watermark extraction comparison. Accurate localization of data leakage is thereby achieved, and the method has broad application prospects.
In an alternative embodiment, embedding a watermark into the data to be distributed according to the sensitivity level of the data and the user identity information, to obtain watermark-embedded data and a user watermark, includes: determining the data that requires watermark embedding according to the sensitivity level of the data to be distributed; and generating a user watermark from the user identity information and embedding it into the data that requires watermark embedding, to obtain the watermark-embedded data.
In this embodiment, selecting the data to be watermarked according to its sensitivity level reduces the impact of watermark embedding on data precision.
In an alternative embodiment, generating a user watermark from the user identity information and embedding it into the data that requires watermark embedding, to obtain the watermark-embedded data, includes: computing a hash of the user identity information to generate the user watermark; and determining a preset data field according to the data type of the data that requires watermark embedding, and embedding the watermark by replacing the preset data field with the user watermark, to obtain the watermark-embedded data.
In this embodiment, generating the user watermark by hashing the user identity information ties each watermark to its user, and choosing different preset data fields for different data types improves the concealment of the embedded watermark.
In an alternative embodiment, splitting the watermark-embedded data based on the user identity information of the distributing user and the attribute column information of the watermark-embedded data, and distributing the split data to lower-level users, includes: generating a user sequence, based on a hash function, from the attribute column information of the watermark-embedded data, the user identity information of the distributing user and the user identity information of the receiving user; splitting the attribute columns of the watermark-embedded data according to the user sequence to obtain the split data; and sending the split data to the lower-level users.
In this embodiment, the user sequence is generated from the user identity information and the attribute column information and is used to split the data, so that different users correspond to different splitting manners, which facilitates subsequent data tracing.
In an alternative embodiment, when a lower-level user that receives the split data has lower-level users of its own, the method further includes, after the split data is distributed to that user: taking the lower-level user that received the split data as the upper-level user, and repeating the steps of watermark embedding, data splitting and distribution of the split data.
In this embodiment, repeating the steps of watermark embedding, data splitting and distribution of the split data allows data to be distributed across users at different levels.
In an alternative embodiment, locating the leaking user according to the attribute column information of the leaked data and the user watermark includes: determining suspicious users according to the relationship between the attribute column information of the leaked data and the user sequences, together with the hierarchical relationship of the users; and locating the leaking user according to the result of comparing the user watermarks of the suspicious users with the leaked data.
In an alternative embodiment, locating the leaking user according to the result of comparing the user watermarks of the suspicious users with the leaked data includes: comparing the user watermark of each suspicious user with the preset data field of each data unit in the leaked data to determine the data units that carry that user's watermark; and locating the leaking user according to the number of data units in the leaked data that carry a suspicious user's watermark.
In this embodiment, suspicious users are first determined from the user sequences, and the leaking user is then identified among them from the user watermarks. In a multi-level distribution process, the watermark of a lower-level user overwrites most of the watermark information of the upper-level users. This locating procedure therefore yields the identity of the last holder of the leaked data, which determines the identity of the leaker.
In a second aspect, the present invention provides a data tracing device based on digital watermarking, the device including: a data acquisition module for acquiring data to be distributed, where the data to be distributed is distributed among multiple users at multiple levels and is sent by upper-level users to lower-level users; a watermark embedding module for embedding, when an upper-level user distributes the data, a watermark into the data to be distributed according to the sensitivity level of the data and the user identity information, to obtain watermark-embedded data and a user watermark; a splitting module for splitting the watermark-embedded data based on the user identity information of the distributing user and the attribute column information of the watermark-embedded data, and distributing the split data to lower-level users; and a positioning module for locating the leaking user, when data leakage occurs, according to the attribute column information of the leaked data and the user watermark.
In a third aspect, the present invention provides a computer device comprising a memory and a processor that are communicatively connected to each other, where the memory stores computer instructions and the processor executes the computer instructions to perform the digital-watermark-based data tracing method of the first aspect or any one of its corresponding embodiments.
In a fourth aspect, the present invention provides a computer readable storage medium, on which computer instructions are stored, the computer instructions being configured to cause a computer to perform the digital watermark-based data tracing method according to the first aspect or any one of the embodiments corresponding thereto.
Drawings
To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. The drawings described below show some embodiments of the present invention, and a person skilled in the art can obtain other drawings from them without inventive effort.
Fig. 1 is a flow diagram of a data tracing method based on digital watermarking according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of a Tree-Distr according to an embodiment of the present invention;
Fig. 3 is a flow chart of a data tracing method based on digital watermarking according to an embodiment of the present invention;
Fig. 4 is a flow chart of another data tracing method based on digital watermarking according to an embodiment of the present invention;
Fig. 5 is a block diagram of a data tracing apparatus based on digital watermarking according to an embodiment of the present invention;
Fig. 6 is a schematic diagram of a hardware structure of a computer device according to an embodiment of the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
According to an embodiment of the present invention, an embodiment of a data tracing method based on digital watermarking is provided. It should be noted that the steps illustrated in the flowcharts of the drawings may be performed in a computer system, for example as a set of computer-executable instructions, and that, although a logical order is shown in the flowcharts, in some cases the steps may be performed in a different order.
This embodiment provides a data tracing method based on digital watermarking. Fig. 1 is a flowchart of the method according to an embodiment of the present invention. As shown in Fig. 1, the flow includes the following steps:
Step S101, acquiring data to be distributed, where the data to be distributed is distributed among multiple users at multiple levels and is sent by upper-level users to lower-level users. Digital-watermark-based data tracing adds a watermark to the data so that, when the data is leaked, the leaking user can be located quickly from the watermark. Specifically, in this embodiment, data is distributed among multiple users, and the users are divided into levels according to the distribution relationship of the data. The owner of the data is placed at level 0 and is denoted user 0. The owner distributes the data to the first-level users who need it, denoted user 1-1, user 1-2, and so on. The first-level users distribute the data to the second-level users who need it, denoted user 2-1, user 2-2, and so on. In general, the nth user of the mth level is denoted user m-n. Thus, when data is distributed between two levels, for example between the first level and the second level, the first level is the upper level and the second level is the lower level.
In this embodiment, the data to be distributed is specifically power distribution and utilization data and distributed new energy data. In other embodiments it may be other data; the invention places no specific limitation on the data to be distributed.
Step S102, when an upper-level user distributes the data, embedding a watermark into the data to be distributed according to the sensitivity level of the data and the user identity information, to obtain watermark-embedded data and a user watermark. The data to be distributed, in particular distributed new energy data resources, is large in scale, wide in sources and varied in type, and its privacy sensitivity differs from one resource to another. The sensitivity level of the data is therefore evaluated before watermark embedding, and the watermark is then embedded according to the evaluation result and the user identity information, which reduces the impact on data precision. The sensitivity level may be evaluated using related techniques.
Step S103, splitting the watermark-embedded data based on the user identity information of the distributing user and the attribute column information of the watermark-embedded data, and distributing the split data to lower-level users. Specifically, in this embodiment the data to be distributed is a database table. The table is split by attribute column, and the specific splitting manner may be determined from user identity information, for example so that different users correspond to different splitting manners.
Step S104, when data leakage occurs, locating the leaking user according to the attribute column information of the leaked data and the user watermark. Specifically, after a leak occurs the leaked data can be obtained, and the splitting manner is determined from the attribute column information in the leaked data, which identifies the users who may have leaked it. The watermark information in the leaked data is then compared with the user watermarks to locate the specific user who leaked the data.
With the data tracing method based on digital watermarking provided by this embodiment, data sensitivity level evaluation, watermark embedding and database table splitting are combined so that, when a data leakage event occurs, all suspicious users are identified quickly and the exact identity of the leaker is obtained through split comparison and watermark extraction comparison. Accurate localization of data leakage is thereby achieved, and the method has broad application prospects.
In this embodiment, a data tracing method based on digital watermarking is provided, and the process includes the following steps:
Step S201, acquiring data to be distributed, where the data to be distributed is distributed among multiple users at multiple levels and is sent by upper-level users to lower-level users. For details, see step S101 of the embodiment shown in Fig. 1; they are not repeated here.
Step S202, when an upper-level user distributes the data, embedding a watermark into the data to be distributed according to the sensitivity level of the data and the user identity information, to obtain watermark-embedded data and a user watermark.
Specifically, the step S202 includes:
Step S2021, determining the data that requires watermark embedding according to the sensitivity level of the data to be distributed. Specifically, in this embodiment the sensitivity level of the data is evaluated from the sensitive words it contains. For example, for power distribution and utilization data, distributed new energy data and the like, a corresponding sensitive word library can be established for each application scenario. The sensitivity level of a data column is evaluated by searching that column for the sensitive words in the library. The calculation formula is as follows:
$\omega = x_1\alpha_1 + x_2\alpha_2 + \dots + x_n\alpha_n$
$x_1 + x_2 + \dots + x_n = 1$
where $\omega$ represents the sensitivity level, $\alpha_i$ represents the sensitivity value of the i-th sensitive word, and $x_i$ represents the calculation coefficient corresponding to the i-th sensitive word.
In an alternative embodiment, an example of the sensitive word library is shown in Table 1 below:
TABLE 1
It should be noted that the sensitivity value of a sensitive word may be determined from expert experience, and the calculation coefficient may be determined from the data type corresponding to the sensitive word; for example, different data types may be given different coefficients according to how sensitive they are. After the sensitivity level of a data column is determined, it is compared with a set sensitivity threshold. When the sensitivity level is smaller than the threshold, the column is treated as a non-sensitive data column and is used as data that requires watermark embedding. The sensitivity threshold may be set according to the application scenario of the power distribution and utilization data and distributed new energy data.
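As a rough illustration of the weighted sum and threshold test described above, the following Python sketch scores a column and selects the non-sensitive columns. The word list, sensitivity values, raw weights and the way a column is represented as a text string are all hypothetical assumptions; they are not taken from Table 1.

```python
# Hypothetical sensitive-word library: word -> (sensitivity value alpha, raw weight).
# The words, values, and weights are illustrative assumptions only.
SENSITIVE_WORDS = {
    "password": (0.9, 3.0),
    "account": (0.7, 2.0),
    "address": (0.4, 1.0),
}


def sensitivity_level(column_text, library=SENSITIVE_WORDS):
    """Return omega = sum(x_i * alpha_i) over the sensitive words found in a column.

    The raw weights of the matched words are normalised so that the
    coefficients x_i sum to 1, as the constraint in the text requires.
    """
    text = column_text.lower()
    matched = [(alpha, weight) for word, (alpha, weight) in library.items() if word in text]
    if not matched:
        return 0.0
    total_weight = sum(weight for _, weight in matched)
    return sum(alpha * weight / total_weight for alpha, weight in matched)


def columns_to_watermark(columns, threshold):
    """Select the non-sensitive columns (omega below the threshold) for embedding."""
    return [name for name, text in columns.items() if sensitivity_level(text) < threshold]
```

Normalising the weights of the matched words is only one possible way to satisfy the constraint that the coefficients sum to 1; the source does not say how the coefficients are assigned in practice.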
Step S2022, generating a user watermark according to the user identity information, and performing watermark embedding on the data needing watermark embedding by adopting the user watermark to obtain the data embedded with the watermark.
Specifically, the step S2022 includes:
Step a1, computing a hash of the user identity information to generate the user watermark. Specifically, the user identity information here is that of the user distributing the data, i.e. the upper-level user. Any information that distinguishes different users may be used, for example a user identifier ID. A hash function Hash is applied to the ID to obtain the hash value ζ = Hash(ID), which is used as the user watermark. It should be noted that, after the user watermark is obtained, it may be stored with a trusted third party for subsequent watermark detection.
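A minimal sketch of step a1 is given below. SHA-256 is an assumed choice, since the text only says "a hash function", and the function name and the bit-string representation of the watermark are likewise hypothetical.

```python
import hashlib


def user_watermark(user_id: str) -> str:
    """Return the user watermark as a bit string derived from Hash(ID).

    SHA-256 is an assumption; any hash function that distinguishes
    users would fit the description in the text.
    """
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    # Expand each byte into 8 bits, most significant bit first.
    return "".join(f"{byte:08b}" for byte in digest)
```

Representing the watermark as a bit string makes it straightforward to take "the first P bits" or "the first Q bits" in the embedding step.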
Step a2, determining a preset data field according to the data type of the data that requires watermark embedding, and embedding the watermark by replacing the preset data field with the user watermark, to obtain the watermark-embedded data. Specifically, different preset data fields may be set for different data types. For example, power distribution and utilization data and distributed new energy data consist mainly of numeric data and text data. For text data, the first P bits of the user watermark replace the binary value of a P-bit redundant field present in the text. For numeric data, the first Q bits of the user watermark replace the Q least significant bits of the value. Watermark embedding is thus achieved through data replacement.
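The numeric case can be sketched as follows, assuming non-negative integer values and a watermark given as a bit string; both helper names are hypothetical. The text case is omitted because the source does not specify what the redundant field in text data is.

```python
def embed_in_integer(value: int, bits: str, q: int) -> int:
    """Replace the q least significant bits of value with the first q watermark bits."""
    if value < 0 or q < 1:
        raise ValueError("this sketch handles non-negative integers and q >= 1 only")
    mask = (1 << q) - 1
    # Clear the low q bits, then write the watermark prefix into them.
    return (value & ~mask) | int(bits[:q], 2)


def extract_from_integer(value: int, q: int) -> str:
    """Read back the q least significant bits as a bit string."""
    return format(value & ((1 << q) - 1), f"0{q}b")
```

Because only the low Q bits change, an embedded value differs from the original by less than 2^Q, which is why a small Q limits the impact on data precision.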
Step S203, splitting the data embedded with the watermark based on the user identity information of the distributed data and the attribute column information of the data embedded with the watermark, and distributing the split data to users at lower layers;
Specifically, the step S203 includes:
Step S2031, generating a user sequence, based on a hash function, from the attribute column information of the watermark-embedded data, the user identity information of the user distributing the data and the user identity information of the user receiving the data. The distributing user is the upper-level user and the receiving user is the lower-level user. For example, when user 0 distributes data to a first-level user 1-n, the identity information of the distributing user is that of user 0 and the identity information of the receiving user is that of user 1-n. In this embodiment, when generating the user sequence, the total number of attribute columns (data columns) other than the primary key in the watermark-embedded data is first determined and denoted N. A hash function is then applied to the identity of the distributing user, ID_0, the identity of the receiving user, ID_{1-n}, and the table name A_name of the watermark-embedded data, to generate a {0,1} user sequence of length N.
Step S2032, splitting the attribute columns of the watermark-embedded data according to the user sequence to obtain the split data. Specifically, after the user sequence is obtained, it is used to split the watermark-embedded data A_1 into two sub-tables, and the sequence records which attribute columns each sub-table retains. The first sub-table retains the primary key of A_1 and all attribute columns whose position in the sequence holds a 1; the second sub-table retains the primary key of A_1 and all attribute columns whose position in the sequence holds a 0. The sequence is also stored with the trusted third party for subsequent data tracing.
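One way the user sequence of step S2031 could be derived is sketched below. SHA-256, the "|" separator and the counter used to extend the output beyond 256 bits are assumptions; the text only requires a hash of the two user identities and the table name that yields N bits.

```python
import hashlib


def user_sequence(distributor_id: str, receiver_id: str, table_name: str, n: int) -> list[int]:
    """Derive a {0,1} sequence of length n from the two user IDs and the table name.

    A counter is appended to the hashed material so that more than
    256 bits can be produced when the table has many columns.
    """
    bits = []
    counter = 0
    while len(bits) < n:
        material = f"{distributor_id}|{receiver_id}|{table_name}|{counter}".encode("utf-8")
        for byte in hashlib.sha256(material).digest():
            bits.extend((byte >> shift) & 1 for shift in range(7, -1, -1))
        counter += 1
    return bits[:n]
```

Since the receiver's identity is part of the hashed material, two receivers of the same table will in general obtain different sequences and therefore different splits.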
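The column split of step S2032 can be sketched with a table held as a list of dictionaries. This in-memory representation and the function name are assumptions made for illustration; a real deployment would operate on database tables.

```python
def split_table(rows, primary_key, sequence):
    """Split rows (a list of dicts) into two sub-tables by a {0,1} sequence.

    The sequence is aligned with the non-key columns in their original order.
    Both sub-tables keep the primary key; a 1 sends the column to the first
    sub-table and a 0 sends it to the second.
    """
    columns = [name for name in rows[0] if name != primary_key]
    if len(columns) != len(sequence):
        raise ValueError("sequence length must equal the number of non-key columns")
    ones = [name for name, bit in zip(columns, sequence) if bit == 1]
    zeros = [name for name, bit in zip(columns, sequence) if bit == 0]
    first = [{primary_key: row[primary_key], **{c: row[c] for c in ones}} for row in rows]
    second = [{primary_key: row[primary_key], **{c: row[c] for c in zeros}} for row in rows]
    return first, second
```

Keeping the primary key in both sub-tables is what allows them to be joined back into the original table later.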
Step S2033, the split data is sent to the lower-tier users.
The splitting and distribution process will be described below taking the example of user 0 distributing data to user 1-1 and to user 1-2:
Assume that the watermark-embedded data A_1 = {username, account number, password, phone, mailbox, address, device number, voltage level, power usage} is a table in user 0's original database, that the watermark has already been added to the non-sensitive information, and that username is the primary key.
When user 0 sends A_1 to user 1-1:
According to step S2031, a split sequence is obtained by calculation, and A_1 is then split into two sub-tables according to it. The attribute columns of the two sub-tables after splitting are shown in Table 2 below (1 indicates that the corresponding attribute column is included).
TABLE 2
When user 0 sends A_1 to user 1-2:
According to step S2031, a split sequence is obtained by calculation, and A_1 is then split into two sub-tables according to it. The attribute columns of the two sub-tables after splitting are shown in Table 3 below (1 indicates that the corresponding attribute column is included).
TABLE 3
It should be noted that, when a lower-level user that has received the split data has lower-level users of its own, the method further includes, after the split data is distributed to that user: taking the lower-level user that received the split data as the upper-level user, and repeating the steps of watermark embedding, data splitting and distribution of the split data. Specifically, after upper-level user 0 sends the data to lower-level user 1-n, if a next-level user 2-n needs the data, user 1-n merges the two sub-tables back into the original table A_1 according to the split sequence stored with the third party, then adds watermark information and performs watermark embedding and data splitting in the same manner as steps S202 to S203, and sends the split data to user 2-n.
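The merge that a lower-level user performs before redistributing might look like the sketch below. It assumes the sub-tables are lists of dictionaries sharing a primary key and that the stored split sequence is used to restore the original column order; the function name is hypothetical.

```python
def merge_sub_tables(first, second, primary_key, sequence):
    """Rebuild the original table from two sub-tables and the stored split sequence.

    A 1 in the sequence takes the next column from the first sub-table and
    a 0 takes the next column from the second, restoring the column order.
    """
    ones = iter([c for c in first[0] if c != primary_key])
    zeros = iter([c for c in second[0] if c != primary_key])
    order = [primary_key] + [next(ones) if bit == 1 else next(zeros) for bit in sequence]
    second_by_key = {row[primary_key]: row for row in second}
    merged = []
    for row in first:
        combined = {**row, **second_by_key[row[primary_key]]}
        merged.append({name: combined[name] for name in order})
    return merged
```

After merging, the lower-level user would act as the upper-level user and repeat the embedding and splitting steps for its own recipients.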
In this embodiment, a Tree structure diagram as shown in fig. 2 is used to describe a hierarchical relationship of data distribution, and is denoted as Tree-Distr. The data stored in the nodes of the tree structure diagram is the code number of each user participating in distribution, wherein the root node represents the owner of the original data, and the father node of each child node represents the data distributor of the previous level.
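A minimal sketch of a Tree-Distr node is shown below. The class name, field names and the `path_from_owner` helper are hypothetical; the source only states that each node stores a user's code, the root is the data owner, and each parent is the distributor at the previous level.

```python
from dataclasses import dataclass, field


@dataclass
class DistrNode:
    """One node of the distribution tree: a user code plus links to parent and children."""
    user: str
    parent: "DistrNode | None" = field(default=None, repr=False, compare=False)
    children: list = field(default_factory=list)

    def distribute_to(self, user):
        """Record that this user distributed the data to a lower-level user."""
        child = DistrNode(user, parent=self)
        self.children.append(child)
        return child

    def path_from_owner(self):
        """Return the chain of user codes from the data owner down to this user."""
        node, path = self, []
        while node is not None:
            path.append(node.user)
            node = node.parent
        return path[::-1]
```

For example, building `DistrNode("0")`, then calling `distribute_to("1-1")` and `distribute_to("2-1")` in sequence gives a chain whose `path_from_owner()` lists every distributor above user 2-1.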
Step S204, when data leakage occurs, the leakage user is positioned according to the leakage data attribute column information and the user watermark.
Specifically, the step S204 includes:
Step S2041, determining suspicious users according to the relationship between the attribute column information of the leaked data and the user sequences, together with the hierarchical relationship of the users. Specifically, once a data leak is confirmed, the attribute column information of the leaked data is obtained and compared with the user sequences stored with the third party, to determine which user sequence corresponds to a splitting manner that contains the attribute columns of the leaked data. The users corresponding to those sequences are the suspicious users.
Step S2042, locating the leaking user according to the result of comparing the user watermarks of the suspicious users with the leaked data. This specifically includes: comparing the user watermark of each suspicious user with the preset data field of each data unit in the leaked data to determine the data units that carry that user's watermark; and locating the leaking user according to the number of data units in the leaked data that carry a suspicious user's watermark. Specifically, the user watermark of suspicious user k, stored with the trusted third party, is compared with the redundant field or least-significant-bit region of every data unit in the non-sensitive data columns of the leaked data. When the similarity between the two is higher than a preset threshold, the data unit is considered to carry the watermark of user k. When the number of data units in a data column in which the watermark is detected exceeds y% of the total number of data units in that column, the leaked data is determined to contain the watermark of user k, and user k is identified as the leaking user. In a multi-level distribution process, the watermark of a lower-level user overwrites most of the watermark information of the upper-level users. This procedure therefore yields the identity of the last holder of the leaked data, which determines the identity of the leaker.
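The comparison in step S2041 can be sketched as a subset test between the leaked columns and the column sets implied by each stored sequence. The data layout (a dictionary from receiving user to sequence) and the subset criterion are assumptions, and the sketch leaves out the hierarchical relationship of the users.

```python
def suspicious_users(leaked_columns, primary_key, all_columns, stored_sequences):
    """Return the receivers whose split could have produced the leaked columns.

    stored_sequences maps a receiving user to its {0,1} sequence, aligned
    with all_columns (the non-key columns of the table in original order).
    """
    leaked = set(leaked_columns) - {primary_key}
    suspects = []
    for receiver, sequence in stored_sequences.items():
        ones = {c for c, bit in zip(all_columns, sequence) if bit == 1}
        zeros = {c for c, bit in zip(all_columns, sequence) if bit == 0}
        # The leaked columns must fit entirely inside one of the two sub-tables.
        if leaked and (leaked <= ones or leaked <= zeros):
            suspects.append(receiver)
    return suspects
```

When the leaked data covers only a few columns, several users may remain suspicious, which is why a watermark comparison is still needed afterwards.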
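The two-threshold test of step S2042 can be sketched for a numeric column as follows. It assumes the watermark sits in the Q least significant bits of non-negative integers, uses the fraction of matching bits as the similarity measure, and expresses y% as a fraction between 0 and 1; these choices and the function names are assumptions.

```python
def bit_similarity(a: str, b: str) -> float:
    """Fraction of positions at which two equal-length bit strings agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def column_has_watermark(values, watermark_bits, q, similarity_threshold, fraction_threshold):
    """Decide whether a numeric column carries a given user's watermark.

    A data unit counts as marked when its low q bits are more similar to the
    watermark prefix than similarity_threshold; the column counts as marked
    when the share of marked units exceeds fraction_threshold (y% as a fraction).
    """
    expected = watermark_bits[:q]
    marked = 0
    for value in values:
        observed = format(value & ((1 << q) - 1), f"0{q}b")
        if bit_similarity(observed, expected) > similarity_threshold:
            marked += 1
    return marked / len(values) > fraction_threshold
```

Running this test for each suspicious user's stored watermark and keeping the user whose watermark is detected corresponds to the final locating step.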
As a specific application embodiment of the present invention, as shown in fig. 3 and fig. 4, with power distribution and distributed new energy data as data to be distributed, the data tracing method based on digital watermark is implemented by adopting the following flow:
Step 1: user 0 is the owner of the original database, who needs to distribute database table a down. The user m-n in the data distribution process represents the nth user of the mth level.
Step 2: the data resources of the distributed new energy source are large in scale, wide in source and multiple in types, and privacy sensitivity degrees of the data resources are different, so that evaluation of data sensitivity levels is carried out before watermark embedding, and only watermark embedding is carried out on non-sensitive data, so that influence on data precision can be reduced. User 0 first evaluates the sensitivity level of the data columns based on the data type and content of each data column in a.
Step 3: and the user 0 judges whether watermark embedding is carried out or not according to the sensitivity level of each data column in the A. And when the sensitivity level of the data column is smaller than the sensitivity threshold, namely the threshold value, watermark embedding is carried out on the column data.
Step 4: for a data column to be embedded with the watermark, the user 0 adopts a hash function to perform hash calculation on the user identity representation of the user 0, and a user watermark sequence is obtained.
Step 5: and selecting different preset data fields according to the data types, and embedding the watermark by adopting a mode of replacing the preset data fields with a user watermark sequence to obtain a database table A 1 embedded with the watermark.
Step 6: Before user 0 distributes database table A_1 to a first-level user 1-n, it first counts the attribute columns in A_1 other than the primary key and denotes the total as N. It then uses a hash function to generate a {0,1} sequence of length N from its own user ID_0, the ID_1-n of the lower-level user, and the name A_name of the database table. A_1 is then split into 2 sub-tables according to this sequence. The sequence describes the attribute columns kept in each sub-table: one sub-table keeps the primary key of A_1 and all attribute columns whose position in the sequence holds a 1, and the other sub-table keeps the primary key of A_1 and all attribute columns whose position in the sequence holds a 0. The sequence is also stored at a trusted third party for subsequent data tracing.
Step 7: user 1-n receipt sub-tableThereafter, splitting sequences from third party storageThe two sub-tables are combined into an original table A 1, then watermark information of the sub-tables is added, and watermark embedding and database table splitting are carried out in the same method from step 2 to step 6. The hierarchical relationship of data distribution is described by a Tree structure diagram and is denoted as Tree-Distr. The data stored in the nodes of the tree structure diagram is the code number of each user participating in distribution, wherein the root node represents the owner of the original data, and the father node of each child node represents the data distributor of the previous level.
Step 8: when the data leakage time occurs, positioning all suspicious leaked users, namely the user according to the splitting mode of the leakage data table.
Step 9: watermark information is extracted from the leaked data, and the watermark information of the suspicious leaked person is compared with the watermark information extracted from the leaked data so as to confirm the identity of the leaked person and perform responsibility determination.
The embodiment also provides a data tracing device based on digital watermark, which is used to implement the above embodiments and preferred implementations; what has already been described is not repeated here. As used below, the term "module" may be a combination of software and/or hardware that implements a predetermined function. While the device described in the following embodiments is preferably implemented in software, implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.
The embodiment provides a data tracing device based on digital watermarking, as shown in fig. 5, including:
A data acquisition module 51, configured to acquire data to be distributed, where the data to be distributed is used for distribution among multiple users in multiple tiers, and the data to be distributed is sent to users in lower tiers by users in upper tiers;
The watermark embedding module 52 is configured to, when performing data distribution on a user at an upper level, perform watermark embedding on the data to be distributed according to the sensitivity level of the data to be distributed and the user identity information, so as to obtain watermark-embedded data and a user watermark;
A splitting module 53, configured to split the data embedded with the watermark based on the user identity information of the distributed data and the attribute column information of the data embedded with the watermark, and distribute the split data to users at a lower level;
and the positioning module 54 is used for positioning the leaked user according to the leaked data attribute column information and the user watermark when the data leakage occurs.
In an alternative embodiment, the watermark embedding module comprises: the watermark generation module is used for determining data needing watermark embedding according to the sensitivity level of the data to be distributed; and the embedding sub-module is used for generating a user watermark according to the user identity information, and carrying out watermark embedding on the data needing watermark embedding by adopting the user watermark to obtain the data embedded with the watermark.
In an alternative embodiment, the embedding sub-module is specifically configured to: calculate the user identity information using a hash function to generate a user watermark; determine a preset data field according to the data type of the data needing watermark embedding, and embed the watermark by replacing the preset data field with the user watermark to obtain the data embedded with the watermark.
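The behaviour of the embedding sub-module can be sketched as follows. This is an illustrative sketch under assumptions that are not stated in the source: SHA-256 is used as the hash function, the preset data field of an integer is taken to be its low-order bits, and the preset data field of a string is taken to be a redundant suffix of spaces (bit 0) and tabs (bit 1). All function names are hypothetical.

```python
import hashlib
from typing import Union

Value = Union[int, str]


def user_watermark(user_id: str, n_bits: int = 8) -> str:
    """Hash the user identity and keep the first n_bits bits as the watermark."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bits = "".join(format(b, "08b") for b in digest)
    if not 0 < n_bits <= len(bits):
        raise ValueError("n_bits must be between 1 and 256")
    return bits[:n_bits]


def embed(value: Value, watermark: str) -> Value:
    """Replace the preset data field of one data unit with the watermark."""
    if isinstance(value, bool):
        raise TypeError("boolean values are not watermarked in this sketch")
    if isinstance(value, int):
        n = len(watermark)
        sign = -1 if value < 0 else 1
        # Clear the n low-order bits of the magnitude, then write the watermark.
        magnitude = ((abs(value) >> n) << n) | int(watermark, 2)
        return sign * magnitude
    if isinstance(value, str):
        # Assumed redundant field: trailing whitespace, space for 0 and tab for 1.
        suffix = "".join("\t" if b == "1" else " " for b in watermark)
        return value.rstrip(" \t") + suffix
    raise TypeError("unsupported data type for watermark embedding")


def extract(value: Value, n_bits: int) -> str:
    """Read the preset data field back as a bit string."""
    if isinstance(value, int) and not isinstance(value, bool):
        return format(abs(value) & ((1 << n_bits) - 1), f"0{n_bits}b")
    if isinstance(value, str):
        return "".join("1" if ch == "\t" else "0" for ch in value[-n_bits:])
    raise TypeError("unsupported data type for watermark extraction")
```

With an 8-bit watermark an integer changes by at most 255, so the watermark length trades detection strength against data precision. This is also why the description restricts embedding to non-sensitive columns.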
In an alternative embodiment, the splitting module is specifically configured to: generating a user sequence according to attribute column information of the data embedded with the watermark, user identity information of the distributed data and user identity information of the received data based on the hash function; splitting an attribute column of the data embedded with the watermark according to the user sequence to obtain split data; and sending the split data to the users at the lower hierarchy level.
In an alternative embodiment, when the lower-level user receiving the split data is itself followed by a further lower level of users, the apparatus further includes: a distribution module, configured to take the lower-level user that received the split data as the upper-level user and repeat the steps of watermark embedding, data splitting and split data distribution.
In an alternative embodiment, the positioning module includes: the suspicious user determining module is used for determining suspicious users according to the relation between the leakage data attribute column information and the user sequence and the hierarchical relation of the users; and the positioning sub-module is used for positioning the leakage user according to the comparison result of the user watermark of the suspicious user and the leakage data.
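How the suspicious user determining module might combine the stored splitting sequences with the distribution hierarchy can be sketched as follows. The source does not specify the matching rule, so this sketch assumes that a user becomes suspicious when the attribute columns of the leaked table equal one of the two sub-tables produced by that user's stored sequence. The record layout, the parent map used for Tree-Distr, and the function names are all hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Set, Tuple

# (distributor, receiver) -> splitting sequence kept by the trusted third party
SplitRecords = Dict[Tuple[str, str], List[int]]


def columns_for(seq: Sequence[int], columns: Sequence[str], bit: int) -> Set[str]:
    """Attribute columns that a sequence assigns to the sub-table for the given bit."""
    return {c for c, b in zip(columns, seq) if b == bit}


def suspicious_users(
    leaked_columns: Set[str], columns: Sequence[str], records: SplitRecords
) -> List[str]:
    """Receivers whose stored split pattern matches the leaked column set."""
    suspects: List[str] = []
    for (_distributor, receiver), seq in records.items():
        candidates = (columns_for(seq, columns, 1), columns_for(seq, columns, 0))
        if leaked_columns in candidates:
            suspects.append(receiver)
    return suspects


def distribution_path(user: str, parent: Dict[str, Optional[str]]) -> List[str]:
    """Chain of users from the root of the distribution tree down to the given user."""
    path: List[str] = []
    node: Optional[str] = user
    while node is not None:
        path.append(node)
        node = parent.get(node)
    return path[::-1]
```

The suspects returned here are only candidates. The watermark comparison performed by the positioning sub-module is still needed to confirm which of them actually leaked the data.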
In an alternative embodiment, the positioning sub-module is specifically configured to: determining the data units with the watermarks of the suspicious users according to the comparison of the watermarks of the suspicious users and preset data fields of each data unit in the leaked data; and positioning the leakage user according to the number of the data units with the suspected user watermarks in the leakage data.
Further functional descriptions of the above respective modules and units are the same as those of the above corresponding embodiments, and are not repeated here.
The embodiment of the invention also provides a computer device which is provided with the data tracing device based on the digital watermark shown in the figure 5.
Referring to fig. 6, which is a schematic structural diagram of a computer device according to an alternative embodiment of the present invention, the computer device includes: one or more processors 10, a memory 20, and interfaces for connecting the various components, including high-speed interfaces and low-speed interfaces. The components are communicatively coupled to each other using different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions executed within the computer device, including instructions stored in or on the memory to display graphical information of a GUI on an external input/output device, such as a display device coupled to the interface. In some alternative embodiments, multiple processors and/or multiple buses may be used, if desired, along with multiple memories. Also, multiple computer devices may be connected, each providing a portion of the necessary operations (e.g., as a server array, a set of blade servers, or a multiprocessor system). One processor 10 is illustrated in fig. 6.
The processor 10 may be a central processor, a network processor, or a combination thereof. The processor 10 may further include a hardware chip. The hardware chip may be an application-specific integrated circuit, a programmable logic device, or a combination thereof. The programmable logic device may be a complex programmable logic device, a field-programmable gate array, generic array logic, or any combination thereof.
The memory 20 stores instructions executable by the at least one processor 10, so that the at least one processor 10 performs the method of the embodiments described above.
The memory 20 may include a program storage area and a data storage area. The program storage area may store an operating system and at least one application program required for a function; the data storage area may store data created through use of the computer device, and the like. In addition, the memory 20 may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some alternative embodiments, the memory 20 may optionally include memory located remotely from the processor 10, which may be connected to the computer device via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
Memory 20 may include volatile memory, such as random access memory; the memory may also include non-volatile memory, such as flash memory, hard disk, or solid state disk; the memory 20 may also comprise a combination of the above types of memories.
The computer device also includes a communication interface 30 for the computer device to communicate with other devices or communication networks.
The embodiments of the present invention also provide a computer-readable storage medium. The method according to the embodiments described above may be implemented in hardware or firmware, or as computer code that can be recorded on a storage medium, or as computer code that is originally stored on a remote storage medium or a non-transitory machine-readable storage medium, downloaded over a network, and stored on a local storage medium, so that the method described herein can be processed by such software stored on a storage medium using a general-purpose computer, a special-purpose processor, or programmable or dedicated hardware. The storage medium can be a magnetic disk, an optical disk, a read-only memory, a random access memory, a flash memory, a hard disk, a solid state disk or the like; further, the storage medium may also comprise a combination of memories of the kinds described above. It will be appreciated that a computer, processor, microprocessor controller or programmable hardware includes a storage element that can store or receive software or computer code which, when accessed and executed by the computer, processor or hardware, implements the methods illustrated by the above embodiments.
Although embodiments of the present invention have been described in connection with the accompanying drawings, various modifications and variations may be made by those skilled in the art without departing from the spirit and scope of the invention, and such modifications and variations fall within the scope of the invention as defined by the appended claims.

Claims (9)

1. A data tracing method based on digital watermarking, the method comprising:
Acquiring data to be distributed, wherein the data to be distributed is used for being distributed among a plurality of users of a plurality of levels, and the data to be distributed is sent to users of a lower level by users of an upper level;
When the user at the upper layer distributes data, watermark embedding is carried out on the data to be distributed according to the sensitivity level of the data to be distributed and the user identity information to obtain the data embedded with the watermark and the user watermark, wherein when the sensitivity level is smaller than a sensitivity threshold value, the corresponding data column is used as the data needing watermark embedding;
splitting the data embedded with the watermark based on the user identity information of the distributed data and the attribute column information of the data embedded with the watermark, and distributing the split data to users at lower layers;
When data leakage occurs, positioning a leakage user according to the leakage data attribute column information and the user watermark;
Splitting the data embedded with the watermark based on the user identity information of the distributed data and the attribute column information of the data embedded with the watermark, and distributing the split data to users at a lower level, wherein the method comprises the following steps:
Generating a user sequence according to attribute column information of the data embedded with the watermark, user identity information of the distributed data and user identity information of the received data based on the hash function;
Splitting an attribute column of the data embedded with the watermark according to the user sequence to obtain split data;
the split data is sent to the lower-tier users, and the lower-tier users combine the split data after receiving the split data.
2. The method according to claim 1, wherein watermark embedding is performed on the data to be distributed according to the sensitivity level of the data to be distributed and the user identity information to obtain the data embedded with the watermark and the user watermark, comprising:
Determining data to be watermark embedded according to the sensitivity level of the data to be distributed;
Generating a user watermark according to the user identity information, and embedding the watermark into the data to be watermark-embedded by adopting the user watermark to obtain the watermark-embedded data.
3. The method of claim 2, wherein generating a user watermark from the user identity information, watermark embedding the data to be watermarked using the user watermark, to obtain watermarked data, comprises:
Calculating the user identity information by adopting a hash function to generate a user watermark;
Determining a preset data field according to the data type of the data needing watermark embedding, and watermark embedding by adopting a mode of replacing the preset data field with a user watermark to obtain the data embedded with the watermark.
4. The method of claim 1, wherein when the next level user is further included after receiving the split data, the method further comprises, after distributing the split data to the next level user:
and taking the lower-level user receiving the split data as the upper-level user, and repeating the steps of watermark embedding, data splitting and split data distribution.
5. A method according to claim 3, wherein locating a compromised user based on compromised data attribute column information and a user watermark, comprises:
Determining suspicious users according to the relation between the leakage data attribute column information and the user sequence and the hierarchical relation of the users;
and positioning the leakage user according to the comparison result of the user watermark of the suspicious user and the leakage data.
6. The method of claim 5, wherein locating the compromised user based on a comparison of the user watermark of the suspected user with the compromised data, comprises:
determining the data units with the watermarks of the suspicious users according to the comparison of the watermarks of the suspicious users and preset data fields of each data unit in the leaked data;
And positioning the leakage user according to the number of the data units with the suspected user watermarks in the leakage data.
7. A digital watermark-based data tracing device, the device comprising:
The data acquisition module is used for acquiring data to be distributed, wherein the data to be distributed is used for being distributed among a plurality of users of a plurality of levels and is sent to users of lower levels by users of upper levels;
the watermark embedding module is used for embedding the watermark into the data to be distributed according to the sensitivity level of the data to be distributed and the user identity information when the data distribution is carried out on the upper-level user, so as to obtain the data embedded with the watermark and the user watermark, wherein when the sensitivity level is smaller than the sensitivity threshold, the corresponding data column is used as the data needing watermark embedding;
The splitting module is used for splitting the data embedded with the watermark based on the user identity information of the distributed data and the attribute column information of the data embedded with the watermark, and distributing the split data to users at a lower level;
the positioning module is used for positioning the leaked user according to the leaked data attribute column information and the user watermark when the data leakage occurs;
Splitting the data embedded with the watermark based on the user identity information of the distributed data and the attribute column information of the data embedded with the watermark, and distributing the split data to users at a lower level, wherein the method comprises the following steps:
Generating a user sequence according to attribute column information of the data embedded with the watermark, user identity information of the distributed data and user identity information of the received data based on the hash function;
Splitting an attribute column of the data embedded with the watermark according to the user sequence to obtain split data;
the split data is sent to the lower-tier users, and the lower-tier users combine the split data after receiving the split data.
8. A computer device, comprising:
A memory and a processor, the memory and the processor are in communication connection, the memory stores computer instructions, and the processor executes the computer instructions, thereby executing the data tracing method based on digital watermark as claimed in any one of claims 1 to 6.
9. A computer-readable storage medium having stored thereon computer instructions for causing a computer to perform the digital watermark-based data tracing method of any one of claims 1 to 6.
CN202311649634.5A 2023-12-04 2023-12-04 Data tracing method, device, equipment and medium based on digital watermark Active CN117725565B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311649634.5A CN117725565B (en) 2023-12-04 2023-12-04 Data tracing method, device, equipment and medium based on digital watermark

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311649634.5A CN117725565B (en) 2023-12-04 2023-12-04 Data tracing method, device, equipment and medium based on digital watermark

Publications (2)

Publication Number Publication Date
CN117725565A CN117725565A (en) 2024-03-19
CN117725565B true CN117725565B (en) 2024-09-17

Family

ID=90208045

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311649634.5A Active CN117725565B (en) 2023-12-04 2023-12-04 Data tracing method, device, equipment and medium based on digital watermark

Country Status (1)

Country Link
CN (1) CN117725565B (en)

Also Published As

Publication number Publication date
CN117725565A (en) 2024-03-19

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant