CN115455383A

CN115455383A - Method, device and equipment for processing watermark information of database

Info

Publication number: CN115455383A
Application number: CN202211417478.5A
Authority: CN
Inventors: 李公宝
Original assignee: Beijing Yizhixuan Technology Co ltd
Current assignee: Beijing Yizhixuan Technology Co ltd
Priority date: 2022-11-14
Filing date: 2022-11-14
Publication date: 2022-12-09
Anticipated expiration: 2042-11-14
Also published as: CN115455383B

Abstract

The invention provides a method, a device and equipment for processing watermark information of a database. The method comprises: generating watermark information; embedding the watermark information into a preset database to obtain a watermark database; generating a first decision tree from first data in the preset database and a second decision tree from second data in the watermark database; and modifying the second data in the watermark database according to the first decision tree and the second decision tree to obtain a target watermark database, wherein a third decision tree generated from third data in the target watermark database is the same as the first decision tree. The scheme addresses the data security threats a database faces in copyright authentication and tracing, ensures the validity of the data, improves the accuracy of watermark extraction, and has good robustness.

Description

Method, device and equipment for processing watermark information of database

Technical Field

The invention relates to the technical field of watermarks, in particular to a method, a device and equipment for processing watermark information of a database.

Background

Big data is an active research field that has drawn wide attention across disciplines and increasingly influences how people think, how businesses operate, and how scientific research, education and health care are conducted. Big data mining and analysis connects simple, isolated, scattered and fragmented data through data sharing and fusion, so that deep, hidden and valuable information and knowledge can be discovered.

At present, a bottleneck in the further development of big data is data security in data sharing and trading. Big data often contains sensitive data, including classified data and personal private information, and leakage, damage or tampering of such data can have serious consequences. The development and application of big data technology is a double-edged sword: it brings convenience but also poses a great threat to data security. Existing database watermarking technology can provide copyright authentication and leak-source tracing during data transmission, but it cannot guarantee that the data remain usable for data mining algorithms: embedding the watermark changes the data mining result and impairs the actual value of the data.

Disclosure of Invention

The technical problem to be solved by the invention is to provide a method, a device and equipment for processing watermark information of a database, which address the data security threats a database faces in copyright authentication and tracing, ensure the validity of the data, improve the accuracy of watermark extraction, and have good robustness.

In order to solve the technical problems, the technical scheme of the invention is as follows:

a watermark information processing method for a database comprises the following steps:

generating watermark information;

embedding the watermark information into a preset database to obtain a watermark database;

generating a first decision tree according to first data in the preset database and generating a second decision tree according to second data in the watermark database;

modifying second data in the watermark database according to the first decision tree and the second decision tree to obtain a target watermark database; wherein a third decision tree generated from third data in the target watermark database is the same as the first decision tree.

Optionally, the watermark information is a random sequence of length n, $W = \{w_1, w_2, \ldots, w_n\}$, where $w_i$ is the i-th bit of watermark information in the random sequence W, $w_i \in \{0, 1\}$, $1 \le i \le n$;

embedding the watermark information into a preset database to obtain a watermark database, comprising:

carrying out Hash grouping on M tuples in a preset database according to a primary key value of each tuple to obtain a plurality of groups;

embedding the i-th bit of watermark information $w_i$ in the watermark information into the $t_i$-th group to obtain the watermark database.

Optionally, modifying second data in the watermark database according to the first decision tree and the second decision tree to obtain a target watermark database, including:

acquiring a first segmentation value of a first decision tree and a second segmentation value of a second decision tree;

determining the modification direction of second data in the watermark database according to the first segmentation value and the second segmentation value; the modification direction indicates that data with the target label in a first database segment of the watermark database is to be modified into a second database segment;

modifying the second data in the watermark database according to a preset constraint equation and the modification direction to obtain a modified watermark database until a third decision tree generated according to the data in the modified watermark database is the same as the first decision tree;

and determining the modified watermark database as a target watermark database.

Optionally, the first segmentation value divides a preset database into two database segments; the second partitioning value divides the watermark database into two database segments;

determining a modification direction of second data in the watermark database according to the first segmentation value and the second segmentation value, including:

determining a first index and a second index of the first segmentation value in a preset database;

determining a third index and a fourth index of the second segmentation value in a watermark database; each index represents the probability that a randomly selected datum in the corresponding database segment is misclassified;

calculating the difference value of the first index and the third index to obtain a first difference;

calculating a difference value between the second index and the fourth index to obtain a second difference;

and comparing the first difference with the second difference to determine the modification direction of the second data in the watermark database.

Optionally, modifying the second data in the watermark database according to a preset constraint equation and the modification direction to obtain a modified watermark database, including:

acquiring a first variation of the first segmentation value in the watermark database and a second variation of a second segmentation value in the watermark database;

obtaining a third difference according to the first variation and the second variation;

and modifying the second data in the watermark database according to the modification direction under the condition of meeting the preset constraint equation according to the third difference to obtain a modified watermark database.

Optionally, the method for processing watermark information in a database further includes:

and carrying out watermark extraction on the target watermark database to obtain target watermark information.

Optionally, the watermark extraction is performed on the target watermark database to obtain target watermark information, and the method includes:

performing Hash grouping on watermark data in a target watermark database to obtain a plurality of target groups;

and watermark extraction is carried out on the watermark data in each target group to obtain target watermark information.
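
The extraction steps above can be sketched as follows. This is only an illustration under stated assumptions, not the extraction procedure of the invention: it assumes integer-valued data whose least significant bit carries the watermark, uses SHA-256 for the hash grouping, and recovers each bit by a majority vote within its group. The majority vote and the names `group_index` and `extract_watermark` are hypothetical choices that do not appear in the text.

```python
import hashlib
from collections import Counter


def group_index(primary_key: str, secret_key: str, n_groups: int) -> int:
    """Assumed grouping rule: hash the key concatenated with the primary key, modulo n."""
    digest = hashlib.sha256((secret_key + primary_key).encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % n_groups


def extract_watermark(rows, n_bits, secret_key):
    """rows: iterable of (primary_key, integer_value). Returns one bit (or None) per group."""
    votes = [Counter() for _ in range(n_bits)]
    for pk, value in rows:
        # Each tuple votes with its least significant bit for the bit of its group.
        votes[group_index(pk, secret_key, n_bits)][value & 1] += 1
    # None marks a group that received no tuples, so its bit cannot be recovered.
    return [v.most_common(1)[0][0] if v else None for v in votes]


if __name__ == "__main__":
    sample = [("id-1", 41), ("id-2", 56), ("id-3", 63)]
    print(extract_watermark(sample, 2, "secret"))
```

Voting within a group is one way to tolerate a small number of altered tuples; whether the invention uses it is not stated in the source.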

The invention also provides a watermark information processing device of the database, which comprises:

the generating module is used for generating watermark information;

the processing module is used for embedding the watermark information into a preset database to obtain a watermark database; generating a first decision tree according to first data in the preset database and generating a second decision tree according to second data in the watermark database; modifying second data in the watermark database according to the first decision tree and the second decision tree to obtain a target watermark database; wherein a third decision tree generated from third data in the target watermark database is the same as the first decision tree.

The present invention also provides a computing device comprising: a processor, a memory storing a computer program which, when executed by the processor, performs the method as described above.

The present invention also provides a computer-readable storage medium storing instructions which, when executed on a computer, cause the computer to perform the method as described above.

The scheme of the invention at least comprises the following beneficial effects:

According to the scheme of the invention, watermark information is generated; the watermark information is embedded into a preset database to obtain a watermark database; a first decision tree is generated from first data in the preset database and a second decision tree is generated from second data in the watermark database; and the second data in the watermark database is modified according to the first decision tree and the second decision tree to obtain a target watermark database, wherein a third decision tree generated from third data in the target watermark database is the same as the first decision tree. The scheme addresses the data security threats a database faces in copyright authentication and tracing, ensures the validity of the data, improves the accuracy of watermark extraction, and has good robustness.

Drawings

Fig. 1 is a schematic flow chart of a method for processing watermark information of a database according to an embodiment of the present invention;

Fig. 2 is a schematic diagram of the positioning of a first partition value and a second partition value in a watermark database according to an embodiment of the present invention;

Fig. 3 is a schematic diagram of a first decision tree in a specific embodiment provided by the present invention;

Fig. 4 is a schematic diagram of a second decision tree in a specific embodiment provided by the present invention;

Fig. 5 is a schematic diagram of a third decision tree in a specific embodiment provided by the present invention;

Fig. 6 is a flowchart illustrating a watermark information processing method in an embodiment of the present invention;

Fig. 7 is a schematic structural diagram of a watermark information processing apparatus of a database according to an embodiment of the present invention.

Detailed Description

Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited by the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.

As shown in fig. 1, an embodiment of the present invention provides a method for processing watermark information of a database, including:

step 11, generating watermark information;

step 12, embedding the watermark information into a preset database to obtain a watermark database;

step 13, generating a first decision tree according to the first data in the preset database and generating a second decision tree according to the second data in the watermark database;

step 14, modifying second data in the watermark database according to the first decision tree and the second decision tree to obtain a target watermark database; wherein a third decision tree generated from third data in the target watermark database is the same as the first decision tree.

In this embodiment, the watermark information processing of the database comprises two stages: embedding the watermark information into the preset database, and correcting the decision tree. The watermark information is preferably embedded using the LSB (Least Significant Bit) watermarking technique, which gives good robustness. After the watermark database is obtained, a first decision tree and a second decision tree are generated from the first data in the preset database and the second data in the watermark database respectively. Because embedding the watermark information alters the data, the second decision tree may differ from the first decision tree, so that results obtained from the watermark database change; the second data in the watermark database is therefore modified according to the first decision tree and the second decision tree to obtain the target watermark database. In this way, the data security threats faced in copyright authentication and tracing of the target watermark database are addressed, the validity of the data is ensured, the accuracy of watermark extraction is improved, and robustness is good.

In an optional embodiment of the present invention, the watermark information W is a random sequence of length n, $W = \{w_1, w_2, \ldots, w_n\}$, where $w_i$ is the i-th bit of watermark information in the random sequence W, $w_i \in \{0, 1\}$, $1 \le i \le n$;

step 12 comprises:

step 121, performing hash grouping on the M tuples in the preset database according to the primary key value of each tuple to obtain a plurality of groups;

step 122, embedding the i-th bit of watermark information $w_i$ in the watermark information W into the $t_i$-th group to obtain the watermark database.

In this embodiment, the watermark information W is a random sequence of length n, $W = \{w_1, w_2, \ldots, w_n\}$, to be embedded into the preset database, where $w_i$ is the i-th bit of watermark information in the random sequence W, $w_i \in \{0, 1\}$, $1 \le i \le n$;

the preset database D, preferably a numeric database, comprises M tuples, each consisting of attribute columns, a label L, and a primary key value that uniquely identifies the tuple; if the preset database is a binary-classification data set, the label takes one of two values, $L \in \{L_1, L_2\}$;

the watermark information W is embedded into the preset database according to the LSB watermarking technique, specifically as follows: according to the length n of the watermark information, the M tuples in the preset database are hash-grouped by primary key value to obtain a plurality of groups. The group number $t_i$ of a tuple is obtained by applying the hash function H(·) to the key K concatenated ($\|$) with the primary key value of the tuple and taking the result modulo n (mod), where n is the total number of groups;

the i-th bit of watermark information $w_i$ is embedded into the $t_i$-th group to obtain the watermark database; specifically, an LSB embedding formula maps each datum $a$ in the $t_i$-th group to the datum obtained after the i-th bit of watermark information $w_i$ has been embedded;

the random sequence W of watermark information is traversed, and the n bits of watermark information are embedded in turn into the preset database D by the above embedding method to obtain the watermark database.
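
The grouping and embedding steps above can be sketched as follows. This is a minimal illustration rather than the implementation of the invention: it assumes integer-valued data, uses SHA-256 as the hash function H, places the key K in front of the primary key when concatenating, and overwrites the least significant bit of each value with the bit assigned to its group. The function names `group_index`, `embed_bit` and `embed_watermark` are hypothetical.

```python
import hashlib


def group_index(primary_key: str, secret_key: str, n_groups: int) -> int:
    """Assumed form of the grouping rule: t = H(K || primary key) mod n."""
    digest = hashlib.sha256((secret_key + primary_key).encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % n_groups


def embed_bit(value: int, bit: int) -> int:
    """Overwrite the least significant bit of an integer value with `bit`."""
    return (value & ~1) | bit


def embed_watermark(rows, watermark, secret_key):
    """rows: list of (primary_key, value, label). Returns a watermarked copy."""
    n = len(watermark)
    marked = []
    for pk, value, label in rows:
        t = group_index(pk, secret_key, n)  # group number of this tuple
        marked.append((pk, embed_bit(value, watermark[t]), label))
    return marked


rows = [("id-1", 40, "L1"), ("id-2", 57, "L2"), ("id-3", 63, "L2"), ("id-4", 22, "L1")]
W = [1, 0, 1]
marked = embed_watermark(rows, W, "secret")
```

Each value changes by at most 1, and the labels and primary keys are left untouched, which is why the primary key can be reused for grouping at extraction time.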

In one embodiment, the preset database is a numeric database D comprising M tuples, each consisting of attribute columns, a label L, and a primary key value that uniquely identifies the tuple;

the numeric database D is hash-grouped to obtain a plurality of groups;

a random sequence W of length n related to an identity is generated; the random sequence W is traversed, and the n bits of identity information are embedded in turn into the numeric database D to obtain a watermark database $D_w$, which likewise comprises M tuples, each consisting of attribute columns, a label L, and a uniquely identifying primary key value.

After the watermark database is obtained, the second decision tree of the watermark database is compared against the first decision tree of the preset database, and the second data in the watermark database is modified accordingly, so that a decision tree generated from the modified data matches the first decision tree.

Specifically, in an optional embodiment of the present invention, step 14 includes:

step 141, obtaining a first partition value of the first decision tree and a second partition value of the second decision tree;

step 142, determining the modification direction of the second data in the watermark database according to the first segmentation value and the second segmentation value; the modification direction indicates that data with the target label in a first database segment of the watermark database is to be modified into a second database segment;

step 143, modifying the second data in the watermark database according to a preset constraint equation and the modification direction to obtain a modified watermark database until a third decision tree generated according to the data in the modified watermark database is the same as the first decision tree;

step 144, determining the modified watermark database as a target watermark database.

In this embodiment, a first decision tree and a second decision tree are established from the first data in the preset database and the second data in the watermark database respectively, wherein F is the set of first segmentation values of the first decision tree, m1 is the total number of first segmentation values, and Y denotes the leaf nodes of the first decision tree; $F_w$ is the set of second segmentation values of the second decision tree, m2 is the total number of second segmentation values, and $Y_w$ denotes the leaf nodes of the second decision tree;

it should be noted that the first decision tree, generated from the preset database, and the second decision tree, generated from the watermark database, are preferably built with the CART (Classification And Regression Tree) decision tree algorithm.

Wherein the first and second division values may divide the watermark database into three segments.

Specifically, the watermark database is sorted by attribute column to obtain a sorted watermark database, and the first segmentation value and the second segmentation value are then located in the sorted watermark database by a positioning formula: if two adjacent second data can be found in the attribute column of the sorted watermark database such that the segmentation value lies between them, the position between the two data is taken as the position of the segmentation value.

In a further particular embodiment, as shown in fig. 2, the first segmentation value and the second segmentation value are each located in the sorted watermark database $D_w$: the first segmentation value lies between one pair of adjacent data and the second segmentation value lies between another pair of adjacent data. It can be seen that the positions of the first segmentation value and the second segmentation value divide the sorted watermark database $D_w$ into three segments.
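
The positioning described above can be sketched as a binary search over the sorted attribute column. This is an illustration under assumptions: the boundary convention (a value equal to a segmentation value falls on the first side) is a guess, since the positioning formula is not reproduced in the text, and the names `locate` and `three_segments` are hypothetical.

```python
from bisect import bisect_right


def locate(sorted_values, split_value):
    """Return the insertion position j, so that every value before j is <= split_value."""
    return bisect_right(sorted_values, split_value)


def three_segments(values, split_a, split_b):
    """Sort the attribute column and cut it at the two segmentation values."""
    ordered = sorted(values)
    low, high = sorted((split_a, split_b))
    i, j = locate(ordered, low), locate(ordered, high)
    return ordered[:i], ordered[i:j], ordered[j:]


segments = three_segments([3.2, 1.5, 4.8, 2.9, 6.1], 3.0, 4.0)
# segments == ([1.5, 2.9], [3.2], [4.8, 6.1])
```

The middle segment holds exactly the data that the two decision trees classify differently at this node, which is where the later modification takes place.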

In an optional embodiment of the present invention, the first segmentation value divides a preset database into two database segments; the second partitioning value divides the watermark database into two database segments;

step 142 includes:

step 1421, determining a first index and a second index of the first segmentation value in a preset database;

step 1422, determining a third index and a fourth index of the second segmentation value in the watermark database; each index represents the probability that a randomly selected datum in the corresponding database segment is misclassified;

step 1423, calculating a difference between the first index and the third index to obtain a first difference;

step 1424, calculating a difference between the second index and the fourth index to obtain a second difference;

step 1425, comparing the first difference with the second difference, and determining a modification direction of the second data in the watermark database.

In this embodiment, the modification direction of the second data is determined so that modifying the second data effectively adjusts the second decision tree toward being similar or identical to the first decision tree. When modifying the second data in the watermark database, either the modification direction may be determined first or the second data to be modified may be determined first;

here, determining the modification direction comprises: dividing the preset database into two database segments by the first segmentation value, dividing the watermark database into two database segments by the second segmentation value, locating the first segmentation value in the watermark database, and determining a first index and a second index of the first segmentation value in the preset database; the first index is the index value of the first database segment of the preset database after division by the first segmentation value, and the second index is the index value of the second database segment of the preset database after division by the first segmentation value; each index represents the probability that a randomly selected datum in the database segment is misclassified;

specifically, the index of a database segment is calculated by a formula in which p denotes the proportion of data carrying the target label;

by this formula, the first index of the first database segment and the second index of the second database segment, obtained after the preset database is divided by the first segmentation value, are calculated respectively; the third index of the third database segment and the fourth index of the fourth database segment, obtained after the watermark database is divided by the second segmentation value, are calculated in the same way;

it should be noted that the first database segment partitioned by the first partition value to the preset database is on the first side of the first partition value, the second database segment is on the second side of the first partition value, the third database segment partitioned by the second partition value to the watermark database is on the first side of the second partition value, the fourth database segment is on the second side of the second partition value, that is, the first database segment corresponds to the third database segment, and the second database segment corresponds to the fourth database segment;

the difference between the first index and the third index (the first difference) is then calculated by a formula relating the first difference, the first index and the third index;

and the difference between the second index and the fourth index (the second difference) is calculated by a formula relating the second difference, the second index and the fourth index;
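
The four indices and the two differences can be illustrated as follows. The sketch assumes that the index is the Gini index used by CART, $1 - \sum_k p_k^2$, which matches the description of the index as the probability that a randomly selected datum is misclassified; the formula itself is not reproduced in the text. It also assumes that each difference is taken as the preset-database index minus the watermark-database index. The sample data and the names `gini` and `segment_indices` are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini index 1 - sum_k p_k^2 of a list of labels; 0.0 for an empty segment."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def segment_indices(rows, split_value):
    """rows: list of (value, label). Returns the index of the first and second side."""
    first_side = [label for value, label in rows if value <= split_value]
    second_side = [label for value, label in rows if value > split_value]
    return gini(first_side), gini(second_side)


preset = [(2, "L1"), (3, "L1"), (4, "L1"), (5, "L2"), (6, "L2"), (7, "L2")]
marked = [(2, "L1"), (3, "L1"), (5, "L1"), (4, "L2"), (6, "L2"), (7, "L2")]

first_index, second_index = segment_indices(preset, 4.5)   # first segmentation value
third_index, fourth_index = segment_indices(marked, 3.5)   # second segmentation value

first_difference = first_index - third_index
second_difference = second_index - fourth_index
```

In this toy data the first difference is zero while the second is not, so only the second side of the split has become less pure after watermarking.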

further, the first difference is compared with the second difference to obtain a comparison result, and the modification direction of the second data in the watermark database is determined according to the comparison result; the comparison result covers at least the following three cases:

1) The first difference and the second difference have the same sign;

2) The first difference and/or the second difference is 0;

3) The first difference and/or the second difference is not 0;

when the first difference and the second difference have the same sign, this indicates that the first segmentation value yields the same leaf nodes in the first data and the second data;

when the first difference and/or the second difference is 0, this indicates that the target label has the same proportion in the preset database and the watermark database; the third differences obtained by modifying second data in different modification directions can then be screened according to the first difference and/or the second difference;

when the first difference and/or the second difference is not 0, modifications of the second data in the watermark database that make the first difference and/or the second difference smaller may be taken as candidate modification directions; the third difference of each candidate direction is then calculated, and the direction with the largest third difference is determined as the modification direction of the data.

In an alternative embodiment of the present invention, step 143 comprises:

step 1431, obtaining a first variation of the first partition value in the watermark database and a second variation of the second partition value in the watermark database;

step 1432, obtaining a third difference according to the first variation and the second variation;

step 1433, according to the third difference, modifying the second data in the watermark database according to the modification direction under the condition that the preset constraint equation is satisfied, so as to obtain a modified watermark database.

In this embodiment, the first decision tree is obtained from the preset database by a decision tree algorithm, which at each step splits at the data position with the smallest variation. Specifically, a target index is calculated for each candidate split by a formula defined in terms of the following quantities: the database segment on the first side when the second data is used as the segmentation value, and the number of tuples in that segment; the database segment on the second side when the second data is used as the segmentation value, and the number of tuples in that segment; the target index of the database being split; the variation of each database segment; and the segmentation value;

acquiring a first target index of the first segmentation value in the watermark database and a second target index of the second segmentation value in the watermark database according to the formula;
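
The per-split target index can be illustrated as follows. The sketch assumes the standard CART criterion, in which the target index of a split is the tuple-count-weighted sum of the Gini indices of the two sides and the split with the smallest value is chosen; the formula itself is not reproduced in the text. Candidate splits are taken as midpoints between adjacent distinct values, and the names `weighted_gini` and `best_split` are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini index 1 - sum_k p_k^2 of a list of labels; 0.0 for an empty segment."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def weighted_gini(rows, split_value):
    """Target index of a split: side sizes weight the Gini index of each side."""
    first_side = [label for value, label in rows if value <= split_value]
    second_side = [label for value, label in rows if value > split_value]
    n = len(rows)
    return len(first_side) / n * gini(first_side) + len(second_side) / n * gini(second_side)


def best_split(rows):
    """Return the midpoint between adjacent distinct values with the smallest target index."""
    values = sorted({value for value, _ in rows})
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
    return min(candidates, key=lambda s: weighted_gini(rows, s))


preset = [(2, "L1"), (3, "L1"), (4, "L1"), (5, "L2"), (6, "L2"), (7, "L2")]
marked = [(2, "L1"), (3, "L1"), (5, "L1"), (4, "L2"), (6, "L2"), (7, "L2")]

first_split = best_split(preset)    # segmentation value of the original data
second_split = best_split(marked)   # segmentation value after two values changed by 1
```

On this toy data the two values that moved by one unit are enough to change the chosen split, which is the effect the subsequent modification step is meant to undo.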

taking as an example the case in which the first segmentation value and the second segmentation value divide the watermark database into three segments (a fifth database segment, a sixth database segment and a seventh database segment) and second data is modified from the fifth database segment into the sixth database segment, the calculation of the first variation of the first segmentation value in the modified watermark database is described below:

when one second datum on the left (first side) of the first segmentation value is modified to the right (second side) of the first segmentation value, a third target index of the first segmentation value in the watermark database is calculated by a formula defined in terms of the following quantities:

the numbers of tuples in the fifth database segment and in the sixth database segment after the watermark database is divided by the first segmentation value;

the operation of modifying one second datum from the left (first side) of the first segmentation value to its right (second side);

the numbers of data with the first label and with the second label in the fifth database segment;

the numbers of data with the first label and with the second label in the sixth database segment;

the number of tuples in the watermark database;

a first constant and a second constant, whose values depend on whether the modified second data has the first label or the second label;

further, the first variation of the first segmentation value in the watermark database is calculated from the first target index and the third target index by a formula defined in terms of the data with the k-th label on the left (first side) of the segmentation value, the data with the k-th label on the right (second side) of the segmentation value, and a term whose value depends on whether the shifted label is the first label or the second label.

Correspondingly, when one second datum on the right (second side) of the first segmentation value is modified to the left (first side) of the first segmentation value, a fourth target index of the first segmentation value in the watermark database is calculated by a formula defined in terms of the following quantities:

the numbers of tuples in the fifth database segment and in the sixth database segment after the watermark database is divided by the first segmentation value;

the operation of modifying one second datum from the right (second side) of the first segmentation value to its left (first side);

the numbers of data with the first label and with the second label in the fifth database segment and in the sixth database segment;

the number of tuples in the watermark database;

a first constant and a second constant, whose values depend on whether the modified second data has the first label or the second label;

further, the first variation of the first segmentation value in the watermark database is calculated from the first target index and the fourth target index by a formula defined in terms of the data with the k-th label on the left (first side) of the segmentation value, the data with the k-th label on the right (second side) of the segmentation value, and a term whose value depends on whether the shifted label is the first label or the second label.

Similarly, the second variation of the second segmentation value can be calculated according to the above formulas and is not described again here. The first variation of the first segmentation value in the watermark database and the second variation of the second segmentation value in the watermark database are calculated respectively according to the above formulas; a third difference d is then obtained from the first variation and the second variation by a formula;

the magnitude of the third difference d reflects how much the variation changes: the larger d is, the closer the second segmentation value moves to the first segmentation value after the second data is modified. When modifying the second data, the second data with the largest d is therefore selected repeatedly, until the position of the second segmentation value is the same as the position of the first segmentation value;

furthermore, it should be noted that, after determining the modification direction and determining the second data to be modified according to the third difference, the second data is modified, and in order that the modification of the second data does not affect the watermark information and the distortion of the data is small, the modification of the second data is defined as a constraint equation solving problem;

when the first division value and the second division value are located in the watermark database as shown in fig. 2 and, according to the modification direction, second data with the first label (L₁) in the fifth database segment is to be modified into the sixth database segment, the preset constraint equation is as follows:

wherein,

is the second data with the first label in the fifth database segment,

represents the third data obtained after the modification,

is the integer step (for the LSB watermark algorithm, when the modification step p = 2, the watermark information is not destroyed),

is the fractional step (e.g., 3.98 with a fractional precision of 0.01, then l = 0.001),

is the second division value,

is the first division value.

According to the third difference d, and subject to the preset constraint equation, the second data in the watermark database is modified over multiple iterations along the modification direction to obtain the modified watermark database. A traversal search is used: if an integer step and a fractional step satisfying the preset constraint equation are found, the second data in the watermark database can be shifted without destroying the watermark information; if no steps satisfying the preset constraint equation are found, the second data in the watermark database cannot be shifted without destroying the watermark information, and the next sample with the first label L₁ in that region is taken and the preset constraint equation is solved again, until a solution satisfying the preset constraint equation is found. Finally, the third decision tree obtained from the target watermark database is the same as the first decision tree of the preset database.
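The traversal search can be illustrated with a minimal Python sketch. The preset constraint equation itself is not reproduced in this text, so the sketch assumes a simplified form: the modified value must land in the interval between the two division values, and the least significant bit of its integer part (the assumed LSB embedding position) must stay unchanged. The names `find_shift` and `lsb_of_integer_part`, the rightward search order, and all default parameters are hypothetical and apply to non-negative values only.

```python
from typing import Optional


def lsb_of_integer_part(value: float) -> int:
    """Least significant bit of the integer part (assumed embedding position)."""
    return int(value) & 1


def find_shift(x: float, lower: float, upper: float,
               int_step: int = 2, frac_step: float = 0.01,
               max_int_steps: int = 50) -> Optional[float]:
    """Traverse rightward shifts of x and return the smallest candidate that
    lands in (lower, upper] without changing the LSB, or None if none exists."""
    original_bit = lsb_of_integer_part(x)
    frac_count = round(1 / frac_step)  # fractional steps per unit
    for i in range(max_int_steps + 1):      # multiples of the integer step
        for j in range(frac_count):         # multiples of the fractional step
            candidate = round(x + i * int_step + j * frac_step, 6)
            if candidate > upper:
                # Candidates only grow in this traversal order (int_step >= 1),
                # so nothing further can satisfy the constraint.
                return None
            if candidate > lower and lsb_of_integer_part(candidate) == original_bit:
                return candidate
    return None


# Hypothetical example: move a value from the left of 50.6 into (50.6, 60.5].
shifted = find_shift(49.3, lower=50.6, upper=60.5)
```

When `find_shift` returns `None`, the tuple cannot be shifted under these assumptions, which corresponds to moving on to the next sample with label L₁.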

In a specific embodiment, the first data in the preset database D is shown in the following table:

TABLE 1

wherein the first division value in the preset database D is 50.6, and the first division value divides the preset database D into a first database segment D₁ and a second database segment D₂;

the second data in the watermark database sorted according to the attribute column is shown in the following table:

TABLE 2

wherein the second division value of the watermark database D_w is 60.5, and the first division value lies between numbers 2 and 4 of Table 2 in the watermark database D_w; the first division value is located in the watermark database D_w, so that the first division value and the second division value divide the watermark database D_w into a fifth database segment, a sixth database segment and a seventh database segment;

by means of the index calculation formula

the first index of the first database segment is calculated as

the second index of the second database segment and the seventh database segment is

the third index of the fifth database segment and the sixth database segment is

and the fourth index of the seventh database segment is

where p represents the proportion of the label L1;

by the formula

And formula

the difference between the first index and the third index is calculated to obtain the first difference

the difference between the second index and the fourth index is calculated to obtain the second difference

Because of the value of the second difference, the proportion of data labeled L1 in the database segments to the right of the first division value in the watermark database (the sixth database segment and the seventh database segment) is larger than the proportion of data labeled L1 in the database segment to the right of the first division value in the preset database;
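As an illustration of the index comparison, the following Python sketch computes an index for the tuples on one side of a division value in both databases and takes the difference. The index formula is not reproduced in this text; because the embodiment regenerates trees with the CART algorithm and defines p as the proportion of label L1, the sketch assumes the index is the Gini index, 1 − Σ p_k². The function names, the sign convention of the difference, and the sample rows are hypothetical.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Row = Tuple[float, str]  # (attribute value, label)


def gini_index(labels: Sequence[str]) -> float:
    """Gini index 1 - sum(p_k^2); for two labels this equals 2p(1 - p)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def labels_in(rows: Sequence[Row], low: float, high: float) -> List[str]:
    """Labels of the tuples whose attribute value lies in (low, high]."""
    return [label for value, label in rows if low < value <= high]


def index_difference(original: Sequence[Row], watermarked: Sequence[Row],
                     low: float, high: float) -> float:
    """Index of the watermarked segment minus index of the original segment
    (sign convention assumed for illustration)."""
    return (gini_index(labels_in(watermarked, low, high))
            - gini_index(labels_in(original, low, high)))


# Hypothetical data: embedding moved one L1 tuple to the right of 50.6.
original = [(45.0, "L1"), (48.2, "L1"), (53.0, "L2"), (61.0, "L2")]
watermarked = [(45.0, "L1"), (51.0, "L1"), (53.0, "L2"), (61.0, "L2")]
right_side_change = index_difference(original, watermarked, 50.6, float("inf"))
```

A nonzero difference on the right side indicates that the label proportions there have changed, which is the signal used to choose the modification direction.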

it should be noted that, here, the first difference and the second difference are calculated first to judge the modification direction, and then the second data with the largest d value is determined as the second data to be modified. Of course, it is also possible to determine the second data to be modified first and then determine the modification direction; specifically, in that case the d values obtained by moving data with different labels (L1, L2) to different positions are calculated first, then the first difference and the second difference are calculated, and the candidates are screened by the first difference and the second difference;

in order to ensure that the leaf nodes of the second decision tree and the first decision tree are identical, in the watermark database D_w a piece of second data labeled L1 is selected from the right side of the first division value and modified into the database segment on the left of the first division value (the fifth database segment), so that the leaf-node proportions become close. Therefore, the modification direction is determined as: in the watermark database D_w, second data on the right side of the first division value is selected and modified into the database segment on the left of the first division value;

based on the modification direction, the following is obtained for the watermark data x₁ by calculation with the preset constraint equation:

,

then

. It can be seen that after the second data is shifted according to the modification direction, the value for the watermark data x₁ becomes smaller, and

is unchanged. The second data labeled L1 in the database segments to the right of the first division value (the sixth database segment and the seventh database segment) all lies between the first division value and the second division value; therefore, only the d value of this second data labeled L1 needs to be calculated. If second data labeled L1 also exists in the database segment to the right of the second division value, its d value also needs to be calculated, and the second data with the larger d value is selected as the second data to be modified.

To keep the distortion of the second data in the watermark database small and the watermark information unchanged, the tuple with primary key 3 is shifted first: the second data with primary key 3 is modified according to the preset constraint equation. After this single modification, the third decision tree is regenerated with the CART decision tree algorithm, and the positions of the first division value and the second division value now coincide, as shown in the following table (in Table 3 below, the first division value and the second division value are equal and both lie between numbers 3 and 4):

TABLE 3

Further, the other segmentation values are adjusted in sequence, so that the segmentation values in the first decision tree and the third decision tree are completely the same.
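To make the notion of a division value concrete, the following Python sketch finds the split on a single numeric attribute column that minimizes the weighted Gini index, in the usual CART manner (candidate thresholds are midpoints between adjacent sorted values). It is a simplified stand-in for the tree generation step, not the patent's implementation; the function names and the sample rows are hypothetical.

```python
from collections import Counter
from typing import Optional, Sequence, Tuple

Row = Tuple[float, str]  # (attribute value, label)


def gini(labels: Sequence[str]) -> float:
    """Gini index 1 - sum(p_k^2) of a list of labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(rows: Sequence[Row]) -> Optional[float]:
    """Return the threshold with the lowest weighted Gini index,
    or None when no split is possible."""
    ordered = sorted(rows)
    n = len(ordered)
    best_value, best_score = None, float("inf")
    for i in range(1, n):
        left_value, right_value = ordered[i - 1][0], ordered[i][0]
        if left_value == right_value:
            continue  # no threshold can separate equal values
        left = [label for _, label in ordered[:i]]
        right = [label for _, label in ordered[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best_score:
            best_value, best_score = (left_value + right_value) / 2, score
    return best_value


# Hypothetical data: compare the division value before and after embedding.
original = [(45.0, "L1"), (48.2, "L1"), (53.0, "L2"), (61.0, "L2")]
watermarked = [(45.0, "L1"), (49.2, "L1"), (53.0, "L2"), (61.0, "L2")]
changed = best_split(original) != best_split(watermarked)
```

Comparing `best_split` on the preset database and on the modified watermark database is one simple way to check whether a division value has been restored.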

In an optional embodiment of the present invention, the method for processing watermark information in a database further includes:

and step 15, extracting the watermark from the target watermark database to obtain target watermark information.

In this embodiment, the watermark extraction of the target watermark database refers to extracting a watermark from the target watermark database containing watermark information by using a watermark extraction algorithm, and using the extracted watermark information for copyright certification, piracy tracing, integrity authentication, and the like, and the watermark extraction stage of the present application includes: data grouping, watermark information extraction and watermark voting.

Specifically, in an optional embodiment of the present invention, step 15 includes:

step 151, performing hash grouping on the watermark data in the target watermark database to obtain a plurality of target groups;

step 152, performing watermark extraction on the watermark data in each target packet to obtain target watermark information.

In this embodiment, the data grouping method is similar to that of the watermark embedding stage: the watermark data in the target watermark database is hash-grouped according to the primary key value of each tuple to obtain a plurality of target groups, and watermark extraction is carried out on the watermark data in each target group to obtain the target watermark information. Specifically, the watermark information in each tuple is calculated by the extraction formula. Because the watermark information embedded in the same group is the same, when the extracted watermark information differs within a group, a vote is taken over the watermark information in that group, and the watermark information of the group is determined by majority rule (the minority obeys the majority).
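The three extraction stages (data grouping, watermark information extraction, and watermark voting) can be sketched in Python as follows. The extraction formula is not reproduced in this text, so the sketch assumes integer attribute values whose least significant bit carries the watermark bit, and a keyed SHA-256 hash of the primary key for grouping. The function names, the secret key, and the group count are hypothetical.

```python
import hashlib
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Sequence, Tuple

Row = Tuple[int, int]  # (primary key, integer attribute value)


def group_index(primary_key: int, secret_key: str, n_groups: int) -> int:
    """Map a tuple to a group with a keyed hash of its primary key (assumed scheme)."""
    digest = hashlib.sha256(f"{secret_key}|{primary_key}".encode("utf-8")).hexdigest()
    return int(digest, 16) % n_groups


def extract_watermark(rows: Sequence[Row], secret_key: str,
                      n_groups: int) -> List[Optional[int]]:
    """Read the LSB of every tuple, then decide each group's bit by majority vote.
    Empty groups yield None; ties resolve to the bit seen first."""
    votes: Dict[int, Counter] = defaultdict(Counter)
    for primary_key, value in rows:
        votes[group_index(primary_key, secret_key, n_groups)][value & 1] += 1
    return [votes[g].most_common(1)[0][0] if votes[g] else None
            for g in range(n_groups)]


# Hypothetical data: 200 tuples carrying a 4-bit watermark, one tuple corrupted.
bits = [1, 0, 1, 1]
rows = [(pk, 2 * pk + bits[group_index(pk, "secret", 4)]) for pk in range(200)]
rows[0] = (0, rows[0][1] ^ 1)  # flip one embedded bit
recovered = extract_watermark(rows, "secret", 4)
```

Because every tuple in a group carries the same bit, a small number of corrupted tuples is outvoted by the rest of the group.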

In another specific embodiment, as shown in fig. 3 to 5, an original database containing binary data of a plurality of tuples is tested with the database watermark information processing method of the embodiment of the present application. Fig. 3 shows the first decision tree generated from the original database of classified data, and fig. 4 shows the second decision tree generated from the watermark database of classified data after 64 bits of watermark information are embedded; it can be seen that embedding the watermark information changes the tree, so that the second decision tree differs from the first decision tree;

fig. 5 is a third decision tree generated by target watermark data of classified data after modification of the second decision tree, and it can be seen that the third decision tree of the target watermark data can be ensured to be the same as the first decision tree of the original database by the method for processing watermark information of a database in the embodiment of the present application.

In another specific embodiment, in order to ensure that the target watermark database in the embodiment of the present application can effectively perform copyright authentication and tracing during the actually applied shared transaction, the robustness of the target watermark database is tested:

Watermark robustness test results are given for three operations on the target watermark database: tuple deletion, tuple insertion, and modification of attribute values in an attribute column. Robustness is measured by the bit error rate (BER) of the extracted watermark: the larger the BER, the more errors the extracted watermark contains, and a BER of 0 means the watermark information is extracted completely correctly.
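The BER used in these tests is the fraction of watermark bits that differ between the embedded and the extracted sequence. A minimal Python sketch (the function name is hypothetical):

```python
from typing import Sequence


def bit_error_rate(embedded: Sequence[int], extracted: Sequence[int]) -> float:
    """Fraction of positions at which the extracted bit differs from the embedded bit."""
    if len(embedded) != len(extracted):
        raise ValueError("watermark sequences must have the same length")
    if not embedded:
        raise ValueError("watermark sequences must not be empty")
    errors = sum(1 for a, b in zip(embedded, extracted) if a != b)
    return errors / len(embedded)
```

A return value of 0.0 corresponds to completely correct extraction.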

Tuple deletion randomly selects and deletes a given proportion of the tuples in the target watermark database; the resulting watermark robustness test results are shown in the following table:

TABLE 4

As can be seen from table 4, in the case of deletion, both the LSB watermark algorithm and the watermark information processing method of the database in the embodiment of the present application can correctly extract watermark information, and with the increase of the deletion ratio, the robustness of the method in the embodiment of the present application and the robustness of the LSB watermark algorithm can also be kept consistent;

tuple insertion randomly inserts a specified proportion of tuples into the target watermark database; the primary key of each inserted tuple remains unique, its attribute values are randomly selected from the attribute values of the target watermark database, and its label is randomly generated from the existing labels. The resulting watermark robustness test results are shown in the following table:

TABLE 5

As can be seen from table 5, in the case of insertion, both the LSB watermark algorithm and the watermark information processing method of the database in the embodiment of the present application can correctly extract watermark information, and with the increase of the insertion ratio, the robustness of the method in the embodiment of the present application and the robustness of the LSB watermark algorithm can also be kept consistent, and the watermark information processing method of the database in the embodiment of the present application not only modifies the second decision tree into a decision tree that is the same as the first decision tree, but also well maintains the robustness;

modification of attribute values in the attribute column changes attribute values at random, such that the modified values do not violate the semantics of the attribute; the resulting watermark robustness test results are shown in the following table:

TABLE 6

As can be seen from table 6, under the condition of modifying the attribute value, both the LSB watermark algorithm and the watermark information processing method of the database in the embodiment of the present application can correctly extract the watermark information, and as the modification ratio increases, the robustness of the method in the embodiment of the present application and the robustness of the LSB watermark algorithm can also be kept consistent, and the watermark information processing method of the database in the embodiment of the present application not only modifies the second decision tree into a decision tree that is the same as the first decision tree, but also well maintains the robustness;

in summary, the watermark information processing method of the database in the embodiment of the application not only modifies the second decision tree into the same decision tree as the first decision tree, but also well maintains robustness.

As shown in fig. 6, in a specific embodiment, watermark information processing is performed on the original database D: the watermark information W is embedded into the original database D by a watermark embedding algorithm (LSB) to obtain the watermark database D_w; a first decision tree is generated from the original database D and a second decision tree is generated from the watermark database D_w; the second decision tree is then reconstructed, that is, the division values of the first decision tree and the second decision tree are located, the modification direction is judged from the division values of the two trees, and the second data in the watermark database D_w is modified based on the modification direction to obtain the target watermark database; wherein a third decision tree generated from third data in the target watermark database is the same as the first decision tree;

furthermore, watermark extraction can be performed on the target watermark database through a watermark extraction algorithm to obtain target watermark information, and the extracted target watermark information can be used for copyright confirmation and tracing, so that the effectiveness of data is ensured, the accuracy of watermark information extraction is improved, and the robustness is good.
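The embedding step at the start of this flow can be sketched in Python as follows. The sketch assumes integer attribute values, a keyed SHA-256 hash of the primary key for grouping, and one group per watermark bit; the embodiment names LSB embedding but its exact formula is not reproduced in this text, so every function name and parameter here is hypothetical.

```python
import hashlib
from typing import List, Sequence, Tuple

Row = Tuple[int, int]  # (primary key, integer attribute value)


def group_index(primary_key: int, secret_key: str, n_groups: int) -> int:
    """Map a tuple to a group with a keyed hash of its primary key (assumed scheme)."""
    digest = hashlib.sha256(f"{secret_key}|{primary_key}".encode("utf-8")).hexdigest()
    return int(digest, 16) % n_groups


def embed_bit(value: int, bit: int) -> int:
    """Overwrite the least significant bit of value with bit."""
    return (value & ~1) | bit


def embed_watermark(rows: Sequence[Row], watermark: Sequence[int],
                    secret_key: str) -> List[Row]:
    """Embed the i-th watermark bit into every tuple of the i-th group."""
    n = len(watermark)
    return [(pk, embed_bit(value, watermark[group_index(pk, secret_key, n)]))
            for pk, value in rows]


# Hypothetical data: 50 tuples and a 3-bit watermark.
database = [(pk, 100 + pk) for pk in range(50)]
watermark = [1, 0, 1]
watermark_database = embed_watermark(database, watermark, "secret")
```

Each value changes by at most 1 under this scheme, which is why embedding can still move a tuple across a division value and alter the generated decision tree.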

Embodiments of the present invention generate watermark information; embedding the watermark information into a preset database to obtain a watermark database; generating a first decision tree according to first data in the preset database and generating a second decision tree according to second data in the watermark database; modifying second data in the watermark database according to the first decision tree and the second decision tree to obtain a target watermark database; the third decision tree generated according to the third data in the target watermark database is the same as the first decision tree, so that the threat of the database on data safety in copyright authentication and tracing is overcome, the effectiveness of the data is ensured, the accuracy of extracting watermark information is improved, and the robustness is good.

As shown in fig. 7, an embodiment of the present invention further provides a watermark information processing apparatus 70 for a database, including:

a generating module 71, configured to generate watermark information;

the processing module 72 is configured to embed the watermark information into a preset database to obtain a watermark database; generating a first decision tree according to first data in the preset database and generating a second decision tree according to second data in the watermark database; modifying second data in the watermark database according to the first decision tree and the second decision tree to obtain a target watermark database; wherein a third decision tree generated from third data in the target watermark database is the same as the first decision tree.

Optionally, the watermark information is a random sequence W of length n, where w_i is the i-th bit of watermark information in the random sequence W;

performing hash grouping on M tuples in a preset database according to a key value of each tuple to obtain a plurality of groups;

embedding the i-th bit watermark information w_i of the watermark information into the t_i-th group to obtain the watermark database.

modifying second data in the watermark database according to a preset constraint equation and the modification direction to obtain a modified watermark database until a third decision tree generated according to data in the modified watermark database is the same as the first decision tree;

and determining the modified watermark database as a target watermark database.

Optionally, the first segmentation value divides the preset database into two database segments; the second partitioning value divides the watermark database into two database segments;

calculating the difference value of the second index and the fourth index to obtain a second difference value;

and comparing the first difference with the second difference to determine the modification direction of the second data in the watermark database.

acquiring a first variation of the first segmentation value in the watermark database and a second variation of the second segmentation value in the watermark database;

It should be noted that the apparatus is an apparatus corresponding to the above method, and all the implementations in the above method embodiment are applicable to the embodiment of the apparatus, and the same technical effects can be achieved.

Embodiments of the present invention also provide a computing device, comprising: a processor, a memory storing a computer program which, when executed by the processor, performs the method as described above. All the implementation manners in the above method embodiment are applicable to this embodiment, and the same technical effect can be achieved.

Embodiments of the present invention also provide a computer-readable storage medium storing instructions which, when executed on a computer, cause the computer to perform the method as described above. All the implementation manners in the method embodiment are applicable to the embodiment, and the same technical effect can be achieved.

Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.

It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.

In the embodiments provided in the present invention, it should be understood that the disclosed apparatus and method may be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.

The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.

The functions may be stored in a computer-readable storage medium if they are implemented in the form of software functional units and sold or used as separate products. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: various media capable of storing program codes, such as a U disk, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disk.

Furthermore, it is to be noted that in the device and method of the invention, it is obvious that the individual components or steps can be decomposed and/or recombined. These decompositions and/or recombinations are to be regarded as equivalents of the present invention. Also, the steps of performing the series of processes described above may naturally be performed chronologically in the order described, but need not necessarily be performed chronologically, and some steps may be performed in parallel or independently of each other. It will be understood by those skilled in the art that all or any of the steps or elements of the method and apparatus of the present invention may be implemented in any computing device (including processors, storage media, etc.) or network of computing devices, in hardware, firmware, software, or any combination thereof, which can be implemented by those skilled in the art using their basic programming skills after reading the description of the present invention.

The object of the invention is thus also achieved by a program or a set of programs running on any computing device. The computing device may be a well-known general purpose device. The object of the invention is thus also achieved solely by providing a program product comprising program code for implementing the method or the apparatus. That is, such a program product also constitutes the present invention, and a storage medium storing such a program product also constitutes the present invention. It is to be understood that the storage medium may be any known storage medium or any storage medium developed in the future.

While the foregoing is directed to the preferred embodiment of the present invention, it will be understood by those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the invention as defined in the appended claims.

Claims

1. A method for processing watermark information of a database, comprising:

generating watermark information;

2. The method as claimed in claim 1, wherein the watermark information is a random sequence W of length n, and w_i is the i-th bit of watermark information in the random sequence W;

embedding the watermark information into a preset database to obtain a watermark database comprises:

embedding the i-th bit watermark information w_i of the watermark information into the t_i-th group to obtain the watermark database.

3. The method for processing watermark information of a database according to claim 1, wherein modifying second data in the watermark database according to the first decision tree and the second decision tree to obtain a target watermark database comprises:

and determining the modified watermark database as a target watermark database.

4. The method for processing the watermark information of the database according to claim 3, wherein the first partition value divides a preset database into two database segments; the second partition value divides the watermark database into two database segments;

5. The method for processing the watermark information of the database according to claim 3, wherein the step of modifying the second data in the watermark database according to a preset constraint equation and the modification direction to obtain a modified watermark database comprises:

6. The method for processing watermark information of a database according to claim 1, further comprising:

and extracting the watermark from the target watermark database to obtain target watermark information.

7. The method for processing watermark information of a database according to claim 6, wherein extracting a watermark from the target watermark database to obtain target watermark information comprises:

8. An apparatus for processing watermark information of a database, comprising:

the generating module is used for generating watermark information;

9. A computing device, comprising: a processor, a memory storing a computer program which, when executed by the processor, performs the method of any of claims 1 to 7.

10. A computer-readable storage medium storing instructions that, when executed on a computer, cause the computer to perform the method of any one of claims 1 to 7.