CN109522684B

CN109522684B - Data processing method, device and storage medium

Info

Publication number: CN109522684B
Application number: CN201811422511.7A
Authority: CN
Inventors: 裴超
Original assignee: China United Network Communications Group Co Ltd
Current assignee: China United Network Communications Group Co Ltd
Priority date: 2018-11-27
Filing date: 2018-11-27
Publication date: 2020-07-28
Anticipated expiration: 2038-11-27
Also published as: CN109522684A

Abstract

The embodiments of the invention provide a data processing method, a data processing device and a storage medium. The method comprises: acquiring watermark data and data to be processed, the data to be processed consisting of a plurality of data units, and applying data compression coding to the watermark data to obtain a compression code; calculating a first data position with a first preset algorithm based on a preset random number and a secret key, and acquiring, from each data unit, first data located at the first data position; for each data unit, calculating a second data position with a second preset algorithm based on the secret key, the watermark data and the first data acquired from that data unit, and acquiring second data located at the second data position in the compression code; and, for each data unit, replacing the first data in the data unit with the corresponding second data to obtain target data. The technical solution provided by the embodiments marks the ownership of data and makes that ownership easy to identify.

Description

Data processing method, device and storage medium

Technical Field

Embodiments of the invention relate to the field of computer technology, and in particular to a data processing method, a data processing device and a storage medium.

Background

With the development of informatization and networking, data in every field of society is growing explosively, and scenarios in which data serves as an important factor of production keep emerging. Mining and utilizing the value of large-scale data requires moving it from static storage, where it only consumes resources, to a form in which it can be used dynamically to produce value, and data has become a source of competitiveness in every field. At the same time, data owners share data with one another and cooperate across enterprises, industries and fields to obtain greater value.

Although cryptographic techniques can protect large-scale data during transmission and storage, the data must still be restored to plaintext when it is mined and used as a factor of production, and at that point it remains at risk of being leaked by internal staff, external attackers, third-party partners or data purchasers. It is therefore important to be able to prove the ownership of data and, when a leak occurs, to discover its source and trace the person responsible.

Disclosure of Invention

The embodiments of the invention provide a data processing method, a device and a storage medium for injecting a watermark into the content of data so as to mark the ownership of the data.

A first aspect of an embodiment of the present invention provides a data processing method, including:

acquiring watermark data and data to be processed, and applying data compression coding to the watermark data to obtain a compression code, wherein the data to be processed consists of a plurality of data units; calculating a first data position with a first preset algorithm based on a preset random number and a secret key, and acquiring, from each data unit, first data located at the first data position; for each data unit, calculating a second data position with a second preset algorithm based on the secret key, the watermark data and the first data acquired from that data unit, and acquiring second data located at the second data position in the compression code; and, for each data unit, replacing the first data in the data unit with the corresponding second data to obtain target data.

A second aspect of the embodiments of the present invention provides a data processing apparatus, including:

a processor and a memory, the memory having instructions stored therein that when executed by the processor perform the following:

A third aspect of the embodiments of the present invention provides a computer-readable storage medium containing instructions which, when executed on a computer, cause the computer to perform the method of the first aspect.

According to the embodiments of the invention, watermark data and data to be processed are acquired, and the watermark data is compression-coded to obtain a compression code. A first data position is calculated with a first preset algorithm based on a preset random number and a secret key, and the first data located at the first data position is acquired from each data unit. For each data unit, a second data position is calculated with a second preset algorithm based on the secret key, the watermark data and the first data acquired from that data unit, and the second data located at the second data position is acquired from the compression code. Finally, in each data unit the first data is replaced with the corresponding second data, yielding target data into which the watermark has been injected. Because the watermark is injected into the data content itself, the ownership of the data is marked and the marking is more reliable.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.

Fig. 1 is a flowchart of a data processing method according to an embodiment of the present invention;

Fig. 2 is a flowchart of a data processing method according to an embodiment of the present invention;

Fig. 3 is a schematic structural diagram of a data processing device according to an embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

The terms "comprises" and "comprising," and any variations thereof, in the description and claims of this invention, are intended to cover non-exclusive inclusions, e.g., a process or an apparatus that comprises a list of steps is not necessarily limited to those structures or steps expressly listed but may include other steps or structures not expressly listed or inherent to such process or apparatus.

For convenience of understanding, terms referred to in the embodiments of the present invention are first explained below:

Identification information: the embodiments of the invention define information that can uniquely identify a subject, such as a user, a device, an object or an activity, as identification information. For example, if the subject is a user, the identification information may be an ID card number, a mobile phone number, a passport number, a social security card number, a bank card number, an email address, etc., each of which can be attributed to a specific user.

Attribute information: the embodiments define information that describes a subject as attribute information. For a user, for example, gender, date of birth, place of residence and educational background are user attribute information; for an email account, the registration date, login IP address, security questions and the like are email-account attribute information.

Label information: the attribute information of a subject is classified by feature according to a preset rule to obtain corresponding label information. For example, for a mobile phone user who, on a given day, uses 4G mobile data to shop and browse during the morning period 9:00-12:00, the label information may be classified into telecom operator, traffic type, mobile phone app category and so on.

Note that some information has different meanings for different subjects. Latitude and longitude, for example, are attribute information (the user's location) when they describe a user, but identification information (a particular place in the world) when they describe map data.

To address the problems of proving data ownership and tracing the source of a leak after data has been leaked, the prior art offers the following two methods:

Method 1: establish a closed system environment in which data circulation and use are confined, so that data cannot flow out. The system has complete access control, an auditing system and a firewall, and enforces a strict data-flow management policy. When data is leaked, the auditing system reveals the leaking subject, the time and operations involved, and the leaking behavior. However, because this method confines data use to a limited, closed environment, the range over which data can circulate is restricted and data sharing is hindered. Moreover, the closed environment must be centralized, since otherwise it is difficult to manage, and this places an operational burden on the data provider or data manager responsible for it.

Method 2: inject digital watermarks into files with a large amount of redundant information, such as pictures, documents, audio and video. The object of injection is the data carrier, i.e., the file format, rather than the file content: for example, picture formats such as jpeg and bmp, document formats such as doc and Excel, audio formats such as wav and mp3, and video formats such as mp4 and mpeg. When a leak occurs, the leaking subject is identified by detecting the digital watermark in these file formats. However, this method places too many restrictions on the carrier format: the file format must contain a large redundant space for the watermarking operation. It cannot be applied to formats with little redundant space, and it is inefficient or ineffective for data that does not depend on a file format. For example, if the content of a doc file is copied into a txt file, the data has been leaked, but no watermark can be detected.

To address these problems in the prior art, an embodiment of the present invention provides a data processing method that adds a watermark to the content of the data, so that the ownership of the data can be proved even in an open environment. The method removes the dependence of conventional watermark injection on the data file format: even if the content is copied or the file is converted to another format, the ownership of the data can still be proved from the watermark in the content, and the point at which the source data leaked can even be traced.

Aspects of embodiments of the invention are described below in conjunction with exemplary embodiments.

Fig. 1 is a flowchart of a data processing method provided by an embodiment of the present invention. The method may be performed by a data processing device, which, for ease of understanding, may be taken in this embodiment to be a computer with arithmetic processing capability. As shown in Fig. 1, the method comprises the following steps:

Step 101, acquiring watermark data and data to be processed, and applying data compression coding to the watermark data to obtain a compression code, wherein the data to be processed consists of a plurality of data units.

In this embodiment, after the watermark data is acquired, a preset compression coding function may be used to calculate the compression code corresponding to the watermark data.

The data to be processed in this embodiment includes a plurality of data units, which may take the form of data entries or data blocks. Each data unit contains at least one piece of identification information and at least one piece of attribute information of the subject to which the data to be processed belongs.

Step 102, calculating a first data position with a first preset algorithm based on a preset random number and a secret key, and acquiring, from each data unit, first data located at the first data position.
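The patent does not name a concrete compression coding function CE. As a minimal sketch, assuming DEFLATE via Python's standard `zlib` module stands in for CE, the compression code W is simply the compressed byte string and each byte is one element of W. The names `compression_encode` and `compression_decode` and the sample watermark are hypothetical.

```python
import zlib


def compression_encode(watermark: str) -> bytes:
    """Hypothetical CE: DEFLATE-compress the UTF-8 bytes of the watermark.

    The returned bytes object plays the role of the compression code W;
    W[i] is the element w_(i+1) in the 1-based notation of the text.
    """
    return zlib.compress(watermark.encode("utf-8"), 9)


def compression_decode(code: bytes) -> str:
    """Inverse of compression_encode, needed when the watermark is extracted."""
    return zlib.decompress(code).decode("utf-8")


WM = "Owner: Example Corp"      # hypothetical watermark data
W = compression_encode(WM)      # compression code
l = len(W)                      # length l of W
print(l, list(W[:4]))
```

DEFLATE adds a header and a checksum, so for a very short watermark the "compressed" code can be longer than the input; any reversible coding would serve the same role in this sketch.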

The random number and the secret key in this embodiment are bound to the subject to which the data to be processed belongs.

The first preset algorithm in this embodiment may be chosen as needed; for ease of understanding, it may be regarded as a preset cryptographic function.

The first data position in this embodiment indicates where data is to be acquired from a data unit. In one possible design, it indicates the position from which identification information is acquired; in another possible design, it indicates the position from which attribute information is acquired. For example, for each data unit a first set may be generated from the identification information in the unit and a second set from its attribute information, where each element of the first set is a 2-tuple consisting of a piece of identification information and its position, and each element of the second set is a 2-tuple consisting of a piece of attribute information and its position. The two sets may take the following form:

First set: $Set_{ID} = \{(ID_1, Loc_1), \ldots, (ID_m, Loc_m)\}$

Second set: $Set_{Attr} = \{(Attr_1, Loc_1), \ldots, (Attr_n, Loc_n)\}$

where $ID_m$ denotes the m-th piece of identification information in the data unit and $Loc_m$ its position in the data unit, and $Attr_n$ denotes the n-th piece of attribute information in the data unit and $Loc_n$ its position in the data unit. That is, the first data position may designate the element of the first set $Set_{ID}$, or the element of the second set $Set_{Attr}$, located at that position. This is, of course, only illustrative and not limiting.
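A minimal sketch of building $Set_{ID}$ and $Set_{Attr}$ for one data unit, assuming (hypothetically) that a data unit is a list of field values and that a fixed schema `ID_POSITIONS` says which fields hold identification information. Positions here are 0-based list indices, whereas the text numbers elements from 1; the sample record is invented for illustration.

```python
from typing import List, Set, Tuple

# Hypothetical schema: which positions hold identification information;
# every other position is treated as attribute information.
ID_POSITIONS: Set[int] = {0, 1}  # e.g. mobile phone number, email address

Pair = Tuple[str, int]  # (value, position of the value in the data unit)


def build_sets(row: List[str]) -> Tuple[List[Pair], List[Pair]]:
    """Split one data unit into Set_ID and Set_Attr as (value, Loc) pairs."""
    set_id: List[Pair] = []
    set_attr: List[Pair] = []
    for loc, value in enumerate(row):
        if loc in ID_POSITIONS:
            set_id.append((value, loc))
        else:
            set_attr.append((value, loc))
    return set_id, set_attr


row = ["13800000000", "alice@example.com", "F", "1990-01-01", "Beijing"]
set_id, set_attr = build_sets(row)
print(set_id)    # m = len(set_id) identification pairs
print(set_attr)  # n = len(set_attr) attribute pairs
```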

For example, this embodiment may directly use the position calculated from the random number and the key as the first data position. Assuming the first data position indicates identification information in the first set, the 2-tuple at the first data position can be acquired from the data unit, and the identification information in that 2-tuple taken as the first data.

Alternatively, a third data position may first be calculated from the random number and the key, and the first data position then calculated with a preset second function based on the key and the identification information or attribute information indicated by the third data position. These ways of calculating the first data position are, of course, only illustrative and do not limit the present invention.

Step 103, for each data unit, calculating a second data position with a second preset algorithm based on the secret key, the watermark data and the first data acquired from the data unit, and acquiring second data located at the second data position in the compression code.

The second preset algorithm and the first preset algorithm in this embodiment may be the same or different; in one possible implementation, both are cryptographic functions.

The second data position in this embodiment indicates the position from which data is acquired in the compression code. For example, if the first data position indicates the attribute information at a certain position in the data unit, that attribute information is taken as the first data, a position in the compression code is calculated from the attribute information, the key and the watermark data, and the data at that position is taken as the second data.

Step 104, for each data unit, replacing the first data in the data unit with the corresponding second data to obtain target data.

In this embodiment, watermark data and data to be processed are acquired and the watermark data is compression-coded to obtain a compression code; a first data position is calculated with a first preset algorithm based on a preset random number and a secret key, and the first data at that position is acquired from each data unit; for each data unit, a second data position is calculated with a second preset algorithm based on the secret key, the watermark data and the first data acquired from that unit, and the second data at that position is acquired from the compression code; finally, in each data unit the first data is replaced with the corresponding second data, yielding target data into which the watermark has been injected. Because the watermark is injected into the data content itself, the ownership of the data is marked and the marking is more reliable.
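Step 103 can be sketched as follows, assuming SHA-256 (from Python's `hashlib`) as the second preset algorithm and DEFLATE output as the compression code; neither choice is fixed by the text, and `keyed_position` and `second_data` are hypothetical names. The position is the hash of key, first data and watermark reduced modulo the code length, so it always falls inside the compression code.

```python
import hashlib
import zlib


def keyed_position(modulus: int, *parts: bytes) -> int:
    """Hypothetical second preset algorithm: SHA-256 over the concatenated
    parts, read as a big-endian integer and reduced mod `modulus`."""
    digest = hashlib.sha256(b"".join(parts)).digest()
    return int.from_bytes(digest, "big") % modulus


def second_data(key: bytes, watermark: bytes, first_data: str, code: bytes):
    """Return (gamma, w_gamma): a 0-based position in the compression code
    and the element stored there."""
    gamma = keyed_position(len(code), key, first_data.encode("utf-8"), watermark)
    return gamma, code[gamma]


key = b"hypothetical-secret-key"
wm = b"Owner: Example Corp"
code = zlib.compress(wm)  # stand-in for the compression code
gamma, w_gamma = second_data(key, wm, "Beijing", code)
print(gamma, w_gamma)
```

Plain byte concatenation mirrors the `||` notation used in the document; a production design would more likely length-prefix each part, or use a keyed construction such as HMAC, so that different inputs cannot concatenate to the same byte string.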
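Step 104 itself is a plain substitution. The sketch below assumes, hypothetically, that each data unit is a list of strings, that the first data position is a single index shared by all units (in the Fig. 1 embodiment it depends only on the random number and the key), and that each piece of second data has already been rendered as a string; `replace_first_data` and `embed` are illustrative names.

```python
from typing import List, Sequence


def replace_first_data(row: Sequence[str], first_loc: int, second: str) -> List[str]:
    """Return a copy of one data unit in which the value at first_loc
    (the first data) is replaced by the second data."""
    marked = list(row)  # copy so the original data unit is left untouched
    marked[first_loc] = second
    return marked


def embed(rows: Sequence[Sequence[str]], first_loc: int,
          seconds: Sequence[str]) -> List[List[str]]:
    """Apply Step 104 to every data unit; seconds[k] is the second data
    computed for rows[k] in Step 103."""
    if len(rows) != len(seconds):
        raise ValueError("one piece of second data is needed per data unit")
    return [replace_first_data(r, first_loc, s) for r, s in zip(rows, seconds)]


rows = [["13800000000", "Beijing"], ["13900000000", "Shanghai"]]
target = embed(rows, 1, ["0x78", "0x9c"])
print(target)
```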

Further optimization and extension of the above embodiment are provided below.

Fig. 2 is a flowchart of a data processing method according to an embodiment of the present invention. As shown in Fig. 2, building on the embodiment of Fig. 1, this embodiment includes the following steps:

Step 201, acquiring watermark data and data to be processed, and applying data compression coding to the watermark data to obtain a compression code, wherein the data to be processed consists of a plurality of data units.

Each data unit contains at least one piece of identification information and at least one piece of attribute information of the subject to which the data to be processed belongs.

Step 202, for each data unit, generating a first set based on the identification information in the data unit and a second set based on the attribute information in the data unit, wherein each element of the first set is a 2-tuple consisting of a piece of identification information and its position, and each element of the second set is a 2-tuple consisting of a piece of attribute information and its position.

Step 203, calculating a third data position with a preset first function based on the preset random number and the secret key, and acquiring, from the first set corresponding to each data unit, the 2-tuple located at the third data position.

Step 204, for each data unit, calculating a first data position with a preset second function based on the key and the identification information in the 2-tuple acquired for that data unit, and acquiring first data located at the first data position from the second set corresponding to the data unit.

Step 205, for each data unit, calculating a second data position with the second preset algorithm based on the secret key, the watermark data and the attribute information in the 2-tuple acquired for that data unit, and acquiring second data located at the second data position in the compression code.

Step 206, for each data unit, replacing the first data in the data unit with the corresponding second data to obtain target data.

For example, given watermark data WM, a compression code $W = CE(WM)$ may be calculated with a compression coding function CE, where the i-th element of W is denoted $w_i$, $i = 1, 2, \ldots, l$, and l is the length of W.

Further, assume the data to be processed BD consists of data units RowText. For each RowText, the identification information and attribute information are identified and their positions recorded, producing a first set $Set_{ID} = \{(ID_1, Loc_1), \ldots, (ID_m, Loc_m)\}$ and a second set $Set_{Attr} = \{(Attr_1, Loc_1), \ldots, (Attr_n, Loc_n)\}$, where $ID_m$ denotes the m-th piece of identification information in the data unit and $Loc_m$ its position, and $Attr_n$ denotes the n-th piece of attribute information in the data unit and $Loc_n$ its position.

Next, a third data position α, indicating the position of a piece of identification information in the first set, is calculated with the preset first function from the preset random data Rpre and the key Key. One possible calculation uses a cryptographic function H:

$$\alpha = H(Rpre \,\|\, Key) \bmod m$$
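The formula for α can be sketched in Python, assuming SHA-256 plays the role of the cryptographic function H and that Rpre and Key are byte strings; `third_data_position` and the sample values are hypothetical. Because the result is reduced modulo m, it is a valid 0-based index into $Set_{ID}$. Note that α depends only on Rpre, Key and m, so data units with the same number of identification fields all use the same α.

```python
import hashlib


def third_data_position(r_pre: bytes, key: bytes, m: int) -> int:
    """alpha = H(Rpre || Key) mod m, with SHA-256 standing in for H."""
    if m <= 0:
        raise ValueError("the first set must be non-empty")
    digest = hashlib.sha256(r_pre + key).digest()
    return int.from_bytes(digest, "big") % m


set_id = [("13800000000", 0), ("alice@example.com", 1)]  # Set_ID of one unit
alpha = third_data_position(b"hypothetical-Rpre", b"hypothetical-key", len(set_id))
id_alpha, loc_alpha = set_id[alpha]                     # (ID_alpha, Loc_alpha)
print(alpha, id_alpha, loc_alpha)
```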

After the third data position is determined, for each data unit the 2-tuple at the third data position is acquired from the first set of the data unit RowText, $(ID_\alpha, Loc_\alpha) = Set_{ID}(\alpha)$. Based on the identification information $ID_\alpha$ in this 2-tuple and the key Key, a first data position β is calculated with a preset second function (e.g., a cryptographic function), and the 2-tuple at the first data position is then acquired from the second set of the data unit RowText, $(Attr_\beta, Loc_\beta) = Set_{Attr}(\beta)$.

Further, for each data unit, a second data position γ, indicating a position in the compression code, is calculated from the key Key, the attribute information $Attr_\beta$ and the watermark data WM, and the second data $w_\gamma$ at position γ in the compression code is acquired. A fuzzy confusion function FC is then applied to $w_\gamma$ and $Attr_\beta$ to compute $Attr_\beta' = FC(Attr_\beta, w_\gamma, \gamma)$, and $Attr_\beta'$ replaces $Attr_\beta$ to obtain the target data.

Optionally, this embodiment may further include a watermark extraction process. Specifically, for each data unit, $Attr_\beta'$ may be located in the same way as above, and the inverse of the fuzzy confusion function FC applied to $Attr_\beta'$ to recover the corresponding $Attr_\beta$ and the data $w_\gamma$ at position γ of the compression code. Once $w_\gamma$ has been acquired from all data units, the values can be ordered by γ to reconstruct the compression code.
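The second function is only described as, for example, a cryptographic function. A minimal sketch, assuming β = SHA-256(ID_α || Key) mod n with n the size of $Set_{Attr}$; `first_data_position` and `select_attribute` are hypothetical names. Since β depends on the identification value $ID_\alpha$, different data units will generally have different attribute fields selected for marking.

```python
import hashlib
from typing import List, Tuple

Pair = Tuple[str, int]  # (attribute value, position in the data unit)


def first_data_position(id_alpha: str, key: bytes, n: int) -> int:
    """Hypothetical second function: beta = SHA-256(ID_alpha || Key) mod n."""
    if n <= 0:
        raise ValueError("the second set must be non-empty")
    digest = hashlib.sha256(id_alpha.encode("utf-8") + key).digest()
    return int.from_bytes(digest, "big") % n


def select_attribute(set_attr: List[Pair], id_alpha: str,
                     key: bytes) -> Tuple[int, str, int]:
    """Return (beta, Attr_beta, Loc_beta) for one data unit."""
    beta = first_data_position(id_alpha, key, len(set_attr))
    attr_beta, loc_beta = set_attr[beta]
    return beta, attr_beta, loc_beta


set_attr = [("F", 2), ("1990-01-01", 3), ("Beijing", 4)]
beta, attr_beta, loc_beta = select_attribute(set_attr, "13800000000", b"hypothetical-key")
print(beta, attr_beta, loc_beta)
```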
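Neither the calculation of γ nor the fuzzy confusion function FC is specified in detail. The sketch below assumes γ = SHA-256(Key || Attr_β || WM) mod l and uses a deliberately simple, hypothetical FC taking the three arguments Attr_β, w_γ and γ, which appends γ and $w_\gamma$ as a hex suffix. It only demonstrates that FC must be invertible for the extraction step to work; it makes no attempt to disguise the change, which a real FC would presumably do.

```python
import hashlib
import zlib
from typing import Tuple


def gamma_position(key: bytes, attr_beta: str, wm: bytes, l: int) -> int:
    """Hypothetical second preset algorithm:
    gamma = SHA-256(Key || Attr_beta || WM) mod l (0-based)."""
    digest = hashlib.sha256(key + attr_beta.encode("utf-8") + wm).digest()
    return int.from_bytes(digest, "big") % l


def fc(attr_beta: str, w_gamma: int, gamma: int) -> str:
    """Hypothetical FC: append gamma (4 hex digits) and w_gamma (2 hex digits)
    after a '#' so that the mapping can be undone."""
    if not (0 <= gamma < 0x10000 and 0 <= w_gamma < 0x100):
        raise ValueError("gamma or w_gamma out of range for this encoding")
    return f"{attr_beta}#{gamma:04x}{w_gamma:02x}"


def fc_inverse(attr_prime: str) -> Tuple[str, int, int]:
    """Undo fc(): return (Attr_beta, w_gamma, gamma)."""
    attr_beta, suffix = attr_prime.rsplit("#", 1)
    return attr_beta, int(suffix[4:], 16), int(suffix[:4], 16)


key, wm = b"hypothetical-key", b"Owner: Example Corp"
code = zlib.compress(wm)  # compression code W
row = ["13800000000", "alice@example.com", "F", "1990-01-01", "Beijing"]
loc_beta = 4              # assume beta selected the "Beijing" attribute
gamma = gamma_position(key, row[loc_beta], wm, len(code))
row[loc_beta] = fc(row[loc_beta], code[gamma], gamma)  # Attr_beta' replaces Attr_beta
print(row[loc_beta])
```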
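Extraction can be sketched under the same hypothetical FC encoding (attribute, `#`, 4 hex digits of γ, 2 hex digits of $w_\gamma$): invert FC on each marked attribute, keep one $w_\gamma$ per position γ, order the recovered values by γ and decompress. This assumes DEFLATE was the compression coding and that the marked attributes have already been located as in the embedding steps; `extract_watermark` is a hypothetical name. A missing interior position is reported explicitly, and a missing tail makes `zlib.decompress` raise `zlib.error`.

```python
import zlib
from typing import Dict, Iterable, Tuple


def fc_inverse(attr_prime: str) -> Tuple[str, int, int]:
    """Inverse of the hypothetical fc(): return (Attr_beta, w_gamma, gamma)."""
    attr_beta, suffix = attr_prime.rsplit("#", 1)
    return attr_beta, int(suffix[4:], 16), int(suffix[:4], 16)


def extract_watermark(marked_attrs: Iterable[str]) -> str:
    """Recover w_gamma from every data unit, order by gamma, decode W."""
    recovered: Dict[int, int] = {}
    for attr_prime in marked_attrs:
        _, w_gamma, gamma = fc_inverse(attr_prime)
        recovered[gamma] = w_gamma  # several units may carry the same gamma
    if not recovered:
        raise ValueError("no watermark elements found")
    l = max(recovered) + 1
    missing = [i for i in range(l) if i not in recovered]
    if missing:
        raise ValueError(f"compression code incomplete, missing {missing}")
    code = bytes(recovered[i] for i in range(l))
    return zlib.decompress(code).decode("utf-8")


wm = "Owner: Example Corp"
code = zlib.compress(wm.encode("utf-8"))
# Simulated marked attributes in arbitrary order, plus one duplicate position.
marked = [f"attr{i}#{i:04x}{code[i]:02x}" for i in reversed(range(len(code)))]
marked.append(f"dup#{0:04x}{code[0]:02x}")
print(extract_watermark(marked))
```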

The foregoing is, of course, illustrative and not limiting of the invention.

Fig. 3 is a schematic structural diagram of a data processing device according to an embodiment of the present invention. As shown in Fig. 3, the device 30 includes a processor 31 and a memory 32, the memory 32 storing instructions that, when executed by the processor 31, perform the following:

In one possible design, each data unit contains at least one piece of identification information and at least one piece of attribute information of the subject to which the data to be processed belongs;

before calculating the first data position with the first preset algorithm based on the preset random number and the secret key and acquiring the first data located at the first data position from each data unit, the processor 31 is further configured to:

for each data unit, generate a first set based on the identification information in the data unit and a second set based on the attribute information in the data unit, wherein each element of the first set is a 2-tuple consisting of a piece of identification information and its position, and each element of the second set is a 2-tuple consisting of a piece of attribute information and its position.

In one possible design, when calculating the first data position with the first preset algorithm based on the preset random number and the secret key and acquiring the first data located at the first data position from each data unit, the processor 31 is specifically configured to:

calculate a third data position with a preset first function based on the preset random number and the secret key, and acquire, from the first set corresponding to each data unit, the 2-tuple located at the third data position; and, for each data unit, calculate a first data position with a preset second function based on the key and the identification information in the 2-tuple acquired for that data unit, and acquire first data located at the first data position from the second set corresponding to the data unit.

In one possible design, when, for each data unit, calculating a second data position with the second preset algorithm based on the secret key, the watermark data and the first data acquired from the data unit, and acquiring second data located at the second data position in the compression code, the processor 31 is specifically configured to:

for each data unit, calculate a second data position with the second preset algorithm based on the secret key, the watermark data and the attribute information in the 2-tuple acquired for that data unit, and acquire second data located at the second data position in the compression code.

calculate a first data position with the first preset algorithm based on the preset random number and the secret key, and acquire, from the first set corresponding to each data unit, the 2-tuple located at the first data position; and, for each data unit, take the identification information in the 2-tuple acquired for that data unit as the first data.

The device provided by this embodiment can be used to execute the method of any of the above embodiments; its mode of execution and beneficial effects are similar and are not repeated here.

An embodiment of the invention further provides a computer-readable storage medium containing instructions which, when run on a computer, cause the computer to perform the method of any of the above embodiments.

Finally, it should be noted that a person of ordinary skill in the art will appreciate that all or part of the processes of the above method embodiments may be implemented by a computer program instructing the relevant hardware. The computer program may be stored in a computer-readable storage medium and, when executed, may include the processes of the above method embodiments. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), or the like.

Each functional unit in the embodiments of the present invention may be integrated into one processing module, or each unit may exist alone physically, or two or more units are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. The integrated module, if implemented in the form of a software functional module and sold or used as a separate product, may also be stored in a computer readable storage medium. The storage medium mentioned above may be a read-only memory, a magnetic or optical disk, etc.

The above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.

Claims

1. A data processing method, comprising:

acquiring watermark data and data to be processed, and performing data compression coding on the watermark data to obtain a compression code, wherein the data to be processed consists of a plurality of data units;

calculating a first data position with a first preset algorithm based on a preset random number and a secret key, and acquiring first data located at the first data position from each data unit;

for each data unit, calculating a second data position with a second preset algorithm based on the secret key, the watermark data and the first data acquired from the data unit, and acquiring second data located at the second data position in the compression code;

for each data unit, replacing the first data in the data unit with the corresponding second data to obtain target data;

wherein the data unit comprises at least one piece of identification information and at least one piece of attribute information of a subject to which the data to be processed belongs;

before the calculating of the first data position with the first preset algorithm based on the preset random number and the secret key and the acquiring of the first data located at the first data position from each data unit, the method further comprises:

for each data unit, generating a first set based on identification information in the data unit, and generating a second set based on attribute information in the data unit, wherein elements of the first set are 2-tuples each formed by a piece of identification information and its position, and elements of the second set are 2-tuples each formed by a piece of attribute information and its position;

the calculating of a first data position with a first preset algorithm based on a preset random number and a secret key, and the acquiring of first data located at the first data position from each data unit, comprise:

calculating a third data position with a preset first function based on the preset random number and the secret key, and acquiring, from the first set corresponding to each data unit, the 2-tuple located at the third data position;

and, for each data unit, calculating a first data position with a preset second function based on the secret key and the identification information in the 2-tuple acquired for the data unit, and acquiring first data located at the first data position from the second set corresponding to the data unit.

2. The method according to claim 1, wherein the calculating, for each data unit, a second data position based on the key, the watermark data, and the first data obtained from the data unit by using a second preset algorithm, and obtaining second data located at the second data position from the compression coding comprises:

3. A data processing apparatus comprising a processor and a memory, said memory having stored therein instructions that when executed by said processor perform the following:

before the processor calculates to obtain the first data position based on the preset random number and the secret key by using a first preset algorithm and acquires the first data positioned at the first data position from each data unit, the processor is further configured to:

the processor, when calculating to obtain a first data position based on a preset random number and a key by using a first preset algorithm and acquiring first data located at the first data position from each data unit, is specifically configured to:

4. The device according to claim 3, wherein the processor, when calculating, for each data unit, a second data position with a second preset algorithm based on the key, the watermark data and the first data acquired from the data unit, and acquiring second data located at the second data position in the compression code, is specifically configured to:

5. A computer-readable storage medium comprising instructions which, when executed on a computer, cause the computer to perform the method of claim 1 or 2.