WO2024066271A1 - Database watermark embedding method and apparatus, database watermark tracing method and apparatus, and electronic device - Google Patents

Database watermark embedding method and apparatus, database watermark tracing method and apparatus, and electronic device

Info

Publication number
WO2024066271A1
WO2024066271A1 PCT/CN2023/085945 CN2023085945W WO2024066271A1 WO 2024066271 A1 WO2024066271 A1 WO 2024066271A1 CN 2023085945 W CN2023085945 W CN 2023085945W WO 2024066271 A1 WO2024066271 A1 WO 2024066271A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
preset
zero
binary
embedding
Prior art date
Application number
PCT/CN2023/085945
Other languages
French (fr)
Chinese (zh)
Inventor
刘睿民
丁若冰
张锦
Original Assignee
北京柏睿数据技术股份有限公司
Priority date
Filing date
Publication date
Application filed by 北京柏睿数据技术股份有限公司
Publication of WO2024066271A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/211 Schema design and management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2228 Indexing structures
    • G06F16/2255 Hash tables
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2282 Tablespace storage structures; Management thereof

Definitions

  • The present application relates to the field of database technology, and more specifically, to a database watermark embedding method, a source tracing method, a device and an electronic device.
  • Database watermarking technology uses covert means to embed watermark information, such as copyright descriptions and user identities, into table data and file data without affecting the use of the original data. It thereby solves the technical problem that data leaks cannot be traced during data sharing, distribution, and use, safeguards data security throughout those processes, and enhances the value of data sharing.
  • In the prior art, the algorithm for implementing database watermarking usually depends on the type of data: different transformation algorithms are used to make imperceptible transformations to the data, thereby hiding the watermark data in the specific data and completing the embedding of the database watermark.
  • When tracing the data, the backtracking algorithm corresponding to the algorithm type used is applied to restore the watermark information, enabling data tracing in cases such as data leakage.
  • Although this method of implementing database watermarks with different transformation and backtracking algorithms solves the technical problem that leaked data cannot be traced, it requires a different transformation algorithm for each type of data and therefore has poor generality.
  • In addition, after the transformation the watermark data is inserted into the data and becomes a component of it, so the data value changes. Users cannot read the value of the data directly and must use the backtracking method to read it, which makes the calculation process relatively complicated.
  • At the same time, running the different algorithms consumes part of the computing resources of the database system, which reduces database performance and seriously affects data display.
  • The embodiments of the present application provide a database watermark embedding method, a tracing method, a device and an electronic device, which are used to achieve efficient database watermark embedding without affecting data display.
  • a method for embedding a database watermark comprising:
  • mapping the binary data into zero-width character string data according to a preset mapping relationship table
  • the preset length is not less than the length of the original binary number corresponding to the character, and the preset mapping relationship table is determined according to the mapping relationship between different binary numbers with preset bit numbers and different zero-width character strings.
  • the method before embedding the final encoded data as a database watermark into an embedding position corresponding to the data to be processed, the method further includes:
  • the label mark is determined in advance according to the type, length, position and attribute of the data to be processed.
  • the binary data is mapped to zero-width character string data according to a preset mapping relationship table, specifically:
  • Each group of the sub-data is mapped into a zero-width character string according to the preset mapping relationship table to obtain the zero-width character string data.
  • each character in the preliminary encoded data is converted into a binary number of a preset length to obtain binary data, specifically:
  • the binary data is obtained according to a binary number of a preset length corresponding to each of the characters.
  • the preset encoding rule includes an encoding rule corresponding to hexadecimal Unicode encoding, or decimal Unicode encoding, or hexadecimal GBK encoding, or decimal GBK encoding.
  • a method for tracing the database watermark as described in the first aspect includes:
  • Each of the characters is converted into the data to be processed according to the preset encoding rule, and the data to be processed is used as the traceability result data.
  • the embedding position is determined by a tag mark, and the tag mark is determined in advance according to the type, length, position and attribute of the data to be processed.
  • a database watermark embedding device comprising:
  • a first conversion module used to convert the data to be processed into preliminary coded data according to a preset coding rule
  • a second conversion module used for converting each character in the preliminary coded data into a binary number of a preset length and obtaining binary data
  • a first mapping module used for mapping the binary data into zero-width character string data according to a preset mapping relationship table
  • An adding module used for adding a preset zero-width character string before and after the zero-width character string data respectively to obtain final encoded data
  • An embedding module used for embedding the final encoded data as a database watermark into an embedding position corresponding to the data to be processed
  • the preset length is not less than the length of the original binary number corresponding to the character, and the preset mapping relationship table is determined according to the mapping relationship between different binary numbers with preset bit numbers and different zero-width character strings.
  • a device for tracing the source of a database watermark as described in the third aspect comprising:
  • a determination module used for determining the final encoded data according to the embedding position
  • a removal module used for removing the preset zero-width character string before and after the final encoded data and obtaining the zero-width character string data
  • a second mapping module used for mapping the zero-width character string data into the binary data according to the preset mapping relationship table
  • a third conversion module used for dividing the binary data into a plurality of groups of binary numbers according to the preset length, and converting each group of binary numbers into each character of the preliminary coded data;
  • the fourth conversion module is used to convert each of the characters into the data to be processed according to the preset encoding rule, and use the data to be processed as the tracing result data.
  • an electronic device including:
  • a memory configured to store executable instructions of the processor
  • the processor is configured to execute the embedding method described in any one of the first aspects or the tracing method described in any one of the second aspects by executing the executable instructions.
  • the data to be processed is converted into preliminary coded data according to the preset coding rules; each character in the preliminary coded data is converted into a binary number of preset length and binary data is obtained; the binary data is mapped into zero-width character string data according to the preset mapping relationship table; the preset zero-width character string is added before and after the zero-width character string data respectively to obtain the final coded data; the final coded data is embedded as a database watermark in the embedding position corresponding to the data to be processed; wherein the preset length is not less than the length of the original binary number corresponding to the character, and the preset mapping relationship table is determined according to the mapping relationship between different binary numbers of preset bits and different zero-width character strings.
  • Because the data to be processed is first uniformly converted into preliminary coded data, there is no need to process different types of data with different algorithms, which improves generality and preserves the high performance of the database.
  • Because the final coded data consists of zero-width character strings, embedding the database watermark does not affect the data display, so efficient database watermark embedding is achieved without affecting the data display.
  • FIG. 1 is a schematic flow chart of a method for embedding a database watermark according to an embodiment of the present invention.
  • FIG. 2 is a schematic flow chart of a method for tracing the source of a database watermark according to an embodiment of the present invention.
  • FIG. 3 is a schematic structural diagram of a database watermark embedding device according to an embodiment of the present invention.
  • FIG. 4 is a schematic structural diagram of a database watermark source tracing device according to an embodiment of the present invention.
  • FIG. 5 is a block diagram of an electronic device according to an embodiment of the present invention.
  • The present application embodiment provides a method for embedding a database watermark, as shown in FIG. 1, the method comprising the following steps:
  • Step S101 converting the data to be processed into preliminary coded data according to a preset coding rule.
  • the data to be processed may be sensitive data specified by the user.
  • the user may determine the sensitive data by defining keywords or metadata information and then matching the keywords or metadata information; or may define regular expressions according to the structural composition rules of sensitive data by studying the characteristics of sensitive data, and then determine the sensitive data by matching the regular expressions.
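As a rough illustration of the regular-expression approach, the sketch below flags values that match a pattern. The pattern (an 11-digit number starting with 1) and the names `PHONE_PATTERN` and `find_sensitive` are assumptions made for this example, not rules taken from the application.

```python
import re

# Hypothetical rule: an 11-digit number starting with 1 is treated as sensitive.
# The lookarounds stop the pattern from matching inside a longer digit run.
PHONE_PATTERN = re.compile(r"(?<!\d)1\d{10}(?!\d)")


def find_sensitive(values):
    """Return the values that contain a match for the sensitive-data pattern."""
    return [value for value in values if PHONE_PATTERN.search(value)]


rows = ["order 42", "call 13800138000", "id 123456789012"]
print(find_sensitive(rows))  # ['call 13800138000']
```

A real deployment would presumably hold one such pattern per category of sensitive data.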
  • the data to be processed may include a variety of characters, such as text, numbers, letters, punctuation marks, graphic symbols, etc.
  • the preset encoding rule is a general encoding rule that can uniformly encode different types of characters in the data to be processed.
  • the data to be processed can be converted into preliminary encoded data according to the preset encoding rule.
  • the preset encoding rules include encoding rules corresponding to hexadecimal Unicode encoding, or decimal Unicode encoding, or hexadecimal GBK encoding, or decimal GBK encoding.
  • Unicode is a universal character encoding standard, developed by an international organization, that aims to accommodate the characters and symbols of all the world's writing systems.
  • GBK (Chinese Internal Code Specification) is a character encoding in which the English part uses single-byte encoding that is fully compatible with ASCII, and the Chinese part uses double-byte encoding.
  • the preset encoding rule can use the encoding rule corresponding to the decimal or hexadecimal Unicode encoding, or the encoding rule corresponding to the decimal or hexadecimal GBK encoding.
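A minimal sketch of step S101 with a hexadecimal encoding rule might look as follows. The function names, the fixed width of four hexadecimal digits per character (which limits the Unicode variant to the Basic Multilingual Plane), and the use of uppercase digits are assumptions for illustration; the application does not fix these details in the text above.

```python
def to_hex_unicode(text: str, digits: int = 4) -> str:
    """Encode each character as a fixed-width hexadecimal Unicode code point."""
    parts = []
    for ch in text:
        code = format(ord(ch), "X")
        if len(code) > digits:
            raise ValueError(f"{ch!r} needs more than {digits} hex digits")
        parts.append(code.zfill(digits))
    return "".join(parts)


def to_hex_gbk(text: str) -> str:
    """Encode text as GBK bytes written in hexadecimal.

    ASCII characters take one byte and Chinese characters take two.
    """
    return text.encode("gbk").hex().upper()


print(to_hex_unicode("A1中"))  # 004100314E2D
print(to_hex_gbk("A中"))       # 41D6D0
```

Whatever rule is chosen, the output is a string over a small alphabet (here the hexadecimal digits), which is what lets the later steps treat every data type the same way.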
  • Step S102 converting each character in the preliminary encoded data into a binary number of a preset length to obtain binary data.
  • each character in the preliminary coded data needs to be converted into a binary number of a preset length to achieve normalization of the preliminary coded data.
  • the preset length is not less than the length of the original binary number corresponding to the character.
  • each character in the preliminary encoded data is converted into a binary number of a preset length to obtain binary data, specifically:
  • the binary data is obtained according to a binary number of a preset length corresponding to each of the characters.
  • each character is first converted into a binary number in sequence to obtain an original binary number corresponding to each character.
  • the length of the original binary number may not reach the preset length. If the length of the original binary number is less than the preset length, zeros are added before the highest bit of the original binary number to make the length of the original binary number reach the preset length.
  • binary data is formed according to the binary numbers of each preset length. For example, if the preset length is 8 bits and the original binary number is 6 bits, two zeros are added before the highest bit of the original binary number.
  • the preset length may be 8 bits or 16 bits.
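The padding rule of step S102 can be sketched as below. The sketch assumes that the "original binary number" of a character is the binary form of its code point, which matches the 6-bit example above (the digit characters `0` to `9` have 6-bit code points); the function name `to_fixed_width_binary` is hypothetical.

```python
def to_fixed_width_binary(coded: str, width: int = 8) -> str:
    """Convert each character to a binary number left-padded with zeros to `width` bits."""
    chunks = []
    for ch in coded:
        raw = format(ord(ch), "b")  # original binary number, no padding
        if len(raw) > width:
            raise ValueError(f"{ch!r} does not fit in {width} bits")
        chunks.append(raw.zfill(width))  # add zeros before the highest bit
    return "".join(chunks)


print(to_fixed_width_binary("0"))   # 00110000 (6-bit 110000 padded with two zeros)
print(to_fixed_width_binary("4A"))  # 0011010001000001
```

Fixing the width is what allows the tracing side to split the bit stream back into characters without any separators.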
  • Step S103 Map the binary data into zero-width character string data according to a preset mapping relationship table.
  • The preset mapping relationship table is determined according to the mapping relationship between different binary numbers with the preset number of bits and different zero-width character strings.
  • The zero-width character string consists of zero-width characters, which are non-printable Unicode characters that occupy no visible width when rendered. They are invisible but real characters that represent a certain control function in browsers and general text editors.
  • Binary data can be mapped to zero-width character string data according to the preset mapping relationship table.
  • The preset mapping relationship table may be as shown in Table 1.
  • the binary data is mapped to zero-width character string data according to a preset mapping relationship table, specifically:
  • Each group of the sub-data is mapped into a zero-width character string according to the preset mapping relationship table to obtain the zero-width character string data.
  • the binary data is first grouped according to a preset number of bits to obtain multiple groups of sub-data, and then a preset mapping relationship table is queried according to each group of sub-data, and each zero-width character string is determined according to the query result, so as to map each group of sub-data to zero-width character string data.
  • the preset number of bits is 2, and those skilled in the art may also adopt other preset number of bits according to actual needs.
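The grouping and lookup of step S103 can be sketched as follows. The table uses the 2-bit pairs given in the later embodiment of this document (00, 01, 10 and 11 mapped to U+200B, U+200C, U+200D and U+200E); the names `ZERO_WIDTH_TABLE` and `map_to_zero_width` are hypothetical.

```python
# 2-bit groups mapped to zero-width characters, as in the embodiment's example.
ZERO_WIDTH_TABLE = {
    "00": "\u200b",
    "01": "\u200c",
    "10": "\u200d",
    "11": "\u200e",
}


def map_to_zero_width(bits: str, table=ZERO_WIDTH_TABLE) -> str:
    """Split `bits` into groups of the table's key size and map each group."""
    group = len(next(iter(table)))  # preset number of bits, 2 for this table
    if len(bits) % group:
        raise ValueError("bit string length is not a multiple of the group size")
    return "".join(table[bits[i:i + group]] for i in range(0, len(bits), group))


encoded = map_to_zero_width("00110000")
print([hex(ord(ch)) for ch in encoded])  # ['0x200b', '0x200e', '0x200b', '0x200b']
```

A table with a different preset number of bits (for example 3 bits and eight zero-width strings) would work with the same function, provided all keys have the same length.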
  • Step S104 adding a preset zero-width character string before and after the zero-width character string data to obtain final encoded data.
  • Step S105 embed the final encoded data as a database watermark into an embedding position corresponding to the data to be processed.
  • the data to be processed corresponds to an embedding position, which can be a position specified by the user.
  • the final encoded data can be fixed or it can be the data to be processed itself, and the final encoded data can be embedded in the embedding position as a database watermark.
  • the method before embedding the final encoded data as the database watermark into the embedding position corresponding to the data to be processed, the method further includes:
  • the label mark is determined in advance according to the type, length, position and attribute of the data to be processed.
  • the label mark is determined in advance according to the type, length, position and attribute of the data to be processed, and the embedding position of the database watermark can be determined according to the label mark.
  • a hash algorithm can be used to process the type, length, position and attribute of the data to be processed, and a label mark is obtained according to the processing result.
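One possible way to derive such a label mark is sketched below. The choice of SHA-256, the `|` separator, and the field values are assumptions for illustration only; the text above says only that a hash algorithm can be used.

```python
import hashlib


def make_label_mark(data_type: str, length: int, position: str, attribute: str) -> str:
    """Hash the descriptive fields of the data into a fixed-size label mark."""
    fields = "|".join([data_type, str(length), position, attribute])
    return hashlib.sha256(fields.encode("utf-8")).hexdigest()


# Hypothetical field values describing one column of a table.
mark = make_label_mark("varchar", 32, "users.phone", "sensitive")
print(mark[:16])
```

Because the hash is deterministic, the same four fields always yield the same label mark, so the embedding position can be looked up again at tracing time.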
  • the data to be processed is converted into preliminary coded data according to the preset coding rules; each character in the preliminary coded data is converted into a binary number of preset length and binary data is obtained; the binary data is mapped into zero-width character string data according to the preset mapping relationship table; the preset zero-width character string is added before and after the zero-width character string data respectively to obtain the final coded data; the final coded data is embedded as a database watermark in the embedding position corresponding to the data to be processed; wherein the preset length is not less than the length of the original binary number corresponding to the character, and the preset mapping relationship table is determined according to the mapping relationship between different binary numbers of preset bits and different zero-width character strings.
  • Because the data to be processed is first uniformly converted into preliminary coded data, there is no need to process different types of data with different algorithms, which improves generality and preserves the high performance of the database.
  • Because the final coded data consists of zero-width character strings, embedding the database watermark does not affect the data display, so efficient database watermark embedding is achieved without affecting the data display.
  • The present application also proposes a database watermark tracing method, as shown in FIG. 2, the method comprising the following steps:
  • Step S201 determining the final encoded data according to the embedding position.
  • The embedding position can be obtained from the tracing instruction input by the user.
  • Step S202 removing the preset zero-width character strings before and after the final encoded data and obtaining the zero-width character string data.
  • Step S203 Map the zero-width character string data to the binary data according to the preset mapping relationship table.
  • Step S204 dividing the binary data into multiple groups of binary numbers according to the preset length, and converting each group of binary numbers into each character of the preliminary coded data.
  • Step S205 convert each of the characters into the data to be processed according to the preset encoding rule, and use the data to be processed as the tracing result data.
  • the embedding position is determined by a tag mark, and the tag mark is pre-determined according to the type, length, position and attribute of the data to be processed.
  • the traceability instruction input by the user may include the tag mark.
  • the final encoded data is determined according to the embedding position; the preset zero-width character strings before and after the final encoded data are removed to obtain zero-width character string data; the zero-width character string data is mapped to binary data according to a preset mapping relationship table; the binary data is divided into multiple groups of binary numbers according to a preset length, and each group of binary numbers is converted into each character of the preliminary encoded data; each character is converted into data to be processed according to a preset encoding rule, and the data to be processed is used as the traceability result data, so that the traceability of the database watermark can be achieved with only a simple mapping, and the high performance of the database is guaranteed.
  • the present application provides a method for embedding a database watermark, comprising the following steps:
  • Step S301 receiving the data to be processed R0, and encoding it according to the encoding rules corresponding to the hexadecimal Unicode encoding to obtain preliminary encoded data R1.
  • each character in the data to be processed is converted into hexadecimal Unicode encoding, and the encoding rule can be shown in Table 2.
  • Step S302 character normalization.
  • Step S303 using zero-width string encoding.
  • Every two bits of R2 are converted into one zero-width character, where 00 is converted into \u200b, 01 is converted into \u200c, 10 is converted into \u200d, and 11 is converted into \u200e, to obtain zero-width string data R3.
  • Step S304 adding prefixes and suffixes.
  • Step S305 embed R4 as a database watermark into an embedding position corresponding to the data to be processed.
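Steps S301 to S305 can be put together in a short sketch. It assumes four hexadecimal digits per character for R1, 8 bits per R1 character for R2, U+FEFF as the prefix and suffix (as in the worked example in this document), and, purely for illustration, an embedding position at the end of the visible cell value; the function names are hypothetical.

```python
ZW = {"00": "\u200b", "01": "\u200c", "10": "\u200d", "11": "\u200e"}
MARKER = "\ufeff"  # zero-width prefix and suffix


def build_watermark(r0: str) -> str:
    """Turn watermark text R0 into the final encoded data R4."""
    r1 = "".join(format(ord(ch), "04X") for ch in r0)           # S301: hex Unicode
    r2 = "".join(format(ord(ch), "08b") for ch in r1)           # S302: 8-bit normalization
    r3 = "".join(ZW[r2[i:i + 2]] for i in range(0, len(r2), 2))  # S303: zero-width mapping
    return MARKER + r3 + MARKER                                  # S304: add prefix and suffix


def embed(cell_value: str, r0: str) -> str:
    """S305: append the invisible watermark to the visible value (assumed position)."""
    return cell_value + build_watermark(r0)


stored = embed("Alice", "A")
print(len("Alice"), len(stored))  # 5 23
```

Each source character costs 16 zero-width characters under these assumptions (4 hex digits, 8 bits each, 2 bits per zero-width character), so the stored value grows even though it renders the same.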
  • the present application provides a method for tracing the source of a database watermark, comprising the following steps:
  • Step S401 determining final encoded data according to the embedding position.
  • Step S402 remove the prefix and suffix.
  • Step S403 decoding the zero-width character string data.
  • R3 is converted into binary data R2.
  • Step S404 restoring to preliminary coded data.
  • Step S405 convert R1 into the data to be processed R0, and use the data to be processed R0 as the traceability result data.
  • the hexadecimal R1 is converted according to the Unicode comparison table to obtain R0.
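The tracing steps S401 to S405 simply run the embedding in reverse. The sketch below assumes the same conventions as the embodiment's example (U+FEFF markers, 2 bits per zero-width character, 8 bits per R1 character) plus an assumed four hexadecimal digits per original character; `trace_watermark` is a hypothetical name.

```python
BITS = {"\u200b": "00", "\u200c": "01", "\u200d": "10", "\u200e": "11"}
MARKER = "\ufeff"


def trace_watermark(cell_value: str) -> str:
    """Recover the watermark text R0 from a value carrying the final encoded data R4."""
    start = cell_value.find(MARKER)
    end = cell_value.rfind(MARKER)
    if start == -1 or end <= start:
        raise ValueError("no watermark found")
    r3 = cell_value[start + 1:end]                    # S401-S402: locate R4, strip markers
    r2 = "".join(BITS[ch] for ch in r3)               # S403: zero-width characters to bits
    r1 = "".join(chr(int(r2[i:i + 8], 2))             # S404: 8-bit groups to hex digits
                 for i in range(0, len(r2), 8))
    return "".join(chr(int(r1[i:i + 4], 16))          # S405: hex code points to characters
                   for i in range(0, len(r1), 4))


# Watermark for "A" (R1 = "0041"), written out by hand.
r3 = ("\u200b\u200e\u200b\u200b" * 2
      + "\u200b\u200e\u200c\u200b"
      + "\u200b\u200e\u200b\u200c")
print(trace_watermark("Alice" + MARKER + r3 + MARKER))  # A
```

Tracing needs only table lookups and base conversions, which is consistent with the claim that it places little load on the database.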
  • the final encoded data is \uFEFF[S3 result]\uFEFF.
  • the database watermark embedded in the data is:
  • Step S601 extracting the final encoded data.
  • Step S602 remove the prefix and suffix.
  • Step S603 Decode zero-width string
  • Step S604 restore to hexadecimal Unicode encoding.
  • Step S605 restore S1 to original information.
  • The embodiment of the present application also proposes a database watermark embedding device, as shown in FIG. 3, the device including:
  • the first conversion module 301 is used to convert the data to be processed into preliminary coded data according to a preset coding rule
  • a second conversion module 302 used to convert each character in the preliminary encoded data into a binary number of a preset length and obtain binary data
  • a first mapping module 303 configured to map the binary data into zero-width character string data according to a preset mapping relationship table
  • An adding module 304 is used to add a preset zero-width character string before and after the zero-width character string data to obtain final encoded data;
  • An embedding module 305 used for embedding the final encoded data as a database watermark into an embedding position corresponding to the data to be processed;
  • the preset length is not less than the length of the original binary number corresponding to the character, and the preset mapping relationship table is determined according to the mapping relationship between different binary numbers with preset bit numbers and different zero-width character strings.
  • The embodiment of the present application further proposes a database watermark tracing device, as shown in FIG. 4, the device including:
  • a determination module 401 configured to determine the final encoded data according to the embedding position
  • a removal module 402 used for removing the preset zero-width character string before and after the final encoded data and obtaining the zero-width character string data
  • a second mapping module 403, configured to map the zero-width character string data into the binary data according to the preset mapping relationship table
  • a third conversion module 404 configured to divide the binary data into a plurality of groups of binary numbers according to the preset length, and convert each group of binary numbers into each character of the preliminary coded data;
  • the fourth conversion module 405 is used to convert each of the characters into the data to be processed according to the preset encoding rule, and use the data to be processed as the tracing result data.
  • The embodiment of the present invention further provides an electronic device, as shown in FIG. 5, including a processor 501, a communication interface 502, a memory 503 and a communication bus 504, wherein the processor 501, the communication interface 502, and the memory 503 communicate with each other via the communication bus 504.
  • the processor 501 is configured to execute, by executing the executable instructions:
  • mapping the binary data into zero-width character string data according to a preset mapping relationship table
  • the preset length is not less than the length of the original binary number corresponding to the character, and the preset mapping relationship table is determined according to the mapping relationship between different binary numbers with preset bit numbers and different zero-width character strings.
  • Each of the characters is converted into the data to be processed according to the preset encoding rule, and the data to be processed is used as the traceability result data.
  • the communication bus can be a PCI (Peripheral Component Interconnect) bus or an EISA (Extended Industry Standard Architecture) bus.
  • the communication bus can be divided into an address bus, a data bus, a control bus, etc. For ease of representation, only one thick line is used in the figure, but it does not mean that there is only one bus or one type of bus.
  • the communication interface is used for communication between the above electronic device and other devices.
  • the memory may include RAM (Random Access Memory), or may include a non-volatile memory, such as at least one disk storage.
  • the memory may also be at least one storage device located away from the aforementioned processor.
  • The processor may be a general-purpose processor, such as a CPU (Central Processing Unit) or an NP (Network Processor); it may also be a DSP (Digital Signal Processor), an ASIC (Application Specific Integrated Circuit), an FPGA (Field-Programmable Gate Array) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
  • A computer-readable storage medium is also provided, in which a computer program is stored.
  • When the computer program is executed by a processor, the database watermark embedding method or database watermark tracing method described above is implemented.
  • A computer program product including instructions is also provided.
  • When the computer program product is run on a computer, the computer executes the database watermark embedding method or database watermark tracing method described above.
  • all or part of the embodiments may be implemented by software, hardware, firmware, or any combination thereof.
  • all or part of the embodiments may be implemented in the form of a computer program product.
  • the computer program product includes one or more computer instructions.
  • the computer may be a general-purpose computer, a special-purpose computer, a computer network, or other programmable device.
  • the computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another computer-readable storage medium.
  • The computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired (e.g., coaxial cable, optical fiber, digital subscriber line) or wireless (e.g., infrared, radio, microwave) means.
  • the computer-readable storage medium may be any available medium that a computer can access or a data storage device such as a server or data center that includes one or more available media integrated therein.
  • the available medium may be a magnetic medium (e.g., a floppy disk, a hard disk, a magnetic tape), an optical medium (e.g., a DVD), or a semiconductor medium (e.g., a solid-state hard disk), etc.

Abstract

Disclosed in the present invention are a database watermark embedding method and apparatus, a database watermark tracing method and apparatus, and an electronic device. The embedding method comprises: converting data to be processed into preliminary coded data according to a preset coding rule; converting each character in the preliminary coded data into a binary number of a preset length to obtain binary data; mapping the binary data to zero-width character string data according to a preset mapping relation table; adding a preset zero-width character string before and after the zero-width character string data to obtain final coded data; and embedding the final coded data as a database watermark into an embedding position corresponding to the data to be processed. The preset length is not smaller than the length of the original binary number corresponding to the character, and the preset mapping relation table is determined according to the mapping relations between different binary numbers with a preset number of bits and different zero-width character strings. Efficient database watermark embedding is thereby achieved without affecting data presentation.

Description

Database watermark embedding method, tracing method, apparatus and electronic device
Technical Field
The present application relates to the field of database technology, and more specifically to a database watermark embedding method, a database watermark tracing method, corresponding apparatuses and an electronic device.
Background
Database watermarking technology covertly embeds watermark information, such as copyright notices and user identities, into table data and file data without affecting use of the original data. It thereby solves the technical problem that leaks occurring while data is shared, distributed and used cannot be traced, safeguards data security throughout that process, and increases the value of data sharing.
In the prior art, database watermarking algorithms typically apply a different transformation algorithm to each type of data, transforming the data imperceptibly so that the watermark is hidden in the data itself, which completes the embedding. During tracing, the backtracking algorithm corresponding to the transformation algorithm used is applied to restore the watermark information, so that data can be traced in the event of a leak or similar incident.
Although this approach, based on different transformation and backtracking algorithms, solves the problem that leaked data cannot be traced, it has several drawbacks. Different types of data require different transformation algorithms, so generality is poor. After transformation, the watermark data is inserted into the data and becomes part of it, so the data values change; users cannot read the values directly and must first apply the backtracking method, which complicates the computation. Moreover, running the different algorithms consumes part of the database system's computing resources, which degrades database performance and seriously affects data presentation.
Therefore, how to embed database watermarks efficiently without affecting data presentation is a technical problem that remains to be solved.
Summary of the Invention
Embodiments of the present application provide a database watermark embedding method, a tracing method, apparatuses and an electronic device, for embedding database watermarks efficiently without affecting data presentation.
In a first aspect, a database watermark embedding method is provided, the method comprising:
converting data to be processed into preliminary encoded data according to a preset encoding rule;
converting each character in the preliminary encoded data into a binary number of a preset length to obtain binary data;
mapping the binary data to zero-width character string data according to a preset mapping relationship table;
adding a preset zero-width character string before and after the zero-width character string data to obtain final encoded data;
embedding the final encoded data as a database watermark at an embedding position corresponding to the data to be processed;
wherein the preset length is not less than the length of the original binary number corresponding to the character, and the preset mapping relationship table is determined according to the mapping relationships between different binary numbers having a preset number of bits and different zero-width character strings.
In some embodiments, before the final encoded data is embedded as a database watermark at the embedding position corresponding to the data to be processed, the method further comprises:
determining the embedding position according to a tag of the data to be processed;
wherein the tag is determined in advance according to the type, length, position and attributes of the data to be processed.
In some embodiments, mapping the binary data to zero-width character string data according to the preset mapping relationship table specifically comprises:
dividing the binary data into multiple groups of sub-data according to the preset number of bits;
mapping each group of sub-data to a zero-width character string according to the preset mapping relationship table to obtain the zero-width character string data.
In some embodiments, converting each character in the preliminary encoded data into a binary number of a preset length to obtain binary data specifically comprises:
performing binary conversion on the characters one by one, in order, to obtain the original binary numbers;
if the length of an original binary number is less than the preset length, padding zeros before its highest bit until its length reaches the preset length;
obtaining the binary data from the binary numbers of the preset length corresponding to the characters.
In some embodiments, the preset encoding rule includes an encoding rule corresponding to hexadecimal Unicode encoding, decimal Unicode encoding, hexadecimal GBK encoding, or decimal GBK encoding.
In a second aspect, a tracing method for the database watermark according to the first aspect is provided, the method comprising:
determining the final encoded data according to the embedding position;
removing the preset zero-width character strings before and after the final encoded data to obtain the zero-width character string data;
mapping the zero-width character string data to the binary data according to the preset mapping relationship table;
dividing the binary data into multiple groups of binary numbers according to the preset length, and converting each group of binary numbers into a character of the preliminary encoded data;
converting the characters into the data to be processed according to the preset encoding rule, and using the data to be processed as the tracing result data.
In some embodiments, the embedding position is determined by a tag, and the tag is determined in advance according to the type, length, position and attributes of the data to be processed.
In a third aspect, a database watermark embedding apparatus is provided, the apparatus comprising:
a first conversion module, configured to convert data to be processed into preliminary encoded data according to a preset encoding rule;
a second conversion module, configured to convert each character in the preliminary encoded data into a binary number of a preset length to obtain binary data;
a first mapping module, configured to map the binary data to zero-width character string data according to a preset mapping relationship table;
an adding module, configured to add a preset zero-width character string before and after the zero-width character string data to obtain final encoded data;
an embedding module, configured to embed the final encoded data as a database watermark at an embedding position corresponding to the data to be processed;
wherein the preset length is not less than the length of the original binary number corresponding to the character, and the preset mapping relationship table is determined according to the mapping relationships between different binary numbers having a preset number of bits and different zero-width character strings.
In a fourth aspect, a tracing apparatus for the database watermark according to the third aspect is provided, the apparatus comprising:
a determination module, configured to determine the final encoded data according to the embedding position;
a removal module, configured to remove the preset zero-width character strings before and after the final encoded data to obtain the zero-width character string data;
a second mapping module, configured to map the zero-width character string data to the binary data according to the preset mapping relationship table;
a third conversion module, configured to divide the binary data into multiple groups of binary numbers according to the preset length, and to convert each group of binary numbers into a character of the preliminary encoded data;
a fourth conversion module, configured to convert the characters into the data to be processed according to the preset encoding rule, and to use the data to be processed as the tracing result data.
In a fifth aspect, an electronic device is provided, comprising:
a processor; and
a memory, configured to store instructions executable by the processor;
wherein the processor is configured to perform, by executing the executable instructions, the embedding method according to any implementation of the first aspect or the tracing method according to any implementation of the second aspect.
By applying the above technical solution, the data to be processed is converted into preliminary encoded data according to a preset encoding rule; each character in the preliminary encoded data is converted into a binary number of a preset length to obtain binary data; the binary data is mapped to zero-width character string data according to a preset mapping relationship table; a preset zero-width character string is added before and after the zero-width character string data to obtain final encoded data; and the final encoded data is embedded as a database watermark at the embedding position corresponding to the data to be processed. The preset length is not less than the length of the original binary number corresponding to the character, and the preset mapping relationship table is determined according to the mapping relationships between different binary numbers having a preset number of bits and different zero-width character strings. Because the data to be processed is first uniformly converted into preliminary encoded data, different algorithms are not needed for different types of data, which improves generality and preserves database performance. Because the final encoded data consists of zero-width character strings, embedding the watermark does not affect data presentation. Efficient database watermark embedding is thus achieved without affecting data presentation.
Brief Description of the Drawings
To illustrate the technical solutions in the embodiments of the present application more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show merely some embodiments of the present application; those skilled in the art can derive other drawings from them without creative effort.
FIG. 1 is a schematic flowchart of a database watermark embedding method according to an embodiment of the present invention;
FIG. 2 is a schematic flowchart of a database watermark tracing method according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a database watermark embedding apparatus according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a database watermark tracing apparatus according to an embodiment of the present invention;
FIG. 5 is a block diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the drawings. The described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by those of ordinary skill in the art on the basis of these embodiments without creative effort fall within the scope of protection of the present application.
An embodiment of the present application provides a database watermark embedding method. As shown in FIG. 1, the method comprises the following steps:
Step S101: convert the data to be processed into preliminary encoded data according to a preset encoding rule.
In this embodiment, the data to be processed may be sensitive data specified by a user. The user may identify the sensitive data by defining keywords or metadata information and matching against them, or by studying the characteristics of the sensitive data, defining a regular expression that follows its structural pattern, and matching against that expression. The data to be processed may contain many kinds of characters, such as text, digits, letters, punctuation marks and graphic symbols. The preset encoding rule is a general-purpose rule that encodes these different kinds of characters uniformly, so that the data to be processed can be converted into preliminary encoded data.
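As an illustration of the regular-expression route described above, the following Python sketch selects candidate sensitive values from a piece of text. The pattern (an 11-digit number beginning with 1) and the name `find_sensitive` are assumptions made for this example only; the application does not specify any particular pattern.

```python
import re

# Hypothetical pattern: an 11-digit number beginning with 1, standing in for
# whatever structural rule a user derives for their own sensitive data.
SENSITIVE_PATTERN = re.compile(r"\b1\d{10}\b")

def find_sensitive(text: str) -> list[str]:
    # Return every substring that matches the user-defined pattern.
    return SENSITIVE_PATTERN.findall(text)

print(find_sensitive("contact: 13800138000, order 42"))
```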
To obtain the preliminary encoded data reliably, in some embodiments of the present application the preset encoding rule includes an encoding rule corresponding to hexadecimal Unicode encoding, decimal Unicode encoding, hexadecimal GBK encoding, or decimal GBK encoding.
In this embodiment, Unicode is a character encoding scheme, developed by an international organization, that is designed to accommodate the characters and symbols of all the world's writing systems. GBK (Chinese Internal Code Specification, a Chinese character encoding set) uses a variable-length encoding of one or two bytes: English characters are encoded in a single byte, fully compatible with ASCII, and Chinese characters are encoded in two bytes. The preset encoding rule may be the rule corresponding to decimal or hexadecimal Unicode encoding, or the rule corresponding to decimal or hexadecimal GBK encoding.
Those skilled in the art may also adopt other types of preset encoding rules according to actual needs, which does not affect the scope of protection of the present application.
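A minimal Python sketch of the hexadecimal Unicode variant of step S101 follows. The function name `to_hex_unicode` is hypothetical, and the sketch assumes every character lies in the Basic Multilingual Plane so that four hexadecimal digits suffice.

```python
def to_hex_unicode(text: str) -> str:
    # Write each character as a backslash, the letter u, and its code point
    # in four lowercase hexadecimal digits (BMP characters assumed).
    return "".join(f"\\u{ord(ch):04x}" for ch in text)

print(to_hex_unicode("用户一"))  # \u7528\u6237\u4e00
```

The output is itself a plain ASCII string, which is what the later steps convert character by character.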
Step S102: convert each character in the preliminary encoded data into a binary number of a preset length to obtain binary data.
In this embodiment, to facilitate computer processing, each character in the preliminary encoded data is converted into a binary number of a preset length, which normalizes the preliminary encoded data. To ensure that every character can be normalized, the preset length is not less than the length of the original binary number corresponding to the character.
To obtain accurate binary data, in some embodiments of the present application, converting each character in the preliminary encoded data into a binary number of a preset length to obtain binary data specifically comprises:
performing binary conversion on the characters one by one, in order, to obtain the original binary numbers;
if the length of an original binary number is less than the preset length, padding zeros before its highest bit until its length reaches the preset length;
obtaining the binary data from the binary numbers of the preset length corresponding to the characters.
In this embodiment, the characters are first converted to binary one by one, in order, to obtain the original binary number corresponding to each character. An original binary number may be shorter than the preset length; if so, zeros are padded before its highest bit until it reaches the preset length. Once the binary number for every character has the preset length, these binary numbers are concatenated to form the binary data. For example, if the preset length is 8 bits and the original binary number has 6 bits, two zeros are added before its highest bit.
Optionally, the preset length may be 8 bits or 16 bits.
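The padding rule of step S102 can be sketched in Python as follows. The function name `to_fixed_width_bits` and the decision to raise an error when a character does not fit are assumptions of this sketch, not details given in the application.

```python
def to_fixed_width_bits(encoded: str, preset_length: int = 8) -> str:
    groups = []
    for ch in encoded:
        raw = bin(ord(ch))[2:]  # original binary number, no leading zeros
        if len(raw) > preset_length:
            # The preset length must not be smaller than the original length.
            raise ValueError(f"{ch!r} needs more than {preset_length} bits")
        groups.append(raw.zfill(preset_length))  # pad zeros before the highest bit
    return "".join(groups)

print(to_fixed_width_bits("0"))  # 00110000
```

The character "0" has the 6-bit original binary number 110000, so two zeros are added in front, matching the 8-bit example in the text.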
Step S103: map the binary data to zero-width character string data according to a preset mapping relationship table.
In this embodiment, the preset mapping relationship table is determined according to the mapping relationships between different binary numbers having a preset number of bits and different zero-width character strings. A zero-width character string consists of zero-width characters. A zero-width character is a non-printing Unicode character with a display width of 0: it is invisible in browsers and ordinary text editors, but it is actually present in the text and represents a control function. The binary data can be mapped to zero-width character string data according to the preset mapping relationship table.
In a specific application scenario of the present application, when the preset number of bits is 2, the preset mapping relationship table may be as shown in Table 1.
Table 1
Binary number    Zero-width character string
00               \u200b
01               \u200c
10               \u200d
11               \u200e
To obtain the zero-width character string data accurately, in some embodiments of the present application, mapping the binary data to zero-width character string data according to the preset mapping relationship table specifically comprises:
dividing the binary data into multiple groups of sub-data according to the preset number of bits;
mapping each group of sub-data to a zero-width character string according to the preset mapping relationship table to obtain the zero-width character string data.
In this embodiment, the binary data is first divided into groups of the preset number of bits to obtain multiple groups of sub-data. Each group of sub-data is then looked up in the preset mapping relationship table to determine its zero-width character string, so that the groups of sub-data are mapped to the zero-width character string data.
Optionally, in some embodiments of the present application the preset number of bits is 2; those skilled in the art may also use another preset number of bits according to actual needs.
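The grouping and lookup can be sketched in Python as below, using the 2-bit correspondence that the application scenario later in this description gives for Table 1 (00, 01, 10, 11 to \u200b, \u200c, \u200d, \u200e). The names `ZERO_WIDTH_TABLE` and `bits_to_zero_width` are hypothetical, as is the choice to reject input whose length is not a multiple of the group size.

```python
# Mapping for a preset number of bits of 2, as listed for Table 1.
ZERO_WIDTH_TABLE = {"00": "\u200b", "01": "\u200c", "10": "\u200d", "11": "\u200e"}

def bits_to_zero_width(bits: str, preset_bits: int = 2) -> str:
    if len(bits) % preset_bits != 0:
        raise ValueError("bit string length must be a multiple of the group size")
    # Divide the binary data into groups of sub-data, then look each group up.
    groups = [bits[i:i + preset_bits] for i in range(0, len(bits), preset_bits)]
    return "".join(ZERO_WIDTH_TABLE[group] for group in groups)

encoded = bits_to_zero_width("01011100")
print(len(encoded))  # 4
```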
Step S104: add a preset zero-width character string before and after the zero-width character string data to obtain final encoded data.
In this embodiment, so that the zero-width character string data can be distinguished easily, it is isolated from other data (such as the main data) by adding a preset zero-width character string before it and after it, which yields the final encoded data.
Those skilled in the art may set different preset zero-width character strings according to actual needs, which does not affect the scope of protection of the present application.
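A short Python sketch of the isolation step follows. The delimiter \uFEFF is the one used in the application scenario later in this description; the names `DELIMITER` and `wrap_watermark` are hypothetical.

```python
DELIMITER = "\ufeff"  # any other preset zero-width string could be substituted

def wrap_watermark(zero_width_data: str, delimiter: str = DELIMITER) -> str:
    # Place the preset zero-width string before and after the payload.
    return delimiter + zero_width_data + delimiter

final = wrap_watermark("\u200c\u200d")
print(len(final))  # 4
```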
Step S105: embed the final encoded data as a database watermark at an embedding position corresponding to the data to be processed.
In this embodiment, the data to be processed corresponds to an embedding position, which may be specified by the user or carried by the data to be processed itself. The final encoded data is embedded at this position as the database watermark.
To embed the database watermark accurately, in some embodiments of the present application, before the final encoded data is embedded as a database watermark at the embedding position corresponding to the data to be processed, the method further comprises:
determining the embedding position according to a tag of the data to be processed;
wherein the tag is determined in advance according to the type, length, position and attributes of the data to be processed.
In this embodiment, the tag is determined in advance according to the type, length, position and attributes of the data to be processed, and the embedding position of the database watermark can be determined from the tag. Optionally, in some embodiments of the present application, a hash algorithm may be applied to the type, length, position and attributes of the data to be processed, and the tag is obtained from the result.
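One possible way to derive such a tag is sketched below in Python. The application only says that a hash algorithm may be used; the choice of SHA-256, the field separator, the function name `make_tag` and the sample values are all assumptions of this sketch.

```python
import hashlib

def make_tag(data_type: str, length: int, position: str, attributes: str) -> str:
    # Join the four properties with a separator unlikely to occur in them,
    # then hash the result. SHA-256 is an arbitrary choice for this sketch.
    material = "\x1f".join([data_type, str(length), position, attributes])
    return hashlib.sha256(material.encode("utf-8")).hexdigest()

tag = make_tag("varchar", 32, "users.name", "nullable")
print(len(tag))  # 64
```

The same inputs always give the same tag, so the tag can later be supplied again to locate the embedding position.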
By applying the above technical solution, the data to be processed is converted into preliminary encoded data according to a preset encoding rule; each character in the preliminary encoded data is converted into a binary number of a preset length to obtain binary data; the binary data is mapped to zero-width character string data according to a preset mapping relationship table; a preset zero-width character string is added before and after the zero-width character string data to obtain final encoded data; and the final encoded data is embedded as a database watermark at the embedding position corresponding to the data to be processed. The preset length is not less than the length of the original binary number corresponding to the character, and the preset mapping relationship table is determined according to the mapping relationships between different binary numbers having a preset number of bits and different zero-width character strings. Because the data to be processed is first uniformly converted into preliminary encoded data, different algorithms are not needed for different types of data, which improves generality and preserves database performance. Because the final encoded data consists of zero-width character strings, embedding the watermark does not affect data presentation. Efficient database watermark embedding is thus achieved without affecting data presentation.
Corresponding to the database watermark embedding method of the embodiments of the present application, the present application further provides a database watermark tracing method. As shown in FIG. 2, the method comprises the following steps:
Step S201: determine the final encoded data according to the embedding position.
The embedding position can be obtained from a tracing instruction input by the user.
Step S202: remove the preset zero-width character strings before and after the final encoded data to obtain the zero-width character string data.
Step S203: map the zero-width character string data to the binary data according to the preset mapping relationship table.
Step S204: divide the binary data into multiple groups of binary numbers according to the preset length, and convert each group of binary numbers into a character of the preliminary encoded data.
Step S205: convert the characters into the data to be processed according to the preset encoding rule, and use the data to be processed as the tracing result data.
To determine the embedding position accurately, in some embodiments of the present application the embedding position is determined by a tag, which is determined in advance according to the type, length, position and attributes of the data to be processed. The tracing instruction input by the user may include the tag.
By applying the above technical solution, the final encoded data is determined according to the embedding position; the preset zero-width character strings before and after the final encoded data are removed to obtain the zero-width character string data; the zero-width character string data is mapped to binary data according to the preset mapping relationship table; the binary data is divided into multiple groups of binary numbers according to the preset length, and each group is converted into a character of the preliminary encoded data; and the characters are converted into the data to be processed according to the preset encoding rule, with the data to be processed used as the tracing result data. The database watermark can therefore be traced with simple mapping operations alone, which preserves database performance.
To further explain the technical idea of the present invention, the technical solution is now described with reference to a specific application scenario.
An embodiment of the present application provides a database watermark embedding method, comprising the following steps:
Step S301: receive the data to be processed R0 and encode it according to the encoding rule corresponding to hexadecimal Unicode encoding to obtain the preliminary encoded data R1.
Specifically, each character in the data to be processed is converted into its hexadecimal Unicode encoding; the encoding rule may be as shown in Table 2.
Table 2
Step S302: character normalization.
Convert each character of the hexadecimal data R1 into an 8-bit binary number. If a number has fewer than 8 bits, pad its high-order bits with 0 up to 8 bits, so that every character corresponds to one 8-bit binary number. This yields the binary data R2.
Step S303: zero-width character string encoding.
According to the correspondence in Table 1, convert every two bits of R2 into one zero-width character string, where 00 becomes \u200b, 01 becomes \u200c, 10 becomes \u200d and 11 becomes \u200e. This yields the zero-width character string data R3.
Step S304: add the prefix and suffix.
Add the preset zero-width character string \uFEFF before and after R3 to isolate it, which yields the final encoded data R4.
Step S305: embed R4 as a database watermark at the embedding position corresponding to the data to be processed.
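Steps S301 to S305 can be sketched end to end in Python as follows. This is a minimal illustration, not the applicant's implementation: the function names are hypothetical, only Basic Multilingual Plane characters are assumed, and the embedding position is assumed to be the end of the field value, which the application does not prescribe.

```python
ZERO_WIDTH_TABLE = {"00": "\u200b", "01": "\u200c", "10": "\u200d", "11": "\u200e"}
DELIMITER = "\ufeff"

def encode_watermark(r0: str) -> str:
    # S301: hexadecimal Unicode escapes (BMP characters assumed).
    r1 = "".join(f"\\u{ord(ch):04x}" for ch in r0)
    # S302: each character of R1 becomes an 8-bit binary number.
    r2 = "".join(format(ord(ch), "08b") for ch in r1)
    # S303: every two bits become one zero-width character.
    r3 = "".join(ZERO_WIDTH_TABLE[r2[i:i + 2]] for i in range(0, len(r2), 2))
    # S304: isolate the payload with the delimiter on both sides.
    return DELIMITER + r3 + DELIMITER

def embed(field_value: str, watermark: str) -> str:
    # S305, assuming the embedding position is the end of the field value.
    return field_value + encode_watermark(watermark)

marked = embed("Alice", "用户一")
print(len(marked))  # 79: 5 visible characters + 74 zero-width characters
```

When displayed, `marked` still reads as "Alice", since every appended character has zero width.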
An embodiment of the present application provides a database watermark tracing method, comprising the following steps:
Step S401: determine the final encoded data according to the embedding position.
Locate the \uFEFF marker at the embedding position and extract the data from the opening \uFEFF to the closing \uFEFF to obtain the final encoded data R4.
Step S402: remove the prefix and suffix.
Remove the leading \uFEFF and the trailing \uFEFF from R4 to obtain the zero-width character string data R3.
Step S403: decode the zero-width character string data.
Convert R3 into the binary data R2 according to the correspondence in Table 1.
Step S404: restore the preliminary encoded data.
Convert every 8 bits of R2 into one character to obtain the hexadecimal preliminary encoded data R1.
Step S405: convert R1 into the data to be processed R0, and use R0 as the tracing result data.
Specifically, convert the hexadecimal R1 according to the Unicode code table to obtain R0.
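The tracing steps S401 to S405 can be sketched in Python as below. The function name `trace` is hypothetical, and the sketch assumes the field contains exactly one watermark, delimited by \uFEFF, whose escapes have the form of a backslash, the letter u and hexadecimal digits.

```python
ZERO_WIDTH_TABLE = {"00": "\u200b", "01": "\u200c", "10": "\u200d", "11": "\u200e"}
REVERSE_TABLE = {zw: bits for bits, zw in ZERO_WIDTH_TABLE.items()}
DELIMITER = "\ufeff"

def trace(field_value: str) -> str:
    # S401: locate the span between the opening and closing delimiters.
    start = field_value.index(DELIMITER)
    end = field_value.index(DELIMITER, start + 1)
    # S402: keep only what lies between the delimiters.
    r3 = field_value[start + 1:end]
    # S403: map each zero-width character back to its two bits.
    r2 = "".join(REVERSE_TABLE[ch] for ch in r3)
    # S404: every 8 bits become one character of R1.
    r1 = "".join(chr(int(r2[i:i + 8], 2)) for i in range(0, len(r2), 8))
    # S405: parse each escape back into the character it denotes.
    return "".join(chr(int(code, 16)) for code in r1.split("\\u")[1:])

# Build a small watermarked sample for the escape of the single character 用.
bits = "".join(format(ord(c), "08b") for c in "\\u7528")
payload = "".join(ZERO_WIDTH_TABLE[bits[i:i + 2]] for i in range(0, len(bits), 2))
sample = "Alice" + DELIMITER + payload + DELIMITER
print(trace(sample))  # 用
```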
The following uses "用户一" ("User 1") as the data to be processed to illustrate how the database watermark is embedded and traced.
The database watermark is embedded as follows:
S501: convert "用户一" into the hexadecimal Unicode encoding result S1: \u7528\u6237\u4e00.
S502: character normalization.
Each character of the text \u7528\u6237\u4e00 in S1 (that is, each of its 18 characters, including the backslashes and the letter u) is converted into an 8-bit binary number; a binary number shorter than 8 bits is padded with zeros at its high-order end. The result is S2:
010111000111010100110111001101010011001000111000010111000111010100110110001100100011001100110111010111000111010100110100011001010011000000110000
S503: encode with zero-width strings.
The result in S2 is encoded as zero-width strings using the conversion relationship given by the preset mapping table (Table 1). The result is:
\u200c\u200c\u200e\u200b\u200c\u200e\u200c\u200c\u200b\u200e\u200c\u200e\u200b\u200e\u200c\u200c\u200b\u200e\u200b\u200d\u200b\u200e\u200d\u200b\u200c\u200c\u200e\u200b\u200c\u200e\u200c\u200c\u200b\u200e\u200c\u200d\u200b\u200e\u200b\u200d\u200b\u200e\u200b\u200e\u200b\u200e\u200c\u200e\u200c\u200c\u200e\u200b\u200c\u200e\u200c\u200c\u200b\u200e\u200c\u200b\u200c\u200d\u200c\u200c\u200b\u200e\u200b\u200b\u200b\u200e\u200b\u200b, abbreviated below as [S3 result].
S504: add the prefix and suffix.
A prefix and a suffix are added to [S3 result] for isolation. The result is:
\uFEFF[S3 result]\uFEFF.
S505: output the final encoded data.
The final encoded data is \uFEFF[S3 result]\uFEFF.
S506: embed \uFEFF[S3 result]\uFEFF as the database watermark at the embedding position corresponding to "用户一".
The database watermark embedded in the data is:
\uFEFF\u200c\u200c\u200e\u200b\u200c\u200e\u200c\u200c\u200b\u200e\u200c\u200e\u200b\u200e\u200c\u200c\u200b\u200e\u200b\u200d\u200b\u200e\u200d\u200b\u200c\u200c\u200e\u200b\u200c\u200e\u200c\u200c\u200b\u200e\u200c\u200d\u200b\u200e\u200b\u200d\u200b\u200e\u200b\u200e\u200b\u200e\u200c\u200e\u200c\u200c\u200e\u200b\u200c\u200e\u200c\u200c\u200b\u200e\u200c\u200b\u200c\u200d\u200c\u200c\u200b\u200e\u200b\u200b\u200b\u200e\u200b\u200b\uFEFF.
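The worked example can be reproduced with a short Python sketch. The helper names `to_unicode_escapes`, `to_bits`, and `embed_payload` are hypothetical, and the sketch assumes every input character lies in the Basic Multilingual Plane so that each one yields exactly four hexadecimal digits.

```python
TABLE_1 = {"00": "\u200b", "01": "\u200c", "10": "\u200d", "11": "\u200e"}


def to_unicode_escapes(text: str) -> str:
    """S501: hexadecimal Unicode form, e.g. '用' -> '\\u7528' (BMP characters only)."""
    return "".join(f"\\u{ord(ch):04x}" for ch in text)


def to_bits(s1: str) -> str:
    """S502: each character of S1 as an 8-bit binary number, zero-padded on the left."""
    return "".join(format(ord(ch), "08b") for ch in s1)


def embed_payload(text: str):
    s1 = to_unicode_escapes(text)
    s2 = to_bits(s1)
    # S503: map every two bits to one zero-width character.
    s3 = "".join(TABLE_1[s2[i:i + 2]] for i in range(0, len(s2), 2))
    # S504/S505: wrap with the \uFEFF prefix and suffix.
    return s1, s2, "\ufeff" + s3 + "\ufeff"


s1, s2, watermark = embed_payload("用户一")
print(s1)              # \u7528\u6237\u4e00
print(len(s2))         # 144
print(len(watermark))  # 74
```

The lengths show the cost of the scheme: three visible characters become 18 escape characters, 144 bits, and 72 zero-width characters plus the two delimiters.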
The database watermark is traced as follows:
Step S601: extract the final encoded data.
The final encoded data is extracted from the embedding position using the \uFEFF markers: \uFEFF[S3 result]\uFEFF.
Step S602: remove the prefix and suffix.
The prefix \uFEFF and the suffix \uFEFF are removed to obtain [S3 result].
Step S603: decode the zero-width strings.
[S3 result] is restored to the binary data S2 according to the correspondence in Table 1:
010111000111010100110111001101010011001000111000010111000111010100110110001100100011001100110111010111000111010100110100011001010011000000110000
Step S604: restore the hexadecimal Unicode encoding.
Each 8-bit binary number in S2 is converted into its corresponding character, which gives the hexadecimal Unicode encoding S1: \u7528\u6237\u4e00.
Step S605: restore S1 to the original information.
The corresponding characters are looked up in the hexadecimal Unicode code table to obtain "用户一".
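Steps S604 and S605 can be checked directly against the binary data quoted in the example. The following Python sketch starts from S2 and assumes that the `unicode_escape` codec is an acceptable stand-in for the lookup in the hexadecimal Unicode code table.

```python
# S2 from the worked example: 18 characters x 8 bits = 144 bits.
S2 = (
    "010111000111010100110111001101010011001000111000"  # \u7528
    "010111000111010100110110001100100011001100110111"  # \u6237
    "010111000111010100110100011001010011000000110000"  # \u4e00
)

# S604: split into 8-bit groups and turn each group into one character.
s1 = "".join(chr(int(S2[i:i + 8], 2)) for i in range(0, len(S2), 8))

# S605: interpret the \uXXXX escapes to recover the original text.
original = s1.encode("ascii").decode("unicode_escape")

print(s1)        # \u7528\u6237\u4e00
print(original)  # 用户一
```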
An embodiment of the present application further provides an apparatus for embedding a database watermark. As shown in FIG. 3, the apparatus comprises:
a first conversion module 301, configured to convert data to be processed into preliminary encoded data according to a preset encoding rule;
a second conversion module 302, configured to convert each character of the preliminary encoded data into a binary number of a preset length to obtain binary data;
a first mapping module 303, configured to map the binary data to zero-width string data according to a preset mapping table;
an adding module 304, configured to add a preset zero-width string before and after the zero-width string data to obtain final encoded data; and
an embedding module 305, configured to embed the final encoded data as a database watermark at an embedding position corresponding to the data to be processed;
wherein the preset length is not less than the length of the original binary number corresponding to the character, and the preset mapping table is determined from the mapping between distinct binary numbers of a preset number of bits and distinct zero-width strings.
An embodiment of the present application further provides an apparatus for tracing a database watermark. As shown in FIG. 4, the apparatus comprises:
a determination module 401, configured to determine the final encoded data according to the embedding position;
a removal module 402, configured to remove the preset zero-width string before and after the final encoded data to obtain the zero-width string data;
a second mapping module 403, configured to map the zero-width string data to the binary data according to the preset mapping table;
a third conversion module 404, configured to divide the binary data into groups of binary numbers of the preset length and convert each group into a character of the preliminary encoded data; and
a fourth conversion module 405, configured to convert the characters into the data to be processed according to the preset encoding rule and use the data to be processed as the tracing result data.
An embodiment of the present invention further provides an electronic device. As shown in FIG. 5, the device comprises a processor 501, a communication interface 502, a memory 503, and a communication bus 504, wherein the processor 501, the communication interface 502, and the memory 503 communicate with one another via the communication bus 504;
the memory 503 is configured to store instructions executable by the processor;
the processor 501 is configured to perform the following by executing the executable instructions:
converting data to be processed into preliminary encoded data according to a preset encoding rule;
converting each character of the preliminary encoded data into a binary number of a preset length to obtain binary data;
mapping the binary data to zero-width string data according to a preset mapping table;
adding a preset zero-width string before and after the zero-width string data to obtain final encoded data;
embedding the final encoded data as a database watermark at an embedding position corresponding to the data to be processed;
wherein the preset length is not less than the length of the original binary number corresponding to the character, and the preset mapping table is determined from the mapping between distinct binary numbers of a preset number of bits and distinct zero-width strings;
or,
determining the final encoded data according to the embedding position;
removing the preset zero-width string before and after the final encoded data to obtain the zero-width string data;
mapping the zero-width string data to the binary data according to the preset mapping table;
dividing the binary data into groups of binary numbers of the preset length, and converting each group into a character of the preliminary encoded data;
converting the characters into the data to be processed according to the preset encoding rule, and using the data to be processed as the tracing result data.
The communication bus may be a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, the bus is drawn as a single thick line in the figure, but this does not mean that there is only one bus or only one type of bus.
The communication interface is used for communication between the electronic device and other devices.
The memory may include RAM (Random Access Memory) and may also include non-volatile memory, such as at least one disk storage device. Optionally, the memory may also be at least one storage device located remotely from the processor.
The processor may be a general-purpose processor, such as a CPU (Central Processing Unit) or an NP (Network Processor); it may also be a DSP (Digital Signal Processor), an ASIC (Application Specific Integrated Circuit), an FPGA (Field-Programmable Gate Array) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
In yet another embodiment of the present invention, a computer-readable storage medium is provided. The computer-readable storage medium stores a computer program that, when executed by a processor, implements the database watermark embedding method or the database watermark tracing method described above.
In yet another embodiment of the present invention, a computer program product containing instructions is provided. When run on a computer, the computer program product causes the computer to perform the database watermark embedding method or the database watermark tracing method described above.
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the present invention are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired means (e.g., coaxial cable, optical fiber, digital subscriber line) or wireless means (e.g., infrared, radio, microwave). The computer-readable storage medium may be any available medium that a computer can access, or a data storage device such as a server or data center that integrates one or more available media. The available medium may be a magnetic medium (e.g., a floppy disk, hard disk, or magnetic tape), an optical medium (e.g., a DVD), or a semiconductor medium (e.g., a solid-state drive).
It should be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another and do not necessarily require or imply any actual relationship or order between those entities or operations. Moreover, the terms "include", "comprise", and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that includes the element.
The embodiments in this specification are described in a related manner. For parts that are the same or similar across embodiments, the embodiments may be referred to one another; each embodiment focuses on its differences from the others.
The above are merely preferred embodiments of the present invention and are not intended to limit its scope of protection. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention falls within the scope of protection of the present invention.

Claims (10)

  1. A method for embedding a database watermark, characterized in that the method comprises:
    converting data to be processed into preliminary encoded data according to a preset encoding rule;
    converting each character of the preliminary encoded data into a binary number of a preset length to obtain binary data;
    mapping the binary data to zero-width string data according to a preset mapping table;
    adding a preset zero-width string before and after the zero-width string data to obtain final encoded data;
    embedding the final encoded data as a database watermark at an embedding position corresponding to the data to be processed;
    wherein the preset length is not less than the length of the original binary number corresponding to the character, and the preset mapping table is determined from the mapping between distinct binary numbers of a preset number of bits and distinct zero-width strings.
  2. The method according to claim 1, characterized in that, before embedding the final encoded data as a database watermark at the embedding position corresponding to the data to be processed, the method further comprises:
    determining the embedding position according to a tag mark of the data to be processed;
    wherein the tag mark is determined in advance according to the type, length, position, and attributes of the data to be processed.
  3. The method according to claim 1, characterized in that mapping the binary data to zero-width string data according to the preset mapping table specifically comprises:
    dividing the binary data into groups of sub-data according to the preset number of bits;
    mapping each group of sub-data to a zero-width string according to the preset mapping table to obtain the zero-width string data.
  4. The method according to claim 1, characterized in that converting each character of the preliminary encoded data into a binary number of a preset length to obtain binary data specifically comprises:
    performing binary conversion on each character in order to obtain the respective original binary numbers;
    if the length of an original binary number is less than the preset length, padding zeros before the highest bit of the original binary number so that its length reaches the preset length;
    obtaining the binary data from the binary numbers of the preset length corresponding to the characters.
  5. The method according to claim 1, characterized in that the preset encoding rule comprises an encoding rule corresponding to hexadecimal Unicode encoding, decimal Unicode encoding, hexadecimal GBK encoding, or decimal GBK encoding.
  6. A method for tracing a database watermark embedded by the method according to claim 1, characterized in that the method comprises:
    determining the final encoded data according to the embedding position;
    removing the preset zero-width string before and after the final encoded data to obtain the zero-width string data;
    mapping the zero-width string data to the binary data according to the preset mapping table;
    dividing the binary data into groups of binary numbers of the preset length, and converting each group into a character of the preliminary encoded data;
    converting the characters into the data to be processed according to the preset encoding rule, and using the data to be processed as tracing result data.
  7. The method according to claim 6, characterized in that the embedding position is determined by a tag mark, and the tag mark is determined in advance according to the type, length, position, and attributes of the data to be processed.
  8. An apparatus for embedding a database watermark, characterized in that the apparatus comprises:
    a first conversion module, configured to convert data to be processed into preliminary encoded data according to a preset encoding rule;
    a second conversion module, configured to convert each character of the preliminary encoded data into a binary number of a preset length to obtain binary data;
    a first mapping module, configured to map the binary data to zero-width string data according to a preset mapping table;
    an adding module, configured to add a preset zero-width string before and after the zero-width string data to obtain final encoded data;
    an embedding module, configured to embed the final encoded data as a database watermark at an embedding position corresponding to the data to be processed;
    wherein the preset length is not less than the length of the original binary number corresponding to the character, and the preset mapping table is determined from the mapping between distinct binary numbers of a preset number of bits and distinct zero-width strings.
  9. An apparatus for tracing a database watermark embedded by the apparatus according to claim 8, characterized in that the apparatus comprises:
    a determination module, configured to determine the final encoded data according to the embedding position;
    a removal module, configured to remove the preset zero-width string before and after the final encoded data to obtain the zero-width string data;
    a second mapping module, configured to map the zero-width string data to the binary data according to the preset mapping table;
    a third conversion module, configured to divide the binary data into groups of binary numbers of the preset length and convert each group into a character of the preliminary encoded data;
    a fourth conversion module, configured to convert the characters into the data to be processed according to the preset encoding rule and use the data to be processed as tracing result data.
  10. An electronic device, characterized by comprising:
    a processor; and
    a memory, configured to store instructions executable by the processor;
    wherein the processor is configured to perform, by executing the executable instructions, the embedding method according to any one of claims 1 to 5 or the tracing method according to any one of claims 6 to 7.
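The claims leave the preset length and the preset number of bits as parameters. The following Python sketch shows one way the constraints in claims 1, 3, and 4 could be expressed in general form. All function names are hypothetical, and pairing bit patterns with zero-width characters in list order is an assumption; with two bits and the four characters used in the description it happens to reproduce Table 1.

```python
from itertools import product


def build_table(zero_width_chars: list, bits_per_symbol: int) -> dict:
    """Pair each distinct bit pattern of the preset width with a distinct zero-width string."""
    patterns = ["".join(p) for p in product("01", repeat=bits_per_symbol)]
    if len(set(zero_width_chars)) != len(zero_width_chars):
        raise ValueError("zero-width strings must be distinct")
    if len(zero_width_chars) < len(patterns):
        raise ValueError("not enough zero-width strings for this bit width")
    return dict(zip(patterns, zero_width_chars))


def to_fixed_width_bits(encoded: str, preset_length: int) -> str:
    """Claim 4: convert each character and pad zeros before the highest bit."""
    groups = []
    for ch in encoded:
        raw = format(ord(ch), "b")  # original binary number, no padding
        if len(raw) > preset_length:
            # Claim 1 requires the preset length to be no shorter than this.
            raise ValueError(f"{ch!r} needs {len(raw)} bits, preset length is {preset_length}")
        groups.append(raw.zfill(preset_length))
    return "".join(groups)


def map_bits(bits: str, table: dict, bits_per_symbol: int) -> str:
    """Claim 3: split the binary data into groups and map each group via the table."""
    if len(bits) % bits_per_symbol != 0:
        raise ValueError("bit length is not a multiple of the preset number of bits")
    return "".join(table[bits[i:i + bits_per_symbol]]
                   for i in range(0, len(bits), bits_per_symbol))


table = build_table(["\u200b", "\u200c", "\u200d", "\u200e"], 2)
bits = to_fixed_width_bits("\\u", 8)
print(bits)  # 0101110001110101
print(len(map_bits(bits, table, 2)))  # 8
```

Rejecting a character that does not fit the preset length, instead of truncating it, keeps the encoding reversible, which the tracing method depends on.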
PCT/CN2023/085945 2022-09-27 2023-04-03 Database watermark embedding method and apparatus, database watermark tracing method and apparatus, and electronic device WO2024066271A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211179867.9 2022-09-27
CN202211179867.9A CN115495439B (en) 2022-09-27 2022-09-27 Embedding method and tracing method and device of database watermark and electronic equipment

Publications (1)

Publication Number Publication Date
WO2024066271A1 true WO2024066271A1 (en) 2024-04-04

Family

ID=84473099

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/085945 WO2024066271A1 (en) 2022-09-27 2023-04-03 Database watermark embedding method and apparatus, database watermark tracing method and apparatus, and electronic device

Country Status (2)

Country Link
CN (1) CN115495439B (en)
WO (1) WO2024066271A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115495439B (en) * 2022-09-27 2023-04-07 北京柏睿数据技术股份有限公司 Embedding method and tracing method and device of database watermark and electronic equipment

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105205355A (en) * 2015-11-05 2015-12-30 南通大学 Embedding method and extracting method for text watermark based on semantic role position mapping
CN106570356A (en) * 2016-11-01 2017-04-19 南京理工大学 Unicode coding-based text watermark embedding method and extraction method
CN110414194A (en) * 2019-07-02 2019-11-05 南京理工大学 A kind of insertion and extracting method of Text Watermarking
CN111986065A (en) * 2019-05-23 2020-11-24 北京奇虎科技有限公司 Digital watermark embedding method and device
CN115495439A (en) * 2022-09-27 2022-12-20 北京柏睿数据技术股份有限公司 Embedding method and tracing method and device of database watermark and electronic equipment

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9087459B2 (en) * 2012-11-30 2015-07-21 The Nielsen Company (Us), Llc Methods, apparatus, and articles of manufacture to encode auxilary data into text data and methods, apparatus, and articles of manufacture to obtain encoded data from text data

Also Published As

Publication number Publication date
CN115495439B (en) 2023-04-07
CN115495439A (en) 2022-12-20
