CN111797368B

CN111797368B - Data watermark recognition analysis method and system

Info

Publication number: CN111797368B
Application number: CN202010637042.1A
Authority: CN
Inventors: 郭骞; 张鸿雁; 刘博�; 秦龙; 张岚; 王献军; 郭俊杰; 沈文; 俞庚申; 于鹏飞; 高先周; 杨如侠; 高鹏; 仇慎健; 李为
Original assignee: State Grid Smart Grid Research Institute Co ltd; State Grid Corp of China SGCC; State Grid Henan Electric Power Co Ltd; Electric Power Research Institute of State Grid Henan Electric Power Co Ltd; Information and Telecommunication Branch of State Grid Henan Electric Power Co Ltd
Current assignee: State Grid Smart Grid Research Institute Co ltd; State Grid Corp of China SGCC; State Grid Henan Electric Power Co Ltd; Electric Power Research Institute of State Grid Henan Electric Power Co Ltd; Information and Telecommunication Branch of State Grid Henan Electric Power Co Ltd
Priority date: 2020-07-03
Filing date: 2020-07-03
Publication date: 2024-04-09
Anticipated expiration: 2040-07-03
Also published as: CN111797368A

Abstract

The invention discloses a data watermark identification and analysis method and system. The method comprises the following steps: acquiring data content to be identified, wherein the data content to be identified comprises at least one single piece of data; classifying according to the data content to be identified to generate at least one semantic segment; generating a semantic library according to the different semantic segments; and matching the single pieces of data in the data content to be identified with the semantic segments in the semantic library, and marking the data that cannot be matched with the semantic segments in the semantic library as a data watermark. By analyzing and identifying the data watermark, the method provides accurate analysis for the identification processing of the data watermark and reduces misidentification of the data and interference with the meaning of the data.

Description

Data watermark recognition analysis method and system

Technical Field

The invention relates to the technical field of information security, in particular to a data watermark identification and analysis method and system.

Background

Information technology, represented by big data analysis and new-generation artificial intelligence, is developing rapidly and has played an important role in national governance, lean management of institutions, customer service improvement and the like. The full fusion and sharing of data has become a major trend and greatly influences economic and social development. However, data security problems are increasingly prominent, and the theft and abuse of data are increasingly serious; this is the primary obstacle to further fusion and sharing of data.

Data watermarking technology embeds identification information (namely, a data watermark) into a digital carrier (including multimedia, documents, software, etc.) without affecting the use value of the original carrier; the watermark is not easily detected or modified by others, but can be identified and recognized by its producer. Adding a data watermark means adding watermark data to a data string in order to mark the determination of ownership and of the data distribution process. However, adding a data watermark in the prior art changes the structure of the data, which causes misidentification of the data and interference with the meaning of the data.

Disclosure of Invention

Therefore, the data watermark identification and analysis method and system provided by the invention overcome the defect in the prior art that adding watermark data changes the data structure, causing the data to be misidentified and its meaning to be interfered with.

In order to achieve the above purpose, the present invention provides the following technical solutions:

in a first aspect, an embodiment of the present invention provides a method for identifying and analyzing a data watermark, including:

acquiring data content to be identified, wherein the data content to be identified comprises: at least one single piece of data;

classifying according to the data content to be identified, and generating at least one semantic segment;

generating a semantic library according to different semantic segments;

matching the single pieces of data in the data content to be identified with the semantic segments in the semantic library, and marking the data which cannot be matched with the semantic segments in the semantic library as a data watermark.

In one embodiment, the data content to be identified is obtained as a total set of input data of the same syntax structure.

In one embodiment, the at least one semantic segment is generated by classifying according to the same field and the position of the same field in the data content to be identified.

In an embodiment, the classifying according to the same field and the position of the same field in the data content to be identified, generating at least one semantic segment includes:

determining the fields with the number of repeated fields larger than or equal to a first preset value in the data content to be identified as the same fields, and counting the same fields and the field positions of the same fields;

deleting the same fields with the number smaller than the second preset value, and reserving the same fields with the number larger than or equal to the second preset value;

and sequencing the same reserved fields according to the sequence positions of single data in the data content to be identified, and generating at least one semantic segment.

In an embodiment, after the step of matching the single pieces of data in the data content to be identified with the semantic segments in the semantic library and marking the data that cannot be matched with the semantic segments in the semantic library as a data watermark, the method further includes: returning the content and location marked as a data watermark.

In an embodiment, the data content to be identified includes: at least one of text and character string.

In a second aspect, an embodiment of the present invention provides a data watermark identifying and analyzing system, including:

the data acquisition module is used for acquiring the data content to be identified, and the data content to be identified comprises: at least one single piece of data;

the semantic segment generation module is used for classifying according to the data content to be identified and generating at least one semantic segment;

the semantic library segment generation module is used for generating a semantic library according to different semantic segments;

the data watermark identification module is used for matching single data in the data content to be identified with the semantic segments in the semantic library, and marking the data which cannot be matched with the semantic segments in the semantic library as the data watermark.

In an embodiment, the data watermark recognition analysis system further comprises:

and the data watermark content and position acquisition module is used for returning the content and the position marked as the data watermark.

In a third aspect, an embodiment of the present invention provides a terminal, including: the system comprises at least one processor and a memory communicatively connected with the at least one processor, wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to cause the at least one processor to perform the data watermark identification analysis method according to the first aspect of the embodiment of the invention.

In a fourth aspect, an embodiment of the present invention provides a computer readable storage medium, where computer instructions are stored, where the computer instructions are configured to cause a computer to perform the method for identifying and analyzing a data watermark according to the first aspect of the embodiment of the present invention.

The technical scheme of the invention has the following advantages:

The data watermark identification and analysis method and system provided by the invention acquire the data content to be identified, which comprises at least one single piece of data; classify according to the data content to be identified to generate at least one semantic segment; generate a semantic library according to the different semantic segments; and match the single pieces of data in the data content to be identified with the semantic segments in the semantic library, marking the data that cannot be matched with the semantic segments in the semantic library as a data watermark. By analyzing and identifying the data watermark, the invention provides accurate analysis for the identification processing of the data watermark and reduces misidentification of the data and interference with the meaning of the data.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are needed in the description of the embodiments or the prior art will be briefly described, and it is obvious that the drawings in the description below are some embodiments of the present invention, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.

Fig. 1 is a flowchart of a specific example of a data watermark identification and analysis method according to an embodiment of the present invention;

fig. 2 is a flowchart of another specific example of a data watermark identifying and analyzing method according to an embodiment of the present invention;

fig. 3 is a block diagram of a specific example of a data watermark identification and analysis system according to an embodiment of the present invention;

fig. 4 is a block diagram of another specific example of a data watermark identifying and analyzing system according to an embodiment of the present invention;

fig. 5 is a composition diagram of a specific example of a terminal according to an embodiment of the present invention.

Detailed Description

The technical solutions of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are some, but not all, embodiments of the invention. All other embodiments obtained by those skilled in the art based on the embodiments of the invention without inventive effort fall within the scope of the invention.

In the description of the present invention, it should be noted that the directions or positional relationships indicated by the terms "center", "upper", "lower", "left", "right", "vertical", "horizontal", "inner", "outer", etc. are based on the directions or positional relationships shown in the drawings, are merely for convenience of describing the present invention and simplifying the description, and do not indicate or imply that the devices or elements referred to must have a specific orientation, be configured and operated in a specific orientation, and thus should not be construed as limiting the present invention. Furthermore, the terms "first," "second," and "third" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.

In the description of the present invention, it should be noted that, unless explicitly specified and limited otherwise, the terms "mounted," "coupled," and "connected" are to be construed broadly. For example, a connection may be fixed, detachable, or integral; it may be mechanical or electrical; it may be direct or indirect through an intermediate medium; it may be internal communication between two components; and it may be wireless or wired. The specific meaning of the above terms in the present invention will be understood in specific cases by those of ordinary skill in the art.

In addition, the technical features of the different embodiments of the present invention described below may be combined with each other as long as they do not collide with each other.

Example 1

The data watermark identification and analysis method provided by the embodiment of the invention, as shown in fig. 1, comprises the following steps:

step S1: acquiring data content to be identified, wherein the data content to be identified comprises: at least one single piece of data.

In the embodiment of the invention, the data content to be identified is obtained as a total set of input data having the same grammar structure, where the grammar structure may be regular or irregular. A regular grammar structure is, for example, a sentence with a subject-predicate-object structure. An irregular grammar structure is, for example, an identification card number whose digits at different positions carry different meanings, or address information ordered from the largest unit to the smallest. These are merely examples and not limitations; the corresponding structure is selected according to specific requirements.

In an embodiment of the present invention, the data content to be identified includes at least one of text and character strings. This is merely an example and not a limitation; the corresponding content is selected according to the corresponding requirements.
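As an illustration of an "irregular" grammar structure in which meaning depends on position, the following minimal sketch splits an 18-character resident identity number into its positional fields (6-digit region code, 8-digit birth date, 3-digit sequence code, 1 check character). The function name `split_id_number` and the sample number are hypothetical and chosen only for illustration; the sample is made up and its check character is not validated.

```python
def split_id_number(id_number):
    """Split an 18-character ID number into fields by fixed position.

    Assumption: the common 6 + 8 + 3 + 1 layout. No checksum validation
    is performed; this only shows that position determines meaning.
    """
    if len(id_number) != 18:
        raise ValueError("expected an 18-character ID number")
    return {
        "region_code": id_number[0:6],      # administrative region
        "birth_date": id_number[6:14],      # YYYYMMDD
        "sequence_code": id_number[14:17],  # order code
        "check_char": id_number[17],        # check character
    }


# Made-up sample value, for illustration only.
fields = split_id_number("110105199001011234")
print(fields)
```

Because every record in such a data set shares the same positional layout, fields at the same position can be compared across records, which is what the classification in step S2 relies on.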

Step S2: classifying according to the data content to be identified, and generating at least one semantic segment.

In the embodiment of the invention, classification is performed according to the same fields and the positions of the same fields in the data content to be identified, and at least one semantic segment is generated. This embodiment defines the same fields and extracts the effective information of the data content to be identified by counting how often the same fields recur, thereby generating at least one semantic segment. The specific process comprises: determining the fields whose number of repetitions in the data content to be identified is greater than or equal to a first preset value as same fields, and counting the same fields and their field positions; deleting the same fields whose number is smaller than a second preset value, and retaining the same fields whose number is greater than or equal to the second preset value; and ordering the retained same fields according to their sequence positions within a single piece of data in the data content to be identified, thereby generating at least one semantic segment.

In the embodiment of the invention, for example, the fields repeated 2 or more times in the data content to be identified are determined to be same fields, and the same fields and their field positions are counted; the same fields whose number is less than 2 are deleted, and those whose number is greater than or equal to 2 are retained; the retained same fields are then ordered according to their sequence positions within a single piece of data in the data content to be identified, and at least one semantic segment is generated. The value 2 is merely an example and not a limitation; the corresponding values are set according to reasonable requirements.
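The three sub-steps of step S2 can be sketched as follows. This is only one possible reading, under stated assumptions that are not in the source: each single piece of data is a delimiter-separated string, a "field" is the value found at a given position, and a semantic segment is represented as a `(position, value)` pair. The names `generate_semantic_segments`, `first_preset` and `second_preset` are hypothetical.

```python
from collections import Counter


def generate_semantic_segments(records, delimiter=",",
                               first_preset=2, second_preset=2):
    """Return recurring (position, value) fields, ordered by position."""
    # Count how often each value occurs at each position across all records.
    counts = Counter()
    for record in records:
        for position, value in enumerate(record.split(delimiter)):
            counts[(position, value)] += 1

    # Sub-step 1: fields repeated at least first_preset times are "same fields".
    same_fields = {key: n for key, n in counts.items() if n >= first_preset}

    # Sub-step 2: delete same fields whose count is below second_preset.
    retained = [key for key, n in same_fields.items() if n >= second_preset]

    # Sub-step 3: order retained fields by their position in a single record.
    return sorted(retained)


records = [
    "Henan,Zhengzhou,Jinshui",
    "Henan,Zhengzhou,Erqi",
    "Henan,Luoyang,Jinshui",
    "Henan,Zhengzhou,x9f3",
]
segments = generate_semantic_segments(records)
print(segments)
```

With both preset values equal to 2, as in the example above, the second filter removes nothing beyond the first; the two thresholds only differ in effect when the second preset value is larger than the first.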

Step S3: generating a semantic library according to the different semantic segments.

In an embodiment of the invention, the semantic library comprises at least one semantic segment generated by classification according to the data content to be identified.
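The source does not specify how the semantic library is stored. One simple possibility, shown here as an assumption, is a mapping from field position to the set of values accepted at that position; the name `build_semantic_library` and the `(position, value)` representation of a semantic segment are hypothetical.

```python
from collections import defaultdict


def build_semantic_library(segments):
    """Group (position, value) semantic segments by position."""
    library = defaultdict(set)
    for position, value in segments:
        library[position].add(value)
    # Return a plain dict so that missing positions are not created on lookup.
    return dict(library)


library = build_semantic_library(
    [(0, "Henan"), (1, "Zhengzhou"), (1, "Luoyang"), (2, "Jinshui")]
)
print(library)
```

Grouping by position keeps the later matching step to a single set lookup per field.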

Step S4: matching the single pieces of data in the data content to be identified with the semantic segments in the semantic library, and marking the data which cannot be matched with the semantic segments in the semantic library as a data watermark.

In the embodiment of the invention, when the single pieces of data in the data content to be identified are matched with the semantic segments in the semantic library, each single piece of data is matched against every semantic segment in the semantic library, and the data that cannot be matched with the semantic segments in the semantic library is marked as a data watermark. As shown in fig. 2, the method further comprises the following step:

Step S5: returning the position of the data watermark.

In the embodiment of the invention, the data watermark is analyzed and identified, which provides accurate analysis for the identification processing of the data watermark and reduces misidentification of the data and interference with the meaning of the data; meanwhile, the position of the data watermark is returned, which provides accurate positioning for subsequent processing of the data watermark.
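Steps S4 and S5 can be sketched as a lookup of every field against the semantic library, collecting the content and position of each field that finds no match. The data model is an assumption for illustration: records are delimiter-separated strings and the library maps a field position to the set of accepted values. The name `find_watermarks` is hypothetical.

```python
def find_watermarks(records, library, delimiter=","):
    """Return (record_index, field_position, content) for unmatched fields."""
    hits = []
    for record_index, record in enumerate(records):
        for position, value in enumerate(record.split(delimiter)):
            # A field with no matching entry in the library is marked
            # as a suspected data watermark.
            if value not in library.get(position, set()):
                hits.append((record_index, position, value))
    return hits


library = {
    0: {"Henan"},
    1: {"Zhengzhou", "Luoyang"},
    2: {"Jinshui", "Erqi"},
}
records = [
    "Henan,Zhengzhou,Jinshui",
    "Henan,Luoyang,Erqi",
    "Henan,Zhengzhou,x9f3",
]
hits = find_watermarks(records, library)
print(hits)
```

In this sketch a legitimate but rare value would also be flagged, so the choice of preset values in step S2 directly affects how many false positives the matching step produces.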

The data watermark identification and analysis method provided by the embodiment of the invention acquires the data content to be identified, which comprises at least one single piece of data; classifies according to the data content to be identified to generate at least one semantic segment; generates a semantic library according to the different semantic segments; and matches the single pieces of data in the data content to be identified with the semantic segments in the semantic library, marking the data that cannot be matched with the semantic segments in the semantic library as a data watermark. By analyzing and identifying the data watermark, the method provides accurate analysis for the identification processing of the data watermark and reduces misidentification of the data and interference with the meaning of the data.

Example 2

An embodiment of the present invention provides a data watermark identifying and analyzing system, as shown in fig. 3, including:

the data acquisition module 1 is configured to acquire data content to be identified, where the data content to be identified includes: at least one single piece of data; this module performs the method described in step S1 in embodiment 1, and will not be described here again.

The semantic segment generation module 2 is used for classifying according to the data content to be identified and generating at least one semantic segment; this module performs the method described in step S2 in embodiment 1, and will not be described here.

The semantic library segment generation module 3 is used for generating a semantic library according to different semantic segments; this module performs the method described in step S3 in embodiment 1, and will not be described here.

The data watermark identification module 4 is used for matching single data in the data content to be identified with semantic segments in the semantic library, and data which cannot be matched with the semantic segments in the semantic library are data watermarks; this module performs the method described in step S4 in embodiment 1, and will not be described here.

In an embodiment of the present invention, as shown in fig. 4, the data watermark identifying and analyzing system further includes:

the data watermark content and location obtaining module 5 is configured to return the content and location marked as the data watermark, and this module performs the method described in step S5 in embodiment 1, which is not described herein.

The embodiment of the invention provides a data watermark identification and analysis system. The data acquisition module acquires the data content to be identified, which comprises at least one single piece of data; the semantic segment generation module classifies the data content to be identified to generate at least one semantic segment; the semantic library generation module generates a semantic library according to the different semantic segments; and the data watermark identification module matches the single pieces of data in the data content to be identified with the semantic segments in the semantic library and marks the data that cannot be matched with the semantic segments in the semantic library as a data watermark. By analyzing and identifying the data watermark, the system provides accurate analysis for the identification processing of the data watermark and reduces misidentification of the data and interference with the meaning of the data.
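The division into modules 1 to 5 can be illustrated with a small class whose methods correspond to the modules. This is a hypothetical wiring, not the patented implementation: the class name `WatermarkAnalysisSystem`, the line-per-record input, the comma delimiter and the single `min_repeats` threshold are all assumptions made for the sketch.

```python
from collections import Counter


class WatermarkAnalysisSystem:
    """Hypothetical wiring of modules 1-5; the field model is an assumption."""

    def __init__(self, delimiter=",", min_repeats=2):
        self.delimiter = delimiter
        self.min_repeats = min_repeats

    def acquire(self, raw_text):
        # Module 1: one single piece of data per non-empty line.
        return [line.strip() for line in raw_text.splitlines() if line.strip()]

    def generate_segments(self, records):
        # Module 2: keep (position, value) fields that recur often enough.
        counts = Counter(
            (pos, val)
            for rec in records
            for pos, val in enumerate(rec.split(self.delimiter))
        )
        return sorted(key for key, n in counts.items() if n >= self.min_repeats)

    def generate_library(self, segments):
        # Module 3: the library is the set of retained segments.
        return set(segments)

    def identify(self, records, library):
        # Modules 4 and 5: mark unmatched fields and return content and position.
        return [
            {"record": i, "position": pos, "content": val}
            for i, rec in enumerate(records)
            for pos, val in enumerate(rec.split(self.delimiter))
            if (pos, val) not in library
        ]

    def run(self, raw_text):
        records = self.acquire(raw_text)
        library = self.generate_library(self.generate_segments(records))
        return self.identify(records, library)


raw = """
meter,220V,ok
meter,220V,ok
meter,110V,ok
meter,110V,k7Q2
"""
result = WatermarkAnalysisSystem().run(raw)
print(result)
```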

Example 3

An embodiment of the present invention provides a terminal, as shown in fig. 5, including: at least one processor 401, such as a CPU (Central Processing Unit), at least one communication interface 403, a memory 404, and at least one communication bus 402. The communication bus 402 is used to enable connection and communication between these components. The communication interface 403 may include a display screen (Display) and a keyboard (Keyboard); optionally, the communication interface 403 may further include a standard wired interface and a wireless interface. The memory 404 may be a high-speed RAM (Random Access Memory) or a non-volatile memory, such as at least one magnetic disk memory.

The memory 404 may also optionally be at least one storage device located remotely from the aforementioned processor 401. The processor 401 may perform the data watermark identification analysis method in embodiment 1. A set of program codes is stored in the memory 404, and the processor 401 calls the program codes stored in the memory 404 to execute the data watermark identification analysis method in embodiment 1. The communication bus 402 may be a peripheral component interconnect (PCI) bus, an extended industry standard architecture (EISA) bus, or the like. The communication bus 402 may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one line is shown in fig. 5, but this does not mean that there is only one bus or one type of bus.

The memory 404 may include volatile memory, such as random-access memory (RAM); the memory may also include non-volatile memory, such as flash memory, a hard disk drive (HDD) or a solid-state drive (SSD); the memory 404 may also include a combination of the above types of memory.

The processor 401 may be a central processing unit (CPU), a network processor (NP), or a combination of a CPU and an NP.

The processor 401 may further comprise a hardware chip. The hardware chip may be an application-specific integrated circuit (ASIC), a programmable logic device (PLD), or a combination thereof. The PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof.

Optionally, the memory 404 is also used for storing program instructions. The processor 401 may invoke the program instructions to implement the data watermark identification analysis method of embodiment 1 of the present application.

The embodiment of the invention also provides a computer readable storage medium, and the computer readable storage medium stores computer executable instructions thereon, wherein the computer executable instructions can execute the data watermark identification and analysis method in the embodiment 1. Wherein the storage medium may be a magnetic Disk, an optical Disk, a Read-Only Memory (ROM), a random access Memory (Random Access Memory, RAM), a Flash Memory (Flash Memory), a Hard Disk (HDD), or a Solid State Drive (SSD); the storage medium may also comprise a combination of memories of the kind described above.

It is apparent that the above embodiments are given by way of illustration only and do not limit the implementations. Other variations or modifications in different forms may be made by those of ordinary skill in the art on the basis of the above description. It is neither necessary nor possible to enumerate all embodiments here. Obvious variations or modifications derived therefrom still fall within the scope of the present invention.

Claims

1. A method of data watermark identification and analysis, comprising:

acquiring data content to be identified, wherein the data content to be identified comprises: at least one piece of data, wherein the content of the data to be identified is obtained as a total set of input data with the same grammar structure;

classifying according to the data content to be identified, generating at least one semantic segment, wherein classifying according to the same field and the position of the same field in the data content to be identified, generating at least one semantic segment comprises: determining the fields with the number of repeated fields larger than or equal to a first preset value in the data content to be identified as the same fields, and counting the same fields and the field positions of the same fields; deleting the same fields with the number smaller than the second preset value, and reserving the same fields with the number larger than or equal to the second preset value; sequencing the same reserved fields according to the sequence positions of single data in the data content to be identified, and generating at least one semantic segment;

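generating a semantic library according to different semantic segments;

matching single pieces of data in the data content to be identified with the semantic segments in the semantic library, and marking the data which cannot be matched with the semantic segments in the semantic library as a data watermark.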

2. The method for identifying and analyzing data watermarks according to claim 1, wherein after the step of matching single pieces of data in the data content to be identified with the semantic segments in the semantic library and marking the data that cannot be matched with the semantic segments in the semantic library as a data watermark, the method further comprises: returning the content and location marked as a data watermark.

3. A method of identifying and analyzing a data watermark according to any of claims 1-2, wherein said data content to be identified comprises: at least one of text and character string.

4. A data watermark identification and analysis system, comprising:

the data acquisition module is used for acquiring the data content to be identified, and the data content to be identified comprises: at least one piece of data, wherein the content of the data to be identified is obtained as a total set of input data with the same grammar structure;

the semantic segment generating module is configured to classify according to the data content to be identified, and generate at least one semantic segment, where classifying according to the same field and the position of the same field in the data content to be identified, and generating at least one semantic segment includes: determining the fields with the number of repeated fields larger than or equal to a first preset value in the data content to be identified as the same fields, and counting the same fields and the field positions of the same fields; deleting the same fields with the number smaller than the second preset value, and reserving the same fields with the number larger than or equal to the second preset value; sequencing the same reserved fields according to the sequence positions of single data in the data content to be identified, and generating at least one semantic segment;

the semantic library segment generation module is used for generating a semantic library according to different semantic segments;

the data watermark identification module is used for matching single pieces of data in the data content to be identified with the semantic segments in the semantic library, and marking the data which cannot be matched with the semantic segments in the semantic library as a data watermark.

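5. The data watermark identification analysis system according to claim 4, further comprising:

the data watermark content and position acquisition module is used for returning the content and the position marked as the data watermark.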

6. A terminal, comprising: at least one processor, and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor to cause the at least one processor to perform the data watermark identification analysis method of any one of claims 1-3.

7. A computer-readable storage medium storing computer instructions for causing the computer to perform the data watermark identification analysis method of any one of claims 1 to 3.