CN111797368A - A method and system for identifying and analyzing data watermarks

Info

Publication number: CN111797368A
Application number: CN202010637042.1A
Authority: CN
Inventors: 郭骞; 张鸿雁; 刘博�; 秦龙; 张岚; 王献军; 郭俊杰; 沈文; 俞庚申; 于鹏飞; 高先周; 杨如侠; 高鹏; 仇慎健; 李为
Original assignee: State Grid Corp of China SGCC; Global Energy Interconnection Research Institute; State Grid Henan Electric Power Co Ltd; Electric Power Research Institute of State Grid Henan Electric Power Co Ltd; Information and Telecommunication Branch of State Grid Henan Electric Power Co Ltd
Current assignee: State Grid Corp of China SGCC; Global Energy Interconnection Research Institute; State Grid Henan Electric Power Co Ltd; Electric Power Research Institute of State Grid Henan Electric Power Co Ltd; Information and Telecommunication Branch of State Grid Henan Electric Power Co Ltd
Priority date: 2020-07-03
Filing date: 2020-07-03
Publication date: 2020-10-20
Anticipated expiration: 2040-07-03
Also published as: CN111797368B

Abstract

The invention discloses a data watermark identification and analysis method and system. The method comprises: acquiring data content to be identified, the data content to be identified comprising at least one single piece of data; classifying the data content to be identified to generate at least one semantic segment; generating a semantic library from the different semantic segments; and matching each single piece of data in the data content to be identified against the semantic segments in the semantic library, and marking data that cannot be matched with any semantic segment in the semantic library as a data watermark. By analyzing and identifying the data watermark, the method provides accurate analysis for data watermark identification processing and reduces misidentification of data and interference with the meaning of the data.

Description

A method and system for identifying and analyzing data watermarks

Technical Field

The present invention relates to the technical field of information security, and in particular to a data watermark identification and analysis method and system.

Background

Information technology is developing rapidly. Represented by big data analytics and new-generation artificial intelligence, it already plays an important role in national governance, lean management of institutions, and improved customer service. Full integration and sharing of data has become the general trend and will have a profound impact on economic and social development. However, data security problems are increasingly prominent, and data theft and abuse are becoming more serious; this is currently the primary obstacle to further integration and sharing of data.

Data watermarking embeds identification information (that is, a data watermark) directly into a digital carrier (including multimedia, documents, software, etc.) without affecting the use value of the original carrier. The watermark is not easily detected or modified again, but it can be identified and recognized by the producer. An additive data watermark adds watermark data to a data string to mark whom the data belongs to, and can be used to define ownership and to define the data distribution process. However, data watermarks added in the prior art change the structure of the data, which leads to misidentification of the data and interferes with its meaning.

Summary of the Invention

Therefore, the present invention provides a data watermark identification and analysis method and system that overcome the defects of the prior art, in which adding watermark data changes the data structure and thereby causes misidentification of the data and interference with its meaning.

To achieve the above object, the present invention provides the following technical solutions:

In a first aspect, an embodiment of the present invention provides a data watermark identification and analysis method, comprising:

acquiring data content to be identified, the data content to be identified comprising at least one single piece of data;

classifying the data content to be identified to generate at least one semantic segment;

generating a semantic library from the different semantic segments;

matching each single piece of data in the data content to be identified against the semantic segments in the semantic library, and marking data that cannot be matched with any semantic segment in the semantic library as a data watermark.

In one embodiment, the acquired data content to be identified is the total set of input data having the same grammatical structure.

In one embodiment, classification is performed according to the same fields in the data content to be identified and the positions of those same fields, generating at least one semantic segment.

In one embodiment, classifying according to the same fields in the data content to be identified and the positions of those same fields to generate at least one semantic segment comprises:

determining fields whose number of repetitions in the data content to be identified is greater than or equal to a first preset value as same fields, and counting the same fields and their field positions;

deleting same fields whose count is less than a second preset value, and retaining same fields whose count is greater than or equal to the second preset value;

sorting the retained same fields by their sequential positions within the single pieces of data in the data content to be identified, generating at least one semantic segment.

In one embodiment, after the step of matching each single piece of data in the data content to be identified against the semantic segments in the semantic library and marking data that cannot be matched as a data watermark, the method further comprises: returning the content and position of the data marked as a data watermark.

In one embodiment, the data content to be identified comprises at least one of text and character strings.

In a second aspect, an embodiment of the present invention provides a data watermark identification and analysis system, comprising:

a data acquisition module, configured to acquire data content to be identified, the data content to be identified comprising at least one single piece of data;

a semantic segment generation module, configured to classify the data content to be identified and generate at least one semantic segment;

a semantic library generation module, configured to generate a semantic library from the different semantic segments;

a data watermark identification module, configured to match each single piece of data in the data content to be identified against the semantic segments in the semantic library, and to mark data that cannot be matched with any semantic segment in the semantic library as a data watermark.

In one embodiment, the data watermark identification and analysis system further comprises:

a data watermark content and position acquisition module, configured to return the content and position of the data marked as a data watermark.

In a third aspect, an embodiment of the present invention provides a terminal, comprising: at least one processor, and a memory communicatively connected to the at least one processor, wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor performs the data watermark identification and analysis method of the first aspect of the embodiments of the present invention.

In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium storing computer instructions, the computer instructions being used to cause a computer to perform the data watermark identification and analysis method of the first aspect of the embodiments of the present invention.

The technical solution of the present invention has the following advantages:

The data watermark identification and analysis method and system provided by the present invention acquire data content to be identified, the data content to be identified comprising at least one single piece of data; classify the data content to be identified to generate at least one semantic segment; generate a semantic library from the different semantic segments; and match each single piece of data in the data content to be identified against the semantic segments in the semantic library, marking data that cannot be matched with any semantic segment in the semantic library as a data watermark. By analyzing and identifying the data watermark, the invention provides accurate analysis for data watermark identification processing and reduces misidentification of data and interference with the meaning of the data.

Brief Description of the Drawings

To illustrate the specific embodiments of the present invention or the technical solutions of the prior art more clearly, the drawings needed for describing them are briefly introduced below. The drawings described below show some embodiments of the present invention; a person of ordinary skill in the art can obtain other drawings from them without creative effort.

FIG. 1 is a flowchart of a specific example of the data watermark identification and analysis method provided by an embodiment of the present invention;

FIG. 2 is a flowchart of another specific example of the data watermark identification and analysis method provided by an embodiment of the present invention;

FIG. 3 is a module composition diagram of a specific example of the data watermark identification and analysis system provided by an embodiment of the present invention;

FIG. 4 is a module composition diagram of another specific example of the data watermark identification and analysis system provided by an embodiment of the present invention;

FIG. 5 is a composition diagram of a specific example of a terminal provided by an embodiment of the present invention.

Detailed Description

The technical solutions of the present invention are described clearly and completely below with reference to the drawings. The described embodiments are some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art from the embodiments of the present invention without creative effort fall within the protection scope of the present invention.

In the description of the present invention, it should be noted that orientation or position terms such as "center", "upper", "lower", "left", "right", "vertical", "horizontal", "inner", and "outer" are based on the orientations or positions shown in the drawings. They are used only to simplify the description; they do not indicate or imply that the device or element referred to must have a specific orientation or be constructed and operated in a specific orientation, and therefore should not be construed as limiting the present invention. In addition, the terms "first", "second", and "third" are used for descriptive purposes only and should not be construed as indicating or implying relative importance.

In the description of the present invention, it should also be noted that, unless otherwise expressly specified and limited, the terms "installed", "connected", and "coupled" should be understood broadly. For example, a connection may be fixed, detachable, or integral; it may be mechanical or electrical; it may be direct or indirect through an intermediate medium; it may be an internal communication between two elements; and it may be wireless or wired. A person of ordinary skill in the art can understand the specific meaning of these terms in the present invention according to the specific situation.

In addition, the technical features involved in the different embodiments of the present invention described below can be combined with one another as long as they do not conflict.

Embodiment 1

An embodiment of the present invention provides a data watermark identification and analysis method which, as shown in FIG. 1, includes the following steps:

Step S1: Acquire data content to be identified, the data content to be identified comprising at least one single piece of data.

In this embodiment of the present invention, the acquired data content to be identified is the total set of input data having the same grammatical structure. Grammatical structures may be regular or irregular. A regular structure is, for example, a sentence with subject, predicate, and object. An irregular structure is, for example, an ID card number whose digits each carry a different meaning, or address information ordered from the largest unit to the smallest. These are only examples and are not limiting; the appropriate structure is chosen according to specific requirements.
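As a minimal sketch of what "the same grammatical structure" can mean in the irregular case, the snippet below splits address-like records into positional fields. The `|` delimiter, the `split_fields` helper, and the sample addresses are assumptions made for illustration only; the patent does not prescribe how a single piece of data is divided into fields.

```python
from typing import List, Tuple

# A field is modeled as (position within the record, field text).
Field = Tuple[int, str]


def split_fields(record: str, delimiter: str = "|") -> List[Field]:
    """Split one single piece of data into positional fields.

    The delimiter is a hypothetical convention, not taken from the patent.
    """
    return [(pos, text.strip()) for pos, text in enumerate(record.split(delimiter))]


# Illustrative records that share one structure: province | city | district.
records = [
    "Henan | Zhengzhou | Jinshui District",
    "Henan | Luoyang | Xigong District",
]
fields_per_record = [split_fields(r) for r in records]
```

Keeping the position alongside the text matters for the later steps, which classify fields by both their content and where they occur.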

In this embodiment of the present invention, the data content to be identified comprises at least one of text and character strings. This is only an example and is not limiting; the content is chosen according to the corresponding requirements.

Step S2: Classify the data content to be identified and generate at least one semantic segment.

In this embodiment of the present invention, classification is performed according to the same fields in the data content to be identified and the positions of those same fields, generating at least one semantic segment. This embodiment defines same fields and, by collecting statistics on the probability with which same fields recur, extracts the valid information of the data content to be identified and generates at least one semantic segment. The specific process includes: determining fields whose number of repetitions in the data content to be identified is greater than or equal to a first preset value as same fields, and counting the same fields and their field positions; deleting same fields whose count is less than a second preset value, and retaining same fields whose count is greater than or equal to the second preset value; and sorting the retained same fields by their sequential positions within the single pieces of data in the data content to be identified, generating at least one semantic segment.

In this embodiment of the present invention, fields whose number of repetitions in the data content to be identified is greater than or equal to 2 are determined as same fields, and the same fields and their field positions are counted. Same fields whose count is less than 2 are deleted, and same fields whose count is greater than or equal to 2 are retained. The value 2 in both cases is only an example and is not limiting; the values are set according to reasonable requirements. The retained same fields are then sorted by their sequential positions within the single pieces of data in the data content to be identified, generating at least one semantic segment.
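The three sub-steps of step S2 can be sketched as follows, under several assumptions: each single piece of data is already split into a list of field strings, a "same field" is a (position, text) pair, and all retained fields are returned as one ordered segment. The function name `build_semantic_segment` and the sample records are hypothetical, and how fields would be grouped into several distinct segments is not specified in the text, so the sketch does not attempt it.

```python
from collections import Counter
from typing import List, Tuple

Field = Tuple[int, str]  # (position within the record, field text)


def build_semantic_segment(
    records: List[List[str]],
    first_threshold: int = 2,
    second_threshold: int = 2,
) -> List[Field]:
    """Keep fields that recur at the same position, ordered by position."""
    # Count every (position, text) pair over the whole record set.
    counts = Counter(
        (pos, text) for record in records for pos, text in enumerate(record)
    )
    # Fields repeated at least first_threshold times are "same fields".
    same_fields = {f: n for f, n in counts.items() if n >= first_threshold}
    # Delete same fields rarer than second_threshold; retain the others.
    retained = [f for f, n in same_fields.items() if n >= second_threshold]
    # Sort by position in the record (ties broken by text for determinism).
    return sorted(retained)


records = [
    ["Henan", "Zhengzhou", "Jinshui"],
    ["Henan", "Zhengzhou", "Jinshui"],
    ["Henan", "Luoyang", "Xigong", "wm-7f3a"],
    ["Henan", "Luoyang", "Xigong"],
]
segment = build_semantic_segment(records)
```

With both thresholds at 2, as in the example values above, the field `wm-7f3a` occurs only once and is therefore excluded from the segment. On small record sets a legitimate but unique field would be excluded in the same way, which is one reason the thresholds are left configurable.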

Step S3: Generate a semantic library from the different semantic segments.

In this embodiment of the present invention, the semantic library includes the at least one semantic segment generated by classifying the data content to be identified.
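One possible in-memory form of the semantic library is a lookup from field position to the set of texts seen at that position. This is only an illustrative data structure: the patent states that the library contains the semantic segments but does not say how it is stored, and `build_semantic_library` and the sample segments are hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple

Field = Tuple[int, str]  # (position within the record, field text)


def build_semantic_library(segments: Iterable[List[Field]]) -> Dict[int, Set[str]]:
    """Merge semantic segments into a position -> allowed texts lookup."""
    library: Dict[int, Set[str]] = {}
    for segment in segments:
        for pos, text in segment:
            library.setdefault(pos, set()).add(text)
    return library


segments = [
    [(0, "Henan"), (1, "Zhengzhou")],
    [(0, "Henan"), (1, "Luoyang")],
]
library = build_semantic_library(segments)
```

Indexing by position keeps the later membership check to a single dictionary lookup followed by a set lookup per field.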

Step S4: Match each single piece of data in the data content to be identified against the semantic segments in the semantic library, and mark data that cannot be matched with any semantic segment in the semantic library as a data watermark.

In this embodiment of the present invention, each single piece of data in the data content to be identified is matched against the semantic segments in the semantic library; during matching, every single piece of data is compared with every semantic segment in the library. Data that cannot be matched with any semantic segment in the semantic library is marked as a data watermark. As shown in FIG. 2, after the data watermark is identified, the method further includes:

Step S5: Return the position of the data watermark.

In this embodiment of the present invention, analyzing and identifying the data watermark provides accurate analysis for data watermark identification processing and reduces misidentification of data and interference with the meaning of the data. Returning the position of the data watermark additionally provides accurate positioning for subsequent processing of the watermark.
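Steps S4 and S5 together can be sketched as a single pass that checks each field against the library and records where unmatched fields sit. The `WatermarkHit` record, the `|` delimiter, the character-offset convention, and the sample library are all assumptions for illustration; the text only requires that the content and position of the watermark be returned.

```python
from typing import List, NamedTuple, Set, Tuple

Field = Tuple[int, str]  # (position within the record, field text)


class WatermarkHit(NamedTuple):
    record_index: int  # which single piece of data
    field_index: int   # which field inside that record
    char_offset: int   # start of the field in the raw record string
    content: str       # the text marked as a data watermark


def locate_watermarks(
    records: List[str], library: Set[Field], delimiter: str = "|"
) -> List[WatermarkHit]:
    """Mark fields absent from the library and report their positions."""
    hits: List[WatermarkHit] = []
    for r_idx, record in enumerate(records):
        offset = 0
        for f_idx, text in enumerate(record.split(delimiter)):
            if (f_idx, text) not in library:
                hits.append(WatermarkHit(r_idx, f_idx, offset, text))
            # Advance past this field and the delimiter that follows it.
            offset += len(text) + len(delimiter)
    return hits


library = {(0, "Henan"), (1, "Zhengzhou"), (1, "Luoyang")}
records = ["Henan|Zhengzhou", "Henan|Luoyang|wm-7f3a"]
hits = locate_watermarks(records, library)
```

Returning a character offset as well as the field index lets a later processing step remove or inspect the marked text without splitting the record again.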

The data watermark identification and analysis method provided in this embodiment of the present invention acquires data content to be identified, the data content to be identified comprising at least one single piece of data; classifies the data content to be identified to generate at least one semantic segment; generates a semantic library from the different semantic segments; and matches each single piece of data in the data content to be identified against the semantic segments in the semantic library, marking data that cannot be matched with any semantic segment in the semantic library as a data watermark. Analyzing and identifying the data watermark provides accurate analysis for data watermark identification processing and reduces misidentification of data and interference with the meaning of the data.

Embodiment 2

An embodiment of the present invention provides a data watermark identification and analysis system which, as shown in FIG. 3, includes:

A data acquisition module 1, configured to acquire data content to be identified, the data content to be identified comprising at least one single piece of data. This module performs the method described in step S1 of Embodiment 1, which is not repeated here.

A semantic segment generation module 2, configured to classify the data content to be identified and generate at least one semantic segment. This module performs the method described in step S2 of Embodiment 1, which is not repeated here.

A semantic library generation module 3, configured to generate a semantic library from the different semantic segments. This module performs the method described in step S3 of Embodiment 1, which is not repeated here.

A data watermark identification module 4, configured to match each single piece of data in the data content to be identified against the semantic segments in the semantic library; data that cannot be matched with any semantic segment in the semantic library is a data watermark. This module performs the method described in step S4 of Embodiment 1, which is not repeated here.

In this embodiment of the present invention, as shown in FIG. 4, the data watermark identification and analysis system further includes:

A data watermark content and position acquisition module 5, configured to return the content and position of the data marked as a data watermark. This module performs the method described in step S5 of Embodiment 1, which is not repeated here.
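The module structure can be illustrated with a small composition in which each module is an injected callable. This is a hedged sketch of one way to wire the five modules together, not the system's actual implementation; the class `WatermarkAnalysisSystem`, the helper functions `demo_segments` and `demo_identify`, and the sample data are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, List, Set, Tuple

Field = Tuple[int, str]      # (position within the record, field text)
Records = List[List[str]]    # each record is a list of field texts
Position = Tuple[int, int]   # (record index, field index)


@dataclass
class WatermarkAnalysisSystem:
    """Wires modules 1 to 5 together; each module is an injected callable."""

    acquire: Callable[[], Records]                             # module 1
    generate_segments: Callable[[Records], List[Field]]        # module 2
    generate_library: Callable[[List[Field]], Set[Field]]      # module 3
    identify: Callable[[Records, Set[Field]], List[Position]]  # module 4

    def run(self) -> List[Tuple[Position, str]]:
        records = self.acquire()
        segments = self.generate_segments(records)
        library = self.generate_library(segments)
        positions = self.identify(records, library)
        # Module 5: return the position and content of each watermark.
        return [((r, f), records[r][f]) for r, f in positions]


def demo_segments(records: Records) -> List[Field]:
    # Illustrative rule: keep (position, text) pairs seen at least twice.
    counts = Counter((p, t) for rec in records for p, t in enumerate(rec))
    return sorted(f for f, n in counts.items() if n >= 2)


def demo_identify(records: Records, library: Set[Field]) -> List[Position]:
    # Flag every field that is absent from the library.
    return [
        (r, f)
        for r, rec in enumerate(records)
        for f, text in enumerate(rec)
        if (f, text) not in library
    ]


system = WatermarkAnalysisSystem(
    acquire=lambda: [["Henan", "Zhengzhou"], ["Henan", "Zhengzhou", "wm-7f3a"]],
    generate_segments=demo_segments,
    generate_library=set,
    identify=demo_identify,
)
result = system.run()
```

Injecting the modules as callables mirrors the one-module-per-step layout of FIG. 3 and FIG. 4 and allows each step to be replaced or tested separately.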

In the data watermark identification and analysis system provided by this embodiment of the present invention, the data acquisition module acquires data content to be identified, the data content to be identified comprising at least one single piece of data; the semantic segment generation module classifies the data content to be identified and generates at least one semantic segment; the semantic library generation module generates a semantic library from the different semantic segments; and the data watermark identification module matches each single piece of data in the data content to be identified against the semantic segments in the semantic library, marking data that cannot be matched with any semantic segment in the semantic library as a data watermark. Analyzing and identifying the data watermark provides accurate analysis for data watermark identification processing and reduces misidentification of data and interference with the meaning of the data.

Embodiment 3

An embodiment of the present invention provides a terminal which, as shown in FIG. 5, includes: at least one processor 401, such as a CPU (Central Processing Unit); at least one communication interface 403; a memory 404; and at least one communication bus 402. The communication bus 402 is used to implement connection and communication between these components. The communication interface 403 may include a display and a keyboard; optionally, the communication interface 403 may also include a standard wired interface and a wireless interface. The memory 404 may be a high-speed RAM (Random Access Memory) or a non-volatile memory, such as at least one disk memory.

Optionally, the memory 404 may also be at least one storage device located remotely from the processor 401. The processor 401 may execute the data watermark identification and analysis method of Embodiment 1. A set of program code is stored in the memory 404, and the processor 401 calls the program code stored in the memory 404 to execute the data watermark identification and analysis method of Embodiment 1. The communication bus 402 may be a peripheral component interconnect (PCI) bus, an extended industry standard architecture (EISA) bus, or the like. The communication bus 402 may be divided into an address bus, a data bus, a control bus, and so on. For ease of representation, only one line is drawn in FIG. 5, but this does not mean that there is only one bus or one type of bus. The memory 404 may include volatile memory, such as random-access memory (RAM); it may also include non-volatile memory, such as flash memory, a hard disk drive (HDD), or a solid-state drive (SSD); the memory 404 may also include a combination of the above types of memory. The processor 401 may be a central processing unit (CPU), a network processor (NP), or a combination of a CPU and an NP.

其中，处理器401还可以进一步包括硬件芯片。上述硬件芯片可以是专用集成电路(英文：application-specific integrated circuit，缩写：ASIC)，可编程逻辑器件(英文：programmable logic device，缩写：PLD)或其组合。上述PLD可以是复杂可编程逻辑器件(英文：complex programmable logic device，缩写：CPLD)，现场可编程逻辑门阵列(英文：field-programmable gate array，缩写：FPGA)，通用阵列逻辑(英文：generic arraylogic,缩写：GAL)或其任意组合。The processor 401 may further include a hardware chip. The above-mentioned hardware chip may be an application-specific integrated circuit (English: application-specific integrated circuit, abbreviation: ASIC), a programmable logic device (English: programmable logic device, abbreviation: PLD) or a combination thereof. The above-mentioned PLD may be a complex programmable logic device (English: complex programmable logic device, abbreviation: CPLD), a field programmable gate array (English: field-programmable gate array, abbreviation: FPGA), a general array logic (English: generic arraylogic , abbreviation: GAL) or any combination thereof.

可选地，存储器404还用于存储程序指令。处理器401可以调用程序指令，实现如本申请执行实施例1中的数据水印识别分析方法。Optionally, memory 404 is also used to store program instructions. The processor 401 may invoke program instructions to implement the data watermark identification and analysis method as in the first embodiment of the present application.

本发明实施例还提供一种计算机可读存储介质，计算机可读存储介质上存储有计算机可执行指令，该计算机可执行指令可执行实施例1中的数据水印识别分析方法。其中，所述存储介质可为磁碟、光盘、只读存储记忆体(Read-Only Memory，ROM)、随机存储记忆体(Random Access Memory，RAM)、快闪存储器(Flash Memory)、硬盘(Hard Disk Drive，缩写：HDD)或固态硬盘(Solid-State Drive，SSD)等；所述存储介质还可以包括上述种类的存储器的组合。Embodiments of the present invention further provide a computer-readable storage medium, where computer-executable instructions are stored on the computer-readable storage medium, and the computer-executable instructions can execute the data watermark identification and analysis method in Embodiment 1. Wherein, the storage medium may be a magnetic disk, an optical disk, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), a flash memory (Flash Memory), a hard disk (Hard) Disk Drive, abbreviation: HDD) or solid-state drive (Solid-State Drive, SSD), etc.; the storage medium may also include a combination of the above-mentioned types of memories.

Obviously, the above embodiments are merely examples given for clarity of description and are not intended to limit the implementations. Those of ordinary skill in the art can make other changes or modifications in different forms on the basis of the above description. It is neither necessary nor possible to list all implementations exhaustively here. Obvious changes or modifications derived from the above description still fall within the protection scope of the present invention.

Claims

1. A data watermark identification and analysis method, characterized by comprising:

acquiring data content to be identified, the data content to be identified including at least one single piece of data;

classifying the data content to be identified to generate at least one semantic segment;

generating a semantic library from the different semantic segments; and

matching each single piece of data in the data content to be identified against the semantic segments in the semantic library, and marking data that cannot be matched with any semantic segment in the semantic library as a data watermark.
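The four steps of claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: it assumes that each single piece of data is a whitespace-separated string, that a semantic segment is a (position, field) pair recurring in at least `min_count` records, and that the semantic library is the set of such pairs. The names `build_semantic_library`, `find_watermarks`, and `min_count` are hypothetical.

```python
from collections import Counter


def build_semantic_library(records, min_count=2):
    """Collect (position, field) pairs that recur across records.

    Assumption: a pair seen at least `min_count` times is treated as a
    semantic segment; the claims do not fix this rule.
    """
    counts = Counter(
        (pos, field)
        for record in records
        for pos, field in enumerate(record.split())
    )
    return {key for key, n in counts.items() if n >= min_count}


def find_watermarks(records, library):
    """Return (record_index, position, field) for every unmatched field."""
    return [
        (i, pos, field)
        for i, record in enumerate(records)
        for pos, field in enumerate(record.split())
        if (pos, field) not in library
    ]


# Hypothetical input: three records sharing one grammatical structure,
# the last of which carries an altered field.
records = ["user login ok", "user login ok", "user log1n ok"]
library = build_semantic_library(records)
print(find_watermarks(records, library))  # [(2, 1, 'log1n')]
```

Under these assumptions, the altered field is the only one that fails to match the library, so it is the only one marked as a data watermark.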

2. The data watermark identification and analysis method according to claim 1, wherein the data content to be identified is acquired as a total set of input data having the same grammatical structure.

3. The data watermark identification and analysis method according to claim 1 or 2, wherein the at least one semantic segment is generated by classifying according to the same fields and the positions of the same fields in the data content to be identified.

4. The data watermark identification and analysis method according to claim 3, wherein classifying according to the same fields and the positions of the same fields in the data content to be identified to generate at least one semantic segment comprises:

determining fields whose number of repetitions in the data content to be identified is greater than or equal to a first preset value to be same fields, and counting the same fields and the field positions of the same fields;

deleting the same fields whose number is less than a second preset value, and retaining the same fields whose number is greater than or equal to the second preset value; and

sorting the retained same fields by sequence position, according to the sequence position of each single piece of data in the data content to be identified, to generate at least one semantic segment.
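The three steps of claim 4 can be sketched as follows. This is one possible reading, not the definitive one: it assumes that fields are whitespace-separated tokens, that the first preset value applies to a field's total number of occurrences, and that the second preset value applies to the number of occurrences of a field at a given position. The function name `generate_semantic_segment` and the sample records are hypothetical.

```python
from collections import Counter


def generate_semantic_segment(records, first_preset=2, second_preset=3):
    tokenized = [record.split() for record in records]

    # Step 1: fields repeated at least `first_preset` times are "same fields".
    totals = Counter(field for fields in tokenized for field in fields)
    same_fields = {field for field, n in totals.items() if n >= first_preset}

    # Count each same field together with the position where it occurs.
    by_position = Counter(
        (pos, field)
        for fields in tokenized
        for pos, field in enumerate(fields)
        if field in same_fields
    )

    # Step 2: delete entries counted fewer than `second_preset` times.
    retained = [key for key, n in by_position.items() if n >= second_preset]

    # Step 3: sort the retained fields by sequence position.
    return sorted(retained)


records = [
    "id 7 name alice",
    "id 8 name bob",
    "id 9 name alice",
    "name x id y",
]
print(generate_semantic_segment(records))  # [(0, 'id'), (2, 'name')]
```

In this example `alice` passes the first threshold (two occurrences) but is deleted by the second, and the out-of-place `name` and `id` in the last record are dropped because each position is counted separately.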

5. The data watermark identification and analysis method according to claim 4, wherein, after the step of matching the single piece of data in the data content to be identified with the semantic segments in the semantic library and marking data that cannot be matched with the semantic segments in the semantic library as a data watermark, the method further comprises: returning the content and position of the data marked as a data watermark.

6. The data watermark identification and analysis method according to claim 1, wherein the data content to be identified comprises at least one of text and character strings.

7. A data watermark identification and analysis system, characterized by comprising:

a data acquisition module, configured to acquire data content to be identified, the data content to be identified including at least one single piece of data;

a semantic segment generation module, configured to classify the data content to be identified and generate at least one semantic segment;

a lexicon segment generation module, configured to generate a semantic library from the different semantic segments; and

a data watermark identification module, configured to match each single piece of data in the data content to be identified against the semantic segments in the semantic library, and to mark data that cannot be matched with any semantic segment in the semantic library as a data watermark.

8. The data watermark identification and analysis system according to claim 7, characterized by further comprising:

a data watermark content and position acquisition module, configured to return the content and position of the data marked as a data watermark.

9. A terminal, comprising: at least one processor, and a memory communicatively connected to the at least one processor, wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to cause the at least one processor to perform the data watermark identification and analysis method according to any one of claims 1-6.

10. A computer-readable storage medium, wherein the computer-readable storage medium stores computer instructions, and the computer instructions are used to cause a computer to perform the data watermark identification and analysis method according to any one of claims 1-6.