CN103116620B

CN103116620B - Policy-based unstructured data security filtering method

Info

Publication number: CN103116620B
Application number: CN201310034326.1A
Authority: CN
Inventors: 汪晨; 林为民; 张涛; 邓松; 马缓缓; 时坚; 李伟伟; 周诚; 管小娟
Original assignee: State Grid Corp of China SGCC; China Electric Power Research Institute Co Ltd CEPRI; State Grid Jiangsu Electric Power Co Ltd; Information and Telecommunication Branch of State Grid Jiangsu Electric Power Co Ltd
Current assignee: State Grid Corp of China SGCC; China Electric Power Research Institute Co Ltd CEPRI; State Grid Jiangsu Electric Power Co Ltd; Global Energy Interconnection Research Institute; Information and Telecommunication Branch of State Grid Jiangsu Electric Power Co Ltd
Priority date: 2013-01-29
Filing date: 2013-01-29
Publication date: 2016-01-20
Anticipated expiration: 2033-01-29
Also published as: CN103116620A

Abstract

The invention provides a policy-based method for the security filtering of unstructured data that combines technical and administrative measures. Technically, policy rules are specified with a flexibly designed policy rule expression language. Administratively, administrators and business staff analyze the requirements of the business system and jointly design suitable policy rules, and the operators who transmit unstructured data add attribute tag information to the data. During filtering, the policy rule expression and the tag information are used as the parameters of policy rule matching, and the unstructured data is filtered according to the matching result.

Description

Policy-based Unstructured Data Security Filtering Method

Technical field

The invention belongs to the field of information security. It relates in particular to a policy-based method for the security filtering of unstructured data.

Background art

As society becomes more thoroughly digitized, many large enterprises and government departments have moved their office operations onto information systems. For security reasons, however, these organizations originally split their networks into an internal network and an external network. The result was isolated information islands that cannot communicate with each other. To serve users and the public better, enterprises and government departments have begun connecting their internal networks to the Internet to a limited extent. This raises the problem of securing data in transit.

Data in networked information systems takes two main forms: structured and unstructured. Structured data is usually stored in databases as two-dimensional tables. Because it is easy to extract, it is also relatively easy to filter during exchange. Unstructured data, such as Word and PDF documents, images and video, has no fixed form: its formats vary and its content is hard to extract. Filtering such data reliably at network exchange devices, let alone applying a unified security filtering analysis to it, is a major challenge for current filtering equipment.

Patent application No. CN201110316665.X, entitled "Tag-based unstructured data security filtering method", provides a method for the security filtering of unstructured data. Although that method can filter unstructured data, its policy rules are relatively simple. They cannot match the tag values of unstructured data against policies in a flexible way.

Summary of the invention

To overcome these shortcomings, the present invention provides a policy-based unstructured data security filtering method. On the server side, policy expressions containing variables are configured and a policy matching algorithm is designed. On the client side, tag values are added to the unstructured data. When unstructured data carrying tag values arrives at the server from the client, the policy matching algorithm matches the tag values against the policy expressions. This filters the data in a highly flexible way.

To achieve the above object, the present invention provides a policy-based unstructured data security filtering method that filters data as it is transmitted between the server and the client. The improvement is that the method comprises the following steps:

(1). Design policy rule expressions;

(2). Design concrete policy rules based on the policy rule expressions, taking into account the attributes of the unstructured data and the business requirements;

(3). According to the policy rules, record the attributes of the unstructured data as tag information and transmit the tag information together with the unstructured data;

(4). Parse the policy rules, converting each rule from string form into a tree data structure for the policy matching computation;

(5). Parse the tag information and store it in a hash table;

(6). Use the policy rule data structure and the tag information in the hash table as the parameters of policy rule matching, and compute the matching result. If the match succeeds, the data is allowed to pass. Otherwise the data is blocked and a log entry is recorded.
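A minimal Python sketch of the allow/deny decision in step (6) follows. It assumes the parsed policy from step (4) is available as a callable that evaluates to true or false over the tag hash table built in step (5). The names `filter_document` and `match` are illustrative assumptions, not details given in the patent. So is the choice to block a document when evaluation itself fails (fail closed).

```python
import logging
from typing import Any, Callable, Mapping

logger = logging.getLogger("policy_filter")

def filter_document(doc_id: str,
                    tags: Mapping[str, Any],
                    match: Callable[[Mapping[str, Any]], Any]) -> bool:
    """Allow the document if the policy matches; otherwise block it and log.

    `match` stands in for the parsed policy (hypothetical interface) and
    `tags` for the hash table built from the document's tag information.
    """
    try:
        allowed = bool(match(tags))
    except Exception as exc:
        # Assumption: a missing tag or a malformed policy counts as "no match".
        logger.warning("document %s blocked: policy evaluation failed (%s)", doc_id, exc)
        return False
    if not allowed:
        logger.warning("document %s blocked: policy not matched", doc_id)
    return allowed

# Usage with a hand-written stand-in for the rule doc_type == "PDF".
def pdf_only(tags: Mapping[str, Any]) -> bool:
    return tags["doc_type"] == "PDF"

print(filter_document("d1", {"doc_type": "PDF"}, pdf_only))  # True
```

Keeping the decision separate from the evaluator reflects the description's statement that the matching algorithm is independent of the rules and the tags. Any evaluator with this shape can be plugged in.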

In a preferred technical solution provided by the present invention, in step (1), a policy rule is a text-string expression stored in a policy configuration file and is composed of user-defined expressions. The server side designs the policy rule expressions according to the business requirements and the attributes of the unstructured data documents, in order to filter unstructured data.

In a second preferred technical solution provided by the present invention, an expression comprises variables, values and operators. The values of the variables are extracted from the data tag information during policy matching.

In a third preferred technical solution provided by the present invention, in step (3), the client tags the document according to the attributes of the document data, the business requirements, and the conditions agreed upon by both parties. The document tag is a list of key-value pairs: each key is a variable in the policy expression, and the value is that variable's value.

In a fourth preferred technical solution provided by the present invention, the document tags include the size, type and file name of the document.

In a fifth preferred technical solution provided by the present invention, in step (4), the policy rules are parsed into a data structure suited to policy matching, to facilitate the matching computation. Lexical and syntax analysis is performed on the rule text. If the rule is correct, a tree-shaped policy data structure is generated; otherwise an error is reported.

In a sixth preferred technical solution provided by the present invention, step (5) comprises the following steps: (5-1). Extract the document tag L from the document D;

(5-2). Build a hash table H to store the tag information;

(5-3). Read the tag information item by item and insert each item into H as a key-value pair <key, value>.

In a seventh preferred technical solution provided by the present invention, step (6) comprises the following steps:

(6-1). Construct a first-in, first-out queue Queue;

(6-2). Traverse the tree-structured expression Exp starting from the root;

(6-3). Check the elements in Queue. If it contains exactly one element, return it as the result of evaluating Exp; otherwise return an error.

In an eighth preferred technical solution provided by the present invention, step (6-2) comprises the following steps:

a. If the left subtree is not empty, traverse the left subtree;

b. If the right subtree is not empty, traverse the right subtree;

c. If Exp is a value, add Exp to Queue;

d. If Exp is a variable var, use the Get function to look up the variable's value in the tag hash table H, construct a new Exp from that value, and add the new Exp to Queue;

e. If Exp is an operator, determine the number N of operands the operator takes and take N operands from Queue. Evaluate the expression, store the result as a value in a new Exp, and add the new Exp to Queue.

In a ninth preferred technical solution provided by the present invention, the Get function is Get(key, H) -> value;

Given a key, the Get function returns the corresponding value from the hash table H. Here the variable var is used as the key into H.

Compared with the prior art, the policy-based unstructured data security filtering method provided by the present invention aims to solve the problem of securely transmitting unstructured data between networks of different security levels. Current content filtering techniques offer no good way to secure the transmission of unstructured data. The policy-based technique instead configures policy rule expressions, designs an efficient policy matching algorithm, and tags the attributes of the documents being exchanged. A document may pass through the gateway server (the interface between the different networks). At that point the policy matching algorithm matches the document's tag attributes against the policy rule expressions and filters the document accordingly. This ensures the security of unstructured data in transit.

Description of drawings

Figure 1 is a schematic diagram of the components involved in the method.

Figure 2 is a schematic flowchart of the policy-based unstructured data security filtering method.

Figure 3 is a schematic diagram of the tree structure.

Detailed description

As shown in Figure 2, a policy-based unstructured data security filtering method filters data as it is transmitted between the server and the client. It comprises the following steps:

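(1). Design policy rule expressions;

(2). Design concrete policy rules based on the policy rule expressions, taking into account the attributes of the unstructured data and the business requirements;

(3). According to the policy rules, record the attributes of the unstructured data as tag information and transmit the tag information together with the unstructured data;

(4). Parse the policy rules, converting each rule from string form into a tree data structure for the policy matching computation;

(5). Parse the tag information and store it in a hash table;

(6). Use the policy rule data structure and the tag information in the hash table as the parameters of policy rule matching, and compute the matching result. If the match succeeds, the data is allowed to pass. Otherwise the data is blocked and a log entry is recorded.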

In step (1), a policy rule is a text-string expression stored in the policy configuration file and is composed of user-defined expressions. The server side designs the policy rule expressions according to the business requirements and the attributes of the unstructured data documents, in order to filter unstructured data.

An expression comprises variables, values and operators. The values of the variables are extracted from the data tag information during policy matching.

In step (3), the client tags the document according to the attributes of the document data, the business requirements, and the conditions agreed upon by both parties. The document tag is a list of key-value pairs: each key is a variable in the policy expression, and the value is that variable's value.

The document tags include the size, type and file name of the document.

In step (4), the policy rules are parsed into a data structure suited to policy matching, to facilitate the matching computation. Lexical and syntax analysis is performed on the rule text. If the rule is correct, a tree-shaped policy data structure is generated; otherwise an error is reported.

Step (5) comprises the following steps: (5-1). Extract the document tag L from the document D;

(5-2). Build a hash table H to store the tag information;

(5-3). Read the tag information item by item and insert each item into H as a key-value pair <key, value>.

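Step (6) comprises the following steps:

(6-1). Construct a first-in, first-out queue Queue;

(6-2). Traverse the tree-structured expression Exp starting from the root;

(6-3). Check the elements in Queue. If it contains exactly one element, return it as the result of evaluating Exp; otherwise return an error.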

Step (6-2) comprises the following steps:

a. If the left subtree is not empty, traverse the left subtree;

b. If the right subtree is not empty, traverse the right subtree;

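c. If Exp is a value, add Exp to Queue;

d. If Exp is a variable var, use the Get function to look up the variable's value in the tag hash table H, construct a new Exp from that value, and add the new Exp to Queue;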

e. If Exp is an operator, determine the number N of operands the operator takes and take N operands from Queue. Evaluate the expression, store the result as a value in a new Exp, and add the new Exp to Queue.

The Get function is Get(key, H) -> value. Given a key, it returns the corresponding value from the hash table H, with the variable var used as the key.

The policy-based unstructured data security filtering method is explained further by the following embodiment.

Figure 1 shows the reference architecture of the policy-based unstructured data security filtering method. It consists of three main parts: policy rule information, data tag information, and the matching algorithm.

Policy rule information comprises the policy rules and rule parsing. A policy rule is an expression text string stored in the policy configuration file and serves as the basis for filtering. Rule parsing converts the rule text into a representation (data structure) suited to matching.

Tag information is the attribute description that the client attaches to a document, including document-related information and user operation information. Different documents carry different tag information.

The matching algorithm evaluates the policy rule expression against the tag information of the data sent by the client, and the matching result becomes the basis for filtering the document. Policy rules are designed and configured on the server side, and attribute descriptions are added to documents on the client side. The method derives a policy result from the relationship between the data attributes and the policy rule expressions and uses it as the filtering credential. Because policy rules are mathematical expressions with variables, the computation is very simple. Because the rules are flexible, the filtering method is also highly extensible.

Each part is described in detail below:

Policy rules: A policy rule is a text-string expression stored in the policy configuration file. It consists of one or more user-defined expressions, each made up of variables, values and operators. The values of the variables are extracted from the data tag information during policy matching. Because policy rules are expressions, they can be designed very flexibly. In practice, the server administrator or business staff design the concrete policy rule expressions according to the business requirements and the attributes of the unstructured data documents, in order to filter unstructured data.

Parsing policy rules: Once the policy rules have been configured, they must be parsed into a data structure suited to policy matching so that matching can be computed efficiently. Parsing performs lexical analysis and syntax analysis on the rule text. If a policy rule is correct, a policy data structure is generated; otherwise an error is reported.
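The lexical-analysis half of this step might look like the following Python sketch. It splits a rule string into tokens using the element classes of specification P described below: variables, integer, floating-point and string values, and operators. Several details are assumptions, since the patent gives no concrete lexical syntax: the token names, treating `substr` as a reserved word, and requiring string values to be double-quoted.

```python
import re
from typing import List, Tuple

# One named group per token class (hypothetical lexical syntax).
TOKEN_SPEC = [
    ("WS",     r"\s+"),
    ("FLOAT",  r"\d+\.\d+"),
    ("INT",    r"\d+"),
    ("STRING", r'"[^"]*"'),
    ("IDENT",  r"[A-Za-z_][A-Za-z0-9_]*"),   # letters, digits, underscore; no leading digit
    ("OP",     r"&&|\|\||==|!=|>=|<=|>|<|!"),
    ("PAREN",  r"[()]"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(rule: str) -> List[Tuple[str, str]]:
    """Return (kind, text) pairs; raise SyntaxError on an unknown character."""
    tokens: List[Tuple[str, str]] = []
    pos = 0
    while pos < len(rule):
        m = MASTER.match(rule, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {rule[pos]!r} at position {pos}")
        kind, text = m.lastgroup, m.group()
        pos = m.end()
        if kind == "WS":
            continue                      # whitespace separates tokens only
        if kind == "IDENT" and text == "substr":
            kind = "OP"                   # substr is an operator keyword, not a variable
        tokens.append((kind, text))
    return tokens

print(tokenize('doc_type == "PDF"'))
# [('IDENT', 'doc_type'), ('OP', '=='), ('STRING', '"PDF"')]
```

Syntax analysis, which turns these tokens into the Exp tree, is a separate pass. A lone `=` is rejected here, because specification P only defines `==`.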

Tag information: The client (the data-sending end) tags the document according to the attributes of the document data, the business requirements, and the conditions agreed upon by both parties. Examples are the document's size, type and file name. Document tags are flexible and variable. A tag is a list of key-value pairs in which each key is a variable in the policy expression and the value is that variable's value.

Matching: The matching algorithm takes the rule data structure and the tag information as parameters. It filters the document by traversing and evaluating the policy rule expression, replacing each variable in the expression with the value of the corresponding variable in the tag information. The matching algorithm is independent of the policy rules and the tag information and is not affected by changes to either.

1. Policy rule expression design

The core task of the policy-based unstructured data security filtering method is to design policy rules that express the filtering conditions. This patent proposes a syntax specification P for policy rules, on the basis of which a wide variety of policy rules can be designed flexibly. Specification P is built from three basic elements, namely variables, values and operators, combined according to the following rules:

Rule 1: A value value and a variable var are expressions exp.

Rule 2: Applying a unary operator opu to an expression forms a new expression.

Rule 3: Applying a binary operator opb to two expressions forms a new expression.

An expression in its final form constitutes a policy. A variable consists of letters, digits and underscores and does not start with a digit. A value is an integer (a mathematical integer), a floating-point number (a mathematical real number), or a string. The operators are:

- the relational operators >, >=, <, <=, == and !=;
- the logical operators &&, || and !;
- the substring operator substr, which is true when the left operand is a substring of the right operand.
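The operator set above can be captured as a table. It maps each operator to the number of operands N that the matching step needs, and to the operator's meaning. This Python sketch makes two assumptions that the patent does not fix: the relational operators follow Python's comparison semantics, and `substr` is false when either operand is not a string. The names `OPERATORS` and `apply_op` are hypothetical.

```python
import operator
from typing import Any, Callable, Dict, Tuple

def _substr(left: Any, right: Any) -> bool:
    """substr: true when the left operand is a substring of the right operand."""
    return isinstance(left, str) and isinstance(right, str) and left in right

# operator symbol -> (number of operands N, implementation)
OPERATORS: Dict[str, Tuple[int, Callable[..., Any]]] = {
    ">":  (2, operator.gt),
    ">=": (2, operator.ge),
    "<":  (2, operator.lt),
    "<=": (2, operator.le),
    "==": (2, operator.eq),
    "!=": (2, operator.ne),
    "&&": (2, lambda a, b: bool(a) and bool(b)),
    "||": (2, lambda a, b: bool(a) or bool(b)),
    "!":  (1, lambda a: not a),
    "substr": (2, _substr),
}

def apply_op(symbol: str, *operands: Any) -> Any:
    """Check the operand count, then apply the operator."""
    arity, fn = OPERATORS[symbol]
    if len(operands) != arity:
        raise ValueError(f"{symbol} expects {arity} operand(s), got {len(operands)}")
    return fn(*operands)

print(apply_op("substr", "scheme", "design scheme"))  # True
```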

Following Rules 1 to 3, specification P can be written formally as follows:

exp ::= value | var | opu exp | exp opb exp
opu ::= !
opb ::= > | >= | < | <= | == | != | && | || | substr

When configuring policy rules, the designer follows specification P and takes into account the business requirements and user operations. The designer selects suitable variables and the values they must satisfy, and relates variables and values through operators to form a policy rule. The rule is saved as a string in the policy rule file. The design steps are as follows:

a. Follow policy rule specification P;

b. Based on the attribute information of the document and the business requirements, convert the textual requirement into a policy rule expression exp.

For example, documents can be given an initial filter by type. If the business accepts only PDF documents, a variable doc_type can be defined and combined with the == operator, giving the policy expression

doc_type == "PDF"

When the document type is PDF, this expression evaluates to true and the document is allowed to pass; otherwise it is blocked.

As another example, suppose documents have five security classification levels, 1 to 5, where a larger number means a higher level. A business requirement forbids transmitting documents whose classification level is greater than 3. A variable doc_security_grade can be defined and combined with the operators > and !, giving the policy expression

!(doc_security_grade > 3)

When the document's classification level is greater than 3, the expression returns false and transmission is not allowed; otherwise transmission is allowed.
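For intuition, the two example rules can be written directly as Python predicates over a tag dictionary. This only illustrates what the rules mean. The method itself evaluates rules by parsing them and traversing the resulting tree. The function names are hypothetical.

```python
from typing import Any, Mapping

def pdf_only(tags: Mapping[str, Any]) -> bool:
    # doc_type == "PDF"
    return tags["doc_type"] == "PDF"

def grade_at_most_3(tags: Mapping[str, Any]) -> bool:
    # !(doc_security_grade > 3)
    return not (tags["doc_security_grade"] > 3)

# Which of the five classification levels may be transmitted?
allowed = [g for g in range(1, 6) if grade_at_most_3({"doc_security_grade": g})]
print(allowed)  # [1, 2, 3]
```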

2. Tag information

The policy-based unstructured data security filtering method tags documents on the client side (the document-sending end). A tag contains information that every document has, such as encryption, digital digest, authorization and owner. It also contains the values of the variables used in the server-side policy rules. The tagging steps are:

a. Construct a tag L;

b. Obtain the document's attribute information A and add it to the tag: L(A);

c. Obtain the user-operation attribute information O and add it to the tag: L(O);

d. Add the tag to the document data: D(L).
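Steps a to d can be sketched in Python as follows. The patent does not specify how the tag is serialized into the document data. This sketch therefore simply keeps the tag as a list of key-value pairs alongside the content. `Document` and `mark_document` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple

@dataclass
class Document:
    content: bytes
    # D(L): the tag list carried with the data (hypothetical representation)
    tags: List[Tuple[str, Any]] = field(default_factory=list)

def mark_document(doc: Document,
                  attributes: Dict[str, Any],
                  operation: Dict[str, Any]) -> Document:
    label: List[Tuple[str, Any]] = []   # a. construct tag L
    label.extend(attributes.items())    # b. add document attributes A: L(A)
    label.extend(operation.items())     # c. add user-operation attributes O: L(O)
    doc.tags = label                    # d. attach L to the document data: D(L)
    return doc

d = mark_document(Document(b"report"),
                  {"name": "design scheme", "sec_grade": 4},
                  {"src": "192.168.216.5"})
print(d.tags)
# [('name', 'design scheme'), ('sec_grade', 4), ('src', '192.168.216.5')]
```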

3. Policy rule parsing

In step 1 of the method, the designed policy rules exist as strings. To perform policy matching, a policy in string form must be represented in a data structure suited to the matching rules.

To support efficient policy matching, a policy expression is stored in a node structure Exp. An Exp can hold one of two things:

- a value expression, stored in its data field;
- an expression with an operator, where the operator is stored in operator, lchild holds the left operand and rchild holds the right operand.

By nesting Exp nodes recursively, a tree of Exp nodes can hold a complete policy.
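The Exp node can be sketched as a Python dataclass. The field names `data`, `operator`, `lchild` and `rchild` come from the description. The `is_var` flag is a hypothetical addition: the patent describes both value and variable nodes only as having a non-empty data field, so something must distinguish the two.

```python
from dataclasses import dataclass
from typing import Any, Optional

@dataclass
class Exp:
    data: Any = None                 # literal value, or the variable name for a var node
    is_var: bool = False             # hypothetical flag: True when data names a variable
    operator: Optional[str] = None   # set only on operator nodes
    lchild: Optional["Exp"] = None   # left operand (binary operators only)
    rchild: Optional["Exp"] = None   # right operand; a unary operator's operand goes here

# !(doc_security_grade > 3) as an Exp tree
rule = Exp(operator="!",
           rchild=Exp(operator=">",
                      lchild=Exp("doc_security_grade", is_var=True),
                      rchild=Exp(3)))
```

Putting a unary operator's operand in `rchild` matches the parsing rule below for expressions of the form op_u exp.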

Policy rule parsing converts a policy rule exp in string form into a policy built from Exp structures, to facilitate policy matching. The parsing rules are as follows:

a. Process exp recursively as follows:

a-1. When exp is a value value or a variable var, construct an Exp, store the value (or the variable) in its data field, and leave the other fields empty;

a-2. When exp has the form op_u exp, construct an Exp. Assign op_u to its operator field and the exp following op_u to its rchild, and leave the other fields empty;

a-3. When exp has the form exp op_b exp, construct an Exp. Assign op_b to its operator field, the exp following op_b to its rchild, and the exp preceding op_b to its lchild, and leave the data field empty;

b. Return the root of the tree built from Exp nodes. This root is a complete policy data structure.
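A minimal recursive-descent parser implementing these rules might look like the Python sketch below. It reports an error for malformed rules, as the parsing step requires. The patent states no operator precedence. This sketch assumes `||` binds loosest, then `&&`, then the relational operators and `substr`, then unary `!`, with parentheses for grouping. The `is_var` flag and all class and function names are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Any, List, Optional, Tuple

@dataclass
class Exp:
    data: Any = None
    is_var: bool = False
    operator: Optional[str] = None
    lchild: Optional["Exp"] = None
    rchild: Optional["Exp"] = None

# Groups: float, int, quoted string, identifier, operator or parenthesis.
_TOKEN = re.compile(r'(\d+\.\d+)|(\d+)|"([^"]*)"|([A-Za-z_][A-Za-z0-9_]*)'
                    r'|(&&|\|\||==|!=|>=|<=|>|<|!|\(|\))')

def tokenize(text: str) -> List[Tuple[str, Any]]:
    tokens: List[Tuple[str, Any]] = []
    pos = 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        m = _TOKEN.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at position {pos}")
        pos = m.end()
        flt, num, s, ident, op = m.groups()
        if flt is not None:
            tokens.append(("VAL", float(flt)))
        elif num is not None:
            tokens.append(("VAL", int(num)))
        elif s is not None:
            tokens.append(("VAL", s))
        elif ident == "substr":
            tokens.append(("OP", "substr"))
        elif ident is not None:
            tokens.append(("VAR", ident))
        else:
            tokens.append(("OP", op))
    return tokens

class Parser:
    REL = {">", ">=", "<", "<=", "==", "!=", "substr"}

    def __init__(self, text: str) -> None:
        self.toks = tokenize(text)
        self.i = 0

    def peek(self) -> Tuple[Any, Any]:
        return self.toks[self.i] if self.i < len(self.toks) else (None, None)

    def take(self) -> Tuple[Any, Any]:
        tok = self.peek()
        if tok[0] is None:
            raise SyntaxError("unexpected end of rule")
        self.i += 1
        return tok

    def parse(self) -> Exp:
        root = self.or_expr()
        if self.i != len(self.toks):
            raise SyntaxError(f"unexpected token {self.peek()[1]!r}")
        return root                                   # root of the policy tree

    def binary(self, sub, ops) -> Exp:
        left = sub()
        while self.peek()[0] == "OP" and self.peek()[1] in ops:
            op = self.take()[1]
            left = Exp(operator=op, lchild=left, rchild=sub())   # exp op_b exp
        return left

    def or_expr(self) -> Exp:
        return self.binary(self.and_expr, {"||"})

    def and_expr(self) -> Exp:
        return self.binary(self.rel_expr, {"&&"})

    def rel_expr(self) -> Exp:
        return self.binary(self.unary, self.REL)

    def unary(self) -> Exp:
        kind, val = self.take()
        if kind == "OP" and val == "!":
            return Exp(operator="!", rchild=self.unary())       # op_u exp
        if kind == "OP" and val == "(":
            inner = self.or_expr()
            if self.take() != ("OP", ")"):
                raise SyntaxError("expected ')'")
            return inner
        if kind in ("VAL", "VAR"):
            return Exp(data=val, is_var=(kind == "VAR"))        # value or variable leaf
        raise SyntaxError(f"unexpected token {val!r}")

root = Parser('(src == "192.168.216.5") && (!("scheme" substr name)) && (sec_grade <= 3)').parse()
print(root.operator, root.lchild.operator, root.rchild.operator)  # && && <=
```

With left-to-right grouping of `&&`, the example policy from the embodiment parses to ((src == ...) && !(...)) && (sec_grade <= 3).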

4. Tag information extraction

After the server receives the document data, it parses the data to obtain the tag information. It then converts the tag information into a data structure suited to the matching computation. The steps are as follows:

a. Extract the document tag L from the document D;

b. Build a hash table (HashMap) H to store the tag information;

c. Read the tag information item by item and insert each item into H as a key-value pair <key, value>.
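Steps b and c, together with the Get(key, H) -> value lookup used later during matching, can be sketched in Python as follows. A Python `dict` is a hash table. The sketch assumes the tag L has already been extracted (step a) as a sequence of key-value pairs. It also makes two choices that are assumptions, not part of the patent: rejecting duplicate keys, and reporting a missing variable as an error.

```python
from typing import Any, Dict, Iterable, Tuple

def build_tag_table(tag_list: Iterable[Tuple[str, Any]]) -> Dict[str, Any]:
    """Copy the tag L into a hash table H, one <key, value> pair per tag item."""
    table: Dict[str, Any] = {}           # b. hash table H
    for key, value in tag_list:          # c. item by item
        if key in table:
            raise ValueError(f"duplicate tag key {key!r}")   # assumed policy
        table[key] = value
    return table

def get(key: str, table: Dict[str, Any]) -> Any:
    """Get(key, H) -> value: the policy variable's name is the key."""
    if key not in table:
        raise KeyError(f"tag for policy variable {key!r} is missing")
    return table[key]

H = build_tag_table([("src", "192.168.216.5"), ("sec_grade", 1)])
print(get("sec_grade", H))  # 1
```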

5. Policy rule matching

After parsing, a policy rule is stored as an expression in a tree structure. During matching, the tree is traversed in post-order and the expression is evaluated during the traversal. This finally yields a result value that determines whether the document is allowed to pass. The policy matching steps are as follows:

a. Construct a first-in, first-out queue Queue;

b. Traverse the tree-structured expression Exp in post-order, starting from the root:

b1. If the left subtree (the lchild of Exp) is not empty, traverse it;

b2. If the right subtree (the rchild of Exp) is not empty, traverse it;

b3. If Exp is a value (data is not empty), add Exp to Queue;

b4. If Exp is a variable var (data is not empty), use the Get function to look up the variable's value in the tag hash table (HashMap) H, construct a new Exp from that value, and add the new Exp to Queue.

The Get function is

Get(key, H) -> value

b5. If the operator field of Exp is not empty, the node is an operator. Determine the number N of operands the operator takes and take N operands from Queue. Evaluate the expression, store the result in a new Exp, and add the new Exp to Queue.

c. Check the elements in Queue. If it contains exactly one element, return it as the result of evaluating Exp. Otherwise the expression is malformed and an error is returned.
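The traversal above can be sketched in Python. One caution applies when implementing it. A post-order evaluation must hand each operator the results most recently produced by its own subtrees, which is last-in, first-out behaviour. Suppose operands were taken from the front of a strictly first-in, first-out queue. For a rule such as `(x == 1) && (y == 2)`, the second `==` would then receive the result of `x == 1` and the variable `y` as its operands. This sketch therefore uses a Python list as a stack.

The `Exp` fields follow the description in section 3, plus a hypothetical `is_var` flag that tells variables from values. The operator semantics are assumptions.

```python
import operator
from dataclasses import dataclass
from typing import Any, Dict, List, Optional

@dataclass
class Exp:
    data: Any = None
    is_var: bool = False             # hypothetical: data holds a variable name
    operator: Optional[str] = None
    lchild: Optional["Exp"] = None
    rchild: Optional["Exp"] = None

# operator -> (number of operands N, implementation); semantics are assumed
OPS = {
    ">": (2, operator.gt), ">=": (2, operator.ge), "<": (2, operator.lt),
    "<=": (2, operator.le), "==": (2, operator.eq), "!=": (2, operator.ne),
    "&&": (2, lambda a, b: bool(a) and bool(b)),
    "||": (2, lambda a, b: bool(a) or bool(b)),
    "!": (1, lambda a: not a),
    "substr": (2, lambda a, b: isinstance(a, str) and isinstance(b, str) and a in b),
}

def evaluate(root: Exp, tags: Dict[str, Any]) -> Any:
    stack: List[Any] = []            # LIFO: newest results are the current operands

    def visit(node: Exp) -> None:    # post-order: left, right, then the node itself
        if node.lchild is not None:
            visit(node.lchild)
        if node.rchild is not None:
            visit(node.rchild)
        if node.operator is not None:
            arity, fn = OPS[node.operator]
            if len(stack) < arity:
                raise ValueError(f"operator {node.operator} is missing operands")
            args = stack[-arity:]    # keeps left-before-right order
            del stack[-arity:]
            stack.append(fn(*args))
        elif node.is_var:
            if node.data not in tags:
                raise KeyError(f"no tag for variable {node.data!r}")
            stack.append(tags[node.data])   # Get(var, H)
        else:
            stack.append(node.data)

    visit(root)
    if len(stack) != 1:              # exactly one result, as in step c
        raise ValueError("malformed policy expression")
    return stack[0]

v = lambda name: Exp(name, is_var=True)
c = lambda value: Exp(value)
rule = Exp(operator="&&",
           lchild=Exp(operator="==", lchild=v("x"), rchild=c(1)),
           rchild=Exp(operator="==", lchild=v("y"), rchild=c(2)))
print(evaluate(rule, {"x": 1, "y": 2}))  # True
```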

For ease of description, consider the following application example:

An enterprise has an internal network with a high security level and an external network with a low security level, and the external network is connected to the Internet. To meet internal business needs and serve customers better, the internal network must be connected to the external network.

A policy-based unstructured data security filter is deployed at the gateway connecting the two networks. The document-sending ends of internal and external users are equipped with a tool for adding tag information to documents. Before a document is transmitted, tag information is added for the policy variables according to a pre-negotiated policy. The document then crosses the boundary between the internal and external networks, that is, it passes through the filtering server. The server extracts the tag information and evaluates the policy. If the policy is not matched, the document is blocked; otherwise it is allowed to pass.

In the scenario below, a user transfers unstructured data documents from the high-security internal network to the low-security external network. The implementation is as follows:

I. Policy rules:

The server administrator sets the policy rules according to the business requirements. The client must fill in the values of the policy variables when transmitting documents, so that filtering can take place. Assume the business requirements are as follows:

1. The document must originate from the intranet business system with IP address 192.168.216.5;

2. The document name must not contain "scheme";

3. The document's classification level must not exceed 3 (assuming five levels, 1 to 5, where a higher level means a higher security level).

Following policy rule specification P, the variable src denotes the document source, name the document name, and sec_grade the document's classification level. The policy rule is:

(src == "192.168.216.5") && (!("scheme" substr name)) && (sec_grade <= 3)

II. Tag information:

During negotiation, the client that transmits documents learns that it must tag each document with its source, name and classification level. Suppose a client transmits three documents:

The tag values of document 1 are:

{src：192.168.216.6，name：“设计方案”，sec_grade＝4}{src: 192.168.216.6, name: "Design", sec_grade=4}

文档2的标记值如下：Document 2 has tagged values as follows:

{src：192.168.216.5，name：“设计方案”，sec_grade＝4}{src: 192.168.216.5, name: "Design", sec_grade=4}

文档3的标记值如下：Document 3 has tagged values as follows:

{src：192.168.216.5，name：“业务系统操作说明”，sec_grade＝1}{src: 192.168.216.5, name: "Business System Operation Instructions", sec_grade=1}

III. Matching:

1. Before matching, the policy rule must be parsed: the rule, stored as a string, is converted into a tree structure. The policy rule defined in section I (Policy rules) then becomes the tree shown in Figure 3.

2. The tree is then traversed in post-order. Whenever a variable is encountered, its value is extracted from the tag information and used in the matching computation, and the result determines whether the document is filtered.

Post-order traversal evaluates the left subtree first, then the right subtree, and only then the node that joins them; this repeats level by level up to the root, whose value is the result of the whole rule.
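Since Figure 3 is not reproduced here, the following sketch encodes the rule tree as nested Python tuples and lists its nodes in post-order. The tuple layout (`("op", symbol, *children)`, `("var", name)`, `("val", literal)`) is a hypothetical encoding, and the grouping A && (B && C) is inferred from the order of steps 2.1 to 2.6 rather than read off the figure.

```python
# Hypothetical tuple encoding of the example rule tree:
#   ("op", symbol, child, ...)  operator node
#   ("var", name)               policy variable, filled in from the tags later
#   ("val", literal)            constant
RULE_TREE = (
    "op", "&&",
    ("op", "=", ("var", "src"), ("val", "192.168.216.5")),
    ("op", "&&",
        ("op", "!", ("op", "substr", ("val", "plan"), ("var", "name"))),
        ("op", "<=", ("var", "sec_grade"), ("val", 3))),
)


def post_order(node):
    """Yield every node after all of its children (left before right)."""
    if node[0] == "op":
        for child in node[2:]:
            yield from post_order(child)
    yield node


def label(node):
    """Printable label: operator symbol, variable name, or quoted constant."""
    kind, payload = node[0], node[1]
    return repr(payload) if kind == "val" else payload


if __name__ == "__main__":
    print(" ".join(label(n) for n in post_order(RULE_TREE)))
```

This prints `src '192.168.216.5' = 'plan' name substr ! sec_grade 3 <= && &&`. Reading only the operator nodes gives `=`, `substr`, `!`, `<=`, `&&`, `&&`, the same order as matching calculations 2.1 to 2.6.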

Following this post-order traversal, document 1 is evaluated as follows:

2.1 Matching calculation

src=192.168.216.5

Since the src value extracted from the tag is 192.168.216.6, the result is false.

2.2 Matching calculation

"plan" substr name

Since the name value extracted from the tag is "design plan", which contains "plan", the result is true.

2.3 Matching calculation

!("plan" substr name)

Since step 2.2 returned true, its negation is false.

2.4 Matching calculation

sec_grade<=3

Since sec_grade is 4, the result is false.

2.5 Matching calculation

(!("plan" substr name))&&(sec_grade<=3)

From the results of 2.3 and 2.4 (false && false), this result is false.

2.6 Matching calculation

(src=192.168.216.5)&&((!("plan" substr name))&&(sec_grade<=3))

From the results of 2.1 and 2.5 (false && false), the final matching result is false, so this document is filtered out, that is, it is not allowed to pass.

Applying the same steps to documents 2 and 3 gives false for document 2 (its source matches, but its name contains "plan" and its confidentiality level 4 exceeds 3) and true for document 3 (all three conditions hold). Document 3 is therefore allowed to pass, while document 2 is not.
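The worked example can be reproduced with a short recursive evaluator. In this sketch the rule tree is written out by hand as nested tuples (a hypothetical layout, grouped A && (B && C) to follow steps 2.1 to 2.6), the operator meanings are assumed from the worked example, and every operator result is recorded so that the trace can be compared with the steps above.

```python
# Hypothetical tuple encoding of the rule tree, grouped as A && (B && C).
RULE_TREE = (
    "op", "&&",
    ("op", "=", ("var", "src"), ("val", "192.168.216.5")),
    ("op", "&&",
        ("op", "!", ("op", "substr", ("val", "plan"), ("var", "name"))),
        ("op", "<=", ("var", "sec_grade"), ("val", 3))),
)

# Operator semantics assumed from the worked example; "substr" is true when
# the left string occurs inside the right string.
OPS = {
    "=": lambda a, b: a == b,
    "<=": lambda a, b: a <= b,
    "substr": lambda a, b: a in b,
    "!": lambda a: not a,
    "&&": lambda a, b: a and b,
}


def evaluate(node, tags, trace):
    """Post-order evaluation: children first, then the operator itself.
    Each operator result is appended to `trace` as (symbol, value)."""
    kind = node[0]
    if kind == "val":
        return node[1]
    if kind == "var":
        return tags[node[1]]          # value taken from the document's tags
    args = [evaluate(child, tags, trace) for child in node[2:]]
    result = OPS[node[1]](*args)
    trace.append((node[1], result))
    return result


DOCUMENTS = {
    "document 1": {"src": "192.168.216.6", "name": "design plan", "sec_grade": 4},
    "document 2": {"src": "192.168.216.5", "name": "design plan", "sec_grade": 4},
    "document 3": {"src": "192.168.216.5",
                   "name": "business system operation instructions", "sec_grade": 1},
}

if __name__ == "__main__":
    for doc, tags in DOCUMENTS.items():
        steps = []
        verdict = evaluate(RULE_TREE, tags, steps)
        print(doc, "pass" if verdict else "blocked", steps)
```

Because every operand is computed before its operator, the evaluator does not short-circuit: document 1 still goes through all six steps even though step 2.1 is already false, and its recorded trace is `=` false, `substr` true, `!` false, `<=` false, `&&` false, `&&` false.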

It should be noted that the summary of the invention and the specific embodiments are intended to demonstrate the practical application of the technical solution provided by the invention and should not be construed as limiting its scope of protection. Those skilled in the art may make various modifications, equivalent replacements, or improvements inspired by the spirit and principles of the invention; all such changes or modifications fall within the scope of protection of the pending application.

Claims

1. A policy-based unstructured data security filtering method, in which data is filtered when a server and a client transmit data, characterized in that the method comprises the following steps:

(1). Design the policy rule expression format;

(2). Based on the attributes of the unstructured data and the business requirements, design specific policy rules using the policy rule expression format;

(3). Based on the policy rules, record the attributes of the unstructured data as tag information and transmit the tag information together with the data;

(4). Parse the policy rules, converting each rule from string form into a tree data structure used for policy matching calculation;

(5). Parse the tag information and store it in a hash table;

(6). Use the policy rule data structure and the tag information in the hash table as the parameters of policy rule matching, and compute the matching result. If the match succeeds, the data is allowed to pass; otherwise, the data is blocked and a log entry is recorded.
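Step (6) can be sketched as a gate that logs every decision. In this minimal sketch the parsed rule is abstracted as any Python callable over the tag dictionary, `filter_document` is a hypothetical name, and treating a missing tag variable as a failed match (fail closed) is an assumption, since the claim does not say how missing tags are handled.

```python
import logging
from typing import Callable, Mapping

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
log = logging.getLogger("policy_filter")


def filter_document(doc_id: str,
                    tags: Mapping[str, object],
                    policy: Callable[[Mapping[str, object]], bool]) -> bool:
    """Allow the document only if the policy matches its tags; every decision
    is logged. A tag set missing a policy variable is blocked (assumption)."""
    try:
        allowed = bool(policy(tags))
    except KeyError as missing:
        log.warning("%s blocked: tag information lacks variable %s", doc_id, missing)
        return False
    if allowed:
        log.info("%s allowed", doc_id)
    else:
        log.warning("%s blocked: policy rule not matched", doc_id)
    return allowed
```

For example, `filter_document("document 3", tags, rule)` returns the pass/block decision and leaves an audit line in the log either way.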

2. The method according to claim 1, characterized in that, in step (1), the policy rule is a text-string expression stored in a policy configuration file and composed of custom-defined expressions; the format of the policy rule expression is designed on the server side according to the business requirements and the attributes of the unstructured data documents, for the purpose of filtering unstructured data.

3. The method according to claim 2, wherein the expression comprises variables, values, and operators; the value of each variable is extracted from the data's tag information during policy matching.

4. The method according to claim 1, characterized in that, in step (3), the tag information is attached by the client according to the attributes of the document data, the business requirements, and the conditions agreed by both parties; the document tag is a list of key-value pairs, where each key is a variable in the policy expression and each value is the value of that variable.

5. The method according to claim 4, wherein the document tag includes the size, type, and file name of the document.

6. The method according to claim 1, characterized in that, in step (4), the policy rules are parsed into a data structure suited to policy matching, to simplify the matching calculation; during parsing, lexical analysis and syntax analysis are performed on the text policy rule, and if the rule is correct a tree-structured policy data structure is generated, otherwise an error is reported.
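The lexical and syntax analysis of claim 6 can be sketched as a regular-expression tokenizer plus a recursive-descent parser. The grammar is an assumption inferred from the worked example (operators `=`, `<=`, `substr`, `!`, `&&`, parentheses, double-quoted strings, integers, and bare identifiers as variables); the patent does not publish its grammar. `&&` is parsed right-associatively so that the example rule groups as A && (B && C), matching steps 2.5 and 2.6; since logical AND is associative, the result is the same either way. The class and function names are hypothetical.

```python
import re


class PolicySyntaxError(ValueError):
    """Raised when a policy rule fails lexical or syntax analysis."""


# Assumed token kinds: quoted string, integer, operator/parenthesis, identifier.
TOKEN_RE = re.compile(
    r'(?P<str>"[^"]*")|(?P<int>\d+)|(?P<op>&&|<=|=|!|\(|\))|(?P<word>[A-Za-z_]\w*)')


def tokenize(text):
    """Lexical analysis: turn the rule text into (kind, value) tokens."""
    tokens, pos = [], 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise PolicySyntaxError(f"unexpected character {text[pos]!r} at position {pos}")
        pos = m.end()
        if m.group("str") is not None:
            tokens.append(("val", m.group("str")[1:-1]))
        elif m.group("int") is not None:
            tokens.append(("val", int(m.group("int"))))
        elif m.group("op") is not None or m.group("word") == "substr":
            tokens.append(("op", m.group(0)))      # "substr" is a word operator
        else:
            tokens.append(("var", m.group("word")))
    return tokens


class Parser:
    """Recursive descent over the assumed grammar
        and_expr   := unary ("&&" and_expr)?
        unary      := "!" unary | comparison
        comparison := primary (("=" | "<=" | "substr") primary)?
        primary    := "(" and_expr ")" | value | variable
    """

    def __init__(self, tokens):
        self.tokens, self.pos = tokens, 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        token = self.peek()
        if token is None:
            raise PolicySyntaxError("unexpected end of rule")
        self.pos += 1
        return token

    def and_expr(self):
        left = self.unary()
        if self.peek() == ("op", "&&"):
            self.take()
            return ("op", "&&", left, self.and_expr())
        return left

    def unary(self):
        if self.peek() == ("op", "!"):
            self.take()
            return ("op", "!", self.unary())
        return self.comparison()

    def comparison(self):
        left = self.primary()
        if self.peek() in (("op", "="), ("op", "<="), ("op", "substr")):
            symbol = self.take()[1]
            return ("op", symbol, left, self.primary())
        return left

    def primary(self):
        token = self.take()
        if token == ("op", "("):
            node = self.and_expr()
            if self.take() != ("op", ")"):
                raise PolicySyntaxError("expected ')'")
            return node
        if token[0] in ("val", "var"):
            return token                            # leaf: ("val", x) or ("var", name)
        raise PolicySyntaxError(f"unexpected token {token[1]!r}")


def parse_rule(text):
    """Syntax analysis: return the rule tree, or raise PolicySyntaxError."""
    parser = Parser(tokenize(text))
    tree = parser.and_expr()
    if parser.peek() is not None:
        raise PolicySyntaxError(f"unexpected token {parser.peek()[1]!r}")
    return tree
```

Parsing the example rule yields `("op", "&&", <src test>, ("op", "&&", <name test>, <grade test>))`, and inputs such as an unbalanced parenthesis or a stray character raise `PolicySyntaxError`, which corresponds to the "error is reported" branch of the claim.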

7. The method according to claim 1, wherein step (5) comprises the following steps:

(5-1). Extract the document tag L from document D;

(5-2). Construct a hash table H to store the tag information;

(5-3). Read the tag information item by item and insert each item into H as a key-value pair <key, value>.
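Steps (5-1) to (5-3) amount to turning the tag attached to a document into a dictionary, Python's built-in hash table. The patent does not fix a wire format for tags, so this sketch assumes the brace notation used for documents 1 to 3 in the description, accepting either `:` or `=` between key and value since both appear there. `parse_tags` is a hypothetical name, and converting all-digit values to `int` is an assumption made so that comparisons such as `sec_grade<=3` work.

```python
import re

# One tag item: a key, then ":" or "=", then the raw value (assumed format).
ITEM_RE = re.compile(r'\s*(\w+)\s*[:=]\s*(.*?)\s*')


def _convert(raw):
    """Quoted text becomes a string, digits become an int, anything else
    (for example an IP address) is kept as a bare string."""
    if len(raw) >= 2 and raw[0] == raw[-1] == '"':
        return raw[1:-1]
    if raw.isdecimal():
        return int(raw)
    return raw


def parse_tags(tag_text):
    """(5-1) take the tag text L of a document, (5-2) build hash table H,
    (5-3) fill H item by item with <key, value> pairs."""
    body = tag_text.strip()
    if not (body.startswith("{") and body.endswith("}")):
        raise ValueError("tag information must be enclosed in braces")
    H = {}
    for item in body[1:-1].split(","):
        if not item.strip():
            continue                      # tolerate "{}" and a trailing comma
        m = ITEM_RE.fullmatch(item)
        if m is None:
            raise ValueError(f"malformed tag item: {item!r}")
        key, raw = m.groups()
        H[key] = _convert(raw)
    return H
```

Splitting on commas keeps the sketch short but means a quoted value containing a comma would be misread; a real tag format would need proper quoting rules.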

8. The method according to claim 1, wherein step (6) comprises the following steps:

(6-1). Construct a first-in first-out queue Queue;

(6-2). Traverse the tree-structured expression Exp starting from the root;

(6-3). Check the elements in Queue: if exactly one element remains, return it as the result of evaluating Exp; otherwise, report an error.

9. The method according to claim 8, wherein step (6-2) comprises the following steps:

a. If the left subtree is not empty, traverse the left subtree;

b. If the right subtree is not empty, traverse the right subtree;

c. If Exp is a value, put Exp into Queue;

d. If Exp is a variable var, use the Get function to extract the variable's value from the hash table H in which the tag is stored, construct a new Exp from that value, and put the new Exp into Queue;

e. If Exp is an operator, determine its number of operands N, take N operands from Queue, evaluate the expression, store the result as a value in a new Exp, and put the new Exp into Queue.
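Claims 8 to 10 describe evaluating the tree by post-order traversal with an auxiliary buffer. When step e takes its N operands back out, the values it needs are the most recently stored ones, so this sketch uses a Python list as a last-in first-out stack even though claim 8 names a first-in first-out queue. The `Node` layout, the operator table, and pushing plain values instead of wrapping them in new Exp nodes are simplifications of this sketch; `get` stands in for the Get function of claim 10.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Node:
    """Hypothetical tree node: kind is "val", "var" or "op"."""
    kind: str
    value: Any                                   # literal, variable name or operator symbol
    children: list = field(default_factory=list)


# Operator symbol -> (number of operands N, function); semantics assumed.
OPERATORS = {
    "=": (2, lambda a, b: a == b),
    "<=": (2, lambda a, b: a <= b),
    "substr": (2, lambda a, b: a in b),          # left string occurs in right string
    "!": (1, lambda a: not a),
    "&&": (2, lambda a, b: a and b),
}


def get(key, H):
    """Claim 10: Get(key, H) -> value, using the variable name as the key."""
    if key not in H:
        raise KeyError(f"no tag value for variable {key!r}")
    return H[key]


def evaluate(exp, H):
    # (6-1) Claim 8 names a FIFO queue; operands must come back out last-in
    # first-out, so a Python list is used as a stack here.
    stack = []

    def visit(node):                             # (6-2) post-order from the root
        for child in node.children:              # steps a and b: subtrees first
            visit(child)
        if node.kind == "val":                   # step c
            stack.append(node.value)
        elif node.kind == "var":                 # step d
            stack.append(get(node.value, H))
        else:                                    # step e
            n, fn = OPERATORS[node.value]
            if len(stack) < n:
                raise ValueError(f"operator {node.value!r} lacks operands")
            args = stack[-n:]
            del stack[-n:]
            stack.append(fn(*args))

    visit(exp)
    if len(stack) != 1:                          # (6-3) exactly one result must remain
        raise ValueError("malformed expression")
    return stack[0]


# The example rule from the description, built by hand.
RULE = Node("op", "&&", [
    Node("op", "=", [Node("var", "src"), Node("val", "192.168.216.5")]),
    Node("op", "&&", [
        Node("op", "!", [Node("op", "substr", [Node("val", "plan"), Node("var", "name")])]),
        Node("op", "<=", [Node("var", "sec_grade"), Node("val", 3)]),
    ]),
])

if __name__ == "__main__":
    print(evaluate(RULE, {"src": "192.168.216.5", "name": "design plan", "sec_grade": 4}))  # False
```

With a true FIFO queue, the `substr` step would receive the result of the `src` comparison and the constant "plan" instead of "plan" and the document name, which is why the stack discipline matters here.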

10. The method according to claim 9, wherein the Get function is Get(key, H) -> value;

The Get function returns the value corresponding to key from the hash table H; here the variable var is used as the key into H.