Summary of the invention
To overcome the above defects, the invention provides a policy-based security filtering method for unstructured data. Policy expressions containing variables are configured on the server and a policy matching algorithm is designed; the client adds label values to the unstructured data. When the labeled unstructured data arrives at the server, the policy matching algorithm matches and filters the label values against the policy expressions in a highly flexible manner.
To achieve the above object, the invention provides a policy-based security filtering method for unstructured data, which filters data when a server and a client transmit data to each other. The improvement is that the method comprises the following steps:
(1). design the policy rule expression;
(2). design specific policy rules according to the policy rule expression, taking into account the attributes of the unstructured data and the business requirements;
(3). according to the policy rules and the attributes of the unstructured data, record the attributes in the form of label information and transmit it together with the unstructured data;
(4). parse the policy rules, building a tree data structure from the policy rules in character string form so that policy matching can be computed;
(5). parse the label information and store it in a hash table;
(6). take the policy rule data structure and the label information in the hash table as the parameters of policy rule matching and compute the matching result; if the match succeeds, the data is allowed to pass; otherwise the data is not allowed to pass, and a log entry is recorded.
In a preferred technical solution provided by the invention, in step 1, a policy rule is a text string expression stored in a policy configuration file; a policy rule is composed of user-defined expressions; the server designs the policy rule expressions according to the business requirements and the document attributes of the unstructured data, in order to filter the unstructured data.
In a second preferred technical solution provided by the invention, an expression comprises variables, values and operators; the value of a variable is extracted from the data label information during policy matching.
In a third preferred technical solution provided by the invention, in step 3, the client labels the document according to the attributes of the document data, the business requirements and the mutually agreed conditions; a document label is a list of key-value pairs, where the key is a variable in the policy expression and the value is the value of that variable.
In a fourth preferred technical solution provided by the invention, the document label comprises the size, type and file name of the document.
In a fifth preferred technical solution provided by the invention, in step 4, the policy rules are parsed into a data structure suited to policy matching, to facilitate the matching computation; when a policy rule is parsed, lexical analysis and syntax analysis are performed on the policy rule text; if the policy rule is correct, a tree-shaped policy data structure is generated; otherwise an error is reported.
In a sixth preferred technical solution provided by the invention, step 5 comprises the following steps: (5-1). extract the document label L from document D;
(5-2). build a hash table H to hold the label information;
(5-3). read the label information item by item and fill it into H as key-value pairs <key, value>.
In a seventh preferred technical solution provided by the invention, step 6 comprises the following steps:
(6-1). build a first-in-first-out queue Queue;
(6-2). traverse the tree-structured expression Exp starting from the root;
(6-3). check the number of elements in the queue; if it is 1, return the result value computed for Exp; otherwise return an error.
In an eighth preferred technical solution provided by the invention, step 6-2 comprises the following steps:
A. if the left subtree is not empty, traverse the left subtree;
B. if the right subtree is not empty, traverse the right subtree;
C. if Exp is a value, put Exp into Queue;
D. if Exp is a variable var, use the Get function to extract the variable's value from the hash table H that stores the label, build a new Exp from it, and put the new Exp into Queue;
E. if Exp is an operator, determine the number of operands N of the operator, take N operands from Queue, evaluate the expression, store the result as a value in a new Exp, and put the new Exp into Queue.
In a ninth preferred technical solution provided by the invention, the Get function is Get(key, H) -> value;
given a key, the Get function returns the corresponding value from the hash table H; here the variable var is used as the key of H.
Compared with the prior art, the policy-based security filtering method for unstructured data provided by the invention aims to solve the problem of securely transmitting unstructured data between networks of different security levels. Current content filtering techniques offer no good way of addressing the security of unstructured data transmission. The policy-based technique configures policy rule expressions, designs an efficient policy matching algorithm, and labels the attributes of the documents to be exchanged. When a document passes through the gateway server (the interface between different networks), the policy matching algorithm matches the document's label attributes against the policy rule expressions, thereby filtering the document and ensuring the security of unstructured data in transit.
Embodiment
As shown in Figure 2, a policy-based security filtering method for unstructured data filters data when a server and a client transmit data to each other, and comprises the following steps:
(1). design the policy rule expression;
(2). design specific policy rules according to the policy rule expression, taking into account the attributes of the unstructured data and the business requirements;
(3). according to the policy rules and the attributes of the unstructured data, record the attributes in the form of label information and transmit it together with the unstructured data;
(4). parse the policy rules, building a tree data structure from the policy rules in character string form so that policy matching can be computed;
(5). parse the label information and store it in a hash table;
(6). take the policy rule data structure and the label information in the hash table as the parameters of policy rule matching and compute the matching result; if the match succeeds, the data is allowed to pass; otherwise the data is not allowed to pass, and a log entry is recorded.
In step 1, a policy rule is a text string expression stored in a policy configuration file; a policy rule is composed of user-defined expressions; the server designs the policy rule expressions according to the business requirements and the document attributes of the unstructured data, in order to filter the unstructured data.
An expression comprises variables, values and operators; the value of a variable is extracted from the data label information during policy matching.
In step 3, the client labels the document according to the attributes of the document data, the business requirements and the mutually agreed conditions; a document label is a list of key-value pairs, where the key is a variable in the policy expression and the value is the value of that variable.
The document label comprises the size, type and file name of the document.
In step 4, the policy rules are parsed into a data structure suited to policy matching, to facilitate the matching computation; when a policy rule is parsed, lexical analysis and syntax analysis are performed on the policy rule text; if the policy rule is correct, a tree-shaped policy data structure is generated; otherwise an error is reported.
Step 5 comprises the following steps: (5-1). extract the document label L from document D;
(5-2). build a hash table H to hold the label information;
(5-3). read the label information item by item and fill it into H as key-value pairs <key, value>.
Step 6 comprises the following steps:
(6-1). build a first-in-first-out queue Queue;
(6-2). traverse the tree-structured expression Exp starting from the root;
(6-3). check the number of elements in the queue; if it is 1, return the result value computed for Exp; otherwise return an error.
Step 6-2 comprises the following steps:
A. if the left subtree is not empty, traverse the left subtree;
B. if the right subtree is not empty, traverse the right subtree;
C. if Exp is a value, put Exp into Queue;
D. if Exp is a variable var, use the Get function to extract the variable's value from the hash table H that stores the label, build a new Exp from it, and put the new Exp into Queue;
E. if Exp is an operator, determine the number of operands N of the operator, take N operands from Queue, evaluate the expression, store the result as a value in a new Exp, and put the new Exp into Queue.
The Get function is Get(key, H) -> value; given a key, the Get function returns the corresponding value from the hash table H; here the variable var is used as the key of H.
The policy-based security filtering method for unstructured data is further explained through the following embodiment.
Fig. 1 shows the reference architecture of the policy-based security filtering method for unstructured data. It comprises three main parts: policy rule information, data label information and the matching algorithm. Policy rule information consists of two parts, the policy rules and rule parsing. A policy rule is an expression text string stored in the policy configuration file; it is the basis for filtering. Rule parsing converts the rule text into a representation (data structure) suited to matching. Label information is the attribute description that the client attaches to a document; it includes information about the document, user operation information and so on, and different documents carry different label information. The matching algorithm takes the label information of the data sent by the client and evaluates it against the policy rule expressions; the matching result is the basis for filtering the document. Policy rules are designed and configured on the server, and attribute descriptions are added to documents on the client; the policy result is computed from the relationship between the data attributes and the policy rule expressions and serves as the filtering credential. Because policy rules take the form of mathematical expressions with variables, they are easy to evaluate. In addition, because policy rules are flexible, the filtering method is highly extensible.
The details are as follows:
Policy rules: a policy rule is a text string expression stored in the policy configuration file. A policy rule is composed of one or more user-defined expressions. An expression is built from variables, values and operators. The value of a variable is extracted from the data label information during policy matching. Because policy rules are expressions, they can be designed very flexibly. In practice, the server administrator or business personnel design suitable policy rule expressions according to the business requirements and the document attributes of the unstructured data, in order to filter the unstructured data.
Policy rule parsing: once the policy rules have been configured, they need to be parsed into a data structure suited to policy matching, so that the matching computation can be carried out efficiently. Parsing a policy rule requires lexical analysis and syntax analysis of the policy rule text string. If the policy rule is correct, the policy data structure is generated; otherwise an error is reported.
Label information: the client (the end that sends the data) labels the document (for example with its size, type and file name) according to the attributes of the document data, the business requirements and the mutually agreed conditions. A document label is flexible and carries variables: it is a list of key-value pairs, where the key is a variable in the policy expression and the value is the value of that variable.
Matching: once the rule data structure and the label information are available, the matching algorithm takes them as parameters and filters the document by traversing and evaluating the policy rule expression; during evaluation, each variable in the expression is replaced by the value of the corresponding variable in the label information. The matching algorithm is independent of the policy rules and the label information and is not affected by either of them.
1. Policy rule expression design
The core task of the policy-based security filtering method for unstructured data is to design policy rules that fit the filtering conditions. This patent proposes a syntax specification P for policy rules, on the basis of which a wide variety of policy rules can be designed flexibly. The specification P is described as follows: P is built from three basic elements, namely variables, values and operators, and the construction rules are:
Rule 1: a value and a variable var are each an expression exp.
Rule 2: an expression combined with a unary operator opu forms a new expression.
Rule 3: two expressions combined with a binary operator opb form a new expression.
The final form of the expression constitutes a policy. In the above description, a variable consists of letters, digits and underscores, and its first character is not a digit; a value is an integer (a mathematical integer), a floating-point number (a mathematical real number) or a character string; the operators comprise the relational operators (>, >=, <, <=, ==, !=), the logical operators (&&, ||, !) and the substring operation (substr: the left operand is a substring of the right operand).
Following Rules 1 to 3, the policy rule specification P can be stated formally as:
exp ::= value | var | opu exp | exp opb exp
When configuring policy rules, the designer follows the policy rule specification and, taking the business requirements and user operations into account, selects suitable variables and the values those variables must satisfy, combines the variables and values into a policy rule by means of operators, and stores the rule as a string in the policy rule file. The design steps are as follows:
A. follow the policy rule specification P;
B. based on the document's attribute information and the business requirement information, convert the textual requirements into a policy rule expression exp.
For example, to perform initial filtering by document type when the business requirement is to accept only PDF documents, the variable doc_type can be designed and the == operator used, giving the policy expression
doc_type == "PDF"
This expression evaluates to true when the document type is PDF, and the document is allowed to pass; otherwise it is not allowed to pass. As another example, suppose the document security level has five grades 1, 2, 3, 4, 5, where a larger number means a higher grade, and a business requirement states that a document whose security level is greater than 3 must not be transmitted. The variable doc_security_grade can then be designed, with the operators > and !, giving the policy expression
!(doc_security_grade > 3)
That is, when the document security level is greater than 3, the result is false and transmission is not allowed; otherwise transmission is allowed.
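As a minimal illustration of how such expressions might be kept in a policy configuration file, the Python sketch below loads the two example rules from text. The one-rule-per-line layout, the `#` comment convention and the name `load_policies` are assumptions made for illustration; the text does not specify the file format.

```python
# Hypothetical policy configuration file contents: one expression per line.
POLICY_FILE_TEXT = '''
# accept PDF documents only
doc_type == "PDF"
# reject documents whose security level is above 3
!(doc_security_grade > 3)
'''


def load_policies(text):
    """Return the policy rule strings found in the configuration text."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and comment lines (an assumed convention).
        if line and not line.startswith("#"):
            rules.append(line)
    return rules


for rule in load_policies(POLICY_FILE_TEXT):
    print(rule)
```

Each returned string would still have to go through lexical and syntax analysis before it can be matched.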
2. Label information
In the policy-based security filtering method for unstructured data, label information is added to the document on the client (the end that sends the document). The label includes not only information that exists for every document, such as encryption, digital digest, authorization and owner, but also the values of the variables used in the server's policy rules. The labeling steps are:
A. build the label L;
B. obtain the document's attribute information A and add it to the label, L(A);
C. obtain the user operation attribute information O and add it to the label, L(O);
D. add the label to the data file data, D(L).
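Steps A to D above can be sketched in Python as follows. This is only an illustration: the function names `build_label` and `attach_label`, the choice of JSON, and the framing (a 4-byte length prefix followed by the label and then the data) are assumptions, since the text does not define how the label is encoded or attached.

```python
import json


def build_label(file_name: str, data: bytes, user_attrs: dict) -> dict:
    label = {}                                   # A. build the label L
    label["name"] = file_name                    # B. document attributes A
    label["size"] = len(data)
    label["type"] = file_name.rsplit(".", 1)[-1].upper() if "." in file_name else ""
    label.update(user_attrs)                     # C. user operation attributes O
    return label


def attach_label(data: bytes, label: dict) -> bytes:
    # D. add the label to the data. Assumed framing: 4-byte big-endian
    # header length, then the JSON-encoded label, then the original data.
    header = json.dumps(label).encode("utf-8")
    return len(header).to_bytes(4, "big") + header + data


payload = b"%PDF-1.4 example"
label = build_label("manual.pdf", payload, {"src": "192.168.216.5", "sec_grade": 1})
labeled_document = attach_label(payload, label)
```

The keys of the label (here `name`, `size`, `type`, `src`, `sec_grade`) must match the variable names used in the server's policy expressions.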
3. Policy rule parsing
The policy rules designed in step 1 of the method exist as character strings. To carry out the policy matching operation, a policy in character string form must be represented in a data structure suited to rule matching.
To support policy matching, a policy expression is held in a data structure Exp with the fields data, operator, lchild and rchild.
Exp can hold a value expression (using the data field); Exp can also hold an expression with an operator, in which case the operator is stored in operator, lchild holds the left operand and rchild holds the right operand. Through the recursive nesting of multiple Exp nodes, a single Exp can ultimately hold a complete policy.
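One possible rendering of the Exp structure in Python is sketched below. The four field names come from the description above; the `is_var` flag is an assumption added so that a variable name can be told apart from a string value, which the text does not address.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class Exp:
    """One node of the policy tree."""
    data: Union[int, float, str, None] = None  # a value, or a variable name
    operator: Optional[str] = None             # e.g. "==", "&&", "!", "substr"
    lchild: Optional["Exp"] = None             # left operand (binary operators)
    rchild: Optional["Exp"] = None             # right operand (unary and binary)
    is_var: bool = False                       # assumption: marks data as a variable


# The policy !(doc_security_grade > 3) built by hand as nested Exp nodes.
policy = Exp(
    operator="!",
    rchild=Exp(
        operator=">",
        lchild=Exp(data="doc_security_grade", is_var=True),
        rchild=Exp(data=3),
    ),
)
```

A leaf node uses only `data`, while an operator node uses `operator` together with one or both children, so the nesting mirrors the structure of the expression.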
The purpose of policy rule parsing is to convert a policy rule exp in character string form into a policy held in the Exp structure, so that policy matching can be carried out. The parsing rules are as follows:
A. process exp recursively as follows:
B-1. when exp is a value or a variable var, construct an Exp, store the value or variable in the data field of Exp, and leave the other fields empty;
B-2. when exp has the form opu exp, construct an Exp, store opu in the operator field of Exp, assign the exp that follows opu to the rchild of Exp, and leave the other fields empty;
B-3. when exp has the form exp opb exp, construct an Exp, store opb in the operator field of Exp, assign the exp that follows opb to the rchild of Exp, assign the exp that precedes opb to the lchild of Exp, and leave the other fields empty;
B. return the root of the tree data structure built from Exp nodes; this is a complete policy data structure.
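The parsing rules above can be illustrated with a small recursive-descent parser in Python. This is a sketch under several assumptions that the text does not state: operator precedence (`||` lowest, then `&&`, then the relational operators and `substr`, with `!` binding tightest), left associativity, support for parentheses, and the `is_var` flag on Exp.

```python
import re
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Exp:
    data: Any = None
    operator: Optional[str] = None
    lchild: Optional["Exp"] = None
    rchild: Optional["Exp"] = None
    is_var: bool = False  # assumption: distinguishes variables from string values


TOKEN_RE = re.compile(
    r'\s*(?:(?P<num>\d+(?:\.\d+)?)|(?P<str>"[^"]*")'
    r'|(?P<op>>=|<=|==|!=|&&|\|\||[><!()])|(?P<name>[A-Za-z_][A-Za-z0-9_]*))'
)
# Assumed precedence, lowest first.
LEVELS = [["||"], ["&&"], ["==", "!=", ">", ">=", "<", "<=", "substr"]]


def tokenize(text):
    """Lexical analysis: split the rule text into (kind, text) tokens."""
    tokens, pos, text = [], 0, text.rstrip()
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f"bad character at position {pos}")
        kind, value = m.lastgroup, m.group(m.lastgroup)
        if kind == "name" and value == "substr":
            kind = "op"  # substr is an operator, not a variable
        tokens.append((kind, value))
        pos = m.end()
    return tokens


class Parser:
    def __init__(self, tokens):
        self.tokens, self.i = tokens, 0

    def peek(self):
        return self.tokens[self.i] if self.i < len(self.tokens) else (None, None)

    def take(self):
        tok = self.peek()
        self.i += 1
        return tok

    def binary(self, level=0):
        """Rule B-3: exp opb exp."""
        if level == len(LEVELS):
            return self.unary()
        left = self.binary(level + 1)
        while self.peek()[0] == "op" and self.peek()[1] in LEVELS[level]:
            op = self.take()[1]
            left = Exp(operator=op, lchild=left, rchild=self.binary(level + 1))
        return left

    def unary(self):
        kind, text = self.take()
        if (kind, text) == ("op", "!"):          # rule B-2: opu exp
            return Exp(operator="!", rchild=self.unary())
        if (kind, text) == ("op", "("):
            inner = self.binary()
            if self.take() != ("op", ")"):
                raise ValueError("missing ')'")
            return inner
        if kind == "num":                        # rule B-1: value
            return Exp(data=float(text) if "." in text else int(text))
        if kind == "str":
            return Exp(data=text[1:-1])
        if kind == "name":                       # rule B-1: variable
            return Exp(data=text, is_var=True)
        raise ValueError(f"unexpected token {text!r}")


def parse_policy(text):
    """Syntax analysis: return the root Exp, or raise ValueError on bad input."""
    parser = Parser(tokenize(text))
    tree = parser.binary()
    if parser.peek() != (None, None):
        raise ValueError("trailing input after expression")
    return tree
```

Calling `parse_policy('!(doc_security_grade > 3)')` returns a root node whose operator is `!` and whose rchild holds the `>` comparison; malformed input raises `ValueError`, which corresponds to the error-reporting branch described for parsing.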
4. Label information extraction
After the server receives the document data, it parses the document data to obtain the label information and converts the label information into a data structure suited to the matching computation. The steps are as follows:
A. extract the document label L from document D;
B. build a hash table (HashMap) H to hold the label information;
C. read the label information item by item and fill it into H as key-value pairs <key, value>.
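The extraction steps can be sketched in Python as follows, using a `dict` as the hash table H. The framing of the document (a 4-byte length prefix, a JSON-encoded label, then the payload) and the function names are assumptions for illustration only; the text does not define the label encoding.

```python
import json


def extract_label(document: bytes):
    """A. Split document D into its label L and the remaining payload."""
    size = int.from_bytes(document[:4], "big")
    label = json.loads(document[4:4 + size].decode("utf-8"))
    return label, document[4 + size:]


def label_to_table(label: dict) -> dict:
    H = {}                            # B. build the hash table H
    for key, value in label.items():  # C. fill H item by item as <key, value>
        H[key] = value
    return H


def get(key, H):
    """Get(key, H) -> value, as used later during matching."""
    return H[key]


# Build a sample labeled document with the same assumed framing.
header = json.dumps({"src": "192.168.216.5", "sec_grade": 1}).encode("utf-8")
document = len(header).to_bytes(4, "big") + header + b"document body"

label, body = extract_label(document)
H = label_to_table(label)
print(get("sec_grade", H))
```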
5. Policy rule matching
Because a parsed policy rule is stored as an expression tree, policy matching traverses the tree in postorder and evaluates the expression during the traversal, finally yielding a single result value. The result value is used to decide whether the document is allowed to pass. The policy matching steps are as follows:
A. build a first-in-first-out queue Queue;
B. traverse the tree expression Exp in postorder, starting from the root:
B1. if the left subtree (the lchild of Exp) is not empty, traverse the left subtree;
B2. if the right subtree (the rchild of Exp) is not empty, traverse the right subtree;
B3. if Exp is a value (data is not empty), put Exp into Queue;
B4. if Exp is a variable var (data is not empty), use the Get function to extract the variable's value from the hash table (HashMap) H that stores the label, build a new Exp from it, and put the new Exp into Queue.
The Get function is
Get(key, H) -> value
Given a key, the Get function returns the corresponding value from the hash table H; here the variable var is used as the key of H.
B5. if the operator of Exp is not empty, that is, the expression is an operator expression, determine the number of operands N of the operator, take N operands from Queue, evaluate the expression, store the result in a new Exp, and put the new Exp into Queue.
C. check the number of elements in the queue; if it is 1, return the result value computed for Exp; otherwise the expression is malformed and an error is returned.
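The matching steps can be sketched in Python as below. Two points are assumptions rather than statements of the text. First, the sketch stores plain values in the container instead of wrapping each one in a new Exp. Second, an operator takes its operands from the most recently added end of the container: in a postorder evaluation the operands of an operator are the values computed last, so the sketch pops from the tail of a `deque`, whereas the text describes the container as a first-in-first-out queue. The `is_var` flag is also hypothetical.

```python
from collections import deque
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Exp:
    data: Any = None
    operator: Optional[str] = None
    lchild: Optional["Exp"] = None
    rchild: Optional["Exp"] = None
    is_var: bool = False  # assumption: distinguishes variables from string values


UNARY = {"!": lambda a: not a}
BINARY = {
    "==": lambda a, b: a == b, "!=": lambda a, b: a != b,
    ">": lambda a, b: a > b, ">=": lambda a, b: a >= b,
    "<": lambda a, b: a < b, "<=": lambda a, b: a <= b,
    "&&": lambda a, b: bool(a) and bool(b),
    "||": lambda a, b: bool(a) or bool(b),
    "substr": lambda a, b: a in b,  # left operand is a substring of the right
}


def get(key, H):
    """Get(key, H) -> value."""
    return H[key]


def match(exp: Exp, H: dict) -> bool:
    queue = deque()                              # A. working container

    def visit(node):                             # B. postorder traversal
        if node.lchild is not None:              # B1
            visit(node.lchild)
        if node.rchild is not None:              # B2
            visit(node.rchild)
        if node.operator is not None:            # B5. operator node
            if node.operator in UNARY:
                queue.append(UNARY[node.operator](queue.pop()))
            else:
                right = queue.pop()
                left = queue.pop()
                queue.append(BINARY[node.operator](left, right))
        elif node.is_var:                        # B4. variable: look up in H
            queue.append(get(node.data, H))
        else:                                    # B3. plain value
            queue.append(node.data)

    visit(exp)
    if len(queue) != 1:                          # C. exactly one result expected
        raise ValueError("malformed expression")
    return bool(queue[0])


# !(doc_security_grade > 3)
policy = Exp(operator="!", rchild=Exp(
    operator=">",
    lchild=Exp(data="doc_security_grade", is_var=True),
    rchild=Exp(data=3)))
print(match(policy, {"doc_security_grade": 2}))
```

A label that lacks a variable used by the policy raises `KeyError` in this sketch; a real filter would need to decide whether that case counts as a failed match.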
For ease of description, assume the following application example:
An enterprise has an internal network with a high security level and an external network with a low security level; the external network is connected to the Internet. To meet internal business requirements and serve customers better, the internal network must be connected to the external network. A policy-based unstructured data security filter is deployed on the gateway that connects the internal and external networks, and the document-sending clients on both sides are equipped with a tool that adds label information to documents. Before a document is sent, label information giving the values of the policy variables is added to it according to the policy agreed in advance. When the document reaches the junction between the internal and external networks, that is, the filtering server, the server extracts the label information and evaluates the policy against it. If the policy match fails, the document is rejected; otherwise it is allowed to pass. Assume the application scenario is a user transferring an unstructured data document from the high-security internal network to the low-security external network. The specific embodiment is as follows:
One. Policy rules:
The policy rules are set by the server administrator according to the business requirements, and the client is required to fill in the values of the policy variables when sending a document, so that filtering can take effect. Assume the business requirements are as follows:
1. the document source is the intranet business system, whose IP address is 192.168.216.5;
2. the document name must not contain "scheme";
3. the document security level must not exceed 3 (assuming five security levels 1, 2, 3, 4, 5, where a larger number means a higher security level).
Then, following the policy rule specification, let the variable src denote the document source, name the document name, and sec_grade the document security level. The policy rule is:
(src == "192.168.216.5") && (!("scheme" substr name)) && (sec_grade <= 3)
Two. Label information:
During negotiation, the document-sending client learns that each document must be labeled with its source, its document name and its security level. Suppose a client transmits three documents:
The label values of document 1 are as follows:
{src: "192.168.216.6", name: "design scheme", sec_grade: 4}
The label values of document 2 are as follows:
{src: "192.168.216.5", name: "design scheme", sec_grade: 4}
The label values of document 3 are as follows:
{src: "192.168.216.5", name: "business system operating instructions", sec_grade: 1}
Three. Matching:
1. Before matching, the policy rule must be parsed, converting the policy rule in character string form into a tree. The policy rule in Part One then becomes the tree structure shown in Figure 3.
2. The tree is then traversed in postorder. Whenever a variable is encountered, its value is extracted from the label information and used in the match, and the document is filtered on that basis.
Postorder traversal first computes the value of the left subtree and then the value of the right subtree; once both have been computed, it computes the value of their parent node, and so on upward until the root of the tree is reached.
Following the postorder traversal rule, the computation for document 1 is:
2.1 Evaluate
src == "192.168.216.5"
The src value extracted from the label is 192.168.216.6, so the result is false.
2.2 Evaluate
"scheme" substr name
The file name extracted from the label is "design scheme", so the result is true.
2.3 Evaluate
!("scheme" substr name)
Step 2.2 yielded true, so the result of this step is false.
2.4 Evaluate
sec_grade <= 3
Because sec_grade is 4, the result is false.
2.5 Evaluate
(!("scheme" substr name)) && (sec_grade <= 3)
From the results of 2.3 and 2.4, this result is false.
2.6 Evaluate
(src == "192.168.216.5") && ((!("scheme" substr name)) && (sec_grade <= 3))
From the results of 2.1 and 2.5, the final matching result is false, so this document is filtered out, that is, it is not allowed to pass.
Applying the same steps to document 2 and document 3 gives false for document 2 and true for document 3; that is, document 3 is allowed to pass and document 2 is not.
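The step-by-step computation for the three documents can be checked with a short Python sketch. It writes steps 2.1 to 2.6 directly as Python expressions rather than going through the tree structure, and it assumes that the name of documents 1 and 2 contains the word "scheme", as the worked example requires; the names `match_steps` and `DOCUMENTS` are illustrative only.

```python
POLICY_SRC = "192.168.216.5"

DOCUMENTS = {
    "document 1": {"src": "192.168.216.6", "name": "design scheme", "sec_grade": 4},
    "document 2": {"src": "192.168.216.5", "name": "design scheme", "sec_grade": 4},
    "document 3": {"src": "192.168.216.5",
                   "name": "business system operating instructions",
                   "sec_grade": 1},
}


def match_steps(label):
    """Evaluate the example policy in the same order as steps 2.1 to 2.6."""
    s1 = label["src"] == POLICY_SRC      # 2.1  src == "192.168.216.5"
    s2 = "scheme" in label["name"]       # 2.2  "scheme" substr name
    s3 = not s2                          # 2.3  !(...)
    s4 = label["sec_grade"] <= 3         # 2.4  sec_grade <= 3
    s5 = s3 and s4                       # 2.5
    s6 = s1 and s5                       # 2.6  final result
    return {"2.1": s1, "2.2": s2, "2.3": s3, "2.4": s4, "2.5": s5, "2.6": s6}


for doc_id, label in DOCUMENTS.items():
    steps = match_steps(label)
    print(doc_id, "allowed" if steps["2.6"] else "blocked", steps)
```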
It should be understood that the content and embodiments of the invention are intended to demonstrate the practical application of the technical solution provided by the invention and should not be construed as limiting its scope of protection. Those skilled in the art, inspired by the spirit and principles of the invention, may make various modifications, equivalent replacements or improvements. Such changes or modifications all fall within the scope of protection of the pending application.