Summary of the invention
To overcome the above defects, the invention provides a policy-based security filtering method for unstructured data. On the server, policy expressions with variables are configured and a policy matching algorithm is designed; on the client, mark values are added to the unstructured data. When the marked unstructured data reaches the server, the policy matching algorithm matches the mark values of the unstructured data against the policy expressions and filters the data accordingly, in a highly flexible manner.
To achieve the above object, the invention provides a policy-based security filtering method for unstructured data, which filters data when data is transmitted between a server and a client. The improvement is that the method comprises the following steps:
(1). design the policy rule expression;
(2). design specific policy rules according to the policy rule expression, in combination with the attributes of the unstructured data and the business requirements;
(3). according to the policy rules and the attributes of the unstructured data, record the attributes of the unstructured data in the form of marking information, and transmit the marking information together with the unstructured data;
(4). parse the policy rules: construct a tree data structure from the policy rule in character string form, for use in the policy matching calculation;
(5). parse the marking information and store it in a hash table;
(6). take the policy rule data structure and the marking information in the hash table as the parameters of policy rule matching and calculate the matching result; if the match succeeds, allow the data to pass; otherwise, do not allow the data to pass, and write a log entry.
In a preferred technical scheme provided by the invention, in step 1, a policy rule is a text string expression stored in a policy configuration file; a policy rule is composed of user-defined expressions; the policy rule expression is designed on the server side according to the business requirements and the document attributes of the unstructured data, so as to filter the unstructured data.
In a second preferred technical scheme provided by the invention, an expression comprises variables, values and operators; the value of a variable is extracted from the data marking information during policy matching.
In a third preferred technical scheme provided by the invention, in step 3, the marking information is produced by the client, which marks the document according to the attributes of the document data, the business requirements and the mutually agreed conditions; the document mark is a list of key-value pairs, in which the key is a variable of the policy expression and the value is the value of that variable.
In a fourth preferred technical scheme provided by the invention, the mark of a document comprises the size, type and file name of the document.
In a fifth preferred technical scheme provided by the invention, in step 4, the policy rule is parsed into a data structure suitable for policy matching, to facilitate the policy matching calculation; when a policy rule is parsed, lexical analysis and syntax analysis are performed on the text of the policy rule; if the policy rule is correct, a tree-type policy data structure is generated; otherwise, an error is reported.
In a sixth preferred technical scheme provided by the invention, step 5 comprises the following steps: (5-1). extract the document mark L from the document D;
(5-2). build a hash table H to hold the marking information;
(5-3). obtain the marking information item by item, and fill each item into H as a key-value pair <key, value>.
In a seventh preferred technical scheme provided by the invention, step 6 comprises the following steps:
(6-1). build a first-in, first-out queue Queue;
(6-2). traverse the tree-structured expression Exp, starting from the root;
(6-3). check the number of elements in the queue; if it is 1, return the result value calculated for Exp; otherwise, return an error.
In an eighth preferred technical scheme provided by the invention, step 6-2 comprises the following steps:
A. if the left subtree is not empty, traverse the left subtree;
B. if the right subtree is not empty, traverse the right subtree;
C. if Exp is a value value, enqueue Exp into Queue;
D. if Exp is a variable var, use the Get function to extract the value of the variable from the hash table H that holds the marks, build a new Exp from it, and enqueue the new Exp into Queue;
E. if Exp is an operator, determine the number N of operands of the operator, take N operands from Queue and evaluate the expression; store the result in a new Exp as a value value, and then enqueue the new Exp into Queue.
In a ninth preferred technical scheme provided by the invention, the Get function is Get(key, H) -> value;
the Get function returns the value corresponding to key from the hash table H; here the variable var is used as the key of H.
Compared with the prior art, the policy-based security filtering method for unstructured data provided by the invention aims to solve the problem of securely transmitting unstructured data between networks of different security levels. Current content filtering techniques offer no good way to address the security of unstructured data transmission. The policy-based technique configures policy rule expressions, designs an efficient policy matching algorithm, and adds attribute marks to the documents being exchanged. When a document passes through the gateway server (the interface between different networks), the policy matching algorithm matches the document's mark attributes against the policy rule expression and the document is security-filtered accordingly, thereby ensuring the security of unstructured data during transmission.
Embodiment
As shown in Figure 2, a policy-based security filtering method for unstructured data filters data when data is transmitted between a server and a client, and comprises the following steps:
(1). design the policy rule expression;
(2). design specific policy rules according to the policy rule expression, in combination with the attributes of the unstructured data and the business requirements;
(3). according to the policy rules and the attributes of the unstructured data, record the attributes of the unstructured data in the form of marking information, and transmit the marking information together with the unstructured data;
(4). parse the policy rules: construct a tree data structure from the policy rule in character string form, for use in the policy matching calculation;
(5). parse the marking information and store it in a hash table;
(6). take the policy rule data structure and the marking information in the hash table as the parameters of policy rule matching and calculate the matching result; if the match succeeds, allow the data to pass; otherwise, do not allow the data to pass, and write a log entry.
In step 1, a policy rule is a text string expression stored in a policy configuration file; a policy rule is composed of user-defined expressions; the policy rule expression is designed on the server side according to the business requirements and the document attributes of the unstructured data, so as to filter the unstructured data.
An expression comprises variables, values and operators; the value of a variable is extracted from the data marking information during policy matching.
In step 3, the marking information is produced by the client, which marks the document according to the attributes of the document data, the business requirements and the mutually agreed conditions; the document mark is a list of key-value pairs, in which the key is a variable of the policy expression and the value is the value of that variable.
The mark of a document comprises the size, type and file name of the document.
In step 4, the policy rule is parsed into a data structure suitable for policy matching, to facilitate the policy matching calculation; when a policy rule is parsed, lexical analysis and syntax analysis are performed on the text of the policy rule; if the policy rule is correct, a tree-type policy data structure is generated; otherwise, an error is reported.
Step 5 comprises the following steps: (5-1). extract the document mark L from the document D;
(5-2). build a hash table H to hold the marking information;
(5-3). obtain the marking information item by item, and fill each item into H as a key-value pair <key, value>.
Step 6 comprises the following steps:
(6-1). build a first-in, first-out queue Queue;
(6-2). traverse the tree-structured expression Exp, starting from the root;
(6-3). check the number of elements in the queue; if it is 1, return the result value calculated for Exp; otherwise, return an error.
Step 6-2 comprises the following steps:
A. if the left subtree is not empty, traverse the left subtree;
B. if the right subtree is not empty, traverse the right subtree;
C. if Exp is a value value, enqueue Exp into Queue;
D. if Exp is a variable var, use the Get function to extract the value of the variable from the hash table H that holds the marks, build a new Exp from it, and enqueue the new Exp into Queue;
E. if Exp is an operator, determine the number N of operands of the operator, take N operands from Queue and evaluate the expression; store the result in a new Exp as a value value, and then enqueue the new Exp into Queue.
The Get function is Get(key, H) -> value; it returns the value corresponding to key from the hash table H; here the variable var is used as the key of H.
The policy-based security filtering method for unstructured data is further explained through the following embodiment.
Fig. 1 shows the reference architecture of the policy-based security filtering method for unstructured data. It mainly comprises three parts: policy rule information, data marking information and the matching algorithm. Policy rule information comprises two parts, the policy rules and rule parsing. A policy rule is an expression text string stored in the policy configuration file and is the basis for filtering; rule parsing converts the rule text into a representation (data structure) suitable for matching. Marking information is the attribute description that the client attaches to a document; it includes information about the document, user operation information and so on, and different documents carry different marking information. The matching algorithm evaluates the marking information of the data transmitted by the client against the policy rule expression, and the matching result serves as the basis for filtering the document. By designing and configuring policy rules on the server and adding attribute descriptions to documents on the client, the policy result is computed from the relation between the data attributes and the policy rule expression and is used as the filtering credential. Because policy rules are designed as mathematical expressions with variables, they are simple to evaluate. In addition, because policy rules are flexible, the filtering method is highly extensible.
A detailed description follows:
Policy rule: a policy rule is a text string expression stored in the policy configuration file. A policy rule is composed of one or more user-defined expressions. An expression consists of variables, values and operators. The value of a variable is extracted from the data marking information during policy matching. Because policy rules are expressions, their design is very flexible. In a concrete implementation, the server administrator or business personnel design suitable policy rule expressions according to the business requirements and the document attributes of the unstructured data, so as to filter the unstructured data.
Parsing policy rules: after a policy rule has been configured, it must be parsed into a data structure suitable for policy matching, so that the matching calculation can be carried out conveniently. Parsing requires lexical analysis and syntax analysis of the policy rule text string. If the policy rule is correct, the policy data structure is generated; otherwise, an error is reported.
Marking information: marking information is produced by the client (the end that sends the data), which marks the document (for example with its size, type and file name) according to the attributes of the document data, the business requirements and the mutually agreed conditions. The document mark is flexible and variable; it is a list of key-value pairs, in which the key is a variable of the policy expression and the value is the value of that variable.
Matching: once the rule data structure and the marking information are available, the matching algorithm takes them as parameters and filters the document by traversing and evaluating the policy rule expression; during the matching calculation, each variable in the expression is replaced by the value of the corresponding variable in the marking information. The matching algorithm is independent of the policy rules and the marking information and is not affected by either of them.
1. Policy rule expression design
The core task of the policy-based security filtering method for unstructured data is to design policy rules that fit the filtering conditions. This patent proposes a syntax specification P for policy rules; on the basis of this specification, a wide variety of policy rules can be designed flexibly. The specification P is described as follows: P consists of three basic elements, namely variables, values and operators, and its construction rules are:
Rule one: a value value or a variable var is an expression exp.
Rule two: an expression combined with a unary operator op_u forms a new expression.
Rule three: two expressions combined by a binary operator op_b form a new expression. The final expression constitutes the policy. In the above description, a variable consists of letters, digits and underscores, and its first character is not a digit; a value is an integer (a mathematical integer), a floating-point number (a mathematical real number) or a character string; the operators comprise relational operators (>, >=, <, <=, ==, !=), logical operators (&&, ||, !) and the substring operation (substr: the left operand is a substring of the right operand).
The formal description of the policy rule specification P is as follows:
When configuring policy rules, the designer follows the policy rule specification and, in combination with the business requirements and user operations, selects suitable variables and the values those variables must satisfy, uses operators to express the relations between variables and values as a policy rule, and stores the rule in the policy rule file as a short text string. The design steps are as follows:
A. follow the policy rule specification P;
B. according to the attribute information of the document, in combination with the business requirement information, convert the textual requirement into a policy rule expression exp.
For example, to do initial filtering by document type: if the business requirement is to accept only PDF documents, a variable doc_type can be designed and the == operator used, giving the policy expression
doc_type=="PDF"
When the document type is PDF, this expression evaluates to true and the document is allowed to pass; otherwise, it is not allowed to pass. As another example, suppose document security levels have five grades, 1, 2, 3, 4 and 5, where a larger number means a higher grade, and a business requirement states that a document must not be transmitted when its security level is greater than 3. A variable doc_security_grade can be designed and the operators > and ! used, giving the policy expression
!(doc_security_grade>3)
This means that when the document security level is greater than 3, the result is false and transmission is not allowed; otherwise, transmission is allowed.
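The two example rules can be read as plain predicates over a document's mark. The following is a minimal sketch, assuming the mark is available as a Python dictionary keyed by variable name; the function names `allow_pdf_only` and `allow_low_security` are hypothetical and are not part of the method described here.

```python
# Illustrative only: each function hard-codes one of the two example rules.
# In the described method the rule is a configurable text string, not code.

def allow_pdf_only(mark):
    # Corresponds to the rule: doc_type=="PDF"
    return mark["doc_type"] == "PDF"


def allow_low_security(mark):
    # Corresponds to the rule: !(doc_security_grade>3)
    return not (mark["doc_security_grade"] > 3)


if __name__ == "__main__":
    print(allow_pdf_only({"doc_type": "PDF"}))
    print(allow_low_security({"doc_security_grade": 4}))
```

A document whose mark gives `doc_security_grade` as 4 is rejected by the second predicate, while grades 1 to 3 are accepted.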
2. Marking information
In the policy-based security filtering method for unstructured data, the client (the document sending end) adds marking information to the document. The mark includes not only information that every document has, such as encryption, digital digest, authorization and owner, but also the values of the variables used in the server's policy rules. The steps for marking are:
A. build the mark L;
B. obtain the attribute information A of the document and add it to the mark, L(A);
C. obtain the user operation attribute information O and add it to the mark, L(O);
D. add the mark to the data file D, data(L).
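The marking steps A to D can be sketched as follows. The text does not say how the mark is physically attached to the document, so the layout used here (a 4-byte big-endian length, then the mark as UTF-8 JSON, then the original content) is purely an assumption for illustration, as is the function name `mark_document`.

```python
import json


def mark_document(content, doc_attrs, operation_attrs):
    """Attach a mark to raw document bytes (hypothetical wire layout)."""
    label = {}                      # A. build the mark L
    label.update(doc_attrs)         # B. add document attribute information A
    label.update(operation_attrs)   # C. add user operation attribute information O
    header = json.dumps(label, ensure_ascii=False).encode("utf-8")
    # D. add the mark to the data file: length prefix + mark + content
    return len(header).to_bytes(4, "big") + header + content


if __name__ == "__main__":
    packed = mark_document(
        b"example content",
        {"name": "report.pdf", "doc_type": "PDF", "size": 15},
        {"src": "192.168.216.5", "sec_grade": 1},
    )
    print(len(packed))
```

Keeping the mark as a flat key-value list matches the description of the document mark as a list of key-value pairs whose keys are the policy variables.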
3. Policy rule parsing
The policy rule designed in step 1 of the method exists in character string form; in order to perform the policy matching operation, the policy in string form must be represented by a data structure suited to rule matching.
To support policy matching well, the following data structure is used to hold a policy expression.
An Exp can hold a value expression (using the data field); an Exp can also hold an expression with an operator, in which case the operator is stored in operator, lchild holds the left operand and rchild holds the right operand. Through the recursive nesting of multiple Exp nodes, an Exp can ultimately hold a complete policy.
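The Exp structure described above can be sketched as a small Python dataclass. The field names `data`, `operator`, `lchild` and `rchild` come from the text; the `is_var` flag is an assumption added here, because the text stores both values and variable names in `data` and does not say how the two are told apart.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class Exp:
    data: Union[int, float, str, bool, None] = None  # value, or variable name
    operator: Optional[str] = None                   # e.g. "!", ">", "&&"
    lchild: Optional["Exp"] = None                   # left operand
    rchild: Optional["Exp"] = None                   # right operand (sole operand of a unary operator)
    is_var: bool = False                             # assumption: True when data is a variable name


# Tree for the example rule !(doc_security_grade>3)
tree = Exp(
    operator="!",
    rchild=Exp(
        operator=">",
        lchild=Exp(data="doc_security_grade", is_var=True),
        rchild=Exp(data=3),
    ),
)

if __name__ == "__main__":
    print(tree.operator, tree.rchild.operator)
```

The unary operator keeps its operand in `rchild` and leaves `lchild` empty, which follows the parsing rule for unary operators given in the text.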
The purpose of policy rule parsing is to parse the policy rule exp in character string form into a policy in Exp structure, so that policy matching can be carried out. The parsing rules are as follows:
A. process exp recursively as follows:
A-1. when exp is a value value or a variable var, construct an Exp, store the value or variable in the data field of the Exp, and leave the other fields empty;
A-2. when exp is op_u exp, construct an Exp, assign op_u to the operator field of the Exp, assign the exp following op_u to the rchild of the Exp, and leave the other fields empty;
A-3. when exp is exp op_b exp, construct an Exp, assign op_b to the operator field of the Exp, assign the exp following op_b to the rchild of the Exp, assign the exp preceding op_b to the lchild of the Exp, and leave the other fields empty;
B. return the root of the tree data structure built from Exp nodes, which is a complete policy data structure.
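The parsing rules above can be sketched as a small tokenizer plus recursive-descent parser. Several things here are assumptions rather than statements of the source: operator precedence (`||` lowest, then `&&`, then the relational operators and `substr`, then `!`), the use of parentheses for grouping, double-quoted string literals, the `is_var` flag, and the names `tokenize` and `parse`.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Exp:
    data: object = None
    operator: Optional[str] = None
    lchild: Optional["Exp"] = None
    rchild: Optional["Exp"] = None
    is_var: bool = False  # assumption: marks data as a variable name


TOKEN = re.compile(
    r'\s*(?:(\d+\.\d+|\d+)|"([^"]*)"|([A-Za-z_]\w*)|(&&|\|\||[<>=!]=|[<>!()]))')
# Assumed precedence, lowest first.
LEVELS = [("||",), ("&&",), ("==", "!=", "<", "<=", ">", ">=", "substr")]


def tokenize(text):
    """Lexical analysis: split the rule text into (kind, value) tokens."""
    text, tokens, pos = text.rstrip(), [], 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if not m:
            raise SyntaxError(f"unexpected character at position {pos}")
        num, string, name, op = m.groups()
        if num is not None:
            tokens.append(("value", float(num) if "." in num else int(num)))
        elif string is not None:
            tokens.append(("value", string))
        elif name == "substr":
            tokens.append(("op", name))
        elif name is not None:
            tokens.append(("var", name))
        else:
            tokens.append(("op", op))
        pos = m.end()
    return tokens


def parse(text):
    """Syntax analysis: build the Exp tree and return its root."""
    tokens, pos = tokenize(text), 0

    def peek():
        return tokens[pos] if pos < len(tokens) else (None, None)

    def advance():
        nonlocal pos
        tok = peek()
        pos += 1
        return tok

    def binary(level):
        if level == len(LEVELS):
            return unary()
        left = binary(level + 1)
        while peek()[0] == "op" and peek()[1] in LEVELS[level]:
            op = advance()[1]
            # exp op_b exp: operator, lchild and rchild are all set
            left = Exp(operator=op, lchild=left, rchild=binary(level + 1))
        return left

    def unary():
        kind, val = advance()
        if (kind, val) == ("op", "!"):
            return Exp(operator="!", rchild=unary())  # op_u exp: operand in rchild
        if (kind, val) == ("op", "("):
            inner = binary(0)
            if advance() != ("op", ")"):
                raise SyntaxError("expected ')'")
            return inner
        if kind == "value":
            return Exp(data=val)
        if kind == "var":
            return Exp(data=val, is_var=True)
        raise SyntaxError(f"unexpected token {val!r}")

    root = binary(0)
    if pos != len(tokens):
        raise SyntaxError("unexpected trailing input")
    return root
```

A malformed rule such as `src=1` (a single `=`) raises `SyntaxError`, which corresponds to the error reporting required when a policy rule is incorrect.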
4. Marking information extraction
After the server obtains the document data, it parses the document data to obtain the marking information and converts the marking information into a data structure suitable for the matching calculation. The steps are as follows:
A. extract the document mark L from the document D;
B. build a hash table (HashMap) H to hold the marking information;
C. obtain the marking information item by item, and fill each item into H as a key-value pair <key, value>.
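The extraction steps can be sketched with a Python dictionary standing in for the hash table H. How the mark is embedded in the document is not specified in the text; this sketch assumes a hypothetical layout of a 4-byte big-endian length followed by the mark as UTF-8 JSON and then the document content. The function name `extract_mark` is likewise an assumption.

```python
import json


def extract_mark(document):
    """Return (H, content) for a document using the assumed layout."""
    n = int.from_bytes(document[:4], "big")
    label = json.loads(document[4:4 + n].decode("utf-8"))  # A. extract mark L from D
    table = {}                                             # B. build hash table H
    for key, value in label.items():                       # C. fill H item by item
        table[key] = value
    return table, document[4 + n:]


if __name__ == "__main__":
    header = json.dumps({"src": "192.168.216.5", "sec_grade": 1}).encode("utf-8")
    doc = len(header).to_bytes(4, "big") + header + b"example content"
    table, content = extract_mark(doc)
    print(table, content)
```

With a dictionary as H, the Get function described in the matching step reduces to an ordinary key lookup.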
5. Policy rule matching
After parsing, the policy rule is stored in a tree as an expression. During policy matching, the tree is traversed in postorder and the expression is evaluated during the traversal, finally yielding a result value. The result value is used to decide whether the document is allowed to pass. The policy matching steps are as follows:
A. build a first-in, first-out queue Queue;
B. traverse the tree expression Exp in postorder, starting from the root:
B1. if the left subtree (the lchild of Exp) is not empty, traverse the left subtree;
B2. if the right subtree (the rchild of Exp) is not empty, traverse the right subtree;
B3. if Exp is a value value (data is not empty), enqueue Exp into Queue;
B4. if Exp is a variable var (data is not empty), use the Get function to extract the value of the variable from the hash table (HashMap) H that holds the marks, build a new Exp from it, and enqueue the new Exp into Queue.
The Get function is
Get(key,H)->value
The Get function returns the value corresponding to key from the hash table H; here the variable var is used as the key of H.
B5. if the operator of Exp is not empty, that is, the expression is an operator, determine the number N of operands of the operator, take N operands from Queue and evaluate the expression; store the result in a new Exp, and then enqueue the new Exp into Queue;
C. check the number of elements in the queue; if it is 1, return the result value calculated for Exp; otherwise, the expression is malformed and an error is returned.
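The matching steps can be sketched as below, with one deliberate deviation that should be read as an assumption of this sketch: operands are removed from the tail of the container (the most recently added elements) rather than from the head. With strict first-in, first-out removal, an operator in a nested rule would receive operands left over from an earlier sibling subexpression. The `is_var` flag and the function names `get` and `match` are also assumptions.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class Exp:
    data: object = None
    operator: Optional[str] = None
    lchild: Optional["Exp"] = None
    rchild: Optional["Exp"] = None
    is_var: bool = False  # assumption: marks data as a variable name


BINARY = {
    "==": lambda a, b: a == b, "!=": lambda a, b: a != b,
    "<": lambda a, b: a < b, "<=": lambda a, b: a <= b,
    ">": lambda a, b: a > b, ">=": lambda a, b: a >= b,
    "&&": lambda a, b: bool(a) and bool(b),
    "||": lambda a, b: bool(a) or bool(b),
    "substr": lambda a, b: a in b,  # left operand is a substring of the right
}


def get(key, table):
    """Get(key, H) -> value."""
    return table[key]


def match(root, table):
    queue = deque()                                    # A. build the container

    def visit(exp):                                    # B. postorder traversal
        if exp.lchild is not None:                     # B1
            visit(exp.lchild)
        if exp.rchild is not None:                     # B2
            visit(exp.rchild)
        if exp.operator is not None:                   # B5. operator node
            if exp.operator == "!":                    # unary: N = 1
                result = not queue.pop().data
            else:                                      # binary: N = 2
                right = queue.pop()
                left = queue.pop()
                result = BINARY[exp.operator](left.data, right.data)
            queue.append(Exp(data=result))
        elif exp.is_var:                               # B4. variable node
            queue.append(Exp(data=get(exp.data, table)))
        else:                                          # B3. value node
            queue.append(exp)

    visit(root)
    if len(queue) != 1:                                # C. exactly one element must remain
        raise ValueError("malformed policy expression")
    return queue[0].data
```

A variable that is missing from the mark raises `KeyError` in this sketch; whether such a document should be rejected or reported as an error is a design choice the text leaves open.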
For ease of description, we assume the following application example:
An enterprise has an internal network with a high security level and an external network with a low security level, and the external network is linked to the Internet. To meet internal business needs and serve customers better, the internal network must be connected to the external network. A policy-based unstructured data security filter is deployed at the gateway connecting the internal and external networks, and the document-transfer clients on both networks are given a tool for adding marking information to documents. Before a document is transmitted, marking information for the policy variables is added to it according to the policy agreed in advance. When the document reaches the junction of the internal and external networks, that is, the filtering server, the server extracts the marking information and evaluates it against the policy. If the policy match fails, the document is blocked; otherwise, it is allowed to pass. Suppose the application scenario is that a user sends an unstructured data document from the high-security internal network to the low-security external network. The specific embodiment is as follows:
One. Policy rules:
The policy rules are set by the server administrator according to the business requirements, and the client is required to fill in the values of the policy variables when transmitting a document, so that filtering takes effect. Suppose the business requirements are as follows:
1. the document source must be the intranet business system whose IP is 192.168.216.5;
2. the document name must not contain "scheme";
3. the document security level must not exceed 3 (security levels are assumed to have five grades, 1, 2, 3, 4 and 5; the higher the grade, the higher the security level).
According to the policy rule specification, the variable src denotes the document source, name denotes the document name and sec_grade denotes the document security level. The policy rule is as follows:
(src=="192.168.216.5") && (!("scheme" substr name)) && (sec_grade<=3)
Two. Marking information:
On the client that transmits documents: during negotiation, the client learns that each document must be marked with its source, name and security level. Suppose a client transmits three documents:
The mark values of document 1 are as follows:
{ src: "192.168.216.6", name: "design scheme", sec_grade: 4 }
The mark values of document 2 are as follows:
{ src: "192.168.216.5", name: "design scheme", sec_grade: 4 }
The mark values of document 3 are as follows:
{ src: "192.168.216.5", name: "operation system operation instructions", sec_grade: 1 }
Three. Matching:
1. Before matching, the policy rule must be parsed, converting the policy rule in character string form into a tree; the policy rule in part One becomes the tree structure shown in Fig. 3.
2. The tree is then traversed in postorder; whenever a variable is encountered, its value is extracted from the marking information and used in the match, and the document is filtered accordingly.
Postorder traversal of the tree first computes the value of the left subtree and then the value of the right subtree; once both have been computed, the node above them is computed, and so on up to the root.
Following the postorder traversal rule, for document 1 we have:
2.1 Matching calculation
src=="192.168.216.5"
Because the src value extracted from the mark is 192.168.216.6, the result is false.
2.2 Matching calculation
"scheme" substr name
Because the name value extracted from the mark is "design scheme", the result is true.
2.3 Matching calculation
!("scheme" substr name)
Step 2.2 gave true, so the result of this step is false.
2.4 Matching calculation
sec_grade<=3
Because sec_grade is 4, the result is false.
2.5 Matching calculation
(!("scheme" substr name)) && (sec_grade<=3)
From the results of 2.3 and 2.4, this result is false.
2.6 Matching calculation
(src=="192.168.216.5") && (!("scheme" substr name)) && (sec_grade<=3)
From the results of 2.1 and 2.5, the final matching result is false, so this document is filtered out, that is, it is not allowed to pass.
Following the steps above for document 2 and document 3, the result for document 2 is false and the result for document 3 is true; that is, document 3 is allowed to pass and document 2 is not.
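The three results can be checked with a compact recursive evaluation of the example rule. This is only a cross-check of the worked example, not the tree-and-queue procedure itself: the rule is written by hand as nested tuples, the helper `evaluate` is hypothetical, and the document name is written as "design scheme" so that the keyword "scheme" is literally a substring of it.

```python
# (src=="192.168.216.5") && (!("scheme" substr name)) && (sec_grade<=3)
RULE = ("&&",
        ("==", ("var", "src"), "192.168.216.5"),
        ("&&",
         ("!", ("substr", "scheme", ("var", "name"))),
         ("<=", ("var", "sec_grade"), 3)))

OPS = {
    "==": lambda a, b: a == b,
    "<=": lambda a, b: a <= b,
    "substr": lambda a, b: a in b,   # left operand is a substring of the right
    "&&": lambda a, b: a and b,
}


def evaluate(node, mark):
    if not isinstance(node, tuple):
        return node                  # literal value
    op, *args = node
    if op == "var":
        return mark[args[0]]         # look the variable up in the mark
    vals = [evaluate(arg, mark) for arg in args]
    if op == "!":
        return not vals[0]
    return OPS[op](*vals)


DOCS = [
    {"src": "192.168.216.6", "name": "design scheme", "sec_grade": 4},
    {"src": "192.168.216.5", "name": "design scheme", "sec_grade": 4},
    {"src": "192.168.216.5", "name": "operation system operation instructions",
     "sec_grade": 1},
]

results = [evaluate(RULE, doc) for doc in DOCS]

if __name__ == "__main__":
    for number, allowed in enumerate(results, start=1):
        print(f"document {number}: {'pass' if allowed else 'blocked'}")
```

Only document 3 satisfies all three conditions; documents 1 and 2 fail on both the name and the security level, and document 1 also fails on the source address.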
It should be stated that the content of the invention and the embodiment are intended to demonstrate the practical application of the technical scheme provided by the invention, and should not be construed as limiting the scope of protection of the invention. Those skilled in the art may make various modifications, equivalent replacements or improvements under the inspiration of the spirit and principles of the invention. All such changes or modifications fall within the scope of protection of the pending application.