CN105491023B - Data isolation exchange and safety filtering method for power Internet of things - Google Patents

Data isolation exchange and safety filtering method for power Internet of things

Publication number
CN105491023B
Authority
CN
China
Prior art keywords
isolation
word
data
length
label
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201510824673.3A
Other languages
Chinese (zh)
Other versions
CN105491023A (en)
Inventor
周诚
张涛
马媛媛
李伟伟
汪晨
邵志鹏
费稼轩
何高峰
楚杰
黄秀丽
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
State Grid Corp of China SGCC
Smart Grid Research Institute of SGCC
Original Assignee
State Grid Corp of China SGCC
Smart Grid Research Institute of SGCC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by State Grid Corp of China SGCC, Smart Grid Research Institute of SGCC filed Critical State Grid Corp of China SGCC
Priority to CN201510824673.3A priority Critical patent/CN105491023B/en
Publication of CN105491023A publication Critical patent/CN105491023A/en
Application granted granted Critical
Publication of CN105491023B publication Critical patent/CN105491023B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00Network architectures or network communication protocols for network security
    • H04L63/02Network architectures or network communication protocols for network security for separating internal from external traffic, e.g. firewalls
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00Network architectures or network communication protocols for network security
    • H04L63/02Network architectures or network communication protocols for network security for separating internal from external traffic, e.g. firewalls
    • H04L63/0227Filtering policies
    • H04L63/0245Filtering by information in the payload
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00Network architectures or network communication protocols for network security
    • H04L63/02Network architectures or network communication protocols for network security for separating internal from external traffic, e.g. firewalls
    • H04L63/0281Proxies

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computer And Data Communications (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention provides a data isolation exchange and safety filtering method for an electric power Internet of things, which comprises the following steps: (1) constructing an isolation architecture based on a pre-proxy and a proprietary protocol; (2) extracting feature vectors in the pre-proxy part; (3) performing label encapsulation in the pre-proxy part, and performing label parsing on the isolation service side; (4) performing label filtering on the isolation service side; and (5) performing content filtering on the isolation service side by combining the feature vector and the label filtering. By introducing the pre-proxy and the private application layer protocol, the invention realizes perfect protocol isolation and greatly improves the isolation strength.

Description

Data isolation exchange and safety filtering method for power Internet of things
Technical Field
The invention relates to a safety isolation exchange method, in particular to a data isolation exchange and safety filtering method for an electric power Internet of things.
Background
The basic principle of the security isolation and information exchange technology is that two or more networks are enabled to perform secure data transmission and resource sharing between the networks under the condition of non-communication through a special hardware-isolation device. The specific method is that the isolator cuts off the TCP/IP connection between networks, decomposes or recombines TCP/IP data packets, carries out security examination, and then establishes effective connection with the host on the other side and sends out the data.
Under the development background of an intelligent power grid, a large number of intelligent terminals, intelligent disconnecting links, operation terminals and various power business systems exist in the power internet of things environment, and the terminals and the systems need to perform frequent data interaction with a power information network. Because the electric power information network belongs to a secret-related network, the terminal and the system are accessed through a mobile APN network or the Internet, and the interaction of the terminal and the system has obvious safety risk for the electric power information network, an isolation protection measure must be taken. However, the existing security isolation and information exchange technology has obvious defects, and the isolation and exchange requirements of the power internet of things are difficult to meet, and the specific expression is as follows:
1. insufficient isolation strength
The isolation and exchange are a pair of contradictions, and the traditional safety isolation and information exchange technology is based on the decomposition and recombination of TCP/IP data packets, and has obvious defects on solving the contradiction problem of the isolation and exchange. Most specific data exchange needs to be carried by a specific application protocol, and once the consideration of the application layer protocol is added into a security isolation and information exchange model, it is found that the security risk introduced by the application layer protocol cannot be completely eliminated by TCP/IP message recombination. Supposing that a non-secret-involved network performs data exchange with an Oracle database in a secret-involved network through a security isolation and information exchange system, at the moment, the non-secret-involved network and the secret-involved network need to communicate based on a TNS protocol, and a TCP/IP message needs to be restored to a TNS protocol message format after passing through a hardware exchange matrix, so that an attacker of the non-secret-involved network can actually and completely attack the secret-involved network database through an open TNS protocol message. The defense of traditional security isolation and information exchange techniques against such attacks is to enhance the application-level filtering capabilities as much as possible, but this has obviously degraded to the level of application-level firewalls.
2. The filtration depth and efficiency are contradictory
In the traditional security isolation and information exchange technology system, security filtering is an important link, and many related products claim to offer content-level filtering. However, content filtering algorithms always carry a performance cost: the deeper the content filtering, the more severe the resulting delay and throughput degradation. Conventional security isolation and information exchange architectures attempt to perform deep filtering on the isolation device itself, which is unlikely to be feasible in an industrial environment. For example, the power Internet of things environment involves hundreds of millions of terminals, the data exchange traffic is huge, and data exchange delay is subject to strict requirements. The traditional technical system can cope only by switching off the filtering function or reducing the filtering depth, so the contradiction between filtering depth and filtering efficiency is pronounced.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention provides a data isolation exchange and safety filtering method for the power internet of things.
In order to achieve the purpose of the invention, the invention adopts the following technical scheme:
A data isolation exchange and safety filtering method for an electric power Internet of things comprises the following steps:
(1) constructing an isolation architecture based on a pre-proxy and a proprietary protocol;
(2) extracting feature vectors in the pre-proxy part;
(3) performing label encapsulation in the pre-proxy part, and performing label parsing on the isolation service side;
(4) performing label filtering on the isolation service side;
(5) performing content filtering on the isolation service side by combining the feature vector and the label filtering.
Preferably, in the step (1), the isolation architecture includes a proprietary application layer exchange protocol based on the TCP/IP protocol, a dedicated security isolation device, and a pre-proxy; the pre-proxy maps the user interaction process into messages of the application layer exchange protocol, so as to implement data exchange.
Preferably, the step (2) includes the following steps:
step 2-1, preprocessing the message content $T_i$;
step 2-2, performing feature extraction on the message content $T_i$;
step 2-3, generating the message feature vector V' and the feature vector V of the sensitive library on the isolation service side;
step 2-4, storing the extracted feature vector in the label field of the message.
Preferably, in the step 2-1, the preprocessing includes performing word segmentation analysis on the text file through an ICTCLAS word segmentation interface; after word segmentation, the message content $T_i$ is expressed as:
$T_i = ((a_{i1}, l_{i1}, p_{i1}), (a_{i2}, l_{i2}, p_{i2}), \ldots, (a_{in}, l_{in}, p_{in}))$
in the formula: $T_i$ denotes message $i$, $a_{in}$ denotes a segmented phrase, $l_{in}$ denotes the length of the phrase, and $p_{in}$ denotes the part of speech of the segmented phrase.
Preferably, the step 2-2 includes the following steps:
step 2-2-1, performing part-of-speech selection on the message content $T_i$: the noun phrases among the segmented phrases are extracted and all other parts of speech are deleted. After part-of-speech selection, $T_i$ is expressed as $Ta_i$ [formula image not reproduced], where $Ta_i$ is the text after noun extraction and its elements are the noun phrases and the lengths of the noun phrases;
step 2-2-2, counting the occurrence frequency of the keywords to form word segmentation triples comprising the phrase, its occurrence frequency and its part of speech in the text; a word frequency item is added to $Ta_i$, giving $Tb_i$ [formula image not reproduced], where $Tb_i$ is the text after word frequency counting and its elements are the phrases after word frequency counting, the lengths of those phrases, and their word frequencies;
step 2-2-3, calculating the length of each keyword and deleting single-character keywords, giving $Tc_i$ [formula image not reproduced], where $Tc_i$ is the text after single-character keywords are deleted and its elements are the phrases longer than one character and their word frequencies;
step 2-2-4, eliminating phrases whose keyword appears only once, giving the final expression $Td_i$ [formula image not reproduced], where $Td_i$ is the text after keywords that appear once are removed and its elements are the remaining phrases and their word frequencies, subject to a condition given in a formula image that is not reproduced.
Preferably, the step 2-3 includes the following steps:
step 2-3-1, calculating the weight of the phrase based on a TF-IDF formula, wherein the formula is as follows:
$d_{ij} = t_{ij} \cdot \log(N / n_j)$
wherein $d_{ij}$ is the weight of the phrase $a_{ij}$; $t_{ij}$ is the frequency of the phrase $a_{ij}$ in the text $T_i$, equal to $f_{im}$ in $Td_i$; $N$ is the total number of documents; and $n_j$ is the number of documents in the document library that contain the phrase $a_{ij}$;
step 2-3-2, the feature vector composed of the sensitive library data is expressed as:
$V = ((a_{11}, d_{11}), (a_{12}, d_{12}), \ldots, (a_{1m}, d_{1m}), \ldots, (a_{n1}, d_{n1}), (a_{n2}, d_{n2}), \ldots, (a_{nm}, d_{nm}))$
for brevity, this is written as:
$V = (d_{11}, d_{12}, \ldots, d_{1m}, \ldots, d_{n1}, d_{n2}, \ldots, d_{nm})$
step 2-3-3, following step 2-3-2, the feature vector of the message is obtained in the same way and simplified as:
$V' = (d'_{11}, d'_{12}, \ldots, d'_{1m}, \ldots, d'_{n1}, d'_{n2}, \ldots, d'_{nm})$.
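The weighting in step 2-3-1 can be sketched as follows. The function name `tfidf_vector`, the fixed vocabulary ordering, and the use of the natural logarithm are assumptions of this sketch; the text does not state the base of the logarithm.

```python
import math


def tfidf_vector(term_freq, doc_freq, total_docs, vocabulary):
    """Return weights d_ij = t_ij * log(N / n_j) in vocabulary order.

    term_freq:  {phrase: frequency in this text}        (t_ij)
    doc_freq:   {phrase: number of documents with it}   (n_j)
    total_docs: total number of documents               (N)
    Natural logarithm is an assumption of this sketch.
    """
    vector = []
    for phrase in vocabulary:
        t = term_freq.get(phrase, 0)
        n = doc_freq.get(phrase, 0)
        # A phrase absent from the text or from the library contributes 0.
        vector.append(t * math.log(total_docs / n) if t and n else 0.0)
    return vector


# Illustrative, made-up counts.
v = tfidf_vector({"a": 2}, {"a": 10, "b": 5}, 100, ["a", "b"])
print(v)
```

Using one shared vocabulary order for both the sensitive library vector V and the message vector V' keeps their components aligned, which the later cosine comparison relies on.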
Preferably, in the step (3), the label includes user information U(k, v), data attribute information Ad(k, v), the feature vector V', the generation time T, and the encryption identifier Fe, and is expressed as:
Label=(U(k,v),Ad(k,v),V',T,Fe)
the label encapsulation in the pre-proxy part comprises the following steps:
step 3-1-1, arranging the user information U(k, v), the data attribute information Ad(k, v), the feature vector V' and the generation time T in sequence, and dividing the sequence into N blocks;
step 3-1-2, randomly selecting N1 blocks from the N blocks, setting their encryption identifiers, and encrypting the data to obtain EN1;
step 3-1-3, recording the random selection process R, treating R as a block, setting its encryption identifier, and encrypting R to obtain ER;
step 3-1-4, leaving the encryption identifier unset for the remaining N2 (= N - N1) blocks;
step 3-1-5, calculating the length of EN1 and the length of ER, and concatenating the EN1 length, EN1, the ER length, ER and N2 to obtain the label-encapsulated private protocol data E;
the label parsing on the isolation service side comprises the following steps:
step 3-2-1, obtaining the private protocol data E;
step 3-2-2, extracting the EN1 length, extracting EN1 by means of the EN1 length, and decrypting EN1 to obtain N1;
step 3-2-3, extracting the ER length, extracting ER by means of the ER length, and decrypting ER to obtain R;
step 3-2-4, extracting the subsequent data N2;
step 3-2-5, restoring N1 and N2 to U(k, v), Ad(k, v), V' and T by means of the random selection process R.
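The encapsulation and parsing steps above can be illustrated with a round-trip sketch. Everything concrete here is an assumption: the names `encapsulate`, `parse` and `toy_cipher`, the 4-byte big-endian length fields, the JSON encoding of R, and above all the XOR stream used in place of encryption, which is a placeholder and not secure (the text does not name a cipher). The per-block encryption identifier is folded into R in this sketch rather than stored as a separate flag.

```python
import hashlib
import json
import random
import struct


def toy_cipher(data: bytes, key: bytes) -> bytes:
    """Placeholder XOR stream (NOT secure); applying it twice restores the input."""
    stream, counter = b"", 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


def encapsulate(payload: bytes, key: bytes, n_blocks: int, n_encrypted: int, rng=random) -> bytes:
    # Step 3-1-1: split the serialized label into (at most) n_blocks blocks.
    size = max(1, -(-len(payload) // n_blocks))  # ceiling division
    blocks = [payload[i:i + size] for i in range(0, len(payload), size)]
    # Step 3-1-2: randomly choose the blocks to encrypt and produce EN1.
    chosen = sorted(rng.sample(range(len(blocks)), min(n_encrypted, len(blocks))))
    en1 = toy_cipher(b"".join(blocks[i] for i in chosen), key)
    # Step 3-1-3: record the selection process R and encrypt it to ER.
    record = json.dumps({"idx": chosen, "n": len(blocks), "size": size}).encode()
    er = toy_cipher(record, key)
    # Step 3-1-4: the remaining blocks N2 stay unencrypted.
    n2 = b"".join(b for i, b in enumerate(blocks) if i not in chosen)
    # Step 3-1-5: E = EN1 length | EN1 | ER length | ER | N2.
    return struct.pack(">I", len(en1)) + en1 + struct.pack(">I", len(er)) + er + n2


def parse(e: bytes, key: bytes) -> bytes:
    # Steps 3-2-2 and 3-2-3: read each length field, then decrypt EN1 and ER.
    (len_en1,) = struct.unpack_from(">I", e, 0)
    n1 = toy_cipher(e[4:4 + len_en1], key)
    pos = 4 + len_en1
    (len_er,) = struct.unpack_from(">I", e, pos)
    record = json.loads(toy_cipher(e[pos + 4:pos + 4 + len_er], key))
    # Step 3-2-4: everything after ER is N2.
    n2 = e[pos + 4 + len_er:]
    # Step 3-2-5: interleave N1 and N2 blocks back into their original order.
    total, out, i1, i2 = len(n1) + len(n2), [], 0, 0
    for i in range(record["n"]):
        length = min(record["size"], total - i * record["size"])
        if i in record["idx"]:
            out.append(n1[i1:i1 + length])
            i1 += length
        else:
            out.append(n2[i2:i2 + length])
            i2 += length
    return b"".join(out)


# Illustrative label with made-up field values.
label = json.dumps({"U": {"id": "u1"}, "Ad": {"size": 2048}, "V": [0.5, 1.2], "T": 1448000000}).encode()
packet = encapsulate(label, b"demo-key", n_blocks=4, n_encrypted=2)
print(parse(packet, b"demo-key") == label)
```

Because R is itself encrypted, an observer of E cannot tell from the wire format alone which parts of N2 belong where in the original label; a real implementation would replace `toy_cipher` with an authenticated cipher and separate keys or nonces for EN1 and ER.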
Preferably, in the step (4), the label filtering filters the data through policy rules according to the data attributes provided by the client. A policy rule consists of a left bracket [, the keyword begin, an expression exp, the keyword end, and a right bracket ]. An expression consists of basic items and composite items: the basic items are variables var, numerical values and character strings, and the composite items are complex expressions formed by connecting variables, numerical values and character strings with unary and binary operators.
Preferably, the label filtering comprises the following steps:
step 4-1, extracting the user information U(k, v) and the data attribute information Ad(k, v), and reassigning them to new attribute information Ad'(k, v);
step 4-2, extracting a policy rule from the policy library;
step 4-3, traversing the policy rule expression exp and extracting each variable var in the expression;
step 4-4, using var as the key, extracting the corresponding value v from Ad'(k, v);
step 4-5, replacing var in the policy rule with v and evaluating the expression;
step 4-6, deciding from the evaluation result whether the data is filtered, and recording a log entry.
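Steps 4-3 to 4-5 can be sketched with a small expression evaluator. The text fixes only the outer form `[begin exp end]`; the Python-like operator syntax inside `exp`, the function name `evaluate_rule`, and the attribute names `size` and `dtype` are assumptions made for illustration. Python's `ast` module is used so that only whitelisted operators are evaluated.

```python
import ast
import operator

BINARY = {ast.Add: operator.add, ast.Sub: operator.sub,
          ast.Mult: operator.mul, ast.Div: operator.truediv}
COMPARE = {ast.Eq: operator.eq, ast.NotEq: operator.ne, ast.Lt: operator.lt,
           ast.LtE: operator.le, ast.Gt: operator.gt, ast.GtE: operator.ge}
UNARY = {ast.Not: operator.not_, ast.USub: operator.neg}


def evaluate_rule(rule: str, attributes: dict):
    """Evaluate a rule of the form [begin exp end] against Ad'(k, v)."""
    text = rule.strip()
    if not (text.startswith("[") and text.endswith("]")):
        raise ValueError("rule must be enclosed in [ ]")
    body = text[1:-1].strip()
    if not (body.startswith("begin ") and body.endswith(" end")):
        raise ValueError("rule must have the form [begin exp end]")
    tree = ast.parse(body[len("begin "):-len(" end")].strip(), mode="eval")

    def ev(node):
        if isinstance(node, ast.Expression):
            return ev(node.body)
        # Basic items: numerical values and character strings.
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float, str)):
            return node.value
        # Basic item: a variable, replaced by its value from the attributes.
        if isinstance(node, ast.Name):
            return attributes[node.id]
        # Composite items: unary, binary, boolean and comparison operators.
        if isinstance(node, ast.UnaryOp) and type(node.op) in UNARY:
            return UNARY[type(node.op)](ev(node.operand))
        if isinstance(node, ast.BinOp) and type(node.op) in BINARY:
            return BINARY[type(node.op)](ev(node.left), ev(node.right))
        if isinstance(node, ast.BoolOp):
            values = [ev(v) for v in node.values]
            return all(values) if isinstance(node.op, ast.And) else any(values)
        if isinstance(node, ast.Compare) and len(node.ops) == 1 and type(node.ops[0]) in COMPARE:
            return COMPARE[type(node.ops[0])](ev(node.left), ev(node.comparators[0]))
        raise ValueError("unsupported expression element")

    return ev(tree)


# Made-up attribute information and rule.
attrs = {"size": 2048, "dtype": "report"}
blocked = evaluate_rule("[begin size > 1024 and dtype == 'report' end]", attrs)
print(blocked)
```

Walking the syntax tree instead of calling `eval` means a rule can reference only the supplied attributes and the listed operators, which is one way to keep an extensible rule language from becoming an injection path.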
Preferably, the step (5) includes the steps of:
step 5-1, computing the cosine similarity between the feature vector V' and the feature vector V of the sensitive library on the isolation service side, the cosine similarity being calculated as:
$$\cos(V', V) = \frac{V' \cdot V}{\|V'\| \, \|V\|}$$
wherein V' and V are the two feature vectors, and $V' \cdot V$ is the standard vector dot product, defined as
$$V' \cdot V = \sum_{k=1}^{t} v'_k v_k$$
where $v'_k$ and $v_k$ are the $k$-th components of V' and V, and t is the dimension of the vectors; the norm $\|V'\|$ in the denominator is defined as
$$\|V'\| = \sqrt{\sum_{k=1}^{t} {v'_k}^2}$$
and the norm $\|V\|$ in the denominator is defined as
$$\|V\| = \sqrt{\sum_{k=1}^{t} v_k^2}$$
step 5-2, comparing the cosine similarity value with a predefined similarity threshold, determining whether the message carries secret-related information, and filtering out secret-related documents.
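Steps 5-1 and 5-2 amount to a cosine comparison against a threshold, sketched below. The function names and the threshold value 0.8 are assumptions; the text says only that the threshold is predefined.

```python
import math


def cosine_similarity(v1, v2):
    """Cosine of the angle between two equal-length vectors."""
    if len(v1) != len(v2):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(v1, v2))
    norm1 = math.sqrt(sum(a * a for a in v1))
    norm2 = math.sqrt(sum(b * b for b in v2))
    # Treat a zero vector as having no similarity to anything.
    if norm1 == 0 or norm2 == 0:
        return 0.0
    return dot / (norm1 * norm2)


def is_sensitive(message_vec, library_vec, threshold=0.8):
    """Flag the message when similarity reaches the (hypothetical) threshold."""
    return cosine_similarity(message_vec, library_vec) >= threshold


print(is_sensitive([1.0, 2.0, 0.0], [2.0, 1.0, 1.0]))
```

Only this comparison runs on the isolation device; the costly segmentation and weighting have already been done on the pre-proxy side.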
Compared with the prior art, the invention has the beneficial effects that:
the invention only needs to provide a private JDBC driver, does not need to open TNS protocol communication in a non-secret-involved network, and only opens TNS protocol communication in a secret-involved network, so that messages of networks at two sides of an isolation boundary are completely subjected to semantic translation without simple mapping relation, and an attacker of the non-secret-involved network cannot attack internal network TNS protocol loopholes, thereby realizing perfect protocol isolation and greatly improving the isolation strength.
The invention is based on the special isolation exchange architecture, the deep content analysis and the content characteristic value extraction are completed by moving to the preposed agent side, and only characteristic value matching is carried out on the isolation device side, so that the calculation requirement of the isolation boundary is greatly reduced. Under the environment of the power internet of things, the technology can utilize hundreds of millions of intelligent terminal devices to realize distributed content filtering calculation, so that high-efficiency low-delay distributed content filtering is realized. The contradiction between the filtering depth and the exchange efficiency is better solved.
According to the invention, the front-end agent is introduced, the boundary of the isolation exchange is moved to the terminal side, a large number of intelligent terminals in the power Internet of things environment are constructed based on the trusted computing concept, the front-end agent software running on the intelligent terminals can be combined with the trusted computing system of the intelligent terminals, and the whole isolation exchange system is brought into the trusted exchange system through the reinforcement of the private application layer protocol, so that the trusted isolation exchange is realized.
Drawings
FIG. 1 is a flow chart of a data isolation exchange and security filtering method for an electric power Internet of things provided by the invention
FIG. 2 is a flow chart of the present invention for extracting feature vectors in the pre-proxy portion
FIG. 3 is a flow chart of proprietary formatting of tag content by the tag and proprietary protocol encapsulation provided by the present invention
FIG. 4 is a flow chart of policy filtering provided by the present invention
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings.
As shown in fig. 1, the invention provides a data isolation exchange and security filtering method for an electric power internet of things, which adopts the following technical scheme:
step 1, construction of isolation framework
An isolation architecture based on a pre-proxy and a proprietary protocol is constructed. The architecture comprises a proprietary application layer exchange protocol based on the TCP/IP protocol, a dedicated security isolation device, and a pre-proxy. On the one hand, the device can decompose, recombine and exchange TCP/IP traffic at the hardware level; on the other hand, it supports only communication over the proprietary application layer protocol and rejects all third-party public application layer protocols. The pre-proxy may take the form of a driver, an SDK (software development kit), a hardware plug-in, or the like. Its main function in the architecture is to map the user interaction process into proprietary application layer protocol messages to realize data exchange; in an actual implementation, the pre-proxy can also provide terminal reinforcement and trusted authentication.
Step 2, feature vector extraction is realized in the preposed agent part, as shown in fig. 2
Firstly, preprocessing the content Ti in the message, then extracting the features to generate a message feature vector V' and a sensitive library feature vector V, and storing the extracted feature vectors into the label field of the message.
(1) Pretreatment of
Word segmentation analysis is performed on the text file through an ICTCLAS word segmentation interface; after word segmentation, the message content $T_i$ is expressed as:
$T_i = ((a_{i1}, l_{i1}, p_{i1}), (a_{i2}, l_{i2}, p_{i2}), \ldots, (a_{in}, l_{in}, p_{in}))$
wherein: $T_i$ denotes message $i$, $a_{in}$ denotes a segmented phrase, $l_{in}$ denotes the length of the phrase, and $p_{in}$ denotes the part of speech of the segmented phrase.
(2) Feature extraction
1) Part-of-speech selection
In the Chinese text, the keywords which can most strongly express the article content are selected according to the parts of speech and used for the subsequent feature extraction, which is beneficial to eliminating redundancy and simplifying the calculation process. Therefore, the noun word groups in the analyzed text word groups are extracted, and other parts of speech are deleted. Text file TiAfter part of speech selection, the expression is as follows:
[formula image for $Ta_i$ not reproduced]
in the formula: $Ta_i$ is the text after noun extraction, and its elements are the noun phrases and the lengths of the noun phrases.
2) Word frequency statistics
The occurrence frequency of the keywords is counted to form word segmentation triples containing the phrase, its occurrence frequency and its part of speech in the text. A word frequency item is added to $Ta_i$, which is further expressed as $Tb_i$ [formula image not reproduced].
in the formula: $Tb_i$ is the text after word frequency counting, and its elements are the phrases after word frequency counting, the lengths of those phrases, and their word frequencies.
3) Word length selection
In Chinese text, multi-character words have stronger expressive power than single characters, so the length of each keyword is calculated and single-character keywords are deleted. The result is further expressed as $Tc_i$ [formula image not reproduced].
in the formula: $Tc_i$ is the text after single-character keywords are deleted, and its elements are the phrases longer than one character and their word frequencies.
4) Word frequency selection
In Chinese text, words that appear only once are incidental and not representative, so phrases that appear only once are removed from the counted word segmentation triples. The final feature pair expression is obtained as $Td_i$ [formula image not reproduced].
wherein: $Td_i$ is the text after keywords that appear once are removed, and its elements are the remaining phrases and their word frequencies, subject to a condition given in a formula image that is not reproduced.
(3) Generating feature vectors
The calculation of the weight of the word is an effective method for measuring the characteristic value, and a TF-IDF formula based on a statistical method is widely used at present, and is proved to be feasible and effective in a large number of practical uses. The core idea is that the less the number of occurrences of a word in other texts, the more information the word contains and the more representative the type of document, and conversely, if the word also occurs in a large number in other documents, the less representative the word.
The currently common calculation formula for the TF-IDF is expressed as follows:
$d_{ij} = t_{ij} \cdot \log(N / n_j)$
wherein $t_{ij}$ is the frequency of the phrase $a_{ij}$ in the text $T_i$, equal to $f_{im}$ in $Td_i$; $N$ is the total number of documents; and $n_j$ is the number of documents in the document library that contain the phrase $a_{ij}$.
The feature vector composed of sensitive library data is represented as:
$V = ((a_{11}, d_{11}), (a_{12}, d_{12}), \ldots, (a_{1m}, d_{1m}), \ldots, (a_{n1}, d_{n1}), (a_{n2}, d_{n2}), \ldots, (a_{nm}, d_{nm}))$
for brevity, this is written as:
$V = (d_{11}, d_{12}, \ldots, d_{1m}, \ldots, d_{n1}, d_{n2}, \ldots, d_{nm})$
the feature vector of the message is obtained by the same method and simplified as:
$V' = (d'_{11}, d'_{12}, \ldots, d'_{1m}, \ldots, d'_{n1}, d'_{n2}, \ldots, d'_{nm})$
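As a worked illustration of the weight formula, with made-up numbers and the natural logarithm assumed (the text does not state the base), take a phrase that occurs $t_{ij} = 3$ times in the message and appears in $n_j = 10$ of $N = 1000$ library documents:

```latex
d_{ij} = t_{ij} \cdot \log\frac{N}{n_j}
       = 3 \cdot \ln\frac{1000}{10}
       = 3 \cdot \ln 100
       \approx 3 \times 4.605
       \approx 13.82
```

A phrase that appears in every document ($n_j = N$) gets $\log 1 = 0$ and therefore zero weight, which matches the core idea that widely shared words are less representative.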
step 3, realizing label encapsulation at the preposed agent part and realizing label analysis at the isolated service side
Label encapsulation and parsing involve three parts: the label itself, label and private protocol encapsulation, and label and private protocol parsing. At the sending end, the label is built from the user information of the accessing user, the attribute information of the data being sent, and the feature vector information of the data; the data is then encrypted in randomly chosen blocks through the private protocol and sent to the server side. At the server side, the data is first recovered by parsing, and the recovered data then serves label filtering and feature vector filtering.
The label comprises user information U, data attribute information Ad, the feature vector V', the generation time T, the encryption identifier Fe and other information:
Label=(U(k,v),Ad(k,v),V',T,Fe)
wherein:
1) the user information comprises user identity information and user request operation information, and is stored as key-value pairs;
2) the data attribute information includes data type, data size, data creator, data modification time, etc., and is also stored as key-value pairs;
3) the feature vector is used for feature-vector-based content filtering on the server side;
4) the generation time is the time at which the label was generated;
5) the encryption identifier marks whether each block of data is encrypted after the label has been divided into blocks, and is used by the server side when parsing the block data.
As shown in fig. 3, label and private protocol encapsulation puts the label content into a proprietary format, as follows:
step a: arranging the user information U(k, v), the data attribute information Ad(k, v), the feature vector V' and the generation time T in sequence, and dividing the sequence into N blocks;
step b: randomly selecting N1 blocks from the N blocks, setting their encryption identifiers, and encrypting the data to obtain EN1;
step c: recording the random selection process R, treating R as a block, setting its encryption identifier, and encrypting R to obtain ER;
step d: leaving the encryption identifier unset for the remaining N2 (= N - N1) blocks;
step e: calculating the length of EN1 and the length of ER, and then concatenating the EN1 length, EN1, the ER length, ER and N2 to obtain E, as shown in fig. 3.
After label and private protocol encapsulation, the result is sent to the server side as a message. The server side first parses the label and private protocol of the message and recovers the label values, as follows:
step a: acquiring the private protocol data E;
step b: extracting the EN1 length, extracting EN1 by means of the EN1 length, and decrypting EN1 to obtain N1;
step c: extracting the ER length, extracting ER by means of the ER length, and decrypting ER to obtain R;
step d: extracting the subsequent data N2;
step e: restoring N1 and N2 to U(k, v), Ad(k, v), V' and T by means of the random selection process R.
Step 4, realizing label filtration at the isolation service side
The label filtering filters the data according to the data attribute provided by the client by designing a flexible strategy rule.
Policy rules are a specification of policy filtering. The policy rules provide a unified policy description to be able to process the attribute information to achieve the purpose of filtering the data. In order to facilitate calculation and expansion, the strategy rule is designed into a self-defined expression which is composed of variables, values and operational characters. The variable values are extracted from the data attribute information according to the variables, and the operators and the variables are set by specific strategies. And during filtering, replacing the variable value with the attribute value, then calculating the strategy expression, and finally outputting the calculation result. Since the policy rules use expressions, the policy rules are very flexible.
The policy rules are formalized as follows:
A policy rule consists of a left bracket [, the keyword begin, an expression (exp), the keyword end, and a right bracket ]. The expression is made up of two kinds of items:
basic items: variables (var), numerical values (float and integer) and strings (string);
composite items: complex expressions formed by connecting variables, numerical values and strings with unary (opu) and binary (opb) operators.
The policy rules are described by using the self-defined expression, so that the filtering based on the policy is not only convenient, but also strong in operability and flexible in expansibility. The policy filtering process is shown in figure 4,
step a: extracting the user information U(k, v) and the data attribute information Ad(k, v) obtained in the parsing step, and reassigning them to new attribute information Ad'(k, v);
step b: extracting a policy rule from the policy library;
step c: traversing the policy rule expression exp and extracting each variable var in the expression;
step d: using var as the key, extracting the corresponding value v (an integer, floating-point number or character string) from Ad'(k, v);
step e: replacing var in the policy rule with v and evaluating the expression;
step f: deciding from the evaluation result whether the data is filtered, and recording a log entry.
Step 5, combining the feature vector and the label filtration to realize content filtration at the isolation service side
The feature vector V' is extracted from the parsed label, and its similarity to the feature vector V of the sensitive library on the isolation device is obtained by cosine calculation. The cosine similarity is calculated as:
$$\cos(V', V) = \frac{V' \cdot V}{\|V'\| \, \|V\|}$$
wherein V' and V are the two feature vectors, and $V' \cdot V$ is the standard vector dot product, defined as
$$V' \cdot V = \sum_{k=1}^{t} v'_k v_k$$
where $v'_k$ and $v_k$ are the $k$-th components of V' and V, and t is the dimension of the vectors. A vector can generally be written as a sequence of numbers, each of which is called a component, and the dimension is the number of components; for example, (a1, a2, a3) has dimension 3. The norm $\|V'\|$ in the denominator is defined as
$$\|V'\| = \sqrt{\sum_{k=1}^{t} {v'_k}^2}$$
and the norm $\|V\|$ is defined in the same way.
And comparing the cosine similarity value with a predefined similarity threshold value, analyzing whether the obtained message carries secret-related information, and filtering the secret-related documents to achieve the function of filtering the contents.
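A small worked example of the comparison, with made-up three-dimensional vectors $V' = (1, 2, 0)$ and $V = (2, 1, 1)$ and a hypothetical threshold of 0.8 (the text does not give a threshold value):

```latex
V' \cdot V = 1\cdot 2 + 2\cdot 1 + 0\cdot 1 = 4, \qquad
\|V'\| = \sqrt{1^2 + 2^2 + 0^2} = \sqrt{5}, \qquad
\|V\| = \sqrt{2^2 + 1^2 + 1^2} = \sqrt{6}

\cos(V', V) = \frac{4}{\sqrt{5}\,\sqrt{6}} = \frac{4}{\sqrt{30}} \approx 0.730
```

Since $0.730 < 0.8$, a message with this vector would not be treated as secret-related under the assumed threshold.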
Finally, it should be noted that: the above embodiments are only for illustrating the technical solutions of the present invention and not for limiting the same, and although the present invention is described in detail with reference to the above embodiments, those of ordinary skill in the art should understand that: modifications and equivalents may be made to the embodiments of the invention without departing from the spirit and scope of the invention, which is to be covered by the claims.

Claims (6)

1. A data isolation exchange and safety filtering method for the electric power Internet of things, characterized by comprising the following steps:
step (1) constructing an isolation framework based on a front proxy and a proprietary protocol;
step (2) extracting the feature vector in the front proxy part;
step (3) performing label encapsulation in the front proxy part and label parsing on the isolation service side;
step (4) performing label filtering on the isolation service side;
step (5) performing content filtering on the isolation service side by combining the feature vector with label filtering;
the step (2) comprises the following steps:
step 2-1, preprocessing the message content $T_i$;
step 2-2, performing feature extraction on the message content $T_i$;
step 2-3, generating the message feature vector V' and the feature vector V of the sensitive library on the isolation service side;
step 2-4, storing the extracted feature vector in the label field of the message;
the preprocessing performs word segmentation analysis on the text file through the ICTCLAS word segmentation interface, and the message content $T_i$ after word segmentation is expressed as:
$$T_i = ((a_{i1}, l_{i1}, p_{i1}), (a_{i2}, l_{i2}, p_{i2}), \ldots, (a_{in}, l_{in}, p_{in}))$$
where $T_i$ denotes message i, $a_{in}$ denotes a segmented phrase, $l_{in}$ denotes the length of the phrase, and $p_{in}$ denotes the part of speech of the segmented phrase;
the step 2-2 comprises the following steps:
step 2-2-1, performing part-of-speech selection on the message content $T_i$: the noun phrases among the segmented text phrases are extracted and phrases of other parts of speech are deleted; after part-of-speech selection the message content $T_i$ is expressed as:
[Figure FDA0002570513110000011]
where $Ta_i$ is the text after noun extraction, [Figure FDA0002570513110000012] [Figure FDA0002570513110000013] is a noun phrase, and [Figure FDA0002570513110000014] is the length of the noun phrase;
step 2-2-2, counting the number of occurrences of each keyword in the text to form word segmentation triples consisting of the phrase, its number of occurrences, and its part of speech, that is, adding a word frequency item to $Ta_i$, expressed as:
[Figure FDA0002570513110000021]
where $Tb_i$ is the text after word frequency counting, [Figure FDA0002570513110000022] is a phrase after word frequency counting, [Figure FDA0002570513110000023] is the length of that phrase, and [Figure FDA0002570513110000024] is the word frequency of [Figure FDA0002570513110000025];
step 2-2-3, calculating the length of each keyword and deleting single-character keywords, expressed as:
[Figure FDA0002570513110000026]
where $Tc_i$ is the text after single-character keywords are deleted, [Figure FDA0002570513110000027] is a phrase longer than one character, and [Figure FDA0002570513110000028] is the word frequency of [Figure FDA0002570513110000029];
step 2-2-4, removing phrases whose keyword appears only once, giving the final expression:
[Figure FDA00025705131100000210]
where $Td_i$ is the text after keywords that appear only once are removed, [Figure FDA00025705131100000211] is a phrase that remains after this removal, and [Figure FDA00025705131100000212] is the word frequency of [Figure FDA00025705131100000213], where [Figure FDA00025705131100000214];
the step 2-3 comprises the following steps:
step 2-3-1, calculating the weight of each phrase based on the TF-IDF formula:
$$d_{ij} = t_{ij} \cdot \log(N / n_j)$$
where $d_{ij}$ is the weight of phrase $a_{ij}$ in text $T_i$, $t_{ij}$ is the word frequency of $a_{ij}$ in $T_i$, equal to [Figure FDA00025705131100000215] in $Td_i$, N is the total number of documents, and $n_j$ is the number of documents in the document library that contain phrase $a_{ij}$;
step 2-3-2, the feature vector composed of the sensitive library data is expressed as:
$$V = ((a_{11}, d_{11}), (a_{12}, d_{12}), \ldots, (a_{1m}, d_{1m}), \ldots, (a_{n1}, d_{n1}), (a_{n2}, d_{n2}), \ldots, (a_{nm}, d_{nm}))$$
which is abbreviated as:
$$V = (d_{11}, d_{12}, \ldots, d_{1m}, \ldots, d_{n1}, d_{n2}, \ldots, d_{nm})$$
step 2-3-3, following step 2-3-2, the feature vector of the message is likewise abbreviated as:
$$V' = (d'_{11}, d'_{12}, \ldots, d'_{1m}, \ldots, d'_{n1}, d'_{n2}, \ldots, d'_{nm})$$
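Steps 2-2-1 to 2-3-1 can be illustrated with the short Python sketch below. It assumes the text has already been segmented into (phrase, length, part of speech) triples, that the tag "n" marks a noun, and that document frequencies for the library are supplied by the caller. The function names and the sample tokens are hypothetical, and no call to ICTCLAS is shown.

```python
import math
from collections import Counter


def extract_features(tokens, noun_tag="n"):
    """Map each surviving phrase to its word frequency (steps 2-2-1 to 2-2-4)."""
    # Step 2-2-1: keep noun phrases only, together with their lengths.
    nouns = [(phrase, length) for phrase, length, pos in tokens if pos == noun_tag]
    lengths = dict(nouns)
    # Step 2-2-2: count how often each phrase occurs.
    freq = Counter(phrase for phrase, _ in nouns)
    # Step 2-2-3 drops single-character phrases; step 2-2-4 drops phrases seen once.
    return {p: t for p, t in freq.items() if lengths[p] > 1 and t > 1}


def tfidf_weights(term_freq, doc_count, doc_freq):
    """Step 2-3-1: d = t * log(N / n) for every surviving phrase."""
    return {p: t * math.log(doc_count / doc_freq[p]) for p, t in term_freq.items()}


def to_vector(weights, vocabulary):
    """Lay the weights out in a fixed phrase order so V and V' are comparable."""
    return [weights.get(phrase, 0.0) for phrase in vocabulary]


# Hypothetical segmented message: "x" is a single-character noun, "run" is a verb.
tokens = [("grid", 4, "n"), ("grid", 4, "n"), ("run", 3, "v"),
          ("x", 1, "n"), ("x", 1, "n"), ("dispatch", 8, "n")]
features = extract_features(tokens)
weights = tfidf_weights(features, doc_count=10, doc_freq={"grid": 5})
vector = to_vector(weights, ["grid", "dispatch"])
```

Using one shared phrase order for the message vector and the sensitive library vector is what makes the later component-wise comparison meaningful.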
2. The method according to claim 1, wherein in the step (1) the isolation framework comprises a proprietary application layer exchange protocol based on the TCP/IP protocol, a dedicated security isolation device, and a front proxy, and maps the user interaction process onto application layer exchange protocol messages to implement data exchange.
3. The method according to claim 1, wherein in step (3) the label comprises user information U (k, v), data attribute information Ad (k, v), a feature vector V', a generation time T, and an encryption flag Fe, and is expressed as:
Label=(U(k,v),Ad(k,v),V',T,Fe)
the label encapsulation in the front proxy part comprises the following steps:
step 3-1-1, arranging the user information U (k, v), the data attribute information Ad (k, v), the feature vector V', and the generation time T in sequence, and dividing the sequence into N blocks;
step 3-1-2, randomly selecting N1 blocks from the N blocks, setting the encryption flag, and encrypting the data to obtain EN1;
step 3-1-3, recording the random selection process R, treating R as a block, setting the encryption flag, and encrypting R to obtain ER;
step 3-1-4, not setting the encryption flag for the remaining N-N1 blocks;
step 3-1-5, calculating the length of EN1 and the length of ER, and concatenating the EN1 length, EN1, the ER length, ER, and the remaining N-N1 blocks to obtain the label-encapsulated proprietary protocol data E;
the label parsing on the isolation service side comprises the following steps:
step 3-2-1, obtaining the proprietary protocol data E;
step 3-2-2, extracting the EN1 length, extracting EN1 according to that length, and decrypting EN1 to obtain the N1 blocks;
step 3-2-3, extracting the ER length, extracting ER according to that length, and decrypting ER to obtain R;
step 3-2-4, extracting the subsequent data, namely the remaining N-N1 blocks;
step 3-2-5, restoring the N1 blocks and the N-N1 blocks to U (k, v), Ad (k, v), V', and T according to the random selection process R.
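The encapsulation and parsing steps of claim 3 form a round trip, which the following Python sketch illustrates. Several details are assumptions not stated in the source: the label fields are taken to be already serialized to bytes, lengths are stored as 4-byte big-endian integers, the selection record R is a small JSON object, and the cipher is a toy XOR keystream used purely as a stand-in (it is not secure). All function names are hypothetical.

```python
import hashlib
import json
import random
import struct


def _toy_cipher(data, key):
    """XOR with a SHA-256 counter keystream. A placeholder only, NOT secure."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream.extend(hashlib.sha256(key + counter.to_bytes(8, "big")).digest())
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


def encapsulate(payload, key, n_blocks=4, n_encrypted=2, rng=None):
    """Steps 3-1-1 to 3-1-5: build E = len(EN1) | EN1 | len(ER) | ER | plain blocks."""
    rng = rng or random.Random()
    size = max(1, -(-len(payload) // n_blocks))           # ceiling division
    blocks = [payload[i:i + size] for i in range(0, len(payload), size)]
    chosen = sorted(rng.sample(range(len(blocks)), min(n_encrypted, len(blocks))))
    selected = b"".join(blocks[i] for i in chosen)
    rest = b"".join(b for i, b in enumerate(blocks) if i not in chosen)
    record = json.dumps({"chosen": chosen, "size": size,
                         "count": len(blocks)}).encode()  # selection process R
    en1 = _toy_cipher(selected, key)
    er = _toy_cipher(record, key + b"R")
    return struct.pack(">I", len(en1)) + en1 + struct.pack(">I", len(er)) + er + rest


def parse(data, key):
    """Steps 3-2-1 to 3-2-5: recover the original byte sequence from E."""
    (len_en1,) = struct.unpack_from(">I", data, 0)
    en1 = data[4:4 + len_en1]
    pos = 4 + len_en1
    (len_er,) = struct.unpack_from(">I", data, pos)
    er = data[pos + 4:pos + 4 + len_er]
    rest = data[pos + 4 + len_er:]
    selected = _toy_cipher(en1, key)
    record = json.loads(_toy_cipher(er, key + b"R"))
    size, count, chosen = record["size"], record["count"], set(record["chosen"])
    total = len(selected) + len(rest)
    blocks, sel_pos, rest_pos = [], 0, 0
    for i in range(count):
        # Every block has `size` bytes except possibly the last one.
        length = size if i < count - 1 else total - size * (count - 1)
        if i in chosen:
            blocks.append(selected[sel_pos:sel_pos + length])
            sel_pos += length
        else:
            blocks.append(rest[rest_pos:rest_pos + length])
            rest_pos += length
    return b"".join(blocks)
```

A production design would replace `_toy_cipher` with a vetted authenticated cipher; the sketch is only meant to show how the length prefixes and the record R let the receiver put the encrypted and plain blocks back in order.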
4. The method according to claim 3, wherein in the step (4) the label filtering filters the data through policy rules according to the data attributes provided by the client; a policy rule consists of a left bracket "[", the keyword begin, an expression exp, the keyword end, and a right bracket "]"; an expression consists of basic terms and compound terms, wherein the basic terms comprise variables var, numerical values, and character strings, and a compound term is a complex expression formed by connecting variables, numerical values, and character strings with unary and binary operators.
5. The method of claim 4, wherein the label filtering comprises the following steps:
step 4-1, extracting the user information U (k, v) and the data attribute information Ad (k, v) and reassigning them to new attribute information Ad' (k, v);
step 4-2, extracting a policy rule from the policy library;
step 4-3, traversing the policy rule expression exp and extracting each variable var in the expression;
step 4-4, using var as a key, extracting the corresponding value v from Ad' (k, v);
step 4-5, replacing var in the policy rule with v and evaluating the expression;
step 4-6, deciding from the evaluation result whether the data is filtered, and recording a log entry.
6. The method according to claim 5, wherein the step (5) comprises the following steps:
step 5-1, performing cosine calculation on the feature vector V' and the feature vector V of the sensitive library on the isolation service side to obtain a cosine similarity value, the cosine similarity being calculated as:
$$\cos(V', V) = \frac{V' \cdot V}{\|V'\| \, \|V\|}$$
where V' and V are the two feature vectors and $V' \cdot V$ is the standard vector dot product, defined as
$$V' \cdot V = \sum_{i=1}^{t} v'_i \, v_i$$
where $v'_i$ and $v_i$ are the i-th components of V' and V and t is the dimension of the vectors; the norm $\|V'\|$ in the denominator is defined as
$$\|V'\| = \sqrt{\sum_{i=1}^{t} {v'_i}^{2}}$$
and the norm $\|V\|$ in the denominator is defined as
$$\|V\| = \sqrt{\sum_{i=1}^{t} v_i^{2}}$$
step 5-2, comparing the cosine similarity value with a predefined similarity threshold to determine whether the message carries confidential information, and filtering out confidential documents.
CN201510824673.3A 2015-11-24 2015-11-24 Data isolation exchange and safety filtering method for power Internet of things Active CN105491023B (en)

Publications (2)

Publication Number    Publication Date
CN105491023A (en)     2016-04-13
CN105491023B (en)     2020-10-27

Family

ID=55677739

Country Status (1)

Country Link
CN (1) CN105491023B (en)

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant