CN113360522B - Method and device for rapidly identifying sensitive data - Google Patents

Method and device for rapidly identifying sensitive data

Info

Publication number
CN113360522B
Authority
CN
China
Prior art keywords
rule
result
identification
data
priority
Prior art date
Legal status
Active
Application number
CN202010145893.4A
Other languages
Chinese (zh)
Other versions
CN113360522A (en
Inventor
于策
冯昊
Current Assignee
Qianxin Technology Group Co Ltd
Secworld Information Technology Beijing Co Ltd
Original Assignee
Qianxin Technology Group Co Ltd
Secworld Information Technology Beijing Co Ltd
Application filed by Qianxin Technology Group Co Ltd, Secworld Information Technology Beijing Co Ltd
Priority to CN202010145893.4A
Publication of CN113360522A
Application granted
Publication of CN113360522B

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method and a device for rapidly identifying sensitive data, relates to the technical field of data processing, and aims to solve the problem of low sensitive-data identification efficiency in the prior art. The method mainly comprises: generating an identification strategy for the data to be identified according to preset identification rules, and selecting a priority identification rule in the identification strategy; extracting, from the data to be identified, the rule data corresponding to the rule type of the priority identification rule; scanning the rule data according to the priority identification rule to obtain a priority scanning result; and, when the identification strategy result of the identification strategy can be calculated from the priority scanning result, taking the identification strategy result as the identification result of whether the data to be identified is sensitive data. The method is mainly applied to the process of monitoring a network environment.

Description

Method and device for rapidly identifying sensitive data
Technical Field
The present invention relates to the field of data processing technologies, and in particular, to a method and apparatus for quickly identifying sensitive data.
Background
With the development of internet technology, data transmission through a network is a mainstream manner of information transmission, and data transmission may involve sensitive data. Because of the specificity of the sensitive data, it is necessary to identify whether the transmission data contains the sensitive data, and perform operations such as transmission restriction, encryption, analysis, and the like on the transmission data according to the identification result.
In the prior art, all of the data to be identified is generally scanned. A large amount of CPU and memory resources is wasted during scanning, and the scanning result contains a large amount of unnecessary information, which reduces the efficiency of sensitive data identification.
Disclosure of Invention
In view of the above, the present invention provides a method and apparatus for quickly identifying sensitive data, which mainly aims to solve the problem of low identification efficiency of sensitive data in the prior art.
According to one aspect of the present invention, there is provided a method of rapidly identifying sensitive data, comprising:
generating an identification strategy of data to be identified according to a preset identification rule, and selecting a priority identification rule in the identification strategy;
extracting rule data corresponding to the rule type in the data to be identified according to the rule type of the priority identification rule;
scanning the rule data according to a priority identification rule to obtain a priority scanning result;
when the recognition strategy result of the recognition strategy can be calculated from the priority scanning result, determining the recognition strategy result to be the recognition result of whether the data to be identified is sensitive data.
Further, the selecting a priority identification rule in the identification policy includes:
sequentially selecting the priority identification rules in the identification strategy in descending order of the execution efficiency of the preset identification rules.
Further, the extracting rule data corresponding to the rule type in the data to be identified according to the rule type of the priority identification rule includes:
searching a type identifier corresponding to the rule type of the priority identification rule, wherein the type identifier comprises at least one or a combination of more than one of the following: a channel protocol class identifier, an attribute class identifier and a content class identifier;
extracting rule data corresponding to the type identifier from the data to be identified, wherein the rule data is in one-to-one correspondence with the type identifier, and the rule data comprises at least one or a combination of more than one of the following: channel protocol data, attribute data, content data.
Further, the scanning the rule data according to the priority identification rule to obtain a priority scanning result includes:
scanning the rule data according to a preset scanning algorithm in the priority identification rule to obtain the priority scanning result, wherein the preset scanning algorithm is a character string matching algorithm or an artificial intelligence algorithm.
Further, before the determining that the recognition policy result is the recognition result of whether the data to be recognized is sensitive data, the method further includes:
extracting, according to a message digest algorithm, rule digest information of the preset scanning algorithm and the rule type in each preset recognition rule, and establishing a mapping relation between the rule digest information and the preset recognition rules;
searching, among the preset identification rules and according to the mapping relation, for a copy identification rule whose rule digest information is the same as that of the priority identification rule;
and assigning the priority scanning result as the scanning result of the copy identification rule.
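The digest-based reuse of scanning results described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the `Rule` class, the string form of the scanning algorithm, and the choice of SHA-256 as the message digest algorithm are all assumptions made for the example.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Hypothetical rule record; the field names are assumptions."""
    name: str
    rule_type: str   # e.g. "content", "attribute", "channel_protocol"
    algorithm: str   # assumed serialized description of the scanning algorithm


def rule_digest(rule):
    """Digest over the rule type and scanning algorithm (SHA-256 is an assumed choice)."""
    payload = f"{rule.rule_type}\x00{rule.algorithm}".encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def build_digest_map(rules):
    """Map each digest to the names of all rules that share it."""
    mapping = {}
    for rule in rules:
        mapping.setdefault(rule_digest(rule), []).append(rule.name)
    return mapping


def propagate_result(priority_rule, result, rules, results):
    """Copy the priority scanning result to every rule with the same digest."""
    digest_map = build_digest_map(rules)
    for name in digest_map[rule_digest(priority_rule)]:
        results[name] = result
    return results


rules = [
    Rule("rule1", "content", "substring:secret"),
    Rule("rule2", "attribute", "max_size:1024"),
    Rule("rule3", "content", "substring:secret"),  # same type and algorithm as rule1
]
results = propagate_result(rules[0], "hit", rules, {})
print(results)
```

Here `rule3` receives the result of `rule1` without being scanned, because both rules hash to the same digest.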
Further, the logical operators in the identification strategy comprise AND operations, and/or OR operations, and/or NOT operations; the priority scanning result is either a miss or a hit; the default scanning result of any preset recognition rule in the recognition strategy for which no scanning result has been acquired is 'uncertain';
judging whether the recognition strategy result of the recognition strategy can be calculated from the priority scanning result comprises the following steps:
in the identification strategy, searching for the logic operator corresponding to the priority identification rule according to the way the logic operators are combined;
if the logic operator corresponding to the priority identification rule is an AND operation and the priority scanning result is a miss, determining that the logic operator corresponding to the priority scanning result can produce a logic operation identification result, and taking the logic operation identification result to be a miss; and/or,
if the logic operator corresponding to the priority identification rule is an OR operation and the priority scanning result is a hit, determining that the logic operator corresponding to the priority scanning result can produce a logic operation identification result, and taking the logic operation identification result to be a hit; and/or,
if the logic operator corresponding to the priority identification rule is a NOT operation, determining that the logic operator corresponding to the priority scanning result can produce a logic operation identification result, wherein the logic operation identification result is 'uncertain' if the priority scanning result is 'uncertain', 'hit' if the priority scanning result is 'miss', and 'miss' if the priority scanning result is 'hit'; and/or,
if it is not determined that the logic operator corresponding to the priority scanning result can produce a logic operation recognition result, determining that the recognition strategy result cannot be calculated from the priority scanning result; and/or,
if it is determined that the logic operator corresponding to the priority scanning result can produce a logic operation identification result, searching, in the identification strategy and according to the way the logic operators are combined, for the logic operator corresponding to the logic operation identification result; and/or,
judging, according to the operator type of the logic operator corresponding to the logic operation recognition result, whether a further logic operation recognition result can be calculated, and repeating this judgment until the logic operation recognition result has no corresponding logic operator; if every logic operation recognition result along the way can be calculated, determining that the recognition strategy result can be calculated from the priority scanning result, and otherwise determining that the recognition strategy result cannot be calculated from the priority scanning result.
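The short-circuit judgment above behaves like a three-valued logic in which unscanned rules default to "uncertain". The following is a minimal sketch under that reading. The nested-tuple policy representation and the function name `evaluate` are assumptions, and the handling of cases the text does not spell out (for example, an AND whose operands are all hits) follows ordinary three-valued logic.

```python
HIT, MISS, UNCERTAIN = "hit", "miss", "uncertain"


def evaluate(node, results):
    """Evaluate a policy tree against the scanning results collected so far.

    A node is either a rule name (str) or a tuple ("and" | "or" | "not", operand, ...).
    Rules with no recorded result count as UNCERTAIN.
    """
    if isinstance(node, str):
        return results.get(node, UNCERTAIN)
    op, *operands = node
    values = [evaluate(child, results) for child in operands]
    if op == "not":
        return {HIT: MISS, MISS: HIT, UNCERTAIN: UNCERTAIN}[values[0]]
    if op == "and":
        if MISS in values:          # one miss decides an AND
            return MISS
        return UNCERTAIN if UNCERTAIN in values else HIT
    if op == "or":
        if HIT in values:           # one hit decides an OR
            return HIT
        return UNCERTAIN if UNCERTAIN in values else MISS
    raise ValueError(f"unknown operator: {op!r}")


# Hypothetical policy: (rule1 AND rule2) OR (NOT rule3)
policy = ("or", ("and", "rule1", "rule2"), ("not", "rule3"))

print(evaluate(policy, {"rule1": MISS}))   # AND is decided, OR is not
print(evaluate(policy, {"rule3": MISS}))   # NOT yields a hit, which decides the OR
```

The policy result can be used as soon as `evaluate` returns something other than `UNCERTAIN`; until then another rule has to be scanned.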
Further, after the determining that the recognition policy result is the recognition result of whether the data to be recognized is sensitive data, the method further includes:
when the priority scanning result cannot be calculated to obtain the recognition strategy result of the recognition strategy, recording the priority scanning result corresponding to the priority recognition rule, re-extracting the priority recognition rule in the recognition strategy, obtaining a secondary priority scanning result corresponding to the re-extracted priority recognition rule, and judging whether the recognition strategy result of the recognition strategy can be calculated according to the priority scanning result and the secondary priority scanning result.
Further, after the determining that the recognition policy result is the recognition result of whether the data to be recognized is sensitive data, the method further includes:
if the identification strategy result is a miss, determining that the data to be identified is not sensitive data;
if the identification strategy result is a hit, determining that the data to be identified is sensitive data, and processing the data to be identified by prohibiting its transmission, generating an alarm pop-up window, or tracing its source.
According to another aspect of the present invention, there is provided an apparatus for rapidly identifying sensitive data, comprising:
the generation module is used for generating an identification strategy of the data to be identified according to a preset identification rule, and selecting a priority identification rule in the identification strategy;
the extraction module is used for extracting rule data corresponding to the rule type in the data to be identified according to the rule type of the priority identification rule;
the acquisition module is used for scanning the rule data according to the priority identification rule to acquire a priority scanning result;
and the determining module is used for determining the identification strategy result to be the identification result of whether the data to be identified is sensitive data when the identification strategy result of the identification strategy can be calculated from the priority scanning result.
According to still another aspect of the present invention, there is provided a storage medium having stored therein at least one executable instruction for causing a processor to perform operations corresponding to the method for rapidly identifying sensitive data as described above.
According to still another aspect of the present invention, there is provided a computer device including: the device comprises a processor, a memory, a communication interface and a communication bus, wherein the processor, the memory and the communication interface complete communication with each other through the communication bus;
the memory is used for storing at least one executable instruction, and the executable instruction enables the processor to execute the operation corresponding to the method for quickly identifying the sensitive data.
According to a further aspect of the present invention there is provided a computer program product comprising a computer program stored on a non-transitory computer readable storage medium, the computer program comprising program instructions, characterized in that the program instructions, when executed by a computer, cause the computer to perform the method steps of quickly identifying sensitive data.
The technical solution provided by the embodiments of the invention has at least the following advantages:
The invention provides a method and a device for rapidly identifying sensitive data. An identification strategy for the data to be identified is first generated according to preset identification rules, and a priority identification rule in the identification strategy is selected. Rule data corresponding to the rule type of the priority identification rule is then extracted from the data to be identified, and the rule data is scanned according to the priority identification rule to obtain a priority scanning result. Finally, when the identification strategy result of the identification strategy can be calculated from the priority scanning result, the identification strategy result is taken as the identification result of whether the data to be identified is sensitive data. Compared with the prior art, the embodiments of the invention select from the data to be identified only the rule data corresponding to the rule type; if the identification strategy result can be determined from that rule data, the remaining data is not scanned. Extracting as little rule data as possible, scanning as little of the data to be identified as possible, and deriving the identification strategy result from as few priority scanning results as possible reduces the occupation of computer system resources and improves the efficiency of sensitive data identification.
The foregoing is only an overview of the technical solution of the present invention. So that the technical means of the invention can be understood more clearly and implemented in accordance with the description, and so that the above and other objects, features and advantages of the invention become more readily apparent, specific embodiments of the invention are set forth below.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to designate like parts throughout the figures. In the drawings:
FIG. 1 is a flow chart of a method for quickly identifying sensitive data according to an embodiment of the present invention;
FIG. 2 is a flow chart of another method for quickly identifying sensitive data provided by an embodiment of the present invention;
FIG. 3 is a block diagram showing an apparatus for quickly identifying sensitive data according to an embodiment of the present invention;
FIG. 4 is a block diagram showing another apparatus for quickly identifying sensitive data according to an embodiment of the present invention;
FIG. 5 shows a schematic structural diagram of a computer device according to an embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
Monitoring network transmission data and judging whether it may threaten national security, threaten network security, or affect the network environment can be generalized as identifying whether the transmitted data includes sensitive data. As more and more data is transmitted over networks, monitoring that data takes longer and longer, which degrades the real-time performance of data transmission and harms the user experience. Improving the efficiency of sensitive data identification has therefore become an urgent problem.
The embodiment of the invention provides a method for quickly identifying sensitive data, as shown in fig. 1, which comprises the following steps:
101. Generating an identification strategy of the data to be identified according to the preset identification rules, and selecting a priority identification rule in the identification strategy.
A preset identification rule comprises a scanning algorithm, used to identify whether the data to be identified is sensitive data, and a rule type. A preset identification rule may be a protocol rule, a file size rule, a keyword rule, or the like. Different preset identification rules have different preset scanning algorithms and rule types. For example, if the preset identification rule is a protocol rule, the preset scanning algorithm is a string matching algorithm and the rule type is set to the channel protocol class. If the preset identification rule is a keyword rule, the preset scanning algorithm may be a multi-pattern string matching algorithm and the rule type is set to the content class.
The identification policy comprises at least one preset identification rule. If it comprises several, they may be entirely different rules or partly identical rules; the embodiments of the application do not limit how the preset identification rules contained in the identification policy differ. The identification policy applies logic operations to the identification results obtained by identifying the data to be identified according to the preset identification rules. The specific logic operation is set according to actual requirements. For example, if data to be identified that contains "character string 1" or "character string 2" is to be treated as sensitive data, then the results of the preset identification rule that identifies "character string 1" and of the preset identification rule that identifies "character string 2" must be combined with an OR operation before a conclusion can be drawn as to whether the data to be identified is sensitive data. The application does not limit the method of logic operation.
An identification policy may contain multiple preset identification rules. The rules may be organized in a tiled (flat) manner or in a layered manner; the choice has little effect on sensitive-data identification performance, and the embodiments of the application do not limit how the preset identification rules are organized within the identification policy. Suppose the identification policy comprises three preset identification rules, rule 1, rule 2 and rule 3, combined by the logic operation "rule 1 & rule 2 | !rule 3"; this is the tiled organization mode. Suppose instead that the identification policy comprises rule group 1 and rule group 2, where rule group 1 contains rule 1 and rule 2 joined by "&", and rule group 1 and rule group 2 are joined by "|"; this is the layered organization mode. Choosing an organization mode is equivalent to constraining the calculation priority among the preset identification rules.
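To illustrate how the organization of a policy fixes the calculation priority, the sketch below parses a policy expression into a nested tuple. It is only an illustration: the parser, the identifiers `rule1` to `rule3`, the use of parentheses to mark rule groups, and the precedence order (`!` before `&` before `|`) are assumptions, since the text does not state a precedence.

```python
import re


def parse(text):
    """Parse a policy such as 'rule1 & rule2 | !rule3' into nested tuples."""
    tokens = re.findall(r"[A-Za-z_]\w*|[&|!()]", text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of expression")
        tok = tokens[pos]
        pos += 1
        return tok

    def parse_or():                      # lowest precedence
        node = parse_and()
        while peek() == "|":
            take()
            node = ("or", node, parse_and())
        return node

    def parse_and():
        node = parse_not()
        while peek() == "&":
            take()
            node = ("and", node, parse_not())
        return node

    def parse_not():                     # highest precedence, also handles groups
        if peek() == "!":
            take()
            return ("not", parse_not())
        if peek() == "(":
            take()
            node = parse_or()
            if take() != ")":
                raise ValueError("missing closing parenthesis")
            return node
        tok = take()
        if tok in ("&", "|", ")"):
            raise ValueError(f"unexpected token {tok!r}")
        return tok

    tree = parse_or()
    if pos != len(tokens):
        raise ValueError(f"unexpected token {tokens[pos]!r}")
    return tree


tiled = parse("rule1 & rule2 | !rule3")
layered = parse("rule1 & (rule2 | !rule3)")   # hypothetical grouping
print(tiled)
print(layered)
```

The same three rules produce different trees, so grouping changes which partial results are enough to decide the policy.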
A priority recognition rule is selected from all preset recognition rules of the recognition strategy according to a preset selection rule. The preset selection rule may select rules in their left-to-right order in the recognition strategy, in ascending order of the algorithmic complexity of their scanning algorithms, or in ascending order of the data volume typically associated with their rule types. The embodiments of the invention do not limit how the priority recognition rule is selected.
102. Extracting rule data corresponding to the rule type in the data to be identified according to the rule type of the priority identification rule.
The priority identification rule is itself a preset identification rule, and the rule data comprises channel protocol data, attribute data and content data. If the data to be identified includes rule data corresponding to the rule type, that rule data is extracted; if it does not, the data content of the rule data is set to null.
103. Scanning the rule data according to the priority identification rule to obtain a priority scanning result.
The priority scanning result indicates the relationship between the rule data and sensitive data, and is either a miss or a hit. A miss means that the rule data is not sensitive data; a hit means that it is. If the data content of the rule data is empty, the priority scanning result is a miss. Each priority identification rule whose rule data has been scanned is marked, so that the preset identification rules in the identification strategy that have not yet been scanned can be determined. The priority scanning result is obtained by scanning only the rule data in the data to be identified rather than all of the data to be identified. The scanning result is the same either way, but the amount of data scanned differs: scanning part of the data to be identified yields an accurate scanning result with the least amount of scanning, which improves scanning efficiency without sacrificing accuracy.
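Steps 102 and 103 can be sketched as follows, assuming the data to be identified has already been split into parts keyed by rule type. The dictionary layout, the key names and the plain substring match are assumptions chosen for brevity; the text leaves the extraction format and the scanning algorithm open.

```python
def extract_rule_data(item, rule_type):
    """Return only the part of the item that matches the rule type, or b'' if absent."""
    return item.get(rule_type, b"")


def scan(rule, item):
    """Scan just the extracted rule data; empty rule data is always a miss."""
    data = extract_rule_data(item, rule["type"])
    if not data:
        return "miss"
    # Assumed scanning algorithm: a simple substring match on bytes.
    return "hit" if rule["keyword"] in data else "miss"


# Hypothetical data to be identified, with no attribute data present.
item = {
    "channel_protocol": b"HTTP",
    "content": b"quarterly secret figures",
}
keyword_rule = {"type": "content", "keyword": b"secret"}
file_name_rule = {"type": "attribute", "keyword": b".docx"}

print(scan(keyword_rule, item))    # content data contains the keyword
print(scan(file_name_rule, item))  # no attribute data, so the result is a miss
```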
104. When the identification strategy result of the identification strategy can be calculated from the priority scanning result, determining the identification strategy result to be the identification result of whether the data to be identified is sensitive data.
If the recognition strategy result can be calculated from the priority scanning result, then extracting and scanning only the rule data is enough to obtain the recognition strategy result, and that result is uniquely determined. Like the priority scanning result, the recognition strategy result is either a miss or a hit. Suppose the recognition strategy comprises rule 1 and rule 2 joined by "&". If rule 1 is chosen as the priority recognition rule and scanning its rule data yields "miss", the recognition strategy result is "miss" by the "&" operation rule. Suppose instead that rule 1 and rule 2 are joined by "|". If rule 1 is chosen as the priority recognition rule and scanning its rule data yields "hit", the recognition strategy result is "hit" by the "|" operation rule.
Like the recognition strategy result and the priority scanning result, the recognition result is either a miss or a hit. A miss means that the data to be identified is not sensitive data; a hit means that it is. If the recognition strategy result cannot be calculated from the priority scanning result, the priority scanning result of the priority recognition rule is recorded and a new priority recognition rule is extracted from the recognition strategy; that is, new priority scanning results are repeatedly acquired and judged until the recognition strategy result can be calculated. From the second round onward, whether the recognition strategy result can be calculated is judged from all priority scanning results obtained so far.
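The repeat-until-decided loop can be sketched as follows. This is a hedged illustration: the tuple-based policy, the use of `None` for "uncertain", the scanner callables and the field names are all assumptions made for the example.

```python
def evaluate(node, results):
    """Three-valued evaluation: True = hit, False = miss, None = uncertain."""
    if isinstance(node, str):
        return results.get(node)
    op, *children = node
    values = [evaluate(child, results) for child in children]
    if op == "not":
        return None if values[0] is None else not values[0]
    decisive = (op == "or")          # a hit decides OR, a miss decides AND
    if decisive in values:
        return decisive
    return None if None in values else (not decisive)


def identify(policy, ordered_rules, scanners, item):
    """Scan rules one at a time and stop as soon as the policy result is determined."""
    results, scanned = {}, []
    for name in ordered_rules:
        results[name] = scanners[name](item)   # record this priority scanning result
        scanned.append(name)
        outcome = evaluate(policy, results)    # judge using all results so far
        if outcome is not None:
            return outcome, scanned
    raise ValueError("policy refers to a rule that was never scanned")


# Hypothetical rules and policy.
scanners = {
    "protocol": lambda item: item["protocol"] == "HTTP",
    "keyword": lambda item: "secret" in item["content"],
}
policy = ("and", "protocol", "keyword")
order = ["protocol", "keyword"]

print(identify(policy, order, scanners, {"protocol": "FTP", "content": "secret plan"}))
print(identify(policy, order, scanners, {"protocol": "HTTP", "content": "secret plan"}))
```

In the first call the protocol rule misses, so the AND is decided and the keyword rule is never scanned; in the second call both rules have to be scanned.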
The process of identifying sensitive data may involve several identification strategies, which can be run in an interrupt mode or in an audit mode. In the interrupt mode, scanning stops as soon as any one strategy is hit; in the audit mode, all identification strategies are scanned. The scanning mode can be chosen according to actual requirements. If sensitive data must be identified quickly, the interrupt mode is used; if the positions of all sensitive data in the data to be identified must be found, the audit mode is used.
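The difference between the two modes can be sketched as follows, with each identification strategy reduced to a hypothetical predicate over the data to be identified. The function name, the mode strings and the sample fields are assumptions.

```python
def run_policies(policies, item, mode="interrupt"):
    """Return the names of the policies that hit.

    mode="interrupt": stop at the first hit.
    mode="audit": evaluate every policy and collect all hits.
    """
    if mode not in ("interrupt", "audit"):
        raise ValueError(f"unknown mode: {mode!r}")
    hits = []
    for name, check in policies:
        if check(item):
            hits.append(name)
            if mode == "interrupt":
                break
    return hits


# Hypothetical policies and data.
policies = [
    ("protocol_policy", lambda item: item["protocol"] == "HTTP"),
    ("keyword_policy", lambda item: "secret" in item["content"]),
    ("size_policy", lambda item: item["size"] > 10_000),
]
item = {"protocol": "HTTP", "content": "top secret", "size": 512}

print(run_policies(policies, item, mode="interrupt"))
print(run_policies(policies, item, mode="audit"))
```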
The invention provides a method for rapidly identifying sensitive data. An identification strategy for the data to be identified is first generated according to preset identification rules, and a priority identification rule in the identification strategy is selected. Rule data corresponding to the rule type of the priority identification rule is then extracted from the data to be identified, and the rule data is scanned according to the priority identification rule to obtain a priority scanning result. Finally, when the identification strategy result of the identification strategy can be calculated from the priority scanning result, the identification strategy result is taken as the identification result of whether the data to be identified is sensitive data. Compared with the prior art, the embodiments of the invention select from the data to be identified only the rule data corresponding to the rule type; if the identification strategy result can be determined from that rule data, the remaining data is not scanned. Extracting as little rule data as possible, scanning as little of the data to be identified as possible, and deriving the identification strategy result from as few priority scanning results as possible reduces the occupation of computer system resources and improves the efficiency of sensitive data identification.
The embodiment of the invention provides another method for quickly identifying sensitive data, as shown in fig. 2, which comprises the following steps:
201. Generating an identification strategy of the data to be identified according to the preset identification rules, and selecting a priority identification rule in the identification strategy.
The identification strategy is used for carrying out logic operation on an identification result of identifying the data to be identified according to a preset identification rule. The preset recognition rules comprise a preset scanning algorithm, and rule types of the preset recognition rules comprise channel protocol classes, attribute classes and content classes. Each rule type is provided with a corresponding preset scanning algorithm and rule data, the rule types and the preset scanning algorithm are in one-to-one correspondence, and the rule types and the rule data types in the same data to be identified are also in one-to-one correspondence. And selecting an appropriate scanning algorithm according to different rule types so as to efficiently identify rule data in the data to be identified.
The priority identification rule is a preset identification rule in the identification strategy for which no scanning result has yet been acquired. The preset identification rules are arranged in descending order of execution efficiency, and the priority identification rules are extracted from the identification strategy in that order. The scanning algorithms of the preset identification rules differ in algorithmic complexity. In general, the lower the complexity of the scanning algorithm, the higher the execution efficiency of the corresponding preset identification rule. If the scanning algorithms have the same complexity, then the smaller the data volume of the rule data corresponding to the rule type, the higher the execution efficiency of the corresponding preset identification rule. The data volume of the rule data is judged from the data volume that the rule type corresponds to in theory. For example, the attribute class only describes the attributes of a file, so its data volume is generally considered small, whereas the content class covers the actual content of the file, so its data volume is considered larger than that of the attribute class. Therefore, ordering the preset identification rules from high to low execution efficiency means ordering them from low to high scanning-algorithm complexity and, where the complexity is equal, from small to large theoretical data volume of the rule type.
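The ordering described above amounts to a two-key sort. The sketch below assumes numeric ranks for algorithm complexity and for the theoretical data volume of each rule type. The rank tables, algorithm names and rule names are hypothetical, and the placement of the channel protocol class below the attribute class is an assumption; the text only states that attribute data is smaller than content data.

```python
# Assumed ranks: lower means cheaper to execute.
COMPLEXITY_RANK = {"exact_match": 0, "multi_pattern_match": 1, "ai_model": 2}
DATA_VOLUME_RANK = {"channel_protocol": 0, "attribute": 1, "content": 2}


def efficiency_key(rule):
    """Sort key: algorithm complexity first, theoretical data volume as tie-breaker."""
    return (COMPLEXITY_RANK[rule["algorithm"]], DATA_VOLUME_RANK[rule["type"]])


def next_priority_rule(rules, results):
    """Pick the most efficient rule that has no scanning result yet, or None."""
    for rule in sorted(rules, key=efficiency_key):
        if rule["name"] not in results:
            return rule["name"]
    return None


rules = [
    {"name": "keyword", "algorithm": "multi_pattern_match", "type": "content"},
    {"name": "protocol", "algorithm": "exact_match", "type": "channel_protocol"},
    {"name": "file_name", "algorithm": "multi_pattern_match", "type": "attribute"},
    {"name": "classifier", "algorithm": "ai_model", "type": "content"},
]

ordered = [rule["name"] for rule in sorted(rules, key=efficiency_key)]
print(ordered)
print(next_priority_rule(rules, {"protocol": "hit"}))
```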
202. Extract the rule data corresponding to the rule type from the data to be identified according to the rule type of the priority identification rule.
To characterize specific data to be identified, data types such as the data transmission protocol, coding mode, correction mode, data file name, memory size occupied by the data, data format and data content are generally used. These data types can be divided into three classes: attribute data, channel protocol data and content data. To cover all data content of the data to be identified, the rule data is correspondingly set to comprise attribute data, channel protocol data and content data, and a type identifier is set for each class of data in one-to-one correspondence.
Extracting the rule data corresponding to the rule type from the data to be identified according to the rule type of the priority identification rule specifically comprises: searching for the type identifier corresponding to the rule type of the priority identification rule, wherein the type identifier comprises at least one or a combination of more than one of the following: a channel protocol class identifier, an attribute class identifier and a content class identifier; and extracting the rule data corresponding to the type identifier from the data to be identified, wherein the rule data is in one-to-one correspondence with the type identifier, and the rule data comprises at least one or a combination of more than one of the following: channel protocol data, attribute data and content data. The data to be identified is usually binary data. The rule data corresponding to each rule type has its own specific type identifier, and the rule data is extracted according to that type identifier.
In this step, if the type identifier is the channel protocol class identifier, the channel protocol data in the data to be identified is extracted; if the type identifier is the attribute class identifier, the attribute data in the data to be identified is extracted; and if the type identifier is the content class identifier, the text content data in the data to be identified is extracted. The attribute data, i.e. the metadata class, includes the file name, the file size, the binary format data of the file, and the like. For example, the HTTP protocol, the TCP protocol, the UDP protocol and the like may be used in transmitting the data; if data to be identified that is transmitted using the HTTP protocol is set as sensitive data, only the transmission protocol used by the data to be identified needs to be extracted.
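A minimal sketch of extraction by type identifier follows. The text does not say how the binary data is parsed, so the sketch assumes each part of the data is available behind a loader function keyed by a type identifier; the identifier strings, `make_part` and `extract_rule_data` are all hypothetical names.

```python
# Hypothetical type identifiers, one per rule type (one-to-one, as in the text).
TYPE_IDENTIFIERS = {
    "channel_protocol": "PROTO",
    "attribute": "ATTR",
    "content": "CONTENT",
}

extracted_parts = []  # records which parts were actually extracted


def make_part(type_id, value):
    """Wrap a value in a loader so extraction only happens on demand."""
    def load():
        extracted_parts.append(type_id)
        return value
    return load


def extract_rule_data(data_to_identify, rule_type):
    """Look up the type identifier for the rule type and extract only that part."""
    type_id = TYPE_IDENTIFIERS[rule_type]
    return data_to_identify[type_id]()


# Stand-in for parsed data to be identified (the parsing itself is assumed).
sample = {
    "PROTO": make_part("PROTO", "HTTP"),
    "ATTR": make_part("ATTR", {"file_name": "report.docx", "file_size": 20480}),
    "CONTENT": make_part("CONTENT", "quarterly figures"),
}

protocol = extract_rule_data(sample, "channel_protocol")
print(protocol, extracted_parts)
```

Only the channel protocol part is loaded here, which mirrors the HTTP example: when the rule concerns the transmission protocol, the attribute and content parts are left untouched.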
203. Scan the rule data according to the preset scanning algorithm in the priority identification rule to obtain a priority scanning result.
Preset scanning algorithms include, but are not limited to, string matching algorithms and artificial intelligence algorithms. A string matching algorithm judges whether the rule data matches a preset character string. For example, assume that the preset character string is "Beijing earthquake" and the rule data is attribute data; if the four consecutive Chinese characters for "Beijing earthquake" are detected during scanning in the attribute data of the data to be identified, the priority scanning result is "hit". Artificial intelligence algorithms are typically used to find the semantics, emotion or attitude of the data content. For example, assume that the rule data is content data and the rule judges whether the emotion is happy; during scanning, the emotion in the content data of the data to be identified is extracted by the artificial intelligence algorithm, and if the emotion is happy, the priority scanning result is "hit". The embodiments of the application do not limit the specific method adopted by the artificial intelligence algorithm.
The priority scanning result indicates the relationship between the rule data and the sensitive data, and is either a miss or a hit. Specifically, the rule data is scanned according to the preset scanning algorithm in the priority identification rule to obtain the priority scanning result.
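The string matching case can be sketched as follows. The sketch assumes that "matching" means the preset character string occurs somewhere in the rule data, as in the "Beijing earthquake" example; the function name and the sample file names are hypothetical.

```python
HIT, MISS = "hit", "miss"


def scan_string_match(rule_data: str, preset_string: str) -> str:
    """Return "hit" if the preset string occurs in the rule data, else "miss"."""
    return HIT if preset_string in rule_data else MISS


# Attribute data reduced to a file name for the purpose of the example.
result_a = scan_string_match("Beijing earthquake report.docx", "Beijing earthquake")
result_b = scan_string_match("holiday photos.zip", "Beijing earthquake")
print(result_a, result_b)
```

An artificial intelligence algorithm would plug into the same place by returning the same two values, which is why the later steps only need to know whether a rule hit or missed.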
204. Extract, according to an information digest algorithm, rule digest information from the preset scanning algorithm and rule type in each preset identification rule, and establish a mapping relationship between the rule digest information and the preset identification rules.
The identification strategy contains at least 2 preset identification rules. The rule digest information of each preset identification rule is extracted by an information digest algorithm, using the preset scanning algorithm and the rule type in the preset identification rule as the basic information. After the rule digest information is extracted, a mapping relationship between the rule digest information and the preset identification rules needs to be established. The information digest algorithm may be the MD5 algorithm.
205. Search the preset identification rules, according to the mapping relationship, for a copy identification rule whose rule digest information is the same as that of the priority identification rule.
A copy identification rule is a preset identification rule that is the same as the priority identification rule. First, the rule digest information of the priority identification rule is acquired and compared one by one with the rule digest information of the preset identification rules. When rule digest information identical to that of the priority identification rule is found, the preset identification rule corresponding to that digest information, other than the priority identification rule itself, is determined to be a copy identification rule.
206. Assign the priority scanning result to the copy identification rule as its scanning result.
The identification strategy must contain at least 2 preset identification rules; otherwise there is no need to search for copy identification rules. The rule digest information of each preset identification rule is extracted with the MD5 algorithm, based on the preset scanning algorithm and the rule type in the preset identification rule. After the rule digest information is extracted, a mapping relationship between the rule digest information and the preset identification rules is established. According to this mapping relationship, the copy identification rules that are the same as the priority identification rule are searched for in the identification strategy, and the priority scanning result is then assigned as the scanning result of each copy identification rule. This assignment avoids scanning the same preset identification rule repeatedly, which improves the efficiency of identifying sensitive data.
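Steps 204 to 206 can be sketched with Python's standard `hashlib.md5`. The rule representation is an assumption: each rule is a dictionary whose `algorithm` string is taken to include the algorithm's parameters (for example the preset character string), so that two rules share a digest only when they would scan identically.

```python
import hashlib
from collections import defaultdict


def rule_digest(rule):
    """MD5 digest over the scanning algorithm and rule type of a rule."""
    basis = f"{rule['algorithm']}|{rule['rule_type']}".encode("utf-8")
    return hashlib.md5(basis).hexdigest()


def build_digest_map(rules):
    """Map each digest to the names of all preset rules that share it."""
    mapping = defaultdict(list)
    for name, rule in rules.items():
        mapping[rule_digest(rule)].append(name)
    return mapping


def propagate_result(rules, results, priority_name, priority_result):
    """Record the priority scanning result and copy it to all copy rules."""
    mapping = build_digest_map(rules)
    for name in mapping[rule_digest(rules[priority_name])]:
        results[name] = priority_result  # includes the priority rule itself
    return results


# Hypothetical rules: rule_1 and rule_3 are identical, rule_2 is not.
rules = {
    "rule_1": {"algorithm": "string_match:confidential", "rule_type": "content"},
    "rule_2": {"algorithm": "string_match:HTTP", "rule_type": "channel_protocol"},
    "rule_3": {"algorithm": "string_match:confidential", "rule_type": "content"},
}
results = propagate_result(rules, {}, "rule_1", "hit")
print(results)
```

After `rule_1` is scanned once, `rule_3` receives the same result without a second scan, while `rule_2` remains without a result.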
207. When the identification strategy result of the identification strategy can be calculated from the priority scanning result, determine that the identification strategy result is the identification result of whether the data to be identified is sensitive data.
Judging whether the identification strategy result can be calculated from the priority scanning result is similar to an ordinary logical operation. The logical operators in the identification strategy comprise AND, OR and NOT operations, and the priority scanning result is either a miss or a hit. The default scanning result of a preset identification rule in the identification strategy whose scanning result has not been acquired is "uncertain". The "uncertain" default scanning result must be handled specially in the logical operation; if the priority scanning result is "hit" or "miss", the logical operation is the same as the ordinary operation process.
The judging process specifically comprises the following steps: in the identification strategy, searching for the logical operator corresponding to the priority identification rule according to the way the logical operators are combined;
if the logical operator corresponding to the priority identification rule is an AND operation and the priority scanning result is a miss, determining that the logical operator corresponding to the priority scanning result can be calculated to obtain a logical operation identification result, and acquiring the logical operation identification result "miss";
if the logical operator corresponding to the priority identification rule is an OR operation and the priority scanning result is a hit, determining that the logical operator corresponding to the priority scanning result can be calculated to obtain a logical operation identification result, and acquiring the logical operation identification result "hit";
if the logical operator corresponding to the priority identification rule is a NOT operation, determining that the logical operator corresponding to the priority scanning result can be calculated to obtain a logical operation identification result, and acquiring the logical operation identification result "uncertain" when the priority scanning result is "uncertain", "hit" when the priority scanning result is "miss", and "miss" when the priority scanning result is "hit";
if it is not determined that the logical operator corresponding to the priority scanning result can be calculated to obtain a logical operation identification result, determining that the identification strategy result cannot be calculated from the priority scanning result;
if it is determined that the logical operator corresponding to the priority scanning result can be calculated to obtain a logical operation identification result, searching the identification strategy for the logical operator corresponding to the logical operation identification result according to the way the logical operators are combined;
if the logical operation identification result can be calculated according to the operator type of the logical operator corresponding to it, continuing to judge whether the next logical operation identification result can be calculated, until a logical operation identification result has no corresponding logical operator and can be calculated, in which case it is determined that the identification strategy result can be calculated from the priority scanning result; otherwise, determining that the identification strategy result cannot be calculated from the priority scanning result.
If the identification strategy result is "miss", the data to be identified is not sensitive data; if the identification strategy result is "hit", the data to be identified is sensitive data. If the data to be identified is sensitive data, it is processed by prohibiting its transmission, generating an alarm prompt pop-up window, or tracing the source of the data to be identified.
208. When the identification strategy result of the identification strategy cannot be calculated from the priority scanning result, record the priority scanning result corresponding to the priority identification rule, re-extract a priority identification rule from the identification strategy, obtain the secondary priority scanning result corresponding to the re-extracted priority identification rule, and judge whether the identification strategy result of the identification strategy can be calculated from the priority scanning result and the secondary priority scanning result.
If the priority scanning result of the first selected priority identification rule cannot determine the identification strategy result, the first scanning result is recorded, a priority identification rule is selected a second time, and a second scanning result is acquired. Whether the identification strategy result can be determined is then judged from the first and second scanning results. If it still cannot, the priority scanning results already obtained continue to be recorded and the priority scanning result of the next priority identification rule is acquired, until the identification strategy result can be obtained.
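The operator rules above can be written as three small functions over the values "hit", "miss" and "uncertain". This is a sketch with hypothetical function names: a single "miss" settles an AND, a single "hit" settles an OR, and NOT swaps hit and miss while leaving "uncertain" unchanged.

```python
HIT, MISS, UNCERTAIN = "hit", "miss", "uncertain"


def logic_and(a, b):
    """AND: one miss settles the result, even if the other side is uncertain."""
    if MISS in (a, b):
        return MISS
    if a == HIT and b == HIT:
        return HIT
    return UNCERTAIN


def logic_or(a, b):
    """OR: one hit settles the result, even if the other side is uncertain."""
    if HIT in (a, b):
        return HIT
    if a == MISS and b == MISS:
        return MISS
    return UNCERTAIN


def logic_not(a):
    """NOT: swaps hit and miss; uncertain stays uncertain."""
    return {HIT: MISS, MISS: HIT, UNCERTAIN: UNCERTAIN}[a]


print(logic_and(MISS, UNCERTAIN))  # settled by the miss
print(logic_or(HIT, UNCERTAIN))    # settled by the hit
print(logic_and(HIT, UNCERTAIN))   # still undetermined
```

Whenever the outermost operation returns something other than "uncertain", the identification strategy result is known and the remaining rules need not be scanned.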
In general, to determine whether the data to be identified is sensitive data, the application needs to traverse a two-layer loop over the identification strategy and the data to be identified. Assume that the identification strategy comprises three preset identification rules, namely rule 1, rule 2 and rule 3, and that the logical operation between the preset identification rules is "rule 1 & rule 2 | !rule 3". Assume that the extracted priority identification rule is rule 3 and that the rule type of rule 3 is the channel protocol class. The channel protocol data corresponding to the channel protocol class identifier is extracted from the data to be identified, and the scanning algorithm in rule 3 then judges whether the channel protocol data is sensitive data. At this time, the default scanning results of rule 1 and rule 2 are "uncertain". Assuming that the result of scanning the data to be identified according to rule 3 is a hit, the logical operation of the identification strategy is equivalent to calculating "uncertain & uncertain | miss"; the logical operation result of the identification strategy cannot be obtained, and the extraction of priority identification rules from the identification strategy needs to continue. Assuming that the result of scanning the data to be identified according to rule 3 is a miss, the logical operation of the identification strategy is equivalent to calculating "uncertain & uncertain | hit"; the logical operation result of the identification strategy cannot be obtained, and the extraction of priority identification rules from the identification strategy needs to continue. The above amounts to an outer loop over the rules in the identification strategy to extract the priority identification rule, and an inner loop over the data in the data to be identified to acquire the rule data.
The priority identification rule is then extracted from rule 1 and rule 2, and the process is repeated in the same way until the identification strategy result can be calculated. If the identification strategy result is "hit", the data to be identified is sensitive data; if the identification strategy result is "miss", the data to be identified is not sensitive data.
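The overall loop can be sketched as follows: scan one rule at a time in priority order, record each result, and stop as soon as the strategy evaluates to something other than "uncertain". The tree representation, `evaluate` and `identify` are hypothetical. The grouping of the example expression is not spelled out in the text; the sketch assumes rule 1 & (rule 2 | !rule 3), under which the result of rule 3 alone leaves the strategy undetermined in both cases described above.

```python
HIT, MISS, UNCERTAIN = "hit", "miss", "uncertain"


def evaluate(node, results):
    """Evaluate a strategy tree; rules without a result count as uncertain."""
    if isinstance(node, str):  # leaf: a rule name
        return results.get(node, UNCERTAIN)
    op, *args = node
    values = [evaluate(arg, results) for arg in args]
    if op == "not":
        return {HIT: MISS, MISS: HIT, UNCERTAIN: UNCERTAIN}[values[0]]
    if op == "and":
        if MISS in values:
            return MISS
        return HIT if all(v == HIT for v in values) else UNCERTAIN
    if op == "or":
        if HIT in values:
            return HIT
        return MISS if all(v == MISS for v in values) else UNCERTAIN
    raise ValueError(f"unknown operator: {op}")


def identify(policy, ordered_rules, scan):
    """Scan rules in priority order until the strategy result is determined."""
    results = {}
    for rule in ordered_rules:
        results[rule] = scan(rule)           # record the priority scanning result
        outcome = evaluate(policy, results)  # try to settle the strategy
        if outcome != UNCERTAIN:
            return outcome, results          # remaining rules are never scanned
    return evaluate(policy, results), results


# Assumed grouping: rule_1 & (rule_2 | !rule_3)
policy = ("and", "rule_1", ("or", "rule_2", ("not", "rule_3")))
fake_scans = {"rule_1": MISS, "rule_2": HIT, "rule_3": HIT}  # stand-in scanner
outcome, scanned = identify(
    policy, ["rule_3", "rule_1", "rule_2"], fake_scans.__getitem__
)
print(outcome, list(scanned))
```

In this run rule 3 is scanned first and leaves the strategy undetermined; the miss from rule 1 then settles the AND, so rule 2 is never scanned.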
The invention provides a method for quickly identifying sensitive data. An identification strategy for the data to be identified is generated according to preset identification rules, and a priority identification rule in the identification strategy is selected; rule data corresponding to the rule type is extracted from the data to be identified according to the rule type of the priority identification rule; the rule data is scanned according to the priority identification rule to obtain a priority scanning result; and finally, when the identification strategy result of the identification strategy can be calculated from the priority scanning result, the identification strategy result is determined to be the identification result of whether the data to be identified is sensitive data. Compared with the prior art, the method and the device select from the data to be identified only the rule data corresponding to the rule type, and if the identification strategy result can be determined from that rule data, the other data in the data to be identified is not scanned. As little rule data as possible is extracted, as little of the data to be identified as possible is scanned, and the identification strategy result is obtained from as few priority scanning results as possible, which reduces the occupation of computer system resources and improves the efficiency of identifying sensitive data.
Further, as an implementation of the method shown in fig. 1, an embodiment of the present invention provides a first apparatus for quickly identifying sensitive data, as shown in fig. 3, where the apparatus includes:
The generating module 31 is configured to generate an identification policy of data to be identified according to a preset identification rule, and select a priority identification rule in the identification policy;
an extracting module 32, configured to extract rule data corresponding to the rule type in the data to be identified according to the rule type of the priority identification rule;
an obtaining module 33, configured to obtain a priority scanning result by scanning the rule data according to a priority identification rule;
and the determining module 34 is configured to determine, when the identification strategy result of the identification strategy can be calculated from the priority scanning result, that the identification strategy result is the identification result of whether the data to be identified is sensitive data.
The invention provides a device for quickly identifying sensitive data. An identification strategy for the data to be identified is generated according to preset identification rules, and a priority identification rule in the identification strategy is selected; rule data corresponding to the rule type is extracted from the data to be identified according to the rule type of the priority identification rule; the rule data is scanned according to the priority identification rule to obtain a priority scanning result; and finally, when the identification strategy result of the identification strategy can be calculated from the priority scanning result, the identification strategy result is determined to be the identification result of whether the data to be identified is sensitive data. Compared with the prior art, the method and the device select from the data to be identified only the rule data corresponding to the rule type, and if the identification strategy result can be determined from that rule data, the other data in the data to be identified is not scanned. As little rule data as possible is extracted, as little of the data to be identified as possible is scanned, and the identification strategy result is obtained from as few priority scanning results as possible, which reduces the occupation of computer system resources and improves the efficiency of identifying sensitive data.
Further, as an implementation of the method shown in fig. 2, an embodiment of the present invention provides another apparatus for quickly identifying sensitive data, as shown in fig. 4, where the apparatus includes:
the generating module 41 is configured to generate an identification policy of data to be identified according to a preset identification rule, and select a priority identification rule in the identification policy;
an extracting module 42, configured to extract rule data corresponding to the rule type in the data to be identified according to the rule type of the priority identification rule;
an obtaining module 43, configured to obtain a priority scanning result by scanning the rule data according to a priority identification rule;
and the determining module 44 is configured to determine, when the identification strategy result of the identification strategy can be calculated from the priority scanning result, that the identification strategy result is the identification result of whether the data to be identified is sensitive data.
Further, the generating module 41 is configured to:
and sequentially selecting the priority identification rules in the identification strategy according to the order of the execution efficiency of the preset identification rules from high to low.
Further, the extracting module 42 includes:
a searching unit 421, configured to search for the type identifier corresponding to the rule type of the priority identification rule, where the type identifier includes at least one or a combination of more than one of the following: a channel protocol class identifier, an attribute class identifier and a content class identifier;
An extracting unit 422, configured to extract, from the data to be identified, rule data corresponding to the type identifier, where the rule data corresponds to the type identifier one to one, and the rule data includes at least one or a combination of more than one of the following: channel protocol data, attribute data, content data.
Further, the obtaining module 43 is configured to:
scanning the rule data according to a preset scanning algorithm in the priority identification rule to obtain the priority scanning result, wherein the preset scanning algorithm is a character string matching algorithm or an artificial intelligence algorithm.
Further, the apparatus further comprises:
the digest extracting module 45 is configured to extract, according to an information digest algorithm and before the identification strategy result is determined to be the identification result of whether the data to be identified is sensitive data, rule digest information from the preset scanning algorithm and rule type in each preset identification rule, and to establish a mapping relationship between the rule digest information and the preset identification rules;
the relationship searching module 46 is configured to search, in the preset recognition rules, for a copy recognition rule that is identical to the rule digest information of the priority recognition rule according to the mapping relationship;
And a result assignment module 47, configured to assign a scan result corresponding to the duplicate identification rule as the preferential scan result.
Further, the logical operators in the identification strategy comprise AND operations, and/or OR operations, and/or NOT operations; the preferential scan result includes a miss and a hit; the default scanning result of the preset recognition rule, which does not acquire the scanning result, in the recognition strategy is 'uncertain';
judging that the preferential scanning result can be calculated to obtain the recognition strategy result of the recognition strategy comprises the following steps:
in the identification strategy, searching a logic operator corresponding to the priority identification rule according to a combination method of the logic operator;
if the logical operator corresponding to the priority identification rule is an AND operation and the priority scanning result is a miss, determining that the logical operator corresponding to the priority scanning result can be calculated to obtain a logical operation identification result, and acquiring the logical operation identification result "miss"; and/or,
if the logical operator corresponding to the priority identification rule is an OR operation and the priority scanning result is a hit, determining that the logical operator corresponding to the priority scanning result can be calculated to obtain a logical operation identification result, and acquiring the logical operation identification result "hit"; and/or,
if the logical operator corresponding to the priority identification rule is a NOT operation, determining that the logical operator corresponding to the priority scanning result can be calculated to obtain a logical operation identification result, and acquiring the logical operation identification result "uncertain" when the priority scanning result is "uncertain", "hit" when the priority scanning result is "miss", and "miss" when the priority scanning result is "hit"; and/or,
if it is not determined that the logical operator corresponding to the priority scanning result can be calculated to obtain a logical operation identification result, determining that the identification strategy result cannot be calculated from the priority scanning result; and/or,
if it is determined that the logical operator corresponding to the priority scanning result can be calculated to obtain a logical operation identification result, searching the identification strategy for the logical operator corresponding to the logical operation identification result according to the way the logical operators are combined; and/or,
If the logic operation recognition result can be calculated according to the operator type of the logic operator corresponding to the logic operation recognition result, continuing to judge whether the logic operation recognition result can be calculated or not until the logic operation recognition result does not have the corresponding logic operator and the logic operation recognition result can be calculated, determining whether the recognition strategy result can be calculated according to the priority scanning result or not, and otherwise, determining that the recognition strategy result cannot be calculated according to the priority scanning result.
Further, after the determining that the recognition policy result is the recognition result of whether the data to be recognized is sensitive data, the apparatus further includes:
and the recording module 48 is configured to record a priority scanning result corresponding to the priority recognition rule and re-extract the priority recognition rule in the recognition policy when the priority scanning result cannot be calculated to obtain the recognition policy result of the recognition policy, obtain a secondary priority scanning result corresponding to the re-extracted priority recognition rule, and determine whether the recognition policy result of the recognition policy can be calculated according to the priority scanning result and the secondary priority scanning result.
Further, the apparatus further comprises:
the determining module 44 is further configured to determine that the data to be identified is not sensitive data if the identification policy result is a miss after the identification policy result is determined to be the identification result of whether the data to be identified is sensitive data;
the determining module 44 is further configured to determine that the data to be identified is sensitive data if the identification policy result is hit, and process the data to be identified by prohibiting transmission, generating an alarm prompt pop-up window, or tracing the source of the data to be identified.
The invention provides a device for quickly identifying sensitive data. An identification strategy for the data to be identified is generated according to preset identification rules, and a priority identification rule in the identification strategy is selected; rule data corresponding to the rule type is extracted from the data to be identified according to the rule type of the priority identification rule; the rule data is scanned according to the priority identification rule to obtain a priority scanning result; and finally, when the identification strategy result of the identification strategy can be calculated from the priority scanning result, the identification strategy result is determined to be the identification result of whether the data to be identified is sensitive data. Compared with the prior art, the method and the device select from the data to be identified only the rule data corresponding to the rule type, and if the identification strategy result can be determined from that rule data, the other data in the data to be identified is not scanned. As little rule data as possible is extracted, as little of the data to be identified as possible is scanned, and the identification strategy result is obtained from as few priority scanning results as possible, which reduces the occupation of computer system resources and improves the efficiency of identifying sensitive data.
According to one embodiment of the present invention, there is provided a storage medium storing at least one executable instruction for performing the method of quickly identifying sensitive data in any of the method embodiments described above.
Fig. 5 is a schematic structural diagram of a computer device according to an embodiment of the present invention; the specific embodiments of the present invention do not limit the specific implementation of the computer device.
As shown in fig. 5, the computer device may include: a processor 502, a communication interface (Communications Interface) 504, a memory 506, and a communication bus 508.
Wherein: processor 502, communication interface 504, and memory 506 communicate with each other via communication bus 508.
A communication interface 504 for communicating with network elements of other devices, such as clients or other servers.
The processor 502 is configured to execute the program 510, and may specifically perform the relevant steps in the method embodiment for quickly identifying sensitive data.
In particular, program 510 may include program code including computer-operating instructions.
The processor 502 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement embodiments of the present invention. The one or more processors included in the computer device may be of the same type, such as one or more CPUs, or of different types, such as one or more CPUs and one or more ASICs.
A memory 506 for storing a program 510. The memory 506 may comprise high-speed RAM and may also include non-volatile memory, such as at least one disk memory.
The program 510 may be specifically operable to cause the processor 502 to:
generating an identification strategy of data to be identified according to a preset identification rule, and selecting a priority identification rule in the identification strategy;
extracting rule data corresponding to the rule type in the data to be identified according to the rule type of the priority identification rule;
scanning the rule data according to a priority identification rule to obtain a priority scanning result;
when the identification strategy result of the identification strategy can be calculated from the priority scanning result, determining that the identification strategy result is the identification result of whether the data to be identified is sensitive data.
It will be appreciated by those skilled in the art that the modules or steps of the invention described above may be implemented on a general-purpose computing device. They may be concentrated on a single computing device or distributed across a network of computing devices. Alternatively, they may be implemented in program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device. In some cases, the steps shown or described may be performed in an order different from the one given here; the modules or steps may also be fabricated separately as individual integrated circuit modules, or several of them may be fabricated as a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.
The above description covers only the preferred embodiments of the present invention and is not intended to limit the present invention; those skilled in the art can make various modifications and variations to the present invention. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (10)

1. A method for quickly identifying sensitive data, comprising:
generating an identification strategy of data to be identified according to a preset identification rule, and selecting a priority identification rule in the identification strategy;
extracting rule data corresponding to the rule type in the data to be identified according to the rule type of the priority identification rule;
scanning the rule data according to a priority identification rule to obtain a priority scanning result;
when the recognition strategy result of the recognition strategy can be calculated from the priority scanning result, extracting, according to a message digest algorithm, rule digest information of the preset scanning algorithm and the rule type in each preset recognition rule, establishing a mapping relation between the rule digest information and the preset recognition rules, searching the preset recognition rules, according to the mapping relation, for a copy recognition rule whose rule digest information is identical to that of the priority recognition rule, assigning the priority scanning result as the scanning result corresponding to the copy recognition rule, and determining the recognition strategy result to be the recognition result of whether the data to be recognized is sensitive data.
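The digest-based reuse of scan results in claim 1 can be illustrated with a minimal sketch. It assumes that each rule is fully described by two strings (a scanning-algorithm description and a rule type) and that SHA-256 serves as the message digest algorithm; the names `rule_digest`, `digest_to_rules`, `propagate` and the sample rules are hypothetical and do not come from the patent.

```python
import hashlib
from collections import defaultdict


def rule_digest(algorithm: str, rule_type: str) -> str:
    """Digest of the parts of a rule that determine its scan result (assumed: algorithm + type)."""
    payload = "\x1f".join((algorithm, rule_type)).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


# Hypothetical preset rules: rule id -> (scanning algorithm description, rule type).
rules = {
    "r1": ("keyword:confidential", "content"),
    "r2": ("keyword:salary", "content"),
    "r3": ("keyword:confidential", "content"),  # same algorithm and type as r1
}

# Mapping relation between digest information and the rules that share it.
digest_to_rules = defaultdict(list)
for rule_id, (algorithm, rule_type) in rules.items():
    digest_to_rules[rule_digest(algorithm, rule_type)].append(rule_id)


def propagate(rule_id: str, result: str, results: dict) -> dict:
    """Record a scan result for rule_id and for every rule with the same digest."""
    algorithm, rule_type = rules[rule_id]
    for copy_id in digest_to_rules[rule_digest(algorithm, rule_type)]:
        results[copy_id] = result
    return results


results = propagate("r1", "hit", {})
print(results)
```

Under these assumptions, scanning `r1` once also fills in the result for `r3`, so the duplicate rule never has to be scanned.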
2. The method of claim 1, wherein said selecting a priority identification rule in said identification policy comprises:
sequentially selecting the priority identification rules in the identification strategy in descending order of the execution efficiency of the preset identification rules.
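A minimal sketch of the ordering in claim 2 follows. The patent does not say how execution efficiency is measured, so the sketch assumes a hypothetical per-rule `cost` estimate in which a lower cost means a more efficient rule; `Rule`, `priority_order` and the sample values are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    name: str
    cost: float  # hypothetical estimated cost per scan; lower means more efficient


def priority_order(rules):
    """Return the rules from most efficient (cheapest) to least efficient."""
    return sorted(rules, key=lambda rule: rule.cost)


rules = [
    Rule("content_ml", 50.0),
    Rule("channel_protocol", 0.1),
    Rule("file_attribute", 0.5),
]
ordered = [rule.name for rule in priority_order(rules)]
print(ordered)
```

Cheap checks such as a protocol comparison are tried first, so an expensive content scan runs only if the cheaper rules leave the strategy result undecided.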
3. The method of claim 1, wherein extracting rule data corresponding to the rule type from the data to be identified according to the rule type of the priority identification rule comprises:
searching for a type identifier corresponding to the rule type of the priority identification rule, wherein the type identifier comprises one or a combination of more than one of the following: a channel protocol class identifier, an attribute class identifier and a content class identifier;
extracting rule data corresponding to the type identifier from the data to be identified, wherein the rule data is in one-to-one correspondence with the type identifier, and the rule data comprises one or a combination of more than one of the following: channel protocol data, attribute data, content data.
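The extraction step in claim 3 can be sketched as a lookup from rule type to the matching slice of the data. The dictionary layout of the data to be identified and the field names (`protocol`, `attributes`, `body`) are assumptions made for illustration, as are `TYPE_IDENTIFIERS` and `extract_rule_data`.

```python
# Hypothetical one-to-one mapping from rule type to the field holding its rule data.
TYPE_IDENTIFIERS = {
    "channel_protocol": "protocol",
    "attribute": "attributes",
    "content": "body",
}


def extract_rule_data(data: dict, rule_type: str):
    """Return only the part of the data that a rule of this type needs to scan."""
    field = TYPE_IDENTIFIERS[rule_type]
    return data[field]


# Hypothetical data to be identified, e.g. an outgoing e-mail attachment.
message = {
    "protocol": "smtp",
    "attributes": {"filename": "q3.xlsx", "size": 20480},
    "body": "Quarterly salary table",
}

print(extract_rule_data(message, "channel_protocol"))
print(extract_rule_data(message, "attribute")["filename"])
```

Because only the needed slice is extracted, a channel-protocol rule never has to touch the (possibly large) content data.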
4. The method of claim 1, wherein scanning the rule data according to the priority identification rule to obtain a priority scanning result comprises:
scanning the rule data according to a preset scanning algorithm in the priority identification rule to obtain the priority scanning result, wherein the preset scanning algorithm is a character string matching algorithm or an artificial intelligence algorithm.
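As a minimal sketch of the character string matching option in claim 4, the function below reports 'hit' when any keyword occurs in the rule data. Plain case-insensitive substring search is an assumption chosen for brevity; the patent does not name a specific matching algorithm, and `scan_keywords` is a hypothetical helper.

```python
def scan_keywords(rule_data: str, keywords) -> str:
    """Return 'hit' if any keyword occurs in the rule data, otherwise 'miss'."""
    text = rule_data.casefold()
    return "hit" if any(keyword.casefold() in text for keyword in keywords) else "miss"


print(scan_keywords("Quarterly SALARY table", ["salary", "confidential"]))
print(scan_keywords("Lunch menu for Friday", ["salary", "confidential"]))
```

A production scanner with many keywords would more likely use a multi-pattern matcher, but the hit/miss contract toward the recognition strategy would stay the same.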
5. The method of claim 1, wherein the logical operators in the recognition strategy include an AND operation, an OR operation, and/or a NOT operation; the priority scanning result is either 'miss' or 'hit'; and the default scanning result of any preset recognition rule in the recognition strategy for which no scanning result has been acquired is 'uncertain';
judging whether the recognition strategy result of the recognition strategy can be calculated from the priority scanning result comprises:
in the recognition strategy, searching for the logical operator corresponding to the priority identification rule according to the combination method of the logical operators;
if the logical operator corresponding to the priority identification rule is an AND operation and the priority scanning result is 'miss', determining that the logical operator corresponding to the priority scanning result can be evaluated to a logic operation identification result, and acquiring the logic operation identification result 'miss'; and/or,
if the logical operator corresponding to the priority identification rule is an OR operation and the priority scanning result is 'hit', determining that the logical operator corresponding to the priority scanning result can be evaluated to a logic operation identification result, and acquiring the logic operation identification result 'hit'; and/or,
if the logical operator corresponding to the priority identification rule is a NOT operation: when the priority scanning result is 'uncertain', determining that the logical operator corresponding to the priority scanning result can be evaluated to a logic operation identification result, and acquiring the logic operation identification result 'uncertain'; when the priority scanning result is 'miss', determining that the logical operator corresponding to the priority scanning result can be evaluated to a logic operation identification result, and acquiring the logic operation identification result 'hit'; and when the priority scanning result is 'hit', determining that the logical operator corresponding to the priority scanning result can be evaluated to a logic operation identification result, and acquiring the logic operation identification result 'miss'; and/or,
if it is not determined that the logical operator corresponding to the priority scanning result can be evaluated to a logic operation identification result, determining that the recognition strategy result cannot be calculated from the priority scanning result; and/or,
if it is determined that the logical operator corresponding to the priority scanning result can be evaluated to a logic operation identification result, searching, in the recognition strategy, for the logical operator corresponding to that logic operation identification result according to the combination method of the logical operators; and/or,
if the logical operator corresponding to the logic operation identification result can be evaluated according to its operator type, continuing to judge whether the resulting logic operation identification result can be calculated, until the logic operation identification result has no corresponding logical operator and can be calculated, whereupon it is determined that the recognition strategy result can be calculated from the priority scanning result; otherwise, determining that the recognition strategy result cannot be calculated from the priority scanning result.
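The operator rules of claim 5 amount to a three-valued logic over 'hit', 'miss' and 'uncertain'. The sketch below is one way to express it, under stated assumptions: the recognition strategy is represented as a nested tuple such as `("and", "A", ("or", "B", ("not", "C")))`, and the expression is evaluated recursively from the root, whereas the claim describes walking upward from the scanned rule; both give the same final value. The names `evaluate`, `HIT`, `MISS` and `UNCERTAIN` are hypothetical.

```python
HIT, MISS, UNCERTAIN = "hit", "miss", "uncertain"


def evaluate(node, results):
    """Evaluate a strategy node: a rule name (str) or a tuple (operator, child, ...)."""
    if isinstance(node, str):
        # Rules without a scan result default to 'uncertain'.
        return results.get(node, UNCERTAIN)
    op, *children = node
    values = [evaluate(child, results) for child in children]
    if op == "not":
        (value,) = values
        return {HIT: MISS, MISS: HIT, UNCERTAIN: UNCERTAIN}[value]
    if op == "and":
        if MISS in values:  # one miss decides an AND
            return MISS
        return HIT if all(v == HIT for v in values) else UNCERTAIN
    if op == "or":
        if HIT in values:  # one hit decides an OR
            return HIT
        return MISS if all(v == MISS for v in values) else UNCERTAIN
    raise ValueError(f"unknown operator: {op!r}")


policy = ("and", "A", ("or", "B", ("not", "C")))
print(evaluate(policy, {"A": MISS}))            # decided by A alone
print(evaluate(policy, {"B": HIT}))             # A still unknown
print(evaluate(policy, {"A": HIT, "C": MISS}))  # NOT C is a hit, so OR and AND are hits
```

The strategy result "can be calculated" exactly when the root evaluates to something other than 'uncertain'; in the first call a single miss on `A` settles the whole strategy without scanning `B` or `C`.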
6. The method of claim 1, wherein after the determining that the recognition policy result is a recognition result of whether the data to be recognized is sensitive data, the method further comprises:
when the priority scanning result cannot be calculated to obtain the recognition strategy result of the recognition strategy, recording the priority scanning result corresponding to the priority recognition rule, re-extracting the priority recognition rule in the recognition strategy, obtaining the secondary priority scanning result corresponding to the re-extracted priority recognition rule, and judging whether the recognition strategy result of the recognition strategy can be calculated according to the priority scanning result and the secondary priority scanning result.
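The retry behaviour of claim 6 can be sketched as a loop that scans one rule at a time, records each result, and stops as soon as the strategy result is decided. This is a sketch under assumptions: `identify`, the `scan` and `evaluate` callables, and the flat "A and B and C" demo strategy are hypothetical stand-ins rather than the patent's implementation.

```python
UNCERTAIN = "uncertain"


def identify(ordered_rules, scan, evaluate):
    """Scan rules in priority order until the strategy result is no longer 'uncertain'."""
    results = {}
    for rule in ordered_rules:
        results[rule] = scan(rule)        # record this priority scanning result
        outcome = evaluate(results)       # try to decide from everything recorded so far
        if outcome != UNCERTAIN:
            return outcome, list(results)
    return evaluate(results), list(results)


def and_policy(results):
    """Demo strategy 'A and B and C' over hit/miss/uncertain values."""
    names = ("A", "B", "C")
    if any(results.get(name) == "miss" for name in names):
        return "miss"
    if all(results.get(name) == "hit" for name in names):
        return "hit"
    return UNCERTAIN


scans = {"A": "hit", "B": "miss", "C": "hit"}  # hypothetical per-rule scan outcomes
outcome, scanned = identify(["A", "B", "C"], scans.__getitem__, and_policy)
print(outcome, scanned)
```

In this run the hit on `A` leaves the strategy undecided, the miss on `B` settles it, and `C` is never scanned.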
7. The method of claim 1, wherein after the determining that the recognition policy result is a recognition result of whether the data to be recognized is sensitive data, the method further comprises:
if the identification strategy result is 'miss', determining that the data to be identified is not sensitive data;
if the identification strategy result is 'hit', determining that the data to be identified is sensitive data, and processing the data to be identified by prohibiting its transmission, generating an alarm prompt pop-up window, or tracing the source of the data to be identified.
8. An apparatus for quickly identifying sensitive data, comprising:
the generation module is used for generating an identification strategy of the data to be identified according to a preset identification rule, and selecting a priority identification rule in the identification strategy;
the extraction module is used for extracting rule data corresponding to the rule type in the data to be identified according to the rule type of the priority identification rule;
the acquisition module is used for scanning the rule data according to the priority identification rule to acquire a priority scanning result;
and the determining module is used for, when the recognition strategy result of the recognition strategy can be calculated from the priority scanning result, extracting, according to a message digest algorithm, rule digest information of the preset scanning algorithm and the rule type in each preset recognition rule, establishing a mapping relation between the rule digest information and the preset recognition rules, searching the preset recognition rules, according to the mapping relation, for a copy recognition rule whose rule digest information is identical to that of the priority recognition rule, assigning the priority scanning result as the scanning result corresponding to the copy recognition rule, and determining the recognition strategy result to be the recognition result of whether the data to be recognized is sensitive data.
9. A storage medium having stored therein at least one executable instruction for causing a processor to perform operations corresponding to the method of rapidly identifying sensitive data as claimed in any one of claims 1 to 7.
10. A computer device, comprising: a processor, a memory, a communication interface and a communication bus, wherein the processor, the memory and the communication interface communicate with each other through the communication bus;
the memory is configured to store at least one executable instruction that causes the processor to perform operations corresponding to the method for quickly identifying sensitive data according to any one of claims 1-7.
CN202010145893.4A 2020-03-05 2020-03-05 Method and device for rapidly identifying sensitive data Active CN113360522B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010145893.4A CN113360522B (en) 2020-03-05 2020-03-05 Method and device for rapidly identifying sensitive data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010145893.4A CN113360522B (en) 2020-03-05 2020-03-05 Method and device for rapidly identifying sensitive data

Publications (2)

Publication Number Publication Date
CN113360522A CN113360522A (en) 2021-09-07
CN113360522B true CN113360522B (en) 2023-10-31

Family

ID=77523577

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010145893.4A Active CN113360522B (en) 2020-03-05 2020-03-05 Method and device for rapidly identifying sensitive data

Country Status (1)

Country Link
CN (1) CN113360522B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114783581B (en) * 2022-06-22 2022-09-06 北京惠每云科技有限公司 Reporting method and reporting device for single disease type data

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103973684A (en) * 2014-05-07 2014-08-06 北京神州绿盟信息安全科技股份有限公司 Rule compiling and matching method and device
EP2942731A1 (en) * 2014-05-10 2015-11-11 Informatica Corporation Identifying and securing sensitive data at its source
CN105825138A (en) * 2015-01-04 2016-08-03 北京神州泰岳软件股份有限公司 Sensitive data identification method and device
CN106161095A (en) * 2016-07-15 2016-11-23 北京奇虎科技有限公司 The method for early warning of leaking data and device
CN106789964A (en) * 2016-12-02 2017-05-31 中国移动通信集团新疆有限公司 Cloud resource pool data safety detection method and system
CN108009430A (en) * 2017-12-22 2018-05-08 北京明朝万达科技股份有限公司 A kind of sensitive data fast scanning method and device
CN108052826A (en) * 2017-12-20 2018-05-18 北京明朝万达科技股份有限公司 Distributed sensitive data scan method and system based on anti-data-leakage terminal
CN108062484A (en) * 2017-12-11 2018-05-22 北京安华金和科技有限公司 A kind of classification stage division based on data sensitive feature and database metadata
CN109271808A (en) * 2018-09-07 2019-01-25 北明软件有限公司 A kind of data inactivity desensitization system and method based on the discovery of database sensitivity
CN109684469A (en) * 2018-12-13 2019-04-26 平安科技(深圳)有限公司 Filtering sensitive words method, apparatus, computer equipment and storage medium
CN109766485A (en) * 2018-12-07 2019-05-17 中国电力科学研究院有限公司 A kind of sensitive information inspection method and system
CN110506271A (en) * 2017-03-23 2019-11-26 微软技术许可有限责任公司 For the configurable annotation of privacy-sensitive user content

Patent Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103973684A (en) * 2014-05-07 2014-08-06 北京神州绿盟信息安全科技股份有限公司 Rule compiling and matching method and device
EP2942731A1 (en) * 2014-05-10 2015-11-11 Informatica Corporation Identifying and securing sensitive data at its source
CN105825138A (en) * 2015-01-04 2016-08-03 北京神州泰岳软件股份有限公司 Sensitive data identification method and device
CN106161095A (en) * 2016-07-15 2016-11-23 北京奇虎科技有限公司 The method for early warning of leaking data and device
CN106789964A (en) * 2016-12-02 2017-05-31 中国移动通信集团新疆有限公司 Cloud resource pool data safety detection method and system
CN110506271A (en) * 2017-03-23 2019-11-26 微软技术许可有限责任公司 For the configurable annotation of privacy-sensitive user content
CN108062484A (en) * 2017-12-11 2018-05-22 北京安华金和科技有限公司 A kind of classification stage division based on data sensitive feature and database metadata
CN108052826A (en) * 2017-12-20 2018-05-18 北京明朝万达科技股份有限公司 Distributed sensitive data scan method and system based on anti-data-leakage terminal
CN108009430A (en) * 2017-12-22 2018-05-08 北京明朝万达科技股份有限公司 A kind of sensitive data fast scanning method and device
CN109271808A (en) * 2018-09-07 2019-01-25 北明软件有限公司 A kind of data inactivity desensitization system and method based on the discovery of database sensitivity
CN109766485A (en) * 2018-12-07 2019-05-17 中国电力科学研究院有限公司 A kind of sensitive information inspection method and system
CN109684469A (en) * 2018-12-13 2019-04-26 平安科技(深圳)有限公司 Filtering sensitive words method, apparatus, computer equipment and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Sensitive data identification method for big data environments based on traffic analysis; 高运霞 (Gao Yunxia); 信息通信 (No. 12); full text *

Also Published As

Publication number Publication date
CN113360522A (en) 2021-09-07

Similar Documents

Publication Publication Date Title
CN107590214B (en) Recommendation method and device for search keywords and electronic equipment
US9460117B2 (en) Image searching
CN111506498A (en) Automatic generation method and device of test case, computer equipment and storage medium
EP3292481B1 (en) Method, system and computer program product for performing numeric searches
EP3767483A1 (en) Method, device, system, and server for image retrieval, and storage medium
CN112866023A (en) Network detection method, model training method, device, equipment and storage medium
CN112989348B (en) Attack detection method, model training method, device, server and storage medium
CN113961768B (en) Sensitive word detection method and device, computer equipment and storage medium
CN110795756A (en) Data desensitization method and device, computer equipment and computer readable storage medium
US20120143844A1 (en) Multi-level coverage for crawling selection
CN108429785A (en) A kind of generation method of crawler identification encryption string, crawler recognition method and device
CN109492118A (en) A kind of data detection method and detection device
CN113360522B (en) Method and device for rapidly identifying sensitive data
CN111897828A (en) Data batch processing implementation method, device, equipment and storage medium
CN107992402A (en) Blog management method and log management apparatus
CN112287382A (en) Safety compliance processing system and method for equipment data
CN113810375A (en) Webshell detection method, device and equipment and readable storage medium
CN110442696B (en) Query processing method and device
US8572231B2 (en) Variable-length nonce generation
CN115168509A (en) Processing method and device of wind control data, storage medium and computer equipment
CN112711574A (en) Database security detection method and device, electronic equipment and medium
CN113420288A (en) Container mirror image sensitive information detection system and method
CN111125567A (en) Equipment marking method and device, electronic equipment and storage medium
CN114595148B (en) Java null pointer reference detection method and system based on data stream propagation analysis
CN114928466B (en) Automatic identification method and device for encrypted data, storage medium and computer equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 100032 NO.332, 3rd floor, Building 102, 28 xinjiekouwai street, Xicheng District, Beijing

Applicant after: QAX Technology Group Inc.

Applicant after: Qianxin Wangshen information technology (Beijing) Co.,Ltd.

Address before: 100032 NO.332, 3rd floor, Building 102, 28 xinjiekouwai street, Xicheng District, Beijing

Applicant before: QAX Technology Group Inc.

Applicant before: LEGENDSEC INFORMATION TECHNOLOGY (BEIJING) Inc.

GR01 Patent grant