CN113254577A - Sensitive file detection method, device, equipment and storage medium - Google Patents

Publication number
CN113254577A
Authority
CN
China
Prior art keywords
file
detected
sensitive
current
detection
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110514767.6A
Other languages
Chinese (zh)
Inventor
晏明飞
崔士波
王宇星
刘思宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Hongteng Intelligent Technology Co ltd
Original Assignee
Beijing Hongteng Intelligent Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Hongteng Intelligent Technology Co ltd
Priority to CN202110514767.6A
Publication of CN113254577A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis

Abstract

The invention belongs to the technical field of computers and discloses a sensitive file detection method, device, equipment and storage medium. In the method, when a file detection instruction is detected, a file set to be detected is determined according to the file detection instruction; a current file to be detected is selected from the file set to be detected, and data analysis is performed on it to obtain content to be detected; the file type of the current file to be detected is acquired, and a corresponding sensitive matching strategy is determined according to the file type; and sensitive information detection is performed on the content to be detected through the sensitive matching strategy to obtain a sensitive detection result of the current file to be detected. Because the sensitive matching strategy is determined according to the file type, files of different types can each be checked with a suitable strategy, so the sensitive detection result can be obtained accurately and it can be determined whether the current file to be detected contains sensitive information.

Description

Sensitive file detection method, device, equipment and storage medium
Technical Field
The invention relates to the technical field of computers, in particular to a sensitive file detection method, a sensitive file detection device, sensitive file detection equipment and a storage medium.
Background
Computer networks and information systems are widely used in enterprises and classified organizations, where they underpin the digitization and automation of office work and production. Large amounts of information are stored in computer systems as electronic files, and electronic files can be spread in many ways. If information supervision is weak, this creates serious security risks, mainly in two respects: a classified organization stores top-secret files, and the files are leaked if supervision fails; and lawbreakers spread illegal content through electronic files. As domestically produced computer systems mature, they are being used more and more widely, and enterprises and classified organizations are gradually adopting them. At present, however, these systems do not provide a function for effectively checking sensitive files.
The above is only for the purpose of assisting understanding of the technical aspects of the present invention, and does not represent an admission that the above is prior art.
Disclosure of Invention
The invention mainly aims to provide a method, a device, equipment and a storage medium for detecting a sensitive file, and aims to solve the technical problem that the sensitive file cannot be effectively detected in the prior art.
In order to achieve the above object, the present invention provides a sensitive file detection method, comprising the following steps:
when a file detection instruction is detected, determining a file set to be detected according to the file detection instruction;
selecting a current file to be detected in the file set to be detected, and performing data analysis on the current file to be detected to obtain a content to be detected;
acquiring the file type of the current file to be detected, and determining a corresponding sensitive matching strategy according to the file type;
and carrying out sensitive information detection on the content to be detected through the sensitive matching strategy so as to obtain a sensitive detection result of the current file to be detected.
Optionally, when the file detection instruction is detected, the step of determining the file set to be detected according to the file detection instruction includes:
when a file detection instruction is detected, extracting a file detection range and a type matching rule from the file detection instruction;
determining a directory of the file to be detected according to the file detection range;
and acquiring files of which the file types meet the type matching rules in the file directory to be detected so as to acquire a file set to be detected.
Optionally, the step of selecting a current file to be detected from the set of files to be detected, and performing data analysis on the current file to be detected to obtain the content to be detected includes:
selecting a current file to be detected in the file set to be detected according to a preset file selection rule;
detecting whether the current file to be detected is a compressed file;
and when the current file to be detected is not a compressed file, performing data analysis on the current file to be detected to obtain the content to be detected.
Optionally, after the step of detecting whether the current file to be detected is a compressed file, the method further includes:
when the current file to be detected is a compressed file, decompressing the current file to be detected to obtain a decompressed file set;
adding the files in the decompressed file set to the file set to be detected;
and removing the current to-be-detected files from the to-be-detected file set, and returning to the step of selecting the current to-be-detected files in the to-be-detected file set according to a preset file selection rule.
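The compressed-file branch above amounts to a worklist loop: an archive is taken out of the set and its members are put back in, so nested archives are eventually unpacked as well. A minimal sketch follows, assuming ZIP archives held in memory; the function name `expand_archives` and the `archive!/member` naming scheme are illustrative and do not come from the patent. Office formats such as docx are themselves ZIP containers, so a practical version would presumably filter by suffix before applying this check.

```python
import io
import zipfile
from collections import deque


def expand_archives(files: dict[str, bytes]) -> dict[str, bytes]:
    """Return only non-archive files, unpacking ZIP archives (nested ones too)."""
    pending = deque(sorted(files.items()))  # the file set to be detected
    plain: dict[str, bytes] = {}
    while pending:
        name, data = pending.popleft()  # select the current file to be detected
        if zipfile.is_zipfile(io.BytesIO(data)):
            with zipfile.ZipFile(io.BytesIO(data)) as archive:
                for member in archive.infolist():
                    if not member.is_dir():
                        # Add each decompressed file back to the set.
                        pending.append(
                            (f"{name}!/{member.filename}", archive.read(member))
                        )
            continue  # the archive itself is removed from the set
        plain[name] = data  # ready for data analysis
    return plain
```

Because unpacked members re-enter the same queue, no recursion is needed to handle an archive inside an archive.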
Optionally, the step of performing data analysis on the current file to be detected to obtain the content to be detected includes:
detecting the file type of the current file to be detected;
when the file type is a document type, determining the document format of the current file to be detected;
and acquiring a document analysis strategy corresponding to the document format, and performing text data analysis on the current file to be detected according to the document analysis strategy to acquire content to be detected.
Optionally, after the step of detecting the file type of the current file to be detected, the method further includes:
when the file type is the picture type, determining the picture format of the current file to be detected;
acquiring a feature extraction interface corresponding to the picture format;
and extracting the characteristics of the current file to be detected through the characteristic extraction interface to obtain the content to be detected.
Optionally, after the step of detecting the file type of the current file to be detected, the method further includes:
when the file type is a video type, determining the video format of the current file to be detected;
acquiring a video analysis interface corresponding to the video format;
and performing video frame decomposition on the current file to be detected through the video analysis interface to obtain the content to be detected.
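The three parsing branches above (document, picture, video) can be pictured as a two-level dispatch: the file type selects a parser, and the format is passed to that parser. The sketch below is only an illustration under stated assumptions: the suffix table, the parser names and the placeholder return values are hypothetical, and the real text parsing, feature extraction and frame decomposition are left as stubs.

```python
from pathlib import Path

# Hypothetical suffix table; the source does not enumerate picture or video formats.
TYPE_BY_SUFFIX = {
    ".docx": "document", ".xlsx": "document", ".pptx": "document",
    ".png": "picture", ".jpg": "picture",
    ".mp4": "video", ".avi": "video",
}


def parse_document(path: str, fmt: str) -> dict:
    # Stub: a real version would apply the document analysis strategy for fmt.
    return {"kind": "text", "format": fmt, "source": path}


def extract_picture_features(path: str, fmt: str) -> dict:
    # Stub: a real version would call the feature extraction interface for fmt.
    return {"kind": "picture_features", "format": fmt, "source": path}


def split_video_frames(path: str, fmt: str) -> dict:
    # Stub: a real version would decompose the video into frames.
    return {"kind": "video_frames", "format": fmt, "source": path}


PARSERS = {
    "document": parse_document,
    "picture": extract_picture_features,
    "video": split_video_frames,
}


def parse_file(path: str) -> dict:
    """Pick the parser for the file type, then hand it the concrete format."""
    fmt = Path(path).suffix.lower()
    file_type = TYPE_BY_SUFFIX.get(fmt)
    if file_type is None:
        raise ValueError(f"unsupported file format: {fmt!r}")
    return PARSERS[file_type](path, fmt)
```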
Optionally, the step of obtaining the file type of the current file to be detected and determining the corresponding sensitive matching policy according to the file type includes:
acquiring the file type of the current file to be detected;
when the file type is a document type, determining that the corresponding sensitive matching strategy is a document sensitive matching strategy;
correspondingly, the step of performing sensitive information detection on the content to be detected based on the sensitive matching policy to obtain a sensitive detection result of the current file to be detected includes:
acquiring a sensitive keyword matching rule set, a deformed word matching rule set and an ignore rule set through the document sensitive matching strategy;
and performing sensitive information detection on the text data in the content to be detected according to the sensitive keyword matching rule set, the deformed word matching rule set and the ignore rule set to obtain a sensitive detection result of the current file to be detected.
Optionally, the step of obtaining the file type of the current file to be detected and determining the corresponding sensitive matching policy according to the file type includes:
acquiring the file type of the current file to be detected;
when the file type is the picture type, determining that the corresponding sensitive matching strategy is the picture sensitive matching strategy;
correspondingly, the step of performing sensitive information detection on the content to be detected based on the sensitive matching policy to obtain a sensitive detection result of the current file to be detected includes:
acquiring a corresponding picture sensitivity detection model according to the picture sensitivity matching strategy;
and carrying out sensitive information detection on the picture characteristic information in the content to be detected through the picture sensitive detection model so as to obtain a sensitive detection result of the current file to be detected.
Optionally, the step of obtaining the file type of the current file to be detected and determining the corresponding sensitive matching policy according to the file type includes:
acquiring the file type of the current file to be detected;
when the file type is a video type, determining that a corresponding sensitive matching strategy is a video sensitive matching strategy;
correspondingly, the step of performing sensitive information detection on the content to be detected based on the sensitive matching policy to obtain a sensitive detection result of the current file to be detected includes:
acquiring a corresponding video sensitivity detection model according to the video sensitivity matching strategy;
and detecting the sensitive information of each video frame in the content to be detected through the video sensitive detection model so as to obtain a sensitive detection result of the current file to be detected.
Optionally, after the step of performing sensitive information detection on the content to be detected through the sensitive matching policy to obtain a sensitive detection result of the current file to be detected, the method further includes:
storing the sensitive detection result into a preset detection result library according to the file detection instruction, and marking the current file to be detected;
detecting whether the to-be-detected files which are not marked exist in the to-be-detected file set or not;
and returning to the step of selecting the current file to be detected in the file set to be detected and analyzing the data of the current file to be detected to obtain the content to be detected when the unmarked file to be detected exists in the file set to be detected.
Optionally, after the step of detecting whether the unmarked to-be-detected file exists in the to-be-detected file set, the method further includes:
when the unmarked files to be detected do not exist in the set of files to be detected, acquiring a sensitive detection result corresponding to the file detection instruction in the preset detection result library;
constructing a sensitive file detection report according to the to-be-detected file set and the sensitive detection result corresponding to the file detection instruction;
and storing and displaying the sensitive file detection report.
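The mark-and-repeat loop and the final report described above can be sketched as follows. This is a simplified illustration, not the patented implementation: `detect` stands in for the whole parse-and-match pipeline, the result library is a plain dictionary keyed by a hypothetical instruction identifier, and the report fields are invented for the example.

```python
from typing import Callable


def run_detection(
    file_set: list[str],
    detect: Callable[[str], bool],
    instruction_id: str,
    result_library: dict[str, dict[str, bool]],
) -> dict:
    """Detect every file once, store the results, then build a report."""
    marked: set[str] = set()
    while True:
        unmarked = [f for f in file_set if f not in marked]
        if not unmarked:
            break  # no unmarked file left: go on to the report
        current = unmarked[0]  # select the current file to be detected
        result_library.setdefault(instruction_id, {})[current] = detect(current)
        marked.add(current)  # mark the file as processed

    results = result_library.get(instruction_id, {})
    return {
        "instruction": instruction_id,
        "files_checked": len(file_set),
        "sensitive_files": [f for f in file_set if results[f]],
    }
```

Keying the stored results by instruction is what allows the report step to pull back only the results that belong to the current run.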
In addition, in order to achieve the above object, the present invention further provides a sensitive file detection device, including the following modules:
the instruction receiving module is used for determining a file set to be detected according to the file detection instruction when the file detection instruction is detected;
the file analysis module is used for selecting a current file to be detected in the set of files to be detected and analyzing data of the current file to be detected to obtain content to be detected;
the strategy matching module is used for acquiring the file type of the current file to be detected and determining a corresponding sensitive matching strategy according to the file type;
and the sensitivity detection module is used for detecting the sensitivity information of the content to be detected through the sensitivity matching strategy so as to obtain the sensitivity detection result of the current file to be detected.
Optionally, the instruction receiving module is further configured to extract a file detection range and a type matching rule from the file detection instruction when the file detection instruction is detected; determining a directory of the file to be detected according to the file detection range; and acquiring files of which the file types meet the type matching rules in the file directory to be detected so as to acquire a file set to be detected.
Optionally, the file analysis module is further configured to select a current file to be detected in the set of files to be detected according to a preset file selection rule; detecting whether the current file to be detected is a compressed file; and when the current file to be detected is not a compressed file, performing data analysis on the current file to be detected to obtain the content to be detected.
Optionally, the file analysis module is further configured to, when the current file to be detected is a compressed file, decompress the current file to be detected to obtain a decompressed file set; adding the files in the decompressed file set to the file set to be detected; and removing the current to-be-detected files from the to-be-detected file set, and returning to the step of selecting the current to-be-detected files in the to-be-detected file set according to a preset file selection rule.
Optionally, the file analysis module is further configured to detect a file type of the current file to be detected; when the file type is a document type, determining the document format of the current file to be detected; and acquiring a document analysis strategy corresponding to the document format, and performing text data analysis on the current file to be detected according to the document analysis strategy to acquire content to be detected.
Optionally, the file parsing module is further configured to determine a picture format of the current file to be detected when the file type is a picture type; acquiring a feature extraction interface corresponding to the picture format; and extracting the characteristics of the current file to be detected through the characteristic extraction interface to obtain the content to be detected.
In addition, in order to achieve the above object, the present invention further provides sensitive file detection equipment, including: a processor, a memory and a sensitive file detection program that is stored on the memory and can run on the processor, wherein the steps of the sensitive file detection method described above are implemented when the sensitive file detection program is executed by the processor.
In addition, in order to achieve the above object, the present invention further provides a computer readable storage medium, where a sensitive file detection program is stored, and when the sensitive file detection program is executed, the steps of the sensitive file detection method described above are implemented.
In the method, when a file detection instruction is detected, a file set to be detected is determined according to the file detection instruction; a current file to be detected is selected from the file set to be detected, and data analysis is performed on it to obtain content to be detected; the file type of the current file to be detected is acquired, and a corresponding sensitive matching strategy is determined according to the file type; and sensitive information detection is performed on the content to be detected through the sensitive matching strategy to obtain a sensitive detection result of the current file to be detected. Because the sensitive matching strategy is determined according to the file type, files of different types can each be checked with a suitable strategy, so the sensitive detection result can be obtained accurately and it can be determined whether the current file to be detected contains sensitive information.
Drawings
FIG. 1 is a schematic structural diagram of an electronic device in a hardware operating environment according to an embodiment of the present invention;
FIG. 2 is a schematic flow chart illustrating a method for detecting a sensitive document according to a first embodiment of the present invention;
FIG. 3 is a flowchart illustrating a method for detecting a sensitive document according to a second embodiment of the present invention;
FIG. 4 is a flowchart illustrating a method for detecting a sensitive document according to a third embodiment of the present invention;
FIG. 5 is a block diagram of a sensitive document detecting apparatus according to a first embodiment of the present invention.
The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
Referring to fig. 1, fig. 1 is a schematic structural diagram of a sensitive file detection device in a hardware operating environment according to an embodiment of the present invention.
As shown in fig. 1, the electronic device may include: a processor 1001, such as a Central Processing Unit (CPU), a communication bus 1002, a user interface 1003, a network interface 1004, and a memory 1005. The communication bus 1002 is used to enable connection and communication between these components. The user interface 1003 may include a display screen and an input unit such as a keyboard, and may optionally also include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., a Wireless-Fidelity (Wi-Fi) interface). The memory 1005 may be a Random Access Memory (RAM), or may be a Non-Volatile Memory (NVM), such as a disk memory. The memory 1005 may alternatively be a storage device separate from the processor 1001.
Those skilled in the art will appreciate that the configuration shown in fig. 1 does not constitute a limitation of the electronic device and may include more or fewer components than those shown, or some components may be combined, or a different arrangement of components.
As shown in fig. 1, a memory 1005, which is a storage medium, may include therein an operating system, a network communication module, a user interface module, and a sensitive file detection program.
In the electronic apparatus shown in fig. 1, the network interface 1004 is mainly used for data communication with a network server; the user interface 1003 is mainly used for data interaction with a user; the processor 1001 and the memory 1005 in the electronic device of the present invention may be disposed in a sensitive file detection device, and the electronic device calls a sensitive file detection program stored in the memory 1005 through the processor 1001 and executes the sensitive file detection method provided by the embodiment of the present invention.
An embodiment of the present invention provides a method for detecting a sensitive file, and referring to fig. 2, fig. 2 is a schematic flowchart of a first embodiment of the method for detecting a sensitive file according to the present invention.
In this embodiment, the sensitive file detection method includes the following steps:
Step S10: when a file detection instruction is detected, determining a file set to be detected according to the file detection instruction.
It should be noted that the execution subject of this embodiment may be the sensitive file detection device, and the sensitive file detection device may be an electronic device such as a personal computer, a server, and the like, and may also be other devices that can achieve the same or similar functions.
The file detection instruction may be an instruction sent by another device to the sensitive file detection device, or may be an instruction generated based on an operation of a user on the sensitive file detection device. The set of files to be detected may be a set formed by combining the files to be detected. Determining the set of files to be detected according to the file detection instruction may be searching for the files to be detected according to the file detection instruction, and constructing a set according to the searched files to be detected to obtain the set of files to be detected.
It can be understood that, constructing the set according to the searched files to be detected to obtain the set of files to be detected may be to sort the searched files to be detected first, and then construct the set according to the sorting result to obtain the set of files to be detected. The sorting mode for sorting the searched files to be detected can be set according to actual needs, for example: sorted by file name or sorted by file type.
Further, in order to facilitate the user to specify the file to be detected and improve the user experience, step S10 in this embodiment may include:
when a file detection instruction is detected, extracting a file detection range and a type matching rule from the file detection instruction; determining a directory of the file to be detected according to the file detection range; and acquiring files of which the file types meet the type matching rules in the file directory to be detected so as to acquire a file set to be detected.
It should be noted that the file detection range is the range to be scanned, and the type matching rule may list the accepted file types; a file whose file type satisfies the type matching rule is treated as a file to be detected. The file detection range and the type matching rule can be set according to actual needs. For example, the type matching rule may be: a file satisfies the rule if its suffix is any one of doc, docm, docx, dot, dotm, dotx, dps, dpt, et, ett, pot, potm, potx, pps, ppsm, ppsx, ppt, pptm, pptx, wps, wpt, xls, xlsm, xlsx, xlt, xltm, xltx and the like.
In actual use, the directory that needs to be checked, namely the file directory to be detected, is determined according to the file detection range. This directory may contain many files; the files whose file types satisfy the type matching rule are acquired, and a set is constructed from them to obtain the file set to be detected.
It can be understood that the user only needs to specify the file detection range and the type matching rule when specifying the file to be detected, the corresponding file does not need to be selected, the use is more convenient, and the use experience of the user can be improved.
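The selection described above, a detection range plus a suffix-based type matching rule, can be sketched with a directory walk. The function name `collect_files` and the choice to compare suffixes case-insensitively are assumptions made for this illustration; the source only says that a file qualifies when its suffix is in the configured list.

```python
import os


def collect_files(root: str, suffixes: set[str]) -> list[str]:
    """Return the files under root whose suffix satisfies the type matching rule."""
    matched = []
    for directory, _subdirs, names in os.walk(root):  # the file detection range
        for name in names:
            suffix = os.path.splitext(name)[1].lstrip(".").lower()
            if suffix in suffixes:  # the type matching rule
                matched.append(os.path.join(directory, name))
    # Sorting by path gives a stable order for the later selection step.
    return sorted(matched)
```

A call such as `collect_files("/data/shared", {"doc", "docx", "xls"})` would then yield the file set to be detected for that range; the path here is only a placeholder.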
Step S20: selecting a current file to be detected in the file set to be detected, and performing data analysis on the current file to be detected to obtain the content to be detected.
It should be noted that the current file to be detected is the file selected for detection at the present moment. Performing data analysis on the current file to be detected may mean parsing the file to obtain its file content and format information, which together form the content to be detected.
Step S30: acquiring the file type of the current file to be detected, and determining a corresponding sensitive matching strategy according to the file type.
It should be noted that the file types may include a document type, a picture type, and a video type according to different actual storage contents and formats of the file.
It can be understood that different file types require different detection flows, so a corresponding sensitive matching policy can be set for each file type, and the sensitive matching policy can include a sensitive information detection flow. Determining the corresponding sensitive matching policy according to the file type may mean looking it up in a preset sensitive matching policy library according to the file type, where the preset sensitive matching policy library may be a preset database containing the mapping between file types and sensitive matching policies.
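The policy library described above is essentially a lookup table from file type to a detection flow. A minimal sketch, assuming an in-memory dictionary in place of the preset database; the `MatchingPolicy` class and the placeholder `detect` callables are hypothetical and only show the shape of the mapping.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class MatchingPolicy:
    name: str
    detect: Callable[[object], bool]  # the sensitive information detection flow


# Placeholder detection flows; real ones would use rule sets or trained models.
POLICY_LIBRARY = {
    "document": MatchingPolicy(
        "document sensitive matching policy",
        lambda text: "confidential" in text,
    ),
    "picture": MatchingPolicy(
        "picture sensitive matching policy",
        lambda features: False,
    ),
    "video": MatchingPolicy(
        "video sensitive matching policy",
        lambda frames: False,
    ),
}


def policy_for(file_type: str) -> MatchingPolicy:
    """Look up the sensitive matching policy registered for a file type."""
    try:
        return POLICY_LIBRARY[file_type]
    except KeyError:
        raise ValueError(f"no sensitive matching policy for {file_type!r}") from None
```

Step S40 then reduces to `policy_for(file_type).detect(content)`, so supporting a new file type means adding one entry to the table.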
Step S40: performing sensitive information detection on the content to be detected through the sensitive matching strategy to obtain a sensitive detection result of the current file to be detected.
It can be understood that the sensitive detection result of the current file to be detected is obtained by executing, on the content to be detected, the sensitive information detection flow recorded in the sensitive matching policy.
Further, in order to more accurately detect the sensitive information of the file of the document type, step S30 of the embodiment may include:
acquiring the file type of the current file to be detected; and when the file type is a document type, determining that the corresponding sensitive matching strategy is a document sensitive matching strategy.
Accordingly, step S40 of this embodiment may include:
acquiring a sensitive keyword matching rule set, a deformed word matching rule set and an ignore rule set through the document sensitive matching strategy; and performing sensitive information detection on the text data in the content to be detected according to the sensitive keyword matching rule set, the deformed word matching rule set and the ignore rule set to obtain a sensitive detection result of the current file to be detected.
It should be noted that, when the file type of the current file to be detected is the document type, the corresponding sensitive matching policy, that is, the document sensitive matching policy, may be looked up in the preset sensitive matching policy library according to the file type. The sensitive keyword matching rule set may be a set constructed by combining a plurality of sensitive keyword matching rules. The deformed word matching rule set may be a set constructed by combining a plurality of deformed word matching rules. The ignore rule set may be a set constructed by combining a plurality of ignore rules. The sensitive keyword matching rules, the deformed word matching rules and the ignore rules can be preset and stored in a preset rule base.
It should be noted that the sensitive information detection on the text data in the content to be detected may be carried out by a preset sensitive detection algorithm that works from the sensitive keyword matching rule set, the deformed word matching rule set and the ignore rule set. The preset sensitive detection algorithm may be based on the Aho-Corasick automaton algorithm (AC algorithm) fused with a bitmap algorithm. It can implement an "and" matching mode, which combines a keyword matching rule with a deformed word matching rule and thereby improves detection accuracy. For example, the keyword "harmony" has the deformed word "river crab". If only the keyword "harmony" were matched, the deformed word would go undetected; combining "harmony" and "river crab" through the "and" matching mode reduces the probability of missed detection and improves detection accuracy.
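To make the multi-pattern matching concrete, the following is a minimal Aho-Corasick sketch in which each pattern, whether a keyword or one of its deformed words, carries the label of the keyword it stands for. It is an illustration only: the bitmap fusion mentioned above is not reproduced, and the class name, the label scheme and the English example strings are assumptions rather than details from the patent.

```python
from collections import deque


class AhoCorasick:
    """Multi-pattern matcher: every pattern maps to the label it reports."""

    def __init__(self, patterns: dict[str, str]):
        self.goto = [{}]      # per-state character transitions (state 0 is the root)
        self.fail = [0]       # failure links
        self.out = [set()]    # labels reported on reaching each state
        for pattern, label in patterns.items():
            state = 0
            for ch in pattern:
                nxt = self.goto[state].get(ch)
                if nxt is None:
                    nxt = len(self.goto)
                    self.goto[state][ch] = nxt
                    self.goto.append({})
                    self.fail.append(0)
                    self.out.append(set())
                state = nxt
            self.out[state].add(label)
        # Breadth-first pass to fill in failure links and inherited outputs.
        queue = deque(self.goto[0].values())
        while queue:
            state = queue.popleft()
            for ch, nxt in self.goto[state].items():
                queue.append(nxt)
                f = self.fail[state]
                while f and ch not in self.goto[f]:
                    f = self.fail[f]
                self.fail[nxt] = self.goto[f].get(ch, 0)
                self.out[nxt] |= self.out[self.fail[nxt]]

    def search(self, text: str) -> set[str]:
        """Return the labels of all patterns that occur in text (one pass)."""
        state, found = 0, set()
        for ch in text:
            while state and ch not in self.goto[state]:
                state = self.fail[state]
            state = self.goto[state].get(ch, 0)
            found |= self.out[state]
        return found


# The deformed word reports the same label as the keyword it replaces.
matcher = AhoCorasick({"harmony": "harmony", "river crab": "harmony"})
print(matcher.search("they talked about the river crab again"))
```

Mapping the deformed word to the keyword's label is one simple way to treat the two rules as a single combined rule; the text is still scanned only once regardless of how many patterns are loaded.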
It should be noted that the preset sensitive detection algorithm may also combine an ignore rule in the ignore rule set with the corresponding keyword matching rule, so that a keyword is ignored in certain contexts and false detection is prevented. For example, if the ignored keyword in ignore rule A is "breast cancer" and the keyword in keyword matching rule B is "breast", ignore rule A and keyword matching rule B may be combined. Then, if the keyword "breast" of keyword matching rule B appears in the document content only as part of the ignored keyword "breast cancer" of ignore rule A, the document content is not determined to be sensitive information.
It can be understood that some sensitive words are inevitably mentioned in some academic articles, but by combining the ignoring rule with the keyword matching rule, the academic articles can be effectively prevented from being detected by mistake, the false detection probability can be reduced, and the detection accuracy can be improved.
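One possible reading of the ignore-rule combination is that occurrences of a keyword inside an ignored phrase are discounted before the keyword is matched. The sketch below follows that reading with plain substring search rather than the automaton-based algorithm; the function name and the rule format (keyword mapped to its ignored phrases) are assumptions for illustration, and a document-level interpretation of the ignore rule would behave differently.

```python
def detect_sensitive(
    text: str,
    keywords: list[str],
    ignore_rules: dict[str, list[str]],
) -> list[str]:
    """Return the keywords that occur in text outside their ignored phrases."""
    hits = []
    for keyword in keywords:
        masked = text
        for phrase in ignore_rules.get(keyword, ()):
            # Blank out the ignored phrase so the keyword inside it cannot match.
            masked = masked.replace(phrase, " " * len(phrase))
        if keyword in masked:
            hits.append(keyword)
    return hits
```

With the rule pair from the example, `detect_sensitive("a study of breast cancer screening", ["breast"], {"breast": ["breast cancer"]})` finds nothing, while a text that also uses the keyword outside the ignored phrase is still flagged.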
It should be noted that, since the preset sensitive detection algorithm can combine the rules in the ignore rule set, the sensitive keyword matching rule set, and the deformed word matching rule set before detecting sensitive information, it is not necessary to set a large number of separate keyword matching rules for detecting deformed words, and rule management is therefore simpler and more convenient.
In actual use, the preset rule base can be maintained and managed by an intelligent learning analysis cloud engine. The intelligent learning analysis cloud engine can be a rule management engine developed according to a machine learning algorithm, and can update the preset rule base according to existing sensitive words and trending sensitive words. Because the intelligent learning analysis cloud engine is deployed in the cloud and is not limited to one or more terminals, the problem that a large number of rules are difficult to maintain can be solved.
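The combination of keyword, deformed word, and ignore rules described above can be sketched as follows. This is a minimal illustration that uses plain substring search rather than the fused Aho-Corasick and bitmap algorithm named in the text; the `KeywordRule` class and the `find_hits` function are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class KeywordRule:
    """Hypothetical combined rule: a keyword, its deformed words, and ignore phrases."""
    keyword: str
    variants: List[str] = field(default_factory=list)
    ignore_phrases: List[str] = field(default_factory=list)


def find_hits(text: str, rule: KeywordRule) -> List[Tuple[int, str]]:
    """Return (offset, term) pairs for matches not covered by an ignore phrase."""
    # Collect the character spans covered by ignore phrases such as "breast cancer".
    ignored: List[Tuple[int, int]] = []
    for phrase in rule.ignore_phrases:
        start = text.find(phrase)
        while start != -1:
            ignored.append((start, start + len(phrase)))
            start = text.find(phrase, start + 1)

    hits: List[Tuple[int, str]] = []
    # The keyword and its deformed words are searched together ("and" matching mode).
    for term in [rule.keyword, *rule.variants]:
        start = text.find(term)
        while start != -1:
            end = start + len(term)
            covered = any(lo <= start and end <= hi for lo, hi in ignored)
            if not covered:
                hits.append((start, term))
            start = text.find(term, start + 1)
    return sorted(hits)
```

With such a combined rule, one entry covers the keyword, its deformed words, and its exceptions, which is the rule management simplification the paragraph above refers to.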
Further, in order to accurately detect the sensitive information of the file of the picture type, step S30 of this embodiment may include:
acquiring the file type of the current file to be detected; and when the file type is the picture type, determining that the corresponding sensitive matching strategy is the picture sensitive matching strategy.
Accordingly, step S40 of this embodiment may include:
acquiring a corresponding picture sensitivity detection model according to the picture sensitivity matching strategy; and carrying out sensitive information detection on the picture characteristic information in the content to be detected through the picture sensitive detection model so as to obtain a sensitive detection result of the current file to be detected.
It should be noted that, when the file type of the current file to be detected is the picture type, the corresponding sensitive matching policy, that is, the picture sensitive matching policy, may be looked up in the preset sensitive matching policy library according to the file type. The picture sensitivity detection model may be a model trained in advance based on a BP (back propagation) neural network, and the picture sensitivity detection model may determine whether the picture contains sensitive information according to the picture feature information, for example, by judging whether the picture is a pornographic picture, a violent picture, or another illegal picture.
Further, in order to accurately detect the sensitive information of the video-type file, step S30 in this embodiment may include:
acquiring the file type of the current file to be detected; and when the file type is a video type, determining that the corresponding sensitive matching strategy is a video sensitive matching strategy.
Accordingly, step S40 of this embodiment may include:
acquiring a corresponding video sensitivity detection model according to the video sensitivity matching strategy; and detecting the sensitive information of each video frame in the content to be detected through the video sensitive detection model so as to obtain a sensitive detection result of the current file to be detected.
It should be noted that, when the file type of the current file to be detected is a video type, the corresponding sensitive matching policy, that is, the video sensitive matching policy, may be looked up in the preset sensitive matching policy library according to the file type. The video sensitivity detection model may be a model trained in advance based on a BP (back propagation) neural network, and the video sensitivity detection model may detect a video frame to determine whether the video frame includes sensitive information. Detecting the sensitive information of each video frame in the content to be detected through the video sensitivity detection model may be detecting the video frames in the content to be detected one by one, or detecting a plurality of consecutive video frames in the content to be detected at a time, which is not limited in this embodiment.
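The two frame detection modes mentioned above (one frame at a time, or several consecutive frames at a time) can be illustrated with a single window parameter. The sketch below is an assumption-laden illustration: `detect_video`, the `Frame` alias, and the use of non-overlapping windows are all hypothetical, and the `model` argument stands in for the trained video sensitivity detection model, which is not specified here.

```python
from typing import Callable, List, Sequence

Frame = bytes  # placeholder frame type; real frames would be decoded images


def detect_video(
    frames: Sequence[Frame],
    model: Callable[[Sequence[Frame]], bool],
    window: int = 1,
) -> List[int]:
    """Return the starting index of every window that the model flags as sensitive.

    window=1 checks frames one by one; window>1 passes that many
    consecutive frames to the model at a time (non-overlapping windows).
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    flagged: List[int] = []
    for start in range(0, len(frames), window):
        if model(frames[start:start + window]):
            flagged.append(start)
    return flagged
```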
The embodiment determines the set of files to be detected according to the file detection instruction when the file detection instruction is detected; selecting a current file to be detected in a file set to be detected, and performing data analysis on the current file to be detected to obtain content to be detected; acquiring the file type of a current file to be detected, and determining a corresponding sensitive matching strategy according to the file type; and carrying out sensitive information detection on the content to be detected through a sensitive matching strategy so as to obtain a sensitive detection result of the current file to be detected. The corresponding sensitive matching strategy is determined according to the file type, so that the acquired sensitive matching strategy can be used for correctly detecting the sensitive information of the current file to be detected of different file types, and the sensitive detection result can be accurately acquired, thereby determining whether the sensitive information exists in the current file to be detected.
Referring to fig. 3, fig. 3 is a flowchart illustrating a method for detecting a sensitive document according to a second embodiment of the present invention.
Based on the first embodiment, in step S20, the method for detecting a sensitive file in this embodiment specifically includes:
step S201: and selecting the current file to be detected in the file set to be detected according to a preset file selection rule.
It should be noted that the preset file selection rule may be a rule configured in advance. Selecting the current file to be detected in the file set to be detected according to the preset file selection rule may be sequentially selecting an unmarked file to be detected in the file set to be detected as the current file to be detected, or randomly selecting an unmarked file to be detected in the file set to be detected as the current file to be detected.
Step S202: and detecting whether the current file to be detected is a compressed file.
It should be noted that the compressed file is different from a general file, the compressed file is obtained by performing file compression on a plurality of files, and if the file content of the compressed file needs to be acquired, the processing mode of the compressed file is different from that of the general file, so that before data analysis, it is required to detect whether the current file to be detected is the compressed file.
In actual use, detecting whether the current file to be detected is a compressed file may be to obtain a file suffix of the current file to be detected, compare the file suffix with a file suffix of a common compressed file, and determine whether the current file to be detected is a compressed file according to a comparison result.
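The suffix comparison described above can be sketched in a few lines. The suffix list and the `is_compressed` function name are assumptions chosen for illustration; the text does not enumerate which suffixes count as common compressed files.

```python
from pathlib import Path

# Assumed list of common compressed-file suffixes; extend as needed.
COMPRESSED_SUFFIXES = {".zip", ".rar", ".7z", ".gz", ".bz2", ".xz", ".tar"}


def is_compressed(path: str) -> bool:
    """Compare the file's last suffix (case-insensitively) with the known list."""
    return Path(path).suffix.lower() in COMPRESSED_SUFFIXES
```

A suffix check is cheap but can be fooled by renamed files; inspecting the file header is a possible refinement that the text does not describe.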
Further, in order to perform parsing processing on the compressed file, after step S202, the method of this embodiment may further include:
when the current file to be detected is a compressed file, decompressing the current file to be detected to obtain a decompressed file set; adding the files in the decompressed file set to the file set to be detected; and removing the current to-be-detected files from the to-be-detected file set, and returning to the step of selecting the current to-be-detected files in the to-be-detected file set according to a preset file selection rule.
It should be noted that, because the compressed file is obtained by performing file compression on a plurality of files, the file types of the plurality of files may be different, and it is very difficult to directly obtain the entire file content of the compressed file, therefore, when the current file to be detected is a compressed file, the current file to be detected is decompressed to obtain a decompressed file set, the files in the decompressed file set are added to the file set to be detected, the current file to be detected is removed from the file set to be detected, and then the current file to be detected is selected again from the file set to be detected.
In practical use, the files in the decompressed file set may be added to the front of the file set to be detected, so that when the current file to be detected is selected sequentially, the files obtained by decompression are preferentially selected as the current file to be detected.
It can be understood that, according to actual needs, the source of each file in the decompressed file set, that is, the compressed file from which it was obtained, may also be recorded, which makes it convenient for the user to identify the source of each file in the decompressed file set when the detection result is displayed.
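The decompress, enqueue at the front, and record the source flow described above can be sketched as follows. This is a minimal sketch assuming only `.zip` archives and Python's standard `zipfile` module; the `expand_archives` function and the `unzipped_N` directory naming are hypothetical.

```python
import zipfile
from collections import deque
from pathlib import Path
from typing import Deque, List, Optional, Tuple


def expand_archives(paths: List[str], workdir: str) -> List[Tuple[str, Optional[str]]]:
    """Return (file, source_archive) pairs with each .zip replaced by its members."""
    pending: Deque[Tuple[str, Optional[str]]] = deque((p, None) for p in paths)
    result: List[Tuple[str, Optional[str]]] = []
    counter = 0
    while pending:
        path, source = pending.popleft()
        if path.lower().endswith(".zip"):
            # Decompress into a fresh directory under workdir.
            target = Path(workdir) / f"unzipped_{counter}"
            counter += 1
            with zipfile.ZipFile(path) as archive:
                archive.extractall(target)
            members = sorted(str(p) for p in target.rglob("*") if p.is_file())
            # Add extracted files to the front so they are selected next,
            # remembering which archive each one came from.
            pending.extendleft(reversed([(m, path) for m in members]))
        else:
            result.append((path, source))
    return result
```

Because extracted files go back into the same queue, nested archives are expanded by the same loop; a production version would likely also cap nesting depth and extracted size.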
Step S203: and when the current file to be detected is not a compressed file, performing data analysis on the current file to be detected to obtain the content to be detected.
It can be understood that, if the current file to be detected is not a compressed file, data analysis may be performed on the current file to be detected to obtain the content to be detected corresponding to the file to be detected.
Further, in order to ensure that the correct content to be detected can be acquired, the step of performing data analysis on the current file to be detected to acquire the content to be detected in this embodiment may include:
detecting the file type of the current file to be detected; when the file type is a document type, determining the document format of the current file to be detected; and acquiring a document analysis strategy corresponding to the document format, and performing text data analysis on the current file to be detected according to the document analysis strategy to acquire the content to be detected.
It should be noted that, when data analysis is performed on a current file to be detected, a file type of the file needs to be determined first, that is, whether the file is a document type, a picture type or a video type is distinguished, and for different file types, contents to be detected, which need to be analyzed and obtained, are also different.
It should be noted that when the file type of the current file to be detected is determined to be the document type, the document format of the current file to be detected also needs to be determined, because different document formats require different text data analysis processes. Determining the document format of the current file to be detected may be obtaining a file suffix of the current file to be detected and determining the corresponding document format according to the file suffix. The document analysis strategy may record the process of analyzing the text data of a document, and documents in different document formats have different corresponding document analysis strategies. For example, for a file with the suffix ".txt", the document format is plain text, and the text content and format information of the file can be read directly to obtain the content to be detected. For files with suffixes such as ".doc", ".pdf", ".xml", and ".wps", the document format is an office document; because the text content and format information cannot be read directly from such a file, the file header information needs to be obtained first, the structure of the file header information is analyzed, and the text content and format information are obtained according to the analysis result so as to obtain the content to be detected.
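Selecting a document analysis strategy by file suffix amounts to a lookup table from suffix to parser. The sketch below assumes UTF-8 text files and leaves the header-based office parsing unimplemented, since its details are format specific and not given here; `DOCUMENT_PARSERS`, `parse_plain_text`, `parse_office_document`, and `parse_document` are hypothetical names.

```python
from pathlib import Path
from typing import Callable, Dict


def parse_plain_text(path: str) -> str:
    """Plain text can be read directly (UTF-8 is an assumption)."""
    return Path(path).read_text(encoding="utf-8", errors="replace")


def parse_office_document(path: str) -> str:
    """Placeholder: header-based parsing depends on the concrete format."""
    raise NotImplementedError("office document parsing is format specific")


# Hypothetical mapping from file suffix to document analysis strategy.
DOCUMENT_PARSERS: Dict[str, Callable[[str], str]] = {
    ".txt": parse_plain_text,
    ".doc": parse_office_document,
    ".pdf": parse_office_document,
    ".wps": parse_office_document,
}


def parse_document(path: str) -> str:
    """Pick the strategy for the file's suffix and return the extracted text."""
    suffix = Path(path).suffix.lower()
    parser = DOCUMENT_PARSERS.get(suffix)
    if parser is None:
        raise ValueError(f"no document analysis strategy for suffix {suffix!r}")
    return parser(path)
```

The same table-lookup pattern would apply to the feature extraction and video analysis mapping tables for picture and video formats.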
Further, in order to obtain the corresponding content to be detected when the file type is the picture type, after the step of detecting the file type of the current file to be detected, the method may further include:
when the file type is the picture type, determining the picture format of the current file to be detected; acquiring a feature extraction interface corresponding to the picture format; and extracting the characteristics of the current file to be detected through the characteristic extraction interface to obtain the content to be detected.
It should be noted that, when the file type of the current file to be detected is the picture type, the content to be detected that needs to be acquired is the picture characteristic information of the file to be detected. The determining of the picture format of the current file to be detected may be obtaining a file suffix of the current file to be detected, and determining the corresponding picture format according to the file suffix. The feature extraction interface can be a preset interface, and can be used for extracting features of the corresponding file in the picture format to acquire picture feature information. The obtaining of the feature extraction interface corresponding to the picture format may be searching for the corresponding feature extraction interface in a preset feature extraction mapping table according to the picture format, where the preset feature extraction mapping table may include a mapping relationship between the picture format and the corresponding feature extraction interface.
Further, in order to obtain the corresponding content to be detected when the file type is the video type, after the step of detecting the file type of the current file to be detected, the method may further include:
when the file type is a video type, determining the video format of the current file to be detected; acquiring a video analysis interface corresponding to the video format; and performing video frame decomposition on the current file to be detected through the video analysis interface to obtain the content to be detected.
It should be noted that, when the file type of the current file to be detected is a video type, the content to be detected that needs to be acquired is each video frame of the file to be detected. The determining of the video format of the current file to be detected may be obtaining a file suffix of the current file to be detected, and determining the corresponding video format according to the file suffix. The video parsing interface may be a preset interface, and the video parsing interface may perform video frame decomposition on a corresponding file in a video format to obtain each video frame. The obtaining of the video parsing interface corresponding to the video format may be searching for the corresponding video parsing interface in a preset video parsing mapping table according to the video format, where the preset video parsing mapping table may include a mapping relationship between the video format and the corresponding video parsing interface.
In this embodiment, the current file to be detected is selected from the file set to be detected according to a preset file selection rule; whether the current file to be detected is a compressed file is detected; and when the current file to be detected is not a compressed file, data analysis is performed on the current file to be detected to obtain the content to be detected. When the current file to be detected is determined not to be a compressed file, its file type can be acquired and a data analysis mode can be selected according to the file type, which ensures that the content to be detected of an uncompressed file can be acquired. When the current file to be detected is determined to be a compressed file, it can be decompressed, the decompressed files can be added to the file set to be detected, and the current file to be detected can be reselected, so that compressed files are also processed normally and universality is improved.
Referring to fig. 4, fig. 4 is a flowchart illustrating a method for detecting a sensitive document according to a third embodiment of the present invention.
Based on the first embodiment, after step S40, the method for detecting a sensitive file of this embodiment further includes:
step S50: and storing the sensitive detection result into a preset detection result library according to the file detection instruction, and marking the current file to be detected.
It should be noted that the sensitive detection result may include information such as the file size and file type of the file to be detected, whether the file contains sensitive information, and the position of the sensitive information in the file to be detected. The preset detection result library may be a preset database for storing detection results. Storing the sensitive detection result into the preset detection result library according to the file detection instruction may be obtaining an instruction identifier of the file detection instruction and storing the sensitive detection result into the preset detection result library according to the instruction identifier.
Step S60: detecting whether the to-be-detected files which are not marked exist in the to-be-detected file set or not; and returning to the step of selecting the current file to be detected in the file set to be detected and analyzing the data of the current file to be detected to obtain the content to be detected when the unmarked file to be detected exists in the file set to be detected.
It can be understood that if there is no unmarked file to be detected in the file set to be detected, all files to be detected in the file set to be detected have been detected.
It can be understood that if an unmarked file to be detected exists in the file set to be detected, some files in the file set to be detected have not yet been detected and detection needs to continue. In this case, the method returns to the step of selecting the current file to be detected in the file set to be detected and performing data analysis on the current file to be detected to obtain the content to be detected, so that a new current file to be detected is selected and detection continues.
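The mark-and-repeat loop of steps S50 and S60 can be sketched as follows, assuming sequential selection of unmarked files. The `run_detection` function is hypothetical, and the `detect` callable stands in for the parsing and sensitive matching steps (S20 to S40); an in-memory dictionary stands in for the preset detection result library.

```python
from typing import Callable, Dict, List, Set


def run_detection(files: List[str], detect: Callable[[str], bool]) -> Dict[str, bool]:
    """Detect every file once, marking each file after its result is stored."""
    results: Dict[str, bool] = {}   # stand-in for the preset detection result library
    marked: Set[str] = set()
    while True:
        # Step S60: check whether any unmarked file remains.
        unmarked = [f for f in files if f not in marked]
        if not unmarked:
            break
        current = unmarked[0]                 # sequential selection rule
        results[current] = detect(current)    # step S50: store the result
        marked.add(current)                   # step S50: mark the file
    return results
```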
Further, in order to facilitate a user to know a detection result, after the step of detecting whether the unmarked to-be-detected file exists in the to-be-detected file set, the method may further include:
when the unmarked files to be detected do not exist in the set of files to be detected, acquiring a sensitive detection result corresponding to the file detection instruction in the preset detection result library; constructing a sensitive file detection report according to the to-be-detected file set and the sensitive detection result corresponding to the file detection instruction; and storing and displaying the sensitive file detection report.
It should be noted that the sensitive file detection report may include a file detection range, a number of detected files, a detection execution time, and a sensitive detection result corresponding to each file to be detected. The step of storing the sensitive file detection report may be to store the sensitive file detection report in a preset storage space, and the step of displaying the sensitive file detection report may enable a user to quickly determine whether a file containing sensitive information exists in a file detection range, so that the user can perform subsequent processing conveniently.
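A report with the fields listed above could be assembled as a plain dictionary and serialized to JSON for storage. In this sketch the `build_report` function, the field names, and the extra `sensitive_files` summary are assumptions made for illustration; only the detection range, file count, execution time, and per-file results come from the description.

```python
import json
from datetime import datetime, timezone
from typing import Any, Dict


def build_report(
    detection_range: str,
    results: Dict[str, Dict[str, Any]],
    executed_at: datetime,
) -> Dict[str, Any]:
    """Assemble a sensitive file detection report from per-file results."""
    return {
        "detection_range": detection_range,
        "file_count": len(results),
        "executed_at": executed_at.isoformat(),
        "results": results,
        # Hypothetical convenience field: files flagged as sensitive.
        "sensitive_files": sorted(
            name for name, result in results.items() if result.get("sensitive")
        ),
    }


def serialize_report(report: Dict[str, Any]) -> str:
    """Serialize the report for storage; JSON is an assumed storage format."""
    return json.dumps(report, ensure_ascii=False, indent=2)
```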
In actual use, a scene of detecting part of sensitive information may involve the detection of the sensitive information of a confidential file, and the detection result and the detection report of the confidential file should be kept secret, at this time, a preset detection result base used for storing the sensitive detection result and a preset storage space used for storing the detection report of the confidential file can be encrypted and protected, and the preset detection result base and the preset storage space can be disconnected from an external network to be operated only in a local area network when necessary, so that the risk of leakage of the confidential information is reduced, and the management capability of the confidential file is improved.
In this embodiment, the sensitive detection result is stored in a preset detection result library according to the file detection instruction, and the current file to be detected is marked; whether an unmarked file to be detected exists in the file set to be detected is detected; and when an unmarked file to be detected exists in the file set to be detected, the method returns to the step of selecting the current file to be detected in the file set to be detected and performing data analysis on the current file to be detected to obtain the content to be detected. After each detection is finished, whether an unmarked file to be detected exists in the file set to be detected can thus be checked, and when one exists, a new current file to be detected is selected for detection, which ensures that all files in the file set to be detected are detected. When no unmarked file to be detected exists in the file set to be detected, a sensitive file detection report can be constructed, stored, and displayed, so that the user can quickly determine whether a file containing sensitive information exists in the file detection range, which facilitates subsequent processing. Storing the sensitive file detection report also prevents the report from being lost, which can improve the user experience.
In addition, an embodiment of the present invention further provides a storage medium, where a sensitive file detection program is stored on the storage medium, and when being executed by a processor, the sensitive file detection program implements the steps of the sensitive file detection method described above.
Referring to fig. 5, fig. 5 is a block diagram illustrating a first embodiment of the apparatus for detecting sensitive documents according to the present invention.
As shown in fig. 5, the sensitive document detecting apparatus provided in the embodiment of the present invention includes:
the instruction receiving module 501 is configured to, when a file detection instruction is detected, determine a set of files to be detected according to the file detection instruction;
a file analysis module 502, configured to select a current file to be detected in the set of files to be detected, and perform data analysis on the current file to be detected to obtain a content to be detected;
the policy matching module 503 is configured to obtain a file type of the current file to be detected, and determine a corresponding sensitive matching policy according to the file type;
the sensitivity detection module 504 is configured to perform sensitivity information detection on the content to be detected through the sensitivity matching policy to obtain a sensitivity detection result of the current file to be detected.
The embodiment determines the set of files to be detected according to the file detection instruction when the file detection instruction is detected; selecting a current file to be detected in a file set to be detected, and performing data analysis on the current file to be detected to obtain content to be detected; acquiring the file type of a current file to be detected, and determining a corresponding sensitive matching strategy according to the file type; and carrying out sensitive information detection on the content to be detected through a sensitive matching strategy so as to obtain a sensitive detection result of the current file to be detected. The corresponding sensitive matching strategy is determined according to the file type, so that the acquired sensitive matching strategy can be used for correctly detecting the sensitive information of the current file to be detected of different file types, and the sensitive detection result can be accurately acquired, thereby determining whether the sensitive information exists in the current file to be detected.
Further, the instruction receiving module 501 is further configured to, when a file detection instruction is detected, extract a file detection range and a type matching rule in the file detection instruction; determining a directory of the file to be detected according to the file detection range; and acquiring files of which the file types meet the type matching rules in the file directory to be detected so as to acquire a file set to be detected.
Further, the file parsing module 502 is further configured to select a current file to be detected in the set of files to be detected according to a preset file selection rule; detecting whether the current file to be detected is a compressed file; and when the current file to be detected is not a compressed file, performing data analysis on the current file to be detected to obtain the content to be detected.
Further, the file analysis module 502 is further configured to, when the current file to be detected is a compressed file, decompress the current file to be detected to obtain a decompressed file set; adding the files in the decompressed file set to the file set to be detected; and removing the current to-be-detected files from the to-be-detected file set, and returning to the step of selecting the current to-be-detected files in the to-be-detected file set according to a preset file selection rule.
Further, the file parsing module 502 is further configured to detect a file type of the current file to be detected; when the file type is a document type, determining the document format of the current file to be detected; and acquiring a document analysis strategy corresponding to the document format, and performing text data analysis on the current file to be detected according to the document analysis strategy to acquire the content to be detected.
Further, the file parsing module 502 is further configured to determine a picture format of the current file to be detected when the file type is a picture type; acquiring a feature extraction interface corresponding to the picture format; and extracting the characteristics of the current file to be detected through the characteristic extraction interface to obtain the content to be detected.
Further, the file parsing module 502 is further configured to determine a video format of the current file to be detected when the file type is a video type; acquiring a video analysis interface corresponding to the video format; and performing video frame decomposition on the current file to be detected through the video analysis interface to obtain the content to be detected.
Further, the policy matching module 503 is further configured to obtain a file type of the current file to be detected; when the file type is a document type, determining that the corresponding sensitive matching strategy is a document sensitive matching strategy;
the sensitivity detection module 504 is further configured to obtain a sensitive keyword matching rule set, a deformed word matching rule set, and an ignore rule set through the document sensitivity matching policy; and performing sensitive information detection on the text data in the content to be detected according to the sensitive keyword matching rule set, the deformed word matching rule set and the ignore rule set to obtain a sensitive detection result of the current file to be detected.
Further, the policy matching module 503 is further configured to obtain a file type of the current file to be detected; when the file type is the picture type, determining that the corresponding sensitive matching strategy is the picture sensitive matching strategy;
the sensitivity detection module 504 is further configured to obtain a corresponding picture sensitivity detection model according to the picture sensitivity matching policy; and carrying out sensitive information detection on the picture characteristic information in the content to be detected through the picture sensitive detection model so as to obtain a sensitive detection result of the current file to be detected.
Further, the policy matching module 503 is further configured to obtain a file type of the current file to be detected; when the file type is a video type, determining that a corresponding sensitive matching strategy is a video sensitive matching strategy;
the sensitivity detection module 504 is further configured to obtain a corresponding video sensitivity detection model according to the video sensitivity matching policy; and detecting the sensitive information of each video frame in the content to be detected through the video sensitive detection model so as to obtain a sensitive detection result of the current file to be detected.
Further, the sensitivity detection module 504 is further configured to store the sensitivity detection result into a preset detection result library according to the file detection instruction, and mark the current file to be detected; detecting whether the to-be-detected files which are not marked exist in the to-be-detected file set or not; and returning to the step of selecting the current file to be detected in the file set to be detected and analyzing the data of the current file to be detected to obtain the content to be detected when the unmarked file to be detected exists in the file set to be detected.
Further, the sensitivity detection module 504 is further configured to, when an unmarked file to be detected does not exist in the set of files to be detected, obtain a sensitivity detection result corresponding to the file detection instruction in the preset detection result library; constructing a sensitive file detection report according to the to-be-detected file set and the sensitive detection result corresponding to the file detection instruction; and storing and displaying the sensitive file detection report.
It should be understood that the above is only an example, and the technical solution of the present invention is not limited in any way, and in a specific application, a person skilled in the art may set the technical solution as needed, and the present invention is not limited thereto.
It should be noted that the above-described work flows are only exemplary, and do not limit the scope of the present invention, and in practical applications, a person skilled in the art may select some or all of them to achieve the purpose of the solution of the embodiment according to actual needs, and the present invention is not limited herein.
In addition, the technical details that are not described in detail in this embodiment may refer to the method for detecting a sensitive file provided in any embodiment of the present invention, and are not described herein again.
Further, it is to be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in a process, method, article, or system that comprises the element.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, and can of course also be implemented by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of the present invention, in essence or in the part that contributes over the prior art, may be embodied in the form of a software product. The computer software product is stored in a storage medium (e.g., read-only memory (ROM)/RAM, a magnetic disk, or an optical disk) and includes several instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present invention.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.
The invention discloses A1, a sensitive file detection method, which comprises the following steps:
when a file detection instruction is detected, determining a file set to be detected according to the file detection instruction;
selecting a current file to be detected in the file set to be detected, and performing data analysis on the current file to be detected to obtain a content to be detected;
acquiring the file type of the current file to be detected, and determining a corresponding sensitive matching strategy according to the file type;
and carrying out sensitive information detection on the content to be detected through the sensitive matching strategy so as to obtain a sensitive detection result of the current file to be detected.
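The four steps of A1 can be sketched as a small loop. This is a minimal illustration only, not the implementation disclosed by the patent: the extension table `TYPE_BY_SUFFIX`, the sample keywords, and all function names are assumptions introduced here, and the file set is modelled as an in-memory mapping from file name to raw bytes.

```python
from typing import Callable, Dict, List

# Hypothetical extension table; the patent does not enumerate file types this way.
TYPE_BY_SUFFIX = {"txt": "document", "png": "picture", "mp4": "video"}


def parse(raw: bytes) -> str:
    """Data analysis step: in this sketch, simply decode the bytes as text."""
    return raw.decode("utf-8", errors="ignore")


def file_type_of(name: str) -> str:
    """Derive a coarse file type from the file name extension."""
    return TYPE_BY_SUFFIX.get(name.rsplit(".", 1)[-1].lower(), "unknown")


def document_strategy(content: str) -> List[str]:
    """Toy matching strategy: list the sample keywords found in the text."""
    return [kw for kw in ("password", "secret") if kw in content.lower()]


# One matching strategy per file type; only documents are covered here.
STRATEGIES: Dict[str, Callable[[str], List[str]]] = {"document": document_strategy}


def detect(file_set: Dict[str, bytes]) -> Dict[str, List[str]]:
    """Run the steps of A1 over a {file name: raw bytes} mapping."""
    results: Dict[str, List[str]] = {}
    for name, raw in file_set.items():                # select the current file
        content = parse(raw)                          # obtain content to be detected
        strategy = STRATEGIES.get(file_type_of(name)) # strategy chosen by file type
        if strategy is not None:
            results[name] = strategy(content)         # sensitive detection result
    return results
```

Files whose type has no registered strategy are skipped in this sketch; a fuller version would register picture and video strategies in the same table.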
A2, the method for detecting sensitive files according to A1, wherein the step of determining the set of files to be detected according to the file detection instruction when the file detection instruction is detected includes:
when a file detection instruction is detected, extracting a file detection range and a type matching rule from the file detection instruction;
determining a directory of the file to be detected according to the file detection range;
and acquiring files of which the file types meet the type matching rules in the file directory to be detected so as to acquire a file set to be detected.
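As an illustration of A2, the following sketch builds the set of files to be detected from a detection range and a type matching rule. The `DetectionInstruction` structure and the use of glob-style patterns as the type matching rule are assumptions made for this example; the patent does not specify how the rule is encoded.

```python
import fnmatch
from dataclasses import dataclass
from pathlib import Path
from typing import List


@dataclass
class DetectionInstruction:
    """Hypothetical shape of a file detection instruction."""
    detection_range: str   # root directory to scan
    type_rules: List[str]  # glob-style patterns such as "*.txt"


def build_file_set(instruction: DetectionInstruction) -> List[Path]:
    """Collect every file under the range whose name matches a type rule."""
    root = Path(instruction.detection_range)
    patterns = [rule.lower() for rule in instruction.type_rules]
    return sorted(
        path
        for path in root.rglob("*")
        if path.is_file()
        and any(fnmatch.fnmatchcase(path.name.lower(), pat) for pat in patterns)
    )
```

Matching is done on lower-cased names so that `report.DOCX` satisfies the rule `*.docx`; whether the patented method is case-insensitive is not stated.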
A3, the sensitive file detection method according to A1, wherein the step of selecting the current file to be detected in the set of files to be detected and performing data analysis on the current file to be detected to obtain the content to be detected includes:
selecting a current file to be detected in the file set to be detected according to a preset file selection rule;
detecting whether the current file to be detected is a compressed file;
and when the current file to be detected is not a compressed file, performing data analysis on the current file to be detected to obtain the content to be detected.
A4, the method for detecting sensitive files according to A3, wherein after the step of detecting whether the current file to be detected is a compressed file, the method further comprises:
when the current file to be detected is a compressed file, decompressing the current file to be detected to obtain a decompressed file set;
adding the files in the decompressed file set to the file set to be detected;
and removing the current file to be detected from the set of files to be detected, and returning to the step of selecting the current file to be detected in the set of files to be detected according to a preset file selection rule.
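The handling of compressed files in A3 and A4 amounts to a worklist: an archive is replaced in the set by its members, which are then examined in turn. The sketch below assumes ZIP archives only, an in-memory file set, and first-in-first-out order as the "preset file selection rule"; the `!` separator in member names is an illustrative convention, not something the patent defines.

```python
import io
import zipfile
from typing import Dict, List, Tuple


def expand_archives(file_set: Dict[str, bytes]) -> Dict[str, bytes]:
    """Replace every ZIP archive in the set by its members, recursively."""
    pending: List[Tuple[str, bytes]] = list(file_set.items())
    plain: Dict[str, bytes] = {}
    while pending:
        name, raw = pending.pop(0)  # selection rule assumed here: first in, first out
        if zipfile.is_zipfile(io.BytesIO(raw)):
            with zipfile.ZipFile(io.BytesIO(raw)) as archive:
                for member in archive.infolist():
                    if not member.is_dir():
                        # Add each decompressed file back to the set to be detected.
                        pending.append(
                            (f"{name}!{member.filename}", archive.read(member))
                        )
            # The archive itself is dropped from the set at this point.
        else:
            plain[name] = raw       # not compressed: ready for data analysis
    return plain
```

A production version would also bound nesting depth and total decompressed size, since nested archives can expand to very large volumes. Note that ZIP-based document formats are also reported as archives by `zipfile.is_zipfile`, so a real implementation would check the document type first.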
A5, the method for detecting sensitive files as in A3, wherein the step of performing data analysis on the current file to be detected to obtain the content to be detected includes:
detecting the file type of the current file to be detected;
when the file type is a document type, determining the document format of the current file to be detected;
and acquiring a document analysis strategy corresponding to the document format, and performing text data analysis on the current file to be detected according to the document analysis strategy to acquire content to be detected.
A6, the method for detecting sensitive files according to A5, wherein after the step of detecting the file type of the current file to be detected, the method further includes:
when the file type is the picture type, determining the picture format of the current file to be detected;
acquiring a feature extraction interface corresponding to the picture format;
and extracting the characteristics of the current file to be detected through the characteristic extraction interface to obtain the content to be detected.
A7, the method for detecting sensitive files according to A5, wherein after the step of detecting the file type of the current file to be detected, the method further includes:
when the file type is a video type, determining the video format of the current file to be detected;
acquiring a video analysis interface corresponding to the video format;
and performing video frame decomposition on the current file to be detected through the video analysis interface to obtain the content to be detected.
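A5 to A7 describe choosing an analysis routine by file type and then by format. One way to sketch this is a lookup table keyed by (file type, format). The table `PARSERS` and the two document parsers below are assumptions for illustration; picture feature extraction interfaces and video analysis interfaces would be registered under the same table but are not implemented here.

```python
from html.parser import HTMLParser
from typing import Callable, Dict, List, Tuple


class _TextExtractor(HTMLParser):
    """Collects the text nodes of an HTML document."""

    def __init__(self) -> None:
        super().__init__()
        self.chunks: List[str] = []

    def handle_data(self, data: str) -> None:
        self.chunks.append(data)


def parse_txt(raw: bytes) -> str:
    """Document analysis strategy for plain text."""
    return raw.decode("utf-8", errors="ignore")


def parse_html(raw: bytes) -> str:
    """Document analysis strategy for HTML: keep only the visible text."""
    extractor = _TextExtractor()
    extractor.feed(raw.decode("utf-8", errors="ignore"))
    extractor.close()
    return " ".join(c.strip() for c in extractor.chunks if c.strip())


# Hypothetical registry: (file type, format) -> analysis routine.
PARSERS: Dict[Tuple[str, str], Callable[[bytes], str]] = {
    ("document", "txt"): parse_txt,
    ("document", "html"): parse_html,
}


def parse_content(file_type: str, file_format: str, raw: bytes) -> str:
    """Dispatch to the routine registered for this type and format."""
    try:
        parser = PARSERS[(file_type, file_format)]
    except KeyError:
        raise ValueError(f"no parser registered for {file_type}/{file_format}") from None
    return parser(raw)
```

Keeping the dispatch in a table means a new format can be supported by registering one more entry, without touching the selection loop.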
A8, the sensitive file detection method according to A1, wherein the step of acquiring the file type of the current file to be detected and determining a corresponding sensitive matching strategy according to the file type includes:
acquiring the file type of the current file to be detected;
when the file type is a document type, determining that the corresponding sensitive matching strategy is a document sensitive matching strategy;
correspondingly, the step of performing sensitive information detection on the content to be detected through the sensitive matching strategy to obtain a sensitive detection result of the current file to be detected includes:
acquiring a sensitive keyword matching rule set, a deformed word matching rule set and an ignore rule set through the document sensitive matching strategy;
and performing sensitive information detection on the text data in the content to be detected according to the sensitive keyword matching rule set, the deformed word matching rule set and the ignore rule set to obtain a sensitive detection result of the current file to be detected.
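The interplay of the three rule sets in A8 can be illustrated as follows. This is a sketch under stated assumptions: the patent does not define the rule formats, so the keyword list, the character substitution table used for deformed words, and the regular-expression ignore rules below are all hypothetical.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical rule sets for illustration only.
KEYWORDS = ["password", "confidential"]
DEFORM_MAP = str.maketrans({"@": "a", "0": "o", "$": "s"})  # look-alike -> canonical
IGNORE_RULES = [re.compile(r"password policy")]


def detect_text(text: str) -> List[Dict[str, object]]:
    """Return keyword hits, including deformed spellings, minus ignored spans."""
    lowered = text.lower()
    # One-to-one character substitution, so offsets stay aligned with `lowered`.
    normalized = lowered.translate(DEFORM_MAP)
    ignored: List[Tuple[int, int]] = [
        m.span() for rule in IGNORE_RULES for m in rule.finditer(normalized)
    ]
    hits: List[Dict[str, object]] = []
    for keyword in KEYWORDS:
        for m in re.finditer(re.escape(keyword), normalized):
            # Skip matches that lie entirely inside an ignored span.
            if any(lo <= m.start() and m.end() <= hi for lo, hi in ignored):
                continue
            seen = lowered[m.start():m.end()]
            hits.append(
                {"keyword": keyword, "start": m.start(), "deformed": seen != keyword}
            )
    return hits
```

For example, `p@ssw0rd` is reported as a deformed spelling of `password`, while the phrase `password policy` is suppressed by the ignore rule. Offsets refer to the lower-cased text.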
A9, the sensitive file detection method according to A1, wherein the step of acquiring the file type of the current file to be detected and determining a corresponding sensitive matching strategy according to the file type includes:
acquiring the file type of the current file to be detected;
when the file type is a picture type, determining that the corresponding sensitive matching strategy is a picture sensitive matching strategy;
correspondingly, the step of performing sensitive information detection on the content to be detected through the sensitive matching strategy to obtain a sensitive detection result of the current file to be detected includes:
acquiring a corresponding picture sensitive detection model according to the picture sensitive matching strategy;
and performing sensitive information detection on the picture feature information in the content to be detected through the picture sensitive detection model to obtain a sensitive detection result of the current file to be detected.
A10, the sensitive file detection method according to A1, wherein the step of acquiring the file type of the current file to be detected and determining a corresponding sensitive matching strategy according to the file type includes:
acquiring the file type of the current file to be detected;
when the file type is a video type, determining that the corresponding sensitive matching strategy is a video sensitive matching strategy;
correspondingly, the step of performing sensitive information detection on the content to be detected through the sensitive matching strategy to obtain a sensitive detection result of the current file to be detected includes:
acquiring a corresponding video sensitive detection model according to the video sensitive matching strategy;
and performing sensitive information detection on each video frame in the content to be detected through the video sensitive detection model to obtain a sensitive detection result of the current file to be detected.
A11, the method for detecting sensitive files according to A1, wherein after the step of performing sensitive information detection on the content to be detected through the sensitive matching strategy to obtain the sensitive detection result of the current file to be detected, the method further includes:
storing the sensitive detection result into a preset detection result library according to the file detection instruction, and marking the current file to be detected;
detecting whether the to-be-detected files which are not marked exist in the to-be-detected file set or not;
and returning to the step of selecting the current file to be detected in the file set to be detected and analyzing the data of the current file to be detected to obtain the content to be detected when the unmarked file to be detected exists in the file set to be detected.
A12, the method for detecting sensitive files according to A11, wherein after the step of detecting whether the set of files to be detected has the files to be detected which are not marked, the method further comprises:
when the unmarked files to be detected do not exist in the set of files to be detected, acquiring a sensitive detection result corresponding to the file detection instruction in the preset detection result library;
constructing a sensitive file detection report according to the to-be-detected file set and the sensitive detection result corresponding to the file detection instruction;
and storing and displaying the sensitive file detection report.
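The marking loop of A11 and the report construction of A12 can be sketched together. The result library is modelled here as a nested dictionary keyed by a hypothetical instruction identifier, and the report fields are illustrative choices; the patent does not specify either structure.

```python
from typing import Callable, Dict, List, Set

ResultLibrary = Dict[str, Dict[str, List[str]]]


def run_detection(
    instruction_id: str,
    file_set: List[str],
    detect_one: Callable[[str], List[str]],
    result_library: ResultLibrary,
) -> Dict[str, object]:
    """Detect every file once, marking each as done, then build a report."""
    marked: Set[str] = set()
    while True:
        unmarked = [name for name in file_set if name not in marked]
        if not unmarked:            # no unmarked file remains: stop looping
            break
        current = unmarked[0]
        # Store the result under the instruction, then mark the file.
        result_library.setdefault(instruction_id, {})[current] = detect_one(current)
        marked.add(current)

    results = result_library.get(instruction_id, {})
    return {
        "instruction": instruction_id,
        "files_scanned": len(file_set),
        "sensitive_files": sorted(n for n, hits in results.items() if hits),
        "details": results,
    }
```

Keying the library by instruction lets results from several detection runs coexist, which is one plausible reading of storing results "according to the file detection instruction".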
The invention discloses B13, a sensitive file detection device, which comprises the following modules:
the instruction receiving module is used for determining a file set to be detected according to the file detection instruction when the file detection instruction is detected;
the file analysis module is used for selecting a current file to be detected in the set of files to be detected and analyzing data of the current file to be detected to obtain content to be detected;
the strategy matching module is used for acquiring the file type of the current file to be detected and determining a corresponding sensitive matching strategy according to the file type;
and the sensitivity detection module is used for detecting the sensitivity information of the content to be detected through the sensitivity matching strategy so as to obtain the sensitivity detection result of the current file to be detected.
B14, the sensitive file detection device according to B13, wherein the instruction receiving module is further configured to, when a file detection instruction is detected, extract a file detection range and a type matching rule from the file detection instruction; determine a directory of files to be detected according to the file detection range; and acquire the files in that directory whose file types satisfy the type matching rule, so as to obtain the set of files to be detected.
B15, the sensitive file detection device according to B13, wherein the file analysis module is further configured to select a current file to be detected in the set of files to be detected according to a preset file selection rule; detect whether the current file to be detected is a compressed file; and, when the current file to be detected is not a compressed file, perform data analysis on the current file to be detected to obtain the content to be detected.
B16, the sensitive file detection device according to B15, wherein the file analysis module is further configured to, when the current file to be detected is a compressed file, decompress the current file to be detected to obtain a decompressed file set; add the files in the decompressed file set to the set of files to be detected; and remove the current file to be detected from the set of files to be detected and return to the step of selecting the current file to be detected in the set of files to be detected according to a preset file selection rule.
B17, the sensitive file detection device according to B15, wherein the file analysis module is further configured to detect the file type of the current file to be detected; when the file type is a document type, determine the document format of the current file to be detected; and acquire a document analysis strategy corresponding to the document format and perform text data analysis on the current file to be detected according to the document analysis strategy to obtain the content to be detected.
B18, the sensitive file detection device according to B17, wherein the file analysis module is further configured to determine the picture format of the current file to be detected when the file type is a picture type; acquire a feature extraction interface corresponding to the picture format; and perform feature extraction on the current file to be detected through the feature extraction interface to obtain the content to be detected.
The invention discloses C19, sensitive file detection equipment, which comprises: a processor, a memory, and a sensitive file detection program stored in the memory and executable on the processor, wherein the sensitive file detection program, when executed by the processor, implements the steps of the sensitive file detection method described above.
The invention discloses D20, a computer-readable storage medium, on which a sensitive file detection program is stored, wherein the sensitive file detection program, when executed, implements the steps of the sensitive file detection method described above.

Claims (10)

1. A sensitive file detection method is characterized by comprising the following steps:
when a file detection instruction is detected, determining a file set to be detected according to the file detection instruction;
selecting a current file to be detected in the file set to be detected, and performing data analysis on the current file to be detected to obtain a content to be detected;
acquiring the file type of the current file to be detected, and determining a corresponding sensitive matching strategy according to the file type;
and carrying out sensitive information detection on the content to be detected through the sensitive matching strategy so as to obtain a sensitive detection result of the current file to be detected.
2. The method for detecting sensitive files according to claim 1, wherein the step of determining the set of files to be detected according to the file detection instruction when the file detection instruction is detected comprises:
when a file detection instruction is detected, extracting a file detection range and a type matching rule from the file detection instruction;
determining a directory of the file to be detected according to the file detection range;
and acquiring files of which the file types meet the type matching rules in the file directory to be detected so as to acquire a file set to be detected.
3. The method for detecting the sensitive file according to claim 1, wherein the step of selecting the current file to be detected in the set of files to be detected and performing data analysis on the current file to be detected to obtain the content to be detected comprises:
selecting a current file to be detected in the file set to be detected according to a preset file selection rule;
detecting whether the current file to be detected is a compressed file;
and when the current file to be detected is not a compressed file, performing data analysis on the current file to be detected to obtain the content to be detected.
4. The method for detecting sensitive files according to claim 3, wherein after the step of detecting whether the current file to be detected is a compressed file, the method further comprises:
when the current file to be detected is a compressed file, decompressing the current file to be detected to obtain a decompressed file set;
adding the files in the decompressed file set to the file set to be detected;
and removing the current file to be detected from the set of files to be detected, and returning to the step of selecting the current file to be detected in the set of files to be detected according to a preset file selection rule.
5. The method for detecting the sensitive file according to claim 3, wherein the step of performing data analysis on the current file to be detected to obtain the content to be detected comprises:
detecting the file type of the current file to be detected;
when the file type is a document type, determining the document format of the current file to be detected;
and acquiring a document analysis strategy corresponding to the document format, and performing text data analysis on the current file to be detected according to the document analysis strategy to acquire content to be detected.
6. The method for detecting sensitive files according to claim 5, wherein after the step of detecting the file type of the current file to be detected, the method further comprises:
when the file type is the picture type, determining the picture format of the current file to be detected;
acquiring a feature extraction interface corresponding to the picture format;
and extracting the characteristics of the current file to be detected through the characteristic extraction interface to obtain the content to be detected.
7. The method for detecting sensitive files according to claim 5, wherein after the step of detecting the file type of the current file to be detected, the method further comprises:
when the file type is a video type, determining the video format of the current file to be detected;
acquiring a video analysis interface corresponding to the video format;
and performing video frame decomposition on the current file to be detected through the video analysis interface to obtain the content to be detected.
8. A sensitive file detection device, characterized in that the sensitive file detection device comprises the following modules:
the instruction receiving module is used for determining a file set to be detected according to the file detection instruction when the file detection instruction is detected;
the file analysis module is used for selecting a current file to be detected in the set of files to be detected and analyzing data of the current file to be detected to obtain content to be detected;
the strategy matching module is used for acquiring the file type of the current file to be detected and determining a corresponding sensitive matching strategy according to the file type;
and the sensitivity detection module is used for detecting the sensitivity information of the content to be detected through the sensitivity matching strategy so as to obtain the sensitivity detection result of the current file to be detected.
9. Sensitive file detection equipment, comprising: a processor, a memory, and a sensitive file detection program stored in the memory and executable on the processor, wherein the sensitive file detection program, when executed by the processor, implements the steps of the sensitive file detection method according to any one of claims 1-7.
10. A computer-readable storage medium, having stored thereon a sensitive-file detection program which, when executed, performs the steps of the sensitive-file detection method of any one of claims 1-7.
CN202110514767.6A 2021-05-11 2021-05-11 Sensitive file detection method, device, equipment and storage medium Pending CN113254577A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110514767.6A CN113254577A (en) 2021-05-11 2021-05-11 Sensitive file detection method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110514767.6A CN113254577A (en) 2021-05-11 2021-05-11 Sensitive file detection method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN113254577A true CN113254577A (en) 2021-08-13

Family

ID=77222890

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110514767.6A Pending CN113254577A (en) 2021-05-11 2021-05-11 Sensitive file detection method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113254577A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116132187A (en) * 2023-02-23 2023-05-16 北京京航计算通讯研究所 Data packet filtering method and system
CN116305291A (en) * 2023-05-16 2023-06-23 北京安天网络安全技术有限公司 Office document secure storage method, device, equipment and medium

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116132187A (en) * 2023-02-23 2023-05-16 北京京航计算通讯研究所 Data packet filtering method and system
CN116305291A (en) * 2023-05-16 2023-06-23 北京安天网络安全技术有限公司 Office document secure storage method, device, equipment and medium
CN116305291B (en) * 2023-05-16 2023-07-21 北京安天网络安全技术有限公司 Office document secure storage method, device, equipment and medium

Similar Documents

Publication Publication Date Title
US20140212040A1 (en) Document Alteration Based on Native Text Analysis and OCR
CN109446837B (en) Text auditing method and device based on sensitive information and readable storage medium
CN113254577A (en) Sensitive file detection method, device, equipment and storage medium
CN108090351A (en) For handling the method and apparatus of request message
CN115150261B (en) Alarm analysis method, device, electronic equipment and storage medium
CN110929110B (en) Electronic document detection method, device, equipment and storage medium
CN112612756A (en) Abnormal file repairing method, device, equipment and storage medium
EP3301603A1 (en) Improved search for data loss prevention
CN115658080A (en) Method and system for identifying open source code components of software
CN111597553A (en) Process processing method, device, equipment and storage medium in virus searching and killing
CN111026765A (en) Dynamic processing method, equipment, storage medium and device for strictly balanced binary tree
CN114676231A (en) Target information detection method, device and medium
CN112632528A (en) Threat information generation method, equipment, storage medium and device
CN112615873A (en) Internet of things equipment safety detection method, equipment, storage medium and device
CN105653674B (en) File management method and system of intelligent terminal
KR101742041B1 (en) an apparatus for protecting private information, a method of protecting private information, and a storage medium for storing a program protecting private information
CN114143074A (en) Webshell attack recognition device and method
CN114676133A (en) Index creating method, device, equipment and storage medium
CN115618349A (en) Industrial control asset vulnerability detection method, equipment, storage medium and device
CN112883375A (en) Malicious file identification method, device, equipment and storage medium
CN112163217A (en) Malicious software variant identification method, device, equipment and computer storage medium
CN112463319A (en) Content detection model generation method and device, electronic equipment and storage medium
CN113127867A (en) Document identification method, device, equipment and storage medium
CN113641964B (en) Repackaging application detection method, electronic device and storage medium
CN103886031A (en) Method and equipment for image browsing

Legal Events

Date Code Title Description
PB01 Publication
CB02 Change of applicant information

Address after: 100020 1773, 15 / F, 17 / F, building 3, No.10, Jiuxianqiao Road, Chaoyang District, Beijing

Applicant after: 360 Digital Security Technology Group Co., Ltd.

Address before: 100020 1773, 15 / F, 17 / F, building 3, No.10, Jiuxianqiao Road, Chaoyang District, Beijing

Applicant before: Beijing Hongteng Intelligent Technology Co.,Ltd.

SE01 Entry into force of request for substantive examination