CN105912946A - Document detection method and device - Google Patents
- Publication number: CN105912946A
- Application number: CN201610206473.6A
- Authority: CN (China)
- Prior art keywords: file, strategy, detected, content, content information
- Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/62—Protecting access to data via a platform, e.g. using keys or access control rules
- G06F21/6218—Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
- G06F21/6227—Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database where protection concerns the structure of data, e.g. records, types, queries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2221/00—Indexing scheme relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F2221/21—Indexing scheme relating to G06F21/00 and subgroups addressing additional information or applications relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F2221/2107—File encryption
Abstract
The application provides a document detection method and device. Compared with the prior art, the application obtains a file to be detected, extracts the content information of that file, and matches the content information against preset strategies to obtain a matching result. If the match succeeds, the trigger action of the strategy is performed. Because the preset strategies are applied only to the content information of the file to be detected, encrypting the content carrier is avoided and the operating efficiency of the enterprise is improved. At the same time, if the content information of the file to be detected matches a preset strategy, the trigger action of that strategy is performed, that is, an attempt to steal the enterprise's confidential data raises an alarm and is blocked, so that the confidential data and information of the enterprise are effectively protected.
Description
Technical field
The application relates to the computer field, and in particular to a technique for file detection.
Background technology
The theft of confidential enterprise data and information has always been a worry for enterprises. To address enterprise data leakage, some security vendors have proposed solutions aimed at wireless networks, USB storage devices and the like. Although these provide a degree of protection, they still have significant loopholes and shortcomings:
(1) Physical isolation: some enterprises provide no Internet access, and private use of wired networks, wireless WiFi and the like is not allowed. USB interfaces on PCs are blocked or removed outright. As a result, however, employees cannot make full use of network resources, and blocking the USB interface also prevents the use of other USB devices, which inconveniences employees in their normal work and reduces work efficiency.
(2) File encryption: some enterprises use encryption and decryption technology to uniformly encrypt all documents of a given type. For example, the finance department may encrypt all Excel files, and the creative department may encrypt all Word documents. An employee who steals an encrypted file therefore cannot use it. The shortcoming of this technique is that it is one-size-fits-all: Word documents are either all encrypted or all left unencrypted. As a result, employees' ordinary Word documents are also forcibly encrypted, which hampers everyday file transfers. In addition, uniform file encryption has an obvious defect: an employee can easily convert the content of a file into a file of another format and thereby escape encryption.
Both physical isolation and file encryption have serious drawbacks in practical use: neither can protect confidential enterprise data and information from theft while also leaving employees' routine work unaffected and their efficiency undiminished.
Summary of the invention
One object of the application is to provide a method and a device for file detection.
According to one aspect of the application, a file detection method is provided, wherein the method comprises:
obtaining a file to be detected, and extracting the content information in the file to be detected;
matching the content information against preset strategies to obtain a matching result;
if the matching result is a successful match, executing the strategy trigger action.
According to another aspect of the application, a file detection device is provided, wherein the device comprises:
a first device, for obtaining a file to be detected and extracting the content information in the file to be detected;
a second device, for matching the content information against preset strategies to obtain a matching result;
a third device, for executing the strategy trigger action if the matching result is a successful match.
Compared with the prior art, the application obtains a file to be detected, extracts the content information in that file, matches the content information against preset strategies to obtain a matching result, and, if the matching result is a successful match, executes the strategy trigger action. The application uses preset strategies to detect the content information of the file to be detected, so that only the content information in the file is examined; encryption of the content carrier is avoided and the operating efficiency of the enterprise is improved. Meanwhile, if the content information in the file to be detected matches a preset strategy, the strategy trigger action is executed, that is, the attempt to steal confidential enterprise data is reported and blocked, so that the confidential data and information of the enterprise are effectively protected.
Brief description of the drawings
Other features, objects and advantages of the application will become more apparent upon reading the detailed description of non-limiting embodiments made with reference to the following drawings:
Fig. 1 illustrates a flowchart of a file detection method according to one aspect of the application;
Fig. 2 illustrates a flowchart of a file detection method according to a preferred embodiment of the application;
Fig. 3 illustrates a schematic diagram of a file detection device according to another aspect of the application;
Fig. 4 illustrates a schematic diagram of a file detection device according to a preferred embodiment of the application.
In the drawings, the same or similar reference numerals denote the same or similar components.
Detailed description of the invention
The application is described in further detail below with reference to the drawings.
In a typical configuration of the application, the terminal, the devices of the service network and the trusted party each include one or more processors (CPUs), input/output interfaces, network interfaces and memory.
Memory may include volatile memory in computer-readable media, random access memory (RAM) and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, which can store information by any method or technology. The information may be computer-readable instructions, data structures, program modules or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
Fig. 1 illustrates a flowchart of a file detection method according to one aspect of the application, wherein the method comprises step S11, step S12 and step S13. Specifically, in step S11, the device 1 obtains a file to be detected and extracts the content information in the file to be detected; in step S12, the device 1 matches the content information against preset strategies to obtain a matching result; in step S13, if the matching result is a successful match, the device 1 executes the strategy trigger action.
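The three steps above can be sketched as a small pipeline. This is only an illustrative sketch, not the patented implementation: the `Strategy` class, the callable-based `matches` and `trigger` fields, and the function names are all hypothetical, and content extraction is reduced to decoding bytes as text.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Strategy:
    """A preset strategy (hypothetical shape, for illustration only)."""
    name: str
    matches: Callable[[str], bool]   # checks the policy content against text
    trigger: Callable[[str], str]    # the strategy trigger action


def extract_content(raw: bytes) -> str:
    """Step S11 (simplified): treat the file bytes as UTF-8 text."""
    return raw.decode("utf-8", errors="ignore")


def match_content(content: str, strategies: List[Strategy]) -> Optional[Strategy]:
    """Step S12: return the first strategy whose policy content matches."""
    for strategy in strategies:
        if strategy.matches(content):
            return strategy
    return None


def detect(raw: bytes, strategies: List[Strategy]) -> Optional[str]:
    """Steps S11 to S13 chained: extract, match, then run the trigger action."""
    hit = match_content(extract_content(raw), strategies)
    if hit is None:
        return None
    # Step S13: execute the trigger action only on a successful match.
    return hit.trigger(hit.name)
```

A file that matches no strategy simply yields `None`, which corresponds to the case where no trigger action is executed.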
Here, the device 1 includes, but is not limited to, user equipment, network equipment, or a device formed by integrating user equipment and network equipment through a network. The user equipment includes, but is not limited to, computers, touch terminals and the like. The network equipment includes an electronic device that can automatically perform numerical computation and information processing according to preset or stored instructions; its hardware includes, but is not limited to, microprocessors, application-specific integrated circuits (ASIC), programmable gate arrays (FPGA), digital signal processors (DSP), embedded devices and the like. The network equipment includes, but is not limited to, a computer, a network host, a single network server, a set of multiple network servers, or a cloud formed by multiple servers; here, the cloud consists of a large number of computers or network servers based on cloud computing (Cloud Computing), where cloud computing is a form of distributed computing, namely a virtual supercomputer composed of a group of loosely coupled computers. The network includes, but is not limited to, the Internet, wide area networks, metropolitan area networks, local area networks, VPNs, wireless ad hoc networks (Ad Hoc networks) and the like. Preferably, the device 1 may also be a script program running on the user equipment, on the network equipment, or on a device formed by integrating user equipment and network equipment, or a touch terminal and network equipment, through a network. Of course, those skilled in the art will understand that the device 1 described above is only an example; other existing or future devices, if applicable to the application, should also fall within the scope of protection of the application and are hereby incorporated by reference.
In step S11, the device 1 obtains a file to be detected and extracts the content information in the file to be detected.
For example, the device 1 analyzes protocols such as FTP, HTTP, SMTP, POP3 and SMB at the gateway to obtain an original file A to be detected and a file B describing the original file. The original file A may be a Word document, Excel file, PowerPoint file, PDF file, XML file, HTML file, picture file, 7z file, rar file or zip file. The file B describing the original file contains information such as the file protocol, source/destination IP and port number, file size, file type and original file path. After the file to be detected is obtained, the content information is extracted from the original file A.
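The descriptor file B can be pictured as a small record that travels alongside the original file A. The sketch below is an assumption about its shape: the field names and the JSON serialization are hypothetical, chosen only to mirror the items the text lists (protocol, source/destination IP and port, size, type, original path).

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class FileDescriptor:
    """Hypothetical layout of the 'file B' that describes original file A."""
    protocol: str        # e.g. "ftp", "http", "smtp", "pop3", "smb"
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    size: int            # file size in bytes
    file_type: str       # e.g. "docx", "pdf", "zip"
    original_path: str


def to_json(descriptor: FileDescriptor) -> str:
    """Serialize the descriptor so it can be stored next to the captured file."""
    return json.dumps(asdict(descriptor), sort_keys=True)
```

Keeping the source/destination IP and port in the descriptor is what later allows a blocking action to target the specific connection.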
Preferably, in step S11, the device 1 obtains files that the user uploads, downloads or copies to a storage medium.
For example, the device 1 uses gateway analysis to capture files that the user uploads to the network or downloads from the network, or uses an information capture technique (such as a hook) to capture files that the user copies to USB storage media such as USB flash drives and hard disks, and then inspects the captured files so that users cannot take such opportunities to steal confidential enterprise information.
Preferably, the extracted content information is all of the text information in the file to be detected; that is, in step S11, the device 1 extracts all text information in the file to be detected.
For example, the device 1 extracts all text information, and only text information, from the Word documents, Excel files, PowerPoint files, PDF files, XML files, HTML files, picture files, 7z files, rar files and zip files that the user uploads to or downloads from the network, as captured through gateway analysis; or from files of the same types that the user copies to USB storage media, as captured through a hook. For example, if a picture file contains both image content and text that describes or explains the image, the device 1 extracts only the text when extracting the content information of that picture file and does not extract the image content.
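As a rough illustration of text-only extraction, the sketch below handles three of the listed formats using nothing but the Python standard library: plain text, HTML and zip archives (recursing into members). It is an assumption-laden simplification: the function names are hypothetical, type detection is by file extension only, and formats that need dedicated parsers (Word, Excel, PDF, pictures, 7z, rar) are deliberately returned as empty text.

```python
import io
import zipfile
from html.parser import HTMLParser


class _TextOnly(HTMLParser):
    """Collects text nodes and ignores tags (images, attributes, markup)."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        if data.strip():
            self.parts.append(data.strip())


def html_to_text(html: str) -> str:
    parser = _TextOnly()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def extract_text(name: str, data: bytes) -> str:
    """Return only the text content of a captured file (simplified sketch)."""
    lower = name.lower()
    if lower.endswith(".zip"):
        texts = []
        with zipfile.ZipFile(io.BytesIO(data)) as archive:
            for info in archive.infolist():
                if not info.is_dir():
                    texts.append(extract_text(info.filename, archive.read(info)))
        return "\n".join(t for t in texts if t)
    if lower.endswith((".html", ".htm")):
        return html_to_text(data.decode("utf-8", errors="ignore"))
    if lower.endswith(".txt"):
        return data.decode("utf-8", errors="ignore")
    return ""  # other formats would need dedicated parsers, not shown here
```

A production extractor would also need limits on archive depth and decompressed size, which this sketch omits.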
In step S12, the device 1 matches the content information against preset strategies to obtain a matching result.
Specifically, before file detection is carried out, the user first needs to define the corresponding strategies. A strategy comprises a policy name, a policy level, policy content and a strategy trigger action. The device 1 matches the extracted content information against these user-defined strategies; if the content information matches any one of the strategies, a matching result is obtained.
Preferably, in step S12, the device 1 matches the content information against the policy content of the preset strategies in order of policy level, from high to low. If a match succeeds, a matching result is obtained; otherwise, the content information is matched against the policy content of the strategy at the next policy level.
For example, when defining the strategies, the user assigns a different policy level to each strategy according to the importance of its policy content. After the content information in the file to be detected has been extracted, the device 1 matches the content information against the policy content of the strategies in the predefined order of policy levels. That is, the device 1 first matches the content information against the policy content of the strategy with the highest policy level. If the content information satisfies the policy content of that strategy, a matching result is obtained; if it does not, the content information is matched against the policy content of the strategy at the next policy level.
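The level-ordered matching described above can be sketched as a sort followed by a first-hit search. The `Strategy` fields and the convention that a larger `level` means a higher policy level are assumptions made for this example; keyword containment stands in for whatever policy content a strategy actually holds.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Strategy:
    name: str
    level: int            # assumption: larger number = higher policy level
    keywords: List[str]   # policy content, reduced to keywords for this sketch


def match_by_level(content: str, strategies: List[Strategy]) -> Optional[Strategy]:
    """Try strategies from the highest policy level down; stop at the first hit."""
    for strategy in sorted(strategies, key=lambda s: s.level, reverse=True):
        # Within one strategy, matching any single item of policy content suffices.
        if any(keyword in content for keyword in strategy.keywords):
            return strategy
    return None
```

Stopping at the first hit means a file that satisfies both a high-level and a low-level strategy is handled by the high-level one.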
Preferably, in step S12, the policy content includes at least any one of a keyword, structured information, a file fingerprint and a machine learning model.
Specifically, the policy content of the strategy at each policy level includes at least one of a keyword, structured information, a file fingerprint and a machine learning model. Before file detection is carried out, the user can define this policy content.
For example, the user can define a number of key terms, such as "financial data", "VIP member" and "Central People's Bank", to implement a strategy whose policy content includes keywords. The user can define how structured data such as ID card numbers, bank card numbers, mobile phone numbers and social security numbers are used to implement a strategy whose policy content includes structured information. Structured data is data, defined or chosen by the user, that satisfies certain rules. Taking the ID card number as an example, not every 18-digit combination is a valid ID card number and therefore structured data; the user can specify that a valid ID card number must be an 18-digit combination whose 7th to 14th digits form a valid date of birth, or an 18-digit combination whose first six digits follow a specific rule.
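The ID card rule in the example (18 digits, with digits 7 to 14 forming a valid date of birth) can be sketched with a regular expression plus a date check. This follows only the simplified rule stated in the text; the function names are hypothetical, and other properties of real ID numbers, such as the region-code rule for the first six digits, are not checked.

```python
import re
from datetime import datetime

# Exactly 18 consecutive digits, not embedded in a longer run of digits.
_CANDIDATE = re.compile(r"(?<!\d)\d{18}(?!\d)")


def has_valid_birth_date(number: str) -> bool:
    """Digits 7 to 14 (1-based) must form a real calendar date, YYYYMMDD."""
    try:
        datetime.strptime(number[6:14], "%Y%m%d")
        return True
    except ValueError:
        return False


def find_id_numbers(text: str) -> list:
    """Return the 18-digit runs in the text that pass the birth-date rule."""
    return [m for m in _CANDIDATE.findall(text) if has_valid_birth_date(m)]
```

The date check is what separates structured data from an arbitrary 18-digit string, which keeps order numbers and similar values from triggering the strategy.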
The user can also set file fingerprints as policy content. A file fingerprint is a unique identifier of a file, such as the file's MD5 (Message-Digest Algorithm 5) code. In practice, fuzzy hashing or similar algorithms can be used to store the fingerprints of enterprise confidential documents in a file fingerprint database. When the similarity between the fingerprint of a file that a user uploads, downloads or copies to a storage medium and a fingerprint in the file fingerprint database reaches the set threshold, the uploaded, downloaded or copied file is deemed to be a confidential file of the enterprise, that is, it matches this item of policy content.
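The two fingerprint ideas in this paragraph, an exact digest and a similarity score compared against a threshold, might be sketched as follows. The MD5 call is the standard `hashlib` API. The similarity function is a stand-in: it uses Jaccard overlap of character shingles rather than a real fuzzy-hash algorithm, and the names, the shingle length of 5 and the default threshold of 0.8 are assumptions made for this example.

```python
import hashlib


def md5_fingerprint(data: bytes) -> str:
    """Exact fingerprint: changes completely if a single byte changes."""
    return hashlib.md5(data).hexdigest()


def shingles(text: str, k: int = 5) -> set:
    """Set of overlapping k-character substrings, whitespace-normalized."""
    text = " ".join(text.split())
    return {text[i:i + k] for i in range(max(len(text) - k + 1, 1))}


def similarity(a: str, b: str) -> float:
    """Jaccard similarity of shingle sets, in the range 0.0 to 1.0."""
    sa, sb = shingles(a), shingles(b)
    return len(sa & sb) / len(sa | sb)


def matches_library(text: str, library: list, threshold: float = 0.8) -> bool:
    """True if the text is at least `threshold` similar to any stored document."""
    return any(similarity(text, doc) >= threshold for doc in library)
```

The exact digest only catches unmodified copies, which is why the text points to similarity-based fingerprints for files that have been lightly edited.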
The user can also train a Bayesian model on the enterprise's internal files to generate a usable machine learning model, and then judge whether a file that is uploaded, downloaded or copied to a storage medium conforms to the machine learning model generated by the Bayesian classifier. If the uploaded, downloaded or copied file conforms to the machine learning model (for example, its similarity to the machine learning model exceeds the set threshold), the file is deemed to be a confidential file of the enterprise, that is, it matches this item of policy content.
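A Bayesian classifier of the kind mentioned here could look like the following multinomial naive Bayes sketch with add-one smoothing. The source does not specify the model, features or threshold, so the class name, whitespace tokenization and the use of a plain arg-max decision (instead of a similarity threshold) are all assumptions.

```python
import math
from collections import Counter


class NaiveBayes:
    """Minimal multinomial naive Bayes over whitespace-separated words."""

    def __init__(self):
        self.word_counts = {}        # label -> Counter of word frequencies
        self.doc_counts = Counter()  # label -> number of training documents
        self.vocab = set()

    def train(self, text: str, label: str) -> None:
        words = text.lower().split()
        self.word_counts.setdefault(label, Counter()).update(words)
        self.doc_counts[label] += 1
        self.vocab.update(words)

    def log_score(self, text: str, label: str) -> float:
        counts = self.word_counts[label]
        total = sum(counts.values())
        # Log prior plus add-one smoothed log likelihood of each word.
        score = math.log(self.doc_counts[label] / sum(self.doc_counts.values()))
        for word in text.lower().split():
            score += math.log((counts[word] + 1) / (total + len(self.vocab)))
        return score

    def classify(self, text: str) -> str:
        return max(self.word_counts, key=lambda label: self.log_score(text, label))
```

To apply a threshold as the text suggests, one option would be to compare the difference between the two labels' log scores against a configured margin.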
When the content information is matched against the policy content of any policy level, the match succeeds and a matching result is obtained as soon as the content information matches any one item of the policy content at that level; when the content information matches none of the policy content at that level, matching proceeds to the policy content of the next policy level.
In step S13, if the matching result is a successful match, the device 1 executes the strategy trigger action.
Specifically, if the content information matches the policy content of any policy level, the match succeeds and the corresponding strategy trigger action is executed.
Preferably, in step S13, executing the strategy trigger action includes at least any one of recording a log, sending alarm information and blocking the network.
Specifically, when defining the strategies, the user can use the policy level to distinguish the degree of confidentiality of the file to be detected and, based on that degree of confidentiality, have the corresponding strategy trigger action executed. These strategy trigger actions may be any one or several of recording a log, sending alarm information and blocking the network. For example, if the content information of the file to be detected matches the policy content of the highest policy level, the user can define the strategy trigger action as blocking the network (that is, sending a network-disconnection policy to the proxy server or firewall to block communication on the specified source/destination IP and port) and sending alarm information (sending an email to a specified mailbox, sending a text message to a specified mobile phone number, or both). As another example, if the content information of the file to be detected matches only the policy content of a low policy level, the user can define the strategy trigger action as log recording only, that is, recording a log of which files the user uploaded or downloaded.
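Configuring one or several trigger actions per strategy can be sketched as a lookup table of action functions. Everything here is hypothetical: the action names, the returned message strings and the stubbed behaviour. A real deployment would send an email or text message in the alert action and push a blocking rule to a proxy server or firewall in the block action; this sketch only logs or builds a message.

```python
import logging
from typing import Callable, Dict, List

logger = logging.getLogger("file_detection")


def log_action(event: str) -> str:
    logger.info("matched: %s", event)
    return "logged: " + event


def alert_action(event: str) -> str:
    # Stub: a real system would send an email and/or a text message here.
    return "alert: " + event


def block_action(event: str) -> str:
    # Stub: a real system would send a blocking rule to a proxy or firewall.
    return "block requested: " + event


ACTIONS: Dict[str, Callable[[str], str]] = {
    "log": log_action,
    "alert": alert_action,
    "block": block_action,
}


def run_trigger_actions(action_names: List[str], event: str) -> List[str]:
    """Run every action configured for the matched strategy, in order."""
    return [ACTIONS[name](event) for name in action_names]
```

Under this layout, a highest-level strategy might be configured with `["block", "alert"]` and a low-level one with `["log"]`, mirroring the two examples in the text.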
Fig. 2 illustrates a flowchart of a file detection method according to a preferred embodiment of the application. The method comprises step S11', step S12', step S13' and step S14'. Here, steps S11', S12' and S13' are identical or substantially identical in content to steps S11, S12 and S13 in Fig. 1 and, for brevity, are not described again.
Specifically, in step S14', the device 1 updates the file fingerprints and the machine learning model by means of training sample files.
For example, as the enterprise's business develops, its confidential content changes accordingly. The enterprise's administrator can therefore add recent confidential documents to a designated directory, and the device 1 can, for example through a content update device, automatically add file fingerprints to the file fingerprint database and update the training samples used for Bayesian model training, thereby updating the file fingerprints and the machine learning model.
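The fingerprint half of this update step can be sketched as a scan of the designated directory that adds any digest not yet in the database. The function name, the use of an in-memory `set` as the fingerprint database and the choice of MD5 digests are assumptions for illustration; retraining the Bayesian model on the same files is not shown.

```python
import hashlib
from pathlib import Path


def update_fingerprints(directory: str, known: set) -> list:
    """Add MD5 digests of new files in `directory` to `known`.

    Returns the names of the files whose fingerprints were newly added.
    """
    added = []
    for path in sorted(Path(directory).iterdir()):
        if path.is_file():
            digest = hashlib.md5(path.read_bytes()).hexdigest()
            if digest not in known:
                known.add(digest)
                added.append(path.name)
    return added
```

Running the scan repeatedly is harmless, since files whose digests are already stored are skipped.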
Fig. 3 illustrates a schematic diagram of a file detection device according to another aspect of the application, wherein the device 1 comprises a first device 11, a second device 12 and a third device 13. Specifically, the first device 11 obtains a file to be detected and extracts the content information in the file to be detected; the second device 12 matches the content information against preset strategies to obtain a matching result; if the matching result is a successful match, the third device 13 executes the strategy trigger action.
Here, the device 1 includes, but is not limited to, user equipment, network equipment, or a device formed by integrating user equipment and network equipment through a network. The user equipment includes, but is not limited to, computers, touch terminals and the like. The network equipment includes an electronic device that can automatically perform numerical computation and information processing according to preset or stored instructions; its hardware includes, but is not limited to, microprocessors, application-specific integrated circuits (ASIC), programmable gate arrays (FPGA), digital signal processors (DSP), embedded devices and the like. The network equipment includes, but is not limited to, a computer, a network host, a single network server, a set of multiple network servers, or a cloud formed by multiple servers; here, the cloud consists of a large number of computers or network servers based on cloud computing (Cloud Computing), where cloud computing is a form of distributed computing, namely a virtual supercomputer composed of a group of loosely coupled computers. The network includes, but is not limited to, the Internet, wide area networks, metropolitan area networks, local area networks, VPNs, wireless ad hoc networks (Ad Hoc networks) and the like. Preferably, the device 1 may also be a script program running on the user equipment, on the network equipment, or on a device formed by integrating user equipment and network equipment, or a touch terminal and network equipment, through a network. Of course, those skilled in the art will understand that the device 1 described above is only an example; other existing or future devices, if applicable to the application, should also fall within the scope of protection of the application and are hereby incorporated by reference.
The first device 11 obtains a file to be detected and extracts the content information in the file to be detected.
For example, the first device 11 analyzes protocols such as FTP, HTTP, SMTP, POP3 and SMB at the gateway to obtain an original file A to be detected and a file B describing the original file. The original file A may be a Word document, Excel file, PowerPoint file, PDF file, XML file, HTML file, picture file, 7z file, rar file or zip file. The file B describing the original file contains information such as the file protocol, source/destination IP and port number, file size, file type and original file path. After the file to be detected is obtained, the content information is extracted from the original file A.
Preferably, the first device 11 obtains files that the user uploads, downloads or copies to a storage medium.
For example, the first device 11 uses gateway analysis to capture files that the user uploads to the network or downloads from the network, or uses an information capture technique (such as a hook) to capture files that the user copies to USB storage media such as USB flash drives and hard disks, and then inspects the captured files so that users cannot take such opportunities to steal confidential enterprise information.
Preferably, the extracted content information is all of the text information in the file to be detected; that is, the first device 11 extracts all text information in the file to be detected.
For example, the first device 11 extracts all text information, and only text information, from the Word documents, Excel files, PowerPoint files, PDF files, XML files, HTML files, picture files, 7z files, rar files and zip files that the user uploads to or downloads from the network, as captured through gateway analysis; or from files of the same types that the user copies to USB storage media, as captured through a hook. For example, if a picture file contains both image content and text that describes or explains the image, the first device 11 extracts only the text when extracting the content information of that picture file and does not extract the image content.
The second device 12 matches the content information against preset strategies to obtain a matching result.
Specifically, before file detection is carried out, the user first needs to define the corresponding strategies. A strategy comprises a policy name, a policy level, policy content and a strategy trigger action. The second device 12 matches the extracted content information against these user-defined strategies; if the content information matches any one of the strategies, a matching result is obtained.
Preferably, the second device 12 matches the content information against the policy content of the preset strategies in order of policy level, from high to low. If a match succeeds, a matching result is obtained; otherwise, the content information is matched against the policy content of the strategy at the next policy level.
For example, when defining the strategies, the user assigns a different policy level to each strategy according to the importance of its policy content. After the content information in the file to be detected has been extracted, the second device 12 matches the content information against the policy content of the strategies in the predefined order of policy levels. That is, the second device 12 first matches the content information against the policy content of the strategy with the highest policy level. If the content information satisfies the policy content of that strategy, a matching result is obtained; if it does not, the content information is matched against the policy content of the strategy at the next policy level.
Preferably, in the second device 12, the policy content includes at least any one of a keyword, structured information, a file fingerprint and a machine learning model.
Specifically, the policy content of the strategy at each policy level includes at least one of a keyword, structured information, a file fingerprint and a machine learning model. Before file detection is carried out, the user can define this policy content.
For example, the user can define a number of key terms, such as "financial data", "VIP member" and "Central People's Bank", to implement a strategy whose policy content includes keywords. The user can define how structured data such as ID card numbers, bank card numbers, mobile phone numbers and social security numbers are used to implement a strategy whose policy content includes structured information. Structured data is data, defined or chosen by the user, that satisfies certain rules. Taking the ID card number as an example, not every 18-digit combination is a valid ID card number and therefore structured data; the user can specify that a valid ID card number must be an 18-digit combination whose 7th to 14th digits form a valid date of birth, or an 18-digit combination whose first six digits follow a specific rule.
The user can also set file fingerprints as policy content. A file fingerprint is a unique identifier of a file, such as the file's MD5 (Message-Digest Algorithm 5) code. In practice, fuzzy hashing or similar algorithms can be used to store the fingerprints of enterprise confidential documents in a file fingerprint database. When the similarity between the fingerprint of a file that a user uploads, downloads or copies to a storage medium and a fingerprint in the file fingerprint database reaches the set threshold, the uploaded, downloaded or copied file is deemed to be a confidential file of the enterprise, that is, it matches this item of policy content.
The user can also perform Bayesian model training on the enterprise's confidential
files to generate a usable machine learning model, and then judge whether a file
that is uploaded, downloaded or copied to a storage medium conforms to the machine
learning model generated by the Bayesian classifier. If the uploaded, downloaded or
copied file conforms to the machine learning model (for example, its similarity to
the machine learning model exceeds the set threshold), the file is deemed to belong
to the enterprise's confidential files, i.e., it matches this policy content.
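As an illustration of the Bayesian approach, the sketch below trains a small multinomial naive Bayes classifier with Laplace smoothing and returns the posterior probability that a text is confidential. The class name, the two labels and whitespace tokenization are assumptions made for this example; the source does not specify the model details.

```python
import math
from collections import Counter


class NaiveBayesDetector:
    """Two-class multinomial naive Bayes over whitespace-separated words."""

    LABELS = ("confidential", "normal")

    def __init__(self):
        self.word_counts = {label: Counter() for label in self.LABELS}
        self.doc_counts = Counter()

    def train(self, label: str, text: str) -> None:
        self.word_counts[label].update(text.lower().split())
        self.doc_counts[label] += 1

    def _log_score(self, label: str, words, vocab_size: int) -> float:
        total_words = sum(self.word_counts[label].values())
        # Log prior plus Laplace-smoothed log likelihood of each word.
        score = math.log(self.doc_counts[label] / sum(self.doc_counts.values()))
        for word in words:
            count = self.word_counts[label][word]
            score += math.log((count + 1) / (total_words + vocab_size))
        return score

    def confidential_probability(self, text: str) -> float:
        """Posterior P(confidential | text); needs samples of both labels."""
        words = text.lower().split()
        vocab = set().union(*self.word_counts.values())
        scores = [self._log_score(lb, words, len(vocab)) for lb in self.LABELS]
        peak = max(scores)
        weights = [math.exp(s - peak) for s in scores]
        return weights[0] / sum(weights)
```

A file would then match this policy content when `confidential_probability` exceeds the threshold chosen by the user.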
When the content information is matched against the policy content of any policy
level, the match succeeds as soon as the content information matches any one item
of that level's policy content, and a matching result is obtained. When the content
information matches none of the policy content of that policy level, matching
proceeds to the policy content of the next policy level.
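The level-by-level matching can be sketched as below. The `Policy` dataclass and the representation of each item of policy content as a callable check are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Policy:
    level: int                             # higher means more confidential
    contents: List[Callable[[str], bool]]  # each callable is one policy content


def match_policies(content: str, policies: List[Policy]) -> Optional[Policy]:
    """Try levels from high to low; any single content match succeeds."""
    for policy in sorted(policies, key=lambda p: p.level, reverse=True):
        if any(check(content) for check in policy.contents):
            return policy
    return None  # no level matched
```

Returning the matched policy, rather than a bare boolean, lets the caller pick the trigger action that belongs to that level.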
The third device 13 executes a policy trigger action if the matching result is a
successful match. Specifically, if the content information matches the policy
content of any policy level, the match succeeds and the corresponding policy
trigger action is executed.
Preferably, in the third device 13, executing the policy trigger action includes
at least any one of logging, sending an alarm message and blocking the network.
Specifically, when customizing the corresponding policies, the user can use the
policy level to distinguish the degree of confidentiality of the file to be
detected and, based on that degree of confidentiality, execute the corresponding
policy trigger action. The policy trigger action can be any one or several of
logging, sending an alarm message and blocking the network. For example, if the
content information of the file to be detected matches the policy content of the
highest policy level, the user can define the policy trigger action as blocking
the network (i.e., sending a network-blocking policy to a proxy server or firewall
to block communication with the specified source/destination IP and port) and
sending an alarm message (sending an email to a specified email address, sending
an SMS to a specified mobile phone number, or both). As another example, if the
content information of the file to be detected matches only the policy content of
a low policy level, the user can define the policy trigger action as logging only,
i.e., recording which files the user uploaded or downloaded.
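The mapping from policy level to trigger actions can be sketched as a small dispatch table. The three action functions are placeholders that only append to an in-memory list; a real deployment would call a firewall or proxy, a mail or SMS gateway and a log store. The level numbers and all names are hypothetical.

```python
from typing import Callable, Dict, List

audit_log: List[str] = []  # stands in for real side effects


def record_log(event: str) -> None:
    audit_log.append(f"LOG {event}")


def send_alert(event: str) -> None:
    audit_log.append(f"ALERT {event}")


def block_network(event: str) -> None:
    audit_log.append(f"BLOCK {event}")


# User-defined mapping: the highest level blocks and alerts, the lowest only logs.
ACTIONS_BY_LEVEL: Dict[int, List[Callable[[str], None]]] = {
    3: [block_network, send_alert, record_log],
    1: [record_log],
}


def execute_trigger_actions(level: int, event: str) -> None:
    """Run every action configured for the matched policy level."""
    for action in ACTIONS_BY_LEVEL.get(level, []):
        action(event)
```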
Fig. 4 shows a schematic diagram of file detection equipment according to a
preferred embodiment of the present application. The equipment 1 includes a first
device 11', a second device 12', a third device 13' and a fourth device 14'.
Here, the first device 11', the second device 12' and the third device 13' are
identical or substantially identical to the first device 11, the second device 12
and the third device 13 in Fig. 3 and, for brevity, are not described again.
Specifically, the fourth device 14' updates the file fingerprint and the machine
learning model by means of training sample files.
For example, as the enterprise's business develops, its confidential content
changes accordingly. The enterprise's administrator can therefore add recent
confidential documents to a specified directory, and the fourth device 14' can,
through for example a content update device, add file fingerprints to the file
fingerprint database and update the training samples used for Bayesian model
training, thereby updating the file fingerprint and the machine learning model.
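The fingerprint half of this update step might look like the sketch below, which scans a sample directory and adds the MD5 digest of each new file to an in-memory set. The function name and the use of a plain `set` as the fingerprint database are assumptions; retraining the Bayesian model on the same files is left out.

```python
import hashlib
from pathlib import Path


def refresh_fingerprints(sample_dir: str, fingerprint_db: set) -> list:
    """Add the MD5 digest of every new file in sample_dir to the database.

    Returns the names of the files whose fingerprints were newly added.
    """
    added = []
    for path in sorted(Path(sample_dir).iterdir()):
        if not path.is_file():
            continue  # skip subdirectories
        digest = hashlib.md5(path.read_bytes()).hexdigest()
        if digest not in fingerprint_db:
            fingerprint_db.add(digest)
            added.append(path.name)
    return added
```

Running the function again on an unchanged directory adds nothing, so it can be scheduled periodically without duplicating entries.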
Compared with the prior art, the present application obtains a file to be detected,
extracts the content information in the file, matches the content information
against preset policies to obtain a matching result, and executes a policy trigger
action if the matching result is a successful match. Because the application uses
preset policies to detect the content information of the file to be detected, only
the content information in the file is inspected, encryption of the content carrier
is avoided, and the operating efficiency of the enterprise is improved. Meanwhile,
if the content information in the file to be detected matches a preset policy, the
policy trigger action is executed, i.e., behavior that steals the enterprise's
confidential data triggers an alarm and is blocked, so that the enterprise's
confidential data and information are effectively protected.
It should be noted that the present application can be implemented in software
and/or a combination of software and hardware, for example, using an
application-specific integrated circuit (ASIC), a general-purpose computer or any
other similar hardware device. In one embodiment, the software program of the
application can be executed by a processor to realize the steps or functions
described above. Likewise, the software program of the application (including
related data structures) can be stored in a computer-readable recording medium,
for example, RAM memory, a magnetic or optical drive, a floppy disk or similar
devices. In addition, some steps or functions of the application may be implemented
in hardware, for example, as a circuit that cooperates with a processor to perform
each step or function.
In addition, part of the present application can be applied as a computer program
product, for example computer program instructions which, when executed by a
computer, can invoke or provide the method and/or technical solution according to
the application through the operation of that computer. The program instructions
that invoke the method of the application may be stored in a fixed or removable
recording medium, and/or transmitted via a data stream in a broadcast or other
signal-bearing medium, and/or stored in the working memory of a computer device
that runs according to the program instructions. Here, an embodiment of the
application includes an apparatus comprising a memory for storing computer program
instructions and a processor for executing the program instructions, wherein, when
the computer program instructions are executed by the processor, the apparatus is
triggered to run the methods and/or technical solutions based on the foregoing
embodiments of the application.
It is obvious to those skilled in the art that the application is not limited to
the details of the exemplary embodiments above, and that the application can be
realized in other specific forms without departing from its spirit or essential
characteristics. The embodiments should therefore be regarded in every respect as
exemplary and non-restrictive. The scope of the application is defined by the
appended claims rather than by the above description, and all changes that fall
within the meaning and range of equivalency of the claims are therefore intended
to be embraced in the application. No reference sign in the claims should be
construed as limiting the claim concerned. Furthermore, the word "comprising" does
not exclude other units or steps, and the singular does not exclude the plural.
Multiple units or devices recited in a device claim may also be realized by a
single unit or device through software or hardware. Words such as "first" and
"second" denote names and do not indicate any particular order.
Claims (14)
1. A method for file detection, wherein the method comprises:
obtaining a file to be detected, and extracting content information from the file
to be detected;
matching the content information against preset policies to obtain a matching
result;
executing a policy trigger action if the matching result is a successful match.
2. The method according to claim 1, wherein obtaining the file to be detected
comprises:
obtaining a file that a user uploads, downloads or copies to a storage medium.
3. The method according to claim 1, wherein extracting the content information
from the file to be detected comprises:
extracting all text information in the file to be detected.
4. The method according to any one of claims 1 to 3, wherein each policy comprises
a policy level and at least one item of policy content;
matching the content information against the preset policies to obtain a matching
result comprises:
matching, in order of policy level from high to low, the content information
against the policy content of the preset policies; if the match succeeds, obtaining
the matching result; otherwise, matching against the policy content of the policy
at the next policy level.
5. The method according to claim 4, wherein the policy content comprises at least
any one of a keyword, structured information, a file fingerprint and a machine
learning model.
6. The method according to claim 5, wherein the method further comprises:
updating the file fingerprint and the machine learning model by means of training
sample files.
7. The method according to claim 1, wherein executing the policy trigger action
comprises at least any one of logging, sending an alarm message and blocking the
network.
8. Equipment for file detection, wherein the equipment comprises:
a first device, configured to obtain a file to be detected and extract content
information from the file to be detected;
a second device, configured to match the content information against preset
policies to obtain a matching result;
a third device, configured to execute a policy trigger action when the matching
result is a successful match.
9. The equipment according to claim 8, wherein obtaining the file to be detected
comprises:
obtaining a file that a user uploads, downloads or copies to a storage medium.
10. The equipment according to claim 8, wherein extracting the content information
from the file to be detected comprises:
extracting all text information in the file to be detected.
11. The equipment according to any one of claims 8 to 10, wherein each policy
comprises a policy level and at least one item of policy content;
matching the content information against the preset policies to obtain a matching
result comprises:
matching, in order of policy level from high to low, the content information
against the policy content of the preset policies; if the match succeeds, obtaining
the matching result; otherwise, matching against the policy content of the policy
at the next policy level.
12. The equipment according to claim 11, wherein the policy content comprises at
least any one of a keyword, structured information, a file fingerprint and a
machine learning model.
13. The equipment according to claim 12, wherein the equipment further comprises:
a fourth device, configured to update the file fingerprint and the machine learning
model by means of training sample files.
14. The equipment according to claim 8, wherein executing the policy trigger action
comprises at least any one of logging, sending an alarm message and blocking the
network.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610206473.6A CN105912946A (en) | 2016-04-05 | 2016-04-05 | Document detection method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN105912946A (en) | 2016-08-31 |
Family
ID=56745316
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610206473.6A Pending CN105912946A (en) | 2016-04-05 | 2016-04-05 | Document detection method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105912946A (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108270786A (en) * | 2018-01-16 | 2018-07-10 | 广东欧珀移动通信有限公司 | Right management method, device, storage medium and the intelligent terminal of application program |
CN108959965A (en) * | 2018-07-06 | 2018-12-07 | 北京天空卫士网络安全技术有限公司 | Data review of compliance method and apparatus |
CN109246296A (en) * | 2018-08-27 | 2019-01-18 | 河南丰泰光电科技有限公司 | A kind of mobile phone safe information generates and storage method |
CN112257106A (en) * | 2020-10-20 | 2021-01-22 | 厦门天锐科技股份有限公司 | Data detection method and device |
CN112422536A (en) * | 2020-11-06 | 2021-02-26 | 上海计算机软件技术开发中心 | Data confidentiality detection and judgment method |
Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP4005120B1 (en) * | 2007-03-28 | 2007-11-07 | Sky株式会社 | Access authority control system |
CN102664874A (en) * | 2012-03-29 | 2012-09-12 | 奇智软件(北京)有限公司 | Method and system for secure logging in |
CN103092832A (en) * | 2011-10-27 | 2013-05-08 | 腾讯科技(深圳)有限公司 | Website risk detection processing method and website risk detection processing device |
CN103164515A (en) * | 2013-03-01 | 2013-06-19 | 傅如毅 | Computer system confidential file knowledge base searching method |
CN103646062A (en) * | 2013-12-02 | 2014-03-19 | 北京奇虎科技有限公司 | Scanning method and device for downloaded file |
CN103685150A (en) * | 2012-09-03 | 2014-03-26 | 腾讯科技(深圳)有限公司 | File uploading method and device |
CN103870758A (en) * | 2014-03-20 | 2014-06-18 | 陈建 | Classified information security classification affiliation method based on word classification combined judgment and probability statistics |
CN104217165A (en) * | 2014-09-16 | 2014-12-17 | 百度在线网络技术(北京)有限公司 | Method and device for processing documents |
CN104239795A (en) * | 2014-09-16 | 2014-12-24 | 百度在线网络技术(北京)有限公司 | File scanning method and device |
CN104252531A (en) * | 2014-09-11 | 2014-12-31 | 北京优特捷信息技术有限公司 | File type identification method and device |
CN104811452A (en) * | 2015-04-30 | 2015-07-29 | 北京科技大学 | Data mining based intrusion detection system with self-learning and classified early warning functions |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108270786A (en) * | 2018-01-16 | 2018-07-10 | 广东欧珀移动通信有限公司 | Right management method, device, storage medium and the intelligent terminal of application program |
CN108959965A (en) * | 2018-07-06 | 2018-12-07 | 北京天空卫士网络安全技术有限公司 | Data review of compliance method and apparatus |
CN108959965B (en) * | 2018-07-06 | 2020-01-17 | 北京天空卫士网络安全技术有限公司 | Data compliance examination method and device |
CN109246296A (en) * | 2018-08-27 | 2019-01-18 | 河南丰泰光电科技有限公司 | A kind of mobile phone safe information generates and storage method |
CN112257106A (en) * | 2020-10-20 | 2021-01-22 | 厦门天锐科技股份有限公司 | Data detection method and device |
CN112257106B (en) * | 2020-10-20 | 2022-06-17 | 厦门天锐科技股份有限公司 | Data detection method and device |
CN112422536A (en) * | 2020-11-06 | 2021-02-26 | 上海计算机软件技术开发中心 | Data confidentiality detection and judgment method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 20160831 |