CN105912946A - Document detection method and device - Google Patents

Publication number
CN105912946A
Authority
CN
China
Prior art keywords
file
strategy
detected
content
content information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610206473.6A
Other languages
Chinese (zh)
Inventor
李梦雅
王志龙
石印
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Upper Marine Infotech Share Co Ltd Of Interrogating
Original Assignee
Upper Marine Infotech Share Co Ltd Of Interrogating
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Upper Marine Infotech Share Co Ltd Of Interrogating filed Critical Upper Marine Infotech Share Co Ltd Of Interrogating
Priority to CN201610206473.6A priority Critical patent/CN105912946A/en
Publication of CN105912946A publication Critical patent/CN105912946A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6227 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database where protection concerns the structure of data, e.g. records, types, queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2221/00 Indexing scheme relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F2221/21 Indexing scheme relating to G06F21/00 and subgroups addressing additional information or applications relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F2221/2107 File encryption

Abstract

The application provides a document detection method and device. In contrast to the prior art, the application obtains a document to be detected, extracts its content information, and matches that content information against a preset strategy to obtain a matching result. If the match succeeds, the trigger action of the strategy is performed. Because detection is applied only to the content information of the document, encryption of the content carrier is avoided and the operating efficiency of the enterprise is improved. At the same time, when the content information matches the preset strategy, the trigger action raises an alarm on, and blocks, the theft of confidential enterprise data, so that the confidential data and information of the enterprise are effectively protected.

Description

A method and device for file detection
Technical field
The application relates to the field of computers, and in particular to a technique for file detection.
Background
The theft of confidential enterprise data and information has always been a worry for enterprises. To address the leakage of business data, some security vendors have proposed solutions that target channels such as wireless networks and USB storage devices. These solutions provide some protection, but they still have significant loopholes and shortcomings:
(1) Physical isolation: some enterprises provide no online environment, and private use of wired networks, wireless WiFi, and the like is not allowed. USB interfaces on PCs are blocked or removed outright. As a result, employees cannot make full use of network resources, and blocking the USB interface also prevents the use of other USB devices. This causes trouble in employees' normal work and reduces work efficiency.
(2) File encryption: some enterprises use encryption and decryption technology to encrypt all documents of a given type. For example, the finance department may encrypt all Excel files, and the creative department may encrypt all Word documents, so that an employee who steals an encrypted file cannot use it. The drawback of this technique is that it is applied indiscriminately: Word documents are either all encrypted or all left unencrypted. Employees' ordinary Word documents are therefore forcibly encrypted as well, which complicates everyday file transfer. The weakness of uniform encryption is also obvious: an employee can easily convert the content of a file into another format and thereby escape encryption.
Both physical isolation and file encryption have serious drawbacks in practice. Neither can protect confidential enterprise data and information from theft while leaving employees' routine work and work efficiency unaffected.
Summary of the invention
One object of the application is to provide a method and device for file detection.
According to one aspect of the application, a method for file detection is provided, wherein the method includes:
obtaining a file to be detected, and extracting the content information in the file to be detected;
matching the content information against a preset strategy to obtain a matching result;
if the matching result is a successful match, performing the strategy trigger action.
According to another aspect of the application, a device for file detection is provided, wherein the device includes:
a first device, configured to obtain a file to be detected and extract the content information in the file to be detected;
a second device, configured to match the content information against a preset strategy to obtain a matching result;
a third device, configured to perform the strategy trigger action if the matching result is a successful match.
Compared with the prior art, the application obtains a file to be detected, extracts the content information in the file, matches the content information against a preset strategy to obtain a matching result, and performs the strategy trigger action if the match succeeds. Because the preset strategy is applied to the content information of the file, only the content information is inspected. Encryption of the content carrier is avoided, and the operating efficiency of the enterprise is improved. At the same time, if the content information in the file to be detected matches the preset strategy, the strategy trigger action is performed, that is, the attempt to steal confidential enterprise data is reported and blocked, so that the confidential data and information of the enterprise are effectively protected.
Brief description of the drawings
Other features, objects, and advantages of the application will become more apparent from the following detailed description of non-limiting embodiments, read with reference to the drawings:
Fig. 1 shows a flowchart of a file detection method according to one aspect of the application;
Fig. 2 shows a flowchart of a file detection method according to a preferred embodiment of the application;
Fig. 3 shows a schematic diagram of a file detection device according to another aspect of the application;
Fig. 4 shows a schematic diagram of a file detection device according to a preferred embodiment of the application.
In the drawings, the same or similar reference numerals denote the same or similar parts.
Detailed description
The application is described in further detail below with reference to the drawings.
In a typical configuration of the application, the terminal, the device of the service network, and the trusted party each include one or more processors (CPU), input/output interfaces, network interfaces, and memory.
Memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassette tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
Fig. 1 shows a flowchart of a file detection method according to one aspect of the application, wherein the method includes step S11, step S12, and step S13. Specifically, in step S11, device 1 obtains a file to be detected and extracts the content information in the file to be detected; in step S12, device 1 matches the content information against a preset strategy to obtain a matching result; in step S13, if the matching result is a successful match, device 1 performs the strategy trigger action.
Here, device 1 includes, but is not limited to, a user device, a network device, or a device formed by integrating a user device and a network device through a network. The user device includes, but is not limited to, a computer, a touch terminal, and the like. The network device is an electronic device that automatically performs numerical computation and information processing according to preset or stored instructions; its hardware includes, but is not limited to, a microprocessor, an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), a digital signal processor (DSP), an embedded device, and the like. The network device includes, but is not limited to, a computer, a network host, a single network server, a set of multiple network servers, or a cloud composed of multiple servers. Here, a cloud is composed of a large number of computers or network servers based on cloud computing (Cloud Computing), where cloud computing is a form of distributed computing: a virtual supercomputer composed of a group of loosely coupled computers. The network includes, but is not limited to, the Internet, a wide area network, a metropolitan area network, a local area network, a VPN, a wireless ad hoc network (Ad Hoc network), and the like. Preferably, device 1 may also be a script program running on the user device, on the network device, or on a device formed by integrating the user device with the network device, or the network device with a touch terminal, through a network. Of course, those skilled in the art will understand that device 1 above is only an example; other existing or future devices, if applicable to the application, also fall within the scope of protection of the application and are incorporated herein by reference.
In step S11, device 1 obtains a file to be detected and extracts the content information in the file to be detected.
For example, device 1 analyzes protocols such as FTP, HTTP, SMTP, POP3, and SMB at the gateway to obtain an original file A to be detected and a file B that describes the original file. The original file A may be a Word document, Excel file, PowerPoint file, PDF file, XML file, HTML file, picture file, 7z file, RAR file, or ZIP file. File B contains information such as the file protocol, source/destination IP addresses and port numbers, file size, file type, and original file path. After the file to be detected has been obtained, the content information is extracted from the original file A.
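The description file B can be pictured as a small record. The following is a minimal sketch only; the class name FileDescriptor, its field names, and the describe helper are hypothetical and do not appear in the application, which lists only the kinds of information that file B contains.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileDescriptor:
    """Hypothetical record mirroring the fields listed for description file B."""
    protocol: str       # e.g. "ftp", "http", "smtp", "pop3", "smb"
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    size_bytes: int
    file_type: str      # e.g. "word", "pdf", "zip"
    original_path: str


def describe(d: FileDescriptor) -> str:
    """Render a one-line summary, e.g. for a log entry."""
    return (f"{d.protocol} {d.src_ip}:{d.src_port} -> {d.dst_ip}:{d.dst_port} "
            f"{d.file_type} {d.size_bytes}B {d.original_path}")
```

Keeping the transfer metadata separate from the extracted content lets a later trigger action (for example, blocking a source/destination pair) work from file B alone.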
Preferably, in step S11, device 1 obtains files that a user uploads, downloads, or copies to a storage medium.
For example, device 1 captures, through gateway analysis, files that the user uploads to the network or downloads from the network, or captures, through an information acquisition technique (such as a hook), files that the user copies to USB storage media such as USB flash drives and hard disks. The captured files are then inspected, so that the user cannot use these channels to steal confidential enterprise information.
Preferably, the extracted content information is all of the text information in the file to be detected; that is, in step S11, device 1 extracts all of the text information in the file to be detected.
For example, device 1 extracts all of the text information, and only the text information, from the Word documents, Excel files, PowerPoint files, PDF files, XML files, HTML files, picture files, 7z files, RAR files, and ZIP files that the user uploads to or downloads from the network and that are captured through gateway analysis, or from files of the same types that the user copies to USB storage media and that are captured through a hook. For example, if a picture file contains both picture content and text that describes or explains the picture, device 1 extracts only the text content when extracting the content information of the picture file, and does not extract the picture content.
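As an illustration of text-only extraction, the sketch below handles just two of the listed formats, HTML and ZIP, using the Python standard library. The function name extract_text and the dispatch on file extension are assumptions made for this example; the application does not specify how extraction is implemented, and formats such as Word, PDF, RAR, 7z, or pictures would need dedicated parsers or OCR that are not shown here.

```python
import io
import zipfile
from html.parser import HTMLParser


class _TextOnly(HTMLParser):
    """Collects character data; ignores tags and script/style bodies."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.parts.append(data.strip())


def extract_text(name: str, data: bytes) -> str:
    """Return only the text found in a file (hypothetical helper)."""
    lower = name.lower()
    if lower.endswith(".zip"):
        # Recurse into archive members so nested content is inspected too.
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            members = [n for n in zf.namelist() if not n.endswith("/")]
            return "\n".join(extract_text(n, zf.read(n)) for n in members)
    if lower.endswith((".html", ".htm")):
        parser = _TextOnly()
        parser.feed(data.decode("utf-8", errors="replace"))
        parser.close()
        return " ".join(parser.parts)
    if lower.endswith(".txt"):
        return data.decode("utf-8", errors="replace")
    return ""  # other formats need dedicated parsers (not shown)
```

Recursing into archives reflects the point made in the background section that content moved into another container should not escape inspection.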
In step S12, device 1 matches the content information against a preset strategy to obtain a matching result.
Specifically, before file detection is carried out, the user first needs to define the corresponding strategies. Each strategy comprises a strategy name, a strategy level, strategy content, and a strategy trigger action. Device 1 matches the extracted content information against these user-defined strategies; if the content information matches any one of the strategies, a matching result is obtained.
Preferably, in step S12, device 1 matches the content information against the strategy content of the preset strategies in order of strategy level, from high to low. If a match succeeds, a matching result is obtained; otherwise, the content information is matched against the strategy content of the strategy at the next strategy level.
For example, when defining the strategies, the user assigns a different strategy level to each strategy according to the importance of its strategy content. After the content information of the file to be detected has been extracted, device 1 matches the content information against the strategy content in the order of the predefined strategy levels. That is, device 1 first matches the content information against the strategy content of the strategy with the highest strategy level. If the content information satisfies that strategy content, a matching result is obtained; if it does not, the content information is matched against the strategy content of the strategy at the next strategy level.
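The level-ordered matching described above can be sketched as follows. This is a minimal illustration, assuming that a larger number means a higher strategy level and that strategy content consists only of keywords; the names Strategy and match_by_level are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Strategy:
    name: str
    level: int                 # assumption: larger number = higher level
    keywords: Tuple[str, ...]  # simplified strategy content
    action: str                # name of the strategy trigger action


def match_by_level(text: str, strategies: List[Strategy]) -> Optional[Strategy]:
    """Try strategies from the highest level down; return the first match."""
    for strategy in sorted(strategies, key=lambda s: s.level, reverse=True):
        if any(keyword in text for keyword in strategy.keywords):
            return strategy
    return None  # no strategy matched
```

Stopping at the first match means a file that satisfies both a high-level and a low-level strategy is handled with the stricter trigger action.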
Preferably, in step S12, the strategy content includes at least one of a keyword, structured information, a file fingerprint, and a machine learning model.
Specifically, the strategy content of the strategy at each strategy level includes at least one of a keyword, structured information, a file fingerprint, and a machine learning model. Before file detection is carried out, the user can define this strategy content.
For example, the user can define key terms, such as "financial data", "VIP member", or "Central People's Bank", to create a strategy whose strategy content includes keywords. The user can also define how structured data such as ID card numbers, bank card numbers, mobile phone numbers, and social security numbers are used, to create a strategy whose strategy content includes structured information. Structured data is data that satisfies a rule defined or chosen by the user. Take the ID card number as an example: not every 18-digit combination is a valid ID card number, and therefore not every such combination is structured data. The user can specify that a valid ID card number must be an 18-digit combination whose 7th to 14th digits form a valid date of birth, or an 18-digit combination whose first six digits follow a specific rule.
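The date-of-birth rule for ID card numbers can be sketched as below. This follows the text literally (18 digits, with digits 7 to 14 forming a valid date); the function names are hypothetical, and the sketch deliberately ignores other real-world details such as a trailing check character that may be the letter X or a rule on the first six digits.

```python
import re
from datetime import date

# 18 consecutive digits that are not part of a longer digit run.
_CANDIDATE = re.compile(r"(?<!\d)\d{18}(?!\d)")


def has_valid_birth_date(number: str) -> bool:
    """Check that digits 7 to 14 (1-based) form a real calendar date."""
    birth = number[6:14]
    try:
        date(int(birth[0:4]), int(birth[4:6]), int(birth[6:8]))
    except ValueError:
        return False
    return True


def find_id_numbers(text: str) -> list:
    """Return the 18-digit runs in text that pass the birth-date rule."""
    return [m for m in _CANDIDATE.findall(text) if has_valid_birth_date(m)]
```

Filtering on the embedded date reduces false positives from arbitrary 18-digit strings such as order numbers.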
The user can also set file fingerprints as strategy content. A file fingerprint is a unique identifier of a file, such as the MD5 (message-digest algorithm 5) code of the file. In practice, a similarity-oriented algorithm such as a fuzzy hash algorithm can be used to store the fingerprints of the enterprise's confidential files in a file fingerprint database. When the similarity between the fingerprint of a file that the user uploads, downloads, or copies to a storage medium and a fingerprint in the file fingerprint database reaches a set threshold, the uploaded, downloaded, or copied file is deemed to be a confidential file of the enterprise, that is, it matches this strategy content.
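The two fingerprint styles can be contrasted in a short sketch. The MD5 digest comes from the Python standard library; for the similarity comparison, character-shingle Jaccard similarity is used here as a simple stand-in for a fuzzy hash algorithm, since the application does not name a specific one. The function names, the shingle size of 5, and the threshold of 0.8 are all assumptions for illustration.

```python
import hashlib


def md5_fingerprint(data: bytes) -> str:
    """Exact fingerprint: any change to the file changes the digest."""
    return hashlib.md5(data).hexdigest()


def shingles(text: str, k: int = 5) -> set:
    """Set of overlapping k-character substrings of text."""
    return {text[i:i + k] for i in range(max(len(text) - k + 1, 1))}


def similarity(a: str, b: str) -> float:
    """Jaccard similarity of shingle sets (stand-in for a fuzzy hash score)."""
    sa, sb = shingles(a), shingles(b)
    return len(sa & sb) / len(sa | sb)


def matches_fingerprint_db(text: str, db_texts: list, threshold: float = 0.8) -> bool:
    """True if text is at least `threshold` similar to any stored sample."""
    return any(similarity(text, sample) >= threshold for sample in db_texts)
```

An exact digest catches only byte-identical copies, while a similarity score with a threshold also catches lightly edited copies, which is why the text pairs the fingerprint database with a threshold.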
The user can also train a Bayesian model on internal enterprise files to generate a usable machine learning model, and then judge whether a file that is uploaded, downloaded, or copied to a storage medium satisfies the machine learning model generated by the Bayesian classifier. If the uploaded, downloaded, or copied file satisfies the machine learning model (for example, its similarity to the machine learning model exceeds a set threshold), the file is deemed to be a confidential file of the enterprise, that is, it matches this strategy content.
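A Bayesian classifier of this kind can be sketched with a small multinomial naive Bayes model using add-one smoothing. This is an illustrative assumption about the model family; the application says only that a Bayesian model is trained. The class name, the whitespace tokenization, and the labels used in the usage note are hypothetical, and the threshold check mentioned in the text is omitted for brevity.

```python
import math
from collections import Counter


class NaiveBayes:
    """Minimal multinomial naive Bayes text classifier (illustrative only)."""

    def __init__(self):
        self.word_counts = {}        # label -> Counter of word frequencies
        self.doc_counts = Counter()  # label -> number of training documents
        self.vocab = set()

    def train(self, label: str, text: str) -> None:
        words = text.lower().split()
        self.word_counts.setdefault(label, Counter()).update(words)
        self.doc_counts[label] += 1
        self.vocab.update(words)

    def log_score(self, label: str, text: str) -> float:
        counts = self.word_counts[label]
        total = sum(counts.values())
        # Log prior plus log likelihood with add-one (Laplace) smoothing.
        score = math.log(self.doc_counts[label] / sum(self.doc_counts.values()))
        for word in text.lower().split():
            score += math.log((counts[word] + 1) / (total + len(self.vocab)))
        return score

    def classify(self, text: str) -> str:
        return max(self.word_counts, key=lambda label: self.log_score(label, text))
```

In use, the model would be trained on samples labeled, for instance, "confidential" and "public", and a file whose extracted text is classified as "confidential" would count as matching this strategy content.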
When the content information is matched against the strategy content at a given strategy level, the match succeeds and a matching result is obtained as soon as the content information matches any one item of the strategy content at that level. When the content information matches none of the strategy content at that level, matching proceeds to the strategy content of the next strategy level.
In step S13, if the matching result is a successful match, device 1 performs the strategy trigger action.
Specifically, if the content information matches the strategy content at any strategy level, the match is successful and the corresponding strategy trigger action is performed.
Preferably, in step S13, performing the strategy trigger action includes at least one of recording a log, sending alarm information, and blocking the network.
Specifically, when defining the strategies, the user can use the strategy level to distinguish the degree of confidentiality of the file to be detected and, based on that degree of confidentiality, have the corresponding strategy trigger action performed. The strategy trigger action can be any one or several of recording a log, sending alarm information, and blocking the network. For example, if the content information of the file to be detected matches the strategy content of the highest strategy level, the user can define the strategy trigger action as blocking the network (that is, sending a disconnection strategy to a proxy server or firewall to block communication for the specified source/destination IP addresses and ports) and sending alarm information (sending mail to a specified email address, sending a text message to a specified mobile phone number, or both). As another example, if the content information of the file to be detected matches only the strategy content of a low strategy level, the user can define the strategy trigger action as recording a log only, that is, recording which files the user uploaded or downloaded.
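The mapping from a matched strategy to its trigger actions can be sketched as a small dispatch table. The action names, the event dictionary keys, and the practice of appending strings to a list instead of writing logs, sending mail, or contacting a firewall are all assumptions made so that the example stays self-contained.

```python
def record_log(event: dict, out: list) -> None:
    out.append(f"LOG {event['user']} {event['file']}")


def send_alert(event: dict, out: list) -> None:
    out.append(f"ALERT {event['file']} matched {event['strategy']}")


def block_network(event: dict, out: list) -> None:
    out.append(f"BLOCK {event['src']} -> {event['dst']}")


# Hypothetical registry of the three trigger actions named in the text.
ACTIONS = {"log": record_log, "alert": send_alert, "block": block_network}


def run_trigger_actions(action_names: list, event: dict) -> list:
    """Run each configured trigger action in order and return what was done."""
    performed = []
    for name in action_names:
        ACTIONS[name](event, performed)
    return performed
```

A high-level strategy might be configured with ["block", "alert"], while a low-level strategy might use ["log"] alone, matching the two examples in the text.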
Fig. 2 shows a flowchart of a file detection method according to a preferred embodiment of the application.
The method includes step S11', step S12', step S13', and step S14'. Here, steps S11', S12', and S13' are identical or substantially identical in content to steps S11, S12, and S13 in Fig. 1 and, for brevity, are not described again.
Specifically, in step S14', device 1 updates the file fingerprints and the machine learning model by means of training sample files.
For example, as the enterprise's business develops, its confidential content changes accordingly. The enterprise's administrator can therefore add recent confidential files to a designated directory, and device 1 can, by means of a content updater or the like, automatically add file fingerprints to the file fingerprint database and update the training samples used for Bayesian model training, thereby updating the file fingerprints and the machine learning model.
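The fingerprint half of this update step can be sketched as a scan of the designated directory. The function name refresh_fingerprints, the use of an in-memory set as the fingerprint database, and the choice of MD5 digests are assumptions for illustration; retraining the Bayesian model on the same files is not shown.

```python
import hashlib
from pathlib import Path


def refresh_fingerprints(sample_dir: str, known: set) -> int:
    """Add MD5 fingerprints of files in sample_dir to `known`.

    Returns the number of fingerprints that were newly added.
    """
    added = 0
    for path in sorted(Path(sample_dir).iterdir()):
        if path.is_file():
            digest = hashlib.md5(path.read_bytes()).hexdigest()
            if digest not in known:
                known.add(digest)
                added += 1
    return added
```

Because already-known digests are skipped, the scan can be run repeatedly, for example on a schedule, without duplicating entries.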
Fig. 3 shows a schematic diagram of a file detection device according to another aspect of the application, wherein device 1 includes a first device 11, a second device 12, and a third device 13. Specifically, the first device 11 obtains a file to be detected and extracts the content information in the file to be detected; the second device 12 matches the content information against a preset strategy to obtain a matching result; and the third device 13 performs the strategy trigger action if the matching result is a successful match.
Here, device 1 includes, but is not limited to, a user device, a network device, or a device formed by integrating a user device and a network device through a network. The user device includes, but is not limited to, a computer, a touch terminal, and the like. The network device is an electronic device that automatically performs numerical computation and information processing according to preset or stored instructions; its hardware includes, but is not limited to, a microprocessor, an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), a digital signal processor (DSP), an embedded device, and the like. The network device includes, but is not limited to, a computer, a network host, a single network server, a set of multiple network servers, or a cloud composed of multiple servers. Here, a cloud is composed of a large number of computers or network servers based on cloud computing (Cloud Computing), where cloud computing is a form of distributed computing: a virtual supercomputer composed of a group of loosely coupled computers. The network includes, but is not limited to, the Internet, a wide area network, a metropolitan area network, a local area network, a VPN, a wireless ad hoc network (Ad Hoc network), and the like. Preferably, device 1 may also be a script program running on the user device, on the network device, or on a device formed by integrating the user device with the network device, or the network device with a touch terminal, through a network. Of course, those skilled in the art will understand that device 1 above is only an example; other existing or future devices, if applicable to the application, also fall within the scope of protection of the application and are incorporated herein by reference.
The first device 11 obtains a file to be detected and extracts the content information in the file to be detected.
For example, the first device 11 analyzes protocols such as FTP, HTTP, SMTP, POP3, and SMB at the gateway to obtain an original file A to be detected and a file B that describes the original file. The original file A may be a Word document, Excel file, PowerPoint file, PDF file, XML file, HTML file, picture file, 7z file, RAR file, or ZIP file. File B contains information such as the file protocol, source/destination IP addresses and port numbers, file size, file type, and original file path. After the file to be detected has been obtained, the content information is extracted from the original file A.
Preferably, the first device 11 obtains files that a user uploads, downloads, or copies to a storage medium.
For example, the first device 11 captures, through gateway analysis, files that the user uploads to the network or downloads from the network, or captures, through an information acquisition technique (such as a hook), files that the user copies to USB storage media such as USB flash drives and hard disks. The captured files are then inspected, so that the user cannot use these channels to steal confidential enterprise information.
Preferably, the extracted content information is all of the text information in the file to be detected; that is, the first device 11 extracts all of the text information in the file to be detected.
For example, the first device 11 extracts all of the text information, and only the text information, from the Word documents, Excel files, PowerPoint files, PDF files, XML files, HTML files, picture files, 7z files, RAR files, and ZIP files that the user uploads to or downloads from the network and that are captured through gateway analysis, or from files of the same types that the user copies to USB storage media and that are captured through a hook. For example, if a picture file contains both picture content and text that describes or explains the picture, the first device 11 extracts only the text content when extracting the content information of the picture file, and does not extract the picture content.
Described content information is mated by described second device 12 with the strategy preset, and obtains coupling knot Really.
Specifically, before carrying out file detection, user is firstly the need of self-defined corresponding strategy, plan Slightly comprise policy name, policy levels, policy content and strategy trigger action.Described second device The content information extracted is mated by 12 with these self-defining strategies, if described content letter Breath can then obtain matching result with any of which item strategy matching.
Preferably, described content information from high to low, is depended on by described second device 12 strategically rank Secondary and in the strategy preset policy content mates, if the match is successful, then obtains matching result;No Then, mate with the policy content in the strategy of next policy levels.
Such as, user is when self-defined corresponding strategy, based on the significance level of policy content in strategy For the policy levels that each policy definition is different.Content information in having extracted file to be detected After, described second device 12 will be according to the sequence of predefined policy levels, by described Content information mates with the policy content in strategy, say, that described second device 12 is first First described content information can be mated with the policy content in the strategy of the highest policy levels, described If content information meets the policy content in the strategy of this highest policy levels, then obtain coupling knot Really;If described content information and the unmatched words of policy content in the strategy of this highest policy levels, Then this content information is mated with the policy content in the strategy of next policy levels.
Preferably, in described second device 12, described policy content at least includes keyword, structuring letter Any one in breath, file fingerprint and machine learning model.
Specifically, the policy content in the strategy of each policy levels includes keyword, structuring At least one in information, file fingerprint and machine learning model.Before carrying out file detection, use Family can these policy content self-defined.
Such as, user can define the vocabulary of some keys, as financial data, VIP member, in Centre the People's Bank etc. carrys out implementation strategy content and includes the strategy of keyword.User can define how to use The structural datas such as identification card number, bank's card number, cell-phone number, social security number carry out implementation strategy content and include The strategy of structured message.Structural data is User Defined or the satisfied certain rule chosen Data, such as identification card number, be not to say that any 18 bit digital combinations are all effective bodies Part card number, being all a structural data, user can be customized for effective identification card number must expire Foot its 7th to the 18 figure place combinatorics on words that the 14th is effective birthdate or the first six 18 figure place combinatorics on words of position ad hoc rule sequence.
User can also use and arrange file fingerprint as policy content.Described file fingerprint is file Unique mark, such as the md5 of file (message-digest algorithm 5, message digest algorithm 5th edition) code.In actual application, the similar algorithms such as fuzzy hash algorithm can be used, to enterprise Confidential document carries out file fingerprint in-stockroom operation, when user uploads, downloads or copy to storage medium In file fingerprint and file fingerprint data base in fingerprint similarity reached the threshold value that sets, then say This file being uploaded, download or replicating bright belongs to the classified papers of enterprise, i.e. with in this strategy Hold coupling.
The user can also run Bayesian model training on the enterprise's internal files to generate a usable machine learning model, and then judge whether a file that is uploaded, downloaded, or copied to a storage medium conforms to the machine learning model generated by the Bayesian classifier. If the file conforms to the machine learning model (for example, its similarity to the machine learning model exceeds the set threshold), the file is deemed to belong to the enterprise's confidential files, that is, it matches this policy content.
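The description does not say which Bayesian model is trained. As one possible reading, the sketch below implements a small multinomial naive Bayes text classifier with add-one smoothing; the class name, the whitespace tokenization, and the labels used in the usage note are assumptions for illustration.

```python
import math
from collections import Counter


class NaiveBayesTextClassifier:
    """Multinomial naive Bayes over whitespace-separated tokens, with add-one smoothing."""

    def __init__(self):
        self.word_counts = {}        # label -> Counter of token counts
        self.doc_counts = Counter()  # label -> number of training documents
        self.vocabulary = set()

    def train(self, text, label):
        tokens = text.lower().split()
        self.word_counts.setdefault(label, Counter()).update(tokens)
        self.doc_counts[label] += 1
        self.vocabulary.update(tokens)

    def log_scores(self, text):
        """Unnormalised log posterior for every trained label."""
        tokens = text.lower().split()
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for label, counts in self.word_counts.items():
            total_tokens = sum(counts.values())
            score = math.log(self.doc_counts[label] / total_docs)
            for token in tokens:
                score += math.log((counts[token] + 1) /
                                  (total_tokens + len(self.vocabulary)))
            scores[label] = score
        return scores

    def probability(self, text, label):
        """Posterior probability of `label`, normalised across all trained labels."""
        scores = self.log_scores(text)
        top = max(scores.values())
        exp = {k: math.exp(v - top) for k, v in scores.items()}
        return exp[label] / sum(exp.values())
```

With such a classifier, the threshold check in the description could be expressed as `clf.probability(text, "confidential") > threshold`, where the label name and the threshold value are chosen by the user.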
When the content information is matched against the policy content of any policy level, the match succeeds and a matching result is obtained as soon as the content information matches any one item of that level's policy content. When the content information matches none of the policy content of that level, matching proceeds to the policy content of the next policy level.
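The level-by-level matching described above can be sketched as follows. The `Policy` class, the convention that a larger integer means a higher policy level, and the representation of each policy content as a callable are assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Policy:
    level: int   # assumption: a larger number means a higher policy level
    name: str
    matchers: List[Callable[[str], bool]] = field(default_factory=list)

    def matches(self, content: str) -> bool:
        # One matching policy content is enough for the whole level to match.
        return any(matcher(content) for matcher in self.matchers)


def match_policies(content: str, policies: List[Policy]) -> Optional[Policy]:
    """Try policies from the highest level down; return the first match, else None."""
    for policy in sorted(policies, key=lambda p: p.level, reverse=True):
        if policy.matches(content):
            return policy
    return None
```

Because evaluation stops at the first level that matches, a file that satisfies both a high-level and a low-level policy is reported at the higher level only.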
The third device 13 executes the policy trigger action if the matching result is a successful match.
Specifically, if the content information matches the policy content of any policy level, the match is successful and the corresponding policy trigger action is executed.
Preferably, in the third device 13, executing the policy trigger action includes at least any one of recording a log, sending alarm information, and blocking the network.
Specifically, when customizing a policy, the user can use the policy level to distinguish the degree of confidentiality of the file to be detected and, based on that degree of confidentiality, execute the corresponding policy trigger actions. These actions can be any one or several of recording a log, sending alarm information, and blocking the network. For example, if the content information of the file to be detected matches the policy content of the highest policy level, the user can define the policy trigger actions as blocking the network (that is, sending a blocking policy to a proxy server or firewall to block communication for the specified source/destination IP and port) and sending alarm information (sending an e-mail to a specified e-mail address, sending a text message to a specified mobile phone number, or both). As another example, if the content information of the file to be detected matches only the policy content of the lowest policy level, the user can define the policy trigger action as recording a log only, that is, recording which files the user uploaded or downloaded.
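One way to picture the mapping from policy level to trigger actions is a simple dispatch table, sketched below. The three levels, the event fields, and the action functions are hypothetical; the alarm and blocking functions are placeholders that only write log lines, whereas a real deployment would contact a mail or SMS gateway and a proxy server or firewall.

```python
import logging

logger = logging.getLogger("file_detection")


def record_log(event):
    logger.info("user %s transferred %s", event["user"], event["filename"])
    return "log"


def send_alarm(event):
    # Placeholder: a real system would send e-mail or SMS to the configured contacts.
    logger.warning("ALARM: %s matched level %d", event["filename"], event["level"])
    return "alarm"


def block_network(event):
    # Placeholder: a real system would push a blocking rule to a proxy or firewall.
    logger.warning("BLOCK: %s -> %s", event["src"], event["dst"])
    return "block"


# Hypothetical, user-defined mapping from policy level to trigger actions.
ACTIONS_BY_LEVEL = {
    3: [block_network, send_alarm, record_log],
    2: [send_alarm, record_log],
    1: [record_log],
}


def execute_trigger_actions(event):
    """Run every action configured for the event's level; return their names in order."""
    return [action(event) for action in ACTIONS_BY_LEVEL.get(event["level"], [])]
```

Keeping the mapping in a table rather than in code branches reflects the description's point that the user, not the system, decides which actions each level triggers.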
Fig. 4 shows a schematic diagram of equipment for file detection according to a preferred embodiment of the application.
The equipment 1 includes a first device 11', a second device 12', a third device 13', and a fourth device 14'. Here, the first device 11', the second device 12', and the third device 13' are identical or substantially identical in content to the first device 11, the second device 12, and the third device 13 in Fig. 3, and for brevity are not described again.
Specifically, the fourth device 14' updates the file fingerprint and the machine learning model by means of training sample files.
For example, as the enterprise's business develops, its confidential content changes accordingly. The enterprise's administrator can therefore add recent confidential documents to a designated directory, and the fourth device 14', for example through a content update device, can add file fingerprints to the file fingerprint database and update the training samples used for Bayesian model training, thereby updating the file fingerprint and the machine learning model.
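The update step can be sketched as a scan of the designated directory that enrolls any file not yet in the fingerprint database and keeps its text as a new training sample. The function name, the use of MD5 as the stored fingerprint, and the in-memory set and list are assumptions for illustration; the description does not specify how the database or the sample set is stored.

```python
import hashlib
from pathlib import Path


def update_fingerprint_db(directory, fingerprint_db, training_samples):
    """Enroll every new file under `directory`; return how many files were added."""
    added = 0
    for path in sorted(Path(directory).iterdir()):
        if not path.is_file():
            continue
        data = path.read_bytes()
        digest = hashlib.md5(data).hexdigest()
        if digest in fingerprint_db:
            continue  # already enrolled, skip
        fingerprint_db.add(digest)
        # Keep the text as a new positive sample for the next retraining run.
        training_samples.append(data.decode("utf-8", errors="ignore"))
        added += 1
    return added
```

Running the function repeatedly over the same directory is harmless, since files whose fingerprints are already enrolled are skipped.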
Compared with the prior art, the application obtains a file to be detected, extracts the content information in the file, matches the content information against preset policies to obtain a matching result, and executes a policy trigger action if the matching result is a successful match. Because the application uses preset policies to detect the content information of the file to be detected, only the content information in the file is examined; this avoids encrypting the content carrier and improves the enterprise's operating efficiency. At the same time, if the content information in the file to be detected matches a preset policy, the policy trigger action is executed, that is, behavior that steals the enterprise's confidential data triggers an alarm and is blocked, so that the enterprise's confidential data and information are effectively protected.
It should be noted that the application can be implemented in software and/or a combination of software and hardware, for example using an application-specific integrated circuit (ASIC), a general-purpose computer, or any other similar hardware device. In one embodiment, the software program of the application can be executed by a processor to implement the steps or functions described above. Likewise, the software program of the application (including related data structures) can be stored in a computer-readable recording medium, for example a RAM memory, a magnetic or optical drive, a floppy disk, or a similar device. In addition, some steps or functions of the application can be implemented in hardware, for example as a circuit that cooperates with a processor to perform each step or function.
In addition, part of the application can be applied as a computer program product, for example computer program instructions which, when executed by a computer, can invoke or provide the method and/or technical solution according to the application through the operation of that computer. The program instructions that invoke the method of the application may be stored in a fixed or removable recording medium, and/or transmitted via a data stream in a broadcast or other signal-bearing medium, and/or stored in the working memory of a computer device that runs according to the program instructions. Here, an embodiment of the application includes a device comprising a memory for storing computer program instructions and a processor for executing the program instructions, wherein, when the computer program instructions are executed by the processor, the device is triggered to run the methods and/or technical solutions based on the foregoing embodiments of the application.
It is obvious to a person skilled in the art that the application is not limited to the details of the above exemplary embodiments, and that the application can be realized in other specific forms without departing from its spirit or essential characteristics. The embodiments should therefore be regarded in every respect as exemplary and non-restrictive; the scope of the application is defined by the appended claims rather than by the above description, and all changes that fall within the meaning and range of equivalency of the claims are intended to be embraced in the application. No reference sign in a claim should be construed as limiting the claim concerned. Furthermore, the word "comprising" does not exclude other units or steps, and the singular does not exclude the plural. Multiple units or devices recited in a device claim can also be implemented by a single unit or device through software or hardware. Words such as "first" and "second" are used to denote names and do not indicate any particular order.

Claims (14)

1. A method for file detection, wherein the method comprises:
obtaining a file to be detected, and extracting the content information in the file to be detected;
matching the content information against preset policies to obtain a matching result;
if the matching result is a successful match, executing a policy trigger action.
2. The method according to claim 1, wherein obtaining the file to be detected comprises:
obtaining a file that a user uploads, downloads, or copies to a storage medium.
3. The method according to claim 1, wherein extracting the content information in the file to be detected comprises:
extracting all text information in the file to be detected.
4. The method according to any one of claims 1 to 3, wherein the policy comprises a policy level and at least one policy content;
matching the content information against the preset policies to obtain the matching result comprises:
in order of policy level from high to low, matching the content information against the policy content of the preset policies in turn; if the match succeeds, obtaining the matching result; otherwise, matching against the policy content of the policy of the next policy level.
5. The method according to claim 4, wherein the policy content comprises at least any one of a keyword, structured information, a file fingerprint, and a machine learning model.
6. The method according to claim 5, wherein the method further comprises:
updating the file fingerprint and the machine learning model by means of training sample files.
7. The method according to claim 1, wherein executing the policy trigger action comprises at least any one of recording a log, sending alarm information, and blocking the network.
8. Equipment for file detection, wherein the equipment comprises:
a first device, configured to obtain a file to be detected and extract the content information in the file to be detected;
a second device, configured to match the content information against preset policies to obtain a matching result;
a third device, configured to execute a policy trigger action when the matching result is a successful match.
9. The equipment according to claim 8, wherein obtaining the file to be detected comprises:
obtaining a file that a user uploads, downloads, or copies to a storage medium.
10. The equipment according to claim 8, wherein extracting the content information in the file to be detected comprises:
extracting all text information in the file to be detected.
11. The equipment according to any one of claims 8 to 10, wherein the policy comprises a policy level and at least one policy content;
matching the content information against the preset policies to obtain the matching result comprises:
in order of policy level from high to low, matching the content information against the policy content of the preset policies in turn; if the match succeeds, obtaining the matching result; otherwise, matching against the policy content of the policy of the next policy level.
12. The equipment according to claim 11, wherein the policy content comprises at least any one of a keyword, structured information, a file fingerprint, and a machine learning model.
13. The equipment according to claim 12, wherein the equipment further comprises:
a fourth device, configured to update the file fingerprint and the machine learning model by means of training sample files.
14. The equipment according to claim 8, wherein executing the policy trigger action comprises at least any one of recording a log, sending alarm information, and blocking the network.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610206473.6A CN105912946A (en) 2016-04-05 2016-04-05 Document detection method and device

Publications (1)

Publication Number Publication Date
CN105912946A true CN105912946A (en) 2016-08-31

Family

ID=56745316

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610206473.6A Pending CN105912946A (en) 2016-04-05 2016-04-05 Document detection method and device

Country Status (1)

Country Link
CN (1) CN105912946A (en)

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20160831