CN110532529A - Method and device for identifying file type - Google Patents

Method and device for identifying file type

Info

Publication number
CN110532529A
Authority
CN
China
Prior art keywords
file
encoded information
file type
type
user
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910833084.XA
Other languages
Chinese (zh)
Inventor
罗志成
喻波
王志海
韩振国
安鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Wondersoft Technology Co Ltd
Original Assignee
Beijing Wondersoft Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Wondersoft Technology Co Ltd
Priority to CN201910833084.XA
Publication of CN110532529A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/903 Querying
    • G06F16/90335 Query processing
    • G06F16/90344 Query processing by using string matching techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention provides a method and a device for identifying a file type. The method includes: obtaining the encoded information of a user-defined file to be identified; determining the matching degree between that encoded information and the encoded information of a file pre-stored in a registration component; and, if the matching degree is greater than a preset value, determining that the file type of the user-defined file is the file type corresponding to the encoded information of the pre-stored file. The file type of a user-defined file can thus be identified and parsed quickly, which improves the accuracy of custom file type identification and helps prevent file leakage.

Description

Method and device for identifying file type
Technical field
The present invention relates to the field of file processing technology, and in particular to a method and a device for identifying a file type.
Background
A file format is used to store a particular kind of data. For example, the JPEG file format among image files is used only to store static images; a text file generally stores only plain ASCII or Unicode text without formatting; an HTML file can store formatted text, and so on. Most file formats are published, with specifications or recommendations of varying completeness. In some cases, however, the format of a user-defined file is not published, for example because the developer treats the file format as a trade secret and is unwilling to disclose it, or because the developer is unwilling to spend time, or has spent very little time, on writing a specification.
In the prior art, the file type is usually obtained by reading the file's extension, or detected automatically from content features by reading the file content, as the open source software Tika does. For a file whose format is not public, however, identifying the file type in these ways can produce a result that is inconsistent with the true file type, so the accuracy of file type identification is low.
Summary of the invention
The present invention provides a method and a device for identifying a file type, to solve the prior-art problem of low accuracy in identifying the file type of a user-defined file.
To solve the above problem, the present invention is implemented as follows:
In a first aspect, an embodiment of the present invention provides a method for identifying a file type, comprising:
Obtaining the encoded information of a user-defined file to be identified;
Determining the matching degree between the encoded information and the encoded information of a file pre-stored in a registration component;
If the matching degree is greater than a preset value, determining that the file type of the user-defined file is the file type corresponding to the encoded information of the pre-stored file.
Optionally, obtaining the encoded information of the user-defined file to be identified comprises:
Obtaining the binary encoded information of the file header of the user-defined file;
Converting the binary encoded information to obtain hexadecimal encoded information;
Determining the hexadecimal encoded information as the encoded information of the file to be identified.
Optionally, the method further comprises:
Receiving an operation request from a client, the operation request including at least a query, create, modify or delete request;
Performing the corresponding operation on the file information in the registration component according to the operation request;
Wherein the file information includes at least one of: a number, a file type, the encoded information corresponding to the file type name, a registrant, and a registration time.
Optionally, after determining the file type of the user-defined file, the method further comprises:
Receiving a whitelist identification request sent by the client, wherein the whitelist identification request includes at least the file type of the user-defined file;
Looking up the file type of the user-defined file in the whitelist pre-established in a whitelist component;
If the file type of the file to be identified exists in the whitelist, sending the client a response message that allows the file to be sent out.
Optionally, the method further comprises:
Receiving a whitelist operation request sent by the client, wherein the whitelist operation request includes at least the file type of the user-defined file and an operation type, the operation type including at least a query, create, modify or delete operation;
Performing the corresponding operation on the entries in the whitelist according to the file type of the user-defined file and the operation type.
In a second aspect, an embodiment of the present invention provides a device for identifying a file type, comprising:
An obtaining module, configured to obtain the encoded information of a user-defined file to be identified;
A comparison module, configured to determine the matching degree between the encoded information and the encoded information of a file pre-stored in a registration component;
A determining module, configured to determine, if the matching degree is greater than a preset value, that the file type of the user-defined file is the file type corresponding to the encoded information of the pre-stored file.
Optionally, the obtaining module is specifically configured to:
Obtain the binary encoded information of the file header of the user-defined file;
Convert the binary encoded information to obtain hexadecimal encoded information;
Determine the hexadecimal encoded information as the encoded information of the file to be identified.
Optionally, the obtaining module is further configured to:
Receive an operation request from a client, the operation request including at least a query, create, modify or delete request;
Perform the corresponding operation on the file information in the registration component according to the operation request;
Wherein the file information includes at least one of: a number, a file type, the encoded information corresponding to the file type name, a registrant, and a registration time.
Optionally, the determining module is further configured to:
Receive a whitelist identification request sent by the client, wherein the whitelist identification request includes at least the file type of the user-defined file;
Look up the file type of the user-defined file in the whitelist pre-established in a whitelist component;
If the file type of the file to be identified exists in the whitelist, send the client a response message that allows the file to be sent out.
Optionally, the determining module is further configured to:
Receive a whitelist operation request sent by the client, wherein the whitelist operation request includes at least the file type of the user-defined file and an operation type, the operation type including at least a query, create, modify or delete operation;
Perform the corresponding operation on the entries in the whitelist according to the file type of the user-defined file and the operation type.
In a third aspect, an embodiment of the present invention provides a terminal, comprising: a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the computer program, when executed by the processor, implements the steps of any of the methods described above.
In a fourth aspect, an embodiment of the present invention further provides a computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of any of the methods described above.
Compared with the prior art, the present invention has the following advantages:
In the embodiments of the present invention, the encoded information of a user-defined file is obtained and its matching degree with the encoded information of a file pre-stored in the registration component is determined. If the matching degree is greater than a preset value, the file type of the user-defined file is determined to be the file type corresponding to the encoded information of the pre-stored file. The file type of a user-defined file can therefore be identified and parsed quickly, which improves the accuracy of custom file type identification and helps prevent file leakage.
Description of the drawings
Fig. 1 shows a flow chart of the steps of a method for identifying a file type provided by an embodiment of the present invention;
Fig. 2 shows a structural schematic diagram of the management system for private file types provided by an embodiment of the present invention;
Fig. 3a shows a schematic diagram of ordinary file types, i.e. common files;
Fig. 3b shows a schematic diagram of custom file types, i.e. private files;
Fig. 4 shows a schematic diagram of identifying the file type by file extension;
Fig. 5 shows a flow diagram of obtaining the identification code of a file type provided by an embodiment of the present invention;
Fig. 6 shows a processing flow diagram of the custom file type registration component provided by an embodiment of the present invention;
Fig. 7 shows a processing flow diagram of the custom file type recognition component provided by an embodiment of the present invention;
Fig. 8 shows a processing flow diagram of the custom file type recognition component provided by another embodiment of the present invention;
Fig. 9 shows a processing flow diagram of the custom file type whitelist component provided by an embodiment of the present invention;
Fig. 10 shows a schematic diagram of the custom file type whitelist hit logic provided by an embodiment of the present invention;
Fig. 11 shows a sequence diagram of file type identification provided by an embodiment of the present invention;
Fig. 12 shows a structural schematic diagram of a device for identifying a file type provided by an embodiment of the present invention;
Fig. 13 shows a structural schematic diagram of a terminal provided by an embodiment of the present invention.
Detailed description
To make the above objects, features and advantages of the present invention clearer and easier to understand, the present invention is described in further detail below with reference to the accompanying drawings and specific embodiments.
The terms used in the embodiments of the present invention are explained as follows:
File type: also called file format; the specific encoding of information that a computer uses to store information, and which identifies the data stored inside. For example, some files store pictures, some store programs, and some store text. Each kind of information can be stored in computer storage in one or more file formats. Each file format usually has one or more extensions by which it can be identified, though it may have none. The extension helps an application identify the file's format.
File extension: the filename extension, also called the file suffix, is a mechanism the operating system uses to mark the file type. Typically, an extension follows the main file name and is separated from it by a separator. The extension can be regarded as a kind of metadata.
File header: a section of data at the beginning of a file that serves a specific purpose.
Metadata: also called intermediary data or relay data; data about data. It mainly describes the properties of data and supports functions such as indicating storage location, historical data, resource lookup, and file records. Metadata is a kind of electronic catalogue: to serve as a catalogue, it must describe and collect the content or characteristics of the data, and thereby assist data retrieval.
Open source software Tika: supports all Internet media file types provided by MIME (Multipurpose Internet Mail Extensions).
The file type of a file is usually identified by checking its file extension, and many applications and operating systems support this kind of extension-based identification. Specific examples are shown in Table 1 below.
Table 1
File | File name | Extension
File type identification (docx).docx | File type identification (docx) | docx
File type identification (pptx).pptx | File type identification (pptx) | pptx
File type identification (xlsx).xlsx | File type identification (xlsx) | xlsx
File type identification (pdf).pdf | File type identification (pdf) | pdf
The above approach relies entirely on whether the original file name is genuine and reliable. If the extension of the original file is manually modified or deleted, the identification result will be inconsistent with the true file type, causing identification errors. Specific examples are shown in Table 2.
Table 2
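The failure mode described above can be reproduced with a minimal sketch. It assumes only Python's standard `os.path.splitext`; the function name `type_from_extension` and the sample file names are hypothetical and are not part of the original method.

```python
import os


def type_from_extension(filename: str) -> str:
    """Return the lowercase extension without the dot, or 'unknown' if there is none."""
    _, ext = os.path.splitext(filename)
    return ext[1:].lower() if ext else "unknown"


# The same content under three names yields three different answers,
# because only the file name is inspected, never the file content.
for name in ("report.docx", "report.pdf", "report"):
    print(name, "->", type_from_extension(name))
```

Because the result depends only on the name, renaming or stripping the extension is enough to change the identified type.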
In addition, open source software can be used to identify a file by its content. For example, the open source software Tika is a content analysis tool with a comprehensive set of parsers that can parse files in essentially all common formats, obtain a file's metadata, content, and so on, and return formatted information. It can serve as a general-purpose parsing tool, and its document identification and parsing are highly accurate and efficient. Tika identifies publicly known file types accurately, but in special scenarios, for example when an organization's internal custom file types are not disclosed externally, open source software does not identify private file types, i.e. custom file types, well.
When Tika detects a file, it internally obtains the file extension, the file's content-type hint, magic bytes, character encoding, and XML (eXtensible Markup Language) root characters, and identifies the file type using techniques such as Facade class type detection. Examples of user-defined files are shown in Table 3 below.
Table 3
Custom type file name | Type identified by Tika | Actual type
Custom type.roger | Encrypted type | roger
Custom type.rozip | Encrypted type | rozip
In practice there are scenarios like the following: to ensure that confidential information or data inside an organization does not leak, the use of electronic data, and especially sending data out, is strictly supervised and controlled. A custom file type is a private file type: its format encoding is generally not disclosed and is known only to the file's producer, so outsiders cannot quickly identify and parse it, which achieves an effect similar to file encryption. To supervise custom file types, they must be identified accurately.
If custom type documents are permitted for use inside the organization, custom file types urgently need to be registered and maintained and their scope of use controlled, so that user-defined documents carrying sensitive or confidential information cannot be sent out at will, while files of declared types can still be sent out for use.
Based on the above, an embodiment of the present invention provides a method for identifying a file type, used to identify the file type of a user-defined file. The method of this embodiment is executed by a device for identifying a file type, i.e. the custom file type recognition component.
Fig. 1 shows a flow chart of the steps of a method for identifying a file type provided by an embodiment of the present invention. As shown in Fig. 1, the method may specifically include the following steps:
S101: Obtain the encoded information of the user-defined file to be identified;
S102: Determine the matching degree between the encoded information and the encoded information of a file pre-stored in the registration component;
S103: If the matching degree is greater than a preset value, determine that the file type of the user-defined file is the file type corresponding to the encoded information of the pre-stored file.
Fig. 2 shows a structural schematic diagram of the management system for private file types provided by an embodiment of the present invention. As shown in Fig. 2, the file type management system, which may also be called a data leakage prevention system, includes a custom file type registration component 20, a custom file type recognition component 21, and a custom file type whitelist component 22. The system is deployed on the server side, and the device for identifying a file type is the custom file type recognition component 21.
Specifically, the custom file type registration component stores in advance the file types of user-defined files and the encoded information corresponding to each of those file types; the custom file type registration component is the registration component referred to above.
The custom file type recognition component obtains the encoded information of the user-defined file to be identified, where the encoded information is hexadecimal; it determines the matching degree between that encoded information and the encoded information of the files pre-stored in the registration component; and if the matching degree is greater than the preset value, it determines that the file type of the user-defined file is the file type corresponding to the encoded information of the pre-stored file.
For example, if the encoded information registered for the extension roger in the registration component is 504b0304140006000800 and the code of the user-defined file to be identified is 504b0304140006, the code of the file to be identified is identical to part of the code registered for the extension roger, so the file type of the user-defined file to be identified is determined to be roger.
As another example, if the encoded information registered for the extension roger in the registration component is 504b0304140006000800 and the code of the user-defined file to be identified is 504b0304140006000800, the code of the file to be identified is identical to the complete code registered for the extension roger, so the file type of the file to be identified is determined to be roger.
Note that the preset value for the matching degree between the code of the user-defined file to be identified and the code of a registered file can be set as needed and is not specifically limited in the embodiments of the present invention.
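The text does not define how the matching degree is computed. One possible reading, consistent with the partial-match and full-match examples above, is the fraction of the registered code covered by the common prefix it shares with the code of the file to be identified. The sketch below assumes that definition; the names `matching_degree` and `identify` and the default preset value of 0.6 are hypothetical.

```python
from typing import Dict, Optional


def matching_degree(candidate: str, registered: str) -> float:
    """Share of the registered hex code covered by its common prefix with the candidate."""
    if not registered:
        return 0.0
    common = 0
    for a, b in zip(candidate.lower(), registered.lower()):
        if a != b:
            break
        common += 1
    return common / len(registered)


def identify(candidate: str, registry: Dict[str, str], preset: float = 0.6) -> Optional[str]:
    """Return the registered type with the best matching degree, if it exceeds the preset value."""
    best_type, best_score = None, 0.0
    for file_type, code in registry.items():
        score = matching_degree(candidate, code)
        if score > best_score:
            best_type, best_score = file_type, score
    return best_type if best_score > preset else None


registry = {"roger": "504b0304140006000800"}
print(identify("504b0304140006", registry))        # partial match: 14 of 20 hex digits
print(identify("504b0304140006000800", registry))  # full match
```

Other definitions (for example substring containment or byte-wise similarity) would fit the text equally well; the preset value controls how much of the registered code must be matched.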
Fig. 3 a shows the schematic diagram of the i.e. common file of ordinary file type, and Fig. 3 b shows custom file type i.e. Private documentary schematic diagram, as shown in Figure 3a and Figure 3b shows, text of the embodiment of the present invention for the custom file type in Fig. 3 b Part.Wherein, the file in the embodiment of the present invention, including but not limited to document, video, audio or program class file.
Fig. 4 shows the schematic diagram that file type is identified by file extension, as shown in figure 4, the embodiment of the present invention is just It is the extension name in order to identify file, so that it is determined that the file type of user-defined file, improves the standard of the identification of file type True property.
The recognition methods of file type provided in an embodiment of the present invention, by obtaining the encoded information of user-defined file, and Judge the matching degree of the encoded information of pre-stored file in encoded information and component registration, is preset if the matching degree is greater than Value, it is determined that the file type of user-defined file is the corresponding file type of encoded information of the pre-stored file, can It is quickly identified and is parsed with the file type to user-defined file, improve the accuracy of custom file type identification, The problem of avoiding the occurrence of file leakage.
Another embodiment of the present invention further supplements the method provided by the above embodiment.
On the basis of the above embodiment, optionally, step S101 specifically includes:
Obtaining the binary encoded information of the file header of the user-defined file;
Converting the binary encoded information to obtain hexadecimal encoded information;
Determining the hexadecimal encoded information as the encoded information of the file to be identified.
Specifically, Fig. 5 shows a flow diagram of obtaining the identification code of a file type provided by an embodiment of the present invention. As shown in Fig. 5, the custom file type recognition component obtains the type code of the file, which is the identification code of the file type. It reads binary encoded information A from the file header of the file to be identified in binary form, for example the binary encoded information of the first 9 bytes of the file header, and then converts it into hexadecimal encoded information B to form a character string. The resulting hexadecimal character string is the identification code of the file type of the file to be identified.
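The header-to-hexadecimal conversion described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `read_identification_code` is hypothetical, and the default of 9 bytes is simply the example length mentioned in the text.

```python
HEADER_BYTES = 9  # example length from the text; adjust as needed


def read_identification_code(path: str, n_bytes: int = HEADER_BYTES) -> str:
    """Read the first n_bytes of a file in binary mode and return them as a lowercase hex string."""
    with open(path, "rb") as f:   # binary encoded information A
        header = f.read(n_bytes)
    return header.hex()           # hexadecimal encoded information B
```

For a file that begins with the bytes 50 4b 03 04 14 00 06 00 08 00, reading 9 bytes would give the string 504b03041400060008, and reading 10 bytes would give 504b0304140006000800.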
Optionally, the method further includes:
Receiving an operation request from a client, the operation request including at least a query, create, modify or delete request;
Performing the corresponding operation on the file information in the registration component according to the operation request;
Wherein the file information includes at least one of: a number, a file type, the encoded information corresponding to the file type name, a registrant, and a registration time.
Specifically, Fig. 6 shows a processing flow diagram of the custom file type registration component provided by an embodiment of the present invention. As shown in Fig. 6, the specific steps are as follows:
S601: Start, and receive the request sent by the client;
S602: The custom file type registration component receives the file request sent by the client, where the request includes a list view request and an operation type request;
S603: After receiving the operation type request sent by the client, the custom file type registration component performs the operation;
If the custom file type registration component recognizes a create request, it executes S604, i.e. it immediately records in the database the file information of the new user-defined file, such as the number, file type, encoded information corresponding to the file type name, registrant, and registration time;
If the custom file type registration component recognizes a modify request, it executes S605, i.e. it allows the specified custom file type information to be modified; for example, the number, file type, encoded information corresponding to the file type name, registrant, and registration time can be updated;
If the custom file type registration component recognizes a delete request, it executes S606, i.e. it immediately locates and deletes the file information record of the registered file according to the file type name and the encoded information corresponding to the file type name;
S607: The database is updated according to the above;
S608: End.
The main function of the custom file type registration component is to provide registration for custom file types, so that all known custom file types are recorded. Its operations include, for example, adding a custom file type, modifying a custom file type, deleting a custom file type, and searching and querying custom file types.
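The create, modify, delete and query operations of the registration component can be sketched with an in-memory store. This is an illustrative assumption only: the text describes a database, while the sketch uses a dictionary, and the class and method names (`Registration`, `RegistrationComponent`, `create`, `modify`, `delete`, `query`) are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Optional


@dataclass
class Registration:
    number: int
    file_type: str
    code: str            # hexadecimal encoded information for the file type
    registrant: str
    registered_at: datetime


class RegistrationComponent:
    """In-memory stand-in for the registration database (assumption)."""

    def __init__(self) -> None:
        self._records: Dict[str, Registration] = {}
        self._next_number = 1

    def create(self, file_type: str, code: str, registrant: str) -> Registration:
        # S604: record the new custom file type
        if file_type in self._records:
            raise ValueError(f"{file_type} is already registered")
        rec = Registration(self._next_number, file_type, code.lower(), registrant, datetime.now())
        self._records[file_type] = rec
        self._next_number += 1
        return rec

    def modify(self, file_type: str, code: Optional[str] = None,
               registrant: Optional[str] = None) -> Registration:
        # S605: update selected fields of an existing record
        rec = self._records[file_type]
        if code is not None:
            rec.code = code.lower()
        if registrant is not None:
            rec.registrant = registrant
        return rec

    def delete(self, file_type: str, code: str) -> bool:
        # S606: locate by file type name and code, then delete
        rec = self._records.get(file_type)
        if rec is None or rec.code != code.lower():
            return False
        del self._records[file_type]
        return True

    def query(self) -> List[Registration]:
        # list view request: all registered types ordered by number
        return sorted(self._records.values(), key=lambda r: r.number)
```

Keying the store by file type name keeps lookups simple; a real deployment would presumably persist these records and enforce uniqueness in the database.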
Fig. 7 shows a processing flow diagram of the custom file type recognition component provided by an embodiment of the present invention. As shown in Fig. 7, the specific steps are as follows:
S701: Start;
S702: Obtain the path of the file to be identified;
S703: Read the file: read the binary encoded information of the file header of the file to be identified in binary form, then convert it into hexadecimal encoded information to form a character string;
S704: Separate the file header information: separate the file header from the obtained hexadecimal encoded information, for example the header information of the first 9 bytes of the file header;
S705: Extract the file type information: from the separated header information, extract the file type information, i.e. the encoded information corresponding to the file type;
S706: Compare the encoded information of the current file with the encoded information corresponding to the file types in the registration component, where the current file is the file to be identified;
S707: Determine whether the file type of the current file has been registered, i.e. determine the matching degree between the encoded information of the current file and the encoded information corresponding to the file types in the registration component;
If the encoded information of the current file meets the matching condition, the file type of the current file has been registered, and S709 is executed;
S709: Obtain the file type of the current file;
S708: If the file type is unknown, execute S710;
S710: End.
Specifically, the main function of the custom file type recognition component is to identify the exact type of a user-defined file, making up for the inability of open source software to identify custom file types.
For example, Fig. 8 shows a processing flow diagram of the custom file type recognition component provided by another embodiment of the present invention. As shown in Fig. 8, if the encoded information of a known file type is X, i.e. 504b0304140006000800, and the encoded information of the file type to be identified is Y, i.e. 504b0304140006, then when X contains Y, the file type to be identified can be determined to be roger, the file type corresponding to code X.
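The recognition flow S702 to S710 together with the "X contains Y" rule can be sketched as below. The sketch narrows containment to a prefix check (the registered code X starts with the header code Y), which is an assumption that fits the example but is not stated in the text; the function name `recognize` and the 9-byte default are likewise hypothetical.

```python
from typing import Dict


def recognize(path: str, registry: Dict[str, str], n_bytes: int = 9) -> str:
    """Return the registered file type whose code starts with the file's header code, else 'unknown'."""
    with open(path, "rb") as f:              # S703: read the header in binary form
        code = f.read(n_bytes).hex()         # S704/S705: hexadecimal code Y of the header
    if not code:                             # an empty file cannot match anything
        return "unknown"
    for file_type, registered in registry.items():   # S706/S707: compare with each registered code X
        if registered.lower().startswith(code):
            return file_type                 # S709: registered type found
    return "unknown"                         # S708: type is not registered
```

With the registry entry roger mapped to 504b0304140006000800, a file whose first 9 bytes are 50 4b 03 04 14 00 06 00 08 would be recognized as roger regardless of its file name.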
Optionally, after determining the file type of the user-defined file, the method further includes:
Receiving a whitelist identification request sent by the client, wherein the whitelist identification request includes at least the file type of the user-defined file;
Looking up the file type of the user-defined file in the whitelist pre-established in the whitelist component;
If the file type of the file to be identified exists in the whitelist, sending the client a response message that allows the file to be sent out.
Fig. 9 shows a processing flow diagram of the custom file type whitelist component provided by an embodiment of the present invention. As shown in Fig. 9, the specific steps are as follows:
S901: Start;
S902: The custom file type whitelist component receives the whitelist identification request sent by the client;
S903: The custom file type whitelist component reads the custom file type whitelist;
S904: Search the whitelist for the requested file type and determine whether the custom file type to be identified is in the whitelist;
S905: If the file type of the user-defined file to be identified exists in the whitelist, the whitelist is hit; the whitelist component returns a response message to the client, informing it that the whitelist was hit and that files of this type are allowed to be sent out.
S906: If the file type of the file to be identified does not exist in the whitelist, the whitelist is not hit; the whitelist component returns a response message to the client, informing it that the whitelist was not hit and that user-defined files of this type are not allowed to be sent out.
S907: End.
Fig. 10 shows a schematic diagram of the custom file type whitelist hit logic provided by an embodiment of the present invention. As shown in Fig. 10, the whitelist hit logic is as follows: if the determined custom file type is M, the whitelist is hit when M is present in the set of whitelisted types, and missed otherwise.
Optionally, the method further includes:
Receiving a whitelist operation request sent by the client, wherein the whitelist operation request includes at least the file type of the user-defined file and an operation type, the operation type including at least a query, create, modify or delete operation;
Performing the corresponding operation on the entries in the whitelist according to the file type of the user-defined file and the operation type.
Specifically, the main function of the custom file type whitelist component is to maintain a set of custom file types. Every custom file type recorded in the set is considered safe, and files of these types are allowed to be sent out. In the custom file type whitelist component, file types can be added, modified and deleted, and whitelisted file types can be searched and viewed.
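The whitelist hit logic and the maintenance operations described above can be sketched with a plain set. This is a minimal illustration under stated assumptions: the class name `WhitelistComponent`, its method names, the case-insensitive comparison, and the shape of the response dictionary are all hypothetical.

```python
from typing import Dict, Iterable, List


class WhitelistComponent:
    """Maintains the set of custom file types that may be sent out (in-memory assumption)."""

    def __init__(self, allowed: Iterable[str] = ()) -> None:
        self._allowed = {t.lower() for t in allowed}

    def add(self, file_type: str) -> None:
        self._allowed.add(file_type.lower())

    def rename(self, old: str, new: str) -> bool:
        # "modify" operation: replace one whitelisted type with another
        if old.lower() not in self._allowed:
            return False
        self._allowed.discard(old.lower())
        self._allowed.add(new.lower())
        return True

    def remove(self, file_type: str) -> None:
        self._allowed.discard(file_type.lower())

    def list_types(self) -> List[str]:
        return sorted(self._allowed)

    def check(self, file_type: str) -> Dict[str, object]:
        # Hit logic: type M hits the whitelist only if M is in the whitelisted set
        hit = file_type.lower() in self._allowed
        return {"file_type": file_type, "hit": hit, "allow_outbound": hit}
```

A client would call `check` with the type returned by the recognition component and permit the outbound transfer only when `allow_outbound` is true.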
Fig. 11 shows a sequence diagram of file type identification provided by an embodiment of the present invention. As shown in Fig. 11, specifically:
A1: The client sends a request to the custom file type registration component to query the list of custom file types. The registration component immediately queries the database and returns all registered custom file types to the client as a list, thereby implementing the query of custom file types. The database stores in advance information such as the number, file type name, encoded information, registrant, and registration time.
A2: The client sends a create request to the custom file type registration component. On receiving the create request, the registration component immediately records the custom file type information in the database, such as the number, file type name, encoded information, registrant, and registration time. The registration component assembles a response message and sends it to the client, informing the client that the create request succeeded.
A3: The client sends a modify request to the custom file type registration component. On receiving the modify request, the registration component allows the custom file type registration information, such as the file type name and encoded information, to be modified. The registration component assembles a response message and sends it to the client, informing the client that the modify request succeeded.
A4: The client sends a delete request to the custom file type registration component. On receiving the delete request, the registration component locates and deletes the registration information according to the number, file type name, and encoded information. The registration component assembles a response message and sends it to the client, informing the client that the delete request succeeded.
A5: The client sends a file type identification request to the custom file type recognition component. The recognition component determines the exact type of the document according to the recognition algorithm, then immediately assembles a response message and informs the client of the identification result.
A6, client send the identification request of file type white list, white list to custom file type white list component Recognizer component carries out match query according to file type title, encoded information in known white list;No matter match query Success or not, white list component all can notify whether client hits white list.Client receives response message, determines this type Whether the customized document of type allows outgoing, it may be assumed that such file type then allows document outgoing within the scope of white list, on the contrary then prohibit Only document outgoing.
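Steps A1 to A4 describe ordinary query, add, modify, and delete operations on registration records. The sketch below models them in memory, assuming a dictionary in place of the database; the names `Registration` and `RegistrationComponent`, the field names, and the automatic numbering are hypothetical choices for illustration.

```python
from dataclasses import dataclass, replace
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class Registration:
    number: int
    type_name: str
    encoded_info: str  # hexadecimal header signature
    registrant: str
    registered_at: datetime


class RegistrationComponent:
    """Stand-in for the registration component; a dict replaces the database."""

    def __init__(self) -> None:
        self._records: dict[int, Registration] = {}
        self._next_number = 1

    def query_all(self) -> list[Registration]:  # A1
        return list(self._records.values())

    def add(self, type_name: str, encoded_info: str, registrant: str) -> Registration:  # A2
        record = Registration(self._next_number, type_name,
                              encoded_info.upper(), registrant, datetime.now())
        self._records[record.number] = record
        self._next_number += 1
        return record

    def modify(self, number: int, *, type_name: Optional[str] = None,
               encoded_info: Optional[str] = None) -> bool:  # A3
        record = self._records.get(number)
        if record is None:
            return False
        self._records[number] = replace(
            record,
            type_name=record.type_name if type_name is None else type_name,
            encoded_info=record.encoded_info if encoded_info is None else encoded_info.upper(),
        )
        return True

    def delete(self, number: int) -> bool:  # A4
        return self._records.pop(number, None) is not None
```

For simplicity the sketch locates records by number only, whereas A4 describes locating them by number, file type name, and encoded information.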
The file type recognition method provided by the embodiment of the present invention obtains the encoded information of a user-defined file and judges the matching degree between that encoded information and the encoded information of the files pre-stored in the registration component. If the matching degree is greater than a preset value, the file type of the user-defined file is determined to be the file type corresponding to the encoded information of the pre-stored file. In this way, the file type of a user-defined file can be identified and parsed quickly, the accuracy of user-defined file type identification is improved, and file leakage is avoided.
It should be noted that, for simplicity of description, the method embodiments are expressed as a series of combined actions. Those skilled in the art should understand, however, that the embodiments of the present invention are not limited by the described order of actions, because according to the embodiments of the present invention some steps may be performed in other orders or simultaneously. Those skilled in the art should also understand that the embodiments described in the specification are all preferred embodiments, and the actions involved are not necessarily required by the embodiments of the present invention.
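This passage does not say how the matching degree is computed, so the sketch below makes an assumption: the matching degree is the fraction of the registered hexadecimal signature that agrees, position by position from the start, with the candidate's encoded information. The function names, the registry layout, and the threshold of 0.9 are all hypothetical.

```python
from typing import Optional


def match_degree(candidate_hex: str, registered_hex: str) -> float:
    """Fraction of the registered signature matched position by position."""
    if not registered_hex:
        return 0.0
    a, b = candidate_hex.upper(), registered_hex.upper()
    same = sum(1 for x, y in zip(a, b) if x == y)
    return same / len(b)


def identify(candidate_hex: str, registry: dict[str, str],
             threshold: float = 0.9) -> Optional[str]:
    """Return the best registered type whose matching degree exceeds threshold.

    registry maps a registered hex signature to its file type name.
    """
    best_type, best_score = None, 0.0
    for registered_hex, type_name in registry.items():
        score = match_degree(candidate_hex, registered_hex)
        if score > threshold and score > best_score:
            best_type, best_score = type_name, score
    return best_type


# Hypothetical registry: signature "57534431" is the hex of ASCII "WSD1".
registry = {"57534431": "WSD"}
print(identify("57534431AABBCCDD", registry))
print(identify("0000000000000000", registry))
```

The comparison uses "greater than" rather than "greater than or equal to", following the wording of the method; a candidate shorter than the registered signature can only reach a partial score.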
Another embodiment of the present invention provides a file type identification device for executing the method provided by the above embodiment.
Figure 12 shows a structural schematic diagram of a file type identification device provided by an embodiment of the present invention. As shown in Figure 12, the device may specifically include an acquisition module 10, a comparison module 20, and a determination module 30, wherein:
The acquisition module is configured to obtain the encoded information of a user-defined file to be identified;
The comparison module is configured to judge the matching degree between the encoded information and the encoded information of the files pre-stored in the registration component;
The determination module is configured to determine, if the matching degree is greater than a preset value, that the file type of the user-defined file is the file type corresponding to the encoded information of the pre-stored file.
The file type identification device provided by the embodiment of the present invention obtains the encoded information of a user-defined file and judges the matching degree between that encoded information and the encoded information of the files pre-stored in the registration component. If the matching degree is greater than a preset value, the file type of the user-defined file is determined to be the file type corresponding to the encoded information of the pre-stored file. In this way, the file type of a user-defined file can be identified and parsed quickly, the accuracy of user-defined file type identification is improved, and file leakage is avoided.
A further embodiment of the present invention provides a supplementary description of the device provided by the above embodiment.
Optionally, the acquisition module is specifically configured to:
Obtain the binary encoded information of the file header of the user-defined file;
Convert the binary encoded information to obtain hexadecimal encoded information;
Determine the hexadecimal encoded information as the encoded information of the file to be identified.
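The three acquisition steps above (read the header bytes, convert them to hexadecimal, use the result as the encoded information) can be sketched as follows. The header length of 8 bytes, the function names, and the sample header `WSD1` are assumptions for illustration; the patent does not fix a header length here.

```python
import os
import tempfile


def bytes_to_hex(data: bytes) -> str:
    """Convert raw header bytes to an uppercase hexadecimal string."""
    return data.hex().upper()


def read_header_hex(path: str, header_len: int = 8) -> str:
    """Read the first header_len bytes of a file and return them as hex."""
    with open(path, "rb") as f:
        return bytes_to_hex(f.read(header_len))


# Demo with a temporary file whose first four bytes are the ASCII text "WSD1".
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"WSD1 payload bytes")
demo_hex = read_header_hex(tmp.name, header_len=4)
os.remove(tmp.name)
print(demo_hex)  # 57534431
```

Opening the file in binary mode (`"rb"`) matters: text mode would decode the bytes and could alter or reject the header.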
Optionally, the acquisition module is further configured to:
Receive an operation request from the client, the operation request including at least a query, create, modify, or delete request;
Perform the corresponding operation on the file information in the registration component according to the operation request;
Wherein the file information includes at least one of a number, a file type, encoded information corresponding to the file type name, a registrant, and a registration time.
Optionally, the determination module is further configured to:
Receive a whitelist identification request sent by the client, wherein the whitelist identification request includes at least the file type of the user-defined file;
Search the whitelist pre-established in the whitelist component according to the file type of the user-defined file;
If the file type of the file to be identified is present in the whitelist, send a response message allowing outgoing operations to the client.
Optionally, the determination module is further configured to:
Receive a whitelist operation request sent by the client, wherein the whitelist operation request includes at least the file type of the user-defined file and an operation type, and the operation type includes at least a query, create, modify, or delete operation;
Perform the corresponding operation on the entries in the whitelist according to the file type of the user-defined file and the operation type.
The file type identification device provided by the embodiment of the present invention obtains the encoded information of the file to be identified and compares it with the encoded information of the files registered in advance in the registration component. If the matching degree between the encoded information of the file to be identified and the encoded information of a registered file is greater than a preset value, the file type of the file to be identified is determined to be the file type corresponding to the encoded information of the registered file. The embodiment of the present invention improves the accuracy of file type identification.
As the device embodiment is basically similar to the method embodiment, it is described relatively briefly; for related details, refer to the description of the method embodiment.
Yet another embodiment of the present invention provides a terminal, namely a server, for executing the method provided by the above embodiments.
Figure 13 shows a structural schematic diagram of the terminal provided by an embodiment of the present invention. As shown in Figure 13, the terminal includes at least one processor 51 and a memory 52;
The memory stores a computer program; the at least one processor executes the computer program stored in the memory to implement the method provided by the above embodiments.
The terminal of this embodiment obtains the encoded information of a user-defined file and judges the matching degree between that encoded information and the encoded information of the files pre-stored in the registration component. If the matching degree is greater than a preset value, the file type of the user-defined file is determined to be the file type corresponding to the encoded information of the pre-stored file. In this way, the file type of a user-defined file can be identified and parsed quickly, the accuracy of user-defined file type identification is improved, and file leakage is avoided.
A further embodiment of the present invention provides a computer-readable storage medium in which a computer program is stored; when executed, the computer program implements the method provided by any of the above embodiments.
The computer-readable storage medium of this embodiment obtains the encoded information of a user-defined file and judges the matching degree between that encoded information and the encoded information of the files pre-stored in the registration component. If the matching degree is greater than a preset value, the file type of the user-defined file is determined to be the file type corresponding to the encoded information of the pre-stored file. In this way, the file type of a user-defined file can be identified and parsed quickly, the accuracy of user-defined file type identification is improved, and file leakage is avoided.
The embodiment of the present invention can be used in a data leakage prevention system (NDLP) to identify user-defined document types. Specifically, in recent years social and economic development has placed ever higher demands on, and become ever more dependent on, informatization. Facing keen competition, information technology in the financial industry, including banking, insurance, and securities, has developed rapidly. It has not only greatly accelerated globalization but is also rapidly changing the development direction and form of the financial industry (banking, insurance, securities). However, the rapid growth of internet finance has been accompanied by risks and challenges: information security incidents occur from time to time and have grown worse in recent years. According to incomplete statistics, by the end of 2014 nearly 165 P2P platforms had suffered system paralysis or malicious data tampering as a result of hacker attacks. For a time, P2P became synonymous with high risk in the eyes of the public and, in the eyes of the government, a hard-hit area of regulatory failure. The state has therefore put forward higher security protection requirements for the financial industry and other sectors that bear on the national economy, in order to avoid data leakage incidents.
In view of the characteristics of the financial industry, the user-defined file type recognition method and device described in the embodiments of the present invention were developed through continuous exploration and test verification during actual project (data exchange) development. The method has been applied in a data leakage prevention system (NDLP system), has performed well in practice, and has been widely promoted in actual projects.
The application flow of the method in a data leakage prevention system in the financial industry is consistent with the flow described in Figure 11 and is not repeated here.
A further embodiment of the present invention can be used by a file scanning tool to identify the document type of user-defined documents. Specifically, during the development and implementation of a file scanning tool project, the user-defined file type recognition method proposed by the present invention was used to identify specific user-defined document types and content information in TB-scale document data, and it performed well in operation. The detailed process is not repeated here.
The computer-readable storage medium may be, for example, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The embodiments in this specification are described in a progressive manner. Each embodiment focuses on its differences from the other embodiments, and for the same or similar parts the embodiments may be referred to one another.
Those skilled in the art should understand that the embodiments of the present invention may be provided as a method, a system, or a computer program product. Accordingly, the embodiments of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the embodiments of the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.
The embodiments of the present invention are described with reference to flowcharts and/or block diagrams of the method, terminal (system), and computer program product according to the embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks therein, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing terminal to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing terminal produce a system for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing terminal to operate in a particular manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction system, the instruction system implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
These computer program instructions may also be loaded onto a computer or other programmable data processing terminal, so that a series of operation steps is executed on the computer or other programmable terminal to produce computer-implemented processing. The instructions executed on the computer or other programmable terminal thus provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
Although preferred embodiments of the present invention have been described, those skilled in the art can make additional changes and modifications to these embodiments once they grasp the basic inventive concept. The appended claims are therefore intended to be interpreted as including the preferred embodiments and all changes and modifications that fall within the scope of the embodiments of the present invention.
Finally, it should be noted that, herein, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another and do not necessarily require or imply any actual relationship or order between those entities or operations. Moreover, the terms "include", "comprise", and any other variants are intended to cover non-exclusive inclusion, so that a process, method, article, or terminal that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to that process, method, article, or terminal. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of other identical elements in the process, method, article, or terminal that includes the element.
The file type recognition method and file type recognition device provided by the present invention have been described in detail above. Specific examples are used herein to explain the principle and implementation of the present invention, and the description of the above embodiments is intended only to help in understanding the method of the present invention and its core idea. At the same time, those skilled in the art may, following the idea of the present invention, make changes to the specific implementation and scope of application. In summary, the content of this specification should not be construed as limiting the present invention.

Claims (10)

1. A method for recognizing a file type, characterized by comprising:
Obtaining the encoded information of a user-defined file to be identified;
Judging the matching degree between the encoded information and the encoded information of files pre-stored in a registration component;
If the matching degree is greater than a preset value, determining that the file type of the user-defined file is the file type corresponding to the encoded information of the pre-stored file.
2. The method according to claim 1, characterized in that obtaining the encoded information of the user-defined file to be identified comprises:
Obtaining the binary encoded information of the file header of the user-defined file;
Converting the binary encoded information to obtain hexadecimal encoded information;
Determining the hexadecimal encoded information as the encoded information of the file to be identified.
3. The method according to claim 1, characterized in that the method further comprises:
Receiving an operation request from a client, the operation request including at least a query, create, modify, or delete request;
Performing the corresponding operation on the file information in the registration component according to the operation request;
Wherein the file information includes at least one of a number, a file type, encoded information corresponding to the file type name, a registrant, and a registration time.
4. The method according to claim 1, characterized in that, after determining the file type of the user-defined file, the method further comprises:
Receiving a whitelist identification request sent by a client, wherein the whitelist identification request includes at least the file type of the user-defined file;
Searching a whitelist pre-established in a whitelist component according to the file type of the user-defined file;
If the file type of the file to be identified is present in the whitelist, sending a response message allowing outgoing operations to the client.
5. The method according to claim 4, characterized in that the method further comprises:
Receiving a whitelist operation request sent by the client, wherein the whitelist operation request includes at least the file type of the user-defined file and an operation type, and the operation type includes at least a query, create, modify, or delete operation;
Performing the corresponding operation on the entries in the whitelist according to the file type of the user-defined file and the operation type.
6. A device for identifying a file type, characterized by comprising:
An acquisition module, configured to obtain the encoded information of a user-defined file to be identified;
A comparison module, configured to judge the matching degree between the encoded information and the encoded information of files pre-stored in a registration component;
A determination module, configured to determine, if the matching degree is greater than a preset value, that the file type of the user-defined file is the file type corresponding to the encoded information of the pre-stored file.
7. The device according to claim 6, characterized in that the acquisition module is specifically configured to:
Obtain the binary encoded information of the file header of the user-defined file;
Convert the binary encoded information to obtain hexadecimal encoded information;
Determine the hexadecimal encoded information as the encoded information of the file to be identified.
8. The device according to claim 6, characterized in that the acquisition module is further configured to:
Receive an operation request from a client, the operation request including at least a query, create, modify, or delete request;
Perform the corresponding operation on the file information in the registration component according to the operation request;
Wherein the file information includes at least one of a number, a file type, encoded information corresponding to the file type name, a registrant, and a registration time.
9. A terminal, characterized by comprising: a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the computer program, when executed by the processor, implements the steps of the method according to any one of claims 1 to 5.
10. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, and the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 5.
CN201910833084.XA 2019-09-04 2019-09-04 A kind of recognition methods of file type and device Pending CN110532529A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910833084.XA CN110532529A (en) 2019-09-04 2019-09-04 A kind of recognition methods of file type and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910833084.XA CN110532529A (en) 2019-09-04 2019-09-04 A kind of recognition methods of file type and device

Publications (1)

Publication Number Publication Date
CN110532529A true CN110532529A (en) 2019-12-03

Family

ID=68666836

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910833084.XA Pending CN110532529A (en) 2019-09-04 2019-09-04 A kind of recognition methods of file type and device

Country Status (1)

Country Link
CN (1) CN110532529A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111143849A (en) * 2019-12-31 2020-05-12 奇安信科技集团股份有限公司 File type identification method and device applied to electronic equipment and electronic equipment
CN111159758A (en) * 2019-12-18 2020-05-15 深信服科技股份有限公司 Identification method, device and storage medium
CN111694574A (en) * 2020-06-12 2020-09-22 北京百度网讯科技有限公司 Method, device, equipment and storage medium for instruction code processing
CN112738085A (en) * 2020-12-28 2021-04-30 深圳前海微众银行股份有限公司 File security verification method, device, equipment and storage medium
CN116226046A (en) * 2023-03-16 2023-06-06 北京中宏立达科技发展有限公司 File type detection method and system

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102571767A (en) * 2011-12-24 2012-07-11 成都市华为赛门铁克科技有限公司 File type recognition method and file type recognition device
CN102768676A (en) * 2012-06-14 2012-11-07 腾讯科技(深圳)有限公司 Method and device for processing file with unknown format
CN106227893A (en) * 2016-08-24 2016-12-14 乐视控股(北京)有限公司 A kind of file type acquisition methods and device
CN106844476A (en) * 2016-12-23 2017-06-13 上海上讯信息技术股份有限公司 A kind of method and apparatus for recognizing file format and correspondence integrality
CN107277037A (en) * 2017-07-14 2017-10-20 北京安数云信息技术有限公司 Any file operation detection method and device based on plug-in unit

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111159758A (en) * 2019-12-18 2020-05-15 深信服科技股份有限公司 Identification method, device and storage medium
CN111143849A (en) * 2019-12-31 2020-05-12 奇安信科技集团股份有限公司 File type identification method and device applied to electronic equipment and electronic equipment
CN111694574A (en) * 2020-06-12 2020-09-22 北京百度网讯科技有限公司 Method, device, equipment and storage medium for instruction code processing
CN111694574B (en) * 2020-06-12 2023-11-14 北京百度网讯科技有限公司 Method, device, equipment and storage medium for processing instruction codes
CN112738085A (en) * 2020-12-28 2021-04-30 深圳前海微众银行股份有限公司 File security verification method, device, equipment and storage medium
CN112738085B (en) * 2020-12-28 2023-08-08 深圳前海微众银行股份有限公司 File security verification method, device, equipment and storage medium
CN116226046A (en) * 2023-03-16 2023-06-06 北京中宏立达科技发展有限公司 File type detection method and system
CN116226046B (en) * 2023-03-16 2023-09-08 北京中宏立达科技发展有限公司 File type detection method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20191203
