CN104077304B - file identification system and method - Google Patents

Info

Publication number
CN104077304B
Authority
CN
China
Prior art keywords
file
symbol
request
microsoft
server
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201310104918.6A
Other languages
Chinese (zh)
Other versions
CN104077304A (en)
Inventor
谢奕智
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Baidu International Technology Shenzhen Co Ltd
Original Assignee
Baidu International Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Baidu International Technology Shenzhen Co Ltd
Priority to CN201310104918.6A
Publication of CN104077304A
Application granted
Publication of CN104077304B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/16 File or folder operations, e.g. details of user interfaces specifically adapted to file systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A file identification system is provided. The system includes: an information extraction unit configured to extract symbol path information from the debug information of a portable executable (PE) file; an address generation unit configured to generate a file identification address from the extracted symbol path information; and a request unit configured to send a request, using the generated file identification address, to a symbol server used for verifying files, in order to identify whether the portable executable file is a file of a specific type.

Description

File identification system and method
Technical field
The present application relates to a file identification system and method and, more particularly, to a system and method for confirming whether a file is a Microsoft file.
Background
Identifying Microsoft files is widely useful in virus scanning and removal. For example, some antivirus engines include heuristic detection, and a heuristic engine may occasionally identify a Microsoft file as a virus and remove it, with serious consequences. In addition, when collecting a cloud whitelist, it is necessary to distinguish which files are Microsoft files so that specific policies can be applied to them. A method for identifying Microsoft files is therefore needed.
Some methods for identifying Microsoft files already exist. For example: 1) check whether the company information in the version resource is Microsoft Corporation; 2) after determining that the company information in the version resource is Microsoft Corporation, additionally verify the digital signature. Both schemes have defects. For scheme 1), current viruses often change the company name of their own files to Microsoft Corporation to confuse the file identification of antivirus tools, so the scheme cannot accurately identify Microsoft files. For scheme 2), checking a digital signature is generally slow, so it is difficult to identify Microsoft files quickly. Moreover, on Windows XP the signature information of a Microsoft file is not embedded directly in the file but depends on associated files in the system; in some pirated or "optimized" systems these files have often been deleted, so signature verification fails and Microsoft files cannot be correctly identified.
Therefore, a method for identifying Microsoft files accurately and quickly is needed.
Summary of the invention
An object of the present invention is to provide a file identification system and method capable of identifying Microsoft files.
In addition, the present invention addresses at least the above problems and/or disadvantages and provides at least the advantages described below; the technical problems that the present invention can solve are not limited to those described in the specification.
According to one aspect of the present invention, a file identification system is provided. The system includes: an information extraction unit configured to extract symbol path information from the debug information of a portable executable file; an address generation unit configured to generate a file identification address using the extracted symbol path information; and a request unit configured to send a request, using the generated file identification address, to a symbol server used for verifying files, in order to identify whether the portable executable file is a file of a specific type.
When the request unit sends a request to the symbol server using the generated file identification address, the portable executable file may be determined to be a file of the specific type if the symbol server responds to the request, and determined not to be a file of the specific type if the symbol server does not respond to the request.
The file of the specific type may be a Microsoft portable executable file, and the server may be Microsoft's symbol server.
The file identification address may be a Uniform Resource Identifier (URI) address.
According to another aspect of the present invention, a file identification method is provided. The method includes: extracting symbol path information from the debug information of a portable executable file; generating a file identification address using the extracted symbol path information; and sending a request, using the generated file identification address, to a symbol server used for verifying files, in order to identify whether the portable executable file is a file of a specific type.
When a request is sent to the symbol server using the generated file identification address, the portable executable file may be determined to be a file of the specific type if the symbol server responds to the request, and determined not to be a file of the specific type if the symbol server does not respond to the request.
The file of the specific type may be a Microsoft portable executable file, and the server may be Microsoft's symbol server.
The file identification address may be a Uniform Resource Identifier (URI) address.
Beneficial effects
With the solution of the present invention, Microsoft files can be identified and collected quickly. In addition, when a heuristic alarm is raised, it can be quickly verified whether the detected file is a Microsoft file, which avoids false positives.
Brief description of the drawings
The above and other objects and features of the present invention will become clearer from the following description taken in conjunction with the accompanying drawings, in which:
Fig. 1 shows a file identification system according to an embodiment of the present invention;
Fig. 2 shows the structure of a portable executable file (PE file) according to an embodiment of the present invention;
Fig. 3 shows a file identification method according to an embodiment of the present invention.
Detailed description
The following description with reference to the accompanying drawings is provided to assist in a comprehensive understanding of exemplary embodiments of the invention as defined by the claims and their equivalents. It includes various specific details to assist in that understanding, but these are to be regarded as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the invention. In addition, descriptions of well-known functions and constructions may be omitted for clarity and conciseness.
In the present invention, a portable executable file (PE file) is a program file that can be executed on the Microsoft Windows operating system, for example an .exe file or a .dll file. The invention is described below using the identification of Microsoft PE files as an example.
Fig. 1 shows a file identification system 100 according to an embodiment of the present invention.
As shown in Fig. 1, the file identification system 100 may include an information extraction unit 101, an address generation unit 102, and a request unit 103.
The information extraction unit 101 may extract symbol path information from the debug information of a PE file. Specifically, when a PE file is built with Microsoft's compiler, some debug information, for example the path of the symbol file, is generally stored in the PE file by default. The symbol path can be used to load the corresponding symbol file from the symbol server that Microsoft provides. Therefore, the symbol path information obtained from the debug information of a PE file can be used to determine whether the PE file is a Microsoft PE file. The specific operations for doing so are described below.
The address generation unit 102 according to an embodiment of the present invention may generate a file identification address from the extracted symbol path information. When the PE file is a Microsoft PE file, the file identification address is produced from symbol path information extracted from that file's debug information, so a request sent to Microsoft's symbol server using this address will necessarily receive a response. Accordingly, the request unit 103 according to an embodiment of the present invention may send a request to the symbol server using the generated file identification address to identify whether the PE file is a Microsoft PE file.
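The cooperation of the three units can be sketched as a small Python class. This is only an illustrative wiring, not the patent's implementation: the class name `FileIdentificationSystem`, its method `is_specific_type`, and the choice to pass the units in as plain callables are all assumptions, as is the decision to answer "no" when a file carries no debug information.

```python
from typing import Callable, Optional


class FileIdentificationSystem:
    """Hypothetical wiring of units 101, 102 and 103 as plain callables."""

    def __init__(
        self,
        extract_symbol_path: Callable[[bytes], Optional[object]],  # unit 101
        generate_address: Callable[[object], str],                 # unit 102
        send_request: Callable[[str], bool],                       # unit 103
    ) -> None:
        self.extract_symbol_path = extract_symbol_path
        self.generate_address = generate_address
        self.send_request = send_request

    def is_specific_type(self, pe_bytes: bytes) -> bool:
        """Run extraction, address generation and the request in order."""
        symbol_path = self.extract_symbol_path(pe_bytes)
        if symbol_path is None:
            # Assumption: without debug information there is nothing to
            # ask the symbol server about, so the file is not confirmed.
            return False
        address = self.generate_address(symbol_path)
        return self.send_request(address)
```

Passing the units in as callables keeps the decision logic testable without touching a real PE file or a real server.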
The method of identifying Microsoft PE files using the debug information of a PE file is described in detail below with reference to Fig. 2 and Fig. 3.
Fig. 2 shows the structure of a PE file according to an embodiment of the present invention. Fig. 3 shows a file identification method according to an embodiment of the present invention.
As shown in Fig. 3, in step 301 the information extraction unit 101 may extract symbol path information from the debug information of the PE file.
Specifically, to determine whether a particular PE file is a Microsoft PE file, the PE file may first be loaded into memory and its optional header, OptionalHeader, located; OptionalHeader is a pointer to "IMAGE_OPTIONAL_HEADER32" in Fig. 2. This yields the offset Debug of the debug information in memory, where Debug = OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_DEBUG].VirtualAddress, and Debug points to an IMAGE_DEBUG_DIRECTORY structure.
From this structure, the in-memory offset of the _RSDSI structure, which stores the debug information data, can be found via Debug->AddressOfRawData.
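The lookup described above can be sketched in Python. This is a minimal sketch under stated assumptions, not the patent's implementation: it reads the headers from the raw bytes of a PE file, the function names `debug_data_directory` and `parse_debug_directory` are hypothetical, and the field layout of IMAGE_DEBUG_DIRECTORY is taken from the Windows SDK headers rather than from this text.

```python
import struct
from collections import namedtuple

IMAGE_DIRECTORY_ENTRY_DEBUG = 6  # index of the debug entry in DataDirectory

# IMAGE_DEBUG_DIRECTORY fields in declaration order (28 bytes, little-endian).
DebugDirectory = namedtuple(
    "DebugDirectory",
    "Characteristics TimeDateStamp MajorVersion MinorVersion "
    "Type SizeOfData AddressOfRawData PointerToRawData",
)
_DEBUG_DIRECTORY = struct.Struct("<IIHHIIII")


def debug_data_directory(pe):
    """Return (VirtualAddress, Size) of DataDirectory[IMAGE_DIRECTORY_ENTRY_DEBUG]."""
    if pe[:2] != b"MZ":
        raise ValueError("not a PE file: missing MZ signature")
    (e_lfanew,) = struct.unpack_from("<I", pe, 0x3C)
    if pe[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        raise ValueError("not a PE file: missing PE signature")
    # Skip the 4-byte signature and the 20-byte IMAGE_FILE_HEADER.
    optional_header = e_lfanew + 4 + 20
    (magic,) = struct.unpack_from("<H", pe, optional_header)
    if magic == 0x10B:      # PE32: DataDirectory starts at offset 96
        data_directory = optional_header + 96
    elif magic == 0x20B:    # PE32+: DataDirectory starts at offset 112
        data_directory = optional_header + 112
    else:
        raise ValueError("unknown optional header magic")
    entry = data_directory + 8 * IMAGE_DIRECTORY_ENTRY_DEBUG
    return struct.unpack_from("<II", pe, entry)


def parse_debug_directory(raw, offset=0):
    """Unpack one IMAGE_DEBUG_DIRECTORY record found at `offset`."""
    return DebugDirectory(*_DEBUG_DIRECTORY.unpack_from(raw, offset))
```

VirtualAddress and AddressOfRawData are relative virtual addresses, so they are offsets from the image base in a loaded image. A parser that works on the file on disk would instead use the section table, or the PointerToRawData field, to find the same bytes.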
Using the in-memory offset of _RSDSI, the following fields can be extracted from the _RSDSI structure: szPdb, which represents the symbol path or symbol name; guidSig, which uniquely identifies the symbol; and age, which indicates the number of times the file corresponding to the symbol has been compiled. szPdb holds either the name of the symbol file or its full path; if it holds the full path, the file name of the symbol can be extracted from it and is denoted szPdbName.
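Extracting the three fields can be sketched as follows. The record layout used here (a 4-byte "RSDS" signature, a 16-byte GUID, a 32-bit age, then a NUL-terminated path) is the commonly documented CodeView layout and is an assumption, since this text does not reproduce the structure definition; the names `parse_rsds` and `CodeViewInfo` are hypothetical.

```python
import ntpath
import struct
from collections import namedtuple

CodeViewInfo = namedtuple("CodeViewInfo", "guid_sig age pdb_path pdb_name")


def parse_rsds(raw, offset=0):
    """Extract guidSig, age, szPdb and szPdbName from an RSDS record."""
    if raw[offset:offset + 4] != b"RSDS":
        raise ValueError("no RSDS record at this offset")
    # GUID: Data1 (32 bits), Data2 and Data3 (16 bits each), Data4 (8 bytes).
    data1, data2, data3 = struct.unpack_from("<IHH", raw, offset + 4)
    data4 = bytes(raw[offset + 12:offset + 20])
    (age,) = struct.unpack_from("<I", raw, offset + 20)
    # szPdb is a NUL-terminated string that follows the age field.
    end = raw.index(b"\0", offset + 24)
    pdb_path = bytes(raw[offset + 24:end]).decode("utf-8", errors="replace")
    # If szPdb is a full path, keep only the file name (szPdbName).
    pdb_name = ntpath.basename(pdb_path)
    return CodeViewInfo((data1, data2, data3, data4), age, pdb_path, pdb_name)
```

`ntpath.basename` is used instead of `os.path.basename` so that Windows-style backslash paths are split correctly on any host operating system.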
Then, in step 303, the address generation unit 102 may generate the file identification address from the extracted symbol path information. Specifically, the szPdbName, guidSig, and age values extracted from the _RSDSI structure above can be combined into the access address (that is, the URL) of the symbol file, which has the following form:
Download symbols %s %08X%04X%04X%02X%02X%02X%02X%02X%02X%02X%02X%d %s_
In this address, both "%s" placeholders stand for szPdbName, and the content between them is formed from guidSig and age. "%08X", "%04X" and so on are format specifiers: "%" marks the start of a specifier; 08, 04 and so on give the minimum number of digits, with missing digits padded with 0; and X formats the value as uppercase hexadecimal. For example, FORMAT("%08X-%02X-%04X", 0xa, 0xb, 0xc) yields 0000000A-0B-000C, FORMAT("%08X", 0x100) yields 00000100, and FORMAT("%04X", 0x100) yields 0100. The method of forming the URL from the szPdbName, guidSig, and age fields is known to those skilled in the art and is therefore not described in further detail here.
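The address generation step can be sketched with Python's printf-style formatting, which uses the same specifiers. Several points are assumptions rather than statements of this text: the function name `symbol_url` is hypothetical, the path components are joined with "/", the server prefix is passed in as `base_url` because the text does not give it, and the trailing underscore of the pattern is left out because the text does not say how it is applied.

```python
def symbol_url(base_url, pdb_name, guid_sig, age):
    """Build <base_url>/<szPdbName>/<guidSig + age>/<szPdbName>."""
    data1, data2, data3, data4 = guid_sig
    # %08X%04X%04X for the first three GUID fields ...
    signature = "%08X%04X%04X" % (data1, data2, data3)
    # ... eight %02X for the remaining GUID bytes ...
    signature += "".join("%02X" % byte for byte in data4)
    # ... and %d for age.
    signature += "%d" % age
    return "%s/%s/%s/%s" % (base_url.rstrip("/"), pdb_name, signature, pdb_name)
```

For example, a GUID of (0x12345678, 0xABCD, 0xEF01, bytes 00 to 07) with age 2 produces the middle component 12345678ABCDEF0100010203040506072.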
After the file identification address has been generated, in step 305 the request unit 103 may use the file identification address generated by the address generation unit 102 to send a request to the symbol server. That is, the URL of the symbol file generated in step 303 is used to send a request to the symbol server provided by Microsoft, in order to identify whether the PE file is a Microsoft PE file.
In the present invention, when the request unit 103 sends a request to the symbol server using the generated file identification address, the PE file is determined to be a Microsoft PE file if the symbol server responds to the request, and determined not to be a Microsoft PE file if it does not. That is, when the PE file is a Microsoft PE file, the address composed from the symbol path extracted from its debug information is an address that can be used to access Microsoft's symbol server; when that address successfully accesses the symbol server (that is, the symbol server responds to the request), the PE file is considered a Microsoft PE file. Conversely, if the PE file is not a Microsoft PE file, the address composed from the symbol path extracted from its debug information cannot access Microsoft's symbol server, and the symbol server will not respond to the request from the request unit 103.
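The decision in step 305 can be sketched with Python's standard `urllib`. The sketch makes two assumptions that the text does not state: "the server responds" is read as an HTTP success status, and "the server does not respond" is read as an HTTP error such as 404. The function name `server_has_symbol` and the injectable `open_url` parameter are hypothetical.

```python
import urllib.error
import urllib.request


def server_has_symbol(url, open_url=urllib.request.urlopen, timeout=10):
    """Return True if the symbol server answers the request successfully."""
    try:
        with open_url(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except urllib.error.HTTPError:
        # The server answered with an error status (for example 404):
        # it does not hold a symbol file for this address.
        return False
```

Only `HTTPError` is caught. A network failure raises `URLError` and propagates to the caller, so an outage is not mistaken for a negative answer about the file.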
In this way, it can be identified accurately and quickly whether a PE file under test is a Microsoft PE file. Furthermore, although the invention has been described in detail using the identification of Microsoft PE files as an example, those skilled in the art will understand that the invention is not limited to this: the method of the invention is applicable to identifying and verifying any other PE file that carries similar debug information.
Exemplary embodiments of the present invention can be implemented as computer-readable code on a computer-readable recording medium. A computer-readable recording medium is any data storage device that can store data that can thereafter be read by a computer system. Examples include read-only memory (ROM), random access memory (RAM), CD-ROMs, magnetic tapes, floppy disks, optical data storage devices, and carrier waves (such as data transmission over the Internet through wired or wireless transmission paths). The computer-readable recording medium can also be distributed over network-connected computer systems so that the computer-readable code is stored and executed in a distributed fashion. In addition, functional programs, code, and code segments for accomplishing the present invention can be easily construed by programmers skilled in the art to which the invention pertains, within the scope of the invention.
Although the present invention has been particularly shown and described with reference to exemplary embodiments thereof, those skilled in the art will understand that various changes in form and detail may be made without departing from the spirit and scope of the invention as defined by the claims.

Claims (8)

1. A file identification system, the system comprising:
an information extraction unit configured to extract symbol path information from the debug information of a portable executable file;
an address generation unit configured to generate a file identification address using the extracted symbol path information; and
a request unit configured to send a request, using the generated file identification address, to a symbol server used for verifying files, in order to identify whether the portable executable file is a file of a specific type.
2. The system of claim 1, wherein, when the request unit sends the request to the symbol server using the generated file identification address, the portable executable file is determined to be a file of the specific type if the symbol server responds to the request,
and the portable executable file is determined not to be a file of the specific type if the symbol server does not respond to the request.
3. The system of claim 1, wherein the file of the specific type is a Microsoft portable executable file, and the server is Microsoft's symbol server.
4. The system of claim 1, wherein the file identification address is a Uniform Resource Identifier address.
5. A file identification method, the method comprising:
extracting symbol path information from the debug information of a portable executable file;
generating a file identification address using the extracted symbol path information; and
sending a request, using the generated file identification address, to a symbol server used for verifying files, in order to identify whether the portable executable file is a file of a specific type.
6. The method of claim 5, wherein, when the request is sent to the symbol server using the generated file identification address, the portable executable file is determined to be a file of the specific type if the symbol server responds to the request,
and the portable executable file is determined not to be a file of the specific type if the symbol server does not respond to the request.
7. The method of claim 5, wherein the file of the specific type is a Microsoft portable executable file, and the server is Microsoft's symbol server.
8. The method of claim 5, wherein the file identification address is a Uniform Resource Identifier address.
CN201310104918.6A 2013-03-28 2013-03-28 file identification system and method Active CN104077304B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310104918.6A CN104077304B (en) 2013-03-28 2013-03-28 file identification system and method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310104918.6A CN104077304B (en) 2013-03-28 2013-03-28 file identification system and method

Publications (2)

Publication Number Publication Date
CN104077304A CN104077304A (en) 2014-10-01
CN104077304B true CN104077304B (en) 2017-12-19

Family

ID=51598564

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310104918.6A Active CN104077304B (en) 2013-03-28 2013-03-28 file identification system and method

Country Status (1)

Country Link
CN (1) CN104077304B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108090094A (en) * 2016-11-23 2018-05-29 北京国双科技有限公司 A kind of text message sorting technique and system

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101183418A (en) * 2007-12-25 2008-05-21 北京大学 Windows concealed malevolence software detection method
CN101714118A (en) * 2009-11-20 2010-05-26 北京邮电大学 Detector for binary-code buffer-zone overflow bugs, and detection method thereof

Also Published As

Publication number Publication date
CN104077304A (en) 2014-10-01

Similar Documents

Publication Publication Date Title
CN102243699B (en) Malicious code detection method and system
JP6913746B2 (en) Kernel module loading method and equipment
CN104461491B (en) The operation method and system of a kind of Hybrid components
CN108960830B (en) Intelligent contract deployment method, device, equipment and storage medium
US20140082729A1 (en) System and method for analyzing repackaged application through risk calculation
JP5106643B2 (en) Web page alteration detection device and program
US9934229B2 (en) Telemetry file hash and conflict detection
CN106716398B (en) Visually distinguishing character strings for testing
CN110457628B (en) Webpage version checking method, device, equipment and storage medium
CN104092544B (en) The services signatures method and apparatus of compatible Android application
CN106355092B (en) System and method for optimizing anti-virus measurement
US8966274B2 (en) File tamper detection
CN105760761A (en) Software behavior analyzing method and device
JPWO2011121927A1 (en) Digital content management system, apparatus, program, and method
US20180316696A1 (en) Analysis apparatus, analysis method, and analysis program
CN110147653B (en) Application program security reinforcing method and device
CN108228312A (en) The system and method that code is performed by interpreter
CN104077304B (en) file identification system and method
CN105701405A (en) System and method for antivirus checking of native images of software assemblies
KR20150051508A (en) Method for software asset management based on software birthmark and apparatus thereof
GB2494498A (en) Handling defined areas within an electronic document to preserve integrity and context
CN112994900B (en) File countersigning method, device, client, server and storage medium
CN106406949B (en) Configuration file processing method and device
JP5316719B2 (en) Verification program and verification device
US8745750B2 (en) Origination verification using execution transparent marker context

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant