CN101470620B - Method and apparatus for judging PE file source code consistency - Google Patents

Method and apparatus for judging PE file source code consistency

Info

Publication number
CN101470620B
Authority
CN
China
Prior art keywords
section
file
information
resource
content
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN 200710033035
Other languages
Chinese (zh)
Other versions
CN101470620A (en)
Inventor
张康宗
王钊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhuhai Kingsoft Software Co Ltd
Original Assignee
Zhuhai Kingsoft Software Co Ltd

Abstract

The invention discloses a method and a device for judging the source code consistency of PE files. By analyzing the public structure of the PE files, the method judges that the PE files to be analyzed are based on the same source code when they contain the same number of sections and their critical sections have the same content. The method does not depend on the content of the original source code, so it does not need to access the source code library, which protects the security of the source code library.

Description

Method and device for judging the source code consistency of PE files
Technical field
The present invention relates to a method and a device for judging the source code consistency of PE files.
Background art
During software development, developers compile the source code in the code library periodically or aperiodically. With the introduction of daily build techniques, a product is compiled continually, producing a large number of PE files. Each compilation embeds information such as the compilation time and the compiler version into the PE file, so even PE files built from the same source code are not binary-identical across compilations, and the product version also keeps changing. In addition, once digital signing is introduced, the signature and hash mechanisms also modify the PE file at the binary level. Although automated testing provides some technical assurance, testers still need to confirm which modules have changed at the source code level between compilations so that they can decide where to focus testing. Likewise, release personnel need to confirm that the version that passed testing and the version currently being prepared for release are based on the same source code; if they are, the version can be released.
In the prior art, whether two PE files are based on the same source code is usually judged by comparing all the source code files that correspond to the two PE files in the two compilations and checking whether their contents are identical; if they are, the PE files are based on the same source code. This approach accesses all the source code from which the PE files are generated, requires high-level permissions on the source code library, and may even threaten the confidentiality of the source code library.
Summary of the invention
The object of the present invention is to provide a method and a device that work on the PE files themselves and, without accessing the source code library, judge whether PE files are based on the same source code; that is, a method and a device for judging the source code consistency of PE files that do not affect the security of the source code library.
To achieve the above object, the present invention adopts the following technical solutions:
A method for judging the source code consistency of PE files comprises the steps of:
Step 1: judge whether the number of sections contained in each PE file is the same; if so, go to step 2; if not, directly judge that the PE files are based on different source code;
Step 2: enumerate the sections contained in each PE file and judge whether the currently enumerated section contains debug information; if so, go to step 3; if not, go to step 4;
Step 3: judge whether the information other than the first inessential information in the debug information of the currently enumerated section is the same across the PE files; if so, go to step 4; if not, directly judge that the PE files are based on different source code; the first inessential information comprises timestamp information;
Step 4: judge whether the currently enumerated section is a critical section; if so, go to step 5; if not, return to step 2 to enumerate and judge the next section;
Step 5: judge whether the content of this critical section is the same across the PE files; if so, return to step 2 to enumerate and judge the next section; if not, directly judge that the PE files are based on different source code;
After every section of the PE files has been enumerated and judged, if the PE files contain the same number of sections, their debug information is the same, and the content of each critical section is the same, judge that the PE files are based on the same source code.
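The five steps above can be sketched on a simplified in-memory model. This is a minimal illustration, not the patented implementation: the `Section` type, the `CRITICAL_NAMES` set, and the pairing of sections by position are assumptions made for the example, and each section is assumed to have had its inessential information removed before comparison.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Section:
    """Hypothetical model of one PE section (not a real PE parser)."""
    name: str
    content: bytes                      # section data, inessential fields already masked
    debug_info: Optional[bytes] = None  # debug info minus the timestamp, or None


# Assumed set of critical section names, used only for this sketch.
CRITICAL_NAMES = frozenset({".text", ".data", ".idata", ".rsrc"})


def same_source(files: Sequence[Sequence[Section]]) -> bool:
    """Return True if all files pass the five-step comparison."""
    # Step 1: every file must contain the same number of sections.
    if len({len(f) for f in files}) > 1:
        return False
    # Step 2: enumerate the sections of all files in parallel (by position).
    for group in zip(*files):
        # Step 3: if debug information is present, it must match.
        if any(s.debug_info is not None for s in group):
            if len({s.debug_info for s in group}) > 1:
                return False
        # Step 4: non-critical sections are not compared further.
        if group[0].name not in CRITICAL_NAMES:
            continue
        # Step 5: critical section content must match.
        if len({s.content for s in group}) > 1:
            return False
    return True
```

Returning `False` at the first mismatch mirrors the "directly judge" branches of steps 1, 3 and 5.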
A device for judging the source code consistency of PE files comprises:
A section count discrimination module, configured to judge whether the number of sections contained in each PE file is the same; if not, the comparison result processing module judges that the PE files are based on different source code;
A section enumeration module, configured to enumerate the sections contained in each PE file when the section count discrimination module judges the numbers to be the same;
A debug information discrimination module, configured to judge whether the section currently enumerated by the section enumeration module contains debug information and, if it does, to judge whether the information other than the first inessential information in the debug information is the same; if not, the comparison result processing module judges that the PE files are based on different source code; the first inessential information comprises timestamp information;
A critical section discrimination module, configured to judge whether the section currently enumerated by the section enumeration module is a critical section; if it is not a critical section, no comparison is made; if it is, the module judges whether the critical section is the same across the PE files; if not, the comparison result processing module judges that the PE files are based on different source code;
A comparison result processing module, configured to judge, from the results of the section count discrimination module, the section enumeration module, the debug information discrimination module and the critical section discrimination module, whether the PE files are based on the same source code; when all sections of the PE files have been enumerated and the results of the section count discrimination module, the debug information discrimination module and the critical section discrimination module are all "same", the comparison result processing module judges that the PE files are based on the same source code.
With the method and device of the present invention, the number and types of the sections contained in a PE file can be obtained from the public structure of the PE file. PE files derived from the same source code contain the same number of sections, and the content of their critical sections is also the same. Therefore, when the PE files being compared contain the same number of sections and the content of each critical section is the same, the PE files can be determined to be based on the same source code. This detection does not depend on the original source code corresponding to the PE files, is easy to operate, and does not affect the security of the source code library.
Description of the drawings
Fig. 1 is a flow chart of embodiment one of the method for judging the source code consistency of PE files according to the present invention;
Fig. 2 is a flow chart of embodiment two of the method for judging the source code consistency of PE files according to the present invention;
Fig. 3 is a structural diagram of a preferred embodiment of the device for judging the source code consistency of PE files according to the present invention.
Detailed description of the embodiments
The method of the present invention can judge whether two or more PE files are based on the same source code.
From the public structure of a PE file (normally a linear data stream), the number and types of the sections it contains can be obtained. PE files generated from the same source code must contain the same number of sections, because changing the number of sections requires modifying the source code. Therefore, when the PE files are found to contain different numbers of sections, it can be judged directly that they derive from different source code. In addition, for the same source code, the content of certain sections does not change no matter how many times it is compiled; these are called critical sections. In general, the names of these sections are fixed, for example:
The .text section. This is the code section, and its content is instruction code. If the content of this section has changed, the corresponding source code must have been modified; that is, PE files based on the same source code must contain identical .text sections;
The .data section. This is the initialized data section, which contains the global and static variables that were initialized when the source code was compiled. If the content of this section has changed, the source code must have been modified;
The .idata section. This section serves as the import table and contains the function and data information of external modules. If the content of this section has changed, the corresponding source code must have been modified.
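As an illustration of how the number and names of sections can be read from the public PE layout, the following sketch walks the section table of an image held in memory. The function name `read_sections` is hypothetical; the offsets used (the `e_lfanew` field at 0x3C, the 20-byte COFF file header, and 40-byte section headers) follow the published PE format. It handles only well-formed input and is not a complete parser.

```python
import struct
from typing import List, Tuple


def read_sections(data: bytes) -> List[Tuple[str, bytes]]:
    """Return (name, raw bytes) for each section of a PE image in memory."""
    if data[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)        # e_lfanew
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    coff = pe_offset + 4                                       # COFF file header
    (num_sections,) = struct.unpack_from("<H", data, coff + 2)
    (opt_size,) = struct.unpack_from("<H", data, coff + 16)    # SizeOfOptionalHeader
    table = coff + 20 + opt_size                               # first section header
    sections = []
    for i in range(num_sections):
        hdr = table + 40 * i
        name = data[hdr:hdr + 8].rstrip(b"\0").decode("ascii", "replace")
        # SizeOfRawData and PointerToRawData sit at offsets 16 and 20.
        raw_size, raw_ptr = struct.unpack_from("<II", data, hdr + 16)
        sections.append((name, data[raw_ptr:raw_ptr + raw_size]))
    return sections
```

The length of the returned list gives the section count used in the first comparison, and the names can be matched against `.text`, `.data` and `.idata`.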
Based on this, the method of the present invention for judging the source code consistency of PE files comprises the following steps:
Judge whether the number of sections contained in each PE file is the same;
If so, enumerate the sections contained in each PE file and judge whether the content of the critical sections of the PE files is the same; if it is, judge that the PE files are based on the same source code; if it is not, judge that the PE files are based on different source code;
If not, judge that the PE files are based on different source code.
The method of the present invention does not depend on the content of the source code corresponding to the PE files; it judges directly from the public structure of the PE files whether the PE files to be analyzed are based on the same source code. When the PE files are found to contain different numbers of sections, it can be judged directly that they derive from different source code; when they contain the same number of sections and the content of each critical section is the same, it can be judged that they are based on the same source code. The method therefore judges the source code consistency of PE files without relying on the original source code, does not affect the security of the source code, and is simple to operate.
Once it has been judged whether the PE files are based on the same source code, the next operation depends on the specific application. For example, when the PE files are judged to be based on different source code, they can be retested; when they are judged to be based on the same source code, the identified PE file can be released automatically or handed to software version management. Other follow-up operations are possible, depending on the needs and environment of the specific application.
In addition, a PE file also contains debug information, and for PE files that come from the same source code, the information in their debug information other than the first inessential information must also be the same. The first inessential information comprises timestamp information. Depending on the compiler type and the control strategy used during compilation, the debug information may be located in a critical section or in a non-critical section; in general it is located in a non-critical section. When the debug information is located in a critical section, the method described above suffices: when the PE files contain the same number of sections and the content of their critical sections is the same, the PE files are based on the same source code. When the debug information is located in a non-critical section, it must additionally be judged whether the debug information contained in the PE files is the same.
Therefore, before judging whether the content of the critical sections of the PE files is the same, the method of the present invention may further comprise the steps of:
Detect whether the currently enumerated section contains debug information;
If so, judge whether the information other than the first inessential information in the debug information is the same; if it is not, judge that the PE files are based on different source code; if it is, judge whether the currently enumerated section is a critical section; the first inessential information comprises timestamp information;
If not, judge whether the currently enumerated section is a critical section.
With this method, when it cannot be determined whether the debug information is located in a critical section or a non-critical section, or when it has been determined that the debug information is located in a non-critical section, then, once the PE files have been judged to contain the same number of sections, it is first judged whether the currently enumerated section contains debug information. If it does, it is first judged whether the information other than the first inessential information in the debug information of the PE files is the same. If it is the same, the critical sections are then judged; if it is different, it can be judged directly that the PE files are based on different source code, which speeds up the analysis and saves time.
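One way to compare debug information while disregarding the timestamp is to zero the timestamp field before comparing. The sketch below assumes the debug information is available as a raw array of 28-byte debug directory entries in the standard PE layout, where `TimeDateStamp` is the second 4-byte field. The text does not specify this layout, so treat it and the function names as assumptions made for illustration.

```python
import struct
from typing import Iterable

ENTRY_SIZE = 28        # size of one debug directory entry
TIMESTAMP_OFFSET = 4   # TimeDateStamp follows the 4-byte Characteristics field


def mask_debug_timestamps(directory: bytes) -> bytes:
    """Zero the TimeDateStamp field of every entry in a raw debug directory."""
    if len(directory) % ENTRY_SIZE:
        raise ValueError("debug directory length is not a multiple of 28")
    masked = bytearray(directory)
    for start in range(0, len(masked), ENTRY_SIZE):
        struct.pack_into("<I", masked, start + TIMESTAMP_OFFSET, 0)
    return bytes(masked)


def debug_info_matches(directories: Iterable[bytes]) -> bool:
    """True if all directories are equal once timestamps are disregarded."""
    return len({mask_debug_timestamps(d) for d in directories}) <= 1
```

Only the timestamp is masked here; any other field that differs causes a mismatch, which corresponds to the early "different source code" exit described above.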
In addition, during software development, developers may define self-defined sections according to the needs of the development environment and the functions being implemented. When the PE files derive from the same source code, the number and content of the self-defined sections they contain must also be the same. Therefore, the method of the present invention may further comprise:
When the PE files have self-defined sections, judge whether the number and content of the self-defined sections of the PE files are the same;
If not, judge that the PE files are based on different source code;
If so, and the PE files contain the same number of sections and the content of their critical sections is the same, judge that the PE files are based on the same source code.
Thus, if the PE files contain self-defined sections, or if it cannot be determined whether they do, judging the source code consistency of the PE files also requires judging whether the self-defined sections of the PE files are the same. When the number and/or content of the self-defined sections differs, it can be judged directly that the PE files are based on different source code; when the number and content are the same, a comprehensive judgment is made together with the result for the critical sections.
The analysis of the self-defined sections can be carried out at any point: before or after the critical sections are judged, and before or after the debug information is judged, depending on the needs and the specific judging environment.
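The self-defined section check can be sketched as follows. How a section is recognized as self-defined is not specified in the text; this example assumes that any section whose name is absent from a list of common toolchain section names counts as self-defined. The `STANDARD_NAMES` set and both function names are hypothetical.

```python
from typing import List, Sequence, Tuple

SectionData = Tuple[str, bytes]  # (section name, section content)

# Assumed list of common section names; anything else is treated as self-defined.
STANDARD_NAMES = frozenset({
    ".text", ".data", ".rdata", ".idata", ".edata", ".rsrc",
    ".reloc", ".bss", ".tls", ".pdata", ".debug",
})


def custom_sections(sections: Sequence[SectionData]) -> List[SectionData]:
    """Return the sections whose names are not in STANDARD_NAMES."""
    return [(name, content) for name, content in sections
            if name not in STANDARD_NAMES]


def custom_sections_match(files: Sequence[Sequence[SectionData]]) -> bool:
    """True if every file has the same self-defined sections (number and content)."""
    extracted = [custom_sections(f) for f in files]
    return all(e == extracted[0] for e in extracted[1:])
```

Because the check is independent of the other comparisons, it can run before or after them, as the text notes.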
In addition, a critical section may be a resource section or a non-resource section, and a resource section is always a critical section. When judging critical sections, resource sections and non-resource sections can therefore be treated differently, as follows:
If the critical section is a resource section, for example the .rsrc section: a resource section contains all the resource data of the module, such as picture information, bitmap information, chart information, shape information and version information. This information may include second inessential information, which changes with the number of compilations or the compilation environment even when the source code does not change. Version information is an example: with identical source code, different version information is produced from one compilation to the next. Picture information, bitmap information, chart information and the like, by contrast, do not change across repeated compilations of the same source code. Therefore, the second inessential information can be disregarded when resource sections are compared. That is, the resource information contained in the resource section is traversed; as long as the information other than the second inessential information is the same, the currently enumerated critical section is judged to be the same, and when any item of information other than the second inessential information differs, the currently enumerated critical section is judged to be different;
If the critical section is a non-resource section: because the time and environment differ from one compilation to the next, the section may contain third inessential information, such as timestamp information, which changes with the number of compilations or the compilation environment even when the PE files are generated from identical source code. Therefore, the third inessential information can be disregarded when non-resource sections are compared. As long as the information other than the third inessential information is the same, the currently enumerated critical section is judged to be the same, and when any item of information other than the third inessential information differs, the currently enumerated critical section is judged to be different.
Based on this, when judging whether critical sections are the same, the method of the present invention may specifically comprise:
When the critical section is a resource section, compare whether the information other than the second inessential information in the resource section is the same; if so, judge that the content of the critical section is the same; the second inessential information comprises version information;
When the critical section is a non-resource section, compare whether the information other than the third inessential information in the non-resource section is the same; if so, judge that the content of the critical section is the same; the third inessential information comprises timestamp information.
Therefore, according to the method of the present invention, a resource section is compared by traversing its resource information and disregarding the second inessential information, and a non-resource section is compared while disregarding the third inessential information; in both cases the currently enumerated critical section is judged to be the same only if all the remaining information is the same. By treating resource and non-resource critical sections differently, the method effectively improves the accuracy of judging the source code consistency of PE files.
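The two comparison modes can be sketched as below. The example assumes a resource section has already been decoded into a mapping from `(type_id, name_id)` to raw bytes, and that the locations of the third inessential information in a non-resource section are known as `(offset, length)` ranges. Both representations and all function names are assumptions made for illustration; `RT_VERSION = 16` is the Windows resource type identifier for version information.

```python
from typing import Dict, Iterable, Sequence, Tuple

RT_VERSION = 16  # Windows resource type ID for version information

ResourceMap = Dict[Tuple[int, int], bytes]  # (type_id, name_id) -> data


def resources_match(resource_maps: Sequence[ResourceMap],
                    ignored_types=frozenset({RT_VERSION})) -> bool:
    """Compare decoded resources, skipping the second inessential information."""
    filtered = [
        {key: blob for key, blob in res.items() if key[0] not in ignored_types}
        for res in resource_maps
    ]
    return all(f == filtered[0] for f in filtered[1:])


def mask_ranges(content: bytes, ranges: Iterable[Tuple[int, int]]) -> bytes:
    """Zero the given (offset, length) ranges, clipped to the content size."""
    out = bytearray(content)
    for start, length in ranges:
        end = min(start + length, len(out))
        if start < end:
            out[start:end] = bytes(end - start)
    return bytes(out)


def non_resource_match(contents: Sequence[bytes],
                       ignored_ranges: Sequence[Tuple[int, int]] = ()) -> bool:
    """Compare non-resource sections, skipping the third inessential information."""
    return len({mask_ranges(c, ignored_ranges) for c in contents}) <= 1
```

Filtering by resource type keeps bitmaps, icons and similar data in the comparison while excluding version information.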
Two preferred embodiments of the method of the present invention for judging the source code consistency of PE files are described in detail below.
Embodiment one:
Fig. 1 is a flow chart of embodiment one of the present invention. In this embodiment, the method comprises the steps of:
Step S101: analyze whether the current file to be analyzed is a PE file; if so, go to step S102; if not, end the analysis directly;
The object of the present invention is to detect whether two or more PE files are based on the same source code. In some cases, for example when the aim is to find other PE files that come from the same source code as a given PE file, it may not be certain that every file added to the detection queue is a PE file. Therefore, it can first be judged whether the current file to be analyzed is a PE file; if it is not, the judgment of that file can be ended directly to save time;
Step S102: judge whether the number of sections contained in each PE file to be analyzed is the same; if so, go to step S103; if not, directly judge that the PE files are based on different source code. This is because PE files generated from the same source code necessarily contain the same number of sections; if the number of sections has changed, the content of the source code must have changed;
Step S103: enumerate the sections contained in each PE file in turn; while sections remain to be enumerated, go to step S104; when no sections remain, that is, the sections of every PE file have all been enumerated, go to step S110;
Step S104: judge whether the currently enumerated section contains debug information; if so, go to step S105; if not, go to step S106;
Step S105: judge whether the information other than the first inessential information in the debug information of the currently enumerated section is the same across the PE files; if so, go to step S106; if not, directly judge that the PE files are based on different source code; the first inessential information comprises timestamp information;
Step S106: judge whether the currently enumerated section is a critical section; if so, go to step S107; if not, return to step S103;
Step S107: judge whether the current critical section is a resource section; if so, go to step S108; if not, go to step S109;
Step S108: compare the resource sections using the resource section comparison method, that is, disregard the second inessential information contained in the resource section and compare the other information; if the other information is the same, judge that the content of the compared critical section is the same; if it is different, judge that the content of the compared critical section is different; then return to step S103; the second inessential information comprises version information;
Step S109: compare the non-resource sections using the non-resource section comparison method, that is, disregard the third inessential information contained in the non-resource section and compare the other information; if the other information is the same, judge that the content of the compared critical section is the same; if it is different, judge that the content of the compared critical section is different; then return to step S103; the third inessential information comprises timestamp information;
By repeating steps S103 to S109, each of the sections contained in the PE files is compared, and the flow then enters step S110;
Step S110: process the comparison results of the above steps and judge whether the PE files are based on the same source code; that is, if the PE files contain the same number of sections, their debug information is the same, and the content of each critical section is the same, judge that the PE files are based on the same source code.
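The pre-check in step S101 can be sketched as a signature test on the file. The function name `is_pe_file` is hypothetical; the check relies on the standard PE layout, in which the file starts with `MZ` and the 4-byte field at offset 0x3C points to the `PE\0\0` signature. It is a cheap filter for the detection queue, not a full validation of the file.

```python
import struct


def is_pe_file(path: str) -> bool:
    """Return True if the file carries the MZ and PE signatures."""
    with open(path, "rb") as fh:
        head = fh.read(0x40)
        if len(head) < 0x40 or head[:2] != b"MZ":
            return False
        (pe_offset,) = struct.unpack_from("<I", head, 0x3C)  # e_lfanew
        fh.seek(pe_offset)
        return fh.read(4) == b"PE\0\0"
```

Files that fail the check can be dropped from the queue before any section comparison starts.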
Embodiment two:
In the present embodiment, be from the different of embodiment one, also self-defined section the content that comprises in each PE file judged that as shown in Figure 2, it comprises step:
Whether step S201: analyzing file to be analyzed is the PE file, if, enter step S202, if not, finish current differentiation process;
Step S202: whether the number of judging the section that current each PE file to be analyzed comprises is identical, if identical, enters step S203, if different, judges directly that then each PE file is based on different source codes;
Step S203: enumerate the section that each PE file to be analyzed comprises, when also having when section that to enumerate, enter step S204, if there has not been the section that to enumerate, then enter step S212;
Step S204: judge whether the current section of enumerating is self-defined section, if, enter step S205, if not, enter step S206;
Step S205: judge whether self-defined section the quantity that each PE file to be analyzed comprises is identical with content, if identical, then enters step S206, if different, judge directly that then described each PE file to be analyzed is based on different source codes;
Step S206: judge whether Debugging message is positioned at the current section of enumerating, if, enter step S207, if not, enter step S208;
Step S207: judge whether other information except the first inessential information are identical in the Debugging message that the current section of enumerating comprises, if identical, then enter step S208, if different, judge directly that then described each PE file to be analyzed is based on different source codes;
Step S208: judge whether the current section of enumerating is critical section, if, enter step S209, if not, return step S203;
Step S209 judges whether current critical section is resource section, if, enter step S210, if not, then enter step S211;
Step S210: use the resource section way of contrast that current resource section is compared, namely, judgement other information except the second inessential information are carried out with the opposite sex relatively, if identical, the content of then judging corresponding each critical section is identical, if different, judges that then the content of corresponding each critical section is not identical, and return step S203, the described second inessential information comprises software version information;
Step S211: use non-resource section way of contrast that current non-resource section is compared, namely, judgement other information except the 3rd inessential information are carried out with the opposite sex relatively, if identical, the content of then judging corresponding each critical section is identical, if different, judges that then the content of corresponding each critical section is not identical, and return step S203, the described the 3rd inessential information comprises timestamp information;
Repeatedly carry out above-mentioned steps S203 to step S211, can obtain the comparative result to each section of each file to be analyzed, thereby can enter step S212, whether each PE file is carried out comprehensive judgement based on identical source code;
Step S212: the comparative result of processing above steps, judge that whether each PE file to be analyzed is based on identical source code, namely, have in self-defined section the situation, if the number of the section that each PE file comprises is identical, self-defined section quantity comprising with content, Debugging message is identical and the content of each critical section is identical, judges that then described each PE file is based on identical source code.
The other technical features of this embodiment are the same as those of Embodiment 1 and are not repeated here.
This shows that when the software developer has defined user-defined sections during development, those sections can also be compared for similarities and differences, so that the analysis of whether the PE files are based on the same source code is more comprehensive and more accurate.
In Embodiment 2, the user-defined sections are compared before the debug information is judged. In practice, the comparison of the user-defined sections contained in each PE file may instead be carried out after the debug information is judged, and either before or after the content of the critical sections is judged; the order may vary according to specific needs and the application environment.
In addition, the embodiments above are all described in terms of judging whether given PE files are based on the same source code. The method of the present invention for judging PE file source code consistency can also be used to search for other PE files built from the same source code as a particular PE file, or to judge whether two different directories contain PE files based on the same source code. In such applications, it can first be determined, before the method is carried out, whether any file to be analyzed exists; if so, the steps of the method are carried out; if not, processing ends directly without any judgment.
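The two-directory application can be sketched as follows. This is an illustration under stated assumptions rather than the patented implementation: the function name `find_same_source_pairs`, the extension list `PE_SUFFIXES`, and the caller-supplied predicate `same_source` (standing in for the judging method) are all hypothetical.

```python
from pathlib import Path
from typing import Callable, List, Tuple

# Assumption: which file extensions are treated as PE files to be analyzed.
PE_SUFFIXES = {".exe", ".dll", ".sys"}


def list_pe_files(directory: Path) -> List[Path]:
    """Return the candidate PE files directly inside a directory, sorted by path."""
    return sorted(
        p for p in directory.iterdir()
        if p.is_file() and p.suffix.lower() in PE_SUFFIXES
    )


def find_same_source_pairs(
    dir_a: Path,
    dir_b: Path,
    same_source: Callable[[Path, Path], bool],
) -> List[Tuple[Path, Path]]:
    """Pair up files from two directories that the predicate judges same-source."""
    files_a = list_pe_files(dir_a)
    files_b = list_pe_files(dir_b)
    # If either side has no file to analyze, end directly without any judgment.
    if not files_a or not files_b:
        return []
    return [(a, b) for a in files_a for b in files_b if same_source(a, b)]
```

Passing the predicate in as a parameter keeps the directory walk independent of how the consistency judgment itself is carried out.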
Figure 3 is a structural diagram of a preferred embodiment of the device of the present invention for judging PE file source code consistency. As shown in the figure, in this preferred embodiment the device comprises:
A section-count discrimination module 301, used to judge whether the PE files contain the same number of sections; if not, the comparison-result processing module 306 directly judges that the PE files are based on different source code;
A section enumeration module 302, used to enumerate the sections contained in each PE file when the judgment result of the section-count discrimination module 301 is identical;
A critical-section discrimination module 305, used to judge whether the section currently enumerated by the section enumeration module 302 is a critical section; if so, it judges whether the critical sections are identical; if not, no judgment is made;
A comparison-result processing module 306, used to judge, from the results of the section-count discrimination module 301 and the critical-section discrimination module 305, whether the PE files are based on the same source code. When the result of the section-count discrimination module 301 is identical and the result of the critical-section discrimination module 305 is identical, the comparison-result processing module 306 judges that the PE files are based on the same source code.
The device of the present invention for judging PE file source code consistency does not depend on the source code corresponding to each PE file. It needs only the number of sections each PE file contains and the similarities and differences in the content of the PE files to judge whether they are based on the same source code: when the PE files contain the same number of sections and the content of their critical sections is identical, the PE files under analysis are judged to be based on the same source code. The security of the source code repository is therefore not compromised, and the device is more convenient to use.
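What the section-count discrimination module and the section enumeration module operate on can be illustrated by reading the section table directly from the PE headers. The sketch below relies only on the public PE/COFF layout (the `e_lfanew` field at offset 0x3C, the 20-byte COFF header, and 40-byte section headers); the function names `read_section_table` and `section_counts_match` are hypothetical and are not taken from the patent.

```python
import struct
from typing import List, Tuple


def read_section_table(pe: bytes) -> List[Tuple[str, int, int]]:
    """Return (name, raw_offset, raw_size) for each section header of a PE image."""
    if pe[:2] != b"MZ":
        raise ValueError("missing DOS header")
    # e_lfanew at offset 0x3C holds the file offset of the PE signature.
    (e_lfanew,) = struct.unpack_from("<I", pe, 0x3C)
    if pe[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    coff = e_lfanew + 4
    # COFF header: NumberOfSections at +2, SizeOfOptionalHeader at +16.
    (num_sections,) = struct.unpack_from("<H", pe, coff + 2)
    (opt_size,) = struct.unpack_from("<H", pe, coff + 16)
    table = coff + 20 + opt_size
    sections = []
    for i in range(num_sections):
        entry = table + 40 * i
        name = pe[entry:entry + 8].rstrip(b"\0").decode("ascii", "replace")
        # SizeOfRawData at +16 and PointerToRawData at +20 within the header.
        raw_size, raw_offset = struct.unpack_from("<II", pe, entry + 16)
        sections.append((name, raw_offset, raw_size))
    return sections


def section_counts_match(images: List[bytes]) -> bool:
    """First check of the device: do all files contain the same number of sections?"""
    counts = {len(read_section_table(img)) for img in images}
    return len(counts) <= 1
```

Only when `section_counts_match` returns true would the per-section enumeration and the critical-section comparison need to run at all, which mirrors the early exit described for module 301.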
The device of the present invention for judging PE file source code consistency may further comprise:
A debug-information discrimination module 303, used to judge whether the section currently enumerated by the section enumeration module 302 contains debug information; if so, it judges whether the information other than the first non-essential information in the debug information is identical. If that information is not identical, the comparison-result processing module 306 directly judges that the PE files are based on different source code; if it is identical, judgment passes to the critical-section discrimination module. When the result of the section-count discrimination module is identical, the result of the critical-section discrimination module is identical, and the result of the debug-information discrimination module is identical, the comparison-result processing module judges that the PE files are based on the same source code. The first non-essential information includes timestamp information.
Debug information may be located either in a critical section or in a non-critical section. If the debug information is known to be in a critical section, the device described above can be used as it is. If it cannot be determined whether the section holding the debug information is critical or non-critical, the debug-information discrimination module 303 can first judge whether the currently enumerated section contains debug information. If it does, and the information other than the first non-essential information in that debug information is not identical, the PE files can be directly determined to be based on different source code. The PE files are judged to be based on the same source code only when they contain the same number of sections, the information other than the first non-essential information in the debug information is identical, and every critical section is identical. Judgment is therefore faster and more efficient.
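One way to exclude the timestamp from the debug-information comparison is to blank it before comparing. The sketch below assumes the debug information takes the form of the standard PE debug directory, whose 28-byte entries carry a `TimeDateStamp` field at offset 4; the patent itself states only that the first non-essential information includes timestamp information. The names `mask_debug_timestamps` and `debug_info_matches` are hypothetical.

```python
from typing import List

DEBUG_ENTRY_SIZE = 28   # size of one debug directory entry in the PE format
TIMESTAMP_OFFSET = 4    # TimeDateStamp follows the 4-byte Characteristics field
TIMESTAMP_SIZE = 4


def mask_debug_timestamps(debug_dir: bytes) -> bytes:
    """Zero the TimeDateStamp of every entry so that it cannot affect equality."""
    if len(debug_dir) % DEBUG_ENTRY_SIZE:
        raise ValueError("debug directory is not a whole number of entries")
    masked = bytearray(debug_dir)
    for start in range(0, len(masked), DEBUG_ENTRY_SIZE):
        begin = start + TIMESTAMP_OFFSET
        masked[begin:begin + TIMESTAMP_SIZE] = bytes(TIMESTAMP_SIZE)
    return bytes(masked)


def debug_info_matches(debug_dirs: List[bytes]) -> bool:
    """True if all debug directories agree once the timestamps are ignored."""
    return len({mask_debug_timestamps(d) for d in debug_dirs}) <= 1
```

Any other field that varies from build to build would need the same masking treatment; the sketch handles only the timestamp that the text names explicitly.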
In addition, the device of the present invention for judging PE file source code consistency may further comprise:
A user-defined-section discrimination module 304, used, when the PE files have user-defined sections, to judge whether the number and content of the user-defined sections of the PE files are identical; if not, the comparison-result processing module 306 can directly judge that the PE files are based on different source code;
When the result of the section-count discrimination module is identical, the result of the critical-section discrimination module is identical, and the result of the user-defined-section discrimination module is identical, the comparison-result processing module judges that the PE files are based on the same source code.
Depending on the development methods and tools used, a software developer may add user-defined sections during development. When checking whether PE files are based on the same source code, the user-defined sections can therefore be judged by the user-defined-section discrimination module 304 so that the judgment is complete. When this module finds that the number and/or content of the user-defined sections of the files to be analyzed differ, the PE files can be directly judged to be based on different source code. The PE files are judged to be based on the same source code only when they contain the same number of sections, the content of the critical sections is identical, the content of the user-defined sections is identical and, where the debug-information discrimination module 303 is present, the information other than the first non-essential information in the debug information is also identical.
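The check on user-defined (custom) sections can be sketched as below. The patent does not say how such sections are recognised; this sketch assumes, purely for illustration, that any section whose name is not in a list of names commonly emitted by toolchains is user-defined. `STANDARD_SECTION_NAMES`, `custom_sections`, and `custom_sections_match` are hypothetical names, and each file is modelled as a mapping from section name to raw content.

```python
from typing import Dict, List

# Assumption: section names a toolchain commonly emits on its own.
# Anything outside this set is treated as user-defined in this sketch.
STANDARD_SECTION_NAMES = frozenset({
    ".text", ".data", ".rdata", ".bss", ".idata", ".edata",
    ".pdata", ".rsrc", ".reloc", ".tls", ".debug",
})


def custom_sections(sections: Dict[str, bytes]) -> Dict[str, bytes]:
    """Keep only the sections whose names are not in the standard set."""
    return {
        name: content
        for name, content in sections.items()
        if name not in STANDARD_SECTION_NAMES
    }


def custom_sections_match(files: List[Dict[str, bytes]]) -> bool:
    """True if every file has the same user-defined sections with the same content."""
    if not files:
        return True
    reference = custom_sections(files[0])
    # Dict equality covers both the number of sections and their content.
    return all(custom_sections(f) == reference for f in files[1:])
```

A mismatch here corresponds to the early exit of module 304: the files can be reported as built from different source code without examining the remaining sections.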
Critical sections may be resource sections or non-resource sections. Resource information may include picture information, version information, bitmap information, icon information, shape information and the like. Even for the same source code, the version information may differ depending on how many times the code has been compiled, whereas picture information, bitmap information and similar information do not change for the same source code however many times it is compiled. Therefore, when resource sections are compared, the second non-essential information, such as version information, can be excluded from the judgment.
Likewise, a non-resource section may contain timestamp information, and even for the same source code the timestamp information in the generated PE files may differ. Therefore, when non-resource sections are compared, information such as timestamp information can be excluded from the judgment.
On this basis, the critical-section discrimination module 305 of the present invention may specifically comprise:
A resource-section discrimination module 3051, used, when the critical section is a resource section, to compare whether the information in the resource section other than the second non-essential information is identical; if so, the content of the critical sections is judged identical. The second non-essential information includes version information;
A non-resource-section discrimination module 3052, used, when the critical section is a non-resource section, to compare whether the information in the non-resource section other than the third non-essential information is identical; if so, the content of the critical sections is judged identical. The third non-essential information includes timestamp information.
Thus, when the device of the present invention judges resource sections, the resource-section discrimination module 3051 can traverse the resource information contained in the resource sections: when the information other than the second non-essential information is identical, the resource sections are judged identical, and when it is not identical, the resource sections are judged not identical. When non-resource sections are compared, the non-resource-section discrimination module 3052 excludes the timestamp information from the judgment: when the information other than the third non-essential information is identical, the non-resource sections are judged identical, and when it is not identical, the non-resource sections are judged not identical. By treating critical sections that are resource sections differently from critical sections that are non-resource sections, the accuracy of the judgment of PE file source code consistency is effectively improved.
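The different treatment of resource and non-resource critical sections can be sketched as two comparison functions. This is a simplified illustration, not the patented implementation: resources are assumed to have been parsed already into a mapping from (type ID, name) to raw data, and the byte ranges holding timestamps in a non-resource section are assumed to be supplied by the caller. `RT_VERSION` (16) is the standard Windows resource type for version information; `resources_match` and `non_resource_match` are hypothetical names.

```python
from typing import Dict, Iterable, List, Tuple

RT_VERSION = 16  # Windows resource type ID for version information

# (resource type ID, resource name) -> raw resource data
ResourceMap = Dict[Tuple[int, str], bytes]


def resources_match(resource_maps: List[ResourceMap]) -> bool:
    """Compare resource sections while ignoring version resources."""
    stripped = [
        {key: data for key, data in resources.items() if key[0] != RT_VERSION}
        for resources in resource_maps
    ]
    return all(s == stripped[0] for s in stripped[1:])


def non_resource_match(
    contents: List[bytes],
    ignored_ranges: Iterable[Tuple[int, int]],
) -> bool:
    """Compare non-resource sections while ignoring caller-supplied byte ranges."""
    ranges = list(ignored_ranges)

    def mask(data: bytes) -> bytes:
        out = bytearray(data)
        for start, end in ranges:
            # Zero the range in place; the overall length is preserved.
            out[start:end] = bytes(len(out[start:end]))
        return bytes(out)

    return len({mask(c) for c in contents}) <= 1
```

For example, two resource sections that differ only in their version resource compare as identical, while a difference in an icon resource is reported as a mismatch.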
The embodiments of the present invention described above do not limit the scope of protection of the present invention. Any modification, equivalent replacement, improvement or the like made within the spirit and principles of the present invention shall fall within the scope of protection of the claims of the present invention.

Claims (6)

1. A method for judging PE file source code consistency, comprising the steps of:
Step 1: judging whether the PE files contain the same number of sections; if so, proceeding to step 2; if not, directly judging that the PE files are based on different source code;
Step 2: enumerating the sections contained in each PE file and judging whether the currently enumerated section contains debug information; if so, proceeding to step 3; if not, proceeding to step 4;
Step 3: judging whether the information other than first non-essential information in the debug information of the currently enumerated section of each PE file is identical; if so, proceeding to step 4; if not, directly judging that the PE files are based on different source code; wherein the first non-essential information is the information in each debug information other than the information that is necessarily identical across the debug information contained in PE files built from the same source code, and the first non-essential information includes timestamp information;
Step 4: judging whether the currently enumerated section is a critical section; if so, proceeding to step 5; if not, returning to step 2 to enumerate and judge the next section; wherein a critical section is a section whose content does not change when the same source code is compiled repeatedly;
Step 5: judging whether the content of this critical section is identical across the PE files; if so, returning to step 2 to enumerate and judge the next section; if not, directly judging that the PE files are based on different source code;
After every section of the PE files has been enumerated and judged, if the PE files contain the same number of sections, the information in the debug information other than the first non-essential information is identical, and the content of each critical section is identical, judging that the PE files are based on the same source code.
2. The method for judging PE file source code consistency according to claim 1, characterized in that it further comprises the steps of:
When the PE files have user-defined sections, judging whether the number and content of the user-defined sections of the PE files are identical, a user-defined section being a section defined by the developers during development;
If not identical, judging that the PE files are based on different source code;
If identical, and the PE files contain the same number of sections, the information in the debug information other than the first non-essential information is identical, and the content of each critical section is identical, judging that the PE files are based on the same source code.
3. The method for judging PE file source code consistency according to claim 1 or 2, characterized in that judging whether the content of the critical sections is identical specifically comprises:
When the critical section is a resource section, comparing whether the information in the resource section other than second non-essential information is identical; if so, judging that the content of the critical sections is identical; if not, judging that the content of the critical sections is not identical; wherein a resource section is a section that contains resource data, the second non-essential information is the information in the resource information of the resource section that changes with the number of compilations or with the compilation environment, and the second non-essential information includes version information;
When the critical section is a non-resource section, comparing whether the information in the non-resource section other than third non-essential information is identical; if so, judging that the content of the critical sections is identical; if not, judging that the content of the critical sections is not identical; wherein a non-resource section is a section that does not contain resource data, the third non-essential information is the information in the non-resource section that changes with the number of compilations or with the compilation environment, and the third non-essential information includes timestamp information.
4. A device for judging PE file source code consistency, comprising:
A section-count discrimination module, used to judge whether the PE files contain the same number of sections; if not, a comparison-result processing module judges that the PE files are based on different source code;
A section enumeration module, used to enumerate the sections contained in each PE file when the judgment result of the section-count discrimination module is identical;
A debug-information discrimination module, used to judge whether the section currently enumerated by the section enumeration module contains debug information and, if so, to judge whether the information other than first non-essential information in the debug information is identical; if not identical, the comparison-result processing module judges that the PE files are based on different source code; wherein the first non-essential information is the information in each debug information other than the information that is necessarily identical across the debug information contained in PE files built from the same source code, and the first non-essential information includes timestamp information;
A critical-section discrimination module, used to judge whether the section currently enumerated by the section enumeration module is a critical section; if it is not a critical section, no judgment is made; if it is a critical section, the module judges whether the critical sections are identical, and if they are not identical, the comparison-result processing module judges that the PE files are based on different source code; wherein a critical section is a section whose content does not change when the same source code is compiled repeatedly;
The comparison-result processing module, used to judge, from the results of the section-count discrimination module, the section enumeration module, the debug-information discrimination module and the critical-section discrimination module, whether the PE files are based on the same source code; when enumeration of the sections of each PE file is complete, the result of the section-count discrimination module is identical, the debug-information discrimination module has judged that the section currently enumerated by the section enumeration module contains debug information and that the information in the debug information other than the first non-essential information is identical, and the result of the critical-section discrimination module is identical, the comparison-result processing module judges that the PE files are based on the same source code.
5. The device for judging PE file source code consistency according to claim 4, characterized in that it further comprises:
A user-defined-section discrimination module, used, when the PE files have user-defined sections, to judge whether the number and content of the user-defined sections of the PE files are identical; if not, the comparison-result processing module judges that the PE files are based on different source code; wherein a user-defined section is a section defined by the developers during development;
When the result of the section-count discrimination module is identical, the debug-information discrimination module has judged that the section currently enumerated by the section enumeration module contains debug information and that the information in the debug information other than the first non-essential information is identical, the result of the critical-section discrimination module is identical, and the result of the user-defined-section discrimination module is identical, the comparison-result processing module judges that the PE files are based on the same source code.
6. The device for judging PE file source code consistency according to claim 4 or 5, characterized in that the critical-section discrimination module comprises:
A resource-section discrimination module, used, when the critical section is a resource section, to compare whether the information in the resource section other than second non-essential information is identical; if so, the content of the critical sections is judged identical; if not, the content of the critical sections is judged not identical; wherein a resource section is a section that contains resource data, the second non-essential information is the information in the resource information of the resource section that changes with the number of compilations or with the compilation environment, and the second non-essential information includes version information;
A non-resource-section discrimination module, used, when the critical section is a non-resource section, to compare whether the information in the non-resource section other than third non-essential information is identical; if so, the content of the critical sections is judged identical; if not, the content of the critical sections is judged not identical; wherein a non-resource section is a section that does not contain resource data, the third non-essential information is the information in the non-resource section that changes with the number of compilations or with the compilation environment, and the third non-essential information includes timestamp information.
CN 200710033035 2007-12-29 2007-12-29 Method and apparatus for judging PE file source code consistency Active CN101470620B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 200710033035 CN101470620B (en) 2007-12-29 2007-12-29 Method and apparatus for judging PE file source code consistency

Publications (2)

Publication Number Publication Date
CN101470620A CN101470620A (en) 2009-07-01
CN101470620B true CN101470620B (en) 2013-01-16

Family

ID=40828112

Country Status (1)

Country Link
CN (1) CN101470620B (en)

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
EE01 Entry into force of recordation of patent licensing contract

Application publication date: 20090701

Assignee: BEIJING KINGSOFT INTERNET SECURITY SOFTWARE Co.,Ltd.

Assignor: Zhuhai Kingsoft Software Co.,Ltd.

Contract record no.: 2014990000718

Denomination of invention: Method and apparatus for judging PE file source code consistency

Granted publication date: 20130116

License type: Common License

Record date: 20140826

LICC Enforcement, change and cancellation of record of contracts on the licence for exploitation of a patent or utility model