GB2494268A - Performing code optimization - Google Patents

Info

Publication number
GB2494268A
Authority
GB
United Kingdom
Prior art keywords
performance
text
instruction
code
association relationship
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
GB201215035A
Other versions
GB201215035D0 (en
Inventor
Rui Hou
Michael Wurst
Yan Qi Wang
Zhengya Sun
Jia Zou
Wei Fan
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
International Business Machines Corp
Original Assignee
International Business Machines Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by International Business Machines Corp
Publication of GB201215035D0
Publication of GB2494268A
Legal status: Withdrawn

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00Arrangements for software engineering
    • G06F8/40Transformation of program code
    • G06F8/41Compilation
    • G06F8/44Encoding
    • G06F8/443Optimisation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/30Monitoring
    • G06F11/34Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3409Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment for performance assessment
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/30Monitoring
    • G06F11/34Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3466Performance evaluation by tracing or monitoring
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/36Preventing errors by testing or debugging software
    • G06F11/3604Software analysis for verifying properties of programs
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/36Preventing errors by testing or debugging software
    • G06F11/3604Software analysis for verifying properties of programs
    • G06F11/3612Software analysis for verifying properties of programs by runtime analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computer Hardware Design (AREA)
  • Quality & Reliability (AREA)
  • Debugging And Monitoring (AREA)
  • Memory System Of A Hierarchy Structure (AREA)

Abstract

A method comprises: obtaining (30) performance profiling data (from sampling logs) associated with execution of first code on a first physical platform (the target platform); constructing (32) an instruction sequence (such as LOAD, MOVE, OR, STORE) and determining, according to the data, the association relationship between the sequence and performance defect events; and providing (34) the relationship to another physical platform. The events include Cache Miss, TLB Miss, Stall and Recycle. Second code on a second platform (the development platform) is optimised using the relationship, which optimises execution of the second code on the first platform. Developed code is thus optimised on the development platform based on the association relationship (cross-platform performance optimisation). The association relationship is determined either from the occurrence counts of the sequences and defects or by clustering based on information entropy. The association relationship reflects hardware features of the first platform, does not relate to detailed code, and neither leaks information about code executed on the first platform nor causes security risks after transmission to the second platform.

Description

PERFORMING CODE OPTIMIZATION
TECHNICAL FIELD
The present invention relates to code optimization in a data processing system.
DESCRIPTION OF THE RELATED ART
As the development of information technology progresses, computers are expected to perform to a very high standard. In practice, a computer's performance depends not only on the physical platform of the computer, but also on how efficiently a software application uses that platform when it is executed. If software and hardware have good synergy, that is, if a software application can make the best of the executing capacity of a physical platform, a higher executing performance may be achieved.
In order to improve the executing performance of the computer, when a software application is running, it often needs to be optimized for the hardware platform of the computer. In particular, during the process of executing software code, the processor would sample and record the execution of the software code instructions, thus generating hardware profile data.
The hardware profile data may reflect the behavior and events associated with hardware performance that occur when instructions are executed, and thus is also referred to as performance profiling data. By analyzing the performance profiling data, the executing profile of specific instructions on a specific hardware platform can be understood. Based on the obtained executing profile, the instruction code can be optimized and the events which cause performance defects can be eliminated, thereby improving the executing performance.
In particular, many processor units in the prior art can provide hardware signals to indicate hardware performance events. In order to obtain these performance events, a performance monitoring unit may be installed in the hardware. This comprises a plurality of special hardware performance counters, each connected to a hardware signal via a converter. A sampler executed in the OS kernel can take samples from these counters, either periodically or when triggered by an overflow exception of the performance monitor. The sampling log may be stored in a memory buffer and finally written to a file, thus forming the performance profiling data.
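The overflow-triggered sampling described above can be illustrated with a small simulation. This is only a sketch under stated assumptions: the class name `SampledCounter`, the `period` parameter and the in-memory `log` list are hypothetical and do not correspond to any real performance monitoring unit or kernel interface.

```python
class SampledCounter:
    """Toy model of one hardware performance counter (hypothetical names).

    Every `period` events the counter "overflows"; the sampler then records
    the address of the instruction that caused the overflow and resets the
    counter, as a stand-in for an overflow exception handler.
    """

    def __init__(self, period):
        self.period = period
        self.value = 0
        self.log = []  # stand-in for the in-memory sampling log

    def record(self, instruction_addr):
        self.value += 1
        if self.value >= self.period:
            self.log.append(instruction_addr)
            self.value = 0


counter = SampledCounter(period=3)
for i in range(7):
    # Assumed 4-byte instructions starting at 0x1000.
    counter.record(0x1000 + 4 * i)
```

With a period of 3, only the third and sixth events are logged, which is why sampled profiles are statistical rather than exact.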
Typically, depending on the hardware structure, dozens or even thousands of different hardware performance events may occur in the processor. Such events include, for example, Instruction Cache Miss (ICacheMiss), Data Cache Miss (DCacheMiss), TLB Miss (TLBMiss), Instruction Pipeline Suspended (Stall), Pipeline Wait (Recycle), and the like. The types of the events are mainly dependent on the physical structure of the processor. These events can change the status of the processor, and are important factors influencing executing performance.
Fig. 1 shows a typical example of performance profiling data. In the example shown in Fig. 1, performance profiling data is arranged in a table, in which each entry, such as each row, shows the executing profile of an instruction, including the instruction information and the performance event statistics associated with the instruction. The instruction information includes, for example, the module name of the instruction (Mod), the address of the instruction (Addr), the target address of a jump instruction (TargetAddr), the operation code of the instruction (Opcode), the operand (Operand), the number of times the instruction was sampled (ticks), and so on; the performance event statistics associated with the instruction include the sampling information in which events such as ICacheMiss, DCacheMiss, TLBMiss, Stall, Recycle, etc. as mentioned above occur. It can be understood that, depending on the structure of the physical platform and the configuration of the performance monitoring unit, the performance profiling data may comprise additional or different performance events.
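One way to picture a row of such a table is as a small record type. The sketch below is illustrative only: the field names mirror the column names mentioned above, but the class `ProfileEntry`, the sample values and the module name `libdemo` are assumptions, not the layout of Fig. 1.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# Event names taken from the text; a real platform may expose others.
DEFECT_EVENTS = ("ICacheMiss", "DCacheMiss", "TLBMiss", "Stall", "Recycle")


@dataclass
class ProfileEntry:
    """One row of performance profiling data (hypothetical layout)."""
    mod: str                            # module name of the instruction
    addr: int                           # address of the instruction
    opcode: str                         # operation code
    operand: str                        # operand text
    ticks: int                          # number of times the instruction was sampled
    target_addr: Optional[int] = None   # only meaningful for jump instructions
    events: Dict[str, int] = field(default_factory=dict)  # event name -> sample count

    def defect_count(self, event: str) -> int:
        """Samples in which `event` occurred for this instruction (0 if never)."""
        return self.events.get(event, 0)


# Invented sample rows, for illustration only.
entries = [
    ProfileEntry("libdemo", 0x1000, "LOAD", "r1, 0(r2)", ticks=40,
                 events={"DCacheMiss": 12}),
    ProfileEntry("libdemo", 0x1004, "MOVE", "r3, r1", ticks=38),
]
```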
Generally, at least a part of the recorded performance events is associated with performance defects when executing instructions. For example, the various CacheMiss events recorded in Fig. 1 would prevent the processor from reading data directly from the cache, the Stall and Recycle events of the pipeline would cause the execution of the instruction stream to be suspended or to wait temporarily, and the like. The occurrence of these events would reduce the executing rate and efficiency of the processor, and cause performance defects; therefore, such events may be referred to as performance defect events.
Thus, by analyzing the above performance profiling data and obtaining the instruction information associated with the performance defects, it may be possible to overcome these defects, and thereby optimize the code and improve the executing performance of the processor. Fig. 2 is a schematic diagram showing code optimization in the prior art. As shown in the figure, an optimizer and an execution unit are provided in the same physical platform. Various codes, including the source code of a software application, the intermediate code or the converted binary code, are first of all input into the optimizer. If it is the first time that the codes are executed, the optimizer will directly transmit the codes to the execution unit. While the execution unit is executing these codes, the performance monitoring unit would generate performance profiling data relating to the execution of the codes, as described above. Once it has obtained such performance profiling data, the optimizer may identify the events related to performance defects by analyzing the performance profiling data. Based on the performance defect events, the optimizer can modify and optimize the codes with the intention of eliminating or reducing the performance defect events. Subsequently, the optimizer transmits the optimized codes once again to the execution unit to be executed, thereby producing new performance profiling data. By analyzing the new performance profiling data, the optimizer optimizes the codes again, further removing the performance defects. Hence, after being repeatedly executed and modified as described above, the codes of the software application become better adapted to the executing platform.
During the process of optimization as shown in Fig. 2, the code optimization is based on the analysis of the performance profiling data, and the performance profiling data is generated based on the execution of the codes on a particular platform; therefore, the codes have to be executed on the target physical platform before they are optimized, and the execution and the optimization must be performed on the same physical platform. In practice, however, a software application is generally developed on a development platform by developers, while it is generally executed on a customer's hardware platform. As the hardware platforms which execute the software application have various physical features, optimizing the code for each hardware platform requires executing the code on each platform, which inevitably makes optimization very expensive. On the other hand, the performance profiling data can reflect the instruction information executed on a physical platform, and therefore code optimization carried out by developers on the customer's hardware platform may bring potential security risks, such as, for example, leaking the source code information or leaking the customer's confidential information. Therefore, the performance optimizing solutions in the prior art have disadvantages in many aspects.
SUMMARY OF THE INVENTION
In view of the above problems, the present invention provides a cross-platform performance optimizing solution, in order to overcome at least one problem existing in the prior art.
Viewed from a first aspect, the present invention provides a computer implemented method of determining an association relationship from performance profiling data of a target data processing platform, comprising: determining performance profiling data related to the execution of a first computer program code, the performance profiling data comprising information of instructions corresponding to the first computer program code, and information of performance defect events corresponding to the instructions; constructing at least one instruction sequence, and determining the association relationship between the at least one instruction sequence and the performance defect events from the performance profiling data; and providing the association relationship to another physical computer platform, for performing the optimization of a second computer program code on the another physical platform based on the association relationship.
Preferably, the present invention provides a method wherein the performance profiling data is formed based on the sampling logs generated when the first code is executed.
Preferably, the present invention provides a method wherein said determining of the association relationship between the at least one instruction sequence and the performance defect events comprises: counting the occurrence times of the at least one instruction sequence; counting the occurrence times of the performance defect events corresponding to the at least one instruction sequence; and based on the proportion of the occurrence times of the instruction sequence to the occurrence times of the performance defect events as counted above, determining the association relationship between the at least one instruction sequence and the performance defect events.
Preferably, the present invention provides a method further comprising selecting the frequently occurring instruction sequences according to the occurrence times of the at least one instruction sequence; and wherein said counting the occurrence times of the performance defect events corresponding to the at least one instruction sequence is only counting the occurrence times of the performance defect events corresponding to the frequently occurring instruction sequences.
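The counting approach of the two preceding paragraphs can be sketched as follows. This is a minimal illustration, not the claimed method: the function name `association_by_counting`, the fixed window length `n`, the `min_count` frequency filter and the choice to express the proportion as defect occurrences divided by sequence occurrences are all assumptions made for the example.

```python
from collections import Counter


def association_by_counting(opcodes, events, n=2, min_count=2):
    """Estimate sequence/defect association degrees by counting.

    opcodes: opcode strings in profile order.
    events:  list of sets, parallel to `opcodes`, holding the defect events
             sampled at each instruction.
    Returns {(sequence, event): degree}, where degree is the fraction of
    occurrences of `sequence` that coincide with `event` (an assumed
    reading of the "proportion" in the text).
    """
    seq_counts = Counter()
    defect_counts = Counter()
    for i in range(len(opcodes) - n + 1):
        seq = tuple(opcodes[i:i + n])
        seq_counts[seq] += 1
        # Count each event at most once per occurrence of the sequence.
        for ev in set().union(*events[i:i + n]):
            defect_counts[(seq, ev)] += 1

    # Keep only frequently occurring sequences before computing proportions.
    return {
        (seq, ev): count / seq_counts[seq]
        for (seq, ev), count in defect_counts.items()
        if seq_counts[seq] >= min_count
    }


# Invented profile, for illustration only.
opcodes = ["LOAD", "MOVE", "LOAD", "MOVE", "OR"]
events = [{"DCacheMiss"}, set(), set(), set(), set()]
assoc = association_by_counting(opcodes, events)
```

Here the sequence (LOAD, MOVE) occurs twice and coincides with a DCacheMiss once, so its association degree is 0.5. Note that the resulting mapping contains opcode patterns and ratios only, not addresses or operands.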
Preferably, the present invention provides a method wherein said determining the association relationship between the at least one instruction sequence and the performance defect events comprises: selecting the frequently occurring instruction sequences; clustering the instruction sequences based on the information entropy; selecting the discriminative clusters in combination with the counting of the performance defect events; and further classifying the selected sequence clusters until a predetermined condition is satisfied.
Preferably, the present invention provides a method wherein the predetermined condition comprises at least one of the following: the number of instruction sequences in the classified group is less than a first particular threshold; and the degree of association between the instruction sequences and the performance defect events is no less than a second particular threshold.
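The text so far does not spell out how information entropy is applied to the clusters, so the following is only a generic sketch: it computes the Shannon entropy of defect labels within a cluster and checks the two stopping conditions named above. The function names, the representation of a cluster as `(sequence, has_defect)` pairs, and the parameter names `size_threshold` and `assoc_threshold` are assumptions.

```python
import math


def shannon_entropy(labels):
    """Shannon entropy (in bits) of a list of hashable labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def should_stop(cluster, size_threshold, assoc_threshold):
    """Check the predetermined condition for ending further classification.

    cluster: list of (sequence, has_defect) pairs.
    Stops when the cluster holds fewer sequences than `size_threshold`, or
    when the fraction of sequences with a defect is at least
    `assoc_threshold` (either condition suffices).
    """
    if not cluster or len(cluster) < size_threshold:
        return True
    degree = sum(1 for _, has_defect in cluster if has_defect) / len(cluster)
    return degree >= assoc_threshold


# Invented clusters, for illustration only.
mixed = [(("LOAD", "MOVE"), True), (("LOAD", "OR"), False)] * 2
pure = [(("LOAD", "MOVE"), True)] * 4
```

A cluster whose members all carry the same label has zero entropy, while an evenly mixed cluster has the maximum of one bit; a low-entropy cluster is a natural candidate for a "discriminative" one.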
Preferably, the present invention provides a method of performing code optimization, comprising: obtaining the association relationship as described above and according to the association relationship, determining the performance defect events corresponding to the second code; and based on the determined performance defect events, optimizing the second code.
Preferably, the present invention provides a method wherein said obtaining the association relation comprises: obtaining the association relation via an association sharing platform.
Preferably, the present invention provides a method wherein said determining the performance defect events corresponding to the second code comprises: scanning the second code and generating instruction sequences corresponding to the second code; matching the generated instruction sequences with the obtained association relation; and according to the association relation, determining the performance defect event corresponding to the matched instruction sequence.
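The scan-and-match step can be sketched as a sliding window over the opcodes of the second code. The function name `find_predicted_defects`, the `threshold` parameter and the shape of the `association` mapping (sequence and event pair mapped to a degree) are hypothetical choices for this example, not a format defined by the text.

```python
def find_predicted_defects(opcodes, association, threshold=0.5):
    """Match opcode windows of the second code against an association mapping.

    opcodes:     opcode strings of the second code, in program order.
    association: {(sequence_tuple, event_name): degree}.
    Returns (index, sequence, event, degree) tuples for every window whose
    association degree is at least `threshold`.
    """
    # Index the mapping by sequence so each window needs one lookup.
    by_sequence = {}
    for (seq, event), degree in association.items():
        by_sequence.setdefault(seq, []).append((event, degree))

    lengths = sorted({len(seq) for seq in by_sequence})
    hits = []
    for i in range(len(opcodes)):
        for n in lengths:
            window = tuple(opcodes[i:i + n])
            if len(window) < n:
                continue  # window would run past the end of the code
            for event, degree in by_sequence.get(window, []):
                if degree >= threshold:
                    hits.append((i, window, event, degree))
    return hits


# Invented association data and code, for illustration only.
association = {
    (("LOAD", "MOVE"), "DCacheMiss"): 0.8,
    (("OR", "STORE"), "Stall"): 0.2,
}
second_code = ["LOAD", "MOVE", "OR", "STORE"]
hits = find_predicted_defects(second_code, association)
```

Because matching needs only the opcode stream of the second code and the shared mapping, it can run on the development platform without executing anything on the target platform.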
Preferably, the present invention provides a method wherein said optimizing the second code comprises eliminating or reducing the performance defect events by at least one of the following ways: adjusting the execution order of the second code; and adding additional instruction codes to the second code.
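As an illustration of "adding additional instruction codes", the sketch below inserts a prefetch a few instructions ahead of a load that is predicted to miss the cache. Prefetching is an assumed example of such an added instruction and is not named in the text; the `PREFETCH` opcode, the function `insert_prefetch` and the `distance` parameter are hypothetical, and the sketch ignores whether the operand's address registers are already valid at the earlier position.

```python
def insert_prefetch(instructions, load_indices, distance=2):
    """Insert a hypothetical PREFETCH ahead of loads predicted to miss.

    instructions: list of (opcode, operand) pairs.
    load_indices: indices of LOAD instructions flagged by the matching step.
    distance:     how many instructions earlier the prefetch is placed.
    Simplification: register dependencies are not checked.
    """
    result = list(instructions)
    # Work from the end so earlier insertion points remain valid.
    for idx in sorted(set(load_indices), reverse=True):
        opcode, operand = instructions[idx]
        if opcode != "LOAD":
            continue
        result.insert(max(0, idx - distance), ("PREFETCH", operand))
    return result


# Invented code fragment, for illustration only.
code = [("OR", "r3, r4"), ("MOVE", "r6, r3"), ("LOAD", "0(r2)"), ("STORE", "r1")]
optimized = insert_prefetch(code, [2])
```

A real optimizer would also have to prove that moving or adding instructions preserves program semantics, which this sketch does not attempt.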
Viewed from a second aspect the present invention provides an apparatus for determining an association relationship for code optimization, comprising: a profiling data obtaining unit, configured to obtain the performance profiling data associated with the execution of a first code, the performance profiling data comprising the information of instructions corresponding to the first code, and the information of performance defect events corresponding to the instructions; an association determining unit, configured to, according to the performance profiling data, construct at least one instruction sequence, and determine the association relationship between the at least one instruction sequence and the performance defect events; and an association providing unit, configured to provide the association relationship to another physical platform, for performing the optimization of a second code on the another physical platform based on the association relationship.
Preferably, the present invention provides an apparatus wherein the performance profiling data is formed based on the sampling logs generated when the first code is executed.
Preferably, the present invention provides an apparatus wherein the association determining unit is configured to: count the occurrence times of the at least one instruction sequence; count the occurrence times of the performance defect events corresponding to the at least one instruction sequence and based on the proportion of the occurrence times of the instruction sequence to the occurrence times of the performance defect events as counted above, determine the association between the at least one instruction sequence and the performance defect events.
Preferably, the present invention provides an apparatus wherein the association determining unit is further configured to select the frequently occurring instruction sequences according to the occurrence times of the at least one instruction sequence; and said counting the occurrence times of the performance defect events corresponding to the at least one instruction sequence is to only count the occurrence times of the performance defect events corresponding to the frequently occurring instruction sequences.
Preferably, the present invention provides an apparatus wherein the association determining unit is configured to: select the frequently occurring instruction sequences; cluster the instruction sequences based on the information entropy; select the discriminative clusters in combination with the counting of the performance defect events; and further classify the selected sequence clusters until a predetermined condition is satisfied.
Preferably, the present invention provides an apparatus wherein the predetermined condition comprises at least one of the following: the number of instruction sequences in the classified group is less than a first particular threshold; and the degree of association between the instruction sequences and the performance defect events is no less than a second particular threshold.
Preferably, the present invention provides an apparatus for performing code optimization, comprising: an association obtaining unit, configured to obtain the association relationship as provided by the apparatus as described above and further comprising a defects determining unit, configured to, according to the association relationship, determine the performance defect events corresponding to the second code; and a code optimizing unit, configured to, based on the determined performance defect events, optimize the second code.
Preferably, the present invention provides an apparatus wherein the association obtaining unit is configured to obtain the association relationship via an association sharing platform.
Preferably, the present invention provides an apparatus wherein the defects determining unit is configured to: scan the second code and generate instruction sequences corresponding to the second code; match the generated instruction sequences with the obtained association relationship; and according to the association relationship, determine the performance defect event corresponding to the matched instruction sequence.
Preferably, the present invention provides an apparatus wherein the code optimizing unit is configured to eliminate or reduce the performance defect events by at least one of the following ways: adjusting the execution order of the second code; and adding additional instruction codes to the second code.
In another aspect, the present invention provides a method of providing association relationship, comprising: obtaining the performance profiling data associated with the execution of a first code, the performance profiling data comprising the information of instructions corresponding to the first code, and the information of performance defect events corresponding to the instructions; according to the performance profiling data, constructing at least one instruction sequence, and determining the association relationship between the at least one instruction sequence and the performance defect events; and providing the association relationship to another physical platform, for performing the optimization of a second code on the another physical platform based on the association relationship.
According to another aspect, the present invention provides a method for performing code optimization, comprising: obtaining the association relationship as provided in the first aspect; according to the association relationship, determining the performance defect events corresponding to the second code; and based on the determined performance defect events, optimizing the second code.
According to another aspect, the present invention provides an apparatus for providing association relationship, comprising: a profiling data obtaining unit, configured to obtain the performance profiling data associated with the execution of a first code, the performance profiling data comprising the information of instructions corresponding to the first code, and the information of performance defect events corresponding to the instructions; an association determining unit, configured to, according to the performance profiling data, construct at least one instruction sequence, and determine the association relationship between the at least one instruction sequence and the performance defect events; and an association providing unit, configured to provide the association relationship to another physical platform, for performing the optimization of a second code on the another physical platform based on the association relationship.
According to another aspect, the present invention provides an apparatus for performing code optimization, comprising: an association obtaining unit, configured to obtain the association relationship as provided in the third aspect; a defects determining unit, configured to, according to the association relationship, determine the performance defect events corresponding to the second code; and a code optimizing unit, configured to, based on the determined performance defect events, optimize the second code.
The method and apparatus according to the invention may allow the developed code to be optimized on the development platform based on the association relationship between the instruction sequence and the performance defect events produced on the target physical platform, so as to make the developed code adapted to the target platform better, thereby realizing the cross-platform performance optimization, and making the optimizing process more effective.
BRIEF DESCRIPTION OF THE DRAWINGS
Embodiments of the invention will now be described, by way of example only, with reference to the accompanying drawings in which:
Fig. 1 shows a typical example of performance profiling data;
Fig. 2 is a schematic diagram showing the code optimization in the prior art;
Fig. 3 shows a flow chart of a method for providing association relationship according to an embodiment of the invention;
Fig. 4 shows a flow chart of a method for performing code optimization according to an embodiment of the invention;
Fig. 5A shows a substep of determining association relationship according to an embodiment of the invention;
Fig. 5B shows a substep of determining association relationship according to another embodiment of the invention;
Fig. 6 shows an example of clustering and encoding instruction sequences according to an embodiment of the invention;
Fig. 7 is a schematic diagram of an association relationship according to an embodiment of the invention;
Fig. 8 shows a substep of determining performance defect events according to an embodiment of the invention;
Fig. 9A shows a schematic block diagram of an apparatus for providing association relationship according to an embodiment of the invention;
Fig. 9B shows a schematic block diagram of an apparatus for performing code optimization according to an embodiment of the invention;
Fig. 10 is a block diagram showing an exemplary computing system 100 suitable to implement the embodiments of the invention.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT
The embodiments of the invention will now be described with reference to the drawings. As will be appreciated by one skilled in the art, aspects of the present invention may be embodied as a system, method or computer program product. Accordingly, aspects of the invention may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, micro-code, etc.) or an embodiment combining software and hardware aspects that may all generally be referred to herein as a "circuit," "module" or "system." Furthermore, aspects of the invention may take the form of a computer program product embodied in one or more computer readable medium having computer usable program code embodied in the medium.
Any combination of one or more computer readable media may be utilized. The computer readable medium may be a computer-readable signal medium or a computer-readable storage medium. The computer-readable storage medium may be, for example but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, device or any combination thereof. More specific examples (a non-exhaustive list) of the computer-readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device or any proper combination thereof. In the context of this document, a computer-readable storage medium may be any tangible medium that can contain or store the program for use by or in connection with the instruction execution system, apparatus, or device.
Computer readable signal medium may include a propagated data signal with the computer-readable program code embodied therewith, either in baseband or as part of a carrier wave.
Such a propagated signal may take any proper form, including but not limited to an electromagnetic signal, an optical signal, or any proper combination thereof. A computer readable signal medium may be any computer readable medium that is different from a computer-readable storage medium and can communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
Program code included in the computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc. or any proper combination thereof.
Computer program code for carrying out operations of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as "C" programming language or similar programming languages. The program code may execute entirely on a user computer, partly on a user computer, as a stand-alone software package, partly on user computer and partly on a remote computer or entirely on a remote computer or server. In the latter scheme, the remote computer may be connected to the user computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider).
Below, aspects of the invention will be described with reference to flowchart and/or block diagram of methods, apparatuses (systems) and computer program products of the embodiment of the invention. Note that, each block of the flowchart and/or block diagram, and combinations of blocks in the flowchart and/or block diagram, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer or other programmable data processing apparatus to produce a machine, such that the instructions which execute on the computer or other programmable data processing apparatus create means for implementing the functions/actions specified in the block(s) of the flowchart and/or block diagram.
These computer program instructions may also be stored in a computer-readable medium that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable medium produce an article of manufacture including instruction means which implement the functions/actions specified in the block(s) of the flowchart and/or block diagram.
The computer program instructions may also be loaded into a computer or other programmable data processing apparatus to perform a series of operational steps on the computer or other programmable data processing apparatus so as to produce a computer implemented process, such that the instructions which execute on the computer or other programmable data processing apparatus will provide a process for implementing the functions/actions specified in the block(s) of the flowchart and/or block diagram.
Next, the embodiments of the invention will be described in conjunction with the drawings. It should be appreciated that the description of the following detailed examples are merely to explain the exemplary implementing modes, rather than to impose any limitation on scope of the invention.
In some embodiments of the invention, the association relationship between the instruction sequences and the performance defects is obtained by analyzing the performance profiling data of the target physical platform; subsequently, code is optimized on the development platform by using the obtained association relationship data, thereby realizing cross-platform performance optimization. As described above, the performance profiling data may reflect the behavior and events associated with the hardware performance that occur when instructions are executed, and are closely dependent on the hardware characteristics of the physical platform and the code executed on the physical platform. That is, the performance profiling data reflect both the hardware features of the physical platform and the features of the executed code.
This is the reason why in the prior art developers have to execute the code first of all and generate the corresponding performance profiling data, and then they can optimize the code according to the profiling data. In the embodiments of the invention, however, by analyzing the performance profiling data, we can dig out the statistical regularity relating to the instruction sequences and the performance defect events, and thereby obtain the association relationship between the instruction sequences and the performance defects, which only reflects the hardware features of the physical platform, and has nothing to do with the executed codes. Based on such association relationship, we can perform code optimization on the development platform, such that the optimized code is adapted to the target physical platform better, thereby realizing cross-platform performance optimization. Next, the embodiments of the invention will be described in detail in conjunction with the drawings.
According to the embodiments of the invention, the process of cross-platform performance optimization may be divided into two stages, i.e., a method for providing association relationship executed on a first physical platform (for example, the target physical platform), and a method for performing code optimization executed on a second physical platform (for example, the development physical platform).
Fig.3 shows a flow chart of a method for providing association relationship according to an embodiment of the invention. The steps in the flow chart may be performed on the first physical platform. In particular, as shown in the figure, the process comprises, in step 30, obtaining the performance profiling data associated with the execution of a first code, the performance profiling data comprising the information of instructions corresponding to the first code, and the information of performance defect events corresponding to the instructions; in step 32, according to the performance profiling data, constructing instruction sequences, and determining the association relationship between the instruction sequences and the performance defect events; and in step 34, providing the association relationship to another physical platform, so as to perform the optimization of a second code on the another physical platform based on the association relationship.
Based on such provided association relationship, code optimization may be performed, and therefore execution performance may be improved. Fig.4 shows a flow chart of a method for performing code optimization according to an embodiment of the invention. The steps in the flow chart may be performed on the second physical platform. In particular, as shown in Fig.4, the process comprises, in step 40, obtaining the association relationship as provided in the method of Fig.3; in step 42, according to the association relationship, determining the performance defect events corresponding to the second code; and in step 44, based on the determined performance defect events, optimizing the second code, and therefore optimizing the execution performance of the second code on the first physical platform.
Next, the steps of the above embodiment will be described in conjunction with detailed examples.
Firstly, in step 30, the performance profiling data associated with the execution of a first code is obtained on the first physical platform. The first physical platform may be the target physical platform which the performance optimization is directed to; the first code may be any code executed on the first physical platform, including the source code, the intermediate code, and the binary code of various software, applications, and programs, etc. As described above, in many physical platforms in the prior art, a performance monitoring unit is provided to collect and count the signals which are generated by the processor and indicate hardware performance events. At the same time, the sampler executed in the OS kernel may take samples from the counting mentioned above. In one embodiment, in order to obtain the performance profiling data, in step 30, while the processor executes the first code, the process reads the sampling results directly from the above sampler in real time, and pools and organizes the sampling results, thus obtaining the performance profiling data. In another embodiment, the sampler produces sampling logs periodically and stores them in a memory buffer, or further, reads the sampling logs from the memory buffer, writes them to a file, and stores the file at a particular location on a hard disk. In this case, in step 30, the sampling logs may be read directly from the memory buffer or from the particular location on the hard disk, and be formed into the performance profiling data.
As described above, the performance profiling data generally comprises the instruction information and the related performance events. Accordingly, the performance profiling data obtained by step 30 comprises the corresponding instructions when executing the first code, and the performance events caused by these instructions, including the performance defect events. In particular, the obtained performance profiling data may be as shown in the example of Fig. 1. However, those skilled in the art will understand that the content of the performance profiling data depends on many factors such as the structure of the physical platform, the instruction code executed on the physical platform, the configuration of the performance monitoring unit and the sampler, and the like. As these factors are different, the performance profiling data obtained in step 30 may comprise additional and/or different instruction information and performance events with regard to the example of Fig. 1.
Subsequently, based on the obtained performance profiling data, in step 32, the process analyzes the profiling data, generates instruction sequences and identifies the regularity of performance defects caused by the instruction sequences, that is, determines one or more relationships between the instruction sequences and the performance defect events, such that the identified relationship reflects the hardware features of the first physical platform and has nothing to do with the code executed thereon (for example, the first code). The analysis and determination of the relationship as described above may be realized in a number of ways.
In particular, Fig.5A shows steps of identifying the relationship according to an embodiment of the invention, i.e., the sub steps of step 32 in Fig.3. In the embodiment, the relationship is obtained by counting the times of occurrence of the instruction sequences and the times of occurrence of the performance events. In particular, as shown in the figure, in step 3211, instruction sequences are constructed according to the performance profiling data. As described above, the performance profiling data record, generally in the order in which the instructions are executed, the information of the executed instructions, including the instruction address, the operation code, and so on, as shown in Fig.1.
For a jump instruction, which is not executed in the general order, the performance profiling data record the source address, the target address, and so on of the instruction. Thus, the execution path of the instructions may be obtained by backtracking along the recording order of the performance profiling data and, if necessary, jumping according to the source address and the target address.
Based on the execution path, we can obtain a number of successively executed instructions, thereby constructing an instruction sequence consisting of a number of successive instructions. In one example, the instruction sequence is identified by the operation content of the instructions, more particularly, the operation code. For example, L (load), MVC (move), XC (exclusive or) and ST (store), which appear successively in the performance profiling data of Fig.1, can be considered as an instruction sequence. Subsequently, in step 3212, the process counts the times of occurrence of each instruction sequence in the performance profiling data. Then, in step 3213, it counts the times of occurrence of the performance defect events corresponding to each instruction sequence. Finally, in step 3214, based on the proportion of the occurrence times as counted above, it determines the association relationship between the instruction sequences and the performance defect events. In particular, in this step, it may determine the proportion of the occurrence times of various performance defect events to the occurrence times of the instruction sequence, compare the determined proportion with a predetermined threshold, and according to the comparison result, determine the association relationship between the instruction sequence and the performance defects. For example, for the instruction sequence L-MVC-XC-ST mentioned above, suppose in step 3212 it is found that the instruction sequence occurs 100 times in the performance profiling data. Then, in step 3213, by checking the performance profiling data, it is found that, when the last instruction ST of the sequence is executed, the TLBMiss events occur 85 times, the Stall status occurs 5 times, and the ICacheMiss events occur twice. In this case, it can be determined in step 3214 that, when executing the instruction sequence, the TLBMiss events occur at a proportion of 85%.
Suppose the predetermined threshold proportion is 80%, then it can be determined that the association relationship exists between the above instruction sequence and the performance defect event TLBMiss. In one embodiment, the degree of association may be further introduced to indicate the closeness of the association relationship. In particular, the proportion of the occurrence times as described above may be taken as the value of the degree of association.
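The counting procedure of steps 3211 to 3214 can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the record layout (an opcode plus the list of defect events sampled at that instruction), the fixed sequence length, the function name `associate`, and the use of "greater than or equal to" for the threshold comparison are all hypothetical choices, not details given in the description.

```python
from collections import Counter, defaultdict

def associate(records, length, threshold):
    """Map (sequence, event) to its proportion when the proportion reaches threshold.

    records: list of (opcode, [events sampled at this instruction]) in execution
    order. This layout is an assumption made for the sketch.
    """
    opcodes = [opcode for opcode, _ in records]
    seq_counts = Counter()                  # step 3212: occurrences per sequence
    event_counts = defaultdict(Counter)     # step 3213: defect events per sequence
    for end in range(length - 1, len(records)):
        seq = tuple(opcodes[end - length + 1:end + 1])   # step 3211: sliding window
        seq_counts[seq] += 1
        for event in records[end][1]:       # events seen at the end instruction
            event_counts[seq][event] += 1
    associations = {}
    for seq, occurrences in seq_counts.items():          # step 3214
        for event, hits in event_counts[seq].items():
            proportion = hits / occurrences
            if proportion >= threshold:
                associations[(seq, event)] = proportion
    return associations

# Illustrative data: 100 executions of L-MVC-XC-ST, with events sampled at ST.
end_events = ["TLBMiss"] * 85 + ["Stall"] * 5 + ["ICacheMiss"] * 2 + [None] * 8
records = []
for event in end_events:
    records += [("L", []), ("MVC", []), ("XC", []), ("ST", [event] if event else [])]

associations = associate(records, length=4, threshold=0.80)
print(associations)
```

With these illustrative records only the pair (L-MVC-XC-ST, TLBMiss) passes the 80% threshold, and the stored proportion can double as the degree of association.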
It can be seen that, in the embodiment of Fig.5A, the association relationship is determined by directly counting the occurrence times of the instruction sequences and the performance defect events. In many cases, the number of the instruction sequences is so large (especially in the cases of not limiting the sequence length) that the computing efficiency of the above counting steps is not ideal. Therefore, according to the actual computing resources, we may employ different modes to perform step 32 to obtain the association relationship.
In one embodiment, after step 3212 of counting the occurrence times of each instruction sequence in Fig.5A, the process selects the frequently occurring instruction sequences according to the occurrence times. The frequently occurring instruction sequences may be the instruction sequences whose occurrence times exceed a predetermined threshold. Then, in step 3213, it only counts the performance defect events corresponding to the frequently occurring instruction sequences. In this way, removing the less frequently occurring, and therefore unrepresentative, instruction sequences saves some counting computation.
In another embodiment, the instruction sequences are preliminarily screened based on the performance events of a single instruction. In particular, for each single instruction in the performance profiling data, the process obtains the proportion of the sampling times of the performance defect events, occurred when executing the instruction, to the sampling times of the instruction. If the proportion is larger than a predetermined threshold, the instruction is then regarded as the end instruction of the instruction sequence that may possibly cause the performance defect events. Accordingly, beginning from this instruction, it backtracks for a certain number of steps along the execution path of instructions, thereby obtaining a candidate instruction sequence. For example, for the instruction MVC in line 6 of Fig. 1, the sampling times are 14, in which the Recycle event occurs 5 times. Suppose this proportion is larger than the predetermined threshold, then it can be deemed that the instruction sequence ending with MVC is a sequence that may possibly cause the Recycle event.
Accordingly, starting from the MVC, backtracking along the execution path of instructions can obtain candidate instruction sequences that may possibly cause the Recycle. On the other hand, if the proportion of a certain type of performance defect event occurred for an instruction is less than the above threshold, it can be deemed that the instruction is unlikely the end instruction of the instruction sequence that causes the type of performance defect event, and therefore it is unnecessary to backtrack starting from this instruction. Thus, only the candidate instruction sequences that may possibly cause the performance defect events are obtained, and therefore some counting and computing cost can be saved. For the obtained candidate instruction sequences, the process may count them in a similar way to the embodiment of Fig.5A, and therefore finally determine the association relationship between the instruction sequences and the performance defect events. It can be understood that those skilled in the art can combine the several embodiments described above according to actual requirement to produce more variants, in order to obtain the association relationship between the instruction sequences and the performance defect events.
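The screening by single-instruction proportion followed by backtracking can be sketched as below. The function name `candidate_sequences`, the parallel-list layout of the profiling data, the threshold of 0.3, and the surrounding opcodes in the example path are all assumptions made for illustration; only the figures of 14 samples and 5 Recycle events come from the example in the text.

```python
def candidate_sequences(path, samples, defects, event, threshold, max_back):
    """Collect candidate sequences ending at instructions with a high defect ratio.

    path:     opcodes in execution order (hypothetical layout)
    samples:  samples[i] is the number of samples taken at instruction i
    defects:  defects[i] maps an event name to its sample count at instruction i
    max_back: how many steps to backtrack along the execution path
    """
    candidates = set()
    for i in range(len(path)):
        if samples[i] == 0:
            continue
        if defects[i].get(event, 0) / samples[i] > threshold:
            # This instruction may be the end of a defect-causing sequence,
            # so backtrack from it to build candidates of growing length.
            for back in range(1, max_back + 1):
                start = i - back
                if start < 0:
                    break
                candidates.add(tuple(path[start:i + 1]))
    return candidates

# MVC is sampled 14 times with 5 Recycle events; the other values are illustrative.
path = ["ST", "L", "MVC", "XC"]
samples = [9, 10, 14, 12]
defects = [{}, {}, {"Recycle": 5}, {"Recycle": 1}]

candidates = candidate_sequences(path, samples, defects, "Recycle",
                                 threshold=0.3, max_back=2)
print(candidates)
```

Instructions whose ratio stays below the threshold (XC in this example) never trigger backtracking, which is where the saving in counting cost comes from.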
Alternatively, instead of directly counting the occurrence times of the instruction sequences and the performance defect events, in another embodiment of the invention, the instruction sequences are clustered and analyzed based on the information entropy so as to obtain the association relationship between the instruction sequences and the performance defect events.
Fig.5B shows the steps of determining an association relationship according to the embodiment. As shown in Fig.5B, in the embodiment, the process of obtaining the association relationship comprises: step 3221, constructing instruction sequences; step 3222, selecting the frequently occurring instruction sequences; step 3223, clustering and encoding the instruction sequences based on the information entropy; step 3224, selecting the discriminative clusters in combination with the counting of the performance events; step 3225, further classifying the selected clusters; and step 3226, determining whether the current clusters satisfy a predetermined condition, and if they do not, repeating steps 3222-3225 until the predetermined condition is satisfied. Next, the above steps will be described in detail.
In step 3221, the process obtains the execution path of instructions from the performance profiling data, and obtains a plurality of instruction sequences according to the execution path of instructions. This step is similar to step 3211 of Fig.5A, and its detailed description is therefore omitted.
Subsequently, in step 3222, the process selects the frequently occurring instruction sequences. In particular, it may compare the occurrence times of the instruction sequences with a predetermined threshold, and select the instruction sequences whose occurrence times exceed the threshold; or, it may sort the instruction sequences by their occurrence times, and select a certain number of instruction sequences in order of the occurrence times, from high to low; or it may employ other modes to select the frequently occurring instruction sequences.
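Both selection modes mentioned here, thresholding and taking the top entries after sorting, are easy to express with Python's `collections.Counter`. The function names and the example counts below are hypothetical and serve only as a sketch.

```python
from collections import Counter

def select_by_threshold(seq_counts, min_count):
    """Keep the sequences whose occurrence count exceeds min_count."""
    return {seq for seq, count in seq_counts.items() if count > min_count}

def select_top_k(seq_counts, k):
    """Keep the k most frequent sequences, from high to low."""
    return [seq for seq, _ in seq_counts.most_common(k)]

# Illustrative occurrence counts (not taken from Fig.1).
seq_counts = Counter({
    ("L", "MVC", "XC", "ST"): 100,
    ("L", "ST"): 40,
    ("LA", "ST"): 3,
})

frequent = select_by_threshold(seq_counts, min_count=10)
top_one = select_top_k(seq_counts, k=1)
print(frequent, top_one)
```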
Next, in step 3223, the process clusters and encodes the instruction sequences based on the information entropy. The information entropy is a measurement of the uncertainty and chaos of the information, defined by analogy with the concept of entropy in thermodynamics. Generally speaking, the more chaotic a system is and the more uncertain the variables contained therein are, the bigger the information entropy is; on the contrary, the more ordered a system is, the lower the information entropy is. In the prior art many methods have been proposed to approximate and estimate the information entropy mathematically. Based on the mathematically estimated information entropy, the instruction sequences can be clustered, such that the system consisting of the instruction sequences after being clustered has the least information entropy, that is, the greatest amount of information. Next, the process "encodes" the clusters of instruction sequences, that is, uses different codes to stand for different classes to help further analysis.
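As a concrete reference point, the sketch below computes Shannon entropy over performance-event labels and the size-weighted entropy of a clustering. The description does not say which entropy estimate is used, so Shannon entropy is an assumption here, as are the function names and the tiny label sets.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy, in bits, of a list of labels."""
    total = len(labels)
    return sum((count / total) * math.log2(total / count)
               for count in Counter(labels).values())

def clustered_entropy(clusters):
    """Size-weighted average entropy of the labels inside each cluster."""
    total = sum(len(labels) for labels in clusters.values())
    return sum(len(labels) / total * entropy(labels)
               for labels in clusters.values())

# Four sequences labeled by whether a defect event was observed (illustrative).
unclustered = ["Y", "Y", "N", "N"]
clusters = {"a": ["Y", "Y"], "b": ["N", "N"]}

print(entropy(unclustered), clustered_entropy(clusters))
```

A clustering that separates the labels cleanly drives the weighted entropy toward zero, which is the sense in which a good clustering leaves the system with the least information entropy.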
In a simple example, the instruction sequences are clustered based on whether the first instruction is L or MVC and whether the end instruction is ST. In particular, if the first instruction of an instruction sequence is L or MVC, the first instruction feature of the instruction sequence is labeled as 1, or otherwise it is labeled as 0; if the end instruction is ST, the end instruction of the instruction sequence is labeled as 1, or otherwise it is labeled as 0. Meanwhile, if the proportion of the Stall event occurred at the end instruction of the instruction sequence is larger than 20%, the Stall event is labeled as Y, or otherwise it is labeled as N. Thus, the table as shown in Fig.6 can be obtained. It can be seen that by clustering and encoding the instruction sequences, it is easier to dig out the regularity of the instruction sequences.
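The encoding rule of this simple example translates directly into code. The function names `encode` and `stall_label` are hypothetical, and the sample inputs are illustrative rather than rows of Fig.6.

```python
def encode(sequence):
    """Encode a sequence as 'first-last' using the two features of the example."""
    first = 1 if sequence[0] in ("L", "MVC") else 0   # first instruction is L or MVC
    last = 1 if sequence[-1] == "ST" else 0           # end instruction is ST
    return f"{first}-{last}"

def stall_label(stall_count, occurrences, threshold=0.20):
    """Label the Stall event 'Y' when its proportion is larger than the threshold."""
    return "Y" if stall_count / occurrences > threshold else "N"

print(encode(("L", "MVC", "XC", "ST")), stall_label(3, 10))
```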
Subsequently, in step 3224, the process selects the discriminative clusters in combination with the counting of the performance events. In particular, it lists the degree of association between the clusters and the performance events, which are labeled with codes, and selects the clusters with relatively high degree of association. Still taking the table of Fig.6 for example, it can be seen that the cluster labeled with 1-1 occurs 8 times, of which the Stall event is labeled with Y for 6 times. In this case, the cluster may be selected for further analysis without the need of considering other clusters.
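Selecting the discriminative clusters amounts to computing, for each cluster code, the share of rows labeled Y and keeping the codes above a cut-off. In the sketch below only the 1-1 figures (8 rows, 6 labeled Y) come from the text; the other rows, the function name, and the cut-off of 0.7 are assumptions, since Fig.6 is not reproduced here.

```python
from collections import Counter

def discriminative_clusters(rows, min_ratio):
    """Return {cluster code: share of 'Y' rows} for clusters reaching min_ratio.

    rows: list of (cluster_code, stall_label) pairs, one per instruction sequence.
    """
    totals = Counter(code for code, _ in rows)
    hits = Counter(code for code, label in rows if label == "Y")
    ratios = {code: hits[code] / totals[code] for code in totals}
    return {code: ratio for code, ratio in ratios.items() if ratio >= min_ratio}

rows = ([("1-1", "Y")] * 6 + [("1-1", "N")] * 2      # from the text: 6 of 8
        + [("0-1", "N")] * 3                         # illustrative
        + [("1-0", "Y")] + [("1-0", "N")] * 4)       # illustrative

selected = discriminative_clusters(rows, min_ratio=0.7)
print(selected)
```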
It can be understood that, as we cluster and encode the instruction sequences, we lose a part of the instruction sequence information. For example, as all the instruction sequences with the first instruction being L or MVC and the end instruction being ST are labeled as 1-1, we lose other features of the cluster. To this end, in step 3225, we further classify the selected sequence clusters, and restore a part of the lost information. For example, the cluster 1-1 as selected in step 3224 is further classified, such that the instruction sequences with the first instruction being L are regarded as one group, and the instruction sequences with the first instruction being MVC are regarded as another group. For each group of instruction sequences, in step 3226, the process judges whether the group of instruction sequences satisfies the predetermined condition, and if it does not, repeats steps 3222-3224 to perform clustering and selection once again until the predetermined condition is satisfied. The predetermined condition may be set depending on various factors such as computing resources or the desired accuracy, and it may include the requirements that, for example, the number of instruction sequences in each group is less than a particular threshold, the degree of association between the instruction sequence and the performance events is no less than a particular threshold, and the like. Thus, by clustering and then further classifying the sequences, we extract the representative and discriminative information step by step from a huge number of instruction sequences, and obtain the association relationship between the instruction sequences and the performance defect events.
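The further classification of step 3225 and the check of step 3226 can be sketched as follows. The sketch covers only the group-size condition; the function names, the sample sequences, and the size limit of 2 are hypothetical.

```python
from collections import defaultdict

def split_by_first_opcode(sequences):
    """Step 3225: divide a selected cluster into groups by first instruction."""
    groups = defaultdict(list)
    for seq in sequences:
        groups[seq[0]].append(seq)
    return dict(groups)

def needs_refinement(group, max_size):
    """Step 3226: a group fails the size condition when it is not below max_size."""
    return len(group) >= max_size

# Illustrative members of the cluster encoded as 1-1.
cluster_1_1 = [("L", "XC", "ST"), ("MVC", "ST"), ("L", "ST")]

groups = split_by_first_opcode(cluster_1_1)
pending = [opcode for opcode, group in groups.items()
           if needs_refinement(group, max_size=2)]
print(groups, pending)
```

Groups listed in `pending` would go through clustering and selection again, while the remaining groups are small enough to be reported as they stand.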
Several embodiments of constructing instruction sequences and determining the association relationship between the instruction sequences and the performance defect events have been described above in conjunction with detailed examples; however, those skilled in the art can understand that the above embodiments are merely exemplary but not limiting. Based on the above examples, those skilled in the art can make further modification and variation, and employ similar or other modes to extract the association relationship between the instruction sequences and the performance events. Such modification and variation should be encompassed in the scope of the invention.
Fig.7 is a schematic diagram of association relationship obtained according to an embodiment of the invention. In the example of Fig.7, the association relationship is expressed in the format of a table. In this table, each entry shows the association between an instruction sequence and a performance defect event. For example, according to the first row of the table, the instruction sequence {BNORC, (LTGR or LH), BERC} and the performance defect event Stall are associated, which means that the successive execution of the instruction sequence would cause the status of Stall. In a further embodiment, the degree of association between the instruction sequences and the performance defect events is given, i.e., the possibility of causing the performance defects when executing the instruction sequence. It can be understood that the association relationship may be stored and shown in other formats, and is not limited to the example of Fig.7.
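One possible in-memory and on-disk form of such a table is sketched below. The `Association` class, the use of a set of alternative opcodes per position (to express entries of the form "X or Y"), the JSON layout, and the two example rows are all assumptions for illustration; they are not the rows of Fig.7.

```python
import json
from dataclasses import dataclass

@dataclass(frozen=True)
class Association:
    pattern: tuple   # one frozenset of alternative opcodes per sequence position
    event: str       # associated performance defect event
    degree: float    # degree of association

def dumps(table):
    """Serialize the table to JSON text; sets become sorted lists."""
    return json.dumps([
        {"pattern": [sorted(alts) for alts in entry.pattern],
         "event": entry.event,
         "degree": entry.degree}
        for entry in table
    ])

def loads(text):
    """Rebuild the table from JSON text produced by dumps."""
    return [
        Association(tuple(frozenset(alts) for alts in row["pattern"]),
                    row["event"], row["degree"])
        for row in json.loads(text)
    ]

table = [
    Association((frozenset({"L"}), frozenset({"MVC"}),
                 frozenset({"XC"}), frozenset({"ST"})), "TLBMiss", 0.85),
    Association((frozenset({"L", "MVC"}), frozenset({"ST"})), "Stall", 0.75),
]
restored = loads(dumps(table))
print(restored == table)
```

Note that the serialized form contains only opcode patterns, event names, and proportions, with no addresses or operands from the profiled code.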
By comparing the performance profiling data as shown in Fig.1 and the association relationship as shown in Fig.7, it can be seen that, the performance profiling data directly show the information of the instructions (for example, the instruction address, the executing order, the operation code, the operand, the number of samples, etc.); therefore, it is possible to obtain the execution path of instructions by checking the performance profiling data, and furthermore, it is possible to obtain the code information executed on the physical platform, such as the first code information as described above, by reading the execution path of instructions (for example, by using disassembling tools). However, the association relationship shows the relevance between the instruction sequences and the performance defect events, which is obtained based on the statistics of the instruction sequences and all kinds of performance events, and does not relate to the particular execution information of the instructions. Therefore, it is impossible to obtain the code information executed on the physical platform by checking the association relationship between the instruction sequences and the performance defect events. That is to say, the above association relationship merely relates to the features of the physical platform, and has nothing to do with the code executed thereon, therefore having no possibility of leaking the code information. On the other hand, in many cases, the occurrence of the performance defect events is not caused by the execution of a single instruction, but is a result of successively executing a plurality of instructions.
Therefore, mining the association relationship between an instruction sequence, consisting of a plurality of successively executed instructions, and the performance defect events can more intrinsically reflect the regularity of causing the performance defects, and better reflect the features of the physical platform.
In view of the characteristics of the association relationship as described above, the association relationship is very suitable to serve as a reference guideline when optimizing codes for a physical platform. Thus, subsequently, in step 34, the association relationship is provided to another physical platform, for performing optimization of a second code on the another physical platform based on the association relationship, wherein the second code may be different from the first code in step 30. In particular, in one embodiment, the above determined association relationship may be stored locally, and a particular "another physical platform" is assigned with access permissions to the association relationship file.
Alternatively, in another embodiment, in step 34, the determined association relationship may be directly sent to said another physical platform. In yet another embodiment, the determined association relationship may be sent to an association sharing platform, such that the another physical platform can obtain the association relationship from the sharing platform.
Based on the fact that the association relationship is provided on the first physical platform according to the method of Fig.3, another physical platform, such as the second platform, may obtain the association relationship, and use the association relationship to optimize the code, thereby realizing the cross-platform performance optimization, as shown in steps of Fig.4.
In particular, in step 40, the association relationship is obtained on the second physical platform. In one embodiment, the provided association relationship is stored on the first physical platform. In this case, in step 40, the second physical platform directly sends a request to the first physical platform to read the produced association relationship. After obtaining the authorization response from the first physical platform, the second physical platform may read from the first physical platform the association relationship produced for the first physical platform. In another embodiment, the association relationship coming from the first physical platform is stored on an association sharing platform, and is attached with a label of the first physical platform. In this case, in step 40, the second physical platform reads from the sharing platform the association relationship for the first physical platform.
It can be understood that the transmission of the association relationship across the physical platforms may be realized by using various communication modes that are known or will likely be employed in future, including the transmission conducted via various protocols such as FTP, HTTP, POP3, SMTP, etc. by using various media such as wireless mode, wired mode, optical cable, RF, etc. Based on the fact that the second physical platform obtains the association relationship generated for the first physical platform, it may perform code optimization for the first physical platform on the second physical platform. In particular, the optimization process comprises, in step 42, according to the association relationship, determining the performance defect events corresponding to the second code; and in step 44, based on the determined performance defect events, optimizing the second code, and thereby eliminating or reducing the performance defect events that may be caused by the second code. The above second code is a code to be executed on the first physical platform and to be optimized.
Fig.8 shows sub steps of step 42 according to an embodiment of the invention, i.e., the detailed steps of determining the performance defect events. In particular, as shown in Fig.8, in order to determine the performance defect events corresponding to the second code, firstly in step 421, the process scans the second code and generates instruction sequences corresponding to the second code. Subsequently, in step 422, it matches the generated instruction sequences with the obtained association relationship, that is, checking if there is a corresponding instruction sequence in the obtained association relationship. If there is a matched instruction sequence, in step 423, according to the association relationship between the instruction sequence and the performance defect event, it determines the performance defect event corresponding to the matched instruction sequence. It can be understood that the performance defect event determined in step 423 is the performance defect event that may occur when supposing the current second code is executed on the first physical platform.
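Steps 421 to 423 can be sketched as a scan of the second code's opcode stream against every pattern in the association table. The function name `find_defects`, the table layout (a tuple of opcode sets per pattern), and the example opcode stream are hypothetical; a real implementation would work on whatever instruction representation the development platform's toolchain exposes.

```python
def find_defects(opcodes, table):
    """Return (start index, event) for every table pattern found in opcodes.

    opcodes: opcode list generated from the second code (step 421)
    table:   list of (pattern, event); pattern is a tuple of sets of
             alternative opcodes, one set per position (assumed layout)
    """
    findings = []
    for pattern, event in table:
        width = len(pattern)
        for start in range(len(opcodes) - width + 1):       # step 422: match
            window = opcodes[start:start + width]
            if all(op in alts for op, alts in zip(window, pattern)):
                findings.append((start, event))              # step 423: predict
    return findings

# Illustrative association table and second code.
table = [(({"L"}, {"MVC"}, {"XC"}, {"ST"}), "TLBMiss")]
second_code = ["LA", "L", "MVC", "XC", "ST", "BR"]

findings = find_defects(second_code, table)
print(findings)
```

Each finding marks a place where the second code is predicted to trigger the event on the first physical platform, without the code ever having run there.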
Based on the performance defects determined or predicted above, in step 44 of Fig.4, the second code is optimized to eliminate and/or reduce the performance defect events, and thereby optimize the execution performance of the second code on the first physical platform.
The code optimizing mode used in step 44 may be the commonly used optimizing mode in the prior art. For example, in one embodiment, it may appropriately adjust the execution order of the second code, and therefore reduce the instruction sequences that may possibly cause the performance defect events. For example, the second code includes an instruction sequence A for successively operating a plurality of operands, but the instruction sequence would frequently cause the Stall event. In this case, if an independent instruction B, whose timing is not so important, is present after the instruction sequence A, the instruction B may be brought forward and inserted in the instruction sequence A to interrupt the sequence A that may possibly cause defects. In another embodiment, the code optimization is performed by adding additional codes. For the cache miss events such as ICacheMiss, DCacheMiss, TLBMiss, etc., when finding the instruction sequences that cause these events, some additional instruction codes may be added before these instructions. These additional instruction codes are used to inform the processor to prefetch the operands or instruction data that may be used subsequently into the cache, thereby avoiding the cache miss events when executing the subsequent instruction sequence. In addition, those skilled in the art may modify and optimize the second code in other ways if necessary, such that the second code may be executed effectively on the first physical platform.
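The reordering mode described above, bringing an independent instruction B forward into sequence A, can be illustrated on a plain opcode list. The function name and the example opcodes are hypothetical, and the sketch assumes that the caller has already established that the moved instruction has no data dependence on the instructions it is moved past; no dependence analysis is performed here.

```python
def break_sequence(opcodes, start, length, filler_index):
    """Move the instruction at filler_index into the middle of a flagged sequence.

    opcodes[start:start + length] is the sequence predicted to cause a defect;
    filler_index must point at a later, independent instruction.
    """
    if filler_index < start + length:
        raise ValueError("filler must come after the flagged sequence")
    result = list(opcodes)
    filler = result.pop(filler_index)
    result.insert(start + length // 2, filler)   # interrupt the sequence midway
    return result

# Sequence A is L-MVC-XC-ST; the trailing LA plays the role of instruction B.
code = ["L", "MVC", "XC", "ST", "LA"]
optimized = break_sequence(code, start=0, length=4, filler_index=4)
print(optimized)
```

After the move, the original four-instruction window no longer appears in the stream, so it would no longer match the corresponding entry of the association relationship.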
Thus, according to the method of the embodiment of Fig.4, the second code is optimized based on the association relationship, therefore improving the execution performance of the second code on the first physical platform.
In the embodiments described above, first, the association relationship between the instruction sequences and the performance defect events is obtained on the first physical platform, and then, the second code is optimized on the second platform based on the association relationship, thereby realizing the cross-platform performance optimization. As described above, the association relationship between the instruction sequences and the performance defect events does not relate to detailed instructions or codes, and therefore would not leak the code information executed on the first platform, or cause any security risks after being transmitted to the second physical platform. Meanwhile, the association relationship more intrinsically reflects the hardware features of the first physical platform, and therefore, the optimization of the codes to be executed based on the association relationship can achieve a better optimizing result.
Therefore, if a developer wants to optimize his developed software application for the target platform, he does not have to execute the software application on the target platform as in the prior art; instead, he can directly optimize the code of the software application on his own development platform, thereby making the performance optimizing process more convenient and more efficient.
Based on the same inventive concept, the present invention further provides a system for performing performance optimization, comprising an apparatus for providing association relationship situated on a first physical platform and an apparatus for performing code optimization situated on a second physical platform. Fig.9A shows a schematic block diagram of an apparatus for providing association relationship according to an embodiment of the invention. As shown in Fig.9A, the apparatus for providing association relationship according to the embodiment comprises: a profiling data obtaining unit 911, configured to obtain the performance profiling data associated with the execution of a first code, the performance profiling data comprising the information of instructions corresponding to the first code, and the information of performance defect events corresponding to the instructions; an association determining unit 912, configured to, according to the performance profiling data, construct at least one instruction sequence, and determine the association relationship between the at least one instruction sequence and the performance defect events; and an association providing unit 913, configured to provide the association relationship to another physical platform, for performing the optimization of a second code on the another physical platform based on the association relationship. Fig.9B shows a schematic block diagram of an apparatus for performing code optimization according to an embodiment of the invention.
As shown in Fig.9B, the apparatus for performing code optimization according to the embodiment comprises: an association obtaining unit 921, configured to obtain the association relationship as provided by the apparatus of Fig.9A; a defects determining unit 922, configured to, according to the association relationship, determine the performance defect events corresponding to the second code; and a code optimizing unit 923, configured to, based on the determined performance defect events, optimize the second code.
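As a rough illustration of what the association providing unit 913 might hand to the association obtaining unit 921, the sketch below models one association record and round-trips a list of such records through JSON. The field names, the mnemonic-only encoding of an instruction sequence, the event name `cache_miss` and the choice of JSON are all assumptions made for illustration; the specification does not fix a storage or transfer format.

```python
import json
from dataclasses import dataclass, asdict
from typing import List


@dataclass
class AssociationRecord:
    # Hypothetical fields; the specification does not prescribe a format.
    sequence: List[str]  # instruction mnemonics only, no operands or addresses
    event: str           # name of a performance defect event
    relevancy: float     # degree of association, assumed to lie in [0, 1]


def export_associations(records: List[AssociationRecord]) -> str:
    """Role of unit 913: serialize the records for another platform."""
    return json.dumps([asdict(r) for r in records])


def import_associations(payload: str) -> List[AssociationRecord]:
    """Role of unit 921: rebuild the records on the receiving platform."""
    return [AssociationRecord(**item) for item in json.loads(payload)]


# Illustrative record, not taken from any real profiling run.
records = [AssociationRecord(["load", "add", "store"], "cache_miss", 0.75)]
payload = export_associations(records)
restored = import_associations(payload)
```

Because a record in this sketch carries only mnemonics and an event name, the payload contains no operands, addresses or source text of the first code, which is the property the description relies on when the relationship is transmitted between platforms.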
In particular, the profiling data obtaining unit 911 obtains, from the performance monitor of the first physical platform, the performance profiling data recorded when the first code is executed. The obtained performance profiling data may be as shown in the example of Fig. 1, or it may include additional and/or different instruction information and performance events.
Based on the obtained performance profiling data, the association determining unit 912 analyzes the data, constructs instruction sequences from it, and identifies the regularity with which the instruction sequences cause performance defects; that is, it determines the association relationship between the instruction sequences and the performance defect events.
In one embodiment, the association determining unit 912 obtains the association relationship by counting the occurrence times of the instruction sequences and the occurrence times of the performance events. In another embodiment, the association determining unit 912 further screens the instruction sequences, and thereby saves some of the computation needed for counting. In yet another embodiment, the association determining unit 912 clusters and analyzes the instruction sequences based on the information entropy, and thereby obtains the relevancy between the instruction sequences and the performance defect events. The obtained association relationship may be as shown in Fig.7, or may be stored and shown in other formats. Furthermore, the association providing unit 913 may provide the association relationship to another physical platform, such as the second physical platform, in any of various ways, so that code optimization can be performed on that other physical platform based on the association relationship.
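The counting and screening embodiments can be sketched as follows. This is a minimal sketch under stated assumptions, not the patented procedure: the sample data is invented, an instruction sequence is assumed to be a sliding window of fixed length over the sampled instructions, an event recorded on the last instruction of a window is attributed to that window, and the association is taken as the event count divided by the sequence count. The names `count_associations`, `length` and `min_count` are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

# Each sample is (instruction mnemonic, defect event or None). Hypothetical data.
Sample = Tuple[str, Optional[str]]
Sequence = Tuple[str, ...]


def count_associations(
    samples: List[Sample], length: int = 2, min_count: int = 2
) -> Dict[Tuple[Sequence, str], float]:
    """Map (sequence, event) to the share of the sequence's occurrences
    that coincided with the event."""
    # Assumption: a sequence is a sliding window of `length` instructions.
    windows = [samples[i : i + length] for i in range(len(samples) - length + 1)]

    # First pass: count how often each instruction sequence occurs.
    seq_counts = Counter(tuple(op for op, _ in w) for w in windows)

    # Screening: only frequently occurring sequences are examined further.
    frequent = {seq for seq, n in seq_counts.items() if n >= min_count}

    # Second pass: count defect events, but only for the screened sequences.
    # Assumption: the event on the window's last instruction belongs to it.
    event_counts: Counter = Counter()
    for w in windows:
        seq = tuple(op for op, _ in w)
        event = w[-1][1]
        if seq in frequent and event is not None:
            event_counts[(seq, event)] += 1

    return {
        (seq, event): hits / seq_counts[seq]
        for (seq, event), hits in event_counts.items()
    }


samples = [
    ("load", None), ("add", "cache_miss"), ("load", None), ("add", None),
    ("load", None), ("add", "cache_miss"), ("store", None),
]
associations = count_associations(samples)
```

In this toy run the sequence `("load", "add")` occurs three times and coincides with `cache_miss` twice, while `("add", "store")` occurs only once and is screened out before any event counting is done for it.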
On this basis, the association obtaining unit 921 of the second physical platform reads the association relationship via any of various communication modes. The defects determining unit 922 then compares and matches the instructions corresponding to the second code against the association relationship, and thus determines the performance defect events corresponding to the second code. The code optimizing unit 923 may then optimize the second code based on the determined performance defect events, eliminating or reducing the performance defect events that the second code may cause, and thereby optimizing the execution performance of the second code on the first physical platform.
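The matching step performed by the defects determining unit 922 might look like the sketch below, which scans the instruction stream of the second code for sequences present in the association relationship. The dictionary layout, the relevancy threshold, the event names and the function name `find_predicted_defects` are assumptions for illustration; the description does not specify how the comparison is implemented.

```python
from typing import Dict, List, Tuple

Sequence = Tuple[str, ...]
# Hypothetical association relationship: sequence -> (event, relevancy).
Associations = Dict[Sequence, Tuple[str, float]]


def find_predicted_defects(
    instructions: List[str], associations: Associations, threshold: float = 0.5
) -> List[Tuple[int, str, float]]:
    """Return (start index, event, relevancy) for each match in the second code."""
    findings = []
    for seq, (event, relevancy) in associations.items():
        # Assumption: weakly associated sequences are not worth reporting.
        if relevancy < threshold:
            continue
        n = len(seq)
        for start in range(len(instructions) - n + 1):
            if tuple(instructions[start : start + n]) == seq:
                findings.append((start, event, relevancy))
    return sorted(findings)


# Illustrative inputs, not taken from any real platform.
associations = {
    ("load", "add"): ("cache_miss", 0.67),
    ("mul", "div"): ("pipeline_stall", 0.3),
}
second_code = ["mul", "div", "load", "add", "store", "load", "add"]
findings = find_predicted_defects(second_code, associations)
```

Each finding gives the position in the second code at which a defect event is predicted, which is the information a code optimizing unit such as 923 would need in order to decide where to rewrite or reschedule instructions.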
The detailed implementing modes of the above units are not described again here; reference can be made to the above detailed description in conjunction with the method procedure and particular examples.
The above-described method and system for performing performance optimization may be realized by using a computing system. Figure 10 shows a block diagram of an illustrative computing system 100 adapted to implement embodiments of the invention. As shown, the computing system 100 may comprise: a CPU (Central Processing Unit) 101, a RAM (Random Access Memory) 102, a ROM (Read-Only Memory) 103, a system bus 104, a hard disk controller 105, a keyboard controller 106, a serial interface controller 107, a parallel interface controller 108, a display controller 109, a hard disk 110, a keyboard 111, a serial external device 112, a parallel external device 113 and a display 114. Among these devices, the system bus 104 is coupled to the CPU 101, the RAM 102, the ROM 103, the hard disk controller 105, the keyboard controller 106, the serial interface controller 107, the parallel interface controller 108 and the display controller 109. The hard disk 110 is coupled to the hard disk controller 105, the keyboard 111 is coupled to the keyboard controller 106, the serial external device 112 is coupled to the serial interface controller 107, the parallel external device 113 is coupled to the parallel interface controller 108, and the display 114 is coupled to the display controller 109. It is appreciated that the structural block diagram shown in Figure 10 is merely for the purpose of illustration, rather than a limitation on the scope of the invention. In some circumstances, certain devices may be added or removed based on actual conditions.
The flowcharts and block diagrams in the accompanying drawings illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowcharts or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved.
It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
Although the respective apparatus and method of the present invention have been described in detail in conjunction with specific embodiments, the present invention is not limited thereto.
Under the teaching of this specification, various changes, replacements and modifications may be made to the invention by those skilled in the art without departing from the scope of the invention. It is appreciated that all such changes, replacements and modifications still fall within the protection scope of the invention. The scope of the invention is defined by the appended claims.

Claims (14)

1. A computer implemented method of determining an association relationship from performance profiling data of a target data processing platform, comprising: determining the performance profiling data related to the execution of a first computer program code on the target data processing platform, the performance profiling data comprising information related to instructions corresponding to the first computer program code, and information of performance defect events corresponding to the instructions; constructing at least one instruction sequence, and determining the association relationship between the at least one instruction sequence and the information of the performance defect events from the performance profiling data; and providing the association relationship to another physical data processing platform, for performing the optimization of a second computer program code on the another physical data processing platform based on the association relationship.

2. The method according to claim 1, wherein the performance profiling data is formed based on the sampling logs generated when the first code is executed.

3. The method according to claim 1, wherein said determining of the association relationship between the at least one instruction sequence and the performance defect events comprises: counting the occurrence times of the at least one instruction sequence; counting the occurrence times of the performance defect events corresponding to the at least one instruction sequence; and based on the proportion of the occurrence times of the instruction sequence to the occurrence times of the performance defect events as counted above, determining the association relationship between the at least one instruction sequence and the performance defect events.

4. The method according to claim 3, further comprising selecting the frequently occurring instruction sequences according to the occurrence times of the at least one instruction sequence; and wherein said counting the occurrence times of the performance defect events corresponding to the at least one instruction sequence is only counting the occurrence times of the performance defect events corresponding to the frequently occurring instruction sequences.

5. The method according to claim 1, wherein said determining the association relationship between the at least one instruction sequence and the performance defect events comprises: selecting the frequently occurring instruction sequences; clustering the instruction sequences based on the information entropy; selecting the discriminative clusters in combination with the counting of the performance defect events; and further classifying the selected sequence clusters until a predetermined condition is satisfied.

6. The method according to claim 5, wherein the predetermined condition comprises at least one of the following: the number of instruction sequences in the classified group is less than a first particular threshold; and the degree of association between the instruction sequences and the performance defect events is no less than a second particular threshold.

7. An apparatus for determining an association relationship from performance profiling data of a target data processing platform, comprising: a profiling data obtaining unit, configured to determine the performance profiling data related to the execution of a first computer program code on the target data processing platform, the performance profiling data comprising information of instructions corresponding to the first computer program code, and information of performance defect events corresponding to the instructions; an association determining unit, configured to construct at least one instruction sequence, and determine the association relationship between the at least one instruction sequence and the information of the performance defect events from the performance profiling data; and an association providing unit, configured to provide the association relationship to another physical data processing platform, for performing the optimization of a second computer program code on the another physical data processing platform based on the association relationship.

8. The apparatus according to claim 7, wherein the performance profiling data is formed based on the sampling logs generated when the first code is executed.

9. The apparatus according to claim 7, wherein the association determining unit is configured to: count the occurrence times of the at least one instruction sequence; count the occurrence times of the performance defect events corresponding to the at least one instruction sequence; and based on the proportion of the occurrence times of the instruction sequence to the occurrence times of the performance defect events as counted above, determine the association between the at least one instruction sequence and the performance defect events.

10. The apparatus according to claim 9, wherein the association determining unit is further configured to select the frequently occurring instruction sequences according to the occurrence times of the at least one instruction sequence; and said counting the occurrence times of the performance defect events corresponding to the at least one instruction sequence is to only count the occurrence times of the performance defect events corresponding to the frequently occurring instruction sequences.

11. The apparatus according to claim 7, wherein the association determining unit is configured to: select the frequently occurring instruction sequences; cluster the instruction sequences based on the information entropy; select the discriminative clusters in combination with the counting of the performance defect events; and further classify the selected sequence clusters until a predetermined condition is satisfied.

12. The apparatus according to claim 11, wherein the predetermined condition comprises at least one of the following: the number of instruction sequences in the classified group is less than a first particular threshold; and the degree of association between the instruction sequences and the performance defect events is no less than a second particular threshold.

13. A computer program comprising computer program code to, when loaded into a computer system and executed, perform all the steps of the method according to any one of claims 1 to 10.

14. A method, apparatus and computer program as substantially described herein with reference to the description and as illustrated by the accompanying drawings of Figures 3 to 10.
GB201215035A 2011-08-30 2012-08-23 Performing code optimization Withdrawn GB2494268A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201110252353.7A CN102955712B (en) 2011-08-30 2011-08-30 Method and apparatus for providing an association relationship and performing code optimization

Publications (2)

Publication Number Publication Date
GB201215035D0 GB201215035D0 (en) 2012-10-10
GB2494268A true GB2494268A (en) 2013-03-06

Family

ID=47045287

Family Applications (1)

Application Number Title Priority Date Filing Date
GB201215035A Withdrawn GB2494268A (en) 2011-08-30 2012-08-23 Performing code optimization

Country Status (3)

Country Link
CN (1) CN102955712B (en)
DE (1) DE102012214672A1 (en)
GB (1) GB2494268A (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107808098B (en) * 2017-09-07 2020-08-21 阿里巴巴集团控股有限公司 Model safety detection method and device and electronic equipment

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2002024052A (en) * 2000-07-03 2002-01-25 Matsushita Electric Ind Co Ltd Error reproduction test method of computer peripheral equipment
WO2007005123A2 (en) * 2005-06-29 2007-01-11 Microsoft Corporation Automated test case result analyzer
US20080127107A1 (en) * 2006-09-07 2008-05-29 Sun Microsystems, Inc. Method and apparatus for specification and application of a user-specified filter in a data space profiler
US20080271021A1 (en) * 2007-04-26 2008-10-30 Microsoft Corporation Multi core optimizations on a binary using static and run time analysis
US20090055636A1 (en) * 2007-08-22 2009-02-26 Heisig Stephen J Method for generating and applying a model to predict hardware performance hazards in a machine instruction sequence
US20090113403A1 (en) * 2007-09-27 2009-04-30 Microsoft Corporation Replacing no operations with auxiliary code

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7039910B2 (en) * 2001-11-28 2006-05-02 Sun Microsystems, Inc. Technique for associating execution characteristics with instructions or operations of program code
JP2010026851A (en) * 2008-07-22 2010-02-04 Panasonic Corp Complier-based optimization method
CN101727335A (en) * 2008-10-31 2010-06-09 国际商业机器公司 Installation method for binary code program and system

Also Published As

Publication number Publication date
CN102955712B (en) 2016-02-03
GB201215035D0 (en) 2012-10-10
CN102955712A (en) 2013-03-06
DE102012214672A1 (en) 2013-02-28

Legal Events

Date Code Title Description
WAP Application withdrawn, taken to be withdrawn or refused ** after publication under section 16(1)