CN101593125B - Method for dynamically monitoring the execution flow of a program in a binary translator by using a monitoring thread

Info

Publication number: CN101593125B
Application number: CN2009100543257A
Authority: CN
Inventors: 管海兵; 梁阿磊; 李晓龙; 倪志晨; 邓海鹏
Original assignee: Shanghai Jiaotong University
Current assignee: Shanghai Jiaotong University
Priority date: 2009-07-02
Filing date: 2009-07-02
Publication date: 2012-03-14
Anticipated expiration: 2029-07-02
Also published as: CN101593125A

Abstract

The invention provides a method for dynamically monitoring the execution flow of a program in a binary translator by using a monitoring thread. First, a new program monitoring thread MT is created for an existing dynamic binary translation system. Second, stub code is inserted into each basic block generated by translation; when executed, each basic block writes its entry address into a queue, and queue overflow is prevented by inter-thread waiting. Third, the monitoring thread MT takes the entry addresses out in order, looks up the corresponding intermediate-instruction basic blocks, updates the corresponding data structures according to the termination types of the basic blocks, and thereby completes the program monitoring. Finally, using part of the collected information, the monitoring thread acts as decision maker and carries out hot path construction as an optimization. Compared with existing program monitoring methods, the method performs program analysis and monitoring in parallel with program execution, and has the advantages of small software overhead, low hardware cost, and complete and accurate monitoring information.

Description

Method for dynamically monitoring the execution flow of a program in a binary translator by using a monitoring thread

Technical field

The present invention relates to a method for dynamically monitoring the execution flow of a program in a binary translator by using a monitoring thread. The method is used to obtain the various dynamic profiling information produced while a binary program executes, and provides effective support for analyzing the execution flow of binary executable code and for optimizing the binary translation system. The invention belongs to the fields of parallel computing and binary translation.

Background art

Binary translation is an important component of the process-level virtual machine field and has been raised and discussed repeatedly in academic exchanges on computing in recent years. Binary translation usually relies on two techniques to improve performance: a "two-pass translation process" and "caching of translated code". Two-pass translation means that a source basic block, composed of source machine instructions, is first translated into an intermediate basic block composed of a self-describing intermediate language, and the intermediate basic block is then translated into a basic block that can be executed on the target machine. The advantage of this technique is that, without any high-level-language source code, the system can directly load a binary executable built for the source machine and, using its own ability to translate between different machine instruction set architectures (ISAs), obtain a binary program that can be executed on the target machine. More importantly, translation and execution are merged and both take place while the program runs, unlike the two-stage process of traditional static compilation, in which a program is compiled first and executed afterwards. However, this same property makes it very difficult for a dynamic binary translation system to obtain program profiling information (a profile): it is hard to guarantee that complete profiling information is collected during dynamic execution. Yet profiling information is precisely the data on which program execution flow monitoring is based.

Profiling information (a profile) is the information that has a significant effect on program performance during program execution. It can be used not only to judge the quality of a program but also to provide a quantitative basis for optimization. In a binary program, the unit of measurement and the atomic unit of division is the basic block, which ends with a jump instruction or a system call. Complete profiling information therefore includes the execution count of each basic block, the number of jumps between basic blocks, the misprediction rate of branch prediction, the miss rates of the instruction cache and data cache, and so on. This information can be used to assess the quality of the translated code and to guide the effective application of various optimization strategies. Existing methods for collecting profiling information fall into two techniques: instrumentation and sampling.

● Instrumentation collects data about program behavior and characteristics by inserting probe instructions into the code. The method is implemented in software, so its hardware cost is low, but it introduces additional overhead.

● Sampling collects data about the running program at fixed time intervals and does not require the program to be modified. However, the profiling information it obtains contains some error, and it must be implemented in hardware, so its cost is higher.

In traditional dynamic binary translation systems, instrumentation is usually chosen for collecting profiling information, in order to save hardware cost and reduce the system's dependence on specific hardware. For performance reasons, however, only low-overhead probe instructions are inserted, which can do no more than count basic block executions to support hot path construction. A hot path is a program path along which certain basic blocks of the binary program are executed frequently in a fixed order. When such basic blocks or paths are detected, combining the basic blocks on the path speeds up program execution, because fewer jump instructions are executed; this improves the instruction cache hit rate and also the efficiency of the processor pipeline. To obtain more profiling information and so give superblock construction a sounder basis, traditional methods can only call back into statistics functions frequently, and this strategy can slow the system down by a factor of tens or more (see: Valgrind: A Framework for Heavyweight Dynamic Binary Instrumentation, PLDI, 2007).

With the continuing development of multi-core processors, parallel computing is being applied in more and more scenarios because of its higher performance. This is mainly due to the notion of hardware-level threads, that is, different threads can run on different CPU cores. In a dynamic binary translation system, the work of collecting profiling information for a binary program has little dependence on the original "two-pass translation" and execution flow of the system, so hardware-level threads can be used to collect profiling information more efficiently. No similar implementation of this idea, however, has been reported in existing research or practice at home or abroad.

Summary of the invention

The object of the present invention is to address the deficiencies of the prior art by providing a method for dynamically monitoring the execution flow of a program in a binary translator by using a monitoring thread. The method has small software overhead and low hardware cost, obtains complete and accurate monitoring information, and further improves the runtime performance of the dynamic binary translation system.

To achieve this object, the present invention first treats the execution flow of the original dynamic binary translation system as the main thread and adds a new program monitoring thread MT to the original system. It then instruments each basic block generated by translation with stub code, so that every basic block writes its own entry address into a queue when it executes; queue overflow is avoided by inter-thread waiting. The monitoring thread MT takes the entry addresses out in order, looks up the corresponding intermediate-instruction basic block, and updates the relevant data structures in thread MT according to the end type of that basic block, thereby achieving program monitoring. Finally, the monitoring thread MT uses the collected profiling information and, acting as decision maker, carries out hot path construction as an optimization.

The method of the present invention for dynamically monitoring the execution flow of a program in a binary translator by using a monitoring thread is implemented in the following steps:

1. The present invention first treats the execution flow of the original dynamic binary translation system as the main thread and creates a new monitoring thread MT on the original system to monitor the execution behavior of the binary program in real time. The main thread and the monitoring thread MT run independently on different cores of a multi-core processor platform.

2. Following the principle of traditional instrumentation, a section of machine code is inserted at the head of each binary basic block generated by binary translation. When the basic block is executed, this machine code stores the entry address of the basic block in the designated queue Q. If, during this process, queue Q is filled with the entry addresses of basic blocks executed by the main thread, the main thread must suspend execution to guarantee monitoring accuracy and program correctness. It waits for the restart signal Signal that the monitoring thread MT sends after processing all the data in the queue, and continues execution only after receiving this signal.

3. The monitoring thread MT obtains basic block entry addresses from queue Q one by one, looks up the intermediate-instruction basic block corresponding to each entry address, determines the concrete program behavior of that intermediate-instruction basic block, collects the required profiling information, and stores the results in data structures built by the monitoring thread MT itself. If the position at which the entry address was stored is the last storable position of queue Q, the monitoring thread MT resets queue Q to empty and sends the restart signal Signal to notify the main thread to continue execution.

4. Based on the collected profiling information, if thread MT detects that the number of times a program path has been executed exceeds a preset threshold, the monitoring thread MT, acting as decision maker, carries out the hot path construction optimization.

5. After profiling information has been collected for the basic block corresponding to the current entry address, the monitoring thread MT goes on to obtain the next entry address from queue Q and repeats the operations of steps 3 and 4, thereby dynamically monitoring the execution flow of the program in the binary translator.

Compared with traditional methods, the present invention has small software overhead and low hardware cost, performs program analysis and monitoring in parallel with program execution, and obtains complete and accurate monitoring information. It can further guide the optimization of the dynamic binary translation system, improve the quality of its translated code, and reduce key performance factors of the executed code such as the branch misprediction rate and cache miss rate. The method can therefore further improve the runtime performance of the dynamic binary translation system.

Embodiment

To aid understanding of the technical solution of the present invention, a specific embodiment is described below. The following embodiment does not limit the present invention.

1. Creating the hardware-level monitoring thread MT

The embodiment of the present invention is developed on CrossBit, a dynamic binary translation system developed independently at Shanghai Jiao Tong University (see: Design and Implementation of the Binary Translation Base Platform CrossBit, Computer Engineering, 2007.12). The execution flow of CrossBit is as follows: (1) load the source executable image; (2) look up in the hash table whether a translated basic block object composed of target machine code exists; (3) on a hit, execute the corresponding target machine code basic block; on a miss, perform the two-pass translation "basic block of source machine code -> basic block of intermediate instructions -> basic block of target machine code", store the result in the target code cache, and update the hash table, which CrossBit uses to locate each basic block in memory; (4) execute the target code basic block; if a jump instruction whose target address cannot be determined, or a system call, is encountered, return to the CrossBit program through a context switch and either complete the system call or link the jump instruction to the newly generated target basic block. Finally, the program flow returns to step (2), until the program finishes or an exception occurs.

The present invention places the whole original CrossBit flow in one main thread and groups all modules responsible for monitoring program behavior into a separate monitoring thread MT. The thread is created with the pthread library on Linux, namely:

pthread_create(pthread_t *pid, pthread_attr_t *attr, void *(*func)(void *), void *arg);

Here, the parameter pid is the unique identifier of the thread in the operating system, attr contains the attributes the thread should have, func is the function the thread executes, and arg is the argument passed to it.
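As a minimal sketch of this step, the following C++ fragment starts a monitoring thread next to the main thread using the standard `pthread_create` and `pthread_join` calls. The names `MonitorArgs`, `monitor_main`, and `run_with_monitor` are illustrative assumptions and are not CrossBit identifiers; the body of the monitoring thread is a placeholder.

```cpp
#include <pthread.h>
#include <cassert>

// Hypothetical argument block handed to the monitoring thread.
struct MonitorArgs {
    int blocks_seen;  // filled in by the monitoring thread
};

// Thread entry point: stands in for the monitoring loop of MT.
static void* monitor_main(void* arg) {
    MonitorArgs* args = static_cast<MonitorArgs*>(arg);
    args->blocks_seen = 0;  // a real monitor would drain the queue here
    return arg;
}

// Creates MT, waits for it to finish, and returns 0 on success or the
// pthread error code on failure.
int run_with_monitor(MonitorArgs* args) {
    assert(args != nullptr);
    pthread_t tid;
    int rc = pthread_create(&tid, nullptr, monitor_main, args);
    if (rc != 0) return rc;
    // ... the main thread would run the translation loop here ...
    return pthread_join(tid, nullptr);
}
```

On most toolchains this needs to be built with thread support enabled, for example `g++ -std=c++17 -pthread`.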

To give the main thread and the monitoring thread MT the characteristics of hardware threads, the present invention uses the Linux macro function CPU_SET to assign the two threads to different hardware processor resources:

CPU_SET(0, &mask); // assign the main thread to core 0 of the multi-core processor

CPU_SET(1, &mask); // assign the monitoring thread MT to core 1 of the multi-core processor
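The macro `CPU_SET` only sets a bit in a `cpu_set_t` mask; on Linux the mask still has to be applied with a call such as `sched_setaffinity` or `pthread_setaffinity_np`. The sketch below shows one way this might look for the calling thread. It is an assumption about how the embodiment applies the mask, and the helper names `first_allowed_cpu` and `pin_calling_thread` are hypothetical.

```cpp
#include <sched.h>
#include <cassert>

// Returns the lowest CPU index the calling thread may run on, or -1 on error.
int first_allowed_cpu() {
    cpu_set_t mask;
    CPU_ZERO(&mask);
    if (sched_getaffinity(0, sizeof(mask), &mask) != 0) return -1;
    for (int cpu = 0; cpu < CPU_SETSIZE; ++cpu) {
        if (CPU_ISSET(cpu, &mask)) return cpu;
    }
    return -1;
}

// Pins the calling thread to a single core; returns true on success.
// On Linux, passing 0 as the first argument targets the calling thread.
bool pin_calling_thread(int cpu) {
    assert(cpu >= 0 && cpu < CPU_SETSIZE);
    cpu_set_t mask;
    CPU_ZERO(&mask);
    CPU_SET(cpu, &mask);
    return sched_setaffinity(0, sizeof(mask), &mask) == 0;
}
```

In the scheme described above, the main thread would call `pin_calling_thread(0)` and the monitoring thread would call `pin_calling_thread(1)` at start-up, provided both cores are available to the process.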

2. Instrumenting the necessary code

If binary programs were monitored by traditional methods rather than with a hardware thread, a large amount of instrumentation code would inevitably be needed to provide the required functionality, making program execution inefficient. The present invention follows the principle of instrumentation and combines it with the characteristics of hardware threads: the section of machine code shown below is inserted at the head of every basic block generated by the dynamic binary translation system, and all the relatively complex statistics operations are carried out in thread MT. The original serial profiling mode thus becomes a parallel profiling mode.

The machine code inserted is as follows:

%movw &QueueCount, %eax // load the value of the queue head pointer into the eax register;

%cmp 12M, %eax // compare the head pointer value with 12M; 12M is the maximum queue space, determined by repeated experiments;

%jle label1 // jump to label1 if less than or equal to 12M;

%movb 1, &overflow // otherwise set the queue overflow flag variable overflow to 1;

%ret // return to the CrossBit main program;

label1:

%add 4, %eax // at label1, first add 4 to eax, which holds the head pointer, because each EnterAddress occupies 4 bytes;

%movw %eax, &QueueCount // write the new head pointer back to the head pointer variable in memory;

%add &QueueEntry, %eax // compute the actual storage location, i.e. the head pointer plus the queue base address;

%movw EnterAddress, [%eax, 0] // memory write: write the basic block entry address into the queue;

This code is an assembly-level routine that performs the queue operation. The queue resides in the virtual memory space of the program's process, so it can be accessed by both the main thread and the monitoring thread MT. The base address of the queue is stored in the variable QueueEntry, and the queue head pointer variable QueueCount always points to the head position of the queue; pushing onto the queue means storing data at the memory location given by QueueEntry+QueueCount. The flag overflow notifies the CrossBit main program that the queue has overflowed, that is, that the queue has been filled with basic block entry addresses. The main thread then calls the condition variable operation pthread_cond_wait(&pthread_cond_t) of the pthread library to wait for the restart signal that the monitoring thread MT sends after emptying the queue. This call suspends the main thread until a pthread_cond_signal(&pthread_cond_t) call on the same condition variable is received, after which the main thread continues execution.
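The logic of the inserted stub can be sketched in C++ as follows. This is an illustration under stated assumptions, not the generated machine code: the struct `EntryQueue` and the function `push_entry` are hypothetical, the head counter counts entries rather than bytes, the entry is stored before the counter is advanced, and the sketch is single-threaded and omits the synchronization a shared queue would need.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Illustrative stand-ins for the QueueEntry, QueueCount and overflow variables.
struct EntryQueue {
    std::vector<std::uint32_t> slots;  // storage for 4-byte entry addresses
    std::size_t count = 0;             // number of entries written so far
    bool overflow = false;             // set when a push finds the queue full

    explicit EntryQueue(std::size_t capacity) : slots(capacity) {
        assert(capacity > 0);
    }
};

// What the stub does on entry to a basic block: record the block's entry
// address, or raise the overflow flag so the main program can wait for MT.
bool push_entry(EntryQueue& q, std::uint32_t entry_address) {
    if (q.count >= q.slots.size()) {
        q.overflow = true;   // queue is full: hand control back to the caller
        return false;
    }
    q.slots[q.count] = entry_address;  // write at base + head position
    ++q.count;                         // advance the head pointer
    return true;
}
```

A caller that receives `false` would then wait for the restart signal before retrying, which corresponds to the `pthread_cond_wait` step described above.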

3. Implementation of the monitoring thread MT

The monitoring thread MT obtains basic block entry addresses from queue Q one by one, looks up the intermediate-instruction basic block corresponding to each entry address, determines the concrete program behavior of that intermediate-instruction basic block, collects the required profiling information, and stores the results in data structures built by the monitoring thread MT itself.

When the monitoring thread MT obtains the entry address of a basic block, it first uses the hash table function of the dynamic binary translation system to look up the intermediate-instruction basic block produced for this entry address during "two-pass translation". The instruction information in this intermediate-instruction basic block reveals the end type of the basic block: direct jump, indirect jump, or system call. The hardware-level monitoring thread MT monitors the program flow of the currently executing binary program in real time. For this purpose it builds three data structures: the basic block execution frequency table (which counts executions of the target basic blocks of direct jumps), the indirect jump table (which counts the jumps taken along each edge of an indirect jump), and the system call record table. Because the main thread stores entry addresses in the queue in program execution order, the monitoring thread MT only needs to read them in first-in-first-out (FIFO) order to follow what the program executes and in what order. The monitoring thread MT also runs in parallel with the main program thread, which keeps the monitoring close to real time. To obtain the semantics of the basic block corresponding to each entry address, the present invention uses the mature intermediate language mechanism of CrossBit (see: An Intermediate Language Level Optimization Framework for Dynamic Binary Translation, Shi Huihui, Wang Yi, Guan Haibing, and Liang Alei, ACM SIGPLAN Notices, Vol. 42(5), May 2007).

For example, when the entry address obtained by the monitoring thread MT is 0x40000000 (each entry address is 4 bytes and indicates the position of a code block in the whole process space), thread MT first uses the hash function to find the intermediate-instruction basic block, composed of intermediate language, that corresponds to this entry address. By reading the last instruction of the intermediate-instruction basic block, it obtains one of the following three results:

JMP(v25, 0) // indirect jump;

BRANCH(tttn, v21, v22, v0, disp) // direct jump; tttn is the condition code, v0==0, and v21 and v22 hold the two operands to be compared; that is, when v21 and v22 satisfy the jump condition specified by tttn, the program jumps to the target address given by (0+disp);

SYSCALL // system call;

The JMP instruction uses the virtual register v25 as the base address for addressing with an offset of 0, so it is an indirect jump. When the next entry address read is 0x50000000, thread MT adds the record (0x40000000, 0x50000000, 1) to the indirect jump table, meaning that the indirect jump from basic block 0x40000000 to basic block 0x50000000 has been executed once. The operations for direct jumps and system calls are similar; the only difference is the specific data structure that is updated.
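The table update in this example can be sketched with standard C++ containers. The type and function names (`EndType`, `ProfileTables`, `record_transition`) are illustrative assumptions, as is the choice to key the system call table by the address of the block that issues the call; the source only states that three tables exist and what the first two count.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <utility>

// End type of a basic block, read from its last intermediate instruction.
enum class EndType { DirectJump, IndirectJump, Syscall };

// Hypothetical layout of the three tables kept by the monitoring thread.
struct ProfileTables {
    // target block address -> times reached by a direct jump
    std::map<std::uint32_t, std::uint64_t> block_exec_count;
    // (from block, to block) -> times this indirect-jump edge was taken
    std::map<std::pair<std::uint32_t, std::uint32_t>, std::uint64_t> indirect_edges;
    // block address -> number of system calls issued from it
    std::map<std::uint32_t, std::uint64_t> syscall_count;
};

// Records that block `from`, which ends with `end`, was followed by block `to`.
void record_transition(ProfileTables& t, std::uint32_t from, EndType end,
                       std::uint32_t to) {
    switch (end) {
        case EndType::DirectJump:
            ++t.block_exec_count[to];
            break;
        case EndType::IndirectJump:
            ++t.indirect_edges[{from, to}];
            break;
        case EndType::Syscall:
            ++t.syscall_count[from];
            break;
    }
}
```

Calling `record_transition(t, 0x40000000, EndType::IndirectJump, 0x50000000)` on an empty table produces the record (0x40000000, 0x50000000, 1) from the example above.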

In the above flow, each table of the present invention is implemented with data structures provided by the C++ standard library (STL).

When the monitoring thread MT finds that the storage position of the entry address it is processing is the last storable position of the queue, it must reset the queue. It does so by setting the QueueCount variable to 0, so that the main thread next stores new entry addresses starting again from the beginning of the queue. The monitoring thread MT must also call the pthread_cond_signal(&pthread_cond_t) function of the pthread library to send the restart signal.

4. Implementation of the decision-making function of the monitoring thread MT

The purpose of program monitoring by the monitoring thread MT is to obtain complete profiling information and to analyze the execution flow of the binary program. When the profiling information satisfies certain conditions, the present invention can trigger specific optimization methods, the most common of which is hot path construction.

Accordingly, when the monitoring thread MT of the present invention detects that the execution count of a basic block reached by direct jump exceeds the threshold 3000, or that the execution count of an indirect jump exceeds the threshold 5000 (these thresholds were determined by repeated experiments), it carries out the corresponding optimization. This includes superblock construction triggered by direct jump detection and superblock construction triggered by indirect jump detection, both of which speed up execution of the original system.
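The threshold test can be written as a small predicate. This is a sketch only: the constants come from the text above, while the names `JumpKind`, `kDirectThreshold`, `kIndirectThreshold`, and `should_build_superblock` are hypothetical, and the superblock construction itself is not shown.

```cpp
#include <cassert>
#include <cstdint>

// Thresholds quoted in the text (determined experimentally by the authors).
constexpr std::uint64_t kDirectThreshold = 3000;
constexpr std::uint64_t kIndirectThreshold = 5000;

enum class JumpKind { Direct, Indirect };

// Returns true once a counter has strictly exceeded the threshold for its kind.
bool should_build_superblock(JumpKind kind, std::uint64_t count) {
    const std::uint64_t threshold =
        (kind == JumpKind::Direct) ? kDirectThreshold : kIndirectThreshold;
    return count > threshold;
}
```

The monitoring thread would evaluate this predicate each time it increments a counter and start superblock construction for the affected path when it first returns true.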

5. Design of the working mode of the monitoring thread MT

The monitoring thread MT as a whole is designed around a while-loop polling structure rather than the common producer/consumer model implemented with a mutex. Although the polling architecture consumes a large amount of processor time, it gives better real-time parallelism; moreover, the monitoring thread MT is implemented as a hardware thread, so the processor time it wastes does not affect the execution efficiency of the main program. The present invention therefore adopts polling. Consequently, whenever queue Q is not empty (QueueCount > 0), thread MT keeps looping, reading the next entry address and processing it.
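The interplay of polling, queue reset, and restart signal can be sketched as below. This is one possible arrangement under stated assumptions, not the CrossBit implementation: it uses `std::thread`, `std::atomic`, and `std::condition_variable` in place of the raw pthread calls, the class `SharedQueue` and function `run_demo` are hypothetical, and the `yield` call is added only so the sketch behaves well when just one core is available.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <thread>
#include <vector>

class SharedQueue {
public:
    explicit SharedQueue(std::size_t capacity) : slots_(capacity) {
        assert(capacity > 0);
    }

    // Main-thread side: if the queue is full, wait for the restart signal.
    void push(std::uint32_t addr) {
        std::size_t n = count_.load(std::memory_order_acquire);
        if (n == slots_.size()) {
            std::unique_lock<std::mutex> lock(m_);
            restart_.wait(lock, [&] {
                return count_.load(std::memory_order_acquire) == 0;
            });
            n = 0;
        }
        slots_[n] = addr;
        count_.store(n + 1, std::memory_order_release);
    }

    // MT side: non-blocking poll; returns false when nothing new is queued.
    bool try_pop(std::uint32_t& out) {
        if (read_ >= count_.load(std::memory_order_acquire)) return false;
        out = slots_[read_++];
        if (read_ == slots_.size()) {  // last slot consumed: reset and signal
            read_ = 0;
            {
                std::lock_guard<std::mutex> lock(m_);
                count_.store(0, std::memory_order_release);
            }
            restart_.notify_one();
        }
        return true;
    }

private:
    std::vector<std::uint32_t> slots_;
    std::atomic<std::size_t> count_{0};  // written by producer, reset by MT
    std::size_t read_ = 0;               // touched only by MT
    std::mutex m_;
    std::condition_variable restart_;
};

// Pushes n_blocks entry addresses from the calling thread while a polling
// monitor thread drains them; returns the addresses in the order MT saw them.
std::vector<std::uint32_t> run_demo(std::size_t capacity, std::size_t n_blocks) {
    SharedQueue q(capacity);
    std::vector<std::uint32_t> seen;
    std::thread monitor([&] {
        std::uint32_t addr = 0;
        while (seen.size() < n_blocks) {
            if (q.try_pop(addr)) seen.push_back(addr);
            else std::this_thread::yield();  // keep polling
        }
    });
    for (std::size_t i = 0; i < n_blocks; ++i) {
        q.push(0x40000000u + static_cast<std::uint32_t>(4 * i));
    }
    monitor.join();
    return seen;
}
```

The mutex here protects only the full-queue handshake; the common path of pushing and polling uses a single atomic counter, which matches the intent of avoiding a mutex-based producer/consumer model on the fast path.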

In this way, dynamic monitoring of the execution flow of the program in the binary translator is achieved.

Claims

1. A method for dynamically monitoring the execution flow of a program in a binary translator by using a monitoring thread, characterized by comprising the following steps:

1) treating the execution flow of the original dynamic binary translation system as the main thread, and creating a new monitoring thread MT on the original system to monitor the execution behavior of the binary translated program in real time; the main thread and the monitoring thread MT run independently on different cores of a multi-core processor platform;

2) following the principle of traditional instrumentation, inserting a section of machine code at the head of each binary basic block generated by the dynamic binary translation system; when the basic block is executed, this machine code writes the entry address of the basic block into queue Q; if queue Q becomes full during this writing process, the main thread must suspend and wait for the restart signal Signal that the monitoring thread MT sends after processing all the data in the queue, and the main thread continues execution only after receiving this restart signal;

3) the monitoring thread MT obtaining basic block entry addresses from queue Q one by one, looking up the intermediate-instruction basic block corresponding to each entry address, determining the concrete program behavior of that intermediate-instruction basic block, collecting the required profiling information, and storing the results in data structures built by the monitoring thread MT itself; if the position at which the entry address was stored is the last storable position of queue Q, the monitoring thread MT empties queue Q and sends the restart signal Signal to notify the main thread to proceed;

4) based on the collected profiling information, if the monitoring thread MT detects that the number of times a program path has been executed exceeds a preset threshold, the monitoring thread MT, acting as decision maker, carrying out the hot path construction optimization;

5) after profiling information has been collected for the basic block corresponding to the current entry address, the monitoring thread MT going on to obtain the next entry address from queue Q and repeating the operations of steps 3) and 4), so as to dynamically monitor the execution flow of the program in the binary translator.