CN103377032A

CN103377032A - Fine-grained scientific computation parallel processing device based on a heterogeneous multi-core chip

Info

Publication number: CN103377032A
Application number: CN2012101057224A
Authority: CN
Inventors: 刘鹏; 杨劼; 顾雄礼; 史册
Original assignee: Zhejiang University ZJU
Current assignee: Zhejiang University ZJU
Priority date: 2012-04-11
Filing date: 2012-04-11
Publication date: 2013-10-30

Abstract

An embodiment of the invention discloses a fine-grained scientific computation parallel processing device based on a heterogeneous multi-core chip. An interface module runs on the main core, generates task type identifiers FLAG according to the data dependence relationships of objects, and passes them to a recording module. The recording module runs on the main core and records the task type identifiers FLAG and the destination processor numbers TaskDest of successor objects; the task type identifiers FLAG are determined according to a data flow model. An object distribution module runs on the main core, distributes tasks to the corresponding slave cores according to the FLAG values and TaskDest, and updates FLAG and TaskDest in the object tables of the agent managers on those slave cores. Agent manager modules, which act as agents of the parallel processing device, reside on the main core and on each slave core, manage the runtime system, and comprise an object table, an executor and a type selector. The device enables parallelization and performance optimization of fine-grained scientific computation on a heterogeneous multi-core system on a chip.

Description

A fine-grained scientific computing parallel processing device based on a heterogeneous multi-core chip

Technical field

The invention belongs to the field of computer application technology, and in particular relates to a fine-grained scientific computing parallel processing device based on a heterogeneous multi-core chip.

Background technology

Multi-core processor chips were first used in supercomputers, whose main workload is scientific computing. Because supercomputers need large computing power, they were built from multiple processors from the start, and the software they run is concurrent software. When multi-core processor chips were developed, they could deliver large computing power, and the existing software could be ported easily without major changes, so they were adopted in supercomputers very smoothly.

Research on applications of embedded chip multi-core systems can also draw on the experience of parallel programming for supercomputers. In terms of processor architecture, on-chip multiprocessors can be divided into homogeneous and heterogeneous types according to the structure of their cores. In a homogeneous multi-core processor, all processor cores on the chip have the same structure, and the function and status of every core are identical; each core can execute tasks independently and is similar in function and structure to a general-purpose single-core processor. A heterogeneous multi-core processor chip contains several processor cores with different functions, and different cores are responsible for different tasks. Heterogeneous multi-core processors are currently used mainly in special-purpose computing, for example as multimedia processors, embedded processors and supercomputer processors. They usually contain a general-purpose processor and processors dedicated to accelerating computation, such as digital signal processors, network processors and streaming media processors; the general-purpose processor is usually responsible for managing the multi-core system, while the accelerators mainly carry out specific computing tasks. Because a heterogeneous on-chip multiprocessor can combine different processor cores according to application requirements, the heterogeneous structure can achieve the best performance-to-power ratio. In fine-grained scientific programs the ratio of task computation to communication is small, so the share of processor time spent on computation should be increased as much as possible to reach higher efficiency. The heterogeneous structure uses the general-purpose processor to manage the multi-core system and frees the accelerators to concentrate on computation, so adopting a heterogeneous structure allows fine-grained scientific programs to run efficiently on a chip multi-core system.

Parallel computing on multi-core processors is a focus of parallel software development, and the central question is how a parallel computation distributes and schedules its processes and threads. The allocation strategy assigns processes to suitable processor cores. Because a heterogeneous structure is used, the processor cores differ in nature and their running states differ from moment to moment, so the quality of the allocation affects system performance. The most important and typical parallel computing models include the Parallel Random Access Machine (PRAM) model, the Bulk Synchronous Parallel (BSP) computing model, and the LogP (Latency, overhead, gap, Processor) model for multiprocessors with distributed memory and point-to-point communication. Each model has many extensions under different assumptions. The PRAM model targets single-instruction, multiple-data parallel machines; it requires the parallel machine to have shared memory that every processor can access at any time, so it is unsuitable for heterogeneous multi-core platforms with a distributed memory architecture. The BSP model places no requirement on the memory of the parallel machine, which may be shared or distributed, but it has no corresponding extension for heterogeneous multi-core platforms. The LogP model is based on distributed memory, but it requires message passing between processors to be point-to-point only and to obey a fixed order, so it is unsuitable for scientific computations that involve one-to-many message passing.

Therefore, there is currently no targeted solution for parallelizing fine-grained scientific computing applications on on-chip heterogeneous multi-core systems, which limits the performance such applications can obtain on these systems. In view of these defects in the prior art, it is necessary to study and provide a scheme that overcomes them and avoids the poor performance of fine-grained scientific computing applications on on-chip heterogeneous multi-core systems.

Summary of the invention

To solve the above problems, the object of the present invention is to provide a fine-grained scientific computing parallel processing device based on a heterogeneous multi-core chip, for use in an on-chip heterogeneous multi-core system running fine-grained scientific computations. By defining scientific computation types and establishing a protocol between the application program and the operating system, the device reduces the computation overhead, call overhead and communication overhead of scientific computations running on the on-chip heterogeneous multi-core system, and thereby achieves parallelization and performance tuning of fine-grained scientific computation on that system.

To achieve the above object, the technical scheme of the present invention is as follows:

A fine-grained scientific computing parallel processing device based on a heterogeneous multi-core chip, applied to a heterogeneous multi-core chip comprising one main core and at least one slave core, comprises an interface module, a recording module, an object distribution module and an agent manager module.

The interface module runs on the main core and implements the protocol between the application program and the operating system; it generates the task type identifier FLAG according to the data dependence relationships of objects and passes it to the recording module.

The recording module runs on the main core and records the information about objects defined by the protocol between the application program and the operating system, including the task type identifier FLAG and the destination processor number TaskDest of successor objects, determined according to the data flow model.

The object distribution module runs on the main core and distributes objects at initialization; it assigns tasks to the corresponding slave cores according to the FLAG value and TaskDest, and updates FLAG and TaskDest in the object table of the agent manager on the corresponding slave core.

The agent manager module manages the runtime system; it resides on the main core and on each slave core as the agent of the parallel processing device, and comprises an object table, an executor and a type selector.

Preferably, the task type identifier FLAG is defined as four flag bits, which correspond in order to the inter-core producer, the inter-core consumer, the intra-core producer and the intra-core consumer; a bit value of 1 indicates that the corresponding dependence exists, and a bit value of 0 indicates that it does not.
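As a minimal sketch of the four-value FLAG described above, the identifier could be modelled in Python as follows. The names TaskFlag, encode, decode and ALL_FLAGS are assumptions introduced for illustration and do not appear in the original text; only the bit order (inter-core producer, inter-core consumer, intra-core producer, intra-core consumer) is taken from it.

```python
from typing import NamedTuple


class TaskFlag(NamedTuple):
    """Four dependence bits in the order given in the text (hypothetical type)."""
    inter_core_producer: bool
    inter_core_consumer: bool
    intra_core_producer: bool
    intra_core_consumer: bool

    def encode(self) -> str:
        """Return the flag as a 4-character bit string such as '0100'."""
        return "".join("1" if bit else "0" for bit in self)


def decode(flag: str) -> TaskFlag:
    """Parse a 4-character bit string back into its four dependence bits."""
    if len(flag) != 4 or set(flag) - {"0", "1"}:
        raise ValueError(f"FLAG must be four binary digits, got {flag!r}")
    return TaskFlag(*(ch == "1" for ch in flag))


# Four independent bits give 2**4 = 16 possible task types.
ALL_FLAGS = [format(value, "04b") for value in range(16)]
```

For instance, a task whose only dependence is an inter-core consumer would encode to 0100 under this representation.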

Compared with the prior art, the beneficial effects of the present invention are as follows:

(1) Because task data travels by different routes between cores and within a core, tasks also need to be distinguished accordingly. The type identification of a task mainly needs to answer four questions: 1) whether the task has a producer; 2) whether the task has a consumer; 3) whether the task's data transfer occurs within a core; 4) whether the task's data transfer occurs between cores. Based on this analysis, the present invention proposes a classified scheduling strategy. Tasks are first divided into 16 types, fully covering the 16 cases that arise from combining the above four questions. To distinguish these 16 types, a task type identifier FLAG is defined as four flag bits ABCD, which correspond in order to the inter-core producer, the inter-core consumer, the intra-core producer and the intra-core consumer; a value of 1 indicates that the corresponding dependence exists, and a value of 0 indicates that it does not. Each type is executed in a targeted way, which removes unnecessary call overhead during execution and benefits the parallelization and performance tuning of fine-grained scientific computation on the on-chip heterogeneous multi-core system;

(2) By defining the protocol between the application program and the operating system, parallel programmers no longer need to handle synchronization and data communication between objects themselves, which effectively reduces the difficulty of parallel programming.

Description of drawings

Fig. 1 is a block diagram of the fine-grained scientific computing parallel processing device based on a heterogeneous multi-core chip according to an embodiment of the invention;

Fig. 2 is a flowchart of the agent manager functions in the fine-grained scientific computing parallel processing device based on a heterogeneous multi-core chip according to an embodiment of the invention;

Fig. 3 is a schematic diagram of the structure of the RED platform to which the embodiment of the fine-grained scientific computing parallel processing device is applied;

Fig. 4a is the data dependence graph of application example one, matrix multiplication, for the embodiment of the fine-grained scientific computing parallel processing device;

Fig. 4b is the execution graph of application example one, matrix multiplication, for the embodiment of the fine-grained scientific computing parallel processing device;

Fig. 5a is the data dependence graph of application example two, FFT, for the embodiment of the fine-grained scientific computing parallel processing device;

Fig. 5b is the execution graph of application example two, FFT, for the embodiment of the fine-grained scientific computing parallel processing device.

Embodiment

To make the object, technical scheme and advantages of the present invention clearer, the invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here only explain the invention and are not intended to limit it.

On the contrary, the invention covers any substitution, modification, equivalent method or scheme made within the spirit and scope of the invention as defined by the claims. Further, to give the public a better understanding of the invention, some specific details are described at length below. A person skilled in the art can fully understand the invention even without these details.

Referring to Fig. 1, in a fine-grained scientific computing parallel processing device based on a heterogeneous multi-core chip, the heterogeneous chip comprises 1 main core and i slave cores, where i is an integer not less than 1. The device comprises an interface module, a recording module, an object distribution module and an agent manager module, wherein:

The interface module 101 runs on the main core and implements the protocol between the application program and the operating system; it generates the task type identifier FLAG according to the data dependence relationships of objects and passes it to the recording module;

In the specific application examples, FLAG is defined as four flag bits ABCD, which correspond in order to the inter-core producer, the inter-core consumer, the intra-core producer and the intra-core consumer; a bit value of 1 indicates that the corresponding dependence exists, and a bit value of 0 indicates that it does not.

The recording module 102 runs on the main core and records the information about objects defined by the protocol between the application program and the operating system, namely the task type identifier FLAG and the destination processor number TaskDest of successor objects, determined according to the data flow model;

The object distribution module 103 runs on the main core and distributes objects at initialization; it assigns tasks to the corresponding slave cores according to the FLAG value and TaskDest, and updates FLAG and TaskDest in the object table of the agent manager on the corresponding slave core;

The agent manager module manages the runtime system. It resides on the main core and on each slave core as the agent of the scheduling system; the figure shows the agent manager module 104 on the main core and the agent manager module 105 on slave core 1. The agent manager module further comprises an object table, an executor and a type selector, wherein:

The object table contains all information about the objects mapped onto this core;

Fig. 2 shows the flowchart of the task execution steps of the executor. The process carried out by the executor comprises the task activation phase, the task execution phase and the task synchronization phase, with the following steps:

The task activation phase comprises:

S201: check whether the input data buffer is ready;

S202: if the input data buffer is ready, give feedback for the received data produced by the inter-core predecessor object, and enter the task execution phase.

The task execution phase comprises:

S203: execute the task;

Judge whether the task has finished executing;

If it has not finished, continue executing;

If it has finished, enter the task synchronization phase.

The task synchronization phase comprises the following steps:

S204: synchronize with the successor object to determine whether the output data buffer is valid;

S205: if it is valid, transfer the data of the inter-core object;

S206: judge whether the data transfer of the inter-core object has completed;

S207: if the data transfer of the inter-core object has completed, set the input data buffer of the inter-core successor object to valid, and set the output data buffer of the inter-core predecessor object to valid;

S208: judge whether the feedback of the successor object has been obtained;

S209: if the feedback of the successor object has been obtained, set the input data buffer of the intra-core successor object to valid, and set the output data buffer of the intra-core predecessor object to valid.

The type selector selects the corresponding executor steps according to the FLAG value in the object table, as shown in Table 1:

Table 1: Executor steps selected by the type selector

FLAG value	Steps executed
0000	S203
0001	S203, S204, S207
0010	S201, S203, S209
0011	S201, S203, S204, S205, S206, S207
0100	S203, S204, S205, S206
0101	S203, S204, S205, S206, S207
0110	S201, S203, S204, S205, S206, S209
0111	S201, S202, S203, S204, S205, S206, S208, S209
1000	S201, S202, S203, S208
1001	S201, S202, S203, S204, S207, S208
1010	S201, S202, S203, S208, S209
1011	S201, S202, S203, S207, S208, S209
1100	S201, S202, S203, S204, S205, S206, S208
1101	S201, S202, S203, S204, S205, S206, S207, S208
1110	S201, S202, S203, S204, S205, S206, S208, S209
1111	S201, S202, S203, S204, S205, S206, S207, S208, S209
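For illustration, the selection in Table 1 can be sketched in Python as a plain lookup table. The names TABLE_1, select_steps and skipped_steps are hypothetical and introduced only for this sketch; the step lists are transcribed from Table 1 as given, with each digit d standing for step S20d.

```python
# Table 1 transcribed as-is: FLAG value -> digits d of the steps S20d to run.
TABLE_1 = {
    "0000": "3",        "0001": "347",      "0010": "139",      "0011": "134567",
    "0100": "3456",     "0101": "34567",    "0110": "134569",   "0111": "12345689",
    "1000": "1238",     "1001": "123478",   "1010": "12389",    "1011": "123789",
    "1100": "1234568",  "1101": "12345678", "1110": "12345689", "1111": "123456789",
}

ALL_STEPS = tuple(f"S20{d}" for d in range(1, 10))


def select_steps(flag: str) -> tuple:
    """Return the executor steps that Table 1 lists for this FLAG value."""
    return tuple(f"S20{d}" for d in TABLE_1[flag])


def skipped_steps(flag: str) -> tuple:
    """Return the executor steps that are bypassed for this FLAG value."""
    selected = set(select_steps(flag))
    return tuple(step for step in ALL_STEPS if step not in selected)
```

A lookup of this kind keeps the per-task decision to a single table access, which is in line with the stated aim of avoiding unnecessary call overhead.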

Parallelizing a fine-grained scientific computation in this system comprises the following steps:

(1) The interface module receives the application information, generates the task type identifier FLAG according to the data dependence relationships of objects, and passes it to the recording module;

(2) The recording module records the task type identifier FLAG and the destination processor number TaskDest of successor objects;

(3) The object distribution module assigns tasks to the corresponding slave cores according to the FLAG value and TaskDest, and updates FLAG and TaskDest in the object table of the agent manager on the corresponding slave core;

(4) The type selector in the agent manager module selects the corresponding executor functions according to the FLAG in the object table, until all entries in the object table have been completed.
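Steps (2) and (3) above can be sketched in Python as follows. This is only an illustration under stated assumptions: the Record type, its field names (task_dest for the core a task is sent to, successor_dest for the destination of its successor) and the distribute function are hypothetical, and the successor_dest values are made-up sample data. The FLAG and TaskDest values for t00 and t11 are the ones quoted in application example one.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """One entry kept by the recording module (field names are assumptions)."""
    task: str
    flag: str            # four-bit task type identifier, e.g. "0100"
    task_dest: int       # core the task is assigned to (0 = main core)
    successor_dest: int  # destination core of the successor object (sample data)


def distribute(records):
    """Fill one object table per core with the FLAG and successor TaskDest."""
    object_tables = defaultdict(dict)
    for rec in records:
        object_tables[rec.task_dest][rec.task] = {
            "FLAG": rec.flag,
            "TaskDest": rec.successor_dest,
        }
    return dict(object_tables)


records = [
    Record("t00", "0100", task_dest=0, successor_dest=1),
    Record("t11", "1001", task_dest=1, successor_dest=1),
]
object_tables = distribute(records)
```

Each core's agent manager would then walk its own object table and let the type selector pick the executor steps for every entry.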

Matrix multiplication and the Fast Fourier Transform (FFT) are described below as application examples. Matrix multiplication is a typical mathematical problem; because of its large amount of computation, it is commonly used to test the floating-point performance of computers. For parallel machines, parallel efficiency can likewise be tested with matrix multiplication. Applications such as process control, image processing and scientific computing use matrix multiplication heavily, so parallelizing it improves their efficiency. The FFT is one of the basic theories and methods of modern signal analysis and processing, communication engineering, power engineering, control and information engineering, and it is widely used in related fields of mathematics, physics and engineering such as mechanics, optics, quantum physics and linear system analysis.

Application example one

Matrix multiplication is implemented with the Cannon algorithm, as shown in Fig. 4. Matrix multiplication is a very common application in fine-grained scientific computing. With the Cannon algorithm, its data dependence graph is as shown in Fig. 4(a); the task dependence graph is described as follows:

1) Objects ti0 (i=0...3) are the predecessor objects of the inter-core successor objects tij (i=1...4, j=1...8) and transfer the initial matrix data.

2) Objects tij (i=1,2,3, j=1...8) compute the intermediate results of the sub-matrices; they are the predecessor objects of the intra-core successor objects t(i+1)j (i=1,2,3, j=1...8).

3) Objects t4j (j=1...8) compute the final results of the sub-matrices; they are the predecessor objects of the inter-core successor object t50.

4) Object t40 assembles the final result of the matrix computation.
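The textbook form of the Cannon algorithm mentioned above can be sketched in Python as follows. The sketch is illustrative only: it simulates a q x q processor grid in which each grid position holds a single matrix element instead of a sub-matrix, the function name cannon_matmul is an assumption, and it does not reproduce the task-to-core mapping of Fig. 4.

```python
def cannon_matmul(a, b):
    """Multiply two q x q matrices with Cannon's shift-multiply-accumulate scheme."""
    q = len(a)
    # Initial alignment: row i of A rotates left by i, column j of B rotates up by j.
    a = [[a[i][(j + i) % q] for j in range(q)] for i in range(q)]
    b = [[b[(i + j) % q][j] for j in range(q)] for i in range(q)]
    c = [[0] * q for _ in range(q)]
    for _ in range(q):
        # Every grid position multiplies the pair of elements it currently holds.
        for i in range(q):
            for j in range(q):
                c[i][j] += a[i][j] * b[i][j]
        # Then A rotates left by one position and B rotates up by one position.
        a = [[a[i][(j + 1) % q] for j in range(q)] for i in range(q)]
        b = [[b[(i + 1) % q][j] for j in range(q)] for i in range(q)]
    return c
```

After the alignment, position (i, j) sees the pair A[i][k], B[k][j] for a different k in each of the q rounds, so the accumulated sum is the ordinary matrix product.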

How the parallel processing system executes this application is described below:

(1) The interface module runs on the main core and implements the protocol between the application program and the operating system; it generates the task type identifier FLAG according to the data dependence relationships of objects and passes it to the recording module.

1) Objects ti0 (i=0...3) have no intra-core producer or consumer and no inter-core producer, but have the inter-core consumers tij (i=1...4, j=1...8); according to the protocol description, the interface module sets their FLAG value to 0100.

2) Objects tij (i=1, j=1...8) have an intra-core consumer, no intra-core producer, an inter-core producer and no inter-core consumer; according to the protocol description, the interface module sets their FLAG value to 1001.

3) Objects tij (i=2...4, j=1...8) have an intra-core consumer, an intra-core producer and an inter-core producer, but no inter-core consumer; according to the protocol description, the interface module sets their FLAG value to 1011.

4) Object t40 has an inter-core producer, no inter-core consumer, no intra-core producer and no intra-core consumer; according to the protocol description, the interface module sets its FLAG value to 1000.

(2) The recording module runs on the main core and records the information about objects defined by the protocol between the application program and the operating system, namely the task type identifier FLAG and the object destination processor number TaskDest determined according to the data flow model. The main entries are shown in Table 2:

Table 2: Task type identifiers and object destination processor numbers for matrix multiplication

(3) The object distribution module runs on the main core and distributes objects at initialization; it assigns tasks to the corresponding slave cores according to the FLAG value and TaskDest, and updates FLAG and the successor TaskDest in the object table of the agent manager on the corresponding slave core;

Taking this application as an example, the object distribution module first reads the tasks in the recording module in order. For instance, the TaskDest of task t00 is 0, so the task is assigned to the main core; the TaskDest of task t11 is 1, so the task is assigned to slave core DSP1, and FLAG and the successor TaskDest are updated in the object table of the agent manager on DSP1 at the same time. The final execution graph is shown in Fig. 4b.

(4) The agent manager module manages the runtime system; it resides on the main core and on each slave core as the agent of the scheduling system, and comprises an object table, an executor and a type selector, wherein:

The executor comprises 9 functions; for the order of the steps, see Fig. 2. The type selector selects the corresponding executor functions according to the FLAG value in the object table. Taking this application as an example, the FLAG of task t00 is 0100, so the executor executes only functions 3, 4, 5 and 6 and skips functions 1, 2, 7, 8 and 9; the FLAG of task t11 is 1001, so the executor executes only functions 1, 2, 3, 4, 7 and 8 and skips functions 5, 6 and 9.

Application example two

The FFT algorithm is the fast algorithm for the discrete Fourier transform; the most popular variant is the Cooley-Tukey algorithm, whose index mapping is as follows:

n = N_{2} n_{1} + n_{2}, \quad 0 \leq n_{1} \leq N_{1} - 1, \quad 0 \leq n_{2} \leq N_{2} - 1,

k = k_{1} + N_{1} k_{2}, \quad 0 \leq k_{1} \leq N_{1} - 1, \quad 0 \leq k_{2} \leq N_{2} - 1,

N = N_{1} N_{2}.

The application example uses a 64-point FFT as the test case, with N1=N2=8, as shown in Fig. 5a.
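The index mapping n = N2*n1 + n2, k = k1 + N1*k2 can be checked with a short Python sketch of a one-level Cooley-Tukey decomposition. The function names dft and cooley_tukey are assumptions; the multiplication by W_N^(n2*k1) between the two stages is the standard twiddle step of this decomposition and is not a claim about how the tasks in Fig. 5a divide the work.

```python
import cmath


def dft(x):
    """Direct O(N^2) DFT, used for the inner transforms and as a reference."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * cmath.pi * m * k / n) for m in range(n))
            for k in range(n)]


def cooley_tukey(x, n1_size, n2_size):
    """One-level decomposition with n = N2*n1 + n2 and k = k1 + N1*k2."""
    n = n1_size * n2_size
    if len(x) != n:
        raise ValueError("input length must equal N1 * N2")
    # First inner transforms: for each n2, a length-N1 DFT over n1.
    inner = [dft([x[n2_size * n1 + n2] for n1 in range(n1_size)])
             for n2 in range(n2_size)]  # indexed as inner[n2][k1]
    # Twiddle step: multiply element (n2, k1) by W_N^(n2*k1).
    for n2 in range(n2_size):
        for k1 in range(n1_size):
            inner[n2][k1] *= cmath.exp(-2j * cmath.pi * n2 * k1 / n)
    # Second inner transforms: for each k1, a length-N2 DFT over n2.
    out = [0j] * n
    for k1 in range(n1_size):
        column = dft([inner[n2][k1] for n2 in range(n2_size)])  # indexed by k2
        for k2 in range(n2_size):
            out[k1 + n1_size * k2] = column[k2]
    return out
```

With N1 = N2 = 8 the 64-point transform splits into eight 8-point transforms, a pointwise multiplication, and eight further 8-point transforms, which is the shape of the five-stage task graph described for this example.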

1) Object t00 is the predecessor object of the inter-core successor objects t1j (j=1...8); it rearranges the original FFT data and then sends them to the corresponding accelerator DSPs. It is mapped onto the control RISC.

2) Objects t1j (j=1...8) compute the first inner FFT

and then pass the result to the inter-core successor object t20; these objects are mapped onto the DSPs.

3) Object t20 computes

and then multiplies it with the data sent in step 2), so that the data layout changes from x[k1, n2] to x[n2, k2]; it then passes the result to the inter-core successor objects t3j (j=1...8). t20 is mapped onto the RISC.

4) Objects t3j (j=1...8) compute the second inner FFT

and then pass the result to the inter-core successor object t40. Objects t3j (j=1...8) are mapped onto the DSPs.

5) Object t40 transforms the output data into the frequency domain to obtain the final transform result; it is mapped onto the RISC.

1) Object t00 has no intra-core producer or consumer and no inter-core producer, but has the inter-core consumers tij (i=1, j=1...8); according to the protocol description, the interface module sets its FLAG value to 0100.

2) Objects tij (i=1, j=1...8) have no intra-core producer or consumer, but have the inter-core producer t00 and the inter-core consumer t20; according to the protocol description, the interface module sets their FLAG value to 1100.

3) Object t20 has no intra-core producer or consumer, but has the inter-core producers tij (i=1, j=1...8) and the inter-core consumers tij (i=3, j=1...8); according to the protocol description, the interface module sets its FLAG value to 1100.

4) Objects tij (i=3, j=1...8) have no intra-core producer or consumer, but have the inter-core producer t20 and the inter-core consumer t40; according to the protocol description, the interface module sets their FLAG value to 1100.

5) Object t40 has an inter-core producer, no inter-core consumer, no intra-core producer and no intra-core consumer; according to the protocol description, the interface module sets its FLAG value to 1000.
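The FLAG values listed above follow mechanically from the dependence graph and the task-to-core mapping, which the following Python sketch illustrates. The function derive_flags and the variable names are hypothetical; the mapping of t1j and t3j onto DSP number j is an assumption made for the sketch (the text only states that these objects run on the DSPs and that t11 goes to DSP1).

```python
def derive_flags(edges, core_of):
    """Compute the four-bit FLAG of every task from producer -> consumer edges."""
    bits = {task: [0, 0, 0, 0] for task in core_of}
    for producer, consumer in edges:
        inter = core_of[producer] != core_of[consumer]
        # The producer gains a consumer: second bit if inter-core, fourth if intra-core.
        bits[producer][1 if inter else 3] = 1
        # The consumer gains a producer: first bit if inter-core, third if intra-core.
        bits[consumer][0 if inter else 2] = 1
    return {task: "".join(map(str, b)) for task, b in bits.items()}


# FFT example: t00 -> t1j -> t20 -> t3j -> t40, with RISC = core 0, DSPj = core j.
core_of = {"t00": 0, "t20": 0, "t40": 0}
edges = []
for j in range(1, 9):
    core_of[f"t1{j}"] = j
    core_of[f"t3{j}"] = j
    edges += [("t00", f"t1{j}"), (f"t1{j}", "t20"),
              ("t20", f"t3{j}"), (f"t3{j}", "t40")]

flags = derive_flags(edges, core_of)
```

Under these assumptions every edge crosses a core boundary, so only the first two bits are ever set, which matches the values 0100, 1100 and 1000 given for this example.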

(2) The recording module runs on the main core and records the information about objects defined by the protocol between the application program and the operating system, namely the task type identifier FLAG and the object destination processor number TaskDest determined according to the data flow model. The main entries are shown in Table 3:

Table 3: Task type identifiers and object destination processor numbers for the FFT algorithm

Taking this application as an example, the object distribution module first reads the tasks in the recording module in order. For instance, the TaskDest of task t00 is 0, so the task is assigned to the main core; the TaskDest of task t11 is 1, so the task is assigned to slave core DSP1, and FLAG and the successor TaskDest are updated in the object table of the agent manager on DSP1 at the same time. The final execution graph is shown in Fig. 5b.

The executor comprises 9 steps, as explained in application example one; the execution order is shown in Fig. 2.

Taking this application as an example, the type selector in the agent manager module selects the corresponding executor functions according to the FLAG value in the object table. The FLAG of task t00 is 0100, so the executor executes only functions 3, 4, 5 and 6 and skips functions 1, 2, 7, 8 and 9; the FLAG of task t11 is 1100, so the executor executes only functions 1, 2, 3, 4, 5, 6 and 8 and skips functions 7 and 9.

The present invention was tested on a heterogeneous multi-core system on a chip: the multi-core RED (1*RISC+8*DSP) platform, composed of 1 Reduced Instruction Set Computer (RISC) processor and 8 Digital Signal Processors (DSPs). The structure of the RED platform is shown in Fig. 3, and the experimental results are shown in Table 4.

Table 4: Comparison of call overhead with and without the parallel processing device of the present invention

The experimental results show that, by reducing call overhead, overall system efficiency improved by 36.35% in application example one and by 29.28% in application example two.

The above are only preferred embodiments of the present invention and are not intended to limit it; any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims

1. A fine-grained scientific computing parallel processing device based on a heterogeneous multi-core chip, applied to a heterogeneous multi-core chip comprising one main core and at least one slave core, characterized in that it comprises an interface module, a recording module, an object distribution module and an agent manager module,

wherein the agent manager module resides on the main core and on each slave core, manages the runtime system as the agent of the parallel processing device, and comprises an object table, an executor and a type selector.

2. The fine-grained scientific computing parallel processing device based on a heterogeneous multi-core chip according to claim 1, characterized in that the task type identifier FLAG is defined as four flag bits, which correspond in order to the inter-core producer, the inter-core consumer, the intra-core producer and the intra-core consumer; a bit value of 1 indicates that the corresponding dependence exists, and a bit value of 0 indicates that it does not.

3. The fine-grained scientific computing parallel processing device based on a heterogeneous multi-core chip according to claim 1, characterized in that the task execution of the executor comprises the task activation phase, the task execution phase and the task synchronization phase, with the following steps:

The task activation phase comprises: checking whether the input data buffer is ready,

and giving feedback for the received data produced by the inter-core predecessor object;

The task execution phase comprises: executing the task,

judging whether the task has finished executing,

if it has not finished, continuing to execute,

if it has finished, entering the task synchronization phase;

The task synchronization phase comprises the following steps: synchronizing with the successor object to determine whether the output data buffer is valid;

if it is valid, transferring the data of the inter-core object;

judging whether the data transfer of the inter-core object has completed and, if so, setting the input data buffer of the inter-core successor object to valid and the output data buffer of the inter-core predecessor object to valid;

judging whether the feedback of the successor object has been obtained and, if so, setting the input data buffer of the intra-core successor object to valid and the output data buffer of the intra-core predecessor object to valid.