CN109189477B - Instruction emission control method oriented to multi-context coarse-grained data stream structure - Google Patents
- Publication number
- CN109189477B, CN201810682382.9A
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline, look ahead
- G06F9/3836—Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution
- G06F9/3851—Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution from multiple instruction streams, e.g. multistreaming
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline, look ahead
- G06F9/3836—Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution
Abstract
The invention provides an instruction issue control method, system and device for a multi-context coarse-grained data flow structure. The method covers the instruction arrangement, the physical context selection policy, the Stage merging mechanism and related elements of the system. Physical context selection logic merges Stages of the same type in the coarse-grained data flow mode to the greatest possible extent to form a United Stage, and a merging unit controls execution by the functional unit by manipulating the PC pointers inside the Stages. The technical scheme can hide the high latency caused by instructions such as memory accesses, supply the functional units with enough instructions, effectively improve unit utilization, simplify the selection logic of the functional units, and improve the operating efficiency of the system.
Description
Technical Field
The present invention relates to the design of instruction issue methods in processing units, and in particular to an instruction issue control method, system and device for a coarse-grained data flow structure with multiple contexts.
Background
In recent years, research and development on data flow structures has attracted wide attention from academia and industry. Their data-driven execution mechanism frees them from the limitation of the program counter (PC) found in control flow structures. In a data flow structure, an instruction can execute as soon as its operands are ready, without the support of a shared memory, so different instructions can execute asynchronously and in parallel, the parallelism in a program is fully exploited, and the computation speed and efficiency of the processor improve. Moreover, compared with a traditional control flow structure, the simpler control logic reduces the area of a data flow processor, giving it low power consumption and a high performance-to-power ratio.
In many practical applications, including graph computation, there are often instructions with data dependencies. A traditional data flow structure cannot process such instruction segments efficiently and may even add overhead, so the concept of "coarse-grained data flow" was introduced. Coarse-grained data flow groups several data-dependent instructions into a Stage. Inside a Stage, instructions execute in control flow fashion under a program counter (PC), so data-dependent instructions need not pass through the complicated matching logic of the data flow, while execution advances from Stage to Stage in data flow fashion. FIG. 1 shows how part of the vertex-centric model commonly used in graph computation maps onto a coarse-grained data flow structure: its execution is divided into several Stages of different types, and each Stage contains one or more data-dependent instructions. This execution mode combines data flow and control flow execution, simplifying the logic and avoiding unnecessary overhead while still exploiting instruction parallelism.
The multi-context mode can effectively reduce the idle time of each unit in the data flow mode and further improve the utilization of the functional units. There are two kinds of context, the physical context (Physical Context) and the logical context (Logical Context). Different physical contexts correspond to unrelated data spaces and compete for a single set of functional units; when a functional unit is idle, execution can switch to another physical context, and because the data are independent, this effectively hides the high latency of instructions such as memory accesses. Each logical context corresponds to one iteration of the program. In the coarse-grained data flow mode, a logical context passes through all Stages of the program in a pipelined (streaming) manner, so that under full load as many logical contexts as there are Stages are active simultaneously within one physical context. However, increasing the number of physical and logical contexts makes the data selection logic of the functional units more complex.
Disclosure of Invention
To address the shortcomings of the prior art, the invention provides a novel instruction issue control method, and a related system and device, for a multi-context coarse-grained data flow structure. The method merges Stages of the same type in the coarse-grained data flow mode to form a United Stage. This hides the high latency caused by instructions such as memory accesses, supplies the functional units with enough instructions, improves unit utilization, and simplifies the selection logic.
The invention provides an instruction issue control method, system and device for a coarse-grained data flow structure with multiple contexts, and specifically provides the following technical scheme:
In one aspect, the present invention provides an instruction issue control method for a multi-context coarse-grained data flow structure, the method comprising:
placing the instructions of stages of the same type contiguously in a corresponding area of an instruction RAM;
setting up a stage feedback mechanism, in which multiple physical contexts share the same instructions, the instructions correspond to multiple stages, and the stages are executed at the same time by different logical contexts within the same physical context; and allocating an identification bit to each stage;
setting up a stage merging mechanism, in which a merging unit is added for each type of stage and, through the merging unit, the functional unit is controlled to execute the instructions of multiple stages of the same type consecutively;
setting up a physical context selection mechanism, by which, while the functional unit is executing the data of one physical context, the physical context to be executed next by the functional unit is selected according to the identification bits;
and controlling the functional unit to execute instructions based on the stage feedback mechanism, the stage merging mechanism and the physical context selection mechanism.
It should be noted that there is no strict ordering requirement among the mechanisms above: they may be set up in a different order or in parallel, and the order in which they are written should not be read as an order of execution.
Preferably, in the corresponding area of the instruction RAM, if a stage contains multiple instructions, the instructions are arranged in the order of their dependencies;
instruction segments of different stages are distinguished by multiple PC pointers. More preferably, the distinction can be made, for example, according to the values of the PC pointers.
Preferably, in the stage feedback mechanism, the identification bit indicates whether the instructions of a stage can be issued; when the stage receives the corresponding ack feedback signal, the identification bit is set to indicate that the stage can be issued.
Preferably, in the stage merging mechanism, the merging unit merges the instructions of the stages by controlling the PC pointers.
Preferably, the physical context selection mechanism comprises a physical context selection policy, which is carried out by a physical context selection unit provided for that purpose. The selection policy may be set according to the specific instruction execution requirements; for example, the instructions may be prioritized according to some rule and then selected by priority.
Preferably, the physical context selection policy comprises:
(1) for each candidate physical context, the identification bits of the different stages of the same type are arranged by stage number, from smallest to largest, to form the low-to-high bits of an identification bit selection word;
(2) each identification bit selection word is analyzed in turn, and the physical context whose selection word has the most consecutive bits set as issuable is selected;
(3) for the run of consecutive issuable bits found in (2), the starting PC pointer value of the first stage and the ending PC pointer value of the last stage are recorded together with the number of the physical context, and the recorded information is sent to the merging unit. "Set as issuable" may mean, for example, that the bit is set to 1, in which case the consecutive bits in the selection word are all 1; the set value may of course also be 0. The specific value or form is not limited here, and such conventional modifications should be considered to fall within the protection scope of the present invention.
Preferably, the physical context selection policy further comprises:
(4) in (2), if one identification bit selection word contains several runs of consecutive issuable bits of the same length, the stages corresponding to the higher bits are preferentially selected and merged.
More preferably, the physical context selection policy further includes:
(5) in (2) and (4), if several physical contexts have identification bit selection words with the same number of consecutive issuable bits, the physical context whose mergeable stages have the largest stage number is selected.
More preferably, the physical context selection policy further includes:
(6) in (2), (4) and (5), if several physical contexts have identical identification bit selection words, the physical context with the smallest number is selected.
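As a rough illustration of rules (1) to (6), the following Python sketch models each identification bit selection word as a list of 0/1 values indexed by stage number and picks a physical context from it. It is not taken from the patent: the names `longest_run`, `select_context` and `pc_bounds`, the use of 1 as the "issuable" value, the half-open PC range per stage, and the reading of rule (5) as "prefer the run that ends at the highest stage number" are all assumptions made for the example.

```python
from typing import Dict, List, Optional, Tuple


def longest_run(word: List[int]) -> Tuple[int, int, int]:
    """Return (length, first_stage, last_stage) of the longest run of 1 bits.

    word[k] is the identification bit of stage k (rule 1: low bit = low stage).
    Runs of equal length are resolved toward the higher bits (rule 4).
    Returns (0, -1, -1) if no bit is set.
    """
    best = (0, -1, -1)
    run_start = None
    for k, bit in enumerate(word):
        if not bit:
            run_start = None
            continue
        if run_start is None:
            run_start = k
        length = k - run_start + 1
        if length >= best[0]:  # ">=" lets a later (higher) run win ties
            best = (length, run_start, k)
    return best


def select_context(
    words: Dict[int, List[int]], pc_bounds: List[int]
) -> Optional[Tuple[int, int, int]]:
    """Pick a physical context and the PC range of its mergeable stages.

    words maps a physical context number to its selection word.
    pc_bounds[k] and pc_bounds[k + 1] are assumed to delimit stage k.
    Returns (context_number, pc_start, pc_end), or None if nothing can issue.
    """
    best_key = None
    best = None
    for ctx in sorted(words):
        length, first, last = longest_run(words[ctx])
        if length == 0:
            continue
        # Rule 2: longest run; rule 5 (assumed reading): highest stages;
        # rule 6: smallest context number.
        key = (length, last, -ctx)
        if best_key is None or key > best_key:
            best_key = key
            best = (ctx, pc_bounds[first], pc_bounds[last + 1])  # rule 3
    return best
```

With three contexts whose words are `[1, 1, 0]`, `[0, 1, 1]` and `[1, 0, 1]` and hypothetical pointers `pc_bounds = [0, 4, 6, 9]`, the sketch selects context 1 with the PC range 4 to 9: contexts 0 and 1 tie on run length, and context 1's run sits at the higher stages.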
In another aspect, the present invention further provides an instruction issue control system for a multi-context coarse-grained data flow structure, the system comprising:
a functional unit for executing instructions, an instruction RAM for storing instructions, and
a feedback mechanism unit for allocating an identification bit to each stage, wherein multiple physical contexts share the same instructions, the instructions correspond to multiple stages, and the stages are executed at the same time by different logical contexts within the same physical context;
merging units, one provided for each type of stage, which control the functional unit to execute the instructions of multiple stages of the same type consecutively;
a physical context selection unit which, while the functional unit is executing the data of one physical context, selects the physical context to be executed next by the functional unit according to the identification bits of the stages in each physical context;
wherein the instructions of stages of the same type are placed contiguously in the corresponding area of the instruction RAM.
Preferably, in the corresponding area of the instruction RAM, if a stage contains multiple instructions, the instructions are arranged in the order of their dependencies;
instruction segments of different stages are distinguished by multiple PC pointers. More preferably, the distinction can be made, for example, according to the values of the PC pointers.
Preferably, in the stage feedback mechanism, the identification bit indicates whether the instructions of a stage can be issued; when the stage receives the corresponding ack feedback signal, the identification bit is set to indicate that the stage can be issued. The set value of the identification bit may be 1 or 0, and the specific setting may be adjusted as required.
Preferably, in the stage merging mechanism, the merging unit merges the instructions of the stages by controlling the PC pointers.
Preferably, the physical context selection mechanism comprises a physical context selection policy, which is carried out by a physical context selection unit provided for that purpose. The selection policy may be set according to the specific instruction execution requirements; for example, the instructions may be prioritized according to some rule and then selected by priority.
Preferably, the physical context selection policy comprises:
(1) for each candidate physical context, the identification bits of the different stages of the same type are arranged by stage number, from smallest to largest, to form the low-to-high bits of an identification bit selection word;
(2) each identification bit selection word is analyzed in turn, and the physical context whose selection word has the most consecutive bits set as issuable is selected;
(3) for the run of consecutive issuable bits found in (2), the starting PC pointer value of the first stage and the ending PC pointer value of the last stage are recorded together with the number of the physical context, and the recorded information is sent to the merging unit. "Set as issuable" may mean, for example, that the bit is set to 1, in which case the consecutive bits in the selection word are all 1; the set value may of course also be 0. The specific value or form is not limited here, and such conventional modifications should be considered to fall within the protection scope of the present invention.
Preferably, the physical context selection policy further comprises:
(4) in (2), if one identification bit selection word contains several runs of consecutive issuable bits of the same length, the stages corresponding to the higher bits are preferentially selected.
More preferably, the physical context selection policy further includes:
(5) in (2) and (4), if several physical contexts have identification bit selection words with the same number of consecutive issuable bits, the physical context whose mergeable stages have the largest stage number is selected.
More preferably, the physical context selection policy further includes:
(6) in (2), (4) and (5), if several physical contexts have identical identification bit selection words, the physical context with the smallest number is selected.
In yet another aspect, the present invention also provides an instruction issue control device for a multi-context coarse-grained data flow structure, the device comprising one or more processors and
a memory unit storing computer instructions that can be called and run by the processor;
the computer instructions carry out the instruction issue control method for a multi-context coarse-grained data flow structure described above.
Compared with the prior art, the invention has the following advantages:
(1) the execution characteristics of coarse-grained data flow are fully exploited: different Stages of the same type are merged to the greatest possible extent, supplying the functional units with enough instructions to execute;
(2) the high latency caused by instructions such as memory accesses is further hidden, and the utilization of the functional units is effectively improved;
(3) combining the Stage mechanism with a simple and flexible physical context selection policy effectively simplifies the selection logic of the functional units and improves the operating efficiency of the system.
Drawings
FIG. 1 is a schematic diagram illustrating an application of a computational model in a coarse-grained data stream structure;
FIG. 2 is a diagram illustrating a multi-context coarse-grained data stream structure;
FIG. 3 is a diagram illustrating the arrangement of instructions in a system;
FIG. 4 is a schematic diagram of the valid select word in the case of 3 Stages;
FIG. 5 is an example in which the candidate valid select words have different numbers of consecutive 1 bits;
FIG. 6 is an example in which the candidate valid select words have the same number of consecutive 1 bits.
Detailed Description of Embodiments
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without inventive effort based on the embodiments of the present invention, are within the scope of the present invention.
Based on the execution characteristics of a coarse-grained data flow structure with multiple contexts, the invention provides an instruction issue control method, system and device that effectively improve the utilization of the functional units, further hide instruction latency, and simplify the selection logic of the system.
Example 1
In a specific embodiment, the present invention provides an instruction issue control system for a multi-context coarse-grained data flow structure, the system comprising:
a functional unit for executing instructions, an instruction RAM for storing instructions, and
a feedback mechanism unit for allocating an identification bit to each stage, wherein multiple physical contexts share the same instructions, the instructions correspond to multiple stages, and the stages are executed at the same time by different logical contexts within the same physical context;
merging units, one provided for each type of stage, which control the functional unit to execute the instructions of multiple stages of the same type consecutively;
a physical context selection unit which, while the functional unit is executing the data of one physical context, selects the physical context to be executed next by the functional unit according to the identification bits of the stages in each physical context;
wherein the instructions of stages of the same type are placed contiguously in the corresponding area of the instruction RAM.
Preferably, in the corresponding area of the instruction RAM, if a stage contains multiple instructions, the instructions are arranged in the order of their dependencies;
instruction segments of different stages are distinguished by multiple PC pointers. More preferably, the distinction can be made, for example, according to the values of the PC pointers.
Preferably, in the stage feedback mechanism, the identification bit indicates whether the instructions of a stage can be issued; when the stage receives the corresponding ack feedback signal, the identification bit is set to indicate that the stage can be issued. The set value of the identification bit may be 1 or 0, and the specific setting may be adjusted as required.
Preferably, in the stage merging mechanism, the merging unit merges the instructions of the stages by controlling the PC pointers.
Preferably, the physical context selection mechanism comprises a physical context selection policy, which is carried out by a physical context selection unit provided for that purpose. The selection policy may be set according to the specific instruction execution requirements; for example, the instructions may be prioritized according to some rule and then selected by priority.
Preferably, the physical context selection policy comprises:
(1) for each candidate physical context, the identification bits of the different stages of the same type are arranged by stage number, from smallest to largest, to form the low-to-high bits of an identification bit selection word;
(2) each identification bit selection word is analyzed in turn, and the physical context whose selection word has the most consecutive bits set as issuable is selected;
(3) for the run of consecutive issuable bits found in (2), the starting PC pointer value of the first stage and the ending PC pointer value of the last stage are recorded together with the number of the physical context, and the recorded information is sent to the merging unit. "Set as issuable" may mean, for example, that the bit is set to 1, in which case the consecutive bits in the selection word are all 1; the set value may of course also be 0. The specific value or form is not limited here, and such conventional modifications should be considered to fall within the protection scope of the present invention.
Preferably, the physical context selection policy further comprises:
(4) in (2), if one identification bit selection word contains several runs of consecutive issuable bits of the same length, the stages corresponding to the higher bits are preferentially selected.
More preferably, the physical context selection policy further includes:
(5) in (2) and (4), if several physical contexts have identification bit selection words with the same number of consecutive issuable bits, the physical context whose mergeable stages have the largest stage number is selected.
More preferably, the physical context selection policy further includes:
(6) in (2), (4) and (5), if several physical contexts have identical identification bit selection words, the physical context with the smallest number is selected.
It should be noted that the system may carry out the specific instruction issue control methods described in Examples 2 and 3.
Example 2
In another embodiment, the present invention provides an instruction issue control method for a coarse-grained data flow structure with multiple contexts. As shown in FIG. 2 (only the selection and merging paths for LOAD instructions are drawn), the structure includes multiple context units, a context selection unit, a merging unit, and a functional unit (Function Unit). The method covers the instruction arrangement, the physical context selection policy, the Stage merging mechanism and related elements of the system. To describe the method in more detail, its various aspects are set out below with specific examples.
In summary, the method comprises the following steps:
placing the instructions of stages of the same type contiguously in a corresponding area of an instruction RAM;
setting up a stage feedback mechanism, in which multiple physical contexts share the same instructions, the instructions correspond to multiple stages, and the stages are executed at the same time by different logical contexts within the same physical context; and allocating an identification bit to each stage;
setting up a stage merging mechanism, in which a merging unit is added for each type of stage and, through the merging unit, the functional unit is controlled to execute the instructions of multiple stages of the same type consecutively;
setting up a physical context selection mechanism, by which, while the functional unit is executing the data of one physical context, the physical context to be executed next by the functional unit is selected according to the identification bits;
and controlling the functional unit to execute instructions based on the stage feedback mechanism, the stage merging mechanism and the physical context selection mechanism.
It should be noted that there is no strict ordering requirement among the mechanisms above: they may be set up in a different order or in parallel, and the order in which they are written should not be read as an order of execution.
Preferably, in the corresponding area of the instruction RAM, if a stage contains multiple instructions, the instructions are arranged in the order of their dependencies;
instruction segments of different stages are distinguished by multiple PC pointers. More preferably, the distinction can be made, for example, according to the values of the PC pointers.
Preferably, in the stage feedback mechanism, the identification bit indicates whether the instructions of a stage can be issued; when the stage receives the corresponding ack feedback signal, the identification bit is set to indicate that the stage can be issued.
Preferably, in the stage merging mechanism, the merging unit merges the instructions of the stages by controlling the PC pointers.
Preferably, the physical context selection mechanism comprises a physical context selection policy, which is carried out by a physical context selection unit provided for that purpose. The selection policy may be set according to the specific instruction execution requirements; for example, the instructions may be prioritized according to some rule and then selected by priority.
Preferably, the physical context selection policy comprises:
(1) for each candidate physical context, the identification bits of the different stages of the same type are arranged by stage number, from smallest to largest, to form the low-to-high bits of an identification bit selection word;
(2) each identification bit selection word is analyzed in turn, and the physical context whose selection word has the most consecutive bits set as issuable is selected;
(3) for the run of consecutive issuable bits found in (2), the starting PC pointer value of the first stage and the ending PC pointer value of the last stage are recorded together with the number of the physical context, and the recorded information is sent to the merging unit. "Set as issuable" may mean, for example, that the bit is set to 1, in which case the consecutive bits in the selection word are all 1; the set value may of course also be 0. The specific value or form is not limited here, and such conventional modifications should be considered to fall within the protection scope of the present invention.
Preferably, the physical context selection policy further comprises:
(4) in (2), if one identification bit selection word contains several runs of consecutive issuable bits of the same length, the stages corresponding to the higher bits are preferentially selected.
More preferably, the physical context selection policy further includes:
(5) in (2) and (4), if several physical contexts have identification bit selection words with the same number of consecutive issuable bits, the physical context whose mergeable stages have the largest stage number is selected.
More preferably, the physical context selection policy further includes:
(6) in (2), (4) and (5), if several physical contexts have identical identification bit selection words, the physical context with the smallest number is selected.
More specifically, the method of the invention may comprise the following aspects:
1. Instruction arrangement
The instruction issue control method requires that the instructions of Stages of the same type be placed contiguously in the corresponding area of the instruction RAM. If a Stage contains multiple instructions, they are arranged in the order of their dependencies. The system distinguishes the instruction segments of different Stages by storing multiple PC pointer values. Taking FIG. 3 as an example, the program segment can be divided into 3 computation Stages (CAL Stage) and 3 memory access Stages (LOAD Stage) that are interleaved. In the computation instruction area of the Inst RAM, the instructions of Stage 0 to Stage 2 are placed in order from low to high addresses, and the start and end instruction positions of each Stage are recorded by PC0 to PC3.
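The arrangement can be pictured with a short Python sketch. It assumes that stage k occupies the addresses from PCk up to, but not including, PC(k+1), which is one way to read "start and end positions recorded by PC0 to PC3"; the function name `layout_stages` and the instruction mnemonics are hypothetical and only serve the illustration.

```python
from typing import List, Sequence, Tuple


def layout_stages(
    stages: Sequence[Sequence[str]], base: int = 0
) -> Tuple[List[str], List[int]]:
    """Place the instructions of same-type stages contiguously.

    Returns the filled region and the PC pointer list [PC0, PC1, ...],
    where stage k is assumed to span PCk up to (not including) PC(k+1).
    """
    region: List[str] = []
    pcs = [base]
    for instructions in stages:
        region.extend(instructions)     # keep dependency order inside a stage
        pcs.append(base + len(region))  # end of this stage = start of the next
    return region, pcs


# Three hypothetical CAL stages with 2, 1 and 3 instructions.
cal_stages = [["add", "mul"], ["sub"], ["add", "add", "shl"]]
region, pcs = layout_stages(cal_stages)
```

Three stages yield four pointers, matching the PC0 to PC3 of the FIG. 3 example; any run of adjacent stages is then described by just two of these values.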
2. Stage feedback mechanism
The instruction issue control method of the invention targets a coarse-grained data flow structure with multiple physical contexts and multiple logical contexts. In this scenario, multiple physical contexts share the same instructions, the instructions correspond to multiple Stages, and the Stages are executed at the same time by different logical contexts within the same physical context. Because the progression of Stages in a data flow does not follow a fixed number of clock cycles, a feedback mechanism is used: each Stage has a valid bit that marks whether the Stage's instructions can be issued, and when a Stage receives the corresponding ack feedback signal, its valid bit is set to 1, indicating that the Stage can be issued.
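A minimal Python sketch of this bookkeeping, for the Stages of one type in one physical context, might look as follows. The class name `StageValidBits` and its methods are hypothetical, and clearing a valid bit when the Stage is issued is an assumption: the text only states when the bit is set.

```python
class StageValidBits:
    """Valid bits for the stages of one type in one physical context."""

    def __init__(self, num_stages: int) -> None:
        self.bits = [0] * num_stages

    def on_ack(self, stage: int) -> None:
        # An ack feedback signal marks the stage as ready to issue.
        self.bits[stage] = 1

    def on_issue(self, first: int, last: int) -> None:
        # Assumption: issuing stages first..last clears their valid bits.
        for stage in range(first, last + 1):
            self.bits[stage] = 0

    def select_word(self) -> int:
        # Stage k contributes bit k, so low stage numbers sit in the low bits.
        return sum(bit << stage for stage, bit in enumerate(self.bits))
```

Packing the bits into one integer per physical context gives the selection logic a single word to inspect instead of one signal per Stage.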
3. Stage merge mechanism
In order to simplify the context selection logic and supply more instructions to the functional units, thereby further improving their utilization, the instruction emission control method adds a merging unit for each type of Stage in the system. Because different stages in each physical context correspond to different logical contexts, several stages of the same type may be issuable at the same time; and because instructions of the same type are arranged contiguously in the same instruction RAM region, the merging unit can direct the functional unit to execute the instructions of several same-type stages back to back.
In the method, the merging unit merges Stage instructions simply by manipulating PC pointers; no instructions are moved or copied. As shown in FIG. 2, the merging unit obtains from the selection unit the start and end PC pointer values, PC_start and PC_end, of the instruction region to be executed consecutively in the selected physical context, and uses them to control instruction execution in the functional unit.
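A minimal sketch of merging by pointer manipulation follows. It assumes that Stage i occupies the region between PCi and PC(i+1), so that a contiguous group of stages is described by just two pointer values. The function name `merged_pc_range` and the numeric addresses are hypothetical, and whether PC_end is an inclusive or exclusive boundary is not specified in the text.

```python
def merged_pc_range(pc: list[int], first_stage: int, last_stage: int) -> tuple[int, int]:
    """Return (PC_start, PC_end) covering stages first_stage..last_stage.

    pc[i] is the boundary pointer PCi; Stage i is assumed to lie between
    pc[i] and pc[i + 1]. No instruction is moved or copied.
    """
    if not 0 <= first_stage <= last_stage < len(pc) - 1:
        raise ValueError("stage range out of bounds")
    return pc[first_stage], pc[last_stage + 1]


# Hypothetical addresses for PC0..PC3 of three LOAD stages.
PC = [0, 4, 9, 12]
print(merged_pc_range(PC, 0, 2))  # (0, 12): Stages 0-2, i.e. PC0 to PC3
print(merged_pc_range(PC, 1, 2))  # (4, 12): Stages 1-2, i.e. PC1 to PC3
```

Since the stages of one type are stored contiguously, merging costs only two pointer reads regardless of how many stages are combined.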
4. Physical context selection policy
The instruction emission control method of the invention adds a single physical context selection unit (Select Unit) to the multi-context system. While the functional unit is executing the data of one physical context, the selection unit chooses the physical context that the functional unit will execute next, based on the valid bits of each Stage in the other physical contexts. The selection strategy in the Select Unit is as follows:
(1) the valid bits of the different stages of the same type in each candidate physical context are arranged, in ascending order of Stage number, into the low-to-high bits of a valid selection word;
(2) each valid selection word is analyzed in turn, and the physical context whose word has the longest run of consecutive 1 bits is selected. This strategy ensures that the physical context able to issue the most instructions back to back is chosen for Stage merging;
(3) in policy (2), if a valid selection word contains several runs of consecutive 1s of the same length, the stages corresponding to the higher bits are preferred. For example, if the valid selection word of 5 stages of the same type is 11011, Stage3 and Stage4 are selected for merging;
(4) in policies (2) and (3), if the valid selection words of several physical contexts have runs of consecutive 1s of equal length, the physical context whose mergeable stages have the largest Stage number is selected. Policies (3) and (4) give high priority to stages late in the iteration;
(5) in policies (2) to (4), if the valid selection words of several physical contexts have the same form, the physical context with the smallest number is selected. This policy gives high priority to low-numbered physical contexts;
(6) the number of the selected physical context is recorded as Physical_id, and this information is transmitted to the merging unit.
It should be noted here that policies (2) to (5) above are preferred policies of this embodiment of the present invention and are not mandatory; that is, if several contexts have the same form or the same priority, another priority policy may be used to choose among them. Policies (2) to (5) given in the present invention are only one preferred approach, offered for reference. The policy numbers are used only for convenience of description; they do not affect the substance of the policies and should not be construed as limiting the scope of the embodiments of the present invention.
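Policies (1) to (6) can be sketched in software as follows. This is an illustrative model under stated assumptions, not the Select Unit's actual logic: each valid selection word is encoded as an integer whose bit i is the valid bit of Stage i, and the names `best_run` and `select_context` are hypothetical.

```python
from typing import Optional


def best_run(word: int, num_stages: int) -> tuple[int, int]:
    """Return (length, top_stage) of the preferred run of 1s in a valid selection word.

    Bit i of `word` is the valid bit of Stage i (policy 1). Among runs of
    equal length the one in the higher bits wins (policy 3).
    Returns (0, -1) if no bit is set.
    """
    best = (0, -1)
    length = 0
    for stage in range(num_stages):
        if (word >> stage) & 1:
            length += 1
            # ">=" lets a later (higher) run of equal length replace an earlier one.
            if length >= best[0]:
                best = (length, stage)
        else:
            length = 0
    return best


def select_context(words: dict[int, int], num_stages: int) -> Optional[tuple[int, int, int]]:
    """Return (Physical_id, first_stage, last_stage), or None if nothing can be issued."""
    best_key = None
    choice = None
    for ctx in sorted(words):
        length, top = best_run(words[ctx], num_stages)
        if length == 0:
            continue
        key = (length, top)  # policy 2 first, then policy 4
        # Strict ">" keeps the smallest-numbered context on a full tie (policy 5).
        if best_key is None or key > best_key:
            best_key = key
            choice = (ctx, top - length + 1, top)
    return choice  # policy 6: this result is handed to the merging unit


print(best_run(0b11011, 5))  # (2, 4): Stage3 and Stage4 are merged
```

Comparing `(length, top_stage)` tuples applies policies (2) and (4) in one step, and visiting contexts in ascending order implements policy (5) without extra logic.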
Example 3
In another embodiment, a specific scenario is used to describe the implementation of the instruction issue control method of the present invention. The coarse-grained data stream system in this scenario has 4 physical contexts, corresponding to 4 unrelated data spaces, and the program segment executing on it is assumed to divide into 3 CAL stages (Stages 0-2) and 3 LOAD stages (Stages 0-2) that execute in an interleaved manner. The arrangement of the program segment's instructions in the Inst RAM is shown in FIG. 3. The structure of the multi-context coarse-grained data stream system is shown in FIG. 2, in which only the selection and merging paths for LOAD-type stages are shown.
FIG. 4 lists all possible values of the valid selection word formed from the same-type stages of a single physical context in this scenario, together with the PC start and end values of the mergeable stages for each value, determined according to selection strategy (3).
The physical context selection logic and Stage merge process are described below with 2 specific examples.
Example 1: the candidate valid selection words have different numbers of consecutive 1s
In this example, as shown in FIG. 5, the LOAD functional unit is currently processing the data of physical context No. 0, and the selection unit chooses among physical contexts No. 1-3. The valid selection words of physical contexts No. 1-3 contain different numbers of consecutive 1s. The selection and merging steps are as follows:
step 501: physical contexts No. 1-3 each arrange the valid bits of their LOAD stages, in ascending order of Stage number, into the low-to-high bits of a valid selection word; the valid selection word of physical context No. 1 is 111, that of No. 2 is 101, and that of No. 3 is 011;
step 502: physical contexts No. 1-3 send their valid selection words to the selection unit;
step 503: the selection unit analyzes valid selection words No. 1-3: word No. 1 (111) contains 3 consecutive 1s, word No. 2 (101) contains at most 1 consecutive 1, and word No. 3 (011) contains 2 consecutive 1s. According to selection strategy (2), the physical context with the most consecutive 1s is selected, namely context No. 1;
step 504: the selection unit transmits the selection result to the LOAD merging unit; the result comprises the physical context number Physical_id = 1, the starting PC value PC0 of the stages to be merged and executed, and the ending PC value PC3;
step 505: the LOAD merging unit receives the physical context selection information and controls the next execution of the LOAD functional unit according to the PC start and end values.
Example 2: the candidate valid selection words have the same number of consecutive 1s
In this example, as shown in FIG. 6, the LOAD functional unit is currently processing the data of physical context No. 0, and the selection unit chooses among physical contexts No. 1-3. The valid selection words of physical contexts No. 1-3 contain the same number of consecutive 1s. The selection and merging steps are as follows:
step 601: physical contexts No. 1-3 each arrange the valid bits of their LOAD stages, in ascending order of Stage number, into the low-to-high bits of a valid selection word; the valid selection word of physical context No. 1 is 011, that of No. 2 is 110, and that of No. 3 is 110;
step 602: physical contexts No. 1-3 send their valid selection words to the selection unit;
step 603: the selection unit analyzes valid selection words No. 1-3: word No. 1 (011), word No. 2 (110), and word No. 3 (110) each contain 2 consecutive 1s. Since the number of consecutive 1s is the same in all 3 valid selection words, the mergeable stages are compared: according to selection strategy (3), the mergeable stages for word No. 1 are Stages 0-1, while those for words No. 2 and No. 3 are Stages 1-2, so physical context No. 1 is excluded. Then, according to selection strategy (5), the physical context with the smaller number is preferred, namely context No. 2;
step 604: the selection unit transmits the selection result to the LOAD merging unit; the result comprises the physical context number Physical_id = 2, the starting PC value PC1 of the stages to be merged and executed, and the ending PC value PC3;
step 605: the LOAD merging unit receives the physical context selection information and controls the next execution of the LOAD functional unit according to the PC start and end values.
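The two examples can be checked with a short script. This is a sketch under stated assumptions, not the patented circuit: the valid selection words are given as strings written high bit first, exactly as in steps 501 and 601, Stage i is assumed to span PCi to PC(i+1), and the names `preferred_run` and `select` are hypothetical.

```python
import re


def preferred_run(word: str) -> tuple[int, int]:
    """Return (length, top_stage) of the preferred run of 1s in `word`.

    `word` is written high bit first, so it is reversed to put Stage 0 at
    index 0. Among runs of equal length, the higher stages win.
    """
    runs = [(m.end() - m.start(), m.end() - 1) for m in re.finditer(r"1+", word[::-1])]
    return max(runs, default=(0, -1))


def select(words: dict[int, str]) -> tuple[int, str, str]:
    """Return (Physical_id, PC_start, PC_end) for the chosen physical context."""
    # Rank by run length, then highest mergeable stage, then smallest context number.
    ctx = max(words, key=lambda c: (*preferred_run(words[c]), -c))
    length, top = preferred_run(words[ctx])
    if length == 0:
        raise ValueError("no stage can be issued in any candidate context")
    return ctx, f"PC{top - length + 1}", f"PC{top + 1}"


print(select({1: "111", 2: "101", 3: "011"}))  # (1, 'PC0', 'PC3'), as in step 504
print(select({1: "011", 2: "110", 3: "110"}))  # (2, 'PC1', 'PC3'), as in step 604
```

In the second call, contexts No. 2 and No. 3 tie on both run length and highest stage, so the negated context number in the ranking key makes the smaller number win.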
Example 4
In yet another embodiment, the present invention further provides an instruction issue control apparatus for a multi-context coarse-grained data stream structure, the apparatus comprising one or more processors,
a memory unit storing computer instructions that can be invoked and run by the processor;
the computer instructions, when run, perform the instruction emission control method oriented to a multi-context coarse-grained data stream structure. Specifically, the method performed by the apparatus may be, for example, the methods described in embodiments 2 and 3.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium, and when executed, can include the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.
Claims (11)
1. An instruction emission control method facing a multi-context coarse-grained data stream structure, characterized in that the method comprises:
placing the instructions of stages of the same type contiguously in a corresponding area of an instruction RAM;
setting a stage feedback mechanism, wherein a plurality of physical contexts share the same instructions and correspond to a plurality of stages, and each stage is executed by a different logical context in the same physical context at the same time; and an identification bit is allocated for each stage;
setting a stage merging mechanism, adding a merging unit for each type of stage, and controlling a functional unit to continuously execute instructions of a plurality of stages of the same type through the merging unit;
setting a physical context selection mechanism, and, while the functional unit executes the data of one physical context, selecting the physical context to be executed next by the functional unit according to the physical context selection mechanism and the identification bits;
controlling the functional unit to execute instructions based on the stage feedback mechanism, the stage merging mechanism, and the physical context selection mechanism.
2. The method according to claim 1, wherein in the corresponding area of the instruction RAM, if a plurality of instructions are included in one stage, the instructions are arranged in the order of dependency relationship;
instruction segments of different stages are distinguished by multiple PC pointers.
3. The method of claim 1, wherein in the stage feedback mechanism, the identification bit is used to identify whether the instructions of a stage can be transmitted; when a stage receives a corresponding ack feedback signal, its identification bit is set to indicate that the stage can be transmitted.
4. The method of claim 2, wherein, in the stage merging mechanism, the merging unit merges the instructions of stages by controlling the PC pointers.
5. The method of claim 1, wherein the physical context selection mechanism comprises a physical context selection policy, and wherein the physical context selection policy is implemented by setting a physical context selection unit.
6. The method of claim 5, wherein the physical context selection policy comprises:
(1) arranging the identification bits of the different stages of the same type in each physical context to be selected, in ascending order of stage number, into the low-to-high bits of an identification bit selection word;
(2) sequentially analyzing each identification bit selection word, and selecting the physical context having the largest number of consecutive bits set as transmittable;
(3) recording the starting PC pointer value of the first stage and the ending PC pointer value of the last stage corresponding to the consecutive bits set as transmittable in the selection word selected in step (2), recording the number of the physical context, and transmitting the recorded information to the merging unit.
7. The method of claim 6, wherein the physical context selection policy further comprises:
(4) for the (2), if an identification bit selection word contains a plurality of runs of consecutive bits set as transmittable that have the same length, the stages corresponding to the higher bits are preferentially selected.
8. The method of claim 7, wherein the physical context selection policy further comprises:
(5) for the (2) and (4), if the identification bit selection words of a plurality of physical contexts have equal numbers of consecutive bits set as transmittable, the physical context with the largest mergeable stage number is selected.
9. The method of claim 8, wherein the physical context selection policy further comprises:
(6) for the above (2), (4), and (5), if the identification bit selection words of a plurality of physical contexts have the same form, the physical context with the smallest number is selected.
10. An instruction issue control system for a multi-context coarse-grained data stream structure, the system comprising:
a functional unit for executing instructions, an instruction RAM for storing instructions, and
a feedback mechanism unit, which is used for allocating an identification bit for each stage, wherein a plurality of physical contexts share the same instructions and correspond to a plurality of stages, and each stage is executed by a different logical context in the same physical context at the same time;
merging units, one provided for each type of stage, which control the functional unit to continuously execute the instructions of a plurality of stages of the same type;
a physical context selection unit, which, while the functional unit executes the data of one physical context, selects the physical context to be executed next by the functional unit according to the identification bits of each stage in the physical contexts;
wherein the instructions of the same type stage are successively placed in the corresponding area of the instruction RAM.
11. An apparatus for instruction issue control for a multi-context coarse-grained data stream structure, the apparatus comprising one or more processors,
a memory unit storing computer instructions that can be invoked and run by the processor;
the computer instructions implement the instruction emission control method for a multi-context coarse-grained data stream structure according to any one of claims 1 to 9.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810682382.9A CN109189477B (en) | 2018-06-27 | 2018-06-27 | Instruction emission control method oriented to multi-context coarse-grained data stream structure |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109189477A (en) | 2019-01-11 |
CN109189477B (en) | 2021-09-28 |
Family
ID=64948585
Family Applications (1)
Application Number | Status | Publication | Priority Date | Filing Date | Title |
---|---|---|---|---|---|
CN201810682382.9A | Active | CN109189477B (en) | 2018-06-27 | 2018-06-27 | Instruction emission control method oriented to multi-context coarse-grained data stream structure |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109189477B (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
CB02 | Change of applicant information | Address after: 100095 room 135, 1st floor, building 15, Chuangke Town, Wenquan Town, Haidian District, Beijing. Applicant after: Beijing Zhongke Ruixin Technology Group Co.,Ltd. Address before: 1 wensong Road, Zhongguancun environmental protection park, Beiqing Road, Haidian District, Beijing 100095. Applicant before: SMARTCORE (BEIJING) Co.,Ltd. |
GR01 | Patent grant | |