CN109189477B - Instruction emission control method oriented to multi-context coarse-grained data stream structure - Google Patents

Publication number
CN109189477B
Authority
CN
China
Prior art keywords
stage
physical context
physical
context
instruction
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810682382.9A
Other languages
Chinese (zh)
Other versions
CN109189477A (en)
Inventor
李涵
严明玉
李文明
叶笑春
范东睿
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Zhongke Ruixin Technology Group Co ltd
Original Assignee
Beijing Zhongke Ruixin Technology Group Co ltd
Application filed by Beijing Zhongke Ruixin Technology Group Co ltd
Priority to CN201810682382.9A
Publication of CN109189477A
Application granted
Publication of CN109189477B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38 Concurrent instruction execution, e.g. pipeline, look ahead
    • G06F9/3836 Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution
    • G06F9/3851 Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution from multiple instruction streams, e.g. multistreaming

Abstract

The invention provides an instruction issue control method, system and device for a multi-context coarse-grained data stream structure. The method covers the instruction arrangement, the physical context selection strategy, and the Stage merging mechanism of the system. Physical context selection logic merges Stages of the same type in the coarse-grained data stream mode to the greatest possible extent to form a United Stage. The merging unit controls execution in the functional unit by manipulating the Stage-internal PC pointers. The technical scheme can hide the high latency caused by instructions such as memory accesses, supply the functional unit with enough instructions, effectively improve its utilization, simplify its selection logic, and improve the operating efficiency of the system.

Description

Instruction emission control method oriented to multi-context coarse-grained data stream structure
Technical Field
The present invention relates to the design of instruction issue methods in a processing unit, and in particular to an instruction issue control method, system and apparatus for a coarse-grained data stream structure with multiple contexts.
Background
In recent years, data stream (dataflow) structures have attracted wide attention from academia and industry, because their data-driven mechanism frees them from the limitation of the program counter (PC) in control-flow structures. In such a structure, an instruction can execute as soon as its operands are ready, without the support of a shared memory, so different instructions can execute asynchronously in parallel, the parallelism in a program is fully exploited, and the computing speed and efficiency of the processor improve. Moreover, compared with a conventional control-flow structure, the simpler control logic reduces the area of a data stream processor, giving it low power consumption and a high performance-to-power ratio.
Many practical applications, including graph computation, contain instructions with data dependencies. A traditional data stream structure cannot process such instruction segments efficiently and may even incur extra overhead, so the concept of a "coarse-grained data stream" was introduced. A coarse-grained data stream groups several data-dependent instructions into a Stage. Inside a Stage, instructions execute in control-flow fashion under a program counter (PC), so data-dependent instructions need not pass through the complicated matching logic of the data stream, while execution between Stages advances in data stream fashion. FIG. 1 shows part of the flow of the vertex-centric model commonly used in graph computation as applied to a coarse-grained data stream structure: its execution is divided into several Stages of different types, and each Stage contains one or more data-dependent instructions. This execution mode combines data-flow and control-flow execution effectively, exploiting instruction parallelism while simplifying logic and avoiding unnecessary overhead.
The multi-context mode can effectively reduce the idle time of each component in the data stream mode and further improve the utilization of the functional units. The context scheme comprises two structures, the Physical Context and the Logical Context. Different physical contexts correspond to unrelated data spaces and compete for one set of functional units. When the functional units are idle, execution can switch between physical contexts; because their data are independent, this effectively hides the high latency of instructions such as memory accesses. Each logical context corresponds to one iteration of the program. In the coarse-grained data stream mode, logical contexts pass through all Stages of the program in a pipelined (streaming) manner, so at full load the number of logical contexts working simultaneously in one physical context equals the number of Stages. However, increasing the number of physical and logical contexts makes the data selection logic of the functional units more complex.
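To illustrate the streaming of logical contexts through Stages, the following is a minimal Python sketch under an idealized assumption that is not part of the original text: every Stage takes exactly one step and logical context k enters Stage 0 at step k. The function name `stage_occupancy` and its parameters are hypothetical.

```python
def stage_occupancy(num_stages, num_iterations, step):
    """Return, for each Stage, the logical context occupying it at `step`.

    Idealized assumption: every Stage takes exactly one step, and logical
    context k (iteration k) enters Stage 0 at step k. None marks an idle Stage.
    """
    occupancy = []
    for stage in range(num_stages):
        ctx = step - stage
        occupancy.append(ctx if 0 <= ctx < num_iterations else None)
    return occupancy


# Pipeline fill, full load, and drain for 3 Stages and 5 iterations.
for step in (0, 2, 6):
    print(step, stage_occupancy(3, 5, step))
```

At full load every Stage holds a different logical context, so the number of logical contexts in flight equals the number of Stages. A real data stream structure does not advance in lockstep, which is why a feedback mechanism is needed in practice.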
Disclosure of Invention
To address the defects of the prior art, the invention provides a novel instruction issue control method, and a related system and device, for a multi-context coarse-grained data stream structure. The method merges Stages of the same type in the coarse-grained data stream mode to form a United Stage. It can hide the high latency caused by instructions such as memory accesses while supplying the functional units with enough instructions, improving their utilization and simplifying the selection logic.
The invention designs an instruction emission control method, a system and a device for a coarse-grained data stream structure containing multiple contexts, and particularly provides the following technical scheme:
in one aspect, the present invention provides a method for controlling instruction issue for a multi-context coarse-grained data stream structure, where the method includes:
placing the instructions of stages of the same type contiguously in a corresponding area of an instruction RAM;
setting a stage feedback mechanism, wherein a plurality of physical contexts share the same instructions, which correspond to a plurality of stages, and at any one time each stage is executed by a different logical context within the same physical context; and allocating an identification bit to each stage;
setting a stage merging mechanism, in which a merging unit is added for each type of stage and controls the functional unit to execute the instructions of a plurality of stages of the same type consecutively;
setting a physical context selection mechanism, whereby, while the functional unit executes the data of one physical context, the physical context to be executed next by the functional unit is selected according to the physical context selection mechanism and the identification bits;
and controlling the functional unit to execute instructions based on the stage feedback mechanism, the stage merging mechanism and the physical context selection mechanism.
It should be noted that there is no strict ordering requirement between the mechanisms of the above method: they may be set up in a different order or in parallel, and the order in which they are written should not be read as an order of execution.
Preferably, in the corresponding area of the instruction RAM, if a stage contains a plurality of instructions, the instructions are arranged in the order of their dependency relationships;
instruction segments of different stages are distinguished by a plurality of PC pointers. More preferably, they may be distinguished, for example, by the values of the PC pointers.
Preferably, in the stage feedback mechanism, the identification bit indicates whether the instructions of a stage can be issued; when a stage receives the corresponding ack feedback signal, its identification bit is set to indicate that the stage can be issued.
Preferably, in the stage merging mechanism, the merging unit merges the instructions of the stages by controlling the PC pointers.
Preferably, the physical context selection mechanism comprises a physical context selection policy, which is executed by a physical context selection unit provided for that purpose. The selection policy may be set according to specific instruction execution requirements; for example, the instructions may be prioritized according to a certain rule and then selected by priority.
Preferably, the physical context selection policy comprises:
(1) for each candidate physical context, the identification bits of the different stages of the same type are arranged in ascending order of stage number to form the low-to-high bits of an identification bit selection word;
(2) each identification bit selection word is analyzed in turn, and the physical context with the largest number of consecutive bits set as issuable is selected;
(3) the starting PC pointer value of the first stage and the ending PC pointer value of the last stage corresponding to the run of consecutive issuable bits found in step (2) are recorded together with the number of the physical context, and the recorded information is passed to the merging unit. A bit may be marked as issuable by setting it to 1, for example, in which case the consecutive bits in the selection word are all 1; the set value may of course also be 0. The specific set value or form is not limited here, and such conventional modifications should be considered as falling within the protection scope of the present invention.
Preferably, the physical context selection policy further comprises:
(4) in step (2), if an identification bit selection word contains several runs of consecutive issuable bits of the same length, the stages corresponding to the higher bits are preferentially selected and merged.
More preferably, the physical context selection policy further includes:
(5) in steps (2) and (4), if the identification bit selection words of several physical contexts have the same number of consecutive issuable bits, the physical context with the largest combinable stage number is selected.
More preferably, the physical context selection policy further includes:
(6) in steps (2), (4) and (5), if the identification bit selection words of several physical contexts are identical, the physical context with the smallest number is selected.
In another aspect, the present invention further provides an instruction issue control system for a multi-context coarse-grained data stream structure, where the system includes:
a functional unit for executing instructions, an instruction RAM for storing instructions, and
a feedback mechanism unit for allocating an identification bit to each stage, wherein a plurality of physical contexts share the same instructions, which correspond to a plurality of stages, and at any one time each stage is executed by a different logical context within the same physical context;
merging units, one provided for each type of stage, each of which controls the functional unit to execute the instructions of a plurality of stages of the same type consecutively;
a physical context selection unit which, while the functional unit executes the data of one physical context, selects the physical context to be executed next by the functional unit according to the identification bits of the stages in each physical context;
wherein the instructions of stages of the same type are placed contiguously in the corresponding area of the instruction RAM.
Preferably, in the corresponding area of the instruction RAM, if a stage contains a plurality of instructions, the instructions are arranged in the order of their dependency relationships;
instruction segments of different stages are distinguished by a plurality of PC pointers. More preferably, they may be distinguished, for example, by the values of the PC pointers.
Preferably, in the stage feedback mechanism, the identification bit indicates whether the instructions of a stage can be issued; when a stage receives the corresponding ack feedback signal, its identification bit is set to indicate that the stage can be issued. The set value of the identification bit may be 1 or 0 and may be adjusted as required.
Preferably, in the stage merging mechanism, the merging unit merges the instructions of the stages by controlling the PC pointers.
Preferably, the physical context selection mechanism comprises a physical context selection policy, which is executed by a physical context selection unit provided for that purpose. The selection policy may be set according to specific instruction execution requirements; for example, the instructions may be prioritized according to a certain rule and then selected by priority.
Preferably, the physical context selection policy comprises:
(1) for each candidate physical context, the identification bits of the different stages of the same type are arranged in ascending order of stage number to form the low-to-high bits of an identification bit selection word;
(2) each identification bit selection word is analyzed in turn, and the physical context with the largest number of consecutive bits set as issuable is selected;
(3) the starting PC pointer value of the first stage and the ending PC pointer value of the last stage corresponding to the run of consecutive issuable bits found in step (2) are recorded together with the number of the physical context, and the recorded information is passed to the merging unit. A bit may be marked as issuable by setting it to 1, for example, in which case the consecutive bits in the selection word are all 1; the set value may of course also be 0. The specific set value or form is not limited here, and such conventional modifications should be considered as falling within the protection scope of the present invention.
Preferably, the physical context selection policy further comprises:
(4) in step (2), if an identification bit selection word contains several runs of consecutive issuable bits of the same length, the stages corresponding to the higher bits are preferentially selected.
More preferably, the physical context selection policy further includes:
(5) in steps (2) and (4), if the identification bit selection words of several physical contexts have the same number of consecutive issuable bits, the physical context with the largest combinable stage number is selected.
More preferably, the physical context selection policy further includes:
(6) in steps (2), (4) and (5), if the identification bit selection words of several physical contexts are identical, the physical context with the smallest number is selected.
In yet another aspect, the present invention also provides an instruction issue control apparatus for a multi-context coarse-grained data stream structure, the apparatus comprising one or more processors and
a memory unit storing computer instructions that can be called and run by the processors;
when run, the computer instructions carry out the above instruction issue control method for a multi-context coarse-grained data stream structure.
Compared with the prior art, the invention has the following advantages:
(1) the execution characteristics of the coarse-grained data stream are fully exploited: different stages of the same type are merged to the greatest possible extent, supplying the functional units with sufficient instructions to execute;
(2) the high latency caused by instructions such as memory accesses is further hidden, and the utilization of the functional units is effectively improved;
(3) combining the Stage mechanism with a simple and flexible physical context selection strategy effectively simplifies the selection logic of the functional units and improves the operating efficiency of the system.
Drawings
FIG. 1 is a schematic diagram illustrating an application of a computational model in a coarse-grained data stream structure;
FIG. 2 is a diagram illustrating a multi-context coarse-grained data stream structure;
FIG. 3 is a diagram illustrating the arrangement of instructions in a system;
FIG. 4 is a schematic diagram of the valid select word in the case of 3 Stages;
FIG. 5 is an example of candidate valid select words with different numbers of consecutive 1 bits;
FIG. 6 is an example of candidate valid select words with the same number of consecutive 1 bits.
Detailed Description of Embodiments
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without inventive effort based on the embodiments of the present invention, are within the scope of the present invention.
According to the execution characteristics of a coarse-grained data stream structure with multiple contexts, the invention provides an instruction issue control method, system and device that effectively improve the utilization of the functional units, further hide instruction latency, and simplify the selection logic of the system.
Example 1
In a specific embodiment, the present invention further provides an instruction issue control system for a multi-context coarse-grained data stream structure, where the system includes:
a functional unit for executing instructions, an instruction RAM for storing instructions, and
a feedback mechanism unit for allocating an identification bit to each stage, wherein a plurality of physical contexts share the same instructions, which correspond to a plurality of stages, and at any one time each stage is executed by a different logical context within the same physical context;
merging units, one provided for each type of stage, each of which controls the functional unit to execute the instructions of a plurality of stages of the same type consecutively;
a physical context selection unit which, while the functional unit executes the data of one physical context, selects the physical context to be executed next by the functional unit according to the identification bits of the stages in each physical context;
wherein the instructions of stages of the same type are placed contiguously in the corresponding area of the instruction RAM.
Preferably, in the corresponding area of the instruction RAM, if a stage contains a plurality of instructions, the instructions are arranged in the order of their dependency relationships;
instruction segments of different stages are distinguished by a plurality of PC pointers. More preferably, they may be distinguished, for example, by the values of the PC pointers.
Preferably, in the stage feedback mechanism, the identification bit indicates whether the instructions of a stage can be issued; when a stage receives the corresponding ack feedback signal, its identification bit is set to indicate that the stage can be issued. The set value of the identification bit may be 1 or 0 and may be adjusted as required.
Preferably, in the stage merging mechanism, the merging unit merges the instructions of the stages by controlling the PC pointers.
Preferably, the physical context selection mechanism comprises a physical context selection policy, which is executed by a physical context selection unit provided for that purpose. The selection policy may be set according to specific instruction execution requirements; for example, the instructions may be prioritized according to a certain rule and then selected by priority.
Preferably, the physical context selection policy comprises:
(1) for each candidate physical context, the identification bits of the different stages of the same type are arranged in ascending order of stage number to form the low-to-high bits of an identification bit selection word;
(2) each identification bit selection word is analyzed in turn, and the physical context with the largest number of consecutive bits set as issuable is selected;
(3) the starting PC pointer value of the first stage and the ending PC pointer value of the last stage corresponding to the run of consecutive issuable bits found in step (2) are recorded together with the number of the physical context, and the recorded information is passed to the merging unit. A bit may be marked as issuable by setting it to 1, for example, in which case the consecutive bits in the selection word are all 1; the set value may of course also be 0. The specific set value or form is not limited here, and such conventional modifications should be considered as falling within the protection scope of the present invention.
Preferably, the physical context selection policy further comprises:
(4) in step (2), if an identification bit selection word contains several runs of consecutive issuable bits of the same length, the stages corresponding to the higher bits are preferentially selected.
More preferably, the physical context selection policy further includes:
(5) in steps (2) and (4), if the identification bit selection words of several physical contexts have the same number of consecutive issuable bits, the physical context with the largest combinable stage number is selected.
More preferably, the physical context selection policy further includes:
(6) in steps (2), (4) and (5), if the identification bit selection words of several physical contexts are identical, the physical context with the smallest number is selected.
It should be noted that the system may execute the specific instruction issue control methods described in Examples 2 and 3.
Example 2
In another embodiment, the present invention provides an instruction issue control method for a coarse-grained data stream structure with multiple contexts. The structure is shown in FIG. 2 (only the selection and merging paths of LOAD instructions are shown) and includes multiple context units, a context selection unit, a merging unit, and a functional unit (Function Unit). The method covers the instruction arrangement, the physical context selection strategy, and the Stage merging mechanism of the system. To describe the method in more detail, its various aspects are set forth below with reference to specific examples.
In summary, the method comprises the following steps:
placing the instructions of stages of the same type contiguously in a corresponding area of an instruction RAM;
setting a stage feedback mechanism, wherein a plurality of physical contexts share the same instructions, which correspond to a plurality of stages, and at any one time each stage is executed by a different logical context within the same physical context; and allocating an identification bit to each stage;
setting a stage merging mechanism, in which a merging unit is added for each type of stage and controls the functional unit to execute the instructions of a plurality of stages of the same type consecutively;
setting a physical context selection mechanism, whereby, while the functional unit executes the data of one physical context, the physical context to be executed next by the functional unit is selected according to the physical context selection mechanism and the identification bits;
and controlling the functional unit to execute instructions based on the stage feedback mechanism, the stage merging mechanism and the physical context selection mechanism.
It should be noted that there is no strict ordering requirement between the mechanisms of the above method: they may be set up in a different order or in parallel, and the order in which they are written should not be read as an order of execution.
Preferably, in the corresponding area of the instruction RAM, if a stage contains a plurality of instructions, the instructions are arranged in the order of their dependency relationships;
instruction segments of different stages are distinguished by a plurality of PC pointers. More preferably, they may be distinguished, for example, by the values of the PC pointers.
Preferably, in the stage feedback mechanism, the identification bit indicates whether the instructions of a stage can be issued; when a stage receives the corresponding ack feedback signal, its identification bit is set to indicate that the stage can be issued.
Preferably, in the stage merging mechanism, the merging unit merges the instructions of the stages by controlling the PC pointers.
Preferably, the physical context selection mechanism comprises a physical context selection policy, which is executed by a physical context selection unit provided for that purpose. The selection policy may be set according to specific instruction execution requirements; for example, the instructions may be prioritized according to a certain rule and then selected by priority.
Preferably, the physical context selection policy comprises:
(1) for each candidate physical context, the identification bits of the different stages of the same type are arranged in ascending order of stage number to form the low-to-high bits of an identification bit selection word;
(2) each identification bit selection word is analyzed in turn, and the physical context with the largest number of consecutive bits set as issuable is selected;
(3) the starting PC pointer value of the first stage and the ending PC pointer value of the last stage corresponding to the run of consecutive issuable bits found in step (2) are recorded together with the number of the physical context, and the recorded information is passed to the merging unit. A bit may be marked as issuable by setting it to 1, for example, in which case the consecutive bits in the selection word are all 1; the set value may of course also be 0. The specific set value or form is not limited here, and such conventional modifications should be considered as falling within the protection scope of the present invention.
Preferably, the physical context selection policy further comprises:
(4) in step (2), if an identification bit selection word contains several runs of consecutive issuable bits of the same length, the stages corresponding to the higher bits are preferentially selected.
More preferably, the physical context selection policy further includes:
(5) in steps (2) and (4), if the identification bit selection words of several physical contexts have the same number of consecutive issuable bits, the physical context with the largest combinable stage number is selected.
More preferably, the physical context selection policy further includes:
(6) in steps (2), (4) and (5), if the identification bit selection words of several physical contexts are identical, the physical context with the smallest number is selected.
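Selection rules (1) to (6) above can be sketched in Python as follows. This is one possible reading rather than the patented hardware logic: it assumes the set value is 1, that a Stage's region ends at the next boundary pointer, and that rule (5) prefers the run ending at the highest stage number, which is an interpretation of "largest combinable stage number". The names `longest_run`, `select_context` and `boundaries` are hypothetical.

```python
def longest_run(word, num_stages):
    """Return (length, first_stage, last_stage) of the longest run of 1 bits.

    Bit i of `word` is the identification bit of stage i (rule 1).
    Ties go to the run at the higher bit positions (rule 4).
    Returns (0, -1, -1) when no bit is set.
    """
    best = (0, -1, -1)
    bit = 0
    while bit < num_stages:
        if (word >> bit) & 1:
            start = bit
            while bit < num_stages and (word >> bit) & 1:
                bit += 1
            length = bit - start
            if length >= best[0]:  # ">=" lets a later (higher) run win ties
                best = (length, start, bit - 1)
        else:
            bit += 1
    return best


def select_context(select_words, num_stages, boundaries):
    """Pick a physical context and the PC range handed to the merging unit.

    select_words[c] is the selection word of physical context c.
    boundaries[i] is the start PC of stage i; boundaries[i + 1] is its end
    (an assumed half-open layout). Returns None if nothing can be issued.
    """
    best_key, best = None, None
    for ctx, word in enumerate(select_words):
        length, first, last = longest_run(word, num_stages)
        if length == 0:
            continue
        # Rule 2: longest run; rule 5 (as read here): highest stage number;
        # rule 6: smallest context number.
        key = (length, last, -ctx)
        if best_key is None or key > best_key:
            best_key = key
            best = {
                "context": ctx,
                "stages": (first, last),
                "pc_start": boundaries[first],   # rule 3
                "pc_end": boundaries[last + 1],  # rule 3
            }
    return best


choice = select_context([0b011, 0b110, 0b110], 3, [0, 2, 5, 6])
print(choice)
```

In the usage line, all three contexts have a run of two issuable stages; contexts 1 and 2 cover the higher stages, and context 1 wins on the smaller number.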
More specifically, the method of the invention may comprise the following aspects:
1. Instruction arrangement
The instruction issue control method of the invention requires that the instructions of Stages of the same type be placed contiguously in the corresponding area of the instruction RAM. If a Stage contains several instructions, they are arranged in the order of their dependency relationships. The system distinguishes the instruction segments of different Stages by storing multiple PC pointer values. Taking FIG. 3 as an example, the program segment can be divided into 3 computation Stages (CAL Stage) and 3 memory access Stages (LOAD Stage) that execute in an interleaved manner. In the computation instruction area of the Inst RAM, the instructions of Stages 0 to 2 are placed in order from low to high addresses, and the start and end instruction positions of each Stage are recorded by PC0 to PC3.
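The arrangement described above can be sketched in Python. The sketch assumes that the end pointer of one Stage coincides with the start pointer of the next, so that 3 Stages need 4 boundary pointers (matching PC0 to PC3); the instruction counts, `StageRegion` and `lay_out_stages` are hypothetical and only serve the illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StageRegion:
    """Half-open PC range [pc_start, pc_end) of one Stage in the instruction RAM."""
    stage: int
    pc_start: int
    pc_end: int


def lay_out_stages(stage_lengths, base_pc=0):
    """Place Stages of one type back to back; return boundary PCs and regions.

    stage_lengths[i] is the (hypothetical) instruction count of Stage i.
    N Stages yield N + 1 boundary pointers.
    """
    boundaries = [base_pc]
    for length in stage_lengths:
        boundaries.append(boundaries[-1] + length)
    regions = [
        StageRegion(i, boundaries[i], boundaries[i + 1])
        for i in range(len(stage_lengths))
    ]
    return boundaries, regions


# Three CAL Stages with 2, 3 and 1 instructions (made-up sizes).
boundaries, regions = lay_out_stages([2, 3, 1])
print(boundaries)
print(regions)
```

Because the Stages are contiguous, any run of adjacent Stages is itself one contiguous PC range, which is what later makes merging by pointer manipulation possible.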
2. Stage feedback mechanism
The instruction issue control method of the invention targets a coarse-grained data stream structure with multiple physical contexts and multiple logical contexts. In this application scenario, a plurality of physical contexts share the same instructions, which correspond to a plurality of Stages, and at any one time each Stage is executed by a different logical context within the same physical context. Because Stages in a data stream do not advance in a fixed number of clock cycles, a feedback mechanism is employed: each Stage is provided with a valid bit that marks whether the instructions of the Stage can be issued, and when a Stage receives the corresponding ack feedback signal, its valid bit is set to 1, indicating that the Stage can be issued.
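A minimal Python sketch of the per-Stage valid bits of one physical context follows. Setting the bit on ack comes from the text; clearing the bit when the Stage is issued is an assumption added for illustration, as are the class and method names.

```python
class StageValidBits:
    """Valid bits for the Stages of one type in one physical context."""

    def __init__(self, num_stages):
        self.valid = [0] * num_stages

    def on_ack(self, stage):
        """An ack feedback signal arrived: the Stage may now be issued."""
        self.valid[stage] = 1

    def on_issue(self, stage):
        """Assumption (not in the text): issuing a Stage clears its valid bit."""
        self.valid[stage] = 0

    def select_word(self):
        """Pack the valid bits into a word, with Stage 0 at the lowest bit."""
        word = 0
        for stage, bit in enumerate(self.valid):
            word |= bit << stage
        return word


bits = StageValidBits(3)
bits.on_ack(0)
bits.on_ack(2)
print(bin(bits.select_word()))
```

Packing the bits with the lowest Stage number at the lowest bit position follows the ordering that the selection strategy uses for its valid select word.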
3. Stage merge mechanism
To simplify the context selection logic and to supply the functional units with more instructions, thereby further improving their utilization, the instruction issue control method adds a merging unit (Merge Unit) for each type of Stage in the system. Because different Stages in a physical context correspond to different logical contexts, several Stages of the same type may be ready to issue at the same time; and because instructions of the same type are arranged contiguously in the same instruction RAM region, the merging unit can direct the functional unit to execute the instructions of several same-type Stages back to back.
In the method, the merging unit merges Stage instructions simply by manipulating PC pointers; no instructions are moved or copied. As shown in FIG. 2, the merging unit obtains from the selection unit the start and end PC pointer values, PC_start and PC_end, of the instruction region to be executed contiguously in the selected physical context, and uses them to control instruction execution by the functional unit.
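A minimal sketch of merging by pointer manipulation is given below. It assumes the same half-open pointer convention as an illustration (end pointer exclusive), and the addresses and the function names `merge_stages` and `issue_addresses` are hypothetical; the text itself only states that PC_start and PC_end delimit the region to execute.

```python
from typing import List, Tuple


def merge_stages(pc: List[int], first_stage: int, last_stage: int) -> Tuple[int, int]:
    """Return (PC_start, PC_end) covering Stages first_stage..last_stage inclusive.

    No instruction is copied or moved: merging is just picking two pointers.
    """
    if not 0 <= first_stage <= last_stage < len(pc) - 1:
        raise ValueError("Stage range is outside the instruction region")
    return pc[first_stage], pc[last_stage + 1]


def issue_addresses(pc_start: int, pc_end: int) -> List[int]:
    """Addresses the functional unit would walk through (end assumed exclusive)."""
    return list(range(pc_start, pc_end))


if __name__ == "__main__":
    PC = [100, 104, 110, 113]  # invented values standing in for PC0..PC3
    start, end = merge_stages(PC, 1, 2)  # merge LOAD Stage 1 and Stage 2
    print(start, end, len(issue_addresses(start, end)))
```

Merging Stages 1 and 2 yields the pair (PC1, PC3), which is the same pointer pair reported in step 604 of the second example in this description.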
4. Physical context selection policy
The instruction issue control method of the invention adds a single physical context selection unit (Select Unit) to the multi-context system. While the functional unit is executing the data of one physical context, the selection unit selects the physical context to be executed by the functional unit according to the valid bits of each Stage in the other physical contexts. The selection strategy in the Select Unit is as follows:
(1) the valid bits of the different Stages of the same type in each candidate physical context are assembled, in ascending order of Stage number, into the low-to-high bits of a valid selection word (also referred to below as the valid comparison word);
(2) each valid selection word is analyzed in turn, and the physical context whose word contains the longest run of consecutive 1 bits is selected. This policy ensures that the physical context able to issue the largest number of instructions consecutively is chosen for Stage merging;
(3) in policy (2), if a valid selection word contains several runs of consecutive 1 bits of the same length, the Stages corresponding to the higher bits are preferred. For example, if the valid selection word of 5 Stages of the same type is 11011, Stage 3 and Stage 4 are preferentially selected for merging;
(4) in policies (2) and (3), if the valid selection words of several physical contexts contain equally long runs of consecutive 1 bits, the physical context whose mergeable Stages have the largest Stage number is selected. Policies (3) and (4) give high priority to Stages late in the iteration;
(5) in policies (2) to (4), if the valid selection words of several physical contexts are identical, the physical context with the smallest number is selected. This policy gives high priority to low-numbered physical contexts;
(6) the number of the selected physical context is recorded as Physical_id, and this information is passed to the merging unit.
It should be noted that policies (2) to (5) above are preferred policies of this embodiment and are not mandatory; that is, when several contexts have the same form or the same priority, a different priority policy may be used to choose among them. Policies (2) to (5) are only one preferred option, given for reference. The policy numbers are used for convenience of description only; they do not affect the substance of the policies and should not be construed as limiting the scope of the embodiments of the present invention.
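The selection policies can be summarized in a short software sketch. This is one possible reading, not the hardware implementation: words are modelled as integers whose bit i is the valid bit of Stage i, policy (4) is interpreted as preferring the run that ends at the highest Stage number, and the function names `best_run` and `select_context` are hypothetical.

```python
from typing import Dict, Optional, Tuple


def best_run(word: int, num_stages: int) -> Optional[Tuple[int, int]]:
    """Longest run of 1 bits as (first_stage, last_stage).

    Scanning goes from low to high Stage and uses >=, so among equally long
    runs the one at the higher bits wins (policy 3). Returns None for word 0.
    """
    best: Optional[Tuple[int, int]] = None
    stage = 0
    while stage < num_stages:
        if (word >> stage) & 1:
            first = stage
            while stage + 1 < num_stages and (word >> (stage + 1)) & 1:
                stage += 1
            if best is None or stage - first >= best[1] - best[0]:
                best = (first, stage)
        stage += 1
    return best


def select_context(words: Dict[int, int], num_stages: int) -> Optional[Tuple[int, int, int]]:
    """Return (Physical_id, first_stage, last_stage), or None if nothing is issuable."""
    choice: Optional[Tuple[int, int, int]] = None
    best_key: Optional[Tuple[int, int]] = None
    for ctx_id in sorted(words):  # ascending ids, so ties keep the smaller id (policy 5)
        run = best_run(words[ctx_id], num_stages)
        if run is None:
            continue
        first, last = run
        key = (last - first + 1, last)  # run length (policy 2), then highest Stage (policy 4)
        if best_key is None or key > best_key:
            best_key, choice = key, (ctx_id, first, last)
    return choice


if __name__ == "__main__":
    print(best_run(0b11011, 5))
    print(select_context({1: 0b011, 2: 0b110, 3: 0b110}, 3))
```

Encoding the priorities as a comparison key keeps each policy visible as one element of the tuple, so an alternative tie-breaking rule of the kind permitted above would only change the key.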
Example 3
In another embodiment, a specific scenario is used to describe the implementation of the instruction issue control method of the present invention. The coarse-grained data stream system in this scenario has 4 physical contexts, corresponding to 4 unrelated data spaces, and the program segment executed on it is assumed to divide into 3 CAL Stages (Stages 0-2) and 3 LOAD Stages (Stages 0-2) that execute in an interleaved manner. The arrangement of the program segment's instructions in the Inst RAM is shown in FIG. 3. The multi-context coarse-grained data stream structure is shown in FIG. 2, which depicts only the selection and merging paths for LOAD-type Stages.
FIG. 4 lists all possible values of the valid comparison word formed from the Stages of the same type in a single physical context in this example scenario, together with the start and end PC values of the mergeable Stages for each value, determined according to selection policy (3).
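FIG. 4 is not reproduced in this text, but a table of the same kind can be derived from policy (3). The sketch below is a reconstruction under assumptions: words are written with the highest Stage on the left, Stage i is taken to span PC{i} to PC{i+1}, and the helper names `mergeable` and `table_like_fig4` are hypothetical. The values in the actual figure may be presented differently.

```python
import re
from typing import List, Optional, Tuple


def mergeable(word: str) -> Optional[Tuple[int, int]]:
    """(first_stage, last_stage) of the run chosen by policy (3), or None.

    `word` is written high Stage first, e.g. '011' means Stages 1 and 0 are valid.
    """
    low_first = word[::-1]  # index i now corresponds to Stage i
    runs = [(m.start(), m.end() - 1) for m in re.finditer("1+", low_first)]
    if not runs:
        return None
    # Longest run first; among equal lengths prefer the higher Stage numbers.
    return max(runs, key=lambda r: (r[1] - r[0], r[1]))


def table_like_fig4(num_stages: int = 3) -> List[Tuple[str, Optional[str], Optional[str]]]:
    """Rows of (valid word, start PC name, end PC name) for every possible word."""
    rows = []
    for value in range(2 ** num_stages):
        word = format(value, f"0{num_stages}b")
        run = mergeable(word)
        if run is None:
            rows.append((word, None, None))
        else:
            rows.append((word, f"PC{run[0]}", f"PC{run[1] + 1}"))
    return rows


if __name__ == "__main__":
    for row in table_like_fig4():
        print(row)
```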
The physical context selection logic and Stage merge process are described below with 2 specific examples.
Example 1: the candidate valid comparison words contain different numbers of consecutive 1s
In this example, as shown in FIG. 5, the LOAD functional unit is currently processing the data of physical context No. 0, and the selection unit will choose among physical contexts No. 1-3. The valid comparison words of physical contexts No. 1-3 contain different numbers of consecutive 1s, and the specific selection and merging steps are as follows:
Step 501: physical contexts No. 1-3 each assemble the valid bits of their LOAD Stages, in ascending order of Stage number, into the low-to-high bits of a valid comparison word; the valid comparison word of physical context No. 1 is 111, that of No. 2 is 101, and that of No. 3 is 011;
Step 502: physical contexts No. 1-3 send their valid comparison words to the selection unit;
Step 503: the selection unit analyzes valid comparison words No. 1-3: word No. 1 (111) contains 3 consecutive 1s, word No. 2 (101) contains 1, and word No. 3 (011) contains 2. According to selection policy (2), the physical context with the most consecutive 1s is selected, namely context No. 1;
Step 504: the selection unit passes the selection result to the LOAD merging unit; the result contains the physical context number Physical_id = 1, the starting PC value of the Stages to be merged and executed, PC0, and the ending PC value, PC3;
Step 505: the LOAD merging unit receives the physical context selection information and controls the next execution of the LOAD functional unit according to the PC start and end values.
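Steps 501 to 504 can be replayed with a few lines of code. This sketch covers only policy (2), which is sufficient here because no two words tie; the string convention (highest Stage on the left), the PC naming, and the function names `longest_run` and `select_by_longest_run` are assumptions made for illustration.

```python
from typing import Dict, Tuple


def longest_run(word: str) -> int:
    """Length of the longest run of consecutive 1s in a valid comparison word."""
    return max(len(run) for run in word.split("0"))


def select_by_longest_run(words: Dict[int, str]) -> Tuple[int, str, str]:
    """Apply policy (2) only; ties between contexts are NOT resolved by this sketch."""
    lengths = {ctx: longest_run(w) for ctx, w in words.items()}
    physical_id = max(lengths, key=lengths.get)
    n = lengths[physical_id]
    # Reverse so that string index i corresponds to Stage i, then locate the run.
    first = words[physical_id][::-1].find("1" * n)
    return physical_id, f"PC{first}", f"PC{first + n}"


if __name__ == "__main__":
    words = {1: "111", 2: "101", 3: "011"}  # values from step 501
    print({ctx: longest_run(w) for ctx, w in words.items()})
    print(select_by_longest_run(words))
```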
Example 2: the candidate valid comparison words contain the same number of consecutive 1s
In this example, as shown in FIG. 6, the LOAD functional unit is currently processing the data of physical context No. 0, and the selection unit will choose among physical contexts No. 1-3. The valid comparison words of physical contexts No. 1-3 contain the same number of consecutive 1s, and the specific selection and merging steps are as follows:
Step 601: physical contexts No. 1-3 each assemble the valid bits of their LOAD Stages, in ascending order of Stage number, into the low-to-high bits of a valid comparison word; the valid comparison word of physical context No. 1 is 011, that of No. 2 is 110, and that of No. 3 is 110;
Step 602: physical contexts No. 1-3 send their valid comparison words to the selection unit;
Step 603: the selection unit analyzes valid comparison words No. 1-3: word No. 1 (011), word No. 2 (110), and word No. 3 (110) each contain 2 consecutive 1s. Since the number of consecutive 1s is the same in all 3 words, the mergeable Stages are determined according to selection policy (3): Stages 0-1 for word No. 1, and Stages 1-2 for words No. 2 and No. 3. Physical context No. 1 is therefore excluded, because policy (4) prefers the physical context whose mergeable Stages have the largest Stage number. Then, according to selection policy (5), the physical context with the smaller number is preferred, namely context No. 2;
Step 604: the selection unit passes the selection result to the LOAD merging unit; the result contains the physical context number Physical_id = 2, the starting PC value of the Stages to be merged and executed, PC1, and the ending PC value, PC3;
Step 605: the LOAD merging unit receives the physical context selection information and controls the next execution of the LOAD functional unit according to the PC start and end values.
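The tie-breaking in step 603 can be traced as a sequence of filters, one per policy. The sketch below is illustrative only: it assumes at least one candidate word contains a 1, writes words with the highest Stage on the left, and uses the hypothetical names `top_run` and `trace_selection`.

```python
from typing import Dict, Tuple


def top_run(word: str) -> Tuple[int, int, int]:
    """(length, first_stage, last_stage) of the longest run of 1s.

    Equal-length runs resolve to the higher Stages (policy 3).
    """
    low_first = word[::-1]  # index i corresponds to Stage i
    best = (0, -1, -1)
    i = 0
    while i < len(low_first):
        if low_first[i] == "1":
            j = i
            while j + 1 < len(low_first) and low_first[j + 1] == "1":
                j += 1
            if j - i + 1 >= best[0]:
                best = (j - i + 1, i, j)
            i = j + 1
        else:
            i += 1
    return best


def trace_selection(words: Dict[int, str]) -> dict:
    """Filter candidates policy by policy and keep the intermediate sets."""
    runs = {ctx: top_run(w) for ctx, w in words.items()}
    longest = max(r[0] for r in runs.values())
    after_policy2 = sorted(c for c, r in runs.items() if r[0] == longest)
    highest = max(runs[c][2] for c in after_policy2)
    after_policy4 = [c for c in after_policy2 if runs[c][2] == highest]
    physical_id = min(after_policy4)  # policy 5: smallest number wins
    _, first, last = runs[physical_id]
    return {
        "after_policy2": after_policy2,
        "after_policy4": after_policy4,
        "result": (physical_id, f"PC{first}", f"PC{last + 1}"),
    }


if __name__ == "__main__":
    print(trace_selection({1: "011", 2: "110", 3: "110"}))  # words from step 601
```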
Example 4
In yet another embodiment, the present invention further provides an instruction issue control apparatus for a multi-context coarse-grained data stream structure, the apparatus comprising one or more processors, and
a memory unit storing computer instructions that can be invoked and run by the processors;
when executed, the computer instructions carry out the instruction issue control method for a multi-context coarse-grained data stream structure described above. Specifically, the method performed by the apparatus may be, for example, the methods described in Embodiments 2 and 3.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium, and when executed, can include the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (11)

1. An instruction issue control method for a multi-context coarse-grained data stream structure, characterized in that the method comprises:
placing the instructions of stages of the same type contiguously in the corresponding region of an instruction RAM;
setting a stage feedback mechanism, wherein a plurality of physical contexts share the same instructions, which correspond to a plurality of stages, and each stage is executed by a different logical context within the same physical context at the same time; and allocating an identification bit to each stage;
setting a stage merging mechanism, adding a merging unit for each type of stage, and controlling a functional unit, through the merging unit, to consecutively execute the instructions of a plurality of stages of the same type;
setting a physical context selection mechanism, and, while the functional unit is executing the data of one physical context, selecting the physical context to be executed by the functional unit according to the physical context selection mechanism and the identification bits;
controlling the functional unit to execute instructions based on the stage feedback mechanism, the stage merging mechanism, and the physical context selection mechanism.
2. The method according to claim 1, wherein in the corresponding area of the instruction RAM, if a plurality of instructions are included in one stage, the instructions are arranged in the order of dependency relationship;
instruction segments of different stages are distinguished by multiple PC pointers.
3. The method of claim 1, wherein, in the stage feedback mechanism, the identification bit is used to identify whether the instructions of a stage can be issued; when the stage receives the corresponding ack feedback signal, the identification bit is set to indicate that the stage can be issued.
4. The method of claim 2, wherein, in the stage merging mechanism, the merging unit merges the instructions of stages by controlling the PC pointers.
5. The method of claim 1, wherein the physical context selection mechanism comprises a physical context selection policy, and wherein the physical context selection policy is implemented by setting a physical context selection unit.
6. The method of claim 5, wherein the physical context selection policy comprises:
(1) assembling, in ascending order of stage number, the identification bits of the different stages of the same type in each candidate physical context into the low-to-high bits of an identification-bit selection word;
(2) analyzing each identification-bit selection word in turn, and selecting the physical context whose word contains the most consecutive bits set as issuable;
(3) recording the starting PC pointer value of the first stage and the ending PC pointer value of the last stage corresponding to the consecutive bits set as issuable in the selection word chosen in (2), recording the number of the physical context, and transmitting the recorded information to the merging unit.
7. The method of claim 6, wherein the physical context selection policy further comprises:
(4) for (2), if the identification-bit selection word contains several runs of consecutive bits set as issuable that have the same length, the stages corresponding to the high bits are preferentially selected.
8. The method of claim 7, wherein the physical context selection policy further comprises:
(5) for (2) and (4), if the identification-bit selection words of several physical contexts contain equal numbers of consecutive bits set as issuable, the physical context with the largest mergeable stage number is selected.
9. The method of claim 8, wherein the physical context selection policy further comprises:
(6) for (2), (4), and (5), if the identification-bit selection words of several physical contexts are identical, the physical context with the smallest number is selected.
10. An instruction issue control system for a multi-context coarse-grained data stream structure, the system comprising:
a functional unit for executing instructions, an instruction RAM for storing instructions, and
a feedback mechanism unit, which allocates an identification bit to each stage, wherein a plurality of physical contexts share the same instructions, which correspond to a plurality of stages, and each stage is executed by a different logical context within the same physical context at the same time;
merging units, one provided for each type of stage, which control the functional unit to consecutively execute the instructions of a plurality of stages of the same type;
a physical context selection unit, which, while the functional unit is executing the data of one physical context, selects the physical context to be executed next by the functional unit according to the identification bits of each stage in the physical contexts;
wherein the instructions of stages of the same type are placed contiguously in the corresponding region of the instruction RAM.
11. An apparatus for instruction issue control for a multi-context coarse-grained data stream structure, the apparatus comprising one or more processors, and
a memory unit storing computer instructions that can be invoked and run by the processors;
wherein the computer instructions, when executed, implement the instruction issue control method for a multi-context coarse-grained data stream structure according to any one of claims 1 to 9.
Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810682382.9A CN109189477B (en) 2018-06-27 2018-06-27 Instruction emission control method oriented to multi-context coarse-grained data stream structure

Publications (2)

Publication Number Publication Date
CN109189477A CN109189477A (en) 2019-01-11
CN109189477B true CN109189477B (en) 2021-09-28

Family

ID=64948585

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810682382.9A Active CN109189477B (en) 2018-06-27 2018-06-27 Instruction emission control method oriented to multi-context coarse-grained data stream structure

Country Status (1)

Country Link
CN (1) CN109189477B (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 100095 room 135, 1st floor, building 15, Chuangke Town, Wenquan Town, Haidian District, Beijing

Applicant after: Beijing Zhongke Ruixin Technology Group Co.,Ltd.

Address before: 1 wensong Road, Zhongguancun environmental protection park, Beiqing Road, Haidian District, Beijing 100095

Applicant before: SMARTCORE (BEIJING) Co.,Ltd.

GR01 Patent grant