WO2020090009A1 - Arithmetic processing device and its control method - Google Patents

Arithmetic processing device and its control method

Info

Publication number
WO2020090009A1
Authority
WO
WIPO (PCT)
Prior art keywords
arithmetic processing
chain group
read
instruction
chains
Prior art date
Application number
PCT/JP2018/040345
Other languages
English (en)
Japanese (ja)
Inventor
朋広 永野
Original Assignee
富士通株式会社
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 富士通株式会社 filed Critical 富士通株式会社
Priority to JP2020554648A priority Critical patent/JP7036226B2/ja
Priority to PCT/JP2018/040345 priority patent/WO2020090009A1/fr
Publication of WO2020090009A1 publication Critical patent/WO2020090009A1/fr

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00 Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/14 Handling requests for interconnection or transfer
    • G06F13/16 Handling requests for interconnection or transfer for access to memory bus
    • G06F13/18 Handling requests for interconnection or transfer for access to memory bus based on priority control
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00 Digital computers in general; Data processing equipment in general
    • G06F15/16 Combinations of two or more digital computers each having at least an arithmetic unit, a program unit and a register, e.g. for a simultaneous processing of several programs
    • G06F15/163 Interprocessor communication
    • G06F15/167 Interprocessor communication using a common memory, e.g. mailbox
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/34 Addressing or accessing the instruction operand or the result; Formation of operand address; Addressing modes

Definitions

  • the present invention relates to an arithmetic processing device and its control method.
  • MAC Memory Access Controller
  • each core is provided with a register file and an arithmetic execution unit, and an instruction unit decodes instructions for data read / write (in other words, load / store) and arithmetic execution, and issues the instructions to each read / write unit.
  • the performance of the multi-core computing device depends on the throughput of memory data transfer. However, if control is performed to wait for the processing of the subsequent multicast load instruction until the memory read completion of the preceding multicast load instruction is received, the throughput of memory data may decrease.
  • the technology described in this specification aims to improve the throughput of memory data by reducing the waiting time between chains in a multi-core architecture.
  • the arithmetic processing device is an arithmetic processing device having multiple cores that form a plurality of chains. Among the plurality of chains, a first chain group is the target of a first arithmetic processing, a second chain group is the target of a second arithmetic processing, and a third chain group is the target of a third arithmetic processing.
  • the first chain group, in a relationship in which the second arithmetic processing is executed consecutively after the first arithmetic processing, is defined as a first determined chain group.
  • a second determined chain group is obtained by applying a fixed calculation with the third chain group to the first chain group, in a relationship in which at least one third arithmetic processing is executed after the first arithmetic processing and the second arithmetic processing is executed after that.
  • the device includes a determination circuit that determines whether the first or second determined chain group includes the second chain group that is the target of the second arithmetic processing, and a generation circuit that generates an initialization instruction that initializes the identifier of the second arithmetic processing when neither the first nor the second determined chain group includes the second chain group.
  • the device further includes an acquisition circuit that, when the initialization instruction is issued, acquires an initialized identifier for the second arithmetic processing and, when the initialization instruction is not issued, acquires for the second arithmetic processing an identifier that is continuous with the identifier of the first arithmetic processing.
  • With the disclosed arithmetic processing device, it is possible to improve the throughput of memory data by reducing the waiting time between chains in a multi-core architecture.
  • FIG. 3 is a block diagram illustrating a memory access process in the read / write unit of the accelerator shown in FIG. 2.
  • Two flowcharts illustrate a memory access process in the accelerator shown in FIG. 2.
  • FIG. 6 is a flowchart illustrating a multicast REQID initialization process in the read / write unit of the accelerator illustrated in FIG. 2.
  • FIG. 7 is a flowchart illustrating a load instruction issuing process in the instruction unit of the accelerator illustrated in FIG. 2.
  • A further flowchart illustrates REQID acquisition processing in the read / write unit of the accelerator illustrated in FIG. 2.
  • FIG. 9 is a table illustrating the usage status of REQIDs in a multicast load process in the related example shown in FIG. 2.
  • FIG. 10 is a block diagram schematically showing a hardware configuration example of an information processing device in an example.
  • FIG. 11 is a block diagram schematically showing a configuration example of the accelerator shown in FIG. 10.
  • FIG. 12 is a diagram showing the format of a request packet in the accelerator shown in FIG. 11.
  • FIG. 13 is a diagram showing the format of a completion packet in the accelerator shown in FIG. 11.
  • FIG. 14 is a block diagram illustrating memory access processing in the accelerator shown in FIG. 11.
  • FIG. 16 is a diagram showing a configuration example of the multicast REQID initialization signal generation circuit of the accelerator shown in FIG. 11.
  • FIG. 17 is a diagram showing a configuration example of the multicast acquisition circuit of the accelerator shown in FIG. 11.
  • FIG. 18 is a flowchart illustrating a process of issuing a REQID initialization instruction in the instruction unit of the accelerator illustrated in FIG. 11.
  • FIG. 19 is a flowchart illustrating REQID initialization processing in the read / write unit of the accelerator illustrated in FIG. 11.
  • FIG. 20 is a flowchart illustrating a load instruction issuing process in the instruction unit of the accelerator illustrated in FIG. 11.
  • A further flowchart illustrates REQID acquisition processing in the read / write unit of the accelerator illustrated in FIG. 11.
  • A table shows an example of the usage status of REQIDs in the multicast load process in the embodiment shown in FIG. 11.
  • each diagram is not intended to include only the constituent elements shown in the diagram, and may include other functions and the like.
  • FIG. 1 is a block diagram schematically showing a configuration example of the accelerator 600 in the related example.
  • the accelerator 600 processes an arithmetic instruction, and includes a MAC 6, a memory 7, and a plurality of (three in the illustrated example) cores 8 (“cores # 0 to # 2”).
  • the MAC 6 handles access to the memory 7 by each core 8.
  • the memory 7 may be used as a primary recording memory or a working memory.
  • Each core 8 performs loads from and stores to the memory 7 via the MAC 6.
  • Each core 8 functions as an instruction unit 81, a read / write unit 82, and an operation execution unit 83, and holds a register file 84.
  • the register file 84 stores the data acquired from the memory 7.
  • the operation execution unit 83 performs an operation using the data stored in the register file 84.
  • the instruction unit 81 decodes the load instruction sent from the software and instructs the read / write unit 82 to execute the load instruction.
  • the read / write unit 82 divides the load instruction into memory access units, attaches a request ID (may be referred to as “REQID”) to each divided load instruction, and issues a memory read request to the MAC 6.
  • FIG. 2 is a block diagram explaining a memory access process in the accelerator 600. Unlike FIG. 1, the instruction unit 81 and the read / write units 82 are provided independently of the cores.
  • FIG. 3 is a block diagram illustrating a memory access process in the read / write unit 82 of the accelerator 600 shown in FIG.
  • the instruction unit 81 decodes the instruction from the software (see symbol A1 in FIGS. 2 and 3).
  • the instruction unit 81 issues a multicast load instruction to the read / write units 82 of all target chains (see reference numeral A2 in FIGS. 2 and 3). At this time, the instruction unit 81 also notifies the read / write units 82 of all target chains which chains are targeted.
  • the decision circuit 822 in the instruction division circuit 821 of each read / write unit 82 determines that a multicast load instruction has been issued by receiving a load instruction having a plurality of target chains. Then, the instruction division circuit 821 divides the load instruction into 256-byte units (see symbol A3 in FIGS. 2 and 3).
  • the REQID management circuit 823 of the read / write unit 82 includes a unicast acquisition circuit 824, a multicast acquisition circuit 825, and an acquisition wait buffer 826.
  • the multicast acquisition circuit 825 acquires the REQID and the data buffer area for each of the divided load instructions (see symbol A4 in FIGS. 2 and 3).
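  • As a rough illustration of the division step described above, the following Python sketch splits one load instruction into 256-byte memory access requests. The names `DividedLoad` and `divide_load`, the start address, and the handling of a final partial chunk are assumptions made for illustration; the source only states the 256-byte unit.

```python
from dataclasses import dataclass
from typing import List

UNIT_BYTES = 256  # division size stated in the text


@dataclass(frozen=True)
class DividedLoad:
    index: int    # 1-based division number (hypothetical field)
    address: int  # start address of this division
    size: int     # number of bytes covered by this division


def divide_load(address: int, size: int, unit: int = UNIT_BYTES) -> List[DividedLoad]:
    """Split one load instruction into unit-sized memory access requests."""
    if size <= 0 or unit <= 0:
        raise ValueError("size and unit must be positive")
    parts: List[DividedLoad] = []
    offset = 0
    while offset < size:
        # Assumption: a trailing remainder becomes a shorter final division.
        chunk = min(unit, size - offset)
        parts.append(DividedLoad(len(parts) + 1, address + offset, chunk))
        offset += chunk
    return parts


# A 1-kilobyte load yields four 256-byte divisions.
parts = divide_load(address=0x1000, size=1024)
```

  • Each element of `parts` would then receive its own REQID and data buffer area.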
  • Each read / write unit 82 notifies the acquisition wait buffer 826 in the read / write unit 82 in charge of issuing the memory read request of the acquired REQID (see symbol A5 in FIGS. 2 and 3). As a result, the read / write unit 82 in charge of issuing the memory read request can recognize that the REQIDs acquired by the read / write units 82 are unified.
  • After being notified of the REQIDs acquired by the read / write units 82 of all target chains, the memory request generation circuit 827 in the read / write unit 82 in charge of issuing the memory read request issues the memory read request to the reception buffer 611 in the port 61 of the MAC 6 (see symbol A6 in FIGS. 2 and 3). At this time, the target chain is designated by the bitmap in the dst field.
  • the read / write units 82 of all target chains receive the memory read completion from the transmission buffer 612 at the port 61 of the MAC 6 as a response to the memory read request, and store the accompanying memory read data in the data buffer 829 (see symbol A7 in FIGS. 2 and 3).
  • the register control request generation circuit 828 of each read / write unit 82 reads the memory read data from the data buffer 829, and transfers the read memory read data together with the write request to the register file 84 of the core 8 (see symbol A8 in FIGS. 2 and 3).
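  • The wait-for-all-chains behaviour described above can be sketched in Python as follows. This is a simplified software model and not the circuit itself: the class name `AcquisitionWaitBuffer`, its methods, and the check that all reported REQIDs are equal are assumptions made for illustration.

```python
class AcquisitionWaitBuffer:
    """Tracks which target chains have reported a REQID for one divided request."""

    def __init__(self, target_chains):
        self.targets = frozenset(target_chains)
        self.reported = {}  # chain number -> REQID reported by that chain

    def notify(self, chain: int, reqid: int) -> None:
        """Record the REQID acquired by the read/write unit of one chain."""
        if chain not in self.targets:
            raise ValueError(f"chain {chain} is not a target chain")
        self.reported[chain] = reqid

    def ready(self) -> bool:
        """True once every target chain has reported and all REQIDs agree."""
        if set(self.reported) != self.targets:
            return False
        return len(set(self.reported.values())) == 1


buf = AcquisitionWaitBuffer({0, 1, 2})
buf.notify(0, 5)
buf.notify(1, 5)
waiting = buf.ready()   # chain 2 has not reported yet
buf.notify(2, 5)
can_issue = buf.ready()  # all target chains reported the same REQID
```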
  • the processing indicated by the reference signs A4 to A8 is repeatedly executed for all the divided requests.
  • Each read / write unit 82 issues a completion notice corresponding to the multicast load instruction indicated by reference sign A2 to the instruction unit 81 (see reference sign A9 in FIGS. 2 and 3).
  • the instruction unit 81 receives completion notifications from the read / write units 82 of all target chains, and recognizes the completion of the instruction (see symbol A10 in FIGS. 2 and 3).
  • the target chain read / write unit 82 determines whether there are a plurality of target chains (step S2).
  • the read / write unit 82 determines whether the REQID to be acquired is free by the multicast acquisition circuit 825 (step S3).
  • If the REQID to be acquired is not free (see No route in step S3), the process in step S3 is repeated.
  • the read / write unit 82 notifies the read / write unit 82 in charge of the request that the REQID has been acquired (step S4).
  • the read / write unit 82 determines whether it is the read / write unit 82 in charge of the request (step S5).
  • If the read / write unit 82 is not in charge of the request (see No route in step S5), the process proceeds to step S9.
  • the read / write unit 82 determines whether REQID acquisition notifications have been issued from all target chains (step S6).
  • If there is a target chain for which the REQID acquisition notification has not been issued (see No route in step S6), the process in step S6 is repeated.
  • the read / write unit 82 determines whether the reception buffer 611 at the port 61 of the MAC 6 has free space (step S7).
  • If there is no free space in the reception buffer 611 (see No route in step S7), the process in step S7 is repeated.
  • the read / write unit 82 issues a multicast read request to the MAC 6 (step S8).
  • the destination of the multicast read is set to all target chains.
  • the read / write unit 82 determines whether all the divided read requests have been issued (step S9). If there is a read request that has not been issued (see No route in step S9), the process returns to step S3.
  • When all the divided read requests have been issued (see Yes route in step S9), the read / write unit 82 issues a request corresponding to the subsequent instruction from the instruction unit 81 (step S10). Then, the process returns to step S2.
  • the MAC 6 receives the multicast read request (step S11).
  • the MAC 6 performs a memory read (step S12).
  • the MAC 6 issues a completion with read data added to the read / write unit 82 in the dst field (step S13).
  • the read / write unit 82 receives the completion from the MAC 6 (step S14).
  • the read / write unit 82 issues a register write request with read data added to each core 8 (step S15).
  • the read / write unit 82 releases the REQID and the data buffer 829 (step S16).
  • the read / write unit 82 determines whether all the register write requests corresponding to the divided read requests have been issued (step S17).
  • If there is a register write request that has not been issued (see No route in step S17), the process returns to step S14.
  • When all the register write requests have been issued (see Yes route in step S17), the read / write unit 82 notifies the instruction unit 81 of the completion of the instruction in its own chain (step S18).
  • the instruction unit 81 determines whether the instruction completion notification has been received from the read / write units 82 of all target chains (step S19).
  • If there is a target chain from which the instruction completion notification has not been received (see No route in step S19), the process in step S19 is repeated.
  • the instruction unit 81 recognizes the completion of the instruction and the memory access process ends.
  • the read / write unit 82 processes the Nth division of one instruction (step S21).
  • the read / write unit 82 determines whether N is 1 (step S22).
  • If N is 1 (see Yes route in step S22), the read / write unit 82 initializes the multicast acquisition circuit 825 (step S23). Then, the process proceeds to step S25.
  • If N is not 1, the multicast acquisition circuit 825 is updated (step S24).
  • the read / write unit 82 determines the REQID to be issued in the multicast (step S25). Then, the initialization process of the multicast REQID ends.
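  • Steps S21 to S25 of the related example can be sketched in Python as below. The class name, the pool size of 8 REQIDs, the initial value 0, and the reading of "update" as an increment with wraparound are all assumptions made for illustration; the source does not specify them.

```python
class RelatedMulticastAcquisition:
    """Software model of the related-example multicast acquisition circuit."""

    def __init__(self, num_reqids: int = 8, initial: int = 0):
        self.num_reqids = num_reqids  # hypothetical size of the REQID pool
        self.initial = initial        # hypothetical initial REQID
        self.value = initial

    def reqid_for_division(self, n: int) -> int:
        """Return the REQID for the N-th division of one instruction."""
        if n == 1:
            self.value = self.initial                         # step S23
        else:
            self.value = (self.value + 1) % self.num_reqids   # step S24 (assumed increment)
        return self.value                                     # step S25


circuit = RelatedMulticastAcquisition()
# Two consecutive load instructions, each divided into four requests.
ids = [circuit.reqid_for_division(n) for _ in range(2) for n in range(1, 5)]
```

  • Under these assumptions both instructions use REQIDs 0 to 3, so the second instruction has to wait until the REQIDs of the first one are released.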
  • the instruction unit 81 decodes the load instruction from the software (step S31).
  • the read / write unit 82 receives the load instruction and the target chain according to the instruction from the instruction unit 81 shown in step S32 of FIG. 7 (step S41).
  • the read / write unit 82 determines whether there are a plurality of target chains (step S42).
  • the read / write unit 82 determines whether the first division of the load instruction is being processed (step S43).
  • If the first division is being processed (see Yes route in step S43), the read / write unit 82 initializes the multicast acquisition circuit 825 (step S44). Then, the process proceeds to step S46.
  • the read / write unit 82 updates the multicast acquisition circuit 825 (step S45).
  • the read / write unit 82 determines whether the REQID to be acquired is free (step S46).
  • If the REQID to be acquired is not free (see No route in step S46), the process in step S46 is repeated.
  • the read / write unit 82 performs waiting processing for the acquired REQID (step S47). Then, the REQID acquisition process ends.
  • FIG. 9 is a table illustrating the usage status of REQIDs in the multicast load process in the related example shown in FIG.
  • the performance of the multi-core computing device depends on the throughput of memory data transfer.
  • each multicast load instruction is 1 kilobyte.
  • the processing for the subsequent multicast load instruction is suspended until the completion of the memory read of the preceding multicast load instruction is received. As a result, the throughput of memory data may be reduced.
  • FIG. 10 is a block diagram schematically showing a hardware configuration example of the information processing apparatus 1 in the example.
  • the information processing device 1 has a Central Processing Unit (CPU) 10, an Input / Output (I / O) controller 11, an accelerator 12, a hard disk 13, an I / O device 14 and a memory 15.
  • the I / O controller 11 is connected to the accelerator 12, the hard disk 13, and the I / O device 14.
  • the I / O device 14 refers to an I / O device other than the accelerator 12 and the hard disk 13.
  • the I / O controller 11 receives an instruction from the CPU 10 and controls the accelerator 12, the hard disk 13, and the I / O device 14. Then, the I / O controller 11 relays communication between the accelerator 12, the hard disk 13, the I / O device 14, and the CPU 10.
  • the CPU 10, which is an arithmetic processing unit, is connected to the I / O controller 11 and the memory 15 by a bus. The CPU 10 can send and receive data to and from the memory 15. Further, the CPU 10 can send and receive data to and from the accelerator 12, the hard disk 13, and the I / O device 14 via the I / O controller 11.
  • the hard disk 13 stores various programs such as an Operating System (OS) and various applications.
  • the CPU 10 operates the OS and various applications by reading the program from the hard disk 13, expanding it on the memory 15, and executing it.
  • applications include applications that execute deep learning and the like.
  • the CPU 10 causes the accelerator 12 to perform a specific process when executing the application.
  • the CPU 10 causes the accelerator 12 to perform arithmetic processing such as deep learning.
  • the software executed by the CPU 10 transmits an operation command to the accelerator 12 together with data used for the operation via the I / O controller 11.
  • FIG. 11 is a block diagram schematically showing a configuration example of the accelerator 12 shown in FIG.
  • the accelerator 12 includes a MAC 121, a memory 122, a plurality of cores 123, an instruction unit 124, and a read / write unit 125.
  • the memory 122 may be used as a primary recording memory or a working memory.
  • the MAC 121 processes access to the memory 122 by each core 123.
  • the MAC 121 includes a plurality of ports 1211 (denoted as “P # 0 to # 7” in FIG. 11).
  • a plurality of cores 123, one read / write unit 125, and one port 1211 are provided for each chain # 0 to # 7 indicated by the broken line frame in FIG.
  • seven cores 123 are provided in each chain.
  • chain # 0 has cores # 0-1 to # 0-N, chain # 1 has cores # 1-1 to # 1-N, and chain # 7 has cores # 7-1 to # 7-N.
  • Each core 123 executes loading and storing on the memory 122 via the read / write unit 125 and the MAC 121.
  • the instruction unit 124 decodes the load instruction sent from the software and instructs the read / write unit 125 to execute the load instruction.
  • the read / write unit 125 divides the load instruction into memory access units, attaches a request ID (may be referred to as “REQID”) to each divided load instruction, and issues a memory read request to the MAC 121.
  • a load instruction that designates a plurality of chains at one time (which may be referred to as a “multicast load” instruction) is supported.
  • the MAC 121 and the read / write unit 125 support a memory read (may be referred to as a “multicast read request”) that specifies a plurality of chains.
  • FIG. 12 is a diagram showing a format of a request packet in the accelerator 12 shown in FIG.
  • the multicast read request generated by the read / write unit 125 has the format shown in FIG.
  • opc indicating the type of request
  • dst indicating the transmission destination of the read data
  • REQID indicating the assigned ID
  • Address indicating the read address of the data
  • opc, dst and REQID are sent in the first cycle.
  • the Address is sent in the first cycle and the second cycle.
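  • The request fields listed above can be modelled in Python as follows. Field widths are not given in the source, so this sketch uses plain integers; the mapping of bit i of dst to chain # i, the opc value, and the names `ReadRequest` and `chains_to_bitmap` are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Iterable

NUM_CHAINS = 8  # chains # 0 to # 7 in the text


def chains_to_bitmap(chains: Iterable[int]) -> int:
    """Encode target chains as a bitmap (assumption: bit i stands for chain # i)."""
    bitmap = 0
    for chain in chains:
        if not 0 <= chain < NUM_CHAINS:
            raise ValueError(f"chain {chain} is out of range")
        bitmap |= 1 << chain
    return bitmap


@dataclass(frozen=True)
class ReadRequest:
    opc: int      # type of request
    dst: int      # bitmap of the destinations of the read data
    reqid: int    # assigned request ID
    address: int  # read address of the data


# Hypothetical opc value and address, chosen only for the example.
req = ReadRequest(opc=0x1, dst=chains_to_bitmap([0, 2, 7]), reqid=3, address=0x1000)
```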
  • FIG. 13 is a diagram showing a format of a completion packet in the accelerator 12 shown in FIG.
  • the memory read completion obtained by the read / write unit 125 has the format shown in FIG.
  • In the first cycle of the memory read completion, a header is stored that has opc indicating the request type, REQID indicating the ID of the multicast read request that is the response target, Status indicating the response status, and a reserved (rsv) area.
  • the chain to which the completion is issued is specified by the bitmap.
  • the request completion of memory read or memory write is determined.
  • the REQID field shown in FIG. 13 stores the same value as the REQID of the corresponding request shown in FIG. 12.
  • An address value to be accessed is stored in the Address field shown in FIG.
  • the instruction unit 124 notifies each read / write unit 125 of the load instruction together with information about the target chains.
  • the representative read / write unit 125 specifies the target chain in the dst field with a bitmap when issuing a memory read request to the MAC 121.
  • the MAC 121 acquires data from the memory 122 and issues a completion to all chains specified by the dst field.
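  • The fan-out of a completion to every chain named in the dst bitmap can be sketched in Python as follows. The function names and the bit order (bit i for chain # i) are assumptions made for illustration, and the memory is modelled simply as a byte string passed in by the caller.

```python
from typing import Dict, List, Tuple


def bitmap_to_chains(dst: int, num_chains: int = 8) -> List[int]:
    """Decode a dst bitmap into chain numbers (assumption: bit i stands for chain # i)."""
    return [chain for chain in range(num_chains) if (dst >> chain) & 1]


def fan_out_completion(dst: int, reqid: int, data: bytes) -> Dict[int, Tuple[int, bytes]]:
    """Return one (REQID, read data) completion per chain selected by dst."""
    return {chain: (reqid, data) for chain in bitmap_to_chains(dst)}


completions = fan_out_completion(dst=0b00000101, reqid=3, data=b"\x00" * 256)
```

  • Here `completions` holds one entry for chain 0 and one for chain 2, each carrying the same REQID and the same read data.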
  • FIG. 14 is a block diagram illustrating a memory access process in the accelerator 12 shown in FIG.
  • the instruction unit 124 decodes an instruction from software.
  • the instruction unit 124 issues a multicast load command to the read / write units 125 of all target chains. At this time, the instruction unit 124 notifies the read / write unit 125 of all target chains of the target chains.
  • the instruction unit 124 propagates the REQID initialization signal, as an interface signal to the read / write units 125, to each read / write unit 125 together with the load instruction.
  • the instruction unit 124 includes a synchronization monitoring circuit 1221 and an initialization signal generation circuit 1222.
  • FIG. 15 is a diagram showing an example of the configuration of the multicast REQID synchronization monitoring circuit 1221 of the accelerator 12 shown in FIG.
  • the synchronization monitoring circuit 1221 receives 28 patterns of (X, Y) combinations.
  • the output signal same_reqid_grp_XY also has 28 patterns and is stored in the group table 1220 shown in FIG. load_valid is a valid signal of a load instruction and an update instruction signal of this circuit.
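  • The figure of 28 patterns matches the number of unordered pairs that can be formed from the eight chains # 0 to # 7. The Python sketch below enumerates them; the signal naming scheme and the initial value False are assumptions made for illustration.

```python
from itertools import combinations

NUM_CHAINS = 8  # chains # 0 to # 7

# All unordered pairs (X, Y) with X < Y.
pairs = list(combinations(range(NUM_CHAINS), 2))

# One same_reqid_grp_XY entry per pair, as a stand-in for the group table.
group_table = {f"same_reqid_grp_{x}{y}": False for x, y in pairs}
```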
  • the synchronization monitoring circuit 1221 is an example of a determination circuit that determines whether the first or second determined chain group among the plurality of chains includes the second chain group that is the target of the second arithmetic processing.
  • the first determined chain group is the first chain group in a relationship in which the second arithmetic processing is executed consecutively after the first arithmetic processing.
  • the second determined chain group is obtained by applying a fixed calculation with the third chain group to the first chain group, in a relationship in which at least one third arithmetic processing is executed after the first arithmetic processing and the second arithmetic processing is executed after that.
  • the first chain group is the target of the first arithmetic processing among the plurality of chains, the second chain group is the target of the second arithmetic processing among the plurality of chains, and the third chain group is the target of the third arithmetic processing among the plurality of chains.
  • the fixed calculation is a process of removing the third chain group from the first chain group.
  • FIG. 16 is a diagram showing a configuration example of the multicast REQID initialization signal generation circuit 1222 of the accelerator 12 shown in FIG.
  • the initialization signal generation circuit 1222 is an example of a generation circuit that generates an initialization instruction that initializes the identifier of the second arithmetic processing when neither the first nor the second determined chain group includes the second chain group.
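  • One possible reading of the determination and generation logic is sketched below in Python with sets of chain numbers. Treating "includes" as a superset test, and generalizing the rule to a whole history of earlier loads, are assumptions of this sketch; the function names are hypothetical.

```python
from typing import FrozenSet, Iterable, List, Sequence


def determined_groups(history: Sequence[Iterable[int]]) -> List[FrozenSet[int]]:
    """Build the determined chain groups from earlier loads, given oldest first.

    The most recent load contributes its chain group unchanged (first determined
    chain group). An older load contributes its chain group with the chains of
    all later loads removed (second determined chain group).
    """
    result: List[FrozenSet[int]] = []
    later: FrozenSet[int] = frozenset()
    for group in reversed(history):
        current = frozenset(group)
        result.append(current - later)
        later |= current
    return result


def needs_reqid_init(history: Sequence[Iterable[int]], target: Iterable[int]) -> bool:
    """True when no determined chain group includes every target chain."""
    wanted = frozenset(target)
    return not any(wanted <= group for group in determined_groups(history))


consecutive = needs_reqid_init([{0, 1, 2, 3}], {0, 1})             # included: no init
with_gap = needs_reqid_init([{0, 1, 2, 3}, {2, 3}], {0, 1})        # {0, 1} survives removal
mixed = needs_reqid_init([{0, 1, 2, 3}, {2, 3}], {1, 2})           # spans two groups
```

  • In this sketch the initialization signal would be asserted only in the `mixed` case, where the target chains no longer share one REQID sequence.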
  • the decision circuit 311 in the instruction division circuit 31 of each read / write unit 125 determines that a multicast load instruction has been issued by receiving a load instruction having a plurality of target chains. Then, the instruction division circuit 31 divides the load instruction into 256-byte units.
  • the instruction division circuit 31 is an example of a division circuit that divides the instruction related to the second arithmetic processing into a plurality of instructions.
  • the REQID management circuit 32 of the read / write unit 125 includes a unicast acquisition circuit 321, a multicast acquisition circuit 322, and an acquisition wait buffer 323.
  • the multicast acquisition circuit 322 acquires the REQID and the data buffer area for each of the divided load instructions.
  • FIG. 17 is a diagram showing a configuration example of the multicast acquisition circuit 322 of the accelerator 12 shown in FIG.
  • the multicast acquisition circuit 322 is an example of an acquisition circuit that acquires the identifier that has been initialized for the second arithmetic processing when the initialization instruction is issued, and that acquires, for the second arithmetic processing, an identifier that is continuous with the identifier of the first arithmetic processing when the initialization instruction is not issued.
  • the multicast acquisition circuit 322 may acquire an identifier that is continuous with the identifier of the acquisition target immediately before the acquisition target when the acquisition target of the identifier is not the first division of the instruction.
  • the multicast acquisition circuit 322 of each read / write unit 125 notifies the acquired REQID to the acquisition waiting buffer 323 in the read / write unit 125 in charge of issuing a memory read request.
  • the read / write unit 125 in charge of issuing the memory read request can recognize that the REQIDs acquired by the read / write units 125 are unified.
  • the multicast acquisition circuit 322 is an example of an acquisition circuit that notifies the other read / write units 125 among the plurality of read / write units 125 of the acquired identifier.
  • After being notified of the REQIDs acquired by the read / write units 125 of all target chains, the memory request generation circuit 33 in the read / write unit 125 in charge of issuing the memory read request issues the memory read request to the reception buffer 21 in the port 1211 of the MAC 121. At this time, the target chain is designated by the bitmap in the dst field.
  • the read / write unit 125 of all target chains receives the memory read completion from the transmission buffer 22 at the port 1211 of the MAC 121 as a response to the memory read request, and stores the accompanying memory read data in the data buffer 35.
  • the register control request generation circuit 34 of each read / write unit 125 reads the memory read data from the data buffer 35, and transfers the read memory read data together with the write request to the register file of the core 123.
  • Each read / write unit 125 issues a completion notice corresponding to the multicast load instruction to the instruction unit 124.
  • the instruction unit 124 receives the completion notification from the read / write units 125 of all target chains and recognizes the completion of the command.
  • the synchronization monitoring circuit 1221 updates the group table 1220 for REQID synchronization to the latest state (step S51).
  • the initialization signal generation circuit 1222 determines whether or not all target chains of the load instruction to be issued are included in the group table 1220 (step S52).
  • If all target chains are included (see Yes route in step S52), the initialization signal generation circuit 1222 de-asserts the REQID initialization instruction signal (step S53). Then, the process proceeds to step S55.
  • If any target chain is not included, the initialization signal generation circuit 1222 asserts the REQID initialization instruction signal (step S54).
  • the initialization signal generation circuit 1222 notifies the read / write unit 125 of the REQID initialization instruction signal. The processing is then taken over by the read / write unit 125 in step S61 of FIG. 19, and the process of issuing the REQID initialization instruction ends.
  • the instruction division circuit 31 receives the REQID initialization instruction signal in response to the transmission from the instruction unit 124 in step S55 of FIG. 18 (step S61).
  • the multicast acquisition circuit 322 determines whether the REQID initialization instruction signal is 1 (step S62).
  • If the REQID initialization instruction signal is 1 (see Yes route in step S62), the value of the multicast acquisition circuit 322 is initialized (step S63). Then, the process proceeds to step S65.
  • Otherwise, the multicast acquisition circuit 322 updates or holds the value (step S64).
  • the decision circuit 311 decides the REQID to be issued in the multicast (step S65). Then, the initialization process of the multicast REQID ends.
  • The instruction unit 124 decodes the load instruction from the software (step S71).
  • The initialization signal generation circuit 1222 determines whether all target chains of the load instruction to be issued are included in the group table 1220 (step S72).
  • If all target chains are included (see Yes route in step S72), the initialization signal generation circuit 1222 de-asserts the REQID initialization instruction signal (step S73). Then, the process proceeds to step S75.
  • On the other hand, if not all target chains are included (see No route in step S72), the initialization signal generation circuit 1222 asserts the REQID initialization instruction signal (step S74).
  • The initialization signal generation circuit 1222 notifies the target read/write unit 125 of the load instruction and the target chain (step S75). The processing is then taken over by the read/write unit 125 from step S81.
  • The synchronization monitoring circuit 1221 updates the group table 1220 for REQID synchronization based on the target chain, for reference at the next load instruction (step S76).
  • The instruction division circuit 31 receives the load instruction and the target chain, together with the REQID initialization instruction signal, in response to the notification from the instruction unit 124 in step S75 of FIG. 20 (step S81).
  • The determination circuit 311 determines whether there are a plurality of target chains (step S82).
  • The multicast acquisition circuit 322 determines whether the first division of the load instruction is being processed (step S83).
  • If the first division is being processed (see Yes route in step S83), the multicast acquisition circuit 322 determines whether the REQID initialization instruction signal is 1 (step S84).
  • If the REQID initialization instruction signal is not 1 (see No route in step S84), the process proceeds to step S86.
  • If the REQID initialization instruction signal is 1 (see Yes route in step S84), the value of the multicast acquisition circuit 322 is initialized (step S85). Then, the process proceeds to step S87.
  • If the first division is not being processed (see No route in step S83), the value of the multicast acquisition circuit 322 is updated (step S86).
  • The multicast acquisition circuit 322 determines whether the REQID to be acquired is free (step S87).
  • If the REQID to be acquired is not free (see No route in step S87), the process in step S87 is repeated.
  • If the REQID to be acquired is free (see Yes route in step S87), the multicast acquisition circuit 322 performs waiting processing for the acquired REQID (step S88). Then, the REQID acquisition process ends.
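  • The branching in steps S83 to S88 can be sketched as follows. This is only an illustration under stated assumptions: the pool size of 8, the initial value of 0, the wrap-around "next REQID" update, and both function names are hypothetical, and the repeated check of step S87 is modeled by returning `None` so that the caller can retry.

```python
# Hypothetical model of steps S83-S88; pool size and initial value are assumptions.
NUM_REQIDS = 8      # assumed size of the REQID pool
INITIAL_REQID = 0   # assumed initial value


def select_reqid(current, first_division, init_signal):
    """Pick the REQID to acquire for one divided instruction (steps S83 to S86)."""
    if first_division and init_signal == 1:
        return INITIAL_REQID             # step S85: initialize
    return (current + 1) % NUM_REQIDS    # step S86: update to the next REQID


def acquire_reqid(candidate, in_use):
    """Steps S87 and S88: acquire the candidate only once it is free."""
    if candidate in in_use:
        return None                      # not free: the caller retries step S87
    in_use.add(candidate)
    return candidate


# First division with the signal de-asserted: the value is updated, not reset.
in_use = {1}
candidate = select_reqid(current=0, first_division=True, init_signal=0)
print(candidate, acquire_reqid(candidate, in_use))
```

  • In this sketch the initialization signal only matters for the first division; later divisions always take the update path, as in the No route of step S83.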
  • FIG. 22 is a table illustrating the usage status of REQIDs in the multicast load processing in the embodiment.
  • When multicast load instructions are processed consecutively, the REQIDs can be used serially without being initialized.
  • Each of the multicast load instructions “1” to “4” has a load size of 1 kilobyte.
  • The load target chain of the multicast load instruction “2” is included in the group table 1220 for REQID synchronization.
  • As with the multicast load instruction “2”, the load target chain of the multicast load instruction “3” is included in the group table 1220 for REQID synchronization.
  • For the multicast load instruction “4”, the load target chain is not included in the group table 1220 for REQID synchronization, so the REQID initialization instruction signal is asserted and the REQID is initialized.
  • In this way, the REQID is initialized less often than in the multicast load processing of the related example. As a result, there are fewer occasions to wait for the release of a REQID in use, and memory read requests to the MAC 121 can be issued promptly, which prevents a drop in the throughput of the memory data bus.
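  • The sequence for the multicast load instructions “2” to “4” can be replayed with a small sketch. The chain numbers, the state assumed to be left by instruction “1”, and the table update rule (the table simply tracks the most recent target chains) are all invented for illustration; the source only says that the table is updated based on the target chain.

```python
# Hypothetical replay of instructions "2" to "4"; chain numbers are invented.
def replay(group_table, instructions):
    """Return the REQID initialization signal for each instruction in order."""
    table = set(group_table)
    signals = []
    for targets in instructions:
        signals.append(0 if set(targets) <= table else 1)
        # Assumed update rule: the table tracks the most recent target chains.
        table = set(targets)
    return signals


# Assumed state left by instruction "1": chains 0 to 3 are synchronized.
signals = replay({0, 1, 2, 3}, [{0, 1}, {0, 1}, {2, 3}])
print(signals)
```

  • Under these assumptions only the last instruction asserts the signal, which matches the pattern described for instructions “2”, “3”, and “4”.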
  • The synchronization monitoring circuit 1221 determines whether the first or second determined chain group includes the second chain group, which is the target of the second arithmetic processing among the plurality of chains.
  • The first determined chain group is the first chain group in a relationship in which the second arithmetic processing is executed consecutively after the first arithmetic processing.
  • The second determined chain group is a chain group in a relationship in which, after the first arithmetic processing, one or more third arithmetic processings are executed and then the second arithmetic processing is executed. It is obtained by applying a certain calculation with the third chain group to the first chain group.
  • Here, the first chain group is the target of the first arithmetic processing among the plurality of chains, the second chain group is the target of the second arithmetic processing among the plurality of chains, and the third chain group is the target of the third arithmetic processing among the plurality of chains.
  • The initialization signal generation circuit 1222 generates an initialization instruction for initializing the identifier of the second arithmetic processing when neither the first nor the second determined chain group includes the second chain group.
  • The multicast acquisition circuit 322 acquires an initialized identifier for the second arithmetic processing when the initialization instruction is issued, and acquires, for the second arithmetic processing, an identifier that follows the identifier of the first arithmetic processing when the initialization instruction is not issued.
  • The multicast acquisition circuit 322 notifies the other read/write units 125 among the plurality of read/write units 125 of the acquired identifier.
  • As a result, the read/write unit 125 in charge of issuing the memory read request can recognize that the REQIDs acquired by the respective read/write units 125 are unified.
  • The instruction division circuit 31 divides the instruction related to the second arithmetic processing into a plurality of instructions.
  • For each divided instruction, the multicast acquisition circuit 322 acquires an identifier continuous with the identifier acquired immediately before.
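  • The combination of instruction division and continuous identifiers can be illustrated as follows. This is a sketch under assumptions: the division size of 256 bytes, the pool of 8 REQIDs, the wrap-around at the end of the pool, and the function name `assign_reqids` do not come from the source.

```python
# Hypothetical sketch; division size and pool size are assumptions.
def assign_reqids(load_bytes, division_bytes, first_reqid, num_reqids):
    """Split a load into divisions and give each one the next REQID in sequence."""
    count = -(-load_bytes // division_bytes)  # ceiling division
    return [(i * division_bytes, (first_reqid + i) % num_reqids) for i in range(count)]


# A 1-kilobyte load split into four divisions, starting from REQID 6.
print(assign_reqids(1024, 256, first_reqid=6, num_reqids=8))
```

  • Each tuple pairs the byte offset of a division with its REQID, so consecutive divisions carry consecutive identifiers without any initialization in between.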

Abstract

The present invention comprises: a determination circuit (1221) which, for first to third chain groups to be subjected to first to third arithmetic processes respectively, defines as a first determined chain group the first chain group having a relationship in which the second arithmetic process is executed consecutively after the first arithmetic process; defines as a second determined chain group a chain group that is obtained by calculating the third chain group in a certain manner with respect to the first chain group and that has a relationship in which, after the first arithmetic process, one or more third arithmetic processes are executed and then the second arithmetic process is executed; and determines whether the first or second determined chain group includes a second chain group to be subjected to the second arithmetic process; a generation circuit (1222) which generates an instruction to initialize an identifier of the second arithmetic process when neither the first nor the second determined chain group includes the second chain group; and an acquisition circuit (322) which acquires an initialized identifier for the second arithmetic process when the initialization instruction is issued, and acquires, for the second arithmetic process, an identifier continuous with the identifier of the first arithmetic process when the initialization instruction is not issued.
PCT/JP2018/040345 2018-10-30 2018-10-30 Arithmetic processing device and control method therefor WO2020090009A1 (fr)

Priority Applications (2)

Application Number Priority Date Filing Date Title
JP2020554648A JP7036226B2 (ja) 2018-10-30 2018-10-30 Arithmetic processing device and control method therefor
PCT/JP2018/040345 WO2020090009A1 (fr) 2018-10-30 2018-10-30 Arithmetic processing device and control method therefor

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2018/040345 WO2020090009A1 (fr) 2018-10-30 2018-10-30 Arithmetic processing device and control method therefor

Publications (1)

Publication Number Publication Date
WO2020090009A1 (fr) 2020-05-07

Family

ID=70462017

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2018/040345 Arithmetic processing device and control method therefor WO2020090009A1 (fr) 2018-10-30 2018-10-30

Country Status (2)

Country Link
JP (1) JP7036226B2 (fr)
WO (1) WO2020090009A1 (fr)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2001175632A (ja) * 2000-10-10 2001-06-29 Hitachi Ltd Multiprocessor system
WO2011121709A1 (fr) * 2010-03-29 2011-10-06 株式会社東芝 Semiconductor device
JP2017161954A (ja) * 2014-07-31 2017-09-14 三菱電機株式会社 Data processing system, data processing method, and program

Also Published As

Publication number Publication date
JPWO2020090009A1 (ja) 2021-09-24
JP7036226B2 (ja) 2022-03-15

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application

Ref document number: 18938353

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2020554648

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

122 EP: PCT application non-entry in European phase

Ref document number: 18938353

Country of ref document: EP

Kind code of ref document: A1