WO2016114532A1 - Program compilation device and method - Google Patents


Publication number
WO2016114532A1
Authority
WO
WIPO (PCT)
Prior art keywords
circuit
unit circuit
loop
unit
syntax
Application number
PCT/KR2016/000228
Other languages
English (en)
Korean (ko)
Inventor
이재진
조강원
Original Assignee
서울대학교 산학협력단 (Seoul National University R&DB Foundation)
Priority claimed from KR1020150190701A (published as KR101737785B1)
Application filed by 서울대학교 산학협력단
Publication of WO2016114532A1


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38 Concurrent instruction execution, e.g. pipeline, look ahead

Definitions

  • the present invention relates to a program compilation device and a program compilation method, and more particularly, to an apparatus and method for compiling an OpenCL program for an FPGA.
  • OpenCL: Open Computing Language
  • the OpenCL platform consists of a host processor and one or more compute devices connected to it.
  • CUs: compute units
  • PEs: processing elements
  • the host processor corresponds to a CPU, and the operating system runs on the host processor.
  • the computing device corresponds to a multicore CPU or accelerator (GPU, Intel Xeon Phi coprocessor, FPGA, etc.).
  • When an FPGA is used, the CPU serves as the host processor and the FPGA serves as the computing device.
  • The FPGA processes, in parallel, the work items assigned to the computing device, according to the commands of the host program running on the CPU.
  • One approach creates a circuit corresponding to one work item and then uses pipelining to divide that circuit into several pipeline stages, each of which processes a different work item at the same time.
  • Alternatively, several copies of the circuit corresponding to one work item (hereinafter referred to as “unit circuits” for convenience) may be fabricated, so that different work items are processed independently in each unit circuit.
  • However, when the kernel contains a loop, the number of work items that a unit circuit can process simultaneously drops to the number of pipeline stages that execute the loop body, rather than the total number of pipeline stages. Until the work items in the loop's pipeline stages finish the loop, the circuit stalls instead of accepting new work items, which reduces FPGA utilization.
  • Korean Patent Laid-Open No. 10-2014-0097548 relates to a software library for a heterogeneous parallel processing platform.
  • the library source code is compiled into an intermediate representation and distributed to the end user computing system.
  • the CPU of the computer system compiles the intermediate representation of the library into binaries that run on the GPU, runs the host application that calls the kernel, and sends the kernel retrieved from the binaries to the GPU.
  • However, this prior art document does not solve the problem described above.
  • The background art described above is technical information that the inventors possessed for deriving the present invention or acquired in the course of deriving it, and is not necessarily a publicly known technique disclosed to the general public before the filing of the present application.
  • One embodiment of the present invention is to provide an apparatus and method for compiling an OpenCL program for an FPGA.
  • Another object of an embodiment of the present invention is to minimize unnecessary circuitry.
  • According to a first aspect, a program compilation device for compiling an OpenCL program may include: a syntax separation unit that separates the OpenCL kernel into the statements before the loop, the statements inside the loop, and the statements after the loop; a circuit generation unit that generates a circuit corresponding to each of these parts; and a language generation unit that expresses the generated circuit in a hardware description language.
  • According to another aspect, a method of compiling an OpenCL program by a program compilation device comprises: separating an OpenCL kernel into the statements before a loop, the statements within the loop, and the statements after the loop; generating a circuit corresponding to each of these parts; and expressing the generated circuit in a hardware description language.
  • According to another aspect, a computer-readable recording medium has recorded thereon a program for executing a program compilation method, the method comprising: separating the OpenCL kernel into the statements before a loop, the statements inside the loop, and the statements after the loop;
  • generating a circuit corresponding to each of these parts; and expressing the generated circuit in a hardware description language.
  • According to another aspect, a computer program executed by a program compilation device and stored in a recording medium performs a program compilation method that may include: separating the OpenCL kernel into the statements before the loop, inside the loop, and after the loop; generating circuits corresponding to these parts; and expressing the generated circuits in a hardware description language.
  • According to an embodiment of the present invention, the FPGA is used to replicate more copies of the circuit that processes the loops in the kernel called by the host program through OpenCL,
  • so that the performance bottleneck caused by the loop is reduced and degradation of system performance is prevented.
  • According to any one of the solutions of the present invention, for an OpenCL kernel containing a loop, system performance can be increased by replicating more copies of the circuit that processes the loop instead of replicating the circuit that processes the code outside the loop.
  • According to any one of the solutions of the present invention, minimizing unnecessary circuitry allows the FPGA hardware to be used efficiently, resulting in lower power consumption and higher performance.
  • FIG. 1 is a block diagram of an OpenCL platform system according to an embodiment of the present invention.
  • FIG. 2 is a block diagram illustrating a program compilation apparatus according to an embodiment of the present invention.
  • FIG. 3 is a flowchart illustrating a program compilation method according to an embodiment of the present invention.
  • FIGS. 4 and 5 are exemplary views for explaining a program compilation method according to an embodiment of the present invention.
  • FIG. 1 is a block diagram of an OpenCL platform system 100 according to an embodiment of the present invention.
  • the OpenCL platform system 100 may execute an Open Computing Language (OpenCL) application.
  • Such a system 100 may include a host processor 10 and one or more compute devices 20.
  • the host processor 10 may correspond to a CPU
  • the computing device 20 may be a multicore CPU or an accelerator (GPU, Intel Xeon Phi coprocessor, FPGA, etc.).
  • the host processor 10 may execute an operating system, execute a host program constituting an OpenCL application, and control the computing device 20 using OpenCL API functions according to the host program.
  • the computing device 20 may be composed of one or more compute units (CUs) 21, and each computation unit may in turn be composed of one or more processing elements (PEs) 22.
  • the computing device 20 may have three types of memory, namely, a device memory 23, a local memory 24, and a private memory 25.
  • The device memory 23 area consists of global memory and constant memory and can be shared by all PEs 22, and the local memory 24 area can be assigned independently to each compute unit 21,
  • while the private memory 25 area may be allocated independently to each PE 22.
  • On receiving a command from the host program, the computing device 20 executes a kernel of the OpenCL program and can copy data from the main memory 11 to the device memory 23, or vice versa.
  • The OpenCL program constituting the OpenCL application consists of a number of kernel functions written in OpenCL C, a C-like language, and may be executed on the computing device 20.
  • the host program executed in the host processor 10 may define an N-dimensional index space called NDRange while giving a command to execute a kernel of the OpenCL program.
  • each index of the NDRange may be referred to as a work item, and the work items may be classified into a unit called a work group.
  • the computing device 20 may create a kernel instance that is a thread that executes a kernel function for each work item according to a kernel execution command.
  • Each work group may be executed on one or more compute units 21 of the computing device 20, and the individual work items in each work group may each be executed on a PE 22 of that compute unit 21.
  • When the computing device 20 of the OpenCL platform system 100 is an FPGA,
  • an FPGA circuit capable of executing an OpenCL program must be implemented.
  • Specifically, the FPGA circuit must take a kernel function of the OpenCL program as input and execute the work items of that kernel function in parallel.
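The NDRange bookkeeping described above can be sketched in a few lines of Python. This is an illustrative model only, not part of the disclosed device: the helper `enumerate_ndrange` is hypothetical, and it assumes (as OpenCL 1.x requires) that each global size is a multiple of the work-group size and that no global offset is used. Under those assumptions, the IDs it computes correspond to what `get_global_id`, `get_group_id` and `get_local_id` return inside an OpenCL C kernel.

```python
from itertools import product


def enumerate_ndrange(global_size, local_size):
    """Group every work item of a (hypothetical) NDRange by its work group.

    global_size and local_size are tuples with one entry per dimension.
    Returns {group_id: [(global_id, local_id), ...]}, assuming no global offset,
    so that global_id == group_id * local_size + local_id in every dimension.
    """
    if len(global_size) != len(local_size):
        raise ValueError("global and local sizes must have the same rank")
    for g, l in zip(global_size, local_size):
        if l <= 0 or g % l != 0:
            raise ValueError("each global size must be a positive multiple of the local size")

    groups = {}
    # product() walks the index space in row-major order, one tuple per work item.
    for gid in product(*(range(g) for g in global_size)):
        group_id = tuple(i // l for i, l in zip(gid, local_size))
        local_id = tuple(i % l for i, l in zip(gid, local_size))
        groups.setdefault(group_id, []).append((gid, local_id))
    return groups
```

For a 4 x 6 NDRange with 2 x 3 work groups, the sketch yields four work groups of six work items each; the first work item of group `(1, 1)` has global ID `(2, 3)` and local ID `(0, 0)`.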
  • A circuit structure may be expressed in a hardware description language such as Verilog or VHDL, and turning that description into an FPGA circuit configuration is called logic synthesis.
  • High-level synthesis consists of creating a circuit structure from a program written in a high-level language and then applying logic synthesis to turn that circuit structure into an FPGA circuit configuration.
  • These two processes are separated only conceptually and need not be divided into two distinct steps.
  • For example, high-level synthesis may be performed by a single piece of software.
  • Likewise, the hardware description language used while constructing the circuit structure need not be a human-readable language such as Verilog or VHDL; it may be an intermediate representation.
  • the program compilation device may be any component on the OpenCL platform system 100, and may also be a component located outside the OpenCL platform system 100.
  • the host processor 10 will be described as being a program compilation device.
  • FIG. 2 is a block diagram illustrating a program compilation device 20 according to an embodiment of the present invention.
  • the program compilation apparatus 20 may include a graph generator 210 generating a control flow graph based on the OpenCL kernel.
  • the graph generator 210 may represent all paths that the OpenCL kernel can traverse during execution as a control flow graph using graph notation.
  • the program compilation device 20 may further include a syntax separation unit 220 for separating the OpenCL kernel into the syntax before the loop, the statement inside the loop and the statement after the loop.
  • The syntax separation unit 220 may search for a loop in the control flow graph generated by the graph generator 210 and, based on that loop, identify the statements before the loop, the statements within the loop, and the statements after the loop.
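A minimal sketch of how the syntax separation unit 220 might locate a loop and split the kernel, assuming the kernel has already been lowered to a control flow graph of basic blocks given as a successor map. All function names are hypothetical. The loop search uses the textbook back-edge and natural-loop technique, which is only guaranteed to find natural loops in reducible CFGs, and the sketch further assumes a single loop through which every path from the entry block passes.

```python
def find_back_edges(cfg, entry):
    """Return edges (tail, header) whose header is still on the DFS stack."""
    back_edges, on_stack, visited = [], set(), set()

    def dfs(node):
        visited.add(node)
        on_stack.add(node)
        for succ in cfg.get(node, []):
            if succ in on_stack:          # edge back to an ancestor: a loop
                back_edges.append((node, succ))
            elif succ not in visited:
                dfs(succ)
        on_stack.discard(node)

    dfs(entry)
    return back_edges


def natural_loop(cfg, tail, header):
    """Blocks of the natural loop formed by the back edge tail -> header."""
    preds = {}
    for block, succs in cfg.items():
        for succ in succs:
            preds.setdefault(succ, []).append(block)
    body, work = {header}, [tail]
    while work:                           # walk predecessors back to the header
        block = work.pop()
        if block not in body:
            body.add(block)
            work.extend(preds.get(block, []))
    return body


def reachable(cfg, start, blocked=frozenset()):
    """Blocks reachable from start without entering any block in `blocked`."""
    seen, work = set(), [start]
    while work:
        block = work.pop()
        if block in seen or block in blocked:
            continue
        seen.add(block)
        work.extend(cfg.get(block, []))
    return seen


def split_around_loop(cfg, entry):
    """Partition the basic blocks into (before, loop, after) for the first loop found."""
    back_edges = find_back_edges(cfg, entry)
    if not back_edges:
        raise ValueError("the kernel contains no loop")
    tail, header = back_edges[0]
    loop = natural_loop(cfg, tail, header)
    before = reachable(cfg, entry, blocked=loop)
    after = reachable(cfg, entry) - before - loop
    return before, loop, after
```

For a CFG `{"entry": ["prep"], "prep": ["header"], "header": ["body", "exit"], "body": ["header"], "exit": []}`, the sketch returns `{"entry", "prep"}` before the loop, `{"header", "body"}` inside it, and `{"exit"}` after it, which would map onto the first, second and third unit circuits respectively.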
  • the program compilation device 20 may further include a circuit generation unit 230 for generating a circuit corresponding to each of the syntax separated by the syntax separator 220.
  • the circuit generation unit 230 may generate a first unit circuit corresponding to the syntax before the loop, a second unit circuit corresponding to the syntax in the loop, and a third unit circuit corresponding to the syntax after the loop.
  • The circuit generation unit 230 may generate the second unit circuit by duplicating it more than a predetermined number of times, depending on the FPGA capacity.
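The replication policy can be illustrated with a simple area budget. The model below is a hypothetical simplification, not taken from the patent: circuit sizes are abstract area units, exactly one first unit circuit, one third unit circuit and two control circuits are assumed, and routing, timing, and the growth of the control circuits with the number of replicas are ignored. It contrasts loop-only replication with naively replicating the whole per-work-item circuit.

```python
def max_loop_replicas(capacity, area_before, area_loop, area_after,
                      area_control, min_replicas=1):
    """How many second (loop) unit circuits fit after the shared parts are placed.

    Hypothetical model: one first unit circuit, one third unit circuit and two
    control circuits of fixed size; everything left over is spent on loop copies.
    """
    remaining = capacity - area_before - area_after - 2 * area_control
    replicas = max(remaining, 0) // area_loop
    if replicas < min_replicas:
        raise ValueError("not enough FPGA capacity for the required loop circuits")
    return replicas


def naive_replicas(capacity, area_before, area_loop, area_after):
    """Copies of the whole per-work-item circuit that fit when nothing is shared."""
    return capacity // (area_before + area_loop + area_after)


print(max_loop_replicas(100_000, 5_000, 12_000, 3_000, 1_000))  # 7
print(naive_replicas(100_000, 5_000, 12_000, 3_000))            # 5
```

With these invented numbers, seven copies of the loop circuit fit instead of five full copies of the work-item circuit; the actual gain depends on how large the loop body is relative to the code outside the loop.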
  • the circuit generation unit 230 may additionally generate a first control circuit between the first unit circuit and the second unit circuit and combine the first unit circuit and the second unit circuit, respectively.
  • The first control circuit may examine a first signal value indicating that a new work item can enter the second unit circuit; if it determines that a new work item can enter, it may transmit the calculation result computed by the first unit circuit to that second unit circuit together with the ID of the corresponding work item.
  • the circuit generation unit 230 may additionally generate a second control circuit between the second unit circuit and the third unit circuit and combine it with each of the second unit circuit and the third unit circuit.
  • The second control circuit may check a second signal value indicating that the work item exits the loop and should continue executing in the third unit circuit; if it determines that the work item should exit the second unit circuit, it may transmit the calculation result computed by the second unit circuit to the third unit circuit together with the ID of the corresponding work item.
  • the program compilation device 20 may further include a language generator 240 for expressing a circuit generated by the circuit generator 230 in a hardware description language.
  • The term '~unit' used herein means software or a hardware component such as an FPGA or an ASIC that performs certain roles, and it can also be a CPU, GPU, or the like.
  • However, '~unit' is not limited to software or hardware.
  • A '~unit' may be configured to reside in an addressable storage medium or may be configured to execute on one or more processors.
  • For example, '~unit' includes components such as software components, object-oriented software components, class components, and task components, as well as processes, functions, attributes, procedures, subroutines, segments of program code, drivers, firmware, microcode, circuits, data, databases, data structures, tables, arrays, and variables.
  • The functionality provided within the components and '~units' may be combined into a smaller number of components and '~units' or further separated into additional components and '~units'.
  • In addition, components and '~units' may be implemented to execute on one or more CPUs in a device or a secure multimedia card.
  • FIG. 3 is a flowchart illustrating a program compilation method according to an embodiment of the present invention.
  • the program compilation method according to the embodiment shown in FIG. 3 includes steps processed in time series by the program compilation device 10 shown in FIG. 2. Therefore, even if omitted below, the above descriptions of the program compilation device 10 shown in FIG. 2 may be applied to the program compilation method according to the embodiment shown in FIG. 3.
  • FIGS. 4 and 5 are exemplary views for explaining a program compilation method according to an embodiment of the present invention.
  • FIG. 4 illustrates an FPGA implementing an OpenCL kernel that includes a loop statement, according to an embodiment of the present invention.
  • FIG. 5 illustrates an FPGA implementing an OpenCL kernel that includes a plurality of loops, according to an embodiment of the present invention.
  • the program compilation device 10 may separate the OpenCL kernel into the part before the loop 40, the loop body 41, and the part after the loop 42 (S310).
  • the program compilation device 10 may generate a control flow graph (CFG) from the OpenCL kernel, and may search for loops and separate the kernel using the generated control flow graph.
  • CFG: control flow graph
  • the program compilation device 10 may generate a circuit structure for each part of the separated kernel (S320).
  • The part of the OpenCL kernel before the loop 40 may be implemented as one first unit circuit 43, the loop body 41 as three second unit circuits 44, and the part after the loop 42 as one third unit circuit 45.
  • the first to third unit circuits 43, 44, and 45 may simultaneously execute a plurality of work items using a pipelining technique.
  • Although each of the first to third unit circuits illustrated in FIG. 4 is assumed to have two pipeline stages, in other embodiments a unit circuit may be implemented with one or more pipeline stages.
  • Each of the first to third unit circuits 43, 44, and 45 may consist of multiple pipeline stages, and each pipeline stage may store in a register the ID and calculation result of the work item it is currently executing. On every clock cycle, each pipeline stage reads the work-item ID and calculation result of the previous pipeline stage and performs its calculation for that work item.
  • The execution result of the last pipeline stage of the second unit circuit 44 may include the condition value of the loop statement. For example, if the value is true, the work item runs again from the first pipeline stage of the second unit circuit 44.
  • If the value is false, the work item exits the loop, and the second unit circuit 44 can then execute a new work item from the first unit circuit 43.
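The loop-back behaviour of the second unit circuit can be modelled cycle by cycle. In this hypothetical sketch, each pipeline register holds a `(work_item_id, iterations_left)` pair, the loop body is reduced to decrementing that counter, the loop condition is evaluated in the last stage as described above, and a recirculating work item takes priority over a new one at the first stage. The function name and trip counts are illustrative inputs, not data from the patent.

```python
from collections import deque


def simulate_loop_circuit(trip_counts, num_stages):
    """Cycle-level sketch of one second unit circuit (the loop body).

    trip_counts[i] is how many times work item i executes the loop body
    (do-while style, so at least once). Each pipeline stage register holds
    (work_item_id, iterations_left) or None. Returns (cycles, exit_order).
    """
    pending = deque(enumerate(trip_counts))   # work items waiting to enter
    stages = [None] * num_stages
    exit_order, cycle = [], 0
    while pending or any(s is not None for s in stages):
        cycle += 1
        recirculate = None
        last = stages[-1]
        if last is not None:
            wid, left = last
            left -= 1                         # last stage completes one iteration
            if left > 0:                      # condition true: back to stage 0
                recirculate = (wid, left)
            else:                             # condition false: leave the loop
                exit_order.append(wid)
        # Every stage takes the ID and partial result of the previous stage.
        for k in range(num_stages - 1, 0, -1):
            stages[k] = stages[k - 1]
        if recirculate is not None:
            stages[0] = recirculate
        elif pending:
            stages[0] = pending.popleft()     # a new work item enters
        else:
            stages[0] = None
    return cycle, exit_order
```

For example, `simulate_loop_circuit([3, 1], 2)` returns `(7, [1, 0])`: work item 1 leaves the loop before work item 0, which is why each pipeline stage carries the work-item ID alongside its partial result.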
  • The first unit circuit 43 and the third unit circuit 45 may each be implemented only once, without duplication, while the second unit circuit 44 may be duplicated as many times as the FPGA capacity allows.
  • Minimizing the number of first unit circuits 43 and third unit circuits 45 and maximizing the number of second unit circuits 44 maximizes the advantages of the present invention.
  • The first unit circuit 43 for the part before the loop, the second unit circuits 44 for the loop, and the third unit circuit 45 for the part after the loop may be connected using the first and second control circuits 50 and 51, respectively (S330).
  • That is, the control circuits 50 and 51 may be placed between the unit circuits, and the first to third unit circuits 43, 44, and 45 may each be coupled to the control circuits 50 and 51.
  • The first unit circuit 43 and the second unit circuits 44 are coupled one-to-many by the first control circuit 50, and the second unit circuits 44 and the
  • third unit circuit 45 are coupled many-to-one by the second control circuit 51.
  • The first and second control circuits 50 and 51 coupled to the first to third unit circuits 43, 44, and 45 may operate as follows.
  • The first control circuit 50 may receive, from each second unit circuit 44, a first signal indicating that a new work item can enter it.
  • The first signal may be, for example, 1 when the loop condition value is false and 0 when it is true. It is 0 when the second unit circuit 44 is stalled, and 1 when the last pipeline stage of the second unit circuit 44 is empty.
  • The first control circuit 50 may examine the first signal values of all the second unit circuits 44.
  • The ID and calculation result of the work item may then be transmitted to the input of a second unit circuit 44 whose first signal is 1 (that is, one that can execute a new work item). If several second unit circuits 44 have a first signal of 1, one of them may be selected in a predetermined order.
  • the first control circuit 50 may send a stall signal to the first unit circuit 43 to pause execution of the work item.
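The first control circuit's decision can be sketched as two small functions. The signal encoding follows the example values given above; treating a stall as overriding an empty last stage, using lowest index first as the "predetermined order", and returning `None` to mean "stall the first unit circuit" are assumptions of this sketch, and the function names are hypothetical.

```python
from typing import List, Optional


def first_signal(last_stage_empty: bool, cond_true: bool, stalled: bool) -> int:
    """First signal of one second unit circuit: 1 if a new work item may enter."""
    if stalled:                 # a stalled circuit cannot accept anything
        return 0
    if last_stage_empty:        # nothing will recirculate into stage 0
        return 1
    # A true loop condition means the last-stage work item re-enters stage 0.
    return 0 if cond_true else 1


def first_control(signals: List[int]) -> Optional[int]:
    """Pick the target second unit circuit in a fixed, lowest-index-first order.

    Returns the index of the second unit circuit that receives the work-item ID
    and calculation result from the first unit circuit, or None when no second
    unit circuit can accept it (the first unit circuit is then stalled).
    """
    for index, value in enumerate(signals):
        if value == 1:
            return index
    return None
```

With three second unit circuits where unit 0 is recirculating, unit 1 is stalled and unit 2 has an empty last stage, the signals are `[0, 0, 1]` and the work item is routed to unit 2.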
  • The second control circuit 51 may receive, from each second unit circuit 44, a second signal indicating that the work item in its last pipeline stage exits the loop, and may examine the second signal values of all the second unit circuits 44.
  • The second control circuit 51 may transmit the work-item ID and calculation result held in the last pipeline stage of a second unit circuit 44 whose second signal is 1
  • to the input of the third unit circuit 45. If several second unit circuits 44 have a second signal of 1, one of them may be selected in an arbitrary order,
  • and a stall signal may be sent to the remaining second unit circuits so that they pause.
  • The second signal may be, for example, 1 when the condition value determined in the last pipeline stage of the second unit circuit 44 is false and 0 when it is true.
  • When the second unit circuit 44 is stalled, its first signal becomes 0 while its second signal keeps its original value; when the last pipeline stage of the second unit circuit is empty, the second signal is 0.
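The second control circuit can be sketched the same way. The encoding of the second signal follows the example above; the function names are hypothetical, and the optional `random.Random` argument stands in for the "arbitrary order" in which one exiting work item is chosen (without it, the lowest index wins). This is an illustrative model, not the patented circuit.

```python
import random
from typing import List, Optional, Tuple


def second_signal(last_stage_empty: bool, cond_true: bool) -> int:
    """Second signal of one second unit circuit: 1 if its last-stage work item exits.

    A stall leaves this value unchanged, since the stalled last stage keeps
    holding the same work item and condition value.
    """
    if last_stage_empty:
        return 0
    # A false loop condition means the work item continues in the third unit circuit.
    return 0 if cond_true else 1


def second_control(signals: List[int],
                   rng: Optional[random.Random] = None) -> Tuple[Optional[int], List[int]]:
    """Choose which exiting work item is forwarded to the third unit circuit.

    Returns (winner, stalled): `winner` is the index of the second unit circuit
    whose last-stage ID and result go to the third unit circuit (None if no work
    item is exiting); `stalled` lists the other circuits that also want to exit
    and therefore receive a stall signal.
    """
    candidates = [i for i, value in enumerate(signals) if value == 1]
    if not candidates:
        return None, []
    winner = rng.choice(candidates) if rng is not None else candidates[0]
    stalled = [i for i in candidates if i != winner]
    return winner, stalled
```

For example, `second_control([1, 0, 1])` forwards the work item from unit 0 and stalls unit 2; in this model unit 2's first signal would then drop to 0 while its second signal stays 1 until it is served.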
  • FIG. 4 assumes that an OpenCL kernel includes one loop, but is not necessarily limited thereto.
  • When the kernel contains several loops, it may be divided into several parts based on each loop, and each part may be implemented as a separate unit circuit.
  • FIG. 5 is an exemplary diagram of an FPGA implementing an OpenCL kernel including a plurality of loops.
  • For example, an OpenCL kernel may be divided based on first and second loops into the part before the first loop (500), the first loop (501), and the part after the first loop (502); based on the second loop (503), the part after the first loop (502) is also the part before the second loop, which is followed by the part after the second loop (504).
  • The control flow graph may be used to search for the first and second loops and to separate the kernel.
  • The program compilation device 10 implements each divided part as an FPGA circuit structure in the same manner as described for step S320, and connects the unit circuits thus implemented using the first to fourth control circuits 505, 506, 507, and 508, respectively.
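For kernels with several loops, the division into parts can be illustrated on a flattened list of top-level statements. This is a hypothetical simplification: a real compiler would split the control flow graph as described above, nested loops are not handled, and empty straight-line parts (for example, between two back-to-back loops) are simply dropped. One control circuit is assumed between every pair of adjacent parts; the function name and statement strings are illustrative.

```python
def partition_kernel(statements):
    """Split a kernel's top-level statements at each loop.

    statements: list of ("stmt", text) or ("loop", text) items (hypothetical
    representation). Returns a list of ("straight", [...]) and ("loop", [...])
    parts in program order; each part would become its own unit circuit.
    """
    parts, current = [], []
    for kind, text in statements:
        if kind == "loop":
            if current:                       # close the straight-line part
                parts.append(("straight", current))
                current = []
            parts.append(("loop", [text]))
        else:
            current.append(text)
    if current:
        parts.append(("straight", current))
    return parts


kernel = [
    ("stmt", "int gid = get_global_id(0);"),
    ("loop", "for (int i = 0; i < n; i++) { ... }"),
    ("stmt", "acc *= 2;"),
    ("loop", "for (int j = 0; j < m; j++) { ... }"),
    ("stmt", "out[gid] = acc;"),
]
parts = partition_kernel(kernel)
control_circuits = len(parts) - 1            # one between each adjacent pair
print([kind for kind, _ in parts], control_circuits)
```

For a kernel shaped like FIG. 5 (statements, first loop, statements, second loop, statements) this yields five parts and four control circuits, matching the count of control circuits 505 to 508.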
  • the program compilation device 10 may express the circuit structure generated in step S330 in a hardware description language (S340).
  • the program compilation method according to the embodiment described with reference to FIG. 3 may also be implemented in the form of a recording medium including instructions executable by a computer, such as a program module executed by the computer.
  • Computer readable media can be any available media that can be accessed by a computer and includes both volatile and nonvolatile media, removable and non-removable media.
  • computer readable media may include both computer storage media and communication media.
  • Computer storage media includes both volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data.
  • Communication media typically includes computer readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave, or other transmission mechanism, and includes any information delivery media.
  • the program compilation method may be implemented as a computer program (or computer program product) including instructions executable by a computer.
  • the computer program includes programmable machine instructions processed by the processor and may be implemented in a high-level programming language, an object-oriented programming language, an assembly language, or a machine language.
  • the computer program may also be recorded on tangible computer readable media (eg, memory, hard disks, magnetic / optical media or solid-state drives, etc.).
  • the program compilation method may be implemented by executing the computer program as described above by the computing device.
  • the computing device may include at least a portion of a processor, a memory, a storage device, a high speed interface connected to the memory and a high speed expansion port, and a low speed interface connected to the low speed bus and the storage device.
  • Each of these components is connected to the others using various buses and may be mounted on a common motherboard or in another suitable manner.
  • The processor may process instructions within the computing device, including instructions stored in the memory or storage device, for example to display graphical information for a graphical user interface (GUI) on an external input/output device such as a display connected to the high-speed interface. In other embodiments, multiple processors and/or multiple buses may be used, as appropriate, together with multiple memories and memory types.
  • the processor may also be implemented as a chipset consisting of chips comprising a plurality of independent analog and / or digital processors.
  • the memory also stores information within the computing device.
  • the memory may consist of a volatile memory unit or a collection thereof.
  • the memory may consist of a nonvolatile memory unit or a collection thereof.
  • the memory may also be other forms of computer readable media, such as, for example, magnetic or optical disks.
  • the storage device can provide a large amount of storage space to the computing device.
  • the storage device may be a computer readable medium or a configuration including such a medium, and may include, for example, devices or other configurations within a storage area network (SAN), and may include a floppy disk device, a hard disk device, an optical disk device, Or a tape device, flash memory, or similar other semiconductor memory device or device array.
  • SAN: storage area network

Abstract

The present invention relates to a program compilation device and method. According to a first aspect of the present invention, a program compilation device for compiling an OpenCL program comprises: a syntax division unit configured to divide an OpenCL kernel into the syntax before a loop statement, the syntax within the loop statement, and the syntax after the loop statement; a circuit generation unit configured to generate a circuit corresponding to each syntax; and a language generation unit configured to express the generated circuit in a hardware description language.
PCT/KR2016/000228 2015-01-16 2016-01-11 Program compilation device and method WO2016114532A1 (fr)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
KR20150007904 2015-01-16
KR10-2015-0007904 2015-01-16
KR10-2015-0190701 2015-12-31
KR1020150190701A KR101737785B1 (ko) 2015-01-16 2015-12-31 Program compilation device and program compilation method

Publications (1)

Publication Number Publication Date
WO2016114532A1 (fr) 2016-07-21

Family

Family ID: 56406035

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/KR2016/000228 WO2016114532A1 (fr) 2015-01-16 2016-01-11 Program compilation device and method

Country Status (1)

Country Link
WO (1) WO2016114532A1 (fr)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130036409A1 (en) * 2011-08-02 2013-02-07 International Business Machines Corporation Technique for compiling and running high-level programs on heterogeneous computers
JP2013164847A (ja) * 2012-02-09 2013-08-22 Altera Corp Configuration of programmable devices using a high-level language
US20130346953A1 (en) * 2012-06-22 2013-12-26 Altera Corporation Opencl compilation
KR20140119619A (ko) * 2013-03-29 2014-10-10 Samsung Electronics Co., Ltd. Apparatus and method for generating vector code
JP2014225194A (ja) * 2013-05-17 2014-12-04 University of Tsukuba Hardware design device and hardware design program


Similar Documents

Publication Publication Date Title
BR112019015271B1 (pt) Computer-implemented method, distributed hardware tracing system and storage unit.
US20120216021A1 (en) Performing An All-To-All Data Exchange On A Plurality Of Data Buffers By Performing Swap Operations
US7646721B2 (en) Locating hardware faults in a data communications network of a parallel computer
US20070245122A1 (en) Executing an Allgather Operation on a Parallel Computer
WO2020138663A1 (fr) Device for RISC-V-based operation including fast hardware-based operation supporting a user-defined instruction set, and associated method
US8954943B2 (en) Analyze and reduce number of data reordering operations in SIMD code
US11609798B2 (en) Runtime execution of configuration files on reconfigurable processors with varying configuration granularity
US8484440B2 (en) Performing an allreduce operation on a plurality of compute nodes of a parallel computer
US20120079133A1 (en) Routing Data Communications Packets In A Parallel Computer
KR920704231A (ko) Cluster architecture for a highly parallel scalar/vector multiprocessor system
US8533390B2 (en) Circular buffer in a redundant virtualization environment
WO2016064158A1 (fr) Reconfigurable processor and associated operating method
US11182264B1 (en) Intra-node buffer-based streaming for reconfigurable processor-as-a-service (RPaaS)
US8548966B2 (en) Asynchronous assertions
BR112019027531A2 (pt) High-throughput processors
RU2016121724A (ru) Parallel computing system architecture
WO2015130093A1 (fr) Method and apparatus for preventing block conflict in a memory
WO2016114532A1 (fr) Program compilation device and method
WO2013027951A1 (fr) Method and apparatus for allocating interrupts in a multi-core system
WO2022124720A1 (fr) Method for detecting errors in the kernel memory of the operating system in real time
US9571329B2 (en) Collective operation management in a parallel computer
WO2021137669A1 (fr) Method for generating a program for a deep-learning accelerator
CN112631955B (zh) Data processing method and apparatus, electronic device, and medium
US20130219410A1 (en) Processing Unexpected Messages At A Compute Node Of A Parallel Computer
US8930956B2 (en) Utilizing a kernel administration hardware thread of a multi-threaded, multi-core compute node of a parallel computer

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16737496

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 16737496

Country of ref document: EP

Kind code of ref document: A1