EP3724779A1 - Processor architectures - Google Patents
Processor architectures (original title: Architectures de processeur)
- Publication number
- EP3724779A1 (application number EP18833085.6A)
- Authority
- EP
- European Patent Office
- Prior art keywords
- architecture
- processor
- instructions
- data
- architectures
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/76—Architectures of general purpose stored program computers
- G06F15/78—Architectures of general purpose stored program computers comprising a single central processing unit
- G06F15/7867—Architectures of general purpose stored program computers comprising a single central processing unit with reconfigurable architecture
- G06F15/7871—Reconfiguration support, e.g. configuration loading, configuration switching, or hardware OS
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/76—Architectures of general purpose stored program computers
- G06F15/80—Architectures of general purpose stored program computers comprising an array of processing units with common control, e.g. single instruction multiple data processors
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/40—Transformation of program code
- G06F8/41—Compilation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/40—Transformation of program code
- G06F8/41—Compilation
- G06F8/44—Encoding
- G06F8/447—Target code generation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/40—Transformation of program code
- G06F8/41—Compilation
- G06F8/47—Retargetable compilers
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/44—Arrangements for executing specific programs
- G06F9/445—Program loading or initiating
- G06F9/44589—Program code verification, e.g. Java bytecode verification, proof-carrying code
Definitions
- the invention relates to the field of processors, in particular their intrinsic operation.
- processors have a defined architecture when they are designed.
- the architecture is at least partly defined by the implementation of a set of machine instructions that the processor can execute (or ISA, for "Instruction Set Architecture"). It is generally accepted that each known architecture can be classified into one of the following types (or classes), defined according to Flynn's taxonomy:
- SISD Single Instruction, Single Data
- SIMD Single Instruction, Multiple Data
- MISD Multiple Instructions, Single Data
- MIMD Multiple Instructions, Multiple Data
- the invention improves the situation.
- a processor comprising a control unit and a plurality of processing units, the processing units interacting according to an operating architecture dynamically imposed by the control unit among at least two of the following architectures and combinations of architectures:
- MISD architecture with multiple instruction flows and a single data flow
- the operating architecture is dynamically imposed by the control unit according to:
- Such a processor allows dynamic and contextual adaptation of its internal functioning.
- when the calculations to be carried out are independent of each other, they can be processed in parallel, that is, at the same time, by calculation units that are distinct from one another. The processing of the calculations as a whole is accelerated.
- for other calculations, parallel processing is not suitable.
- Recursive calculations are an example of calculations that are poorly suited to parallel processing: to perform a calculation, the result of a previous calculation is necessary. One or more calculation units must perform the calculations sequentially, one cycle after the other.
- Such a processor is versatile. Such a processor has an architecture that varies during the execution of the calculations, according to the calculations themselves.
- the method further comprises the following step:
- the compilation includes the inclusion in the machine code of configuration functions.
- the configuration functions are arranged to dynamically impose on a processor executing the machine code an architecture among at least two of the following architectures and combinations of architectures:
- SIMD architecture with a single instruction flow and multiple data flows
- MISD architecture with multiple instruction flows and a single data flow
- MIMD architecture with multiple instruction flows and multiple data flows
- the method of compiling a source code further comprises verifying compliance with a set of pre-established rules in the input processing instructions, the configuration functions included in the machine code during compilation also being selected according to whether or not these rules are complied with.
- a method for managing the architecture of a processor, implemented by computer means, comprising the following steps:
- an operating architecture as a function of said data to be processed and of the received processing instructions, the operating architecture being selected from at least two of the following architectures and combinations of the following architectures:
- MISD architecture with multiple instruction flows and a single data flow
- MIMD architecture with multiple instruction flows and multiple data flows
- a non-transitory recording medium readable by a computer on which is recorded a compilation program comprising instructions for the implementation of the method above.
- a compilation computer program comprising instructions for implementing the compilation method, when this program is executed by a processor.
- a non-transitory recording medium readable by a control unit of a processor, on which is recorded a set of machine instructions for the implementation of an architecture management method as defined herein.
- a set of machine instructions for implementing the architecture management method when the set of machine instructions is executed by the control unit of a processor.
- FIG. 1 partially shows an architecture of a processor according to the invention
- FIG. 2 shows a mode of operation of a processor according to the invention
- FIG. 3 shows a mode of operation of a processor according to the invention
- FIG. 4 shows a detail of operation of a processor according to the invention.
- FIG. 1 shows a processor 1, sometimes called the central processing unit or CPU for "Central Processing Unit".
- the processor 1 comprises:
- the processor 1 receives via the input-output unit 7 data to be processed ("data") and processing instructions ("instructions").
- the data and instructions are stored in the memory unit 9.
- the memory unit 9 can be divided into several parts.
- the memory unit 9 comprises a data part (or “data pool”) and an instruction part (or “instruction pool”).
- Each processing unit 5 performs the calculations on data and according to instructions derived from those stored in the memory unit 9.
- the control unit 3 imposes on each processing unit 5 the manner of carrying out the elementary calculations, in particular their order, and assigns to each computing unit of the processing unit 5 the operations to be performed.
- each processing unit 5, or PU for "Processing Unit", comprises several computing units: the arithmetic and logic units, or ALUs for "Arithmetic-Logic Unit".
- Each processing unit 5 comprises at least one ALU and at least one set of associated registers REG.
- the processing units 5 are numbered from PU 0 to PU N.
- Each ALU is numbered A.B, where "A" identifies the processing unit PU A to which the ALU belongs and "B" is an identifier of the ALU among the other ALUs of the processing unit PU A.
- the processor 1 comprises at least two ALUs distributed in two processing units 5.
- each processing unit 5 comprises a single ALU or a plurality of ALUs.
- each processing unit 5 comprises four ALUs numbered 0 to 3.
- the processing unit 5 is called multi-core.
- Each ALU can perform:
- the processing units 5 and the memory unit 9 interact according to one and/or another of the following three architectures:
- SIMD architecture with a single instruction flow and multiple data flows
- MISD architecture with multiple instruction flows and a single data flow
- MIMD architecture with multiple instruction flows and multiple data flows
- An example of a SIMD architecture is shown in FIG. 2.
- the processing units 5 interact according to the SIMD architecture.
- the data to be processed are copied (loaded) from the memory unit 9 to each of the register sets REG 0, REG 1, ..., REG N of the corresponding processing unit 5.
- the ALUs perform the calculations.
- the results are written to the register sets REG 0, REG 1, ..., REG N.
- the results are copied from the register sets REG 0, REG 1, ..., REG N to the memory unit 9.
- the processing units 5 do not exchange data directly with each other.
- the ALUs of each processing unit process the data and perform calculations independently of one processing unit 5 to another.
- the operation of the processor 1 is parallelized at the level of the processing units 5.
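The SIMD mode of FIG. 2 can be illustrated with a minimal Python sketch. The names `run_simd`, `memory` and `op` are hypothetical and do not come from the patent; each inner list stands for one register set REG 0, ..., REG N, and the sketch is a software analogy rather than a description of the hardware.

```python
from typing import Callable, List


def run_simd(memory: List[List[int]], op: Callable[[int], int]) -> List[List[int]]:
    """Apply the same operation in every processing unit, each on its own data."""
    # Load: one slice of the data is copied into the register set of each PU.
    register_sets = [list(chunk) for chunk in memory]
    # Compute: every PU applies the same instruction to its own registers.
    # No PU reads the registers of another PU.
    register_sets = [[op(value) for value in regs] for regs in register_sets]
    # Store: the results are copied from the register sets back to memory.
    return register_sets


result = run_simd([[1, 2], [3, 4], [5, 6]], lambda x: x * 10)
print(result)
```

Because no register set depends on another, the per-PU loops could run at the same time, which is the parallelism at the level of the processing units described above.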
- An example of the MISD architecture is shown in FIG. 3.
- the processing units 5 interact according to the MISD architecture.
- the data is copied (loaded) from the memory unit 9 to the set of registers of a single processing unit 5, here the register set REG 0 of the processing unit PU 0.
- the ALUs 0.0, 0.1, 0.2 and 0.3 perform calculations.
- the results are written in the register set REG 0.
- the results are copied from the register set REG 0 to a set of registers of another processing unit 5, here the register set REG 1 of the processing unit PU 1.
- the ALUs 1.0, 1.1, 1.2 and 1.3 perform calculations and the results are written to the register set REG 1. These operations are repeated one after the other by each of the processing units 5 until the results are written to the register set of the last processing unit 5, here the register set REG N of the processing unit PU N. Then, the results are copied from the last register set REG N to the memory unit 9.
- the processing units 5 exchange data directly with each other.
- the ALUs of each processing unit 5 perform calculations on data which are themselves the results of the calculations implemented by the other processing units 5.
- the operation of the processor 1 is not parallelized at the level of the processing units 5.
- the processing units 5 operate in series, or in cascade. This type of operation is, for example, suited to so-called recursive calculations.
- the operations implemented by the processing units 5 may be the same but are applied to data which are different each time. Alternatively, the instructions could also differ from each other and the data could also differ from each other. In the example of Figure 3, as in that of Figure 2, the interactions between the ALUs are not represented.
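The cascade operation of FIG. 3 can be sketched in Python as follows. The function `run_cascade` and its arguments are hypothetical names chosen for illustration; each entry of `stages` stands for the work done by one processing unit PU 0, ..., PU N.

```python
from typing import Callable, List


def run_cascade(data: List[int], stages: List[Callable[[int], int]]) -> List[int]:
    """Pass the data through the processing units one after the other."""
    # Load: the data is copied from memory into the register set of PU 0 only.
    registers = list(data)
    for stage in stages:
        # Each PU computes on the results left by the previous PU, then its
        # results are copied to the register set of the next PU.
        registers = [stage(value) for value in registers]
    # Store: the last register set is copied back to memory.
    return registers


print(run_cascade([1, 2, 3], [lambda x: x + 1, lambda x: x * 2]))
```

Each stage needs the output of the previous one, so the stages cannot run simultaneously on the same data; this is why the operation is described as not parallelized at the level of the processing units.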
- the operations of two PUs are represented.
- the processing unit PU X comprises four ALUs.
- the ALUs of the processing unit PU X interact with each other.
- the data is loaded from the register set REG X onto each of the ALUs X.0, X.1, X.2, X.3 of the processing unit PU X.
- the ALUs perform the calculations.
- the results are then written to the REG X register set.
- ALUs do not exchange data directly with each other.
- the architectures at the level of the processing units 5 are not represented.
- the processing units 5 can interact according to one and / or the other of the SIMD, MISD and MIMD architectures, as described above with respect to FIGS. 2 and 3.
- the example of the PU X is compatible with the example of Figure 2 and with the example of Figure 3.
- the operating architectures can be imposed dynamically by the control unit 3 according to the data to be processed and the current instructions received at the input of the processor 1.
- Such dynamic adaptation of the architectures can be implemented as early as the compilation stage, by adapting the machine instructions generated by the compiler according to the type of data to be processed and the type of instructions, when these can be deduced from the source code.
- Such an adaptation can also be implemented solely at the processor level, when the processor executes conventional machine code and is programmed to implement a set of configuration instructions that depends on the data to be processed and on the current instructions received.
- This code extract corresponds for example to instructions of a source code to be implemented by the processor 1.
- the processor comprises four processing units PU, each processing unit PU comprises four arithmetic and logical units ALU.
- the matrix multiplication is processed first, while the matrix addition is processed second.
- the compiler is able to process matrix operations by breaking them down into elementary operations. For each matrix operation, the elementary operations that compose it are independent of each other. In other words, the result of one elementary operation is not needed to carry out another elementary operation. The elementary operations can therefore be implemented in parallel with each other.
- the addition of two matrices of dimensions 4 by 4 requires 16 elementary operations (additions). This matrix addition, i.e. the 16 elementary operations, can be executed in a single cycle.
- the multiplication of the two matrices of dimensions 4 by 4 requires 64 elementary operations (multiplication + accumulation). This matrix multiplication, i.e. the 64 elementary operations, is thus executed in at least four cycles.
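The cycle counts above follow from dividing the number of elementary operations by the number of available ALUs. A small Python check, assuming (as a simplification not stated in the text) that each ALU completes one elementary operation per cycle and that all four processing units of four ALUs each are available; `min_cycles` is a hypothetical helper name:

```python
import math


def min_cycles(elementary_ops: int, num_pus: int = 4, alus_per_pu: int = 4) -> int:
    """Lower bound on cycles when every ALU does one elementary operation per cycle."""
    total_alus = num_pus * alus_per_pu
    return math.ceil(elementary_ops / total_alus)


print(min_cycles(16))  # 4x4 matrix addition: 16 additions on 16 ALUs
print(min_cycles(64))  # 4x4 matrix multiplication: 64 multiply-accumulates
```

With 16 ALUs this gives one cycle for the addition and four cycles for the multiplication, matching the figures quoted in the description.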
- processor 1 adopts a SIMD architecture.
- processor 1 adopts a MISD architecture.
- Each cycle is implemented by a PU processing unit.
- the assignment of the N stages of a cycle to the ALUs of the processing unit PU is, for example, as follows:
- ALU 1, ALU 2 and ALU 3 are unused.
- the operating architectures can be imposed dynamically by the control unit 3 according to the data and current instructions received at the input of the processor 1. This covers two cases.
- the architecture and resource allocation are fixed during compilation.
- a developer of a third-party program (other than those governing the intrinsic operation of the processor) may include specific configuration instructions in the source code.
- the specific instructions are transcribed (during compilation) into the target language as specific instructions (machine code) recognized by the control unit 3.
- On reception at the processor 1, the control unit 3 imposes architectures on the processing units 5 in the manner predefined by the instructions.
- the responsibility for optimizing the operation of the processor may be left to the creator of the third-party program.
- the programmer is free to impose, or not, a particular operation of the processor, that is to say here an architecture selected from SIMD, MISD, MIMD or a combination thereof.
- the architecture and the allocation of resources are fixed in a pre-established manner according to a set of machine instructions implemented in the processor.
- a set of instructions is generally implemented before marketing and use of the processor.
- the machine instruction set is not intended to be modified by the CPU users.
- upon receiving the instructions at the processor 1, the control unit 3 implements an architecture management method, or method for configuring the architectures, prior to the implementation of the instructions received at the input. For example, the control unit 3 first transmits to each processing unit PU configuration data specific to that processing unit PU. Each set of configuration data is stored in a configuration register accessible to the corresponding processing unit. Then, the processing units PU receive generic processing instructions (common to all the PUs) from an instruction bus.
- Each PU implements the processing instructions in a manner that varies according to the configuration data previously received and stored in the configuration register.
- each PU interprets the generic processing instructions using the configuration data to adapt the processing to be implemented.
- the second case introduces flexibility in the control.
- Generic instructions can be transmitted to all PUs, regardless of the architecture to be adopted (SIMD, MISD, MIMD). The prior transmission of the configuration data makes it possible to select the architecture actually adopted by the PUs on receipt of generic instructions.
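The two-phase scheme described above (configuration data first, then a generic instruction common to all PUs) can be sketched as follows. Everything here is hypothetical: the class `ProcessingUnit`, the generic instruction name `"APPLY"` and the `OPERATIONS` table are illustrative assumptions, not an instruction set defined by the patent.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical elementary operations that a configuration can select.
OPERATIONS: Dict[str, Callable[[int, int], int]] = {
    "add": lambda value, operand: value + operand,
    "mul": lambda value, operand: value * operand,
}


@dataclass
class ProcessingUnit:
    # Configuration register, written by the control unit before execution.
    config: Dict[str, object] = field(default_factory=dict)

    def configure(self, op: str, operand: int) -> None:
        self.config = {"op": op, "operand": operand}

    def execute(self, instruction: str, value: int) -> int:
        # The same generic instruction is interpreted through the local configuration.
        if instruction != "APPLY":
            raise ValueError(f"unknown generic instruction: {instruction}")
        operation = OPERATIONS[self.config["op"]]
        return operation(value, self.config["operand"])


def broadcast(pus: List[ProcessingUnit], instruction: str, values: List[int]) -> List[int]:
    """Send one generic instruction to all PUs, each working on its own value."""
    return [pu.execute(instruction, value) for pu, value in zip(pus, values)]


pus = [ProcessingUnit(), ProcessingUnit()]
pus[0].configure("add", 1)
pus[1].configure("mul", 3)
print(broadcast(pus, "APPLY", [10, 10]))
```

Giving every PU the same configuration yields SIMD-like behaviour, while different configurations yield distinct operations per PU; the instruction broadcast itself does not change.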
- the architectures can be dynamic, that is to say, evolve over the execution steps of the instructions received, in particular according to the nature of the calculations to be performed.
- a SIMD architecture may be imposed by default and an MISD architecture may be required for recursive calculations.
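A possible selection rule can be sketched in Python: SIMD by default, MISD as soon as one calculation needs the result of an earlier one. The dependency test and the names `choose_architecture` and `Operation` are assumptions made for illustration; the description does not specify how recursive calculations are detected.

```python
from typing import List, Set, Tuple

# Hypothetical description of an operation: (name of its result, names of its inputs).
Operation = Tuple[str, Set[str]]


def choose_architecture(operations: List[Operation]) -> str:
    """Return "MISD" if an operation reads an earlier result, otherwise "SIMD"."""
    produced: Set[str] = set()
    for result, inputs in operations:
        if inputs & produced:
            return "MISD"  # recursive chain: cascade the processing units
        produced.add(result)
    return "SIMD"  # independent operations: default parallel mode


independent = [("c0", {"a0", "b0"}), ("c1", {"a1", "b1"})]
chained = [("x1", {"x0"}), ("x2", {"x1"})]
print(choose_architecture(independent), choose_architecture(chained))
```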
- the processor 1 can be arranged both to implement an architecture management method according to specific configuration instructions received (contained in the compiled computer code) and to implement an architecture management method according to a set of machine instructions, in the absence of, or in addition to, specific configuration instructions among the instructions received at the input.
- the control unit 3 transforms the processing instructions usually received at the input into adapted instructions, or "macro-instructions" (or "custom instructions").
- the adapted instructions contain both processing instructions and configuration instructions.
- the processor operates in SIMD mode.
- The processing units 5 all perform the same operations on different data to be processed.
- the processor operates in MISD or MIMD mode.
- Processing units 5 perform operations distinct from each other on identical (MISD) or different (MIMD) data to be processed.
- In the example, an array of matrices is defined, each matrix being of dimension 4 by 4 and the array being of size 2 (i.e. comprising two matrices).
- the function denoted "inv", applied to an array of matrices, consists in inverting each element of the array, that is to say inverting each of the two matrices of dimension 4 by 4.
- the processor comprises four ALUs.
- cycle 1: steps 1 to 4 for matrix A[0];
- cycle 2: steps 1 to 4 for matrix A[1];
- cycle 3: steps 5 to 8 for matrix A[0];
- cycle 4: steps 5 to 8 for matrix A[1];
- cycle 5: step 9 for matrix A[0];
- cycle 6: step 9 for matrix A[1].
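The six-cycle listing above can be reproduced with a short Python sketch. It assumes, as the listing implies, nine steps per matrix inversion, four ALUs handling up to four steps per cycle, and the two matrices being served alternately; the function name `schedule` is hypothetical.

```python
from typing import List, Tuple


def schedule(num_steps: int = 9, num_alus: int = 4,
             num_matrices: int = 2) -> List[Tuple[int, int, int]]:
    """Return one (matrix index, first step, last step) entry per cycle."""
    cycles = []
    # Steps are grouped in batches of at most num_alus (one step per ALU).
    for first in range(1, num_steps + 1, num_alus):
        last = min(first + num_alus - 1, num_steps)
        # The same batch of steps is run for each matrix in turn.
        for matrix in range(num_matrices):
            cycles.append((matrix, first, last))
    return cycles


for cycle, (matrix, first, last) in enumerate(schedule(), start=1):
    print(f"cycle {cycle}: steps {first} to {last} for matrix A[{matrix}]")
```

In cycles 5 and 6 only one step remains per matrix, so three of the four ALUs are idle under this schedule.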
- the ALU 0 can be assigned to the calculations relating to the matrix A [0] whereas the ALU 1 is assigned to the calculations relating to the matrix A [1].
- the assignment of each ALU to operations may be scheduled at compile time if at least part of the data is known at that stage, in particular the size of the matrices and the size of the array.
- the assignment can be done dynamically.
- the allocation may be imposed by the control unit 3 according to a set of machine instructions implemented on the processor 1.
- the set of machine instructions is recorded on a non-transitory recording medium (for example a part of the memory unit 9) readable by the control unit 3 to implement a method for managing the architecture of the processor 1.
- the control unit 3 is arranged to impose on a processing unit 5 the implementation of a first set of operations by the set of ALUs; the first set of operations is then reiterated on each of the elements of the array (each of the matrices in the previous example). Next, the number of operations that can be carried out in parallel (i.e. that are not interdependent) is estimated. For example, the number of resources (the number of ALUs) is divided by the number of operations to be implemented. Finally, the assignment of operations to each ALU is performed so that at least some of the operations are performed in parallel with each other by separate ALUs of distinct processing units.
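The final assignment step can be illustrated with a minimal Python sketch. The round-robin policy, the function name `assign_operations` and the default of four processing units with four ALUs each are assumptions; the description only requires that some operations end up on separate ALUs of distinct processing units.

```python
from typing import Dict, List, Tuple

Alu = Tuple[int, int]  # (processing unit index, ALU index), i.e. ALU A.B


def assign_operations(operations: List[str], num_pus: int = 4,
                      alus_per_pu: int = 4) -> Dict[Alu, List[str]]:
    """Distribute independent operations over the ALUs in round-robin order."""
    # Order the ALUs so that consecutive entries belong to distinct PUs.
    alus = [(pu, alu) for alu in range(alus_per_pu) for pu in range(num_pus)]
    assignment: Dict[Alu, List[str]] = {alu: [] for alu in alus}
    for index, operation in enumerate(operations):
        assignment[alus[index % len(alus)]].append(operation)
    return assignment


plan = assign_operations(["inv A[0]", "inv A[1]"])
busy = {alu: ops for alu, ops in plan.items() if ops}
print(busy)
```

With two operations, the first lands on ALU 0.0 and the second on ALU 1.0, so they run on different processing units.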
- the architecture of the processing units 5 may vary over time.
- the architecture of the processing units 5 can alternate between SIMD, MISD and MIMD.
- the invention is not limited to the examples of processors described above, which are given purely by way of example; it encompasses all the variants that those skilled in the art may consider within the scope of the protection sought.
- the invention also relates to a set of machine instructions implementable in a processor for obtaining such a processor, the implementation of such a set of machine instructions on a processor, the processor architecture management method implemented by the processor, the computer program comprising the corresponding set of machine instructions, and the recording medium on which such a set of machine instructions is recorded.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- Software Systems (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Computer Hardware Design (AREA)
- Computing Systems (AREA)
- Advance Control (AREA)
- Executing Machine-Instructions (AREA)
- Devices For Executing Special Programs (AREA)
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
FR1762068A FR3074931B1 (fr) | 2017-12-13 | 2017-12-13 | Architectures de processeur |
PCT/FR2018/052995 WO2019115902A1 (fr) | 2017-12-13 | 2018-11-27 | Architectures de processeur |
Publications (1)
Publication Number | Publication Date |
---|---|
EP3724779A1 (fr) | 2020-10-21 |
Family
ID=61802089
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP18833085.6A Pending EP3724779A1 (fr) | 2017-12-13 | 2018-11-27 | Architectures de processeur |
Country Status (6)
Country | Link |
---|---|
US (1) | US20210173809A1 (fr) |
EP (1) | EP3724779A1 (fr) |
KR (1) | KR20200121788A (fr) |
CN (1) | CN111512296A (fr) |
FR (1) | FR3074931B1 (fr) |
WO (1) | WO2019115902A1 (fr) |
Family Cites Families (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5765011A (en) * | 1990-11-13 | 1998-06-09 | International Business Machines Corporation | Parallel processing system having a synchronous SIMD processing with processing elements emulating SIMD operation using individual instruction streams |
CA2073516A1 (fr) * | 1991-11-27 | 1993-05-28 | Peter Michael Kogge | Ordinateur a reseau de processeurs paralleles multimode dynamiques |
US5933642A (en) * | 1995-04-17 | 1999-08-03 | Ricoh Corporation | Compiling system and method for reconfigurable computing |
US5903771A (en) * | 1996-01-16 | 1999-05-11 | Alacron, Inc. | Scalable multi-processor architecture for SIMD and MIMD operations |
US8099777B1 (en) * | 2004-08-26 | 2012-01-17 | Rockwell Collins, Inc. | High security, multi-level processor and method of operating a computing system |
GB2437837A (en) * | 2005-02-25 | 2007-11-07 | Clearspeed Technology Plc | Microprocessor architecture |
US8156474B2 (en) * | 2007-12-28 | 2012-04-10 | Cadence Design Systems, Inc. | Automation of software verification |
KR100960148B1 (ko) * | 2008-05-07 | 2010-05-27 | 한국전자통신연구원 | 데이터 프로세싱 회로 |
WO2014142704A1 (fr) * | 2013-03-15 | 2014-09-18 | Intel Corporation | Procédés et appareil pour compiler des instructions pour une architecture de processeur à vecteurs de pointeurs d'instruction |
US10055228B2 (en) * | 2013-08-19 | 2018-08-21 | Shanghai Xinhao Microelectronics Co. Ltd. | High performance processor system and method based on general purpose units |
KR20160061701A (ko) * | 2014-11-24 | 2016-06-01 | 삼성전자주식회사 | 서로 다른 정확도를 갖는 연산기들을 이용하여 데이터를 처리하는 방법 및 장치 |
JP6427055B2 (ja) * | 2015-03-31 | 2018-11-21 | 株式会社デンソー | 並列化コンパイル方法、及び並列化コンパイラ |
-
2017
- 2017-12-13 FR FR1762068A patent/FR3074931B1/fr active Active
-
2018
- 2018-11-27 US US16/771,376 patent/US20210173809A1/en active Pending
- 2018-11-27 WO PCT/FR2018/052995 patent/WO2019115902A1/fr unknown
- 2018-11-27 KR KR1020207020211A patent/KR20200121788A/ko unknown
- 2018-11-27 EP EP18833085.6A patent/EP3724779A1/fr active Pending
- 2018-11-27 CN CN201880080771.2A patent/CN111512296A/zh active Pending
Also Published As
Publication number | Publication date |
---|---|
FR3074931B1 (fr) | 2020-01-03 |
US20210173809A1 (en) | 2021-06-10 |
WO2019115902A1 (fr) | 2019-06-20 |
FR3074931A1 (fr) | 2019-06-14 |
CN111512296A (zh) | 2020-08-07 |
KR20200121788A (ko) | 2020-10-26 |
Legal Events
Code | Title | Description |
---|---|---|
STAA | Information on the status of an ep patent application or granted ep patent | Free format text: STATUS: UNKNOWN |
STAA | Information on the status of an ep patent application or granted ep patent | Free format text: STATUS: THE INTERNATIONAL PUBLICATION HAS BEEN MADE |
PUAI | Public reference made under article 153(3) epc to a published international application that has entered the european phase | Free format text: ORIGINAL CODE: 0009012 |
STAA | Information on the status of an ep patent application or granted ep patent | Free format text: STATUS: REQUEST FOR EXAMINATION WAS MADE |
17P | Request for examination filed | Effective date: 20200611 |
AK | Designated contracting states | Kind code of ref document: A1 Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR |
AX | Request for extension of the european patent | Extension state: BA ME |
DAV | Request for validation of the european patent (deleted) | |
DAX | Request for extension of the european patent (deleted) | |
STAA | Information on the status of an ep patent application or granted ep patent | Free format text: STATUS: EXAMINATION IS IN PROGRESS |
17Q | First examination report despatched | Effective date: 20230714 |