EP3782036A1 - MIMD processor emulated on SIMD architecture - Google Patents

MIMD processor emulated on SIMD architecture

Info

Publication number
EP3782036A1
EP3782036A1 EP19742845.1A EP19742845A EP3782036A1 EP 3782036 A1 EP3782036 A1 EP 3782036A1 EP 19742845 A EP19742845 A EP 19742845A EP 3782036 A1 EP3782036 A1 EP 3782036A1
Authority
EP
European Patent Office
Prior art keywords
elementary
processor
instruction
elementary processor
instructions
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
EP19742845.1A
Other languages
German (de)
French (fr)
Inventor
Stéphane CHEVOBBE
Marc Duranton
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Commissariat à l'Energie Atomique et aux Energies Alternatives CEA
Original Assignee
Commissariat à l'Energie Atomique CEA
Commissariat à l'Energie Atomique et aux Energies Alternatives CEA
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Commissariat à l'Energie Atomique CEA, Commissariat à l'Energie Atomique et aux Energies Alternatives CEA

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00 Digital computers in general; Data processing equipment in general
    • G06F15/76 Architectures of general purpose stored program computers
    • G06F15/80 Architectures of general purpose stored program computers comprising an array of processing units with common control, e.g. single instruction multiple data processors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00 Digital computers in general; Data processing equipment in general
    • G06F15/76 Architectures of general purpose stored program computers
    • G06F15/80 Architectures of general purpose stored program computers comprising an array of processing units with common control, e.g. single instruction multiple data processors
    • G06F15/8007 Architectures of general purpose stored program computers comprising an array of processing units with common control, e.g. single instruction multiple data processors single instruction multiple data [SIMD] multiprocessors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F5/00 Methods or arrangements for data conversion without changing the order or content of the data handled
    • G06F5/06 Methods or arrangements for data conversion without changing the order or content of the data handled for changing the speed of data flow, i.e. speed regularising or timing, e.g. delay lines, FIFO buffers; over- or underrun control therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30098 Register arrangements
    • G06F9/30101 Special purpose registers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/32 Address formation of the next instruction, e.g. by incrementing the instruction counter
    • G06F9/321 Program or instruction counter, e.g. incrementing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38 Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3885 Concurrent instruction execution, e.g. pipeline or look ahead using a plurality of independent parallel functional units
    • G06F9/3887 Concurrent instruction execution, e.g. pipeline or look ahead using a plurality of independent parallel functional units controlled by a single instruction for multiple data lanes [SIMD]

Definitions

  • The present invention generally relates to the field of Multiple Instruction Multiple Data (MIMD) processors, in particular for performing image processing in a vision system such as an intelligent retina.
  • MIMD: Multiple Instruction Multiple Data
  • Intelligent retinas are integrated circuits combining a matrix of sensors and a processor consisting of a matrix of elementary processors. The elementary processors, also called PEs (Processing Elements), process the signals provided by these sensors.
  • PEs: Processing Elements
  • An elementary processor is in charge of processing the signals from one or more pixels.
  • The processor can perform elementary image processing (spatial filtering, for example) or even more complex operations, such as point-of-interest (POI) search or object detection.
  • The processor architecture is of the SIMD (Single Instruction Multiple Data) type, i.e. the same instruction is executed in parallel by all the elementary processors, each of which processes a different datum because it is connected to different pixels.
  • Each elementary processor has its own arithmetic and logic unit (ALU), registers and, if applicable, a local memory, and receives the same instruction as all the other elementary processors.
  • ALU: arithmetic and logic unit
  • This type of architecture is suitable for massively parallel computations but is not optimal when different processes must be executed on different parts of the image.
  • The nature of the SIMD architecture requires that these separate processes be performed sequentially, which penalizes the execution time.
  • A SIMD processor architecture has been proposed whose elementary processors operate in parallel on the respective columns of the sensor array.
  • This architecture has been described in the article by T. Yamazaki et al. entitled "A 1 ms high-speed vision chip with 3D-stacked 1 column 140 Gops column-parallel PEs for spatial-temporal image processing", published in ISSCC 2017 Conf. Proc., Session 4, Imagers 4.9, pages 82-84.
  • This architecture allows a certain flexibility in that it is possible to choose, independently and simultaneously, one of four processing operations on different vertical regions of the image.
  • The object of the present invention is therefore to provide a processor architecture that is simple and makes it possible to perform separate parallel processing in a flexible manner, in particular on different areas, of any configuration, of an image captured by a sensor array.
  • The present invention is defined by a SIMD architecture processor comprising a matrix of elementary processors, each elementary processor being associated with a memory cell intended to store data to be processed by said elementary processor, the processor further comprising a central controller, the elementary processors being connected to the central controller by a first bus, called the instruction bus, allowing the central controller to transmit instructions in parallel to the elementary processors, and by a second bus, called the status bus, allowing the central controller to receive the statuses of the different elementary processors, said processor being advantageous in that:
  • the central controller comprises a memory in which the tasks to be performed by the various elementary processors are stored in the form of a sequence of instructions, the central controller transmitting the sequence of instructions in a loop on the instruction bus, each instruction comprising a calculation flow identifier, a calculation flow being defined as an ordered list of tasks, each calculation flow relating to one or more elementary processors;
  • each elementary processor comprises an instruction filter and an identifier table, the instruction filter being adapted to extract the calculation flow identifier of each instruction received by the elementary processor and to determine whether the identifier is present in said table, the instruction being stored in a FIFO buffer to be executed by the elementary processor if so, and rejected by the elementary processor otherwise.
  • The FIFO buffer is typically popped each time an instruction is executed by said elementary processor.
  • Each instruction of a task comprises a sequence number indicating its order of execution in the task, the instruction filter of the elementary processor comprising a counter incremented each time an instruction is stored in the FIFO buffer, an instruction being stored in the FIFO buffer only if its flow identifier is present in the table of the elementary processor and if its sequence number is equal to the output value of said counter.
  • The transmission frequency of the instructions on the instruction bus may notably be substantially greater than the frequency at which these instructions are executed by the elementary processors.
  • Each instruction advantageously comprises an instruction pointer, and the elementary processor comprises a micro-sequencer connected to a memory storing a microcode library, the micro-sequencer sequencing the micro-instructions of the microcode pointed to by said instruction pointer.
  • Each elementary processor can be connected to its neighbors by means of communication links, a communication link between a first elementary processor and a second elementary processor connecting a first transmission register of the first elementary processor to a second reception register of the second elementary processor, and a second transmission register of the second elementary processor to a first reception register of the first elementary processor. The execution of the micro-instructions by the first elementary processor is then stopped as long as the first transmission register is not empty.
  • The execution of the micro-instructions by the second elementary processor is stopped as long as the second reception register is not full.
  • The first elementary processor, having completed the execution of a task, informs the central controller by a notification of its status, and the second elementary processor is informed of this status by the central controller.
  • The present invention also relates to an intelligent optical sensor characterized in that it comprises a matrix of elementary sensors and a SIMD architecture processor according to one of the preceding claims, each elementary processor being associated with a plurality of sensors of said matrix and being adapted to process the signals from these sensors.
  • Each elementary processor may itself have a SIMD architecture.
  • Fig. 1 schematically represents the general architecture of a SIMD processor according to one embodiment of the invention;
  • Fig. 2 schematically represents the architecture of an elementary processor of the processor of Fig. 1;
  • Fig. 3 schematically shows a mode of synchronization between two elementary processors of the processor of Fig. 1;
  • Fig. 4 schematically represents a task delegation between two elementary processors of the processor of Fig. 1.
  • DETAILED PRESENTATION OF PARTICULAR EMBODIMENTS
  • The following considers a SIMD processor as defined in the introductory part. Recall that such a processor consists of a matrix of elementary processors (PEs) sharing the same instruction bus and intended to execute the same instruction in parallel during the same time interval. In a particular mode of use, this processor is integrated with a matrix of sensors (photodiodes, for example) within an intelligent optical sensor (intelligent retina). More precisely, in this case, each elementary processor is associated with a sub-matrix of the sensor array, the signals of the various sensors of the sub-matrix being stored in a storage sub-matrix, also called a macropixel. The structure of such a storage sub-matrix has been described in application FR-A-2984556. The elementary processors themselves advantageously have a SIMD architecture (each elementary processor then comprising a plurality of calculation units operating in parallel) and can therefore process in parallel several data stored in the storage sub-matrix.
  • Fig. 1 schematically shows the architecture of a SIMD architecture processor according to one embodiment of the invention.
  • This processor comprises a matrix 120 of elementary processors 150 (PE), each of which can access the memory cell with which it is associated. More precisely, the memory 125 is divided into memory cells 155 (CE) containing the data to be processed by the corresponding elementary processor.
  • The memory cell has, for example, the structure of the aforementioned storage sub-matrix, and each elementary processor processes the data of the corresponding macropixel.
  • The elementary processors are connected in parallel to a central controller 110 by means of a first common bus, called the instruction bus. Thus, when an instruction is transmitted by the controller, each of the elementary processors receives it and can execute it in parallel.
  • The elementary processors are also connected to the central controller via a second common bus, called the status bus, on which they can transmit their respective statuses.
  • By status, we mean here, for example, the state of a task (in particular the end of a task), the occurrence of an error in the execution of a task (division by zero, overflow) or a software interrupt.
  • The statuses of the various elementary processors are grouped together in a status table 130.
  • The central controller thus knows at any time the completion status of the tasks performed by the various elementary processors and can transmit instructions accordingly.
  • The central controller also comprises a memory 140 in which the program to be executed by the processor is stored, said program consisting of a task sequence task_0, task_1, ..., task_N, each task itself being composed of a series of instructions.
  • The instructions of the task or sequence of tasks are transmitted in a loop on the instruction bus.
  • A calculation flow is defined as an ordered sub-sequence of tasks in the task sequence task_0, task_1, ..., task_N.
  • A calculation flow may concern a subset of the elementary processors, or even, in some cases, all of them.
  • An instruction includes a header followed by a calculation flow identifier and, if applicable, the order index of the instruction in the task, and then a number of words defining the instruction to be performed and, where appropriate, the arguments of this instruction.
  • The instruction may be coded in compressed form, for example in the form of an instruction index pointing into an instruction library.
  • An example of such an instruction may be the convolution with a kernel for filtering the pixels of the macropixel, the kernel being provided as an argument of the instruction.
  • The instruction can be directly executable by the elementary processor without needing to be decoded. The two types of instructions mentioned above can generally coexist.
  • Fig. 2 schematically shows the architecture of an elementary processor of Fig. 1.
  • Each instruction is read on the bus by the elementary processor 200.
  • The header of the instruction is analyzed by a filtering module 210. This module detects the beginning of the instruction by means of the header, retrieves the calculation flow identifier and determines whether or not the calculation flow concerns the elementary processor. To do this, it compares the identifier received with the identifier stored in a current flow register 220. This register contains the identifier of the current flow to be executed by the elementary processor, i.e. the flow whose tasks this elementary processor must perform. The contents of the register 220 are loaded during the initialization phase of the processor or by a specific microcode.
  • The instruction pointer is stored in a FIFO buffer 230. If the FIFO buffer is full, the instruction in question is not recorded.
  • The instruction pointer may, however, be stored during a subsequent iteration of the instruction loop if a slot has been freed at the input of the buffer in the meantime.
  • The sequence can be resumed from any instruction, in particular because the different instructions of the sequence can be executed independently.
  • The elementary processor ensures that the FIFO buffer is empty enough to record a complete sequence, which can then be started again.
  • The FIFO buffer can be purged when a sequence has been interrupted or an overflow has occurred.
  • Each instruction has an additional field indicating the sequence number of the instruction in the task.
  • The filtering module 210 comprises a counter that is incremented each time an instruction is stored in the FIFO buffer and reset at the end of the task. The counter value is used for filtering instructions and ensures that they enter the FIFO in sequence. Thus, only the next instruction in the task, whose sequence number is equal to the output of the counter and whose flow identifier corresponds to the one stored in the register 220, can be stored in the FIFO buffer.
  • The frequency of transmission of the instructions by the central controller is substantially higher than the frequency at which the elementary processors process instructions, which makes it possible to transmit different instruction flows to the different elementary processors without making them wait for an instruction.
  • An advantageous solution is to interleave the instructions of the different calculation flows, allowing a regular supply of instructions for the different flows.
  • If a sequence of instructions constituting a task is carried out more quickly than the others, it can advantageously be repeated several times within a repetitive cycle of tasks.
  • Those skilled in the art can define an order of the instructions of the different tasks and the number of repetitions of these tasks for optimal operation of the elementary processors, that is to say, to avoid too many periods during which the FIFO buffer is empty (the elementary processor then waiting for an instruction) or saturated.
  • When an instruction is taken into account to be executed by the elementary processor, the instruction pointer is popped from the FIFO buffer and supplied to the Finite State Machine (FSM) 240.
  • FSM: Finite State Machine
  • This microcode library is loaded by the central controller 110 during initialization, or during a specific phase of operation such as a reconfiguration of the system.
  • The micro-instructions contained in the microcode are transferred sequentially, one by one, into the micro-instruction register 260.
  • The arithmetic and logic unit (ALU) 280 receives these micro-instructions sequenced by the state machine 240, the arguments, and the data concerned by the instruction. The data will have been previously read from the memory cell associated with the elementary processor and stored in the data register 270.
  • The program to be executed by the processor may comprise different tasks to be executed in parallel by the different elementary processors, which makes it possible to emulate a MIMD architecture.
  • Elementary processors associated with macropixels in the center of the image will be able to search for Points of Interest (POIs), while elementary processors associated with macropixels at the periphery of the image will be able to perform motion detection.
  • The instructions for these two tasks are transmitted at high frequency and in a loop (repetitively) on the instruction bus, the elementary processors in the central area selecting the instruction flow for the first task (POI search) and those in the peripheral area selecting the instruction flow for the second task. It will be noted that the instruction flows of the first task and the second task need not be successive.
  • The instructions for these two tasks can be interleaved, for example.
  • The iteration mechanism of the instruction loop, on the one hand, and the filtering of the instructions at the level of the elementary processors, on the other hand, make it possible to differentiate the processes performed by the latter.
  • The different tasks are executed asynchronously by the various elementary processors. This also makes it possible to have different processing frequencies for the elementary processors and thus to optimize power consumption according to the tasks to be performed.
  • Two elementary processors assigned the same task may finish it at different times because of the respective occupancy states of their FIFO buffers.
  • When an elementary processor has completed the execution of a flow of instructions, it informs the central controller via the status bus.
  • The asynchronous character of the execution of the tasks can be exploited to distribute the computing load between the elementary processors.
  • Fig. 3 schematically shows a synchronization mode between two neighboring elementary processors.
  • Neighboring processors can exchange data by means of duplex communication links, each communication link implementing two registers, namely a transmission register and a reception register.
  • Four communication links are provided per elementary processor, connecting it to its four neighbors (in the North, South, East and West directions).
  • Eight communication links can be provided, linking it to its eight neighbors (the neighbors in the above sense and those in the diagonal directions).
  • The duplex communication link 350 connects, on the one hand, a first transmission register 311 of the first elementary processor to a second reception register 322 of the second elementary processor and, on the other hand, a second transmission register 321 of the second elementary processor to a first reception register 312 of the first elementary processor.
  • A send microcode of the elementary processor makes it possible to transmit data to a neighboring elementary processor via a communication link.
  • A receive microcode can receive data from a neighboring elementary processor via this same link.
  • Different send and receive microcodes are possible depending on whether or not the transfers in the communication registers block the sequence of micro-instructions in the elementary processor.
  • Each communication register includes a status bit that indicates whether the register in question is empty or full.
  • The execution of the send microcode transfers data from the ALU to a transmission register of the elementary processor, to be transmitted on the corresponding communication link.
  • The send microcode is either blocking, in which case it stops the execution of the micro-instruction sequence as long as the transmission register is not empty, or non-blocking, in which case the microcode simply writes the data in the transmission register and sets the register status bit to "full" without affecting the execution of the micro-instruction sequence.
  • The receiving elementary processor executes the receive microcode, which can in turn be blocking or non-blocking. If it is blocking, the receiving elementary processor waits for the status bit of the transmission register of the sending elementary processor to be "full". When this condition is fulfilled, the data contained in the transmission register of the sending elementary processor is stored in the reception register of the receiving elementary processor. The receive microcode then sets the status bit of the transmission register (of the sending elementary processor) to "empty" and the status bit of the reception register (of the receiving elementary processor) to "full". An additional read microcode can then read the data from the reception register and supply it as input to the ALU of the receiving elementary processor. After reading the reception register, the read microcode sets the status bit of the reception register to "empty".
  • The synchronization between elementary processors for sending and receiving data can also be performed via the central controller, which then explicitly orders the data exchanges in synchronous mode.
  • Fig. 4 represents a task delegation between two elementary processors under the supervision of the central controller.
  • When an elementary processor 430 has completed its task and has signaled this to the central controller on the status bus, it becomes available to perform new processing. A neighboring elementary processor 420 can then delegate part of its task to it while that task is running. The elementary processor 420 is informed of the availability of the elementary processor 430 by the central controller, which maintains the status table. The central controller can then indicate the task to be performed (by means of a new code to be loaded in the register 220) and trigger at 425 a transfer of data via the communication link that connects them.
  • This indication may also take the form of a start address and an end address in the calculation flow.
  • The elementary processor 430 determines by means of its selection module the instructions that are intended for the elementary processor 420 and whose addresses lie between said start and end addresses of the delegated task.
  • The elementary processor 430 informs the central controller, which updates its status table.
  • The elementary processor 420 is thus informed of the end of the delegated task and triggers at 435 the transfer of data in order to receive them in its reception register (or buffer).
  • The task delegation may, for example, concern a part of the data of the macropixel and/or a particular operation.
  • The elementary processor 430 may be assigned a point-of-interest search on behalf of the elementary processor 420 once it has completed its own motion detection task.
  • The task delegation process can be repeated over time until the end of the program.
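
The blocking send, receive and read microcodes described above for the communication registers can be sketched in Python as follows. This is only a minimal model under stated assumptions: the names `Register`, `Link`, `send`, `receive` and `read` are hypothetical, a stall is modeled as a `False` (or `None`) return value rather than a halted micro-sequencer, and only one direction of the duplex link is shown.

```python
from dataclasses import dataclass

EMPTY, FULL = "empty", "full"


@dataclass
class Register:
    """Communication register with its status bit (hypothetical model)."""
    value: object = None
    status: str = EMPTY


class Link:
    """One direction of a duplex link: the sender's transmission register
    feeds the receiver's reception register."""

    def __init__(self):
        self.tx = Register()  # transmission register of the sending PE
        self.rx = Register()  # reception register of the receiving PE

    def send(self, value):
        """Blocking send: stalls (returns False) while tx is not empty."""
        if self.tx.status != EMPTY:
            return False
        self.tx.value, self.tx.status = value, FULL
        return True

    def receive(self):
        """Blocking receive: stalls (returns False) until tx is full, then
        copies tx into rx, marks tx empty and rx full."""
        if self.tx.status != FULL:
            return False
        self.rx.value, self.rx.status = self.tx.value, FULL
        self.tx.status = EMPTY
        return True

    def read(self):
        """Read microcode: returns the received value for the ALU and marks
        rx empty; returns None (stall) if rx is not full."""
        if self.rx.status != FULL:
            return None
        self.rx.status = EMPTY
        return self.rx.value


link = Link()
log = [
    link.receive(),  # nothing sent yet: the receiver stalls
    link.send(10),   # tx was empty: the value is written
    link.send(11),   # tx still full: the sender stalls
    link.receive(),  # tx full: the value moves to rx
    link.send(11),   # tx was emptied by receive: the retry succeeds
    link.read(),     # the ALU gets the first value
]
```

In this sketch the status bits alone order the exchange: the second `send` only succeeds once `receive` has emptied the transmission register, so no value is overwritten before it has been taken.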

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • Computing Systems (AREA)
  • Advance Control (AREA)
  • Image Processing (AREA)
  • Multi Processors (AREA)

Abstract

The present invention relates to a processor having a SIMD architecture, comprising an array (120) of elementary processors (150), each elementary processor (150) being associated with an elementary memory cell (155), and a central controller (110) connected to the elementary processors by an instruction bus and a status bus. The central controller transmits a sequence of instructions in a loop, each instruction comprising a calculation flow identifier. Each elementary processor has an instruction filter that makes it possible to reject or take into account an instruction depending on the identifier it contains. This operating mode makes it possible to emulate a MIMD processor on a SIMD architecture.
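
The mechanism summarized in the abstract, combined with the FIFO buffer and sequence counter described in the definitions, can be illustrated with a small Python simulation. It is a sketch under assumptions, not the patented implementation: the names `Instruction`, `ElementaryProcessor`, `on_bus`, `step` and `run` are hypothetical, the FIFO depth and execution period are arbitrary, and the counter is not reset at the end of the task, so each task runs once.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Instruction:
    flow_id: int   # calculation flow identifier
    seq: int       # sequence number of the instruction within its task
    pointer: str   # hypothetical name of the microcode to execute


@dataclass
class ElementaryProcessor:
    flow_ids: set                    # identifier table of this PE
    fifo_depth: int = 1              # assumed FIFO capacity
    fifo: deque = field(default_factory=deque)
    counter: int = 0                 # next expected sequence number
    executed: list = field(default_factory=list)

    def on_bus(self, ins):
        """Instruction filter: keep only the next in-order instruction of a
        relevant flow, and only if the FIFO has room."""
        if ins.flow_id not in self.flow_ids or ins.seq != self.counter:
            return False
        if len(self.fifo) >= self.fifo_depth:
            return False             # FIFO full: wait for a later loop pass
        self.fifo.append(ins)
        self.counter += 1
        return True

    def step(self):
        """Pop one instruction from the FIFO and 'execute' it."""
        if self.fifo:
            self.executed.append(self.fifo.popleft().pointer)


def run(program, pes, bus_slots, exec_period):
    """Broadcast the program in a loop; PEs execute once every exec_period slots."""
    for t in range(bus_slots):
        ins = program[t % len(program)]
        for pe in pes:
            pe.on_bus(ins)
        if t % exec_period == exec_period - 1:
            for pe in pes:
                pe.step()


# Two interleaved flows: flow 1 (three instructions) and flow 2 (two).
program = [
    Instruction(1, 0, "a0"), Instruction(2, 0, "b0"),
    Instruction(1, 1, "a1"), Instruction(2, 1, "b1"),
    Instruction(1, 2, "a2"),
]
pe_a = ElementaryProcessor(flow_ids={1})
pe_b = ElementaryProcessor(flow_ids={2})
run(program, [pe_a, pe_b], bus_slots=15, exec_period=3)
```

With these settings `pe_a` rejects `a1` on the first pass because its FIFO is still full, then accepts it on the second pass of the loop. Both elementary processors end up executing their own task in order while sharing a single instruction bus.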

Description

MIMD PROCESSOR EMULATED ON SIMD ARCHITECTURE
DESCRIPTION
TECHNICAL FIELD
The present invention generally relates to the field of MIMD (Multiple Instruction Multiple Data) processors, in particular for performing image processing in a vision system such as an intelligent retina.
ÉTAT DE LA TECHNIQUE ANTÉRIEURE STATE OF THE PRIOR ART
Les rétines intelligentes sont des circuits intégrés combinant une matrice de capteurs et un processeur constitué d'une matrice de processeurs élémentaires, les processeurs élémentaires, encore dénommés PEs (Processing Eléments) effectuant des traitements sur les signaux fournis par ces capteurs. De manière générale, il existe une correspondance entre les capteurs (ou pixels) et les processeurs élémentaires : un processeur élémentaire est en charge du traitement des signaux issus d'un ou de plusieurs pixels.  Intelligent retinas are integrated circuits combining a matrix of sensors and a processor consisting of a matrix of elementary processors, the elementary processors, also called PEs (Processing Elements) performing processing on the signals provided by these sensors. In general, there is a correspondence between the sensors (or pixels) and the elementary processors: an elementary processor is in charge of processing the signals from one or more pixels.
Le processeur peut effectuer des traitements élémentaires de l'image (filtrage spatial par exemple) voire des opérations plus complexes, telles que recherche de point d'intérêt (POIs) ou détection d'objets. Généralement, l'architecture du processeur est du type SIMD (Single Instruction Multiple Data), autrement dit la même instruction est effectuée en parallèle par tous les processeurs élémentaires qui chacun traite une donnée différente car relié à des pixels différents. Chaque processeur élémentaire dispose de sa propre unité arithmétique et logique (ALU), des registres et, le cas échéant, d'une mémoire locale et reçoit la même instruction que tous les autres processeurs élémentaires.  The processor can perform elementary image processing (spatial filtering for example) or even more complex operations, such as POIs or object detection. Generally, the processor architecture is of the SIMD (Single Instruction Multiple Data) type, ie the same instruction is performed in parallel by all the elementary processors which each process a different datum because connected to different pixels. Each elementary processor has its own arithmetic and logical unit (ALU), registers and, if applicable, a local memory and receives the same instruction as all other elementary processors.
An example of a vision system using a SIMD architecture processor is described in the chapter by P. Dudek, "SCAMP-3: a vision chip SIMD current-mode analogue processor array", in the book "Focal-plane sensor-processor chips", edited by A. Zarandy, Springer, 2011.
This type of architecture is well suited to massively parallel computation but is not optimal when distinct processing operations must be executed on different parts of the image. The nature of the SIMD architecture requires these distinct operations to be performed sequentially, which increases the execution time.
More recently, a SIMD processor architecture has been proposed in which the elementary processors operate in parallel on the respective columns of the sensor matrix. This architecture is described in the article by T. Yamazaki et al. entitled "A 1ms high-speed vision chip with 3D-stacked 140GOPS column-parallel PEs for spatio-temporal image processing", published in ISSCC 2017 Conf. Proc., Session 4, Imagers, 4.9, pages 82-84. This architecture allows some flexibility in that one of four processing operations can be chosen independently and simultaneously for different vertical regions of the image.
The aim of the present invention is therefore to propose a processor architecture that is simple and makes it possible to carry out distinct parallel processing operations flexibly, in particular on different zones, of arbitrary configuration, of an image captured by a sensor matrix.
DISCLOSURE OF THE INVENTION
The present invention is defined by a SIMD architecture processor comprising a matrix of elementary processors, each elementary processor being associated with a memory cell intended to store data to be processed by said elementary processor, the processor further comprising a central controller, the elementary processors being connected to the central controller by a first bus, called the instruction bus, allowing the central controller to transmit instructions in parallel to the elementary processors, and by a second bus, called the status bus, allowing the central controller to receive the statuses of the various elementary processors, said processor being advantageous in that:
- the central controller comprises a memory in which the tasks to be performed by the various elementary processors are stored in the form of a sequence of instructions, the central controller transmitting the sequence of instructions in a loop on the instruction bus, each instruction comprising a computation flow identifier, a computation flow being defined as an ordered list of tasks, each computation flow concerning one or more elementary processor(s);
- each elementary processor comprises an instruction filter and an identifier table, the instruction filter being adapted to extract the computation flow identifier from each instruction received by the elementary processor and to determine whether the identifier is present in said table, the instruction being stored in a FIFO buffer to be executed by the elementary processor if so, and rejected by the elementary processor if not.
The FIFO buffer is typically popped each time an instruction is executed by said elementary processor.
Advantageously, each instruction of a task comprises a sequence number indicating its order of execution within the task, the instruction filter of the elementary processor comprising a counter incremented each time the FIFO buffer is popped, an instruction being stored in the FIFO buffer only if its flow identifier is present in the table of the elementary processor and its sequence number is equal to the output value of said counter.
The frequency at which instructions are transmitted on the instruction bus may in particular be substantially higher than the frequency at which these instructions are executed by the elementary processors.
Each instruction advantageously comprises an instruction pointer, and the elementary processor comprises a micro-sequencer connected to a memory storing a microcode library, the micro-sequencer sequencing the micro-instructions of the microcode designated by said instruction pointer.
In addition, each elementary processor can be connected to its neighbours by means of communication links, a communication link between a first elementary processor and a second elementary processor connecting a first transmission register of the first elementary processor to a second reception register of the second elementary processor, and a second transmission register of the second elementary processor to a reception register of the first elementary processor. The execution of micro-instructions by the first elementary processor is then stopped as long as the first transmission register is not empty.
Alternatively, the execution of micro-instructions by the second elementary processor is stopped as long as the second reception register is not full.
In the first case, the first elementary processor, having completed the execution of a task, informs the central controller by notifying its status, and the second elementary processor is informed of this status by the central controller.
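The stall conditions on a link can be illustrated with a minimal Python sketch of one direction of a communication link. The class and method names are hypothetical, `None` is assumed to mark an empty register, and the sketch shows both stall conditions side by side although the text presents them as alternatives.

```python
class Link:
    """One direction of a link: transmission register of the sender -> reception register of the receiver."""

    def __init__(self):
        self.tx = None  # transmission register of the sending PE (None = empty, an assumption)
        self.rx = None  # reception register of the receiving PE (None = empty, an assumption)

    def sender_stalled(self):
        # The sender stops executing while its transmission register is not empty.
        return self.tx is not None

    def receiver_stalled(self):
        # The receiver stops executing while its reception register is not full.
        return self.rx is None

    def send(self, value):
        if self.sender_stalled():
            return False
        self.tx = value
        return True

    def transfer(self):
        # Hypothetical link step: move the value across when the reception register is free.
        if self.tx is not None and self.rx is None:
            self.rx, self.tx = self.tx, None

    def receive(self):
        if self.receiver_stalled():
            return None
        value, self.rx = self.rx, None
        return value
```

In this sketch a second `send` fails until `transfer` has emptied the transmission register, which is one simple way to model the sender being held back by the link.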
The present invention also relates to an intelligent optical sensor characterized in that it comprises a matrix of elementary sensors and a SIMD architecture processor according to one of the preceding claims, each elementary processor being associated with a plurality of sensors of said matrix and being adapted to process the signals from these sensors. Each elementary processor may itself have a SIMD architecture.
BRIEF DESCRIPTION OF THE DRAWINGS
Other features and advantages of the invention will become apparent on reading a preferred embodiment of the invention, described with reference to the appended figures, in which:
Fig. 1 schematically shows the general architecture of a SIMD processor according to one embodiment of the invention;
Fig. 2 schematically shows the architecture of an elementary processor of the processor of Fig. 1;
Fig. 3 schematically shows a mode of synchronization between two elementary processors of the processor of Fig. 1;
Fig. 4 schematically shows a task delegation between two elementary processors of the processor of Fig. 1.
DETAILED DESCRIPTION OF PARTICULAR EMBODIMENTS
In the following, we consider a SIMD processor as defined in the introductory part. Such a processor consists of a matrix of elementary processors (PEs) sharing the same instruction bus and intended to execute the same instruction in parallel during the same time interval. In one particular mode of use, this processor is integrated with a matrix of sensors (photodiodes, for example) within an intelligent optical sensor (intelligent retina). More precisely, in this case, each elementary processor is associated with a sub-matrix of the sensor matrix, the signals of the various sensors of the sub-matrix being stored in a storage sub-matrix, also called a macropixel. The structure of such a storage sub-matrix is described in application FR-A-2984556. The elementary processors themselves advantageously have a SIMD architecture (each elementary processor then comprising a plurality of computing units operating in parallel) and can therefore process in parallel several data items stored in the storage sub-matrix.
The idea underlying the present invention is to emulate a processor with a MIMD (Multiple Instructions Multiple Data) architecture, such as a multi-core processor, on a SIMD architecture processor. This avoids multiplying the instruction storage and sequencing resources that each MIMD processor instance would otherwise require.
Fig. 1 schematically shows the architecture of a SIMD architecture processor according to one embodiment of the invention.
This processor comprises a matrix 120 of elementary processors 150 (PE), each of which can access the memory cell with which it is associated. More precisely, the memory 125 is divided into memory cells 155 (CE) containing the data to be processed by the elementary processor. The memory cell has, for example, the structure of the aforementioned storage sub-matrix, and each elementary processor processes the data of the corresponding macropixel. The elementary processors are connected in parallel to a central controller 110 by means of a first common bus, called the instruction bus. Thus, when an instruction is transmitted by the controller, each of the elementary processors receives it and can execute it in parallel.
The elementary processors are also connected to the central controller via a second common bus, called the status bus, on which they can transmit their respective statuses. A status here means, for example, the state of a task (in particular the end of a task), the occurrence of an error during the execution of a task (division by zero, overflow) or a software interrupt. The statuses of the various elementary processors are gathered in a status table 130. The central controller thus knows at all times the completion state of the tasks performed by the various elementary processors and can transmit instructions accordingly.
The central controller also comprises a memory 140 in which the program to be executed by the processor is stored, said program consisting of a sequence of tasks $\mathit{task}_0, \mathit{task}_1, \ldots, \mathit{task}_N$, each task itself consisting of a series of instructions. Advantageously, as described in more detail below, the instructions of the task or of the sequence of tasks are transmitted in a loop on the instruction bus. A computation flow is defined as an ordered sub-sequence of the tasks of the sequence $\mathit{task}_0, \mathit{task}_1, \ldots, \mathit{task}_N$. A computation flow may concern a subset of the elementary processors or, in some cases, all of them.
An instruction comprises a header followed by a computation flow identifier and, where applicable, the order index of the instruction within the task, then a number of words defining the instruction to be performed and, where applicable, the arguments of this instruction. Advantageously, the instruction may be coded in compressed form, for example as an instruction index pointing into an instruction library. In the case of an intelligent optical sensor, an example of such an instruction is a convolution with a kernel to filter the pixels of the macropixel, the kernel being supplied as an argument of the instruction. Alternatively, the instruction may be directly executable by the elementary processor without needing to be decoded. The two types of instruction can generally coexist.
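The instruction layout just described can be sketched in Python as follows. This is only an illustration: the header value, the word order, the use of -1 for an absent order index and the explicit argument-count word are assumptions, not details given in the text.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

HEADER = 0xA5  # hypothetical start-of-instruction marker


@dataclass(frozen=True)
class Instruction:
    flow_id: int                # computation flow identifier
    order: Optional[int]        # order index within the task, if used
    opcode: int                 # index into an instruction library (compressed form)
    args: Tuple[int, ...] = ()  # e.g. convolution kernel coefficients


def encode(instr):
    """Flatten an instruction into words: header, flow id, order, opcode, arg count, args."""
    order_word = -1 if instr.order is None else instr.order  # -1 = no order index (assumption)
    return [HEADER, instr.flow_id, order_word, instr.opcode, len(instr.args), *instr.args]


def decode(words):
    """Rebuild an Instruction from the word list produced by encode()."""
    if words[0] != HEADER:
        raise ValueError("missing header")
    _, flow_id, order_word, opcode, nargs = words[:5]
    order = None if order_word < 0 else order_word
    return Instruction(flow_id, order, opcode, tuple(words[5:5 + nargs]))


# A 3x3 smoothing kernel passed as the argument of a hypothetical convolution opcode 7.
conv = Instruction(flow_id=2, order=0, opcode=7, args=(1, 2, 1, 2, 4, 2, 1, 2, 1))
assert decode(encode(conv)) == conv
```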
Fig. 2 schematically shows the architecture of an elementary processor of Fig. 1.
The left-hand side of the figure recalls that the central controller transmits a sequence of instructions $\mathit{inst}_0, \ldots, \mathit{inst}_K$ in a loop on the instruction bus. These instructions may relate to different tasks, a task belonging to a computation flow that one or more elementary processor(s) must execute.
Each instruction is read from the bus by the elementary processor 200. The header of the instruction is analysed by a filtering module 210. This module detects the start of the instruction by means of the header, extracts the computation flow identifier and determines whether or not the computation flow concerns the elementary processor. To do so, it compares the received identifier with the identifier stored in a current flow register 220. This register contains the identifier of the current flow to be executed by the elementary processor, in other words the flow whose tasks this elementary processor must perform. The contents of register 220 are loaded during the initialization phase of the processor or by a specific microcode.
Advantageously, the instruction may be coded in compressed form, for example as an instruction index pointing into an instruction library.
When the instruction belongs to a computation flow concerning the elementary processor, the instruction pointer is stored in a FIFO buffer 230. If the FIFO buffer is full, the instruction in question is not recorded. The instruction pointer can, however, be stored during a subsequent iteration of the instruction loop if a slot has been freed at the input of the buffer in the meantime.
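The filtering and buffering behaviour described above can be sketched as a small Python model. The class name, method names and FIFO depth are hypothetical; the bus is reduced to a list of (flow identifier, instruction pointer) pairs.

```python
from collections import deque


class ElementaryProcessor:
    def __init__(self, current_flow, fifo_depth):
        self.current_flow = current_flow  # contents of the current flow register
        self.fifo = deque()               # FIFO buffer of instruction pointers
        self.fifo_depth = fifo_depth      # hypothetical buffer capacity

    def on_bus(self, flow_id, pointer):
        """Return True if the instruction pointer was stored in the FIFO."""
        if flow_id != self.current_flow:
            return False  # the flow does not concern this PE: instruction rejected
        if len(self.fifo) >= self.fifo_depth:
            return False  # FIFO full: not recorded, may be caught on a later loop pass
        self.fifo.append(pointer)
        return True

    def execute_one(self):
        """Pop the oldest pointer, as the PE does when it takes an instruction for execution."""
        return self.fifo.popleft() if self.fifo else None


pe = ElementaryProcessor(current_flow=1, fifo_depth=2)
loop = [(1, "a"), (2, "x"), (1, "b"), (1, "c")]
accepted = [pe.on_bus(flow, ptr) for flow, ptr in loop]
# "x" is rejected (other flow) and "c" is dropped (FIFO full);
# once a slot is freed, "c" can be accepted on the next pass of the loop.
```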
Different variants are possible depending on the nature of the sequence of instructions making up a task.
According to a first variant, the sequence can be resumed from any instruction, in particular because the different instructions of the sequence can be executed independently. In this case, the elementary processor ensures that the FIFO buffer is empty enough to record a complete sequence, whose execution can then start again. For example, the FIFO buffer can be purged when a sequence has been interrupted or an overflow has occurred.
According to a second variant, all the instructions of a task must be carried out in the order in which they appear in the sequence. It must then be ensured that all the instructions of this task are carried out in this order by the elementary processor, even if the FIFO buffer overflows. In this case, each instruction has an additional field indicating the sequence number of the instruction within the task. In addition, the filtering module 210 comprises a counter that is incremented each time an instruction is stored in the FIFO buffer and reset to zero at the end of the task. This value is used to filter the instructions and ensures that they enter the FIFO in sequence. Thus only the next instruction of the task, whose sequence number is equal to the counter output and whose flow identifier matches the one stored in register 220, can be stored in the FIFO buffer.
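A minimal Python simulation of this second variant is given below, under simplifying assumptions: a single task broadcast in a loop, one instruction per bus slot, every instruction belonging to the flow of the elementary processor, and one instruction executed every `bus_slots_per_exec` slots. The function and parameter names are hypothetical, and the counter reset at the end of the task is omitted because the simulation stops there.

```python
from collections import deque


def run(task_len, fifo_depth, bus_slots_per_exec, max_slots=1000):
    """Return (execution order, number of bus slots used) for one task."""
    fifo, executed = deque(), []
    counter = 0  # sequence number the filter expects next
    for slot in range(max_slots):
        order = slot % task_len  # the controller loops over the instructions of the task
        # Store only the expected instruction, and only if the FIFO has room.
        if order == counter and len(fifo) < fifo_depth:
            fifo.append(order)
            counter += 1
        # The PE is slower than the bus: it executes one instruction every few slots.
        if slot % bus_slots_per_exec == bus_slots_per_exec - 1 and fifo:
            executed.append(fifo.popleft())
        if len(executed) == task_len:
            return executed, slot + 1
    raise RuntimeError("task did not complete")


executed, slots = run(task_len=4, fifo_depth=2, bus_slots_per_exec=3)
assert executed == [0, 1, 2, 3]  # order preserved although instruction 2 was first dropped
```

With a FIFO of depth 2, instruction 2 is refused on the first pass because the buffer is full, and it is picked up on the second pass of the loop, so the execution order is unchanged.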
In general, the frequency at which the central controller transmits instructions is substantially higher than the frequency at which the elementary processors process them, which makes it possible to transmit different instruction flows to the different elementary processors without forcing them to wait for an instruction.
An advantageous solution is to interleave the instructions of the different computation flows, which provides a regular supply of instructions for the different flows.
If a sequence of instructions making up a task executes faster than the others, it can advantageously be repeated several times within a repeating cycle of tasks. A person skilled in the art will be able to define an order for the instructions of the different tasks and the number of repetitions of these tasks for optimal operation of the elementary processor, that is, to avoid the FIFO buffer too often being empty (in which case the elementary processor is waiting for an instruction) or saturated.
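One possible way to build such a broadcast loop is sketched below in Python: the instructions of each flow are interleaved round-robin, and a shorter task is repeated a chosen number of times. The round-robin policy, the function names and the instruction labels are assumptions made for illustration; the text leaves the actual ordering to the person skilled in the art.

```python
from itertools import zip_longest


def interleave(flows):
    """Round-robin interleave; flows maps a flow id to its list of instruction names."""
    tagged = [[(fid, name) for name in instrs] for fid, instrs in flows.items()]
    return [item for column in zip_longest(*tagged) for item in column if item is not None]


def build_loop(tasks, repeats):
    """Repeat each task repeats[fid] times (default 1), then interleave the flows."""
    flows = {fid: instrs * repeats.get(fid, 1) for fid, instrs in tasks.items()}
    return interleave(flows)


# Flow 1 has a short task, so it is repeated twice within one cycle of the loop.
loop = build_loop({0: ["a0", "a1", "a2", "a3"], 1: ["b0", "b1"]}, repeats={1: 2})
assert loop == [(0, "a0"), (1, "b0"), (0, "a1"), (1, "b1"),
                (0, "a2"), (1, "b0"), (0, "a3"), (1, "b1")]
```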
When an instruction is taken for execution by the elementary processor, the instruction pointer is popped from the FIFO buffer and supplied to the finite state machine (FSM) 240. The FSM acts as a micro-sequencer: it fetches and sequences the microcode designated by the instruction pointer in the microcode library 250. This microcode library is loaded during initialization (or during a specific operating phase, such as a system reconfiguration, by the central controller 110). The micro-instructions contained in the microcode are transferred sequentially, one by one, into the micro-instruction register 260. The arithmetic and logic unit (ALU) 280 receives these micro-instructions sequenced by the state machine 240, the arguments, and the data on which the instruction operates. The data will have been read beforehand from the memory cell associated with the elementary processor and stored in the data register 270.
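The role of the micro-sequencer can be illustrated with a short Python sketch. The contents of the microcode library, the two ALU operations and the single scalar data register are hypothetical; they only serve to show an instruction pointer being expanded into a sequence of micro-instructions.

```python
import operator

# Hypothetical microcode library: pointer -> list of (ALU operation, operand) micro-instructions.
MICROCODE = {
    0: [("add", 1)],
    1: [("mul", 2), ("add", 3)],
}
ALU_OPS = {"add": operator.add, "mul": operator.mul}


def micro_sequencer(pointer):
    """Yield, one at a time, the micro-instructions of the microcode the pointer designates."""
    yield from MICROCODE[pointer]


def execute(fifo, data):
    """Pop instruction pointers in FIFO order and apply their microcode to the data register."""
    while fifo:
        pointer = fifo.pop(0)
        for op, operand in micro_sequencer(pointer):
            data = ALU_OPS[op](data, operand)
    return data


# Pointer 1 gives (5 * 2) + 3 = 13, then pointer 0 gives 13 + 1 = 14.
assert execute([1, 0], 5) == 14
```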
It will thus be understood that the program to be executed by the processor may comprise different tasks executed in parallel by the different elementary processors, which makes it possible to emulate a MIMD architecture.
For example, in the case of a smart optical sensor, elementary processors associated with macropixels at the center of the image may search for points of interest (POI), while the elementary processors associated with macropixels at the periphery of the image may perform motion detection. The instructions for these two tasks are transmitted at high frequency and in a loop (repetitively) on the instruction bus, the elementary processors of the central zone selecting the instruction stream of the first task (POI search) and those of the peripheral zone selecting the instruction stream of the second task. Note that the instruction streams of the first and second tasks need not be consecutive: the instructions of the two tasks may, for example, be interleaved.
The combination of iterating the instruction loop, on the one hand, and filtering the instructions at the level of the elementary processors, on the other, makes it possible to differentiate the processing performed by the latter.
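The filtering mechanism can be sketched as below for the optical-sensor example. The class name `ElementaryProcessor`, the stream identifier values and the instruction labels are hypothetical; only the principle (every instruction is broadcast to all processors, and each processor keeps those whose stream identifier is in its own table) is taken from the text.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class ElementaryProcessor:
    accepted_streams: set                         # identifier table (220)
    fifo: deque = field(default_factory=deque)    # FIFO buffer (230)

    def filter_instruction(self, instruction):
        """Instruction filter (210): keep the instruction only if its stream is ours."""
        stream_id, opcode = instruction
        if stream_id in self.accepted_streams:
            self.fifo.append(opcode)


POI, MOTION = 1, 2  # hypothetical stream identifiers

# One iteration of the instruction loop, with the two streams interleaved.
instruction_loop = [
    (POI, "poi_0"), (MOTION, "mot_0"),
    (POI, "poi_1"), (MOTION, "mot_1"),
]

center = ElementaryProcessor({POI})         # macropixel at the image center
periphery = ElementaryProcessor({MOTION})   # macropixel at the image periphery

for instruction in instruction_loop:        # broadcast on the instruction bus
    for processor in (center, periphery):
        processor.filter_instruction(instruction)

print(list(center.fifo))      # ['poi_0', 'poi_1']
print(list(periphery.fifo))   # ['mot_0', 'mot_1']
```

Both processors see exactly the same bus traffic, as in a SIMD machine, yet each ends up with a different instruction stream in its FIFO buffer.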
Note that the different tasks are executed asynchronously by the different elementary processors. This also allows the elementary processors to run at different processing frequencies, and thus to optimize power consumption according to the tasks to be performed. In particular, two elementary processors assigned the same task may finish it at different times because of the respective occupancy states of their FIFO buffers. When an elementary processor has finished executing an instruction stream, it informs the central controller via the status bus.
The asynchronous nature of task execution can be exploited to distribute the computational load among the elementary processors.
Alternatively, the execution of tasks can be synchronized between neighboring elementary processors.
Fig. 3 schematically shows a synchronization mode between two neighboring elementary processors.
In this embodiment, neighboring processors can exchange data over duplex communication links, each communication link using two registers, namely a transmission register and a reception register.
Advantageously, four communication links are provided per elementary processor, connecting it to its four neighbors (in the North, South, East and West directions). Alternatively, eight communication links may be provided, connecting it to its eight neighbors (the neighbors in the above sense plus those in the diagonal directions). Pairing a transmission register with a reception register on each link enables asynchronous communication between neighboring elementary processors.
Fig. 3 shows a first elementary processor 310 and a second elementary processor 320, which is a neighbor of the first. The duplex communication link 350 connects, on the one hand, a first transmission register 311 of the first elementary processor to a second reception register 322 of the second elementary processor and, on the other hand, a second transmission register 321 of the second elementary processor to a first reception register 312 of the first elementary processor.
A send microcode of the elementary processor transmits a data item to a neighboring elementary processor over a communication link. Similarly, a receive microcode receives a data item from a neighboring elementary processor over the same link. It is nevertheless necessary to ensure that the code of the sending and receiving elementary processors is written so that the data transfer takes place correctly (a send microcode on one side matched by a receive microcode on the other, and vice versa), and in the intended order.
Different variants of the send and receive microcodes can be envisaged, depending on whether or not transfers into the communication registers block the micro-instruction sequence in the elementary processor.
By way of example, the handling of a data transmission or reception can rely on the semaphore principle. To this end, each communication register includes a status bit indicating whether the register is empty or full.
Executing the send microcode transfers a data item from the ALU to a transmission register of the elementary processor, for transmission over the corresponding communication link. Two cases are possible: either the send microcode is blocking, in which case it halts execution of the micro-instruction sequence for as long as the transmission register is not empty, or it is non-blocking, in which case the microcode simply writes the data item into the transmission register and sets the register's status bit to "full" without affecting execution of the micro-instruction sequence.
Conversely, the elementary processor receiving the data executes the receive microcode, which may itself be blocking or non-blocking. If it is blocking, the receiving elementary processor waits until the status bit of the transmission register of the sending elementary processor is set to "full". Once this condition is met, the data held in the transmission register of the sending elementary processor is stored in the reception register of the receiving elementary processor. The receive microcode then sets the status bit of the transmission register (of the sending elementary processor) to "empty" and the status bit of the reception register (of the receiving elementary processor) to "full". An additional read microcode can then read the data from the reception register and supply it as input to the ALU (of the receiving elementary processor). After reading the reception register, the read microcode sets the status bit of the reception register to "empty". Those skilled in the art may envisage different combinations of the (blocking or non-blocking) send, receive and read instructions without departing from the scope of the present invention.
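The semaphore-style handshake between a transmission register and a reception register can be sketched as follows. This is an illustrative model under assumptions: the class `CommRegister` and the convention that a blocking microcode returns `False` to signal "stall and retry" (rather than actually suspending a micro-sequencer) are hypothetical, and the sketch covers only the blocking variants of send and receive.

```python
class CommRegister:
    """Communication register with its empty/full status bit."""

    def __init__(self):
        self.value = None
        self.full = False   # status bit: False = "empty", True = "full"


def send(tx, value):
    """Blocking-style send: stalls (returns False) while tx is not empty."""
    if tx.full:
        return False
    tx.value, tx.full = value, True
    return True


def receive(tx, rx):
    """Blocking-style receive: stalls (returns False) until the sender's tx is full."""
    if not tx.full:
        return False
    rx.value, rx.full = tx.value, True   # copy into the reception register
    tx.full = False                      # sender's register is marked "empty"
    return True


def read(rx):
    """Hand the received value to the ALU and mark the reception register empty."""
    value = rx.value
    rx.full = False
    return value


tx_311, rx_322 = CommRegister(), CommRegister()   # one direction of link 350
print(receive(tx_311, rx_322))   # False: nothing has been sent yet
print(send(tx_311, 42))          # True
print(send(tx_311, 43))          # False: register 311 is still full
print(receive(tx_311, rx_322))   # True
print(read(rx_322))              # 42
print(send(tx_311, 43))          # True: register 311 was emptied by receive
```

A non-blocking send would differ only in skipping the initial test on the status bit, at the risk of overwriting a value the neighbor has not yet received.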
Synchronization between elementary processors for sending and receiving data can also be carried out through the central controller, which then explicitly schedules the data exchanges in synchronous mode.
Fig. 4 shows a task delegation between two elementary processors under the supervision of the central controller.
When an elementary processor 430 has completed its task and has reported this to the central controller on the status bus, it becomes available for new processing. A neighboring elementary processor 420 can then delegate to it part of the task that it is currently executing. The elementary processor 420 is informed of the availability of the elementary processor 430 by the central controller, which keeps the status table up to date. The central controller can then indicate to it the task to be performed (by means of a new code to be loaded into the register 220) and trigger, at 425, a data transfer over the communication link connecting the two processors.
This indication may also take the form of a start address and an end address in the computation stream. The elementary processor 430 then uses its selection module to determine the instructions that are intended for the elementary processor 420 and whose addresses lie between said start and end addresses of the delegated task. Once the delegated task has been executed, the elementary processor 430 informs the central controller, which updates its status table. The elementary processor 420 is thereby informed of the end of the delegated task and triggers, at 435, the data transfer in order to receive the data in its reception register (or buffer). In the case of an optical sensor, the task delegation may, for example, concern part of the macropixel data and/or a particular operation. For example, if a point-of-interest search and motion detection must both be performed by the elementary processors on one zone of the image (hatched zone), and only motion detection must be performed on the rest of the image, the elementary processor 430 may be assigned a point-of-interest search on behalf of the elementary processor 420 once it has finished its motion-detection task. The task delegation process can be repeated over time, until the end of the program.
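The delegation sequence of Fig. 4 can be summarized with the sketch below. The names `CentralController`, `idle_neighbor` and `select_delegated`, the dictionary-based instruction format, and the choice of inclusive address bounds are assumptions made for the example; the patent specifies only that the status table is kept by the central controller and that the delegated task can be designated by a start and an end address.

```python
IDLE, BUSY = "idle", "busy"


class CentralController:
    """Keeps the status table fed by the status bus."""

    def __init__(self):
        self.status_table = {}

    def report(self, pe_id, status):
        self.status_table[pe_id] = status

    def idle_neighbor(self, neighbors):
        """Return the first neighbor currently reported idle, or None."""
        for pe_id in neighbors:
            if self.status_table.get(pe_id) == IDLE:
                return pe_id
        return None


def select_delegated(instructions, owner_id, start, end):
    """Selection performed by the helper processor: keep the owner's
    instructions whose address lies in [start, end] (inclusive by assumption)."""
    return [ins for ins in instructions
            if ins["pe"] == owner_id and start <= ins["addr"] <= end]


controller = CentralController()
controller.report(420, BUSY)
controller.report(430, IDLE)                   # 430 has finished its own task

helper = controller.idle_neighbor([430])       # 420 learns that 430 is available
stream = [{"pe": 420, "addr": a, "op": f"poi_{a}"} for a in range(6)]

controller.report(helper, BUSY)                # delegated task starts (transfer 425)
delegated = select_delegated(stream, owner_id=420, start=3, end=5)
executed = [ins["op"] for ins in delegated]    # work done by 430 on behalf of 420
controller.report(helper, IDLE)                # end of delegated task (transfer 435)

print(helper, executed)   # 430 ['poi_3', 'poi_4', 'poi_5']
```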

Claims

1. SIMD architecture processor comprising a matrix of elementary processors (150), each elementary processor being associated with a memory cell (155) intended to store data to be processed by said elementary processor, the processor further comprising a central controller (110), the elementary processors being connected to the central controller by a first bus, called the instruction bus, allowing the central controller to transmit instructions in parallel to the elementary processors, and by a second bus, called the status bus, allowing the central controller to receive the statuses of the various elementary processors, characterized in that:
- the central controller comprises a memory (140) in which the tasks to be performed by the various elementary processors are stored in the form of an instruction sequence, the central controller transmitting the instruction sequence in a loop on the instruction bus, each instruction comprising a computation stream identifier, a computation stream being defined as an ordered list of tasks, each computation stream concerning one or more elementary processors;
- each elementary processor comprises an instruction filter (210) and an identifier table (220), the instruction filter being adapted to extract the computation stream identifier from each instruction received by the elementary processor and to determine whether the identifier is present in said table, the instruction being stored in a FIFO buffer (230) for execution by the elementary processor if so, and rejected by the elementary processor if not.
2. SIMD architecture processor according to claim 1, characterized in that the FIFO buffer (230) is popped at each instruction executed by said elementary processor.
3. SIMD architecture processor according to claim 2, characterized in that each instruction of a task carries a sequence number indicating its order of execution within the task, the instruction filter of the elementary processor comprising a counter incremented each time the FIFO buffer is popped, an instruction being stored in the FIFO buffer only if its stream identifier is present in the table of the elementary processor and its sequence number is equal to the output value of said counter.
4. SIMD architecture processor according to one of the preceding claims, characterized in that the transmission frequency of the instructions on the instruction bus is substantially higher than the frequency at which these instructions are executed by the elementary processors.
5. SIMD architecture processor according to one of the preceding claims, characterized in that each instruction comprises an instruction pointer and in that the elementary processor comprises a micro-sequencer (240) connected to a memory storing a microcode library (250), the micro-sequencer sequencing the micro-instructions of the microcode designated by said instruction pointer.
6. SIMD architecture processor according to claim 5, characterized in that each elementary processor is connected to its neighbors by means of communication links, a communication link (350) between a first elementary processor (310) and a second elementary processor (320) connecting a first transmission register (311) of the first elementary processor to a second reception register (322) of the second elementary processor, and a second transmission register (321) of the second elementary processor to a reception register (312) of the first elementary processor.
7. SIMD architecture processor according to claim 6, characterized in that the execution of the micro-instructions by the first elementary processor is halted as long as the first transmission register is not empty.
8. SIMD architecture processor according to claim 6, characterized in that the execution of the micro-instructions by the second elementary processor is halted as long as the second reception register is not full.
9. SIMD architecture processor according to claim 6, characterized in that the first elementary processor, having finished executing a task, informs the central controller by a notification of its status, and in that the second elementary processor is informed of this status by the central controller.
10. Smart optical sensor, characterized in that it comprises a matrix of elementary sensors and a SIMD architecture processor according to one of the preceding claims, each elementary processor being associated with a plurality of sensors of said matrix and being adapted to process the signals from these sensors.
11. Smart optical sensor according to claim 10, characterized in that each elementary processor itself has a SIMD architecture.
EP19742845.1A 2018-06-08 2019-06-06 Mimd processor emulated on simd architecture Pending EP3782036A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
FR1855012A FR3082331B1 (en) 2018-06-08 2018-06-08 MIMD PROCESSOR EMULATED ON SIMD ARCHITECTURE
PCT/FR2019/051352 WO2019234359A1 (en) 2018-06-08 2019-06-06 Mimd processor emulated on simd architecture

Publications (1)

Publication Number Publication Date
EP3782036A1 true EP3782036A1 (en) 2021-02-24

Family

ID=63637999

Family Applications (1)

Application Number Title Priority Date Filing Date
EP19742845.1A Pending EP3782036A1 (en) 2018-06-08 2019-06-06 Mimd processor emulated on simd architecture

Country Status (4)

Country Link
US (1) US11182170B2 (en)
EP (1) EP3782036A1 (en)
FR (1) FR3082331B1 (en)
WO (1) WO2019234359A1 (en)

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4435758A (en) * 1980-03-10 1984-03-06 International Business Machines Corporation Method for conditional branch execution in SIMD vector processors
GB2211638A (en) * 1987-10-27 1989-07-05 Ibm Simd array processor
US5765011A (en) * 1990-11-13 1998-06-09 International Business Machines Corporation Parallel processing system having a synchronous SIMD processing with processing elements emulating SIMD operation using individual instruction streams
US9830156B2 (en) * 2011-08-12 2017-11-28 Nvidia Corporation Temporal SIMT execution optimization through elimination of redundant operations
FR2984556B1 (en) 2011-12-20 2014-09-26 Commissariat Energie Atomique SYSTEM AND METHOD FOR COMMUNICATION BETWEEN ACQUISITION CIRCUIT AND DATA PROCESSING CIRCUIT
US9229721B2 (en) * 2012-09-10 2016-01-05 Qualcomm Incorporated Executing subroutines in a multi-threaded processing system

Also Published As

Publication number Publication date
WO2019234359A1 (en) 2019-12-12
FR3082331A1 (en) 2019-12-13
US20210240482A1 (en) 2021-08-05
US11182170B2 (en) 2021-11-23
FR3082331B1 (en) 2020-09-18

Similar Documents

Publication Publication Date Title
EP0020202B1 (en) Multiprocessing system for signal treatment
WO2010037570A1 (en) Device for the parallel processing of a data stream
FR2752466A1 (en) INTEGRATED PROCESSOR DEVICE FOR DIGITAL SIGNALS
FR2587521A1 (en) SIGNAL PROCESSING APPARATUS FOR CARRYING OUT MULTIPLE RESOLUTION OPERATIONS IN REAL TIME
JPS635775B2 (en)
FR3091375A1 (en) LOADING-STORAGE INSTRUCTION
EP1805611A1 (en) Task processing scheduling method and device for implementing same
CN112799726B (en) Data processing device, method and related product
FR3091389A1 (en) REGISTER BENCHES IN A MULTIPLE PERFORMANCE WIRE PROCESSOR
EP1860571B1 (en) DMA controller, system on a chip comprising such a DMA controller, data exchange method using such a DMA controller
FR2583904A1 (en) MULTIPLE DATA TRAIN AND SINGLE INSTRUCTION (SIMD) TYPE COMPUTER SYSTEM WITH SELECTIVE DATA PROCESSING
EP4020475A1 (en) Memory module suitable for performing computing functions
CA2348069A1 (en) Multi-resource architecture management system and method
CN114330686A (en) Configurable convolution processing device and convolution calculation method
WO2019234359A1 (en) Mimd processor emulated on simd architecture
US11276132B2 (en) Data processing method and sensor device for performing the same
EP2553655B1 (en) Data stream processing architecture enabling extension of neighborhood mask
FR2475763A1 (en) DIGITAL PROCESSOR WITH PIPELINE STRUCTURE
WO2016071330A1 (en) Coarse-grain reconfigurable architecture method and device for executing an application code in its entirety
EP0333537A1 (en) Digital signal-processing device
EP0109337A2 (en) Data processing device with a multi-microcomputer for image processing
EP0346420B1 (en) Process for exchanging information in a multiprocessor system
WO2009068419A1 (en) Circuit comprising a microprogrammed machine for processing the inputs or the outputs of a processor so as to enable them to enter or leave the circuit according to any communication protocol
FR2484668A1 (en) METHOD AND APPARATUS FOR TRANSFERRING EXTERNAL INPUT AND OUTPUT DATA TO A MICROPROCESSOR SYSTEM
EP4206938A1 (en) Direct data transfer system

Legal Events

Date Code Title Description
STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: UNKNOWN

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE INTERNATIONAL PUBLICATION HAS BEEN MADE

PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: REQUEST FOR EXAMINATION WAS MADE

17P Request for examination filed

Effective date: 20201119

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR

AX Request for extension of the european patent

Extension state: BA ME

DAV Request for validation of the european patent (deleted)
DAX Request for extension of the european patent (deleted)
STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: EXAMINATION IS IN PROGRESS

17Q First examination report despatched

Effective date: 20220519