WO2002027513A1 - Multiprocessor system, data processing system, data processing method, and computer program - Google Patents
- Publication number
- WO2002027513A1 PCT/JP2001/008434
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- data
- data processing
- processing
- processors
- broadcast
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/16—Combinations of two or more digital computers each having at least an arithmetic unit, a program unit and a register, e.g. for a simultaneous processing of several programs
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/76—Architectures of general purpose stored program computers
- G06F15/80—Architectures of general purpose stored program computers comprising an array of processing units with common control, e.g. single instruction multiple data processors
- G06F15/8007—Architectures of general purpose stored program computers comprising an array of processing units with common control, e.g. single instruction multiple data processors single instruction multiple data [SIMD] multiprocessors
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T1/00—General purpose image data processing
- G06T1/20—Processor architectures; Processor configuration, e.g. pipelining
Definitions
- the present invention relates to a data processing system for performing data processing by a plurality of data processing means, for example, a multiprocessor system and a data processing method.
- the data processing capacity of a multiprocessor depends on the number of processors used and the processing method used, and the dependence on the performance of individual processors is small. For this reason, it is one of the effective means to improve the processing capacity of data processing equipment.
- (1) A processor that performs data processing uses only data processed by adjacently connected processors. Such control is suitable for cellular automata, image filters, calculation of cloth and wave motion, and calculation of polygons from curved surfaces.
- (2) A processor that performs data processing uses data processed by all processors. Such control is suitable for associative memory, optimization (four-color problem, traveling salesman problem), radiosity, clustering, multi-link simulation, learning, and so on.
- (3) A processor that performs data processing uses only data processed by some of the processors. Such control is suitable for self-organizing calculation, group algorithms based on visual judgment, many-to-many collision judgment, database search, continuous surface generation and deformation calculation, bone animation, inverse kinematics, and so on.
- The data processing in case (1) above can be realized efficiently by a conventional parallel processor.
- The processing speed of the entire system is limited by the communication speed between the parallel processors, so the processing speed of each processor cannot be fully exploited.
- By connecting all processors with a crossbar, the data processing of (2) and (3) can be performed at high speed, but the required hardware then becomes enormous, which is not practical.
- An object of the present invention is to provide various multiprocessor systems, data processing systems, data processing methods, computer programs, and semiconductor devices.
Disclosure of the invention
- the present invention provides various multiprocessor systems, data processing systems, data processing methods, computer programs, and semiconductor devices as described below.
- the first multiprocessor system includes a plurality of processors that perform data processing, and a controller that broadcasts broadcast data including data used for data processing to the plurality of processors.
- Each of the plurality of processors performs data processing by selecting, from the broadcast data broadcast by the controller, only the data necessary for the data processing it performs.
- Because each of the processors selects only the data it needs from the broadcast data and processes it, no data contention occurs and high-speed processing is realized overall.
- To enable each processor to use or refer to the result of processing by another processor, the controller obtains the data processing results from each of the plurality of processors and broadcasts the obtained processing results as the broadcast data.
- Identification data for identifying the processor is assigned to each of the plurality of processors, and the controller generates broadcast data in which each processing result is accompanied by the identification data of the processor from which it was obtained, and broadcasts that data.
- Each processor can then easily select, based on the identification data, the processing results necessary for the data processing to be performed at the next timing.
- The identification data also lets each processor easily know which processor produced each broadcast processing result.
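The selection by identification data described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the names `make_broadcast` and `select_needed`, the use of integers as identification data, and the sample values are all assumptions made for the example.

```python
from typing import Dict, List, Set, Tuple

# One entry of the broadcast data: (identification data, processing result).
Pair = Tuple[int, float]


def make_broadcast(results: Dict[int, float]) -> List[Pair]:
    """Controller side (hypothetical): tag each result with its source ID."""
    return [(proc_id, value) for proc_id, value in sorted(results.items())]


def select_needed(broadcast: List[Pair], needed_ids: Set[int]) -> Dict[int, float]:
    """Processor side (hypothetical): keep only results from the IDs it needs."""
    return {proc_id: value for proc_id, value in broadcast if proc_id in needed_ids}


# Sample values chosen only for illustration.
results = {0: 1.5, 1: -2.0, 2: 0.25, 3: 4.0}
broadcast = make_broadcast(results)
picked = select_needed(broadcast, {1, 3})
```

Every processor receives the same `broadcast` list, so no processor has to request data from another one; the filtering happens locally.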
- The multiprocessor system may further include a sorting mechanism that acquires the identification data from each processor and sends the acquired identification data to the controller in a predetermined order. The controller is then configured to acquire the processing results based on the identification data received from the sorting mechanism. In this case, each processor further comprises means for generating priority data that determines the order in which processing results are read into the controller, and is configured to send this priority data to the sorting mechanism together with its own identification data; the sorting mechanism determines the transmission order of the identification data based on the priority data. By providing a sorting mechanism, the controller can obtain processing results in the required order, for example when the processing order is determined for the multiprocessor system as a whole, and the system as a whole can execute complex processing efficiently.
- The sorting mechanism includes, for example, the same number of registers as there are processors; means for recording the identification data and priority data sent from each processor in the register corresponding to that processor; and a comparator for determining the order of the identification data by comparing the priority data recorded in the registers with one another. The transmission order of the identification data is determined based on the result of the comparison by the comparator.
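A software sketch of such a sorting mechanism, with one register per processor, might look like the following. The class name `SortMechanism`, the convention that a larger number means a higher priority, and the tie-breaking by register position are assumptions for illustration; the text does not fix them.

```python
from typing import List, Optional, Tuple


class SortMechanism:
    """Hypothetical model: one register per processor holding (ID, priority)."""

    def __init__(self, num_processors: int) -> None:
        self.registers: List[Optional[Tuple[int, float]]] = [None] * num_processors

    def record(self, proc_id: int, priority: float) -> None:
        # Record the data in the register that corresponds to the sender.
        self.registers[proc_id] = (proc_id, priority)

    def transmission_order(self) -> List[int]:
        # Stand-in for the comparator: order IDs by descending priority.
        # Python's sort is stable, so ties keep register order.
        filled = [r for r in self.registers if r is not None]
        filled.sort(key=lambda r: r[1], reverse=True)
        return [proc_id for proc_id, _ in filled]


sorter = SortMechanism(4)
sorter.record(0, 0.2)
sorter.record(1, 0.9)
sorter.record(2, 0.5)
order = sorter.transmission_order()
```

Processor 3 has not reported, so its register stays empty and its ID is not sent to the controller.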
- The controller in the first multiprocessor system includes, for example, a memory for recording data; recording control means for acquiring the processing result from the processor specified by the identification data received from the sorting mechanism and recording the processing result in the memory; and means for reading out the processing result recorded in the memory and generating the broadcast data so that it includes the processing result and the received identification data.
- Each of the plurality of processors can be realized with a data processing mechanism that determines whether the broadcast data includes data necessary for the data processing it performs and, when such data is included, selects only that data and performs the data processing; means for sending the result of the data processing performed by the data processing mechanism, together with its own identification data, to the controller in response to a request from the controller; and means for sending processing end notification data including its own identification data to the sorting mechanism upon completion of the data processing.
- The second multiprocessor system includes a plurality of processors each holding template data to be compared with input data; a controller that broadcasts the input data to the plurality of processors; and a comparison mechanism that compares the outputs of the plurality of processors.
- The template data held by each of the plurality of processors differs from the template data held by each of the other processors.
- Each of the plurality of processors calculates a difference value between a feature of the input data broadcast by the controller and a feature of the template data it holds, and transmits the calculated difference value to the comparison mechanism as pair data together with identification data identifying itself. The comparison mechanism selects one of the difference values received from the plurality of processors and sends the identification data paired with the selected difference value to the controller. The controller specifies one processor from among the plurality of processors based on the identification data received from the comparison mechanism.
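The template comparison of the second system can be illustrated with a small sketch. Two points here are assumptions not stated in the text: the difference value is taken to be a squared Euclidean distance between feature vectors, and the comparison mechanism is taken to select the smallest difference. The function names are hypothetical.

```python
from typing import Dict, Sequence, Tuple


def difference(a: Sequence[float], b: Sequence[float]) -> float:
    """Assumed difference measure: squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def best_match(input_features: Sequence[float],
               templates: Dict[int, Sequence[float]]) -> Tuple[int, float]:
    """Each processor forms (difference, ID); the comparator keeps the smallest."""
    pairs = [(difference(input_features, tpl), proc_id)
             for proc_id, tpl in templates.items()]
    diff, proc_id = min(pairs)
    return proc_id, diff


# One distinct template per processor ID (illustrative values).
templates = {0: [0.0, 0.0], 1: [1.0, 1.0], 2: [5.0, 5.0]}
winner_id, winner_diff = best_match([0.9, 1.2], templates)
```

In hardware each difference would be computed in parallel by its own processor; the list comprehension only stands in for that step.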
- The third multiprocessor system includes a plurality of processors that perform data processing; a controller that broadcasts data used for the data processing to the plurality of processors; and a summing circuit that calculates the sum of the data processing results of the plurality of processors.
- Each of the plurality of processors performs data processing by selecting only the data necessary for its processing from the data broadcast by the controller, and transmits the processing result to the summing circuit. The summing circuit calculates the sum of the processing results transmitted from the plurality of processors and sends the calculated sum to the controller. The controller broadcasts the sum received from the summing circuit to the plurality of processors.
- The sum of the data processing results is often needed for normalization calculations; the calculated sum may be broadcast to each processor. With a multiprocessor system configured as above, these processes can be performed at high speed.
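As a sketch of why a broadcast sum helps with normalization: once every processor knows the total, each can divide its own result locally. The function name and the error handling for a zero total are assumptions for illustration.

```python
from typing import List


def normalize_via_broadcast(local_results: List[float]) -> List[float]:
    """Hypothetical flow: sum all results, broadcast the sum, divide locally."""
    total = sum(local_results)  # role of the summing circuit
    if total == 0:
        raise ValueError("cannot normalize when the sum is zero")
    # The controller would broadcast `total`; each processor then divides
    # its own result by it without contacting any other processor.
    return [value / total for value in local_results]


normalized = normalize_via_broadcast([1.0, 3.0, 4.0])
```

The normalized values sum to 1, which is the usual goal of this kind of step.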
- At least some of the plurality of processors may be ring-connected to one another via a shared memory, so that data is transferred between the ring-connected processors via the shared memory.
- The data processing method provided by the present invention is executed in an apparatus or system having a plurality of data processing means for processing data and control means for controlling the operation of each of the data processing means. That is,
- the control means acquires data processing results, in a predetermined order, from those of the plurality of data processing means that have performed data processing, generates broadcast data including the acquired processing results and identification data identifying the data processing means from which each was acquired, and broadcasts the broadcast data to the plurality of data processing means;
- at least one of the plurality of data processing means selects only some of the processing results, specified based on the identification data included in the broadcast data received from the control means, performs data processing, and transmits the processing result to the control means together with identification data representing itself.
- A first data processing system comprises a plurality of data processing means for performing data processing, and control means for broadcasting, to the plurality of data processing means, broadcast data that includes data processing results received from some or all of the plurality of data processing means and that is used for data processing by at least one of the data processing means.
- Each data processing means selects only the data necessary for the data processing it performs, performs the data processing, and sends the processing result to the control means.
- the second data processing system is a system that performs two-way communication with each of a plurality of data processing units that perform data processing, and specifies at least one of the data processing units and specifies the specified data processing unit.
- a computer program provided by the present invention is a computer program for performing bidirectional communication with each of a plurality of data processing means for performing data processing.
- The semiconductor device provided by the present invention is incorporated in an apparatus equipped with a computer that performs bidirectional communication with each of a plurality of data processing units that perform data processing, and causes the computer to have the following functions.
- FIG. 1 is a diagram showing a configuration example of a multiprocessor system to which the present invention is applied.
- FIG. 2 is a diagram illustrating a configuration example of a BCMC according to the present invention.
- FIG. 3 is a diagram illustrating a configuration example of a cell processor according to the present invention.
- FIG. 4 is a diagram showing a configuration example of a WTA/sum circuit according to the present invention.
- FIG. 5 is a flowchart showing the flow of processing executed by the multiprocessor system according to the present embodiment.
- FIG. 6 is a conceptual diagram of using the data processing results of adjacent processors according to the present invention.
- FIG. 7 is a conceptual diagram of using the data processing results of some processors according to the present invention.
- FIG. 8 is an exemplary diagram in which grid points according to the present invention are grouped.
- FIG. 9 is an exemplary diagram in which objects according to the present invention are divided into clusters.
- FIG. 10 is a flowchart showing the processing flow of the collision determination algorithm according to the present invention.
- FIG. 1 is a diagram showing a configuration example of a multiprocessor system.
- The multiprocessor system 1 includes a broadcast memory controller (hereinafter "BCMC") 10, which is a control means for data processing and for recording and reading data; a plurality of cell processors 20, each of which is an example of data processing means; and a plurality of WTA (Winner Take All)/sum circuits 30 that provide various functions required for the data processing.
- the BCMC 10 and all cell processors 20 are connected by a broadcast channel (a communication channel capable of simultaneous transmission).
- The multiprocessor system 1 manages the state variable values, which are an example of the data processing results of the cell processors 20, in the BCMC 10, and broadcasts the state variable values of all the cell processors 20 from the BCMC 10 as an example of reference values. Each cell processor 20 can thereby refer at high speed to the state variable values generated in the other cell processors 20.
- The broadcast channel is a transmission path between the BCMC 10 and the plurality of cell processors 20, and includes an address bus used for passing addresses and a data bus used for passing data such as state variable values.
- the address includes a cell address for specifying each cell processor 20 and a broadcast address for all cell processors 20.
- The cell address corresponds to an address (physical or logical) in the memory, and the state variable value from a cell processor 20 is always stored at the address corresponding to the cell address indicating that cell processor 20.
- Each cell processor 20 is provided with ID (identification) as identification information for identifying each cell processor.
- the cell address also corresponds to this ID. With this, it is possible to specify from which cell processor 20 the state variable value was output from the cell address.
- The WTA/sum circuits 30 are connected as shown in the figure. That is, they are connected in a pyramid shape with the cell processor 20 side as the first stage. Two cell processors 20 are connected to the input terminals of each first-stage WTA/sum circuit 30, and its output terminal is connected to an input terminal of a second-stage WTA/sum circuit 30.
- In the second and subsequent stages, the input terminals are connected to the output terminals of two lower-stage WTA/sum circuits 30, and the output terminal is connected to an input terminal of an upper-stage WTA/sum circuit 30.
- The input terminals of the uppermost WTA/sum circuit 30 are connected to the output terminals of two lower-stage WTA/sum circuits 30, and its output terminal is connected to the BCMC 10.
- The present invention can also be implemented by connecting the WTA/sum circuits 30 in a cascade instead of the connection form shown in the figure. In that case, two cell processors 20 are connected to the input terminals of the first-stage WTA/sum circuit 30, and its output terminal is connected to an input terminal of the next upper stage.
- In each subsequent stage, the output terminal of the lower-stage WTA/sum circuit 30 and a cell processor 20 are connected to the input terminals, and the output terminal is connected to an input terminal of the next upper stage.
- The output terminal of the lower-stage WTA/sum circuit 30 and a cell processor 20 are connected to the input terminals of the uppermost WTA/sum circuit 30, whose output terminal is connected to the BCMC 10.
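To compare the two connection forms, the following sketch counts the two-input circuits per stage. It assumes, for simplicity, that the number of cell processors is a power of two in the pyramid case; the function names are hypothetical and the text itself gives no counts.

```python
from typing import List


def pyramid_stages(num_cells: int) -> List[int]:
    """Circuits per stage in the pyramid form, first stage first."""
    if num_cells < 2 or num_cells & (num_cells - 1):
        raise ValueError("this sketch assumes a power-of-two number of cells")
    stages = []
    width = num_cells // 2  # each first-stage circuit takes two cell processors
    while width >= 1:
        stages.append(width)
        width //= 2         # each upper circuit merges two lower outputs
    return stages


def cascade_stages(num_cells: int) -> int:
    """Stages in the cascade form: the first takes two cells, each later one adds one."""
    if num_cells < 2:
        raise ValueError("at least two cell processors are needed")
    return num_cells - 1


pyramid = pyramid_stages(8)
cascade = cascade_stages(8)
```

Both forms use the same number of circuits (one fewer than the number of cell processors), but the pyramid needs far fewer stages between a cell processor and the BCMC.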
- The BCMC 10 broadcasts data to all the cell processors 20 via the broadcast channel, and acquires and holds the state variable values from each cell processor 20.
- FIG. 2 shows a configuration example of the BCMC 10.
- The BCMC 10 includes a CPU core 101 that controls the operation of the entire multiprocessor system 1, a rewritable main memory 102 such as an SRAM (Static Random Access Memory), and a DMAC (Direct Memory Access Controller) 103, which are connected by a bus B1.
- The CPU core 101 is a semiconductor device that, in cooperation with the main memory 102, reads and executes a predetermined computer program and thereby forms the functions for performing the data processing characteristic of the present invention.
- the main memory 102 is used as a shared memory for the entire system.
- the output terminal of the WTA / summation circuit 30 at the uppermost stage and an external memory such as a hard disk or a portable medium are also connected to the bus B1.
- The CPU core 101 reads a startup program from the external memory at startup and executes it to start the operating system. Various data required for data processing are also read from the external memory and loaded into the main memory 102.
- the main memory 102 also stores data such as the state variable value of each cell processor 20.
- the state variable value is stored in the address of the main memory 102 corresponding to the cell address of the cell processor 20 that has calculated the state variable value.
- The CPU core 101 also generates the broadcast data to be broadcast to each cell processor 20 based on the data read from the main memory 102.
- The broadcast data is, for example, pair data consisting of a state variable value and the cell address indicating the cell processor 20 that calculated that state variable value.
- One or more sets of pair data are generated.
- The DMAC 103 is a semiconductor device that performs direct memory access transfer control between the main memory 102 and each cell processor 20. For example, it broadcasts the broadcast data to each cell processor 20 via the broadcast channel. It also individually acquires the data processing results of the cell processors 20 and writes them to the main memory 102.
<Cell processor>
- Each cell processor 20 performs data processing by selecting the required data from the broadcast data, and reports this to the WTA / summation circuit 30 when the data processing is completed.
- The state variable value, which is the data processing result, is transmitted to the BCMC 10 in accordance with an instruction from the BCMC 10.
- Each of the cell processors 20 is ring-connected via a shared memory (not shown).
- Each cell processor 20 may perform data processing with a synchronous clock, or may perform data processing with a different clock.
- FIG. 3 shows a configuration example of the cell processor 20.
- The cell processor 20 includes a cell CPU 201, an input buffer 202, an output buffer 203, a WTA buffer 204, a program controller 205, an instruction memory 206, and a data memory 207.
- The cell CPU 201 is a processor with a programmable floating-point unit. It controls the operation of the cell processor 20 and performs data processing.
- The cell CPU 201 acquires, via the input buffer 202, the broadcast data broadcast from the BCMC 10, determines from the cell address of the pair data whether the data is necessary for the processing it is to perform, and, if it is necessary, writes the state variable value to the corresponding address in the data memory 207.
- The cell CPU 201 then reads the state variable value from the data memory 207, performs data processing, writes the data processing result to the output buffer 203, and sends data indicating the end of the data processing to the WTA/sum circuit 30.
- the input buffer 202 holds broadcast data broadcast from the BCMC 10.
- the held broadcast data is transmitted to the cell CPU 201 in response to a request from the cell CPU 201.
- the output buffer 203 holds the state variable value of the cell CPU 201.
- the held state variable value is transmitted to the BCMC 10 in response to a request from the BCMC 10.
- the input buffer 202 and the output buffer 203 may also transmit and receive control data and the like.
- The WTA buffer 204 receives the end data indicating the end of data processing from the cell CPU 201 when the cell CPU 201 finishes data processing, and transmits it to the WTA/sum circuit 30, thereby reporting the end of the data processing to the WTA/sum circuit 30.
- The end data indicating the end of the data processing includes, for example, the ID of the cell processor 20 itself and priority data for determining the priority with which the state variable value stored in the output buffer 203 is read into the BCMC 10.
- the program controller 205 fetches a program that defines the operation of the cell processor 20 from the BCMC 10.
- The programs that define the operation of the cell processor 20 include a data processing program executed by the cell processor 20, a data selection program that determines the data required for processing by the cell processor 20, and a priority determination program that determines the priority with which processing results are read into the BCMC 10.
- the instruction memory 206 stores the program fetched by the program controller 205.
- The saved program is loaded into the cell CPU 201 as needed.
- The data memory 207 stores data processed in the cell processor 20. Broadcast data determined to be necessary by the cell CPU 201 is written to it, at the address corresponding to the cell address.
- A part of the data memory 207 is connected to the adjacent cell processors 20 via the shared memory, so that data can be exchanged with the adjacent cell processors 20 every cycle.
- The plurality of WTA/sum circuits 30 determine the order in which the BCMC 10 takes in the state variable values from the cell processors 20, based on the end data sent from each cell processor 20, and report it to the BCMC 10.
- FIG. 4 shows a configuration example of the WTA-sum circuit 30.
- Each WTA/sum circuit 30 has two input registers A and B (hereinafter, first input register 301 and second input register 302), a switch 303, a comparator 304, an adder 305, and an output register 306.
- The first input register 301 and the second input register 302 each include an integer register and a floating-point register.
- In the integer register, the ID contained in the end data sent from the cell processor 20 is written, and in the floating-point register, for example, the priority data is written.
- the switch 303 activates one of the comparator 304 and the adder 305. Specifically, only one can be used according to the operation mode.
- the operation mode is determined by, for example, an instruction from BCMC10. The operation mode will be described later.
- The comparator 304 compares the floating-point values held in the floating-point registers of the first input register 301 and the second input register 302, and writes the larger (or smaller) value and its associated integer to the output register 306.
- The adder 305 calculates the sum of the floating-point values held in the floating-point registers of the first input register 301 and the second input register 302, and writes the result to the output register 306.
- The output register 306 is configured substantially the same as the first input register 301 and the second input register 302. That is, it has an integer register and a floating-point register; the ID is written in the integer register and the priority data in the floating-point register.
- The WTA/sum circuit 30 has the three operation modes described below.
- Maximum value mode:
- The comparator 304 is activated by the switch 303.
- The comparator 304 compares the floating-point values A and B held in the floating-point registers of the first input register 301 and the second input register 302, respectively, and writes the larger value (or the smaller value) and its associated integer value to the output register 306.
- The first input register 301 and the second input register 302 are cleared.
- The contents of the output register 306 are written to an input register of the upper-stage WTA/sum circuit 30. If the input register to be written has not been cleared, the write stalls: it is not performed in that cycle and is performed in the next cycle.
- Addition mode:
- The adder 305 is activated by the switch 303.
- The adder 305 calculates the sum of the floating-point values A and B held in the floating-point registers of the first input register 301 and the second input register 302, and writes the result to the output register 306.
- The contents of the output register 306 are written to an input register of the upper-stage WTA/sum circuit 30.
- Approximate sort mode:
- The comparator 304 is activated by the switch 303.
- The comparator 304 compares the floating-point values A and B held in the floating-point registers of the first input register 301 and the second input register 302, respectively, and writes the larger value (or the smaller value) and its associated integer value to the output register 306.
- The data received by the BCMC 10 from the output register 306 of the uppermost WTA/sum circuit 30 is sorted in ascending or descending order of the floating-point value.
- The first input register 301, the second input register 302, and the output register 306 of all the WTA/sum circuits 30 are cleared.
- As a whole, the plurality of WTA/sum circuits 30 function as a sorting mechanism and/or a summing circuit.
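The maximum value mode and the addition mode can be modeled in software as a pairwise reduction through the pyramid of circuits. This sketch ignores register clearing, stalls, and the approximate sort mode, and assumes a power-of-two number of inputs; the names `wta_stage` and `pyramid_reduce` are hypothetical, as is the choice to emit ID 0 in addition mode.

```python
from typing import List, Tuple

# (integer register contents: ID, floating-point register contents: value)
Entry = Tuple[int, float]


def wta_stage(a: Entry, b: Entry, mode: str) -> Entry:
    """One two-input circuit: the mode selects comparator or adder."""
    if mode == "max":
        return a if a[1] >= b[1] else b   # comparator keeps the larger value
    if mode == "add":
        return (0, a[1] + b[1])           # adder; the ID carries no meaning here
    raise ValueError(f"unknown mode: {mode}")


def pyramid_reduce(entries: List[Entry], mode: str) -> Entry:
    """Pass entries stage by stage up to the uppermost circuit."""
    n = len(entries)
    if n < 2 or n & (n - 1):
        raise ValueError("this sketch assumes a power-of-two number of inputs")
    level = list(entries)
    while len(level) > 1:
        level = [wta_stage(level[i], level[i + 1], mode)
                 for i in range(0, len(level), 2)]
    return level[0]


entries = [(0, 0.5), (1, 2.0), (2, 1.25), (3, 0.25)]
winner = pyramid_reduce(entries, "max")
total = pyramid_reduce(entries, "add")
```

The same wiring serves both purposes; only the mode passed to each stage changes, which mirrors the role of the switch 303.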
- The WTA/sum circuit 30 operating in the maximum value mode and the approximate sort mode may also be realized as follows.
- The WTA/sum circuit includes as many input registers as there are cell processors 20, a switch, a comparator, an adder, and an output register. Each input register has an integer register and a floating-point register, like the first input register 301 and the second input register 302. The comparator compares the floating-point values held in the floating-point registers of all the input registers. The adder calculates the sum of the floating-point values held in the floating-point registers of all the input registers. The output register is the same as the output register 306 of the WTA/sum circuit 30 described above.
- The comparator compares the priority data held in the floating-point registers of the input registers and writes the associated IDs to the output register sequentially in descending order of priority. This allows the IDs to be sent to the BCMC 10 in order of priority.
- The data held in the floating-point registers can also be added by the adder to obtain their sum.
- Such a WTA/sum circuit functions by itself as the sort mechanism and the sum circuit of the present invention, without taking the connection form shown in FIG. 1.
- FIG. 5 is a flowchart showing a flow of processing executed in the multiprocessor system 1.
- The initial values of the state variable values of all the cell processors 20 are stored in the main memory 102 of the BCMC 10 in advance.
- The BCMC 10 creates broadcast data from pairs of the state variable value of a cell processor 20 and the cell address indicating that cell processor 20 (step S101), and broadcasts the created broadcast data to all the cell processors 20 (step S102).
- Each cell processor 20 takes the broadcast data into the input buffer 202.
- Using the data selection program stored in the instruction memory 206, the cell CPU 201 checks the cell addresses of the broadcast data held in the input buffer 202 to see whether there is a state variable value required for the data processing performed by its own cell processor 20 (step S103). If there is none, the cell processor 20 ends the processing operation (step S103: no). If there is (step S103: yes), the state variable value is overwritten at the address in the data memory 207 corresponding to the cell address paired with it (step S104).
- Each cell processor 20 processes the state variable values recorded in the data memory 207 using the data processing program stored in the instruction memory 206 and generates a new state variable value. The new state variable value is written to the data memory 207 and also to the output buffer 203 (step S105). In the data memory 207, it is overwritten at the address corresponding to the cell processor's own cell address.
- The cell CPU 201 sends the end data, including the ID and the priority data, to an input register of the first-stage WTA/sum circuit 30 via the WTA buffer 204, thereby reporting the end of the data processing (step S106).
- the priority data is generated by a predetermined priority determination program before or after data processing.
- the first stage WTAA summation circuit 30 outputs the ID to the integer register of the input register and the priority data to the floating-point register in the termination data sent from each cell processor 20. Hold.
- the WT A * summation circuit 30 operates in the approximate sort mode. Therefore, the switch 303 activates the comparator 304.
- the first input register 301 of the summation circuit 30 and the integer register of the second input register hold IDs sent from different cell processors 20, respectively. Each floating-point register holds the priority data associated with the ID. Carry.
- The comparator 304 reads the priority data from the floating-point registers of the first input register 301 and the second input register 302 and compares the priorities. The higher-priority data and its associated ID are written to the floating-point register and the integer register of the output register 306. The input register whose contents were written to the output register 306 is then cleared. The ID and priority data written to the output register 306 are written to an input register of the WTA/summation circuit 30 in the next higher stage.
- This processing is performed in the WTA/summation circuit 30 of each stage.
- The uppermost WTA/summation circuit 30 sends the ID written in the integer register of the output register 306 to the BCMC 10.
- Taken as a whole, the WTA/summation circuits 30 transmit the IDs to the BCMC 10 in order of priority (step S107).
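The staged comparison can be illustrated in software as a tournament over (ID, priority) pairs. The sketch below is an idealized model under stated assumptions: the function names are hypothetical, a larger number is assumed to mean higher priority, and the winner is removed and the tournament rerun, which yields an exact ordering, whereas the text describes the hardware mode as an approximate sort.

```python
def tournament_winner(entries):
    """Reduce (id, priority) pairs two at a time, as a tree of comparators would."""
    level = list(entries)
    while len(level) > 1:
        next_level = []
        for k in range(0, len(level) - 1, 2):
            a, b = level[k], level[k + 1]
            # The comparator forwards the entry with the higher priority.
            next_level.append(a if a[1] >= b[1] else b)
        if len(level) % 2:
            # An unpaired entry passes through to the next stage unchanged.
            next_level.append(level[-1])
        level = next_level
    return level[0]


def ids_by_priority(entries):
    """Emit IDs in descending priority by repeatedly extracting the winner."""
    remaining = list(entries)
    order = []
    while remaining:
        winner = tournament_winner(remaining)
        order.append(winner[0])
        remaining.remove(winner)  # models clearing the winning input register
    return order
```

Each call to `tournament_winner` needs about log2(n) comparator stages for n cell processors, which is the reason a tree arrangement is attractive for this kind of selection.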
- The BCMC 10 obtains the processed state variable value from the output buffer 203 of the cell processor 20 corresponding to the ID sent from the WTA/summation circuit 30.
- The acquired state variable value is written to the main memory 102 in the BCMC 10, overwriting the address that corresponds to the cell address of the cell processor 20 that performed the processing (step S108).
- the BCMC 10 obtains a data processing result from each cell processor 20, and thereby generates broadcast data.
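The collection step on the controller side might be modelled as below. This is a sketch under assumptions: `collect_results` is a hypothetical name, main memory is modelled as a dictionary keyed by cell address, and the ID reported by the WTA/summation circuit is assumed to equal the cell address.

```python
def collect_results(main_memory, output_buffers, ids_in_priority_order):
    """Step S108 (modelled): copy each reported result over its slot in main memory.

    main_memory: dict mapping cell address -> state variable value
    output_buffers: dict mapping cell ID -> newly produced value
    ids_in_priority_order: IDs as delivered by the WTA/summation circuit
    """
    for cell_id in ids_in_priority_order:
        main_memory[cell_id] = output_buffers[cell_id]
    # The next broadcast is the full set of (cell address, value) pairs.
    return sorted(main_memory.items())
```

Cells that reported nothing keep their previous value in main memory, so the next broadcast still carries a value for every cell address.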
- Each cell processor 20 performs data processing by selecting only the data it needs from the broadcast data. Because the broadcast data is used, processing that draws on data processed by all of the other cell processors 20 becomes possible. In addition, because the broadcast data is built from pairs consisting of a data processing result from a cell processor 20 and a cell address indicating the cell processor 20 that generated that result, processing that uses only the processing result of a specific cell processor 20 also becomes possible. Furthermore, since adjacent cell processors 20 are connected via a shared memory, processing between adjacent cell processors 20 is possible as in the conventional case.
- Each cell processor 20 does not access the main memory 102 directly to fetch the data it requires. Because the data needed for processing is held in the cell processor 20, high-speed processing can be performed without contention.
- In the figure, each marker represents a cell processor; a shaded marker is a cell processor that performs data processing, and the remaining marker type is a cell processor that holds required data.
- $X_{i,j} = (X_{i-1,j} + X_{i+1,j} + X_{i,j-1} + X_{i,j+1}) / 4$
- i: row number of the grid point
- j: column number of the grid point
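The grid-point update, in which each point becomes the average of its four neighbours in rows i-1 and i+1 and columns j-1 and j+1, can be sketched as one sweep over a grid. The code makes assumptions the text does not state: boundary points are held fixed, and every new value is computed from the previous sweep's values (a Jacobi-style update). The function name `relax` is hypothetical.

```python
def relax(grid):
    """Return a new grid in which each interior point is the mean of its four neighbours."""
    rows, cols = len(grid), len(grid[0])
    new_grid = [row[:] for row in grid]  # boundary values are copied unchanged
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            new_grid[i][j] = (
                grid[i - 1][j] + grid[i + 1][j] + grid[i][j - 1] + grid[i][j + 1]
            ) / 4
    return new_grid
```

In the multiprocessor setting described here, each cell processor would run this inner loop only for its own group of grid points, taking the neighbour values it lacks from the broadcast data or from the shared memory.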
- FIG. 8 is an exemplary diagram in which the grid point data is grouped into groups of five. One group of grid point data is processed by one cell processor 20.
- the cell processor 20 stores the required grid point data from the broadcast data in the data memory 207.
- the grid point data is sequentially read from the data memory 207 and subjected to data processing.
- Data is transferred to and from a cell processor 20 connected via the shared memory by using that shared memory. If writing data to the shared memory takes one cycle, the grouped data can be transferred between the cell processors 20 in 2n cycles.
- By operating the cell processors 20 synchronously and executing writes to the shared memory and calculation at the same time, as in pipeline processing, communication and calculation between the cell processors 20 can be performed simultaneously.
- The data is broadcast by the BCMC 10. Each cell processor 20 determines whether broadcast data is data it requires based on the i and j values of the broadcast data.
- Data in the row (or column) direction can be processed by grouping the broadcast data, and data in the column (or row) direction can be processed by transferring data via the shared memory.
- An example in which only data processed by some of the cell processors 20 among all the cell processors 20 is used will be described with reference to FIG. 7.
- In this figure as well, each marker represents a cell processor; a shaded marker is a cell processor that performs data processing, and the remaining marker type is a cell processor that holds required data.
- Such a multiprocessor system is useful for realizing a Hopfield associative memory.
- Each cell processor 20 holds a state variable value as a data processing result and a weight coefficient indicating the importance of the state variable value.
- The cell processors 20 are numbered, and the BCMC 10 takes in the state variable values from the cell processors 20 in numerical order.
- the BCMC 10 broadcasts the state variable values fetched from all the cell processors 20 as broadcast data.
- Each cell processor 20 selects only the necessary state variable values from the broadcast data, performs a product-sum operation with the weight coefficients, and updates its state variable value. If the required state variable values are all of the state variable values included in the broadcast data, the processing uses data processed by all of the processors.
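The broadcast-then-update cycle for a Hopfield-style associative memory can be sketched as follows. The details are assumptions rather than statements from the text: states are taken to be +1 or -1, the update rule is the sign of the weighted sum, a zero weight is taken to mean "this cell does not need that value", and all cells update synchronously from the same broadcast. The name `hopfield_step` is hypothetical.

```python
def hopfield_step(states, weights):
    """One synchronous update in which every cell reads the same broadcast data.

    states: list of +1/-1 values, indexed by cell number
    weights: weights[cell][src] is the coefficient cell applies to src's state
    """
    broadcast = list(enumerate(states))  # (cell number, state variable value) pairs
    new_states = []
    for cell, row in enumerate(weights):
        # Product-sum over only the broadcast values this cell has a weight for.
        total = sum(row[src] * value for src, value in broadcast if row[src] != 0)
        new_states.append(1 if total >= 0 else -1)
    return new_states
```

With weights formed as the outer product of a stored pattern with itself (and a zero diagonal), a single step pulls a state that differs from the pattern in one position back to the stored pattern.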
- processing is performed to identify the cell processor 20 that holds data most similar to the characteristics of the input data. This process is performed as follows.
- Each cell processor 20 holds template data to be compared in advance.
- The BCMC 10 broadcasts the input data to all cell processors 20.
- Each cell processor 20 calculates a difference value between the features of the template data it holds and the features of the input data.
- The difference value is sent to the WTA/summation circuit 30 together with the ID.
- The WTA/summation circuit 30 operates in maximum value mode.
- The integer register of the input register holds the ID and the floating-point register holds the difference value.
- The difference values are compared by the comparator 304, and the smaller difference value and its associated ID are sent to the output register 306. This is performed throughout the WTA/summation circuit 30 to obtain the smallest difference value and its associated ID.
- The ID and the difference value are sent to the BCMC 10.
- The BCMC 10 identifies the cell processor 20 by the ID. As a result, the template data most similar to the features of the input data, and the difference value between that template data and the input data, can be detected.
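The template search can be sketched as a per-cell difference calculation followed by a selection of the smallest difference. The text does not say how the difference value is computed, so the sum of absolute differences used here is an assumption, as are the function name `best_template` and the dictionary of templates keyed by cell ID.

```python
def best_template(templates, features):
    """Return (cell ID, difference value) for the template closest to the input.

    templates: dict mapping cell ID -> list of template feature values
    features: list of input feature values broadcast to every cell
    """
    # Each cell computes its own difference value independently.
    results = [
        (cell_id, sum(abs(t - x) for t, x in zip(template, features)))
        for cell_id, template in templates.items()
    ]
    # The selection stage forwards the smallest difference and its ID.
    return min(results, key=lambda pair: pair[1])
```

Only the winning ID and its difference value travel back to the controller, so the amount of data returned does not grow with the number of templates.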
- A collision determination algorithm determines whether n objects existing in a certain space collide with each other and, if so, how strongly. It is assumed that the spatial distribution of the n objects is biased and that the objects are divided into m clusters. Here, for example, it is determined which of the other (n-1) objects a given object collides with most strongly.
- Fig. 9 is an illustration of objects in such a space.
- The objects enclosed in one rectangle form one cluster.
- The objects are divided into five clusters.
- The data indicating the objects is broadcast from the BCMC 10 and taken into the cell processors 20 cluster by cluster.
- Each cell processor 20 performs processing on the positions and movements in the space of the objects included in the one cluster it has taken in.
- the cell processors A to E perform processing on objects divided into five clusters.
- The BCMC 10 generates broadcast data that includes object data, containing the position and velocity data of each object, and cluster data indicating the cluster to which the object belongs, and broadcasts it to all cell processors 20 (step S201). Each cell processor 20 selects and takes in object data from the broadcast data based on the cluster data.
- the cell processor 20 that has taken in the object data calculates new position data after a unit time from the current position data and velocity data of the object.
- a new bounding box value is obtained from the new position data (step S202).
- the bounding box is, for example, the rectangle surrounding the object in Figure 9.
- The value of the bounding box is, for example, the coordinates of the vertices of the bounding box.
- The BCMC 10 fetches the new position data of the objects from each cell processor 20 and updates the position data (step S203).
- The BCMC 10 broadcasts the object data containing the acquired new position data and the like, one object at a time, to all cell processors 20 (step S204). That is, the position data indicating the position of one object to be subjected to collision determination (hereinafter referred to as the "determination target object") is sent to all cell processors 20.
- Each cell processor 20 first determines whether there is a possibility that the determination target object will collide, using the bounding box calculated in step S202 (step S205). Specifically, it determines whether the position of the determination target object is within the bounding box.
- If there is a possibility of collision, that is, if the determination target object is within the bounding box (step S205: Y), the distance to each object in the bounding box processed by the cell processor 20 is calculated in turn (step S206), and collision is determined (step S207). If the determination target object collides with one of the objects in the bounding box (step S207: Y), collision data is generated that includes data quantitatively representing the strength of the impact due to the collision (collision intensity data) and data indicating the influence of the collision on the determination target object (step S208). The cell processor 20 then sends the collision intensity data from the generated collision data, together with the ID, to the WTA/summation circuit 30 (step S209).
- When the determination target object is outside the bounding box (step S205: N), or when the distance calculation shows that no collision occurs (step S207: N), each cell processor 20 sends, for example, "-1.0" to the WTA/summation circuit 30 as the collision intensity data (step S210).
- The WTA/summation circuit 30 operates in maximum value mode.
- The WTA/summation circuit 30 compares the collision intensity data sent from the cell processors 20 and detects the collision intensity data indicating the strongest impact due to collision (step S211).
- The cell processor 20 that generated the detected collision intensity data is thereby specified.
- The ID representing the specified cell processor 20 is sent to the BCMC 10.
- The BCMC 10 acquires the collision data from the cell processor 20 represented by the ID sent from the uppermost WTA/summation circuit 30 (step S212). By performing the processing from step S204 onward for all objects, collision determination between all objects in the space is performed.
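Steps S205 to S211 can be sketched for two-dimensional points as follows. Several details are assumptions made only for illustration: a collision is declared when the distance is below a fixed `radius`, the collision intensity is modelled as the overlap depth `radius - distance`, and the determination target object is assumed not to be a member of the cluster being tested. All function names are hypothetical; only the "-1.0" sentinel comes from the text.

```python
import math

NO_COLLISION = -1.0  # value sent when no collision is found, as in the text


def bounding_box(points):
    """Axis-aligned box (min_x, min_y, max_x, max_y) around a cluster."""
    xs, ys = zip(*points)
    return min(xs), min(ys), max(xs), max(ys)


def cell_intensity(cluster, target, radius):
    """Steps S205-S210 for one cell: cheap box test first, distances only if needed."""
    x0, y0, x1, y1 = bounding_box(cluster)
    tx, ty = target
    if not (x0 <= tx <= x1 and y0 <= ty <= y1):
        return NO_COLLISION  # step S205: N
    best = NO_COLLISION
    for px, py in cluster:
        distance = math.hypot(px - tx, py - ty)  # step S206
        if distance < radius:  # step S207: Y
            best = max(best, radius - distance)  # assumed intensity measure
    return best


def strongest_collision(clusters, target, radius):
    """Step S211: maximum-value selection over the values reported by all cells."""
    scores = {cid: cell_intensity(c, target, radius) for cid, c in clusters.items()}
    winner = max(scores, key=scores.get)
    return winner, scores[winner]
```

The bounding-box test lets most cells answer with the sentinel after a few comparisons, so the distance calculations are confined to the clusters near the determination target object.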
- Each cell processor 20 inputs its data processing result to the WTA/summation circuit 30.
- The data processing results are added by the adder 305, and finally the sum of the data processing results of all the cell processors 20 is obtained. In this way, the WTA/summation circuit 30 can obtain the total of the data processing results at high speed.
- The sum of the data processing results is sent to the BCMC 10 and can be transmitted to each cell processor 20 at high speed by broadcasting.
- The sum of the data processing results is used for, for example, a normalization calculation in an optimization calculation such as a neural network computation.
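The summation mode and the normalization it supports can be sketched as a pairwise reduction followed by a division. This is an illustrative model only: `tree_sum` and `normalize` are hypothetical names, and the total is assumed to be non-zero.

```python
def tree_sum(values):
    """Add values pairwise, stage by stage, as a tree of adders would."""
    level = list(values)
    while len(level) > 1:
        next_level = [level[k] + level[k + 1] for k in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            # An unpaired value is carried to the next stage unchanged.
            next_level.append(level[-1])
        level = next_level
    return level[0] if level else 0.0


def normalize(values):
    """Divide each cell's result by the broadcast total (assumed non-zero)."""
    total = tree_sum(values)
    return [v / total for v in values]
```

In the system described, the total would be computed once by the summation circuit and broadcast, so each cell processor performs only the final division on its own value.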
- In the above description, the BCMC 10 and the WTA/summation circuit 30 are independent of each other. However, the controller may be configured as a single block in which the WTA/summation circuit 30 is incorporated in the BCMC 10.
- In the above example, the data processing means is the cell processor 20 and the control means is the controller (BCMC 10), but the invention is not limited to this example.
- A plurality of data processing terminals are connected via a wide area network in a form capable of two-way communication; one or more of the data processing terminals operate as control means, and the other data processing terminals operate as data processing means. The control means is provided with a function of broadcasting broadcast data that contains the data processing results received from some or all of the plurality of data processing means and data used for data processing by at least one data processing means. Each of the plurality of data processing means may be provided with a function of selecting only the data necessary for its own data processing from the broadcast data broadcast by the control means, performing the data processing, and transmitting the processing result to the control means.
- General-purpose data processing terminals that can be specified by predetermined identification information (for example, the identification data described above) may be used, and the data processing system may be configured with only a server capable of two-way communication with these general-purpose data processing terminals, or with a device equipped with a semiconductor device having a built-in CPU and memory.
- the server or the device specifies a data processing terminal as at least one data processing means in the server body or the device by reading and executing a predetermined computer program by the CPU in the server or the device.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Computer Hardware Design (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Computing Systems (AREA)
- Software Systems (AREA)
- Multi Processors (AREA)
- Image Processing (AREA)
- Hardware Redundancy (AREA)
Priority Applications (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
AT01972530T ATE500556T1 (de) | 2000-09-27 | 2001-09-27 | Multiprozessorsystem, datenverarbeitungssystem, datenverarbeitungsverfahren und rechnerprogramm |
DE60144155T DE60144155D1 (de) | 2000-09-27 | 2001-09-27 | Multiprozessorsystem, datenverarbeitungssystem, datenverarbeitungsverfahren und rechnerprogramm |
EP01972530A EP1324209B1 (en) | 2000-09-27 | 2001-09-27 | Multiprocessor system, data processing system, data processing method, and computer program |
AU2001292269A AU2001292269A1 (en) | 2000-09-27 | 2001-09-27 | Multiprocessor system, data processing system, data processing method, and computer program |
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2000-294732 | 2000-09-27 | ||
JP2000294732 | 2000-09-27 | ||
JP2001289588A JP3426223B2 (ja) | 2000-09-27 | 2001-09-21 | マルチプロセッサシステム、データ処理システム、データ処理方法、コンピュータプログラム |
JP2001-289588 | 2001-09-21 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2002027513A1 true WO2002027513A1 (fr) | 2002-04-04 |
Family
ID=26600866
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2001/008434 WO2002027513A1 (fr) | 2000-09-27 | 2001-09-27 | Systeme multiprocesseurs, systeme de traitement de donnees, procede de traitement de donnees et programme d'ordinateur |
Country Status (10)
Country | Link |
---|---|
US (1) | US7017158B2 (ja) |
EP (1) | EP1324209B1 (ja) |
JP (1) | JP3426223B2 (ja) |
KR (1) | KR100866730B1 (ja) |
CN (1) | CN1258154C (ja) |
AT (1) | ATE500556T1 (ja) |
AU (1) | AU2001292269A1 (ja) |
DE (1) | DE60144155D1 (ja) |
TW (1) | TWI229265B (ja) |
WO (1) | WO2002027513A1 (ja) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111290697A (zh) * | 2018-12-07 | 2020-06-16 | 上海寒武纪信息科技有限公司 | 数据压缩方法、编码电路和运算装置 |
Families Citing this family (31)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6989843B2 (en) * | 2000-06-29 | 2006-01-24 | Sun Microsystems, Inc. | Graphics system with an improved filtering adder tree |
US8478811B2 (en) | 2002-10-08 | 2013-07-02 | Netlogic Microsystems, Inc. | Advanced processor with credit based scheme for optimal packet flow in a multi-processor system on a chip |
US7334086B2 (en) * | 2002-10-08 | 2008-02-19 | Rmi Corporation | Advanced processor with system on a chip interconnect technology |
US9088474B2 (en) | 2002-10-08 | 2015-07-21 | Broadcom Corporation | Advanced processor with interfacing messaging network to a CPU |
US8015567B2 (en) | 2002-10-08 | 2011-09-06 | Netlogic Microsystems, Inc. | Advanced processor with mechanism for packet distribution at high line rate |
US8176298B2 (en) | 2002-10-08 | 2012-05-08 | Netlogic Microsystems, Inc. | Multi-core multi-threaded processing systems with instruction reordering in an in-order pipeline |
US7346757B2 (en) | 2002-10-08 | 2008-03-18 | Rmi Corporation | Advanced processor translation lookaside buffer management in a multithreaded system |
US8037224B2 (en) | 2002-10-08 | 2011-10-11 | Netlogic Microsystems, Inc. | Delegating network processor operations to star topology serial bus interfaces |
US20050120185A1 (en) * | 2003-12-01 | 2005-06-02 | Sony Computer Entertainment Inc. | Methods and apparatus for efficient multi-tasking |
JP4794194B2 (ja) * | 2005-04-01 | 2011-10-19 | 株式会社日立製作所 | ストレージシステム及び記憶制御方法 |
JP4555145B2 (ja) * | 2005-04-28 | 2010-09-29 | 富士通株式会社 | バッチスケジューリングプログラム、バッチスケジューリング方法およびバッチスケジューリング装置 |
US7444525B2 (en) * | 2005-05-25 | 2008-10-28 | Sony Computer Entertainment Inc. | Methods and apparatus for reducing leakage current in a disabled SOI circuit |
US7970956B2 (en) * | 2006-03-27 | 2011-06-28 | Ati Technologies, Inc. | Graphics-processing system and method of broadcasting write requests to multiple graphics devices |
US9596324B2 (en) | 2008-02-08 | 2017-03-14 | Broadcom Corporation | System and method for parsing and allocating a plurality of packets to processor core threads |
JP5039950B2 (ja) | 2008-03-21 | 2012-10-03 | インターナショナル・ビジネス・マシーンズ・コーポレーション | オブジェクト移動制御システム、オブジェクト移動制御方法、サーバ及びコンピュータプログラム |
US7958341B1 (en) | 2008-07-07 | 2011-06-07 | Ovics | Processing stream instruction in IC of mesh connected matrix of processors containing pipeline coupled switch transferring messages over consecutive cycles from one link to another link or memory |
US8145880B1 (en) | 2008-07-07 | 2012-03-27 | Ovics | Matrix processor data switch routing systems and methods |
US8327114B1 (en) | 2008-07-07 | 2012-12-04 | Ovics | Matrix processor proxy systems and methods |
US8131975B1 (en) | 2008-07-07 | 2012-03-06 | Ovics | Matrix processor initialization systems and methods |
US7870365B1 (en) | 2008-07-07 | 2011-01-11 | Ovics | Matrix of processors with data stream instruction execution pipeline coupled to data switch linking to neighbor units by non-contentious command channel / data channel |
CN101478785B (zh) * | 2009-01-21 | 2010-08-04 | 华为技术有限公司 | 资源池管理系统及信号处理方法 |
JP4539889B2 (ja) * | 2009-02-18 | 2010-09-08 | 日本電気株式会社 | プロセッサ及びデータ収集方法 |
KR101651871B1 (ko) * | 2009-12-28 | 2016-09-09 | 삼성전자주식회사 | 멀티코어 시스템 상에서 단위 작업을 할당하는 방법 및 그 장치 |
US8850262B2 (en) * | 2010-10-12 | 2014-09-30 | International Business Machines Corporation | Inter-processor failure detection and recovery |
CN102306371B (zh) * | 2011-07-14 | 2013-09-18 | 华中科技大学 | 一种分层并行的模块化序列图像实时处理装置 |
KR101863605B1 (ko) | 2011-09-19 | 2018-07-06 | 삼성전자주식회사 | 스트림 데이터를 고속으로 처리하는 프로세서 |
US20130081021A1 (en) * | 2011-09-23 | 2013-03-28 | Elwha LLC, a limited liability company of the State of Delaware | Acquiring and transmitting tasks and subtasks to interface devices, and obtaining results of executed subtasks |
US9710768B2 (en) | 2011-09-23 | 2017-07-18 | Elwha Llc | Acquiring and transmitting event related tasks and subtasks to interface devices |
CN106936994B (zh) | 2017-03-10 | 2019-10-01 | Oppo广东移动通信有限公司 | 一种广播接收者的控制方法、装置及移动终端 |
JP7038608B2 (ja) * | 2018-06-15 | 2022-03-18 | ルネサスエレクトロニクス株式会社 | 半導体装置 |
JP7004083B2 (ja) * | 2018-10-23 | 2022-01-21 | 富士通株式会社 | 演算処理装置及び演算処理装置の制御方法 |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPS61283976A (ja) * | 1985-06-11 | 1986-12-13 | Sanyo Electric Co Ltd | パタ−ン認識装置 |
JPH0247757A (ja) * | 1988-08-09 | 1990-02-16 | Sanyo Electric Co Ltd | 情報処理装置 |
EP0360527A2 (en) * | 1988-09-19 | 1990-03-28 | Fujitsu Limited | Parallel computer system using a SIMD method |
EP0411497A2 (en) * | 1989-07-31 | 1991-02-06 | Hitachi, Ltd. | Data processing system and data transmission and processing method |
JPH0784966A (ja) * | 1993-08-06 | 1995-03-31 | Toshiba Corp | データ処理装置 |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4739476A (en) * | 1985-08-01 | 1988-04-19 | General Electric Company | Local interconnection scheme for parallel processing architectures |
JPH0814816B2 (ja) | 1988-09-19 | 1996-02-14 | 富士通株式会社 | 並列計算機 |
JP2850387B2 (ja) | 1989-07-31 | 1999-01-27 | 株式会社日立製作所 | データ伝送方式 |
JP2642039B2 (ja) * | 1992-05-22 | 1997-08-20 | インターナショナル・ビジネス・マシーンズ・コーポレイション | アレイ・プロセッサ |
US5511212A (en) * | 1993-06-10 | 1996-04-23 | Rockoff; Todd E. | Multi-clock SIMD computer and instruction-cache-enhancement thereof |
US6516403B1 (en) * | 1999-04-28 | 2003-02-04 | Nec Corporation | System for synchronizing use of critical sections by multiple processors using the corresponding flag bits in the communication registers and access control register |
-
2001
- 2001-09-21 JP JP2001289588A patent/JP3426223B2/ja not_active Expired - Fee Related
- 2001-09-26 US US09/964,247 patent/US7017158B2/en not_active Expired - Lifetime
- 2001-09-27 AT AT01972530T patent/ATE500556T1/de not_active IP Right Cessation
- 2001-09-27 CN CNB018029167A patent/CN1258154C/zh not_active Expired - Fee Related
- 2001-09-27 KR KR1020027006766A patent/KR100866730B1/ko active IP Right Grant
- 2001-09-27 AU AU2001292269A patent/AU2001292269A1/en not_active Abandoned
- 2001-09-27 TW TW090123900A patent/TWI229265B/zh not_active IP Right Cessation
- 2001-09-27 EP EP01972530A patent/EP1324209B1/en not_active Expired - Lifetime
- 2001-09-27 WO PCT/JP2001/008434 patent/WO2002027513A1/ja active Application Filing
- 2001-09-27 DE DE60144155T patent/DE60144155D1/de not_active Expired - Lifetime
Also Published As
Publication number | Publication date |
---|---|
ATE500556T1 (de) | 2011-03-15 |
US7017158B2 (en) | 2006-03-21 |
KR20020059430A (ko) | 2002-07-12 |
TWI229265B (en) | 2005-03-11 |
AU2001292269A1 (en) | 2002-04-08 |
JP2002175288A (ja) | 2002-06-21 |
JP3426223B2 (ja) | 2003-07-14 |
DE60144155D1 (de) | 2011-04-14 |
US20020059509A1 (en) | 2002-05-16 |
EP1324209B1 (en) | 2011-03-02 |
CN1392985A (zh) | 2003-01-22 |
EP1324209A4 (en) | 2008-12-17 |
EP1324209A1 (en) | 2003-07-02 |
KR100866730B1 (ko) | 2008-11-03 |
CN1258154C (zh) | 2006-05-31 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2002027513A1 (fr) | Systeme multiprocesseurs, systeme de traitement de donnees, procede de traitement de donnees et programme d'ordinateur | |
CN109284823B (zh) | 一种运算装置及相关产品 | |
CN110163359B (zh) | 一种计算装置及方法 | |
JPH02501599A (ja) | 多重プロセッサ・アレイにおける仮想処理手法および仮想プロセッサ | |
CN111630505B (zh) | 深度学习加速器系统及其方法 | |
TW202321999A (zh) | 一種計算裝置及方法 | |
CN111752691B (zh) | Ai计算图的排序方法、装置、设备及存储介质 | |
CN109670581B (zh) | 一种计算装置及板卡 | |
Chen et al. | Highly efficient alltoall and alltoallv communication algorithms for gpu systems | |
JP3872034B2 (ja) | マルチプロセッサシステム、データ処理方法、データ処理システム、コンピュータプログラム、半導体デバイス | |
JP2000163384A (ja) | 半導体装置 | |
WO2022095676A1 (zh) | 神经网络稀疏化的设备、方法及相应产品 | |
Wirawan et al. | Parallel DNA sequence alignment on the cell broadband engine | |
US12001893B1 (en) | Distributed synchronization scheme | |
CN111381875B (zh) | 数据比较器、数据处理方法、芯片及电子设备 | |
CN111382848B (zh) | 一种计算装置及相关产品 | |
JP2002140717A (ja) | 画像処理方法及び装置、コンピュータプログラム | |
JPH06505588A (ja) | 並列ソフトウェア処理用ネットワーク構造 | |
JP2710162B2 (ja) | 荷電ビーム描画用データの作成方法及び作成装置 | |
CN113626083B (zh) | 数据处理装置以及相关产品 | |
WO2020125092A1 (zh) | 计算装置及板卡 | |
CN115471391A (zh) | 用于单目标检测的芯片、板卡、方法及计算装置 | |
CN118278479A (zh) | 一种基于指令集的众核类脑处理器及工作方法 | |
CN118733206A (zh) | 基于多核系统的任务调度方法、装置及相关产品 | |
CN115543329A (zh) | 对运行于人工智能芯片上的区域候选网络进行优化的编译方法及其相关产品 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AK | Designated states |
Kind code of ref document: A1 Designated state(s): AU BR CA CN IN KR MX NZ RU SG |
|
AL | Designated countries for regional patents |
Kind code of ref document: A1 Designated state(s): BE CH DE DK ES FI FR GB IT NL SE |
|
WWE | Wipo information: entry into national phase |
Ref document number: 2001972530 Country of ref document: EP |
|
WWE | Wipo information: entry into national phase |
Ref document number: 018029167 Country of ref document: CN Ref document number: 1020027006766 Country of ref document: KR |
|
121 | Ep: the epo has been informed by wipo that ep was designated in this application | ||
WWP | Wipo information: published in national office |
Ref document number: 1020027006766 Country of ref document: KR |
|
WWP | Wipo information: published in national office |
Ref document number: 2001972530 Country of ref document: EP |