WO2009004628A2 - Multi-core cpu - Google Patents

Info

Publication number
WO2009004628A2
Authority
WO
WIPO (PCT)
Prior art keywords
core
core cpu
cores
cpu according
types
Prior art date
Application number
PCT/IL2008/000916
Other languages
French (fr)
Other versions
WO2009004628A3 (en)
Inventor
Asaf Shelly
Original Assignee
Feldman, Moshe
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Feldman, Moshe
Publication of WO2009004628A2
Publication of WO2009004628A3

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00: Digital computers in general; Data processing equipment in general
    • G06F15/76: Architectures of general purpose stored program computers
    • G06F15/78: Architectures of general purpose stored program computers comprising a single central processing unit
    • G06F15/7828: Architectures of general purpose stored program computers comprising a single central processing unit without memory
    • G06F15/7832: Architectures of general purpose stored program computers comprising a single central processing unit without memory on one IC chip (single chip microprocessors)

Definitions

  • the present invention relates generally to computer processors, and more particularly, relates to computer multi-core processors.
  • a processor is logic circuitry that responds to and processes the instructions that drive a computerized device.
  • CPU: Central Processing Unit
  • processors are nowadays composed of integrated circuits (ICs). These make it possible to increase the complexity of CPU designs, to reduce the dimensions of processors and, in particular, to build processors containing a large number of ICs for faster computing.
  • IC: integrated circuit
  • An Operating System has many types of tasks to manage.
  • the prior art provides a general purpose CPU which handles a number of task types, such as Digital Signal Processing (DSP) for image and video decoding, network streaming, database memory management, the OS kernel, OS device drivers managing hardware, etc.
  • DSP: Digital Signal Processing
  • OS kernel; OS device drivers managing hardware
  • Such a CPU has to toggle between internal operating modes, for example between Applicative Mode (User Mode) and System Mode (Kernel Mode).
  • Such a CPU is designed as a compromise between all these requirements.
  • a general purpose CPU runs both applications and drivers, any and all of: DSP, memory intensive operations, network streaming, and hardware real-time tasks.
  • a multi-core CPU combines at least two independent cores into a single package composed of a single IC.
  • a dual-core processor contains two cores, and as another example a quad-core contains four cores.
  • a multi-core microprocessor contains a plurality of cores and implements multiprocessing in a single physical package.
  • a multi-core CPU duplicates the same CPU at least two times.
  • a multi-core CPU shares the same interconnection, such as buses, with the rest of the system.
  • the multi-core CPU can handle a number of threads concurrently. Nevertheless, this depends on its design and on the software that manages and uses its capabilities.
  • one of the main challenges in the field of multi-core CPUs relates to the task scheduler, which is a critical component of the operating system software.
  • the goal of a task scheduler is to manage the allocation of CPU resources to tasks. This allocation typically works according to task priority levels.
  • a priority level is allocated to a task according to the software or hardware component which needs to be executed. Consequently, simultaneous scheduling of a large number of tasks can reduce the performance of a system (even one including a multi-core CPU) if the number of tasks with a high priority level is large.
  • the multi-core CPU architectures of the prior art do not provide a solution to this problem.
  • Task scheduling and real-time behavior differ between real-time and kernel tasks on the one hand and non-critical and applicative tasks on the other, in parameters such as response time, latency, scheduling algorithms, priority system, and the number of priorities used.
  • the CPUs of the prior art are designed as a compromise between all types of tasks.
  • all cores of a CPU compete for access to memory and I/O
  • increasing the number of cores in a system means that more cores compete for memory access, access to hardware resources, and other system resources.
  • Any task can run on any CPU core, so on the one hand it is extremely difficult to manage priorities between cores, while on the other hand real-time tasks share the same core pool with low priority tasks. This causes real-time and high priority tasks to compete with low priority tasks over the same resources, which is a serious design flaw.
  • This design works fine for a single core CPU because tasks are stopped to allow higher priority tasks to execute.
  • a task is executed by a core; when the number of tasks is higher than the number of cores, the tasks are scheduled to run periodically on the available cores, according to a common scheduling algorithm dealing with all types of tasks on the system.
  • Computer systems use basic data types and build complex data types as a collection of the basic types.
  • Basic types include Boolean, integer, 64-bit integer, 128-bit integer, floating point, double-precision floating point, etc., where an integer can be either signed or unsigned (for example, an 8-bit integer ranges over 0 to 255 or -128 to 127). These basic types are either sized in powers of 2 or floating point. In most cases the implementation does not really need an integer whose range is 2 to the power of a multiple of 8 (1, 256, 65536, 4 Giga, etc.). In such cases the implementation uses a basic integer type that can hold the required value, and the limits of the type are enforced in software.
  • a person's age can have a value of 0 to 120, so an integer of -128 to +127 can be used, and it is up to the software to verify that the age is not below 0 and not above 120.
  • the CPU has a special flag called the Overflow Flag to indicate that the last operation caused the value to overflow. For example, adding 2 to an 8-bit signed integer with the value 127 wraps around to -127, and the Overflow Flag is raised automatically as a warning.
  • for an age of 0 to 120 it is up to the software to prevent value overflows; for example, adding 2 years to an age of 119 results in an integer of 121 without raising the Overflow Flag, and therefore the software has to verify that the value did not go over 120.
  • the present invention relates to a multi-core CPU which overcomes the drawbacks of currently available multi-core CPUs.
  • the multi-core CPU of the present invention is composed of a plurality of cores, which can be of several types. Each core is optimized for handling a specific type and classification of task in the system. This optimization improves the behavior and performance of the task running on the core according to the type of task.
  • a General Purpose CPU can never achieve the optimization for all types of tasks that is possible with dedicated cores.
  • the proposed multi-core CPU can be easily upgraded, because any core of the system can be connected or disconnected.
  • the present invention allows the handling of types of data not currently supported by the available multi-core CPUs.
  • Fig. 1 is a schematic block diagram of an example of a series of cores connected through common buses according to the prior art
  • Fig. 2 is a schematic block diagram of an exemplary embodiment of the invention, with cores connected through a relay core instead of through common buses;
  • Fig. 3 is a schematic block diagram of an exemplary embodiment of the invention, with cores connected through a relay core instead of through a common network bus;
  • Fig. 4 is a schematic representation of an exemplary embodiment of the invention, with a dedicated multi-core CPU.
  • Fig. 5 is a schematic representation of a multi-core CPU and its internal cores.
  • Fig. 6 is a schematic representation of a multi-core CPU that is made of several physical chips or cards.
  • Fig. 7 is a logical representation of Fig. 6.
  • Fig. 1 is a schematic block diagram representing a prior art multi-core CPU.
  • Cores 102 are identical, that is to say that they are based on the same technology.
  • the set of these cores 102 is connected through a common address bus 104 and a common data bus 106.
  • Fig. 2 is a schematic block diagram representing an example of a set of cores 202, 204, and 214, connected through a relay core 212.
  • the first subset 200 of cores 202 and 204 is connected by a common address bus 206 and by a common data bus 208.
  • the second subset 210 of cores 212 and 214 is connected by a common address bus 216 and by a common data bus 218.
  • in each subset of cores 200 or 210, a core (e.g. 202 or 212) can be allocated to the task that allows the two subsets 200 and 210 to be connected and to communicate.
  • This core is termed "relay core" for convenience in this case, but is not dedicated exclusively to this type of task.
  • the relay core 212 is able to connect the subset of cores 210 to the subset of cores 200 using connections 220 and 222 to the address bus 206 and to the data bus 208, respectively, of the subset of cores 200.
  • a relay core can relay several different types of busses.
  • Fig. 3 is a schematic block diagram representing an example of three subsets 300, 310, and 320 of cores 302 and 304, 312 and 314, and 322 and 324, respectively.
  • Each core of a subset is connected to the others using a common network bus (respectively for each subset 306, 316, and 326).
  • a subset of cores is not connected to another using a bus, but rather using a relay core.
  • the relay core is core 312. It is connected to the subset of cores 300 by way of core 302 using connection 330. It is also connected to the subset of cores 320 by way of core 322 using connection 332.
  • the design of the multi-core CPU includes cores which are dedicated for different types of
  • a CPU core can act as a relay for other cores and between segments of buses.
  • a multi-core processor can include several types of cores.
  • Fig. 4 schematically shows a CPU 400 comprising 9 cores: a core 402 mainly dedicated to an operating system; two cores 404 and 406, each mainly dedicated to a type of device driver accessing hardware; two cores 408 and 410 mainly dedicated to video and audio Digital Signal Processing (DSP), respectively; a core 412 mainly dedicated to the activity of the Floating Point Unit (FPU); two cores 414 and 416, each mainly dedicated to core-specific software; and a core 418 mainly dedicated to the activity of the Watch Dog.
  • FPU: Floating Point Unit
  • the term Watch Dog, as used herein, has the meaning used for this component in prior art CPUs.
  • Fig. 5 is a schematic representation of a CPU that has 9 cores. It is an expansion of Fig. 4. The references used for the description of this figure are based on the table below.
  • This core is video and graphics and it is expected to be part of the video card for the computer system.
  • This core can also be used for video decoding and video compression.
  • the type of core is DSP (Digital Signal Processing) and it should support advanced rendering capabilities such as MAC - Multiply Accumulate and perform operations on sets of pixels.
  • This core is expected to be connected directly to the video display hardware and probably has a fast connection to the computer's memory for fast image transfers. Internally such a core has advanced support for floating point operations and a strong ALU unit that handles large numbers and packed bytes.
  • the data bus for this core is as wide as possible to allow large volume transfers in real-time.
  • Core C System
  • This core runs the operating system's core and elementary real-time elements. It would probably be used to run the scheduler, application loader and boot loader, and is probably the first core to run in the system so that it can initialize the other cores.
  • the type of core is System and it should be simple to initialize and should be designed for fast response times. This core can potentially be connected directly to the DMA and Interrupt controller and would probably have some control over I/O. Internally such a core has support for super fast 16-bit ALU operations and has hardware support for process and thread tables that include Priorities and Permission Tables. This core has full permission over the system and, for the current design, behaves as the master for all cores.
  • the data bus for this core is 32 bit wide because the focus of this core is response time.
  • this core will work with a super fast memory card that is blocked from access of other cores.
  • Such a core would usually work with system RAM directly and not virtually and use a small amount of memory at reserved areas so there is no need for complex address / data busses.
  • This core is audio and voice and it is expected to be part of the computer's audio card with speaker output and line in and microphone.
  • This core can also be used for audio decoding and encoding such as MP3.
  • the type of core is DSP and it should support advanced rendering capabilities such as MAC - Multiply Accumulate and perform audio operations such as SRC (Sample Rate Conversion).
  • This core can potentially have an analog input and output so it can function as a sound card, or it can be connected digitally to a DAC (Digital to Analog Converter). Internally such a core has support for floating point operations and a 32 bit ALU unit which is enough for 16 bit audio data.
  • the data bus for this core is as wide as possible to allow large volume transfers in real-time.
  • This core is Device Drivers and runs the loadable drivers which manage hardware elements and allow Applications to connect to these hardware elements.
  • This core is expected to perform in real-time and have very fast responses.
  • the type of core is Real-time which means that the latency for event handling has to be deterministic and that context switches must be very fast, if at all allowed.
  • This core manages hardware and thus has control and access to system I/O.
  • the core can have support for Interrupt handling and it can receive events from the system core if it manages the Interrupt hardware. Internally such a core has support for 32 bit ALU operations and supports I/O data manipulations in the instruction set.
  • the data bus for this core is 32 bits wide because it is not expected to perform large buffer transfers in real-time.
  • Core F Streaming Device Drivers
  • This core is Device Drivers and runs the loadable drivers which manage hardware elements and allow Applications to connect to these hardware elements.
  • This core is expected to perform in real-time and have very fast responses.
  • the type of core is streaming which means that the focus of this core is large buffer transfers in real-time.
  • This core manages streaming hardware such as network cards and some types of DSP hardware (such as CODECs - COders DECoders) and thus has control and access to system I/O.
  • the core can have support for Interrupt handling and it can receive events from the system core if it manages the Interrupt hardware. Internally such a core has support for floating point operations and integer ALU operations and supports I/O data manipulations in the instruction set.
  • the data bus for this core is 32 bits wide because the exemplary hardware does not support a wider data bus, but it can be as wide as possible to allow fast operations with memory.
  • The function of this core is Applications and it is designed to run the user applications on the system.
  • This core has little to no relevance for real-time and scheduling is not expected to occur rapidly so the core can maintain a large amount of data for an execution unit.
  • the type of core is GUI which means that the focus of this core is to handle user events and send output to the display device, so the response times are expected to be suitable for communication with a person.
  • the core should not have any I/O access and should be able to communicate with hardware via the Device Driver cores and with the display via the Video cores. Internally such a core has support for wide integer ALU operations and potentially direct access to Video cores for large image transfers and manipulations.
  • the data bus for this core is 64 bits wide because this is the type of data that this core is expected to use for this example.
  • This core is Database and it exists to manage a database in memory.
  • the type of core is Service which means that Applications communicate with it just like a Device Driver core but the core does not have I/O and hardware access and it is not designed to respond in strong real-time.
  • This core can support database functionality and algorithms in hardware such as Hash functions, Indexing, etc. Internally such a core has support for wide ALU operations and packed data operations.
  • the data bus for this core is as wide as possible to allow large volume transfers because the nature of such service is to scan large chunks of data and large scale copy operations.
  • Fig.6 shows an example of a multi-core CPU that is made of several physical chips or cards.
  • This system has a DSP card / chip installed for support of video and audio rendering. This can be manufactured by a company with expertise of algorithms and analog hardware.
  • the system is installed with a Kernel card / chip that supports I/O and interrupt handling. This is probably part of the computer's motherboard and designed by the computer's manufacturer.
  • the system also has an Application card / chip installed. This should run the applications and might be a set of expansion cards so that it is possible to add more elements when the system runs out of CPU power for the running applications.
  • a Database card / chip is installed on the system to act as a service that manages databases. This is part of the Services expansion support by the system to allow special functionality that is not required for most common system configurations.
  • Fig.7 is a logical representation of Fig. 6.
  • the operating system is not expected to care that the cores are installed on several separate hardware elements and software running on the machine should think that there is one big virtual CPU installed on the machine.
  • a CPU designed according to the previous description is defined by a CPU instruction set, privileges, power consumption, bus speed, Input/Output access/memory access; and memory access type (virtual, physical, etc).
  • all cores use, in a CPU designed according to the present invention, the same bus/bus methodology and behave vis-a-vis the software/operating system as a single multi-core CPU.
  • a set of multi-core CPU's that join all cores to form a virtual multi-core CPU, which can then be divided into smaller virtual CPU's.
  • This can also allow a set of CPU's with special core types, for example, a CPU with cores for an operating system and a CPU for processes, to form two virtual CPU's that have some cores for processes and some cores for an operating system.
  • the proposed multi-core CPU can have cores dynamically attached/detached, e.g., plug & play cores. This can increase the modularity of the proposed system.
  • the proposed multi-core CPU can support advanced types of data in its instruction set.
  • Prior art CPUs support simple types such as bit, boolean, byte, word, double word, float, packages of these items; some CPUs have instruction support for strings.
  • the system of the present invention can furthermore support types of data which are not bound to 2 to the power of multiples of 8 (256, 65536, etc.). Such an advanced type can, for example, begin at -3 and go up to 12; adding 1 to a variable with the value 12 will then overflow to -3, if overflow is allowed.
  • a variable can be [1 to 22] [0 to 15] [32 to 234], packed as a single variable.
  • a variable with value [1][0][234] added to [1][0][1] will overflow to [1][1][0].
  • Different overflow behaviors can be defined, as will be apparent to the skilled person and not described here in detail, for the sake of brevity.
  • a variable of type HASH is available.
  • a variable of type HASH (more specifically the calculated HASH value) can be compared to variables of other types.
  • a CPU core that implements a mechanism to maintain the value limitations in hardware should be able to detect such overflows in hardware. In such a scenario the CPU knows that the type of the value is 0 to 120, and therefore adding 2 to a value of 119 will overflow to 0 and the Overflow Flag will be raised. This means that the memory will not store illegal values if the software programmer made the mistake of not verifying the data before the addition.
  • ALU: Arithmetic Logic Unit
  • ALU support for the HASH type can include Addition, which adds a value to the HASH according to the HASH algorithm, and Subtraction, which reverses this operation or uses a negative input value, all while maintaining ALU flags such as the Carry Flag, Zero Flag, etc.
  • a value type that is not based on 2 by the power of multiples of 8 can start at any value and up to any value.
  • Using a collection of these types as base elements of a numerical type will produce a number that is not 256 based math.
  • Today data is saved as bytes of 8 bits and larger numbers are a collection of bytes so each element in the actual number in memory is 256 based.
  • In actual memory numbers are saved as 256 base and so an addition operation on two such numbers means an addition using 256 base math.
  • Intel 8086 processor and others are known to be able to do 10 base math. Using a collection of types as described above it is possible to use math of any base.
  • Another type that can be implemented in hardware is a type whose value is a multiple of a number, for example a type that can only hold a number that is a multiple of 3. This can be useful for grid calculations, for example an area of 120 * 120 with a grid of 3 * 3. A value inserted into this variable that is not an exact multiple will roll over to the closest valid value, just as Prior Art applications roll over decimal point values when they are inserted into Integers.
  • Data can be stored in memory sequentially as is commonly used but it can also be stored with spaces between values. These spaces can be space fillers for a value so that the next value start on a byte boundary for example, or can be used as part of a sequence of data for example in a communication stream where multiple data values are transmitted and some are not relevant for the application.
  • a CPU can implement a type that has a mask that defines the locations of valid types. An example is a collection of bytes where the first two are Sender Address, the next two are Sender Verification, next two are Receiver Address, and following is the transmission data.
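
The range-limited, packed, and multiple-of types described in the list above can be modelled in software to make the intended behaviour concrete. The following Python sketch is illustrative only. The names `RangedInt`, `add_packed`, and `snap_to_multiple` are hypothetical, wrap-around is just one of the overflow behaviours the text allows, and carrying from one packed field into the next is an assumption that the text does not spell out.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class RangedInt:
    """Integer confined to [lo, hi]; a software model of a hardware-checked type."""
    value: int
    lo: int
    hi: int

    def add(self, delta: int) -> tuple[RangedInt, bool]:
        """Return (wrapped result, overflow flag)."""
        span = self.hi - self.lo + 1              # number of legal values
        raw = self.value + delta
        overflow = raw < self.lo or raw > self.hi
        wrapped = self.lo + (raw - self.lo) % span
        return RangedInt(wrapped, self.lo, self.hi), overflow


def add_packed(fields: list[RangedInt], delta: int) -> tuple[list[RangedInt], bool]:
    """Add delta to the last field and carry leftwards (assumed semantics).

    Returns the new fields and a flag that is True when the carry runs past
    the first (most significant) field.
    """
    result = list(fields)
    carry = delta
    for i in range(len(result) - 1, -1, -1):
        f = result[i]
        span = f.hi - f.lo + 1
        offset = f.value - f.lo + carry
        result[i] = RangedInt(f.lo + offset % span, f.lo, f.hi)
        carry = offset // span
        if carry == 0:
            break
    return result, carry != 0


def snap_to_multiple(value: int, step: int) -> int:
    """Return the multiple of step closest to value (ties go upward)."""
    return ((value + step // 2) // step) * step


# A type running from -3 to 12: adding 1 to 12 wraps to -3 and raises the flag.
wrapped, flag = RangedInt(12, -3, 12).add(1)

# An age of 0 to 120: adding 2 to 119 leaves the legal range, so the flag is raised.
age, age_flag = RangedInt(119, 0, 120).add(2)

# Packed variable [1 to 22][0 to 15][32 to 234] holding [1][0][234], plus 1.
packed, packed_flag = add_packed(
    [RangedInt(1, 1, 22), RangedInt(0, 0, 15), RangedInt(234, 32, 234)], 1
)
```

With these assumptions the flag plays the role of the Overflow Flag: software never has to compare the result against the limits itself, because an out-of-range value is never stored.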

Abstract

A multi-core CPU has a plurality of interconnected cores, each of said cores being mainly dedicated to a set of predefined tasks, and each of said cores being adapted to perform its predefined tasks.

Description

MULTI-CORE CPU
Field of the Invention
The present invention relates generally to computer processors, and more particularly, relates to computer multi-core processors.
Background of the Invention
A processor is logic circuitry that responds to and processes the instructions that drive a computerized device. For computers the term CPU (Central Processing Unit) is more generally used.
The form, design, and implementation of CPUs are constantly changing due to the evolution of electronic technologies and of miniaturization methods and systems. The fundamental operation of CPUs (managing and computing data) has remained much the same, as has the internal architecture defined by Von Neumann.
Processors are nowadays composed of integrated circuits (ICs). These make it possible to increase the complexity of CPU designs, to reduce the dimensions of processors and, in particular, to build processors containing a large number of ICs for faster computing.
In a computer the most important component is the CPU. An Operating System (OS) has many types of tasks to manage. The prior art provides a general purpose CPU which handles a number of task types, such as Digital Signal Processing (DSP) for image and video decoding, network streaming, database memory management, the OS kernel, OS device drivers managing hardware, etc. In addition, such a CPU sometimes has to toggle between internal operating modes, for example between Applicative Mode (User Mode) and System Mode (Kernel Mode). Such a CPU is designed as a compromise between all these requirements. As an example, a general purpose CPU runs both applications and drivers, and any and all of: DSP, memory intensive operations, network streaming, and hardware real-time tasks.
Today, however, in order to allow the use of powerful software and devices, computers need to work with multi-core CPUs. A multi-core CPU combines at least two independent cores into a single package composed of a single IC. As an example, a dual-core processor contains two cores, and a quad-core processor contains four cores. A multi-core microprocessor contains a plurality of cores and implements multiprocessing in a single physical package. A multi-core CPU duplicates the same CPU at least two times.
A multi-core CPU shares the same interconnection, such as buses, with the rest of the system. When operating properly, the multi-core CPU can handle a number of threads concurrently. Nevertheless, this depends on its design and on the software that manages and uses its capabilities. One of the main challenges in the field of multi-core CPUs relates to the task scheduler, which is a critical component of the operating system software. The goal of a task scheduler is to manage the allocation of CPU resources to tasks. This allocation typically works according to task priority levels. A priority level is allocated to a task according to the software or hardware component which needs to be executed. Consequently, simultaneous scheduling of a large number of tasks can reduce the performance of a system (even one including a multi-core CPU) if the number of tasks with a high priority level is large. However, the multi-core CPU architectures of the prior art do not provide a solution to this problem. Task scheduling and real-time behavior differ between real-time and kernel tasks on the one hand and non-critical and applicative tasks on the other, in parameters such as response time, latency, scheduling algorithms, priority system, and the number of priorities used. The CPUs of the prior art are designed as a compromise between all types of tasks. As an example, since all cores of a CPU compete for access to memory and I/O, increasing the number of cores in a system means that more cores compete for memory access, access to hardware resources, and other system resources. Any task can run on any CPU core, so on the one hand it is extremely difficult to manage priorities between cores, while on the other hand real-time tasks share the same core pool with low priority tasks. This causes real-time and high priority tasks to compete with low priority tasks over the same resources, which is a serious design flaw. This design works fine for a single core CPU because tasks are stopped to allow higher priority tasks to execute. Algorithmically, a task is executed by a core; when the number of tasks is higher than the number of cores, the tasks are scheduled to run periodically on the available cores, according to a common scheduling algorithm dealing with all types of tasks on the system.
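
The shared-pool behaviour described above can be illustrated with a minimal Python sketch of a single common scheduler. It is not taken from any real operating system: the names `Task` and `schedule_tick` are hypothetical, and the convention that a larger number means a higher priority is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str
    priority: int   # assumption: larger number = higher priority
    kind: str       # e.g. "real-time" or "application"


def schedule_tick(ready, n_cores):
    """Pick the tasks that run in one time slice on a shared pool of cores.

    Every task competes in the same ranking, whatever its kind, and the
    n_cores highest-priority tasks run. Returns (running, waiting).
    """
    ranked = sorted(ready, key=lambda t: t.priority, reverse=True)
    return ranked[:n_cores], ranked[n_cores:]


ready = [
    Task("ui", 1, "application"),
    Task("irq", 9, "real-time"),
    Task("audio", 9, "real-time"),
    Task("net", 9, "real-time"),
    Task("db", 2, "application"),
]
running, waiting = schedule_tick(ready, n_cores=2)
```

With two cores and three tasks at the top priority level, one high priority task (`net`) has to wait alongside the low priority ones, which is the situation the paragraph above describes when the number of high priority tasks is large.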
Computer systems use basic data types and build complex data types as collections of the basic types. Basic types include Boolean, integer, 64-bit integer, 128-bit integer, floating point, double-precision floating point, etc., where an integer can be either signed or unsigned (for example, an 8-bit integer ranges over 0 to 255 or -128 to 127). These basic types are either sized in powers of 2 or floating point. In most cases the implementation does not really need an integer whose range is 2 to the power of a multiple of 8 (1, 256, 65536, 4 Giga, etc.). In such cases the implementation uses a basic integer type that can hold the required value, and the limits of the type are enforced in software. For example, a person's age can have a value of 0 to 120, so an integer of -128 to +127 can be used, and it is up to the software to verify that the age is not below 0 and not above 120. The CPU has a special flag called the Overflow Flag to indicate that the last operation caused the value to overflow. For example, adding 2 to an 8-bit signed integer with the value 127 wraps around to -127, and the Overflow Flag is raised automatically as a warning. In the case of an age of 0 to 120 it is up to the software to prevent value overflows; for example, adding 2 years to an age of 119 results in an integer of 121 without raising the Overflow Flag, and therefore the software has to verify that the value did not go over 120.
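
The difference between the hardware overflow check and the software range check can be made concrete with a short Python model. This is a sketch under stated assumptions: `add_int8` imitates two's-complement 8-bit addition, and `add_years` stands for the check that prior art software has to perform itself. Both names are hypothetical.

```python
def add_int8(a: int, b: int) -> tuple[int, bool]:
    """Add two signed 8-bit values; return (wrapped result, overflow flag)."""
    raw = a + b
    wrapped = (raw + 128) % 256 - 128   # two's-complement wrap into -128..127
    return wrapped, wrapped != raw


def add_years(age: int, years: int) -> int:
    """Software-side range check for an age limited to 0..120."""
    new_age = age + years
    if not 0 <= new_age <= 120:
        raise ValueError(f"age {new_age} is outside 0..120")
    return new_age


hw_result, hw_flag = add_int8(127, 2)      # leaves the 8-bit range: flag raised
age_result, age_flag = add_int8(119, 2)    # fits in 8 bits: no flag, yet 121 is not a legal age
```

The second call shows the gap the text points at: the 8-bit addition succeeds silently, so only an explicit check such as `add_years` catches the illegal age.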
It is an object of the present invention to provide a multi-core CPU that obviates the drawbacks of the prior art.
It is another object of the present invention to provide a multi-core processor with a plurality of internal cores grouped to form virtual multi-core CPUs.
It is yet another object of the present invention to provide a multi-core CPU including several types of dedicated cores suitable to handle particular types of tasks, working independently of one another.
It is an object of the present invention to provide a multi-core CPU including different algorithms optimizing the scheduling of different types of tasks.
It is a further object of the present invention to provide a multi-core CPU to which cores can be attached or detached by plug & play.
It is yet a further object of the present invention to provide a multi-core CPU which can manage data to reduce the risk of overflow.
Further purposes and advantages of this invention will become apparent as the description proceeds.
Summary of the Invention
The present invention relates to a multi-core CPU which overcomes the drawbacks of currently available multi-core CPUs. The multi-core CPU of the present invention is composed of a plurality of cores, which can be of several types. Each core is optimized for handling a specific type and classification of task in the system. This optimization improves the behavior and performance of the task running on the core according to the type of task. A General Purpose CPU can never achieve the optimization for all types of tasks that is possible with dedicated cores. The proposed multi-core CPU can be easily upgraded, because any core of the system can be connected or disconnected. The present invention allows the handling of types of data not currently supported by the available multi-core CPUs.
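
As a rough illustration of dedicated cores, the following Python sketch routes each task to a queue for its task type, and each core only serves the queue of the type it is dedicated to. The class `DedicatedCoreCPU` and its methods are hypothetical, the core identifiers and task type labels are chosen for the example, and the one-task-per-core time slice is an assumption rather than the scheduling method of the invention.

```python
from collections import defaultdict, deque


class DedicatedCoreCPU:
    """Toy model: every core serves exactly one task type."""

    def __init__(self, cores):
        # cores: mapping of core id -> task type the core is dedicated to
        self.cores = dict(cores)
        self.queues = defaultdict(deque)   # one queue per task type

    def attach(self, core_id, task_type):
        """Connect a core (models plug & play attachment)."""
        self.cores[core_id] = task_type

    def detach(self, core_id):
        """Disconnect a core; tasks already queued for its type stay queued."""
        del self.cores[core_id]

    def submit(self, task_name, task_type):
        if task_type not in self.cores.values():
            raise LookupError(f"no core is dedicated to {task_type!r}")
        self.queues[task_type].append(task_name)

    def tick(self):
        """One time slice: each core takes the next task of its own type, if any."""
        assignment = {}
        for core_id, task_type in self.cores.items():
            queue = self.queues[task_type]
            if queue:
                assignment[core_id] = queue.popleft()
        return assignment


cpu = DedicatedCoreCPU({402: "operating system", 408: "video DSP", 414: "application"})
cpu.attach(416, "application")
cpu.submit("decode frame", "video DSP")
cpu.submit("editor", "application")
cpu.submit("browser", "application")
first_slice = cpu.tick()
```

In this model a burst of application tasks can never delay the video task, because the two types never share a queue or a core; adding a second application core simply drains the application queue faster.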
All the above and other characteristics and advantages of the invention will be further understood through the following illustrative and non-limitative description of preferred embodiments thereof, with reference to the appended drawings; wherein like components are designated by the same reference numerals.
Brief Description of the Drawings
- Fig. 1 is a schematic block diagram of an example of a series of cores connected through common buses according to the prior art;
- Fig. 2 is a schematic block diagram of an exemplary embodiment of the invention, with cores connected through a relay core instead of through common buses;
- Fig. 3 is a schematic block diagram of an exemplary embodiment of the invention, with cores connected through a relay core instead of through a common network bus;
- Fig. 4 is a schematic representation of an exemplary embodiment of the invention, with a dedicated multi-core CPU;
- Fig. 5 is a schematic representation of a multi-core CPU and its internal cores;
- Fig. 6 is a schematic representation of a multi-core CPU that is made of several physical chips or cards; and
- Fig. 7 is a logical representation of Fig. 6.
Detailed Description of Preferred Embodiments
Fig. 1 is a schematic block diagram representing a prior art multi-core CPU. Cores 102 are identical, that is to say, they are based on the same technology. The set of these cores 102 is connected through a common address bus 104 and a common data bus 106.
Fig. 2 is a schematic block diagram representing an example of a set of cores 202, 204, and 214, connected through a relay core 212. The first subset 200 of cores 202 and 204 is connected by a common address bus 206 and by a common data bus 208. The second subset 210 of cores 212 and 214 is connected by a common address bus 216 and by a common data bus 218. In each subset of cores 200 or 210, a core (e.g. 202 or 212) can be allocated to the task that allows the two subsets 200 and 210 to be connected and to communicate. This core is termed "relay core" for convenience in this case, but it is not dedicated exclusively to this type of task. The relay core 212 is able to connect the subset of cores 210 to the subset of cores 200 using connections 220 and 222, respectively, to the address bus 206 and to the data bus 208 of the subset of cores 200. A relay core can relay several different types of buses.
Fig. 3 is a schematic block diagram representing an example of three subsets 300, 310, and 320 of cores 302 and 304, 312 and 314, and 322 and 324, respectively. Each core of a subset is connected to the others using a common network bus (respectively for each subset 306, 316, and 326). As shown previously with reference to Fig. 2, a subset of cores is not connected to another using a bus, but rather using a relay core. In this example, the relay core is core 312. It is connected to the subset of cores 300 by way of core 302 using connection 330. It is also connected to the subset of cores 320 by way of core 322 using connection 332.
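The topology of Fig. 3 can be illustrated with a short sketch. The following Python fragment is only an illustration under stated assumptions: it models each network bus and each relay connection as an undirected link and uses a breadth-first search to find a hop sequence. The names BUSES, RELAY_LINKS, build_graph and route are hypothetical and are not part of the described hardware; only the reference numerals come from the description.

```python
from collections import deque

# Reference numerals follow the description of Fig. 3.
# Cores on the same network bus can reach each other directly.
BUSES = {306: [302, 304], 316: [312, 314], 326: [322, 324]}
# Connections 330 and 332 attach relay core 312 to cores 302 and 322.
RELAY_LINKS = [(312, 302), (312, 322)]


def build_graph():
    """Build an undirected adjacency map from buses and relay connections."""
    graph = {}
    for cores in BUSES.values():
        for a in cores:
            for b in cores:
                if a != b:
                    graph.setdefault(a, set()).add(b)
    for a, b in RELAY_LINKS:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def route(src, dst):
    """Breadth-first search for a shortest hop sequence from src to dst."""
    graph = build_graph()
    queue = deque([[src]])
    seen = {src}
    while queue:
        path = queue.popleft()
        if path[-1] == dst:
            return path
        for nxt in sorted(graph[path[-1]]):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None
```

In this model, a message from core 304 to core 324 has no shared bus to travel on, so route(304, 324) passes through core 302, relay core 312, and core 322.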
According to an embodiment of the present invention, the design of the multi-core CPU includes cores which are dedicated to different types of tasks. This dedication addresses the possibility that some cores may not be able to run on a fully featured computer system.
According to one embodiment of the present invention, a CPU core can act as a relay for other cores and between segments of buses.
According to another embodiment of the present invention, a multi-core processor can include several types of cores. As an example, Fig. 4 schematically shows a CPU 400 comprising 9 cores: a core 402 mainly dedicated to an operating system, two cores 404 and 406 each mainly dedicated to a type of device driver accessing hardware, two cores 408 and 410 mainly dedicated to video 408 and to audio 410 Digital Signal Processing (DSP), a core 412 mainly dedicated to the activity of the Floating Point Unit (FPU), two cores 414 and 416 each mainly dedicated to core-specific software, and a core 418 mainly dedicated to the activity of the Watch Dog. The term "Watch Dog" as used herein has the meaning used for this component in prior art CPUs.
Fig. 5 is a schematic representation of a CPU that has 9 cores. It is an expansion of Fig. 4. The references used for the description of this figure are based on the table below.
[Table: core references A to I used in Fig. 5 (image not available)]
Core A, B: Video
The function of this core is video and graphics and it is expected to be part of the video card for the computer system. This core can also be used for video decoding and video compression. The type of core is DSP (Digital Signal Processing) and it should support advanced rendering capabilities such as MAC (Multiply Accumulate) and perform operations on sets of pixels. This core is expected to be connected directly to the video display hardware and probably has a fast connection to the computer's memory for fast image transfers. Internally such a core has advanced support for floating point operations and a strong ALU that handles large numbers and packed bytes. The data bus for this core is as wide as possible to allow large volume transfers in real-time.
Core C: System
The function of this core is to run the operating system's core and elementary real-time elements. This core would probably be used to run the scheduler, application loader and boot loader, and is probably the first core to run in the system so that it can initialize the other cores. The type of core is System and it should be simple to initialize and should be designed for fast response times. This core can potentially be connected directly to the DMA and Interrupt controller and would probably have some control over I/O. Internally such a core has support for super fast 16 bit ALU operations and has hardware support for process and thread tables that include Priorities and Permission Tables. This core has full permission over the system and for the current design behaves as the master for all cores. The data bus for this core is 32 bits wide because the focus of this core is response time. It is possible that this core will work with a super fast memory card that is blocked from access by other cores. Such a core would usually work with system RAM directly and not virtually, and use a small amount of memory at reserved areas, so there is no need for complex address / data buses.
Core D: Audio
The function of this core is audio and voice and it is expected to be part of the computer's audio card with speaker output and line in and microphone. This core can also be used for audio decoding and encoding such as MP3. The type of core is DSP and it should support advanced rendering capabilities such as MAC - Multiply Accumulate and perform audio operations such as SRC (Sample Rate Conversion). This core can potentially have an analog input and output so it can function as a sound card, or it can be connected digitally to a DAC (Digital to Analog Converter). Internally such a core has support for floating point operations and a 32 bit ALU unit which is enough for 16 bit audio data. The data bus for this core is as wide as possible to allow large volume transfers in real-time.
Core E: Device Drivers
The function of this core is Device Drivers and it runs the loadable drivers which manage hardware elements and allow Applications to connect to these hardware elements. This core is expected to perform in real-time and have very fast responses. The type of core is Real-time, which means that the latency for event handling has to be deterministic and that context switches must be very fast, if allowed at all. This core manages hardware and thus has control of and access to system I/O. The core can have support for Interrupt handling and it can receive events from the system core if it manages the Interrupt hardware. Internally such a core has support for 32 bit ALU operations and supports I/O data manipulations in the instruction set. The data bus for this core is 32 bits wide because it is not expected to perform large buffer transfers in real-time.
Core F: Streaming Device Drivers
The function of this core is Device Drivers and it runs the loadable drivers which manage hardware elements and allow Applications to connect to these hardware elements. This core is expected to perform in real-time and have very fast responses. The type of core is Streaming, which means that the focus of this core is large buffer transfers in real-time. This core manages streaming hardware such as network cards and some types of DSP hardware (such as CODECs, i.e. COders DECoders) and thus has control of and access to system I/O. The core can have support for Interrupt handling and it can receive events from the system core if it manages the Interrupt hardware. Internally such a core has support for floating point operations and integer ALU operations and supports I/O data manipulations in the instruction set. The data bus for this core is 32 bits wide because the exemplary hardware does not support a wider data bus, but it can be as wide as possible to allow fast operations with memory.
Core G, H: Application
The function of this core is Applications and it is designed to run the user applications on the system. This core has little to no relevance for real-time and scheduling is not expected to occur rapidly, so the core can maintain a large amount of data for an execution unit. The type of core is GUI, which means that the focus of this core is to handle user events and send output to the display device, so the response times are expected to be suitable for communication with a person. The core should not have any I/O access and should be able to communicate with hardware via the Device Driver cores and with the display via the Video cores. Internally such a core has support for wide integer ALU operations and potentially direct access to Video cores for large image transfers and manipulations. The data bus for this core is 64 bits wide because this is the type of data that this core is expected to use for this example.
Core I: Database
The function of this core is Database and it exists to manage a database in memory. The type of core is Service, which means that Applications communicate with it just like a Device Driver core, but the core does not have I/O and hardware access and it is not designed to respond in strong real-time. This core can support database functionality and algorithms in hardware, such as Hash functions, Indexing, etc. Internally such a core has support for wide ALU operations and packed data operations. The data bus for this core is as wide as possible to allow large volume transfers, because the nature of such a service is to scan large chunks of data and to perform large scale copy operations.
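The core descriptions above can be summarized as a small lookup table. The following Python sketch is illustrative only: the CoreSpec record, the CORES tuple and the cores_for helper are hypothetical names, and the value None is an assumed encoding for a data bus described as being "as wide as possible". The core letters, functions, types and bus widths are taken from the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CoreSpec:
    name: str                 # core letter used in the description of Fig. 5
    function: str             # what the core is dedicated to
    core_type: str            # core type named in the description
    bus_bits: Optional[int]   # None stands for "as wide as possible"


CORES = (
    CoreSpec("A", "Video", "DSP", None),
    CoreSpec("B", "Video", "DSP", None),
    CoreSpec("C", "System", "System", 32),
    CoreSpec("D", "Audio", "DSP", None),
    CoreSpec("E", "Device Drivers", "Real-time", 32),
    CoreSpec("F", "Streaming Device Drivers", "Streaming", 32),
    CoreSpec("G", "Application", "GUI", 64),
    CoreSpec("H", "Application", "GUI", 64),
    CoreSpec("I", "Database", "Service", None),
)


def cores_for(function):
    """Return the letters of all cores dedicated to the given function."""
    return [core.name for core in CORES if core.function == function]
```

A scheduler built along these lines could, for instance, call cores_for("Application") to obtain cores G and H before placing a user application.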
Fig. 6 shows an example of a multi-core CPU that is made of several physical chips or cards. This system has a DSP card / chip installed for support of video and audio rendering. This can be manufactured by a company with expertise in algorithms and analog hardware.
The system is installed with a Kernel card / chip that supports I/O and interrupt handling. This is probably part of the computer's motherboard and designed by the computer's manufacturer.
The system also has an Application card / chip installed. This should run the applications and might be a set of expansion cards so that it is possible to add more elements when the system runs out of CPU power for the running applications.
A Database card / chip is installed on the system to act as a service that manages databases. This is part of the Services expansion supported by the system to allow special functionality that is not required for most common system configurations.
Fig. 7 is a logical representation of Fig. 6. The operating system is not expected to care that the cores are installed on several separate hardware elements, and software running on the machine should see one big virtual CPU installed on the machine.
According to an embodiment of the present invention, a CPU designed according to the previous description is defined by a CPU instruction set, privileges, power consumption, bus speed, Input/Output access, memory access, and memory access type (virtual, physical, etc.).
According to yet another embodiment of the present invention, in a CPU designed according to the present invention, all cores use the same bus or bus methodology and behave, vis-a-vis the software and the operating system, as a single multi-core CPU.
According to an embodiment of the present invention, there is provided a set of multi-core CPUs that join all cores to form a virtual multi-core CPU, which can then be divided into smaller virtual CPUs. This can also allow a set of CPUs with special core types, for example, a CPU with cores for an operating system and a CPU with cores for processes, to form two virtual CPUs that each have some cores for processes and some cores for an operating system.
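The joining and re-dividing of cores described above can be pictured with a short sketch. The Python fragment below is only an illustration under assumptions: cores are modelled as (type, id) pairs with hypothetical identifiers, and the round-robin division in split_evenly is one possible policy, not one stated in the description.

```python
def join(*cpus):
    """Pool the cores of several physical CPUs into one virtual CPU."""
    return [core for cpu in cpus for core in cpu]


def split_evenly(pool, parts):
    """Deal the cores of each type round-robin into `parts` virtual CPUs."""
    virtual = [[] for _ in range(parts)]
    by_type = {}
    for core in pool:
        by_type.setdefault(core[0], []).append(core)
    for cores in by_type.values():
        for i, core in enumerate(cores):
            virtual[i % parts].append(core)
    return virtual


# Hypothetical physical CPUs: one with operating-system cores only,
# one with process cores only.
cpu_os = [("os", 0), ("os", 1)]
cpu_proc = [("proc", 0), ("proc", 1), ("proc", 2), ("proc", 3)]

pool = join(cpu_os, cpu_proc)
virtual_a, virtual_b = split_evenly(pool, 2)
```

With these inputs each of the two virtual CPUs receives one operating-system core and two process cores, which matches the example of two virtual CPUs that each mix both core types.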
The proposed multi-core CPU can have cores dynamically attached/detached, e.g., plug & play cores. This can increase the modularity of the proposed system.
According to another embodiment of the present invention, the proposed multi-core CPU can support advanced types of data in its instruction set. Prior art CPUs support simple types such as bit, boolean, byte, word, double word, float, and packages of these items; some CPUs have instruction support for strings. The system of the present invention can furthermore support types of data whose range is not bound to 2 to the power of a multiple of 8 (256, 65536, etc.). For example, such an advanced type can begin at -3 and go up to 12; adding 1 to a variable with the value 12 will then overflow to -3, if overflow is allowed. Building on this and on the ability of prior art systems to pack different types of data, the present invention allows packed types of the above kind where packing is not bound to multiples of bytes; for example, a variable can be [1 to 22][0 to 15][32 to 234], packed as a single variable. As an example, if overflow is allowed then a variable with value [1][0][234] added to [1][0][1] will overflow to [1][1][0]. Different overflow behaviors can be defined, as will be apparent to the skilled person, and are not described here in detail for the sake of brevity.
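The wrap-around behavior of a single range-bound type can be sketched in software. The following Python function is a minimal illustration, assuming that overflow is allowed, that overflow wraps to the opposite end of the inclusive range, and that the starting value already lies within the range; the name ranged_add and the returned overflow indicator are hypothetical and merely stand in for what a hardware implementation would do with a flag.

```python
def ranged_add(value, delta, lo, hi):
    """Add delta to value inside the inclusive range [lo, hi].

    Returns (result, overflow). When the sum leaves the range, the result
    wraps around and overflow is True.
    """
    span = hi - lo + 1            # number of distinct values in the type
    total = value + delta
    wrapped = lo + (total - lo) % span
    return wrapped, wrapped != total
```

For the type running from -3 to 12, ranged_add(12, 1, -3, 12) wraps to -3 and reports an overflow, whereas ranged_add(5, 1, -3, 12) simply yields 6.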
According to some embodiments of the present invention, a variable of type HASH is available. A variable of type HASH (more specifically, the calculated HASH value) can be compared to variables of other types. More particularly, a CPU core that implements a mechanism to maintain the value limitations in hardware should be able to detect such overflows in hardware. In such a scenario the CPU should know that the type of value is 0 to 120 and therefore adding 2 to a value of 119 will overflow to 1 and the Overflow Flag will be raised. This means that the memory will not store illegal values if the software programmer made the mistake of not verifying the data before addition. ALU (Arithmetic Logic Unit) support for such types should allow any ALU operation on the given type according to the rules used today for types whose range is 2 to the power of a multiple of 8. As an example, ALU support for the HASH type can include Addition, which adds a value to the HASH according to the HASH algorithm, and Subtraction, which reverses this operation or uses a negative input value, all while maintaining ALU flags such as the Carry Flag, Zero Flag, etc.
A value type that is not based on 2 to the power of a multiple of 8 can start at any value and go up to any value. Using a collection of these types as base elements of a numerical type will produce a number whose math is not base 256. Today data is saved as bytes of 8 bits and larger numbers are a collection of bytes, so each element of the actual number in memory is base 256. In actual memory numbers are saved in base 256, and so an addition operation on two such numbers means an addition using base-256 math. The Intel 8086 processor and others are known to be able to do base-10 math. Using a collection of types as described above, it is possible to use math of any base.
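Addition over a collection of such elements can be sketched as digit-wise addition with carry. The Python function below is an illustration under simplifying assumptions: digits are stored least-significant first and each digit runs from 0 up to one less than its base, whereas the description also allows ranges that start at other values. The name add_digits is hypothetical.

```python
def add_digits(a, b, bases):
    """Add two numbers stored least-significant digit first.

    Digit i takes values 0 .. bases[i] - 1. Returns (digits, carry_out).
    """
    result, carry = [], 0
    for x, y, base in zip(a, b, bases):
        carry, digit = divmod(x + y + carry, base)
        result.append(digit)
    return result, carry
```

With bases [10, 10] this performs base-10 addition (99 + 1 gives digits 0, 0 and a carry out), and with bases [7, 7] the same routine performs base-7 addition, which is the sense in which math of any base becomes available.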
Another type that can be implemented in hardware is a type whose value is a multiple of a number, for example a type that can only hold a number that is a multiple of 3. This can be useful for grid calculations, for example an area of 120 * 120 with a grid of 3 * 3. A value inserted into this variable that is not an exact multiple will roll over to the closest valid value, just as prior art applications round decimal point values when they are inserted into Integers.
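A software sketch of such a multiple-of type follows. It is illustrative only: the name snap_to_multiple is hypothetical, and the choice to round an exact tie upward is an assumption, since the description only says that the value rolls over to the closest valid value.

```python
def snap_to_multiple(value, step):
    """Return the multiple of step closest to value (ties round upward)."""
    quotient, remainder = divmod(value, step)
    if 2 * remainder >= step:   # closer to the next multiple, or a tie
        quotient += 1
    return quotient * step
```

On a grid of 3, a coordinate of 7 is stored as 6 and a coordinate of 8 is stored as 9, while 120 is already valid and is stored unchanged.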
Computer applications often use a collection of valid values. An example is the number of wheels that a vehicle has, which can be any of 2, 3, 4, 6, 8, or 12. Another example is the engine size. These value types are currently maintained by software. A CPU that supports such a collection type in hardware will behave similarly to a value type that is a multiple of a number when ALU operations are performed.
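The collection type can be sketched in the same way as the multiple-of type. In the Python fragment below, the wheel counts come from the description, while the name snap_to_collection and the rule that a tie resolves to the smaller value are assumptions made for the illustration.

```python
WHEEL_COUNTS = (2, 3, 4, 6, 8, 12)   # valid values from the description


def snap_to_collection(value, valid=WHEEL_COUNTS):
    """Return the valid value closest to value (ties pick the smaller one)."""
    return min(valid, key=lambda v: (abs(v - value), v))
```

Storing 11 in such a variable yields 12, storing 1 yields 2, and storing 5, which is equally far from 4 and 6, yields 4 under the assumed tie rule.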
Data can be stored in memory sequentially, as is commonly done, but it can also be stored with spaces between values. These spaces can be space fillers for a value, so that the next value starts on a byte boundary for example, or can be part of a sequence of data, for example in a communication stream where multiple data values are transmitted and some are not relevant for the application. A CPU can implement a type that has a mask that defines the locations of valid values. An example is a collection of bytes where the first two are Sender Address, the next two are Sender Verification, the next two are Receiver Address, and the transmission data follows. Data bytes would look like this: [Src][Src][Vrfy][Vrfy][Dst][Dst][Dat][Dat][Dat][Dat][Dat]... For this example a mask can be made of [X][X][-][-][X][X] to specify that valid values are located on bytes 1, 2, 5, 6. When the CPU copies the data it will only consider the valid data, and the same applies to ALU operations. A mask can be expressed in bytes, bits, or any other type of value.
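A masked copy of this kind can be sketched as follows. The Python function is an illustration only: the name masked_copy, the string encoding of the mask, and the sample byte values are hypothetical, and the sketch assumes that bytes beyond the end of the mask are not considered.

```python
def masked_copy(data, mask):
    """Copy only the bytes of data whose mask position is marked 'X'."""
    return bytes(b for b, m in zip(data, mask) if m == "X")


# Hypothetical frame: [Src][Src][Vrfy][Vrfy][Dst][Dst][Dat][Dat]
frame = bytes([0x0A, 0x0B, 0x11, 0x22, 0x0C, 0x0D, 0x99, 0x98])
addresses = masked_copy(frame, "XX--XX")
```

Here addresses holds the two sender address bytes followed by the two receiver address bytes, corresponding to byte positions 1, 2, 5 and 6 of the frame.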
Although embodiments of the invention have been described by way of illustration, it will be understood that the invention may be carried out with many variations, modifications, and adaptations, without exceeding the scope of the claims.

Claims
1. A multi-core CPU having a plurality of interconnected cores each of said cores being mainly dedicated to a set of predefined tasks, each of said cores being adapted to perform its predefined tasks.
2. A multi-core CPU according to claim 1, wherein each said core is designed to be suitable for its dedicated functionality.
3. A multi-core CPU according to claim 1, including a plurality of cores which are grouped in subsets of cores and linked by common buses.
4. A multi-core CPU according to claim 3, wherein each said core is connected to more than one type of bus.
5. A multi-core CPU according to claim 3, including a plurality of cores using the same bus protocols.
6. A multi-core CPU according to claim 3, wherein the multi-core CPU uses several types of buses internally.
7. A multi-core CPU according to claim 3, wherein a subset of cores communicates with another subset of cores using a relay core.
8. A multi-core CPU according to claim 7, wherein the relay core is not fully dedicated to the relay function.
9. A multi-core CPU according to claim 1, wherein the plurality of cores is connectable and disconnectable via plug-and-play technology.
10. A multi-core CPU according to claim 1, which is coupled to one or more other multi-core CPUs to form a virtual multi-core CPU.
11. A multi-core CPU according to claim 1, which is dedicated to support advanced types of data in the instruction set.
12. A multi-core CPU according to claim 11, which supports in the instruction set, types of data which are not bound to power of 2.
13. A multi-core CPU according to claim 11, which supports in the instruction set, types of data as string.
14. A multi-core CPU according to claim 11, which supports computation starting at any value and up to any value.
15. A multi-core CPU according to claim 11, which supports computation using multiples of any value.
16. A multi-core CPU according to claim 11, which supports a mask type such as X_,_,_XXX_XXX where 'X' indicates relevant and '_' irrelevant data.
17. A multi-core CPU according to claim 16, which supports a collection of these types such as [-3 to 12]_,_,_,_[0 to 25].
18. A multi-core CPU according to claim 8, which supports in the instruction set, types of data as HASH.
19. A multi-core CPU according to claim 8, which supports in the instruction set, types of data which are packs of different types of data.
20. A multi-core CPU according to claim 1, using a tag added to the software code to specify the core type on which this function should run.
PCT/IL2008/000916 2007-07-05 2008-07-03 Multi-core cpu WO2009004628A2 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US94799607P 2007-07-05 2007-07-05
US60/947,996 2007-07-05

Publications (2)

Publication Number Publication Date
WO2009004628A2 true WO2009004628A2 (en) 2009-01-08
WO2009004628A3 WO2009004628A3 (en) 2010-03-04

Family

ID=40226623

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IL2008/000916 WO2009004628A2 (en) 2007-07-05 2008-07-03 Multi-core cpu

Country Status (1)

Country Link
WO (1) WO2009004628A2 (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4922414A (en) * 1982-12-17 1990-05-01 Symbolics Inc. Symbolic language data processing system
US5835775A (en) * 1996-12-12 1998-11-10 Ncr Corporation Method and apparatus for executing a family generic processor specific application
US20040024845A1 (en) * 2001-11-15 2004-02-05 Globalview Software, Inc. Data transfer system
US20060179277A1 (en) * 2005-02-04 2006-08-10 Flachs Brian K System and method for instruction line buffer holding a branch target buffer
US20060236147A1 (en) * 2005-04-15 2006-10-19 Rambus Inc. Processor controlled interface
US20070061433A1 (en) * 2005-09-12 2007-03-15 Scott Reynolds Methods and apparatus to support dynamic allocation of traffic management resources in a network element
US20070130445A1 (en) * 2005-12-05 2007-06-07 Intel Corporation Heterogeneous multi-core processor having dedicated connections between processor cores

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
GUTTAG, J.: 'Abstract Data Types and the Development of Data Structures.' COMMUNICATIONS OF THE ACM., [Online] vol. 20, no. 6, June 1977, pages 396 - 404 Retrieved from the Internet: <URL:http://www.cs.wright.edu/people/iaculty/tkprasad/courses/cs784/guttag-cacm77.pdf> [retrieved on 2008-11-08] *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9886331B2 (en) 2014-11-19 2018-02-06 International Business Machines Corporation Network traffic processing
US9891964B2 (en) 2014-11-19 2018-02-13 International Business Machines Corporation Network traffic processing
CN114866499A (en) * 2022-04-27 2022-08-05 曙光信息产业(北京)有限公司 Synchronous broadcast communication method, device and storage medium of multi-core system on chip
CN114866499B (en) * 2022-04-27 2024-02-23 曙光信息产业(北京)有限公司 Synchronous broadcast communication method, device and storage medium of multi-core system on chip

Also Published As

Publication number Publication date
WO2009004628A3 (en) 2010-03-04

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 08763672

Country of ref document: EP

Kind code of ref document: A2

NENP Non-entry into the national phase in:

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 08763672

Country of ref document: EP

Kind code of ref document: A2