EP3172669A2 - System and method for parallel processing using dynamically configurable proactive co-processing cells

System and method for parallel processing using dynamically configurable proactive co-processing cells

Info

Publication number
EP3172669A2
Authority
EP
European Patent Office
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
EP15825147.0A
Other languages
German (de)
English (en)
Other versions
EP3172669A4 (fr
Inventor
Alfonso INIGUEZ
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US14/340,332 (patent US9852004B2)
Application filed by Individual
Publication of EP3172669A2
Publication of EP3172669A4

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2209/00 Indexing scheme relating to G06F9/00
    • G06F2209/50 Indexing scheme relating to G06F9/50
    • G06F2209/5011 Pool
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2209/00 Indexing scheme relating to G06F9/00
    • G06F2209/50 Indexing scheme relating to G06F9/50
    • G06F2209/509 Offload

Definitions

  • the present invention generally relates to parallel-process computing, and particularly to a processing architecture which involves autonomous coprocessors configured to proactively retrieve tasks from a task pool populated by a central processing unit.
  • the Internet of Things (also referred to as the Cloud of Things) refers to an ad hoc network of uniquely identifiable embedded computing devices within the existing Internet infrastructure.
  • the internet of things (IoT) portends advanced connectivity of devices, systems, and services that goes beyond machine-to-machine (M2M) communications.
  • the scope of things contemplated by the IoT is unlimited, and may include devices such as heart monitoring implants, biochip transponders, automobile sensors, aerospace and defense field operation devices, and public safety applications that assist fire-fighters in search and rescue operations, for example.
  • Current market examples include home-based networks that involve smart thermostats, light bulbs, and washer/dryers that use Wi-Fi for remote monitoring. Due to the ubiquitous nature of connected objects in the IoT, it is estimated that more than 30 billion devices will be wirelessly connected to the Internet of Things by 2020. Harnessing the processing capacity of the controllers and processors associated with these devices is one of the objectives of the present invention.
  • Computer processors traditionally execute machine coded instructions serially. To run a plurality of applications concurrently, a single processor interleaves instructions from various programs and executes them serially, although from the user's perspective the applications appear to be processed in parallel.
  • True parallel or multi-core processing is a computational approach that breaks large computational tasks into individual blocks of computations and distributes them among two or more processors.
  • a computing architecture that uses task parallelism (parallel processing) divides a large computational requirement into discrete modules of executable code. The modules are then executed concurrently or sequentially, based on their respective priorities.
  • a typical multiprocessor system includes a central processing unit ("CPU") and one or more co-processors.
  • the CPU partitions the computational requirements into tasks and distributes the tasks to co-processors. Completed threads are reported to the CPU, which continues to distribute additional threads to the co-processors as needed.
  • Presently known multiprocessing approaches are disadvantageous in that a significant amount of CPU bandwidth is consumed by task distribution; waiting for tasks to be completed before distributing new tasks (often with dependencies on previous tasks); responding to interrupts from coprocessors when a task is completed; and responding to other messages from coprocessors.
  • co-processors often remain idle while waiting for a new task from the CPU.
  • a multiprocessor architecture is thus needed which reduces CPU management overhead, and which also more effectively harnesses and exploits available co-processing resources.
  • Various embodiments of a parallel processing computing architecture include a CPU configured to populate a task pool, and one or more co-processors configured to proactively retrieve threads (tasks) from the task pool.
  • Each coprocessor notifies the task pool upon completion of a task, and pings the task pool until another task becomes available for processing. In this way, the CPU communicates directly with the task pool, and communicates indirectly with the co-processors through the task pool.
  • the co-processors may also be capable of acting autonomously; that is, they may interact with the task pool independently of the CPU.
  • each co-processor includes an agent that interrogates the task pool to seek a task to perform.
  • the co-processors work together "in solidarity" with one another and with the task pool to complete aggregate computational requirements by autonomously retrieving and completing individual tasks which may or may not be inter-related.
  • a task B involves computing an average temperature over time.
  • the CPU and the various co-processors may thereby inferentially communicate with each other via the task pool.
  • the co-processors are referred to as autonomous, proactive solidarity cells.
  • autonomous implies that a coprocessor may interact with the task pool without being instructed to do so by the CPU or by the task pool.
  • proactive suggests that each co-processor may be configured (e.g., programmed) to periodically send an agent to monitor the task pool for available tasks appropriate to that co-processor.
  • solidarity implies that co-processing cells share a common objective in monitoring and executing all available tasks within the task pool.
  • a solidarity cell may be a general purpose or special purpose processor, and therefore may have the same or different instruction set, architecture, and microarchitecture as compared to the CPU and other solidarity cells in the system.
  • the software programs to be executed and data to be processed may be contained within one or more memory units.
  • a software program consists of a series of instructions that may require data to be used by the program. For example, if the program corresponds to a media player, then the data contained in memory may be compressed audio data which is read by a co-processor and eventually played on a speaker.
  • Each solidarity cell in the system may be configured to communicate, ohmically or wirelessly, with the task pool through a crossbar switch, also known as fabric.
  • the radio signals themselves may constitute the fabric.
  • the co-processors may also communicate directly with the CPU.
  • the switching fabric facilitates communication among system resources.
  • Each solidarity cell is proactive, in that it obtains a task to perform by sending its agent to the task pool when the solidarity cell has no processing to perform or, alternatively, when the solidarity cell is able to contribute processing cycles without impeding its normal operation.
  • a co-processor associated with a device such as a light bulb may be programmed to listen for "on" and "off" commands from a master device (such as a smartphone) as its normal operation, but its processing resources may also be harnessed through a task pool.
  • agent refers to a software module, analogous to a network packet, associated with a coprocessor that interacts with the task pool to thereby obtain available tasks which are appropriate for that co-processor cell.
  • the solidarity cells may execute the tasks sequentially, when the tasks are contingent on the execution of a previous task, or in parallel, when more than one solidarity cell is available and more than one matching task is available for execution.
  • the tasks may be executed independently or collaboratively, depending on the task thread restrictions (if any) provided by the CPU. Interdependent tasks within the task pool may be logically combined.
  • the task pool notifies the CPU when a task thread is completed. If a task thread is composed of a single task, then the task pool may notify the CPU at completion of such task. If a task thread is composed of multiple tasks, the task pool may notify the CPU at completion of such chain of tasks. Since task threads may be logically combined, it is conceivable to have a case in which the task pool notifies the CPU after completion of logically combined task threads.
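The arrangement summarized above, in which the CPU talks only to the task pool while cells pull work on their own initiative, can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the class `TaskPool`, its methods `populate`, `take`, and `complete`, the function `cell`, and the callback `on_thread_done` are hypothetical names. Python threads stand in for co-processors, and each cell simply stops when it finds no matching task instead of continuing to ping the pool.

```python
import threading


class TaskPool:
    """Hypothetical software task pool shared by one CPU and several cells."""

    def __init__(self, on_thread_done):
        self._lock = threading.Lock()
        self._ready = []        # entries: (thread_id, task_type, payload)
        self._remaining = {}    # thread_id -> number of unfinished tasks
        self._on_thread_done = on_thread_done  # stands in for "notify the CPU"

    def populate(self, thread_id, tasks):
        """Called by the CPU: add a task thread made of (task_type, payload) pairs."""
        with self._lock:
            self._remaining[thread_id] = len(tasks)
            self._ready += [(thread_id, t, p) for t, p in tasks]

    def take(self, task_types):
        """Called by a cell's agent: remove and return one matching task, or None."""
        with self._lock:
            for i, task in enumerate(self._ready):
                if task[1] in task_types:
                    return self._ready.pop(i)
        return None

    def complete(self, thread_id):
        """Called by a cell: report one finished task; notify when the thread is done."""
        with self._lock:
            self._remaining[thread_id] -= 1
            finished = self._remaining[thread_id] == 0
        if finished:
            self._on_thread_done(thread_id)


def cell(pool, handlers, results):
    """A cell keeps fetching matching tasks without any instruction from the CPU."""
    while (task := pool.take(handlers)) is not None:
        thread_id, task_type, payload = task
        results.append(handlers[task_type](payload))
        pool.complete(thread_id)
```

In this sketch the CPU never addresses a cell directly; it only calls `populate` and later receives the `on_thread_done` callback once every task of a thread has been reported complete.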
  • Various features of the invention are applicable to, inter alia, a network of Internet-of-Things devices and sensors; heterogeneous computing environments; high performance computing, two dimensional and three dimensional monolithic integrated circuits; and motion control and robotics.
  • FIG. 1 is a schematic block diagram of a parallel processing architecture including a CPU, memory, task pool, and a plurality of co-processors configured to communicate through a fabric in accordance with an embodiment
  • FIG. 2 is a schematic block diagram illustrating details of an exemplary task pool in accordance with an embodiment
  • FIG. 3 is a schematic block diagram of a network including coprocessing cells and their corresponding agents interacting with a task pool in accordance with an embodiment
  • FIG. 4 is a schematic layout of an internet of things network including available plug and play devices in accordance with an embodiment
  • FIG. 5 is a schematic layout diagram of an exemplary internet of things use case illustrating dynamic harnessing of nearby devices in accordance with an embodiment
  • FIG. 6 is a flow chart illustrating the operation of an exemplary parallel computing environment in accordance with an embodiment.
  • Various embodiments relate to parallel processing computing systems and environments, from simple switching and control functions to complex programs and algorithms including, without limitation: data encryption; graphics, video, and audio processing; direct memory access; mathematical computations; data mining; game algorithms; ethernet packet and other network protocol processing including construction, reception and transmission of data to the outside network; financial services and business methods; search engines; internet data streaming and other web-based applications; execution of internal or external software programs; switching on and off and/or otherwise controlling or manipulating appliances, light bulbs, consumer electronics, and the like, e.g., in the context of the Internet-of-Things.
  • a distributed processing system 10 includes a single or multi-core CPU 11 and one or more solidarity or co-processing cells 12A - 12n configured to communicate with a task pool 13 through a cross-bar switching fabric 14.
  • the solidarity cells 12 may also communicate with each other through the switching fabric 14 or through a separate cell bus (not shown).
  • the CPU 11 may communicate with the task pool 13 directly or through the switching fabric 14.
  • One or more memory units 15 each contain data and/or instructions.
  • the term "instructions" includes a software program that may be compiled for execution by the CPU 11.
  • the memory units 15, cells 12, and the task pool 13 may be ohmically or wirelessly interconnected to communicate with the CPU and/or with each other via the switching fabric 14.
  • the CPU 11 communicates with the cells 12 only indirectly through the task pool.
  • the CPU 11 may also communicate directly with the cells 12 without using the task pool as an intermediary.
  • the system 10 may include more than one CPU 11 and more than one task pool 13, in which case a particular CPU 11 may interact exclusively with a particular task pool 13, or multiple CPUs 11 may share one or more task pools 13.
  • each solidarity cell may be configured to interact with more than one task pool 13.
  • a particular cell may be configured to interact with a single designated task pool, for example, in a high performance or high security context.
  • cells may be dynamically paired, ohmically (plug and play) or wirelessly (on the fly), with a task pool when the following three conditions are met:
  • the cell is able to communicate, ohmically or wirelessly, with the task pool; the connection to the task pool can be through a port in the task pool itself, or through a switching fabric that is connected to the task pool;
  • the task pool recognizes the agent sent by the cell as trustworthy, for example, using input from the user, with or without password, through traditional Wi-Fi, Bluetooth or similar pairing, manually through a graphical software program running on a smartphone or tablet, or by any other secure or unsecure method; and
  • at least one of the available tasks within the task pool is compatible with the capabilities of the solidarity cell.
  • the foregoing dynamic pairing conditions apply, except that a given cell may be locked or restricted to work with only one of the task pools; otherwise, the cells may connect with one or more task pools, using a first found basis, round robin basis or any other selection scheme. It is also possible to assign priorities to the tasks within the task pools, whereby the cells give preference to the high priority tasks and serve the lower priority tasks when not otherwise engaged by the higher priority tasks.
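The three pairing conditions and the priority preference described above can be expressed as a short check. The sketch below is illustrative only: the `Task`, `TaskPool`, and `Cell` classes, their fields, and the convention that a larger `priority` number means higher priority are all assumptions, and trust is reduced to membership in a set rather than any real pairing protocol.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    task_type: str
    priority: int = 0  # assumed convention: larger value = higher priority


@dataclass
class TaskPool:
    trusted_cells: set = field(default_factory=set)  # cells the pool accepts
    tasks: list = field(default_factory=list)


@dataclass
class Cell:
    cell_id: str
    capabilities: set
    reachable_pools: list = field(default_factory=list)  # pools it can talk to


def can_pair(cell: Cell, pool: TaskPool) -> bool:
    """Return True only when all three pairing conditions hold."""
    can_communicate = any(p is pool for p in cell.reachable_pools)
    is_trusted = cell.cell_id in pool.trusted_cells
    has_compatible_task = any(t.task_type in cell.capabilities for t in pool.tasks)
    return can_communicate and is_trusted and has_compatible_task


def pick_task(cell: Cell, pool: TaskPool):
    """Prefer the highest-priority compatible task; None if nothing matches."""
    compatible = [t for t in pool.tasks if t.task_type in cell.capabilities]
    return max(compatible, key=lambda t: t.priority, default=None)
```

A cell restricted to a single pool would simply list that one pool in `reachable_pools`; first-found or round-robin selection across several pools would replace `pick_task` with a different policy.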
  • the CPU 11 may be any single or multi-core processor, applications processor or microcontroller, used to execute a software program.
  • the system 10 may be implemented on a personal computer, smart phone, tablet, or Internet-of-Things device, in which case the CPU 11 may be any personal computer, central processor, or processor cluster, such as an Intel® Pentium® or multi-core processor local to or remote from the immediate computing environment.
  • the system 10 may be implemented on a supercomputer and the CPU 11 may be a reduced instruction set computer ("RISC") processor, applications processor, a microcontroller, or the like.
  • the system 10 may be implemented on a locally connected series of personal computers, such as a Beowulf cluster, in which case the CPU 11 may include the central processors of all, a subset, or one of the networked computers.
  • the system 10 may be implemented on a network of remotely connected computers, in which case the CPU 11 may be a presently known or later developed central processor for a server or mainframe.
  • the particular manner in which the CPU 11 performs the subject parallel processing methods within the presently described system 10 may be influenced by the CPU's operating system.
  • the CPU 11 may be configured for use within the system 10 by programming it to recognize and communicate with the task pool 13 and divide the computing requirements into threads, as described below.
  • the system 10 may be implemented retroactively on any computer or computer network having an operating system that may be modified or otherwise configured to implement the functionality described herein.
  • the data to be processed is contained within the memory units 15, for example in the context of addressable regions or sectors of random access or read-only memory, cache memory for the CPU 11, or other forms of data storage such as flash memory and magnetic storage.
  • the memory units 15 contain the data to be processed as well as the location to place the results of the processed data. Not every task is required to access the memory units 15, as in the case of, for example, smart meters and automotive instrumentation, which may return data to the system 10, or as in the case of a robot and motor controllers which may actuate a mechanism.
  • Each cell 12 is a conceptually or logically independent computational unit capable of executing one or more tasks/threads.
  • a cell 12 may be a microcontroller, a microprocessor, application processor, a "dumb" switch, or a standalone computer such as a machine in a Beowulf cluster.
  • a cell 12 may be a general or special purpose co-processor configured to supplement, perform all of, or perform a limited range of functions of the CPU, or functions that are foreign to the CPU 11 such as ambient monitoring and robotic actuators, for example.
  • a special-purpose processor may be a dedicated hardware module designed, programmed, or otherwise configured to perform a specialized task, or it may be a general-purpose processor configured to perform specialized tasks such as graphics processing, floating-point arithmetic, or data encryption.
  • any cell 12 that is a special-purpose processor may also be configured to access and write to memory and execute descriptors, as described below, as well as other software programs.
  • any number of cells 12 may comprise a heterogeneous computing environment; that is, a system that uses more than one kind of processor such as an AMD-based and/or an Intel-based processor, or a mixture of 32-bit and 64-bit processors.
  • Each cell 12 is configured to perform one or a plurality of specialized tasks, as illustrated in the following sequence of events.
  • each cell periodically sends an agent to the task pool until a matching task is found.
  • both the cell and the task pool may be equipped with a transceiver.
  • the transceiver may be located in the task pool itself or in the switching fabric to which the task pool is connected.
  • the task pool transmits an acknowledgement to the cell.
  • the next step is the "communication channel" phase.
  • the cell receives the task and begins to execute the task. In one implementation, once the first task is completed, the communication channel is maintained so that the solidarity cell can fetch another task without having to repeat the "poll" and "acknowledge" phases.
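The sequence of events above (poll, acknowledge, communication channel) can be pictured as a small state holder on the cell side. This is a sketch under assumptions: the class `CellLink`, its `fetch` method, and the `log` attribute are hypothetical, the task pool is modeled as a plain list of task-type strings, and the acknowledgement is recorded rather than actually transmitted.

```python
class CellLink:
    """Hypothetical cell-side view of the poll / acknowledge / channel phases."""

    def __init__(self, capabilities):
        self.capabilities = set(capabilities)
        self.channel_open = False
        self.log = []  # phases traversed so far, kept for inspection

    def fetch(self, pool):
        """Try to obtain one matching task type from `pool` (a list)."""
        match = next((t for t in pool if t in self.capabilities), None)
        if not self.channel_open:
            self.log.append("poll")          # agent is sent to the pool
            if match is None:
                return None                  # no match yet: poll again later
            self.log.append("acknowledge")   # pool acknowledges the match
            self.channel_open = True
            self.log.append("channel")       # channel stays open from here on
        if match is not None:
            pool.remove(match)
        return match
```

Once `channel_open` is set, later calls to `fetch` go straight to retrieving a task, which mirrors the implementation in which the channel is maintained after the first task.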
  • the system 10 may include a plurality of cells, wherein some of the cells are capable of performing the same task types as other cells, to thereby create redundancy in the system 10.
  • the set of task types performed by a given cell 12 may be a subset of the set of task types performed by another cell.
  • the system 10 may divide an aggregate computational problem into a group of tasks, and populate the task pool 13 with a first type, a second type, and a third type of tasks.
  • a first cell 12A may be capable of performing only tasks of the first type; a second cell 12B may be capable of performing tasks of the second type; a third cell 12C may be capable of performing tasks of the third type; a fourth cell 12D may be capable of performing tasks of the second or third types; and a fifth cell 12N may be capable of performing all three task types.
  • the system 10 may be configured with this redundancy so that if a given cell is removed from the system 10 (or currently busy or otherwise unavailable), the system 10 may continue to function seamlessly. Furthermore, if a cell is dynamically added to the system 10, the system 10 may continue to function seamlessly with the benefit of increased performance.
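The five-cell example can be checked mechanically. The sketch below encodes the capabilities stated above as sets (task types are numbered 1 to 3) and verifies that every task type remains covered when any single cell is removed; the function names `coverage` and `survives_removal` are hypothetical.

```python
# Task types each cell can perform, as described in the example above.
CELLS = {
    "12A": {1},
    "12B": {2},
    "12C": {3},
    "12D": {2, 3},
    "12N": {1, 2, 3},
}


def coverage(cells):
    """Map each task type to the sorted list of cells able to perform it."""
    out = {}
    for name, types in cells.items():
        for task_type in types:
            out.setdefault(task_type, []).append(name)
    return {t: sorted(names) for t, names in sorted(out.items())}


def survives_removal(cells, removed, required_types=(1, 2, 3)):
    """True if every required task type is still covered without `removed`."""
    remaining = {n: t for n, t in cells.items() if n != removed}
    cov = coverage(remaining)
    return all(cov.get(t) for t in required_types)
```

Here `CELLS["12B"] < CELLS["12D"]` is true, which matches the remark that one cell's task types may be a subset of another's.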
  • the task pool 13 may occupy a region of physical memory that is accessible by the CPU 11.
  • the task pool 13 may be accessible by MAC address or IP address.
  • Multiple embodiments are envisioned for the task pool 13; it may be physically located with the CPU in the same 2D or 3D monolithic IC, or it may be implemented as a stand-alone IC and be physically interconnected to a computer board, smart phone, tablet, router or Internet-of-Things device.
  • the task pool may be a stand-alone multi-port, wired and/or wireless connected device which may be shared among multiple CPU 11 systems, or dedicated to a given CPU 11.
  • the task pool 13 may also be addressable by the cells 12.
  • the task pool 13 may be disposed in a dedicated hardware block to provide maximum access speed by the CPU 11 and cells 12.
  • the task pool 13 may be software based, wherein the contents of the task pool 13 are stored in memory, analogous to the hardware-based embodiment, but represented by data structures.
  • Upon being populated by the CPU 11, the task pool 13 contains one or more task threads 21. Each task thread 21 represents a computational task that may be a component or subset of the larger aggregate computational requirement imposed on the CPU 11. In one embodiment, the CPU 11 may initialize and then populate the task pool 13 with concurrently executable threads 21. Each thread 21 may include one or more discrete tasks 22. A task 22 may have a task type and a descriptor. The task type indicates which cells 12 are capable of performing the task 22. The task pool 13 may also use the task type to prioritize tasks 22 having the same type.
  • the task pool 13 may maintain a prioritization table (not shown) that documents the solidarity cells 12 present in the system 10, the types of tasks 22 each cell is capable of performing, and whether or not each cell is presently processing a task 22.
  • the task pool 13 may use the prioritization table to determine which of the eligible tasks 22 to assign to a requesting cell, as described below.
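The text does not spell out how the prioritization table turns into an assignment decision, so the following is only one possible reading. The table layout (`CellEntry` with `task_types` and `busy`) follows the description; the selection rule in `choose_task`, which hands the requesting cell the eligible task type that the fewest other idle cells could handle, is an assumption introduced purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class CellEntry:
    task_types: frozenset  # task types this cell can perform
    busy: bool = False     # whether the cell is presently processing a task


def choose_task(table, requester, ready_task_types):
    """Pick a task type for `requester`, or None if nothing is eligible.

    Assumed policy (not from the source): prefer the task type with the
    fewest idle alternative cells, so scarce capabilities are served first.
    """
    own = table[requester].task_types
    eligible = [t for t in ready_task_types if t in own]
    if not eligible:
        return None

    def idle_alternatives(task_type):
        return sum(
            1
            for name, entry in table.items()
            if name != requester and not entry.busy and task_type in entry.task_types
        )

    choice = min(eligible, key=idle_alternatives)
    table[requester].busy = True  # record that the requester is now occupied
    return choice
```

Any other rule, such as first-found or explicit per-task priorities, would fit the same table structure.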
  • the CPU 11 may retrieve and execute a task or thread from the task pool. Moreover, the CPU 11 may abort any task that is determined to be stale, broken, stuck, or erroneous. In such case, the CPU 11 may refresh the task, making it available for subsequent processing. Nothing precludes the CPU 11 from implementing adaptive task management, for example, as may be required by Artificial Intelligence, whereupon the CPU 11 may add, remove, or change tasks within an unfinished existing thread 21.
  • the descriptor may contain one or more of a specific instruction to be executed, a mode of execution, the location (e.g., address) of the data to be processed, and the location for placement of the task results, if any.
  • the location for placement of results is optional, such as in the case of animation and multimedia tasks that often present results to a display rather than storing them in memory.
  • task descriptors may be chained together, as in a linked list, so that the data to be processed may be accessed with fewer memory calls than if the descriptors were not chained together.
  • the descriptor is a data structure containing a header and a plurality of reference pointers to memory locations, and the task 22 includes the memory address of the data structure.
  • the header defines the function or instruction to be executed.
  • a first pointer references the location of the data to be processed.
  • a second, optional pointer references the location for placement of processed data. If the descriptor is linked to another descriptor to be sequentially executed, the descriptor may include a third pointer that references the next descriptor. In an alternative embodiment where the descriptor is a data structure, the task 22 may include the full data structure.
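The descriptor layout described above (a header plus up to three pointers) maps naturally onto a small record type. In this sketch the field names, the instruction strings, and the addresses are hypothetical; Python object references stand in for the third pointer to the next descriptor.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Descriptor:
    header: str                           # function or instruction to execute
    data_addr: int                        # first pointer: data to be processed
    result_addr: Optional[int] = None     # second, optional pointer: results
    next: Optional["Descriptor"] = None   # third pointer: next chained descriptor


def walk(descriptor):
    """Yield each descriptor of a chain in execution order."""
    while descriptor is not None:
        yield descriptor
        descriptor = descriptor.next


# A two-step chain with made-up addresses: decode audio, then play it.
decode = Descriptor("decode_audio", data_addr=0x1000, result_addr=0x2000)
play = Descriptor("play_audio", data_addr=0x2000)  # output goes to a speaker
decode.next = play
```

The `play` descriptor leaves `result_addr` unset, reflecting the case where results are presented rather than stored in memory.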
  • a thread 21 may further comprise a "recipe" describing the order in which the tasks 22 may be performed and any conditions that affect the order of performance.
  • the tasks 22 may be executed sequentially, concurrently, out of order, interdependently, or conditionally according to Boolean operations.
  • thread 21A comprises four tasks: 22A, 22B, 22C, and 22D.
  • the first task 22A must be completed before either the second task 22B or the third task 22C can begin.
  • the fourth task 22D may begin.
  • Threads 21 may also be interdependent. For example, as shown in FIG. 2, due to the Boolean operation in thread 21B, a completed task 22C may allow processing of tasks in thread 21B to continue.
  • the task pool 13 may lock a task 22 while the task 22 is waiting for completion of another task 22 upon which it depends. When a task 22 is locked, it cannot be acquired by a cell.
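A thread's "recipe" and the locking of dependent tasks can be modeled with a dependency map. The sketch below uses the four tasks of thread 21A. The text states that 22A must finish before 22B or 22C can begin; the condition for 22D is not fully stated in this passage, so the sketch assumes it waits for both 22B and 22C. The class `Thread` and its methods are hypothetical names.

```python
class Thread:
    """Hypothetical task thread whose recipe is a map: task -> prerequisites."""

    def __init__(self, deps):
        self.deps = {task: set(prereqs) for task, prereqs in deps.items()}
        self.done = set()
        self.claimed = set()

    def is_locked(self, task):
        """A task is locked while any task it depends on is unfinished."""
        return not self.deps[task] <= self.done

    def available(self):
        """Tasks a cell may acquire: unfinished, unclaimed, and not locked."""
        return sorted(
            t for t in self.deps
            if t not in self.done and t not in self.claimed and not self.is_locked(t)
        )

    def acquire(self, task):
        if task not in self.available():
            raise ValueError(f"task {task} cannot be acquired")
        self.claimed.add(task)

    def complete(self, task):
        """Mark a task finished; True means the whole thread is complete."""
        self.claimed.discard(task)
        self.done.add(task)
        return self.done == set(self.deps)


thread_21a = Thread({
    "22A": [],
    "22B": ["22A"],
    "22C": ["22A"],
    "22D": ["22B", "22C"],  # assumption: 22D waits for both 22B and 22C
})
```

When `complete` returns True, the task pool would notify the CPU that the thread has finished.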
  • the task pool 13 may notify the CPU 11 of the completion. The CPU may then advance processing beyond the completed thread 21.
  • the cells advantageously maintain solidarity with each other and with the CPU 11, thereby helping the system 10 to perform complex computations by autonomously and proactively retrieving tasks from the task pool 13.
  • the cells 12 act autonomously in that they may act independently of the CPU 11 or any other coprocessor. Alternatively, a cell may be acted upon or instructed directly by the CPU. Each cell acts proactively in that it seeks a task 22 from the task pool 13 as soon as the cell becomes available for further processing.
  • a cell 12 acquires a task from the task pool by sending an agent 30 to interrogate (search) the task pool and retrieve an available task 22 that requires completion, is not locked, and has a task type that can be performed by the cell.
  • the system 10 has the same number of agents as solidarity co-processing cells.
  • an agent is generally analogous to a data frame in the networking sense, in that an agent may be equipped with a source address, a destination address, and a payload.
  • the destination address is the address of the task pool 13 when the agent 30 is seeking a task 22, and the destination address is the address of the corresponding cell 12 when the agent 30 is returning to its cell with a task 22.
  • the source address is the address of the cell 12 when the agent 30 is seeking a task 22, and the source address is the address of the task pool 13 when the agent 30 is returning to its cell with a task 22.
  • the source and destination addresses may facilitate frame synchronization. That is, the system 10 may be configured to unequivocally differentiate addresses from payload data, so that when the contents of an agent 30 are read, the destination address indicates the beginning of the frame and the source address indicates the end of the frame, or vice versa. This allows the payload to vary in size when it is placed between the addresses.
  • an agent 30 may include a header that indicates the payload size. The header information may be compared to the payload to verify the data integrity.
  • the payload may be a fixed length. When an agent 30 is dispatched to the task pool 13 by its coprocessor cell, the payload contains identifying information of the types of tasks the cell 12 can perform. When the agent 30 returns from the task pool 13, the payload contains the descriptor of the task 22, either in the form of a memory location or the full descriptor data structure.
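The frame-like structure of an agent can be illustrated with a fixed binary header. The sketch below follows the variant in which a header carries the payload size; the exact layout (three 16-bit big-endian fields for destination, source, and payload length), the addresses, and the payload encodings are assumptions chosen for illustration, not a format given in the source.

```python
import struct

# Assumed header layout: destination, source, payload length (16 bits each).
HEADER = struct.Struct("!HHH")


def pack_agent(dest: int, src: int, payload: bytes) -> bytes:
    """Serialize an agent as header followed by payload."""
    return HEADER.pack(dest, src, len(payload)) + payload


def unpack_agent(frame: bytes):
    """Parse an agent and verify the payload against the declared size."""
    dest, src, length = HEADER.unpack_from(frame)
    payload = frame[HEADER.size:]
    if len(payload) != length:
        raise ValueError("payload size does not match header")
    return dest, src, payload


POOL_ADDR, CELL_ADDR = 0x0013, 0x012A  # hypothetical addresses

# Seeking a task: cell -> pool, payload lists the task types the cell performs.
outbound = pack_agent(POOL_ADDR, CELL_ADDR, b"aes,fft")
# Returning with a task: pool -> cell, payload is a descriptor memory address.
inbound = pack_agent(CELL_ADDR, POOL_ADDR, (0x2000).to_bytes(4, "big"))
```

Source and destination swap between the outbound and inbound frames, as described above, and a truncated frame is rejected by the size check in `unpack_agent`.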
  • some or all of the agents 30 are autonomous representatives of their respective corresponding cells 12. That is, each agent 30 may be dispatched by its corresponding cell 12 to retrieve a task 22 any time the cell is idle or capable of performing additional processing. In this way, the processing capacity of the solidarity cells 12 may be more fully exploited, inasmuch as the cells need not wait idly for an instruction from the CPU 11.
  • This approach has the additional benefit of reducing CPU overhead by relieving the CPU of the need to send a request to a cell to retrieve a task from the task pool.
  • the solidarity cells 12A - 12n are indifferent to the particular composition of the thread itself. Rather, an agent is only concerned with finding a match between the capabilities of its corresponding cell and an available task 22 to be completed in the task pool 13. That is, as long as there are available tasks 22 in the task pool 13, and an available task 22 matches the capability of the cell, then the system may effectively harness the processing capacity of the cell.
  • Some or all of the solidarity cells 12A - 12n may work independently of each other, or may communicate with each other directly, through the switching fabric 14, through the task pool 13, or pursuant to a command or request from the CPU to invoke another solidarity cell to assist in processing, moving, or transmitting data.
  • the agent 30A may search for a match between the task type of the ready tasks 22 and the types of tasks that the cell 12A is able to perform. This architecture may involve hard-coding of the types of tasks that the CPU 11 is configured to create.
  • the CPU 11 may be configured to "learn" or be taught how to create tasks of the fourth type in order to more fully exploit the available processing resources.
  • the agent 30A searches the task 22 descriptors for an executable instruction that matches one of the instructions that the cell 12A is capable of executing.
  • the agent 30A delivers the descriptor of the matching task 22 to the cell 12A, whereupon the cell 12A begins to process the task 22.
  • the agent 30A may deliver the memory address of the descriptor to the cell 12A, and the cell 12A retrieves the data structure from memory.
  • the agent 30A may deliver the complete data structure to the cell 12A for processing.
  • the descriptor informs the cell 12A which instruction to execute, the location in memory units 15 where the data to be processed may be found, and the location in memory 15 where the results are to be placed.
  • Upon completion of the task 22, the cell 12A notifies the task pool 13 to change the status of the selected task 22 from 'to be completed' to 'completed.' Further, once the cell 12A finishes a task 22, the cell may dispatch its agent 30A to the task pool 13 to seek another task 22.
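The agent's search, the delivery of a descriptor, and the status change on completion can be traced with a small single-cell loop. This is a sketch under assumptions: the pool is a list of dictionaries, memory is a dictionary keyed by made-up addresses, and `agent_find`, `run_cell`, and the instruction names are hypothetical. A pool shared by several cells would also need to mark a task as claimed while it is being processed.

```python
# Hypothetical memory units: address -> contents.
MEMORY = {0x10: [20.0, 22.0, 24.0], 0x20: None}

# Hypothetical task pool: each entry is a task with its descriptor fields.
POOL = [
    {"instruction": "average", "data": 0x10, "result": 0x20, "status": "to be completed"},
    {"instruction": "encrypt", "data": 0x10, "result": None, "status": "to be completed"},
]


def agent_find(pool, instructions):
    """Return the first pending task whose instruction the cell can execute."""
    for task in pool:
        if task["status"] == "to be completed" and task["instruction"] in instructions:
            return task
    return None


def run_cell(pool, memory, handlers):
    """Fetch and execute matching tasks until none remain; return the count."""
    completed = 0
    while (task := agent_find(pool, handlers)) is not None:
        result = handlers[task["instruction"]](memory[task["data"]])
        if task["result"] is not None:       # the result location is optional
            memory[task["result"]] = result
        task["status"] = "completed"         # notify the pool of completion
        completed += 1
    return completed


# A cell that only knows how to compute an average.
handlers = {"average": lambda values: sum(values) / len(values)}
```

Running `run_cell(POOL, MEMORY, handlers)` completes the averaging task and leaves the encryption task pending for a cell with that capability.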
  • agents 30A - 30n may travel through the system 10 by wire or wirelessly, for example, using a Wi-Fi network, wireless Ethernet, wireless USB, wireless bridge, wireless repeater, wireless router, Zigbee®, ANT+® or Bluetooth® pairing, according to the particular architecture and/or implementation of the system 10.
  • an agent 30 may be guided to the task pool 13 wirelessly by including a receptor feature at the task pool 13 and further by including a transmitter feature with the cell 12.
  • the task pool may answer wirelessly to the cells by equipping the task pool with a transmitter and the solidarity cells with a receiver. In this manner, the cells may communicate wirelessly with the task pool with or without use of the switching fabric.
  • the switching fabric 14 facilitates connections for data transfer and arbitration between system resources.
  • the switching fabric 14 may be a router or crossbar switch that provides connectivity between the various cells and the task pool.
  • the switching fabric 14 may further provide connectivity between each solidarity cell 12A - 12n and system resources such as the CPU 11, memory units 15, and traditional system components including, without limitation: direct memory access units, transmitters, hard disks and their controllers, display and other input/output devices, and other coprocessors.
  • the cells 12A - 12n may be connected physically to the switching fabric 14, or the cells may be connected wirelessly.
  • the wireless connection of cells into the system 10 facilitates the dynamic addition and/or removal of cells for use in the system 10.
  • the CPU 11 may recruit cells from other cell systems, allowing for dynamic expansion and increased performance.
  • two or more cell systems e.g., networks
  • a cell that becomes idle may look for and/or be recruited by another system that has a need for additional processing resources, i.e., it has available processing tasks that need to be completed.
  • the system 10 may expand performance by incorporating clusters of additional cells for a particular task.
  • the system 10 may enhance performance of an encryption/decryption function, or the processing of audio and/or video data, by incorporating nearby cells capable of performing these tasks.
  • the CPU 11 may provide the task pool 13 with a list of or, alternatively, criteria for identifying trusted and/or untrusted cells as well as authentication requirements or protocols. Moreover, the task pool itself may exclude particular cells on the basis of low performance, unreliable connection, poor data throughput, or suspicion of malicious or otherwise inappropriate activity.
  • cells 12 may be added to a task pool 13, or excluded from a task pool 13, by a user through the use of a smartphone, tablet or other device or application.
  • a graphical application interface may provide the user with useful statistical and/or iconic information such as location of available cells and other devices, performance gain, or performance penalty, as a result of adding or removing particular cells from a network.
  • some or all of the co-processing cells may connect directly to the task pool 13, such as by a wired configuration that does not require a switching fabric 14 for communication.
  • the wired connection of cells may further facilitate dynamic expansion and contraction of the system 10 analogous to the wireless configuration discussed above, although wired connections may require physical (e.g., manual) integration and extraction of peripheral devices. In either case, scalability of the system is greatly enhanced over conventional parallel processing schemes, as co-processors may be added and removed without reprogramming the CPU 11 to account for the changes to the system 10.
  • a network 300 includes a CPU 302, a first memory 304, a second memory 306, a task pool 308, a switching fabric 310, a first co-processing cell 312 configured to perform (execute) type A tasks, a second cell 314 configured to perform type B tasks, a third cell 316 configured to perform type C tasks, and a fourth cell 318 configured to perform both type A and type B tasks.
  • the task pool 308 is populated (e.g., by the CPU 302) with tasks (or task threads) 330 and 332 of task type A; tasks 334 and 336 of task type B; and tasks 340 and 342 of task type C.
  • each cell preferably has a unique, dedicated agent.
  • cell 312 includes an agent 320; cell 314 includes an agent 322; cell 316 includes an agent 324; and cell 318 includes an agent 326.
  • Each agent preferably includes an information field or header which identifies the type of tasks its associated cell is configured to perform, for example, a single task or combination of tasks A, B, C.
  • when a cell is either idle or otherwise has available processing capacity, its agent proactively interrogates the task pool to determine whether any tasks in the task queue are appropriate for that particular cell. For example, cell 312 may dispatch its agent 320 to retrieve one or both of tasks 330 and 332 corresponding to task type A. Similarly, cell 314 may dispatch its agent 322 to retrieve either task 334 or 336 (depending on their relative priorities) corresponding to task type B, and so on. For cells which are capable of performing more than one task type, such as cell 318 configured to perform task types A and B, agent 326 may retrieve any one of tasks 330, 332, 334, or 336.
  • a cell may then process that task, typically by retrieving data from a particular location in first memory 304, processing that data, and storing the processed data at a particular location within second memory 306.
  • the cell notifies the task pool, the task pool marks the task as completed, and the task pool notifies the CPU that the task is completed.
  • the task pool may notify the CPU when a task thread is completed, inasmuch as a task thread may comprise a single task, a series of tasks, or a Boolean combination of tasks.
  • the retrieval of tasks and the processing of data by the cells may occur without direct communication between the CPU and the various cells.
  • an internet of things network 400 includes a controller (CPU) 402, a task pool 408, and various devices 410 - 422, some or all of which include an associated or embedded microcontroller, such as an integrated circuit (IC) chip or other component which embodies processing capacity.
  • the devices may include a light bulb 410, a thermostat 412, an electrical receptacle 414, a power switch 416, an appliance (e.g., toaster) 418, a vehicle 420, a keyboard 422, and virtually any other plug and play device or application capable of interfacing with a network.
  • the controller 402 may be a smartphone, tablet, laptop, or other device which may include a display 404 and a user interface (e.g., keypad) 406 for facilitating user interaction with the various devices on the network.
  • the controller may effectively harvest or recruit processing resources from the peripheral devices via the task pool, for example as explained below in conjunction with FIG. 5.
  • the network 500 use case illustrates the dynamic harnessing of nearby (or otherwise available) devices.
  • Network 500 includes a primary control unit 502 (e.g., a laptop, tablet, or gaming device), a task pool 504, a first co-processor device 506, and a second co-processor device 508.
  • the present invention proposes a method to harness the processing power of underutilized computer resources located within the vicinity of, or otherwise available to, the user.
  • the laptop 502 connects to the task pool 504.
  • the laptop itself may be equipped with a task pool, or the task pool may be in the form of an external device or application located within wireless reach of the laptop 502.
  • the task pool itself could perform the duties of a switching fabric with ports to allow connection to multiple co-processing cells.
  • the laptop 502 populates the task pool 504 with computationally intensive tasks.
  • a nearby underutilized device, such as a smartphone 508, subsequently connects to the task pool 504 and sends its agent to fetch a matching task type. Consequently, the smartphone 508 becomes a co-processor seamlessly assisting the laptop 502, thereby enhancing the video game experience.
  • the same method may be repeated in the event other underutilized processing resources exist and are needed. Indeed, even an available light bulb 506 may lend its processing power to the laptop as a co-processor.
  • FIG. 6 is a flow chart illustrating the operation of an exemplary parallel computing environment.
  • a method 600 includes populating a task pool with tasks (Step 602), proactively dispatching one or more agents from one or more corresponding cells to the task pool (Step 604), retrieving and processing a task (Step 606), and notifying the task pool and the CPU that the task thread has been performed (Step 608).
  • the method 600 further includes dynamically incorporating (Step 610) an additional device into the network, as needed.
  • a processing system which includes a task pool, a controller configured to populate the task pool with a first task, and a first coprocessor configured to proactively retrieve the first task from the task pool.
  • the first co-processor comprises a first agent configured to retrieve the first task from the task pool without communicating with the controller.
  • the first task includes indicia of a first task type
  • the first co-processor is configured to perform tasks of the first type
  • the first agent is configured to search the task pool for a task of the first type.
  • the first co-processor is further configured to process the first task and notify the task pool upon completion of the first task, and the task pool is configured to notify the controller upon completion of the first task.
  • the controller and the first co-processor are configured to communicate with each other only through the task pool.
  • the controller and the first co-processor are configured to communicate with each other directly and through the task pool.
  • the first co-processor is configured to determine that it has available processing capacity, and to dispatch the agent to the task pool in response to the determination.
  • the controller is further configured to populate the task pool with a second task
  • the system further comprises a second co-processor having a second agent configured to proactively retrieve the second task from the task pool.
  • the second task includes indicia of a second task type
  • the second co-processor is configured to perform tasks of the second type
  • the second agent is configured to search the task pool for a task of the second type.
  • the controller and the task pool reside on a monolithic integrated circuit (IC), and the first co-processor does not reside on the IC.
  • the controller, the task pool, and the first and second co-processors reside on a monolithic integrated circuit (IC).
  • the method includes the steps of: programming a first cell to perform the first task type; adding the programmed first cell to the network; proactively sending a first agent from the first cell to the task pool; searching the task pool, by the first agent, for a task of the first type; retrieving, by the first agent, the first task from the task pool; transporting, by the first agent, the first task to the first cell; processing, by the first cell, the first task; and sending a notification from the first cell to the task pool that the first task is completed.
  • the method also includes: marking, by the task pool, the first task as being completed; and sending a notification from the task pool to the CPU that the first task is completed.
  • the method also includes configuring the first cell to determine that the first cell has available processing capacity as a predicate to proactively sending the first agent to the task pool.
  • the method also includes integrating the first cell into a first device prior to adding the programmed first cell to the network.
  • the first device comprises one of a sensor, light bulb, power switch, appliance, biometric device, medical device, diagnostic device, laptop, tablet, smartphone, motor controller, and a security device.
  • adding the programmed first cell to the network comprises establishing a communication link between the first cell and the task pool.
  • the CPU is further configured to populate the task pool with a second task having a second task type, the method further comprising the steps of: programming the second cell to perform the second task type; establishing a communication link between the second cell and the task pool; proactively sending a second agent from the second cell to the task pool; searching the task pool, by the second agent, for a task of the second type; retrieving, by the second agent, the second task from the task pool; transporting, by the second agent, the second task to the second cell; processing, by the second cell, the second task; sending a notification from the second cell to the task pool that the second task is completed; marking, by the task pool, the second task as being completed; and sending a notification from the task pool to the CPU that the second task is completed.
  • a system for controlling distributed processing resources in an internet of things (IoT) computing environment, including: a CPU configured to partition an aggregate computing requirement into a plurality of tasks and place the tasks in a pool; and a plurality of devices each having a unique dedicated agent configured to proactively retrieve a task from the pool without direct communication with the CPU.
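The pull-based flow described above (the CPU populates the pool, idle cells dispatch agents that search for a matching task type, the cell processes the task, and the pool marks it completed and notifies the CPU) can be illustrated with a minimal Python sketch. It is not the patented implementation: the class names `Task`, `TaskPool` and `Cell`, the `done_by` field, the intermediate "in progress" status, the `notify_cpu` callback and the three arithmetic handlers are all assumptions made for illustration. The cell and task numbers mirror the network 300 example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Set


@dataclass
class Task:
    task_id: int
    task_type: str                    # e.g. "A", "B" or "C"
    payload: int
    status: str = "to be completed"
    result: Optional[int] = None
    done_by: Optional[int] = None     # hypothetical bookkeeping field


class TaskPool:
    """Holds tasks and notifies the CPU (a callback here) on completion."""

    def __init__(self, notify_cpu: Callable[[Task], None]) -> None:
        self.tasks: List[Task] = []
        self.notify_cpu = notify_cpu

    def populate(self, task: Task) -> None:
        self.tasks.append(task)

    def claim(self, supported: Set[str]) -> Optional[Task]:
        # Return the first pending task whose type the cell can perform.
        for task in self.tasks:
            if task.status == "to be completed" and task.task_type in supported:
                task.status = "in progress"   # assumed intermediate state
                return task
        return None

    def mark_completed(self, task: Task) -> None:
        task.status = "completed"
        self.notify_cpu(task)


class Cell:
    """A co-processing cell; its agent is modelled as dispatch_agent()."""

    def __init__(self, cell_id: int, handlers: Dict[str, Callable[[int], int]]) -> None:
        self.cell_id = cell_id
        self.handlers = handlers      # task type -> function the cell can run

    def dispatch_agent(self, pool: TaskPool) -> bool:
        # The agent carries the cell's supported types to the pool.
        task = pool.claim(set(self.handlers))
        if task is None:
            return False
        task.result = self.handlers[task.task_type](task.payload)
        task.done_by = self.cell_id
        pool.mark_completed(task)     # cell -> pool -> CPU notification
        return True


def run_until_idle(pool: TaskPool, cells: List[Cell]) -> None:
    # Each cell keeps sending its agent until no cell finds a matching task.
    progress = True
    while progress:
        progress = False
        for cell in cells:
            if cell.dispatch_agent(pool):
                progress = True


def demo():
    notified: List[int] = []
    pool = TaskPool(notify_cpu=lambda t: notified.append(t.task_id))
    for task_id, task_type in [(330, "A"), (332, "A"), (334, "B"),
                               (336, "B"), (340, "C"), (342, "C")]:
        pool.populate(Task(task_id, task_type, payload=task_id))

    def type_a(x: int) -> int:
        return x + 1

    def type_b(x: int) -> int:
        return x * 2

    def type_c(x: int) -> int:
        return -x

    cells = [Cell(312, {"A": type_a}), Cell(314, {"B": type_b}),
             Cell(316, {"C": type_c}), Cell(318, {"A": type_a, "B": type_b})]
    run_until_idle(pool, cells)
    return pool, notified


if __name__ == "__main__":
    pool, notified = demo()
    for task in pool.tasks:
        print(task.task_id, task.task_type, task.status, "by cell", task.done_by)
```

Note that the CPU never addresses a cell directly in this sketch: it only populates the pool and receives completion callbacks, so a cell can be added or removed by changing the `cells` list alone.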

Abstract

A parallel processing architecture includes a CPU, a task pool populated by the CPU, and a plurality of autonomous co-processing cells, each having an agent configured to proactively interrogate the task pool to retrieve tasks appropriate for a particular co-processor. Each co-processor communicates with the task pool through a switching fabric, which facilitates connections for data transfer and arbitration between all system resources. Each co-processor notifies the task pool when a task or task thread is completed, whereupon the task pool notifies the CPU.
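The notification chain (co-processor to task pool, task pool to CPU) can be sketched for task threads as well as single tasks. The description states that a task thread may comprise a single task, a series of tasks, or a Boolean combination of tasks. The Python sketch below is only an illustration under stated assumptions: the names `TaskThread`, `ThreadAwarePool` and `notify_cpu` are hypothetical, and only two combinations are modelled, "all" (every task must finish) and "any" (one finished task suffices).

```python
from typing import Callable, Iterable, List, Set


class TaskThread:
    """A named group of task ids with an assumed completion rule."""

    def __init__(self, name: str, task_ids: Iterable[int], mode: str = "all") -> None:
        if mode not in ("all", "any"):
            raise ValueError("mode must be 'all' or 'any'")
        self.name = name
        self.task_ids: Set[int] = set(task_ids)
        self.mode = mode

    def is_complete(self, completed: Set[int]) -> bool:
        if self.mode == "all":
            return self.task_ids <= completed      # every task finished
        return bool(self.task_ids & completed)     # at least one finished


class ThreadAwarePool:
    """Tracks completed tasks and notifies the CPU once per finished thread."""

    def __init__(self, notify_cpu: Callable[[str], None]) -> None:
        self.notify_cpu = notify_cpu
        self.threads: List[TaskThread] = []
        self.completed: Set[int] = set()
        self._notified: Set[str] = set()

    def add_thread(self, thread: TaskThread) -> None:
        self.threads.append(thread)

    def task_completed(self, task_id: int) -> None:
        # Called by a cell when it finishes a task.
        self.completed.add(task_id)
        for thread in self.threads:
            if thread.name in self._notified:
                continue
            if thread.is_complete(self.completed):
                self._notified.add(thread.name)
                self.notify_cpu(thread.name)


def demo_threads() -> List[str]:
    log: List[str] = []
    pool = ThreadAwarePool(notify_cpu=log.append)
    pool.add_thread(TaskThread("render", [1, 2, 3], mode="all"))
    pool.add_thread(TaskThread("probe", [4, 5], mode="any"))
    for finished in [1, 4, 2, 5, 3]:   # completion order reported by the cells
        pool.task_completed(finished)
    return log


if __name__ == "__main__":
    print(demo_threads())
```

In this sketch the CPU hears about the "probe" thread as soon as task 4 finishes, and about the "render" thread only after tasks 1, 2 and 3 have all been reported, without the CPU polling any cell.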
EP15825147.0A 2014-07-24 2015-07-10 System and method for parallel processing using dynamically configurable proactive co-processing cells Ceased EP3172669A4 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US14/340,332 US9852004B2 (en) 2013-01-25 2014-07-24 System and method for parallel processing using dynamically configurable proactive co-processing cells
PCT/US2015/039993 WO2016014263A2 (fr) 2014-07-24 2015-07-10 System and method for parallel processing using dynamically configurable proactive co-processing cells

Publications (2)

Publication Number Publication Date
EP3172669A2 true EP3172669A2 (fr) 2017-05-31
EP3172669A4 EP3172669A4 (fr) 2018-03-14

Family

ID=55165563

Family Applications (1)

Application Number Title Priority Date Filing Date
EP15825147.0A Ceased EP3172669A4 (fr) 2014-07-24 2015-07-10 System and method for parallel processing using dynamically configurable proactive co-processing cells

Country Status (3)

Country Link
EP (1) EP3172669A4 (fr)
CN (1) CN106537343A (fr)
WO (1) WO2016014263A2 (fr)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
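CN112713993A (zh) * 2020-12-24 2021-04-27 天津国芯科技有限公司 Encryption algorithm module accelerator and high-speed data encryption method
CN112799792B (zh) * 2021-02-01 2023-12-05 安徽芯纪元科技有限公司 Task context register protection method for an embedded operating system
CN113535405A (zh) * 2021-07-30 2021-10-22 上海壁仞智能科技有限公司 Cloud service system and operating method thereof
CN117389731B (zh) * 2023-10-20 2024-04-02 上海芯高峰微电子有限公司 Data processing method and apparatus, chip, device, and storage medium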

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6222530B1 (en) * 1998-08-21 2001-04-24 Corporate Media Partners System and method for a master scheduler
US7159215B2 (en) * 2001-06-27 2007-01-02 Sun Microsystems, Inc. Termination detection for shared-memory parallel programs
US8037474B2 (en) * 2005-09-27 2011-10-11 Sony Computer Entertainment Inc. Task manager with stored task definition having pointer to a memory address containing required code data related to the task for execution
US8209702B1 (en) * 2007-09-27 2012-06-26 Emc Corporation Task execution using multiple pools of processing threads, each pool dedicated to execute different types of sub-tasks
US8108867B2 (en) * 2008-06-24 2012-01-31 Intel Corporation Preserving hardware thread cache affinity via procrastination
WO2010095358A1 (fr) * 2009-02-18 2010-08-26 日本電気株式会社 Task allocation device, task allocation method, and recording medium storing a task allocation program
US8732713B2 (en) * 2010-09-29 2014-05-20 Nvidia Corporation Thread group scheduler for computing on a parallel thread processor
US8949853B2 (en) * 2011-08-04 2015-02-03 Microsoft Corporation Using stages to handle dependencies in parallel tasks
CN102427577A (zh) * 2011-12-06 2012-04-25 安徽省徽商集团有限公司 System and method for pushing information from a collaboration server to a mobile terminal
US8990833B2 (en) * 2011-12-20 2015-03-24 International Business Machines Corporation Indirect inter-thread communication using a shared pool of inboxes

Also Published As

Publication number Publication date
CN106537343A (zh) 2017-03-22
WO2016014263A3 (fr) 2016-03-17
EP3172669A4 (fr) 2018-03-14
WO2016014263A2 (fr) 2016-01-28

Similar Documents

Publication Publication Date Title
US20200183735A1 (en) System and Method For Swarm Collaborative Intelligence Using Dynamically Configurable Proactive Autonomous Agents
US7689694B2 (en) Process management apparatus, computer systems, distributed processing method, and computer program for avoiding overhead in a process management device
US8250164B2 (en) Query performance data on parallel computer system having compute nodes
CN110178118B (zh) Hardware-implemented load balancing
CN107092573B (zh) Method and apparatus for work stealing in heterogeneous computing systems
US8321876B2 (en) System and method of dynamically loading and executing module devices using inter-core-communication channel in multicore system environment
EP3172669A2 (fr) System and method for parallel processing using dynamically configurable proactive co-processing cells
Tariq et al. Energy-efficient static task scheduling on VFI-based NoC-HMPSoCs for intelligent edge devices in cyber-physical systems
JP2006107513A (ja) Power management in a processing environment
US9158713B1 (en) Packet processing with dynamic load balancing
EP1768024B1 (fr) Processing management device, computer system, distributed processing method, and computer program
Papadogiannaki et al. Efficient software packet processing on heterogeneous and asymmetric hardware architectures
Si et al. Direct MPI library for Intel Xeon Phi co-processors
JP2020027613A (ja) Artificial intelligence chip and instruction execution method used in the artificial intelligence chip
WO2018182746A1 (fr) Exécution pouvant être initiée à chaud
US11366690B2 (en) Scheduling commands in a virtual computing environment
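JP6740210B2 (ja) System and method for parallel processing using dynamically configurable proactive co-processing cells
CN113556242B (zh) Method and device for inter-node communication based on multiple processing nodes
CN114281558A (zh) Multi-core processor, method for a multi-core processor, and corresponding product
CN103294623B (zh) Multi-thread scheduling circuit for a configurable SIMD system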
Takase et al. Work-in-Progress: Design Concept of a Lightweight Runtime Environment for Robot Software Components Onto Embedded Devices
WO2024012280A1 (fr) Task scheduling method and device, board card, and computer-readable storage medium
Iezzi et al. Enabling Run-time Resource-aware Task Placement in Fog Scenario
Shih et al. Imprecise computation over the cloud
Wen et al. Dynamic Co-operative Intelligent Memory

Legal Events

Date Code Title Description
STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE INTERNATIONAL PUBLICATION HAS BEEN MADE

PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: REQUEST FOR EXAMINATION WAS MADE

17P Request for examination filed

Effective date: 20170224

AK Designated contracting states

Kind code of ref document: A2

Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR

AX Request for extension of the european patent

Extension state: BA ME

DAV Request for validation of the european patent (deleted)
DAX Request for extension of the european patent (deleted)
A4 Supplementary search report drawn up and despatched

Effective date: 20180213

RIC1 Information provided on ipc code assigned before grant

Ipc: G06F 9/50 20060101ALI20180207BHEP

Ipc: G06F 9/46 20060101AFI20180207BHEP

Ipc: G06F 9/54 20060101ALI20180207BHEP

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: EXAMINATION IS IN PROGRESS

17Q First examination report despatched

Effective date: 20200706

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: EXAMINATION IS IN PROGRESS

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: EXAMINATION IS IN PROGRESS

REG Reference to a national code

Ref country code: DE

Ref legal event code: R003

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION HAS BEEN REFUSED

18R Application refused

Effective date: 20221013