CN106537343A

CN106537343A - Systems and methods for parallel processing using dynamically configurable active co-processing units

Info

Publication number: CN106537343A
Application number: CN201580039190.0A
Authority: CN
Inventors: 阿方索·伊尼格斯
Original assignee: Individual
Current assignee: Individual
Priority date: 2014-07-24
Filing date: 2015-07-10
Publication date: 2017-03-22
Also published as: EP3172669A4; WO2016014263A3; EP3172669A2; WO2016014263A2

Abstract

A parallel processing architecture comprises a CPU, a task pool populated via the CPU, and a plurality of autonomous co-processing units, each having an agent configured to actively interrogate the task pool to retrieve tasks appropriate for that particular co-processor. Each co-processor communicates with the task pool through a switch fabric, which facilitates connections for data transfer and arbitration among all system resources. Each co-processor notifies the task pool when a task or task thread is completed, and the task pool in turn notifies the CPU.

Description

Systems and methods for parallel processing using dynamically configurable active co-processing units

This application is a continuation of US Application Serial No. 13/750,696, filed January 25, 2013, which is incorporated herein by reference.

Technical Field

The present invention relates generally to parallel processing computing, and more particularly to a processing architecture involving active co-processors configured to actively retrieve tasks from a task pool populated via a central processing unit.

Background

The Internet of Things (IoT), also called the Cloud of Things, refers to an ad hoc network of uniquely identifiable embedded computing devices within the existing Internet infrastructure. The IoT promises advanced connectivity of devices, systems, and services that goes beyond machine-to-machine (M2M) communication. The range of things the IoT contemplates is unlimited and may include devices such as heart-monitoring implants, biochip transponders, automotive sensors, aerospace and defense field-operation equipment, and public safety applications that, for example, assist firefighters in search and rescue operations. Current market examples include home networks of smart thermostats, light bulbs, and washers and dryers that use Wi-Fi for remote monitoring. Because connected objects in the IoT are ubiquitous, it is estimated that more than 30 billion devices will be wirelessly connected to the IoT by 2020. One object of the present invention is to harness the processing power of the controllers and processors associated with these devices.

Computer processors typically execute machine-coded instructions serially. To run multiple applications at once, a single processor interleaves instructions from the various programs and executes them serially, although from the user's perspective the applications appear to be processed in parallel. True parallel processing, or multi-core processing, by contrast, is a computing method that divides a large computing task into separate computing blocks and distributes them among two or more processors. Computing architectures that use task parallelism (parallel processing) divide a large computing requirement into discrete modules of executable code. These modules are then executed concurrently or sequentially according to their respective priorities.

A typical multiprocessor system includes a central processing unit (CPU) and one or more co-processors. The CPU divides the computing requirement into tasks and distributes these tasks to the co-processors. Completed threads are reported to the CPU, which continues to distribute additional threads to the co-processors as needed. Drawbacks of presently known multiprocessing approaches include the substantial CPU bandwidth consumed by distributing tasks; waiting for tasks to complete before distributing new ones (often because of dependencies on earlier tasks); responding to interrupts from co-processors when tasks complete; and responding to other messages from co-processors. In addition, co-processors typically remain idle while waiting for new tasks from the CPU.

A multiprocessor architecture is therefore needed that reduces CPU management overhead and makes more effective use of the available co-processing resources.

Summary of the Invention

Various embodiments of a parallel processing computing architecture include a CPU configured to populate a task pool, and one or more co-processors configured to actively retrieve threads (tasks) from the task pool. Each co-processor notifies the task pool when it completes a task, then pings the task pool until another task becomes available for processing. In this way, the CPU communicates directly with the task pool and indirectly with the co-processors through the task pool.

The co-processors may also operate autonomously; that is, they may interact with the task pool independently of the CPU. In a preferred embodiment, each co-processor includes an agent that interrogates the task pool in search of tasks to execute. The co-processors thus work together as peers, with one another and with the task pool, to satisfy an aggregate computing requirement by autonomously retrieving and completing individual tasks that may or may not be interrelated. As a non-limiting example, suppose task B involves calculating an average temperature over time. By defining task A to include capturing temperature readings over time, and further defining task B to include obtaining the captured readings, the CPU and the various co-processors can communicate with one another via the task pool.
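The temperature example above can be sketched as two tasks that exchange data only through the pool. This is a minimal illustration under stated assumptions: the dictionary `pool` and the functions `task_a` and `task_b` are hypothetical names, and the source does not specify how readings are stored.

```python
from statistics import mean

# Hypothetical shared state held by the task pool (an assumption, not from the source).
pool = {"readings": None, "average": None}

def task_a(pool, samples):
    """Task A: capture temperature readings over time and post them to the pool."""
    pool["readings"] = list(samples)

def task_b(pool):
    """Task B: fetch the readings posted by task A and compute their average."""
    if pool["readings"] is None:
        return False              # task A has not finished, so task B cannot run yet
    pool["average"] = mean(pool["readings"])
    return True

ran_early = task_b(pool)             # attempted before task A has run
task_a(pool, [20.0, 21.0, 22.0])     # e.g. executed by one co-processor
ran_late = task_b(pool)              # e.g. executed by a different co-processor
```

Neither task addresses the other directly; the only coupling is the data left in the pool.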

In various embodiments, the co-processors are referred to as autonomous, active peer units. In this context, "autonomous" means that a co-processor may interact with the task pool without being instructed to do so by the CPU or by the task pool. "Active" means that each co-processor may be configured (e.g., programmed) to periodically send an agent to monitor the task pool for available tasks suited to that co-processor. "Peer" means that the co-processing units share a common goal of monitoring and executing all available tasks in the task pool.

A peer unit (co-processor) may be a general-purpose or special-purpose processor, and may therefore have the same or a different instruction set, architecture, and microarchitecture compared with the CPU or with other peer units in the system. In addition, the software programs to be executed and the data to be processed may be held in one or more memory units. In a conventional computer system, for example, a software program comprises a sequence of instructions that may require data to be used by the program. If the program is a media player, for instance, the data held in memory may be compressed audio data that a co-processor reads and that is ultimately played through a loudspeaker.

Each peer unit in the system may be configured to communicate with the task pool, by wired (ohmic) connection or wirelessly, through a crossbar switch (also called a fabric). In a purely wireless mesh topology, the radio signals themselves may constitute the fabric. In various embodiments, the co-processors may also communicate directly with the CPU. The switch fabric facilitates communication among system resources. Each peer unit is active in that it obtains a task to execute by sending its agent to the task pool when the unit has no processing to perform, or when the unit can contribute processing cycles without impeding its normal operation. As a non-limiting example, in the context of the Internet of Things (discussed in more detail below), a co-processor associated with a device such as a light bulb may be programmed to listen for "on" and "off" commands from a master device (e.g., a smartphone) as its normal operation, while its processing resources may also be harnessed through the task pool.

In the context of the various embodiments described herein, the term "agent" refers to a software module, analogous to a network packet, that is associated with a co-processor and that interacts with the task pool to obtain available tasks suited to that co-processor unit. Peer units may execute tasks sequentially, when a task depends on the execution of a previous task, or in parallel, when more than one peer unit is available and more than one matching task is available to run. Tasks may be executed independently or cooperatively, subject to any task thread constraints provided by the CPU. Interdependent tasks in the task pool may be logically combined. The task pool notifies the CPU when a task thread has completed. If a task thread comprises a single task, the task pool may notify the CPU when that task completes. If a task thread comprises multiple tasks, the task pool notifies the CPU when the whole chain of tasks completes. Because task threads may themselves be logically combined, it is conceivable that the task pool notifies the CPU only after the logically combined task threads have all completed.
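The completion rule described above (notify the CPU once every task in a thread has finished) can be sketched as follows. The class `TaskPool`, its methods, and the callback standing in for the CPU notification are all hypothetical; the source does not prescribe an interface.

```python
class TaskPool:
    """Tracks unfinished tasks per thread and notifies the CPU when a thread is done."""

    def __init__(self, notify_cpu):
        self._pending = {}             # thread id -> set of unfinished task ids
        self._notify_cpu = notify_cpu  # callback standing in for the CPU notification

    def add_thread(self, thread_id, task_ids):
        self._pending[thread_id] = set(task_ids)

    def complete(self, thread_id, task_id):
        remaining = self._pending[thread_id]
        remaining.discard(task_id)
        if not remaining:              # last task of the thread has just finished
            del self._pending[thread_id]
            self._notify_cpu(thread_id)

notifications = []
pool = TaskPool(notifications.append)
pool.add_thread("T1", ["a"])           # single-task thread
pool.add_thread("T2", ["b", "c"])      # multi-task thread
pool.complete("T1", "a")
pool.complete("T2", "b")
after_partial = list(notifications)    # T2 is not yet reported
pool.complete("T2", "c")
```

The CPU is interrupted once per thread rather than once per task, which is the overhead reduction the text argues for.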

Those skilled in the art will appreciate that interoperability between the CPU and the co-processors may be promoted by configuring the CPU to compose and/or structure tasks at a level of abstraction that is independent of the instruction set architectures of the various co-processors, so that components communicate at the task level rather than at the instruction level. Devices and their associated co-processors can therefore be added to the network on a "plug and play" basis. Another aspect of the invention provides interoperability within a heterogeneous array of CPUs having different instruction set architectures.

The various features of the invention are particularly suited to networks of Internet of Things devices and sensors; heterogeneous computing environments; high-performance computing; two-dimensional and three-dimensional monolithic integrated circuits; and motion control and robotics.

Brief Description of the Drawings

The present invention is described below in conjunction with the accompanying drawings, in which like numerals denote like elements, and in which:

FIG. 1 is a schematic block diagram of a parallel processing architecture including a CPU, memory, a task pool, and a plurality of co-processors configured to communicate via a fabric, according to an embodiment;

FIG. 2 is a schematic block diagram showing details of an exemplary task pool, according to an embodiment;

FIG. 3 is a schematic block diagram of a network of co-processing units and their corresponding agents interacting with a task pool, according to an embodiment;

FIG. 4 is a schematic layout of an Internet of Things network including available plug-and-play devices, according to an embodiment;

FIG. 5 is a schematic layout of an exemplary Internet of Things use case showing dynamic harnessing of nearby devices, according to an embodiment;

FIG. 6 is a flowchart illustrating the operation of an exemplary parallel computing environment, according to an embodiment.

Detailed Description

Various embodiments relate to parallel processing computing systems and environments, ranging from simple switching and control functions to complex programs and algorithms, including but not limited to: data encryption; graphics, video, and audio processing; direct memory access; mathematical computation; data mining; game algorithms; Ethernet packet and other network protocol processing, including constructing, receiving, and transmitting data over external networks; financial services and business methods; search engines; Internet data streaming and other web-based applications; execution of internal or external software programs; and, in the context of the Internet of Things, switching on and off and/or otherwise controlling or manipulating appliances, light bulbs, consumer electronics, and the like.

The various features may be incorporated into any computer architecture now known or later developed. For example, parallel processing problems involving synchronization, data security, out-of-order execution, and main-processor interrupts may be addressed using the inventive concepts described herein.

Referring now to FIG. 1, a distributed processing system 10 includes a single-core or multi-core CPU 11 and one or more peer or co-processing units 12A to 12N configured to communicate with a task pool 13 via a crossbar switch fabric 14. The peer units 12 may also communicate with one another via the switch fabric 14 or via a separate unit bus (not shown). The CPU 11 may communicate with the task pool 13 directly or via the switch fabric 14. Each of one or more memory units 15 contains data and/or instructions. In this context, the term "instructions" includes compiled software programs executable by the CPU 11. The memory units 15, the units 12, and the task pool 13 may be interconnected by wire or wirelessly to communicate with the CPU and/or with one another via the switch fabric 14. In some embodiments, the CPU 11 communicates with the units 12 only indirectly, through the task pool. In other embodiments, the CPU 11 may also communicate with the units 12 directly, without using the task pool as an intermediary.

In some embodiments, the system 10 may include more than one CPU 11 and more than one task pool 13, in which case a particular CPU 11 may interact with a particular task pool 13, or multiple CPUs 11 may share one or more task pools 13. Each peer unit may likewise be configured to interact with more than one task pool 13. Alternatively, a particular unit may be configured to interact with a single designated task pool, for example in high-performance or high-security environments.

In various embodiments, a unit may be dynamically paired with a task pool, by wire (plug and play) or wirelessly (over the air), when the following three conditions are met:

1) The unit is able to communicate with the task pool by wire or wirelessly. The connection to the task pool may be made through a port in the task pool itself, or through a switch fabric connected to the task pool;

2) The task pool recognizes the agent sent by the unit as trusted, for example by using input from a user with or without a password, by conventional Wi-Fi, Bluetooth, or similar pairing, manually through a graphical software program running on a smartphone or tablet, or by any other secure or unsecure method;

3) At least one available task in the task pool is compatible with the capabilities of the peer unit.
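The three pairing conditions above can be sketched as a single predicate. This is an illustrative sketch only: the `Unit` and `Pool` records, the credential check, and the function `can_pair` are hypothetical, and the source leaves the trust mechanism open.

```python
from dataclasses import dataclass

@dataclass
class Unit:
    name: str
    reachable: bool          # condition 1: a wired or wireless link exists
    credential: str          # whatever the pool uses to recognize a trusted agent
    capabilities: frozenset  # task types this unit can execute

@dataclass
class Pool:
    trusted_credentials: set
    available_task_types: list

def can_pair(unit, pool):
    """Return True only when all three pairing conditions hold."""
    connected = unit.reachable
    trusted = unit.credential in pool.trusted_credentials
    compatible = any(t in unit.capabilities for t in pool.available_task_types)
    return connected and trusted and compatible

pool = Pool(trusted_credentials={"key-1"}, available_task_types=["fft"])
bulb = Unit("bulb", True, "key-1", frozenset({"fft"}))
stranger = Unit("stranger", True, "key-9", frozenset({"fft"}))   # fails condition 2
offline = Unit("offline", False, "key-1", frozenset({"fft"}))    # fails condition 1
mismatch = Unit("mismatch", True, "key-1", frozenset({"crypto"}))  # fails condition 3
```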

In a multiprocessor environment with multiple task pools, the foregoing dynamic pairing conditions apply, except that a given unit may be locked or restricted to work with only one particular task pool; otherwise, a unit may connect to one or more task pools on a first-found basis, a round-robin basis, or under any other selection scheme. Priorities may also be assigned to the tasks in a task pool, whereby a unit gives precedence to high-priority tasks and services lower-priority tasks when it is not otherwise occupied by higher-priority ones.
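A round-robin pool selection and a priority-first task choice, as mentioned above, might look like the following sketch. The task dictionaries, the numeric `priority` field (larger means more urgent), and both function names are assumptions made for illustration.

```python
from itertools import cycle

def pick_task(tasks, capabilities):
    """Return the highest-priority task the unit can run, or None if none match."""
    eligible = [t for t in tasks if t["type"] in capabilities]
    return max(eligible, key=lambda t: t["priority"], default=None)

def round_robin(pool_names):
    """Yield pool names in a repeating order for a unit serving several pools."""
    return cycle(pool_names)

pool_a = [{"id": 1, "type": "fft", "priority": 1},
          {"id": 2, "type": "fft", "priority": 5}]
pool_b = [{"id": 3, "type": "crypto", "priority": 9}]

order = round_robin(["pool_a", "pool_b"])
visited = [next(order) for _ in range(3)]
chosen = pick_task(pool_a, {"fft"})        # the priority-5 task wins
none_found = pick_task(pool_b, {"fft"})    # no compatible task in pool_b
```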

The CPU 11 may be a single-core or multi-core processor, an application processor, or a microcontroller for executing software programs. The system 10 may be implemented on a personal computer, smartphone, tablet, or Internet appliance, in which case the CPU 11 may be any personal-computer central processor or processor cluster, or a multi-core processor local or remote to the immediate computing environment. Alternatively, the system 10 may be implemented on a supercomputer, and the CPU 11 may be a reduced instruction set computer ("RISC") processor, an application processor, a microprocessor, or the like.

In other embodiments, the system 10 may be implemented on a series of locally connected personal computers, such as a Beowulf cluster, in which case the CPU 11 may comprise the central processors of all, a subset, or one of the networked computers. Alternatively, the system 10 may be implemented on a network of remotely connected computers, in which case the CPU 11 may be a central processor for servers or mainframes, now known or later developed. The specific manner in which the CPU 11 carries out the subject parallel processing method within the presently described system 10 may be influenced by the CPU's operating system. For example, as described below, the CPU 11 may be configured for use within the system 10 by programming it to recognize and communicate with the task pool 13 and to divide the computing requirement into threads.

It is also contemplated that the system 10 may be implemented retroactively on any computer or computer network having an operating system that can be modified or otherwise configured to carry out the functions described herein. As is known in the art, the data to be processed is contained in the memory unit 15, for example in an addressable region or partition of random-access or read-only memory, in cache memory for the CPU 11, or in other forms of data storage such as flash memory and magnetic storage. The memory unit 15 contains the data to be processed as well as the location in which the results of processing are placed. Not every task needs to access the memory unit 15; examples include smart meters and automotive instrumentation, which may return data to the system 10, and robotics and motor controllers, which may actuate machinery.

Each unit 12 is, conceptually or logically, an independent computing unit capable of running one or more tasks or threads. A unit 12 may be a microcontroller, a microprocessor, an application processor, a "dumb" switch, or a stand-alone computer, such as a machine in a Beowulf cluster.

A unit 12 may be a general-purpose or special-purpose co-processor configured to supplement, perform all of, or perform a limited range of the functions of the CPU 11, or to perform functions external to the CPU 11, such as environmental monitoring and robotic actuation. A special-purpose processor may be a dedicated hardware module designed, programmed, or otherwise configured to perform a specialized task, or it may be a general-purpose processor configured to perform specialized tasks such as graphics processing, floating-point arithmetic, or data encryption.

In an embodiment, any unit 12 that is a special-purpose processor may also be configured to access and write to memory and to execute descriptors, as described below, as well as other software programs.

Furthermore, any number of units 12 may together form a heterogeneous computing environment, that is, a system that uses more than one type of processor (such as AMD-based and/or Intel-based processors) or a mix of 32-bit and 64-bit processors.

Each unit 12 is configured to perform one or more specialized tasks, as illustrated by the following sequence of events. During the polling phase, each unit periodically sends its agent to the task pool until a matching task is found. To facilitate this matching, the units and the task pool may be equipped with transceivers. In the case of the task pool, the transceiver may reside in the task pool itself or in a switch fabric connected to the task pool. When a task match is found in the task pool, the task pool sends an acknowledgement to the unit. The next step is the "communication channel" phase, in which the unit receives the task and begins executing it. In one embodiment, the communication channel is maintained once the first task completes, so that the peer unit can fetch further tasks without repeating the "polling" and "acknowledgement" phases.
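The polling, acknowledgement, and communication-channel phases above can be sketched as a small loop that logs each phase. This is a simplified model under stated assumptions: tasks are represented only by their type strings, the function `run_unit` is hypothetical, and the periodic re-polling of a real unit is reduced to a single attempt.

```python
def run_unit(pool_tasks, capabilities):
    """Drain matching tasks from pool_tasks, logging each protocol phase."""
    log = []
    channel_open = False
    while True:
        if not channel_open:
            log.append("poll")          # polling phase: agent sent to the task pool
        match = next((t for t in pool_tasks if t in capabilities), None)
        if match is None:
            break                       # nothing suitable; a real unit would poll again later
        if not channel_open:
            log.append("ack")           # task pool acknowledges the match
            channel_open = True         # channel is kept open for later tasks
        pool_tasks.remove(match)        # unit receives the task ...
        log.append(f"run:{match}")      # ... and executes it
    return log

tasks = ["fft", "crypto", "fft"]
log = run_unit(tasks, {"fft"})
idle_log = run_unit(["crypto"], {"fft"})
```

Because the channel stays open, the second "fft" task is fetched without a second poll or acknowledgement.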

The system 10 may include a plurality of units, some of which are able to perform the same task types as others, thereby creating redundancy in the system 10. The set of task types performed by a given unit 12 may be a subset of the set of task types performed by another unit. In FIG. 1, for example, the system 10 may divide an aggregate computing problem into groups of tasks, populating the task pool 13 with tasks of a first type, a second type, and a third type. A first unit 12A may be able to perform only tasks of the first type; a second unit 12B may be able to perform tasks of the second type; a third unit 12C may be able to perform tasks of the third type; a fourth unit 12D may be able to perform tasks of the second or third type; and a fifth unit 12N may be able to perform all three task types. The system 10 may be configured with such redundancy so that if a given unit is removed from the system 10 (or is currently busy or otherwise unavailable), the system 10 continues to operate seamlessly. Likewise, if units are dynamically added to the system 10, the system 10 continues to operate seamlessly, with the benefit of higher performance.
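The redundancy in this example can be made concrete with capability sets. The mapping below copies the five units and three task types from the text; the table name `CAPABILITIES` and the helper `eligible_units` are hypothetical.

```python
# Task types each unit can perform, as described for FIG. 1.
CAPABILITIES = {
    "12A": {1},
    "12B": {2},
    "12C": {3},
    "12D": {2, 3},
    "12N": {1, 2, 3},
}

def eligible_units(task_type, unavailable=frozenset()):
    """List units able to run task_type, skipping removed or busy units."""
    return sorted(
        unit for unit, types in CAPABILITIES.items()
        if task_type in types and unit not in unavailable
    )

type2_units = eligible_units(2)
type1_without_12a = eligible_units(1, unavailable={"12A"})
d_is_subset_of_n = CAPABILITIES["12D"] <= CAPABILITIES["12N"]
```

Removing unit 12A still leaves 12N able to serve type 1 tasks, which is the seamless-operation property the paragraph describes.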

Referring now to FIGS. 1 and 2, the task pool 13 may occupy a region of physical memory accessible by the CPU 11. Alternatively, the task pool 13 may be accessed by a MAC address or an IP address. Several embodiments of the task pool 13 are envisioned: it may be physically located in the same 2D or 3D monolithic IC as the CPU, or it may be implemented as a separate IC and physically interconnected to a computer board, smartphone, tablet, router, or Internet of Things device. In yet another alternative embodiment, the task pool may be a stand-alone, multi-port device with wired and/or wireless connections, which may be shared among multiple CPU 11 systems or dedicated to a given CPU 11. The task pool 13 is also addressable by the units 12. The task pool 13 may be placed in a dedicated hardware block to give the CPU 11 and the units 12 the fastest possible access. Alternatively, the task pool 13 may be software-based, in which case its contents are stored in memory, as in the hardware-based embodiment, but are represented by data structures.

Once populated by the CPU 11, the task pool 13 contains one or more task threads 21. Each task thread 21 represents a computing task, which may be a component or subset of a larger aggregate computing requirement imposed on the CPU 11. In one embodiment, the CPU 11 may initialize the task pool 13 and then populate it with concurrently executable threads 21. Each thread 21 may include one or more discrete tasks 22. A task 22 may have a task type and a descriptor. The task type indicates which units 12 are able to perform the task 22. The task pool 13 may also use the task type to prioritize tasks 22 of the same type. In one embodiment, the task pool 13 may maintain a priority table (not shown) that records the peer units 12 present in the system 10, the types of tasks 22 each unit can perform, and whether each unit is currently processing a task. As described below, the task pool 13 may use the priority table to determine which eligible tasks 22 to assign to a requesting unit.
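For the software-based variant, the pool contents described above could be modeled with plain records. This sketch assumes hypothetical class and field names (`Task`, `TaskThread`, `UnitRecord`, `TaskPool`); the source describes the priority table only in outline.

```python
from dataclasses import dataclass, field

@dataclass
class Task:
    task_id: str
    task_type: str      # indicates which units can perform the task
    descriptor: dict    # placeholder for the descriptor contents

@dataclass
class TaskThread:
    thread_id: str
    tasks: list

@dataclass
class UnitRecord:
    task_types: set     # task types the unit can perform
    busy: bool = False  # whether the unit is currently processing

@dataclass
class TaskPool:
    threads: list = field(default_factory=list)
    priority_table: dict = field(default_factory=dict)  # unit name -> UnitRecord

    def eligible_tasks(self, unit_name):
        """Tasks the requesting unit could be given, based on the priority table."""
        record = self.priority_table[unit_name]
        if record.busy:
            return []
        return [t for th in self.threads for t in th.tasks
                if t.task_type in record.task_types]

pool = TaskPool(
    threads=[TaskThread("21A", [Task("22A", "fft", {}), Task("22B", "crypto", {})])],
    priority_table={"12A": UnitRecord({"fft"}),
                    "12B": UnitRecord({"crypto"}, busy=True)},
)
for_12a = [t.task_id for t in pool.eligible_tasks("12A")]
for_12b = [t.task_id for t in pool.eligible_tasks("12B")]
```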

In some embodiments, the CPU 11 may itself retrieve and execute tasks or threads from the task pool. The CPU 11 may also interrupt any task determined to be stale, corrupted, stuck, or erroneous. In that case, the CPU 11 may update the task so that it becomes available for subsequent processing. Nothing prevents the CPU 11 from implementing adaptive task management, as artificial intelligence applications may require, whereby the CPU 11 adds, removes, or changes tasks within existing, unfinished threads 21.

A descriptor may contain one or more of: the specific instructions to be executed, the execution mode, the location (e.g., address) of the data to be processed, and the location in which to place the task results, if any. The results location is optional, as with animation and multimedia tasks, which typically present their results on a display rather than storing them in memory. Task descriptors may also be chained together, as in a linked list, so that the data to be processed can be accessed with fewer memory calls than if the descriptors were not chained. In one embodiment, the descriptor is a data structure containing a header and a number of reference pointers to memory locations, and the task 22 includes the memory address of the data structure. The header defines the function or instruction to be executed. A first pointer references the location of the data to be processed. A second, optional pointer references the location in which to place the processed data. If the descriptor is linked to another descriptor to be executed next in sequence, it may include a third pointer referencing that next descriptor. In an alternative embodiment in which the descriptor is a data structure, the task 22 may include the complete data structure.
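The descriptor layout described above (a header plus up to three pointers) can be sketched as a linked record. The field names and the example addresses are hypothetical, and integer addresses merely stand in for memory pointers.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Descriptor:
    header: str                                # function or instruction to execute
    data_addr: int                             # first pointer: data to be processed
    result_addr: Optional[int] = None          # second, optional pointer: where results go
    next_desc: Optional["Descriptor"] = None   # third pointer: next descriptor in the chain

def walk(descriptor):
    """Follow the chain and return the headers in execution order."""
    chain = []
    while descriptor is not None:
        chain.append(descriptor.header)
        descriptor = descriptor.next_desc
    return chain

# A two-step chain: decode into a buffer, then render that buffer to a display.
render = Descriptor("render", data_addr=0x2000)          # no result location needed
decode = Descriptor("decode", data_addr=0x1000,
                    result_addr=0x2000, next_desc=render)
order = walk(decode)
```

The `render` step omits `result_addr`, mirroring the multimedia case in which results go to a display instead of memory.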

A thread 21 may also include a "recipe" describing the order in which the tasks 22 may be executed and any conditions that affect the order of execution. Depending on the recipe, the tasks 22 may be executed sequentially, concurrently, out of order, interdependently, or conditionally according to Boolean operations. For example, as shown in FIG. 2, thread 21A includes four tasks: 22A, 22B, 22C, and 22D. In the illustrated embodiment, the first task 22A must complete before either the second task 22B or the third task 22C can begin. According to the recipe, the fourth task 22D may begin once either the second task 22B or the third task 22C has completed.
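The recipe for thread 21A can be written as Boolean conditions over the set of completed tasks. The encoding below (a dictionary of predicates named `RECIPE` and a helper `ready`) is one hypothetical representation; the source does not define a recipe format.

```python
# Each task maps to a condition on the set of already completed tasks.
RECIPE = {
    "22A": lambda done: True,                          # no prerequisite
    "22B": lambda done: "22A" in done,                 # needs 22A
    "22C": lambda done: "22A" in done,                 # needs 22A
    "22D": lambda done: "22B" in done or "22C" in done,  # needs 22B OR 22C
}

def ready(done):
    """Return the tasks that are not yet done and whose condition is satisfied."""
    return sorted(t for t, cond in RECIPE.items()
                  if t not in done and cond(done))

at_start = ready(set())
after_a = ready({"22A"})
after_a_and_c = ready({"22A", "22C"})
```

After 22A and 22C finish, both 22B and 22D are runnable, so two units could work on them concurrently.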

Threads 21 may also be interdependent. For example, as shown in FIG. 2, owing to a Boolean operation in thread 21B, completion of task 22C may allow processing of the tasks in thread 21B to continue. The task pool 13 may lock a task 22 while that task is waiting for the completion of another task 22 on which it depends. While a task 22 is locked, it cannot be acquired by a unit. When the tasks 22 of a thread 21 are completed, the task pool 13 may notify the CPU 11 of the completion. The CPU may then advance processing beyond the completed thread 21.

These units advantageously maintain parity with one another and with the CPU 11, thereby helping the system 10 to perform complex computations by autonomously and actively retrieving tasks from the task pool 13. The units 12 operate autonomously in that they may act independently of the CPU 11 or any other co-processor. Alternatively, a unit may be acted upon or instructed directly by the CPU. Each unit acts proactively in that it seeks a task 22 from the task pool 13 as soon as it becomes available for further processing.

More specifically, in one embodiment, a unit 12 acquires a task from the task pool by sending an agent 30 to interrogate (search) the task pool and retrieve an available task 22, that is, a task that needs to be completed, is not locked, and is of a task type the unit can perform. Typically, the system 10 has the same number of agents as equivalent co-processing units. In this context, an agent is analogous to a data frame in the networking sense, in that it may be equipped with a source address, a destination address, and a payload. In an embodiment, the destination address is the address of the task pool 13 when the agent 30 is seeking a task 22, and is the address of the corresponding unit 12 when the agent 30 is returning to its unit with a task 22. Correspondingly, the source address is the address of the unit 12 when the agent 30 is seeking a task 22, and is the address of the task pool 13 when the agent 30 is returning to its unit with a task 22.

In addition, the source and destination addresses may facilitate frame synchronization. That is, the system 10 may be configured to distinguish addresses unambiguously from payload data, so that when the contents of an agent 30 are read, the destination address indicates the start of the frame and the source address indicates the end of the frame, or vice versa. This allows the payload, which is placed between the addresses, to vary in size. In another embodiment with a variable-size payload, the agent 30 may include a header indicating the payload size. The header information may be compared against the payload to verify data integrity. In yet another embodiment, the payload may be of fixed length. When an agent 30 is dispatched to the task pool 13 by its co-processing unit, the payload includes information identifying the types of tasks the unit 12 can perform. When the agent 30 returns from the task pool 13, the payload includes the descriptor of a task 22, either in the form of a memory location or in the form of the complete descriptor data structure.
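The embodiment in which the agent carries a header indicating the payload size can be sketched as follows. The wire layout (2-byte destination, 2-byte source, 2-byte length, then payload), the function names `encode_agent` and `decode_agent`, and the addresses are all assumptions chosen for illustration; the specification does not fix any field widths or ordering.

```python
import struct

# Hypothetical wire layout (not given in the text): 2-byte destination address,
# 2-byte source address, 2-byte payload length, followed by the payload bytes.
_HEADER = struct.Struct(">HHH")


def encode_agent(dest: int, src: int, payload: bytes) -> bytes:
    """Build an agent frame whose header records the payload size."""
    return _HEADER.pack(dest, src, len(payload)) + payload


def decode_agent(frame: bytes):
    """Parse a frame and check the header size against the actual payload."""
    dest, src, size = _HEADER.unpack_from(frame)
    payload = frame[_HEADER.size:]
    if len(payload) != size:
        raise ValueError("payload length does not match header")
    return dest, src, payload


POOL_ADDR, UNIT_ADDR = 0x000D, 0x012A  # illustrative addresses only

# Outbound: the unit advertises that it can perform task type "A".
outbound = encode_agent(POOL_ADDR, UNIT_ADDR, b"A")
# Return trip: addresses are swapped; the payload carries a descriptor address.
inbound = encode_agent(UNIT_ADDR, POOL_ADDR, struct.pack(">I", 0x1000))
```

Comparing the declared size with the received payload is the integrity check mentioned above; a truncated frame is rejected rather than silently accepted.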

In another embodiment, some or all of the agents 30 are autonomous representatives of their respective units 12. That is, each agent 30 may be dispatched by its corresponding unit 12 to retrieve a task 22 whenever the unit is idle or able to perform additional processing. In this way, the processing capacity of the equivalent units 12 can be utilized more fully, because the units need not sit idle waiting for instructions from the CPU 11. This approach has the additional advantage of reducing CPU overhead by relieving the CPU of the need to send requests to the units to retrieve tasks from the task pool. These advantages make the system 10 more efficient than conventional computer architectures in which auxiliary modules and co-processors depend on instructions from the main CPU.

Furthermore, the equivalent units 12A to 12n are indifferent to the specific composition of the threads themselves. Instead, an agent is concerned only with finding a match between the capabilities of its corresponding unit and the available tasks 22 to be completed in the task pool 13. That is, as long as an available task 22 exists in the task pool 13 and matches the capabilities of a unit, the system can make effective use of that unit's processing capacity.

Some or all of the equivalent units 12A to 12n may work independently of one another, or may communicate with one another through the switching fabric 14, through the task pool 13, or upon a command or request from the CPU, so as to wake another equivalent unit to help process, move, or send data. In one embodiment, agent 30A may search for a match between the task type of a ready task 22 and the types of tasks that unit 12A is able to perform. The architecture may involve hard-coding the types of tasks that the CPU 11 is configured to create. Thus, if the task pool 13 contains three types of tasks 22 and a large computing requirement includes a fourth type of task, the fourth type of task might not be placed in the task pool 13, even if a unit capable of performing the fourth type of task is included in or added to the system 10. Accordingly, the CPU 11 may be configured to "learn," or be taught, how to create the fourth type of task in order to utilize the available processing resources more fully.

In another embodiment, the agent 30 searches the task 22 descriptors for an executable instruction that matches one of the instructions unit 12A is able to execute. When a matching task 22 is found, agent 30A delivers the descriptor of the matching task 22 to unit 12A, whereupon unit 12A begins processing the task 22. Specifically, agent 30A may deliver the memory address of the descriptor to unit 12A, which retrieves the data structure from memory. Alternatively, where the complete data structure of the descriptor is included in the task 22, agent 30A may deliver the complete data structure to unit 12A for processing. The descriptor tells unit 12A which instruction to execute, where in the memory 15 the data to be processed can be found, and where in the memory 15 the results are to be placed. Upon completing the task 22, unit 12A notifies the task pool 13 to change the status of the selected task 22 from "to be completed" to "completed." In addition, once unit 12A has completed the task 22, the unit may dispatch its agent 30A to the task pool 13 to search for another task 22.
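The matching and status-change behaviour described above can be sketched with a small in-memory task pool. This is a simplified model under stated assumptions: the class names `Task` and `TaskPool`, the method names `acquire` and `complete`, the task type strings, and the intermediate "in progress" status (added here only so that a task cannot be acquired twice) are not taken from the specification.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class Task:
    task_id: str
    task_type: str
    descriptor_addr: int
    status: str = "to be completed"   # later "in progress", then "completed"
    locked: bool = False              # True while waiting on another task


class TaskPool:
    def __init__(self, tasks: List[Task]) -> None:
        self.tasks = tasks

    def acquire(self, capabilities: Set[str]) -> Optional[Task]:
        """Return the first task that is pending, unlocked, and of a matching type."""
        for task in self.tasks:
            if (task.status == "to be completed" and not task.locked
                    and task.task_type in capabilities):
                task.status = "in progress"  # assumption: avoids double acquisition
                return task
        return None

    def complete(self, task: Task) -> None:
        """Called when the unit reports that the task has finished."""
        task.status = "completed"


pool_13 = TaskPool([
    Task("22A", "fft", 0x1000),
    Task("22B", "aes", 0x1100, locked=True),  # waiting on another task
    Task("22C", "aes", 0x1200),
])
picked = pool_13.acquire({"aes"})   # agent of a unit that can perform "aes" tasks
pool_13.complete(picked)
```

In this example the agent skips the locked task 22B and returns with 22C; once 22C is reported complete, the unit would dispatch its agent again to look for further work.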

Depending on the specific architecture and/or implementation of the system 10, some or all of the agents 30A to 30n may traverse the system 10 by wire or wirelessly (e.g., using a Wi-Fi network, wireless Ethernet, wireless USB, a wireless bridge, a wireless repeater, a wireless router, or Bluetooth pairing). In an embodiment, an agent 30 may be directed wirelessly to the task pool 13 by including a receiver feature at the task pool 13 and a transmitter feature with the unit 12. Similarly, the task pool may answer the units wirelessly by equipping the task pool with a transmitter and the equivalent units with receivers. In this way, the units can communicate wirelessly with the task pool with or without the use of a switching fabric.

In a preferred embodiment, however, some form of switching fabric 14 is used. The switching fabric 14 facilitates connections for data transfer and arbitration between system resources. The switching fabric 14 may be a router or a crossbar switch that provides connections between the various units and the task pool. The switching fabric 14 may also provide connections between each equivalent unit 12A to 12n and system resources such as the CPU 11, the memory unit 15, and conventional system components including, but not limited to, direct memory access units, transmitters, hard disks and their controllers, displays and other input/output devices, and other co-processors. The units 12A to 12n may be physically connected to the switching fabric 14, or the units may be connected wirelessly.

Connecting units to the system 10 wirelessly facilitates the dynamic addition and/or removal of units used in the system 10. For example, the CPU 11 can recruit units from other systems of units, allowing dynamic expansion and improved performance. In this way, two or more systems of units (e.g., networks) can share equivalent units. In one embodiment, a unit that becomes idle may seek out, and/or be recruited by, another system that requires additional processing resources, i.e., one that has available processing tasks that need to be completed. Similarly, the system 10 can expand its performance by incorporating clusters of additional units dedicated to specific tasks. For example, the system 10 can enhance the performance of encryption/decryption functions, or of the processing of audio and/or video data, by incorporating nearby units capable of performing those tasks.

To prevent undesired connections, the CPU 11 may provide the task pool 13 with a list identifying trusted and/or untrusted units together with authentication requirements or protocols or, alternatively, with criteria for identifying trusted and/or untrusted units. In addition, the task pool itself may exclude particular units on the basis of low performance, unreliable connections, poor data throughput, or malicious or otherwise improper behavior. In various embodiments, a unit 12 may be added to or excluded from the task pool 13 by a user through the use of a smartphone, tablet, or other device or application. In one embodiment, a graphical application interface may provide the user with useful static and/or iconic information, such as the locations of available units and other devices, and the performance gains or performance trade-offs that result from adding a particular unit to, or removing it from, the network.

In an alternative embodiment, some or all of the co-processing units may be connected directly to the task pool 13, such as through a wired configuration that does not require the switching fabric 14 for communication. Wired connection of units may also facilitate dynamic expansion and contraction of the system 10 similar to the wireless configuration described above, although with a wired connection this may involve the physical (e.g., manual) integration and removal of peripheral devices. In either case, the scalability of the system is greatly enhanced compared with conventional parallel processing schemes, because co-processors can be added and removed without reprogramming the CPU 11 to account for the changes to the system 10.

Referring now to FIG. 3, a network 300 includes a CPU 302, a first memory 304, a second memory 306, a task pool 308, a switching fabric 310, a first co-processing unit 312 configured to perform (run) type A tasks, a second unit 314 configured to perform type B tasks, a third unit 316 configured to perform type C tasks, and a fourth unit 318 configured to perform both type A and type B tasks. As described above, the task pool 308 is populated (e.g., by the CPU 302) with tasks (or task threads) 330 and 332 of task type A, tasks 334 and 336 of task type B, and tasks 340 and 342 of task type C. In an embodiment, each unit preferably has a unique, dedicated agent. Specifically, unit 312 includes agent 320; unit 314 includes agent 322; unit 316 includes agent 324; and unit 318 includes agent 326. Each agent preferably includes an information field or header identifying the type of task that its associated unit is configured to perform, e.g., a single task type or a combination of task types A, B, and C.

During operation, when a unit is idle or otherwise has available processing capacity, its agent actively interrogates the task pool to determine whether any task in the task queue is appropriate for that particular unit. For example, unit 312 may dispatch its agent 320 to retrieve one or both of tasks 330 and 332, which correspond to task type A. Similarly, unit 314 may dispatch its agent 322 to retrieve task 334 or 336, which correspond to task type B (according to their respective priorities), and so on. For a unit capable of performing more than one task type, such as unit 318, which is configured to perform task types A and B, agent 326 may retrieve any of tasks 330, 332, 334, and/or 336.
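The FIG. 3 arrangement can be walked through with a short simulation in which each unit dispatches its agent once. The reference numerals and task types follow the text; the priority values, the rule that a lower number is more urgent, and the function name `dispatch_agent` are invented here purely for illustration.

```python
from typing import Dict, List, Optional, Set, Tuple

# (task number, task type, priority). The priority values are invented for
# illustration; a lower number is treated as more urgent.
pool: List[Tuple[int, str, int]] = [
    (330, "A", 2), (332, "A", 1),
    (334, "B", 1), (336, "B", 2),
    (340, "C", 1), (342, "C", 2),
]

# Unit number -> task types the unit is configured to perform (per FIG. 3).
units: Dict[int, Set[str]] = {312: {"A"}, 314: {"B"}, 316: {"C"}, 318: {"A", "B"}}


def dispatch_agent(capabilities: Set[str]) -> Optional[int]:
    """Remove and return the most urgent matching task number, or None."""
    matches = [t for t in pool if t[1] in capabilities]
    if not matches:
        return None
    best = min(matches, key=lambda t: (t[2], t[0]))
    pool.remove(best)
    return best[0]


# Each unit dispatches its agent once, in the order 312, 314, 316, 318.
assignments = {unit: dispatch_agent(caps) for unit, caps in units.items()}
```

With these assumed priorities, units 312, 314, and 316 each take the more urgent task of their own type, and the dual-capability unit 318 then picks up the remaining type A task 330, leaving tasks 336 and 342 in the pool for the next idle agent.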

Upon retrieving a task from the task pool, a unit may then process the task, typically by retrieving data from a specific location in the first memory 304, processing the data, and storing the processed data at a specific location in the second memory 306. When the task is completed, the unit notifies the task pool, the task pool marks the task as completed, and the task pool notifies the CPU that the task is completed. Alternatively, the task pool may notify the CPU when a task thread is completed, since a task thread may comprise a single task, a series of tasks, or a Boolean combination of tasks. Importantly, the retrieval of tasks and the processing of data by the units may occur without direct communication between the CPU and the individual units.

Referring now to FIG. 4, an Internet of Things network 400 includes a controller (CPU) 402, a task pool 408, and various devices 410 to 422, some or all of which include an associated or embedded microcontroller, such as an integrated circuit (IC) chip or other component that provides processing capability. By way of non-limiting example, the devices may include a light bulb 410, a thermostat 412, an electrical outlet 414, a power switch 416, an appliance (e.g., a toaster) 418, a vehicle 420, a keyboard 422, and virtually any other plug-and-play device or application capable of interacting with the network.

In the illustrated embodiment, the controller 402 may be a smartphone, tablet, laptop, or other device that may include a display 404 and a user interface (e.g., a keyboard) 406 to facilitate user interaction with the various devices in the network. To the extent that the processing capacity (e.g., bandwidth) of the controller 402 may be insufficient to adequately support the network, the controller can effectively acquire or recruit processing resources from the peripheral devices via the task pool, as explained below in conjunction with FIG. 5.

Referring now to FIG. 5, a use case for an Internet of Things network 500 illustrates the dynamic utilization of nearby (or otherwise available) devices. The network 500 includes a master control unit 502 (e.g., a laptop, tablet, or gaming device), a task pool 504, a first co-processor device 506, and a second co-processor device 508. An exemplary use case in the context of the network 500 will now be described.

Suppose a user is playing a video game on her laptop 502. The video game requires detailed computer-generated images, and the processing capacity of the laptop 502 may be sufficient to render a single realistic-looking character; but when a second character is introduced on the screen, the image quality degrades and the characters' movements are no longer continuous. The present invention proposes a way of harnessing the processing capacity of underutilized computing resources located near, or otherwise available to, the user.

To address the need for additional processing capacity, the laptop 502 connects to the task pool 504. In this regard, the laptop itself may be equipped with a task pool, or the task pool may take the form of an external device or application located within wireless range of the laptop 502. In the case of an external task pool, the task pool itself may perform the duties of a switching fabric having ports that allow connection to multiple co-processing units. The laptop 502 populates the task pool 504 with computationally intensive tasks. A nearby underutilized device, such as the smartphone 508, then connects to the task pool 504 and sends its agent to retrieve tasks of a matching type. The smartphone 508 thus becomes a co-processor that seamlessly assists the laptop 502, enhancing the video game experience. The same approach can be repeated wherever other underutilized processing resources exist and are needed. Indeed, even the processing capacity of an available light bulb 506 can serve as a co-processor for the laptop.

FIG. 6 is a flowchart illustrating the operation of an exemplary parallel computing environment. Specifically, the method 600 includes: populating a task pool with tasks (step 602); actively dispatching one or more agents from one or more respective units to the task pool (step 604); retrieving and processing the tasks (step 606); and notifying the task pool and the CPU that a task thread has been executed (step 608). The method also includes dynamically incorporating additional devices into the network as needed (step 610).
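The steps of method 600 can be approximated in software with threads standing in for co-processing units. The sketch below is only an analogy under stated assumptions: the class `Pool`, the function `unit`, and the task types "square" and "negate" are hypothetical, and a lock-protected list replaces whatever hardware arbitration an actual task pool would use.

```python
import threading
from typing import Callable, Dict, List, Optional, Tuple


class Pool:
    """Lock-protected stand-in for the task pool."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._tasks: List[Tuple[str, int]] = []      # (task type, operand)
        self.completed: List[Tuple[str, int]] = []   # (task type, result)

    def populate(self, tasks: List[Tuple[str, int]]) -> None:   # step 602
        with self._lock:
            self._tasks.extend(tasks)

    def retrieve(self, kinds) -> Optional[Tuple[str, int]]:     # steps 604 and 606
        with self._lock:
            for i, task in enumerate(self._tasks):
                if task[0] in kinds:
                    return self._tasks.pop(i)
        return None

    def notify_done(self, kind: str, result: int) -> None:      # step 608
        with self._lock:
            self.completed.append((kind, result))


def unit(pool: Pool, handlers: Dict[str, Callable[[int], int]]) -> None:
    """Keep dispatching the agent until no matching task remains."""
    while (task := pool.retrieve(handlers.keys())) is not None:
        kind, operand = task
        pool.notify_done(kind, handlers[kind](operand))


pool = Pool()
pool.populate([("square", 3), ("negate", 4), ("square", 5)])
workers = [
    threading.Thread(target=unit, args=(pool, {"square": lambda x: x * x})),
    # Step 610: another unit can join without any change to the populating code.
    threading.Thread(target=unit, args=(pool, {"negate": lambda x: -x})),
]
for w in workers:
    w.start()
for w in workers:
    w.join()
results = sorted(pool.completed)
```

The code that populates the pool never addresses a unit directly, which mirrors the point that units may be added or removed without reprogramming the CPU.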

Accordingly, a processing system is provided that includes: a task pool; a controller configured to populate the task pool with a first task; and a first co-processor configured to actively retrieve the first task from the task pool.

In an embodiment, the first co-processor includes a first agent configured to retrieve the first task from the task pool without requiring communication with the controller.

In an embodiment, the first task includes an indicium of a first task type, the first co-processor is configured to perform tasks of the first type, and the first agent is configured to search the task pool for tasks of the first type.

In an embodiment, the first co-processor is further configured to process the first task and to notify the task pool upon completing the first task, and the task pool is configured to notify the controller upon completion of the first task.

In an embodiment, the controller and the first co-processor are configured to communicate with each other only through the task pool.

In an embodiment, the controller and the first co-processor are configured to communicate with each other both directly and through the task pool.

In an embodiment, the first co-processor is configured to determine that it has available processing capacity and to dispatch the agent to the task pool in response to that determination.

In an embodiment, the controller is further configured to populate the task pool with a second task, and the system further includes a second co-processor having a second agent configured to actively retrieve the second task from the task pool.

In an embodiment, the second task includes an indicium of a second task type, the second co-processor is configured to perform tasks of the second type, and the second agent is configured to search the task pool for tasks of the second type.

In an embodiment, the controller and the task pool reside on a monolithic integrated circuit (IC), and the first co-processor does not reside on the IC.

In another embodiment, the controller, the task pool, and the first and second co-processors reside on a monolithic integrated circuit (IC).

Also provided is a method of dynamically controlling processing resources in a network of the type that includes a central processing unit (CPU) configured to populate a task pool with a first task having a first task type. The method includes the steps of: programming a first unit to perform the first task type; adding the programmed first unit to the network; actively sending a first agent from the first unit to the task pool; the first agent searching the task pool for tasks of the first type; the first agent retrieving the first task from the task pool; the first agent delivering the first task to the first unit; the first unit processing the first task; and sending a notification from the first unit to the task pool that the first task has been completed.

In an embodiment, the method further includes: the task pool marking the first task as completed; and sending a notification from the task pool to the CPU that the first task has been completed.

In an embodiment, the method further includes configuring the first unit to determine that it has available processing capacity as a predicate to actively sending the first agent to the task pool.

In an embodiment, the method further includes integrating the first unit into a first device before adding the programmed first unit to the network.

In an embodiment, the first device includes one of a sensor, a light bulb, a power switch, an appliance, a biometric device, a medical device, a diagnostic device, a laptop, a tablet, a smartphone, a motor controller, and a security device.

In an embodiment, adding the programmed first unit to the network includes establishing a communication link between the first unit and the task pool.

In an embodiment, the CPU is further configured to populate the task pool with a second task having a second task type, and the method further includes the steps of: programming a second unit to perform the second task type; establishing a communication link between the second unit and the task pool; actively sending a second agent from the second unit to the task pool; the second agent searching the task pool for tasks of the second type; the second agent retrieving the second task from the task pool; the second agent delivering the second task to the second unit; the second unit processing the second task; sending a notification from the second unit to the task pool that the second task has been completed; the task pool marking the second task as completed; and sending a notification from the task pool to the CPU that the second task has been completed.

Also provided is a system for controlling distributed processing resources in an Internet of Things (IoT) computing environment, the system including: a CPU configured to divide an aggregate computing requirement into a plurality of tasks and place the tasks in a pool; and a plurality of devices, each having a unique, dedicated agent configured to actively retrieve tasks from the pool without requiring direct communication with the CPU.

While an enabling description of various embodiments, including the best mode known to the inventor, has been presented, those skilled in the art will understand that various changes and modifications may be made, and that equivalents may be substituted for various elements, without departing from the scope of the invention. Accordingly, it is intended that the invention disclosed herein not be limited to the particular embodiments disclosed, but that the invention include all embodiments falling within the literal and equivalent scope of the claims.

Claims

1. A processing system comprising:

a task pool;

a controller configured to populate the task pool with a first task; and

a first co-processor configured to actively retrieve the first task from the task pool.

2. The processing system of claim 1, wherein the first co-processor includes a first agent configured to retrieve the first task from the task pool without communicating with the controller.

3. The processing system of claim 2, wherein the first task includes an identification of a first task type, the first co-processor is configured to execute tasks of the first type, and the first agent is configured to search the task pool for tasks of the first type.

4. The processing system of claim 1, wherein the first co-processor is further configured to process the first task and notify the task pool upon completion of the first task.

5. The processing system of claim 1, wherein the task pool is configured to notify the controller when the first task is completed.

6. The processing system of claim 1, wherein the controller and the first coprocessor are configured to communicate with each other only through the task pool.

7. The processing system of claim 1, wherein the controller and the first co-processor are configured to communicate with each other directly and through the task pool.

8. The processing system of claim 2, wherein the first co-processor is configured to determine that processing capacity is available, and to assign the agent to the task pool in response to the determination.

9. The processing system of claim 3, wherein the controller is further configured to populate the task pool with a second task, and wherein the system further comprises a second co-processor having a second agent configured to actively retrieve the second task from the task pool.

10. The processing system of claim 9, wherein the second task includes an identification of a second task type, the second co-processor is configured to execute tasks of the second type, and the second agent is configured to search the task pool for tasks of the second type.

11. The processing system of claim 1, wherein the controller and the task pool are resident on a monolithic integrated circuit (IC), and the first co-processor is not resident on the IC.

12. The processing system of claim 9, wherein the controller, the task pool, and the first co-processor and the second co-processor are resident on a monolithic integrated circuit (IC).

13. A method of dynamically controlling processing resources in a network of the type comprising a central processing unit (CPU) configured to populate a task pool with a first task having a first task type, the method comprising the following steps:

programming a first unit to perform said first task type;

adding said programmed first unit to said network;

actively sending a first agent from said first unit to said task pool;

the first agent searches the task pool for tasks of a first type;

the first agent retrieves the first task from the task pool;

the first agent delivers the first task to the first unit;

the first unit processes the first task; and

sending a notification from the first unit to the task pool that the first task has been completed.

14. The method of claim 13, further comprising: the task pool marking the first task as completed; and sending a notification from the task pool to the CPU that the first task has been completed.
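By way of non-limiting illustration, the sequence recited in claims 13 and 14 (agent searches by type, retrieves, delivers, the unit processes, the pool is notified, the pool marks the task completed and notifies the CPU) might be sketched as follows. This is a minimal sketch under stated assumptions: the names `Task`, `TaskPool`, `Agent`, `Unit`, the `status` values, and the `notify_cpu` callback are hypothetical and do not appear in the claims, and the "network" is reduced to in-process method calls.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Task:
    task_id: int
    task_type: str
    payload: int
    status: str = "pending"  # hypothetical states: pending -> claimed -> completed
    result: Optional[int] = None


@dataclass
class TaskPool:
    # Hypothetical callback standing in for the pool-to-CPU notification.
    notify_cpu: Callable[[int], None]
    tasks: List[Task] = field(default_factory=list)

    def populate(self, task: Task) -> None:
        """Called on behalf of the CPU to place a task in the pool."""
        self.tasks.append(task)

    def search(self, task_type: str) -> Optional[Task]:
        """Return and claim the first pending task of the requested type."""
        for task in self.tasks:
            if task.status == "pending" and task.task_type == task_type:
                task.status = "claimed"
                return task
        return None

    def mark_completed(self, task_id: int, result: int) -> None:
        """Mark the task completed, then notify the CPU."""
        for task in self.tasks:
            if task.task_id == task_id:
                task.status = "completed"
                task.result = result
                self.notify_cpu(task_id)
                return


class Agent:
    """Searches the pool only for the task type its unit was programmed for."""

    def __init__(self, task_type: str) -> None:
        self.task_type = task_type

    def retrieve(self, pool: TaskPool) -> Optional[Task]:
        return pool.search(self.task_type)


class Unit:
    def __init__(self, task_type: str, handler: Callable[[int], int]) -> None:
        self.agent = Agent(task_type)
        self.handler = handler

    def run_once(self, pool: TaskPool) -> bool:
        task = self.agent.retrieve(pool)           # agent searches and retrieves
        if task is None:
            return False
        result = self.handler(task.payload)        # unit processes the delivered task
        pool.mark_completed(task.task_id, result)  # unit notifies the pool
        return True
```

In this sketch the CPU never addresses the unit: it only populates the pool and receives the completion callback, while the unit initiates every retrieval.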

15. The method of claim 13, further comprising configuring the first unit to proactively send the first agent to the task pool based on a determination that the first unit has available processing power.
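As one hedged illustration of the determination recited in claim 15, a unit might send its agent only while it has spare capacity. The capacity measure used here (a count of in-flight tasks against a fixed limit) and the names `CapacityGatedUnit`, `max_concurrent`, and `maybe_send_agent` are assumptions made for this sketch; the claim does not specify how available processing power is determined. The task pool is simplified to a plain list.

```python
from typing import List, Optional


class CapacityGatedUnit:
    def __init__(self, max_concurrent: int) -> None:
        self.max_concurrent = max_concurrent  # assumed capacity limit
        self.in_flight = 0                    # tasks currently being processed

    def has_capacity(self) -> bool:
        return self.in_flight < self.max_concurrent

    def maybe_send_agent(self, pool: List[str]) -> Optional[str]:
        """Send the agent only if the unit has capacity and the pool has work."""
        if not self.has_capacity() or not pool:
            return None
        task = pool.pop(0)
        self.in_flight += 1
        return task

    def finish_one(self) -> None:
        """Record that one in-flight task has finished, freeing capacity."""
        if self.in_flight > 0:
            self.in_flight -= 1
```

A unit at its limit leaves the pool untouched, so remaining tasks stay available to other units.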

16. The method of claim 13, further comprising integrating the programmed first unit into the first device prior to adding the programmed first unit to the network.

17. The method of claim 16, wherein the first device comprises one of: a sensor, a light bulb, a power switch, an appliance, a biometric device, a medical device, a diagnostic device, a laptop, a tablet, a smart phone, a motor controller, and a safety device.

18. The method of claim 13, wherein adding the programmed first unit to the network comprises:

establishing a communication link between the first unit and the task pool.

19. The method of claim 13, wherein the CPU is further configured to populate the task pool with a second task having a second task type, the method further comprising the steps of:

programming a second unit to perform said second task type;

establishing a communication link between the second unit and the task pool;

actively sending a second agent from the second unit to the task pool;

the second agent searches the task pool for tasks of the second task type;

the second agent retrieves the second task from the task pool;

the second agent delivers the second task to the second unit;

the second unit processes the second task;

sending a notification from the second unit to the task pool that the second task has been completed;

the task pool marks the second task as completed; and

sending a notification from the task pool to the CPU that the second task has been completed.
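The two-unit arrangement of claim 19 can be illustrated, without limitation, by two units that each claim only tasks of their own type from a shared pool. The sketch below assumes that the units run as threads and that the pool serializes access with a lock; neither detail is stated in the claim, and the names `SharedTaskPool`, `unit_loop`, and `run_units` are hypothetical.

```python
import threading
from typing import Any, Callable, Dict, List, Optional, Tuple


class SharedTaskPool:
    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._tasks: Dict[int, Dict[str, Any]] = {}
        self.cpu_notifications: List[int] = []  # stands in for pool-to-CPU notices

    def populate(self, task_id: int, task_type: str, payload: Any) -> None:
        with self._lock:
            self._tasks[task_id] = {
                "type": task_type, "payload": payload, "status": "pending"}

    def claim(self, task_type: str) -> Optional[Tuple[int, Any]]:
        """Atomically claim one pending task of the given type, if any."""
        with self._lock:
            for task_id, task in self._tasks.items():
                if task["type"] == task_type and task["status"] == "pending":
                    task["status"] = "claimed"
                    return task_id, task["payload"]
        return None

    def complete(self, task_id: int, result: Any) -> None:
        """Mark the task completed and record the notification to the CPU."""
        with self._lock:
            self._tasks[task_id].update(status="completed", result=result)
            self.cpu_notifications.append(task_id)

    def result(self, task_id: int) -> Any:
        with self._lock:
            return self._tasks[task_id].get("result")


def unit_loop(pool: SharedTaskPool, task_type: str,
              handler: Callable[[Any], Any]) -> None:
    """One unit: keep sending the agent until no task of its type remains."""
    while (claimed := pool.claim(task_type)) is not None:
        task_id, payload = claimed
        pool.complete(task_id, handler(payload))


def run_units(pool: SharedTaskPool,
              units: Dict[str, Callable[[Any], Any]]) -> None:
    """Start one thread per unit and wait for all of them to drain the pool."""
    threads = [threading.Thread(target=unit_loop, args=(pool, task_type, handler))
               for task_type, handler in units.items()]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
```

Because each agent filters by type when claiming, a task of the second type is never delivered to the first unit, and the order in which the CPU is notified depends only on which unit finishes first.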

20. A system for controlling distributed processing resources in an Internet of Things (IoT) computing environment, comprising:

a CPU configured to divide the computing needs of the cluster into tasks and place the tasks in a pool; and

a plurality of devices, each with a unique dedicated agent configured to actively retrieve tasks from the pool without requiring direct communication with the CPU.
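As a non-limiting sketch of the system of claim 20, the CPU side may be modeled as a function that divides a workload into tasks and places them in a pool, and the device side as objects that pull from that pool on their own initiative. The workload (a sum of squares), the chunking scheme, the round-robin polling, and the names `divide_into_tasks`, `Device`, and `run_cluster` are all assumptions introduced for illustration only.

```python
from collections import deque
from typing import Deque, List, Tuple


def divide_into_tasks(data: List[int], chunk_size: int) -> Deque[List[int]]:
    """CPU side: split the overall workload into tasks and place them in a pool."""
    return deque(data[i:i + chunk_size] for i in range(0, len(data), chunk_size))


class Device:
    def __init__(self, agent_id: str) -> None:
        self.agent_id = agent_id          # each device has its own dedicated agent
        self.partial_results: List[int] = []

    def pull(self, pool: Deque[List[int]]) -> bool:
        """Device side: the agent takes a task from the pool; the CPU is not involved."""
        if not pool:
            return False
        chunk = pool.popleft()
        self.partial_results.append(sum(x * x for x in chunk))
        return True


def run_cluster(data: List[int], chunk_size: int,
                device_count: int) -> Tuple[int, List[Device]]:
    pool = divide_into_tasks(data, chunk_size)
    devices = [Device(f"agent-{i}") for i in range(device_count)]
    while pool:
        for device in devices:            # devices poll the pool in turn
            device.pull(pool)
    total = sum(sum(device.partial_results) for device in devices)
    return total, devices
```

The only object the CPU and the devices share in this sketch is the pool itself, which reflects the recited absence of direct communication between each device and the CPU.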