CN112486572A - Multi-threaded wireless communication processor with refined thread processes - Google Patents


Info

Publication number
CN112486572A
Authority
CN
China
Prior art keywords
thread
processor
communication
operable
communications
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010940096.5A
Other languages
Chinese (zh)
Inventor
P. S. Murali
S. R. Kallam
V. Mattela
V. S. P. Pragam
Current Assignee
Silicon Laboratories Inc
Original Assignee
Silicon Laboratories Inc
Priority date
Filing date
Publication date
Priority claimed from US16/945,871 external-priority patent/US20210073027A1/en
Priority claimed from US16/945,870 external-priority patent/US20210076248A1/en
Application filed by Silicon Laboratories Inc filed Critical Silicon Laboratories Inc
Publication of CN112486572A publication Critical patent/CN112486572A/en



Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30003 Arrangements for executing specific machine instructions
    • G06F9/3004 Arrangements for executing specific machine instructions to perform operations on memory
    • G06F9/30047 Prefetch instructions; cache control instructions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30098 Register arrangements
    • G06F9/3012 Organisation of register space, e.g. banked or distributed register file
    • G06F9/30123 Organisation of register space, e.g. banked or distributed register file according to context, e.g. thread buffers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38 Concurrent instruction execution, e.g. pipeline, look ahead
    • G06F9/3867 Concurrent instruction execution, e.g. pipeline, look ahead using instruction pipelines
    • G06F9/3869 Implementation aspects, e.g. pipeline latches; pipeline synchronisation and clocking
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W4/00 Services specially adapted for wireless communication networks; Facilities therefor
    • H04W4/80 Services using short range communication, e.g. near-field communication [NFC], radio-frequency identification [RFID] or low energy communication
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W84/00 Network topologies
    • H04W84/02 Hierarchically pre-organised networks, e.g. paging networks, cellular networks, WLAN [Wireless Local Area Network] or WLL [Wireless Local Loop]
    • H04W84/10 Small scale networks; Flat hierarchical networks
    • H04W84/12 WLAN [Wireless Local Area Networks]

Abstract

The application relates to a multi-threaded wireless communication processor with a refined thread process. The communications processor is operable to adapt thread allocation, on an instruction-by-instruction basis, to the communication processes handled by the multithreaded processor. A thread map register controls the allocation of each processor cycle to a particular thread and is reprogrammed according to the network process load of a plurality of communication processors, such as WLAN, Bluetooth, Zigbee, or LTE, as their load requirements increase or decrease. A thread management process may dynamically allocate processor cycles to each respective process during the active time of each associated communication process.

Description

Multi-threaded wireless communication processor with refined thread processes
Technical Field
The invention relates to a multithreaded processor. More particularly, the present invention relates to a multithreaded processor having a refined and dynamic thread allocation feature such that a variable percentage of Central Processing Unit (CPU) processing capacity can be dynamically allocated to each thread.
Background
A multithreaded processor is used when the system runs multiple processes, each running on its own separate thread. Examples of prior art multithreaded processors and uses are described in U.S. patent nos. 7,761,688, 7,657,683, and 8,396,063. In a typical application for wireless communication using an example dedicated two-thread processor, the processor alternates execution cycles between executing instructions of a high priority program on a first thread and executing instructions of a low priority program on a second thread, and the alternating execution causes each thread to be allocated 50% of the CPU processing capacity. Furthermore, allocating CPU bandwidth to each thread is protected because during a thread stall, such as when a first thread accesses an external peripheral and must wait for data to return, a second thread can continue execution unaffected by the first thread stall.
Problems arise where a multithreaded processor needs to allocate bandwidth unequally or needs to change allocation dynamically. It is desirable to provide a dynamic allocation of thread utilization for each task such that during each interval consisting of a set of processor execution cycles, each of the threads during that interval receives a fixed percentage of CPU cycles. During subsequent intervals, other threads may be added or deleted, or the CPU cycle percentage allocation for each thread may be changed. It is also desirable to provide unequal allocation of CPU power among the multiple threads and to perform the allocation dynamically.
Another problem in multithreaded processors is the timely handling of interrupts. During interrupt handling, new interrupts are disabled so that handling of a particular previous interrupt can be completed. Subsequently received interrupts cannot be recognized until the previous interrupt handling is complete and the interrupt is unmasked. It is desirable to provide interrupt handling that can recognize, in a timely manner, new interrupts that arrive while the handling of a previously interrupted task is still pending.
Object of the Invention
A first object of the present invention is a multithreaded superscalar processor having a series of cascaded stages, each cascaded stage providing results of operations to a subsequent stage, a first one of the cascaded stages receiving instructions from a program memory address referenced by a thread identifier and an associated program counter, the thread identifier being provided by a thread mapping register containing a sequence of thread identifiers, each thread identifier indicating which program counter and register file are to be used by a particular processor stage, a particular instruction being selected using a thread identifier and a per-thread program counter provided to a sequence of pipelined stages (including an instruction fetch stage, an instruction decode stage, a decode/execute stage, an execute stage, a load/store stage, and a writeback stage), the decode/execute stage being coupled to the register file selected by the thread identifier.
A second object of the present invention is a multithreaded superscalar processor operable to handle a plurality of interrupted processes, each interrupted process being associated with a particular thread.
A third object of the present invention is a multithreaded superscalar processor having thread mapping registers that are reprogrammable to dynamically identify a sequence of threads to be executed, each thread being associated with a program counter register and register file, the program counter register and register file being coupled to at least one of the following stages: a prefetch stage, an instruction fetch stage, an instruction decode stage, a decode/execute stage, an execute stage, a load/store stage, and an optional write-back stage.
A fourth object of the present invention is the dynamic allocation of thread bandwidth from a first protocol process to a second protocol process, each protocol process handling data packets arriving over a separate interface and handled by a different thread in a multithreaded processor with refined control over the allocation cycles per thread.
A fifth object of the present invention is a communication interface providing concurrent processing of unrelated communication protocols, such as Bluetooth and WLAN, the Bluetooth interface being active during regular time intervals separated by gaps in which the Bluetooth protocol is inactive and which are available for WLAN communication, the communication protocols running on the multithreaded processor, which dynamically assigns a greater number of thread cycles to the Bluetooth protocol during active Bluetooth intervals and a greater number of thread cycles to the WLAN protocol during active WLAN intervals.
Disclosure of Invention
In one example of the invention, a superscalar processor has, in order, a prefetch stage, a fetch stage, a decode/execute stage, an execute stage, a load/store stage, and an optional write-back stage. The prefetch stage receives instructions provided by a thread-by-thread program counter under direction of a thread map register, which provides a canonical sequence of thread identifiers that index into the thread-by-thread program counters to select the identified thread; the selected program counter directs the prefetch stage to receive instructions from the instruction memory. The decode/execute stage is coupled to a thread-by-thread register file and selects the register file associated with the thread then being executed by the decode/execute stage, thereby addressing the thread-specific register set.
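The pipeline organization described above can be sketched as a small simulation. This is a hypothetical software model for illustration only, not the patented hardware design; the stage names follow the text, and the function and variable names are invented here. The thread map register supplies a thread identifier each cycle, the identifier selects that thread's program counter, and the identifier travels down the pipeline with the instruction so that every stage operates on the correct thread's context.

```python
# Minimal model of a multithreaded pipeline fed by a thread map register.
STAGES = ["prefetch", "fetch", "decode", "decode/execute",
          "execute", "load/store", "writeback"]

def run(thread_map, cycles):
    pcs = {}                                   # per-thread program counters
    pipeline = [None] * len(STAGES)            # one slot per stage
    trace = []
    for c in range(cycles):
        pipeline = [None] + pipeline[:-1]      # every stage advances one step
        tid = thread_map[c % len(thread_map)]  # canonical, repeating sequence
        pc = pcs.get(tid, 0)
        pcs[tid] = pc + 1                      # only the selected thread's PC advances
        pipeline[0] = (tid, pc)                # issue (thread, pc) into prefetch
        trace.append(pipeline[-1])             # what writeback retires this cycle
    return trace

trace = run([0, 1, 1, 2], cycles=8)
print(trace)   # writeback is empty until the 7-stage pipeline fills
```

Because each in-flight instruction carries its own thread identifier, a reprogramming of the map (changing `thread_map` between calls) takes effect instruction by instruction, as the text describes.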
The thread map register identifies the particular thread being executed; it may refer to any number of different threads, limited by the number of thread-by-thread program counters and thread-by-thread register files. For example, the thread map register may contain 10 entries while the number of thread-by-thread program counters and thread-by-thread register files is 4. In this case, the granularity of allocation among the 4 threads is 10%, such that thread_0 may receive 1 cycle, thread_1 may receive 4 cycles, thread_2 may receive 3 cycles, and thread_3 may receive 2 cycles. As a non-limiting example, the thread map register may specify the canonical execution sequence [0,1,1,1,1,2,2,2,3,3]. The thread map register may be updated to change the number of threads or the thread allocation; for example, by writing the new values [0,0,0,0,1,2,2,2,3,3] to the thread map register, the allocation of thread 0 is extended and that of thread 1 is decreased.
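The 10-entry example above can be modeled in a few lines. This is an illustrative sketch, not the hardware implementation; the class and method names are invented for this example. Each thread's share of CPU cycles is simply its count of entries in the map divided by the map length, and reprogramming the map changes the shares.

```python
from collections import Counter

class ThreadMapRegister:
    """Software model of a thread map register replayed cyclically."""
    def __init__(self, entries):
        self.entries = list(entries)   # sequence of thread IDs
        self.pos = 0

    def next_thread(self):
        """Thread ID that owns the next processor cycle."""
        tid = self.entries[self.pos]
        self.pos = (self.pos + 1) % len(self.entries)
        return tid

    def shares(self):
        """Fraction of CPU cycles currently allocated to each thread."""
        n = len(self.entries)
        return {tid: c / n for tid, c in Counter(self.entries).items()}

    def reprogram(self, entries):
        """Rewrite the register to change thread allocation dynamically."""
        self.entries = list(entries)
        self.pos = 0

tmr = ThreadMapRegister([0, 1, 1, 1, 1, 2, 2, 2, 3, 3])
print(tmr.shares())   # thread_1 holds 40% of cycles, thread_2 30%, ...
tmr.reprogram([0, 0, 0, 0, 1, 2, 2, 2, 3, 3])   # grow thread 0, shrink thread 1
print(tmr.shares())
```

The granularity of control is the inverse of the map length: with 10 entries each reallocation moves cycles in 10% steps, and with 16 entries (as in Fig. 2A) in 1/16 steps.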
In another example of the invention, interrupt masking is provided thread by thread on a superscalar multithreaded processor, such that each thread has its own separate interrupt register. In this example of the invention, each thread has its own separate interrupt processing, such that interrupts to thread_0 are masked only by thread_0, while other threads (such as thread_1, thread_2, and so on) continue to receive and service their own interrupts. In this example architecture, each thread may be capable of handling a different protocol type; for example, packet processing for each of the wireless protocols WLAN, Bluetooth, and Zigbee may be handled using a packet buffer coupled to a processor interface of a multi-protocol baseband processor having a common packet buffer interface. In this example, the multithreaded processor may handle acknowledgement and retransmit requests, each of which must be completed in a timely manner using interrupt handling, with each protocol type being served by a separate interrupt dedicated to a separate thread, and with the thread map register being rewritten as needed to allocate more thread cycles on an adaptive basis.
Drawings
FIG. 1 shows a block diagram of a multithreaded superscalar processor with a thread-by-thread program counter and a thread-by-thread register file.
FIG. 1A shows a block diagram of an organization for thread-by-thread program counters.
FIG. 1B illustrates a block diagram of an example for thread mapping registers.
FIG. 2A illustrates an example thread mapping register for thread order mapping and thread mapping register allocation for a given thread.
FIG. 2B illustrates thread mapping registers for non-sequential mapping of the threads of FIG. 2A.
FIG. 3 illustrates a thread-by-thread interrupt controller and handling for the multithreaded processor of FIG. 1.
Fig. 4 shows a block diagram of the Bluetooth and WLAN processors using separate CPUs.
Fig. 5 shows a block diagram of a Bluetooth and WLAN processor using a multithreaded processor.
FIG. 5A illustrates an example allocation of program code and associated tasks for a multithreaded processor.
Fig. 5B illustrates an example allocation of RAM for a packet buffer.
Detailed Description
FIG. 1 shows an example of the present invention: a superscalar processor 100 having the sequential stages: a prefetch stage 102, a fetch stage 104, a decode stage 106, a decode/execute stage 108, an execute stage 110, a load/store stage 112, and an optional write-back stage 114. The instructions delivered to the prefetch stage 102 are executed sequentially by each subsequent stage on separate clock cycles, carrying forward any context and intermediate results required by the next stage. In one example of the invention, the thread map register 103 provides a canonical sequence of thread identifiers (thread_ids) delivered to the thread-by-thread program counters 105; the thread-by-thread program counter 105 provides the associated current program counter address to the prefetch stage 102, and the prefetch stage 102 retrieves the associated instruction from the instruction memory 116 and delivers it to the fetch stage 104 on a subsequent clock cycle. The decode/execute stage 108 is coupled to a thread-by-thread register file 118, which responds to read requests from the decode/execute stage 108 or write-back operations from stage 114, each of which is dedicated to a thread, so that data read from or written into the register file 118 corresponds to the thread_id that is requesting or providing the data.
FIG. 1A shows a plurality of thread-by-thread program counters 105: PC_T0 for thread_0, PC_T1 for thread_1, through PC_Tn for thread n, one program counter operable for each thread.
FIG. 1B illustrates thread map register 103, which includes a sequence of thread identifiers T0 130 through Tn 132 for canonical execution. The number of threads (each thread being a separate process executing during its allocated CPU cycles) is m, limited by the number of register files 118 and program counters 105. The thread map register 103 may contain m slots to allocate CPU bandwidth equally among the threads, or, for finer-grained thread control, n slots may be provided, where n > m. For example, a thread map with 16 entries may support 4 threads, each with a granularity of 1/16 of available CPU processing power, and each thread may be allocated any value between 0/16 and 16/16 of available CPU processing power (depending on the allocation of CPU processing power to the remaining threads).
Fig. 2A shows an example 16-entry thread map register 103 with a canonical cycle length 204, the sequence repeating at the end of each 16 entries. The example of fig. 2A shows 4 threads in a sequential mapping, which may be suitable for applications such as wireless processing, where a thread stall can occur: a thread cannot execute on consecutive cycles because of a delay in receiving results from an external resource. With n=16 thread map register locations, the thread map register provides 1/16 resolution in allocating the processor to each task; the processor could be used with a distinct thread at each thread map register location, but this would provide only a fixed time allocation per thread. In a preferred utilization, the number of thread identifiers m is less than the number of thread map register locations n, so that the assignment of a particular thread to a task may have a granularity of p/n, where n is typically fixed and p, the number of cycles assigned to a particular thread, is programmable and may vary from 0 to n to allocate more or fewer computational resources to each thread. In another example of the invention, the length n of the thread map register may itself be programmable, to provide greater granularity in task cycle management or to support a greater number of threads.
FIG. 2A illustrates an example thread mapping for a four-threaded processor in the 16-position thread map register 202, where threads 0, 1, 2, and 3 (T0, T1, T2, and T3, respectively) are allocated 12.5%, 25%, 50%, and 12.5% of the processor's capacity. A problem arises when a particular thread must wait for an external resource to respond (referred to as a thread stall). In the example of fig. 2A, the decode/execute stage 108 may need to read an external shared memory or Media Access Controller (MAC), not shown, and the delay in reading the external resource may require 4 clock cycles. Where the threads shown in FIG. 2A that access external resources, or are otherwise subject to a delay in reading or writing a device, are T0 and T3, T0 stalls at operation 208 and T3 stalls at 214 at cycle 210. With the arrangement of thread identifiers shown in FIG. 2A, each thread stall causes a loss of otherwise available CPU cycles.
FIG. 2B shows an alternative mapping that uses the same time allocation as FIG. 2A but rearranges the thread sequence 220 for the same thread stall case shown in FIG. 2A. The rearrangement moves T0 to positions 0 and 7 and T3 to positions 1 and 8, as reflected in FIG. 2B. The T0 thread now loses a cycle only if its stall lasts longer than 6 clock cycles 224; since stall 212 is 4 clock cycles, both occurrences of T0 execute in the arrangement of FIG. 2B, rather than only one as in FIG. 2A. Similarly, unless a stall has duration 226, the T3 stall that delayed the second T3 cycle in FIG. 2A does not occur in FIG. 2B.
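The benefit of rearranging the thread map can be checked with a small stall model. This is a hypothetical sketch, not the patented mechanism: the two maps below follow the 12.5%/25%/50%/12.5% split of FIGS. 2A and 2B, and a 4-cycle external-access stall on T0 and T3 (triggered at each thread's first slot) is assumed for illustration.

```python
def lost_slots(thread_map, stalls):
    """Count slots wasted when a stalled thread's next slot arrives before
    its stall has drained. `stalls` maps thread ID -> stall length in cycles,
    triggered at that thread's first slot in the map."""
    busy_until = {}
    lost = 0
    for cycle, tid in enumerate(thread_map):
        if tid in busy_until and cycle < busy_until[tid]:
            lost += 1                              # slot wasted: thread still waiting
            continue
        if tid in stalls and tid not in busy_until:
            busy_until[tid] = cycle + stalls[tid]  # external access begins
    return lost

# Clustered slots (back-to-back, in the spirit of Fig. 2A's sequential map):
clustered = [0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3]
# Spread slots (T0 at positions 0 and 7, T3 at 1 and 8, as in Fig. 2B):
spread = [0, 3, 1, 2, 2, 1, 2, 0, 3, 1, 2, 2, 1, 2, 2, 2]

print(lost_slots(clustered, {0: 4, 3: 4}))  # clustered: stalls waste cycles
print(lost_slots(spread, {0: 4, 3: 4}))     # spread: the 4-cycle stalls are hidden
```

With the clustered map, each stalled thread's second slot arrives while the stall is still pending, so two slots are wasted; with the spread map the second slots arrive 7 cycles later, after the 4-cycle stalls have drained, so none are lost.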
Fig. 3 illustrates another aspect of the present invention, an example for wireless signal processing, where process threads 308 may execute as different threads on the multithreaded processor of fig. 1, which has interfaces 310, as part of multithreaded CPU 100, each interface being associated with a particular MAC. The radio signal is received and transmitted on an antenna 301; it is converted to baseband upon reception, or modulated to RF upon transmission, at 302, and provided to a multi-protocol baseband processor 304. When a packet arrives at a particular interface of the multi-protocol MAC, an interrupt for a particular thread may be sent to the interrupt controller 306, where each interrupt may be masked by the associated process 308 operating in the multi-protocol processor. Each process can control an associated interrupt mask (shown as IM0, IM1, IM2, IM3) provided to interrupt controller 306 to mask its interrupt, so that the associated process does not service a new interrupt until its previous interrupt has completed.
The current multitasking handling of interrupts has certain advantages over the prior art. In the prior art, an interrupt service routine on thread 0 may handle packet acknowledgements for multiple packet interfaces. In this task, after a data packet is received, the receive buffer is checked to detect any missing data packets in the sequence, and the process acknowledges the received data packets or makes a retransmission request to the sender for any missing data packet. There is a critical timing window associated with packet acknowledgements and retransmissions, so it is important that the acknowledgement or retransmission request be made promptly after the packet is received. Consider the following case: retransmission requests must be made within 30us after receipt of a data packet, the first retransmission task 0 requires 5us to complete, the second retransmission task 1 requires 10us to process and complete, and the third retransmission task 2 requires 5us to process and complete, with a single process handling all three tasks on a single thread. In this example, where three tasks are handled by a common thread and common interrupt masking is used as in the prior art, upon receipt of a packet the process on thread 0 handles task 0, masking interrupts to prevent other packet acknowledgements from slowing down the handling of the current acknowledgement, which requires 5us. If the second interrupt, associated with task 1 of thread 0, arrives during the handling of task 0, task 1 is not handled until at least 5us after it arrives, because thread 0 is still busy with task 0. It may further happen that, due to bursty data packets on a different interface, a third task 2 requiring 5us to complete arrives while task 1 (requiring 10us) is waiting for task 0 (requiring 5us) to complete.
When task 0 completes, the interrupt mask is removed, task 1 generates an interrupt and is detected, the interrupt mask is asserted again, and the processing of task 1 completes, after which the interrupt mask is cleared and task 2 is detected by the assertion of its interrupt. Thereafter, the interrupt mask is again asserted; task 2 begins at least 15us after its request arrives, and the request completes at 20us, after the required window of retransmission requests has passed. After task 2 completes, the interrupt mask is cleared; however, the remote station has not received the retransmission request from task 2 in time, and the retransmission protocol fails. A prior art solution to the problem of latency delays for task 2 behind the earlier tasks 0 and 1 is a faster processor. Additionally, a thread stall may occur when the multithreaded processor is reading the MAC interface, which may be avoided by the rearrangement of thread identifiers shown previously in fig. 2B. In this case, it may be desirable to allocate a small number of thread cycles to the acknowledge and retransmit tasks, but to spread the three tasks across separate threads, each allocated a small amount of time; this overcomes the interface read/write latency and the delays of interrupt masking by associating each thread with a separate interrupt and interrupt mask.
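The timing arithmetic of this example can be tabulated. The figures are illustrative only, using the 5/10/5 us service times and 30 us window from the text: with a shared mask on one thread, each task inherits the head-of-line delay of all earlier tasks, while separate threads with separate interrupt masks let each task start on arrival.

```python
DEADLINE_US = 30
work = [5, 10, 5]   # service times of tasks 0, 1, 2 in microseconds

# Single thread, shared interrupt mask: tasks run strictly back-to-back,
# so each later task waits for all earlier tasks to finish. (Mask-cycling
# and interrupt-detection overhead, which the text notes pushes the last
# task past its window, is not modeled here.)
finish_serial = []
t = 0
for w in work:
    t += w
    finish_serial.append(t)
print(finish_serial)       # cumulative completion times

# Separate threads with per-thread interrupts: no head-of-line blocking,
# so each task's completion time is just its own service time.
finish_threaded = list(work)
print(finish_threaded)
```

In the serial case task 2 cannot even begin until 15 us after its interrupt arrives; in the threaded case every task completes well inside the retransmission window.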
In the prior art, where each of the tasks executes on a single thread and each task requires 50 MIPS, successfully handling three tasks requires 300 MIPS of processing power because of the latency of sequentially handling interrupts, whereas with the novel approach of fig. 3 only about 150 MIPS is required, halving the MIPS requirement and thereby reducing power consumption.
In another example of the multi-protocol processor of fig. 1, each of the wireless protocols may be handled by a separate thread. For example, the processes handling WLAN, Bluetooth, and Zigbee may each operate as a separate process on its own thread, and the retransmission process for each protocol may likewise be handled by a separate process, each operating on its own thread.
In another example of the invention, the thread map register may be changed dynamically according to process requirements detected by a separate thread management process. Since the context from each stage is forwarded to the subsequent stages of fig. 1, the change to the thread map register can be made at any time, subject to the synchronous clock requirements of the prefetch stage 102 and the associated thread-by-thread program counter 105 receiving a deterministic thread_id.
Fig. 4 shows an example combined Wireless Local Area Network (WLAN) and Bluetooth (BT) transceiver having an interface 480 for exchanging data with a communication system. Each interface type requires its own CPU because of the dedicated WLAN and BT processing operations required for each protocol and the response timeliness each requires. The requirement for low-latency CPU processing on each interface results in the WLAN and BT processing being performed by the system architecture shown in fig. 4.
Fig. 4 illustrates a WLAN processor including an analog front end and MAC 401 coupled to WLAN CPU 424, and a BT processor including an analog front end and MAC 450 coupled to BT CPU 482. Each of WLAN CPU 424 and BT CPU 482 is able to respond in a timely manner to interrupts and bursts that require immediate processing by the software programs associated with the respective WLAN processor 401 and BT processor 450.
In the WLAN processor 401, an antenna 402 is coupled to a transmit/receive switch 404, which couples a receive signal to a low noise amplifier 406 and a transmit signal from a power amplifier 414. The input signal is mixed 408 to baseband using a clock source 418 and low-pass filtered 410, and the analog baseband signal is digitized and processed by a combined ADC and baseband processor 412, which demodulates the received symbols into a data stream; a Media Access Controller (MAC) 422 forms the stream into layer-2 data packets and delivers them across a Serial Data Interface (SDI) to CPU 424. CPU 424 has associated Random Access Memory (RAM) 428 for storing received and transmitted data packets, program code executed by CPU 424, and other information that does not persist when power is removed. Read Only Memory (ROM) or flash memory 426 stores program instructions, which are typically downloaded from flash/ROM into RAM during a power-up sequence. MAC 422 receives data sent over interface 423, such as a Serial Data Interface (SDI), and provides the received data packets to CPU 424 along with sequence numbers, so that CPU 424 can detect and manage retransmission of any lost data, set up any WLAN authentication protocols, and perform any required packet-by-packet operations such as encapsulation and decapsulation, channel management, packet aggregation, and connection management and authentication.
Fig. 4 shows an example Bluetooth processor 450, including an analog front end and BT MAC, which similarly operates with an antenna 452, a transmit/receive switch 454, a low noise amplifier 456, a mixer 458, a band pass filter 460, and an analog-to-digital converter and baseband processor 462 that converts the baseband Bluetooth hopping pattern into a data stream, as the ADC/baseband processor 412 does for WLAN 802.11 data packets. The Bluetooth transmit chain includes a baseband processor and DAC 470, a mixer 466 that modulates the baseband frequency-hopping stream onto an RF carrier frequency using a modulation clock source 468, and a power amplifier 464 that couples the modulated Bluetooth frequency-hopping stream to the transmit/receive switch 454. The BT CPU handles various connection management functions, including pairing.
WLAN MAC 422 is coupled to WLAN CPU 424 via a digital interface 423, such as a Serial Peripheral Interface (SPI), and BT MAC 480 is coupled to BT CPU 482 via a digital interface 481. The architecture of fig. 4 thus provides separate CPU processing power for each of the respective operating WLAN and bluetooth processes, including low latency for processing connection or packet requests from each interface.
Fig. 5 shows an alternative architecture to fig. 4, where the WLAN RF front end/MAC 504 (corresponding to processor 401 of fig. 4) and the BT RF front end/MAC 508 (corresponding to processor 450 of fig. 4) are coupled via respective digital interfaces 518 and 520 to a multithreaded CPU 510, which is itself coupled to a ROM/flash memory 512 and a RAM 514. Optionally, a thread map register 516 provides the allocation of CPU cycles to the Bluetooth or WLAN processes. In one example of the invention, the number of entries in the thread map register is fixed, and an increasing or decreasing number of thread_id values may be present in the thread map register to provide a correspondingly increasing or decreasing number of processor cycles to the particular process associated with each thread_id. For a pipelined multithreaded processor that receives one instruction at a time as previously described, with each instruction's thread determined by the thread map register (which issues a next thread_id for each instruction), the granularity of control over thread allocation is instruction-by-instruction. Since the thread map register issues thread_ids in a repeating canonical manner, the process-to-thread assignment has a very fine granularity, equal to the inverse of the number of values the thread map register can hold. In one embodiment of the invention, a thread management process may operate as one of the processes with a unique thread_id; it examines the activity level in the other threads to increase or decrease the number of entries of the corresponding thread_id, assigning and de-assigning thread_id values in the thread map register.
The activity level of a communication process associated with a communication processor may be determined, for example, by the number of data packets sent or received by the associated communication processor and handled by the thread, and a threshold may be established to indicate when more or fewer thread_id values for that particular thread should be present in the thread map register. Examples of process threads with unique thread_ids whose entries are dynamically added to or removed from the thread map register by a thread management process include link layer processes, network layer processes, and application layer processes, where each link layer, network layer, or application layer process may comprise multiple processes with unique threshold metrics, each associated with a particular communication processor such as 401, 450, 504, or 508. The increased thread_id allocation in the thread map register may be maintained for a period of time during which a threshold metric (such as packet data rate, number of packets remaining to be processed, thread load metric, or percentage of thread process task completion) exceeds a threshold.
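One possible thread-management policy along these lines can be sketched as follows. This is a hypothetical illustration, not the patented method: the packet-rate thresholds, thread IDs, and the convention of taking slots from an idle thread are all invented for this example.

```python
def rebalance(thread_map, packet_rate, hi=100, lo=10, idle_tid=0):
    """Return a new thread map with one slot moved toward each busy thread.

    packet_rate: dict thread_id -> packets handled since the last check.
    A thread crossing the upper threshold gains a map entry taken from the
    idle thread; one falling below the lower threshold returns an entry.
    """
    new_map = list(thread_map)
    for tid, rate in packet_rate.items():
        if rate > hi and idle_tid in new_map:
            new_map[new_map.index(idle_tid)] = tid   # grant an extra cycle
        elif rate < lo and new_map.count(tid) > 1:
            new_map[new_map.index(tid)] = idle_tid   # return a cycle
    return new_map

m = [0, 0, 1, 1, 2, 2]              # tid 0 = idle/management thread
m = rebalance(m, {1: 250, 2: 5})    # WLAN thread 1 busy, BT thread 2 quiet
print(m)
```

Run periodically by the thread management process, such a policy shifts map entries toward whichever protocol is active, matching the text's description of maintaining the increased allocation only while the threshold metric is exceeded.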
Fig. 5A shows the allocation of memory (ROM/flash 512 or RAM 514) to the various threads present. One thread may be WLAN code corresponding to the tasks performed by WLAN CPU 424 in fig. 4, and another thread may be BT code corresponding to the tasks performed by BT CPU 482 in fig. 4. An additional thread may be designated to manage the previously described thread map register 103, controlling the allocation of CPU bandwidth among the various tasks, and further tasks may perform memory management of packet buffers and other low-priority, infrequently executed functions. The thread map management task may periodically check BT and WLAN interface utilization and change the CPU cycle allocation for each task as needed. In one aspect of the invention, Bluetooth and WLAN operations are mutually exclusive, and the CPU thread allocation for the interfaces (the BT and WLAN tasks of fig. 5A) is dedicated to one interface or the other.
In another example of the invention, separate threads may handle different portions of a particular communication protocol. For example, one thread may handle layer 2 and related operations while another handles layer 3 and application aspects of the protocol. In one aspect of the invention applicable to any WLAN protocol, one thread may handle basic communication aspects, collectively referred to as the lower MAC functions. The lower MAC functions of WLAN and Bluetooth include packet transmission, packet reception, clear channel assessment (CCA), inter-frame spacing, rate control, request-to-send/clear-to-send (RTS/CTS) exchanges, wireless packet acknowledgement (DATA/ACK) for WLAN and Bluetooth, and Bluetooth-specific channel hopping. The upper MAC functions perform the remaining ISO layer 2 (data link layer) functions not performed by the lower MAC functions. In this specification, the upper MAC functions collectively refer to any of the following: the WLAN supplicant (any protocol associated with joining or logging in to a wireless network access point), WLAN packet retransmission and acknowledgement, and security functions such as those described in the WPA or WPA2 (Wi-Fi Protected Access) standards. The ISO layer 3 (network layer) functions may be performed by separate threads; layer 3 functions include IP packet formation, TCP retransmission and acknowledgement, SSL encryption and connection management, and application layer operations such as packet encapsulation for specific application layer processes. In another example of the invention, for Bluetooth, one thread may be designated to handle the Bluetooth controller, stack, retries, and acknowledgements, and another thread may be designated to handle application layer tasks.
In this manner, the two sets of tasks for a particular protocol are separated into distinct threads, and a common interface (such as shared SRAM) may be used to transfer data from one thread to the other.
In some applications, WLAN and Bluetooth communications may coexist and operate concurrently. In this example configuration, CPU thread cycles may be dynamically allocated to the WLAN communication process while WLAN data packets are being processed and to the BT process while Bluetooth data packets are being processed. Unique thread_id values may be used to create multiple processes associated with a particular communication processor 401, 450, 504, or 508; each thread_id is placed into the thread map register 516 to provide processing bandwidth for its associated process, and when the associated communication processor is not enabled, those processes exit and their thread_ids are removed from the thread map register 516. Concurrent communication may rely on the regular communication intervals of Bluetooth, in which data packets are transmitted at fixed slot intervals separated by longer intervals during which the channel is not used for BT communication. During these intervals, WLAN packets may be transmitted and acknowledged without interfering with the BT communication window. The thread map register 516 may be dynamically changed to provide a greater percentage of CPU cycles to BT during Bluetooth packet intervals and to WLAN during WLAN packet intervals, thereby reducing power consumption relative to the architecture of fig. 4.
The examples shown in figs. 4 and 5 use the specific, distinct communication protocols WLAN and Bluetooth, but it should be understood that these are for illustration purposes only. Distinct communication protocols are sets of communication protocols that require completely different handling of data packets; examples include any of Bluetooth, WLAN, Zigbee, and near field communication, and other examples are known to those skilled in the art of communication protocols.

Claims (21)

1. A communications processor, comprising:
a plurality of communication controllers operable to participate in wireless communications, each communication controller having a data interface for sending and receiving wireless data packets;
a multithreaded processor coupled to the plurality of communication controllers, the multithreaded processor operable to execute at least one process thread having a unique thread_id for each communication controller;
the multithreaded processor has a thread map register containing thread_id values and generating the thread_id values in a canonical sequence having a number of thread_id values greater than a number of unique thread_id values;
the multithreaded processor executing each instruction of each thread in accordance with the corresponding thread_id value generated by the thread map register;
a thread management controller operable to modify the thread map register so that threads associated with communication controllers that send and receive larger amounts of data contain a larger number of thread_id values than threads associated with communication controllers that send and receive smaller amounts of data.
2. The communications processor of claim 1 wherein at least one communications controller is operable to transmit and receive Wireless Local Area Network (WLAN) packets, the communications controller having a media access controller for transmitting and receiving data using the data interface.
3. The communication processor of claim 1, wherein at least one communication controller is operable to send and receive bluetooth data packets.
4. The communications processor of claim 1 wherein at least one thread has an associated thread_id value that occupies non-consecutive thread map register locations.
5. The communications processor of claim 1 wherein the multi-threaded processor is coupled to the plurality of communications controllers through a single data interface.
6. The communications processor of claim 5, wherein the single data interface is a serial data interface.
7. The communications processor of claim 1, wherein at least one communications controller is operable to receive or transmit Wireless Local Area Network (WLAN) packets.
8. The communication processor of claim 1, wherein at least one communication controller is operable to receive or send bluetooth packets.
9. The communications processor of claim 1 wherein the thread management controller monitors at least one network parameter associated with utilization of a bluetooth interface or utilization of a WLAN interface.
10. A thread management process operable on a plurality of process threads in a communication processor, the communication processor having:
a plurality of communication controllers operable to participate in wireless communications, each communication controller having a data interface for sending and receiving wireless data packets;
a multithreaded processor coupled to the plurality of communication controllers, the multithreaded processor operable to execute at least one process thread having a unique thread_id for each communication controller;
the multithreaded processor has a thread map register containing thread_id values and generating the thread_id values in a canonical sequence having a number of thread_id values greater than a number of unique thread_id values;
the multithreaded processor executing each instruction of each thread in accordance with the corresponding thread_id value generated by the thread map register;
the thread management process comprises:
determining a first activity metric for a process operable on a first communication controller and a second activity metric for a process operable on a second communication controller;
determining the greater of the first activity metric and the second activity metric;
adding an additional thread_id entry associated with the greater activity metric into the thread map register.
11. The thread management process of claim 10 wherein at least one communication processor has a media access controller and an associated link layer communication process that performs transmission and reception of data packets, the at least one communication processor also having an associated network layer process that performs network layer retransmission and acknowledgement of network layer data packets.
12. The thread management process of claim 11, wherein the link layer communication process is performed by a first process thread having a unique thread_id, and the network layer process is performed by a different process thread having a unique thread_id that is different from the thread_id of the first process thread.
13. The thread management process of claim 12, wherein when the number of data packets sent or received by the associated at least one communication processor increases beyond a threshold, the number of first process thread_id values in the thread map register increases and the number of second process thread_ids in the thread map register increases.
14. The thread management process of claim 10, wherein the communication processor has a security function performed by a security thread process having an associated unique thread_id, and during execution of the security function the number of security function thread_id entries in the thread map register is increased.
15. The thread management process of claim 14, wherein the security function is authentication.
16. The thread management process of claim 10, wherein at least one communication processor comprises a Media Access Controller (MAC), and one process thread associated with the MAC handles at least one lower layer MAC function and a different process thread associated with the MAC handles at least one upper layer MAC function.
17. The thread management process of claim 16, wherein information from the process thread handling the at least one lower MAC function and the process thread handling the at least one upper MAC function shares data in a random access memory associated with the multithreaded processor.
18. The thread management process of claim 10, wherein one of the plurality of process threads is a thread management process that creates a new thread process and places an associated thread_id in the thread map register.
19. The thread management process of claim 10, wherein a new thread process is created when the communication processor is enabled for a wireless communication protocol.
20. The thread management process of claim 10, wherein at least one of the threads performs a memory management process comprising allocating and deallocating packet buffer memory in Random Access Memory (RAM) accessible to the multithreaded processor.
21. The thread management process of claim 10, wherein at least one of the plurality of process threads is an interrupt handler process.
CN202010940096.5A 2019-09-11 2020-09-09 Multi-threaded wireless communication processor with refined thread processes Pending CN112486572A (en)

Applications Claiming Priority (8)

Application Number Priority Date Filing Date Title
US201962899078P 2019-09-11 2019-09-11
US201962899072P 2019-09-11 2019-09-11
US62/899,072 2019-09-11
US62/899,078 2019-09-11
US16/945,871 US20210073027A1 (en) 2019-09-11 2020-08-02 Multi-Thread Wireless Communications Processor with Granular Thread Processes
US16/945,870 US20210076248A1 (en) 2019-09-11 2020-08-02 Communication Processor Handling Communications Protocols on Separate Threads
US16/945,870 2020-08-02
US16/945,871 2020-08-02

Publications (1)

Publication Number Publication Date
CN112486572A true CN112486572A (en) 2021-03-12

Family

ID=74920046

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010940096.5A Pending CN112486572A (en) 2019-09-11 2020-09-09 Multi-threaded wireless communication processor with refined thread processes

Country Status (1)

Country Link
CN (1) CN112486572A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11288072B2 (en) * 2019-09-11 2022-03-29 Ceremorphic, Inc. Multi-threaded processor with thread granularity


Similar Documents

Publication Publication Date Title
US20210076248A1 (en) Communication Processor Handling Communications Protocols on Separate Threads
CN114730261B (en) Multithreaded processor with thread granularity
Tan et al. Sora: high-performance software radio using general-purpose multi-core processors
EP0852357B1 (en) Method for handling interrupts in a high speed I/O controller
US9720739B2 (en) Method and system for dedicating processors for desired tasks
US7313104B1 (en) Wireless computer system with latency masking
US9474084B2 (en) MAC protocol in wireless body area network capable of processing emergency data and wireless network communication method using same
CN111095230B (en) Shared radio arbitration
US7075914B2 (en) Software modem architecture
US11013060B2 (en) Selective multiple-media access control
US20210073027A1 (en) Multi-Thread Wireless Communications Processor with Granular Thread Processes
CN112486572A (en) Multi-threaded wireless communication processor with refined thread processes
US7606259B2 (en) Communication resource scheduling among multiple links
CN109857686B (en) Method for converting synchronous transmission of DMA data into asynchronous transmission
WO2016134634A1 (en) Message receiving method, apparatus and device, computer storage medium and central processing unit
US20230199814A1 (en) Enhanced prediction of timing of activity for wireless devices
WO2024014995A1 (en) Processing unit, packet handling unit, arrangement and methods for handling packets
DE102020123498A1 (en) Wireless multi-threaded communication processor with granular thread processes
GB2484907A (en) Data processing system with a plurality of data processing units and a task-based scheduling scheme
JP2005136856A (en) Packet processor and program
GB2484904A (en) Data processing system with a plurality of data processing units and a task-based scheduling scheme
GB2484905A (en) Data processing system with a plurality of data processing units and a task-based scheduling scheme
GB2484899A (en) Data processing system with a plurality of data processing units and a task-based scheduling scheme

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination