WO2023249637A1

WO2023249637A1 - Apparatus and method to implement a token-based processing scheme for virtual dataplane threads

Info

Publication number: WO2023249637A1
Application number: PCT/US2022/034878
Authority: WO
Inventors: Su-Lin Low; Chun-I Lee; Tianan Tim MA; Sangwon Ki
Original assignee: Zeku, Inc.
Priority date: 2022-06-24
Filing date: 2022-06-24
Publication date: 2023-12-28

Abstract

According to one aspect of the present disclosure, a baseband chip is provided. The baseband chip may include a microcontroller that may assign a first token associated with a first number of clock cycles to a first virtual dataplane (DP) thread based on a first quality-of-service (QoS) profile of a first DP Layer 2 circuit. The microcontroller may assign a second token associated with a second number of clock cycles to a second virtual DP thread based on a second QoS profile of a second DP Layer 2 circuit. The microcontroller may execute, using the first token, a first instruction set for the first number of clock cycles to run the first virtual DP thread during a first time period. The microcontroller may execute, using the second token, a second instruction set for the second number of clock cycles to run the second virtual DP thread during a second time period.

Description

APPARATUS AND METHOD TO IMPLEMENT A TOKEN-BASED PROCESSING SCHEME FOR VIRTUAL DATAPLANE THREADS

BACKGROUND

[0001] Embodiments of the present disclosure relate to apparatus and method for wireless communication.

[0002] Wireless communication systems are widely deployed to provide various telecommunication services such as telephony, video, data, messaging, and broadcasts. In cellular communication, such as the 4th-generation (4G) Long Term Evolution (LTE) and the 5th-generation (5G) New Radio (NR), the 3rd Generation Partnership Project (3GPP) defines a Radio Layer 2 (referred to here as “Layer 2”) as the part of the cellular protocol stack structure corresponding to the dataplane (DP) (also referred to as the “user plane”), which includes, from top to bottom in the stack, a Service Data Adaptation Protocol (SDAP) layer, a Packet Data Convergence Protocol (PDCP) layer, a Radio Link Control (RLC) layer, a Security (SEC) layer, and a Medium Access Control (MAC) layer.

SUMMARY

[0003] According to one aspect of the present disclosure, a baseband chip is provided. The baseband chip may include a dataplane (DP) Layer 2 hardware accelerator block comprising a plurality of DP Layer 2 circuits. The baseband chip may further include a microcontroller. The microcontroller may include a first instruction set for a first virtual DP thread associated with at least one first DP Layer 2 circuit of the DP Layer 2 hardware accelerator block and a second instruction set for a second virtual DP thread associated with at least one second DP Layer 2 circuit of the DP Layer 2 hardware accelerator block. The microcontroller may be configured to assign a first token associated with a first number of clock cycles to the first virtual DP thread based on a first quality-of-service (QoS) profile associated with the at least one first DP Layer 2 circuit. The microcontroller may be configured to assign a second token associated with a second number of clock cycles to the second virtual DP thread based on a second QoS profile associated with the at least one second DP Layer 2 circuit. The microcontroller may be configured to execute, using the first token, the first instruction set for the first number of clock cycles to run the first virtual DP thread during a first time period. The microcontroller may be configured to execute, using the second token, the second instruction set for the second number of clock cycles to run the second virtual DP thread during a second time period that follows the first time period.

[0004] According to another aspect of the present disclosure, a DP Layer 2 hardware block for a baseband chip is provided. The DP Layer 2 hardware block may include a plurality of DP Layer 2 circuits. The DP Layer 2 hardware block may also include a microcontroller. The microcontroller may include a first instruction set for a first virtual DP thread associated with at least one first DP Layer 2 circuit of the plurality of DP Layer 2 circuits and a second instruction set for a second virtual DP thread associated with at least one second DP Layer 2 circuit of the plurality of DP Layer 2 circuits. The microcontroller may be configured to assign a first token associated with a first number of clock cycles to the first virtual DP thread based on a first QoS profile associated with the at least one first DP Layer 2 circuit. The microcontroller may be configured to assign a second token associated with a second number of clock cycles to the second virtual DP thread based on a second QoS profile associated with the at least one second DP Layer 2 circuit. The microcontroller may be configured to execute, using the first token, the first instruction set for the first number of clock cycles to run the first virtual DP thread during a first time period. The microcontroller may be configured to execute, using the second token, the second instruction set for the second number of clock cycles to run the second virtual DP thread during a second time period that follows the first time period.

[0005] According to yet another aspect of the present disclosure, a method of wireless communication of a baseband chip is provided. The method may include maintaining, by a first register of a microcontroller, a first set of custom instructions for a first virtual DP thread associated with at least one first DP Layer 2 circuit of a DP Layer 2 hardware accelerator block. The method may include maintaining, by a second register of the microcontroller, a second instruction set for a second virtual DP thread associated with at least one second DP Layer 2 circuit of the DP Layer 2 hardware accelerator block. The method may include assigning, by the microcontroller, a first token associated with a first number of clock cycles to the first virtual DP thread based on a first QoS profile associated with the at least one first DP Layer 2 circuit of the DP Layer 2 hardware accelerator block. The method may include assigning, by the microcontroller, a second token associated with a second number of clock cycles to the second virtual DP thread based on a second QoS profile associated with the at least one second DP Layer 2 circuit of the DP Layer 2 hardware accelerator block. The method may include executing, by the microcontroller, the first instruction set for the first number of clock cycles using the first token to run the first virtual DP thread during a first time period. The method may include executing, by the microcontroller, the second instruction set for the second number of clock cycles using the second token to run the second virtual DP thread during a second time period that follows the first time period.
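The token assignment and back-to-back execution periods summarized above can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that a QoS profile reduces to a label and that each label maps to a fixed token size in clock cycles; the disclosure does not specify how a QoS profile is converted into a number of cycles, so the table values, profile labels, and function names (`assign_token`, `plan_time_periods`) are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical QoS-profile-to-token table (clock cycles per token).
# The disclosure gives no concrete values; these are placeholders.
TOKEN_CYCLES_BY_QOS = {
    "low_latency": 400,
    "high_throughput": 1600,
    "best_effort": 200,
}


@dataclass
class VirtualDPThread:
    name: str
    qos_profile: str       # hypothetical label standing in for a QoS profile
    token_cycles: int = 0  # filled in by assign_token()


def assign_token(thread: VirtualDPThread) -> None:
    """Assign a token, i.e. a clock-cycle budget, based on the thread's QoS profile."""
    thread.token_cycles = TOKEN_CYCLES_BY_QOS[thread.qos_profile]


def plan_time_periods(threads, start_cycle=0):
    """Assign tokens, then lay out back-to-back time periods.

    Returns (thread name, first cycle, end cycle) tuples in which each
    period begins where the previous one ended.
    """
    periods = []
    cycle = start_cycle
    for thread in threads:
        assign_token(thread)
        end = cycle + thread.token_cycles
        periods.append((thread.name, cycle, end))
        cycle = end
    return periods
```

For example, `plan_time_periods([VirtualDPThread("t1", "high_throughput"), VirtualDPThread("t2", "low_latency")])` yields `[("t1", 0, 1600), ("t2", 1600, 2000)]`: the second time period starts exactly where the first one ends, mirroring the "second time period that follows the first time period" language.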

[0006] These illustrative embodiments are mentioned not to limit or define the present disclosure, but to provide examples to aid understanding thereof. Additional embodiments are discussed in the Detailed Description, and further description is provided there.

BRIEF DESCRIPTION OF THE DRAWINGS

[0007] The accompanying drawings, which are incorporated herein and form a part of the specification, illustrate embodiments of the present disclosure and, together with the description, further serve to explain the principles of the present disclosure and to enable a person skilled in the pertinent art to make and use the present disclosure.

[0008] FIG. 1 illustrates a block diagram of an example baseband chip protocol stack.

[0009] FIG. 2 illustrates an exemplary wireless network, according to some embodiments of the present disclosure.

[0010] FIG. 3 illustrates a block diagram of an exemplary node, according to some embodiments of the present disclosure.

[0011] FIG. 4 illustrates a block diagram of an exemplary apparatus including a baseband chip, a radio frequency (RF) chip, and a host chip, according to some embodiments of the present disclosure.

[0012] FIG. 5A illustrates a detailed block diagram of a first exemplary baseband chip, according to some embodiments of the present disclosure.

[0013] FIG. 5B illustrates a detailed block diagram of a second exemplary baseband chip, according to some embodiments of the present disclosure.

[0014] FIG. 5C illustrates a detailed block diagram of a third exemplary baseband chip, according to some embodiments of the present disclosure.

[0015] FIG. 5D illustrates a detailed block diagram of a fourth exemplary baseband chip, according to some embodiments of the present disclosure.

[0016] FIG. 5E illustrates a detailed block diagram of a fifth exemplary baseband chip, according to some embodiments of the present disclosure.

[0017] FIG. 6 is a flowchart of a first method of wireless communication, according to some embodiments of the present disclosure.

[0018] FIG. 7 is a flowchart of a second method of wireless communication, according to some embodiments of the present disclosure.

[0019] Embodiments of the present disclosure will be described with reference to the accompanying drawings.

DETAILED DESCRIPTION

[0020] Although specific configurations and arrangements are discussed, it should be understood that this is done for illustrative purposes only. A person skilled in the pertinent art will recognize that other configurations and arrangements can be used without departing from the spirit and scope of the present disclosure. It will be apparent to a person skilled in the pertinent art that the present disclosure can also be employed in a variety of other applications.

[0021] It is noted that references in the specification to “one embodiment,” “an embodiment,” “an example embodiment,” “some embodiments,” “certain embodiments,” etc., indicate that the embodiment described may include a particular feature, structure, or characteristic, but every embodiment may not necessarily include the particular feature, structure, or characteristic. Moreover, such phrases do not necessarily refer to the same embodiment. Further, when a particular feature, structure, or characteristic is described in connection with an embodiment, it would be within the knowledge of a person skilled in the pertinent art to effect such feature, structure, or characteristic in connection with other embodiments whether or not explicitly described.

[0022] In general, terminology may be understood at least in part from usage in context. For example, the term “one or more” as used herein, depending at least in part upon context, may be used to describe any feature, structure, or characteristic in a singular sense or may be used to describe combinations of features, structures or characteristics in a plural sense. Similarly, terms, such as “a,” “an,” or “the,” again, may be understood to convey a singular usage or to convey a plural usage, depending at least in part upon context. In addition, the term “based on” may be understood as not necessarily intended to convey an exclusive set of factors and may, instead, allow for existence of additional factors not necessarily expressly described, again, depending at least in part on context.

[0023] Various aspects of wireless communication systems will now be described with reference to various apparatus and methods. These apparatus and methods will be described in the following detailed description and illustrated in the accompanying drawings by various blocks, modules, units, components, circuits, steps, operations, processes, algorithms, etc. (collectively referred to as “elements”). These elements may be implemented using electronic hardware, firmware, computer software, or any combination thereof. Whether such elements are implemented as hardware, firmware, or software depends upon the particular application and design constraints imposed on the overall system.

[0024] The techniques described herein may be used for various wireless communication networks, such as code division multiple access (CDMA) systems, time division multiple access (TDMA) systems, frequency division multiple access (FDMA) systems, orthogonal frequency division multiple access (OFDMA) systems, single-carrier frequency division multiple access (SC-FDMA) systems, wireless local area network (WLAN) systems, and other networks. The terms “network” and “system” are often used interchangeably. A CDMA network may implement a radio access technology (RAT), such as Universal Terrestrial Radio Access (UTRA), evolved UTRA (E-UTRA), CDMA2000, etc. A TDMA network may implement a RAT, such as the Global System for Mobile Communications (GSM). An OFDMA network may implement a RAT, such as LTE or NR. A WLAN system may implement a RAT, such as Wi-Fi. The techniques described herein may be used for the wireless networks and RATs mentioned above, as well as other wireless networks and RATs.

[0025] In cellular and/or Wi-Fi communication, Layer 2 is the protocol stack layer responsible for ensuring a reliable, error-free data link for the wireless modem (referred to herein as a “baseband chip”) of a user equipment. More specifically, Layer 2 interfaces with Radio Layer 1 (also referred to as “Layer 1” or the “physical (PHY) layer”) and Radio Layer 3 (also referred to as “Layer 3” or the “Internet Protocol (IP) layer”), passing data packets up or down the protocol stack structure, depending on whether the data packets are associated with uplink (UL) or downlink (DL) transmissions.

[0026] Furthermore, Layer 2 may perform de-multiplexing/multiplexing, segmentation/reassembly, aggregation/de-aggregation, and sliding window automatic repeat request (ARQ) techniques, among others, to ensure reliable end-to-end data integrity and in-order, error-free delivery of data packets. For a UL data packet, Layer 3 data packets (e.g., IP data packets) may be input into the Layer 2 protocol stack and encoded into MAC layer packets (e.g., in 5G NR) for transport to the PHY layer. For a DL data packet, Layer 1 data packets (e.g., PHY layer data packets) may be input into the Layer 2 protocol stack, where Layer 2 data processing operations are performed on the data packets before they are passed up to Layer 3. Layer 3 performs IP header extraction, IP checksum, IP tracing, and IP routing and classification, among other things.
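As a toy illustration of the UL and DL directions described above, the following Python sketch passes a packet down through the Layer 2 sublayers (UL) and back up (DL), with each sublayer adding or stripping a placeholder text header. Only the SDAP, PDCP, RLC, and MAC sublayers are modeled; the header strings and function names are hypothetical, and real header formats, segmentation, multiplexing, and ciphering are deliberately left out.

```python
# Layer 2 sublayers, top to bottom (toy model; the SEC layer is not modeled).
LAYER2_TOP_TO_BOTTOM = ["SDAP", "PDCP", "RLC", "MAC"]


def ul_encapsulate(ip_packet: bytes) -> bytes:
    """UL: pass a Layer 3 packet down the stack; each sublayer prepends a placeholder header."""
    pdu = ip_packet
    for layer in LAYER2_TOP_TO_BOTTOM:
        pdu = f"{layer}|".encode() + pdu
    return pdu  # toy "MAC PDU" handed to the PHY layer


def dl_decapsulate(mac_pdu: bytes) -> bytes:
    """DL: pass a PHY-layer payload up the stack; each sublayer strips its placeholder header."""
    sdu = mac_pdu
    for layer in reversed(LAYER2_TOP_TO_BOTTOM):
        header = f"{layer}|".encode()
        if not sdu.startswith(header):
            raise ValueError(f"missing {layer} header")
        sdu = sdu[len(header):]
    return sdu  # packet delivered to Layer 3
```

Because the lowest sublayer adds its header last, the MAC header ends up outermost (`ul_encapsulate(b"ip")` gives `b"MAC|RLC|PDCP|SDAP|ip"`), and the DL path must therefore strip headers in the reverse order.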

[0027] FIG. 1 illustrates a block diagram of a conventional baseband chip 100. As seen in FIG. 1, a conventional baseband chip 100 may include PHY subsystem 102 configured to transmit and/or receive data packets over an air interface, a protocol stack 104 (e.g., residing at the baseband chip) that includes a control plane 106 and a DP 108, Layer 3/Layer 4 subsystems 110, and an application processor (AP)/host 112.

[0028] Control plane 106 performs two main functions: non-access stratum (NAS) function and radio resource control (RRC) function. The NAS function performs network layer control that relates to mobility management, session management, security management, and system selection, just to name a few. The RRC function performs radio resource allocation and configuration, as well as the radio channel control of radio bearers, logical channels, and security (ciphering, integrity configurations).

[0029] DP 108 performs Layer 2 and Layer 3/4 functions. Layer 2 functions relate to protocol data unit (PDU) processing. For example, the MAC layer performs multiplexing and demultiplexing, and mapping of logical channels to transport channels. The RLC layer performs automatic repeat request (ARQ) procedures at the radio link level and the error recovery of each logical channel. The PDCP layer performs packet level processing for data ciphering, integrity, and compression. The SDAP layer performs QoS classification of IP flows to data radio bearers.

[0030] The example baseband chip 100 illustrated in FIG. 1 uses a software-centric Layer 2 protocol data stack. Namely, the data stack processing resides on a Layer 2 main processor and uses a limited number of hardware accelerators. Using example baseband chip 100, the Layer 2 main processor (not shown) may access a data packet by direct memory access (DMA) from a PHY layer memory at the PHY subsystem(s) 102. Furthermore, the hardware (HW) accelerators may DMA a UL data packet to the Layer 3 external DDR memory of Layer 3 subsystem 110.

[0031] In example baseband chip 100, Layer 2 data processing (e.g., processing the transport blocks received from Layer 1 (e.g., PHY subsystem 102) in the DL user plane or processing data packets received from Layer 3 in the UL user plane) is usually implemented using software modules executed on a generic baseband processor, such as a central processing unit (CPU) or a digital signal processor (DSP). During processing, data may be frequently transferred between the generic baseband processor (not shown) and external memory (Layer 3 external DDR memory or a Layer 2 buffer, not shown), e.g., for buffering between layers. As a result, known solutions for Layer 2 data processing suffer from high power consumption, large data buffers, and long processing delays.

[0032] Moreover, when a user equipment is configured with Carrier Aggregation (CA), multiple Component Carriers (CCs) are typically aggregated for reception and transmission. As such, the user equipment may receive multiple grants concurrently, one from each CC (i.e., each serving cell), and these grants determine the scheduled packet reception and transmission in the downlink and uplink directions, respectively.

[0033] In the downlink (DL) direction, the DL MAC layer receives code blocks from PHY subsystem 102 over multiple CCs. The DL MAC layer may then re-order each Transport Block (TB), extract the MAC subPDU headers to obtain the MAC service data units (SDUs), and transfer the packets to the RLC and PDCP DP Layer 2 for further processing in each logical channel and associated radio bearer. Once Layer 2 data processing is complete, the packets are sent to Layer 3/Layer 4 subsystems 110, where the QoS flows in each radio bearer are routed to the appropriate application.

[0034] Conversely, in the uplink (UL) direction, Layer 3/Layer 4 subsystems 110 prepare the UL packets from multiple QoS flows for each radio bearer, and the UL packets may then be transferred to Layer 2 logical channel (LC) queues, ready for transmission. The UL MAC layer then receives a UL grant, which allocates resources on the physical uplink shared channel (PUSCH) and is signaled on the physical downlink control channel (PDCCH) at the beginning of a slot. For example, the UL grant may be carried in downlink control information (DCI) on the PDCCH. The UL grant may instruct the user equipment (UE) to transmit the UL MAC PDU K2 slots after the current slot. Grants with K2 < 1 (i.e., K2 = 0, a transmission in the same slot) are typically intended to serve low-latency application data, so radio bearer/LC data are pulled into such grants to be sent out as soon as possible. The UL MAC scheduling algorithm uses a Logical Channel Prioritization (LCP) method to schedule packets from each LC according to the allocated grant bytes and a configured maximum bucket size setting.
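The LCP step mentioned above is a token-bucket scheme. The Python sketch below is a simplified rendering of the general LCP procedure defined in 3GPP TS 38.321, not the disclosure's scheduler: each LC's bucket level Bj grows by its prioritized bit rate (PBR) times the elapsed time, capped at PBR × bucket size duration (BSD). A grant is then filled in two rounds: first up to Bj per LC in priority order, then any leftover bytes in strict priority order. Units are simplified to bytes per millisecond, and MAC subheader overhead, infinite PBR, LCP mapping restrictions, and negative Bj are not modeled; all names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LogicalChannel:
    lcid: int
    priority: int  # lower value means higher priority, as in 3GPP LCP
    pbr: int       # prioritized bit rate, simplified here to bytes per ms
    bsd_ms: int    # bucket size duration in ms
    queued: int    # bytes waiting in this logical channel queue
    bj: int = 0    # current token-bucket level Bj, in bytes

    @property
    def bucket_size(self) -> int:
        # Maximum bucket size = PBR x BSD
        return self.pbr * self.bsd_ms


def refill_buckets(channels, elapsed_ms: int) -> None:
    """Before an LCP run, add PBR x elapsed time to Bj, capped at the bucket size."""
    for lc in channels:
        lc.bj = min(lc.bj + lc.pbr * elapsed_ms, lc.bucket_size)


def lcp_allocate(channels, grant_bytes: int) -> dict:
    """Split one UL grant across logical channels; returns {lcid: bytes}."""
    alloc = {lc.lcid: 0 for lc in channels}
    remaining = grant_bytes
    by_priority = sorted(channels, key=lambda lc: lc.priority)

    # Round 1: channels with Bj > 0 get up to Bj bytes, in priority order.
    for lc in by_priority:
        if lc.bj > 0 and remaining > 0:
            n = min(lc.bj, lc.queued, remaining)
            alloc[lc.lcid] += n
            lc.queued -= n
            lc.bj -= n
            remaining -= n

    # Round 2: leftover grant bytes are served in strict priority order, ignoring Bj.
    for lc in by_priority:
        if remaining == 0:
            break
        n = min(lc.queued, remaining)
        alloc[lc.lcid] += n
        lc.queued -= n
        remaining -= n
    return alloc
```

With two LCs of PBR 10 and 20 bytes/ms (BSD 5 ms, 100 bytes queued each), a 3 ms refill gives buckets of 30 and 60 bytes. A 120-byte grant then splits 60/60: 30 + 60 bytes in round 1, and the remaining 30 bytes go to the higher-priority LC in round 2. The bucket cap is what keeps a long-idle LC from claiming an entire grant.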

[0035] One challenge of DP 108 processing relates to the power consumption of baseband chip 100, which needs to support different application types, such as high-throughput, latency-tolerant data transfers as well as low-latency, low-data-rate applications. Power usage at example baseband chip 100 is not optimized for low-data-rate applications. For example, in low-data-rate applications, conventional DP 108 processing uses resources inefficiently when processing DL/UL Layer 2/Layer 3 data packets, consumes power unnecessarily during low-data-rate transfers, incurs more double data rate (DDR) memory transfers and more DP interconnect bus transactions during periods of activity, and performs unoptimized Layer 2 to Layer 3 data transfers that cause undue delays. This is because the UL/DL Layer 2 processing is not aligned, which causes excessive network-on-chip (NoC) interconnect resources (e.g., bus transactions, external DDR memory accesses, etc.) to be used during low-data-rate applications. Consequently, baseband chip 100 uses central processing unit (CPU) cores and memory resources inefficiently when processing Layer 2/Layer 3 data packets; does not distinguish between different QoS flows when scheduling DP threads; incurs high overhead and delay during thread context switching, interrupt handling, and memory input/output (I/O) access; wastes power when performing low-data-rate transfers; suffers from lengthy active periods when performing DP interconnect bus transactions; performs non-optimized Layer 2 to Layer 3 data transfers with undesirable delays; and starves low-priority threads, just to name a few.

[0036] Thus, there exists an unmet need for a DP processing technique that minimizes power usage for low-data-rate applications without sacrificing QoS latency performance or high-throughput data performance.

[0037] To overcome these and other challenges, the present disclosure provides a computationally efficient DP Layer 2 architecture for the baseband chip, which can process multiple virtual DP hardware threads (referred to hereinafter as “virtual DP threads”) within a single core, with dynamic and tunable token cycles on a per-thread basis according to each thread’s QoS profile, thereby saving power and CPU resources while optimizing QoS performance. For instance, the baseband chip of the present disclosure may run multiple virtual DP threads within a single core (also referred to as a “microcontroller”), with each thread being executed in a custom-grained, interleaved round-robin fashion using tunable and dynamic token cycles per run, according to the thread’s QoS profile. Each virtual DP thread context is maintained in a fast, local, customized register set at the microcontroller and is switched in and out quickly without accessing external memory. The present DP Layer 2 architecture is scalable to multiple cores for different throughput requirements, where, in each core, each DP layer runs one or multiple virtual HW threads, which can be virtually mapped to one or more DP HW blocks collectively. In addition, the techniques described herein are extensible to different technologies including 5G NR/6G/7G/beyond, Wi-Fi 6/Wi-Fi 7/beyond, and Non-Terrestrial Networks (NTN), as well as automotive technologies, just to name a few. Additional details of the token-based virtual DP thread processing technique are provided below in connection with FIGs. 2-7.
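The interleaved, token-based execution of several virtual DP threads on one core can be sketched as a plain round-robin loop with a per-thread quantum (close to weighted round-robin). This is a minimal Python sketch under stated assumptions, not the disclosure's "custom-grained" scheduler: each instruction is assumed to take one clock cycle, and a thread's context is reduced to a program counter plus a placeholder register dictionary held in a per-thread record. That record stands in for the local register set, so no external memory traffic is modeled. All class and function names are hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class ThreadContext:
    """Per-thread state standing in for the microcontroller's local register set."""
    pc: int = 0                               # index of the next instruction to run
    regs: dict = field(default_factory=dict)  # placeholder for general-purpose registers


@dataclass
class DPThread:
    name: str
    program_length: int  # instructions in this thread's instruction set (1 cycle each, assumed)
    token_cycles: int    # per-run token derived from the QoS profile; may be retuned between runs
    ctx: ThreadContext = field(default_factory=ThreadContext)

    @property
    def done(self) -> bool:
        return self.ctx.pc >= self.program_length


def run_round_robin(threads):
    """Interleave threads on one core, giving each at most token_cycles per run.

    Returns a trace of (thread name, cycles used) for every run.
    """
    if any(t.token_cycles <= 0 for t in threads):
        raise ValueError("every thread needs a positive token")
    ready = deque(t for t in threads if not t.done)
    trace = []
    while ready:
        thread = ready.popleft()
        # "Switch in": the context is read directly from thread.ctx, with no
        # save/restore through external memory in this model.
        used = min(thread.token_cycles, thread.program_length - thread.ctx.pc)
        thread.ctx.pc += used
        trace.append((thread.name, used))
        if not thread.done:
            ready.append(thread)  # context stays in place until the next turn
    return trace
```

With thread `a` (5 instructions, token 2) and thread `b` (3 instructions, token 3), the trace is `[("a", 2), ("b", 3), ("a", 2), ("a", 1)]`. Because every unfinished thread gets another turn each round, a thread with a small token still makes progress instead of starving. A higher-QoS thread simply receives a larger token, and the token can be changed between runs by rewriting `token_cycles`.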

[0038] Although the following processing techniques are described in connection with Layer 2 data processing, the same or similar techniques may be applied to Layer 3 and/or Layer 4 data processing to optimize power consumption at Layer 3 and/or Layer 4 subsystems without departing from the scope of the present disclosure.

[0039] FIG. 2 illustrates an exemplary wireless network 200, in which some aspects of the present disclosure may be implemented, according to some embodiments of the present disclosure. As shown in FIG. 2, wireless network 200 may include a network of nodes, such as user equipment 202, an access node 204, and a core network element 206. User equipment 202 may be any terminal device, such as a mobile phone, a desktop computer, a laptop computer, a tablet, a vehicle computer, a gaming console, a printer, a positioning device, a wearable electronic device, a smart sensor, or any other device capable of receiving, processing, and transmitting information, such as any member of a vehicle to everything (V2X) network, a cluster network, a smart grid node, or an Internet-of-Things (IoT) node. It is understood that user equipment 202 is illustrated as a mobile phone simply by way of illustration and not by way of limitation.

[0040] Access node 204 may be a device that communicates with user equipment 202, such as a wireless access point, a base station (BS), a Node B, an enhanced Node B (eNodeB or eNB), a next-generation NodeB (gNodeB or gNB), a cluster master node, or the like. Access node 204 may have a wired connection to user equipment 202, a wireless connection to user equipment 202, or any combination thereof. Access node 204 may be connected to user equipment 202 by multiple connections, and user equipment 202 may be connected to other access nodes in addition to access node 204. Access node 204 may also be connected to other user equipments. When configured as a gNB, access node 204 may operate in millimeter wave (mmW) frequencies and/or near mmW frequencies in communication with the user equipment 202. When access node 204 operates in mmW or near mmW frequencies, the access node 204 may be referred to as an mmW base station. Extremely high frequency (EHF) is part of the radio frequency (RF) in the electromagnetic spectrum. EHF has a range of 30 GHz to 300 GHz and a wavelength between 1 millimeter and 10 millimeters. Radio waves in the band may be referred to as a millimeter wave. Near mmW may extend down to a frequency of 3 GHz with a wavelength of 100 millimeters. The super high frequency (SHF) band extends between 3 GHz and 30 GHz, also referred to as centimeter wave. Communications using the mmW or near mmW radio frequency band have extremely high path loss and a short range. The mmW base station may utilize beamforming with user equipment 202 to compensate for the extremely high path loss and short range. It is understood that access node 204 is illustrated by a radio tower by way of illustration and not by way of limitation.
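The wavelengths quoted for these band edges follow from the free-space relation between wavelength and frequency, taking the speed of light as $c \approx 3\times10^{8}$ m/s:

$$\lambda = \frac{c}{f}$$

$$\lambda_{300\,\text{GHz}} = \frac{3\times10^{8}}{3\times10^{11}}\ \text{m} = 1\ \text{mm}, \qquad \lambda_{30\,\text{GHz}} = \frac{3\times10^{8}}{3\times10^{10}}\ \text{m} = 10\ \text{mm}, \qquad \lambda_{3\,\text{GHz}} = \frac{3\times10^{8}}{3\times10^{9}}\ \text{m} = 100\ \text{mm}$$

Each tenfold drop in frequency therefore lengthens the wavelength tenfold, which is why the EHF band spans 1 mm to 10 mm and the SHF band spans 10 mm to 100 mm.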

[0041] Access nodes 204, which are collectively referred to as E-UTRAN in the evolved packet core network (EPC) and as NG-RAN in the 5G core network (5GC), interface with the EPC and 5GC, respectively, through dedicated backhaul links (e.g., S1 interface). In addition to other functions, access node 204 may perform one or more of the following functions: transfer of user data, radio channel ciphering and deciphering, integrity protection, header compression, mobility control functions (e.g., handover, dual connectivity), inter-cell interference coordination, connection setup and release, load balancing, distribution for non-access stratum (NAS) messages, NAS node selection, synchronization, radio access network (RAN) sharing, multimedia broadcast multicast service (MBMS), subscriber and equipment trace, RAN information management (RIM), paging, positioning, and delivery of warning messages. Access nodes 204 may communicate directly or indirectly (e.g., through the 5GC) with each other over backhaul links (e.g., X2 interface). The backhaul links may be wired or wireless.

[0042] Core network element 206 may serve access node 204 and user equipment 202 to provide core network services. Examples of core network element 206 may include a home subscriber server (HSS), a mobility management entity (MME), a serving gateway (SGW), or a packet data network gateway (PGW). These are examples of core network elements of an evolved packet core (EPC) system, which is a core network for the LTE system. Other core network elements may be used in LTE and in other communication systems. In some embodiments, core network element 206 includes an access and mobility management function (AMF), a session management function (SMF), or a user plane function (UPF) of the 5GC for the NR system. The AMF may be in communication with a Unified Data Management (UDM). The AMF is the control node that processes the signaling between the user equipment 202 and the 5GC. Generally, the AMF provides QoS flow and session management. All user Internet protocol (IP) packets are transferred through the UPF. The UPF provides user equipment (UE) IP address allocation as well as other functions. The UPF is connected to the IP Services. The IP Services may include the Internet, an intranet, an IP Multimedia Subsystem (IMS), a PS Streaming Service, and/or other IP services. It is understood that core network element 206 is shown as a set of rack-mounted servers by way of illustration and not by way of limitation.

[0043] Core network element 206 may connect with a large network, such as the Internet 208, or another Internet Protocol (IP) network, to communicate packet data over any distance. In this way, data from user equipment 202 may be communicated to other user equipments connected to other access points, including, for example, a computer 210 connected to Internet 208, for example, using a wired connection or a wireless connection, or to a tablet 212 wirelessly connected to Internet 208 via a router 214. Thus, computer 210 and tablet 212 provide additional examples of possible user equipments, and router 214 provides an example of another possible access node.

[0044] A generic example of a rack-mounted server is provided as an illustration of core network element 206. However, there may be multiple elements in the core network including database servers, such as a database 216, and security and authentication servers, such as an authentication server 218. Database 216 may, for example, manage data related to user subscription to network services. A home location register (HLR) is an example of a standardized database of subscriber information for a cellular network. Likewise, authentication server 218 may handle authentication of users, sessions, and so on. In the NR system, an authentication server function (AUSF) device may be the entity to perform user equipment authentication. In some embodiments, a single server rack may handle multiple such functions, such that the connections between core network element 206, authentication server 218, and database 216, may be local connections within a single rack.

[0045] Each element in FIG. 2 may be considered a node of wireless network 200. More detail regarding the possible implementation of a node is provided by way of example in the description of a node 300 in FIG. 3. Node 300 may be configured as user equipment 202, access node 204, or core network element 206 in FIG. 2. Similarly, node 300 may also be configured as computer 210, router 214, tablet 212, database 216, or authentication server 218 in FIG. 2. As shown in FIG. 3, node 300 may include a processor 302, a memory 304, and a transceiver 306. These components are shown as connected to one another by a bus, but other connection types are also permitted. When node 300 is user equipment 202, additional components may also be included, such as a user interface (UI), sensors, and the like. Similarly, node 300 may be implemented as a blade in a server system when node 300 is configured as core network element 206. Other implementations are also possible.

[0046] Transceiver 306 may include any suitable device for sending and/or receiving data. Node 300 may include one or more transceivers, although only one transceiver 306 is shown for simplicity of illustration. An antenna 308 is shown as a possible communication mechanism for node 300. Multiple antennas and/or arrays of antennas may be utilized for receiving multiple spatially multiplexed data streams. Additionally, examples of node 300 may communicate using wired techniques rather than (or in addition to) wireless techniques. For example, access node 204 may communicate wirelessly to user equipment 202 and may communicate by a wired connection (for example, by optical or coaxial cable) to core network element 206. Other communication hardware, such as a network interface card (NIC), may be included as well.

[0047] As shown in FIG. 3, node 300 may include processor 302. Although only one processor is shown, it is understood that multiple processors can be included. Processor 302 may include microprocessors, microcontroller units (MCUs), digital signal processors (DSPs), application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), programmable logic devices (PLDs), state machines, gated logic, discrete hardware circuits, and other suitable hardware configured to perform the various functions described throughout the present disclosure. Processor 302 may be a hardware device having one or more processing cores. Processor 302 may execute software. Software shall be construed broadly to mean instructions, instruction sets, code, code segments, program code, programs, subprograms, software modules, applications, software applications, software packages, routines, subroutines, objects, executables, threads of execution, procedures, functions, etc., whether referred to as software, firmware, middleware, microcode, hardware description language, or otherwise. Software can include computer instructions written in an interpreted language, a compiled language, or machine code. Other techniques for instructing hardware are also permitted under the broad category of software.

[0048] As shown in FIG. 3, node 300 may also include memory 304. Although only one memory is shown, it is understood that multiple memories can be included. Memory 304 can broadly include both memory and storage. For example, memory 304 may include random-access memory (RAM), read-only memory (ROM), static RAM (SRAM), dynamic RAM (DRAM), ferroelectric RAM (FRAM), electrically erasable programmable ROM (EEPROM), compact disc read-only memory (CD-ROM) or other optical disk storage, hard disk drive (HDD), such as magnetic disk storage or other magnetic storage devices, Flash drive, solid-state drive (SSD), or any other medium that can be used to carry or store desired program code in the form of instructions that can be accessed and executed by processor 302. Broadly, memory 304 may be embodied by any computer-readable medium, such as a non-transitory computer-readable medium.

[0049] Processor 302, memory 304, and transceiver 306 may be implemented in various forms in node 300 for performing wireless communication functions. In some embodiments, at least two of processor 302, memory 304, and transceiver 306 are integrated into a single system-on-chip (SoC) or a single system-in-package (SiP). In some embodiments, processor 302, memory 304, and transceiver 306 of node 300 are implemented (e.g., integrated) on one or more SoCs. In one example, processor 302 and memory 304 may be integrated on an application processor (AP) SoC (sometimes known as a “host,” referred to herein as a “host chip”) that handles application processing in an operating system (OS) environment, including generating raw data to be transmitted. In another example, processor 302 and memory 304 may be integrated on a baseband processor (BP) SoC (sometimes known as a “modem,” referred to herein as a “baseband chip”) that can run a real-time operating system (RTOS) and that converts the raw data, e.g., from the host chip, into signals that can be used to modulate the carrier frequency for transmission, and vice versa. In still another example, processor 302 and transceiver 306 (and memory 304 in some cases) may be integrated on an RF SoC (sometimes known as a “transceiver,” referred to herein as an “RF chip”) that transmits and receives RF signals with antenna 308. It is understood that in some examples, some or all of the host chip, baseband chip, and RF chip may be integrated as a single SoC. For example, a baseband chip and an RF chip may be integrated into a single SoC that manages all the radio functions for cellular communication.

[0050] Referring back to FIG. 2, in some embodiments, user equipment 202 may include an exemplary baseband chip with a computationally efficient DP Layer 2 architecture, which can process multiple virtual DP threads within a single core, with dynamic and tunable token cycles on a per-thread basis according to the thread’s QoS profile, thereby saving power and CPU resources with optimized QoS performance. Additional details of the token-based virtual DP thread processing technique are provided below in connection with FIGs. 4-7.

[0051] FIG. 4 illustrates a block diagram of an apparatus 400 including a baseband chip 402, an RF chip 404, and a host chip 406, according to some embodiments of the present disclosure. Apparatus 400 may be implemented as user equipment 202 of wireless network 200 in FIG. 2. As shown in FIG. 4, apparatus 400 may include baseband chip 402, RF chip 404, host chip 406, and one or more antennas 410. In some embodiments, baseband chip 402 is implemented by a processor and a memory, and RF chip 404 is implemented by a processor, a memory, and a transceiver. Besides the on-chip memory 418 (also known as “internal memory,” e.g., registers, buffers, or caches) on each chip 402, 404, or 406, apparatus 400 may further include an external memory 408 (e.g., the system memory or main memory) that can be shared by each chip 402, 404, or 406 through the system/main bus. Although baseband chip 402 is illustrated as a standalone SoC in FIG. 4, it is understood that in one example, baseband chip 402 and RF chip 404 may be integrated as one SoC or one SiP; in another example, baseband chip 402 and host chip 406 may be integrated as one SoC or one SiP; in still another example, baseband chip 402, RF chip 404, and host chip 406 may be integrated as one SoC or one SiP, as described above.

[0052] In the uplink, host chip 406 may generate raw data and send it to baseband chip 402 for encoding, modulation, and mapping. Interface 414 of baseband chip 402 may receive the data from host chip 406. Baseband chip 402 may also access the raw data generated by host chip 406 and stored in external memory 408, for example, using direct memory access (DMA). Baseband chip 402 may first encode (e.g., by source coding and/or channel coding) the raw data and modulate the coded data using any suitable modulation technique, such as multi-phase shift keying (MPSK) modulation or quadrature amplitude modulation (QAM). Baseband chip 402 may perform any other functions, such as symbol or layer mapping, to convert the raw data into a signal that can be used to modulate the carrier frequency for transmission. Baseband chip 402 may then send the modulated signal to RF chip 404 via interface 414. RF chip 404, through the transmitter, may convert the modulated signal in digital form into analog signals, i.e., RF signals, and perform any suitable front-end RF functions, such as filtering, digital pre-distortion, up-conversion, or sample-rate conversion. Antenna 410 (e.g., an antenna array) may transmit the RF signals provided by the transmitter of RF chip 404.
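As an illustration of the modulation step, the following is a minimal sketch of QPSK symbol mapping using the mapping given in 3GPP TS 38.211 (clause 5.1.3), d(i) = [(1 − 2b(2i)) + j(1 − 2b(2i+1))]/√2. It is not taken from this disclosure, the function name `qpsk_modulate` is hypothetical, and higher-order QAM would map more bits per symbol.

```python
import math


def qpsk_modulate(bits: list[int]) -> list[complex]:
    """Map bit pairs (b(2i), b(2i+1)) to unit-energy QPSK symbols."""
    if len(bits) % 2:
        raise ValueError("QPSK maps two bits per symbol; got an odd bit count")
    if any(b not in (0, 1) for b in bits):
        raise ValueError("bits must be 0 or 1")
    scale = 1 / math.sqrt(2)
    # d(i) = ((1 - 2*b(2i)) + j*(1 - 2*b(2i+1))) / sqrt(2)
    return [complex(1 - 2 * b0, 1 - 2 * b1) * scale
            for b0, b1 in zip(bits[0::2], bits[1::2])]


# Four bit pairs -> four symbols, one in each quadrant of the constellation.
symbols = qpsk_modulate([0, 0, 0, 1, 1, 0, 1, 1])
```

Each symbol has unit magnitude, so the average symbol energy is 1 regardless of the bit pattern.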

[0053] In the downlink, antenna 410 may receive RF signals from an access node or other wireless device. The RF signals may be passed to the receiver (Rx) of RF chip 404. RF chip 404 may perform any suitable front-end RF functions, such as filtering, IQ imbalance compensation, down-conversion, or sample-rate conversion, and convert the RF signals into low-frequency digital signals (baseband signals) that can be processed by baseband chip 402.

[0054] Still referring to FIG. 4, baseband chip 402 includes a DP subsystem 420 designed with a computationally efficient DP Layer 2 architecture, as shown in detail in FIGs. 5A-5E. This architecture enables DP subsystem 420 to process multiple virtual DP threads within a single microcontroller (uC) that assigns dynamic and tunable token cycles on a per-thread basis according to each thread’s QoS profile. Using this exemplary token-based virtual DP thread scheduling mechanism, baseband chip 402 uses less power and fewer computational resources than other baseband chips, while at the same time optimizing QoS performance. Additional details of the exemplary token-based virtual DP thread scheduling/processing technique supported by the computationally efficient DP Layer 2 architecture of DP subsystem 420 are provided below in connection with FIGs. 5A-5E.

[0055] FIG. 5A illustrates a detailed block diagram of a first exemplary architecture 500 of baseband chip 402, according to some embodiments of the present disclosure. FIG. 5B illustrates a detailed block diagram of a second exemplary architecture 515 of baseband chip 402, according to some embodiments of the present disclosure. FIG. 5C illustrates a detailed block diagram of a third exemplary architecture 525 of baseband chip 402, according to some embodiments of the present disclosure. FIG. 5D illustrates a detailed block diagram of a fourth exemplary architecture 535 of baseband chip 402, according to some embodiments of the present disclosure. FIG. 5E illustrates a detailed block diagram of a fifth exemplary architecture 545 of baseband chip 402, according to some embodiments of the present disclosure. FIGs. 5A-5E will be described together.

[0056] Referring to FIGs. 5A-5E, in addition to DP subsystem 420, baseband chip 402 may also include, e.g., an application processor (AP)/host 502, Layer 2 shared memory/external memory 504 (e.g., a double data rate (DDR) memory), and a physical layer (PHY) subsystem 506.

[0057] As shown in FIG. 5A, DP subsystem 420 may include, e.g., a DL DP Layer 2 hardware block 510a and a UL DP Layer 2 hardware block 510b. DL DP Layer 2 hardware block 510a may include, e.g., a DL DP Layer 2 hardware accelerator block 520a (referred to hereinafter as “DL hardware accelerator block 520a”) that includes a plurality of DL DP Layer 2 circuits 580a (e.g., PDCP, RLC, SEC, MAC, MAC-PHY interface, etc.), a DL uC 522a configured to generate commands used by DL Layer 2 circuits 580a for DL packet processing, and a DL shared memory 524a that maintains the commands generated by DL uC 522a using the exemplary token-based virtual DP thread processing technique and that is accessible by the plurality of DL Layer 2 circuits 580a. Using the commands in the corresponding command/status queue of DL shared memory 524a, DL Layer 2 circuits 580a perform packet processing of DL data packets received from PHY subsystem 506.
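The disclosure does not specify how the command/status queues in DL shared memory 524a are laid out. A common way to build such a queue between one producer (here, DL uC 522a) and one consumer (a DP Layer 2 circuit) is a fixed-size ring buffer with head and tail indices; the sketch below assumes that design, and the class name `RingQueue` and the command tuple format are hypothetical.

```python
class RingQueue:
    """Fixed-size ring buffer: the producer writes at `tail`, the consumer reads at `head`."""

    def __init__(self, depth: int) -> None:
        if depth < 1:
            raise ValueError("depth must be at least 1")
        self.slots = [None] * depth
        self.head = 0   # next slot the consumer reads
        self.tail = 0   # next slot the producer writes
        self.count = 0  # number of occupied slots

    def push(self, item) -> bool:
        """Enqueue an item; return False (caller retries later) if the queue is full."""
        if self.count == len(self.slots):
            return False
        self.slots[self.tail] = item
        self.tail = (self.tail + 1) % len(self.slots)
        self.count += 1
        return True

    def pop(self):
        """Dequeue the oldest item, or return None if the queue is empty."""
        if self.count == 0:
            return None
        item = self.slots[self.head]
        self.slots[self.head] = None
        self.head = (self.head + 1) % len(self.slots)
        self.count -= 1
        return item


# One queue per direction: commands from the uC, status back from the circuit.
cmd_q = RingQueue(depth=4)
status_q = RingQueue(depth=4)
cmd_q.push(("VDP_Thread_1", "MAC", 0))   # hypothetical (thread, circuit, descriptor)
thread, circuit, desc = cmd_q.pop()       # circuit consumes the command
status_q.push((circuit, desc, "done"))    # and reports completion
```

Keeping the indices separate for producer and consumer is what lets each side advance its own pointer without locking in a single-producer/single-consumer setting.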

[0058] As also shown in FIG. 5A, UL DP Layer 2 hardware block 510b of DP subsystem 420 may include, e.g., a UL DP Layer 2 hardware accelerator block 520b (referred to hereinafter as “UL hardware accelerator block 520b”) that includes a plurality of UL DP Layer 2 circuits 580b (e.g., PDCP, RLC, SEC, MAC, MAC-PHY interface, etc.), a UL uC 522b configured to generate commands used by UL Layer 2 circuits 580b for UL packet processing, and a UL shared memory 524b that maintains the commands generated by UL uC 522b using the exemplary token-based virtual DP thread processing technique and that is accessible by the plurality of UL Layer 2 circuits 580b. Using the commands in the corresponding command/status queue of UL shared memory 524b, UL Layer 2 circuits 580b perform packet processing of UL data packets, which are sent to PHY subsystem 506 for over-the-air transmission to the base station.

[0059] The DL processing path of DL hardware accelerator block 520a and the UL processing path of UL hardware accelerator block 520b include DL Layer 2 circuits 580a and UL Layer 2 circuits 580b, respectively, which perform one or more of, e.g., MAC header extraction, RLC layer bitmap and window checking operations, PDCP layer bit and window operations, SEC ciphering/deciphering, and data integrity operations, just to name a few.

[0060] As shown in FIG. 5A, each of DL uC 522a and UL uC 522b includes several virtual DP Layer 2 threads 590 (referred to hereinafter as “VDP Thread k 590”). Each VDP Thread k 590 is responsible for executing an instance of a DP Layer (e.g., MAC, SEC, RLC, SDAP (not shown), PDCP, etc.). As depicted in FIGs. 5B-5E, each VDP Thread k 590 has an associated instance of DP firmware (e.g., controller 560) and a customized instruction set (CX) 564 that is specific to the DP Layer functions and maintained in a corresponding custom register (CR) 562. In some embodiments, the interworking between the uC and the DP Layer 2 circuits of the corresponding hardware accelerator block may be implemented through a set of Layer 2 command/status queues, which reside in the corresponding shared memory (e.g., DL shared memory 524a or UL shared memory 524b), as shown in FIGs. 5A, 5B, 5D, and 5E. However, in some other embodiments, each VDP Thread k 590 may instead be directly coupled with its corresponding hardware accelerator block without intervening command/status queues and/or a shared memory, as shown in FIG. 5C. For ease of description, the following details of DP subsystem 420 focus on the token-based virtual DP thread process along the DL processing path of DL hardware accelerator block 520a. It is understood that the same or similar operations may be performed along the UL processing path by UL hardware accelerator block 520b without departing from the scope of the present disclosure.

[0061] Referring to FIG. 5B, DL uC 522a runs each VDP Thread k 590 in an exemplary token-based, round-robin processing loop 501. For example, DL uC 522a loads and executes each VDP Thread k 590 for a specific number of Token k cycles (also referred to herein as “clock cycles”). Each VDP Thread k 590 has its own associated CR 562 to store its thread-specific CX 564, data, and thread context, so that VDP Thread k 590 can be restored instantly when it is triggered to run again during the next token cycle. DL uC 522a schedules the Token k cycles for each VDP Thread k 590 according to the virtual DP thread’s QoS profile; these values can be configured to defaults during setup. During run-time, the VDP Thread scheduler (not shown) of DL uC 522a may dynamically adjust the Token k cycle value of each VDP Thread k (e.g., according to the procedure described below in connection with FIG. 7) and interleave all of the virtual DP threads in a round-robin fashion with custom-grained cycle scheduling.
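The round-robin loop can be modeled in a few lines. This is a minimal sketch under simplifying assumptions that the source does not state: every CX instruction takes exactly one clock cycle, a thread that finishes its CX early yields the rest of its token, a thread with nothing left is skipped, and the saved program counter `pc` stands in for the thread context held in CR 562. The names `VdpThread` and `run_round_robin` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class VdpThread:
    """Hypothetical model of one VDP Thread k."""
    name: str
    cx: list[str]       # customized instruction set; each entry costs one clock cycle
    token_cycles: int   # Token k: clock cycles granted per round-robin turn
    pc: int = 0         # saved resume point (stands in for the context in CR 562)

    def pending(self) -> int:
        """Number of CX instructions not yet executed."""
        return len(self.cx) - self.pc


def run_round_robin(threads: list[VdpThread]) -> list[tuple[str, int, int, list[str]]]:
    """Interleave threads until all CX are done.

    Returns one (name, start_cycle, end_cycle, executed) entry per time period;
    periods are contiguous, and each lasts at most the thread's Token k cycles.
    """
    if any(t.token_cycles < 1 for t in threads):
        raise ValueError("every Token k must grant at least one clock cycle")
    clock = 0
    periods = []
    while any(t.pending() for t in threads):
        for t in threads:
            n = min(t.token_cycles, t.pending())
            if n == 0:
                continue  # no CX left: switch straight to the next thread
            executed = t.cx[t.pc:t.pc + n]
            t.pc += n  # halt here; the next turn resumes from the saved pc
            periods.append((t.name, clock, clock + n, executed))
            clock += n
    return periods


mac = VdpThread("MAC", ["m0", "m1", "m2", "m3", "m4"], token_cycles=2)
sec = VdpThread("SEC", ["s0", "s1", "s2"], token_cycles=3)
for period in run_round_robin([mac, sec]):
    print(period)
```

With MAC granted 2 cycles and SEC granted 3, the periods are MAC over cycles 0-2, SEC over 2-5, MAC over 5-7, and MAC again over 7-8 once SEC has finished; each MAC turn picks up at the saved `pc` rather than restarting its CX.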

[0062] In so doing, DP subsystem 420 makes efficient use of shared uC processing resources, such as the arithmetic logic unit (ALU), pipeline stage logic, instruction decode logic, load/store logic, and cache resources. In addition to avoiding the resource waste of multiple idle uCs (e.g., multiple DL uCs and multiple UL uCs), pipeline forwarding logic may also be eliminated. These shared uC processing resources, as well as the associated shared local memory 524a, Layer 2 hardware accelerators (e.g., DP Layer 2 circuits 580a), and network-on-chip (NoC) interconnect resources (e.g., bus transactions, external DDR memory access, etc.), can be reduced to minimum levels during low data-rate applications. For example, the exemplary token-based virtual DP thread scheduling/processing technique 1) eliminates thread context-switch overhead, including interrupt handling and memory input/output (I/O) load/store; 2) provides a flexible and scalable DP Layer 2 architecture platform that can be programmed to explore new software-hardware partition(s) and to add new hardware threads (e.g., VDP Thread k 590), CX 564, CR 562, and associated hardware accelerator blocks; and 3) supports an expandable and reusable DP Layer 2 architecture for future 3GPP iterations (e.g., 6G, 7G, and beyond), Wi-Fi 6/7 and beyond, satellite technologies, automotive technologies, and other subsystems (e.g., PHY subsystem 506, a Layer 3 subsystem, etc.). Moreover, in the absence of traffic, baseband chip 402 may turn off unused logic and memory resources to further reduce power consumption.

[0063] Referring to FIG. 5C, DP Layer 2 hardware block 510a includes multiple virtual DP threads in each DP Layer (except the MAC layer, which may also have multiple virtual DP threads in some scenarios), each for a differentiated QoS profile, such as an ultra-reliable low-latency communication (URLLC) application and an enhanced mobile broadband (eMBB) application. In the example shown in FIG. 5C, the SEC, RLC, and PDCP circuits (also referred to as “hardware blocks”) each have two VDP Threads with different QoS priorities. This allows each differentiated QoS flow to be serviced with a different priority by its corresponding VDP Thread 590, which is allocated a different number of Token cycles. This configuration also supports fast and efficient thread switching among all the virtual DP threads using CX 564 stored in CR 562. Once a VDP Thread has run for its allocated number of clock cycles, information may be maintained in cache 592 so that DL uC 522a can resume the execution of CX 564 at the appropriate instruction during the next token cycle. The default number of Token k cycles (also referred to as the “number of clock cycles”) allocated to each VDP Thread k, where k is the thread number, can be initialized during setup according to its associated Logical Channel/Radio Bearer’s QoS flow profile. For instance, during the setup of a QoS flow identified by its QoS flow identifier (QFI) and associated with a 5G QoS Identifier (5QI), DL uC 522a constructs a mapping from each flow’s 5QI attributes (e.g., priority level, traffic resource type, packet delay budget) to the associated VDP Thread’s token cycle attribute value. Each VDP Thread handles the Logical Channel(s) or Radio Bearer(s) of one or more mapped QoS flows associated with any of the MAC circuit, SEC circuit, RLC circuit, PDCP circuit, etc.
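To make the flow-to-thread mapping concrete, here is a hypothetical sketch in which each QoS flow is assigned one VDP thread per layer (SEC, RLC, PDCP), mirroring the two-threads-per-layer example of FIG. 5C. The classification rule (Delay-Critical GBR flows go to the URLLC-class thread, all others to the eMBB-class thread) and the thread names are assumptions for illustration only; the resource type values follow the R values given with equation (1).

```python
from dataclasses import dataclass
from enum import Enum


class ResourceType(Enum):
    """Resource type values R as enumerated for equation (1)."""
    NON_GBR = 1
    GBR = 2
    DELAY_CRITICAL_GBR = 3


@dataclass(frozen=True)
class QosFlow:
    qfi: int                     # QoS flow identifier
    priority: int                # P: 1-100, increasing in priority
    resource_type: ResourceType  # R
    delay_budget_ms: int         # D: packet delay budget


LAYERS = ("SEC", "RLC", "PDCP")  # layers with two QoS-differentiated threads (FIG. 5C)


def map_flow_to_threads(flow: QosFlow) -> dict[str, str]:
    """Return the (hypothetical) VDP thread name that serves this flow in each layer."""
    # Assumed rule: delay-critical GBR traffic is served by the URLLC-class thread.
    is_urllc = flow.resource_type is ResourceType.DELAY_CRITICAL_GBR
    qos_class = "URLLC" if is_urllc else "eMBB"
    return {layer: f"VDP_{layer}_{qos_class}" for layer in LAYERS}


# Illustrative attribute values, not taken from any 3GPP 5QI table.
flows = [
    QosFlow(qfi=1, priority=90, resource_type=ResourceType.DELAY_CRITICAL_GBR, delay_budget_ms=5),
    QosFlow(qfi=2, priority=30, resource_type=ResourceType.NON_GBR, delay_budget_ms=300),
]
mapping = {f.qfi: map_flow_to_threads(f) for f in flows}
```

Keying the table by QFI means a packet's flow identifier is enough to find which thread handles it at each layer.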

[0064] The number of clock cycles allocated to each VDP Thread by the assigned Token k can be flexibly tuned/configured during initialization with the weights K1, K2, and K3, and a separate set of weights can be used for each use case scenario or application profile, as set forth below in equation (1).

$$\mathrm{Token}_k = K_1 P + K_2 R + K_3 D \qquad (1)$$

where P is a QoS flow priority level value (e.g., 1-100, increasing in priority), R is a resource type value (e.g., non-guaranteed bit rate (non-GBR) = 1, guaranteed bit rate (GBR) = 2, or Delay-Critical GBR = 3), and D is a packet delay budget value (e.g., ranging from 5 ms to 500 ms).
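Equation (1) can be evaluated directly. The sketch below assumes integer weights so that Token k comes out as a whole number of clock cycles, and it enforces the example value ranges given above. The weight values and profile names in `WEIGHTS` are hypothetical, since the source states only that each use case may have its own (K1, K2, K3) set.

```python
# Hypothetical per-use-case weight sets (K1, K2, K3); the source gives no values.
WEIGHTS = {
    "urllc_profile": (10, 100, 1),
    "embb_profile": (5, 50, 2),
}


def token_k(p: int, r: int, d: int, weights: tuple[int, int, int]) -> int:
    """Token k = K1*P + K2*R + K3*D, per equation (1), in clock cycles."""
    if not 1 <= p <= 100:
        raise ValueError("P must be in 1..100")
    if r not in (1, 2, 3):
        raise ValueError("R must be 1 (non-GBR), 2 (GBR) or 3 (Delay-Critical GBR)")
    if not 5 <= d <= 500:
        raise ValueError("D must be in 5..500 ms")
    k1, k2, k3 = weights
    cycles = k1 * p + k2 * r + k3 * d
    if cycles < 1:
        raise ValueError("weights must yield a positive cycle count")
    return cycles


# 10*80 + 100*3 + 1*10 = 1110 clock cycles for a high-priority delay-critical flow.
urllc_cycles = token_k(p=80, r=3, d=10, weights=WEIGHTS["urllc_profile"])
```

Because the weights are per profile, the same (P, R, D) triple can yield a different token under another use case simply by switching the weight set.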

[0065] For example, a set of initial Token k cycle values (e.g., numbers of clock cycles) can be configured for the VDP Threads for each use case scenario: VDP_Thread_1 = Token_1n; VDP_Thread_2 = Token_2n; VDP_Thread_3 = Token_3n; VDP_Thread_4 = Token_4n; VDP_Thread_5 = Token_5n; VDP_Thread_6 = Token_6n; VDP_Thread_7 = Token_7n, etc. Using the same example, Token_1n may allocate a first number of clock cycles for VDP_Thread_1, Token_2n may allocate a second number of clock cycles for VDP_Thread_2, and so on.

[0066] Referring to FIGs. 5B-5E, the MAC controller may execute the MAC CX, using Token_1n, to run VDP_Thread_1 for a first number of clock cycles (allocated by Token_1n) during a first time period. The duration of the first time period may be the length of the first number of clock cycles. At the end of the first time period, the MAC controller may halt the execution of the MAC CX. The command generated by the execution of the MAC CX may be sent to the MAC circuit either directly or via Layer 2 shared memory 524a, e.g., at the end of the first time period or after the end of the first time period. An indication of where the MAC controller halted the execution of the MAC CX at the end of the first time period may be maintained in cache 592.

[0067] During a second time period subsequent to and contiguous with the first time period, the SEC controller may execute the SEC CX, using Token_2n, to run VDP_Thread_2 for a second number of clock cycles (allocated by Token_2n). The duration of the second time period may be the length of the second number of clock cycles. At the end of the second time period, the SEC controller may halt the execution of the SEC CX. The command generated by the execution of the SEC CX may be sent to the SEC circuit either directly or via Layer 2 shared memory 524a, e.g., at the end of the second time period or after the end of the second time period. An indication of where the SEC controller halted the execution of the SEC CX at the end of the second time period may be maintained in cache 592. This process may continue for each of the remaining VDP Threads 590. Once the CX 564 for each of the VDP Threads 590 has been executed by its corresponding controller 560, the MAC controller may resume the execution of the MAC CX based on the indication maintained in cache 592 during the previous token cycle.

[0068] In some embodiments, the MAC controller may collect and calculate/display delay statistics associated with VDP_Thread_1 and maintain the delay statistics in cache 592. The delay statistics may indicate whether the generation of a command took longer than the number of clock cycles allocated by the corresponding Token k. The number of clock cycles associated with a Token k assigned to a VDP Thread 590 may be dynamically changed based on the delay statistics, as described below in connection with FIG. 7.
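A minimal sketch of per-thread delay statistics, assuming each sample is the number of clock cycles a thread needed to generate one command and that an overrun is a sample exceeding the Token k allocation. The `DelayStats` class is hypothetical, and how these statistics feed back into Token k (FIG. 7) is not modeled here.

```python
from dataclasses import dataclass, field


@dataclass
class DelayStats:
    """Hypothetical delay statistics for one VDP Thread."""
    token_cycles: int                                  # current Token k allocation
    samples: list[int] = field(default_factory=list)   # cycles needed per command

    def record(self, cycles_to_generate_command: int) -> None:
        self.samples.append(cycles_to_generate_command)

    @property
    def overruns(self) -> int:
        """Commands whose generation took longer than the Token k allocation."""
        return sum(1 for c in self.samples if c > self.token_cycles)

    @property
    def overrun_ratio(self) -> float:
        return self.overruns / len(self.samples) if self.samples else 0.0

    @property
    def mean_cycles(self) -> float:
        return sum(self.samples) / len(self.samples) if self.samples else 0.0


stats = DelayStats(token_cycles=100)
for cycles in (80, 120, 150, 90):
    stats.record(cycles)
```

Here two of the four commands overran a 100-cycle token, which is the kind of signal a scheduler could use when deciding whether to change that thread's Token k.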

[0069] Referring to FIG. 5D, in some embodiments, each VDP Thread 590 may be tightly coupled to its corresponding DP Layer 2 circuit 580a without an intervening Layer 2 shared memory (e.g., Layer 2 shared memory 524a). Here, each CX 564 is loaded directly from its corresponding CR 562, and the command(s) generated by its execution are sent to the DP Layer 2 circuit 580a immediately, without a Layer 2 shared memory and the associated memory-operation overhead. While the MAC circuit is processing the packet descriptor from the current command generated by VDP_Thread_1, DL uC 522a may execute VDP_Thread_2 based on its Token_2n. With this scheme, no command/status-queue-based interworking is needed between DL uC 522a and DL hardware accelerator block 520a, thereby eliminating the use of shared memory resources and the associated signaling.

[0070] Referring to FIG. 5E, the DP Layer 2 architecture may include multiple DP Layer 2 circuits for the same layer (e.g., two MAC circuits (not shown), two SEC circuits, two RLC circuits, two PDCP circuits, two SDAP circuits (not shown), etc.) to support both high-throughput application(s) (e.g., eMBB, massive machine-type communication (mMTC), etc.) and low-latency application(s) (e.g., URLLC). Here, DL uC 522a still seamlessly configures and runs each VDP Thread 590, where each thread is virtually mapped to its corresponding DP Layer 2 circuit 580a of the appropriate application type (e.g., eMBB or URLLC). In the example depicted in FIG. 5E, the SEC, RLC, and PDCP hardware accelerator blocks each comprise multiple SEC, RLC, and PDCP circuits, respectively. Each layer in the Layer 2 protocol stack (except the MAC layer) is mapped to its own VDP Threads 590 (e.g., three VDP Threads per layer).

[0071] FIG. 6 illustrates a flowchart of a first exemplary method 600 of wireless communication, according to embodiments of the disclosure. First exemplary method 600 may be performed by an apparatus for wireless communication, such as a UE, a baseband chip, a DP subsystem, a DP Layer 2 hardware accelerator block, and/or a microcontroller, just to name a few. Method 600 may include steps 602-612 as described below. It is to be appreciated that some of the steps may be optional, and some of the steps may be performed simultaneously, or in a different order than shown in FIG. 6.

[0072] Referring to FIG. 6, at 602, the apparatus may maintain a first instruction set for a first virtual DP thread associated with at least one first DP Layer 2 circuit of the DP Layer 2 hardware accelerator block. For example, referring to FIGs. 5A-5E, each VDP Thread k 590 has an associated instance of DP firmware (e.g., controller 560) and a CX 564 specific to the DP Layer functions and maintained in a corresponding CR 562.

[0073] At 604, the apparatus may maintain a second instruction set for at least one second virtual DP thread associated with at least one second DP Layer 2 circuit of the DP Layer 2 hardware accelerator block. For example, referring to FIGs. 5B-5D, each VDP Thread k 590 has an associated instance of DP firmware (e.g., controller 560) and a CX 564 specific to the DP Layer functions and maintained in a corresponding CR 562.

[0074] At 606, the apparatus may assign a first token associated with a first number of clock cycles to the first virtual DP thread based on a first QoS profile associated with the at least one first DP Layer 2 circuit. For example, referring to FIGs. 5A-5E, DP Layer 2 hardware block 510a includes multiple virtual DP threads in each DP Layer (except the MAC layer, which may also have multiple virtual DP threads in some scenarios), each for a differentiated QoS profile, such as a URLLC application and an eMBB application. In the example shown in FIG. 5C, the SEC, RLC, and PDCP circuits (also referred to as “hardware blocks”) each have two VDP Threads with different QoS priorities. This allows each differentiated QoS flow to be serviced with a different priority by the virtual DP thread, which is allocated a different number of Token cycles. This configuration may also enable fast and efficient thread switching among all the virtual DP threads using CX 564 stored in CR 562. Information about each virtual DP thread instance may be maintained in cache 592 so that DL uC 522a can resume the execution of CX 564 at the appropriate instruction during the next token cycle. The default number of Token k cycles for each VDP Thread k, where k is the thread number, can be initialized during setup according to its associated Logical Channel/Radio Bearer’s QoS flow profile. For instance, during the setup of a QoS flow with a QFI and an associated 5QI, DL uC 522a constructs a mapping from each flow’s 5QI attributes (e.g., priority level, traffic resource type, packet delay budget) to the associated VDP Thread’s token cycle attribute value. Each VDP Thread handles the Logical Channel(s) or Radio Bearer(s) of one or more mapped QoS flows associated with any of the MAC circuit, SEC circuit, RLC circuit, PDCP circuit, etc.
The number of Token cycles (e.g., the token cycle value) allocated to each VDP Thread by the assigned Token k can be flexibly tuned/configured during initialization or during operation (as described below in connection with FIG. 7) with the weights K1, K2, and K3, and a separate set of weights can be used for each use case scenario or application profile, as set forth above in equation (1).

[0075] At 608, the apparatus may assign a second token associated with a second number of clock cycles to the second virtual DP thread based on a second QoS profile associated with the at least one second DP Layer 2 circuit. For example, referring to FIGs. 5A-5E, DP Layer 2 hardware block 510a includes multiple virtual DP threads in each DP Layer (except the MAC layer, which may also have multiple virtual DP threads in some scenarios), each for a differentiated QoS profile, such as a URLLC application and an eMBB application. In the example shown in FIG. 5C, the SEC, RLC, and PDCP circuits (also referred to as “hardware blocks”) each have two VDP Threads with different QoS priorities. This allows each differentiated QoS flow to be serviced with a different priority by the virtual DP thread, which is allocated a different number of Token cycles. This configuration may also enable fast and efficient thread switching among all the virtual DP threads using CX 564 stored in CR 562. Information about each virtual DP thread instance may be maintained in cache 592 so that DL uC 522a can resume the execution of CX 564 at the appropriate instruction during the next token cycle. The default number of Token k cycles for each VDP Thread k, where k is the thread number, can be initialized during setup according to its associated Logical Channel/Radio Bearer’s QoS flow profile. For instance, during the setup of a QoS flow with a QFI and an associated 5QI, DL uC 522a constructs a mapping from each flow’s 5QI attributes (e.g., priority level, traffic resource type, packet delay budget) to the associated VDP Thread’s token cycle attribute value. Each VDP Thread handles the Logical Channel(s) or Radio Bearer(s) of one or more mapped QoS flows associated with any of the MAC circuit, SEC circuit, RLC circuit, PDCP circuit, etc.
The number of Token cycles (e.g., the token cycle value) allocated to each VDP Thread by the assigned Token k can be flexibly tuned/configured during initialization or during operation (as described below in connection with FIG. 7) with the weights K1, K2, and K3, and a separate set of weights can be used for each use case scenario or application profile, as set forth above in equation (1).

[0076] At 610, the apparatus may execute, using the first token, the first instruction set for the first number of clock cycles to run the first virtual DP thread during a first time period. For example, referring to FIGs. 5B-5E, the MAC controller may execute the MAC CX, using Token_1n, to run VDP_Thread_1 for a first number of clock cycles (allocated by Token_1n) during a first time period. The duration of the first time period may be the length of the first number of clock cycles. At the end of the first time period, the MAC controller may halt the execution of the MAC CX. The command generated by the execution of the MAC CX may be sent to the MAC circuit either directly or via Layer 2 shared memory 524a, e.g., at the end of the first time period or after the end of the first time period. An indication of where the MAC controller halted the execution of the MAC CX at the end of the first time period may be maintained in cache 592.

[0077] At 612, the apparatus may execute, using the second token, the second instruction set for the second number of clock cycles to run the second virtual DP thread during a second time period. For example, referring to FIGs. 5B-5E, during a second time period subsequent to and contiguous with the first time period, the SEC controller may execute the SEC CX, using Token_2n, to run VDP_Thread_2 for a second number of clock cycles (allocated by Token_2n). The duration of the second time period may be the length of the second number of clock cycles. At the end of the second time period, the SEC controller may halt the execution of the SEC CX. The command generated by the execution of the SEC CX may be sent to the SEC circuit either directly or via Layer 2 shared memory 524a, e.g., at the end of the second time period or after the end of the second time period. An indication of where the SEC controller halted the execution of the SEC CX at the end of the second time period may be maintained in cache 592. This process may continue for each of the remaining VDP Threads 590. Once the CX 564 for each of the VDP Threads 590 has been executed by its corresponding controller 560, the MAC controller may resume the execution of the MAC CX based on the indication maintained in cache 592 during the previous token cycle.

[0078] FIG. 7 illustrates a flowchart of a second exemplary method 700 of wireless communication, according to embodiments of the disclosure. Second exemplary method 700 may be performed by an apparatus for wireless communication, such as a UE, a baseband chip, a DP subsystem, a DP Layer 2 hardware accelerator block, and/or a microcontroller, just to name a few. Method 700 may include steps 702-718 as described below. It is to be appreciated that some of the steps may be optional, and some of the steps may be performed simultaneously, or in a different order than shown in FIG. 7.

[0079] Referring to FIG. 7, at 702, the apparatus may initialize the token cycles for each VDP_Thread_k to default values based on QoS requirements. For example, referring to FIGs. 5A-5E, DP Layer 2 hardware block 510a includes multiple virtual DP threads in each DP layer (other than the MAC layer, which may nonetheless also have multiple virtual DP threads in some scenarios), each for a differentiated QoS profile, such as a URLLC application and an eMBB application. In the example shown in FIG. 5C, each of the SEC, RLC, and PDCP circuits (also referred to as "hardware blocks") has two VDP Threads with different QoS priorities. This allows each differentiated QoS flow to be serviced with a different priority by a virtual DP thread that is granted a different number of token cycles. Switching among the virtual DP threads is fast and efficient because CX 564 is stored in CR 562 and information about each virtual DP thread instance may be maintained in cache 592, so that DL uC 522a may resume execution of CX 564 at the appropriate instruction during the next token cycle. The default number of Token_k cycles for each VDP_Thread_k, where k is the thread number, can be initialized during setup according to the QoS flow profile of its associated logical channel/radio bearer. For instance, during setup of a QoS flow having a QoS flow identifier (QFI) and an associated 5G QoS identifier (5QI), DL uC 522a constructs a mapping from each flow's 5QI attributes (e.g., priority level, traffic resource type, packet delay budget) to the token cycle attribute value of the associated VDP Thread. Each VDP Thread handles the logical channels or radio bearers of one or more mapped QoS flows associated with any of the MAC circuit, SEC circuit, RLC circuit, PDCP circuit, etc.
The number of clock cycles (also referred to herein as the Token_k cycle value) allocated to each VDP_Thread_k by its assigned Token_k can be flexibly tuned during initialization using the weights K1, K2, and K3, and a separate set of weights can be provided for each use case scenario or application profile, as set forth above in equation (1).
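Equation (1) is not reproduced in this section, so the Python sketch below uses an assumed stand-in for it: each 5QI attribute (priority level, resource type, packet delay budget) is normalized to a 0-1 term, the terms are weighted by K1, K2, and K3, and the weighted score scales a base cycle count. The names `init_token_cycles`, `RESOURCE_TYPE_SCORE`, `base_cycles`, and all numeric values are hypothetical and only illustrate the attribute-to-token mapping performed at 702.

```python
from dataclasses import dataclass

# Hypothetical scores for the 5QI resource type: more latency-critical -> larger.
RESOURCE_TYPE_SCORE = {"delay_critical_gbr": 1.0, "gbr": 0.5, "non_gbr": 0.0}


@dataclass(frozen=True)
class QosProfile:
    priority_level: int          # 5QI priority level (assumed range 1..127, lower = higher priority)
    resource_type: str           # key into RESOURCE_TYPE_SCORE
    packet_delay_budget_ms: int  # packet delay budget in milliseconds


@dataclass(frozen=True)
class TokenWeights:
    k1: float  # weight on the priority term
    k2: float  # weight on the resource-type term
    k3: float  # weight on the delay-budget term


def init_token_cycles(profile, weights, base_cycles=64, max_pdb_ms=1000):
    """Step 702: map a flow's QoS attributes to a default Token_k (assumed formula)."""
    # Each term lies in [0, 1]; a larger value means the flow deserves more cycles.
    priority_term = 1.0 - (min(max(profile.priority_level, 1), 127) - 1) / 126
    resource_term = RESOURCE_TYPE_SCORE[profile.resource_type]
    delay_term = 1.0 - min(profile.packet_delay_budget_ms, max_pdb_ms) / max_pdb_ms
    score = weights.k1 * priority_term + weights.k2 * resource_term + weights.k3 * delay_term
    return max(1, round(base_cycles * (1.0 + score)))


# One weight set per use case / application profile (values are illustrative only).
WEIGHTS_BY_USE_CASE = {"default": TokenWeights(1.0, 1.0, 1.0)}

urllc_like = QosProfile(priority_level=19, resource_type="delay_critical_gbr", packet_delay_budget_ms=10)
embb_like = QosProfile(priority_level=90, resource_type="non_gbr", packet_delay_budget_ms=300)
weights = WEIGHTS_BY_USE_CASE["default"]
urllc_tokens = init_token_cycles(urllc_like, weights)
embb_tokens = init_token_cycles(embb_like, weights)
```

Keeping the weights in a per-use-case table mirrors the idea that each application profile can carry its own K1, K2, K3 set, so retuning does not require changing the mapping code.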

[0080] At 704, the apparatus may perform a thread context switch to each VDP_Thread_k in round-robin order. For example, referring to FIGs. 5A-5E, baseband chip 402 may run multiple virtual DP threads within a single uC, with each thread executed in a custom-grained, interleaved round-robin fashion using a tunable, dynamic number of token cycles per run according to the thread's QoS profile. Each virtual DP thread context is maintained in a fast, local, customized register set at the microcontroller and is switched in and out quickly without accessing external memory.
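As a rough illustration of the interleaving at 704, the sketch below lays out back-to-back token periods in round-robin order, so that each period starts on the clock cycle where the previous one ended. The function name `round_robin_slots` and the example token lengths are hypothetical; real token lengths would come from the initialization at 702.

```python
from itertools import cycle, islice


def round_robin_slots(tokens, num_slots):
    """Lay out num_slots contiguous token periods in round-robin order.

    tokens: dict mapping a thread name to its Token_k length in clock cycles;
    the dict's insertion order is used as the round-robin order.
    Returns (thread, start_cycle, end_cycle) tuples with an exclusive end.
    """
    slots = []
    clock = 0
    for thread in islice(cycle(tokens), num_slots):
        length = tokens[thread]
        slots.append((thread, clock, clock + length))
        clock += length  # the next period begins exactly where this one ends
    return slots


timeline = round_robin_slots({"VDP_Thread_1": 8, "VDP_Thread_2": 3}, 4)
```

Here `timeline` holds four contiguous periods alternating between the two threads: cycles 0-8 and 11-19 for VDP_Thread_1, and 8-11 and 19-22 for VDP_Thread_2.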

[0081] At 706, the apparatus may determine whether any CX remains to be executed/processed for VDP_Thread_k. If additional CX remains for VDP_Thread_k (YES at 706), the operations may move to 708. Otherwise, when no CX remains for VDP_Thread_k (NO at 706), the operations may return to 704, where the apparatus may switch to the next thread, VDP_Thread_k+1.

[0082] At 708, the apparatus may run VDP_Thread_k for the number of clock cycles associated with Token_k. For example, referring to FIGs. 5A-5E, the MAC controller may execute the MAC CX, using Token_1, to run VDP_Thread_1 for a first number of clock cycles (allocated by Token_1) during a first time period. The duration of the first time period may be the length of the first number of clock cycles. At the end of the first time period, the MAC controller may halt execution of the MAC CX. The command generated by executing the MAC CX may be sent to the MAC circuit either directly or via Layer 2 shared memory 524a, e.g., at the end of the first time period or after the end of the first time period.
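Steps 704-708 can be modeled in software as a scheduler that runs each thread for at most Token_k instructions, halts it, and resumes it later from a saved per-thread program counter, standing in for the context kept in CR 562 and cache 592. This Python sketch assumes every CX instruction costs one clock cycle and that an instruction tagged `emit:` completes a command for the Layer 2 circuit; `VdpThread`, `run_token`, and `schedule` are hypothetical names. Unlike the hardware, the sketch ends a period early when a thread runs out of CX.

```python
from dataclasses import dataclass, field


@dataclass
class VdpThread:
    """Hypothetical model of one virtual DP thread and its saved context."""
    name: str
    token_cycles: int                 # Token_k, in clock cycles
    cx: list                          # instruction list; each entry costs one cycle here
    pc: int = 0                       # saved resume point (the per-thread context)
    commands: list = field(default_factory=list)

    def __post_init__(self):
        if self.token_cycles < 1:
            raise ValueError("a token must grant at least one clock cycle")

    def has_work(self):
        return self.pc < len(self.cx)


def run_token(thread):
    """Step 708: execute up to Token_k instructions, then halt and keep the context."""
    used = 0
    while used < thread.token_cycles and thread.has_work():
        instr = thread.cx[thread.pc]
        thread.pc += 1
        used += 1
        if instr.startswith("emit:"):
            # A completed portion of CX yields a command for the Layer 2 circuit.
            thread.commands.append(instr[len("emit:"):])
    return used


def schedule(threads):
    """Steps 704-708: round-robin over the threads, skipping any with no CX left."""
    trace = []
    while any(t.has_work() for t in threads):
        for t in threads:
            if not t.has_work():  # NO at 706: switch to the next thread
                continue
            used = run_token(t)
            trace.append((t.name, used, t.pc))
    return trace


t1 = VdpThread("VDP_Thread_1", 3, ["op", "op", "emit:cmd_a", "op", "emit:cmd_b"])
t2 = VdpThread("VDP_Thread_2", 2, ["op", "emit:cmd_c"])
trace = schedule([t1, t2])
```

VDP_Thread_1 is halted after three cycles with `cmd_a` produced, VDP_Thread_2 then finishes its CX, and VDP_Thread_1 resumes at its saved instruction to produce `cmd_b`, without re-executing the first portion.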

[0083] At 710, the apparatus may collect and calculate statistics on the processing delay T_k for VDP_Thread_k. For example, referring to FIGs. 5A-5E, the MAC controller may collect, calculate, and display delay statistics associated with VDP_Thread_1 and maintain the delay statistics in cache 592. The delay statistics may indicate whether generating a command took longer than the number of clock cycles allocated by the corresponding Token_k.
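One way to keep the statistics at 710 in constant memory, which suits a small cache, is to hold running totals rather than a list of samples. The sketch below assumes T_k is measured in clock cycles and counts an "overrun" whenever T_k exceeds the cycles granted by Token_k; the class `DelayStats` and its fields are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DelayStats:
    """Hypothetical running delay statistics for one VDP_Thread_k (step 710)."""
    count: int = 0
    total: int = 0      # sum of T_k samples, in clock cycles
    worst: int = 0      # largest T_k observed
    overruns: int = 0   # samples where T_k exceeded the cycles granted by Token_k

    def record(self, t_k, token_cycles):
        self.count += 1
        self.total += t_k
        self.worst = max(self.worst, t_k)
        if t_k > token_cycles:
            self.overruns += 1

    @property
    def mean(self):
        return self.total / self.count if self.count else 0.0


stats = DelayStats()
stats.record(5, 8)    # command generated within the token
stats.record(10, 8)   # command needed more cycles than the token granted
```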

[0084] At 712, the apparatus may determine whether the processing delay T_k (e.g., processing duration) is greater than or equal to a processing delay/duration threshold T_k_thresh for a first predetermined number of times (e.g., Countup_k). Countup_k (e.g., the first predetermined number of times) may be any integer value greater than or equal to 1. For example, the MAC controller may compare the processing delay/duration T_k with T_k_thresh to determine whether T_k is greater than or equal to T_k_thresh. In response to T_k being greater than or equal to T_k_thresh for the first predetermined number of times (YES at 712), the operations may move to 714. Otherwise (NO at 712), the operations may move to 716.

[0085] At 714, the apparatus may increase the number of clock cycles associated with Token_k, i.e., Token_k += Token_k_deltaUp. In some embodiments, the number of clock cycles may be increased by one clock cycle. In some other embodiments, the number of clock cycles may be increased by an integer number of clock cycles greater than one. By way of example, DL uC 522a may increase the number of clock cycles associated with Token_1 for VDP_Thread_1 by a number of clock cycles (e.g., one or more additional clock cycles) proportional to the amount by which T_k exceeds T_k_thresh.

[0086] At 716, the apparatus may determine whether the processing delay/duration T_k is less than or equal to the number of clock cycles initially set for Token_k (e.g., T_k_max) for a second predetermined number of times (e.g., Countdown_k). Countdown_k (e.g., the second predetermined number of times) may be any integer value greater than or equal to 1. In response to T_k not being less than or equal to T_k_max for the second predetermined number of times (NO at 716), the operations may return to 704. On the other hand, in response to T_k being less than or equal to T_k_max for the second predetermined number of times (YES at 716), the operations may move to 718.

[0087] At 718, the apparatus may decrease the number of clock cycles associated with Token_k, i.e., Token_k -= Token_k_deltaDown. In some embodiments, the number of clock cycles may be decreased by one clock cycle. In some other embodiments, the number of clock cycles may be decreased by an integer number of clock cycles greater than one. By way of example, DL uC 522a may decrease the number of clock cycles associated with Token_1 for VDP_Thread_1 by a number of clock cycles (e.g., one or more fewer clock cycles) proportional to the difference between T_k and T_k_max.
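Steps 712-718 amount to a feedback loop with hysteresis: the token grows only after repeated slow runs and shrinks only after repeated fast ones. The Python sketch below models one thread's controller under stated assumptions: "for a predetermined number of times" is read as consecutive occurrences (the text does not say), the fixed-step form Token_k += Token_k_deltaUp / Token_k -= Token_k_deltaDown is used instead of the proportional variant, and T_k_thresh is assumed to be larger than T_k_max. `TokenController` and its default values are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class TokenController:
    """Sketch of steps 712-718 for one VDP_Thread_k (consecutive-count assumption)."""
    token: int            # current Token_k, in clock cycles
    t_thresh: int         # T_k_thresh
    t_max: int            # T_k_max (cycles initially set for Token_k)
    countup: int = 3      # Countup_k
    countdown: int = 3    # Countdown_k
    delta_up: int = 1     # Token_k_deltaUp
    delta_down: int = 1   # Token_k_deltaDown
    min_token: int = 1    # never shrink a token below one cycle
    _over: int = field(default=0, init=False, repr=False)
    _under: int = field(default=0, init=False, repr=False)

    def update(self, t_k):
        """Feed one processing-delay sample T_k; return the possibly adjusted Token_k."""
        if t_k >= self.t_thresh:              # check at 712
            self._over += 1
            self._under = 0
            if self._over >= self.countup:    # YES at 712
                self.token += self.delta_up   # 714
                self._over = 0
            return self.token
        self._over = 0
        if t_k <= self.t_max:                 # check at 716
            self._under += 1
            if self._under >= self.countdown:  # YES at 716
                self.token = max(self.min_token, self.token - self.delta_down)  # 718
                self._under = 0
        else:
            self._under = 0
        return self.token


ctrl = TokenController(token=10, t_thresh=12, t_max=10, countup=2, countdown=2)
history = [ctrl.update(t_k) for t_k in (13, 14, 9, 8)]
```

Two consecutive slow samples raise the token from 10 to 11, and two consecutive fast samples bring it back to 10. Requiring several consecutive samples before changing the token keeps a single outlier from causing the allocation to oscillate.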

[0088] In various aspects of the present disclosure, the functions described herein may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on or encoded as instructions or code on a non-transitory computer-readable medium. Computer-readable media includes computer storage media. Storage media may be any available media that can be accessed by a computing device, such as node 300 in FIG. 3. By way of example, and not limitation, such computer-readable media can include RAM, ROM, EEPROM, CD-ROM or other optical disk storage, HDD, such as magnetic disk storage or other magnetic storage devices, Flash drive, SSD, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a processing system, such as a mobile device or a computer. Disk and disc, as used herein, include CD, laser disc, optical disc, digital video disc (DVD), and floppy disk, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.

[0089] According to one aspect of the present disclosure, a baseband chip is provided. The baseband chip may include a DP Layer 2 hardware accelerator block comprising a plurality of DP Layer 2 circuits. The baseband chip may further include a microcontroller. The microcontroller may include a first instruction set for a first virtual DP thread associated with at least one first DP Layer 2 circuit of the DP Layer 2 hardware accelerator block and a second instruction set for at least one second virtual DP thread associated with at least one second DP Layer 2 circuit of the DP Layer 2 hardware accelerator block.
The microcontroller may be configured to assign a first token associated with a first number of clock cycles to the first virtual DP thread based on a first QoS profile associated with the at least one first DP Layer 2 circuit. The microcontroller may be configured to assign a second token associated with a second number of clock cycles to the second virtual DP thread based on a second QoS profile associated with the at least one second DP Layer 2 circuit. The microcontroller may be configured to execute, using the first token, the first instruction set for the first number of clock cycles to run the first virtual DP thread during a first time period. The microcontroller may be configured to execute, using the second token, the second instruction set for the second number of clock cycles to run the second virtual DP thread during a second time period that follows the first time period.

[0090] In some embodiments, the microcontroller may further include a first register configured to maintain the first instruction set associated with the first virtual DP thread. In some embodiments, the microcontroller may further include a second register configured to maintain the second instruction set associated with the second virtual DP thread.

[0091] In some embodiments, the microcontroller may be configured to execute, using the first token, the first instruction set for the first number of clock cycles to run the first virtual DP thread during a third time period following the second time period. In some embodiments, the microcontroller may be configured to execute, using the second token, the second instruction set for the second number of clock cycles to run the second virtual DP thread during a fourth time period following the third time period. In some embodiments, the first time period, the second time period, the third time period, and the fourth time period may be contiguous in a time-domain.

[0092] In some embodiments, to execute, using the first token, the first instruction set for the first number of clock cycles to run the first virtual DP thread during the first time period, the microcontroller may be configured to access the first instruction set from the first register at a start of the first time period associated with the first number of clock cycles. In some embodiments, to execute, using the first token, the first instruction set for the first number of clock cycles to run the first virtual DP thread during the first time period, the microcontroller may be configured to execute a first portion of the first instruction set for the first number of clock cycles to generate a first command during the first time period. In some embodiments, to execute, using the first token, the first instruction set for the first number of clock cycles to run the first virtual DP thread during the first time period, the microcontroller may be configured to halt an execution of the first portion of the first instruction set at an end of the first time period associated with the first number of clock cycles.
In some embodiments, to execute, using the first token, the first instruction set for the first number of clock cycles to run the first virtual DP thread during the first time period, the microcontroller may be configured to send the first command to the first DP Layer 2 circuit during the first time period.

[0093] In some embodiments, to execute, using the second token, the second instruction set for the second number of clock cycles to run the second virtual DP thread during the second time period, the microcontroller may be configured to access the second instruction set from the second register at a start of the second time period associated with the second number of clock cycles. In some embodiments, to execute, using the second token, the second instruction set for the second number of clock cycles to run the second virtual DP thread during the second time period, the microcontroller may be configured to execute a first portion of the second instruction set for the second number of clock cycles to generate a second command during the second time period. In some embodiments, to execute, using the second token, the second instruction set for the second number of clock cycles to run the second virtual DP thread during the second time period, the microcontroller may be configured to halt an execution of the first portion of the second instruction set at an end of the second time period associated with the second number of clock cycles. In some embodiments, to execute, using the second token, the second instruction set for the second number of clock cycles to run the second virtual DP thread during the second time period, the microcontroller may be configured to send the second command to the second DP Layer 2 circuit at the end of the second time period.

[0094] In some embodiments, to execute, using the first token, the first instruction set for the first number of clock cycles to run the first virtual DP thread during the third time period, the microcontroller may be configured to access the first instruction set from the first register at a start of the third time period associated with the first number of clock cycles. In some embodiments, to execute, using the first token, the first instruction set for the first number of clock cycles to run the first virtual DP thread during the third time period, the microcontroller may be configured to execute a second portion of the first instruction set for the first number of clock cycles to generate a third command, the second portion of the first instruction set being different than the first portion of the first instruction set. In some embodiments, to execute, using the first token, the first instruction set for the first number of clock cycles to run the first virtual DP thread during the third time period, the microcontroller may be configured to halt an execution of the second portion of the first instruction set at an end of the third time period associated with the first number of clock cycles. In some embodiments, to execute, using the first token, the first instruction set for the first number of clock cycles to run the first virtual DP thread during the third time period, the microcontroller may be configured to send the third command to the first DP Layer 2 circuit at the end of the third time period.

[0095] In some embodiments, to execute, using the second token, the second instruction set for the second number of clock cycles to run the second virtual DP thread during the fourth time period, the microcontroller may be configured to access the second instruction set from the second register at a start of the fourth time period associated with the second number of clock cycles. In some embodiments, to execute, using the second token, the second instruction set for the second number of clock cycles to run the second virtual DP thread during the fourth time period, the microcontroller may be configured to execute a second portion of the second instruction set for the second number of clock cycles to generate a fourth command during the fourth time period. In some embodiments, the second portion of the second instruction set may be different than the first portion of the second instruction set. In some embodiments, to execute, using the second token, the second instruction set for the second number of clock cycles to run the second virtual DP thread during the fourth time period, the microcontroller may be configured to halt an execution of the second portion of the second instruction set at an end of the fourth time period associated with the second number of clock cycles. In some embodiments, to execute, using the second token, the second instruction set for the second number of clock cycles to run the second virtual DP thread during the fourth time period, the microcontroller may be configured to send the fourth command to the second DP Layer 2 circuit at the end of the fourth time period.

[0096] In some embodiments, the baseband chip may further include Layer 2 shared memory coupled to the microcontroller and the plurality of DP Layer 2 circuits. In some embodiments, the first command may be sent to the first DP Layer 2 circuit via the Layer 2 shared memory.

[0097] In some embodiments, the first command may be sent directly to the first DP Layer 2 circuit.

[0098] In some embodiments, the microcontroller may further include a third instruction set for a third virtual DP thread associated with the at least one first DP Layer 2 circuit of the DP Layer 2 hardware accelerator block and a fourth instruction set for a fourth virtual DP thread associated with the at least one second DP Layer 2 circuit of the DP Layer 2 hardware accelerator block. In some embodiments, the microcontroller may be configured to assign a third token associated with a third number of clock cycles to the third virtual DP thread based on a third QoS profile associated with the first DP Layer 2 circuit. In some embodiments, the microcontroller may be configured to assign a fourth token associated with a fourth number of clock cycles to the fourth virtual DP thread based on a fourth QoS profile associated with the second DP Layer 2 circuit. In some embodiments, the microcontroller may be configured to execute, using the third token, the third instruction set for the third number of clock cycles to run the third virtual DP thread during a third time period following the first time period and before the second time period. In some embodiments, the microcontroller may be configured to execute, using the fourth token, the fourth instruction set for the fourth number of clock cycles to run the fourth virtual DP thread during a fourth time period following the second time period.

[0099] In some embodiments, the first token may be assigned based on a first QoS priority level associated with the first QoS profile, a first resource type associated with the first QoS profile, a first packet delay budget associated with the first QoS profile, and a first set of weights. In some embodiments, the second token may be assigned based on a second QoS priority level associated with the second QoS profile, a second resource type associated with the second QoS profile, a second packet delay budget associated with the second QoS profile, and a second set of weights.

[0100] In some embodiments, the microcontroller may be further configured to determine, based on the first virtual DP thread being run, a processing delay associated with the first virtual DP thread during the first time period. In some embodiments, in response to the processing delay meeting a first delay threshold for a first predetermined number of times, the microcontroller may be further configured to increase the first number of clock cycles associated with the first token for a third time period following the second time period. In some embodiments, in response to the processing delay meeting a second delay threshold for a second predetermined number of times, the microcontroller may be further configured to decrease the first number of clock cycles associated with the first token for the third time period following the second time period.

[0101] In some embodiments, the at least one first DP Layer 2 circuit may include a plurality of first Layer 2 hardware circuits each associated with a first Layer 2 entity. In some embodiments, the at least one second DP Layer 2 circuit may include a plurality of second Layer 2 hardware circuits each associated with a second Layer 2 entity different than the first Layer 2 entity. In some embodiments, the first virtual DP thread may be run during the first time period for each of the plurality of first Layer 2 hardware circuits associated with the first Layer 2 entity. In some embodiments, the second virtual DP thread may be run during the second time period for each of the plurality of second Layer 2 hardware circuits associated with the second Layer 2 entity.

[0102] According to another aspect of the present disclosure, a DP Layer 2 hardware block for a baseband chip is provided. The DP Layer 2 hardware block may include a plurality of DP Layer 2 circuits. The DP Layer 2 hardware block may also include a microcontroller. The microcontroller may include a first instruction set for a first virtual DP thread associated with at least one first DP Layer 2 circuit of the plurality of DP Layer 2 circuits and a second instruction set for at least one second virtual DP thread associated with at least one second DP Layer 2 circuit of the plurality of DP Layer 2 circuits. The microcontroller may be configured to assign a first token associated with a first number of clock cycles to the first virtual DP thread based on a first QoS profile associated with the at least one first DP Layer 2 circuit. The microcontroller may be configured to assign a second token associated with a second number of clock cycles to the second virtual DP thread based on a second QoS profile associated with the at least one second DP Layer 2 circuit. The microcontroller may be configured to execute, using the first token, the first instruction set for the first number of clock cycles to run the first virtual DP thread during a first time period. The microcontroller may be configured to execute, using the second token, the second instruction set for the second number of clock cycles to run the second virtual DP thread during a second time period that follows the first time period.

[0103] In some embodiments, the microcontroller may be further configured to execute, using the first token, the first instruction set for the first number of clock cycles to run the first virtual DP thread during a third time period. In some embodiments, the microcontroller may be further configured to execute, using the second token, the second instruction set for the second number of clock cycles to run the second virtual DP thread during a fourth time period. In some embodiments, the third time period may follow the second time period, and the fourth time period may follow the third time period. In some embodiments, the first time period, the second time period, the third time period, and the fourth time period may be contiguous in a time-domain.

[0104] In some embodiments, the microcontroller may further include a first register configured to maintain the first instruction set associated with the first virtual DP thread. In some embodiments, the microcontroller may further include a second register configured to maintain the second instruction set associated with the second virtual DP thread. In some embodiments, to execute, using the first token, the first instruction set for the first number of clock cycles to run the first virtual DP thread during the first time period, the microcontroller may be configured to access the first instruction set from the first register at a start of the first time period associated with the first number of clock cycles. In some embodiments, to execute, using the first token, the first instruction set for the first number of clock cycles to run the first virtual DP thread during the first time period, the microcontroller may be configured to execute a first portion of the first instruction set for the first number of clock cycles to generate a first command during the first time period. In some embodiments, to execute, using the first token, the first instruction set for the first number of clock cycles to run the first virtual DP thread during the first time period, the microcontroller may be configured to halt an execution of the first portion of the first instruction set at an end of the first time period associated with the first number of clock cycles. In some embodiments, to execute, using the first token, the first instruction set for the first number of clock cycles to run the first virtual DP thread during the first time period, the microcontroller may be configured to send the first command to the first DP Layer 2 circuit during the first time period.

[0105] In some embodiments, the microcontroller may be further configured to determine, based on the first virtual DP thread being run, a processing delay associated with the first virtual DP thread during the first time period. In some embodiments, in response to the processing delay meeting a first delay threshold for a first predetermined number of times, the microcontroller may be configured to increase the first number of clock cycles associated with the first token for a third time period following the second time period. In some embodiments, in response to the processing delay meeting a second delay threshold for a second predetermined number of times, the microcontroller may be configured to decrease the first number of clock cycles associated with the first token for the third time period following the second time period.

[0106] According to yet another aspect of the present disclosure, a method of wireless communication of a baseband chip is provided. The method may include maintaining, by a first register of a microcontroller, a first instruction set for a first virtual DP thread associated with at least one first DP Layer 2 circuit of a DP Layer 2 hardware accelerator block. The method may include maintaining, by a second register of the microcontroller, a second instruction set for a second virtual DP thread associated with at least one second DP Layer 2 circuit of the DP Layer 2 hardware accelerator block. The method may include assigning, by the microcontroller, a first token associated with a first number of clock cycles to the first virtual DP thread based on a first QoS profile associated with the at least one first DP Layer 2 circuit of the DP Layer 2 hardware accelerator block. The method may include assigning, by the microcontroller, a second token associated with a second number of clock cycles to the second virtual DP thread based on a second QoS profile associated with the at least one second DP Layer 2 circuit of the DP Layer 2 hardware accelerator block. The method may include executing, by the microcontroller, the first instruction set for the first number of clock cycles using the first token to run the first virtual DP thread during a first time period. The method may include executing, by the microcontroller, the second instruction set for the second number of clock cycles using the second token to run the second virtual DP thread during a second time period that follows the first time period.

[0107] In some embodiments, the executing, by the microcontroller, the first instruction set for the first number of clock cycles using the first token to run the first virtual DP thread during the first time period may include accessing, by the microcontroller, the first instruction set from the first register at a start of the first time period associated with the first number of clock cycles. In some embodiments, the executing, by the microcontroller, the first instruction set for the first number of clock cycles using the first token to run the first virtual DP thread during the first time period may include executing, by the microcontroller, a first portion of the first instruction set for the first number of clock cycles to generate a first command during the first time period. In some embodiments, the executing, by the microcontroller, the first instruction set for the first number of clock cycles using the first token to run the first virtual DP thread during the first time period may include halting, by the microcontroller, an execution of the first portion of the first instruction set at an end of the first time period associated with the first number of clock cycles. In some embodiments, the executing, by the microcontroller, the first instruction set for the first number of clock cycles using the first token to run the first virtual DP thread during the first time period may include sending, by the microcontroller, the first command to the first DP Layer 2 circuit during the first time period. In some embodiments, the executing, by the microcontroller, the second instruction set for the second number of clock cycles using the second token to run the second virtual DP thread during the second time period includes accessing, by the microcontroller, the second instruction set from the second register at a start of the second time period associated with the second number of clock cycles. 
In some embodiments, the executing, by the microcontroller, the second instruction set for the second number of clock cycles using the second token to run the second virtual DP thread during the second time period includes executing, by the microcontroller, a first portion of the second instruction set for the second number of clock cycles to generate a second command during the second time period. In some embodiments, the executing, by the microcontroller, the second instruction set for the second number of clock cycles using the second token to run the second virtual DP thread during the second time period includes halting, by the microcontroller, an execution of the first portion of the second instruction set at an end of the second time period associated with the second number of clock cycles. In some embodiments, the executing, by the microcontroller, the second instruction set for the second number of clock cycles using the second token to run the second virtual DP thread during the second time period includes sending, by the microcontroller, the second command to the second DP Layer 2 circuit at the end of the second time period.

[0108] In some embodiments, the method may include determining, based on the first virtual DP thread being run during the first time period, a processing delay associated with the first virtual DP thread during the first time period. In some embodiments, in response to the processing delay meeting a first delay threshold for a first predetermined number of times, the method may include increasing the first number of clock cycles associated with the first token for a third time period following the second time period. In some embodiments, in response to the processing delay meeting a second delay threshold for a second predetermined number of times, the method may include decreasing the first number of clock cycles associated with the first token for the third time period following the second time period.

[0109] The foregoing description of the specific embodiments will so reveal the general nature of the present disclosure that others can, by applying knowledge within the skill of the art, readily modify and/or adapt for various applications such specific embodiments, without undue experimentation, without departing from the general concept of the present disclosure. Therefore, such adaptations and modifications are intended to be within the meaning and range of equivalents of the disclosed embodiments, based on the teaching and guidance presented herein. It is to be understood that the phraseology or terminology herein is for the purpose of description and not of limitation, such that the terminology or phraseology of the present specification is to be interpreted by the skilled artisan in light of the teachings and guidance.

[0110] Embodiments of the present disclosure have been described above with the aid of functional building blocks illustrating the implementation of specified functions and relationships thereof. The boundaries of these functional building blocks have been arbitrarily defined herein for the convenience of the description. Alternate boundaries can be defined so long as the specified functions and relationships thereof are appropriately performed.

[0111] The Summary and Abstract sections may set forth one or more but not all exemplary embodiments of the present disclosure as contemplated by the inventor(s), and thus, are not intended to limit the present disclosure and the appended claims in any way.

[0112] Various functional blocks, modules, and steps are disclosed above. The particular arrangements provided are illustrative and without limitation. Accordingly, the functional blocks, modules, and steps may be re-ordered or combined in different ways than in the examples provided above. Likewise, certain embodiments include only a subset of the functional blocks, modules, and steps, and any such subset is permitted.

[0113] The breadth and scope of the present disclosure should not be limited by any of the above-described exemplary embodiments, but should be defined only in accordance with the following claims and their equivalents.

Claims

WHAT IS CLAIMED IS:

1. A baseband chip, comprising: a dataplane (DP) Layer 2 hardware accelerator block comprising a plurality of DP Layer 2 circuits; and a microcontroller comprising a first instruction set for a first virtual DP thread associated with at least one first DP Layer 2 circuit of the DP Layer 2 hardware accelerator block and a second instruction set for at least one second virtual DP thread associated with at least one second DP Layer 2 circuit of the DP Layer 2 hardware accelerator block, wherein the microcontroller is configured to: assign a first token associated with a first number of clock cycles to the first virtual DP thread based on a first quality-of-service (QoS) profile associated with the at least one first DP Layer 2 circuit; assign a second token associated with a second number of clock cycles to the second virtual DP thread based on a second QoS profile associated with the at least one second DP Layer 2 circuit; execute, using the first token, the first instruction set for the first number of clock cycles to run the first virtual DP thread during a first time period; and execute, using the second token, the second instruction set for the second number of clock cycles to run the second virtual DP thread during a second time period, wherein the second time period follows the first time period.

2. The baseband chip of claim 1, wherein the microcontroller further comprises: a first register configured to maintain the first instruction set associated with the first virtual DP thread; and a second register configured to maintain the second instruction set associated with the second virtual DP thread.

3. The baseband chip of claim 2, wherein the microcontroller is further configured to: execute, using the first token, the first instruction set for the first number of clock cycles to run the first virtual DP thread during a third time period following the second time period; and execute, using the second token, the second instruction set for the second number of clock cycles to run the second virtual DP thread during a fourth time period following the third time period, wherein the first time period, the second time period, the third time period, and the fourth time period are contiguous in a time-domain.

4. The baseband chip of claim 3, wherein, to execute, using the first token, the first instruction set for the first number of clock cycles to run the first virtual DP thread during the first time period, the microcontroller is configured to: access the first instruction set from the first register at a start of the first time period associated with the first number of clock cycles; execute a first portion of the first instruction set for the first number of clock cycles to generate a first command during the first time period; halt an execution of the first portion of the first instruction set at an end of the first time period associated with the first number of clock cycles; and send the first command to the first DP Layer 2 circuit during the first time period.

5. The baseband chip of claim 4, wherein, to execute, using the second token, the second instruction set for the second number of clock cycles to run the second virtual DP thread during the second time period, the microcontroller is configured to: access the second instruction set from the second register at a start of the second time period associated with the second number of clock cycles; execute a first portion of the second instruction set for the second number of clock cycles to generate a second command during the second time period; halt an execution of the first portion of the second instruction set at an end of the second time period associated with the second number of clock cycles; and send the second command to the second DP Layer 2 circuit at the end of the second time period.

6. The baseband chip of claim 5, wherein, to execute, using the first token, the first instruction set for the first number of clock cycles to run the first virtual DP thread during the third time period, the microcontroller is configured to: access the first instruction set from the first register at a start of the third time period associated with the first number of clock cycles; execute a second portion of the first instruction set for the first number of clock cycles to generate a third command, the second portion of the first instruction set being different than the first portion of the first instruction set; halt an execution of the second portion of the first instruction set at an end of the third time period associated with the first number of clock cycles; and send the third command to the first DP Layer 2 circuit at the end of the third time period.

7. The baseband chip of claim 6, wherein, to execute, using the second token, the second instruction set for the second number of clock cycles to run the second virtual DP thread during the fourth time period, the microcontroller is configured to: access the second instruction set from the second register at a start of the fourth time period associated with the second number of clock cycles; execute a second portion of the second instruction set for the second number of clock cycles to generate a fourth command during the fourth time period, the second portion of the second instruction set being different than the first portion of the second instruction set; halt an execution of the second portion of the second instruction set at an end of the fourth time period associated with the second number of clock cycles; and send the fourth command to the second DP Layer 2 circuit at the end of the fourth time period.
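Claims 4 through 7 describe each thread resuming where it was halted, so that successive time periods execute different portions of the same instruction set. A minimal sketch, assuming each instruction costs one clock cycle, that the per-thread register can be modeled as the instruction list plus a saved position (`pc`), and that a command is simply the list of operations executed in that period; `ThreadContext`, `run_time_period`, and the operation names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class ThreadContext:
    """State held in one thread's register: its instruction set and where it last halted."""
    instructions: list  # the thread's instruction set; each entry assumed to cost one cycle
    token_cycles: int   # cycles granted by the thread's token per time period
    pc: int = 0         # position at which execution halted in the previous period

def run_time_period(ctx: ThreadContext) -> dict:
    """Execute one portion for at most token_cycles cycles, halt, and return a command."""
    start = ctx.pc  # access the saved state at the start of the period
    end = min(start + ctx.token_cycles, len(ctx.instructions))
    portion = ctx.instructions[start:end]
    ctx.pc = end    # halt at the end of the period; the next period resumes from here
    return {"ops": portion}  # hypothetical command format for the DP Layer 2 circuit

t1 = ThreadContext([f"pdcp_op{i}" for i in range(6)], token_cycles=3)
t2 = ThreadContext([f"rlc_op{i}" for i in range(4)], token_cycles=2)

# First, second, third, and fourth time periods, in that order.
commands = [
    ("first_circuit", run_time_period(t1)),   # first portion of the first instruction set
    ("second_circuit", run_time_period(t2)),  # first portion of the second instruction set
    ("first_circuit", run_time_period(t1)),   # second portion of the first instruction set
    ("second_circuit", run_time_period(t2)),  # second portion of the second instruction set
]
```

What the real register holds (a program counter, working registers, or both) is not specified in the claims; the saved `pc` here is only the simplest state that lets each period pick up a new portion.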

8. The baseband chip of claim 4, further comprising: a Layer 2 shared memory coupled to the microcontroller and the plurality of DP Layer 2 circuits, wherein the first command is sent to the first DP Layer 2 circuit via the Layer 2 shared memory.

9. The baseband chip of claim 4, wherein the first command is sent directly to the first DP Layer 2 circuit.
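Claims 8 and 9 differ only in how a command reaches its circuit. A minimal sketch of both paths, assuming the Layer 2 shared memory can be modeled as one FIFO of pending commands per circuit that the circuit drains on its own schedule; `DPLayer2Circuit`, `Layer2SharedMemory`, `send_command`, and their methods are hypothetical.

```python
from collections import deque

class DPLayer2Circuit:
    """Stand-in for one DP Layer 2 circuit that consumes commands (hypothetical interface)."""

    def __init__(self, name: str):
        self.name = name
        self.received = []

    def accept(self, command) -> None:
        self.received.append(command)

class Layer2SharedMemory:
    """Shared memory modeled as one FIFO of pending commands per circuit (claim 8)."""

    def __init__(self):
        self.queues = {}

    def post(self, circuit_name: str, command) -> None:
        self.queues.setdefault(circuit_name, deque()).append(command)

    def drain_into(self, circuit: DPLayer2Circuit) -> None:
        """Let a circuit fetch every command queued for it, oldest first."""
        queue = self.queues.get(circuit.name, deque())
        while queue:
            circuit.accept(queue.popleft())

def send_command(command, circuit: DPLayer2Circuit, shared_memory=None) -> None:
    """Deliver via shared memory if one is given (claim 8), otherwise directly (claim 9)."""
    if shared_memory is not None:
        shared_memory.post(circuit.name, command)
    else:
        circuit.accept(command)

pdcp = DPLayer2Circuit("pdcp")
shm = Layer2SharedMemory()
send_command("cmd_a", pdcp, shm)  # written to shared memory only
received_before_drain = list(pdcp.received)
shm.drain_into(pdcp)              # the circuit picks up the queued command
send_command("cmd_b", pdcp)       # direct delivery
```

In this model the shared-memory path lets the microcontroller finish its time period without waiting for the circuit to consume the command, while the direct path avoids the extra write and read.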

10. The baseband chip of claim 1, wherein the microcontroller further comprises a third instruction set for a third virtual DP thread associated with the at least one first DP Layer 2 circuit of the DP Layer 2 hardware accelerator block and a fourth instruction set for a fourth virtual DP thread associated with the at least one second DP Layer 2 circuit of the DP Layer 2 hardware accelerator block, wherein the microcontroller is configured to: assign a third token associated with a third number of clock cycles to the third virtual DP thread based on a third QoS profile associated with the at least one first DP Layer 2 circuit; assign a fourth token associated with a fourth number of clock cycles to the fourth virtual DP thread based on a fourth QoS profile associated with the at least one second DP Layer 2 circuit; execute, using the third token, the third instruction set for the third number of clock cycles to run the third virtual DP thread during a third time period following the first time period and before the second time period; and execute, using the fourth token, the fourth instruction set for the fourth number of clock cycles to run the fourth virtual DP thread during a fourth time period following the second time period.
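Claim 10 fixes only the order of the four time periods: first, third, second, fourth. The sketch below, with hypothetical token sizes and thread names, lays the periods out back to back in that order and checks the ordering constraints the claim recites.

```python
# Hypothetical threads: name -> (circuit group the thread serves, token size in cycles).
THREADS = {
    "thread_1": ("first_circuit", 30),
    "thread_2": ("second_circuit", 20),
    "thread_3": ("first_circuit", 10),
    "thread_4": ("second_circuit", 15),
}

# Time-period order required by claim 10.
PERIOD_ORDER = ["thread_1", "thread_3", "thread_2", "thread_4"]

def schedule(order, threads):
    """Place each thread's time period immediately after the previous one."""
    windows = {}
    cycle = 0
    for name in order:
        _, cycles = threads[name]
        windows[name] = (cycle, cycle + cycles)
        cycle += cycles
    return windows

def satisfies_claim_10(windows):
    """Third period follows the first and precedes the second; fourth follows the second."""
    return (windows["thread_1"][1] <= windows["thread_3"][0]
            and windows["thread_3"][1] <= windows["thread_2"][0]
            and windows["thread_2"][1] <= windows["thread_4"][0])

windows = schedule(PERIOD_ORDER, THREADS)
```

With this order, both threads tied to the first circuit group run back to back before the second group's threads get the core.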

11. The baseband chip of claim 1, wherein: the first token is assigned based on a first QoS priority level associated with the first QoS profile, a first resource type associated with the first QoS profile, a first packet delay budget associated with the first QoS profile, and a first set of weights, and the second token is assigned based on a second QoS priority level associated with the second QoS profile, a second resource type associated with the second QoS profile, a second packet delay budget associated with the second QoS profile, and a second set of weights.
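Claim 11 names the inputs to the token assignment (QoS priority level, resource type, packet delay budget, and a set of weights) but not how they are combined. One possible combination, offered only as an illustration, is a weighted sum in which a lower priority-level value (the 3GPP convention for higher priority), a more delay-sensitive resource type, and a tighter packet delay budget each enlarge the token. The resource-type factors, base cycle count, floor, weights, and profile values below are assumptions, not values from the disclosure.

```python
# Hypothetical scaling per resource type; more delay-sensitive types get a larger share.
RESOURCE_TYPE_FACTOR = {"delay_critical_gbr": 1.0, "gbr": 0.7, "non_gbr": 0.4}

def token_cycles(priority_level, resource_type, pdb_ms, weights,
                 base_cycles=64, min_cycles=8):
    """Map one QoS profile to a clock-cycle token via a weighted sum (illustrative only)."""
    w_prio, w_res, w_pdb = weights
    prio_score = 1.0 / priority_level               # lower level value = higher priority
    res_score = RESOURCE_TYPE_FACTOR[resource_type]
    pdb_score = 1.0 / pdb_ms                        # tighter delay budget = larger share
    score = w_prio * prio_score + w_res * res_score + w_pdb * pdb_score
    return max(min_cycles, round(base_cycles * score))  # never starve a thread entirely

weights = (10.0, 0.5, 10.0)  # hypothetical set of weights
gbr_token = token_cycles(priority_level=20, resource_type="gbr", pdb_ms=100, weights=weights)
non_gbr_token = token_cycles(priority_level=90, resource_type="non_gbr", pdb_ms=300, weights=weights)
```

A separate weight set per thread, as the claim allows, would let the same formula favor latency for one DP Layer 2 circuit and throughput for another.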

12. The baseband chip of claim 1, wherein the microcontroller is further configured to: determine, based on the first virtual DP thread being run, a processing delay associated with the first virtual DP thread during the first time period; in response to the processing delay meeting a first delay threshold for a first predetermined number of times, increase the first number of clock cycles associated with the first token for a third time period following the second time period; and in response to the processing delay meeting a second delay threshold for a second predetermined number of times, decrease the first number of clock cycles associated with the first token for the third time period following the second time period.
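Claim 12 adjusts a token using two delay thresholds and two hit counts. A minimal sketch, assuming the first threshold flags a thread that is running too slowly (so its token grows), the second flags one with spare budget (so its token shrinks), hits are counted cumulatively and reset after each adjustment, and the token changes by a fixed step; the class name, units, and all numbers are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class AdaptiveToken:
    cycles: int                  # current token size in clock cycles
    high_delay_threshold: float  # "first delay threshold": delay at or above it is too slow
    low_delay_threshold: float   # "second delay threshold": delay at or below it has slack
    high_count_limit: int        # "first predetermined number of times"
    low_count_limit: int         # "second predetermined number of times"
    step: int = 4                # hypothetical adjustment step
    min_cycles: int = 4          # hypothetical floor
    high_hits: int = 0
    low_hits: int = 0

    def observe(self, delay: float) -> int:
        """Record one time period's processing delay; return the token for the next period."""
        if delay >= self.high_delay_threshold:
            self.high_hits += 1
        elif delay <= self.low_delay_threshold:
            self.low_hits += 1
        if self.high_hits >= self.high_count_limit:
            self.cycles += self.step  # grant more cycles in a later time period
            self.high_hits = 0
        elif self.low_hits >= self.low_count_limit:
            self.cycles = max(self.min_cycles, self.cycles - self.step)
            self.low_hits = 0
        return self.cycles

token = AdaptiveToken(cycles=32, high_delay_threshold=50.0, low_delay_threshold=10.0,
                      high_count_limit=3, low_count_limit=3)
history = [token.observe(d) for d in (60.0, 70.0, 55.0, 5.0, 5.0, 5.0)]
```

Requiring several hits before each change gives the adjustment some hysteresis, so a single unusually slow or fast period does not resize the token.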

13. The baseband chip of claim 1, wherein: the at least one first DP Layer 2 circuit comprises a plurality of first Layer 2 circuits each associated with a first Layer 2 entity, the at least one second DP Layer 2 circuit comprises a plurality of second Layer 2 circuits each associated with a second Layer 2 entity different than the first Layer 2 entity, the first virtual DP thread is run at the first time period for each of the plurality of first Layer 2 circuits associated with the first Layer 2 entity, and the second virtual DP thread is run at the second time period for each of the plurality of second Layer 2 circuits associated with the second Layer 2 entity.
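Claim 13 ties each virtual DP thread to a Layer 2 entity rather than to a single circuit, so one thread's time period serves every circuit of that entity. A minimal sketch, assuming a hypothetical inventory of circuits tagged with their entity (PDCP and RLC here) and treating "serving" a circuit as issuing it one work item; all circuit and function names are illustrative.

```python
from collections import defaultdict

# Hypothetical circuit inventory: (circuit id, Layer 2 entity the circuit belongs to).
CIRCUITS = [
    ("pdcp_cipher_0", "PDCP"),
    ("pdcp_header_0", "PDCP"),
    ("rlc_segment_0", "RLC"),
    ("rlc_arq_0", "RLC"),
]

def circuits_by_entity(circuits):
    """Group circuit ids under the Layer 2 entity they serve."""
    groups = defaultdict(list)
    for circuit_id, entity in circuits:
        groups[entity].append(circuit_id)
    return dict(groups)

def run_entity_thread(entity, groups, period):
    """One time period of an entity's virtual DP thread: issue work to each of its circuits."""
    return [(period, circuit_id) for circuit_id in groups[entity]]

groups = circuits_by_entity(CIRCUITS)
log = run_entity_thread("PDCP", groups, period=1) + run_entity_thread("RLC", groups, period=2)
```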

14. A dataplane (DP) Layer 2 hardware block for a baseband chip, comprising: a plurality of DP Layer 2 circuits; and a microcontroller comprising a first instruction set for a first virtual DP thread associated with at least one first DP Layer 2 circuit of the plurality of DP Layer 2 circuits and a second instruction set for a second virtual DP thread associated with at least one second DP Layer 2 circuit of the plurality of DP Layer 2 circuits, wherein the microcontroller is configured to: assign a first token associated with a first number of clock cycles to the first virtual DP thread based on a first quality-of-service (QoS) profile associated with the at least one first DP Layer 2 circuit; assign a second token associated with a second number of clock cycles to the second virtual DP thread based on a second QoS profile associated with the at least one second DP Layer 2 circuit; execute, using the first token, the first instruction set for the first number of clock cycles to run the first virtual DP thread during a first time period; and execute, using the second token, the second instruction set for the second number of clock cycles to run the second virtual DP thread during a second time period, wherein the second time period follows the first time period.

15. The DP Layer 2 hardware block of claim 14, wherein the microcontroller is further configured to: execute, using the first token, the first instruction set for the first number of clock cycles to run the first virtual DP thread during a third time period; and execute, using the second token, the second instruction set for the second number of clock cycles to run the second virtual DP thread during a fourth time period, wherein the third time period follows the second time period and the fourth time period follows the third time period, and wherein the first time period, the second time period, the third time period, and the fourth time period are contiguous in a time-domain.

16. The DP Layer 2 hardware block of claim 14, wherein the microcontroller further comprises: a first register configured to maintain the first instruction set associated with the first virtual DP thread; and a second register configured to maintain the second instruction set associated with the second virtual DP thread, wherein, to execute, using the first token, the first instruction set for the first number of clock cycles to run the first virtual DP thread during the first time period, the microcontroller is configured to: access the first instruction set from the first register at a start of the first time period associated with the first number of clock cycles; execute a first portion of the first instruction set for the first number of clock cycles to generate a first command during the first time period; halt an execution of the first portion of the first instruction set at an end of the first time period associated with the first number of clock cycles; and send the first command to the first DP Layer 2 circuit during the first time period.

17. The DP Layer 2 hardware block of claim 14, wherein the microcontroller is further configured to: determine, based on the first virtual DP thread being run, a processing delay associated with the first virtual DP thread during the first time period; in response to the processing delay meeting a first delay threshold for a first predetermined number of times, increase the first number of clock cycles associated with the first token for a third time period following the second time period; and in response to the processing delay meeting a second delay threshold for a second predetermined number of times, decrease the first number of clock cycles associated with the first token for the third time period following the second time period.

18. A method of wireless communication of a baseband chip, comprising: maintaining, by a first register of a microcontroller, a first instruction set for a first virtual dataplane (DP) thread associated with at least one first DP Layer 2 circuit of a DP Layer 2 hardware accelerator block; maintaining, by a second register of the microcontroller, a second instruction set for a second virtual DP thread associated with at least one second DP Layer 2 circuit of the DP Layer 2 hardware accelerator block; assigning, by the microcontroller, a first token associated with a first number of clock cycles to the first virtual DP thread based on a first quality-of-service (QoS) profile associated with the at least one first DP Layer 2 circuit of the DP Layer 2 hardware accelerator block; assigning, by the microcontroller, a second token associated with a second number of clock cycles to the second virtual DP thread based on a second QoS profile associated with the at least one second DP Layer 2 circuit of the DP Layer 2 hardware accelerator block; executing, by the microcontroller, the first instruction set for the first number of clock cycles using the first token to run the first virtual DP thread during a first time period; and executing, by the microcontroller, the second instruction set for the second number of clock cycles using the second token to run the second virtual DP thread during a second time period, wherein the second time period follows the first time period.

19. The method of claim 18, wherein: the executing, by the microcontroller, the first instruction set for the first number of clock cycles using the first token to run the first virtual DP thread during the first time period comprises: accessing, by the microcontroller, the first instruction set from the first register at a start of the first time period associated with the first number of clock cycles; executing, by the microcontroller, a first portion of the first instruction set for the first number of clock cycles to generate a first command during the first time period; halting, by the microcontroller, an execution of the first portion of the first instruction set at an end of the first time period associated with the first number of clock cycles; and sending, by the microcontroller, the first command to the first DP Layer 2 circuit during the first time period, and the executing, by the microcontroller, the second instruction set for the second number of clock cycles using the second token to run the second virtual DP thread during the second time period comprises: accessing, by the microcontroller, the second instruction set from the second register at a start of the second time period associated with the second number of clock cycles; executing, by the microcontroller, a first portion of the second instruction set for the second number of clock cycles to generate a second command during the second time period; halting, by the microcontroller, an execution of the first portion of the second instruction set at an end of the second time period associated with the second number of clock cycles; and sending, by the microcontroller, the second command to the second DP Layer 2 circuit at the end of the second time period.

20. The method of claim 18, further comprising: determining, based on the first virtual DP thread being run during the first time period, a processing delay associated with the first virtual DP thread during the first time period; in response to the processing delay meeting a first delay threshold for a first predetermined number of times, increasing the first number of clock cycles associated with the first token for a third time period following the second time period; and in response to the processing delay meeting a second delay threshold for a second predetermined number of times, decreasing the first number of clock cycles associated with the first token for the third time period following the second time period.