WO2024045888A1 - Processing device and control method - Google Patents
- Publication number
- WO2024045888A1 (international application PCT/CN2023/105636)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- computing
- connection relationship
- algorithm
- slot
- communication
Concepts (machine-extracted)
- 238000012545 processing Methods 0.000 title claims abstract description 117
- 238000000034 method Methods 0.000 title claims abstract description 50
- 238000004422 calculation algorithm Methods 0.000 claims abstract description 204
- 238000004891 communication Methods 0.000 claims abstract description 114
- 238000013473 artificial intelligence Methods 0.000 claims abstract description 113
- 230000006870 function Effects 0.000 claims description 52
- 238000004364 calculation method Methods 0.000 claims description 36
- 238000003860 storage Methods 0.000 claims description 16
- 238000004590 computer program Methods 0.000 claims description 12
- 238000013461 design Methods 0.000 description 24
- 238000010586 diagram Methods 0.000 description 18
- 230000005540 biological transmission Effects 0.000 description 14
- 238000010801 machine learning Methods 0.000 description 10
- 230000008569 process Effects 0.000 description 9
- 238000013528 artificial neural network Methods 0.000 description 7
- 238000006243 chemical reaction Methods 0.000 description 6
- 239000011159 matrix material Substances 0.000 description 6
- 238000012549 training Methods 0.000 description 6
- 230000004913 activation Effects 0.000 description 4
- 230000003044 adaptive effect Effects 0.000 description 4
- 238000013527 convolutional neural network Methods 0.000 description 4
- 238000013500 data storage Methods 0.000 description 4
- 238000003058 natural language processing Methods 0.000 description 4
- 238000003062 neural network model Methods 0.000 description 4
- 238000010606 normalization Methods 0.000 description 4
- 230000008520 organization Effects 0.000 description 4
- 238000012546 transfer Methods 0.000 description 4
- 230000001960 triggered effect Effects 0.000 description 4
- 238000000354 decomposition reaction Methods 0.000 description 3
- 238000013135 deep learning Methods 0.000 description 3
- 238000001514 detection method Methods 0.000 description 3
- 230000000694 effects Effects 0.000 description 3
- 238000005516 engineering process Methods 0.000 description 3
- 238000012986 modification Methods 0.000 description 3
- 230000004048 modification Effects 0.000 description 3
- 238000011084 recovery Methods 0.000 description 3
- 238000012706 support-vector machine Methods 0.000 description 3
- 238000004458 analytical method Methods 0.000 description 2
- 230000006835 compression Effects 0.000 description 2
- 238000007906 compression Methods 0.000 description 2
- 238000003745 diagnosis Methods 0.000 description 2
- 230000001788 irregular Effects 0.000 description 2
- 238000004519 manufacturing process Methods 0.000 description 2
- 238000013178 mathematical model Methods 0.000 description 2
- 238000005259 measurement Methods 0.000 description 2
- 238000005457 optimization Methods 0.000 description 2
- 238000011176 pooling Methods 0.000 description 2
- 230000000306 recurrent effect Effects 0.000 description 2
- 238000012896 Statistical algorithm Methods 0.000 description 1
- 230000003190 augmentative effect Effects 0.000 description 1
- 239000003795 chemical substances by application Substances 0.000 description 1
- 230000019771 cognition Effects 0.000 description 1
- 238000010276 construction Methods 0.000 description 1
- 238000013075 data extraction Methods 0.000 description 1
- 238000003066 decision tree Methods 0.000 description 1
- 230000001627 detrimental effect Effects 0.000 description 1
- 238000011161 development Methods 0.000 description 1
- 230000014509 gene expression Effects 0.000 description 1
- 230000007774 longterm Effects 0.000 description 1
- 238000012423 maintenance Methods 0.000 description 1
- 238000007726 management method Methods 0.000 description 1
- 238000013507 mapping Methods 0.000 description 1
- 238000010295 mobile communication Methods 0.000 description 1
- 230000003287 optical effect Effects 0.000 description 1
- 238000007493 shaping process Methods 0.000 description 1
- 239000004984 smart glass Substances 0.000 description 1
- 230000001629 suppression Effects 0.000 description 1
- 230000017105 transposition Effects 0.000 description 1
Classifications
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/14—Network analysis or design
- H04L41/142—Network analysis or design using statistical or mathematical methods
- H04L41/16—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks using machine learning or artificial intelligence
Definitions
- The embodiments of the present application relate to the field of communication technology, and in particular to a processing device and a control method.
- AI: artificial intelligence
- ML: machine learning
- AI algorithms are widely used in image processing, natural language processing, autonomous driving, and other fields.
- Most AI algorithms are designed as neural networks that operate on real-number streams.
- The deep learning frameworks widely used in industry build the basic operators of an algorithm on a real-number model.
- Accordingly, the hardware computing platforms and processing devices for AI algorithms are also designed around the real-number model, including their internal computation methods, data storage, and data organization and transfer methods.
- Communication algorithms, by contrast, are mainly built on complex-number mathematical models.
- The hardware computing platforms and processing devices for communication algorithms are likewise designed around the complex-number model, including their internal computation methods, data storage, and data organization and transfer methods.
- Embodiments of the present application provide a processing device and a control method that allow communication algorithms and AI algorithms to be computed efficiently on a single processing device.
- A first aspect provides a processing device, which can be applied in network equipment, terminal equipment, or other electronic equipment.
- The processing device can run both communication algorithms and AI algorithms.
- The processing device includes a computing unit and a control unit. The computing unit includes a computing element that supports a first connection relationship and a second connection relationship between at least one adder and at least one multiplier.
- The first connection relationship is used to implement communication algorithms or AI algorithms based on real-number streams, and the second connection relationship is used to implement communication algorithms or AI algorithms based on complex-number streams.
- The control unit is configured to control which connection relationship, the first or the second, the computing element in the computing unit uses.
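The two connection relationships can be pictured with a small behavioral model. The sketch below assumes a computing element with four multipliers and two adders (the source only says "at least one" of each), and the names `ComputingElement`, `mode`, and `run` are hypothetical. It shows how the same arithmetic units could be wired either as two real multiply-add lanes or as one complex multiplier.

```python
from dataclasses import dataclass

@dataclass
class ComputingElement:
    """Hypothetical computing element with 4 multipliers and 2 adders.

    mode="real"    -> first connection relationship (real-number streams)
    mode="complex" -> second connection relationship (complex-number streams)
    """
    mode: str = "real"

    def run(self, a, b):
        if self.mode == "real":
            # First connection: two independent multiply-add lanes.
            # a, b hold 4 real values; outputs a0*b0 + a1*b1 and a2*b2 + a3*b3.
            m = [x * y for x, y in zip(a, b)]          # 4 multipliers
            return (m[0] + m[1], m[2] + m[3])          # 2 adders
        if self.mode == "complex":
            # Second connection: the same units wired as one complex multiply
            # (ar + j*ai) * (br + j*bi) = (ar*br - ai*bi) + j*(ar*bi + ai*br).
            (ar, ai), (br, bi) = a, b
            m = (ar * br, ai * bi, ar * bi, ai * br)   # 4 multipliers
            return (m[0] - m[1], m[2] + m[3])          # 2 adders (first one subtracts)
        raise ValueError(f"unknown mode: {self.mode!r}")

print(ComputingElement("real").run([1, 2, 3, 4], [5, 6, 7, 8]))  # (17, 53)
print(ComputingElement("complex").run((1, 2), (3, 4)))            # (-5, 10), i.e. -5+10j
```

Under these assumptions, one complex multiplication occupies all four multipliers and both adders, which is why a fully connected adder/multiplier array can be rewired rather than duplicated for the two number types.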
- The processing device can support multiple types of operators:
- Conventional complex operators, such as the complex operators in communication algorithms.
- Conventional real operators, such as the real operators in communication algorithms.
- AI complex operators, such as the complex operators in communication AI algorithms.
- AI real operators, such as the real operators in non-communication AI algorithms and in communication AI algorithms.
- The hardware therefore efficiently supports both communication algorithms and AI algorithms.
- The two types of algorithms can share hardware resources, improving the computing efficiency, area efficiency, and energy efficiency of the processing device hardware.
- Because the operation to perform is determined once for all the data that make up a data stream, rather than once for each data item, the present application can improve computing efficiency.
- The at least one adder and the at least one multiplier in the computing element are fully connected.
- The computing element can therefore support multiple connection relationships, including the first and second connection relationships above, so that it can perform real-number-stream operations and complex-number-stream operations through different connection modes.
- The control unit can schedule the computing unit to run communication algorithms and AI algorithms by time-division multiplexing, which improves the computing efficiency, area efficiency, and energy efficiency of the processing device hardware.
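Time-division multiplexing can be illustrated as interleaving tasks from the two algorithm families on one computing unit. This is a minimal sketch assuming a simple round-robin policy and one task per time step; the source does not specify the scheduling policy, and `tdm_schedule` and the task names are hypothetical. "Time step" is used here to avoid confusion with the document's hardware "slot".

```python
from collections import deque

def tdm_schedule(queues):
    """Interleave tasks from several algorithm families on one computing unit.

    queues maps a family name (e.g. "communication", "ai") to an ordered list of
    task names. Returns (time_step, family, task) tuples in execution order.
    """
    pending = {family: deque(tasks) for family, tasks in queues.items()}
    timeline = []
    step = 0
    while any(pending.values()):
        # Visit each family in turn; skip families whose queue is already empty.
        for family, queue in pending.items():
            if queue:
                timeline.append((step, family, queue.popleft()))
                step += 1
    return timeline

plan = tdm_schedule({"communication": ["fft", "mimo_detect"],
                     "ai": ["conv1", "conv2", "fc"]})
for entry in plan:
    print(entry)
```

Round-robin is only the simplest fair policy; a real control unit could weight families by deadline or load.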
- The computing unit includes a slot, and the slot includes one or more of the above computing elements.
- The computing unit may also include at least two slots that are connected to each other, each including one or more of the above computing elements.
- Interconnecting the slots makes them reusable, thereby improving resource utilization.
- The control unit schedules the corresponding computing elements in a slot to perform the operation.
- The number of computing elements scheduled in a slot can depend on the amount of computation. For example, if a computing task requires a large amount of computation, the control unit can schedule more computing elements in the slot; if it requires little computation, the control unit can schedule fewer. Scheduling a slot's computing elements according to the computation volume in this way improves resource utilization.
- The control unit is also used to schedule all or some of the at least two slots to perform operations.
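The rule that larger workloads get more computing elements can be sketched as a simple allocation function. The parameters (`num_slots`, `pes_per_slot`, `ops_per_pe`) and the fill-one-slot-first policy are assumptions for illustration, not details given in the source.

```python
import math

def allocate_pes(workload_ops, num_slots, pes_per_slot, ops_per_pe):
    """Return the number of computing elements (PEs) scheduled in each slot.

    A task needing workload_ops operations per step gets ceil(workload_ops / ops_per_pe)
    PEs, capped at the total capacity. Slots are filled in order; slots left at 0
    stay free for other tasks.
    """
    needed = min(math.ceil(workload_ops / ops_per_pe), num_slots * pes_per_slot)
    allocation = []
    for _ in range(num_slots):
        take = min(pes_per_slot, needed)
        allocation.append(take)
        needed -= take
    return allocation

print(allocate_pes(100, num_slots=4, pes_per_slot=8, ops_per_pe=4))  # [8, 8, 8, 1]
print(allocate_pes(10, num_slots=4, pes_per_slot=8, ops_per_pe=4))   # [3, 0, 0, 0]
```

In the second call three slots stay idle; with interconnected slots, those could be scheduled for another task.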
- The AI algorithm includes communication AI algorithms and/or non-communication AI algorithms.
- A second aspect provides a control method.
- The method can be executed by the control unit in the processing device.
- The method includes the following steps: the control unit determines the computing task; if the computing task is a real-number-stream operation, the control unit schedules the first connection relationship between at least one adder and at least one multiplier of the computing element in the computing unit to perform the operation; if the computing task is a complex-number-stream operation, the control unit schedules the second connection relationship between the at least one adder and the at least one multiplier of the computing element to perform the operation.
- The first connection relationship is used to implement communication algorithms or AI algorithms based on real-number streams.
- The second connection relationship is used to implement communication algorithms or AI algorithms based on complex-number streams.
- The control unit can thus schedule the operators of the computing unit according to the computing task, improving the utilization and energy efficiency of the processing device hardware.
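The decision step of the control method can be sketched as follows. It assumes each computing task arrives as an instruction carrying a `stream_type` field; that field, the `Connection` enum, and both function names are hypothetical, since the source does not say how the control unit learns the task type.

```python
from enum import Enum

class Connection(Enum):
    FIRST = "first"    # adder/multiplier wiring for real-number streams
    SECOND = "second"  # adder/multiplier wiring for complex-number streams

def determine_task(instruction):
    """Determination step: read the (hypothetical) stream type of the task."""
    stream_type = instruction.get("stream_type")
    if stream_type not in ("real", "complex"):
        raise ValueError(f"unsupported stream type: {stream_type!r}")
    return stream_type

def schedule_connection(instruction):
    """Scheduling step: real streams use the first connection, complex streams the second."""
    if determine_task(instruction) == "real":
        return Connection.FIRST
    return Connection.SECOND

print(schedule_connection({"op": "equalize", "stream_type": "complex"}))  # Connection.SECOND
print(schedule_connection({"op": "conv2d", "stream_type": "real"}))       # Connection.FIRST
```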
- The method further includes: the control unit schedules at least one slot in the computing unit according to the computing task, where the scheduled slot or slots perform the operations of the computing task.
- The method further includes: when scheduling a first slot in the computing unit, the control unit can schedule the computing elements in the first slot according to the computing task, where the scheduled computing elements perform the computation of the computing task.
- A third aspect provides a control unit that includes a determination module and a scheduling module.
- The determination module is used to determine the computing task.
- The scheduling module is configured to: when the computing task is a real-number-stream operation, schedule the first connection relationship between at least one adder and at least one multiplier of the computing element in the computing unit to perform the operation; and when the computing task is a complex-number-stream operation, schedule the second connection relationship between the at least one adder and the at least one multiplier of the computing element to perform the operation.
- The first connection relationship is used to implement communication algorithms or AI algorithms based on real-number streams.
- The second connection relationship is used to implement communication algorithms or AI algorithms based on complex-number streams.
- The scheduling module is further configured to schedule at least one slot in the computing unit according to the computing task, where the scheduled slot or slots perform the operations of the computing task.
- The scheduling module is also configured to, when scheduling a first slot in the computing unit, schedule the computing elements in the first slot according to the computing task, where the scheduled computing elements perform the computation of the computing task.
- A fourth aspect provides a computer-readable storage medium that stores computer programs or instructions. When the computer programs or instructions are executed by a processing device, the method of the second aspect or any possible design thereof is implemented.
- A fifth aspect provides a computer program product that stores instructions. When the instructions are executed by a processing device, the method of the second aspect or any possible design thereof is implemented.
- A sixth aspect provides a chip system, which includes the processing device of the first aspect or any possible design thereof, and may also include a memory.
- The chip system may consist of chips, or may include chips and other discrete devices.
- Figure 1 is a schematic diagram of a communication algorithm processing device and an AI algorithm processing device according to an embodiment of the present application.
- Figure 2A is a schematic diagram of a communication AI algorithm according to an embodiment of the present application.
- Figure 2B is a schematic diagram of another communication AI algorithm according to an embodiment of the present application.
- Figure 2C is a schematic diagram of another communication AI algorithm according to an embodiment of the present application.
- Figure 3 is a schematic structural diagram of a processing device according to an embodiment of the present application.
- Figure 4 is a schematic structural diagram of a processing device according to an embodiment of the present application.
- Figure 5 is a schematic diagram of a computing element structure according to an embodiment of the present application.
- Figure 6A is a schematic diagram of the scheduling of a communication algorithm and an AI algorithm according to an embodiment of the present application.
- Figure 6B is a schematic diagram of a slot connection relationship according to an embodiment of the present application.
- Figure 7 is a schematic structural diagram of a computing unit according to an embodiment of the present application.
- Figure 8 is a schematic structural diagram of a control unit according to an embodiment of the present application.
- Embodiments of the present application may be applied in the communications field, including but not limited to 5G communication systems, future communication systems (such as 6G communication systems), satellite communication systems, underwater communication systems, device-to-device (D2D) communication systems, machine-to-machine (M2M) communication systems, the Internet of Things (IoT), unmanned aerial vehicle (UAV) communication systems, narrowband Internet of Things (NB-IoT), long term evolution (LTE), and the three major application scenarios of 5G mobile communication systems: enhanced mobile broadband (eMBB), ultra-reliable low-latency communication (URLLC), and massive machine-type communications (mMTC).
- The processing device can be applied to network equipment or terminal equipment.
- The network device can be a device with a wireless transceiver function, or a chip that can be installed in such a device.
- The network device includes but is not limited to: a base station (next generation Node B, gNB), a radio network controller (RNC), a Node B (NB), a base station controller (BSC), a base transceiver station (BTS), a home base station (for example, a home evolved NodeB or home Node B, HNB), a baseband unit (BBU), an access point (AP) in a wireless fidelity (Wi-Fi) system, a wireless relay node, a wireless backhaul node, a satellite, a drone, a transmission point (transmission and reception point, TRP, or transmission point, TP), and so on. It can also be a network node that forms part of a gNB or transmission point, such as a baseband unit (BBU) or a distributed unit (DU).
- BBU: baseband unit
- DU: distributed unit
- Terminal equipment can also be called user equipment (UE), an access terminal, a subscriber unit, a subscriber station, a mobile station, a remote station, a remote terminal, a mobile device, a user terminal, a terminal, a wireless communication device, a user agent, or a user apparatus.
- The terminal device in the embodiments of the present application may be a mobile phone, a tablet computer (Pad), a computer with wireless transceiver functions, a virtual reality (VR) terminal device, an augmented reality (AR) terminal device, a wireless terminal in industrial control, a wireless terminal in self-driving, a wireless terminal in remote medical care, a drone, a wireless terminal in a smart grid, a wireless terminal in transportation safety, a wireless terminal in a smart city, a smart wearable device (smart glasses, smart watches, smart headphones, etc.), a wireless terminal in a smart home, and so on; or it may be a chip or chip module (or chip system) that can be installed in such equipment.
- Real number stream: the hardware operates under one configuration (such as an instruction configuration), receives a continuous series of real-number data, and applies the same computation to that series according to that configuration. Once the amount of data specified by the instruction has been processed, the instruction (or operator) is considered complete.
- Complex number stream: the hardware operates under one configuration (such as an instruction configuration), receives a continuous series of complex-number data, and applies the same computation to that series according to that configuration. Once the amount of data specified by the instruction has been processed, the instruction (or operator) is considered complete.
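The stream definitions can be modeled in a few lines: the configuration is read once and then applied to every element until the data amount specified by the instruction is reached. The `config` dictionary layout (`op`, `count`) and the function name are assumptions made for illustration.

```python
from itertools import islice

def run_stream_instruction(config, stream):
    """Apply one instruction configuration to a real- or complex-number stream.

    config["op"] is the per-element operation and config["count"] is the amount of
    data the instruction specifies. The configuration is read once, outside the loop;
    the instruction is complete once `count` elements have been processed.
    """
    op, count = config["op"], config["count"]
    results = [op(x) for x in islice(stream, count)]
    if len(results) < count:
        raise ValueError("stream ended before the instruction's data amount was reached")
    return results

samples = iter([1 + 1j, 2 - 1j, 3j, 4 + 0j])
print(run_stream_instruction({"op": lambda z: 2 * z, "count": 3}, samples))
print(next(samples))  # the fourth element is left for the next instruction
```

In this sketch the same function serves both stream types; only the per-element operation and the element type change.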
- Slot: a functional unit composed of a set of processing elements (PEs).
- AI algorithm: an artificial intelligence algorithm enables computers, or software and hardware controlled by computers, to learn, make decisions, and solve problems in a manner similar to human cognition and thinking.
- AI algorithms include many types, such as machine learning (ML) algorithms, deep learning algorithms, and Bayesian statistical algorithms.
- AI algorithms can accurately model complex high-dimensional problems in the abstract, accurately predict dynamic systems, and quickly and effectively solve multi-objective optimal-decision problems.
- AI algorithms are used in many fields, such as image recognition, speech processing, natural language processing, recommendation systems, medical diagnosis, financial analysis, wireless communication networks, wired communication networks, and intelligent manufacturing.
- In wireless communications, AI algorithms can significantly improve system performance and reduce transmission overhead and operation and maintenance costs.
- “At least one” means one or more, and “multiple” means two or more.
- “And/or” describes an association between objects and indicates three possible relationships. For example, “A and/or B” can mean A alone, both A and B, or B alone, where A and B can be singular or plural.
- The character “/” generally indicates an “or” relationship between the associated objects.
- “At least one of the following” or a similar expression means any combination of the listed items, including a single item or multiple items.
- For example, “at least one of a, b, or c” can mean: a, b, c, a-b, a-c, b-c, or a-b-c, where each of a, b, and c can be single or multiple.
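The “at least one of a, b, or c” convention corresponds to every non-empty combination of the listed items, which for three items gives 2^3 − 1 = 7 cases. The helper below (a hypothetical name) enumerates them with the standard library.

```python
from itertools import combinations

def at_least_one_of(items):
    """List every non-empty combination of items, joined with '-' as in the text."""
    return ["-".join(combo)
            for size in range(1, len(items) + 1)
            for combo in combinations(items, size)]

print(at_least_one_of(["a", "b", "c"]))
# ['a', 'b', 'c', 'a-b', 'a-c', 'b-c', 'a-b-c']
```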
- Ordinal numbers such as “first” and “second” in the embodiments of this application distinguish between multiple objects and do not limit their size, content, order, timing, priority, or importance.
- For example, the first connection relationship and the second connection relationship are named only to distinguish different connection relationships and do not indicate any difference in complexity, priority, or importance.
- Conventional AI algorithms build each basic operator mainly on real-number models.
- The hardware computing platforms and processing devices for conventional AI algorithms are also designed around the real-number model, including their internal operation methods, data storage, and data organization and transfer methods.
- Algorithms in the conventional communication field (hereinafter “conventional communication algorithms”, such as those for 5G NR wireless communication systems, satellite communication systems, and Wi-Fi communication systems) build each basic operator mainly on complex-number models.
- Hardware computing platforms and processing devices in the conventional communication field are likewise designed around the complex-number model, including their internal computation methods, data storage, and data organization and transfer methods.
- Communication AI algorithms are increasingly used.
- Depending on the specific application, communication AI algorithms may be based on complex-number models or on real-number models. With the existing processing device design approach, multiple sets of different types of processing devices would have to be provisioned, which is detrimental to cost, area efficiency, energy efficiency, and processing efficiency.
- In view of this, embodiments of the present application provide a processing device and a control method.
- The processing device supports efficient execution of both communication algorithms and AI algorithms.
- Communication algorithms and AI algorithms can therefore run efficiently on one processing device, optimizing implementation cost, area efficiency, energy efficiency, computing efficiency, and so on.
- The AI algorithm may include communication AI algorithms and/or non-communication AI algorithms.
- The communication algorithm described below can be understood as the conventional communication algorithm described above.
- That is, the communication algorithm described below can be a communication-type non-AI algorithm.
- The communication algorithm in the embodiments of the present application may be based on real numbers or on complex numbers.
- Communication AI algorithms may be based on real numbers or on complex numbers.
- Non-communication AI algorithms may likewise be based on real numbers or on complex numbers.
- A complex-number-based communication AI algorithm can be implemented through, but is not limited to, one or more of the following structures: a complex neural network model (or a complex decision tree, a complex support vector machine (SVM), a complex k-nearest neighbor (k-NN) method, etc.), a complex cost function, and a complex training algorithm, as shown in Figure 2A.
- SVM: support vector machine
- k-NN: k-nearest neighbor
- A real-number-based communication AI algorithm can be implemented through, but is not limited to, one or more of the following structures: a real neural network model (or a real SVM, a real k-NN, etc.), a real cost function, and a real training algorithm, as shown in Figure 2B.
- A complex-number-based and/or real-number-based communication AI algorithm can also be implemented by connecting the structure shown in Figure 2A to the structure shown in Figure 2B through a conversion module, which converts real numbers to complex numbers and complex numbers to real numbers, as shown in Figure 2C.
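The conversion module of Figure 2C can be illustrated as a pair of reshaping functions between a complex vector and a real vector of twice the length. Interleaving the real and imaginary parts is one common layout and is assumed here; the source does not say which layout the module uses, and the function names are hypothetical.

```python
def complex_to_real(zs):
    """Complex -> real: interleave parts, [z0, z1] -> [re0, im0, re1, im1]."""
    out = []
    for z in zs:
        out.extend((z.real, z.imag))
    return out

def real_to_complex(xs):
    """Real -> complex: pair consecutive values back into complex numbers."""
    if len(xs) % 2:
        raise ValueError("need an even number of real values")
    return [complex(xs[i], xs[i + 1]) for i in range(0, len(xs), 2)]

z = [1 + 2j, 0.5 - 3j]
r = complex_to_real(z)
print(r)
print(real_to_complex(r))
```

Because the conversion only rearranges data, it can join a complex-valued block to a real-valued block without any arithmetic.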
- The complex neural network model may include but is not limited to one or more of the following: complex-number-based multilayer perceptron (MLP), convolutional neural network (CNN), residual network (ResNet), recurrent neural network (RNN), Transformer, autoencoder, and generative adversarial network (GAN) models, as well as complex-number-based activation functions, normalization functions (such as batch normalization), and pooling functions.
- MLP: complex number-based multilayer perceptron
- CNN: complex number-based convolutional neural network
- ResNet: complex number-based residual network
- RNN: complex number-based recurrent neural network
- Transformer: complex number-based transformer
- Autoencoder: complex number-based autoencoder
- GAN: generative adversarial network
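A complex-valued network layer differs from a real one mainly in its arithmetic. The sketch below shows a complex fully connected layer followed by a "split" ReLU that acts on the real and imaginary parts separately. Split ReLU is one activation used in the complex-network literature and is an assumption here, not a choice stated in the source; the function names are hypothetical.

```python
def complex_dense(x, weights, bias):
    """Compute y = W x + b, where W (rows of complex weights), x, and b are complex."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def split_relu(z):
    """Apply ReLU independently to the real and imaginary parts of z."""
    return complex(max(z.real, 0.0), max(z.imag, 0.0))

x = [1 + 1j, 2 + 0j]
W = [[1j, 1 + 0j], [1 + 0j, -1j]]
b = [0j, 1 + 0j]
y = [split_relu(v) for v in complex_dense(x, W, b)]
print(y)
```

Each complex multiplication inside `complex_dense` expands into four real multiplications and two real additions, the pattern a computing element would serve through its second connection relationship.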
- The complex cost function model may include but is not limited to one or more of the following: minimum mean squared error (MMSE), minimized cosine similarity (CS), minimized squared generalized cosine similarity (SGCS), maximized cross entropy (CE), etc.
- MMSE: minimum mean squared error
- CS: cosine similarity
- SGCS: squared generalized cosine similarity
- CE: cross entropy
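Two of the listed metrics can be written directly for complex channel vectors. The SGCS form below, |h^H ĥ|² / (‖h‖² ‖ĥ‖²), is the definition commonly used in CSI-feedback studies and is assumed here. Note that SGCS ignores a common phase rotation, while the mean squared error does not.

```python
def mse(h, h_hat):
    """Mean squared error between two complex vectors."""
    return sum(abs(a - b) ** 2 for a, b in zip(h, h_hat)) / len(h)

def sgcs(h, h_hat):
    """Squared generalized cosine similarity: |h^H h_hat|^2 / (||h||^2 ||h_hat||^2)."""
    inner = sum(a.conjugate() * b for a, b in zip(h, h_hat))
    norm_h = sum(abs(a) ** 2 for a in h)
    norm_hat = sum(abs(b) ** 2 for b in h_hat)
    return abs(inner) ** 2 / (norm_h * norm_hat)

h = [1 + 1j, 2 - 1j]
rotated = [1j * a for a in h]   # same direction, 90-degree common phase shift
print(round(sgcs(h, rotated), 6), round(mse(h, rotated), 6))
```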
- The complex training algorithm trains the AI/ML algorithm model so that the output of its cost function keeps converging toward the optimal value.
- Complex training algorithms may include but are not limited to one or more of the following: complex-number-based stochastic gradient descent (SGD), complex-number-based adaptive moment estimation (Adam), and extensions of Adam such as AdaMax and AMSGrad.
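For a complex training algorithm, the gradient is usually taken with respect to the conjugate of the complex parameter (Wirtinger calculus). The sketch below fits a single complex weight w in y ≈ w·x by complex SGD on the squared error |w·x − y|². The one-parameter model, learning rate, and data are illustrative assumptions.

```python
import random

def complex_sgd(xs, ys, lr=0.1, epochs=200, seed=0):
    """Fit a complex scalar w so that w * x ≈ y, one sample at a time.

    For the loss |w*x - y|^2, the gradient with respect to conj(w) is
    (w*x - y) * conj(x); stepping against it reduces the loss.
    """
    rng = random.Random(seed)
    w = 0j
    order = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(order)  # stochastic: visit samples in a random order each epoch
        for k in order:
            err = w * xs[k] - ys[k]
            w -= lr * err * xs[k].conjugate()
    return w

w_true = 0.5 - 1.5j
xs = [1 + 0j, 1j, 1 - 1j, -0.5 + 0.5j]
ys = [w_true * x for x in xs]
print(complex_sgd(xs, ys))
```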
- The real neural network model may include but is not limited to one or more of the following: real-number-based multilayer perceptron (MLP), convolutional neural network (CNN), residual network (ResNet), recurrent neural network (RNN), Transformer, autoencoder, and generative adversarial network (GAN) models, as well as real-number-based activation functions, normalization functions (such as batch normalization), and pooling functions.
- MLP: real number-based multilayer perceptron
- CNN: real number-based convolutional neural network
- ResNet: real number-based residual network
- RNN: real number-based recurrent neural network
- Transformer: real number-based transformer
- Autoencoder: real number-based autoencoder
- GAN: generative adversarial network
- Real cost functions define the optimization goals of AI/ML algorithms.
- The real cost function model may include but is not limited to one or more of the following: minimum mean squared error (MMSE), minimized cosine similarity (CS), minimized squared generalized cosine similarity (SGCS), maximized cross entropy (CE), etc.
- The real training algorithm trains the AI/ML algorithm model so that the output of its cost function keeps converging toward the optimal value.
- Real training algorithms may include but are not limited to one or more of the following: real-number-based stochastic gradient descent (SGD), real-number-based adaptive moment estimation (Adam), and extensions of Adam such as AdaMax and AMSGrad.
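The real-number Adam update can be written in a few lines. This sketch minimizes the toy loss f(x) = (x − 3)² with the commonly used default hyperparameters (β1 = 0.9, β2 = 0.999, ε = 1e-8). The loss, the learning rate of 0.1, and the step count are illustrative choices.

```python
import math

def adam_minimize(grad, x0, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=1000):
    """Minimize a scalar function given its gradient, using the Adam update rule."""
    x, m, v = x0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g       # first-moment (mean) estimate
        v = beta2 * v + (1 - beta2) * g * g   # second-moment (uncentered) estimate
        m_hat = m / (1 - beta1 ** t)          # bias corrections for zero initialization
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x

print(adam_minimize(lambda x: 2 * (x - 3), x0=0.0))  # gradient of (x - 3)^2
```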
- Communication AI algorithms can be applied to a variety of applications in communication systems.
- For example, in 5G NR wireless networks they can be applied in the access network and the core network, including the physical layer, link layer, and network layer of the access network.
- At the physical layer, the receiver receives the uplink DMRS, uplink SRS, or downlink CSI-RS, and uses an AI channel estimation algorithm to reduce the negative effects of channel noise and interference, improve channel estimation accuracy, and obtain uplink or downlink throughput gains.
- The physical-layer receiver uses the results of multiple historical channel estimates and an AI algorithm to predict, in the time domain, the unknown channel at one or more future moments. This improves the ability to track dynamic changes in the wireless channel and yields accurate predictions of future channel information, so that transmission performance is maintained and the user experience improves when end users are constantly moving.
- the physical layer transmitter uses the AI neural network algorithm to perform constellation symbol modulation, and can construct a regular constellation or an irregular constellation at the transmitter.
- the AI neural network algorithm is used to demodulate the transmitted constellation symbols and accurately recover the transmitted bit information, thereby improving the system transmission capacity;
- the physical layer transmitter of the physical layer uses the AI neural network algorithm to perform compressed feedback on the estimated channel state information (CSI), and the receiver uses the AI neural network algorithm to perform CSI recovery and channel reconfiguration on the received feedback bit information. structure.
- AI algorithms to perform CSI compression feedback and CSI recovery and reconstruction can effectively improve the compression ratio of channel CSI information and the accuracy of CSI reconstruction, and improve system transmission capacity.
- the physical layer transmitter uses AI algorithms to perform beam shaping on the transmitted signal to improve the interference suppression capability of multi-user transmission.
- AI algorithms On the receiver side, accurate detection of signals through AI algorithms can increase the number of user multiplexing streams in multiple-input multiple-out (MIMO) systems and improve system transmission capacity.
- MIMO multiple-input multiple-out
- the link layer uses AI algorithms to predict link quality and determine the optimal decision-making based on measurement quantities such as channel quality indicator (CQI) and reference signal received power (RSRP) fed back by the terminal. Select the optimal transmission method MCS and send the corresponding signal. Highly accurate MCS selection can significantly improve link transmission quality and achieve higher transmission throughput.
- communication AI algorithms can also perform high-frequency beam management, nonlinear device compensation, etc., and wireless air interface resource scheduling in MIMO systems. I won’t list them all here.
- Non-communication AI algorithms can include one or more of the following: AI algorithms for image processing, AI algorithms for speech processing, AI algorithms for recommendation systems, AI algorithms for medical diagnosis, AI algorithms for natural AI algorithms for language processing, AI algorithms for financial analysis, etc. are not listed here one by one.
- an embodiment of the present application provides a structure of a processing device.
- the processing device includes a computing unit and a control unit, and the computing unit is used to perform algorithm operations.
- the control unit is used to control the computing unit to perform operations. Specifically, it may control the computing unit to execute the communication algorithm or AI algorithm through software scheduling, for example, scheduling the computing unit by sending instructions to the computing unit.
- The control unit may also be called a micro-controller unit (MCU), and the computing unit may include a hybrid tensor processing unit (Hybrid Tensor Array) and a vector processing unit (Vector Unit).
- MCU micro-controller unit
- The computations of the hybrid tensor processing unit include but are not limited to matrix computation and tensor computation.
- For example, the computations of the hybrid tensor processing unit include:
  - matrix multiplication, matrix addition and matrix decomposition (such as singular value decomposition (SVD) and Cholesky decomposition);
  - matrix inversion and matrix element-wise multiplication;
  - one-dimensional, two-dimensional or high-dimensional (more than two dimensions) convolution and sparse convolution;
  - tensor multiplication, tensor addition, tensor dot product, tensor data extraction, tensor data transposition and tensor dimension conversion, etc.
- The computations of the vector processing unit include but are not limited to vector computation and nonlinear computation.
- the calculations of the vector processing unit include vector multiplication, vector addition, vector dot product, vector transpose, vector reciprocal, vector root, vector trigonometric function, vector exponential function, vector activation function, etc.
- Activation function types may include but are not limited to sigmoid function, tanh function, Relu function, Elu function, LeakyRelu function, softmax function, softplus function, swish function, etc.
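The nonlinear functions listed above can be sketched in plain Python. This is an illustrative software model, not the vector processing unit's hardware implementation. The default parameters (`slope=0.01`, `alpha=1.0`) are common conventions assumed here. The overflow-safe forms of sigmoid, softplus and softmax are standard rewrites.

```python
import math

def sigmoid(x):
    # Split by sign so math.exp never receives a large positive argument.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def relu(x):
    return max(0.0, x)

def leaky_relu(x, slope=0.01):
    return x if x >= 0 else slope * x

def elu(x, alpha=1.0):
    return x if x >= 0 else alpha * (math.exp(x) - 1.0)

def softplus(x):
    # log(1 + e^x) rewritten as max(x, 0) + log1p(e^-|x|) to avoid overflow.
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def swish(x):
    return x * sigmoid(x)

def softmax(xs):
    # Subtracting the maximum keeps exp() in range without changing the result.
    peak = max(xs)
    exps = [math.exp(x - peak) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

# Element-wise activations that a vector unit could apply lane by lane.
ELEMENTWISE = {
    "sigmoid": sigmoid, "tanh": math.tanh, "relu": relu, "elu": elu,
    "leaky_relu": leaky_relu, "softplus": softplus, "swish": swish,
}

def vector_activation(name, vec):
    """Apply a named activation to a whole vector (softmax is vector-wise)."""
    if name == "softmax":
        return softmax(vec)
    fn = ELEMENTWISE[name]
    return [fn(x) for x in vec]
```

Softmax is handled separately because each of its outputs depends on the whole vector. The other functions act on one element at a time.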
- the processing device also includes a storage unit.
- the storage unit is usually used to temporarily cache data required for current program calculations.
- the storage unit serves as a shared storage unit (share mem) to store data required by the computing unit.
- shared storage unit shared mem
- The computing unit (that is, the hybrid tensor processing unit and the vector processing unit) reads data from and writes data to the shared storage unit, as shown in Figure 4.
- The direct memory access (DMA) unit serves as the access interface between the processing device and external devices. It is mainly used to move external data into the internal shared storage unit, or to move computed data from the shared storage unit to external storage space.
- DMA direct memory access
- the computing unit includes multiple PEs, and each PE includes at least one multiplier and at least one adder.
- The number of multipliers and/or the number of adders included in each PE may be the same or different; this application does not limit this.
- the above-mentioned PE supports the connection of adders and multipliers through at least two connection relationships, including a first connection relationship and a second connection relationship.
- At least one adder and at least one multiplier connected through the first connection relationship are used to implement an algorithm based on a real number stream.
- At least one adder and at least one multiplier connected through the second connection relationship are used to implement an algorithm based on a complex number stream.
- the multipliers connected by the first connection relationship and the multipliers connected by the second connection relationship may be the same or different.
- the adders connected by the first connection relationship and the adders connected by the second connection relationship may be the same or different.
- The first connection relationship can be used to implement real-number-based communication algorithms and AI algorithms based on real number streams (for example, real-number-based communication AI algorithms and real-number-based non-communication AI algorithms), etc.
- The second connection relationship can be used to implement complex-number-based communication algorithms and AI algorithms based on complex number streams (for example, complex-number-based communication AI algorithms and complex-number-based non-communication AI algorithms), etc.
- The adders and multipliers in the PE are fully connected, that is, there is a connection between any two devices (adders and/or multipliers) in the PE. The PE can therefore connect its adders and multipliers in various ways.
- The control unit can control the connection relationship used by the PEs in the computing unit.
- Specifically, the control unit can control the connection relationship used by the PEs in the computing unit according to the computing task.
- For example, if the input data of the computing task is a real number stream, the control unit may determine that the computing task is an operation based on a real number stream. If the input data of the computing task is a complex number stream, the control unit may determine that the computing task is an operation based on a complex number stream.
- That is, the control unit can control the connection relationship used by the PEs in the computing unit according to the input data of the computing task. For example, if the input data is a real number stream, it can control the PEs in the computing unit to use the first connection relationship. If the input data is a complex number stream, it can control the PEs in the computing unit to use the second connection relationship.
- The control unit can determine the computing task as follows: when it detects a preset event that triggers a computing task, it determines that computing task.
- For example, when the control unit detects a preset event that triggers image recognition, it determines that the computing task is image recognition.
- The preset event may be a user-triggered face recognition instruction, a user-triggered object recognition instruction, etc.
- When the control unit detects a preset event that triggers voice processing, it determines that the computing task is voice processing.
- The preset event may be a user-triggered call instruction, a user-triggered recording instruction, and so on.
- When the control unit detects a preset event that triggers channel estimation, it determines that the computing task is channel estimation.
- The preset event may be that the device where the processing device is located receives an uplink DMRS, an uplink SRS or a downlink CSI-RS, etc.
- When the control unit detects a preset event that triggers channel time domain prediction, it determines that the computing task is channel time domain prediction.
- The preset event may be either of the following:
  - the device where the processing device is located receives indication information sent by the peer communication device;
  - that device determines that channel time domain prediction is to be performed, and so on.
- When the control unit detects a preset event that triggers constellation symbol modulation, it determines that the computing task is constellation symbol modulation.
- The preset event may be either of the following:
  - the device where the processing device is located determines to construct a regular or irregular constellation at the transmitter;
  - that device needs to send a signal to the receiving end, etc.
- When the control unit detects a preset event that triggers constellation symbol demodulation, it determines that the computing task is constellation symbol demodulation.
- The preset event may be that the processing device receives a signal sent by the peer device, etc.
- When the control unit detects a preset event that triggers compressed CSI feedback, it determines that the computing task is compressed feedback of the estimated CSI.
- The preset event may be either of the following:
  - the device where the processing device is located receives a pilot signal from the peer device;
  - that device performs channel estimation, etc.
- When the control unit detects a preset event that triggers channel reconstruction, it determines that the computing task is channel reconstruction.
- The preset event may be that the device where the processing device is located receives CSI reported by the peer device, etc.
- In this case, the control unit may determine that the computing task also includes CSI recovery.
- When the control unit detects a preset event that triggers beamforming of the transmitted signal, it determines that the computing task is beamforming of the transmitted signal.
- The preset event may be that the device where the processing device is located is to send a signal to the peer device, and so on.
- When the control unit detects a preset event that triggers signal detection, it determines that the computing task is signal detection.
- The preset event may be that the device where the processing device is located receives a signal sent by the peer device, and so on.
- When the control unit detects a preset event that triggers link quality prediction, it determines that the computing task is link quality prediction.
- The preset event may be that the device where the processing device is located receives measurement quantities such as CQI and RSRP fed back by the terminal device.
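The event-to-task rules above amount to a lookup table. The Python sketch below uses hypothetical event identifiers in place of the prose descriptions. One event may trigger several tasks: a CSI report triggers channel reconstruction plus CSI recovery, and a received signal triggers demodulation plus signal detection.

```python
# Hypothetical event identifiers; the description gives the events only in prose.
EVENT_TO_TASKS = {
    "face_recognition_instruction": ("image_recognition",),
    "object_recognition_instruction": ("image_recognition",),
    "call_instruction": ("voice_processing",),
    "recording_instruction": ("voice_processing",),
    "received_uplink_dmrs": ("channel_estimation",),
    "received_uplink_srs": ("channel_estimation",),
    "received_downlink_csi_rs": ("channel_estimation",),
    "received_prediction_indication": ("channel_time_domain_prediction",),
    "signal_to_send": ("constellation_modulation", "beamforming"),
    "received_signal": ("constellation_demodulation", "signal_detection"),
    "received_pilot": ("csi_compression_feedback",),
    "received_csi_report": ("channel_reconstruction", "csi_recovery"),
    "received_cqi_rsrp": ("link_quality_prediction",),
}

def determine_tasks(event):
    """Return the computing task(s) triggered by a detected preset event."""
    try:
        return EVENT_TO_TASKS[event]
    except KeyError:
        raise ValueError(f"no computing task is triggered by {event!r}") from None
```

A table keeps the trigger rules in data rather than in branching code. Supporting a new preset event then only means adding an entry.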
- The PE includes three switch modules, a multiplier module and an adder module.
- Each switch module includes at least one switch, and the multiplier module includes four multipliers.
- The adder module includes two adders.
- Switch module 1 controls, through the on/off states of its switches, which multipliers the data stream is input to.
- Switch module 2 controls, through the on/off states of its switches, which adders the data stream is input to.
- Switch module 3 controls, through the on/off states of its switches, the output of the data stream.
- For operations based on a real number stream, the first connection relationship can be used.
- Switch module 1 can control, through the on/off states of its switches, the data stream to be input to some or all multipliers in the multiplier module (for example, four multipliers).
- Switch module 2 can control, through the on/off states of its switches, the data stream processed by the multiplier module to be output directly to the PE output (that is, bypassing the adders).
- For operations based on a complex number stream, the second connection relationship can be used.
- Switch module 1 can control, through the on/off states of its switches, the data stream to be input to some or all multipliers in the multiplier module (for example, four multipliers).
- Switch module 2 can control, through the on/off states of its switches, the data stream processed by the multiplier module to be input to some or all adders in the adder module (for example, two adders), and then output from the adders.
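The two connection relationships of this PE can be modeled in software. The sketch below is hypothetical: it assumes the second connection relationship computes a complex product (a + jb)(c + jd) = (ac - bd) + j(ad + bc) with the four multipliers and two adders. It also assumes that one adder can negate an input to produce ac - bd; the description does not state how the subtraction is realized. The class and method names are illustrative.

```python
from enum import Enum

class Connection(Enum):
    FIRST = "real"       # multiplier outputs bypass the adders
    SECOND = "complex"   # multiplier outputs are routed into the two adders

class ProcessingElement:
    """Hypothetical software model of a PE with four multipliers and two adders."""

    def __init__(self, connection=Connection.FIRST):
        self.connection = connection

    @staticmethod
    def _multiplier_module(pairs):
        # Switch module 1 routes one operand pair to each of the four multipliers.
        if len(pairs) != 4:
            raise ValueError("this PE has exactly four multipliers")
        return [a * b for a, b in pairs]

    def compute(self, x, y):
        if self.connection is Connection.FIRST:
            # Real stream: x and y hold four real values each; switch module 2
            # sends the four products straight to the PE output.
            return self._multiplier_module(list(zip(x, y)))
        # Complex stream: (a + jb)(c + jd) = (ac - bd) + j(ad + bc).
        a, b, c, d = x.real, x.imag, y.real, y.imag
        ac, bd, ad, bc = self._multiplier_module([(a, c), (b, d), (a, d), (b, c)])
        # Switch module 2 routes the products into the two adders.
        # Assumption: adder 1 can negate one input, so it computes ac - bd.
        return complex(ac - bd, ad + bc)
```

`ProcessingElement().compute([1, 2, 3, 4], [5, 6, 7, 8])` returns four independent products. The same PE configured with `Connection.SECOND` returns `(-5+10j)` for `compute(1 + 2j, 3 + 4j)`, reusing the same four multipliers.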
- The above approach needs only a small amount of additional hardware interconnection to let the multipliers and adders in one processing device be multiplexed across different algorithms.
- The processing device therefore supports flexible hardware processing and configuration and improves resource utilization.
- Because the processing device provided by this application can implement algorithms based on both real number streams and complex number streams, efficient computation of communication algorithms and AI algorithms can be achieved with one processing device.
- When controlling the computing unit to perform operations, the control unit can control the connection relationship used by the PEs in the computing unit.
- The control unit can schedule the connection relationship used by the PEs in the computing unit according to the computing task to be executed. For example, the PEs in the computing unit can be scheduled to use the first connection relationship for real number operations and the second connection relationship for complex number operations.
- The control unit can schedule the computing unit to implement communication algorithms and AI algorithms in a time-division multiplexing manner, for example as shown in Figure 6A. This improves the utilization and energy efficiency of the processing device hardware.
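Time-division multiplexing of the two algorithm types on one computing unit could look like the round-robin sketch below. The alternation policy is an assumption for illustration; the description does not give the scheduling order shown in Figure 6A. The function and parameter names are hypothetical.

```python
from collections import deque

def tdm_schedule(comm_tasks, ai_tasks, num_periods):
    """Alternate communication and AI tasks on one computing unit, one task per period.

    Hypothetical round-robin policy: when one queue is empty,
    the other queue keeps the computing unit busy.
    """
    queues = [deque(comm_tasks), deque(ai_tasks)]
    timeline, turn = [], 0
    while len(timeline) < num_periods and any(queues):
        queue = queues[turn % 2]
        if queue:
            timeline.append(queue.popleft())
        turn += 1
    return timeline
```

For instance, `tdm_schedule(["c1", "c2", "c3"], ["a1"], 10)` yields `["c1", "a1", "c2", "c3"]`: once the AI queue is empty, the communication tasks keep the hardware busy.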
- The computing unit may include one or more slots, and each slot may include one or more PEs. If the computing unit includes multiple slots, the slots are interconnected, that is, there is a connection between any two of them. For example, as shown in Figure 6B, the computing unit includes slot 1, slot 2, slot 3 and slot 4, and each of the four slots is connected to every other slot. Any two of the slots may implement the same or different functions, and each slot independently completes one functional processing step of an operator's computation.
- the functions of the PEs in the above slots can be the same or different, and there are no specific limitations here.
- the computing unit includes a first slot, a second slot, and a third slot, and the first slot, the second slot, and the third slot are connected to each other.
- the first slot includes multiple PEs that implement the multiplication function;
- the second slot includes multiple PEs that implement the addition tree function;
- the third slot includes multiple PEs that implement the accumulator function. It should be understood that this example assumes that all PEs in a slot implement the same function. It does not limit the function types of the PEs in a slot.
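The three-slot arrangement above maps naturally onto a long dot product: the first slot multiplies element pairs, the second reduces each chunk with an addition tree, and the third accumulates the chunk sums. The Python sketch below assumes that data flow and a hypothetical chunk width of 4. It computes a plain (non-conjugated) sum of products and works for real or complex inputs, mirroring the two stream types.

```python
def multiply_slot(xs, ys):
    """First slot: PEs multiply element pairs in parallel."""
    return [x * y for x, y in zip(xs, ys)]

def adder_tree_slot(values):
    """Second slot: pairwise (tree) reduction of one chunk of products."""
    level = list(values)
    if not level:
        return 0
    while len(level) > 1:
        if len(level) % 2:
            level.append(0)  # pad an odd-length level with a zero operand
        level = [level[i] + level[i + 1] for i in range(0, len(level), 2)]
    return level[0]

class AccumulatorSlot:
    """Third slot: keeps a running sum of partial results across chunks."""

    def __init__(self):
        self.total = 0

    def add(self, value):
        self.total += value
        return self.total

def dot_product(xs, ys, width=4):
    """Dot product processed `width` elements per pass (width is hypothetical)."""
    if len(xs) != len(ys):
        raise ValueError("operands must have the same length")
    acc = AccumulatorSlot()
    for start in range(0, len(xs), width):
        products = multiply_slot(xs[start:start + width], ys[start:start + width])
        acc.add(adder_tree_slot(products))
    return acc.total
```

Splitting the work this way lets the multiply, reduce and accumulate steps be pipelined across slots. The accumulator then handles vectors longer than one slot's width.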
- If a slot is used to implement operations based on a real number stream, all PEs in the slot are used for such operations; for example, all PEs in the slot use the first connection relationship. If a slot is used to implement operations based on a complex number stream, all PEs in the slot are used for such operations; for example, all PEs in the slot use the second connection relationship.
- the computing unit includes three switch modules, a first slot, a second slot and a third slot, wherein each switch module includes at least one switch.
- Switch module A controls the data flow input to the first slot through the opening and closing status of the switch.
- Switch module B controls the data flow input to the second slot through the opening and closing status of the switch.
- the switch module C controls the data flow input to the third slot through the opening and closing status of the switch.
- this application does not limit the position of the switch module in the processing device, and the switch module may be included in the slot. Alternatively, the switch module can also be deployed outside the slot, and there is no specific limitation here.
- the control unit when controlling the computing unit to perform operations, can schedule all or part of the above-mentioned plurality of slots to perform operations. For example, when executing a first computing task, all slots among the plurality of slots may be scheduled to perform computing, and when executing a second computing task, a first part of slots among the plurality of slots may be scheduled to perform computing. When executing the third computing task, a second part of the slots among the plurality of slots may be scheduled to perform computing.
- the control unit can schedule the first slot and the third slot according to the execution of the computing task. Specifically, the control unit can control switch module A to turn on, switch module B to turn off, and switch module C to turn on. Therefore, the data flow is input into the first slot through the control of switch module A. After being processed in the first slot, it is input into the third slot under the control of switch module C. After passing through the third slot, the data is output.
- A switch module being on means that at least one switch in the switch module is in the on state.
- A switch module being off means that all the switches in the switch module are in the off state.
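The on/off semantics of the switch modules, and the example of scheduling only the first and third slots, can be sketched as follows. The slot functions and the dictionary layout are hypothetical stand-ins. Only the rule that a module is on when at least one of its switches is on, and off when all are off, comes from the description.

```python
def run_slots(data, slots, switch_modules):
    """Send data through the slots in order, skipping slots whose switch module is off.

    slots: list of (switch_module_name, slot_function) pairs in pipeline order.
    switch_modules: maps a switch module name to the on/off states of its switches.
    A module is on if at least one of its switches is on, and off only if all are off.
    """
    for module, slot_fn in slots:
        if any(switch_modules.get(module, ())):
            data = slot_fn(data)
    return data

# Hypothetical slot functions standing in for the first, second and third slots.
SLOTS = [
    ("A", lambda v: [2 * x for x in v]),   # first slot, gated by switch module A
    ("B", lambda v: [sum(v)]),             # second slot, gated by switch module B
    ("C", lambda v: [x + 1 for x in v]),   # third slot, gated by switch module C
]
```

With switch modules A and C on and B off, as in the example above, `run_slots([1, 2, 3], SLOTS, {"A": [True, False], "B": [False, False], "C": [True]})` skips the second slot and returns `[3, 5, 7]`.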
- When the control unit schedules a slot to perform an operation, it may specifically schedule the computing elements in that slot according to the computing task. For example, using FIG. 7 above, the control unit can schedule the PEs that perform the operation in the first slot according to the computing task being executed. In the same way, it can schedule the PEs in the second slot, the PEs in the third slot, and so on.
- the computing elements scheduled in the slots may be related to the amount of computation. For example, if the calculation amount of the computing task is large, the control unit can schedule more computing elements in the slot to perform the calculation. If the calculation amount of the computing task is small, the control unit can schedule fewer computing elements in the slot to perform the calculation.
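The rule of "more PEs for larger computation amounts" can be made concrete by sizing the allocation from a throughput target. The formula and every parameter below (`ops_per_pe_per_cycle`, `target_cycles`, `available_pes`) are assumptions for illustration; the description states only the qualitative relationship.

```python
import math

def pes_to_schedule(num_operations, ops_per_pe_per_cycle, target_cycles, available_pes):
    """Number of PEs to schedule in a slot so a task fits within target_cycles.

    Hypothetical sizing rule: enough PEs to meet the cycle budget, at least one
    for any non-empty task, and never more than the slot actually has.
    """
    if num_operations <= 0:
        return 0
    needed = math.ceil(num_operations / (ops_per_pe_per_cycle * target_cycles))
    return min(max(needed, 1), available_pes)
```

Under these assumptions, 1,000 operations at 4 operations per PE per cycle and a 10-cycle budget would occupy 25 PEs. A 10-operation task would use a single PE, leaving the rest of the slot free for other work.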
- Computing resources in different slots, such as multipliers, addition trees and accumulators, can be reused by different operators, which improves computing resource utilization.
- the embodiments of the present application do not limit the specific connection media between the above control units, computing units, and storage units.
- the control unit, the computing unit, and the storage unit are connected through a bus in Figures 1 and 3.
- the bus is represented by a thick line in Figures 1 and 3.
- The connection modes between other components are merely illustrative and not limiting.
- the bus can be divided into address bus, data bus, control bus, etc. For ease of presentation, only one thick line is used in Figures 1 and 3, but it does not mean that there is only one bus or one type of bus.
- The processing device can support multiple types of operators, for example:
  - conventional complex operators, such as complex operators in communication algorithms;
  - conventional real operators, such as real operators in communication algorithms;
  - AI complex operators, such as complex operators in communication AI algorithms;
  - AI real operators, such as real operators in non-communication AI algorithms and real operators in communication AI algorithms.
- A single piece of hardware can therefore efficiently support both communication algorithms and AI algorithms.
- The two types of algorithms can share hardware resources in a time-division multiplexing manner, improving the hardware utilization and energy efficiency of the processing device.
- When a data stream is processed, the operation to be performed is determined once for the multiple data items that make up the stream. Compared with determining it once for each data item, this improves computing efficiency.
- the embodiment of the present application provides a control unit.
- The structure of the control unit may be as shown in Figure 8, including a determination module 801 and a scheduling module 802.
- the control unit may be specifically used to implement the methods executed by the control unit in the embodiments of FIGS. 3-5, 6A, 6B, and 7.
- the determination module 801 is used to determine the computing task.
- The scheduling module 802 is configured to schedule connection relationships of at least one adder and at least one multiplier of a computing element in the computing unit:
  - when the computing task is an operation based on real numbers, it schedules the first connection relationship to perform the operation;
  - when the computing task is an operation based on complex numbers, it schedules the second connection relationship to perform the operation.
- The first connection relationship is used to implement a communication algorithm or artificial intelligence (AI) algorithm based on a real number stream, and the second connection relationship is used to implement a communication algorithm or AI algorithm based on a complex number stream.
- The scheduling module 802 is further configured to schedule at least one slot in the computing unit according to the computing task, where the at least one slot is used to perform the operation of the computing task.
- The scheduling module 802 is further configured to: when scheduling the first slot in the computing unit, schedule the computing elements in the first slot according to the computing task, where the scheduled computing elements are used to perform the operation of the computing task.
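The division of labor between the determination module 801 and the scheduling module 802 can be sketched as follows. This is a minimal model under stated assumptions: a task is classified as real or complex purely from the Python type of its input data, and the slot names and the returned plan dictionary are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class ComputingTask:
    name: str
    data: list
    slots: tuple = ("first", "second", "third")  # hypothetical slot names

class ControlUnit:
    """Sketch of a control unit with a determination module and a scheduling module."""

    def determine(self, task):
        # Determination module: classify the task by the type of its input stream.
        return "complex" if any(isinstance(x, complex) for x in task.data) else "real"

    def schedule(self, task):
        # Scheduling module: pick the PE connection relationship and the slots to use.
        connection = "second" if self.determine(task) == "complex" else "first"
        return {"task": task.name, "connection": connection, "slots": list(task.slots)}
```

In this sketch, a real-valued task such as a CNN layer gets the first connection relationship. A channel estimation task with complex samples gets the second, and it can be restricted to a subset of slots such as `("first", "third")`.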
- Each functional module in each embodiment of the present application may be integrated into one processing unit, may exist physically alone, or two or more modules may be integrated into one module.
- the above integrated modules can be implemented in the form of hardware or software function modules. It can be understood that the functions or implementation of each module in the embodiment of the present application can be further referred to the relevant descriptions of the embodiments described in FIGS. 3-5, 6A, 6B, and 7.
- embodiments of the present application may be provided as methods, systems, or computer program products. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment that combines software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
- These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to operate in a particular manner. The instructions stored in the computer-readable memory then produce an article of manufacture that includes an instruction apparatus. The instruction apparatus implements the functions specified in one or more processes of the flowchart and/or one or more blocks of the block diagram.
- These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operation steps is performed on the computer or other programmable device to produce computer-implemented processing. The instructions executed on the computer or other programmable device then provide steps for implementing the functions specified in one or more processes of the flowchart and/or one or more blocks of the block diagram.
Landscapes
- Engineering & Computer Science (AREA)
- Signal Processing (AREA)
- Computer Networks & Wireless Communication (AREA)
- Evolutionary Computation (AREA)
- Algebra (AREA)
- Medical Informatics (AREA)
- Software Systems (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Artificial Intelligence (AREA)
- Physics & Mathematics (AREA)
- Databases & Information Systems (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Analysis (AREA)
- Mathematical Optimization (AREA)
- Mathematical Physics (AREA)
- Probability & Statistics with Applications (AREA)
- Pure & Applied Mathematics (AREA)
- Mobile Radio Communication Systems (AREA)
Abstract
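A processing device and a control method, used to achieve efficient computation of communication algorithms and artificial intelligence algorithms in a single processing device. The processing device can support operations of communication algorithms and of artificial intelligence algorithms, and specifically includes a computing unit and a control unit.

The computing unit includes a computing element. The computing element supports a first connection relationship and a second connection relationship of at least one adder and at least one multiplier. The first connection relationship is used to implement a communication algorithm or an artificial intelligence algorithm based on a real number stream. The second connection relationship is used to implement a communication algorithm or an artificial intelligence algorithm based on a complex number stream.

The control unit is used to control the connection relationship used by the computing element, the connection relationship including the first connection relationship and the second connection relationship. With this processing device, a single piece of hardware supports both communication algorithms and artificial intelligence algorithms. The two types of algorithms can share hardware resources, improving the computing efficiency, area efficiency, energy efficiency, etc. of the processing device hardware.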
Description
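Cross-Reference to Related Applications
This application claims priority to Chinese Patent Application No. 202211060064.1, filed with the China National Intellectual Property Administration on August 31, 2022 and entitled "Processing Device and Control Method", which is incorporated herein by reference in its entirety.
Embodiments of this application relate to the field of communication technologies, and in particular, to a processing device and a control method.
Artificial intelligence (AI)/machine learning (ML) algorithms are widely used in many fields such as image processing, natural language processing and autonomous driving. Currently, most AI algorithms are neural networks designed on real number streams. The deep learning frameworks widely used in industry all build the basic operators of algorithms on real-number models. Correspondingly, hardware computing platforms and processing devices for AI algorithms also design their internal operation modes, data storage, data organization and data movement on the basis of real-number models.
Communication algorithms, by contrast, are mainly built on complex-number mathematical models. Correspondingly, hardware computing platforms and processing devices for communication algorithms design their internal operation modes, data storage, data organization and data movement on the basis of complex-number models.
As a result, the processing flows of communication algorithms and AI algorithms differ greatly: their basic operations are complex-centric and real-centric, respectively. Processing devices for the two types of algorithms are currently optimized independently. An urgent problem is therefore how to compute both communication algorithms and AI algorithms efficiently in a single processing device, while improving its computing efficiency, area efficiency and energy efficiency.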
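Summary
Embodiments of this application provide a processing device and a control method, used to efficiently compute communication algorithms and AI algorithms in a single processing device.
According to a first aspect, a processing device is provided. The processing device may be applied to a network device, a terminal device or another electronic device, and can support operations of communication algorithms and operations of AI algorithms. The processing device includes a computing unit and a control unit. The computing unit includes a computing element that supports two connection relationships of at least one adder and at least one multiplier: a first connection relationship and a second connection relationship. The first connection relationship is used to implement a communication algorithm or an AI algorithm based on a real number stream. The second connection relationship is used to implement a communication algorithm or an AI algorithm based on a complex number stream. The control unit is configured to control the connection relationship used by the computing element in the computing unit, the connection relationship including the first connection relationship and the second connection relationship.
With the embodiments of this application, the processing device can support multiple types of operators, for example:
- conventional complex operators (such as complex operators in communication algorithms);
- conventional real operators (such as real operators in communication algorithms);
- AI complex operators (such as complex operators in communication AI algorithms);
- AI real operators (such as real operators in non-communication AI algorithms and real operators in communication AI algorithms).
A single piece of hardware can thus efficiently support communication algorithms and AI algorithms. The two types of algorithms can share hardware resources, improving the computing efficiency, area efficiency and energy efficiency of the processing device hardware. In addition, when a data stream is processed, the operation to be performed is determined once for the multiple data items that make up the data stream. Compared with determining the operation once for each data item, this improves computing efficiency.
In a possible design, the at least one adder and the at least one multiplier in the computing element are fully connected. The computing element can then support multiple connection relationships, including the first and second connection relationships. It can therefore implement both real-number-stream operations and complex-number-stream operations through different connection modes.
In a possible design, the control unit may schedule the computing unit to implement communication algorithms and AI algorithms in a time-division multiplexing manner. This improves the computing efficiency, area efficiency, energy efficiency, etc. of the processing device hardware.
In a possible design, the computing unit includes one slot, and the slot includes one or more of the foregoing computing elements.
In a possible design, the computing unit includes at least two slots that are connected to each other, and each slot includes one or more of the foregoing computing elements. The interconnection between slots makes the slots reusable, which improves resource utilization.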
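In a possible design, when implementing its corresponding function, a slot performs the corresponding operation by using the computing elements scheduled in the slot.
The computing elements scheduled in a slot may depend on the amount of computation. For example, if the computing task involves a large amount of computation, the control unit may schedule more computing elements in the slot to perform the operation. If the computing task involves a small amount of computation, the control unit may schedule fewer computing elements in the slot to perform the operation. With this design, the computing elements in a slot are scheduled according to the amount of computation, which improves resource utilization.
In a possible design, the control unit is further configured to schedule all or some of the at least two slots to perform operations. This design improves computing efficiency and resource utilization.
In a possible design, the AI algorithm includes a communication AI algorithm and/or a non-communication AI algorithm.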
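According to a second aspect, a control method is provided. The method may be performed by the control unit in a processing device and may be implemented through the following steps:
- the control unit determines a computing task;
- if the computing task is an operation based on a real number stream, the control unit schedules a first connection relationship of at least one adder and at least one multiplier of a computing element in a computing unit to perform the operation;
- if the computing task is an operation based on a complex number stream, the control unit schedules a second connection relationship of the at least one adder and the at least one multiplier of the computing element to perform the operation.
The first connection relationship is used to implement a communication algorithm or an AI algorithm based on a real number stream, and the second connection relationship is used to implement a communication algorithm or an AI algorithm based on a complex number stream.
With the embodiments of this application, the control unit can schedule operators of the computing unit according to the computing task, which improves the hardware utilization and energy efficiency of the processing device.
In a possible design, the method further includes: the control unit schedules at least one slot in the computing unit according to the computing task, where the scheduled at least one slot is used to perform the operation of the computing task. This design improves computing efficiency and resource utilization.
In a possible design, the method further includes: when scheduling a first slot in the computing unit, the control unit may schedule computing elements in the first slot according to the computing task, where the scheduled computing elements are used to perform the operation of the computing task.
The computing elements scheduled in a slot may depend on the amount of computation. For example, if the computing task involves a large amount of computation, the control unit may schedule more computing elements in the slot to perform the operation. If the computing task involves a small amount of computation, the control unit may schedule fewer computing elements in the slot to perform the operation. With this design, the computing elements in a slot are scheduled according to the amount of computation, which improves resource utilization.
According to a third aspect, a control unit is provided, including a determination module and a scheduling module. The determination module is configured to determine a computing task. The scheduling module is configured to:
- when the computing task is an operation based on a real number stream, schedule a first connection relationship of at least one adder and at least one multiplier of a computing element in a computing unit to perform the operation;
- when the computing task is an operation based on a complex number stream, schedule a second connection relationship of the at least one adder and the at least one multiplier of the computing element to perform the operation.
The first connection relationship is used to implement a communication algorithm or an AI algorithm based on a real number stream, and the second connection relationship is used to implement a communication algorithm or an AI algorithm based on a complex number stream.
In a possible design, the scheduling module is further configured to schedule at least one slot in the computing unit according to the computing task, where the scheduled at least one slot is used to perform the operation of the computing task.
In a possible design, the scheduling module is further configured to: when scheduling a first slot in the computing unit, schedule computing elements in the first slot according to the computing task, where the scheduled computing elements are used to perform the operation of the computing task.
According to a fourth aspect, a computer-readable storage medium is provided. The computer-readable storage medium stores a computer program or instructions. When the computer program or instructions are executed by a processing device, the method in the second aspect or any possible design thereof is implemented.
According to a fifth aspect, a computer program product storing instructions is provided. When the instructions are run by a processing device, the method in the second aspect or any possible design thereof is implemented.
According to a sixth aspect, a chip system is provided. The chip system includes the processing device in the first aspect or any possible design thereof, and may further include a memory. The chip system may consist of a chip, or may include a chip and other discrete components.
For the technical effects achievable by the technical solution of any one of the third to sixth aspects, refer to the descriptions of the technical effects of the first or second aspect. Details are not repeated here.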
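FIG. 1 is a schematic diagram of a communication algorithm processing device and an AI algorithm processing device according to an embodiment of this application;
FIG. 2A is a schematic diagram of a communication AI algorithm according to an embodiment of this application;
FIG. 2B is a schematic diagram of another communication AI algorithm according to an embodiment of this application;
FIG. 2C is a schematic diagram of another communication AI algorithm according to an embodiment of this application;
FIG. 3 is a schematic structural diagram of a processing device according to an embodiment of this application;
FIG. 4 is a schematic structural diagram of a processing device according to an embodiment of this application;
FIG. 5 is a schematic diagram of a computing element structure according to an embodiment of this application;
FIG. 6A is a schematic diagram of scheduling of communication algorithms and AI algorithms according to an embodiment of this application;
FIG. 6B is a schematic diagram of a slot connection relationship according to an embodiment of this application;
FIG. 7 is a schematic structural diagram of a computing unit according to an embodiment of this application;
FIG. 8 is a schematic structural diagram of a control unit according to an embodiment of this application.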
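To make the objectives, technical solutions and advantages of the embodiments of this application clearer, the following further describes the embodiments of this application in detail with reference to the accompanying drawings.
The embodiments of this application may be applied to the communication field, which may include but is not limited to:
- 5G communication systems and future communication systems (such as 6G communication systems);
- satellite communication systems and underwater communication systems;
- device-to-device (D2D) communication systems and machine-to-machine (M2M) communication systems;
- the internet of things (IoT), unmanned aerial vehicle communication systems and narrowband internet of things (NB-IoT) systems;
- long term evolution (LTE) systems;
- the three major application scenarios of 5G mobile communication systems: enhanced mobile broadband (eMBB), ultra-reliable low-latency communication (URLLC) and massive machine-type communications (mMTC).
The embodiments may also be applied to other fields that use AI technology, for example, image processing, speech processing, deep learning, machine learning, natural language processing and big data processing.
For example, if the processing device is applied to the communication field, it may be applied to a network device or to a terminal device. The network device may be a device with wireless transceiver functions, or a chip that can be disposed in such a network device. The network device includes but is not limited to:
- a base station (next generation Node B, gNB), a radio network controller (RNC), a Node B (NB), a base station controller (BSC) or a base transceiver station (BTS);
- a home base station (for example, home evolved NodeB or home Node B, HNB) or a baseband unit (BBU);
- an access point (AP) in a wireless fidelity (Wi-Fi) system, a wireless relay node or a wireless backhaul node;
- a satellite, an unmanned aerial vehicle, or a transmission point (transmission and reception point, TRP, or transmission point, TP), and the like;
- a network node that forms a gNB or a transmission point, such as a baseband unit (BBU) or a distributed unit (DU).
A terminal device may also be called user equipment (UE), an access terminal, a subscriber unit, a subscriber station, a mobile station, a mobile console, a remote station, a remote terminal, a mobile device, a user terminal, a terminal, a wireless communication device, a user agent or a user apparatus. The terminal device in the embodiments of this application may be:
- a mobile phone, a tablet computer (Pad) or a computer with wireless transceiver functions;
- a virtual reality (VR) terminal device or an augmented reality (AR) terminal device;
- a wireless terminal in industrial control, self driving, remote medical, a smart grid, transportation safety, a smart city or a smart home;
- an unmanned aerial vehicle, or a smart wearable device (smart glasses, smart watch, smart earphones, etc.), and so on;
- a chip or chip module (or chip system) that can be disposed in any of the foregoing devices.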
为了便于本领域技术人员理解,下面对本申请实施例中的部分用语进行解释说明。
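1) Real stream: under one configuration (such as an instruction configuration), the hardware receives a continuous sequence of real-valued data and applies the same computation to the whole sequence according to that instruction configuration. Once the amount of data specified by the instruction has been processed, the instruction (or operator) is considered complete.
2) Complex stream: under one configuration (such as an instruction configuration), the hardware receives a continuous sequence of complex-valued data and applies the same computation to the whole sequence according to that instruction configuration. Once the amount of data specified by the instruction has been processed, the instruction (or operator) is considered complete.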
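The real-stream and complex-stream definitions above can be illustrated with a minimal Python sketch. It assumes a hypothetical `InstructionConfig` holding one per-element operation and the data count the instruction must process; neither name comes from this application, and a real instruction format would look different. The point shown is that one configuration covers the whole stream and the instruction completes after `count` elements.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Union

Number = Union[float, complex]


@dataclass
class InstructionConfig:
    """Hypothetical instruction configuration (illustrative only)."""
    op: Callable[[Number], Number]  # same computation applied to every element
    count: int                      # amount of data the instruction must process


def run_instruction(cfg: InstructionConfig, stream: Sequence[Number]) -> List[Number]:
    """Apply one configuration to consecutive stream elements.

    The configuration is decided once for the whole stream; the instruction
    (operator) is complete after cfg.count elements have been processed.
    """
    if len(stream) < cfg.count:
        raise ValueError("stream is shorter than the instruction's data count")
    return [cfg.op(x) for x in stream[:cfg.count]]


# Real stream: scale each sample by 2; only the first 3 samples belong to this instruction.
real_out = run_instruction(InstructionConfig(op=lambda x: 2 * x, count=3), [1.0, 2.0, 3.0, 4.0])
# Complex stream: rotate each sample by 90 degrees (multiply by j).
cplx_out = run_instruction(InstructionConfig(op=lambda z: 1j * z, count=2), [1 + 0j, 0 + 1j])
print(real_out)  # [2.0, 4.0, 6.0]
```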
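3) Slot: a functional unit formed by combining a group of processing elements (PEs).
4) Operator: in mathematics, an operator can be understood as a mapping that maps a function to a function, or a function to an element of a vector space. In computing, an operator can be understood as a function that performs a specific mathematical operation. An operator typically has inputs and outputs and performs the corresponding function computation or data conversion from input to output.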
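5) AI algorithm: an artificial intelligence algorithm enables a computer, or software or hardware controlled by a computer, to learn, make decisions, and solve problems intelligently, in a manner similar to human cognition and thinking. AI algorithms come in many types, such as machine learning (ML) algorithms, deep learning algorithms, and Bayesian statistical algorithms. AI algorithms can accurately abstract and model complex high-dimensional problems, accurately predict dynamic systems, and quickly and efficiently find multi-objective optimal decisions for complex problems. AI algorithms are currently applied in many fields, including image recognition, speech processing, natural language processing, recommendation systems, medical diagnosis, financial analysis, wireless communication networks, wired communication networks, and intelligent manufacturing. In wireless communication systems, wireless AI algorithms can significantly improve system performance and reduce transmission overhead and operation and maintenance costs.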
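In the embodiments of this application, "at least one" means one or more, and "a plurality of" means two or more. "And/or" describes an association between associated objects and indicates that three relationships may exist. For example, "A and/or B" may indicate: only A exists, both A and B exist, or only B exists, where A and B may be singular or plural. The character "/" generally indicates an "or" relationship between the associated objects. "At least one of the following items" or a similar expression means any combination of these items, including any combination of a single item or a plurality of items. For example, at least one of a, b, or c may indicate: a, b, c, a-b, a-c, b-c, or a-b-c, where each of a, b, and c may be single or multiple.
In addition, unless otherwise stated, ordinal numbers such as "first" and "second" in the embodiments of this application are used to distinguish between multiple objects and do not limit their size, content, order, timing, priority, or importance. For example, "first connection relationship" and "second connection relationship" merely distinguish different connection relationships and do not indicate any difference in complexity, priority, or importance between them.
The foregoing introduces some terms used in the embodiments of this application. The following describes the technical features involved.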
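Conventional AI algorithms (for example, AI/ML algorithms applied in image recognition, speech processing, natural language processing, and recommendation systems) build their basic operators mainly on real-valued models. Accordingly, the hardware computing platforms and processing apparatuses for conventional AI algorithms design their internal computation, data storage, data organization, and data movement around real-valued models. Conventional algorithms in the communications field (hereinafter, conventional communication algorithms, for example, those of 5G NR wireless communication systems, satellite communication systems, and Wi-Fi communication systems) build their basic operators mainly on complex-valued models. Accordingly, the hardware computing platforms and processing apparatuses in the communications field design their internal computation, data storage, data organization, and data movement around complex-valued models.
Because the mathematical models of conventional communication algorithms and conventional AI algorithms differ greatly, conventional communication algorithms are currently implemented on general-purpose communication processing apparatuses and conventional AI algorithms on general-purpose AI processing apparatuses, as shown in FIG. 1. Each type of processing apparatus has its own control unit and computing unit. The control units of both types schedule and control the corresponding operators. The computing unit of a general-purpose communication processing apparatus mainly constructs complex-valued operators and uses them to implement conventional communication algorithms; the computing unit of a general-purpose AI processing apparatus mainly constructs real-valued operators and uses them to implement conventional AI algorithms. This existing design approach restricts the applicable objects and scenarios of each type of processing apparatus, and deploying multiple sets of processing apparatuses of different types makes it difficult to achieve better implementation cost, energy efficiency, area efficiency, and computing efficiency.
Furthermore, as communication systems become more intelligent, communication-type AI algorithms are increasingly used. Depending on the application, a communication-type AI algorithm may be based on a complex-valued model or on a real-valued model. With the existing design approach, multiple sets of processing apparatuses of different types would have to be deployed, which is unfavorable in terms of implementation cost, area efficiency, energy efficiency, and processing efficiency.
In view of this, embodiments of this application provide a processing apparatus and a control method. The processing apparatus supports efficient computation of both communication algorithms and AI algorithms, so that both can run efficiently on a single processing apparatus, achieving optimal implementation cost, area efficiency, energy efficiency, and computing efficiency. For example, the AI algorithms may include communication-type AI algorithms and/or non-communication-type AI algorithms. The communication algorithms described below can be understood as the conventional communication algorithms described above; as one example, a communication algorithm below may be a communication-type non-AI algorithm.
It should be noted that, in the embodiments of this application, communication algorithms may include real-valued and complex-valued communication algorithms. Communication-type AI algorithms may include real-valued and complex-valued communication-type AI algorithms. Non-communication-type AI algorithms may likewise include real-valued and complex-valued non-communication-type AI algorithms.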
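A complex-valued communication-type AI algorithm may be implemented by, but is not limited to, one or more of the following structures: a complex-valued neural network model (or a complex-valued decision tree, complex-valued support vector machine (SVM), complex-valued k-nearest neighbor (k-NN) method, and so on), a complex-valued cost function, and a complex-valued training algorithm, as shown in FIG. 2A.
Alternatively, a real-valued communication-type AI algorithm may be implemented by, but is not limited to, one or more of the following structures: a real-valued neural network model (or a real-valued SVM, real-valued k-NN, and so on), a real-valued cost function, and a real-valued training algorithm, as shown in FIG. 2B.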
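Alternatively, a complex-valued and/or real-valued communication-type AI algorithm may be implemented by, but is not limited to, connecting the structure shown in FIG. 2A and the structure shown in FIG. 2B through a conversion module, which performs real-to-complex and complex-to-real conversion, as shown in FIG. 2C.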
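One way to picture the conversion module of FIG. 2C is the following Python sketch. It assumes an interleaved layout in which each complex value becomes a (real part, imaginary part) pair; the application does not specify the layout, and the function names are hypothetical. Concatenating all real parts followed by all imaginary parts would serve equally well.

```python
from typing import List, Sequence


def complex_to_real(z: Sequence[complex]) -> List[float]:
    """Complex-to-real conversion: N complex values -> 2N reals.

    Interleaved layout [Re(z0), Im(z0), Re(z1), Im(z1), ...] is an assumption.
    """
    out: List[float] = []
    for v in z:
        out.extend((v.real, v.imag))
    return out


def real_to_complex(x: Sequence[float]) -> List[complex]:
    """Real-to-complex conversion: inverse of complex_to_real."""
    if len(x) % 2:
        raise ValueError("an even number of real values is required")
    return [complex(x[i], x[i + 1]) for i in range(0, len(x), 2)]


# Complex features (for example, channel estimates) feeding a real-valued model...
h = [1 - 2j, 0.5 + 0.25j]
features = complex_to_real(h)        # [1.0, -2.0, 0.5, 0.25]
# ...and a real-valued output mapped back into the complex domain.
restored = real_to_complex(features)
```

Because the two functions are exact inverses, the conversion loses no information and only changes how the data is laid out for the real-valued or complex-valued part of the model.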
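For example, the complex-valued neural network model may include but is not limited to one or more of: a complex-valued multilayer perceptron (MLP) model, a complex-valued convolutional neural network (CNN) model, a complex-valued residual network (ResNet) model, a complex-valued recurrent neural network (RNN) model, a complex-valued Transformer model, a complex-valued Autoencoder model, and a complex-valued generative adversarial network (GAN) model; and complex-valued activation functions, complex-valued normalization functions (Batch Normalization), complex-valued pooling functions, and so on.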
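A complex-valued cost function defines the optimization objective of an AI/ML algorithm. Complex-valued cost function models may include but are not limited to one or more of: minimum mean squared error (MMSE), maximizing cosine similarity (CS), maximizing squared generalized cosine similarity (SGCS), minimizing cross entropy (CE), and so on.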
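Two of the cost functions listed above can be sketched for complex vectors in plain Python. The SGCS form used here, |ĥ^H h|^2 / (‖h‖^2 ‖ĥ‖^2), is the common definition in CSI-feedback work and is an assumption as far as this application is concerned; a training loop would minimize the MSE, or minimize 1 - SGCS to drive SGCS toward 1.

```python
from typing import Sequence


def inner(u: Sequence[complex], v: Sequence[complex]) -> complex:
    """Hermitian inner product u^H v (conjugates the first argument)."""
    return sum(a.conjugate() * b for a, b in zip(u, v))


def complex_mse(h: Sequence[complex], h_hat: Sequence[complex]) -> float:
    """Mean squared error between a complex target and its estimate."""
    return sum(abs(a - b) ** 2 for a, b in zip(h, h_hat)) / len(h)


def sgcs(h: Sequence[complex], h_hat: Sequence[complex]) -> float:
    """Squared generalized cosine similarity, in [0, 1]."""
    num = abs(inner(h_hat, h)) ** 2
    den = inner(h, h).real * inner(h_hat, h_hat).real
    return num / den


h = [1 + 1j, 2 - 1j]
print(complex_mse(h, h))             # 0.0
print(sgcs(h, [2j * v for v in h]))  # close to 1.0: SGCS ignores a common complex scale
```

The second print shows why SGCS suits channel-direction feedback: an estimate that differs from the target only by a common complex factor still scores (numerically) 1, whereas the MSE would penalize it.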
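A complex-valued training algorithm trains the AI/ML model so that the output of its cost function converges toward the optimal value. Complex-valued training algorithms may include but are not limited to one or more of: complex-valued stochastic gradient descent (SGD), complex-valued adaptive moment estimation (Adam), and extensions of Adam such as AdaMax and AMSGrad.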
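As a minimal illustration of complex-valued SGD, the sketch below fits a single complex weight w so that w·x approximates y. It uses the Wirtinger-calculus convention in which the descent direction for a complex parameter is the derivative of the loss with respect to conj(w); for a per-sample loss |w·x - y|^2 this gives the complex LMS update. The toy model and function name are illustrative assumptions, not the training procedure of this application.

```python
from typing import Sequence


def complex_sgd(xs: Sequence[complex], ys: Sequence[complex],
                lr: float = 0.1, epochs: int = 200) -> complex:
    """Fit y = w * x with one complex weight by stochastic gradient descent.

    For L = |w*x - y|^2, dL/d(conj w) = (w*x - y) * conj(x), so each step is
    w <- w - lr * (w*x - y) * conj(x)  (the complex LMS update).
    """
    w = 0j
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            err = w * x - y
            w -= lr * err * x.conjugate()
    return w


w_true = 0.5 - 1.2j
xs = [1 + 0j, 1j, 1 - 1j, -0.5 + 0.5j]
ys = [w_true * x for x in xs]
w_est = complex_sgd(xs, ys)
```

With noiseless data, each step scales the weight error by the real factor (1 - lr·|x|^2), so the estimate converges as long as lr·|x|^2 < 2 for every sample.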
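For example, the real-valued neural network model may include but is not limited to one or more of: a real-valued multilayer perceptron (MLP) model, a real-valued convolutional neural network (CNN) model, a real-valued residual network (ResNet) model, a real-valued recurrent neural network (RNN) model, a real-valued Transformer model, a real-valued Autoencoder model, and a real-valued generative adversarial network (GAN) model; and real-valued activation functions, real-valued normalization functions (Batch Normalization), real-valued pooling functions, and so on.
A real-valued cost function defines the optimization objective of an AI/ML algorithm. Real-valued cost function models may include but are not limited to one or more of: minimum mean squared error (MMSE), maximizing cosine similarity (CS), maximizing squared generalized cosine similarity (SGCS), minimizing cross entropy (CE), and so on.
A real-valued training algorithm trains the AI/ML model so that the output of its cost function converges toward the optimal value. Real-valued training algorithms may include but are not limited to one or more of: real-valued stochastic gradient descent (SGD), real-valued adaptive moment estimation (Adam), and extensions of Adam such as AdaMax and AMSGrad.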
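Communication-type AI algorithms can be applied to a variety of applications in communication systems, for example, in the access network and core network of a 5G NR wireless network, including the physical layer, link layer, and network layer of the access network. Taking several AI algorithm applications at the physical and link layers of a wireless network as examples:
When a physical-layer receiver receives an uplink DMRS, an uplink SRS, or a downlink CSI-RS, an AI channel estimation algorithm can effectively reduce the negative effects of channel noise and interference, improve channel estimation accuracy, and yield uplink or downlink throughput gains;
Using the results of multiple historical channel estimates, a physical-layer receiver can apply an AI algorithm to predict, in the time domain, the unknown channel at one or more future moments. This improves the ability to track dynamic changes in the wireless channel and yields accurate predictions of future channel information, so that transmission performance is maintained while the terminal user moves, improving user experience;
A physical-layer transmitter can use an AI neural network algorithm for constellation symbol modulation, constructing a regular or irregular constellation at the transmitting end. At the receiver, an AI neural network algorithm demodulates the transmitted constellation symbols and accurately recovers the transmitted bits, which can increase system transmission capacity;
A physical-layer transmitter can use an AI neural network algorithm to compress and feed back the estimated channel state information (CSI), and the receiver can use an AI neural network algorithm to recover the CSI and reconstruct the channel from the received feedback bits. AI-based CSI compression feedback and CSI recovery and reconstruction can effectively improve the CSI compression ratio and reconstruction accuracy, increasing system transmission capacity.
A physical-layer transmitter can use an AI algorithm to beamform the transmitted signal, improving interference suppression for multi-user transmission. At the receiver, AI-based accurate signal detection can increase the number of multiplexed user streams in a multiple-input multiple-output (MIMO) system, increasing system transmission capacity.
At the link layer, the receiver uses an AI algorithm to predict link quality from measurements fed back by the terminal, such as the channel quality indicator (CQI) and reference signal received power (RSRP), decides the optimal transmission mode, that is, the modulation and coding scheme (MCS), and transmits the signal accordingly. Highly accurate MCS selection can significantly improve link transmission quality and achieve higher throughput. In addition, communication AI algorithms can also perform high-frequency beam management, nonlinear device compensation, radio air-interface resource scheduling in MIMO systems, and so on, which are not listed one by one here.
Non-communication-type AI algorithms may include one or more of: AI algorithms for image processing, speech processing, recommendation systems, medical diagnosis, natural language processing, financial analysis, and so on, which are not listed one by one here.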
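Referring to FIG. 3, an embodiment of this application provides a structure of a processing apparatus. The processing apparatus includes a computing unit and a control unit. The computing unit performs algorithm computations. The control unit controls the computing unit to perform computations; specifically, it may control the computing unit to execute a communication algorithm or an AI algorithm through software scheduling, for example, by sending instructions to the computing unit. The specific functions of the computing unit and the control unit are described in detail below.
For example, the control unit may also be referred to as a micro-controller unit (MCU), and the computing unit may include a hybrid tensor processing unit (Hybrid Tensor Array) and a vector processing unit (Vector Unit).
The computations of the hybrid tensor processing unit may include but are not limited to matrix computations and tensor computations, for example: matrix multiplication, matrix addition, matrix decomposition (such as singular value decomposition (SVD) and Cholesky decomposition), matrix inversion, matrix dot product, one-dimensional, two-dimensional, or higher-dimensional (more than two dimensions) convolution, one-dimensional, two-dimensional, or higher-dimensional sparse convolution, tensor multiplication, tensor addition, tensor dot product, tensor data extraction, tensor data transposition, tensor dimension conversion, and so on.
The computations of the vector processing unit may include but are not limited to vector computations and nonlinear computations, for example: vector multiplication, vector addition, vector dot product, vector transposition, vector reciprocal, vector square root, vector trigonometric functions, vector exponential functions, and vector activation functions. Activation function types may include but are not limited to sigmoid, tanh, ReLU, ELU, LeakyReLU, softmax, softplus, and swish.
Optionally, the processing apparatus further includes a storage unit, which generally temporarily buffers the data needed by the current program computation. In this embodiment, the storage unit serves as a shared storage unit (share mem) that stores the data required by the computing unit and provides it to the computing unit, that is, to the hybrid tensor processing unit and the vector processing unit, for reading and writing, as shown in FIG. 4.
Optionally, a direct memory access (DMA) unit serves as the access interface between the processing apparatus and external devices. It mainly moves external data into the internal shared storage unit, or moves computed data from the shared storage unit to external storage.
The structure of the computing unit is described first.
The computing unit includes a plurality of PEs, and each PE includes at least one multiplier and at least one adder. The number of multipliers and/or adders may be the same or different across PEs; this is not specifically limited in this application.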
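A PE supports connecting its adders and multipliers through at least two connection relationships, including a first connection relationship and a second connection relationship. At least one adder and at least one multiplier connected through the first connection relationship implement real-stream-based algorithms, and at least one adder and at least one multiplier connected through the second connection relationship implement complex-stream-based algorithms. It should be noted that the multipliers connected by the first and second connection relationships may be the same or different, and likewise the adders connected by the first and second connection relationships may be the same or different.
For example, the first connection relationship may be used to implement real-valued communication algorithms and real-stream-based AI algorithms (for example, real-valued communication-type and non-communication-type AI algorithms), and the second connection relationship may be used to implement complex-valued communication algorithms and complex-stream-based AI algorithms (for example, complex-valued communication-type and non-communication-type AI algorithms).
In an example, the adders and multipliers in a PE are fully connected, that is, a connection exists between any two components (adders and/or multipliers) in the PE, so the PE can support connecting its adders and multipliers in multiple ways.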
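The control unit can control the connection relationship used by the PEs in the computing unit.
In a possible implementation, the control unit may control, based on the operation task, the connection relationship used by the PEs in the computing unit.
Optionally, if the input data of the operation task is a real stream, the control unit may determine that the operation task is a real-stream-based operation; if the input data is a complex stream, the control unit may determine that the operation task is a complex-stream-based operation.
Specifically, the control unit may control, based on the input data of the operation task, the connection relationship used by the PEs in the computing unit. For example, if the input data is a real stream, the control unit may control the PEs in the computing unit to use the first connection relationship; if the input data is a complex stream, it may control the PEs to use the second connection relationship.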
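The selection rule above can be sketched in Python. Here Python's `complex` type stands in for whatever data-type tag the hardware would actually read, most plausibly once per stream from an instruction or data descriptor rather than by inspecting elements; that mechanism, and the function names, are assumptions of this sketch.

```python
from typing import Sequence

FIRST = "first connection relationship"    # real-stream operations
SECOND = "second connection relationship"  # complex-stream operations


def classify_task(stream: Sequence) -> str:
    """Classify an operation task by the type of its input stream."""
    if not stream:
        raise ValueError("empty input stream")
    if all(isinstance(v, complex) for v in stream):
        return "complex-stream"
    if all(isinstance(v, (int, float)) for v in stream):
        return "real-stream"
    raise TypeError("a stream must be entirely real or entirely complex")


def select_connection(stream: Sequence) -> str:
    """Connection relationship the control unit would configure for the PEs."""
    return FIRST if classify_task(stream) == "real-stream" else SECOND


print(select_connection([0.5, -1.0, 2]))  # first connection relationship
print(select_connection([1 + 2j, 3j]))    # second connection relationship
```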
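In a possible implementation, the control unit may determine an operation task as follows: when the control unit detects a preset event that triggers an operation task, it determines that operation task.
For example, when the control unit detects a preset event that triggers image recognition, it determines that the operation task is image recognition. The preset event may be, for example, a face recognition command or an object recognition command triggered by a user.
As another example, when the control unit detects a preset event that triggers speech processing, it determines that the operation task is speech processing. The preset event may be, for example, a call command or a recording command triggered by a user.
As another example, when the control unit detects a preset event that triggers channel estimation, it determines that the operation task is channel estimation. The preset event may be, for example, that the device on which the processing apparatus is located receives an uplink DMRS, an uplink SRS, or a downlink CSI-RS.
As another example, when the control unit detects a preset event that triggers channel time-domain prediction, it determines that the operation task is channel time-domain prediction. The preset event may be, for example, that the device on which the processing apparatus is located receives indication information sent by a peer communication device, or that the device determines to perform channel time-domain prediction.
As another example, when the control unit detects a preset event that triggers constellation symbol modulation, it determines that the operation task is constellation symbol modulation. The preset event may be, for example, that the device on which the processing apparatus is located determines to construct a regular or irregular constellation at the transmitting end, or that the device is about to send a signal to a receiving end.
As another example, when the control unit detects a preset event that triggers constellation symbol demodulation, it determines that the operation task is constellation symbol demodulation. The preset event may be, for example, that the processing apparatus receives a signal sent by a peer device.
As another example, when the control unit detects a preset event that triggers CSI compression feedback, it determines that the operation task is compressing and feeding back the estimated CSI. The preset event may be, for example, that the device on which the processing apparatus is located receives a pilot signal from a peer device, or that the device has performed channel estimation.
As another example, when the control unit detects a preset event that triggers channel reconstruction, it determines that the operation task is channel reconstruction. The preset event may be, for example, that the device on which the processing apparatus is located receives CSI reported by a peer device. Further, if the CSI is compressed, the control unit may determine that the operation task also includes CSI recovery.
As another example, when the control unit detects a preset event that triggers beamforming of a signal to be transmitted, it determines that the operation task is beamforming the signal to be transmitted. The preset event may be, for example, that the device on which the processing apparatus is located is about to send a signal to a peer device.
As another example, when the control unit detects a preset event that triggers signal detection, it determines that the operation task is signal detection. The preset event may be, for example, that the device on which the processing apparatus is located receives a signal sent by a peer device.
As another example, when the control unit detects a preset event that triggers link quality prediction, it determines that the operation task is link quality prediction. The preset event may be, for example, that the device on which the processing apparatus is located receives measurements such as CQI and RSRP fed back by a terminal device.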
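The event-driven task determination described above amounts to a lookup from preset events to operation tasks. The Python sketch below uses hypothetical event and task identifiers (the application describes the events only in prose) and models the one conditional case: compressed CSI adds a CSI recovery task before channel reconstruction.

```python
from typing import Dict, List

# Hypothetical event identifiers; the description names these events only in prose.
EVENT_TO_TASKS: Dict[str, List[str]] = {
    "user_face_recognition_command": ["image_recognition"],
    "user_call_command": ["speech_processing"],
    "reference_signal_received": ["channel_estimation"],  # UL DMRS, UL SRS or DL CSI-RS
    "peer_indication_received": ["channel_time_domain_prediction"],
    "signal_to_send": ["constellation_modulation", "beamforming"],
    "peer_signal_received": ["constellation_demodulation", "signal_detection"],
    "peer_pilot_received": ["csi_compression_feedback"],
    "cqi_rsrp_received": ["link_quality_prediction"],
}


def determine_tasks(event: str, csi_compressed: bool = False) -> List[str]:
    """Map a detected preset event to the operation task(s) it triggers."""
    if event == "csi_report_received":
        # Compressed CSI must be recovered before the channel is reconstructed.
        return (["csi_recovery"] if csi_compressed else []) + ["channel_reconstruction"]
    if event not in EVENT_TO_TASKS:
        raise ValueError(f"no operation task registered for event {event!r}")
    return list(EVENT_TO_TASKS[event])


print(determine_tasks("csi_report_received", csi_compressed=True))
```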
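As shown in FIG. 5, taking one PE as an example, the PE includes three switch modules, a multiplier module, and an adder module. Each switch module includes at least one switch; the multiplier module includes four multipliers, and the adder module includes two adders. Switch module 1 controls, through the on/off states of its switches, which multipliers the data stream enters; switch module 2 controls, through the on/off states of its switches, which adders the data stream enters; and switch module 3 controls, through the on/off states of its switches, the output of the data stream.
Specifically, if the PE performs a real-stream-based operation, the first connection relationship may be used. For example, switch module 1 routes the data stream into multipliers of the multiplier module (for example, all four multipliers), and switch module 2 routes the multiplier outputs directly to the PE output (that is, bypassing the adders). If the PE performs a complex-stream-based operation, the second connection relationship may be used. For example, switch module 1 routes the data stream into multipliers of the multiplier module (for example, all four multipliers), and switch module 2 routes the multiplier outputs into adders of the adder module (for example, both adders), from which the result is output.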
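The two connection relationships of FIG. 5 can be modeled behaviorally in Python. This sketch assumes that in the second connection relationship the four multipliers compute the four cross products of one complex multiplication and that one adder negates one of its inputs to form the subtraction; the application does not specify operand routing or how subtraction is realized, so both are assumptions.

```python
from typing import List, Sequence

FIRST, SECOND = "first", "second"  # connection relationships of FIG. 5


def pe_compute(mode: str, x: Sequence[float], y: Sequence[float]) -> List[float]:
    """Behavioral model of one PE with four multipliers and two adders.

    FIRST (real stream): x and y each carry four real operands; the four
    products go straight to the output and the adders are bypassed.
    SECOND (complex stream): x = (xr, xi) and y = (yr, yi) form one complex
    multiplication; the four products feed the two adders, one adder
    negating an input to subtract (assumed).
    """
    if mode == FIRST:
        if len(x) != 4 or len(y) != 4:
            raise ValueError("first relationship expects four real operand pairs")
        return [x[i] * y[i] for i in range(4)]  # four multipliers, adders bypassed
    if mode == SECOND:
        xr, xi = x
        yr, yi = y
        # Switch module 1 fans the operands out to the four multipliers.
        m0, m1, m2, m3 = xr * yr, xi * yi, xr * yi, xi * yr
        # Switch module 2 routes the products into the two adders.
        return [m0 - m1, m2 + m3]  # [Re, Im] of (xr + j*xi) * (yr + j*yi)
    raise ValueError(f"unknown connection relationship {mode!r}")


print(pe_compute(FIRST, [1, 2, 3, 4], [5, 6, 7, 8]))  # [5, 12, 21, 32]
print(pe_compute(SECOND, (1, 2), (3, 4)))             # [-5, 10], i.e. (1+2j)(3+4j) = -5+10j
```

The same four multipliers serve both modes; only the routing through switch module 2 changes, which is the reuse the description attributes to the added interconnect.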
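It should be understood that FIG. 5 is merely an example. This application does not limit the number or connection relationships of the switch modules, multiplier modules, and adder modules, nor the number of multipliers in a multiplier module or adders in an adder module.
Whereas real-stream-based and complex-stream-based algorithms would otherwise require two types of processing apparatus, the foregoing approach adds a small amount of hardware interconnect so that the multipliers and adders in one processing apparatus can be reused to implement different algorithms. This gives the processing apparatus flexible hardware processing and configuration and improves resource utilization. Moreover, because the processing apparatus provided in this application can implement both real-stream-based and complex-stream-based algorithms, a single processing apparatus can efficiently compute both communication algorithms and AI algorithms.
Based on this PE architecture, when controlling the computing unit to perform computation, the control unit can control the connection relationship used by the PEs in the computing unit. For example, the control unit may schedule the connection relationship used by the PEs according to the operation task to be performed: for real-valued operations, it may schedule the PEs to use the first connection relationship, and for complex-valued operations, the second connection relationship.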
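Optionally, the control unit may schedule the computing unit in a time-division multiplexing manner to implement communication algorithms and AI algorithms, for example, as shown in FIG. 6A. This improves the hardware utilization and energy efficiency of the processing apparatus.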
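A minimal Python sketch of time-division multiplexing between the two workloads follows. Strict alternation of time slots, with no idle slot when only one queue still has work, is an assumed policy; FIG. 6A is not reproduced here and the application does not fix the slot pattern. The operation names are illustrative.

```python
from collections import deque
from typing import List, Sequence, Tuple


def tdm_schedule(comm_ops: Sequence[str], ai_ops: Sequence[str]) -> List[Tuple[int, str, str]]:
    """Time-division multiplexing of one computing unit between two workloads.

    Communication and AI operations alternate time slots; a time slot is never
    left idle when only one queue still has work (assumed policy).
    Returns (time_slot, workload, operation) tuples.
    """
    queues = [("comm", deque(comm_ops)), ("ai", deque(ai_ops))]
    timeline: List[Tuple[int, str, str]] = []
    turn = 0
    while any(q for _, q in queues):
        name, q = queues[turn % 2]
        if q:
            timeline.append((len(timeline), name, q.popleft()))
        turn += 1
    return timeline


for entry in tdm_schedule(["channel_estimation", "mimo_detection"], ["csi_compression"]):
    print(entry)
```

Before each time slot, the control unit would also set the PE connection relationship appropriate to that operation, which is what lets both workloads share the same multipliers and adders.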
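In a possible implementation, the computing unit may include one or more slots, each including one or more PEs. If the computing unit includes a plurality of slots, the slots are interconnected, that is, a connection exists between any two of them. For example, as shown in FIG. 6B, the computing unit includes slot 1, slot 2, slot 3, and slot 4, and each of the four slots is connected to every other slot. Any two of the slots may implement the same or different functions, each independently completing one functional processing step of an operator computation. The PEs in a slot may have the same or different functions; this is not specifically limited here.
For example, the computing unit includes a first slot, a second slot, and a third slot that are interconnected. The first slot includes a plurality of PEs implementing a multiplication function; the second slot includes a plurality of PEs implementing an adder-tree function; and the third slot includes a plurality of PEs implementing an accumulator function. It should be understood that this example assumes the PEs within each slot have the same function; it does not limit the function types of PEs in a slot.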
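The three-slot example above (multiplication, adder tree, accumulator) naturally computes a long sum of products in fixed-width chunks. The Python sketch below models each slot as a function; the chunk width of 4 and all names are assumptions for illustration. It works unchanged on real or complex operands.

```python
from typing import List, Sequence, Union

Num = Union[int, float, complex]


def multiply_slot(a: Sequence[Num], b: Sequence[Num]) -> List[Num]:
    """First slot: one multiplier PE per element pair."""
    return [x * y for x, y in zip(a, b)]


def adder_tree_slot(values: Sequence[Num]) -> Num:
    """Second slot: pairwise reduction over log2(n) adder levels."""
    level = list(values)
    while len(level) > 1:
        if len(level) % 2:
            level.append(0)  # pad odd levels with a zero operand
        level = [level[i] + level[i + 1] for i in range(0, len(level), 2)]
    return level[0] if level else 0


class AccumulatorSlot:
    """Third slot: keeps a running sum across successive adder-tree outputs."""

    def __init__(self) -> None:
        self.total: Num = 0

    def accumulate(self, value: Num) -> Num:
        self.total += value
        return self.total


def chunked_sum_of_products(a: Sequence[Num], b: Sequence[Num], width: int = 4) -> Num:
    """Sum of a[i]*b[i], processed `width` elements at a time through the three slots."""
    acc = AccumulatorSlot()
    for i in range(0, len(a), width):
        acc.accumulate(adder_tree_slot(multiply_slot(a[i:i + width], b[i:i + width])))
    return acc.total


print(chunked_sum_of_products([1, 2, 3, 4, 5], [1, 1, 1, 1, 1]))  # 15
```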
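As an example, if a slot is used for real-stream-based operations, all PEs in the slot perform real-stream-based operations, for example, all using the first connection relationship. If a slot is used for complex-stream-based operations, all PEs in the slot perform complex-stream-based operations, for example, all using the second connection relationship.
As shown in FIG. 7, the computing unit includes three switch modules, a first slot, a second slot, and a third slot, where each switch module includes at least one switch. Switch module A controls, through the on/off states of its switches, data flow into the first slot; switch module B controls data flow into the second slot; and switch module C controls data flow into the third slot.
It should be understood that FIG. 7 is merely an example; this application does not limit the number or functions of the switch modules and slots.
It should be noted that this application does not limit the position of the switch modules in the processing apparatus. A switch module may be included in a slot or deployed outside the slots; this is not specifically limited here.
Based on this computing unit structure, when controlling the computing unit to perform computation, the control unit may schedule all or some of the slots. For example, when performing a first operation task, the control unit may schedule all of the slots; when performing a second operation task, a first subset of the slots; and when performing a third operation task, a second subset of the slots.
With reference to the example in FIG. 7, the following illustrates how the control unit schedules some of the slots. The control unit may schedule the first slot and the third slot according to the operation task. Specifically, the control unit may turn switch module A on, switch module B off, and switch module C on. The data stream then enters the first slot under the control of switch module A; after processing in the first slot, it enters the third slot under the control of switch module C; and after processing in the third slot, the data is output. Here, a switch module being on means that at least one of its switches is on, and a switch module being off means that all of its switches are off.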
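The partial-slot scheduling just described can be sketched in Python. Each switch module is modeled as a list of switch states (on if at least one switch is on, as defined above), and data passes through the slots whose switch module is on. The fixed A, B, C order and the toy slot functions are hypothetical placeholders, not the multiplier, adder-tree, or accumulator slots themselves.

```python
from typing import Callable, Dict, List

Vector = List[float]


def module_on(switches: List[bool]) -> bool:
    """A switch module is on if at least one of its switches is on."""
    return any(switches)


def run_computing_unit(data: Vector,
                       slots: Dict[str, Callable[[Vector], Vector]],
                       switch_modules: Dict[str, List[bool]]) -> Vector:
    """Pass data through the slots whose switch module is on.

    Switch module A feeds the first slot, B the second and C the third;
    the fixed A -> B -> C order is an assumption of this sketch.
    """
    for name in ("A", "B", "C"):
        if module_on(switch_modules[name]):
            data = slots[name](data)
    return data


# Toy placeholder slot functions (hypothetical).
slots = {
    "A": lambda v: [2 * x for x in v],  # first slot
    "B": lambda v: [sum(v)],            # second slot
    "C": lambda v: [x + 1 for x in v],  # third slot
}
# Schedule only the first and third slots: A on, B off (all switches off), C on.
partial = run_computing_unit([1.0, 2.0, 3.0], slots,
                             {"A": [True, False], "B": [False, False], "C": [True]})
full = run_computing_unit([1.0, 2.0, 3.0], slots, {"A": [True], "B": [True], "C": [True]})
print(partial, full)  # [3.0, 5.0, 7.0] [13.0]
```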
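In a possible implementation, when scheduling a slot to perform computation, the control unit may schedule, according to the operation task, the PEs in that slot to perform the corresponding computation. Taking FIG. 7 as an example, the control unit may schedule, according to the operation task, the PEs that perform computation in the first slot, in the second slot, in the third slot, and so on.
It should be understood that which PEs in a slot are scheduled may depend on the computation amount. For example, for an operation task with a large computation amount, the control unit may schedule more PEs in the slot; for a task with a small computation amount, it may schedule fewer PEs.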
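One possible sizing rule for how many PEs to schedule in a slot is sketched below in Python: choose the smallest PE count that finishes the task within a cycle budget, capped at the PEs the slot provides. The rule, its parameters, and the throughput model are assumptions for illustration; the application only states that more computation warrants more PEs.

```python
import math


def pes_to_schedule(op_count: int, slot_pes: int,
                    ops_per_pe_per_cycle: int, cycle_budget: int) -> int:
    """Number of PEs to schedule in one slot for a task (assumed sizing rule).

    Picks the smallest PE count that finishes op_count operations within
    cycle_budget cycles, capped at the PEs available in the slot.
    """
    if op_count <= 0:
        return 0
    needed = math.ceil(op_count / (ops_per_pe_per_cycle * cycle_budget))
    return min(needed, slot_pes)


print(pes_to_schedule(1_000, slot_pes=64, ops_per_pe_per_cycle=4, cycle_budget=10))    # 25
print(pes_to_schedule(100_000, slot_pes=64, ops_per_pe_per_cycle=4, cycle_budget=10))  # 64 (slot saturated)
```

Unscheduled PEs in the slot remain available to other operators, which is consistent with the resource-reuse goal described below.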
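In this approach, interconnecting the slots makes the computing resources on different slots, such as multipliers, adder trees, and accumulators, reusable under different instruction configurations (operators), improving computing resource utilization.
The embodiments of this application do not limit the specific connection medium between the control unit, the computing unit, and the storage unit. In FIG. 1 and FIG. 3, the control unit, computing unit, and storage unit are connected through a bus, shown as a thick line; the connections between other components are merely illustrative and not limiting. The bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of representation, only one thick line is used in FIG. 1 and FIG. 3, but this does not mean that there is only one bus or only one type of bus.
With the embodiments of this application, the processing apparatus can support multiple types of operators, for example, conventional complex-valued operators (such as complex-valued operators in communication algorithms), conventional real-valued operators (such as real-valued operators in communication algorithms), AI complex-valued operators (such as complex-valued operators in communication-type AI algorithms), and AI real-valued operators (such as real-valued operators in non-communication-type AI algorithms and in communication-type AI algorithms). A single piece of hardware can thus efficiently support both communication algorithms and AI algorithms, and the two types of algorithms can share hardware resources through time-division multiplexing, improving hardware utilization and energy efficiency. In addition, when processing a data stream, the embodiments determine the control information needed for execution once for all the data items that make up the stream; compared with determining control information once per data item, this improves computing efficiency.
Based on the same concept as the method embodiments, an embodiment of this application provides a control unit whose structure may be as shown in FIG. 8, including a determining module 801 and a scheduling module 802. The control unit may be used to implement the method performed by the control unit in the embodiments of FIG. 3 to FIG. 5, FIG. 6A, FIG. 6B, and FIG. 7. The determining module 801 is configured to determine an operation task. The scheduling module 802 is configured to: when the operation task is a real-valued operation, schedule a first connection relationship between at least one adder and at least one multiplier of a PE in the computing unit to perform the operation; and when the operation task is a complex-valued operation, schedule a second connection relationship between the at least one adder and the at least one multiplier of the PE to perform the operation. The first connection relationship is used to implement a real-stream-based communication algorithm or artificial intelligence (AI) algorithm, and the second connection relationship is used to implement a complex-stream-based communication algorithm or AI algorithm.
Optionally, the scheduling module 802 is further configured to schedule at least one slot in the computing unit according to the operation task, where the at least one slot is used to perform the operation for the operation task.
Optionally, the scheduling module 802 is further configured to: when scheduling a first slot in the computing unit, schedule PEs in the first slot according to the operation task, where the scheduled PEs are used to perform the operation for the operation task.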
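The division into modules in the embodiments of this application is illustrative and is merely a logical functional division; other divisions are possible in actual implementation. In addition, the functional modules in the embodiments of this application may be integrated into one processing apparatus, may exist physically separately, or two or more modules may be integrated into one module. The integrated module may be implemented in hardware or as a software functional module. For the functions or implementation of each module, further refer to the related descriptions of the embodiments shown in FIG. 3 to FIG. 5, FIG. 6A, FIG. 6B, and FIG. 7.
A person skilled in the art should understand that the embodiments of this application may be provided as a method, a system, or a computer program product. Therefore, this application may take the form of a hardware-only embodiment, a software-only embodiment, or an embodiment combining software and hardware. Moreover, this application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk memory, CD-ROM, and optical memory) that contain computer-usable program code.
This application is described with reference to flowcharts and/or block diagrams of the method, device (system), and computer program product according to this application. It should be understood that each process and/or block in the flowcharts and/or block diagrams, and combinations of processes and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce an apparatus for implementing the functions specified in one or more processes in the flowcharts and/or one or more blocks in the block diagrams.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or another programmable data processing device to work in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus that implements the functions specified in one or more processes in the flowcharts and/or one or more blocks in the block diagrams.
These computer program instructions may also be loaded onto a computer or another programmable data processing device, so that a series of operation steps is performed on the computer or other programmable device to produce computer-implemented processing, and the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more processes in the flowcharts and/or one or more blocks in the block diagrams.
Obviously, a person skilled in the art can make various modifications and variations to this application without departing from its spirit and scope. This application is intended to cover these modifications and variations provided that they fall within the scope of the claims of this application and their equivalent technologies.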
Claims (17)
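- A processing apparatus, wherein the processing apparatus supports computation of communication algorithms and of artificial intelligence (AI) algorithms; the processing apparatus comprises a computing unit and a control unit; the computing unit comprises a processing element, and the processing element supports: a first connection relationship between at least one adder and at least one multiplier, and a second connection relationship between the at least one adder and the at least one multiplier; the first connection relationship is used to implement a real-stream-based communication algorithm or AI algorithm, and the second connection relationship is used to implement a complex-stream-based communication algorithm or AI algorithm; and the control unit is configured to control the connection relationship used by the processing element in the computing unit, the connection relationship comprising the first connection relationship and the second connection relationship.
- The processing apparatus according to claim 1, wherein the at least one adder and the at least one multiplier are fully connected.
- The processing apparatus according to claim 1, wherein the control unit is specifically configured to schedule the computing unit in a time-division multiplexing manner to implement communication algorithms and AI algorithms.
- The processing apparatus according to claim 1, wherein the computing unit comprises one slot, and the slot comprises one or more of the processing elements.
- The processing apparatus according to claim 1, wherein the computing unit comprises at least two slots that are interconnected, and each slot comprises one or more of the processing elements.
- The processing apparatus according to claim 4 or 5, wherein, when implementing its corresponding function, the slot performs the corresponding computation through the scheduled processing elements in the slot.
- The processing apparatus according to claim 5, wherein the control unit is further configured to schedule all or some of the at least two slots to perform computation.
- The processing apparatus according to any one of claims 1 to 7, wherein the AI algorithms comprise communication-type AI algorithms and/or non-communication-type AI algorithms.
- A control method, wherein the method comprises: determining, by a control unit, an operation task; if the operation task is a real-stream-based operation, scheduling, by the control unit, a first connection relationship between at least one adder and at least one multiplier of a processing element in a computing unit to perform the operation; and if the operation task is a complex-stream-based operation, scheduling, by the control unit, a second connection relationship between the at least one adder and the at least one multiplier of the processing element to perform the operation; wherein the first connection relationship is used to implement a real-stream-based communication algorithm or artificial intelligence (AI) algorithm, and the second connection relationship is used to implement a complex-stream-based communication algorithm or AI algorithm.
- The method according to claim 9, wherein the method further comprises: scheduling, by the control unit according to the operation task, at least one slot in the computing unit, wherein the at least one slot is used to perform the operation for the operation task.
- The method according to claim 10, wherein the method further comprises: when scheduling a first slot in the computing unit, scheduling, by the control unit according to the operation task, processing elements in the first slot, wherein the scheduled processing elements are used to perform the operation for the operation task.
- A control unit, wherein the control unit comprises: a determining module, configured to determine an operation task; and a scheduling module, configured to: when the operation task is a real-stream-based operation, schedule a first connection relationship between at least one adder and at least one multiplier of a processing element in a computing unit to perform the operation; and when the operation task is a complex-stream-based operation, schedule a second connection relationship between the at least one adder and the at least one multiplier of the processing element to perform the operation; wherein the first connection relationship is used to implement a real-stream-based communication algorithm or artificial intelligence (AI) algorithm, and the second connection relationship is used to implement a complex-stream-based communication algorithm or AI algorithm.
- The control unit according to claim 12, wherein the scheduling module is further configured to schedule, according to the operation task, at least one slot in the computing unit, wherein the at least one slot is used to perform the operation for the operation task.
- The control unit according to claim 13, wherein the scheduling module is further configured to: when scheduling a first slot in the computing unit, schedule processing elements in the first slot according to the operation task, wherein the scheduled processing elements are used to perform the operation for the operation task.
- A computer-readable storage medium, wherein the computer-readable storage medium is configured to store computer instructions, and when the computer instructions are run on a computer, the computer is enabled to perform the method according to any one of claims 9 to 11.
- A computer program product, wherein the computer program product comprises instructions, and when the instructions are run by a processing apparatus, the method according to any one of claims 9 to 11 is implemented.
- A chip system, comprising a processor, wherein the processor is configured to execute a program, instructions, or code stored in a memory, to implement the method according to any one of claims 9 to 11.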
Applications Claiming Priority (2)
Application Number | Publication | Priority Date | Filing Date | Title |
---|---|---|---|---|
CN202211060064.1 | | 2022-08-31 | | |
CN202211060064.1A | CN117675608A (zh) | 2022-08-31 | 2022-08-31 | Processing apparatus and control method |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2024045888A1 (zh) | 2024-03-07 |
Family
ID=90073874
Family Applications (1)
Application Number | Publication | Priority Date | Filing Date | Title |
---|---|---|---|---|
PCT/CN2023/105636 | WO2024045888A1 (zh) | 2022-08-31 | 2023-07-04 | Processing apparatus and control method |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN117675608A (zh) |
WO (1) | WO2024045888A1 (zh) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20210256092A1 (en) * | 2020-02-19 | 2021-08-19 | Nvidia Corporation | Application programming interface to accelerate matrix operations |
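CN113614749A (zh) * | 2021-06-25 | 2021-11-05 | Huawei Technologies Co., Ltd. | Artificial intelligence model processing method, apparatus, and device, and readable storage medium |
CN114443559A (zh) * | 2020-10-30 | 2022-05-06 | 辰芯科技有限公司 | Reconfigurable operator unit, processor, computing method, apparatus, device, and medium |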
- 2022-08-31: CN CN202211060064.1A, patent CN117675608A (active, pending)
- 2023-07-04: WO PCT/CN2023/105636, patent WO2024045888A1 (status unknown)
Also Published As
Publication number | Publication date |
---|---|
CN117675608A (zh) | 2024-03-08 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 23858926; Country of ref document: EP; Kind code of ref document: A1 |