WO2023246132A1 - Channel splitter, storage control device, system on chip, and terminal - Google Patents

Channel splitter, storage control device, system on chip, and terminal

Info

Publication number
WO2023246132A1
WO2023246132A1 PCT/CN2023/077375 CN2023077375W WO2023246132A1 WO 2023246132 A1 WO2023246132 A1 WO 2023246132A1 CN 2023077375 W CN2023077375 W CN 2023077375W WO 2023246132 A1 WO2023246132 A1 WO 2023246132A1
Authority
WO
WIPO (PCT)
Prior art keywords
memory
address
decoding
bit
data
Prior art date
Application number
PCT/CN2023/077375
Other languages
English (en)
French (fr)
Inventor
刘卓睿
Original Assignee
哲库科技(上海)有限公司
Priority date
Filing date
Publication date
Application filed by 哲库科技(上海)有限公司
Publication of WO2023246132A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5011 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals
    • G06F9/5016 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals the resource being the memory
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00 Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/14 Handling requests for interconnection or transfer
    • G06F13/16 Handling requests for interconnection or transfer for access to memory bus
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00 Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/14 Handling requests for interconnection or transfer
    • G06F13/16 Handling requests for interconnection or transfer for access to memory bus
    • G06F13/1668 Details of memory controller
    • G06F13/1694 Configuration of memory controller to different memory types
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00 Digital computers in general; Data processing equipment in general
    • G06F15/76 Architectures of general purpose stored program computers
    • G06F15/78 Architectures of general purpose stored program computers comprising a single central processing unit
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00 Digital computers in general; Data processing equipment in general
    • G06F15/76 Architectures of general purpose stored program computers
    • G06F15/78 Architectures of general purpose stored program computers comprising a single central processing unit
    • G06F15/7807 System on chip, i.e. computer system on a single chip; System in package, i.e. computer system on one or more chips in a single package
    • G06F15/7821 Tightly coupled to memory, e.g. computational memory, smart memory, processor in memory
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]

Definitions

  • Embodiments of the present application relate to the field of storage technology, and in particular to a channel splitter, a storage control device, a system on a chip, and a terminal.
  • Multi-channel is widely used in terminals as a technology to improve memory read and write bandwidth. For example, it is common for mobile phones to support 4 memory channels.
  • Embodiments of the present application provide a channel splitter, a storage control device, a system on a chip, and a terminal.
  • The technical solutions are as follows:
  • Embodiments of the present application provide a channel splitter, which includes a splitter component and N bit-width conversion components, where N is greater than or equal to 2.
  • The splitter component is used to divide the memory address into N memory channels.
  • The bit-width conversion component is used to perform bit-width conversion on the data input by the splitter component.
  • The splitter component includes an address decoder.
  • The address decoder is used for memory address decoding.
  • The address decoder supports at least two working modes, and the decoding method it uses differs between working modes.
  • Embodiments of the present application provide a storage control device, which includes a controller and at least one channel splitter as described in the above aspect;
  • the storage control device is connected to the main device through a main bus, and the storage control device is connected to the memory through a physical layer interface.
  • embodiments of the present application provide a system-on-chip, which includes: a main device and a storage control device as described above;
  • the main device is connected to the storage control device through a main bus;
  • the storage control device is connected to the memory through a physical layer interface.
  • embodiments of the present application provide a terminal, in which the system-on-chip as described above is provided.
  • Figure 1 is an architectural schematic diagram of the memory reading architecture in related technologies
  • Figure 2 shows a schematic structural diagram of a channel splitter according to an exemplary embodiment of the present application
  • Figure 3 shows a schematic structural diagram of a shunt component according to an exemplary embodiment of the present application
  • Figure 4 is a schematic structural diagram of an address decoder according to an exemplary embodiment of the present application.
  • Figure 5 is a schematic structural diagram of an address decoder with two working modes according to an exemplary embodiment of the present application
  • Figure 6 is a schematic structural diagram of an address decoder with three working modes according to an exemplary embodiment of the present application
  • Figure 7 shows a schematic structural diagram of a channel splitter according to another exemplary embodiment of the present application.
  • Figure 8 is a schematic structural diagram of a channel splitter arranged outside the security bus according to an exemplary embodiment of the present application.
  • Figure 9 is a schematic structural diagram of a channel splitter arranged within the security bus according to an exemplary embodiment of the present application.
  • Figure 10 is a schematic diagram of the connection between the security bus and the encryption and decryption components according to an exemplary embodiment of the present application.
  • Figure 11 is a schematic diagram of the connection between the security bus and the encryption and decryption components according to another exemplary embodiment of the present application.
  • Figure 12 is a schematic structural diagram of a storage control device according to an exemplary embodiment of the present application.
  • Figure 13 is a schematic structural diagram of a storage control device with 8 memory channels according to an exemplary embodiment of the present application.
  • Figure 14 shows a schematic structural diagram of a system-on-chip according to an exemplary embodiment of the present application
  • FIG. 15 shows a schematic structural diagram of a system-on-chip according to another exemplary embodiment of the present application.
  • The term "plurality" in this document means two or more.
  • "And/or" describes an association between related objects and indicates that three relationships are possible.
  • For example, A and/or B can mean: A exists alone, A and B exist simultaneously, or B exists alone.
  • The character "/" generally indicates that the related objects are in an "or" relationship.
  • Each master device establishes 4 bus links with the primary bus 11 and sends data read and write instructions to the primary bus 11 through these links. Four links are also established between the primary bus 11 and the secondary bus 12, and four memory channels are established between the secondary bus 12 and the four controllers 13.
  • Data read and write instructions are sent by the secondary bus 12 to the controller 13 corresponding to a given memory channel, and the controller 13 reads and writes data in the memory 15 through the physical layer interface 14.
  • The number of memory channels is equal to the number of links between the primary (master) and secondary (slave) buses.
  • Although this memory reading method is relatively simple in design, its performance increasingly struggles to meet the needs of artificial intelligence and other parallel computing applications.
  • The embodiment of the present application provides a channel splitter composed of a splitter component and N bit-width conversion components.
  • The splitter component divides the memory address into N memory channels, and the bit-width conversion components perform bit-width conversion on the data input by the splitter component. This expands the number of memory channels so that data can be read and written through more channels, which increases the memory read and write bandwidth, improves the performance of the upstream master device, and supports more concurrent application scenarios.
  • The address decoder in the splitter component supports at least two working modes. Depending on the performance and power consumption requirements of the scenario, it can decode memory addresses with a different decoding method in each working mode, thereby meeting the power consumption and performance requirements of different usage scenarios.
  • When the channel splitter provided by the embodiment of the present application is applied to the above memory reading architecture (for example, set in the slave bus), the number of memory channels between the slave bus and the controllers changes from being equal to the number of links between the master and slave buses to being greater than that number. This expands the number of memory channels and thus increases the memory read and write bandwidth. The overall performance of the system-on-chip and the terminal device can therefore improve without increasing the read and write speed of the memory chips.
  • the structure and working principle of the channel splitter are described below through schematic embodiments.
  • FIG. 2 shows a schematic structural diagram of a channel splitter according to an exemplary embodiment of the present application.
  • the channel splitter 21 includes: a splitter component 211 and N bit-width conversion components 212, where N is greater than or equal to 2.
  • For example, when the channel splitter is a dual-channel splitter, it is provided with one splitter component and two bit-width conversion components; when the channel splitter is a three-channel splitter, it is provided with one splitter component and three bit-width conversion components.
  • The channel splitter may also be called a 1-to-N component, which is not limited in the embodiment of the present application.
  • The shunt component 211 includes an input interface and N shunt output interfaces, and is connected to the N bit-width conversion components 212 through the N shunt output interfaces.
  • The shunt component 211 is used to implement the shunt function, that is, to divide the memory address into N memory channels. The input bit width of the shunt component 211 is the same as its output bit width.
  • the bit width conversion component 212 includes a shunt input interface and an output interface.
  • The bit width conversion component 212 performs bit width conversion on the data received through the shunt input interface and outputs the converted data through the output interface. In other words, the bit width conversion component 212 implements a bit width conversion function, and its input bit width and output bit width are different.
  • The bit width conversion component 212 can be implemented as a downsizer that converts a high-bit-width input into a low-bit-width output. The sum of the output bit widths of the N bit width conversion components 212 is greater than or equal to the input bit width.
  • For example, the bit width conversion component implements bit width conversion from 256 bits to 128 bits. When there are 2 bit-width conversion components, the sum of their output bit widths (128 bits × 2) is equal to the input bit width; when there are 3 bit-width conversion components, the sum of their output bit widths (128 bits × 3) is greater than the input bit width.
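  • The bit-width arithmetic above can be sketched as follows, assuming a 256-bit input word that a downsizer splits into 128-bit beats. The function names `downsize` and `widths_cover_input` and the least-significant-first beat order are illustrative assumptions, not details taken from the application.

```python
def downsize(word: int, in_bits: int = 256, out_bits: int = 128) -> list[int]:
    """Split one in_bits-wide word into out_bits-wide beats (low beat first)."""
    if in_bits % out_bits != 0:
        raise ValueError("in_bits must be a multiple of out_bits")
    if not 0 <= word < (1 << in_bits):
        raise ValueError("word does not fit in in_bits")
    mask = (1 << out_bits) - 1
    return [(word >> shift) & mask for shift in range(0, in_bits, out_bits)]


def widths_cover_input(n_components: int, out_bits: int = 128, in_bits: int = 256) -> bool:
    """Check that the summed output widths are at least the input width."""
    return n_components * out_bits >= in_bits


# Hypothetical 256-bit word whose upper half is 0xAB and lower half is 0xCD.
beats = downsize((0xAB << 128) | 0xCD)
print([hex(b) for b in beats])
print(widths_cover_input(2), widths_cover_input(3))
```

  • With two components the summed output width equals the 256-bit input, and with three it exceeds it, matching the two cases described above.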
  • When the upstream device performs data read and write operations, it indicates the address of the data to be read or written in the data read and write instructions.
  • After the channel splitter receives a data read or write request, it needs to translate the address indicated by the master device (usually a virtual address) into a physical address, so that the controller can subsequently be instructed to read or write data based on the physical address. This process is the decoding of the memory address.
  • the branching component 211 includes an address decoder 2111.
  • The address decoder can perform decoding using single decoding (also called a word structure) or double decoding (also called an X-Y decoding structure).
  • The number of output address bits of the address decoder is less than the number of input address bits, and the difference is positively correlated with N.
  • That is, the larger N is (the more memory channels there are), the larger the difference in the number of bits.
  • For example, the virtual address input to the address decoder is 37 or 40 bits.
  • In one configuration, the physical memory address output by the address decoder after decoding is 35 or 38 bits (2 bits fewer).
  • In a configuration with a larger N, the physical memory address output by the address decoder after decoding is 34 or 37 bits (3 bits fewer).
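  • One reading consistent with these figures is that log2(N) address bits are consumed to select the memory channel, so that N = 4 removes 2 bits and N = 8 removes 3 bits. The text as given does not state which N corresponds to each case, so the sketch below treats that mapping, the power-of-two restriction, and the function name `output_address_bits` as assumptions.

```python
def output_address_bits(input_bits: int, n_channels: int) -> int:
    """Bits left after removing the channel-select bits.

    Assumes N is a power of two and that exactly log2(N) bits pick the channel.
    """
    if n_channels < 2 or n_channels & (n_channels - 1):
        raise ValueError("n_channels must be a power of two and at least 2")
    channel_bits = n_channels.bit_length() - 1  # log2(N) for a power of two
    return input_bits - channel_bits


for n in (4, 8):
    print(n, [output_address_bits(bits, n) for bits in (37, 40)])
```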
  • The internal structure of the shunt component 300 with one input interface S0 and N output interfaces (M0 to MN) is shown enlarged.
  • The input interface S0 is provided with an address decoder 301 and a bus matrix (BUS Matrix) 302.
  • Each output interface is provided with a corresponding matrix (M0 matrix to MN matrix).
  • After the address decoder 301 completes address decoding, data interleaving is performed based on the decoded memory address, and the data corresponding to each memory channel is distributed to the matrix of the corresponding output interface through the bus matrix 302.
  • The address decoder in the embodiment of the present application supports at least two working modes, and uses a different decoding method for address decoding in each working mode.
  • The channel splitter can set the address decoder to the corresponding working mode based on the memory read and write performance and power consumption requirements of the usage scenario, thereby meeting the performance requirements and/or power consumption requirements of the current usage scenario.
  • The address decoder is provided with a first register and at least two address decoding modules.
  • The data stored in the first register indicates the working mode, and different address decoding modules work in different working modes.
  • the at least two address decoding modules include a first address decoding module and a second address decoding module
  • the first address decoding module is used to work in the first working mode, and the decoding method used by the first address decoding module is low-bit decoding;
  • the second address decoding module is used to work in the second working mode, and the decoding method used by the second address decoding module is high-bit decoding;
  • the interleaving granularity used in low-bit decoding is smaller than that used in high-bit decoding.
  • the first address decoding module is also provided with a hash function, and the hash function is used to load balance the N memory channels.
  • At least two address decoding modules also include a third address decoding module
  • the third address decoding module is configured to work in the third working mode.
  • the decoding method used by the third address decoding module includes low-bit decoding and high-bit decoding.
  • the address decoder is further provided with a second register, and the data stored in the second register is used to indicate the memory address range corresponding to different decoding methods in the third working mode.
  • the third address decoding module is also provided with a hash function, and the hash function is used to load balance the N memory channels during the low-bit decoding process.
  • the bit-width conversion component uses a random access memory structure.
  • the input interface of the channel splitter is connected to the main device through the main bus.
  • the main device is a device with data reading and writing requirements;
  • the N output interfaces of the channel splitter are connected to N controllers, and the N controllers correspond to N memory channels.
  • The address decoder 400 is provided with a first register 401 and at least two address decoding modules 402. Different address decoding modules 402 are used to work in different working modes.
  • The data stored in the first register 401 indicates the working mode.
  • When a mode setting instruction is received from an upstream master device, mode data corresponding to the target mode indicated by the instruction is written into the first register.
  • The address decoding module 402 is a hardware module.
  • The address decoder 400 activates the address decoding module 402 that corresponds to the working mode indicated by the data written in the first register 401.
  • When the address decoder 500 supports two working modes, the data stored in the first register 511 is 0 (indicating the first working mode) or 1 (indicating the second working mode). Correspondingly, the address decoder 500 is provided with a first address decoding module 521 and a second address decoding module 522.
  • When the address decoder 600 supports three working modes, the data stored in the first register 611 is 00 (indicating the first working mode), 01 (indicating the second working mode), or 10 (indicating the third working mode).
  • Correspondingly, the address decoder 600 is provided with a first address decoding module 621, a second address decoding module 622 and a third address decoding module 623.
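  • The register-driven selection described above can be modeled in software as a small sketch. The class and method names (`WorkingMode`, `AddressDecoderModel`, `set_mode`, `active_module`) are hypothetical, and rejecting the undefined encoding 11 is an assumption; the actual decoder is a hardware block.

```python
from enum import IntEnum


class WorkingMode(IntEnum):
    """Two-bit encodings of the first register in the three-mode example."""
    FIRST = 0b00
    SECOND = 0b01
    THIRD = 0b10


class AddressDecoderModel:
    """Software stand-in for the first register plus the module selection."""

    _MODULES = {
        WorkingMode.FIRST: "first address decoding module",
        WorkingMode.SECOND: "second address decoding module",
        WorkingMode.THIRD: "third address decoding module",
    }

    def __init__(self) -> None:
        self.first_register = WorkingMode.FIRST

    def set_mode(self, value: int) -> None:
        # Models a mode setting instruction writing the first register.
        # WorkingMode(value) raises ValueError for an undefined encoding.
        self.first_register = WorkingMode(value)

    def active_module(self) -> str:
        # The module that works in the mode currently held in the register.
        return self._MODULES[self.first_register]


decoder = AddressDecoderModel()
decoder.set_mode(0b10)
print(decoder.active_module())
```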
  • The partitioning granularity used when dividing memory among memory channels is called the interleaving granularity. For example, when 8GB of memory is divided into 8 memory channels at an interleaving granularity of 1GB, 0 to 1GB is assigned to the first memory channel, 1 to 2GB to the second memory channel, 2 to 3GB to the third memory channel, and so on.
  • When the same memory is divided at an interleaving granularity of 1MB, the 8i to (8i+1) MB range is assigned to the first memory channel, the (8i+1) to (8i+2) MB range to the second memory channel, the (8i+2) to (8i+3) MB range to the third memory channel, and so on, where i is a non-negative integer.
  • Each memory channel is therefore assigned different memory blocks under different interleaving granularities (the total amount is the same). As the interleaving granularity decreases, data read and write speed increases because more memory channels can be used in parallel. For example, when the memory is divided at a granularity of 1GB, the data from 400MB to 500MB can only be read through the first memory channel; when the memory is divided at a granularity of 1MB, that data can be read through 8 memory channels at the same time.
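  • The 400MB to 500MB example can be checked with a short sketch. It assumes the simple mapping channel = (address // granularity) mod N, which matches the 1GB and 1MB layouts described above; the function names are illustrative and channels are numbered from 0.

```python
MB = 1 << 20
GB = 1 << 30


def channel_of(addr: int, granularity: int, n_channels: int = 8) -> int:
    """Channel index (0-based) of a byte address under plain interleaving."""
    return (addr // granularity) % n_channels


def channels_touched(start: int, end: int, granularity: int, n_channels: int = 8) -> set[int]:
    """Set of channels covering the half-open byte range [start, end)."""
    first_block = start // granularity
    last_block = (end - 1) // granularity
    # After n_channels consecutive blocks every channel has been visited.
    count = min(last_block - first_block + 1, n_channels)
    return {(first_block + i) % n_channels for i in range(count)}


print(channels_touched(400 * MB, 500 * MB, GB))          # coarse granularity
print(sorted(channels_touched(400 * MB, 500 * MB, MB)))  # fine granularity
```

  • At 1GB granularity the whole range falls in channel 0 (the first memory channel), while at 1MB granularity all 8 channels take part in the access.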
  • The at least two address decoding modules provided in the address decoder include a first address decoding module 521 and a second address decoding module 522.
  • the first address decoding module 521 is configured to work in the first working mode, and the decoding method used by the first address decoding module 521 is low-bit decoding;
  • the second address decoding module 522 is configured to work in the second operating mode, and the decoding method used by the second address decoding module 522 is high-bit decoding.
  • the performance of data reading and writing in the first working mode is better than the performance of data reading and writing in the second working mode.
  • The power consumption of data reading and writing in the first working mode is higher than the power consumption of data reading and writing in the second working mode.
  • the first operating mode may be called a performance mode
  • the second operating mode may be called a power consumption mode.
  • The address decoder can dynamically switch working modes in different application scenarios to meet the data read and write performance and power consumption requirements of each scenario.
  • In scenarios with high performance requirements, the address decoder can be set to the first working mode (using low-bit decoding) to prioritize data read and write performance; in scenarios with strict power consumption requirements (such as when the application is running in the background), the address decoder can be set to the second working mode (using high-bit decoding) to reduce the power consumed by data reading and writing.
  • For example, the interleaving granularity used in high-bit decoding is 10MB and that used in low-bit decoding is 2MB; or the interleaving granularity used in high-bit decoding is 1GB and that used in low-bit decoding is 100MB.
  • the embodiment of this application does not limit the specific interleaving granularity used in high- and low-bit decoding.
  • A hash function is provided in the first address decoding module, so that when memory is divided among memory channels, the hash function load-balances the N memory channels, thereby improving overall data read and write performance.
  • the first address decoding module can also apply other hash functions for achieving load balancing between memory channels.
  • the embodiments of the present application are not limited to this.
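  • The text here does not specify which hash function is used. As an illustration only, the sketch below swaps in XOR-folding of the block index, a commonly used channel-hashing technique, and compares it with plain low-bit selection. The function names, the 1MB granularity (20 bits) and the 8 channels (3 bits) are assumptions.

```python
MB = 1 << 20


def plain_channel(addr: int, granularity_bits: int, channel_bits: int) -> int:
    """Channel taken directly from the bits just above the interleaving granule."""
    return (addr >> granularity_bits) & ((1 << channel_bits) - 1)


def hashed_channel(addr: int, granularity_bits: int, channel_bits: int) -> int:
    """Channel obtained by XOR-folding the whole block index (illustrative hash)."""
    block = addr >> granularity_bits
    mask = (1 << channel_bits) - 1
    channel = 0
    while block:
        channel ^= block & mask
        block >>= channel_bits
    return channel


# A strided access pattern: every address is 8MB apart.
addrs = [k * 8 * MB for k in range(8)]
print({plain_channel(a, 20, 3) for a in addrs})           # all on one channel
print(sorted({hashed_channel(a, 20, 3) for a in addrs}))  # spread over channels
```

  • With plain selection the 8MB stride keeps hitting the same channel, while the folded hash spreads the same accesses across all 8 channels. Consecutive blocks still map to distinct channels under the hash.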
  • the memory channels corresponding to different channel splitters adopt the same interleaving granularity.
  • In addition to the first working mode and the second working mode, the address decoder also supports a third working mode, in which the memory channels corresponding to different channel splitters can use different interleaving granularities.
  • The address decoder is also provided with a third address decoding module.
  • The third address decoding module works in the third working mode, and the decoding methods it uses include low-bit decoding and high-bit decoding, so as to balance read and write performance against power consumption.
  • the memory address needs to be divided and different decoding methods are used in different memory address ranges.
  • the memory address is divided in at least one way.
  • the third address decoding module performs low-bit decoding in the first memory address range and high-bit decoding in the second memory address range.
  • For example, the memory address range from 0 to 16GB uses low-bit decoding, and the memory address range from 16GB to 32GB uses high-bit decoding.
  • The address decoder 600 is also provided with a second register 612, and the data stored in the second register 612 indicates the memory address ranges corresponding to the different decoding methods in the third working mode.
  • The data stored in the second register is valid only when the data stored in the first register indicates that the current mode is the third working mode.
  • When the mode setting instruction indicates the third working mode, the data corresponding to the third working mode is written into the first register, and the data corresponding to the memory address range indicated by the mode setting instruction is written into the second register.
  • The larger the memory address range corresponding to low-bit decoding, the better the data read and write performance, but the higher the power consumption.
  • The larger the memory address range corresponding to high-bit decoding, the lower the power consumption, but the worse the data read and write performance.
  • The data stored in the second register includes 00 (indicating 8GB low-bit decoding + 12GB × 2 high-bit decoding), 01 (indicating 16GB low-bit decoding + 8GB × 2 high-bit decoding), 10 (indicating 24GB low-bit decoding + 4GB × 2 high-bit decoding) and 11 (indicating 24GB low-bit decoding + 8GB high-bit decoding).
  • The data stored in the second register includes 00 (indicating 6GB low-bit decoding + 9GB × 2 high-bit decoding), 01 (indicating 12GB low-bit decoding + 6GB × 2 high-bit decoding) and 10 (indicating 18GB low-bit decoding + 3GB × 2 high-bit decoding).
  • The data stored in the second register includes 00 (indicating 4GB low-bit decoding + 6GB × 2 high-bit decoding), 01 (indicating 8GB low-bit decoding + 4GB × 2 high-bit decoding), and 10 (indicating 12GB low-bit decoding + 2GB × 2 high-bit decoding).
  • The data stored in the second register includes 00 (indicating 3GB low-bit decoding + 4.5GB × 2 high-bit decoding), 01 (indicating 6GB low-bit decoding + 3GB × 2 high-bit decoding), and 10 (indicating 9GB low-bit decoding + 1.5GB × 2 high-bit decoding).
  • During decoding, the address decoder reads the data in the first register. If the data indicates the third working mode, the address decoder further reads the data in the second register and decodes each address with the decoding method that corresponds to the memory address range indicated by that data.
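  • The two-register lookup can be sketched as below. The second-register table uses the first list of encodings given above (8GB, 16GB, 24GB and 24GB of low-bit decoding), and the 2MB and 10MB granularities are one of the example pairs from the text. Placing the low-bit range at the bottom of the address space, measuring the high-bit range from the end of the low-bit range, using 4 channels, and the function name `decode` are all assumptions.

```python
MB = 1 << 20
GB = 1 << 30

# Second-register value -> size of the low-bit-decoded range (assumed to start
# at address 0); addresses above it use high-bit decoding.
LOW_BIT_RANGE = {0b00: 8 * GB, 0b01: 16 * GB, 0b10: 24 * GB, 0b11: 24 * GB}

LOW_GRANULARITY = 2 * MB    # example low-bit interleaving granularity
HIGH_GRANULARITY = 10 * MB  # example high-bit interleaving granularity


def decode(addr: int, first_register: int, second_register: int = 0b00,
           n_channels: int = 4) -> tuple[str, int]:
    """Return (decoding method, channel index) for one address."""
    if first_register == 0b00:  # first working mode: low-bit decoding only
        return "low", (addr // LOW_GRANULARITY) % n_channels
    if first_register == 0b01:  # second working mode: high-bit decoding only
        return "high", (addr // HIGH_GRANULARITY) % n_channels
    if first_register == 0b10:  # third working mode: consult the second register
        low_limit = LOW_BIT_RANGE[second_register]
        if addr < low_limit:
            return "low", (addr // LOW_GRANULARITY) % n_channels
        return "high", ((addr - low_limit) // HIGH_GRANULARITY) % n_channels
    raise ValueError("undefined working mode encoding")


print(decode(4 * MB, 0b10, 0b00))
print(decode(8 * GB + 25 * MB, 0b10, 0b00))
print(decode(10 * GB, 0b10, 0b01))
```

  • The second register is read only in the third working mode, which mirrors the statement that its contents are valid only when the first register indicates that mode.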
  • The third address decoding module is provided with a hash function, so that when memory is divided among memory channels, the hash function load-balances the N memory channels during low-bit decoding, thereby improving overall data read and write performance.
  • the address decoder may support more than three working modes, which is not limited in this embodiment.
  • The channel splitter can decode the memory address with different decoding methods in different working modes, thereby meeting the performance and power consumption requirements of different scenarios.
  • The bit width conversion component adopts a CAM (Content Addressable Memory) structure, that is, a register array is used to implement the bit width conversion function.
  • When the buffer depth is small, the bit-width conversion component using the CAM structure converts bit width faster.
  • However, when the buffer depth exceeds a certain threshold (such as 128 or 256), the register array requires additional cycles to complete the bit width conversion.
  • The bit width conversion component adopts a RAM (Random Access Memory) structure, that is, SRAM (Static Random-Access Memory) is used to implement the bit width conversion function. Since SRAM is not limited by depth, when the buffer depth is large, the bit-width conversion component using the RAM structure has lower latency than the one using the CAM structure (theoretically, it can save one cycle).
  • For low-performance platforms (small buffer depth), a bit-width conversion component with a CAM structure can be used; for high-performance platforms (large buffer depth), a bit-width conversion component with a RAM structure can be used.
  • Bit-width conversion components with both a CAM structure and a RAM structure can also be provided, with the CAM-structured component used in the low-performance mode and the RAM-structured component used in the high-performance mode. This is not limited in the embodiment of the present application.
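  • The selection rule can be written as a one-line policy. The function name `choose_structure`, the default threshold of 128 (one of the example values mentioned above), and treating a depth equal to the threshold as still suitable for the CAM structure are assumptions made for illustration.

```python
def choose_structure(buffer_depth: int, threshold: int = 128) -> str:
    """Pick the bit-width conversion structure from the buffer depth."""
    if buffer_depth < 1:
        raise ValueError("buffer_depth must be positive")
    # Shallow buffers favor the register-array (CAM) structure; deeper
    # buffers favor the SRAM-based (RAM) structure.
    return "CAM" if buffer_depth <= threshold else "RAM"


for depth in (64, 128, 512):
    print(depth, choose_structure(depth))
```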
  • The controller needs to reorder the read data, so a data rearrangement module needs to be set inside the controller.
  • The bit width conversion component is combined with the data rearrangement of the read data path in the controller; that is, the data rearrangement module is set in the bit width conversion component instead of in the controller.
  • The read data is transparent to the controller. After the controller transmits the read data to the bit width conversion component, the data rearrangement module in the bit width conversion component rearranges the data and transmits the rearranged data further upstream.
  • The address decoder 711 of the channel splitter 71 is connected to the main device 72 through the main bus 73, and the N bit width conversion components 712 are connected to the N controllers 74.
  • the main device 72 is a device that has data reading and writing requirements during operation.
  • The main device may include, but is not limited to, processors such as a central processing unit (Central Processing Unit, CPU), a graphics processor (Graphics Processing Unit, GPU), a neural network processor (Neural-network Processing Unit, NPU), and a digital signal processor (Digital Signal Processor, DSP), as well as non-processors such as an image sensor (Image Sensor), an image signal processing unit (ISP), and a video processing unit (VPU).
  • the embodiments of this application do not limit the specific type of the master device.
  • The main device 72 may have both data reading and writing requirements, such as a processor, or may have only reading or only writing requirements, such as an image sensor. Whether the master device has both reading and writing requirements does not constitute a limitation on this application.
  • the main bus 73 can be implemented as a system cache (System Cache, SC) bus.
  • system cache System Cache, SC
  • the controller 74 may be implemented as a dynamic memory controller (Dynamic Memory Controller, DMC).
  • N controllers 74 correspond to N memory channels, that is, different controllers 74 are used to control data read and write operations through different memory channels.
  • the controller 74 is connected to the memory through a corresponding physical layer interface (PHY) to implement data reading and writing operations on the memory.
  • the channel splitter is provided in the slave bus used to connect the master device and the memory to realize multi-memory channel access of the data in the memory by the master device.
  • the slave bus can be realized as a Double Data Rate (DDR) bus; during the data access process, the master device acts as the Master, and the memory acts as the Slave.
  • each master device 72 is connected to the master bus 73 through n links, and the master bus 73 interleaves the links corresponding to different master devices 72 to establish n links with the slave bus.
  • the number of links established with the slave bus is related to the number of channel splitters set in the slave bus.
  • the link between the master device 72 and the master bus 73 and the link between the master bus 73 and the slave bus adopt the same bus protocol.
  • this link uses the Advanced eXtensible Interface (AXI) bus protocol.
  • the embodiments of this application do not limit the specific bus protocol adopted by the link.
  • another possible implementation is to increase the number of links between the main device and the main bus (that is, to split at the main device side) in order to increase the number of memory channels. For example, after the number of links between the main device and the main bus is increased from n to m and the number of controllers is increased from n to m, the number of memory channels is likewise increased.
  • the number of memory channels can also be increased by splitting at the main bus. For example, when n links are established between the main device and the main bus, the main bus establishes m links with m controllers through splitting, thereby increasing the number of memory channels from n to m.
  • however, whether splitting is performed at the main device side or at the main bus, it occurs too early, so the hardware implementation complexity is higher than that of splitting downstream of the main bus.
  • splitting downstream of the main bus reduces the impact on the main device and the main bus and preserves compatibility with existing main devices and buses.
  • compared with splitting at the main device or main bus side, splitting downstream of the main bus saves system-on-chip area and makes system timing simpler to implement; it also helps reduce power consumption and lowers the implementation complexity of subsequent power consumption optimization.
  • implementing channel splitting downstream of the main bus not only reduces the impact on the upstream main device and main bus and ensures the adaptability of the solution, but also saves system-on-chip area and makes system timing simpler to implement.
  • splitting at the storage control device helps reduce power consumption and lowers the implementation complexity of subsequent power consumption optimization.
  • a security bus (Security BUS, SBUS) is usually set up in the on-chip system.
  • the channel splitter can be located outside the security bus or inside the security bus. The two placements are described separately below through exemplary embodiments.
  • the security bus 810 includes a first interface 811 and a second interface 812.
  • the output interface of the channel splitter 800 is connected to the first interface 811 of the security bus 810, and the second interface 812 of the security bus 810 is connected to the controller 830; that is, both the data output downstream by the channel splitter and the data output upstream by the controller must pass through the security bus.
  • the channel splitter 800 in Figure 8 corresponds to N security buses 810, and the channel splitter includes a splitter component 801 and N bit width conversion components 802. Each bit width conversion component 802 is connected to the first interface 811 of the corresponding security bus 810.
  • during data writing, the flow of data with security attributes is: splitter component of the channel splitter (AXI256) → bit width conversion component of the channel splitter (AXI128) → first interface of the security bus (AXI128) → second interface of the security bus (AXI128) → controller (AXI128).
  • during data reading, the flow of data with security attributes is: controller → second interface of the security bus → first interface of the security bus → bit width conversion component of the channel splitter → splitter component of the channel splitter.
  • in Figure 8, the arrows show only the write path of data with security attributes; the read path is omitted to keep the illustration simple, and this does not limit the embodiments of this application.
  • the security bus 900 includes the channel splitter 910 and N third interfaces 901.
  • the N output interfaces of the channel splitter 910 are connected to the N third interfaces 901, and each third interface 901 is connected to the controller 920.
  • the channel splitter 910 in Figure 9 includes a splitter component 911 and N bit width conversion components 912, and each bit width conversion component 912 is connected to the corresponding third interface 901.
  • during data writing, the flow of data with security attributes is: splitter component of the channel splitter (AXI256) → bit width conversion component of the channel splitter (AXI128) → third interface of the security bus (AXI128) → controller (AXI128).
  • during data reading, the flow of data with security attributes is: controller → third interface of the security bus → bit width conversion component of the channel splitter → splitter component of the channel splitter.
  • in Figure 9, the arrows show only the write path of data with security attributes; the read path is omitted to keep the illustration simple, and this does not limit the embodiments of this application.
  • the security bus is also provided with an encryption and decryption component, so that the data is encrypted and decrypted through the encryption and decryption component.
  • the encryption and decryption component can be implemented as a DDR Encryption Engine (DDRE).
  • the security bus and the encryption and decryption component are arranged serially, so during data reading and writing, data transmission and data encryption/decryption are executed serially.
  • the security bus needs to wait for the encryption and decryption component to complete data encryption and decryption before it can continue with subsequent data transmission.
  • the security bus and encryption and decryption components are set up in parallel, so that data transmission and data encryption and decryption support parallel execution.
  • the following takes the parallel setting of the security bus and encryption and decryption components as an example for explanation.
  • the security bus 900 is provided with an encryption and decryption component 930, and in addition to the third interface 901, the security bus 900 also includes a fourth interface 902 and a fifth interface 903.
  • the output interface of the channel splitter 910 is connected to the third interface 901 and the fourth interface 902 of the security bus 900.
  • the third interface 901 of the security bus 900 is connected to the controller 920.
  • the fourth interface 902 is connected to the fifth interface 903 through the encryption and decryption component 930, and the fifth interface 903 is connected to the third interface 901.
  • the bit width conversion component 912 transmits the data to the fourth interface 902, and the security bus 900 inputs data to the encryption and decryption component 930 through the fourth interface 902. After the encryption and decryption component 930 completes data encryption, it outputs the encrypted data to the fifth interface 903.
  • the security bus 900 receives the encrypted data output by the encryption and decryption component 930 through the fifth interface 903, and outputs the encrypted data to the controller 920 through the third interface 901.
  • bit width conversion component 912 transmits the data to the third interface 901 without being blocked by the data encryption process.
  • after receiving the data transmitted by the controller 920 through the third interface 901, the security bus 900 sends the data to the fifth interface 903 through the fourth interface 902, and then transmits the data to the encryption and decryption component 930 through the fifth interface 903. After the encryption and decryption component 930 completes data decryption, it outputs the decrypted data to the fourth interface 902.
  • the security bus 900 receives the decrypted data output by the encryption and decryption component 930 through the fourth interface 902, and transmits the decrypted data to the bit width conversion component 912.
  • the security bus 900 directly transmits the data to the bit width conversion component 912 through the third interface 901 without being blocked by the data decryption process.
  • only the encryption component may be provided, or only the decryption component may be provided, or both the encryption component and the decryption component may be provided.
  • the encryption component and the decryption component can be two independent components, or they can be integrated components, that is, the encryption and decryption functions are implemented through a single encryption and decryption component.
  • the arrows show only the write path of data with security attributes; the read path is omitted to keep the illustration simple, and this does not limit the embodiments of this application.
  • the security bus 810 is provided with an encryption and decryption component 840, and in addition to the first interface 811 and the second interface 812, the security bus 810 also includes a sixth interface 813 and a seventh interface 814.
  • the bit width conversion component 802 transmits the data to the first interface 811, and the security bus 810 inputs data to the encryption and decryption component 840 through the sixth interface 813. After the encryption and decryption component 840 completes data encryption, it outputs the encrypted data to the seventh interface 814.
  • the security bus 810 receives the encrypted data output by the encryption and decryption component 840 through the seventh interface 814, and outputs the encrypted data to the controller 830 through the second interface 812.
  • the security bus 810 directly transmits the data to the controller 830 through the second interface 812 without being blocked by the data encryption process.
  • the security bus 810 sends the data to the seventh interface 814 through the second interface 812, and then transmits the data to the encryption and decryption component 840 through the seventh interface 814. After the encryption and decryption component 840 completes data decryption, it outputs the decrypted data to the sixth interface 813.
  • the security bus 810 receives the decrypted data output by the encryption and decryption component 840 through the sixth interface 813, transmits the decrypted data to the first interface 811, and finally outputs the data to the bit width conversion component 802 through the first interface 811.
  • the security bus 810 directly transmits the data to the bit width conversion component 802 through the first interface 811 without being blocked by the data decryption process.
  • an encryption and decryption component when an encryption and decryption component is provided, two additional interfaces are provided on the security bus and the encryption and decryption components are connected through these two interfaces, so that the data encryption and decryption process and the data transmission process can be executed in parallel. This prevents the encryption and decryption components from blocking the data transmission path when encrypting and decrypting data, thereby improving the data read and write bandwidth.
  • the storage control device 1200 includes: at least one channel splitter 1210 and a controller 1220.
  • the storage control device 1200 is connected to the main device through the main bus 1230, and the storage control device 1200 is connected to the memory through the physical layer interface 1240 (which can be regarded as a part of the storage control device).
  • the structure of the channel splitter 1210 in the storage control device 1200 can be referred to the above-mentioned embodiments, and will not be described in detail here.
  • the storage control device 1200 includes a slave bus, and the channel splitter is provided in the slave bus.
  • the slave bus may also be provided with a safety bus, which will not be described in detail in this embodiment.
  • the link between the main bus 1230 and the storage control device 1200 is branched at the storage control device 1200, thereby increasing the number of memory channels.
  • the number of memory channels is related to the structure and number of channel splitters.
  • the storage control device 1300 is provided with a dual-channel splitter 1310 and a controller 1320 (corresponding to a physical layer interface 1340).
  • the dual-channel splitter 1310 is used to divide memory addresses into 2 memory channels.
  • when four links (AXI bit width is 256 bits) are established between the main bus 1330 and the storage control device 1300 and four dual-channel splitters 1310 are provided in the storage control device 1300, the storage control device 1300 is connected to 8 physical layer interfaces 1240, and the memory channels are increased from 4 to 8 (AXI bit width is 128 bits).
  • the channel splitter provided in the storage control device is a three-channel splitter, and the three-channel splitter is used to divide memory addresses into three memory channels.
  • when 4 three-channel splitters are provided in the storage control device (AXI bit width is 256 bits on the main bus side), the storage control device is connected to 12 physical layer interfaces, and the memory channels are increased from 4 to 12 (AXI bit width is 128 bits).
  • the channel splitter is a dual-channel splitter, 4 links are established between the main bus and the storage control device (AXI bit width is 256 bits), and the storage control device is provided with
  • the storage control device is connected to 6 physical layer interfaces, and the memory channels are increased from 4 to 6 (AXI bit width is 128bits).
  • the storage control device provided by the embodiment of the present application can be applied in a mobile terminal to improve the performance of the mobile terminal.
  • the mobile terminal can be a smart phone, a tablet computer, a wearable device, etc.
  • the data read and write bandwidth of the memory can meet the needs of high-speed photography, beauty algorithms, and AI algorithms, thereby improving the shooting quality, user experience, and overall performance of mobile terminals.
  • the data read and write bandwidth of the memory can meet the needs of multiple applications running in the foreground at the same time, which helps improve the folding-screen terminal's support for concurrent application scenarios.
  • the system on chip 1400 includes: a main device 1401, a main bus 1402 and a storage control device 1403.
  • the main device 1401 is connected to the storage control device 1403 through the main bus 1402, and the storage control device 1403 is connected to the memory through the physical layer interface 14033.
  • the memory is dynamic random access memory (Dynamic Random Access Memory, DRAM).
  • the master device 1401 is a processor or non-processor with data reading and writing requirements.
  • the processor includes a CPU, a GPU, and an NPU
  • the non-processor includes an image sensor and a VPU.
  • this is not a limitation.
  • the processor uses various interfaces and lines to connect the parts of the entire terminal, and performs the terminal's functions and processes data by running or executing instructions, programs, code sets, or instruction sets stored in the memory and by calling data stored in the memory.
  • the processor may adopt at least one of digital signal processing (Digital Signal Processing, DSP), field-programmable gate array (Field-Programmable Gate Array, FPGA), and programmable logic array (Programmable Logic Array, PLA).
  • the processor can integrate one or a combination of CPU, GPU, NPU and baseband chip.
  • the CPU mainly processes the operating system, user interface and applications;
  • the GPU is responsible for rendering and drawing the content that needs to be displayed on the display;
  • the NPU is used to implement AI functions;
  • the baseband chip is used to process wireless communications.
  • a link using the AXI protocol is established between the master device 1401 and the master bus 1402.
  • AXI links with a bit width of 256 bits are established between each master device 1401 and the main bus 1402.
  • the storage control device 1403 includes at least one channel splitter 14031 located in the slave bus, a controller 14032, and a physical layer (PHY) interface 14033 corresponding to each controller 14032.
  • a link using the AXI protocol is established between the channel splitter 14031 and the controller 14032.
  • N AXI links with a bit width of 128 bits are established between the channel splitter 14031 and the controller 14032.
  • Figure 14 takes the system-on-chip that does not contain memory (that is, the memory is set outside the system-on-chip) as an example for illustration.
  • the memory 1404 can be integrated on the system-on-chip 1400, that is, set inside the system-on-chip.
  • embodiments of the present application also provide a terminal, which is provided with the system-on-chip shown in Figure 14 or Figure 15 .
  • the terminal can also include other necessary components, such as read-only memory (Read-Only Memory, ROM), display components, input units, audio circuits, speakers, microphones, power supplies and other components, which are not described again in this embodiment.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Microelectronics & Electronic Packaging (AREA)
  • Static Random-Access Memory (AREA)

Abstract

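The embodiments of this application disclose a channel splitter, a storage control device, a system on chip, and a terminal, belonging to the field of storage technology. The channel splitter (21) includes a splitter component (211) and N bit width conversion components (212). The splitter component (211) is used to divide memory addresses into N memory channels; the bit width conversion component (212) is used to perform bit width conversion on data input by the splitter component (211). The splitter component (211) includes an address decoder (2111) used to decode memory addresses; the address decoder (2111) supports at least two working modes and uses a different decoding method in each working mode. The channel splitter provided by the embodiments of this application helps increase memory read/write bandwidth and meets the power consumption and performance requirements of different usage scenarios.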

Description

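Channel splitter, storage control device, system on chip, and terminal
This application claims priority to Chinese patent application No. 202210699613.3, filed on June 20, 2022 and entitled "Channel splitter, storage control device, system on chip, and terminal", the entire contents of which are incorporated herein by reference.
Technical Field
The embodiments of this application relate to the field of storage technology, and in particular to a channel splitter, a storage control device, a system on chip, and a terminal.
Background
As terminal functions continue to grow richer, terminals place increasingly high demands on memory. For example, when a terminal's processor runs artificial intelligence (AI) algorithms, it requires high memory read/write bandwidth.
Multi-channel technology, which increases memory read/write bandwidth, is widely used in terminals. For example, mobile phones commonly support 4 memory channels.
Summary
The embodiments of this application provide a channel splitter, a storage control device, a system on chip, and a terminal. The technical solutions are as follows: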
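In one aspect, an embodiment of this application provides a channel splitter. The channel splitter includes a splitter component and N bit width conversion components, where N is greater than or equal to 2;
the splitter component is used to divide memory addresses into N memory channels;
the bit width conversion component is used to perform bit width conversion on data input by the splitter component;
the splitter component includes an address decoder used to decode memory addresses; the address decoder supports at least two working modes and uses a different decoding method in each working mode.
In another aspect, an embodiment of this application provides a storage control device. The storage control device includes at least one channel splitter as described in the above aspect, and a controller;
the storage control device is connected to a main device through a main bus, and is connected to a memory through a physical layer interface.
In another aspect, an embodiment of this application provides a system on chip. The system on chip includes a main device and the storage control device as described in the above aspect;
the main device is connected to the storage control device through a main bus;
the storage control device is connected to a memory through a physical layer interface.
In another aspect, an embodiment of this application provides a terminal, which is provided with the system on chip as described in the above aspect.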
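Brief Description of the Drawings
Figure 1 is a schematic diagram of a memory access architecture in the related art;
Figure 2 is a schematic structural diagram of a channel splitter according to an exemplary embodiment of this application;
Figure 3 is a schematic structural diagram of a splitter component according to an exemplary embodiment of this application;
Figure 4 is a schematic structural diagram of an address decoder according to an exemplary embodiment of this application;
Figure 5 is a schematic structural diagram of an address decoder with two working modes according to an exemplary embodiment of this application;
Figure 6 is a schematic structural diagram of an address decoder with three working modes according to an exemplary embodiment of this application;
Figure 7 is a schematic structural diagram of a channel splitter according to another exemplary embodiment of this application;
Figure 8 is a schematic structural diagram of a channel splitter located outside the security bus according to an exemplary embodiment of this application;
Figure 9 is a schematic structural diagram of a channel splitter located inside the security bus according to an exemplary embodiment of this application;
Figure 10 is a schematic diagram of the connection between the security bus and the encryption and decryption component according to an exemplary embodiment of this application;
Figure 11 is a schematic diagram of the connection between the security bus and the encryption and decryption component according to another exemplary embodiment of this application;
Figure 12 is a schematic structural diagram of a storage control device according to an exemplary embodiment of this application;
Figure 13 is a schematic structural diagram of a storage control device with 8 memory channels according to an exemplary embodiment of this application;
Figure 14 is a schematic structural diagram of a system on chip according to an exemplary embodiment of this application;
Figure 15 is a schematic structural diagram of a system on chip according to another exemplary embodiment of this application.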
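Detailed Description
To make the objectives, technical solutions, and advantages of this application clearer, the implementations of this application are described in further detail below with reference to the accompanying drawings.
"Multiple" as used herein means two or more. "And/or" describes an association between associated objects and indicates that three relationships are possible; for example, "A and/or B" can mean: A alone, both A and B, or B alone. The character "/" generally indicates an "or" relationship between the associated objects before and after it.
As shown in Figure 1, the terminal is provided with N main devices (Masters) 101 to 10N. Each main device has 4 bus links established with the main bus (primary bus) 11, through which it sends data read/write instructions to the main bus 11. 4 links are also established between the main bus 11 and the slave bus (secondary bus) 12, and 4 memory channels are established between the slave bus 12 and 4 controllers 13. During data reading or writing, the slave bus 12 sends the data read/write instruction to the controller 13 corresponding to a given memory channel, and the controller 13 reads or writes data in the memory 15 through the physical layer interface 14.
Clearly, in the related art, the number of memory channels equals the number of links between the main bus and the slave bus. Although this memory access scheme is relatively simple in design, its performance is increasingly unable to meet the demands of applications such as artificial intelligence and other parallel computing.
The embodiments of this application provide a channel splitter composed of a splitter component and N bit width conversion components. The splitter component divides memory addresses into N memory channels, and the bit width conversion components perform bit width conversion on the data input by the splitter component. This expands the number of memory channels so that data can be read and written through multiple memory channels, which helps increase memory read/write bandwidth, improves the performance of upstream main devices, and supports more concurrent application scenarios.
In addition, the address decoder in the splitter component supports at least two working modes. Depending on a scenario's performance and power consumption requirements, it can decode memory addresses using different decoding methods in different working modes, thereby meeting the power consumption and performance requirements of different usage scenarios.
After the channel splitter provided by the embodiments of this application is applied to the above memory access architecture (it can be provided in the slave bus), the number of memory channels between the slave bus and the controllers changes from being equal to the number of links between the main bus and the slave bus to being greater than that number. This expands the number of memory channels and increases memory read/write bandwidth, so the overall performance of the system on chip and the terminal device can be improved without increasing the read/write speed of the memory dies. The structure and working principle of the channel splitter are described below through illustrative embodiments.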
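Refer to Figure 2, which shows a schematic structural diagram of a channel splitter according to an exemplary embodiment of this application. The channel splitter 21 includes a splitter component 211 and N bit width conversion components 212, where N is greater than or equal to 2.
In an illustrative example, when the channel splitter is a dual-channel splitter, it is provided with a splitter component and two bit width conversion components; when the channel splitter is a three-channel splitter, it is provided with a splitter component and three bit width conversion components.
Under other possible naming schemes, the channel splitter may be called a 1-to-N component, which is not limited in the embodiments of this application.
The splitter component 211 includes an input interface and N split output interfaces, and is connected to the N bit width conversion components 212 through the N split output interfaces. The splitter component 211 implements the splitting function, that is, it divides memory addresses into N memory channels, and its input bit width is the same as its output bit width.
The bit width conversion component 212 includes a split input interface and an output interface. It performs bit width conversion on the input data from the split input interface and outputs the converted data through the output interface; that is, the bit width conversion component 212 implements the bit width conversion function, and its input bit width differs from its output bit width.
In one possible design, because the upstream data bit width is usually greater than the downstream data bit width, the bit width conversion component 212 can be implemented as a downsizer, which converts a high-bit-width input into a low-bit-width output. The sum of the output bit widths of the N bit width conversion components 212 is greater than or equal to the input bit width.
In an illustrative example, the bit width conversion component performs 256-bit to 128-bit conversion. When there are 2 bit width conversion components, the sum of their output bit widths (128 bits × 2) equals the input bit width; when there are 3, the sum (128 bits × 3) is greater than the input bit width. Note that this embodiment uses 256-bit to 128-bit conversion only as an illustrative example, which does not constitute a limitation.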
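The 256-bit to 128-bit conversion can be pictured with a minimal software sketch. This is only an illustration, not the hardware described in the application: the function name `downsize` and the choice to emit the least significant half first are assumptions made here.

```python
def downsize(beat: int, in_width: int = 256, out_width: int = 128) -> list[int]:
    """Split one in_width-bit data beat into out_width-bit beats.

    Assumption for illustration: the least significant part is emitted first.
    """
    if in_width % out_width != 0:
        raise ValueError("in_width must be a multiple of out_width")
    mask = (1 << out_width) - 1
    count = in_width // out_width
    return [(beat >> (i * out_width)) & mask for i in range(count)]


def total_output_width(num_components: int, out_width: int = 128) -> int:
    """Sum of the output bit widths of all bit width conversion components."""
    return num_components * out_width
```

With two components the total output width equals the 256-bit input width, and with three it exceeds it, matching the two cases in the text.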
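When an upstream device performs a data read or write operation, it indicates the address of the data in the data read/write instruction. After receiving the data read/write request, the channel splitter needs to convert the address indicated by the main device (usually a virtual address) into a physical address, so that the controller can subsequently be instructed to read data according to that physical address. This process is the memory address decoding process.
In the embodiments of this application, the splitter component 211 includes an address decoder 2111. The address decoder can decode using single decoding (also called the word structure method) or double decoding (also called the X-Y decoding structure).
In some embodiments, the number of output address bits of the address decoder is smaller than the number of input address bits, and the difference is positively correlated with N: the larger N is (that is, the more memory channels there are), the larger the difference.
In an illustrative example, the virtual address input to the address decoder is 37 or 40 bits. With 4 memory channels, the physical memory address output by the address decoder after decoding is 35 or 38 bits; with 8 memory channels (that is, when dual-channel splitters are used), it is 34 or 37 bits.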
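The numbers in this example are consistent with the channel selection consuming log2 of the total channel count in address bits. The sketch below encodes that reading; the function name and the power-of-two restriction are assumptions made here, since the text only states that the difference is positively correlated with N.

```python
def output_address_bits(input_bits: int, total_channels: int) -> int:
    """Address bits left after channel selection.

    Assumption: the total channel count is a power of two, so channel
    selection consumes exactly log2(total_channels) address bits.
    """
    if total_channels < 1 or total_channels & (total_channels - 1):
        raise ValueError("this sketch assumes a power-of-two channel count")
    channel_bits = total_channels.bit_length() - 1  # log2 for powers of two
    return input_bits - channel_bits
```

Under this assumption, a 37-bit input gives 35 bits with 4 channels and 34 bits with 8 channels, and a 40-bit input gives 38 and 37 bits respectively.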
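After the address decoder completes memory address decoding, the memory channel to which the memory address belongs is determined based on the decoded physical memory address. Data is then interleaved across the memory channels to obtain the data routed to each channel, and the data is transmitted to the bit width conversion component corresponding to that memory channel.
As shown in Figure 3, which enlarges the internal structure of a splitter component 300 having one input interface S0 and N output interfaces (M0 to MN), the input interface S0 is provided with an address decoder 301 and a bus matrix (BUS Matrix) 302, and each output interface is provided with its own matrix (M0 matrix to MN matrix). After the address decoder 301 completes address decoding, data is interleaved based on the decoded memory address, and the data corresponding to each memory channel is distributed to the matrices of the output interfaces through the bus matrix 302.
Unlike the related art, in which the address decoder uses a single decoding method, the address decoder in the embodiments of this application supports at least two working modes and uses a different decoding method in each working mode.
In one possible implementation, memory read/write performance and power consumption differ depending on the decoding method used. The channel splitter can therefore set the address decoder to the appropriate working mode based on the usage scenario's requirements for memory read/write performance and power consumption, thereby meeting the performance and/or power consumption requirements of the current scenario.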
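In one possible design, the address decoder is provided with a first register and at least two address decoding modules. The data stored in the first register indicates the working mode, and different address decoding modules work in different working modes.
In one possible design, the at least two address decoding modules include a first address decoding module and a second address decoding module;
the first address decoding module works in the first working mode and uses low-bit decoding;
the second address decoding module works in the second working mode and uses high-bit decoding;
the interleaving granularity used by low-bit decoding is smaller than that used by high-bit decoding.
In one possible design, the first address decoding module is also provided with a hash function, which is used to balance the load across the N memory channels.
In one possible design, the at least two address decoding modules also include a third address decoding module;
the third address decoding module works in the third working mode and uses both low-bit decoding and high-bit decoding.
In one possible design, the address decoder is also provided with a second register. The data stored in the second register indicates the memory address ranges corresponding to the different decoding methods in the third working mode.
In one possible design, the third address decoding module is also provided with a hash function, which is used to balance the load across the N memory channels during low-bit decoding.
In one possible design, the bit width conversion component uses a random access memory structure.
In one possible design, the input interface of the channel splitter is connected to the main device through the main bus, and the main device is a device with data read/write requirements;
the N output interfaces of the channel splitter are connected to N controllers, and the N controllers correspond to N memory channels.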
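Regarding how multiple working modes are implemented, in one possible design, as shown in Figure 4, the address decoder 400 is provided with a first register 401 and at least two address decoding modules 402, and different address decoding modules 402 work in different working modes.
The data stored in the first register 401 indicates the working mode. In one possible implementation, when a mode setting instruction is received from an upstream main device, the mode data corresponding to the target mode indicated by the instruction is written to the first register.
In one possible design, the address decoding module 402 is a hardware module. Accordingly, the address decoder 400 activates the address decoding module 402 for the corresponding working mode based on the data written to the first register 401.
In an illustrative example, as shown in Figure 5, when the address decoder 500 supports two working modes, the data stored in the first register 511 includes 0 (indicating the first working mode) and 1 (indicating the second working mode). Accordingly, the address decoder 500 is provided with a first address decoding module 521 and a second address decoding module 522.
In another illustrative example, as shown in Figure 6, when the address decoder 600 supports three working modes, the data stored in the first register 611 includes 00 (indicating the first working mode), 01 (indicating the second working mode), and 10 (indicating the third working mode). Accordingly, the address decoder 600 is provided with a first address decoding module 621, a second address decoding module 622, and a third address decoding module 623.
Note that the embodiments of this application do not limit the specific data written to the first register or the number of working modes supported by the address decoder.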
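The granularity used when dividing memory among memory channels is called the interleaving granularity. For example, when 8 GB of memory is divided among 8 memory channels at an interleaving granularity of 1 GB, the range from 0 to 1 GB is assigned to the first memory channel, 1 to 2 GB to the second memory channel, 2 to 3 GB to the third memory channel, and so on.
When 8 GB of memory is divided among 8 memory channels at an interleaving granularity of 1 MB, the range from 8i to (8i+1) MB is assigned to the first memory channel, (8i+1) to (8i+2) MB to the second memory channel, (8i+2) to (8i+3) MB to the third memory channel, and so on, where i is an integer.
Clearly, each memory channel is assigned different memory blocks under different interleaving granularities (the total amount is the same). As the interleaving granularity decreases, the speed of reading and writing data through the memory channels increases. For example, with 1 GB granularity, the data from 400 to 500 MB can only be read through the first memory channel, whereas with 1 MB granularity, it can be read through all 8 memory channels at the same time.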
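The effect of interleaving granularity on the 400 to 500 MB read can be checked with a small sketch. The block-to-channel mapping used here (block index modulo channel count, with channels numbered from 0) is an assumption that matches the two worked examples in the text; the function names are illustrative.

```python
MB = 1 << 20
GB = 1 << 30


def channel_of(addr: int, granularity: int, num_channels: int) -> int:
    """Channel index (0-based) for an address under simple modulo interleaving."""
    return (addr // granularity) % num_channels


def channels_touched(start: int, end: int, granularity: int,
                     num_channels: int) -> set[int]:
    """Set of channels used when accessing the byte range [start, end)."""
    first_block = start // granularity
    last_block = (end - 1) // granularity
    return {block % num_channels for block in range(first_block, last_block + 1)}


coarse = channels_touched(400 * MB, 500 * MB, GB, 8)  # only channel 0
fine = channels_touched(400 * MB, 500 * MB, MB, 8)    # all 8 channels
```

Channel 0 in the sketch corresponds to the "first memory channel" in the text.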
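Therefore, for reasons of power consumption and performance, in one possible design, as shown in Figure 5, the at least two address decoding modules provided in the address decoder include a first address decoding module 521 and a second address decoding module 522.
The first address decoding module 521 works in the first working mode and uses low-bit decoding;
the second address decoding module 522 works in the second working mode and uses high-bit decoding.
Because the interleaving granularity used by low-bit decoding is smaller than that used by high-bit decoding, data read/write performance in the first working mode is better than in the second working mode; correspondingly, data read/write power consumption in the first working mode is higher than in the second working mode. In some embodiments, the first working mode may be called the performance mode, and the second working mode the power consumption mode.
Because different application scenarios have different requirements for data read/write performance and power consumption, providing two working modes allows the address decoder to switch working modes dynamically across scenarios to meet those requirements.
For example, in a scenario with high performance requirements (such as running multiple applications in parallel), the address decoder can be set to the first working mode (low-bit decoding) to prioritize data read/write performance; in a scenario with strict power consumption requirements (such as when applications run in the background), the address decoder can be set to the second working mode (high-bit decoding) to reduce the power consumed by data reads and writes.
In an illustrative example, the interleaving granularity used by high-bit decoding is 10 MB and that used by low-bit decoding is 2 MB; alternatively, the interleaving granularity used by high-bit decoding is 1 GB and that used by low-bit decoding is 100 MB. The embodiments of this application do not limit the specific interleaving granularities used by high-bit and low-bit decoding.
Further, in the first working mode, to prevent some memory channels from being accessed too frequently while others sit idle, which would hurt overall data read/write performance, in one possible design the first address decoding module is provided with a hash function. When memory channels are assigned, the hash function balances the load across the N memory channels, improving overall data read/write performance.
In an illustrative example, the hash function can be expressed as: hash_chsel = ^(addr_s[32:6] & hash_mask[26:0]), where ^ denotes the bitwise XOR operation, addr_s[32:6] selects address bits 6 through 32, and hash_mask is used to mask the selected address bits. When hash_chsel = 0, the memory channel is determined to be memory channel 1; when hash_chsel = 1, it is memory channel 2 (in the case of two memory channels).
Note that the above hash function is only illustrative. The first address decoding module can also use other hash functions that balance the load between memory channels, which is not limited in the embodiments of this application.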
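The illustrative hash can be sketched in software as follows. The sketch reads the leading ^ as a Verilog-style reduction XOR, that is, the XOR of all bits of the masked value, which is an interpretation made here rather than something the text states outright; the Python function name mirrors the signal name in the expression.

```python
def hash_chsel(addr: int, hash_mask: int) -> int:
    """Return 0 or 1: the XOR of the masked bits of address bits [32:6].

    Assumption: '^(...)' is a reduction XOR over the 27 selected bits.
    """
    width = 27                                   # bits 6..32 inclusive
    selected = (addr >> 6) & ((1 << width) - 1)  # addr_s[32:6]
    masked = selected & (hash_mask & ((1 << width) - 1))
    return bin(masked).count("1") & 1            # parity equals reduction XOR


def memory_channel(addr: int, hash_mask: int) -> int:
    """Map the hash result to memory channel 1 or 2, as in the text."""
    return 1 if hash_chsel(addr, hash_mask) == 0 else 2
```

Because the result depends on the parity of several address bits rather than on a single bit, strided access patterns that would otherwise hit one channel repeatedly tend to be spread across both.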
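In the above embodiments, the memory channels corresponding to different channel splitters all use the same interleaving granularity in the first and second working modes. In another possible design, the address decoder supports a third working mode in addition to the first and second, and in the third working mode the memory channels corresponding to different channel splitters can use different interleaving granularities.
In one possible design, the address decoder is also provided with a third address decoding module, which works in the third working mode and uses both low-bit decoding and high-bit decoding, thereby balancing read/write performance against power consumption.
In one possible design, to support low-bit and high-bit decoding at the same time, memory addresses need to be partitioned, with different decoding methods used in different memory address ranges.
Optionally, there is at least one way of partitioning memory addresses.
When one partitioning scheme is supported, in the third working mode the third address decoding module performs low-bit decoding within a first memory address range and high-bit decoding within a second memory address range.
For example, for 32 GB of memory, the address range from 0 to 16 GB uses low-bit decoding, and the range from 16 GB to 32 GB uses high-bit decoding.
When at least two partitioning schemes are supported, as shown in Figure 6, the address decoder 600 is also provided with a second register 612. The data stored in the second register 612 indicates the memory address ranges corresponding to the different decoding methods in the third working mode. The data stored in the second register is valid only when the data stored in the first register indicates the third working mode.
In one possible implementation, when a mode setting instruction is received from an upstream main device and the instruction indicates the third working mode, the data corresponding to the third working mode is written to the first register, and the data corresponding to the memory address range indicated by the instruction is written to the second register.
Note that the larger the memory address range covered by low-bit decoding, the better the data read/write performance but the higher the power consumption; conversely, the larger the range covered by high-bit decoding, the lower the power consumption but the worse the performance.
In an illustrative example, when the third working mode supports 4 partitioning schemes and the memory is 32 GB, the data stored in the second register includes 00 (indicating 8 GB low-bit decoding + 12 GB × 2 high-bit decoding), 01 (indicating 16 GB low-bit decoding + 8 GB × 2 high-bit decoding), 10 (indicating 24 GB low-bit decoding + 4 GB × 2 high-bit decoding), and 11 (indicating 24 GB low-bit decoding + 8 GB high-bit decoding).
In another illustrative example, when the third working mode supports 3 partitioning schemes and the memory is 24 GB, the data stored in the second register includes 00 (indicating 6 GB low-bit decoding + 9 GB × 2 high-bit decoding), 01 (indicating 12 GB low-bit decoding + 6 GB × 2 high-bit decoding), and 10 (indicating 18 GB low-bit decoding + 3 GB × 2 high-bit decoding).
In another illustrative example, when the third working mode supports 3 partitioning schemes and the memory is 16 GB, the data stored in the second register includes 00 (indicating 4 GB low-bit decoding + 6 GB × 2 high-bit decoding), 01 (indicating 8 GB low-bit decoding + 4 GB × 2 high-bit decoding), and 10 (indicating 12 GB low-bit decoding + 2 GB × 2 high-bit decoding).
In another illustrative example, when the third working mode supports 3 partitioning schemes and the memory is 12 GB, the data stored in the second register includes 00 (indicating 3 GB low-bit decoding + 4.5 GB × 2 high-bit decoding), 01 (indicating 6 GB low-bit decoding + 3 GB × 2 high-bit decoding), and 10 (indicating 9 GB low-bit decoding + 1.5 GB × 2 high-bit decoding).
Note that the above examples only illustrate how memory address ranges can be partitioned and do not constitute a limitation.
During address decoding, the address decoder reads the data in the first register. If that data indicates the third working mode, the decoder further reads the data in the second register and then decodes the address using the decoding method that corresponds to the memory address range indicated by that data.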
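The register-driven selection described above can be sketched as follows, using the 32 GB example. The constant names, the returned strings, and the assumption that the low-bit range starts at address 0 (as in the 0 to 16 GB example) are illustrative choices made here, not details fixed by the application.

```python
GB = 1 << 30

# First-register encodings from the three-mode example (Figure 6).
MODE_FIRST, MODE_SECOND, MODE_THIRD = 0b00, 0b01, 0b10

# Second-register value -> size of the low-bit decoding range (32 GB example).
# Assumption: the low-bit range occupies the lowest addresses.
LOW_BIT_RANGE_32GB = {0b00: 8 * GB, 0b01: 16 * GB, 0b10: 24 * GB, 0b11: 24 * GB}


def decoding_for(addr: int, first_reg: int, second_reg: int = 0b00) -> str:
    """Return 'low' or 'high' for the decoding method applied to addr."""
    if first_reg == MODE_FIRST:
        return "low"
    if first_reg == MODE_SECOND:
        return "high"
    if first_reg == MODE_THIRD:
        # The second register is consulted only in the third working mode.
        return "low" if addr < LOW_BIT_RANGE_32GB[second_reg] else "high"
    raise ValueError(f"unsupported working mode: {first_reg:#04b}")
```

For instance, an address at 10 GB falls in the high-bit range when the second register holds 00 (8 GB low-bit range) and in the low-bit range when it holds 01 (16 GB low-bit range).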
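As in the first working mode, to prevent some memory channels from being accessed too frequently while others sit idle (specifically, the memory channels that use low-bit decoding), which would hurt overall data read/write performance, in one possible design the third address decoding module is provided with a hash function. When memory channels are assigned, the hash function balances the load across the N memory channels during low-bit decoding, improving overall data read/write performance.
Note that the above embodiments use three working modes only as an illustrative example. In other possible implementations, the address decoder can support more than three working modes, which is not limited in this embodiment.
In this embodiment, by providing the channel splitter with an address decoder that supports at least two working modes, the channel splitter can decode memory addresses using different decoding methods in different working modes, thereby meeting the performance and power consumption requirements of different scenarios.
Regarding the structure of the bit width conversion component, in one possible implementation, the bit width conversion component uses a CAM (Content Addressable Memory) structure, that is, it implements bit width conversion with a register array.
When the buffer depth is small, a bit width conversion component with a CAM structure converts quickly. However, as the buffer depth increases and reaches a certain threshold (for example, 128 or 256), the register array needs an extra cycle to complete the bit width conversion.
To reduce bit width conversion latency, in another possible implementation, the bit width conversion component uses a RAM (Random Access Memory) structure, that is, it implements bit width conversion with SRAM (Static Random-Access Memory). Because SRAM is not constrained by depth, when the buffer depth is large, a bit width conversion component with a RAM structure has lower latency than one with a CAM structure (in theory saving one cycle).
Regarding the choice of structure, in some embodiments, a low-performance platform (small buffer depth) can use a bit width conversion component with a CAM structure, and a high-performance platform (large buffer depth) can use one with a RAM structure.
Of course, in other possible embodiments, a platform with variable performance (that is, one that supports both high-performance and low-performance modes) can be provided with two bit width conversion components, one with a CAM structure and one with a RAM structure, using the CAM-structure component in low-performance mode and the RAM-structure component in high-performance mode. This is not limited in the embodiments of this application.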
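In the related art, during data reading, the controller needs to perform data reordering (data ordering) on the data it reads, so a data reordering module must be provided inside the controller. In the embodiments of this application, a channel splitter has been added, and data reordering is required at the channel splitter to ensure correct data ordering. To prevent the controller from performing unnecessary data reordering, which would hurt read performance and waste on-chip area, in one possible design the bit width conversion component is merged with the data reordering of the controller's read data path; that is, the data reordering module is provided in the bit width conversion component rather than in the controller. During data reading, the read data is transparent to the controller: after the controller transmits the read data to the bit width conversion component, the data reordering module in the bit width conversion component reorders the data and then transmits the reordered data further upstream.
In this embodiment, because no data reordering module needs to be provided in the controller, on-chip area is saved, and because the controller does not perform redundant data reordering, read latency is reduced and read performance is improved.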
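Regarding the connections between the channel splitter and other components, in one possible design, as shown in Figure 7, the address decoder 711 of the channel splitter 71 is connected to the main device 72 through the main bus 73, and the N bit width conversion components 712 are connected to N controllers 74.
The main device 72 is a device that has data read/write requirements during operation. The main device may include, but is not limited to, processors such as a central processing unit (CPU), a graphics processing unit (GPU), a neural-network processing unit (NPU), and a digital signal processor (DSP), as well as non-processors such as an image sensor, an image signal processing unit (ISP), and a video processing unit (VPU). The embodiments of this application do not limit the specific type of the main device.
In addition, the main device 72 may have both read and write requirements, such as a processor, or may have only read or only write requirements, such as an image sensor. Whether the main device has both read and write requirements does not limit this application.
In some embodiments, the main bus 73 can be implemented as a system cache (SC) bus.
In some embodiments, the controller 74 can be implemented as a dynamic memory controller (DMC). The N controllers 74 correspond to N memory channels; that is, different controllers 74 control data read/write operations through different memory channels.
In some embodiments, the controller 74 is connected to the memory through a corresponding physical layer interface (PHY) to read and write data in the memory.
In one possible design, the channel splitter is provided in the slave bus that connects the main device and the memory, so that the main device can access data in the memory through multiple memory channels. The slave bus can be implemented as a double data rate (DDR) bus; during data access, the main device acts as the Master and the memory acts as the Slave.
The slave bus is connected to the main bus 73 through n links. In some embodiments, each main device 72 is connected to the main bus 73 through n links, and the main bus 73 interleaves the links corresponding to different main devices 72 to establish n links with the slave bus. The number of links established with the slave bus is related to the number of channel splitters provided in the slave bus.
In one possible implementation, the links between the main device 72 and the main bus 73 and the links between the main bus 73 and the slave bus use the same bus protocol. For example, the links all use the Advanced eXtensible Interface (AXI) bus protocol. The embodiments of this application do not limit the specific bus protocol used by the links.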
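In addition to splitting downstream of the main bus (that is, placing the channel splitter between the main bus and the controllers), in another possible implementation, the number of memory channels can be increased by increasing the number of links between the main device and the main bus (that is, splitting at the main device side). For example, after the number of links between the main device and the main bus is increased from n to m and the number of controllers is increased from n to m, the number of memory channels is likewise increased.
In yet another possible implementation, the number of memory channels can be increased by splitting at the main bus. For example, when n links are established between the main device and the main bus, the main bus establishes m links with m controllers through splitting, increasing the number of memory channels from n to m.
However, whether splitting is performed at the main device side or at the main bus, it occurs too early, so the hardware implementation complexity is higher than that of splitting downstream of the main bus. Splitting downstream of the main bus also reduces the impact on the main device and the main bus and preserves compatibility with existing main devices and buses.
Furthermore, compared with splitting at the main device or main bus side, splitting downstream of the main bus saves system-on-chip area and makes system timing simpler to implement; it also helps reduce power consumption and lowers the implementation complexity of subsequent power consumption optimization.
In the embodiments of this application, compared with splitting at the main device or main bus side, implementing channel splitting downstream of the main bus not only reduces the impact on the upstream main device and main bus and ensures the adaptability of the solution, but also saves system-on-chip area and makes system timing simpler to implement. In addition, splitting at the storage control device helps reduce power consumption and lowers the implementation complexity of subsequent power consumption optimization.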
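To ensure data security during reads and writes, a security bus (Security BUS, SBUS) is usually provided in the system on chip. Regarding the placement of the channel splitter, in one possible design, the channel splitter can be located outside the security bus or inside the security bus. The two placements are described separately below through exemplary embodiments.
When the channel splitter is located outside the security bus, as shown in Figure 8, the security bus 810 includes a first interface 811 and a second interface 812. The output interface of the channel splitter 800 is connected to the first interface 811 of the security bus 810, and the second interface 812 of the security bus 810 is connected to the controller 830; that is, both the data output downstream by the channel splitter and the data output upstream by the controller must pass through the security bus.
Specifically, the channel splitter 800 in Figure 8 corresponds to N security buses 810, and the channel splitter includes a splitter component 801 and N bit width conversion components 802. Each bit width conversion component 802 is connected to the first interface 811 of the corresponding security bus 810.
During data writing, the flow of data with security attributes is: splitter component of the channel splitter (AXI256) → bit width conversion component of the channel splitter (AXI128) → first interface of the security bus (AXI128) → second interface of the security bus (AXI128) → controller (AXI128).
During data reading, the flow of data with security attributes is: controller → second interface of the security bus → first interface of the security bus → bit width conversion component of the channel splitter → splitter component of the channel splitter.
In Figure 8, the arrows show only the write path of data with security attributes; the read path is omitted to keep the illustration simple, and this does not limit the embodiments of this application.
When the channel splitter is located inside the security bus, as shown in Figure 9, the security bus 900 includes the channel splitter 910 and N third interfaces 901. The N output interfaces of the channel splitter 910 are connected to the N third interfaces 901, and each third interface 901 is connected to the controller 920.
Specifically, the channel splitter 910 in Figure 9 includes a splitter component 911 and N bit width conversion components 912, and each bit width conversion component 912 is connected to the corresponding third interface 901.
During data writing, the flow of data with security attributes is: splitter component of the channel splitter (AXI256) → bit width conversion component of the channel splitter (AXI128) → third interface of the security bus (AXI128) → controller (AXI128).
During data reading, the flow of data with security attributes is: controller → third interface of the security bus → bit width conversion component of the channel splitter → splitter component of the channel splitter.
In Figure 9, the arrows show only the write path of data with security attributes; the read path is omitted to keep the illustration simple, and this does not limit the embodiments of this application.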
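To keep reads and writes secure, some data must be encrypted before it is written to memory; correspondingly, encrypted data must be decrypted when it is read. Therefore, in one possible design the security bus is also provided with an encryption/decryption component, which performs the encryption and decryption of data. In some embodiments, this component may be implemented as a DDR encryption engine (DDR Encryption Engine, DDRE).
In one possible design, the security bus and the encryption/decryption component are arranged in series, so that during reads and writes data transfer and data encryption/decryption are executed serially. The security bus must wait for the encryption/decryption component to finish before it can continue with subsequent data transfers.
However, with this serial scheme, data awaiting encryption or decryption blocks the transfer of data that needs neither, which lowers the overall read/write speed.
In another possible design, the security bus and the encryption/decryption component are arranged in parallel, so that data transfer and data encryption/decryption can be executed in parallel. The following description takes the parallel arrangement as an example.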
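When the channel splitter is placed inside the security bus, building on FIG. 9 and as shown in FIG. 10, the security bus 900 is provided with an encryption/decryption component 930 and, in addition to the third interface 901, further includes a fourth interface 902 and a fifth interface 903. The output interface of the channel splitter 910 is connected to the third interface 901 and the fourth interface 902 of the security bus 900; the third interface 901 is connected to the controller 920; the fourth interface 902 is connected to the fifth interface 903 through the encryption/decryption component 930; and the fifth interface 903 is connected to the third interface 901.
During encryption, for data that needs to be encrypted, the bit-width conversion component 912 transfers the data to the fourth interface 902, and the security bus 900 inputs the data to the encryption/decryption component 930 through the fourth interface 902. After the encryption/decryption component 930 finishes encrypting the data, it outputs the encrypted data to the fifth interface 903. Correspondingly, the security bus 900 receives the encrypted data output by the encryption/decryption component 930 through the fifth interface 903 and outputs the encrypted data to the controller 920 through the third interface 901.
For data that does not need to be encrypted, the bit-width conversion component 912 transfers the data to the third interface 901, so it is not blocked by the encryption process.
During decryption, for data that needs to be decrypted, after the security bus 900 receives the data transferred by the controller 920 through the third interface 901, it forwards the data from the third interface 901 to the fifth interface 903 and then transfers the data to the encryption/decryption component 930 through the fifth interface 903. After the encryption/decryption component 930 finishes decrypting the data, it outputs the decrypted data to the fourth interface 902. Correspondingly, the security bus 900 receives the decrypted data output by the encryption/decryption component 930 through the fourth interface 902 and transfers the decrypted data to the bit-width conversion component 912.
For data that does not need to be decrypted, the security bus 900 transfers the data directly to the bit-width conversion component 912 through the third interface 901, so it is not blocked by the decryption process.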
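The routing for FIG. 10 can be summarised as a small lookup of hops. The sketch below is illustrative only: the `Node` enum and both function names are hypothetical, and the decryption path follows the stated connectivity (the fifth interface 903 is linked to the third interface 901, and decrypted data leaves through the fourth interface 902).

```python
from enum import Enum


class Node(Enum):
    CONVERTER = "bit-width conversion component 912"
    THIRD = "third interface 901"
    FOURTH = "fourth interface 902"
    FIFTH = "fifth interface 903"
    CRYPTO = "encryption/decryption component 930"
    CONTROLLER = "controller 920"


def write_path(needs_encryption: bool) -> list[Node]:
    """Hops taken by write data travelling from the splitter to the controller."""
    if needs_encryption:
        # Side path through the engine, rejoining the main path at the third interface.
        return [Node.CONVERTER, Node.FOURTH, Node.CRYPTO,
                Node.FIFTH, Node.THIRD, Node.CONTROLLER]
    return [Node.CONVERTER, Node.THIRD, Node.CONTROLLER]


def read_path(needs_decryption: bool) -> list[Node]:
    """Hops taken by read data travelling from the controller to the splitter."""
    if needs_decryption:
        # The same side path, walked in the opposite direction.
        return [Node.CONTROLLER, Node.THIRD, Node.FIFTH,
                Node.CRYPTO, Node.FOURTH, Node.CONVERTER]
    return [Node.CONTROLLER, Node.THIRD, Node.CONVERTER]


for hop in write_path(needs_encryption=True):
    print(hop.value)
```

Plain traffic never touches the fourth or fifth interface, which is why it is not held up behind data that is still inside the engine.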
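In some embodiments, only an encryption component may be provided, or only a decryption component may be provided, or both an encryption component and a decryption component may be provided.
In one possible design, the encryption component and the decryption component may be two independent components, or they may be one integrated component, that is, a single encryption/decryption component that implements both encryption and decryption.
In FIG. 9, the arrows show only the write process for data with the security attribute. The read process is omitted to keep the figure simple, and this omission does not limit the embodiments of this application.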
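When the channel splitter is placed outside the security bus, building on FIG. 8 and as shown in FIG. 11, the security bus 810 is provided with an encryption/decryption component 840 and, in addition to the first interface 811 and the second interface 812, further includes a sixth interface 813 and a seventh interface 814.
During encryption, for data that needs to be encrypted, the bit-width conversion component 802 transfers the data to the first interface 811, and the security bus 810 inputs the data to the encryption/decryption component 840 through the sixth interface 813. After the encryption/decryption component 840 finishes encrypting the data, it outputs the encrypted data to the seventh interface 814. Correspondingly, the security bus 810 receives the encrypted data output by the encryption/decryption component 840 through the seventh interface 814 and outputs the encrypted data to the controller 830 through the second interface 812.
For data that does not need to be encrypted, the security bus 810 transfers the data directly to the controller 830 through the second interface 812, so it is not blocked by the encryption process.
During decryption, for data that needs to be decrypted, after the security bus 810 receives the data transferred by the controller 830 through the second interface 812, it sends the data from the second interface 812 to the seventh interface 814 and then transfers the data to the encryption/decryption component 840 through the seventh interface 814. After the encryption/decryption component 840 finishes decrypting the data, it outputs the decrypted data to the sixth interface 813. Correspondingly, the security bus 810 receives the decrypted data output by the encryption/decryption component 840 through the sixth interface 813, passes the decrypted data to the first interface 811, and finally outputs the data to the bit-width conversion component 802 through the first interface 811.
For data that does not need to be decrypted, the security bus 810 transfers the data directly to the bit-width conversion component 802 through the first interface 811, so it is not blocked by the decryption process.
In this embodiment, when an encryption/decryption component is provided, two additional interfaces are placed on the security bus and used to connect the encryption/decryption component. This allows data encryption/decryption and data transfer to be executed in parallel and prevents the encryption/decryption component from blocking the data transfer path while it operates on data, thereby improving the read/write bandwidth.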
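A toy timing model makes the blocking argument concrete. Everything in it is an assumption made for illustration: the cycle counts are invented, the function name is hypothetical, and contention when encrypted data rejoins the main path is ignored.

```python
def completion_times(needs_crypto, crypto_cycles=4, transfer_cycles=1, parallel=False):
    """Return the cycle at which each queued transfer completes.

    needs_crypto: one boolean per transfer, in arrival order.
    parallel=False models the serial arrangement (a single in-order path).
    parallel=True models a separate side path for data that needs the engine.
    """
    done = []
    main_free = 0    # cycle at which the main (or only) path is next free
    crypto_free = 0  # cycle at which the crypto side path is next free
    for flag in needs_crypto:
        cost = transfer_cycles + (crypto_cycles if flag else 0)
        if parallel and flag:
            crypto_free += cost
            done.append(crypto_free)
        else:
            main_free += cost
            done.append(main_free)
    return done


queue = [True, False, False]  # one protected transfer followed by two plain ones
serial = completion_times(queue)                      # [5, 6, 7]
side_path = completion_times(queue, parallel=True)    # [5, 1, 2]
```

In the serial case the two plain transfers wait behind the protected one; with the side path they finish in cycles 1 and 2 while the engine is still busy.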
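FIG. 12 shows a schematic structural diagram of a storage control apparatus according to an exemplary embodiment of this application. The storage control apparatus 1200 includes at least one channel splitter 1210 and controllers 1220.
The storage control apparatus 1200 is connected to the master devices through the main bus 1230, and is connected to the memory through physical layer interfaces 1240 (which may be regarded as part of the storage control apparatus). For the structure of the channel splitter 1210 in the storage control apparatus 1200, refer to the embodiments above; details are not repeated here.
In one possible design, the storage control apparatus 1200 includes a slave bus, and the channel splitter is placed in the slave bus. A security bus may also be placed in the slave bus; details are not repeated in this embodiment.
In this embodiment, the links between the main bus 1230 and the storage control apparatus 1200 are split at the storage control apparatus 1200, which increases the number of memory channels. The number of memory channels is related to the structure and the number of the channel splitters.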
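In one possible design, as shown in FIG. 13, the storage control apparatus 1300 is provided with dual-channel splitters 1310 and controllers 1320 (each with a corresponding physical layer interface 1340). A dual-channel splitter 1310 divides memory addresses between 2 memory channels. When 4 links (AXI bit width of 256 bits) are established between the main bus 1330 and the storage control apparatus 1300, and 4 dual-channel splitters 1310 are provided in the storage control apparatus 1300, the storage control apparatus 1300 is connected to 8 physical layer interfaces 1340, and the number of memory channels rises from 4 to 8 (AXI bit width of 128 bits).
In another possible design, the channel splitters in the storage control apparatus are three-channel splitters, each of which divides memory addresses among 3 memory channels. When 4 links (AXI bit width of 256 bits) are established between the main bus and the storage control apparatus, and 4 three-channel splitters are provided in the storage control apparatus, the storage control apparatus is connected to 12 physical layer interfaces, and the number of memory channels rises from 4 to 12 (AXI bit width of 128 bits).
In yet another possible design, the channel splitters are dual-channel splitters. When 4 links (AXI bit width of 256 bits) are established between the main bus and the storage control apparatus, and 2 dual-channel splitters are provided in the storage control apparatus, the storage control apparatus is connected to 6 physical layer interfaces, and the number of memory channels rises from 4 to 6 (AXI bit width of 128 bits).
It should be noted that the embodiments above are only examples and do not limit the number of ways per channel splitter or the number of channel splitters in the storage control apparatus.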
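The three configurations above follow one counting rule, sketched here in Python. The function name is hypothetical, and the rule rests on an assumption the text implies but does not state: each splitter sits on exactly one incoming link, and a link without a splitter passes through as a single channel.

```python
def memory_channel_count(links: int, splitters: int, ways: int) -> int:
    """Number of memory channels after splitting.

    links:     links between the main bus and the storage control apparatus
    splitters: channel splitters, each assumed to sit on one distinct link
    ways:      channels produced by each splitter (2 for dual, 3 for three-channel)
    """
    if links < 0 or ways < 2 or not 0 <= splitters <= links:
        raise ValueError("need ways >= 2 and 0 <= splitters <= links")
    unsplit_links = links - splitters
    return splitters * ways + unsplit_links


print(memory_channel_count(links=4, splitters=4, ways=2))  # 8
print(memory_channel_count(links=4, splitters=4, ways=3))  # 12
print(memory_channel_count(links=4, splitters=2, ways=2))  # 6
```

The third call reproduces the 4-to-6 case: two links are split into two channels each and the remaining two links are left as they are.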
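The storage control apparatus provided in the embodiments of this application can be applied in a mobile terminal to improve its performance. The mobile terminal may be a smartphone, a tablet computer, a wearable device, and so on.
In one possible application scenario, once the storage control apparatus provided in the embodiments of this application is applied to a mobile terminal with an image capture function, the memory read/write bandwidth can meet the demands of high-speed photography, beautification algorithms, and AI algorithms, which improves the terminal's capture quality, user experience, and overall performance.
In another possible application scenario, once the storage control apparatus provided in the embodiments of this application is applied to a foldable-screen terminal, the memory read/write bandwidth can meet the demands of several applications running in the foreground at the same time, which helps the foldable-screen terminal support concurrent application scenarios.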
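FIG. 14 shows a schematic structural diagram of a system on chip (System on Chip, SoC) according to an exemplary embodiment of this application. The system on chip 1400 includes master devices 1401, a main bus 1402, and a storage control apparatus 1403.
The master devices 1401 are connected to the storage control apparatus 1403 through the main bus 1402, and the storage control apparatus 1403 is connected to the memory through physical layer interfaces 14033. In some embodiments, the memory is a dynamic random access memory (Dynamic Random Access Memory, DRAM).
A master device 1401 is a processor or a non-processor that needs to read and write data. FIG. 14 uses processors comprising a CPU, a GPU, and an NPU, and non-processors comprising an image sensor and a VPU, as an illustrative example, but this is not a limitation.
The processor connects the various parts of the terminal through various interfaces and lines, and performs the terminal's functions and processes data by running or executing the instructions, programs, code sets, or instruction sets stored in the memory and by calling the data stored in the memory.
In some embodiments, the processor may be implemented in at least one of the following hardware forms: digital signal processing (Digital Signal Processing, DSP), field-programmable gate array (Field-Programmable Gate Array, FPGA), and programmable logic array (Programmable Logic Array, PLA).
The processor may integrate one or a combination of a CPU, a GPU, an NPU, a baseband chip, and the like. The CPU mainly handles the operating system, the user interface, and applications; the GPU renders and draws the content to be shown on the display; the NPU implements AI functions; and the baseband chip handles wireless communication.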
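In some embodiments, links using the AXI protocol are established between the master devices 1401 and the main bus 1402. For example, 4 AXI links with a bit width of 256 bits are established between each master device 1401 and the main bus 1402.
In some embodiments, the storage control apparatus 1403 includes a slave bus, at least one channel splitter 14031, controllers 14032, and a physical layer (PHY) interface 14033 corresponding to each controller 14032.
In some embodiments, links using the AXI protocol are established between the channel splitter 14031 and the controllers 14032. For example, N AXI links with a bit width of 128 bits are established between the channel splitter 14031 and the controllers 14032.
For the specific structure of the storage control apparatus 1403, refer to the storage control apparatus shown in the embodiments above; details are not repeated here.
FIG. 14 takes a system on chip that does not contain the memory (that is, the memory is located outside the system on chip) as an example. In other possible designs, as shown in FIG. 15, the memory 1404 may be integrated on the system on chip 1400, that is, located inside the system on chip.
In some embodiments, the embodiments of this application further provide a terminal provided with the system on chip shown in FIG. 14 or FIG. 15. It should be noted that, besides the system on chip, the terminal may include other necessary components, such as a read-only memory (Read-Only Memory, ROM), a display component, an input unit, an audio circuit, a speaker, a microphone, and a power supply; details are not repeated in this embodiment.
The above are only optional embodiments of this application and are not intended to limit it. Any modification, equivalent replacement, or improvement made within the spirit and principles of this application shall fall within its scope of protection.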

Claims (13)

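  1. A channel splitter, comprising a splitting component and N bit-width conversion components, N being greater than or equal to 2;
    the splitting component is configured to divide memory addresses among N memory channels;
    the bit-width conversion components are configured to perform bit-width conversion on data input by the splitting component;
    the splitting component includes an address decoder configured to decode memory addresses, the address decoder supports at least two working modes, and the address decoder uses a different decoding scheme in each working mode.
  2. The channel splitter according to claim 1, wherein the address decoder is provided with a first register and at least two address decoding modules, data stored in the first register indicates the working mode, and different address decoding modules operate in different working modes.
  3. The channel splitter according to claim 2, wherein the at least two address decoding modules include a first address decoding module and a second address decoding module;
    the first address decoding module operates in a first working mode, and the decoding scheme used by the first address decoding module is low-bit decoding;
    the second address decoding module operates in a second working mode, and the decoding scheme used by the second address decoding module is high-bit decoding;
    wherein the interleaving granularity used by low-bit decoding is smaller than the interleaving granularity used by high-bit decoding.
  4. The channel splitter according to claim 3, wherein the first address decoding module is further provided with a hash function, and the hash function is used to balance load across the N memory channels.
  5. The channel splitter according to claim 3, wherein the at least two address decoding modules further include a third address decoding module;
    the third address decoding module operates in a third working mode, and the decoding schemes used by the third address decoding module include low-bit decoding and high-bit decoding.
  6. The channel splitter according to claim 5, wherein the address decoder is further provided with a second register, and data stored in the second register indicates the memory address ranges that correspond to the different decoding schemes in the third working mode.
  7. The channel splitter according to claim 5, wherein the third address decoding module is further provided with a hash function, and the hash function is used to balance load across the N memory channels during low-bit decoding.
  8. The channel splitter according to any one of claims 1 to 7, wherein the bit-width conversion components adopt a random access memory structure.
  9. The channel splitter according to any one of claims 1 to 7, wherein
    an input interface of the channel splitter is connected to a master device through a main bus, the master device being a device that needs to read and write data;
    N output interfaces of the channel splitter are connected to N controllers, and the N controllers correspond to N memory channels.
  10. A storage control apparatus, comprising at least one channel splitter according to any one of claims 1 to 9 and controllers;
    the storage control apparatus is connected to a master device through a main bus, and the storage control apparatus is connected to a memory through a physical layer interface.
  11. A system on chip, comprising a master device and the storage control apparatus according to claim 10;
    the master device is connected to the storage control apparatus through a main bus;
    the storage control apparatus is connected to a memory through a physical layer interface.
  12. The system on chip according to claim 11, wherein the memory is located inside the system on chip, or the memory is located outside the system on chip.
  13. A terminal, provided with the system on chip according to claim 11 or 12.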
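As a reading aid for the address decoder in claims 1 to 6, the following Python sketch shows one way low-bit decoding, high-bit decoding, and a mixed mode could pick a channel. It is not the claimed implementation: the mode encodings, the 256-byte granule, the single `low_bit_limit` boundary standing in for the second register, and all names are assumptions, and the hash function of claims 4 and 7 is not modelled.

```python
from dataclasses import dataclass

# Hypothetical encodings for the value held in the first (mode) register.
LOW_BIT, HIGH_BIT, MIXED = 0, 1, 2


@dataclass
class AddressDecoder:
    n_channels: int          # N in the claims
    mem_size: int            # total addressable bytes (assumed to be known)
    granule: int = 256       # assumed fine interleaving granularity, in bytes
    mode: int = LOW_BIT      # stands in for the first register
    low_bit_limit: int = 0   # stands in for the second register: [0, limit) uses low-bit decoding

    def _low(self, addr: int) -> int:
        # Low-bit decoding: consecutive granules rotate across the channels.
        return (addr // self.granule) % self.n_channels

    def _high(self, addr: int) -> int:
        # High-bit decoding: each channel owns one large contiguous region.
        region = self.mem_size // self.n_channels
        return min(addr // region, self.n_channels - 1)

    def channel(self, addr: int) -> int:
        if not 0 <= addr < self.mem_size:
            raise ValueError("address out of range")
        if self.mode == LOW_BIT:
            return self._low(addr)
        if self.mode == HIGH_BIT:
            return self._high(addr)
        # Mixed mode: the address range decides which scheme applies.
        return self._low(addr) if addr < self.low_bit_limit else self._high(addr)


dec = AddressDecoder(n_channels=2, mem_size=1 << 20)
fine = [dec.channel(a) for a in (0, 256, 512)]      # alternates between channels
dec.mode = HIGH_BIT
coarse = [dec.channel(a) for a in (0, 256, 1 << 19)]  # switches only at the half-way point
```

With two channels over 1 MiB, low-bit decoding switches channel every 256 bytes while high-bit decoding switches once at 512 KiB, which matches the claim that the low-bit interleaving granularity is the smaller of the two.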
PCT/CN2023/077375 2022-06-20 2023-02-21 Channel splitter, storage control apparatus, system on chip, and terminal WO2023246132A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210699613.3A CN117290081A (zh) 2022-06-20 2022-06-20 Channel splitter, storage control apparatus, system on chip, and terminal
CN202210699613.3 2022-06-20

Publications (1)

Publication Number Publication Date
WO2023246132A1 true WO2023246132A1 (zh) 2023-12-28

Family

ID=89237744

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/077375 WO2023246132A1 (zh) 2022-06-20 2023-02-21 Channel splitter, storage control apparatus, system on chip, and terminal

Country Status (2)

Country Link
CN (1) CN117290081A (zh)
WO (1) WO2023246132A1 (zh)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
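CN104461727A (zh) * 2013-09-16 2015-03-25 华为技术有限公司 Memory module access method and apparatus
CN104750557A (zh) * 2013-12-27 2015-07-01 华为技术有限公司 Memory management method and memory management apparatus
CN111045963A (zh) * 2019-12-15 2020-04-21 苏州浪潮智能科技有限公司 Method and apparatus for reading and writing a high-bit-width bus
CN112181682A (zh) * 2020-09-23 2021-01-05 上海爱数信息技术股份有限公司 Data transmission control system and method for multi-task concurrency scenarios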

Also Published As

Publication number Publication date
CN117290081A (zh) 2023-12-26

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23825791

Country of ref document: EP

Kind code of ref document: A1