WO2021078197A1 - Method, device and storage medium for fast data communication with an embedded processor - Google Patents

Method, device and storage medium for fast data communication with an embedded processor

Info

Publication number
WO2021078197A1
WO2021078197A1 · PCT/CN2020/122890
Authority
WO
WIPO (PCT)
Prior art keywords
request
data
memory interface
chip
read
Prior art date
Application number
PCT/CN2020/122890
Other languages
English (en)
French (fr)
Inventor
贾复山
张继存
Original Assignee
盛科网络(苏州)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 盛科网络(苏州)有限公司
Priority to US17/771,507 (granted as US12013802B2)
Publication of WO2021078197A1


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 13/00 Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F 13/14 Handling requests for interconnection or transfer
    • G06F 13/20 Handling requests for interconnection or transfer for access to input/output bus
    • G06F 13/32 Handling requests for interconnection or transfer for access to input/output bus using combination of interrupt and burst mode transfer
    • G06F 13/34 Handling requests for interconnection or transfer for access to input/output bus using combination of interrupt and burst mode transfer with priority control
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 13/00 Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F 13/14 Handling requests for interconnection or transfer
    • G06F 13/20 Handling requests for interconnection or transfer for access to input/output bus
    • G06F 13/28 Handling requests for interconnection or transfer for access to input/output bus using burst mode transfer, e.g. direct memory access DMA, cycle steal
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 2213/00 Indexing scheme relating to interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F 2213/28 DMA

Definitions

  • the invention belongs to the technical field of integrated circuit design, and mainly relates to a method, a device and a storage medium for fast data communication with an embedded processor.
  • in a typical SoC (System on Chip), one part is the CPU and the remaining parts are mostly dedicated functional units implemented in logic; the CPU is responsible for general data processing and chip operation configuration, but its capacity for processing large volumes of data is limited, whereas the dedicated functional units can process large volumes of data quickly in hardware. There is therefore a need for data communication (interaction) between the CPU and these dedicated functional units, and as the supported functions become more and more complex, the amount of data to be exchanged also increases.
  • in one prior scheme, the data communication between a dedicated functional unit and the processor is completed by the processor's own active read and write operations: the dedicated functional unit writes the data to be exchanged into an internal data buffer, and the processor collects that data through ordinary read operations and writes it into its own memory. When the processor wants to send data to the dedicated functional unit, it writes the data directly into the unit's data buffer and then notifies the unit. In this scheme, every data transfer involves the processor, which greatly increases its burden; when the data volume is large, this noticeably affects the processor's other operations and may even fail to meet the needs of practical applications.
  • Scheme 1 uses DMA to realize the data interaction by adding a DMA controller (DMA: Direct Memory Access) between the processor and the dedicated functional unit. The processor is only responsible for configuring the DMA controller; the specific data movement is performed by the DMA controller, which greatly reduces the processor's burden when large amounts of data are exchanged. In this scheme, the DMA controller is attached to the processor's system bus and accesses the processor's memory space directly through the bus to read and write data and complete collection and transmission; because the system bus is a shared mechanism, other master devices on the bus also initiate requests. The processor no longer needs to read and write data from the dedicated functional unit, only to configure the DMA controller, which can greatly reduce its burden. However, because the DMA controller and the processor share the system bus, DMA operations inevitably occupy the bus's data transmission bandwidth, so the processing efficiency of the entire system becomes lower; the impact is especially obvious when a large volume of data is moved by DMA.
  • Scheme 2 uses a shared-memory mechanism: a shared memory space that can be read and written by both the processor and the dedicated functional unit is set up on the chip, enabling the processor's data receiving and sending operations. When the processor sends data to the dedicated functional unit, it first writes the data to the shared memory space and sets a memory status flag bit that marks the state of the data currently in memory, so the dedicated functional unit can judge when to read the corresponding data; an interrupt or status-flag register is then used to notify the dedicated functional unit to read the data from the specified location. Conversely, when the dedicated functional unit sends data to the processor, a similar operation is performed.
  • the purpose of the present invention is to provide a method and device for fast data communication with an embedded processor.
  • an embodiment of the present invention provides a method for fast data communication with an embedded processor, and the method includes:
  • the memory interface controller includes a plurality of memory interface control units, each memory interface control unit corresponds to at least one on-chip storage unit, and each on-chip storage unit uniquely corresponds to one memory interface control unit;
  • the DMA controller includes a plurality of request distribution units arranged in one-to-one correspondence with the memory interface control units; the address corresponding to the request information processed by each request distribution unit matches an address segment of the on-chip processor, and also matches the address of the on-chip storage unit corresponding to that request distribution unit's memory interface control unit;
  • the dedicated function module includes multiple data acquisition units and multiple data receiving units.
  • the on-chip processor is used to configure the address range of the memory corresponding to the dedicated function unit.
  • the functional modules are respectively connected to each request distribution unit;
  • when the on-chip processor makes a read/write request to the internal memory, the memory interface control unit matches the corresponding on-chip storage unit to read and write the data in the internal memory and returns the read data to the original requesting module; and/or, when a dedicated function module makes a read/write request to the internal memory through the DMA controller, the DMA controller connects to the memory interface controller according to the requested address, and the memory interface control unit matches the corresponding on-chip storage unit to read and write data in the internal memory and return the read data to the original requesting module.
  • the method further includes: parsing the request information to obtain the address it carries, and matching the parsed address to obtain the request distribution unit that matches the request information.
  • specifically, the method includes: parsing the request information to obtain the address it carries, and querying each request distribution unit with the parsed address to determine whether that address falls within the address range scheduled by the current request distribution unit; if so, the current request distribution unit responds to the request, and the memory interface control unit connected to it makes the specific response to the request in the corresponding on-chip storage unit.
  • the method further includes: pre-configuring a priority level and/or a processing weight for the request information; when any interface control unit and/or any request distribution unit receives multiple requests at the same time, the requests are processed in order according to their priority level and/or processing weight, i.e. requests are processed from the highest priority level to the lowest, and/or are scheduled cyclically according to their processing weights.
  • the method further includes: updating the status flag information of the local register and generating an interrupt signal, and sending the interrupt signal To the on-chip processor.
  • the instruction carried in the completed request information includes: completing the data operation of at least one descriptor.
  • the method further includes: setting a timeout mechanism; when the DMA controller confirms that sufficient data processing has not been completed within a predetermined time, the timeout mechanism triggers the update of the local register's status flag information and generates an interrupt signal.
  • the method further includes: configuring ECC logic units one by one at the entrance of each on-chip storage unit;
  • if the write data width of a write operation is consistent with the storage width of the on-chip storage unit, the ECC check code is calculated directly according to the ECC algorithm and written into the corresponding on-chip storage unit together with the original write data;
  • if the write data width of a write operation is smaller than the storage width of the on-chip storage unit, the original data is read from the corresponding on-chip storage unit, the part that needs updating is modified, the ECC check code is recalculated over the entire updated data, and the result is written back to the on-chip storage unit together with the modified data;
  • if a read operation is performed, the ECC logic automatically detects, according to the error-detection algorithm, whether the read data is erroneous, and records the error status in the corresponding register.
  • an embodiment of the present invention provides a device for fast data communication with an embedded processor, the device comprising:
  • the internal memory is divided into a plurality of on-chip storage units with consecutive addresses and sequentially addressed, and read and write operations between different on-chip storage units are independent of each other and can be performed simultaneously;
  • a memory interface controller connected to the internal memory, the memory interface controller including a plurality of memory interface control units, each memory interface control unit corresponds to at least one on-chip storage unit, and each on-chip storage unit uniquely corresponds to one memory interface control unit;
  • the on-chip processor and the DMA controller respectively connected to the memory interface controller.
  • the DMA controller includes a plurality of request distribution units arranged in one-to-one correspondence with the memory interface control units; the address corresponding to the request information processed by each request distribution unit matches an address segment of the on-chip processor, and also matches the address of the on-chip storage unit corresponding to that request distribution unit's memory interface control unit;
  • the modules are respectively connected to each request distribution unit;
  • when the on-chip processor makes a read/write request to the internal memory, the memory interface control unit matches the corresponding on-chip storage unit to read and write the data in the internal memory and returns the read data to the original requesting module; and/or, when a dedicated function module makes a read/write request to the internal memory through the DMA controller, the DMA controller connects to the memory interface controller according to the requested address, and the memory interface control unit matches the corresponding on-chip storage unit to read and write data in the internal memory and return the read data to the original requesting module.
  • an embodiment of the present invention provides a computer-readable storage medium on which a computer program is stored; when executed, the program carries out the steps of the above method for fast data communication with an embedded processor.
  • the beneficial effect of the present invention is that the method, device and storage medium for fast data communication with an embedded processor provide concurrent data-processing capability by dividing the system's internal memory into multiple on-chip storage units, improving the data-processing bandwidth; at the same time, data transmission and reception are completed by the DMA controller with only minimal involvement of the on-chip processor, reducing the on-chip processor's burden.
  • FIGS. 1 and 2 are schematic diagrams of the frame structures of devices for fast data communication with embedded processors in different embodiments described in the background art of the present invention;
  • FIG. 3 is a schematic diagram of a framework module of a device for fast data communication with an embedded processor provided by an embodiment of the present invention
  • FIG. 4 is a schematic flowchart of a method for fast data communication with an embedded processor according to an embodiment of the present invention.
  • an embodiment of the present invention provides a device for fast data communication with an embedded processor. The device includes: an internal memory 10, divided into multiple on-chip storage units 11 with consecutive, sequentially assigned addresses, where read and write operations between different on-chip storage units 11 are independent of each other and can be performed simultaneously; a memory interface controller 20 connected to the internal memory 10, the memory interface controller 20 including a plurality of memory interface control units 21, each memory interface control unit 21 corresponding to at least one on-chip storage unit 11, and each on-chip storage unit 11 uniquely corresponding to one memory interface control unit 21; and an on-chip processor 30 and a DMA controller 40, each connected to the memory interface controller 20. The DMA controller 40 includes a plurality of request distribution units 41 arranged in one-to-one correspondence with the memory interface control units 21; the address corresponding to the request information processed by each request distribution unit 41 matches an address segment of the on-chip processor 30 and matches the address of the on-chip storage unit 11 corresponding to its memory interface control unit 21. A dedicated function module 50 is connected to the DMA controller 40 and includes multiple data acquisition units 51 and multiple data receiving units 52. The on-chip processor 30 configures the address range of the memory corresponding to each dedicated function module 50, and each dedicated function module 50 is connected to each request distribution unit 41. When the on-chip processor 30 makes a read/write request to the internal memory 10, and/or a dedicated function module 50 makes a read/write request to the internal memory 10 through the DMA controller 40, the DMA controller 40 connects to the memory interface controller 20 according to the requested address; the memory interface control unit 21 locates the corresponding on-chip storage unit 11 to read and write data in the internal memory and returns the read data to the original requesting module.
  • an embodiment of the present invention provides a method for fast data communication with an embedded processor.
  • the above-mentioned device is referenced, and each module and unit in the device is described in detail.
  • the method includes:
  • the internal memory is divided into multiple on-chip storage units with continuous addresses and sequentially addressed.
  • the read and write operations between different on-chip storage units are independent of each other and can be performed simultaneously;
  • each memory interface control unit corresponds to at least one on-chip storage unit, and each on-chip storage unit uniquely corresponds to one memory interface control unit;
  • the DMA controller includes a plurality of request distribution units set in one-to-one correspondence with the memory interface control unit, and the request information processed by each request distribution unit The corresponding address matches the address segment of the on-chip processor, and the address corresponding to the request information processed by each request allocation unit matches the address of the on-chip storage unit corresponding to the memory interface control unit;
  • the dedicated function module includes multiple data acquisition units and multiple data receiving units.
  • the on-chip processor is used to configure the address range of the memory corresponding to the dedicated function unit.
  • a dedicated function module is respectively connected to each request distribution unit;
  • when the on-chip processor makes a read/write request to the internal memory, the memory interface control unit matches the corresponding on-chip storage unit to read and write data in the internal memory and returns the read data to the original requesting module; and/or, when a dedicated function module makes a read/write request to the internal memory through the DMA controller, the DMA controller connects to the memory interface controller according to the requested address, and the memory interface control unit matches the corresponding on-chip storage unit to read and write data in the internal memory and return the read data to the original requesting module.
  • a memory interface controller and a dedicated DMA controller are designed between the on-chip processor and the on-chip dedicated function module to realize fast and efficient data communication between the on-chip dedicated function module and the on-chip processor;
  • the numbers S1 to S5 are labels only, for convenience of description; in actual applications, steps S1 to S4 can be performed at the same time or in an adjusted order, and the ordering does not affect the data output result.
  • in step S1, the internal memory is divided into multiple on-chip storage units, and each on-chip storage unit is connected to a unique memory interface control unit; in this way, read and write operations between the on-chip storage units are independent of each other and can be performed simultaneously. In the specific embodiment of the present invention, the on-chip storage units are addressed sequentially.
  • the memory interface control unit is responsible for receiving requests to read and write the internal memory from the on-chip processor and the DMA controller, reading and writing the data in the internal memory according to the requested address, and returning the read data to the original requesting module.
  • the number of configured memory interface control units can be set as needed and is usually less than or equal to the number of on-chip storage units, i.e. each memory interface control unit can operate at least one on-chip storage unit. For example, when the numbers are equal, the units are paired one-to-one; when the number of on-chip storage units is greater than the number of memory interface control units, the extra on-chip storage units are assigned to already-paired memory interface control units, so a single control unit may be configured with multiple on-chip storage units.
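  • the assignment rule above can be sketched in software. The following is an illustrative Python sketch, not from the patent: the function name and the round-robin fold are our own choices, chosen only to show that every storage unit maps to exactly one control unit while a control unit may serve several storage units.

```python
# Illustrative sketch: assign N on-chip storage units to M memory interface
# control units (M <= N). Units are paired one-to-one first; the extra
# storage units are folded back onto the existing control units.
def assign_storage_to_control(num_storage: int, num_control: int) -> dict[int, int]:
    if num_control > num_storage:
        raise ValueError("control units must not outnumber storage units")
    mapping = {}
    for storage in range(num_storage):
        # round-robin fold: storage unit i -> control unit i % M
        mapping[storage] = storage % num_control
    return mapping
```

For example, `assign_storage_to_control(6, 4)` pairs units 0..3 one-to-one and folds units 4 and 5 back onto control units 0 and 1.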
  • the data acquisition unit in the DMA controller is responsible for collecting the data to be sent and transferring it to the corresponding memory interface control unit according to the address specified by the on-chip processor.
  • the number of configured request distribution units is the same as the number of memory interface control units, and each request distribution unit is connected to every unit in the dedicated function module; that is, each request distribution unit can receive data collected by any data acquisition unit and can return the result of a read request to any data receiving unit. The number of request distribution units is less than or equal to the sum of the numbers of data acquisition units and data receiving units.
  • although each request distribution unit can receive request information from any dedicated function module, according to the specific address segment assigned to it, a request distribution unit only responds to access requests whose addresses fall within the segment of its corresponding on-chip memory. The on-chip storage units are reached through the request distribution units, and the numbers of request distribution units and on-chip storage units can be expanded or reduced as needed; this choice balances the performance requirements of the entire system against the complexity of the logic design. Each request distribution unit can correspond to one or more on-chip storage units.
  • step S5 is the specific operation flow for the data. The request information comes from the on-chip processor or the DMA controller. The memory interface controller first routes the read and write requests from the on-chip processor and the DMA controller to the memory interface control unit corresponding to the designated on-chip storage unit. After a memory interface control unit receives such a request: for a read request, it returns the read data according to the source of the request (on-chip processor or DMA controller); for a write operation, it writes the requested data into the corresponding on-chip storage unit.
  • the method further includes setting appropriate arbitration principles for the request information. Specifically, in one achievable manner, a priority level and/or a processing weight is pre-configured for the request information; when any interface control unit and/or any request distribution unit receives multiple requests at the same time, the requests are processed in order according to their priority level and/or processing weight: requests are processed sequentially from the highest priority level to the lowest, and/or are scheduled cyclically according to their processing weights.
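  • the two arbitration principles above (priority ordering, then weighted cyclic scheduling within a level) can be sketched as follows. This is an illustrative Python model, not the patent's hardware arbiter; the `Request` fields and the service-order representation are our own assumptions.

```python
# Illustrative sketch: arbitrate pending requests by priority level first,
# then by weighted round-robin within the same level.
from dataclasses import dataclass

@dataclass
class Request:
    source: str
    priority: int    # higher value = more urgent
    weight: int = 1  # share of service within the same priority level

def arbitrate(pending: list[Request]) -> list[str]:
    """Return the order in which request sources are serviced."""
    order = []
    # handle priority levels from highest to lowest
    for level in sorted({r.priority for r in pending}, reverse=True):
        group = [r for r in pending if r.priority == level]
        credits = {r.source: r.weight for r in group}
        # weighted round-robin: each pass grants one slot per remaining credit
        while any(c > 0 for c in credits.values()):
            for r in group:
                if credits[r.source] > 0:
                    order.append(r.source)
                    credits[r.source] -= 1
    return order
```

With `arbitrate([Request("cpu", 1, 2), Request("dma", 0, 1), Request("fn", 1, 1)])`, the level-1 requests "cpu" and "fn" are served cyclically according to their weights before the level-0 "dma" request is reached.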
  • the method further includes: parsing the request information to obtain the address it carries, and then matching the parsed address to obtain the request distribution unit that matches the request information.
  • specifically, the method includes: parsing the request information to obtain the address it carries, querying each request distribution unit with the parsed address to determine whether that address falls within the address range scheduled by the current request distribution unit, and, if so, having the current request distribution unit respond to the request while the memory interface control unit connected to it responds specifically to the request in the corresponding on-chip storage unit.
  • the method further includes: updating the status flag information of the local register and generating an interrupt signal, and sending the interrupt signal to the on-chip processor. For example: When the DMA controller confirms that the data is written into the specified on-chip memory by the memory interface control unit, it updates the status flag information of the local register and generates an interrupt signal to notify the on-chip processor that the current data has been processed.
  • each piece of data usually has a corresponding descriptor. The command carried in the completed request information can be set to: complete the data operation of at least one descriptor. That is, according to specific needs, the system can be configured to modify the status flag of the local register and generate the corresponding interrupt signal after completing the data operation of one or more descriptors. The single-descriptor configuration ensures that the transmission status is reported to the on-chip processor promptly, while the multi-descriptor configuration ensures that the on-chip processor is not interrupted too frequently, so that other applications can proceed normally; the two configurations suit small and large data volumes respectively.
  • the method further includes: setting a timeout mechanism; when the DMA controller confirms that sufficient data processing has not been completed within a predetermined time, the timeout mechanism triggers the update of the local register's status flag information and generates an interrupt signal.
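  • the interrupt policy described above (raise an interrupt after a configured number of completed descriptors, with a timeout that flushes a partial batch) can be sketched in software. The class, field names and the injectable clock below are our own illustrative choices, not the patent's registers.

```python
# Illustrative sketch: descriptor-completion batching with a timeout that
# forces a status update even when the batch is not full.
import time

class DmaStatus:
    def __init__(self, batch: int, timeout_s: float, clock=time.monotonic):
        self.batch = batch          # descriptors per interrupt
        self.timeout_s = timeout_s
        self.clock = clock          # injectable for testing
        self.completed = 0          # descriptors done since last interrupt
        self.last_irq = clock()
        self.status_flag = False    # models the local-register status flag
        self.irq_count = 0          # interrupts "sent" to the on-chip processor

    def _raise_irq(self):
        self.status_flag = True
        self.irq_count += 1
        self.completed = 0
        self.last_irq = self.clock()

    def descriptor_done(self):
        self.completed += 1
        if self.completed >= self.batch:
            self._raise_irq()

    def poll(self):
        # timeout mechanism: a stale partial batch still reaches the processor
        if self.completed > 0 and self.clock() - self.last_irq >= self.timeout_s:
            self._raise_irq()
```

A large `batch` keeps the on-chip processor from being interrupted too often; the timeout guarantees a partial batch is still reported within a bounded delay.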
  • the system can be configured with ECC to provide ECC as an optional function; ECC is the abbreviation of Error Checking and Correction.
  • the method further includes: configuring an ECC logic unit at the entrance of each on-chip storage unit. If the write data width of a write operation performed through the ECC logic unit is consistent with the storage width of the on-chip storage unit, the ECC check code is calculated directly according to the ECC algorithm and written into the corresponding on-chip storage unit together with the original write data. If the write data width is smaller than the storage width of the on-chip storage unit, the original data is read from the corresponding on-chip storage unit, the part that needs updating is modified, the ECC check code is recalculated over the entire updated data, and the result is written back to the on-chip storage unit together with the modified data. For a read operation, the ECC logic automatically detects whether the read data is erroneous and records the error status in the corresponding register.
  • when the system is configured to support the ECC function, the ECC logic unit is enabled in the corresponding memory interface controller; the ECC logic handles read and write operations separately. For write operations whose data width matches the storage width of the on-chip storage unit, and for read operations, the ECC logic unit can be enabled to improve the fault tolerance of the data without causing any performance loss.
  • when the write data width of a write operation is smaller than the storage width of the on-chip storage unit, the current write operation only needs to update part of the data at the specified address in the internal memory. This requires a read-modify-write operation: the original data is read from the corresponding on-chip storage unit, the part that needs updating is modified, the ECC check code is then calculated over the entire updated data, and the result is written back to the on-chip storage unit together with the modified data. The read-modify-write operation is needed only when the width of the written data does not match the storage width of the on-chip storage unit; the only impact on the surrounding logic is a longer processing delay, normal function is unaffected, and no modification is required.
  • in one example, the internal memory is divided by address into 4 contiguous on-chip storage units, denoted on-chip storage unit 1, on-chip storage unit 2, on-chip storage unit 3 and on-chip storage unit 4; correspondingly, the on-chip processor can configure the address segment of each on-chip storage unit. The address segments corresponding to the 4 on-chip storage units are: 0x0000~0x3FFF, 0x4000~0x7FFF, 0x8000~0xBFFF, 0xC000~0xFFFF.
  • the number of memory interface control units is configured equal to the number of on-chip storage units, and the two are connected in one-to-one correspondence; the four memory interface control units are denoted MEM Mux1, MEM Mux2, MEM Mux3 and MEM Mux4. The number of request distribution units is configured equal to the number of memory interface control units, and the two are likewise connected in one-to-one correspondence; the 4 request distribution units are denoted DMA Mux1, DMA Mux2, DMA Mux3 and DMA Mux4. It should be noted that the address corresponding to the request information processed by each request distribution unit must match the address segment configured by the on-chip processor for each on-chip storage unit; at the same time, each of the above 4 request distribution units is connected to every dedicated function module.
  • the dedicated function module is explained by taking two data acquisition units and two data receiving units as an example, which are data acquisition unit 1 and data acquisition unit 2, as well as data receiving unit 1 and Data receiving unit 2.
  • the on-chip processor configures the address segment corresponding to each data acquisition unit and data receiving unit.
  • when any data acquisition unit and/or data receiving unit receives request information, it sends the request information to the DMA controller. Each request distribution unit (DMA Mux) in the DMA controller can receive the request information and, according to the address it contains, judges which request distribution unit's address range the request matches; a request distribution unit that fails to match does not respond to the request. For example, when the address contained in the current request information lies in the range 0x0000~0x3FFF, the request is processed by DMA Mux1; otherwise, DMA Mux1 does not respond to it.
  • if multiple requests need to be processed in DMA Mux1 at the same time, they are handled according to multi-request arbitration rules. In general, a request that needs urgent handling has higher priority; in addition, when multiple requests share the same priority level, weighted round-robin scheduling can be used, which is not elaborated further here. The arbitration rules are set with the ultimate goal of meeting practical application requirements while keeping the logic design simple.
  • after the memory interface controller receives a data operation request from the DMA controller, the request is assigned by address to the corresponding memory interface control unit for processing; a data operation request from the on-chip processor is likewise routed by address to the corresponding memory interface control unit.
  • if, while the data handled by DMA Mux1 is being sent to memory interface control unit MEM Mux1, the address contained in the current request from the on-chip processor is also within 0x0000~0x3FFF, that request is also received and processed by MEM Mux1; the same arbitration rule then selects between the requests from the on-chip processor and the request allocation unit, and they are processed in turn.
  • the finally selected request operates on the data in on-chip storage unit 1: a write operation writes the data directly into on-chip storage unit 1, while a read operation reads out the data at the corresponding address of on-chip storage unit 1 and returns it to the on-chip processor or the DMA controller.
  • the invention can be used for fast and efficient data interaction between the embedded on-chip processor and on-chip dedicated function units. Balancing performance against design complexity, the number of system-memory partitions (that is, the number of on-chip storage units) and the number of request allocation units in the DMA controller can be chosen as needed; it should be noted that, provided performance requirements are met, the fewer these units, the simpler the logic implementation.
  • correspondingly, based on the implementation of the present invention, the on-chip processor is responsible only for a small amount of control information (such as DMA configuration and interrupts), while the actual data transmission and reception are carried out autonomously by on-chip logic; at the same time, thanks to the flexible partitioning of system memory and the arbitration of transmit/receive requests, highly efficient data processing can be achieved.
  • the method, apparatus, and storage medium for fast data communication with an embedded processor of the present invention provide concurrent data processing capability and increase data processing bandwidth by dividing the system's internal memory into multiple on-chip storage units. At the same time, data transmission and reception are handled by the DMA controller, with the on-chip processor participating only minimally, which reduces its burden. Further, transmit/receive requests are selected through two-level arbitration, and the arbitration priority policy can be set flexibly, so relatively simple logic satisfies the overall system performance requirements. In addition, the present invention provides a flexible and controllable ECC logic unit, so that upstream and downstream modules can support the ECC data protection function without modifying their logic.
  • modules described as separate components may or may not be physically separated; components presented as modules are logical modules, which may reside in a single module of the chip logic or be distributed across multiple data-processing modules in the chip. Some or all of the modules can be selected according to actual needs to achieve the objectives of this embodiment. Those of ordinary skill in the art can understand and implement this without creative effort.
  • This application can be used in many general-purpose or special-purpose chip designs. For example: switch chips, router chips, server chips and so on.
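The first-level address decode described in the notes above can be sketched in a few lines. The segment bounds follow the 0x0000~0x3FFF example in the text; the function and dictionary names are illustrative assumptions, not identifiers from the patent.

```python
# Hypothetical sketch: each request allocation unit (DMA Mux) owns one
# address segment and only schedules requests whose address falls inside
# it; a request outside every segment gets no response.

DMA_MUX_SEGMENTS = {
    "DMA Mux1": (0x0000, 0x3FFF),
    "DMA Mux2": (0x4000, 0x7FFF),
    "DMA Mux3": (0x8000, 0xBFFF),
    "DMA Mux4": (0xC000, 0xFFFF),
}

def match_request_allocation_unit(addr):
    """Return the name of the DMA Mux whose segment contains addr, or None."""
    for unit, (lo, hi) in DMA_MUX_SEGMENTS.items():
        if lo <= addr <= hi:
            return unit
    return None  # no request allocation unit responds

# A request carrying address 0x0123 is scheduled by DMA Mux1;
# 0x4123 falls in the second segment, so DMA Mux1 ignores it.
assert match_request_allocation_unit(0x0123) == "DMA Mux1"
assert match_request_allocation_unit(0x4123) == "DMA Mux2"
assert match_request_allocation_unit(0x10000) is None
```

In hardware this decode is just a comparison of the upper address bits; every DMA Mux performs it independently, so exactly one unit claims any in-range request.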

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Bus Control (AREA)
  • Multi Processors (AREA)

Abstract

The present invention provides a method, apparatus, and storage medium for fast data communication with an embedded processor. The method includes: dividing the internal memory into a plurality of on-chip storage units with contiguous, sequentially assigned addresses; configuring a memory interface controller connected to the internal memory, the memory interface controller including a plurality of memory interface control units; configuring an on-chip processor and a DMA controller each connected to the memory interface controller, the DMA controller including a plurality of request allocation units arranged in one-to-one correspondence with the memory interface control units; configuring a dedicated function module connected to the DMA controller, the dedicated function module including a plurality of data acquisition units and a plurality of data receiving units; and when the on-chip processor or the DMA controller issues a read/write request, matching the corresponding on-chip storage unit through a memory interface control unit to read or write data in the internal memory and returning the read data to the original requesting module. The invention provides concurrent data processing capability and increases data processing bandwidth.

Description

Method, apparatus, and storage medium for fast data communication with an embedded processor
This application claims priority to Chinese patent application No. 201911009338.2, filed on October 23, 2019 and entitled "Method, apparatus, and storage medium for fast data communication with an embedded processor", the entire contents of which are incorporated herein by reference.
Technical Field
The present invention belongs to the technical field of integrated circuit design, and mainly relates to a method, apparatus, and storage medium for fast data communication with an embedded processor.
Background
In the design of a chip (SoC) with an embedded processor (CPU), most of the chip apart from the CPU consists of dedicated function-unit logic. The CPU handles general-purpose data processing and chip configuration, but its capacity for processing large volumes of data is limited, whereas dedicated function units can process large volumes of data quickly in hardware. Data communication (interaction) between the CPU and these dedicated function units is therefore inevitable, and as the supported features grow more complex, the amount of data to be exchanged keeps increasing.
In traditional designs, data communication between a dedicated function unit and the processor is completed by the processor actively performing read/write operations. Specifically, the dedicated function unit writes the data to be exchanged into an internal data buffer, and the processor collects that data through ordinary read operations and writes it into the processor's memory; when the processor needs to send data to a dedicated function unit, it writes the data directly into the unit's data buffer and then notifies that unit. In such schemes, all data handling must involve the processor, which greatly increases its load; when the data volume is large, other processor operations are noticeably affected, and the scheme may even fail to meet practical application requirements.
To address these drawbacks, many newer techniques have emerged, mainly the following two implementation schemes.
As shown in FIG. 1, Scheme 1 uses a DMA (Direct Memory Access) approach for data interaction, adding a DMA controller between the processor and the dedicated function units. The processor is responsible only for configuring the DMA controller, while the actual data movement is performed by the DMA controller, which greatly relieves the processor when large amounts of data are exchanged. In this implementation, the DMA controller is attached to the processor's system bus and accesses the processor's memory space directly over that bus, performing reads and writes to collect and send data. Because the system bus uses a shared mechanism, other master devices on it will also issue requests.
In Scheme 1, the processor no longer needs to read or write data from the dedicated function units; it only configures the DMA controller, and the DMA controller performs the actual data movement. Accordingly, the processor is spared bulk read/write operations, which greatly reduces its load. However, because the DMA controller shares the system bus with the processor, DMA operations inevitably consume the bus's data transfer bandwidth, lowering the processing efficiency of the whole system; the impact is especially noticeable during DMA operations on large amounts of data.
As shown in FIG. 2, Scheme 2 is implemented with a shared-memory mechanism. Because the processor is embedded, part of its memory space necessarily resides on chip; the dedicated function units share that memory space, enabling the processor's data transmission and reception. Specifically, a shared memory region readable and writable by both the processor and the dedicated function units is provided on chip. When the processor sends data to a dedicated function unit, it first writes the data into the shared memory and sets memory status flag bits indicating the state of the data currently in memory, so the dedicated function unit can check them when reading the corresponding data; the processor then notifies the dedicated function unit, via an interrupt or a status flag register, to read the data from the specified location. Conversely, when a dedicated function unit sends data to the processor, a similar procedure is performed.
In Scheme 2, because the processor and the dedicated function units share memory, a data exchange only requires the sender to perform a write and the receiver to perform a read, reducing the number of read/write operations and improving data processing efficiency. However, in this scheme, every data collection and transmission module in the dedicated function units needs a logic interface to the shared memory, and memory status flag bits must be added in the shared memory to indicate the state of the current data. The complexity of the shared-memory interface design is therefore proportional to the number of interfaces: the more interfaces, the more complex the design. Furthermore, the added status flag bits occupy memory, reducing the effective storage space. Especially when the dedicated function units contain many transmit/receive modules and the data volume is large, the added logic becomes even more complex; and because the processor must determine the state of the data in the current storage space from the memory status flags, considerable extra logic overhead is introduced, lowering data efficiency.
Summary of the Invention
To solve the above technical problems, an object of the present invention is to provide a method and apparatus for fast data communication with an embedded processor.
To achieve one of the above objects, an embodiment of the present invention provides a method for fast data communication with an embedded processor, the method comprising:
dividing the internal memory into a plurality of on-chip storage units with contiguous, sequentially assigned addresses, wherein read/write operations on different on-chip storage units are mutually independent and can proceed simultaneously;
configuring a memory interface controller connected to the internal memory, the memory interface controller comprising a plurality of memory interface control units, each memory interface control unit corresponding to at least one on-chip storage unit, and each on-chip storage unit uniquely corresponding to one memory interface control unit;
configuring an on-chip processor and a DMA controller each connected to the memory interface controller, the DMA controller comprising a plurality of request allocation units arranged in one-to-one correspondence with the memory interface control units, wherein the address corresponding to the request information handled by each request allocation unit matches an address segment of the on-chip processor, and also matches the address of the on-chip storage unit corresponding to that unit's memory interface control unit;
configuring a dedicated function module connected to the DMA controller, the dedicated function module comprising a plurality of data acquisition units and a plurality of data receiving units, the on-chip processor being used to configure the memory address range corresponding to each dedicated function unit, and each dedicated function unit being connected to every request allocation unit;
when the on-chip processor issues a read/write request to the internal memory, matching the corresponding on-chip storage unit through a memory interface control unit to read or write the data in the internal memory and returning the read data to the original requesting module; and/or, when the dedicated function module issues a read/write request to the internal memory through the DMA controller, interfacing with the memory interface controller through the DMA controller according to the requested address, matching the corresponding on-chip storage unit through the memory interface control unit to read or write the data in the internal memory, and returning the read data to the original requesting module.
As a further improvement of an embodiment of the present invention, when the dedicated function module issues a read/write request to the memory through the DMA controller, the method further comprises: parsing the request information to obtain the address it carries, and matching the parsed address to find the request allocation unit matching the request information.
As a further improvement of an embodiment of the present invention, when the dedicated function module issues a read/write request to the memory through the DMA controller, the method specifically comprises: parsing the request information to obtain the address it carries, querying each request allocation unit with the parsed address, and determining whether that address belongs to the address range scheduled by the current request allocation unit; if so, the current request allocation unit responds to the parsed request and, through the memory interface control unit connected to it, responds to the request information concretely on the corresponding on-chip storage unit.
As a further improvement of an embodiment of the present invention, the method further comprises: pre-configuring priority levels and/or processing weights for request information; when any interface control unit and/or any request allocation unit receives multiple pieces of request information simultaneously, processing each piece in order according to its priority level and/or processing weight, wherein the request information is processed in descending order of priority and/or scheduled cyclically according to the processing weights.
As a further improvement of an embodiment of the present invention, after the DMA controller confirms that the on-chip memory has completed the instruction carried by the request information, the method further comprises: updating the status flag information of a local register, generating an interrupt signal, and sending the interrupt signal to the on-chip processor.
As a further improvement of an embodiment of the present invention, completing the instruction carried by the request information comprises: completing the data operations of at least one descriptor.
As a further improvement of an embodiment of the present invention, the method further comprises: providing a timeout mechanism; when the DMA controller determines that sufficient data processing has not been completed within a predetermined time, the timeout mechanism triggers updating of the status flag information of the local register and generation of the interrupt signal.
As a further improvement of an embodiment of the present invention, the method further comprises: configuring an ECC logic unit at the entrance of each on-chip storage unit, one per unit;
if the write-data width of a write operation performed by the ECC logic unit equals the storage width of the on-chip storage unit, computing the ECC check code directly according to the ECC algorithm and writing it into the corresponding on-chip storage unit together with the original write data;
if the write-data width of the write operation is smaller than the storage width of the on-chip storage unit, reading the original data out of the corresponding on-chip storage unit, modifying the portion of the data that needs updating, computing the ECC check code over the entire updated data, and writing it back to the on-chip storage unit together with the modified data;
if a read operation is performed, automatically checking the read data for errors according to the error-detection algorithm and recording the error status in the corresponding register.
To achieve one of the above objects, an embodiment of the present invention provides an apparatus for fast data communication with an embedded processor, the apparatus comprising:
an internal memory, divided into a plurality of on-chip storage units with contiguous, sequentially assigned addresses, wherein read/write operations on different on-chip storage units are mutually independent and can proceed simultaneously;
a memory interface controller connected to the internal memory, the memory interface controller comprising a plurality of memory interface control units, each memory interface control unit corresponding to at least one on-chip storage unit, and each on-chip storage unit uniquely corresponding to one memory interface control unit;
an on-chip processor and a DMA controller each connected to the memory interface controller, the DMA controller comprising a plurality of request allocation units arranged in one-to-one correspondence with the memory interface control units, wherein the address corresponding to the request information handled by each request allocation unit matches an address segment of the on-chip processor, and also matches the address of the on-chip storage unit corresponding to that unit's memory interface control unit;
a dedicated function module connected to the DMA controller, the dedicated function module comprising a plurality of data acquisition units and a plurality of data receiving units, the on-chip processor being used to configure the memory address range corresponding to each dedicated function unit, and each dedicated function unit being connected to every request allocation unit;
wherein, when the on-chip processor issues a read/write request to the internal memory, the corresponding on-chip storage unit is matched through a memory interface control unit to read or write the data in the internal memory and the read data is returned to the original requesting module; and/or, when the dedicated function module issues a read/write request to the internal memory through the DMA controller, the DMA controller interfaces with the memory interface controller according to the requested address, the corresponding on-chip storage unit is matched through the memory interface control unit to read or write the data in the internal memory, and the read data is returned to the original requesting module.
To achieve one of the above objects, an embodiment of the present invention provides a computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the above method for fast data communication with an embedded processor.
Compared with the prior art, the beneficial effects of the present invention are as follows: the method, apparatus, and storage medium for fast data communication with an embedded processor of the present invention divide the system's internal memory into multiple on-chip storage units, providing concurrent data processing capability and increasing data processing bandwidth; at the same time, data transmission and reception are handled by the DMA controller, with the on-chip processor participating only minimally, which reduces the on-chip processor's burden.
Brief Description of the Drawings
FIG. 1 and FIG. 2 are schematic structural diagrams of apparatuses for fast data communication with an embedded processor according to the different implementations described in the Background of the present invention;
FIG. 3 is a schematic block diagram of an apparatus for fast data communication with an embedded processor according to an embodiment of the present invention;
FIG. 4 is a schematic flowchart of a method for fast data communication with an embedded processor according to an embodiment of the present invention.
Detailed Description of the Embodiments
The present invention will be described in detail below with reference to the specific embodiments shown in the accompanying drawings. These embodiments do not, however, limit the present invention; structural, methodological, or functional changes made by those of ordinary skill in the art in accordance with these embodiments are all included within the scope of protection of the present invention.
As shown in FIG. 3, an embodiment of the present invention provides an apparatus for fast data communication with an embedded processor, the apparatus comprising: an internal memory 10, divided into a plurality of on-chip storage units 11 with contiguous, sequentially assigned addresses, where read/write operations on different on-chip storage units 11 are mutually independent and can proceed simultaneously; a memory interface controller 20 connected to the internal memory 10, the memory interface controller 20 comprising a plurality of memory interface control units 21, each memory interface control unit 21 corresponding to at least one on-chip storage unit 11 and each on-chip storage unit 11 uniquely corresponding to one memory interface control unit 21; an on-chip processor 30 and a DMA controller 40 each connected to the memory interface controller 20, the DMA controller 40 comprising a plurality of request allocation units 41 arranged in one-to-one correspondence with the memory interface control units 21, where the address corresponding to the request information handled by each request allocation unit 41 matches an address segment of the on-chip processor 30 and also matches the address of the on-chip storage unit 11 corresponding to that unit's memory interface control unit 21; and a dedicated function module 50 connected to the DMA controller 40, the dedicated function module 50 comprising a plurality of data acquisition units 51 and a plurality of data receiving units 52, the on-chip processor 30 being used to configure the memory address range corresponding to the dedicated function module 50, and each dedicated function module 50 being connected to every request allocation unit 41. When the on-chip processor 30 issues a read/write request to the internal memory 10, and/or the dedicated function module 50 issues a read/write request to the internal memory 10 through the DMA controller 40, the DMA controller 40 interfaces with the memory interface controller 20 according to the requested address, finds the corresponding on-chip storage unit 11 through a memory interface control unit 21 to read or write the data in the internal memory, and returns the read data to the original requesting module.
With reference to FIG. 4, an embodiment of the present invention provides a method for fast data communication with an embedded processor; the method makes use of the above apparatus, and each module and unit of the apparatus is described in detail below.
In a specific embodiment of the present invention, the method comprises:
S1. dividing the internal memory into a plurality of on-chip storage units with contiguous, sequentially assigned addresses, wherein read/write operations on different on-chip storage units are mutually independent and can proceed simultaneously;
S2. configuring a memory interface controller connected to the internal memory, the memory interface controller comprising a plurality of memory interface control units, each memory interface control unit corresponding to at least one on-chip storage unit, and each on-chip storage unit uniquely corresponding to one memory interface control unit;
S3. configuring an on-chip processor and a DMA controller each connected to the memory interface controller, the DMA controller comprising a plurality of request allocation units arranged in one-to-one correspondence with the memory interface control units, wherein the address corresponding to the request information handled by each request allocation unit matches an address segment of the on-chip processor, and also matches the address of the on-chip storage unit corresponding to that unit's memory interface control unit;
S4. configuring a dedicated function module connected to the DMA controller, the dedicated function module comprising a plurality of data acquisition units and a plurality of data receiving units, the on-chip processor being used to configure the memory address range corresponding to each dedicated function unit, and each dedicated function unit being connected to every request allocation unit;
S5. when the on-chip processor issues a read/write request to the internal memory, matching the corresponding on-chip storage unit through a memory interface control unit to read or write the data in the internal memory and returning the read data to the original requesting module; and/or, when the dedicated function module issues a read/write request to the internal memory through the DMA controller, interfacing with the memory interface controller through the DMA controller according to the requested address, matching the corresponding on-chip storage unit through the memory interface control unit to read or write the data in the internal memory, and returning the read data to the original requesting module.
It should be noted that the present invention places a memory interface controller and a dedicated DMA controller between the on-chip processor and the on-chip dedicated function modules to realize fast and efficient data communication between them. The labels S1 to S5 above are used merely for convenience of description; in practice, steps S1 to S4 may be performed simultaneously or in a different order, and their arrangement does not affect the data output results.
Regarding step S1, the internal memory is divided into a plurality of on-chip storage units, each connected to a unique memory interface control unit; thus, read/write operations on the different on-chip storage units are mutually independent and can proceed simultaneously. In this embodiment, the addresses of the on-chip storage units are assigned sequentially.
Regarding step S2, the memory interface control units receive the read/write requests for the internal memory issued by the on-chip processor and the DMA controller, read or write the data in the internal memory according to the requested addresses, and return the read data to the original requesting module. The number of memory interface control units configured can be set as needed; it is usually less than or equal to the number of on-chip storage units, i.e., each memory interface control unit can operate at least one on-chip storage unit. For example, when the numbers of memory interface control units and on-chip storage units are equal, they are configured one-to-one; when there are more on-chip storage units than memory interface control units, the units are configured one-to-one as far as possible, and the surplus on-chip storage units are assigned to an already-used memory interface control unit, so that one control unit serves multiple storage units.
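The assignment described in step S2 can be sketched as follows. The one-to-one case and the many-to-one fallback follow the text; the modulo fold used for the surplus units and all names are illustrative assumptions rather than the patent's rule.

```python
# Hypothetical sketch: map on-chip storage unit indices to memory
# interface control unit indices. One-to-one while control units last;
# surplus storage units are folded back onto existing control units.

def assign_storage_units(n_storage, n_ctrl):
    """Return a dict: storage unit index -> control unit index."""
    if n_ctrl > n_storage:
        raise ValueError("control units usually do not outnumber storage units")
    return {s: s % n_ctrl for s in range(n_storage)}

# 4 storage units and 4 control units: strict one-to-one mapping.
assert assign_storage_units(4, 4) == {0: 0, 1: 1, 2: 2, 3: 3}
# 6 storage units and 4 control units: surplus units 4 and 5 share
# control units 0 and 1, so those control units each serve two units.
mapping = assign_storage_units(6, 4)
assert mapping[4] == 0 and mapping[5] == 1
```

Because each storage unit still reaches exactly one control unit, reads and writes on different control units stay independent; only the shared control units serialize their own storage units' traffic.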
Regarding step S3, the data acquisition units collect the data to be sent and, according to the address the on-chip processor has designated for that data, pass it to the corresponding memory interface control unit; further, the memory interface control unit looks up the corresponding on-chip storage unit and writes the data in. Meanwhile, according to the on-chip processor's configuration, the DMA controller writes read data coming from the on-chip storage units into the designated data receiving unit. In this embodiment, the number of request allocation units configured equals the number of memory interface control units, and every request allocation unit is connected to every unit in the dedicated function module; that is, each request allocation unit can receive data collected by any data acquisition unit and can return the result of a read request to any data receiving unit. Normally, the number of request allocation units is less than or equal to the total number of data acquisition units and data receiving units. In practice, although each request allocation unit can receive request information from any dedicated function module, according to its assigned address segment a request allocation unit responds only to access requests within the address segment of the on-chip memory it corresponds to. As above, the on-chip storage units are reached through the request allocation units, and the numbers of request allocation units and on-chip storage units can both be scaled up or down as needed; this choice depends on the balance between overall system performance requirements and logic design complexity. Each request allocation unit may correspond to one or more on-chip storage units, because those storage units are logically independent and the request allocation units' reads and writes of their data are likewise mutually independent. Accordingly, the more on-chip storage units there are, the higher the supported data processing bandwidth; but at the same time, more units mean a more complex logic design and hence higher chip cost. The number of data acquisition and data receiving units is generally larger than the number of request allocation units. When addresses are allocated at system design time, dedicated function modules with heavy data traffic are configured to occupy a request allocation unit exclusively, while modules with only occasional data operations can share one; such a design guarantees system performance while keeping the logic design simple.
As for step S5, which is the concrete data operation flow: in practice, request information comes from the on-chip processor or the DMA controller. The memory interface controller first assigns the read/write requests for the internal memory from the on-chip processor and the DMA controller to the memory interface control unit corresponding to the designated on-chip storage unit. After receiving these read/write requests, the memory interface control unit, for a read request, returns the read data according to the source of the request (the on-chip processor or the DMA controller); for a write operation, it writes the requested data into the corresponding on-chip storage unit.
It should be noted that, for a memory interface control unit or a request allocation unit, multiple requests received at the same time may interfere with one another. In a preferred embodiment of the present invention, the method therefore further comprises setting an appropriate arbitration policy for each class of request information. Specifically, in one implementation, priority levels and/or processing weights are pre-configured for request information; when any interface control unit and/or any request allocation unit receives multiple pieces of request information simultaneously, each piece is processed in order according to its priority level and/or processing weight: the request information is processed in descending order of priority, and/or scheduled cyclically according to the processing weights.
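A minimal sketch of this arbitration policy, assuming a priority-first rule with a deficit-style weighted round-robin among equal-priority peers. The `Request` shape, the credit mechanism, and all names are illustrative assumptions; the patent only states that priorities and weights are configurable.

```python
# Hypothetical arbitration sketch: highest priority wins; ties are
# broken by accumulated weight credit, then by arrival order.

from dataclasses import dataclass, field
from itertools import count

_seq = count()

@dataclass
class Request:
    name: str
    priority: int            # higher value = more urgent
    weight: int = 1          # share among equal-priority peers
    credit: int = 0          # accumulated round-robin credit
    order: int = field(default_factory=lambda: next(_seq))

def arbitrate(pending):
    """Return the names of the pending requests in grant order."""
    granted = []
    while pending:
        for r in pending:    # waiting requests earn credit each round
            r.credit += r.weight
        pick = max(pending, key=lambda r: (r.priority, r.credit, -r.order))
        granted.append(pick.name)
        pending.remove(pick)
    return granted

reqs = [Request("rx_data", priority=0, weight=1),
        Request("urgent_cfg", priority=1),
        Request("tx_data", priority=0, weight=3)]
# The urgent request wins first; among the equal-priority pair, the
# weight-3 request then beats the weight-1 request.
assert arbitrate(reqs) == ["urgent_cfg", "tx_data", "rx_data"]
```

In hardware this would be a fixed-priority mux in front of a weighted round-robin stage; the software loop only illustrates the selection order.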
In a specific embodiment of the present invention, during data interaction, when the dedicated function module issues a read/write request to the memory through the DMA controller, the method further comprises: parsing the request information to obtain the address it carries, and matching the parsed address to find the request allocation unit matching the request information. Preferably, in one specific embodiment, when the dedicated function module issues a read/write request to the memory through the DMA controller, the method specifically comprises: parsing the request information to obtain the address it carries, querying each request allocation unit with the parsed address, and determining whether that address belongs to the address range scheduled by the current request allocation unit; if so, the current request allocation unit responds to the parsed request and, through the memory interface control unit connected to it, responds to the request information concretely on the corresponding on-chip storage unit.
In a preferred embodiment of the present invention, after the DMA controller confirms that the on-chip memory has completed the instruction carried by the request information, the method further comprises: updating the status flag information of a local register, generating an interrupt signal, and sending the interrupt signal to the on-chip processor. For example, when the DMA controller confirms that the data has been written by the memory interface control unit into the designated on-chip memory, it updates the local register's status flags and generates an interrupt to notify the on-chip processor that the current data has been processed.
Normally, each piece of data has a corresponding descriptor, and data processing can be considered complete when the descriptor operation completes. Thus, in a preferred embodiment of the present invention, completing the instruction carried by the request information is defined as completing the data operations of at least one descriptor; that is, according to specific needs, the local register status flags can be updated and the corresponding interrupt generated after the data operations of one or more descriptors are completed. The single-descriptor configuration ensures that the data transfer status is reported to the on-chip processor promptly, while the multi-descriptor configuration ensures that the on-chip processor is not interrupted too frequently, so other applications can run normally; these two configurations suit small and large data volumes respectively.
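The descriptor-batched interrupt behaviour, together with the timeout safeguard described in the next paragraph, can be modelled in a few lines. The class, its counters, and the cycle-based timeout are illustrative assumptions, not the patent's register interface.

```python
# Hypothetical sketch: the DMA controller raises an interrupt after every
# N completed descriptors (N = 1 reports promptly; larger N interrupts
# less often), or when a timeout expires with a partial batch pending.

class DmaCompletion:
    def __init__(self, descriptors_per_irq, timeout_cycles):
        self.n = descriptors_per_irq
        self.timeout = timeout_cycles
        self.done = 0            # descriptors finished since last interrupt
        self.idle = 0            # cycles elapsed since last interrupt
        self.irqs = 0            # interrupts delivered to the on-chip CPU

    def _raise_irq(self):
        self.done = 0
        self.idle = 0
        self.irqs += 1           # would also update local status flag bits

    def descriptor_done(self):
        self.done += 1
        if self.done >= self.n:
            self._raise_irq()    # full batch completed

    def tick(self):              # called once per cycle
        self.idle += 1
        if self.done and self.idle >= self.timeout:
            self._raise_irq()    # timeout path flushes a stalled batch

dma = DmaCompletion(descriptors_per_irq=4, timeout_cycles=10)
for _ in range(4):
    dma.descriptor_done()        # full batch -> one interrupt
assert dma.irqs == 1
dma.descriptor_done()            # partial batch...
for _ in range(10):
    dma.tick()                   # ...flushed by the timeout mechanism
assert dma.irqs == 2
```

Setting `descriptors_per_irq` high coalesces interrupts for bulk transfers, while the timeout bounds how long a short tail transfer can go unreported.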
In a preferred embodiment of the present invention, to handle temporary interruptions that may occur during data transfer, the method further comprises a timeout mechanism: when the DMA controller determines that sufficient data processing has not been completed within a predetermined time, the timeout mechanism triggers updating of the local register status flags and generation of an interrupt signal.
In addition, in a preferred embodiment of the present invention, the system is configured with ECC (Error Checking and Correction) as an optional function. Specifically, the method further comprises: configuring an ECC logic unit at the entrance of each on-chip storage unit, one per unit; if the write-data width of a write operation performed by the ECC logic unit equals the storage width of the on-chip storage unit, the ECC check code is computed directly according to the ECC algorithm and written into the corresponding on-chip storage unit together with the original write data; if the write-data width is smaller than the storage width, the original data is read out of the corresponding on-chip storage unit, the portion to be updated is modified, the ECC check code is computed over the entire updated data, and it is written back to the on-chip storage unit together with the modified data; for a read operation, the read data is automatically checked for errors according to the error-detection algorithm, and the error status is recorded in the corresponding register.
In a specific application of the present invention, when the ECC function is configured to be supported, the ECC logic unit in the corresponding memory interface controller is enabled. The ECC logic unit handles read and write operations separately. For write operations whose write-data width equals the storage width of the on-chip storage unit, and for read operations, enabling the ECC logic unit improves data fault tolerance without any loss of performance.
However, if the write-data width of a write operation is smaller than the storage width of the on-chip storage unit, the current write only needs to update part of the data at the specified address in the internal memory. In this case, to ensure the write does not affect the existing data, a read-modify-write operation must be performed: the original data is read out of the corresponding on-chip storage unit, the portion to be updated is modified, the ECC check code is computed over the entire updated data, and it is written back to the on-chip storage unit together with the modified data. Thus, read-modify-write is required only when the write-data width does not match the storage width; its only effect on upstream and downstream logic is a longer processing delay, without affecting normal functionality or requiring any modification.
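The two write paths and the read check can be modelled with a toy check code. A single XOR byte stands in for a real ECC code (the patent does not specify the algorithm), and the word width, class, and method names are illustrative assumptions; the point is that the code always covers the whole stored word, which is why a narrow write forces read-modify-write.

```python
# Minimal model of the ECC logic unit's behaviour: full-width writes
# compute the check code directly; narrower writes read the row first,
# merge the new bytes, and recompute the code over the whole word.

WORD_BYTES = 8  # assumed storage width of one on-chip storage unit row

def check_code(word):
    code = 0
    for b in word:
        code ^= b                 # toy stand-in for a real ECC code
    return code

class EccStorageRow:
    def __init__(self):
        self.data = bytes(WORD_BYTES)
        self.code = check_code(self.data)
        self.error = False        # mirrors the error-status register

    def write(self, offset, payload):
        if len(payload) == WORD_BYTES:       # full-width: code computed directly
            word = payload
        else:                                # partial: read-modify-write
            word = bytearray(self.data)      # read the original row out
            word[offset:offset + len(payload)] = payload
            word = bytes(word)               # merge the updated portion
        self.data, self.code = word, check_code(word)

    def read(self):
        self.error = check_code(self.data) != self.code
        return self.data

row = EccStorageRow()
row.write(0, bytes(range(8)))       # full-width write
row.write(2, b"\xff\xff")           # partial write -> read-modify-write
assert row.read() == bytes([0, 1, 0xFF, 0xFF, 4, 5, 6, 7])
assert row.error is False           # code matches the merged word
```

The extra read in the partial-write path is exactly the added latency the text mentions; full-width writes and reads pay no such cost.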
With reference to FIG. 3, in a specific example of the present invention, the internal memory is divided by contiguous addresses into four on-chip storage units, denoted on-chip storage unit 1, on-chip storage unit 2, on-chip storage unit 3, and on-chip storage unit 4. Correspondingly, the on-chip processor can configure the address segment of each on-chip storage unit; in this example, the address segments of the four on-chip storage units are, in order: 0x0000~0x3FFF, 0x4000~0x7FFF, 0x8000~0xBFFF, and 0xC000~0xFFFF.
The number of memory interface control units configured equals the number of on-chip storage units, and the two are connected in one-to-one correspondence; the four memory interface control units are denoted MEM Mux1, MEM Mux2, MEM Mux3, and MEM Mux4.
The number of request allocation units configured equals the number of memory interface control units, and the two are connected in one-to-one correspondence; the four request allocation units are denoted DMA Mux1, DMA Mux2, DMA Mux3, and DMA Mux4. It should be noted here that the address corresponding to the request information handled by each request allocation unit must match the address segment the on-chip processor has configured for each on-chip storage unit; at the same time, each of the above four request allocation units is connected to every unit in the dedicated function module. In this specific example, the dedicated function module is described as containing two data acquisition units and two data receiving units: data acquisition unit 1 and data acquisition unit 2, and data receiving unit 1 and data receiving unit 2.
During data interaction, the on-chip processor configures the address segment corresponding to each data acquisition unit and data receiving unit. When any data acquisition unit and/or data receiving unit receives request information, it sends the request information to the DMA controller; every request allocation unit (DMA Mux) in the DMA controller can receive the request information and determines, from the address it contains, which request allocation unit's address range the request matches. If the address matches a specific request allocation unit, that unit schedules the current request; if no match is found, the unit does not respond to the request. For example, when the address contained in the current request falls within the segment 0x0000~0x3FFF, the request is processed by DMA Mux1; otherwise, DMA Mux1 makes no response to it.
Further, if multiple requests need to be processed in DMA Mux1 at the same time, they are handled according to multi-request arbitration rules; in general, a request requiring urgent handling has higher priority. In addition, when multiple requests have the same priority level, weighted round-robin scheduling can be used, which is not further elaborated here. The arbitration rules are set with the ultimate goal of meeting practical application requirements while keeping the logic design simple.
Further, after the memory interface controller receives a data operation request from the DMA controller, it assigns the request by address to the corresponding memory interface control unit for processing; meanwhile, data operation requests from the on-chip processor are likewise routed by address to the corresponding memory interface control unit. If, while the data handled by DMA Mux1 is being sent to memory interface control unit MEM Mux1, the address contained in the current request from the on-chip processor is also within 0x0000~0x3FFF, that request is also received and processed by MEM Mux1; in this case, the same arbitration rule selects between the data from the on-chip processor and from the request allocation unit, and the requests are processed in turn. The finally selected request then operates on the data in on-chip storage unit 1: a write operation writes the data directly into on-chip storage unit 1, while a read operation reads out the data at the corresponding address of on-chip storage unit 1 and returns it to the on-chip processor or the DMA controller.
The present invention can be used for fast and efficient data interaction between an embedded on-chip processor and on-chip dedicated function units. Balancing performance against design complexity, the number of system-memory partitions, i.e., the number of on-chip storage units, and the number of request allocation units in the DMA controller can be chosen as needed. It should be noted that, provided performance requirements are met, the fewer these units, the simpler the logic implementation. Correspondingly, based on the implementation of the present invention, the on-chip processor is responsible only for a small amount of control information (such as DMA configuration and interrupts), while the actual data transmission and reception are carried out autonomously by on-chip logic; at the same time, thanks to the flexible partitioning of system memory and the arbitration of transmit/receive requests, highly efficient data processing can be achieved.
In summary, the method, apparatus, and storage medium for fast data communication with an embedded processor of the present invention provide concurrent data processing capability and increase data processing bandwidth by dividing the system's internal memory into multiple on-chip storage units. At the same time, data transmission and reception are handled by the DMA controller, with the on-chip processor participating only minimally, which reduces the on-chip processor's burden. Further, transmit/receive requests are selected through two-level arbitration, and the arbitration priority policy can be set flexibly, so relatively simple logic satisfies the overall system performance requirements. In addition, by providing a flexible and controllable ECC logic unit, the present invention enables upstream and downstream modules to support the ECC data protection function without modifying their logic.
The system embodiments described above are merely illustrative. Modules described as separate components may or may not be physically separated; components presented as modules are logical modules, which may reside in a single module of the chip logic or be distributed across multiple data-processing modules in the chip. Some or all of the modules can be selected according to actual needs to achieve the objectives of this embodiment. Those of ordinary skill in the art can understand and implement this without creative effort.
The present application can be used in many general-purpose or special-purpose chip designs, for example switch chips, router chips, server chips, and so on.
It should be understood that although this specification is described in terms of embodiments, not every embodiment contains only one independent technical solution; this manner of presentation is adopted merely for clarity. Those skilled in the art should treat the specification as a whole, and the technical solutions in the embodiments may also be appropriately combined to form other embodiments understandable to those skilled in the art.
The series of detailed descriptions set out above are merely specific explanations of feasible embodiments of the present invention and are not intended to limit its scope of protection; any equivalent embodiments or modifications made without departing from the technical spirit of the present invention shall fall within its scope of protection.

Claims (10)

  1. A method for fast data communication with an embedded processor, wherein the method comprises:
    dividing the internal memory into a plurality of on-chip storage units with contiguous, sequentially assigned addresses, wherein read/write operations on different on-chip storage units are mutually independent and can proceed simultaneously;
    configuring a memory interface controller connected to the internal memory, the memory interface controller comprising a plurality of memory interface control units, each memory interface control unit corresponding to at least one on-chip storage unit, and each on-chip storage unit uniquely corresponding to one memory interface control unit;
    configuring an on-chip processor and a DMA controller each connected to the memory interface controller, the DMA controller comprising a plurality of request allocation units arranged in one-to-one correspondence with the memory interface control units, wherein the address corresponding to the request information handled by each request allocation unit matches an address segment of the on-chip processor, and also matches the address of the on-chip storage unit corresponding to that unit's memory interface control unit;
    configuring a dedicated function module connected to the DMA controller, the dedicated function module comprising a plurality of data acquisition units and a plurality of data receiving units, the on-chip processor being used to configure the memory address range corresponding to each dedicated function unit, and each dedicated function unit being connected to every request allocation unit; and
    when the on-chip processor issues a read/write request to the internal memory, matching the corresponding on-chip storage unit through a memory interface control unit to read or write the data in the internal memory and returning the read data to the original requesting module, and/or, when the dedicated function module issues a read/write request to the internal memory through the DMA controller, interfacing with the memory interface controller through the DMA controller according to the requested address, matching the corresponding on-chip storage unit through the memory interface control unit to read or write the data in the internal memory, and returning the read data to the original requesting module.
  2. The method for fast data communication with an embedded processor according to claim 1, wherein when the dedicated function module issues a read/write request to the memory through the DMA controller, the method further comprises: parsing the request information to obtain the address it carries, and matching the parsed address to find the request allocation unit matching the request information.
  3. The method for fast data communication with an embedded processor according to claim 2, wherein when the dedicated function module issues a read/write request to the memory through the DMA controller, the method specifically comprises: parsing the request information to obtain the address it carries, querying each request allocation unit with the parsed address, and determining whether that address belongs to the address range scheduled by the current request allocation unit; if so, the current request allocation unit responds to the parsed request and, through the memory interface control unit connected to it, responds to the request information concretely on the corresponding on-chip storage unit.
  4. The method for fast data communication with an embedded processor according to claim 1, wherein the method further comprises: pre-configuring priority levels and/or processing weights for request information; when any interface control unit and/or any request allocation unit receives multiple pieces of request information simultaneously, processing each piece in order according to its priority level and/or processing weight, wherein the request information is processed in descending order of priority and/or scheduled cyclically according to the processing weights.
  5. The method for fast data communication with an embedded processor according to claim 1, wherein after the DMA controller confirms that the on-chip memory has completed the instruction carried by the request information, the method further comprises: updating the status flag information of a local register, generating an interrupt signal, and sending the interrupt signal to the on-chip processor.
  6. The method for fast data communication with an embedded processor according to claim 5, wherein completing the instruction carried by the request information comprises: completing the data operations of at least one descriptor.
  7. The method for fast data communication with an embedded processor according to claim 5, wherein the method further comprises: providing a timeout mechanism; when the DMA controller determines that sufficient data processing has not been completed within a predetermined time, the timeout mechanism triggers updating of the status flag information of the local register and generation of the interrupt signal.
  8. The method for fast data communication with an embedded processor according to claim 1, wherein the method further comprises: configuring an ECC logic unit at the entrance of each on-chip storage unit, one per unit;
    if the write-data width of a write operation performed by the ECC logic unit equals the storage width of the on-chip storage unit, computing the ECC check code directly according to the ECC algorithm and writing it into the corresponding on-chip storage unit together with the original write data;
    if the write-data width of the write operation is smaller than the storage width of the on-chip storage unit, reading the original data out of the corresponding on-chip storage unit, modifying the portion of the data that needs updating, computing the ECC check code over the entire updated data, and writing it back to the on-chip storage unit together with the modified data; and
    if a read operation is performed, automatically checking the read data for errors according to the error-detection algorithm and recording the error status in the corresponding register.
  9. An apparatus for fast data communication with an embedded processor, wherein the apparatus comprises:
    an internal memory, divided into a plurality of on-chip storage units with contiguous, sequentially assigned addresses, wherein read/write operations on different on-chip storage units are mutually independent and can proceed simultaneously;
    a memory interface controller connected to the internal memory, the memory interface controller comprising a plurality of memory interface control units, each memory interface control unit corresponding to at least one on-chip storage unit, and each on-chip storage unit uniquely corresponding to one memory interface control unit;
    an on-chip processor and a DMA controller each connected to the memory interface controller, the DMA controller comprising a plurality of request allocation units arranged in one-to-one correspondence with the memory interface control units, wherein the address corresponding to the request information handled by each request allocation unit matches an address segment of the on-chip processor, and also matches the address of the on-chip storage unit corresponding to that unit's memory interface control unit;
    a dedicated function module connected to the DMA controller, the dedicated function module comprising a plurality of data acquisition units and a plurality of data receiving units, the on-chip processor being used to configure the memory address range corresponding to each dedicated function unit, and each dedicated function unit being connected to every request allocation unit;
    wherein, when the on-chip processor issues a read/write request to the internal memory, the corresponding on-chip storage unit is matched through a memory interface control unit to read or write the data in the internal memory and the read data is returned to the original requesting module, and/or, when the dedicated function module issues a read/write request to the internal memory through the DMA controller, the DMA controller interfaces with the memory interface controller according to the requested address, the corresponding on-chip storage unit is matched through the memory interface control unit to read or write the data in the internal memory, and the read data is returned to the original requesting module.
  10. A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the method for fast data communication with an embedded processor according to any one of claims 1-8.
PCT/CN2020/122890 2019-10-23 2020-10-22 内嵌处理器进行快速数据通信的方法、装置及存储介质 WO2021078197A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US17/771,507 US12013802B2 (en) 2019-10-23 2020-10-22 Method and apparatus for embedded processor to perform fast data communication, and storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201911009338.2 2019-10-23
CN201911009338.2A CN110737618B (zh) 2019-10-23 2019-10-23 内嵌处理器进行快速数据通信的方法、装置及存储介质

Publications (1)

Publication Number Publication Date
WO2021078197A1 true WO2021078197A1 (zh) 2021-04-29

Family

ID=69270954

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/122890 WO2021078197A1 (zh) 2019-10-23 2020-10-22 内嵌处理器进行快速数据通信的方法、装置及存储介质

Country Status (3)

Country Link
US (1) US12013802B2 (zh)
CN (1) CN110737618B (zh)
WO (1) WO2021078197A1 (zh)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110737618B (zh) * 2019-10-23 2021-03-16 盛科网络(苏州)有限公司 内嵌处理器进行快速数据通信的方法、装置及存储介质
CN114037506A (zh) * 2021-10-21 2022-02-11 深圳市道旅旅游科技股份有限公司 工单分配方法、装置、设备和介质
CN114416614A (zh) * 2022-01-19 2022-04-29 安徽芯纪元科技有限公司 一种用于保护现场和恢复现场的中断处理模块

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101556565A (zh) * 2009-01-22 2009-10-14 杭州中天微系统有限公司 嵌入式处理器的片上高性能dma
WO2018120010A1 (en) * 2016-12-30 2018-07-05 Intel Corporation Memory sharing for application offload from host processor to integrated sensor hub
CN109308283A (zh) * 2018-08-31 2019-02-05 西安微电子技术研究所 一种SoC片上系统及其外设总线切换方法
CN110737618A (zh) * 2019-10-23 2020-01-31 盛科网络(苏州)有限公司 内嵌处理器进行快速数据通信的方法、装置及存储介质

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0293851B1 (en) * 1987-06-05 1994-10-19 Mitsubishi Denki Kabushiki Kaisha Digital signal processor
CN101558396B (zh) * 2006-12-15 2011-12-14 密克罗奇普技术公司 直接存储器存取控制器
US7877524B1 (en) * 2007-11-23 2011-01-25 Pmc-Sierra Us, Inc. Logical address direct memory access with multiple concurrent physical ports and internal switching
US9477597B2 (en) * 2011-03-25 2016-10-25 Nvidia Corporation Techniques for different memory depths on different partitions
CN104820657A (zh) * 2015-05-14 2015-08-05 西安电子科技大学 一种基于嵌入式异构多核处理器上的核间通信方法及并行编程模型
US11301295B1 (en) * 2019-05-23 2022-04-12 Xilinx, Inc. Implementing an application specified as a data flow graph in an array of data processing engines

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101556565A (zh) * 2009-01-22 2009-10-14 杭州中天微系统有限公司 嵌入式处理器的片上高性能dma
WO2018120010A1 (en) * 2016-12-30 2018-07-05 Intel Corporation Memory sharing for application offload from host processor to integrated sensor hub
CN109308283A (zh) * 2018-08-31 2019-02-05 西安微电子技术研究所 一种SoC片上系统及其外设总线切换方法
CN110737618A (zh) * 2019-10-23 2020-01-31 盛科网络(苏州)有限公司 内嵌处理器进行快速数据通信的方法、装置及存储介质

Also Published As

Publication number Publication date
US20220365893A1 (en) 2022-11-17
CN110737618B (zh) 2021-03-16
CN110737618A (zh) 2020-01-31
US12013802B2 (en) 2024-06-18

Similar Documents

Publication Publication Date Title
WO2021078197A1 (zh) 内嵌处理器进行快速数据通信的方法、装置及存储介质
US11481346B2 (en) Method and apparatus for implementing data transmission, electronic device, and computer-readable storage medium
KR101861312B1 (ko) 다중슬롯 링크 계층 플릿에서의 제어 메시징
US20230015404A1 (en) Memory system and data processing system including the same
CN103049406B (zh) 用于i/o流量的一致性开关
US20090307408A1 (en) Peer-to-Peer Embedded System Communication Method and Apparatus
US8850081B2 (en) Method, system and apparatus for handling events for partitions in a socket with sub-socket partitioning
US5682551A (en) System for checking the acceptance of I/O request to an interface using software visible instruction which provides a status signal and performs operations in response thereto
WO2019233322A1 (zh) 资源池的管理方法、装置、资源池控制单元和通信设备
JP2001142842A (ja) Dmaハンドシェークプロトコル
CN101102305A (zh) 管理网络信息处理的系统和方法
JPH1097513A (ja) マルチプロセッサ・コンピュータ・システム中のノード、及びマルチプロセッサ・コンピュータ・システム
EP3062232B1 (en) Method and device for automatically exchanging signals between embedded multi-cpu boards
US20150067695A1 (en) Information processing system and graph processing method
JP2001051959A (ja) 少なくとも1つのnuma(non−uniformmemoryaccess)データ処理システムとして構成可能な相互接続された処理ノード
JP2002342299A (ja) クラスタシステム、コンピュータ及びプログラム
CN1740997A (zh) 网络设备及其外围器件互连资源的分配方法
US20040044877A1 (en) Computer node to mesh interface for highly scalable parallel processing system
JPH10187631A (ja) 拡張された対称マルチプロセッサ・アーキテクチャ
US10445267B2 (en) Direct memory access (DMA) unit with address alignment
US20040215861A1 (en) Method of allowing multiple, hardware embedded configurations to be recognized by an operating system
CN117033275A (zh) 加速卡间的dma方法、装置、加速卡、加速平台及介质
CN115328832B (zh) 一种基于pcie dma的数据调度系统与方法
US11861403B2 (en) Method and system for accelerator thread management
CN114237717A (zh) 一种多核异构处理器片上暂存动态调度管理器

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20879829

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20879829

Country of ref document: EP

Kind code of ref document: A1