CN114265791A - Data scheduling method, chip and electronic equipment

Info

Publication number: CN114265791A
Application number: CN202111440615.2A
Authority: CN (China)
Original language: Chinese (zh)
Prior art keywords: data, page, controller, size, stored
Inventors: 李树青, 王江, 孙华锦
Current/Original Assignee: Shandong Yunhai Guochuang Cloud Computing Equipment Industry Innovation Center Co Ltd
Application filed by Shandong Yunhai Guochuang Cloud Computing Equipment Industry Innovation Center Co Ltd
Priority to CN202111440615.2A (priority date: 2021-11-30)
Legal status: Pending
Classification (Landscapes): Information Retrieval, DB Structures and FS Structures Therefor

Abstract

The invention discloses a data scheduling method comprising the following steps: acquiring data to be scheduled with a first controller; in response to the destination address of the data to be scheduled falling within the virtual address range provided by a second controller, sequentially acquiring the data of each data page of the data to be scheduled and determining the offset of the first data block in each data page according to the PRP address of each data page; sending a data processing request to the first controller with the second controller, so that the first controller sends data of a corresponding size from the current data page to the second controller according to the size of the data to be processed corresponding to the data processing request; having the second controller determine the page sequence number of the data page to be stored and the intra-page offset within that data page according to the virtual address range corresponding to the destination address of the data of the corresponding size and the offset of the first data block in the current data page; and writing the data of the corresponding size to the corresponding position in the data page to be stored according to the page sequence number and the intra-page offset.

Description

Data scheduling method, chip and electronic equipment
Technical Field
The invention relates to the field of data processing, in particular to a data scheduling method, a chip and electronic equipment.
Background
The explosion of applications such as AI and cloud computing places higher demands on data processing: servers need larger data storage capacity, higher storage bandwidth, and stronger data processing performance. CPU performance growth can no longer keep pace with the growth of storage bandwidth and network bandwidth, so the CPU becomes a bottleneck. A storage acceleration device is hardware with a dedicated processing chip that offloads storage-related data processing from the CPU, such as data movement, data verification, storage protocol handling, and RAID calculation. It lets the server system break through the storage bottleneck caused by the CPU, and at the same time replaces relatively expensive general-purpose CPU computing resources with relatively cheap storage-specific computing resources, reducing server cost.
Storage acceleration devices fall into several types according to the application scenario, and RAID acceleration is one of the most widespread. A RAID accelerator maps the data read/write requests issued by a host onto several corresponding disks and is responsible for computing redundancy check data to protect the data. For example, a typical RAID 5 system may have 4 disks, of which 3 disks' worth of space stores data and 1 disk's worth stores parity (space only; the actual data and redundancy checks are spread across all disks). The upper-layer applications in the host typically see one large overall space made up of the 3 disks' worth of data space. A disk read/write command issued by the host (hereinafter, a primary IO) targets this whole space; the accelerator card parses the IO command, maps the data to the physical disks according to the host's read/write position, moves the host-side data to the accelerator card, computes the redundancy check data, and finally reorganizes everything into IO commands for the disks and issues them.
Other types of storage acceleration follow a similar process. The core steps of an accelerator card can be abstracted as: parsing host commands and moving data, operating on the data, and mapping the data to the disks' physical space and reorganizing the commands. The specific work done in these three steps differs across application scenarios, but the essence is the same. For storage compression acceleration, for example, the only differences from RAID are that the operation on the data is compression rather than computing redundancy check bits, and that the mapping of compressed data to disk may differ from that of the original data.
Many factors affect the performance of a storage acceleration chip, but data scheduling is generally one of the key ones. Because data is moved among the host, the accelerator card, and the disks, and is staged during calculation and verification, it must be written to and read from the accelerator card's DRAM many times. In a high-performance storage system, the PCIe interfaces widely used between the host, the board, and the disks have high data bandwidth, which places high demands on DRAM bandwidth, so the DRAM easily becomes the system bottleneck.
Thanks to its high performance, the NVMe protocol is ever more widely used in storage systems, which in turn places higher demands on the acceleration device's data scheduling. Compared with traditional storage protocols such as SAS/SATA, NVMe has several key characteristics: on one hand it supports more IO queues and greater queue depth; on the other hand the NVMe device acts as the initiator of the DMA process, i.e., the device actively fetches data from host memory rather than the host pushing data to the device.
Having the device initiate DMA imposes a new requirement on an NVMe storage system: data must be organized according to the NVMe protocol standard before the device can recognize it. In a traditional SAS/SATA system, data inside the accelerator card can be organized in any form, because the card, as the transfer initiator, can reorganize the data while packing and sending it. In an NVMe system, however, the disk fetches data directly from the accelerator card, and the card cannot perform extra operations during the transfer.
The NVMe protocol defines two data organization formats, PRP and SGL. PRP is the more widely adopted of the two, and the protocol makes support for it mandatory.
As shown in fig. 1, a PRP consists of two addresses, PRP1 and PRP2. The PRP1 address points to a memory page storing data and contains two parts: the low-order bits record the starting address of the valid data within the page, also called the valid-data offset (how many bits depends on the page size; for a 4096-byte data page they are the lower 12 bits), and the high-order bits record the page base address of the data page. PRP2 then behaves as follows (a decoding sketch is given after the cases):
if the total data is stored in 1 data page, PRP2 is invalid;
if the total data is stored in 2 data pages, PRP2 points to the second data page, but in this case the offset of PRP2 must be 0;
if the total data is stored in more than 2 data pages, PRP2 points to a memory page storing a PRP list, and the offset of PRP2 may be non-zero, i.e., the valid entries of the PRP list need not start at address 0 of that memory page. The PRP list consists of a number of PRP addresses, each of which points to a valid data page but must have an offset of 0.
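To make the address layout above concrete, the following C sketch splits a PRP entry into its two fields; the 4096-byte page size and the helper names are illustrative assumptions, not taken from the patent.

```c
#include <stdint.h>

#define PAGE_SHIFT 12u                      /* 4096-byte data page (assumed) */
#define PAGE_SIZE  (1ull << PAGE_SHIFT)

/* Low-order bits: offset of the valid data within the page. */
static inline uint64_t prp_offset(uint64_t prp) {
    return prp & (PAGE_SIZE - 1);
}

/* High-order bits: page base address of the data page. */
static inline uint64_t prp_page_base(uint64_t prp) {
    return prp & ~(PAGE_SIZE - 1);
}
```

Under the rules above, prp_offset(PRP1) may be non-zero, whereas every entry of a PRP list must satisfy prp_offset(entry) == 0.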
As shown in fig. 2, a typical storage system consists of many disks which, managed by the accelerator card, are usually presented to the host as a single unified storage device. In a large-block IO operation at the host, the data must therefore be mapped onto multiple disks, and because the data sent by the host carries an offset, each mapped data block carries an offset as well. Since the PRP format does not allow 2 or more data blocks to use the offset field, the operation can only be split into multiple IO operations on each disk. The splitting increases the IO count and shrinks the data volume per IO, inevitably putting heavy IO pressure on the disks and thus reducing read/write efficiency.
There are also several traditional workarounds. One is to place the data belonging to the same disk contiguously at the local end while the host data is moved by DMA, which avoids the IO-splitting problem; but it means the DMA operation for one data block must itself be split into 2 transfers, and when the offset is small, one of those transfers carries only a small payload, which greatly reduces bus bandwidth utilization.
The other is to rearrange the data once in local memory, either placing the data belonging to the same disk contiguously or eliminating the offset; but either way the DRAM incurs one extra read and one extra write, putting enormous pressure on DRAM bandwidth.
Disclosure of Invention
In view of the above, in order to overcome at least one aspect of the above problems, an embodiment of the present invention provides a data scheduling method, including the following steps:
acquiring data to be scheduled by using a first controller;
in response to the destination address of the data to be scheduled falling within the virtual address range provided by the second controller, sequentially obtaining the data of each data page of the data to be scheduled, and determining the offset of the first data block in each data page according to the PRP address of each data page;
sending a data processing request to the first controller by using the second controller, so that the first controller sends data with a corresponding size in a current data page to the second controller according to the size of the data to be processed corresponding to the data processing request;
the second controller determines the page sequence number of the data page to be stored and the intra-page offset within that data page according to the virtual address range corresponding to the destination address of the data of the corresponding size and the offset of the first data block in the current data page;
and writing the data with the corresponding size into the corresponding position in the data page to be stored according to the page sequence number and the intra-page offset.
In some embodiments, further comprising:
initializing a preset area of a memory, wherein the initialized preset area comprises a plurality of data pages to be stored;
and generating resource numbers according to the address of each data page to be stored and putting each resource number into a resource queue.
In some embodiments, further comprising:
determining the number of required data pages to be stored according to the size of the data to be scheduled;
acquiring a corresponding number of resource numbers from the resource queue and converting them into addresses to form a PRP address linked list, and for each PRP address linked list:
feeding back the PRP address linked list to the second controller;
and in response to the data on a plurality of data pages to be stored in the PRP address linked list being written into a disk, adding the resource numbers of the plurality of data pages to be stored into the resource queue again.
In some embodiments, the second controller determines a page sequence number of a data page to be stored and an intra-page offset in the data page to be stored according to a virtual address range corresponding to a destination address of the data of the corresponding size and an offset of a first data block in the current data page, further including:
determining the page sequence number of the corresponding data page to be stored in the PRP address linked list according to the virtual address range corresponding to the destination address of the data with the corresponding size;
and determining whether the data with the corresponding size belongs to the first data block according to the offset of the first data block in the current data page and the virtual address range corresponding to the destination address of the data with the corresponding size.
In some embodiments, writing the data of the respective size to a corresponding location in the page of data to be stored according to the page sequence number and the intra-page offset, further comprises:
in response to the data belonging to the first data block, writing the data of the corresponding size into the corresponding position of the data page to be stored corresponding to the page sequence number, according to the intra-page offset;
and in response to the data not belonging to the first data block, writing the data of the corresponding size into the corresponding position of the data page to be stored corresponding to the page sequence number immediately preceding or following that page sequence number, according to the intra-page offset.
In some embodiments, writing the data of the respective size to a corresponding location in the page of data to be stored according to the page sequence number and the intra-page offset, further comprises:
setting a cache smaller than the size corresponding to the virtual address in the second controller;
and storing the data with the corresponding size into a cache and then writing the data into the corresponding position in the data page to be stored.
In some embodiments, sending, by the second controller, a data processing request to the first controller, so that the first controller sends, to the second controller, data of a corresponding size in a current data page according to a size of data to be processed corresponding to the data processing request, further includes:
respectively initializing a first counter and a second counter in the first controller and the second controller;
setting an initial value of the second counter according to the size of a buffer of a second controller, and setting an initial value of the first counter to 0;
and in response to the second controller sending a data processing request to the first controller, subtracting the size of data to be requested to be processed in the data processing request from the second counter, and adding the size of the data to be requested to be processed in the data processing request to the first counter in the first controller, wherein the size of the data to be requested to be processed in the data processing request is not larger than the size of the corresponding cache space.
In some embodiments, further comprising:
decreasing the first counter by a corresponding size in response to the first controller processing part of the data in the data processing request;
and increasing the second counter by a corresponding size in response to the second controller writing the partial data to the corresponding position of the data page to be stored.
Based on the same inventive concept, according to another aspect of the present invention, an embodiment of the present invention further provides a chip including a digital logic circuit, where the digital logic circuit is operative to implement the steps of the data scheduling method according to any one of the embodiments.
Based on the same inventive concept, according to another aspect of the present invention, an embodiment of the present invention further provides an electronic device, including the chip described above.
The invention has the following beneficial technical effects: the scheme provided by the invention receives data using a virtual space and virtual addresses and extracts the corresponding information to complete the remapping from virtual addresses to real addresses. This avoids the IO amplification effect and improves disk performance utilization, so better system performance can be achieved under the same conditions. It reduces the number of IOs on the bus and avoids the drop in bus utilization caused by small data payloads, so the bus requirements, and hence the cost, can be lowered. It also reduces the number of data copies in the local DRAM, lowering the DRAM performance requirement and the cost.
Drawings
To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and those skilled in the art can derive other embodiments from them without creative effort.
FIG. 1 is a diagram illustrating typical PRP data organization in the prior art;
FIG. 2 is a schematic diagram of a data block with an offset mapped to different disks;
fig. 3 is a schematic flowchart of a data scheduling method according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of an overall circuit provided by an embodiment of the present invention;
FIG. 5 is a diagram illustrating a DRAM page management module according to an embodiment of the present invention;
FIG. 6 is a functional diagram of a data mapper according to an embodiment of the present invention;
fig. 7 is a schematic structural diagram of a data mapper according to an embodiment of the present invention;
FIG. 8 is a schematic diagram of disk PRP list regeneration according to an embodiment of the present invention;
FIG. 9 is a schematic structural diagram of a chip according to an embodiment of the present invention;
fig. 10 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the following embodiments of the present invention are described in further detail with reference to the accompanying drawings.
It should be noted that the terms "first" and "second" in the embodiments of the present invention are used to distinguish two entities or parameters that share the same name; "first" and "second" are merely for convenience of description and should not be construed as limiting the embodiments, and this will not be restated in the following embodiments.
In the embodiments of the invention, DMA stands for Direct Memory Access.
according to an aspect of the present invention, an embodiment of the present invention provides a data scheduling method, as shown in fig. 3, which may include the steps of:
s1, acquiring data to be scheduled by using the first controller;
s2, responding to the destination address of the data to be scheduled in the virtual address provided by the second controller, sequentially acquiring the data of each data page in the data to be scheduled and determining the offset in the first data block in each data page according to the PRP address of each data page;
s3, sending a data processing request to the first controller by using the second controller, so that the first controller sends data with corresponding size in the current data page to the second controller according to the size of the data to be processed corresponding to the data processing request;
s4, the second controller determines the page sequence number of the data page to be stored and the offset in the data page to be stored according to the virtual address range corresponding to the destination address of the data with corresponding size and the offset of the first data block in the current data page;
and S5, writing the data with the corresponding size into the corresponding position in the data page to be stored according to the page sequence number and the intra-page offset.
The scheme provided by the invention receives data using a virtual space and virtual addresses and extracts the corresponding information to complete the remapping from virtual addresses to real addresses. This avoids the IO amplification effect and improves disk performance utilization, so better system performance can be achieved under the same conditions. It reduces the number of IOs on the bus and avoids the drop in bus utilization caused by small data payloads, so the bus requirements, and hence the cost, can be lowered. It also reduces the number of data copies in the local DRAM, lowering the DRAM performance requirement and the cost.
In some embodiments, the circuit for implementing data scheduling shown in fig. 4 comprises several modules connected to one another through an interconnect bus: a PCIe bus controller, a DRAM controller, a DMA controller (the first controller), a DRAM page manager, an accelerated operation unit, a data mapper (the second controller), an array manager, and a disk controller.
The PCIe controller implements the PCIe interface to the host, translating between the chip's on-chip bus reads/writes and reads/writes in the host's PCIe bus domain, and making the system-on-chip an endpoint device on the host PCIe bus. The DMA controller implements data transfer between the host and the system-on-chip: it issues read/write requests to the host through the PCIe controller, which either reads system-on-chip data and sends it to the host or fetches data from the host and writes it to a specified on-chip address. The DMA controller is also generally responsible for parsing the host-side PRP linked list. The DRAM controller drives the DRAM devices attached to the chip and provides the corresponding bus read/write interface to the system-on-chip. The array manager manages the disk array formed by multiple disks; it is mainly responsible for splitting the host's read/write IO, decomposing and mapping it into read/write IO for the individual disks, and for reorganizing host data to match each disk's data organization. It can be implemented with hardware circuits or with a CPU and software. The disk controller controls the external disks, chiefly giving the system-on-chip bus read/write access to the disks; it too can generally be implemented with a hardware circuit or a CPU.
Depending on the chip's usage scenarios there may be multiple acceleration operation units, each performing one or more operations, such as the CRC operation for data checking or the XOR operation for the disk array's redundancy check data; after each operation the unit may write the data back to DRAM.
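As one concrete instance of such an operation, the sketch below computes RAID-style redundancy check data by XOR-ing data pages byte by byte; the function name and buffer layout are illustrative assumptions rather than details from the patent.

```c
#include <stdint.h>
#include <stddef.h>

/* XOR operation for the disk array's redundancy check data: parity byte i
   is the XOR of byte i of every data page (a minimal sketch). */
void xor_parity(const uint8_t *const *data_pages, size_t n_pages,
                size_t page_size, uint8_t *parity_out) {
    for (size_t i = 0; i < page_size; i++) {
        uint8_t p = 0;
        for (size_t d = 0; d < n_pages; d++)
            p ^= data_pages[d][i];
        parity_out[i] = p;          /* the unit would write this back to DRAM */
    }
}
```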
The modules above are typical of storage acceleration devices; in a practical implementation the functional boundaries may differ from the description, but the functions generally correspond to one or more of these modules. On top of these general modules, this embodiment adds a DRAM page management module and a data mapper. The DRAM page management module provides the data pages to be stored. The data mapper turns host-side data storage with an offset into local data storage without an offset, simplifying the handling of the data page linked list during multi-disk operations.
In some embodiments, further comprising:
initializing a preset area of a memory, wherein the initialized preset area comprises a plurality of data pages to be stored;
and generating resource numbers according to the address of each data page to be stored and putting each resource number into a resource queue.
In some embodiments, further comprising:
determining the number of required data pages to be stored according to the size of the data to be scheduled;
acquiring a corresponding number of resource numbers from the resource queue and converting them into addresses to form a PRP address linked list, and for each PRP address linked list:
feeding back the PRP address linked list to the second controller;
and in response to the data on a plurality of data pages to be stored in the PRP address linked list being written into a disk, adding the resource numbers of the plurality of data pages to be stored into the resource queue again.
Specifically, as shown in fig. 5, the DRAM page management module manages the DRAM space. First, a memory region for storing data is statically allocated in the DRAM and managed by this module. The module then divides the region into pages: each page is contiguous memory, the pages are adjacent to one another, and the page size equals the data-exchange page size between the host and the storage acceleration device. That is, if the host constructs a PRP list in which each PRP address points to a 4KB data page, the page size of the DRAM page management module is also 4KB.
The DRAM page management module maintains a resource queue whose depth equals the number of pages managed. For example, if the module manages 4GB of DRAM, there are 4GB/4KB = 1M pages in total, so the queue depth is 1M. Depending on the queue's size and cost requirements, the queue elements can be stored in a DRAM region that does not hold data, with only the head and tail pointers kept in the management module, or the elements can be stored in a FIFO inside the module. Each element in the resource queue is a number (hereinafter, a resource number) representing a free data page in DRAM that can be allocated; the number can simply be the upper bits of the page's address, or be derived by some other simple operation.
The DRAM page management module also contains a linked-list generation module, which calculates the number of pages required from the size of the data requested for allocation, fetches that many resource numbers from the resource queue, converts them into DRAM page addresses, and organizes the addresses into a linked list in the PRP format of the NVMe protocol.
When a piece of data has been written to disk, the corresponding DRAM region can be released. That region is likewise described by a PRP linked list; a parser inside the DRAM page management module parses the list and extracts the data pages to be reclaimed and their corresponding resource numbers, and a page resource recovery module puts the resource numbers back into the resource queue.
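The following C sketch mirrors the allocate/recycle cycle just described under some explicit assumptions: 4KB pages, a 1M-entry queue, and a resource number that is simply the page index (the upper bits of the page's offset within the region); all names are illustrative, not the patent's.

```c
#include <stdint.h>
#include <stddef.h>

#define PAGE_SHIFT 12u                /* 4KB pages, matching the host page size */
#define NUM_PAGES  (1u << 20)         /* e.g. a 4GB managed region => 1M pages */

static uint64_t dram_base;            /* start of the statically allocated region */
static uint32_t queue[NUM_PAGES];     /* resource numbers of free pages */
static uint32_t head, tail, count;

/* Initialization: every page starts out free, so every resource number is queued. */
void page_mgr_init(uint64_t base) {
    dram_base = base;
    for (uint32_t n = 0; n < NUM_PAGES; n++) queue[n] = n;
    head = 0; tail = 0; count = NUM_PAGES;
}

/* Resource number <-> page address: the number is just the page index. */
static uint64_t num_to_addr(uint32_t n) {
    return dram_base + ((uint64_t)n << PAGE_SHIFT);
}

/* Allocate enough pages for `bytes` of data and emit their base addresses
   as a flat PRP-style list (every entry has offset 0 by construction). */
int alloc_prp_list(size_t bytes, uint64_t *prp_out) {
    size_t pages = (bytes + (1u << PAGE_SHIFT) - 1) >> PAGE_SHIFT;
    if (count < pages) return -1;              /* not enough free pages */
    for (size_t i = 0; i < pages; i++) {
        prp_out[i] = num_to_addr(queue[head]);
        head = (head + 1) % NUM_PAGES;
        count--;
    }
    return (int)pages;
}

/* Recycle pages once their data has been written to disk. */
void free_prp_list(const uint64_t *prp, size_t pages) {
    for (size_t i = 0; i < pages; i++) {
        queue[tail] = (uint32_t)((prp[i] - dram_base) >> PAGE_SHIFT);
        tail = (tail + 1) % NUM_PAGES;
        count++;
    }
}
```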
In some embodiments, the second controller determines a page sequence number of a data page to be stored and an intra-page offset in the data page to be stored according to a virtual address range corresponding to a destination address of the data of the corresponding size and an offset of a first data block in the current data page, further including:
determining the page sequence number of the corresponding data page to be stored in the PRP address linked list according to the virtual address range corresponding to the destination address of the data with the corresponding size;
and determining whether the data with the corresponding size belongs to the first data block according to the offset of the first data block in the current data page and the virtual address range corresponding to the destination address of the data with the corresponding size.
In some embodiments, writing the data of the respective size to a corresponding location in the page of data to be stored according to the page sequence number and the intra-page offset, further comprises:
in response to the data belonging to the first data block, writing the data of the corresponding size into the corresponding position of the data page to be stored corresponding to the page sequence number, according to the intra-page offset;
and in response to the data not belonging to the first data block, writing the data of the corresponding size into the corresponding position of the data page to be stored corresponding to the page sequence number immediately preceding or following that page sequence number, according to the intra-page offset.
Specifically, as shown in fig. 6, when the host data storage carries an offset, one data block is scattered across two data pages, and one data page contains data of two data blocks. If host pages were moved directly by ordinary DMA, each host page would have to be moved to two different local pages, requiring 2 DMA operations, one of which is only the size of the offset; when the offset is small, this greatly reduces PCIe bus bandwidth utilization.
The core function of the data mapping module is to provide a contiguous "virtual space" that receives the host-side data pages as a whole and then distributes them across several local data pages. Each host data page thus needs only one DMA operation, which greatly improves DMA efficiency.
The functional structure of the data mapper is shown in fig. 7. An internal task information module stores the task parameters, at whose core are the host PRP and the local PRP, plus other parameters such as the page size. The PRP parser parses the host and local PRPs respectively to obtain the data-page mapping relationship.
One of the cores of the data mapper is a data path consisting of three modules, namely an address decoding module, a data caching module and a data distribution module.
Unlike ordinary modules, the data mapper provides a "virtual address" range much larger than its own cache for receiving DMA data. For example, one host IO may need to process 4MB of data at a time while the data buffer inside the data mapper is only 4KB; the data mapper nevertheless provides a 4MB virtual address range, and data written into that range is received by the mapper.
The address decoder decodes the virtual address, deriving from it the page sequence number of the current data, the data's intra-page offset, and so on, and temporarily stores the data at a designated position in the data cache. The designated position is typically determined from the data's intra-page offset and can be used to reorder PCIe packets that may arrive out of order.
The data distribution module uses the page sequence number and the mapping relationship parsed from the PRPs to find the local real page address corresponding to the current data, then writes the data to the corresponding DRAM position according to its intra-page offset.
For example, suppose the current data page is data page 3 in fig. 6; for data page 3, the first data block is data block 3. The offset of the first data block within data page 3 can be determined from the host-side PRP, and the size of every data page is a fixed value, e.g., 4KB, so the virtual addresses corresponding to data page 3 span the 8-12KB range and correspond to to-be-stored data page 3. Thus, after the second controller obtains the data of the corresponding size in data page 3, it can determine the page sequence number of the corresponding to-be-stored data page from the virtual address the data is to be written to, then determine the data's intra-page offset from the offset of the first data block and the destination virtual address, and thereby judge whether the data belongs to data block 3. If the data belongs to data block 3, the data of the corresponding size is written to to-be-stored data page 3; if it does not, the data belongs to data block 2 and is written to to-be-stored data page 2. In this way, each to-be-stored data page in the DRAM holds exactly one data block, so the data carries no offset when stored locally.
It should be noted that to-be-stored data page 2 and to-be-stored data page 3 are arranged sequentially according to their order in the PRP linked list provided by the DRAM page management module: the to-be-stored data page corresponding to the first PRP address in the list is to-be-stored data page 1, the second is to-be-stored data page 2, and so on.
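To make the decode-and-steer step concrete, here is a C sketch of how an address decoder plus distributor might turn a destination virtual address into a real DRAM address. The assumptions are stated in the comments (4KB pages, page sequence numbers counted from 0, one block boundary per host page, and block size equal to page size); it illustrates the mechanism, not the patent's actual circuit.

```c
#include <stdint.h>

#define PAGE_SHIFT 12u
#define PAGE_SIZE  (1u << PAGE_SHIFT)            /* 4KB pages (assumed) */

/* local_prp[i] is the base address of to-be-stored data page i (offset 0),
   in the order of the PRP linked list from the page management module. */
uint64_t remap(uint64_t vaddr,                   /* destination virtual address  */
               uint32_t first_blk_off,           /* offset of the first data block
                                                    in the current host page     */
               const uint64_t *local_prp)
{
    uint32_t page_seq = (uint32_t)(vaddr >> PAGE_SHIFT);     /* page sequence no. */
    uint32_t in_off   = (uint32_t)(vaddr & (PAGE_SIZE - 1)); /* intra-page offset */

    if (in_off >= first_blk_off) {
        /* Belongs to the first data block of this page: store it from offset 0
           of the matching local page, which eliminates the offset. */
        return local_prp[page_seq] + (in_off - first_blk_off);
    } else {
        /* Tail of the previous data block: steer it to the neighbouring local
           page, appended after that block's earlier bytes. (For the first host
           page, valid data starts at the PRP1 offset, so this branch is never
           taken with page_seq == 0.) */
        return local_prp[page_seq - 1] + (in_off + PAGE_SIZE - first_blk_off);
    }
}
```

With this steering, every local page receives exactly one data block beginning at offset 0, matching the worked example for data pages 2 and 3 above.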
In some embodiments, writing the data of the respective size to a corresponding location in the page of data to be stored according to the page sequence number and the intra-page offset, further comprises:
setting a cache smaller than the size corresponding to the virtual address in the second controller;
and storing the data with the corresponding size into a cache and then writing the data into the corresponding position in the data page to be stored.
Specifically, the data cache serves two main purposes. First, when the data distribution module cannot write data to the DRAM in time, for instance because other modules are accessing the DRAM or a DRAM refresh or precharge is in progress, the data is staged to avoid overflow. Second, it rearranges out-of-order data for the convenience of subsequent processing, for example when the next stage is another operation module rather than the DRAM.
In some embodiments, sending, by the second controller, a data processing request to the first controller, so that the first controller sends, to the second controller, data of a corresponding size in a current data page according to a size of data to be processed corresponding to the data processing request, further includes:
respectively initializing a first counter and a second counter in the first controller and the second controller;
setting an initial value of the second counter according to the size of a buffer of a second controller, and setting an initial value of the first counter to 0;
and in response to the second controller sending a data processing request to the first controller, subtracting the size of data to be requested to be processed in the data processing request from the second counter, and adding the size of the data to be requested to be processed in the data processing request to the first counter in the first controller, wherein the size of the data to be requested to be processed in the data processing request is not larger than the size of the corresponding cache space.
In some embodiments, further comprising:
decreasing the first counter by a corresponding size in response to the first controller processing part of the data in the data processing request;
and increasing the second counter by a corresponding size in response to the second controller writing the partial data to the corresponding position of the data page to be stored.
In particular, flow control is another core module of the data mapper; it prevents overflow caused by issuing too many data requests. Because the virtual address range is larger than the data buffer, if the DMA controller issues too many data requests and the mapper cannot consume them in time, data will overflow, especially when the stage after the data mapper is another operation module. The flow-control module sends a flow-control message to the DMA controller requesting that the DMA module send data of a specified size; after receiving the request, the DMA module may only send data not exceeding that size. Concretely, a counter inside the module records the data requests it has issued that have not yet been consumed: the counter starts at 0, is increased each time a number of data requests are sent to the DMA, and is decreased by 1 each time a datum is passed on to the subsequent module, and the count value must be kept from exceeding the buffer size. A similar counter is implemented in the DMA controller: it is increased by the corresponding value when a data request from the data mapper arrives and decreased by that value after the request is issued to the PCIe controller, and when the count value is 0 the DMA controller stops issuing requests to the PCIe controller.
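A minimal sketch of the two counters follows, using the credit form from the embodiment above (the second counter starts at the buffer size, the first at 0); the buffer size and function names are assumptions for illustration.

```c
#include <stdint.h>

#define BUF_SIZE 4096u   /* data mapper's internal cache, e.g. 4KB (assumed) */

static uint32_t second_counter = BUF_SIZE; /* in the data mapper: free credit    */
static uint32_t first_counter  = 0;        /* in the DMA controller: outstanding */

/* The second controller (mapper) requests `len` bytes; the request may not
   exceed the available cache space. */
int mapper_request(uint32_t len) {
    if (len > second_counter) return -1;   /* would overflow the cache */
    second_counter -= len;                 /* credit consumed on request */
    first_counter  += len;                 /* DMA controller records the request */
    return 0;
}

/* The first controller (DMA) processed `len` bytes of the request. */
void dma_sent(uint32_t len) { first_counter -= len; }

/* The mapper wrote `len` bytes into the to-be-stored data page: credit returns. */
void mapper_drained(uint32_t len) { second_counter += len; }
```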
The DMA controller needs one additional change: instead of the original local PRP, the virtual address space of the data mapper is used as the destination address.
As shown in fig. 8, after the above processing the data carries no offset when stored locally, so the PRP linked list for each disk can be rebuilt by the simple operation of extracting every N local PRPs.
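One way to read "extracting every N local PRPs" is a round-robin split in which each disk takes N consecutive page entries per stripe; the sketch below implements that interpretation, which is an assumption about fig. 8 rather than a detail stated in the text.

```c
#include <stdint.h>
#include <stddef.h>

/* Rebuild per-disk PRP lists from the offset-free local PRP list by handing
   each disk N consecutive entries per stripe, round-robin (assumed layout). */
void split_prp_lists(const uint64_t *local_prp, size_t total,
                     size_t ndisks, size_t n /* pages per stripe unit */,
                     uint64_t **disk_prp, size_t *disk_len) {
    for (size_t d = 0; d < ndisks; d++) disk_len[d] = 0;
    for (size_t i = 0; i < total; i++) {
        size_t d = (i / n) % ndisks;                /* which disk owns this page */
        disk_prp[d][disk_len[d]++] = local_prp[i];  /* offset field is already 0 */
    }
}
```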
The scheme provided by the invention manages local data storage by pages, with a page size equal to the host-side page size or 1/N of it, N an integer. Data is received through a virtual space and virtual addresses, the corresponding information is extracted to complete the remapping from virtual addresses to real addresses, and after being moved to the local end the host-side data is stored without an offset. The offset field of every entry in the PRP list is therefore 0, so the entries can simply be interleaved for each disk. Thus offset page data is converted to non-offset page data without reducing DMA efficiency.
Based on the same inventive concept, according to another aspect of the present invention, as shown in fig. 9, an embodiment of the present invention further provides a chip 501 comprising digital logic circuitry 510, the digital logic circuitry 510 being operative to implement the steps of the data scheduling method according to any of the embodiments described above.
Based on the same inventive concept, according to another aspect of the present invention, as shown in fig. 10, an embodiment of the present invention further provides an electronic device 601 including the chip 610 described above.
Finally, it should be noted that, as will be understood by those skilled in the art, all or part of the processes of the methods of the above embodiments may be implemented by a computer program, which may be stored in a computer-readable storage medium, and when executed, may include the processes of the embodiments of the methods described above.
Further, it should be appreciated that the computer-readable storage media (e.g., memory) herein can be either volatile memory or nonvolatile memory, or can include both volatile and nonvolatile memory.
Those of skill would further appreciate that the various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the disclosure herein may be implemented as electronic hardware, computer software, or combinations of both. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as software or hardware depends upon the particular application and design constraints imposed on the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the disclosed embodiments of the present invention.
The foregoing is an exemplary embodiment of the present disclosure, but it should be noted that various changes and modifications could be made herein without departing from the scope of the present disclosure as defined by the appended claims. The functions, steps and/or actions of the method claims in accordance with the disclosed embodiments described herein need not be performed in any particular order. Furthermore, although elements of the disclosed embodiments of the invention may be described or claimed in the singular, the plural is contemplated unless limitation to the singular is explicitly stated.
It should be understood that, as used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly supports the exception. It should also be understood that "and/or" as used herein is meant to include any and all possible combinations of one or more of the associated listed items.
The numbers of the embodiments disclosed in the embodiments of the present invention are merely for description, and do not represent the merits of the embodiments.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, and the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
Those of ordinary skill in the art will understand that the discussion of any embodiment above is exemplary only and is not intended to imply that the scope of the disclosure of the embodiments of the invention, including the claims, is limited to these examples; within the spirit of the embodiments of the invention, technical features of the above embodiment or of different embodiments may also be combined, and many other variations of the different aspects of the embodiments exist as described above, which are not detailed for the sake of brevity. Therefore, any omissions, modifications, substitutions, improvements, and the like made within the spirit and principles of the embodiments of the present invention are intended to be included within their scope.

Claims (10)

1. A method for scheduling data, comprising the steps of:
acquiring data to be scheduled by using a first controller;
in response to the destination address of the data to be scheduled falling within the virtual address range provided by the second controller, sequentially obtaining the data of each data page of the data to be scheduled, and determining the offset of the first data block in each data page according to the PRP address of each data page;
sending a data processing request to the first controller by using the second controller, so that the first controller sends data with a corresponding size in a current data page to the second controller according to the size of the data to be processed corresponding to the data processing request;
the second controller determines the page sequence number of the data page to be stored and the intra-page offset within that data page according to the virtual address range corresponding to the destination address of the data of the corresponding size and the offset of the first data block in the current data page;
and writing the data with the corresponding size into the corresponding position in the data page to be stored according to the page sequence number and the intra-page offset.
2. The method of claim 1, further comprising:
initializing a preset area of a memory, wherein the initialized preset area comprises a plurality of data pages to be stored;
and generating resource numbers according to the address of each data page to be stored and putting each resource number into a resource queue.
3. The method of claim 2, further comprising:
determining the number of required data pages to be stored according to the size of the data to be scheduled;
acquiring a corresponding number of resource numbers from the resource queue and converting them into addresses to form a PRP address linked list, and for each PRP address linked list:
feeding back the PRP address linked list to the second controller;
and in response to the data on a plurality of data pages to be stored in the PRP address linked list being written into a disk, adding the resource numbers of the plurality of data pages to be stored into the resource queue again.
4. The method of claim 3, wherein the second controller determines a page sequence number of a data page to be stored and an intra-page offset in the data page to be stored according to a virtual address range corresponding to a destination address of the data of the corresponding size and an offset of the first data block in the current data page, further comprising:
determining the page sequence number of the corresponding data page to be stored in the PRP address linked list according to the virtual address range corresponding to the destination address of the data with the corresponding size;
and determining whether the data with the corresponding size belongs to the first data block according to the offset of the first data block in the current data page and the virtual address range corresponding to the destination address of the data with the corresponding size.
5. The method of claim 4, wherein writing the respective size of data to a corresponding location in the page of data to be stored according to the page sequence number and the intra-page offset, further comprises:
in response to the data belonging to the first data block, writing the data of the corresponding size into the corresponding position of the data page to be stored corresponding to the page sequence number, according to the intra-page offset;
and in response to the data not belonging to the first data block, writing the data of the corresponding size into the corresponding position of the data page to be stored corresponding to the page sequence number immediately preceding or following that page sequence number, according to the intra-page offset.
6. The method of claim 1, wherein writing the respective size of data to a corresponding location in the page of data to be stored according to the page sequence number and the intra-page offset, further comprises:
setting a cache smaller than the size corresponding to the virtual address in the second controller;
and storing the data with the corresponding size into a cache and then writing the data into the corresponding position in the data page to be stored.
7. The method of claim 6, wherein the second controller is used to send a data processing request to the first controller, so that the first controller sends data of a corresponding size in a current data page to the second controller according to a size of data to be processed corresponding to the data processing request, further comprising:
respectively initializing a first counter and a second counter in the first controller and the second controller;
setting an initial value of the second counter according to the size of a buffer of a second controller, and setting an initial value of the first counter to 0;
and in response to the second controller sending a data processing request to the first controller, subtracting the size of data to be requested to be processed in the data processing request from the second counter, and adding the size of the data to be requested to be processed in the data processing request to the first counter in the first controller, wherein the size of the data to be requested to be processed in the data processing request is not larger than the size of the corresponding cache space.
8. The method of claim 7, further comprising:
subtracting the first counter by a corresponding size in response to the first controller processing a portion of the data in the data processing request;
and in response to the second controller writing partial data to the corresponding position of the data page to be stored, adding the corresponding size to the second counter.
9. A chip comprising digital logic circuitry, said digital logic circuitry being operative to implement the steps of the method according to any one of claims 1 to 8.
10. An electronic device comprising the chip of claim 9.
Priority Application (1)

Application Number: CN202111440615.2A
Priority date / Filing date: 2021-11-30
Title: Data scheduling method, chip and electronic equipment

Publication (1)

Publication Number: CN114265791A
Publication Date: 2022-04-01

Family

ID: 80826010
Family application: CN202111440615.2A — Data scheduling method, chip and electronic equipment — Pending — CN (en)

Cited By (2)

* Cited by examiner, † Cited by third party

Publication number · Priority date · Publication date · Assignee · Title
CN117032591A * — 2023-10-08 / 2023-11-10 — 苏州元脑智能科技有限公司 — Application method, device, computer equipment and storage medium of direct access channel
CN117032591B * — 2023-10-08 / 2024-02-06 — 苏州元脑智能科技有限公司 — Application method, device, computer equipment and storage medium of direct access channel


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination