CN116166185A - Caching method, image transmission method, electronic device and storage medium - Google Patents
- Publication number
- CN116166185A CN116166185A CN202211542749.XA CN202211542749A CN116166185A CN 116166185 A CN116166185 A CN 116166185A CN 202211542749 A CN202211542749 A CN 202211542749A CN 116166185 A CN116166185 A CN 116166185A
- Authority
- CN
- China
- Prior art keywords
- data
- bit width
- row
- cache
- sub
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0628—Interfaces specially adapted for storage systems making use of a particular technique
- G06F3/0638—Organizing or formatting or addressing of data
- G06F3/064—Management of blocks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F13/00—Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
- G06F13/14—Handling requests for interconnection or transfer
- G06F13/20—Handling requests for interconnection or transfer for access to input/output bus
- G06F13/28—Handling requests for interconnection or transfer for access to input/output bus using burst mode transfer, e.g. direct memory access DMA, cycle steal
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L49/00—Packet switching elements
- H04L49/90—Buffering arrangements
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L49/00—Packet switching elements
- H04L49/90—Buffering arrangements
- H04L49/9057—Arrangements for supporting packet reassembly or resequencing
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The embodiments of the disclosure provide a caching method, an image transmission method, an electronic device and a storage medium. The method comprises the following steps: acquiring target data whose bit width is a power-of-two number of bytes; determining the cache block bit width of the cache structure for the target data based on the data bit width of the target data, wherein the data bit width is an integer multiple of the cache block bit width; determining the parallel-storage data length of the cache structure based on the cache block bit width and the interface data bit width configured for the cache structure; determining the number of parallel cache blocks in the cache structure based on the parallel-storage data length and the cache block bit width; and caching the target data according to the cache block bit width, the interface data bit width and the number of parallel cache blocks of the cache structure. According to the embodiments of the disclosure, the efficiency of data caching can be improved.
Description
Technical Field
The disclosure relates to the technical field of artificial intelligence, and in particular relates to a caching method, an image transmission method, electronic equipment and a storage medium.
Background
In an SoC (System on Chip), data movement in the memory space can be performed directly by the CPU (central processing unit) or another processor through register reads and writes. This mode of movement is simple and intuitive to operate and is suitable for temporarily handling small data blocks; for large data blocks, the long transfer time makes prolonged processor intervention unsuitable. A conventional data transfer device with a cache structure supports the AXI interface, but its cache is not optimized to suit the characteristics of the AXI bus: its data transfer efficiency is low, and it cannot read byte-level data from an arbitrary address in a single clock cycle.
Disclosure of Invention
The embodiments of the disclosure provide a caching method, an image transmission method, an electronic device and a storage medium, which are used to solve or mitigate one or more technical problems in the prior art.
As a first aspect of the embodiments of the present disclosure, the embodiments of the present disclosure provide a caching method, including:
acquiring target data whose bit width is a power-of-two number of bytes;
determining the cache block bit width of the cache structure for the target data based on the data bit width of the target data, wherein the data bit width is an integer multiple of the cache block bit width;
determining the parallel-storage data length of the cache structure based on the cache block bit width and the interface data bit width configured for the cache structure;
determining the number of parallel cache blocks in the cache structure based on the parallel-storage data length and the cache block bit width;
and caching the target data according to the cache block bit width, the interface data bit width and the number of parallel cache blocks of the cache structure.
As a second aspect of the embodiments of the present disclosure, the embodiments of the present disclosure provide an image transmission method, including:
determining a cache structure of target data as the cache structure provided by any embodiment of the disclosure, wherein the target data is a target image;
determining a pixel reading mode of the target image based on the conversion mode of the target image and the cache structure;
and based on the pixel reading mode, caching the target image into the caching structure.
As a third aspect of the embodiments of the present disclosure, the embodiments of the present disclosure provide an electronic device, including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the methods provided by the embodiments of the present disclosure.
As a fourth aspect of the disclosed embodiments, the disclosed embodiments provide a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the methods provided by the disclosed embodiments.
As a fifth aspect of the disclosed embodiments, the disclosed embodiments provide a computer program product comprising a computer program which, when executed by a processor, implements the method provided by the disclosed embodiments.
According to the technical solution provided by the embodiments of the disclosure, when the bit width of the target data is determined to be a power-of-two number of bytes, the cache block bit width of the cache structure is determined based on the data bit width, the data bit width being an integer multiple of the cache block bit width. Then, the parallel-storage data length of the cache structure is determined based on the cache block bit width and the interface data bit width configured for the data bus interface. Furthermore, the number of parallel cache blocks of the cache structure can be obtained from the parallel-storage data length and the cache block bit width. Therefore, when data is stored in this cache structure, the data bus interface can achieve full-bandwidth transmission, improving data transmission efficiency.
The foregoing summary is for the purpose of the specification only and is not intended to be limiting in any way. In addition to the illustrative aspects, embodiments, and features described above, further aspects, embodiments, and features of the present disclosure will become apparent by reference to the drawings and the following detailed description.
Drawings
In the drawings, the same reference numerals refer to the same or similar parts or elements throughout the several views unless otherwise specified. The figures are not necessarily drawn to scale. It is appreciated that these drawings depict only some embodiments according to the disclosure and are not to be considered limiting of its scope.
FIG. 1 is a flow chart of a caching method of an embodiment of the present disclosure;
FIG. 2 is a flow chart of an image transmission method of an embodiment of the present disclosure;
FIG. 3 is a schematic diagram of a cache structure according to an embodiment of the disclosure;
FIG. 4 is a schematic diagram of a DMA structure of an image transfer method according to an embodiment of the present disclosure;
FIG. 5 is a schematic diagram of an image buffer according to an embodiment of the present disclosure;
FIG. 6 is a schematic diagram of an image buffer of another embodiment of the present disclosure;
FIG. 7 is a schematic diagram of an image buffer of another embodiment of the present disclosure;
FIG. 8 is a schematic diagram of an image buffer of another embodiment of the present disclosure;
FIG. 9 is a schematic diagram of an image buffer of another embodiment of the present disclosure;
FIG. 10 is a schematic diagram of an image buffer of another embodiment of the present disclosure;
fig. 11 is a block diagram of an electronic device of an embodiment of the present disclosure.
Detailed Description
Hereinafter, only certain exemplary embodiments are briefly described. As will be recognized by those of skill in the pertinent art, the described embodiments may be modified in various different ways without departing from the spirit or scope of the present disclosure. Accordingly, the drawings and description are to be regarded as illustrative in nature and not as restrictive.
For data handling within SoC systems, transfers are typically performed by a DMA (Direct Memory Access) controller.
The DMA controller is a common data transfer tool, mainly used for data transfers between memories and between memory and peripherals in a system. After the processor configures its parameters before startup, the DMA controller can complete the specified transfer work on its own. Its existence greatly reduces the processor's workload for data movement and improves the working efficiency of the system.
As a general-purpose data transfer tool, a conventional DMA controller generally includes a configuration module, read and write data channels, an inter-channel FIFO buffer, and so on. Although a conventional DMA controller supports the AXI interface, its interior is not deeply optimized for the characteristics of the AXI bus, so the high-bandwidth, low-latency characteristics of the AXI bus are difficult to exploit.
Therefore, the present application provides a caching method that configures a cache structure together with the interface parameters of the AXI bus corresponding to that cache structure, such as the interface bit width and burst length, so as to implement high-speed data transfer and to support byte-level reads and writes at any address within a single clock cycle.
FIG. 1 is a flow chart of a caching method of an embodiment of the present disclosure. As shown in fig. 1, the caching method may include:
S110, acquiring target data whose bit width is a power-of-two number of bytes;
S120, determining the cache block bit width of the cache structure corresponding to the target data based on the data bit width of the target data, wherein the data bit width is an integer multiple of the cache block bit width;
S130, determining the parallel-storage data length of the cache structure based on the cache block bit width and the interface data bit width configured for the cache structure;
S140, determining the number of parallel cache blocks in the cache structure based on the parallel-storage data length and the cache block bit width;
S150, caching the target data according to the cache block bit width, the interface data bit width and the number of parallel cache blocks of the cache structure.
In this example, when the transmitted data is a power-of-two number of bytes wide, a cache organized in this way allows the data bus interface to achieve full-bandwidth transmission, improving the efficiency of data transmission.
Illustratively, a bit width that is a power-of-two number of bytes means the product of a single byte and a power of 2, for example 1 byte, 2 bytes, 8 bytes, 16 bytes, 32 bytes, and so on. A bit width of 1 byte, i.e. 8 bits, is such a power-of-two multiple.
In some embodiments, the data bit width is consistent with the cache block bit width.
For example, if the data bit width is 8 bits, the buffer block bit width is 8 bits. If the data bit width is 16 bits, the buffer block bit width is 16 bits.
In this example, since the data bit width is an integer multiple of the cache block bit width (for example, the two bit widths are equal), the data bit width of the target data can be fully occupied by one or more cache block bit widths, so the cache block bit width is fully utilized for storing the target data.
The characteristics of the cache structure are thus determined from the data structure characteristics of the target data. Cache block bit widths of 8 bits, 16 bits and 32 bits are usually adopted, in which case the data bit width of the target data can be fully occupied by one or more cache blocks.
After the cache block bit width of the cache structure is obtained, the number of parallel cache blocks of the cache structure can be further determined. To determine this number, the parallel-storage data length must first be determined; the minimum parallel-storage data length is related to the interface data bit width.
In some embodiments, step S130, determining the parallel-storage data length of the cache structure based on the cache block bit width and the interface data bit width configured for the data bus interface corresponding to the cache structure, includes:
and determining the parallel storage data length of the cache structure based on the common multiple of the bit width of the cache block and the bit width of the interface data.
The aim is that, when data is stored in parallel across the interface data bit width, the parallel-storage data length is completely filled by one or more target-data bit widths, so that the interface data bit width, i.e. the bandwidth, is fully utilized.
In the SoC system, this data transmission method is based on an AXI bus, so the interface data bit width configured for the data bus interface corresponding to the cache structure is 64 bits; of course, it may also be 128 bits, 32 bits, etc., under other AXI bus protocols or other bus protocols.
Illustratively, the common multiple is a least common multiple.
Illustratively, if the cache block bit width is 8 bits and the interface data bit width is 64 bits, the least common multiple of 8 bits and 64 bits is 64 bits, so the parallel-storage data length of one parallel write is 64 bits. Thus, one bandwidth's worth of data can be written in parallel into 8 cache blocks at a time. If another common multiple such as 128 bits is adopted, the parallel-storage data length of one parallel write is 128 bits, and two bandwidths' worth of data can be written in parallel into 16 cache blocks at a time.
Illustratively, if the cache block bit width is 16 bits and the interface data bit width is 64 bits, the least common multiple of 16 bits and 64 bits is 64 bits, so the parallel-storage data length of one parallel write is 64 bits. Thus, one bandwidth's worth of data can be written in parallel into 4 cache blocks at a time.
In some embodiments, step S140, determining the number of parallel cache blocks in the cache structure based on the parallel-storage data length and the cache block bit width, includes:
the number of parallel cache blocks is determined based on a ratio between the parallel storage data length and the cache block bit width.
Illustratively, if the parallel store data is 64 bits long and the buffer block bit width is 8 bits, then the number of parallel buffer blocks is 8.
Illustratively, if the parallel store data is 64 bits long and the cache block bit width is 16 bits, the number of parallel cache blocks is 4.
Illustratively, if the parallel store data is 128 bits long and the cache block bit width is 8 bits, then the number of parallel cache blocks is 16.
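The derivations in steps S130 and S140 can be sketched in a few lines of Python (the function name `cache_layout` and the `multiple` parameter are illustrative, not part of the disclosure):

```python
from math import lcm

def cache_layout(block_bits: int, interface_bits: int, multiple: int = 1):
    # S130: the parallel-storage data length is a common multiple of the
    # cache block bit width and the interface data bit width
    # (multiple=1 gives the least common multiple).
    parallel_bits = lcm(block_bits, interface_bits) * multiple
    # S140: the number of parallel cache blocks is the ratio of the
    # parallel-storage data length to the cache block bit width.
    n_blocks = parallel_bits // block_bits
    return parallel_bits, n_blocks

print(cache_layout(8, 64))     # (64, 8): 8 blocks filled per beat
print(cache_layout(16, 64))    # (64, 4)
print(cache_layout(8, 64, 2))  # (128, 16)
```

The three calls reproduce the three worked examples above (8-bit and 16-bit blocks against a 64-bit interface, and the 128-bit common-multiple variant).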
In some embodiments, after determining the buffer block bit width and the number of parallel buffer blocks of the buffer structure for the target data, the burst length of the data bus interface corresponding to the buffer structure may also be configured.
Illustratively, the burst length of the data bus interface configuration is determined based on a ratio between the parallel storage data length and the interface data bit width.
Illustratively, the data bit width is 8 bits, the buffer block bit width is 8 bits, and the interface data bit width is 64 bits.
If the parallel-storage data length is 64 bits and the interface data bit width is 64 bits, then 64 bit / 64 bit = 1, and the burst length configured for the data bus interface is an integer multiple of 1, for example 8 times, i.e. the burst length is 8.
If the parallel-storage data length is 128 bits and the interface data bit width is 64 bits, then 128 bit / 64 bit = 2, and the burst length configured for the data bus interface is an integer multiple of 2, for example 4 times, i.e. the burst length is 8.
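The burst-length rule can be sketched as follows (hypothetical helper name; `factor` is the chosen integer multiple):

```python
def burst_beats(parallel_bits: int, interface_bits: int, factor: int = 1) -> int:
    # One parallel-storage row needs parallel_bits / interface_bits
    # interface beats; the configured burst length must be an integer
    # multiple of that beat count.
    beats_per_row = parallel_bits // interface_bits
    return beats_per_row * factor

print(burst_beats(64, 64, 8))   # 1 beat per row, x8 -> burst length 8
print(burst_beats(128, 64, 4))  # 2 beats per row, x4 -> burst length 8
```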
As shown in Tables 1 and 2 below, for target data whose bit width is a power-of-two number of bytes, the cache structure and the data bus interface parameters corresponding to the cache structure may be set in the manner of Tables 1 and 2.
Table 1: buffer characteristics of different data bit widths with 64bit bandwidth
Table 2: buffer characteristics of different data bit widths with 32bit bandwidth
Taking the above 64-bit bandwidth and 8-bit data width as an example, the design of the cache structure is described as follows:
As shown in Fig. 3, a single-group buffer (a cache structure in the embodiments of the present disclosure) is formed by splicing together 8 RAM blocks of 8-bit width and 512 depth, and these RAM blocks operate in parallel. The advantage of this parallel RAM-block design is that each working clock can complete a read or write of 8 x 8 bits of data, matching the interface data bit width, and any RAM block can be selected for any address. Therefore, even during special data conversions, efficient single-beat reads or writes of the full data bit width can still be achieved, an advantage that a FIFO (First In, First Out) cache cannot offer.
The other bit widths in Tables 1 and 2 can be configured similarly, realizing full-bandwidth high-speed transmission over the AXI interface and improving the efficiency of data caching.
Fig. 2 is a flowchart of an image transmission method according to an embodiment of the present disclosure, and as shown in fig. 2, the method may include the steps of:
s210, determining a cache structure of target data as the cache structure provided by the embodiment of the disclosure, wherein the target data is a target image;
s220, determining a pixel reading mode of the target image based on a conversion mode and a buffer structure of the target image;
s230, based on the pixel reading mode, the target image is cached in a cache structure.
In this example, the pixel reading order of the target image is determined based on the conversion manner of the image and the cache structure, and the pixels of the target image are then read into the cache structure in that order. In this way, the target image can be converted, for example flipped horizontally or vertically, while it is being transferred.
As shown in Fig. 4, the DMA controller involved in the image transmission method of the present application structurally includes a configuration module, a read data channel, a write data channel, an inter-channel FIFO interface buffer, and so on. The DMA controller is designed in full combination with the characteristics of the AXI bus, such as the configured interface data bit width, burst length and outstanding value; the size of the buffer (the cache structure of the embodiments of the disclosure) between the DMA read and write channels is set reasonably, and a ping-pong buffer is configured so that the read and write channels can work simultaneously; and spliced RAM blocks are used as the buffer cache, enabling single-beat byte-level reads and writes at any address, which a FIFO buffer cannot achieve.
In one embodiment, the data bus interface adopts the AXI3 protocol, the interface data bit width is 64 bits, burst_length (the burst length) is set to 8, the internally supported outstanding value is 8, and the buffer (the cache structure in the embodiments of the present disclosure) is divided into two groups used as the DMA read/write ping-pong buffers.
In this example, by providing two groups of caches, whenever one group is full the other group can be written, and whenever one group has been read out the other group can be read; cycling between the two groups continuously in this way makes parallel reading and writing more efficient.
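A minimal software sketch of this two-group ping-pong scheme (the class and method names are hypothetical; in hardware the two banks are RAM groups selected by a toggling status bit):

```python
class PingPongBuffer:
    def __init__(self):
        self.banks = [[], []]  # two cache groups
        self.fill = 0          # index of the bank being filled

    def write_burst(self, data):
        # Fill the current bank, then swap so the next burst fills the
        # other bank while this one becomes available for draining.
        self.banks[self.fill] = list(data)
        self.fill ^= 1

    def read_burst(self):
        # Drain the bank that was filled most recently.
        drain = self.fill ^ 1
        out, self.banks[drain] = self.banks[drain], []
        return out

buf = PingPongBuffer()
buf.write_burst([1, 2, 3])
print(buf.read_burst())  # [1, 2, 3]
```

In the real controller the fill and drain sides run concurrently; this sequential sketch only shows how the bank roles alternate.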
In one embodiment, the AXI3 protocol may be replaced with the newer AXI4 or AXI5 protocol. The AXI-Stream interface may also be involved in some application scenarios. Compared with the AXI interface protocol, the AXI-Stream interface removes the address control logic and the burst length limitation, making it more suitable for high-speed large data streams within a system, such as high-speed video streams.
Illustratively, in the above step S220, for image data of different bit widths, the corresponding buffer structures thereof are different.
Illustratively, the data bit width of the target image is a byte index multiple, e.g., 8 bits, 16 bits, 32 bits, etc.
Illustratively, if the data bit width of the target image is 8 bits, the cache structure adopted may be a buffer formed by splicing together 8 groups of RAM blocks of 8-bit width and 128 depth.
For example, the conversion means may include rotation and flipping. Wherein the flipping may include horizontal flipping and vertical flipping, i.e., horizontal mirroring and vertical mirroring. Rotation may include 90 degrees, 180 degrees, and 270 degrees of rotation.
In some embodiments, if the conversion mode of the target image is "unchanged", the image can be read directly in the conventional reading order, or it may be split into multiple sub-images for reading.
In some embodiments, if the conversion mode of the target image is rotated by 90 degrees, 180 degrees, 270 degrees, etc., horizontally flipped, vertically flipped, etc., the target image may be split into a plurality of sub-images based on the buffer structure.
Illustratively, in the above step S220, determining the pixel reading order of the target image based on the conversion manner and the buffer structure of the target image may include:
splitting a target image based on a cache structure to obtain at least one sub-image;
determining a pixel reading sequence of the sub-image based on a conversion mode of the target image;
Based on the pixel reading sequence, the pixel of the target image is read into a buffer structure, including:
based on the pixel reading sequence of the sub-images, the pixels of the sub-images are cached in a cache structure.
In this example, based on the cache structure, the target image is split into a plurality of sub-images and the pixel reading order of each sub-image is determined, so the image can be moved quickly and conveniently converted during caching. The image is thus converted while being moved from the source address to the destination address, avoiding a separate conversion pass after the move and improving image conversion efficiency.
Illustratively, the splitting the target image based on the cache structure to obtain at least one sub-image may include:
determining the number of pixels of the target image stored in parallel, based on the number of parallel cache blocks of the cache structure and the number of cache blocks per single datum of the target image;
determining the row pixel count and column pixel count of the sub-image based on the number of pixels stored in parallel.
The number of cache blocks per single datum can be determined from the cache block bit width of the cache structure and the data bit width of the target image. For example, if the cache block bit width is 8 bits and the data bit width of the target image is 8 bits, the number of cache blocks per single datum is 1. Since this number is 1, the number of parallel cache blocks equals the number of pixels of the image stored in parallel. For example, if the number of parallel cache blocks is 8, then 8 pixels are stored in parallel, each cache block storing one pixel. Thus, the row pixel count of the sub-image may be a positive integer multiple of 8, and so may the column pixel count. For example, the sub-image may be 8 x 8 pixels, 16 x 16 pixels, and so on.
In this example, the target image is split into multiple sub-images, which are read from sub-image to sub-image.
For example, if the number of parallel cache blocks of the cache structure is 8, the cache block bit width is 8 bits and the target data is 8 bits wide, then 8 pixels are stored in parallel; the row pixel count of the sub-image can be an integer multiple of 8, and so can the column pixel count.
Illustratively, the row pixel count of the sub-image is the same as its column pixel count.
Illustratively, if the number of parallel cache blocks of the cache structure is 24 and the number of cache blocks per single datum of the target image is 3, then 8 pixels are stored in parallel, and the row and column pixel counts of the sub-image are 32.
Illustratively, if the number of parallel cache blocks of the cache structure is 8 and the number of cache blocks per single datum of the target image is 1, then 8 pixels are stored in parallel, and the row and column pixel counts of the sub-image are both 8.
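The sub-image sizing rule above can be sketched as follows (illustrative helper; the row and column pixel counts are then chosen as positive multiples of the returned value):

```python
def pixels_in_parallel(n_parallel_blocks: int, blocks_per_datum: int) -> int:
    # Pixels stored per parallel write: the parallel cache blocks divided
    # by the number of cache blocks one pixel's datum occupies.
    return n_parallel_blocks // blocks_per_datum

print(pixels_in_parallel(8, 1))   # 8  -> e.g. 8 x 8 sub-images
print(pixels_in_parallel(24, 3))  # 8  -> e.g. 32 x 32 sub-images
```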
The pixel reading order of the sub-images differs for different image conversion modes; the unconverted, horizontally flipped, vertically flipped, 90-degree-rotated and 180-degree-rotated cases are described below:
illustratively, the conversion mode of the target image is unconverted, and the pixel reading sequence of the sub-image includes one of the following:
reading by rows, wherein the rows of the sub-image are read from the first row to the last row, and within each row the pixels are read from the first column to the last column;
reading by columns, wherein the columns of the sub-image are read from the first column to the last column, and within each column the pixels are read from the first row to the last row.
Then, when a sub-image is read and written into the cache, since it is not converted, it can be written in the reading order, with the cache addresses as follows: when the conversion mode of the target image is unconverted, for the k-th pixel subunit of the j-th pixel unit of the i-th pixel block, the write cache block of the k-th pixel subunit is determined to be the k-th cache block, and the cache address of the k-th pixel subunit in the k-th cache block is determined to be i x N + j;
where i, j and k are integers, j ranges over [0, N-1], and k ranges over [0, M-1].
Here a pixel block is the sub-image described above (8 pixels x 8 pixels). For example, a pixel block of one burst length, i.e. 8 x 64 bits, can be obtained as a sub-image. It is segmented by row into 8 pixel units; each pixel unit is 64 bits and can be divided into 8 x 8 bits, i.e. 8 pixel subunits (8 pixels). One pixel unit, i.e. 8 pixel subunits, is therefore written in parallel at a time, and a pixel block of one burst length can be completely written into the cache after 8 parallel writes.
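The placement rule for the unconverted copy can be sketched as (hypothetical function; `n_units` is N, the number of pixel units per block, with indices as defined above):

```python
def copy_placement(i: int, j: int, k: int, n_units: int):
    # Unconverted: subunit k of unit j of block i goes to cache block k,
    # at address i * N + j within that block.
    block = k
    addr = i * n_units + j
    return block, addr

print(copy_placement(0, 0, 0, 8))  # (0, 0)
print(copy_placement(2, 5, 3, 8))  # (3, 21)
```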
As shown in fig. 5, the original image stored at the source address src_addr is moved to the destination address dst_addr using the DMA controller. In the line operation mode, the DMA reads data R1, R2, R3 and R4 from the source address in units of bursts (of the burst length) and places them into the buffer in sequence; after the buffer status bit is pulled up, the data W1, W2, W3 and W4 are written to the destination address in units of bursts in the line operation mode, and the entire image is moved by repeating this operation.
As shown in fig. 6, the order of the data in the buffer during the move is as shown in the figure. The data amount R1 read by a single burst is split in units of bytes, so that the 64 bits of data in each beat can all be written to the corresponding addresses of the ram blocks within a single beat, and the writing of one burst of data is completed in 8 beats. Since this is a row-by-row copy mode, the read order and write order of the data are consistent; therefore, embodiments of the present disclosure configure a ping-pong cache, so that reading and writing of data can proceed in parallel and the time spent waiting for each other is greatly shortened.
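The ping-pong behaviour can be sketched as follows. This is a minimal model of our own; in hardware, filling one half and draining the other happen in the same time slot, which a sequential script can only approximate:

```python
def pingpong_transfer(bursts):
    """Move a list of bursts through a two-half ping-pong cache.

    In hardware, filling half p and draining half 1-p overlap in time;
    here we only model the alternation of the two halves and check that
    the bursts come out in their original (row-by-row copy) order.
    """
    halves = [None, None]
    out = []
    p = 0
    for burst in bursts:
        halves[p] = burst       # DMA read: source -> cache half p
        out.append(halves[p])   # DMA write: cache half p -> destination
        p ^= 1                  # swap the active half for the next burst
    return out

# Row-by-row copy keeps the read and write order identical:
moved = pingpong_transfer(["R1", "R2", "R3", "R4"])
```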
Illustratively, when the conversion mode of the target image is horizontal flip, the pixel reading sequence of the sub-image includes one of the following:
reading from the rows, wherein the row reading sequence of the sub-image is from the first row to the last row, and the row pixel reading sequence of each row of pixels is from the row tail to the row head;
reading from the columns, wherein the column reading sequence of the sub-image is from the first column to the last column, and the column pixel reading sequence of each column of pixels is from the column tail to the column head.
Then, after the sub-image is read, cache writing is performed according to the specified write cache mode, with the write address determined as follows: when the conversion mode of the target image is horizontal flip, for the kth pixel subunit of the jth pixel unit of the ith pixel block, the write cache block of the kth pixel subunit is determined to be the (M-k-1)th cache block, and the cache address of the kth pixel subunit in the (M-k-1)th cache block is determined to be i × N + N - j - 1;
wherein i is an integer, j is an integer, k is an integer, the value range of j is [0, N-1], and the value range of k is [0, M-1].
Here, the pixel block is the sub-image. For example, a sub-image can be obtained by fetching one pixel block of one burst length, i.e. 8 × 64 bits. This block is split by row into 8 pixel units, each pixel unit comprising 8 pixels, i.e. 8 × 8 bits. One pixel unit, i.e. 8 pixels, is written in parallel at a time, so a pixel block (sub-image) of one burst length is completely written into the cache after 8 parallel writes.
As shown in fig. 7, in the horizontal flip mode of image data, the DMA controller moves the source image stored at the source address src_addr to the destination address dst_addr, and horizontal flipping is achieved as the image is stored at the destination address. The DMA first reads data R1, R2, R3 and R4 in burst mode starting from the tail of the first row and places them into the buffer; after the buffer status bit is pulled up, the data W1, W2, W3 and W4 are written to the corresponding destination addresses in burst mode, and the entire image is moved by repeating this operation.
As shown in fig. 8, the order of the data in the buffer during the move is as shown in the figure. The data amount R1 (8 × 64 bits) read by a single burst is divided into 8 beats of data, each beat being 64 bits, so only 8 beats are needed to complete the buffer writing of a single burst of data. Assuming the initial address of the ram blocks forming the buffer is addr0, then, in accordance with the characteristics of horizontal flipping, the 1st beat of data (64 bits) of R1 is split by bytes into 8 bytes, i.e. 8 × 8 bits, and placed at the addr0+7 addresses of the ram blocks; the 2nd beat of data is placed at the addr0+6 addresses; and so on, with the 8th beat of data placed at the addr0 addresses. The data R2, R3 and R4 are written in the same way. In addition, reading the buffer is consistent with the copy mode of the image data: after the data is read from the buffer in the copy mode and written to the destination address, the horizontal flipping of the image data is achieved.
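The horizontal-flip write pattern can be sketched as follows. This is our own illustration of the addressing rule, not the patent's hardware; the names are assumptions. Byte k of beat j of burst i goes to ram block M-k-1 at cache address i·N + N-j-1, so a plain sequential readout yields the row reversed byte for byte:

```python
# Sketch of the horizontal-flip write pattern (illustrative only).
M = N = 8  # 8 parallel ram blocks, 8 beats per burst

def flip_write(i, burst):
    """burst: N beats of M bytes each. Returns {(ram_block, addr): byte}."""
    cache = {}
    for j, beat in enumerate(burst):        # j-th pixel unit (beat)
        for k, byte in enumerate(beat):     # k-th pixel subunit (byte)
            cache[(M - k - 1, i * N + N - j - 1)] = byte
    return cache

row = [[f"p{8 * j + k}" for k in range(8)] for j in range(8)]  # 64 pixels
cache = flip_write(0, row)
# Reading address a of every ram block b, for a = 0..7 and b = 0..7,
# reproduces the 64 pixels in fully reversed order: p63, p62, ..., p0.
readout = [cache[(b, a)] for a in range(8) for b in range(8)]
```

The sequential readout is exactly the unconverted copy-mode read, which matches the text: only the write side changes, the read side stays the plain copy.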
In fig. 8, the numbers 0 to 7 in the same column of a ram block represent the byte numbers within 1 beat of data. The addresses in the same column of each ram block are the same, with the starting address being addr0; this is not specifically marked in the figure.
In this way, the DMA receives the sub-image data, and after the sub-image data is cached according to the pixel reading sequence via the buffer (the cache structure of the present disclosure), the data is finally written to the destination address, thereby realizing horizontally flipped transmission of the sub-image.
Illustratively, when the conversion mode of the target image is vertical flip, the pixel reading sequence of the sub-image includes one of the following:
reading from the rows, wherein the row reading sequence of the sub-image is from the last row to the first row, and the row pixel reading sequence of each row of pixels is from the row head to the row tail;
reading from the columns, wherein the column reading sequence of the sub-image is from the last column to the first column, and the column pixel reading sequence of each column of pixels is from the column head to the column tail.
Then, after the sub-image is read, cache writing is performed according to the specified write cache mode, with the write address determined as follows: when the conversion mode of the target image is vertical flip, for the kth pixel subunit of the jth pixel unit of the ith pixel block, the write cache block of the kth pixel subunit is determined to be the kth cache block, and the cache address of the kth pixel subunit in the kth cache block is determined to be i × N + j, the same as in the unconverted case;
Wherein i is an integer, j is an integer, k is an integer, the value range of j is [0, N-1], and the value range of k is [0, M-1].
Here, the pixel block is the sub-image. For example, a sub-image can be obtained by fetching one pixel block of one burst length, i.e. 8 × 64 bits. This block is split by row into 8 pixel units, each pixel unit comprising 8 pixels, i.e. 8 × 8 bits. One pixel unit, i.e. 8 pixels, is written in parallel at a time, so a pixel block (sub-image) of one burst length is completely written into the cache after 8 parallel writes.
For an image whose conversion mode is vertical flip, the sub-image is read and then written into the cache in the same manner as in the unconverted case. Therefore, for its write address, reference may be made to the aforementioned determination procedure of the write address for the unconverted mode.
Illustratively, when the conversion mode is rotation by 90 degrees, the pixel reading sequence of the sub-image includes one of the following:
reading from the rows, wherein the row reading sequence is from the first row to the last row, and the row pixel reading sequence of each row of pixels is staggered reading of pixels in different rows;
reading from the columns, wherein the column reading sequence is from the first column to the last column, and the column pixel reading sequence of each column of pixels is staggered reading of pixels in different columns.
This staggered reading is, in effect, a staggered writing into the cache.
For example, as shown in fig. 9, the sub-image is 8 × 8. Before being placed into the buffer, the read pixels are grouped by 8 pixels, and the data is circularly shifted into the buffer group by group, the unit of the circular shift being one pixel, so that parallel reading of the pixels at the same position in each row of the source image (such as p0 in the dashed box) can be realized.
Specifically, as shown in fig. 10, the sub-image stored at the source address src_addr is transferred to the destination address dst_addr. The sub-image is first read from the rows, with the row reading sequence from the first row to the last row and the row pixel reading sequence of each row of pixels being staggered reading of pixels in different rows, and cached into the buffer (the cache structure of the present disclosure), the 8 pixels of each row occupying one column of parallel cache blocks, thereby realizing parallel data storage. The purpose of this is that, as shown in fig. 9, when accessing the buffer to read the data, it is necessary that:
p0 of the first row of pixels, p0 of the second row of pixels, p0 of the third row of pixels, ..., p0 of the eighth row of pixels are read out as p7, p6, p5, ..., p0, respectively, of the first row of pixels of the sub-image rotated 90 degrees to the right;
p1 of the first row of pixels, p1 of the second row of pixels, p1 of the third row of pixels, ..., p1 of the eighth row of pixels are read out as p7, p6, p5, ..., p0, respectively, of the second row of pixels of the sub-image rotated 90 degrees to the right;
……
p7 of the first row of pixels, p7 of the second row of pixels, p7 of the third row of pixels, ..., p7 of the eighth row of pixels are read out as p7, p6, p5, ..., p0, respectively, of the eighth row of pixels of the sub-image rotated 90 degrees to the right. The pixel reading sequence for rotation 90 degrees to the left follows the same principle and is not repeated here.
In the above rotation by 90 degrees, the pixel reading sequence of the sub-image is the above staggered reading. Its purpose is to exploit the fact that the cache blocks of the buffer can be written and read in parallel, so that one row of pixels before the 90-degree rotation is cached directly in parallel, and one row of pixels after the 90-degree rotation is taken out directly in parallel to the destination address.
The format of the data taken out of the buffer is shown by the dashed box and dotted line in fig. 9: to realize the operation of continuously writing the small-frame data to the destination address, the first burst of data on the AXI write data interface only needs to read out pixels p0 to p7 of each row in the source-image small frame sequentially in burst mode. By this method, setting reasonable parameters in combination with the resolution of the image and decomposing the whole image into suitable small frames, efficient transmission of the whole image in the 90-degree rotation mode can be achieved.
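The circular-shift scheme can be sketched as follows. This is our own model of the idea, with assumed names, not the patent's hardware: row r is barrel-shifted by r before being written across the parallel ram blocks, so each rotated output row can then be assembled by taking one element from each block:

```python
def rotate90_right(block):
    """Rotate a square pixel block 90 degrees to the right by modelling
    the shifted cache layout described above (illustrative sketch)."""
    n = len(block)
    # Write phase: row r enters the n parallel ram blocks circularly
    # shifted right by r positions (position p holds row[(p - r) % n]).
    cache = [row[-r:] + row[:-r] for r, row in enumerate(block)]
    out = []
    for c in range(n):  # rotated output row c is built from source column c
        # The pixel of source row r, column c sits in cache row r at
        # position (c + r) % n -- a different ram block for every r,
        # so all n pixels can be fetched in parallel.
        diag = [cache[r][(c + r) % n] for r in range(n)]
        out.append(list(reversed(diag)))  # bottom source row comes out first
    return out

rotated = rotate90_right([[1, 2], [3, 4]])  # [[3, 1], [4, 2]]
```

Because the shift amount differs per row, the elements of one source column never collide in the same ram block, which is exactly the property the staggered write is buying.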
In this way, the DMA receives the sub-image data from the source address, and after the sub-image data is cached according to the pixel reading sequence via the buffer (the cache structure of the present disclosure), the data is finally written to the destination address, thereby realizing transmission of the sub-image rotated by 90 degrees.
When the sub-image is 32 × 32, the principle is the same: each row of the sub-image has 32 pixels, each row is split into 4 groups of 8 pixels, 4 columns of parallel cache blocks (i.e. a buffer depth of 4) are used to store one row of pixels, and the corresponding groups of each row of pixels are read in a staggered manner.
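The arithmetic above can be captured in a small helper. This is our own sketch with assumed names; the group size of 8 comes from the 8-pixel parallel cache blocks in the examples:

```python
def buffer_depth(row_pixels, group=8):
    """Number of parallel cache-block columns (buffer depth) needed to
    hold one row of a sub-image when pixels are grouped by `group`."""
    assert row_pixels % group == 0, "row must split into whole groups"
    return row_pixels // group

# An 8x8 sub-image needs depth 1; a 32x32 sub-image splits each row of
# 32 pixels into 4 groups of 8 and therefore needs buffer depth 4.
depths = [buffer_depth(8), buffer_depth(32)]
```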
Illustratively, when the conversion mode is rotation by 180 degrees, the pixel reading sequence of the sub-image includes one of the following:
reading from the rows, wherein the row reading sequence is from the last row to the first row, and the row pixel reading sequence of each row of pixels is from the row tail to the row head;
reading from the columns, wherein the column reading sequence is from the last column to the first column, and the column pixel reading sequence of each column of pixels is from the column tail to the column head.
In this way, the DMA receives the sub-image data, and after the sub-image data is cached according to the pixel reading sequence via the buffer (the cache structure of the present disclosure), the data is finally written to the destination address, thereby realizing transmission of the sub-image rotated by 180 degrees.
Therefore, by the above examples, the DMA reads the target image from the source address and writes it into the cache according to the cache structure and the conversion mode, while in parallel reading the data from the cache and writing it to the destination address in the specified mode, thereby realizing rapid transfer of the image and accomplishing the image conversion during the transfer.
Based on the same inventive concept, the caching method and the related cache structure of the present application can also be applied to cache structures in master and slave devices.
According to embodiments of the present disclosure, the present disclosure also provides an electronic device, a readable storage medium and a computer program product.
Fig. 11 illustrates a schematic block diagram of an example electronic device 800 that can be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 11, the electronic device 800 includes a computing unit 801 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 802 or a computer program loaded from a storage unit 808 into a Random Access Memory (RAM) 803. In the RAM 803, various programs and data required for the operation of the electronic device 800 can also be stored. The computing unit 801, the ROM 802, and the RAM 803 are connected to each other by a bus 804. An input output (I/O) interface 805 is also connected to the bus 804.
Various components in electronic device 800 are connected to I/O interface 805, including: an input unit 806 such as a keyboard, mouse, etc.; an output unit 807 such as various types of displays, speakers, and the like; a storage unit 808, such as a magnetic disk, optical disk, etc.; and a communication unit 809, such as a network card, modem, wireless communication transceiver, or the like. The communication unit 809 allows the electronic device 800 to exchange information/data with other devices through a computer network such as the internet and/or various telecommunication networks.
The computing unit 801 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of computing unit 801 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 801 performs the respective methods and processes described above, such as a buffer method or an image transmission method. For example, in some embodiments, the caching method or the image transmission method may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit 808. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 800 via the ROM 802 and/or the communication unit 809. When the computer program is loaded into the RAM 803 and executed by the computing unit 801, one or more steps of the above-described caching method or image transmission method may be performed. Alternatively, in other embodiments, the computing unit 801 may be configured to perform a caching method or an image transmission method by any other suitable means (e.g., by means of firmware).
Various implementations of the systems and techniques described here above may be implemented in digital electronic circuitry, integrated circuit systems, field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs, which may be executed and/or interpreted on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor, and which may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general-purpose computer, special-purpose computer or other programmable data processing apparatus, such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowcharts and/or block diagrams to be carried out. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine, or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a background component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such background, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), wide Area Networks (WANs), and the internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server incorporating a blockchain.
It should be appreciated that various forms of the flows shown above may be used to reorder, add, or delete steps. For example, the steps recited in the present disclosure may be performed in parallel, sequentially, or in a different order, provided that the desired results of the disclosed aspects are achieved, and are not limited herein.
The above detailed description should not be taken as limiting the scope of the present disclosure. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present disclosure are intended to be included within the scope of the present disclosure.
Claims (16)
1. A caching method, comprising:
acquiring target data with bit width of byte index times;
determining a buffer block bit width of a buffer structure of the target data based on the data bit width of the target data, wherein the data bit width is an integer multiple of the buffer block bit width;
determining the parallel storage data length of the cache structure based on the cache block bit width and the interface data bit width configured by the cache structure;
Determining the number of parallel cache blocks in the cache structure based on the parallel storage data length and the cache block bit width;
and caching the target data according to the buffer block bit width, the interface data bit width and the parallel buffer block number of the buffer structure.
2. The caching method of claim 1, wherein the data bit width is consistent with the cache block bit width.
3. The caching method according to claim 1, wherein the determining the parallel storage data length of the cache structure based on the cache block bit width and the interface data bit width configured by the cache structure includes:
and determining the parallel storage data length of the cache structure based on the common multiple of the bit width of the cache block and the bit width of the interface data.
4. A caching method as claimed in claim 3, characterized in that the common multiple is the smallest common multiple.
5. The caching method of claim 1, wherein the determining the number of parallel cache blocks in the cache structure based on the parallel storage data length and the cache block bit width comprises:
and determining the number of parallel cache blocks based on the ratio between the parallel storage data length and the cache block bit width.
6. The caching method of claim 1, further comprising:
and determining the burst length of the data bus interface configuration based on the ratio between the parallel storage data length and the interface data bit width.
7. The method of claim 1, wherein the data has a bit width of 8 bits, the cache block has a bit width of 8 bits, and the interface data has a bit width of 64 bits.
8. An image transmission method, comprising:
determining a cache structure of target data as the cache structure of any one of claims 1 to 7, wherein the target data is a target image;
determining a pixel reading mode of the target image based on the conversion mode of the target image and the cache structure;
and based on the pixel reading mode, caching the target image into the caching structure.
9. The method of claim 8, wherein the determining a pixel read mode of the target image based on the conversion mode of the target image and the buffer structure comprises:
splitting the target image based on the cache structure to obtain at least one sub-image;
Determining a pixel reading sequence of the sub-image based on the conversion mode of the target image;
the caching the pixels of the target image into the cache structure based on the pixel reading sequence includes:
and caching the pixels of the sub-images into the caching structure based on the pixel reading sequence of the sub-images.
10. The method according to claim 9, wherein splitting the target image based on the buffer structure to obtain at least one sub-image comprises:
determining the number of parallel storage pixels for the target image based on the number of parallel cache blocks of the cache structure and the number of single data cache blocks of the target image; wherein the number of single data cache blocks is determined according to the cache block bit width of the cache structure and the data bit width of the target image;
based on the parallel storage pixel number, a line pixel amount and a wide pixel amount of the sub-image are determined.
11. The method of claim 9, wherein the conversion mode is horizontal flip, and the pixel reading sequence of the sub-image includes one of:
reading from the rows, wherein the row reading sequence of the sub-image is from the first row to the last row, and the row pixel reading sequence of each row of pixels is from the row tail to the row head;
reading from the columns, wherein the column reading sequence of the sub-image is from the first column to the last column, and the column pixel reading sequence of each column of pixels is from the column tail to the column head.
12. The method of claim 9, wherein the conversion mode is unconverted, and the pixel reading sequence of the sub-image includes one of:
reading from the rows, wherein the row reading sequence of the sub-image is from the first row to the last row, and the row pixel reading sequence of each row of pixels is from the row head to the row tail;
reading from the columns, wherein the column reading sequence of the sub-image is from the first column to the last column, and the column pixel reading sequence of each column of pixels is from the column head to the column tail.
13. The method of claim 9, wherein the conversion mode is vertical flip, and the pixel reading sequence of the sub-image includes one of:
reading from the rows, wherein the row reading sequence of the sub-image is from the last row to the first row, and the row pixel reading sequence of each row of pixels is from the row head to the row tail;
reading from the columns, wherein the column reading sequence of the sub-image is from the last column to the first column, and the column pixel reading sequence of each column of pixels is from the column head to the column tail.
14. An electronic device, comprising:
At least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-13.
15. A non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1-13.
16. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any of claims 1-13.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211542749.XA CN116166185A (en) | 2022-12-02 | 2022-12-02 | Caching method, image transmission method, electronic device and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116166185A true CN116166185A (en) | 2023-05-26 |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117149092A (en) * | 2023-10-24 | 2023-12-01 | 湖南高至科技有限公司 | FPGA self-adaptive bit width data transmission method, device, equipment and medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||