CN117971501A - Data access method, device, storage medium and program product - Google Patents
- Publication number
- CN117971501A (application number CN202410369923.8A)
- Authority
- CN
- China
- Prior art keywords
- memory
- interest
- memory access
- continuous
- region
- Legal status
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5005—Allocation of resources, e.g. of the central processing unit [CPU] to service a request
- G06F9/5011—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals
- G06F9/5016—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals the resource being the memory
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T1/00—General purpose image data processing
- G06T1/60—Memory management
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/25—Determination of region of interest [ROI] or a volume of interest [VOI]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/94—Hardware or software architectures specially adapted for image or video understanding
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2200/00—Indexing scheme for image data processing or generation, in general
- G06T2200/28—Indexing scheme for image data processing or generation, in general involving image processing hardware
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Software Systems (AREA)
- Multimedia (AREA)
- General Engineering & Computer Science (AREA)
- Image Input (AREA)
Abstract
The embodiments of the application provide a data access method, a device, a storage medium and a program product, relating to the technical field of integrated circuits. The method comprises: expanding the original region of interest of an image to be processed, which is stored in a memory, to obtain a target region of interest that is continuous in the memory; and reading the target region of interest from the memory in a continuous memory access mode, and computing, through a plurality of execution units, the original region of interest contained in the read target region of interest. Compared with reading data from the memory in a discontinuous memory access mode, the continuous memory access mode improves memory access efficiency, which speeds up computation on the original region of interest and improves the performance of the image processing operator. Because the target region of interest is continuous in the memory, the read tasks can be distributed evenly across the execution units, which avoids load imbalance among the execution units and improves hardware resource utilization.
Description
Technical Field
Embodiments of the present application relate to the field of integrated circuits, and in particular, to a data access method, apparatus, storage medium, and program product.
Background
In technical fields such as machine vision and image processing, an artificial intelligence (AI) chip is generally required to process images that have a region of interest (Region Of Interest, ROI). An ROI is an image region selected from an image, and it is the region on which image analysis and processing focus.
In practical applications, the image is stored in a memory. When the image is processed, the AI chip reads the image data from the memory and applies an image processing operator to it. Because the region of interest is only part of an image (it covers only part of each row it spans), it is not contiguous in memory. Accordingly, the region of interest is generally read from the memory in a discontinuous memory access mode, and the image processing operator then performs the corresponding calculation on it.
However, reading the region of interest from the memory in a discontinuous memory access mode is inefficient, which degrades the performance of the image processing operators in the image processing library.
Disclosure of Invention
The embodiments of the application provide a data access method, a data access device, a storage medium and a program product, which are used to improve memory access efficiency and thereby improve the performance of image processing operators.
In one aspect, an embodiment of the present application provides a data access method, including:
expanding an original ROI of an image to be processed stored in a memory to obtain a target ROI, wherein the target ROI is continuous in the memory;
and reading the target ROI from the memory in a continuous memory access mode, and computing, through a plurality of execution units, the original ROI contained in the read target ROI.
In one aspect, an embodiment of the present application provides a data access apparatus, including:
an expansion module, configured to expand an original ROI of an image to be processed stored in a memory to obtain a target ROI, the target ROI being continuous in the memory;
and a memory access module, configured to read the target ROI from the memory in a continuous memory access mode and to compute, through a plurality of execution units, the original ROI contained in the read target ROI.
Optionally, the memory access module is specifically configured to:
read the target ROI, which contains the original ROI, from the memory into an on-chip buffer in a continuous memory access mode;
and read the data blocks of the original ROI from the on-chip buffer into the plurality of execution units in a discontinuous memory access mode for calculation.
Optionally, the memory access module is specifically configured to:
evenly distribute corresponding tasks to the plurality of execution units based on the region size of the target ROI;
for each of the plurality of execution units, perform the following operation:
read, through the execution unit and in a continuous memory access mode, the data block corresponding to the allocated task from the target ROI stored in the memory into the on-chip buffer.
Optionally, the memory access module is further configured to:
before the target ROI is read from the memory into the on-chip buffer in a continuous memory access mode, determine that a first memory access time is longer than a second memory access time, wherein the first memory access time is the time taken to read the original ROI from the memory in a discontinuous memory access mode;
and the second memory access time is the total time taken to read the target ROI from the memory into the on-chip buffer in a continuous memory access mode and then read the original ROI from the on-chip buffer in a discontinuous memory access mode.
Optionally, the memory access module is specifically configured to:
determine the first memory access time based on the region size of the original ROI and a first speed at which the memory is accessed in a discontinuous memory access mode.
Optionally, the memory access module is specifically configured to:
determine a continuous memory access time based on the region size of the target ROI and a second speed at which data is read from the memory into the on-chip buffer in a continuous memory access mode;
determine a discontinuous memory access time based on the region size of the original ROI and a third speed at which the on-chip buffer is accessed in a discontinuous memory access mode;
sum the continuous memory access time and the discontinuous memory access time to obtain the second memory access time.
Optionally, the memory access module is specifically configured to:
evenly distribute corresponding tasks to the plurality of execution units based on the region size of the target ROI;
for each of the plurality of execution units, perform the following operations:
read, through the execution unit and in a continuous memory access mode, a first data block corresponding to the allocated task from the target ROI stored in the memory; and extract from the first data block a second data block belonging to the original ROI for calculation.
Optionally, the memory access module is further configured to:
before evenly distributing corresponding tasks to the plurality of execution units based on the region size of the target ROI, determine that the first memory access time is longer than a third memory access time, wherein the first memory access time is the time taken to read the original ROI from the memory in a discontinuous memory access mode;
and the third memory access time is the total time taken to read the target ROI from the memory in a continuous memory access mode and extract the original ROI from the target ROI.
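The first/second data block scheme above can be illustrated with a minimal Python sketch. It assumes a row-major image of width `width` (in pixels), an original ROI described by a hypothetical `(x, y, w, h)` tuple (starting column, starting row, width, height), and a first data block that was read contiguously starting at linear index `block_start`. The function name `extract_roi_from_block` is illustrative and does not come from the source.

```python
def extract_roi_from_block(block_start, block, width, x, y, w, h):
    """Return the pixels of a contiguously read block that lie inside the ROI.

    Assumes row-major storage: linear index = row * width + col.
    block_start is the linear index of block[0] (hypothetical parameter).
    """
    second_block = []
    for offset, value in enumerate(block):
        row, col = divmod(block_start + offset, width)
        # Keep only pixels inside the original ROI rectangle.
        if y <= row < y + h and x <= col < x + w:
            second_block.append(value)
    return second_block


# Illustrative 5x5 image whose pixel values equal their linear indices.
image = list(range(25))
print(extract_roi_from_block(5, image[5:10], 5, 1, 1, 3, 2))    # [6, 7, 8]
print(extract_roi_from_block(10, image[10:15], 5, 1, 1, 3, 2))  # [11, 12, 13]
```

Because each first data block is itself contiguous, the extraction step only needs index arithmetic on data the execution unit already holds; no further discontinuous reads from the memory are involved.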
In one aspect, an embodiment of the present application provides a computer device including a memory, a processor chip, and a computer program stored on the memory and executable on the processor chip, where the processor chip implements the steps of the data access method described above when the computer program is executed.
In one aspect, embodiments of the present application provide a computer-readable storage medium storing a computer program executable by a computer device, which when run on the computer device, causes the computer device to perform the steps of the data access method described above.
In one aspect, embodiments of the present application provide a computer program product comprising a computer program stored on a computer readable storage medium, the computer program comprising program instructions which, when executed by a computer device, cause the computer device to perform the steps of the data access method described above.
In the embodiments of the application, the original ROI of the image to be processed stored in the memory is expanded to obtain a target ROI that is continuous in the memory. The target ROI, which contains the original ROI, is then read from the memory in a continuous memory access mode. Compared with reading data from the memory in a discontinuous memory access mode, the continuous memory access mode greatly improves memory access efficiency and also speeds up computation on the original ROI, thereby improving the performance of image processing operators in the image processing library.
Second, because the target ROI is continuous in the memory, the read tasks can be distributed evenly across the plurality of execution units, which avoids load imbalance among them and improves hardware resource utilization.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the description of the embodiments will be briefly described below, it will be apparent that the drawings in the following description are only some embodiments of the present invention, and that other drawings can be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a schematic structural diagram of a processor chip according to an embodiment of the present application;
Fig. 2 is a schematic flowchart of a data access method according to an embodiment of the present application;
Fig. 3A is a first schematic diagram of a target region of interest according to an embodiment of the present application;
Fig. 3B is a second schematic diagram of a target region of interest according to an embodiment of the present application;
Fig. 4 is a second schematic flowchart of a data access method according to an embodiment of the present application;
Fig. 5 is a schematic flowchart of a method for selecting the memory access mode according to an embodiment of the present application;
Fig. 6 is a schematic flowchart of a data access method according to an embodiment of the present application;
Fig. 7 is a schematic structural diagram of a data access device according to an embodiment of the present application;
Fig. 8 is a schematic structural diagram of a computer device according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantageous effects of the present invention more apparent, the present invention will be further described in detail with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
Referring to Fig. 1, which is a block diagram of a processor chip to which an embodiment of the present application applies, the processor chip 100 includes at least a memory 101, an on-chip buffer 102, and execution units 103.
Memory 101 may be a high bandwidth memory (High Bandwidth Memory, HBM) or another type of memory. On-chip buffer 102 is a temporary store with a smaller capacity than memory 101 but a faster data exchange speed. There may be one or more on-chip buffers 102, and one on-chip buffer 102 may serve multiple execution units 103. Execution units 103 may be used to execute various types of image processing operators.
An execution unit 103 may read image data directly from memory 101 and perform the corresponding calculation; alternatively, the image data may first be read from memory 101 into on-chip buffer 102 and then read from on-chip buffer 102 for the corresponding calculation.
For execution unit 103, reading data from memory 101 in a continuous memory access mode is more efficient than reading it in a discontinuous memory access mode. In addition, reading data from on-chip buffer 102 in a discontinuous memory access mode is more efficient than reading data from memory 101 in a discontinuous memory access mode.
Besides the structures described above, the processor chip 100 may include other structures; the present application does not specifically limit this.
The processor chip 100 may be a central processing unit (Central Processing Unit, CPU), a graphics processing unit (Graphics Processing Unit, GPU), a general-purpose graphics processing unit (General-purpose computing on graphics processing units, GPGPU), a domain specific architecture (Domain Specific Architecture, DSA) chip, etc.
In practical applications, the region of interest is not contiguous in memory because it is only part of an image. Accordingly, when the region of interest is processed, the execution units read it from the memory in a discontinuous memory access mode and apply an image processing operator to it. However, discontinuous reads from the memory are inefficient, which degrades the performance of the image processing operators in the image processing library.
In view of this, based on the processor chip architecture shown in Fig. 1, the present application provides a data access method. As shown in Fig. 2, the method is executed by the processor chip and includes the following steps:
Step 201: expand an original ROI of an image to be processed stored in a memory to obtain a target ROI.
Specifically, the original ROI in the image to be processed is an image region selected from that image, and it is the region on which image analysis and processing focus. For example, in a face recognition scenario, the original ROI is the face region in the image to be processed.
If data is stored in the memory in row-major order, the end of each row is immediately followed by the start of the next row in memory. In this case, the original ROI is expanded horizontally along the rows it occupies until the expanded target ROI is contiguous in memory.
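To see why the original ROI is discontinuous under row-major storage, the minimal Python sketch below lists the linear index range each ROI row occupies. It assumes pixels are numbered from 0 in row-major order and describes the ROI by a hypothetical `(x, y, w, h)` tuple; the values width 5, x=1, y=1, w=3, h=2 are an assumption chosen to match the pixel numbers of Figs. 3A and 3B.

```python
def roi_segments_row_major(width, x, y, w, h):
    """Return inclusive (start, end) linear index ranges, one per ROI row,
    for an image stored in row-major order (index = row * width + col)."""
    segments = []
    for row in range(y, y + h):
        start = row * width + x
        segments.append((start, start + w - 1))
    return segments


segs = roi_segments_row_major(width=5, x=1, y=1, w=3, h=2)
print(segs)  # [(6, 8), (11, 13)]

# Pixels lying between consecutive ROI rows; a plain read of the ROI
# alone must skip them, which is what makes the access discontinuous.
gaps = [b[0] - a[1] - 1 for a, b in zip(segs, segs[1:])]
print(gaps)  # [2]
```

The gap of two pixels (pixels 9 and 10) is exactly what form one of the target ROI fills in.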
For example, referring to Figs. 3A and 3B, in the image to be processed stored in the HBM, the original ROI includes pixels 6, 7, 8, 11, 12 and 13. The original ROI is expanded horizontally until the expanded target ROI is contiguous in memory; the target ROI can take at least the following forms:
Form one, see Fig. 3A: the target ROI includes pixels 6, 7, 8, 9, 10, 11, 12 and 13.
Form two, see Fig. 3B: the target ROI includes pixels 5, 6, 7, 8, 9, 10, 11, 12, 13 and 14.
Likewise, if data is stored in the memory in column-major order, the end of each column is immediately followed by the start of the next column in memory. In this case, the original ROI is expanded vertically along the columns it occupies until the expanded target ROI is contiguous in memory.
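Both expansion forms reduce to simple index arithmetic. The Python sketch below is one possible way to compute them, assuming row-major storage with pixels numbered from 0 and a hypothetical `(x, y, w, h)` ROI description; `full_rows` is an illustrative flag, not a parameter from the source. With width 5 and ROI (1, 1, 3, 2), the results match the two forms listed above.

```python
def expand_to_contiguous(width, x, y, w, h, full_rows=False):
    """Expand an ROI so it occupies one contiguous range of a row-major buffer.

    full_rows=False: span from the ROI's first pixel to its last (form one).
    full_rows=True:  widen to whole image rows (form two).
    Returns an inclusive (start, end) pair of linear indices.
    """
    if full_rows:
        return y * width, (y + h) * width - 1
    first = y * width + x
    last = (y + h - 1) * width + x + w - 1
    return first, last


print(expand_to_contiguous(5, 1, 1, 3, 2))                  # (6, 13)
print(expand_to_contiguous(5, 1, 1, 3, 2, full_rows=True))  # (5, 14)
```

For column-major storage the same arithmetic applies with the roles of rows and columns swapped, using the image height as the stride. Form one adds the fewest extra pixels; form two yields whole rows, which may be simpler to split across execution units.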
Step 202: read the target ROI from the memory in a continuous memory access mode, and compute, through a plurality of execution units, the original ROI contained in the read target ROI.
Specifically, because the expanded target ROI is contiguous in the memory, it can be read from the memory in a continuous memory access mode. In actual image processing, only the original ROI needs to be computed, not the expanded region; moreover, the processor chip contains a plurality of execution units. Therefore, the plurality of execution units compute the original ROI contained in the read target ROI.
In a specific implementation, each execution unit reads a data block of the original ROI and executes the image processing operator on it to obtain a processing result, where a data block may be a single pixel or a pixel block formed by multiple pixels. Finally, the processing results of the execution units are combined to obtain the final image processing result.
In this embodiment, expanding the original ROI yields a target ROI that is contiguous in the memory, so the target ROI, which contains the original ROI, can be read in a continuous memory access mode. Compared with discontinuous reads from the memory, this greatly improves memory access efficiency and speeds up computation on the original ROI, thereby improving the performance of image processing operators in the image processing library.
In some embodiments, the target ROI, which contains the original ROI, is read from the memory into an on-chip buffer in a continuous memory access mode. The data blocks of the original ROI are then read from the on-chip buffer into the plurality of execution units in a discontinuous memory access mode for calculation.
Specifically, the target ROI may be read from the memory into the on-chip buffer in a continuous memory access mode either by the plurality of execution units or by other units.
In practical applications, because the original ROI is not contiguous in the memory, it is difficult to distribute the read tasks evenly across the execution units when they read the original ROI from the memory in a discontinuous memory access mode. As a result, some execution units are busy while others sit idle, which lowers hardware resource utilization and degrades the performance of the image processing operator.
In view of this, in the embodiments of the application, after the original ROI of the image to be processed stored in the memory is expanded to obtain the target ROI, corresponding tasks are distributed evenly to the plurality of execution units based on the region size of the target ROI. Each execution unit then performs the following operation:
The execution unit reads the data block corresponding to its allocated task from the target ROI stored in the memory into the on-chip buffer in a continuous memory access mode.
Specifically, the region size of the target ROI may be expressed as the number of pixels in the target ROI. Because the expanded target ROI is contiguous in memory, it is divided evenly into data blocks based on its region size and the number of execution units, with as many data blocks as there are execution units; each execution unit is then assigned the task of reading one data block.
After receiving its task, each execution unit reads the corresponding data block from the target ROI in the memory in a continuous memory access mode and writes it into the on-chip buffer. Once all execution units have completed their tasks, the whole target ROI has been read from the memory into the on-chip buffer.
Because the image processing algorithm only needs to compute the original ROI and not the expanded region, and because the original ROI is still not contiguous in the on-chip buffer, the execution units read the data blocks of the original ROI from the on-chip buffer in a discontinuous memory access mode for calculation.
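The even division of a contiguous target ROI into one data block per execution unit can be sketched as follows. This is a minimal illustration, assuming the target ROI is given as an inclusive linear index range and that block lengths may differ by at most one pixel when the size does not divide evenly (the source does not specify how remainders are handled). The name `split_evenly` is hypothetical.

```python
def split_evenly(start, end, num_units):
    """Split the inclusive linear range [start, end] into num_units
    contiguous chunks whose lengths differ by at most one element.

    If num_units exceeds the range length, trailing chunks are empty
    and appear as (c, c - 1)."""
    total = end - start + 1
    base, extra = divmod(total, num_units)
    chunks = []
    cursor = start
    for unit in range(num_units):
        length = base + (1 if unit < extra else 0)
        chunks.append((cursor, cursor + length - 1))
        cursor += length
    return chunks


print(split_evenly(5, 14, 2))  # [(5, 9), (10, 14)]
print(split_evenly(0, 9, 3))   # [(0, 3), (4, 6), (7, 9)]
```

Because each chunk is itself contiguous, every execution unit can issue a single continuous read, and the work per unit is balanced to within one pixel.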
For example, referring to Fig. 4, in the HBM the original ROI includes pixels 6, 7, 8, 11, 12 and 13.
The original ROI is expanded to obtain a target ROI that includes pixels 5, 6, 7, 8, 9, 10, 11, 12, 13 and 14.
The target ROI is divided evenly into data block A and data block B, where data block A includes pixels 5, 6, 7, 8 and 9, and data block B includes pixels 10, 11, 12, 13 and 14.
The read task for data block A is assigned to execution unit 1, and the read task for data block B is assigned to execution unit 2. Execution unit 1 reads data block A from the HBM in a continuous memory access mode and writes it into the on-chip buffer; execution unit 2 reads data block B from the HBM in a continuous memory access mode and writes it into the on-chip buffer.
In the on-chip buffer, the original ROI includes pixels 6, 7, 8, 11, 12 and 13.
Execution unit 1 reads data block C of the original ROI (pixels 6, 7 and 8) from the on-chip buffer in a discontinuous memory access mode and executes the image processing operator on data block C to obtain a processing result.
Execution unit 2 reads data block D of the original ROI (pixels 11, 12 and 13) from the on-chip buffer in a discontinuous memory access mode and executes the image processing operator on data block D to obtain a processing result.
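The two-stage flow of this example can be simulated in a few lines of Python. The sketch assumes a 5-pixel-wide row-major image held in a list standing in for the HBM (pixel value equals linear index), the form-two target ROI (whole rows), and that ROI rows are assigned to execution units round-robin; that assignment rule is an assumption chosen to reproduce data blocks C and D, not something the source specifies. Unit indices 0 and 1 correspond to execution units 1 and 2, and `simulate_two_stage_read` is a hypothetical name.

```python
def simulate_two_stage_read(hbm, width, roi, num_units):
    x, y, w, h = roi
    # Target ROI = whole rows covering the original ROI (form two).
    t_start = y * width
    total = h * width
    base, extra = divmod(total, num_units)

    # Stage 1: each unit copies one contiguous chunk of the target ROI
    # from the HBM list into the on-chip buffer list.
    buffer = [None] * total
    cursor = 0
    for unit in range(num_units):
        length = base + (1 if unit < extra else 0)
        buffer[cursor:cursor + length] = hbm[t_start + cursor:t_start + cursor + length]
        cursor += length

    # Stage 2: strided (discontinuous) reads of the original ROI rows
    # from the buffer. Offsets are relative to the buffer start, which is
    # column 0 of row y because the target ROI consists of whole rows.
    per_unit = {unit: [] for unit in range(num_units)}
    for r in range(h):
        offset = r * width + x
        per_unit[r % num_units].append(buffer[offset:offset + w])
    return per_unit


hbm = list(range(25))
print(simulate_two_stage_read(hbm, 5, (1, 1, 3, 2), 2))
# {0: [[6, 7, 8]], 1: [[11, 12, 13]]}
```

Stage 1 touches only contiguous ranges of the HBM, while the discontinuous accesses in stage 2 are confined to the faster on-chip buffer.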
In this embodiment, the target ROI is read from the memory into the on-chip buffer in a continuous memory access mode, which greatly improves memory access efficiency compared with discontinuous reads from the memory and thereby improves operator performance. Second, because the target ROI is contiguous in the memory, the read tasks can be distributed evenly across the plurality of execution units, which avoids load imbalance among them and improves hardware resource utilization.
In some embodiments, the smaller the region added when the original ROI is expanded, the less extra data is read and the less extra time is spent, so memory access performance is better. Conversely, the larger the added region, the more extra data is read and the more extra time is spent, so memory access performance drops; in some cases it can even fall below that of reading the original ROI directly from the memory in a discontinuous memory access mode. Therefore, in practical applications, whether to read the expanded target ROI from the memory into the on-chip buffer in a continuous memory access mode should be decided case by case.
In the embodiments of the application, when the first memory access time is longer than the second memory access time, the target ROI is read from the memory into the on-chip buffer in a continuous memory access mode, and the data blocks of the original ROI are then read from the on-chip buffer into the plurality of execution units in a discontinuous memory access mode for calculation.
When the first memory access time is less than or equal to the second memory access time, the original ROI is read directly from the memory in a discontinuous memory access mode for calculation. Here, the first memory access time is the time taken to read the original ROI from the memory in a discontinuous memory access mode, and the second memory access time is the total time taken to read the target ROI from the memory into the on-chip buffer in a continuous memory access mode and then read the original ROI from the on-chip buffer in a discontinuous memory access mode.
In some embodiments, the first memory access time is determined based on the region size of the original ROI and a first speed at which the memory is accessed in a discontinuous memory access mode.
Specifically, the region size of the original ROI may be expressed as the number of pixels in the original ROI or by other parameters. The first speed, at which the memory is accessed in a discontinuous memory access mode, is a hardware attribute of the processor chip. Dividing the region size of the original ROI by the first speed gives the first memory access time.
In some embodiments, a continuous memory access time is determined based on the region size of the target ROI and a second speed at which data is read from the memory into the on-chip buffer in a continuous memory access mode. A discontinuous memory access time is then determined based on the region size of the original ROI and a third speed at which the on-chip buffer is accessed in a discontinuous memory access mode. The continuous memory access time and the discontinuous memory access time are summed to obtain the second memory access time.
Specifically, the second speed (continuous reads from the memory into the on-chip buffer) and the third speed (discontinuous accesses to the on-chip buffer) are both hardware attributes of the processor chip. Dividing the region size of the target ROI by the second speed gives the continuous memory access time, and dividing the region size of the original ROI by the third speed gives the discontinuous memory access time.
In this processing flow, the total time includes not only the continuous memory access time for reading the target ROI into the on-chip buffer, but also the discontinuous memory access time for reading the original ROI from the on-chip buffer into the execution units. The two are therefore summed to obtain the second memory access time.
When the first memory access time is longer than the second memory access time, the scheme that reads the target ROI from the memory into the on-chip buffer in a continuous memory access mode and then reads the original ROI from the on-chip buffer in a discontinuous memory access mode for calculation by the execution units has better memory access performance. When the first memory access time is less than or equal to the second memory access time, the scheme that reads the original ROI directly from the memory in a discontinuous memory access mode for calculation by the execution units performs better.
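Written as formulas, with S1 the region size of the original ROI, S2 the region size of the target ROI, and V1, V2, V3 the first, second and third speeds defined above (these symbol names follow the notation used for the scenario of Fig. 5), the two times and the selection rule are:

$$t_1 = \frac{S_1}{V_1}, \qquad t_2 = \underbrace{\frac{S_2}{V_2}}_{\text{continuous: memory} \to \text{buffer}} + \underbrace{\frac{S_1}{V_3}}_{\text{discontinuous: buffer} \to \text{units}}$$

$$\text{use the two-stage scheme} \iff t_1 > t_2 \iff \frac{S_1}{V_1} > \frac{S_2}{V_2} + \frac{S_1}{V_3}$$

Since S2 is at least S1, the first term of t2 grows with the amount of padding added by the expansion, which is the trade-off described above.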
To better explain the embodiments of the present application, the flow of the data access method provided by the embodiments is described below with reference to a specific implementation scenario. As shown in fig. 5, the flow is executed by a processor chip and includes the following steps:
Step 501: set the region size of the original ROI to S1 and the speed of discontinuous access to the HBM to V1.
Step 502: set the region size of the expanded target ROI to S2 and the speed of reading data from the HBM to the on-chip buffer by continuous access to V2.
Step 503: set the speed of discontinuous access to the on-chip buffer to V3.
Step 504: calculate the memory access time t1 = S1/V1 for reading the original ROI from the HBM by discontinuous access.
Step 505: calculate the memory access time t2 = S2/V2 + S1/V3 for reading the target ROI from the HBM into the on-chip buffer by continuous access and then reading the original ROI from the on-chip buffer by discontinuous access.
Step 506: determine whether t1 is greater than t2; if so, execute step 507; otherwise, execute step 509.
Step 507: read the target ROI from the HBM into the on-chip buffer by continuous access.
Step 508: read the original ROI from the on-chip buffer by discontinuous access for calculation.
Step 509: read the original ROI from the HBM by discontinuous access for calculation.
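The fig. 5 flow can be sketched in Python as a small cost model. This is a minimal, hypothetical illustration: the `AccessParams` fields, the function names and the `"buffered"`/`"direct"` labels are assumptions introduced here, and the sizes and speeds in the usage line are made-up numbers in consistent units, not measurements from the application.

```python
from dataclasses import dataclass


@dataclass
class AccessParams:
    """Hypothetical container for the quantities set in steps 501-503.

    Sizes and speeds only need consistent units (e.g. bytes and bytes/us).
    """
    s1: float  # region size of the original ROI
    s2: float  # region size of the expanded target ROI
    v1: float  # speed of discontinuous access to the HBM
    v2: float  # speed of continuous HBM -> on-chip buffer reads
    v3: float  # speed of discontinuous access to the on-chip buffer


def first_access_time(p: AccessParams) -> float:
    # Step 504: read the original ROI from the HBM discontinuously.
    return p.s1 / p.v1


def second_access_time(p: AccessParams) -> float:
    # Step 505: continuous HBM -> buffer read of the target ROI,
    # plus a discontinuous buffer read of the original ROI.
    return p.s2 / p.v2 + p.s1 / p.v3


def choose_strategy(p: AccessParams) -> str:
    # Step 506: compare t1 with t2; a tie falls back to the direct path.
    if first_access_time(p) > second_access_time(p):
        return "buffered"  # steps 507-508
    return "direct"  # step 509


# Made-up numbers: a 6-unit ROI expanded to 10 units, slow scattered HBM access.
print(choose_strategy(AccessParams(s1=6, s2=10, v1=1, v2=8, v3=4)))  # buffered
```

Here t1 = 6 and t2 = 10/8 + 6/4 = 2.75, so the buffered path is chosen; raising V1 to 4 makes t1 = 1.5 and the sketch switches to the direct path.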
In this embodiment of the application, the first memory access time for reading the original ROI from the HBM by discontinuous access is calculated in advance, as is the second memory access time for reading the target ROI from the HBM into the on-chip buffer by continuous access and then reading the original ROI from the on-chip buffer by discontinuous access. Comparing the first and second memory access times determines which of the two access modes performs better, and the better one is selected to execute the image processing operator, thereby improving the performance of the image processing operator.
In some embodiments, as an alternative to the scheme described above (reading the target ROI from the memory to the on-chip buffer in the continuous memory access mode and then reading the original ROI from the on-chip buffer to the execution units in the discontinuous memory access mode), the target ROI may be read directly from the memory to the execution units in the continuous memory access mode without using the on-chip buffer. Each execution unit then removes the expanded region from the data it has read and performs calculation on the original ROI.
Specifically, tasks are allocated uniformly to the plurality of execution units based on the region size of the target ROI, and each of the execution units performs the following operations:
The execution unit reads, in the continuous memory access mode, the first data block corresponding to its allocated task from the target ROI stored in the memory, and extracts from the first data block the second data block associated with the original ROI for calculation.
In a specific implementation, because the expanded target ROI is contiguous in memory, it is divided uniformly into a plurality of first data blocks based on its region size and the number of execution units, with as many first data blocks as there are execution units; each execution unit is then assigned the task of reading one first data block.
For each execution unit, after receiving the task, the execution unit adopts a continuous memory access mode to read a first data block corresponding to the allocated task from a target ROI in the memory. Since the first data block may contain the expanded data, the expanded data in the first data block is culled, and the second data block associated with the original ROI is retained. And executing an image processing operator based on the second data block to obtain a processing result.
For example, referring to fig. 6, in HBM, the original ROI includes: pixel 6, pixel 7, pixel 8, pixel 11, pixel 12, pixel 13.
The original ROI is expanded to obtain the target ROI, which comprises: pixel 5, pixel 6, pixel 7, pixel 8, pixel 9, pixel 10, pixel 11, pixel 12, pixel 13 and pixel 14.
Uniformly dividing the target ROI into a data block A and a data block B, wherein the data block A comprises: pixel 5, pixel 6, pixel 7, pixel 8, pixel 9; the data block B includes: pixel 10, pixel 11, pixel 12, pixel 13, and pixel 14. The read task of data block a is assigned to execution unit 1 and the read task of data block B is assigned to execution unit 2.
Execution unit 1 reads data block A from the HBM in the continuous memory access mode, removes the expanded data in data block A (i.e., pixel 5 and pixel 9) to obtain data block C (pixel 6, pixel 7 and pixel 8), and executes the image processing operator on data block C to obtain a processing result.
Execution unit 2 reads data block B from the HBM in the continuous memory access mode, removes the expanded data in data block B (i.e., pixel 10 and pixel 14) to obtain data block D (pixel 11, pixel 12 and pixel 13), and executes the image processing operator on data block D to obtain a processing result.
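The fig. 6 example can be reproduced with a short Python sketch of expansion, uniform splitting and culling. It assumes, as an illustration only, that the image is stored row-major, that pixels are numbered from 0 in a 5-pixel-wide image (which is consistent with the pixel numbers in fig. 6), and that expansion widens each ROI row to the full image row; the function names are hypothetical and the application may use other expansion rules. A list slice stands in for one continuous memory read.

```python
def expand_roi(width, roi):
    """Return the [start, end) element offsets of the expanded target ROI.

    Assumption: row-major storage with `width` pixels per row, and the ROI
    (x, y, w, h) widened to whole image rows, as in the fig. 6 example.
    """
    _, y, _, h = roi
    return y * width, (y + h) * width


def split_evenly(start, end, n_units):
    """Split [start, end) into n_units contiguous, near-equal blocks."""
    base, extra = divmod(end - start, n_units)
    blocks, cursor = [], start
    for i in range(n_units):
        size = base + (1 if i < extra else 0)
        blocks.append((cursor, cursor + size))
        cursor += size
    return blocks


def run_unit(image, width, roi, block):
    """One execution unit: contiguous read, then cull the expanded pixels."""
    x, y, w, h = roi
    lo, hi = block
    first_block = image[lo:hi]  # single contiguous (continuous) read
    # Keep only pixels whose row and column lie inside the original ROI.
    return [
        pixel
        for offset, pixel in zip(range(lo, hi), first_block)
        if x <= offset % width < x + w and y <= offset // width < y + h
    ]


# Fig. 6 layout: a 5-pixel-wide image whose pixel values equal their indices.
image = list(range(15))
roi = (1, 1, 3, 2)  # original ROI = pixels 6, 7, 8, 11, 12, 13
start, end = expand_roi(5, roi)  # target ROI = pixels 5..14
for unit, block in enumerate(split_evenly(start, end, 2), start=1):
    print(unit, run_unit(image, 5, roi, block))
```

Running it prints `1 [6, 7, 8]` and `2 [11, 12, 13]`, matching data blocks C and D. Because culling checks each offset's row and column, the sketch still works when block boundaries do not coincide with image rows.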
In this embodiment of the application, the target ROI is read from the memory to the execution units in the continuous memory access mode. Each execution unit removes the expanded region from the data it reads to recover its portion of the original ROI and performs calculation on it. This skips writing to and reading from the on-chip buffer, which improves memory access efficiency and, in turn, the performance of the image processing operator. In addition, because the target ROI is contiguous in memory, the reading tasks can be distributed uniformly across the plurality of execution units, which avoids load imbalance among them and improves the utilization of hardware resources.
In some embodiments, the size of the expanded region affects performance. The smaller the region added when expanding the original ROI, the less extra data each execution unit reads and then removes before calculation, so less additional time is spent and operator performance is better. Conversely, the larger the expanded region, the more extra data is read and removed and the more additional time is spent, so operator performance drops; in some cases it is even lower than reading the original ROI directly from the memory in the discontinuous memory access mode. Therefore, in practical applications, whether to read the expanded target ROI from the memory in the continuous memory access mode should be decided according to the actual situation.
In this embodiment of the application, when the first memory access time is greater than the third memory access time, the target ROI is read from the memory in the continuous memory access mode, and the original ROI is then extracted from the target ROI for calculation.
When the first memory access time is less than or equal to the third memory access time, the original ROI is read directly from the memory in the discontinuous memory access mode for calculation. Here, the first memory access time is the time to read the original ROI from the memory in the discontinuous memory access mode, and the third memory access time is the total time to read the target ROI from the memory in the continuous memory access mode and extract the original ROI from the target ROI.
The calculation of the first memory access time has been described above and is not repeated here. The third memory access time is obtained as follows:
The continuous memory access time is determined based on the region size of the target ROI and the speed of reading data from the memory in the continuous memory access mode. The extraction time for extracting the original ROI from the target ROI is determined based on the size of the expanded region (the part of the target ROI outside the original ROI) and the data culling speed. The continuous memory access time and the extraction time are summed to obtain the third memory access time.
In a specific implementation, dividing the region size of the target ROI by the speed of reading data from the memory in the continuous memory access mode gives the continuous memory access time, and dividing the size of the expanded region by the data culling speed gives the extraction time.
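The comparison between the first and third memory access times can be sketched as follows. This is a hedged illustration: the function and parameter names are hypothetical, the expanded region size is taken as target size minus original size, and the numbers in the usage lines are made up (in consistent units) rather than taken from the application.

```python
def first_access_time(s_orig, v_discontinuous):
    # Read the original ROI straight from memory with discontinuous access.
    return s_orig / v_discontinuous


def third_access_time(s_orig, s_target, v_continuous, v_cull):
    # Continuous read of the whole target ROI ...
    continuous_time = s_target / v_continuous
    # ... plus the time to cull the expanded region (target minus original).
    extraction_time = (s_target - s_orig) / v_cull
    return continuous_time + extraction_time


def read_target_contiguously(s_orig, s_target, v_discontinuous, v_continuous, v_cull):
    """True if the continuous-read-and-cull path is expected to be faster."""
    t1 = first_access_time(s_orig, v_discontinuous)
    t3 = third_access_time(s_orig, s_target, v_continuous, v_cull)
    return t1 > t3


# Made-up numbers: a small expansion pays off, a large one does not.
print(read_target_contiguously(6, 10, 1, 8, 4))   # True  (t1 = 6, t3 = 2.25)
print(read_target_contiguously(6, 100, 1, 8, 4))  # False (t1 = 6, t3 = 36.0)
```

The second call illustrates the trade-off described above: the larger the expanded region, the more the culling term grows, until reading the original ROI discontinuously becomes the faster choice.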
In this embodiment of the application, the first memory access time for reading the original ROI from the HBM by discontinuous access is calculated in advance, as is the third memory access time for reading the target ROI from the HBM by continuous access and extracting the original ROI from it. Comparing the first and third memory access times determines which access mode performs better, and the better one is selected to execute the image processing operator, thereby improving the performance of the image processing operator.
Based on the same technical concept, an embodiment of the present application provides a data access device. As shown in the schematic structural diagram of fig. 7, the data access device 700 includes:
an expansion module 701, configured to expand an original ROI of an image to be processed stored in a memory, to obtain a target ROI, where the target ROI is continuous in the memory;
The memory access module 702 is configured to read the target ROI from the memory by adopting a continuous memory access manner, and calculate the original ROI contained in the read target ROI through a plurality of execution units.
Optionally, the memory access module 702 is specifically configured to:
reading the target ROI from the memory to an on-chip buffer in a continuous memory access mode, where the target ROI comprises the original ROI;
and reading the data blocks in the original ROI from the on-chip buffer to the plurality of execution units in a discontinuous memory access mode for calculation.
Optionally, the memory access module 702 is specifically configured to:
based on the region size of the target ROI, uniformly distributing corresponding tasks for the plurality of execution units;
for the plurality of execution units, the following operations are respectively executed:
reading, by an execution unit and in a continuous memory access mode, the data block corresponding to the allocated task from the target ROI stored in the memory into the on-chip buffer.
Optionally, the memory access module 702 is further configured to:
before the target ROI is read from the memory to the on-chip buffer in a continuous memory access mode, determine that the first memory access time is greater than the second memory access time, where the first memory access time refers to the memory access time of reading the original ROI from the memory in a discontinuous memory access mode;
and the second memory access time refers to the total memory access time of reading the target ROI from the memory to the on-chip buffer in a continuous memory access mode and then reading the original ROI from the on-chip buffer in a discontinuous memory access mode.
Optionally, the memory access module 702 is specifically configured to:
determine the first memory access time based on the region size of the original ROI and the first speed of accessing the memory in a discontinuous memory access mode.
Optionally, the memory access module 702 is specifically configured to:
Determining continuous memory access time based on the region size of the target ROI and a second speed of reading data from the memory to the on-chip buffer area by adopting a continuous memory access mode;
determining discontinuous memory access time based on the region size of the original ROI and a third speed of accessing the on-chip buffer area by adopting a discontinuous memory access mode;
summing the continuous memory access time and the discontinuous memory access time to obtain the second memory access time.
Optionally, the memory access module 702 is specifically configured to:
based on the region size of the target ROI, uniformly distributing corresponding tasks for the plurality of execution units;
for the plurality of execution units, the following operations are respectively executed:
Reading a first data block corresponding to the allocated task from the target ROI stored in the memory by adopting a continuous memory access mode through an execution unit; and extracting a second data block associated with the original ROI from the first data block for calculation.
Optionally, the memory access module 702 is further configured to:
before corresponding tasks are uniformly distributed to the plurality of execution units based on the region size of the target ROI, determine that the first memory access time is greater than the third memory access time, where the first memory access time refers to the memory access time of reading the original ROI from the memory in a discontinuous memory access mode;
and the third memory access time refers to the total time of reading the target ROI from the memory in a continuous memory access mode and extracting the original ROI from the target ROI.
In the embodiment of the application, the original ROI of the image to be processed stored in the memory is expanded to obtain the target ROI, so that the target ROI is continuous in the memory. And then, a target ROI is read from the memory in a continuous memory access mode, wherein the target ROI comprises the original ROI. Compared with the method for reading data from the memory by adopting a discontinuous memory access mode, the continuous memory access mode adopted by the application greatly improves the memory access efficiency, and simultaneously improves the efficiency of calculating the original ROI, thereby improving the performance of an image processing operator in an image processing library.
Secondly, since the target ROI is continuous in the memory, the reading tasks can be uniformly distributed to the plurality of execution units, so that the problem of unbalanced tasks on the plurality of execution units is effectively avoided, and the utilization rate of hardware resources is improved.
Based on the same technical concept, an embodiment of the present application provides a computer device, as shown in fig. 8, including at least one processor chip 100 and a memory 801 connected to the at least one processor chip 100. The embodiments of the present application do not limit the specific connection medium between the processor chip 100 and the memory 801; in fig. 8, the processor chip 100 and the memory 801 are connected by a bus as an example. The bus may be divided into an address bus, a data bus, a control bus, and so on.
In the embodiment of the present application, the memory 801 stores instructions executable by the at least one processor chip 100, and the at least one processor chip 100 can perform the steps of the data access method described above by executing the instructions stored in the memory 801.
The processor chip 100 is the control center of the computer device. It may connect the various parts of the computer device through various interfaces and lines, and implements data access by running or executing the instructions stored in the memory 801 and calling the data stored in the memory 801. Optionally, the processor chip 100 may include one or more processing units and may integrate an application processor and a modem processor, where the application processor mainly handles the operating system, user interface, application programs and the like, and the modem processor mainly handles wireless communication. It will be appreciated that the modem processor need not be integrated into the processor chip 100. In some embodiments, the processor chip 100 and the memory 801 may be implemented on the same chip; in other embodiments, they may be implemented on separate chips.
The processor chip 100 may be a general-purpose processor such as a central processing unit (CPU), a digital signal processor, an application-specific integrated circuit (ASIC), a field programmable gate array or other programmable logic device, discrete gate or transistor logic, or discrete hardware components, and may implement or perform the methods, steps and logic blocks disclosed in the embodiments of the application. A general-purpose processor may be a microprocessor or any conventional processor. The steps of the methods disclosed in connection with the embodiments of the present application may be performed directly by a hardware processor, or by a combination of hardware and software modules in the processor.
The memory 801, as a non-volatile computer-readable storage medium, may be used to store non-volatile software programs, non-volatile computer-executable programs and modules. The memory 801 may include at least one type of storage medium, for example flash memory, a hard disk, a multimedia card, card-type memory, random access memory (RAM), static random access memory (SRAM), programmable read-only memory (PROM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), magnetic memory, a magnetic disk, an optical disk, and the like. The memory 801 may also be any other medium that can carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer device, but is not limited thereto. The memory 801 in the embodiments of the present application may also be a circuit or any other device capable of implementing a storage function, for storing program instructions and/or data.
Based on the same inventive concept, an embodiment of the present application provides a computer-readable storage medium storing a computer program executable by a computer device, which when run on the computer device causes the computer device to perform the steps of the above-described data access method.
Based on the same inventive concept, embodiments of the present application provide a computer program product comprising a computer program stored on a computer readable storage medium, the computer program comprising program instructions which, when executed by a computer device, cause the computer device to perform the steps of the data access method described above.
It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, or as a computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer device or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer device or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer device or other programmable apparatus to produce a computer device implemented process such that the instructions which execute on the computer device or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following claims be interpreted as including the preferred embodiments and all such alterations and modifications as fall within the scope of the invention.
It will be apparent to those skilled in the art that various modifications and variations can be made to the present invention without departing from the spirit or scope of the invention. Thus, it is intended that the present invention also include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.
Claims (11)
1. A method of data access, comprising:
expanding an original region of interest of an image to be processed stored in a memory to obtain a target region of interest, wherein the target region of interest is continuous on the memory;
and reading the target region of interest from the memory by adopting a continuous memory access mode, and calculating the original region of interest contained in the read target region of interest through a plurality of execution units.
2. The method of claim 1, wherein the reading the target region of interest from the memory in a continuous access manner and calculating, by a plurality of execution units, the original region of interest included in the read target region of interest includes:
reading the target region of interest from the memory to an on-chip buffer area in a continuous memory access mode, wherein the target region of interest comprises the original region of interest;
and reading the data blocks in the original region of interest from the on-chip buffer area to the plurality of execution units in a discontinuous memory access mode for calculation.
3. The method of claim 2, wherein the reading the target region of interest from the memory to the on-chip buffer using a continuous access method comprises:
Based on the region size of the target region of interest, uniformly distributing corresponding tasks to the plurality of execution units;
for the plurality of execution units, the following operations are respectively executed:
reading, by an execution unit and in a continuous memory access mode, the data block corresponding to the allocated task from the target region of interest stored in the memory into the on-chip buffer area.
4. The method of claim 2, wherein before the step of reading the target region of interest from the memory to the on-chip buffer by using the continuous access method, further comprises:
determining that the first access time is greater than the second access time;
Wherein, the first access time refers to: reading the access time of the original region of interest from the memory by adopting a discontinuous access mode;
the second memory access time refers to: the total memory access time of reading the target region of interest from the memory to the on-chip buffer area in a continuous memory access mode and reading the original region of interest from the on-chip buffer area in a discontinuous memory access mode.
5. The method of claim 4, wherein the first access time is obtained using:
determining the first memory access time based on the region size of the original region of interest and a first speed of accessing the memory in a discontinuous memory access mode.
6. The method of claim 4, wherein the second access time is obtained using:
determining continuous memory access time based on the region size of the target region of interest and a second speed of reading data from the memory to the on-chip buffer area by adopting a continuous memory access mode;
determining discontinuous memory access time based on the region size of the original region of interest and a third speed of accessing the on-chip buffer area in a discontinuous memory access mode;
summing the continuous memory access time and the discontinuous memory access time to obtain the second memory access time.
7. The method of claim 1, wherein the reading the target region of interest from the memory in a continuous access manner and calculating, by a plurality of execution units, the original region of interest included in the read target region of interest includes:
Based on the region size of the target region of interest, uniformly distributing corresponding tasks to the plurality of execution units;
for the plurality of execution units, the following operations are respectively executed:
reading, by an execution unit and in a continuous memory access mode, a first data block corresponding to the allocated task from the target region of interest stored in the memory; and extracting a second data block associated with the original region of interest from the first data block for calculation.
8. The method of claim 7, wherein prior to uniformly distributing the respective tasks to the plurality of execution units based on the region size of the target region of interest, further comprising:
determining that the first access time is greater than the third access time;
Wherein, the first access time refers to: reading the access time of the original region of interest from the memory by adopting a discontinuous access mode;
the third access time refers to: the total time of reading the target region of interest from the memory in a continuous memory access mode and extracting the original region of interest from the target region of interest.
9. A computer device comprising a memory, a processor chip and a computer program stored on the memory and executable on the processor chip, characterized in that the processor chip implements the steps of the method according to any of claims 1-8 when the computer program is executed.
10. A computer readable storage medium, characterized in that it stores a computer program executable by a computer device, which computer program, when run on the computer device, causes the computer device to perform the steps of the method according to any one of claims 1-8.
11. A computer program product comprising a computer program stored on a computer readable storage medium, the computer program comprising program instructions which, when executed by a computer device, cause the computer device to carry out the steps of the method according to any one of claims 1-8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410369923.8A CN117971501B (en) | 2024-03-28 | 2024-03-28 | Data access method, device, storage medium and program product |
Publications (2)
Publication Number | Publication Date |
---|---|
CN117971501A (en) | 2024-05-03 |
CN117971501B CN117971501B (en) | 2024-07-09 |
Family ID: 90848176
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202410369923.8A Active CN117971501B (en) | Data access method, device, storage medium and program product | 2024-03-28 | 2024-03-28 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117971501B (en) |
Also Published As
Publication number | Publication date |
---|---|
CN117971501B (en) | 2024-07-09 |
Similar Documents
Publication | Title |
---|---|
US20090300621A1 (en) | Local and Global Data Share |
US11455781B2 (en) | Data reading/writing method and system in 3D image processing, storage medium and terminal |
CN115035128B (en) | Image overlapping sliding window segmentation method and system based on FPGA |
WO2019184888A1 (en) | Image processing method and apparatus based on convolutional neural network |
CN111340790B (en) | Bounding box determination method, device, computer equipment and storage medium |
CN117971501B (en) | Data access method, device, storage medium and program product |
CN118193410A (en) | Execution method, equipment and storage medium of memory handling operator |
CN114372928A (en) | Data processing method and device and electronic equipment |
CN114648105A (en) | Slicing method, device, chip and storage medium of multi-output neural network |
CN111062473B (en) | Data calculation method, image processing method and device in neural network model |
CN115049531B (en) | Image rendering method and device, graphic processing equipment and storage medium |
CN116263982B (en) | Graphics processor, system, method, electronic device and apparatus |
KR101204866B1 (en) | Method and apparatus of executing pixel calculation within window area at high speed in window-based image processing |
CN115049529A (en) | Image gradient determination method, device, equipment and storage medium |
CN113469282B (en) | Feature comparison method, device and system |
CN118093452B (en) | Memory architecture mapping method, device, storage medium and program product |
CN112256431B (en) | Cost aggregation method and device, storage medium and terminal |
CN116228634B (en) | Distance transformation calculation method, application, terminal and medium for image detection |
CN115546520A (en) | Image matching method and device and electronic equipment |
CN117422608A (en) | Image guided filtering method and system |
CN118608376A (en) | Execution method, device and storage medium of picture scaling fusion operator |
CN118397298B (en) | Self-attention space pyramid pooling method based on mixed pooling and related components |
CN118505513A (en) | Execution method, device, storage medium and program product of rotation operator |
CN118672589A (en) | Register resource allocation method, device, storage medium and program product |
CN118608746A (en) | Detection frame screening method, detection frame screening device, computer equipment and storage medium |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |