CN118333840A - Image remap optimization method, system, equipment and medium suitable for DSP - Google Patents

Publication number
CN118333840A
Authority
CN
China
Prior art keywords
dsp
data
image
coordinates
maximum
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202410391801.9A
Other languages
Chinese (zh)
Inventor
刘东东
黄佳雯
陈一
汪博
朱力
吕方璐
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing Guangjian Aoshen Technology Co ltd
Shenzhen Guangjian Technology Co Ltd
Original Assignee
Chongqing Guangjian Aoshen Technology Co ltd
Shenzhen Guangjian Technology Co Ltd
Filing date
Publication date
Application filed by Chongqing Guangjian Aoshen Technology Co ltd and Shenzhen Guangjian Technology Co Ltd

Abstract

The invention provides an image remap optimization method, system, equipment and medium applicable to a DSP. The method comprises the following steps. Step S1: determining the maximum difference between the dst coordinates of all pixels and the corresponding src coordinates, and judging whether the maximum difference is smaller than a maximum instruction range, so as to determine a maximum coordinate difference M; the maximum instruction range is the maximum amount of data that a VPLD instruction can process; the src coordinates include mapx(i, j) and mapy(i, j). Step S2: dividing the image into a plurality of blocks according to the dst coordinates, wherein each block has N rows, N being determined by the on-chip memory size of the DSP, and m columns, m being the greatest common divisor of a plurality of M values. Step S3: copying mapx(i, j), mapy(i, j) and the src data corresponding to the block into the DSP. Step S4: performing parallel computation in the DSP and copying the dst data out to the CPU. The invention makes maximal use of computing resources and maximizes efficiency.

Description

Image remap optimization method, system, equipment and medium suitable for DSP
Technical Field
The invention relates to the technical field of image processing, and in particular to an image remap optimization method, system, equipment and medium applicable to a digital signal processor (DSP).
Background
DSP technology, that is, digital signal processing technology, is built around a microprocessor chip dedicated to processing digital signals.
The core of DSP technology is the DSP chip, which is designed to execute digital signal processing algorithms efficiently. These algorithms typically involve audio, image, video and other forms of signal processing. Internally, DSP chips use a Harvard architecture, meaning that they have separate program and data memories, together with dedicated hardware multipliers and pipelining, which allow them to run a variety of complex digital signal processing algorithms quickly.
DSP technology has been used in a wide variety of applications including, but not limited to, the following:
Communication system: DSP techniques are used for encoding, decoding, modulation, and demodulation of signals in wireless communications, network transmissions, and the like.
Consumer electronics: in smart phones, music players, etc., DSP technology is used for sound processing and image processing.
Computer application: in the field of sound cards, graphics cards, etc., DSP technology is used to improve the quality of audio and video.
Automotive electronics: in smart automobiles, DSP technology is used for in-vehicle infotainment systems and Advanced Driving Assistance Systems (ADAS).
Medical equipment: DSP technology is used for signal analysis and processing in medical imaging, electrocardiographs, and the like.
With the development of technology, DSP technology is advancing, and the application field is expanding. For example, in smart homes, DSP technology may be used for speech recognition and control; in the biotechnology field, DSP technology can be used to analyze gene expression patterns. In addition, the education field has also conducted extensive teaching and application research on DSP technology to foster more specialized talents who understand digital signal processing.
A DSP platform can store only a limited amount of data on chip, for example 512K, which is far less than other platforms. In addition, the parallel lookup instruction limits the size of the index range; with int16 indices, for example, only a 16-bit index range can be addressed. Because the computing power of the hardware far exceeds what the storage of the DSP platform can supply, the data read by a parallel instruction on the DSP platform are far fewer than the computing power could handle, and the parallel computing power cannot be fully used. Compared with other platforms, however, a DSP platform supports parallel lookup and can operate on up to 16 data elements in parallel, so its parallel computing capability is strong.
As technology advances, images keep getting larger. Because its memory is comparatively small, a DSP platform often processes such images inefficiently; even with high computing power, it cannot bring that advantage to bear, so computing efficiency is low.
The foregoing background is provided only to aid understanding of the inventive concept and technical solution of the present application. It does not necessarily belong to the prior art of the present application and, absent clear evidence that the above content was disclosed before the filing date of the present application, should not be used to evaluate the novelty and inventiveness of the present application.
Disclosure of Invention
Aiming at the defects in the prior art, the invention aims to provide an image remap optimization method, system, equipment and medium applicable to DSP.
The image remap optimization method applicable to a DSP provided by the invention comprises the following steps:
Step S1: determining the maximum difference between the dst coordinates of all pixels and the corresponding src coordinates, and judging whether the maximum difference is smaller than a maximum instruction range, so as to determine a maximum coordinate difference M; the maximum instruction range is the maximum amount of data that a VPLD instruction can process; the src coordinates include mapx(i, j) and mapy(i, j);
Step S2: dividing the image into a plurality of blocks according to the dst coordinates; wherein each of the blocks has N rows, N being determined by the on-chip memory size of the DSP; each of the blocks has m columns, m being the greatest common divisor of a plurality of M values;
Step S3: copying mapx(i, j), mapy(i, j) and the src data corresponding to the block into the DSP;
Step S4: performing parallel computation in the DSP and copying the dst data out to the CPU.
Optionally, in the image remap optimization method applicable to a DSP, step S3 includes:
Step S31: copying mapx(i, j), mapy(i, j) and the src data corresponding to the block into the DSP according to a start index;
Step S32: extending the rows upward or downward using the index corresponding to the dst coordinates; if the extended rows do not exceed the image range, updating the start index and executing step S31; if the extended rows exceed the image range, executing step S33;
Step S33: changing the column of the start index, resetting the row coordinate, and executing step S31.
Optionally, the image remap optimization method applicable to a DSP further includes:
Step S34: if the column of the start index has traversed all columns and row data remain unprocessed, copying the data of the remaining rows corresponding to mapx(i, j), mapy(i, j) and the src data into the DSP in consecutive runs of N×m.
Optionally, in the image remap optimization method applicable to a DSP, the DSP includes a first partition and a second partition, and step S3 and step S4 do not process the same partition at the same time.
Optionally, in the image remap optimization method applicable to a DSP, while step S4 processes the first partition, step S3 copies data into the second partition in the DSP.
Optionally, in step S4, the first partition and the second partition are processed cyclically.
Optionally, in step S3, M×(2m+n) data adjoining the block range are also copied into the DSP.
The invention provides an image remap optimization system applicable to a DSP, which comprises the following modules:
A size module, configured to determine the maximum difference between the dst coordinates of all pixels and the corresponding src coordinates, and to judge whether the maximum difference is smaller than a maximum instruction range, so as to determine a maximum coordinate difference M; the maximum instruction range is the maximum amount of data that a VPLD instruction can process; the src coordinates include mapx(i, j) and mapy(i, j);
A block module, configured to divide the image into a plurality of blocks according to the dst coordinates; wherein each of the blocks has N rows, N being determined by the on-chip memory size of the DSP; each of the blocks has m columns, m being the greatest common divisor of a plurality of M values;
A transmission module, configured to copy mapx(i, j), mapy(i, j) and the src data corresponding to the block into the DSP;
A calculation module, configured to perform parallel computation in the DSP and copy the dst data out to the CPU.
The invention provides an image remap optimizing device suitable for DSP, comprising:
A processor;
a memory having stored therein executable instructions of the processor;
Wherein the processor is configured to perform the steps of the image remap optimization method applicable to a DSP by executing the executable instructions.
According to the present invention, there is provided a computer-readable storage medium storing a program which, when executed, implements the steps of the image remap optimization method for a DSP.
Compared with the prior art, the invention has the following beneficial effects:
By using parallel instructions on the DSP platform, the invention accesses multiple data at once, for example 16 data out of more than 30,000 within one clock cycle. This addresses the problem that computation is fast but reads are slow, which leaves the available computing power underused.
Through multi-level buffering, the invention runs the computation on the DSP platform in parallel with data copying and hides the copy time, thereby maximizing both the utilization of computing resources and the efficiency.
The invention takes the characteristics of image processing into account and makes full use of the parallel random lookup instruction provided by the DSP, which maximizes parallelism, greatly improves data access efficiency, brings the available computing capacity into full play, raises resource utilization, and improves image processing capability.
The block size is determined from the greatest common divisor obtained where the maximum instruction range is exceeded, so that regions in which parallel computation cannot be performed can be identified and the block size does not need to be adjusted repeatedly during computation. The on-chip memory of the DSP is divided more evenly, the parallel computing capability of the DSP is used to the greatest extent, images are processed quickly, and image processing efficiency is improved.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are required to be used in the embodiments or the description of the prior art will be briefly described below, and it is obvious that the drawings in the following description are only embodiments of the present invention, and that other drawings can be obtained according to the provided drawings without inventive effort for a person skilled in the art. Other features, objects and advantages of the present invention will become more apparent upon reading of the detailed description of non-limiting embodiments, given with reference to the accompanying drawings in which:
FIG. 1 is a flowchart showing steps of an image remap optimization method applicable to a DSP according to an embodiment of the invention;
FIG. 2 is a schematic diagram of coordinate mapping in an embodiment of the present invention;
FIG. 3 is a schematic diagram of image segmentation in an embodiment of the present invention;
FIG. 4 is a schematic diagram of a first partition and a second partition according to an embodiment of the present invention;
FIG. 5 is a flowchart showing steps for copying data into a DSP according to an embodiment of the present invention;
FIG. 6 is a flowchart illustrating another step of copying data into a DSP according to an embodiment of the present invention;
FIG. 7 is a block diagram of an image remap optimization system for DSP according to an embodiment of the invention;
FIG. 8 is a schematic diagram of an image remap optimization device applicable to a DSP according to an embodiment of the invention; and
FIG. 9 is a schematic diagram of a computer-readable storage medium according to an embodiment of the present invention.
11 - src image;
12 - dst image;
13 - mapx;
14 - mapy;
15 - block;
16 - first partition;
17 - second partition.
Detailed Description
The present invention will be described in detail with reference to specific examples. The following examples will assist those skilled in the art in further understanding the present invention, but are not intended to limit the invention in any way. It should be noted that variations and modifications could be made by those skilled in the art without departing from the inventive concept. These are all within the scope of the present invention.
The terms "first," "second," "third," "fourth" and the like in the description and in the claims and in the above drawings, if any, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the invention described herein may be implemented, for example, in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
The technical scheme of the invention is described in detail below by specific examples. The following embodiments may be combined with each other, and some embodiments may not be repeated for the same or similar concepts or processes.
The invention provides an image remap optimization method applicable to a DSP (digital signal processor), which aims to solve the problems in the prior art.
The following describes the technical scheme of the present application and how the technical scheme of the present application solves the above technical problems in detail with specific embodiments. The following embodiments may be combined with each other, and the same or similar concepts or processes may not be described in detail in some embodiments. Embodiments of the present application will be described below with reference to the accompanying drawings.
FIG. 1 is a flowchart of the steps of an image remap optimization method applicable to a DSP in an embodiment of the invention. As shown in FIG. 1, the method provided by the invention includes the following steps:
Step S1: determining the maximum difference between the dst coordinates of all pixels and the corresponding src coordinates, and judging whether the maximum difference is smaller than a maximum instruction range, so as to determine a maximum coordinate difference M; the maximum instruction range is the maximum amount of data that a VPLD instruction can process; the src coordinates include mapx(i, j) and mapy(i, j).
In this step, as shown in FIG. 2, the src image 11 is the original image, and the src coordinates are the coordinates of pixels in the original image. The dst image 12 is the target image to be generated, and the dst coordinates are the image coordinates obtained after transforming the pixels at the corresponding src coordinates. mapx is the map of the abscissa x of the src coordinates, and mapy is the map of the ordinate y of the src coordinates. Each element of mapx and mapy holds only one value, so each map is a matrix of single-valued data. The difference between a dst coordinate and the corresponding src coordinate represents the pixel range that must be covered in the calculation, and the maximum such difference represents the largest pixel range that must be covered. The VPLD instruction is a lookup instruction of the DSP platform. It is a SIMD technique that accesses multiple data at once; for example, 16 data can be accessed simultaneously out of 32768 data in one clock cycle. The figure of 32768 applies to a 16-bit system; for an 8-bit system it is 16384. Parallel computation is possible only when the maximum difference obtained in this step does not exceed the maximum range supported by the instruction, so this step ensures that the data can be computed in parallel. If a difference larger than the maximum instruction range exists, M is determined so that it does not extend past the pixel at which that difference occurs.
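The statistic gathered in step S1 can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the difference is measured per axis (the larger of the horizontal and vertical offsets), the limit of 32768 is taken from the 16-bit example above, and the names `max_coordinate_difference` and `fits_instruction_range` are hypothetical rather than part of the described method.

```python
# Assumed limit, taken from the 16-bit example in the text.
MAX_INSTRUCTION_RANGE = 32768


def max_coordinate_difference(mapx, mapy):
    """Largest per-axis gap between each dst pixel (i, j) and its src pixel.

    mapx[i][j] is the src column and mapy[i][j] the src row for dst pixel (i, j).
    Measuring the gap per axis is an assumption of this sketch.
    """
    worst = 0
    for i, (row_x, row_y) in enumerate(zip(mapx, mapy)):
        for j, (sx, sy) in enumerate(zip(row_x, row_y)):
            worst = max(worst, abs(int(sx) - j), abs(int(sy) - i))
    return worst


def fits_instruction_range(max_diff, limit=MAX_INSTRUCTION_RANGE):
    """True when the lookup can be served by one parallel instruction."""
    return max_diff < limit


# Tiny 2x2 example with made-up map values.
mapx = [[0.0, 2.0], [1.0, 3.0]]
mapy = [[0.0, 0.0], [2.0, 1.0]]
M = max_coordinate_difference(mapx, mapy)
```

In this made-up example the largest offset comes from dst pixel (1, 1), whose src column is 3.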
Step S2: dividing the image into a plurality of blocks according to the dst coordinates; wherein each of the blocks has N rows, N being determined by the on-chip memory size of the DSP; each of the blocks has m columns, m being the greatest common divisor of a plurality of M values.
In this step, as shown in FIG. 3, the image is divided into a plurality of blocks 15 according to the dst coordinates. The block size is m×N. In step S1, when the difference at one dst coordinate exceeds the maximum instruction range, at least M1 and M2 are obtained in that row; when the differences at two dst coordinates exceed the maximum instruction range, at least M1, M2 and M3 are obtained in that row. Because m is the greatest common divisor of the plurality of M values, the block size is fixed, so every region has the same area when the DSP is partitioned, every partition takes the same time to read and compute, and operation is more stable. Taking m as the column width ensures that the block can exactly cover all columns of the image through repeated moves. M is the maximum coordinate difference within the maximum instruction range obtained in step S1, and N is the number of rows computed simultaneously, which is determined by the on-chip memory size of the DSP. Dividing the image in this way improves memory utilization, makes the copied data match what is needed more precisely, and satisfies the data requirements of the calculation.
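As a rough illustration of how a fixed column width follows from several M values, the sketch below takes their greatest common divisor and lists the block start columns. The M values and the image width are made up, and the helper names `block_width` and `column_starts` are hypothetical.

```python
from functools import reduce
from math import gcd


def block_width(m_values):
    """Greatest common divisor of the segment sizes M1, M2, ... found in step S1."""
    return reduce(gcd, m_values)


def column_starts(image_width, m):
    """Start column of each block when blocks of width m tile the image."""
    return list(range(0, image_width, m))


# Made-up values for illustration only.
m = block_width([48, 32, 80])
starts = column_starts(64, m)
```

Because every M value is a multiple of m, a block of width m never straddles the boundary between two segments, which is one way to read the remark that the block size does not have to change during the calculation.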
Step S3: copying mapx(i, j), mapy(i, j) and the src data corresponding to the block into the DSP.
In this step, mapx(i, j), mapy(i, j) and the corresponding src data are copied into the DSP block by block. As shown in FIG. 4, the DSP is divided into at least two partitions, a first partition 16 and a second partition 17. The DSP may of course be divided into more partitions, such as a third, fourth, fifth or sixth partition. This embodiment takes a DSP divided into two partitions as an example. On the first copy, the first block of data is copied to the first partition; once that copy completes, the second block of data is copied to the second partition. When the first partition has been processed in step S4, the third block of data is copied to the first partition, overwriting the original data. When the second partition has been processed in step S4, the fourth block of data is copied to the second partition, overwriting the original data. Copying continues to cycle through the partitions in this way until all data in the image have been copied.
Step S4: performing parallel computation in the DSP and copying the dst data out to the CPU.
In this step, the DSP serves as a secondary cache or multi-level cache. Parallel computing instructions are executed in the DSP, and the resulting dst data are copied out to the CPU. Because bandwidth is the dominant factor, this step applies to computers of differing computing power. The data copied out in this step correspond to the data copied in during step S3, and each copy covers a data range sufficient for one round of parallel computation. Copy-out cycles through the partitions in order: in the preceding example, after the data of the first partition are copied out, the second partition is copied out next, then the first partition again, and so on. After the data are copied to the CPU, parallel computation can also be performed in the CPU, so the DSP platform can be used for parallel image processing and computing resources are fully utilized.
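What the computation on one block amounts to can be sketched in plain Python. This is a nearest-neighbour gather under stated assumptions: the list comprehension merely stands in for a parallel lookup instruction, coordinates outside the image are clamped, and `remap_block` and `clamp` are hypothetical helper names.

```python
def clamp(value, low, high):
    """Keep value inside [low, high]."""
    return max(low, min(value, high))


def remap_block(src, mapx, mapy, row0, col0, n_rows, m_cols):
    """Nearest-neighbour remap of one dst block of up to n_rows x m_cols pixels."""
    height, width = len(src), len(src[0])
    out = []
    for i in range(row0, min(row0 + n_rows, len(mapx))):
        last_col = min(col0 + m_cols, len(mapx[i]))
        # Stand-in for one parallel lookup: every element is gathered independently.
        out.append([
            src[clamp(int(mapy[i][j]), 0, height - 1)][clamp(int(mapx[i][j]), 0, width - 1)]
            for j in range(col0, last_col)
        ])
    return out


# Made-up example: a horizontal flip of a 3x3 image, computing the top two rows.
src = [[10, 11, 12], [20, 21, 22], [30, 31, 32]]
mapx = [[2 - j for j in range(3)] for _ in range(3)]
mapy = [[i] * 3 for i in range(3)]
flipped = remap_block(src, mapx, mapy, row0=0, col0=0, n_rows=2, m_cols=3)
```

Each output element depends only on its own map entries, which is why the lookups within a block can be issued together.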
In some embodiments, in step S3, the rows are further extended upward or downward using the index corresponding to the dst coordinates, and the start index is updated. At the start of the calculation, the region of the dst image containing the top row or the bottom row is taken as the first block, and N is used as the step size each time the start index is updated.
In some embodiments, the DSP includes a first partition and a second partition, and step S3 and step S4 do not process the same partition at the same time. Step S3 copies data into a partition of the DSP, and step S4 copies out the data of a partition that has already been copied in. For example, while step S4 is processing the first partition, step S3 copies data into the second partition. Because step S3 and step S4 work on different partitions, the DSP can perform copy-in and copy-out at the same time, so computation runs in parallel with data copying and the copy time is hidden, thereby maximizing both the utilization of computing resources and the efficiency.
In some embodiments, in step S4, the first partition and the second partition are processed in round-robin fashion. When copying out to the CPU, the partitions are processed cyclically in sequence, so the CPU always has data to process and spends no extra time waiting for data, which improves efficiency.
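The overlap between copy-in and computation can be made concrete with a small schedule simulation. It is a sketch only: it models two partitions as indices 0 and 1, assumes each copy and each computation takes one time step, and the function name `ping_pong_schedule` is hypothetical.

```python
def ping_pong_schedule(num_blocks, num_partitions=2):
    """Return (step, copy_in, compute) tuples for a double-buffered pipeline.

    copy_in and compute are (block index, partition index) pairs, or None when
    that stage is idle. Block k always lives in partition k % num_partitions.
    """
    schedule = []
    for step in range(num_blocks + 1):
        copy_in = (step, step % num_partitions) if step < num_blocks else None
        compute = (step - 1, (step - 1) % num_partitions) if step >= 1 else None
        schedule.append((step, copy_in, compute))
    return schedule


for step, copy_in, compute in ping_pong_schedule(3):
    print(step, "copy-in:", copy_in, "compute:", compute)
```

At every step where both stages are busy they touch different partitions, so the copy for block k+1 is hidden behind the computation on block k.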
In some embodiments, M×(2m+n) data adjoining the block range are also copied into the DSP in step S3. The start and end coordinates must be moved continuously during the calculation. Compared with copying in only N·M data, the additional data ensure that everything needed to compute the data in the block has been copied in, so every calculation within the block has a corresponding data source and errors during the calculation are avoided.
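The idea of copying extra data around a block can be sketched as a padded bounding box. This is an assumption-laden illustration: it pads the block by a margin on every side and clips to the image, and it does not attempt to reproduce the exact data count stated above. The name `padded_region` and the choice of margin are hypothetical.

```python
def padded_region(row0, col0, n_rows, m_cols, margin, height, width):
    """Half-open src bounds (top, bottom, left, right) for a block plus a margin.

    The margin would typically be derived from the maximum coordinate
    difference; that choice is an assumption of this sketch.
    """
    top = max(row0 - margin, 0)
    bottom = min(row0 + n_rows + margin, height)
    left = max(col0 - margin, 0)
    right = min(col0 + m_cols + margin, width)
    return top, bottom, left, right


# Made-up 10x10 image, 3x4 blocks, margin of 2.
corner = padded_region(0, 0, 3, 4, 2, 10, 10)
edge = padded_region(6, 8, 3, 4, 2, 10, 10)
```

Clipping at the image border keeps the copy inside valid memory while still covering every src pixel that a lookup within the block can reach under the assumed margin.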
FIG. 5 is a flowchart illustrating steps for copying data into a DSP according to an embodiment of the present invention. As shown in fig. 5, the steps of copying data into the DSP in the embodiment of the present invention include:
Step S31: copying mapx(i, j), mapy(i, j) and the src data corresponding to the block into the DSP according to a start index.
In this step, the start index marks the data range. The start index determines the range of data copied into the DSP, that is, the range of mapx(i, j), mapy(i, j) and src data corresponding to the block.
Step S32: extending the rows upward or downward using the index corresponding to the dst coordinates; if the extended rows do not exceed the image range, updating the start index and executing step S31; if the extended rows exceed the image range, executing step S33.
In this step, N is used as the step size each time the start index is updated. Extending the rows upward or downward changes the range addressed by the start index, so the rows of the processed data region change while the columns stay the same. For example, with m=4 and N=3, if the data range corresponding to the start index is (1, 1) - (4, 3), updating the start index with a step of 3 gives the data range (1, 4) - (4, 6), and the updated start index lies directly below the previous one. When the number of rows in the dst image is an integer multiple of N, all rows of the image can be computed through repeated iterations. When the number of rows in the dst image is not an integer multiple of N, (N-T) rows remain after repeated iterations, where T < N. In that case, extending the rows upward or downward takes them beyond the image range, and step S33 is performed.
Step S33: changing the column of the start index, resetting the row coordinate, and executing step S31.
In this step, the start index is moved to the following M columns, and the iterative calculation continues upward or downward. In this embodiment, the rows must always be extended in the same direction when the start index is updated; for example, if the first update is downward, every subsequent update is also downward. The default position of the start index is one of the four corners of the image.
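The traversal order implied by steps S31 to S33 can be sketched as a generator of block start indices. The sketch makes several assumptions: it starts at the top-left corner, always moves downward, advances by one block width when changing columns, uses 0-based (row, column) indices rather than the 1-based ranges in the example above, and the name `block_origins` is hypothetical.

```python
def block_origins(height, width, n_rows, m_cols):
    """Yield (row, col) start indices: walk down one column strip, then move right.

    Only full blocks of n_rows rows are yielded; rows left over at the bottom
    are not covered here.
    """
    for col in range(0, width, m_cols):      # S33: change the column, reset the row
        row = 0
        while row + n_rows <= height:        # S32: the extended rows stay in range
            yield row, col                   # S31: copy the block at this start index
            row += n_rows                    # S32: update the start index by N


# Made-up 7x8 image with N=3 rows and m=4 columns per block.
origins = list(block_origins(7, 8, 3, 4))
```

With these made-up sizes the last row of the image is not reached by any full block, which is the situation the remaining-row handling addresses.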
FIG. 6 is a flowchart of another set of steps for copying data into the DSP in an embodiment of the invention. As shown in FIG. 6, compared with the foregoing embodiment, the steps for copying data into the DSP further include:
Step S34: if the column of the start index has traversed all columns and row data remain unprocessed, copying the data of the remaining rows corresponding to mapx(i, j), mapy(i, j) and the src data into the DSP in consecutive runs of N×m.
This step handles rows that have not been fully covered. (N-T) rows remain in total. After the columns have been iterated over, a number of (N-T)-row strips remain; these rows are copied according to the block size m×N, and if data still remain after several iterations, all of the remaining data are copied for parallel computation.
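One possible reading of the remaining-row handling is sketched below: the rows below the last full block are treated as one row-major run of elements and split into chunks of N×m, with a shorter final chunk if needed. Treating the leftover data as row-major offsets is an assumption, and `leftover_chunks` is a hypothetical name.

```python
def leftover_chunks(height, width, n_rows, m_cols):
    """Split the rows below the last full block into runs of n_rows * m_cols elements.

    Returns half-open (start, stop) ranges over row-major element offsets.
    """
    first_row = (height // n_rows) * n_rows      # first row not covered by full blocks
    start, stop = first_row * width, height * width
    size = n_rows * m_cols                       # same element count as a full block
    return [(s, min(s + size, stop)) for s in range(start, stop, size)]


# Made-up sizes: 8 rows of 16 pixels, N=3, m=4, so two rows are left over.
chunks = leftover_chunks(8, 16, 3, 4)
```

Keeping the chunk size equal to a full block means the same partition layout can be reused; only the last chunk is shorter.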
FIG. 7 is a block diagram of an image remap optimization system applicable to a DSP according to an embodiment of the invention. As shown in FIG. 7, the image remap optimization system applicable to a DSP provided by the invention comprises the following modules:
A size module, configured to determine the maximum difference between the dst coordinates of all pixels and the corresponding src coordinates, and to judge whether the maximum difference is smaller than a maximum instruction range, so as to determine a maximum coordinate difference M; the maximum instruction range is the maximum amount of data that a VPLD instruction can process; the src coordinates include mapx(i, j) and mapy(i, j);
A block module, configured to divide the image into a plurality of blocks according to the dst coordinates; wherein each of the blocks has N rows, N being determined by the on-chip memory size of the DSP; each of the blocks has m columns, m being the greatest common divisor of a plurality of M values;
A transmission module, configured to copy mapx(i, j), mapy(i, j) and the src data corresponding to the block into the DSP;
A calculation module, configured to perform parallel computation in the DSP and copy the dst data out to the CPU.
In this embodiment, the image is partitioned into blocks using the statistical data together with multi-level caching, which yields the block sizes. Computation on the DSP platform runs in parallel with data copying and the copy time is hidden, so computing resources are used to the fullest and efficiency is maximized. At the same time, parallel instructions on the DSP platform access multiple data at once, which addresses the problem that computation is fast but reads are slow and computing power goes underused. By taking the characteristics of image processing into account, the parallel random lookup instruction provided by the DSP is fully used, parallelism is maximized, data access efficiency is greatly improved, the available computing capacity is brought into full play, resource utilization is raised, and image processing capability is improved.
The embodiment of the invention also provides image remap optimization equipment applicable to a DSP, which comprises a processor and a memory storing executable instructions of the processor, wherein the processor is configured to perform the steps of the image remap optimization method applicable to a DSP by executing the executable instructions.
As described above, in this embodiment the image is partitioned into blocks using the statistical data, which yields the block sizes. Computation on the DSP platform runs in parallel with data copying and the copy time is hidden, so computing resources are used to the fullest and efficiency is maximized. At the same time, parallel instructions on the DSP platform access multiple data at once, which addresses the problem that computation is fast but reads are slow and computing power goes underused. By taking the characteristics of image processing into account, the parallel random lookup instruction provided by the DSP is fully used, parallelism is maximized, data access efficiency is greatly improved, the available computing capacity is brought into full play, resource utilization is raised, and image processing capability is improved.
Those skilled in the art will appreciate that the various aspects of the invention may be implemented as a system, method, or program product. Accordingly, aspects of the invention may be embodied in the following forms: an entirely hardware embodiment, an entirely software embodiment (including firmware, micro-code, etc.), or an embodiment combining hardware and software aspects, which may be referred to herein as a "circuit," "module," or "platform."
Fig. 8 is a schematic structural diagram of an image remap optimization device applicable to a DSP in an embodiment of the present invention. An electronic device 600 according to this embodiment of the invention is described below with reference to fig. 8. The electronic device 600 shown in fig. 8 is merely an example, and should not be construed as limiting the functionality and scope of use of embodiments of the present invention.
As shown in fig. 8, the electronic device 600 is in the form of a general purpose computing device. Components of electronic device 600 may include, but are not limited to: at least one processing unit 610, at least one memory unit 620, a bus 630 connecting the different platform components (including memory unit 620 and processing unit 610), a display unit 640, etc.
The storage unit stores program code that can be executed by the processing unit 610, so that the processing unit 610 performs the steps according to the various exemplary embodiments of the present invention described in the section of this specification on the image remap optimization method suitable for a DSP. For example, the processing unit 610 may perform the steps shown in fig. 1.
The storage unit 620 may include readable media in the form of volatile storage units, such as Random Access Memory (RAM) 6201 and/or cache memory unit 6202, and may further include Read Only Memory (ROM) 6203.
The storage unit 620 may also include a program/utility 6204 having a set (at least one) of program modules 6205, such program modules 6205 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each or some combination of which may include an implementation of a network environment.
Bus 630 may represent one or more of several types of bus structures, including a memory unit bus or memory unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus architectures.
The electronic device 600 may also communicate with one or more external devices 700 (e.g., a keyboard, a pointing device, a Bluetooth device, etc.), with one or more devices that enable a user to interact with the electronic device 600, and/or with any device (e.g., a router, a modem, etc.) that enables the electronic device 600 to communicate with one or more other computing devices. Such communication may occur through an input/output (I/O) interface 650. In addition, the electronic device 600 may communicate with one or more networks, such as a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network such as the Internet, through a network adapter 660. The network adapter 660 may communicate with other modules of the electronic device 600 over the bus 630. It should be appreciated that, although not shown in fig. 8, other hardware and/or software modules may be used in connection with the electronic device 600, including, but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, data backup storage platforms, and the like.
An embodiment of the invention also provides a computer-readable storage medium for storing a program which, when executed, implements the steps of the image remap optimization method suitable for a DSP. In some possible embodiments, the aspects of the present invention may also be implemented in the form of a program product comprising program code which, when the program product is run on a terminal device, causes the terminal device to carry out the steps according to the various exemplary embodiments of the invention described in the section of this specification on the image remap optimization method suitable for a DSP.
As shown above, in this embodiment the image is partitioned into blocks using statistical data, which yields the sizes of the different blocks. Computation on the DSP platform then runs in parallel with data copying, so that the copy time is hidden and computing resources are used as fully and efficiently as possible. At the same time, parallel instructions on the DSP platform access multiple data items at once, which addresses the problem that computation is fast but reads are slow, leaving computing power underused. By combining the characteristics of image processing with the parallel random lookup instructions provided by the DSP, parallelism is maximized, data access efficiency is greatly improved, computing capacity is fully exploited, and both resource utilization and image processing capacity are improved.
Fig. 9 is a schematic structural diagram of a computer-readable storage medium in an embodiment of the present invention. Referring to fig. 9, a program product 800 for implementing the above method according to an embodiment of the present invention is described. It may take the form of a portable compact disc read-only memory (CD-ROM) containing program code, and may be run on a terminal device such as a personal computer. However, the program product of the present invention is not limited thereto; in this document, a readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium can be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include the following: an electrical connection having one or more wires, a portable disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The computer readable storage medium may include a data signal propagated in baseband or as part of a carrier wave, with readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A readable storage medium may also be any readable medium that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a readable storage medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations of the present invention may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java, C++ or the like, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. Where a remote computing device is involved, the remote computing device may be connected to the user's computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., via the Internet using an Internet service provider).
In the present specification, the embodiments are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and for identical or similar parts the embodiments may be referred to one another. The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
The foregoing describes specific embodiments of the present invention. It is to be understood that the invention is not limited to the particular embodiments described above, and that various changes and modifications may be made by one skilled in the art within the scope of the claims without affecting the spirit of the invention.

Claims (10)

1. An image remap optimization method suitable for a DSP, characterized by comprising the following steps:
Step S1: determining the maximum difference between the dst coordinates of all pixels and the corresponding src coordinates, and judging whether the maximum difference is smaller than a maximum instruction range, so as to determine a maximum coordinate difference value M; the maximum instruction range is the maximum amount of data that a VPLD instruction can process; the src coordinates include mapx (i, j) and mapy (i, j);
Step S2: dividing the image into a plurality of blocks according to the dst coordinates; wherein each of the blocks has N rows, N being determined by the on-chip memory size of the DSP; and each of the blocks has m columns, m being the greatest common divisor of a plurality of M;
Step S3: copying the mapx (i, j), mapy (i, j) and src data corresponding to the block into the DSP;
Step S4: performing parallel computation in the DSP and copying the dst data out to the CPU.
2. The image remap optimization method suitable for a DSP according to claim 1, characterized in that step S3 comprises:
Step S31: copying the mapx (i, j), mapy (i, j) and src data corresponding to the block into the DSP according to a start index;
Step S32: extending the rows upward or downward using the index corresponding to the dst coordinates; if the extended rows do not exceed the image range, updating the start index and executing step S31; if the extended rows exceed the image range, executing step S33;
Step S33: changing the column of the start index, resetting the row coordinates, and executing step S31.
3. The image remap optimization method suitable for a DSP according to claim 2, characterized by further comprising:
Step S34: if the columns of the start index have traversed all columns and row data remain unprocessed, copying consecutive N x m rows of the corresponding mapx (i, j), mapy (i, j) and src data into the DSP.
4. The image remap optimization method suitable for a DSP according to claim 1, characterized in that the DSP includes a first partition and a second partition, and step S3 and step S4 do not process the same one of the first partition and the second partition at the same time.
5. The image remap optimization method suitable for a DSP according to claim 4, characterized in that while step S4 is performed on the first partition, step S3 copies data into the second partition of the DSP.
6. The image remap optimization method suitable for a DSP according to claim 4, characterized in that in step S4 the first partition and the second partition are processed alternately in a round-robin manner.
7. The image remap optimization method suitable for a DSP according to claim 1, characterized in that in step S3, M (2m+n) data adjacent to the block range are copied into the DSP together.
8. An image remap optimization system suitable for a DSP, characterized by comprising the following modules:
a size module, configured to determine the maximum difference between the dst coordinates of all pixels and the corresponding src coordinates, and to judge whether the maximum difference is smaller than a maximum instruction range, so as to determine a maximum coordinate difference value M; the maximum instruction range is the maximum amount of data that a VPLD instruction can process; the src coordinates include mapx (i, j) and mapy (i, j);
a block module, configured to divide the image into a plurality of blocks according to the dst coordinates; wherein each of the blocks has N rows, N being determined by the on-chip memory size of the DSP; and each of the blocks has m columns, m being the greatest common divisor of a plurality of M;
a transmission module, configured to copy the mapx (i, j), mapy (i, j) and src data corresponding to the block into the DSP;
a calculation module, configured to perform parallel computation in the DSP and to copy the dst data out to the CPU.
9. An image remap optimization device suitable for a DSP, characterized by comprising:
a processor;
a memory in which executable instructions of the processor are stored;
wherein the processor is configured to perform the steps of the image remap optimization method suitable for a DSP according to any one of claims 1 to 7 by executing the executable instructions.
10. A computer-readable storage medium storing a program, characterized in that the program, when executed, implements the steps of the image remap optimization method suitable for a DSP according to any one of claims 1 to 7.
CN202410391801.9A 2024-04-02 Image remap optimization method, system, equipment and medium suitable for DSP Pending CN118333840A (en)

Publications (1)

Publication Number Publication Date
CN118333840A (en) 2024-07-12

Legal Events

Date Code Title Description
PB01 Publication