CN107527320B - Method for accelerating bilinear interpolation calculation - Google Patents
- Publication number
- CN107527320B (application CN201610479164.6A)
- Authority
- CN
- China
- Prior art keywords
- instruction
- data
- bilinear interpolation
- pixels
- epi32
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T3/00—Geometric image transformations in the plane of the image
- G06T3/40—Scaling of whole images or parts thereof, e.g. expanding or contracting
- G06T3/4007—Scaling of whole images or parts thereof, e.g. expanding or contracting based on interpolation, e.g. bilinear interpolation
Landscapes
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Image Processing (AREA)
Abstract
Bilinear interpolation is widely used for image scaling, but the algorithm involves floating-point operations and a large number of multiplications. The amount of computation grows with image size, which makes real-time requirements difficult to meet and leads to high power consumption and low processing speed in the final chip implementation. A first step in accelerating bilinear interpolation is to eliminate the floating-point arithmetic; the invention instead uses SSE instructions, which is faster than floating-point removal alone. SSE on the x86 platform can process 128 bits of data at once, and experimental results show that the SSE-based method of the invention is more than twice as fast as the original algorithm.
Description
Technical Field
The invention relates to a method for accelerating a bilinear interpolation algorithm in the field of image processing.
Background
Image scaling is one of the basic operations in image processing. Many scaling algorithms exist; common ones include the nearest-neighbor method, edge-based algorithms, and bilinear interpolation. The nearest-neighbor algorithm is the simplest but gives very poor scaling quality. Edge-based algorithms give good quality but are complex and difficult to implement. Bilinear interpolation is a compromise between quality and complexity, and is therefore the most widely used.
Bilinear interpolation uses the four real pixel values surrounding a virtual point in the source image to determine one pixel value in the target image, so that the information in the original image is reflected more faithfully. However, the algorithm uses a large number of multiplications and involves floating-point operations, so the amount of computation grows as the image size increases. Because computer vision applications generally have real-time requirements, accelerating the algorithm is a key research topic.
Disclosure of Invention
The invention aims to provide a method for accelerating bilinear interpolation calculation that is at least twice as fast as the conventional method.
The technical problem to be solved is to accelerate the bilinear interpolation algorithm without changing the result.
To solve this problem, the invention adopts the following technical scheme. SSE is a Single Instruction, Multiple Data (SIMD) instruction set on the x86 platform; the invention improves data processing efficiency by having one SSE instruction process multiple data elements. The method mainly comprises the following steps:
(1) the _mm_loadl_epi64 instruction loads two pairs of pixels P12 and P34, where P12 holds pixels P1 and P2 and P34 holds pixels P3 and P4;
(2) the weights W1, W2, W3, W4 are calculated, with W = {W1, W2, W3, W4}: the _mm_mul_ps instruction multiplies W by 256, _mm_cvtps_epi32 converts W to integers, and _mm_packs_epi32 packs the 32-bit data into 16-bit data;
(3) the _mm_unpacklo_epi8 and _mm_unpackhi_epi64 instructions rearrange the data from [RGBA RGBA RGBA RGBA] to [RRRR GGGG BBBB AAAA], i.e. from the AoS (array of structures) layout to the SoA (structure of arrays) layout;
(4) the _mm_unpacklo_epi8 instruction takes the low-order data and widens it to 16 bits to obtain pRG, and the _mm_unpackhi_epi8 instruction takes the high-order data and widens it to 16 bits to obtain pBA;
(5) the _mm_madd_epi16 instruction multiplies pRG and pBA by the weights W and adds adjacent products to obtain outRG and outBA;
(6) the _mm_hadd_epi32 instruction horizontally adds outRG and outBA;
(7) the _mm_packus_epi32, _mm_packus_epi16 and _mm_cvtsi128_si32 instructions convert the final data into a 32-bit integer, which is the RGBA value of the target pixel, and the calculation is complete.
The weights of the four peripheral pixels are calculated with SSE instructions as follows:
(21) the _mm_set_ss instruction loads the floating-point coordinates (x, y), and the _mm_unpacklo_ps instruction interleaves x and y;
(22) the _mm_floor_ps instruction calculates the integer parts ix and iy of x and y, and the _mm_sub_ps instruction calculates the fractional parts fx and fy;
(23) the _mm_sub_ps, _mm_unpacklo_ps, _mm_movelh_ps and _mm_shuffle_ps instructions calculate Wx = [1-fx, fx, 1-fx, fx] and Wy = [1-fy, 1-fy, fy, fy];
(24) the _mm_mul_ps instruction calculates Wx × Wy, which gives the four weights W1, W2, W3 and W4: (1-fx) × (1-fy), fx × (1-fy), (1-fx) × fy and fx × fy. The weight calculation is then complete and the values are returned.
Drawings
FIG. 1 shows an example of the position of a new target-image pixel within a 2 × 2 region of the source image.
FIG. 2 is a flow chart of bilinear interpolation calculation according to an embodiment of the present invention.
FIG. 3 is a flowchart illustrating weight calculation of neighboring pixels in bilinear interpolation according to an embodiment of the present invention.
Detailed Description
The technical solution of the invention is described in detail below by way of example. First, the bilinear interpolation algorithm itself: when an image is scaled by bilinear interpolation, a pixel value in the target image is determined by the pixel values of the four real points surrounding the corresponding virtual point in the source image. Calculating the value of one pixel in the target image mainly comprises the following steps:
1. determine the coordinates of the corresponding virtual point P(x, y) in the source image;
fx = frac(x), fy = frac(y)
2. load the four adjacent pixels;
P = [p1, p2, p3, p4]
3. calculate the weight of each of the four pixels;
W = [(1-fx)*(1-fy), fx*(1-fy), (1-fx)*fy, fx*fy]
4. calculate the pixel value in the target image;
N = dot(P, W)
5. return the result N of the bilinear interpolation.
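As a point of reference, steps 1 to 5 can be sketched in scalar C. This is a minimal sketch, not code from the patent: the function name bilinear_ref, the row-major RGBA byte layout, the stride parameter and the restriction to non-negative, in-range coordinates (so that integer truncation equals the floor) are all illustrative assumptions. The placement of p1..p4 as top-left, top-right, bottom-left and bottom-right is inferred from the order of the weights in step 3.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar reference for steps 1-5: sample one RGBA pixel at (x, y).
 * Assumptions (not from the patent): 4 bytes per pixel in R,G,B,A order,
 * `stride` is the row size in bytes, and 0 <= x < width-1, 0 <= y < height-1. */
static inline uint32_t bilinear_ref(const uint8_t *src, int stride,
                                    int width, int height, float x, float y)
{
    /* Step 1: integer and fractional parts (truncation == floor for x, y >= 0). */
    int ix = (int)x, iy = (int)y;
    assert(ix >= 0 && iy >= 0 && ix + 1 < width && iy + 1 < height);
    float fx = x - (float)ix, fy = y - (float)iy;

    /* Step 2: the four neighbouring pixels. */
    const uint8_t *p1 = src + iy * stride + ix * 4; /* top-left     */
    const uint8_t *p2 = p1 + 4;                     /* top-right    */
    const uint8_t *p3 = p1 + stride;                /* bottom-left  */
    const uint8_t *p4 = p3 + 4;                     /* bottom-right */

    /* Step 3: the four weights, in the same order as W in the text. */
    float w[4] = { (1 - fx) * (1 - fy), fx * (1 - fy), (1 - fx) * fy, fx * fy };

    /* Steps 4-5: N = dot(P, W) per channel, rounded and packed into 32 bits. */
    uint32_t out = 0;
    for (int c = 0; c < 4; ++c) {
        float n = p1[c] * w[0] + p2[c] * w[1] + p3[c] * w[2] + p4[c] * w[3];
        out |= (uint32_t)(n + 0.5f) << (8 * c);
    }
    return out;
}
```

The per-channel loop performs four floating-point multiplications per channel, sixteen per output pixel, which is the cost the rest of the description sets out to reduce.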
These steps show that bilinear interpolation is not very complex, but it involves floating-point operations and a large number of multiplications. The amount of computation grows with the size of the target image, so for large images the real-time requirement is difficult to meet.
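The abstract mentions removing floating point as a first acceleration step. A minimal scalar sketch of that idea follows, assuming the fractional offsets are supplied as integers in the range 0..256; the name bilinear_fixed, its parameters and the rounding choice are hypothetical and are not taken from the patent.

```c
#include <stdint.h>

/* Fixed-point bilinear interpolation of one RGBA pixel (illustrative only).
 * p1..p4 point at the four neighbouring pixels (4 bytes each, R,G,B,A).
 * fx8 and fy8 are the fractional offsets scaled by 256, i.e. in 0..256. */
static inline uint32_t bilinear_fixed(const uint8_t *p1, const uint8_t *p2,
                                      const uint8_t *p3, const uint8_t *p4,
                                      unsigned fx8, unsigned fy8)
{
    /* Each weight is a product of two 8.8 factors, so it is scaled by 65536. */
    unsigned w1 = (256u - fx8) * (256u - fy8);
    unsigned w2 = fx8 * (256u - fy8);
    unsigned w3 = (256u - fx8) * fy8;
    unsigned w4 = fx8 * fy8;

    uint32_t out = 0;
    for (int c = 0; c < 4; ++c) {
        /* The weights sum to 65536, so n is at most 255 * 65536. */
        unsigned n = p1[c] * w1 + p2[c] * w2 + p3[c] * w3 + p4[c] * w4;
        /* Add half an output unit, then drop the 16 fractional bits. */
        out |= (uint32_t)((n + 32768u) >> 16) << (8 * c);
    }
    return out;
}
```

This removes the floating-point arithmetic but still issues one multiplication per pixel and channel, which is where a SIMD instruction set can help further.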
The invention is explained in detail below with reference to the drawings. As shown in FIG. 2, the input image is an RGBA image with 32 bits per pixel. The disclosed method for accelerating bilinear interpolation calculation mainly comprises the following steps:
1. load the data:
the _mm_loadl_epi64 instruction loads two pairs of pixels P12 and P34, where P12 holds pixels P1 and P2 and P34 holds pixels P3 and P4; the positions of the peripheral pixels are defined as shown in FIG. 1;
2. calculate the peripheral pixel weights:
the _mm_mul_ps instruction multiplies the floating-point weights W by 256, the _mm_cvtps_epi32 instruction converts the weights to integers, and the _mm_packs_epi32 instruction packs the 32-bit data into 16-bit data;
3. rearrange the RGBA values from [RGBA RGBA RGBA RGBA] to [RRRR GGGG BBBB AAAA], i.e. convert the AoS layout to the SoA layout:
the data is rearranged with the _mm_unpacklo_epi8 and _mm_unpackhi_epi64 instructions. The invention uses the SoA (structure of arrays) layout to further increase speed, because this layout allows instructions such as _mm_madd_epi16 (multiply-add) and _mm_hadd_epi32 (horizontal add) to be used;
4. convert the [RRRR GGGG BBBB AAAA] data into two groups of 16-bit data:
the _mm_unpacklo_epi8 instruction takes the low-order data and widens it to 16 bits to obtain pRG, and the _mm_unpackhi_epi8 instruction takes the high-order data and widens it to 16 bits to obtain pBA;
5. the _mm_madd_epi16 instruction multiplies pRG and pBA by the weights W and adds adjacent products to obtain outRG and outBA;
6. the _mm_hadd_epi32 instruction horizontally adds outRG and outBA;
7. _mm_packus_epi32 and _mm_packus_epi16 narrow the data to 8 bits, and _mm_cvtsi128_si32 extracts the final data as a 32-bit integer, which is the RGBA value of the target pixel; the calculation is complete.
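The seven steps might be put together as in the following C sketch using SSE intrinsics. It is an illustration under stated assumptions rather than the patented implementation: the function name bilinear_sse and its parameters are hypothetical, the weights are built with scalar arithmetic for brevity, and the 8-bit right shift that undoes the ×256 weight scaling is not mentioned in the text but is assumed here because the weighted sums are otherwise 256 times too large to narrow to 8 bits. The target attribute is specific to GCC and Clang; with other compilers, SSE4.1 would have to be enabled through compiler options instead.

```c
#include <assert.h>
#include <stdint.h>
#include <immintrin.h>

/* One output pixel from a 2x2 RGBA neighbourhood (illustrative sketch).
 * row0 points at P1 (P2 follows it), row1 points at P3 (P4 follows it);
 * 8 bytes must be readable at each pointer. fx, fy are in [0, 1]. */
__attribute__((target("sse4.1")))
static inline uint32_t bilinear_sse(const uint8_t *row0, const uint8_t *row1,
                                    float fx, float fy)
{
    assert(fx >= 0.0f && fx <= 1.0f && fy >= 0.0f && fy <= 1.0f);
    const __m128i zero = _mm_setzero_si128();

    /* 1. Load the pixel pairs P12 and P34 (64 bits each). */
    __m128i p12 = _mm_loadl_epi64((const __m128i *)row0);
    __m128i p34 = _mm_loadl_epi64((const __m128i *)row1);

    /* 2. Weights W1..W4, scaled by 256, rounded, packed to 16 bits. */
    __m128 w = _mm_setr_ps((1 - fx) * (1 - fy), fx * (1 - fy),
                           (1 - fx) * fy, fx * fy);
    __m128i wi = _mm_cvtps_epi32(_mm_mul_ps(w, _mm_set1_ps(256.0f)));
    wi = _mm_packs_epi32(wi, wi);               /* w1 w2 w3 w4 w1 w2 w3 w4 */

    /* 3. AoS -> SoA. */
    __m128i t   = _mm_unpacklo_epi8(p12, p34);  /* R1R3 G1G3 B1B3 A1A3 R2R4 ... */
    __m128i hi  = _mm_unpackhi_epi64(t, zero);  /* R2R4 G2G4 B2B4 A2A4          */
    __m128i soa = _mm_unpacklo_epi8(t, hi);     /* RRRR GGGG BBBB AAAA          */

    /* 4. Widen the bytes to 16 bits. */
    __m128i pRG = _mm_unpacklo_epi8(soa, zero); /* R1 R2 R3 R4 G1 G2 G3 G4 */
    __m128i pBA = _mm_unpackhi_epi8(soa, zero); /* B1 B2 B3 B4 A1 A2 A3 A4 */

    /* 5. Multiply by the weights and add adjacent products (32-bit results). */
    __m128i outRG = _mm_madd_epi16(pRG, wi);
    __m128i outBA = _mm_madd_epi16(pBA, wi);

    /* 6. Horizontal add gives [R, G, B, A], each still scaled by 256. */
    __m128i out = _mm_hadd_epi32(outRG, outBA);
    out = _mm_srli_epi32(out, 8);               /* assumption: undo the x256 scale */

    /* 7. Narrow 32 -> 16 -> 8 bits and extract the packed pixel. */
    out = _mm_packus_epi32(out, zero);
    out = _mm_packus_epi16(out, zero);
    return (uint32_t)_mm_cvtsi128_si32(out);
}
```

Because the weights are rounded to 8 fractional bits and the final shift truncates, the result can differ by one level per channel from a floating-point evaluation.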
The peripheral pixel weight calculation of step 2 is shown in FIG. 3 and comprises the following steps:
1. the _mm_set_ss instruction loads the floating-point coordinates (x, y), and the _mm_unpacklo_ps instruction interleaves x and y;
2. the _mm_floor_ps instruction calculates the integer parts ix and iy of x and y, and the _mm_sub_ps instruction calculates the fractional parts fx and fy;
3. the _mm_sub_ps, _mm_unpacklo_ps, _mm_movelh_ps and _mm_shuffle_ps instructions calculate Wx = [1-fx, fx, 1-fx, fx] and Wy = [1-fy, 1-fy, fy, fy];
4. the _mm_mul_ps instruction calculates Wx × Wy, which gives the four weights W1, W2, W3 and W4: (1-fx) × (1-fy), fx × (1-fy), (1-fx) × fy and fx × fy. The weight calculation is then complete and the values are returned.
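The four weight steps could look like the following C sketch. The function name calc_weights is hypothetical, the lane comments list element 0 first, and the exact instruction sequence is one plausible reading of the steps rather than the patent's own code. The target attribute is specific to GCC and Clang and is needed here because _mm_floor_ps belongs to SSE4.1.

```c
#include <immintrin.h>

/* Returns [W1, W2, W3, W4] for the sample position (x, y) (illustrative sketch). */
__attribute__((target("sse4.1")))
static inline __m128 calc_weights(float x, float y)
{
    /* 1. Load x and y and interleave them. */
    __m128 xy  = _mm_unpacklo_ps(_mm_set_ss(x), _mm_set_ss(y)); /* [x, y, 0, 0]         */

    /* 2. Integer parts, then fractional parts. */
    __m128 ixy = _mm_floor_ps(xy);                              /* [ix, iy, 0, 0]       */
    __m128 f   = _mm_sub_ps(xy, ixy);                           /* [fx, fy, 0, 0]       */

    /* 3. Build Wx and Wy. */
    __m128 f1  = _mm_sub_ps(_mm_set1_ps(1.0f), f);              /* [1-fx, 1-fy, 1, 1]   */
    __m128 wx  = _mm_unpacklo_ps(f1, f);                        /* [1-fx, fx, 1-fy, fy] */
    wx         = _mm_movelh_ps(wx, wx);                         /* [1-fx, fx, 1-fx, fx] */
    __m128 wy  = _mm_shuffle_ps(f1, f, _MM_SHUFFLE(1, 1, 1, 1)); /* [1-fy, 1-fy, fy, fy] */

    /* 4. Element-wise product gives W1..W4. */
    return _mm_mul_ps(wx, wy);
}
```

The returned vector can be multiplied by 256 and converted with _mm_cvtps_epi32 and _mm_packs_epi32, as described in step 2 of the main method.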
By processing multiple data elements with one SSE instruction, the acceleration method of the invention reduces the number of computation cycles, increases the calculation speed, and ensures real-time operation.
The above is only a preferred embodiment of the present invention, and is not intended to limit the present invention, and various modifications and changes will occur to those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
Claims (1)
1. A method for accelerating bilinear interpolation calculation, characterized by comprising the following steps:
(1) the _mm_loadl_epi64 instruction loads two pairs of pixels P12 and P34, where P12 holds pixels P1 and P2 and P34 holds pixels P3 and P4;
(2) the weights W1, W2, W3, W4 are calculated, with W = {W1, W2, W3, W4}: the _mm_mul_ps instruction multiplies W by 256, _mm_cvtps_epi32 converts W to integers, and _mm_packs_epi32 packs the 32-bit data into 16-bit data;
(3) the data is rearranged by the _mm_unpacklo_epi8 and _mm_unpackhi_epi64 instructions: [RGBA RGBA RGBA RGBA] is converted into [RRRR GGGG BBBB AAAA], i.e. the AoS layout is converted into the SoA layout;
(4) the _mm_unpacklo_epi8 instruction takes the low-order data and widens it to 16 bits to obtain pRG, and the _mm_unpackhi_epi8 instruction takes the high-order data and widens it to 16 bits to obtain pBA;
(5) the _mm_madd_epi16 instruction computes outRG and outBA from the weights W and pRG and pBA;
(6) the _mm_hadd_epi32 instruction horizontally adds outRG and outBA;
(7) the _mm_packus_epi32, _mm_packus_epi16 and _mm_cvtsi128_si32 instructions convert the final data into a 32-bit integer, which is the RGBA value of the target pixel, and the calculation is complete;
wherein the weights of the four peripheral pixels are calculated with SSE instructions as follows:
(21) the _mm_set_ss instruction loads the floating-point coordinates (x, y), and the _mm_unpacklo_ps instruction interleaves x and y;
(22) the _mm_floor_ps instruction calculates the integer parts ix and iy of x and y, and the _mm_sub_ps instruction calculates the fractional parts fx and fy;
(23) the _mm_sub_ps, _mm_unpacklo_ps, _mm_movelh_ps and _mm_shuffle_ps instructions calculate Wx = [1-fx, fx, 1-fx, fx] and Wy = [1-fy, 1-fy, fy, fy];
(24) the _mm_mul_ps instruction calculates Wx × Wy, which gives the four weights W1, W2, W3 and W4: (1-fx) × (1-fy), fx × (1-fy), (1-fx) × fy and fx × fy; the weight calculation is then complete and the values are returned.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610479164.6A CN107527320B (en) | 2016-06-22 | 2016-06-22 | Method for accelerating bilinear interpolation calculation |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107527320A (en) | 2017-12-29 |
CN107527320B (en) | 2020-06-02 |
Family
ID=60734201
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610479164.6A (Active) CN107527320B (en) | 2016-06-22 | 2016-06-22 | Method for accelerating bilinear interpolation calculation |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107527320B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111210389B (en) * | 2020-01-10 | 2023-09-19 | 北京华捷艾米科技有限公司 | Image scaling processing method and device |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8090031B2 (en) * | 2007-10-05 | 2012-01-03 | Hong Kong Applied Science and Technology Research Institute Company Limited | Method for motion compensation |
CN102831576A (en) * | 2012-06-14 | 2012-12-19 | 北京暴风科技股份有限公司 | Video image zooming method and system |
CN104378642A (en) * | 2014-10-29 | 2015-02-25 | 南昌大学 | Quick H.264 fractional pixel interpolation method based on CUDA |
CN104952038A (en) * | 2015-06-05 | 2015-09-30 | 北京大恒图像视觉有限公司 | SSE2 (streaming SIMD extensions 2nd) instruction set based image interpolation method |
Also Published As
Publication number | Publication date |
---|---|
CN107527320A (en) | 2017-12-29 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
CP01 | Change in the name or title of a patent holder | ||
Address after: Everwise road in Qinhuai District of Nanjing City, Jiangsu province 210001 No. 6 Baixia Nanjing high tech Industrial Park, No. four, building B, F23 (423).
Patentee after: Nanjing inspector Intelligent Technology Co., Ltd
Address before: Everwise road in Qinhuai District of Nanjing City, Jiangsu province 210001 No. 6 Baixia Nanjing high tech Industrial Park, No. four, building B, F23 (423).
Patentee before: NANJING SHICHAZHE IMAGE IDENTIFICATION TECHNOLOGY Co.,Ltd.