CN107527320B - Method for accelerating bilinear interpolation calculation - Google Patents


Info

Publication number: CN107527320B
Authority: CN (China)
Prior art keywords: instruction, data, bilinear interpolation, pixels, epi32
Legal status: Active (the legal status is an assumption and is not a legal conclusion)
Application number: CN201610479164.6A
Other languages: Chinese (zh)
Other versions: CN107527320A (en)
Inventors: 朱旭光, 刘宇
Current assignee: Nanjing inspector Intelligent Technology Co., Ltd (the listed assignees may be inaccurate)
Original assignee: Nanjing Shichazhe Image Identification Technology Co ltd
Priority date: 2016-06-22 (the priority date is an assumption and is not a legal conclusion)
Filing date: 2016-06-22
Publication date: 2020-06-02

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00: Geometric image transformations in the plane of the image
    • G06T3/40: Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4007: Scaling of whole images or parts thereof, e.g. expanding or contracting, based on interpolation, e.g. bilinear interpolation

Landscapes

  • Physics & Mathematics
  • General Physics & Mathematics
  • Engineering & Computer Science
  • Theoretical Computer Science
  • Image Processing

Abstract

Bilinear interpolation is widely used for image scaling, but the algorithm involves floating-point operations and a large number of multiplications. The amount of computation grows with the image size, which makes real-time requirements difficult to meet and, in a chip implementation, leads to high power consumption and low processing speed. A first step toward accelerating bilinear interpolation is to remove the floating-point operations; the invention instead uses SSE instructions, which is faster than floating-point removal alone. SSE on the X86 architecture can process 128 bits of data at a time, and experimental results show that the SSE-based method of the invention is more than twice as fast as the original algorithm.

Description

Method for accelerating bilinear interpolation calculation
Technical Field
The invention relates to a method for accelerating a bilinear interpolation algorithm in the field of image processing.
Background
Image scaling is one of the basic operations in image processing. Many scaling algorithms exist; common ones include the nearest neighbor method, edge-based algorithms, and bilinear interpolation. The nearest neighbor algorithm is the simplest but gives poor scaling quality. Edge-based algorithms give good quality but are complex and difficult to implement. Bilinear interpolation is a compromise between quality and complexity and is therefore the most widely used.
Bilinear interpolation uses the four real pixel values surrounding a virtual point in the source image to determine one pixel value in the target image, so that the information of the original image is reflected more faithfully. However, the algorithm uses many multiplications and involves floating-point operations, so the amount of computation grows as the image size increases. Computer vision applications generally have real-time requirements, which makes accelerating the algorithm a key research topic.
Disclosure of Invention
The invention aims to provide a method for accelerating bilinear interpolation calculation that is at least twice as fast as the common method.
The technical problem to be solved is to accelerate the bilinear interpolation algorithm without changing its result.
To solve this technical problem, the invention adopts the following technical scheme. The SSE instruction set is a Single Instruction, Multiple Data (SIMD) instruction set on the X86 platform, and the invention improves data processing efficiency by processing multiple data elements with one SSE instruction. The method mainly comprises the following steps:
(1) the _mm_loadl_epi64 instruction loads two pairs of pixels, P12 and P34, where P12 holds pixels P1 and P2 and P34 holds pixels P3 and P4;
(2) the weights W = {W1, W2, W3, W4} are calculated; the _mm_mul_ps instruction multiplies W by 256, the _mm_cvtps_epi32 instruction converts W into 32-bit integers, and the _mm_packs_epi32 instruction packs the 32-bit data into 16-bit data;
(3) the _mm_unpacklo_epi8 and _mm_unpackhi_epi64 instructions rearrange the data from [RGBA RGBA RGBA RGBA] into [RRRR GGGG BBBB AAAA], namely from the AoS type to the SoA type;
(4) the _mm_unpacklo_epi8 instruction takes the low-order data and widens it to 16 bits to obtain pRG, and the _mm_unpackhi_epi8 instruction takes the high-order data and widens it to 16 bits to obtain pBA;
(5) the _mm_madd_epi16 instruction multiplies pRG and pBA by the weights W and adds adjacent products to obtain outRG and outBA;
(6) the _mm_hadd_epi32 instruction horizontally adds outRG and outBA;
(7) the _mm_packus_epi32, _mm_packus_epi16 and _mm_cvtsi128_si32 instructions convert the final data into a 32-bit integer, which is the RGBA value of the target pixel, and the calculation is complete.
The weights of the four peripheral pixels are calculated with SSE instructions as follows:
(21) the _mm_set_ss instruction loads the floating-point coordinates (x, y), and the _mm_unpacklo_ps instruction interleaves x and y;
(22) the _mm_floor_ps instruction calculates the integer parts ix and iy of x and y, and the _mm_sub_ps instruction calculates the fractional parts fx and fy;
(23) the _mm_sub_ps, _mm_unpacklo_ps, _mm_movelh_ps and _mm_shuffle_ps instructions calculate Wx = [1-fx, fx, 1-fx, fx] and Wy = [1-fy, 1-fy, fy, fy];
(24) the _mm_mul_ps instruction calculates Wx × Wy, which gives the 4 weight values W1, W2, W3 and W4: (1-fx) × (1-fy), fx × (1-fy), (1-fx) × fy and fx × fy; the weight calculation is complete and the values are returned.
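The fixed-point idea behind the weight conversion and the multiply, add and pack steps can be illustrated in scalar C. This is a minimal sketch and not the SSE code of the invention: the function names quantize_weights and blend_channel are hypothetical, rounding is done by adding 0.5 (whereas _mm_cvtps_epi32 follows the current SSE rounding mode), and the final right shift by 8, which undoes the ×256 scaling, is an assumption because the steps above do not state it explicitly.

```c
#include <assert.h>
#include <stdint.h>

/* Quantise the four float weights to integers scaled by 256.
   fx and fy are the fractional parts of the source coordinates. */
static void quantize_weights(float fx, float fy, int16_t w[4])
{
    assert(fx >= 0.0f && fx < 1.0f && fy >= 0.0f && fy < 1.0f);
    const float wf[4] = { (1.0f - fx) * (1.0f - fy), fx * (1.0f - fy),
                          (1.0f - fx) * fy,          fx * fy };
    for (int k = 0; k < 4; ++k)
        w[k] = (int16_t)(wf[k] * 256.0f + 0.5f);   /* round to nearest */
}

/* Blend one 8-bit channel of the four neighbours with integer arithmetic. */
static uint8_t blend_channel(const uint8_t p[4], const int16_t w[4])
{
    int32_t acc = 0;
    for (int k = 0; k < 4; ++k)
        acc += (int32_t)p[k] * w[k];   /* multiply-accumulate, as madd/hadd do */
    acc >>= 8;                         /* assumed step: undo the x256 scale */
    return (uint8_t)(acc > 255 ? 255 : acc);   /* saturate like packus */
}
```

Because each weight is at most 256 and each channel value at most 255, every product fits comfortably in 16 + 8 bits, which is why 16-bit lanes are sufficient for the weights and the widened pixel data.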
Drawings
FIG. 1 is an example of the position of a new target-image pixel within a 2 × 2 region of the source image.
FIG. 2 is a flow chart of bilinear interpolation calculation according to an embodiment of the present invention.
FIG. 3 is a flowchart illustrating weight calculation of neighboring pixels in bilinear interpolation according to an embodiment of the present invention.
Detailed Description
The technical solution of the invention is described in detail below by way of example. First, consider how bilinear interpolation is calculated. In bilinear interpolation scaling of an image, a pixel value in the target image is determined by the pixel values of the four real points surrounding the corresponding virtual point in the source image. Calculating the value of a pixel in the target image mainly comprises the following steps:
1. determining the coordinates of the corresponding virtual point P(x, y) in the source image:
fx = frac(x), fy = frac(y)
2. loading the four adjacent pixels:
P = [p1, p2, p3, p4]
3. calculating the weights of the four pixels:
W = [(1-fx)*(1-fy), fx*(1-fy), (1-fx)*fy, fx*fy]
4. calculating the pixel value in the target image:
N = dot(P, W)
5. returning the result N of the bilinear interpolation.
These steps show that bilinear interpolation is not very complex, but it involves floating-point operations and a large number of multiplications. The amount of computation therefore grows with the size of the target image, and real-time requirements become difficult to meet.
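The five steps of the plain algorithm can be sketched as a scalar floating-point reference in C. The function name bilinear_ref, the packed 32-bit RGBA buffer layout and the stride parameter are illustrative assumptions, not part of the original text; the sketch also assumes non-negative coordinates that leave room for all four neighbours.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar reference: pixels are packed RGBA, one byte per channel, stored
   row-major in a uint32_t buffer; stride is the row length in pixels.
   Assumes 0 <= x < width-1 and 0 <= y < height-1. */
static uint32_t bilinear_ref(const uint32_t *img, int stride, float x, float y)
{
    assert(x >= 0.0f && y >= 0.0f);
    int ix = (int)x, iy = (int)y;                  /* integer parts */
    float fx = x - (float)ix, fy = y - (float)iy;  /* step 1: fractional parts */

    const uint32_t *p = img + iy * stride + ix;
    const uint32_t P[4] = { p[0], p[1], p[stride], p[stride + 1] };  /* step 2 */
    const float W[4] = { (1.0f - fx) * (1.0f - fy), fx * (1.0f - fy),
                         (1.0f - fx) * fy,          fx * fy };       /* step 3 */

    uint32_t out = 0;
    for (int c = 0; c < 4; ++c) {                  /* step 4: dot(P, W) per channel */
        float acc = 0.0f;
        for (int k = 0; k < 4; ++k)
            acc += W[k] * (float)((P[k] >> (8 * c)) & 0xFFu);
        out |= (uint32_t)(acc + 0.5f) << (8 * c);  /* round and repack */
    }
    return out;                                    /* step 5 */
}
```

Each output pixel costs 16 floating-point multiplications for the blend alone, plus the weight products, which is the per-pixel cost that the SSE method is meant to reduce.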
The invention is explained in detail below with reference to the drawings. As shown in FIG. 2, the input image is an RGBA image with 32 bits per pixel. The method for accelerating bilinear interpolation calculation mainly comprises the following steps:
1. loading data:
the _mm_loadl_epi64 instruction loads two pairs of pixels, P12 and P34, where P12 holds pixels P1 and P2 and P34 holds pixels P3 and P4; the peripheral pixel positions are defined as shown in FIG. 1;
2. calculating the peripheral pixel weights:
the _mm_mul_ps instruction multiplies the floating-point weights W by 256, the _mm_cvtps_epi32 instruction converts them to 32-bit integers, and the _mm_packs_epi32 instruction packs the 32-bit data into 16-bit data;
3. rearranging the RGBA values from [RGBA RGBA RGBA RGBA] into [RRRR GGGG BBBB AAAA], namely converting the AoS type into the SoA type:
the data are rearranged by the _mm_unpacklo_epi8 and _mm_unpackhi_epi64 instructions. The SoA (Structure of Array) type is used to further increase speed, because it allows instructions such as _mm_madd_epi16 (multiply-add) and _mm_hadd_epi32 (horizontal add) to be used;
4. converting the [RRRR GGGG BBBB AAAA] data, in which each channel occupies 32 bits (four 8-bit values), into 2 groups of 16-bit data:
the _mm_unpacklo_epi8 instruction takes the low-order data and widens it to 16 bits to obtain pRG, and the _mm_unpackhi_epi8 instruction takes the high-order data and widens it to 16 bits to obtain pBA;
5. the _mm_madd_epi16 instruction multiplies pRG and pBA by the weights W and adds adjacent products to obtain outRG and outBA;
6. the _mm_hadd_epi32 instruction horizontally adds outRG and outBA;
7. the _mm_packus_epi32 and _mm_packus_epi16 instructions convert the data to 8 bits, and the _mm_cvtsi128_si32 instruction converts the final data to a 32-bit integer, which is the RGBA value of the target pixel, and the calculation is complete.
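The seven steps above can be sketched with the named intrinsics as follows. This is a hedged illustration, not the authors' code: the function name bilinear_pixel_sse and its parameters are hypothetical, the float weights are taken as an input, the byte order assumes R in the lowest byte of each little-endian 32-bit pixel, the target attribute is a GCC/Clang extension used so the SSSE3 and SSE4.1 intrinsics compile without extra flags, and the right shift by 8 is an assumed step that undoes the ×256 weight scaling.

```c
#include <assert.h>
#include <stdint.h>
#include <smmintrin.h>   /* SSE4.1; also pulls in the SSE2 and SSSE3 intrinsics */

/* Interpolate one RGBA pixel. p0 points at P1 (the top-left neighbour),
   stride is the row length in pixels, w holds the weights [W1, W2, W3, W4]. */
__attribute__((target("sse4.1")))
static uint32_t bilinear_pixel_sse(const uint32_t *p0, int stride, __m128 w)
{
    assert(stride >= 2);
    const __m128i zero = _mm_setzero_si128();

    /* Step 1: one 64-bit load per row fetches two adjacent pixels. */
    __m128i p12 = _mm_loadl_epi64((const __m128i *)p0);
    __m128i p34 = _mm_loadl_epi64((const __m128i *)(p0 + stride));

    /* Step 2: weights -> 16-bit integers scaled by 256: [W1..W4, W1..W4]. */
    __m128i wi = _mm_cvtps_epi32(_mm_mul_ps(w, _mm_set1_ps(256.0f)));
    wi = _mm_packs_epi32(wi, wi);

    /* Step 3: AoS -> SoA. */
    __m128i p1234 = _mm_unpacklo_epi8(p12, p34);      /* R1R3 G1G3 B1B3 A1A3 | R2R4 ... */
    __m128i p24   = _mm_unpackhi_epi64(p1234, zero);  /* R2R4 G2G4 B2B4 A2A4 | 0 */
    __m128i soa   = _mm_unpacklo_epi8(p1234, p24);    /* RRRR GGGG BBBB AAAA */

    /* Step 4: widen the bytes to 16 bits. */
    __m128i pRG = _mm_unpacklo_epi8(soa, zero);       /* R1..R4 G1..G4 */
    __m128i pBA = _mm_unpackhi_epi8(soa, zero);       /* B1..B4 A1..A4 */

    /* Step 5: multiply by the weights and add adjacent products. */
    __m128i outRG = _mm_madd_epi16(pRG, wi);
    __m128i outBA = _mm_madd_epi16(pBA, wi);

    /* Step 6: horizontal add gives one 32-bit sum per channel: [R, G, B, A]. */
    __m128i out = _mm_hadd_epi32(outRG, outBA);
    out = _mm_srli_epi32(out, 8);                     /* assumed: divide by 256 */

    /* Step 7: narrow 32 -> 16 -> 8 bits with unsigned saturation, then extract. */
    out = _mm_packus_epi32(out, zero);
    out = _mm_packus_epi16(out, zero);
    return (uint32_t)_mm_cvtsi128_si32(out);
}
```

With the SoA layout, a single _mm_madd_epi16 produces eight weighted products and four partial sums, so two multiply-add instructions and one horizontal add replace the 16 scalar multiplications of the plain algorithm.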
The calculation of the peripheral pixel weights in step 2 is shown in FIG. 3 and comprises the following steps:
1. the _mm_set_ss instruction loads the floating-point coordinates (x, y), and the _mm_unpacklo_ps instruction interleaves x and y;
2. the _mm_floor_ps instruction calculates the integer parts ix and iy of x and y, and the _mm_sub_ps instruction calculates the fractional parts fx and fy;
3. the _mm_sub_ps, _mm_unpacklo_ps, _mm_movelh_ps and _mm_shuffle_ps instructions calculate Wx = [1-fx, fx, 1-fx, fx] and Wy = [1-fy, 1-fy, fy, fy];
4. the _mm_mul_ps instruction calculates Wx × Wy, which gives the 4 weight values W1, W2, W3 and W4: (1-fx) × (1-fy), fx × (1-fy), (1-fx) × fy and fx × fy; the weight calculation is complete and the values are returned.
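The weight calculation can be sketched in C as below. The function name calc_weights is hypothetical, and one substitution is made deliberately: instead of _mm_floor_ps (which requires SSE4.1), the sketch truncates with _mm_cvttps_epi32 and converts back, which gives the same integer part only under the assumption that x and y are non-negative. This keeps the example within SSE2.

```c
#include <assert.h>
#include <emmintrin.h>   /* SSE2 (includes the SSE float intrinsics) */

/* Returns [W1, W2, W3, W4] =
   [(1-fx)(1-fy), fx(1-fy), (1-fx)fy, fx*fy] for source coordinates (x, y). */
static inline __m128 calc_weights(float x, float y)
{
    assert(x >= 0.0f && y >= 0.0f);   /* truncation equals floor only for >= 0 */

    /* Load x and y and interleave them: [x, y, 0, 0]. */
    __m128 xy = _mm_unpacklo_ps(_mm_set_ss(x), _mm_set_ss(y));

    /* Integer parts (truncation stands in for _mm_floor_ps here). */
    __m128 ixy = _mm_cvtepi32_ps(_mm_cvttps_epi32(xy));

    __m128 f  = _mm_sub_ps(xy, ixy);                  /* [fx, fy, 0, 0] */
    __m128 f1 = _mm_sub_ps(_mm_set1_ps(1.0f), f);     /* [1-fx, 1-fy, 1, 1] */

    __m128 wx = _mm_unpacklo_ps(f1, f);               /* [1-fx, fx, 1-fy, fy] */
    wx = _mm_movelh_ps(wx, wx);                       /* [1-fx, fx, 1-fx, fx] */
    __m128 wy = _mm_shuffle_ps(f1, f, _MM_SHUFFLE(1, 1, 1, 1));
                                                      /* [1-fy, 1-fy, fy, fy] */
    return _mm_mul_ps(wx, wy);                        /* Wx * Wy, lane by lane */
}
```

For example, coordinates (3.25, 7.5) give fx = 0.25 and fy = 0.5, so the returned lanes are 0.375, 0.125, 0.375 and 0.125, which sum to 1.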
By processing multiple data elements with one SSE instruction, the acceleration method of the invention reduces the number of computation cycles, increases calculation speed, and helps ensure real-time operation.
The above is only a preferred embodiment of the present invention, and is not intended to limit the present invention, and various modifications and changes will occur to those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (1)

1. A method for accelerating bilinear interpolation calculation, characterized by comprising the following steps:
(1) the _mm_loadl_epi64 instruction loads two pairs of pixels, P12 and P34, where P12 holds pixels P1 and P2 and P34 holds pixels P3 and P4;
(2) the weights W = {W1, W2, W3, W4} are calculated; the _mm_mul_ps instruction multiplies W by 256, the _mm_cvtps_epi32 instruction converts W into 32-bit integers, and the _mm_packs_epi32 instruction packs the 32-bit data into 16-bit data;
(3) the _mm_unpacklo_epi8 and _mm_unpackhi_epi64 instructions rearrange the data from [RGBA RGBA RGBA RGBA] into [RRRR GGGG BBBB AAAA], namely from the AoS type to the SoA type;
(4) the _mm_unpacklo_epi8 instruction takes the low-order data and widens it to 16 bits to obtain pRG, and the _mm_unpackhi_epi8 instruction takes the high-order data and widens it to 16 bits to obtain pBA;
(5) the _mm_madd_epi16 instruction multiplies pRG and pBA by the weights W and adds adjacent products to obtain outRG and outBA;
(6) the _mm_hadd_epi32 instruction horizontally adds outRG and outBA;
(7) the _mm_packus_epi32, _mm_packus_epi16 and _mm_cvtsi128_si32 instructions convert the final data into a 32-bit integer, which is the RGBA value of the target pixel, and the calculation is complete;
the weights of the four peripheral pixels are calculated with SSE instructions as follows:
(21) the _mm_set_ss instruction loads the floating-point coordinates (x, y), and the _mm_unpacklo_ps instruction interleaves x and y;
(22) the _mm_floor_ps instruction calculates the integer parts ix and iy of x and y, and the _mm_sub_ps instruction calculates the fractional parts fx and fy;
(23) the _mm_sub_ps, _mm_unpacklo_ps, _mm_movelh_ps and _mm_shuffle_ps instructions calculate Wx = [1-fx, fx, 1-fx, fx] and Wy = [1-fy, 1-fy, fy, fy];
(24) the _mm_mul_ps instruction calculates Wx × Wy, which gives the 4 weight values W1, W2, W3 and W4: (1-fx) × (1-fy), fx × (1-fy), (1-fx) × fy and fx × fy; the weight calculation is complete and the values are returned.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610479164.6A CN107527320B (en) 2016-06-22 2016-06-22 Method for accelerating bilinear interpolation calculation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610479164.6A CN107527320B (en) 2016-06-22 2016-06-22 Method for accelerating bilinear interpolation calculation

Publications (2)

Publication Number Publication Date
CN107527320A 2017-12-29
CN107527320B 2020-06-02

Family

ID=60734201

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610479164.6A Active CN107527320B (en) 2016-06-22 2016-06-22 Method for accelerating bilinear interpolation calculation

Country Status (1)

Country Link
CN (1) CN107527320B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111210389B (en) * 2020-01-10 2023-09-19 北京华捷艾米科技有限公司 Image scaling processing method and device

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8090031B2 (en) * 2007-10-05 2012-01-03 Hong Kong Applied Science and Technology Research Institute Company Limited Method for motion compensation
CN102831576A (en) * 2012-06-14 2012-12-19 北京暴风科技股份有限公司 Video image zooming method and system
CN104378642A (en) * 2014-10-29 2015-02-25 南昌大学 Quick H.264 fractional pixel interpolation method based on CUDA
CN104952038A (en) * 2015-06-05 2015-09-30 北京大恒图像视觉有限公司 SSE2 (streaming SIMD extensions 2nd) instruction set based image interpolation method

Also Published As

Publication number Publication date
CN107527320A (en) 2017-12-29

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP01 Change in the name or title of a patent holder

Address after: Everwise road in Qinhuai District of Nanjing City, Jiangsu province 210001 No. 6 Baixia Nanjing high tech Industrial Park, No. four, building B, F23 (423).

Patentee after: Nanjing inspector Intelligent Technology Co., Ltd

Address before: Everwise road in Qinhuai District of Nanjing City, Jiangsu province 210001 No. 6 Baixia Nanjing high tech Industrial Park, No. four, building B, F23 (423).

Patentee before: NANJING SHICHAZHE IMAGE IDENTIFICATION TECHNOLOGY Co.,Ltd.