CN107527320B

CN107527320B - Method for accelerating bilinear interpolation calculation

Info

Publication number: CN107527320B
Application number: CN201610479164.6A
Authority: CN
Inventors: 朱旭光; 刘宇
Original assignee: Nanjing Shichazhe Image Identification Technology Co ltd
Current assignee: Nanjing inspector Intelligent Technology Co., Ltd
Priority date: 2016-06-22
Filing date: 2016-06-22
Publication date: 2020-06-02
Anticipated expiration: 2036-06-22
Also published as: CN107527320A

Abstract

Bilinear interpolation is widely used for image scaling, but it involves floating-point arithmetic and many multiplications, so the amount of computation grows with the image size. This makes real-time requirements hard to meet, and a chip that implements the algorithm ends up with high power consumption and low processing speed. A first step toward accelerating the algorithm would be to eliminate floating-point arithmetic, but the invention instead uses SSE, which is faster than the floating-point-removal approach. SSE on the x86 platform operates on 128 bits of data at a time, and experimental results show that the SSE-based method of the invention is more than twice as fast as the original algorithm.

Description

Method for accelerating bilinear interpolation calculation

Technical Field

The invention relates to a method for accelerating a bilinear interpolation algorithm in the field of image processing.

Background

Image scaling is one of the basic operations in image processing, and many algorithms exist for it; common ones include nearest-neighbor interpolation, edge-based interpolation, and bilinear interpolation. Nearest-neighbor interpolation is the simplest but gives very poor scaling quality; edge-based algorithms give good results but are complex and difficult to implement; bilinear interpolation is a compromise between quality and complexity, which is why it is the most widely used.

The principle of bilinear interpolation is that the four real pixels surrounding a virtual point in the source image jointly determine one pixel value in the target image, which reflects the information in the original image more faithfully. However, the algorithm uses many multiplications and involves floating-point arithmetic, so the amount of computation grows as the image gets larger. Because computer vision applications usually have real-time requirements, accelerating the algorithm is a key research problem.

Disclosure of Invention

The invention aims to provide a method for accelerating bilinear interpolation that is at least twice as fast as the common method.

The technical problem solved by the invention is to accelerate the bilinear interpolation algorithm while keeping the scaling quality unchanged.

To solve this technical problem, the invention adopts the following technical scheme. The SSE instruction set is a Single Instruction, Multiple Data (SIMD) instruction set on the x86 platform, and the invention improves data-processing efficiency by having one SSE instruction process several data items at once. The method mainly comprises the following steps:

(1) the _mm_loadl_epi64 instruction loads two pairs of pixels, P12 and P34, where P12 holds pixels P1 and P2 and P34 holds pixels P3 and P4;

(2) calculate the weights W1, W2, W3, W4: the _mm_mul_ps instruction multiplies W by 256, _mm_cvtps_epi32 converts W to 32-bit integers, and _mm_packs_epi32 narrows the 32-bit data to 16-bit data, where W = {W1, W2, W3, W4};

(3) reorganize the data with the _mm_unpacklo_epi8 and _mm_unpackhi_epi64 instructions, converting [RGBARGBARGBARGBA] into [RRRRGGGGBBBBAAAA], that is, converting the array-of-structures (AoS) layout into the structure-of-arrays (SoA) layout;

(4) the _mm_unpacklo_epi8 instruction takes the low-order half of the data and widens it to 16 bits to obtain pRG, and the _mm_unpackhi_epi8 instruction takes the high-order half and widens it to 16 bits to obtain pBA;

(5) the _mm_madd_epi16 instruction multiplies pRG and pBA by the weights W and adds adjacent products, giving outRG and outBA;

(6) the _mm_hadd_epi32 instruction adds outRG and outBA horizontally;

(7) the _mm_packus_epi32, _mm_packus_epi16 and _mm_cvtsi128_si32 instructions convert the final data into a 32-bit integer, which is the RGBA value of the target pixel, and the calculation is complete.
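As a worked illustration of steps (2), (5) and (6), with numbers chosen here for illustration rather than taken from the source, take fx = 0.25, fy = 0.5 and red values 100, 200, 50, 150 for P1 to P4. The final division by 256 (a right shift by 8) undoes the ×256 scaling of step (2); the source steps do not list it explicitly, so it is an assumption of this sketch.

```latex
\begin{aligned}
W &= [0.375,\ 0.125,\ 0.375,\ 0.125], \qquad 256\,W = [96,\ 32,\ 96,\ 32] \\
\text{madd: } & 100 \cdot 96 + 200 \cdot 32 = 16000, \qquad 50 \cdot 96 + 150 \cdot 32 = 9600 \\
\text{hadd: } & 16000 + 9600 = 25600, \qquad 25600 / 256 = 100
\end{aligned}
```

The floating-point computation gives the same value, 100·0.375 + 200·0.125 + 50·0.375 + 150·0.125 = 100.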

The weights of the four neighboring pixels are calculated with SSE instructions as follows:

(21) the _mm_set_ss instruction loads the floating-point coordinates (x, y), and the _mm_unpacklo_ps instruction interleaves x and y;

(22) the _mm_floor_ps instruction computes the integer parts ix and iy of x and y, and the _mm_sub_ps instruction computes the fractional parts fx and fy;

(23) the _mm_sub_ps, _mm_unpacklo_ps, _mm_movelh_ps and _mm_shuffle_ps instructions compute Wx = [1-fx, fx, 1-fx, fx] and Wy = [1-fy, 1-fy, fy, fy];

(24) the _mm_mul_ps instruction computes Wx × Wy, giving the four weights W1, W2, W3, W4: (1-fx) × (1-fy), fx × (1-fy), (1-fx) × fy, fx × fy. The weight calculation is complete and the values are returned.
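Steps (23) and (24) rely on the elementwise product of the two lane patterns reproducing the four bilinear weights. Expanding it also shows that the weights always sum to 1, so the ×256 integer weights of step (2) sum to approximately 256 (only approximately, because each weight is rounded separately):

```latex
W = W_x \odot W_y = \bigl[(1-f_x)(1-f_y),\ f_x(1-f_y),\ (1-f_x)f_y,\ f_x f_y\bigr],
\qquad
\sum_{i=1}^{4} W_i = (1-f_y)\bigl[(1-f_x) + f_x\bigr] + f_y\bigl[(1-f_x) + f_x\bigr] = 1
```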

Drawings

Fig. 1 shows an example of where a new pixel of the target image falls within a 2 × 2 region of the source image.

FIG. 2 is a flow chart of bilinear interpolation calculation according to an embodiment of the present invention.

FIG. 3 is a flowchart illustrating weight calculation of neighboring pixels in bilinear interpolation according to an embodiment of the present invention.

Detailed Description

The technical solution of the invention is described in detail below by way of example. First, consider how bilinear interpolation is computed: in a bilinear image-scaling algorithm, each pixel value in the target image is determined by the values of the four real pixels surrounding the corresponding virtual point in the source image. Computing the value of one target pixel with bilinear interpolation mainly comprises the following steps:

1. determining the coordinates of the corresponding virtual point P(x, y) in the source image and their fractional parts;

$f_x = \operatorname{frac}(x), \quad f_y = \operatorname{frac}(y)$

2. loading the four adjacent pixels;

$P = [p_1,\ p_2,\ p_3,\ p_4]$

3. calculating a weight for each of the four pixels;

$W = [(1-f_x)(1-f_y),\ f_x(1-f_y),\ (1-f_x)f_y,\ f_x f_y]$

4. calculating the pixel value in the target image;

$N = \operatorname{dot}(P, W) = \sum_{i=1}^{4} p_i W_i$

5. returning the bilinearly interpolated result N.
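Steps 1 to 5 can be sketched in scalar C as a correctness reference for the SSE version. Assumptions not in the source: the image is RGBA8888 stored row-major with R first in memory, `stride` counts pixels per row, coordinates are non-negative (so truncation equals floor), and (ix+1, iy+1) lies inside the image. The names `bilinear_ref` and `stride` are hypothetical.

```c
#include <stdint.h>

/* Hypothetical scalar reference for one target pixel (steps 1-5).
   Assumes RGBA8888, row-major, `stride` in pixels, x >= 0 and y >= 0,
   and that the 2x2 neighbourhood lies inside the image. */
void bilinear_ref(const uint8_t *img, int stride, float x, float y, uint8_t out[4])
{
    /* step 1: integer and fractional parts (truncation == floor for x, y >= 0) */
    int ix = (int)x, iy = (int)y;
    float fx = x - (float)ix, fy = y - (float)iy;

    /* step 2: the four neighbours, laid out as in Fig. 1 */
    const uint8_t *p1 = img + 4 * (iy * stride + ix); /* top-left */
    const uint8_t *p2 = p1 + 4;                       /* top-right */
    const uint8_t *p3 = p1 + 4 * stride;              /* bottom-left */
    const uint8_t *p4 = p3 + 4;                       /* bottom-right */

    /* step 3: bilinear weights */
    float w[4] = { (1 - fx) * (1 - fy), fx * (1 - fy),
                   (1 - fx) * fy,       fx * fy };

    /* step 4: N = dot(P, W), evaluated per channel */
    for (int c = 0; c < 4; ++c) {
        float n = w[0] * p1[c] + w[1] * p2[c] + w[2] * p3[c] + w[3] * p4[c];
        out[c] = (uint8_t)(n + 0.5f); /* round to nearest; n is in [0, 255] */
    }
}
```

For a 2 × 2 image with red values 100, 200, 50, 150 and (x, y) = (0.25, 0.5), the red output is 100, matching the hand calculation in the disclosure.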

The steps above show that bilinear interpolation is not very complex, but it involves floating-point arithmetic and many multiplications. When the target image is large, the computation grows with its size, and real-time requirements are hard to meet.

The invention is explained in detail below with reference to the figures. As shown in Fig. 2, the input image is an RGBA image with 32 bits per pixel, and the disclosed method for accelerating bilinear interpolation mainly comprises the following steps:

1. loading data:

the _mm_loadl_epi64 instruction loads two pairs of pixels, P12 and P34, where P12 holds pixels P1 and P2 and P34 holds pixels P3 and P4; the positions of the neighboring pixels are defined as shown in Fig. 1;

2. calculating the weights of the neighboring pixels:

the _mm_mul_ps instruction multiplies the floating-point weights W by 256, the _mm_cvtps_epi32 instruction converts them to integers, and the _mm_packs_epi32 instruction narrows the 32-bit data to 16-bit data;
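The fixed-point conversion of step 2 can be sketched with SSE2 intrinsics (an x86-64 target is assumed). The function name `quantize_weights` is hypothetical, and packing the weights twice with `_mm_packs_epi32(wi, wi)` is an assumption chosen so the result lines up with the `_mm_madd_epi16` operands later; the source names the instructions but not their operands.

```c
#include <stdint.h>
#include <emmintrin.h> /* SSE2 */

/* Hypothetical helper: float weights -> 16-bit fixed point (x256).
   Output layout (assumed): [W1 W2 W3 W4 W1 W2 W3 W4]. */
void quantize_weights(const float w[4], int16_t out[8])
{
    __m128  wf     = _mm_loadu_ps(w);
    __m128  scaled = _mm_mul_ps(wf, _mm_set1_ps(256.0f)); /* W * 256 */
    __m128i wi     = _mm_cvtps_epi32(scaled);             /* round to nearest int32 (default MXCSR) */
    __m128i w16    = _mm_packs_epi32(wi, wi);             /* int32 -> int16 with signed saturation */
    _mm_storeu_si128((__m128i *)out, w16);
}
```

Because each weight is at most 1.0, the scaled values are at most 256 and fit comfortably in a signed 16-bit lane.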

3. reorganizing the RGBA values, converting [RGBARGBARGBARGBA] into [RRRRGGGGBBBBAAAA], that is, converting the AoS layout into the SoA layout:

the data are reorganized by the _mm_unpacklo_epi8 and _mm_unpackhi_epi64 instructions. The invention uses the SoA (structure of arrays) layout to further increase speed: with samples of the same channel grouped together, instructions such as _mm_madd_epi16 (multiply-add) and _mm_hadd_epi32 (horizontal add) can be applied directly.
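The load of step 1 and the AoS-to-SoA regrouping of step 3 can be sketched with the instructions the source names: one `_mm_unpacklo_epi8` interleaves the two rows, `_mm_unpackhi_epi64` moves the second pixel column to the low half, and a second `_mm_unpacklo_epi8` completes the transpose. This exact three-instruction sequence is an assumption. The names `load_soa` and `load_soa_bytes` are hypothetical, and the RGBA8888 row-major layout with `stride` in pixels is assumed.

```c
#include <stdint.h>
#include <emmintrin.h> /* SSE2 */

/* Hypothetical: p1 points at the top-left neighbour; returns the 16 bytes
   R1R2R3R4 G1G2G3G4 B1B2B3B4 A1A2A3A4. */
__m128i load_soa(const uint8_t *p1, int stride)
{
    __m128i p12 = _mm_loadl_epi64((const __m128i *)p1);                /* R1G1B1A1 R2G2B2A2 */
    __m128i p34 = _mm_loadl_epi64((const __m128i *)(p1 + 4 * stride)); /* R3G3B3A3 R4G4B4A4 */
    __m128i t   = _mm_unpacklo_epi8(p12, p34); /* R1R3G1G3B1B3A1A3 R2R4G2G4B2B4A2A4 */
    __m128i hi  = _mm_unpackhi_epi64(t, t);    /* R2R4G2G4B2B4A2A4 moved to the low half */
    return _mm_unpacklo_epi8(t, hi);           /* R1R2R3R4 G1G2G3G4 B1B2B3B4 A1A2A3A4 */
}

/* Store the regrouped register so the result can be inspected as bytes. */
void load_soa_bytes(const uint8_t *p1, int stride, uint8_t out[16])
{
    _mm_storeu_si128((__m128i *)out, load_soa(p1, stride));
}
```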

4. widening the 8-bit [RRRRGGGGBBBBAAAA] data into two groups of 16-bit data:

the _mm_unpacklo_epi8 instruction takes the low-order half of the data and widens it to 16 bits to obtain pRG, and the _mm_unpackhi_epi8 instruction takes the high-order half and widens it to 16 bits to obtain pBA;

5. the _mm_madd_epi16 instruction multiplies pRG and pBA by the weights W and adds adjacent products, giving outRG and outBA;

6. the _mm_hadd_epi32 instruction adds outRG and outBA horizontally;
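The widening, multiply-add and horizontal-add steps can be sketched as follows. Zero-extending with `_mm_setzero_si128()` as the second unpack operand is an assumption (the source only says the data are converted to 16 bits), as is the weight layout [W1 W2 W3 W4 W1 W2 W3 W4]. `weighted_sums` is a hypothetical name; the `target` attribute lets GCC or Clang compile the SSSE3 intrinsic without a global `-mssse3` flag.

```c
#include <stdint.h>
#include <tmmintrin.h> /* SSSE3 for _mm_hadd_epi32; also provides SSE2 */

/* Hypothetical: soa = R1..R4 G1..G4 B1..B4 A1..A4 (8-bit),
   w16 = [W1 W2 W3 W4 W1 W2 W3 W4] (weights scaled by 256).
   Writes [R, G, B, A] sums, still scaled by 256. */
__attribute__((target("ssse3")))
void weighted_sums(const uint8_t soa[16], const int16_t w16[8], int32_t out[4])
{
    __m128i v    = _mm_loadu_si128((const __m128i *)soa);
    __m128i w    = _mm_loadu_si128((const __m128i *)w16);
    __m128i zero = _mm_setzero_si128();
    __m128i pRG  = _mm_unpacklo_epi8(v, zero); /* R1..R4 G1..G4 as 16-bit */
    __m128i pBA  = _mm_unpackhi_epi8(v, zero); /* B1..B4 A1..A4 as 16-bit */
    /* madd: [R1W1+R2W2, R3W3+R4W4, G1W1+G2W2, G3W3+G4W4] as int32 */
    __m128i outRG = _mm_madd_epi16(pRG, w);
    __m128i outBA = _mm_madd_epi16(pBA, w);    /* same pattern for B and A */
    /* hadd: [outRG0+outRG1, outRG2+outRG3, outBA0+outBA1, outBA2+outBA3] = [R, G, B, A] */
    __m128i sum = _mm_hadd_epi32(outRG, outBA);
    _mm_storeu_si128((__m128i *)out, sum);
}
```

With 8-bit samples and weights of at most 256, each pair sum is at most 2 × 255 × 256, so the signed 16-bit multiply and 32-bit accumulation cannot overflow.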

7. _mm_packus_epi32 and _mm_packus_epi16 convert the data to 8 bits, and _mm_cvtsi128_si32 converts the final data to a 32-bit integer, which is the RGBA value of the target pixel; the calculation is complete.
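Step 7 can be sketched as below. The sums from step 6 are still multiplied by 256, so packing them directly would saturate every channel at 255; this sketch therefore adds a rounding offset and shifts right by 8 first. That shift is an assumption: it is arithmetically required but not listed in the source steps. `pack_rgba` is a hypothetical name, and the little-endian channel order R | G<<8 | B<<16 | A<<24 follows from an x86 target.

```c
#include <stdint.h>
#include <smmintrin.h> /* SSE4.1 for _mm_packus_epi32 */

/* Hypothetical: sums = [R, G, B, A] from step 6, each scaled by 256. */
__attribute__((target("sse4.1")))
uint32_t pack_rgba(const int32_t sums[4])
{
    __m128i v = _mm_loadu_si128((const __m128i *)sums);
    /* assumed step: (sum + 128) >> 8 undoes the x256 weight scaling with rounding */
    v = _mm_srai_epi32(_mm_add_epi32(v, _mm_set1_epi32(128)), 8);
    v = _mm_packus_epi32(v, v); /* int32 -> uint16 with unsigned saturation */
    v = _mm_packus_epi16(v, v); /* int16 -> uint8 with unsigned saturation */
    return (uint32_t)_mm_cvtsi128_si32(v); /* low 32 bits: one RGBA pixel */
}
```

The two saturating packs also clamp any out-of-range intermediate value to [0, 255].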

Step 2, calculating the weights of the neighboring pixels, is shown in Fig. 3 and comprises the following steps:

1. the _mm_set_ss instruction loads the floating-point coordinates (x, y), and the _mm_unpacklo_ps instruction interleaves x and y;

2. the _mm_floor_ps instruction computes the integer parts ix and iy of x and y, and the _mm_sub_ps instruction computes the fractional parts fx and fy;

3. instructions such as _mm_sub_ps, _mm_unpacklo_ps, _mm_movelh_ps and _mm_shuffle_ps compute Wx = [1-fx, fx, 1-fx, fx] and Wy = [1-fy, 1-fy, fy, fy];

4. the _mm_mul_ps instruction computes Wx × Wy, giving the four weights W1, W2, W3, W4: (1-fx) × (1-fy), fx × (1-fy), (1-fx) × fy, fx × fy; the weight calculation is complete and the values are returned.
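The weight calculation of Fig. 3 can be sketched as follows. The source names the instructions but not their operands, so the particular choice below (`_mm_movelh_ps(t, t)` for Wx and `_MM_SHUFFLE(3, 3, 2, 2)` for Wy) is an assumption, as are the hypothetical function name `bilinear_weights` and the extra `ix`/`iy` outputs, which a caller would need to address P1.

```c
#include <smmintrin.h> /* SSE4.1 for _mm_floor_ps */

/* Hypothetical: returns W = [W1, W2, W3, W4] and the integer parts of (x, y). */
__attribute__((target("sse4.1")))
void bilinear_weights(float x, float y, float w[4], int *ix, int *iy)
{
    __m128 xy  = _mm_unpacklo_ps(_mm_set_ss(x), _mm_set_ss(y)); /* [x, y, 0, 0] */
    __m128 ixy = _mm_floor_ps(xy);                              /* [ix, iy, 0, 0] */
    __m128 f   = _mm_sub_ps(xy, ixy);                           /* [fx, fy, 0, 0] */
    __m128 omf = _mm_sub_ps(_mm_set1_ps(1.0f), f);              /* [1-fx, 1-fy, 1, 1] */
    __m128 t   = _mm_unpacklo_ps(omf, f);                       /* [1-fx, fx, 1-fy, fy] */
    __m128 wx  = _mm_movelh_ps(t, t);                           /* [1-fx, fx, 1-fx, fx] */
    __m128 wy  = _mm_shuffle_ps(t, t, _MM_SHUFFLE(3, 3, 2, 2)); /* [1-fy, 1-fy, fy, fy] */
    _mm_storeu_ps(w, _mm_mul_ps(wx, wy));                       /* [W1, W2, W3, W4] */

    /* lanes of ixy are already integral, so the conversion is exact */
    *ix = _mm_cvtss_si32(ixy);
    *iy = _mm_cvtss_si32(_mm_shuffle_ps(ixy, ixy, _MM_SHUFFLE(1, 1, 1, 1)));
}
```

For (x, y) = (3.25, 7.5) this gives ix = 3, iy = 7 and W = [0.375, 0.125, 0.375, 0.125], the same weights used in the fixed-point example of the disclosure.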

By having one SSE instruction process several data items at once, the acceleration method of the invention reduces the number of computation cycles, increases computation speed, and ensures real-time operation.

The above is only a preferred embodiment of the present invention, and is not intended to limit the present invention, and various modifications and changes will occur to those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims

1. A method for accelerating bilinear interpolation computation, characterized by comprising the following steps:

(3) reorganizing the data with the _mm_unpacklo_epi8 and _mm_unpackhi_epi64 instructions: converting [RGBARGBARGBARGBA] into [RRRRGGGGBBBBAAAA], that is, converting the AoS layout into the SoA layout;

(6) the _mm_hadd_epi32 instruction adds outRG and outBA horizontally;