CN107172426B

CN107172426B - OpenCL parallel frame rate up-conversion method based on dual MIC

Info

Publication number: CN107172426B
Application number: CN201710490906.XA
Authority: CN
Inventors: 朱虎明; 王朵; 焦李成; 鹿乐; 田小林; 张小华; 侯彪; 关云辉; 焦文
Original assignee: Xian University of Electronic Science and Technology
Current assignee: Xian University of Electronic Science and Technology
Priority date: 2017-06-23
Filing date: 2017-06-23
Publication date: 2019-10-11
Anticipated expiration: 2037-06-23
Also published as: CN107172426A

Abstract

The invention proposes an OpenCL parallel frame rate up-conversion method based on dual MIC which, while preserving picture quality, shortens the running time of frame rate up-conversion and improves its operating efficiency. The implementation steps are as follows: the main thread initializes the OpenCL devices MIC1 and MIC2; the main thread numbers the frames of the video it has read; the main thread defines and initializes the semaphores; the main thread allocates memory on the host and creates sub-thread 1 and sub-thread 2; sub-thread 1 controls MIC1 and executes the motion estimation algorithm while sub-thread 2 controls MIC2 and executes the motion compensation algorithm, realizing the up-conversion of the video frame rate; the main thread closes sub-thread 1 and sub-thread 2. The invention improves the operating efficiency of the algorithm and can be used in the field of video frame rate up-conversion.

Description

OpenCL parallel frame rate up-conversion method based on dual MIC

Technical field

The invention belongs to the technical field of video processing and relates to an OpenCL parallel frame rate up-conversion method, in particular to an OpenCL parallel frame rate up-conversion method based on dual MIC, which is suitable for fields such as video frame rate up-conversion.

Background art

In recent years, new technologies have continued to emerge in the video field. Increasing the resolution of video, raising its frame rate, and similar techniques have brought viewers a clearer and more striking visual experience. Video has progressed, for example, from the original standard-definition (SD) video to today's high-definition (HD) video and even ultra-high-definition 4K video. 4K video sources are now increasingly common and have gradually entered public view, which also means that viewers demand ever higher picture clarity. In November 2016, director Ang Lee's new film "Billy Lynn's Long Halftime Walk" tried the "120 frames/4K/3D" technology (120 frames played per second, 4K resolution, 3D effect) for the first time. Its high frame rate of 120 frames per second set a new standard for motion picture technology and attracted wide attention in the industry.

Video frame rate up-conversion (FRUC) is a video post-processing technique that converts low-frame-rate video into high-frame-rate video by inserting intermediate frames between the original video frames. Frame rate conversion first appeared in the industry in the 1980s, when linear frame interpolation, including frame repetition and frame averaging, was used to perform frame rate up-conversion; the technology has matured steadily since then. In the mid-1990s, motion-compensated video frame rate up-conversion was proposed. This technique first applies a motion estimation algorithm to moving objects to obtain a vector field as close as possible to the true motion. It works in units of blocks or pixels, and different blocks are likely to yield different vectors, so the individual parts of a moving object are covered and the resulting vectors are more accurate. The motion compensation module then uses the original video frames and the motion vectors from the previous step to obtain the frame to be interpolated by interpolation.
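The two linear interpolation schemes named above, frame repetition and frame averaging, can be sketched as follows. This is a minimal illustration and not code from the patent: frames are assumed to be lists of rows of integer pixel values, and the function names are hypothetical.

```python
def repeat_frame(prev_frame, next_frame):
    """Frame repetition: the inserted frame is a copy of the previous frame."""
    return [row[:] for row in prev_frame]


def average_frames(prev_frame, next_frame):
    """Frame averaging: each inserted pixel is the rounded mean of its two temporal neighbours."""
    return [
        [(a + b + 1) // 2 for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(prev_frame, next_frame)
    ]


def upconvert(frames, interpolate):
    """Insert one interpolated frame between every pair of consecutive frames.

    Assumes `frames` holds at least one frame.
    """
    out = []
    for prev_frame, next_frame in zip(frames, frames[1:]):
        out.append(prev_frame)
        out.append(interpolate(prev_frame, next_frame))
    out.append(frames[-1])
    return out
```

Neither scheme uses motion information, which is why moving objects tend to judder or blur and why the motion-compensated approach described above was introduced.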

As video moves from high definition to ultra-high definition and high-frame-rate video appears, both the image size and the number of frames to be processed have grown greatly. Larger images and more complex algorithms greatly increase the processing time of the original algorithms, so they can no longer meet the requirements of fast or even real-time processing. How to accelerate frame rate up-conversion algorithms has therefore become one of the more urgent problems at present.

In 2012 Intel released the Intel Xeon Phi coprocessor, a Xeon Phi product based on the Many Integrated Core (MIC) architecture. The coprocessor integrates more than 50 compute cores and has a 512-bit vector processing unit (VPU). The hardware of the Intel MIC many-core architecture retains the multi-stage pipeline of a CPU while providing numerous compute cores, and each core can execute 4 threads concurrently, which gives MIC the advantage of handling many tasks at once. In 2015, Jiao Wen proposed in a master's thesis a parallel motion estimation and parallel motion compensation method on the MIC architecture. Using the fork-join model of OpenMP, this method assigns the motion vector computation of each block in motion estimation to a MIC thread and the computation of each pixel in the motion compensation module to a MIC thread. Although this method achieves a certain speedup, it does not study how to realize video frame rate up-conversion on dual MIC.

Summary of the invention

The object of the invention is to address the deficiencies of the prior art described above by proposing an OpenCL parallel frame rate up-conversion method based on dual MIC that, while preserving picture quality, shortens the running time of frame rate up-conversion and improves its operating efficiency.

To achieve the above object, the technical solution adopted by the invention comprises the following steps:

(1) The main thread initializes the OpenCL devices MIC1 and MIC2 so that the host side can control the MIC devices;

(2) The main thread numbers the frames of the video it has read: the main thread reads in N video frames, denotes the image number of the current frame in the motion estimation algorithm as i and initializes i=1, and denotes the image number of the current frame in the motion compensation algorithm as j and initializes j=1, where the value range of i is [1, N] and the value range of j is [1, N];

(3) The main thread defines and initializes the semaphores: the main thread defines semaphore 1 and semaphore 2, initializes the value of semaphore 1 to 1, and initializes the value of semaphore 2 to 0;

(4) The main thread allocates memory on the host and creates the sub-threads: the main thread allocates host memory cpu_mem1, host memory cpu_mem2 and host memory cpu_mem3 on the host, and creates sub-thread 1 and sub-thread 2;

(5) Sub-thread 1 controls MIC1 and executes the motion estimation algorithm:

(5a) Sub-thread 1 allocates memory mic1_mem1 and memory mic1_mem2 on MIC1;

(5b) Sub-thread 1 transfers the image data of the i-th frame and the (i+1)-th frame to memory mic1_mem1;

(5c) MIC1 computes the motion vector MVi of the i-th frame image data in the motion estimation algorithm and stores MVi in memory mic1_mem2;

(5d) Sub-thread 1 transfers MVi from memory mic1_mem2 to host memory cpu_mem1;

(5e) Sub-thread 1 checks whether the value of semaphore 1 is greater than 0; if so, it subtracts 1 from the value of semaphore 1, writes the MVi in host memory cpu_mem1 into host memory cpu_mem2, adds 1 to the value of semaphore 2, and executes step (5g); otherwise, it executes step (5f);

(5f) Sub-thread 1 waits until sub-thread 2 has modified the value of semaphore 1, and then executes step (5e);

(5g) Let i=i+1; sub-thread 1 checks whether i≤N holds; if so, it executes step (5b); otherwise, sub-thread 1 is suspended;

(6) Sub-thread 2 controls MIC2 and executes the motion compensation algorithm, realizing the up-conversion of the video frame rate:

(6a) Sub-thread 2 allocates memory mic2_mem1, memory mic2_mem2 and memory mic2_mem3 on MIC2;

(6b) Sub-thread 2 transfers the image data of the j-th frame and the (j+1)-th frame to memory mic2_mem1;

(6c) Sub-thread 2 checks whether the value of semaphore 2 is greater than 0; if so, it subtracts 1 from the value of semaphore 2, reads the MVi in host memory cpu_mem2 into memory mic2_mem2, and executes step (6e); otherwise, it executes step (6d);

(6d) Sub-thread 2 waits until sub-thread 1 has modified the value of semaphore 2, and then executes step (6c);

(6e) MIC2 computes the motion-compensated interpolation of each pixel in the frame to be interpolated and stores the interpolation result of that frame in memory mic2_mem3;

(6f) Sub-thread 2 transfers the interpolation result from memory mic2_mem3 to host memory cpu_mem3, writes the interpolation result in host memory cpu_mem3 to a file on the hard disk, and adds 1 to the value of semaphore 1;

(6g) Let j=j+1; sub-thread 2 checks whether j≤N holds; if so, it executes step (6b); otherwise, sub-thread 2 is suspended;

(7) The main thread closes sub-thread 1 and sub-thread 2.
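The semaphore handshake in steps (3), (5e)/(5f), (6c)/(6d) and (6f) can be sketched with Python's standard threading module. This is only a model of the synchronization logic under stated assumptions: the patent creates the threads with Pthread and runs OpenCL kernels on the two MIC devices, whereas here `estimate_motion` and `compensate` are hypothetical placeholders, and joining the threads stands in for step (7).

```python
import threading


def run_pipeline(n_frames):
    """Run the two-thread handshake for n_frames frames and return the outputs in order."""
    sem1 = threading.Semaphore(1)  # semaphore 1: cpu_mem2 may be overwritten
    sem2 = threading.Semaphore(0)  # semaphore 2: cpu_mem2 holds an unread motion field
    cpu_mem2 = [None]              # shared host buffer for the motion vectors
    written = []                   # stands in for the file written to disk

    def estimate_motion(i):
        # Placeholder for steps (5b)-(5d) on MIC1 (assumption, not a real kernel).
        return f"MV{i}"

    def compensate(j, mv):
        # Placeholder for step (6e) on MIC2 (assumption, not a real kernel).
        return (j, mv)

    def sub_thread_1():
        for i in range(1, n_frames + 1):
            cpu_mem1 = estimate_motion(i)
            sem1.acquire()          # steps (5e)/(5f): block until the value is > 0, then subtract 1
            cpu_mem2[0] = cpu_mem1  # copy MVi into the shared buffer
            sem2.release()          # add 1 to semaphore 2

    def sub_thread_2():
        for j in range(1, n_frames + 1):
            sem2.acquire()          # steps (6c)/(6d): block until the value is > 0, then subtract 1
            mv = cpu_mem2[0]
            written.append(compensate(j, mv))  # steps (6e)-(6f)
            sem1.release()          # add 1 to semaphore 1

    threads = [threading.Thread(target=sub_thread_1),
               threading.Thread(target=sub_thread_2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()                    # stands in for step (7)
    return written
```

Because semaphore 1 starts at 1 and semaphore 2 starts at 0, sub-thread 1 always writes first, and the two threads then alternate on the shared buffer. Sub-thread 1 can already be estimating motion for frame i+1 while sub-thread 2 is still compensating frame i, which is where the overlap comes from.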

Compared with the prior art, the invention has the following advantages:

1. The invention creates sub-thread 1 and sub-thread 2 on the host side using Pthread. Sub-thread 1 controls MIC1 to compute the motion vectors of the current frame while sub-thread 2 controls MIC2 to compute the motion-compensated interpolation of the frame preceding the current frame. This avoids the time-consuming serial computation of motion vectors and motion-compensated interpolation on a single computing device in the prior art and effectively improves the operating efficiency of frame rate up-conversion.

2. In the invention, sub-thread 1 and sub-thread 2 both need to access the host memory that stores the motion vectors. Sub-thread 1 uses a semaphore to control its writes to this block of host memory, and sub-thread 2 uses a semaphore to control its reads from it. This avoids read/write conflicts between the two threads on this memory and ensures the correctness of frame rate up-conversion.

Brief description of the drawings

Fig. 1 is the implementation flow chart of the invention;

Fig. 2 is the implementation flow chart of the initialization of the OpenCL devices in the invention;

Fig. 3 is the implementation flow chart of sub-thread 1 controlling MIC1 to execute the motion estimation algorithm;

Fig. 4 is the implementation flow chart of sub-thread 2 controlling MIC2 to execute the motion compensation algorithm;

Fig. 5 shows single frames of the test videos of different resolutions used as input to the simulation experiment;

Fig. 6 shows the simulation results used to verify the correctness of the invention.

Detailed description of the embodiments

The invention is further described below with reference to the drawings and a specific embodiment.

Referring to Fig. 1, the OpenCL parallel frame rate up-conversion method based on dual MIC includes the following steps:

Step 1) The main thread initializes the OpenCL devices MIC1 and MIC2 so that the host side can control the MIC devices: after the program starts, execution enters the main thread. The flow chart for initializing the OpenCL devices is shown in Fig. 2. In the stage where the main thread obtains the OpenCL devices to execute on, it obtains the device information of MIC1 through device[0] and the device information of MIC2 through device[1]. It then creates command queue commandqueue1 for the MIC1 device and command queue commandqueue2 for the MIC2 device. In the motion estimation algorithm, commandqueue1 controls the data reads and writes on MIC1 and the execution of the motion estimation kernel. In the motion compensation algorithm, commandqueue2 controls the data reads and writes on MIC2 and the execution of the motion compensation kernel.

Step 2) The main thread numbers the frames of the video it has read: the main thread reads in 30 video frames. Because sub-thread 1, which executes the motion estimation algorithm, and sub-thread 2, which executes the motion compensation algorithm, do not process the same frame in parallel, the image number of the current frame in the motion estimation algorithm is denoted i and initialized to i=1, and the image number of the current frame in the motion compensation algorithm is denoted j and initialized to j=1, where the value range of i is [1, 30] and the value range of j is [1, 30].

Step 3) The main thread defines and initializes the semaphores: to prevent the two threads from conflicting when they access the same memory, the invention uses semaphores to control the communication between them. The main thread defines semaphore 1 and semaphore 2, initializes the value of semaphore 1 to 1, and initializes the value of semaphore 2 to 0.

Step 4) The main thread allocates memory on the host and creates the sub-threads: the main thread allocates host memory cpu_mem1, host memory cpu_mem2 and host memory cpu_mem3 on the host, and creates sub-thread 1 and sub-thread 2 using the Pthread function pthread_create.

Step 5) Sub-thread 1 controls MIC1 and executes the motion estimation algorithm; the implementation flow chart is shown in Fig. 3:

Step 5a) Sub-thread 1 uses the OpenCL function clCreateBuffer to allocate memory mic1_mem1 and memory mic1_mem2 on MIC1. mic1_mem1 stores the input image data and mic1_mem2 stores the computed motion vectors.

Step 5b) Sub-thread 1 uses the OpenCL function clEnqueueWriteBuffer to transfer the image data of the i-th frame and the (i+1)-th frame to memory mic1_mem1.

Step 5c) MIC1 computes the motion vector MVi of the i-th frame image data in the motion estimation algorithm and stores MVi in memory mic1_mem2: the image of the i-th frame is divided into 240 × 135 macroblocks, the set of candidate motion vectors of each macroblock is computed, the SAD value of each candidate motion vector in the set is obtained according to the SAD formula below, the candidate motion vector with the smallest SAD value is selected as the motion vector of the macroblock, and the motion vectors of all macroblocks of the i-th frame image are computed in turn.

$$\mathrm{SAD} = \sum_{m=1}^{X} \sum_{n=1}^{Y} \left| x_{mn} - y_{mn} \right|$$

where X is the width of the current macroblock in the i-th frame, Y is the height of the current macroblock in the i-th frame, x_mn is the pixel value at position (m, n) in the current block of the i-th frame, and y_mn is the pixel value at position (m, n) in the matching block of the (i+1)-th frame.

Step 5d) Sub-thread 1 uses the OpenCL function clEnqueueReadBuffer to transfer MVi from memory mic1_mem2 to host memory cpu_mem1.

Step 5e) Sub-thread 1 checks whether the value of semaphore 1 is greater than 0. If so, sub-thread 2 has finished the motion compensation of the j-th frame, so data can be copied into the memory area that holds the input data of sub-thread 2: sub-thread 1 subtracts 1 from the value of semaphore 1, writes the MVi in host memory cpu_mem1 into host memory cpu_mem2, adds 1 to the value of semaphore 2, and executes step 5g). Otherwise, it executes step 5f).

Step 5f) Sub-thread 1 waits until sub-thread 2 has modified the value of semaphore 1, and then executes step 5e).

Step 5g) Let i=i+1; sub-thread 1 checks whether i≤30 holds. If so, it computes the motion vectors of the next frame image by executing step 5b). Otherwise, the motion vectors of all frames of the video have been computed, and sub-thread 1 is suspended.
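The SAD-based selection in step 5c) can be sketched for a single macroblock as follows. The patent does not say how the set of candidate motion vectors is built, so this sketch assumes a small full-search window of `radius` pixels around the block; the function names, the 0-based indexing and the (dy, dx) ordering are likewise illustrative choices and not part of the source.

```python
def sad(frame_a, frame_b, top, left, dy, dx, block):
    """Sum of absolute differences between the block at (top, left) in frame_a
    and the block displaced by (dy, dx) in frame_b."""
    total = 0
    for n in range(block):
        for m in range(block):
            total += abs(frame_a[top + n][left + m]
                         - frame_b[top + dy + n][left + dx + m])
    return total


def best_motion_vector(frame_a, frame_b, top, left, block=2, radius=1):
    """Return (sad, dy, dx) for the candidate with the smallest SAD.

    Candidates are all displacements within `radius` whose block stays inside
    frame_b (an assumed candidate set). Ties keep the first candidate found.
    """
    height, width = len(frame_b), len(frame_b[0])
    best = None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + block > height or x + block > width:
                continue  # displaced block would fall outside the frame
            cost = sad(frame_a, frame_b, top, left, dy, dx, block)
            if best is None or cost < best[0]:
                best = (cost, dy, dx)
    return best
```

Each macroblock's search is independent of the others, which is what makes this stage a natural fit for a data-parallel kernel on a many-core device.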

Step 6) Sub-thread 2 controls MIC2 and executes the motion compensation algorithm; the implementation flow chart is shown in Fig. 4:

Step 6a) Sub-thread 2 uses the OpenCL function clCreateBuffer to allocate memory mic2_mem1, memory mic2_mem2 and memory mic2_mem3 on MIC2.

Step 6b) Sub-thread 2 uses the OpenCL function clEnqueueWriteBuffer to transfer the image data of the j-th frame and the (j+1)-th frame to memory mic2_mem1.

Step 6c) Sub-thread 2 checks whether the value of semaphore 2 is greater than 0. If so, sub-thread 1 has written MVi from host memory cpu_mem1 into host memory cpu_mem2: sub-thread 2 subtracts 1 from the value of semaphore 2, reads the MVi in host memory cpu_mem2 into memory mic2_mem2, and executes step 6e). Otherwise, it executes step 6d).

Step 6d) Sub-thread 2 waits until sub-thread 1 has modified the value of semaphore 2, and then executes step 6c).

Step 6e) MIC2 computes the interpolation result of each pixel in the frame to be interpolated and stores the interpolation result of that frame in memory mic2_mem3.

Step 6f) Sub-thread 2 uses the OpenCL function clEnqueueReadBuffer to transfer the interpolation result from memory mic2_mem3 to host memory cpu_mem3, writes the interpolation result in host memory cpu_mem3 to a file on the hard disk, and adds 1 to the value of semaphore 1, indicating that the motion-compensated interpolation of the j-th frame is complete.

Step 6g) Let j=j+1; sub-thread 2 checks whether j≤N holds (here N=30). If so, it computes the motion-compensated interpolation of the next frame image by executing step 6b). Otherwise, all frames to be interpolated have been computed, and sub-thread 2 is suspended.

Step 7) The main thread closes sub-thread 1 and sub-thread 2.
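The text does not give the interpolation formula used in step 6e). As a rough sketch of what a per-pixel motion-compensated interpolation might look like, the following assumes a common bidirectional scheme: each pixel of the inserted frame averages the pixel half a motion vector back in frame j and the pixel the remaining displacement forward in frame j+1. The scheme, the clamping at the frame borders and all names are assumptions, not the patent's method.

```python
def clamp(value, low, high):
    """Limit value to the closed range [low, high]."""
    return max(low, min(high, value))


def interpolate_frame(frame_a, frame_b, motion, block):
    """Build the intermediate frame between frame_a and frame_b.

    motion[by][bx] is the (dy, dx) vector of the macroblock in block-row by and
    block-column bx; every pixel uses the vector of the block that contains it.
    """
    height, width = len(frame_a), len(frame_a[0])
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            dy, dx = motion[y // block][x // block]
            # Step back by half the vector into frame_a and forward by the
            # remainder into frame_b, clamping coordinates at the borders.
            ya = clamp(y - dy // 2, 0, height - 1)
            xa = clamp(x - dx // 2, 0, width - 1)
            yb = clamp(y + (dy - dy // 2), 0, height - 1)
            xb = clamp(x + (dx - dx // 2), 0, width - 1)
            out[y][x] = (frame_a[ya][xa] + frame_b[yb][xb] + 1) // 2
    return out
```

With zero motion this reduces to plain frame averaging. Every output pixel depends only on the two input frames and one motion vector, so the pixels can be computed independently, which matches the per-pixel parallelism described for MIC2.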

The technical effect of the invention is further illustrated below with a simulation experiment:

1) Simulation conditions:

Single frames of the test videos of different resolutions used as input to the simulation experiment are shown in Fig. 5. Fig. 5(a) is a single frame of the 2K-resolution test video ParkScene_1920×1080, and Fig. 5(b) is a single frame of the 4K-resolution test video Sunset_3840×2160.

The simulation environment is the cluster of the High Performance Computing Center of Xian University of Electronic Science and Technology; the test platform parameters are shown in Table 1.

Table 1

2) Simulation content and analysis of results:

For the OpenCL parallel frame rate up-conversion method based on dual MIC, Table 2 compares the PSNR values of the serial algorithm and the parallel algorithm of the invention, Table 3 gives the test times of the motion estimation algorithm and the motion compensation algorithm, and Table 4 gives the serial test time and the test time of the invention.

Table 2

Video sequence	Serial algorithm PSNR (dB)	Parallel algorithm of the present invention PSNR (dB)
2K video	36.39	36.37
4K video	38.43	38.40

Table 3

Test sequence	Motion estimation algorithm (ms)	Motion compensation algorithm (ms)
2K video	240.52	187.92
4K video	249.37	684.38

Table 4

Test sequence	Serial algorithm (ms)	Parallel method of the present invention (ms)
2K video	428.44	261.30
4K video	933.75	689.61

Fig. 6 shows the simulation results used to verify correctness: Fig. 6(a) is the single-frame simulation result for the 2K-resolution video and Fig. 6(b) is the single-frame simulation result for the 4K-resolution video. To judge the correctness of the parallel scheme more accurately, the peak signal-to-noise ratio (PSNR), an objective evaluation criterion, is used to assess the quality of the algorithm. The PSNR values of the simulation results are shown in Table 2. The PSNR obtained by the proposed parallel frame rate up-conversion algorithm is close to that of the serial algorithm, differing by 0.02 dB for the 2K video and 0.03 dB for the 4K video. It can be concluded that the picture quality of the parallel algorithm is consistent with that of the serial algorithm, which verifies the correctness of the parallel frame rate up-conversion algorithm.
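For reference, PSNR between a reference frame and a test frame is conventionally computed from the mean squared error as 10·log10(peak² / MSE). The sketch below assumes 8-bit samples (peak value 255) and frames given as lists of rows; the text does not state which frames or which colour component were compared, so the function is only a generic illustration.

```python
import math


def psnr(reference, test, peak=255):
    """Peak signal-to-noise ratio in dB between two equally sized frames."""
    squared_error = 0
    count = 0
    for row_r, row_t in zip(reference, test):
        for r, t in zip(row_r, row_t):
            squared_error += (r - t) ** 2
            count += 1
    mse = squared_error / count
    if mse == 0:
        return math.inf  # identical frames
    return 10 * math.log10(peak ** 2 / mse)
```

A higher PSNR means the test frame is closer to the reference, so nearly equal PSNR values for the serial and parallel runs indicate that parallelization did not change the output quality in any meaningful way.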

Table 3 gives the serial test times of the motion estimation algorithm and the motion compensation algorithm in the frame rate up-conversion algorithm.

Table 4 gives the serial test time of the frame rate up-conversion algorithm and the parallel test time of the invention. The serial test time is the sum of the motion estimation time and the motion compensation time in Table 3. The test time of the invention is approximately the larger of the motion estimation and motion compensation test times in Table 3, plus the data copy time and the semaphore wait time. Table 4 shows that the invention reduces the test time from 428.44 ms to 261.30 ms for the 2K video and from 933.75 ms to 689.61 ms for the 4K video, which effectively improves the computational efficiency of the frame rate up-conversion algorithm.
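The relationship between Table 3 and Table 4 can be checked with a few lines of arithmetic. The numbers below are copied from the two tables; the dictionary layout and the `summarize` helper are only an illustrative way to organize them.

```python
# (motion estimation ms, motion compensation ms) from Table 3
table3 = {"2K": (240.52, 187.92), "4K": (249.37, 684.38)}
# (serial ms, parallel ms) from Table 4
table4 = {"2K": (428.44, 261.30), "4K": (933.75, 689.61)}


def summarize(name):
    """Derive the speedup and the overhead beyond the slower stage for one video."""
    me, mc = table3[name]
    serial, parallel = table4[name]
    return {
        # The serial time should equal the sum of the two stage times.
        "serial_matches_sum": abs((me + mc) - serial) < 0.005,
        # Speedup of the dual-MIC method over the serial algorithm.
        "speedup": serial / parallel,
        # Time beyond the slower stage: data copies plus semaphore waits.
        "overhead_ms": parallel - max(me, mc),
    }
```

Evaluating `summarize("2K")` and `summarize("4K")` gives speedups of about 1.64 and 1.35, with roughly 20.8 ms and 5.2 ms of time beyond the slower stage. The speedup is smaller for the 4K video because its two stages are far less balanced, so the slower motion compensation stage dominates the pipeline.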

Claims

1. An OpenCL parallel frame rate up-conversion method based on dual MIC, including the following steps:

(1) The main thread initializes the OpenCL devices MIC1 and MIC2 so that the host side can control the MIC devices;

(2) The main thread numbers the frames of the video it has read: the main thread reads in N video frames, denotes the image number of the current frame in the motion estimation algorithm as i and initializes i=1, and denotes the image number of the current frame in the motion compensation algorithm as j and initializes j=1, where the value range of i is [1, N] and the value range of j is [1, N];

(3) The main thread defines and initializes the semaphores: the main thread defines semaphore 1 and semaphore 2, initializes the value of semaphore 1 to 1, and initializes the value of semaphore 2 to 0;

(4) The main thread allocates memory on the host and creates the sub-threads: the main thread allocates host memory cpu_mem1, host memory cpu_mem2 and host memory cpu_mem3 on the host, and creates sub-thread 1 and sub-thread 2;

(5) Sub-thread 1 controls MIC1 and executes the motion estimation algorithm:

(5a) Sub-thread 1 allocates memory mic1_mem1 and memory mic1_mem2 on MIC1;

(5b) Sub-thread 1 transfers the image data of the i-th frame and the (i+1)-th frame to memory mic1_mem1;

(5c) MIC1 computes the motion vector MVi of the i-th frame image data in the motion estimation algorithm and stores MVi in memory mic1_mem2;

(5d) Sub-thread 1 transfers MVi from memory mic1_mem2 to host memory cpu_mem1;

(5e) Sub-thread 1 checks whether the value of semaphore 1 is greater than 0; if so, it subtracts 1 from the value of semaphore 1, writes the MVi in host memory cpu_mem1 into host memory cpu_mem2, adds 1 to the value of semaphore 2, and executes step (5g); otherwise, it executes step (5f);

(5f) Sub-thread 1 waits until sub-thread 2 has modified the value of semaphore 1, and then executes step (5e);

(5g) Let i=i+1; sub-thread 1 checks whether i≤N holds; if so, it executes step (5b); otherwise, sub-thread 1 is suspended;

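(6) Sub-thread 2 controls MIC2 and executes the motion compensation algorithm, realizing the up-conversion of the video frame rate:

(6a) Sub-thread 2 allocates memory mic2_mem1, memory mic2_mem2 and memory mic2_mem3 on MIC2;

(6b) Sub-thread 2 transfers the image data of the j-th frame and the (j+1)-th frame to memory mic2_mem1;

(6c) Sub-thread 2 checks whether the value of semaphore 2 is greater than 0; if so, it subtracts 1 from the value of semaphore 2, reads the MVi in host memory cpu_mem2 into memory mic2_mem2, and executes step (6e); otherwise, it executes step (6d);

(6d) Sub-thread 2 waits until sub-thread 1 has modified the value of semaphore 2, and then executes step (6c);

(6e) MIC2 computes the motion-compensated interpolation of each pixel in the frame to be interpolated and stores the interpolation result of that frame in memory mic2_mem3;

(6f) Sub-thread 2 transfers the interpolation result from memory mic2_mem3 to host memory cpu_mem3, writes the interpolation result in host memory cpu_mem3 to a file on the hard disk, and adds 1 to the value of semaphore 1;

(6g) Let j=j+1; sub-thread 2 checks whether j≤N holds; if so, it executes step (6b); otherwise, sub-thread 2 is suspended;

(7) The main thread closes sub-thread 1 and sub-thread 2.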

2. The OpenCL parallel frame rate up-conversion method based on dual MIC according to claim 1, characterized in that the creation of sub-thread 1 and sub-thread 2 in step (4) uses the Pthread function pthread_create.

3. The OpenCL parallel frame rate up-conversion method based on dual MIC according to claim 1, characterized in that the allocation by sub-thread 1 of memory mic1_mem1 and memory mic1_mem2 on MIC1 in step (5a) and the allocation by sub-thread 2 of memory mic2_mem1, memory mic2_mem2 and memory mic2_mem3 on MIC2 in step (6a) both use the OpenCL function clCreateBuffer.

4. The OpenCL parallel frame rate up-conversion method based on dual MIC according to claim 1, characterized in that the transfer by sub-thread 1 of the image data of the i-th frame and the (i+1)-th frame to memory mic1_mem1 in step (5b) and the transfer by sub-thread 2 of the image data of the j-th frame and the (j+1)-th frame to memory mic2_mem1 in step (6b) both use the OpenCL function clEnqueueWriteBuffer.

5. The OpenCL parallel frame rate up-conversion method based on dual MIC according to claim 1, characterized in that MIC1 in step (5c) computes the motion vector MVi of the i-th frame image data in the motion estimation algorithm through the following steps:

(5c1) The image of the i-th frame is divided into M × N₂ macroblocks, where M≥1 and N₂≥1;

(5c2) The set of candidate motion vectors of each macroblock is computed;

(5c3) The SAD value of each candidate motion vector in the set is obtained according to the SAD formula:

$$\mathrm{SAD} = \sum_{m=1}^{X} \sum_{n=1}^{Y} \left| x_{mn} - y_{mn} \right|$$

where X is the width of the current macroblock in the i-th frame, Y is the height of the current macroblock in the i-th frame, x_mn is the pixel value at position (m, n) in the current block of the i-th frame, and y_mn is the pixel value at position (m, n) in the matching block of the (i+1)-th frame;

(5c4) The candidate motion vector with the smallest SAD value is selected as the motion vector of the macroblock;

(5c5) The motion vectors of all macroblocks of the i-th frame image are computed in turn.

6. The OpenCL parallel frame rate up-conversion method based on dual MIC according to claim 1, characterized in that the transfer by sub-thread 1 of MVi from memory mic1_mem2 to host memory cpu_mem1 in step (5d) and the transfer by sub-thread 2 of the interpolation result from memory mic2_mem3 to host memory cpu_mem3 in step (6f) both use the OpenCL function clEnqueueReadBuffer.