CN102497547A - Processing method for cloud terminal video data - Google Patents

Processing method for cloud terminal video data

Info

Publication number
CN102497547A
CN102497547A CN2011103909489A CN201110390948A
Authority
CN
China
Prior art keywords
gpu
video data
estimation
cpu
terminal video
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN2011103909489A
Other languages
Chinese (zh)
Inventor
李涛
马海峰
李志宁
肖雄伟
何剑荣
刘小瑞
季统凯
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
G Cloud Technology Co Ltd
Original Assignee
G Cloud Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by G Cloud Technology Co Ltd filed Critical G Cloud Technology Co Ltd
Priority to CN2011103909489A priority Critical patent/CN102497547A/en
Publication of CN102497547A publication Critical patent/CN102497547A/en
Pending legal-status Critical Current



Landscapes

  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The invention relates to the technical field of video data processing, in particular to a processing method for cloud terminal video data. In the method, when the video is encoded and compressed, the work is performed jointly by a CPU (central processing unit) and a GPU (graphics processing unit); that is, the motion estimation of the encoding is completed by the GPU. The feedback path from the reconstructed frame to motion estimation is cut off, motion estimation uses the original frame in place of the reconstructed frame, and the interdependence between the CPU and the GPU is eliminated. The processing method disclosed by the invention effectively solves the problem that, when multiple cloud terminals play videos at the same time, the server's CPU usage is high and its resources become a bottleneck, so that video playback is not smooth; the method can be widely applied to cloud terminal video data processing.

Description

Processing method for cloud terminal video data
Technical field
The present invention relates to the field of video data processing, in particular to a processing method for cloud terminal video data.
Background art
At present, when cloud terminals play video, the encoding and compression of the video data is mostly performed by the server CPU. When many cloud terminals play videos simultaneously, CPU usage on the server side becomes very high: for an 8-core 2.4 GHz CPU, with five cloud terminals playing video at the same time, CPU usage approaches the 800% limit. Playback then freezes and jitters badly and video does not play smoothly, which limits the number of cloud terminals a single server can drive.
Summary of the invention
The technical problem solved by the present invention is to provide a processing method for cloud terminal video data that effectively overcomes the problem that, when many cloud terminals play videos at the same time, server CPU usage is high and resources reach a bottleneck, causing video playback to become choppy.
The technical solution by which the present invention solves the above technical problem is: when the video is encoded and compressed, the work is performed jointly by the CPU and the GPU, that is, the motion estimation of the encoding is completed by the GPU; the feedback path from the reconstructed frame to motion estimation is cut off, motion estimation is performed using the original frame instead of the reconstructed frame, and the interdependence between the CPU and the GPU is eliminated.
A buffer is set up between the CPU and GPU encoding pipelines to store the motion vectors (MVs) produced by the GPU.
The motion estimation algorithm on the GPU is an LMES algorithm based on depth testing.
The algorithm is carried out as follows:
Step 1: set the Z values of all vertices in the triangle mesh that forms the frame to 1;
Step 2: initialize all values in the Z-buffer to 0, set the depth test condition to "pass if greater than", and enable depth testing;
Step 3: disable Z-buffer updates and run PS0, which computes the SAD of the current candidate MV; if the corresponding point in the Z-buffer is 1, this pixel shader will not execute, or at least will produce no output;
Step 4: enable Z-buffer updates and run PS1, which compares the current SAD with a threshold; if it is greater, the point is discarded; otherwise the corresponding point in the Z-buffer is automatically updated to 1;
Step 5: take the next MV as the candidate MV and return to step 3.
A two-thread program structure is adopted: the main thread controls the overall program flow and performs all encoder work except motion estimation, while the sub-thread is responsible for driving the GPU to complete motion estimation.
A step that spawns the sub-thread is added to the main thread, and the function that performs motion estimation is replaced by a function that reads from MVBuf.
The present invention adds a graphics card to the server configuration and proposes a cloud encoder with a CPU+GPU parallel architecture. By adopting a two-thread program structure and a depth-test-based LMES algorithm, it makes full use of the parallel computing power of the GPU, thereby effectively reducing CPU usage and solving the resource bottleneck that arises when many cloud terminals play videos. Moreover, video encoding and compression on the GPU is much more efficient than on the CPU, which guarantees smooth video playback and increases the number of cloud terminals a single server can drive.
The present invention adopts a frame-level parallel architecture, so data exchange between the CPU and the GPU occurs only once a whole frame has been processed; because the granularity is a frame, the GPU also works more efficiently.
A two-thread program structure is adopted. The GPU cannot process data by itself and must be driven by the CPU; at the same time, the encoder architecture requires motion estimation and the other parts of the encoder to run in two parallel pipelines. A single thread cannot meet this requirement well, so a two-thread program structure is used.
An LMES algorithm based on depth testing is proposed, overcoming the limitations of traditional algorithms. The algorithm needs no explicit conditional statements in the pixel shader program; instead it uses the depth test built into the 3D pipeline to implement the early-exit strategy of motion estimation, so it runs more efficiently than the programmable pixel pipeline alone.
Description of drawings:
The present invention is further described below in conjunction with the accompanying drawings:
Fig. 1 is a frame structure diagram of an existing encoder;
Fig. 2 is a diagram of the parallel architecture of the present invention;
Fig. 3 is a flowchart of the depth-test-based LMES algorithm of the present invention;
Fig. 4 is a workflow diagram of a GPU-based general-purpose computing program according to the present invention;
Fig. 5 is a flowchart of the main thread of the present invention;
Fig. 6 is a flowchart of the sub-thread of the present invention.
Embodiment
As shown in the accompanying drawings, existing video coding standards all share a basic compression framework based on temporally predictive transform coding. The current frame is first predicted from frames that have already been encoded; the residual is obtained and then transform coded and quantized in the transform domain, with quantization losing some information entropy; finally, entropy coding is applied. The traditional MPEG-2 encoder framework, with the coding control part omitted, is shown in Fig. 1. The boxes in the figure represent the processing steps of the encoder. Note the feedback path from quantization (Quantization) back to motion compensation (MC) and motion estimation (ME).
To accelerate MPEG-2 encoding with the GPU, the encoder architecture contains two processors, a CPU and a GPU. A parallel encoder architecture must therefore be adopted so that the CPU and the GPU work at the same time, and the conventional encoder structure must be transformed to this end. Since ME has long been the biggest bottleneck in the encoder, the ME part can be moved onto the GPU; the feedback path from the reconstructed frame to ME is cut off, ME is performed on the original frame instead of the reconstructed frame, and the interdependence between the CPU and the GPU is eliminated. The present invention proposes the dual-pipeline parallel architecture shown in Fig. 2. As can be seen from the figure, the new encoder architecture forms two parallel pipelines, one running on the GPU and one on the CPU. The GPU pipeline is not affected by the CPU pipeline at all and is limited only by I/O, whereas the CPU pipeline, besides the I/O limit, also has to wait for the MVs produced by the GPU. To eliminate this waiting, a buffer is set up between the two pipelines to store the MVs produced by the GPU.
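To make the architectural change concrete, the following minimal C++ sketch (not code from the patent; the types Frame and MV, the function names, and the exhaustive search are illustrative assumptions) shows block-based motion estimation performed against the original previous frame, which is all the decoupled GPU pipeline needs as input.

```cpp
#include <algorithm>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Hypothetical minimal types; the patent does not define concrete data structures.
struct Frame { int w, h; std::vector<uint8_t> y; };   // luma plane only
struct MV    { int dx, dy; };

// Sum of absolute differences of one 16x16 macroblock for a candidate MV.
static int sad16(const Frame& cur, const Frame& ref, int mbx, int mby, MV mv) {
    int sad = 0;
    for (int y = 0; y < 16; ++y)
        for (int x = 0; x < 16; ++x) {
            int cx = mbx * 16 + x, cy = mby * 16 + y;
            // Clamping at the border stands in for the padding step done on the GPU.
            int rx = std::min(std::max(cx + mv.dx, 0), ref.w - 1);
            int ry = std::min(std::max(cy + mv.dy, 0), ref.h - 1);
            sad += std::abs(int(cur.y[cy * cur.w + cx]) - int(ref.y[ry * ref.w + rx]));
        }
    return sad;
}

// In a conventional hybrid encoder the reference passed here would be the
// *reconstructed* previous frame, which exists only after the CPU pipeline has
// finished transform, quantization and the inverse steps.  With the feedback path
// cut, the reference is simply the *original* previous frame, so the GPU-side ME
// pipeline depends only on raw input frames and can run independently of the CPU.
MV estimate_block(const Frame& cur, const Frame& prev_original,
                  int mbx, int mby, int range) {
    MV best{0, 0};
    int best_sad = sad16(cur, prev_original, mbx, mby, best);
    for (int dy = -range; dy <= range; ++dy)
        for (int dx = -range; dx <= range; ++dx) {
            MV cand{dx, dy};
            int s = sad16(cur, prev_original, mbx, mby, cand);
            if (s < best_sad) { best_sad = s; best = cand; }
        }
    return best;
}
```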
The parallel architecture makes CPU+GPU parallel encoding possible. To make full use of the GPU's highly parallel computation, an ME algorithm suited to the GPU must be designed. The present invention proposes a depth-test-based LMES algorithm (ZB-LMES), which exploits the fact that depth testing can greatly accelerate rendering in the 3D pipeline. The specific algorithm is as follows:
1. Set the Z values of all vertices in the triangle mesh that forms the frame to 1.
2. Initialize all values in the Z-buffer to 0, set the depth test condition to "pass if greater than", and enable depth testing.
3. Disable Z-buffer updates and run PS0, which computes the SAD of the current candidate MV. If the corresponding point in the Z-buffer is 1, this pixel shader will not execute, or at least will produce no output.
4. Enable Z-buffer updates and run PS1, which compares the current SAD with a threshold. If it is greater, the point is discarded; otherwise the corresponding point in the Z-buffer is automatically updated to 1.
5. Take the next MV as the candidate MV and return to step 3.
The flow of ZB-LMES is shown in Fig. 3. The depth test and the automatic Z-buffer update in the figure are performed by the 3D pipeline and need no programmer intervention, so they execute very quickly. The two boxes PS0 and PS1 represent the two pixel shader programs in ZB-LMES; they compute the SAD and update the Z-buffer, respectively. The first step of the algorithm sets the Z attribute of all vertices to 1; since pixel attributes are interpolated from vertex attributes, the Z attribute of every pixel is then also a constant 1.
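The following plain-C++ sketch mirrors the control flow of ZB-LMES for one block so that the early-exit role of the depth test is explicit; the function name zb_lmes_block, the SAD callback, and the done flag are illustrative assumptions rather than identifiers from the patent. On the GPU, the culling expressed here as an if statement is carried out by the fixed-function depth test, with no explicit branch in PS0 or PS1.

```cpp
#include <climits>
#include <functional>
#include <vector>

struct MV { int dx, dy; };

// CPU-side reference for one block of the ZB-LMES search.  The 'done' flag plays
// the role of the block's Z-buffer entry; steps refer to the numbered list above.
MV zb_lmes_block(const std::vector<MV>& candidates,
                 const std::function<int(MV)>& sad,   // PS0's job: SAD of a candidate
                 int sad_threshold) {
    bool done = false;            // step 2: Z-buffer entry initialized to 0
    MV best{0, 0};
    int best_sad = INT_MAX;
    for (MV cand : candidates) {  // step 5: advance to the next candidate MV
        if (done) break;          // step 3: depth test culls PS0 once the entry is 1
        int s = sad(cand);        // step 3 (PS0): compute the SAD of the candidate
        if (s < best_sad) { best_sad = s; best = cand; }
        if (s <= sad_threshold)   // step 4 (PS1): SAD not above the threshold, so the
            done = true;          // Z-buffer entry is set to 1 -> early exit of ME
    }
    return best;
}
```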
The encoder is implemented with CUDA, runs on the Linux platform, and is written in C++. Because CUDA is NVIDIA's parallel computing technology, the implementation is only suitable for NVIDIA graphics cards that support CUDA. From the programmer's point of view, the general workflow of a GPU-based general-purpose computing program is shown in Fig. 4. As the figure shows, the GPU cannot process data by itself; it must be driven by the CPU. At the same time, our encoder architecture requires motion estimation and the other parts of the encoder to run in two parallel pipelines. A single-threaded program cannot meet this requirement well, so the present invention adopts a two-thread program structure: the main thread controls the overall program flow and performs all encoder work except ME, while the sub-thread is responsible for driving the GPU to complete ME.
Fig. 5 shows the flow of the main thread. As can be seen from the figure, the flow of the main thread is very similar to the sequence of a traditional encoder implemented on the CPU; the only differences are that the main thread has an extra step that spawns the sub-thread, and the function that performs motion estimation is replaced by a function that reads from MVBuf.
Fig. 6 shows the flow of the sub-thread, which replaces the ME part of a conventional encoder. Because the GPU works under the control of the CPU, for some of the parts in the figure the programmer only needs to call the 3D API to update data and issue rendering commands; no code that does the real work has to be written in the sub-thread program itself. The Padding part, for example, works this way, although the shader program that actually performs the padding on the GPU must still be written separately. Apart from the Padding part, the SAD threshold setting, whole-pixel motion estimation, and half-pixel motion estimation in the sub-thread flow follow the same pattern, as sketched below.
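As an illustration of this division of labor, the sketch below models the sub-thread's per-frame flow in C++ under stated assumptions: GpuSteps, sub_thread_process_frame, and the callback signatures are hypothetical stand-ins, since the real work is performed by GPU kernels or shaders invoked through CUDA or the 3D API, and the host-side sub-thread merely sequences those calls and forwards the resulting MVs (here through emit_mvs, which in the encoder would push into MVBuf, sketched after the next paragraph).

```cpp
#include <functional>
#include <utility>
#include <vector>

struct MV    { int dx, dy; };
struct Frame { int w, h; std::vector<unsigned char> y; };

// Each GPU-side step of Fig. 6 is represented by a callback; in the real encoder
// these callbacks would issue the CUDA / 3D API calls and the GPU would do the work.
struct GpuSteps {
    std::function<void(const Frame&)>            upload_and_pad;    // data upload + padding shader
    std::function<void(int)>                     set_sad_threshold; // threshold used for early exit
    std::function<std::vector<MV>(const Frame&)> integer_pel_me;    // whole-pixel motion estimation
    std::function<std::vector<MV>(const Frame&, const std::vector<MV>&)> half_pel_me; // refinement
};

// Sub-thread body for one frame: drive the GPU, then hand the MVs to the sink
// (in the encoder the sink pushes into MVBuf; see the queue sketch below).
void sub_thread_process_frame(const Frame& frame, const GpuSteps& gpu, int sad_threshold,
                              const std::function<void(std::vector<MV>)>& emit_mvs) {
    gpu.upload_and_pad(frame);                      // update data on the GPU and pad borders
    gpu.set_sad_threshold(sad_threshold);           // configure the SAD threshold
    auto coarse  = gpu.integer_pel_me(frame);       // whole-pixel ME
    auto refined = gpu.half_pel_me(frame, coarse);  // half-pixel ME
    emit_mvs(std::move(refined));                   // producer side of MVBuf
}
```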
Between the main thread and the sub-thread there is a result buffer, MVBuf, used to store the ME results. It is a first-in-first-out queue, and the relationship of the two threads to this queue is the classic producer-consumer model: the sub-thread produces MVs and appends them to the tail of the queue, while the main thread consumes MVs from the head. A thread synchronization mechanism is used to resolve the access conflicts between the main thread and the sub-thread on MVBuf.
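A minimal C++ sketch of such a buffer follows, assuming std::mutex and std::condition_variable as the synchronization mechanism; the class name MVBuf and the per-frame std::vector<MV> payload are illustrative, as the patent only states that MVBuf is a first-in-first-out queue shared by the two threads. The small main() demonstrates the producer-consumer pattern: the sub-thread pushes MVs while the main thread reads them at the point where a conventional encoder would call its ME function.

```cpp
#include <condition_variable>
#include <cstdio>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

struct MV { int dx, dy; };

// Thread-safe FIFO standing in for MVBuf: one entry holds the MVs of one frame.
class MVBuf {
public:
    void push(std::vector<MV> frame_mvs) {          // producer: the GPU sub-thread
        { std::lock_guard<std::mutex> lk(m_);
          q_.push(std::move(frame_mvs)); }
        cv_.notify_one();
    }
    std::vector<MV> pop() {                         // consumer: the main thread
        std::unique_lock<std::mutex> lk(m_);
        cv_.wait(lk, [this] { return !q_.empty(); });
        auto mvs = std::move(q_.front());
        q_.pop();
        return mvs;
    }
private:
    std::queue<std::vector<MV>> q_;
    std::mutex m_;
    std::condition_variable cv_;
};

int main() {
    MVBuf mvbuf;
    // Sub-thread (producer): stands in for GPU-side ME, one MV set per frame.
    std::thread sub([&] {
        for (int frame = 0; frame < 3; ++frame)
            mvbuf.push({{frame, -frame}, {0, frame}});
    });
    // Main thread (consumer): where a conventional encoder would call its ME
    // function, it simply reads the already-computed MVs from MVBuf.
    for (int frame = 0; frame < 3; ++frame) {
        auto mvs = mvbuf.pop();
        std::printf("frame %d: %zu MVs\n", frame, mvs.size());
    }
    sub.join();
    return 0;
}
```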

Claims (6)

1. A processing method for cloud terminal video data, characterized in that: when the video is encoded and compressed, the work is performed jointly by a CPU and a GPU, that is, the motion estimation of the encoding is completed by the GPU; and the feedback path from the reconstructed frame to motion estimation is cut off, motion estimation is performed using the original frame instead of the reconstructed frame, and the interdependence between the CPU and the GPU is eliminated.
2. The processing method for cloud terminal video data according to claim 1, characterized in that: a buffer is set up between the CPU and GPU encoding pipelines to store the MVs produced by the GPU.
3. The processing method for cloud terminal video data according to claim 1 or 2, characterized in that: the motion estimation algorithm on the GPU is an LMES algorithm based on depth testing.
4. The processing method for cloud terminal video data according to claim 3, characterized in that: the algorithm is carried out as follows:
Step 1: set the Z values of all vertices in the triangle mesh that forms the frame to 1;
Step 2: initialize all values in the Z-buffer to 0, set the depth test condition to "pass if greater than", and enable depth testing;
Step 3: disable Z-buffer updates and run PS0, which computes the SAD of the current candidate MV; if the corresponding point in the Z-buffer is 1, this pixel shader will not execute, or at least will produce no output;
Step 4: enable Z-buffer updates and run PS1, which compares the current SAD with a threshold; if it is greater, the point is discarded; otherwise the corresponding point in the Z-buffer is automatically updated to 1;
Step 5: take the next MV as the candidate MV and return to step 3.
5. The processing method for cloud terminal video data according to claim 4, characterized in that: a two-thread program structure is adopted, in which the main thread controls the overall program flow and performs all encoder work except motion estimation, while the sub-thread is responsible for driving the GPU to complete motion estimation.
6. The processing method for cloud terminal video data according to claim 5, characterized in that: a step that spawns the sub-thread is added to the main thread, and the function that performs motion estimation is replaced by a function that reads from MVBuf.
CN2011103909489A 2011-11-30 2011-11-30 Processing method for cloud terminal video data Pending CN102497547A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2011103909489A CN102497547A (en) 2011-11-30 2011-11-30 Processing method for cloud terminal video data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2011103909489A CN102497547A (en) 2011-11-30 2011-11-30 Processing method for cloud terminal video data

Publications (1)

Publication Number Publication Date
CN102497547A true CN102497547A (en) 2012-06-13

Family

ID=46189330

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2011103909489A Pending CN102497547A (en) 2011-11-30 2011-11-30 Processing method for cloud terminal video data

Country Status (1)

Country Link
CN (1) CN102497547A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116483587A (en) * 2023-06-21 2023-07-25 湖南马栏山视频先进技术研究院有限公司 Video super-division parallel method, server and medium based on image segmentation

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1641278A2 (en) * 2004-09-13 2006-03-29 Microsoft Corporation Accelerated video encoding using a graphics processing unit
CN101267556A (en) * 2008-03-21 2008-09-17 海信集团有限公司 Quick motion estimation method and video coding and decoding method
CN101873483A (en) * 2009-04-24 2010-10-27 深圳市九洲电器有限公司 Motion estimation method and coding chip and device using same

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1641278A2 (en) * 2004-09-13 2006-03-29 Microsoft Corporation Accelerated video encoding using a graphics processing unit
CN101267556A (en) * 2008-03-21 2008-09-17 海信集团有限公司 Quick motion estimation method and video coding and decoding method
CN101873483A (en) * 2009-04-24 2010-10-27 深圳市九洲电器有限公司 Motion estimation method and coding chip and device using same

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
房波 (Fang Bo): "Video codec based on a general-purpose programmable GPU - architecture, algorithms and implementation", China Excellent Doctoral and Master's Dissertations Full-text Database (Master's), Information Science and Technology series *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116483587A (en) * 2023-06-21 2023-07-25 湖南马栏山视频先进技术研究院有限公司 Video super-division parallel method, server and medium based on image segmentation
CN116483587B (en) * 2023-06-21 2023-09-08 湖南马栏山视频先进技术研究院有限公司 Video super-division parallel method, server and medium based on image segmentation

Similar Documents

Publication Publication Date Title
US9955194B2 (en) Server GPU assistance for mobile GPU applications
US10728564B2 (en) Systems and methods of encoding multiple video streams for adaptive bitrate streaming
US20190373040A1 (en) Systems and methods game streaming
CN102447906A (en) Low-latency video decoding
WO2014190308A1 (en) Systems and methods of encoding multiple video streams with adaptive quantization for adaptive bitrate streaming
CN106031177A (en) Host encoder for hardware-accelerated video encoding
CN109862357A (en) Cloud game image encoding method, device, equipment and the storage medium of low latency
CN106464886A (en) Robust encoding and decoding of pictures in video
CN105592314B (en) Parallel decoding method and apparatus for parallel decoding
CN105187845A (en) Video data decoding device and method
CN103533325A (en) Depth image intra-frame encoding and decoding methods, devices and encoder and decoder
CN105208394B (en) A kind of real-time digital image compression prediction technique and system
CN102404576A (en) Cloud terminal decoder and load equalization algorithm thereof and decoding algorithm of GPU (Graphics Processing Unit)
CN103716318A (en) Method for improving display quality of virtual desktop by jointly using RFB coding and H.264 coding in cloud computing environment
US10616585B2 (en) Encoding data arrays
CN101873498A (en) Video decoding method, video decoding device and video/audio play system
CN111757103A (en) VR video encoding and decoding method, system and storage medium based on video card computing unit
US20170034522A1 (en) Workload balancing in multi-core video decoder
CN116506618B (en) Video decoding optimization method based on load dynamic self-adaption
CN102497547A (en) Processing method for cloud terminal video data
CN103051899A (en) Method and device for video decoding
CN104168482B (en) A kind of video coding-decoding method and device
CN108989814A (en) A kind of bit rate control method based on parallel encoding structure
CN103268619B (en) The method of image data batch compression in swf file and device
TWI735297B (en) Coding of video and audio with initialization fragments

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C12 Rejection of a patent application after its publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20120613