CN102497547A - Processing method for cloud terminal video data - Google Patents

Processing method for cloud terminal video data

Info

Publication number
CN102497547A
CN102497547A CN2011103909489A CN201110390948A
Authority
CN
China
Prior art keywords
gpu
video data
estimation
cpu
terminal video
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN2011103909489A
Other languages
Chinese (zh)
Inventor
李涛
马海峰
李志宁
肖雄伟
何剑荣
刘小瑞
季统凯
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
G Cloud Technology Co Ltd
Original Assignee
G Cloud Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by G Cloud Technology Co Ltd filed Critical G Cloud Technology Co Ltd
Priority to CN2011103909489A priority Critical patent/CN102497547A/en
Publication of CN102497547A publication Critical patent/CN102497547A/en
Pending legal-status Critical Current



Landscapes

  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The invention relates to the technical field of video data processing, in particular to a processing method for cloud terminal video data. In the method, when the video is encoded and compressed, the work is performed jointly by a CPU (central processing unit) and a GPU (graphics processing unit); that is, the motion estimation of the encoding is completed by the GPU. The feedback path from the reconstructed frame to motion estimation is cut off, motion estimation uses the original frame in place of the reconstructed frame, and the interdependence between the CPU and the GPU is eliminated. The processing method disclosed by the invention effectively solves the problem that, when multiple cloud terminals play videos at the same time, the server's CPU usage is high and its resources become a bottleneck, so that video playback is not smooth; the method can be widely applied to cloud terminal video data processing.

Description

Processing method for cloud terminal video data
Technical field
The present invention relates to the field of video data processing, in particular to a processing method for cloud terminal video data.
Background art
At present, when cloud terminals play video, the encoding and compression of the video data is mostly performed by the server CPU. When many cloud terminals play videos simultaneously, CPU usage on the server side becomes very high: for an 8-core 2.4 GHz CPU, with five cloud terminals playing video at the same time, CPU usage approaches the 800% limit. Playback then freezes and jitters badly and video does not play smoothly, which limits the number of cloud terminals a single server can drive.
Summary of the invention
The technical problem solved by the present invention is to provide a processing method for cloud terminal video data that effectively overcomes the problem that, when many cloud terminals play videos at the same time, server CPU usage is high and resources reach a bottleneck, causing video playback to become choppy.
The technical solution by which the present invention solves the above technical problem is: when the video is encoded and compressed, the work is performed jointly by the CPU and the GPU, that is, the motion estimation of the encoding is completed by the GPU; the feedback path from the reconstructed frame to motion estimation is cut off, motion estimation is performed using the original frame instead of the reconstructed frame, and the interdependence between the CPU and the GPU is eliminated.
A buffer is set up between the CPU and GPU encoding pipelines to store the motion vectors (MVs) produced by the GPU.
The motion estimation algorithm on the GPU is an LMES algorithm based on depth testing.
The algorithm is carried out as follows:
Step 1: set the Z values of all vertices in the triangle mesh that forms the frame to 1;
Step 2: initialize all values in the Z-buffer to 0, set the depth test condition to "pass if greater than", and enable depth testing;
Step 3: disable Z-buffer updates and run PS0, which computes the SAD of the current candidate MV; if the corresponding point in the Z-buffer is 1, this pixel shader will not execute, or at least will produce no output;
Step 4: enable Z-buffer updates and run PS1, which compares the current SAD with a threshold; if it is greater, the point is discarded; otherwise the corresponding point in the Z-buffer is automatically updated to 1;
Step 5: take the next MV as the candidate MV and return to step 3.
A two-thread program structure is adopted: the main thread controls the overall program flow and performs all encoder work except motion estimation, while the sub-thread is responsible for driving the GPU to complete motion estimation.
A step that spawns the sub-thread is added to the main thread, and the function that performs motion estimation is replaced by a function that reads from MVBuf.
The present invention adds a graphics card to the server configuration and proposes a cloud encoder with a CPU+GPU parallel architecture. By adopting a two-thread program structure and a depth-test-based LMES algorithm, it makes full use of the parallel computing power of the GPU, thereby effectively reducing CPU usage and solving the resource bottleneck that arises when many cloud terminals play videos. Moreover, video encoding and compression on the GPU is much more efficient than on the CPU, which guarantees smooth video playback and increases the number of cloud terminals a single server can drive.
The present invention adopts a frame-level parallel architecture, so data exchange between the CPU and the GPU occurs only once a whole frame has been processed; because the granularity is a frame, the GPU also works more efficiently.
A two-thread program structure is adopted. The GPU cannot process data by itself and must be driven by the CPU; at the same time, the encoder architecture requires motion estimation and the other parts of the encoder to run in two parallel pipelines. A single thread cannot meet this requirement well, so a two-thread program structure is used.
An LMES algorithm based on depth testing is proposed, overcoming the limitations of traditional algorithms. The algorithm needs no explicit conditional statements in the pixel shader program; instead it uses the depth test built into the 3D pipeline to implement the early-exit strategy of motion estimation, so it runs more efficiently than the programmable pixel pipeline alone.
Description of drawings:
The present invention is further described below in conjunction with the accompanying drawings:
Fig. 1 is a frame structure diagram of an existing encoder;
Fig. 2 is a diagram of the parallel architecture of the present invention;
Fig. 3 is a flowchart of the depth-test-based LMES algorithm of the present invention;
Fig. 4 is a workflow diagram of a GPU-based general-purpose computing program according to the present invention;
Fig. 5 is a flowchart of the main thread of the present invention;
Fig. 6 is a flowchart of the sub-thread of the present invention.
Embodiment
As shown in the accompanying drawings, existing video coding standards all share a basic compression framework based on temporally predictive transform coding. The current frame is first predicted from frames that have already been encoded; the residual is obtained and then transform coded and quantized in the transform domain, with quantization losing some information entropy; finally, entropy coding is applied. The traditional MPEG-2 encoder framework, with the coding control part omitted, is shown in Fig. 1. The boxes in the figure represent the processing steps of the encoder. Note the feedback path from quantization (Quantization) back to motion compensation (MC) and motion estimation (ME).
To accelerate MPEG-2 encoding with the GPU, the encoder architecture contains two processors, a CPU and a GPU. A parallel encoder architecture must therefore be adopted so that the CPU and the GPU work at the same time, and the conventional encoder structure must be transformed to this end. Since ME has long been the biggest bottleneck in the encoder, the ME part can be moved onto the GPU; the feedback path from the reconstructed frame to ME is cut off, ME is performed on the original frame instead of the reconstructed frame, and the interdependence between the CPU and the GPU is eliminated. The present invention proposes the dual-pipeline parallel architecture shown in Fig. 2. As can be seen from the figure, the new encoder architecture forms two parallel pipelines, one running on the GPU and one on the CPU. The GPU pipeline is not affected by the CPU pipeline at all and is limited only by I/O, whereas the CPU pipeline, besides the I/O limit, also has to wait for the MVs produced by the GPU. To eliminate this waiting, a buffer is set up between the two pipelines to store the MVs produced by the GPU.
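To make the architectural change concrete, the following minimal C++ sketch (not code from the patent; the types Frame and MV, the function names, and the exhaustive search are illustrative assumptions) shows block-based motion estimation performed against the original previous frame, which is all the decoupled GPU pipeline needs as input.

```cpp
#include <algorithm>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Hypothetical minimal types; the patent does not define concrete data structures.
struct Frame { int w, h; std::vector<uint8_t> y; };   // luma plane only
struct MV    { int dx, dy; };

// Sum of absolute differences of one 16x16 macroblock for a candidate MV.
static int sad16(const Frame& cur, const Frame& ref, int mbx, int mby, MV mv) {
    int sad = 0;
    for (int y = 0; y < 16; ++y)
        for (int x = 0; x < 16; ++x) {
            int cx = mbx * 16 + x, cy = mby * 16 + y;
            // Clamping at the border stands in for the padding step done on the GPU.
            int rx = std::min(std::max(cx + mv.dx, 0), ref.w - 1);
            int ry = std::min(std::max(cy + mv.dy, 0), ref.h - 1);
            sad += std::abs(int(cur.y[cy * cur.w + cx]) - int(ref.y[ry * ref.w + rx]));
        }
    return sad;
}

// In a conventional hybrid encoder the reference passed here would be the
// *reconstructed* previous frame, which exists only after the CPU pipeline has
// finished transform, quantization and the inverse steps.  With the feedback path
// cut, the reference is simply the *original* previous frame, so the GPU-side ME
// pipeline depends only on raw input frames and can run independently of the CPU.
MV estimate_block(const Frame& cur, const Frame& prev_original,
                  int mbx, int mby, int range) {
    MV best{0, 0};
    int best_sad = sad16(cur, prev_original, mbx, mby, best);
    for (int dy = -range; dy <= range; ++dy)
        for (int dx = -range; dx <= range; ++dx) {
            MV cand{dx, dy};
            int s = sad16(cur, prev_original, mbx, mby, cand);
            if (s < best_sad) { best_sad = s; best = cand; }
        }
    return best;
}
```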
The parallel architecture makes CPU+GPU parallel encoding possible. To make full use of the GPU's highly parallel computation, an ME algorithm suited to the GPU must be designed. The present invention proposes a depth-test-based LMES algorithm (ZB-LMES), which exploits the fact that depth testing can greatly accelerate rendering in the 3D pipeline. The specific algorithm is as follows:
1. Set the Z values of all vertices in the triangle mesh that forms the frame to 1.
2. Initialize all values in the Z-buffer to 0, set the depth test condition to "pass if greater than", and enable depth testing.
3. Disable Z-buffer updates and run PS0, which computes the SAD of the current candidate MV. If the corresponding point in the Z-buffer is 1, this pixel shader will not execute, or at least will produce no output.
4. Enable Z-buffer updates and run PS1, which compares the current SAD with a threshold. If it is greater, the point is discarded; otherwise the corresponding point in the Z-buffer is automatically updated to 1.
5. Take the next MV as the candidate MV and return to step 3.
The flow of ZB-LMES is shown in Fig. 3. The depth test and the automatic Z-buffer update in the figure are performed by the 3D pipeline and need no programmer intervention, so they execute very quickly. The two boxes PS0 and PS1 represent the two pixel shader programs in ZB-LMES; they compute the SAD and update the Z-buffer, respectively. The first step of the algorithm sets the Z attribute of all vertices to 1; since pixel attributes are interpolated from vertex attributes, the Z attribute of every pixel is then also a constant 1.
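The following plain-C++ sketch mirrors the control flow of ZB-LMES for one block so that the early-exit role of the depth test is explicit; the function name zb_lmes_block, the SAD callback, and the done flag are illustrative assumptions rather than identifiers from the patent. On the GPU, the culling expressed here as an if statement is carried out by the fixed-function depth test, with no explicit branch in PS0 or PS1.

```cpp
#include <climits>
#include <functional>
#include <vector>

struct MV { int dx, dy; };

// CPU-side reference for one block of the ZB-LMES search.  The 'done' flag plays
// the role of the block's Z-buffer entry; steps refer to the numbered list above.
MV zb_lmes_block(const std::vector<MV>& candidates,
                 const std::function<int(MV)>& sad,   // PS0's job: SAD of a candidate
                 int sad_threshold) {
    bool done = false;            // step 2: Z-buffer entry initialized to 0
    MV best{0, 0};
    int best_sad = INT_MAX;
    for (MV cand : candidates) {  // step 5: advance to the next candidate MV
        if (done) break;          // step 3: depth test culls PS0 once the entry is 1
        int s = sad(cand);        // step 3 (PS0): compute the SAD of the candidate
        if (s < best_sad) { best_sad = s; best = cand; }
        if (s <= sad_threshold)   // step 4 (PS1): SAD not above the threshold, so the
            done = true;          // Z-buffer entry is set to 1 -> early exit of ME
    }
    return best;
}
```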
The encoder is implemented with CUDA, runs on the Linux platform, and is written in C++. Because CUDA is NVIDIA's parallel computing technology, the implementation is only suitable for NVIDIA graphics cards that support CUDA. From the programmer's point of view, the general workflow of a GPU-based general-purpose computing program is shown in Fig. 4. As the figure shows, the GPU cannot process data by itself; it must be driven by the CPU. At the same time, our encoder architecture requires motion estimation and the other parts of the encoder to run in two parallel pipelines. A single-threaded program cannot meet this requirement well, so the present invention adopts a two-thread program structure: the main thread controls the overall program flow and performs all encoder work except ME, while the sub-thread is responsible for driving the GPU to complete ME.
Fig. 5 shows the flow of the main thread. As can be seen from the figure, the flow of the main thread is very similar to the sequence of a traditional encoder implemented on the CPU; the only differences are that the main thread has an extra step that spawns the sub-thread, and the function that performs motion estimation is replaced by a function that reads from MVBuf.
Fig. 6 shows the flow of the sub-thread, which replaces the ME part of a conventional encoder. Because the GPU works under the control of the CPU, for some of the parts in the figure the programmer only needs to call the 3D API to update data and issue rendering commands; no code that does the real work has to be written in the sub-thread program itself. The Padding part, for example, works this way, although the shader program that actually performs the padding on the GPU must still be written separately. Apart from the Padding part, the SAD threshold setting, whole-pixel motion estimation, and half-pixel motion estimation in the sub-thread flow follow the same pattern, as sketched below.
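As an illustration of this division of labor, the sketch below models the sub-thread's per-frame flow in C++ under stated assumptions: GpuSteps, sub_thread_process_frame, and the callback signatures are hypothetical stand-ins, since the real work is performed by GPU kernels or shaders invoked through CUDA or the 3D API, and the host-side sub-thread merely sequences those calls and forwards the resulting MVs (here through emit_mvs, which in the encoder would push into MVBuf, sketched after the next paragraph).

```cpp
#include <functional>
#include <utility>
#include <vector>

struct MV    { int dx, dy; };
struct Frame { int w, h; std::vector<unsigned char> y; };

// Each GPU-side step of Fig. 6 is represented by a callback; in the real encoder
// these callbacks would issue the CUDA / 3D API calls and the GPU would do the work.
struct GpuSteps {
    std::function<void(const Frame&)>            upload_and_pad;    // data upload + padding shader
    std::function<void(int)>                     set_sad_threshold; // threshold used for early exit
    std::function<std::vector<MV>(const Frame&)> integer_pel_me;    // whole-pixel motion estimation
    std::function<std::vector<MV>(const Frame&, const std::vector<MV>&)> half_pel_me; // refinement
};

// Sub-thread body for one frame: drive the GPU, then hand the MVs to the sink
// (in the encoder the sink pushes into MVBuf; see the queue sketch below).
void sub_thread_process_frame(const Frame& frame, const GpuSteps& gpu, int sad_threshold,
                              const std::function<void(std::vector<MV>)>& emit_mvs) {
    gpu.upload_and_pad(frame);                      // update data on the GPU and pad borders
    gpu.set_sad_threshold(sad_threshold);           // configure the SAD threshold
    auto coarse  = gpu.integer_pel_me(frame);       // whole-pixel ME
    auto refined = gpu.half_pel_me(frame, coarse);  // half-pixel ME
    emit_mvs(std::move(refined));                   // producer side of MVBuf
}
```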
Between the main thread and the sub-thread there is a result buffer, MVBuf, used to store the ME results. It is a first-in-first-out queue, and the relationship of the two threads to this queue is the classic producer-consumer model: the sub-thread produces MVs and appends them to the tail of the queue, while the main thread consumes MVs from the head. A thread synchronization mechanism is used to resolve the access conflicts between the main thread and the sub-thread on MVBuf.
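A minimal C++ sketch of such a buffer follows, assuming std::mutex and std::condition_variable as the synchronization mechanism; the class name MVBuf and the per-frame std::vector<MV> payload are illustrative, as the patent only states that MVBuf is a first-in-first-out queue shared by the two threads. The small main() demonstrates the producer-consumer pattern: the sub-thread pushes MVs while the main thread reads them at the point where a conventional encoder would call its ME function.

```cpp
#include <condition_variable>
#include <cstdio>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

struct MV { int dx, dy; };

// Thread-safe FIFO standing in for MVBuf: one entry holds the MVs of one frame.
class MVBuf {
public:
    void push(std::vector<MV> frame_mvs) {          // producer: the GPU sub-thread
        { std::lock_guard<std::mutex> lk(m_);
          q_.push(std::move(frame_mvs)); }
        cv_.notify_one();
    }
    std::vector<MV> pop() {                         // consumer: the main thread
        std::unique_lock<std::mutex> lk(m_);
        cv_.wait(lk, [this] { return !q_.empty(); });
        auto mvs = std::move(q_.front());
        q_.pop();
        return mvs;
    }
private:
    std::queue<std::vector<MV>> q_;
    std::mutex m_;
    std::condition_variable cv_;
};

int main() {
    MVBuf mvbuf;
    // Sub-thread (producer): stands in for GPU-side ME, one MV set per frame.
    std::thread sub([&] {
        for (int frame = 0; frame < 3; ++frame)
            mvbuf.push({{frame, -frame}, {0, frame}});
    });
    // Main thread (consumer): where a conventional encoder would call its ME
    // function, it simply reads the already-computed MVs from MVBuf.
    for (int frame = 0; frame < 3; ++frame) {
        auto mvs = mvbuf.pop();
        std::printf("frame %d: %zu MVs\n", frame, mvs.size());
    }
    sub.join();
    return 0;
}
```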

Claims (6)

1. A processing method for cloud terminal video data, characterized in that: when the video is encoded and compressed, the work is performed jointly by a CPU and a GPU, that is, the motion estimation of the encoding is completed by the GPU; and the feedback path from the reconstructed frame to motion estimation is cut off, motion estimation is performed using the original frame instead of the reconstructed frame, and the interdependence between the CPU and the GPU is eliminated.
2. The processing method for cloud terminal video data according to claim 1, characterized in that: a buffer is set up between the CPU and GPU encoding pipelines to store the MVs produced by the GPU.
3. The processing method for cloud terminal video data according to claim 1 or 2, characterized in that: the motion estimation algorithm on the GPU is an LMES algorithm based on depth testing.
4. The processing method for cloud terminal video data according to claim 3, characterized in that: the algorithm is carried out as follows:
Step 1: set the Z values of all vertices in the triangle mesh that forms the frame to 1;
Step 2: initialize all values in the Z-buffer to 0, set the depth test condition to "pass if greater than", and enable depth testing;
Step 3: disable Z-buffer updates and run PS0, which computes the SAD of the current candidate MV; if the corresponding point in the Z-buffer is 1, this pixel shader will not execute, or at least will produce no output;
Step 4: enable Z-buffer updates and run PS1, which compares the current SAD with a threshold; if it is greater, the point is discarded; otherwise the corresponding point in the Z-buffer is automatically updated to 1;
Step 5: take the next MV as the candidate MV and return to step 3.
5. The processing method for cloud terminal video data according to claim 4, characterized in that: a two-thread program structure is adopted, in which the main thread controls the overall program flow and performs all encoder work except motion estimation, while the sub-thread is responsible for driving the GPU to complete motion estimation.
6. The processing method for cloud terminal video data according to claim 5, characterized in that: a step that spawns the sub-thread is added to the main thread, and the function that performs motion estimation is replaced by a function that reads from MVBuf.
CN2011103909489A 2011-11-30 2011-11-30 Processing method for cloud terminal video data Pending CN102497547A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2011103909489A CN102497547A (en) 2011-11-30 2011-11-30 Processing method for cloud terminal video data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2011103909489A CN102497547A (en) 2011-11-30 2011-11-30 Processing method for cloud terminal video data

Publications (1)

Publication Number Publication Date
CN102497547A true CN102497547A (en) 2012-06-13

Family

ID=46189330

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2011103909489A Pending CN102497547A (en) 2011-11-30 2011-11-30 Processing method for cloud terminal video data

Country Status (1)

Country Link
CN (1) CN102497547A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116483587A (en) * 2023-06-21 2023-07-25 湖南马栏山视频先进技术研究院有限公司 Video super-division parallel method, server and medium based on image segmentation

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1641278A2 (en) * 2004-09-13 2006-03-29 Microsoft Corporation Accelerated video encoding using a graphics processing unit
CN101267556A (en) * 2008-03-21 2008-09-17 海信集团有限公司 Quick motion estimation method and video coding and decoding method
CN101873483A (en) * 2009-04-24 2010-10-27 深圳市九洲电器有限公司 Motion estimation method and coding chip and device using same

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1641278A2 (en) * 2004-09-13 2006-03-29 Microsoft Corporation Accelerated video encoding using a graphics processing unit
CN101267556A (en) * 2008-03-21 2008-09-17 海信集团有限公司 Quick motion estimation method and video coding and decoding method
CN101873483A (en) * 2009-04-24 2010-10-27 深圳市九洲电器有限公司 Motion estimation method and coding chip and device using same

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
房波 (Fang Bo): "Video codec based on a general-purpose programmable GPU - architecture, algorithms and implementation", China Excellent Doctoral and Master's Dissertations Full-text Database (Master's), Information Science and Technology series *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116483587A (en) * 2023-06-21 2023-07-25 湖南马栏山视频先进技术研究院有限公司 Video super-division parallel method, server and medium based on image segmentation
CN116483587B (en) * 2023-06-21 2023-09-08 湖南马栏山视频先进技术研究院有限公司 Video super-division parallel method, server and medium based on image segmentation

Similar Documents

Publication Publication Date Title
US9955194B2 (en) Server GPU assistance for mobile GPU applications
US10728564B2 (en) Systems and methods of encoding multiple video streams for adaptive bitrate streaming
US20190373040A1 (en) Systems and methods game streaming
CN102447906A (en) Low-latency video decoding
WO2014190308A1 (en) Systems and methods of encoding multiple video streams with adaptive quantization for adaptive bitrate streaming
CN106031177A (en) Host encoder for hardware-accelerated video encoding
CN109862357A (en) Cloud game image encoding method, device, equipment and the storage medium of low latency
CN106464886A (en) Robust encoding and decoding of pictures in video
CN105592314B (en) Parallel decoding method and apparatus for parallel decoding
CN105187845A (en) Video data decoding device and method
CN103533325A (en) Depth image intra-frame encoding and decoding methods, devices and encoder and decoder
CN105208394B (en) A kind of real-time digital image compression prediction technique and system
CN102404576A (en) Cloud terminal decoder and load equalization algorithm thereof and decoding algorithm of GPU (Graphics Processing Unit)
CN103716318A (en) Method for improving display quality of virtual desktop by jointly using RFB coding and H.264 coding in cloud computing environment
US10616585B2 (en) Encoding data arrays
CN101873498A (en) Video decoding method, video decoding device and video/audio play system
CN111757103A (en) VR video encoding and decoding method, system and storage medium based on video card computing unit
US20170034522A1 (en) Workload balancing in multi-core video decoder
CN116506618B (en) Video decoding optimization method based on load dynamic self-adaption
CN102497547A (en) Processing method for cloud terminal video data
CN103051899A (en) Method and device for video decoding
CN104168482B (en) A kind of video coding-decoding method and device
CN108989814A (en) A kind of bit rate control method based on parallel encoding structure
CN103268619B (en) The method of image data batch compression in swf file and device
TWI735297B (en) Coding of video and audio with initialization fragments

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C12 Rejection of a patent application after its publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20120613