CN103716644A - H264 multi-granularity parallel handling method - Google Patents

H264 multi-granularity parallel handling method

Info

Publication number
CN103716644A
Authority
CN
China
Prior art keywords
frame
parallel
data
level
queue
Prior art date
Legal status
Pending
Application number
CN201310645144.8A
Other languages
Chinese (zh)
Inventor
钱荣华
Current Assignee
NANJING COREWISE SMART TECHNOLOGY Inc
Original Assignee
NANJING COREWISE SMART TECHNOLOGY Inc
Priority date
Filing date
Publication date
Application filed by NANJING COREWISE SMART TECHNOLOGY Inc filed Critical NANJING COREWISE SMART TECHNOLOGY Inc
Priority to CN201310645144.8A priority Critical patent/CN103716644A/en
Publication of CN103716644A publication Critical patent/CN103716644A/en
Pending legal-status Critical Current


Landscapes

  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The invention relates to an H264 multi-granularity parallel processing method. In the H264 coding hierarchy, parallel granularity is divided, from large to small, into frame level, slice level, and data level. The H264 multi-granularity parallelism comprises three kinds: frame-level parallelism, slice-level parallelism, and data-level parallelism. The processing method uses the ARM instruction set to carry out data operations in parallel as vector operations. Multi-granularity parallelism gives the program better locality, which increases the Cache hit rate and the execution efficiency of the CPU.

Description

A multi-granularity parallel processing method for H264
Technical field
The present invention relates to an encoding and decoding method for 3G video transmission, and in particular to a multi-granularity parallel processing method for H264.
Background technology
Video monitoring systems, as an important component of smart security and smart traffic in Internet of Things applications for integrated urban public safety management, have broad application prospects. As cities develop, the bottlenecks of traditional video monitoring systems have become apparent. The core technology for solving this problem is to transform traditional video monitoring systems with video structured description technology, forming a new generation of video monitoring: an intelligent, semantic, informatized semantic video monitoring system. The core of video monitoring technology is video encoding and decoding; for better compatibility with different hardware platforms, software decoding needs to work together with hardware decoding to complete the image encoding and decoding. Current 3G mobile video monitoring systems developed on the Android platform commonly suffer from unclear image quality and long display lag. The basic reason is that most soft codec algorithms were developed for traditional DSP platforms or embedded Linux platforms, the optimization of these algorithms was also carried out on those platforms, and algorithm optimization for the Android system is still incomplete, so optimization for the ARM platform of the Android system is needed.
At present, software decoding on the Android platform can only reach a limit of about 10 frames per second at 640*480 resolution; to achieve a smoother user experience, the encoding and decoding speed needs to be raised above 15 frames per second.
Summary of the invention
Technical problem to be solved: in view of the above problems, the present invention proposes a technical scheme in which soft encoding/decoding and hard encoding/decoding are used in combination; multi-granularity parallelism gives the program better locality, so as to improve the Cache hit rate and the execution efficiency of the CPU.
Technical scheme: in order to overcome the above problems, the present invention provides a multi-granularity parallel processing method for H264. In the H264 coding hierarchy, parallel granularity can be divided, from large to small, into frame level, slice level, and data level. The method is characterized in that the H264 multi-granularity parallelism comprises three kinds, namely frame-level parallelism, slice-level parallelism, and data-level parallelism; the processing method uses the ARM instruction set to carry out data operations in parallel as vector operations, obtaining a considerable speed-up ratio. The details are as follows:
1.1 Data-level parallelism
The data-level parallelism of H264 multi-granularity parallel encoding mainly comprises the following two aspects:
1.1.1 Data-level parallelism based on multimedia extensions
Data-level parallelism based on multimedia extensions uses technologies such as MMX/SSE2 and AltiVec, i.e., multimedia instruction sets embedded in multi-core processors; such a multimedia instruction set performs vector operations on the data and thereby achieves the effect of parallel processing. The multimedia instruction set is a SIMD extension. SIMD means that, when a vector operation is executed, one instruction can operate simultaneously on a vector composed of multiple data elements.
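A minimal sketch of this kind of data-level parallelism is given below using ARM NEON intrinsics, since the method targets the ARM instruction set (MMX/SSE2 and AltiVec are the x86 and PowerPC counterparts). The routine computes a 16*16 residual block (source minus prediction) eight samples per instruction; the function name, block geometry, and strides are illustrative assumptions rather than details from the patent.

    /* Assumed sketch: 16*16 residual, 8 lanes per NEON instruction. */
    #include <arm_neon.h>
    #include <stdint.h>

    void residual_16x16_neon(const uint8_t *src, const uint8_t *pred,
                             int16_t *res, int stride)
    {
        for (int y = 0; y < 16; y++) {
            for (int x = 0; x < 16; x += 8) {
                uint8x8_t s = vld1_u8(src + x);   /* 8 source samples    */
                uint8x8_t p = vld1_u8(pred + x);  /* 8 predicted samples */
                /* widen and subtract: 8 lanes processed in parallel */
                int16x8_t d = vreinterpretq_s16_u16(vsubl_u8(s, p));
                vst1q_s16(res + x, d);
            }
            src  += stride;
            pred += stride;
            res  += 16;
        }
    }
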
1.1.2 Data-level parallelism of the DCT transform on a heterogeneous multi-core platform
This data-level parallelism exploits the characteristics of a heterogeneous multi-core platform by assigning the computation-intensive modules to the slave cores: a large macroblock is divided into sub-blocks, the divided sub-blocks are DCT-transformed on the slave cores, and the DCT results are finally merged back into the result for the large macroblock. A macroblock of size 16*16 is divided into four sub-blocks of size 8*8; a thread is created on a slave core for the DCT of each of the four sub-blocks, and after the DCT of each sub-block is complete, the sub-block results are transferred back so that the main core completes the DCT of the whole 16*16 macroblock.
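The following sketch illustrates the macroblock split described above, assuming POSIX threads as a stand-in for the slave cores: the four 8*8 sub-blocks of a 16*16 macroblock are transformed on separate threads and the results are gathered by the caller, playing the role of the main core. dct8x8() is a placeholder for whatever 8*8 transform the encoder actually uses, and all names are assumptions.

    #include <pthread.h>
    #include <stdint.h>

    void dct8x8(const int16_t *in, int in_stride, int16_t *out); /* assumed helper */

    typedef struct {
        const int16_t *src;   /* top-left sample of the 8*8 sub-block */
        int stride;           /* row stride of the 16*16 macroblock   */
        int16_t coeff[64];    /* transform output of this sub-block   */
    } dct_job;

    static void *dct_worker(void *arg)
    {
        dct_job *job = (dct_job *)arg;
        dct8x8(job->src, job->stride, job->coeff);
        return NULL;
    }

    void transform_mb16x16(const int16_t *mb, int stride, int16_t coeff[4][64])
    {
        pthread_t tid[4];
        dct_job job[4];
        /* sub-block origins: (0,0), (0,8), (8,0), (8,8) inside the macroblock */
        for (int i = 0; i < 4; i++) {
            job[i].src    = mb + (i / 2) * 8 * stride + (i % 2) * 8;
            job[i].stride = stride;
            pthread_create(&tid[i], NULL, dct_worker, &job[i]);
        }
        for (int i = 0; i < 4; i++) {        /* merge on the main core */
            pthread_join(tid[i], NULL);
            for (int k = 0; k < 64; k++)
                coeff[i][k] = job[i].coeff[k];
        }
    }
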
1.2 Slice-level parallelism
Slice-level parallelism divides one frame of image data evenly into a plurality of data blocks (slices); a thread is then created for each slice so that the slices are encoded in parallel, a slice header is added during encoding, and after encoding is finished the slice data are recombined in order into one frame of image data.
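A sketch of this slice-level scheme, again assuming POSIX threads and a fixed slice count: the frame is split into equal bands of macroblock rows, each band is encoded on its own thread, and the slice bitstreams are concatenated in slice order afterwards. The slice_ctx type and encode_slice() helper are illustrative placeholders, not part of the patent.

    #include <pthread.h>

    #define NUM_SLICES 4

    typedef struct {
        const void *frame;          /* illustrative frame handle     */
        int first_mb_row, last_mb_row;
        unsigned char *bitstream;   /* output buffer for this slice  */
        int bytes;                  /* bytes written for this slice  */
    } slice_ctx;

    int encode_slice(slice_ctx *ctx);  /* assumed: writes slice header + data */

    static void *slice_worker(void *arg)
    {
        encode_slice((slice_ctx *)arg);
        return NULL;
    }

    void encode_frame_sliced(const void *frame, int mb_rows,
                             slice_ctx slices[NUM_SLICES])
    {
        pthread_t tid[NUM_SLICES];
        int rows_per_slice = mb_rows / NUM_SLICES;
        for (int i = 0; i < NUM_SLICES; i++) {
            slices[i].frame        = frame;
            slices[i].first_mb_row = i * rows_per_slice;
            slices[i].last_mb_row  = (i == NUM_SLICES - 1)
                                     ? mb_rows - 1
                                     : (i + 1) * rows_per_slice - 1;
            pthread_create(&tid[i], NULL, slice_worker, &slices[i]);
        }
        for (int i = 0; i < NUM_SLICES; i++)  /* wait, then reassemble in slice order */
            pthread_join(tid[i], NULL);
    }
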
1.3 Frame-level parallelism
In H264 coding, an I frame is an intra-coded frame and needs no other frame as reference; a P frame needs a preceding I frame as reference; a B frame is a bidirectionally predicted frame and needs preceding and following I or P frames as its reference frames.
1.3.1 In the frame-level parallel algorithm design, the method for synchronizing the parallel encoding of I/P frames and B frames is to store I/P frames and B frames separately.
Separate storage means that, after the frame type of each image in the input sequence has been determined, an I/P frame is enqueued into the I/P-frame queue in coding order, and otherwise the frame is enqueued into the B-frame queue. Because the boundary of the parallel granularity is an I or P frame, the parallel analysis starts from the I/P-frame encoding thread. Specifically:
If the I/P-frame queue is not empty, one I/P frame is taken from the queue; the B frames that can be encoded in parallel with this I/P frame are then taken from the B-frame queue according to the coding dependency analysis, and multiple threads are created to carry out frame-level parallel processing.
When a P frame is taken from the I/P-frame queue, all B frames that can be encoded in parallel before this P frame, including B frames whose encoding lags behind the current P frame, are taken out according to the frame-number correspondence, which controls the synchronization of the parallel encoding; the correspondence is given by the formula:
f(P) - f(B) >= T + 2
where f(P) is the frame number of the P frame being encoded, f(B) is the frame number of the B frame being encoded, and T is the B-frame parameter.
When an I frame is taken from the I/P-frame queue, the reference frames of all picture frames before this I frame already exist; therefore, before this I frame is encoded, all encodable frames in the B-frame queue are first taken out and encoding threads are created for parallel encoding, and only then is this I frame, the first frame of a new GOP, encoded.
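A sketch of the scheduling rule for the P-frame case follows. It assumes, for illustration only, that T is the number of consecutive B frames and that the B-frame queue is a simple linked list: when a P frame with number f(P) is dequeued, every queued B frame satisfying f(P) - f(B) >= T + 2 is handed to an encoding thread. The queue type and start_encode_thread() are assumptions.

    #include <stdbool.h>

    typedef struct frame_job {
        int frame_num;
        struct frame_job *next;
    } frame_job;

    void start_encode_thread(frame_job *job);  /* assumed helper */

    static bool b_frame_ready(int fp, int fb, int t)
    {
        return fp - fb >= t + 2;   /* the relation f(P) - f(B) >= T + 2 */
    }

    /* Called when a P frame numbered fp is taken from the I/P-frame queue. */
    void schedule_b_frames(int fp, frame_job **b_queue, int t)
    {
        frame_job **cur = b_queue;
        while (*cur) {
            if (b_frame_ready(fp, (*cur)->frame_num, t)) {
                frame_job *job = *cur;
                *cur = job->next;          /* remove from the B-frame queue       */
                start_encode_thread(job);  /* encode in parallel with the P frame */
            } else {
                cur = &(*cur)->next;
            }
        }
    }
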
1.3.2 In the frame-level parallel algorithm design, the method for writing the encoded bitstream back in coding frame-number order is to define a dedicated network abstraction layer (NAL) queue; the algorithm is designed as follows:
When encoding starts, a fixed-length network abstraction layer data queue is created. A data unit in this NAL queue has one of three states: A, the unit contains neither a frame number nor a bitstream; B, the unit contains only a frame number; C, the unit contains a frame number and the complete bitstream.
The initial state of every unit in the NAL queue is A. Each time an image has its frame type determined and is enqueued, an enqueue operation is also performed on the NAL queue; this enqueue operation writes only the frame-number information, which later serves as the basis for writing the actual bitstream data, and the state of that data unit becomes B. When a given frame has been encoded, the data unit whose frame number matches the frame just encoded is located in the NAL queue and the encoded bitstream is written into it; the state of this data unit then becomes C. Because images are read in and their frame types determined in frame coding order, the bitstreams finally written into the NAL queue are also in order; the input/output thread then writes the data to the disk file in that order, after which the state of the data unit changes from C back to A, returning to the initial state, and the cycle repeats. In this way the encoding and decoding speed is improved.
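The fixed-length NAL queue described above can be sketched as a small state machine: A is an empty slot, B has the frame number reserved, and C has the bitstream attached and ready to flush. The queue length, field names, and the absence of locking are illustrative simplifications, not details given in the patent.

    #include <stdint.h>
    #include <stdio.h>

    #define NAL_QUEUE_LEN 16
    enum nal_state { NAL_A = 0, NAL_B, NAL_C };

    typedef struct {
        enum nal_state state;
        int frame_num;
        uint8_t *payload;
        int payload_len;
    } nal_slot;

    static nal_slot nal_queue[NAL_QUEUE_LEN];
    static int nal_tail;   /* next slot to reserve (A -> B) */
    static int nal_head;   /* next slot to flush   (C -> A) */

    /* Called right after the frame-type decision: reserve a slot in coding order. */
    void nal_enqueue_frame_num(int frame_num)
    {
        nal_slot *s = &nal_queue[nal_tail];
        s->state = NAL_B;
        s->frame_num = frame_num;
        nal_tail = (nal_tail + 1) % NAL_QUEUE_LEN;
    }

    /* Called when a frame finishes encoding: attach its bitstream (B -> C). */
    void nal_attach_bitstream(int frame_num, uint8_t *buf, int len)
    {
        for (int i = 0; i < NAL_QUEUE_LEN; i++) {
            nal_slot *s = &nal_queue[i];
            if (s->state == NAL_B && s->frame_num == frame_num) {
                s->payload = buf;
                s->payload_len = len;
                s->state = NAL_C;
                return;
            }
        }
    }

    /* Input/output thread: write ready slots in queue order, then recycle (C -> A). */
    void nal_flush(FILE *out)
    {
        while (nal_queue[nal_head].state == NAL_C) {
            nal_slot *s = &nal_queue[nal_head];
            fwrite(s->payload, 1, s->payload_len, out);
            s->state = NAL_A;
            nal_head = (nal_head + 1) % NAL_QUEUE_LEN;
        }
    }
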
Beneficial effects:
The present invention uses multi-granularity parallel programming technology and performs dependency analysis, giving the program better locality so as to improve the Cache hit rate and the execution efficiency of the CPU, thereby improving the encoding and decoding speed.
Embodiment
The present invention is described in further detail below.
Embodiment:
A multi-granularity parallel processing method for H264: in the H264 coding hierarchy, parallel granularity can be divided, from large to small, into frame level, slice level, and data level. The H264 multi-granularity parallelism comprises three kinds, namely frame-level parallelism, slice-level parallelism, and data-level parallelism; the processing method uses the ARM instruction set to carry out data operations in parallel as vector operations. The details are as follows:
1.1 Data-level parallelism
The data-level parallelism of H264 multi-granularity parallel encoding mainly comprises the following two aspects:
1.1.1 Data-level parallelism based on multimedia extensions
Data-level parallelism based on multimedia extensions uses technologies such as MMX/SSE2 and AltiVec, i.e., multimedia instruction sets embedded in multi-core processors; such a multimedia instruction set performs vector operations on the data and thereby achieves the effect of parallel processing. The multimedia instruction set is a SIMD extension. SIMD means that, when a vector operation is executed, one instruction can operate simultaneously on a vector composed of multiple data elements.
1.1.2 Data-level parallelism of the DCT transform on a heterogeneous multi-core platform
This data-level parallelism exploits the characteristics of a heterogeneous multi-core platform by assigning the computation-intensive modules to the slave cores: a large macroblock is divided into sub-blocks, the divided sub-blocks are DCT-transformed on the slave cores, and the DCT results are finally merged back into the result for the large macroblock. A macroblock of size 16*16 is divided into four sub-blocks of size 8*8; a thread is created on a slave core for the DCT of each of the four sub-blocks, and after the DCT of each sub-block is complete, the sub-block results are transferred back so that the main core completes the DCT of the whole 16*16 macroblock.
1.2 Slice-level parallelism
Slice-level parallelism divides one frame of image data evenly into a plurality of data blocks (slices); a thread is then created for each slice so that the slices are encoded in parallel, a slice header is added during encoding, and after encoding is finished the slice data are recombined in order into one frame of image data.
1.3 Frame-level parallelism
In H264 coding, an I frame is an intra-coded frame and needs no other frame as reference; a P frame needs a preceding I frame as reference; a B frame is a bidirectionally predicted frame and needs preceding and following I or P frames as its reference frames.
1.3.1 In the frame-level parallel algorithm design, the method for synchronizing the parallel encoding of I/P frames and B frames is to store I/P frames and B frames separately.
Separate storage means that, after the frame type of each image in the input sequence has been determined, an I/P frame is enqueued into the I/P-frame queue in coding order, and otherwise the frame is enqueued into the B-frame queue. Because the boundary of the parallel granularity is an I or P frame, the parallel analysis starts from the I/P-frame encoding thread. Specifically:
If the I/P-frame queue is not empty, one I/P frame is taken from the queue; the B frames that can be encoded in parallel with this I/P frame are then taken from the B-frame queue according to the coding dependency analysis, and multiple threads are created to carry out frame-level parallel processing.
When a P frame is taken from the I/P-frame queue, all B frames that can be encoded in parallel before this P frame, including B frames whose encoding lags behind the current P frame, are taken out according to the frame-number correspondence, which controls the synchronization of the parallel encoding; the correspondence is given by the formula:
f(P) - f(B) >= T + 2
where f(P) is the frame number of the P frame being encoded, f(B) is the frame number of the B frame being encoded, and T is the B-frame parameter.
When an I frame is taken from the I/P-frame queue, the reference frames of all picture frames before this I frame already exist; therefore, before this I frame is encoded, all encodable frames in the B-frame queue are first taken out and encoding threads are created for parallel encoding, and only then is this I frame, the first frame of a new GOP, encoded.
1.3.2 In the frame-level parallel algorithm design, the method for writing the encoded bitstream back in coding frame-number order is to define a dedicated network abstraction layer (NAL) queue; the algorithm is designed as follows:
When encoding starts, a fixed-length network abstraction layer data queue is created. A data unit in this NAL queue has one of three states: A, the unit contains neither a frame number nor a bitstream; B, the unit contains only a frame number; C, the unit contains a frame number and the complete bitstream.
The initial state of every unit in the NAL queue is A. Each time an image has its frame type determined and is enqueued, an enqueue operation is also performed on the NAL queue; this enqueue operation writes only the frame-number information, which later serves as the basis for writing the actual bitstream data, and the state of that data unit becomes B. When a given frame has been encoded, the data unit whose frame number matches the frame just encoded is located in the NAL queue and the encoded bitstream is written into it; the state of this data unit then becomes C. Because images are read in and their frame types determined in frame coding order, the bitstreams finally written into the NAL queue are also in order; the input/output thread then writes the data to the disk file in that order, after which the state of the data unit changes from C back to A, returning to the initial state, and the cycle repeats.

Claims (1)

1. A multi-granularity parallel processing method for H264, wherein in the H264 coding hierarchy parallel granularity can be divided, from large to small, into frame level, slice level, and data level, characterized in that: the H264 multi-granularity parallelism comprises three kinds, namely frame-level parallelism, slice-level parallelism, and data-level parallelism; the processing method uses the ARM instruction set to carry out data operations in parallel as vector operations; specifically as follows:
1.1 Data-level parallelism
The data-level parallelism of H264 multi-granularity parallel encoding mainly comprises the following two aspects:
1.1.1 Data-level parallelism based on multimedia extensions
Data-level parallelism based on multimedia extensions uses technologies such as MMX/SSE2 and AltiVec, i.e., multimedia instruction sets embedded in multi-core processors; such a multimedia instruction set performs vector operations on the data and thereby achieves the effect of parallel processing. The multimedia instruction set is a SIMD extension. SIMD means that, when a vector operation is executed, one instruction can operate simultaneously on a vector composed of multiple data elements.
1.1.2 Data-level parallelism of the DCT transform on a heterogeneous multi-core platform
This data-level parallelism exploits the characteristics of a heterogeneous multi-core platform by assigning the computation-intensive modules to the slave cores: a large macroblock is divided into sub-blocks, the divided sub-blocks are DCT-transformed on the slave cores, and the DCT results are finally merged back into the result for the large macroblock. A macroblock of size 16*16 is divided into four sub-blocks of size 8*8; a thread is created on a slave core for the DCT of each of the four sub-blocks, and after the DCT of each sub-block is complete, the sub-block results are transferred back so that the main core completes the DCT of the whole 16*16 macroblock.
1.2 Slice-level parallelism
Slice-level parallelism divides one frame of image data evenly into a plurality of data blocks (slices); a thread is then created for each slice so that the slices are encoded in parallel, a slice header is added during encoding, and after encoding is finished the slice data are recombined in order into one frame of image data;
1.3 Frame-level parallelism
In H264 coding, an I frame is an intra-coded frame and needs no other frame as reference; a P frame needs a preceding I frame as reference; a B frame is a bidirectionally predicted frame and needs preceding and following I or P frames as its reference frames.
1.3.1 In the frame-level parallel algorithm design, the method for synchronizing the parallel encoding of I/P frames and B frames is to store I/P frames and B frames separately.
Separate storage means that, after the frame type of each image in the input sequence has been determined, an I/P frame is enqueued into the I/P-frame queue in coding order, and otherwise the frame is enqueued into the B-frame queue. Because the boundary of the parallel granularity is an I or P frame, the parallel analysis starts from the I/P-frame encoding thread. Specifically:
If the I/P-frame queue is not empty, one I/P frame is taken from the queue; the B frames that can be encoded in parallel with this I/P frame are then taken from the B-frame queue according to the coding dependency analysis, and multiple threads are created to carry out frame-level parallel processing.
When a P frame is taken from the I/P-frame queue, all B frames that can be encoded in parallel before this P frame, including B frames whose encoding lags behind the current P frame, are taken out according to the frame-number correspondence, which controls the synchronization of the parallel encoding; the correspondence is given by the formula:
f(P) - f(B) >= T + 2
where f(P) is the frame number of the P frame being encoded, f(B) is the frame number of the B frame being encoded, and T is the B-frame parameter.
When an I frame is taken from the I/P-frame queue, the reference frames of all picture frames before this I frame already exist; therefore, before this I frame is encoded, all encodable frames in the B-frame queue are first taken out and encoding threads are created for parallel encoding, and only then is this I frame, the first frame of a new GOP, encoded.
1.3.2 In the frame-level parallel algorithm design, the method for writing the encoded bitstream back in coding frame-number order is to define a dedicated network abstraction layer (NAL) queue; the algorithm is designed as follows:
When encoding starts, a fixed-length network abstraction layer data queue is created. A data unit in this NAL queue has one of three states: A, the unit contains neither a frame number nor a bitstream; B, the unit contains only a frame number; C, the unit contains a frame number and the complete bitstream.
The initial state of every unit in the NAL queue is A. Each time an image has its frame type determined and is enqueued, an enqueue operation is also performed on the NAL queue; this enqueue operation writes only the frame-number information, which later serves as the basis for writing the actual bitstream data, and the state of that data unit becomes B. When a given frame has been encoded, the data unit whose frame number matches the frame just encoded is located in the NAL queue and the encoded bitstream is written into it; the state of this data unit then becomes C. Because images are read in and their frame types determined in frame coding order, the bitstreams finally written into the NAL queue are also in order; the input/output thread then writes the data to the disk file in that order, after which the state of the data unit changes from C back to A, returning to the initial state, and the cycle repeats.
CN201310645144.8A 2013-12-05 2013-12-05 H264 multi-granularity parallel handling method Pending CN103716644A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310645144.8A CN103716644A (en) 2013-12-05 2013-12-05 H264 multi-granularity parallel handling method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310645144.8A CN103716644A (en) 2013-12-05 2013-12-05 H264 multi-granularity parallel handling method

Publications (1)

Publication Number Publication Date
CN103716644A true CN103716644A (en) 2014-04-09

Family

ID=50409146

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310645144.8A Pending CN103716644A (en) 2013-12-05 2013-12-05 H264 multi-granularity parallel handling method

Country Status (1)

Country Link
CN (1) CN103716644A (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106534922A (en) * 2016-11-29 2017-03-22 努比亚技术有限公司 Video decoding device and method
CN107547896A (en) * 2016-06-27 2018-01-05 杭州当虹科技有限公司 A kind of ProRes VLC codings based on CUDA
CN110832875A (en) * 2018-07-23 2020-02-21 深圳市大疆创新科技有限公司 Video processing method, terminal device and machine-readable storage medium
WO2020078253A1 (en) * 2018-10-15 2020-04-23 华为技术有限公司 Transform and inverse transform methods and devices for image block
CN111131836A (en) * 2019-12-13 2020-05-08 苏州羿景睿图信息科技有限公司 JPEG2000 encoding parallel operation method based on FPGA
CN111541941A (en) * 2020-05-07 2020-08-14 杭州趣维科技有限公司 Method for accelerating coding of multiple encoders at mobile terminal
CN113596556A (en) * 2021-07-02 2021-11-02 咪咕互动娱乐有限公司 Video transmission method, server and storage medium
CN117934532A (en) * 2024-03-22 2024-04-26 西南石油大学 Parallel optimization method and system for image edge detection

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107547896A (en) * 2016-06-27 2018-01-05 杭州当虹科技有限公司 A kind of ProRes VLC codings based on CUDA
CN107547896B (en) * 2016-06-27 2020-10-09 杭州当虹科技股份有限公司 Cura-based Prores VLC coding method
CN106534922A (en) * 2016-11-29 2017-03-22 努比亚技术有限公司 Video decoding device and method
CN110832875A (en) * 2018-07-23 2020-02-21 深圳市大疆创新科技有限公司 Video processing method, terminal device and machine-readable storage medium
WO2020078253A1 (en) * 2018-10-15 2020-04-23 华为技术有限公司 Transform and inverse transform methods and devices for image block
CN111131836A (en) * 2019-12-13 2020-05-08 苏州羿景睿图信息科技有限公司 JPEG2000 encoding parallel operation method based on FPGA
CN111541941A (en) * 2020-05-07 2020-08-14 杭州趣维科技有限公司 Method for accelerating coding of multiple encoders at mobile terminal
CN111541941B (en) * 2020-05-07 2021-10-29 杭州小影创新科技股份有限公司 Method for accelerating coding of multiple encoders at mobile terminal
CN113596556A (en) * 2021-07-02 2021-11-02 咪咕互动娱乐有限公司 Video transmission method, server and storage medium
CN117934532A (en) * 2024-03-22 2024-04-26 西南石油大学 Parallel optimization method and system for image edge detection
CN117934532B (en) * 2024-03-22 2024-06-04 西南石油大学 Parallel optimization method and system for image edge detection

Similar Documents

Publication Publication Date Title
CN103716644A (en) H264 multi-granularity parallel handling method
CN108206937B (en) Method and device for improving intelligent analysis performance
CN101710986B (en) H.264 parallel decoding method and system based on isostructural multicore processor
CN105992008A (en) Multilevel multitask parallel decoding algorithm on multicore processor platform
CN102547289B (en) Fast motion estimation method realized based on GPU (Graphics Processing Unit) parallel
CN102098503A (en) Method and device for decoding image in parallel by multi-core processor
CN105306945A (en) Scalable synopsis coding method and device for monitor video
CN104604235A (en) Transmitting apparatus and method thereof for video processing
CN103188521A (en) Method and device for transcoding distribution, method and device for transcoding
CN104539972A (en) Method and device for controlling video parallel decoding in multi-core processor
CN103297777A (en) Method and device for increasing video encoding speed
MX2021002489A (en) Method and device for bidirectional inter frame prediction.
US20190279330A1 (en) Watermark embedding method and apparatus
Wang et al. A collaborative scheduling-based parallel solution for HEVC encoding on multicore platforms
CN105100803A (en) Video decoding optimization method
Wang et al. Parallel H. 264/AVC motion compensation for GPUs using OpenCL
Ge et al. Efficient multithreading implementation of H. 264 encoder on Intel hyper-threading architectures
CN109413432B (en) Multi-process coding method, system and device based on event and shared memory mechanism
CN101902643B (en) Very large-scale integration (VLSI) structural design method of parallel array-type intraframe prediction decoder
CN101466037A (en) Method for implementing video decoder combining software and hardware
CN103327340A (en) Method and device for searching integer
CN104956677A (en) Combined parallel and pipelined video encoder
Gong et al. Cooperative DVFS for energy-efficient HEVC decoding on embedded CPU-GPU architecture
RU2014119878A (en) VIDEO ENCODING METHOD WITH motion prediction DEVICE WITH VIDEO CODING motion prediction VIDEO ENCODING PROGRAM predictive MOTION VIDEO DECODING METHOD WITH motion prediction VIDEO DECODING DEVICE WITH MOTION PREDICTION AND DECODING VIDEO PROGRAM motion prediction C
Asif et al. Exploiting MB level parallelism in H. 264/AVC encoder for multi-core platform

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20140409