CN112714319B - Computer readable storage medium, video encoding and decoding method and apparatus using multiple execution units

Info

Publication number
CN112714319B
Authority
CN
China
Prior art keywords
frames
pictures
execution units
video
memory
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011552608.7A
Other languages
Chinese (zh)
Other versions
CN112714319A (en)
Inventor
Name withheld upon request
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Bi Ren Technology Co ltd
Original Assignee
Shanghai Biren Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Biren Intelligent Technology Co Ltd
Priority to CN202011552608.7A
Publication of CN112714319A
Application granted
Publication of CN112714319B

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/42 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation
    • H04N19/423 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation characterised by memory arrangements

Abstract

The invention relates to a computer readable storage medium and to a video encoding and decoding method and apparatus using multiple execution units. The video encoding and decoding method comprises the following steps: finding, in source data, a portion that can be processed in parallel, where the portion comprises multiple pictures that do not reference one another or multiple frames that do not reference one another; sending a command to a video codec engine, which contains multiple execution units, to request that the execution units process the pictures or frames in parallel; obtaining result data for the portion from the execution units; and storing the result data in memory. By issuing such commands, the invention allows the computing power of the multiple hardware execution units in the video codec engine to be fully used, thereby improving data throughput and reducing total latency.

Description

Computer readable storage medium, video encoding and decoding method and apparatus using multiple execution units
Technical Field
The present invention relates to video encoding and decoding, and more particularly, to a video encoding and decoding method and apparatus using multiple hardware execution units.
Background
In a virtualized environment, a video codec engine is shared by multiple virtual machines. As more and more applications rely on video codec engines, the demands on their performance and quality keep rising. In particular, under virtualization, the Quality of Service (QoS) of the video codec engine must be considered while guaranteeing fairness among the virtual machine users.
Referring to fig. 1, conventionally, virtualization of the video codec engine 110 time-shares all execution units 110-1 to 110-4 of the entire codec engine 110 among the different virtual machine clients 120-1 to 120-4. However, in a virtualization scenario where the number of video sequences to be encoded or video streams to be decoded in a single virtual machine client is smaller than the number of physical hardware codec units, the physical hardware codec units are never fully occupied when a virtual machine client obtains its execution time, which reduces data throughput and increases total latency. For example, at time T1, virtual machine client 120-2 obtains execution time. Because the video stream is encoded/decoded serially in the conventional solution, only one of the four execution units 110-1 to 110-4 can load video frame data and perform encoding or decoding at this time; only execution unit 110-1 serves virtual machine client 120-2 while the other execution units 110-2 to 110-4 sit idle. At time T2, virtual machine client 120-1 obtains execution time, but again only execution unit 110-1 serves it while execution units 110-2 to 110-4 remain idle. The service status at later times follows the same pattern, as shown in table 1:
TABLE 1
Time slot # EU1 EU2 EU3 EU4
T1 VM2-P1 N/A N/A N/A
T2 VM1-P1 N/A N/A N/A
T3 VM4-P1 N/A N/A N/A
T4 VM3-P1 N/A N/A N/A
T5
Because several execution units remain idle each time a virtual machine client obtains execution time, the total delay increases and hardware resources are wasted. To solve this problem, the present invention provides a video encoding and decoding method and apparatus using multiple hardware execution units, which improve data throughput and reduce total latency.
Disclosure of Invention
The invention relates to a video encoding and decoding method using multiple execution units, which is executed by a processor and comprises the following steps: finding, in source data, a portion that can be processed in parallel, where the portion comprises multiple pictures that do not reference one another or multiple frames that do not reference one another; sending a command to a video codec engine, which contains multiple execution units, to request that the execution units process the pictures or frames in parallel; obtaining result data for the portion from the execution units; and storing the result data in memory.
The invention also relates to a computer readable storage medium storing program code that, when loaded and executed by a processor, implements the video encoding and decoding method using multiple execution units described above.
The invention further relates to a video encoding and decoding apparatus using multiple execution units, comprising: a memory; a video codec engine containing multiple execution units; and a processor. The processor is coupled to the memory and the video codec engine through a bus structure and is configured to find, in source data, a portion that can be processed in parallel, where the portion comprises multiple pictures that do not reference one another or multiple frames that do not reference one another; send a command to the video codec engine to request that the execution units process the pictures or frames in parallel; obtain result data for the portion from the execution units; and store the result data in the memory.
One of the advantages of the above embodiments is that, by issuing the above commands, the computing capabilities of the hardware execution units in the video codec engine are fully utilized, thereby improving data throughput and reducing total latency.
Other advantages of the present invention will be explained in more detail in conjunction with the following description and the accompanying drawings.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application.
Fig. 1 illustrates an example of a codec task executed by a plurality of execution units for a plurality of virtual machines.
FIG. 2 is a block diagram of an electronic device according to an embodiment of the invention.
FIG. 3 is an example of a video stream according to an embodiment of the present invention.
FIG. 4 is a flowchart of a video encoding and decoding method using multiple execution units according to an embodiment of the present invention.
FIG. 5 is a diagram illustrating an example of codec tasks executed by multiple execution units for multiple virtual machines according to an embodiment of the present invention.
Wherein the symbols in the drawings are briefly described as follows:
110: video codec engine; 110-1 to 110-4: execution units; 120-1 to 120-4: virtual machines; 20: electronic device; 210: video codec engine; 212: virtual machine scheduler; 214: execution unit controller; 216-1 to 216-4: execution units; 230: processor; 230-1 to 230-4: virtual machines; 232: video splitter; 234: media framework; 236: video driver; 250: memory; 252: video sequence; 254: encoded video stream; 255: reference data; 256: encoding result; 258: decoding result; 270: bus structure; 310, 332 to 338, 352 to 358, 370: pictures; S410 to S480: method steps.
Detailed Description
Embodiments of the present invention will be described below with reference to the accompanying drawings. In the drawings, the same reference numerals indicate the same or similar components or process flows.
It should be understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of further features, integers, steps, operations, elements, components, and/or groups thereof.
The use of words such as "first," "second," "third," etc. in this disclosure is intended to modify a component in a claim and is not intended to imply a priority order, precedence relationship, or order between components or steps in a method.
It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element or intervening elements may be present. In contrast, when an element is described as being "directly connected" or "directly coupled" to another element, there are no intervening elements present. Other words used to describe the relationship between components may also be interpreted in a similar manner, e.g., "between" versus "directly between," or "adjacent" versus "directly adjacent," etc.
Refer to fig. 2. The electronic device 20 may be implemented as a mainframe, workstation, personal computer, notebook computer (laptop PC), tablet computer, mobile phone, digital camera, digital video camera, or other electronic product. The electronic device 20 includes a video codec engine 210, a processor 230, and a memory 250, which are connected to one another through a bus structure 270, such as Peripheral Component Interconnect Express (PCI-E), to transfer data, addresses, control signals, and the like. The processor 230 may be a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), a General-Purpose Graphics Processing Unit (GPGPU), or the like, and is configured to load and execute the program code of the virtual machines 230-1 to 230-4. A virtual machine is a computer system simulated in software; it has complete hardware functions and runs in an environment completely isolated from other virtual machines. Different virtual machines can run different operating systems, such as Linux or Red Hat, and are physically isolated from one another, meaning that data, data tables, variables, execution results, and other information in one virtual machine cannot be accessed by any other virtual machine. Although four virtual machines 230-1 to 230-4 are depicted in fig. 2, those skilled in the art may implement more or fewer virtual machines in the processor 230 as the system requires, and the invention is not limited in this respect.
Under virtualization, the video codec engine 210 provides multiple hardware execution units (EUs) 216-1 to 216-4 for all the virtual machines 230-1 to 230-4 in order to improve codec performance for the users. The execution units 216-1 to 216-4 may independently perform video encoding and/or decoding. In some embodiments, the execution units 216-1 to 216-4 are encoding units. In other embodiments, they are decoding units. In still other embodiments, they are encoding-decoding units, each capable of operating in encoding mode or decoding mode according to a control signal. The execution units 216-1 to 216-4 are scheduled to switch simultaneously; for example, when the virtual machine 230-1 obtains access in a time slot, all of the execution units 216-1 to 216-4 are allocated to the virtual machine 230-1. Although four execution units 216-1 to 216-4 are illustrated in fig. 2, those skilled in the art may arrange more or fewer execution units in the video codec engine 210 as the system requires, and the invention is not limited in this respect.
The memory 250 may be configured with space for storing an original video sequence 252, which includes multiple pictures that serve as source data for video encoding. The memory 250 may also be configured with space for storing an encoded (or compressed) video stream 254, which includes multiple frames, such as intra-coded frames (I frames), predictive-coded frames (P frames), and bidirectionally predictive-coded frames (B frames), that serve as source data for video decoding. In addition, the memory 250 may be configured with space for storing reference data 255 required during encoding or decoding, such as the reference pictures and reference frames needed for encoding or for decoding.
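For orientation, the following C++ sketch gathers these memory regions into a single illustrative structure (the result regions 256 and 258 are introduced later in the description); the struct and field names are assumptions made for clarity, not an actual memory map of the apparatus.

    #include <cstdint>

    // Illustrative only: summarizes the regions of memory 250 described in the
    // text; not an actual memory map.
    struct CodecMemoryLayout {
        uint64_t video_sequence_addr;   // 252: original pictures, source data for encoding
        uint64_t encoded_stream_addr;   // 254: I/P/B frames, source data for decoding
        uint64_t reference_data_addr;   // 255: reference pictures/frames for encoding or decoding
        uint64_t encoding_result_addr;  // 256: encoded I/P/B frames
        uint64_t decoding_result_addr;  // 258: reconstructed pictures
    };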
Each virtual machine contains an independently running video splitter (demux) 232, media framework 234, and video driver 236. The video splitter 232, media framework 234, and video driver 236 contain computer code that, when loaded and executed by the processor 230, performs the operations of the video encoding or decoding flow associated with the virtual machine as described below. In video encoding, the video splitter 232 retrieves the video sequence 252 from the memory 250 and separates the pictures to be encoded according to the management data therein, so that the media framework 234 can deliver the retrieved pictures to the video driver 236 one by one. In video decoding, the video splitter 232 retrieves the encoded video stream 254 from the memory 250 and separates the frames to be decoded according to the management data therein, so that the media framework 234 can deliver the retrieved frames to the video driver 236 one by one. The media framework 234 controls the overall encoding or decoding flow. The video driver 236 converts the data received from the media framework 234 into a data format that the hardware execution units 216-1 to 216-4 can interpret, assigns the execution units 216-1 to 216-4 tasks of encoding specific pictures or decoding specific frames, and controls the execution and synchronization of the execution units 216-1 to 216-4. The video driver 236 may send commands regarding the encoding task of a specific picture or the decoding task of a specific frame to a virtual machine scheduler 212 in the video codec engine 210 through the bus structure 270, and prepares the required reference data 255 in the memory 250 for each encoding or decoding task.
To let the execution units 216-1 to 216-4 perform video encoding or decoding in parallel, the video driver 236 separates, from the source data to be encoded or decoded, multiple pictures that can be encoded in parallel or multiple frames that can be decoded in parallel, and generates tasks that each encode one or more pictures, or decode one or more frames, at the same time. During encoding, multiple pictures that do not reference one another may be processed in parallel; when encoded in parallel, these pictures may reference the same or different other pictures. During decoding, multiple frames that do not reference one another may be processed in parallel; when decoded in parallel, these frames may reference the same or different other frames.
For example, refer to the example of fig. 3. Assume that video sequence 252 contains ten pictures 310, 332 to 338, 352 to 358, and 370, where picture 310 is to be encoded as an I frame, picture 370 is to be encoded as a P frame referencing picture 310, and each of pictures 332 to 358 is to be encoded as a B frame referencing pictures 310 and 370. The execution units 216-1 to 216-4 may use compression techniques such as Discrete Cosine Transform (DCT), quantization, and entropy coding to remove or reduce spatial redundancy in picture 310, which does not need to reference any other picture in the video sequence 252 during compression. The execution units 216-1 to 216-4 may predictively encode picture 370 using information of picture (or I frame) 310 to remove temporal redundancy between pictures 310 and 370, represent predicted blocks with motion vectors referencing picture (or I frame) 310, and apply compression techniques such as discrete cosine transform, quantization, and entropy coding to the non-predicted blocks of picture 370 to remove or reduce its spatial redundancy; encoding picture 370 therefore requires reference picture 310. Likewise, the execution units 216-1 to 216-4 may predictively encode picture 332 using information of pictures 310 and 370 to remove the temporal redundancy between picture 332 and pictures 310 and 370, represent each predicted block with a motion vector referencing picture (or I frame) 310 or picture (or P frame) 370, and apply discrete cosine transform, quantization, entropy coding, and the like to the non-predicted blocks of picture 332; encoding picture 332 therefore requires reference pictures 310 and 370. The encoding of pictures 334 to 358 can be deduced from the above and is not repeated for brevity. Because pictures 332 to 358 do not reference one another during encoding, the video driver 236 may let them be encoded in parallel in any combination. However, multiple execution units spend some time synchronizing with one another when running in parallel; according to experiments, two execution units running in parallel reach an overall efficiency of about 90%, a speedup of 1.8, while four execution units reach an overall efficiency of about 75%, a speedup of 3. With more than four execution units the overall efficiency may approach or fall below 50%. Accordingly, the video driver 236 may instruct the encoding of at most four pictures in one encoding command.
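To illustrate how a video driver might group such pictures, the following C++ sketch batches pictures that do not reference one another into groups of at most four, in line with the four-picture limit just mentioned. It is a minimal sketch under the assumption that the input list is already in coding order (every picture appears after the pictures it references); the struct and function names are hypothetical and are not taken from the patented implementation.

    #include <algorithm>
    #include <cstddef>
    #include <vector>

    // Hypothetical helper: groups pictures that do not reference one another
    // into batches of at most max_batch pictures. Assumes `pics` is in coding
    // order, i.e. every picture appears after all pictures it references.
    struct PictureInfo {
        int id;                  // picture number, e.g. 310, 332, ..., 370 in fig. 3
        std::vector<int> refs;   // numbers of the pictures this picture references
    };

    std::vector<std::vector<PictureInfo>> BuildBatches(
            const std::vector<PictureInfo>& pics, std::size_t max_batch = 4) {
        std::vector<std::vector<PictureInfo>> batches;
        for (const PictureInfo& p : pics) {
            bool start_new = batches.empty() || batches.back().size() >= max_batch;
            if (!start_new) {
                for (const PictureInfo& q : batches.back()) {
                    // p may only join the current batch if it neither references
                    // nor is referenced by any picture already in that batch.
                    bool p_refs_q = std::find(p.refs.begin(), p.refs.end(), q.id) != p.refs.end();
                    bool q_refs_p = std::find(q.refs.begin(), q.refs.end(), p.id) != q.refs.end();
                    if (p_refs_q || q_refs_p) { start_new = true; break; }
                }
            }
            if (start_new) batches.push_back({p});
            else batches.back().push_back(p);
        }
        return batches;
    }

Applied to the coding order 310, 370, 332 to 338, 352 to 358 of fig. 3, this grouping yields the batches {310}, {370}, {332, 334, 336, 338}, and {352, 354, 356, 358}, which correspond to the four commands described in the next paragraph.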
For example, the video driver 236 of the virtual machine 230-1 may issue a first command to the virtual machine scheduler 212 at time t1 to indicate that picture 310 is to be encoded as an I frame, where the first command may include information such as an identifier of the virtual machine 230-1 and the number, start address in the memory 250, and length of picture 310. The video driver 236 may also store picture 310 and/or the encoded I frame in the memory 250 as reference data 255 for the subsequent pictures 332 to 370. The video driver 236 may issue a second command to the virtual machine scheduler 212 at time t2 to encode picture 370 into a P frame, where the second command may include information such as the identifier of the virtual machine 230-1, the number, start address in the memory 250, and length of picture 370, and the start address and length of picture 310 and/or the encoded I frame in the memory 250. The video driver 236 may also store picture 370 and/or the encoded P frame in the memory 250 as reference data 255 for the subsequent pictures 332 to 358. The video driver 236 may issue a third command to the virtual machine scheduler 212 at time t3 to indicate that pictures 332 to 338 are to be encoded as B frames, where the third command may include information such as the identifier of the virtual machine 230-1, the numbers, start addresses, and lengths of pictures 332 to 338 in the memory 250, the start address and length of picture 310 and/or the encoded I frame in the memory 250, and the start address and length of picture 370 and/or the encoded P frame in the memory 250. At time t4, the video driver 236 may issue a fourth command to the virtual machine scheduler 212 to encode pictures 352 to 358 into B frames, where the fourth command may include similar information for pictures 352 to 358 together with the start addresses and lengths of picture 310 and/or the encoded I frame and of picture 370 and/or the encoded P frame in the memory 250. The memory 250 may also be configured with space for storing the encoding result 256, including the encoded I, B, and P frames. The output of each execution unit may be sorted in encoding order or frame order before being written into the memory 250 or other storage. Alternatively, the encoded I, B, and P frames may be output as a code stream in the order in which they were actually encoded.
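The information carried by these commands can be pictured with the following C++ sketch. The struct layout, field names, and the addresses in the example are assumptions made purely for illustration and do not describe an actual command format of the video codec engine 210.

    #include <cstdint>
    #include <vector>

    // Hypothetical command layout reflecting the fields mentioned in the text;
    // not an actual hardware interface.
    struct BufferRef {
        uint32_t picture_id;   // picture or frame number, e.g. 310 or 370
        uint64_t start_addr;   // start address in memory 250
        uint64_t length;       // length in bytes
    };

    struct EncodeCommand {
        uint32_t vm_id;                     // identifier of the issuing virtual machine
        std::vector<BufferRef> pictures;    // up to four pictures to encode in parallel
        std::vector<BufferRef> references;  // reference pictures and/or encoded frames
    };

    // The third command of the fig. 3 example might then look roughly like this
    // (all addresses and lengths are placeholders):
    EncodeCommand ThirdCommandExample() {
        return EncodeCommand{
            /*vm_id=*/1,
            /*pictures=*/{{332, 0x10000, 0x8000}, {334, 0x18000, 0x8000},
                          {336, 0x20000, 0x8000}, {338, 0x28000, 0x8000}},
            /*references=*/{{310, 0x00000, 0x8000}, {370, 0x08000, 0x8000}},
        };
    }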
The way the video driver 236 issues commands to the virtual machine scheduler 212 during decoding mirrors the encoding example of fig. 3; the details of instructing the decoding of encoded I, B, and P frames can be deduced by those skilled in the art from the encoding example above and from common video decoding techniques, and are not repeated for brevity. In particular, the video driver 236 may issue a command to the virtual machine scheduler 212 to process, in parallel, multiple B frames that do not reference one another. The memory 250 may also be configured with space for storing the decoding result 258, including the pictures reconstructed (or decompressed) from the encoded I, B, and P frames, for playback by the associated application in the corresponding virtual machine.
The virtual machine scheduler 212 includes a command queue that stores the commands received from the video drivers 236 of the virtual machines 230-1 to 230-4 in order of arrival time. In each time interval, the virtual machine scheduler 212 uses a preset algorithm to pick a command associated with a particular virtual machine from the command queue and passes the picked command to the execution unit controller 214. For example, the virtual machine scheduler 212 may pick the command of a particular virtual machine in round-robin fashion, by ensuring that predefined conditions on the quality of service of the different virtual machines are met, or by another algorithm. According to the parameters in the command, the execution unit controller 214 reads, from the memory 250 through the bus structure 270, one or more pictures to be encoded, one or more frames to be decoded, one or more reference pictures or reference frames, other required reference data (such as quantization matrices), or any combination thereof, and transmits the read data to one or more of the execution units 216-1 to 216-4 so that the driven execution units start the actual video encoding or decoding tasks. When any driven execution unit completes a video encoding or decoding task, the execution unit controller 214 notifies the video driver 236 in the corresponding virtual machine. The notified video driver 236 then reads the encoded I, B, or P frame from the designated execution unit and stores it, through the bus structure 270, in the space of the memory 250 reserved for the encoding result 256, or reads the picture reconstructed from the I, B, or P frame and stores it, through the bus structure 270, in the space of the memory 250 reserved for the decoding result 258.
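One way to realize the per-time-interval pick described above is a round-robin sweep, as in the C++ sketch below. It models only the round-robin policy named as an example in the text, keeps commands in per-VM queues for simplicity rather than in a single queue, and reuses the hypothetical EncodeCommand type from the previous sketch; it is not the patented scheduler.

    #include <cstddef>
    #include <deque>
    #include <optional>
    #include <utility>
    #include <vector>

    // Hypothetical scheduler sketch; per-VM queues are one simple way to keep
    // commands in arrival order while still picking VMs in round-robin fashion.
    class VmScheduler {
    public:
        explicit VmScheduler(std::size_t vm_count) : queues_(vm_count) {}

        // Commands arrive from the video drivers and are queued per VM in
        // arrival order.
        void Submit(std::size_t vm_id, EncodeCommand cmd) {
            queues_[vm_id].push_back(std::move(cmd));
        }

        // In each time interval, pick the next VM that has a pending command and
        // hand its oldest command to the execution unit controller.
        std::optional<EncodeCommand> PickNext() {
            for (std::size_t i = 0; i < queues_.size(); ++i) {
                std::size_t vm = (next_vm_ + i) % queues_.size();
                if (!queues_[vm].empty()) {
                    EncodeCommand cmd = std::move(queues_[vm].front());
                    queues_[vm].pop_front();
                    next_vm_ = (vm + 1) % queues_.size();
                    return cmd;
                }
            }
            return std::nullopt;  // no pending work in this interval
        }

    private:
        std::vector<std::deque<EncodeCommand>> queues_;
        std::size_t next_vm_ = 0;
    };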
Refer to the flowchart of the video codec method shown in fig. 4, which is carried out when the processor 230 loads and executes the computer code of the video driver 236 of a virtual machine. The method can be applied to scenarios such as real-time or offline encoding of video sequences using multiple hardware encoding units, or real-time or offline decoding of video streams using multiple hardware decoding units, and comprises the following steps:
step S410: the variable i is initially 0 for numbering the video processing tasks.
Step S420: the first or next portion that can be read or processed in parallel is found from the source data and the ith video processing task is generated. In the case of a video coding scenario, the source data may be the original video sequence 252, and the portion that can be processed separately may refer to a picture (e.g., a picture that needs to be encoded as an I frame or a P frame), and the portion that can be processed in parallel may refer to a plurality of pictures that are not referenced to each other (e.g., 4 pictures that need to be encoded as B frames that are not referenced to each other). In the case of a video decoding scenario, the source data may be the encoded video stream 254, and the portion that can be processed separately may refer to a frame (e.g., an I-frame or a P-frame), and the portion that can be processed in parallel may refer to a plurality of frames that are not referenced to each other (e.g., 4B-frames that are not referenced to each other).
Step S430: data that needs to be referred to when the ith video processing task is executed, such as other pictures that need to be referred to when encoding (e.g., pictures that need to be encoded as I frames and/or P frames) and/or other reference frames (e.g., encoded I frames and/or P frames), other pictures that need to be referred to when decoding (e.g., pictures reconstructed from I frames and/or P frames), and/or other frames (e.g., I frames and/or P frames), etc., are prepared in the memory 250, so that the execution units 216-1 to 216-4 can read other images and/or other frames that need to be referred to from the memory 250 through the bus structure 270 in the calculation process.
Step S440: a command is sent to the video codec engine 210 to request the ith video processing task to be performed. The command may include an identifier of the virtual machine, which is used to allow the virtual machine scheduler 212 to schedule and execute the command accordingly. The command may further include information such as the location of the portion to be processed stored in the memory 250, the location of the reference data stored in the memory 250, and the like, for the execution unit controller 214 to read the portion to be processed in the video sequence 252 or the encoded video string 254 from the specified location in the memory 250, and to read the reference data from the specified location in the memory 250 when necessary.
Step S450: upon notification from the execution unit controller 214, the result data (e.g., encoded I, P, or B frames, or decoded pictures from I, P, or B frames) of the ith video processing task is obtained from a designated hardware execution unit in the video codec engine 210.
Step S460: the result data of the ith video processing task is stored in the memory 250.
Step S470: and judging whether the video processing flow is ended or not. If so, the entire flow ends. Otherwise, the flow proceeds to the process of step S480.
Step S480: the variable i is incremented by 1.
Compared with the prior art, the video codec method according to the embodiment of the present invention makes the fullest possible use of the computing power of the hardware execution units 216-1 to 216-4 of the video codec engine 210. Referring to the example execution result of fig. 5, at time T1 the virtual machine client 230-2 obtains execution time, and the four execution units 216-1 to 216-4 load portions that can be processed in parallel and perform the corresponding encoding or decoding, so that all four execution units serve the virtual machine client 230-2. At time T2 the virtual machine client 230-1 obtains execution time, and likewise all four execution units 216-1 to 216-4 load portions that can be processed in parallel and serve the virtual machine client 230-1. The service status at later times follows the same pattern, as shown in table 2:
TABLE 2
Time slot # EU1 EU2 EU3 EU4
T1 VM2-P1 VM2-P2 VM2-P3 VM2-P4
T2 VM1-P1 VM1-P2 VM1-P3 VM1-P4
T3 VM4-P1 VM4-P2 VM4-P3 VM4-P4
T4 VM3-P1 VM3-P2 VM3-P3 VM3-P4
T5
One of the advantages of the above embodiments is that, by sending video processing tasks that each contain multiple pictures or frames that can be processed in parallel to the video codec engine 210, the computing capabilities of the hardware execution units 216-1 to 216-4 in the video codec engine 210 can be fully utilized, thereby improving data throughput and reducing total latency.
All or part of the steps of the methods described herein may be implemented by a computer program, such as an application program, a driver, an operating system, or any combination thereof, and may also be implemented as other types of programs. Those skilled in the art can write the methods of the embodiments of the present invention as computer code, which is not described here for brevity. A computer program implemented according to the embodiments of the present invention can be stored on a suitable computer readable storage medium, such as a DVD, a CD-ROM, a USB drive, or a hard disk, or can be placed on a network server accessible via a network (e.g., the Internet or another suitable medium).
Although fig. 2 includes the components described above, it is not excluded that further components may be added to achieve better technical results without departing from the spirit of the present invention. Further, although the steps of the flowchart of fig. 4 are executed in the specified order, a person skilled in the art may, without departing from the spirit of the invention, modify the order of the steps to achieve the same effect, so the invention is not limited to the order described above. In addition, one skilled in the art may integrate several steps into one, or perform additional steps sequentially or in parallel, and the invention should not be limited thereby.
The above description is only for the preferred embodiment of the present invention, and it is not intended to limit the scope of the present invention, and any person skilled in the art can make further modifications and variations without departing from the spirit and scope of the present invention, therefore, the scope of the present invention should be determined by the claims of the present application.

Claims (15)

1. A video coding/decoding method using a plurality of execution units, comprising:
for each of a plurality of virtual machines executed by a processor, finding, in source data, a portion that can be processed in parallel and storing the portion in a memory, wherein the portion that can be processed in parallel comprises a plurality of pictures that do not reference one another or a plurality of frames that do not reference one another;
sending, by the processor for each virtual machine, a command regarding an encoding task of a specific picture or a decoding task of a specific frame to a virtual machine scheduler of a video codec engine;
selecting, by the virtual machine scheduler in each time interval, a command associated with a specific virtual machine from a command queue and sending the command to an execution unit controller of the video codec engine, so as to read the portion to be processed and reference data from the memory and request the plurality of execution units to process the plurality of pictures or the plurality of frames in parallel, wherein the video codec engine comprises the plurality of execution units;
obtaining, by the processor, result data for the portion from the plurality of execution units; and
storing the result data in the memory.
2. The method of claim 1, wherein the number of the plurality of pictures or the plurality of frames does not exceed four.
3. The method of claim 1, wherein the plurality of pictures or the plurality of frames refer to the same other pictures or other frames.
4. The method of claim 1, wherein the method comprises:
preparing, in the memory, other pictures or other frames referenced by the plurality of pictures or the plurality of frames, so that the plurality of execution units can read the other pictures or other frames from the memory during computation.
5. The method of claim 1, wherein the plurality of pictures are pictures that need to be encoded into a plurality of predictive-coded frames or bidirectionally predictive-coded frames, and the plurality of frames are a plurality of predictive-coded frames or bidirectionally predictive-coded frames.
6. The method of claim 1, wherein the plurality of execution units are a plurality of encoding units, the source data is an original video sequence, and the portion that can be processed in parallel comprises a plurality of pictures that do not reference one another.
7. The method of claim 1, wherein the plurality of execution units are a plurality of decoding units, the source data is an encoded video stream, and the portion that can be processed in parallel comprises a plurality of frames that do not reference one another.
8. A computer readable storage medium storing computer code executable by a processor, the computer code implementing a video codec method using multiple execution units as claimed in any one of claims 1 to 7 when executed by the processor.
9. A video encoding and decoding apparatus, comprising:
a memory;
a video codec engine comprising a virtual machine scheduler, an execution unit controller, and a plurality of execution units; and
a processor executing a plurality of virtual machines and coupled to the memory and the video codec engine through a bus structure, configured to, for each virtual machine, find in source data a portion that can be processed in parallel and store the portion in the memory, wherein the portion that can be processed in parallel comprises a plurality of pictures that do not reference one another or a plurality of frames that do not reference one another, and to send, for each virtual machine, a command regarding an encoding task of a specific picture or a decoding task of a specific frame to the virtual machine scheduler;
wherein, in each time interval, the virtual machine scheduler selects a command associated with a specific virtual machine from a command queue and sends the command to the execution unit controller, so as to read the portion to be processed and reference data from the memory and request the plurality of execution units to process the plurality of pictures or the plurality of frames in parallel;
and wherein the processor obtains result data for the portion from the plurality of execution units and stores the result data in the memory.
10. The video codec of claim 9, wherein the number of the plurality of pictures or the plurality of frames does not exceed four.
11. The video coding and decoding apparatus according to claim 9, wherein the plurality of pictures or the plurality of frames refer to the same other pictures or other frames.
12. The video codec of claim 9, wherein the processor is configured to prepare other pictures or other frames in the memory that are referenced by the multiple pictures or the multiple frames, such that the multiple execution units can read the other pictures or other frames from the memory during the calculation.
13. The video coding/decoding apparatus according to claim 9, wherein the plurality of pictures are pictures that need to be coded into a plurality of predictive coded frames or bidirectionally predictive coded frames, and the plurality of frames are the plurality of predictive coded frames or bidirectionally predictive coded frames.
14. The video codec of claim 9, wherein the execution units are coding units, the source data is an original video sequence, and the portion capable of being processed in parallel comprises pictures that are not mutually referenced.
15. The video codec of claim 9, wherein the plurality of execution units are a plurality of decoding units, the source data is an encoded video stream, and the portion capable of being processed in parallel comprises a plurality of frames that are not mutually referenced.
CN202011552608.7A 2020-12-24 2020-12-24 Computer readable storage medium, video encoding and decoding method and apparatus using multiple execution units Active CN112714319B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011552608.7A CN112714319B (en) 2020-12-24 2020-12-24 Computer readable storage medium, video encoding and decoding method and apparatus using multiple execution units

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011552608.7A CN112714319B (en) 2020-12-24 2020-12-24 Computer readable storage medium, video encoding and decoding method and apparatus using multiple execution units

Publications (2)

Publication Number Publication Date
CN112714319A CN112714319A (en) 2021-04-27
CN112714319B true CN112714319B (en) 2023-01-13

Family

ID=75544323

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011552608.7A Active CN112714319B (en) 2020-12-24 2020-12-24 Computer readable storage medium, video encoding and decoding method and apparatus using multiple execution units

Country Status (1)

Country Link
CN (1) CN112714319B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115988217B (en) * 2023-03-14 2023-06-06 摩尔线程智能科技(北京)有限责任公司 Virtualized video encoding and decoding system, electronic equipment and storage medium
CN117395437B (en) * 2023-12-11 2024-04-05 沐曦集成电路(南京)有限公司 Video coding and decoding method, device, equipment and medium based on heterogeneous computation

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB0212176D0 (en) * 2002-05-27 2002-07-03 Radioscape Ltd Stochasitc scheduling in CVM
US9584832B2 (en) * 2011-12-16 2017-02-28 Apple Inc. High quality seamless playback for video decoder clients
US20160191922A1 (en) * 2014-04-22 2016-06-30 Mediatek Inc. Mixed-level multi-core parallel video decoding system
CN109547786B (en) * 2017-09-22 2023-05-09 阿里巴巴集团控股有限公司 Video encoding and video decoding methods and devices

Also Published As

Publication number Publication date
CN112714319A (en) 2021-04-27

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP03 Change of name, title or address

Address after: 201114 room 1302, 13 / F, building 16, 2388 Chenhang Road, Minhang District, Shanghai

Patentee after: Shanghai Bi Ren Technology Co.,Ltd.

Country or region after: China

Address before: 201114 room 1302, 13 / F, building 16, 2388 Chenhang Road, Minhang District, Shanghai

Patentee before: Shanghai Bilin Intelligent Technology Co.,Ltd.

Country or region before: China