CN114245133A

CN114245133A - Video block coding method, coding transmission method, system and equipment

Info

Publication number: CN114245133A
Application number: CN202210164367.1A
Authority: CN
Inventors: 袁潮; 温建伟
Original assignee: Beijing Zhuohe Technology Co Ltd
Current assignee: Beijing Zhuohe Technology Co Ltd
Priority date: 2022-02-23
Filing date: 2022-02-23
Publication date: 2022-03-25

Abstract

The invention provides a video block coding method, a coding transmission method, a system and a device, and belongs to the technical field of video transmission processing. The method comprises the following steps: S100: acquiring a video picture; S200: determining a video picture segmentation mode; S300: acquiring the real-time processing capacity of an encoder; S400: determining the video picture segmentation number according to the real-time processing capacity of the encoder; S500: dividing the video picture to obtain a plurality of block picture data. The device includes a processor and a memory storing computer-executable program instructions that are executed by the processor to implement the method. By partitioning the original picture before encoding, the invention achieves compression encoding of high-resolution video content with a plurality of lower-cost video encoders. In addition, a matching dedicated software or hardware decoder at the client can decode, recombine and present the encoded high-resolution video.

Description

Video block coding method, coding transmission method, system and equipment

Technical Field

The present invention belongs to the technical field of video transmission processing, and in particular, to a video block encoding method, an encoding transmission method, a computer device, a system, and a storage medium for implementing the encoding transmission method.

Background

With the continuous improvement of image sensor resolution, a single image sensor can generate more and more pixels, and video sensors with more than 100 million pixels have now appeared. Uncompressed, the original video data volume is very large, so coding compression is usually needed. Commonly used video coding algorithms include MPEG-2, H.264 and H.265, which can provide compression ratios of tens to hundreds of times depending on the compression quality.

Because a video encoder has high design complexity and extremely high data throughput, a common commercial encoder cannot process images of excessively high resolution; at present such encoders can generally encode 4K video (3840x2160) in real time. Higher-resolution video requires a more specialized encoder, which means a higher price and a higher barrier to use.

On the other hand, currently used video coding algorithms are asymmetric between the encoding and decoding ends: for video of the same resolution, the processing capacity of a decoder is often several times that of an encoder, or, put differently, the cost of the decoder is much lower than that of the encoder.

How to achieve fast transmission and normal decoding and playback of high-resolution video images on the basis of existing conventional commercial video decoders, without increasing hardware cost, has become a technical problem to be solved in this field.

Disclosure of Invention

In order to solve the above technical problem, the present invention provides a video block encoding method, a video block encoding transmission method, a computer device, a system, and a storage medium for implementing the video block encoding transmission method.

In a first aspect of the present invention, a method for video block coding is proposed, the method comprising the steps of:

S100: acquiring a video picture;

S200: determining a video picture segmentation mode according to the video picture size;

S300: acquiring the real-time processing capacity of an encoder;

S400: determining the video picture segmentation number according to the real-time processing capacity of the encoder;

S500: dividing the video picture based on the determined video picture segmentation mode and the determined video picture segmentation number to obtain a plurality of block picture data;

S600: assigning each of the block picture data to a corresponding encoder to perform compression encoding.

As a further improvement, in order to cooperate with the subsequent decoding playing process, the step S500 further includes:

generating time stamp information and position stamp information for each of the block picture data;

inserting the time stamp information and the position stamp information into each of the block picture data.

In this case, after step S600, the method further includes:

S700: transmitting each block picture data subjected to the compression encoding to a client for live video broadcast.

Based on the time stamp and position stamp inserted during encoding, the decoding and playback performed in step S700 specifically includes:

S701: the client receives the picture block description information and constructs a video canvas;

S702: the client decodes each received block picture data, the decoded block picture data comprising its time stamp information and position stamp information;

S703: the block picture data having the same time stamp information are placed at the corresponding positions of the video canvas based on the position stamp information;

S704: a video picture spliced from the plurality of block picture data is displayed on a visual interface of the client.

In step S701, the picture block description information includes the video picture segmentation mode and the video picture segmentation number.

As a further improvement, the step S400 specifically includes:

determining the number of the current idle encoders and the real-time processing capacity of each idle encoder according to the real-time processing capacity of the encoders;

determining a video picture segmentation number based on the number of idle encoders and the real-time processing capability of each idle encoder.

The method of the first aspect treats a single frame of picture data as a whole and describes the encoding, compression, transmission, decoding and playback of each single frame.

In practical applications, video data consists of multiple frames, and every frame of picture data must be transmitted. For this purpose, a second aspect of the present invention provides a method for video block coding transmission, the method comprising the following steps:

SS1: acquiring the size of a current video picture and the real-time processing capacity of a current encoder;

SS2: determining a picture segmentation mode and a video picture segmentation number of the current video picture;

SS3: dividing the current video picture based on the determined video picture segmentation mode and the determined video picture segmentation number to obtain a plurality of current block picture data;

SS4: distributing each current block picture data to a corresponding encoder to perform compression encoding;

SS5: transmitting each current block picture data subjected to the compression encoding to a client;

SS6: acquiring the next video picture, taking it as the current video picture, and returning to step SS1.

As a further improvement, step SS3 further includes: edge extension is performed for each current block picture data.

In a third aspect of the present invention, to implement the method of the first aspect or the second aspect, a video block coding transmission system is provided, the system comprising:

a video frame acquisition module for acquiring each frame of picture of a hundred-megapixel-class video;

a video size acquisition module for acquiring the size of each frame of picture;

an encoder capability detection module for detecting the real-time processing capacity of each encoder;

a picture frame segmentation module that determines the segmentation mode and segmentation number of each frame of picture based on the size of the frame and the real-time processing capacity of each encoder, and divides each frame accordingly to obtain a plurality of block picture data;

an encoding compression module that distributes each of the block picture data to at least one corresponding encoder to perform encoding compression;

a block transmission module that transmits each block picture data to the client after the encoding compression has been performed.

The technical scheme of the invention can be automatically realized through computer equipment or a system based on computer program instructions.

In particular, in a fourth aspect of the present invention, there is provided a video block coding transmission device, the device comprising a processor and a memory, the memory having stored thereon computer-executable program instructions, which are executed by the processor, for implementing the method of the first or second aspect.

Further, in a fifth aspect of the present invention, the present invention may be implemented as a computer storage medium having stored thereon computer program instructions for executing the method of the first or second aspect.

Similarly, in a sixth aspect of the present invention, the present invention may also be embodied as a computer program product, which is loaded into a computer storage medium and executed by a processor, thereby implementing the method of the first or second aspect.

It can be seen that the present invention implements compression encoding of high resolution video content using multiple lower cost video encoders by providing a method for blocking and then encoding an original picture. In addition, the client can be matched with a special software or hardware decoder to decode, recombine and present the high-resolution video coded by the coding method described herein. Therefore, the technical solution of the present invention can also be summarized as a method and an apparatus for block coded transmission and decoded playback of high resolution video images.

Further advantages of the invention will be apparent in the detailed description section in conjunction with the drawings attached hereto.

Drawings

In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed in the embodiments are briefly described below. The drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.

FIG. 1 is a schematic flowchart of a method for block encoded transmission and decoded playback of high-resolution video images according to an embodiment of the present invention;

FIGS. 2(a)-2(c) are schematic diagrams of determining the video picture segmentation mode in the method of FIG. 1;

FIGS. 3(a)-3(b) are schematic diagrams of expanding all four edges outward for the gridding segmentation mode of FIG. 2(a);

FIGS. 3(c)-3(f) are schematic diagrams of edge extension for the segmentation modes of FIGS. 2(b) and 2(c);

FIG. 4 is a schematic diagram of a preferred embodiment for implementing the method of FIG. 1;

FIG. 5 is a schematic diagram of a preferred embodiment of the loop execution of the method for block encoded transmission and decoded playback of high-resolution video images according to an embodiment of the present invention;

FIG. 6 is a system diagram of a server and a client that perform the steps of the method described in FIGS. 1-5;

FIG. 7 is a block architecture diagram of a device that performs the steps of the method described in FIGS. 1-5.

Detailed Description

The invention is further described with reference to the following drawings and detailed description.

Referring to fig. 1, fig. 1 is a complete flow chart illustrating a method for block coded transmission and decoded playback of high resolution video images according to an embodiment of the present invention.

The method of FIG. 1 includes steps S100-S700; steps S100-S600 complete the process of video blocking, compression encoding, and step S700 completes the decoding and playing of the client.

Thus, in fact, steps S100-S600 shown in fig. 1 can be implemented as a method for video block coding, and steps S100-S700 are implemented as a method for video block coding transmission.

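Specifically, the steps are realized as follows:

S100: acquiring a video picture;

S200: determining a video picture segmentation mode according to the video picture size;

S300: acquiring the real-time processing capacity of an encoder;

S400: determining the video picture segmentation number according to the real-time processing capacity of the encoder;

S500: dividing the video picture based on the determined video picture segmentation mode and the determined video picture segmentation number to obtain a plurality of block picture data;

S600: distributing each block picture data to a corresponding encoder to perform compression encoding.

Next, specific implementations of the sub-steps that relate to the improvements of the present invention are described.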

FIGS. 2(a)-2(c) are schematic diagrams of several ways of determining the video picture segmentation mode in the method of FIG. 1, that is, specific implementations of step S200.

In practice, different frame division modes, including horizontal division, vertical division or gridding division, can be adopted according to different frame sizes and aspect ratios.

For example, for a common 16:9 or 4:3 picture, gridding segmentation may be employed, as shown in fig. 2 (a).

For pictures with horizontal dimensions much larger than vertical dimensions, horizontal segmentation may be employed, as shown in fig. 2 (b).

For pictures with vertical size much larger than horizontal size, vertical division can be used, as shown in fig. 2 (c).

It is understood that "much larger" here means that one dimension is N times the other dimension, with N > 5.
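The selection rule described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `choose_split_mode`, the mode strings, and the default threshold value of 5 (the lower bound on N stated in the text) are assumptions made for the example.

```python
def choose_split_mode(width: int, height: int, ratio_threshold: float = 5.0) -> str:
    """Pick a segmentation mode from the picture size.

    ratio_threshold plays the role of N in the text: one dimension must be
    more than N times the other before a one-directional split is chosen.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    if width > ratio_threshold * height:
        return "horizontal"  # very wide picture, as in FIG. 2(b)
    if height > ratio_threshold * width:
        return "vertical"    # very tall picture, as in FIG. 2(c)
    return "grid"            # ordinary aspect ratios such as 16:9 or 4:3, FIG. 2(a)


print(choose_split_mode(7680, 4320))   # grid
print(choose_split_mode(30000, 2000))  # horizontal
print(choose_split_mode(1080, 9600))   # vertical
```

The threshold is a parameter because the text only bounds N from below; a deployment would choose its own value.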

After determining the segmentation mode, it is also necessary to determine the segmentation number, i.e. how many sub-blocks the current picture is divided into.

To this end, steps S300-S400 are performed: after the real-time processing capacity of the encoder is acquired, the video picture segmentation number is determined according to that capacity.

Step S400 is illustrated by the following example:

Taking the gridding segmentation of FIG. 2(a) as an example, assume that an existing commercial encoder can encode 4K video at 25 frames per second in real time. Its real-time encoding capacity is then 3840x2160x25 = 207,360,000 pixels/second.

The video picture can be divided into a grid of 5 rows by 8 columns, each cell being 480x432 pixels, which corresponds to a 3840x2160 picture.

For a camera producing 1000 frames per second, the amount of data produced per cell is 480x432x1000 = 207,360,000 pixels/second, which does not exceed the real-time processing capacity of the encoder. For cameras with higher frame rates, smaller grid cells or encoders with higher processing capacity can be used.

The processing device contains at least as many encoders as there are grid cells, each encoder compressing the video data of one cell. In the above example, the encoded data volume per cell is about 2 MB/s and the total for all 40 cells is about 80 MB/s, which the I/O speed of many existing commercial storage devices can fully meet.
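The arithmetic of this example can be checked with a short sketch. The helper names `pixel_rate` and `min_block_count` are hypothetical, and the sketch assumes that all blocks are equal in size and that all encoders have the same capacity.

```python
def pixel_rate(width: int, height: int, fps: int) -> int:
    """Raw pixel throughput in pixels per second."""
    return width * height * fps


def min_block_count(width: int, height: int, fps: int, encoder_capacity: int) -> int:
    """Smallest number of equal blocks such that each block fits one encoder."""
    total = pixel_rate(width, height, fps)
    return -(-total // encoder_capacity)  # ceiling division on integers


# Encoder from the text: 4K at 25 frames per second.
encoder_capacity = pixel_rate(3840, 2160, 25)

# One 480x432 grid cell from a 1000 frames-per-second camera.
cell_load = pixel_rate(480, 432, 1000)

print(encoder_capacity)               # 207360000
print(cell_load <= encoder_capacity)  # True
print(min_block_count(3840, 2160, 1000, encoder_capacity))  # 40
```

The result of 40 blocks matches the 5x8 grid used in the example.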

Therefore, step S500 of dividing the video picture based on the determined video picture division manner and the determined video picture division number to obtain a plurality of block picture data may be performed.

As a further improvement, referring to fig. 3(a) -3 (f), after obtaining each of the block picture data, the method further comprises: edge extension is performed for each of the block picture data.

Specifically, on the basis of the division, n (n ≥ 0) pixels are taken outward from each edge of a block, yielding a block larger than the original block. The numbers of pixels taken outward on the four sides of a block may or may not be the same.

For example, if the original block is 200x200 pixels, taking 8 more pixels outward on each of the four sides gives a new block of 216x216 pixels.

After expansion, each block overlaps its neighboring blocks to some extent. As shown in FIG. 3(a), the dark grid is the original image after division, and the single-line box represents the area occupied by an expanded block.

As can be seen from FIG. 3(a), after edge extension the blocks located at the edge of the image include some regions that are not part of the original image. These additional regions are shown by the edge frame lines in FIG. 3(b) and are called image edge extension regions. Their pixel values can be chosen as required, for example a fixed value, or the value of the nearest original image pixel in the expansion direction.
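A minimal sketch of edge extension, covering both fill options mentioned above, might look like this. It assumes a single-channel image stored as a list of rows; the function name `extract_padded_block` and its parameters are illustrative and do not come from the source.

```python
from typing import List, Optional

Image = List[List[int]]


def extract_padded_block(image: Image, x0: int, y0: int, block_w: int, block_h: int,
                         left: int = 0, right: int = 0, up: int = 0, down: int = 0,
                         fill: Optional[int] = None) -> Image:
    """Cut the block whose top-left corner is (x0, y0) and extend its edges.

    Pixels outside the original image form the image edge extension region.
    They take the fixed value `fill` if given, otherwise the value of the
    nearest original pixel (coordinates clamped to the image).
    """
    img_h, img_w = len(image), len(image[0])
    block = []
    for y in range(y0 - up, y0 + block_h + down):
        row = []
        for x in range(x0 - left, x0 + block_w + right):
            if 0 <= x < img_w and 0 <= y < img_h:
                row.append(image[y][x])        # inside the original image
            elif fill is not None:
                row.append(fill)               # fixed-value extension
            else:
                cx = min(max(x, 0), img_w - 1)
                cy = min(max(y, 0), img_h - 1)
                row.append(image[cy][cx])      # nearest-pixel extension
        block.append(row)
    return block


image = [[y * 4 + x for x in range(4)] for y in range(4)]
padded = extract_padded_block(image, 0, 0, 2, 2, left=1, right=1, up=1, down=1)
print(len(padded[0]), len(padded))  # 4 4
print(padded[0])                    # [0, 0, 1, 2]
```

For interior blocks the extension simply copies pixels from the neighboring blocks, which produces the overlap described in the text.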

FIGS. 3(a)-3(b) show all four edges of each block in the gridding segmentation mode of FIG. 2(a) being expanded outward; this is referred to as the standard expansion mode.

Similar edge expansion can be adopted for the horizontal or vertical segmentation described in FIG. 2(b) and FIG. 2(c), as follows:

For the horizontal segmentation mode of FIG. 2(b), in addition to the standard expansion mode, the mode of FIG. 3(c) may be adopted, that is, expansion only in the horizontal direction, which reduces the amount of data to be processed. For the edge blocks on the two sides of the original image, only the corresponding edge needs to be expanded to the left or right, respectively, as shown in FIG. 3(d).

Likewise, for the vertical segmentation mode of FIG. 2(c), the mode of FIG. 3(e) may be adopted, that is, expansion only in the vertical direction, which reduces the amount of data to be processed. For the edge blocks at the top and bottom, only the corresponding edge needs to be expanded upward or downward, respectively, as shown in FIG. 3(f).

After the segmentation mode and the edge extension mode are determined, the image block information description can be obtained.

Specifically, the image blocking information must contain enough information to ensure that the blocked video pictures can be recombined according to the information, for example, as follows:

Assume that the original image resolution is 7680x4320. The image is first divided into 10x10 blocks of 768x432 each, and each block is then expanded by 8 pixels on all four sides, giving 100 blocks of 784x448. Because of the edge expansion, the size of the whole image becomes 7696x4336. The image blocking information is recorded as follows:

{
    "splitMode": "gridSplit", // segmentation mode: gridding segmentation
    "originWidth": 7680, // original picture width
    "originHeight": 4320, // original picture height
    "blockWidth": 768, // block width
    "blockHeight": 432, // block height
    "leftPadSize": 8, // leftward extension width of each block
    "rightPadSize": 8, // rightward extension width of each block
    "upPadSize": 8, // upward extension width of each block
    "downPadSize": 8 // downward extension width of each block
}

Based on the above description, we can easily obtain the blocking manner of the image and the size of each block.
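As a sketch of how a receiver could derive the blocking layout from such a record, the following computes the block count and the extended sizes for the example above. The key names are assumptions based on a reading of the record (several are garbled in the source text), and the sketch assumes the picture size is an exact multiple of the block size.

```python
desc = {
    "splitMode": "gridSplit",  # assumed spelling of the mode value
    "originWidth": 7680,
    "originHeight": 4320,
    "blockWidth": 768,
    "blockHeight": 432,
    "leftPadSize": 8,   # assumed key name
    "rightPadSize": 8,  # assumed key name
    "upPadSize": 8,     # assumed key name
    "downPadSize": 8,   # assumed key name
}


def block_layout(d: dict) -> dict:
    """Derive grid dimensions and extended sizes from a blocking record."""
    if d["originWidth"] % d["blockWidth"] or d["originHeight"] % d["blockHeight"]:
        raise ValueError("picture size must be a multiple of the block size")
    cols = d["originWidth"] // d["blockWidth"]
    rows = d["originHeight"] // d["blockHeight"]
    pad_w = d["leftPadSize"] + d["rightPadSize"]
    pad_h = d["upPadSize"] + d["downPadSize"]
    return {
        "cols": cols,
        "rows": rows,
        "blockCount": cols * rows,
        "extendedBlockSize": (d["blockWidth"] + pad_w, d["blockHeight"] + pad_h),
        "extendedPictureSize": (d["originWidth"] + pad_w, d["originHeight"] + pad_h),
    }


layout = block_layout(desc)
print(layout["blockCount"])           # 100
print(layout["extendedBlockSize"])    # (784, 448)
print(layout["extendedPictureSize"])  # (7696, 4336)
```

The printed values agree with the figures given in the worked example (100 blocks of 784x448 and a whole-image size of 7696x4336).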

The above is only an example of the gridding blocking mode, and for the horizontal or vertical blocking mode, we can write similar blocking description information.

To ensure that subsequent video decoding and playback work correctly, step S500 further includes the following:

When each video block is encoded, the position stamp information of the block within the whole picture and the time stamp information of each frame are included in the encoded output video data. This information is used for later video recombination.

The position stamp information may be stored in various ways, as long as it can be matched to the image blocking information described above. For example, for the gridding segmentation mode, the information of each block can be saved as coordinates, with coordinates (0, 0) representing the block in the upper left corner of the image.

The time stamp information must be the same for all block sub-pictures of the same picture. The precision of the time stamp must be sufficient to distinguish pictures at different times; for example, it may be expressed in milliseconds, microseconds or nanoseconds, or in units of 1/90000 second as is commonly used in video transmission systems.
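The two stamps can be sketched as a small data structure. The names `BlockStamp`, `frame_timestamp` and `stamps_for_frame` are hypothetical; the 90 kHz clock is the 1/90000-second unit mentioned above, and an integer frame rate is assumed.

```python
from dataclasses import dataclass
from typing import List

CLOCK_HZ = 90_000  # ticks per second, i.e. units of 1/90000 second


@dataclass(frozen=True)
class BlockStamp:
    timestamp: int  # frame time in clock ticks, shared by all blocks of one frame
    col: int        # block column, 0 is the leftmost column
    row: int        # block row, 0 is the top row


def frame_timestamp(frame_index: int, fps: int, clock_hz: int = CLOCK_HZ) -> int:
    """Time stamp of a frame in clock ticks (integer frame rate assumed)."""
    return frame_index * clock_hz // fps


def stamps_for_frame(frame_index: int, fps: int, cols: int, rows: int) -> List[BlockStamp]:
    """One stamp per block; every block of the frame carries the same time stamp."""
    ts = frame_timestamp(frame_index, fps)
    return [BlockStamp(ts, col, row) for row in range(rows) for col in range(cols)]


stamps = stamps_for_frame(frame_index=1, fps=1000, cols=8, rows=5)
print(len(stamps))   # 40
print(stamps[0])     # BlockStamp(timestamp=90, col=0, row=0)
```

At 1000 frames per second consecutive frames are 90 ticks apart, so this clock still distinguishes them; a much higher frame rate would need a finer unit.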

The blocked video frame data is then compressed and encoded. Any commonly used encoding algorithm may be adopted, such as H.264, H.265 or MPEG-2; the choice of video encoding algorithm is outside the scope of the present invention, which places no restriction on it.

Next, step S700 is performed: transmitting each block picture data subjected to the compression encoding to a client for live video broadcast.

Specifically, the video blocking information and the encoded data of each video block are transmitted directly to the client; this approach can be used for live video broadcast.

During transmission, the description information of the image blocks is transmitted first, followed by the video frame stream of each video block. The video frame stream of each block contains the position information of the block within the whole picture and the time stamp information of each frame.

The transport channel and transport protocol may take many forms, such as MPEG2-TS, RTP, WebSocket, etc., and the present invention is not limited thereto.

The blocking information and the encoded video stream may also be saved to a computer file, in a custom format or based on an existing standard such as MP4. The blocking information of the image, the position information of each video block, and the time stamp information of each frame must also be saved in the file.

The client can read the file, parse this information, and then recombine and play the picture.

Specifically, the video decoding and recomposition performed in step S700 can be summarized as follows:

The client first receives the picture block description information and constructs a video canvas from it.

Then the client decodes the received video stream of each video block, and the decoded video pictures contain respective position information, frame time stamps and picture data.

And the client finds the picture data with the same timestamp, then puts the picture data into different positions of the video canvas according to the position information, and cuts off the redundant part extending from the edge of each block.

The client renders the final picture on the screen.
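The client-side recombination described above can be sketched as follows. This is an illustrative outline only: the function name `reassemble`, the tuple layout of the decoded blocks, and the description key names are assumptions, and pixel data is modeled as lists of rows of single values.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Pixels = List[List[int]]
DecodedBlock = Tuple[int, Tuple[int, int], Pixels]  # (timestamp, (col, row), pixels)


def reassemble(desc: dict, decoded_blocks: Iterable[DecodedBlock]) -> Dict[int, Pixels]:
    """Group decoded blocks by time stamp and paste them onto a canvas.

    The extended margin of each block is cut off before pasting, so only the
    original block area is written to the canvas.
    """
    bw, bh = desc["blockWidth"], desc["blockHeight"]
    left, up = desc["leftPadSize"], desc["upPadSize"]

    by_time = defaultdict(list)
    for ts, pos, pixels in decoded_blocks:
        by_time[ts].append((pos, pixels))

    frames = {}
    for ts, blocks in by_time.items():
        canvas = [[0] * desc["originWidth"] for _ in range(desc["originHeight"])]
        for (col, row), pixels in blocks:
            for y in range(bh):
                # Skip the upper and left margins; take exactly one block row.
                src = pixels[up + y][left:left + bw]
                canvas[row * bh + y][col * bw:(col + 1) * bw] = src
        frames[ts] = canvas
    return frames


tiny = {"originWidth": 4, "originHeight": 2, "blockWidth": 2, "blockHeight": 2,
        "leftPadSize": 1, "rightPadSize": 1, "upPadSize": 1, "downPadSize": 1}
b0 = [[9, 9, 9, 9], [9, 1, 2, 9], [9, 5, 6, 9], [9, 9, 9, 9]]
b1 = [[9, 9, 9, 9], [9, 3, 4, 9], [9, 7, 8, 9], [9, 9, 9, 9]]
frames = reassemble(tiny, [(90, (0, 0), b0), (90, (1, 0), b1)])
print(frames[90])  # [[1, 2, 3, 4], [5, 6, 7, 8]]
```

A real client would also wait until all blocks of a time stamp have arrived before rendering; that bookkeeping is omitted here.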

Based on the above improvement, fig. 4 shows the following flow of a preferred embodiment:

after steps S100-S400 are performed, the method of fig. 4 continues by:

dividing the video picture based on the determined video picture dividing mode and the determined video picture dividing number to obtain a plurality of block picture data;

inserting the time stamp information and the position stamp information into each of the block picture data;

distributing each block picture data to a corresponding encoder to perform compression encoding;

the client receives the picture block description information to construct a video canvas;

the client decodes each received block picture data;

placing the blocked picture data with the same timestamp information to the corresponding position of the video canvas based on the position stamp information;

and displaying a video frame picture spliced by a plurality of blocked picture data on a visual interface of the client.

The method of FIG. 1 or FIG. 4 treats a single frame of picture data as a whole and describes the encoding, compression, transmission, decoding and playback of each single frame.

In practical applications, video data consists of multiple frames, and every frame of picture data must be transmitted. For this purpose, FIG. 5 is a schematic diagram of a preferred embodiment in which the method for block encoded transmission and decoded playback of high-resolution video images is executed in a loop according to an embodiment of the present invention.

The method comprises the following steps:

determining the number of currently idle encoders and the real-time processing capability of each idle encoder according to the real-time processing capability of the encoders, thereby determining the number and the processing capability of each available encoder;

It should be noted that the embodiment shown in FIG. 5 is a further improvement on FIG. 1 or FIG. 4. In FIG. 1 or FIG. 4, the video picture segmentation mode is determined only from the video picture size, and the video picture segmentation number is determined only from the real-time processing capacity of the encoder.

In FIG. 5, because the process is executed in a loop, the segmentation mode is no longer determined only by the picture size, nor the segmentation number only by the encoder capacity. Instead, both the size of the current video picture and the real-time processing capacity of the current encoders are acquired for each frame, and the segmentation mode and segmentation number of the current video picture are determined together.

It will be appreciated that the grid is cut anew for each current frame, and the grid specification (segmentation mode and segmentation number) changes dynamically, because the real-time processing capacity of each video encoder changes: a previously idle encoder may be busy for the next frame, while a previously busy encoder may have become idle.
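One possible way to derive the segmentation number from the currently idle encoders is sketched below. The source does not specify a concrete rule, so the policy here (equal-sized blocks, one block per encoder, strongest idle encoders first) is an assumption, as are the names `Encoder` and `plan_segmentation_number`.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Encoder:
    name: str
    capacity: int  # real-time capacity in pixels per second
    busy: bool = False


def plan_segmentation_number(width: int, height: int, fps: int,
                             encoders: List[Encoder]) -> int:
    """Smallest number of equal blocks that the idle encoders can handle."""
    required = width * height * fps
    idle = sorted((e.capacity for e in encoders if not e.busy), reverse=True)
    for n, weakest in enumerate(idle, start=1):
        # With n equal blocks, each block carries required / n pixels per second,
        # so the weakest of the n strongest idle encoders must cover that share.
        if n * weakest >= required:
            return n
    raise RuntimeError("idle encoders cannot handle this picture in real time")


# 50 encoders of the 4K/25 fps class from the text, 5 of them currently busy.
pool = [Encoder(f"enc{i}", 3840 * 2160 * 25, busy=(i < 5)) for i in range(50)]
print(plan_segmentation_number(3840, 2160, 1000, pool))  # 40
```

Calling the function once per frame, with the `busy` flags refreshed, gives the dynamically changing grid specification described above.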

In addition, the segmentation and edge extension techniques described for FIG. 1 or FIG. 4 also apply to the corresponding steps of the embodiment of FIG. 5.

That is, step SS3 further includes: performing edge extension on each current block picture data.

The video picture segmentation mode in step SS2 includes horizontal segmentation, vertical segmentation and gridding segmentation.

Because the segmentation changes dynamically, the mode used in step SS2 may be any one of horizontal segmentation, vertical segmentation and gridding segmentation, or any combination thereof.

In FIG. 6, the video source is the original video pictures without coding compression, coming from a high-resolution megapixel camera. The processing steps are described as follows:

1. Picture blocking

This step comprises picture segmentation and edge extension.

1.1 Picture segmentation

Depending on the picture size and aspect ratio, different picture segmentation modes can be adopted, including horizontal segmentation, vertical segmentation or gridding segmentation; see FIGS. 2(a)-2(c).

1.2 Edge extension

On the basis of the segmentation, n (n ≥ 0) pixels are taken outward from each edge of a block, yielding a block larger than the original block. The numbers of pixels taken outward on the four sides of a block may or may not be the same.

After edge extension, the blocks located at the edge of the image include some regions that are not part of the original image; these are called image edge extension regions. Their pixel values can be chosen as required, for example a fixed value, or the value of the nearest original image pixel in the expansion direction.

The other extension modes are not described again here; see FIGS. 3(a)-3(f).

1.3 Image blocking information description

The image blocking information must contain enough information to ensure that the blocked video pictures can be recombined based on this information, for example as follows:

{
    "splitMode": "gridSplit", // segmentation mode: gridding segmentation
    "originWidth": 7680, // original picture width
    "originHeight": 4320, // original picture height
    "blockWidth": 768, // block width
    "blockHeight": 432, // block height
    "leftPadSize": 8, // leftward extension width of each block
    "rightPadSize": 8, // rightward extension width of each block
    "upPadSize": 8, // upward extension width of each block
    "downPadSize": 8 // downward extension width of each block
}

2. Block video coding

The video data blocked in the previous step is compressed and encoded. Any commonly used encoding algorithm may be adopted, such as H.264, H.265 or MPEG-2; the choice of video encoding algorithm is outside the scope of the present invention, which places no restriction on it.

When each video block is encoded, the video data to be encoded and output includes position information of the block in the entire picture and time stamp information of each frame of the picture. This information will be used for later video recombination.

The position information may be stored in various ways as long as it can be associated with the image blocking information described in section 1.3. For example, for a gridded block-partitioning approach, we can save the information of each block in the form of coordinates, for example, coordinates (0, 0) representing the block in the upper left corner of the image.

3. Video transmission

The method is to directly transmit the video block information and the data after each block video coding to the client, and can be used for video live broadcast.

In the transmission process, the description information of the image blocks is transmitted, and then the video stream of each video block is transmitted, wherein the video stream of the block contains the position information of the block in the whole picture and the time stamp information of each frame.

4. Data storage and reading

The blocking information and the encoded video stream may also be saved to a computer file, which may be in a custom format or based on some existing standard, such as MP4 format. It is also necessary to save the block information of the image, the position information of each video block, and the time stamp information of each frame in the file.

5. Video decoding and recomposition

The client renders the final picture on the screen.

Based on the architecture of FIG. 6, FIG. 7 shows a block architecture diagram of a device that performs the steps of the method described in FIGS. 1-5.

In fig. 7, a video block coding transmission system is shown, the system comprising:

The technical scheme of the invention can be automatically realized by computer equipment based on computer program instructions. Similarly, the present invention can also be embodied as a computer program product, which is loaded on a computer storage medium and executed by a processor to implement the above technical solution.

Further embodiments therefore include a computer device comprising a memory storing a computer executable program and a processor configured to perform the steps of the above method.

The invention realizes the compression coding of high-resolution video content by adopting a plurality of video coders with lower cost by providing a method for firstly partitioning and then coding an original picture. In addition, the client can be matched with a special software or hardware decoder to decode, recombine and present the high-resolution video coded by the coding method described herein. Therefore, the method of the present invention can also be summarized as a method and apparatus for block encoded transmission and decoded playback of high resolution video images.

It should be noted that the present invention can solve a plurality of technical problems or achieve corresponding technical effects, but does not require that each embodiment of the present invention solves all the technical problems or achieves all the technical effects, and an embodiment that separately solves one or several technical problems or achieves one or more improved effects also constitutes a separate technical solution.

Finally, it should be noted that: the above embodiments are only for illustrating the technical solutions of the present invention and not for limiting the same, and although the present invention is described in detail with reference to the above embodiments, those of ordinary skill in the art should understand that: modifications and equivalents may be made to the embodiments of the invention without departing from the spirit and scope of the invention, which is to be covered by the claims.

The present invention is not limited to the specific module structure described in the prior art. The prior art mentioned in the background section can be used as part of the invention to understand the meaning of some technical features or parameters. The scope of the present invention is defined by the claims.

Claims

1. A method for video block coding, the method comprising the steps of:

S100: acquiring a video picture;

S300: acquiring the real-time processing capacity of an encoder;

2. The method of claim 1, wherein:

the step S500 further includes:

3. The method of claim 2, wherein after the step S600, the method further comprises:

4. The method of claim 3, wherein the step S700 specifically comprises:

5. The method of claim 1, wherein:

the video picture segmentation method in step S200 includes horizontal segmentation, vertical segmentation, and gridding segmentation.

6. The method of claim 1, wherein:

the step S400 specifically includes:

7. A method for video block coded transmission, the method comprising the steps of:

8. The method of claim 7, wherein:

step SS3 further includes: edge extension is performed for each current block picture data.

9. A video block coding transmission system, the system comprising:

10. A video block coding transmission apparatus, the apparatus comprising a processor and a memory, the memory having stored thereon computer-executable program instructions, the executable program instructions being executable by the processor for implementing the method of any one of claims 1 to 8.