CN114760455A - Multi-channel video multi-view scene coding and decoding method based on AVS3 coding framework - Google Patents
Multi-channel video multi-view scene coding and decoding method based on AVS3 coding framework
- Publication number
- CN114760455A (application CN202210329909.6A)
- Authority
- CN
- China
- Prior art keywords
- coding
- decoding
- image
- data
- avs3
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N13/00—Stereoscopic video systems; Multi-view video systems; Details thereof
- H04N13/10—Processing, recording or transmission of stereoscopic or multi-view image signals
- H04N13/106—Processing image signals
- H04N13/161—Encoding, multiplexing or demultiplexing different image signal components
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/23—Processing of content or additional data; Elementary server operations; Server middleware
- H04N21/236—Assembling of a multiplex stream, e.g. transport stream, by combining a video stream with other content or additional data, e.g. inserting a URL [Uniform Resource Locator] into a video stream, multiplexing software data into a video stream; Remultiplexing of multiplex streams; Insertion of stuffing bits into the multiplex stream, e.g. to obtain a constant bit-rate; Assembling of a packetised elementary stream
- H04N21/2368—Multiplexing of audio and video streams
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/60—Network structure or processes for video distribution between server and client or between remote clients; Control signalling between clients, server and network components; Transmission of management data between server and client, e.g. sending from server to client commands for recording incoming content stream; Communication details between server and client
- H04N21/63—Control signaling related to video distribution between client, server and network components; Network processes for video distribution between server and clients or between remote clients, e.g. transmitting basic layer and enhancement layers over different transmission paths, setting up a peer-to-peer communication via Internet between remote STB's; Communication protocols; Addressing
- H04N21/64—Addressing
- H04N21/6405—Multicasting
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/60—Network structure or processes for video distribution between server and client or between remote clients; Control signalling between clients, server and network components; Transmission of management data between server and client, e.g. sending from server to client commands for recording incoming content stream; Communication details between server and client
- H04N21/65—Transmission of management data between client and server
- H04N21/658—Transmission by the client directed to the server
- H04N21/6587—Control parameters, e.g. trick play commands, viewpoint selection
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/80—Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
- H04N21/81—Monomedia components thereof
- H04N21/816—Monomedia components thereof involving special video data, e.g. 3D video
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N7/00—Television systems
- H04N7/18—Closed-circuit television [CCTV] systems, i.e. systems in which the video signal is not broadcast
- H04N7/181—Closed-circuit television [CCTV] systems, i.e. systems in which the video signal is not broadcast for receiving images from a plurality of remote sources
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Compression Or Coding Systems Of Tv Signals (AREA)
Abstract
A multi-channel video multi-view scene coding and decoding method based on the AVS3 coding framework includes the following steps: first, multi-view AVS3 encoding; second, time synchronization; third, devising the coded data structure; fourth, filling the extension data; fifth, encapsulating audio and video into a TS (transport stream); sixth, transmitting the TS stream over the network; seventh, receiving and decoding the TS stream: the user decodes the TS stream after receiving it; eighth, decoding the image extension data; and ninth, selection of a viewing scene by the user. The method realizes panoramic multi-angle coding under the AVS coding framework, avoids the boundary-blurring problems caused by stitching video images, offers high scene coding and decoding efficiency, and needs neither a stitching algorithm nor dedicated CPU/GPU equipment, so cost can be saved.
Description
Technical Field
The invention belongs to the field of video compression coding, and particularly relates to a multi-channel video multi-view scene coding and decoding method based on an AVS3 coding framework.
Background
In 2021, China's third-generation audio/video standard AVS3 (Audio Video Standard 3), the ultra-high-definition 8K coding standard, entered commercial use across the country, notably with the launch of the CCTV 8K ultra-high-definition test channel, marking the formal start of the 8K ultra-high-definition era. Building on this ultra-high-definition foundation, 3D ultra-high-definition coding and decoding technology is being advanced so that China's ultra-high-definition coding standards continue to keep pace with international standards.
At present, in sports venues, the images captured by multiple conventional cameras are stitched into a 360-degree panoramic image, which is then re-encoded and transmitted and decoded at the receiving end for multi-view output and viewing. In the prior art, stitching quality places high demands on the algorithm and extremely high demands on performance: blurring appears at the stitching boundaries, which directly degrades picture quality and user experience. The stitching algorithm depends on CPU/GPU (central processing unit / graphics processing unit) computation, and the higher the image resolution, the more hardware computing power is required, which poses a huge challenge to image processing and affects real-time performance. Eliminating the boundary blurring caused by stitching ultra-high-definition images therefore requires, on the one hand, stronger real-time image processing capability; at the same time, once the shooting angles of the multiple cameras change, the stitching must be redone and the algorithm re-deployed, which is difficult, costly and not very general-purpose.
Therefore, the market needs a method that requires no stitching: a multi-channel direct coding and transmission method at the image coding layer that uses the existing single-channel coded data structure of the AVS3 standard, extends it into a multi-channel coding structure, and presets certain scene applications. Theoretical verification and analysis show that this can achieve the same 360-degree panoramic viewing as the traditional stitching approach.
Disclosure of Invention
The invention aims to disclose a multi-channel video multi-view scene coding and decoding method based on the AVS3 coding framework. In this multi-channel multi-view coding and decoding method, multiple video channels are encoded on the encoding side, synchronized and transmitted, and the back-end decoding side can select and configure a different video at any time for final output.
The technical scheme for realizing the technical purpose of the invention is as follows:
A multi-channel video multi-view scene coding and decoding method based on the AVS3 coding framework includes the following steps:
First step, multi-view AVS3 encoding: in a given scene, multiple ultra-high-definition cameras each perform multi-view AVS3 encoding.
Second step, time synchronization: the coded picture of each video channel is obtained after encoding, and the encoding times of the channels are verified to ensure that the post-encoding time data are consistent.
Third step, the inventive coded data structure: within the AVS3 coding framework, the extension data of the coded picture is used, and the information related to multi-channel video coding is embedded into it; the extension data includes the multi-scene definition, the image information of each channel, the synchronization time information, and the decoding frame-fetch order for decoding under each scene.
Fourth step, extension data filling: the extension data is filled into the extension data area of the picture coded in the second step.
Fifth step, encapsulating audio and video into a TS stream: the coded picture carrying the extension data is combined with the audio and encapsulated into the TS (Transport Stream) format; a TS stream is a media file container format.
Sixth step, TS stream transmission: the TS stream is transmitted over a network.
Seventh step, receiving and decoding the TS stream: the user decodes the TS stream after receiving it.
Eighth step, decoding the image extension data: when the decoding side detects from the extension data of the decoded picture that multiple channels and multiple views are present, the user can select the picture of a different recommended view (a different scene) for output.
Ninth step, the user selects a scene to watch: the user makes a selection and experiences different viewing angles.
Preferably, in the second step the time synchronization verifies the encoding times of the multiple video channels as follows: each frame of image data containing a timestamp is stored in memory in a linked-list data structure, the data of each channel is read from its linked list, the timestamp, which is a long-integer time value, is read out, the values are compared numerically, and the consistency of the post-encoding time data is confirmed.
Preferably, in the third step the inventive coded data structure is as follows: the coded-picture extension data is specific data carrying coding information; based on this data structure, the inventors create an extension data structure called the panoramic coded picture and embed the information related to multi-channel video coding into the extension data, which includes the multi-scene definition, the image information of each channel, the synchronization time information, and the decoding frame-fetch order for decoding under each scene.
Preferably, in the eighth step the decoding of the image extension data is as follows: the decoding side decodes the picture and parses the extension data of the panoramic-coded-picture extension data structure table; when it detects that the panoramic coded picture contains multiple channels and multiple views, the user can select the picture of a different recommended view (a different scene) for output.
According to the technical scheme of the invention, the beneficial effects are as follows:
1. This method of simultaneously encoding multiple video channels within the AVS3 coding framework realizes panoramic multi-angle encoding under the AVS coding framework, provides methodological guidance for ultra-high-definition content production, in particular for fields that require panoramic content, and further improves the AVS3 coding standard.
2. The method is simple and convenient to implement and needs no panoramic stitching; it eliminates the many boundary-blurring problems caused by stitching video images, provides the market with clearer video signals and a better user experience, and offers more efficient scene coding and decoding.
3. The multi-channel video multi-view scene coding and decoding method based on the AVS3 coding framework uses no GPU (graphics processing unit) based stitching algorithm, needs no image stitching and no strong computing support, yet can provide different views for back-end content production and output and offers a powerful 360-degree panoramic scheme, so cost can be saved.
The English abbreviations of technical terms used in the present invention and their meanings are:
AVS3 (Audio Video Standard 3): third-generation audio/video standard;
TS stream (Transport Stream): a media file container format.
Drawings
FIG. 1 is a flow chart of the coding and decoding method of the present invention;
FIG. 2 is a system flow diagram of a simulation implementation of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail by the embodiments with reference to the accompanying drawings. It should be understood that the specific embodiments described herein are merely illustrative of the invention and do not limit the invention.
A multi-channel video multi-view scene coding and decoding method based on an AVS3 coding framework comprises the following steps:
First step S1, multi-view AVS3 encoding: the multiple ultra-high-definition cameras each perform multi-view AVS3 encoding.
Second step S2, time synchronization: the coded pictures of each video channel are obtained after encoding, and each coded picture carries a timestamp. The timestamps are derived from a 90 kHz synchronization clock: at a frame rate of 25 fps one frame covers 40 ms (1000 ms / 25), so the timestamp value of each successive frame increases by 90 kHz x 40 ms = 3600. Each frame of image data containing a timestamp is stored in memory in a linked-list data structure, the data of each channel is read from its linked list, and the timestamp value, a long-integer number, is read out. When the difference between the encoding timestamps of the channels does not exceed a specified threshold (for example 100; the value is adjustable), the encoding times of the channels can be considered essentially consistent, which confirms the consistency of the post-encoding time data.
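As an aid to this step, the following is a minimal C sketch (not the patent's actual implementation, whose code is not published here) of the described comparison: each channel keeps a linked list of coded frames tagged with a 90 kHz timestamp, and the heads of all lists are checked against an adjustable tolerance. The type and function names and the tolerance value of 100 ticks are assumptions for illustration.

```c
#include <stdint.h>
#include <stdio.h>

/* One coded frame of one channel, tagged with its 90 kHz timestamp. */
typedef struct EncodedFrame {
    int64_t pts;                 /* increases by 3600 per frame at 25 fps */
    struct EncodedFrame *next;   /* next frame in the same channel's list */
} EncodedFrame;

#define PTS_TOLERANCE 100        /* adjustable threshold, as in the description */

/* Returns 1 when the head frames of all channels lie within the tolerance. */
int channels_in_sync(EncodedFrame *heads[], int num_channels)
{
    int64_t min_pts = heads[0]->pts, max_pts = heads[0]->pts;
    for (int i = 1; i < num_channels; i++) {
        if (heads[i]->pts < min_pts) min_pts = heads[i]->pts;
        if (heads[i]->pts > max_pts) max_pts = heads[i]->pts;
    }
    return (max_pts - min_pts) <= PTS_TOLERANCE;
}

int main(void)
{
    /* Two simulated channels whose latest frames differ by 30 ticks. */
    EncodedFrame a = { 7200, NULL };
    EncodedFrame b = { 7230, NULL };
    EncodedFrame *heads[2] = { &a, &b };
    printf("in sync: %d\n", channels_in_sync(heads, 2));
    return 0;
}
```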
Third step S3, the coded data structure: within the AVS3 standard framework, an AVS3 bitstream itself carries various types of extension data when encoded; Table 1 gives the extension data structure definition under the AVS3 standard.
Table 1: extended data structure definition under AVS3 standard
Table 2 is the extension data structure table of the panoramic coded picture. As shown in Table 2, the coded-picture extension data is specific data carrying coding information, and based on this data structure the inventors have created an extension data structure called the panoramic coded picture; the C-language pseudo code is as follows:
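The pseudo code itself is not reproduced in this publication record. Below is a hedged C sketch of what such a panoramic-coded-picture extension structure could look like, built only from the fields named in the description (scene definitions, per-channel image information, synchronization time, decode order, PCR, width, height, and the 4-bit identifier '1101'); all struct and field names, the field widths and the MAX_VIEWS bound are illustrative assumptions, not the normative AVS3 syntax.

```c
#include <stdint.h>

#define PANORAMA_EXT_ID 0x0D      /* next_bits(4) == '1101' per the description */
#define MAX_VIEWS 16              /* assumed upper bound on camera channels */

/* Per-channel (per-view) attribute record. */
typedef struct {
    uint8_t  view_id;             /* which camera / shooting angle this channel is */
    uint16_t width;               /* luma width of the coded picture */
    uint16_t height;              /* luma height of the coded picture */
    uint64_t pcr;                 /* program clock reference used for synchronization */
    uint8_t  decode_order;        /* frame-fetch order when decoding this scene */
} ViewInfo;

/* Extension data carried with every coded picture of the panorama. */
typedef struct {
    uint8_t  ext_id;              /* PANORAMA_EXT_ID, marks the panoramic extension */
    uint8_t  scene_count;         /* number of predefined scenes */
    uint8_t  view_count;          /* number of encoded channels (views) */
    uint64_t sync_time;           /* shared 90 kHz timestamp of this picture */
    ViewInfo views[MAX_VIEWS];    /* per-channel attribute table */
} PanoramaExtensionData;
```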
Through this code, the multi-channel video coding attribute information (see Table 2 for details) is written into the extension data; the extension data structure includes the multi-scene definition, the image information of each channel, the synchronization time information, and the decoding frame-fetch order for decoding under each scene.
Table 2: extended data structure table for panoramic coding image
Fourth step S4, extension data filling: the extension data is added to the extension data area of the picture coded in the second step, forming the updated AVS extension data area; the C-language pseudo code is as follows:
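Here, too, the original pseudo code is not reproduced; the sketch below, which reuses the PanoramaExtensionData and ViewInfo types from the previous sketch, shows one plausible way to serialize the extension behind the coded picture. The 0x000001B5 extension start code follows the AVS-family convention and is stated here as an assumption, and the byte layout of the payload is illustrative rather than normative.

```c
#include <stdint.h>
#include <string.h>

/* Serializes the panoramic extension into buf; returns the number of bytes
 * written. The caller appends buf[0..n) after the coded picture data so that
 * it sits in the AVS extension data area. */
size_t fill_panorama_extension(uint8_t *buf, const PanoramaExtensionData *ext)
{
    size_t n = 0;
    const uint8_t start_code[4] = { 0x00, 0x00, 0x01, 0xB5 }; /* extension_start_code */
    memcpy(buf, start_code, 4);
    n += 4;
    buf[n++] = (uint8_t)(PANORAMA_EXT_ID << 4);   /* 4-bit id '1101' in the high nibble */
    buf[n++] = ext->scene_count;
    buf[n++] = ext->view_count;
    for (int i = 0; i < ext->view_count; i++) {
        const ViewInfo *v = &ext->views[i];
        buf[n++] = v->view_id;
        buf[n++] = (uint8_t)(v->width >> 8);  buf[n++] = (uint8_t)v->width;
        buf[n++] = (uint8_t)(v->height >> 8); buf[n++] = (uint8_t)v->height;
        buf[n++] = v->decode_order;
    }
    /* A complete implementation would also write sync_time and the per-view
     * PCR values mentioned in the description. */
    return n;
}
```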
Fifth step S5, encapsulating audio and video into a TS stream: the coded picture carrying the extension data is combined with the audio and encapsulated into the TS (transport stream) format.
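For context on this step: an MPEG transport stream is a sequence of fixed 188-byte packets that begin with the 0x47 sync byte. The minimal sketch below only frames one 184-byte payload chunk into a single TS packet; a real multiplexer additionally builds PES packets, PAT/PMT tables and PCR, which are omitted here, and the function name and PID handling are illustrative.

```c
#include <stdint.h>
#include <string.h>

/* Wraps one 184-byte payload chunk into a 188-byte TS packet. */
void make_ts_packet(uint8_t out[188], uint16_t pid,
                    uint8_t continuity, const uint8_t payload[184])
{
    out[0] = 0x47;                          /* sync byte */
    out[1] = (uint8_t)((pid >> 8) & 0x1F);  /* PUSI, error and priority flags cleared */
    out[2] = (uint8_t)(pid & 0xFF);         /* low 8 bits of the PID */
    out[3] = 0x10 | (continuity & 0x0F);    /* payload only + continuity counter */
    memcpy(out + 4, payload, 184);          /* the 184-byte payload */
}
```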
Sixth step S6, TS stream transmission: the encapsulated TS stream is sent by a UDP sending server into the multicast network as TS packets over the UDP protocol, pushing the data stream and achieving network transmission.
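The sender's code is not given in the description; the following POSIX-sockets sketch illustrates the idea of pushing TS packets to a multicast group over UDP. The group address 239.1.1.1 and port 1234 are placeholders, and a production sender would pace the packets and typically bundle seven 188-byte TS packets (1316 bytes) per datagram.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
    const char *group = "239.1.1.1";   /* placeholder multicast group */
    const uint16_t port = 1234;        /* placeholder port */

    int sock = socket(AF_INET, SOCK_DGRAM, 0);
    if (sock < 0)
        return 1;

    struct sockaddr_in dst;
    memset(&dst, 0, sizeof dst);
    dst.sin_family = AF_INET;
    dst.sin_port = htons(port);
    inet_pton(AF_INET, group, &dst.sin_addr);

    /* A real sender loops over the muxed TS packets; one packet is sent here. */
    unsigned char ts_packet[188] = { 0x47 };
    sendto(sock, ts_packet, sizeof ts_packet, 0,
           (struct sockaddr *)&dst, sizeof dst);

    close(sock);
    return 0;
}
```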
UDP is the abbreviation of User Datagram Protocol.
Seventh step S7, receiving and decoding the TS stream: the user decodes the TS stream after receiving it.
Eighth step S8, decoding the image extension data: the decoding side decodes the picture and parses the extension data of the panoramic-coded-picture extension data structure table; by detecting the condition next_bits(4) == '1101' in the AVS extension data area (see Table 2), it obtains the extension data structure of the panoramic coded picture and parses the multi-channel video coding attribute information.
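To make the detection condition concrete, here is a hedged C sketch of how a decoder might recognize the panoramic extension: it checks the extension start code and the 4-bit identifier '1101' (0xD), then reads the scene and view counts in the layout assumed by the filling sketch above. Everything beyond the identifier check is an assumption, not the normative syntax.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Returns 1 and prints the counts if p points at a panoramic extension. */
int parse_panorama_extension(const uint8_t *p, size_t len)
{
    /* extension_start_code 0x000001B5 followed by the 4-bit extension id */
    if (len < 7 || p[0] != 0x00 || p[1] != 0x00 || p[2] != 0x01 || p[3] != 0xB5)
        return 0;
    if ((p[4] >> 4) != 0x0D)              /* next_bits(4) == '1101' */
        return 0;

    uint8_t scene_count = p[5];
    uint8_t view_count  = p[6];
    printf("panoramic extension: %u scenes, %u views\n", scene_count, view_count);
    /* A full parser would go on to read per-view width, height, PCR and
     * decode order for the scene the user selects. */
    return 1;
}
```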
Ninth step S9, the user selects a scene to view: once the stream is detected to contain multiple channels, multiple views and multiple scenes, the user can choose among the different recommended views (different scenes) and select one channel for decoding; from the multi-channel video coding attribute information carried with the video, the decoder knows the scene number, decoding frame order, PCR, width, height and other data, decodes the chosen view channel, and finally outputs the video of that view scene.
Fig. 2 is a system flow diagram of a simulated implementation of the present invention. As shown in Fig. 2, the flow essentially verifies all the steps of the invention, demonstrates that implementation is feasible, and achieves the goal of multi-view scene coding and decoding. The simulation process of the invention on the server is as follows:
1. Prepare four channels of raw YUV data; each frame simulates the video streams shot by cameras at four angles of 0 degrees, 90 degrees, 180 degrees and 270 degrees. Prepare an encoding server (including the UDP sending service) and a receiving server (including AVS3 decoding).
2. Encode the four channels of YUV data separately and simultaneously with the AVS3 encoding engine to generate AVS3-coded elementary video data.
3. Create the extension data, building the panoramic-coded-picture extension data structure for each channel.
4. Write the panoramic-coded-picture extension data structure into the extension data area according to the four pre-simulated scenes.
5. Encapsulate the data into a TS stream.
6. Send the TS stream using the UDP protocol.
7. The receiving server receives the TS data.
8. Decode and parse the panoramic-coded-picture extension data structure.
9. List the four scenes.
10. Manually select a scene.
11. Output the selected scene.
Through this series of simulation demonstrations we find that, first, the flow from encoding to decoding conforms to the overall process of the invention; second, when parsing the extension data, the extension data can be obtained in real time and decoding in the specified order can be achieved; and third, by manually selecting a scene we can output the decoded data corresponding to that scene, and the decoded result likewise matches the key points of the invention. We reasonably believe that when the method is applied to videos with more than four channels or more viewing angles it will also meet the requirements of the invention and achieve its goal of panoramic multi-view viewing; the technical effect is therefore evident.
The above description is only for the purpose of illustrating the embodiments of the present invention, and the scope of the present invention should not be limited thereto, and any modifications, equivalents and improvements made by those skilled in the art within the technical scope of the present invention as disclosed in the present invention should be covered by the scope of the present invention.
Claims (5)
1. A multi-channel video multi-view scene coding and decoding method based on the AVS3 coding framework, comprising the following steps:
first step, multi-view AVS3 encoding: in a given scene, multiple ultra-high-definition cameras each perform multi-view AVS3 encoding;
second step, time synchronization: obtaining the coded picture of each video channel after encoding and verifying the encoding times of the multiple channels to ensure that the post-encoding time data are consistent;
third step, the inventive coded data structure: within the AVS3 coding framework, using a type of coded-picture extension data and embedding the information related to multi-channel video coding into the extension data, the extension data including the multi-scene definition, the image information of each channel, the synchronization time information, and the decoding frame-fetch order for decoding under each scene;
fourth step, extension data filling: filling the extension data into the extension data area of the picture coded in the second step;
fifth step, encapsulating audio and video into a TS stream: combining the coded picture carrying the extension data with the audio and encapsulating them into the TS (transport stream) format;
sixth step, TS stream transmission: transmitting the TS stream over a network;
seventh step, receiving and decoding the TS stream: decoding the TS stream after the user receives it;
eighth step, decoding the image extension data: decoding the extension data of the picture at the decoding side and, upon detecting that the picture contains multiple channels and multiple views, allowing the user to select the picture of a different recommended view for output;
ninth step, the user selects a scene to watch: the user makes a selection and experiences different viewing angles.
2. The multi-channel video multi-view scene coding and decoding method based on the AVS3 coding framework according to claim 1, wherein in the second step the time synchronization verifies the encoding times of the multiple video channels, specifically: each frame of image data containing a timestamp is stored in memory in a linked-list data structure, the data of each channel is read from the linked list, the timestamp, which is a long-integer time value, is read out and compared numerically, and the consistency of the post-encoding time data is confirmed.
3. The multi-channel video multi-view scene coding and decoding method based on the AVS3 coding framework according to claim 1, wherein in the third step the inventive coded data structure is: the coded-picture extension data is specific data carrying coding information; based on this data structure, an extension data structure called the panoramic coded picture is created, and the information related to multi-channel video coding is embedded into the extension data, the extension data structure including the multi-scene definition, the image information of each channel, the synchronization time information, and the decoding frame-fetch order for decoding under each scene.
5. The multi-channel video multi-view scene coding and decoding method based on the AVS3 coding framework according to claim 1, wherein in the eighth step the decoding of the image extension data is: the decoding side parses the decoded picture to obtain the extension data of the panoramic-coded-picture extension data structure table and, upon detecting that the panoramic coded picture contains multiple channels and multiple views, allows the user to select the picture of a different recommended view for output.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210329909.6A CN114760455B (en) | 2022-03-30 | 2022-03-30 | Multi-channel video multi-view scene coding and decoding method based on AVS3 coding framework |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114760455A (en) | 2022-07-15
CN114760455B CN114760455B (en) | 2023-10-13 |
Family
ID=82328770
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210329909.6A Active CN114760455B (en) | 2022-03-30 | 2022-03-30 | Multi-channel video multi-view scene coding and decoding method based on AVS3 coding framework |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114760455B (en) |
Patent Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101600099A (en) * | 2009-04-09 | 2009-12-09 | 上海交通大学 | The real-time transmission synchronous control method of multi-angle video Flow |
CN101998116A (en) * | 2009-08-31 | 2011-03-30 | 中国移动通信集团公司 | Method, system and equipment for realizing multi-view video service |
CN102055982A (en) * | 2011-01-13 | 2011-05-11 | 浙江大学 | Coding and decoding methods and devices for three-dimensional video |
CN111147868A (en) * | 2018-11-02 | 2020-05-12 | 广州灵派科技有限公司 | Free viewpoint video guide system |
CN111698520A (en) * | 2020-06-24 | 2020-09-22 | 北京奇艺世纪科技有限公司 | Multi-view video playing method, device, terminal and storage medium |
CN113014942A (en) * | 2021-03-03 | 2021-06-22 | 上海七牛信息技术有限公司 | Video transcoding method, video transcoding system and video live broadcasting system |
CN113014943A (en) * | 2021-03-03 | 2021-06-22 | 上海七牛信息技术有限公司 | Video playing method, video player and video live broadcasting system |
CN113271464A (en) * | 2021-05-11 | 2021-08-17 | 北京奇艺世纪科技有限公司 | Video encoding method, decoding method and related devices |
CN114143491A (en) * | 2021-11-17 | 2022-03-04 | 深蓝感知(杭州)物联科技有限公司 | Video fragment generation method for single-editing 5G recorder |
Also Published As
Publication number | Publication date |
---|---|
CN114760455B (en) | 2023-10-13 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |