CN113382177B - Multi-view-angle surrounding shooting method and system


Info

Publication number
CN113382177B
CN113382177B
Authority
CN
China
Prior art keywords
video
picture
frame
cameras
time
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110600717.XA
Other languages
Chinese (zh)
Other versions
CN113382177A (en)
Inventor
杨君蔚
谈新
戚荣辉
陆趣
袁跃
包宇骄
张怡
朱辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Media Tech Co ltd
Original Assignee
Shanghai Media Tech Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Media Tech Co ltd filed Critical Shanghai Media Tech Co ltd
Priority to CN202110600717.XA priority Critical patent/CN113382177B/en
Publication of CN113382177A publication Critical patent/CN113382177A/en
Application granted granted Critical
Publication of CN113382177B publication Critical patent/CN113382177B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/90Arrangement of cameras or camera modules, e.g. multiple cameras in TV studios or sports stadiums
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T13/00Animation
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/60Control of cameras or camera modules
    • H04N23/67Focus control based on electronic image sensor signals
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00Details of television systems
    • H04N5/222Studio circuitry; Studio devices; Studio equipment
    • H04N5/262Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects ; Cameras specially adapted for the electronic generation of special effects
    • H04N5/2624Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects ; Cameras specially adapted for the electronic generation of special effects for obtaining an image which is composed of whole input images, e.g. splitscreen
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00Details of television systems
    • H04N5/222Studio circuitry; Studio devices; Studio equipment
    • H04N5/262Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects ; Cameras specially adapted for the electronic generation of special effects
    • H04N5/2628Alteration of picture size, shape, position or orientation, e.g. zooming, rotation, rolling, perspective, translation

Abstract

The invention relates to the technical field of video data processing, and in particular to a multi-view surround shooting method comprising the following steps: performing initial focusing on each camera using a reference object; designating one of the cameras as the reference camera position and, taking the picture acquired by the reference position as the standard, acquiring correction data for the pictures acquired by the other cameras; before the pictures acquired by the plurality of cameras are played, importing the correction data and a pre-established animation template into rendering and playing software; after the time point of a preset standard picture is obtained, having the rendering and playing software retrieve the corresponding video pictures from the video recording server according to that time point and the animation template; and having the rendering and playing software correct the retrieved video pictures according to the animation template and the correction data. Beneficial effects: correction processing and time synchronization processing are applied to the pictures of all camera positions, ensuring that the pictures acquired by all positions are consistent in both space and time.

Description

Multi-view-angle surrounding shooting method and system
Technical Field
The invention relates to the technical field of video data processing, in particular to a multi-view surround shooting method and a multi-view surround shooting system.
Background
To provide a better experience for television viewers, some television programs call for multi-view surround shooting; for example, sports events often require pictures captured from many different viewing angles. This means arranging a plurality of cameras around a stadium to shoot a target in real time and then applying a series of processing technologies such as video frame acquisition, frame alignment, multi-camera picture correction and animation effect rendering, so that live broadcast can present a "frozen time" playback effect in which fast motion is held still at a chosen moment, and mobile devices can present a free 360-degree viewing angle.
In the prior art, the central problem when a multi-view surround shooting system shoots a surround video is whether the pictures obtained from the plurality of cameras are consistent in space and time. Spatial consistency refers to the consistency of the cameras' horizontal alignment, position offset and focal length when shooting the target; temporal consistency means that when the operator decides that a picture at a certain time point is needed, the pictures of all cameras delivered by the video recording system for that time point must match. A time synchronization method and a picture correction method are therefore needed to keep the pictures of all camera positions consistent in time and space.
Disclosure of Invention
In view of the above problems in the prior art, a multi-view surround shooting method and system are provided.
The specific technical scheme is as follows:
The invention provides a multi-view surround shooting method, which comprises shooting in real time with a plurality of cameras simultaneously and correcting the pictures acquired by the plurality of cameras, specifically comprising the following steps:
s1, respectively erecting a plurality of cameras on preset positions;
s2, placing a reference object, and carrying out primary focusing processing on each camera by using the reference object so as to enable the focuses of the cameras to be consistent;
s3, determining one of the cameras as a reference machine position, and acquiring correction data of pictures acquired by the other cameras by taking pictures acquired by the reference machine position as a standard;
s4, before pictures collected by a plurality of cameras are played, the correction data and a pre-established animation template are imported into rendering and playing software;
step S5, after the time point of a preset standard picture is obtained, the rendering and playing software calls a corresponding video picture in a video recording server according to the time point of the preset standard picture and the animation template;
and S6, the rendering and playing software corrects the called video picture according to the animation template and the correction data.
Preferably, the correction data comprise an anchor point position and/or a scaling ratio and/or a displacement and/or a rotation angle.
Preferably, in step S3, the step of acquiring the correction data includes:
step S31, importing a frame of video picture of each camera into video processing software;
step S32, selecting a tracker and setting the tracker;
step S33, setting the tracking point position of the reference machine position;
step S34, calculating the tracking point positions of all the cameras according to the tracking point positions of the reference machine positions;
and step S35, adjusting the tracking point position of each camera to generate tracking point data, and storing the tracking point data in a text file as the correction data.
Preferably, the content of the animation template includes:
selecting rules of the video pictures; and/or
The frame number of the video pictures selected by each camera; and/or
Whether the video picture is set with a push-pull effect or not; and/or
Setting the playing length of each frame of the video picture; and/or
A combination rule of the video pictures; and/or
An output resolution setting of the video picture.
Preferably, in step S6, the correction processing specifically includes:
step S61, filling the acquired video picture by using the correction data;
step S62, utilizing the correction data to rotate the video picture;
step S63, utilizing the correction data to carry out scaling processing on the video picture;
step S64, cutting the video picture;
step S65, zooming the video frame according to the animation template;
and S66, converting the format of the video picture according to the animation template.
Preferably, in step S5, the method further includes performing time synchronization processing on the video picture:
step S51, calculating a time sequence number value corresponding to each frame of the video picture according to the recording information of the video picture by using a time alignment algorithm;
step S52, performing frame alignment processing on all the video pictures;
and step S53, calculating the frame extraction sequence numbers and the frame extraction quantity of all the cameras according to the animation template and the time points of the preset standard picture.
Preferably, before performing the step S52, one frame of the video picture corresponding to a preset time point is extracted from each of the cameras, and it is checked whether the time sequence numbers corresponding to the video pictures extracted from the plurality of cameras are the same.
Preferably, the recording information of the video pictures comprises a video clip type and/or a video clip name and/or a local time of a first frame of the video clip and/or a local time of a current frame of the video clip and/or a current frame number of the video clip and/or a video clip information time alignment offset and/or a video clip information resolution width and/or a video clip information resolution height and/or a time code of the first frame of the video clip and/or a frame rate of the video clip.
Preferably, after the time synchronization process is completed, if a synchronization abnormality occurs in a certain camera, the offset of the abnormal camera is adjusted according to the animation template.
The invention also comprises a multi-view surround shooting system, which is applied to the multi-view surround shooting method and comprises the following steps:
the cameras are used for collecting the video pictures at different angles;
the time code synchronization equipment and the frame synchronization equipment are used for carrying out time synchronization processing on the video pictures acquired by the cameras;
the video recording server is used for storing the video pictures acquired by the plurality of cameras;
the rendering and playing device is used for pre-storing the rendering and playing software, and after the rendering and playing device acquires the time point of the preset standard picture, the rendering and playing software corrects the video picture called in the video recording server according to the animation template and the correction data;
and the local area network equipment is respectively connected with the video recording server and the rendering and playing equipment through a network and is used for establishing a data transmission channel for the video recording server and the rendering and playing equipment.
The technical scheme of the invention has the following advantages or beneficial effects: a reference camera position is set, correction data for each camera position are formed with the reference position as the standard, and the pictures acquired by all positions are corrected with the correction data, ensuring the spatial consistency of the pictures acquired by all positions; in addition, time synchronization processing ensures the temporal consistency of the pictures acquired by all positions, thereby improving the audience experience.
Drawings
Embodiments of the present invention will be described more fully with reference to the accompanying drawings. The drawings are, however, to be regarded as illustrative and explanatory only and are not restrictive of the scope of the invention.
FIG. 1 is a flowchart illustrating steps of a multi-view surround photographing method according to an embodiment of the present invention;
FIG. 2 is a schematic structural diagram of a multi-view surround-shooting system according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a spatial image effect of each machine position according to an embodiment of the present invention;
FIG. 4 is a flowchart illustrating a process of image rectification according to an embodiment of the present invention;
FIG. 5 is a correction data file for 5 camera positions exemplified in an embodiment of the present invention;
FIG. 6 is an animation template file for 5 camera positions exemplified in an embodiment of the present invention;
FIG. 7 is a specific flowchart of the picture correction processing in an embodiment of the present invention;
FIG. 8 is a schematic diagram of anchor point positions before a picture is filled in according to an embodiment of the present invention;
FIG. 9 is a schematic diagram of anchor point positions after the process of filling the picture in the embodiment of the present invention;
FIG. 10 is a schematic diagram of an effective area after a filling process is performed on a picture according to an embodiment of the present invention;
FIG. 11 is a flowchart of a frame synchronization process according to an embodiment of the present invention;
FIG. 12 is a code diagram illustrating the use of an animation template to adjust the frame offset according to an embodiment of the present invention;
FIG. 13 is a diagram showing the installation of the cameras in the embodiment of the present invention;
fig. 14 is a specific flowchart of the rendering and playing operation in the embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the embodiments and features of the embodiments may be combined with each other without conflict.
The invention is further described with reference to the following drawings and specific examples, which are not intended to be limiting.
The invention comprises a multi-view surround shooting method, which comprises shooting in real time with a plurality of cameras simultaneously and, as shown in fig. 1, correcting the pictures acquired by the plurality of cameras, specifically comprising the following steps:
s1, respectively erecting a plurality of cameras on preset positions;
s2, placing a reference object, and carrying out primary focusing treatment on each camera by using the reference object so as to enable the focuses of the cameras to be consistent;
s3, determining one camera as a reference machine position, and acquiring correction data of pictures acquired by the other cameras by taking pictures acquired by the reference machine position as a standard;
s4, before pictures collected by a plurality of cameras are played, correcting data and a pre-established animation template are imported into rendering and playing software;
s5, after the operator selects the time point of the preset standard picture, rendering and playing software calls a corresponding video picture in a video recording server according to the time point of the preset standard picture and the animation template;
and S6, the rendering and playing software corrects the called video picture according to the animation template and the correction data.
Specifically, in the present embodiment, after the multi-view surround shooting system is built, each camera must go through a focusing process (i.e., aligning the lens with the shot target) because of the field environment and installation tolerances. Even after alignment, the pictures shot by the camera positions may look like the situation shown in fig. 3: the pictures of the positions are not spatially consistent, so a generated surround video would show strong picture shake and could not actually be broadcast. The picture correction adopted by the system resolves the rotation, scaling and offset of each position's picture on the 2D plane; the correction data therefore preferably comprise an anchor point position, a scaling ratio, a displacement and a rotation angle, so that the positions become spatially consistent. Once the 2D correction of the picture is complete, the smoothness of the video picture's 3D rotation around the z-axis is determined by the spacing of the camera position layout.
Specifically, the multi-view surround shooting system digitally stores and encodes the video pictures taken by the cameras. As shown in fig. 4, the picture correction flow comprises two stages, a preliminary pre-processing stage and a live real-time processing stage. The pre-processing stage comprises: S1, erecting the cameras; the multi-camera erection process is critical to picture correction. Every corrected camera picture undergoes rotation, scaling and displacement operations, each of which loses some of the picture, so the resolution of the effective picture is necessarily smaller than that of the input picture. Hence, the more accurately each camera is focused on the shot object during erection and the smaller the differences between camera positions, the smaller the picture loss in the output video. The system currently uses manual erection: the cameras must be mounted level, and although their distances to the shot object may differ, the cameras' optical zoom is used to make the effective distance of each camera to the shot object roughly consistent. S2, the first step after erection is initial focusing, done manually in this system. To keep the focus of every position substantially consistent, a reference object such as a rod may be placed at the rotation center of the scene during focusing. Each position's camera shoots the reference object, keeping it at the center of the picture; the level, size, displacement and brightness of each position's picture are matched by eye as closely as possible, and the input pictures of each position are then processed in real time by the algorithm to achieve pixel-level consistency. S3, after initial focusing, the correction data of each position are acquired for the subsequent picture correction: one position is designated as the reference position, and the pictures of the other positions are compared with the reference position's picture to generate correction data consisting of anchor point position, scaling ratio, rotation angle and displacement. The system preferably uses the tracker of Adobe After Effects CC (hereinafter AE) to acquire each position's correction data. After AE extracts the correction data from each position's picture, the data are saved in a text file and used to correct every frame during subsequent real-time rendering. As shown in fig. 5, the text file holds the initial correction data of 5 camera positions: Transform AnchorPoint is the anchor point, which subsequent scaling and rotation use as the picture center; Transform Scale is the scaling ratio; Transform Position is the displacement; Transform Rotation is the rotation angle. Once the correction data file is generated, the preparation of the pre-processing stage is complete.
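For illustration, a minimal sketch (in Python) of loading such a correction data file follows. The four Transform keys are those named above; since fig. 5 is reproduced only as an image, the assumed line layout (each key followed by its numeric values, one block per camera position) and the function and class names are illustrative, not taken from the patent.

from dataclasses import dataclass

@dataclass
class Correction:
    anchor: tuple     # Transform AnchorPoint: picture center for scaling and rotation
    scale: tuple      # Transform Scale: scaling ratio
    position: tuple   # Transform Position: displacement
    rotation: float   # Transform Rotation: rotation angle in degrees

def load_corrections(path):
    corrections, fields = [], {}
    for line in open(path, encoding="utf-8"):
        parts = line.split()
        if line.startswith("Transform AnchorPoint"):
            fields["anchor"] = (float(parts[2]), float(parts[3]))
        elif line.startswith("Transform Scale"):
            fields["scale"] = (float(parts[2]), float(parts[3]))
        elif line.startswith("Transform Position"):
            fields["position"] = (float(parts[2]), float(parts[3]))
        elif line.startswith("Transform Rotation"):
            fields["rotation"] = float(parts[2])
            corrections.append(Correction(**fields))  # rotation closes one position's block
            fields = {}
    return corrections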
Further, the correction data obtained in pre-processing are used for real-time picture correction during the subsequent live rendering. The live real-time processing stage is as follows. In step S4, before the live broadcast, the rendering and playing software is opened and the correction data file is imported; the software parses the file and structures the four data items of each position: anchor point position, rotation angle, displacement and scaling ratio. An animation template is then imported. The animation template is the basis for video generation and is stored on the hard disk as an xml file; fig. 6 shows the animation template file for the 5 camera positions of this embodiment. Each video is composed of several camera positions and carries an input resolution (sourcewidth, sourceheight), an output resolution (outputwidth, outputheight) and a frame rate (fps). Each camera element describes one position: id is the position number; frames is how many frames to fetch; framemode is whether the fetched frames run forward or backward in time; offset is the frame offset from the current dotting time; d is how many frame-times each fetched picture is displayed. The displayed frame time of each position is referenced to the moment the operator issues the dotting instruction on the rendering and playing server.
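For illustration, a minimal sketch (in Python) of reading such a template follows. The attribute names are the fields listed above; the exact XML nesting of fig. 6 is reproduced only as an image, so the assumed structure (a video root element holding one camera element per position) is an assumption, as are the function name and dictionary keys.

import xml.etree.ElementTree as ET

def load_template(path):
    video = ET.parse(path).getroot()  # assumed root element: <video ...>
    template = {
        "out_w": int(video.get("outputwidth")),   # output resolution
        "out_h": int(video.get("outputheight")),
        "fps": int(video.get("fps")),             # frame rate
        "cameras": [],
    }
    for cam in video.findall("camera"):
        template["cameras"].append({
            "id": int(cam.get("id")),            # camera position number
            "frames": int(cam.get("frames")),    # how many frames to fetch
            "framemode": cam.get("framemode"),   # fetch forward or backward in time
            "offset": int(cam.get("offset")),    # offset from the current dotting time
            "d": int(cam.get("d")),              # frame-times each fetched picture is shown
        })
    return template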
In a preferred embodiment, the content of the animation template mainly includes a selection rule for the video pictures (which camera positions' pictures are selected), the number of frames selected from each camera (how many frames are taken per position), whether a push-pull effect is applied to the video pictures, the playing length of each frame of the video pictures, a combination rule for the video pictures, the output resolution setting of the video pictures, and so on. When the video is generated, every frame is rendered in real time according to the definition of the animation template, and picture correction is the first process of the live real-time processing.
In this embodiment, the preset standard picture is a highlight picture selected by the operator as meeting the playback standard. When the operator completes a dotting operation in the rendering and playing software, the software obtains the time point of the highlight picture and, according to the content of the animation template and that time point, sends a retrieval instruction to the video recording server; the server retrieves the corresponding video frames and pushes them to the rendering and playing server. On receiving the frames, the rendering and playing server performs picture correction, picture assembly and video rendering as required by the animation template. Because the video pictures of every position must be corrected, and the rendering must meet the efficiency requirement of live broadcasting (i.e., the rendered video must be output within 10 seconds), the system uses the ffmpeg filter framework as its development engine; the engine performs image format conversion, picture rotation, cropping, video assembly, transcoding and similar operations in real time. During real-time rendering the system uses five filters of the ffmpeg framework, namely pad, zoompan, scale, crop and rotate, to implement the picture correction; the specific processing flow is shown in fig. 7.
In a preferred embodiment, in step S3, the step of acquiring correction data comprises:
step S31, importing a frame of video picture of each camera into video processing software;
step S32, selecting a tracker and setting the tracker;
step S33, setting the tracking point position of the reference machine position;
step S34, calculating the tracking point positions of all cameras according to the tracking point positions of the reference machine positions;
and step S35, adjusting the tracking point position of each camera to generate tracking point data, and storing the tracking point data in a text file as correction data.
Specifically, one frame picture file of each position is imported into the AE software: in the pop-up file selection box, select the reference position's file, select PNG sequence with forced alphabetical ordering, and click import; after importing, drag the picture sequence from the project panel into the composition and note it. Next, configure the tracker: select the tracker, choose stabilize motion as the tracking type, and enable rotation and scaling. Then set the tracking points of the reference position: zoom the view, place the first tracking point at the camera's center point, i.e., the anchor point position, and choose the position of the second tracking point. The tracking point positions of all positions are then analyzed and calculated automatically. Each position's tracking points must next be adjusted manually, because the automatically calculated positions always deviate somewhat. Finally, select the X and Y dimensions to generate the tracking point data, and copy the tracking point data into a text file.
In a preferred embodiment, as shown in fig. 7, in step S6, the step of performing the correction process specifically includes:
step S61, filling the acquired video picture by using the correction data;
step S62, rotating the video picture by utilizing the correction data;
s63, carrying out zooming processing on the video picture by utilizing the correction data;
step S64, cutting the video picture;
step S65, zooming the video frame according to the animation template;
and S66, converting the format of the video picture according to the animation template.
Specifically, in step S61 the filling operation is performed with the pad filter. The rotation angle and scaling ratio taken from the correction data file are defined with the anchor point as the center point, but ffmpeg's rotate and scale filters operate relative to the center point of the image. As shown in fig. 8, if the anchor point A is not at the image center point O, rotating and scaling the image with the ffmpeg filters would be wrong. Therefore, before any other processing is performed on the video picture, the picture is filled so that the anchor point A moves to the image center point; only then are the subsequent rotation and scaling applied. The filling produces the form shown in fig. 9, where the hatched portion is the filled area; after the image is filled, the anchor point A becomes the image center point.
Note that the larger the shaded (filled) area is, the smaller the effective output picture becomes. Therefore, the error between the cameras should be minimized when erecting them and performing the initial focusing. The pseudo code for calculating the filled region is as follows:
// Difference between the image center point O and the anchor point A on the x-axis:
subx = SourceWidth / 2 - anchorPos.x;
if (subx < 0) {
    // center x < anchor x: the anchor point A lies to the right of the center point O
    outputWidth = SourceWidth + abs(subx) * 2;  // original width plus twice the x-distance between A and O
    output.x = 0;                               // source image placed at x = 0; the extra space on the right is filled with black
} else {
    // center x >= anchor x: the anchor point A lies to the left of the center point O
    outputWidth = SourceWidth + abs(subx) * 2;
    output.x = 2 * abs(subx);                   // source image starts at twice the x-distance; the space on the left is filled with black
}
// Difference between the image center point O and the anchor point A on the y-axis:
suby = SourceHeight / 2 - anchorPos.y;
if (suby < 0) {
    // center y < anchor y: the anchor point A lies below the center point O
    outputHeight = SourceHeight + abs(suby) * 2;  // original height plus twice the y-distance between A and O
    output.y = 0;                                 // source image placed at y = 0; the extra space at the bottom is filled with black
} else {
    // center y >= anchor y: the anchor point A lies above the center point O
    outputHeight = SourceHeight + abs(suby) * 2;
    output.y = 2 * abs(suby);                     // source image starts at twice the y-distance; the space at the top is filled with black
}
and obtaining several parameter values of output width, output height, output.x and output.y through the calculation processing to fill the picture. The picture is filled with these several parameter values through a pad filter pad = output width: output height: output.x: output.y. Wherein the filled canvas is generated by using an outputWidth parameter which is an outputHeight parameter; output.x. Output.y parameter to specify the location of the source image on the new canvas.
After the picture is filled, the anchor point A lies at the image center point, and in step S62 the picture rotation is performed with the ffmpeg rotate filter using rotate=rotateAngle*PI/180, where rotateAngle is obtained from the correction data file, i.e., the Transform Rotation value of the camera position.
After the image rotation, the image scaling is performed in step S63 using scale=output.x:output.y, where output.x and output.y are the scaling of the picture on the x-axis and the y-axis respectively.
After the original picture has been filled, rotated and scaled, the correction of each camera position is essentially complete, and the corrected pictures are unified with the anchor point A as reference. However, because each position's picture was filled, rotated and scaled differently, the picture resolutions now differ, and the filling leaves black areas on the picture; the available effective area is only region B in fig. 10, so the picture must be cut in step S64.
The picture cutting process is the last step of picture correction: all processed position pictures are cut with the anchor point as the center. The cutting area is the maximum area available to every position after filling and rotation; with n positions, it is the smallest of the n positions' available areas.
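The original pseudo code for this cutting step is embedded in the patent only as images, so it is not reproduced here; what follows is a minimal sketch (in Python) of the computation described above, under the assumption that each position's usable area after filling and rotation is already known. All names are illustrative.

def common_crop(areas):
    # areas: per-position (width, height) of the region still usable after
    # filling and rotation; the shared cutting area is the smallest of them.
    crop_w = min(w for w, _ in areas)
    crop_h = min(h for _, h in areas)
    return crop_w, crop_h

def crop_filter(canvas_w, canvas_h, crop_w, crop_h):
    # After the pad step the anchor point sits at the canvas center, so the
    # cut is centered; ffmpeg's crop=w:h:x:y takes the top-left corner.
    x = (canvas_w - crop_w) // 2
    y = (canvas_h - crop_h) // 2
    return "crop=%d:%d:%d:%d" % (crop_w, crop_h, x, y)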
Specifically, after the picture correction is completed, a zoom operation is applied to each frame according to the animation template; the default zoom is 1.0 times. In the ffmpeg framework the zoom is performed with the zoompan filter, using the following expression:
zoompan=z='if(lte(zoom,1.0),pos.z,max(1.001,zoom-(pos.z-1.0)/pos.d))':x='x+iw/2-(iw/zoom/2)':y='y+ih/2-(ih/zoom/2)':s=OutputWidthxOutputHeight:d=pos.d
The picture zoom has a transition effect: the expression zooms from 1.0 to the magnification specified by the animation template, and the zoom speed transitions smoothly over the time given by the d parameter in the animation template.
Finally, because the original camera pictures are captured as uyvy422, the final output is converted according to the picture format required by the animation template, using format=uyvy422 in the ffmpeg filter. Once format conversion is complete, all real-time correction and rendering steps applied to each camera's video picture with the ffmpeg filter framework are finished, and the result is output as SDI baseband to the back-end playout system at the fps frame rate.
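Putting the steps together, a minimal sketch (in Python) of assembling the per-position filter chain in the order of fig. 7 (pad, rotate, scale, crop, zoompan, format) follows. The filter names are those used in this embodiment; the dictionary keys, the zoom expression placeholder and the function name are illustrative assumptions.

def build_filter_chain(corr, tmpl):
    steps = [
        # S61: fill so that the anchor point becomes the image center
        "pad={w}:{h}:{x}:{y}".format(**corr["pad"]),
        # S62: rotate about the (now centered) anchor point
        "rotate={a}*PI/180".format(a=corr["rotation"]),
        # S63: scale on the x-axis and y-axis
        "scale={sx}:{sy}".format(sx=corr["scale"][0], sy=corr["scale"][1]),
        # S64: cut down to the common effective area of all positions
        "crop={w}:{h}:{x}:{y}".format(**corr["crop"]),
        # S65: push-pull (zoom) transition defined by the animation template
        "zoompan=z='{z}':s={ow}x{oh}:d={d}".format(
            z=tmpl["zoom_expr"], ow=tmpl["out_w"], oh=tmpl["out_h"], d=tmpl["d"]),
        # S66: convert to the output pixel format (capture format is uyvy422)
        "format=uyvy422",
    ]
    return ",".join(steps)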
In a preferred embodiment, in step S5, the method further includes performing time synchronization processing on the video picture:
step S51, calculating a time sequence number value corresponding to each frame of video picture according to the recording information of the video pictures by using a time alignment algorithm;
step S52, performing frame alignment processing on all video pictures;
and step S53, calculating the frame extraction sequence numbers and the frame extraction quantity of all the cameras according to the animation template and the time points of the preset standard picture.
Specifically, unlike a still camera, a video camera captures continuous pictures. Across the continuous pictures of multiple cameras, every time point (for example, one frame per 20 ms) must be aligned, otherwise the shot object appears to shake. Achieving time synchronization requires solving three problems: (1) synchronizing the video frames acquired by each position; (2) keeping the frame-extraction time positioning of each position consistent; and (3) handling missing frames and surplus frames in each position's captured video. Problem (1) is solved by hardware: all cameras are fed a common synchronization signal to keep the acquired video frames synchronized. Problem (2) is handled with hardware and software together: all cameras are fed a common timecode signal so that every captured frame carries a timecode under a unified standard, and, combined with the timecode recorded for the first frame picture, the frame number of any specified timecode can be located. For problem (3), missing or surplus frames caused by camera, video cable or capture card problems are handled mainly by frame filling and frame dropping according to the timecode frame rate and the timecodes of the captured video.
Specifically, the time synchronization process is as follows. During field setup, a reference object is shot (a display playing a numbered video at a frame rate of 50) to confirm that the cameras, cabling, frame synchronization, timecode synchronization and software configuration of every position are in order. The 50p video played on the display is the time alignment reference: a 60-second 1080p50 video is played in a loop, every frame of which shows a sequence number value, guaranteeing that each displayed frame (20 ms) corresponds to a different time sequence number value. A check is then run before the live broadcast: a template extracts a frame at a fixed time from every position, the sequence of video frames is examined, and if the time sequence numbers in all frames are the same, the cameras, cabling, frame synchronization, timecode synchronization and software system are confirmed to be normal. Camera frame synchronization then ensures that the video frames of all positions are acquired at the same time points, and camera timecode synchronization ensures that every video frame acquired by the positions is stamped with the corresponding timecode value, which serves as the time synchronization reference for retrieval and alignment. It should be noted that for each video segment in the video recording server, the first frame of each position carries recording information from which the time alignment algorithm calculates the time sequence number value of a frame picture; the recording information preferably includes the video clip type and/or the video clip name and/or the local time of the clip's first frame and/or the local time of the clip's current frame and/or the clip's current frame number and/or the clip's time alignment offset and/or the clip's resolution width and/or the clip's resolution height and/or the timecode of the clip's first frame and/or the clip's frame rate.
Further, video frame alignment is performed on the video pictures. The first alignment mode aligns and linearizes the video frames by timecode. First the timecode frame rate and the video frame rate are unified: the timecode is typically 25 fps while the video is 50 fps, so two video frames share one timecode value, the first keeping the doubled timecode count and the second the doubled count plus 1. Then missing and surplus frames are handled: a missing frame is filled with the preceding frame, and surplus frames are discarded directly. This time alignment algorithm is based on timecode synchronizer hardware; it is simple and reliable, though the hardware is costly, and it need not consider time differences between the video recording servers. There is also a second alignment mode, which takes the server time of the control end as the reference, computes the time difference and frame rate difference between the video recording servers, and linearizes, fills and drops each position's video frames accordingly; this can also achieve frame alignment.
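For illustration, a minimal sketch (in Python) of the first alignment mode follows: the 25 fps timecode is doubled to the 50 fps video rate, a missing frame is filled with the preceding frame, and surplus frames are discarded. The data layout and names are illustrative assumptions.

def align_frames(captured, start_tc, end_tc):
    # captured: dict mapping each 25 fps timecode count to the list of video
    # frames that carried it (normally two frames at 50 fps); start_tc is
    # assumed to be present so there is a frame to repeat from.
    aligned, prev = [], None
    for tc in range(start_tc, end_tc + 1):
        frames = captured.get(tc, [])[:2]   # surplus frames beyond two are discarded
        while len(frames) < 2:              # missing frame: repeat the previous one
            frames.append(frames[-1] if frames else prev)
        aligned.extend(frames)              # slots 2*tc and 2*tc+1 of the 50 fps sequence
        prev = frames[-1]
    return aligned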
After the video frames are aligned, video frame retrieval follows: according to the content of the animation template, the frame extraction sequence numbers and frame extraction counts of all positions are computed from the timecode of the operator's dotting position, where frame extraction sequence number = (signed int)((frame extraction timecode - current position's recording start timecode) + 0.5). The video can then be decoded by frame number, and identical frame numbers guarantee identical timecode values and consistent frame times.
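For illustration, a minimal sketch (in Python) of this retrieval step follows, with the per-position offset parameter of the animation template (fig. 12) applied; timecodes are assumed to be already converted to frame counts at the video frame rate, and the names are illustrative.

def extract_indices(dot_tc, record_start_tc, frames, framemode, offset):
    # Frame extraction sequence number per the formula above, rounded to the
    # nearest integer, then shifted by the animation template's offset.
    base = int((dot_tc - record_start_tc) + 0.5) + offset
    step = 1 if framemode == "forward" else -1   # fetch forward or backward in time
    return [base + i * step for i in range(frames)]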
In a preferred embodiment, after the time synchronization process is completed, if a synchronization abnormality occurs in a certain camera, the offset of the abnormal camera is adjusted according to the animation template.
Specifically, the preceding steps complete the time alignment of the frame pictures. If a certain camera nonetheless shows a synchronization abnormality with a stable frame difference during operation, that position's offset can still be adjusted through the offset parameter in the animation template so that it aligns with the other positions. As shown in fig. 12, an offset of 0 represents the time point of the operator's selected highlight; a positive offset shifts several frames later in time, and a negative offset shifts several frames earlier.
The present invention further includes a multi-view surround photographing system, as shown in fig. 2, applied to the multi-view surround photographing method in any of the above embodiments, including:
the system comprises a plurality of cameras 1, a camera control unit and a display unit, wherein the cameras 1 are used for collecting video pictures at a plurality of different angles;
the time code synchronization equipment 2 and the frame synchronization equipment 3 are used for carrying out time synchronization processing on video pictures acquired by the plurality of cameras 1;
the video recording server 4 is used for storing video pictures collected by the plurality of cameras 1;
the rendering and playing device 5 is used for storing rendering and playing software in advance, and after the rendering and playing device obtains a time point of a preset standard picture (namely a wonderful picture which is selected by an operator and meets a playback standard), the rendering and playing software corrects the video picture called in the video recording server according to the animation template and the correction data;
and the local area network equipment 6 is respectively connected with the video recording server and the rendering and playing equipment through a network and is used for establishing a data transmission channel for the video recording server and the rendering and playing equipment.
Specifically, the plurality of cameras 1 in this embodiment are preferably 4K cameras for multi-position picture acquisition. The 4K camera chosen for the system is a broadcast-grade camera supporting 3840x2160 12G SDI baseband signals and capturing video in 50p mode; the system is compatible with inputs from high definition up to 8K video signals. Because real installations face problems such as uneven ground, the actual captured pictures of the cameras differ in level, scaling, offset and so on after all positions are mounted. Smooth, shake-free output can be guaranteed only by capturing the common effective area of the positions and correcting and cutting it while rendering the animation video; thus, the higher the input resolution of the chosen camera, the higher the resolution of the processed output picture. The cameras are installed in a basketball court as shown in fig. 13: several 4K cameras can surround the court for a full or half circle, with the exact number depending on the picture presentation effect required by the actual scene.
Specifically, the timecode synchronization device 2 and the frame synchronization device 3 mainly perform time synchronization on the video pictures captured by the cameras of all positions, ensuring after recording that the frames captured by the several positions at the same time point (one frame per 20 ms) are consistent. If even one frame of adjacent cameras is misaligned, the assembled continuous animation shows shaking of the shot object and the presentation effect suffers. The timecode synchronization device 2 further comprises a timecode generation server 21 and a timecode synchronizer 22.
Specifically, the video recording server 4 records the video pictures of the front-end cameras 1 in real time. When the operator decides to capture a picture at a certain time point in the field, a capture instruction requests one frame or several consecutive frames from the video recording server 4. Because the shooting front end is a video camera, continuous pictures can be obtained, for example one second of continuous motion from one position before switching to other positions' pictures, producing more varied video effects. The recorded video pictures are stored in the recording server's memory and hard disk; to guarantee timeliness, the frames needed for rendering within 1 second are served from memory. Since memory is not unlimited, video beyond a certain age (depending on the recording server's memory size) is encoded with H.264 or H.265 and stored on the hard disk, where it can also be used for post-production after the live broadcast.
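For illustration, a minimal sketch (in Python) of the two-tier storage described above follows: recent frames stay in a bounded in-memory buffer for sub-second retrieval, and frames that age out are handed to an H.264/H.265 encoder for the hard disk. The class and its methods are illustrative assumptions, not the server's actual interface.

from collections import OrderedDict

class RecordBuffer:
    def __init__(self, capacity, encode_to_disk):
        self.capacity = capacity        # bounded by the recording server's memory size
        self.frames = OrderedDict()     # timecode -> raw frame, in capture order
        self.encode_to_disk = encode_to_disk

    def push(self, tc, frame):
        self.frames[tc] = frame
        if len(self.frames) > self.capacity:
            old_tc, old_frame = self.frames.popitem(last=False)
            self.encode_to_disk(old_tc, old_frame)  # H.264/H.265 to hard disk for post-production

    def fetch(self, tc, n=1):
        # A dotting request pulls one frame or n consecutive frames from memory.
        keys = list(self.frames)
        i = keys.index(tc)
        return [self.frames[k] for k in keys[i:i + n]]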
Specifically, fig. 14 shows the specific workflow of the rendering and playing device. The rendering and playing device 5 provides the operation interface for the operator: the operator dots the live scene picture on the interface, determines the time point for generating the surround video, and then selects an animation template to generate and play the surround video. The finally generated surround video is output to the back-end relay system through SDI (Serial Digital Interface).
The embodiment of the invention has the following advantages or beneficial effects: a reference camera position is set, correction data for each camera position are formed with the reference position as the standard, and the pictures captured by all positions are corrected with the correction data, ensuring the spatial consistency of the pictures captured by all positions; in addition, time synchronization processing ensures the temporal consistency of the pictures captured by all positions.
While the invention has been described with reference to a preferred embodiment, it will be understood by those skilled in the art that various changes in form and detail may be made therein without departing from the spirit and scope of the invention.

Claims (9)

1. A multi-view surround shooting method comprises the steps of simultaneously utilizing a plurality of cameras to carry out real-time shooting, and is characterized by further comprising the step of correcting pictures acquired by the plurality of cameras, and the method specifically comprises the following steps:
s1, respectively erecting a plurality of cameras on preset positions;
s2, placing a reference object, and carrying out primary focusing processing on each camera by using the reference object so as to enable the focuses of the cameras to be consistent;
s3, determining one of the cameras as a reference machine position, and acquiring correction data of pictures acquired by the other cameras by taking pictures acquired by the reference machine position as a standard;
s4, before pictures collected by a plurality of cameras are played, the correction data and a pre-established animation template are imported into rendering and playing software;
step S5, after the time point of a preset standard picture is obtained, the rendering and playing software calls a corresponding video picture in a video recording server according to the time point of the preset standard picture and the animation template;
s6, the rendering and playing software corrects the called video picture according to the animation template and the correction data;
in step S5, the method further includes performing time synchronization processing on the video picture:
step S51, calculating a time sequence number value corresponding to each frame of the video picture according to the recording information of the video picture by using a time alignment algorithm;
step S52, carrying out frame alignment processing on all the video pictures;
and S53, calculating the frame extraction serial numbers and the frame extraction quantity of all the cameras according to the animation template and the time points of the preset standard pictures.
2. The multi-view surround photographing method according to claim 1, wherein the correction data includes an anchor point position and/or a scale and/or a displacement and/or a rotation angle.
3. The multi-view surround photographing method according to claim 1, wherein in the step S3, the step of acquiring the correction data includes:
step S31, importing a frame of video picture of each camera into video processing software;
step S32, selecting a tracker and setting the tracker;
step S33, setting the tracking point position of the reference machine position;
step S34, calculating the tracking point positions of all the cameras according to the tracking point positions of the reference machine positions;
and step S35, adjusting the tracking point position of each camera to generate tracking point data, and storing the tracking point data in a text file as the correction data.
4. The multi-view surround photographing method according to claim 1, wherein the contents of the animation template include:
selecting rules of the video pictures; and/or
The frame number of the video picture selected by each camera; and/or
Whether the video picture is set with a push-pull effect or not; and/or
Setting the playing length of each frame of the video picture; and/or
A combination rule of the video pictures; and/or
An output resolution setting of the video picture.
5. The multi-view surround photographing method according to claim 1, wherein in the step S6, the correction processing specifically includes:
step S61, filling the acquired video picture by using the correction data;
step S62, utilizing the correction data to rotate the video picture;
step S63, utilizing the correction data to carry out zooming processing on the video picture;
step S64, cutting the video picture;
step S65, zooming the video frame according to the animation template;
and S66, converting the format of the video picture according to the animation template.
6. The method of claim 1, wherein before the step S52, a frame of the video frames corresponding to a predetermined time point is extracted from each of the cameras, and whether the time sequence numbers corresponding to the extracted video frames of the plurality of cameras are the same or not is checked.
7. The multiview surround shooting method of claim 1, wherein the recording information of the video picture comprises a video clip type and/or a video clip name and/or a local time of a first frame of a video clip and/or a local time of a current frame of a video clip and/or a current frame number of a video clip and/or a video clip information time alignment offset and/or a video clip information resolution width and/or a video clip information resolution height and/or a time code of a first frame of a video clip and/or a frame rate of a video clip.
8. The method according to claim 6, wherein after the time synchronization process is completed, if a synchronization abnormality occurs in a certain camera, an offset of the abnormal camera is adjusted according to the animation template.
9. A multi-view surround photographing system applied to the multi-view surround photographing method according to any one of claims 1 to 8, comprising:
the cameras are used for acquiring the video pictures at different angles;
the time code synchronization equipment and the frame synchronization equipment are used for carrying out time synchronization processing on the video pictures acquired by the cameras;
the video recording server is used for storing the video pictures acquired by the plurality of cameras;
the rendering and playing device is used for pre-storing the rendering and playing software, and after the rendering and playing device acquires the time point of the preset standard picture, the rendering and playing software corrects the video picture called from the video recording server according to the animation template and the correction data;
and the local area network equipment is respectively connected with the video recording server and the rendering and playing equipment through a network and is used for establishing a data transmission channel for the video recording server and the rendering and playing equipment.
CN202110600717.XA 2021-05-31 2021-05-31 Multi-view-angle surrounding shooting method and system Active CN113382177B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110600717.XA CN113382177B (en) 2021-05-31 2021-05-31 Multi-view-angle surrounding shooting method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110600717.XA CN113382177B (en) 2021-05-31 2021-05-31 Multi-view-angle surrounding shooting method and system

Publications (2)

Publication Number Publication Date
CN113382177A (en) 2021-09-10
CN113382177B (en) 2023-03-28

Family

ID=77575151

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110600717.XA Active CN113382177B (en) 2021-05-31 2021-05-31 Multi-view-angle surrounding shooting method and system

Country Status (1)

Country Link
CN (1) CN113382177B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114095657B (en) * 2021-11-22 2024-02-27 成都天翼空间科技有限公司 Automatic calibration method and system based on newly added camera
CN114500849B (en) * 2022-02-21 2023-11-24 上海东方传媒技术有限公司 Multi-view surrounding shooting correction method and system
CN114785999B (en) * 2022-04-12 2023-12-15 先壤影视制作(上海)有限公司 Real-time virtual shooting synchronous control method and system
CN114827749A (en) * 2022-04-21 2022-07-29 应急管理部天津消防研究所 Method for seamless switching and playing of multi-view panoramic video
CN115426460B (en) * 2022-09-01 2023-11-10 上海东方传媒技术有限公司 Multi-view surrounding shooting bifocal shooting method

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107801018A (en) * 2017-10-31 2018-03-13 深圳中广核工程设计有限公司 A kind of 3D animation three-dimensional video-frequency manufacturing systems and method played based on ring curtain

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2004088247A (en) * 2002-08-23 2004-03-18 Sony Corp Image processing apparatus, camera calibration processing apparatus and method, and computer program
JP4864835B2 (en) * 2007-08-21 2012-02-01 Kddi株式会社 Color correction apparatus, method and program
CN103051830B (en) * 2012-12-31 2015-12-23 北京中科大洋科技发展股份有限公司 A kind of system and method to clapped target multi-angle live event
CN104537663B (en) * 2014-12-26 2018-01-02 广东中科遥感技术有限公司 A kind of method for quickly correcting of flating
CN106612397A (en) * 2016-11-25 2017-05-03 努比亚技术有限公司 Image processing method and terminal
CN107968900A (en) * 2017-12-08 2018-04-27 上海东方传媒技术有限公司 A kind of 360 degree around shooting and producing system
CN111105466B (en) * 2019-12-04 2024-02-02 南京安科医疗科技有限公司 Calibration method of camera in CT system


Also Published As

Publication number Publication date
CN113382177A (en) 2021-09-10


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant