CN106127680B - 720-degree panoramic video fast browsing method

720-degree panoramic video fast browsing method

Info

Publication number
CN106127680B
CN106127680B
Authority
CN
China
Prior art keywords
video
image
shot
point
frame
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201610496238.7A
Other languages
Chinese (zh)
Other versions
CN106127680A (en)
Inventor
罗文峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Youxiang Computing Technology Co Ltd
Original Assignee
Shenzhen Youxiang Computing Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Youxiang Computing Technology Co Ltd
Priority to CN201610496238.7A
Publication of CN106127680A
Application granted
Publication of CN106127680B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 Geometric image transformation in the plane of the image
    • G06T3/60 Rotation of a whole image or part thereof
    • G06T3/608 Skewing or deskewing, e.g. by two-pass or three-pass rotation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70 Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/74 Browsing; Visualisation therefor
    • G06F16/743 Browsing; Visualisation therefor a collection of video files or sequences
    • G06T3/04
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 Geometric image transformation in the plane of the image
    • G06T3/40 Scaling the whole image or part thereof
    • G06T3/4038 Scaling the whole image or part thereof for image mosaicing, i.e. plane images composed of plane sub-images
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/46 Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/49 Segmenting video sequences, i.e. computational techniques such as parsing or cutting the sequence, low-level clustering or determining units such as shots or scenes

Abstract

The invention discloses a 720-degree panoramic video fast browsing method. The method first reconstructs the 720-degree panoramic video images by back projection to obtain the corresponding view in each sight-line direction of a spherical viewpoint space, then judges shot length by calculating the absolute luminance frame difference of adjacent image frames in the video sequence, and finally extracts key frames to achieve fast browsing of the panoramic video. The method can quickly generate perspective views of a virtual scene in different sight directions, effectively simulates camera rotation and zoom within the views in all directions, improves the browsing speed of the virtual scene, and is well suited to the specific application fields of virtual reality systems.

Description

720-degree panoramic video fast browsing method
Technical Field
The invention belongs to the technical field of image processing, relates to panoramic video image processing, and particularly relates to a 720-degree panoramic video fast browsing method.
Background
With the development of information technology, the demand for scene information over a wide viewing angle keeps growing, yet traditional photography can only capture image frames within a limited viewing angle. Image stitching technology emerged and developed rapidly to solve this problem: it stitches two or more pictures with overlapping information into a complete, ultra-wide-angle image, reducing image redundancy while capturing wider viewing-angle information. Panoramic image generation is a typical application of image stitching.
The 720-degree panoramic video is a video image sequence based on a spherical model that enables panoramic browsing in any viewing direction over 360 degrees horizontally and 360 degrees vertically. During browsing, the spherical video image must be back-projected according to the current sight direction and field of view to obtain a planar perspective image that matches human visual habits. In this way the rotational and zoom motions of a camera can be simulated simultaneously while the field of view changes.
Retrieving and playing back massive video data consumes a great deal of time and energy. The traditional drag-to-browse method easily misses brief, sudden abnormal events, and searching through long stretches of video data hinders the extraction of effective information. The panoramic video therefore needs further processing to support fast browsing; the core work is segmenting the original video and extracting its key sequences.
Currently, video segmentation and key frame extraction methods fall into four main categories:
First, simple generation algorithms, which extract key frames by sampling the video sequence uniformly at equal time intervals; because the amount of video information varies over short spans, this approach tends to extract too many key frames or produce an unrepresentative set.
Second, generation methods based on visual information, which apply various video-processing techniques (scene clustering, shot detection, key-frame extraction, and so on) to visual cues such as color, shape, and texture to produce a condensed video; these visual-feature methods clearly improve on simple generation algorithms but ignore audio, subtitles, and other information in the original video.
Third, multi-feature fusion methods, which, for example, use face recognition to detect the appearance of important figures in news and audio processing to detect highlights in sports video, fusing multiple video features with other image-processing techniques; the resulting processing pipeline is relatively complex.
Fourth, generation methods based on video syntax and semantics, which search for structural rules between shots and between scenes and build the video summary on that basis.
In summary, fast video browsing methods differ with video type and purpose. Current panoramic video technology is widely applied to online virtual displays such as tourist attractions, home furnishings, automobile showrooms, leisure clubs, and urban construction planning; these video scenes mainly provide an immersive experience and present the full scene for better publicity.
Disclosure of Invention
The invention provides a 720-degree panoramic video fast browsing method that realizes omnidirectional viewing, 360 degrees horizontally and 360 degrees vertically, through back projection, extracts key frames according to the differing lengths of the shots in different scenes of the video, and forms a video summary for fast browsing.
A 720-degree panoramic video fast browsing method comprises the following steps:
S1, first reconstruct the 720-degree panoramic video image by back projection to obtain the view sequence corresponding to each sight-line direction of the spherical viewpoint space.
S2, judge shot length by calculating the absolute luminance frame difference of adjacent image frames in the video sequence, then extract key frames to realize fast browsing of the panoramic video.
S1 includes the following steps:
S1.1, complete the stitching of the 720-degree panoramic image based on the spherical viewpoint space model and establish two coordinate systems centered on the sphere center: a world coordinate system XYZ and a camera coordinate system xyz. The camera coordinate system xyz is obtained by rotating the world coordinate system XYZ by an angle α around its X axis and then by an angle β around its Y axis.
The stitching of the 720-degree panoramic image based on the spherical viewpoint space model in S1.1 is done as follows. Using the property that straight lines parallel to the y axis of the camera coordinate system xyz remain vertical lines, perpendicular to the image's horizontal axis, in images generated by the spherical parameter transformation formula, the live-action images shot by the fisheye lens are corrected by rotation transformations to obtain the direction information of the pixels of each live-action image in the viewpoint space. The images are then stitched using this direction information to eliminate the repeated information that may exist between the live-action images, and finally projected onto the sphere and stored as a spherical panoramic image.
S1.2, unify the basic measurement units of the pixels under the two coordinate systems of S1.1 and calculate the focal length measured in pixels, i.e., estimate the pixel focal length f from the viewpoint to the view plane for each pixel in the camera coordinate system.
In S1.2, let image S be the stitched spherical panoramic image, Q any pixel on the spherical panoramic image S with image coordinates (x', y'), J the view to be generated, and P the point on view J corresponding to point Q on the spherical panoramic image, with image coordinates (x, y); f denotes the pixel focal length and is estimated according to the lens used to shoot the live-action images.
The pixel focal length f of a wide-angle or standard lens is estimated as follows: if the camera shoots n live-action images over one full horizontal rotation, its horizontal viewing angle is 360/n degrees. With W the width of a live-action image, the trigonometric relation gives the pixel focal length estimate for an ordinary lens:
f = W/(2tan(180/n)).
The pixel focal length f of a fisheye lens is estimated as follows: let W be the image width after the black border of the fisheye image is removed; then f = W/φ, where φ is the horizontal field of view of the fisheye lens.
S1.3, use the pixel focal length f to establish the conversion relation between two-dimensional image point coordinates and the corresponding three-dimensional parameter coordinate points on the sphere. Since the camera coordinate system xyz is obtained from the world coordinate system XYZ by rotating by α around the X axis and then by β around the Y axis, the representation of a pixel on each coordinate component changes with the rotation of the axes (after the rotation, the corresponding position of the pixel must be re-expressed in the new coordinate system; the coordinate components are the components on the three axes x, y, and z). These changes can be expressed on each coordinate component through trigonometric relations, yielding the transformation matrix H between corresponding points of the two coordinate systems.
S1.4, build the inverse transformation function from the transformation matrix H, find the correspondence from any point on the panoramic image to the points of each view in the spherical space, and compute the coordinates of each point to obtain the corresponding view in every sight-line direction of the viewpoint space.
In S1.3, the pixel focal length f is used to establish the conversion relation between the coordinates of a two-dimensional image point and the corresponding three-dimensional parameter coordinate point on the sphere; this spherical parameter transformation, which maps the panorama coordinates Q(x', y') to the spherical point Q'(u, v, w), is recorded as formula (1).
The transformation matrix H of corresponding points under the two coordinate systems is the composition of the two rotations described above,
H = R_Y(β) · R_X(α),  (2)
where
R_X(α) = [1, 0, 0; 0, cosα, -sinα; 0, sinα, cosα],
R_Y(β) = [cosβ, 0, sinβ; 0, 1, 0; -sinβ, 0, cosβ].
In S1.4, expressions (1) and (2) of S1.3 show that a point Q'(u, v, w) in the world coordinate system XYZ corresponds to the coordinate (u', v', w') = H(u, v, w) in the camera coordinate system xyz.
Knowing that the live-action images shot in the video have width W and height H, establish the functional relation between any point Q on the spherical panoramic image and its corresponding point P on view J, and compute the coordinates of each corresponding point with formula (3) to obtain the view corresponding to each sight-line direction of the viewpoint space.
The method of S2 of the invention comprises the following steps:
S2.1, structure the panoramic video sequence: the panoramic video corresponds to a video sequence formed by a group of views in each sight-line direction; classify the video sequence obtained in step S1 according to the view frame sequences projected onto the viewing angles in different directions to obtain video sequence groups that can be browsed independently at multiple viewing angles;
S2.2, segment the video sequence group in each viewing-angle direction: calculate the absolute luminance frame difference of adjacent image frames in the video sequence, determine the shot-change nodes, and split the video sequence into several shot segments;
S2.3, calculate the motion sum of each shot segment, set a motion-amount measurement threshold, and judge from the shot's duration whether the current shot is a long shot or a short shot;
S2.4, extract key frames from long and short shots separately: extract one key frame at random from a short shot, and extract multiple frames at equal intervals as the key frames of a long shot;
S2.5, recombine the extracted key frame sequences and restore them to the different viewing-angle directions to generate a video summary; by operating on the summary, an observer browses the video quickly.
In S2.2, the absolute luminance frame difference AIFD is selected as the feature quantity measuring the degree of change of the video content; it is defined as
AIFD(t) = (1/(W×H)) · Σ_{x=1..W} Σ_{y=1..H} |f(x, y, t+1) - f(x, y, t)|,
where f(x, y, t) and f(x, y, t+1) denote the luminance of the pixel at coordinate (x, y) in the image frame at time t and in the next frame, and W and H are the width and height of the video frame. If N image frames are played for the complete video in a given viewing-angle direction, the mean luminance frame difference of the video is
AIFD_mean = (1/(N-1)) · Σ_{t=1..N-1} AIFD(t).
The mean luminance frame difference serves as the judgment reference. Two coefficients a and b are set (values that are too small cause false detections, values that are too large cause misses; in the experiments a = 1.2 and b = 2.3, empirical values) and used to weight the mean, giving the low and high thresholds
thresh_low = a · AIFD_mean, thresh_high = b · AIFD_mean,
which serve as the conditions for judging whether, and in which manner, a shot change occurs.
In S2.2, the video sequence group is segmented as follows:
First initialize the input video frame data, compute the AIFD feature value of the two adjacent frames at time t, and compare the feature value of the current frame with the judgment thresholds to detect whether a shot change occurs between the current frame and the next.
In S2.3, the shot is judged to be a long shot or a short shot by computing its motion sum and comparing it with the preset motion-amount measurement threshold, where M_f(t) denotes the relative motion between the two adjacent video frames at time t and S_time denotes the duration of the shot; when the motion sum of the shot exceeds the motion-amount measurement threshold, the shot is judged to be a long shot, otherwise a short shot.
The 720-degree panoramic video fast browsing method provided by the invention can quickly generate perspective views of a virtual scene in different sight directions, effectively simulates camera rotation and zoom within the views in all directions, improves the browsing speed of the virtual scene, and is well suited to the specific application fields of virtual reality systems.
Drawings
FIG. 1 is a schematic diagram of the coordinate systems for back projection of the panoramic image
FIG. 2 is a block diagram of panoramic video segmentation and reconstruction in different view directions
FIG. 3 is a block diagram of key frame extraction
FIG. 4 is a schematic diagram of the trigonometric relation between W, f, and θ
Detailed Description
The present invention is described in further detail below with reference to the accompanying drawings.
To browse a 720-degree panoramic video effectively in all directions, the first step of the invention reconstructs the panoramic video by back projection to obtain the view sequence corresponding to each sight-line direction of the spherical viewpoint space, simulating camera rotation and zoom so the video can be browsed at different viewing angles. The specific steps are as follows:
S1.1, complete the stitching of the 720-degree panoramic image based on the spherical viewpoint space model and establish two coordinate systems centered on the sphere center: a world coordinate system XYZ and a camera coordinate system xyz.
The stitching of the 720-degree panoramic image based on the spherical viewpoint space model is done as follows. Using the property that straight lines parallel to the y axis of the camera coordinate system xyz remain vertical lines, perpendicular to the image's horizontal axis, in images generated by the spherical parameter transformation formula, the live-action images shot by the fisheye lens are corrected by rotation transformations to obtain the direction information of the pixels of each live-action image in the viewpoint space. The images are then stitched using this direction information to eliminate the repeated information that may exist between the live-action images, and finally projected onto the sphere and stored as a spherical panoramic image.
The camera coordinate system xyz is obtained by rotating the world coordinate system XYZ by an angle α around its X axis and then by an angle β around its Y axis.
Let image S be the stitched spherical panoramic image, Q any pixel on the spherical panoramic image S with image coordinates (x', y'), and J the view to be generated (i.e., the view in the sight-line direction finally required). As shown in FIG. 1, P is the point on view J corresponding to point Q on the spherical panoramic image, with image coordinates (x, y); f denotes the pixel focal length and is estimated according to whether an ordinary lens (a general wide-angle or standard lens) or a fisheye lens was used to shoot the live-action images.
S1.2, estimate the pixel focal length f of the lens so as to unify the basic measurement units of the pixels under the two coordinate systems.
The pixel focal length of an ordinary lens (a general wide-angle or standard lens) is estimated as follows: if the camera shoots n live-action images over one full horizontal rotation, its horizontal viewing angle is 360/n degrees. With W the width of a live-action image, the trigonometric relation gives the pixel focal length estimate f = W/(2tan(180/n)). The trigonometric relation refers to the sine, cosine, and tangent relations in a right triangle.
Referring to FIG. 4, which shows a cut-plane view of the panorama, the quantities can be related trigonometrically. Let θ be the horizontal viewing angle of the camera, θ = 360/n; the figure shows that W, f, and θ satisfy tan(θ/2) = (W/2)/f.
Rearranging gives f = W/(2tan(θ/2)), that is, f = W/(2tan(180/n)).
The pixel focal length of the fisheye lens follows from its equidistant imaging model: let W be the image width after the black border of the fisheye image is removed; then f = W/φ, where φ is the horizontal field of view of the fisheye lens, which can be looked up in the lens specification.
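For concreteness, the two focal-length estimates of S1.2 can be written down directly. The following Python sketch is illustrative only; the function names and the example values (8 shots per revolution, a 185-degree fisheye) are assumptions, not values fixed by the patent.

```python
import math

def pixel_focal_length_normal(image_width: float, n_images: int) -> float:
    """f = W / (2 tan(180/n)) for a wide-angle or standard lens, where
    the camera shoots n_images pictures over one full horizontal turn."""
    return image_width / (2.0 * math.tan(math.radians(180.0 / n_images)))

def pixel_focal_length_fisheye(cropped_width: float, fov_degrees: float) -> float:
    """f = W / phi for a fisheye lens under the equidistant imaging model;
    cropped_width is the width after removing the black border, and phi
    (the horizontal field of view) is converted to radians here."""
    return cropped_width / math.radians(fov_degrees)

# Illustrative values: 8 shots per revolution of 1920-px-wide frames,
# and a 185-degree fisheye with 1440 px of usable width.
print(pixel_focal_length_normal(1920, 8))     # ~2317.6 px
print(pixel_focal_length_fisheye(1440, 185))  # ~446.0 px
```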
S1.3, following the inverse of the operations in the panoramic image generation process, convert the two-dimensional image coordinates into three-dimensional parameter coordinates: the image coordinate point Q corresponds to a point on the sphere through the conversion relation of the spherical parameter transformation, recorded as formula (1).
The transformation matrix H of corresponding points under the two coordinate systems is the composition of the two rotations,
H = R_Y(β) · R_X(α),  (2)
where
R_X(α) = [1, 0, 0; 0, cosα, -sinα; 0, sinα, cosα],
R_Y(β) = [cosβ, 0, sinβ; 0, 1, 0; -sinβ, 0, cosβ].
With the transformation matrix obtained, the two equations above show that a point Q'(u, v, w) in the coordinate system XYZ corresponds to the coordinate (u', v', w') = H(u, v, w) in the coordinate system xyz.
S1.4, knowing that the live-action image frames shot in the video have width W and height H, establish the functional relation between any point Q on the spherical panoramic image and its corresponding point P on view J, and compute the coordinates of each corresponding point.
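As a concrete illustration of S1.3 and S1.4, the Python sketch below renders one perspective view from an equirectangular spherical panorama. Since the patent's formulas (1) and (3) are not reproduced in this text, the sketch assumes the standard equirectangular sphere-to-panorama parameterization and a right-handed rotation convention; all names are illustrative.

```python
import numpy as np

def rotation_h(alpha: float, beta: float) -> np.ndarray:
    """Compose the two rotations of S1.3: alpha about the X axis,
    then beta about the Y axis (assumed right-handed convention)."""
    ca, sa = np.cos(alpha), np.sin(alpha)
    cb, sb = np.cos(beta), np.sin(beta)
    r_x = np.array([[1, 0, 0], [0, ca, -sa], [0, sa, ca]])
    r_y = np.array([[cb, 0, sb], [0, 1, 0], [-sb, 0, cb]])
    return r_y @ r_x

def render_view(pano: np.ndarray, alpha: float, beta: float,
                f: float, view_w: int, view_h: int) -> np.ndarray:
    """Back-project the panorama into the view for sight direction
    (alpha, beta): cast a ray through each view pixel P(x, y) at pixel
    focal length f, rotate it into the world frame, and look up the
    corresponding panorama pixel Q(x', y')."""
    pano_h, pano_w = pano.shape[:2]
    xs, ys = np.meshgrid(np.arange(view_w) - view_w / 2.0,
                         np.arange(view_h) - view_h / 2.0)
    rays = np.stack([xs, ys, np.full_like(xs, f)], axis=-1)  # camera frame
    rays = rays @ rotation_h(alpha, beta).T                  # world frame
    rays /= np.linalg.norm(rays, axis=-1, keepdims=True)
    lon = np.arctan2(rays[..., 0], rays[..., 2])   # longitude in [-pi, pi]
    lat = np.arcsin(np.clip(rays[..., 1], -1, 1))  # latitude in [-pi/2, pi/2]
    qx = ((lon / (2 * np.pi) + 0.5) * (pano_w - 1)).round().astype(int)
    qy = ((lat / np.pi + 0.5) * (pano_h - 1)).round().astype(int)
    return pano[qy, qx]  # nearest-neighbour sampling
```

Rendering one such view per sight direction for every panorama frame yields the per-direction view sequences that step S2 then segments.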
This completes the first step of the method: the image frames of the 720-degree panoramic video have been back-projected to obtain the corresponding view in any sight-line direction of the viewpoint space, so the 720-degree panoramic video can be watched in any direction. Browsing the 720-degree panoramic video directly, direction by direction, involves too much data: it easily fatigues the observer and lowers the efficiency of extracting key information.
A video is composed of several different scenes, each scene contains several shots (which may be long or short), and each shot plays several associated image frames in a fixed order; video frames are thus the most basic units of a video. To browse video quickly, acquiring the key frames of the video images is the key to extracting the video's effective information. Generally, different types of video emphasize some parts of the shot scene over others according to their subject, and shot length likewise differs with the point of focus, so detecting and distinguishing long and short shots in the video benefits key frame extraction.
S2.1, the first step yielded view sequences of the panoramic video in different directions. Classify the video sequences according to the view sequences projected onto the viewing angles in the different directions (the unfolded 360-degree panorama is stitched from images at several different viewing angles; back projection restores the panorama to several views at different viewing angles, which are arranged in order and classified by sequence number), obtaining view sequence groups that can be browsed independently at multiple viewing angles. Back-projecting one panorama frame of the panoramic video yields several views along the sight directions; back-projecting the whole panoramic video yields a video view sequence for each of several viewing-angle directions, with many views per direction.
S2.2, segment the video sequence groups of the different viewing-angle directions separately.
The absolute luminance frame difference AIFD (Absolute Intensity Frame Difference) is used as the feature quantity measuring the degree of change of the video content and is defined as
AIFD(t) = (1/(W×H)) · Σ_{x=1..W} Σ_{y=1..H} |f(x, y, t+1) - f(x, y, t)|,
where f(x, y, t) and f(x, y, t+1) denote the luminance of the pixel at coordinate (x, y) in the image frame at time t and in the frame at time t+1, and W and H denote the width and height of the video frame. If N image frames are played for the complete video in a given viewing-angle direction, the mean luminance frame difference of the video is
AIFD_mean = (1/(N-1)) · Σ_{t=1..N-1} AIFD(t).
Because the luminance frame difference changes little under the same shot and is fairly uniformly distributed, the mean luminance frame difference can serve as the judgment reference. Two coefficients a and b are set; values that are too small cause false detections, and values that are too large cause missed detections (in the experiments a = 1.2 and b = 2.3, empirical values). Weighting the mean gives the low and high thresholds thresh_low = a · AIFD_mean and thresh_high = b · AIFD_mean, the conditions for judging whether, and in which manner, a shot change occurs.
The segmentation of a video sequence proceeds as follows: initialize the input video frame data, compute the AIFD feature value of the two adjacent frames at time t, and compare the current frame's feature value with the judgment thresholds to detect whether a shot change occurs between the current frame and the next. The judgment rule is: if the feature value of the current frame is below thresh_low, there is no shot switch; if it is between thresh_low and thresh_high, a gradual shot transition is possible; if it is above thresh_high, an abrupt shot transition is possible. Whether gradual or abrupt, the current frame is recorded as a shot-change node.
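A minimal sketch of this segmentation step, assuming the input frames are grayscale NumPy arrays; the thresholds follow the a- and b-weighted AIFD mean described above, and the function names are illustrative.

```python
import numpy as np

def aifd(f0: np.ndarray, f1: np.ndarray) -> float:
    """AIFD(t): mean absolute luminance difference between two
    consecutive frames of shape (H, W)."""
    return float(np.abs(f1.astype(np.float64) - f0.astype(np.float64)).mean())

def shot_change_nodes(frames, a: float = 1.2, b: float = 2.3):
    """Detect shot-change nodes in one viewing-angle direction.

    thresh_low = a * mean(AIFD), thresh_high = b * mean(AIFD); a value
    above thresh_high marks a possible abrupt change, one between the
    thresholds a possible gradual change; both are recorded as nodes."""
    diffs = [aifd(f0, f1) for f0, f1 in zip(frames, frames[1:])]
    mean_diff = sum(diffs) / len(diffs)
    thresh_low, thresh_high = a * mean_diff, b * mean_diff
    nodes = []
    for t, d in enumerate(diffs):
        if d >= thresh_high:
            nodes.append((t, "abrupt"))
        elif d >= thresh_low:
            nodes.append((t, "gradual"))
    return nodes
```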
S2.3, motion components usually characterize how content changes in a video. The total motion of a shot is computed and compared with the preset motion-amount measurement threshold to judge whether the shot is a long shot or a short shot. Under the same shot, the histogram difference of two image frames is generally small (the two frames are not completely unchanged; the difference value is merely small). M_f(t) denotes the relative motion between the two adjacent video frames at time t, i.e., their difference, measured by the histogram difference rate, and S_time denotes the duration of the shot. When the motion sum of the shot, accumulated over S_time, exceeds the motion-amount measurement threshold, the shot is judged to be a long shot; otherwise it is a short shot.
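Once the per-frame-pair motion M_f(t) is accumulated over a shot, the long/short decision is a single comparison. A sketch, with an assumed 64-bin grayscale histogram as the difference measure and an illustrative threshold parameter:

```python
import numpy as np

def motion_amount(f0: np.ndarray, f1: np.ndarray, bins: int = 64) -> float:
    """M_f(t): histogram difference rate between two adjacent frames,
    normalised by the number of pixels."""
    h0, _ = np.histogram(f0, bins=bins, range=(0, 256))
    h1, _ = np.histogram(f1, bins=bins, range=(0, 256))
    return float(np.abs(h1 - h0).sum()) / f0.size

def classify_shot(shot_frames, motion_threshold: float) -> str:
    """Accumulate M_f(t) over the shot's duration S_time and compare
    the sum with the preset motion-amount measurement threshold."""
    total = sum(motion_amount(f0, f1)
                for f0, f1 in zip(shot_frames, shot_frames[1:]))
    return "long" if total > motion_threshold else "short"
```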
S2.4, extract a key frame for each short shot by random selection; for a long shot, select several frames at equal intervals, starting from the shot's first frame, as its key frames.
S2.5, recombine the extracted key frame sequences and restore them to the different viewing-angle directions to generate a video summary; by operating on the summary, an observer browses the video quickly.
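Steps S2.4 and S2.5 admit an equally short sketch; the 30-frame spacing for long shots is an assumed example, not a value fixed by the patent, and the data layout is illustrative.

```python
import random

def extract_keyframes(shot_frames, kind: str, interval: int = 30):
    """S2.4: one randomly chosen key frame for a short shot; frames at
    equal intervals, counted from the shot's start frame, for a long shot."""
    if kind == "short":
        return [random.choice(shot_frames)]
    return shot_frames[::interval]

def build_summary(shots_by_direction):
    """S2.5: recombine the extracted key frames of every shot, per
    viewing-angle direction, into a browsable video summary.

    shots_by_direction maps a direction label to a list of
    (kind, shot_frames) pairs produced by S2.2 and S2.3."""
    return {direction: [kf for kind, frames in shots
                        for kf in extract_keyframes(frames, kind)]
            for direction, shots in shots_by_direction.items()}
```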

Claims (8)

1. A 720-degree panoramic video fast browsing method, characterized by comprising the following steps:
S1, first reconstructing the 720-degree panoramic video image by a back projection method to obtain a view sequence corresponding to each sight-line direction of the spherical viewpoint space;
S2, judging shot length by calculating the absolute luminance frame difference of adjacent image frames in the video sequence, and then extracting key frames to realize fast browsing of the panoramic video, comprising the following steps:
S2.1, structuring the panoramic video sequence, and classifying the video sequence obtained in step S1 according to the view frame sequences projected onto the viewing angles in different directions to obtain video sequence groups that can be browsed independently at multiple viewing angles;
S2.2, segmenting the video sequence group in each viewing-angle direction: calculating the absolute luminance frame difference of adjacent image frames in the video sequence, judging the shot-change nodes, and segmenting the video sequence into several shot segments;
S2.3, comparing the motion sum of each shot segment with a preset motion-amount measurement threshold to judge whether the shot is a long shot or a short shot, wherein M_f(t) represents the relative motion between two adjacent video frames at time t and S_time represents the duration of the shot; when the motion sum of the shot is greater than the motion-amount measurement threshold, the shot is judged to be a long shot, otherwise a short shot;
S2.4, extracting key frames from the long shots and the short shots respectively: one key frame is extracted at random from a short shot, and multiple frames are extracted at equal intervals as the key frames of a long shot;
S2.5, recombining the extracted key frame sequences and restoring them to the different viewing-angle directions to generate a video summary, through which an observer browses the video quickly.
2. The 720-degree panoramic video fast browsing method as claimed in claim 1, wherein S1 comprises the following steps:
S1.1, completing the stitching of the 720-degree panoramic image based on the spherical viewpoint space model, and establishing two coordinate systems centered on the sphere center: a world coordinate system XYZ and a camera coordinate system xyz; the camera coordinate system xyz is obtained by rotating the world coordinate system XYZ by an angle α around its X axis and then by an angle β around its Y axis;
S1.2, unifying the basic measurement units of the pixels under the two coordinate systems of S1.1 and calculating the focal length measured in pixels, namely estimating the pixel focal length f from the viewpoint to the view plane for each pixel in the camera coordinate system;
S1.3, establishing, with the pixel focal length f, the conversion relation between the two-dimensional image point coordinates and the corresponding three-dimensional parameter coordinate points on the sphere; since the camera coordinate system is rotated by α around the X axis of the world coordinate system XYZ and then by β around its Y axis, the representation of a pixel on each coordinate component changes correspondingly with the rotation of the axes, and the changes can be expressed on each coordinate component through trigonometric relations, yielding the transformation matrix H of corresponding points under the two coordinate systems;
S1.4, establishing the inverse transformation function from the transformation matrix H, finding the correspondence from any point on the panoramic image to the points of each view in the spherical space, and calculating the coordinates of each point to obtain the corresponding view in every sight-line direction of the viewpoint space.
3. The 720-degree panoramic video fast browsing method as claimed in claim 1, wherein in S1.2, an image S is the stitched spherical panoramic image, Q is any pixel point on the spherical panoramic image S with image coordinates (x', y'); J is the view to be generated, a point P is the point on view J corresponding to point Q on the spherical panoramic image, with image coordinates (x, y); f represents the pixel focal length and is estimated according to the lens used for shooting the live-action images;
the pixel focal length f of a wide-angle or standard lens is estimated as follows: if the camera shoots n live-action images over one full horizontal rotation, its horizontal viewing angle is 360/n degrees; with W the width of a live-action image, the trigonometric relation gives the pixel focal length estimate f = W/(2tan(180/n));
the pixel focal length f of a fisheye lens is estimated as follows: with W the image width after the black border of the fisheye image is removed, the pixel focal length estimate is f = W/φ, where φ is the horizontal field of view of the fisheye lens.
4. The 720-degree panoramic video fast browsing method as claimed in claim 3, wherein in S1.3, the pixel focal length f is used to establish the conversion relation between the two-dimensional image point coordinates and the corresponding three-dimensional parameter coordinate point on the sphere, recorded as formula (1);
the transformation matrix H of corresponding points under the two coordinate systems is the composition of the two rotations, H = R_Y(β) · R_X(α) (2), where R_X(α) = [1, 0, 0; 0, cosα, -sinα; 0, sinα, cosα] and R_Y(β) = [cosβ, 0, sinβ; 0, 1, 0; -sinβ, 0, cosβ].
5. The method according to claim 4, wherein in S1.4, as shown by equations (1) and (2) in S1.3, the point Q'(u, v, w) in the coordinate system XYZ corresponds to (u', v', w') = H(u, v, w) in the coordinate system xyz;
knowing that the live-action images shot in the video have width W and height H, a functional relation is established between any point Q(x', y') on the spherical panoramic image and its corresponding point P(x, y) on view J, and the coordinates of each corresponding point are calculated with formula (3) to obtain the view corresponding to each sight-line direction of the viewpoint space;
6. The method according to claim 1, wherein in S2.2, the absolute luminance frame difference AIFD is selected as the feature quantity measuring the degree of change of the video content and is defined as
AIFD(t) = (1/(W×H)) · Σ_{x=1..W} Σ_{y=1..H} |f(x, y, t+1) - f(x, y, t)|,
where f(x, y, t) and f(x, y, t+1) respectively represent the luminance of the pixel at coordinate (x, y) in the image frame at time t and in the next frame, and W and H respectively represent the width and height of the video frame; if the number of image frames of the video completely played in a certain viewing-angle direction is N, the mean luminance frame difference of the video is
AIFD_mean = (1/(N-1)) · Σ_{t=1..N-1} AIFD(t);
with the mean luminance frame difference as the judgment reference, two coefficients a and b are set and used to weight the mean, giving the high and low thresholds thresh_high and thresh_low as the conditions for judging whether, and in which manner, a shot change occurs, wherein thresh_low = a · AIFD_mean and thresh_high = b · AIFD_mean.
7. The 720-degree panoramic video fast browsing method according to claim 6, wherein a takes the value 1.2 and b takes the value 2.3.
8. The 720-degree panoramic video fast browsing method according to claim 6, wherein in S2.2, the video sequence group is segmented as follows:
first, the input video frame data are initialized, the AIFD feature values of the two adjacent frames at time t are calculated, and the feature value of the current frame is compared with the judgment thresholds, thereby detecting whether a shot change exists between the current frame and the next frame.
CN201610496238.7A 2016-06-29 2016-06-29 720-degree panoramic video fast browsing method Active CN106127680B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610496238.7A CN106127680B (en) 2016-06-29 2016-06-29 720-degree panoramic video fast browsing method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610496238.7A CN106127680B (en) 2016-06-29 2016-06-29 720-degree panoramic video fast browsing method

Publications (2)

Publication Number Publication Date
CN106127680A CN106127680A (en) 2016-11-16
CN106127680B 2019-12-17

Family

ID=57284438

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610496238.7A Active CN106127680B (en) 2016-06-29 2016-06-29 720-degree panoramic video fast browsing method

Country Status (1)

Country Link
CN (1) CN106127680B (en)

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108122191B (en) * 2016-11-29 2021-07-06 成都美若梦景科技有限公司 Method and device for splicing fisheye images into panoramic image and panoramic video
CN108174265B (en) * 2016-12-07 2019-11-29 华为技术有限公司 A kind of playback method, the apparatus and system of 360 degree of panoramic videos
CN106792151A (en) * 2016-12-29 2017-05-31 上海漂视网络科技有限公司 A kind of virtual reality panoramic video player method
CN108269234B (en) * 2016-12-30 2021-11-19 成都美若梦景科技有限公司 Panoramic camera lens attitude estimation method and panoramic camera
CN107213636B (en) * 2017-05-31 2021-07-23 网易(杭州)网络有限公司 Lens moving method, device, storage medium and processor
CN107172412A (en) * 2017-06-11 2017-09-15 成都吱吖科技有限公司 A kind of interactive panoramic video storage method and device based on virtual reality
CN107484004B (en) * 2017-07-24 2020-01-03 北京奇艺世纪科技有限公司 Video processing method and device
US10779006B2 (en) * 2018-02-14 2020-09-15 Qualcomm Incorporated Signaling 360-degree video information
CN108769731B (en) * 2018-05-25 2021-09-24 北京奇艺世纪科技有限公司 Method and device for detecting target video clip in video and electronic equipment
CN111669547B (en) * 2020-05-29 2022-03-11 成都易瞳科技有限公司 Panoramic video structuring method
CN114342909A (en) * 2022-01-04 2022-04-15 阳光电源股份有限公司 Laser bird repelling method and related device

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6459822B1 (en) * 1998-08-26 2002-10-01 The United States Of America As Represented By The Administrator Of The National Aeronautics And Space Administration Video image stabilization and registration
CN102833525A (en) * 2012-07-19 2012-12-19 中国人民解放军国防科学技术大学 Browsing operation method of 360-degree panoramic video
CN103338343A (en) * 2013-05-29 2013-10-02 山西绿色光电产业科学技术研究院(有限公司) Multi-image seamless splicing method and apparatus taking panoramic image as reference
CN104219584B (en) * 2014-09-25 2018-05-01 广东京腾科技有限公司 Panoramic video exchange method and system based on augmented reality
CN105678693B (en) * 2016-01-25 2019-05-14 成都易瞳科技有限公司 Panoramic video browses playback method

Also Published As

Publication number Publication date
CN106127680A (en) 2016-11-16


Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant