CN112954443A - Panoramic video playing method and device, computer equipment and storage medium - Google Patents


Info

Publication number
CN112954443A
CN112954443A (Application CN202110307765.XA)
Authority
CN
China
Prior art keywords
frame image
video
target
candidate region
viewing angle
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110307765.XA
Other languages
Chinese (zh)
Inventor
张伟俊
陈聪
马龙祥
Current Assignee
Insta360 Innovation Technology Co Ltd
Original Assignee
Insta360 Innovation Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Insta360 Innovation Technology Co Ltd filed Critical Insta360 Innovation Technology Co Ltd
Priority to CN202110307765.XA priority Critical patent/CN112954443A/en
Publication of CN112954443A publication Critical patent/CN112954443A/en
Priority to PCT/CN2022/081149 priority patent/WO2022199441A1/en
Pending legal-status Critical Current

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47 End-user applications
    • H04N21/472 End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content
    • H04N21/4728 End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content for selecting a Region Of Interest [ROI], e.g. for requesting a higher resolution version of a selected region
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/25 Determination of region of interest [ROI] or a volume of interest [VOI]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/431 Generation of visual interfaces for content selection or interaction; Content or additional data rendering
    • H04N21/4312 Generation of visual interfaces for content selection or interaction; Content or additional data rendering involving specific graphical features, e.g. screen layout, special fonts or colors, blinking icons, highlights or animations
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00 Details of television systems
    • H04N5/222 Studio circuitry; Studio devices; Studio equipment
    • H04N5/262 Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects; Cameras specially adapted for the electronic generation of special effects
    • H04N5/2624 Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects; Cameras specially adapted for the electronic generation of special effects for obtaining an image which is composed of whole input images, e.g. splitscreen

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Human Computer Interaction (AREA)
  • Television Signal Processing For Recording (AREA)

Abstract

The application relates to a panoramic video playing method and apparatus, a computer device, and a storage medium. The method comprises the following steps: acquiring a panoramic video and performing target detection on the panoramic video to obtain a detection result, wherein the detection result comprises a candidate region where a target object is located; generating a recommended viewing angle video according to the detection result, wherein the picture content of the recommended viewing angle video comprises the target object; and displaying the recommended viewing angle video and a preset viewing angle video on the same display picture, wherein the picture content of the preset viewing angle video is different from that of the recommended viewing angle video. With this method, other target objects in the panoramic video can be displayed, the user is prevented from missing highlight content, and the viewing experience of the panoramic video is improved.

Description

Panoramic video playing method and device, computer equipment and storage medium
Technical Field
The present application relates to the field of image processing technologies, and in particular, to a method and an apparatus for playing a panoramic video, a computer device, and a storage medium.
Background
A panoramic video is obtained by shooting the environment omnidirectionally (through 360 degrees) with multiple cameras to produce multiple video streams, which are then combined through synchronization, stitching, projection, and similar techniques. The user can choose to watch from any angle within the full 360-degree range (up, down, left, right, front, and back), obtaining an immersive viewing experience.
To ensure the viewing effect of the panoramic video, the display device shows only a partial area of the panorama covered by the panoramic video during playback, and the user can change the viewing angle to view other areas of the panoramic video. The range of the area currently viewable by the user is commonly called the field of view (FOV). In the conventional technology, a partial area of the panoramic video is played at a preset viewing angle.
However, playing the panoramic video at a preset viewing angle makes it easy for the user to miss highlight content in other areas of the panoramic video, which reduces the viewing experience of the panoramic video.
Disclosure of Invention
In view of the foregoing, it is desirable to provide a method and an apparatus for playing a panoramic video, a computer device, and a storage medium.
A method for playing panoramic video comprises the following steps:
acquiring a panoramic video, and carrying out target detection on the panoramic video to obtain a detection result; the detection result comprises a candidate area where the target object is located;
generating a recommended viewing angle video according to the detection result; the image content of the recommended viewing angle video comprises a target object;
displaying the recommended viewing angle video and the preset viewing angle video on the same display picture; the picture content of the preset viewing angle video is different from the picture content of the recommended viewing angle video.
In one embodiment, the target detection of the panoramic video to obtain a detection result includes:
extracting single-frame images in the panoramic video through a preset interval to obtain a single-frame image set;
performing target detection on each single-frame image in the single-frame image set to obtain a detection result; the detection result comprises candidate regions corresponding to the single-frame images, and the candidate regions comprise target objects.
In one embodiment, generating a recommended viewing angle video according to the detection result includes:
determining a target candidate region of each single-frame image based on the characteristic parameters of the candidate regions corresponding to the single-frame images; determining the central point of the target candidate region of each single-frame image as the position of a target object in each single-frame image;
generating a recommendation picture corresponding to each single-frame image according to the position of the target object of each single-frame image;
and generating a recommended viewing angle video according to the recommended picture corresponding to each single-frame image.
In one embodiment, the determining the target candidate region of each single-frame image based on the feature parameters of the candidate regions corresponding to the multiple single-frame images includes:
and acquiring the candidate region with the maximum confidence in each single-frame image as a target candidate region.
In one embodiment, the characteristic parameter includes the area of the candidate region in a single-frame image, the single-frame image set includes N single-frame images, the N single-frame images have a time sequence, and N is a positive integer;
determining a target candidate region of each single-frame image from the candidate region corresponding to each single-frame image based on the characteristic parameters, wherein the determining comprises the following steps:
if N is 1, the candidate region with the largest area in the 1 st frame single frame image is used as the target candidate region of the 1 st frame single frame image.
In one embodiment, the feature parameter further includes a center point position of the candidate region, and the determining, based on the feature parameter, the target candidate region of each single-frame image from the candidate region corresponding to each single-frame image further includes:
if N is larger than 1, determining the position of the central point of each candidate area in the single-frame image of the Nth frame;
calculating Euclidean distances between the central point position of each candidate region in the N-th frame single-frame image and the central point position of the target candidate region in the N-1-th frame single-frame image;
and determining the candidate region with the minimum Euclidean distance as a target candidate region of the single-frame image of the Nth frame.
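The two selection rules above (largest area when N = 1, then nearest center by Euclidean distance) can be sketched as follows; the `(x, y, w, h)` box representation is an assumption for illustration, not part of the patent:

```python
import math

def select_target_candidate(candidates, prev_center=None):
    """Pick the target candidate region for one single-frame image.

    candidates: list of (x, y, w, h) boxes.
    With no previous target center (the 1st frame), the largest-area box
    wins; afterwards the box whose center is nearest (Euclidean distance)
    to the previous frame's target center wins.
    """
    def center(box):
        x, y, w, h = box
        return (x + w / 2.0, y + h / 2.0)

    if prev_center is None:
        # N == 1: candidate region with the largest area
        return max(candidates, key=lambda b: b[2] * b[3])
    # N > 1: minimum Euclidean distance to the previous target center
    return min(candidates, key=lambda b: math.dist(center(b), prev_center))
```

Feeding each frame's result back in as `prev_center` for the next frame tracks one object through the sequence.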
In one embodiment, generating a recommended picture corresponding to each single-frame image according to the position of the target object of each single-frame image includes:
acquiring the type of a target object included in a target candidate region to which the position of the target object of each single-frame image belongs;
if the type of the target object is a preset target type, generating a recommendation picture with a preset size and the position of the target object being at a preset picture position;
and if the type of the target object is not the preset target type, generating a recommendation picture with the minimum area including the target candidate region.
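As an illustration of this embodiment, the following sketch builds the crop window for one recommendation picture; the fixed 640×360 preset size and the box representation are hypothetical choices, not values from the patent:

```python
def recommend_crop(target_box, frame_w, frame_h,
                   is_preset_type, preset_size=(640, 360)):
    """Return a crop window (x, y, w, h) for one recommendation picture.

    Preset target types get a fixed-size window centered on the target
    (clamped to the frame); other types get the smallest window that
    still contains the whole candidate region, i.e. the box itself.
    """
    x, y, w, h = target_box
    if is_preset_type:
        pw, ph = preset_size
        cx, cy = x + w / 2, y + h / 2
        # Center the window on the target, clamped inside the frame.
        cx0 = min(max(cx - pw / 2, 0), frame_w - pw)
        cy0 = min(max(cy - ph / 2, 0), frame_h - ph)
        return (cx0, cy0, pw, ph)
    return (x, y, w, h)
```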
In one embodiment, generating a recommended viewing angle video according to a recommended picture corresponding to each single-frame image includes:
performing interpolation calculation on the position coordinates of the target object in the adjacent recommended pictures by adopting an interpolation algorithm to obtain intermediate position coordinates;
generating an intermediate recommendation picture including the target object according to the intermediate position coordinates; the playing time of the middle recommendation picture is positioned between the adjacent recommendation pictures;
and generating a recommended viewing angle video by sequencing the recommended pictures and the middle recommended pictures from front to back according to the playing time.
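A minimal sketch of the interpolation step, assuming simple linear interpolation between the target positions of two adjacent recommendation pictures (the patent does not fix a particular interpolation algorithm):

```python
def interpolate_positions(p0, p1, num_intermediate):
    """Linearly interpolate target-center coordinates between two adjacent
    recommendation pictures, yielding the intermediate positions used to
    build intermediate recommendation pictures."""
    (x0, y0), (x1, y1) = p0, p1
    frames = []
    for i in range(1, num_intermediate + 1):
        t = i / (num_intermediate + 1)  # fraction of the way from p0 to p1
        frames.append((x0 + (x1 - x0) * t, y0 + (y1 - y0) * t))
    return frames
```

Sorting the original and intermediate pictures by playing time then yields the smoothed recommended viewing angle video.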
A method for automatically generating a video with a recommended viewing angle from a panoramic video comprises the following steps:
acquiring a panoramic video, and carrying out target detection on the panoramic video to obtain a detection result; the detection result comprises a candidate area where the target object is located;
generating a recommended viewing angle video according to the detection result; wherein, the picture content of the recommended viewing angle video comprises the target object.
A playback apparatus of a panoramic video, comprising:
the target detection module is used for acquiring the panoramic video and carrying out target detection on the panoramic video to obtain a detection result; the detection result comprises a candidate area where the target object is located;
the video generation module is used for generating a recommended viewing angle video according to the detection result; the image content of the recommended viewing angle video comprises a target object;
the synchronous display module is used for displaying the recommended viewing angle video and the preset viewing angle video on the same display picture; the picture content of the preset viewing angle video is different from the picture content of the recommended viewing angle video.
An apparatus for automatically generating a recommended viewing angle video from a panoramic video, comprising:
the target detection module is used for acquiring the panoramic video and carrying out target detection on the panoramic video to obtain a detection result; the detection result comprises a candidate area where the target object is located;
the video generation module is used for generating a recommended viewing angle video according to the detection result; wherein, the picture content of the recommended viewing angle video comprises the target object.
A computer device comprising a memory and a processor, the memory storing a computer program, the processor implementing the following steps when executing the computer program:
acquiring a panoramic video, and carrying out target detection on the panoramic video to obtain a detection result; the detection result comprises a candidate area where the target object is located;
generating a recommended viewing angle video according to the detection result; the image content of the recommended viewing angle video comprises a target object;
displaying the recommended viewing angle video and the preset viewing angle video on the same display picture; the picture content of the preset viewing angle video is different from the picture content of the recommended viewing angle video.
A computer-readable storage medium, on which a computer program is stored which, when executed by a processor, carries out the steps of:
acquiring a panoramic video, and carrying out target detection on the panoramic video to obtain a detection result; the detection result comprises a candidate area where the target object is located;
generating a recommended viewing angle video according to the detection result; the image content of the recommended viewing angle video comprises a target object;
displaying the recommended viewing angle video and the preset viewing angle video on the same display picture; the picture content of the preset viewing angle video is different from the picture content of the recommended viewing angle video.
According to the above panoramic video playing method and apparatus, computer device, and storage medium, a panoramic video is acquired and target detection is performed on it to obtain a detection result containing the candidate region where the target object is located; a recommended viewing angle video including the target object is generated according to the detection result; and the recommended viewing angle video and a preset viewing angle video with different picture content are displayed on the same display picture. In this way, other target objects in the panoramic video are displayed, the user is prevented from missing highlight content, and the viewing experience of the panoramic video is improved.
Drawings
FIG. 1 is a diagram illustrating an internal structure of a computer device according to an embodiment;
FIG. 2 is a flowchart illustrating a method for playing a panoramic video according to an embodiment;
FIG. 3 is a schematic diagram illustrating a relationship between a panoramic video and a recommended viewing angle video and a preset viewing angle video according to an embodiment;
FIGS. 4a to 4e are schematic display diagrams of a recommended viewing angle video and a preset viewing angle video;
FIG. 5 is a schematic diagram illustrating an exemplary process for performing object detection on a panoramic video;
FIG. 6 is a schematic flow chart illustrating generation of a video with a recommended viewing perspective in one embodiment;
FIG. 7 is a flow diagram illustrating the determination of a target candidate region in one embodiment;
FIG. 8 is a flow diagram that illustrates the generation of a recommendation screen, in one embodiment;
FIG. 9 is a schematic diagram of a process for generating a video with a recommended viewing angle according to another embodiment;
FIG. 10 is a flow chart illustrating a process of generating a video with a recommended viewing angle according to another embodiment;
FIG. 11 is a block diagram showing a configuration of a playback apparatus for panoramic video according to an embodiment;
FIG. 12 is a block diagram of a recommended viewing angle video generation apparatus based on panoramic video in one embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
The panoramic video playing method provided by the application can be applied to the computer device shown in fig. 1. The computer device may be a terminal whose internal structure is as shown in fig. 1. The computer device includes a processor, a memory, a communication interface, a display screen, and an input device connected by a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device comprises a nonvolatile storage medium and an internal memory; the nonvolatile storage medium stores an operating system and a computer program, and the internal memory provides an environment for running them. The communication interface of the computer device performs wired or wireless communication with an external terminal; the wireless communication can be realized through Wi-Fi, an operator network, NFC (near field communication), or other technologies. The computer program, when executed by the processor, implements a panoramic video playing method. The display screen of the computer device may be a liquid crystal display or an electronic ink display, and the input device may be a touch layer covering the display screen, a key, a trackball, or a touchpad arranged on the housing of the computer device, or an external keyboard, touchpad, or mouse.
Those skilled in the art will appreciate that the architecture shown in fig. 1 is merely a block diagram of some of the structures associated with the disclosed aspects and is not intended to limit the computer devices to which the disclosed aspects apply; a particular computer device may include more or fewer components than those shown, combine certain components, or have a different arrangement of components.
In one embodiment, as shown in fig. 2, a method for playing a panoramic video is provided, which is exemplified by the method applied to the computer device in fig. 1, and includes the following steps:
s210, acquiring the panoramic video, and carrying out target detection on the panoramic video to obtain a detection result.
The detection result comprises a candidate area where the detected target object is located, the candidate area is used for representing a position area of the target object in the single-frame image, and can be determined by framing on the single-frame image through a rectangular frame and also can be determined by framing on the single-frame image through a boundary line of the target object. In this embodiment, the expression form of the candidate region is not specifically limited.
Optionally, after the computer device obtains the panoramic video, target detection may be performed on the panoramic video according to a preset target of interest specified by the user, so as to obtain a detection result including the candidate region where the target of interest is located. The target of interest is a target object the user cares about, i.e., a point of interest preset by the user; for example, it may be a person, an animal, a vehicle, or an airplane, and it may be static, such as a building or a roadside tree, or dynamic, such as a moving vehicle or a running athlete. The computer device may also perform target detection on the panoramic video according to a preset target recognition algorithm to obtain detection results of the candidate regions where all recognized target objects are located.
And S220, generating a recommended viewing angle video according to the detection result.
Wherein, the picture content of the recommended viewing angle video comprises the target object.
Optionally, the computer device generates a recommended picture according to the candidate region where the target object is located in the detection result, and the recommended pictures constitute the recommended viewing angle video. Optionally, the candidate region may be used directly as the recommended picture, or the recommended picture may be obtained by selecting, in the single-frame image, a window of a preset size centered on the center point of the candidate region.
When the detection result of a single-frame image includes candidate regions of multiple target objects, a recommended viewing angle video may optionally be generated per target object from the candidate regions of that same target object; when multiple target objects exist, multiple corresponding recommended viewing angle videos may be generated. For example, suppose target detection for vehicles is performed on 100 single-frame images (frames 1 to 100) with consecutive playing times in the panoramic video: frames 1 to 50 include candidate regions of vehicle A and vehicle B, while frames 51 to 100 include only the candidate region of vehicle B. The computer device generates recommendation pictures of vehicle A from the candidate regions of vehicle A in frames 1 to 50 to form a recommended viewing angle video of vehicle A, and generates recommendation pictures of vehicle B from the candidate regions of vehicle B in frames 1 to 100 to form a recommended viewing angle video of vehicle B. Alternatively, a target candidate region may be determined among the multiple candidate regions by a preset screening condition, a recommended picture corresponding to that target candidate region generated, and the corresponding recommended viewing angle video formed; the screening condition may be that the confidence of the detection result is the maximum or minimum, or that the area of the candidate region in the detection result is the maximum or minimum.
And S230, displaying the recommended viewing angle video and the preset viewing angle video on the same display picture.
The picture content of the preset viewing angle video is different from the picture content of the recommended viewing angle video. The preset viewing angle video is a video generated by the computer device based on the panoramic video at the default viewing angle, and the picture can be adjusted based on an instruction input by a user so as to display other areas in the panoramic video according to the intention of the user. For example, the preset viewing angle video may be a video acquired by a preset camera, or a video whose picture content includes a preset target object. In this embodiment, the forming manner of the preset viewing angle video is not specifically limited.
For example, as shown in fig. 3, the computer device performs target detection on a target object person on a single frame image in a panoramic video, and further generates a recommended viewing angle video according to a detection result, where the preset viewing angle is a video acquired by a certain camera, and the computer device displays the generated recommended viewing angle video and a preset viewing angle video different from the recommended viewing angle video on the same display screen.
Optionally, the computer device may display the recommended viewing angle video and the preset viewing angle video side by side on the same display screen, for example in a row (fig. 4a), in a column (fig. 4c and fig. 4d), or diagonally, with either the recommended viewing angle video or the preset viewing angle video placed first. The computer device may also treat the recommended viewing angle video as a secondary video and the preset viewing angle video as the main video on the same display screen, for example displaying the preset viewing angle video full screen and the recommended viewing angle video as a thumbnail, where the thumbnail may be located anywhere on the display screen, such as the upper left corner (fig. 4d), lower left corner, center, upper right corner, or lower right corner (fig. 4e). The computer device may also display the recommended viewing angle video full screen and the preset viewing angle video as a thumbnail. In this embodiment, the sizes of and the positional relationship between the recommended viewing angle video and the preset viewing angle video are not specifically limited.
In this embodiment, the computer device performs target detection on the panoramic video to obtain a detection result including a candidate region where the target object is located, generates a recommended viewing angle video including the target object according to the detection result, and displays the recommended viewing angle video with different picture contents and a preset viewing angle video on the same display picture, so that display of other target objects in the panoramic video is realized, a user is prevented from missing wonderful contents, and viewing experience of the panoramic video is improved.
In one embodiment, to improve the target detection efficiency, as shown in fig. 5, the step S210 includes:
and S510, extracting single-frame images in the panoramic video through a preset interval to obtain a single-frame image set.
Optionally, the computer device may extract a single frame image in the panoramic video at preset time period intervals to obtain a single frame image set. For example, the time period T is 1s, and the computer device extracts a single frame image in the panoramic video every 1s, and correspondingly obtains a single frame image set a. The computer equipment can also extract single-frame images in the panoramic video at preset interval frame number intervals to obtain a single-frame image set. For example, the preset interval frame number is 5 frames, and the computer device extracts a single frame image in the panoramic video every 5 frames, so as to obtain a single frame image set B correspondingly.
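Both sampling modes described above (a time-period interval and a frame-count interval) reduce to choosing which frame indices to extract; a small sketch, with hypothetical parameter names:

```python
def sample_frame_indices(total_frames, fps, period_s=None, frame_step=None):
    """Indices of the single frames to extract from a panoramic video,
    either every `period_s` seconds (converted via the frame rate) or
    every `frame_step` frames."""
    if period_s is not None:
        step = max(1, round(fps * period_s))
    elif frame_step is not None:
        step = frame_step
    else:
        raise ValueError("specify period_s or frame_step")
    return list(range(0, total_frames, step))
```

For a 25 fps video, a 1 s period and a 25-frame step select the same frames, mirroring the two examples in the text.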
S520, performing target detection on each single-frame image in the single-frame image set to obtain a detection result.
The detection result comprises candidate regions corresponding to the single-frame images, and the candidate regions comprise target objects. The type of the target object may be a person, a human face, a vehicle, a building, or the like.
Optionally, the computer device may perform target detection on each single-frame image in the single-frame image set with a machine-learning-based target detection model to obtain the detection result. For example, the computer device inputs each single-frame image in the set into a face detection model trained with a large number of face images and a vehicle detection model trained with a large number of vehicle images, obtaining detection results that include, for each single-frame image, candidate regions containing faces and candidate regions containing vehicles. Target detection may also be performed using template matching, key point matching, key feature detection, and the like.
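Of the non-learning detection modes mentioned, template matching is the simplest to illustrate. Below is a toy sum-of-squared-differences matcher over 2-D lists of grayscale values; it is a stand-in only, and a real system would use an optimized library routine:

```python
def match_template(image, template):
    """Slide `template` over `image` (both 2-D lists of grayscale values)
    and return the top-left (x, y) position with the smallest sum of
    squared differences, i.e. the best match."""
    ih, iw = len(image), len(image[0])
    th, tw = len(template), len(template[0])
    best, best_pos = None, (0, 0)
    for y in range(ih - th + 1):
        for x in range(iw - tw + 1):
            ssd = sum(
                (image[y + dy][x + dx] - template[dy][dx]) ** 2
                for dy in range(th) for dx in range(tw)
            )
            if best is None or ssd < best:
                best, best_pos = ssd, (x, y)
    return best_pos
```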
In this embodiment, the computer device extracts single-frame images in the panoramic video at preset intervals to obtain a single-frame image set, performs target detection on each single-frame image in the single-frame image set to obtain a detection result of a candidate region including a target object, which corresponds to each of the plurality of single-frame images, and extracts the single-frame images in the panoramic video at the preset intervals to perform target detection, so that data amount required for target detection is reduced, and target detection efficiency is improved.
In one embodiment, to improve the picture effect of the recommended viewing angle video, as shown in fig. 6, the step S220 includes:
s610, determining a target candidate region of each single-frame image based on the characteristic parameters of the candidate regions corresponding to the single-frame images.
Wherein the feature parameter may be used to characterize a region characteristic of the candidate region, such as at least one of an area, a center point position, a color histogram, or a confidence of a target type in the candidate region.
When the detection result corresponding to the single frame image includes a plurality of candidate regions, the computer device needs to determine a target candidate region corresponding to the single frame image from the plurality of candidate regions according to the feature parameters of the plurality of candidate regions.
In an optional embodiment, the feature parameter includes the confidence of the candidate region in a single-frame image, that is, the confidence of the target type characterized by the candidate region. For each single-frame image, the computer device may take the candidate region with the highest confidence among the candidate regions in the corresponding detection result as the target candidate region of that single-frame image. For example, for single-frame images 1 to 200: if candidate region A has the highest confidence in the detection results of images 1 to 150, the computer device determines region A as the target candidate region of images 1 to 150; and if candidate region B has the highest confidence in the detection results of images 151 to 200, the computer device determines region B as the target candidate region of images 151 to 200.
Optionally, for the 1st of N single-frame images having a time sequence, the computer device may take the candidate region with the highest confidence among the candidate regions of the 1st single-frame image as its target candidate region. After determining the target candidate region of the 1st single-frame image, the device computes the similarity between each candidate region in the 2nd single-frame image and the target candidate region of the 1st single-frame image, and determines the candidate region with the highest similarity as the target candidate region of the 2nd single-frame image. By analogy, the target candidate region of each of the N single-frame images is finally obtained. The similarity may be the intersection-over-union of the areas of the candidate region and the target candidate region, the correlation coefficient of their color histograms, or the Bhattacharyya distance.
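The two selection strategies above, highest confidence for the first frame and then highest similarity to the previous target, can be sketched as follows, using intersection-over-union as the similarity measure (one of the options the text mentions). The per-frame candidate layout is an assumption for illustration:

```python
def iou(a, b):
    # a, b: (x, y, w, h) boxes; intersection-over-union of the two regions.
    ax1, ay1, ax2, ay2 = a[0], a[1], a[0] + a[2], a[1] + a[3]
    bx1, by1, bx2, by2 = b[0], b[1], b[0] + b[2], b[1] + b[3]
    iw = max(0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union else 0.0

def select_targets(per_frame_candidates):
    """Frame 1: highest confidence; later frames: most similar (IoU) to the
    previous frame's target candidate region."""
    targets = []
    for i, cands in enumerate(per_frame_candidates):
        if i == 0:
            best = max(cands, key=lambda c: c["confidence"])
        else:
            prev = targets[-1]["bbox"]
            best = max(cands, key=lambda c: iou(c["bbox"], prev))
        targets.append(best)
    return targets

frames = [
    [{"bbox": (0, 0, 10, 10), "confidence": 0.6},
     {"bbox": (50, 50, 10, 10), "confidence": 0.9}],
    [{"bbox": (52, 51, 10, 10), "confidence": 0.4},
     {"bbox": (0, 0, 10, 10), "confidence": 0.95}],
]
targets = select_targets(frames)
print([t["bbox"] for t in targets])  # keeps following the (50, 50) region
```

Note that in the second frame the tracked region wins despite its lower confidence, which is exactly what keeps the recommended picture on the same object.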
S620, determining the central point of the target candidate region of each single-frame image as the position of the target object in each single-frame image.
Optionally, if the candidate region obtained by target detection is a region framed by a rectangular box in the single-frame image, then the target candidate region is likewise such a rectangular region, and the computer device takes the intersection point of its two diagonals as the position of the target object in the single-frame image. If the candidate region obtained by target detection is a region bounded by the outline of the target object in the single-frame image, then the target candidate region is likewise such an outline-bounded region, and the computer device takes its geometric center as the position of the target object in the single-frame image.
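A minimal sketch of both center computations: the diagonal intersection of a rectangular region, and the geometric (shoelace) centroid of an outline-bounded region. The `(x, y, w, h)` box format and the vertex-list polygon format are assumptions for illustration:

```python
def rect_center(bbox):
    # Intersection of the rectangle's two diagonals.
    x, y, w, h = bbox
    return (x + w / 2.0, y + h / 2.0)

def polygon_centroid(points):
    # Shoelace-formula centroid for a region bounded by the object's outline.
    a = cx = cy = 0.0
    n = len(points)
    for i in range(n):
        x0, y0 = points[i]
        x1, y1 = points[(i + 1) % n]
        cross = x0 * y1 - x1 * y0
        a += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    a *= 0.5
    return (cx / (6 * a), cy / (6 * a))

print(rect_center((10, 20, 40, 30)))                  # (30.0, 35.0)
print(polygon_centroid([(0, 0), (4, 0), (4, 4), (0, 4)]))  # (2.0, 2.0)
```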
And S630, generating a recommendation picture corresponding to each single-frame image according to the position of the target object of each single-frame image.
And S640, generating a recommended viewing angle video according to the recommended picture corresponding to each single-frame image.
Alternatively, the computer device may take the position of the target object in each single-frame image as the center point and extend outward by a preset length to form a circular region with that length as its radius, using the circular region as the recommendation picture corresponding to the single-frame image. The computer device may also take the position of the target object as the center point and extend preset lengths along the X-axis and Y-axis directions to form a rectangular region, using the rectangular region as the recommendation picture of the corresponding single-frame image. A recommended viewing angle video composed of these recommendation pictures is thus obtained.
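A sketch of the rectangular variant, assuming an `(x1, y1, x2, y2)` output and adding one detail the text leaves open: clamping the region to the frame bounds so it stays inside the panorama. Both the clamping and the parameter names are illustrative assumptions:

```python
def rect_view(center, half_w, half_h, frame_w, frame_h):
    """Rectangular recommendation picture centered on the target position,
    extended half_w / half_h along the X and Y axes, clamped to the frame."""
    cx, cy = center
    x1 = max(0, int(cx - half_w))
    y1 = max(0, int(cy - half_h))
    x2 = min(frame_w, int(cx + half_w))
    y2 = min(frame_h, int(cy + half_h))
    return (x1, y1, x2, y2)

# Target near the top-left corner of a 1920x960 equirectangular frame:
print(rect_view((100, 80), 160, 90, 1920, 960))  # (0, 0, 260, 170)
```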
In this embodiment, the computer device determines the target candidate region of each single-frame image from its candidate regions based on the confidences or areas of those regions, and takes the center point of each target candidate region as the position of the target object in that image. It then generates a recommendation picture for each single-frame image according to the position of the target object, and composes the recommendation pictures into a recommended viewing angle video, so that each frame of the generated video contains the target object with the highest confidence or an appropriate area, thereby improving the picture effect of the recommended viewing angle video.
In one embodiment, the feature parameter includes an area of the candidate region in a single frame image, the single frame image set includes N single frame images, the N single frame images have a time sequence, N is a positive integer, and when the target candidate region of the single frame image is determined according to the area of the candidate region, S610 includes:
and if N is equal to 1, the computer device takes the candidate area with the largest area in the 1 st frame single-frame image as the target candidate area of the 1 st frame single-frame image.
The 1 st frame of single-frame image is the 1 st frame of image in the single-frame image set, and is not necessarily the 1 st frame of image in the panoramic video.
Specifically, the computer device obtains the areas of all candidate regions in the 1 st frame single-frame image in the single-frame image set, and determines the candidate region with the largest area as the target candidate region of the 1 st frame single-frame image.
In an optional embodiment, the feature parameter further includes a center point position of the candidate region, as shown in fig. 7, if N is greater than 1, the step S610 further includes:
S710, determining the center point position of each candidate region in the Nth-frame single-frame image.
Specifically, the computer device acquires the geometric center position of each candidate region in the single-frame image of the nth frame, and takes the geometric center position as the center point position of the corresponding candidate region.
S720, calculating Euclidean distances between the central point positions of all candidate areas in the single-frame image of the Nth frame and the central point positions of target candidate areas in the single-frame image of the (N-1) th frame.
And S730, determining the candidate region with the minimum Euclidean distance as a target candidate region of the single-frame image of the Nth frame.
Specifically, the computer device calculates the Euclidean distance between the center point position of each candidate region in the Nth-frame single-frame image and the center point position of the target candidate region of the (N-1)th-frame (i.e., the previous) single-frame image, and takes the candidate region with the smallest Euclidean distance as the target candidate region of the Nth-frame single-frame image. For example, after the target candidate region of the 1st-frame single-frame image is determined, when N is 2 the 2nd-frame single-frame image contains three candidate regions a, b, and c. The computer device computes the Euclidean distances L1, L2, and L3 between the center point of the 1st frame's target candidate region and the center points of a, b, and c; if L2 < L1 < L3, the candidate region b corresponding to the minimum distance L2 is determined as the target candidate region of the 2nd-frame single-frame image. The device then computes the Euclidean distances between the center point of the 2nd frame's target candidate region and the center points of the candidate regions in the 3rd-frame single-frame image and selects the candidate region with the smallest distance as the target candidate region of the 3rd frame, and so on, determining the target candidate region of each subsequent single-frame image from that of the preceding one.
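Steps S710 to S730 can be sketched as a nearest-center search; the candidate format with `name` and `center` fields is assumed for illustration:

```python
import math

def nearest_region(prev_center, candidates):
    """Pick the candidate whose center point has the smallest Euclidean
    distance to the previous frame's target candidate region center."""
    def dist(c):
        (px, py), (cx, cy) = prev_center, c["center"]
        return math.hypot(cx - px, cy - py)
    return min(candidates, key=dist)

# Previous target centered at (10, 10); three candidates a, b, c in the new frame.
cands = [{"name": "a", "center": (50, 50)},
         {"name": "b", "center": (12, 11)},
         {"name": "c", "center": (90, 10)}]
print(nearest_region((10, 10), cands)["name"])  # b
```

Applied frame by frame, this chains the target candidate regions together so the recommended picture keeps following the same object.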
In this embodiment, the computer device determines the candidate region with the largest area in the 1st single-frame image of the set as that image's target candidate region, and then, for each subsequent single-frame image, calculates the Euclidean distances between the center point of the previous image's target candidate region and the center points of the current image's candidate regions, determining the candidate region with the smallest distance as the current image's target candidate region. In this way, the recommended viewing angle video always contains the same target object, improving the continuity with which that object is displayed.
In an embodiment, to further improve the picture effect of the video with the recommended viewing angle, as shown in fig. 8, the above S630 includes:
and S810, acquiring the type of the target object included in the target candidate region to which the position of the target object of each single-frame image belongs.
Specifically, after determining the position of the target object in each single frame image, the computer device further obtains the type of the target object included in the target candidate region to which the position belongs, and determines the generated recommendation screen according to the type of the target object.
And S820, if the type of the target object is a preset target type, generating a recommended picture with a preset size and the position of the target object being at a preset picture position.
And S830, if the type of the target object is not the preset target type, generating a recommendation picture with the minimum area including the target candidate region.
Optionally, if the type of the target object is a preset target type, the computer device generates a recommendation picture of a preset size in which the position of the target object lies at a preset picture position. For example, if the preset target type is a human face and the type of the target object is a human face, the computer device generates a recommendation picture in which the face is located at 2/3 of the height and 1/2 of the width of the picture. If the type of the target object is not the preset target type, the computer device generates the recommendation picture of smallest area that contains the target candidate region. For example, if the type of the target object is a football field (not a human face), the computer device generates the smallest-area recommendation picture that contains the target candidate region in which the football field is located.
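A sketch of the type-dependent framing in S820 and S830, assuming a face as the preset target type, a 640x360 view size, and the 2/3-height, 1/2-width placement from the example above; all of these parameters are illustrative:

```python
def recommend_frame(target_type, center, region_bbox,
                    preset_type="face", view_w=640, view_h=360):
    """Preset-type targets get a fixed-size view with the target placed at
    1/2 of the width and 2/3 of the height; any other target gets the
    smallest view covering its whole candidate region."""
    cx, cy = center
    if target_type == preset_type:
        x1 = cx - view_w // 2            # target at 1/2 of the view width
        y1 = cy - (2 * view_h) // 3      # target at 2/3 of the view height
        return (x1, y1, x1 + view_w, y1 + view_h)
    x, y, w, h = region_bbox             # minimum view: the region's own bounds
    return (x, y, x + w, y + h)

print(recommend_frame("face", (320, 240), (300, 220, 40, 40)))   # fixed-size view
print(recommend_frame("court", (500, 300), (100, 50, 800, 400))) # covers region
```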
In the present embodiment, the computer device obtains the type of the target object contained in the target candidate region to which the position of the target object of each single-frame image belongs. If the type is a preset target type, it generates a recommendation picture of a preset size with the target object at a preset picture position; otherwise, it generates the smallest-area recommendation picture containing the target candidate region. Different recommendation pictures are thus determined according to the types of different target objects, so that the target object is appropriately placed in every picture of the recommended viewing angle video: the user sees a close-up of a target object of the preset target type and the whole of a target object of a non-preset type, further improving the picture effect of the recommended viewing angle video.
In one embodiment, to improve the fluency of the video with the recommended viewing angle, as shown in fig. 9, the step S640 includes:
S910, performing interpolation calculation on the position coordinates of the target object in adjacent recommended pictures by adopting an interpolation algorithm to obtain intermediate position coordinates.
Specifically, the single-frame image in which each recommended picture is located has a uniquely determined playing time in the panoramic video. The computer device applies an interpolation algorithm to the positions of the target object in recommended pictures that are adjacent in playing time and contain the same target object, so as to fill in the positions of the target object for the empty pictures between the two adjacent recommended pictures. Optionally, the computer device performs linear interpolation on the position coordinates of the target object in adjacent recommended pictures using the formula

P_{t+k} = P_t + (k / N) * (P_{t+N} - P_t)

to obtain the intermediate position coordinates, where P_t is the position coordinate of the target object in the recommended picture at the earlier playing time t, P_{t+N} is the position coordinate of the target object in the recommended picture at the later playing time t + N, and P_{t+k} is the position coordinate of the target object in the intermediate recommended picture corresponding to a time t + k (0 < k < N) between the two playing times.
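The linear interpolation step can be sketched directly from the variable definitions above; the 2-D tuple coordinate format is an assumption for illustration:

```python
def lerp_positions(p_t, p_t_plus_n, n):
    """Linearly interpolate the N-1 intermediate target positions between the
    recommended pictures at playing times t and t+N."""
    (x0, y0), (x1, y1) = p_t, p_t_plus_n
    return [(x0 + (x1 - x0) * k / n, y0 + (y1 - y0) * k / n)
            for k in range(1, n)]

print(lerp_positions((0, 0), (100, 50), 5))
# [(20.0, 10.0), (40.0, 20.0), (60.0, 30.0), (80.0, 40.0)]
```

Each interpolated position then anchors one intermediate recommended picture, filling the gap between the two sampled frames.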
Optionally, the position coordinates of the target object may be its coordinates in the corresponding recommended picture, its coordinates in the corresponding single-frame image, or its coordinates in the actual environment.
And S920, generating an intermediate recommendation picture comprising the target object according to the intermediate position coordinates.
And the playing time of the middle recommendation picture is positioned between the adjacent recommendation pictures.
And S930, generating a recommended viewing visual angle video by sequencing the recommended pictures and the middle recommended pictures from front to back according to the playing time.
Specifically, the computer device generates an intermediate recommended picture which has the same size as the recommended picture and comprises the same target object based on the intermediate position coordinates, and generates the recommended viewing angle video by the recommended picture and the intermediate recommended picture according to the sequence from front to back of the playing time.
Optionally, the computer device may further filter the positions of the target object in the recommended pictures and intermediate recommended pictures that form the recommended viewing angle video using a Kalman filtering algorithm or a variant thereof, a sliding-window averaging method, or another filtering algorithm, so that the generated recommended viewing angle video is more stable and less jittery, further improving its picture effect.
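The sliding-window averaging variant mentioned above can be sketched as follows (a Kalman filter would be the heavier alternative); the window size of 3 is an illustrative assumption:

```python
def smooth_positions(positions, window=3):
    """Sliding-window average over the target positions to reduce jitter.
    Window edges shrink near the start and end of the sequence."""
    half = window // 2
    out = []
    for i in range(len(positions)):
        lo, hi = max(0, i - half), min(len(positions), i + half + 1)
        xs = [p[0] for p in positions[lo:hi]]
        ys = [p[1] for p in positions[lo:hi]]
        out.append((sum(xs) / len(xs), sum(ys) / len(ys)))
    return out

# A jittery horizontal position is pulled toward its local mean:
print(smooth_positions([(0, 0), (10, 0), (0, 0), (10, 0)]))
```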
In this embodiment, the computer device performs interpolation on the position coordinates of the target object in adjacent recommended pictures to obtain intermediate position coordinates, generates from them intermediate recommended pictures that contain the target object and whose playing times lie between those of the adjacent recommended pictures, and then forms the recommended viewing angle video by ordering the recommended pictures and intermediate recommended pictures from earlier to later playing time. Supplementing intermediate recommended pictures between recommended pictures whose playing times are not contiguous further improves the fluency of the recommended viewing angle video.
In one embodiment, the method for playing the panoramic video may generate at least two recommended viewing angle videos for at least two target objects. For example, in a live or recorded broadcast of a basketball event, a user sets a commentator and the basketball court as target objects in advance. The computer device performs target detection for the commentator and the basketball court on the single-frame images of the event's panoramic video: it detects the commentator in each single-frame image, generates first recommendation pictures in which the commentator is centered, and composes them into a first recommended viewing angle video for the commentator. Meanwhile, it detects the basketball court in each single-frame image, generates second recommendation pictures of smallest area that contain the whole basketball court, and composes them into a second recommended viewing angle video for the basketball court. The computer device then displays the first recommended viewing angle video for the commentator and the second recommended viewing angle video for the basketball court on the same display picture together with the preset viewing angle video.
In one embodiment, as shown in fig. 10, there is provided a method for automatically generating a recommended viewing angle video from a panoramic video, including:
and S1010, acquiring the panoramic video, and performing target detection on the panoramic video to obtain a detection result.
The detection result comprises a candidate region where the target object is located.
And S1020, generating a recommended viewing angle video according to the detection result.
Wherein, the picture content of the recommended viewing angle video includes the target object.
Specifically, the generation process of the recommended viewing angle video based on the panoramic video may refer to the embodiments shown in fig. 5 to 9, and details are not repeated here.
It should be understood that although the steps in the flowcharts of fig. 2-10 are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated otherwise herein, there is no strict order restriction on the execution of these steps, and they may be performed in other orders. Moreover, at least some of the steps in fig. 2-10 may include multiple sub-steps or stages, which are not necessarily performed at the same moment but may be performed at different moments, and not necessarily in sequence, but in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
In one embodiment, as shown in fig. 11, there is provided a playing apparatus of a panoramic video, including: an object detection module 1101, a video generation module 1102 and a synchronized presentation module 1103, wherein:
the target detection module 1101 is configured to obtain a panoramic video, perform target detection on the panoramic video, and obtain a detection result; the detection result comprises a candidate area where the target object is located; the video generating module 1102 is configured to generate a recommended viewing angle video according to the detection result; the image content of the recommended viewing angle video comprises a target object; the synchronous display module 1103 is configured to display the recommended viewing angle video and the preset viewing angle video on the same display screen; the picture content of the preset viewing angle video is different from the picture content of the recommended viewing angle video.
In one embodiment, the target detection module 1101 is specifically configured to:
extracting single-frame images in the panoramic video through a preset interval to obtain a single-frame image set; performing target detection on each single-frame image in the single-frame image set to obtain a detection result; the detection result comprises candidate regions corresponding to the single-frame images, and the candidate regions comprise target objects.
In one embodiment, the video generation module 1102 is specifically configured to:
determining a target candidate region of each single-frame image based on the characteristic parameters of the candidate regions corresponding to the single-frame images; determining the central point of the target candidate region of each single-frame image as the position of a target object in each single-frame image; generating a recommendation picture corresponding to each single-frame image according to the position of the target object of each single-frame image; and generating a recommended viewing angle video according to the recommended picture corresponding to each single-frame image.
In one embodiment, the feature parameters include confidence of the candidate region in a single frame image, and the video generation module 1102 is specifically configured to:
and acquiring the candidate region with the maximum confidence in each single-frame image as a target candidate region.
In one embodiment, the characteristic parameter includes the area of the candidate region in a single-frame image, the single-frame image set includes N single-frame images, the N single-frame images have a time sequence, and N is a positive integer; the video generation module 1102 is specifically configured to:
if N is 1, the candidate region with the largest area in the 1 st frame single frame image is used as the target candidate region of the 1 st frame single frame image.
In one embodiment, the feature parameters further include a center point position of the candidate region, and the video generation module 1102 is further configured to:
if N is larger than 1, determining the position of the central point of each candidate area in the single-frame image of the Nth frame; calculating Euclidean distances between the central point position of each candidate region in the N-th frame single-frame image and the central point position of the target candidate region in the N-1-th frame single-frame image; and determining the candidate region with the minimum Euclidean distance as a target candidate region of the single-frame image of the Nth frame.
In one embodiment, the video generation module 1102 is specifically configured to:
acquiring the type of a target object included in a target candidate region to which the position of the target object of each single-frame image belongs; if the type of the target object is a preset target type, generating a recommendation picture with a preset size and the position of the target object being at a preset picture position; and if the type of the target object is not the preset target type, generating a recommendation picture with the minimum area including the target candidate region.
In one embodiment, the video generation module 1102 is specifically configured to:
performing interpolation calculation on the position coordinates of the target object in the adjacent recommended pictures by adopting an interpolation algorithm to obtain intermediate position coordinates; generating an intermediate recommendation picture including the target object according to the intermediate position coordinates; the playing time of the middle recommendation picture is positioned between the adjacent recommendation pictures; and generating a recommended viewing angle video by sequencing the recommended pictures and the middle recommended pictures from front to back according to the playing time.
In one embodiment, as shown in fig. 12, there is provided an apparatus for automatically generating a recommended viewing angle video from a panoramic video, including: an object detection module 1201 and a video generation module 1202. Wherein:
the object detection module 1201 has the same function as the object detection module 1101, and the video generation module 1202 has the same function as the video generation module 1102, which is not described herein again.
For specific limitations of the playback apparatus of the panoramic video, the above limitations on the playback method of the panoramic video may be referred to, and for specific limitations of the apparatus for automatically generating the recommended viewing angle video of the panoramic video, the above limitations on the method for automatically generating the recommended viewing angle video of the panoramic video may be referred to, and details are not repeated here. All modules in the panoramic video playing device and the panoramic video automatic generation recommended viewing angle video device can be completely or partially realized through software, hardware and combination thereof. The modules can be embedded in a hardware form or independent from a processor in the computer device, and can also be stored in a memory in the computer device in a software form, so that the processor can call and execute operations corresponding to the modules.
In one embodiment, a computer device is provided, comprising a memory and a processor, the memory having a computer program stored therein, the processor implementing the following steps when executing the computer program:
acquiring a panoramic video, and carrying out target detection on the panoramic video to obtain a detection result; the detection result comprises a candidate area where the target object is located; generating a recommended viewing angle video according to the detection result; the image content of the recommended viewing angle video comprises a target object; displaying the recommended viewing angle video and the preset viewing angle video on the same display picture; the picture content of the preset viewing angle video is different from the picture content of the recommended viewing angle video.
In one embodiment, the processor, when executing the computer program, further performs the steps of:
extracting single-frame images in the panoramic video through a preset interval to obtain a single-frame image set; performing target detection on each single-frame image in the single-frame image set to obtain a detection result; the detection result comprises candidate regions corresponding to the single-frame images, and the candidate regions comprise target objects.
In one embodiment, the processor, when executing the computer program, further performs the steps of:
determining a target candidate region of each single-frame image based on the characteristic parameters of the candidate regions corresponding to the single-frame images; determining the central point of the target candidate region of each single-frame image as the position of a target object in each single-frame image; generating a recommendation picture corresponding to each single-frame image according to the position of the target object of each single-frame image; and generating a recommended viewing angle video according to the recommended picture corresponding to each single-frame image.
In one embodiment, the feature parameters include a confidence level of the candidate region in the single frame image, and the processor when executing the computer program further performs the steps of:
and acquiring the candidate region with the maximum confidence in each single-frame image as a target candidate region.
In one embodiment, the characteristic parameter includes the area of the candidate region in a single-frame image, the single-frame image set includes N single-frame images, the N single-frame images have a time sequence, and N is a positive integer; the processor, when executing the computer program, further performs the steps of:
if N is 1, the candidate region with the largest area in the 1 st frame single frame image is used as the target candidate region of the 1 st frame single frame image.
In one embodiment, the feature parameters further include a center point position of the candidate region, and the processor, when executing the computer program, further performs the steps of:
if N is larger than 1, determining the position of the central point of each candidate area in the single-frame image of the Nth frame; calculating Euclidean distances between the central point position of each candidate region in the N-th frame single-frame image and the central point position of the target candidate region in the N-1-th frame single-frame image; and determining the candidate region with the minimum Euclidean distance as a target candidate region of the single-frame image of the Nth frame.
In one embodiment, the processor, when executing the computer program, further performs the steps of:
acquiring the type of a target object included in a target candidate region to which the position of the target object of each single-frame image belongs; if the type of the target object is a preset target type, generating a recommendation picture with a preset size and the position of the target object being at a preset picture position; and if the type of the target object is not the preset target type, generating a recommendation picture with the minimum area including the target candidate region.
In one embodiment, the processor, when executing the computer program, further performs the steps of:
performing interpolation calculation on the position coordinates of the target object in the adjacent recommended pictures by adopting an interpolation algorithm to obtain intermediate position coordinates; generating an intermediate recommendation picture including the target object according to the intermediate position coordinates; the playing time of the middle recommendation picture is positioned between the adjacent recommendation pictures; and generating a recommended viewing angle video by sequencing the recommended pictures and the middle recommended pictures from front to back according to the playing time.
In one embodiment, a computer-readable storage medium is provided, having a computer program stored thereon, which when executed by a processor, performs the steps of:
acquiring a panoramic video, and carrying out target detection on the panoramic video to obtain a detection result; the detection result comprises a candidate area where the target object is located; generating a recommended viewing angle video according to the detection result; the image content of the recommended viewing angle video comprises a target object; displaying the recommended viewing angle video and the preset viewing angle video on the same display picture; the picture content of the preset viewing angle video is different from the picture content of the recommended viewing angle video.
In one embodiment, the computer program when executed by the processor further performs the steps of:
extracting single-frame images from the panoramic video at a preset interval to obtain a single-frame image set; and performing target detection on each single-frame image in the single-frame image set to obtain a detection result; the detection result comprises the candidate regions corresponding to the single-frame images, and the candidate regions comprise the target object.
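The sampling-plus-detection step above can be sketched as follows. The dict layout of a candidate region and the pluggable `detector` callable are illustrative assumptions, not part of the patent:

```python
def extract_frame_set(frames, interval):
    """Sample one single-frame image every `interval` decoded frames
    of the panoramic video."""
    if interval < 1:
        raise ValueError("interval must be >= 1")
    return frames[::interval]

def detect_candidates(frame, detector):
    """Run target detection on one single-frame image; `detector` is any
    callable (e.g. a detection-network wrapper) returning candidate
    regions as dicts such as {'x', 'y', 'w', 'h', 'confidence', 'label'}."""
    return detector(frame)
```

With `interval = 3`, a 10-frame sequence yields the single-frame image set at indices 0, 3, 6, 9.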
In one embodiment, the computer program when executed by the processor further performs the steps of:
determining a target candidate region of each single-frame image based on the characteristic parameters of the candidate regions corresponding to the single-frame images; determining the central point of the target candidate region of each single-frame image as the position of the target object in that single-frame image; generating a recommended picture corresponding to each single-frame image according to the position of the target object in that single-frame image; and generating a recommended viewing angle video from the recommended pictures corresponding to the single-frame images.
In an embodiment, the feature parameter comprises a confidence level of the candidate region in the single frame image, the computer program, when executed by the processor, further performs the steps of:
acquiring the candidate region with the maximum confidence in each single-frame image as the target candidate region.
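A minimal sketch of the confidence-based selection, assuming candidates are dicts carrying the detector's `confidence` score:

```python
def select_by_confidence(candidates):
    """Target candidate region = the candidate the detector scored with
    the maximum confidence in this single-frame image."""
    return max(candidates, key=lambda c: c["confidence"])
```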
In one embodiment, the characteristic parameter includes the area of the candidate region in a single-frame image, the single-frame image set includes N single-frame images, the N single-frame images have a time sequence, and N is a positive integer; the computer program when executed by the processor further realizes the steps of:
if N = 1, taking the candidate region with the largest area in the 1st single-frame image as the target candidate region of the 1st single-frame image.
In an embodiment, the feature parameters further comprise a position of a center point of the candidate region, the computer program, when executed by the processor, further performing the steps of:
if N is greater than 1, determining the center point position of each candidate region in the Nth single-frame image; calculating the Euclidean distance between the center point position of each candidate region in the Nth single-frame image and the center point position of the target candidate region in the (N-1)th single-frame image; and determining the candidate region with the smallest Euclidean distance as the target candidate region of the Nth single-frame image.
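The area-based initialization (N = 1) and the nearest-center tracking rule (N > 1) can be combined in one selector; the dict-based box format and function names are illustrative:

```python
import math

def region_center(c):
    # center point of an axis-aligned candidate box {x, y, w, h}
    return (c["x"] + c["w"] / 2.0, c["y"] + c["h"] / 2.0)

def select_target_region(candidates, prev_center=None):
    """N = 1 (no previous target): pick the largest-area candidate.
    N > 1: pick the candidate whose center is nearest, by Euclidean
    distance, to the previous frame's target candidate region center."""
    if prev_center is None:
        return max(candidates, key=lambda c: c["w"] * c["h"])
    return min(candidates, key=lambda c: math.dist(region_center(c), prev_center))
```

Tracking the nearest center between consecutive sampled frames keeps the selected target consistent over time even when several candidates are detected per frame.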
In one embodiment, the computer program when executed by the processor further performs the steps of:
acquiring the type of the target object included in the target candidate region to which the position of the target object of each single-frame image belongs; if the type of the target object is a preset target type, generating a recommended picture of a preset size in which the target object is located at a preset picture position; and if the type of the target object is not the preset target type, generating a recommended picture of the minimum area that includes the target candidate region.
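A sketch of the two framing rules, assuming pixel coordinates in the panoramic frame; the preset type set, preset output size, and anchor position are hypothetical parameters standing in for the patent's unspecified presets:

```python
def make_recommended_picture(region, preset_types=frozenset({"person"}),
                             preset_size=(1920, 1080), anchor=(0.5, 0.5)):
    """Two framing rules from the embodiment:
    - target of a preset type: fixed-size picture placing the target
      center at a preset picture position (`anchor`, as a fraction);
    - any other type: the minimum-area picture that still contains
      the whole target candidate region."""
    cx = region["x"] + region["w"] / 2.0
    cy = region["y"] + region["h"] / 2.0
    if region["label"] in preset_types:
        w, h = preset_size
        return {"left": cx - anchor[0] * w, "top": cy - anchor[1] * h,
                "w": w, "h": h}
    # minimal-area picture: exactly the candidate region's bounding box
    return {"left": region["x"], "top": region["y"],
            "w": region["w"], "h": region["h"]}
```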
In one embodiment, the computer program when executed by the processor further performs the steps of:
performing interpolation calculation on the position coordinates of the target object in adjacent recommended pictures by using an interpolation algorithm to obtain intermediate position coordinates; generating an intermediate recommended picture including the target object according to the intermediate position coordinates, where the playing time of the intermediate recommended picture falls between those of the adjacent recommended pictures; and generating the recommended viewing angle video by arranging the recommended pictures and the intermediate recommended pictures in order of playing time.
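The patent leaves the interpolation algorithm open; linear interpolation between the target positions of two adjacent recommended pictures is one simple choice:

```python
def interpolate_positions(p0, p1, num_intermediate):
    """Linearly interpolate between the target-object positions of two
    adjacent recommended pictures, returning the intermediate position
    coordinates in play order (endpoints excluded)."""
    (x0, y0), (x1, y1) = p0, p1
    steps = num_intermediate + 1
    return [(x0 + (x1 - x0) * i / steps, y0 + (y1 - y0) * i / steps)
            for i in range(1, steps)]
```

Each intermediate position then yields one intermediate recommended picture, so the viewing angle pans smoothly between the sparsely sampled single-frame images instead of jumping.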
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program instructing the relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, a database, or another medium used in the embodiments provided herein can include at least one of non-volatile and volatile memory. Non-volatile memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash memory, optical storage, or the like. Volatile memory can include Random Access Memory (RAM) or an external cache memory. By way of illustration and not limitation, RAM can take many forms, such as Static Random Access Memory (SRAM) or Dynamic Random Access Memory (DRAM), among others.
The technical features of the above embodiments can be combined arbitrarily. For brevity, not all possible combinations of the technical features in the above embodiments are described; however, any such combination should be considered within the scope of this specification as long as it contains no contradiction.
The above embodiments express only several implementations of the present application, and their description is relatively specific and detailed, but they should not therefore be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and improvements without departing from the concept of the present application, and these all fall within the protection scope of the present application. Therefore, the protection scope of this patent shall be subject to the appended claims.

Claims (13)

1. A method for playing a panoramic video, the method comprising:
acquiring a panoramic video, and carrying out target detection on the panoramic video to obtain a detection result; wherein the detection result comprises a candidate region where the target object is located;
generating a recommended viewing angle video according to the detection result; wherein the picture content of the recommended viewing angle video comprises the target object;
displaying the recommended viewing angle video and a preset viewing angle video on the same display picture; and the picture content of the preset viewing angle video is different from the picture content of the recommended viewing angle video.
2. The method of claim 1, wherein the performing target detection on the panoramic video to obtain a detection result comprises:
extracting single-frame images in the panoramic video through a preset interval to obtain a single-frame image set;
performing target detection on each single-frame image in the single-frame image set to obtain the detection result; wherein the detection result comprises candidate regions corresponding to the single-frame images respectively, and the candidate regions comprise the target object.
3. The method of claim 2, wherein the generating a recommended viewing angle video according to the detection result comprises:
determining a target candidate region of each single-frame image based on the characteristic parameters of the candidate regions corresponding to the single-frame images;
determining the central point of a target candidate region of each single-frame image as the position of the target object in each single-frame image;
generating a recommendation picture corresponding to each single-frame image according to the position of the target object in each single-frame image;
and generating the recommended viewing angle video according to the recommended picture corresponding to each single-frame image.
4. The method according to claim 3, wherein the feature parameters include the confidence of each candidate region in the single-frame images, and the determining the target candidate region of each single-frame image based on the feature parameters of the candidate regions corresponding to the single-frame images comprises:
acquiring the candidate region with the maximum confidence in each single-frame image as the target candidate region.
5. The method according to claim 3, wherein the feature parameter comprises an area of the candidate region in the single-frame image, the single-frame image set comprises N single-frame images, N single-frame images have a time sequence therebetween, and N is a positive integer;
the determining a target candidate region of each single-frame image from the candidate region corresponding to each single-frame image based on the characteristic parameters comprises:
if N is equal to 1, taking the candidate region with the largest area in the 1st single-frame image as the target candidate region of the 1st single-frame image.
6. The method of claim 5, wherein the feature parameter further includes a center point position of the candidate region, and wherein the determining the target candidate region of each single-frame image from the candidate region corresponding to each single-frame image based on the feature parameter further includes:
if N is larger than 1, determining the position of the central point of each candidate area in the single-frame image of the Nth frame;
calculating the Euclidean distance between the center point position of each candidate region in the Nth single-frame image and the center point position of the target candidate region in the (N-1)th single-frame image;
and determining the candidate region with the minimum Euclidean distance as a target candidate region of the single-frame image of the Nth frame.
7. The method according to claim 3, wherein the generating a recommended picture corresponding to each single-frame image according to the position of the target object in each single-frame image comprises:
acquiring the type of a target object included in a target candidate region to which the position of the target object belongs in each single-frame image;
if the type of the target object is a preset target type, generating a recommended picture with a preset size, wherein the position of the target object is located at a preset picture position;
and if the type of the target object is not the preset target type, generating the recommendation picture with the minimum area including the target candidate region.
8. The method according to any one of claims 3 to 7, wherein the generating the recommended viewing angle video according to the recommended picture corresponding to each single-frame image comprises:
performing interpolation calculation on the position coordinates of the target object in the adjacent recommended pictures by adopting an interpolation algorithm to obtain intermediate position coordinates;
generating an intermediate recommended picture comprising the target object according to the intermediate position coordinates; wherein the playing time of the intermediate recommended picture falls between those of the adjacent recommended pictures;
and generating the recommended viewing angle video by arranging the recommended pictures and the intermediate recommended pictures in order of playing time from front to back.
9. A method for automatically generating a video with a recommended viewing angle from a panoramic video, the method comprising:
acquiring a panoramic video, and carrying out target detection on the panoramic video to obtain a detection result; wherein the detection result comprises a candidate region where the target object is located;
generating a recommended viewing angle video according to the detection result; wherein the picture content of the recommended viewing angle video comprises the target object.
10. An apparatus for playing a panoramic video, the apparatus comprising:
the target detection module is used for acquiring a panoramic video and carrying out target detection on the panoramic video to obtain a detection result; wherein the detection result comprises a candidate region where the target object is located;
the video generation module is used for generating a recommended viewing angle video according to the detection result; wherein the picture content of the recommended viewing angle video comprises the target object;
the synchronous display module is used for displaying the recommended viewing angle video and a preset viewing angle video on the same display picture; and the picture content of the preset viewing angle video is different from the picture content of the recommended viewing angle video.
11. An apparatus for automatically generating a video with a recommended viewing angle from a panoramic video, the apparatus comprising:
the target detection module is used for acquiring a panoramic video and carrying out target detection on the panoramic video to obtain a detection result; wherein the detection result comprises a candidate region where the target object is located;
the video generation module is used for generating a recommended viewing angle video according to the detection result; wherein the picture content of the recommended viewing angle video comprises the target object.
12. A computer device comprising a memory and a processor, the memory storing a computer program, wherein the processor implements the steps of the method of any one of claims 1 to 9 when executing the computer program.
13. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 9.
CN202110307765.XA 2021-03-23 2021-03-23 Panoramic video playing method and device, computer equipment and storage medium Pending CN112954443A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202110307765.XA CN112954443A (en) 2021-03-23 2021-03-23 Panoramic video playing method and device, computer equipment and storage medium
PCT/CN2022/081149 WO2022199441A1 (en) 2021-03-23 2022-03-16 360-degree video playback method and apparatus, computer device, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110307765.XA CN112954443A (en) 2021-03-23 2021-03-23 Panoramic video playing method and device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN112954443A true CN112954443A (en) 2021-06-11

Family

ID=76228061

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110307765.XA Pending CN112954443A (en) 2021-03-23 2021-03-23 Panoramic video playing method and device, computer equipment and storage medium

Country Status (2)

Country Link
CN (1) CN112954443A (en)
WO (1) WO2022199441A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022199441A1 (en) * 2021-03-23 2022-09-29 影石创新科技股份有限公司 360-degree video playback method and apparatus, computer device, and storage medium
CN117710756A (en) * 2024-02-04 2024-03-15 成都数之联科技股份有限公司 Target detection and model training method, device, equipment and medium

Families Citing this family (1)

Publication number Priority date Publication date Assignee Title
CN115953726B (en) * 2023-03-14 2024-02-27 深圳中集智能科技有限公司 Machine vision container face damage detection method and system

Citations (7)

Publication number Priority date Publication date Assignee Title
CN106897735A (en) * 2017-01-19 2017-06-27 博康智能信息技术有限公司上海分公司 The tracking and device of a kind of Fast Moving Object
CN107872731A (en) * 2017-11-22 2018-04-03 三星电子(中国)研发中心 Panoramic video player method and device
CN109753883A (en) * 2018-12-13 2019-05-14 北京字节跳动网络技术有限公司 Video locating method, device, storage medium and electronic equipment
CN109788370A (en) * 2019-01-14 2019-05-21 北京奇艺世纪科技有限公司 A kind of panoramic video playback method, device and electronic equipment
CN110197126A (en) * 2019-05-06 2019-09-03 深圳岚锋创视网络科技有限公司 A kind of target tracking method, device and portable terminal
US10616620B1 (en) * 2016-04-06 2020-04-07 Ambarella International Lp Low bitrate encoding of spherical video to support live streaming over a high latency and/or low bandwidth network
CN111954003A (en) * 2019-05-17 2020-11-17 阿里巴巴集团控股有限公司 Panoramic video playing method and device

Family Cites Families (5)

Publication number Priority date Publication date Assignee Title
CN106331732B (en) * 2016-09-26 2019-11-12 北京疯景科技有限公司 Generate, show the method and device of panorama content
CN108632674B (en) * 2017-03-23 2021-09-21 华为技术有限公司 Panoramic video playing method and client
WO2018234622A1 (en) * 2017-06-21 2018-12-27 Nokia Technologies Oy A method for detecting events-of-interest
CN111309147B (en) * 2020-02-12 2024-05-31 咪咕视讯科技有限公司 Panoramic video playing method, device and storage medium
CN112954443A (en) * 2021-03-23 2021-06-11 影石创新科技股份有限公司 Panoramic video playing method and device, computer equipment and storage medium

Cited By (3)

Publication number Priority date Publication date Assignee Title
WO2022199441A1 (en) * 2021-03-23 2022-09-29 影石创新科技股份有限公司 360-degree video playback method and apparatus, computer device, and storage medium
CN117710756A (en) * 2024-02-04 2024-03-15 成都数之联科技股份有限公司 Target detection and model training method, device, equipment and medium
CN117710756B (en) * 2024-02-04 2024-04-26 成都数之联科技股份有限公司 Target detection and model training method, device, equipment and medium

Also Published As

Publication number Publication date
WO2022199441A1 (en) 2022-09-29

Similar Documents

Publication Publication Date Title
US10440407B2 (en) Adaptive control for immersive experience delivery
US9654844B2 (en) Methods and apparatus for content interaction
US10368123B2 (en) Information pushing method, terminal and server
CN112954443A (en) Panoramic video playing method and device, computer equipment and storage medium
US10284789B2 (en) Dynamic generation of image of a scene based on removal of undesired object present in the scene
US11830161B2 (en) Dynamically cropping digital content for display in any aspect ratio
JP5347279B2 (en) Image display device
US20160050368A1 (en) Video processing apparatus for generating paranomic video and method thereof
US11636610B2 (en) Determining multiple camera positions from multiple videos
WO2017211250A1 (en) Image overlay display method and system
US20200145590A1 (en) Computer-implemented method for generating an output video from multiple video sources
WO2020037881A1 (en) Motion trajectory drawing method and apparatus, and device and storage medium
WO2016045381A1 (en) Image presenting method, terminal device and server
CN111241872B (en) Video image shielding method and device
US10764493B2 (en) Display method and electronic device
JP2014041433A (en) Display device, display method, television receiver, and display control device
US20170134794A1 (en) Graphic Reference Matrix for Virtual Insertions
US11622099B2 (en) Information-processing apparatus, method of processing information, and program
KR102343267B1 (en) Apparatus and method for providing 360-degree video application using video sequence filmed in multiple viewer location
CN111556338B (en) Method for detecting region in video, method for information fusion, device and storage medium
KR102487976B1 (en) Electronic device and operating method for the same
Lee Novel video stabilization for real-time optical character recognition applications
KR102372181B1 (en) Display device and method for control thereof
JP6632134B2 (en) Image processing apparatus, image processing method, and computer program
US10674207B1 (en) Dynamic media placement in video feed

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination