CN115294493A - Visual angle path acquisition method and device, electronic equipment and medium - Google Patents


Info

Publication number
CN115294493A
CN115294493A
Authority
CN
China
Prior art keywords
target
frame image
key frame
image
visual angle
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210882895.0A
Other languages
Chinese (zh)
Inventor
符峥
龙良曲
姜文杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Insta360 Innovation Technology Co Ltd
Original Assignee
Insta360 Innovation Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Insta360 Innovation Technology Co Ltd filed Critical Insta360 Innovation Technology Co Ltd
Priority to CN202210882895.0A priority Critical patent/CN115294493A/en
Publication of CN115294493A publication Critical patent/CN115294493A/en
Priority to PCT/CN2023/108962 priority patent/WO2024022301A1/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/46Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformations in the plane of the image
    • G06T3/04Context-preserving transformations, e.g. by using an importance map
    • G06T3/047Fisheye or wide-angle transformations
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformations in the plane of the image
    • G06T3/08Projecting images onto non-planar surfaces, e.g. geodetic screens
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformations in the plane of the image
    • G06T3/40Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4023Scaling of whole images or parts thereof, e.g. expanding or contracting based on decimating pixels or lines of pixels; based on inserting pixels or lines of pixels
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/60Analysis of geometric attributes
    • G06T7/66Analysis of geometric attributes of image moments or centre of gravity
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/46Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462Salient features, e.g. scale invariant feature transforms [SIFT]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Geometry (AREA)
  • Image Analysis (AREA)

Abstract

The embodiments of the present application provide a method, an apparatus, an electronic device, and a medium for obtaining a view angle path. The method includes: acquiring a view angle target of a first key frame image of a panoramic video, where each key frame image of the panoramic video is obtained by performing frame extraction processing on the panoramic video; acquiring a view angle target of a target frame image of the panoramic video according to the view angle target of the first key frame image, where the target frame image is located between the first key frame image and a second key frame image, and the second key frame image is the next key frame image after the first key frame image in the panoramic video; and obtaining the view angle path of the panoramic video according to the obtained view angle targets.

Description

Visual angle path acquisition method and device, electronic equipment and medium
Technical Field
The present application relates to the field of video processing technologies, and in particular, to a method and an apparatus for obtaining a view path, an electronic device, and a medium.
Background
A panoramic video captures all of the scenes around an observation point in space and is composed of all the light rays received at that observation point. For a captured panoramic video, a view angle path of the panoramic video can be acquired.
Disclosure of Invention
The embodiment of the invention provides a method and a device for obtaining a view angle path, electronic equipment and a medium, which can obtain the view angle path of a panoramic video.
In a first aspect, an embodiment of the present invention provides a method for obtaining a view path, including: acquiring a visual angle target of a first key frame image of a panoramic video, wherein each key frame image of the panoramic video is obtained by performing frame extraction processing on the panoramic video; acquiring a visual angle target of a target frame image of the panoramic video according to the visual angle target of the first key frame image, wherein the target frame image is positioned between the first key frame image and a second key frame image, and the second key frame image is a next key frame image of the first key frame image in the panoramic video; and obtaining the view angle path of the panoramic video according to the obtained view angle targets.
Optionally, the obtaining a view angle target of a target frame image of the panoramic video according to the view angle target of the first key frame image includes: and performing target tracking processing on the target frame image according to the visual angle target of the first key frame image to obtain the visual angle target of the target frame image.
Optionally, the method further comprises: performing frame extraction processing on each frame image positioned between the first key frame image and the second key frame image in the panoramic video to obtain at least one frame of first image;
and taking the first image of each frame as the target frame image respectively.
Optionally, the obtaining at least one frame of the first image includes: obtaining at least one frame of first image and other frame images except the at least one frame of first image; the method further comprises the following steps: and obtaining the visual angle targets of the other frames of images according to the visual angle target of the first key frame image, the visual angle target of the second key frame image and the visual angle target of the first image of each frame.
Optionally, the method further comprises: acquiring a visual angle target of the second key frame image;
the obtaining of the view target of the target frame image of the panoramic video according to the view target of the first key frame image includes: and obtaining the visual angle target of the target frame image according to the visual angle target of the first key frame image and the visual angle target of the second key frame image.
Optionally, the acquiring the perspective target of the first key frame image includes: performing target detection on the first key frame image to obtain each target in the first key frame image; evaluating each target according to a preset multi-dimensional characteristic evaluation strategy to obtain an evaluation result of each target; and according to the evaluation result of each target, taking the target with the optimal evaluation result as the visual angle target of the first key frame image.
Optionally, the evaluating each target according to a preset multi-dimensional feature evaluation policy to obtain an evaluation result of each target includes: respectively executing the following operations on each target: for each first evaluation dimension in a plurality of preset evaluation dimensions, evaluating the target based on the first evaluation dimension to obtain a value of the target corresponding to the first evaluation dimension; obtaining a score of the target corresponding to the first evaluation dimension according to the value of the target corresponding to the first evaluation dimension and a preset weight for the first evaluation dimension; and taking the sum of the scores of the target corresponding to each preset evaluation dimension as an evaluation result of the target.
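The weighted multi-dimensional evaluation described above can be sketched as follows. The dimension names, weights, and target values are illustrative assumptions, not values from the patent; the patent only specifies the structure (per-dimension value × preset weight, summed into an evaluation result, best result wins).

```python
def evaluate_target(values: dict, weights: dict) -> float:
    """Evaluation result of one target: for each preset evaluation dimension,
    score = value * weight; the result is the sum of the per-dimension scores."""
    return sum(values[d] * weights[d] for d in weights)

def select_view_target(targets: list, weights: dict) -> dict:
    """Take the detected target with the optimal (here: highest) evaluation
    result as the view angle target of the key frame image."""
    return max(targets, key=lambda t: evaluate_target(t["features"], weights))

# Assumed dimensions and weights for illustration only.
weights = {"saliency": 0.5, "size": 0.3, "centrality": 0.2}
targets = [
    {"id": "person", "features": {"saliency": 0.9, "size": 0.4, "centrality": 0.6}},
    {"id": "car",    "features": {"saliency": 0.5, "size": 0.8, "centrality": 0.3}},
]
best = select_view_target(targets, weights)  # the person scores 0.69 vs. 0.55
```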
Optionally, the obtaining, according to the obtained each view target, a view path of the panoramic video includes: for each obtained visual angle target, according to a boundary box corresponding to the visual angle target, taking the central point of the boundary box as the viewpoint of the frame image where the visual angle target is located; and obtaining the view angle path of the panoramic video according to the obtained viewpoints.
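A minimal sketch of this viewpoint construction: each frame's viewpoint is the center point of its view angle target's bounding box, and the viewpoints in frame order form the view angle path. The (x_min, y_min, x_max, y_max) corner layout and the box values are assumptions for illustration.

```python
def bbox_center(box):
    """Viewpoint of a frame = center point of the view target's bounding box.
    box is (x_min, y_min, x_max, y_max) in pixel coordinates (assumed layout)."""
    x_min, y_min, x_max, y_max = box
    return ((x_min + x_max) / 2.0, (y_min + y_max) / 2.0)

# One bounding box per frame, in frame order; the centers form the path.
boxes = [(100, 50, 300, 250), (120, 60, 320, 260)]
view_path = [bbox_center(b) for b in boxes]  # [(200.0, 150.0), (220.0, 160.0)]
```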
Optionally, the method further comprises: for each obtained visual angle target, taking the central point of the boundary frame corresponding to the visual angle target as the central point of a visual angle, and obtaining the visual angle of the frame image where the visual angle target is located according to the boundary frame corresponding to the visual angle target, wherein the visual angle range corresponding to the visual angle is larger than or equal to the range of the boundary frame; generating a plane video frame corresponding to the frame image of the visual angle target according to the visual angle; and obtaining the plane video corresponding to the panoramic video according to the view angle path of the panoramic video and each generated plane video frame.
In a second aspect, an embodiment of the present invention provides a viewing angle path obtaining apparatus, including: the device comprises a first acquisition module, a second acquisition module and a third acquisition module, wherein the first acquisition module is used for acquiring a visual angle target of a first key frame image of a panoramic video, and each key frame image of the panoramic video is obtained by performing frame extraction processing on the panoramic video; a second obtaining module, configured to obtain a view angle target of a target frame image of the panoramic video according to the view angle target of the first key frame image, where the target frame image is located between the first key frame image and a second key frame image, and the second key frame image is a next key frame image of the first key frame image in the panoramic video; and the third acquisition module is used for acquiring the view angle path of the panoramic video according to the acquired view angle targets.
In a third aspect, an embodiment of the present invention provides an electronic device, which includes a processor and a memory, where the memory is used to store program instructions, and the processor is used to execute the program instructions to implement the method according to any one of the first aspect.
In a fourth aspect, the present invention provides a computer-readable storage medium on which program instructions are stored, the program instructions, when executed by a processor, implementing the method according to any one of the first aspect.
In the embodiments of the present invention, each key frame image of the panoramic video is obtained by performing frame extraction processing on the panoramic video. For the first key frame image of the panoramic video, the view angle target of the first key frame image is acquired. For the target frame image located between the first key frame image and the second key frame image, the view angle target of the target frame image is acquired according to the view angle target of the first key frame image, where the second key frame image is the next key frame image after the first key frame image in the panoramic video. The view angle path of the panoramic video is then obtained according to the acquired view angle targets. Therefore, the embodiments can acquire the view angle path of the panoramic video.
Drawings
To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings needed for describing the embodiments are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present invention; those skilled in the art can derive other drawings from these drawings without creative effort.
Fig. 1 is a schematic flowchart of a method for obtaining a view path according to an embodiment of the present invention;
fig. 2 is a schematic diagram for explaining a panoramic video according to an embodiment of the present invention;
FIG. 3 is a schematic view of a viewing path according to an embodiment of the present invention;
fig. 4 is a schematic diagram illustrating a view path obtaining method according to an embodiment of the present invention;
fig. 5 is a schematic diagram for explaining a manner of obtaining a view angle target according to an embodiment of the present invention;
fig. 6 is a schematic diagram for explaining another view-angle target obtaining manner according to an embodiment of the present invention;
fig. 7 is a schematic diagram for explaining another manner of acquiring the view-angle target according to the embodiment of the present invention;
fig. 8 is a block schematic diagram of a viewing angle path obtaining apparatus according to an embodiment of the present invention;
fig. 9 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
For better understanding of the technical solutions of the present invention, the following detailed descriptions of the embodiments of the present invention are provided with reference to the accompanying drawings.
It should be understood that the described embodiments are only some embodiments of the invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The terminology used in the embodiments of the invention is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in the examples of the present invention and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It is to be understood that the term "at least one" as used herein refers to one or more, and "a plurality" refers to two or more. The term "and/or" merely describes an association relationship between associated objects and indicates that three relationships may exist; for example, "A and/or B" may mean: A alone, both A and B, or B alone. In addition, the character "/" herein generally indicates an "or" relationship between the preceding and following objects. "At least one of the following" and similar expressions refer to any combination of the listed items, including any combination of single or plural items. For example, "at least one of a, b, and c" may represent: a, b, c, a-b, a-c, b-c, or a-b-c, where each of a, b, and c may be single or multiple.
It should be understood that although the terms first, second, etc. may be used to describe the set threshold in the embodiments of the present invention, these set thresholds should not be limited to these terms. These terms are used only to distinguish the set thresholds from each other. For example, a first set threshold may also be referred to as a second set threshold, and similarly, a second set threshold may also be referred to as a first set threshold, without departing from the scope of embodiments of the present invention.
The terminology used in the description of the embodiments section of the present application is for the purpose of describing particular embodiments of the present application only and is not intended to be limiting of the present application.
Before describing embodiments of the present application, some technical terms referred to in the present application will be described.
Saliency, in this application, generally refers to image saliency. Image saliency is an important visual feature of an image and represents the degree of importance the human eye attaches to each region of the image.
Interpolation may refer to inserting a new pixel between two original pixels and setting its color to the average of the surrounding pixels. For example, when a panoramic video is converted into a planar video, pixels in spherical coordinates may be mapped to specified positions in planar coordinates by coordinate conversion. Gaps then exist between pixels, and interpolation is needed to obtain a rectangular planar image/video.
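A minimal sketch of this averaging interpolation, filling one gap pixel from its two neighbors; the per-channel averaging and the RGB values are illustrative assumptions (a real implementation would average all surrounding pixels in 2D).

```python
def interpolate_gap(left, right):
    """Fill a gap pixel with the per-channel average of its two
    neighboring pixels, a minimal form of the averaging described above."""
    return tuple((a + b) / 2.0 for a, b in zip(left, right))

# RGB neighbors of a gap left by the spherical-to-planar mapping (assumed values).
filled = interpolate_gap((100, 150, 200), (110, 160, 210))  # -> (105.0, 155.0, 205.0)
```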
The field of view (FOV) refers to the angle between the two edges that extend from the projection focus to the maximum extent of the picture of the panoramic video.
Referring to fig. 2, the panoramic video can be abstracted into a sphere centered on the observation point. The projection focal length may be a distance from the center to the spherical surface, i.e., the value of the projection focal length may be a radius corresponding to the spherical surface. Possibly, the projection focal length may have a value of 1.
Referring to fig. 2 and 3, two dimensions Φ and θ can be used to represent the change of the viewpoint on the sphere, and the dimension Ω is used to represent the change of the video capturing time t.
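A sketch of the (φ, θ) viewpoint parameterization on the viewing sphere, with the projection focal length as the radius. The specific convention (θ as polar angle from +z, φ as azimuth) is an assumption for illustration; the text does not fix one.

```python
import math

def viewpoint_to_xyz(phi: float, theta: float, r: float = 1.0):
    """Map a viewpoint (phi, theta) on the viewing sphere to Cartesian
    coordinates. r is the projection focal length (taken as 1 above)."""
    x = r * math.sin(theta) * math.cos(phi)
    y = r * math.sin(theta) * math.sin(phi)
    z = r * math.cos(theta)
    return (x, y, z)

# theta = pi/2, phi = 0 points along the +x axis under this convention.
p = viewpoint_to_xyz(0.0, math.pi / 2)
```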
Referring to fig. 2, T = 5 in fig. 2 may correspond to 5 frames of video images.
Referring to fig. 3, T in fig. 3 may represent a T-th frame video image, and T +5 may represent a T + 5-th frame video image, and the frame images between the two frame images are not shown in fig. 3.
As shown in fig. 3, based on the viewpoint on each frame of video image and the chronological order of the change of each frame of video image with time, a viewpoint path as shown by a dotted line in fig. 3 can be obtained.
Before the embodiments of the present application are described, a conventional view path acquisition method is described.
In one feasible implementation, referring to fig. 3, the spherical image may be divided into a plurality of regions, and view path planning is defined as movement between the regions. This movement is converted into a learnable scheme: a model is trained on a large number of manually labeled view path samples, and finally the model automatically infers the optimal path on the panoramic video.
However, the final effect of this implementation depends strongly on the manually labeled view path samples, its interpretability is poor, and the model's performance is also low.
Technical implementation of the embodiments of the present application will be described below with reference to the accompanying drawings.
As shown in fig. 1, an embodiment of the present invention provides a method for obtaining a view path, including steps 101 to 103:
step 101, for a first key frame image of a panoramic video, acquiring a view angle target of the first key frame image, wherein each key frame image of the panoramic video is obtained by performing frame extraction processing on the panoramic video.
In this embodiment, frame extraction processing is performed on the panoramic video to realize "key frame detection", and the extracted key frames are recorded as key frame images, that is, only the key frame images are detected to determine the view angle target.
Feasibly, when performing frame extraction processing on the panoramic video, the key frames may be extracted starting from the first frame or from a specified position of the panoramic video. The frame extraction may be performed according to image content, at a fixed time interval, or in other ways. For example, target detection may be performed once every 15 frames.
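The fixed-interval variant of this frame extraction can be sketched as follows; the 15-frame interval comes from the example above, while the function name and start position are illustrative.

```python
def extract_key_frame_indices(num_frames: int, interval: int = 15, start: int = 0):
    """Indices of the key frames when extracting one frame every `interval`
    frames, beginning at `start` (the first frame or a specified position)."""
    return list(range(start, num_frames, interval))

keys = extract_key_frame_indices(60)  # -> [0, 15, 30, 45]
```

Only these extracted frames undergo view angle target detection; all other frames are handled by tracking or interpolation, which is where the computational saving comes from.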
The bounding box of the visual angle object can be a rectangular area, and can also be an area with other shapes, such as a circle, an ellipse and a free shape. When the bounding box of the view target is a rectangular region, the data of the bounding box may be the boundary data of the rectangular region or the corner coordinates of two non-adjacent corners of the rectangular region.
In this embodiment, frame extraction processing is performed on the panoramic video to extract the key frame image, and only the extracted key frame image is subjected to view angle target detection, that is, the key frame image is a video image to be subjected to view angle target detection, and view angle target detection is not performed on other frame images, where view angle targets of other frame images may be determined in other ways, that is, view angle target detection is not performed frame by frame.
Referring to fig. 5-7, the extracted key frame images may include a first key frame image (nth frame video image) and a second key frame image (n +4 th frame video image) as shown in fig. 5-7, where the second key frame image is a next key frame image of the first key frame image in the panoramic video. There are multiple frame video images (n +1 to n +3 frame video images) between the two key frame images.
According to the method and the device, the key frame images are extracted through key frame detection to perform view angle target detection, and view angle target detection is not performed frame by frame, so that the calculation cost for calculating the optimal view angle of the panoramic video can be greatly reduced.
And 102, acquiring a visual angle target of a target frame image of the panoramic video according to the visual angle target of the first key frame image, wherein the target frame image is positioned between the first key frame image and a second key frame image, and the second key frame image is a next key frame image of the first key frame image in the panoramic video.
The target frame images may be some or all of the frame video images between two adjacent key frame images.
For the other frame video images, on which view angle target detection is not performed, the view angle target may be determined according to the view angle target of the most recent key frame image preceding them. That is, for each other frame video image between two adjacent key frame images, its view angle target is determined according to the view angle target of the earlier of those two key frame images.
For example, referring to fig. 5-7, the perspective targets of the frames of video images (n +1 th to n +3 th frame of video images) between the first key frame image and the second key frame image can be determined according to the perspective target of the first key frame image (i.e., the nth frame of video image), and the specific determination manner can be as follows:
referring to fig. 5, the method according to mode 1: and determining the view angle target of each frame of video image between the first key frame image and the second key frame image based on the view angle target of the first key frame image by the target tracking mode.
For example, the view angle targets of the n +1 th to n +3 th frame video images may be determined according to the view angle target of the nth frame video image in the manner 1.
Referring to fig. 7, mode 2 may also be used: the view angle target of each frame of video image between the first key frame image and the second key frame image is determined based on the view angle targets of the first and second key frame images, by interpolating the viewpoint in the middle of the interval and taking the intermediate value.
For example, the view angle targets of the n +1 th to n +3 th frame video images may be determined according to the view angle target of the nth frame video image and the view angle target of the n +4 th frame video image in the manner 2.
In addition, referring to fig. 6, in the method 1, the method may also be implemented by "key frame tracking" instead of frame-by-frame tracking. For each frame of video images that are not tracked, the view angle targets of these frame of video images can be determined in combination with mode 2.
For example, the view angle target of the (n+2)-th frame video image can be determined from the view angle target of the n-th frame video image in mode 1. Further, in mode 2, the view angle target of the (n+1)-th frame video image is determined from the view angle targets of the n-th and (n+2)-th frame video images, and the view angle target of the (n+3)-th frame video image is determined from the view angle targets of the (n+2)-th and (n+4)-th frame video images.
In this way, the view angle target of each frame of video image in the panoramic video can be obtained. The perspective target may generally be an optimal perspective target for the video image.
And 103, acquiring a view angle path of the panoramic video according to the acquired view angle targets.
Therefore, the embodiment can achieve the acquisition of the view angle path of the panoramic video, and can greatly reduce the calculation cost of calculating the optimal view angle of the panoramic video.
Based on the above, the implementation of step 102 is further described below with reference to fig. 5-7.
In an embodiment of the present invention, referring to fig. 5, the obtaining the view angle target of the target frame image of the panoramic video according to the view angle target of the first key frame image includes: and according to the visual angle target of the first key frame image, performing target tracking processing on the target frame image to obtain the visual angle target of the target frame image.
Referring to fig. 5, the method 1: and in the target tracking mode, the view angle targets of the (n + 1) -n + 3) th frame video images are determined according to the view angle target of the nth frame video image.
In this embodiment, the (n + 1) th to (n + 3) th frame video images may be all used as the target frame images, that is, the viewpoint targets of the (n + 1) th to (n + 3) th frame video images may be determined based on the above-described mode 1.
As shown in fig. 5, target tracking may be performed based on the view angle target of the nth frame of video image, so as to obtain a view angle target of the (n + 1) th frame of video image; then, target tracking is carried out on the basis of the view angle target of the (n + 1) th frame of video image, and the view angle target of the (n + 2) th frame of video image is obtained; and then carrying out target tracking based on the view angle target of the (n + 2) th frame of video image to obtain the view angle target of the (n + 3) th frame of video image. And ending the target tracking because the next frame of video image (the n +4 th frame of video image) is the second key frame image.
Therefore, the visual angle target of each frame of video image between two key frame images can be accurately determined by target tracking.
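Mode 1's chained tracking can be sketched as below. The patent does not name a specific tracker, so `track_one_step` is a stand-in for any single-frame tracking routine; the stub tracker and box values are illustrative only.

```python
def track_between_keyframes(key_box, frames, track_one_step):
    """Mode 1: starting from the first key frame's view angle target,
    propagate the bounding box through each in-between frame by chained
    single-step tracking; tracking stops at the second key frame."""
    boxes = []
    prev = key_box
    for frame in frames:
        prev = track_one_step(prev, frame)  # each frame tracks from the previous
        boxes.append(prev)
    return boxes

# Stub tracker: each "frame" just shifts the box right by its motion value.
shift = lambda box, motion: (box[0] + motion, box[1], box[2] + motion, box[3])
tracked = track_between_keyframes((10, 10, 50, 50), [5, 5, 5], shift)
```

With a real tracker, `frames` would be the decoded (n+1)-th to (n+3)-th video images rather than motion values.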
In an embodiment of the present invention, referring to fig. 7, the method further includes: acquiring a visual angle target of a second key frame image; the acquiring the view angle target of the target frame image of the panoramic video according to the view angle target of the first key frame image comprises the following steps: and obtaining the visual angle target of the target frame image according to the visual angle target of the first key frame image and the visual angle target of the second key frame image.
Referring to fig. 7, the method 2: and determining the view angle targets of the n +1 th to n +3 th frames of video images according to the view angle target of the n frame of video image and the view angle target of the n +4 th frame of video image in a mode of interpolating the view points in the middle of the interval and taking an intermediate value.
In this embodiment, the (n + 1) th to (n + 3) th frame video images may be all used as the target frame images, that is, the viewpoint targets of the (n + 1) th to (n + 3) th frame video images may be determined based on the above-described mode 2.
Taking the bounding box of the view angle target as a rectangular region, for mode 2 (interpolating the viewpoint in the middle of the interval and taking the intermediate value): if one rectangular corner of the view angle target of the m-th frame video image is the pixel at row X, column Y of the image, and the corresponding corner of the view angle target of the (m+2)-th frame video image is the pixel at row X+a, column Y+b, then the corresponding corner of the view angle target of the (m+1)-th frame video image may be the pixel at row X+a/2, column Y+b/2.
If one rectangular corner of the view-angle target of the m frame video image and one rectangular corner of the view-angle target of the m +2 frame video image are pixel points in the X-th row and the Y-th column in the image, one rectangular corner of the view-angle target of the m +1 frame video image can also be pixel points in the X-th row and the Y-th column in the image.
Referring to fig. 7, based on the view angle target of the n-th frame video image and the view angle target of the (n + 4)-th frame video image, the view angle target of the (n + 2)-th frame video image may be obtained by interpolating the viewpoint between the two and taking the intermediate value; based on the view angle targets of the n-th and (n + 2)-th frame video images, the view angle target of the (n + 1)-th frame video image is obtained in the same way; and based on the view angle targets of the (n + 2)-th and (n + 4)-th frame video images, the view angle target of the (n + 3)-th frame video image is obtained. Thus, the view angle targets of the (n + 1)-th to (n + 3)-th frame video images are determined.
Therefore, in this embodiment, the view angle target of each frame of video image between two key frame images can be accurately determined by interpolating the viewpoint over the interval.
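The corner-wise interpolation described above is plain linear interpolation over the frame interval. A minimal sketch, with function and variable names that are illustrative rather than taken from the patent:

```python
def interpolate_corner(corner_a, corner_b, t):
    """Linearly interpolate one bounding-box corner between two frames.

    corner_a, corner_b: (row, col) pixel coordinates of the same corner
    in two frames whose view angle targets are known; t in [0, 1] is the
    relative position of the intermediate frame within the interval.
    """
    (xa, ya), (xb, yb) = corner_a, corner_b
    return (xa + (xb - xa) * t, ya + (yb - ya) * t)

# Frame m has the corner at (X, Y) = (100, 200), frame m+2 at
# (X+a, Y+b) = (110, 220); frame m+1 sits halfway, so t = 0.5
# yields (X + a/2, Y + b/2) = (105, 210).
print(interpolate_corner((100, 200), (110, 220), 0.5))
```

Applying the same formula to both stored corners of the rectangle yields the whole interpolated bounding box; when both endpoints coincide, any t returns the same corner, matching the stationary case described above.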
In an embodiment of the present invention, referring to fig. 6, the method further includes: performing frame extraction processing on each frame image located between the first key frame image and the second key frame image in the panoramic video to obtain at least one frame of first image; and respectively taking each frame of first image as a target frame image.
In this embodiment, frame extraction processing is performed between the two key frame images to realize "key frame tracking", and each extracted video image is taken as a first image; that is, only the first images are tracked to determine their view angle targets. For example, the tracking frequency may be once every 3 frames.
Referring to fig. 6, of the (n + 1)-th to (n + 3)-th frame video images, only the (n + 2)-th frame video image may be used as the target frame image; that is, the view angle target of the (n + 2)-th frame video image may be determined based on the above mode 1, while the view angle targets of the (n + 1)-th and (n + 3)-th frame video images are not determined based on mode 1 (they may be determined based on the above mode 2).
Based on the above, in an embodiment of the present invention, the obtaining at least one frame of the first image includes: obtaining at least one frame of first image and other frame images except the at least one frame of first image;
the method further comprises the following steps: and obtaining the visual angle targets of other frames of images according to the visual angle target of the first key frame image, the visual angle target of the second key frame image and the visual angle target of the first image of each frame.
In this embodiment, frame extraction processing is performed between the two key frame images; each extracted video image is referred to as a first image, and the remaining frame images are not.
The view angle target is determined based on the above-described mode 1 for the first image, and the view angle target is determined based on the above-described mode 2 for the remaining other frame images.
As shown in fig. 6, first, frame extraction processing is performed between the first key frame image and the second key frame image, so that the (n + 2)-th frame video image can be extracted. Then, by the above mode 1, target tracking is performed based on the view angle target of the n-th frame video image to obtain the view angle target of the (n + 2)-th frame video image. Then, by the above mode 2, based on the view angle targets of the n-th and (n + 2)-th frame video images, the viewpoint is interpolated between the two and the intermediate value is taken to obtain the view angle target of the (n + 1)-th frame video image; and based on the view angle targets of the (n + 2)-th and (n + 4)-th frame video images, the viewpoint is interpolated between the two and the intermediate value is taken to obtain the view angle target of the (n + 3)-th frame video image. Thus, the view angle targets of the (n + 1)-th to (n + 3)-th frame video images are determined.
In this embodiment, the above mode 1 and mode 2 are combined to determine the view angle target of each frame of video image between two key frame images, which not only can realize accurate determination of the view angle target of each frame of video image between two key frame images, but also can greatly reduce the calculation cost for calculating the optimal view angle of the panoramic video.
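The combination of sparse tracking (mode 1) and interpolation (mode 2) between key frames can be sketched as follows; `track` stands in for any single-target tracker, and all names and the frame layout are illustrative assumptions, not prescribed by the patent:

```python
def plan_view_targets(track, key_box, n_frames, step=2):
    """Sketch: combine mode 1 (tracking) and mode 2 (interpolation).

    track(frame_idx, prev_box) -> box is a stand-in for the tracker.
    Only every `step`-th frame is tracked ("key frame tracking");
    boxes for the skipped frames are filled in by linear interpolation
    between their tracked neighbours.
    """
    boxes = {0: key_box}
    # Mode 1: track only the extracted frames, greatly reducing cost.
    for i in range(step, n_frames, step):
        boxes[i] = track(i, boxes[i - step])
    # Mode 2: interpolate the frames between each pair of tracked frames.
    tracked = sorted(boxes)
    for lo, hi in zip(tracked, tracked[1:]):
        for i in range(lo + 1, hi):
            t = (i - lo) / (hi - lo)
            boxes[i] = tuple(a + (b - a) * t
                             for a, b in zip(boxes[lo], boxes[hi]))
    return [boxes[i] for i in sorted(boxes)]
```

With a toy tracker that shifts the box by 2 pixels per tracked step, 5 frames and `step=2`, frames 0, 2 and 4 come from tracking and frames 1 and 3 from interpolation, mirroring fig. 6.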
When performing object detection on a video image to obtain a view angle object, detection may be performed based on multi-dimensional features (i.e., multiple evaluation dimensions) of the video image. That is, the present embodiment may comprehensively evaluate the importance degree of the target in the current video frame from multiple quantifiable dimensions according to a priori or a model.
The multi-dimensional features may be area, saliency, expression, motion, etc.
Based on this, in an embodiment of the present invention, the acquiring the perspective target of the first key frame image includes: carrying out target detection on the first key frame image to obtain each target in the first key frame image; evaluating each target according to a preset multi-dimensional characteristic evaluation strategy to obtain an evaluation result of each target; and according to the evaluation results of the targets, taking the target with the optimal evaluation result as the visual angle target of the first key frame image.
In this embodiment, first, target detection is performed on the video image to detect each candidate target in the video image. Then, the candidate targets can be evaluated based on the multi-dimensional feature evaluation strategy so as to select the optimal target.
Taking area, saliency, expression, and motion as the multi-dimensional features, the evaluation result obtained by evaluating a target may include its area size, saliency, expression quality, motion confidence, and the like.
Based on the evaluation results of the targets, the optimal target can be selected and used as the view angle target of the video image, and the optimal view angle path of the panoramic video can then be obtained.
In the embodiment, the target importance degree is evaluated by using the multi-dimensional characteristics, so that the target can be accurately evaluated, the interpretability of the target evaluation is strong, and different evaluation strategies can be easily formulated according to requirements.
Each evaluation dimension may be assigned a weight, which may be manually defined or obtained through machine learning. Therefore, the weighted summation can be carried out on all the evaluation dimensions to obtain the target evaluation result.
Based on this, in an embodiment of the present invention, the evaluating each target according to a preset multidimensional feature evaluation policy to obtain an evaluation result of each target includes:
the following operations are respectively executed for each target: for each first evaluation dimension in a plurality of preset evaluation dimensions, evaluating the target based on the first evaluation dimension to obtain a value of the target corresponding to the first evaluation dimension; obtaining a score of the target corresponding to the first evaluation dimension according to the value of the target corresponding to the first evaluation dimension and a preset weight for the first evaluation dimension; and taking the sum of the scores of the target corresponding to each preset evaluation dimension as an evaluation result of the target.
In this embodiment, each target may be evaluated sequentially or in parallel to obtain a corresponding evaluation result.
For each of the plurality of evaluation dimensions, the target may be evaluated based on the evaluation dimension to obtain a corresponding evaluation value, and then a corresponding weighted evaluation value may be obtained by combining the preset weight of the evaluation dimension. Further, the weighted evaluation values corresponding to the evaluation dimensions may be summed to obtain the target evaluation result.
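The per-dimension scoring and weighted summation just described admit a very small sketch; the dimension names, weights, and data layout below are illustrative assumptions, and in practice the weights may be manually defined or learned, as the text notes:

```python
# Illustrative evaluation dimensions and preset weights (assumptions).
WEIGHTS = {"area": 0.3, "saliency": 0.3, "expression": 0.2, "motion": 0.2}

def evaluate_target(values):
    """Weighted sum of the target's per-dimension evaluation values."""
    return sum(WEIGHTS[d] * values.get(d, 0.0) for d in WEIGHTS)

def best_target(detections):
    """Pick the candidate with the optimal (here: highest) result."""
    return max(detections, key=lambda c: evaluate_target(c["values"]))

detections = [
    {"name": "target_a", "values": {"area": 0.9, "saliency": 0.2,
                                    "expression": 0.1, "motion": 0.0}},
    {"name": "target_b", "values": {"area": 0.5, "saliency": 0.8,
                                    "expression": 0.9, "motion": 0.7}},
]
print(best_target(detections)["name"])  # candidate with the higher weighted sum
```

Changing the evaluation strategy then amounts to editing the dimension set or its weights, which is what makes this scheme easy to adapt to different requirements.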
As can be seen from the above, in the embodiment, the panoramic video can be processed by adopting the modes of the "key frame detection target", "multi-dimensional feature evaluation target", "tracking target", and "key frame tracking target" to obtain the corresponding optimal viewing angle path, so that the calculation cost for calculating the optimal viewing angle of the panoramic video can be greatly reduced, the performance of planning the optimal path of the panoramic video is improved, the interpretability of the multi-dimensional feature is strong when the importance degree of the target is evaluated, and different optimal viewing angle path planning strategies can be easily formulated according to different requirements.
It should be noted that this embodiment may be executed after obtaining the panoramic video, or may be executed during obtaining the panoramic video, so that the view angle path of the panoramic video may be obtained in real time.
In an embodiment of the present invention, before step 101, the original panoramic video may be formatted, and step 101 is performed based on the processed panoramic video.
The original panoramic video may refer to an original spherical video photographed using a panoramic camera.
The embodiment can be realized by a panoramic view model, and the original panoramic video can be converted into a format which can be processed by the panoramic view model through format processing. The panoramic view model may execute part or all of processing processes such as a "key frame detection target", a "multidimensional feature evaluation target", a "tracking target", and a "key frame tracking target", that is, the panoramic view model may include part or all of a target detection model, a feature evaluation model, a tracking model, an evaluation strategy, and parameters.
Possibly, according to the format requirement of the panoramic view model on the input data, the original panoramic video can be processed in modes such as projection, splicing, video format conversion, resolution conversion and other video processing modes.
In this embodiment, when obtaining a view angle target of a video image, a bounding box of the view angle target may be specifically obtained (that is, in this embodiment, a target bounding box series corresponding to each frame of video image in a panoramic video may be obtained). In this way, the center point of the bounding box can be used as the viewpoint of the frame image where the view angle target is located.
Based on this, in an embodiment of the present invention, the obtaining a view path of a panoramic video according to the obtained respective view targets includes: for each obtained visual angle target, according to a boundary frame corresponding to the visual angle target, taking the central point of the boundary frame as the viewpoint of the frame image where the visual angle target is located; and obtaining a view angle path of the panoramic video according to the obtained viewpoints.
The bounding box of the view angle target may be a rectangular area, or an area of another shape, such as a circle, an ellipse or a free-form shape. When the bounding box of the view angle target is a rectangular region, the data of the bounding box may be the coordinates of two non-adjacent corner points of the rectangular region.
According to the data of the boundary frame of the visual angle target, the central point of the boundary frame can be determined, and the central point can be the central point of the visual angle target and is used as the viewpoint of the frame image where the visual angle target is located. And then based on the determined view point of each frame of video image and the time-varying sequence of each frame of video image, obtaining the view angle path of the panoramic video.
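As a minimal sketch of this step (rectangular bounding boxes stored as two non-adjacent corner points; all names are illustrative):

```python
def box_center(box):
    """Center point of a rectangular bounding box given as the
    coordinates of two non-adjacent corners (x1, y1, x2, y2); used as
    the viewpoint of the frame image where the view angle target is."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2, (y1 + y2) / 2)

def view_path(boxes_per_frame):
    """View angle path: the time-ordered sequence of per-frame
    viewpoints, one bounding box per frame of the panoramic video."""
    return [box_center(b) for b in boxes_per_frame]

print(view_path([(0, 0, 10, 20), (2, 2, 12, 22)]))
```

For non-rectangular bounding regions the same idea applies with the appropriate centroid in place of the rectangle center.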
Based on the obtained view angle path of the panoramic video, the panoramic video can be intelligently edited. In one possible implementation, the panoramic video may be converted into a plane video based on the acquired view angle path, and the converted plane video can then be played for display. In other feasible implementations, presentation may be performed in other manners based on the determined view angle path, for example path presentation on a 2:1 panorama or on the spherical video.
Based on the above, in an embodiment of the present invention, the method further includes: for each obtained visual angle target, taking the central point of the boundary frame corresponding to the visual angle target as the central point of the visual angle, and obtaining the visual angle of the frame image where the visual angle target is located according to the boundary frame corresponding to the visual angle target, wherein the visual angle range corresponding to the visual angle is larger than or equal to the range of the boundary frame; generating a plane video frame of a frame image corresponding to the visual angle target according to the visual angle; and obtaining the planar video corresponding to the panoramic video according to the view angle path of the panoramic video and each generated planar video frame.
In this embodiment, for any frame of video image, the viewpoint of the video image is taken as the central point of the angle of view, and based on the specified focal length and the bounding box of the angle of view target of the video image, the size of the angle of view can be determined.
The specified focal length may be a fixed projection focal length, or may be adapted according to the size of the field angle; the larger the focal length, the larger the field of view corresponding to the field angle and the wider the visual angle of the projected plane video.
Wherein the field angle range is not smaller than the range of the bounding box, so that the view angle target exists at least within the field of view of the determined field angle. As such, in one implementation, the field angle range may be equal to the range of the bounding box. In yet other implementations, the field angle range may be greater than the range of the bounding box, but not more than one full revolution (i.e., not greater than 2π).
Based on the projection focal length and the field angle of each frame of video image, a flat video frame of the video image may be generated. Optionally, the pixels can be converted from spherical coordinates to planar coordinates according to the field angle and the projection focal length of the panoramic video, and a rectangular planar video frame which can be played by a common device can be obtained by combining an interpolation pixel technology.
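One common way to realize the spherical-to-planar pixel mapping mentioned above is an inverse gnomonic (rectilinear) projection. The sketch below is an assumption about how such a conversion could look — the patent does not prescribe this exact model — and a real implementation would additionally resample the panoramic source with pixel interpolation:

```python
import math

def plane_pixel_to_sphere(u, v, width, height, focal, lon0, lat0):
    """Inverse gnomonic projection: map pixel (u, v) of a planar frame
    of size width x height, with projection focal length `focal` (in
    pixels), back to (longitude, latitude) on the sphere for a view
    centred on the viewpoint (lon0, lat0). Angles are in radians."""
    x = (u - width / 2) / focal
    y = (v - height / 2) / focal
    rho = math.hypot(x, y)
    if rho == 0:
        return lon0, lat0  # image centre maps to the viewpoint itself
    c = math.atan(rho)     # angular distance from the viewpoint
    lat = math.asin(math.cos(c) * math.sin(lat0)
                    + y * math.sin(c) * math.cos(lat0) / rho)
    lon = lon0 + math.atan2(x * math.sin(c),
                            rho * math.cos(lat0) * math.cos(c)
                            - y * math.sin(lat0) * math.sin(c))
    return lon, lat
```

Sampling the spherical (or equirectangular) source at each computed (lon, lat), combined with pixel interpolation, fills the rectangular plane video frame playable on a common device.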
Furthermore, a flat video corresponding to the panoramic video can be obtained according to the viewing angle path and the generated flat video frames, and for example, the flat video frames can be spliced into the flat video in a splicing manner. The obtained flat video can be played on a common device for a user to watch.
It should be noted that the present embodiment may be executed after obtaining the panoramic video, or may be executed in the process of obtaining the panoramic video, so that a corresponding plane video of the panoramic video may be obtained in real time.
Considering that the field angle of each frame of video image is usually different, after the viewpoint path of the panoramic video is obtained, the smooth path processing may be performed by using a moving window averaging method to obtain a processed smooth viewpoint path, and then the field angle sequence may be determined based on the processed smooth viewpoint path. And processing the determined field angle sequence by using a moving window averaging method, and further executing conversion from the panoramic video to the flat video based on the smooth field angle sequence obtained by processing. Therefore, the plane video with better video effect can be obtained, and the user watching experience is better.
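The moving-window averaging mentioned above can be sketched as follows; the window size and the shrink-at-the-edges behaviour are illustrative choices, not specified by the patent:

```python
def smooth_path(points, window=5):
    """Moving-window average over a sequence of viewpoints (the same
    routine can smooth a field angle sequence). `points` is a list of
    equal-length tuples; near the ends of the sequence the average is
    taken over only the part of the window that exists."""
    half = window // 2
    out = []
    for i in range(len(points)):
        seg = points[max(0, i - half): i + half + 1]
        out.append(tuple(sum(c) / len(seg) for c in zip(*seg)))
    return out

print(smooth_path([(0, 0), (2, 2), (4, 4), (6, 6)], window=3))
```

Running the viewpoint path through this filter first, then the derived field angle sequence, matches the two-stage smoothing described in the paragraph above.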
As shown in fig. 8, an embodiment of the present invention provides a viewing angle path acquiring apparatus 10, which includes a first acquiring module 11, a second acquiring module 12, and a third acquiring module 13.
The first obtaining module 11 is configured to obtain, for a first key frame image of a panoramic video, a view angle target of the first key frame image, where each key frame image of the panoramic video is obtained by performing frame extraction processing on the panoramic video. The second obtaining module 12 is configured to obtain a view angle target of a target frame image of the panoramic video according to the view angle target of the first key frame image, where the target frame image is located between the first key frame image and a second key frame image, and the second key frame image is a next key frame image of the first key frame image in the panoramic video. The third obtaining module 13 is configured to obtain a view path of the panoramic video according to the obtained view targets.
In an embodiment of the present invention, the second obtaining module 12 is configured to perform target tracking processing on the target frame image according to the perspective target of the first key frame image, so as to obtain the perspective target of the target frame image.
In one embodiment of the present invention, the viewing angle path acquiring apparatus 10 further includes: the first module is used for performing frame extraction processing on each frame image positioned between the first key frame image and the second key frame image in the panoramic video to obtain at least one frame of first image; and respectively taking the first image of each frame as a target frame image.
In one embodiment of the invention, the first module is used for obtaining at least one frame of first image and other frames of images except for the at least one frame of first image;
the viewing angle path acquisition apparatus 10 further includes: and the second module is used for obtaining the visual angle targets of other frames of images according to the visual angle target of the first key frame image, the visual angle target of the second key frame image and the visual angle target of each frame of first image.
In one embodiment of the present invention, the viewing angle path acquiring apparatus 10 further includes: the third module is used for acquiring a visual angle target of the second key frame image;
the second obtaining module 12 is configured to obtain a visual angle target of the target frame image according to the visual angle target of the first key frame image and the visual angle target of the second key frame image.
In an embodiment of the present invention, the first obtaining module 11 is configured to perform target detection on the first key frame image, and obtain each target in the first key frame image; evaluating each target according to a preset multi-dimensional characteristic evaluation strategy to obtain an evaluation result of each target; and according to the evaluation results of the targets, taking the target with the optimal evaluation result as the visual angle target of the first key frame image.
In an embodiment of the present invention, the first obtaining module 11 is configured to perform the following operations on each target: for each first evaluation dimension in a plurality of preset evaluation dimensions, evaluating the target based on the first evaluation dimension to obtain a value of the target corresponding to the first evaluation dimension; obtaining a score of the target corresponding to the first evaluation dimension according to the value of the target corresponding to the first evaluation dimension and a preset weight for the first evaluation dimension; and taking the sum of the scores of the target corresponding to each preset evaluation dimension as an evaluation result of the target.
In an embodiment of the present invention, the third obtaining module 13 is configured to, for each obtained view angle target, according to a bounding box corresponding to the view angle target, use a central point of the bounding box as a viewpoint of a frame image where the view angle target is located; and obtaining a view angle path of the panoramic video according to the obtained viewpoints.
In one embodiment of the present invention, the viewing angle path acquiring apparatus 10 further includes: the fourth module is used for taking the central point of the boundary frame corresponding to the view angle target as the central point of the view angle for each obtained view angle target and obtaining the view angle of the frame image where the view angle target is located according to the boundary frame corresponding to the view angle target, wherein the view angle range corresponding to the view angle is larger than or equal to the range of the boundary frame; generating a plane video frame corresponding to a frame image where the visual angle target is located according to the visual angle; and obtaining the planar video corresponding to the panoramic video according to the view angle path of the panoramic video and each generated planar video frame.
Embodiments of the present invention further provide an electronic device, including a processor and a memory, where the memory is used to store program instructions and the processor is used to execute the program instructions to implement a method as any of the above method embodiments.
Embodiments of the present invention also provide a computer-readable storage medium on which program instructions are stored, which program instructions, when executed by a processor, implement a method as any of the above method embodiments.
Fig. 9 is a schematic diagram of a computer device according to an embodiment of the present invention. As shown in fig. 9, the computer device 20 of this embodiment includes: a processor 21 and a memory 22, where the memory 22 is used for storing a computer program 23 that can be run on the processor 21, and the computer program 23, when executed by the processor 21, implements the steps in the method embodiments of the present invention, which are not repeated here to avoid repetition. Alternatively, the computer program 23, when executed by the processor 21, implements the functions of each module/unit in the apparatus embodiment of the present invention, which are likewise not repeated here.
The computer device 20 includes, but is not limited to, a processor 21 and a memory 22. Those skilled in the art will appreciate that fig. 9 is merely an example of a computer device 20 and is not intended to limit the computer device 20 and that it may include more or fewer components than shown, or some of the components may be combined, or different components, e.g., the computer device may also include input output devices, network access devices, buses, etc.
The Processor 21 may be a Central Processing Unit (CPU), other general purpose Processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other Programmable logic device, discrete Gate or transistor logic device, discrete hardware component, etc. A general purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
The memory 22 may be an internal storage unit of the computer device 20, such as a hard disk or a memory of the computer device 20. The memory 22 may also be an external storage device of the computer device 20, such as a plug-in hard disk provided on the computer device 20, a Smart Media (SM) card, a Secure Digital (SD) card, a flash card, and the like. Further, the memory 22 may also include both an internal storage unit and an external storage device of the computer device 20. The memory 22 is used for storing the computer program 23 and other programs and data required by the computer device, and may also be used to temporarily store data that has been output or is to be output.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the embodiments provided in the present invention, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and there may be other divisions in actual implementation, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one position, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit may be implemented in the form of hardware, or in the form of hardware plus a software functional unit.
The integrated unit, which is implemented in the form of a software functional unit, may be stored in a computer-readable storage medium. The software functional unit is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) or a Processor (Processor) to execute some steps of the methods according to the embodiments of the present invention. And the aforementioned storage medium includes: various media capable of storing program codes, such as a usb disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (12)

1. A viewing angle path acquisition method, comprising:
acquiring a visual angle target of a first key frame image of a panoramic video, wherein each key frame image of the panoramic video is obtained by performing frame extraction processing on the panoramic video;
acquiring a visual angle target of a target frame image of the panoramic video according to the visual angle target of the first key frame image, wherein the target frame image is positioned between the first key frame image and a second key frame image, and the second key frame image is a next key frame image of the first key frame image in the panoramic video;
and obtaining the view angle path of the panoramic video according to the obtained view angle targets.
2. The method of claim 1, wherein the obtaining the perspective target of the target frame image of the panoramic video according to the perspective target of the first key frame image comprises:
and carrying out target tracking processing on the target frame image according to the visual angle target of the first key frame image to obtain the visual angle target of the target frame image.
3. The method of claim 2, further comprising:
performing frame extraction processing on each frame image positioned between the first key frame image and the second key frame image in the panoramic video to obtain at least one frame of first image;
and taking the first image of each frame as the target frame image respectively.
4. The method of claim 3, wherein obtaining at least one frame of the first image comprises: obtaining at least one frame of first image and other frame images except the at least one frame of first image;
the method further comprises the following steps: and obtaining the view angle target of each other frame image according to the view angle target of the first key frame image, the view angle target of the second key frame image and the view angle target of the first image of each frame.
5. The method of claim 1, further comprising: acquiring a visual angle target of the second key frame image;
the obtaining of the view target of the target frame image of the panoramic video according to the view target of the first key frame image includes:
and obtaining the visual angle target of the target frame image according to the visual angle target of the first key frame image and the visual angle target of the second key frame image.
6. The method of claim 1, wherein said obtaining the perspective target of the first key frame image comprises:
performing target detection on the first key frame image to obtain each target in the first key frame image;
evaluating each target according to a preset multi-dimensional characteristic evaluation strategy to obtain an evaluation result of each target;
and according to the evaluation result of each target, taking the target with the optimal evaluation result as the visual angle target of the first key frame image.
7. The method according to claim 6, wherein the evaluating each target according to a preset multi-dimensional feature evaluation strategy to obtain an evaluation result of each target comprises:
respectively executing the following operations on each target:
for each first evaluation dimension in a plurality of preset evaluation dimensions, evaluating the target based on the first evaluation dimension to obtain a value of the target corresponding to the first evaluation dimension;
obtaining a score of the target corresponding to the first evaluation dimension according to the value of the target corresponding to the first evaluation dimension and a preset weight for the first evaluation dimension;
and taking the sum of the scores of the target corresponding to each preset evaluation dimension as an evaluation result of the target.
8. The method according to claim 1, wherein said obtaining a view path of the panoramic video according to each obtained view target comprises:
for each obtained visual angle target, according to a boundary box corresponding to the visual angle target, taking the central point of the boundary box as the viewpoint of the frame image where the visual angle target is located;
and obtaining the view angle path of the panoramic video according to the obtained viewpoints.
9. The method of claim 8, further comprising:
for each obtained visual angle target, taking a central point of a boundary frame corresponding to the visual angle target as a central point of a visual angle, and obtaining the visual angle of a frame image where the visual angle target is located according to the boundary frame corresponding to the visual angle target, wherein the visual angle range corresponding to the visual angle is larger than or equal to the range of the boundary frame;
generating a plane video frame corresponding to the frame image of the visual angle target according to the visual angle;
and obtaining the plane video corresponding to the panoramic video according to the view angle path of the panoramic video and each generated plane video frame.
10. A viewing angle path acquisition apparatus, comprising:
the device comprises a first acquisition module, a second acquisition module and a third acquisition module, wherein the first acquisition module is used for acquiring a visual angle target of a first key frame image of a panoramic video, and each key frame image of the panoramic video is obtained by performing frame extraction processing on the panoramic video;
a second obtaining module, configured to obtain a view target of a target frame image of the panoramic video according to the view target of the first key frame image, where the target frame image is located between the first key frame image and a second key frame image, and the second key frame image is a next key frame image of the first key frame image in the panoramic video;
and a third acquisition module, configured to acquire the visual angle path of the panoramic video according to the acquired visual angle targets.
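The three-module apparatus of claim 10 can be outlined as a class whose methods mirror the modules; this is a minimal sketch with hypothetical names, and the detector and tracker backends are supplied as stubs rather than real models.

```python
class VisualAnglePathAcquirer:
    """Sketch of claim 10's apparatus: each acquisition module becomes a
    method; detect_fn/track_fn stand in for unspecified backends."""

    def __init__(self, detect_fn, track_fn):
        self.detect = detect_fn  # backend for the first acquisition module
        self.track = track_fn    # backend for the second acquisition module

    def acquire_keyframe_target(self, keyframe):
        # First module: visual angle target of a key frame image.
        return self.detect(keyframe)

    def acquire_intermediate_targets(self, target, frames):
        # Second module: propagate the key-frame target to the target
        # frame images lying between two consecutive key frames.
        return [self.track(target, f) for f in frames]

    def acquire_path(self, targets):
        # Third module: visual angle path from bounding-box centers,
        # with each target given as (x_min, y_min, x_max, y_max).
        return [((t[0] + t[2]) / 2.0, (t[1] + t[3]) / 2.0) for t in targets]
```

With a real detector and tracker plugged in, the three methods chain together exactly as claims 1 and 8 describe.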
11. An electronic device comprising a processor and a memory, the memory being configured to store program instructions, the processor being configured to execute the program instructions to implement the method of any of claims 1 to 9.
12. A computer-readable storage medium, characterized in that the computer-readable storage medium has stored thereon program instructions which, when executed by a processor, implement the method of any one of claims 1 to 9.
CN202210882895.0A 2022-07-26 2022-07-26 Visual angle path acquisition method and device, electronic equipment and medium Pending CN115294493A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202210882895.0A CN115294493A (en) 2022-07-26 2022-07-26 Visual angle path acquisition method and device, electronic equipment and medium
PCT/CN2023/108962 WO2024022301A1 (en) 2022-07-26 2023-07-24 Visual angle path acquisition method and apparatus, and electronic device and medium

Publications (1)

Publication Number Publication Date
CN115294493A true CN115294493A (en) 2022-11-04

Family

ID=83823781

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210882895.0A Pending CN115294493A (en) 2022-07-26 2022-07-26 Visual angle path acquisition method and device, electronic equipment and medium

Country Status (2)

Country Link
CN (1) CN115294493A (en)
WO (1) WO2024022301A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024022301A1 (en) * 2022-07-26 2024-02-01 影石创新科技股份有限公司 Visual angle path acquisition method and apparatus, and electronic device and medium

Family Cites Families (6)

Publication number Priority date Publication date Assignee Title
US10726593B2 (en) * 2015-09-22 2020-07-28 Fyusion, Inc. Artificially rendering images using viewpoint interpolation and extrapolation
CN111163267B (en) * 2020-01-07 2021-12-21 影石创新科技股份有限公司 Panoramic video editing method, device, equipment and storage medium
JP2022014358A (en) * 2020-07-06 2022-01-19 キヤノン株式会社 Information processing apparatus, control method of information processing apparatus and program
CN113747138A (en) * 2021-07-30 2021-12-03 杭州群核信息技术有限公司 Video generation method and device for virtual scene, storage medium and electronic equipment
CN114598810A (en) * 2022-01-18 2022-06-07 影石创新科技股份有限公司 Method for automatically clipping panoramic video, panoramic camera, computer program product, and readable storage medium
CN115294493A (en) * 2022-07-26 2022-11-04 影石创新科技股份有限公司 Visual angle path acquisition method and device, electronic equipment and medium

Also Published As

Publication number Publication date
WO2024022301A1 (en) 2024-02-01

Similar Documents

Publication Publication Date Title
US10354129B2 (en) Hand gesture recognition for virtual reality and augmented reality devices
CN108292362B (en) Gesture recognition for cursor control
US7733404B2 (en) Fast imaging system calibration
US11145080B2 (en) Method and apparatus for three-dimensional object pose estimation, device and storage medium
KR20180105876A (en) Method for tracking image in real time considering both color and shape at the same time and apparatus therefor
EP3598385B1 (en) Face deblurring method and device
US10535147B2 (en) Electronic apparatus and method for processing image thereof
US8903139B2 (en) Method of reconstructing three-dimensional facial shape
CN108921070B (en) Image processing method, model training method and corresponding device
JP7159384B2 (en) Image processing device, image processing method, and program
CN109902675B (en) Object pose acquisition method and scene reconstruction method and device
US20230394833A1 (en) Method, system and computer readable media for object detection coverage estimation
WO2024022301A1 (en) Visual angle path acquisition method and apparatus, and electronic device and medium
CN108229281B (en) Neural network generation method, face detection device and electronic equipment
US10791321B2 (en) Constructing a user's face model using particle filters
CN110866873A (en) Highlight elimination method and device for endoscope image
CN112418153A (en) Image processing method, image processing device, electronic equipment and computer storage medium
CN115294358A (en) Feature point extraction method and device, computer equipment and readable storage medium
CN109493349B (en) Image feature processing module, augmented reality equipment and corner detection method
JP5419925B2 (en) Passing object number measuring method, passing object number measuring apparatus, and program
CN118196910B (en) Gesture interaction method, gesture interaction system, computer and storage medium
Setyati et al. Face tracking implementation with pose estimation algorithm in augmented reality technology
JP3122290B2 (en) Gesture video recognition method
Hofmann Automatic Semantic Mapping of Mobile Eye-Tracking Data through Keypoint Matching and Tracking
CN116805957A (en) Video frame rate improving method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination