CN110225402B

CN110225402B - Method and device for intelligently keeping interesting target time display in panoramic video

Info

Publication number: CN110225402B
Application number: CN201910629300.9A
Authority: CN
Inventors: 朱磊; 杨晓光
Original assignee: Qingdao Yispace Technology Co ltd
Current assignee: Qingdao Yispace Technology Co ltd
Priority date: 2019-07-12
Filing date: 2019-07-12
Publication date: 2022-03-04
Anticipated expiration: 2039-07-12
Also published as: CN110225402A

Abstract

The invention provides a method and a device for intelligently keeping a target of interest displayed at all times in a panoramic video. The method comprises: acquiring panoramic video data; analyzing the panoramic video data based on an intelligent algorithm, the analysis comprising determining scene samples that contain the same preset interest target in the panoramic video according to the features of a preset target and at least one preset parameter, and indexing the playing view angle of the scene sample sequence so that the subsequent video frame sequence tracks according to that playing view angle; and, upon receiving a "track interest target" playing instruction, playing the panoramic video data based on the analysis result. When the panoramic camera and the target of interest move relative to each other during capture, the method and the device keep the target of interest within the playing view angle without requiring the user to slide the screen or perform similar operations.

Description

Method and device for intelligently keeping interesting target time display in panoramic video

Technical Field

The invention relates to video processing technology, and in particular to a method and a device for intelligently keeping a target of interest displayed at all times in a panoramic video.

Background

Usually, when an intelligent terminal plays a panoramic video, the display screen shows only the part of the picture that lies in a certain orientation. A panoramic video is in fact a 360-degree view of the scene in all directions, as seen by an observer rotating about a fixed point. During playback, the intelligent terminal determines, by default or according to the user's manual selection, which partial image of the current panoramic frame is displayed. This partial image corresponds to what the user, as the observer, would see at a particular viewing angle, and that viewing angle is the playing view angle of the current video.

When a user wishes to keep observing an object of interest while watching a panoramic video, the display screen of the player can show only a small part of the panoramic video; that is, the picture currently seen on the screen is only a part of the current panoramic frame. Whenever the panoramic camera and the object of interest do not remain stationary relative to each other, the user therefore has to slide the screen by hand again and again to drag the picture containing the object of interest into the viewable area of the screen. Obviously, this is very detrimental to the user experience.

Disclosure of Invention

The following presents a simplified summary of the invention in order to provide a basic understanding of some aspects of the invention. It should be understood that this summary is not an exhaustive overview of the invention. It is not intended to identify key or critical elements of the invention or to delineate the scope of the invention. Its sole purpose is to present some concepts in a simplified form as a prelude to the more detailed description that is discussed later.

In view of this, the present invention provides a method and an apparatus for intelligently keeping a target of interest displayed at all times in a panoramic video, so as to at least solve the prior-art problem that the target of interest cannot be kept on display at all times while a panoramic video is played.

The invention provides a method for intelligently keeping a target of interest displayed at all times in a panoramic video, comprising: acquiring panoramic video data, wherein the panoramic video data comprises a plurality of panoramic video sequences; analyzing the panoramic video data based on an intelligent algorithm, namely determining scene samples containing the same preset interest target in the panoramic video according to the features of a preset target and at least one preset parameter, and indexing the playing view angle of the scene sample sequence so that the subsequent video frame sequence tracks according to the playing view angle; wherein the analyzing comprises: extracting N scene samples containing n preset interest targets from each frame of the panoramic video based on the features of the preset interest targets and the at least one preset parameter, wherein N and n are positive integers; calculating the degree of association between the preset interest targets of the N scene samples in each panoramic video frame and those of the N scene samples in the next panoramic video frame, and, if the degree of association is higher than a preset threshold, determining that the scene samples of the panoramic video frame and of the next panoramic video frame contain the same preset interest targets; grouping, in time order and through an unsupervised learning process, the scene samples containing the same preset interest targets in the plurality of panoramic video sequences to obtain a plurality of groups; and indexing the playing view angle of each group of scene samples; and, upon receiving a "track interest target" playing instruction, playing the panoramic video data based on the analysis result.

Further, the at least one preset parameter includes one or more preset parameters of one or more categories.

Further, the one or more categories of the at least one preset parameter include at least one of: the field angle; and the distance between the center of the interest target and the projection center.

Further, the at least one preset parameter includes: a first field angle; a second field angle; a third field angle; a first distance between the center of the interest target and the projection center; a second distance between the center of the interest target and the projection center; and a third distance between the center of the interest target and the projection center.

Further, if the user has not operated the playing device for longer than a preset time, a "track interest target" playing instruction is triggered, and the best-matching interest target is selected from the one or more interest targets corresponding to the panoramic video frame as the interest target to be tracked.

Further, in the step of selecting the best matching interest target from the one or more interest targets corresponding to the panoramic video frame as the interest target to be tracked, the interest target to be tracked is determined according to the interest target in the corresponding playing view angle when the user triggers the playing instruction of the interest target to be tracked, so that the interest view angle is continuously changed according to the playing view angle corresponding to the scene sample group corresponding to the interest target to be tracked.

Further, the user perspective and the interest perspective take one of the following forms: coordinates, rotation angles, or spatial vectors.

According to another aspect of the present invention, there is also provided an apparatus for intelligently keeping a target of interest displayed at all times in a panoramic video, the apparatus comprising: an acquisition unit adapted to acquire panoramic video data, the panoramic video data comprising a plurality of panoramic video sequences; an analysis unit adapted to analyze the panoramic video data based on an intelligent algorithm, namely to determine scene samples containing the same preset interest target in the panoramic video according to the features of a preset target and at least one preset parameter, and to index the playing view angle of the scene sample sequence so that the subsequent video frame sequence tracks according to the playing view angle; and a playing unit adapted to play the panoramic video data based on the analysis result upon receiving a "track interest target" playing instruction. The analysis unit comprises an extraction module, a calculation module and a grouping module. The extraction module is adapted to extract N scene samples containing n preset interest targets from each frame of the panoramic video based on the features of the preset interest targets and the at least one preset parameter, wherein N and n are positive integers. The calculation module is adapted to calculate the degree of association between the preset interest targets of the N scene samples in each panoramic video frame and those of the N scene samples in the next panoramic video frame, and, if the degree of association is higher than a preset threshold, to determine that the scene samples of the panoramic video frame and of the next panoramic video frame contain the same preset interest targets. The grouping module is adapted to group, in time order and through an unsupervised learning process, the scene samples containing the same (n) preset interest targets in the plurality of panoramic video sequences to obtain a plurality of groups, and to index the playing view angle of each group of scene samples.

With the method and the device of the invention, when the panoramic camera and the target of interest move relative to each other, the intelligent terminal, while playing the panoramic video, actively recommends targets that may interest the user according to the video content, and then continuously tracks the target of interest in the panoramic video within a fixed time interval. The user can thus keep the target of interest in view at all times without frequently adjusting the playing view angle by hand, which simplifies operation and improves the experience of watching panoramic video.

These and other advantages of the present invention will become more apparent from the following detailed description of the preferred embodiments of the present invention, taken in conjunction with the accompanying drawings.

Drawings

The invention may be better understood by referring to the following description in conjunction with the accompanying drawings, in which like reference numerals are used throughout the figures to indicate like or similar parts. The accompanying drawings, which are incorporated in and form a part of this specification, illustrate preferred embodiments of the present invention and, together with the detailed description, serve to further explain the principles and advantages of the invention. Wherein:

FIG. 1 is a flowchart illustrating an exemplary process of the method of the present invention for intelligently keeping a target of interest displayed at all times in a panoramic video;

FIG. 2 is a flowchart illustrating one possible process of step S120 in FIG. 1;

FIG. 3 is a block diagram illustrating an example of an apparatus of the present invention for intelligently keeping a target of interest displayed at all times in a panoramic video;

FIG. 4 is a schematic diagram showing one possible structure of the analysis unit in FIG. 3.

Skilled artisans appreciate that elements in the figures are illustrated for simplicity and clarity and have not necessarily been drawn to scale. For example, the dimensions of some of the elements in the figures may be exaggerated relative to other elements to help improve the understanding of the embodiments of the present invention.

Detailed Description

Exemplary embodiments of the present invention will be described hereinafter with reference to the accompanying drawings. In the interest of clarity and conciseness, not all features of an actual implementation are described in the specification. It will of course be appreciated that in the development of any such actual embodiment, numerous implementation-specific decisions must be made to achieve the developers' specific goals, such as compliance with system-related and business-related constraints, which will vary from one implementation to another. Moreover, it will be appreciated that such a development effort might be complex and time-consuming, but would nevertheless be a routine undertaking for those of ordinary skill in the art having the benefit of this disclosure.

It should be noted that, in order to avoid obscuring the present invention with unnecessary details, only the device structures and/or processing steps closely related to the solution according to the present invention are shown in the drawings, and other details not so relevant to the present invention are omitted.

In the prior art, when images are captured by a panoramic camera that moves relative to a target of interest, the target cannot always be kept within a fixed range of observation angles in the panoramic video, so the user must slide the screen, move the hardware device, or the like to keep the target within the playing view angle. The method of the present invention for intelligently keeping a target of interest displayed at all times in a panoramic video solves this problem.

In the embodiments of the present invention, the playing view angle corresponding to a preset interest target in a panoramic video is referred to as the interest view angle. An exemplary process of the method is described below in conjunction with FIG. 1; with this process, a user can keep the target of interest displayed at all times in a panoramic video without manually sliding the screen.

As shown in fig. 1, in step S110, panoramic video data including a plurality of panoramic video sequences is acquired. Then, step S120 is performed.

In step S120, the panoramic video data is analyzed based on an intelligent algorithm: determining scene samples containing the same preset interest target in the panoramic video according to the characteristics of the preset target and at least one preset parameter, and indexing the playing view angle of the scene sample sequence to enable the subsequent video frame sequence to track according to the playing view angle.

As an example, the feature of the preset target includes at least one of a color feature, a shape feature, a texture feature, and the like.

As an example, the at least one preset parameter may comprise one or more preset parameters of a single category.

As an example, the at least one preset parameter may also include a plurality of preset parameters of a plurality of categories, where each category may correspond to one or more preset parameters.

In one example, the at least one preset parameter may include (but is not limited to) two categories of preset parameters: one category is the field angle, and the other is the distance between the center of the interest target and the projection center (the center of the playing view angle). The at least one preset parameter may include, for example, three field angle parameters, such as a 60° field angle (an example of the first field angle), a 90° field angle (an example of the second field angle), and a 120° field angle (an example of the third field angle). It may further include three parameters for the distance between the center of the interest target and the projection center, such as 20 pixels (an example of the first distance), 50 pixels (an example of the second distance), and 100 pixels (an example of the third distance).
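The example parameter values above can be written down as a small configuration. The following is only a minimal sketch: the class name `PresetParameter`, its field names, and the choice to pair every field angle with every distance are assumptions made for illustration and are not stated in the description.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class PresetParameter:
    """One hypothetical extraction setting: a field angle plus a center distance."""
    field_angle_deg: float    # field angle of the candidate playing view
    center_distance_px: int   # distance from target center to projection center


# Example values quoted in the description.
FIELD_ANGLES_DEG = (60.0, 90.0, 120.0)
CENTER_DISTANCES_PX = (20, 50, 100)


def build_preset_parameters(angles=FIELD_ANGLES_DEG, distances=CENTER_DISTANCES_PX):
    # Assumption: every field angle is combined with every distance.
    return [PresetParameter(a, d) for a, d in product(angles, distances)]


params = build_preset_parameters()
```

Under this assumption the three field angles and three distances yield nine settings; an implementation could equally use the two categories independently.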

In an embodiment of the present invention, the processing of step S120 may include, for example, steps S210-S230 as shown in FIG. 2.

As shown in FIG. 2, in step S210, N scene samples containing n predetermined interest targets are extracted from each frame (that is, from the current frame and from each subsequent frame) of the panoramic video based on the features of the predetermined interest targets and at least one preset parameter, where N and n are positive integers (each therefore at least 1).

For example, suppose the predetermined interest targets are bridges, towers, rivers, mountains, and the like, and the at least one preset parameter includes a field angle a1 of 120° and a field angle a2 of 60°. Six scene samples are then extracted from the panoramic video frame Im based on the features of these interest targets and the parameter values a1 and a2: the 1st scene sample corresponds to a1 and contains the bridge and the river; the 2nd corresponds to a2 and contains the bridge; the 3rd corresponds to a2 and contains the river; the 4th corresponds to a1 and contains the tower and the mountain; the 5th corresponds to a2 and contains the tower; and the 6th corresponds to a2 and contains the mountain.
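The six-sample example can be reproduced with a simple visibility check. This sketch is illustrative only: it assumes target detection has already produced a yaw angle for each target center, it considers the horizontal direction only, and the yaw values and view centers are invented so that the result matches the example.

```python
def angular_offset_deg(a, b):
    """Smallest absolute difference between two yaw angles, in degrees."""
    return abs((a - b + 180.0) % 360.0 - 180.0)


def targets_in_view(center_yaw_deg, field_angle_deg, detections):
    """Return the targets whose yaw lies within half the field angle of the view center."""
    half = field_angle_deg / 2.0
    return frozenset(
        name for name, yaw in detections.items()
        if angular_offset_deg(yaw, center_yaw_deg) <= half
    )


# Hypothetical detections for frame Im: target name -> yaw of its center (degrees).
detections = {"bridge": 10.0, "river": 70.0, "tower": 190.0, "mountain": 250.0}
A1, A2 = 120.0, 60.0  # field angles a1 and a2 from the example

# (view-center yaw, field angle) of six candidate views; the centers are assumed.
candidate_views = [(40.0, A1), (10.0, A2), (70.0, A2),
                   (220.0, A1), (190.0, A2), (250.0, A2)]
scene_samples = [targets_in_view(yaw, fov, detections) for yaw, fov in candidate_views]
```

With these assumed positions, the wide 120° views each capture a pair of neighboring targets, while the narrow 60° views isolate a single target, which is the pattern described in the example.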

Next, in step S220, the degree of feature association between the predetermined interest targets of the N scene samples in each panoramic video frame (e.g., panoramic video frame Im) and those of the N scene samples in the next frame (e.g., panoramic video frame I(m+1)) is calculated, where m is the ordinal number of the current frame (m = 1, 2, …) and m+1 denotes the frame following the m-th frame. If the degree of feature association is higher than a preset threshold, the scene samples of the panoramic video frame and of the next panoramic video frame are determined to contain the same predetermined interest target (the same single target or the same n targets).

For example, the calculation of the feature association degree may be implemented by using an existing feature comparison method, which is not described herein again.

The preset threshold may, for example, be set empirically or determined experimentally, and is not described further here.
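Because the description leaves the comparison method open, the following sketch uses cosine similarity between feature vectors as one possible stand-in for the degree of association. The function names, the feature vectors, and the threshold value of 0.9 are all hypothetical.

```python
import math


def association_degree(u, v):
    """Cosine similarity of two feature vectors (an assumed comparison method)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def match_samples(current, following, threshold=0.9):
    """Map each sample index in the current frame to its best match in the next frame.

    A pair is kept only if its association degree is higher than the threshold.
    """
    matches = {}
    for i, u in enumerate(current):
        best_j, best = None, threshold
        for j, v in enumerate(following):
            degree = association_degree(u, v)
            if degree > best:
                best_j, best = j, degree
        if best_j is not None:
            matches[i] = best_j
    return matches


# Hypothetical feature vectors for two samples in frame Im and two in frame I(m+1).
frame_m = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
frame_m1 = [[0.0, 0.9, 0.1], [0.95, 0.05, 0.0]]
links = match_samples(frame_m, frame_m1)
```

Here the first sample of frame Im is linked to the second sample of the next frame and vice versa, since only those pairs exceed the assumed threshold.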

Next, in step S230, the scene samples containing the same predetermined interest targets in the plurality of panoramic video sequences are grouped in time order through an unsupervised learning process into a plurality of groups, and the playing view angle of each group of scene samples is indexed. Scene samples belonging to the same group thus contain exactly the same predetermined interest targets. For example, given any two scene samples sa1 and sa2 in the same group, if sa1 contains the predetermined interest targets ta1, ta2 and ta3 and no others, then sa2 must also contain ta1, ta2 and ta3 and no others. Given two scene samples sa3 and sa4, where sa3 contains only the predetermined interest targets ta1 and ta2, and sa4 contains only the predetermined interest targets ta1, ta2 and ta4, sa3 and sa4 cannot be placed in the same group, and neither sa3 nor sa4 can be placed in the group containing sa1 and sa2.
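The "exactly the same targets" grouping rule can be illustrated with a dictionary keyed by the set of targets. This is a deliberately simple stand-in, not the unsupervised learning process referred to in the description; the tuple layout and frame indices are assumptions.

```python
from collections import defaultdict


def group_by_target_set(samples):
    """Group sample ids by the exact set of targets they contain, in frame order.

    samples: iterable of (sample_id, frame_index, targets) tuples (assumed layout).
    """
    groups = defaultdict(list)
    for sample_id, _frame_index, targets in sorted(samples, key=lambda s: s[1]):
        groups[frozenset(targets)].append(sample_id)
    return dict(groups)


# The sa1..sa4 example from the text, with hypothetical frame indices.
samples = [
    ("sa2", 2, {"ta1", "ta2", "ta3"}),
    ("sa1", 1, {"ta1", "ta2", "ta3"}),
    ("sa3", 1, {"ta1", "ta2"}),
    ("sa4", 2, {"ta1", "ta2", "ta4"}),
]
groups = group_by_target_set(samples)
```

Keying on a `frozenset` enforces the rule directly: sa3 and sa4 land in separate groups because their target sets differ, even though they share ta1 and ta2.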

For example, suppose the 4 scene samples extracted from panoramic video frame I1 are {I11, I12, I13, I14} and the 4 scene samples extracted from panoramic video frame I2 are {I21, I22, I23, I24}, and there are 4 groups of scene samples containing the same predetermined interest targets: {I11, I21}, {I12, I22}, {I13, I23}, {I14, I24}. The playing view angles corresponding to the 4 groups are then, for example, the spherical attitude angles {(α11, β11), (α21, β21)}, {(α12, β12), (α22, β22)}, {(α13, β13), (α23, β23)}, {(α14, β14), (α24, β24)}; and so on.
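One way to picture the view-angle index is a per-group lookup from frame number to spherical attitude angle. The sketch below is an assumption about how such an index might be stored; the function names, the group label, and the rule of holding the most recent known angle for frames without an entry are all hypothetical.

```python
from bisect import bisect_right


def build_view_index(group_samples):
    """Build {group_id: (frames, angles)} from {group_id: [(frame, alpha, beta), ...]}."""
    index = {}
    for group_id, entries in group_samples.items():
        ordered = sorted(entries)  # time order
        frames = [frame for frame, _, _ in ordered]
        angles = [(alpha, beta) for _, alpha, beta in ordered]
        index[group_id] = (frames, angles)
    return index


def view_angle_at(index, group_id, frame):
    """Return the (alpha, beta) indexed for this frame, or the latest earlier one."""
    frames, angles = index[group_id]
    pos = bisect_right(frames, frame) - 1
    return angles[pos] if pos >= 0 else None


# Hypothetical angles (degrees) for the group {I11, I21} of frames I1 and I2.
index = build_view_index({"group1": [(2, 12.0, 1.0), (1, 10.0, 0.0)]})
```

During tracked playback, the player would query `view_angle_at` for each displayed frame and steer the playing view angle to the returned value.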

Then, in step S130, when a "track target of interest" play instruction triggered by the user is received, the panoramic video data is played based on the analysis result of step S120.

In step S130, while watching the panoramic video in free mode, the user interacts with the playing device by inputting instructions such as sliding the screen, and the playing device decodes and analyzes the panoramic video in advance or in real time. After the user has not operated the playing device for a preset time, for example t seconds (t being, for example, 3 or 5), a "track interest target" playing instruction is triggered. Suppose, for example, that the video contains an interest target A, an interest target B, and an interest target C. If the playing view angle at the moment the instruction is triggered does not contain all of the interest targets, or contains only interest target A, the playing device, having already determined from the pre-analysis result that the interest target corresponding to the current playing view angle is interest target A, looks up the playing view angles indexed in the previous step and continuously changes the playing view angle accordingly.

As another example, if the playing view angle at the moment the "track interest target" playing instruction is triggered contains all of the interest targets, or contains only interest target A and interest target B, the playing device, having already determined from the pre-analysis result that the interest targets corresponding to the current playing view angle are interest target A and interest target B, looks up the playing view angles indexed in the previous step and continuously changes the playing view angle accordingly.

It should be noted that, while the panoramic video is playing in "track interest target" mode, the user can return to free-mode playback (sliding the screen) at any time by interacting with the playing device.
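The switching between free mode and "track interest target" mode described for step S130 can be sketched as a small state machine. The class and method names below are hypothetical, and requiring an exact match between the targets in view and an indexed group is an assumed interpretation of "best matching".

```python
class PlaybackController:
    """Hypothetical controller: enters tracking after an idle period, leaves on input."""

    def __init__(self, idle_timeout_s=3.0):
        self.idle_timeout_s = idle_timeout_s  # preset time t, e.g. 3 or 5 seconds
        self.mode = "free"
        self.last_input_s = 0.0
        self.tracked = None  # key of the scene sample group being tracked

    def on_user_input(self, now_s):
        # Any interaction (such as sliding the screen) returns playback to free mode.
        self.last_input_s = now_s
        self.mode = "free"
        self.tracked = None

    def tick(self, now_s, targets_in_view, indexed_groups):
        # Trigger tracking once the user has been idle for longer than the preset time.
        if self.mode == "free" and now_s - self.last_input_s > self.idle_timeout_s:
            key = frozenset(targets_in_view)
            if key in indexed_groups:  # assumed rule: exact match with a group
                self.mode = "track"
                self.tracked = key
        return self.mode


controller = PlaybackController(idle_timeout_s=3.0)
indexed_groups = {frozenset({"A"}): "group_A", frozenset({"A", "B"}): "group_AB"}
controller.on_user_input(10.0)
```

If only interest target A is in view when the idle period elapses, the controller starts tracking the group for A; if A and B are both in view, it tracks the group for A and B.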

It should be noted that the user view angle and the interest view angle may each be expressed in any of several forms, provided both use the same form, such as coordinates, rotation angles, or spatial vectors.
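As an illustration of how two of these forms relate, the sketch below converts a pair of rotation angles into a unit spatial vector and back. The axis convention (α as yaw about the vertical axis, β as pitch above the horizontal plane) is an assumption; the description does not fix one.

```python
import math


def angles_to_vector(alpha_deg, beta_deg):
    """Convert assumed yaw/pitch angles (degrees) to a unit direction vector."""
    a, b = math.radians(alpha_deg), math.radians(beta_deg)
    return (math.cos(b) * math.cos(a), math.cos(b) * math.sin(a), math.sin(b))


def vector_to_angles(x, y, z):
    """Convert a non-zero direction vector back to (yaw, pitch) in degrees."""
    norm = math.sqrt(x * x + y * y + z * z)
    beta = math.asin(max(-1.0, min(1.0, z / norm)))  # clamp against rounding error
    alpha = math.atan2(y, x)
    return (math.degrees(alpha), math.degrees(beta))


direction = angles_to_vector(30.0, 45.0)
recovered = vector_to_angles(*direction)
```

Whichever form is chosen, the user view angle and the interest view angle need to share it so that they can be compared directly.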

In addition, an embodiment of the invention provides an apparatus for intelligently keeping a target of interest displayed at all times in a panoramic video.

FIG. 3 is a block diagram showing an example of this apparatus.

As shown in FIG. 3, the apparatus includes an acquisition unit 310, an analysis unit 320, and a playing unit 330.

As shown in FIG. 3, the acquisition unit 310 is configured to acquire panoramic video data.

The analysis unit 320 is configured to analyze the panoramic video data based on an intelligent algorithm: determining scene samples containing the same preset interest target in the panoramic video according to the characteristics of the preset target and at least one preset parameter, and indexing the playing view angle of the scene sample sequence to enable the subsequent video frame sequence to track according to the playing view angle.

In this way, when a "track interest target" play instruction triggered by the user is received, the play unit 330 may play the panoramic video data based on the analysis result of the analysis unit 320.

In an embodiment of the present invention, the analysis unit 320 may include a structure as shown in fig. 4, for example.

As shown in fig. 4, the analysis unit 320 may include, for example, an extraction module 410, a calculation module 420, and a grouping module 430.

As shown in FIG. 4, the extraction module 410 may extract N scene samples containing n predetermined interest targets from each frame (that is, from the current frame and from each subsequent frame) of the panoramic video based on the features of the predetermined interest targets and at least one preset parameter, where N and n are positive integers.

Next, the calculation module 420 may calculate the degree of association between the predetermined interest targets of the N scene samples in each panoramic video frame (e.g., panoramic video frame Im) and those of the N scene samples in the next frame (e.g., panoramic video frame I(m+1)), where m is the ordinal number of the current frame (m = 1, 2, …) and m+1 denotes the frame following the m-th frame. If the degree of association is higher than a preset threshold, it determines that the scene samples of the panoramic video frame and of the next panoramic video frame contain the same predetermined interest target (the same single target or the same n targets).

Next, the grouping module 430 may, through an unsupervised learning process, group in time order the scene samples containing the same predetermined interest target (or the same n targets) in the plurality of panoramic video sequences into a plurality of groups, and index the playing view angle of each group of scene samples.
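The module structure of FIG. 4 can be pictured as three pluggable functions chained by the analysis unit. The sketch below is a structural illustration only: the class, its field names, and the call signatures of the three modules are assumptions, and the stub lambdas merely show how data flows between them.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class AnalysisUnit:
    """Hypothetical composition of modules 410, 420 and 430."""
    extract: Callable[[Any], List[Any]]   # extraction module 410: frame -> scene samples
    associate: Callable[[Any, Any], Any]  # calculation module 420: adjacent frames -> links
    group: Callable[[Any, Any], Any]      # grouping module 430: samples + links -> index

    def analyze(self, frames):
        samples = [self.extract(frame) for frame in frames]
        # Compare each frame's samples with those of the next frame.
        links = [self.associate(cur, nxt) for cur, nxt in zip(samples, samples[1:])]
        return self.group(samples, links)


# Stub modules, for demonstration of the data flow only.
unit = AnalysisUnit(
    extract=lambda frame: [frame],
    associate=lambda cur, nxt: (cur, nxt),
    group=lambda samples, links: {"samples": samples, "links": links},
)
result = unit.analyze([1, 2, 3])
```

Keeping the modules as separate callables mirrors the description, in which extraction, association calculation, and grouping are distinct steps that could each be replaced independently.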

As an example, while watching the panoramic video in free mode, the user may interact with the playing device by inputting instructions such as sliding the screen, and the playing device decodes and analyzes the panoramic video in advance or in real time. After the user has not operated the playing device for a preset time, for example t seconds (t being, for example, 3 or 5), the playing unit 330 triggers a "track interest target" playing instruction. Suppose, for example, that the video contains an interest target A, an interest target B, and an interest target C. If the playing view angle at the moment the instruction is triggered does not contain all of the interest targets, or contains only interest target A, the playing device, having already determined from the pre-analysis result that the interest target corresponding to the current playing view angle is interest target A, looks up the playing view angles indexed in the previous step and continuously changes the playing view angle accordingly.

It should be noted that, while the panoramic video is playing in "track interest target" mode, the user can return to free-mode playback (sliding the screen or moving the hardware device) at any time by interacting with the playing device.

Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; although the present invention and the advantageous effects thereof have been described in detail with reference to the foregoing embodiments, those skilled in the art will understand that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions.

Claims

1. A method for intelligently maintaining a temporal display of an object of interest in a panoramic video, the method comprising:

acquiring panoramic video data, wherein the panoramic video data comprises a plurality of panoramic video sequences;

analyzing panoramic video data based on an intelligent algorithm: determining scene samples containing the same preset interest target in the panoramic video according to the characteristics of a preset target and at least one preset parameter, and indexing the playing view angle of the scene sample sequence to enable the subsequent video frame sequence to track according to the playing view angle; the at least one preset parameter comprises one or more preset parameters of one or more categories; the one or more categories of the at least one preset parameter include at least one of: the field angle, the distance between the center of the interest target and the projection center; wherein the field angles include a first field angle of 60°, a second field angle of 90°, and a third field angle of 120°; the analysis specifically comprises: extracting N scene samples containing n preset interest targets from each frame of the panoramic video based on the characteristics of the preset interest targets and at least one preset parameter, wherein N and n are positive integers; calculating the relevance of preset interest targets of N scene samples in each panoramic video frame and N scene samples in the next frame of the panoramic video frame, and if the relevance is higher than a preset threshold value, judging that the scene samples of the panoramic video frame and the next panoramic video frame contain the same preset interest targets; grouping scene samples containing the same preset interest target in a plurality of panoramic video sequences according to a time sequence through an unsupervised learning process to obtain a plurality of groups, and indexing the playing view angle of each group of scene samples;

and when receiving a tracking interest target playing instruction, playing the panoramic video data based on the analysis result.

2. The method of claim 1, wherein if the user does not operate the playing device for more than a predetermined time, triggering a tracking interest object playing command to select a best matching interest object from the one or more interest objects corresponding to the panoramic video frame as the interest object to be tracked.

3. The method according to claim 2, wherein in the step of selecting the best matching target of interest from the one or more targets of interest corresponding to the panoramic video frame as the target of interest to be tracked, the target of interest to be tracked is determined according to the target of interest within the corresponding playing perspective when the user triggers the tracking target of interest playing instruction, so as to continuously change the angle of interest according to the playing perspective corresponding to the scene sample grouping corresponding to the target of interest to be tracked.

4. The method of claim 3, wherein the user perspective and the interest perspective take one of the following forms: coordinates, rotation angles, or spatial vectors.

5. An apparatus for intelligently maintaining a temporal display of an object of interest in a panoramic video, the apparatus comprising:

an acquisition unit adapted to acquire panoramic video data, the panoramic video data comprising a plurality of panoramic video sequences;

the analysis unit is suitable for analyzing the panoramic video data based on an intelligent algorithm: determining scene samples containing the same preset interest target in the panoramic video according to the characteristics of a preset target and at least one preset parameter, and indexing the playing view angle of the scene sample sequence to enable the subsequent video frame sequence to track according to the playing view angle; wherein the at least one preset parameter comprises one or more preset parameters of one or more categories; the one or more categories of the at least one preset parameter include at least one of: the field angle, the distance between the center of the interest target and the projection center; in the at least one preset parameter, the field angles include a first field angle of 60 °, a second field angle of 90 °, and a third field angle of 120 °; and

the playing unit is suitable for playing the panoramic video data based on the analysis result when receiving a tracking interest target playing instruction;

the analysis unit comprises an extraction module, a calculation module and a grouping module;

the extraction module is suitable for extracting N scene samples containing n preset interest targets from each frame of the panoramic video based on the characteristics of the preset interest targets and at least one preset parameter, wherein N and n are positive integers;

the calculation module is suitable for calculating the relevance of the preset interest targets of the N scene samples in each panoramic video frame and the N scene samples in the next frame of the panoramic video frame, and if the relevance is higher than a preset threshold value, the calculation module judges that the scene samples of the panoramic video frame and the next panoramic video frame contain the same preset interest targets;

the grouping module is suitable for grouping scene samples containing the same or the same n preset interest targets in a plurality of panoramic video sequences according to a time sequence through an unsupervised learning process to obtain a plurality of groups, and indexing the playing visual angle of each group of scene samples.