CN113313735B - Panoramic video data processing method and device - Google Patents


Info

Publication number
CN113313735B
CN113313735B · CN202110573171.3A
Authority
CN
China
Prior art keywords
video frame
polar coordinate
target object
image
plane image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110573171.3A
Other languages
Chinese (zh)
Other versions
CN113313735A (en)
Inventor
潘一汉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Bilibili Technology Co Ltd
Original Assignee
Shanghai Bilibili Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Bilibili Technology Co Ltd filed Critical Shanghai Bilibili Technology Co Ltd
Priority to CN202110573171.3A priority Critical patent/CN113313735B/en
Publication of CN113313735A publication Critical patent/CN113313735A/en
Priority to US17/730,950 priority patent/US11647294B2/en
Application granted granted Critical
Publication of CN113313735B publication Critical patent/CN113313735B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/246Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/048Interaction techniques based on graphical user interfaces [GUI]
    • G06F3/0484Interaction techniques based on graphical user interfaces [GUI] for the control of specific functions or operations, e.g. selecting or manipulating an object, an image or a displayed text element, setting a parameter value or selecting a range
    • G06F3/04845Interaction techniques based on graphical user interfaces [GUI] for the control of specific functions or operations, e.g. selecting or manipulating an object, an image or a displayed text element, setting a parameter value or selecting a range for image manipulation, e.g. dragging, rotation, expansion or change of colour
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/60Analysis of geometric attributes
    • G06T7/62Analysis of geometric attributes of area, perimeter, diameter or volume
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/46Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30241Trajectory

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Human Computer Interaction (AREA)
  • Geometry (AREA)
  • Image Analysis (AREA)

Abstract

The application provides a panoramic video data processing method and apparatus. The method includes: in the case of receiving a frame selection operation on a target object located at the center of a playing plane image of a current video frame, determining a reference polar coordinate of the center position of the target object in a spherical polar coordinate system; generating a tracking plane image of the next video frame with the reference polar coordinate as the center, according to a panoramic image of the next video frame in the spherical polar coordinate system; determining, according to object features of the target object, an updated polar coordinate in the spherical polar coordinate system of the center position of the target object in the tracking plane image of the next video frame; and, taking the updated polar coordinate as the reference polar coordinate, returning to the step of generating a tracking plane image of the next video frame with the reference polar coordinate as the center, until a tracking stop condition is reached, thereby obtaining a reference polar coordinate sequence corresponding to the frames from the current video frame to the target video frame.

Description

Panoramic video data processing method and device
Technical Field
The present application relates to the field of image processing technologies, and in particular, to a panoramic video data processing method. The application also relates to a panoramic video data processing device, a computing device and a computer readable storage medium.
Background
With the rapid development of computer technology and image processing technology, panoramic video has emerged. Shooting and producing panoramic videos has become increasingly popular, and many video websites offer panoramic video as a dedicated category for users to select and watch. A panoramic video is a dynamic video, shot by a panoramic camera, that contains 360-degree omnidirectional picture content; it turns a static panorama into a dynamic video image, and a user can freely view the video anywhere within the shooting angle range of the panoramic camera.
In the prior art, if the author of a panoramic video wants users to see a specific default view-angle center in the plane image when watching the video, the finished panoramic video must be processed frame by frame. This makes panoramic video processing inefficient and very labor-intensive.
Disclosure of Invention
In view of this, an embodiment of the present application provides a method for processing panoramic video data. The application also relates to a panoramic video data processing device, a computing device and a computer readable storage medium, which are used for solving the problem of low panoramic video processing efficiency in the prior art.
According to a first aspect of embodiments of the present application, there is provided a panoramic video data processing method, including:
under the condition that a frame selection operation of a target object positioned at the center of a playing plane image aiming at a current video frame is received, determining a reference polar coordinate of the center position of the target object in a spherical polar coordinate system;
generating a tracking plane image of a next video frame by taking the reference polar coordinate as a center according to a panoramic image of the next video frame in the spherical polar coordinate system;
determining an updated polar coordinate of the central position of the target object in the spherical polar coordinate system in the tracking plane image of the next video frame according to the object feature of the target object;
and returning to the step of executing the operation of generating the tracking plane image of the next video frame by taking the reference polar coordinate as the center according to the panoramic image of the next video frame in the spherical polar coordinate system by taking the updated polar coordinate as the reference polar coordinate until a tracking stop condition is reached, and obtaining a reference polar coordinate sequence corresponding to the current video frame to a target video frame, wherein the target video frame is the video frame corresponding to the tracking stop condition.
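The iterative procedure of the first aspect can be sketched in Python as follows. This is an illustrative sketch only; the helper callables (`locate_target`, `render_tracking_image`, `track_target`, `stop_condition`) are hypothetical placeholders standing in for the steps of the claimed method, not APIs from the patent:

```python
def build_reference_sequence(frames, selection, stop_condition, locate_target,
                             render_tracking_image, track_target):
    """Sketch of the claimed loop: starting from a frame-selected target,
    repeatedly re-center the tracking plane image on the reference polar
    coordinate and update that coordinate until the stop condition is met."""
    # Reference polar coordinate (azimuth, elevation) of the selected target
    ref = locate_target(frames[0], selection)
    sequence = [ref]
    for frame in frames[1:]:
        # Generate the next frame's tracking plane image centered on ref
        plane = render_tracking_image(frame, ref)
        # Update the polar coordinate from the target's object features
        ref = track_target(plane, selection)
        sequence.append(ref)
        if stop_condition(plane, ref):
            break
    return sequence
```

The returned list corresponds to the "reference polar coordinate sequence" from the current video frame to the target video frame.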
According to a second aspect of embodiments of the present application, there is provided a panoramic video data processing apparatus including:
a first determination module configured to determine a reference polar coordinate of a center position of a target object in a spherical polar coordinate system in a case where a frame selection operation of the target object located at a center of a playing plane image for a current video frame is received;
a generation module configured to generate a tracking plane image of a next video frame with the reference polar coordinate as a center according to a panoramic image of the next video frame in the spherical polar coordinate system;
a second determining module configured to determine an updated polar coordinate of the central position of the target object in the spherical polar coordinate system in the tracking plane image of the next video frame according to the object feature of the target object;
and the execution module is configured to use the updated polar coordinate as the reference polar coordinate, return to the operation step of executing the panoramic image in the spherical polar coordinate system according to the next video frame, and generate a tracking plane image of the next video frame by taking the reference polar coordinate as a center until a tracking stop condition is reached, so as to obtain a reference polar coordinate sequence corresponding to the current video frame to a target video frame, wherein the target video frame is a video frame corresponding to the tracking stop condition.
According to a third aspect of embodiments herein, there is provided a computing device comprising:
a memory and a processor;
the memory is configured to store computer-executable instructions, and the processor is configured to execute the computer-executable instructions to implement the method of:
under the condition that a frame selection operation of a target object positioned in the center of a playing plane image for a current video frame is received, determining a reference polar coordinate of the center position of the target object in a spherical polar coordinate system;
generating a tracking plane image of a next video frame by taking the reference polar coordinate as a center according to a panoramic image of the next video frame in the spherical polar coordinate system;
determining an updated polar coordinate of the central position of the target object in the spherical polar coordinate system in the tracking plane image of the next video frame according to the object feature of the target object;
and returning to the step of executing the operation of generating the tracking plane image of the next video frame by taking the reference polar coordinate as the center according to the panoramic image of the next video frame in the spherical polar coordinate system by taking the updated polar coordinate as the reference polar coordinate until a tracking stop condition is reached, and obtaining a reference polar coordinate sequence corresponding to the current video frame to a target video frame, wherein the target video frame is the video frame corresponding to the tracking stop condition.
According to a fourth aspect of embodiments herein, there is provided a computer-readable storage medium storing computer-executable instructions that, when executed by a processor, implement the steps of any of the panoramic video data processing methods.
The method for processing the panoramic video data comprises the steps that under the condition that a frame selection operation of a target object located in the center of a playing plane image for a current video frame is received, a reference polar coordinate of the center position of the target object in a spherical polar coordinate system is determined; generating a tracking plane image of a next video frame by taking the reference polar coordinate as a center according to a panoramic image of the next video frame in the spherical polar coordinate system; determining an updated polar coordinate of the central position of the target object in the spherical polar coordinate system in the tracking plane image of the next video frame according to the object feature of the target object; and returning to the step of executing the operation of generating the tracking plane image of the next video frame by taking the reference polar coordinate as the center according to the panoramic image of the next video frame in the spherical polar coordinate system by taking the updated polar coordinate as the reference polar coordinate until a tracking stop condition is reached, and obtaining a reference polar coordinate sequence corresponding to the current video frame to a target video frame, wherein the target video frame is the video frame corresponding to the tracking stop condition.
Under the circumstance, after framing a target object positioned in the center of a playing plane image in the current video, an author can determine a reference polar coordinate of the center position of the target object in a spherical polar coordinate system and uses the reference polar coordinate as the center of a tracking plane image for generating the next video frame, so that a corresponding panoramic video is played by always taking the target object as the center, namely, the author only needs to frame the target object, can automatically generate the tracking plane image taking the target object as the center and track the target, the visual angle center of the continuous tracking plane image can be automatically generated for the target object, the author does not need to process the panoramic video frame by frame, and the processing efficiency of the panoramic video is greatly improved; and the target object in the panoramic video is tracked by a spherical polar coordinate system aiming at the framed target object, so that the tracking failure caused by the graphic distortion of the target object at different positions in the panoramic video can be effectively avoided, the target object tracking accuracy and success rate are improved, and the panoramic video processing effect and playing effect are improved.
Drawings
Fig. 1 is a flowchart of a panoramic video data processing method according to an embodiment of the present application;
FIG. 2 is a schematic diagram of a spherical polar coordinate system according to an embodiment of the present disclosure;
fig. 3 is a flowchart of another panoramic video data processing method according to an embodiment of the present application;
fig. 4 is a schematic structural diagram of a panoramic video data processing apparatus according to an embodiment of the present application;
fig. 5 is a block diagram of a computing device according to an embodiment of the present application.
Detailed Description
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present application. However, the present application can be implemented in many ways other than those described herein, and those skilled in the art can make similar modifications without departing from the spirit and scope of the application; the application is therefore not limited to the specific implementations disclosed below.
The terminology used in the one or more embodiments of the present application is for the purpose of describing particular embodiments only and is not intended to be limiting of the one or more embodiments of the present application. As used in one or more embodiments of the present application and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used in one or more embodiments of the present application refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It will be understood that, although the terms first, second, etc. may be used herein in one or more embodiments of the present application to describe various information, the information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, a first aspect may be termed a second aspect, and, similarly, a second aspect may be termed a first aspect, without departing from the scope of one or more embodiments of the present application. The word "if," as used herein, may be interpreted as "when" or "upon" or "in response to determining," depending on the context.
In the present application, a panoramic video data processing method is provided, and the present application relates to a panoramic video data processing apparatus, a computing device, and a computer readable storage medium, which are described in detail in the following embodiments one by one.
Fig. 1 shows a flowchart of a panoramic video data processing method according to an embodiment of the present application, which specifically includes the following steps:
step 102: in the case of receiving a frame selection operation of a target object located at the center of a playing plane image for a current video frame, determining a reference polar coordinate of the center position of the target object in a spherical polar coordinate system.
In practical application, because a panoramic video covers a 360-degree viewing angle when shot, the object or target to be highlighted is not fixed within a certain viewing-angle range. When the panoramic video is played, the image content of the full 360-degree range cannot be displayed on the playing device all at once, so the user has to select a suitable observation angle, which becomes the image playing angle of the current video. In other words, a user watching a panoramic video must continuously drag the viewing angle to see the content of interest; this is cumbersome and makes for a poor viewing experience.
In addition, if the creator of the video wants the user to have a specific center of view of the playing picture when watching the panoramic video, the finished panoramic video needs to be processed frame by frame, which results in low processing efficiency and high labor cost. Moreover, when a panoramic video is currently processed frame by frame, it is usually decoded directly into a sequence of frame images, and target tracking is then performed directly on each original panoramic image. Because the target appears at different positions in the original panoramic image, graphic distortion may occur, which can cause target tracking to fail.
Therefore, in order to improve the processing efficiency of the panoramic video and the tracking success rate of the target object, the application provides a panoramic video data processing method, which determines the reference polar coordinate of the center position of the target object in a spherical polar coordinate system under the condition of receiving the frame selection operation of the target object positioned at the center of a playing plane image aiming at the current video frame; generating a tracking plane image of a next video frame by taking the reference polar coordinate as a center according to a panoramic image of the next video frame in the spherical polar coordinate system; determining an updated polar coordinate of the central position of the target object in the spherical polar coordinate system in the tracking plane image of the next video frame according to the object feature of the target object; and returning to the step of executing the operation of generating the tracking plane image of the next video frame by taking the reference polar coordinate as the center according to the panoramic image of the next video frame in the spherical polar coordinate system by taking the updated polar coordinate as the reference polar coordinate until a tracking stop condition is reached, and obtaining a reference polar coordinate sequence corresponding to the current video frame to a target video frame, wherein the target video frame is the video frame corresponding to the tracking stop condition. 
Therefore, the visual angle center of the continuous tracking plane image can be automatically generated for the target object, the creator does not need to process the panoramic video frame by frame, and the processing efficiency of the panoramic video is greatly improved; and for the framed target object, the target object in the panoramic video is tracked by using a spherical polar coordinate system, so that the tracking failure caused by the graphic distortion of the target object at different positions in the panoramic video can be effectively avoided.
Specifically, the current video frame is a panoramic video frame of the creator for framing the target object; the planar image is a two-dimensional image obtained by mapping the panoramic video image, and the playing planar image is a planar image actually played by the client, namely a planar image which can be seen by a user; the target object is an object which the creator wants to subsequently display in the center of the playing plane image, and the content framed and selected by the framing operation is the target object; the frame selection operation refers to an operation of adding a target frame outside the target object, namely, selecting the target object by using the target frame.
In addition, the spherical polar coordinate system, also called spatial polar coordinates, is a three-dimensional coordinate system extended from the two-dimensional polar coordinate system to determine the positions of points, lines, planes, and bodies in three-dimensional space. It takes the origin of coordinates as the reference point and consists of an azimuth angle, an elevation angle, and a radial distance. In this application, the radial distance in the spherical polar coordinate system is set to a default value, usually between 100 and 300, such as 128. That is, the spherical polar coordinate system in the present application has a fixed sphere radius, so the reference polar coordinate in the present application consists of an azimuth angle and an elevation angle, by which a point on the spherical surface (i.e., the point on the spherical surface corresponding to the center position of the target object) can be uniquely determined. The reference polar coordinate is the view-angle center of the plane image, corresponding to the panoramic video image, that the creator needs to produce.
For example, fig. 2 is a schematic diagram of a spherical polar coordinate system according to an embodiment of the present application, and as shown in fig. 2, lat (elevation angle) and lon (azimuth angle) are polar coordinate representations of an elevation angle and an azimuth angle of a point a in a sphere, respectively.
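As an illustration of the coordinate convention of Fig. 2 (not part of the patent text), a point on the fixed-radius sphere can be computed from its elevation (lat) and azimuth (lon); the default radius of 128 is the example value mentioned in the description above:

```python
import math

def polar_to_cartesian(lat_deg, lon_deg, radius=128.0):
    """Convert an (elevation, azimuth) pair in degrees to a 3-D point on a
    sphere of fixed radius, following the lat/lon convention of Fig. 2."""
    lat = math.radians(lat_deg)
    lon = math.radians(lon_deg)
    x = radius * math.cos(lat) * math.cos(lon)
    y = radius * math.cos(lat) * math.sin(lon)
    z = radius * math.sin(lat)
    return (x, y, z)
```

With a fixed radius, the two angles alone uniquely determine the point on the sphere, which is why the reference polar coordinate needs only azimuth and elevation.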
It should be noted that, in general, the playing plane image of the panoramic video viewed by the user is a plane picture generated by taking a certain point on a sphere as a center and taking a certain elevation angle and azimuth angle range as a viewing angle. Tracking of the target object in the present application is also based on changes in the polar coordinates of reference (azimuth and elevation), wherein the range of elevation changes to plus or minus 90 degrees and the range of azimuth changes to plus or minus 180 degrees.
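A minimal sketch of keeping tracked coordinates inside these ranges (an implementation assumption for illustration, not claimed by the patent) clamps the elevation to plus or minus 90 degrees and wraps the azimuth into plus or minus 180 degrees:

```python
def normalize_polar(lat, lon):
    """Clamp elevation to [-90, 90] and wrap azimuth into [-180, 180)."""
    lat = max(-90.0, min(90.0, lat))
    lon = (lon + 180.0) % 360.0 - 180.0
    return (lat, lon)
```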
In the application, when a creator needs to fix a target object as a view angle center of a playing plane image, the target object can be framed, then the target object can be automatically tracked in subsequent video frames according to a reference polar coordinate (azimuth angle and elevation angle) of the center position of the target object in a spherical polar coordinate system, the reference polar coordinate of the center position of the target object is updated, so that the view angle center of a continuous tracking plane image corresponding to the video segment is generated, and then the video segment is played according to the view angle center of the continuous tracking plane image.
In an optional implementation manner of this embodiment, before receiving a frame selection operation on a target object located in the center of the playing plane image in a current video frame, the method further includes:
receiving a moving operation of dragging the target object to the center of the playing plane image;
and updating the playing plane image of the current video frame into a plane image taking the target object as a center according to the moving operation.
It should be noted that the target object that the author wants to select by frame may not be located in the center of the playing plane image, so the author may drag the target object to the center of the playing plane image in the playing plane image of the current video frame, and then select the target object located in the center of the playing plane image by frame.
According to the method and the device, the target object needing to be framed can be dragged to the center of the playing plane image, then framing is carried out, target tracking is carried out according to the object characteristics of the target object located at the center of the playing plane image, and therefore the problem that the target object generates graphic distortion at other different positions of the playing plane image, and then follow-up target tracking fails is avoided.
In an optional implementation manner of this embodiment, a reference polar coordinate of the center position of the target object in the spherical polar coordinate system is determined, and a specific implementation process may be as follows:
determining the central position of the target object in the playing plane image;
and determining a reference polar coordinate of the central position in the spherical polar coordinate system according to the panoramic image of the current video frame in the spherical polar coordinate system and the central position.
It should be noted that, after a certain frame of panoramic video (i.e., panoramic image) is projected to a spherical polar coordinate system, each pixel point has a corresponding polar coordinate on the spherical polar coordinate system, and the planar image is a two-dimensional planar image mapped by a certain panoramic video frame, and the planar image also includes a plurality of pixel points, so that a pixel point at the center position of a target object in the planar image can be determined first, and then a polar coordinate corresponding to the pixel point on the spherical polar coordinate system is found, where the polar coordinate is a reference polar coordinate of the center position of the target object in the spherical polar coordinate system.
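Assuming the panoramic frame is stored in the common equirectangular projection (the patent does not specify a projection), the polar coordinate of a pixel can be read off directly from its position; `pixel_to_polar` is a hypothetical helper for illustration:

```python
def pixel_to_polar(u, v, width, height):
    """Map pixel (u, v) of a width x height equirectangular panorama to
    (elevation, azimuth) in degrees: azimuth spans -180..180 from left to
    right, elevation spans 90..-90 from top to bottom (pixel centers)."""
    lon = (u + 0.5) / width * 360.0 - 180.0
    lat = 90.0 - (v + 0.5) / height * 180.0
    return (lat, lon)
```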
In an optional implementation manner of this embodiment, after determining the reference polar coordinate in the spherical polar coordinate system of the center position of the target object, the method further includes:
and carrying out image recognition on the playing plane image of the current video frame, and determining the object characteristics of the target object.
It should be noted that, in order to track the target object in subsequent video frames, image recognition needs to be performed on the target object framed in the current video frame, so as to obtain the object features of the target object located at the center of the playing plane image. In specific implementations, the tracking algorithm may be based on correlation filtering, such as KCF (Kernelized Correlation Filter) or DSST (Discriminative Scale Space Tracker, a filtering algorithm that jointly estimates position and scale), or may be based on deep learning, such as SiamRPN or SiamFC; the specific tracking algorithm is not limited in this application.
In an optional implementation manner of this embodiment, the image recognition is performed on the playing plane image of the current video frame, and a specific implementation process may be as follows:
determining a target frame corresponding to the frame selection operation;
determining a corresponding recognition area according to the target frame;
and carrying out image recognition in the recognition area in the playing plane image of the current video frame.
When the target object is selected, a target frame may be used for frame selection, and then a partial image content greater than or equal to the target frame may be selected as an identification area according to the target frame, and then image identification may be performed only in the identification area.
In actual application, the area framed by the target frame may be determined, and a preset multiple of the area may be determined as the identification area. Of course, the length and width of the target frame may also be determined, and a region formed by preset multiples of the length and width may be determined as the recognition region. Specifically, the preset multiple may be preset, and the preset multiple is used to determine the area where the image recognition is finally performed, for example, the preset multiple may be 1.5 times, 2 times, and the like.
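A sketch of deriving the recognition region from the target frame and a preset multiple, clamped to the image bounds (the function name and the (x, y, w, h) box convention are assumptions for illustration):

```python
def recognition_region(box, scale, img_w, img_h):
    """Expand a frame-selection box (x, y, w, h) about its center by the
    preset multiple `scale`, clamping the result to the image bounds."""
    x, y, w, h = box
    cx, cy = x + w / 2.0, y + h / 2.0   # center of the target frame
    nw, nh = w * scale, h * scale       # scaled width and height
    nx = max(0.0, cx - nw / 2.0)
    ny = max(0.0, cy - nh / 2.0)
    nx2 = min(float(img_w), cx + nw / 2.0)
    ny2 = min(float(img_h), cy + nh / 2.0)
    return (nx, ny, nx2 - nx, ny2 - ny)
```

Restricting recognition to this region, rather than the whole playing plane image, is what speeds up feature extraction.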
It should be noted that, when performing image recognition on the playing plane image of the current video frame and extracting the object feature of the target object, image recognition may be performed on the entire playing plane image to extract the feature. In addition, only the object characteristics of the target object need to be acquired finally, so that image recognition can be performed only on the area near the target object, namely, the area framed by the framing operation can be determined first, then the area of the preset multiple is determined as the recognition area, and image recognition is performed only in the recognition area, so that image recognition of the whole playing plane image is not needed, the image recognition speed is increased, and the processing efficiency of the whole panoramic video is improved.
In the application, after the creator selects the target object located in the center of the playing plane image in the current video, the reference polar coordinate of the center position of the target object in the spherical polar coordinate system can be determined, so that the target object can be conveniently used as the center of the tracking plane image for generating the next video frame, and therefore the target object can be conveniently and always used for target tracking, and the corresponding panoramic video can be played.
Step 104: and generating a tracking plane image of the next video frame by taking the reference polar coordinate as a center according to a panoramic image of the next video frame in the spherical polar coordinate system.
Specifically, on the basis of determining the reference polar coordinate of the center position of the target object in the spherical polar coordinate system, further, the tracking plane image of the next video frame is generated according to the panoramic image of the next video frame in the spherical polar coordinate system by taking the reference polar coordinate as the center. The tracking plane image is a plane image used for target tracking of a target object.
In an optional implementation manner of this embodiment, the tracking plane image of the next video frame is generated according to the panoramic image of the next video frame in the spherical polar coordinate system with the reference polar coordinate as a center, and a specific implementation process may be as follows:
mapping the next video frame to the spherical polar coordinate system to obtain a panoramic image of the next video frame in the spherical polar coordinate system;
taking the reference polar coordinate as a center and a preset angle as a range, and intercepting the panoramic image;
and converting the intercepted panoramic image into a tracking plane image of the next video frame.
Specifically, the preset angle range refers to preset ranges of elevation and azimuth angles, such as an elevation angle of 0-30 degrees and an azimuth angle of 10-45 degrees. It should be noted that the same preset angle range is used when generating the tracking plane image corresponding to each video frame of the panoramic video: the ranges of elevation and azimuth are preset once, the tracking plane image of the first frame is generated according to these ranges, and the tracking plane image of each subsequent frame is generated according to the same ranges.
In this application, the panoramic image of the next video frame can be projected into the spherical polar coordinate system, a portion of the panoramic image is then intercepted with the determined reference polar coordinate as the center, and that portion is mapped into a two-dimensional plane image, thereby obtaining the tracking plane image of the next video frame.
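As a minimal sketch of this interception step, the following maps each pixel of the tracking plane image to an (azimuth, elevation) pair within the preset angle range around the reference polar coordinate and samples the equirectangular panorama. A production implementation would use a true perspective (rectilinear) projection; all names and the sampling scheme here are illustrative assumptions.

```python
# Illustrative sketch: crop the equirectangular panorama of the next video
# frame around a reference polar coordinate (ref_az, ref_el), within half-
# widths az_range / el_range in degrees, into an out_w x out_h plane image.

def crop_tracking_plane(pano, ref_az, ref_el, az_range, el_range, out_w, out_h):
    """pano: 2D list indexed [row][col] (equirectangular; rows span
    elevation -90..90, columns span azimuth 0..360)."""
    pano_h, pano_w = len(pano), len(pano[0])
    out = []
    for j in range(out_h):
        el = ref_el + el_range * (2 * j / (out_h - 1) - 1) if out_h > 1 else ref_el
        row = []
        for i in range(out_w):
            az = ref_az + az_range * (2 * i / (out_w - 1) - 1) if out_w > 1 else ref_az
            # equirectangular mapping: azimuth -> column, elevation -> row
            col = int((az % 360) / 360 * (pano_w - 1))
            rowi = min(max(int((el + 90) / 180 * (pano_h - 1)), 0), pano_h - 1)
            row.append(pano[rowi][col])
        out.append(row)
    return out
```

The center pixel of the output always samples the panorama at the reference polar coordinate, which is what keeps the target object at the center of the tracking plane image.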
Step 106: and determining the updated polar coordinates of the central position of the target object in the tracking plane image of the next video frame in the spherical polar coordinate system according to the object characteristics of the target object.
Specifically, on the basis of generating a tracking plane image of the next video frame by taking the reference polar coordinate as a center according to a panoramic image of the next video frame in the spherical polar coordinate system, further, an updated polar coordinate of the central position of the target object in the tracking plane image of the next video frame in the spherical polar coordinate system is determined according to the object feature of the target object.
It should be noted that image recognition performed on the playing plane image of the current video frame yields the object features of the target object located at the center of that playing plane image. Based on these object features, target tracking can be performed in the generated tracking plane image of the next video frame to find the corresponding target object, and the reference polar coordinate is then updated based on the newly determined target object.
In an optional implementation manner of this embodiment, according to the object feature of the target object, an updated polar coordinate of the center position of the target object in the tracking plane image of the next video frame in the spherical polar coordinate system is determined, and a specific implementation process may be as follows:
performing image recognition on the recognition area in the tracking plane image of the next video frame, and determining the central position of a target object in the next video frame;
and determining the updated polar coordinates of the central position of the target object in the next video frame in the spherical polar coordinate system.
It should be noted that image recognition is performed on the tracking plane image of the next video frame to obtain the corresponding image features; the image features matching the object features of the target object are determined to be the target object, and the updated polar coordinates of the target object's center position can then be determined. In practical application, either the whole tracking plane image of the next video frame can be recognized to obtain the corresponding image features, or only the recognition area determined based on the current video frame can be recognized, that is, image recognition is performed only within the recognition area in the tracking plane image of the next video frame.
In an optional implementation manner of this embodiment, the image recognition is performed on the recognition area in the tracking plane image of the next video frame, and the center position of the target object in the next video frame is determined, which may be specifically implemented as follows:
performing image recognition on the recognition area in the tracking plane image of the next video frame to obtain image characteristics;
analyzing and processing the image features and the object features to obtain a confidence coefficient that the target object exists in the identification region and a position offset of the image features relative to the center position of the identification region;
and under the condition that the confidence degree is greater than a confidence degree threshold value, determining the central position of the target object in the next video frame according to the central position of the target object in the playing plane image and the position offset.
In particular, the confidence, also referred to as reliability or confidence coefficient, indicates how likely it is that the target object exists in the recognition area. It should be noted that after image recognition is performed on the recognition area in the tracking plane image of the next video frame to obtain the image features, it is necessary to determine whether the recognized image features correspond to the initially framed target object. The image features and the object features can therefore be analyzed to determine the confidence that the target object exists in the recognition area, that is, the reliability of its presence; in specific implementations, different algorithms may be used for this analysis.
In one possible implementation, the similarity between the image feature and the object feature may be determined through feature comparison, so as to obtain the confidence that the target object exists in the recognition region. In a specific implementation, the image features and the object features may be compared, a similarity between the image features and the object features may be determined, and the similarity may be determined as a confidence level that a target object exists in the recognition region.
In addition, the confidence that the target object exists in the recognition area can also be obtained by performing a convolution of the image features and the object features. Of course, in practical application, other tracking algorithms can also be adopted: the image features and the object features are input into the tracking algorithm, which outputs the confidence that the target object exists in the recognition area. The present application does not limit the choice of algorithm.
It should be noted that, when the image feature and the object feature are analyzed, in addition to the confidence that the target object exists in the recognition region, the position offset of the image feature with respect to the center position of the recognition region may be obtained. Since the identification area is determined according to the target object in the playing plane image of the current video frame, the central position of the identification area may actually represent the central position of the target object in the current video frame. In addition, the image feature is a feature obtained by identifying the identification region in the next video frame, and the object feature is an object feature of the target object in the current video frame (i.e. a feature when the target object is located at the image center position), and by analyzing and comparing the image feature and the object feature, a change of the image feature in the next video frame relative to the feature when the target object is located at the image center position can be obtained, and the change can represent a position offset of the image feature relative to the center position of the identification region.
In addition, since the image feature is obtained by performing image recognition on the recognition area in the tracking plane image of the next video frame, it corresponds to a candidate target object, and the position offset of the image feature relative to the center position of the recognition area is the position offset of that candidate relative to the center of the recognition area. Once the candidate is subsequently confirmed to be the target object, this position offset indicates how far the target object has moved in the next video frame relative to the current video frame.
When the confidence is greater than the confidence threshold, the recognized image feature is the initially framed target object; therefore, the updated center position of the target object (i.e., its center position in the next video frame) may be obtained from the initial center position (its center position in the playing plane image) and the position offset (i.e., the distance the target object has moved).
It should be noted that, target tracking is performed in a new video frame (i.e., a next video frame), a position of the target object in the next video frame is determined, and then updated polar coordinates of a center position of the target object are further determined, so that it is convenient to continue to generate a tracking plane image of a subsequent video frame and continue to track the target object.
In an optional implementation manner of this embodiment, after determining the center position of the target object in the next video frame according to the center position of the target object in the playing plane image and the position offset, the method further includes:
and fusing the image characteristic and the object characteristic to obtain an updated object characteristic.
It should be noted that after image recognition is performed on the playing plane image of the initial video frame to obtain the initial object features of the target object located at its center, the image features recognized in each subsequent video frame may all be compared against the initial object features to determine the target object. Alternatively, the image features recognized each time can be fused with the previous object features to serve as the tracking reference for the next video frame; that is, each time, the recognized image features and the previous object features are fused into updated object features, which serve as the comparison standard for target tracking in subsequent video frames.
In an example, assuming that an initial video frame is a 10 th frame video frame, performing image recognition on a playing plane image of the 10 th frame video frame to obtain an object feature 1 (a dog feature) of a target object; then, carrying out image recognition on the tracking plane image of the 11 th frame of video frame, carrying out analysis processing on the image feature 1 (the hat-wearing dog) and the object feature 1, and when the confidence coefficient between the image feature 1 and the object feature 1 is determined to be greater than a confidence coefficient threshold value, indicating that the image feature 1 is the target object, and fusing the object feature 1 and the image feature 1 to obtain an updated object feature 1 (the hat-wearing dog); and then, carrying out image recognition on the tracking plane image of the 12 th frame of video frame, carrying out analysis processing on the image feature 2 (the dog wearing clothes) and the update object feature 1, when the confidence coefficient between the image feature 2 and the update object feature 1 is determined to be greater than a confidence coefficient threshold value, indicating that the image feature 2 obtained by recognition is the target object, fusing the image feature 2 and the update object feature 1 to obtain the update object feature 2 (the dog wearing a hat and wearing clothes), and the like.
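The fusion step in the example above can be sketched as a simple weighted blend of the two feature vectors. The patent does not specify the fusion operation, so the linear blend and its learning-rate parameter are assumptions for illustration.

```python
# Illustrative sketch: each time the target is confirmed, blend the newly
# recognized image feature into the stored object feature with a small
# learning rate, so the tracking template adapts gradually (e.g. the "dog"
# feature slowly absorbs the "hat" and "clothes" appearance changes).

def fuse_features(object_feat, image_feat, rate=0.2):
    """Returns the updated object feature; rate controls how quickly the
    template follows appearance changes (assumed hyperparameter)."""
    return [(1 - rate) * o + rate * i for o, i in zip(object_feat, image_feat)]
```

A small rate keeps the template stable against momentary occlusions, while still letting it track gradual appearance changes across frames.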
Step 108: and taking the updated polar coordinate as the reference polar coordinate, and returning to the operation step of the step 104 until a tracking stop condition is reached to obtain a reference polar coordinate sequence corresponding to the current video frame to a target video frame, wherein the target video frame is a video frame corresponding to the tracking stop condition.
Specifically, on the basis of determining an updated polar coordinate of the central position of the target object in the spherical polar coordinate system in the tracking plane image of the next video frame according to the object feature of the target object, further, the updated polar coordinate is used as the reference polar coordinate, and the operation step of step 104 is returned to until the tracking stop condition is reached, so as to obtain a reference polar coordinate sequence corresponding to the current video frame to the target video frame, where the target video frame is a video frame corresponding to the tracking stop condition.
The tracking stop condition is a condition indicating that target tracking has failed. When it is reached, the target object may no longer exist in the current tracking plane image, so tracking stops and the series of reference polar coordinates obtained so far (i.e., the reference polar coordinate sequence) is output. These reference polar coordinates are the viewing-angle centers of the tracking plane images from the current video frame to the target video frame; that is, the plane images from the current video frame to the target video frame can subsequently be played with this series of reference polar coordinates as the centers.
It should be noted that after the updated polar coordinate is determined, the next video frame may continue to be processed, according to the operation steps of the foregoing step 104, a tracking plane image of the next video frame is generated, the updated polar coordinate is determined, the updated polar coordinate is used as a reference polar coordinate, the next video frame is continuously processed, and the process is repeated in a circulating manner until a tracking stop condition is reached, so as to obtain a continuous reference polar coordinate sequence corresponding to a segment of video, where the continuous reference polar coordinate sequence may be used as a reference for subsequently playing the segment of video.
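The loop of steps 104 to 108 can be sketched as follows. Here `track_in_frame` stands in for the per-frame work (generating the tracking plane image, recognizing the target, computing the confidence and updated polar coordinate); it and all other names are assumptions for illustration.

```python
# Illustrative sketch of the steps 104-108 loop: starting from the reference
# polar coordinate, process each frame in turn, update the reference, and
# collect the reference polar coordinate sequence until the confidence drops
# below the threshold (the tracking stop condition).

def track_sequence(frames, ref_polar, track_in_frame, conf_threshold=0.6):
    """track_in_frame(frame, ref_polar) -> (confidence, updated_polar)."""
    sequence = [ref_polar]                   # coordinate for the current frame
    for frame in frames:
        confidence, new_polar = track_in_frame(frame, ref_polar)
        if confidence < conf_threshold:      # tracking stop condition reached
            break
        ref_polar = new_polar                # updated polar becomes the reference
        sequence.append(ref_polar)
    return sequence
```

The returned sequence covers the current video frame up to the target video frame, i.e., the last frame before tracking was lost.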
In an optional implementation manner of this embodiment, when the tracking stop condition is reached, a specific implementation process may be as follows:
determining that the tracking stop condition is reached if the confidence that the target object is present in the recognition area is below a confidence threshold.
Specifically, the confidence threshold is a preset numerical value, and is used to determine whether the confidence of the target object existing in the recognition region is too low, that is, determine whether the target object still exists in the recognition region, for example, the confidence threshold may be 50 or 60.
It should be noted that image recognition is performed on the tracking plane image of the next video frame to obtain the corresponding image features, and the image features and the object features are analyzed to obtain the confidence that the target object exists in the recognition area. If the confidence is greater than the confidence threshold, the target object still exists in the recognition area, target tracking is proceeding normally, and tracking of the next frame can continue; if the confidence is below the confidence threshold, the target object no longer exists in the recognition area, the target has been lost, the tracking stop condition is reached, and tracking stops.
In an optional implementation manner of this embodiment, after obtaining the reference polar coordinate sequence corresponding to the current video frame to the target video frame, the method further includes:
performing smoothing filtering processing on the reference polar coordinates corresponding to the current video frame to the target video frame to obtain a reference polar coordinate sequence after smoothing filtering processing;
and taking the reference polar coordinate sequence after the smoothing filtering processing as the center of a playing plane image for playing the current video frame to the target video frame.
In practical implementation, after a reference polar coordinate sequence corresponding to a current video frame to a target video frame is obtained, the series of reference polar coordinates may be input to a filter for smoothing filtering, where the filter may be a filter such as mean filtering or median filtering, and of course, in practical application, the filter may also be another filter capable of smoothing filtering series data, which is not limited in this application.
In the application, smoothing filtering processing can be performed on the obtained reference polar coordinate sequence corresponding to the current video frame to the target video frame, and the processed reference polar coordinate sequence is used as the center of a playing plane image for playing the current video frame to the target video frame, that is, the reference polar coordinate after smoothing processing is used as the center to play the panoramic video, so that visual angle jitter is prevented from occurring in the playing process.
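The smoothing step can be sketched as a simple moving-average (mean) filter applied independently to each polar coordinate component (elevation, azimuth) of the sequence. The window size is an assumed parameter; mean filtering is one of the filter choices the text itself names.

```python
# Illustrative sketch: moving-average (mean) filter over one component of
# the reference polar coordinate sequence, to suppress view-angle jitter
# before the sequence is used as the playing plane image centers.

def smooth(seq, window=3):
    half = window // 2
    out = []
    for i in range(len(seq)):
        lo, hi = max(0, i - half), min(len(seq), i + half + 1)
        out.append(sum(seq[lo:hi]) / (hi - lo))  # average over the window
    return out
```

Applying `smooth` to the elevation sequence and the azimuth sequence separately yields the filtered reference polar coordinate sequence used as the playback centers. (Note that azimuth wrap-around at 0/360 degrees would need special handling in a real implementation.)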
In the above steps 102 to 108, a plane image centered on a single target object is generated for playing the panoramic video. In practical application, these operations can be repeated for different target objects, thereby providing different playing viewing angles for viewers to choose from. When opening the video, a viewer of the panoramic video can accept the default viewing angle, manually drag the viewing angle according to his or her own preference, or switch back to the default playing viewing angle customized by the creator.
Illustratively, if the target object is a "dog", performing the operations of steps 102 to 108 yields a plane image with the "dog" at the center of the playing viewing angle, so that a viewer watching the video always sees the "dog" at the center of the plane image; the target object may also be a "cat", and performing the same steps 102 to 108 yields a plane image with the "cat" at the center of the playing viewing angle, so that the viewer always sees the "cat" at the center of the plane image.
The method for processing the panoramic video data comprises the steps that under the condition that a frame selection operation of a target object located in the center of a playing plane image for a current video frame is received, a reference polar coordinate of the center position of the target object in a spherical polar coordinate system is determined; generating a tracking plane image of a next video frame by taking the reference polar coordinate as a center according to a panoramic image of the next video frame in the spherical polar coordinate system; determining an updated polar coordinate of the central position of the target object in the spherical polar coordinate system in the tracking plane image of the next video frame according to the object feature of the target object; and returning to the step of executing the operation of generating the tracking plane image of the next video frame by taking the reference polar coordinate as the center according to the panoramic image of the next video frame in the spherical polar coordinate system by taking the updated polar coordinate as the reference polar coordinate until a tracking stop condition is reached, and obtaining a reference polar coordinate sequence corresponding to the current video frame to a target video frame, wherein the target video frame is the video frame corresponding to the tracking stop condition.
In this case, after the creator frames the target object located at the center of the playing plane image in the current video, the reference polar coordinate of the center position of the target object in the spherical polar coordinate system can be determined and used as the center of the tracking plane image generated for the next video frame, so that the corresponding panoramic video is played with the target object always at the center. That is, the creator only needs to frame the target object; the tracking plane image centered on the target object is generated automatically and the target is tracked, and the viewing-angle centers of successive tracking plane images are generated automatically for the target object. The creator does not need to process the panoramic video frame by frame, which greatly improves the processing efficiency of the panoramic video. Moreover, because the framed target object is tracked in the spherical polar coordinate system, tracking failures caused by image distortion of the target object at different positions in the panoramic video can be effectively avoided, improving the accuracy and success rate of target tracking as well as the processing and playing effects of the panoramic video.
Fig. 3 is a flowchart illustrating another panoramic video data processing method according to an embodiment of the present application, which specifically includes the following steps:
step 302: in the Nth frame of video frame, receiving a moving operation of dragging a target object to the center of a playing plane image, and updating the playing plane image of the Nth frame of video frame into a plane image taking the target object as the center according to the moving operation.
Step 304: in the case of receiving a frame selection operation for a target object located at the center of a playing plane image in an nth frame video frame, determining a reference polar coordinate of the center position of the target object in a spherical polar coordinate system.
Step 306: and carrying out image recognition on the playing plane image of the Nth frame of video frame, and determining the object characteristics of the target object.
Step 308: mapping the (N + 1) th frame of video frame to the spherical polar coordinate system to obtain a panoramic image of the (N + 1) th frame of video frame in the spherical polar coordinate system; taking the reference polar coordinate as a center and a preset angle as a range, and intercepting the panoramic image; and converting the intercepted panoramic image into a tracking plane image of the (N + 1) th frame video frame.
Step 310: and determining an updated polar coordinate of the central position of the target object in the spherical polar coordinate system in the plane image of the (N + 1) th frame of video frame according to the object feature of the target object.
Step 312: and taking the updated polar coordinate as the reference polar coordinate, determining whether a tracking stop condition is reached, if so, executing the step 314, and if not, returning to the step 308.
Step 314: and obtaining a reference polar coordinate sequence corresponding to the video frames from the Nth frame to the (N + X) th frame, wherein the (N + X) th frame is a video frame corresponding to the tracking stop condition.
Step 316: and performing smooth filtering processing on the reference polar coordinate sequences corresponding to the video frames from the Nth frame to the (N + X) th frame to obtain the centers of the playing plane images from the video frames from the Nth frame to the (N + X) th frame.
According to this panoramic video data processing method, after the creator frames the target object located at the center of the playing plane image in the current video, the reference polar coordinate of the center position of the target object in the spherical polar coordinate system can be determined and used as the center of the tracking plane image generated for the next video frame, so that the corresponding panoramic video is played with the target object always at the center. That is, the creator only needs to frame the target object; the tracking plane image centered on the target object is generated automatically for target tracking, the viewing-angle centers of successive tracking plane images are generated automatically, and the creator does not need to process the panoramic video frame by frame, which greatly improves the processing efficiency of the panoramic video. Moreover, because the framed target object is tracked in the spherical polar coordinate system, tracking failures caused by image distortion of the target object at different positions in the panoramic video can be effectively avoided, improving the accuracy and success rate of target tracking as well as the processing and playing effects of the panoramic video.
The above is a schematic scheme of a panoramic video data processing method according to this embodiment. It should be noted that the technical solution of the panoramic video data processing method shown in fig. 3 is the same as the technical solution of the panoramic video data processing method shown in fig. 1, and details of the technical solution of the panoramic video data processing method shown in fig. 3, which are not described in detail, can be referred to the description of the technical solution of the panoramic video data processing method shown in fig. 1.
Corresponding to the above method embodiment, the present application further provides an embodiment of a panoramic video data processing apparatus, and fig. 4 shows a schematic structural diagram of a panoramic video data processing apparatus provided in an embodiment of the present application. As shown in fig. 4, the apparatus includes:
a first determining module 402, configured to, in a case that a frame selection operation of a target object located at the center of a playing plane image for a current video frame is received, determine a reference polar coordinate of a center position of the target object in a spherical polar coordinate system;
a generating module 404 configured to generate a tracking plane image of a next video frame with the reference polar coordinate as a center according to a panoramic image of the next video frame in the spherical polar coordinate system;
a second determining module 406, configured to determine, according to the object feature of the target object, an updated polar coordinate of the central position of the target object in the spherical polar coordinate system in the tracking plane image of the next video frame;
the executing module 408 is configured to use the updated polar coordinate as the reference polar coordinate, return to the operation step of executing the panoramic image in the spherical polar coordinate system according to the next video frame, and generate a tracking plane image of the next video frame with the reference polar coordinate as a center until a tracking stop condition is reached, obtain a reference polar coordinate sequence corresponding to the current video frame to a target video frame, where the target video frame is a video frame corresponding to the tracking stop condition.
Optionally, the apparatus further comprises an update module configured to:
receiving a moving operation of dragging the target object to the center of the playing plane image;
and updating the playing plane image of the current video frame into a plane image taking the target object as the center according to the moving operation.
Optionally, the first determining module 402 is further configured to:
determining the central position of the target object in the playing plane image;
and determining a reference polar coordinate of the central position in the spherical polar coordinate system according to the panoramic image of the current video frame in the spherical polar coordinate system and the central position.
Optionally, the apparatus further comprises an identification module configured to:
and carrying out image recognition on the playing plane image of the current video frame, and determining the object characteristics of the target object.
Optionally, the identification module is further configured to:
determining a target frame corresponding to the frame selection operation;
determining a corresponding recognition area according to the target frame;
and carrying out image recognition in the recognition area in the playing plane image of the current video frame.
Optionally, the generating module 404 is further configured to:
mapping the next video frame to the spherical polar coordinate system to obtain a panoramic image of the next video frame in the spherical polar coordinate system;
taking the reference polar coordinate as a center and a preset angle as a range, and intercepting the panoramic image;
and converting the intercepted panoramic image into a tracking plane image of the next video frame.
Optionally, the second determining module 406 is further configured to:
performing image recognition on the recognition area in the tracking plane image of the next video frame, and determining the central position of a target object in the next video frame;
and determining the updated polar coordinates of the central position of the target object in the next video frame in the spherical polar coordinate system.
Optionally, the second determining module 406 is further configured to:
performing image recognition on the recognition area in the tracking plane image of the next video frame to obtain image characteristics;
analyzing and processing the image features and the object features to obtain a confidence coefficient that the target object exists in the identification region and a position offset of the image features relative to the center position of the identification region;
and under the condition that the confidence degree is greater than a confidence degree threshold value, determining the central position of the target object in the next video frame according to the central position of the target object in the playing plane image and the position offset.
Optionally, the second determining module 406 is further configured to:
and fusing the image features and the object features to obtain updated object features.
Optionally, the execution module 408 is further configured to:
determining the similarity between the target image feature and the object features;
determining that the tracking stop condition is reached if the confidence that the target object exists in the recognition area is below the confidence threshold.
Optionally, the apparatus further comprises a processing module configured to:
performing smoothing filtering processing on the reference polar coordinate sequence corresponding to the current video frame to the target video frame to obtain a reference polar coordinate sequence after smoothing filtering processing;
and taking the reference polar coordinate sequence after the smoothing filtering processing as the center of a playing plane image for playing the current video frame to the target video frame.
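The smoothing filtering is not pinned to a particular filter in the text; one plausible sketch is a moving average over the (longitude, latitude) reference sequence, with longitudes unwrapped so the average does not jump across the ±180° seam. The window size is an assumed parameter:

```python
import numpy as np

def smooth_polar_sequence(coords_deg, window=5):
    """Moving-average smoothing of a (longitude, latitude) reference polar
    coordinate sequence, in degrees."""
    coords = np.asarray(coords_deg, dtype=float)
    # Unwrap longitudes so e.g. 179 -> -179 is treated as a 2-degree step.
    lon = np.degrees(np.unwrap(np.radians(coords[:, 0])))
    lat = coords[:, 1]
    kernel = np.ones(window) / window
    pad = window // 2

    def ma(x):
        # Edge-pad so the smoothed sequence keeps its original length.
        xp = np.pad(x, pad, mode="edge")
        return np.convolve(xp, kernel, mode="valid")

    lon_s = (ma(lon) + 180.0) % 360.0 - 180.0  # wrap back to [-180, 180)
    return np.column_stack([lon_s, ma(lat)])
```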
With the panoramic video data processing apparatus described above, once a creator frames a target object located at the center of the playing plane image of the current video frame, the reference polar coordinate of the central position of the target object in the spherical polar coordinate system can be determined and used as the center for generating the tracking plane image of the next video frame, so that the corresponding panoramic video is always played with the target object at the center. In other words, the creator only needs to frame the target object: tracking plane images centered on the target object are then generated automatically for target tracking, and the view-angle center of each successive tracking plane image is produced without the creator processing the panoramic video frame by frame, which greatly improves the processing efficiency of the panoramic video. In addition, the framed target object is tracked in the spherical polar coordinate system, so tracking failures caused by image distortion of the target object at different positions in the panoramic video can be effectively avoided, which improves the accuracy and success rate of target tracking as well as the processing and playing effects of the panoramic video.
The above is a schematic scheme of a panoramic video data processing apparatus of the present embodiment. It should be noted that the technical solution of the panoramic video data processing apparatus and the technical solution of the panoramic video data processing method belong to the same concept, and details of the technical solution of the panoramic video data processing apparatus, which are not described in detail, can be referred to the description of the technical solution of the panoramic video data processing method.
Fig. 5 illustrates a block diagram of a computing device 500 provided according to an embodiment of the present application. The components of the computing device 500 include, but are not limited to, a memory 510 and a processor 520. Processor 520 is coupled to memory 510 via bus 530, and database 550 is used to store data.
Computing device 500 also includes an access device 540 that enables the computing device 500 to communicate via one or more networks 560. Examples of such networks include the Public Switched Telephone Network (PSTN), a Local Area Network (LAN), a Wide Area Network (WAN), a Personal Area Network (PAN), or a combination of communication networks such as the Internet. The access device 540 may include one or more of any type of network interface, wired or wireless, e.g., a Network Interface Card (NIC), such as an IEEE 802.11 Wireless Local Area Network (WLAN) interface, a Worldwide Interoperability for Microwave Access (WiMAX) interface, an Ethernet interface, a Universal Serial Bus (USB) interface, a cellular network interface, a Bluetooth interface, a Near Field Communication (NFC) interface, and so forth.
In one embodiment of the application, the above-described components of computing device 500 and other components not shown in FIG. 5 may also be connected to each other, such as by a bus. It should be understood that the block diagram of the computing device architecture shown in FIG. 5 is for purposes of example only and is not limiting as to the scope of the present application. Other components may be added or replaced as desired by those skilled in the art.
Computing device 500 may be any type of stationary or mobile computing device, including a mobile computer or mobile computing device (e.g., tablet, personal digital assistant, laptop, notebook, netbook, etc.), mobile phone (e.g., smartphone), wearable computing device (e.g., smartwatch, smartglasses, etc.), or other type of mobile device, or a stationary computing device such as a desktop computer or PC. Computing device 500 may also be a mobile or stationary server.
The processor 520 is configured to execute computer-executable instructions to implement the following method:
under the condition that a frame selection operation of a target object positioned in the center of a playing plane image for a current video frame is received, determining a reference polar coordinate of the center position of the target object in a spherical polar coordinate system;
generating a tracking plane image of a next video frame by taking the reference polar coordinate as a center according to a panoramic image of the next video frame in the spherical polar coordinate system;
determining an updated polar coordinate of the central position of the target object in the spherical polar coordinate system in the tracking plane image of the next video frame according to the object feature of the target object;
and returning to the step of executing the operation of generating the tracking plane image of the next video frame by taking the reference polar coordinate as the center according to the panoramic image of the next video frame in the spherical polar coordinate system by taking the updated polar coordinate as the reference polar coordinate until a tracking stop condition is reached, and obtaining a reference polar coordinate sequence corresponding to the current video frame to a target video frame, wherein the target video frame is the video frame corresponding to the tracking stop condition.
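Taken together, the steps above form a simple loop. A sketch in which `make_tracking_plane` and `recognize` are hypothetical helpers standing in for the tracking-plane generation and recognition steps, not part of the patent text:

```python
def track_target(frames, ref_polar, object_feat, make_tracking_plane, recognize):
    """Sketch of the overall loop: generate a tracking plane image centered
    on the reference polar coordinate, recognize the target, update the
    coordinate, and stop when recognition fails."""
    ref_sequence = [ref_polar]
    for frame in frames[1:]:
        plane = make_tracking_plane(frame, ref_polar)
        updated = recognize(plane, object_feat, ref_polar)
        if updated is None:          # tracking stop condition reached
            break
        ref_polar = updated          # updated polar becomes the new reference
        ref_sequence.append(ref_polar)
    return ref_sequence
```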
The above is an illustrative scheme of a computing device of the present embodiment. It should be noted that the technical solution of the computing device and the technical solution of the panoramic video data processing method belong to the same concept, and details that are not described in detail in the technical solution of the computing device can be referred to the description of the technical solution of the panoramic video data processing method.
An embodiment of the present application further provides a computer-readable storage medium storing computer-executable instructions which, when executed by a processor, implement the operation steps of the panoramic video data processing described above.
The above is an illustrative scheme of a computer-readable storage medium of the present embodiment. It should be noted that the technical solution of the storage medium belongs to the same concept as the technical solution of the above-mentioned panoramic video data processing method, and details that are not described in detail in the technical solution of the storage medium can be referred to the description of the technical solution of the above-mentioned panoramic video data processing method.
The foregoing description of specific embodiments of the present application has been presented. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims can be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
The computer instructions comprise computer program code, which may be in source code form, object code form, an executable file, some intermediate form, or the like. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash disk, a removable hard disk, a magnetic disk, an optical disc, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and the like. It should be noted that the content contained in the computer-readable medium may be appropriately increased or decreased as required by legislation and patent practice in a jurisdiction; for example, in some jurisdictions, the computer-readable medium does not include electrical carrier signals and telecommunications signals.
It should be noted that for simplicity and convenience of description, the above-described method embodiments are described as a series of combinations of acts, but those skilled in the art will appreciate that the present application is not limited by the order of acts, as some steps may, in accordance with the present application, occur in other orders and/or concurrently. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required in this application.
In the foregoing embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to the related descriptions of other embodiments.
The preferred embodiments of the present application disclosed above are intended only to aid in explaining the application. The alternative embodiments are not exhaustive and do not limit the application to the precise forms described. Obviously, many modifications and variations are possible in light of the teaching of this application. The embodiments were chosen and described in order to best explain the principles of the application and its practical application, thereby enabling others skilled in the art to understand and use the application well. The application is limited only by the claims and their full scope and equivalents.

Claims (14)

1. A panoramic video data processing method, comprising:
under the condition that a frame selection operation of a target object which is positioned at the center of a playing plane image aiming at a current video frame is received, determining a reference polar coordinate of the center position of the target object in a spherical polar coordinate system, wherein the target object refers to an object displayed at the center of the playing plane image, and the reference polar coordinate refers to the center of a view angle of a plane image corresponding to a panoramic video image;
generating a tracking plane image of a next video frame by taking the reference polar coordinate as a center according to a panoramic image of the next video frame in the spherical polar coordinate system, wherein the tracking plane image is a plane image generated when target tracking is performed on the target object;
determining an updated polar coordinate of the central position of the target object in the spherical polar coordinate system in the tracking plane image of the next video frame according to the object feature of the target object;
taking the updated polar coordinate as the reference polar coordinate, returning to execute the operation step of generating a tracking plane image of the next video frame according to the panoramic image of the next video frame in the spherical polar coordinate system by taking the reference polar coordinate as a center until a tracking stop condition is reached, and obtaining a reference polar coordinate sequence corresponding to the current video frame to a target video frame, wherein the target video frame is the video frame corresponding to the tracking stop condition; and playing the plane images from the current video frame to the target video frame by taking the reference polar coordinate sequence as the center of the tracking plane image.
2. The method of claim 1, wherein prior to receiving a frame selection operation for a target object located in a center of a playing plane image in a current video frame, the method further comprises:
receiving a moving operation of dragging the target object to the center of the playing plane image;
and updating the playing plane image of the current video frame into a plane image taking the target object as the center according to the moving operation.
3. The panoramic video data processing method according to claim 1 or 2, wherein the determining a reference polar coordinate of the center position of the target object in a spherical polar coordinate system comprises:
determining the central position of the target object in the playing plane image;
and determining a reference polar coordinate of the central position in the spherical polar coordinate system according to the panoramic image of the current video frame in the spherical polar coordinate system and the central position.
4. The panoramic video data processing method according to claim 1 or 2, wherein after the determining a reference polar coordinate of the center position of the target object in a spherical polar coordinate system, the method further comprises:
and performing image recognition on the playing plane image of the current video frame, and determining the object feature of the target object.
5. The method of claim 4, wherein the image recognition of the playing plane image of the current video frame comprises:
determining a target frame corresponding to the frame selection operation;
determining a corresponding recognition area according to the target frame;
and performing image recognition in the recognition area in the playing plane image of the current video frame.
6. The panoramic video data processing method according to claim 1 or 2, wherein the generating a tracking plane image of the next video frame centered on the reference polar coordinate from the panoramic image of the next video frame in the spherical polar coordinate system comprises:
mapping the next video frame to the spherical polar coordinate system to obtain a panoramic image of the next video frame in the spherical polar coordinate system;
taking the reference polar coordinate as a center and a preset angle as a range, and intercepting the panoramic image;
and converting the intercepted panoramic image into a tracking plane image of the next video frame.
7. The method according to claim 5, wherein said determining updated polar coordinates of the center position of the target object in the spherical polar coordinate system in the tracking plane image of the next video frame according to the object feature of the target object comprises:
performing image recognition on the recognition area in the tracking plane image of the next video frame, and determining the central position of the target object in the next video frame;
and determining the updated polar coordinate of the central position of the target object in the next video frame in the spherical polar coordinate system.
8. The method according to claim 7, wherein said performing image recognition on the recognition area in the tracking plane image of the next video frame, and determining the center position of the target object in the next video frame, comprises:
performing image recognition on the recognition area in the tracking plane image of the next video frame to obtain image features;
analyzing and processing the image features and the object feature to obtain a confidence that the target object exists in the recognition area and a position offset of the image features relative to the center position of the recognition area;
and under the condition that the confidence is greater than a confidence threshold, determining the central position of the target object in the next video frame according to the central position of the target object in the playing plane image and the position offset.
9. The method of processing panoramic video data according to claim 8, wherein after determining the center position of the target object in the next video frame according to the center position of the target object in the playing plane image and the position offset, the method further comprises:
and fusing the image characteristic and the object characteristic to obtain an updated object characteristic.
10. The panoramic video data processing method according to claim 8, wherein the reaching of the tracking stop condition comprises:
determining that the tracking stop condition is reached under the condition that the confidence that the target object exists in the recognition area is below the confidence threshold.
11. The method according to claim 1 or 2, wherein after obtaining the reference polar coordinate sequence corresponding to the current video frame to the target video frame, the method further comprises:
performing smoothing filtering processing on a reference polar coordinate sequence corresponding to the current video frame to the target video frame to obtain a reference polar coordinate sequence after smoothing filtering processing;
and taking the reference polar coordinate sequence after the smoothing filtering processing as the center of a playing plane image for playing the current video frame to the target video frame.
12. A panoramic video data processing apparatus, comprising:
the first determination module is configured to determine a reference polar coordinate of a central position of a target object in a spherical polar coordinate system when a frame selection operation of the target object located at the center of a playing plane image for a current video frame is received, wherein the target object refers to an object displayed in the center of the playing plane image, and the reference polar coordinate refers to a viewing angle center of a plane image corresponding to a panoramic video image;
a generating module configured to generate a tracking plane image of a next video frame with the reference polar coordinate as a center according to a panoramic image of the next video frame in the spherical polar coordinate system, wherein the tracking plane image is a plane image generated when target tracking is performed on the target object;
a second determination module configured to determine an updated polar coordinate of the central position of the target object in the spherical polar coordinate system in the tracking plane image of the next video frame according to the object feature of the target object;
an execution module configured to use the updated polar coordinate as the reference polar coordinate, return to execute the operation step of generating the tracking plane image of the next video frame in the spherical polar coordinate system according to the panoramic image of the next video frame in the spherical polar coordinate system with the reference polar coordinate as a center until a tracking stop condition is reached, obtain a reference polar coordinate sequence corresponding to the current video frame to a target video frame, where the target video frame is a video frame corresponding to the tracking stop condition; and playing the plane images from the current video frame to the target video frame by taking the reference polar coordinate sequence as the center of the tracking plane image.
13. A computing device, comprising:
a memory and a processor;
the memory is configured to store computer-executable instructions, and the processor is configured to execute the computer-executable instructions to implement the method of:
under the condition that a frame selection operation of a target object which is positioned in the center of a playing plane image aiming at a current video frame is received, determining a reference polar coordinate of the center position of the target object in a spherical polar coordinate system, wherein the target object refers to an object displayed in the center of the playing plane image, and the reference polar coordinate refers to the view angle center of a plane image corresponding to a panoramic video image;
generating a tracking plane image of a next video frame by taking the reference polar coordinate as a center according to a panoramic image of the next video frame in the spherical polar coordinate system, wherein the tracking plane image is a plane image generated when target tracking is performed on the target object;
determining an updated polar coordinate of the central position of the target object in the spherical polar coordinate system in the tracking plane image of the next video frame according to the object feature of the target object;
taking the updated polar coordinate as the reference polar coordinate, returning to execute the operation step of generating a tracking plane image of the next video frame according to the panoramic image of the next video frame in the spherical polar coordinate system by taking the reference polar coordinate as a center until a tracking stop condition is reached, and obtaining a reference polar coordinate sequence corresponding to the current video frame to a target video frame, wherein the target video frame is the video frame corresponding to the tracking stop condition; and playing the plane images from the current video frame to the target video frame by taking the reference polar coordinate sequence as the center of the tracking plane image.
14. A computer-readable storage medium storing computer-executable instructions which, when executed by a processor, implement the steps of the panoramic video data processing method of any one of claims 1 to 11.
CN202110573171.3A 2021-05-25 2021-05-25 Panoramic video data processing method and device Active CN113313735B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202110573171.3A CN113313735B (en) 2021-05-25 2021-05-25 Panoramic video data processing method and device
US17/730,950 US11647294B2 (en) 2021-05-25 2022-04-27 Panoramic video data process

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110573171.3A CN113313735B (en) 2021-05-25 2021-05-25 Panoramic video data processing method and device

Publications (2)

Publication Number Publication Date
CN113313735A CN113313735A (en) 2021-08-27
CN113313735B true CN113313735B (en) 2023-04-07

Family

ID=77374665

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110573171.3A Active CN113313735B (en) 2021-05-25 2021-05-25 Panoramic video data processing method and device

Country Status (1)

Country Link
CN (1) CN113313735B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115297351B (en) * 2022-08-03 2023-11-17 抖音视界有限公司 Panoramic video playing method and device, storage medium and electronic equipment

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103345630A (en) * 2013-06-14 2013-10-09 合肥工业大学 Traffic sign positioning method based on spherical panoramic video
CN108650500A (en) * 2018-04-02 2018-10-12 北京奇艺世纪科技有限公司 A kind of panoramic video processing method and processing device
CN111596297A (en) * 2020-07-06 2020-08-28 吉林大学 Device and method for detecting aerial unmanned aerial vehicle based on panoramic imaging and ultrasonic rotation
CN112019768A (en) * 2020-09-04 2020-12-01 北京奇艺世纪科技有限公司 Video generation method and device and electronic equipment

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107025659B (en) * 2017-04-11 2020-03-31 西安理工大学 Panoramic target tracking method based on unit spherical coordinate mapping
CN112073748B (en) * 2019-06-10 2022-03-18 北京字节跳动网络技术有限公司 Panoramic video processing method and device and storage medium
CN112785628B (en) * 2021-02-09 2023-08-08 成都视海芯图微电子有限公司 Track prediction method and system based on panoramic view angle detection tracking

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103345630A (en) * 2013-06-14 2013-10-09 合肥工业大学 Traffic sign positioning method based on spherical panoramic video
CN108650500A (en) * 2018-04-02 2018-10-12 北京奇艺世纪科技有限公司 A kind of panoramic video processing method and processing device
CN111596297A (en) * 2020-07-06 2020-08-28 吉林大学 Device and method for detecting aerial unmanned aerial vehicle based on panoramic imaging and ultrasonic rotation
CN112019768A (en) * 2020-09-04 2020-12-01 北京奇艺世纪科技有限公司 Video generation method and device and electronic equipment

Also Published As

Publication number Publication date
CN113313735A (en) 2021-08-27

Similar Documents

Publication Publication Date Title
CN109359592B (en) Video frame processing method and device, electronic equipment and storage medium
CN108121952B (en) Face key point positioning method, device, equipment and storage medium
CN110503703B (en) Method and apparatus for generating image
CN109543714B (en) Data feature acquisition method and device, electronic equipment and storage medium
US11481869B2 (en) Cross-domain image translation
JP2022528294A (en) Video background subtraction method using depth
JP2019208259A (en) Real time video summarization
US20180082428A1 (en) Use of motion information in video data to track fast moving objects
US20170345165A1 (en) Correcting Short Term Three-Dimensional Tracking Results
CN110516598B (en) Method and apparatus for generating image
EP3844718A1 (en) Active image depth prediction
CN115239860B (en) Expression data generation method and device, electronic equipment and storage medium
CN113313735B (en) Panoramic video data processing method and device
US20130076792A1 (en) Image processing device, image processing method, and computer readable medium
US11647294B2 (en) Panoramic video data process
Chiu et al. Privacy-preserving video conferencing via thermal-generative images
CN113315914B (en) Panoramic video data processing method and device
CN110084306B (en) Method and apparatus for generating dynamic image
CN113518214B (en) Panoramic video data processing method and device
An et al. Rotateview: A video composition system for interactive product display
CN110659623B (en) Panoramic picture display method and device based on framing processing and storage medium
CN115205325A (en) Target tracking method and device
CN110942033A (en) Method, apparatus, electronic device and computer medium for pushing information
US20240193824A1 (en) Computing device and method for realistic visualization of digital human
WO2023123213A1 (en) Hand pressing depth measurement method and apparatus

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant