WO2022199441A1 - Panoramic video playback method, apparatus, computer device and storage medium - Google Patents

Panoramic video playback method, apparatus, computer device and storage medium

Info

Publication number
WO2022199441A1
Authority
WO
WIPO (PCT)
Prior art keywords
frame image
video
recommended
target
target object
Prior art date
Application number
PCT/CN2022/081149
Other languages
English (en)
French (fr)
Inventor
张伟俊
陈聪
马龙祥
Original Assignee
影石创新科技股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 影石创新科技股份有限公司
Publication of WO2022199441A1 publication Critical patent/WO2022199441A1/zh

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47 End-user applications
    • H04N21/472 End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content
    • H04N21/4728 End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content for selecting a Region Of Interest [ROI], e.g. for requesting a higher resolution version of a selected region
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/25 Determination of region of interest [ROI] or a volume of interest [VOI]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/431 Generation of visual interfaces for content selection or interaction; Content or additional data rendering
    • H04N21/4312 Generation of visual interfaces for content selection or interaction; Content or additional data rendering involving specific graphical features, e.g. screen layout, special fonts or colors, blinking icons, highlights or animations
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00 Details of television systems
    • H04N5/222 Studio circuitry; Studio devices; Studio equipment
    • H04N5/262 Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects; Cameras specially adapted for the electronic generation of special effects
    • H04N5/2624 Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects; Cameras specially adapted for the electronic generation of special effects for obtaining an image which is composed of whole input images, e.g. splitscreen

Definitions

  • the present application relates to the technical field of image processing, and in particular, to a panorama video playback method, device, computer equipment and storage medium.
  • Panoramic video is produced by using multiple cameras to shoot the environment omnidirectionally over 360°, obtaining multiple video streams that are then combined through synchronization, stitching, projection, and other techniques. The viewer can choose any angle within that range to watch, obtaining an immersive viewing experience.
  • In order to ensure the viewing effect of the panoramic video, when playing it the display device only displays part of the panoramic area covered by the panoramic video, and the user can change the viewing angle to view other areas in the panoramic video.
  • the current viewable area of the user is usually referred to as the field of view (FOV, Field of View).
  • By default, a portion of the panoramic video is played at a preset viewing angle.
  • a method for playing a panoramic video comprising:
  • the detection result includes a candidate area where the target object is located;
  • the recommended viewing angle video and the preset viewing angle video are displayed on the same display screen; wherein the screen content of the preset viewing angle video is different from the screen content of the recommended viewing angle video.
  • target detection is performed on the panoramic video to obtain a detection result, including:
  • Target detection is performed on each single-frame image in the single-frame image set, and a detection result is obtained; wherein, the detection result includes respective candidate regions corresponding to a plurality of single-frame images, and the candidate regions include target objects.
  • generating a recommended viewing angle video according to the detection result includes:
  • a recommended viewing angle video is generated according to the recommended picture corresponding to each single frame of image.
  • the feature parameter includes the confidence of the candidate region in the single-frame image
  • the target candidate region of each single-frame image is determined based on the feature parameters of the candidate regions corresponding to each of the multiple single-frame images, including:
  • the feature parameter includes the area of the candidate region in the single-frame image
  • the single-frame image set includes N single-frame images
  • the N single-frame images have a time sequence
  • N is a positive integer
  • the target candidate region of each single-frame image is determined from the corresponding candidate regions of each single-frame image, including:
  • the candidate area with the largest area in the first frame of single-frame image is used as the target candidate area of the first frame of single-frame image.
  • the feature parameter further includes the position of the center point of the candidate region, and the target candidate region of each single-frame image is determined from the candidate region corresponding to each single-frame image based on the feature parameter, and further includes:
  • If N is greater than 1, determine the position of the center point of each candidate area in the Nth single-frame image
  • the candidate region with the smallest Euclidean distance is determined as the target candidate region of the Nth single-frame image.
  • a recommended picture corresponding to each single-frame image is generated, including:
  • the type of the target object is the preset target type
  • a recommended picture of a preset size is generated, with the position of the target object located at a preset position in the picture;
  • otherwise, a recommended picture with the smallest area that includes the target candidate area is generated.
  • a recommended viewing perspective video is generated according to a recommended picture corresponding to each single frame of image, including:
  • Interpolation calculation is performed on the position coordinates of the target object in the adjacent recommended pictures by using an interpolation algorithm, and the intermediate position coordinates are obtained;
  • the recommended viewing angle videos are generated by sorting the recommended pictures and the intermediate recommended pictures according to the playback time from front to back.
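The interpolation step above can be sketched as follows. The function name and the choice of linear interpolation are illustrative assumptions; the text only specifies "an interpolation algorithm" over the target's position coordinates in adjacent recommended pictures.

```python
def interpolate_positions(p0, p1, num_intermediate):
    """Compute `num_intermediate` intermediate (x, y) positions between
    the target positions of two adjacent recommended pictures, evenly
    spaced along the straight line from p0 to p1 (linear interpolation)."""
    (x0, y0), (x1, y1) = p0, p1
    points = []
    for i in range(1, num_intermediate + 1):
        t = i / (num_intermediate + 1)  # fraction of the way from p0 to p1
        points.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return points
```

The intermediate recommended pictures generated at these positions are then interleaved with the original recommended pictures in playback order, smoothing the camera motion between sampled frames.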
  • a method for automatically generating a recommended viewing angle video from a panoramic video comprising:
  • the detection result includes a candidate area where the target object is located;
  • a recommended viewing perspective video is generated according to the detection result; wherein, the picture content of the recommended viewing perspective video includes the target object.
  • a device for playing panoramic video comprising:
  • the target detection module is used to obtain the panoramic video, perform target detection on the panoramic video, and obtain the detection result; wherein, the detection result includes the candidate area where the target object is located;
  • a video generation module configured to generate a recommended viewing perspective video according to the detection result; wherein, the screen content of the recommended viewing perspective video includes a target object;
  • the synchronous display module is used to display the recommended viewing angle video and the preset viewing angle video on the same display screen; wherein the screen content of the preset viewing angle video is different from the screen content of the recommended viewing angle video.
  • a device for automatically generating a recommended viewing angle video from a panoramic video comprising:
  • the target detection module is used to obtain the panoramic video, perform target detection on the panoramic video, and obtain the detection result; wherein, the detection result includes the candidate area where the target object is located;
  • the video generation module is configured to generate a recommended viewing angle video according to the detection result; wherein, the screen content of the recommended viewing angle video includes a target object.
  • a computer device includes a memory and a processor, the memory stores a computer program, and the processor implements the following steps when executing the computer program:
  • the detection result includes a candidate area where the target object is located;
  • the recommended viewing angle video and the preset viewing angle video are displayed on the same display screen; wherein the screen content of the preset viewing angle video is different from the screen content of the recommended viewing angle video.
  • the detection result includes a candidate area where the target object is located;
  • the recommended viewing angle video and the preset viewing angle video are displayed on the same display screen; wherein the screen content of the preset viewing angle video is different from the screen content of the recommended viewing angle video.
  • With the above panoramic video playback method, apparatus, computer device, and storage medium, the panoramic video is acquired and target detection is performed on it to obtain a detection result including the candidate region where the target object is located; a recommended viewing angle video including the target object is generated from the detection result, and the recommended viewing angle video and the preset viewing angle video, whose screen contents differ, are displayed on the same display screen. This displays other target objects in the panoramic video, prevents users from missing interesting content, and improves the panoramic video viewing experience.
  • Fig. 1 is the internal structure diagram of computer equipment in one embodiment
  • FIG. 2 is a schematic flowchart of a method for playing a panoramic video in one embodiment
  • FIG. 3 is a schematic diagram of the relationship between a panoramic video, a recommended viewing angle video, and a preset viewing angle video, according to an embodiment
  • FIG. 4a to FIG. 4e are schematic diagrams showing the display of a recommended viewing angle video and a preset viewing angle video
  • FIG. 5 is a schematic flowchart of target detection for panoramic video in one embodiment
  • FIG. 6 is a schematic flowchart of generating a recommended viewing angle video in one embodiment
  • FIG. 7 is a schematic flowchart of determining a target candidate region in one embodiment
  • FIG. 8 is a schematic flowchart of generating a recommendation screen in one embodiment
  • FIG. 9 is a schematic flowchart of generating a recommended viewing angle video in another embodiment
  • FIG. 10 is a schematic flowchart of generating a recommended viewing angle video in another embodiment
  • FIG. 11 is a structural block diagram of a device for playing panoramic video in one embodiment
  • FIG. 12 is a structural block diagram of an apparatus for generating a recommended viewing angle video based on a panoramic video in one embodiment.
  • the panorama video playback method provided by the present application can be applied to the computer device as shown in FIG. 1 .
  • the computer device may be a terminal, and its internal structure diagram may be as shown in FIG. 1 .
  • the computer equipment includes a processor, memory, a communication interface, a display screen, and an input device connected by a system bus. Among them, the processor of the computer device is used to provide computing and control capabilities.
  • the memory of the computer device includes a non-volatile storage medium, an internal memory.
  • the nonvolatile storage medium stores an operating system and a computer program.
  • the internal memory provides an environment for the execution of the operating system and computer programs in the non-volatile storage medium.
  • the communication interface of the computer device is used for wired or wireless communication with an external terminal, and the wireless communication can be realized by WIFI, operator network, NFC (Near Field Communication) or other technologies.
  • the display screen of the computer equipment may be a liquid crystal display screen or an electronic ink display screen
  • the input device of the computer equipment may be a touch layer covering the display screen, a button, trackball, or touchpad set on the housing of the computer equipment, or an external keyboard, trackpad, or mouse.
  • FIG. 1 is only a block diagram of a partial structure related to the solution of the present application and does not constitute a limitation on the computer equipment to which the solution is applied; a specific computer device may include more or fewer components than shown in the figure, combine certain components, or arrange components differently.
  • As shown in FIG. 2, a method for playing a panoramic video is provided. The method is described as applied to the computer device in FIG. 1 and includes the following steps:
  • S210 Acquire a panoramic video, perform target detection on the panoramic video, and obtain a detection result.
  • The detection result includes the candidate area where the detected target object is located. The candidate area represents the position of the target object in a single-frame image; it may be determined by selecting a rectangular frame on the single-frame image, or by tracing the boundary line of the target object on the single-frame image. The representation form of the candidate region is not specifically limited here.
  • the computer device may perform target detection on the panoramic video according to a preset target of interest that the user pays attention to, and obtain a detection result including a candidate region where the target of interest is located.
  • the target of interest is the target object that is pre-set by the user, that is, the point of interest.
  • The target of interest can be a person, an animal, a vehicle, an airplane, etc.; it can be static, such as a building or a tree along the road, or dynamic, such as a moving vehicle or a running athlete.
  • the computer equipment can also perform target detection on the panoramic video according to a preset target recognition algorithm, and obtain detection results of the candidate regions where all the identified target objects are located.
  • the picture content of the recommended viewing angle video includes the target object.
  • the computer device generates a recommended picture according to the candidate region where the target object is located in the detection result, and the recommended viewing angle video is constituted by the recommended picture.
  • The candidate area may be directly used as the recommended picture, or the recommended picture may be obtained by taking the center point of the candidate area as the center, expanding outward to a preset size, and frame-selecting that region in the single-frame image.
  • a recommended viewing angle video can be generated for the candidate regions of the same target object.
  • For example, suppose the target object type is vehicle, the 1st to 50th single-frame images include both vehicle A and vehicle B, and the candidate areas of the 51st to 100th single-frame images include only vehicle B. The computer device generates recommended pictures for vehicle A from the candidate areas of vehicle A in the 1st to 50th single-frame images to form a recommended viewing angle video for vehicle A, and generates recommended pictures for vehicle B from the candidate areas of vehicle B in the 1st to 100th single-frame images to form a recommended viewing angle video for vehicle B.
  • Preset screening conditions can also be used to determine a target candidate area among multiple candidate areas, generate a recommended screen corresponding to the target candidate area, and watch videos from a recommended viewing angle corresponding to the target candidate area.
  • The screening condition can be that the confidence of the detection result is the largest or smallest, or that the area of the candidate region in the detection result is the largest or smallest.
  • the screen content of the preset viewing angle video is different from the screen content of the recommended viewing angle video.
  • the preset viewing angle video is a video generated by a computer device based on the panoramic video under the default viewing angle, and the screen can be adjusted based on the instructions input by the user to display other areas in the panoramic video according to the user's wishes.
  • the preset viewing angle video may be a video obtained by a preset camera, or may be a video in which the screen content includes a preset target object.
  • the manner of forming the preset viewing angle video is not specifically limited.
  • For example, the computer device performs target detection for human target objects on the single-frame images in the panoramic video and generates a recommended viewing angle video from the detection results, while the preset viewing angle video is the video obtained by a certain camera; the computer device then displays the recommended viewing angle video and the preset viewing angle video, which differ in content, on the same display screen.
  • the computer device can display the recommended viewing angle video and the preset viewing angle video side by side on the same display screen, such as parallel display (FIG. 4a), same column display (FIG. 4c and FIG. 4d), or diagonal display.
  • the recommended viewing angle video is displayed first, and then the preset viewing angle video is displayed.
  • the preset viewing angle video can be displayed first, and then the recommended viewing angle video is displayed.
  • The computer device can also treat the recommended viewing angle video as the secondary display video and the preset viewing angle video as the primary one, displaying both on the same screen, for example by displaying the preset viewing angle video in full screen while showing the recommended viewing angle video as a thumbnail; the thumbnail can be located at any position on the display screen, such as the upper left corner (FIG. 4d), the lower left corner, the middle, the upper right corner, or the lower right corner (FIG. 4e). Conversely, the computer device can display the recommended viewing angle video in full screen and the preset viewing angle video as a thumbnail.
  • In this embodiment, the computer device performs target detection on the panoramic video to obtain a detection result including the candidate area where the target object is located, generates a recommended viewing angle video including the target object from the detection result, and displays the recommended viewing angle video and the preset viewing angle video, which have different screen contents, on the same display screen. This displays other target objects in the panoramic video, prevents users from missing interesting content, and improves the viewing experience of the panoramic video.
  • the above S210 includes:
  • The computer device may also extract single-frame images from the panoramic video at a preset frame interval to obtain a single-frame image set. For example, if the preset interval is 5 frames, the computer device extracts one single-frame image from the panoramic video every 5 frames, obtaining a single-frame image set B.
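The interval sampling described above can be sketched as follows; the function name and 0-based frame indexing are illustrative assumptions, not part of the patent.

```python
def sample_frame_indices(total_frames, interval):
    """Indices of the single-frame images extracted from the panoramic
    video: one frame every `interval` frames, starting at frame 0."""
    return list(range(0, total_frames, interval))
```

With `interval=5`, a 12-frame clip yields the sampled set `{0, 5, 10}`, reducing the number of frames that later undergo target detection.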
  • the detection result includes candidate regions corresponding to each of multiple single-frame images, and the candidate regions include target objects.
  • the type of target object can be a person, a face, a vehicle, or a building.
  • the computer device may use a machine learning-based target detection model to perform target detection on each single-frame image in the single-frame image set to obtain a detection result.
  • the computer equipment inputs each single-frame image in the single-frame image set into a face detection model trained by using a large number of face images as training samples and a vehicle detection model trained by using a large number of vehicle images as training samples. Then, the corresponding detection result is obtained, and the detection result includes the candidate area including the face and the candidate area including the vehicle respectively corresponding to the single frame image.
  • the target detection of target objects can also be performed by means of template matching, key point matching, and key feature detection.
  • In this embodiment, the computer device extracts single-frame images from the panoramic video at a preset interval to obtain a single-frame image set, performs target detection on each single-frame image in the set, and obtains a detection result containing the candidate regions of the multiple single-frame images. Extracting frames at a preset interval before detection reduces the amount of data to be processed and thereby improves target detection efficiency.
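The per-frame detection pass can be sketched as below. The detector interface (a callable returning candidate dicts with `box`, `score`, and `type` keys) is an assumption for illustration; the patent names face and vehicle models as examples but does not fix an API.

```python
def detect_on_frames(frames, detector):
    """Run `detector` over every sampled single-frame image and collect
    the per-frame lists of candidate regions (the 'detection result')."""
    return [detector(frame) for frame in frames]
```

Any detection backend (a learned face/vehicle model, template matching, key-point matching) can be plugged in as `detector`, matching the alternatives the text lists.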
  • the above S220 includes:
  • S610 Determine the target candidate region of each single-frame image based on the feature parameters of the candidate regions corresponding to each of the multiple single-frame images.
  • the feature parameters can be used to characterize the regional characteristics of the candidate region, such as at least one of area, center point position, color histogram, or confidence of the target type in the candidate region.
  • the computer device needs to determine the target candidate region corresponding to the single frame image from the multiple candidate regions according to the characteristic parameters of the multiple candidate regions.
  • the feature parameter includes the confidence of the candidate region in the single-frame image, that is, the confidence of the target type represented by the candidate region.
  • After obtaining the corresponding detection results, the computer device uses the candidate region with the highest confidence among the multiple candidate regions as the target candidate region of the single-frame image.
  • For example, the computer device determines that candidate region A in single-frame images 1-150 is the target candidate region corresponding to single-frame images 1-150, and that candidate region B in single-frame images 151-200 is the target candidate region corresponding to single-frame images 151-200.
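The highest-confidence selection rule can be sketched in one line; the `score` key is an assumed representation of the detection confidence.

```python
def target_by_confidence(candidates):
    """Pick the candidate region with the highest detection confidence
    as the frame's target candidate region."""
    return max(candidates, key=lambda c: c["score"])
```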
  • Alternatively, for the first of the N time-ordered single-frame images, the computer device may take the candidate region with the highest confidence in that frame as its target candidate region. After the target candidate region of the first frame is determined, the similarity between each candidate region in the second frame and the target candidate region of the first frame is obtained, and the candidate region with the greatest similarity is determined as the target candidate region of the second frame; proceeding in this way, the target candidate region of each of the N single-frame images is obtained.
  • The similarity may be the intersection-over-union of the areas of the candidate region and the target candidate region, the correlation coefficient of their color histograms, or the Bhattacharyya distance.
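Of the similarity measures named above, the intersection-over-union of two candidate boxes can be computed as follows; the `(x1, y1, x2, y2)` box representation is an assumption.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2):
    overlap area divided by the area of the union."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))   # overlap width
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))   # overlap height
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union else 0.0
```

The candidate region in the next frame with the highest IoU against the previous frame's target candidate region would then be carried forward as the new target candidate region.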
  • If the candidate areas obtained by target detection are areas selected with a rectangular frame in the single-frame image, the target candidate area is likewise a rectangular area, and the computer device takes the intersection of the two diagonals of the target candidate region as the position of the target object in the single-frame image. If the candidate areas are areas traced by the boundary line of the target object in the single-frame image, the target candidate area is likewise such an area, and the geometric center of the target candidate region is taken as the position of the target object in the single-frame image.
  • The computer device may take the position of the target object in each single-frame image as the center point and extend outward by a preset length to form a circular area with the preset length as radius, using the circular area as the recommended picture for that single-frame image. It can also take the position of the target object as the center point and extend by a preset length along the X-axis and the Y-axis to form a rectangular area, using the rectangular area as the recommended picture for that single-frame image.
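The rectangular variant above amounts to the following sketch; clamping to the image boundary is omitted, and the function name is an assumption.

```python
def recommended_rect(center, half_extent):
    """Rectangular recommended picture formed by extending `half_extent`
    from the target position along both axes, as (x1, y1, x2, y2)."""
    cx, cy = center
    return (cx - half_extent, cy - half_extent,
            cx + half_extent, cy + half_extent)
```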
  • a recommended viewing angle video constituted by the recommended screen is obtained.
  • In this embodiment, the computer device determines the target candidate region of each single-frame image from its candidate regions based on the confidence or area of the candidate regions, takes the center point of the target candidate region as the position of the target object in each single-frame image, generates the recommended picture for each single-frame image according to that position, and then generates the recommended viewing angle video from the recommended pictures. Each frame of the generated recommended viewing angle video thus includes the target object with the highest confidence or an appropriate area, improving the picture quality of the recommended viewing angle video.
  • the feature parameter includes the area of the candidate region in a single-frame image
  • the single-frame image set includes N single-frame images
  • the N single-frame images have a time sequence
  • N is a positive integer.
  • the computer device takes the candidate area with the largest area in the first frame of single-frame image as the target candidate area of the first frame of single-frame image.
  • the first frame of single-frame image is the first frame of image in the single-frame image set, not necessarily the first frame of image in the panoramic video.
  • the computer device obtains the areas of all candidate regions in the first single-frame image in the single-frame image set, and determines the candidate region with the largest area as the target candidate region of the first single-frame image.
  • the feature parameter further includes the position of the center point of the candidate region. As shown in FIG. 7 , if N is greater than 1, the above S610 further includes:
  • S710 Determine the position of the center point of each candidate region in the Nth frame of single-frame image.
  • the computer device acquires the geometric center position of each candidate region in the Nth single-frame image, and uses the geometric center position as the center point position of the corresponding candidate region.
  • S720 Calculate the Euclidean distance between the center point position of each candidate region in the Nth frame of single-frame image and the center point position of the target candidate region in the N-1th frame of single-frame image.
  • the computer device calculates the Euclidean distance between the center point position of each candidate region in the Nth frame single-frame image and the center point position of the target candidate region in the N-1th frame (ie, the previous frame) single-frame image, and The candidate region in the Nth single-frame image with the smallest Euclidean distance is used as the target candidate region of the Nth single-frame image.
  • For example, suppose the second single-frame image includes three candidate areas a, b, and c. The computer device obtains the Euclidean distances between the center point of the target candidate region of the first single-frame image and the center points of candidate areas a, b, and c in the second single-frame image, yielding three distances L1, L2, and L3 with L2 < L1 < L3. The computer device determines candidate area b, corresponding to the minimum Euclidean distance L2, as the target candidate area of the second single-frame image. It then computes the Euclidean distances between the center point of that target candidate area and the center points of the candidate areas in the third single-frame image, determines the candidate area with the smallest distance as the target candidate area of the third single-frame image, and so on, determining the target candidate area of each subsequent single-frame image from its candidate areas.
  • In this embodiment, the computer device determines the candidate region with the largest area in the first single-frame image of the set as that frame's target candidate region, and then, by computing the Euclidean distance between the center point of the target candidate region in each frame and the center points of the candidate areas in the next frame, determines the candidate area with the smallest distance as the target candidate area of the next frame. The recommended viewing angle video thus always follows the same target object, improving the continuous display of that object.
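The whole per-frame selection scheme (largest area in the first frame, then nearest center in each subsequent frame) can be sketched as below. The `(x1, y1, x2, y2)` box representation is an assumption; squared distances are compared, which gives the same minimizer as the Euclidean distance.

```python
def track_target_regions(frames_candidates):
    """Given a time-ordered list of per-frame candidate-box lists, return
    one target candidate box per frame: the largest-area box in the first
    frame, then in each later frame the box whose center is nearest to
    the previous frame's target center."""
    def area(b):
        return (b[2] - b[0]) * (b[3] - b[1])

    def center(b):
        return ((b[0] + b[2]) / 2.0, (b[1] + b[3]) / 2.0)

    targets = [max(frames_candidates[0], key=area)]
    for candidates in frames_candidates[1:]:
        px, py = center(targets[-1])
        targets.append(min(
            candidates,
            key=lambda b: (center(b)[0] - px) ** 2 + (center(b)[1] - py) ** 2))
    return targets
```

Running this over the sampled frames yields one target candidate region per frame, from which the recommended pictures are then cropped.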
  • the above S630 includes:
  • S810 Acquire the type of the target object included in the target candidate region to which the position of the target object of each single-frame image belongs.
  • after determining the position of the target object in each single-frame image, the computer device further acquires the type of the target object included in the target candidate region to which the position belongs, and determines the generated recommended picture according to that type.
  • if the type of the target object is a preset target type, a recommended picture of a preset size is generated, with the position of the target object at a preset position in the picture
  • that is, the computer device generates a recommended picture of the preset size in which the target object sits at the picture's preset position
  • for example, the preset target type is a human face: if the target object's type is a human face, the computer device generates a recommended picture in which the face is positioned at 2/3 of the picture's height and 1/2 of its width. If the type of the target object is not the preset target type, the computer device generates the smallest-area recommended picture that includes the target candidate region. For example, if the type of the target object is a football field (not a human face), the computer device generates the smallest-area recommended picture that includes the football field's target candidate region.
  • the computer device acquires the type of the target object included in the target candidate region to which the position of each single-frame image's target object belongs; when the type is a preset target type, it generates a recommended picture of a preset size with the target object at a preset position in the picture
  • when the type is not the preset target type, it generates the smallest-area recommended picture that includes the target candidate region
  • different recommended pictures are thus determined for different target object types, so that the target object is suitably positioned in each picture of the recommended viewing angle video: the user can see a close-up of a target object of the preset target type and the whole of a target object of a non-preset type
  • this further improves the picture effect of the recommended viewing angle video.
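The two branches above (a preset-size picture with the object at a fixed anchor point versus the smallest picture enclosing the candidate region) can be sketched as follows. This is a hypothetical illustration: the coordinate convention, the `preset_size` default, and the function name are our assumptions, not taken from the patent.

```python
def recommended_picture(target_pos, region_box, obj_type,
                        preset_type='face', preset_size=(640, 360)):
    """Return a recommended picture rectangle (left, top, width, height).

    For the preset type, a fixed-size picture is produced with the
    target at 1/2 of the picture width and 2/3 of its height (matching
    the face example in the text); otherwise the picture is the
    smallest rectangle enclosing the candidate region.
    """
    if obj_type == preset_type:
        w, h = preset_size
        x, y = target_pos
        # Place the target at (w/2, 2h/3) inside the picture.
        return (x - w / 2, y - 2 * h / 3, w, h)
    # Smallest picture that contains the whole candidate region.
    left, top, right, bottom = region_box
    return (left, top, right - left, bottom - top)
```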
  • the above S640 includes:
  • the single-frame image in which each recommended picture is located has a uniquely determined playback time in the panoramic video
  • the computer device applies an interpolation algorithm to the positions of the target object in recommended pictures that are adjacent in playback time and include the same target object, so as to fill in the target object's position in the pictures missing between the two adjacent recommended pictures.
  • the computer device performs linear interpolation on the position coordinates of the target object in the adjacent recommended pictures using the formula P_{t+k} = P_t + (k/N) * (P_{t+N} - P_t) to obtain the intermediate position coordinates.
  • where P_t is the position coordinate of the target object in the recommended picture at the earlier playback time t
  • P_{t+N} is the position coordinate of the target object in the recommended picture at the later playback time t+N
  • and P_{t+k} is the position coordinate of the target object in the intermediate recommended picture corresponding to a time t+k between the earlier playback time and the later playback time.
  • the position coordinates of the target object may be its coordinates in the corresponding recommended picture, its coordinates in the corresponding single-frame image, or its coordinates in the actual environment.
  • the playback time of each intermediate recommended picture lies between those of the adjacent recommended pictures
  • the recommended viewing angle video is generated by sorting the recommended pictures and the intermediate recommended pictures by playback time from earliest to latest
  • specifically, the computer device generates, from the intermediate position coordinates, intermediate recommended pictures of the same size as the recommended pictures and including the same target object, and generates the recommended viewing angle video by sorting the recommended pictures and intermediate recommended pictures by playback time from earliest to latest.
  • the computer device may also filter the positions of the target object in the recommended pictures and intermediate recommended pictures that make up the recommended viewing angle video, using the Kalman filter and its variants, sliding-window averaging, or other filtering algorithms, so that the generated recommended viewing angle video is more stable with less jitter, further improving its picture effect.
  • the computer device performs interpolation on the position coordinates of the target object in adjacent recommended pictures to obtain intermediate position coordinates, and from them generates intermediate recommended pictures, including the target object, whose playback times lie between the adjacent recommended pictures
  • the recommended viewing angle video is then generated by sorting the recommended pictures and intermediate recommended pictures by playback time from earliest to latest; supplementing intermediate recommended pictures between recommended pictures whose playback times are not consecutive improves the fluency of the recommended viewing angle video.
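The linear interpolation step can be sketched directly from the definitions of P_t, P_{t+N}, and P_{t+k} above. A minimal sketch assuming 2-D position coordinates; the function name is our own.

```python
def interpolate_positions(p_t, p_t_plus_n, n):
    """Linearly interpolate target positions between two recommended
    pictures at times t and t+N, yielding positions for t+1 .. t+N-1:

        P_{t+k} = P_t + (k/N) * (P_{t+N} - P_t)
    """
    (x0, y0), (x1, y1) = p_t, p_t_plus_n
    return [(x0 + k / n * (x1 - x0), y0 + k / n * (y1 - y0))
            for k in range(1, n)]
```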
  • the above-mentioned panorama video playing method may generate at least two recommended viewing angle videos corresponding to at least two target objects.
  • for example, in a live or recorded basketball game scenario, the user presets the commentator and the basketball court as target objects; the computer device performs target detection on the single-frame images in the game's panoramic video according to the commentator and the basketball court
  • it detects the commentator in each single-frame image of the panoramic video and generates a first recommended picture with the commentator in the middle; the first recommended pictures constitute a first recommended viewing angle video for the commentator
  • it likewise detects the basketball court in each single-frame image and generates a smallest-area second recommended picture including the entire basketball court; the second recommended pictures constitute a second recommended viewing angle video for the basketball court
  • the computer device then displays the first recommended viewing angle video for the commentator and the second recommended viewing angle video for the basketball court, together with the preset viewing angle video, on the same display screen.
  • a method for automatically generating a recommended viewing angle video from a panoramic video, including:
  • S1010: Acquire a panoramic video, perform target detection on it, and obtain a detection result
  • the detection result includes the candidate region in which the target object is located
  • S1020: Generate a recommended viewing angle video according to the detection result; the picture content of the recommended viewing angle video includes the target object.
  • although the steps in the flowcharts of FIGS. 2-10 are shown in sequence as indicated by the arrows, these steps are not necessarily executed in that sequence. Unless explicitly stated herein, the execution of these steps is not strictly limited in order, and they may be performed in other orders. Moreover, at least some of the steps in FIGS. 2-10 may include multiple sub-steps or stages, which are not necessarily completed at the same time but may be executed at different times; their execution order is also not necessarily sequential, and they may be performed in turn or alternately with other steps or with at least part of the sub-steps or stages of other steps.
  • a panoramic video playback device including: a target detection module 1101, a video generation module 1102, and a synchronous display module 1103, wherein:
  • the target detection module 1101 is used to acquire a panoramic video, perform target detection on it, and obtain a detection result, where the detection result includes the candidate region in which the target object is located; the video generation module 1102 is used to generate a recommended viewing angle video according to the detection result, where the picture content of the recommended viewing angle video includes the target object; the synchronous display module 1103 is used to display the recommended viewing angle video and the preset viewing angle video on the same display screen, where the picture content of the preset viewing angle video differs from that of the recommended viewing angle video.
  • the target detection module 1101 is specifically used for:
  • extract single-frame images from the panoramic video at a preset interval to obtain a single-frame image set; perform target detection on each single-frame image in the set to obtain a detection result, where the detection result includes the candidate region corresponding to each of the multiple single-frame images, and the candidate region includes the target object.
  • the video generation module 1102 is specifically used to:
  • the feature parameter includes the confidence of the candidate region in the single-frame image; in this case the module acquires the candidate region with the highest confidence in each single-frame image as the target candidate region
  • the video generation module 1102 is specifically used for:
  • the feature parameter includes the area of the candidate region in a single-frame image; the single-frame image set includes N single-frame images in temporal order, N being a positive integer; the video generation module 1102 is specifically used to:
  • the candidate area with the largest area in the first frame of single-frame image is used as the target candidate area of the first frame of single-frame image.
  • the feature parameter further includes the position of the center point of the candidate region
  • the video generation module 1102 is further configured to:
  • if N is greater than 1, determine the center point position of each candidate region in the Nth single-frame image; calculate the Euclidean distance between the center point position of each candidate region in the Nth single-frame image and the center point position of the target candidate region in the (N-1)th single-frame image
  • determine the candidate region with the smallest Euclidean distance as the target candidate region of the Nth single-frame image.
  • the video generation module 1102 is specifically used to:
  • acquire the type of the target object included in the target candidate region to which the position of each single-frame image's target object belongs; if the type is the preset target type, generate a recommended picture of a preset size with the target object at a preset position in the picture; if the type is not the preset target type, generate the smallest-area recommended picture that includes the target candidate region.
  • the video generation module 1102 is specifically used for:
  • perform interpolation on the position coordinates of the target object in adjacent recommended pictures to obtain intermediate position coordinates; generate intermediate recommended pictures including the target object from the intermediate position coordinates, where the playback times of the intermediate recommended pictures lie between those of the adjacent recommended pictures; and generate the recommended viewing angle video by sorting the recommended pictures and intermediate recommended pictures by playback time from earliest to latest.
  • an apparatus for automatically generating a recommended viewing angle video from a panoramic video, including: a target detection module 1201 and a video generation module 1202, wherein:
  • the function of the target detection module 1201 is the same as that of the above-mentioned target detection module 1101
  • the function of the video generation module 1202 is the same as that of the above-mentioned video generation module 1102 , which will not be repeated here.
  • for the specific limitations of the apparatus for playing panoramic video, reference may be made to the above description of the panoramic video playback method; for the specific limitations of the apparatus for automatically generating a recommended viewing angle video from a panoramic video, reference may be made to the above description of the corresponding method
  • details are not repeated here.
  • the various modules in the above-mentioned apparatus for playing panoramic video and apparatus for automatically generating recommended viewing angle video for panoramic video may be implemented in whole or in part by software, hardware and combinations thereof.
  • the above modules can be embedded in or independent of the processor in the computer device in the form of hardware, or stored in the memory in the computer device in the form of software, so that the processor can call and execute the operations corresponding to the above modules.
  • a computer device including a memory and a processor, a computer program is stored in the memory, and the processor implements the following steps when executing the computer program:
  • Acquire a panoramic video, perform target detection on it, and obtain a detection result, where the detection result includes the candidate region in which a target object is located; generate a recommended viewing angle video according to the detection result, where the picture content of the recommended viewing angle video includes the target object;
  • the recommended viewing angle video and the preset viewing angle video are displayed on the same display screen; wherein, the screen content of the preset viewing angle video is different from the screen content of the recommended viewing angle video.
  • the processor further implements the following steps when executing the computer program:
  • extract single-frame images from the panoramic video at a preset interval to obtain a single-frame image set; perform target detection on each single-frame image in the set to obtain a detection result, where the detection result includes the candidate region corresponding to each of the multiple single-frame images, and the candidate region includes the target object.
  • the processor further implements the following steps when executing the computer program:
  • the feature parameter includes the confidence of the candidate region in the single frame image
  • the processor further implements the following steps when executing the computer program:
  • the feature parameter includes the area of the candidate region in a single-frame image
  • the single-frame image set includes N single-frame images
  • the N single-frame images have a time sequence
  • N is a positive integer
  • the processor further implements the following steps when executing the computer program:
  • the candidate area with the largest area in the first frame of single-frame image is used as the target candidate area of the first frame of single-frame image.
  • the characteristic parameter further includes the position of the center point of the candidate region, and the processor further implements the following steps when executing the computer program:
  • if N is greater than 1, determine the center point position of each candidate region in the Nth single-frame image; calculate the Euclidean distance between the center point position of each candidate region in the Nth single-frame image and the center point position of the target candidate region in the (N-1)th single-frame image
  • determine the candidate region with the smallest Euclidean distance as the target candidate region of the Nth single-frame image.
  • the processor further implements the following steps when executing the computer program:
  • acquire the type of the target object included in the target candidate region to which the position of each single-frame image's target object belongs; if the type is the preset target type, generate a recommended picture of a preset size with the target object at a preset position in the picture; if the type is not the preset target type, generate the smallest-area recommended picture that includes the target candidate region.
  • the processor further implements the following steps when executing the computer program:
  • perform interpolation on the position coordinates of the target object in adjacent recommended pictures to obtain intermediate position coordinates; generate intermediate recommended pictures including the target object from the intermediate position coordinates, where the playback times of the intermediate recommended pictures lie between those of the adjacent recommended pictures; and generate the recommended viewing angle video by sorting the recommended pictures and intermediate recommended pictures by playback time from earliest to latest.
  • a computer-readable storage medium on which a computer program is stored, and when the computer program is executed by a processor, the following steps are implemented:
  • Acquire a panoramic video, perform target detection on it, and obtain a detection result, where the detection result includes the candidate region in which a target object is located; generate a recommended viewing angle video according to the detection result, where the picture content of the recommended viewing angle video includes the target object;
  • the recommended viewing angle video and the preset viewing angle video are displayed on the same display screen; wherein, the screen content of the preset viewing angle video is different from the screen content of the recommended viewing angle video.
  • the computer program further implements the following steps when executed by the processor:
  • extract single-frame images from the panoramic video at a preset interval to obtain a single-frame image set; perform target detection on each single-frame image in the set to obtain a detection result, where the detection result includes the candidate region corresponding to each of the multiple single-frame images, and the candidate region includes the target object.
  • the computer program further implements the following steps when executed by the processor:
  • the feature parameter includes the confidence level of the candidate region in the single frame image
  • the computer program further implements the following steps when executed by the processor:
  • the feature parameter includes the area of the candidate region in a single-frame image, the single-frame image set includes N single-frame images, the N single-frame images have a time sequence, and N is a positive integer;
  • the computer program, when executed by the processor, further implements the following steps:
  • the candidate area with the largest area in the first frame of single-frame image is used as the target candidate area of the first frame of single-frame image.
  • the characteristic parameter further includes the position of the center point of the candidate region
  • the computer program further implements the following steps when executed by the processor:
  • if N is greater than 1, determine the center point position of each candidate region in the Nth single-frame image; calculate the Euclidean distance between the center point position of each candidate region in the Nth single-frame image and the center point position of the target candidate region in the (N-1)th single-frame image
  • determine the candidate region with the smallest Euclidean distance as the target candidate region of the Nth single-frame image.
  • the computer program further implements the following steps when executed by the processor:
  • acquire the type of the target object included in the target candidate region to which the position of each single-frame image's target object belongs; if the type is the preset target type, generate a recommended picture of a preset size with the target object at a preset position in the picture; if the type is not the preset target type, generate the smallest-area recommended picture that includes the target candidate region.
  • the computer program further implements the following steps when executed by the processor:
  • perform interpolation on the position coordinates of the target object in adjacent recommended pictures to obtain intermediate position coordinates; generate intermediate recommended pictures including the target object from the intermediate position coordinates, where the playback times of the intermediate recommended pictures lie between those of the adjacent recommended pictures; and generate the recommended viewing angle video by sorting the recommended pictures and intermediate recommended pictures by playback time from earliest to latest.
  • Non-volatile memory may include read-only memory (Read-Only Memory, ROM), magnetic tape, floppy disk, flash memory, or optical memory, and the like.
  • Volatile memory may include random access memory (RAM) or external cache memory.
  • the RAM may be in various forms, such as Static Random Access Memory (SRAM) or Dynamic Random Access Memory (DRAM).


Abstract

This application relates to a panoramic video playback method and apparatus, a computer device, and a storage medium. The method includes: acquiring a panoramic video and performing target detection on it to obtain a detection result, where the detection result includes the candidate region in which a target object is located; generating a recommended viewing angle video according to the detection result, where the picture content of the recommended viewing angle video includes the target object; and displaying the recommended viewing angle video and a preset viewing angle video on the same display screen, where the picture content of the preset viewing angle video differs from that of the recommended viewing angle video. The method can display other target objects in the panoramic video, prevent users from missing highlights, and improve the panoramic video viewing experience.

Description

Panoramic video playback method and apparatus, computer device, and storage medium
Technical Field
This application relates to the field of image processing technology, and in particular to a panoramic video playback method and apparatus, a computer device, and a storage medium.
Background
A panoramic video is produced by shooting the environment in all directions (360°) with multiple cameras to obtain multiple video streams, which are then combined through synchronization, stitching, projection, and other techniques. The user can choose any angle within a full 360° range (up, down, left, right, front, back) to watch, obtaining an immersive viewing experience.
To ensure the viewing effect of a panoramic video, when playing it the display device shows only part of the panoramic region covered by the video, and the user can change the viewing angle to watch other regions. The range currently visible to the user is usually called the field of view (FOV). In conventional technology, part of the panoramic video is played at a preset viewing angle.
Technical Problem
However, the conventional method of playing a panoramic video at a preset viewing angle makes it easy for users to miss highlights in other regions of the panoramic video, degrading the viewing experience.
Technical Solution
In view of this, it is necessary to provide, for the above technical problem, a panoramic video playback method and apparatus, a computer device, and a storage medium.
A panoramic video playback method includes:
acquiring a panoramic video and performing target detection on the panoramic video to obtain a detection result, where the detection result includes the candidate region in which a target object is located;
generating a recommended viewing angle video according to the detection result, where the picture content of the recommended viewing angle video includes the target object; and
displaying the recommended viewing angle video and a preset viewing angle video on the same display screen, where the picture content of the preset viewing angle video differs from that of the recommended viewing angle video.
In one embodiment, performing target detection on the panoramic video to obtain a detection result includes:
extracting single-frame images from the panoramic video at a preset interval to obtain a single-frame image set; and
performing target detection on each single-frame image in the set to obtain a detection result, where the detection result includes the candidate region corresponding to each of the multiple single-frame images, and the candidate region includes the target object.
In one embodiment, generating a recommended viewing angle video according to the detection result includes:
determining a target candidate region for each single-frame image based on the feature parameters of the candidate regions corresponding to the multiple single-frame images; determining the center point of each single-frame image's target candidate region as the position of the target object in that image;
generating a recommended picture for each single-frame image according to the position of its target object; and
generating the recommended viewing angle video from the recommended pictures of the single-frame images.
In one embodiment, the feature parameter includes the confidence of a candidate region in a single-frame image, and determining the target candidate region of each single-frame image based on the feature parameters includes:
taking the candidate region with the highest confidence in each single-frame image as the target candidate region.
In one embodiment, the feature parameter includes the area of a candidate region in a single-frame image; the single-frame image set includes N single-frame images in temporal order, N being a positive integer;
determining the target candidate region of each single-frame image from its candidate regions based on the feature parameters includes:
if N = 1, taking the candidate region with the largest area in the first single-frame image as its target candidate region.
In one embodiment, the feature parameter further includes the center point position of a candidate region, and determining the target candidate region of each single-frame image from its candidate regions based on the feature parameters further includes:
if N is greater than 1, determining the center point position of each candidate region in the Nth single-frame image;
calculating the Euclidean distance between the center point position of each candidate region in the Nth single-frame image and the center point position of the target candidate region in the (N-1)th single-frame image; and
determining the candidate region with the smallest Euclidean distance as the target candidate region of the Nth single-frame image.
In one embodiment, generating a recommended picture for each single-frame image according to the position of its target of interest includes:
acquiring the type of the target object included in the target candidate region to which the position of each single-frame image's target object belongs;
if the type of the target object is a preset target type, generating a recommended picture of a preset size with the target object at a preset position in the picture; and
if the type of the target object is not the preset target type, generating the smallest-area recommended picture that includes the target candidate region.
In one embodiment, generating the recommended viewing angle video from the recommended pictures of the single-frame images includes:
performing interpolation on the position coordinates of the target object in adjacent recommended pictures to obtain intermediate position coordinates;
generating intermediate recommended pictures including the target object from the intermediate position coordinates, where the playback times of the intermediate recommended pictures lie between those of the adjacent recommended pictures; and
generating the recommended viewing angle video by ordering the recommended pictures and intermediate recommended pictures by playback time from earliest to latest.
A method for automatically generating a recommended viewing angle video from a panoramic video includes:
acquiring a panoramic video and performing target detection on it to obtain a detection result, where the detection result includes the candidate region in which a target object is located; and
generating a recommended viewing angle video according to the detection result, where the picture content of the recommended viewing angle video includes the target object.
A panoramic video playback apparatus includes:
a target detection module, configured to acquire a panoramic video and perform target detection on it to obtain a detection result, where the detection result includes the candidate region in which a target object is located;
a video generation module, configured to generate a recommended viewing angle video according to the detection result, where the picture content of the recommended viewing angle video includes the target object; and
a synchronous display module, configured to display the recommended viewing angle video and a preset viewing angle video on the same display screen, where the picture content of the preset viewing angle video differs from that of the recommended viewing angle video.
An apparatus for automatically generating a recommended viewing angle video from a panoramic video includes:
a target detection module, configured to acquire a panoramic video and perform target detection on it to obtain a detection result, where the detection result includes the candidate region in which a target object is located; and
a video generation module, configured to generate a recommended viewing angle video according to the detection result, where the picture content of the recommended viewing angle video includes the target object.
A computer device includes a memory and a processor, the memory storing a computer program; when executing the computer program, the processor implements the following steps:
acquiring a panoramic video and performing target detection on it to obtain a detection result, where the detection result includes the candidate region in which a target object is located;
generating a recommended viewing angle video according to the detection result, where the picture content of the recommended viewing angle video includes the target object; and
displaying the recommended viewing angle video and a preset viewing angle video on the same display screen, where the picture content of the preset viewing angle video differs from that of the recommended viewing angle video.
A computer-readable storage medium stores a computer program that, when executed by a processor, implements the following steps:
acquiring a panoramic video and performing target detection on it to obtain a detection result, where the detection result includes the candidate region in which a target object is located;
generating a recommended viewing angle video according to the detection result, where the picture content of the recommended viewing angle video includes the target object; and
displaying the recommended viewing angle video and a preset viewing angle video on the same display screen, where the picture content of the preset viewing angle video differs from that of the recommended viewing angle video.
Technical Effect
With the above panoramic video playback method and apparatus, computer device, and storage medium, a panoramic video is acquired and target detection is performed on it to obtain a detection result including the candidate region in which a target object is located; a recommended viewing angle video including the target object is generated from the detection result; and the recommended viewing angle video, whose picture content differs from that of a preset viewing angle video, is displayed together with the preset viewing angle video on the same display screen. This displays other target objects in the panoramic video, prevents users from missing highlights, and improves the panoramic video viewing experience.
Brief Description of the Drawings
FIG. 1 is an internal structure diagram of a computer device in one embodiment;
FIG. 2 is a schematic flowchart of a panoramic video playback method in one embodiment;
FIG. 3 is a schematic diagram of the relationship among the panoramic video, the recommended viewing angle video, and the preset viewing angle video in one embodiment;
FIGS. 4a to 4e are schematic diagrams of displaying the recommended viewing angle video and the preset viewing angle video;
FIG. 5 is a schematic flowchart of performing target detection on the panoramic video in one embodiment;
FIG. 6 is a schematic flowchart of generating the recommended viewing angle video in one embodiment;
FIG. 7 is a schematic flowchart of determining the target candidate region in one embodiment;
FIG. 8 is a schematic flowchart of generating the recommended picture in one embodiment;
FIG. 9 is a schematic flowchart of generating the recommended viewing angle video in another embodiment;
FIG. 10 is a schematic flowchart of generating the recommended viewing angle video in another embodiment;
FIG. 11 is a structural block diagram of a panoramic video playback apparatus in one embodiment;
FIG. 12 is a structural block diagram of an apparatus for generating a recommended viewing angle video based on a panoramic video in one embodiment.
Embodiments of the Invention
To make the objectives, technical solutions, and advantages of this application clearer, this application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are intended only to explain this application, not to limit it.
The panoramic video playback method provided in this application can be applied to the computer device shown in FIG. 1. The computer device may be a terminal, and its internal structure diagram may be as shown in FIG. 1. The computer device includes a processor, a memory, a communication interface, a display screen, and an input apparatus connected via a system bus. The processor of the computer device is used to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for running the operating system and computer program in the non-volatile storage medium. The communication interface of the computer device is used for wired or wireless communication with external terminals; the wireless mode can be implemented through WIFI, a carrier network, NFC (near-field communication), or other technologies. When executed by the processor, the computer program implements a panoramic video playback method. The display screen of the computer device may be a liquid crystal display or an electronic ink display, and the input apparatus may be a touch layer covering the display screen, a key, trackball, or touchpad on the housing of the computer device, or an external keyboard, touchpad, or mouse.
Those skilled in the art will understand that the structure shown in FIG. 1 is merely a block diagram of a partial structure related to the solution of this application and does not limit the computer device to which the solution is applied; a specific computer device may include more or fewer components than shown, combine certain components, or have a different component arrangement.
In one embodiment, as shown in FIG. 2, a panoramic video playback method is provided. The method is described taking its application to the computer device in FIG. 1 as an example, and includes the following steps:
S210: Acquire a panoramic video and perform target detection on it to obtain a detection result.
The detection result includes the candidate region in which each detected target object is located. The candidate region characterizes the position of the target object in a single-frame image, and may be delineated on the single-frame image by a rectangular box or by the boundary of the target object. In this embodiment, the representation of the candidate region is not specifically limited.
Optionally, after acquiring the panoramic video, the computer device may perform target detection on it according to a preset target of interest to the user, obtaining a detection result including the candidate region of that target of interest. The target of interest is a target object, i.e. a point of interest, set in advance by the user; for example, it may be a person, an animal, a vehicle, or an aircraft; it may be static, such as a building or a tree by the road, or dynamic, such as a moving vehicle or a running athlete. The computer device may also perform target detection on the panoramic video using a preset target recognition algorithm, obtaining a detection result of the candidate regions of all recognized target objects.
S220: Generate a recommended viewing angle video according to the detection result.
The picture content of the recommended viewing angle video includes the target object.
Optionally, the computer device generates a recommended picture from the candidate region in which the target object is located in the detection result, and the recommended pictures constitute the recommended viewing angle video. Optionally, the candidate region may be used directly as the recommended picture, or the recommended picture may be obtained by expanding a preset size outward from the center point of the candidate region and cropping the single-frame image accordingly.
When the detection result of a single-frame image includes candidate regions of multiple target objects, optionally, one recommended viewing angle video may be generated for the candidate regions of each target object; with multiple target objects, multiple corresponding recommended viewing angle videos can be generated. For example, target detection for vehicles is performed on 100 consecutively played single-frame images (frames 1-100) of the panoramic video; frames 1-50 include candidate regions of vehicle A and vehicle B, while frames 51-100 include only the candidate region of vehicle B. The computer device then generates recommended pictures for vehicle A from its candidate regions in frames 1-50, forming a recommended viewing angle video for vehicle A, and generates recommended pictures for vehicle B from its candidate regions in frames 1-100, forming a recommended viewing angle video for vehicle B. A preset filtering condition may also be used to determine one target candidate region among multiple candidate regions, generate the corresponding recommended picture, and form the corresponding recommended viewing angle video; the filtering condition may be the largest or smallest confidence in the detection result, or the largest or smallest candidate region area in the detection result.
S230: Display the recommended viewing angle video and the preset viewing angle video on the same display screen.
The picture content of the preset viewing angle video differs from that of the recommended viewing angle video. The preset viewing angle video is a video generated by the computer device from the panoramic video at a default viewing angle; its picture can be adjusted based on user input instructions to display other regions of the panoramic video as the user wishes. For example, the preset viewing angle video may be a video captured by a preset camera, or a video whose picture content includes a preset target object. In this embodiment, how the preset viewing angle video is formed is not specifically limited.
For example, as shown in FIG. 3, the computer device performs person detection on the single-frame images of the panoramic video and generates a recommended viewing angle video from the detection results; the preset viewing angle video is a video captured by a certain camera; the computer device then displays the generated recommended viewing angle video and the different preset viewing angle video on the same display screen.
Optionally, the computer device may display the recommended viewing angle video and the preset viewing angle video side by side on the same display screen, such as in the same row (FIG. 4a), in the same column (FIGS. 4c and 4d), or diagonally; the recommended viewing angle video may be shown first and the preset viewing angle video second, or vice versa. The computer device may also treat the recommended viewing angle video as the secondary video and the preset viewing angle video as the primary video: for example, display the preset viewing angle video full-screen and the recommended viewing angle video as a thumbnail, which may be located anywhere on the display, such as the upper-left corner (FIG. 4d), lower-left corner, middle, upper-right corner, or lower-right corner (FIG. 4e). The computer device may also display the recommended viewing angle video full-screen with the preset viewing angle video as a thumbnail. In this embodiment, the sizes and positional relationship of the two videos are not specifically limited.
In this embodiment, the computer device performs target detection on the panoramic video to obtain a detection result including the candidate region in which the target object is located, generates a recommended viewing angle video including the target object from the detection result, and displays the recommended viewing angle video, whose picture content differs, together with the preset viewing angle video on the same display screen, thereby displaying other target objects in the panoramic video, preventing users from missing highlights, and improving the panoramic video viewing experience.
In one embodiment, to improve target detection efficiency, as shown in FIG. 5, the above S210 includes:
S510: Extract single-frame images from the panoramic video at a preset interval to obtain a single-frame image set.
Optionally, the computer device may extract single-frame images from the panoramic video at a preset time-period interval to obtain the single-frame image set. For example, with time period T = 1 s, the computer device extracts one single-frame image from the panoramic video every 1 s, obtaining single-frame image set A. The computer device may also extract single-frame images at a preset frame-count interval. For example, with a preset interval of 5 frames, the computer device extracts one single-frame image every 5 frames, obtaining single-frame image set B.
S520: Perform target detection on each single-frame image in the single-frame image set to obtain a detection result.
The detection result includes the candidate region corresponding to each of the multiple single-frame images, and the candidate region includes the target object. The type of the target object may be a person, a human face, a vehicle, a building, etc.
Optionally, the computer device may perform target detection on each single-frame image in the set using a machine-learning-based target detection model to obtain the detection result. For example, the computer device inputs each single-frame image into a face detection model trained with a large number of face images as training samples and a vehicle detection model trained with a large number of vehicle images as training samples, obtaining corresponding detection results that include, for each single-frame image, candidate regions containing faces and candidate regions containing vehicles. Target detection may also be performed using template matching, key-point matching, key-feature detection, and the like.
In this embodiment, the computer device extracts single-frame images from the panoramic video at a preset interval to obtain a single-frame image set, and performs target detection on each image in the set to obtain a detection result including, for each of the multiple single-frame images, a candidate region containing the target object. Extracting single-frame images at a preset interval before detection reduces the amount of data on which target detection must be performed, thereby improving detection efficiency.
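The sampling in S510 can be sketched as simple index arithmetic over the frame sequence. This is an illustrative sketch only: real code would read frames with a video library, and the function name and parameters are our own assumptions.

```python
def sample_frame_indices(total_frames, fps, interval_frames=None,
                         interval_seconds=None):
    """Indices of frames sampled from a video at a preset interval.

    Sampling is either every `interval_frames` frames or every
    `interval_seconds` seconds (converted to a frame count via fps),
    mirroring the two variants described in the text.
    """
    if interval_frames is None:
        # Convert the time-period interval to a frame-count interval.
        interval_frames = round(interval_seconds * fps)
    return list(range(0, total_frames, interval_frames))
```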
In one embodiment, to improve the picture effect of the recommended viewing angle video, as shown in FIG. 6, the above S220 includes:
S610: Determine the target candidate region of each single-frame image based on the feature parameters of the candidate regions corresponding to the multiple single-frame images.
The feature parameters characterize properties of a candidate region, such as at least one of its area, center point position, color histogram, or the confidence of the target type in the candidate region.
When the detection result of a single-frame image includes multiple candidate regions, the computer device needs to determine the target candidate region of that image from among them according to their feature parameters.
In an optional embodiment, the feature parameter includes the confidence of the candidate region in the single-frame image, i.e. the confidence of the target type that the candidate region represents. For each single-frame image, the computer device may take the candidate region with the highest confidence among the multiple candidate regions in the corresponding detection result as that image's target candidate region. For example, for single-frame images 1-200: in the detection results of images 1-150, candidate region A always has the highest confidence, so the computer device determines candidate region A as the target candidate region of images 1-150; in the detection results of images 151-200, candidate region B always has the highest confidence, so the computer device determines candidate region B as the target candidate region of images 151-200.
Optionally, for the first of N temporally ordered single-frame images, the computer device may also take the candidate region with the highest confidence among the multiple candidate regions in that first image as its target candidate region; after determining the first image's target candidate region, it obtains the similarity between each candidate region in the second single-frame image and the first image's target candidate region, and determines the second image's target candidate region, specifically taking the candidate region with the highest similarity. By analogy, the target candidate region of each of the N single-frame images is finally obtained. The similarity may be the intersection-over-union of the areas of the candidate region and the target candidate region, the correlation coefficient of their color histograms, or the Bhattacharyya distance.
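Of the similarity measures just mentioned (intersection-over-union of areas, color-histogram correlation, Bhattacharyya distance), IoU is the simplest to sketch. A minimal illustration assuming boxes given as (left, top, right, bottom); this is not the patent's implementation.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes, each given
    as (left, top, right, bottom). Returns a value in [0, 1]; higher
    means the two regions overlap more."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the intersection (zero if disjoint).
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0
```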
S620: Determine the center point of each single-frame image's target candidate region as the position of the target object in that image.
Optionally, if the candidate region obtained by target detection is a region delineated by a rectangular box in the single-frame image, the target candidate region is likewise such a region, and the computer device takes the intersection of the two diagonals of the target candidate region as the position of the target object in the single-frame image. If the candidate region is delineated by the boundary of the target object, the target candidate region is likewise such a region, and the computer device takes its geometric center as the position of the target object in the single-frame image.
S630: Generate a recommended picture for each single-frame image according to the position of its target object.
S640: Generate the recommended viewing angle video from the recommended pictures of the single-frame images.
Optionally, the computer device may take the position of the target object in each single-frame image as the center, extend outward by a preset length to form a circular region with that radius, and use the circular region as the recommended picture of the corresponding single-frame image. The computer device may also take the target object's position as the center and extend a preset length along the X axis and a preset length along the Y axis to form a rectangular region, using that rectangle as the recommended picture. Correspondingly, the recommended viewing angle video composed of the recommended pictures is obtained.
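Cropping a fixed-size rectangular recommended picture around the target position can be sketched as follows. The clamping to the image bounds is our own assumption; the patent does not specify how border cases are handled.

```python
def crop_rect(center, half_w, half_h, img_w, img_h):
    """Rectangle (left, top, width, height) of size (2*half_w, 2*half_h)
    centered on the target position, shifted as needed so it stays
    inside the single-frame image of size (img_w, img_h)."""
    cx, cy = center
    left = min(max(cx - half_w, 0), img_w - 2 * half_w)
    top = min(max(cy - half_h, 0), img_h - 2 * half_h)
    return (left, top, 2 * half_w, 2 * half_h)
```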
In this embodiment, the computer device determines the target candidate region of each single-frame image from its candidate regions based on their confidence or area, determines the center point of each target candidate region as the position of the target object in that image, generates a recommended picture for each image from that position, and then generates the recommended viewing angle video from the recommended pictures, so that each frame of the generated video includes a target object with the highest confidence or a suitable area, thereby improving the picture effect of the recommended viewing angle video.
In one embodiment, the feature parameter includes the area of the candidate region in the single-frame image; the single-frame image set includes N single-frame images in temporal order, N being a positive integer. When the target candidate region is determined by candidate-region area, the above S610 includes:
If N = 1, the computer device takes the candidate region with the largest area in the first single-frame image as that image's target candidate region.
The first single-frame image is the first image in the single-frame image set, not necessarily the first frame of the panoramic video.
Specifically, the computer device obtains the areas of all candidate regions in the first single-frame image of the set and determines the one with the largest area as that image's target candidate region.
In an optional embodiment, the feature parameter further includes the center point position of the candidate region. As shown in FIG. 7, if N is greater than 1, the above S610 further includes:
S710: Determine the center point position of each candidate region in the Nth single-frame image.
Specifically, the computer device obtains the geometric center of each candidate region in the Nth single-frame image and uses it as the center point position of the corresponding candidate region.
S720: Calculate the Euclidean distance between the center point position of each candidate region in the Nth single-frame image and the center point position of the target candidate region in the (N-1)th single-frame image.
S730: Determine the candidate region with the smallest Euclidean distance as the target candidate region of the Nth single-frame image.
Specifically, the computer device calculates the Euclidean distance between the center point position of each candidate region in the Nth single-frame image and that of the target candidate region in the (N-1)th (i.e., the previous) single-frame image, and takes the candidate region with the smallest Euclidean distance as the Nth image's target candidate region. For example, after the first image's target candidate region is determined, when N = 2 the second single-frame image includes three candidate regions a, b, and c. The computer device obtains the Euclidean distances between the center point position of the first image's target candidate region and the center point positions of a, b, and c, yielding L1, L2, and L3, where L2 < L1 < L3; it determines candidate region b, corresponding to the minimum Euclidean distance L2, as the second image's target candidate region. By analogy, it then computes the Euclidean distances between the second image's target-candidate-region center and the candidate-region centers in the third image, determining the one with the smallest distance as the third image's target candidate region, and so on, determining each image's target candidate region among its candidate regions from the previous image's target candidate region.
In this embodiment, the computer device determines the largest-area candidate region in the first single-frame image of the set as its target candidate region, and by computing the Euclidean distance between the previous image's target-candidate-region center and the candidate-region centers of the next image, determines the candidate region with the smallest distance as the next image's target candidate region, so that the recommended viewing angle video always includes the same target object, improving the continuous, traceable display of the same target object.
In one embodiment, to further improve the picture effect of the recommended viewing angle video, as shown in FIG. 8, the above S630 includes:
S810: Acquire the type of the target object included in the target candidate region to which the position of each single-frame image's target object belongs.
Specifically, after determining the position of the target object in each single-frame image, the computer device further acquires the type of the target object included in the target candidate region to which the position belongs, and determines the generated recommended picture according to that type.
S820: If the type of the target object is a preset target type, generate a recommended picture of a preset size with the target object at a preset position in the picture.
S830: If the type of the target object is not the preset target type, generate the smallest-area recommended picture that includes the target candidate region.
Optionally, if the target object's type is the preset target type, the computer device generates a recommended picture of the preset size with the target object at the picture's preset position. For example, the preset target type is a human face: if the target object is a face, the computer device generates a recommended picture with the face at 2/3 of the picture's height and 1/2 of its width. If the target object's type is not the preset target type, the computer device generates the smallest-area recommended picture that includes the target candidate region; for example, if the target object is a football field (not a face), it generates the smallest-area recommended picture including the football field's target candidate region.
In this embodiment, the computer device acquires the type of the target object included in the target candidate region to which each single-frame image's target object position belongs; when the type is the preset target type, it generates a preset-size recommended picture with the target object at a preset position, and when it is not, it generates the smallest-area recommended picture including the target candidate region. Different recommended pictures are thus determined for different target object types, so that the target object is suitably positioned in every picture of the recommended viewing angle video: the user can see a close-up of a preset-type target object and the whole of a non-preset-type target object, further improving the picture effect of the recommended viewing angle video.
In one embodiment, to improve the fluency of the recommended viewing angle video, as shown in FIG. 9, the above S640 includes:
S910: Perform interpolation on the position coordinates of the target object in adjacent recommended pictures to obtain intermediate position coordinates.
Specifically, the single-frame image of each recommended picture has a uniquely determined playback time in the panoramic video. The computer device interpolates the positions of the target object in recommended pictures that are adjacent in playback time and include the same target object, filling in the target object's position in the pictures missing between the two adjacent recommended pictures. Optionally, the computer device performs linear interpolation on the position coordinates of the target object in adjacent recommended pictures using the formula
P_{t+k} = P_t + (k/N) * (P_{t+N} - P_t)
to obtain the intermediate position coordinates, where P_t is the target object's position coordinate in the recommended picture at the earlier playback time t, P_{t+N} is its position coordinate in the recommended picture at the later playback time t+N, and P_{t+k} is its position coordinate in the intermediate recommended picture corresponding to a time t+k between the two.
Optionally, the position coordinates of the target object may be its coordinates in the corresponding recommended picture, its coordinates in the single-frame image to which it belongs, or its coordinates in the actual environment.
S920: Generate intermediate recommended pictures including the target object from the intermediate position coordinates.
The playback time of each intermediate recommended picture lies between those of the adjacent recommended pictures.
S930: Generate the recommended viewing angle video by sorting the recommended pictures and intermediate recommended pictures by playback time from earliest to latest.
Specifically, the computer device generates, from the intermediate position coordinates, intermediate recommended pictures of the same size as the recommended pictures and including the same target object, and generates the recommended viewing angle video by sorting the recommended pictures and intermediate recommended pictures by playback time from earliest to latest.
Optionally, the computer device may also filter the positions of the target object in the recommended pictures and intermediate recommended pictures that make up the recommended viewing angle video, using the Kalman filter and its variants, sliding-window averaging, or other filtering algorithms, so that the generated video is more stable with less jitter, further improving its picture effect.
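The sliding-window averaging mentioned here can be sketched as a simple moving average over the position sequence. A minimal stand-in for the filtering step; the window size and function name are our own assumptions, and a Kalman filter would be a more sophisticated alternative.

```python
def smooth_positions(points, window=5):
    """Sliding-window average of a sequence of (x, y) positions,
    reducing jitter in the generated viewing-angle video. The window
    is truncated at the ends of the sequence."""
    half = window // 2
    out = []
    for i in range(len(points)):
        lo, hi = max(0, i - half), min(len(points), i + half + 1)
        xs = [p[0] for p in points[lo:hi]]
        ys = [p[1] for p in points[lo:hi]]
        out.append((sum(xs) / len(xs), sum(ys) / len(ys)))
    return out
```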
In this embodiment, the computer device interpolates the position coordinates of the target object in adjacent recommended frames to obtain intermediate position coordinates, generates from them intermediate recommended frames that contain the target object and whose playback times lie between those of the adjacent recommended frames, and then generates the recommended-view video from the recommended frames and intermediate recommended frames ordered from earliest to latest playback time. Recommended frames whose playback times are discontinuous are thus supplemented with intermediate recommended frames, improving the smoothness of the recommended-view video.
In one embodiment, the above method for playing a panoramic video may be applied to at least two target objects, generating a corresponding recommended-view video for each. For example, in a live or recorded basketball-match scenario, the user presets the commentator and the basketball court as target objects. The computer device performs target detection on the single-frame images of the match panoramic video according to these targets: it detects the commentator in each single-frame image, generates first recommended frames with the commentator at the center, and composes from them a first recommended-view video for the commentator; at the same time, it detects the basketball court in each single-frame image, generates the smallest second recommended frames that contain the whole court, and composes from them a second recommended-view video for the court. The computer device then displays the first recommended-view video for the commentator and the second recommended-view video for the court, together with the preset-view video, on the same display screen.
In one embodiment, as shown in FIG. 10, a method for automatically generating a recommended-view video from a panoramic video is provided, including:
S1010: acquiring a panoramic video, and performing target detection on the panoramic video to obtain a detection result.
The detection result includes the candidate region in which a target object is located.
S1020: generating a recommended-view video according to the detection result.
The picture content of the recommended-view video includes the target object.
Specifically, for the process of generating the recommended-view video from the panoramic video, reference may be made to the embodiments shown in FIGS. 5 to 9, which is not repeated here.
It should be understood that, although the steps in the flowcharts of FIGS. 2 to 10 are shown in sequence as indicated by the arrows, these steps are not necessarily executed in that order. Unless explicitly stated herein, there is no strict ordering constraint on their execution, and they may be executed in other orders. Moreover, at least some of the steps in FIGS. 2 to 10 may comprise multiple sub-steps or stages, which are not necessarily completed at the same moment but may be executed at different moments, and whose execution order is not necessarily sequential: they may be executed in turn or alternately with other steps, or with at least some of the sub-steps or stages of other steps.
In one embodiment, as shown in FIG. 11, an apparatus for playing a panoramic video is provided, including a target detection module 1101, a video generation module 1102, and a synchronous display module 1103, wherein:
the target detection module 1101 is configured to acquire a panoramic video and perform target detection on the panoramic video to obtain a detection result, the detection result including the candidate region in which a target object is located; the video generation module 1102 is configured to generate a recommended-view video according to the detection result, the picture content of the recommended-view video including the target object; and the synchronous display module 1103 is configured to display the recommended-view video and a preset-view video on the same display screen, the picture content of the preset-view video differing from that of the recommended-view video.
In one of the embodiments, the target detection module 1101 is specifically configured to:
extract single-frame images from the panoramic video at a preset interval to obtain a single-frame image set, and perform target detection on each single-frame image in the set to obtain the detection result, the detection result including candidate regions respectively corresponding to multiple single-frame images, the candidate regions containing the target object.
In one of the embodiments, the video generation module 1102 is specifically configured to:
determine the target candidate region of each single-frame image based on the feature parameters of the candidate regions respectively corresponding to the multiple single-frame images; determine the center point of the target candidate region of each single-frame image as the position of the target object in that image; generate the recommended frame corresponding to each single-frame image according to the position of the target object in it; and generate the recommended-view video from the recommended frames corresponding to the single-frame images.
In one of the embodiments, the feature parameters include the confidence of a candidate region in its single-frame image, and the video generation module 1102 is specifically configured to:
take the candidate region with the highest confidence in each single-frame image as the target candidate region.
In one of the embodiments, the feature parameters include the area of a candidate region in its single-frame image, the single-frame image set includes N single-frame images in temporal order, N being a positive integer, and the video generation module 1102 is specifically configured to:
if N=1, take the candidate region with the largest area in the 1st single-frame image as the target candidate region of the 1st single-frame image.
In one of the embodiments, the feature parameters further include the center-point positions of the candidate regions, and the video generation module 1102 is further configured to:
if N is greater than 1, determine the center-point position of each candidate region in the Nth single-frame image; calculate the Euclidean distance between the center-point position of each candidate region in the Nth single-frame image and the center-point position of the target candidate region in the (N-1)th single-frame image; and determine the candidate region with the smallest Euclidean distance as the target candidate region of the Nth single-frame image.
In one of the embodiments, the video generation module 1102 is specifically configured to:
obtain the type of the target object contained in the target candidate region to which the position of the target object in each single-frame image belongs; if the type of the target object is a preset target type, generate a recommended frame of a preset size with the position of the target object at a preset position within the frame; and if it is not, generate the smallest recommended frame that contains the target candidate region.
In one of the embodiments, the video generation module 1102 is specifically configured to:
interpolate the position coordinates of the target object in adjacent recommended frames to obtain intermediate position coordinates; generate from the intermediate position coordinates an intermediate recommended frame containing the target object, the playback time of the intermediate recommended frame lying between those of the adjacent recommended frames; and generate the recommended-view video from the recommended frames and intermediate recommended frames ordered from earliest to latest playback time.
In one embodiment, as shown in FIG. 12, an apparatus for automatically generating a recommended-view video from a panoramic video is provided, including a target detection module 1201 and a video generation module 1202, wherein:
the target detection module 1201 serves the same function as the above target detection module 1101, and the video generation module 1202 serves the same function as the above video generation module 1102, which is not repeated here.
For the specific limitations of the apparatus for playing a panoramic video, reference may be made to the limitations of the method for playing a panoramic video above; for the specific limitations of the apparatus for automatically generating a recommended-view video from a panoramic video, reference may be made to the limitations of the corresponding method above; neither is repeated here. Each module in the above apparatuses may be implemented wholly or partly in software, hardware, or a combination thereof. The modules may be embedded, in hardware form, in or independent of a processor of a computer device, or stored, in software form, in a memory of the computer device, so that the processor can invoke and execute the operations corresponding to each module.
In one embodiment, a computer device is provided, including a memory and a processor, the memory storing a computer program which, when executed by the processor, implements the following steps:
acquiring a panoramic video, and performing target detection on the panoramic video to obtain a detection result, the detection result including the candidate region in which a target object is located; generating a recommended-view video according to the detection result, the picture content of the recommended-view video including the target object; and displaying the recommended-view video and a preset-view video on the same display screen, the picture content of the preset-view video differing from that of the recommended-view video.
In one embodiment, when executing the computer program, the processor further implements the following steps:
extracting single-frame images from the panoramic video at a preset interval to obtain a single-frame image set; and performing target detection on each single-frame image in the set to obtain the detection result, the detection result including candidate regions respectively corresponding to multiple single-frame images, the candidate regions containing the target object.
In one embodiment, when executing the computer program, the processor further implements the following steps:
determining the target candidate region of each single-frame image based on the feature parameters of the candidate regions respectively corresponding to the multiple single-frame images; determining the center point of the target candidate region of each single-frame image as the position of the target object in that image; generating the recommended frame corresponding to each single-frame image according to the position of the target object in it; and generating the recommended-view video from the recommended frames corresponding to the single-frame images.
In one embodiment, the feature parameters include the confidence of a candidate region in its single-frame image, and when executing the computer program the processor further implements the following step:
taking the candidate region with the highest confidence in each single-frame image as the target candidate region.
In one embodiment, the feature parameters include the area of a candidate region in its single-frame image, the single-frame image set includes N single-frame images in temporal order, N being a positive integer, and when executing the computer program the processor further implements the following step:
if N=1, taking the candidate region with the largest area in the 1st single-frame image as the target candidate region of the 1st single-frame image.
In one embodiment, the feature parameters further include the center-point positions of the candidate regions, and when executing the computer program the processor further implements the following steps:
if N is greater than 1, determining the center-point position of each candidate region in the Nth single-frame image; calculating the Euclidean distance between the center-point position of each candidate region in the Nth single-frame image and the center-point position of the target candidate region in the (N-1)th single-frame image; and determining the candidate region with the smallest Euclidean distance as the target candidate region of the Nth single-frame image.
In one embodiment, when executing the computer program, the processor further implements the following steps:
obtaining the type of the target object contained in the target candidate region to which the position of the target object in each single-frame image belongs; if the type of the target object is a preset target type, generating a recommended frame of a preset size with the position of the target object at a preset position within the frame; and if it is not, generating the smallest recommended frame that contains the target candidate region.
In one embodiment, when executing the computer program, the processor further implements the following steps:
interpolating the position coordinates of the target object in adjacent recommended frames to obtain intermediate position coordinates; generating from the intermediate position coordinates an intermediate recommended frame containing the target object, the playback time of the intermediate recommended frame lying between those of the adjacent recommended frames; and generating the recommended-view video from the recommended frames and intermediate recommended frames ordered from earliest to latest playback time.
In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored, the computer program, when executed by a processor, implementing the following steps:
acquiring a panoramic video, and performing target detection on the panoramic video to obtain a detection result, the detection result including the candidate region in which a target object is located; generating a recommended-view video according to the detection result, the picture content of the recommended-view video including the target object; and displaying the recommended-view video and a preset-view video on the same display screen, the picture content of the preset-view video differing from that of the recommended-view video.
In one embodiment, when executed by the processor, the computer program further implements the following steps:
extracting single-frame images from the panoramic video at a preset interval to obtain a single-frame image set; and performing target detection on each single-frame image in the set to obtain the detection result, the detection result including candidate regions respectively corresponding to multiple single-frame images, the candidate regions containing the target object.
In one embodiment, when executed by the processor, the computer program further implements the following steps:
determining the target candidate region of each single-frame image based on the feature parameters of the candidate regions respectively corresponding to the multiple single-frame images; determining the center point of the target candidate region of each single-frame image as the position of the target object in that image; generating the recommended frame corresponding to each single-frame image according to the position of the target object in it; and generating the recommended-view video from the recommended frames corresponding to the single-frame images.
In one embodiment, the feature parameters include the confidence of a candidate region in its single-frame image, and when executed by the processor the computer program further implements the following step:
taking the candidate region with the highest confidence in each single-frame image as the target candidate region.
In one embodiment, the feature parameters include the area of a candidate region in its single-frame image, the single-frame image set includes N single-frame images in temporal order, N being a positive integer, and when executed by the processor the computer program further implements the following step:
if N=1, taking the candidate region with the largest area in the 1st single-frame image as the target candidate region of the 1st single-frame image.
In one embodiment, the feature parameters further include the center-point positions of the candidate regions, and when executed by the processor the computer program further implements the following steps:
if N is greater than 1, determining the center-point position of each candidate region in the Nth single-frame image; calculating the Euclidean distance between the center-point position of each candidate region in the Nth single-frame image and the center-point position of the target candidate region in the (N-1)th single-frame image; and determining the candidate region with the smallest Euclidean distance as the target candidate region of the Nth single-frame image.
In one embodiment, when executed by the processor, the computer program further implements the following steps:
obtaining the type of the target object contained in the target candidate region to which the position of the target object in each single-frame image belongs; if the type of the target object is a preset target type, generating a recommended frame of a preset size with the position of the target object at a preset position within the frame; and if it is not, generating the smallest recommended frame that contains the target candidate region.
In one embodiment, when executed by the processor, the computer program further implements the following steps:
interpolating the position coordinates of the target object in adjacent recommended frames to obtain intermediate position coordinates; generating from the intermediate position coordinates an intermediate recommended frame containing the target object, the playback time of the intermediate recommended frame lying between those of the adjacent recommended frames; and generating the recommended-view video from the recommended frames and intermediate recommended frames ordered from earliest to latest playback time.
Those of ordinary skill in the art will understand that all or part of the processes in the methods of the above embodiments may be completed by instructing relevant hardware through a computer program, which may be stored in a non-volatile computer-readable storage medium and which, when executed, may include the processes of the embodiments of the above methods. Any reference to memory, storage, a database, or another medium used in the embodiments provided in this application may include at least one of non-volatile and volatile memory. Non-volatile memory may include read-only memory (ROM), magnetic tape, floppy disk, flash memory, or optical memory, among others. Volatile memory may include random access memory (RAM) or an external cache. By way of illustration and not limitation, RAM may take many forms, such as static random access memory (SRAM) or dynamic random access memory (DRAM).
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, as long as a combination of technical features involves no contradiction, it should be considered within the scope of this specification.
The above embodiments express only several implementations of this application, and their description is relatively specific and detailed, but they should not therefore be construed as limiting the scope of the invention patent. It should be noted that those of ordinary skill in the art may make several variations and improvements without departing from the concept of this application, all of which fall within its scope of protection. Therefore, the scope of protection of this patent shall be subject to the appended claims.

Claims (13)

  1. A method for playing a panoramic video, characterized in that the method comprises:
    acquiring a panoramic video, and performing target detection on the panoramic video to obtain a detection result, wherein the detection result comprises a candidate region in which a target object is located;
    generating a recommended-view video according to the detection result, wherein the picture content of the recommended-view video comprises the target object; and
    displaying the recommended-view video and a preset-view video on the same display screen, wherein the picture content of the preset-view video is different from the picture content of the recommended-view video.
  2. The method according to claim 1, characterized in that said performing target detection on the panoramic video to obtain a detection result comprises:
    extracting single-frame images from the panoramic video at a preset interval to obtain a single-frame image set; and
    performing target detection on each single-frame image in the single-frame image set to obtain the detection result, wherein the detection result comprises candidate regions respectively corresponding to multiple single-frame images, the candidate regions comprising the target object.
  3. The method according to claim 2, characterized in that said generating a recommended-view video according to the detection result comprises:
    determining a target candidate region of each single-frame image based on feature parameters of the candidate regions respectively corresponding to the multiple single-frame images;
    determining the center point of the target candidate region of each single-frame image as the position of the target object in that single-frame image;
    generating a recommended frame corresponding to each single-frame image according to the position of the target object in that single-frame image; and
    generating the recommended-view video from the recommended frames corresponding to the single-frame images.
  4. The method according to claim 3, characterized in that the feature parameters comprise a confidence of each candidate region in the single-frame image, and said determining a target candidate region of each single-frame image based on the feature parameters of the candidate regions respectively corresponding to the multiple single-frame images comprises:
    taking the candidate region with the highest confidence in each single-frame image as the target candidate region.
  5. The method according to claim 3, characterized in that the feature parameters comprise an area of each candidate region in the single-frame image, the single-frame image set comprises N single-frame images, the N single-frame images are in temporal order, and N is a positive integer; and
    said determining, based on the feature parameters and from the candidate regions corresponding to each single-frame image, the target candidate region of each single-frame image comprises:
    if N=1, taking the candidate region with the largest area in the 1st single-frame image as the target candidate region of the 1st single-frame image.
  6. The method according to claim 5, characterized in that the feature parameters further comprise center-point positions of the candidate regions, and said determining, based on the feature parameters and from the candidate regions corresponding to each single-frame image, the target candidate region of each single-frame image further comprises:
    if N is greater than 1, determining the center-point position of each candidate region in the Nth single-frame image;
    calculating the Euclidean distance between the center-point position of each candidate region in the Nth single-frame image and the center-point position of the target candidate region in the (N-1)th single-frame image; and
    determining the candidate region with the smallest Euclidean distance as the target candidate region of the Nth single-frame image.
  7. The method according to claim 3, characterized in that said generating a recommended frame corresponding to each single-frame image according to the position of the target object in that single-frame image comprises:
    obtaining the type of the target object contained in the target candidate region to which the position of the target object in each single-frame image belongs;
    if the type of the target object is a preset target type, generating the recommended frame with a preset size and with the position of the target object located at a preset position in the frame; and
    if the type of the target object is not the preset target type, generating the smallest recommended frame that contains the target candidate region.
  8. The method according to any one of claims 3 to 7, characterized in that said generating the recommended-view video from the recommended frames corresponding to the single-frame images comprises:
    performing interpolation on the position coordinates of the target object in adjacent recommended frames to obtain intermediate position coordinates;
    generating, according to the intermediate position coordinates, an intermediate recommended frame containing the target object, wherein the playback time of the intermediate recommended frame lies between those of the adjacent recommended frames; and
    generating the recommended-view video from the recommended frames and the intermediate recommended frames ordered from earliest to latest playback time.
  9. A method for automatically generating a recommended-view video from a panoramic video, characterized in that the method comprises:
    acquiring a panoramic video, and performing target detection on the panoramic video to obtain a detection result, wherein the detection result comprises a candidate region in which a target object is located; and
    generating a recommended-view video according to the detection result, wherein the picture content of the recommended-view video comprises the target object.
  10. An apparatus for playing a panoramic video, characterized in that the apparatus comprises:
    a target detection module, configured to acquire a panoramic video and perform target detection on the panoramic video to obtain a detection result, wherein the detection result comprises a candidate region in which a target object is located;
    a video generation module, configured to generate a recommended-view video according to the detection result, wherein the picture content of the recommended-view video comprises the target object; and
    a synchronous display module, configured to display the recommended-view video and a preset-view video on the same display screen, wherein the picture content of the preset-view video is different from the picture content of the recommended-view video.
  11. An apparatus for automatically generating a recommended-view video from a panoramic video, characterized in that the apparatus comprises:
    a target detection module, configured to acquire a panoramic video and perform target detection on the panoramic video to obtain a detection result, wherein the detection result comprises a candidate region in which a target object is located; and
    a video generation module, configured to generate a recommended-view video according to the detection result, wherein the picture content of the recommended-view video comprises the target object.
  12. A computer device, comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of the method according to any one of claims 1 to 9.
  13. A computer-readable storage medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 9.
PCT/CN2022/081149 2021-03-23 2022-03-16 Method and apparatus for playing panoramic video, computer device, and storage medium WO2022199441A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110307765.X 2021-03-23
CN202110307765.XA CN112954443A (zh) 2021-03-23 2021-06-11 Method and apparatus for playing panoramic video, computer device, and storage medium

Publications (1)

Publication Number Publication Date
WO2022199441A1 true WO2022199441A1 (zh) 2022-09-29

Family

ID=76228061

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/081149 WO2022199441A1 (zh) 2021-03-23 2022-03-16 全景视频的播放方法、装置、计算机设备和存储介质

Country Status (2)

Country Link
CN (1) CN112954443A (zh)
WO (1) WO2022199441A1 (zh)


Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112954443A (zh) * 2021-03-23 2021-06-11 影石创新科技股份有限公司 全景视频的播放方法、装置、计算机设备和存储介质
CN117710756B (zh) * 2024-02-04 2024-04-26 成都数之联科技股份有限公司 一种目标检测及模型训练方法、装置、设备、介质

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106331732A (zh) * 2016-09-26 2017-01-11 北京疯景科技有限公司 Method and apparatus for generating and presenting panoramic content
CN108632674A (zh) * 2017-03-23 2018-10-09 华为技术有限公司 Panoramic video playing method and client
WO2018234622A1 (en) * 2017-06-21 2018-12-27 Nokia Technologies Oy METHOD OF DETECTING EVENTS OF INTEREST
CN110197126A (zh) * 2019-05-06 2019-09-03 深圳岚锋创视网络科技有限公司 Target tracking method, apparatus, and portable terminal
CN111309147A (zh) * 2020-02-12 2020-06-19 咪咕视讯科技有限公司 Panoramic video playing method, apparatus, and storage medium
CN111954003A (zh) * 2019-05-17 2020-11-17 阿里巴巴集团控股有限公司 Panoramic video playing method and apparatus
CN112954443A (zh) * 2021-03-23 2021-06-11 影石创新科技股份有限公司 Method and apparatus for playing panoramic video, computer device, and storage medium

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10616620B1 (en) * 2016-04-06 2020-04-07 Ambarella International Lp Low bitrate encoding of spherical video to support live streaming over a high latency and/or low bandwidth network
CN106897735A (zh) * 2017-01-19 2017-06-27 博康智能信息技术有限公司上海分公司 Method and apparatus for tracking a fast-moving target
CN107872731B (zh) * 2017-11-22 2020-02-21 三星电子(中国)研发中心 Panoramic video playing method and apparatus
CN109753883A (zh) * 2018-12-13 2019-05-14 北京字节跳动网络技术有限公司 Video positioning method, apparatus, storage medium, and electronic device
CN109788370A (zh) * 2019-01-14 2019-05-21 北京奇艺世纪科技有限公司 Panoramic video playing method, apparatus, and electronic device


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115953726A (zh) * 2023-03-14 2023-04-11 深圳中集智能科技有限公司 Machine-vision method and system for detecting container surface damage
CN115953726B (zh) * 2023-03-14 2024-02-27 深圳中集智能科技有限公司 Machine-vision method and system for detecting container surface damage

Also Published As

Publication number Publication date
CN112954443A (zh) 2021-06-11

Similar Documents

Publication Publication Date Title
WO2022199441A1 (zh) Method and apparatus for playing panoramic video, computer device, and storage medium
US10182270B2 (en) Methods and apparatus for content interaction
JP5347279B2 (ja) 画像表示装置
US10958854B2 (en) Computer-implemented method for generating an output video from multiple video sources
US10609284B2 (en) Controlling generation of hyperlapse from wide-angled, panoramic videos
US10284789B2 (en) Dynamic generation of image of a scene based on removal of undesired object present in the scene
US11317139B2 (en) Control method and apparatus
EP2428036B1 (en) Systems and methods for the autonomous production of videos from multi-sensored data
US20180182114A1 (en) Generation apparatus of virtual viewpoint image, generation method, and storage medium
Chen et al. Personalized production of basketball videos from multi-sensored data under limited display resolution
JP2018107793A (ja) 仮想視点画像の生成装置、生成方法及びプログラム
WO2016045381A1 (zh) 呈现图像的方法、终端设备和服务器
WO2020108573A1 (zh) 视频图像遮挡方法、装置、设备及存储介质
JP6203188B2 (ja) 類似画像検索装置
JP5768265B2 (ja) 類似画像検索システム
US8407575B1 (en) Video content summary
CN101611629A (zh) 图像处理设备、运动图像再现设备及其处理方法和程序
CN109600667B (zh) 一种基于网格与帧分组的视频重定向的方法
US20190005133A1 (en) Method, apparatus and arrangement for summarizing and browsing video content
Tompkin et al. Video collections in panoramic contexts
Zhang et al. Coherent video generation for multiple hand-held cameras with dynamic foreground
JP6632134B2 (ja) 画像処理装置、画像処理方法およびコンピュータプログラム
JP5276609B2 (ja) 画像処理装置及びプログラム
Li et al. Ultra high definition video saliency database
RU2790029C1 (ru) Method of forming a panoramic image

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22774102

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 15-11-2023)