WO2022073409A1 - Video processing method, apparatus, computer device, and storage medium - Google Patents

Video processing method, apparatus, computer device, and storage medium

Info

Publication number
WO2022073409A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
interactive operation
video
pixel
parameter
Prior art date
Application number
PCT/CN2021/117982
Other languages
English (en)
French (fr)
Inventor
夏爽 (Xia Shuang)
Original Assignee
腾讯科技(深圳)有限公司 (Tencent Technology (Shenzhen) Co., Ltd.)
Priority date
Filing date
Publication date
Application filed by 腾讯科技(深圳)有限公司 (Tencent Technology (Shenzhen) Co., Ltd.)
Priority to EP21876929.7A (published as EP4106337A4)
Publication of WO2022073409A1
Priority to US17/963,879 (published as US20230036919A1)

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/85Assembly of content; Generation of multimedia applications
    • H04N21/858Linking data to content, e.g. by linking an URL to a video object, by creating a hotspot
    • H04N21/8583Linking data to content, e.g. by linking an URL to a video object, by creating a hotspot by creating hot-spots
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/41Structure of client; Structure of client peripherals
    • H04N21/422Input-only peripherals, i.e. input devices connected to specially adapted client devices, e.g. global positioning system [GPS]
    • H04N21/42204User interfaces specially adapted for controlling a client device through a remote control device; Remote control devices therefor
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/048Interaction techniques based on graphical user interfaces [GUI]
    • G06F3/0484Interaction techniques based on graphical user interfaces [GUI] for the control of specific functions or operations, e.g. selecting or manipulating an object, an image or a displayed text element, setting a parameter value or selecting a range
    • G06F3/04842Selection of displayed objects or displayed text elements
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/048Interaction techniques based on graphical user interfaces [GUI]
    • G06F3/0484Interaction techniques based on graphical user interfaces [GUI] for the control of specific functions or operations, e.g. selecting or manipulating an object, an image or a displayed text element, setting a parameter value or selecting a range
    • G06F3/04845Interaction techniques based on graphical user interfaces [GUI] for the control of specific functions or operations, e.g. selecting or manipulating an object, an image or a displayed text element, setting a parameter value or selecting a range for image manipulation, e.g. dragging, rotation, expansion or change of colour
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/048Interaction techniques based on graphical user interfaces [GUI]
    • G06F3/0484Interaction techniques based on graphical user interfaces [GUI] for the control of specific functions or operations, e.g. selecting or manipulating an object, an image or a displayed text element, setting a parameter value or selecting a range
    • G06F3/04847Interaction techniques to control parameter settings, e.g. interaction with sliders or dials
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/048Interaction techniques based on graphical user interfaces [GUI]
    • G06F3/0487Interaction techniques based on graphical user interfaces [GUI] using specific features provided by the input device, e.g. functions controlled by the rotation of a mouse with dual sensing arrangements, or of the nature of the input device, e.g. tap gestures based on pressure sensed by a digitiser
    • G06F3/0488Interaction techniques based on graphical user interfaces [GUI] using specific features provided by the input device, e.g. functions controlled by the rotation of a mouse with dual sensing arrangements, or of the nature of the input device, e.g. tap gestures based on pressure sensed by a digitiser using a touch-screen or digitiser, e.g. input of commands through traced gestures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/048Interaction techniques based on graphical user interfaces [GUI]
    • G06F3/0487Interaction techniques based on graphical user interfaces [GUI] using specific features provided by the input device, e.g. functions controlled by the rotation of a mouse with dual sensing arrangements, or of the nature of the input device, e.g. tap gestures based on pressure sensed by a digitiser
    • G06F3/0488Interaction techniques based on graphical user interfaces [GUI] using specific features provided by the input device, e.g. functions controlled by the rotation of a mouse with dual sensing arrangements, or of the nature of the input device, e.g. tap gestures based on pressure sensed by a digitiser using a touch-screen or digitiser, e.g. input of commands through traced gestures
    • G06F3/04883Interaction techniques based on graphical user interfaces [GUI] using specific features provided by the input device, e.g. functions controlled by the rotation of a mouse with dual sensing arrangements, or of the nature of the input device, e.g. tap gestures based on pressure sensed by a digitiser using a touch-screen or digitiser, e.g. input of commands through traced gestures for inputting data by handwriting, e.g. gesture or text
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/431Generation of visual interfaces for content selection or interaction; Content or additional data rendering
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/439Processing of audio elementary streams
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/4402Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display
    • H04N21/440245Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display the reformatting operation being performed only on part of the stream, e.g. a region of the image or a time segment
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47End-user applications
    • H04N21/472End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content
    • H04N21/47202End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content for requesting content on demand, e.g. video on demand
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47End-user applications
    • H04N21/472End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content
    • H04N21/47205End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content for manipulating displayed content, e.g. interacting with MPEG-4 objects, editing locally
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47End-user applications
    • H04N21/485End-user interface for client configuration
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/81Monomedia components thereof
    • H04N21/8106Monomedia components thereof involving special audio data, e.g. different tracks for different languages

Definitions

  • the present application relates to the field of computer technology, and in particular, to a video processing method, apparatus, computer device, and storage medium.
  • Video includes images and audio, and can provide users with an intuitive and engaging viewing experience from both visual and auditory aspects.
  • the embodiments of the present application provide a video processing method, apparatus, computer device, and storage medium, which enable interaction between a user and the video being played, enhance the interactivity of the video, and improve the visual effect during video playback.
  • the technical solution is as follows:
  • a video processing method executed by a computer device, the method comprising:
  • in response to an interactive operation acting on a first image, obtaining an adjustment parameter corresponding to the interactive operation, where the adjustment parameter indicates an adjustment range of the display position of a pixel point in the first image based on the interactive operation, and the first image is the image currently displayed in the video being played;
  • acquiring a displacement parameter of the pixel point of the first image, where the displacement parameter represents the displacement of the pixel point between the first image and a second image, and the second image is an image displayed after the first image;
  • adjusting the display position of the pixel point in the first image based on the adjustment parameter and the displacement parameter; and
  • displaying the second image based on the adjusted display position of the pixel point.
  • a video processing apparatus comprising:
  • a first obtaining module, configured to obtain an adjustment parameter corresponding to the interactive operation in response to the interactive operation acting on the first image, where the adjustment parameter indicates the adjustment range of the display position of the pixel point in the first image based on the interactive operation, and the first image is the image currently displayed in the video being played;
  • a second acquisition module configured to acquire a displacement parameter of a pixel point of the first image, where the displacement parameter represents the displacement of the pixel point between the first image and the second image, and the second image is an image displayed after said first image;
  • a second display module configured to adjust the display position of the pixel in the first image based on the adjustment parameter and the displacement parameter
  • the second display module is further configured to display the second image based on the adjusted display position of the pixel point.
  • the first acquisition module includes:
  • a force acquisition unit configured to acquire the force of the interaction operation in response to the interaction operation acting on the first image
  • a parameter determination unit configured to determine the adjustment parameter matching the action force based on the action force of the interactive operation.
  • the parameter determining unit is configured to:
  • the adjustment parameter is positively correlated with the reference adjustment parameter, the adjustment parameter is negatively correlated with the reference action force, and the adjustment parameter is positively correlated with the action force of the interactive operation.
  • the second acquisition module is configured to input the first image into an optical flow estimation model to obtain displacement parameters of pixels of the first image
  • the second obtaining module is configured to decode the encoded data of the video to obtain displacement parameters of the pixels of the first image, and the encoded data includes the encoded displacement parameters.
  • the second display module includes:
  • a pixel point offset unit configured to adjust, based on the adjustment parameter and the displacement parameter, the display position of the pixel point in the first image on which the interactive operation acts.
  • the pixel point offset unit is used for:
  • the display position of the pixel on which the interactive operation acts in the first image is adjusted.
  • the second display module is used for:
  • the weight is used to represent the degree of influence of the interactive operation on the display position shift of the pixel point;
  • the adjustment parameter is weighted, and the display position of the pixel point in the first image is adjusted based on the weighted adjustment parameter and the displacement parameter.
  • the apparatus further includes:
  • a first object determination module configured to, in response to an interactive operation acting on the first image, determine a first object in the first image to which the interactive operation acts;
  • an audio determination module configured to obtain the audio data corresponding to the first object from the corresponding relationship between the object and the audio data
  • the audio playing module is used for playing the audio data corresponding to the first object.
  • the first object determination module is configured to:
  • determine, from at least one first pixel area of the first image, a first target area on which the interactive operation acts, and determine an object in the first target area as the first object.
  • the apparatus further includes:
  • a pixel tracking module configured to determine at least one second pixel area of the second image based on the pixel points in the at least one first pixel area and the adjusted display positions of the pixel points, where one second pixel area corresponds to one first pixel area, and the original display position of a pixel in the second pixel area is in the corresponding first pixel area;
  • the first object determination module is further configured to, in response to an interactive operation acting on the second image, determine, from the at least one second pixel area, a second target area on which the interactive operation acts, and determine the object in the second target area as the second object;
  • the audio playing module is further configured to play the audio data corresponding to the second object.
  • the audio playback module is used for:
  • the audio data corresponding to the first object is played.
  • the apparatus further includes:
  • a second object determination module configured to determine the main object in the video
  • an audio extraction module configured to extract the audio data of the main object from the video clip in which the main object exists in the video
  • a relationship generating module is configured to generate a corresponding relationship between the subject object and the audio data of the subject object.
  • In one aspect, a computer device is provided, including a processor and a memory, where the memory stores at least one piece of program code, and the at least one piece of program code is loaded and executed by the processor to implement the video processing method described in any of the above optional implementation manners.
  • In one aspect, a computer-readable storage medium is provided, where at least one piece of program code is stored in the computer-readable storage medium, and the at least one piece of program code is loaded and executed by a processor to implement the video processing method described in any of the above optional implementation manners.
  • In one aspect, a computer program product or computer program is provided, including computer program code stored in a computer-readable storage medium; a processor of a computer device reads the computer program code from the computer-readable storage medium and executes it, so that the computer device executes the video processing method described in any of the foregoing optional implementation manners.
  • the displacement parameter can represent the displacement of the pixel point between the first image and the second image, and the interactive operation can influence that displacement.
  • therefore, the display position of the pixel point of the first image is adjusted by combining the displacement parameter and the adjustment parameter, so that the effect of the interactive operation can be presented in the second image displayed after the first image.
  • in this way, the video presents a dynamic effect that closely matches the interactive operation, enabling interaction between the user and the video being played, enhancing the interactivity of the video, and improving the visual effect during video playback.
  • FIG. 1 is a schematic diagram of an implementation environment provided by an embodiment of the present application.
  • FIG. 2 is a flowchart of a video processing method provided by an embodiment of the present application.
  • FIG. 3 is a flowchart of a video processing method provided by an embodiment of the present application.
  • FIG. 4 is a flowchart of a video interactive playback provided by an embodiment of the present application.
  • FIG. 5 is a block diagram of a video processing apparatus provided by an embodiment of the present application.
  • FIG. 6 is a block diagram of a terminal provided by an embodiment of the present application.
  • FIG. 7 is a block diagram of a server provided by an embodiment of the present application.
  • In the related art, a computer device plays a video on the screen.
  • the computer device does not support the user to interact with the video being played, the video playback performance of the computer device cannot meet interaction requirements, and the experience mode during playback is relatively limited.
  • Optical flow estimation is used to represent the instantaneous displacement of each pixel in an image, and is obtained based on the correlation of pixels between frames in the video. For two frames of images I(t-1) and I(t) that are adjacent in time series, after each pixel point on I(t-1) is shifted, the position of each pixel point is consistent with I(t). On the one hand, the position of an object at the next moment can be known through optical flow estimation, so optical flow can be used to improve the speed and accuracy of target tracking in the video, achieving fast object tracking during video playback. On the other hand, the motion trend of pixels from the current frame to the next frame can be predicted through optical flow estimation.
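  • As an illustration only (not part of the patent text), the instantaneous per-pixel displacement between two adjacent frames can be estimated with a classical dense optical flow algorithm; the sketch below uses OpenCV's Farneback method, and the frame file names are placeholders.

    # Minimal sketch of dense optical flow estimation between two adjacent frames.
    # Assumes OpenCV and NumPy are installed; file names are placeholders.
    import cv2
    import numpy as np

    prev = cv2.imread("frame_t_minus_1.png", cv2.IMREAD_GRAYSCALE)  # I(t-1)
    curr = cv2.imread("frame_t.png", cv2.IMREAD_GRAYSCALE)          # I(t)

    # flow[y, x] = (dx, dy): instantaneous displacement of the pixel at (x, y).
    flow = cv2.calcOpticalFlowFarneback(
        prev, curr, None,
        pyr_scale=0.5, levels=3, winsize=15,
        iterations=3, poly_n=5, poly_sigma=1.2, flags=0)

    # Shifting every pixel of I(t-1) by its flow vector approximates I(t).
    h, w = prev.shape
    ys, xs = np.mgrid[0:h, 0:w]
    nx = np.clip(np.round(xs + flow[..., 0]).astype(int), 0, w - 1)
    ny = np.clip(np.round(ys + flow[..., 1]).astype(int), 0, h - 1)
    warped = np.zeros_like(prev)
    warped[ny, nx] = prev[ys, xs]  # forward warp; unfilled holes remain zero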
  • Semantic segmentation understands images at the pixel level, dividing the pixels in the image into multiple categories. For example, an image includes a motorcycle and a person riding a motorcycle, and through semantic segmentation, the pixels depicting a person riding a motorcycle are divided into the same class, and the pixels depicting a motorcycle are divided into another class.
  • FIG. 1 is a schematic diagram of an implementation environment provided by an embodiment of the present application.
  • the implementation environment includes a terminal 101 , and the video processing method provided in this embodiment of the present application is executed by the terminal 101 .
  • the terminal 101 is a smart phone, a tablet computer, a notebook computer, a desktop computer, a smart TV, a VR (Virtual Reality) device, etc., but is not limited thereto.
  • the terminal 101 is provided with an application program supporting interactive video playback, for example, the application program is a video playback application program, a browser, and the like.
  • the implementation environment includes a terminal 101 and a server 102 , and the video processing method provided in this embodiment of the present application is implemented through interaction between the terminal 101 and the server 102 .
  • the server 102 is an independent physical server; alternatively, the server 102 is a server cluster or a distributed system composed of multiple physical servers; alternatively, the server 102 is a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communications, middleware services, domain name services, security services, CDN (Content Delivery Network), big data, and artificial intelligence platforms.
  • the server 102 and the terminal 101 are directly or indirectly connected through wired or wireless communication, which is not limited in this application.
  • the technical solutions provided in the embodiments of the present application are implemented with the terminal as the execution body; or, the technical solutions provided in the embodiments of the present application are implemented with the server as the execution body; or, the technical solutions provided by the embodiments of the present application are implemented through interaction between the terminal and the server, which is not limited in this application.
  • the execution subject of the technical solution is a terminal as an example for description.
  • FIG. 2 is a flowchart of a video processing method provided by an embodiment of the present application.
  • a terminal is an execution subject as an example for description, and the embodiment includes:
  • In response to the interactive operation acting on the first image, the terminal acquires an adjustment parameter corresponding to the interactive operation, where the adjustment parameter indicates the adjustment range of the display position of the pixel point in the first image based on the interactive operation, and the first image is the image currently displayed in the video being played.
  • a video is composed of multiple frames of static images, and the multiple frames of images are displayed on the terminal in rapid succession according to the first frame rate to achieve a dynamic video effect.
  • the first frame rate is any frame rate.
  • the terminal acquires the adjustment parameter corresponding to the interactive operation, adjusts the display position of the pixels in the first image based on the adjustment parameter, and presents the effect of the interactive operation in the next frame of image, that is, in the second image displayed after the first image.
  • When the terminal is a smart phone, tablet computer, notebook computer, desktop computer, or smart TV, the user triggers an interactive operation on the first image by touching the display screen of the terminal, or by operating on the display screen through a mouse or keyboard.
  • the terminal detects an interactive operation acting on the first image, and acquires adjustment parameters corresponding to the interactive operation.
  • When the terminal is a VR device, the user wears the hand operation sensing device of the VR device and interacts with the video through the hand operation sensing device. During the process of displaying a frame of image in the video, the VR device detects an interactive operation through the hand operation sensing device, and acquires the adjustment parameter corresponding to the interactive operation.
  • the terminal acquires a displacement parameter of a pixel point of the first image, where the displacement parameter represents the displacement of the pixel point between the first image and a second image, and the second image is an image displayed after the first image.
  • the pixels between two adjacent frames of images in the video are correlated, and the displacement of the pixels between frames is represented by the movement of objects in the video picture in visual effect.
  • the movement of the same pixel point from the Nth frame image to the (N+1)th frame image is expressed as optical flow, where N is a positive integer; the Nth frame image can be called the first image, and the (N+1)th frame image can be called the second image.
  • the instantaneous displacement of pixels between two originally adjacent frames in the video is represented by the optical flow estimation parameter, that is, the instantaneous displacement of the pixels between the two originally adjacent frames in the video is represented by the displacement parameter.
  • the terminal adjusts the display position of the pixel in the first image based on the adjustment parameter and the displacement parameter.
  • the displacement parameter represents the original displacement of the pixel point between one frame of image and the next frame, that is, the displacement of the pixel point between the first image and the second image. Offsetting the position of the pixel point based on the displacement parameter reproduces the original displacement change of the pixel point; then, combined with the adjustment parameter, the position of the pixel point is shifted again, so that the displacement change caused by the interactive operation is superimposed on the original displacement change of the pixel point, thereby adjusting the display position of the pixel point.
  • the terminal displays the second image based on the adjusted display positions of the pixel points.
  • the second image displayed based on the adjusted display positions of the pixel points can present the effect of the interactive operation, thereby realizing interactive playback of the video.
  • the displacement parameter can represent the displacement of the pixel point between the first image and the second image, and the interactive operation can influence that displacement.
  • therefore, the display position of the pixel point of the first image is adjusted by combining the displacement parameter and the adjustment parameter, so that the effect of the interactive operation can be presented in the second image displayed after the first image.
  • in this way, the video presents a dynamic effect that closely matches the interactive operation, enabling interaction between the user and the video being played, enhancing the interactivity of the video, and improving the visual effect during video playback.
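  • As a non-authoritative sketch of the per-frame flow just described (detect the interactive operation, obtain the adjustment parameter, obtain the displacement parameter, adjust the acted-on pixel positions, display the second image), the function below wires these stages together; every helper passed in (estimate_flow, get_adjustment, warp, display) and the interaction.mask attribute are hypothetical stand-ins, not APIs defined by the patent.

    # Hedged pipeline sketch; all helpers are hypothetical and supplied by the caller.
    def play_frame(first_image, interaction, estimate_flow, get_adjustment, warp, display):
        flow = estimate_flow(first_image)          # displacement parameter, shape (H, W, 2)
        if interaction is not None:                # an interactive operation acts on the image
            delta_w = get_adjustment(interaction)  # adjustment parameter, a 2D vector
            flow = flow.copy()
            flow[interaction.mask] += delta_w      # adjust only the acted-on pixel points
        second_image = warp(first_image, flow)     # shift pixels to their adjusted positions
        display(second_image)                      # display the second image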
  • FIG. 3 is a flowchart of a video processing method provided by an embodiment of the present application.
  • the interactive playback of a video by a terminal is used as an example for description, that is, the terminal makes feedback on the user's interactive operation by superimposing the effect of the interactive operation in the video, so as to realize the video interactive playback.
  • This embodiment includes:
  • the terminal displays the first image in the played video.
  • Multiple frames of images are displayed in rapid succession to form a video.
  • the process of the terminal performing video playback can be decomposed into the process of the terminal displaying multiple frames of images in sequence.
  • the playback and processing of the video by the terminal is also the display and processing of the images in the video.
  • the terminal supports interactive playback of videos of any video type.
  • the terminal interactively plays the video by default.
  • the terminal interactively plays the video when the interactive play mode is turned on.
  • the terminal provides a start-stop switch of the interactive play mode, and the user can control the on and off of the interactive play mode through the start-stop switch of the interactive play mode.
  • the terminal determines that the interactive playback mode has entered the on state in response to the start-stop switch of the interactive play mode being turned on; the terminal determines that the interactive play mode has entered the off state in response to the start-stop switch of the interactive play mode being turned off.
  • When the terminal is a smart phone, the user can watch the video through a video playing application on the terminal.
  • the terminal runs the video playback application in response to the start operation of the video playback application.
  • the user can select a video to watch through the application interface of the video playback application.
  • the terminal displays the application interface of the video playing application; in response to the click operation on the video in the application interface, the video is played.
  • the video playback interface includes a start/stop switch of the interactive playback mode, and the terminal performs interactive playback of the video in response to the start/stop switch of the interactive playback mode being turned on.
  • When the terminal is a VR device or a smart TV, the user can control the terminal to enter the interactive playback mode through voice commands or gesture operations. When the terminal is a smart TV, the user can also control the terminal to enter the interactive play mode by pressing an interactive button on the remote control of the smart TV. In this embodiment of the present application, the manner in which the terminal enters the interactive play mode is not limited.
  • In some embodiments, the terminal mainly supports interactive playback of videos of a target video type.
  • compared with videos of other video types, users have higher video interaction requirements for videos of the target video type.
  • target video types include nature documentaries, astronomy documentaries, food documentaries, and VR films.
  • the image currently displayed in the video played by the terminal is the first image, and the embodiment of the present application uses the first image as an example to describe the video processing process.
  • In response to the interactive operation acting on the first image, the terminal acquires the adjustment parameter corresponding to the interactive operation.
  • the interactive operation is a touch operation on the display screen of the terminal, and the interactive operation acts on the display screen when the display screen of the terminal displays the first image.
  • the interaction operation is a hand operation captured by a hand operation sensing device of the VR device, and the VR device captures an interaction operation acting on the image through the hand operation sensing device when the VR device displays the image.
  • the above-mentioned adjustment parameters are used to adjust the display positions of the pixels in the first image, so that the effect of the interactive operation is presented in the second image displayed after the first image.
  • the adjustment parameter is a vector with magnitude and direction indicating the amount of displacement by which the pixel's display position is adjusted.
  • the adjustment parameters include an offset distance and an offset direction for adjusting the display position of the pixel point. That is, the adjustment parameter indicates the adjustment range of the display position of the pixel point in the first image based on the interactive operation, and the adjustment range refers to the range of adjusting the display position of the pixel point on the basis of the original displacement of the pixel point.
  • the terminal acquires adjustment parameters matching the action force of the interactive operation, so as to express the action effect of the interactive operation according to the action force.
  • the terminal acquires the adjustment parameter corresponding to the interactive operation through the following steps 3021 to 3022 .
  • the terminal acquires the force of the interactive operation.
  • the lower layer of the display screen of the terminal is provided with a pressure sensor.
  • the terminal recognizes the force of the interaction through the pressure sensor.
  • the terminal determines an adjustment parameter that matches the force of action.
  • the force of action is positively correlated with the adjustment parameter, the greater the force of action, the greater the adjustment parameter.
  • the terminal determines the adjustment parameter corresponding to the current action force according to the corresponding relationship between the maximum force and the maximum adjustment parameter, where the maximum force may be referred to as the reference action force, and the maximum adjustment parameter may be referred to as the reference adjustment parameter.
  • the above step 3022 includes: the terminal determines the adjustment parameter based on the reference action force, the reference adjustment parameter corresponding to the reference action force, and the action force of the interactive operation; wherein the adjustment parameter is positively correlated with the reference adjustment parameter, negatively correlated with the reference action force, and positively correlated with the action force of the interactive operation.
  • the adjustment parameter is a vector with a direction, and the direction of the adjustment parameter is consistent with the direction of the interaction force.
  • the above process is also a process in which the terminal determines the adjustment parameter through the following Formula 1:
  • ΔW_i = ΔW × F_i / F_m    (Formula 1)
  • ΔW_i represents the adjustment parameter corresponding to the force of the interactive operation acting on the i-th frame image; ΔW_i is a vector with a direction, the modulus of ΔW_i is a non-negative number, and i is a positive integer.
  • ΔW represents the reference adjustment parameter corresponding to the reference force; ΔW is a scalar and a non-negative number.
  • F_i represents the force of the interactive operation acting on the i-th frame image; F_i is a vector with a direction, and the modulus of F_i is a non-negative number.
  • F_m represents the reference force; F_m is a scalar and a non-negative number.
  • For example, the reference force is 1 N (newton) and the reference adjustment parameter is 10 mm (millimeters); if the force of the interactive operation is 0.2 N, the modulus of the adjustment parameter matching the force is 2 mm.
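  • As an illustrative sketch (not from the patent text), the force-matched adjustment parameter of Formula 1 can be computed as below; the default reference values mirror the example above, and the function name is an assumption.

    import numpy as np

    def adjustment_parameter(force_vec, reference_force=1.0, reference_adjustment=10.0):
        """Formula 1: delta_W_i = reference_adjustment * force_vec / reference_force.

        force_vec            -- force of the interactive operation, a 2D vector in newtons
        reference_force      -- reference force F_m, a scalar in newtons
        reference_adjustment -- reference adjustment parameter delta_W, a scalar in millimetres
        The result follows the direction of the force, and its modulus grows linearly
        with the force magnitude.
        """
        force_vec = np.asarray(force_vec, dtype=float)
        return reference_adjustment * force_vec / reference_force

    # Example from the text: a 0.2 N force yields an adjustment parameter of modulus 2 mm.
    delta_w = adjustment_parameter([0.2, 0.0])
    print(np.linalg.norm(delta_w))  # -> 2.0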
  • the terminal determines the adjustment parameter corresponding to the action force of the interactive operation based on the reference adjustment parameter corresponding to the unit action force and the action force of the interactive operation.
  • In some embodiments, the step of determining the adjustment parameter matching the action force based on the action force of the interactive operation includes: the terminal obtains the reference adjustment parameter corresponding to the unit action force; determines the ratio of the action force of the interactive operation to the unit action force as a reference quantity; determines the product of the reference quantity and the reference adjustment parameter as the modulus of the adjustment parameter; and determines the direction of the action force of the interactive operation as the direction of the adjustment parameter.
  • For example, the unit action force is 0.1 N, and the reference adjustment parameter corresponding to the unit action force is 1 mm; if the action force of the interactive operation is 0.2 N, the modulus of the adjustment parameter matching the action force is 2 mm.
  • In this way, the action effect presented after the pixel adjustment corresponds to the intensity of the interactive operation, so as to present a more realistic interactive effect, improve the realistic feel of video interaction, enable the video playback performance to meet richer interaction needs, and further expand the experience modes during video playback.
  • the terminal determines the displacement of the hand movement when the user performs the interactive operation as the adjustment parameter.
  • In some embodiments, the step of obtaining the adjustment parameter corresponding to the interactive operation includes: in response to the interactive operation acting on the first image, the terminal obtains the starting position point at which the interactive operation acts on the first image and the termination position point at which the interactive operation acts on the first image, and determines the displacement from the starting position point to the termination position point as the adjustment parameter.
  • In some embodiments, the terminal determines a first duration between the time point at which the first image is displayed and the time point at which the interactive operation is detected. If the sum of the first duration and a target duration is not greater than the display interval of two frames of images, the position point at which the interactive operation acts on the first image when the duration of the interactive operation acting on the first image reaches the target duration is determined as the termination position point; or, if the sum of the first duration and the target duration is greater than the display interval of two frames of images, the position point at which the interactive operation acts on the first image when the first image is last displayed is determined as the termination position point. The adjustment parameter is then determined, and the display position of the pixel point is adjusted according to the adjustment parameter.
  • the target duration represents the effective duration of the interactive operation acting on the first image when the user performs the interactive operation. For example, with a target duration of 0.02 seconds: if the terminal triggers the interactive operation when the display duration of the first image reaches 0.01 seconds, the position point at which the interactive operation acts on the first image when the duration of the operation acting on the first image reaches 0.02 seconds is determined as the termination position point; or, if the interactive operation is triggered when the display duration of the first image reaches 0.02 seconds, the position point at which the interactive operation acts on the first image when the first image is last displayed is determined as the termination position point.
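  • A hedged sketch of deriving the adjustment parameter from the hand movement of a drag interaction, including the choice of termination moment described above; the function names, units, and example coordinates are assumptions.

    import numpy as np

    def termination_time(first_duration, target_duration, frame_interval):
        """Choose the moment at which the termination position point is sampled."""
        if first_duration + target_duration <= frame_interval:
            # The operation can act for the full target duration within this frame.
            return first_duration + target_duration
        # Otherwise sample at the last moment the first image is displayed.
        return frame_interval

    def adjustment_from_drag(start_point, end_point):
        """The adjustment parameter is the displacement from start to termination point."""
        return np.asarray(end_point, dtype=float) - np.asarray(start_point, dtype=float)

    # Example: an interaction starting at (120, 80) and ending at (140, 95) pixels.
    print(adjustment_from_drag((120, 80), (140, 95)))  # -> [20. 15.]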
  • When the terminal is a smart phone, a tablet computer, a notebook computer, a desktop computer, or a smart TV, the display screen of the terminal can detect the position point on which the interactive operation acts.
  • the display screen of the terminal is a resistive touch screen, a capacitive touch screen, an infrared touch screen or a surface acoustic wave touch screen, etc.
  • the types of display screens of the terminal are different, and the principles of detecting the position point where the interactive operation acts are different. In this embodiment of the present application, the principle of detecting the position point on which the interactive operation acts on the display screen of the terminal is not limited.
  • the terminal acquires the displacement parameter of the pixel point of the first image.
  • the displacement parameter may also be referred to as an optical flow estimation parameter, and the displacement parameter represents the displacement of the pixels of the first image between the first image and the second image, and the second image is an image displayed after the first image .
  • the terminal predicts the displacement parameter of the pixel point of the first image by using an optical flow estimation model.
  • the above step 303 includes: the terminal inputs the first image into the optical flow estimation model, and obtains displacement parameters of the pixels of the first image.
  • the optical flow estimation model is used to predict the displacement of the pixels of the current frame image to the movement of the next frame image.
  • In some embodiments, the optical flow estimation model is a prediction model obtained by training FlowNet (an optical flow neural network).
  • in the training process, optical flow estimation is performed on multiple pairs of training images through the optical flow neural network; based on the displacement parameters output by the optical flow neural network and the real displacement parameters, the network parameters of the optical flow neural network are updated, so that the displacement parameters output by the optical flow neural network are as close as possible to the real optical flow estimation parameters.
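  • A minimal sketch of that supervised training loop, assuming a PyTorch-style FlowNet-like model and a dataset of frame pairs with ground-truth displacement parameters; the names model and flow_dataset are placeholders, and the loss choice is an assumption rather than the patent's specification.

    import torch
    import torch.nn as nn

    def train_flow_network(model, flow_dataset, epochs=10, lr=1e-4):
        optimizer = torch.optim.Adam(model.parameters(), lr=lr)
        criterion = nn.MSELoss()  # push predicted flow toward the real displacement parameters
        for _ in range(epochs):
            for frame_pair, true_flow in flow_dataset:   # frame_pair: (B, 6, H, W) stacked frames
                pred_flow = model(frame_pair)            # predicted displacement, (B, 2, H, W)
                loss = criterion(pred_flow, true_flow)
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
        return model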
  • the above technical solution uses an optical flow estimation model to predict the displacement parameters of the pixels of a frame of image, and the optical flow estimation model can be used to predict the displacement parameters of images in videos of any format, thereby supporting interactive playback of any video and expanding the application scope of interactive video playback.
  • In some embodiments, the encoded data of the video includes the displacement parameters of the pixels of the images in the video, that is, the encoded data includes the encoded displacement parameters, and the terminal can decode the encoded data of the video to obtain the displacement parameters of the pixels of the first image.
  • the displacement parameter is determined in advance during the encoding process of the video and encoded into the encoded data of the video, where the displacement parameter is determined in advance by the computer device used for video encoding according to the displacement change of the pixels between two adjacent frames of images.
  • In this way, the displacement parameters of the pixels can be directly decoded from the encoded data of the video, and video processing is then performed based on the directly decoded displacement parameters, which can improve the efficiency of video processing.
  • In some embodiments, the displacement parameters of the pixels in the image can also be calculated by other optical flow estimation algorithms.
  • for example, Lucas–Kanade is a two-frame differential optical flow estimation algorithm, and Horn–Schunck is an optical flow estimation algorithm for estimating the dense optical flow field of an image.
  • the acquisition method of the displacement parameter is not limited.
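  • For illustration only, the Lucas–Kanade alternative mentioned above can be run with OpenCV to obtain sparse per-point displacements; the file names are placeholders.

    import cv2
    import numpy as np

    prev = cv2.imread("frame_n.png", cv2.IMREAD_GRAYSCALE)
    curr = cv2.imread("frame_n_plus_1.png", cv2.IMREAD_GRAYSCALE)

    # Track a sparse set of corner points from one frame to the next.
    pts = cv2.goodFeaturesToTrack(prev, maxCorners=200, qualityLevel=0.01, minDistance=7)
    next_pts, status, _err = cv2.calcOpticalFlowPyrLK(prev, curr, pts, None)

    # Displacement of each successfully tracked point between the two frames.
    displacements = (next_pts - pts)[status.flatten() == 1]
    print(displacements.shape)  # (num_tracked_points, 1, 2)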
  • In the embodiment of the present application, the case in which the adjustment parameter is obtained first and the displacement parameter is obtained afterwards is used as an example for description; the step in which the terminal obtains the adjustment parameter and the step in which the terminal obtains the displacement parameter can also follow other timings.
  • the terminal obtains the adjustment parameter and the displacement parameter at the same time; or, the terminal first obtains the displacement parameter, and then obtains the adjustment parameter, which is not limited in this embodiment of the present application.
  • the terminal adjusts the display position of the pixel in the first image based on the adjustment parameter and the displacement parameter, and displays the second image based on the adjusted display position of the pixel.
  • the terminal superimposes the effect of the interactive operation on the operation area on which the interactive operation acts.
  • for the pixel point on which the interactive operation acts, the terminal shifts the pixel point from the original display position to the target display position based on the adjustment parameter and the displacement parameter; for the pixel point on which the interactive operation does not act, the terminal shifts the pixel point from the original display position to the target display position based on the displacement parameter, so as to display the target image.
  • the target image is an image displayed after the first image, and the target image may be referred to as a second image.
  • the terminal adjusts the display positions of the pixels affected by the interactive operation based on the adjustment parameter and the displacement parameter; the terminal adjusts the display positions of the pixels not affected by the interactive operation based on the displacement parameter, and then displays the second image based on the adjusted display positions of the pixels.
  • the terminal shifts the pixel point from the original display position to the target display position by adjusting the display position of the pixel point, where the original display position is the display position of the pixel point in the first image, and the target display position is the display position of the adjusted pixel point in the second image.
  • for example, when the interactive operation acts on an animal in the first image, the pixels acted on by the interactive operation are offset, so that the second image shows the deformation of the animal's fur, presenting the effect of the fur being flicked.
  • when the action direction of the interactive operation is the same as the flow direction of a river, the offset of the pixel points caused by the interactive operation can present the effect of accelerating the flow of water in the second image.
  • when the interactive operation acts on falling snow, the pixel points acted on by the interactive operation are shifted, so that the effect of changing snow can be presented in the second image.
  • In the above technical solution, the pixel points are shifted and the effect of the interactive operation is superimposed on the operation area on which the interactive operation acts, thereby highlighting the effect of the interactive operation in the second image; that is, feedback is given to the user's interactive operation through deformation on the video screen, which enriches the interactive effect of the video, realizes interactive playback of the video, and expands the experience modes during video playback.
  • In addition, the pixel points in the image are offset based on the displacement parameters, which makes full use of the prior knowledge of video playback and avoids complex video understanding and calculation; the calculation amount of video processing is small and easy to deploy, which can improve the efficiency of video processing and expand the application scope of interactive video playback.
  • In some embodiments, the above-mentioned step of shifting the pixel point from the original display position to the target display position based on the adjustment parameter and the displacement parameter includes: in response to the pixel point being a pixel point on which the interactive operation acts, the terminal determines a target offset parameter based on the adjustment parameter and the displacement parameter; the terminal offsets the pixel point from the original display position to the target display position based on the offset distance and offset direction indicated by the target offset parameter, that is, the terminal adjusts the display position of the pixel point on which the interactive operation acts in the first image based on the offset distance and offset direction indicated by the target offset parameter.
  • the terminal adds the adjustment parameter and the displacement parameter to obtain the target offset parameter based on a vector summation method such as the triangle rule, the parallelogram rule, or the coordinate system solution method.
  • the above technical solution first determines the target offset parameter based on the adjustment parameter and the displacement parameter, so that the pixel point can be shifted from the original display position to the target display position at one time based on the target offset parameter, which improves the offset efficiency of the pixel point. Further, the efficiency of video processing can be improved.
  • the terminal can also first shift the pixel point from the original display position to the middle display position based on the displacement parameter; and then shift the pixel point from the middle display position to the target display position based on the adjustment parameter, that is, The terminal first adjusts the display position of the pixel point in the first image based on the displacement parameter, and then adjusts the display position of the pixel point affected by the interactive operation again based on the adjustment parameter on the basis of the adjustment.
  • the process of shifting the pixel point from the original display position to the target display position is not limited.
  • the effect of the superimposed interactive operation is an auxiliary function during video playback, and its purpose is to enrich the user's video viewing experience.
  • while superimposing the effect of the interactive operation, the original motion trend of the objects in the video should be kept.
  • for example, if the movement trend of an animal in the video is to walk forward, the interactive operation on the animal's fur should not affect the animal's movement trend of walking forward.
  • the effect of the interactive operation does not affect the original motion trend of the object in the video.
  • In some embodiments, when the terminal shifts the pixel point from the original display position to the target display position based on the adjustment parameter and the displacement parameter so as to display the second image, the terminal obtains a weight, where the weight is used to represent the degree of influence of the interactive operation on the display position offset of the pixel point; the terminal weights the adjustment parameter based on the weight, and offsets the pixel point from the original display position to the target display position based on the weighted adjustment parameter and the displacement parameter to display the second image, that is, the adjustment parameter is weighted based on the weight, and the positions of the pixels in the first image are adjusted based on the weighted adjustment parameter and the displacement parameter.
  • the weight can also be called the influence weight.
  • the above process in which the terminal adjusts the pixels in the first image based on the weight, the adjustment parameter, and the displacement parameter to display the second image is implemented based on the following Formula 2:
  • Image_{i+1} = Shift(Image_i, Flow_i + α × ΔW_i)    (Formula 2)
  • Shift(·, ·) denotes shifting the pixel points of the image by the given offset to obtain the next frame image.
  • Image_{i+1} represents the (i+1)-th frame image corresponding to the i-th frame image, and i is a positive integer.
  • Image_i represents the i-th frame image, which is the image acted on by the interactive operation.
  • Flow_i represents the displacement parameter of the i-th frame image; Flow_i is a vector with a direction, and the modulus of Flow_i is a non-negative number.
  • α represents the weight, and α is any value greater than 0 and less than or equal to 1.
  • ΔW_i represents the adjustment parameter corresponding to the force of the interactive operation acting on the i-th frame image; ΔW_i is a vector with a direction, and the modulus of ΔW_i is a non-negative number.
  • the above Formula 2 represents: for the pixels in the i-th frame image affected by the interactive operation, the adjustment parameter corresponding to the interactive operation is weighted based on the weight; the weighted adjustment parameter and the displacement parameter are summed, and based on the summation result, the pixel points are shifted from the original display position to the target display position to display the second image.
  • In the above technical solution, by setting the weight, the superposition of the effect of the interactive operation does not affect the original motion trend of the objects in the video, and the video can still be played normally according to the original progress; on the basis of ensuring the user's video viewing experience, the interactive effect is further enriched.
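  • A minimal numerical sketch of Formula 2 under the assumptions above; the interaction mask and the function name are hypothetical, and the sketch simply forward-warps pixels by Flow_i plus the weighted adjustment for acted-on pixels and by Flow_i alone elsewhere.

    import numpy as np

    def apply_formula_2(image, flow, interaction_mask, delta_w, alpha=0.5):
        """image: (H, W) or (H, W, C); flow: (H, W, 2); interaction_mask: (H, W) bool; delta_w: (2,)."""
        offset = flow.copy()
        offset[interaction_mask] += alpha * np.asarray(delta_w, dtype=float)

        h, w = image.shape[:2]
        ys, xs = np.mgrid[0:h, 0:w]
        nx = np.clip(np.round(xs + offset[..., 0]).astype(int), 0, w - 1)
        ny = np.clip(np.round(ys + offset[..., 1]).astype(int), 0, h - 1)

        next_image = np.zeros_like(image)
        next_image[ny, nx] = image[ys, xs]  # shift pixels to their adjusted display positions
        return next_image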
  • If the terminal does not detect an interactive operation acting on the first image, it does not acquire the adjustment parameter corresponding to the interactive operation, and directly displays the second image.
  • In the technical solution provided by the embodiments of this application, the displacement parameter can represent the displacement of the pixel-point change between the first image and the second image, and when an interactive operation acts on the first image of the video, the interactive operation can affect that displacement. Therefore, the display positions of the pixels of the first image are adjusted in combination with the displacement parameter and the adjustment parameter, so that the effect of the interactive operation can be presented on the second image displayed after the first image.
  • In this way, the video presents a dynamic effect that matches the interactive operation more closely, interaction between the user and the video being played is supported, the interactivity of the video is enhanced, and the visual effect during video playback is improved.
  • Another point to note is that, in addition to superimposing the effect of the interactive operation on the subsequently displayed image to visually enhance the interactive experience of the video, the terminal also plays, through the following steps 305 to 307, the audio data of the object affected by the interactive operation, giving corresponding sound feedback to further enrich the interactive effect of the video.
  • the terminal determines, in response to the interactive operation acting on the first image, a first object in the first image to which the interactive operation acts.
  • At least one object is present in the first image.
  • For example, if the first image is an image from a nature documentary, there are objects such as animals, trees, rivers, and grass in the first image. Each object in the first image occupies an area of the first image for presentation.
  • the terminal determines the first object on which the interactive operation acts based on semantic segmentation.
  • Correspondingly, the above step 305 includes: in response to the interactive operation acting on the first image, the terminal performs semantic segmentation on the first image to obtain at least one first pixel area; that is, in response to the interactive operation acting on the first image, the terminal determines at least one first pixel area of the first image, each first pixel area containing one object. The terminal determines, from the at least one first pixel area, the first target area on which the interactive operation acts, and determines the object in the first target area as the first object.
  • the semantic segmentation of the first image refers to identifying objects in the first image, and dividing the first image into at least one first pixel area according to the identified objects, so that each first pixel area contains one object .
  • each first pixel area is used to represent an object in the first image.
  • For example, if the first image includes a lion, grass, and a river, semantic segmentation of the first image yields a first pixel area representing the lion, a first pixel area representing the grass, and a first pixel area representing the river. If the interactive operation acts on the first pixel area representing the lion, the first object on which the interactive operation acts is the lion.
  • In the above technical solution, the image is divided through semantic segmentation into a plurality of regions representing different objects, each region representing one object in the first image, and the object in the region on which the interactive operation acts is determined as the first object affected by the interactive operation. Since semantic segmentation divides regions at the pixel level, the borders of the divided regions are finer, so the object affected by the interactive operation can be determined more accurately, and the played audio data can be matched with the object affected by the interactive operation, so that the playback of audio data better conforms to the real scene and further enhances the interactive experience of the video.
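  • A minimal sketch of this lookup, assuming the segmentation step has already produced a per-pixel label map and a mapping from label ids to object names (both names introduced here only for illustration):

```python
def find_touched_object(label_map, label_names, touch_x, touch_y):
    """Return the object whose first pixel area contains the touch point.

    label_map:   H x W array of per-pixel category labels from semantic segmentation
    label_names: mapping from label id to object name, e.g. {1: "lion", 2: "grass"}
    touch_x, touch_y: coordinates of the interactive operation on the first image
    """
    label = int(label_map[touch_y, touch_x])  # first target area hit by the touch
    return label_names.get(label, "background")
```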
  • the terminal can perform semantic segmentation on the first image through the image segmentation model to obtain at least one first pixel area.
  • the network structure of the image segmentation model is based on CNN (Convolutional Neural Networks).
  • the image segmentation model is an encoder-decoder architecture. The encoder of the image segmentation model captures the local features in the first image through convolutional layers, and nests multiple modules for capturing the local features of the first image in a hierarchical manner, thereby extracting the complex features of the first image.
  • That is, the encoder encodes the first image into a compact representation and obtains a feature map whose size is smaller than that of the first image and which can represent the category label to which each pixel belongs. The feature map is then input to the decoder of the image segmentation model, and upsampling is performed through transposed convolutions in the decoder, so that the feature map is expanded to the same size as the first image and an array representing the category label of each pixel in the first image is generated; a first pixel area is composed of a plurality of pixels with the same category label.
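  • The encoder-decoder structure described above can be sketched as a toy network; the layer counts, channel widths, and the use of PyTorch are assumptions for illustration rather than the model actually used in the embodiment.

```python
import torch.nn as nn

class TinySegNet(nn.Module):
    """Toy encoder-decoder: convolutions compress the image into a smaller feature
    map, transposed convolutions upsample it back to the input size, and the
    per-pixel argmax gives the category-label array described above.
    (Assumes the input height and width are multiples of 4.)"""

    def __init__(self, num_classes):
        super().__init__()
        self.encoder = nn.Sequential(                      # captures local features
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(                      # upsampling via transposed conv
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, num_classes, 4, stride=2, padding=1),
        )

    def forward(self, x):                                  # x: N x 3 x H x W
        logits = self.decoder(self.encoder(x))             # N x num_classes x H x W
        return logits.argmax(dim=1)                        # N x H x W label map
```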
  • Another point to be noted is that, due to the correlation between adjacent multi-frame images in the video, the objects included in the multi-frame images are the same, and the positions of the same object in the multi-frame images are different. Therefore, after semantic segmentation of one frame of image, the pixels in the same pixel region can be tracked based on optical flow estimation, so that pixel regions representing different objects can be determined in the next frame of image through pixel tracking.
  • Therefore, after the terminal, in response to the interactive operation acting on the first image, performs semantic segmentation on the first image to obtain the at least one first pixel area, when an interactive operation acts on the second image, the terminal determines the object on which that interactive operation acts in the second image and plays the audio data corresponding to that object through the following steps: the terminal determines a second pixel area of the second image based on the target display positions, in the second image, of the pixels whose original display positions are in a first pixel area.
  • That is, the terminal determines at least one second pixel area of the second image based on the pixels in the at least one first pixel area and the adjusted display positions of those pixels, where one second pixel area corresponds to one first pixel area, and the original display positions of the pixels in a second pixel area are in the corresponding first pixel area. In response to the interactive operation acting on the second image, the terminal determines, from the at least one second pixel area, the second target area on which the interactive operation acts, determines the object in the second target area as the second object, and plays the audio data corresponding to the second object.
  • In the above technical solution, after the multiple pixel regions representing different objects are determined for one frame through semantic segmentation, the pixels can be tracked based on optical flow estimation to obtain the corresponding pixel regions in one or more frames following that frame.
  • In this way, the multiple pixel regions in an image can be obtained without performing semantic segmentation on every frame, which saves the time consumed by repeated semantic segmentation and improves the efficiency of determining the object affected by the interactive operation.
  • the efficiency of sound feedback can be improved, and the video interaction experience can be further improved.
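  • A rough sketch of this mask propagation, under the same assumptions as the earlier examples (a per-pixel label map for frame i and a dense per-pixel offset that already includes any weighted interaction adjustment):

```python
import numpy as np

def propagate_labels(label_map_i, total_offset):
    """Track every labeled pixel to its adjusted display position in frame i+1,
    yielding the second pixel areas without re-running semantic segmentation.

    label_map_i:  H x W per-pixel labels of frame i (the first pixel areas)
    total_offset: H x W x 2 displacement of each pixel (flow plus any weighted
                  interaction adjustment), i.e. where each pixel moves to
    """
    h, w = label_map_i.shape
    ys, xs = np.mgrid[0:h, 0:w]
    tx = np.clip(np.round(xs + total_offset[..., 0]).astype(int), 0, w - 1)
    ty = np.clip(np.round(ys + total_offset[..., 1]).astype(int), 0, h - 1)

    label_map_next = np.zeros_like(label_map_i)   # 0 = background / unlabeled
    label_map_next[ty, tx] = label_map_i[ys, xs]  # second pixel areas of frame i+1
    return label_map_next
```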
  • the terminal can also determine the object affected by the interactive operation by methods such as target detection, classification and positioning, or instance segmentation.
  • In this embodiment of the present application, the process of determining the object affected by the interactive operation is not limited.
  • the terminal determines the audio data corresponding to the first object from the correspondence between the object and the audio data.
  • In an optional implementation, the encoded data of the video includes the correspondence between objects and audio data. The terminal can decode the encoded data of the video to obtain the correspondence between objects and audio data, and determine the audio data of the first object from that correspondence.
  • In another optional implementation, the server stores the correspondence between objects and audio data. The terminal can send an audio data acquisition request to the server, where the audio data acquisition request is used to request the audio data corresponding to the first object. The server receives the audio data acquisition request from the terminal, determines the audio data corresponding to the first object from the stored correspondence between objects and audio data, and returns the audio data to the terminal; the terminal receives the audio data returned by the server.
  • the server stores the correspondence between the object and the audio data in the audio database.
  • It should be noted that, before the computer device used for video encoding encodes the correspondence between objects and audio data into the encoded data, or before the server determines the audio data corresponding to the first object from the stored correspondence between objects and audio data, the correspondence between objects and audio data is first generated. In this embodiment of the present application, the server generating the correspondence between objects and audio data is taken as an example for description.
  • the process of generating the correspondence between the object and the audio data by the computer device for video encoding is the same as the process of the server generating the correspondence between the object and the audio data.
  • the step that the server generates the corresponding relationship between the object and the audio data includes the following steps 1 to 3:
  • Step 1 The server determines the main object in the video.
  • the main object is the object highlighted in the video.
  • For example, in nature documentaries, the main objects are forests, animals, rivers, and the like; in astronomy documentaries, the main objects are stars, gases, and other bodies in the universe; in food documentaries, the main objects are various ingredients.
  • Optionally, the server performs semantic segmentation on the images in the video and determines the objects in the images; divides the video into multiple video clips; determines the frequency of occurrence of each object in a video clip; determines the ratio of each object's frequency of occurrence to the sum of the frequencies of occurrence of all objects in that video clip as the appearance proportion of each object; and determines an object whose appearance proportion is greater than a reference threshold as a main object.
  • the server divides the video into multiple video segments according to a fixed duration, for example, the total duration of the video is 1 hour, and the server intercepts one video segment every 5 minutes.
  • the reference threshold is a preset threshold greater than 0 and less than 1, for example, the reference threshold is 0.8, 0.9, and so on.
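  • Expressed as a sketch, and assuming clip division and per-frame object recognition have already been done, the appearance-proportion rule above might be implemented like this (the 0.8 threshold follows the example in the text):

```python
from collections import Counter

def main_objects_of_clip(objects_per_frame, reference_threshold=0.8):
    """Determine the main objects of one video clip.

    objects_per_frame: list of sets, the objects recognized (e.g. via semantic
                       segmentation) in each frame of the clip
    Returns the objects whose appearance proportion, i.e. their own frequency
    divided by the sum of all objects' frequencies in the clip, exceeds the
    reference threshold.
    """
    counts = Counter()
    for frame_objects in objects_per_frame:
        counts.update(frame_objects)
    total = sum(counts.values())
    if total == 0:
        return set()
    return {obj for obj, c in counts.items() if c / total > reference_threshold}
```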
  • Step 2 The server obtains the audio data of the subject object.
  • the server extracts the audio data of the main object from the video segment in which the main object exists in the video. For example, for an animal documentary, if a lion is included in the animal documentary, the server extracts the audio data of the lion from the video clip in which the lion appears.
  • It should be noted that when the main object is a forest or an animal, in the process of extracting the audio data of the main object, the video clips heavily affected by human voices need to be filtered out first, and video clips in which the audio data of the main object is relatively pure are selected for audio extraction. For example, the audio of a nature documentary usually includes narration; the video clips with narration are those heavily affected by human voices, while the video clips without narration are those in which the audio data of the main object is relatively pure. If no video clip with relatively pure audio data exists among the video clips, the server can perform voice noise-reduction filtering on the video clips containing human voices and extract the audio data of the main object.
  • In another optional implementation, the server obtains the audio data of the main object from other audio data sources that include the main object. For example, for natural landscape documentaries or astronomy documentaries, the main object is a mountain, a starry sky, or the like; such a main object is a static target, and the video in which it appears contains little audio data of the main object, so the audio data needs to be supplemented from other audio data sources.
  • When the main object is a rocky mountain, audio data of touching stone is obtained from other audio data sources; when the main object is a starry sky, audio data of wind chimes is obtained from an audio data source. As another example, for animal videos, the rustling sound of animal fur is obtained from other audio data sources.
  • It should be noted that, optionally, before obtaining the audio data of the main objects, the server classifies, according to video type, the multiple videos to which the video interactive playback function is to be added, for example, into natural landscape videos from which the audio data of the main object is difficult to extract, and animal videos in which the audio data of the main object is relatively rich. For natural landscape videos, audio data is extracted from other audio data sources; for animal videos, the audio data of the main object is extracted from the video clips in the video in which the main object appears.
  • Step 3 The server generates a correspondence between the main object and the audio data of the main object.
  • After acquiring the audio data of the main object, the server generates the correspondence between the main object and the audio data of the main object.
  • Subsequently, in the process of playing the video, the terminal can obtain the corresponding audio data from the server for playback, which enriches the audio-visual experience of the interactive video playback process.
  • the server stores the correspondence between the subject object and the audio data of the subject object in the audio database.
  • The terminal plays the audio data corresponding to the first object.
  • Optionally, the terminal plays the audio data corresponding to the first object while playing the original audio data of the video.
  • the volume of the audio data corresponding to the first object played by the terminal is greater than the volume of the original audio data of the played video, so as to highlight the sound feedback effect generated by the interactive operation.
  • In the technical solution provided by the embodiments of this application, on the one hand, the effect of the interactive operation is presented on the second image to visually express feedback on the interactive operation; on the other hand, the audio data of the object on which the interactive operation acts is played to express sound feedback on the interactive operation. In this way, feedback on the user's interactive operation is given during video playback from both the visual and the auditory aspects, which enables interactive playback of the video and improves the audio-visual effect in the process of interactive video playback.
  • the terminal also implements sound feedback of different volumes in combination with the force of the interactive operation.
  • the above step 307 includes: the terminal acquires the playback volume corresponding to the force of the interaction operation; and the terminal plays the audio data corresponding to the first object based on the playback volume.
  • the force of action is positively correlated with the playback volume, and the greater the force of action, the greater the playback volume.
  • the terminal determines the playback volume corresponding to the action force based on the volume conversion parameter and the action force.
  • For example, if the action force is 0.1 N and the volume conversion parameter is 400, the playback volume is 40.
  • the terminal stores a corresponding relationship between the action force and the playback volume, and determines the playback volume based on the corresponding relationship.
  • the terminal requests the server to return the playback volume corresponding to the force by sending a volume conversion request to the server.
  • In this embodiment of the present application, the process by which the terminal obtains the playback volume corresponding to the force of the interactive operation is not limited.
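  • A sketch of the conversion-parameter variant described above; the clamping to a maximum volume is an assumption added for the example, since the text only gives 0.1 N × 400 = 40.

```python
def playback_volume(force_newtons, volume_conversion_param=400, max_volume=100):
    """Map the touch force to a playback volume: the larger the force, the larger
    the volume. Example from the text: 0.1 N with a conversion parameter of 400
    gives a playback volume of 40."""
    return min(max_volume, force_newtons * volume_conversion_param)
```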
  • the above technical solution can realize sound feedback of different volumes according to the force of the interactive operation, thereby further improving the audio-visual effect of interactive video playback and enriching the interactive experience of the video.
  • the terminal performs steps 302 to 307 in sequence as an example for description. In some embodiments, the terminal can also perform steps 302 to 307 according to other sequences. Optionally, the terminal performs steps 302 to 304 and steps 305 to 307 simultaneously; or, the terminal performs steps 305 to 307 first, and then performs steps 302 to 304, which is not limited in this embodiment of the present application. Optionally, while displaying the second image, the terminal plays the audio data corresponding to the first object, so that the visual effect and the sound effect generated by the interactive operation are generated synchronously, so as to enhance the user's physical sensation and further improve the audio-visual effect.
  • To make the process of interactive video playback clearer, the following description is given with reference to FIG. 4. Referring to FIG. 4, before the interactive video playback process starts, the method further includes step 401: performing the extraction of the main objects in the video and the establishment of the audio database. Step 401 may be implemented through steps 1 to 3 of generating the correspondence between objects and audio data in step 306.
  • The process of interactive video playback includes: 402, video playback: the viewer turns on the interactive playback mode, and the terminal displays the first image in the video through the above step 301; 403, the viewer touches and interacts; 404, interaction algorithm based on optical flow estimation: through the above steps 302 to 304, the terminal displays the second image based on optical flow estimation to present the effect of the touch interaction; 405, sound feedback: through the above steps 305 to 307, the terminal plays the audio data of the object affected by the viewer's interactive operation, so as to realize sound feedback; 406, final interactive playback: while displaying the second image, the terminal plays the audio data of the object affected by the interactive operation, so as to realize the final interactive playback.
  • Another point to note is that the terminal may only present the visual effect caused by the interactive operation through the above steps 302 to 304, without performing steps 305 to 307 to add the sound effect caused by the interactive operation.
  • Another point to note is that the above embodiment is described by taking the terminal performing interactive playback of the video as an example. Optionally, interactive playback of the video is realized through interaction between the terminal and the server. For example, the terminal displays the first image in the video; in response to the interactive operation acting on the first image, the terminal sends a video processing request to the server to request the server to determine the second image; the terminal receives the second image returned by the server and displays the second image.
  • the process for the server to determine the second image is the same as the process for the terminal to determine the second image.
  • the video processing request is further used to request the server to determine audio data corresponding to the interactive operation, and the terminal receives the audio data returned by the server and plays the audio data.
  • the process for the server to determine the audio data corresponding to the interactive operation is the same as the process for the terminal to determine the audio data corresponding to the interactive operation.
  • FIG. 5 is a block diagram of a video processing apparatus provided by an embodiment of the present application. Referring to Figure 5, the device includes:
  • the first obtaining module 501 is configured to obtain, in response to an interactive operation acting on the first image, an adjustment parameter corresponding to the interactive operation, where the adjustment parameter indicates the adjustment range of the display positions of the pixels in the first image based on the interactive operation, and the first image is the image currently displayed in the video being played;
  • the second obtaining module 502 is used for obtaining the displacement parameter of the pixel point of the first image, the displacement parameter represents the displacement of the pixel point between the first image and the second image, and the second image is an image displayed after the first image;
  • the second display module 503 is configured to adjust the display position of the pixel point in the first image based on the adjustment parameter and the displacement parameter;
  • the second display module 503 is further configured to display the second image based on the display position adjusted by the pixel points.
  • In the technical solution provided by the embodiments of this application, the displacement parameter can represent the displacement of the pixel-point change between the first image and the second image, and when an interactive operation acts on the first image of the video, the interactive operation can affect that displacement. Therefore, the display positions of the pixels of the first image are adjusted in combination with the displacement parameter and the adjustment parameter, so that the effect of the interactive operation can be presented on the second image displayed after the first image.
  • In this way, the video presents a dynamic effect that matches the interactive operation more closely, interaction between the user and the video being played is supported, the interactivity of the video is enhanced, and the visual effect during video playback is improved.
  • the first obtaining module 501 includes:
  • a force acquisition unit configured to acquire the force of the interaction operation in response to the interaction operation acting on the first image
  • the parameter determination unit is used for determining an adjustment parameter matching the action force based on the action force of the interactive operation.
  • In another optional implementation, the parameter determination unit is used to: determine the adjustment parameter based on a reference action force, a reference adjustment parameter corresponding to the reference action force, and the action force of the interactive operation; where the adjustment parameter is positively correlated with the reference adjustment parameter, negatively correlated with the reference action force, and positively correlated with the action force of the interactive operation.
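  • As an illustrative sketch of this relationship (not a definitive implementation), the adjustment vector can be computed from the touch-force vector as follows; the default reference values follow the 1 N / 10 mm example used elsewhere in the description.

```python
import numpy as np

def adjustment_parameter(force_vec, ref_force=1.0, ref_adjustment=10.0):
    """Compute the adjustment vector from the touch-force vector.

    The adjustment is proportional to the reference adjustment parameter and to
    the force of the interactive operation, and inversely proportional to the
    reference force, with its direction following the force; e.g. with a 1 N
    reference force and a 10 mm reference adjustment, a 0.2 N touch gives 2 mm.
    """
    return ref_adjustment * np.asarray(force_vec, dtype=float) / ref_force
```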
  • the second obtaining module 502 is configured to input the first image into the optical flow estimation model to obtain displacement parameters of the pixels of the first image;
  • the second obtaining module 502 is configured to decode the encoded data of the video to obtain displacement parameters of the pixels of the first image, and the encoded data includes the encoded displacement parameters.
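  • Both acquisition paths can be sketched as follows; `flow_model` and the layout of the decoded flow data are hypothetical placeholders, since the embodiment does not fix a concrete model or container format.

```python
def get_displacement_parameter(first_image, flow_model=None, decoded_flow=None):
    """Return an H x W x 2 displacement field for the pixels of the first image.

    Either run an optical-flow estimation model on the first image to predict the
    motion toward the next frame, or reuse a displacement field that was encoded
    into the video's data and decoded earlier.
    """
    if decoded_flow is not None:      # displacement parameters pre-encoded in the video
        return decoded_flow
    if flow_model is not None:        # hypothetical model predicting per-pixel motion
        return flow_model(first_image)
    raise ValueError("either a flow model or decoded flow data is required")
```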
  • the second display module 503 includes:
  • the pixel point offset unit is used to adjust the display position of the pixel point in the first image on which the interactive operation acts based on the adjustment parameter and the displacement parameter.
  • In another optional implementation, the pixel point offset unit is used to: determine a target offset parameter based on the adjustment parameter and the displacement parameter; and adjust, based on the offset distance and offset direction indicated by the target offset parameter, the display positions of the pixels on which the interactive operation acts in the first image.
  • In another optional implementation, the second display module 503 is used to: obtain the weight corresponding to the interactive operation, where the weight is used to represent the degree of influence of the interactive operation on the offset of the display positions of the pixels; weight the adjustment parameter based on the weight; and adjust the display positions of the pixels in the first image based on the weighted adjustment parameter and the displacement parameter.
  • the device further includes:
  • a first object determination module configured to determine, in response to an interactive operation acting on the first image, a first object on which the interactive operation acts in the first image
  • an audio determination module configured to obtain the audio data corresponding to the first object from the corresponding relationship between the object and the audio data
  • the audio playing module is used for playing the audio data corresponding to the first object.
  • In another optional implementation, the first object determination module is used to: determine at least one first pixel area of the first image in response to the interactive operation acting on the first image, each first pixel area containing one object; determine, from the at least one first pixel area, the first target area on which the interactive operation acts; and determine the object in the first target area as the first object.
  • the device further includes:
  • a pixel tracking module configured to determine at least one second pixel area of the second image based on the pixel points in the at least one first pixel area and the adjusted display position of the pixel points, where one second pixel area corresponds to one first pixel area , the original display position of the pixel in the second pixel area is in the corresponding first pixel area;
  • the first object determination module is further configured to, in response to the interactive operation acting on the second image, determine from the at least one second pixel area a second target area on which the interactive operation acts; determine the object in the second target area for the second object;
  • the audio playing module is further configured to play the audio data corresponding to the second object.
  • In another optional implementation, the audio playback module is used to: obtain the playback volume corresponding to the action force of the interactive operation; and play the audio data corresponding to the first object based on the playback volume.
  • the device further includes:
  • the second object determination module is used to determine the main object in the video
  • the audio extraction module is used for extracting the audio data of the main object from the video clip in which the main object exists in the video;
  • the relationship generation module is used to generate the corresponding relationship between the main object and the audio data of the main object.
  • In the embodiments of the present application, the computer device may be configured as a terminal or a server. If the computer device is configured as a terminal, the terminal is used as the execution subject to implement the technical solutions provided by the embodiments of the present application. If the computer device is configured as a server, the server is used as the execution subject to implement the technical solutions provided by the embodiments of the present application, or the technical solutions provided by the embodiments of the present application are implemented through interaction between the terminal and the server, which is not limited in the embodiments of the present application.
  • FIG. 6 shows a structural block diagram of a terminal 600 provided by an exemplary embodiment of the present application.
  • the terminal 600 includes: a processor 601 and a memory 602 .
  • the processor 601 may include one or more processing cores, such as a 4-core processor, an 8-core processor, and the like.
  • the processor 601 can be implemented in at least one hardware form among DSP (Digital Signal Processing), FPGA (Field-Programmable Gate Array), and PLA (Programmable Logic Array).
  • Memory 602 may include one or more computer-readable storage media, which may be non-transitory.
  • the non-transitory computer-readable storage medium in the memory 602 is used to store at least one piece of program code, and the at least one piece of program code is used to be executed by the processor 601 to implement the video processing method provided by the method embodiments in this application.
  • the terminal 600 may optionally further include: a peripheral device interface 603 and at least one peripheral device.
  • the processor 601, the memory 602 and the peripheral device interface 603 may be connected through a bus or a signal line.
  • Each peripheral device can be connected to the peripheral device interface 603 through a bus, a signal line or a circuit board.
  • the peripheral device includes: at least one of a radio frequency circuit 604 , a display screen 605 , and an audio circuit 606 .
  • the peripheral device interface 603 may be used to connect at least one peripheral device related to I/O (Input/Output) to the processor 601 and the memory 602 .
  • the display screen 605 is used for displaying UI (User Interface, user interface).
  • the UI can include graphics, text, icons, video, and any combination thereof.
  • the display screen 605 also has the ability to acquire touch signals on or above the surface of the display screen 605 .
  • the touch signal may be input to the processor 601 as a control signal for processing.
  • the display screen 605 may also be used to provide virtual buttons and/or virtual keyboards, also referred to as soft buttons and/or soft keyboards.
  • Audio circuitry 606 may include a microphone and speakers.
  • the microphone is used to collect the sound waves of the user and the environment, convert the sound waves into electrical signals and input them to the processor 601 for processing, or to the radio frequency circuit 604 to realize voice communication.
  • the microphone may also be an array microphone or an omnidirectional collection microphone.
  • the speaker is used to convert the electrical signal from the processor 601 or the radio frequency circuit 604 into sound waves.
  • terminal 600 also includes one or more pressure sensors 607 .
  • the pressure sensor 607 may be disposed on the side frame of the terminal 600 and/or the lower layer of the display screen 605 .
  • the processor 601 can perform left and right hand identification or shortcut operations according to the holding signal collected by the pressure sensor 607.
  • the processor 601 controls the operability controls on the UI interface according to the user's pressure operation on the display screen 605.
  • the operability controls include at least one of button controls, scroll bar controls, icon controls, and menu controls.
  • The structure shown in FIG. 6 does not constitute a limitation on the terminal 600; the terminal may include more or fewer components than shown, combine some components, or adopt a different component arrangement.
  • FIG. 7 is a block diagram of a server provided by an embodiment of the present application.
  • the server 700 may vary greatly due to different configurations or performance, and may include one or more processors (Central Processing Units, CPU) 701 and one or more memories 702, wherein, at least one piece of program code is stored in the memory 702, and at least one piece of program code is loaded and executed by the processor 701 to implement the video processing methods provided by the above method embodiments.
  • the server may also have components such as a wired or wireless network interface, a keyboard, and an input/output interface for input and output, and the server may also include other components for implementing device functions, which will not be described here.
  • An embodiment of the present application also provides a computer device, the computer device including a processor and a memory, where at least one piece of program code is stored in the memory, and the at least one piece of program code is loaded and executed by the processor to implement the following steps:
  • in response to an interactive operation acting on a first image, an adjustment parameter corresponding to the interactive operation is obtained, where the adjustment parameter indicates an adjustment range of the display positions of the pixels in the first image based on the interactive operation, and the first image is the image currently displayed in the played video;
  • a displacement parameter of the pixels of the first image is obtained, where the displacement parameter represents the displacement of the pixels between the first image and a second image, and the second image is an image displayed after the first image;
  • the display positions of the pixels in the first image are adjusted based on the adjustment parameter and the displacement parameter;
  • the second image is displayed based on the adjusted display positions of the pixels.
  • the at least one piece of program code is loaded and executed by the processor to implement the following steps:
  • the adjustment parameter matching the force of action is determined.
  • the at least one piece of program code is loaded and executed by the processor to implement the following steps:
  • the adjustment parameter is positively correlated with the reference adjustment parameter, the adjustment parameter is negatively correlated with the reference action force, and the adjustment parameter is positively correlated with the action force of the interactive operation.
  • the at least one piece of program code is loaded and executed by the processor to implement the following steps:
  • the encoded data of the video is decoded to obtain displacement parameters of the pixels of the first image, and the encoded data includes the encoded displacement parameters.
  • the at least one piece of program code is loaded and executed by the processor to implement the following steps:
  • the display position of the pixel on which the interactive operation acts in the first image is adjusted.
  • the at least one piece of program code is loaded and executed by the processor to implement the following steps:
  • the display position of the pixel on which the interactive operation acts in the first image is adjusted.
  • the at least one piece of program code is loaded and executed by the processor to implement the following steps:
  • the weight represents the degree of influence of the interactive operation on the display position shift of the pixel point
  • the adjustment parameter is weighted, and the display position of the pixel point in the first image is adjusted based on the weighted adjustment parameter and the displacement parameter.
  • the at least one piece of program code is loaded and executed by the processor to implement the following steps:
  • the at least one piece of program code is loaded and executed by the processor to implement the following steps:
  • An object in the first target area is determined as the first object.
  • the at least one piece of program code is loaded and executed by the processor to implement the following steps:
  • At least one second pixel area of the second image is determined based on the pixel points in the at least one first pixel area and the adjusted display position of the pixel points, and one second pixel area corresponds to one first pixel area , the original display position of the pixel in the second pixel area is in the corresponding first pixel area;
  • the object in the second target area is determined as the second object, and the audio data corresponding to the second object is played.
  • the at least one piece of program code is loaded and executed by the processor to implement the following steps:
  • the audio data corresponding to the first object is played.
  • the at least one piece of program code is loaded and executed by the processor to implement the following steps:
  • a corresponding relationship between the subject object and the audio data of the subject object is generated.
  • a computer-readable storage medium is also provided, and at least one piece of program code is stored in the computer-readable storage medium, and the at least one piece of program code can be executed by a processor of a computer device to implement the following steps:
  • in response to an interactive operation acting on a first image, an adjustment parameter corresponding to the interactive operation is obtained, where the adjustment parameter indicates an adjustment range of the display positions of the pixels in the first image based on the interactive operation, and the first image is the image currently displayed in the played video;
  • a displacement parameter of the pixels of the first image is obtained, where the displacement parameter represents the displacement of the pixels between the first image and a second image, and the second image is an image displayed after the first image;
  • the display positions of the pixels in the first image are adjusted based on the adjustment parameter and the displacement parameter;
  • the second image is displayed based on the adjusted display positions of the pixels.
  • the at least one piece of program code can be executed by a processor of a computer device to implement the following steps:
  • the adjustment parameter matching the force of action is determined.
  • the at least one piece of program code can be executed by a processor of a computer device to implement the following steps:
  • the adjustment parameter is positively correlated with the reference adjustment parameter, the adjustment parameter is negatively correlated with the reference action force, and the adjustment parameter is positively correlated with the action force of the interactive operation.
  • the at least one piece of program code can be executed by a processor of a computer device to implement the following steps:
  • the encoded data of the video is decoded to obtain displacement parameters of the pixels of the first image, and the encoded data includes the encoded displacement parameters.
  • the at least one piece of program code can be executed by a processor of a computer device to implement the following steps:
  • the display position of the pixel on which the interactive operation acts in the first image is adjusted.
  • the at least one piece of program code can be executed by a processor of a computer device to implement the following steps:
  • the display position of the pixel on which the interactive operation acts in the first image is adjusted.
  • the at least one piece of program code can be executed by a processor of a computer device to implement the following steps:
  • the weight represents the degree of influence of the interactive operation on the display position shift of the pixel point
  • the adjustment parameter is weighted, and the display position of the pixel point in the first image is adjusted based on the weighted adjustment parameter and the displacement parameter.
  • the at least one piece of program code can be executed by a processor of a computer device to implement the following steps:
  • the at least one piece of program code can be executed by a processor of a computer device to implement the following steps:
  • An object in the first target area is determined as the first object.
  • the at least one piece of program code can be executed by a processor of a computer device to implement the following steps:
  • At least one second pixel area of the second image is determined based on the pixel points in the at least one first pixel area and the adjusted display position of the pixel points, and one second pixel area corresponds to one first pixel area , the original display position of the pixel in the second pixel area is in the corresponding first pixel area;
  • the object in the second target area is determined as the second object, and the audio data corresponding to the second object is played.
  • the at least one piece of program code can be executed by a processor of a computer device to implement the following steps:
  • the audio data corresponding to the first object is played.
  • the at least one piece of program code can be executed by a processor of a computer device to implement the following steps:
  • a corresponding relationship between the subject object and the audio data of the subject object is generated.
  • Optionally, the computer-readable storage medium may be a ROM (Read-Only Memory), a RAM (Random Access Memory), a CD-ROM (Compact Disc Read-Only Memory), a magnetic tape, a floppy disk, an optical data storage device, or the like.
  • The present application also provides a computer program product or computer program, the computer program product or computer program including computer program code, the computer program code being stored in a computer-readable storage medium; the processor of a computer device reads the computer program code from the computer-readable storage medium and executes it, so that the computer device executes the video processing method in each of the above method embodiments.

Landscapes

  • Engineering & Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

A video processing method and apparatus, a computer device, and a storage medium, belonging to the field of computer technology. The method includes: in response to an interactive operation acting on a first image, obtaining an adjustment parameter corresponding to the interactive operation (201); obtaining a displacement parameter of the pixels of the first image (202); adjusting the display positions of the pixels in the first image based on the adjustment parameter and the displacement parameter (203); and displaying a second image based on the adjusted display positions of the pixels (204). The method, apparatus, computer device, and storage medium enhance the interactivity of the video and improve the visual effect during video playback.

Description

视频处理方法、装置、计算机设备及存储介质
本申请要求于2020年10月10日提交、申请号为202011078356.9、发明名称为“视频处理方法、装置、计算机设备及存储介质”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本申请涉及计算机技术领域,特别涉及一种视频处理方法、装置、计算机设备及存储介质。
背景技术
视频的出现丰富了人们的生活。人们通过观看视频能够直观高效的获取到各种信息,感受世界的多姿多彩。视频包括图像和音频,能够从视觉和听觉两个方面为用户提供直观的、感染性较强的观看体验。
发明内容
本申请实施例提供了一种视频处理方法、装置、计算机设备及存储介质,能够实现对用户与正在播放的视频的交互支持,增强了视频的交互性,提高了视频播放过程中的视觉效果。所述技术方案如下:
一方面,提供了一种视频处理方法,由计算机设备执行,所述方法包括:
响应于作用在第一图像上的交互操作,获取所述交互操作对应的调整参数,所述调整参数指示基于所述交互操作对所述第一图像中像素点的显示位置的调整幅度,所述第一图像为所播放的视频中当前显示的图像;
获取所述第一图像的像素点的位移参数,所述位移参数表示所述像素点在所述第一图像与第二图像之间的位移,所述第二图像为在所述第一图像之后显示的图像;
基于所述调整参数和所述位移参数,调整所述第一图像中像素点的显示位置;
基于所述像素点调整后的显示位置,显示所述第二图像。
一方面,提供了一种视频处理装置,所述装置包括:
第一获取模块,用于响应于作用在第一图像上的交互操作,获取所述交互操作对应的调整参数,所述调整参数指示基于所述交互操作对所述第一图像中像素点的显示位置的调整幅度,所述第一图像为所播放的视频中当前显示的图像;
第二获取模块,用于获取所述第一图像的像素点的位移参数,所述位移参数表示所述像素点在所述第一图像与第二图像之间的位移,所述第二图像为在所述第一图像之后显示的图像;
第二显示模块,用于基于所述调整参数和所述位移参数,调整所述第一图像中像素点的显示位置;
所述第二显示模块,还用于基于所述像素点调整后的显示位置,显示所述第二图像。
在一种可选的实现方式中,所述第一获取模块,包括:
力度获取单元,用于响应于作用在所述第一图像上的交互操作,获取所述交互操作的作用力度;
参数确定单元,用于基于所述交互操作的作用力度,确定与所述作用力度匹配的所述调整参数。
在另一种可选的实现方式中,所述参数确定单元,用于:
基于参考作用力度、所述参考作用力度对应的参考调整参数、所述交互操作的作用力度,确定所述调整参数;
其中,所述调整参数与所述参考调整参数呈正相关,所述调整参数与所述参考作用力度呈负相关,所述调整参数与所述交互操作的作用力度呈正相关。
在另一种可选的实现方式中,所述第二获取模块,用于将所述第一图像输入到光流估计模型中,得到所述第一图像的像素点的位移参数;
或者,所述第二获取模块,用于对所述视频的编码数据进行解码,得到所述第一图像的像素点的位移参数,所述编码数据包括编码后的所述位移参数。
在另一种可选的实现方式中,所述第二显示模块,包括:
像素点偏移单元,用于基于所述调整参数和所述位移参数,调整所述第一图像中所述交互操作所作用的像素点的显示位置。
在另一种可选的实现方式中,所述像素点偏移单元,用于:
基于所述调整参数和所述位移参数,确定目标偏移参数;
基于所述目标偏移参数所指示的偏移距离和偏移方向,调整所述第一图像中所述交互操作所作用的像素点的显示位置。
在另一种可选的实现方式中,所述第二显示模块,用于:
获取所述交互操作对应的权重,所述权重用于表示所述交互操作对所述像素点的显示位置偏移的影响程度;
基于所述权重,对所述调整参数进行加权,基于加权后的所述调整参数和所述位移参数,调整所述第一图像中像素点的显示位置。
在另一种可选的实现方式中,所述装置还包括:
第一对象确定模块,用于响应于作用在所述第一图像上的交互操作,确定所述第一图像中所述交互操作所作用的第一对象;
音频确定模块,用于从对象与音频数据的对应关系中,获取所述第一对象对应的音频数据;
音频播放模块,用于播放所述第一对象对应的音频数据。
在另一种可选的实现方式中,所述第一对象确定模块,用于:
响应于作用在所述第一图像上的所述交互操作确定所述第一图像的至少一个第一像素区域,每个所述第一像素区域包含一个对象;
从所述至少一个第一像素区域中,确定所述交互操作所作用在的第一目标区域;
将所述第一目标区域中的对象确定为所述第一对象。
在另一种可选的实现方式中,所述装置还包括:
像素跟踪模块,用于基于所述至少一个第一像素区域内的像素点和所述像素点调整后的显示位置,确定所述第二图像的至少一个第二像素区域,一个第二像素区域与一个第一像素区域对应,所述第二像素区域中的像素点的原显示位置在对应的第一像素区域内;
所述第一对象确定模块,还用于响应于作用在所述第二图像上的交互操作,从所述至少一个第二像素区域中确定所述交互操作所作用在的第二目标区域;将所述第二目标区域中的对象确定为第二对象;
所述音频播放模块,还用于播放所述第二对象对应的音频数据。
在另一种可选的实现方式中,所述音频播放模块,用于:
获取所述交互操作的作用力度对应的播放音量;
基于所述播放音量,播放所述第一对象对应的音频数据。
在另一种可选的实现方式中,所述装置还包括:
第二对象确定模块,用于确定所述视频中的主体对象;
音频提取模块,用于从所述视频中存在所述主体对象的视频片段中,提取所述主体对象的音频数据;
关系生成模块,用于生成所述主体对象与所述主体对象的音频数据的对应关系。
一方面,提供了一种计算机设备,所述计算机设备包括处理器和存储器,所述存储器中存储有至少一条程序代码,所述至少一条程序代码由所述处理器加载并执行,以实现上述任一种可选的实现方式所述的视频处理方法。
一方面,提供了一种计算机可读存储介质,所述计算机可读存储介质中存储有至少一条程序代码,所述至少一条程序代码由处理器加载并执行,以实现上述任一种可选的实现方式所述的视频处理方法。
一方面,提供了一种计算机程序产品或计算机程序,该计算机程序产品或计算机程序包括计算机程序代码,该计算机程序代码存储在计算机可读存储介质中,计算机设备的处理器从计算机可读存储介质读取该计算机程序代码,处理器执行该计算机程序代码,使得该计算机设备执行上述任一种可选的实现方式所述的视频处理方法。
本申请实施例提供的技术方案,由于位移参数能够表示第一图像与第二图像之间像素点变化的位移,且在交互操作作用在视频的第一图像上的情况下,该交互操作能够对像素点变化的位移产生影响,因此结合位移参数和调整参数对第一图像的像素点的显示位置进行调整,能够将交互操作的作用效果呈现于在第一图像之后显示的第二图像上,从而使视频呈现出与交互操作匹配度更高的动态效果,实现对用户与正在播放的视频的交互支持,增强了视频的交互性,提高了视频播放过程中的视觉效果。
附图说明
为了更清楚地说明本申请实施例中的技术方案,下面将对实施例描述中所需要使用的附图作简单地介绍,显而易见地,下面描述中的附图仅仅是本申请的一些实施例,对于本领域普通技术人员来讲,在不付出创造性劳动的前提下,还可以根据这些附图获得其他的附图。
图1是本申请实施例提供的一种实施环境的示意图;
图2是本申请实施例提供的一种视频处理方法的流程图;
图3是本申请实施例提供的一种视频处理方法的流程图;
图4是本申请实施例提供的一种视频交互播放的流程图;
图5是本申请实施例提供的一种视频处理装置的框图;
图6是本申请实施例提供的一种终端的框图;
图7是本申请实施例提供的一种服务器的框图。
具体实施方式
为使本申请的目的、技术方案和优点更加清楚,下面将结合附图对本申请实施方式作进一步地详细描述。
本申请的说明书和权利要求书及附图中的术语“第一”、“第二”、“第三”和“第四”等是用于区别不同对象,而不是用于描述特定顺序。此外,术语“包括”和“具有”以及它们的任意变形,意图在于覆盖不排他的包含。例如包含了一系列步骤或单元的过程、方法、系统、产品或设备没有限定于已列出的步骤或单元,而是可选地还包括没有列出的步骤或单元,或可选地还包括对于这些过程、方法、产品或设备固有的其他步骤或单元。
相关技术中,计算机设备在屏幕上播放视频,在视频播放过程中,计算机设备并不支持用户与正在播放的视频进行交互,计算机设备的视频播放性能不能满足交互需求,视频播放 过程中的体验方式较为单一。
为了方便理解,下面对本申请实施例中涉及的名词进行解释说明。
光流估计:光流用于表示图像中各个像素点的瞬时移位,是依据视频中帧与帧之间各像素点的相关性得到的。对于时序相邻的两帧图像I(t-1)和I(t),I(t-1)上每个像素点移位之后,各像素点的位置与I(t)一致。通过光流估计一方面能够得知对象在下一时刻的位置,从而利用光流来提升视频中目标追踪的速度和准确性,在视频播放的过程中达到快速追踪对象的效果。通过光流估计另一方面能够预测当前帧中的像素点向下一帧的运动趋势。
语义分割:语义分割从像素级别来理解图像,将图像中的像素点划分为多个类别。例如,图像包括摩托车和骑摩托车的人,通过语义分割,将描绘骑摩托车的人的像素点划分为同一类,将描绘摩托车的像素点划分为另一类。
图1是本申请实施例提供的一种实施环境的示意图。参见图1,在一种可选的实现方式中,该实施环境包括终端101,本申请实施例提供的视频处理方法由终端101执行。可选地,终端101是智能手机、平板电脑、笔记本电脑、台式计算机、智能电视、VR(Virtual Reality,虚拟现实)设备等,但并不局限于此。可选地,终端101上设有支持视频交互播放的应用程序,例如,该应用程序为视频播放类应用程序、浏览器等。
在另一种可选的实现方式中,该实施环境包括终端101和服务器102,本申请实施例提供的视频处理方法通过终端101和服务器102之间的交互来实施。可选地,服务器102是独立的物理服务器;或者,服务器102是多个物理服务器构成的服务器集群或者分布式系统;或者,服务器102是提供云服务、云数据库、云计算、云函数、云存储、网络服务、云通信、中间件服务、域名服务、安全服务、CDN(Content Delivery Network,内容分发网络)、以及大数据和人工智能平台等基础云计算服务的云服务器。可选地,服务器102以及终端101通过有线或无线通信方式进行直接或间接地连接,本申请在此不做限制。
可选地,本申请实施例提供的技术方案由终端作为执行主体来实施;或者本申请实施例提供的技术方案由服务器作为执行主体来实施;或者,本申请实施例提供的技术方案通过终端和服务器之间的交互来实施,本申请对此不加以限定。在本申请实施例中,以技术方案的执行主体是终端为例进行说明。
图2是本申请实施例提供的一种视频处理方法的流程图。参见图2,在本申请实施例中,以终端是执行主体为例进行说明,该实施例包括:
201、终端响应于作用在第一图像上的交互操作,获取该交互操作对应的调整参数,该调整参数指示基于交互操作对第一图像中像素点的显示位置的调整幅度,该第一图像为所播放的视频中当前显示的图像。
一个视频由多帧静态的图像组成,多帧图像按照第一帧率快速连续地显示在终端上,达到动态的视频效果。其中,第一帧率为任一帧率。
终端获取交互操作对应的调整参数,基于调整参数对第一图像中像素点的显示位置进行调整,将交互操作的作用效果呈现在下一帧图像中,即呈现于在第一图像之后显示的第二图像中。
在终端为智能手机、平板电脑、笔记本电脑、台式计算机或者智能电视等设备的情况下,用户通过触摸终端的显示屏触发对第一图像的交互操作,或者通过鼠标、键盘在显示屏上进行操作,触发对第一图像的交互操作。终端在显示视频中的第一图像的过程中,检测到作用在第一图像上的交互操作,则获取该交互操作对应的调整参数。
在终端为VR设备的情况下,用户穿戴VR设备中的手部操作感应装置,通过手部操作感应装置与视频进行交互。VR设备在显示视频中的一帧图像的过程中,通过手部操作感应装置检测到交互操作,则获取该交互操作对应的调整参数。
202、终端获取图像的像素点的位移参数,该位移参数表示像素点在该第一图像与第二图 像之间的位移,该第二图像为在第一图像之后显示的图像。
视频中相邻两帧图像之间的像素点具有相关性,帧与帧之间像素点的移位在视觉效果上表现为视频画面中物体的运动。同一像素点从第N帧图像向第N+1帧图像的运动表现为光流,其中,N为正整数,第N帧图像可称为第一图像,第N+1帧图像可称为第二图像。视频中原相邻两帧图像之间的像素点的瞬时移位由光流估计参数来表示,即视频中原相邻两帧图像之间的像素点的瞬时位移由位移参数来表示。
203、终端基于调整参数和位移参数,调整第一图像中像素点的显示位置。
位移参数表示像素点在一帧图像与该帧图像的下一帧图像之间的原位移,也即是位移参数表示像素点在第一图像与第二图像之间的位移,以位移参数为基础,对像素点的位置进行偏移,能够表现出像素点原始的位移变化。然后再结合调整参数,对像素点的位置再次进行偏移,能够在像素点原位移变化的基础上叠加交互操作造成的位移变化,实现对像素点的显示位置的调整。
204、终端基于像素点调整后的显示位置,显示第二图像。
基于像素点调整后的显示位置显示的第二图像,能够呈现出交互操作的作用效果,从而实现视频的交互式播放。
本申请实施例提供的技术方案,由于位移参数能够表示第一图像与第二图像之间像素点变化的位移,且在交互操作作用在视频的第一图像上的情况下,该交互操作能够对像素点变化的位移产生影响,因此结合位移参数和调整参数对第一图像的像素点的显示位置进行调整,能够将交互操作的作用效果呈现于在第一图像之后显示的第二图像上,从而使视频呈现出与交互操作匹配度更高的动态效果,实现对用户与正在播放的视频的交互支持,增强了视频的交互性,提高了视频播放过程中的视觉效果。
图3是本申请实施例提供的一种视频处理方法的流程图。参见图3,在本申请实施例中,以终端进行视频的交互播放为例进行说明,也即是,终端通过在视频中叠加交互操作的作用效果,对用户的交互操作做出反馈,实现视频的交互播放。该实施例包括:
301、终端显示所播放的视频中的第一图像。
多帧图像依次快速连续的进行显示形成视频。终端进行视频播放的过程拆解开来是终端依次显示多帧图像的过程。终端对视频的播放和处理也即是对视频中图像的显示和处理。
需要说明的是,终端支持任意视频类型的视频进行交互播放。在一种可选的实现方式中,终端默认对视频进行交互播放。
在另一种可选的实现方式中,终端在交互播放模式处于开启状态的情况下,对视频进行交互播放。终端提供交互播放模式的启停开关,用户能够通过交互播放模式的启停开关,来控制交互播放模式的开启与关闭。终端响应于交互播放模式的启停开关被开启,确定交互播放模式进入开启状态;终端响应于交互播放模式的启停开关被关闭,确定交互播放模式进入关闭状态。
例如,在终端为智能手机的情况下,用户能够通过终端上的视频播放应用程序观看视频。终端响应于对视频播放应用程序的启动操作,运行视频播放应用程序。用户打开视频播放应用程序后,能够通过视频播放应用程序的应用界面,选择视频进行观看。终端显示视频播放应用程序的应用界面;响应于对应用界面中视频的点击操作,播放该视频。用户打开视频之后,能够根据自身需要打开交互播放模式。例如,视频播放界面上包括交互播放模式的启停开关,终端响应于交互播放模式的启停开关被开启,对该视频进行交互播放。
在终端为VR设备或者智能电视的情况下,用户能够通过语音指令或者手势操作,控制终端进入交互播放模式。在终端为智能电视的情况下,用户也能够通过按下智能电视的遥控器上的交互按键,来控制终端进入交互播放模式。在本申请实施例中,对终端进入交互播放模式的方式,不加以限定。
在实际应用中,终端主要支持目标视频类型的视频进行交互播放。相较于除目标视频类型之外的其他视频类型的视频,用户对目标视频类型的视频具有更高的视频交互需求。例如,目标视频类型包括自然类纪录片、天文类记录片、食物类记录片以及VR类影片等。
终端所播放的视频中当前显示的图像即为第一图像,本申请实施例以第一图像为例,对视频处理过程进行说明。
302、终端响应于作用在该第一图像上的交互操作,获取该交互操作对应的调整参数。
在一个示例中,交互操作为对终端的显示屏的触摸操作,交互操作在终端的显示屏显示该第一图像时作用在该显示屏上。在另一个示例中,交互操作是通过VR设备的手部操作感应装置捕捉到的手部操作,VR设备在显示该图像时通过手部操作感应装置捕捉到作用在该图像上的交互操作。
上述调整参数用于对该第一图像中像素点的显示位置进行调整,以使交互操作的作用效果呈现在在该第一图像之后显示的第二图像中。调整参数是具有大小和方向的矢量,指示像素点的显示位置所调整的位移幅度。调整参数包括用于调整像素点的显示位置的偏移距离和偏移方向。也即是该调整参数指示基于交互操作对第一图像中像素点的显示位置的调整幅度,该调整幅度是指在像素点的原位移的基础上对像素点的显示位置进行调整的幅度。
在一种可选的实现方式中,终端获取与交互操作的作用力度相匹配的调整参数,以按照作用力度来表现交互操作的作用效果。相应的,终端响应于作用在该第一图像上的交互操作,获取该交互操作对应的调整参数通过以下步骤3021至步骤3022实现。
3021、终端响应于作用在该第一图像上的交互操作,获取该交互操作的作用力度。
例如,终端的显示屏的下层设有压力传感器。终端通过压力传感器识别出交互操作的作用力度。
3022、终端基于该交互操作的作用力度,确定与该作用力度匹配的调整参数。
其中,作用力度与调整参数呈正相关,作用力度越大,调整参数也越大。
在一种可选的实现方式中,终端依据最大作用力度与最大调整参数的对应关系,确定与当前的作用力度相对应的调整参数,其中最大作用力可称为参考作用力,最大调整参数可称为参考调整参数。相应的,上述步骤3022包括:终端基于参考作用力度、参考作用力度对应的参考调整参数、交互操作的作用力度,确定调整参数;其中,调整参数与参考调整参数呈正相关,调整参数与参考作用力度呈负相关,调整参数与交互操作的作用力度呈正相关。调整参数为具有方向的矢量,调整参数的方向与交互操作的作用力的方向一致。
上述过程也即是终端通过以下公式一确定调整参数的过程。
公式一：ΔW_i = ΔW × F_i / F_m
其中,ΔW i表示作用在第i帧图像上的交互操作的作用力度对应的调整参数,ΔW i为具有方向的矢量,ΔW i的模为非负数,i为正整数。ΔW表示参考作用力度对应的参考调整参数,ΔW为标量,ΔW为非负数。F i表示作用在第i帧图像上的交互操作的作用力度,F i为具有方向的矢量,F i的模为非负数。F m表示参考作用力度,F m为标量,F m为非负数。
例如,参考作用力度为1N(力的计量单位:牛顿),参考调整参数为10mm(毫米),若交互操作的作用力度为0.2N,则与该作用力度匹配的调整参数的模为2mm。
在另一种可选的实现方式中,终端基于单位作用力度对应的参考调整参数以及交互操作的作用力度,确定出交互操作的作用力度对应的调整参数。相应的,终端基于交互操作的作用力度,确定与作用力度匹配的调整参数的步骤包括:终端获取单位作用力度对应的参考位移;将交互操作的作用力度与单位作用力度的比值确定为参考数量;将参考数量与参考调整参数的乘积确定为该调整参数的模,将交互操作的作用力的方向确定为该调整参数的方向。
例如,单位作用力度为0.1N,该单位作用力度对应的参考调整参数为1mm,在交互操作的作用力度为0.2N的情况下,与该作用力度匹配的调整参数的模为2mm。
上述技术方案,通过识别交互操作的作用力度,确定出与该作用力度匹配的调整参数, 对第一图像中像素点的显示位置进行调整,能够使得像素点调整后呈现出的作用效果与交互操作的作用力度相对应,从而呈现出更加真实的交互效果,提升视频交互的真实体感,使得视频播放性能能够满足更加丰富的交互需求,进一步扩展视频播放过程中的体验方式。
在另一种可选的实现方式中,终端将用户执行交互操作时手部移动的位移确定为调整参数。相应的,终端响应于作用在该第一图像上的交互操作,获取该交互操作对应的调整参数的步骤包括:终端响应于作用在该第一图像上的交互操作,获取该交互操作作用在该第一图像上的起始位置点,以及获取该交互操作作用在该第一图像上的终止位置点;将起始位置点指向终止位置点的位移确定为调整参数。
需要说明的是,为保证交互操作的作用效果能够呈现在第二图像中,且第二图像能够按照第一帧率进行显示。终端确定开始显示第一图像的时间点与检测到交互操作的时间点之间的第一时长,在该第一时长与目标时长之和不大于两帧图像的显示间隔的情况下,终端在该交互操作作用在该第一图像上的时长达到目标时长时,将该交互操作作用在该第一图像上的位置点确定为终止位置点;或者,终端在该第一时长与目标时长之和大于两帧图像的显示间隔的情况下,将最后显示第一图像时该交互操作作用在第一图像上的位置点确定为终止位置点,进而确定调整参数,按照调整参数对像素点的显示位置进行调整。其中,目标时长表示用户在执行交互操作时,该交互操作在第一图像上所作用的有效时长,该目标时长为任一不大于两帧图像的时间间隔的时长。
例如,第一帧率为每秒30帧,两帧图像的显示间隔为0.033秒,目标时长为0.02秒,终端在第一图像的显示时长达到0.01秒时触发交互操作的情况下,在该交互操作作用在该第一图像上的时长达到0.02秒时,将该交互操作作用在第一图像上的位置点确定为终止位置点;或者,在第一图像的显示时长达到0.02秒时触发交互操作的情况下,在该交互操作作用在该第一图像上的时长达到0.012秒时,将该交互操作作用在第一图像上的位置点确定为终止位置点。
需要说明的是,终端为智能手机、平板电脑、笔记本电脑、台式计算机或者智能电视等,终端的显示屏能够检测到交互操作作用在的位置点。可选地,终端的显示屏为电阻式触摸屏、电容式触摸屏、红外线式触摸屏或者表面声波式触摸屏等,终端的显示屏的类型不同,检测交互操作作用在的位置点的原理不同。在本申请实施例中,对终端的显示屏检测交互操作作用在的位置点的原理,不加以限定。
303、终端获取该第一图像的像素点的位移参数。
其中,位移参数也可称为光流估计参数,该位移参数表示第一图像的像素点在第一图像与第二图像之间的位移,该第二图像为在该第一图像之后显示的图像。
在一种可选的实现方式中,终端通过光流估计模型预测该第一图像的像素点的位移参数。上述步骤303包括:终端将该第一图像输入到光流估计模型中,得到该第一图像的像素点的位移参数。
其中,光流估计模型用于预测当前帧图像的像素点向下一帧图像运动的位移。可选地,光流估计模型为通过FlowNet(光流神经网络)训练得到的预测模型。在光流估计模型的训练过程中,通过光流神经网络对多对训练图像进行光流估计;基于光流神经网络输出的位移参数以及真实的位移参数,对光流神经网络的网络参数进行更新,以使光流神经网络输出的位移参数尽可能接近真实的光流估计参数。
上述技术方案,通过光流估计模型来预测一帧图像的像素点的位移参数,光流估计模型能够用于对任意格式的视频中的图像的位移参数进行预测,从而能够支持任意视频的交互播放,扩展了视频交互播放的应用范围。
在另一种可选的实现方式中,视频的编码数据包括视频中图像的像素点的位移参数,即编码数据包括编码后的位移参数,终端能够对视频的编码数据进行解码,得到图像的像素点的位移参数。该位移参数是在视频的编码过程中预先确定出并编码到视频的编码数据中的, 其中,该位移参数是用于进行视频编码的计算机设备预先根据相邻两帧图像的像素点的位移变化确定出的。
上述技术方案,通过在视频的编码数据中预先编码图像中像素点的位移参数,在视频播放过程中,能够从视频的编码数据中直接解码出像素点的位移参数,进而基于直接解码出的位移参数,进行视频处理,能够提高视频处理的效率。
需要说明的是,图像中像素点的位移参数也可以通过其他光流估计算法计算得到。例如,Lucas–Kanade(一种两帧差分的光流估计算法)算法、Horn–Schunck(一种估计图像的稠密光流场的光流估计算法)算法等光流估计算法。在本申请实施例中,对位移参数的获取方式,不加以限定。
需要说明的另一点是,在本申请实施例中,以先获取调整参数,再获取位移参数为例进行说明,而在一些实施例中,上述终端获取调整参数的步骤以及终端获取位移参数的步骤还能够按照其他时序进行。可选地,终端同时获取调整参数和位移参数;或者,终端先获取位移参数,再获取调整参数,本申请实施例对此不加以限定。
304、终端基于调整参数和位移参数,调整第一图像中像素点的显示位置,基于像素点调整后的显示位置,显示第二图像。
终端将交互操作的作用效果叠加在交互操作所作用的操作区域上。终端响应于像素点为交互操作所作用的像素点,基于调整参数和位移参数,将该像素点从原显示位置偏移至目标显示位置;以及,终端响应于像素点为交互操作未作用的像素点,基于位移参数,将该像素点从原显示位置偏移至目标显示位置,以显示目标图像。其中,目标图像即为在第一图像之后显示的图像,该目标图像可称为第二图像。也即是,终端基于调整参数和位移参数,调整交互操作所作用的像素点的显示位置;终端基于位移参数,调整交互操作未作用的像素点的显示位置,然后基于调整后像素点的显示位置,显示第二图像。其中,终端通过调整像素点的显示位置,将像素点从原显示位置偏移至目标显示位置,该原显示位置即为像素点在第一图像中的显示位置,该目标显示位置即为调整后像素点在第二图像中的显示位置。
例如,对于自然类记录片中以动物为主体对象的动物类纪录片,在交互操作作用在动物皮毛的区域上的情况下,基于上述过程,对交互操作所作用的像素点进行偏移,能够在第二图像中呈现出动物皮毛的形变,产生对动物皮毛的拂动效果。对于自然类记录片中以自然景观为主体对象的自然景观类纪录片,在交互操作作用在河流的区域上的情况下,交互操作的作用方向与河流流向相同,基于上述过程,对交互操作所作用的像素点进行偏移,能够在第二图像中呈现出加速水的流动的作用效果。在交互操作作用在雪地的区域上的情况下,基于上述过程,对交互操作所作用的像素点进行偏移,能够在第二图像中呈现出雪的变化效果。
上述技术方案,通过结合交互操作对应的调整参数,进行像素点的偏移,将交互操作的作用效果叠加在了交互操作所作用在的操作区域上,从而在第二图像中突显出交互操作的作用效果,通过视频画面上的形变对用户的交互操作做出反馈,丰富了视频的交互效果,实现了视频的交互播放,扩展了视频播放过程中的体验方式。
并且,基于位移参数对图像中的像素点进行偏移处理,充分利用了视频播放的先验知识,减少了复杂的视频理解和计算,视频处理的计算量较小且易于部署,进而能够提高视频处理的效率,扩展视频交互播放的应用范围。
可选地,上述基于调整参数和位移参数,将像素点从原显示位置偏移至目标显示位置的步骤包括:终端响应于像素点为交互操作所作用的像素点,基于调整参数和位移参数,确定目标偏移参数;终端基于目标偏移参数所指示的偏移距离和偏移方向,将像素点从原显示位置偏移至目标显示位置,也即是终端基于目标偏移参数所指示的偏移距离和偏移方向,调整第一图像中交互操作所作用的像素点的显示位置。可选地,终端基于三角形定则、平行四边形定则或者坐标系解法等向量求和方法,将调整参数与位移参数相加,得到目标偏移参数。
上述技术方案,先基于调整参数和位移参数,确定目标偏移参数,从而能够基于目标偏 移参数,一次将像素点从原显示位置偏移至目标显示位置,提高了像素点的偏移效率,进而能够提高视频处理的效率。
需要说明的一点是,终端也可以先基于位移参数,将像素点从原显示位置偏移至中间显示位置;再基于调整参数,将像素点从中间显示位置偏移至目标显示位置,也即是终端先基于位移参数,调整第一图像中像素点的显示位置,再基于调整参数,在已调整的基础上,再次调整交互操作所作用的像素点的显示位置。在本申请实施例中,对像素点从原显示位置偏移至目标显示位置的过程,不加以限定。
需要说明的另一点是,叠加交互操作的作用效果是视频播放过程中的辅助功能,目的是为了丰富用户的视频观看体验,在叠加交互操作的作用效果的同时,仍应保持视频中的对象原有的运动趋势。例如,视频中的动物的运动趋势为向前行走,对动物皮毛的交互操作不应影响到该动物向前行走的运动趋势。在本申请实施例中,通过赋予交互操作一定的权重,使交互操作的作用效果不影响视频中对象原有的运动趋势。相应的,终端基于调整参数和位移参数,将像素点从原显示位置偏移至目标显示位置,以显示第二图像的步骤包括:终端获取交互操作对应的权重,该权重表示交互操作对像素点的显示位置偏移的影响程度;终端基于权重,对调整参数进行加权,基于加权后的调整参数和位移参数,将像素点从原显示位置偏移至目标显示位置,以显示第二图像,也即是,基于权重对调整参数进行加权,基于加权后的调整参数和位移参数,调整第一图像中像素点的位置。其中,权重也可称为影响权重。
上述终端基于权重、调整参数和位移参数,调整第一图像中像素点,以显示第二图像的过程基于以下公式二实现:
公式二:Image i+1=Image i+Flow i+λ×ΔW i
其中,Image i+1表示第i帧图像对应的第i+1帧图像,i为正整数。Image i表示第i帧图像,该第i帧图像为交互操作所作用的图像。Flow i表示第i帧图像的位移参数,Flow i为具有方向的矢量,Flow i的模为非负数。λ表示权重,λ为大于0,并且,小于或等于1的任一数值。ΔW i表示作用在第i帧图像上的交互操作的作用力度对应的调整参数,ΔW i为具有方向的矢量,ΔW i的模为非负数。上述公式二表示:对于第i帧图像中交互操作所作用的像素点,基于权重,对交互操作对应的调整参数进行加权;对加权后的调整参数以及位移参数求和,基于求和结果,将像素点从原显示位置偏移至目标显示位置,显示第二图像。
上述技术方案,通过赋予交互操作对应的调整参数一定的权重,使得交互操作的作用效果的叠加不影响视频中的对象原有的运动趋势,视频能够按照原有进度正常播放,在保证用户的视频观看体验的基础上,进一步丰富了交互效果。
需要说明的另一点是,终端在未检测到作用于第一图像上的交互操作的情况下,不获取交互操作对应的调整参数,直接显示第二图像即可。
本申请实施例提供的技术方案,由于位移参数能够表示第一图像与第二图像之间像素点变化的位移,且在交互操作作用在视频的第一图像上的情况下,该交互操作能够对像素点变化的位移产生影响,因此结合位移参数和调整参数对第一图像的像素点的显示位置进行调整,能够将交互操作的作用效果呈现于在第一图像之后显示的第二图像上,从而使视频呈现出与交互操作匹配度更高的动态效果,实现对用户与正在播放的视频的交互支持,增强了视频的交互性,提高了视频播放过程中的视觉效果。
需要说明的另一点是,终端除了将交互操作的作用效果叠加在之后显示的图像上,在视觉方面提升视频的交互体验之外,还通过以下步骤305至步骤307,播放交互操作所作用的对象的音频数据,进行相应的声音反馈,进一步丰富视频的交互效果。
305、终端响应于作用在该第一图像上的交互操作,确定第一图像中该交互操作所作用的第一对象。
第一图像中存在至少一个对象。例如,第一图像为自然类记录片所包括的图像,则第一图像中存在动物、树木、河流、草地等对象。其中,第一图像中的每个对象占用该第一图像 中的一块区域进行呈现。
可选地,终端基于语义分割来确定交互操作所作用的第一对象。相应的,上述步骤305包括:终端响应于作用在第一图像上的交互操作,对第一图像进行语义分割,得到至少一个第一像素区域,也即是,终端响应于作用在第一图像上的交互操作,确定第一图像的至少一个第一像素区域,每个第一像素区域包含一个对象;终端从至少一个第一像素区域中,确定交互操作所作用在的第一目标区域;终端将第一目标区域中的对象确定为第一对象。其中,对第一图像进行语义分割是指识别第一图像中的对象,并按照识别出的对象,将第一图像划分为至少一个第一像素区域,使每个第一像素区域中包含一个对象。
其中,每个第一像素区域用于表示第一图像中的一个对象。例如,第一图像包括狮子、草地和河流,则对该第一图像进行语义分割得到用于表示狮子的第一像素区域、用于表示草地的第一像素区域以及用于表示河流的第一像素区域。在交互操作作用在用于表示狮子的第一像素区域中的情况下,交互操作所作用的第一对象为狮子。
上述技术方案,通过语义分割将图像划分为多个用于表示不同对象的区域,每个区域代表第一图像中的一个对象,将交互操作所作用在的区域中的对象确定为交互操作所作用的第一对象,由于语义分割从像素级别进行区域划分,所划分出的区域边框更加精细,从而能够更加准确的确定出交互操作所作用的对象,进而能够使得所播放的音频数据与交互操作所作用的对象相匹配,使得音频数据的播放更加符合真实场景,进一步提升视频的交互体验。
需要说明的一点是,终端能够通过图像分割模型,对第一图像进行语义分割,得到至少一个第一像素区域。在一个示例中,图像分割模型的网络结构以CNN(Convolutional Neural Networks,卷积神经网络)为基础。图像分割模型为编码器-解码器的架构。图像分割模型的编码器通过卷积层捕捉第一图像中的局部特征,并以层级的方式将多个用于捕捉第一图像的局部特征的模块嵌套在一起,从而提取第一图像的复杂特征,将第一图像的内容编码为紧凑表征,即编码器通过对第一图像进行编码,得到特征图,该特征图的尺寸小于第一图像的尺寸,且该特征图能够表示每个像素点所属的类别标签,然后将特征图输入至图像分割模型的解码器,通过解码器中的转置卷积执行上采样,从而将特征图扩展到与第一图像相同的尺寸,生成用于表示第一图像中各像素点的类别标签的数组,由类别标签相同的多个像素点组成第一像素区域。
需要说明的另一点是,由于视频中相邻多帧图像之间存在相关性,多帧图像所包括的对象相同,同一对象在多帧图像中的位置存在差异。因此,在对一帧图像进行语义分割之后,能够基于光流估计,对同一像素区域内的像素点进行追踪,从而通过像素点的追踪在下一帧图像中确定用于表示不同对象的像素区域。
因此,终端响应于作用在第一图像上的交互操作,对第一图像进行语义分割,得到至少一个第一像素区域之后,通过以下步骤在交互操作作用在第二图像上时,确定交互操作在第二图像中所作用的对象,播放该对象对应的音频数据,包括:终端基于原显示位置在第一像素区域内的像素点在第二图像中的目标显示位置,确定第二图像的第二像素区域,其中,第二像素区域中的像素点的原显示位置在第一像素区域内,也即是,终端基于至少一个第一像素区域内的像素点和像素点调整后的显示位置,确定第二图像的至少一个第二像素区域,该一个第二像素区域与一个第一像素区域对应,第二像素区域中的像素点的原显示位置在对应的第一像素区域内;终端响应于作用在第二图像上的交互操作,从至少一个第二像素区域中确定交互操作所作用在的第二目标区域;终端将第二目标区域中的对象确定为第二对象,播放第二对象对应的音频数据。
上述技术方案,通过语义分割确定一帧图像的多个用于表示不同对象的像素区域之后,能够基于光流估计,对像素点进行追踪,以得到该帧图像之后的一帧或者多帧图像中的多个像素区域,不需要对每帧图像进行语义分割,就能得到图像中的多个像素区域,节省了多次语义分割所消耗的时间,提高了确定交互操作所作用的对象的效率,进而能够提高声音反馈 的效率,进一步提升视频交互体验。
需要说明的另一点是,终端也可以通过目标检测、分类定位或者实例分割等方法确定交互操作所作用的对象,在本申请实施例中,对确定交互操作所作用的对象的过程,不加以限定。
306、终端从对象与音频数据的对应关系中,确定第一对象对应的音频数据。
在一种可选的实现方式中,视频的编码数据包括对象与音频数据的对应关系,终端能够对视频的编码数据进行解码,得到对象与音频数据的对象关系;从对象与音频数据的对应关系中,确定第一对象的音频数据。
在另一种可选的实现方式中,服务器存储有对象与音频数据的对应关系,终端能够向服务器发送音频数据获取请求,该音频数据获取请求用于请求获取第一对象对应的音频数据;服务器接收终端的音频数据获取请求;从已存储的对象与音频数据的对应关系中,确定第一对象对应的音频数据;向终端返回该音频数据;终端接收服务器返回的音频数据。可选地,服务器在音频数据库中存储对象与音频数据的对应关系。
需要说明的一点是,用于进行视频编码的计算机设备将对象与音频数据的对应关系编码到编码数据中之前,或者服务器从已存储的对象与音频数据的对应关系中,确定第一对象对应的音频数据之前,还生成对象与音频数据的对应关系。在本申请实施例中,以服务器生成对象与音频数据的对应关系为例进行说明。用于进行视频编码的计算机设备生成对象与音频数据的对应关系的过程与服务器生成对象与音频数据的对应关系的过程同理。
其中,服务器生成对象与音频数据的对应关系的步骤包括以下步骤1至步骤3:
步骤1、服务器确定视频中的主体对象。
主体对象为视频中重点呈现的对象。例如,在自然类记录片中,主体对象为森林、动物、河流等;在天文类记录片中,主体对象为宇宙中的星体、气体等;在食物类纪录片中,主体对象为各种食材。
可选地,服务器对视频中的图像进行语义分割,确定图像中的对象;将视频划分为多个视频片段;确定每个对象在视频片段中的出现频次;将每个对象在视频片段中的出现频次与该视频片段中各个对象的出现频次之和的比值,确定为每个对象的出现比重;将出现比重大于参考阈值的对象确定为主体对象。其中,服务器按照固定时长将视频划分为多个视频片段,例如,视频的总时长为1小时,服务器每5分钟截取一个视频片段。参考阈值为预设的大于0小于1的阈值,例如,参考阈值为0.8、0.9等。
步骤2、服务器获取主体对象的音频数据。
在一种可选的实现方式中,服务器从视频中存在主体对象的视频片段中,提取主体对象的音频数据。例如,对于动物类记录片,在动物类记录片中包括狮子的情况下,服务器从狮子出现的视频片段中,对狮子的音频数据进行提取。
需要说明的一点是,在主体对象为森林或者动物的情况下,在提取主体对象的音频数据的过程中,需要先过滤掉人声影响较大的视频片段,确定出主体对象的音频数据较为单纯的视频片段进行音频提取。例如,自然类记录片的音频数据中通常包括旁白,存在旁白的视频片段为人声影响较大的视频片段,不存在旁白的视频片段为主体对象的音频数据较为单纯的视频片段。在视频片段中不存在音频数据较为单纯的视频片段的情况下,服务器能够对存在人声的视频片段进行人声的降噪过滤,提取出主体对象的音频数据。
在另一种可选的实现方式中,服务器从包括该主体对象的其他音频数据源中获取主体对象的音频数据。例如,对于自然景观类记录片或者天文类记录片,主体对象为山、星空等,主体对象是静止目标,主体对象所在的视频中该主体对象的音频数据较少,需要通过其他音频数据源进行音频数据的补充。在主体对象为石山的情况下,从其他音频数据源中获取触摸石头的音频数据。在主体对象为星空的情况下,从音频数据源中获取风铃的音频数据。再如,对于动物类视频,通过其他音频数据源获取动物毛发的摩擦声。
需要说明的一点是,可选地,服务器获取主体对象的音频数据之前,按照视频类型,对需要增加视频交互播放功能的多个视频进行分类,例如,将多个视频分为不易提取主体对象的音频数据的自然景观类视频,以及主体对象的音频数据较为丰富的动物类视频。对于自然景观类视频,从其他音频数据源中提取音频数据。对于动物类视频,从视频中存在主体对象的视频片段中,提取主体对象的音频数据。
步骤3、服务器生成主体对象与主体对象的音频数据的对应关系。
服务器获取到主体对象的音频数据后,生成主体对象与主体对象的音频数据的对应关系。后续终端在播放视频的过程中,能够从服务器获取相应的音频数据进行播放,丰富视频交互播放过程的视听体验。可选地,服务器将主体对象与主体对象的音频数据的对应关系存储于音频数据库中。
307、终端播放第一对象对应的音频数据。
终端对第一对象对应的音频数据进行播放。可选地,终端在播放视频原有的音频数据的同时,播放第一对象对应的音频数据。可选地,终端播放第一对象对应的音频数据的音量大于播放视频原有的音频数据的音量,以突出交互操作所产生的声音反馈效果。
本申请实施例提供的技术方案,一方面,将交互操作的作用效果呈现在第二图像上,在视觉上表现出对交互操作的反馈,另一方面,通过播放交互操作所作用的对象的音频数据,表现出对交互操作的声音反馈,从而从视觉和听觉两个方面,在视频播放过程中,对用户的交互操作做出反馈,能够实现视频的交互播放,提升视频交互播放过程中的视听效果。
需要说明的一点是,可选地,终端还结合交互操作的作用力度,实现不同音量的声音反馈。相应的,上述步骤307包括:终端获取交互操作的作用力度对应的播放音量;终端基于播放音量,播放第一对象对应的音频数据。其中,作用力度与播放音量呈正相关,作用力度越大,播放音量越大。
可选地,终端基于音量转换参数和作用力度,确定作用力度对应的播放音量。例如,作用力度为0.1N,音量转换参数为400,播放音量为40。或者,终端存储有作用力度与播放音量的对应关系,基于该对应关系确定播放音量。或者,终端通过向服务器发送音量转换请求,请求服务器返回作用力度对应的播放音量。在本申请实施例中,对终端获取交互操作的作用力度对应的播放音量的过程,不加以限定。
上述技术方案,能够按照交互操作的作用力度,实现不同音量的声音反馈,从而进一步提升视频交互播放的视听效果,丰富视频的交互体验。
需要说明的一点是,在本申请实施例中,以终端按照顺序执行步骤302至步骤307为例进行说明。在一些实施例中,终端还能够按照其他时序执行步骤302至步骤307。可选地,终端同时执行步骤302至步骤304以及步骤305至步骤307;或者,终端先执行步骤305至步骤307,再执行步骤302至步骤304,本申请实施例对此不加以限定。可选地,终端显示第二图像的同时,播放第一对象对应的音频数据,以使交互操作产生的视觉效果和声音效果同步产生,增强用户体感,进一步提升视听效果。
为了使视频交互播放的过程更加清晰,下面结合图4进行说明,参见图4,视频交互播放的过程开始之前,还包括步骤401、执行视频中主体对象的提取以及音频数据库的建立。其中,步骤401可以通过步骤306中生成对象与音频数据的对应关系的步骤1至步骤3实现。视频交互播放的过程包括:402、视频播放,观看者打开交互播放模式,终端通过上述步骤301显示视频中的第一图像;403、观看者触摸交互;404、基于光流估计的交互算法,终端通过上述步骤302至步骤304基于光流估计显示第二图像,呈现触摸交互的作用效果;405、声音反馈,终端通过上述步骤305至步骤307播放观看者交互操作所作用的对象的音频数据,实现声音反馈;406、最终交互播放,终端在显示第二图像的同时,播放交互操作所作用的对象的音频数据,实现最终的交互播放。
需要说明的另一点是，终端通过上述步骤302至步骤304呈现交互操作造成的视觉效果即可，也可以不执行用于增加交互操作造成的声音效果的步骤305至步骤307。
需要说明的另一点是，上述实施例以终端进行视频的交互播放为例进行说明。可选地，视频的交互播放通过终端与服务器之间的交互实现。例如，终端显示视频中的第一图像；响应于作用在该第一图像上的交互操作，向服务器发送视频处理请求，以请求服务器确定第二图像；终端接收服务器返回的第二图像，显示第二图像。服务器确定第二图像的过程与终端确定第二图像的过程同理。可选地，视频处理请求还用于请求服务器确定交互操作对应的音频数据，终端接收服务器返回的音频数据，播放该音频数据。服务器确定交互操作对应的音频数据的过程与终端确定交互操作对应的音频数据的过程同理。
上述所有可选技术方案,能够采用任意结合形成本申请的可选实施例,在此不再一一赘述。
图5是本申请实施例提供的一种视频处理装置的框图。参见图5,该装置包括:
第一获取模块501,用于响应于作用在第一图像上的交互操作,获取交互操作对应的调整参数,调整参数指示基于交互操作对第一图像中像素点的显示位置的调整幅度,第一图像为所播放的视频中当前显示的图像;
第二获取模块502,用于获取第一图像的像素点的位移参数,位移参数表示像素点在第一图像与第二图像之间的位移,第二图像为在第一图像之后显示的图像;
第二显示模块503,用于基于调整参数和位移参数,调整第一图像中像素点的显示位置;
第二显示模块503,还用于基于像素点调整后的显示位置,显示第二图像。
本申请实施例提供的技术方案,由于位移参数能够表示第一图像与第二图像之间像素点变化的位移,且在交互操作作用在视频的第一图像上的情况下,该交互操作能够对像素点变化的位移产生影响,因此结合位移参数和调整参数对第一图像的像素点的显示位置进行调整,能够将交互操作的作用效果呈现于在第一图像之后显示的第二图像上,从而使视频呈现出与交互操作匹配度更高的动态效果,实现对用户与正在播放的视频的交互支持,增强了视频的交互性,提高了视频播放过程中的视觉效果。
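以下给出一段示意性的Python代码，说明结合位移参数与加权后的调整参数对第一图像中像素点的显示位置进行调整的一种可能方式（其中将目标偏移参数取为位移参数与加权后的调整参数之和，该组合方式以及各数值均为便于说明的假设）：

```python
import numpy as np

def adjust_pixel_positions(positions, displacement, adjust_offset, weight=1.0):
    """positions:     (N, 2) 数组，像素点在第一图像中的原显示位置 (x, y)；
    displacement:  (N, 2) 数组，位移参数，表示像素点在第一图像与第二图像之间的位移；
    adjust_offset: 调整参数所指示的调整幅度（标量或与 positions 可广播的数组）；
    weight:        交互操作对应的权重，表示交互操作对显示位置偏移的影响程度。
    返回像素点调整后的显示位置，终端基于该位置显示第二图像。"""
    target_offset = np.asarray(displacement) + weight * np.asarray(adjust_offset)   # 目标偏移参数（假设为二者相加）
    return np.asarray(positions) + target_offset

# 用法示例（数据均为假设）：
positions = np.array([[10.0, 20.0], [11.0, 20.0]])
displacement = np.array([[1.0, 0.5], [1.0, 0.5]])
new_positions = adjust_pixel_positions(positions, displacement, adjust_offset=np.array([0.3, 0.0]), weight=0.8)
print(new_positions)
```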
在一种可选的实现方式中,第一获取模块501,包括:
力度获取单元,用于响应于作用在第一图像上的交互操作,获取交互操作的作用力度;
参数确定单元,用于基于交互操作的作用力度,确定与作用力度匹配的调整参数。
在另一种可选的实现方式中,参数确定单元,用于:
基于参考作用力度、参考作用力度对应的参考调整参数、交互操作的作用力度,确定调整参数;
其中,调整参数与参考调整参数呈正相关,调整参数与参考作用力度呈负相关,调整参数与交互操作的作用力度呈正相关。
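以下给出一段示意性的Python代码，示例一种满足上述正相关、负相关关系的调整参数计算方式（此处采用最简单的比例关系，参考作用力度与参考调整参数的取值仅为假设）：

```python
def adjustment_parameter(force, ref_force=1.0, ref_adjust=10.0):
    """调整参数与参考调整参数、交互操作的作用力度呈正相关，与参考作用力度呈负相关。"""
    return ref_adjust * force / ref_force

print(adjustment_parameter(0.5))   # 作用力度越大，得到的调整参数越大
```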
在另一种可选的实现方式中,第二获取模块502,用于将第一图像输入到光流估计模型中,得到第一图像的像素点的位移参数;
或者,第二获取模块502,用于对视频的编码数据进行解码,得到第一图像的像素点的位移参数,编码数据包括编码后的位移参数。
在另一种可选的实现方式中,第二显示模块503,包括:
像素点偏移单元,用于基于调整参数和位移参数,调整第一图像中交互操作所作用的像素点的显示位置。
在另一种可选的实现方式中,像素点偏移单元,用于:
基于调整参数和位移参数,确定目标偏移参数;
基于目标偏移参数所指示的偏移距离和偏移方向,调整第一图像中交互操作所作用的像素点的显示位置。
在另一种可选的实现方式中,第二显示模块503,用于:
获取交互操作对应的权重,权重用于表示交互操作对像素点的显示位置偏移的影响程度;
基于权重,对调整参数进行加权,基于加权后的调整参数和位移参数,调整第一图像中像素点的显示位置。
在另一种可选的实现方式中,该装置还包括:
第一对象确定模块,用于响应于作用在第一图像上的交互操作,确定第一图像中交互操作所作用的第一对象;
音频确定模块,用于从对象与音频数据的对应关系中,获取第一对象对应的音频数据;
音频播放模块,用于播放第一对象对应的音频数据。
在另一种可选的实现方式中,第一对象确定模块,用于:
响应于作用在第一图像上的交互操作，确定第一图像的至少一个第一像素区域，每个第一像素区域包含一个对象；
从至少一个第一像素区域中,确定交互操作所作用在的第一目标区域;
将第一目标区域中的对象确定为第一对象。
在另一种可选的实现方式中,该装置还包括:
像素跟踪模块,用于基于至少一个第一像素区域内的像素点和像素点调整后的显示位置,确定第二图像的至少一个第二像素区域,一个第二像素区域与一个第一像素区域对应,第二像素区域中的像素点的原显示位置在对应的第一像素区域内;
第一对象确定模块,还用于响应于作用在第二图像上的交互操作,从至少一个第二像素区域中确定交互操作所作用在的第二目标区域;将第二目标区域中的对象确定为第二对象;
音频播放模块,还用于播放第二对象对应的音频数据。
在另一种可选的实现方式中,音频播放模块,用于:
获取交互操作的作用力度对应的播放音量;
基于播放音量,播放第一对象对应的音频数据。
在另一种可选的实现方式中,该装置还包括:
第二对象确定模块,用于确定视频中的主体对象;
音频提取模块,用于从视频中存在主体对象的视频片段中,提取主体对象的音频数据;
关系生成模块,用于生成主体对象与主体对象的音频数据的对应关系。
需要说明的是:上述实施例提供的视频处理装置在进行视频处理时,仅以上述各功能模块的划分进行举例说明,实际应用中,可以根据需要而将上述功能分配由不同的功能模块完成,即将计算机设备的内部结构划分成不同的功能模块,以完成以上描述的全部或者部分功能。另外,上述实施例提供的视频处理装置与视频处理方法实施例属于同一构思,其具体实现过程详见方法实施例,这里不再赘述。
在本申请实施例中，计算机设备可被配置为终端或者服务器。若计算机设备被配置为终端，则由终端作为执行主体来实施本申请实施例提供的技术方案。若计算机设备被配置为服务器，则由服务器作为执行主体来实施本申请实施例提供的技术方案，或者，通过终端和服务器之间的交互来实施本申请实施例提供的技术方案，本申请实施例对此不加以限定。
若计算机设备被配置为终端,图6示出了本申请一个示例性实施例提供的终端600的结构框图。通常,终端600包括有:处理器601和存储器602。
处理器601可以包括一个或多个处理核心,比如4核心处理器、8核心处理器等。处理器601可以采用DSP(Digital Signal Processing,数字信号处理)、FPGA(Field-Programmable Gate Array,现场可编程门阵列)、PLA(Programmable Logic Array,可编程逻辑阵列)中的至少一种硬件形式来实现。
存储器602可以包括一个或多个计算机可读存储介质，该计算机可读存储介质可以是非暂态的。在一些实施例中，存储器602中的非暂态的计算机可读存储介质用于存储至少一条程序代码，该至少一条程序代码用于被处理器601所执行以实现本申请中方法实施例提供的视频处理方法。
在一些实施例中,终端600还可选包括有:外围设备接口603和至少一个外围设备。处理器601、存储器602和外围设备接口603之间可以通过总线或信号线相连。各个外围设备可以通过总线、信号线或电路板与外围设备接口603相连。具体地,外围设备包括:射频电路604、显示屏605、音频电路606中的至少一种。
外围设备接口603可被用于将I/O(Input/Output,输入/输出)相关的至少一个外围设备连接到处理器601和存储器602。
显示屏605用于显示UI(User Interface,用户界面)。该UI可以包括图形、文本、图标、视频及其它们的任意组合。当显示屏605是触摸显示屏时,显示屏605还具有采集在显示屏605的表面或表面上方的触摸信号的能力。该触摸信号可以作为控制信号输入至处理器601进行处理。此时,显示屏605还可以用于提供虚拟按钮和/或虚拟键盘,也称软按钮和/或软键盘。
音频电路606可以包括麦克风和扬声器。麦克风用于采集用户及环境的声波,并将声波转换为电信号输入至处理器601进行处理,或者输入至射频电路604以实现语音通信。出于立体声采集或降噪的目的,麦克风可以为多个,分别设置在终端600的不同部位。麦克风还可以是阵列麦克风或全向采集型麦克风。扬声器则用于将来自处理器601或射频电路604的电信号转换为声波。
在一些实施例中,终端600还包括有一个或多个压力传感器607。压力传感器607可以设置在终端600的侧边框和/或显示屏605的下层。当压力传感器607设置在终端600的侧边框时,可以检测用户对终端600的握持信号,由处理器601根据压力传感器607采集的握持信号进行左右手识别或快捷操作。当压力传感器607设置在显示屏605的下层时,由处理器601根据用户对显示屏605的压力操作,实现对UI界面上的可操作性控件进行控制。可操作性控件包括按钮控件、滚动条控件、图标控件、菜单控件中的至少一种。
本领域技术人员可以理解,图6中示出的结构并不构成对终端600的限定,可以包括比图示更多或更少的组件,或者组合某些组件,或者采用不同的组件布置。
若计算机设备被配置为服务器,图7是本申请实施例提供的一种服务器的框图,该服务器700可因配置或性能不同而产生比较大的差异,可以包括一个或一个以上处理器(Central Processing Units,CPU)701和一个或一个以上的存储器702,其中,存储器702中存储有至少一条程序代码,至少一条程序代码由处理器701加载并执行以实现上述各个方法实施例提供的视频处理方法。当然,该服务器还可以具有有线或无线网络接口、键盘以及输入输出接口等部件,以便进行输入输出,该服务器还可以包括其他用于实现设备功能的部件,在此不做赘述。
本申请实施例还提供了一种计算机设备，该计算机设备包括处理器和存储器，存储器中存储有至少一条程序代码，该至少一条程序代码由处理器加载并执行，以实现如下步骤：
响应于作用在第一图像上的交互操作,获取所述交互操作对应的调整参数,所述调整参数指示基于所述交互操作对所述第一图像中像素点的显示位置的调整幅度,所述第一图像为所播放的视频中当前显示的图像;
获取所述第一图像的像素点的位移参数,所述位移参数表示所述像素点在所述第一图像与第二图像之间的位移,所述第二图像为在所述第一图像之后显示的图像;
基于所述调整参数和所述位移参数,调整所述第一图像中像素点的显示位置;
基于所述像素点调整后的显示位置,显示所述第二图像。
可选地,该至少一条程序代码由处理器加载并执行,以实现如下步骤:
响应于作用在所述第一图像上的交互操作,获取所述交互操作的作用力度;
基于所述交互操作的作用力度,确定与所述作用力度匹配的所述调整参数。
可选地,该至少一条程序代码由处理器加载并执行,以实现如下步骤:
基于参考作用力度、所述参考作用力度对应的参考调整参数、所述交互操作的作用力度,确定所述调整参数;
其中,所述调整参数与所述参考调整参数呈正相关,所述调整参数与所述参考作用力度呈负相关,所述调整参数与所述交互操作的作用力度呈正相关。
可选地,该至少一条程序代码由处理器加载并执行,以实现如下步骤:
将所述第一图像输入到光流估计模型中,得到所述第一图像的像素点的位移参数;
对所述视频的编码数据进行解码,得到所述第一图像的像素点的位移参数,所述编码数据包括编码后的所述位移参数。
可选地,该至少一条程序代码由处理器加载并执行,以实现如下步骤:
基于所述调整参数和所述位移参数,调整所述第一图像中所述交互操作所作用的像素点的显示位置。
可选地,该至少一条程序代码由处理器加载并执行,以实现如下步骤:
基于所述调整参数和所述位移参数,确定目标偏移参数;
基于所述目标偏移参数所指示的偏移距离和偏移方向,调整所述第一图像中所述交互操作所作用的像素点的显示位置。
可选地,该至少一条程序代码由处理器加载并执行,以实现如下步骤:
获取所述交互操作对应的权重,所述权重表示所述交互操作对所述像素点的显示位置偏移的影响程度;
基于所述权重,对所述调整参数进行加权,基于加权后的所述调整参数和所述位移参数,调整所述第一图像中像素点的显示位置。
可选地,该至少一条程序代码由处理器加载并执行,以实现如下步骤:
响应于作用在所述第一图像上的交互操作,确定所述第一图像中所述交互操作所作用的第一对象;
从对象与音频数据的对应关系中,获取所述第一对象对应的音频数据;
播放所述第一对象对应的音频数据。
可选地,该至少一条程序代码由处理器加载并执行,以实现如下步骤:
响应于作用在所述第一图像上的所述交互操作,确定所述第一图像的至少一个第一像素区域,每个所述第一像素区域包含一个对象;
从所述至少一个第一像素区域中,确定所述交互操作所作用在的第一目标区域;
将所述第一目标区域中的对象确定为所述第一对象。
可选地,该至少一条程序代码由处理器加载并执行,以实现如下步骤:
基于所述至少一个第一像素区域内的像素点和所述像素点调整后的显示位置,确定所述第二图像的至少一个第二像素区域,一个第二像素区域与一个第一像素区域对应,所述第二像素区域中的像素点的原显示位置在对应的第一像素区域内;
响应于作用在所述第二图像上的交互操作,从所述至少一个第二像素区域中确定所述交互操作所作用在的第二目标区域;
将所述第二目标区域中的对象确定为第二对象,播放所述第二对象对应的音频数据。
可选地,该至少一条程序代码由处理器加载并执行,以实现如下步骤:
获取所述交互操作的作用力度对应的播放音量;
基于所述播放音量,播放所述第一对象对应的音频数据。
可选地,该至少一条程序代码由处理器加载并执行,以实现如下步骤:
确定所述视频中的主体对象;
从所述视频中存在所述主体对象的视频片段中,提取所述主体对象的音频数据;
生成所述主体对象与所述主体对象的音频数据的对应关系。
在示例性实施例中,还提供了一种计算机可读存储介质,该计算机可读存储介质中存储有至少一条程序代码,该至少一条程序代码可由计算机设备的处理器执行以实现如下步骤:
响应于作用在第一图像上的交互操作,获取所述交互操作对应的调整参数,所述调整参数指示基于所述交互操作对所述第一图像中像素点的显示位置的调整幅度,所述第一图像为所播放的视频中当前显示的图像;
获取所述第一图像的像素点的位移参数,所述位移参数表示所述像素点在所述第一图像与第二图像之间的位移,所述第二图像为在所述第一图像之后显示的图像;
基于所述调整参数和所述位移参数,调整所述第一图像中像素点的显示位置;
基于所述像素点调整后的显示位置,显示所述第二图像。
可选地,该至少一条程序代码可由计算机设备的处理器执行以实现如下步骤:
响应于作用在所述第一图像上的交互操作,获取所述交互操作的作用力度;
基于所述交互操作的作用力度,确定与所述作用力度匹配的所述调整参数。
可选地,该至少一条程序代码可由计算机设备的处理器执行以实现如下步骤:
基于参考作用力度、所述参考作用力度对应的参考调整参数、所述交互操作的作用力度,确定所述调整参数;
其中,所述调整参数与所述参考调整参数呈正相关,所述调整参数与所述参考作用力度呈负相关,所述调整参数与所述交互操作的作用力度呈正相关。
可选地,该至少一条程序代码可由计算机设备的处理器执行以实现如下步骤:
将所述第一图像输入到光流估计模型中,得到所述第一图像的像素点的位移参数;
对所述视频的编码数据进行解码,得到所述第一图像的像素点的位移参数,所述编码数据包括编码后的所述位移参数。
可选地,该至少一条程序代码可由计算机设备的处理器执行以实现如下步骤:
基于所述调整参数和所述位移参数,调整所述第一图像中所述交互操作所作用的像素点的显示位置。
可选地,该至少一条程序代码可由计算机设备的处理器执行以实现如下步骤:
基于所述调整参数和所述位移参数,确定目标偏移参数;
基于所述目标偏移参数所指示的偏移距离和偏移方向,调整所述第一图像中所述交互操作所作用的像素点的显示位置。
可选地,该至少一条程序代码可由计算机设备的处理器执行以实现如下步骤:
获取所述交互操作对应的权重,所述权重表示所述交互操作对所述像素点的显示位置偏移的影响程度;
基于所述权重,对所述调整参数进行加权,基于加权后的所述调整参数和所述位移参数,调整所述第一图像中像素点的显示位置。
可选地,该至少一条程序代码可由计算机设备的处理器执行以实现如下步骤:
响应于作用在所述第一图像上的交互操作,确定所述第一图像中所述交互操作所作用的第一对象;
从对象与音频数据的对应关系中,获取所述第一对象对应的音频数据;
播放所述第一对象对应的音频数据。
可选地,该至少一条程序代码可由计算机设备的处理器执行以实现如下步骤:
响应于作用在所述第一图像上的所述交互操作,确定所述第一图像的至少一个第一像素区域,每个所述第一像素区域包含一个对象;
从所述至少一个第一像素区域中,确定所述交互操作所作用在的第一目标区域;
将所述第一目标区域中的对象确定为所述第一对象。
可选地,该至少一条程序代码可由计算机设备的处理器执行以实现如下步骤:
基于所述至少一个第一像素区域内的像素点和所述像素点调整后的显示位置,确定所述第二图像的至少一个第二像素区域,一个第二像素区域与一个第一像素区域对应,所述第二像素区域中的像素点的原显示位置在对应的第一像素区域内;
响应于作用在所述第二图像上的交互操作,从所述至少一个第二像素区域中确定所述交互操作所作用在的第二目标区域;
将所述第二目标区域中的对象确定为第二对象,播放所述第二对象对应的音频数据。
可选地,该至少一条程序代码可由计算机设备的处理器执行以实现如下步骤:
获取所述交互操作的作用力度对应的播放音量;
基于所述播放音量,播放所述第一对象对应的音频数据。
可选地,该至少一条程序代码可由计算机设备的处理器执行以实现如下步骤:
确定所述视频中的主体对象;
从所述视频中存在所述主体对象的视频片段中,提取所述主体对象的音频数据;
生成所述主体对象与所述主体对象的音频数据的对应关系。
例如,计算机可读存储介质可以是ROM(Read-Only Memory,只读存储器)、RAM(Random Access Memory,随机存取存储器)、CD-ROM(Compact Disc Read-Only Memory,只读光盘)、磁带、软盘和光数据存储设备等。
本申请还提供了一种计算机程序产品或计算机程序,该计算机程序产品或计算机程序包括计算机程序代码,该计算机程序代码存储在计算机可读存储介质中,计算机设备的处理器从计算机可读存储介质读取该计算机程序代码,处理器执行该计算机程序代码,使得该计算机设备执行上述各个方法实施例中的视频处理方法。
本领域普通技术人员可以理解实现上述实施例的全部或部分步骤可以通过硬件来完成,也可以通过程序来指令相关的硬件完成,所述的程序可以存储于一种计算机可读存储介质中,上述提到的存储介质可以是只读存储器,磁盘或光盘等。
以上所述仅为本申请的可选实施例,并不用以限制本申请,凡在本申请的精神和原则之内,所作的任何修改、等同替换、改进等,均应包含在本申请的保护范围之内。

Claims (16)

  1. 一种视频处理方法,由计算机设备执行,所述方法包括:
    响应于作用在第一图像上的交互操作,获取所述交互操作对应的调整参数,所述调整参数指示基于所述交互操作对所述第一图像中像素点的显示位置的调整幅度,所述第一图像为所播放的视频中当前显示的图像;
    获取所述第一图像的像素点的位移参数,所述位移参数表示所述像素点在所述第一图像与第二图像之间的位移,所述第二图像为在所述第一图像之后显示的图像;
    基于所述调整参数和所述位移参数,调整所述第一图像中像素点的显示位置;
    基于所述像素点调整后的显示位置,显示所述第二图像。
  2. 根据权利要求1所述的方法,其中,所述响应于作用在所述第一图像上的交互操作,获取所述交互操作对应的调整参数,包括:
    响应于作用在所述第一图像上的交互操作,获取所述交互操作的作用力度;
    基于所述交互操作的作用力度,确定与所述作用力度匹配的所述调整参数。
  3. 根据权利要求2所述的方法,其中,所述基于所述交互操作的作用力度,确定与所述作用力度匹配的所述调整参数,包括:
    基于参考作用力度、所述参考作用力度对应的参考调整参数、所述交互操作的作用力度,确定所述调整参数;
    其中,所述调整参数与所述参考调整参数呈正相关,所述调整参数与所述参考作用力度呈负相关,所述调整参数与所述交互操作的作用力度呈正相关。
  4. 根据权利要求1所述的方法,其中,所述获取所述第一图像的像素点的位移参数,包括下述任一项:
    将所述第一图像输入到光流估计模型中,得到所述第一图像的像素点的位移参数;
    对所述视频的编码数据进行解码,得到所述第一图像的像素点的位移参数,所述编码数据包括编码后的所述位移参数。
  5. 根据权利要求1所述的方法,其中,所述基于所述调整参数和所述位移参数,调整所述第一图像中像素点的显示位置,包括:
    基于所述调整参数和所述位移参数,调整所述第一图像中所述交互操作所作用的像素点的显示位置。
  6. 根据权利要求5所述的方法,其中,所述基于所述调整参数和所述位移参数,调整所述第一图像中所述交互操作所作用的像素点的显示位置,包括:
    基于所述调整参数和所述位移参数,确定目标偏移参数;
    基于所述目标偏移参数所指示的偏移距离和偏移方向,调整所述第一图像中所述交互操作所作用的像素点的显示位置。
  7. 根据权利要求1所述的方法,其中,所述基于所述调整参数和所述位移参数,调整所述第一图像中像素点的显示位置,包括:
    获取所述交互操作对应的权重,所述权重表示所述交互操作对所述像素点的显示位置偏移的影响程度;
    基于所述权重，对所述调整参数进行加权，基于加权后的所述调整参数和所述位移参数，调整所述第一图像中像素点的显示位置。
  8. 根据权利要求1所述的方法,其中,所述方法还包括:
    响应于作用在所述第一图像上的交互操作,确定所述第一图像中所述交互操作所作用的第一对象;
    从对象与音频数据的对应关系中,获取所述第一对象对应的音频数据;
    播放所述第一对象对应的音频数据。
  9. 根据权利要求8所述的方法,其中,所述响应于作用在所述第一图像上的交互操作,确定所述交互操作所作用的第一对象,包括:
    响应于作用在所述第一图像上的所述交互操作,确定所述第一图像的至少一个第一像素区域,每个所述第一像素区域包含一个对象;
    从所述至少一个第一像素区域中,确定所述交互操作所作用在的第一目标区域;
    将所述第一目标区域中的对象确定为所述第一对象。
  10. 根据权利要求9所述的方法,其中,所述方法还包括:
    基于所述至少一个第一像素区域内的像素点和所述像素点调整后的显示位置,确定所述第二图像的至少一个第二像素区域,一个第二像素区域与一个第一像素区域对应,所述第二像素区域中的像素点的原显示位置在对应的第一像素区域内;
    响应于作用在所述第二图像上的交互操作,从所述至少一个第二像素区域中确定所述交互操作所作用在的第二目标区域;
    将所述第二目标区域中的对象确定为第二对象,播放所述第二对象对应的音频数据。
  11. 根据权利要求8所述的方法,其中,所述播放所述第一对象对应的音频数据,包括:
    获取所述交互操作的作用力度对应的播放音量;
    基于所述播放音量,播放所述第一对象对应的音频数据。
  12. 根据权利要求8所述的方法,其中,所述方法还包括:
    确定所述视频中的主体对象;
    从所述视频中存在所述主体对象的视频片段中,提取所述主体对象的音频数据;
    生成所述主体对象与所述主体对象的音频数据的对应关系。
  13. 一种视频处理装置,其特征在于,所述装置包括:
    第一获取模块,用于响应于作用在第一图像上的交互操作,获取所述交互操作对应的调整参数,所述调整参数指示基于所述交互操作对所述第一图像中像素点的显示位置的调整幅度,所述第一图像为所播放的视频中当前显示的图像;
    第二获取模块,用于获取所述第一图像的像素点的位移参数,所述位移参数表示所述像素点在所述第一图像与第二图像之间的位移,所述第二图像为在所述第一图像之后显示的图像;
    第二显示模块,用于基于所述调整参数和所述位移参数,调整所述第一图像中像素点的显示位置;
    所述第二显示模块,还用于基于所述像素点调整后的显示位置,显示所述第二图像。
  14. 一种计算机设备,所述计算机设备包括处理器和存储器,所述存储器中存储有至少一条程序代码,所述至少一条程序代码由所述处理器加载并执行,以实现如权利要求1-12任一项所述的视频处理方法。
  15. 一种计算机可读存储介质,所述计算机可读存储介质中存储有至少一条程序代码,所述至少一条程序代码由处理器加载并执行,以实现如权利要求1-12任一项所述的视频处理方法。
  16. 一种计算机程序产品,所述计算机程序产品包括计算机程序代码,所述计算机程序代码由处理器加载并执行,以实现如权利要求1-12任一项所述的视频处理方法。
PCT/CN2021/117982 2020-10-10 2021-09-13 视频处理方法、装置、计算机设备及存储介质 WO2022073409A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP21876929.7A EP4106337A4 (en) 2020-10-10 2021-09-13 VIDEO PROCESSING METHOD AND APPARATUS, COMPUTER DEVICE AND STORAGE MEDIUM
US17/963,879 US20230036919A1 (en) 2020-10-10 2022-10-11 Incorporating interaction actions into video display through pixel displacement

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011078356.9A CN112218136B (zh) 2020-10-10 2020-10-10 视频处理方法、装置、计算机设备及存储介质
CN202011078356.9 2020-10-10

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/963,879 Continuation US20230036919A1 (en) 2020-10-10 2022-10-11 Incorporating interaction actions into video display through pixel displacement


Also Published As

Publication number Publication date
EP4106337A1 (en) 2022-12-21
EP4106337A4 (en) 2023-10-18
US20230036919A1 (en) 2023-02-02
CN112218136A (zh) 2021-01-12
CN112218136B (zh) 2021-08-10
