WO2023071603A1 - Video fusion method and apparatus, electronic device, and storage medium - Google Patents

Video fusion method and apparatus, electronic device, and storage medium

Info

Publication number
WO2023071603A1
Authority
WO
WIPO (PCT)
Prior art keywords
fused
video frame
foreground
fusion
target
Prior art date
Application number
PCT/CN2022/119508
Other languages
English (en)
French (fr)
Inventor
吴泽寰
程京
苏再卿
杜绪晗
焦少慧
Original Assignee
北京字节跳动网络技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京字节跳动网络技术有限公司
Publication of WO2023071603A1

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 5/00 - Image enhancement or restoration
    • G06T 5/50 - Image enhancement or restoration by the use of more than one image, e.g. averaging, subtraction
    • G06T 3/053
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 - Image acquisition modality
    • G06T 2207/10016 - Video; Image sequence
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 - Special algorithmic details
    • G06T 2207/20212 - Image combination
    • G06T 2207/20221 - Image fusion; Image merging

Definitions

  • the present disclosure relates to the field of computer technology, for example, to a video fusion method, device, electronic equipment and storage medium.
  • Various forms of live video can be fused together, for example, by combining a live video stream with a virtual reality (VR) live broadcast, so as to output a digitally created stage live-broadcast effect.
  • Live video fusion often requires green-screen matting technology, that is, a large-scale green-screen scene needs to be built, which raises the threshold for live video fusion; at the same time, because the fusion is based on green-screen matting, the resulting images tend to be greenish, so the fusion effect looks unrealistic and the user experience is poor.
  • The present disclosure provides a video fusion method and apparatus, an electronic device, and a storage medium, so as to make video fusion broadly applicable and improve the user experience.
  • The present disclosure provides a video fusion method, the method comprising: receiving a foreground video frame to be fused in a foreground video stream to be fused and a background video frame to be fused in a background video stream to be fused; determining a fusion parameter corresponding to the background video frame to be fused; determining, according to a target foreground processing manner corresponding to the foreground video frame to be fused and the fusion parameter, a target object to be fused corresponding to the foreground video frame to be fused; and determining, according to the target object to be fused and the background video frame to be fused, a target fused video frame displayed on a target terminal.
  • the present disclosure also provides a video fusion device, which includes:
  • the video frame receiving module is configured to receive the foreground video frame to be fused in the foreground video stream to be fused and the background video frame to be fused in the background video stream to be fused;
  • a fusion parameter determination module, configured to determine a fusion parameter corresponding to the background video frame to be fused;
  • the fusion object determination module is configured to determine the target object to be fused corresponding to the foreground video frame to be fused according to the target foreground processing method corresponding to the foreground video frame to be fused and the fusion parameters;
  • the video fusion module is configured to determine the target fused video frame displayed on the target terminal according to the target object to be fused and the background video frame to be fused.
  • the present disclosure also provides an electronic device, the electronic device comprising:
  • one or more processors;
  • a storage device configured to store one or more programs
  • when the one or more programs are executed by the one or more processors, the one or more processors are caused to implement the above video fusion method.
  • the present disclosure also provides a storage medium containing computer-executable instructions for performing the above-mentioned video fusion method when executed by a computer processor.
  • FIG. 1 is a schematic flowchart of a video fusion method provided in Embodiment 1 of the present disclosure
  • FIG. 2 is a schematic flowchart of a video fusion method provided in Embodiment 2 of the present disclosure
  • FIG. 3 is a schematic structural diagram of a video fusion device provided by Embodiment 3 of the present disclosure.
  • FIG. 4 is a schematic structural diagram of an electronic device provided by Embodiment 4 of the present disclosure.
  • the term “comprise” and its variations are open-ended, i.e., “including but not limited to”.
  • the term “based on” is “based at least in part on”.
  • the term “one embodiment” means “at least one embodiment”; the term “another embodiment” means “at least one further embodiment”; the term “some embodiments” means “at least some embodiments.” Relevant definitions of other terms will be given in the description below.
  • This technical solution can be applied to scenarios such as fused live streaming and fused short-video shooting, for example, fusing two captured video streams, such as combining the video stream of scene 1 with the video stream of scene 2 in a blended scene.
  • Fig. 1 is a schematic flowchart of a video fusion method provided in Embodiment 1 of the present disclosure. This embodiment is applicable to situations where at least two video streams are fused. The method can be executed by a video fusion apparatus, which can be implemented in the form of software and/or hardware; the hardware may be an electronic device, such as a mobile terminal, a personal computer (PC), or a server.
  • the method of the present embodiment includes:
  • S110 Receive foreground video frames to be fused in the foreground video stream to be fused and background video frames to be fused in the background video stream to be fused.
  • In streaming-media-based communication or interaction schemes, audio and video are usually transferred between endpoints in a streaming manner; for example, multiple video frames are transmitted in the form of a video stream.
  • the video frame to which the object to be fused belongs can be used as the foreground video frame to be fused.
  • the video frame to which user A belongs may be used as the foreground video frame to be fused.
  • the video stream to which user A belongs is the foreground video stream to be fused.
  • The video frame into which the object to be fused will be merged is used as the background video frame to be fused.
  • the video to which each background video frame to be fused belongs is taken as the background video stream to be fused.
  • the video stream including user A in the room can be shot based on the camera device as the foreground video stream to be fused; at the same time, the video stream of the street scene is shot based on the camera device, As the background video stream to be fused.
  • The foreground video frames to be fused of the foreground video stream to be fused and the background video frames to be fused of the background video stream to be fused at the same moment can be processed, so as to fuse user A into the street scene.
  • If only images are to be fused, a video frame selected from the background video stream to be fused and a video frame selected from the foreground video stream to be fused can simply be fused according to the user's preference. If videos are to be fused, the video frames of the foreground video stream to be fused and the background video stream to be fused at the same moment can be fused in turn; alternatively, the video frames with the same frame index in the two streams can be processed to obtain the required target video frames, and the target video is then obtained from the multiple target video frames.
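  • As a minimal illustration of the per-frame pairing described above, the sketch below fuses two streams frame by frame; `fuse_frame` is a hypothetical stand-in for the per-frame processing of steps S120 to S140, and the two frame sequences are assumed to be pre-aligned by index or timestamp.

```python
def fuse_streams(foreground_frames, background_frames, fuse_frame):
    """Fuse two streams frame by frame.

    `foreground_frames` and `background_frames` are sequences of frames taken
    at the same moments (or sharing the same frame index); `fuse_frame` is a
    hypothetical per-frame routine standing in for steps S120-S140. The list
    of target fused video frames forms the target video.
    """
    return [fuse_frame(fg, bg) for fg, bg in zip(foreground_frames, background_frames)]
```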
  • For example, suppose the background video frame to be fused is a street scene at night, while the foreground video frame to be fused is shot indoors with all lights turned on; the environmental parameters of the night street scene and the indoor scene then differ considerably. If the target object from the indoor scene is fused directly into the night street scene, its brightness appears too strong, so the fidelity of the fusion is low.
  • the fusion parameters of the background video frames to be fused can be determined first, and the fusion parameters can be color temperature parameters or saturation parameters.
  • the advantage of determining the above parameters is that the parameters of the fusion object in the foreground video frame to be fused can be adjusted based on the fusion parameter, so that the fused video frame has a higher degree of realism.
  • After the background video frame to be fused is received, the fusion parameters of the background video frame to be fused may be determined based on a preset algorithm, for example, a color temperature parameter.
  • the determining a fusion parameter corresponding to the background video frame to be fused includes: determining a color temperature parameter of the background video frame to be fused.
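  • The disclosure does not fix a particular estimation algorithm for the color temperature parameter. The following Python sketch only illustrates one simple proxy, assuming a BGR frame and using the relative strength of the red and blue channels (which the description later identifies as the main drivers of color temperature); it is not the patent's actual method.

```python
import numpy as np

def estimate_color_temperature_proxy(background_frame_bgr: np.ndarray) -> float:
    """Return a crude warm/cool proxy for the background frame to be fused.

    Values above 1 suggest a warm (reddish) scene and values below 1 a cool
    (bluish) scene. BGR channel order is assumed; this is only an
    illustrative stand-in for the unspecified color temperature estimation.
    """
    frame = background_frame_bgr.astype(np.float64)
    mean_blue = frame[..., 0].mean() + 1e-6
    mean_red = frame[..., 2].mean() + 1e-6
    return mean_red / mean_blue
```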
  • the processing method for processing the foreground video frame to be fused may be used as the target foreground processing method.
  • the video types of the background video frame to be fused include multiple types, and the processing method corresponding to the video type is used as the target foreground processing method of the foreground video frame to be fused.
  • the target object to be fused may be the subject object in the foreground video frame to be fused.
  • The subject object can be preset. For example, if the user in the foreground video frame to be fused is to be merged into the background video frame to be fused, the processed user can be used as the target object to be fused; if both the pet and the user in the foreground video frame to be fused are to be merged into the background video frame to be fused, then both the processed pet and the processed user can be used as target objects to be fused.
  • According to the video type of the background video frame to be fused, a target foreground processing manner for processing the foreground video frame to be fused can be determined. Based on the target foreground processing manner, the foreground video frame to be fused is extracted and processed to obtain the fusion object to be processed. According to the fusion parameter of the background video frame to be fused and the fusion object to be processed, the target object to be fused corresponding to the foreground video frame to be fused can be determined.
  • Based on the color temperature parameter in the fusion parameters, white balance processing can be performed on the fusion object to be processed in the foreground video frame to be fused to obtain the target object to be fused.
  • S140 Determine a target fused video frame displayed on the target terminal according to the target object to be fused and the background video frame to be fused.
  • the target fused video frame is the final video frame obtained by fusing the foreground video frame to be fused and the background video frame to be fused.
  • The above steps can be repeatedly performed to obtain multiple target fused video frames, and the target video is obtained from the multiple target fused video frames.
  • the target object to be fused can be added to the background video frame to be fused to obtain the target fused video frame displayed on the target terminal.
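  • A straightforward way to "add the target object to be fused to the background video frame" is per-pixel alpha blending. The sketch below assumes the target object comes with an alpha matte from the matting step; it illustrates one plausible compositing step, not necessarily the exact one used in the disclosure.

```python
import numpy as np

def composite_target_object(background: np.ndarray,
                            target_object: np.ndarray,
                            alpha: np.ndarray) -> np.ndarray:
    """Blend the target object to be fused over the background video frame.

    `background` and `target_object` are H x W x 3 uint8 frames of the same
    size; `alpha` is an H x W matte in [0, 1], e.g. produced by the matting
    step. Per-pixel alpha blending is one straightforward way to "add" the
    target object to the background frame.
    """
    a = alpha.astype(np.float64)[..., None]
    fused = a * target_object.astype(np.float64) + (1.0 - a) * background.astype(np.float64)
    return np.clip(fused, 0, 255).astype(np.uint8)
```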
  • In the technical solution of this embodiment of the present disclosure, after the foreground video frame to be fused and the background video frame to be fused are received, the fusion parameter of the background video frame to be fused and the fusion object to be processed in the foreground video frame to be fused can be determined respectively. The fusion object to be processed is processed based on the fusion parameter to determine the target object to be fused, and the target fused video frame is obtained based on the target object to be fused and the background video frame to be fused. This solves the problems in the related art that video fusion requires green-screen matting technology, that is, a green screen must be built, which raises the threshold for setting up a live broadcast and reduces universality, and that, because a green screen is used, the fused image tends to be greenish and differs considerably from the real image, so the fused result looks unrealistic. Without building a green screen, the parameters of the fusion object to be processed in the foreground video frame to be fused can be adjusted based on the fusion parameter of the background video stream to be fused, so that the resulting target fused video frame is closer to the real image, which improves the realism of the displayed image and the user experience.
  • Fig. 2 is a schematic flow chart of a video fusion method provided by Embodiment 2 of the present disclosure.
  • On the basis of the foregoing embodiment, the foreground video frame to be fused can be processed according to the video type of the background video frame to be fused to obtain the target object to be fused; for the specific implementation, refer to the detailed description of this technical solution. Technical terms that are the same as or correspond to those in the foregoing embodiment are not repeated here.
  • the method includes:
  • At least one of the following three manners may be used to determine the background video frame to be fused in the background video stream to be fused.
  • the first implementation manner may be: according to the obtained depth video stream of the target scene, determine a 3D model corresponding to the target scene, and determine the background video frame to be fused according to the 3D model.
  • In order to obtain, while collecting the color image of the scene, a depth image that can reflect the geometric depth information of the scene, at least two cameras need to be deployed for shooting: one camera is set to acquire the color image of the target scene and the other is set to acquire the depth image of the target scene, and the depth video stream is finally integrated from the images in the data transmission channels corresponding to the two cameras.
  • For example, the two cameras can be aimed at the target scene from the same viewing angle, so that a depth video stream containing color images and depth images is obtained.
  • the computing terminal can use these data as raw data to construct a 3D model.
  • The 3D model is the three-dimensional model corresponding to the target scene; it is a polygonal representation of the target scene (and the objects in the scene) and reflects at least the 3D geometric structure of the scene (and the objects within it).
  • Since a 3D model can be constructed from each frame of the depth video stream, and the stream contains multiple real-time frames of the target scene, the constructed 3D model is a dynamic 3D video model. Based on a selected viewpoint, the scene is rendered from the 3D video model to obtain the background video frame to be fused.
  • In practical applications, to improve the efficiency and accuracy of image information collection, more cameras can also be deployed for the target scene from multiple viewing angles; the deployment can be selected according to task requirements and is not limited in the embodiments of the present disclosure.
  • the 3D model can be determined first according to the depth video stream, and then the scene is drawn according to the selected viewing angle to obtain the background video frame to be fused.
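  • The disclosure does not specify how the 3D model is built from the depth video stream. As one hedged illustration, the sketch below back-projects a single aligned color-and-depth frame into a colored point cloud, assuming a pinhole camera with known intrinsics (fx, fy, cx, cy); a real pipeline would additionally fuse such points over time and across views.

```python
import numpy as np

def depth_frame_to_point_cloud(depth: np.ndarray, color: np.ndarray,
                               fx: float, fy: float, cx: float, cy: float):
    """Back-project one aligned color/depth frame into a colored point set.

    `depth` holds per-pixel depth (H x W, in metres, 0 where invalid) and
    `color` is the aligned H x W x 3 color frame. A pinhole camera with known
    intrinsics is assumed; a real system would additionally fuse points over
    time to obtain the dynamic 3D video model described above.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth.reshape(-1).astype(np.float64)
    x = (u.reshape(-1) - cx) * z / fx
    y = (v.reshape(-1) - cy) * z / fy
    points = np.stack([x, y, z], axis=1)
    valid = z > 0
    return points[valid], color.reshape(-1, 3)[valid]
```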
  • the second implementation manner may be: render the virtual scene according to preset camera rendering parameters, and use the rendered video frame corresponding to the virtual scene as the background video frame to be fused.
  • For 3D fusion or space fusion, it is necessary to project the real-shot objects to be fused into the virtual scene space.
  • The camera parameters and motion information of the camera that shoots the real image are bound to those of the virtual camera in the three-dimensional space, so as to achieve the purpose of fusion.
  • For example, the ground on which the target user actually stands is made to coincide with the ground in the virtual space, so that the user does not appear to leave the ground when walking in the virtual scene, which would degrade the displayed image.
  • the camera rendering parameters are parameters for shooting the virtual scene.
  • the background video frame to be fused is the video frame obtained after the virtual scene is rendered and processed based on the camera rendering parameters.
  • The rendering parameters, that is, the virtual camera parameters in the virtual space, can be set in advance. Based on the virtual camera parameters, the virtual scene can be rendered to obtain a video frame corresponding to the virtual scene, and this video frame is used as the background video frame to be fused.
  • a third implementation manner may be: based on the two-dimensional video stream of the target area captured by the camera device, each video frame in the two-dimensional video stream is used as the background video frame to be fused.
  • the two-dimensional video stream may be a video stream captured by a common camera device.
  • Which scene the object to be fused is to be merged into can be determined in advance. After the scene is determined, the two-dimensional video stream of the target scene can be shot with the camera device, and each video frame in the captured two-dimensional video stream can be used as a background video frame to be fused.
  • S220 Receive foreground video frames to be fused in the foreground video stream to be fused and background video frames to be fused in the background video stream to be fused.
  • the methods of obtaining the background video frames to be fused are different, and correspondingly, the video types of the background video frames to be fused are also different.
  • the processing methods of the foreground video frames to be fused are also different.
  • the processing method of the foreground video frame to be fused can be determined according to the video type of the background video frame to be fused, and this processing method can be used as the target foreground processing method.
  • In this embodiment there are at least three ways of determining the background video frame to be fused, so there are also at least three video types: the three-dimensional video stream type, the two-dimensional video stream type, and the virtual video stream type. Correspondingly, the processing manners include a depth estimation sub-manner, an image space fusion sub-manner, and a matting algorithm corresponding to the three-dimensional video stream type; a matting algorithm corresponding to the two-dimensional video stream type; and a foreground camera parameter change sub-manner corresponding to the virtual video stream type.
  • S240 Process the to-be-fused foreground video frame according to the target foreground processing manner and the fusion parameters to obtain a target to-be-fused object.
  • In this embodiment, if the video type is a three-dimensional video stream type, processing the fusion parameter and the foreground video frame to be fused based on the target foreground processing manner may be performed as follows: determine the fusion object to be processed in the foreground video frame to be fused based on the matting algorithm; determine the depth information to be processed of the foreground video frame to be fused based on the depth estimation sub-manner; and process the depth information to be processed, the fusion object to be processed, and the fusion parameter based on the image space fusion sub-manner to determine the target object to be fused.
  • the matting algorithm can use a deep learning algorithm.
  • The input of this deep learning algorithm is the original image of the foreground video frame to be fused, and it returns the refined alpha channel corresponding to the portrait.
  • Based on the pixels in this alpha channel, the fusion object to be processed can be determined.
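  • The matting network itself is treated here as a black box. The sketch below shows one way its refined alpha channel could be used to cut out the fusion object to be processed; `matting_model` is a hypothetical callable introduced for illustration, not an API named in the disclosure.

```python
import numpy as np

def extract_fusion_object(foreground_frame: np.ndarray, matting_model):
    """Cut the fusion object to be processed out of the foreground frame.

    `matting_model` is a hypothetical callable standing in for the
    deep-learning matting network: it takes the original frame and returns a
    refined H x W alpha matte in [0, 1]. Pixels with non-zero alpha make up
    the fusion object to be processed.
    """
    alpha = matting_model(foreground_frame)                # H x W float matte
    cutout = foreground_frame.astype(np.float64) * alpha[..., None]
    return cutout.astype(np.uint8), alpha
```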
  • The depth estimation sub-manner may determine the depth value corresponding to the fusion object in the foreground video frame to be fused, so that when the object is fused into the background video frame to be fused, it does not appear to float.
  • the depth information of the foreground video frame to be fused can be determined, and this depth information is used as the depth information to be processed.
  • The image space fusion sub-manner is a way of processing the depth information to be processed, the fusion object to be processed, and the fusion parameter. For example, adaptive white balance processing is performed on the fusion object to be processed according to the fusion parameter, and, based on the depth information to be processed, the white-balanced object is then subjected to spatial fusion processing to obtain the target object to be fused.
  • The adaptive white balance corrects scenes that are over-exposed or insufficiently lit. Here, automatic gamma correction white balance is used: the mean of the grayscale image is computed, the gamma parameter is calculated from it (gamma = math.log10(0.5)/math.log10(mean/255)), and the gamma-corrected image is then color-balanced (by removing a certain proportion of extreme colors and redistributing the remaining value range) to correct color casts, such as a washed-out look, that the white balance step might otherwise introduce.
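  • The following sketch implements the gamma rule quoted above followed by a simple per-channel clip-and-stretch color balance; the clipping percentage and the use of the overall pixel mean as the "grayscale mean" are assumptions, since the description does not pin those details down.

```python
import math
import numpy as np

def adaptive_white_balance(image: np.ndarray, clip_percent: float = 1.0) -> np.ndarray:
    """Auto gamma correction followed by a simple per-channel color balance.

    The gamma follows the rule quoted above:
        gamma = log10(0.5) / log10(mean / 255),
    where `mean` is the mean of the grayscale image (approximated here by the
    overall pixel mean). The color-balance step clips `clip_percent` of
    extreme values in each channel and re-stretches the remaining range.
    """
    img = image.astype(np.float64)
    mean_norm = min(max(img.mean(), 1.0), 254.0) / 255.0   # guard against log10(0) and log10(1)
    gamma = math.log10(0.5) / math.log10(mean_norm)
    corrected = 255.0 * (img / 255.0) ** gamma

    balanced = np.empty_like(corrected)
    for c in range(corrected.shape[2]):
        lo = np.percentile(corrected[..., c], clip_percent)
        hi = np.percentile(corrected[..., c], 100.0 - clip_percent)
        channel = np.clip(corrected[..., c], lo, hi)
        balanced[..., c] = (channel - lo) / max(hi - lo, 1e-6) * 255.0
    return balanced.astype(np.uint8)
```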
  • If the video type includes a two-dimensional video stream, the target foreground processing manner is a matting algorithm.
  • Correspondingly, processing the foreground video frame to be fused based on the target foreground processing manner to obtain the target object to be fused includes: determining, based on the matting algorithm, the fusion object to be processed in the foreground video frame to be fused; and adjusting the white balance of the fusion object to be processed based on the fusion parameter to determine the target object to be fused.
  • Based on the matting algorithm, the fusion object to be processed in the foreground video frame to be fused can be determined.
  • At the same time, based on the color temperature parameter in the fusion parameters, the white balance of the fusion object to be processed is adjusted, so as to obtain a target object to be fused that is compatible with the background video frame to be fused.
  • If the video type includes a virtual video stream type generated based on a virtual scene, the target foreground processing manner includes a foreground camera parameter change sub-manner.
  • Determining the target object to be fused can then be performed as follows: obtain the virtual camera parameters of the background video frame to be fused corresponding to the virtual video stream type; determine the foreground camera parameters corresponding to the foreground video frame to be fused; process the foreground camera parameters and the virtual camera parameters based on the foreground camera parameter change sub-manner to obtain target transformation parameters; and adjust the fusion object to be processed in the foreground video frame to be fused based on the target transformation parameters and the fusion parameter to obtain the target object to be fused.
  • the foreground camera parameters may be camera parameters when shooting foreground video frames to be fused.
  • The target transformation parameters are transformation parameters obtained after processing the foreground camera parameters based on the virtual camera parameters corresponding to the background video frame to be fused. Based on the target transformation parameters and the fusion parameter, the fusion object to be processed can be processed to obtain the target object to be fused.
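  • The parameterisation of the camera parameters is not given in the disclosure. Assuming, purely for illustration, that both the foreground camera and the virtual camera are described by 4 x 4 camera-to-world extrinsic matrices, one plausible reading of the "target transformation parameters" is the relative transform sketched below.

```python
import numpy as np

def target_transform(foreground_cam_to_world: np.ndarray,
                     virtual_cam_to_world: np.ndarray) -> np.ndarray:
    """Relative transform taking foreground-camera coordinates into the
    virtual camera's frame.

    Both inputs are assumed to be 4 x 4 homogeneous camera-to-world matrices.
    The result can be used to re-position the fusion object to be processed
    so that it is consistent with the virtual scene's camera.
    """
    world_to_virtual = np.linalg.inv(virtual_cam_to_world)
    return world_to_virtual @ foreground_cam_to_world
```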
  • The benefit and purpose of determining the color temperature parameter among the fusion parameters is as follows: the color temperatures of the captured foreground video frame to be fused and the background video frame to be fused differ, and the main factor affecting color temperature is the distribution of red and blue, so color temperature migration is performed between the foreground video frame to be fused and the background video frame to be fused. Processing only the RGB channels of the red-green-blue (RGB) color space can lead to color imbalance, so both frames can be converted to the LAB color space (where a* represents the green-to-red component and b* the blue-to-yellow component) and the corresponding a and b channels matched, that is, the corresponding channels of the foreground video frame are redistributed according to the mean and variance of the corresponding channels of the background video frame.
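  • The LAB-based color temperature migration described above can be sketched as follows, assuming 8-bit BGR inputs and OpenCV's LAB conversion; matching the a and b channels by mean and standard deviation follows the description, while leaving the lightness channel untouched is an assumption.

```python
import cv2
import numpy as np

def transfer_color_temperature(foreground: np.ndarray, background: np.ndarray) -> np.ndarray:
    """Shift the foreground's color temperature towards the background's.

    Both inputs are 8-bit BGR images. The frames are converted to LAB and the
    foreground's a and b channels (green-red and blue-yellow components) are
    re-distributed to the background's mean and standard deviation; the
    lightness channel is left unchanged.
    """
    fg = cv2.cvtColor(foreground, cv2.COLOR_BGR2LAB).astype(np.float64)
    bg = cv2.cvtColor(background, cv2.COLOR_BGR2LAB).astype(np.float64)
    for c in (1, 2):
        fg_mean, fg_std = fg[..., c].mean(), fg[..., c].std() + 1e-6
        bg_mean, bg_std = bg[..., c].mean(), bg[..., c].std()
        fg[..., c] = (fg[..., c] - fg_mean) / fg_std * bg_std + bg_mean
    fg = np.clip(fg, 0, 255).astype(np.uint8)
    return cv2.cvtColor(fg, cv2.COLOR_LAB2BGR)
```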
  • the target terminal may be a terminal device watching the live broadcast.
  • After the target object to be fused corresponding to the foreground video frame to be fused is obtained, it can be fused into the background video frame to be fused to obtain the target fused video frame displayed on the target terminal.
  • In the technical solution of this embodiment of the present disclosure, after the foreground video frame to be fused and the background video frame to be fused are received, the fusion parameter of the background video frame to be fused and the fusion object to be processed in the foreground video frame to be fused can be determined respectively. The fusion object to be processed is processed based on the fusion parameter to determine the target object to be fused, and the target fused video frame is obtained based on the target object to be fused and the background video frame to be fused. This solves the problems in the related art that video fusion requires green-screen matting technology, that is, a green screen must be built, which raises the threshold for setting up a live broadcast and reduces universality, and that, because a green screen is used, the fused image tends to be greenish and differs considerably from the real image, so the fused result looks unrealistic. Without building a green screen, the parameters of the fusion object to be processed in the foreground video frame to be fused can be adjusted based on the fusion parameter of the background video stream to be fused, so that the resulting target fused video frame is closer to the real image, which improves the realism of the displayed image and the user experience.
  • FIG. 3 is a schematic structural diagram of a video fusion device provided in Embodiment 3 of the present disclosure, which can execute the video fusion method provided in any embodiment of the present disclosure, and has corresponding functional modules and effects for executing the method.
  • the device includes: a video frame receiving module 310 , a fusion parameter determination module 320 , a fusion object determination module 330 and a video fusion module 340 .
  • the video frame receiving module 310 is configured to receive the foreground video frame to be fused in the foreground video stream to be fused and the background video frame to be fused in the background video stream to be fused;
  • the fusion parameter determination module 320 is configured to determine the fusion parameter corresponding to the background video frame to be fused;
  • the fusion object determination module 330 is configured to determine, according to the target foreground processing manner corresponding to the foreground video frame to be fused and the fusion parameter, the target object to be fused corresponding to the foreground video frame to be fused;
  • the video fusion module 340 is configured to determine the target fused video frame displayed on the target terminal according to the target object to be fused and the background video frame to be fused.
  • the fusion parameter determination module 320 is configured to: determine the color temperature parameter of the background video frame to be fused.
  • the apparatus also includes a to-be-fused background video frame determination module, which is configured to:
  • determine, according to the acquired depth video stream of the target scene, the three-dimensional model corresponding to the target scene, and determine the background video frame to be fused according to the three-dimensional model; or render the virtual scene according to preset camera rendering parameters, and use the rendered video frame corresponding to the virtual scene as the background video frame to be fused; or, based on a two-dimensional video stream of a target area captured by a camera device, use each video frame in the two-dimensional video stream as the background video frame to be fused.
  • the fusion object determination module 330 is configured to: determine the target foreground processing manner corresponding to the foreground video frame to be fused according to the video type of the background video frame to be fused; and process the foreground video frame to be fused according to the target foreground processing manner and the fusion parameter to obtain the target object to be fused.
  • the video type includes a three-dimensional video stream type
  • the target foreground processing method includes a depth estimation sub-mode, an image space fusion sub-mode and a matting algorithm
  • the fusion object determination module 330 includes:
  • the fused object determination unit to be processed is configured to determine the fused object to be processed in the foreground video frame to be fused based on the matting algorithm;
  • the depth information determination unit is configured to determine, based on the depth estimation sub-manner, the depth information to be processed of the foreground video frame to be fused;
  • the fusion object determination unit is configured to process the depth information to be processed, the fusion object to be processed, and the fusion parameter based on the image space fusion sub-manner, and determine the target object to be fused.
  • the video type includes a two-dimensional video stream
  • the fusion object determination module 330 includes: a to-be-processed fusion object determination unit, configured to determine the fusion object to be processed in the foreground video frame to be fused based on the matting algorithm; and a fusion object determination unit, configured to adjust the white balance of the fusion object to be processed based on the fusion parameter to determine the target object to be fused.
  • the video type includes a virtual video stream type generated based on a virtual scene
  • the fusion object determination module 330 includes: a virtual camera parameter determination unit, configured to obtain the virtual camera parameters of the background video frame to be fused corresponding to the virtual video stream type; a foreground camera parameter determination unit, configured to determine the foreground camera parameters corresponding to the foreground video frame to be fused; and a fusion object determination unit, configured to process the foreground camera parameters and the virtual camera parameters based on the foreground camera parameter change sub-manner to obtain target transformation parameters, and to adjust the fusion object to be processed in the foreground video frame to be fused based on the target transformation parameters and the fusion parameter to obtain the target object to be fused.
  • the video fusion module 340 is configured to: add the target object to be fused to the background video frame to be fused to obtain a target fused video frame displayed on the target terminal.
  • In the technical solution of this embodiment of the present disclosure, after the foreground video frame to be fused and the background video frame to be fused are received, the fusion parameter of the background video frame to be fused and the fusion object to be processed in the foreground video frame to be fused can be determined respectively. The fusion object to be processed is processed based on the fusion parameter to determine the target object to be fused, and the target video frame is obtained based on the target object to be fused and the background video frame to be fused. This solves the problems in the related art that video fusion requires green-screen matting technology, that is, a green screen must be built, which raises the threshold for setting up a live broadcast and reduces universality, and that, because a green screen is used, the fused image tends to be greenish and differs considerably from the real image, so the fused result looks unrealistic. Without building a green screen, the parameters of the fusion object to be processed in the foreground video frame to be fused can be adjusted based on the fusion parameter of the background video stream to be fused, so that the resulting target video frame is closer to the real image, which improves the realism of the displayed image and the user experience.
  • the video fusion device provided in the embodiments of the present disclosure can execute the video fusion method provided in any embodiment of the present disclosure, and has corresponding functional modules and effects for executing the method.
  • the multiple units and modules included in the above-mentioned apparatus are only divided according to functional logic, but are not limited to the above division, as long as the corresponding functions can be realized; in addition, the names of the multiple functional units are only for the convenience of distinguishing them from each other, and are not intended to limit the protection scope of the embodiments of the present disclosure.
  • FIG. 4 is a schematic structural diagram of an electronic device provided by Embodiment 4 of the present disclosure.
  • The terminal device 400 in the embodiments of the present disclosure may include, but is not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, personal digital assistants (PDAs), tablet computers (PADs), portable multimedia players (PMPs), and vehicle-mounted terminals (such as vehicle-mounted navigation terminals), as well as fixed terminals such as digital televisions (TVs) and desktop computers.
  • the electronic device 400 shown in FIG. 4 is only an example, and should not limit the functions and application scope of the embodiments of the present disclosure.
  • As shown in FIG. 4, the electronic device 400 may include a processing device (such as a central processing unit or a graphics processing unit) 401, which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 402 or a program loaded from a storage device 408 into a random access memory (RAM) 403.
  • In the RAM 403, various programs and data necessary for the operation of the electronic device 400 are also stored.
  • the processing device 401, the ROM 402, and the RAM 403 are connected to each other through a bus 404.
  • An input/output (I/O) interface 405 is also connected to the bus 404.
  • Generally, the following devices can be connected to the I/O interface 405: an input device 406 including, for example, a touch screen, touchpad, keyboard, mouse, camera, microphone, accelerometer, or gyroscope; an output device 407 including, for example, a liquid crystal display (LCD), speaker, or vibrator; a storage device 408 including, for example, a magnetic tape or hard disk; and a communication device 409.
  • the communication means 409 may allow the electronic device 400 to perform wireless or wired communication with other devices to exchange data.
  • Although FIG. 4 shows the electronic device 400 having various devices, it is not required to implement or possess all of the devices shown; more or fewer devices may alternatively be implemented or provided.
  • embodiments of the present disclosure include a computer program product, which includes a computer program carried on a non-transitory computer readable medium, where the computer program includes program code for executing the method shown in the flowchart.
  • the computer program may be downloaded and installed from a network via communication means 409, or from storage means 408, or from ROM 402.
  • When the computer program is executed by the processing device 401, the above-mentioned functions defined in the methods of the embodiments of the present disclosure are performed.
  • The electronic device provided by this embodiment of the present disclosure belongs to the same concept as the video fusion method provided by the foregoing embodiments; for technical details not described in detail in this embodiment, refer to the foregoing embodiments, and this embodiment has the same effects as the foregoing embodiments.
  • An embodiment of the present disclosure provides a computer storage medium, on which a computer program is stored, and when the program is executed by a processor, the video fusion method provided in the foregoing embodiments is implemented.
  • the computer-readable medium mentioned above in the present disclosure may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the above two.
  • a computer readable storage medium may be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof.
  • Examples of computer-readable storage media may include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, RAM, ROM, an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
  • a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in conjunction with an instruction execution system, apparatus, or device.
  • a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave carrying computer-readable program code therein. Such propagated data signals may take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing.
  • a computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium, which can send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device.
  • the program code contained on the computer readable medium can be transmitted by any appropriate medium, including but not limited to: electric wire, optical cable, radio frequency (Radio Frequency, RF), etc., or any suitable combination of the above.
  • In some embodiments, the client and the server can communicate using any currently known or future-developed network protocol, such as the Hypertext Transfer Protocol (HTTP), and can be interconnected with digital data communication in any form or medium (e.g., a communication network).
  • Examples of communication networks include a local area network (LAN), a wide area network (WAN), an internetwork (e.g., the Internet), and a peer-to-peer network (e.g., an ad hoc peer-to-peer network), as well as any currently known or future-developed network.
  • the above-mentioned computer-readable medium may be included in the above-mentioned electronic device, or may exist independently without being incorporated into the electronic device.
  • The above-mentioned computer-readable medium carries one or more programs, and when the one or more programs are executed by the electronic device, the electronic device: receives a foreground video frame to be fused in a foreground video stream to be fused and a background video frame to be fused in a background video stream to be fused; determines a fusion parameter corresponding to the background video frame to be fused; determines, according to a target foreground processing manner corresponding to the foreground video frame to be fused and the fusion parameter, a target object to be fused corresponding to the foreground video frame to be fused; and determines, according to the target object to be fused and the background video frame to be fused, a target fused video frame displayed on a target terminal.
  • Computer program code for carrying out operations of the present disclosure may be written in one or more programming languages, or combinations thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages.
  • the program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server.
  • In the case involving a remote computer, the remote computer can be connected to the user's computer through any kind of network, including a LAN or WAN, or it can be connected to an external computer (e.g., through the Internet using an Internet service provider).
  • Each block in the flowcharts or block diagrams may represent a module, program segment, or portion of code that contains one or more executable instructions for implementing the specified logical functions.
  • the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or they may sometimes be executed in the reverse order, depending upon the functionality involved.
  • each block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
  • the units involved in the embodiments described in the present disclosure may be implemented by software or by hardware.
  • the name of the unit does not constitute a limitation on the unit itself in one case, for example, the fusion parameter determination module can also be described as "a module for determining the fusion parameter corresponding to the background video frame to be fused".
  • Exemplary types of hardware logic components that can be used include field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), application specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), and so on.
  • a machine-readable medium may be a tangible medium that may contain or store a program for use by or in conjunction with an instruction execution system, apparatus, or device.
  • a machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium.
  • a machine-readable medium may include, but is not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatus, or devices, or any suitable combination of the foregoing. Examples of machine-readable storage media would include an electrical connection based on one or more wires, a portable computer disk, a hard disk, RAM, ROM, EPROM or flash memory, an optical fiber, a CD-ROM, an optical storage device, a magnetic storage device, or any suitable combination of the above.
  • Example 1 provides a video fusion method, which includes: receiving a foreground video frame to be fused in a foreground video stream to be fused and a background video frame to be fused in a background video stream to be fused; determining a fusion parameter corresponding to the background video frame to be fused; determining, according to a target foreground processing manner corresponding to the foreground video frame to be fused and the fusion parameter, a target object to be fused corresponding to the foreground video frame to be fused; and determining, according to the target object to be fused and the background video frame to be fused, a target fused video frame displayed on a target terminal.
  • Example 2 provides a video fusion method, which further includes: the determining the fusion parameter corresponding to the background video frame to be fused includes: determining a color temperature parameter of the background video frame to be fused.
  • Example 3 provides a video fusion method, which further includes: determining the background video frame to be fused includes: determining, according to an acquired depth video stream of a target scene, a three-dimensional model corresponding to the target scene, and determining the background video frame to be fused according to the three-dimensional model; or rendering a virtual scene according to preset camera rendering parameters, and using a rendered video frame corresponding to the virtual scene as the background video frame to be fused; or, based on a two-dimensional video stream of a target area captured by a camera device, using each video frame in the two-dimensional video stream as the background video frame to be fused.
  • Example 4 provides a video fusion method, which further includes: the determining, according to the target foreground processing manner corresponding to the foreground video frame to be fused and the fusion parameter, the target object to be fused corresponding to the foreground video frame to be fused includes: determining, according to the video type of the background video frame to be fused, the target foreground processing manner corresponding to the foreground video frame to be fused; and processing the foreground video frame to be fused according to the target foreground processing manner and the fusion parameter to obtain the target object to be fused.
  • Example 5 provides a video fusion method, which further includes: the video type includes a three-dimensional video stream type, and the target foreground processing manner includes a depth estimation sub-manner, an image space fusion sub-manner, and a matting algorithm; the processing the foreground video frame to be fused according to the target foreground processing manner and the fusion parameter to obtain the target object to be fused includes: determining, based on the matting algorithm, the fusion object to be processed in the foreground video frame to be fused; determining, based on the depth estimation sub-manner, the depth information to be processed of the foreground video frame to be fused; and processing, based on the image space fusion sub-manner, the depth information to be processed, the fusion object to be processed, and the fusion parameter to determine the target object to be fused.
  • Example 6 provides a video fusion method, which further includes: the video type includes a two-dimensional video stream type, and the target foreground processing manner includes a matting algorithm; the processing the foreground video frame to be fused according to the target foreground processing manner and the fusion parameter to obtain the target object to be fused includes: determining, based on the matting algorithm, the fusion object to be processed in the foreground video frame to be fused; and adjusting the white balance of the fusion object to be processed based on the fusion parameter to determine the target object to be fused.
  • Example 7 provides a video fusion method, which further includes: the video type includes a virtual video stream type generated based on a virtual scene, and the target foreground processing manner includes a foreground camera parameter change sub-manner; the processing the foreground video frame to be fused according to the target foreground processing manner and the fusion parameter to obtain the target object to be fused includes: obtaining virtual camera parameters of the background video frame to be fused corresponding to the virtual video stream type; determining foreground camera parameters corresponding to the foreground video frame to be fused; processing the foreground camera parameters and the virtual camera parameters based on the foreground camera parameter change sub-manner to obtain target transformation parameters; and adjusting the fusion object to be processed in the foreground video frame to be fused based on the target transformation parameters and the fusion parameter to obtain the target object to be fused.
  • Example 8 provides a video fusion method, which further includes: the determining, according to the target object to be fused and the background video frame to be fused, the target fused video frame displayed on the target terminal includes: adding the target object to be fused to the background video frame to be fused to obtain the target fused video frame displayed on the target terminal.
  • Example 9 provides a video fusion device, which includes:
  • the video frame receiving module is configured to receive the foreground video frame to be fused in the foreground video stream to be fused and the background video frame to be fused in the background video stream to be fused;
  • a fusion parameter determination module, configured to determine a fusion parameter corresponding to the background video frame to be fused;
  • the fusion object determination module is configured to determine the target object to be fused corresponding to the foreground video frame to be fused according to the target foreground processing method corresponding to the foreground video frame to be fused and the fusion parameters;
  • the video fusion module is configured to determine the target fused video frame displayed on the target terminal according to the target object to be fused and the background video frame to be fused.

Abstract

Disclosed herein are a video fusion method and apparatus, an electronic device, and a storage medium. The video fusion method includes: receiving a foreground video frame to be fused in a foreground video stream to be fused and a background video frame to be fused in a background video stream to be fused; determining a fusion parameter corresponding to the background video frame to be fused; determining, according to a target foreground processing manner corresponding to the foreground video frame to be fused and the fusion parameter, a target object to be fused corresponding to the foreground video frame to be fused; and determining, according to the target object to be fused and the background video frame to be fused, a target fused video frame displayed on a target terminal.

Description

Video fusion method and apparatus, electronic device, and storage medium
This application claims priority to Chinese patent application No. 202111243155.4, filed with the Chinese Patent Office on October 25, 2021, the entire contents of which are incorporated herein by reference.
Technical Field
The present disclosure relates to the field of computer technology, for example, to a video fusion method and apparatus, an electronic device, and a storage medium.
Background
With the development of multimedia technology, and in order to meet users' diverse demands for video, multiple forms of live video can be fused together, for example, by combining a live video stream with a virtual reality (VR) live broadcast, so as to output a digitally created stage live-broadcast effect.
However, live video fusion usually requires green-screen matting, that is, a large-scale green-screen scene has to be built, which raises the threshold for live video fusion; at the same time, because the fusion is based on green-screen matting, the resulting images tend to be greenish, so the fusion effect looks unrealistic and the user experience is poor.
Summary
The present disclosure provides a video fusion method and apparatus, an electronic device, and a storage medium, so as to make video fusion broadly applicable and improve the user experience.
本公开提供了一种视频融合方法,该方法包括:
接收待融合前景视频流中的待融合前景视频帧和待融合背景视频流中的待融合背景视频帧;
确定与所述待融合背景视频帧相对应的融合参数;
根据与所述待融合前景视频帧相对应的目标前景处理方式和所述融合参数,确定与所述待融合前景视频帧相对应的目标待融合对象;
根据所述目标待融合对象以及所述待融合背景视频帧,确定显示在目标终端的目标融合视频帧。
本公开还提供了一种视频融合装置,该装置包括:
视频帧接收模块,设置为接收待融合前景视频流中的待融合前景视频帧和待融合背景视频流中的待融合背景视频帧;
融合参数确定模块,设置为确定与所述待融合背景视频帧相对应的融合参数;
融合对象确定模块,设置为根据与所述待融合前景视频帧相对应的目标前景处理方式和所述融合参数,确定与所述待融合前景视频帧相对应的目标待融合对象;
视频融合模块,设置为根据所述目标待融合对象以及所述待融合背景视频帧,确定显示在目标终端的目标融合视频帧。
本公开还提供了一种电子设备,所述电子设备包括:
一个或多个处理器;
存储装置,设置为存储一个或多个程序;
当所述一个或多个程序被所述一个或多个处理器执行,使得所述一个或多个处理器实现上述的视频融合方法。
本公开还提供了一种包含计算机可执行指令的存储介质,所述计算机可执行指令在由计算机处理器执行时用于执行上述的视频融合方法。
附图说明
图1为本公开实施例一所提供的一种视频融合方法的流程示意图;
图2为本公开实施例二所提供的一种视频融合方法的流程示意图;
图3为本公开实施例三所提供的一种视频融合装置的结构示意图;
图4为本公开实施例四所提供的一种电子设备的结构示意图。
具体实施方式
下面将参照附图描述本公开的实施例。虽然附图中显示了本公开的一些实施例,然而本公开可以通过多种形式来实现,提供这些实施例是为了理解本公开。本公开的附图及实施例仅用于示例性作用。
本公开的方法实施方式中记载的多个步骤可以按照不同的顺序执行,和/或并行执行。此外,方法实施方式可以包括附加的步骤和/或省略执行示出的步骤。本公开的范围在此方面不受限制。
本文使用的术语“包括”及其变形是开放性包括,即“包括但不限于”。术语“基于”是“至少部分地基于”。术语“一个实施例”表示“至少一个实施例”;术语“另一实施例”表示“至少一个另外的实施例”;术语“一些实施例”表示“至少一些实施例”。其他术语的相关定义将在下文描述中给出。
本公开中提及的“第一”、“第二”等概念仅用于对不同的装置、模块或单元进行区分,并非用于限定这些装置、模块或单元所执行的功能的顺序或者相互依存关系。需要注意,本公开中提及的“一个”、“多个”的修饰是示意性而非限制性的,本领域技术人员应当理解,除非在上下文另有指出,否则应该理解为“一个或多个”。
本公开实施方式中的多个装置之间所交互的消息或者信息的名称仅用于说明性的目的,而并不是用于对这些消息或信息的范围进行限制。
在介绍本技术方案之前,可以先对应用场景进行示例性说明。可以将本技术方案应用在融合直播场景、融合短视频拍摄等场景中,如,将拍摄的两个视频流进行融合的场景,例如,将拍摄的场景1的视频流和场景2的视频流进行融合的场景中。
实施例一
图1为本公开实施例一所提供的一种视频融合方法的流程示意图,本实施例可适用于对至少两个视频流进行融合的情形,该方法可以由视频融合装置来执行,该装置可以通过软件和/或硬件的形式实现,该硬件可以是电子设备,如移动终端、个人电脑(Personal Computer,PC)端或服务器等。
如图1所示,本实施例的方法包括:
S110、接收待融合前景视频流中的待融合前景视频帧和待融合背景视频流中的待融合背景视频帧。
在基于流媒体的通信或交互方案中,多个端口间的音视频传递通常采用流式传输的方式,例如,将多个视频帧以视频流的形式进行传输。
可以将待融合对象所属的视频帧作为待融合前景视频帧。如,将视频帧中的用户A融合到另一视频帧中,则可以将用户A所属的视频帧作为待融合前景视频帧。相应的,用户A所属的视频流作为待融合前景视频流。将待融合对象将要融入的视频帧作为待融合背景视频帧。相应的,将每个待融合背景视频帧所属的视频作为待融合背景视频流。
示例性的,如果需要将室内的用户A融合到街头场景中,则可以基于摄像装置拍摄室内包括用户A的视频流,作为待融合前景视频流;同时,基于摄像装置拍摄街头场景的视频流,作为待融合背景视频流。可以对同一时刻待融合前景视频流的待融合前景视频帧和待融合背景视频流的待融合背景视频帧进行处理,以实现将用户A融合多街头场景中。
如果仅仅是图像的融合,则可以根据用户的偏好将从待融合背景视频流中选取的一个视频帧和从待融合前景视频流中选取的一个视频帧进行融合处理即 可。如果是视频的融合,则可以依次对同一时刻的待融合前景视频流和待融合背景视频流的视频帧融合处理;也可以是,将待融合前景视频流和待融合背景视频流中同一帧数的视频帧进行处理,以得到最终所需要的目标视频帧,进而根据多个目标视频帧得到目标视频。
S120、确定与待融合背景视频帧相对应的融合参数。
在实际应用中,待融合前景视频帧和待融合背景视频帧的拍摄参数和环境参数是存在一定的差异的,因此,直接将得到的待融合前景视频帧和待融合背景视频帧融合时,存在真实度较低的问题。示例性的,待融合背景视频帧是夜间街头场景,待融合前景视频帧是在室内所有灯光都打开的条件下拍摄的,即夜间街头场景和室内场景的环境参数是存在较大差异的,如果将室内场景的目标对象融合到夜间街头场景时,存在目标对象的亮度较强,从而达到融合真实度较低的问题。
为了解决上述问题,在确定待融合背景视频流中的待融合背景视频帧时,可以先确定待融合背景视频帧的融合参数,融合参数可以是色温参数或者饱和度参数等。确定上述参数的好处在于,可以基于融合参数调整待融合前景视频帧中待处理融合对象的参数,进而使融合得到的视频帧有较高的真实度。
在接收到待融合背景视频帧后,可以基于预先设置的算法确定待融合背景视频帧中的融合参数,例如,色温参数等。
在本实施例中,所述确定与所述待融合背景视频帧相对应的融合参数,包括:确定所述待融合背景视频帧的色温参数。
S130、根据与待融合前景视频帧相对应的目标前景处理方式和融合参数,确定与待融合前景视频帧相对应的目标待融合对象。
可以将对待融合前景视频帧进行处理的处理方式作为目标前景处理方式。待融合背景视频帧的视频类型包括多种,将与视频类型相对应的处理方式,作为待融合前景视频帧的目标前景处理方式。目标待融合对象可以是待融合前景视频帧中的主体对象。主体对象可以是预先设置的,例如,如果要将待融合前景视频帧中的用户融入到待融合背景视频帧,则可以将处理后的用户作为目标待融合对象;如果要将待融合前景视频帧中的宠物和用户都融合到待融合背景视频帧中,则可以将处理后的宠物和用户均作为目标待融合对象。
根据待融合背景视频帧的视频类型,可以确定对待融合前景视频帧进行处理的目标前景处理方式。基于目标前景处理方式对待融合前景视频帧进行提取处理,得到待处理融合对象。根据待融合背景视频帧的融合参数和待处理融合对象,可以确定与待融合前景视频帧相对应的目标待融合对象。
基于融合参数中的色温参数,可以对待融合前景视频帧的待处理融合对象进行白平衡处理,得到目标待融合对象。
S140、根据目标待融合对象以及待融合背景视频帧,确定显示在目标终端的目标融合视频帧。
目标融合视频帧为将待融合前景视频帧和待融合背景视频帧融合得到的最终视频帧。
可以重复执行上述步骤得到多个目标融合视频帧,并根据多个目标融合视频帧,得到目标视频。
在确定目标待融合对象后,可以将目标待融合对象添加至待融合背景视频帧中,得到显示在目标终端的目标融合视频帧。
本公开实施例的技术方案,在接收到待融合前景视频帧和待融合背景视频帧后,可以分别确定出待融合背景视频帧的融合参数,以及待融合前景视频帧的待处理融合对象。基于融合参数对待处理融合对象进行处理,确定目标待融合对象,基于目标待融合对象和待融合背景视频帧,得到目标视频帧,解决了相关技术中进行视频融合时,需要引入绿幕抠图技术,即需要搭建绿幕,导致直播搭建门槛较高,普适性较差的问题,同时,由于是搭建的绿幕,因此融合之后的图像比较偏绿,与实际图像存在较大的差异,导致融合后真实度较低的问题,实现了在无需搭建绿幕的条件下,可以基于待融合背景视频流的融合参数,调整待融合前景视频帧中待处理融合对象的参数,以使融合得到的目标融合视频帧与实际图像相似度更高,提高了图像显示真实性以及用户使用体验的技术效果。
实施例二
图2为本公开实施例二所提供的一种视频融合方法流程示意图,在前述实施例的基础上,可以根据待融合背景视频帧的视频类型对待融合前景视频帧进行处理,得到目标待融合对象,其具体的实施方式可以参见本技术方案的详细阐述。其中,与上述实施例相同或者相应的技术术语在此不再赘述。
如图2所示,所述方法包括:
S210、确定待融合背景视频流中的待融合背景视频帧。
在本实施例中,确定待融合背景视频流中的待融合背景视频帧可以采用下述三种方式中的至少一种。
第一种实施方式可以是:根据获取到的目标场景的深度视频流,确定与所 述目标场景相对应的三维模型,并根据所述三维模型确定所述待融合背景视频帧。
为了在采集场景彩色图像的同时,得到能够反应该场景几何深度信息的深度图像,用于拍摄的相机至少需要部署两台,一台相机设置为获取目标场景的彩色图像,另一台相机设置为获取目标场景的深度图像,最后基于与两个相机所对应数据传输通道中的图像整合出深度视频流。示例性的,可以将上述两台相机在相同的视角下瞄准目标场景进行拍摄,进而得到包含彩色图像和深度图像的深度视频流。计算端接收到深度视频流之后,可以将这些数据作为原始数据进而构建出三维模型,三维模型即是与目标场景相对应的三维模型,是目标场景(以及场景内物体)的多边形表示,至少可以反映场景(以及场景内物体)的三维几何结构。同时,基于深度视频流中的每一帧画面都可以构建出三维模型,而视频流中又包含目标场景的多帧实时画面,因此,所构建的三维模型也是动态的三维视频模型。基于视点视角选定,对三维视频模型进行场景绘制,得到待融合背景视频帧。
在实际应用过程中,为了提升图像信息采集的效率和准确度,还可以针对目标场景从多个视角部署更多的相机,部署方式可以根据任务要求进行选择,本公开实施例在此不做限定。
在基于深度视频流确定待融合背景视频帧时,可以先根据深度视频流确定三维模型,进而根据选定的视角进行场景绘制,得到待融合背景视频帧。
第二种实施方式可以是:根据预先设置的相机渲染参数对虚拟场景进行渲染,将渲染后的所述虚拟场景对应的视频帧作为所述待融合背景视频帧。
对于三维融合/空间融合,需要将实拍的待融合对象投影到虚拟场景空间中。将拍摄真实影像的摄像机和三维空间中的虚拟摄像机的相机参数和运动信息绑定,从而达到融合的目的。例如,将目标用户真实站立的地面和虚拟空间的地面重合,避免用户在虚拟场景中走动时,脱离地面,导致显示画面不佳的问题。
相机渲染参数为对虚拟场景进行拍摄的参数。待融合背景视频帧是基于相机渲染参数对虚拟场景渲染处理后,得到的视频帧。
可以预先设置渲染参数,即虚拟空间中的虚拟相机参数。基于虚拟相机参数可以对虚拟场景进行渲染处理,得到与虚拟场景相对应的视频帧,并将此时得到的视频帧作为待融合背景视频帧。
第三种实施方式可以是:基于摄像装置拍摄目标区域的二维视频流,将所述二维视频流中的每个视频帧作为所述待融合背景视频帧。
二维视频流可以为基于普通的摄像装置拍摄的视频流。
可以预先确定要将待融合对象融合到哪一个场景中,在确定场景后,可以基于摄像装置拍摄目标场景的二维视频流,并将拍摄到的二维视频流中的每一个视频帧作为待融合背景视频帧。
S220、接收待融合前景视频流中的待融合前景视频帧和待融合背景视频流中的待融合背景视频帧。
S230、根据所述待融合背景视频帧的视频类型,确定与所述待融合前景视频帧相对应的目标前景处理方式。
基于上述可知,待融合背景视频帧的得到方式不同,相应的,待融合背景视频帧的视频类型也不相同,同时,在视频类型不同的情况下,对待融合前景视频帧的处理方式也不同。
可以根据待融合背景视频帧的视频类型,确定对待融合前景视频帧的处理方式,并将此处理方式作为目标前景处理方式。
在本实施例中,待融合背景视频帧的确定方式有至少三种,那么视频类型也有至少三种类型。视频类型至少包括三维视频流、二维视频流以及虚拟视频流类型。相应的,处理方式包括与三维视频流类型相对应的深度估算子方式、图像空间融合子方式和抠图算法;与二维视频流相对应的抠图算法;以及与虚拟视频流类型相对应的前景相机变化子方式。
S240、根据所述目标前景处理方式和所述融合参数对所述待融合前景视频帧进行处理,得到目标待融合对象。
在本实施例中,若视频类型为三维视频流类型,基于目标前景处理方式对融合参数和待融合前景视频帧进行处理的方式可以是:基于抠图算法,确定待融合前景视频帧中的待处理融合对象;基于深度估算子方式确定待融合前景视频帧的待处理深度信息;基于图像空间融合子方式对待处理深度信息、待处理融合对象和融合参数进行处理,确定目标待融合对象。
抠图算法可以采用的是深度学习算法。该深度学习算法的输入是待融合前景视频帧的原图,返回的是人像对应的精细化alpha通道。基于该通道中的像素点,可以确定待处理融合对象。深度估算子方式可以为,确定待融合前景视频帧中待处理融合对象所对应的深度值,从而将其融合到待融合背景视频帧时,避免用户飘起来的情况。基于深度估算子方式可以确定出待融合前景视频帧的深度信息,并将此深度信息作为待处理深度信息。图像空间融合子方式可以是对待处理深度信息、待处理融合对象和融合参数进行处理的方式,例如,根据融合参数对待处理融合对象进行自适应白平衡处理,同时,基于待处理深度信息将自适应白平衡处理后的对象进行空间融合处理,得到目标待融合对象。
在本实施例中,自适应白平衡可以为:针对一些采光过曝或者补光不足的场景进行矫正,此处采用自动gamma矫正白平衡,通过灰度图计算图像的均值,再根据均值对gamma参数进行计算(gamma=math.log10(0.5)/math.log10(mean/255)),然后对gamma矫正后的图片进行色彩平衡(通过去掉一定的比例极端颜色,再将剩余区间的色彩重新分布)调整来解决白平衡可能引入的图像色彩的偏差(如泛白)。
在本实施例中,若视频类型包括二维视频流,则目标前景处理方法为抠图算法。相应的,基于目标前景处理方式对待融合前景视频帧进行处理,得到目标待融合对象,包括:基于所述抠图算法,确定所述待融合前景视频帧中的待处理融合对象;基于所述融合参数对所述待处理融合对象进行白平衡调整,确定所述目标待融合对象。
基于抠图算法可以确定待融合前景视频帧中的待处理融合对象。同时,基于融合参数中的色温参数,对待处理融合对象进行白平衡调整,从而得到使得待处理融合对象与待融合背景视频帧相适配的目标待融合对象。
在本实施例中,若视频类型包括基于虚拟场景生成的虚拟视频类型,所述目标前景处理方式包括前景相机参数变化子方式。确定目标待融合对象可以是:获取与虚拟视频流类型相对应的待融合背景视频帧的虚拟相机参数;确定与待融合前景视频帧相对应的前景相机参数;基于前景相机变化子方式对前景相机参数和虚拟相机参数进行处理,得到目标变换参数,并基于目标变换参数和融合参数调整待融合前景视频帧中的待处理融合对象,得到目标待融合对象。
前景相机参数可以为拍摄待融合前景视频帧时的相机参数。目标变换参数是基于与待融合背景视频帧相对应的虚拟相机参数,对前景相机参数处理后,得到的变换参数。基于目标变换参数和融合参数,可以对待处理融合对象进行处理,得到目标待融合对象。
在本实施例中,确定融合参数中的色温参数的好处和目的在于:考虑到采集待融合前景视频帧与待融合背景视频帧的色温有所不同,而影响色温的主要因素是红色与蓝色的分布,故对待融合前景视频帧和待融合背景视频帧进行色温迁移。仅采用红绿蓝(Red-Green-Blue,RGB)色彩空间中的RGB通道处理会出现颜色不均衡的情况,于是可以将待融合前景视频帧和待融合背景视频帧均转换到颜色模型(LAB)色彩空间(-a*代表从绿色到红色的分量,-b*代表从蓝色到黄色的分量),将对应的a,b通道进行匹配(即将待融合前景视频帧的对应通道按待融合背景视频帧对应通道的均值和方差重新分布)。
S250、根据所述目标待融合对象以及所述待融合背景视频帧,确定显示在目标终端的目标融合视频帧。
目标终端可以是观看直播的终端设备。
在得到与待融合前景视频帧相对应的目标待融合对象后,可以将目标待融合对象融合到待融合背景视频帧中,得到显示在目标终端的目标融合视频帧。
本公开实施例的技术方案,在接收到待融合前景视频帧和待融合背景视频帧后,可以分别确定出待融合背景视频帧的融合参数,以及待融合前景视频帧的待处理融合对象。基于融合参数对待处理融合对象进行处理,确定目标待融合对象,基于目标待融合对象和待融合背景视频帧,得到目标视频帧,解决了相关技术中进行视频融合时,需要引入绿幕抠图技术,即需要搭建绿幕,导致直播搭建门槛较高,普适性较差的问题,同时,由于是搭建的绿幕,因此融合之后的图像比较偏绿,与实际图像存在较大的差异,导致融合后真实度较低的问题,实现了在无需搭建绿幕的条件下,可以基于待融合背景视频流的融合参数,调整待融合前景视频帧中待处理融合对象的参数,以使融合得到的目标融合视频帧与实际图像相似度更高,提高了图像显示真实性以及用户使用体验的技术效果。
实施例三
图3为本公开实施例三所提供的一种视频融合装置的结构示意图,可执行本公开任意实施例所提供视频融合方法,具备执行方法相应的功能模块和效果。如图3所示,该装置包括:视频帧接收模块310、融合参数确定模块320、融合对象确定模块330以及视频融合模块340。
视频帧接收模块310,设置为接收待融合前景视频流中的待融合前景视频帧和待融合背景视频流中的待融合背景视频帧;融合参数确定模块320,设置为确定与所述待融合背景视频帧相对应的融合参数;融合对象确定模块330,设置为根据与所述待融合前景视频帧相对应的目标前景处理方式和所述融合参数,确定与所述待融合前景视频帧相对应的目标待融合对象;视频融合模块340,设置为根据所述目标待融合对象以及所述待融合背景视频帧,确定显示在目标终端的目标融合视频帧。
在上述技术方案的基础上,所述融合参数确定模块320,设置为:确定所述待融合背景视频帧的色温参数。
在上述技术方案的基础上,所述装置还包括待融合背景视频帧确定模块,设置为:
根据获取到的目标场景的深度视频流,确定与所述目标场景相对应的三维模型,并根据所述三维模型确定所述待融合背景视频帧;或,根据预先设置的 相机渲染参数对虚拟场景进行渲染,将渲染后的所述虚拟场景对应的视频帧作为所述待融合背景视频帧;或,基于摄像装置拍摄目标区域的二维视频流,将所述二维视频流中的每个视频帧作为所述待融合背景视频帧。
在上述技术方案的基础上,所述融合对象确定模块330,设置为:根据所述待融合背景视频帧的视频类型,确定与所述待融合前景视频帧相对应的目标前景处理方式;根据所述目标前景处理方式和所述融合参数对所述待融合前景视频帧进行处理,得到目标待融合对象。
在上述技术方案的基础上,所述视频类型包括三维视频流类型,所述目标前景处理方式包括深度估算子方式、图像空间融合子方式和抠图算法,所述融合对象确定模块330,包括:待处理融合对象确定单元,设置为基于所述抠图算法,确定所述待融合前景视频帧中的待处理融合对象;深度信息确定单元,设置为基于所述深度估算子方式确定所述待融合前景视频帧的待处理深度信息;融合对象确定单元,设置为基于所述图像空间融合子方式对所述待处理深度信息、待处理融合对象和所述融合参数进行处理,确定所述目标待融合对象。
在上述技术方案的基础上,所述视频类型包括二维视频流,所述融合对象确定模块330,包括:待处理融合对象确定单元,设置为基于所述抠图算法,确定所述待融合前景视频帧中的待处理融合对象;融合对象确定单元,设置为基于所述融合参数对所述待处理融合对象进行白平衡调整,确定所述目标待融合对象。
在上述技术方案的基础上,所述视频类型包括基于虚拟场景生成的虚拟视频流类型,所述融合对象确定模块330,包括:虚拟相机参数确定单元,设置为获取与所述虚拟视频流类型相对应的待融合背景视频帧的虚拟相机参数;前景相机参数确定单元,设置为确定与所述待融合前景视频帧相对应的前景相机参数;融合对象确定单元,设置为基于所述前景相机变化子方式对所述前景相机参数和所述虚拟相机参数进行处理,得到目标变换参数,并基于所述目标变换参数和所述融合参数调整所述待融合前景视频帧中的待处理融合对象,得到所述目标待融合对象。
在上述技术方案的基础上,所述视频融合模块340,设置为:将所述目标待融合对象添加到所述待融合背景视频帧中,得到显示在所述目标终端的目标融合视频帧。
本公开实施例的技术方案,在接收到待融合前景视频帧和待融合背景视频帧后,可以分别确定出待融合背景视频帧的融合参数,以及待融合前景视频帧的待处理融合对象。基于融合参数对待处理融合对象进行处理,确定目标待融合对象,基于目标待融合对象和待融合背景视频帧,得到目标视频帧,解决了 相关技术中进行视频融合时,需要引入绿幕抠图技术,即需要搭建绿幕,导致直播搭建门槛较高,普适性较差的问题,同时,由于是搭建的绿幕,因此融合之后的图像比较偏绿,与实际图像存在较大的差异,导致融合后真实度较低的问题,实现了在无需搭建绿幕的条件下,可以基于待融合背景视频流的融合参数,调整待融合前景视频帧中待处理融合对象的参数,以使融合得到的目标视频帧与实际图像相似度更高,提高了图像显示真实性以及用户使用体验的技术效果。
本公开实施例所提供的视频融合装置可执行本公开任意实施例所提供的视频融合方法,具备执行方法相应的功能模块和效果。
上述装置所包括的多个单元和模块只是按照功能逻辑进行划分的,但并不局限于上述的划分,只要能够实现相应的功能即可;另外,多个功能单元的名称也只是为了便于相互区分,并不用于限制本公开实施例的保护范围。
实施例四
图4为本公开实施例四所提供的一种电子设备的结构示意图。下面参考图4,其示出了适于用来实现本公开实施例的电子设备(例如图4中的终端设备或服务器)400的结构示意图。本公开实施例中的终端设备400可以包括但不限于诸如移动电话、笔记本电脑、数字广播接收器、个人数字助理(Personal Digital Assistant,PDA)、平板电脑(Portable Android Device,PAD)、便携式多媒体播放器(Portable Media Player,PMP)、车载终端(例如车载导航终端)等等的移动终端以及诸如数字电视(Television,TV)、台式计算机等等的固定终端。图4示出的电子设备400仅仅是一个示例,不应对本公开实施例的功能和使用范围带来任何限制。
如图4所示,电子设备400可以包括处理装置(例如中央处理器、图形处理器等)401,其可以根据存储在只读存储器(Read-Only Memory,ROM)402中的程序或者从存储装置408加载到随机访问存储器(Random Access Memory,RAM)403中的程序而执行多种适当的动作和处理。在RAM 403中,还存储有电子设备400操作所需的多种程序和数据。处理装置401、ROM 402以及RAM 403通过总线404彼此相连。编辑/输出(Input/Output,I/O)接口405也连接至总线404。
通常,以下装置可以连接至I/O接口405:包括例如触摸屏、触摸板、键盘、鼠标、摄像头、麦克风、加速度计、陀螺仪等的输入装置406;包括例如液晶显示器(Liquid Crystal Display,LCD)、扬声器、振动器等的输出装置407;包括 例如磁带、硬盘等的存储装置408;以及通信装置409。通信装置409可以允许电子设备400与其他设备进行无线或有线通信以交换数据。虽然图4示出了具有多种装置的电子设备400,并不要求实施或具备所有示出的装置。可以替代地实施或具备更多或更少的装置。
根据本公开的实施例,上文参考流程图描述的过程可以被实现为计算机软件程序。例如,本公开的实施例包括一种计算机程序产品,其包括承载在非暂态计算机可读介质上的计算机程序,该计算机程序包含用于执行流程图所示的方法的程序代码。在这样的实施例中,该计算机程序可以通过通信装置409从网络上被下载和安装,或者从存储装置408被安装,或者从ROM 402被安装。在该计算机程序被处理装置401执行时,执行本公开实施例的方法中限定的上述功能。
本公开实施方式中的多个装置之间所交互的消息或者信息的名称仅用于说明性的目的,而并不是用于对这些消息或信息的范围进行限制。
本公开实施例提供的电子设备与上述实施例提供的视频融合方法属于同一构思,未在本实施例中详尽描述的技术细节可参见上述实施例,并且本实施例与上述实施例具有相同的效果。
Embodiment 5
An embodiment of the present disclosure provides a computer storage medium on which a computer program is stored, and when the program is executed by a processor, the video fusion method provided in the above embodiments is implemented.
The above computer-readable medium of the present disclosure may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the two. The computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared or semiconductor system, apparatus or device, or any combination of the above. Examples of the computer-readable storage medium may include, but are not limited to: an electrical connection with one or more wires, a portable computer diskette, a hard disk, a RAM, a ROM, an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the present disclosure, a computer-readable storage medium may be any tangible medium containing or storing a program that can be used by or in combination with an instruction execution system, apparatus or device. In the present disclosure, a computer-readable signal medium may include a data signal propagated in a baseband or as part of a carrier wave, in which computer-readable program code is carried. Such a propagated data signal may take various forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. The computer-readable signal medium may also be any computer-readable medium other than the computer-readable storage medium, and the computer-readable signal medium can send, propagate or transmit a program for use by or in combination with an instruction execution system, apparatus or device. The program code contained on the computer-readable medium may be transmitted by any appropriate medium, including but not limited to: an electric wire, an optical cable, radio frequency (RF), etc., or any suitable combination of the above.
In some embodiments, the client and the server may communicate using any currently known or future-developed network protocol such as the HyperText Transfer Protocol (HTTP), and may be interconnected with digital data communication (e.g., a communication network) in any form or medium. Examples of communication networks include a local area network (LAN), a wide area network (WAN), an internetwork (e.g., the Internet) and a peer-to-peer network (e.g., an ad hoc peer-to-peer network), as well as any currently known or future-developed network.
The above computer-readable medium may be included in the above electronic device, or may exist separately without being assembled into the electronic device.
The above computer-readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: receive a foreground video frame to be fused in a foreground video stream to be fused and a background video frame to be fused in a background video stream to be fused; determine fusion parameters corresponding to the background video frame to be fused; determine, according to a target foreground processing method corresponding to the foreground video frame to be fused and the fusion parameters, a target object to be fused corresponding to the foreground video frame to be fused; and determine, according to the target object to be fused and the background video frame to be fused, a target fused video frame displayed on a target terminal.
The computer program code for executing the operations of the present disclosure may be written in one or more programming languages or a combination thereof. The above programming languages include, but are not limited to, object-oriented programming languages such as Java, Smalltalk and C++, and conventional procedural programming languages such as the "C" language or similar programming languages. The program code may be executed entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a LAN or a WAN, or may be connected to an external computer (e.g., through the Internet using an Internet service provider).
The flowcharts and block diagrams in the accompanying drawings illustrate the possible architectures, functions and operations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowcharts or block diagrams may represent a module, a program segment or a part of code, and the module, program segment or part of code contains one or more executable instructions for implementing the specified logical functions. It should also be noted that, in some alternative implementations, the functions marked in the blocks may also occur in an order different from that marked in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The units involved in the embodiments of the present disclosure may be implemented in software or in hardware. The name of a unit does not, in some cases, constitute a limitation on the unit itself; for example, the fusion parameter determination module may also be described as "a module that determines fusion parameters corresponding to the background video frame to be fused".
The functions described herein above may be executed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that can be used include: a field programmable gate array (FPGA), an application specific integrated circuit (ASIC), an application specific standard product (ASSP), a system on chip (SOC), a complex programmable logic device (CPLD), and the like.
In the context of the present disclosure, a machine-readable medium may be a tangible medium that may contain or store a program for use by or in combination with an instruction execution system, apparatus or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared or semiconductor system, apparatus or device, or any suitable combination of the above. Examples of the machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer disk, a hard disk, a RAM, a ROM, an EPROM or flash memory, an optical fiber, a CD-ROM, an optical storage device, a magnetic storage device, or any suitable combination of the above.
According to one or more embodiments of the present disclosure, [Example 1] provides a video fusion method, the method including:
receiving a foreground video frame to be fused in a foreground video stream to be fused and a background video frame to be fused in a background video stream to be fused;
determining fusion parameters corresponding to the background video frame to be fused;
determining, according to a target foreground processing method corresponding to the foreground video frame to be fused and the fusion parameters, a target object to be fused corresponding to the foreground video frame to be fused; and
determining, according to the target object to be fused and the background video frame to be fused, a target fused video frame displayed on a target terminal.
According to one or more embodiments of the present disclosure, [Example 2] provides a video fusion method, further including:
where the determining the fusion parameters corresponding to the background video frame to be fused includes:
determining a color temperature parameter of the background video frame to be fused.
According to one or more embodiments of the present disclosure, [Example 3] provides a video fusion method, further including:
determining the background video frame to be fused, including:
determining, according to an acquired depth video stream of a target scene, a three-dimensional model corresponding to the target scene, and determining the background video frame to be fused according to the three-dimensional model; or,
rendering a virtual scene according to preset camera rendering parameters, and taking a video frame corresponding to the rendered virtual scene as the background video frame to be fused; or,
capturing a two-dimensional video stream of a target area with a camera device, and taking each video frame in the two-dimensional video stream as the background video frame to be fused.
According to one or more embodiments of the present disclosure, [Example 4] provides a video fusion method, further including:
where the determining, according to the target foreground processing method corresponding to the foreground video frame to be fused and the fusion parameters, of the target object to be fused corresponding to the foreground video frame to be fused includes:
determining, according to a video type of the background video frame to be fused, the target foreground processing method corresponding to the foreground video frame to be fused; and
processing the foreground video frame to be fused according to the target foreground processing method and the fusion parameters to obtain the target object to be fused.
According to one or more embodiments of the present disclosure, [Example 5] provides a video fusion method, further including:
where the video type includes a three-dimensional video stream type, the target foreground processing method includes a depth estimation sub-method, an image space fusion sub-method and a matting algorithm, and the processing of the foreground video frame to be fused according to the target foreground processing method and the fusion parameters to obtain the target object to be fused includes:
determining, based on the matting algorithm, a fusion object to be processed in the foreground video frame to be fused;
determining, based on the depth estimation sub-method, depth information to be processed of the foreground video frame to be fused; and
processing the depth information to be processed, the fusion object to be processed and the fusion parameters based on the image space fusion sub-method, to determine the target object to be fused.
According to one or more embodiments of the present disclosure, [Example 6] provides a video fusion method, further including:
where the video type includes a two-dimensional video stream, the target foreground processing method includes a matting algorithm, and the processing of the foreground video frame to be fused according to the target foreground processing method and the fusion parameters to obtain the target object to be fused includes:
determining, based on the matting algorithm, a fusion object to be processed in the foreground video frame to be fused; and
performing white balance adjustment on the fusion object to be processed based on the fusion parameters, to determine the target object to be fused.
According to one or more embodiments of the present disclosure, [Example 7] provides a video fusion method, further including:
where the video type includes a virtual video stream type generated based on a virtual scene, the target foreground processing method includes a foreground camera parameter change sub-method, and the processing of the foreground video frame to be fused according to the target foreground processing method and the fusion parameters to obtain the target object to be fused includes:
acquiring virtual camera parameters of the background video frame to be fused corresponding to the virtual video stream type;
determining foreground camera parameters corresponding to the foreground video frame to be fused; and
processing the foreground camera parameters and the virtual camera parameters based on the foreground camera parameter change sub-method to obtain target transformation parameters, and adjusting a fusion object to be processed in the foreground video frame to be fused based on the target transformation parameters and the fusion parameters, to obtain the target object to be fused.
According to one or more embodiments of the present disclosure, [Example 8] provides a video fusion method, further including:
where the determining, according to the target object to be fused and the background video frame to be fused, of the target fused video frame displayed on the target terminal includes:
adding the target object to be fused to the background video frame to be fused, to obtain the target fused video frame displayed on the target terminal.
According to one or more embodiments of the present disclosure, [Example 9] provides a video fusion apparatus, the apparatus including:
a video frame receiving module, configured to receive a foreground video frame to be fused in a foreground video stream to be fused and a background video frame to be fused in a background video stream to be fused;
a fusion parameter determination module, configured to determine fusion parameters corresponding to the background video frame to be fused;
a fusion object determination module, configured to determine, according to a target foreground processing method corresponding to the foreground video frame to be fused and the fusion parameters, a target object to be fused corresponding to the foreground video frame to be fused; and
a video fusion module, configured to determine, according to the target object to be fused and the background video frame to be fused, a target fused video frame displayed on a target terminal.
In addition, although multiple operations are depicted in a particular order, this should not be understood as requiring that these operations be performed in the particular order shown or in sequential order. Under certain circumstances, multitasking and parallel processing may be advantageous. Likewise, although multiple implementation details are included in the above discussion, these should not be construed as limiting the scope of the present disclosure. Some features described in the context of separate embodiments may also be implemented in combination in a single embodiment. Conversely, various features described in the context of a single embodiment may also be implemented in multiple embodiments individually or in any suitable sub-combination.

Claims (11)

  1. A video fusion method, comprising:
    receiving a foreground video frame to be fused in a foreground video stream to be fused and a background video frame to be fused in a background video stream to be fused;
    determining fusion parameters corresponding to the background video frame to be fused;
    determining, according to a target foreground processing method corresponding to the foreground video frame to be fused and the fusion parameters, a target object to be fused corresponding to the foreground video frame to be fused; and
    determining, according to the target object to be fused and the background video frame to be fused, a target fused video frame displayed on a target terminal.
  2. The method according to claim 1, wherein the determining the fusion parameters corresponding to the background video frame to be fused comprises:
    determining a color temperature parameter of the background video frame to be fused.
  3. The method according to claim 1, wherein the background video frame to be fused is determined by:
    determining, according to an acquired depth video stream of a target scene, a three-dimensional model corresponding to the target scene, and determining the background video frame to be fused according to the three-dimensional model; or,
    rendering a virtual scene according to preset camera rendering parameters, and taking a video frame corresponding to the rendered virtual scene as the background video frame to be fused; or,
    capturing, by a camera device, a two-dimensional video stream of a target area, and taking each video frame in the two-dimensional video stream as the background video frame to be fused.
  4. The method according to claim 1, wherein the determining, according to the target foreground processing method corresponding to the foreground video frame to be fused and the fusion parameters, of the target object to be fused corresponding to the foreground video frame to be fused comprises:
    determining, according to a video type of the background video frame to be fused, the target foreground processing method corresponding to the foreground video frame to be fused; and
    processing the foreground video frame to be fused according to the target foreground processing method and the fusion parameters to obtain the target object to be fused.
  5. The method according to claim 4, wherein the video type comprises a three-dimensional video stream type, the target foreground processing method comprises a depth estimation sub-method, an image space fusion sub-method and a matting algorithm, and the processing of the foreground video frame to be fused according to the target foreground processing method and the fusion parameters to obtain the target object to be fused comprises:
    determining, based on the matting algorithm, a fusion object to be processed in the foreground video frame to be fused;
    determining, based on the depth estimation sub-method, depth information to be processed of the foreground video frame to be fused; and
    processing the depth information to be processed, the fusion object to be processed and the fusion parameters based on the image space fusion sub-method, to determine the target object to be fused.
  6. The method according to claim 4, wherein the video type comprises a two-dimensional video stream, the target foreground processing method comprises a matting algorithm, and the processing of the foreground video frame to be fused according to the target foreground processing method and the fusion parameters to obtain the target object to be fused comprises:
    determining, based on the matting algorithm, a fusion object to be processed in the foreground video frame to be fused; and
    performing white balance adjustment on the fusion object to be processed based on the fusion parameters, to determine the target object to be fused.
  7. The method according to claim 4, wherein the video type comprises a virtual video stream type generated based on a virtual scene, the target foreground processing method comprises a foreground camera parameter change sub-method, and the processing of the foreground video frame to be fused according to the target foreground processing method and the fusion parameters to obtain the target object to be fused comprises:
    acquiring virtual camera parameters of the background video frame to be fused corresponding to the virtual video stream type;
    determining foreground camera parameters corresponding to the foreground video frame to be fused; and
    processing the foreground camera parameters and the virtual camera parameters based on the foreground camera parameter change sub-method to obtain target transformation parameters, and adjusting a fusion object to be processed in the foreground video frame to be fused based on the target transformation parameters and the fusion parameters, to obtain the target object to be fused.
  8. The method according to claim 1, wherein the determining, according to the target object to be fused and the background video frame to be fused, of the target fused video frame displayed on the target terminal comprises:
    adding the target object to be fused to the background video frame to be fused, to obtain the target fused video frame displayed on the target terminal.
  9. A video fusion apparatus, comprising:
    a video frame receiving module, configured to receive a foreground video frame to be fused in a foreground video stream to be fused and a background video frame to be fused in a background video stream to be fused;
    a fusion parameter determination module, configured to determine fusion parameters corresponding to the background video frame to be fused;
    a fusion object determination module, configured to determine, according to a target foreground processing method corresponding to the foreground video frame to be fused and the fusion parameters, a target object to be fused corresponding to the foreground video frame to be fused; and
    a video fusion module, configured to determine, according to the target object to be fused and the background video frame to be fused, a target fused video frame displayed on a target terminal.
  10. An electronic device, comprising:
    at least one processor; and
    a storage apparatus configured to store at least one program,
    wherein, when the at least one program is executed by the at least one processor, the at least one processor is caused to implement the video fusion method according to any one of claims 1-8.
  11. A storage medium containing computer-executable instructions which, when executed by a computer processor, are used to execute the video fusion method according to any one of claims 1-8.
PCT/CN2022/119508 2021-10-25 2022-09-19 Video fusion method and apparatus, electronic device and storage medium WO2023071603A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111243155.4A CN113989173A (zh) 2021-10-25 2021-10-25 Video fusion method and apparatus, electronic device and storage medium
CN202111243155.4 2021-10-25

Publications (1)

Publication Number Publication Date
WO2023071603A1 true WO2023071603A1 (zh) 2023-05-04

Also Published As

Publication number Publication date
CN113989173A (zh) 2022-01-28

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 22885490; Country of ref document: EP; Kind code of ref document: A1)