WO2023025085A1 - Video processing method, apparatus, device, medium and program product - Google Patents

Video processing method, apparatus, device, medium and program product

Info

Publication number
WO2023025085A1
WO2023025085A1 (PCT application PCT/CN2022/113881)
Authority
WO
WIPO (PCT)
Prior art keywords
target
image
grid
pixel
background
Prior art date
Application number
PCT/CN2022/113881
Other languages
English (en)
French (fr)
Inventor
冷晓旭
张永杰
Original Assignee
北京字跳网络技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京字跳网络技术有限公司
Publication of WO2023025085A1 publication Critical patent/WO2023025085A1/zh

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 5/00: Details of television systems
    • H04N 5/222: Studio circuitry; Studio devices; Studio equipment
    • H04N 5/262: Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects; Cameras specially adapted for the electronic generation of special effects

Definitions

  • the present disclosure relates to the technical field of video processing, and in particular to a video processing method, apparatus, device, medium, program product, and computer program.
  • foreground completion technology is widely used in the field of digital-entertainment special effects, and existing foreground completion usually relies on deep learning; specifically, an optical-flow network is generally combined with a Generative Adversarial Network (GAN) to complete video frames.
  • the present disclosure provides a video processing method, apparatus, device, medium, program product, and computer program, which are used to solve the problem that real-time foreground completion is difficult to guarantee in current interactive video applications.
  • an embodiment of the present disclosure provides a video processing method, including:
  • the target image being a previous video frame of a reference image
  • the reference image being a video frame currently acquired by an image sensor
  • the processed image is displayed as a current video frame, and the processed image is an image after at least part of the target object is removed from the reference image.
  • an embodiment of the present disclosure provides a video processing device, including:
  • An image acquisition module configured to acquire a target image in response to a trigger instruction, where the target image is a previous video frame of a reference image, and the reference image is a video frame currently acquired by an image sensor;
  • An image processing module, configured to use the background area in the target image to complete and fill the target area in the reference image to generate a processed image, the target area being the area covered by the target object in the reference image;
  • An image display module configured to display the processed image as a current video frame, where the processed image is an image after at least part of the target object is removed from the reference image.
  • an electronic device including:
  • a memory for storing the computer program of the processor
  • a display for displaying the processed video
  • the processor is configured to implement the video processing method described in the above first aspect and various possible designs of the first aspect by executing the computer program.
  • an embodiment of the present disclosure provides a computer-readable storage medium storing computer-executable instructions which, when executed by a processor, implement the video processing method described in the first aspect and its various possible designs.
  • an embodiment of the present disclosure provides a computer program product, including a computer program.
  • when the computer program is executed by a processor, the video processing method described in the first aspect and its various possible designs is implemented.
  • an embodiment of the present disclosure provides a computer program, which implements the video processing method described in the above first aspect and various possible designs of the first aspect when the computer program is executed by a processor.
  • a video processing method, apparatus, device, medium, and program product provided by the embodiments of the present disclosure acquire a target image in response to a trigger instruction, where the target image is the previous video frame of a reference image and the reference image is the video frame currently acquired by an image sensor.
  • the background area in the target image is then used to complete and fill the target area where the target object is located in the reference image to generate a processed image; because only two adjacent frames are used for completion, both the completion quality and the real-time performance of the completion process can be guaranteed.
  • finally, the processed image is displayed as the current video frame, so that the processed image is an image in which at least part of the target object has been removed from the reference image, realizing the special effect of the target object disappearing from the video.
  • FIG. 1 is an application scenario diagram of a video processing method according to an example embodiment of the present disclosure;
  • FIG. 2 is a schematic flowchart of a video processing method according to an example embodiment of the present disclosure;
  • FIG. 3 is a schematic flowchart of a video processing method according to another example embodiment of the present disclosure;
  • FIG. 4 is a schematic diagram of the deformation of the grid cell containing a pixel in the present disclosure;
  • FIG. 5 is a schematic diagram of the similarity transformation of a triangle in the grid in the present disclosure;
  • FIG. 6 is a schematic structural diagram of a video processing apparatus according to an example embodiment of the present disclosure;
  • FIG. 7 is a schematic structural diagram of an electronic device according to an example embodiment of the present disclosure.
  • the term “comprise” and its variants are open-ended, i.e., “including but not limited to”.
  • the term “based on” is “based at least in part on”.
  • the term “one embodiment” means “at least one embodiment”; the term “another embodiment” means “at least one further embodiment”; the term “some embodiments” means “at least some embodiments.” Relevant definitions of other terms will be given in the description below.
  • in video applications, foreground completion technology is widely used in the field of digital-entertainment special effects. Existing foreground completion usually relies on deep learning; specifically, an optical-flow network is generally combined with a GAN to complete video frames, but this deep-learning-based completion involves a large amount of computation. In addition, existing methods are generally based on the simultaneous optimization of multiple frames, which further increases the amount of computation during completion and makes real-time performance even harder to guarantee.
  • the present disclosure therefore aims to provide a video processing method in which a target image is acquired in response to a trigger instruction, where the target image is the previous video frame of a reference image and the reference image is the video frame currently acquired by an image sensor. The background area in the target image is then used to complete and fill the target area where the target object is located in the reference image, generating a processed image; because only two adjacent frames are used for completion, both the completion quality and the real-time performance of the completion process can be guaranteed. Finally, the processed image is displayed as the current video frame, so that the processed image is an image in which at least part of the target object has been removed from the reference image, realizing the special effect of the target object disappearing from the video.
  • Fig. 1 is an application scenario diagram showing a video processing method according to an example embodiment of the present disclosure.
  • the video processing method provided in this embodiment can be executed by a terminal device equipped with a camera and a display screen.
  • Specifically, a video of a target object (for example, a vehicle, an animal, a plant, or a building) can be captured through a camera on the terminal device (for example, a front camera, a rear camera, or an external camera).
  • the target object is illustrated below as a vehicle. In one possible scenario, when a terminal device is used to capture video of a target vehicle, the camera on the terminal device is usually aimed at the target vehicle; it is understandable that during framing the camera captures not only the target vehicle but also the target background. At this point, the user can input a trigger instruction to the terminal device (for example, a target gesture instruction, a target voice instruction, a target expression instruction, a target text instruction, or a target body instruction) so as to at least partially remove the target object from the current image of the video.
  • the trigger instruction may be a trigger instruction input by a user, or may be a trigger instruction issued by a target object in the video.
  • at this point, the video frame currently acquired by the image sensor is the reference image 120; the background area of its previous video frame, i.e., the target image 110, is then used to complete and fill the target area in the reference image, generating and displaying the processed image 121.
  • thus, while the target object is being captured or played back, a trigger instruction can trigger the removal of the target object from the reference image and the corresponding completion operation, realizing the visual effect of the target object disappearing from the reference image.
  • Fig. 2 is a schematic flowchart of a video processing method according to an example embodiment of the present disclosure. As shown in Figure 2, the video processing method provided by this embodiment includes:
  • Step 101: Acquire a target image in response to a trigger instruction.
  • the camera on the terminal device is usually aimed at the target object during shooting; it is understandable that, during framing, the camera acquires not only the target object but also the target background.
  • the terminal device acquires the target image in response to the trigger instruction, wherein the target image is a previous video frame of the reference image, and the reference image is the video frame currently acquired by the image sensor.
  • the user can trigger the effect of the target object disappearing from the image by inputting a target gesture instruction (for example, a hand-stretching gesture).
  • when the terminal device recognizes the target gesture instruction, the special effect of removing the target object from the reference image can be triggered.
  • Step 102: Use the background area in the target image to complete and fill the target area in the reference image to generate a processed image.
  • the background area in the target image is used to complete and fill the target area in the reference image to generate a processed image, where the target area is the area covered by the target object in the reference image.
  • the user triggers by inputting a hand-stretching command, so as to realize the special effect that the target object disappears in the current frame of the video in response to the user gesture during the video recording process.
  • the target object may be in the form of a target plant, a target animal, or a target building, which is not specifically limited here.
  • the above-mentioned triggering instruction may also be in the form of a target gesture command, a target voice command, a target expression command, a target text command, a target body command, etc., which are not specifically limited here.
  • Step 103: Display the processed image as the current video frame.
  • the processed image is displayed as the current video frame, wherein the processed image is an image after at least part of the target object is removed from the reference image, so as to realize the special effect that the target object disappears in the current video frame.
  • the target image is acquired in response to the trigger instruction, where the target image is the previous video frame of the reference image and the reference image is the video frame currently acquired by the image sensor; the background area in the target image is then used to complete and fill the target area where the target object is located in the reference image to generate a processed image, so that only two adjacent frames are used for completion, which guarantees both the completion quality and the real-time performance of the completion process; finally, the processed image is displayed as the current video frame, so that the processed image is an image in which at least part of the target object has been removed from the reference image, realizing the special effect of the target object disappearing from the video.
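Read together, steps 101 to 103 amount to a short per-frame pipeline. The sketch below is an illustration only, not the patent's implementation: the helper callables `segment_foreground` and `complete_fill` are hypothetical stand-ins for the segmentation and completion stages described in the following embodiment.

```python
def process_frame(prev_frame, cur_frame, trigger_active,
                  segment_foreground, complete_fill):
    """Per-frame flow of steps 101-103 (illustrative sketch).

    prev_frame     -- target image (previous video frame)
    cur_frame      -- reference image (frame currently acquired by the sensor)
    trigger_active -- whether a trigger instruction (e.g. a gesture) was recognized
    """
    if not trigger_active:
        return cur_frame                     # no effect requested: show frame as-is
    fg_mask = segment_foreground(cur_frame)  # target area covered by the object
    # complete and fill the target area from the target image's background area
    return complete_fill(prev_frame, cur_frame, fg_mask)
```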
  • Fig. 3 is a schematic flowchart of a video processing method according to another exemplary embodiment of the present disclosure. As shown in Figure 3, the video processing method provided by this embodiment includes:
  • Step 201: Acquire a target image in response to a trigger instruction.
  • the camera on the terminal device is usually aimed at the target object during shooting; during framing, the target background is acquired as well.
  • the terminal device acquires the target image in response to the trigger instruction, wherein the target image is a previous video frame of the reference image, and the reference image is the video frame currently acquired by the image sensor.
  • the user can trigger the effect of the target object disappearing from the image by inputting a target gesture instruction (for example, a hand-stretching gesture).
  • when the terminal device recognizes the target gesture instruction, the special effect of removing the target object from the reference image can be triggered.
  • Step 202: Determine, for each pixel in the target background, the photometric error relative to the corresponding pixel in the reference background, the cell deformation error determined by the similarity transformation of the grid vertices corresponding to the pixel, and the grid-vertex inertia error that regularizes the displacement of those grid vertices.
  • the target foreground corresponding to the target image and the reference foreground corresponding to the reference image can be determined according to the preset foreground segmentation module.
  • the target image includes the target foreground and the target background, and the reference image includes the reference foreground and the reference background; grids are constructed for the target image, the target foreground, the reference image, and the reference foreground respectively, to generate a target image grid, a target foreground grid, a reference image grid, and a reference foreground grid.
  • the above-mentioned preset foreground segmentation module can be based on any foreground segmentation algorithm, aiming at estimating the foreground object mask from the target image through computer vision and graphics algorithms, so as to determine the corresponding target foreground.
  • image pyramids can be constructed for the target image, target foreground, reference image and reference foreground respectively to generate target image pyramids, target foreground pyramids, reference image pyramids and reference foreground pyramids with the same number of layers. Then, construct grids for each image layer in the target image pyramid, target foreground pyramid, reference image pyramid, and reference foreground pyramid, wherein the grid shape and scale of the corresponding layers in each image pyramid are the same.
  • Specifically, an L-layer image pyramid can be constructed for each of the target image I_tar, the target foreground F_tar, the reference image I_ref, and the reference foreground F_ref, where L is generally set to 3; layer 0 denotes the topmost (coarsest) layer and layer L-1 the bottommost.
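As a concrete illustration of this construction, the sketch below builds a three-layer pyramid with OpenCV and lays a regular grid over a layer. It is a sketch under stated assumptions: the function names are ours, and the 40-pixel cell size is one value from the 30-50 pixel range given in the detailed description.

```python
import cv2
import numpy as np

def build_pyramid(image, levels=3):
    """L-layer image pyramid; index 0 is the topmost (coarsest) layer and
    index L-1 the bottommost (finest), matching the convention above."""
    layers = [image]
    for _ in range(levels - 1):
        layers.append(cv2.pyrDown(layers[-1]))
    return layers[::-1]

def build_grid(shape, cell=40):
    """Regular grid of vertex coordinates (x, y) covering an image of `shape`."""
    h, w = shape[:2]
    ys = np.arange(0.0, h + cell, cell, dtype=np.float32)
    xs = np.arange(0.0, w + cell, cell, dtype=np.float32)
    return np.stack(np.meshgrid(xs, ys), axis=-1)  # (H_grid, W_grid, 2)
```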
  • each pixel in the target background can be obtained by sampling at each layer of the image pyramid. Specifically, the brightness gradient of each pixel on the first target image layer is computed, and the pixels on the first target image layer are traversed at a preset stride to determine a first target sample set, where the pixels in the first target sample set belong to the first target background layer and have brightness gradients greater than a preset first threshold; the first target image layer and the first target background layer are at the same level of the image pyramid.
  • the preset stride can be N pixels, so that one pixel is taken every N pixels on the first target image layer; the traversal direction may be along the horizontal direction of the first target image layer or along its vertical direction.
  • the brightness gradients of the L target image layers may be computed separately, and the pixels q on each target image layer are traversed at a stride of s pixels, where s is generally an integer between 1 and 5. If q is a background pixel, i.e., it belongs to the target background layer of that level, and the gradient magnitude at its location exceeds the threshold t_g, it is added to the target sample set of the corresponding level; each target image layer maintains its own sample set.
  • the level-L target sample set can be expressed as: S_L = {q | grad(q) > t_g && q ∈ Background_L}, where grad(q) is the brightness gradient at pixel position q, Background_L is the target background of layer L, and t_g can be set to 0.1.
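A direct reading of this sampling rule is sketched below, assuming a grayscale image normalized to [0, 1] (so that t_g = 0.1 is meaningful) and a boolean background mask from the foreground segmentation; the stride of 3 is one choice from the 1-5 range above.

```python
import numpy as np

def sample_background(gray, background_mask, step=3, t_g=0.1):
    """Build the sample set S_L: background pixels, visited at a stride of
    `step` pixels, whose brightness-gradient magnitude exceeds t_g."""
    gy, gx = np.gradient(gray)          # brightness gradient grad(q)
    grad_mag = np.hypot(gx, gy)
    keep = np.zeros_like(background_mask, dtype=bool)
    keep[::step, ::step] = True         # stride-s traversal of the layer
    ys, xs = np.nonzero(keep & background_mask & (grad_mag > t_g))
    return np.stack([xs, ys], axis=1).astype(np.float32)  # (x, y) samples
```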
  • Step 203: Determine an overall cost function from the photometric error, the cell deformation error, and the grid-vertex inertia error.
  • the overall cost function adopted comprises three cost terms: the sampling-point photometric error f_q, the cell deformation error f_d, and the grid-vertex inertia error f_V.
  • here I_tar denotes the target image, I_ref the reference image, and q a pixel position in the target sample set S_L; I_tar(q) and I_ref(q) are the brightness values of the same target image layer and reference image layer at q, and ∇I_tar(q) is the brightness gradient of the target image layer at q.
  • FIG. 4 is a schematic diagram of the deformation of the grid cell containing a pixel. As shown in FIG. 4, let the four vertices of the cell containing q be V_k, k = 1, 2, 3, 4; q can then be expressed as a linear combination of these four vertices, q = Σ_{k=1..4} C_k V_k, where the coefficients C_k are determined by bilinear interpolation.
  • in FIG. 4, the solid-line quadrilateral represents the cell before deformation and the dotted-line quadrilateral the cell after deformation; the deformation of the cell is achieved through vertex displacement. The interpolation coefficients C_k of q remain unchanged before and after deformation, and the deformation moves q to a more suitable position q', thereby reducing the photometric error.
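The bilinear relationship can be made concrete as follows. This is a sketch under our own naming: `cell` is the grid cell size in pixels, and the routines return the fixed coefficients C_k and the displaced position q' = q + Σ C_k t_{V_k}.

```python
import numpy as np

def bilinear_coeffs(q, cell):
    """Cell index and bilinear coefficients C_k of pixel q within its grid cell."""
    x, y = q
    j, i = int(x // cell), int(y // cell)   # cell column / row
    u, v = x / cell - j, y / cell - i       # fractional position inside the cell
    c = np.array([(1 - u) * (1 - v),        # C_1: top-left vertex
                  u * (1 - v),              # C_2: top-right vertex
                  (1 - u) * v,              # C_3: bottom-left vertex
                  u * v])                   # C_4: bottom-right vertex
    return (i, j), c

def displaced_position(q, cell, vertex_disp):
    """q' = q + sum_k C_k t_{V_k}: the C_k stay fixed while the vertices move."""
    (i, j), c = bilinear_coeffs(q, cell)
    t_q = (c[0] * vertex_disp[i, j] + c[1] * vertex_disp[i, j + 1]
           + c[2] * vertex_disp[i + 1, j] + c[3] * vertex_disp[i + 1, j + 1])
    return np.asarray(q, dtype=np.float32) + t_q
```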
  • f_d has two components: one represents the deformation error of edges l_12 and l_13 relative to l'_12 and l'_13, and the other the deformation error of edges l_12 and l_23 relative to l'_12 and l'_23; here l_12, l_13, and l_23 form one of the triangles obtained by splitting the cell, and l'_12, l'_13, and l'_23 form the corresponding deformed triangle.
  • each cell can be divided into two triangles; that is, the grid cell corresponding to each pixel in the target background is split into a first triangle and a second triangle, and the cell deformation error is used to constrain the first triangle to be similarity-transformed into the deformed third triangle and the second triangle into the deformed fourth triangle.
  • FIG. 5 is a schematic diagram of the similarity transformation of a triangle in the grid. Referring to FIG. 5, the two components of f_d can be minimized so as to ensure, as far as possible, that each triangle conforms to a similarity transformation when the mesh deforms; here {l_k1k2, k1k2 = 12, 13, 23} and {l'_k1k2, k1k2 = 12, 13, 23} denote the triangle edges before and after deformation.
  • t_V ∈ R² denotes the displacement vector of grid vertex V; the inertia error f_V = ‖t_V‖_2 regularizes the grid-vertex displacements, suppressing excessively large ones.
  • based on the photometric, deformation, and inertia errors, the overall cost function can be constructed as a weighted combination of the three terms over all samples, triangles, and grid vertices (with weights λ_1, λ_2, λ_3 below); the optimization variables obtained are the displacement vectors t_V of the grid vertices.
  • the Gauss-Newton or Levenberg-Marquardt (LM) algorithm can be used to obtain the displacement vectors of the grid vertices and realize the mesh deformation; λ_1 can take the value 1, λ_2 a value between 0.2 and 0.5, and λ_3 the value 0.1.
  • Step 204 Determine the displacement vector of the grid vertex corresponding to each pixel in the target background according to the overall cost function and a preset optimization algorithm.
  • grid deformation may be performed on the corresponding pixels in the first target sample set according to the displacement vectors, and the deformation result is passed to the next target image layer in the target image pyramid, down to the bottom layer, to generate an optimized target image; the first target image layer is the top layer of the target image pyramid, and the deformation results propagate from the top of the pyramid toward the bottom.
  • a coarse-to-fine approach can be adopted: starting from the topmost grid, the deformed grid vertices are propagated layer by layer to the next layer, where the vertex positions are optimized again, until the bottommost deformed grid vertices are obtained.
  • when the first target image layer is the first layer of the image pyramid, grid deformation is performed on the corresponding pixels in the first target sample set according to the displacement vectors, and the deformation result is passed to the second target image layer of the pyramid; that is, the grid-deformation result of the first target image layer serves as the initial grid of the second target image layer before its own deformation.
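The layer-to-layer hand-off then becomes a loop over the pyramid, reusing `build_grid`, `sample_background`, and `solve_mesh` from the sketches above. The factor-2 rescaling between layers follows the V'' = 2·V' scale expansion noted in the detailed description; the rest of the wiring is our assumption.

```python
import numpy as np

def coarse_to_fine(pyr_tar, pyr_ref, bg_masks, top_cell=40):
    """Optimize mesh displacements from the top (coarsest) layer downward,
    doubling the displacements and cell size when moving to a finer layer."""
    disp, cell = None, top_cell
    for level, (tar, ref, mask) in enumerate(zip(pyr_tar, pyr_ref, bg_masks)):
        grid = build_grid(tar.shape, cell)
        samples = sample_background(tar, mask)
        disp = solve_mesh(tar, ref, samples, grid, cell, t0=disp)
        if level < len(pyr_tar) - 1:
            disp = 2.0 * disp   # V'' = 2 V': scale up for the next, finer layer
            cell *= 2           # cell width/height also doubles per layer
    return disp, cell           # bottom-layer displacements and cell size
```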
  • Step 205 Generate an optimized target image after adjusting the target image according to the displacement vector.
  • Step 206: Compute the dense optical flow of the reference image relative to the optimized target image.
  • Step 207: Complete and fill the target area in the reference image according to the dense optical flow.
  • after the optimized target image is generated, the dense optical flow of the reference image relative to it can be computed, so that the target area in the reference image is completed and filled according to the dense optical flow.
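One plausible realization of steps 206 and 207, given the bottom-layer vertex displacements: interpolate them into a dense flow field and pull pixels of the adjusted target image into the masked target area with OpenCV's remap. The warping direction and the simple bilinear upsampling are our assumptions.

```python
import cv2
import numpy as np

def fill_target_area(I_ref, I_tar, vertex_disp, fg_mask):
    """Complete the reference image's target area from the target image via a
    dense flow field interpolated from the mesh-vertex displacements."""
    h, w = I_ref.shape[:2]
    # dense optical flow: bilinearly upsample the per-vertex displacement field
    flow = cv2.resize(vertex_disp.astype(np.float32), (w, h),
                      interpolation=cv2.INTER_LINEAR)
    xs, ys = np.meshgrid(np.arange(w, dtype=np.float32),
                         np.arange(h, dtype=np.float32))
    warped = cv2.remap(I_tar, xs + flow[..., 0], ys + flow[..., 1],
                       cv2.INTER_LINEAR)   # target image warped onto the reference
    out = I_ref.copy()
    out[fg_mask] = warped[fg_mask]         # fill only the covered target area
    return out
```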
  • the user triggers the instruction so that the terminal device can respond to the user gesture to realize the special effect that the target object disappears in the current frame of the video.
  • moreover, mesh optimization ensures that the computed dense optical flow both keeps the photometric error low in the overlapping background area of the target and reference images and makes adjacent pixels share consistent flow vectors, thereby effectively predicting the pixel flow inside the foreground area.
  • in addition, the mesh-vertex deformation of each layer is passed on layer by layer, starting from the top of the image pyramid and proceeding down to its bottom layer, forming a multi-scale processing strategy that accelerates the optical-flow computation.
  • because bilinear interpolation is used, the rendered image quality inside the foreground area gradually blurs in a static state. Therefore, on the basis of the above embodiments, before the background area of the target image is used to complete and fill the target area of the reference image, it must first be determined that the image acquisition state is the motion acquisition state. Specifically, through steps 201 to 204 of the above embodiment, the displacement of each grid vertex in the optimized target image relative to the corresponding grid vertex in the target image can be computed and the average displacement then calculated; if the average displacement is greater than or equal to a preset displacement threshold, the image acquisition state is determined to be the motion acquisition state. The displacement threshold can be set to 0.3-0.5 pixels.
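The motion/static test reduces to a few lines; the 0.4-pixel threshold below is an assumed midpoint of the 0.3-0.5 range just mentioned.

```python
import numpy as np

def is_motion_state(vertex_disp, threshold=0.4):
    """True if the average grid-vertex displacement reaches the preset threshold."""
    mean_disp = np.linalg.norm(vertex_disp.reshape(-1, 2), axis=1).mean()
    return mean_disp >= threshold
```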
  • if the determined average displacement is smaller than the preset displacement threshold, the image acquisition state is the static acquisition state; the target image I_tar is then set as the background image I_bg, and the target foreground F_tar as the background foreground F_bg.
  • the completion algorithm based on the background image runs in two steps. First, background update: this may be executed every 5-10 frames, updating the background image I_bg and the background foreground F_bg with the background area of the reference image.
  • second, reference-image completion: the foreground area of the reference image is filled with the background area of the background image I_bg, and the foreground template of the reference image is updated at the same time.
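The static-state branch can be summarized as a small stateful helper; a refresh interval of 8 frames is an assumed value within the 5-10 range, and the boolean-mask bookkeeping is a simplification of the foreground-template update.

```python
import numpy as np

class StaticCompleter:
    """Static acquisition state: keep a background image I_bg and refresh it
    from the reference image's background area every few frames."""

    def __init__(self, target_image, target_fg_mask, refresh_every=8):
        self.I_bg = target_image.copy()     # I_bg initialized from I_tar
        self.F_bg = target_fg_mask.copy()   # F_bg initialized from F_tar
        self.refresh_every, self.count = refresh_every, 0

    def step(self, I_ref, fg_mask):
        self.count += 1
        if self.count % self.refresh_every == 0:   # step 1: background update
            bg = ~fg_mask
            self.I_bg[bg] = I_ref[bg]
            self.F_bg &= fg_mask                   # shrink the stored foreground
        out = I_ref.copy()                         # step 2: reference completion
        out[fg_mask] = self.I_bg[fg_mask]
        return out
```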
  • the image acquisition state is determined by comparing the average displacement of the grid vertices with the preset displacement threshold, and the corresponding foreground completion strategy is then selected; this guarantees both the quality and the real-time performance of foreground completion in the motion acquisition state and the completion quality in the static acquisition state.
  • Fig. 6 is a schematic structural diagram of a video processing device according to an example embodiment of the present disclosure. As shown in FIG. 6, the video processing device 300 provided in this embodiment includes:
  • An image acquisition module 301 configured to acquire a target image in response to a trigger instruction, the target image being a previous video frame of a reference image, the reference image being a video frame currently acquired by an image sensor;
  • the image processing module 302 is configured to use the background area in the target image to complete and fill the target area in the reference image to generate a processed image, the target area being the area covered by the target object in the reference image;
  • the display 303 is configured to display the processed image as a current video frame, where the processed image is an image after at least part of the target object is removed from the reference image.
  • the image processing module 302 is specifically configured to:
  • the image processing module 302 is further configured to:
  • the image processing module 302 is specifically configured to:
  • the image processing module 302 is specifically configured to:
  • the image processing module 302 is specifically configured to:
  • the image processing module 302 is specifically configured to:
  • the first triangle is similarly transformed into a deformed third triangle
  • the second triangle is similarly transformed into a deformed fourth triangle.
  • the image processing module 302 is further configured to: determine that the image acquisition state is a motion acquisition state.
  • the image processing module 302 is specifically configured to:
  • the image acquisition state is the motion acquisition state.
  • the video processing apparatus provided by the embodiment shown in FIG. 6 can be used to execute the method steps provided by any of the above method embodiments; the specific implementations and technical effects are similar and are not repeated here.
  • Fig. 7 is a schematic structural diagram of an electronic device according to an example embodiment of the present disclosure. As shown in FIG. 7 , it shows a schematic structural diagram of an electronic device 400 suitable for implementing the embodiments of the present disclosure.
  • the terminal devices in the embodiments of the present disclosure may include, but are not limited to, mobile terminals with an image acquisition function such as mobile phones, notebook computers, digital broadcast receivers, personal digital assistants (PDA), tablet computers (PAD), portable multimedia players (PMP), vehicle-mounted terminals (such as vehicle navigation terminals), and wearable electronic devices, as well as fixed terminals connected to an external image acquisition device, such as digital TVs, desktop computers, and smart home devices.
  • the electronic device shown in FIG. 7 is only an example, and should not limit the functions and application scope of the embodiments of the present disclosure.
  • the electronic device 400 may include a processor 401 (such as a central processing unit or a graphics processing unit), which can execute various appropriate actions and processes according to a program stored in a read-only memory (ROM) 402 or a program loaded from the memory 408 into a random access memory (RAM) 403.
  • various programs and data necessary for the operation of the electronic device 400 are also stored.
  • the processor 401, ROM 402, and RAM 403 are connected to each other through a bus 404.
  • An input/output (Input/Output, I/O for short) interface 405 is also connected to the bus 404 .
  • the memory is used to store programs for executing the video processing methods described in the above method embodiments; the processor is configured to execute the programs stored in the memory.
  • an input device 406 including, for example, a touch screen, touchpad, keyboard, mouse, camera, microphone, accelerometer, and gyroscope; an output device 407 including, for example, a liquid crystal display (LCD), a speaker, and a vibrator; a memory 408 including, for example, a magnetic tape or a hard disk; and a communication device 409.
  • the communication means 409 may allow the electronic device 400 to perform wireless or wired communication with other devices to exchange data. While FIG. 7 shows electronic device 400 having various means, it should be understood that implementing or having all of the means shown is not a requirement. More or fewer means may alternatively be implemented or provided.
  • the processes described above with reference to the flowcharts can be implemented as computer software programs.
  • the embodiments of the present disclosure include a computer-readable storage medium comprising a computer program carried on a non-transitory computer-readable medium; the computer program contains program code for executing the video processing method shown in the flowcharts of the embodiments of the present disclosure.
  • the computer program may be downloaded and installed from a network via the communication device 409, or installed from the memory 408, or installed from the ROM 402.
  • the computer-readable storage medium mentioned above in the present disclosure may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the above two.
  • a computer readable storage medium may be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof.
  • Computer-readable storage media may include, but are not limited to: an electrical connection with one or more wires, a portable computer diskette, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
  • a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in conjunction with an instruction execution system, apparatus, or device.
  • a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave carrying computer-readable program code therein. Such propagated data signals may take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing.
  • a computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium, which can transmit, propagate, or transmit a program for use by or in conjunction with an instruction execution system, apparatus, or device .
  • the program code contained on the computer readable medium can be transmitted by any appropriate medium, including but not limited to: electric wire, optical cable, radio frequency (Radio Frequency, RF for short), etc., or any suitable combination of the above.
  • the above-mentioned computer-readable storage medium may be included in the above-mentioned electronic device, or may exist independently without being assembled into the electronic device.
  • the above computer-readable storage medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: acquire a target image in response to a trigger instruction, the target image being the previous video frame of a reference image and the reference image being the video frame currently acquired by an image sensor; use the background area in the target image to complete and fill the target area in the reference image to generate a processed image, the target area being the area covered by the target object in the reference image; and display the processed image as the current video frame, the processed image being an image in which at least part of the target object has been removed from the reference image.
  • Computer program code for carrying out the operations of the present disclosure may be written in one or more programming languages or combinations thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar languages.
  • the program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server.
  • the remote computer can be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it can be connected to an external computer (for example, through the Internet using an Internet service provider).
  • the client and the server can communicate using any currently known or future-developed network protocol such as HyperText Transfer Protocol (HTTP), and can be interconnected with digital data communication in any form or medium (for example, a communication network).
  • Examples of communication networks include local area networks ("LANs”), wide area networks ("WANs”), internetworks (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future developed network of.
  • each block in a flowchart or block diagram may represent a module, program segment, or portion of code that contains one or more executable instructions for implementing the specified logical functions.
  • the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or they may sometimes be executed in the reverse order, depending upon the functionality involved.
  • each block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations can be implemented by a dedicated hardware-based system that performs the specified functions or operations , or may be implemented by a combination of dedicated hardware and computer instructions.
  • the modules involved in the embodiments described in the present disclosure may be implemented by software or by hardware. Wherein, the name of the module does not constitute a limitation of the unit itself under certain circumstances, for example, the display module can also be described as "a unit that displays the target human face and human face mask sequence".
  • exemplary types of hardware logic components that can be used include: field programmable gate arrays (FPGA), application-specific integrated circuits (ASIC), application-specific standard products (ASSP), systems on chip (SOC), complex programmable logic devices (CPLD), and so on.
  • a machine-readable medium may be a tangible medium that may contain or store a program for use by or in conjunction with an instruction execution system, apparatus, or device.
  • a machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium.
  • a machine-readable medium may include, but is not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatus, or devices, or any suitable combination of the foregoing.
  • more specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
  • a video processing method including:
  • the target image being a previous video frame of a reference image
  • the reference image being a video frame currently acquired by an image sensor
  • the processed image is displayed as a current video frame, and the processed image is an image after at least part of the target object is removed from the reference image.
  • using the background area in the target image to fill the target area in the reference image includes:
  • completing and filling the target area in the reference image according to the dense optical flow.
  • before determining, for each pixel in the target background, the photometric error, the cell deformation error determined by the similarity transformation of the corresponding grid vertices, and the grid-vertex inertia error regularizing the corresponding vertex displacements, the method further includes:
  • the grids are respectively constructed for the target image, the target foreground, the reference image, and the reference foreground to generate a target image grid, a target foreground grid, Reference image grids as well as reference foreground grids, including:
  • determining, for each pixel in the target background, the photometric error relative to the corresponding pixel in the reference background, the cell deformation error determined by the similarity transformation of the corresponding grid vertices, and the grid-vertex inertia error regularizing the corresponding vertex displacements, includes:
  • determining the displacement vector of the grid vertices corresponding to each pixel in the target background according to the overall cost function and a preset optimization algorithm, so as to adjust the target image according to the displacement vectors and generate an optimized target image, includes:
  • after the cell deformation error determined by the similarity transformation of the grid vertices corresponding to each pixel in the target background is determined, the method further includes:
  • the first triangle is similarly transformed into a deformed third triangle
  • the second triangle is similarly transformed into a deformed fourth triangle.
  • before the background area in the target image is used to complete and fill the target area in the reference image, the method further includes:
  • the determining that the image acquisition state is a motion acquisition state includes:
  • the image acquisition state is the motion acquisition state.
  • a video processing device including:
  • An image acquisition module configured to acquire a target image in response to a trigger instruction, where the target image is a previous video frame of a reference image, and the reference image is a video frame currently acquired by an image sensor;
  • An image processing module, configured to use the background area in the target image to complete and fill the target area in the reference image to generate a processed image, the target area being the area covered by the target object in the reference image;
  • a display configured to display the processed image as a current video frame, where the processed image is an image in which at least part of the target object is removed from the reference image.
  • the image processing module is specifically used for:
  • the image processing module is further used for:
  • the image processing module is specifically used for:
  • the image processing module is specifically used for:
  • the image processing module is specifically used for:
  • the image processing module is specifically used for:
  • the first triangle is similarly transformed into a deformed third triangle
  • the second triangle is similarly transformed into a deformed fourth triangle.
  • the image processing module is further configured to: determine that the image acquisition state is a motion acquisition state.
  • the image processing module is specifically used for:
  • the image acquisition state is the motion acquisition state.
  • an electronic device including:
  • a memory for storing the computer program of the processor
  • a display for displaying the video processed by the processor
  • the processor is configured to implement the video processing method described in the above first aspect and various possible designs of the first aspect by executing the computer program.
  • an embodiment of the present disclosure provides a computer-readable storage medium storing computer-executable instructions which, when executed by a processor, implement the video processing method described in the first aspect and its various possible designs.
  • an embodiment of the present disclosure provides a computer program product, including a computer program.
  • the computer program is executed by a processor, the video processing method described in the above first aspect and various possible designs of the first aspect is implemented.
  • an embodiment of the present disclosure provides a computer program, which implements the video processing method described in the above first aspect and various possible designs of the first aspect when the computer program is executed by a processor.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Image Analysis (AREA)

Abstract

The present disclosure provides a video processing method, apparatus, device, medium, program product, and computer program. In the video processing method provided by the present disclosure, a target image is acquired in response to a trigger instruction, where the target image is the previous video frame of a reference image and the reference image is the video frame currently acquired by an image sensor; the background area in the target image is then used to complete and fill the target area where the target object is located in the reference image to generate a processed image, so that only two adjacent frames are used for completion, which guarantees both the completion quality and the real-time performance of the completion process; finally, the processed image is displayed as the current video frame, so that the processed image is an image in which at least part of the target object has been removed from the reference image, realizing the special effect of the target object disappearing from the video.

Description

Video processing method, apparatus, device, medium and program product
CROSS-REFERENCE TO RELATED APPLICATIONS
This application claims priority to Chinese patent application No. 202110977267.6, filed with the Chinese Patent Office on August 24, 2021 and entitled "Video processing method, apparatus, device, medium and program product", the entire contents of which are incorporated herein by reference.
TECHNICAL FIELD
The present disclosure relates to the technical field of video processing, and in particular to a video processing method, apparatus, device, medium, program product, and computer program.
BACKGROUND
With the development of communication technology and terminal devices, terminal devices such as mobile phones and tablet computers have become an indispensable part of people's work and life, and with the growing popularity of terminal devices, video applications have become a major channel for communication and entertainment.
In video applications, foreground completion technology is widely used in the field of digital-entertainment special effects. Existing foreground completion usually relies on deep learning; specifically, an optical-flow network is generally combined with a Generative Adversarial Network (GAN) to complete video frames.
However, this deep-learning-based completion involves a large amount of computation during the completion process; if real-time completion is required, real-time performance is difficult to guarantee.
SUMMARY
The present disclosure provides a video processing method, apparatus, device, medium, program product, and computer program, which are used to solve the problem that real-time foreground completion is difficult to guarantee in current interactive video applications.
In a first aspect, an embodiment of the present disclosure provides a video processing method, including:
acquiring a target image in response to a trigger instruction, the target image being the previous video frame of a reference image, the reference image being the video frame currently acquired by an image sensor;
using the background area in the target image to complete and fill the target area in the reference image to generate a processed image, the target area being the area covered by the target object in the reference image;
displaying the processed image as the current video frame, the processed image being an image in which at least part of the target object has been removed from the reference image.
In a second aspect, an embodiment of the present disclosure provides a video processing apparatus, including:
an image acquisition module, configured to acquire a target image in response to a trigger instruction, the target image being the previous video frame of a reference image, the reference image being the video frame currently acquired by an image sensor;
an image processing module, configured to use the background area in the target image to complete and fill the target area in the reference image to generate a processed image, the target area being the area covered by the target object in the reference image;
an image display module, configured to display the processed image as the current video frame, the processed image being an image in which at least part of the target object has been removed from the reference image.
In a third aspect, an embodiment of the present disclosure provides an electronic device, including:
a processor; and
a memory for storing a computer program for the processor;
a display for displaying the video processed by the processor;
where the processor is configured to implement, by executing the computer program, the video processing method described in the first aspect and its various possible designs.
In a fourth aspect, an embodiment of the present disclosure provides a computer-readable storage medium storing computer-executable instructions which, when executed by a processor, implement the video processing method described in the first aspect and its various possible designs.
In a fifth aspect, an embodiment of the present disclosure provides a computer program product, including a computer program which, when executed by a processor, implements the video processing method described in the first aspect and its various possible designs.
In a sixth aspect, an embodiment of the present disclosure provides a computer program which, when executed by a processor, implements the video processing method described in the first aspect and its various possible designs.
In the video processing method, apparatus, device, medium, and program product provided by the embodiments of the present disclosure, a target image is acquired in response to a trigger instruction, where the target image is the previous video frame of a reference image and the reference image is the video frame currently acquired by an image sensor; the background area in the target image is then used to complete and fill the target area where the target object is located in the reference image to generate a processed image, so that only two adjacent frames are used for completion, which guarantees both the completion quality and the real-time performance of the completion process; finally, the processed image is displayed as the current video frame, so that the processed image is an image in which at least part of the target object has been removed from the reference image, realizing the special effect of the target object disappearing from the video.
BRIEF DESCRIPTION OF THE DRAWINGS
In order to describe the technical solutions in the embodiments of the present disclosure or in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present disclosure, and other drawings can be obtained from them by those of ordinary skill in the art without creative effort.
FIG. 1 is an application scenario diagram of a video processing method according to an example embodiment of the present disclosure;
FIG. 2 is a schematic flowchart of a video processing method according to an example embodiment of the present disclosure;
FIG. 3 is a schematic flowchart of a video processing method according to another example embodiment of the present disclosure;
FIG. 4 is a schematic diagram of the deformation of the grid cell containing a pixel in the present disclosure;
FIG. 5 is a schematic diagram of the similarity transformation of a triangle in the grid in the present disclosure;
FIG. 6 is a schematic structural diagram of a video processing apparatus according to an example embodiment of the present disclosure;
FIG. 7 is a schematic structural diagram of an electronic device according to an example embodiment of the present disclosure.
DETAILED DESCRIPTION
Embodiments of the present disclosure are described in more detail below with reference to the drawings. Although some embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure can be implemented in various forms and should not be construed as limited to the embodiments set forth here; rather, these embodiments are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the present disclosure are for illustration only and are not intended to limit its scope of protection.
It should be understood that the steps recorded in the method embodiments of the present disclosure can be executed in a different order and/or in parallel. In addition, the method embodiments may include additional steps and/or omit the steps shown. The scope of the present disclosure is not limited in this respect.
The term "comprise" and its variants used herein are open-ended, i.e., "including but not limited to". The term "based on" means "based at least in part on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one further embodiment"; the term "some embodiments" means "at least some embodiments". Relevant definitions of other terms will be given in the description below.
It should be noted that the modifiers "a/one" and "a plurality of" mentioned in the present disclosure are illustrative rather than restrictive; those skilled in the art should understand them as "one or more" unless the context clearly indicates otherwise.
In video applications, foreground completion technology is widely used in the field of digital-entertainment special effects. Existing foreground completion usually relies on deep learning; specifically, an optical-flow network is generally combined with a GAN to complete video frames. However, this deep-learning-based completion involves a large amount of computation; in addition, existing methods are generally based on the simultaneous optimization of multiple frames, which further increases the amount of computation during completion and makes real-time performance even harder to guarantee.
The present disclosure therefore aims to provide a video processing method in which a target image is acquired in response to a trigger instruction, where the target image is the previous video frame of a reference image and the reference image is the video frame currently acquired by an image sensor; the background area in the target image is then used to complete and fill the target area where the target object is located in the reference image to generate a processed image, so that only two adjacent frames are used for completion, guaranteeing both the completion quality and the real-time performance of the completion process; finally, the processed image is displayed as the current video frame, so that the processed image is an image in which at least part of the target object has been removed from the reference image, realizing the special effect of the target object disappearing from the video.
FIG. 1 is an application scenario diagram of a video processing method according to an example embodiment of the present disclosure. As shown in FIG. 1, the video processing method provided by this embodiment can be executed by a terminal device equipped with a camera and a display screen. Specifically, a video of a target object (for example, a vehicle, an animal, a plant, or a building) can be captured through a camera on the terminal device (for example, a front camera, a rear camera, or an external camera).
The target object is illustrated below as a vehicle. In one possible scenario, when a terminal device is used to capture video of a target vehicle, the camera on the terminal device is usually aimed at the target vehicle; it is understandable that during framing the camera captures not only the target vehicle but also the target background. At this point, the user can input a trigger instruction to the terminal device (for example, a target gesture instruction, a target voice instruction, a target expression instruction, a target text instruction, or a target body instruction) so as to at least partially remove the target object from the current image of the video.
It should be noted that the trigger instruction may be input by a user or issued by a target object in the video. At this point, the video frame currently acquired by the image sensor is the reference image 120; the background area of its previous video frame, i.e., the target image 110, is then used to complete and fill the target area in the reference image, generating and displaying the processed image 121. It can be seen that, while the target object is being captured or played back, a trigger instruction can trigger the removal of the target object from the reference image and the corresponding completion operation, realizing the visual effect of the target object disappearing from the reference image.
FIG. 2 is a schematic flowchart of a video processing method according to an example embodiment of the present disclosure. As shown in FIG. 2, the video processing method provided by this embodiment includes:
Step 101: Acquire a target image in response to a trigger instruction.
In one possible scenario, when a terminal device is used to capture video of a target object, for example a target plant, the camera on the terminal device is usually aimed at the target object; it is understandable that during framing the camera captures not only the target object but also the target background.
At this point, the terminal device acquires the target image in response to the trigger instruction, where the target image is the previous video frame of the reference image and the reference image is the video frame currently acquired by the image sensor. During video capture of the target object, the user triggers the effect of the target object disappearing from the image by inputting a target gesture instruction (for example, a hand-stretching gesture); when the terminal device recognizes the target gesture instruction, the special effect of removing the target object from the reference image can be triggered.
Step 102: Use the background area in the target image to complete and fill the target area in the reference image to generate a processed image.
After the removal effect is triggered, the background area in the target image is used to complete and fill the target area in the reference image to generate a processed image, where the target area is the area covered by the target object in the reference image.
During target-video recording of the target object, the user triggers by inputting a hand-stretching instruction, so that, in response to the user gesture during recording, the special effect of the target object disappearing from the current frame of the video is realized. It should also be noted that in this embodiment the target object may take the form of a target plant, a target animal, a target building, and so on, which is not specifically limited here; likewise, the trigger instruction may take the form of a target gesture instruction, a target voice instruction, a target expression instruction, a target text instruction, a target body instruction, and so on, which is also not specifically limited here.
Step 103: Display the processed image as the current video frame.
In this step, the processed image is displayed as the current video frame, where the processed image is an image in which at least part of the target object has been removed from the reference image, thereby realizing the special effect of the target object disappearing from the current frame of the video.
In this embodiment, a target image is acquired in response to a trigger instruction, where the target image is the previous video frame of the reference image and the reference image is the video frame currently acquired by the image sensor; the background area in the target image is then used to complete and fill the target area where the target object is located in the reference image to generate a processed image, so that only two adjacent frames are used for completion, which guarantees both the completion quality and the real-time performance of the completion process; finally, the processed image is displayed as the current video frame, so that the processed image is an image in which at least part of the target object has been removed from the reference image, realizing the special effect of the target object disappearing from the video.
FIG. 3 is a schematic flowchart of a video processing method according to another example embodiment of the present disclosure. As shown in FIG. 3, the video processing method provided by this embodiment includes:
Step 201: Acquire a target image in response to a trigger instruction.
In one possible scenario, when a terminal device is used to capture video of a target object, the camera on the terminal device is usually aimed at the target object; it is understandable that during framing the camera captures not only the target object but also the target background.
At this point, the terminal device acquires the target image in response to the trigger instruction, where the target image is the previous video frame of the reference image and the reference image is the video frame currently acquired by the image sensor. During video capture of the target object, the user triggers the effect of the target object disappearing from the image by inputting a target gesture instruction (for example, a hand-stretching gesture); when the terminal device recognizes the target gesture instruction, the special effect of removing the target object from the reference image can be triggered.
Step 202: Determine, for each pixel in the target background, the photometric error relative to the corresponding pixel in the reference background, the cell deformation error determined by the similarity transformation of the grid vertices corresponding to the pixel, and the grid-vertex inertia error regularizing the displacement of those grid vertices.
In this step, after the target image and the reference image are obtained, the target foreground corresponding to the target image and the reference foreground corresponding to the reference image can be determined by a preset foreground segmentation module; the target image includes the target foreground and the target background, and the reference image includes the reference foreground and the reference background. Grids are constructed for the target image, the target foreground, the reference image, and the reference foreground respectively, to generate a target image grid, a target foreground grid, a reference image grid, and a reference foreground grid. It should be noted that the preset foreground segmentation module can be based on any foreground segmentation algorithm that estimates a foreground object mask from the target image through computer vision and graphics algorithms, so as to determine the corresponding target foreground.
Then, for each pixel in the target background, the photometric error relative to the corresponding pixel in the reference background, the cell deformation error determined by the similarity transformation of the corresponding grid vertices, and the grid-vertex inertia error regularizing the corresponding vertex displacements are determined, where the target background corresponds to the background area of the target image and the reference background to the background area of the reference image.
For the construction of the above grids, image pyramids can be constructed for the target image, the target foreground, the reference image, and the reference foreground respectively, to generate a target image pyramid, a target foreground pyramid, a reference image pyramid, and a reference foreground pyramid with the same number of layers. Grids are then constructed for each image layer in these pyramids, where the grid shape and scale of corresponding layers in each pyramid are the same.
Specifically, an L-layer image pyramid can be constructed for each of the target image I_tar, the target foreground F_tar, the reference image I_ref, and the reference foreground F_ref, where L is generally set to 3; layer 0 denotes the topmost layer and layer L-1 the bottommost.
For each of these image pyramids, the grid of the topmost image is constructed first: with each cell sized M×M pixels, a grid is laid over the target image space according to the cell size, finally generating W_grid × H_grid grid vertices, where M generally takes a value between 30 and 50 pixels. The cell width and height of each subsequent (lower) layer are then twice those of the layer above, so that all L image layers have two-dimensional grids of the same shape but different scales.
For determining the pixels in the target background, sampling can be performed at each layer of the image pyramid. Specifically, the brightness gradient of each pixel on the first target image layer is computed, and the pixels on the first target image layer are traversed at a preset stride to determine a first target sample set, where the pixels in the first target sample set belong to the first target background layer and have brightness gradients greater than a preset first threshold; the first target image layer and the first target background layer are at the same level of the image pyramid. The preset stride can be N pixels, so that one pixel is taken every N pixels on the first target image layer, and the traversal direction may be along the horizontal or the vertical direction of the first target image layer.
Then, for each pixel on the first target image layer, the photometric error relative to the corresponding pixel in the reference background, the cell deformation error determined by the similarity transformation of the corresponding grid vertices, and the grid-vertex inertia error regularizing the corresponding vertex displacements are determined.
Specifically, the brightness gradients of the L target image layers are computed separately, and the pixels q on each target image layer are traversed at a stride of s pixels, where s is generally an integer between 1 and 5. If q is a background pixel, i.e., it belongs to the target background layer of that level, and the gradient magnitude at its location exceeds the threshold t_g, it is added to the target sample set of the corresponding level; each target image layer maintains its own sample set. The sample set of layer L can be expressed as:
S_L = {q | grad(q) > t_g && q ∈ Background_L}
where grad(q) is the brightness gradient at pixel position q, Background_L is the target background of layer L, and t_g can be set to 0.1.
Step 203: Determine an overall cost function from the photometric error, the cell deformation error, and the grid-vertex inertia error.
To realize the mesh-deformation optimization, the overall cost function adopted in this embodiment comprises three cost terms: the sampling-point photometric error f_q, the cell deformation error f_d, and the grid-vertex inertia error f_V.
Specifically, the photometric error f_q is given by the following formula (the published text shows it only as an image; the linearized form below is reconstructed from the definitions that follow):
f_q = I_tar(q) + ∇I_tar(q) · t_q − I_ref(q)
where I_tar denotes the target image, I_ref the reference image, q a pixel position in the target sample set S_L, I_tar(q) and I_ref(q) are the brightness values of the same target image layer and reference image layer at q, ∇I_tar(q) is the brightness gradient of the target image layer at q, and t_q is the displacement of q induced by the mesh deformation, defined below.
FIG. 4 is a schematic diagram of the deformation of the grid cell containing a pixel. As shown in FIG. 4, let the four vertices of the cell containing q be V_k, k = 1, 2, 3, 4, so that q can be expressed as a linear combination of these four vertices, with the coefficients C_k determined by bilinear interpolation:
q = Σ_{k=1..4} C_k V_k
and the displacement vector t_q of q can accordingly be expressed as:
t_q = Σ_{k=1..4} C_k t_{V_k}
Continuing with FIG. 4, the solid-line quadrilateral represents the cell before deformation and the dotted-line quadrilateral the cell after deformation. The deformation of the cell is achieved through vertex displacement: the interpolation coefficients C_k of q remain unchanged before and after deformation, and the deformation moves q to a more suitable position q', thereby reducing the photometric error.
The cell deformation error f_d consists of two components (the published text gives their expressions only as formula images): one represents the deformation error of edges l_12 and l_13 relative to l'_12 and l'_13, and the other the deformation error of edges l_12 and l_23 relative to l'_12 and l'_23, where l_12, l_13, and l_23 form one of the triangles obtained by splitting the cell, and l'_12, l'_13, and l'_23 form the corresponding deformed triangle.
By controlling the cell deformation error, the cells are kept as close as possible to a similarity transformation while deforming, so that the image filled into the foreground area looks visually more natural and better conforms to the perspective transformation. Specifically, each cell can be divided into two triangles; that is, the grid cell corresponding to each pixel in the target background is split into a first triangle and a second triangle, and the cell deformation error is used to constrain the first triangle to be similarity-transformed into the deformed third triangle and the second triangle into the deformed fourth triangle. FIG. 5 is a schematic diagram of the similarity transformation of a triangle in the grid; referring to FIG. 5, the values of the two components can be minimized so as to ensure, as far as possible, that each triangle conforms to a similarity transformation when the mesh deforms.
Specifically, {V_k, k=1,2,3} and {V'_k, k=1,2,3} denote the triangle vertices before and after deformation, and {l_k1k2, k1k2 = 12, 13, 23} and {l'_k1k2, k1k2 = 12, 13, 23} denote the triangle edge lengths before and after deformation.
The inertia error f_V is given by:
f_V = ‖t_V‖_2
where t_V ∈ R² denotes the displacement vector of grid vertex V; the inertia error thus regularizes the grid-vertex displacements, suppressing excessively large displacements.
Afterwards, based on the above photometric error, cell deformation error, and inertia error, the overall cost function can be constructed as a weighted combination of the three terms accumulated over all sample points, triangles, and grid vertices, with weights λ_1, λ_2, and λ_3 respectively (the published text gives the expression only as an image). Using this overall cost function, the optimization variables obtained are the displacement vectors t_V of the grid vertices. For the optimization algorithm, Gauss-Newton or the Levenberg-Marquardt (LM) algorithm can be used to obtain the grid-vertex displacement vectors and realize the mesh deformation. λ_1 can take the value 1, λ_2 a value between 0.2 and 0.5, and λ_3 the value 0.1.
Step 204: Determine the displacement vector of the grid vertices corresponding to each pixel in the target background according to the overall cost function and a preset optimization algorithm.
The displacement vectors of the grid vertices corresponding to each pixel in the target background are determined according to the overall cost function and a preset optimization algorithm, so that the target image is adjusted according to the displacement vectors to generate an optimized target image. In one possible implementation, grid deformation can be performed on the corresponding pixels in the first target sample set according to the displacement vectors, and the deformation result passed to the next target image layer in the target image pyramid, down to the bottom layer of the pyramid, to generate the optimized target image; the first target image layer is the top layer of the target image pyramid, and the deformation results propagate from the top of the pyramid toward the bottom.
For passing the grid-deformation result to the next target image layer of the pyramid, a coarse-to-fine approach can be adopted: starting from the topmost grid, the deformed grid vertices are propagated layer by layer to the next layer, where the vertex positions are optimized again, until the bottommost deformed grid vertices are obtained. It is understandable that when the first target image layer is the first layer of the image pyramid, grid deformation can be performed on the corresponding pixels in the first target sample set according to the displacement vectors and the result passed to the second target image layer of the pyramid, i.e., the grid-deformation result of the first target image layer serves as the initial grid of the second target image layer before its own deformation. The second target image layer is then deformed in the same way and its result passed to the third target image layer as that layer's initial grid; finally, the third target image layer is deformed in the same way and its result passed to the fourth target image layer as that layer's initial grid.
In addition, it is worth noting that since the cell width and height of each next pyramid layer are twice those of the layer above, the vertex positions must be scaled up when the optimization result of one layer is passed to the next layer, i.e., V'' = 2·V', where V'' is the vertex position on the next layer and V' the vertex position on the layer above.
Step 205: Generate an optimized target image after adjusting the target image according to the displacement vectors.
After the displacement vectors of the grid vertices corresponding to each pixel in the target background are determined according to the overall cost function and the preset optimization algorithm and the target image is adjusted accordingly, i.e., once the bottommost deformed-grid-vertex optimization is completed, the optimized target image can be generated.
Step 206: Compute the dense optical flow of the reference image relative to the optimized target image.
Step 207: Complete and fill the target area in the reference image according to the dense optical flow.
After the optimized target image is generated, the dense optical flow of the reference image relative to the optimized target image can be computed, and the target area in the reference image completed and filled according to the dense optical flow; thus, during target-video recording of the target object, the user's trigger instruction lets the terminal device respond to the user gesture and realize the special effect of the target object disappearing from the current frame of the video.
In addition, in this embodiment, the strategy of completing from two adjacent frames guarantees both the completion quality and the real-time performance of the completion process. Moreover, mesh optimization ensures that the computed dense optical flow keeps the photometric error low in the overlapping background area of the target and reference images while making adjacent pixels share consistent flow vectors, thereby effectively predicting the pixel flow inside the foreground area. Furthermore, by building image pyramids and then passing the mesh-vertex deformation of each layer on layer by layer, i.e., starting from the top of the image pyramid and passing the deformation of one layer to the next until the bottom layer is reached, a multi-scale processing strategy is formed that accelerates the optical-flow computation.
In the video processing method of the above embodiments, because bilinear interpolation is used while the target image performs real-time foreground completion of the reference image, the rendered image quality inside the foreground area gradually blurs in a static state. Therefore, on the basis of the above embodiments, before the background area of the target image is used to complete and fill the target area of the reference image, it must first be determined that the image acquisition state is the motion acquisition state. Specifically, through steps 201 to 204 of the above embodiment, the displacement of each grid vertex of the optimized target image relative to the corresponding grid vertex of the target image can be computed and the average displacement then calculated; if the average displacement is greater than or equal to a preset displacement threshold, the image acquisition state is determined to be the motion acquisition state. The displacement threshold can be set to 0.3-0.5 pixels.
If the determined average displacement is smaller than the preset displacement threshold, the image acquisition state is the static acquisition state; the target image I_tar is then set as the background image I_bg and the target foreground F_tar as the background foreground F_bg. For subsequently added video frames, if the image acquisition state is the static acquisition state, the background image is used for completion; otherwise, the motion-state foreground completion strategy, i.e., the steps of the embodiments shown in FIG. 2 or FIG. 3, is adopted.
The completion algorithm based on the background image I_bg is as follows:
Step 1: Background-image update. This can be executed every 5-10 frames, updating the background image I_bg and the background foreground F_bg with the background area of the reference image.
Step 2: Reference-image completion. The foreground area of the reference image is filled with the background area of the background image I_bg, and the foreground template of the reference image is updated at the same time.
In the above embodiments, the image acquisition state is determined by comparing the average displacement of the grid vertices with a preset displacement threshold, and the corresponding foreground completion strategy is then selected according to the determined state; this guarantees both the quality and the real-time performance of foreground completion in the motion acquisition state and the completion quality in the static acquisition state.
FIG. 6 is a schematic structural diagram of a video processing apparatus according to an example embodiment of the present disclosure. As shown in FIG. 6, the video processing apparatus 300 provided in this embodiment includes:
an image acquisition module 301, configured to acquire a target image in response to a trigger instruction, the target image being the previous video frame of a reference image, the reference image being the video frame currently acquired by an image sensor;
an image processing module 302, configured to use the background area in the target image to complete and fill the target area in the reference image to generate a processed image, the target area being the area covered by the target object in the reference image;
a display 303, configured to display the processed image as the current video frame, the processed image being an image in which at least part of the target object has been removed from the reference image.
According to one or more embodiments of the present disclosure, the image processing module 302 is specifically configured to:
determine, for each pixel in the target background, the photometric error relative to the corresponding pixel in the reference background, the cell deformation error determined by the similarity transformation of the grid vertices corresponding to the pixel, and the grid-vertex inertia error regularizing the displacement of those grid vertices, where the target background corresponds to the background area of the target image and the reference background to the background area of the reference image;
determine an overall cost function from the photometric error, the cell deformation error, and the grid-vertex inertia error;
determine the displacement vector of the grid vertices corresponding to each pixel in the target background according to the overall cost function and a preset optimization algorithm, so as to adjust the target image according to the displacement vectors and generate an optimized target image;
compute the dense optical flow of the reference image relative to the optimized target image, so as to complete and fill the target area in the reference image according to the dense optical flow.
According to one or more embodiments of the present disclosure, the image processing module 302 is further configured to:
determine the target foreground corresponding to the target image and the reference foreground corresponding to the reference image, the target image including the target foreground and the target background, the reference image including the reference foreground and the reference background;
construct grids for the target image, the target foreground, the reference image, and the reference foreground respectively, to generate a target image grid, a target foreground grid, a reference image grid, and a reference foreground grid.
According to one or more embodiments of the present disclosure, the image processing module 302 is specifically configured to:
construct image pyramids for the target image, the target foreground, the reference image, and the reference foreground respectively, to generate a target image pyramid, a target foreground pyramid, a reference image pyramid, and a reference foreground pyramid with the same number of layers;
construct grids for each image layer in the target image pyramid, the target foreground pyramid, the reference image pyramid, and the reference foreground pyramid, where the grid shape and scale of corresponding layers in each pyramid are the same.
According to one or more embodiments of the present disclosure, the image processing module 302 is specifically configured to:
compute the brightness gradient of each pixel on a first target image layer and traverse the pixels on the first target image layer at a preset stride to determine a first target sample set, where the pixels in the first target sample set belong to a first target background layer and have brightness gradients greater than a preset first threshold, the first target image layer and the first target background layer being at the same level of the image pyramid;
determine, for each pixel on the first target image layer, the photometric error relative to the corresponding pixel in the reference background, the cell deformation error determined by the similarity transformation of the corresponding grid vertices, and the grid-vertex inertia error regularizing the corresponding vertex displacements.
According to one or more embodiments of the present disclosure, the image processing module 302 is specifically configured to:
perform grid deformation on the corresponding pixels in the first target sample set according to the displacement vectors and pass the deformation result to the next target image layer in the target image pyramid, down to the bottom layer of the pyramid, to generate the optimized target image, where the first target image layer is the top layer of the target image pyramid.
According to one or more embodiments of the present disclosure, the image processing module 302 is specifically configured to:
split the grid cell corresponding to each pixel in the target background into a first triangle and a second triangle;
according to the cell deformation error, similarity-transform the first triangle into a deformed third triangle and the second triangle into a deformed fourth triangle.
According to one or more embodiments of the present disclosure, the image processing module 302 is further configured to: determine that the image acquisition state is the motion acquisition state.
According to one or more embodiments of the present disclosure, the image processing module 302 is specifically configured to:
compute the average displacement of each grid vertex in the optimized target image relative to the corresponding grid vertex in the target image;
if the average displacement is greater than or equal to a preset displacement threshold, determine that the image acquisition state is the motion acquisition state.
It should be noted that the video processing apparatus provided by the embodiment shown in FIG. 6 can be used to execute the method steps provided by any of the above method embodiments; the specific implementations and technical effects are similar and are not repeated here.
FIG. 7 is a schematic structural diagram of an electronic device according to an example embodiment of the present disclosure. As shown in FIG. 7, it illustrates the structure of an electronic device 400 suitable for implementing the embodiments of the present disclosure. The terminal device in the embodiments of the present disclosure may include, but is not limited to, mobile terminals with an image acquisition function such as mobile phones, notebook computers, digital broadcast receivers, personal digital assistants (PDAs), tablet computers (Portable Android Devices, PADs), portable multimedia players (PMPs), vehicle-mounted terminals (for example, vehicle-mounted navigation terminals), and wearable electronic devices, as well as fixed terminals externally connected with an image acquisition device, such as digital TVs, desktop computers, and smart home devices. The electronic device shown in FIG. 7 is merely an example and should not impose any limitation on the functions and scope of use of the embodiments of the present disclosure.
As shown in FIG. 7, the electronic device 400 may include a processor (such as a central processing unit or a graphics processing unit) 401, which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 402 or a program loaded from a memory 408 into a random access memory (RAM) 403. The RAM 403 also stores various programs and data required for the operation of the electronic device 400. The processor 401, the ROM 402, and the RAM 403 are connected to one another through a bus 404. An input/output (I/O) interface 405 is also connected to the bus 404. The memory is used to store a program for executing the video processing methods described in the above method embodiments, and the processor is configured to execute the program stored in the memory.
Generally, the following apparatuses may be connected to the I/O interface 405: an input apparatus 406 including, for example, a touch screen, a touch pad, a keyboard, a mouse, a camera, a microphone, an accelerometer, and a gyroscope; an output apparatus 407 including, for example, a liquid crystal display (LCD), a speaker, and a vibrator; a memory 408 including, for example, a magnetic tape and a hard disk; and a communication apparatus 409. The communication apparatus 409 may allow the electronic device 400 to communicate with other devices wirelessly or by wire to exchange data. Although FIG. 7 shows the electronic device 400 with various apparatuses, it should be understood that it is not required to implement or provide all of the apparatuses shown; more or fewer apparatuses may alternatively be implemented or provided.
In particular, according to the embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, an embodiment of the present disclosure includes a computer-readable storage medium that includes a computer program carried on a non-transitory computer-readable medium, the computer program containing program code for executing the video processing method shown in the flowcharts of the embodiments of the present disclosure. In such an embodiment, the computer program may be downloaded from a network and installed through the communication apparatus 409, or installed from the memory 408, or installed from the ROM 402. When the computer program is executed by the processor 401, the above video processing functions defined in the methods of the embodiments of the present disclosure are executed.
It should be noted that the computer-readable storage medium described above in the present disclosure may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. The computer-readable storage medium may be, for example, but is not limited to, an electric, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the present disclosure, a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in combination with an instruction execution system, apparatus, or device. In the present disclosure, a computer-readable signal medium may include a data signal propagated in a baseband or as part of a carrier wave, in which computer-readable program code is carried. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium; the computer-readable signal medium can send, propagate, or transmit a program for use by or in combination with an instruction execution system, apparatus, or device. The program code contained on the computer-readable medium may be transmitted using any appropriate medium, including but not limited to: an electric wire, an optical cable, radio frequency (RF), or any suitable combination of the above.
The computer-readable storage medium described above may be contained in the electronic device described above, or may exist separately without being assembled into the electronic device.
The computer-readable storage medium described above carries one or more programs; when the one or more programs are executed by the electronic device, the electronic device is caused to: acquire a target image in response to a trigger instruction, the target image being the previous video frame of a reference image and the reference image being the video frame currently acquired by an image sensor; complete and fill a target area in the reference image by using a background area in the target image so as to generate a processed image, the target area being the area covered by a target object in the reference image; and display the processed image as the current video frame, the processed image being an image in which at least part of the target object has been removed from the reference image.
The computer program code for performing the operations of the present disclosure may be written in one or more programming languages or a combination thereof; the programming languages include, but are not limited to, object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may be executed entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case involving a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
In some embodiments, the client and the server may communicate using any currently known or future-developed network protocol, such as the HyperText Transfer Protocol (HTTP), and may be interconnected with digital data communication (for example, a communication network) in any form or medium. Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), an internetwork (for example, the Internet), and a peer-to-peer network (for example, an ad hoc peer-to-peer network), as well as any currently known or future-developed network.
The flowcharts and block diagrams in the accompanying drawings illustrate the possible architectures, functions, and operations of systems, methods, and computer program products according to various embodiments of the present disclosure. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or a part of code that contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, and they may sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The modules described in the embodiments of the present disclosure may be implemented by software or by hardware. The name of a module does not, in some cases, constitute a limitation on the unit itself; for example, a display module may also be described as "a unit that displays the face of a target person and a sequence of face masks".
The functions described herein above may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that can be used include: field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), and so on.
In the context of the present disclosure, a machine-readable medium may be a tangible medium that can contain or store a program for use by or in combination with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the above. More specific examples of the machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
In a first aspect, according to one or more embodiments of the present disclosure, a video processing method is provided, including:
acquiring a target image in response to a trigger instruction, where the target image is the previous video frame of a reference image, and the reference image is the video frame currently acquired by an image sensor;
completing and filling a target area in the reference image by using a background area in the target image, so as to generate a processed image, where the target area is the area covered by a target object in the reference image; and
displaying the processed image as the current video frame, where the processed image is an image in which at least part of the target object has been removed from the reference image.
According to one or more embodiments of the present disclosure, the completing and filling a target area in the reference image by using a background area in the target image includes:
determining a photometric error of each pixel in a target background relative to the corresponding pixel in a reference background, a square deformation error determined by similarity transformation for the grid vertex corresponding to each pixel in the target background, and a grid vertex inertia error regularizing the displacement of the grid vertex corresponding to each pixel in the target background, where the target background corresponds to the background area in the target image and the reference background corresponds to the background area in the reference image;
determining an overall cost function according to the photometric error, the square deformation error, and the grid vertex inertia error;
determining, according to the overall cost function and a preset optimization algorithm, the displacement vector of the grid vertex corresponding to each pixel in the target background, so as to generate an optimized target image after the target image is adjusted according to the displacement vectors;
computing the dense optical flow of the reference image relative to the optimized target image; and
completing and filling the target area in the reference image according to the dense optical flow.
According to one or more embodiments of the present disclosure, before the determining a photometric error of each pixel in the target background relative to the corresponding pixel in the reference background, the square deformation error determined by similarity transformation for the grid vertex corresponding to each pixel in the target background, and the grid vertex inertia error regularizing the displacement of the grid vertex corresponding to each pixel in the target background, the method further includes:
determining a target foreground corresponding to the target image and a reference foreground corresponding to the reference image, where the target image includes the target foreground and the target background, and the reference image includes the reference foreground and the reference background; and
constructing grids for the target image, the target foreground, the reference image, and the reference foreground respectively, so as to generate a target image grid, a target foreground grid, a reference image grid, and a reference foreground grid.
According to one or more embodiments of the present disclosure, the constructing grids for the target image, the target foreground, the reference image, and the reference foreground respectively to generate a target image grid, a target foreground grid, a reference image grid, and a reference foreground grid includes:
constructing image pyramids for the target image, the target foreground, the reference image, and the reference foreground respectively, so as to generate a target image pyramid, a target foreground pyramid, a reference image pyramid, and a reference foreground pyramid with the same number of layers; and
constructing grids for each image layer in the target image pyramid, the target foreground pyramid, the reference image pyramid, and the reference foreground pyramid respectively, where the grids at corresponding levels of the image pyramids have the same shape and scale.
According to one or more embodiments of the present disclosure, the determining a photometric error of each pixel in the target background relative to the corresponding pixel in the reference background, the square deformation error determined by similarity transformation for the grid vertex corresponding to each pixel in the target background, and the grid vertex inertia error regularizing the displacement of the grid vertex corresponding to each pixel in the target background includes:
computing the brightness gradient of each pixel on a first target image layer, and traversing the pixels on the first target image layer with a preset step size so as to determine a first target sample set, where the pixels in the first target sample set belong to a first target background layer and have brightness gradients greater than a preset first threshold, and the first target image layer and the first target background layer are located at the same level of the image pyramids; and
determining the photometric error of each pixel on the first target image layer relative to the corresponding pixel in the reference background, the square deformation error determined by similarity transformation for the grid vertex corresponding to each pixel on the first target image layer, and the grid vertex inertia error regularizing the displacement of the grid vertex corresponding to each pixel on the first target image layer.
According to one or more embodiments of the present disclosure, the determining, according to the overall cost function and the preset optimization algorithm, the displacement vector of the grid vertex corresponding to each pixel in the target background so as to generate the optimized target image after the target image is adjusted according to the displacement vectors includes:
performing grid deformation on the corresponding pixels in the first target sample set according to the displacement vectors, and passing the grid deformation result to the next target image layer in the target image pyramid, down to the bottom layer of the target image pyramid, so as to generate the optimized target image, where the first target image layer is the top layer of the target image pyramid.
According to one or more embodiments of the present disclosure, after the determining the square deformation error determined by similarity transformation for the grid vertex corresponding to each pixel in the target background, the method further includes:
dividing the grid cell corresponding to each pixel in the target background into a first triangle and a second triangle; and
similarity-transforming, according to the square deformation error, the first triangle into a deformed third triangle and the second triangle into a deformed fourth triangle.
According to one or more embodiments of the present disclosure, before the completing and filling the target area in the reference image by using the background area in the target image, the method further includes:
determining that the image acquisition state is a motion acquisition state.
According to one or more embodiments of the present disclosure, the determining that the image acquisition state is a motion acquisition state includes:
computing the average displacement of each grid vertex in the optimized target image relative to the corresponding grid vertex in the target image; and
if the average displacement is greater than or equal to a preset displacement threshold, determining that the image acquisition state is the motion acquisition state.
In a second aspect, according to one or more embodiments of the present disclosure, a video processing apparatus is provided, including:
an image acquisition module, configured to acquire a target image in response to a trigger instruction, where the target image is the previous video frame of a reference image, and the reference image is the video frame currently acquired by an image sensor;
an image processing module, configured to complete and fill a target area in the reference image by using a background area in the target image, so as to generate a processed image, where the target area is the area covered by a target object in the reference image; and
a display, configured to display the processed image as the current video frame, where the processed image is an image in which at least part of the target object has been removed from the reference image.
According to one or more embodiments of the present disclosure, the image processing module is specifically configured to:
determine a photometric error of each pixel in a target background relative to the corresponding pixel in a reference background, a square deformation error determined by similarity transformation for the grid vertex corresponding to each pixel in the target background, and a grid vertex inertia error regularizing the displacement of the grid vertex corresponding to each pixel in the target background, where the target background corresponds to the background area in the target image and the reference background corresponds to the background area in the reference image;
determine an overall cost function according to the photometric error, the square deformation error, and the grid vertex inertia error;
determine, according to the overall cost function and a preset optimization algorithm, the displacement vector of the grid vertex corresponding to each pixel in the target background, so as to generate an optimized target image after the target image is adjusted according to the displacement vectors; and
compute the dense optical flow of the reference image relative to the optimized target image, so as to complete and fill the target area in the reference image according to the dense optical flow.
According to one or more embodiments of the present disclosure, the image processing module is further configured to:
determine a target foreground corresponding to the target image and a reference foreground corresponding to the reference image, where the target image includes the target foreground and the target background, and the reference image includes the reference foreground and the reference background; and
construct grids for the target image, the target foreground, the reference image, and the reference foreground respectively, so as to generate a target image grid, a target foreground grid, a reference image grid, and a reference foreground grid.
According to one or more embodiments of the present disclosure, the image processing module is specifically configured to:
construct image pyramids for the target image, the target foreground, the reference image, and the reference foreground respectively, so as to generate a target image pyramid, a target foreground pyramid, a reference image pyramid, and a reference foreground pyramid with the same number of layers; and
construct grids for each image layer in the target image pyramid, the target foreground pyramid, the reference image pyramid, and the reference foreground pyramid respectively, where the grids at corresponding levels of the image pyramids have the same shape and scale.
According to one or more embodiments of the present disclosure, the image processing module is specifically configured to:
compute the brightness gradient of each pixel on a first target image layer, and traverse the pixels on the first target image layer with a preset step size so as to determine a first target sample set, where the pixels in the first target sample set belong to a first target background layer and have brightness gradients greater than a preset first threshold, and the first target image layer and the first target background layer are located at the same level of the image pyramids; and
determine the photometric error of each pixel on the first target image layer relative to the corresponding pixel in the reference background, the square deformation error determined by similarity transformation for the grid vertex corresponding to each pixel on the first target image layer, and the grid vertex inertia error regularizing the displacement of the grid vertex corresponding to each pixel on the first target image layer.
According to one or more embodiments of the present disclosure, the image processing module is specifically configured to:
perform grid deformation on the corresponding pixels in the first target sample set according to the displacement vectors, and pass the grid deformation result to the next target image layer in the target image pyramid, down to the bottom layer of the target image pyramid, so as to generate the optimized target image, where the first target image layer is the top layer of the target image pyramid.
According to one or more embodiments of the present disclosure, the image processing module is specifically configured to:
divide the grid cell corresponding to each pixel in the target background into a first triangle and a second triangle; and
similarity-transform, according to the square deformation error, the first triangle into a deformed third triangle and the second triangle into a deformed fourth triangle.
According to one or more embodiments of the present disclosure, the image processing module is further configured to: determine that the image acquisition state is a motion acquisition state.
According to one or more embodiments of the present disclosure, the image processing module is specifically configured to:
compute the average displacement of each grid vertex in the optimized target image relative to the corresponding grid vertex in the target image; and
if the average displacement is greater than or equal to a preset displacement threshold, determine that the image acquisition state is the motion acquisition state.
In a third aspect, an embodiment of the present disclosure provides an electronic device, including:
a processor;
a memory for storing a computer program of the processor; and
a display for displaying video processed by the processor;
where the processor is configured to implement, by executing the computer program, the video processing method described in the first aspect and the various possible designs of the first aspect.
In a fourth aspect, an embodiment of the present disclosure provides a computer-readable storage medium in which computer-executable instructions are stored; when a processor executes the computer-executable instructions, the video processing method described in the first aspect and the various possible designs of the first aspect is implemented.
In a fifth aspect, an embodiment of the present disclosure provides a computer program product including a computer program; when the computer program is executed by a processor, the video processing method described in the first aspect and the various possible designs of the first aspect is implemented.
In a sixth aspect, an embodiment of the present disclosure provides a computer program; when the computer program is executed by a processor, the video processing method described in the first aspect and the various possible designs of the first aspect is implemented.
The above description is merely of the preferred embodiments of the present disclosure and of the technical principles applied. Those skilled in the art should understand that the scope of disclosure involved in the present disclosure is not limited to technical solutions formed by the specific combination of the above technical features, and should also cover other technical solutions formed by any combination of the above technical features or their equivalents without departing from the above disclosed concept, for example, technical solutions formed by replacing the above features with technical features having similar functions disclosed in (but not limited to) the present disclosure.
Furthermore, although the operations are depicted in a specific order, this should not be understood as requiring that the operations be performed in the specific order shown or in a sequential order. Under certain circumstances, multitasking and parallel processing may be advantageous. Likewise, although several specific implementation details are included in the above discussion, these should not be construed as limiting the scope of the present disclosure. Certain features described in the context of separate embodiments can also be implemented in combination in a single embodiment; conversely, various features described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination.
Although the subject matter has been described in language specific to structural features and/or method logical actions, it should be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or actions described above. Rather, the specific features and actions described above are merely example forms of implementing the claims.

Claims (14)

  1. A video processing method, comprising:
    acquiring a target image in response to a trigger instruction, wherein the target image is the previous video frame of a reference image, and the reference image is the video frame currently acquired by an image sensor;
    completing and filling a target area in the reference image by using a background area in the target image, so as to generate a processed image, wherein the target area is the area covered by a target object in the reference image; and
    displaying the processed image as the current video frame, wherein the processed image is an image in which at least part of the target object has been removed from the reference image.
  2. The video processing method according to claim 1, wherein the completing and filling a target area in the reference image by using a background area in the target image comprises:
    determining a photometric error of each pixel in a target background relative to the corresponding pixel in a reference background, a square deformation error determined by similarity transformation for the grid vertex corresponding to each pixel in the target background, and a grid vertex inertia error regularizing the displacement of the grid vertex corresponding to each pixel in the target background, wherein the target background corresponds to the background area in the target image and the reference background corresponds to the background area in the reference image;
    determining an overall cost function according to the photometric error, the square deformation error, and the grid vertex inertia error;
    determining, according to the overall cost function and a preset optimization algorithm, a displacement vector of the grid vertex corresponding to each pixel in the target background, so as to generate an optimized target image after the target image is adjusted according to the displacement vectors;
    computing a dense optical flow of the reference image relative to the optimized target image; and
    completing and filling the target area in the reference image according to the dense optical flow.
  3. The video processing method according to claim 2, wherein before the determining a photometric error of each pixel in the target background relative to the corresponding pixel in the reference background, the square deformation error determined by similarity transformation for the grid vertex corresponding to each pixel in the target background, and the grid vertex inertia error regularizing the displacement of the grid vertex corresponding to each pixel in the target background, the method further comprises:
    determining a target foreground corresponding to the target image and a reference foreground corresponding to the reference image, wherein the target image comprises the target foreground and the target background, and the reference image comprises the reference foreground and the reference background; and
    constructing grids for the target image, the target foreground, the reference image, and the reference foreground respectively, so as to generate a target image grid, a target foreground grid, a reference image grid, and a reference foreground grid.
  4. The video processing method according to claim 3, wherein the constructing grids for the target image, the target foreground, the reference image, and the reference foreground respectively to generate a target image grid, a target foreground grid, a reference image grid, and a reference foreground grid comprises:
    constructing image pyramids for the target image, the target foreground, the reference image, and the reference foreground respectively, so as to generate a target image pyramid, a target foreground pyramid, a reference image pyramid, and a reference foreground pyramid with the same number of layers; and
    constructing grids for each image layer in the target image pyramid, the target foreground pyramid, the reference image pyramid, and the reference foreground pyramid respectively, wherein the grids at corresponding levels of the image pyramids have the same shape and scale.
  5. The video processing method according to any one of claims 2 to 4, wherein the determining a photometric error of each pixel in the target background relative to the corresponding pixel in the reference background, the square deformation error determined by similarity transformation for the grid vertex corresponding to each pixel in the target background, and the grid vertex inertia error regularizing the displacement of the grid vertex corresponding to each pixel in the target background comprises:
    computing the brightness gradient of each pixel on a first target image layer, and traversing the pixels on the first target image layer with a preset step size so as to determine a first target sample set, wherein the pixels in the first target sample set belong to a first target background layer and have brightness gradients greater than a preset first threshold, and the first target image layer and the first target background layer are located at the same level of the image pyramids; and
    determining the photometric error of each pixel on the first target image layer relative to the corresponding pixel in the reference background, the square deformation error determined by similarity transformation for the grid vertex corresponding to each pixel on the first target image layer, and the grid vertex inertia error regularizing the displacement of the grid vertex corresponding to each pixel on the first target image layer.
  6. The video processing method according to claim 5, wherein the determining, according to the overall cost function and the preset optimization algorithm, the displacement vector of the grid vertex corresponding to each pixel in the target background so as to generate the optimized target image after the target image is adjusted according to the displacement vectors comprises:
    performing grid deformation on the corresponding pixels in the first target sample set according to the displacement vectors, and passing the grid deformation result to the next target image layer in the target image pyramid, down to the bottom layer of the target image pyramid, so as to generate the optimized target image, wherein the first target image layer is the top layer of the target image pyramid.
  7. The video processing method according to any one of claims 2 to 6, wherein after the determining the square deformation error determined by similarity transformation for the grid vertex corresponding to each pixel in the target background, the method further comprises:
    dividing the grid cell corresponding to each pixel in the target background into a first triangle and a second triangle; and
    similarity-transforming, according to the square deformation error, the first triangle into a deformed third triangle and the second triangle into a deformed fourth triangle.
  8. The video processing method according to any one of claims 1 to 7, wherein before the completing and filling the target area in the reference image by using the background area in the target image, the method further comprises:
    determining that an image acquisition state is a motion acquisition state.
  9. The video processing method according to claim 8, wherein the determining that the image acquisition state is a motion acquisition state comprises:
    computing the average displacement of each grid vertex in the optimized target image relative to the corresponding grid vertex in the target image; and
    if the average displacement is greater than or equal to a preset displacement threshold, determining that the image acquisition state is the motion acquisition state.
  10. A video processing apparatus, comprising:
    an image acquisition module, configured to acquire a target image in response to a trigger instruction, wherein the target image is the previous video frame of a reference image, and the reference image is the video frame currently acquired by an image sensor;
    an image processing module, configured to complete and fill a target area in the reference image by using a background area in the target image, so as to generate a processed image, wherein the target area is the area covered by a target object in the reference image; and
    a display, configured to display the processed image as the current video frame, wherein the processed image is an image in which at least part of the target object has been removed from the reference image.
  11. An electronic device, comprising:
    a processor;
    a memory for storing a computer program, wherein the processor is configured to implement, by executing the computer program, the video processing method according to any one of claims 1 to 9; and
    a display for displaying video processed by the processor.
  12. A computer-readable storage medium, wherein computer-executable instructions are stored in the computer-readable storage medium, and when a processor executes the computer-executable instructions, the video processing method according to any one of claims 1 to 9 is implemented.
  13. A computer program product comprising a computer program, wherein when the computer program is executed by a processor, the video processing method according to any one of claims 1 to 9 is implemented.
  14. A computer program, wherein when the computer program is executed by a processor, the video processing method according to any one of claims 1 to 9 is implemented.