WO2021184303A1 - Video processing method and device - Google Patents

Video processing method and device Download PDF

Info

Publication number
WO2021184303A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
video
video frame
frame
generate
Prior art date
Application number
PCT/CN2020/080221
Other languages
French (fr)
Chinese (zh)
Inventor
布雷顿·雷米
Original Assignee
深圳市创梦天地科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳市创梦天地科技有限公司 filed Critical 深圳市创梦天地科技有限公司
Priority to PCT/CN2020/080221 priority Critical patent/WO2021184303A1/en
Publication of WO2021184303A1 publication Critical patent/WO2021184303A1/en

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 15/00 - 3D [Three Dimensional] image rendering
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 19/00 - Manipulating 3D models or images for computer graphics

Definitions

  • The present invention relates to the field of image processing, and in particular to a video processing method and device.
  • Augmented Reality (AR) has developed rapidly in recent years and has attracted widespread attention.
  • The precise registration and tracking of virtual objects, the seamless fusion of the virtual and the real, and the real-time interaction between the user and the scene determine the realism, immersion, and interactivity of AR.
  • When a virtual object image is added to a video or photo without further processing, the virtual object image in the AR fusion scene is simply overlaid on the original background photo or video, which causes the virtual object to float above the images of real-world objects.
  • No matter how the real object moves, the virtual object image always occludes it, so the virtual object image cannot blend with the real object images in the video; in other words, the correct occlusion relationship between real and virtual objects is not achieved.
  • This affects the user's viewing experience. A wrong occlusion relationship gives users the illusion that the relative positions of virtual and real objects are distorted and confuses depth perception, which reduces the realism of the fused scene.
  • The present invention provides a video processing method and device for handling the occlusion of a virtual object image by a moving object image.
  • In a first aspect, an embodiment of the present invention provides a video processing method. The method includes: taking a first photo; then shooting a video and detecting that the video contains an image of a moving object.
  • Each video frame in the video is composed of a background image and the image of the moving object, and the background image is consistent with the image of the first photo.
  • Each video frame in the video is compared with the first photo, and an opaque mask image corresponding to each video frame is generated.
  • In each opaque mask image, the image area corresponding to the moving object is opaque,
  • and the image area outside the moving object is transparent.
  • Each video frame is fused with its corresponding opaque mask image and with the image of the virtual object to generate a composite frame for that video frame. The composite frames of all video frames are then concatenated to generate a composite video.
  • Comparing each video frame in the video with the first photo to generate the corresponding opaque mask image specifically includes: calculating a difference value between the first photo and the first video frame in one or more color spaces,
  • where the difference value is a length difference, a square-root difference, or a product difference,
  • and the calculation produces multiple gray-scale segmented images.
  • The multiple gray-scale segmented images are merged to obtain a first gray-scale segmented image.
  • The image of the moving object in the first gray-scale segmented image is segmented out and used as the first opaque mask image.
  • Each video frame in the video is compared with the first photo in the same way, and an opaque mask image corresponding to each video frame is generated.
  • The first photo must not contain an image of the moving object.
  • Fusing each video frame with its corresponding opaque mask image and the image of the virtual object to generate a composite frame specifically includes: fusing each video frame with its corresponding opaque mask image to generate an occlusion frame for that video frame.
  • Each video frame is also fused with the image of the virtual object to generate a rendered frame corresponding to that video frame.
  • The occlusion frame corresponding to each video frame is fused with the rendered frame corresponding to that video frame to generate the composite frame corresponding to that video frame.
  • The rendered frame is composed of a background image and the image of the virtual object.
  • The background image is consistent with the image of the first photo.
  • When the image of the virtual object overlaps an image area of the background image, the image of the virtual object occludes the background image.
  • In a second aspect, an embodiment of the present invention provides a video processing device, which includes a shooting module, a segmentation module, a rendering module, and a synthesis module.
  • The shooting module is used to take the first photo; it is also used to shoot a video, in which an image of a moving object is detected.
  • Each video frame in the video is composed of a background image and the image of the moving object, and the background image is consistent with the image of the first photo.
  • The segmentation module is used to compare each video frame in the video with the first photo and generate an opaque mask image corresponding to each video frame. In each opaque mask image, the image area corresponding to the moving object is opaque, and the image area outside the moving object is transparent.
  • The rendering module is used to fuse each video frame with the image of the virtual object to generate a rendered frame corresponding to each video frame.
  • The synthesis module is used to fuse each video frame with its corresponding opaque mask image and its corresponding rendered frame to generate a composite frame corresponding to that video frame.
  • The synthesis module is also used to concatenate the composite frames of all video frames to generate a composite video.
  • The segmentation module is specifically configured to: calculate a difference value between the first photo and the first video frame in one or more color spaces, where the difference value is a length difference, a square-root difference, or a product difference, producing multiple gray-scale segmented images; merge the multiple gray-scale segmented images to obtain a first gray-scale segmented image; segment out the image of the moving object in the first gray-scale segmented image as the first opaque mask image; and compare each video frame in the video with the first photo in the same way to generate the opaque mask image corresponding to each video frame.
  • The first photo must not contain an image of the moving object.
  • The synthesis module is specifically configured to fuse each video frame with its corresponding opaque mask image to generate the occlusion frame corresponding to that video frame.
  • The occlusion frame corresponding to each video frame is fused with the rendered frame corresponding to that video frame to generate the composite frame, and the composite frames of all video frames are concatenated to generate the composite video.
  • The rendered frame is composed of a background image and the image of the virtual object.
  • The background image is consistent with the image of the first photo.
  • When the image of the virtual object overlaps an image area of the background image, the image of the virtual object occludes the background image.
  • With the above technical solution, the moving object image in a video or photo can occlude the virtual object image, and the occlusion relationship between virtual and real objects can be changed, so that the user sees the intended augmented-reality visual effect and a more natural, realistic, and highly immersive AR fusion scene.
  • This improves the user's perception and experience of the scene and gives a better visual experience.
  • FIG. 1a is a schematic diagram of the effect of a video processing method provided by an embodiment of the present invention.
  • FIG. 1b is a schematic diagram of the effect of a video processing method provided by an embodiment of the present invention.
  • FIG. 1c is a schematic diagram of the effect of a video processing method provided by an embodiment of the present invention.
  • FIG. 1d is a schematic diagram of the effect of a video processing method provided by an embodiment of the present invention.
  • FIG. 2 is a flowchart of a video processing method provided by an embodiment of the present invention.
  • FIG. 3 is a schematic diagram of a method for processing video frames according to an embodiment of the present invention.
  • FIG. 4 is a schematic diagram of the modules of a video processing device provided by an embodiment of the present invention.
  • FIG. 5 is a schematic diagram of the hardware structure of a video processing device provided by an embodiment of the present invention.
  • The present invention provides an occlusion method and device for video processing, which handle the occlusion of a virtual object image by a moving object image. The method segments out the moving object image, uses it as an opaque mask, and then covers the virtual object image with that mask.
  • In a video or photo, the moving object image can then occlude the virtual object image, the occlusion relationship between virtual and real objects can be changed, and the user sees the intended augmented-reality visual effect in a more natural, realistic, and highly immersive way.
  • The AR fusion scene improves the user's perception and experience of the scene and gives a better visual experience.
  • The method also provides automatic calibration and augmented-reality support.
  • FIG. 1 is a schematic diagram of an implementation effect of this application.
  • FIG. 1a shows the background image, that is, the area that requires image processing.
  • The example background image in FIG. 1a contains a tree, a stool, and a white cloud.
  • FIG. 1b shows a moving object entering the background during video shooting.
  • In FIG. 1b, the example moving object is a running person.
  • FIG. 1c shows a virtual object image inserted into the video or photo. Without any image processing, the virtual object image is added as the topmost layer of the picture, covering both the background image and the moving object image.
  • In this example, the virtual object image is an animated character named Pikachu.
  • FIG. 1d shows that, using a method provided by this application, the moving object image can be placed on top of the virtual object image, achieving the visual effect of the moving object occluding the virtual object image.
  • In FIG. 1d, the running person occludes Pikachu.
  • The specific steps of the video processing method provided by this application are shown in FIG. 2 and include the following.
  • The user takes a background snapshot image and then performs noise reduction on it.
  • During image generation and transmission, images are often degraded by various kinds of noise, which adversely affects subsequent image processing and the visual quality of the image. Therefore, to suppress the influence of noise, improve image quality, and facilitate higher-level processing, the image must be denoised.
  • The background snapshot image can be denoised with a median filter or a Gaussian filter.
  • The denoised image is called the reference image, that is, the first photo; it is also the area into which the virtual object image is integrated in subsequent steps. The noise difference between the reference image and the background snapshot image is calculated and can be used in later steps.
  • In addition, the first photo does not contain an image of the moving object. A sketch of this denoising step is given below.
  • S102: Shoot a video, and detect that the video contains an image of a moving object.
  • Each video frame in the video is composed of a background image and the image of the moving object, and the background image is consistent with the reference image.
  • Specifically, when the moving object enters the real-time video,
  • each real-time video frame is compared with the reference image, a first gray-scale segmented image is generated, and the moving object image is segmented out as the opaque mask corresponding to that video frame.
  • In order to compare the reference image and each video frame more conveniently, the reference image and each video frame are first reduced to a smaller resolution. Because the image is normalized while its pixels are being reduced, this lowers the image's sensitivity to the noise produced by the camera sensor, so using a smaller image resolution during the comparison improves the quality of the comparison.
  • The amount of downscaling is set according to the actual situation; for example, the images can be reduced to 1/2 of the original size. The resolution of the resulting opaque mask is reduced accordingly, and the edges of the opaque mask become smoother, as illustrated in the sketch below.
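As a minimal sketch of the downscaling step (the 1/2 scale factor and the area-averaging interpolation are assumptions, not requirements of the patent):

```python
import cv2

def downscale_pair(reference_bgr, frame_bgr, scale=0.5):
    """Reduce the reference image and a video frame to the same, smaller
    resolution before comparison; INTER_AREA averages (normalizes) pixels."""
    small_ref = cv2.resize(reference_bgr, None, fx=scale, fy=scale,
                           interpolation=cv2.INTER_AREA)
    small_frame = cv2.resize(frame_bgr, None, fx=scale, fy=scale,
                             interpolation=cv2.INTER_AREA)
    return small_ref, small_frame
```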
  • After the resolution has been reduced, the reference image and each video frame are represented in multiple color spaces, and their differences are compared with comparison algorithms.
  • A comparison algorithm can be used to calculate, in each preset color space, the difference between the value of every pixel in the reference image and the corresponding pixel in the real-time video frame.
  • The color space can be a commonly used one such as RGB, HSV, YCbCr, LAB, or XYZ, or a custom color space configured by the user.
  • A custom color space is an industry-standard color space in which some parameter values have been modified as required.
  • A comparison algorithm can compare the difference between each pair of pixels at the same position in the reference image and the video frame.
  • The difference value can be computed as a length difference, a square-root difference, a product difference, and so on. After the calculation, every pixel yields a result of 0 or 1, where 0 represents white and 1 represents black, so the comparison algorithm produces an image composed of black and white pixels, that is, a gray-scale segmented image.
  • The white area of this image consists of the pixels at which the two images compared by the algorithm differ. A per-pixel comparison sketch is given below.
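A hedged sketch of one such comparison algorithm follows: it computes a per-pixel Euclidean ("length") difference in a single color space and thresholds it into a black-and-white segmented image in which differing pixels are rendered white. The LAB color space and the threshold value are assumptions, not values fixed by the patent.

```python
import cv2
import numpy as np

def grayscale_segmented_image(small_ref_bgr, small_frame_bgr,
                              colorspace=cv2.COLOR_BGR2LAB, threshold=25.0):
    """One comparison algorithm: per-pixel Euclidean ('length') difference in a
    chosen color space, thresholded into a black-and-white segmentation image."""
    ref = cv2.cvtColor(small_ref_bgr, colorspace).astype(np.float32)
    frame = cv2.cvtColor(small_frame_bgr, colorspace).astype(np.float32)
    # Length (Euclidean) difference between corresponding pixels.
    diff = np.linalg.norm(frame - ref, axis=2)
    # Pixels that differ form the white area of the gray-scale segmented image;
    # using 255 for white is just this sketch's convention.
    segmented = np.where(diff > threshold, 255, 0).astype(np.uint8)
    return segmented
```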
  • When the comparison algorithms produce different results, the gray-scale segmented images they yield also differ.
  • For example, comparison algorithm A may produce a gray-scale segmentation of the human body with part of an arm missing,
  • while comparison algorithm B may produce a gray-scale segmentation of the human body with part of a leg missing.
  • This application may also add a noise reference value to the comparison algorithms in order to filter out the results of comparison algorithms that are too sensitive to noise.
  • The noise reference value can serve as a criterion for selecting comparison algorithms: for example, if the noise reference value is greater than a certain set value, the results of the corresponding comparison algorithms are discarded. In practice, the discard criterion is chosen by engineers after testing.
  • The noise reference value used in the comparison algorithms may be the noise difference between the reference image and the background snapshot image computed in step S101. A sketch of this filtering follows below.
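One possible reading of this selection rule is sketched below; the pairing of one noise reference value per comparison algorithm, the function name, and the fallback behaviour are assumptions made for illustration only.

```python
def select_comparison_results(segmented_images, noise_reference_values, set_value):
    """Keep the gray-scale segmented images whose comparison algorithm has a
    noise reference value no greater than the configured set value; results of
    noise-sensitive algorithms are discarded."""
    kept = [seg for seg, noise in zip(segmented_images, noise_reference_values)
            if noise <= set_value]
    # Fall back to keeping everything if the criterion would discard all results.
    return kept if kept else segmented_images
```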
  • The final reduced-resolution gray-scale segmented image described above is then scaled in proportion to the resolution of the display screen.
  • The reference image and the video frame can be compared in multiple color spaces, such as the LAB, YCbCr, and CMYK color spaces.
  • Multiple comparison algorithms are used across the color spaces; for example, the LAB color space uses a first comparison algorithm, the YCbCr color space uses a second comparison algorithm, and the CMYK color space uses a third comparison algorithm.
  • Each color space yields a different gray-scale segmented image. The white areas of all the gray-scale segmented images are blended and combined to obtain the final gray-scale segmented image, as in the merging sketch below.
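The detailed description notes that the individual segmentations are merged by taking the maximum difference value; a minimal sketch of that merge (names assumed):

```python
import numpy as np

def merge_segmentations(segmented_images):
    """Combine several black-and-white segmented images into the final
    gray-scale segmented image by taking the per-pixel maximum, i.e. a pixel
    is white if any comparison algorithm marked it as differing."""
    merged = segmented_images[0]
    for seg in segmented_images[1:]:
        merged = np.maximum(merged, seg)
    return merged
```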
  • In each occlusion frame, the image area corresponding to the moving object is opaque, and the image area outside the moving object is transparent.
  • The shader can use different blending operations, such as Add and Sub, and different blending factors, such as SrcColor and One, to blend the 3D object with the real object image in the real-time video and create multiple combined results.
  • The shader used can be a Unity3D shader. A CPU-side compositing sketch is given after this paragraph.
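The blending itself is described as a GPU shader step (for example a Unity3D shader). As a language-neutral illustration of the same compositing idea, the sketch below layers the moving-object pixels over the rendered frame on the CPU wherever the opaque mask is set; it is not the Unity shader itself, and all names are assumptions.

```python
import cv2
import numpy as np

def composite_frame(video_frame_bgr, rendered_frame_bgr, opaque_mask):
    """Build one composite frame: start from the rendered frame (background +
    virtual object) and copy back the original video pixels wherever the
    opaque mask marks the moving object, so the moving object occludes the
    virtual object."""
    if opaque_mask.shape[:2] != video_frame_bgr.shape[:2]:
        # The mask was produced at reduced resolution; scale it back up.
        opaque_mask = cv2.resize(
            opaque_mask,
            (video_frame_bgr.shape[1], video_frame_bgr.shape[0]),
            interpolation=cv2.INTER_LINEAR)
    alpha = (opaque_mask.astype(np.float32) / 255.0)[..., None]  # opaque where moving object
    out = alpha * video_frame_bgr.astype(np.float32) \
        + (1.0 - alpha) * rendered_frame_bgr.astype(np.float32)
    return out.astype(np.uint8)
```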
  • The three-dimensional modeling software can be professional modeling software or other software with modeling functions; software with a Unity3D plug-in can be chosen.
  • The Unity3D plug-in supports adding virtual 3D objects and can be used on terminal platforms such as Android and iOS.
  • The generated model can be saved in formats such as .fbx and .obj.
  • The blended virtual object is not limited to 3D virtual objects; it can also be a 2D virtual object or the like.
  • The rendered frame is composed of the background image and the image of the virtual object, and the background image is consistent with the reference image.
  • When the image of the virtual object overlaps an image area of the background image, the image of the virtual object occludes the background image.
  • S107: Concatenate the composite frames corresponding to all video frames to generate a composite video; a sketch of this step follows below.
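As an illustration of S107 (the codec, frame rate, and file name are assumptions):

```python
import cv2

def write_composite_video(composite_frames, path="composite.mp4", fps=30.0):
    """Concatenate the per-frame composite frames into one composite video."""
    height, width = composite_frames[0].shape[:2]
    writer = cv2.VideoWriter(path, cv2.VideoWriter_fourcc(*"mp4v"),
                             fps, (width, height))
    for frame in composite_frames:
        writer.write(frame)     # frames are written in their original order
    writer.release()
```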
  • The present application provides a video processing device 400, which can make the image of a moving object cover the image of a virtual object added to a real-time video.
  • The device can be a mobile phone, a tablet computer, or another terminal device with camera photo/video functions.
  • The graphics processing unit (GPU) of the device can support the High-Level Shading Language (HLSL) or the OpenGL Shading Language (GLSL), and the device may also have a Simultaneous Localization and Mapping (SLAM) system.
  • The functional block diagram of the device is shown in FIG. 4.
  • The photographing module 401 is used to take the first photo. Specifically, the user takes a background snapshot image and then performs noise reduction on it. During image generation and transmission, images are often degraded by various kinds of noise, which adversely affects subsequent image processing and the visual quality of the image. Therefore, to suppress the influence of noise, improve image quality, and facilitate higher-level processing, the image must be denoised.
  • The background snapshot image can be denoised with a median filter or a Gaussian filter.
  • The denoised image is called the reference image, that is, the first photo; it is also the area into which the virtual object image is integrated in subsequent steps. The noise difference between the reference image and the background snapshot image is calculated and can be used in later steps. In addition, the first photo does not contain an image of the moving object.
  • The shooting module 401 is also used to shoot a video, in which an image of a moving object is detected.
  • Each video frame in the video is composed of a background image and the image of the moving object, and the background image is consistent with the reference image.
  • The segmentation module 402 is configured to compare each video frame in the video with the reference image and generate an opaque mask image corresponding to each video frame.
  • When the moving object enters the real-time video, each real-time video frame is compared with the reference image, a first gray-scale segmented image is generated, and the moving object image is segmented out as the opaque mask corresponding to that video frame.
  • In order to compare the reference image and each video frame more conveniently, the reference image and each video frame are first reduced to a smaller resolution. Because the image is normalized while its pixels are being reduced, this lowers the image's sensitivity to the noise produced by the camera sensor, so using a smaller image resolution during the comparison improves the quality of the comparison.
  • The amount of downscaling is set according to the actual situation; for example, the images can be reduced to 1/2 of the original size. The resolution of the resulting opaque mask is reduced accordingly, and the edges of the opaque mask become smoother.
  • A comparison algorithm can be used to calculate, in each preset color space, the difference between the value of every pixel in the reference image and the corresponding pixel in the real-time video frame.
  • The color space can be a commonly used one such as RGB, HSV, YCbCr, LAB, or XYZ, or a custom color space configured by the user.
  • A custom color space is an industry-standard color space in which some parameter values have been modified as required.
  • A comparison algorithm can compare the difference between each pair of pixels at the same position in the reference image and the video frame.
  • The difference value can be computed as a length difference, a square-root difference, a product difference, and so on. After the calculation, every pixel yields a result of 0 or 1, where 0 represents white and 1 represents black, so the comparison algorithm produces a black-and-white pixel image, that is, a gray-scale segmented image.
  • The white area of this image consists of the pixels at which the two images compared by the algorithm differ.
  • For example, comparison algorithm A may produce a gray-scale segmentation of the human body with part of an arm missing,
  • while comparison algorithm B may produce a gray-scale segmentation of the human body with part of a leg missing.
  • This application may also add a noise reference value to the comparison algorithms in order to filter out the results of comparison algorithms that are too sensitive to noise.
  • The noise reference value can serve as a criterion for selecting comparison algorithms: for example, if the noise reference value is greater than a certain set value, the results of the corresponding comparison algorithms are discarded. In practice, the discard criterion is chosen by engineers after testing.
  • The noise reference value used in the comparison algorithms may be the noise difference between the reference image and the background snapshot image computed in step S101.
  • The final reduced-resolution gray-scale segmented image is scaled according to the resolution of the display screen.
  • The reference image and the video frame can be compared in multiple color spaces, such as the LAB, YCbCr, and CMYK color spaces.
  • Multiple comparison algorithms are used across the color spaces; for example, the LAB color space uses a first comparison algorithm, the YCbCr color space uses a second comparison algorithm, and the CMYK color space uses a third comparison algorithm.
  • Each color space yields a different gray-scale segmented image. The white areas of all the gray-scale segmented images are blended and combined to obtain the final gray-scale segmented image.
  • The rendering module 403 is used to fuse each video frame with the image of the virtual object to generate a rendered frame corresponding to each video frame.
  • The shader can use different blending operations, such as Add and Sub, and different blending factors, such as SrcColor and One, to blend the 3D object with the real object image in the real-time video and create multiple combined results.
  • The shader used can be a Unity3D shader.
  • The three-dimensional modeling software can be professional modeling software or other software with modeling functions; software with a Unity3D plug-in can be chosen.
  • The Unity3D plug-in supports adding virtual 3D objects and can be used on terminal platforms such as Android and iOS.
  • The generated model can be saved in formats such as .fbx and .obj.
  • The blended virtual object is not limited to 3D virtual objects; it can also be a 2D virtual object or the like.
  • The rendered frame is composed of the background image and the image of the virtual object, and the background image is consistent with the reference image.
  • When the image of the virtual object overlaps an image area of the background image, the image of the virtual object occludes the background image.
  • The synthesis module 404 is configured to fuse each video frame with its corresponding opaque mask image and its corresponding rendered frame to generate a composite frame corresponding to that video frame.
  • Each video frame is fused with its corresponding opaque mask image to generate the occlusion frame corresponding to that video frame.
  • In each occlusion frame, the image area corresponding to the moving object is opaque,
  • and the image area outside the moving object is transparent.
  • The occlusion frame corresponding to each video frame is fused with the rendered frame corresponding to that video frame to generate the composite frame corresponding to that video frame.
  • The synthesis module 404 is also used to concatenate the composite frames corresponding to all video frames to generate a composite video.
  • The terminal device can be a mobile phone, a tablet computer, a notebook computer, or another terminal device with camera and video functions.
  • The graphics processing unit (GPU) of the terminal device can support the High-Level Shading Language (HLSL) or the OpenGL Shading Language (GLSL), and the terminal device may also have a Simultaneous Localization and Mapping (SLAM) system.
  • The terminal device can be equipped with a larger screen (for example, a screen of 5 inches or above) so that the user can conveniently view the shooting result.
  • The terminal device is equipped with one or more cameras, such as a 2D camera or a 3D camera; no restriction is imposed here.
  • FIG. 5 is a structural block diagram of an implementation manner of the terminal device 500.
  • the terminal device 500 may include: a baseband chip 510, a memory 515 (one or more computer-readable storage media), a radio frequency (RF) module 516, and a peripheral system 517. These components may communicate on one or more communication buses 514.
  • the peripheral system 517 is mainly used to implement the interactive function between the terminal 500 and the user/external environment, and mainly includes the input and output devices of the terminal 500.
  • the peripheral system 517 may include: a touch screen controller 518, a camera controller 519, an audio controller 520, and a sensor management module 521.
  • each controller can be coupled with its corresponding peripheral devices (such as the touch screen 523, the camera 524, the audio circuit 525, and the sensor 526).
  • The touch screen 523, also called a touch panel, can collect the user's touch operations on or near it (for example, operations performed on or near the touch screen with a finger, a stylus, or any other suitable object or accessory) and drive the corresponding connected device according to a preset program.
  • The touch screen may include two parts: a touch detection device and a touch controller. The touch detection device detects the position of the user's touch, detects the signal produced by the touch operation, and transmits the signal to the touch controller; the touch controller receives the touch information from the touch detection device, converts it into contact coordinates, sends them to the processor 511, and can receive and execute commands sent by the processor 511.
  • the peripheral system 517 may also include a display panel.
  • the display panel may be configured in the form of LCD (Liquid Crystal Display), OLED (Organic Light-Emitting Diode), etc.
  • The touch screen can cover the display panel. When the touch screen detects a touch operation on or near it, it transmits the operation to the processor 511 to determine the type of touch event, and the processor 511 then provides the corresponding visual output on the display panel according to the type of touch event.
  • a touch screen and a display panel can be integrated to realize the input and output functions of the terminal device 500.
  • the camera 524 may be a 2D camera or a 3D camera. It should be noted that the peripheral system 517 may also include other I/O peripherals, which is not limited.
  • the baseband chip 510 may integrate: one or more processors 511, a clock module 512, and a power management module 513.
  • the clock module 512 integrated in the baseband chip 510 is mainly used to generate a clock required for data transmission and timing control for the processor 511.
  • the power management module 513 integrated in the baseband chip 510 mainly manages charging, discharging, and power consumption distribution functions to provide a stable, high-precision voltage for the processor 511, the radio frequency module 516, and peripheral systems.
  • The processor 511 in the embodiment of this application may include at least one of the following types: a general-purpose central processing unit (CPU), a graphics processing unit (GPU), a digital signal processor (DSP), a microprocessor, an application-specific integrated circuit (ASIC), a microcontroller unit (MCU), a field-programmable gate array (FPGA), or another integrated circuit that implements logical operations.
  • the processor 511 may be a single-CPU processor or a multi-CPU processor.
  • the at least one processor 511 may be integrated in one chip or located on multiple different chips.
  • the radio frequency (RF) module 516 is used to receive and transmit radio frequency signals, and mainly integrates the receiver and transmitter of the terminal 500.
  • the radio frequency (RF) module 516 communicates with the communication network and other communication devices through radio frequency signals.
  • The radio frequency (RF) module 516 may include, but is not limited to: an antenna system, an RF transceiver, one or more amplifiers, a tuner, one or more oscillators, a digital signal processor, a CODEC chip, a SIM card 5161, and storage media.
  • the radio frequency (RF) module 516 may be implemented on a separate chip.
  • the RF module 516 can also communicate with the network and other devices through wireless communication, such as Wi-Fi 5162.
  • The wireless communication can use any communication standard or protocol, including but not limited to GSM (Global System for Mobile Communications), GPRS (General Packet Radio Service), CDMA (Code Division Multiple Access), WCDMA (Wideband Code Division Multiple Access), LTE (Long Term Evolution), email, SMS (Short Message Service), short-range communication technologies, and so on.
  • the memory 515 is coupled with the processor 511, and is used to store various software programs and/or multiple sets of instructions.
  • the memory 515 may include a high-speed random access memory, and may also include a non-volatile memory, such as one or more magnetic disk storage devices, flash memory devices, or other non-volatile solid-state storage devices.
  • The memory 515 may store an operating system (hereinafter referred to as the system), such as an embedded operating system like ANDROID or IOS.
  • the memory 515 may also store a network communication program, which may be used to communicate with one or more additional devices, one or more terminal devices, and one or more network devices.
  • The memory 515 can also store a user interface program, which can vividly display the content of an application program through a graphical operation interface and receive the user's control operations on the application program through input controls such as menus, dialog boxes, and keys.
  • The memory 515 may also store one or more application programs. As shown in FIG. 5, these applications may include: social applications (such as Facebook), image management applications (such as photo albums), map applications (such as Google Maps), browsers (such as Safari and Google Chrome), and so on.
  • The terminal device 500 is only an example provided by the embodiment of the present invention; the terminal device 500 may have more or fewer components than shown, may combine two or more components, or may have a different configuration of components.
  • For the moving object image in a video or photo, when the moving object image and the virtual object image overlap, the moving object image can occlude the virtual object image in real time, changing the occlusion relationship between the objects, so that the user sees an augmented-reality visual effect and gets a better visual experience.
  • The method of the present invention requires a smaller amount of computation and a lower processor load, and is easy to operate and convenient to implement.
  • The method also provides automatic calibration and augmented-reality support.
  • The embodiments of the present invention can be provided as a method, a system, or a computer program product. Therefore, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage and optical storage) containing computer-usable program code.
  • These computer program instructions can also be loaded onto a computer or other programmable data processing equipment, so that a series of operational steps are executed on the computer or other programmable equipment to produce computer-implemented processing, such that
  • the instructions executed on the computer or other programmable equipment provide steps for implementing the functions specified in one or more processes of the flowchart and/or one or more blocks of the block diagram.

Abstract

The present invention relates to the field of image processing. Disclosed are a video processing method and device. The method comprises: capturing a first photo as a reference image; capturing a video and detecting that the video comprises an image of a moving object; separately comparing each video frame in the video with the reference image to obtain a difference, and generating an opaque mask image respectively corresponding to each video frame; fusing each video frame with the opaque mask image corresponding thereto, to generate a corresponding occlusion frame; fusing each video frame with an image of a virtual object to generate a corresponding rendering frame; fusing the occlusion frame and the rendering frame corresponding to each video frame to generate a corresponding synthesized frame; and finally connecting all synthesized frames together in series to generate a synthesized video. The present invention solves the problem of an occlusion relation between a virtual object and a real object. According to this solution, in a video, the image of a moving object can occlude the image of a virtual object, thus the occlusion relation between the virtual and real objects is changed, and the user's perception and experience of augmented reality scenes are improved.

Description

Video Processing Method and Device
Technical Field
The present invention relates to the field of image processing, and in particular to a video processing method and device.
Background
Augmented Reality (AR) has developed rapidly in recent years and has attracted widespread attention. The precise registration and tracking of virtual objects, the seamless fusion of the virtual and the real, and the real-time interaction between the user and the scene determine the realism, immersion, and interactivity of AR.
When a virtual object image is added to a video or photo without further processing, the virtual object image in the AR fusion scene is simply overlaid on the original background photo or video, which causes the virtual object to float above the images of real-world objects. No matter how the real object moves, the virtual object image always occludes it, so the virtual object image cannot blend with the real object images in the video; in other words, the correct occlusion relationship between real and virtual objects is not achieved, which affects the user's viewing experience. A wrong occlusion relationship gives users the illusion that the relative positions of virtual and real objects are distorted and confuses depth perception, which reduces the realism of the fused scene.
Summary of the Invention
The present invention provides a video processing method and device for handling the occlusion of a virtual object image by a moving object image.
In a first aspect, an embodiment of the present invention provides a video processing method. The method includes: taking a first photo; then shooting a video and detecting that the video contains an image of a moving object. Each video frame in the video is composed of a background image and the image of the moving object, and the background image is consistent with the image of the first photo. Each video frame in the video is compared with the first photo, and an opaque mask image corresponding to each video frame is generated. In each opaque mask image, the image area corresponding to the moving object is opaque, and the image area outside the moving object is transparent. Each video frame is fused with its corresponding opaque mask image and with the image of the virtual object to generate a composite frame for that video frame. The composite frames of all video frames are then concatenated to generate a composite video.
With reference to the first aspect, in some embodiments, in each composite frame, when the image of the virtual object overlaps the image of the moving object, the image of the virtual object is occluded by the image of the moving object.
With reference to the first aspect, in some embodiments, comparing each video frame in the video with the first photo to generate the corresponding opaque mask image specifically includes: calculating a difference value between the first photo and the first video frame in one or more color spaces, where the difference value is a length difference, a square-root difference, or a product difference, and the calculation produces multiple gray-scale segmented images; merging the multiple gray-scale segmented images to obtain a first gray-scale segmented image; segmenting out the image of the moving object in the first gray-scale segmented image as the first opaque mask image; and comparing each video frame in the video with the first photo in the same way to generate the opaque mask image corresponding to each video frame.
With reference to the first aspect, in some embodiments, the first photo must not contain an image of the moving object.
With reference to the first aspect, in some embodiments, fusing each video frame with its corresponding opaque mask image and the image of the virtual object to generate a composite frame specifically includes: fusing each video frame with its corresponding opaque mask image to generate an occlusion frame for that video frame; fusing each video frame with the image of the virtual object to generate a rendered frame corresponding to that video frame; and fusing the occlusion frame corresponding to each video frame with the rendered frame corresponding to that video frame to generate the composite frame corresponding to that video frame.
With reference to the first aspect, in some embodiments, the rendered frame is composed of a background image and the image of the virtual object, and the background image is consistent with the image of the first photo. When the image of the virtual object overlaps an image area of the background image, the image of the virtual object occludes the background image.
In a second aspect, an embodiment of the present invention provides a video processing device, which includes a shooting module, a segmentation module, a rendering module, and a synthesis module. The shooting module is used to take the first photo; it is also used to shoot a video, in which an image of a moving object is detected. Each video frame in the video is composed of a background image and the image of the moving object, and the background image is consistent with the image of the first photo. The segmentation module is used to compare each video frame in the video with the first photo and generate an opaque mask image corresponding to each video frame. In each opaque mask image, the image area corresponding to the moving object is opaque, and the image area outside the moving object is transparent. The rendering module is used to fuse each video frame with the image of the virtual object to generate a rendered frame corresponding to each video frame. The synthesis module is used to fuse each video frame with its corresponding opaque mask image and its corresponding rendered frame to generate a composite frame corresponding to that video frame. The synthesis module is also used to concatenate the composite frames of all video frames to generate a composite video.
With reference to the second aspect, in some embodiments, in each composite frame, when the image of the virtual object overlaps the image of the moving object, the image of the virtual object is occluded by the image of the moving object.
With reference to the second aspect, in some embodiments, the segmentation module is specifically configured to: calculate a difference value between the first photo and the first video frame in one or more color spaces, where the difference value is a length difference, a square-root difference, or a product difference, producing multiple gray-scale segmented images; merge the multiple gray-scale segmented images to obtain a first gray-scale segmented image; segment out the image of the moving object in the first gray-scale segmented image as the first opaque mask image; and compare each video frame in the video with the first photo in the same way to generate the opaque mask image corresponding to each video frame.
With reference to the second aspect, in some embodiments, the first photo must not contain an image of the moving object.
With reference to the second aspect, in some embodiments, the synthesis module is specifically configured to: fuse each video frame with its corresponding opaque mask image to generate the occlusion frame corresponding to that video frame; fuse the occlusion frame corresponding to each video frame with the rendered frame corresponding to that video frame to generate the composite frame corresponding to that video frame; and concatenate the composite frames of all video frames to generate the composite video.
With reference to the second aspect, in some embodiments, the rendered frame is composed of a background image and the image of the virtual object, and the background image is consistent with the image of the first photo. When the image of the virtual object overlaps an image area of the background image, the image of the virtual object occludes the background image.
According to the above technical solution, in a video or photo the moving object image can occlude the virtual object image, and the occlusion relationship between virtual and real objects can be changed, so that the user sees the intended augmented-reality visual effect and a more natural, realistic, and highly immersive AR fusion scene. This improves the user's perception and experience of the scene and gives a better visual experience.
Description of the Drawings
In order to describe the technical solutions in the embodiments of the present application or in the background art more clearly, the drawings required by the embodiments of the present application or the background art are described below.
FIG. 1a is a schematic diagram of the effect of a video processing method provided by an embodiment of the present invention;
FIG. 1b is a schematic diagram of the effect of a video processing method provided by an embodiment of the present invention;
FIG. 1c is a schematic diagram of the effect of a video processing method provided by an embodiment of the present invention;
FIG. 1d is a schematic diagram of the effect of a video processing method provided by an embodiment of the present invention;
FIG. 2 is a flowchart of a video processing method provided by an embodiment of the present invention;
FIG. 3 is a schematic diagram of a method for processing video frames according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of the modules of a video processing device provided by an embodiment of the present invention;
FIG. 5 is a schematic diagram of the hardware structure of a video processing device provided by an embodiment of the present invention.
Detailed Description
The embodiments of the present application are described in detail below with reference to the drawings. The terms used in the implementation part of the embodiments of the present application are only intended to explain specific embodiments of the present application and are not intended to limit the present application.
When a virtual object image is added to a video or photo without further processing, the position, size, and outline of objects in the real world cannot be accurately and dynamically located, so the virtual object image floats above the images of real-world objects and always occludes them. The virtual object image therefore cannot blend with the real object images in the video, which affects the user's viewing experience. The present invention provides an occlusion method and device for video processing that handle the occlusion of a virtual object image by a moving object image. The method segments out the moving object image, uses it as an opaque mask, and then covers the virtual object image with that mask. According to this technical solution, in a video or photo the moving object image can occlude the virtual object image and the occlusion relationship between virtual and real objects can be changed, so that the user sees the intended augmented-reality visual effect and a more natural, realistic, and highly immersive AR fusion scene, which improves the user's perception and experience of the scene and gives a better visual experience. The method also provides automatic calibration and augmented-reality support.
This application provides a video processing method that can separate, from a real-time video, the image of a moving object that does not exist in the reference image and generate an opaque mask that covers the virtual object image added to the video. FIG. 1 is a schematic diagram of an implementation effect of this application. FIG. 1a shows the background image, that is, the area that requires image processing; the example background image in FIG. 1a contains a tree, a stool, and a white cloud. FIG. 1b shows a moving object entering the background during video shooting; in FIG. 1b the example moving object is a running person. FIG. 1c shows a virtual object image inserted into the video or photo: without any image processing, the virtual object image is added as the topmost layer of the picture, covering both the background image and the moving object image; in FIG. 1c the example virtual object image is an animated character named Pikachu. FIG. 1d shows that, using a method provided by this application, the moving object image can be placed on top of the virtual object image, achieving the visual effect of the moving object occluding the virtual object image; in this example, the running person occludes Pikachu.
本申请提供的一种视频处理的方法,具体步骤如图2所示,包括:The method for video processing provided by this application has specific steps as shown in Figure 2 and includes:
S101、拍摄第一照片。S101. Take a first photo.
用户拍摄一张背景快照图像,然后对该背景快照图像进行降噪处理。图像在生成和传输过程中常常因受到各种噪声的干扰和影响而使得图像降质,这对后续图像的处理和图像视觉效应将产生不利影响。因此,为了抑制噪声影响,改善图像质量,便于更高层次的处理,必须对图像进行降噪处理。在这里,对背景快照图像的降噪处理的方式可以采用中值滤波的方法,也可以采用高斯滤波的方法。降噪处理后的图像称为参照图像,即第一照片, 也是后续步骤中集成虚拟对象图像的区域。计算参照图像与背景快照图像的噪声差值,该噪声差值可以用于后续步骤。另外,在第一照片中并不包含移动对象的图像。The user takes a background snapshot image, and then performs noise reduction processing on the background snapshot image. In the process of image generation and transmission, the image is often degraded due to the interference and influence of various noises, which will adversely affect the subsequent image processing and image visual effects. Therefore, in order to suppress the influence of noise, improve image quality, and facilitate higher-level processing, the image must be denoised. Here, the method of reducing the noise of the background snapshot image can be a median filtering method or a Gaussian filtering method. The image after the noise reduction process is called the reference image, that is, the first photo, which is also the area where the virtual object image is integrated in the subsequent steps. Calculate the noise difference between the reference image and the background snapshot image, and the noise difference can be used in subsequent steps. In addition, the image of the moving object is not included in the first photo.
S102. Shoot a video and detect that the video contains the image of a moving object.
Here, each video frame in the video consists of the background image and the image of the moving object, and the background image is consistent with the reference image.
S103. Compare each video frame in the video with the reference image and generate the opaque mask image corresponding to each video frame.
Specifically, while the real-time video is being shot, a moving object enters the video. The difference between the reference image and the real-time video frame is computed to generate a first gray-scale segmentation image, from which the moving-object image is segmented out and used as the opaque mask corresponding to that video frame.
To compare the reference image and each video frame more conveniently, both are first downscaled to a smaller resolution. Because the image is normalized while its pixels are reduced, this lowers the image's sensitivity to noise generated by the camera sensor, so using a smaller image resolution during the comparison improves comparison quality. The amount of downscaling is set according to the actual situation; for example, the images can be reduced to 1/2 of the original size. The opaque mask obtained this way also has a lower resolution, and its edges are smoother.
After the resolution is reduced, the reference image and each video frame are compared in multiple color spaces using comparison algorithms. A comparison algorithm computes, in each preset color space, the difference between each pixel value of the reference image and of the current video frame. The color spaces may be common ones such as RGB, HSV, YCbCr, LAB, and XYZ, or custom color spaces set by the user; a custom color space is a standard, industry-common color space whose parameters have been modified as needed. A comparison algorithm may compare each pair of pixels at the same position in the reference image and the video frame; the difference value may be computed as a length difference, a square-root difference, a product difference, and so on. After the computation, each pixel receives a result of 0 or 1, where 0 represents white and 1 represents black, so the comparison algorithm produces an image composed of black and white pixels, i.e., a gray-scale segmentation image, whose white region is the set of pixels that differ between the two images according to that algorithm. When different comparison algorithms produce different results, the resulting gray-scale segmentation images also differ. For example, comparison algorithm A may yield a human silhouette missing an arm, while comparison algorithm B may yield a silhouette missing a leg; by merging the multiple gray-scale segmentation images and taking the maximum difference, a final gray-scale segmentation image closer to the complete human figure is obtained, i.e., the first gray-scale segmentation image.
Therefore, to ensure the completeness and accuracy of the final gray-scale segmentation image, multiple comparison algorithms are applied in each color space. Suppose M color spaces are used in the difference comparison and each color space yields N gray-scale segmentation images; the union of the white regions of all M×N gray-scale segmentation images can then be taken to obtain one final gray-scale segmentation image. Which comparison algorithms are used can be set by technicians according to the situation and is not limited here.
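The multi-color-space, multi-algorithm comparison and the union of the resulting gray-scale segmentation images could be sketched as follows in Python with OpenCV; the particular color spaces, the two difference formulas, the downscale factor, and the threshold value are illustrative assumptions rather than the exact algorithms referred to above.

    import cv2
    import numpy as np

    COLOR_SPACES = {"LAB": cv2.COLOR_BGR2LAB, "YCrCb": cv2.COLOR_BGR2YCrCb, "HSV": cv2.COLOR_BGR2HSV}

    def abs_diff(a, b):   # a "length"-style per-pixel difference
        return np.abs(a.astype(np.int16) - b.astype(np.int16)).sum(axis=2)

    def sq_diff(a, b):    # a square-root-style per-pixel difference
        return np.sqrt(((a.astype(np.float32) - b.astype(np.float32)) ** 2).sum(axis=2))

    def segment_moving_object(reference_bgr, frame_bgr, scale=0.5, threshold=40):
        """Return the final gray-scale segmentation image (255 marks differing pixels)."""
        small_ref = cv2.resize(reference_bgr, None, fx=scale, fy=scale)
        small_frm = cv2.resize(frame_bgr, None, fx=scale, fy=scale)
        union = np.zeros(small_ref.shape[:2], dtype=np.uint8)
        for code in COLOR_SPACES.values():                      # M color spaces
            ref_cs, frm_cs = cv2.cvtColor(small_ref, code), cv2.cvtColor(small_frm, code)
            for compare in (abs_diff, sq_diff):                  # N comparison algorithms
                mask = (compare(ref_cs, frm_cs) > threshold).astype(np.uint8) * 255
                union = cv2.bitwise_or(union, mask)              # union of the M x N white regions
        return union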
To reduce the influence of noise generated by the camera sensor, i.e., to reduce noise in the real-time video, this application may add a noise reference value to the comparison, used to filter out the results of comparison algorithms that are more sensitive to noise. The noise reference value can serve as a criterion for selecting comparison algorithms; for example, if the noise reference value exceeds a certain preset value, the results of the corresponding comparison algorithms are discarded. In practice, the criterion for discarding algorithm results is chosen by technicians after testing. The noise difference between the reference image and the background snapshot image computed in step S101 can be used as the noise reference value.
After the final gray-scale segmentation image is obtained, the reduced-resolution final gray-scale segmentation image is scaled to match the resolution of the display screen. The moving object represented by the white region in the final gray-scale segmentation image can then be segmented out as the first opaque mask, which is used in subsequent steps.
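A minimal sketch of rescaling the reduced-resolution segmentation image back to the display resolution might look like this; the choice of nearest-neighbour interpolation (to keep the mask binary) is an assumption.

    import cv2

    def mask_to_screen(final_gray_mask, screen_w, screen_h):
        """Scale the reduced-resolution segmentation image up to the display resolution."""
        return cv2.resize(final_gray_mask, (screen_w, screen_h), interpolation=cv2.INTER_NEAREST)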
This process can be understood with the example shown in Figure 3. First, the reference image and a given video frame are both downscaled. After the resolution is reduced, the reference image and the video frame can be compared in multiple color spaces, for example the LAB, YCbCr, and CMYK color spaces, with multiple comparison algorithms: for example, a first comparison algorithm in the LAB color space, a second comparison algorithm in the YCbCr color space, and a third comparison algorithm in the CMYK color space. Each color space yields a different gray-scale segmentation image, and the white regions of all gray-scale segmentation images are merged by taking their union to obtain one final gray-scale segmentation image.
S104. Fuse each video frame with its corresponding opaque mask image to generate the occlusion frame corresponding to that video frame.
Specifically, the opaque mask of each frame can be applied to the corresponding real-time video frame to capture the image of the moving object in that frame and generate the occlusion frame. In each occlusion frame, the image area corresponding to the moving object is opaque, and the image area outside the moving object is transparent.
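One way to sketch the generation of an occlusion frame is to write the opaque mask into the alpha channel of the frame; the BGRA layout and the function name below are assumptions for illustration, and the mask is assumed to have already been resized to the frame resolution.

    import cv2

    def make_occlusion_frame(frame_bgr, opaque_mask):
        """Combine a video frame with its opaque mask into a BGRA occlusion frame."""
        occlusion = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2BGRA)
        # White mask region (moving object) stays opaque; everything else becomes transparent.
        occlusion[:, :, 3] = opaque_mask
        return occlusion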
S105. Fuse each video frame with the image of the virtual object to generate the rendered frame corresponding to that video frame.
In one example, real objects in the real-time video can be modeled in 3D modeling software, and the resulting model can be saved in a format that can be imported into a shader. The shader may use different blending operations, such as Add and Sub, and different blending factors, such as SrcColor and One, to blend the 3D object with the real-object images in the real-time video and create multiple combined results. The shader used may be a Unity3D shader. For example, the model generated by the modeling step can be imported into a Unity3D project of an AR application, a custom 3D virtual object can be added, and a rendered video containing the 3D virtual object is generated. The 3D modeling software may be professional modeling software or software with modeling functions; software with the Unity3D plug-in can be chosen, since the plug-in supports adding virtual 3D objects and runs on terminal platforms such as Android and iOS. The generated model can be saved in .fbx or .obj format. In addition, the blended virtual object is not limited to 3D virtual objects; it may also be a 2D virtual object or the like.
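To give a rough idea of what blend operations and blend factors do when the virtual object is combined with a frame, the following NumPy sketch mimics a blend stage with Add/Sub operations and One/SrcColor source factors; it is a toy analogue for explanation only, not the Unity3D shader used in practice.

    import numpy as np

    def blend(dst_bgr, src_bgr, op="Add", src_factor="One"):
        """Toy analogue of a shader blend stage: result = dst (op) src * factor."""
        dst = dst_bgr.astype(np.float32)
        src = src_bgr.astype(np.float32)
        factor = src / 255.0 if src_factor == "SrcColor" else 1.0  # "One" -> 1.0
        term = src * factor
        out = dst + term if op == "Add" else dst - term            # Add / Sub
        return np.clip(out, 0, 255).astype(np.uint8)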
When the virtual object is blended with each frame of the real-time video, lighting information and texture detail can be taken into account in the model, which makes the composite of the virtual object and the real world more realistic.
The rendered frame consists of the background image and the image of the virtual object, and the background image is consistent with the reference image. When the image of the virtual object overlaps an image area of the background image, the image of the virtual object occludes the background image.
S106. Fuse the occlusion frame corresponding to each video frame with the rendered frame corresponding to that video frame to generate the composite frame corresponding to each video frame.
In each composite frame, when the image of the virtual object overlaps the image area of the moving object, the image of the virtual object is occluded by the image of the moving object.
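The fusion of the occlusion frame and the rendered frame amounts to standard alpha-over compositing, which could be sketched as follows; the BGRA/BGR frame layouts are the same assumptions used in the earlier sketches.

    import numpy as np

    def make_composite_frame(rendered_bgr, occlusion_bgra):
        """Layer the occlusion frame over the rendered frame (alpha-over compositing)."""
        alpha = occlusion_bgra[:, :, 3:4].astype(np.float32) / 255.0
        fg = occlusion_bgra[:, :, :3].astype(np.float32)
        bg = rendered_bgr.astype(np.float32)
        composite = alpha * fg + (1.0 - alpha) * bg  # the moving object hides the virtual object
        return composite.astype(np.uint8)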
S107. Concatenate the composite frames corresponding to the video frames to generate the composite video.
Finally, in the composite frames and the composite video, the image of the virtual object added to the real-time video can be seen to be occluded by the image of the moving object.
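Concatenating the composite frames into a video could be sketched with OpenCV's VideoWriter as below; the container, codec, and frame rate are illustrative assumptions.

    import cv2

    def write_composite_video(composite_frames, path="composite.mp4", fps=30):
        """Concatenate composite frames into a video file."""
        h, w = composite_frames[0].shape[:2]
        writer = cv2.VideoWriter(path, cv2.VideoWriter_fourcc(*"mp4v"), fps, (w, h))
        for frame in composite_frames:
            writer.write(frame)
        writer.release()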
This application provides a video processing device 400 that, in a real-time video, can make the image of a moving object cover the image of a virtual object added to the video. The device may be a mobile phone, tablet computer, or other terminal device with camera photo/video functions; the device's graphics processor (GPU) can support the High Level Shader Language (HLSL) or the OpenGL Shading Language (GLSL), and the device may also have a Simultaneous Localization and Mapping (SLAM) system.
A functional block diagram of the device is shown in Figure 4; the device consists of a shooting module 401, a segmentation module 402, a rendering module 403, and a synthesis module 404.
The shooting module 401 is used to take the first photo. Specifically, the user takes a background snapshot image, and noise reduction is then applied to it. During generation and transmission, images are often degraded by various kinds of noise, which adversely affects subsequent image processing and visual quality. Therefore, to suppress noise, improve image quality, and facilitate higher-level processing, the image must be denoised. Here, the noise reduction applied to the background snapshot image may use median filtering or Gaussian filtering. The denoised image is called the reference image, i.e., the first photo, and it is also the area into which the virtual-object image is integrated in subsequent steps. The noise difference between the reference image and the background snapshot image is computed and can be used in later steps. Note that the first photo does not contain the image of the moving object.
The shooting module 401 is also used to shoot a video and detect that the video contains the image of a moving object. Each video frame in the video consists of the background image and the image of the moving object, and the background image is consistent with the reference image.
The segmentation module 402 is used to compare each video frame in the video with the reference image and generate the opaque mask image corresponding to each video frame.
Specifically, while the real-time video is being shot, a moving object enters the video. The difference between the reference image and the real-time video frame is computed to generate a first gray-scale segmentation image, from which the moving-object image is segmented out and used as the opaque mask corresponding to that video frame.
To compare the reference image and each video frame more conveniently, both are first downscaled to a smaller resolution. Because the image is normalized while its pixels are reduced, this lowers the image's sensitivity to noise generated by the camera sensor, so using a smaller image resolution during the comparison improves comparison quality. The amount of downscaling is set according to the actual situation; for example, the images can be reduced to 1/2 of the original size. The opaque mask obtained this way also has a lower resolution, and its edges are smoother.
After the resolution is reduced, the reference image and each video frame are compared in multiple color spaces using comparison algorithms. A comparison algorithm computes, in each preset color space, the difference between each pixel value of the reference image and of the current video frame. The color spaces may be common ones such as RGB, HSV, YCbCr, LAB, and XYZ, or custom color spaces set by the user; a custom color space is a standard, industry-common color space whose parameters have been modified as needed. A comparison algorithm may compare each pair of pixels at the same position in the reference image and the video frame; the difference value may be computed as a length difference, a square-root difference, a product difference, and so on. After the computation, each pixel receives a result of 0 or 1, where 0 represents white and 1 represents black, so the comparison algorithm produces an image composed of black and white pixels, i.e., a gray-scale segmentation image, whose white region is the set of pixels that differ between the two images according to that algorithm. When different comparison algorithms produce different results, the resulting gray-scale segmentation images also differ. For example, comparison algorithm A may yield a human silhouette missing an arm, while comparison algorithm B may yield a silhouette missing a leg; by merging the multiple gray-scale segmentation images and taking the maximum difference, a final gray-scale segmentation image closer to the complete human figure is obtained, i.e., the first gray-scale segmentation image.
Therefore, to ensure the completeness and accuracy of the final gray-scale segmentation image, multiple comparison algorithms are applied in each color space. Suppose M color spaces are used in the difference comparison and each color space yields N gray-scale segmentation images; the union of the white regions of all M×N gray-scale segmentation images can then be taken to obtain one final gray-scale segmentation image. Which comparison algorithms are used can be set by technicians according to the situation and is not limited here.
To reduce the influence of noise generated by the camera sensor, i.e., to reduce noise in the real-time video, this application may add a noise reference value to the comparison, used to filter out the results of comparison algorithms that are more sensitive to noise. The noise reference value can serve as a criterion for selecting comparison algorithms; for example, if the noise reference value exceeds a certain preset value, the results of the corresponding comparison algorithms are discarded. In practice, the criterion for discarding algorithm results is chosen by technicians after testing. The noise difference between the reference image and the background snapshot image computed in step S101 can be used as the noise reference value.
After the final gray-scale segmentation image is obtained, the reduced-resolution final gray-scale segmentation image is scaled to match the resolution of the display screen. The moving object represented by the white region in the final gray-scale segmentation image can then be segmented out as the first opaque mask, which is used in subsequent steps.
This process can be understood with the example shown in Figure 3. First, the reference image and a given video frame are both downscaled. After the resolution is reduced, the reference image and the video frame can be compared in multiple color spaces, for example the LAB, YCbCr, and CMYK color spaces, with multiple comparison algorithms: for example, a first comparison algorithm in the LAB color space, a second comparison algorithm in the YCbCr color space, and a third comparison algorithm in the CMYK color space. Each color space yields a different gray-scale segmentation image, and the white regions of all gray-scale segmentation images are merged by taking their union to obtain one final gray-scale segmentation image.
The rendering module 403 is used to fuse each video frame with the image of the virtual object to generate the rendered frame corresponding to each video frame.
In one example, real objects in the real-time video can be modeled in 3D modeling software, and the resulting model can be saved in a format that can be imported into a shader. The shader may use different blending operations, such as Add and Sub, and different blending factors, such as SrcColor and One, to blend the 3D object with the real-object images in the real-time video and create multiple combined results. The shader used may be a Unity3D shader. For example, the model generated by the modeling step can be imported into a Unity3D project of an AR application, a custom 3D virtual object can be added, and a rendered video containing the 3D virtual object is generated. The 3D modeling software may be professional modeling software or software with modeling functions; software with the Unity3D plug-in can be chosen, since the plug-in supports adding virtual 3D objects and runs on terminal platforms such as Android and iOS. The generated model can be saved in .fbx or .obj format. In addition, the blended virtual object is not limited to 3D virtual objects; it may also be a 2D virtual object or the like.
When the virtual object is blended with each frame of the real-time video, lighting information and texture detail can be taken into account in the model, which makes the composite of the virtual object and the real world more realistic.
The rendered frame consists of the background image and the image of the virtual object, and the background image is consistent with the reference image. When the image of the virtual object overlaps an image area of the background image, the image of the virtual object occludes the background image.
The synthesis module 404 is used to fuse each video frame with its corresponding opaque mask image and its corresponding rendered frame to generate the composite frame corresponding to each video frame.
Specifically, each video frame is fused with its corresponding opaque mask image to generate the occlusion frame corresponding to that video frame: the opaque mask of each frame is applied to the corresponding real-time video frame to capture the image of the moving object in that frame. In each occlusion frame, the image area corresponding to the moving object is opaque, and the image area outside the moving object is transparent.
The occlusion frame corresponding to each video frame is then fused with the rendered frame corresponding to that video frame to generate the composite frame corresponding to each video frame.
In each composite frame, when the image of the virtual object overlaps the image area of the moving object, the image of the virtual object is occluded by the image of the moving object.
The synthesis module 404 is also used to concatenate the composite frames corresponding to the video frames to generate the composite video.
Finally, in the composite frames and the composite video, the image of the virtual object added to the real-time video can be seen to be occluded by the image of the moving object.
The following describes a hardware architecture of a terminal device 500 according to an embodiment of this application. The terminal device may be a mobile phone, tablet computer, notebook computer, or other terminal device with photo and video functions; its graphics processor (GPU) can support the High Level Shader Language (HLSL) or the OpenGL Shading Language (GLSL), and the terminal device may also have a Simultaneous Localization and Mapping (SLAM) system. The terminal device may be configured with a relatively large screen (for example, 5 inches or larger) so that the user can conveniently view the shooting result. The terminal device carries one or more cameras, such as a 2D camera or a 3D camera, which is not limited here.
Figure 5 is a structural block diagram of one implementation of the terminal device 500. As shown in Figure 5, the terminal device 500 may include a baseband chip 510, a memory 515 (one or more computer-readable storage media), a radio frequency (RF) module 516, and a peripheral system 517. These components may communicate over one or more communication buses 514.
The peripheral system 517 is mainly used to implement interaction between the terminal 500 and the user or the external environment, and mainly includes the input/output devices of the terminal 500. In a specific implementation, the peripheral system 517 may include a touch-screen controller 518, a camera controller 519, an audio controller 520, and a sensor management module 521, each of which can be coupled to its corresponding peripheral device (such as the touch screen 523, the camera 524, the audio circuit 525, and the sensors 526). The touch screen 523, also called a touch panel, can collect the user's touch operations on or near it (for example, operations performed on or near the touch screen with a finger, a stylus, or any other suitable object or accessory) and drive the corresponding connected apparatus according to a preset program. Optionally, the touch screen may include two parts: a touch detection apparatus and a touch controller. The touch detection apparatus detects the position of the user's touch and the signal produced by the touch operation, and passes the signal to the touch controller; the touch controller receives the touch information from the touch detection apparatus, converts it into contact coordinates, sends them to the processor 511, and can receive and execute commands sent by the processor 511. The touch screen may be implemented with various technologies such as resistive, capacitive, infrared, and surface acoustic wave. The peripheral system 517 may also include a display panel, which may optionally be configured as an LCD (Liquid Crystal Display), an OLED (Organic Light-Emitting Diode) display, or the like. Further, the touch screen may cover the display panel; when the touch screen detects a touch operation on or near it, the operation is passed to the processor 511 to determine the type of touch event, and the processor 511 then provides corresponding visual output on the display panel according to the type of touch event. In some embodiments, the touch screen and the display panel can be integrated to implement the input and output functions of the terminal device 500. In addition, in some embodiments, the camera 524 may be a 2D camera or a 3D camera. It should be noted that the peripheral system 517 may also include other I/O peripherals, which is not limited here.
The baseband chip 510 may integrate one or more processors 511, a clock module 512, and a power management module 513. The clock module 512 integrated in the baseband chip 510 is mainly used to generate the clocks required by the processor 511 for data transmission and timing control. The power management module 513 integrated in the baseband chip 510 mainly manages charging, discharging, and power distribution to provide a stable, high-precision supply voltage for the processor 511, the radio frequency module 516, and the peripheral system. The processor 511 in this embodiment of the application may include at least one of the following types: a general-purpose central processing unit (CPU), a graphics processing unit (GPU), a digital signal processor (DSP), a microprocessor, an application-specific integrated circuit (ASIC), a microcontroller unit (MCU), a field-programmable gate array (FPGA), or another integrated circuit for implementing logical operations. For example, the processor 511 may be a single-core (single-CPU) processor or a multi-core (multi-CPU) processor, and the at least one processor 511 may be integrated in one chip or located on multiple different chips.
The radio frequency (RF) module 516 is used to receive and transmit radio frequency signals and mainly integrates the receiver and transmitter of the terminal 500. The RF module 516 communicates with communication networks and other communication devices through radio frequency signals. In a specific implementation, the RF module 516 may include, but is not limited to, an antenna system, an RF transceiver, one or more amplifiers, a tuner, one or more oscillators, a digital signal processor, a CODEC chip, a SIM card 5161, and a storage medium. In some embodiments, the RF module 516 may be implemented on a separate chip. In addition, the RF module 516 may also communicate with networks and other devices through wireless communication, such as Wi-Fi 5162. The wireless communication may use any communication standard or protocol, including but not limited to GSM (Global System for Mobile Communications), GPRS (General Packet Radio Service), CDMA (Code Division Multiple Access), WCDMA (Wideband Code Division Multiple Access), LTE (Long Term Evolution), e-mail, SMS (Short Messaging Service), short-range communication technologies, and so on.
The memory 515 is coupled to the processor 511 and is used to store various software programs and/or multiple sets of instructions. In a specific implementation, the memory 515 may include high-speed random access memory and may also include non-volatile memory, such as one or more magnetic disk storage devices, flash memory devices, or other non-volatile solid-state storage devices. The memory 515 may store an operating system (hereinafter referred to as the system), for example an embedded operating system such as Android or iOS. The memory 515 may also store a network communication program that can be used to communicate with one or more additional devices, one or more terminal devices, and one or more network devices. The memory 515 may further store a user interface program that can vividly display the content of an application through a graphical operation interface and receive the user's control operations on the application through input controls such as menus, dialog boxes, and buttons.
The memory 515 may also store one or more application programs. As shown in Figure 5, these applications may include social applications (such as Facebook), image management applications (such as a photo album), map applications (such as Google Maps), browsers (such as Safari or Google Chrome), and so on.
It should be understood that the terminal device 500 is only an example provided by the embodiments of the present invention; the terminal device 500 may have more or fewer components than shown, may combine two or more components, or may be implemented with a different configuration of components.
By implementing the method embodiments of the present invention, when the moving-object image and the virtual-object image overlap in a video or photo, the moving-object image can occlude the virtual-object image in real time, changing the occlusion relationship between objects, so that the user sees an augmented-reality visual effect and enjoys a better visual experience. The method requires less computation, places a lighter load on the processor, and is easy to operate and implement. In addition, the method provides automatic calibration and augmented-reality support.
Those skilled in the art should understand that the embodiments of the present invention may be provided as a method, a system, or a computer program product. Therefore, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including, but not limited to, disk storage and optical storage) containing computer-usable program code.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps are performed on the computer or other programmable device to produce computer-implemented processing, such that the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
The specific embodiments described above further explain the objectives, technical solutions, and beneficial effects of the present invention in detail. It should be understood that the above are only specific embodiments of the present invention and are not intended to limit its scope of protection; any modification, equivalent replacement, improvement, or the like made on the basis of the technical solution of the present invention shall fall within its scope of protection.

Claims (12)

  1. A video processing method, characterized in that the method comprises:
    taking a first photo;
    shooting a video and detecting that the video contains an image of a moving object, wherein a video frame in the video consists of a background image and the image of the moving object, and the background image is consistent with the image of the first photo;
    comparing each video frame in the video with the first photo and generating an opaque mask image corresponding to each video frame, wherein in each opaque mask image the image area corresponding to the moving object is an opaque image and the image area outside the moving object is a transparent image;
    fusing each video frame with the opaque mask image corresponding to that video frame and an image of a virtual object to generate a composite frame corresponding to each video frame; and
    concatenating the composite frames corresponding to the video frames to generate a composite video.
  2. The method according to claim 1, characterized in that, in each composite frame, when the image of the virtual object overlaps the image of the moving object in an image area, the image of the virtual object is occluded by the image of the moving object.
  3. The method according to claim 1 or 2, characterized in that comparing each video frame in the video with the first photo and generating the opaque mask image corresponding to each video frame specifically comprises:
    calculating difference values between color spaces of the first photo and a first video frame, wherein each difference value is a length difference, a square-root difference, or a product difference, and a plurality of gray-scale segmentation images are generated after the calculation;
    merging the plurality of gray-scale segmentation images to obtain a first gray-scale segmentation image;
    segmenting the image of the moving object out of the first gray-scale segmentation image as a first opaque mask image; and
    comparing each video frame in the video with the first photo and generating the opaque mask image corresponding to each video frame.
  4. The method according to claim 1 or 2, characterized in that the first photo cannot contain the image of the moving object.
  5. The method according to claim 1 or 2, characterized in that fusing each video frame with the opaque mask image corresponding to that video frame and the image of the virtual object to generate the composite frame corresponding to each video frame specifically comprises:
    fusing each video frame with the opaque mask image corresponding to that video frame to generate an occlusion frame corresponding to each video frame;
    fusing each video frame with the image of the virtual object to generate a rendered frame corresponding to each video frame; and
    fusing the occlusion frame corresponding to each video frame with the rendered frame corresponding to that video frame to generate the composite frame corresponding to each video frame.
  6. The method according to claim 5, characterized in that the rendered frame consists of the background image and the image of the virtual object, the background image being consistent with the image of the first photo; when the image of the virtual object overlaps an image area of the background image, the image of the virtual object occludes the background image.
  7. A video processing device, characterized in that the device comprises a shooting module, a segmentation module, a rendering module, and a synthesis module, wherein:
    the shooting module is configured to take a first photo; the shooting module is further configured to shoot a video and detect that the video contains an image of a moving object, wherein a video frame in the video consists of a background image and the image of the moving object, and the background image is consistent with the image of the first photo;
    the segmentation module is configured to compare each video frame in the video with the first photo and generate an opaque mask image corresponding to each video frame, wherein in each opaque mask image the image area corresponding to the moving object is an opaque image and the image area outside the moving object is a transparent image;
    the rendering module is configured to fuse each video frame with an image of a virtual object to generate a rendered frame corresponding to each video frame;
    the synthesis module is configured to fuse each video frame with the opaque mask image corresponding to that video frame and the rendered frame corresponding to that video frame to generate a composite frame corresponding to each video frame; and
    the synthesis module is further configured to concatenate the composite frames corresponding to the video frames to generate a composite video.
  8. The device according to claim 7, characterized in that, in each composite frame, when the image of the virtual object overlaps the image of the moving object in an image area, the image of the virtual object is occluded by the image of the moving object.
  9. The device according to claim 7 or 8, characterized in that the segmentation module is specifically configured to:
    calculate difference values between color spaces of the first photo and a first video frame, wherein each difference value is a length difference, a square-root difference, or a product difference, and a plurality of gray-scale segmentation images are generated after the calculation;
    merge the plurality of gray-scale segmentation images to obtain a first gray-scale segmentation image;
    segment the image of the moving object out of the first gray-scale segmentation image as a first opaque mask image; and
    compare each video frame in the video with the first photo and generate the opaque mask image corresponding to each video frame.
  10. The device according to claim 7 or 8, characterized in that the first photo cannot contain the image of the moving object.
  11. The device according to claim 7 or 8, characterized in that the synthesis module is specifically configured to:
    fuse each video frame with the opaque mask image corresponding to that video frame to generate an occlusion frame corresponding to each video frame;
    fuse the occlusion frame corresponding to each video frame with the rendered frame corresponding to that video frame to generate the composite frame corresponding to each video frame; and
    concatenate the composite frames corresponding to the video frames to generate the composite video.
  12. The device according to claim 7 or 8, characterized in that the rendered frame consists of the background image and the image of the virtual object, the background image being consistent with the image of the first photo; when the image of the virtual object overlaps an image area of the background image, the image of the virtual object occludes the background image.
PCT/CN2020/080221 2020-03-19 2020-03-19 Video processing method and device WO2021184303A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/080221 WO2021184303A1 (en) 2020-03-19 2020-03-19 Video processing method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/080221 WO2021184303A1 (en) 2020-03-19 2020-03-19 Video processing method and device

Publications (1)

Publication Number Publication Date
WO2021184303A1 true WO2021184303A1 (en) 2021-09-23

Family

ID=77768452

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/080221 WO2021184303A1 (en) 2020-03-19 2020-03-19 Video processing method and device

Country Status (1)

Country Link
WO (1) WO2021184303A1 (en)



Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100287511A1 (en) * 2007-09-25 2010-11-11 Metaio Gmbh Method and device for illustrating a virtual object in a real environment
CN106056663A (en) * 2016-05-19 2016-10-26 京东方科技集团股份有限公司 Rendering method for enhancing reality scene, processing module and reality enhancement glasses
CN106683161A (en) * 2016-12-13 2017-05-17 中国传媒大学 Augmented reality shielding method based on image segmentation and customized layer method
CN107909652A (en) * 2017-11-10 2018-04-13 上海电机学院 A kind of actual situation scene mutually blocks implementation method
CN108830940A (en) * 2018-06-19 2018-11-16 广东虚拟现实科技有限公司 Hiding relation processing method, device, terminal device and storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
LI, HONGBO ET AL.: "Virtual-Reality Occlusion Processing Method based on Dynamic Transformation Background Frame", COMPUTER ENGINEERING AND DESIGN, vol. 36, no. 1, 31 January 2015 (2015-01-31), pages 227 - 231, XP055851682 *
RAO, SHAOYAN: "A Study of Virtual Reality Occlusion in Augmented Reality", CHINESE MASTER’S THESES FULL-TEXT DATABASE, 15 June 2018 (2018-06-15), pages 1 - 71, XP055851699 *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117710234A (en) * 2024-02-06 2024-03-15 青岛海尔科技有限公司 Picture generation method, device, equipment and medium based on large model


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20926304

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 17.02.2023)

122 Ep: pct application non-entry in european phase

Ref document number: 20926304

Country of ref document: EP

Kind code of ref document: A1