WO2021184303A1 - Method and device for video processing - Google Patents

Method and device for video processing (Procédé et dispositif de traitement vidéo)

Info

Publication number
WO2021184303A1
WO2021184303A1 (PCT/CN2020/080221, CN2020080221W)
Authority
WO
WIPO (PCT)
Prior art keywords
image
video
video frame
frame
generate
Prior art date
Application number
PCT/CN2020/080221
Other languages
English (en)
Chinese (zh)
Inventor
布雷顿·雷米
Original Assignee
深圳市创梦天地科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳市创梦天地科技有限公司
Priority to PCT/CN2020/080221 priority Critical patent/WO2021184303A1/fr
Publication of WO2021184303A1 publication Critical patent/WO2021184303A1/fr

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 15/00 3D [Three Dimensional] image rendering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 19/00 Manipulating 3D models or images for computer graphics

Definitions

  • the present invention relates to the field of image processing, in particular to a method and equipment for video processing.
  • Augmented Reality has developed rapidly in recent years and has attracted widespread attention.
  • The precise registration and tracking of virtual objects, the seamless integration of virtual and real content, and the real-time interaction between the user and the scene determine the realism, immersion, and interactivity of AR.
  • If the image of the virtual object in the AR fusion scene is simply merged with the original background photo or video, the virtual object appears to float above the objects in the real world.
  • The virtual object image then always occludes the objects in the real world, so the virtual object image cannot be properly integrated with the real object images in the video, and a correct occlusion relationship between the real objects and the virtual objects cannot be realized.
  • An incorrect occlusion relationship affects the user's perception and experience: it gives users the illusion of distorted relative positions between virtual and real objects and confuses depth perception, which reduces the realism of the fusion scene.
  • The present invention provides a video processing method and device, which are used to solve the problem of how the image of a moving object can occlude the image of a virtual object.
  • An embodiment of the present invention provides a video processing method, the method including: taking a first photo; shooting a video; and detecting that the video contains an image of a moving object.
  • the video frame in the video is composed of a background image and an image of a moving object, and the background image is consistent with the image of the first photo.
  • the difference between each video frame in the video and the first photo is respectively compared, and an opaque mask image corresponding to each video frame is generated.
  • the image area corresponding to the moving object is an opaque image
  • the image area outside the moving object is a transparent image.
  • Each video frame is fused with its corresponding opaque mask image and the image of the virtual object, and a composite frame corresponding to each video frame is generated. The composite frames corresponding to the video frames are then concatenated to generate a composite video.
  • The aforementioned comparison of the difference between each video frame in the video and the first photo to generate the opaque mask image corresponding to each video frame specifically includes: calculating the difference value between the color spaces of the first photo and the first video frame, where the difference value is a length difference, a square-root difference, or a product difference.
  • Multiple gray-scale segmented images are generated by this calculation.
  • Combine the multiple gray-scale segmented images to obtain the first gray-scale segmented image.
  • the image of the moving object in the first gray-scale segmented image is segmented and used as the first opaque mask image.
  • the difference between each video frame in the video and the first photo is respectively compared, and an opaque mask image corresponding to each video frame is generated.
  • The first photo must not contain an image of a moving object.
  • The aforementioned fusion of each video frame with its corresponding opaque mask image and the virtual object image to generate a composite frame corresponding to each video frame specifically includes: fusing each video frame with its corresponding opaque mask image to generate an occlusion frame corresponding to each video frame.
  • Each video frame is merged with the image of the virtual object to generate a rendering frame corresponding to each video frame.
  • the occlusion frame corresponding to each video frame is merged with the rendered frame corresponding to each video frame to generate a composite frame corresponding to each video frame.
  • the rendered frame is composed of a background image and an image of a virtual object.
  • the background image is consistent with the image of the first photo.
  • Where the image of the virtual object overlaps the image area of the background image, the image of the virtual object occludes the background image.
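  • For orientation only, the following Python/NumPy sketch walks through the claimed per-frame loop under simplifying assumptions: the opaque mask is obtained here with a plain per-pixel threshold and the fusion steps are ordinary alpha compositing, whereas the embodiments below use multiple color spaces, noise filtering, and shader blending. The function and parameter names are illustrative, not part of the disclosure.

```python
import numpy as np

def composite_video(first_photo, frames, virtual_rgba, diff_threshold=30):
    """Sketch of the claimed loop (illustrative names, simplified mask logic).

    first_photo : HxWx3 uint8 reference image without the moving object.
    frames      : iterable of HxWx3 uint8 video frames.
    virtual_rgba: HxWx4 uint8 rendering of the virtual object (alpha = coverage).
    """
    ref = first_photo.astype(np.int16)
    v_rgb = virtual_rgba[..., :3].astype(np.float32)
    v_a = virtual_rgba[..., 3:].astype(np.float32) / 255.0
    composites = []
    for frame in frames:
        f = frame.astype(np.float32)
        # 1. Compare the frame with the first photo -> opaque mask of the moving object
        #    (simple per-pixel threshold here; the embodiment uses several color spaces).
        diff = np.abs(frame.astype(np.int16) - ref).max(axis=2)
        mask = (diff > diff_threshold).astype(np.float32)[..., None]   # 1 = moving object
        # 2. Rendered frame: the virtual object occludes the background where they overlap.
        rendered = v_a * v_rgb + (1.0 - v_a) * f
        # 3. Composite frame: the opaque moving-object area occludes the virtual object.
        composite = mask * f + (1.0 - mask) * rendered
        composites.append(composite.astype(np.uint8))
    # 4. Concatenating the composite frames yields the composite video.
    return composites
```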
  • an embodiment of the present invention provides a video processing device, which includes: a shooting module, a segmentation module, a rendering module, and a synthesis module.
  • the shooting module is used to shoot the first photo, and the shooting module is also used to shoot a video, and it is detected that the video contains an image of a moving object.
  • the video frame in the video is composed of a background image and an image of a moving object, and the background image is consistent with the image of the first photo.
  • the segmentation module is used to compare the difference between each video frame in the video and the first photo, and generate an opaque mask image corresponding to each video frame. In each opaque mask image, the image area corresponding to the moving object is an opaque image, and the image area outside the moving object is a transparent image.
  • the rendering module is used to merge each video frame with the image of the virtual object to generate a rendering frame corresponding to each video frame.
  • The synthesis module is used to fuse each video frame with its corresponding opaque mask image and its corresponding rendering frame, to generate a synthesis frame corresponding to each video frame.
  • the synthesis module is also used for concatenating the corresponding synthesized frames of each video frame to generate a synthesized video.
  • The segmentation module is specifically configured to: calculate the difference value between the color spaces of the first photo and the first video frame, where the difference value is a length difference, a square-root difference, or a product difference, and the calculation generates multiple gray-scale segmented images; combine the multiple gray-scale segmented images to obtain the first gray-scale segmented image; segment out the image of the moving object in the first gray-scale segmented image and use it as the first opaque mask image; and compare the difference between each video frame in the video and the first photo separately, generating an opaque mask image corresponding to each video frame.
  • The first photo must not contain an image of a moving object.
  • the synthesis module is specifically configured to merge each video frame and the corresponding opaque mask image of each video frame to generate the occlusion frame corresponding to each video frame.
  • the occlusion frame corresponding to each video frame is merged with the rendered frame corresponding to each video frame to generate a composite frame corresponding to each video frame. Concatenate the corresponding composite frames of each video frame to generate composite video.
  • the rendered frame is composed of a background image and an image of a virtual object.
  • the background image is consistent with the image of the first photo.
  • Where the image of the virtual object overlaps the image area of the background image, the image of the virtual object occludes the background image.
  • With this scheme, the moving object image in the video or photo can occlude the virtual object image, and the occlusion relationship between virtual and real objects can be changed, so that the user sees an augmented-reality visual effect that is more natural, realistic, and immersive.
  • The AR fusion scene improves the user's perception and experience of the scene and provides a better visual experience.
  • FIG. 1a is a schematic diagram of the effect of a video processing method provided by an embodiment of the present invention.
  • Figure 1b is a schematic diagram of the effect of a video processing method provided by an embodiment of the present invention.
  • FIG. 1c is a schematic diagram of the effect of a video processing method provided by an embodiment of the present invention.
  • Figure 1d is a schematic diagram of the effect of a video processing method provided by an embodiment of the present invention.
  • FIG. 2 is a flowchart of a video processing method provided by an embodiment of the present invention.
  • FIG. 3 is a schematic diagram of a method for processing video frames according to an embodiment of the present invention.
  • FIG. 4 is a schematic diagram of the modules of a video processing device provided by an embodiment of the present invention. [Corrected 20.04.2020 under Rule 91]
  • FIG. 5 is a schematic diagram of the hardware structure of a video processing device provided by an embodiment of the present invention.
  • The present invention provides an occlusion method and device for video processing, which are used to solve the problem of how the image of a moving object can occlude the image of a virtual object. The method segments the moving object image as an opaque mask and then covers the virtual object image with that opaque mask.
  • With this method, moving object images can occlude virtual object images and the occlusion relationship between virtual and real objects can be changed, allowing users to see an augmented-reality visual effect that is more natural, realistic, and immersive.
  • The AR fusion scene improves the user's perception and experience of the scene and provides a better visual experience.
  • The method also provides automatic calibration and augmented reality support.
  • FIG. 1 is a schematic diagram of an implementation effect of this application.
  • Figure 1a shows a background image, that is, an area that needs image processing.
  • the example background image in Figure 1a has a tree, a stool, and a white cloud.
  • Figure 1b shows that a moving object enters the background during video shooting.
  • the example of the moving object in Figure 1b is a running person.
  • Figure 1c shows that we insert a virtual object image into the video or photo. Without any image processing, the virtual object image will be added to the top layer of the photo layer, covering the background image and the moving object image.
  • the example virtual object image is an animated image named Pikachu.
  • Figure 1d shows that, using the method provided by this application, the moving object image can be placed on top of the virtual object image to achieve the visual effect of the moving object occluding the virtual object.
  • In the example, the running person occludes Pikachu.
  • the method for video processing provided by this application has specific steps as shown in Figure 2 and includes:
  • S101: The user takes a background snapshot image, and then performs noise reduction processing on the background snapshot image.
  • In the process of image generation and transmission, an image is often degraded by the interference and influence of various kinds of noise, which adversely affects subsequent image processing and the visual quality of the image. Therefore, in order to suppress the influence of noise, improve image quality, and facilitate higher-level processing, the image must be denoised.
  • the method of reducing the noise of the background snapshot image can be a median filtering method or a Gaussian filtering method.
  • The image after the noise reduction process is called the reference image, that is, the first photo, which is also the area into which the virtual object image is integrated in the subsequent steps. The noise difference between the reference image and the background snapshot image is calculated and can be used in subsequent steps.
  • the image of the moving object is not included in the first photo.
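  • As a hedged illustration of this step, the sketch below denoises the background snapshot with a median or Gaussian filter (both standard OpenCV calls) and records a scalar noise difference between the snapshot and the denoised reference image. The disclosure does not specify the noise measure, so the mean absolute difference used here is only an assumption.

```python
import cv2
import numpy as np

def make_reference_image(background_snapshot, method="median", ksize=5):
    """Denoise the background snapshot to obtain the reference image (the first photo)
    and a scalar noise estimate that later steps can reuse as a noise reference value."""
    if method == "median":
        reference = cv2.medianBlur(background_snapshot, ksize)
    else:  # Gaussian filtering
        reference = cv2.GaussianBlur(background_snapshot, (ksize, ksize), 0)
    # Assumed noise measure: mean absolute difference between snapshot and reference.
    noise_difference = float(np.mean(np.abs(
        background_snapshot.astype(np.int16) - reference.astype(np.int16))))
    return reference, noise_difference
```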
  • S102: Shoot a video, and detect that the video contains an image of a moving object.
  • the video frame in the video is composed of a background image and an image of a moving object, and the background image is consistent with the reference image.
  • When the moving object enters the real-time video, the difference between the reference image and the real-time video frame is compared, a first gray-scale segmented image is generated, and the moving object image is segmented out and used as the opaque mask corresponding to that video frame.
  • In order to compare the difference between the reference image and each video frame more conveniently, the reference image and each video frame are first reduced to a smaller resolution. Because the image is normalized in the process of reducing the pixels, this lowers the image's sensitivity to the noise generated by the camera sensor, so using a smaller image resolution during the comparison improves the quality of the comparison.
  • The amount of the resolution reduction is set according to the actual situation; for example, the images can be reduced to 1/2 of their original size. The resolution of the resulting opaque mask is reduced accordingly, and the edges of the opaque mask become smoother.
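  • A minimal sketch of this resolution reduction, assuming OpenCV area interpolation; the 1/2 factor is the example given above and the helper name is illustrative.

```python
import cv2

def downscale_pair(reference, frame, factor=0.5):
    """Shrink the reference image and a video frame by the same factor before comparison.
    Area interpolation averages neighbouring pixels, which damps camera-sensor noise and
    later gives an opaque mask with smoother edges."""
    h, w = reference.shape[:2]
    size = (max(1, int(w * factor)), max(1, int(h * factor)))
    small_ref = cv2.resize(reference, size, interpolation=cv2.INTER_AREA)
    small_frame = cv2.resize(frame, size, interpolation=cv2.INTER_AREA)
    return small_ref, small_frame
```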
  • The above-mentioned reference image and each video frame are compared in multiple color spaces, and the difference is computed by a comparison algorithm.
  • The comparison algorithm calculates, in each preset color space, the difference between the value of each pixel in the reference image and the value of the corresponding pixel in the real-time video frame.
  • the color space can be commonly used color spaces such as RGB, HSV, YCbCr, LAB, and XYZ, or it can be a custom color space set by the user.
  • A custom color space is obtained by taking a color space standard commonly used in the industry and modifying some of its parameter values according to requirements.
  • The comparison algorithm may compare the difference between the pixels at the same position in the reference image and the video frame.
  • The difference value can be computed as a length difference, a square-root difference, a product difference, or the like. After the calculation, each pixel is assigned a result of 0 or 1, where 0 means white and 1 means black, so that once the comparison algorithm has been applied, a picture composed of black and white pixels is obtained, that is, a gray-scale segmented image.
  • The white area is the region composed of the pixels that differ between the two images processed by the comparison algorithm.
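  • The disclosure names the candidate difference measures (length, square root, product) without defining them, so the sketch below is one possible interpretation: the Euclidean length of the channel-difference vector, the square root of the mean absolute channel difference, and a product-based dissimilarity, each thresholded into a black-and-white segmented image whose white pixels mark the differing region. The threshold value and the exact formulas are assumptions.

```python
import cv2
import numpy as np

def grayscale_segmentation(reference, frame, metric="length", threshold=0.15,
                           color_space=cv2.COLOR_BGR2LAB):
    """Compare reference and frame pixel by pixel in one color space and return a
    binary segmented image in which white (255) marks the pixels that differ."""
    ref = cv2.cvtColor(reference, color_space).astype(np.float32) / 255.0
    cur = cv2.cvtColor(frame, color_space).astype(np.float32) / 255.0
    delta = cur - ref
    if metric == "length":      # Euclidean length of the channel-difference vector
        score = np.linalg.norm(delta, axis=2) / np.sqrt(delta.shape[2])
    elif metric == "sqrt":      # square root of the mean absolute channel difference
        score = np.sqrt(np.abs(delta).mean(axis=2))
    else:                       # product-based dissimilarity across the channels
        score = 1.0 - np.prod(1.0 - np.abs(delta), axis=2)
    return np.where(score > threshold, 255, 0).astype(np.uint8)
```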
  • Different comparison algorithms produce different gray-scale segmented images.
  • For example, comparison algorithm A may produce a gray-scale segmented image of the human body in which an arm is missing,
  • while comparison algorithm B may produce a gray-scale segmented image of the human body in which a leg is missing.
  • To address this, this application may add a noise reference value to the comparison step in order to filter out the results of comparison algorithms that are too sensitive to noise.
  • The noise reference value can be used as a judgment condition for selecting comparison algorithms. For example, if the noise reference value is greater than a certain set value, the results of the corresponding comparison algorithms are discarded. In practice, the judgment condition for discarding an algorithm is chosen by persons skilled in the art after testing.
  • the noise reference value in the comparison algorithm may use the noise difference between the reference image and the background snapshot image in step S101.
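  • One plausible reading of this selection rule, sketched below: the noise difference recorded in step S101 is compared against a per-algorithm cut-off, and the segmented images produced by algorithms judged too noise-sensitive are discarded. The cut-off values are placeholders chosen for illustration, not values taken from the disclosure.

```python
def select_segmentations(results, noise_reference, cutoffs):
    """results         : dict mapping comparison-algorithm name -> segmented image.
    noise_reference    : noise difference measured in step S101.
    cutoffs            : dict mapping algorithm name -> maximum tolerated noise level.
    Returns only the segmented images whose algorithm is judged robust enough."""
    kept = {}
    for name, segmented in results.items():
        # Discard the result of any comparison algorithm deemed too noise-sensitive.
        if noise_reference <= cutoffs.get(name, float("inf")):
            kept[name] = segmented
    return kept
```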
  • The final reduced-resolution gray-scale segmented image described above is scaled proportionally to match the resolution of the display screen.
  • the reference image and the video frame can use multiple color spaces, such as LAB color space, YCbCr color space, and CMYK color space.
  • Each color space uses multiple comparison algorithms. For example, the LAB color space uses the first comparison algorithm, the YCbCr color space uses the second comparison algorithm, and the CMYK color space uses the third comparison algorithm.
  • Each color space yields a different gray-scale segmented image, and the white areas of all the gray-scale segmented images are merged to obtain the final gray-scale segmented image.
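  • A sketch of this combination step, reusing the grayscale_segmentation helper sketched earlier: the white regions of the per-color-space segmented images are merged with a pixel-wise maximum (a logical OR) and the merged mask is scaled back up to the display resolution. OpenCV provides LAB and YCrCb conversions but no built-in CMYK conversion, so CMYK is omitted here; the display size and interpolation mode are assumptions.

```python
import cv2
import numpy as np

def combined_opacity_mask(reference, frame, display_size, threshold=0.15):
    """Run the comparison in several color spaces, merge the white (differing) regions,
    and rescale the merged mask to the display resolution."""
    spaces = [cv2.COLOR_BGR2LAB, cv2.COLOR_BGR2YCrCb]   # CMYK would need a custom conversion
    masks = [grayscale_segmentation(reference, frame, "length", threshold, cs)
             for cs in spaces]
    merged = np.maximum.reduce(masks)                   # union of the white areas
    # Scale the low-resolution mask proportionally to the display resolution;
    # linear interpolation keeps the mask edges smooth.
    return cv2.resize(merged, display_size, interpolation=cv2.INTER_LINEAR)
```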
  • In each occlusion frame, the image area corresponding to the moving object is an opaque image, and the image area outside the moving object is a transparent image.
  • the shader can have different blending operations, such as Add, Sub, and different blending factors, such as SrcColor, One, so as to blend the 3D object and the real object image in the real-time video and create multiple combined results.
  • The shader used can be a Unity3D shader.
  • Three-dimensional modeling software can include professional modeling software or software with modeling functions, and software with Unity3D plug-in can be selected.
  • The Unity3D plug-in can support the effect of adding virtual 3D objects and can be applied to terminal platforms such as Android and iOS.
  • the corresponding save format of the generated model can be .fbx and .obj.
  • the hybrid virtual object is not limited to 3D virtual objects, but can also be 2D virtual objects and the like.
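  • The blend stage described above follows the usual fixed-function GPU model, result = Op(source x SrcFactor, destination x DstFactor). The NumPy mock-up below reproduces that arithmetic for the operations (Add, Sub) and factors (SrcColor, One) named in the text, purely to illustrate how different settings mix the virtual object layer with the live frame; it is not the Unity3D shader itself.

```python
import numpy as np

def blend(src, dst, op="Add", src_factor="One", dst_factor="SrcColor"):
    """NumPy mock-up of fixed-function blending: Op(src * SrcFactor, dst * DstFactor).
    src, dst: float32 images in [0, 1] (virtual object layer and live video frame)."""
    def factor(name):
        if name == "One":
            return 1.0
        if name == "SrcColor":
            return src
        raise ValueError("unsupported blend factor: " + name)
    s = src * factor(src_factor)
    d = dst * factor(dst_factor)
    blended = s + d if op == "Add" else s - d   # "Sub" subtracts the destination term
    return np.clip(blended, 0.0, 1.0)
```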
  • the rendered frame is composed of the background image and the image of the virtual object, and the background image is consistent with the reference image.
  • the image of the virtual object overlaps the image area of the background image, the image of the virtual object occludes the background image.
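  • A minimal sketch of the final fusion, assuming the occlusion frame is stored as an RGBA image whose alpha channel is the opaque mask (opaque over the moving object, transparent elsewhere) and the rendered frame already contains the virtual object over the background; standard over-compositing then lets the moving object cover the virtual object. The function name is illustrative.

```python
import numpy as np

def fuse_occlusion_over_render(occlusion_rgba, rendered_rgb):
    """occlusion_rgba: HxWx4 uint8, RGB = the video frame, A = opaque mask of the mover.
    rendered_rgb   : HxWx3 uint8, the background fused with the virtual object.
    Returns the composite frame in which the moving object covers the virtual object."""
    alpha = occlusion_rgba[..., 3:4].astype(np.float32) / 255.0
    foreground = occlusion_rgba[..., :3].astype(np.float32)
    background = rendered_rgb.astype(np.float32)
    return (alpha * foreground + (1.0 - alpha) * background).astype(np.uint8)
```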
  • S107: Concatenate the respective composite frames corresponding to each video frame to generate a composite video.
  • the present application provides a video processing device 400, which can make the image of the moving object cover the image of the virtual object added in the real-time video in the real-time video.
  • The device can be a mobile phone, a tablet computer, or another terminal device with camera photo/video functions.
  • The graphics processor (GPU) of the device can support running the High-Level Shading Language (HLSL) or the OpenGL Shading Language (GLSL), and the device may also have a Simultaneous Localization and Mapping (SLAM) system.
  • The functional block diagram of the device is shown in FIG. 4.
  • the photographing module 401 is used to photograph the first photo. Specifically, the user takes a background snapshot image, and then performs noise reduction processing on the background snapshot image. In the process of image generation and transmission, the image is often degraded due to the interference and influence of various noises, which will adversely affect the subsequent image processing and image visual effects. Therefore, in order to suppress the influence of noise, improve image quality, and facilitate higher-level processing, the image must be denoised.
  • the method of reducing the noise of the background snapshot image can be a median filtering method or a Gaussian filtering method.
  • The image after the noise reduction process is called the reference image, that is, the first photo, which is also the area into which the virtual object image is integrated in subsequent steps. The noise difference between the reference image and the background snapshot image is calculated and can be used in subsequent steps. In addition, the image of the moving object is not included in the first photo.
  • the shooting module 401 is also used to shoot a video, and it is detected that the video contains an image of a moving object.
  • the video frame in the video is composed of a background image and an image of a moving object, and the background image is consistent with the reference image.
  • the segmentation module 402 is configured to compare the difference between each video frame and the reference image in the video, and generate an opaque mask image corresponding to each video frame.
  • When the moving object enters the real-time video, the difference between the reference image and the real-time video frame is compared, a first gray-scale segmented image is generated, and the moving object image is segmented out and used as the opaque mask corresponding to that video frame.
  • In order to compare the difference between the reference image and each video frame more conveniently, the reference image and each video frame are first reduced to a smaller resolution. Because the image is normalized in the process of reducing the pixels, this lowers the image's sensitivity to the noise generated by the camera sensor, so using a smaller image resolution during the comparison improves the quality of the comparison.
  • The amount of the resolution reduction is set according to the actual situation; for example, the images can be reduced to 1/2 of their original size. The resolution of the resulting opaque mask is reduced accordingly, and the edges of the opaque mask become smoother.
  • The comparison algorithm calculates, in each preset color space, the difference between the value of each pixel in the reference image and the value of the corresponding pixel in the real-time video frame.
  • the color space can be commonly used color spaces such as RGB, HSV, YCbCr, LAB, and XYZ, or it can be a custom color space set by the user.
  • A custom color space is obtained by taking a color space standard commonly used in the industry and modifying some of its parameter values according to requirements.
  • The comparison algorithm may compare the difference between the pixels at the same position in the reference image and the video frame.
  • The difference value can be computed as a length difference, a square-root difference, a product difference, or the like. After the calculation, each pixel is assigned a result of 0 or 1, where 0 means white and 1 means black, so that once the comparison algorithm has been applied, a black-and-white pixel image is obtained, that is, a gray-scale segmented image.
  • The white area is the region composed of the pixels that differ between the two images processed by the comparison algorithm.
  • For example, comparison algorithm A may produce a gray-scale segmented image of the human body in which an arm is missing,
  • while comparison algorithm B may produce a gray-scale segmented image of the human body in which a leg is missing.
  • To address this, this application may add a noise reference value to the comparison step in order to filter out the results of comparison algorithms that are too sensitive to noise.
  • The noise reference value can be used as a judgment condition for selecting comparison algorithms. For example, if the noise reference value is greater than a certain set value, the results of the corresponding comparison algorithms are discarded. In practice, the judgment condition for discarding an algorithm is chosen by persons skilled in the art after testing.
  • The noise reference value used in the comparison algorithm may be the noise difference between the reference image and the background snapshot image obtained in step S101.
  • The final reduced-resolution gray-scale segmented image is scaled proportionally to match the resolution of the display screen.
  • the reference image and the video frame can use multiple color spaces, such as LAB color space, YCbCr color space, and CMYK color space.
  • Each color space uses multiple comparison algorithms. For example, the LAB color space uses the first comparison algorithm, the YCbCr color space uses the second comparison algorithm, and the CMYK color space uses the third comparison algorithm.
  • Each color space yields a different gray-scale segmented image, and the white areas of all the gray-scale segmented images are merged to obtain the final gray-scale segmented image.
  • the rendering module 403 is used for fusing each video frame and the image of the virtual object to generate a rendering frame corresponding to each video frame.
  • the shader can have different blending operations, such as Add, Sub, and different blending factors, such as SrcColor, One, so as to blend the 3D object and the real object image in the real-time video and create multiple combined results.
  • The shader used can be a Unity3D shader.
  • Three-dimensional modeling software can include professional modeling software or software with modeling functions, and software with Unity3D plug-in can be selected.
  • The Unity3D plug-in can support the effect of adding virtual 3D objects and can be applied to terminal platforms such as Android and iOS.
  • the corresponding save format of the generated model can be .fbx and .obj.
  • the hybrid virtual object is not limited to 3D virtual objects, but can also be 2D virtual objects and the like.
  • the rendered frame is composed of the background image and the image of the virtual object, and the background image is consistent with the reference image.
  • the image of the virtual object overlaps the image area of the background image, the image of the virtual object occludes the background image.
  • The synthesis module 404 is configured to fuse each video frame with its corresponding opaque mask image and its corresponding rendering frame, to generate a synthesis frame corresponding to each video frame.
  • each video frame and the corresponding opaque mask image of each video frame are merged to generate the occlusion frame corresponding to each video frame.
  • the image area corresponding to the moving object is an opaque image
  • the image area outside the moving object is a transparent image.
  • the occlusion frame corresponding to each video frame is merged with the rendered frame corresponding to each video frame to generate a composite frame corresponding to each video frame.
  • the synthesis module 404 is also used for concatenating the respective synthesized frames corresponding to each video frame to generate a synthesized video.
  • the terminal device can be a mobile phone, a tablet computer, a notebook computer and other terminal devices with camera and video functions.
  • The graphics processor (GPU) of the terminal device can support running the High-Level Shading Language (HLSL) or the OpenGL Shading Language (GLSL), and the terminal device may also have a Simultaneous Localization and Mapping (SLAM) system.
  • The terminal device can be equipped with a larger screen (for example, a screen of 5 inches or above) so that the user can conveniently view the shooting effect.
  • The terminal device is equipped with one or more cameras, such as a 2D camera or a 3D camera; there is no restriction here.
  • FIG. 5 is a structural block diagram of an implementation manner of the terminal device 500.
  • the terminal device 500 may include: a baseband chip 510, a memory 515 (one or more computer-readable storage media), a radio frequency (RF) module 516, and a peripheral system 517. These components may communicate on one or more communication buses 514.
  • the peripheral system 517 is mainly used to implement the interactive function between the terminal 500 and the user/external environment, and mainly includes the input and output devices of the terminal 500.
  • the peripheral system 517 may include: a touch screen controller 518, a camera controller 519, an audio controller 520, and a sensor management module 521.
  • each controller can be coupled with its corresponding peripheral devices (such as the touch screen 523, the camera 524, the audio circuit 525, and the sensor 526).
  • The touch screen 523, also called a touch panel, can collect the user's touch operations on or near it (for example, operations performed on or near the touch screen with a finger, a stylus, or any other suitable object or accessory) and drive the corresponding connection device according to a preset program.
  • The touch screen may include two parts: a touch detection device and a touch controller. The touch detection device detects the position touched by the user, detects the signal brought by the touch operation, and transmits the signal to the touch controller; the touch controller receives the touch information from the touch detection device, converts it into contact coordinates, sends them to the processor 511, and can receive and execute commands sent by the processor 511.
  • the peripheral system 517 may also include a display panel.
  • the display panel may be configured in the form of LCD (Liquid Crystal Display), OLED (Organic Light-Emitting Diode), etc.
  • The touch screen can cover the display panel. When the touch screen detects a touch operation on or near it, the operation is transmitted to the processor 511 to determine the type of touch event, and the processor 511 then provides the corresponding visual output on the display panel according to the type of the touch event.
  • a touch screen and a display panel can be integrated to realize the input and output functions of the terminal device 500.
  • the camera 524 may be a 2D camera or a 3D camera. It should be noted that the peripheral system 517 may also include other I/O peripherals, which is not limited.
  • the baseband chip 510 may integrate: one or more processors 511, a clock module 512, and a power management module 513.
  • the clock module 512 integrated in the baseband chip 510 is mainly used to generate a clock required for data transmission and timing control for the processor 511.
  • the power management module 513 integrated in the baseband chip 510 mainly manages charging, discharging, and power consumption distribution functions to provide a stable, high-precision voltage for the processor 511, the radio frequency module 516, and peripheral systems.
  • The processor 511 in the embodiment of this application may include at least one of the following types: a general-purpose central processing unit (CPU), a graphics processing unit (GPU), a digital signal processor (DSP), a microprocessor, an application-specific integrated circuit (ASIC), a microcontroller unit (MCU), a field-programmable gate array (FPGA), or another integrated circuit that implements logic operations.
  • the processor 511 may be a single-CPU processor or a multi-CPU processor.
  • the at least one processor 511 may be integrated in one chip or located on multiple different chips.
  • the radio frequency (RF) module 516 is used to receive and transmit radio frequency signals, and mainly integrates the receiver and transmitter of the terminal 500.
  • the radio frequency (RF) module 516 communicates with the communication network and other communication devices through radio frequency signals.
  • The radio frequency (RF) module 516 may include, but is not limited to: an antenna system, an RF transceiver, one or more amplifiers, a tuner, one or more oscillators, a digital signal processor, a CODEC chip, a SIM card 5161, and storage media.
  • the radio frequency (RF) module 516 may be implemented on a separate chip.
  • the RF module 516 can also communicate with the network and other devices through wireless communication, such as Wi-Fi 5162.
  • The wireless communication can use any communication standard or protocol, including but not limited to GSM (global system for mobile communication), GPRS (general packet radio service), CDMA (code division multiple access), WCDMA (wideband code division multiple access), LTE (long term evolution), email, SMS (short message service), short-range communication technologies, and so on.
  • the memory 515 is coupled with the processor 511, and is used to store various software programs and/or multiple sets of instructions.
  • the memory 515 may include a high-speed random access memory, and may also include a non-volatile memory, such as one or more magnetic disk storage devices, flash memory devices, or other non-volatile solid-state storage devices.
  • The memory 515 may store an operating system (hereinafter referred to as the system), such as an embedded operating system like ANDROID or IOS.
  • the memory 515 may also store a network communication program, which may be used to communicate with one or more additional devices, one or more terminal devices, and one or more network devices.
  • The memory 515 can also store a user interface program, which can vividly display the content of an application program through a graphical operation interface and receive user control operations on the application program through input controls such as menus, dialog boxes, and keys.
  • The memory 515 may also store one or more application programs. As shown in FIG. 5, these applications may include: social applications (such as Facebook), image management applications (such as photo albums), map applications (such as Google Maps), browsers (such as Safari, Google Chrome), and so on.
  • The terminal device 500 is only an example provided by the embodiment of the present invention; the terminal device 500 may have more or fewer components than shown, may combine two or more components, or may use a different configuration of components.
  • When the moving object image and the virtual object image overlap in a video or photo, the moving object image can occlude the virtual object image in real time, changing the occlusion relationship between the objects, so that the user sees an augmented-reality visual effect and enjoys a better visual experience.
  • The method of the present invention requires a small amount of computation and imposes a low processor load, and it is easy to operate and convenient to implement.
  • The method also provides automatic calibration and augmented reality support.
  • The embodiments of the present invention can be provided as a method, a system, or a computer program product. Therefore, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, optical storage, and the like) containing computer-usable program code.
  • These computer program instructions can also be loaded onto a computer or other programmable data processing equipment, so that a series of operational steps are executed on the computer or other programmable equipment to produce computer-implemented processing.
  • The instructions executed on the computer or other programmable equipment thus provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Graphics (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The invention relates to the field of image processing, and provides a video processing method and device. The method includes: capturing a first photo as a reference image; capturing a video and detecting that the video contains an image of a moving object; separately comparing each video frame of the video with the reference image to obtain a difference, and generating an opaque mask image corresponding to each video frame; fusing each video frame with its corresponding opaque mask image to generate a corresponding occlusion frame; fusing each video frame with an image of a virtual object to generate a corresponding rendered frame; fusing the occlusion frame and the rendered frame corresponding to each video frame to generate a corresponding composite frame; and finally concatenating all the composite frames to generate a composite video. The invention solves the problem of the occlusion relationship between a virtual object and a real object. With this solution, in a video, the image of a moving object can occlude the image of a virtual object, so that the occlusion relationship between virtual and real objects is changed and the user's perception and experience of augmented reality scenes are improved.
PCT/CN2020/080221 2020-03-19 2020-03-19 Procédé et dispositif de traitement vidéo WO2021184303A1 (fr)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/080221 WO2021184303A1 (fr) 2020-03-19 2020-03-19 Procédé et dispositif de traitement vidéo

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/080221 WO2021184303A1 (fr) 2020-03-19 2020-03-19 Procédé et dispositif de traitement vidéo

Publications (1)

Publication Number Publication Date
WO2021184303A1 (fr) 2021-09-23

Family

ID=77768452

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/080221 WO2021184303A1 (fr) 2020-03-19 2020-03-19 Procédé et dispositif de traitement vidéo

Country Status (1)

Country Link
WO (1) WO2021184303A1 (fr)


Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100287511A1 (en) * 2007-09-25 2010-11-11 Metaio Gmbh Method and device for illustrating a virtual object in a real environment
CN106056663A (zh) * 2016-05-19 2016-10-26 京东方科技集团股份有限公司 Rendering method in an augmented reality scene, processing module and augmented reality glasses
CN106683161A (zh) * 2016-12-13 2017-05-17 中国传媒大学 Augmented reality occlusion method based on image segmentation and a custom layer method
CN107909652A (zh) * 2017-11-10 2018-04-13 上海电机学院 Method for realizing mutual occlusion between virtual and real scenes
CN108830940A (zh) * 2018-06-19 2018-11-16 广东虚拟现实科技有限公司 Occlusion relationship processing method and apparatus, terminal device and storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
LI, HONGBO ET AL.: "Virtual-Reality Occlusion Processing Method based on Dynamic Transformation Background Frame", COMPUTER ENGINEERING AND DESIGN, vol. 36, no. 1, 31 January 2015 (2015-01-31), pages 227 - 231, XP055851682 *
RAO, SHAOYAN: "A Study of Virtual Reality Occlusion in Augmented Reality", CHINESE MASTER’S THESES FULL-TEXT DATABASE, 15 June 2018 (2018-06-15), pages 1 - 71, XP055851699 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117710234A (zh) * 2024-02-06 2024-03-15 青岛海尔科技有限公司 Picture generation method, apparatus, device and medium based on a large model
CN117710234B (zh) * 2024-02-06 2024-05-24 青岛海尔科技有限公司 Picture generation method, apparatus, device and medium based on a large model


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20926304

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 17.02.2023)

122 Ep: pct application non-entry in european phase

Ref document number: 20926304

Country of ref document: EP

Kind code of ref document: A1