WO2021184303A1 - Video processing method and device - Google Patents

Video processing method and device Download PDF

Info

Publication number
WO2021184303A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
video
video frame
frame
generate
Prior art date
Application number
PCT/CN2020/080221
Other languages
French (fr)
Chinese (zh)
Inventor
布雷顿·雷米
Original Assignee
深圳市创梦天地科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳市创梦天地科技有限公司 filed Critical 深圳市创梦天地科技有限公司
Priority to PCT/CN2020/080221 priority Critical patent/WO2021184303A1/en
Publication of WO2021184303A1 publication Critical patent/WO2021184303A1/en

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 15/00 - 3D [Three Dimensional] image rendering
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 19/00 - Manipulating 3D models or images for computer graphics

Definitions

  • The present invention relates to the field of image processing, and in particular to a video processing method and device.
  • Augmented Reality (AR) has developed rapidly in recent years and has attracted widespread attention.
  • The precise registration and tracking of virtual objects, the seamless fusion of the virtual and the real, and the real-time interaction between the user and the scene determine the realism, immersion, and interactivity of AR.
  • When a virtual object image is added to a video or photo without further processing, the virtual object image in the AR fusion scene is simply overlaid on the original background photo or video, which causes the virtual object to float above the images of real-world objects.
  • No matter how the real object moves, the virtual object image always occludes it, so the virtual object image cannot blend with the real object images in the video; in other words, the correct occlusion relationship between real and virtual objects is not achieved.
  • This affects the user's viewing experience. A wrong occlusion relationship gives users the illusion that the relative positions of virtual and real objects are distorted and confuses depth perception, which reduces the realism of the fused scene.
  • The present invention provides a video processing method and device for handling the occlusion of a virtual object image by a moving object image.
  • In a first aspect, an embodiment of the present invention provides a video processing method. The method includes: taking a first photo; then shooting a video and detecting that the video contains an image of a moving object.
  • Each video frame in the video is composed of a background image and the image of the moving object, and the background image is consistent with the image of the first photo.
  • Each video frame in the video is compared with the first photo, and an opaque mask image corresponding to each video frame is generated.
  • In each opaque mask image, the image area corresponding to the moving object is opaque,
  • and the image area outside the moving object is transparent.
  • Each video frame is fused with its corresponding opaque mask image and with the image of the virtual object to generate a composite frame for that video frame. The composite frames of all video frames are then concatenated to generate a composite video.
  • Comparing each video frame in the video with the first photo to generate the corresponding opaque mask image specifically includes: calculating a difference value between the first photo and the first video frame in one or more color spaces,
  • where the difference value is a length difference, a square-root difference, or a product difference,
  • and the calculation produces multiple gray-scale segmented images.
  • The multiple gray-scale segmented images are merged to obtain a first gray-scale segmented image.
  • The image of the moving object in the first gray-scale segmented image is segmented out and used as the first opaque mask image.
  • Each video frame in the video is compared with the first photo in the same way, and an opaque mask image corresponding to each video frame is generated.
  • The first photo must not contain an image of the moving object.
  • Fusing each video frame with its corresponding opaque mask image and the image of the virtual object to generate a composite frame specifically includes: fusing each video frame with its corresponding opaque mask image to generate an occlusion frame for that video frame.
  • Each video frame is also fused with the image of the virtual object to generate a rendered frame corresponding to that video frame.
  • The occlusion frame corresponding to each video frame is fused with the rendered frame corresponding to that video frame to generate the composite frame corresponding to that video frame.
  • The rendered frame is composed of a background image and the image of the virtual object.
  • The background image is consistent with the image of the first photo.
  • When the image of the virtual object overlaps an image area of the background image, the image of the virtual object occludes the background image.
  • In a second aspect, an embodiment of the present invention provides a video processing device, which includes a shooting module, a segmentation module, a rendering module, and a synthesis module.
  • The shooting module is used to take the first photo; it is also used to shoot a video, in which an image of a moving object is detected.
  • Each video frame in the video is composed of a background image and the image of the moving object, and the background image is consistent with the image of the first photo.
  • The segmentation module is used to compare each video frame in the video with the first photo and generate an opaque mask image corresponding to each video frame. In each opaque mask image, the image area corresponding to the moving object is opaque, and the image area outside the moving object is transparent.
  • The rendering module is used to fuse each video frame with the image of the virtual object to generate a rendered frame corresponding to each video frame.
  • The synthesis module is used to fuse each video frame with its corresponding opaque mask image and its corresponding rendered frame to generate a composite frame corresponding to that video frame.
  • The synthesis module is also used to concatenate the composite frames of all video frames to generate a composite video.
  • The segmentation module is specifically configured to: calculate a difference value between the first photo and the first video frame in one or more color spaces, where the difference value is a length difference, a square-root difference, or a product difference, producing multiple gray-scale segmented images; merge the multiple gray-scale segmented images to obtain a first gray-scale segmented image; segment out the image of the moving object in the first gray-scale segmented image as the first opaque mask image; and compare each video frame in the video with the first photo in the same way to generate the opaque mask image corresponding to each video frame.
  • The first photo must not contain an image of the moving object.
  • The synthesis module is specifically configured to fuse each video frame with its corresponding opaque mask image to generate the occlusion frame corresponding to that video frame.
  • The occlusion frame corresponding to each video frame is fused with the rendered frame corresponding to that video frame to generate the composite frame, and the composite frames of all video frames are concatenated to generate the composite video.
  • The rendered frame is composed of a background image and the image of the virtual object.
  • The background image is consistent with the image of the first photo.
  • When the image of the virtual object overlaps an image area of the background image, the image of the virtual object occludes the background image.
  • With the above technical solution, the moving object image in a video or photo can occlude the virtual object image, and the occlusion relationship between virtual and real objects can be changed, so that the user sees the intended augmented-reality visual effect and a more natural, realistic, and highly immersive AR fusion scene.
  • This improves the user's perception and experience of the scene and gives a better visual experience.
  • FIG. 1a is a schematic diagram of the effect of a video processing method provided by an embodiment of the present invention.
  • FIG. 1b is a schematic diagram of the effect of a video processing method provided by an embodiment of the present invention.
  • FIG. 1c is a schematic diagram of the effect of a video processing method provided by an embodiment of the present invention.
  • FIG. 1d is a schematic diagram of the effect of a video processing method provided by an embodiment of the present invention.
  • FIG. 2 is a flowchart of a video processing method provided by an embodiment of the present invention.
  • FIG. 3 is a schematic diagram of a method for processing video frames according to an embodiment of the present invention.
  • FIG. 4 is a schematic diagram of the modules of a video processing device provided by an embodiment of the present invention.
  • FIG. 5 is a schematic diagram of the hardware structure of a video processing device provided by an embodiment of the present invention.
  • The present invention provides an occlusion method and device for video processing, which handle the occlusion of a virtual object image by a moving object image. The method segments out the moving object image, uses it as an opaque mask, and then covers the virtual object image with that mask.
  • In a video or photo, the moving object image can then occlude the virtual object image, the occlusion relationship between virtual and real objects can be changed, and the user sees the intended augmented-reality visual effect in a more natural, realistic, and highly immersive way.
  • The AR fusion scene improves the user's perception and experience of the scene and gives a better visual experience.
  • The method also provides automatic calibration and augmented-reality support.
  • FIG. 1 is a schematic diagram of an implementation effect of this application.
  • FIG. 1a shows the background image, that is, the area that requires image processing.
  • The example background image in FIG. 1a contains a tree, a stool, and a white cloud.
  • FIG. 1b shows a moving object entering the background during video shooting.
  • In FIG. 1b, the example moving object is a running person.
  • FIG. 1c shows a virtual object image inserted into the video or photo. Without any image processing, the virtual object image is added as the topmost layer of the picture, covering both the background image and the moving object image.
  • In this example, the virtual object image is an animated character named Pikachu.
  • FIG. 1d shows that, using a method provided by this application, the moving object image can be placed on top of the virtual object image, achieving the visual effect of the moving object occluding the virtual object image.
  • In FIG. 1d, the running person occludes Pikachu.
  • The specific steps of the video processing method provided by this application are shown in FIG. 2 and include the following.
  • The user takes a background snapshot image and then performs noise reduction on it.
  • During image generation and transmission, images are often degraded by various kinds of noise, which adversely affects subsequent image processing and the visual quality of the image. Therefore, to suppress the influence of noise, improve image quality, and facilitate higher-level processing, the image must be denoised.
  • The background snapshot image can be denoised with a median filter or a Gaussian filter.
  • The denoised image is called the reference image, that is, the first photo; it is also the area into which the virtual object image is integrated in subsequent steps. The noise difference between the reference image and the background snapshot image is calculated and can be used in later steps.
  • In addition, the first photo does not contain an image of the moving object. A sketch of this denoising step is given below.
  • S102: Shoot a video, and detect that the video contains an image of a moving object.
  • Each video frame in the video is composed of a background image and the image of the moving object, and the background image is consistent with the reference image.
  • Specifically, when the moving object enters the real-time video,
  • each real-time video frame is compared with the reference image, a first gray-scale segmented image is generated, and the moving object image is segmented out as the opaque mask corresponding to that video frame.
  • In order to compare the reference image and each video frame more conveniently, the reference image and each video frame are first reduced to a smaller resolution. Because the image is normalized while its pixels are being reduced, this lowers the image's sensitivity to the noise produced by the camera sensor, so using a smaller image resolution during the comparison improves the quality of the comparison.
  • The amount of downscaling is set according to the actual situation; for example, the images can be reduced to 1/2 of the original size. The resolution of the resulting opaque mask is reduced accordingly, and the edges of the opaque mask become smoother, as illustrated in the sketch below.
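As a minimal sketch of the downscaling step (the 1/2 scale factor and the area-averaging interpolation are assumptions, not requirements of the patent):

```python
import cv2

def downscale_pair(reference_bgr, frame_bgr, scale=0.5):
    """Reduce the reference image and a video frame to the same, smaller
    resolution before comparison; INTER_AREA averages (normalizes) pixels."""
    small_ref = cv2.resize(reference_bgr, None, fx=scale, fy=scale,
                           interpolation=cv2.INTER_AREA)
    small_frame = cv2.resize(frame_bgr, None, fx=scale, fy=scale,
                             interpolation=cv2.INTER_AREA)
    return small_ref, small_frame
```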
  • After the resolution has been reduced, the reference image and each video frame are represented in multiple color spaces, and their differences are compared with comparison algorithms.
  • A comparison algorithm can be used to calculate, in each preset color space, the difference between the value of every pixel in the reference image and the corresponding pixel in the real-time video frame.
  • The color space can be a commonly used one such as RGB, HSV, YCbCr, LAB, or XYZ, or a custom color space configured by the user.
  • A custom color space is an industry-standard color space in which some parameter values have been modified as required.
  • A comparison algorithm can compare the difference between each pair of pixels at the same position in the reference image and the video frame.
  • The difference value can be computed as a length difference, a square-root difference, a product difference, and so on. After the calculation, every pixel yields a result of 0 or 1, where 0 represents white and 1 represents black, so the comparison algorithm produces an image composed of black and white pixels, that is, a gray-scale segmented image.
  • The white area of this image consists of the pixels at which the two images compared by the algorithm differ. A per-pixel comparison sketch is given below.
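A hedged sketch of one such comparison algorithm follows: it computes a per-pixel Euclidean ("length") difference in a single color space and thresholds it into a black-and-white segmented image in which differing pixels are rendered white. The LAB color space and the threshold value are assumptions, not values fixed by the patent.

```python
import cv2
import numpy as np

def grayscale_segmented_image(small_ref_bgr, small_frame_bgr,
                              colorspace=cv2.COLOR_BGR2LAB, threshold=25.0):
    """One comparison algorithm: per-pixel Euclidean ('length') difference in a
    chosen color space, thresholded into a black-and-white segmentation image."""
    ref = cv2.cvtColor(small_ref_bgr, colorspace).astype(np.float32)
    frame = cv2.cvtColor(small_frame_bgr, colorspace).astype(np.float32)
    # Length (Euclidean) difference between corresponding pixels.
    diff = np.linalg.norm(frame - ref, axis=2)
    # Pixels that differ form the white area of the gray-scale segmented image;
    # using 255 for white is just this sketch's convention.
    segmented = np.where(diff > threshold, 255, 0).astype(np.uint8)
    return segmented
```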
  • When the comparison algorithms produce different results, the gray-scale segmented images they yield also differ.
  • For example, comparison algorithm A may produce a gray-scale segmentation of the human body with part of an arm missing,
  • while comparison algorithm B may produce a gray-scale segmentation of the human body with part of a leg missing.
  • This application may also add a noise reference value to the comparison algorithms in order to filter out the results of comparison algorithms that are too sensitive to noise.
  • The noise reference value can serve as a criterion for selecting comparison algorithms: for example, if the noise reference value is greater than a certain set value, the results of the corresponding comparison algorithms are discarded. In practice, the discard criterion is chosen by engineers after testing.
  • The noise reference value used in the comparison algorithms may be the noise difference between the reference image and the background snapshot image computed in step S101. A sketch of this filtering follows below.
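One possible reading of this selection rule is sketched below; the pairing of one noise reference value per comparison algorithm, the function name, and the fallback behaviour are assumptions made for illustration only.

```python
def select_comparison_results(segmented_images, noise_reference_values, set_value):
    """Keep the gray-scale segmented images whose comparison algorithm has a
    noise reference value no greater than the configured set value; results of
    noise-sensitive algorithms are discarded."""
    kept = [seg for seg, noise in zip(segmented_images, noise_reference_values)
            if noise <= set_value]
    # Fall back to keeping everything if the criterion would discard all results.
    return kept if kept else segmented_images
```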
  • The final reduced-resolution gray-scale segmented image described above is then scaled in proportion to the resolution of the display screen.
  • The reference image and the video frame can be compared in multiple color spaces, such as the LAB, YCbCr, and CMYK color spaces.
  • Multiple comparison algorithms are used across the color spaces; for example, the LAB color space uses a first comparison algorithm, the YCbCr color space uses a second comparison algorithm, and the CMYK color space uses a third comparison algorithm.
  • Each color space yields a different gray-scale segmented image. The white areas of all the gray-scale segmented images are blended and combined to obtain the final gray-scale segmented image, as in the merging sketch below.
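The detailed description notes that the individual segmentations are merged by taking the maximum difference value; a minimal sketch of that merge (names assumed):

```python
import numpy as np

def merge_segmentations(segmented_images):
    """Combine several black-and-white segmented images into the final
    gray-scale segmented image by taking the per-pixel maximum, i.e. a pixel
    is white if any comparison algorithm marked it as differing."""
    merged = segmented_images[0]
    for seg in segmented_images[1:]:
        merged = np.maximum(merged, seg)
    return merged
```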
  • In each occlusion frame, the image area corresponding to the moving object is opaque, and the image area outside the moving object is transparent.
  • The shader can use different blending operations, such as Add and Sub, and different blending factors, such as SrcColor and One, to blend the 3D object with the real object image in the real-time video and create multiple combined results.
  • The shader used can be a Unity3D shader. A CPU-side compositing sketch is given after this paragraph.
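The blending itself is described as a GPU shader step (for example a Unity3D shader). As a language-neutral illustration of the same compositing idea, the sketch below layers the moving-object pixels over the rendered frame on the CPU wherever the opaque mask is set; it is not the Unity shader itself, and all names are assumptions.

```python
import cv2
import numpy as np

def composite_frame(video_frame_bgr, rendered_frame_bgr, opaque_mask):
    """Build one composite frame: start from the rendered frame (background +
    virtual object) and copy back the original video pixels wherever the
    opaque mask marks the moving object, so the moving object occludes the
    virtual object."""
    if opaque_mask.shape[:2] != video_frame_bgr.shape[:2]:
        # The mask was produced at reduced resolution; scale it back up.
        opaque_mask = cv2.resize(
            opaque_mask,
            (video_frame_bgr.shape[1], video_frame_bgr.shape[0]),
            interpolation=cv2.INTER_LINEAR)
    alpha = (opaque_mask.astype(np.float32) / 255.0)[..., None]  # opaque where moving object
    out = alpha * video_frame_bgr.astype(np.float32) \
        + (1.0 - alpha) * rendered_frame_bgr.astype(np.float32)
    return out.astype(np.uint8)
```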
  • The three-dimensional modeling software can be professional modeling software or other software with modeling functions; software with a Unity3D plug-in can be chosen.
  • The Unity3D plug-in supports adding virtual 3D objects and can be used on terminal platforms such as Android and iOS.
  • The generated model can be saved in formats such as .fbx and .obj.
  • The blended virtual object is not limited to 3D virtual objects; it can also be a 2D virtual object or the like.
  • The rendered frame is composed of the background image and the image of the virtual object, and the background image is consistent with the reference image.
  • When the image of the virtual object overlaps an image area of the background image, the image of the virtual object occludes the background image.
  • S107: Concatenate the composite frames corresponding to all video frames to generate a composite video; a sketch of this step follows below.
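As an illustration of S107 (the codec, frame rate, and file name are assumptions):

```python
import cv2

def write_composite_video(composite_frames, path="composite.mp4", fps=30.0):
    """Concatenate the per-frame composite frames into one composite video."""
    height, width = composite_frames[0].shape[:2]
    writer = cv2.VideoWriter(path, cv2.VideoWriter_fourcc(*"mp4v"),
                             fps, (width, height))
    for frame in composite_frames:
        writer.write(frame)     # frames are written in their original order
    writer.release()
```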
  • The present application provides a video processing device 400, which can make the image of a moving object cover the image of a virtual object added to a real-time video.
  • The device can be a mobile phone, a tablet computer, or another terminal device with camera photo/video functions.
  • The graphics processing unit (GPU) of the device can support the High-Level Shading Language (HLSL) or the OpenGL Shading Language (GLSL), and the device may also have a Simultaneous Localization and Mapping (SLAM) system.
  • The functional block diagram of the device is shown in FIG. 4.
  • The photographing module 401 is used to take the first photo. Specifically, the user takes a background snapshot image and then performs noise reduction on it. During image generation and transmission, images are often degraded by various kinds of noise, which adversely affects subsequent image processing and the visual quality of the image. Therefore, to suppress the influence of noise, improve image quality, and facilitate higher-level processing, the image must be denoised.
  • The background snapshot image can be denoised with a median filter or a Gaussian filter.
  • The denoised image is called the reference image, that is, the first photo; it is also the area into which the virtual object image is integrated in subsequent steps. The noise difference between the reference image and the background snapshot image is calculated and can be used in later steps. In addition, the first photo does not contain an image of the moving object.
  • The shooting module 401 is also used to shoot a video, in which an image of a moving object is detected.
  • Each video frame in the video is composed of a background image and the image of the moving object, and the background image is consistent with the reference image.
  • The segmentation module 402 is configured to compare each video frame in the video with the reference image and generate an opaque mask image corresponding to each video frame.
  • When the moving object enters the real-time video, each real-time video frame is compared with the reference image, a first gray-scale segmented image is generated, and the moving object image is segmented out as the opaque mask corresponding to that video frame.
  • In order to compare the reference image and each video frame more conveniently, the reference image and each video frame are first reduced to a smaller resolution. Because the image is normalized while its pixels are being reduced, this lowers the image's sensitivity to the noise produced by the camera sensor, so using a smaller image resolution during the comparison improves the quality of the comparison.
  • The amount of downscaling is set according to the actual situation; for example, the images can be reduced to 1/2 of the original size. The resolution of the resulting opaque mask is reduced accordingly, and the edges of the opaque mask become smoother.
  • A comparison algorithm can be used to calculate, in each preset color space, the difference between the value of every pixel in the reference image and the corresponding pixel in the real-time video frame.
  • The color space can be a commonly used one such as RGB, HSV, YCbCr, LAB, or XYZ, or a custom color space configured by the user.
  • A custom color space is an industry-standard color space in which some parameter values have been modified as required.
  • A comparison algorithm can compare the difference between each pair of pixels at the same position in the reference image and the video frame.
  • The difference value can be computed as a length difference, a square-root difference, a product difference, and so on. After the calculation, every pixel yields a result of 0 or 1, where 0 represents white and 1 represents black, so the comparison algorithm produces a black-and-white pixel image, that is, a gray-scale segmented image.
  • The white area of this image consists of the pixels at which the two images compared by the algorithm differ.
  • For example, comparison algorithm A may produce a gray-scale segmentation of the human body with part of an arm missing,
  • while comparison algorithm B may produce a gray-scale segmentation of the human body with part of a leg missing.
  • This application may also add a noise reference value to the comparison algorithms in order to filter out the results of comparison algorithms that are too sensitive to noise.
  • The noise reference value can serve as a criterion for selecting comparison algorithms: for example, if the noise reference value is greater than a certain set value, the results of the corresponding comparison algorithms are discarded. In practice, the discard criterion is chosen by engineers after testing.
  • The noise reference value used in the comparison algorithms may be the noise difference between the reference image and the background snapshot image computed in step S101.
  • The final reduced-resolution gray-scale segmented image is scaled according to the resolution of the display screen.
  • The reference image and the video frame can be compared in multiple color spaces, such as the LAB, YCbCr, and CMYK color spaces.
  • Multiple comparison algorithms are used across the color spaces; for example, the LAB color space uses a first comparison algorithm, the YCbCr color space uses a second comparison algorithm, and the CMYK color space uses a third comparison algorithm.
  • Each color space yields a different gray-scale segmented image. The white areas of all the gray-scale segmented images are blended and combined to obtain the final gray-scale segmented image.
  • The rendering module 403 is used to fuse each video frame with the image of the virtual object to generate a rendered frame corresponding to each video frame.
  • The shader can use different blending operations, such as Add and Sub, and different blending factors, such as SrcColor and One, to blend the 3D object with the real object image in the real-time video and create multiple combined results.
  • The shader used can be a Unity3D shader.
  • The three-dimensional modeling software can be professional modeling software or other software with modeling functions; software with a Unity3D plug-in can be chosen.
  • The Unity3D plug-in supports adding virtual 3D objects and can be used on terminal platforms such as Android and iOS.
  • The generated model can be saved in formats such as .fbx and .obj.
  • The blended virtual object is not limited to 3D virtual objects; it can also be a 2D virtual object or the like.
  • The rendered frame is composed of the background image and the image of the virtual object, and the background image is consistent with the reference image.
  • When the image of the virtual object overlaps an image area of the background image, the image of the virtual object occludes the background image.
  • The synthesis module 404 is configured to fuse each video frame with its corresponding opaque mask image and its corresponding rendered frame to generate a composite frame corresponding to that video frame.
  • Each video frame is fused with its corresponding opaque mask image to generate the occlusion frame corresponding to that video frame.
  • In each occlusion frame, the image area corresponding to the moving object is opaque,
  • and the image area outside the moving object is transparent.
  • The occlusion frame corresponding to each video frame is fused with the rendered frame corresponding to that video frame to generate the composite frame corresponding to that video frame.
  • The synthesis module 404 is also used to concatenate the composite frames corresponding to all video frames to generate a composite video.
  • The terminal device can be a mobile phone, a tablet computer, a notebook computer, or another terminal device with camera and video functions.
  • The graphics processing unit (GPU) of the terminal device can support the High-Level Shading Language (HLSL) or the OpenGL Shading Language (GLSL), and the terminal device may also have a Simultaneous Localization and Mapping (SLAM) system.
  • The terminal device can be equipped with a larger screen (for example, a screen of 5 inches or above) so that the user can conveniently view the shooting result.
  • The terminal device is equipped with one or more cameras, such as a 2D camera or a 3D camera; no restriction is imposed here.
  • FIG. 5 is a structural block diagram of an implementation manner of the terminal device 500.
  • the terminal device 500 may include: a baseband chip 510, a memory 515 (one or more computer-readable storage media), a radio frequency (RF) module 516, and a peripheral system 517. These components may communicate on one or more communication buses 514.
  • the peripheral system 517 is mainly used to implement the interactive function between the terminal 500 and the user/external environment, and mainly includes the input and output devices of the terminal 500.
  • the peripheral system 517 may include: a touch screen controller 518, a camera controller 519, an audio controller 520, and a sensor management module 521.
  • each controller can be coupled with its corresponding peripheral devices (such as the touch screen 523, the camera 524, the audio circuit 525, and the sensor 526).
  • The touch screen 523, also called a touch panel, can collect the user's touch operations on or near it (for example, operations performed on or near the touch screen with a finger, a stylus, or any other suitable object or accessory) and drive the corresponding connected device according to a preset program.
  • The touch screen may include two parts: a touch detection device and a touch controller. The touch detection device detects the position of the user's touch, detects the signal produced by the touch operation, and transmits the signal to the touch controller; the touch controller receives the touch information from the touch detection device, converts it into contact coordinates, sends them to the processor 511, and can receive and execute commands sent by the processor 511.
  • the peripheral system 517 may also include a display panel.
  • the display panel may be configured in the form of LCD (Liquid Crystal Display), OLED (Organic Light-Emitting Diode), etc.
  • The touch screen can cover the display panel. When the touch screen detects a touch operation on or near it, it transmits the operation to the processor 511 to determine the type of touch event, and the processor 511 then provides the corresponding visual output on the display panel according to the type of touch event.
  • a touch screen and a display panel can be integrated to realize the input and output functions of the terminal device 500.
  • the camera 524 may be a 2D camera or a 3D camera. It should be noted that the peripheral system 517 may also include other I/O peripherals, which is not limited.
  • the baseband chip 510 may integrate: one or more processors 511, a clock module 512, and a power management module 513.
  • the clock module 512 integrated in the baseband chip 510 is mainly used to generate a clock required for data transmission and timing control for the processor 511.
  • the power management module 513 integrated in the baseband chip 510 mainly manages charging, discharging, and power consumption distribution functions to provide a stable, high-precision voltage for the processor 511, the radio frequency module 516, and peripheral systems.
  • The processor 511 in the embodiment of this application may include at least one of the following types: a general-purpose central processing unit (CPU), a graphics processing unit (GPU), a digital signal processor (DSP), a microprocessor, an application-specific integrated circuit (ASIC), a microcontroller unit (MCU), a field-programmable gate array (FPGA), or another integrated circuit that implements logical operations.
  • the processor 511 may be a single-CPU processor or a multi-CPU processor.
  • the at least one processor 511 may be integrated in one chip or located on multiple different chips.
  • the radio frequency (RF) module 516 is used to receive and transmit radio frequency signals, and mainly integrates the receiver and transmitter of the terminal 500.
  • the radio frequency (RF) module 516 communicates with the communication network and other communication devices through radio frequency signals.
  • The radio frequency (RF) module 516 may include, but is not limited to: an antenna system, an RF transceiver, one or more amplifiers, a tuner, one or more oscillators, a digital signal processor, a CODEC chip, a SIM card 5161, and storage media.
  • the radio frequency (RF) module 516 may be implemented on a separate chip.
  • the RF module 516 can also communicate with the network and other devices through wireless communication, such as Wi-Fi 5162.
  • The wireless communication can use any communication standard or protocol, including but not limited to GSM (Global System for Mobile Communications), GPRS (General Packet Radio Service), CDMA (Code Division Multiple Access), WCDMA (Wideband Code Division Multiple Access), LTE (Long Term Evolution), email, SMS (Short Message Service), short-range communication technologies, and so on.
  • the memory 515 is coupled with the processor 511, and is used to store various software programs and/or multiple sets of instructions.
  • the memory 515 may include a high-speed random access memory, and may also include a non-volatile memory, such as one or more magnetic disk storage devices, flash memory devices, or other non-volatile solid-state storage devices.
  • The memory 515 may store an operating system (hereinafter referred to as the system), such as an embedded operating system like ANDROID or IOS.
  • the memory 515 may also store a network communication program, which may be used to communicate with one or more additional devices, one or more terminal devices, and one or more network devices.
  • The memory 515 can also store a user interface program, which can vividly display the content of an application program through a graphical operation interface and receive the user's control operations on the application program through input controls such as menus, dialog boxes, and keys.
  • The memory 515 may also store one or more application programs. As shown in FIG. 5, these applications may include: social applications (such as Facebook), image management applications (such as photo albums), map applications (such as Google Maps), browsers (such as Safari and Google Chrome), and so on.
  • The terminal device 500 is only an example provided by the embodiment of the present invention; the terminal device 500 may have more or fewer components than shown, may combine two or more components, or may have a different configuration of components.
  • For the moving object image in a video or photo, when the moving object image and the virtual object image overlap, the moving object image can occlude the virtual object image in real time, changing the occlusion relationship between the objects, so that the user sees an augmented-reality visual effect and gets a better visual experience.
  • The method of the present invention requires a smaller amount of computation and a lower processor load, and is easy to operate and convenient to implement.
  • The method also provides automatic calibration and augmented-reality support.
  • The embodiments of the present invention can be provided as a method, a system, or a computer program product. Therefore, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage and optical storage) containing computer-usable program code.
  • These computer program instructions can also be loaded onto a computer or other programmable data processing equipment, so that a series of operational steps are executed on the computer or other programmable equipment to produce computer-implemented processing, such that
  • the instructions executed on the computer or other programmable equipment provide steps for implementing the functions specified in one or more processes of the flowchart and/or one or more blocks of the block diagram.

Abstract

The present invention relates to the field of image processing. Disclosed are a video processing method and device. The method comprises: capturing a first photo as a reference image; capturing a video and detecting that the video comprises an image of a moving object; separately comparing each video frame in the video with the reference image to obtain a difference, and generating an opaque mask image respectively corresponding to each video frame; fusing each video frame with the opaque mask image corresponding thereto, to generate a corresponding occlusion frame; fusing each video frame with an image of a virtual object to generate a corresponding rendering frame; fusing the occlusion frame and the rendering frame corresponding to each video frame to generate a corresponding synthesized frame; and finally connecting all synthesized frames together in series to generate a synthesized video. The present invention solves the problem of an occlusion relation between a virtual object and a real object. According to this solution, in a video, the image of a moving object can occlude the image of a virtual object, thus the occlusion relation between the virtual and real objects is changed, and the user's perception and experience of augmented reality scenes are improved.

Description

Video Processing Method and Device
Technical Field
The present invention relates to the field of image processing, and in particular to a video processing method and device.
Background
Augmented Reality (AR) has developed rapidly in recent years and has attracted widespread attention. The precise registration and tracking of virtual objects, the seamless fusion of the virtual and the real, and the real-time interaction between the user and the scene determine the realism, immersion, and interactivity of AR.
When a virtual object image is added to a video or photo without further processing, the virtual object image in the AR fusion scene is simply overlaid on the original background photo or video, which causes the virtual object to float above the images of real-world objects. No matter how the real object moves, the virtual object image always occludes it, so the virtual object image cannot blend with the real object images in the video; in other words, the correct occlusion relationship between real and virtual objects is not achieved, which affects the user's viewing experience. A wrong occlusion relationship gives users the illusion that the relative positions of virtual and real objects are distorted and confuses depth perception, which reduces the realism of the fused scene.
Summary of the Invention
The present invention provides a video processing method and device for handling the occlusion of a virtual object image by a moving object image.
In a first aspect, an embodiment of the present invention provides a video processing method. The method includes: taking a first photo; then shooting a video and detecting that the video contains an image of a moving object. Each video frame in the video is composed of a background image and the image of the moving object, and the background image is consistent with the image of the first photo. Each video frame in the video is compared with the first photo, and an opaque mask image corresponding to each video frame is generated. In each opaque mask image, the image area corresponding to the moving object is opaque, and the image area outside the moving object is transparent. Each video frame is fused with its corresponding opaque mask image and with the image of the virtual object to generate a composite frame for that video frame. The composite frames of all video frames are then concatenated to generate a composite video.
With reference to the first aspect, in some embodiments, in each composite frame, when the image of the virtual object overlaps the image of the moving object, the image of the virtual object is occluded by the image of the moving object.
With reference to the first aspect, in some embodiments, comparing each video frame in the video with the first photo to generate the corresponding opaque mask image specifically includes: calculating a difference value between the first photo and the first video frame in one or more color spaces, where the difference value is a length difference, a square-root difference, or a product difference, and the calculation produces multiple gray-scale segmented images; merging the multiple gray-scale segmented images to obtain a first gray-scale segmented image; segmenting out the image of the moving object in the first gray-scale segmented image as the first opaque mask image; and comparing each video frame in the video with the first photo in the same way to generate the opaque mask image corresponding to each video frame.
With reference to the first aspect, in some embodiments, the first photo must not contain an image of the moving object.
With reference to the first aspect, in some embodiments, fusing each video frame with its corresponding opaque mask image and the image of the virtual object to generate a composite frame specifically includes: fusing each video frame with its corresponding opaque mask image to generate an occlusion frame for that video frame; fusing each video frame with the image of the virtual object to generate a rendered frame corresponding to that video frame; and fusing the occlusion frame corresponding to each video frame with the rendered frame corresponding to that video frame to generate the composite frame corresponding to that video frame.
With reference to the first aspect, in some embodiments, the rendered frame is composed of a background image and the image of the virtual object, and the background image is consistent with the image of the first photo. When the image of the virtual object overlaps an image area of the background image, the image of the virtual object occludes the background image.
In a second aspect, an embodiment of the present invention provides a video processing device, which includes a shooting module, a segmentation module, a rendering module, and a synthesis module. The shooting module is used to take the first photo; it is also used to shoot a video, in which an image of a moving object is detected. Each video frame in the video is composed of a background image and the image of the moving object, and the background image is consistent with the image of the first photo. The segmentation module is used to compare each video frame in the video with the first photo and generate an opaque mask image corresponding to each video frame. In each opaque mask image, the image area corresponding to the moving object is opaque, and the image area outside the moving object is transparent. The rendering module is used to fuse each video frame with the image of the virtual object to generate a rendered frame corresponding to each video frame. The synthesis module is used to fuse each video frame with its corresponding opaque mask image and its corresponding rendered frame to generate a composite frame corresponding to that video frame. The synthesis module is also used to concatenate the composite frames of all video frames to generate a composite video.
With reference to the second aspect, in some embodiments, in each composite frame, when the image of the virtual object overlaps the image of the moving object, the image of the virtual object is occluded by the image of the moving object.
With reference to the second aspect, in some embodiments, the segmentation module is specifically configured to: calculate a difference value between the first photo and the first video frame in one or more color spaces, where the difference value is a length difference, a square-root difference, or a product difference, producing multiple gray-scale segmented images; merge the multiple gray-scale segmented images to obtain a first gray-scale segmented image; segment out the image of the moving object in the first gray-scale segmented image as the first opaque mask image; and compare each video frame in the video with the first photo in the same way to generate the opaque mask image corresponding to each video frame.
With reference to the second aspect, in some embodiments, the first photo must not contain an image of the moving object.
With reference to the second aspect, in some embodiments, the synthesis module is specifically configured to: fuse each video frame with its corresponding opaque mask image to generate the occlusion frame corresponding to that video frame; fuse the occlusion frame corresponding to each video frame with the rendered frame corresponding to that video frame to generate the composite frame corresponding to that video frame; and concatenate the composite frames of all video frames to generate the composite video.
With reference to the second aspect, in some embodiments, the rendered frame is composed of a background image and the image of the virtual object, and the background image is consistent with the image of the first photo. When the image of the virtual object overlaps an image area of the background image, the image of the virtual object occludes the background image.
According to the above technical solution, in a video or photo the moving object image can occlude the virtual object image, and the occlusion relationship between virtual and real objects can be changed, so that the user sees the intended augmented-reality visual effect and a more natural, realistic, and highly immersive AR fusion scene. This improves the user's perception and experience of the scene and gives a better visual experience.
Description of the Drawings
In order to describe the technical solutions in the embodiments of the present application or in the background art more clearly, the drawings required by the embodiments of the present application or the background art are described below.
FIG. 1a is a schematic diagram of the effect of a video processing method provided by an embodiment of the present invention;
FIG. 1b is a schematic diagram of the effect of a video processing method provided by an embodiment of the present invention;
FIG. 1c is a schematic diagram of the effect of a video processing method provided by an embodiment of the present invention;
FIG. 1d is a schematic diagram of the effect of a video processing method provided by an embodiment of the present invention;
FIG. 2 is a flowchart of a video processing method provided by an embodiment of the present invention;
FIG. 3 is a schematic diagram of a method for processing video frames according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of the modules of a video processing device provided by an embodiment of the present invention;
FIG. 5 is a schematic diagram of the hardware structure of a video processing device provided by an embodiment of the present invention.
Detailed Description
The embodiments of the present application are described in detail below with reference to the drawings. The terms used in the implementation part of the embodiments of the present application are only intended to explain specific embodiments of the present application and are not intended to limit the present application.
When a virtual object image is added to a video or photo without further processing, the position, size, and outline of objects in the real world cannot be accurately and dynamically located, so the virtual object image floats above the images of real-world objects and always occludes them. The virtual object image therefore cannot blend with the real object images in the video, which affects the user's viewing experience. The present invention provides an occlusion method and device for video processing that handle the occlusion of a virtual object image by a moving object image. The method segments out the moving object image, uses it as an opaque mask, and then covers the virtual object image with that mask. According to this technical solution, in a video or photo the moving object image can occlude the virtual object image and the occlusion relationship between virtual and real objects can be changed, so that the user sees the intended augmented-reality visual effect and a more natural, realistic, and highly immersive AR fusion scene, which improves the user's perception and experience of the scene and gives a better visual experience. The method also provides automatic calibration and augmented-reality support.
This application provides a video processing method that can separate, from a real-time video, the image of a moving object that does not exist in the reference image and generate an opaque mask that covers the virtual object image added to the video. FIG. 1 is a schematic diagram of an implementation effect of this application. FIG. 1a shows the background image, that is, the area that requires image processing; the example background image in FIG. 1a contains a tree, a stool, and a white cloud. FIG. 1b shows a moving object entering the background during video shooting; in FIG. 1b the example moving object is a running person. FIG. 1c shows a virtual object image inserted into the video or photo: without any image processing, the virtual object image is added as the topmost layer of the picture, covering both the background image and the moving object image; in FIG. 1c the example virtual object image is an animated character named Pikachu. FIG. 1d shows that, using a method provided by this application, the moving object image can be placed on top of the virtual object image, achieving the visual effect of the moving object occluding the virtual object image; in this example, the running person occludes Pikachu.
本申请提供的一种视频处理的方法,具体步骤如图2所示,包括:The method for video processing provided by this application has specific steps as shown in Figure 2 and includes:
S101、拍摄第一照片。S101. Take a first photo.
用户拍摄一张背景快照图像,然后对该背景快照图像进行降噪处理。图像在生成和传输过程中常常因受到各种噪声的干扰和影响而使得图像降质,这对后续图像的处理和图像视觉效应将产生不利影响。因此,为了抑制噪声影响,改善图像质量,便于更高层次的处理,必须对图像进行降噪处理。在这里,对背景快照图像的降噪处理的方式可以采用中值滤波的方法,也可以采用高斯滤波的方法。降噪处理后的图像称为参照图像,即第一照片, 也是后续步骤中集成虚拟对象图像的区域。计算参照图像与背景快照图像的噪声差值,该噪声差值可以用于后续步骤。另外,在第一照片中并不包含移动对象的图像。The user takes a background snapshot image, and then performs noise reduction processing on the background snapshot image. In the process of image generation and transmission, the image is often degraded due to the interference and influence of various noises, which will adversely affect the subsequent image processing and image visual effects. Therefore, in order to suppress the influence of noise, improve image quality, and facilitate higher-level processing, the image must be denoised. Here, the method of reducing the noise of the background snapshot image can be a median filtering method or a Gaussian filtering method. The image after the noise reduction process is called the reference image, that is, the first photo, which is also the area where the virtual object image is integrated in the subsequent steps. Calculate the noise difference between the reference image and the background snapshot image, and the noise difference can be used in subsequent steps. In addition, the image of the moving object is not included in the first photo.
S102. Shoot a video and detect that the video contains the image of a moving object.
Here, each video frame in the video consists of the background image and the image of the moving object, and the background image is consistent with the reference image.
S103. Compare each video frame in the video with the reference image and generate the opaque mask image corresponding to each video frame.
Specifically, while the real-time video is being shot, a moving object enters the video. The difference between the reference image and the real-time video frame is computed to generate a first gray-scale segmentation image, from which the moving-object image is segmented out and used as the opaque mask corresponding to that video frame.
To compare the reference image and each video frame more conveniently, both are first downscaled to a smaller resolution. Because the image is normalized while its pixels are reduced, this lowers the image's sensitivity to noise generated by the camera sensor, so using a smaller image resolution during the comparison improves comparison quality. The amount of downscaling is set according to the actual situation; for example, the images can be reduced to 1/2 of the original size. The opaque mask obtained this way also has a lower resolution, and its edges are smoother.
After the resolution is reduced, the reference image and each video frame are compared in multiple color spaces using comparison algorithms. A comparison algorithm computes, in each preset color space, the difference between each pixel value of the reference image and of the current video frame. The color spaces may be common ones such as RGB, HSV, YCbCr, LAB, and XYZ, or custom color spaces set by the user; a custom color space is a standard, industry-common color space whose parameters have been modified as needed. A comparison algorithm may compare each pair of pixels at the same position in the reference image and the video frame; the difference value may be computed as a length difference, a square-root difference, a product difference, and so on. After the computation, each pixel receives a result of 0 or 1, where 0 represents white and 1 represents black, so the comparison algorithm produces an image composed of black and white pixels, i.e., a gray-scale segmentation image, whose white region is the set of pixels that differ between the two images according to that algorithm. When different comparison algorithms produce different results, the resulting gray-scale segmentation images also differ. For example, comparison algorithm A may yield a human silhouette missing an arm, while comparison algorithm B may yield a silhouette missing a leg; by merging the multiple gray-scale segmentation images and taking the maximum difference, a final gray-scale segmentation image closer to the complete human figure is obtained, i.e., the first gray-scale segmentation image.
Therefore, to ensure the completeness and accuracy of the final gray-scale segmentation image, multiple comparison algorithms are applied in each color space. Suppose M color spaces are used in the difference comparison and each color space yields N gray-scale segmentation images; the union of the white regions of all M×N gray-scale segmentation images can then be taken to obtain one final gray-scale segmentation image. Which comparison algorithms are used can be set by technicians according to the situation and is not limited here.
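The multi-color-space, multi-algorithm comparison and the union of the resulting gray-scale segmentation images could be sketched as follows in Python with OpenCV; the particular color spaces, the two difference formulas, the downscale factor, and the threshold value are illustrative assumptions rather than the exact algorithms referred to above.

    import cv2
    import numpy as np

    COLOR_SPACES = {"LAB": cv2.COLOR_BGR2LAB, "YCrCb": cv2.COLOR_BGR2YCrCb, "HSV": cv2.COLOR_BGR2HSV}

    def abs_diff(a, b):   # a "length"-style per-pixel difference
        return np.abs(a.astype(np.int16) - b.astype(np.int16)).sum(axis=2)

    def sq_diff(a, b):    # a square-root-style per-pixel difference
        return np.sqrt(((a.astype(np.float32) - b.astype(np.float32)) ** 2).sum(axis=2))

    def segment_moving_object(reference_bgr, frame_bgr, scale=0.5, threshold=40):
        """Return the final gray-scale segmentation image (255 marks differing pixels)."""
        small_ref = cv2.resize(reference_bgr, None, fx=scale, fy=scale)
        small_frm = cv2.resize(frame_bgr, None, fx=scale, fy=scale)
        union = np.zeros(small_ref.shape[:2], dtype=np.uint8)
        for code in COLOR_SPACES.values():                      # M color spaces
            ref_cs, frm_cs = cv2.cvtColor(small_ref, code), cv2.cvtColor(small_frm, code)
            for compare in (abs_diff, sq_diff):                  # N comparison algorithms
                mask = (compare(ref_cs, frm_cs) > threshold).astype(np.uint8) * 255
                union = cv2.bitwise_or(union, mask)              # union of the M x N white regions
        return union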
To reduce the influence of noise generated by the camera sensor, i.e., to reduce noise in the real-time video, this application may add a noise reference value to the comparison, used to filter out the results of comparison algorithms that are more sensitive to noise. The noise reference value can serve as a criterion for selecting comparison algorithms; for example, if the noise reference value exceeds a certain preset value, the results of the corresponding comparison algorithms are discarded. In practice, the criterion for discarding algorithm results is chosen by technicians after testing. The noise difference between the reference image and the background snapshot image computed in step S101 can be used as the noise reference value.
After the final gray-scale segmentation image is obtained, the reduced-resolution final gray-scale segmentation image is scaled to match the resolution of the display screen. The moving object represented by the white region in the final gray-scale segmentation image can then be segmented out as the first opaque mask, which is used in subsequent steps.
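A minimal sketch of rescaling the reduced-resolution segmentation image back to the display resolution might look like this; the choice of nearest-neighbour interpolation (to keep the mask binary) is an assumption.

    import cv2

    def mask_to_screen(final_gray_mask, screen_w, screen_h):
        """Scale the reduced-resolution segmentation image up to the display resolution."""
        return cv2.resize(final_gray_mask, (screen_w, screen_h), interpolation=cv2.INTER_NEAREST)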
This process can be understood with the example shown in Figure 3. First, the reference image and a given video frame are both downscaled. After the resolution is reduced, the reference image and the video frame can be compared in multiple color spaces, for example the LAB, YCbCr, and CMYK color spaces, with multiple comparison algorithms: for example, a first comparison algorithm in the LAB color space, a second comparison algorithm in the YCbCr color space, and a third comparison algorithm in the CMYK color space. Each color space yields a different gray-scale segmentation image, and the white regions of all gray-scale segmentation images are merged by taking their union to obtain one final gray-scale segmentation image.
S104. Fuse each video frame with its corresponding opaque mask image to generate the occlusion frame corresponding to that video frame.
Specifically, the opaque mask of each frame can be applied to the corresponding real-time video frame to capture the image of the moving object in that frame and generate the occlusion frame. In each occlusion frame, the image area corresponding to the moving object is opaque, and the image area outside the moving object is transparent.
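One way to sketch the generation of an occlusion frame is to write the opaque mask into the alpha channel of the frame; the BGRA layout and the function name below are assumptions for illustration, and the mask is assumed to have already been resized to the frame resolution.

    import cv2

    def make_occlusion_frame(frame_bgr, opaque_mask):
        """Combine a video frame with its opaque mask into a BGRA occlusion frame."""
        occlusion = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2BGRA)
        # White mask region (moving object) stays opaque; everything else becomes transparent.
        occlusion[:, :, 3] = opaque_mask
        return occlusion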
S105. Fuse each video frame with the image of the virtual object to generate the rendered frame corresponding to that video frame.
In one example, real objects in the real-time video can be modeled in 3D modeling software, and the resulting model can be saved in a format that can be imported into a shader. The shader may use different blending operations, such as Add and Sub, and different blending factors, such as SrcColor and One, to blend the 3D object with the real-object images in the real-time video and create multiple combined results. The shader used may be a Unity3D shader. For example, the model generated by the modeling step can be imported into a Unity3D project of an AR application, a custom 3D virtual object can be added, and a rendered video containing the 3D virtual object is generated. The 3D modeling software may be professional modeling software or software with modeling functions; software with the Unity3D plug-in can be chosen, since the plug-in supports adding virtual 3D objects and runs on terminal platforms such as Android and iOS. The generated model can be saved in .fbx or .obj format. In addition, the blended virtual object is not limited to 3D virtual objects; it may also be a 2D virtual object or the like.
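To give a rough idea of what blend operations and blend factors do when the virtual object is combined with a frame, the following NumPy sketch mimics a blend stage with Add/Sub operations and One/SrcColor source factors; it is a toy analogue for explanation only, not the Unity3D shader used in practice.

    import numpy as np

    def blend(dst_bgr, src_bgr, op="Add", src_factor="One"):
        """Toy analogue of a shader blend stage: result = dst (op) src * factor."""
        dst = dst_bgr.astype(np.float32)
        src = src_bgr.astype(np.float32)
        factor = src / 255.0 if src_factor == "SrcColor" else 1.0  # "One" -> 1.0
        term = src * factor
        out = dst + term if op == "Add" else dst - term            # Add / Sub
        return np.clip(out, 0, 255).astype(np.uint8)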
When the virtual object is blended with each frame of the real-time video, lighting information and texture detail can be taken into account in the model, which makes the composite of the virtual object and the real world more realistic.
The rendered frame consists of the background image and the image of the virtual object, and the background image is consistent with the reference image. When the image of the virtual object overlaps an image area of the background image, the image of the virtual object occludes the background image.
S106. Fuse the occlusion frame corresponding to each video frame with the rendered frame corresponding to that video frame to generate the composite frame corresponding to each video frame.
In each composite frame, when the image of the virtual object overlaps the image area of the moving object, the image of the virtual object is occluded by the image of the moving object.
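The fusion of the occlusion frame and the rendered frame amounts to standard alpha-over compositing, which could be sketched as follows; the BGRA/BGR frame layouts are the same assumptions used in the earlier sketches.

    import numpy as np

    def make_composite_frame(rendered_bgr, occlusion_bgra):
        """Layer the occlusion frame over the rendered frame (alpha-over compositing)."""
        alpha = occlusion_bgra[:, :, 3:4].astype(np.float32) / 255.0
        fg = occlusion_bgra[:, :, :3].astype(np.float32)
        bg = rendered_bgr.astype(np.float32)
        composite = alpha * fg + (1.0 - alpha) * bg  # the moving object hides the virtual object
        return composite.astype(np.uint8)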
S107. Concatenate the composite frames corresponding to the video frames to generate the composite video.
Finally, in the composite frames and the composite video, the image of the virtual object added to the real-time video can be seen to be occluded by the image of the moving object.
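Concatenating the composite frames into a video could be sketched with OpenCV's VideoWriter as below; the container, codec, and frame rate are illustrative assumptions.

    import cv2

    def write_composite_video(composite_frames, path="composite.mp4", fps=30):
        """Concatenate composite frames into a video file."""
        h, w = composite_frames[0].shape[:2]
        writer = cv2.VideoWriter(path, cv2.VideoWriter_fourcc(*"mp4v"), fps, (w, h))
        for frame in composite_frames:
            writer.write(frame)
        writer.release()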
This application provides a video processing device 400 that, in a real-time video, can make the image of a moving object cover the image of a virtual object added to the video. The device may be a mobile phone, tablet computer, or other terminal device with camera photo/video functions; the device's graphics processor (GPU) can support the High Level Shader Language (HLSL) or the OpenGL Shading Language (GLSL), and the device may also have a Simultaneous Localization and Mapping (SLAM) system.
A functional block diagram of the device is shown in Figure 4; the device consists of a shooting module 401, a segmentation module 402, a rendering module 403, and a synthesis module 404.
The shooting module 401 is used to take the first photo. Specifically, the user takes a background snapshot image, and noise reduction is then applied to it. During generation and transmission, images are often degraded by various kinds of noise, which adversely affects subsequent image processing and visual quality. Therefore, to suppress noise, improve image quality, and facilitate higher-level processing, the image must be denoised. Here, the noise reduction applied to the background snapshot image may use median filtering or Gaussian filtering. The denoised image is called the reference image, i.e., the first photo, and it is also the area into which the virtual-object image is integrated in subsequent steps. The noise difference between the reference image and the background snapshot image is computed and can be used in later steps. Note that the first photo does not contain the image of the moving object.
The shooting module 401 is also used to shoot a video and detect that the video contains the image of a moving object. Each video frame in the video consists of the background image and the image of the moving object, and the background image is consistent with the reference image.
The segmentation module 402 is used to compare each video frame in the video with the reference image and generate the opaque mask image corresponding to each video frame.
Specifically, while the real-time video is being shot, a moving object enters the video. The difference between the reference image and the real-time video frame is computed to generate a first gray-scale segmentation image, from which the moving-object image is segmented out and used as the opaque mask corresponding to that video frame.
To compare the reference image and each video frame more conveniently, both are first downscaled to a smaller resolution. Because the image is normalized while its pixels are reduced, this lowers the image's sensitivity to noise generated by the camera sensor, so using a smaller image resolution during the comparison improves comparison quality. The amount of downscaling is set according to the actual situation; for example, the images can be reduced to 1/2 of the original size. The opaque mask obtained this way also has a lower resolution, and its edges are smoother.
After the resolution is reduced, the reference image and each video frame are compared in multiple color spaces using comparison algorithms. A comparison algorithm computes, in each preset color space, the difference between each pixel value of the reference image and of the current video frame. The color spaces may be common ones such as RGB, HSV, YCbCr, LAB, and XYZ, or custom color spaces set by the user; a custom color space is a standard, industry-common color space whose parameters have been modified as needed. A comparison algorithm may compare each pair of pixels at the same position in the reference image and the video frame; the difference value may be computed as a length difference, a square-root difference, a product difference, and so on. After the computation, each pixel receives a result of 0 or 1, where 0 represents white and 1 represents black, so the comparison algorithm produces an image composed of black and white pixels, i.e., a gray-scale segmentation image, whose white region is the set of pixels that differ between the two images according to that algorithm. When different comparison algorithms produce different results, the resulting gray-scale segmentation images also differ. For example, comparison algorithm A may yield a human silhouette missing an arm, while comparison algorithm B may yield a silhouette missing a leg; by merging the multiple gray-scale segmentation images and taking the maximum difference, a final gray-scale segmentation image closer to the complete human figure is obtained, i.e., the first gray-scale segmentation image.
Therefore, to ensure the completeness and accuracy of the final gray-scale segmentation image, multiple comparison algorithms are applied in each color space. Suppose M color spaces are used in the difference comparison and each color space yields N gray-scale segmentation images; the union of the white regions of all M×N gray-scale segmentation images can then be taken to obtain one final gray-scale segmentation image. Which comparison algorithms are used can be set by technicians according to the situation and is not limited here.
To reduce the influence of noise generated by the camera sensor, i.e., to reduce noise in the real-time video, this application may add a noise reference value to the comparison, used to filter out the results of comparison algorithms that are more sensitive to noise. The noise reference value can serve as a criterion for selecting comparison algorithms; for example, if the noise reference value exceeds a certain preset value, the results of the corresponding comparison algorithms are discarded. In practice, the criterion for discarding algorithm results is chosen by technicians after testing. The noise difference between the reference image and the background snapshot image computed in step S101 can be used as the noise reference value.
After the final gray-scale segmentation image is obtained, the reduced-resolution final gray-scale segmentation image is scaled to match the resolution of the display screen. The moving object represented by the white region in the final gray-scale segmentation image can then be segmented out as the first opaque mask, which is used in subsequent steps.
This process can be understood with the example shown in Figure 3. First, the reference image and a given video frame are both downscaled. After the resolution is reduced, the reference image and the video frame can be compared in multiple color spaces, for example the LAB, YCbCr, and CMYK color spaces, with multiple comparison algorithms: for example, a first comparison algorithm in the LAB color space, a second comparison algorithm in the YCbCr color space, and a third comparison algorithm in the CMYK color space. Each color space yields a different gray-scale segmentation image, and the white regions of all gray-scale segmentation images are merged by taking their union to obtain one final gray-scale segmentation image.
The rendering module 403 is used to fuse each video frame with the image of the virtual object to generate the rendered frame corresponding to each video frame.
In one example, real objects in the real-time video can be modeled in 3D modeling software, and the resulting model can be saved in a format that can be imported into a shader. The shader may use different blending operations, such as Add and Sub, and different blending factors, such as SrcColor and One, to blend the 3D object with the real-object images in the real-time video and create multiple combined results. The shader used may be a Unity3D shader. For example, the model generated by the modeling step can be imported into a Unity3D project of an AR application, a custom 3D virtual object can be added, and a rendered video containing the 3D virtual object is generated. The 3D modeling software may be professional modeling software or software with modeling functions; software with the Unity3D plug-in can be chosen, since the plug-in supports adding virtual 3D objects and runs on terminal platforms such as Android and iOS. The generated model can be saved in .fbx or .obj format. In addition, the blended virtual object is not limited to 3D virtual objects; it may also be a 2D virtual object or the like.
When the virtual object is blended with each frame of the real-time video, lighting information and texture detail can be taken into account in the model, which makes the composite of the virtual object and the real world more realistic.
The rendered frame consists of the background image and the image of the virtual object, and the background image is consistent with the reference image. When the image of the virtual object overlaps an image area of the background image, the image of the virtual object occludes the background image.
The synthesis module 404 is used to fuse each video frame with its corresponding opaque mask image and its corresponding rendered frame to generate the composite frame corresponding to each video frame.
Specifically, each video frame is fused with its corresponding opaque mask image to generate the occlusion frame corresponding to that video frame: the opaque mask of each frame is applied to the corresponding real-time video frame to capture the image of the moving object in that frame. In each occlusion frame, the image area corresponding to the moving object is opaque, and the image area outside the moving object is transparent.
The occlusion frame corresponding to each video frame is then fused with the rendered frame corresponding to that video frame to generate the composite frame corresponding to each video frame.
In each composite frame, when the image of the virtual object overlaps the image area of the moving object, the image of the virtual object is occluded by the image of the moving object.
The synthesis module 404 is also used to concatenate the composite frames corresponding to the video frames to generate the composite video.
Finally, in the composite frames and the composite video, the image of the virtual object added to the real-time video can be seen to be occluded by the image of the moving object.
The following describes a hardware architecture of a terminal device 500 according to an embodiment of this application. The terminal device may be a mobile phone, tablet computer, notebook computer, or other terminal device with photo and video functions; its graphics processor (GPU) can support the High Level Shader Language (HLSL) or the OpenGL Shading Language (GLSL), and the terminal device may also have a Simultaneous Localization and Mapping (SLAM) system. The terminal device may be configured with a relatively large screen (for example, 5 inches or larger) so that the user can conveniently view the shooting result. The terminal device carries one or more cameras, such as a 2D camera or a 3D camera, which is not limited here.
Figure 5 is a structural block diagram of one implementation of the terminal device 500. As shown in Figure 5, the terminal device 500 may include a baseband chip 510, a memory 515 (one or more computer-readable storage media), a radio frequency (RF) module 516, and a peripheral system 517. These components may communicate over one or more communication buses 514.
The peripheral system 517 is mainly used to implement interaction between the terminal 500 and the user or the external environment, and mainly includes the input/output devices of the terminal 500. In a specific implementation, the peripheral system 517 may include a touch-screen controller 518, a camera controller 519, an audio controller 520, and a sensor management module 521, each of which can be coupled to its corresponding peripheral device (such as the touch screen 523, the camera 524, the audio circuit 525, and the sensors 526). The touch screen 523, also called a touch panel, can collect the user's touch operations on or near it (for example, operations performed on or near the touch screen with a finger, a stylus, or any other suitable object or accessory) and drive the corresponding connected apparatus according to a preset program. Optionally, the touch screen may include two parts: a touch detection apparatus and a touch controller. The touch detection apparatus detects the position of the user's touch and the signal produced by the touch operation, and passes the signal to the touch controller; the touch controller receives the touch information from the touch detection apparatus, converts it into contact coordinates, sends them to the processor 511, and can receive and execute commands sent by the processor 511. The touch screen may be implemented with various technologies such as resistive, capacitive, infrared, and surface acoustic wave. The peripheral system 517 may also include a display panel, which may optionally be configured as an LCD (Liquid Crystal Display), an OLED (Organic Light-Emitting Diode) display, or the like. Further, the touch screen may cover the display panel; when the touch screen detects a touch operation on or near it, the operation is passed to the processor 511 to determine the type of touch event, and the processor 511 then provides corresponding visual output on the display panel according to the type of touch event. In some embodiments, the touch screen and the display panel can be integrated to implement the input and output functions of the terminal device 500. In addition, in some embodiments, the camera 524 may be a 2D camera or a 3D camera. It should be noted that the peripheral system 517 may also include other I/O peripherals, which is not limited here.
The baseband chip 510 may integrate one or more processors 511, a clock module 512, and a power management module 513. The clock module 512 integrated in the baseband chip 510 is mainly used to generate the clocks required by the processor 511 for data transmission and timing control. The power management module 513 integrated in the baseband chip 510 mainly manages charging, discharging, and power distribution to provide a stable, high-precision supply voltage for the processor 511, the radio frequency module 516, and the peripheral system. The processor 511 in this embodiment of the application may include at least one of the following types: a general-purpose central processing unit (CPU), a graphics processing unit (GPU), a digital signal processor (DSP), a microprocessor, an application-specific integrated circuit (ASIC), a microcontroller unit (MCU), a field-programmable gate array (FPGA), or another integrated circuit for implementing logical operations. For example, the processor 511 may be a single-core (single-CPU) processor or a multi-core (multi-CPU) processor, and the at least one processor 511 may be integrated in one chip or located on multiple different chips.
The radio frequency (RF) module 516 is used to receive and transmit radio frequency signals and mainly integrates the receiver and transmitter of the terminal 500. The RF module 516 communicates with communication networks and other communication devices through radio frequency signals. In a specific implementation, the RF module 516 may include, but is not limited to, an antenna system, an RF transceiver, one or more amplifiers, a tuner, one or more oscillators, a digital signal processor, a CODEC chip, a SIM card 5161, and a storage medium. In some embodiments, the RF module 516 may be implemented on a separate chip. In addition, the RF module 516 may also communicate with networks and other devices through wireless communication, such as Wi-Fi 5162. The wireless communication may use any communication standard or protocol, including but not limited to GSM (Global System for Mobile Communications), GPRS (General Packet Radio Service), CDMA (Code Division Multiple Access), WCDMA (Wideband Code Division Multiple Access), LTE (Long Term Evolution), e-mail, SMS (Short Messaging Service), short-range communication technologies, and so on.
The memory 515 is coupled to the processor 511 and is used to store various software programs and/or multiple sets of instructions. In a specific implementation, the memory 515 may include high-speed random access memory and may also include non-volatile memory, such as one or more magnetic disk storage devices, flash memory devices, or other non-volatile solid-state storage devices. The memory 515 may store an operating system (hereinafter referred to as the system), for example an embedded operating system such as Android or iOS. The memory 515 may also store a network communication program that can be used to communicate with one or more additional devices, one or more terminal devices, and one or more network devices. The memory 515 may further store a user interface program that can vividly display the content of an application through a graphical operation interface and receive the user's control operations on the application through input controls such as menus, dialog boxes, and buttons.
The memory 515 may also store one or more application programs. As shown in Figure 5, these applications may include social applications (such as Facebook), image management applications (such as a photo album), map applications (such as Google Maps), browsers (such as Safari or Google Chrome), and so on.
It should be understood that the terminal device 500 is only an example provided by the embodiments of the present invention; the terminal device 500 may have more or fewer components than shown, may combine two or more components, or may be implemented with a different configuration of components.
By implementing the method embodiments of the present invention, when the moving-object image and the virtual-object image overlap in a video or photo, the moving-object image can occlude the virtual-object image in real time, changing the occlusion relationship between objects, so that the user sees an augmented-reality visual effect and enjoys a better visual experience. The method requires less computation, places a lighter load on the processor, and is easy to operate and implement. In addition, the method provides automatic calibration and augmented-reality support.
Those skilled in the art should understand that the embodiments of the present invention may be provided as a method, a system, or a computer program product. Therefore, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including, but not limited to, disk storage and optical storage) containing computer-usable program code.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps are performed on the computer or other programmable device to produce computer-implemented processing, such that the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
The specific embodiments described above further explain the objectives, technical solutions, and beneficial effects of the present invention in detail. It should be understood that the above are only specific embodiments of the present invention and are not intended to limit its scope of protection; any modification, equivalent replacement, improvement, or the like made on the basis of the technical solution of the present invention shall fall within its scope of protection.

Claims (12)

  1. A video processing method, characterized in that the method comprises:
    taking a first photo;
    shooting a video and detecting that the video contains an image of a moving object, wherein a video frame in the video consists of a background image and the image of the moving object, and the background image is consistent with the image of the first photo;
    comparing each video frame in the video with the first photo and generating an opaque mask image corresponding to each video frame, wherein in each opaque mask image the image area corresponding to the moving object is an opaque image and the image area outside the moving object is a transparent image;
    fusing each video frame with the opaque mask image corresponding to that video frame and an image of a virtual object to generate a composite frame corresponding to each video frame; and
    concatenating the composite frames corresponding to the video frames to generate a composite video.
  2. The method according to claim 1, characterized in that, in each composite frame, when the image of the virtual object overlaps the image of the moving object in an image area, the image of the virtual object is occluded by the image of the moving object.
  3. The method according to claim 1 or 2, characterized in that comparing each video frame in the video with the first photo and generating the opaque mask image corresponding to each video frame specifically comprises:
    calculating difference values between color spaces of the first photo and a first video frame, wherein each difference value is a length difference, a square-root difference, or a product difference, and a plurality of gray-scale segmentation images are generated after the calculation;
    merging the plurality of gray-scale segmentation images to obtain a first gray-scale segmentation image;
    segmenting the image of the moving object out of the first gray-scale segmentation image as a first opaque mask image; and
    comparing each video frame in the video with the first photo and generating the opaque mask image corresponding to each video frame.
  4. The method according to claim 1 or 2, characterized in that the first photo cannot contain the image of the moving object.
  5. The method according to claim 1 or 2, characterized in that fusing each video frame with the opaque mask image corresponding to that video frame and the image of the virtual object to generate the composite frame corresponding to each video frame specifically comprises:
    fusing each video frame with the opaque mask image corresponding to that video frame to generate an occlusion frame corresponding to each video frame;
    fusing each video frame with the image of the virtual object to generate a rendered frame corresponding to each video frame; and
    fusing the occlusion frame corresponding to each video frame with the rendered frame corresponding to that video frame to generate the composite frame corresponding to each video frame.
  6. The method according to claim 5, characterized in that the rendered frame consists of the background image and the image of the virtual object, the background image being consistent with the image of the first photo; when the image of the virtual object overlaps an image area of the background image, the image of the virtual object occludes the background image.
  7. A video processing device, characterized in that the device comprises a shooting module, a segmentation module, a rendering module, and a synthesis module, wherein:
    the shooting module is configured to take a first photo; the shooting module is further configured to shoot a video and detect that the video contains an image of a moving object, wherein a video frame in the video consists of a background image and the image of the moving object, and the background image is consistent with the image of the first photo;
    the segmentation module is configured to compare each video frame in the video with the first photo and generate an opaque mask image corresponding to each video frame, wherein in each opaque mask image the image area corresponding to the moving object is an opaque image and the image area outside the moving object is a transparent image;
    the rendering module is configured to fuse each video frame with an image of a virtual object to generate a rendered frame corresponding to each video frame;
    the synthesis module is configured to fuse each video frame with the opaque mask image corresponding to that video frame and the rendered frame corresponding to that video frame to generate a composite frame corresponding to each video frame; and
    the synthesis module is further configured to concatenate the composite frames corresponding to the video frames to generate a composite video.
  8. The device according to claim 7, characterized in that, in each composite frame, when the image of the virtual object overlaps the image of the moving object in an image area, the image of the virtual object is occluded by the image of the moving object.
  9. The device according to claim 7 or 8, characterized in that the segmentation module is specifically configured to:
    calculate difference values between color spaces of the first photo and a first video frame, wherein each difference value is a length difference, a square-root difference, or a product difference, and a plurality of gray-scale segmentation images are generated after the calculation;
    merge the plurality of gray-scale segmentation images to obtain a first gray-scale segmentation image;
    segment the image of the moving object out of the first gray-scale segmentation image as a first opaque mask image; and
    compare each video frame in the video with the first photo and generate the opaque mask image corresponding to each video frame.
  10. The device according to claim 7 or 8, characterized in that the first photo cannot contain the image of the moving object.
  11. The device according to claim 7 or 8, characterized in that the synthesis module is specifically configured to:
    fuse each video frame with the opaque mask image corresponding to that video frame to generate an occlusion frame corresponding to each video frame;
    fuse the occlusion frame corresponding to each video frame with the rendered frame corresponding to that video frame to generate the composite frame corresponding to each video frame; and
    concatenate the composite frames corresponding to the video frames to generate the composite video.
  12. The device according to claim 7 or 8, characterized in that the rendered frame consists of the background image and the image of the virtual object, the background image being consistent with the image of the first photo; when the image of the virtual object overlaps an image area of the background image, the image of the virtual object occludes the background image.
PCT/CN2020/080221 2020-03-19 2020-03-19 Video processing method and device WO2021184303A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/080221 WO2021184303A1 (en) 2020-03-19 2020-03-19 Video processing method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/080221 WO2021184303A1 (en) 2020-03-19 2020-03-19 Video processing method and device

Publications (1)

Publication Number Publication Date
WO2021184303A1 true WO2021184303A1 (en) 2021-09-23

Family

ID=77768452

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/080221 WO2021184303A1 (en) 2020-03-19 2020-03-19 Video processing method and device

Country Status (1)

Country Link
WO (1) WO2021184303A1 (en)



Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100287511A1 (en) * 2007-09-25 2010-11-11 Metaio Gmbh Method and device for illustrating a virtual object in a real environment
CN106056663A (en) * 2016-05-19 2016-10-26 京东方科技集团股份有限公司 Rendering method for enhancing reality scene, processing module and reality enhancement glasses
CN106683161A (en) * 2016-12-13 2017-05-17 中国传媒大学 Augmented reality shielding method based on image segmentation and customized layer method
CN107909652A (en) * 2017-11-10 2018-04-13 上海电机学院 A kind of actual situation scene mutually blocks implementation method
CN108830940A (en) * 2018-06-19 2018-11-16 广东虚拟现实科技有限公司 Hiding relation processing method, device, terminal device and storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
LI, HONGBO ET AL.: "Virtual-Reality Occlusion Processing Method based on Dynamic Transformation Background Frame", COMPUTER ENGINEERING AND DESIGN, vol. 36, no. 1, 31 January 2015 (2015-01-31), pages 227 - 231, XP055851682 *
RAO, SHAOYAN: "A Study of Virtual Reality Occlusion in Augmented Reality", CHINESE MASTER’S THESES FULL-TEXT DATABASE, 15 June 2018 (2018-06-15), pages 1 - 71, XP055851699 *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117710234A (en) * 2024-02-06 2024-03-15 青岛海尔科技有限公司 Picture generation method, device, equipment and medium based on large model


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20926304

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 17.02.2023)

122 Ep: pct application non-entry in european phase

Ref document number: 20926304

Country of ref document: EP

Kind code of ref document: A1