CN114449251B - Video perspective method, device, system, electronic equipment and storage medium - Google Patents

Video perspective method, device, system, electronic equipment and storage medium

Info

Publication number
CN114449251B
Authority
CN
China
Prior art keywords
image
virtual
real
virtual image
synthesized
Prior art date
Legal status
Active
Application number
CN202011198831.6A
Other languages
Chinese (zh)
Other versions
CN114449251A (en)
Inventor
梁天鹰
赖武军
Current Assignee
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd
Priority to CN202011198831.6A
Priority to PCT/CN2021/119608 (WO2022089100A1)
Publication of CN114449251A
Application granted
Publication of CN114449251B

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 13/00 Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N 13/20 Image signal generators
    • H04N 13/293 Generating mixed stereoscopic images; Generating mixed monoscopic and stereoscopic images, e.g. a stereoscopic image overlay window on a monoscopic image background
    • H04N 13/30 Image reproducers
    • H04N 13/332 Displays for viewing with the aid of special glasses or head-mounted displays [HMD]
    • H04N 13/344 Displays for viewing with the aid of special glasses or head-mounted displays [HMD] with head-mounted left-right displays
    • H04N 13/366 Image reproducers using viewer tracking

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The application provides a video perspective method, device, system, electronic equipment and storage medium, and relates to the field of electronic equipment. The method includes: acquiring, in parallel, a real image corresponding to a real-world scene and a virtual image containing a virtual object; determining a first image according to the acquisition results of the real image and the virtual image, where the first image is the real image or a synthesized image of the real image and the virtual image; and displaying the first image. Because the real image corresponding to the real-world scene and the virtual image containing the virtual object are acquired in parallel, the real image and the virtual object can be rendered separately and then synthesized, which reduces the overall delay from acquiring the real image to displaying the synthesized image in the video perspective process.

Description

Video perspective method, device, system, electronic equipment and storage medium
Technical Field
The embodiment of the application relates to the field of electronic equipment, in particular to a video perspective method, a device, a system, electronic equipment and a storage medium.
Background
Video perspective technology refers to: capturing a real image of the real world with a camera (also called a camera module), generating a virtual image based on the real image, and then synthesizing the virtual image with the real image and displaying the result. For example, video perspective technology can be applied to a virtual reality helmet so that the helmet also provides augmented reality (AR) functionality.
Currently, when a user uses a video perspective device (such as a head-mounted device), there is a "time dislocation" between the user and reality: the synthesized image seen by the user's eyes lags the real scene by a certain delay. The larger this delay, the more pronounced the time-dislocation phenomenon. For example, when a user wearing a video see-through headset reaches for an object, the brain may already perceive that the hand has touched the object, but the eyes only see it after a certain delay.
Disclosure of Invention
Embodiments of the application provide a video perspective method, device, system, electronic equipment and storage medium, which can reduce the overall delay from acquiring a real image to displaying a synthesized image in the video perspective process by rendering the real image and the virtual object separately and then synthesizing them.
In a first aspect, an embodiment of the present application provides a video perspective method, including: obtaining a real image corresponding to a real world scene and a virtual image containing a virtual object in parallel; determining a first image according to the acquisition results of the real image and the virtual image, wherein the first image is the real image or a synthesized image of the real image and the virtual image; the first image is displayed.
According to this method, because the real image corresponding to the real-world scene and the virtual image containing the virtual object are acquired in parallel, the real image and the virtual object can be rendered separately and then synthesized, which reduces the overall delay from acquiring the real image to displaying the synthesized image in the video perspective process.
In one possible design, determining the first image according to the acquisition results of the real image and the virtual image includes: for each frame of real image, if the virtual image has not yet been acquired when the real image is acquired, waiting until the virtual image is acquired and then synthesizing the real image with the virtual image to obtain a synthesized image as the first image.
It can be understood that if the virtual image has already been acquired when the real image is acquired, the real image and the virtual image are synthesized directly to obtain a synthesized image as the first image.
In another possible design, determining the first image according to the acquisition results of the real image and the virtual image includes: for a real image acquired before the virtual image is acquired, directly determining the real image as the first image; and, for a real image acquired after the virtual image is acquired, synthesizing the real image with the virtual image to obtain a synthesized image as the first image.
In this design, a real image acquired before the virtual image is acquired is directly determined to be the first image and can be displayed directly, which reduces the time during which the display shows no picture or a blank picture when the video perspective system is first started.
Optionally, synthesizing the real image and the virtual image to obtain a synthesized image includes: adjusting the real image and the virtual image to a first size; marking effective pixels in the virtual image as 1 and ineffective pixels as 0 to obtain a mask image corresponding to the virtual image, where the effective pixels are the pixels occupied by the virtual object in the virtual image and the ineffective pixels are the pixels in the virtual image other than the effective pixels; and synthesizing the real image and the virtual image according to the mask image to obtain the synthesized image.
In a second aspect, embodiments of the present application provide a video perspective device, which may be used to implement the method described in the first aspect. The functions of the device can be realized by hardware, and can also be realized by executing corresponding software by hardware. The hardware or software includes one or more modules or units corresponding to the above functions, for example, an acquisition unit, a synthesis unit, a display unit, and the like.
The acquisition unit is used for acquiring real images corresponding to the real world scene and virtual images containing the virtual objects in parallel; the synthesizing unit is used for determining a first image according to the acquisition results of the real image and the virtual image, wherein the first image is the real image or the synthesized image of the real image and the virtual image; and a display unit for displaying the first image.
In one possible design, the synthesizing unit is specifically configured to, for each frame of real image, if the virtual image has not yet been acquired when the real image is acquired, synthesize the real image with the virtual image after the virtual image is acquired, to obtain the synthesized image as the first image. If the virtual image has already been acquired when the real image is acquired, the real image and the virtual image are synthesized directly to obtain the synthesized image as the first image.
In another possible design, the synthesizing unit is specifically configured to: for a real image acquired before the virtual image is acquired, directly determine the real image as the first image; and, for a real image acquired after the virtual image is acquired, synthesize the real image with the virtual image to obtain a synthesized image as the first image.
Optionally, the synthesizing unit is specifically configured to adjust the real image and the virtual image to the first size; identifying an effective pixel in the virtual image as 1, and identifying an ineffective pixel as 0 to obtain a mask image corresponding to the virtual image; the effective pixels are pixels occupied by the virtual object in the virtual image, and the ineffective pixels are pixels except the effective pixels in the virtual image; and synthesizing the real image and the virtual image according to the mask image to obtain a synthesized image.
For example, the first size may be 848×480, 300×150, etc., and may be adjusted according to the display requirements, the virtual image, the real image, etc.; this is not limited in this application.
In a third aspect, embodiments of the present application provide a video perspective system, including: the device comprises a camera module, a central processing unit, a graphic processor, an image synthesis chip and a display; the camera module is used for capturing a real image corresponding to a real world scene and directly transmitting the real image to the image synthesis chip; the CPU and the graphic processor are used for generating a virtual image containing a virtual object and sending the virtual image to the image synthesis chip; the image synthesis chip is used for obtaining real images and virtual images in parallel; determining a first image according to the acquisition results of the real image and the virtual image, and sending the first image to a display, wherein the first image is the real image or a synthesized image of the real image and the virtual image; the display is used for displaying the first image.
In this video perspective system, the algorithm for synthesizing the virtual image and the real image is implemented in hardware ("hardened") in the image synthesis chip, which reduces the computation delay when the virtual image and the real image are synthesized.
In a fourth aspect, embodiments of the present application provide an electronic device, which may be a video perspective device, such as: video see-through head mounted devices, video see-through glasses, and the like. The electronic device includes: a processor, a memory for storing processor-executable instructions; the processor is configured to, when executing the instructions, cause the electronic device to implement the method as described in the first aspect.
In a fifth aspect, embodiments of the present application provide a computer-readable storage medium having computer program instructions stored thereon; the computer program instructions, when executed by an electronic device, cause the electronic device to implement the method as described in the first aspect.
The advantages of the second to fifth aspects are described with reference to the first aspect, and are not described herein.
In a sixth aspect, embodiments of the present application provide a computer program product comprising computer readable code which, when run in an electronic device, causes the electronic device to implement the method of the first aspect.
It should be appreciated that the description of technical features, aspects, benefits or similar language in this application does not imply that all of the features and advantages may be realized with any single embodiment. Conversely, it should be understood that the description of features or advantages is intended to include, in at least one embodiment, the particular features, aspects, or advantages. Therefore, the description of technical features, technical solutions or advantageous effects in this specification does not necessarily refer to the same embodiment. Furthermore, the technical features, technical solutions and advantageous effects described in the present embodiment may also be combined in any appropriate manner. Those of skill in the art will appreciate that an embodiment may be implemented without one or more particular features, aspects, or benefits of a particular embodiment. In other embodiments, additional features and advantages may be recognized in certain embodiments that may not be present in all embodiments.
Drawings
FIG. 1 shows a schematic diagram of a video perspective process;
FIG. 2 shows a schematic structural diagram of a video perspective system according to an embodiment of the present application;
FIG. 3 shows a schematic diagram of a virtual image provided by an embodiment of the present application;
FIG. 4 shows a schematic diagram of a real image provided by an embodiment of the present application;
FIG. 5 shows a schematic diagram of a composite image provided by an embodiment of the present application;
FIG. 6 shows a schematic structural diagram of a video perspective device according to an embodiment of the present application;
FIG. 7 shows another schematic structural diagram of a video perspective system provided by an embodiment of the present application.
Detailed Description
With the development of virtual reality (VR) devices, video perspective (video see-through) technology based on a camera module has gradually become a mainstream technology, with wide application scenarios such as viewing the external world, electronic fences, and MR applications. For example, video perspective technology can be applied to a virtual reality helmet so that the helmet also provides augmented reality (AR) functionality. Video perspective technology refers to: capturing a real image of the real world with a camera (also called a camera module), generating a virtual image based on the real image, and then synthesizing the virtual image with the real image and displaying the result.
Illustratively, fig. 1 shows a schematic diagram of a video perspective. As shown in fig. 1, in the video perspective technology at present, a camera module may capture a real world scene, obtain a real image (or video stream) corresponding to the real world scene, and transmit the real image (or video stream) to an intermediate processing module.
The intermediate processing module may include: a simultaneous localization and mapping (SLAM) module, a plane detection module, a virtual object generation module, and a virtual reality synthesis module.
The SLAM module can perform localization based on the camera module and other sensors while mapping the structure of the environment from the real images. The other sensors may include gyroscopes, accelerometers, infrared sensors, etc.; for example, the SLAM module can obtain pose information such as rotation and translation acquired by a gyroscope and use it to map the environment structure.
The plane detection module can detect which regions of the real image are planes, such as tabletops or the ground. The virtual object generation module can combine the processing results of the SLAM module and the plane detection module to generate a virtual object, obtain a virtual image containing the virtual object, and transmit the virtual image to the virtual reality synthesis module. The virtual reality synthesis module can synthesize the virtual image output by the virtual object generation module with the real image captured by the camera module to obtain a synthesized image and transmit the synthesized image to the display module.
The display module may display the synthesized image, for example presenting it in front of the human eye through a display.
With continued reference to fig. 1, assume that in the video perspective process shown in fig. 1, the step in which the camera module obtains the real image takes t_0, the step in which the camera module transmits the real image to the intermediate processing module takes t_1, the processing step of the SLAM module takes t_2, the processing step of the plane detection module takes t_3, the processing step of the virtual object generation module takes t_4, the processing step in which the virtual reality synthesis module obtains the synthesized image takes t_5, the step in which the virtual reality synthesis module transmits the synthesized image to the display module takes t_6, and the step in which the display module displays the synthesized image takes t_7. Then the overall delay T_total of the video perspective process is the sum of t_0 through t_7. That is, the overall delay T_total can be expressed as:
T_total = t_0 + t_1 + t_2 + t_3 + t_4 + t_5 + t_6 + t_7
In a scenario where the intermediate processing module includes more sub-modules, the overall delay of the video perspective process increases further, to a value greater than T_total.
In view of this video perspective principle, when a user uses a video perspective device (such as a head-mounted device), a "time dislocation" exists between the user and reality: the synthesized image seen by the user's eyes lags the real scene by the above-mentioned delay T_total. The larger this delay, the more pronounced the time-dislocation phenomenon. For example, if a user wearing the video see-through headset reaches for an object, the brain may already perceive that the hand has touched the object, but the eyes only see it after a delay of T_total.
Against this background, embodiments of the application provide a video perspective system that can reduce the overall delay from acquiring a real image to displaying a synthesized image in the video perspective process by rendering the real image and the virtual object separately and then synthesizing them. Embodiments of the application are described below by way of example with reference to the accompanying drawings.
Fig. 2 shows a schematic structural diagram of a video perspective system according to an embodiment of the present application. As shown in fig. 2, the video perspective system may include: camera module, intermediate processing module, display module, and other sensors.
The specific explanation of the camera module, the intermediate processing module, the display module, and the other sensors may be described with reference to the foregoing embodiments. For example, the camera module may capture a real world scene, obtain a real image corresponding to the real world scene, and transmit the real image to the intermediate processing module. The intermediate processing module may include: the SLAM module, the plane detection module, the virtual object generation module and the virtual reality synthesis module, and each module in the intermediate processing module can realize the same functions as the previous embodiment.
Optionally, the camera module, also referred to as a camera imaging module, may specifically include a lens, an optical filter, an image sensor, an image signal processor (ISP), etc., which are not described in detail here.
Alternatively, in the intermediate processing module shown in fig. 2, the SLAM module and the plane detection module may be implemented on a central processing unit (central processing unit, CPU). The virtual object generation module may be implemented on a graphics processor (graphics processing unit, GPU). The virtual reality composition module may be a single chip for implementing the composition function of the virtual image and the real image. The application is not limited herein.
The display module may be a display, and may display the composite image. Such as: the display module may be a display on a video see-through headset.
In addition, the modules shown in fig. 2 may be integrated on a device, for example: video see-through head mounted devices. Or, the system can be respectively deployed on a plurality of devices to form a video perspective system. For example, the camera module may be a webcam connected to the Internet, some separate image capturing device (e.g., a camera), etc. The camera module can be connected with a personal computer (personal computer, PC) or a mobile phone, and the acquired real images can be sent to the PC or the mobile phone. The CPU and GPU in the PC or the mobile phone are used as algorithm processing equipment to realize the functions of the intermediate processing module, and the display screen of the PC or the mobile phone is used as a display screen to realize the functions of the display module. The present application is also not limited in this regard.
Based on the video perspective system shown in fig. 2, in the embodiment of the application, after the camera module obtains a real image, it sends the real image directly to the virtual reality synthesis module. The SLAM module, the plane detection module and the virtual object generation module process in sequence to obtain a virtual image, and the virtual object generation module sends the virtual image to the virtual reality synthesis module. The virtual reality synthesis module can synthesize the received virtual image with the received real image to obtain a synthesized image, and transmit the synthesized image to the display module for display.
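As a rough illustration of this parallel data flow, the following Python sketch models the two paths with simple queues; the module objects (camera, slam, plane_detector, generator) and their methods are placeholders assumed for this example and are not interfaces defined by the application.

```python
import queue

real_q = queue.Queue()     # camera module -> virtual reality synthesis module (direct path)
slam_q = queue.Queue()     # camera module -> SLAM / plane detection / virtual object generation
virtual_q = queue.Queue()  # virtual object generation module -> virtual reality synthesis module

def camera_path(camera):
    """Each captured real image goes straight to the synthesis module and, in
    parallel, to the intermediate processing path that produces the virtual image."""
    while True:
        frame = camera.capture()
        real_q.put(frame)
        slam_q.put(frame)

def virtual_path(slam, plane_detector, generator):
    """SLAM, plane detection and virtual object generation run alongside the direct
    camera-to-compositor path; their output reaches the synthesis module asynchronously."""
    while True:
        frame = slam_q.get()
        pose = slam.track(frame)                        # localization / mapping result
        planes = plane_detector.detect(frame)           # plane regions in the real image
        virtual_q.put(generator.render(pose, planes))   # virtual image containing the virtual object
```

The virtual reality synthesis module would then consume real_q and virtual_q; its per-frame decision logic is sketched further below.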
In the video perspective system shown in fig. 2, the step of acquiring the real image by the camera module, the step of processing the SLAM module, the step of processing the plane detection module, the step of processing the virtual object generation module, and the step of displaying the composite image by the display module are the same as the video perspective process shown in fig. 1. Based on this, please continue to refer to fig. 2, it is assumed that the time consumed in the step of capturing the real image by the camera module is also t 0 The processing steps of the SLAM module are time-consuming t 2 Plane detection dieThe processing step of the block is time-consuming t 3 The processing steps of the virtual object generation module are time-consuming t 4 The time consumption of the step of displaying the composite image by the display module is t 7 . In addition, it is assumed that the time consumed in the step of transmitting the real image to the virtual reality synthesizing module by the camera module is t 1_new The time consumption of the processing step of the virtual reality synthesis module for obtaining the synthesized image is t 5_new The time consumption of the step of transmitting the synthesized image to the display module by the virtual reality synthesis module is t 6_new . In the embodiment of the application, the overall time delay T of the video perspective process _total The following are provided:
T _total =t 0 +t 1_new +t 5_new +t 6_new +t 7
Compared with the existing video perspective process shown in fig. 1, in the video perspective system provided in the embodiment of the application, the processing steps of the SLAM module, the plane detection module and the virtual object generation module in the intermediate processing module are executed in parallel with the step in which the camera module transmits the real image to the virtual reality synthesis module and the processing step in which the virtual reality synthesis module obtains the synthesized image. That is, the virtual image and the real image are rendered separately and then synthesized, and the real image to be synthesized by the virtual reality synthesis module does not pass through intermediate processing modules such as the SLAM module, the plane detection module and the virtual object generation module. Thus, the overall delay T'_total of the video perspective process in the embodiment of the application is much smaller than the overall delay T_total of the existing video perspective process. Therefore, the embodiment of the application can effectively reduce the overall system delay of the video perspective system and greatly reduce the negative effects of the time-dislocation phenomenon between the user and reality.
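For intuition, the following worked example (in LaTeX) compares the two delay expressions using purely illustrative per-step times in milliseconds; the numeric values are assumptions made only for this example and are not measurements from the embodiments.

```latex
% Illustrative comparison of the two pipelines (all per-step times are assumed values, in ms)
\begin{align*}
T_{\mathrm{total}}  &= t_0 + t_1 + t_2 + t_3 + t_4 + t_5 + t_6 + t_7 = 5 + 2 + 15 + 5 + 10 + 3 + 2 + 8 = 50~\text{ms},\\
T'_{\mathrm{total}} &= t_0 + t_{1,\text{new}} + t_{5,\text{new}} + t_{6,\text{new}} + t_7 = 5 + 2 + 3 + 2 + 8 = 20~\text{ms}.
\end{align*}
% With these assumed values, the SLAM, plane detection and virtual object generation steps
% (t_2 + t_3 + t_4 = 30 ms here) no longer lie on the camera-to-display path, which is
% exactly the saving the parallel arrangement provides.
```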
The following is a brief description of the processing procedure of the virtual reality synthesizing module in the embodiment of the present application.
In a practical implementation, the processing of the virtual reality synthesis module includes the following two scenarios.
Scenario 1: the virtual reality synthesis module has received both the real image from the camera module and the virtual image from the virtual object generation module.
Scenario 2: the virtual reality synthesis module has received the real image from the camera module but has not yet received the virtual image from the virtual object generation module.
In one possible design, for scenario 1 above, the virtual reality synthesis module may synthesize the real image and the virtual image and then send the synthesized image to the display module for display. For scenario 2 above, the virtual reality synthesis module may wait until a virtual image is received and then perform operations similar to scenario 1.
For example, suppose a virtual reality helmet based on video perspective technology obtains, at a first moment (for example, the moment its power switch is turned on), the 1st frame of the real image of the real scene through the camera module and sends it to the virtual reality synthesis module; at this moment, the virtual object generation module has not yet generated a virtual image containing a virtual object. Upon receiving the 1st frame of the real image, the virtual reality synthesis module may check whether a virtual image has been received, and if not, it waits for the virtual image. If the virtual image is received when the kth frame of the real image arrives, the virtual reality synthesis module synthesizes the 1st through kth frames of the real image with the virtual image respectively, and sends the synthesized images to the display module for display in the order of the corresponding real images. Similarly, after receiving the (k+1)th, (k+2)th, (k+3)th and subsequent frames of the real image, it processes them in the same way as the kth frame. Here k is an integer greater than 1, and may be 2, 3, 5, 8, 10, etc.; this is not limited.
In another possible design, for scenario 1 above, the virtual reality synthesis module may synthesize the real image and the virtual image and then send the synthesized image to the display module for display. For scenario 2 above, the virtual reality synthesis module may send the real image directly to the display module for display.
For example, suppose a virtual reality helmet based on video perspective technology obtains, at a first moment (for example, the moment its power switch is turned on), the 1st frame of the real image of the real scene through the camera module and sends it to the virtual reality synthesis module; at this moment, the virtual object generation module has not yet generated a virtual image containing a virtual object. Upon receiving the 1st frame of the real image, the virtual reality synthesis module may check whether a virtual image has been received; since none has been received, it sends the 1st frame of the real image directly to the display module for display. Similarly, when the 2nd, 3rd, 4th and subsequent frames of the real image are received and no virtual image has yet arrived, they are processed in the same way as the 1st frame. If the virtual reality synthesis module determines that a virtual image has been received when the kth frame of the real image arrives, it synthesizes the kth frame of the real image with the virtual image and sends the synthesized image to the display module for display. Similarly, after receiving the (k+1)th, (k+2)th, (k+3)th and subsequent frames of the real image, it processes them in the same way as the kth frame. Here k is an integer greater than 1, and may be 2, 3, 5, 8, 10, etc.; this is not limited.
In this example, the way the virtual reality synthesis module processes the 1st through (k-1)th frames of the real image corresponds to scenario 2 above, and the way it processes each frame of the real image from the kth frame onward corresponds to scenario 1 above.
In this design, when the virtual reality synthesis module receives a real image from the camera module but has not yet received a virtual image from the virtual object generation module, it sends the real image directly to the display module for display, which reduces the time during which the display module (or the display) shows no picture or a blank picture when the video perspective system is first started.
For example, with the former design, when the user uses the video see-through headset, the display shows no picture (or a blank picture) for an initial period just after it is turned on, namely the interval from the 1st frame of the real image to the kth frame of the real image described in the example of the former design. With the present design, when the user uses the video see-through headset, the display can immediately show the real scene captured in real time as soon as it is turned on, with no blank period, which improves the user experience.
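Continuing the sketch above, a minimal Python version of this per-frame decision (the second design) could look as follows; the queue and function names are assumptions for the example, and composite() is only sketched after the mask-synthesis steps described below.

```python
def compositor_loop(real_q, virtual_q, display):
    """Per-frame decision of the virtual reality synthesis module (second design):
    pass real frames through until a virtual image is available, then composite."""
    latest_virtual = None
    while True:
        real = real_q.get()                    # next real frame from the camera module
        while not virtual_q.empty():           # pick up the newest virtual frame, if any
            latest_virtual = virtual_q.get_nowait()
        if latest_virtual is None:
            # Scenario 2: no virtual image yet -> show the real image directly,
            # so the display is never blank right after start-up.
            display.show(real)
        else:
            # Scenario 1: a virtual image is available -> synthesize and show.
            display.show(composite(real, latest_virtual))
```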
The specific implementation principle of the virtual reality synthesizing module in the embodiment of the present application for synthesizing the virtual image and the real image is described below with reference to fig. 3 to 5. Wherein, fig. 3 shows a schematic diagram of a virtual image provided by an embodiment of the present application, fig. 4 shows a schematic diagram of a real image provided by an embodiment of the present application, and fig. 5 shows a schematic diagram of a composite image provided by an embodiment of the present application.
Please refer to fig. 3 to fig. 5: fig. 3 shows a virtual image containing a virtual object generated by the virtual object generation module, in which the unfilled annular blank area represents the virtual object; the area occupied by the virtual object in the virtual image consists of effective pixels, and the area filled with oblique lines consists of ineffective pixels. Fig. 4 shows a real image of a real scene obtained by the camera module. When the virtual reality synthesis module receives the virtual image shown in fig. 3 and the real image shown in fig. 4, it can remove the ineffective pixels from the virtual image shown in fig. 3 and synthesize the virtual image with the ineffective pixels removed (which at this point contains only the effective pixels of the area where the virtual object is located) with the real image shown in fig. 4 to obtain the synthesized image shown in fig. 5. The virtual reality synthesis module may then send the synthesized image shown in fig. 5 to the display module for display.
Alternatively, the step in which the virtual reality synthesis module synthesizes the virtual image with the real image may be described with reference to the following steps 1) to 3).
1) Adjust the virtual image and the real image to a first size, for example an image of M×N pixels, so that the virtual image and the real image are pixel-aligned.
Illustratively, M×N may be 848×480, 300×150, etc., and M and N may be adjusted according to the display requirements, the virtual image, the real image, etc.; this is not limited here.
2) For the M×N-pixel virtual image obtained in 1), mark the ineffective pixels of the virtual image as 0 and the effective pixels as 1, generating an M×N 1-bit mask image X.
3) Synthesize the real image with the virtual object according to the mask image X: for the pixel at row i, column j of the M×N pixels, if the mask is 1, use the pixel at row i, column j of the virtual image; otherwise, use the pixel at row i, column j of the real image. Here i is an integer greater than 0 and less than or equal to M, and j is an integer greater than 0 and less than or equal to N.
Following steps 1) to 3), the virtual reality synthesis module can synthesize the virtual image with the real image to obtain a synthesized image, and the synthesized image contains the virtual object.
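As an illustration, a minimal Python (numpy) sketch of steps 1) to 3) could look as follows. It assumes 8-bit RGB arrays, the 848×480 first size mentioned above, and that ineffective pixels of the virtual image carry a known background colour (pure black here); the background convention and helper names are assumptions made for this example, since the embodiments do not fix how effective pixels are detected.

```python
import numpy as np

def resize_nearest(img: np.ndarray, width: int, height: int) -> np.ndarray:
    """Very simple nearest-neighbour resize, sufficient for this sketch (step 1)."""
    rows = np.linspace(0, img.shape[0] - 1, height).round().astype(int)
    cols = np.linspace(0, img.shape[1] - 1, width).round().astype(int)
    return img[rows][:, cols]

def composite(real: np.ndarray, virtual: np.ndarray,
              first_size=(848, 480), background=(0, 0, 0)) -> np.ndarray:
    """Mask-based synthesis of a real image and a virtual image (steps 1 to 3).

    Both inputs are assumed to be 8-bit RGB arrays; ineffective pixels of the virtual
    image are assumed to carry the 'background' colour. A real renderer would more
    likely hand over an alpha channel or an explicit mask instead.
    """
    width, height = first_size
    # 1) Adjust both images to the first size so that they are pixel-aligned.
    real_r = resize_nearest(real, width, height)
    virtual_r = resize_nearest(virtual, width, height)

    # 2) Build the 1-bit mask image X: effective (virtual-object) pixels -> 1, ineffective -> 0.
    mask = np.any(virtual_r != np.asarray(background, dtype=virtual_r.dtype), axis=-1)

    # 3) Where the mask is 1 take the virtual pixel, otherwise take the real pixel.
    return np.where(mask[..., None], virtual_r, real_r)
```

In the embodiments, this per-pixel selection rule is intended to be hardened in the image synthesis chip; the Python form only illustrates the logic of the mask image X.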
Optionally, in some embodiments of the application, the algorithm implementing the functions of the virtual reality synthesis module (such as the mask-based synthesis algorithm above) may be hardened in a chip, i.e. implemented in hardware, and the chip used as the virtual reality synthesis module, so as to reduce the computation delay of the virtual reality synthesis module.
Optionally, in the embodiment of the application, the chip implementing the functions of the virtual reality synthesis module may send the image (real image or synthesized image) to the display module for display using a higher-bandwidth communication protocol such as the mobile industry processor interface camera serial interface (MIPI-CSI) protocol or the mobile industry processor interface display serial interface (MIPI-DSI) protocol, so as to reduce the transmission delay, thereby further reducing the overall system delay of the video perspective system and the negative effects of the time-dislocation phenomenon between the user and reality.
Based on the video perspective system provided in the foregoing embodiments, the embodiments of the application further provide a video perspective method, which may be applied to the video perspective system. For example, the execution subject of the method may be the virtual reality synthesis module in the video perspective system, or a chip with the functionality of the virtual reality synthesis module. The video perspective method includes: acquiring, in parallel, a real image corresponding to a real-world scene and a virtual image containing a virtual object; determining a first image according to the acquisition results of the real image and the virtual image, where the first image is the real image or a synthesized image of the real image and the virtual image; and displaying the first image.
For example, in one possible design, determining the first image according to the acquisition results of the real image and the virtual image includes: for each frame of real image, if the virtual image has not yet been acquired when the real image is acquired, waiting until the virtual image is acquired and then synthesizing the real image with the virtual image to obtain a synthesized image as the first image.
It can be understood that if the virtual image has already been acquired when the real image is acquired, the real image and the virtual image are synthesized directly to obtain a synthesized image as the first image.
For another example, in another possible design, determining the first image according to the acquisition results of the real image and the virtual image includes: for a real image acquired before the virtual image is acquired, directly determining the real image as the first image; and, for a real image acquired after the virtual image is acquired, synthesizing the real image with the virtual image to obtain a synthesized image as the first image.
The specific implementation of this method may be as described in the previous embodiments.
Alternatively, the SLAM module, the plane detection module and the like in the intermediate processing module described in the foregoing embodiments of the application may also be replaced by other modules whose algorithms can implement the same functions, for example deep learning algorithms; this is not limited here.
Corresponding to the method described in the foregoing embodiments, the embodiments of the present application further provide a video perspective device, which may be used to implement the foregoing video perspective method. The functions of the device can be realized by hardware, and can also be realized by executing corresponding software by hardware. The hardware or software includes one or more modules or units corresponding to the functions described above. For example, fig. 6 shows a schematic structural diagram of a video perspective device according to an embodiment of the present application. As shown in fig. 6, the video perspective device may include an acquisition unit 601, a synthesis unit 602, and a display unit 603.
The acquiring unit 601 is configured to acquire, in parallel, a real image corresponding to a real world scene and a virtual image including a virtual object; a synthesizing unit 602, configured to determine a first image according to the acquisition results of the real image and the virtual image, where the first image is the real image or a synthesized image of the real image and the virtual image; a display unit 603 for displaying the first image.
For example, the acquiring unit 601 may acquire a real image corresponding to a real world scene captured by the camera module, and acquire a virtual image generated by the virtual object generating module in parallel. The display unit 603 may send the first image to a display for display, or the display unit 603 itself may be a display or the like.
In one possible design, the synthesizing unit 602 is specifically configured to, for each frame of real image, if the virtual image has not yet been acquired when the real image is acquired, synthesize the real image with the virtual image after the virtual image is acquired, to obtain the synthesized image as the first image. If the virtual image has already been acquired when the real image is acquired, the real image and the virtual image are synthesized directly to obtain the synthesized image as the first image.
In another possible design, the synthesizing unit 602 is specifically configured to: for a real image acquired before the virtual image is acquired, directly determine the real image as the first image; and, for a real image acquired after the virtual image is acquired, synthesize the real image with the virtual image to obtain a synthesized image as the first image.
Optionally, the synthesizing unit 602 is specifically configured to adjust the real image and the virtual image to the first size; identifying an effective pixel in the virtual image as 1, and identifying an ineffective pixel as 0 to obtain a mask image corresponding to the virtual image; the effective pixels are pixels occupied by the virtual object in the virtual image, and the ineffective pixels are pixels except the effective pixels in the virtual image; and synthesizing the real image and the virtual image according to the mask image to obtain a synthesized image.
It should be understood that the division of units or modules (hereinafter referred to as units) in the above apparatus is merely a division of logic functions, and may be fully or partially integrated into one physical entity or may be physically separated. And the units in the device can be all realized in the form of software calls through the processing element; or can be realized in hardware; it is also possible that part of the units are implemented in the form of software, which is called by the processing element, and part of the units are implemented in the form of hardware.
For example, each unit may be a processing element that is set up separately, may be implemented as integrated in a certain chip of the apparatus, or may be stored in a memory in the form of a program, and the functions of the unit may be called and executed by a certain processing element of the apparatus. Furthermore, all or part of these units may be integrated together or may be implemented independently. The processing element described herein, which may also be referred to as a processor, may be an integrated circuit with signal processing capabilities. In implementation, each step of the above method or each unit above may be implemented by an integrated logic circuit of hardware in a processor element or in the form of software called by a processing element.
In one example, the units in the above apparatus may be one or more integrated circuits configured to implement the above method, for example: one or more application-specific integrated circuits (ASIC), or one or more digital signal processors (DSP), or one or more field-programmable gate arrays (FPGA), or a combination of at least two of these integrated circuit forms.
For another example, when the units in the apparatus may be implemented in the form of a scheduler of processing elements, the processing elements may be general-purpose processors, such as CPUs or other processors that may invoke programs. For another example, the units may be integrated together and implemented in the form of a system-on-a-chip (SOC).
In one implementation, the above means for implementing each corresponding step in the above method may be implemented in the form of a processing element scheduler. For example, the apparatus may comprise a processing element and a storage element, the processing element invoking a program stored in the storage element to perform the method described in the above method embodiments. The memory element may be a memory element on the same chip as the processing element, i.e. an on-chip memory element.
In another implementation, the program for performing the above method may be on a memory element on a different chip than the processing element, i.e. an off-chip memory element. At this point, the processing element invokes or loads a program from the off-chip storage element onto the on-chip storage element to invoke and execute the method described in the method embodiments above.
For example, embodiments of the present application may also provide an apparatus, such as: an electronic device may include: a processor, a memory for storing instructions executable by the processor. The processor is configured to execute the above-described instructions to cause the electronic device to implement the method as described in the previous embodiments. For example, the electronic device may be a video see-through headset as described in the previous embodiments. The memory may be located within the electronic device or may be located external to the electronic device. And the processor includes one or more.
In yet another implementation, the unit of the apparatus implementing each step in the above method may be configured as one or more processing elements, where the processing elements may be integrated circuits, for example: one or more ASICs, or one or more DSPs, or one or more FPGAs, or a combination of these types of integrated circuits. These integrated circuits may be integrated together to form a chip.
For example, the embodiment of the application also provides a chip, and the chip can be applied to the electronic equipment. The chip includes one or more interface circuits and one or more processors; the interface circuit and the processor are interconnected through a circuit; the processor receives and executes computer instructions from the memory of the electronic device through the interface circuit to implement the methods described in the method embodiments above.
Embodiments of the present application also provide a computer program product comprising computer readable code which, when run in an electronic device, causes the electronic device to implement the method described in the previous embodiments.
From the foregoing description of the embodiments, it will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-described division of functional modules is illustrated, and in practical application, the above-described functional allocation may be implemented by different functional modules according to needs, i.e. the internal structure of the apparatus is divided into different functional modules to implement all or part of the functions described above.
In the several embodiments provided in this application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of the modules or units is merely a logical functional division, and there may be additional divisions when actually implemented, e.g., multiple units or components may be combined or integrated into another apparatus, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and the parts displayed as units may be one physical unit or a plurality of physical units, may be located in one place, or may be distributed in a plurality of different places. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in each embodiment of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a readable storage medium. Based on this understanding, the technical solutions of the embodiments of the application, in essence, or the part contributing to the prior art, or all or part of the technical solutions, may be embodied in the form of a software product, such as a program. The software product is stored in a program product, such as a computer-readable storage medium, and includes instructions for causing a device (which may be a single-chip microcomputer, a chip, or the like) or a processor to perform all or part of the steps of the methods described in the various embodiments of the application. The aforementioned storage medium includes: a USB disk, a removable hard disk, a ROM, a RAM, a magnetic disk, an optical disk, etc.
For example, embodiments of the present application may also provide a computer-readable storage medium having computer program instructions stored thereon. The computer program instructions, when executed by an electronic device, cause the electronic device to carry out the method as described in the foregoing method embodiments.
Optionally, the embodiment of the application further provides a video perspective system. Fig. 7 shows another schematic structural diagram of a video perspective system provided in an embodiment of the present application. As shown in fig. 7, the video perspective system includes: a camera module 701, a central processing unit 702, a graphics processor 703, an image synthesis chip 704, and a display 705; the camera module 701 is configured to capture a real image corresponding to a real world scene, and directly send the captured real image to the image synthesis chip 704; the central processor 702 and the graphics processor 703 are used for generating a virtual image containing a virtual object, and sending the virtual image to the image synthesis chip 704; the image synthesis chip 704 is used for obtaining real images and virtual images in parallel; and according to the obtained results of the real image and the virtual image, determining a first image and sending the first image to the display 705, wherein the first image is the real image or a synthesized image of the real image and the virtual image; the display 705 is for displaying a first image.
That is, in the video perspective system shown in fig. 7, the central processor 702 can realize the functions realized by the SLAM module and the plane detection module as described in the foregoing embodiments, the graphics processor 703 can realize the functions realized by the virtual object generation module as described in the foregoing embodiments, and the image synthesis chip 704 can realize the functions realized by the virtual reality synthesis module as described in the foregoing embodiments.
Optionally, the video perspective system further comprises an infrared sensor, a gyroscope, etc. other sensors, not shown in fig. 7.
In the video perspective system shown in fig. 7, the algorithm for synthesizing the virtual image and the real image is implemented in hardware ("hardened") in the image synthesis chip, which reduces the computation delay when the virtual image and the real image are synthesized.
The foregoing is merely a specific embodiment of the present application, but the protection scope of the present application is not limited thereto, and any changes or substitutions within the technical scope of the present disclosure should be covered in the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (5)

1. A method of video perspective, the method comprising:
Obtaining a real image corresponding to a real world scene and a virtual image containing a virtual object in parallel;
for a real image acquired before the virtual image is acquired, directly determining the real image as a first image, or synthesizing the real image with the virtual image after the virtual image is acquired to obtain a synthesized image as the first image;
for a real image acquired after the virtual image is acquired, synthesizing the real image with the virtual image to obtain a synthesized image as the first image; the first image is the real image or a synthesized image of the real image and the virtual image;
displaying the first image;
the synthesizing the real image and the virtual image to obtain a synthesized image comprises the following steps:
adjusting the real image and the virtual image to a first size;
identifying effective pixels in the virtual image as 1, and identifying ineffective pixels as 0 to obtain a mask image corresponding to the virtual image; wherein the effective pixel is a pixel occupied by the virtual object in the virtual image, and the ineffective pixel is a pixel except the effective pixel in the virtual image;
And synthesizing the real image and the virtual image according to the mask image to obtain the synthesized image.
2. A video perspective device, the device comprising:
the acquisition unit is used for acquiring real images corresponding to the real world scene and virtual images containing the virtual objects in parallel;
the synthesizing unit is used for: for a real image acquired before the virtual image is acquired, directly determining the real image as a first image, or synthesizing the real image with the virtual image after the virtual image is acquired to obtain a synthesized image as the first image; and, for a real image acquired after the virtual image is acquired, synthesizing the real image with the virtual image to obtain a synthesized image as the first image; the first image is the real image or a synthesized image of the real image and the virtual image;
a display unit configured to display the first image;
wherein the synthesizing unit is further configured to adjust the real image and the virtual image to a first size;
identifying effective pixels in the virtual image as 1, and identifying ineffective pixels as 0 to obtain a mask image corresponding to the virtual image; wherein the effective pixel is a pixel occupied by the virtual object in the virtual image, and the ineffective pixel is a pixel except the effective pixel in the virtual image;
And synthesizing the real image and the virtual image according to the mask image to obtain the synthesized image.
3. A video perspective system, comprising: the device comprises a camera module, a central processing unit, a graphic processor, an image synthesis chip and a display;
the camera module is used for capturing a real image corresponding to a real world scene and directly sending the real image to the image synthesis chip;
the CPU and the graphic processor are used for generating a virtual image containing a virtual object and sending the virtual image to the image synthesis chip;
the image synthesis chip is used for acquiring the real image and the virtual image in parallel; for a real image acquired before the virtual image is acquired, directly determining the real image as a first image, or synthesizing the real image with the virtual image after the virtual image is acquired to obtain a synthesized image as the first image; for a real image acquired after the virtual image is acquired, synthesizing the real image with the virtual image to obtain a synthesized image as the first image; and sending the first image to the display, the first image being the real image or a synthesized image of the real image and the virtual image;
The display is used for displaying the first image;
wherein the image synthesis chip is further configured to adjust the real image and the virtual image to a first size;
mark effective pixels in the virtual image as 1 and ineffective pixels as 0 to obtain a mask image corresponding to the virtual image, wherein an effective pixel is a pixel occupied by the virtual object in the virtual image, and an ineffective pixel is any pixel in the virtual image other than an effective pixel; and
synthesize the real image and the virtual image according to the mask image to obtain the synthesized image.
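As a software analogy of the data flow in the system claim — real frames go straight to the synthesis stage while the central processing unit and graphics processor produce virtual frames in parallel — the following sketch shows the per-frame decision made before display. The event-handler structure and the injected `synthesize_fn` (for example, the mask-based sketch given after claim 1) are assumptions for illustration; the image synthesis chip itself is a hardware component.

```python
class ImageSynthesisStage:
    """Software stand-in for the image synthesis chip: it receives real
    frames from the camera path and virtual frames from the CPU/GPU path
    in parallel, and decides per real frame what the display should show."""

    def __init__(self, first_size, synthesize_fn):
        self.first_size = first_size
        self.synthesize_fn = synthesize_fn  # e.g. the mask-based sketch above
        self.latest_virtual = None          # most recent virtual frame, if any

    def on_virtual_frame(self, virtual_rgba):
        # Virtual frames typically arrive later and less often than real frames.
        self.latest_virtual = virtual_rgba

    def on_real_frame(self, real_rgb):
        # Real image acquired before any virtual image: use it directly as
        # the first image, without waiting for the rendering path.
        if self.latest_virtual is None:
            return real_rgb
        # Otherwise synthesize it with the most recent virtual image.
        return self.synthesize_fn(real_rgb, self.latest_virtual, self.first_size)
```

In this arrangement a real frame that arrives before any virtual frame is passed through to the display immediately, which reflects the per-frame decision recited in the system claim.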
4. An electronic device, comprising: a processor; and a memory for storing instructions executable by the processor;
wherein the processor is configured to execute the instructions so as to cause the electronic device to implement the method of claim 1.
5. A computer-readable storage medium having computer program instructions stored thereon, characterized in that
the computer program instructions, when executed by an electronic device, cause the electronic device to implement the method of claim 1.
CN202011198831.6A 2020-10-31 2020-10-31 Video perspective method, device, system, electronic equipment and storage medium Active CN114449251B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202011198831.6A CN114449251B (en) 2020-10-31 2020-10-31 Video perspective method, device, system, electronic equipment and storage medium
PCT/CN2021/119608 WO2022089100A1 (en) 2020-10-31 2021-09-22 Video see-through method, apparatus and system, and electronic device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011198831.6A CN114449251B (en) 2020-10-31 2020-10-31 Video perspective method, device, system, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN114449251A CN114449251A (en) 2022-05-06
CN114449251B true CN114449251B (en) 2024-01-16

Family

ID=81357908

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011198831.6A Active CN114449251B (en) 2020-10-31 2020-10-31 Video perspective method, device, system, electronic equipment and storage medium

Country Status (2)

Country Link
CN (1) CN114449251B (en)
WO (1) WO2022089100A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115082795A (en) * 2022-07-04 2022-09-20 梅卡曼德(北京)机器人科技有限公司 Virtual image generation method, device, equipment, medium and product

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2014114132A1 (en) * 2013-01-25 2014-07-31 Tencent Technology (Shenzhen) Company Limited Method, device, and system for interacting with a virtual character in smart terminal
CN104134229A (en) * 2014-08-08 2014-11-05 李成 Real-time interaction reality augmenting system and method
CN107077755A (en) * 2016-09-30 2017-08-18 深圳达闼科技控股有限公司 Virtually with real fusion method, system and virtual reality device
CN110244840A (en) * 2019-05-24 2019-09-17 华为技术有限公司 Image processing method, relevant device and computer storage medium

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7868904B2 (en) * 2005-04-01 2011-01-11 Canon Kabushiki Kaisha Image processing method and image processing apparatus
US20100182340A1 (en) * 2009-01-19 2010-07-22 Bachelder Edward N Systems and methods for combining virtual and real-time physical environments
US10580040B2 (en) * 2016-04-03 2020-03-03 Integem Inc Methods and systems for real-time image and signal processing in augmented reality based communications
CN106055113B (en) * 2016-07-06 2019-06-21 北京华如科技股份有限公司 A kind of helmet-mounted display system and control method of mixed reality
US10152775B1 (en) * 2017-08-08 2018-12-11 Rockwell Collins, Inc. Low latency mixed reality head wearable device
CN108037863B (en) * 2017-12-12 2021-03-30 北京小米移动软件有限公司 Method and device for displaying image
CN111415422B (en) * 2020-04-17 2022-03-18 Oppo广东移动通信有限公司 Virtual object adjustment method and device, storage medium and augmented reality equipment

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2014114132A1 (en) * 2013-01-25 2014-07-31 Tencent Technology (Shenzhen) Company Limited Method, device, and system for interacting with a virtual character in smart terminal
CN104134229A (en) * 2014-08-08 2014-11-05 李成 Real-time interaction reality augmenting system and method
CN107077755A (en) * 2016-09-30 2017-08-18 深圳达闼科技控股有限公司 Virtually with real fusion method, system and virtual reality device
CN110244840A (en) * 2019-05-24 2019-09-17 华为技术有限公司 Image processing method, relevant device and computer storage medium

Also Published As

Publication number Publication date
WO2022089100A1 (en) 2022-05-05
CN114449251A (en) 2022-05-06

Similar Documents

Publication Publication Date Title
US20180192034A1 (en) Stereoscopic 3d camera for virtual reality experience
US20200312033A1 (en) Image generation device, image generation system, image generation method, and program
US11120632B2 (en) Image generating apparatus, image generating system, image generating method, and program
EP3683774A1 (en) Information processing device, information processing method, and program
US11003408B2 (en) Image generating apparatus and image generating method
CN107844190B (en) Image display method and device based on virtual reality VR equipment
CN110177214B (en) Image processor, image processing method, photographing device and electronic equipment
US10861243B1 (en) Context-sensitive augmented reality
JP7101269B2 (en) Pose correction
US11615576B2 (en) Artificial reality system using superframes to communicate surface data
WO2019098198A1 (en) Image generation device, head-mounted display, image generation system, image generation method, and program
WO2018136374A1 (en) Mixed reality object rendering
JP2023036676A (en) Method and device for process data sharing
CN114449251B (en) Video perspective method, device, system, electronic equipment and storage medium
WO2018133312A1 (en) Processing method and device
US11373273B2 (en) Method and device for combining real and virtual images
US20220067878A1 (en) Method and device for presenting ar information based on video communication technology
US11606498B1 (en) Exposure of panoramas
WO2019073925A1 (en) Image generation device and image generation method
CN111612915B (en) Rendering objects to match camera noise
US20220108559A1 (en) Face detection in spherical images
US20230222741A1 (en) Video pass-through computing system
WO2021106136A1 (en) Display terminal device
RU2782312C1 (en) Image processing method and display device mounted on the head
US20240078743A1 (en) Stereo Depth Markers

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant