CN115686182A - Processing method of augmented reality video and electronic equipment - Google Patents

Processing method of augmented reality video and electronic equipment

Info

Publication number
CN115686182A
Authority
CN
China
Prior art keywords
information
virtual plane
video
original video
virtual
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110831693.9A
Other languages
Chinese (zh)
Other versions
CN115686182B (en)
Inventor
刘小伟
陈兵
王国毅
周俊伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Honor Device Co Ltd
Original Assignee
Honor Device Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Honor Device Co Ltd filed Critical Honor Device Co Ltd
Priority to CN202110831693.9A priority Critical patent/CN115686182B/en
Priority to PCT/CN2022/089308 priority patent/WO2023000746A1/en
Publication of CN115686182A publication Critical patent/CN115686182A/en
Application granted granted Critical
Publication of CN115686182B publication Critical patent/CN115686182B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 19/00 Manipulating 3D models or images for computer graphics

Landscapes

  • Engineering & Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Graphics (AREA)
  • Computer Hardware Design (AREA)
  • Software Systems (AREA)
  • Human Computer Interaction (AREA)
  • Telephone Function (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

A processing method for an augmented reality (AR) video and an electronic device are provided. The processing method comprises the following steps: acquiring an original video and pose information, wherein the original video is a video of a real object and the pose information represents the pose of a terminal device when the original video is acquired; generating a virtual plane according to the original video and the pose information, wherein the virtual plane is used for determining position information of virtual content added to the original video; and adding the virtual content to the original video according to the virtual plane to generate an AR video. Based on this technical solution, the quality of the recorded AR video can be improved.

Description

Processing method of augmented reality video and electronic equipment
Technical Field
The application relates to the field of terminals, in particular to a processing method of an augmented reality video and an electronic device.
Background
Augmented reality (AR) technology calculates the position and angle of the camera image in real time and adds a corresponding image; it is a new technology that seamlessly integrates real-world information and virtual-world information, and its aim is to overlay the virtual world on the real world on a screen and enable interaction between them.
At present, when an AR video is recorded, the virtual content cannot be well fused with the video of the real object; in particular, when the user needs to interact with the virtual content in the shooting scene, the scene often has to be re-shot many times, which wastes time and labor.
Therefore, how to better fuse the virtual content and the real object content when recording the AR video and improve the video quality of the AR video becomes a problem which needs to be solved urgently.
Disclosure of Invention
The application provides a processing method of an augmented reality video and an electronic device, which can enable the virtual content and the video of the real object to be better fused when an AR video is recorded, thereby improving the video quality of the AR video.
In a first aspect, a method for processing an augmented reality video is provided, including:
acquiring an original video and pose information, wherein the original video is a video of a real object, and the pose information represents the pose of a terminal device when the original video is acquired; generating a virtual plane according to the original video and the pose information, wherein the virtual plane is used for determining position information of virtual content added to the original video; and adding the virtual content to the original video according to the virtual plane to generate an AR video.
In the embodiment of the application, the pose information corresponding to the original video can be obtained when the original video is obtained; obtaining a virtual plane according to the pose information and the original video; when virtual content is added to an image frame of an original video, the virtual plane can be used as a reference plane, and the position of the virtual content in the original video can be adjusted according to the virtual plane, so that the virtual content can be better blended into the original video, and the video quality of an AR video is improved.
It should be understood that the pose information represents the pose of the camera of the terminal device when the original video is acquired; the pose information may include attitude information and position information.
With reference to the first aspect, in certain implementations of the first aspect, the pose information includes three-dimensional attitude information, and the method further includes:
representing the three-dimensional attitude information as a quaternion.
In the embodiment of the application, the three-dimensional attitude information can be converted into a quaternion representation, which avoids the ambiguity caused by representing the attitude information with only three parameters.
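For illustration, the following Python sketch shows one common way to obtain such a quaternion from a three-parameter attitude; the ZYX (yaw-pitch-roll) Euler convention and the function name are assumptions for the example, not details disclosed by the application.

```python
import math

def euler_zyx_to_quaternion(yaw, pitch, roll):
    """Convert ZYX Euler angles (radians) to a unit quaternion (w, x, y, z).

    A quaternion avoids the ambiguity that a three-parameter attitude
    representation can suffer from.
    """
    cy, sy = math.cos(yaw * 0.5), math.sin(yaw * 0.5)
    cp, sp = math.cos(pitch * 0.5), math.sin(pitch * 0.5)
    cr, sr = math.cos(roll * 0.5), math.sin(roll * 0.5)

    w = cr * cp * cy + sr * sp * sy
    x = sr * cp * cy - cr * sp * sy
    y = cr * sp * cy + sr * cp * sy
    z = cr * cp * sy - sr * sp * cy
    return (w, x, y, z)

# Example: attitude of the camera when one frame was captured.
print(euler_zyx_to_quaternion(math.radians(30), math.radians(10), math.radians(-5)))
```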
With reference to the first aspect, in certain implementations of the first aspect, the generating a virtual plane according to the original video and the pose information includes:
extracting feature points of an image frame according to pose information of the image frame in the original video;
and generating the virtual plane according to the feature points.
It should be understood that the feature points of the image frame may refer to points where the gray value of the image changes drastically, or points with a large curvature on the edge of the image; the feature points may be used to identify objects in the image.
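As a hedged illustration of how a virtual plane might be fitted once the feature points have been triangulated into a sparse 3D point cloud (the application itself only states that the plane is generated by algorithm fitting), the following Python sketch uses a simple RANSAC plane fit; the iteration count and distance threshold are arbitrary example values.

```python
import random
import numpy as np

def fit_virtual_plane(points, iters=200, dist_thresh=0.02):
    """Fit a plane (n, d with n . p + d = 0) to sparse 3D feature points by RANSAC.

    'points' is an (N, 3) array of triangulated feature points; the thresholds
    are illustrative values, not parameters taken from the application.
    """
    best_inliers, best_plane = [], None
    pts = np.asarray(points, dtype=np.float64)
    for _ in range(iters):
        p0, p1, p2 = pts[random.sample(range(len(pts)), 3)]
        n = np.cross(p1 - p0, p2 - p0)
        norm = np.linalg.norm(n)
        if norm < 1e-9:          # degenerate (collinear) sample, try again
            continue
        n /= norm
        d = -np.dot(n, p0)
        dist = np.abs(pts @ n + d)
        inliers = np.where(dist < dist_thresh)[0]
        if len(inliers) > len(best_inliers):
            best_inliers, best_plane = inliers, (n, d)
    return best_plane, best_inliers
```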
With reference to the first aspect, in certain implementations of the first aspect, the method further includes:
and storing the pose information and the information of the virtual plane.
In the embodiment of the application, the pose information and the information of the virtual plane are stored, so that after the recording of the original video is finished, virtual content can be added to the original video according to the pose information and the information of the virtual plane to generate a new AR video; because the pose information and the information of the virtual plane are stored, a user can edit the original video multiple times to generate AR videos with different virtual content.
With reference to the first aspect, in certain implementations of the first aspect, the saving the pose information and the information of the virtual plane includes:
and storing the pose information and the information of the virtual plane in a binary file.
In one possible implementation, the terminal device may save the pose information and the information of the virtual plane as independent binary files.
In a possible implementation manner, the pose information corresponding to the original video and the information of the virtual plane may be stored in the same directory.
In a possible implementation manner, the pose information corresponding to the original video and the information of the virtual plane may be saved in the terminal device with the same name as the original video.
In a possible implementation manner, the pose information corresponding to the original video and the information of the virtual plane may be stored in the terminal device by using the frame number of each image frame as an identifier.
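A minimal sketch of such an independent binary file is shown below, assuming a per-frame record containing the frame number, camera position, attitude quaternion, and the visible virtual planes; the byte layout and the file name are illustrative assumptions rather than a format defined by the application.

```python
import struct

def save_frame_record(f, frame_idx, position, quaternion, planes):
    """Append one frame's pose and virtual-plane information to an open binary file.

    Assumed layout: frame index (uint32), position (3 x float32), attitude
    quaternion (4 x float32), plane count (uint8), then for each plane its
    identifier (uint8), vertex count (uint16) and vertices (3 x float32 each).
    """
    f.write(struct.pack("<I3f4f", frame_idx, *position, *quaternion))
    f.write(struct.pack("<B", len(planes)))
    for plane_id, vertices in planes:
        f.write(struct.pack("<BH", plane_id, len(vertices)))
        for x, y, z in vertices:
            f.write(struct.pack("<3f", x, y, z))

# Usage: one file per video, e.g. saved next to the original video under the same name.
with open("video_0001.bin", "wb") as f:  # hypothetical file name
    save_frame_record(f, 0, (0.0, 0.0, 0.0), (1.0, 0.0, 0.0, 0.0),
                      [(1, [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0)])])
```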
With reference to the first aspect, in certain implementations of the first aspect, the saving the pose information and the information of the virtual plane includes:
and storing the pose information and the information of the virtual plane in the supplementary enhancement information corresponding to the original video.
In a possible implementation manner, the pose information and the information of the virtual plane can be written into the supplementary enhancement information (SEI) of H.264 or H.265 when the original video is compression-encoded.
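For illustration, the sketch below builds an H.264 user_data_unregistered SEI NAL unit (payload type 5) that could carry such metadata; the 16-byte UUID is a made-up identifier, payload sizes are limited to under 255 bytes, and emulation-prevention bytes are omitted for brevity, so this is a simplified sketch rather than the encoder integration used by the application.

```python
import uuid

# Hypothetical 16-byte UUID identifying this AR metadata payload.
AR_METADATA_UUID = uuid.UUID("9f1c8a52-0000-4b6e-8f00-000000000001").bytes

def build_h264_sei_nalu(metadata: bytes) -> bytes:
    """Build an H.264 user_data_unregistered SEI NAL unit carrying AR metadata.

    Simplified: single-byte payload size (< 255 bytes) and no
    emulation-prevention (0x03) byte insertion.
    """
    payload = AR_METADATA_UUID + metadata
    assert len(payload) < 255
    sei = bytes([0x05, len(payload)]) + payload          # payload_type=5, payload_size
    rbsp = sei + bytes([0x80])                            # rbsp_trailing_bits
    return bytes([0x00, 0x00, 0x00, 0x01, 0x06]) + rbsp   # start code + SEI NAL header

# The serialized per-frame pose/plane record (see the binary layout above) would be
# passed in as 'metadata' and the SEI NAL unit interleaved with the frame's slices.
```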
With reference to the first aspect, in certain implementations of the first aspect, the method further includes:
and compressing the stored pose information and the information of the virtual plane.
In the embodiment of the application, the pose information and the information of the virtual plane can be compressed when they are stored, so that the storage space occupied by the stored information is effectively reduced.
In a possible implementation manner, at least one of the following manners may be adopted to perform compression processing on the information of the saved pose information and the virtual plane:
saving the pose information as the difference between the current image frame and the previous image frame; storing the plane identifier of the virtual plane as an unsigned char; for the description of the vertices of the virtual plane, retaining the Z-axis coordinate of one vertex of a horizontal plane and deleting the Z-axis coordinates of its other vertices, and retaining the Y-axis coordinate of one vertex of a vertical plane and deleting the Y-axis coordinates of its other vertices; describing the vertex positions in float16; or, when the information of the virtual plane is stored, storing only the planes within the current field of view.
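The following Python sketch illustrates two of these ideas, delta-encoding the per-frame camera position and storing plane vertices in float16 with the shared coordinate kept only once; the function names and array layouts are assumptions for the example.

```python
import numpy as np

def compress_pose_stream(positions):
    """Delta-encode per-frame camera positions: store the first position in full,
    then only the difference from the previous frame (illustrative layout)."""
    positions = np.asarray(positions, dtype=np.float32)
    deltas = np.diff(positions, axis=0)
    return positions[0], deltas

def compress_plane_vertices(vertices, horizontal=True):
    """Store vertex positions in float16 and keep the shared coordinate only once:
    the Z coordinate for a horizontal plane, the Y coordinate for a vertical plane."""
    v = np.asarray(vertices, dtype=np.float16)
    shared = v[0, 2] if horizontal else v[0, 1]       # coordinate shared by all vertices
    kept_axes = [0, 1] if horizontal else [0, 2]      # axes still stored per vertex
    return shared, v[:, kept_axes]
```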
With reference to the first aspect, in certain implementations of the first aspect, the adding the virtual content to the original video according to the information of the virtual plane to generate an AR video includes:
and after the original video is recorded, adding the virtual content in the original video according to the virtual plane to generate the AR video.
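As a hedged sketch of how the stored per-frame poses could be used after recording, the code below projects a 3D anchor point placed on the virtual plane into a recorded frame with a pinhole camera model; the intrinsic matrix values are arbitrary example numbers, and the application does not prescribe this particular rendering step.

```python
import numpy as np

def project_anchor(anchor_world, R_cw, t_cw, K):
    """Project a 3D anchor point (on the virtual plane, in world coordinates)
    into a recorded frame using that frame's stored camera pose.

    R_cw and t_cw map world coordinates into the camera frame; K is the camera
    intrinsic matrix (the values below are assumed example numbers).
    """
    p_cam = R_cw @ anchor_world + t_cw
    if p_cam[2] <= 0:                 # anchor is behind the camera in this frame
        return None
    uv = K @ (p_cam / p_cam[2])
    return uv[0], uv[1]               # pixel position where the virtual content is drawn

K = np.array([[1500.0, 0.0, 960.0],
              [0.0, 1500.0, 540.0],
              [0.0, 0.0, 1.0]])
```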
With reference to the first aspect, in certain implementations of the first aspect, the virtual plane includes a first virtual plane, where the first virtual plane refers to a virtual plane corresponding to a first image frame, and the first image frame is any one image frame in the original video;
the information of the first virtual plane includes a total number of image frames, an identification of the first virtual plane, a number of vertices included in the first virtual plane, and position information of each vertex included in the first virtual plane, where the total number refers to a total number of image frames included in the original video.
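The information of such a virtual plane can be modeled as a simple record, as in the sketch below; the field names are illustrative, not names used by the application.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class VirtualPlaneInfo:
    """Information of the virtual plane associated with one image frame."""
    total_frames: int                            # total number of image frames in the original video
    plane_id: int                                # identification of the virtual plane
    vertices: List[Tuple[float, float, float]]   # position of each vertex of the plane

    @property
    def vertex_count(self) -> int:
        return len(self.vertices)
```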
In a second aspect, an AR video processing apparatus is provided, where the processing apparatus includes an acquiring unit and a processing unit; the acquiring unit is used for acquiring an original video and pose information, where the original video is a video of a real object, and the pose information represents the pose of the terminal device when the original video is acquired; the processing unit is used for generating a virtual plane according to the original video and the pose information, where the virtual plane is used for determining position information of virtual content added to the original video, and for adding the virtual content to the original video according to the virtual plane to generate an AR video.
With reference to the second aspect, in certain implementations of the second aspect, the pose information includes three-dimensional attitude information, and the processing unit is further configured to:
represent the three-dimensional attitude information as a quaternion.
With reference to the second aspect, in some implementations of the second aspect, the processing unit is specifically configured to:
extracting feature points of an image frame according to pose information of the image frame in the original video;
and generating the virtual plane according to the feature points.
With reference to the second aspect, in certain implementations of the second aspect, the processing unit is further configured to:
and storing the pose information and the information of the virtual plane.
With reference to the second aspect, in some implementations of the second aspect, the processing unit is specifically configured to:
and storing the pose information and the information of the virtual plane in a binary file.
With reference to the second aspect, in some implementations of the second aspect, the processing unit is specifically configured to:
and storing the pose information and the information of the virtual plane in the supplementary enhancement information corresponding to the original video.
With reference to the second aspect, in certain implementations of the second aspect, the processing unit is further configured to:
and compressing the stored pose information and the information of the virtual plane.
With reference to the second aspect, in some implementations of the second aspect, the processing unit is specifically configured to:
and after the original video is recorded, adding the virtual content in the original video according to the virtual plane to generate the AR video.
With reference to the second aspect, in some implementations of the second aspect, the virtual plane includes a first virtual plane, where the first virtual plane refers to a virtual plane corresponding to a first image frame, and the first image frame is any one image frame in the original video;
the information of the first virtual plane includes a total number of image frames, an identifier of the first virtual plane, a number of vertices included in the first virtual plane, and position information of each vertex included in the first virtual plane, where the total number refers to a total number of image frames included in the original video.
In a possible implementation manner, the processing apparatus of the AR video may refer to a chip.
When the processing device is a chip, the acquiring unit may refer to an output interface, a pin, a circuit, or the like; the processing unit may refer to a processing unit inside the chip.
It is to be understood that extensions, definitions, explanations and explanations of relevant contents in the above-mentioned first aspect also apply to the same contents in the second aspect.
In a third aspect, an electronic device is provided, which includes: one or more processors, memory, and a display screen; the memory coupled with the one or more processors, the memory to store computer program code, the computer program code including computer instructions, the one or more processors to invoke the computer instructions to cause the electronic device to perform:
acquiring an original video and pose information, wherein the original video is a video of a real object, and the pose information represents the pose of a terminal device when the original video is acquired; generating a virtual plane according to the original video and the pose information, wherein the virtual plane is used for determining position information of virtual content added to the original video; and adding the virtual content to the original video according to the virtual plane to generate an AR video.
With reference to the third aspect, in certain implementations of the third aspect, the pose information includes three-dimensional attitude information, and the one or more processors invoke the computer instructions to cause the electronic device to further perform:
representing the three-dimensional attitude information as a quaternion.
With reference to the third aspect, in certain implementations of the third aspect, the one or more processors invoke the computer instructions to cause the electronic device to further perform:
extracting feature points of an image frame according to pose information of the image frame in the original video;
and generating the virtual plane according to the feature points.
With reference to the third aspect, in certain implementations of the third aspect, the one or more processors invoke the computer instructions to cause the electronic device to further perform:
and storing the pose information and the information of the virtual plane.
With reference to the third aspect, in certain implementations of the third aspect, the one or more processors invoke the computer instructions to cause the electronic device to further perform:
and storing the pose information and the information of the virtual plane in a binary file.
With reference to the third aspect, in certain implementations of the third aspect, the one or more processors invoke the computer instructions to cause the electronic device to further perform:
and storing the pose information and the information of the virtual plane in the supplementary enhancement information corresponding to the original video.
With reference to the third aspect, in certain implementations of the third aspect, the one or more processors invoke the computer instructions to cause the electronic device to further perform:
and compressing the stored pose information and the information of the virtual plane.
With reference to the third aspect, in certain implementations of the third aspect, the one or more processors invoke the computer instructions to cause the electronic device to further perform:
and after the original video is recorded, adding the virtual content in the original video according to the virtual plane to generate the AR video.
With reference to the third aspect, in some implementations of the third aspect, the virtual plane includes a first virtual plane, where the first virtual plane refers to a virtual plane corresponding to a first image frame, and the first image frame is any one image frame in the original video;
the information of the first virtual plane includes a total number of image frames, an identification of the first virtual plane, a number of vertices included in the first virtual plane, and position information of each vertex included in the first virtual plane, where the total number refers to a total number of image frames included in the original video.
It will be appreciated that extensions, definitions, explanations and explanations of relevant content in the above-described first aspect also apply to the same content in the third aspect.
In a fourth aspect, an electronic device is provided, the electronic device comprising: one or more processors, memory, and a display screen; the memory coupled with the one or more processors for storing computer program code comprising computer instructions which the one or more processors invoke to cause the electronic device to perform any of the processing methods of the first aspect.
In a fifth aspect, a chip system is provided, which is applied to an electronic device, and includes one or more processors, where the processor is configured to invoke computer instructions to cause the electronic device to execute any one of the processing methods in the first aspect.
In a sixth aspect, a computer-readable storage medium is provided, which stores computer program code, which, when executed by an electronic device, causes the electronic device to perform any one of the processing methods of the first aspect.
In a seventh aspect, a computer program product is provided, the computer program product comprising: computer program code which, when run by an electronic device, causes the electronic device to perform any of the processing methods of the first aspect.
In the embodiment of the application, the pose information corresponding to the original video can be acquired when the original video is acquired; obtaining a virtual plane according to the pose information and the original video; when virtual content is added into an image frame of an original video, the virtual plane can be used as a reference plane, and the position of the virtual content in the original video can be adjusted according to the virtual plane; therefore, in the embodiment of the application, the virtual content can be better merged into the original video through the virtual plane, so that the video quality of the generated AR video is improved.
Drawings
FIG. 1 is a schematic diagram of a hardware system suitable for use in the apparatus of the present application;
FIG. 2 is a schematic diagram of a software system suitable for use in the apparatus of the present application;
FIG. 3 is a schematic diagram of an application scenario provided herein;
fig. 4 is a schematic diagram of a processing method of an augmented reality video provided by the present application;
FIG. 5 is a schematic diagram of a display interface of an AR video processing system provided in the present application;
FIG. 6 is a schematic diagram of a display interface of an AR video processing system provided in the present application;
FIG. 7 is a schematic diagram of a display interface of an AR video processing system provided in the present application;
FIG. 8 is a schematic view of a display interface for AR video processing provided herein;
FIG. 9 is a schematic diagram of a display interface for AR video processing provided herein;
FIG. 10 is a schematic diagram of a display interface for AR video processing provided herein;
FIG. 11 is a schematic diagram of a display interface for AR video processing provided herein;
FIG. 12 is a schematic diagram of a display interface for AR video processing provided herein;
fig. 13 is a schematic diagram of a processing method of an augmented reality video provided by the present application;
FIG. 14 is a schematic diagram of a display interface for AR video processing provided herein;
FIG. 15 is a schematic diagram of a display interface for AR video processing provided herein;
FIG. 16 is a schematic diagram of a display interface for AR video processing provided herein;
FIG. 17 is a schematic view of a display interface for AR video processing provided herein;
FIG. 18 is a schematic illustration of a display interface for AR video processing provided herein;
FIG. 19 is a schematic diagram of a display interface for AR video processing provided herein;
FIG. 20 is a schematic diagram of a display interface for AR video processing provided herein;
FIG. 21 is a schematic diagram of a display interface for AR video processing provided herein;
fig. 22 is a schematic structural diagram of a processing apparatus for augmented reality video according to the present application;
fig. 23 is a schematic structural diagram of an electronic device provided in the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described below with reference to the accompanying drawings.
Fig. 1 shows a hardware system suitable for a terminal device of the present application.
The terminal device 100 may be a mobile phone, a smart screen, a tablet computer, a wearable electronic device, an in-vehicle electronic device, an Augmented Reality (AR) device, a Virtual Reality (VR) device, a notebook computer, an ultra-mobile personal computer (UMPC), a netbook, a Personal Digital Assistant (PDA), a projector, and the like, and the specific type of the terminal device 100 is not limited in this embodiment.
The terminal device 100 may include a processor 110, an external memory interface 120, an internal memory 121, a Universal Serial Bus (USB) interface 130, a charging management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, an earphone interface 170D, a sensor module 180, a button 190, a motor 191, an indicator 192, a camera 193, a display screen 194, a Subscriber Identity Module (SIM) card interface 195, and the like. The sensor module 180 may include a pressure sensor 180A, a gyroscope sensor 180B, an air pressure sensor 180C, a magnetic sensor 180D, an acceleration sensor 180E, a distance sensor 180F, a proximity light sensor 180G, a fingerprint sensor 180H, a temperature sensor 180J, a touch sensor 180K, an ambient light sensor 180L, a bone conduction sensor 180M, and the like.
The configuration shown in fig. 1 is not intended to specifically limit the terminal device 100. In other embodiments of the present application, terminal device 100 may include more or fewer components than shown in FIG. 1, or terminal device 100 may include a combination of some of the components shown in FIG. 1, or terminal device 100 may include sub-components of some of the components shown in FIG. 1. The components shown in fig. 1 may be implemented in hardware, software, or a combination of software and hardware.
Processor 110 may include one or more processing units. For example, the processor 110 may include at least one of the following processing units: an Application Processor (AP), a modem processor, a Graphics Processing Unit (GPU), an Image Signal Processor (ISP), a controller, a video codec, a Digital Signal Processor (DSP), a baseband processor, and a neural Network Processor (NPU). The different processing units may be independent devices or integrated devices.
The controller can generate an operation control signal according to the instruction operation code and the time sequence signal to finish the control of instruction fetching and instruction execution.
A memory may also be provided in processor 110 for storing instructions and data. In some embodiments, the memory in the processor 110 is a cache memory. The memory may hold instructions or data that have just been used or recycled by the processor 110. If the processor 110 needs to reuse the instruction or data, it can be called directly from the memory. Avoiding repeated accesses reduces the latency of the processor 110, thereby increasing the efficiency of the system.
In some embodiments, processor 110 may include one or more interfaces. For example, the processor 110 may include at least one of the following interfaces: an inter-integrated circuit (I2C) interface, an inter-integrated circuit audio (I2S) interface, a Pulse Code Modulation (PCM) interface, a universal asynchronous receiver/transmitter (UART) interface, a Mobile Industry Processor Interface (MIPI), a general-purpose input/output (GPIO) interface, a SIM interface, and a USB interface.
The I2C interface is a bidirectional synchronous serial bus including a serial data line (SDA) and a Serial Clock Line (SCL). In some embodiments, processor 110 may include multiple sets of I2C buses. The processor 110 may be coupled to the touch sensor 180K, the charger, the flash, the camera 193, etc., respectively, through different I2C bus interfaces. For example: the processor 110 may be coupled to the touch sensor 180K through an I2C interface, so that the processor 110 and the touch sensor 180K communicate through an I2C bus interface to implement a touch function of the terminal device 100.
The I2S interface may be used for audio communication. In some embodiments, processor 110 may include multiple sets of I2S buses. The processor 110 may be coupled to the audio module 170 through an I2S bus to enable communication between the processor 110 and the audio module 170. In some embodiments, the audio module 170 may transmit the audio signal to the wireless communication module 160 through the I2S interface, so as to implement a function of receiving a call through a bluetooth headset.
The PCM interface may also be used for audio communication, sampling, quantizing and encoding analog signals. In some embodiments, the audio module 170 and the wireless communication module 160 may be coupled through a PCM interface. In some embodiments, the audio module 170 may also transmit audio signals to the wireless communication module 160 through the PCM interface, so as to implement a function of answering a call through a bluetooth headset. Both the I2S interface and the PCM interface may be used for audio communication.
The UART interface is a universal serial data bus used for asynchronous communications. The bus may be a bidirectional communication bus. It converts the data to be transmitted between serial communication and parallel communication. In some embodiments, a UART interface is generally used to connect the processor 110 and the wireless communication module 160. For example: the processor 110 communicates with a bluetooth module in the wireless communication module 160 through a UART interface to implement a bluetooth function. In some embodiments, the audio module 170 may transmit the audio signal to the wireless communication module 160 through a UART interface, so as to realize the function of playing music through a bluetooth headset.
MIPI interfaces may be used to connect processor 110 with peripheral devices such as display screen 194 and camera 193. The MIPI interface includes a Camera Serial Interface (CSI), a Display Serial Interface (DSI), and the like. In some embodiments, processor 110 and camera 193 communicate through a CSI interface to implement the capture function of terminal device 100. The processor 110 and the display screen 194 communicate through the DSI interface to implement the display function of the terminal device 100.
The GPIO interface may be configured by software. The GPIO interface may be configured as a control signal interface and may also be configured as a data signal interface. In some embodiments, a GPIO interface may be used to connect the processor 110 with the camera 193, the display 194, the wireless communication module 160, the audio module 170, and the sensor module 180. The GPIO interface may also be configured as an I2C interface, an I2S interface, a UART interface, or a MIPI interface.
The USB interface 130 is an interface conforming to a USB standard specification, and may be a Mini (Mini) USB interface, a Micro (Micro) USB interface, or a USB Type C (USB Type C) interface, for example. The USB interface 130 may be used to connect a charger to charge the terminal device 100, to transmit data between the terminal device 100 and a peripheral device, and to connect an earphone to play audio through the earphone. The USB interface 130 may also be used to connect other terminal devices 100, such as AR devices.
The connection relationship between the modules shown in fig. 1 is only illustrative, and does not limit the connection relationship between the modules of the terminal device 100. Alternatively, the modules of the terminal device 100 may also adopt a combination of multiple connection manners in the above embodiments.
The charge management module 140 is used to receive power from a charger. The charger may be a wireless charger or a wired charger. In some wired charging embodiments, the charging management module 140 may receive the current of the wired charger through the USB interface 130. In some wireless charging embodiments, the charging management module 140 may receive electromagnetic waves through a wireless charging coil of the terminal device 100 (current path is shown as dashed line). The charging management module 140 may also supply power to the terminal device 100 through the power management module 141 while charging the battery 142.
The power management module 141 is used to connect the battery 142, the charging management module 140 and the processor 110. The power management module 141 receives input from the battery 142 and/or the charge management module 140, and supplies power to the processor 110, the internal memory 121, the display 194, the camera 193, the wireless communication module 160, and the like. The power management module 141 may also be used to monitor parameters such as battery capacity, battery cycle count, and battery state of health (e.g., leakage, impedance). Alternatively, the power management module 141 may be disposed in the processor 110, or the power management module 141 and the charging management module 140 may be disposed in the same device.
The wireless communication function of the terminal device 100 may be implemented by the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, a modem processor, a baseband processor, and the like.
The antennas 1 and 2 are used for transmitting and receiving electromagnetic wave signals. Each antenna in terminal device 100 may be used to cover a single or multiple communication bands. Different antennas can also be multiplexed to improve the utilization of the antennas. For example: the antenna 1 may be multiplexed as a diversity antenna of a wireless local area network. In other embodiments, the antenna may be used in conjunction with a tuning switch.
The mobile communication module 150 may provide a solution for wireless communication applied to the terminal device 100, for example at least one of the following solutions: a second generation (2G) mobile communication solution, a third generation (3G) mobile communication solution, a fourth generation (4G) mobile communication solution, and a fifth generation (5G) mobile communication solution. The mobile communication module 150 may include at least one filter, a switch, a power amplifier, a low noise amplifier (LNA), and the like. The mobile communication module 150 may receive electromagnetic waves from the antenna 1, filter and amplify the received electromagnetic waves, and then transmit the electromagnetic waves to the modem processor for demodulation. The mobile communication module 150 may further amplify the signal modulated by the modem processor, and the amplified signal is converted into electromagnetic waves by the antenna 1 and radiated. In some embodiments, at least some of the functional modules of the mobile communication module 150 may be disposed in the processor 110. In some embodiments, at least some of the functional modules of the mobile communication module 150 may be provided in the same device as at least some of the modules of the processor 110.
The modem processor may include a modulator and a demodulator. The modulator is used for modulating a low-frequency baseband signal to be transmitted into a medium-high frequency signal. The demodulator is used for demodulating the received electromagnetic wave signal into a low-frequency baseband signal. The demodulator then passes the demodulated low-frequency baseband signal to a baseband processor for processing. The low-frequency baseband signal is processed by the baseband processor and then passed to the application processor. The application processor outputs sound signals through an audio device (e.g., speaker 170A, receiver 170B) or displays images or video through the display screen 194. In some embodiments, the modem processor may be a stand-alone device. In other embodiments, the modem processor may be provided in the same device as the mobile communication module 150 or other functional modules, independent of the processor 110.
Similar to the mobile communication module 150, the wireless communication module 160 may also provide a wireless communication solution applied on the terminal device 100, such as at least one of the following: wireless Local Area Networks (WLANs), bluetooth (BT), bluetooth Low Energy (BLE), ultra Wide Band (UWB), global Navigation Satellite System (GNSS), frequency Modulation (FM), near Field Communication (NFC), infrared (IR) technologies. The wireless communication module 160 may be one or more devices integrating at least one communication processing module. The wireless communication module 160 receives electromagnetic waves via the antenna 2, frequency-modulates and filters electromagnetic wave signals, and transmits the processed signals to the processor 110. The wireless communication module 160 may also receive, frequency modulate and amplify the signal to be transmitted from the processor 110, which is converted into electromagnetic waves via the antenna 2 for radiation.
In some embodiments, the antenna 1 of the terminal device 100 and the mobile communication module 150 are coupled and the antenna 2 of the terminal device 100 and the wireless communication module 160 are coupled so that the terminal device 100 can communicate with a network and other electronic devices through wireless communication technology. The wireless communication technology may include at least one of the following communication technologies: global system for mobile communications (GSM), general Packet Radio Service (GPRS), code Division Multiple Access (CDMA), wideband Code Division Multiple Access (WCDMA), time division code division multiple access (TD-SCDMA), long Term Evolution (LTE), BT, GNSS, WLAN, NFC, FM, IR technologies. The GNSS may include at least one of the following positioning techniques: global Positioning System (GPS), global navigation satellite system (GLONASS), beidou satellite navigation system (BDS), quasi-zenith satellite system (QZSS), satellite Based Augmentation System (SBAS).
Terminal device 100 may implement display functionality through the GPU, display screen 194, and application processor. The GPU is a microprocessor for image processing, and is connected to the display screen 194 and an application processor. The GPU is used to perform mathematical and geometric calculations for graphics rendering. The processor 110 may include one or more GPUs that execute program instructions to generate or alter display information.
The display screen 194 may be used to display images or video. The display screen 194 includes a display panel. The display panel may adopt a Liquid Crystal Display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a Mini light-emitting diode (Mini LED), a Micro light-emitting diode (Micro LED), a Micro OLED (Micro OLED), or a quantum dot light-emitting diode (QLED). In some embodiments, the terminal device 100 may include 1 or N display screens 194, N being a positive integer greater than 1.
The terminal device 100 may implement a photographing function through an ISP, a camera 193, a video codec, a GPU, a display screen 194, an application processor, and the like.
The ISP is used to process the data fed back by the camera 193. For example, when a photo is taken, the shutter is opened, light is transmitted to the camera photosensitive element through the lens, the optical signal is converted into an electrical signal, and the camera photosensitive element transmits the electrical signal to the ISP for processing and converting into an image visible to naked eyes. The ISP can perform algorithm optimization on the noise, brightness and color of the image, and can also optimize parameters such as exposure, color temperature and the like of a shooting scene. In some embodiments, the ISP may be provided in camera 193.
The camera 193 is used to capture still images or video. The object generates an optical image through the lens and projects the optical image to the photosensitive element. The photosensitive element may be a Charge Coupled Device (CCD) or a complementary metal-oxide-semiconductor (CMOS) phototransistor. The photosensitive element converts the optical signal into an electrical signal, and then transmits the electrical signal to the ISP to be converted into a digital image signal. And the ISP outputs the digital image signal to the DSP for processing. The DSP converts the digital image signal into a standard Red Green Blue (RGB), YUV, or the like format image signal. In some embodiments, the terminal device 100 may include 1 or N cameras 193, N being a positive integer greater than 1.
The digital signal processor is used for processing digital signals, and can process other digital signals besides digital image signals. For example, when the terminal device 100 selects a frequency point, the digital signal processor is used to perform fourier transform or the like on the frequency point energy.
Video codecs are used to compress or decompress digital video. The terminal device 100 may support one or more video codecs. In this way, the terminal device 100 can play or record video in a plurality of encoding formats, such as: moving Picture Experts Group (MPEG) 1, MPEG2, MPEG3, and MPEG4.
The NPU is a processor that draws on the structure of biological neural networks; for example, the NPU can rapidly process input information by emulating the transfer mode between human brain neurons, and can also continuously self-learn. The NPU can implement functions such as intelligent cognition of the terminal device 100, for example: image recognition, face recognition, speech recognition and text understanding.
The external memory interface 120 may be used to connect an external memory card, such as a Secure Digital (SD) card, to expand the memory capability of the terminal device 100. The external memory card communicates with the processor 110 through the external memory interface 120 to implement a data storage function. For example, files such as music, video, etc. are saved in the external memory card.
The internal memory 121 may be used to store computer-executable program code, which includes instructions. The internal memory 121 may include a program storage area and a data storage area. The program storage area may store an operating system and an application program required for at least one function (e.g., a sound playing function and an image playing function). The data storage area may store data (e.g., audio data and a phonebook) created during use of the terminal device 100. In addition, the internal memory 121 may include a high-speed random access memory, and may also include a nonvolatile memory such as: at least one magnetic disk storage device, a flash memory device, a universal flash storage (UFS), and the like. The processor 110 executes various processing methods of the terminal device 100 by executing instructions stored in the internal memory 121 and/or instructions stored in a memory provided in the processor.
The terminal device 100 may implement audio functions, such as music playing and recording, through the audio module 170, the speaker 170A, the receiver 170B, the microphone 170C, the earphone interface 170D, and the application processor.
The audio module 170 is used to convert digital audio information into an analog audio signal for output, and may also be used to convert an analog audio input into a digital audio signal. The audio module 170 may also be used to encode and decode audio signals. In some embodiments, the audio module 170 or some functional modules of the audio module 170 may be disposed in the processor 110.
The speaker 170A, also referred to as a loudspeaker, converts an audio electrical signal into a sound signal. The terminal device 100 can play music or conduct a hands-free call through the speaker 170A.
The receiver 170B, also called an earpiece, is used to convert the electrical audio signal into a sound signal. When the user answers a call or voice information using the terminal apparatus 100, the voice can be answered by placing the receiver 170B close to the ear.
The microphone 170C, also referred to as a mic, is used to convert sound signals into electrical signals. When a user makes a call or sends voice information, a voice signal may be input into the microphone 170C by speaking close to the microphone 170C. The terminal device 100 may be provided with at least one microphone 170C. In other embodiments, the terminal device 100 may be provided with two microphones 170C to implement a noise reduction function. In other embodiments, three, four or more microphones 170C may be provided in the terminal device 100 to identify the sound source and implement directional recording. The processor 110 may process the electrical signal output by the microphone 170C; for example, the audio module 170 and the wireless communication module 160 may be coupled via a PCM interface, and the microphone 170C converts the ambient sound into an electrical signal (e.g., a PCM signal) and transmits the electrical signal to the processor 110 via the PCM interface; the processor 110 then performs volume analysis and frequency analysis on the electrical signal to determine the volume and frequency of the ambient sound.
The earphone interface 170D is used to connect a wired earphone. The earphone interface 170D may be the USB interface 130, or may be a 3.5 mm open mobile terminal platform (OMTP) standard interface or a Cellular Telecommunications Industry Association of the USA (CTIA) standard interface.
The pressure sensor 180A is used for sensing a pressure signal, and can convert the pressure signal into an electrical signal. In some embodiments, the pressure sensor 180A may be disposed on the display screen 194. The pressure sensor 180A may be of a wide variety of types, and may be, for example, a resistive pressure sensor, an inductive pressure sensor, or a capacitive pressure sensor. The capacitive pressure sensor may be a sensor including at least two parallel plates having conductive materials, and when a force is applied to the pressure sensor 180A, the capacitance between the electrodes changes, and the terminal device 100 determines the intensity of the pressure according to the change in capacitance. When a touch operation is applied to the display screen 194, the terminal device 100 detects the touch operation from the pressure sensor 180A. The terminal device 100 may also calculate the touched position from the detection signal of the pressure sensor 180A. In some embodiments, the touch operations that are applied to the same touch position but different touch operation intensities may correspond to different operation instructions. For example: when the touch operation with the touch operation intensity smaller than the first pressure threshold value acts on the short message application icon, executing an instruction for viewing the short message; and when the touch operation with the touch operation intensity larger than or equal to the first pressure threshold value acts on the short message application icon, executing an instruction of newly building the short message.
The gyro sensor 180B may be used to determine the motion attitude of the terminal device 100. In some embodiments, the angular velocity of terminal device 100 about three axes (i.e., the x-axis, y-axis, and z-axis) may be determined by gyroscope sensor 180B. The gyro sensor 180B may be used for photographing anti-shake. For example, when the shutter is pressed, the gyro sensor 180B detects the shake angle of the terminal device 100, calculates the distance to be compensated for by the lens module according to the shake angle, and allows the lens to counteract the shake of the terminal device 100 by a reverse movement, thereby achieving anti-shake. The gyro sensor 180B can also be used in scenes such as navigation and motion sensing games.
The air pressure sensor 180C is used to measure air pressure. In some embodiments, the terminal device 100 calculates an altitude from the barometric pressure measured by the barometric pressure sensor 180C, and assists in positioning and navigation.
The magnetic sensor 180D includes a Hall sensor. The terminal device 100 may detect the opening and closing of a flip cover or holster using the magnetic sensor 180D. In some embodiments, when the terminal device 100 is a flip phone, the terminal device 100 may detect the opening and closing of the flip cover according to the magnetic sensor 180D. The terminal device 100 may set automatic unlocking upon opening according to the detected open or closed state of the holster or of the flip cover.
The acceleration sensor 180E can detect the magnitude of acceleration of the terminal device 100 in various directions (generally, x-axis, y-axis, and z-axis). The magnitude and direction of gravity may be detected when the terminal device 100 is stationary. The acceleration sensor 180E may also be used to recognize the posture of the terminal device 100 as an input parameter for applications such as horizontal and vertical screen switching and pedometer.
The distance sensor 180F is used to measure a distance. The terminal device 100 may measure the distance by infrared or laser. In some embodiments, for example in a shooting scene, terminal device 100 may utilize range finding of range sensor 180F to achieve fast focus.
The proximity light sensor 180G may include, for example, a light-emitting diode (LED) and a photodetector, for example, a photodiode. The LED may be an infrared LED. The terminal device 100 emits infrared light outward through the LED. The terminal device 100 detects infrared reflected light from a nearby object using a photodiode. When the reflected light is detected, the terminal device 100 can determine that an object exists nearby. When the reflected light is not detected, the terminal device 100 can determine that there is no object nearby. The terminal device 100 may use the proximity light sensor 180G to detect whether the user holds the terminal device 100 close to the ear for conversation, so as to automatically turn off the screen to save power. The proximity light sensor 180G may also be used for automatic unlocking and automatic screen locking in a holster mode or a pocket mode.
The ambient light sensor 180L is used to sense the ambient light level. The terminal device 100 may adaptively adjust the brightness of the display screen 194 according to the perceived ambient light level. The ambient light sensor 180L may also be used to automatically adjust the white balance when taking a picture. The ambient light sensor 180L may also cooperate with the proximity light sensor 180G to detect whether the terminal device 100 is in a pocket, in order to prevent accidental touches.
The fingerprint sensor 180H is used to collect a fingerprint. The terminal device 100 can utilize the collected fingerprint characteristics to realize functions of unlocking, accessing an application lock, taking a picture, answering an incoming call and the like.
The temperature sensor 180J is used to detect temperature. In some embodiments, the terminal device 100 executes a temperature processing policy using the temperature detected by the temperature sensor 180J. For example, when the temperature reported by the temperature sensor 180J exceeds the threshold, the terminal device 100 performs a reduction in performance of a processor located near the temperature sensor 180J, so as to reduce power consumption and implement thermal protection. In other embodiments, the terminal device 100 heats the battery 142 when the temperature is below another threshold to avoid the terminal device 100 being abnormally shut down due to low temperature. In other embodiments, when the temperature is lower than a further threshold, the terminal device 100 performs boosting on the output voltage of the battery 142 to avoid abnormal shutdown due to low temperature.
The touch sensor 180K is also referred to as a touch device. The touch sensor 180K may be disposed on the display screen 194, and the touch sensor 180K and the display screen 194 form a touch screen, which is also referred to as a touch screen. The touch sensor 180K is used to detect a touch operation applied thereto or in the vicinity thereof. The touch sensor 180K may pass the detected touch operation to the application processor to determine the touch event type. Visual output associated with the touch operation may be provided through the display screen 194. In other embodiments, the touch sensor 180K may be disposed on the surface of the terminal device 100 at a different position from the display screen 194.
The bone conduction sensor 180M may acquire a vibration signal. In some embodiments, the bone conduction sensor 180M may acquire a vibration signal of the human vocal part vibrating the bone mass. The bone conduction sensor 180M may also contact the human pulse to receive the blood pressure pulsation signal. In some embodiments, the bone conduction sensor 180M may also be disposed in a headset, integrated into a bone conduction headset. The audio module 170 may analyze a voice signal based on the vibration signal of the bone mass vibrated by the sound part acquired by the bone conduction sensor 180M, so as to implement a voice function. The application processor can analyze heart rate information based on the blood pressure beating signal acquired by the bone conduction sensor 180M, so as to realize the heart rate detection function.
The keys 190 include a power-on key and a volume key. The keys 190 may be mechanical keys or touch keys. The terminal device 100 may receive a key input signal and implement a function related to the key input signal.
The motor 191 may generate vibrations. The motor 191 may be used for incoming call prompts as well as for touch feedback. The motor 191 may generate different vibration feedback effects for touch operations applied to different applications. The motor 191 may also produce different vibratory feedback effects for touch operations applied to different areas of the display screen 194. Different application scenarios (e.g., time reminders, received messages, alarms, and games) may correspond to different vibration feedback effects. The touch vibration feedback effect may also support customization.
Indicator 192 may be an indicator light that may be used to indicate a change in charge status and charge level, or may be used to indicate a message, missed call, and notification.
The SIM card interface 195 is used to connect a SIM card. The SIM card may be inserted into the SIM card interface 195 to make contact with the terminal device 100, or may be pulled out from the SIM card interface 195 to make separation from the terminal device 100. The terminal device 100 may support 1 or N SIM card interfaces, where N is a positive integer greater than 1. The same SIM card interface 195 may be inserted with multiple cards at the same time, which may be of the same or different types. The SIM card interface 195 may also be compatible with external memory cards. The terminal device 100 interacts with the network through the SIM card to implement functions such as communication and data communication. In some embodiments, the terminal device 100 employs an embedded SIM (eSIM) card, which may be embedded in the terminal device 100 and cannot be separated from the terminal device 100.
The hardware system of the terminal device 100 is described above in detail, and the software system of the terminal device 100 is described below. The software system may adopt a layered architecture, an event-driven architecture, a micro-core architecture, a micro-service architecture or a cloud architecture, and the software system of the terminal device 100 is exemplarily described in this embodiment by taking the layered architecture as an example.
As shown in fig. 2, the software system adopting the layered architecture is divided into a plurality of layers, and each layer has a clear role and division of labor. The layers communicate with each other through a software interface. In some embodiments, the software system may be divided into four layers, an application layer, an application framework layer, an Android Runtime (Android Runtime) and system library, and a kernel layer from top to bottom, respectively.
The application layer may include applications such as camera, gallery, calendar, phone, map, navigation, WLAN, Bluetooth, music, video, and messaging.
The Application framework layer provides an Application Programming Interface (API) and a Programming framework for the Application program of the Application layer. The application framework layer may include some predefined functions.
For example, the application framework layer includes a window manager, a content provider, a view system, a resource manager, a notification manager, a simultaneous localization and mapping (SLAM) pose calculation module, and a plane generation module; the application framework layer may also include a telephony manager.
The SLAM pose calculation module is used to output pose information and a sparse point cloud. The pose information refers to the pose of the camera of the terminal device, and the camera of the terminal device is used to acquire a video of a real scene. Feature points can be extracted from any image frame in the video according to the pose information of that frame, and the sparse point cloud is obtained through calculation.
The plane generation module is used for generating a virtual plane through algorithm fitting according to the sparse point cloud provided by the SLAM; when virtual content is added in a real scene, the placement position of the virtual content can be adjusted according to the virtual plane; for example, when the user clicks the screen/gesture operation to place the virtual content, the operation of the user may collide with the generated virtual plane to determine the placement position of the virtual content. It should be understood that the program instructions corresponding to the processing method for the augmented reality video provided in the embodiment of the present application may be executed in the SLAM pose calculation module and the plane generation module.
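As an illustrative sketch only (the patent does not specify the fitting algorithm), a virtual plane can be fitted to a sparse point cloud with a RANSAC-style procedure such as the following Python example; the function name, thresholds, and iteration count are assumptions.

```python
import numpy as np

def fit_plane_ransac(points, iterations=200, threshold=0.02, rng=None):
    """Fit a plane n·p + d = 0 to a sparse 3D point cloud (N x 3 array).

    Returns the best (normal, d) pair and the indices of its inlier points.
    """
    points = np.asarray(points, dtype=float)
    rng = rng or np.random.default_rng(0)
    best_plane, best_inliers = None, np.array([], dtype=int)
    for _ in range(iterations):
        # Sample three distinct points and build the plane through them.
        i0, i1, i2 = rng.choice(len(points), size=3, replace=False)
        normal = np.cross(points[i1] - points[i0], points[i2] - points[i0])
        norm = np.linalg.norm(normal)
        if norm < 1e-9:                    # degenerate (collinear) sample
            continue
        normal /= norm
        d = -normal.dot(points[i0])
        # Points closer to the plane than the threshold count as inliers.
        inliers = np.flatnonzero(np.abs(points @ normal + d) < threshold)
        if len(inliers) > len(best_inliers):
            best_plane, best_inliers = (normal, d), inliers
    return best_plane, best_inliers
```

The vertices of the fitted plane (point 0 … point n in the information of the virtual plane) could then be taken, for example, from the boundary of the inlier points projected onto the plane.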
The window manager is used for managing window programs. The window manager can obtain the size of the display screen and judge whether a status bar, a lock screen and a capture screen exist.
The content provider is used to store and retrieve data and make it accessible to applications. The data may include video, images, audio, calls made and answered, browsing history and bookmarks, and phone books.
The view system includes visual controls such as controls to display text and controls to display pictures. The view system may be used to build applications. The display interface may be composed of one or more views, for example, a display interface including a short message notification icon, and may include a view displaying text and a view displaying pictures.
The telephone manager is used to provide the communication functions of the terminal device 100, for example, management of the call state (connected or hung up).
The resource manager provides various resources for the application, such as localized strings, icons, pictures, layout files, and video files.
The notification manager enables an application to display notification information in the status bar, and can be used to convey notification-type messages that disappear automatically after a short stay without requiring user interaction. For example, the notification manager is used for download completion notifications and message reminders. The notification manager may also manage notifications that appear in the status bar at the top of the system in the form of a chart or scroll-bar text, such as notifications of applications running in the background, and notifications that appear on the screen in the form of a dialog window, such as prompting a text message in the status bar, sounding an alert tone, vibrating the electronic device, and flashing an indicator light.
The Android Runtime comprises a core library and a virtual machine. The Android runtime is responsible for scheduling and managing an Android system.
The core library comprises two parts: one part is the functional interfaces that the Java language needs to call, and the other part is the core libraries of Android.
The application layer and the application framework layer run in the virtual machine. The virtual machine executes the Java files of the application layer and the application framework layer as binary files. The virtual machine is used to perform functions such as object lifecycle management, stack management, thread management, security and exception management, and garbage collection.
The system library may include a plurality of functional modules, for example: a surface manager, media libraries, a three-dimensional graphics processing library (for example, OpenGL for embedded systems, OpenGL ES), and a 2D graphics engine (for example, SGL).
The surface manager is used for managing the display subsystem and providing fusion of the 2D layer and the 3D layer for a plurality of application programs.
The media library supports playback and recording of multiple audio formats, playback and recording of multiple video formats, and still image files. The media library may support a variety of audio and video coding formats, such as MPEG-4, H.264, Moving Picture Experts Group Audio Layer III (MP3), Advanced Audio Coding (AAC), Adaptive Multi-Rate (AMR), Joint Photographic Experts Group (JPG), and Portable Network Graphics (PNG).
The three-dimensional graphics processing library may be used to implement three-dimensional graphics drawing, image rendering, compositing, and layer processing.
The two-dimensional graphics engine is a drawing engine for 2D drawing.
The kernel layer is a layer between hardware and software. The kernel layer can comprise driving modules such as a display driver, a camera driver, an audio driver and a sensor driver.
The following exemplifies the workflow of the software system and the hardware system of the terminal device 100 in conjunction with displaying a photographing scene.
When a user performs a touch operation on the touch sensor 180K, a corresponding hardware interrupt is sent to the kernel layer, and the kernel layer processes the touch operation into an original input event, which includes information such as the touch coordinates and the timestamp of the touch operation. The original input event is stored in the kernel layer; the application framework layer acquires the original input event from the kernel layer, identifies the control corresponding to the original input event, and notifies the application (APP) corresponding to the control. For example, if the touch operation is a click operation and the APP corresponding to the control is the camera APP, after the camera APP is woken up by the click operation, it can call the camera driver of the kernel layer through the API, and the camera driver controls the camera 193 to shoot.
At present, when an AR video is recorded, the virtual content and the video of the real object cannot be well fused; in particular, when the user needs to interact with the virtual content in the shooting scene, the scene often has to be shot repeatedly, which is time-consuming and labor-intensive.
In view of this, the present application provides a processing method of an AR video, which acquires pose information corresponding to an original video when acquiring the original video; obtaining a virtual plane according to the pose information and the original video; when virtual content is added to an image frame of an original video, the virtual plane can be used as a reference plane, and the position of the virtual content in the original video can be adjusted according to the virtual plane, so that the virtual content can be better blended into the original video, and the video quality of an AR video is improved.
The following describes in detail a processing method of an augmented reality video provided by the present application with reference to fig. 3 to 21 by taking the terminal device 100 as an example.
FIG. 3 is a schematic diagram of an application scenario of the present application; as shown in fig. 3, the processing method of the AR video provided by the embodiment of the present application may be applied to the field of AR video; the method comprises the steps that an original video can be obtained, and a target video can be obtained through AR video processing; the original video may refer to a video of a real object shot by a user, and the target video may refer to an AR video obtained by adding virtual content to the original video.
For example, the processing method for the AR video provided by the embodiment of the present Application may also be run in an Application (APP) to perform editing of the AR video; for example, the AR video APP may execute the processing method of the AR video of the present application. Or, the processing method of the AR video provided by the embodiment of the present application may also be integrated in a camera of the terminal device; for example, an AR video mode may be selected in the setting of a camera of a terminal device, thereby implementing the processing method of an AR video provided in the embodiment of the present application; these two implementations are described in detail below.
The implementation mode is as follows: the AR video processing method is realized through an application program.
As shown in fig. 4, fig. 4 is a schematic flowchart of a processing method of an AR video provided by the embodiment of the present application; the processing method 200 includes steps S210 to S260, which are described in detail below.
And step S210, running the AR video APP.
For example, a user may click an AR video APP in a display interface of a terminal device; responding to the click operation of the user, the terminal equipment can run the AR video APP; as shown in fig. 5, fig. 5 illustrates a Graphical User Interface (GUI) of the terminal device, which may be a desktop 310 of the terminal device. When the terminal device detects that the user clicks the icon 320 of the AR video APP on the desktop 310, the AR video APP may be run, and another GUI as shown in fig. 6 is displayed; a shooting view finder 340 can be included on the display interface 330 shown in fig. 6, and a preview image can be displayed in real time in the shooting view finder 340; controls 350 for indicating a shot, as well as other shooting controls, may also be included on the shooting interface.
In one example, when the terminal device detects that a user clicks an icon of an AR video APP on a display interface, the AR video APP can be started, and the display interface of the AR video APP is displayed; a shooting view frame can be included on the display interface; for example, in the video recording mode, the shooting view frame may be a part of the screen, or may be the entire display screen. In a preview state, namely before the user opens the AR video APP and presses the shooting button, the preview image can be displayed in real time in the shooting view finder.
It should also be understood that the above description is by way of example of the AR video APP, and the application program name is not limited in this embodiment of the application.
And S220, acquiring the original video and pose information.
For example, as shown in fig. 7, the terminal device detects an operation of clicking a control 350 for shooting by the user, and starts recording an image displayed in the shooting finder.
It should be understood that the operation of the user instructing shooting may include pressing a shooting button, instructing the terminal device to shoot by voice, or otherwise instructing the terminal device to shoot. The foregoing is illustrative and does not limit the present application.
Exemplarily, the pose information may be used to represent the pose of a camera of the terminal device when acquiring the original video; the pose information may include attitude information and position information.
For example, the terminal device may acquire pose information corresponding to each frame of image through the gyro sensor 180B as shown in fig. 1.
And step S230, storing the pose information and the information of the virtual plane.
The stored pose information may refer to pose information corresponding to each image frame in the original video.
Exemplarily, feature point extraction can be performed on any image frame in an original video according to pose information of the image frame, and sparse point cloud is obtained through calculation; a virtual plane can be generated by algorithm fitting according to the sparse point cloud; when virtual content is added to the video of the real object, the placement position of the virtual content can be adjusted according to the virtual plane.
In the embodiment of the application, the pose information and the information of the virtual plane are stored, so that after the recording of the original video is finished, virtual content is added into the original video according to the pose information and the information of the virtual plane of the original video to generate a new AR video; because the pose information and the information of the virtual plane are stored, a user can edit an original video for multiple times to generate AR videos comprising different virtual contents respectively.
In one example, in the processing method of the AR video, the acquired three-dimensional attitude information may be represented by a quaternion, so that the ambiguity caused by representing the attitude with three parameters can be avoided.
A quaternion is formed by a real part plus three imaginary units i, j, and k; that is, a quaternion is a linear combination of 1, i, j, and k and can generally be expressed as a + bi + cj + dk, where a, b, c, and d are real numbers. The units i, j, and k represent rotations: the i rotation represents a rotation from the positive X-axis toward the positive Y-axis in the plane spanned by the X-axis and the Y-axis, the j rotation represents a rotation from the positive Z-axis toward the positive X-axis in the plane spanned by the Z-axis and the X-axis, and the k rotation represents a rotation from the positive Y-axis toward the positive Z-axis in the plane spanned by the Y-axis and the Z-axis.
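As a minimal illustration of this representation (an assumption for clarity, not the patent's own procedure), the following Python sketch converts three Euler angles into a unit quaternion a + bi + cj + dk under a Z-Y-X rotation convention:

```python
import numpy as np

def euler_to_quaternion(yaw, pitch, roll):
    """Convert Z-Y-X Euler angles (radians) into a unit quaternion (a, b, c, d).

    The four-parameter quaternion encodes the same rotation as the three
    angles but avoids the ambiguity (gimbal lock) of the three-parameter form.
    """
    cy, sy = np.cos(yaw / 2), np.sin(yaw / 2)
    cp, sp = np.cos(pitch / 2), np.sin(pitch / 2)
    cr, sr = np.cos(roll / 2), np.sin(roll / 2)
    a = cr * cp * cy + sr * sp * sy      # real part
    b = sr * cp * cy - cr * sp * sy      # coefficient of i
    c = cr * sp * cy + sr * cp * sy      # coefficient of j
    d = cr * cp * sy - sr * sp * cy      # coefficient of k
    return np.array([a, b, c, d])
```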
Illustratively, the terminal device receives an instruction of the user to instruct shooting; for example, when the user clicks video recording on the terminal device, the terminal device can start the initialization of the pose calculation. Before the initialization succeeds, the pose can be expressed as (position x/y/z, rotation quaternion), namely (0, 0, 0, 0, 0, 0, 0), and the information of the virtual plane is (number 0). When the initialization succeeds, the pose of the designated image frame (the initialization start frame) is expressed as (0, 0, 0, 0, 0, 0, 0), and the information of the virtual plane is expressed as (number x, plane number 0, number of points of plane 0 being n, position (X1, Y1, Z1) of point 0, …, position (Xn, Yn, Zn) of point n).
The number x represents the total number of virtual planes, that is, the total number of image frames included in the video; plane number 0 may be used to represent the first virtual plane among the plurality of virtual planes; the number of points n of plane 0 may be used to indicate that the first virtual plane includes n vertices; the position (X1, Y1, Z1) of point 0 indicates the position information of vertex 0 included in the first virtual plane; and the position (Xn, Yn, Zn) of point n indicates the position information of vertex n included in the first virtual plane.
It should be understood that the information of the virtual plane may include position information of all vertices included in the virtual plane.
For example, during video recording, the pose information corresponding to the acquired current image frame may be represented as (x, y, z, q0, q1, q2, q3), and the information of the virtual plane may be expressed as (number x, plane number A, number of points of plane A being n, position (X1, Y1, Z1) of point 0, …, position (Xq, Yq, Zq) of point q).
Here, x, y, and z respectively represent the coordinates, on the x-axis, y-axis, and z-axis, of the camera acquiring the current image frame; q0, q1, q2, and q3 represent the rotation quaternion, which can also be expressed, for example, as Euler angles such as a pitch angle, an azimuth angle, and a rotation angle; the number x represents the total number of planes; plane number A may be used to represent the identifier of the virtual plane corresponding to the current image frame; the number of points n of plane A indicates that the virtual plane corresponding to the current image frame includes n vertices; the position (X1, Y1, Z1) of point 0 may be used to represent the position information of vertex 0 included in the virtual plane corresponding to the current image frame; and the position (Xn, Yn, Zn) of point n is used to represent the position information of vertex n included in the virtual plane corresponding to the current image frame.
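The per-frame record described above can be pictured as a small data structure. The following Python sketch is purely illustrative; the field names (frame_num, position, rotation, planes) are assumptions and not identifiers used by the patent.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Vec3 = Tuple[float, float, float]

@dataclass
class VirtualPlane:
    plane_id: int                                        # plane number A
    vertices: List[Vec3] = field(default_factory=list)   # point 0 … point n

@dataclass
class FramePose:
    frame_num: int
    position: Vec3                                       # camera position (x, y, z)
    rotation: Tuple[float, float, float, float]          # rotation quaternion (q0, q1, q2, q3)
    planes: List[VirtualPlane] = field(default_factory=list)
```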
In one example, one image frame in the original video may be acquired; extracting characteristic points of the image frame according to the pose information of the image frame, and calculating to obtain sparse point cloud; a virtual plane can be generated in a fitting mode according to the sparse point cloud information; when virtual content is added to a video, the position of the virtual content added to the video can be adjusted according to the virtual plane.
For example, when the user clicks the screen/gesture operation to place the virtual content, the operation of the user may collide with the generated virtual plane to determine the placement position of the virtual content.
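The "collision" between the user's operation and the generated virtual plane amounts to a ray–plane intersection test. The sketch below is a simplified illustration under the assumption that the tap has already been unprojected into a world-space ray using the stored camera pose; it is not the patent's exact placement logic.

```python
import numpy as np

def hit_test(ray_origin, ray_dir, plane_vertices):
    """Intersect a tap ray with a virtual plane to obtain a placement point.

    `plane_vertices` are the plane's vertices in world coordinates; the ray is
    assumed to have been unprojected from the tap position using the stored
    camera pose. Returns the 3D hit point, or None if there is no hit.
    """
    o = np.asarray(ray_origin, dtype=float)
    r = np.asarray(ray_dir, dtype=float)
    v = np.asarray(plane_vertices, dtype=float)
    normal = np.cross(v[1] - v[0], v[2] - v[0])     # plane normal from three vertices
    normal /= np.linalg.norm(normal)
    denom = normal.dot(r)
    if abs(denom) < 1e-6:                           # ray parallel to the plane
        return None
    t = normal.dot(v[0] - o) / denom
    if t < 0:                                       # plane is behind the camera
        return None
    return o + t * r
```

A full implementation would additionally check that the hit point lies inside the polygon formed by the plane's vertices before using it as the placement position.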
In the embodiment of the application, after the pose information and the information of the virtual plane are acquired, the terminal device may store the pose information and the information of the virtual plane.
In one example, the custom information includes the above pose information and information of the virtual plane, and the terminal device may save the custom information as an independent binary file (bin).
For example, the original video and the custom information corresponding to the original video may be stored in the same directory.
For example, the customized information corresponding to the original video may be saved in the terminal device with the same name as the original video.
For example, the customized information corresponding to the original video can be stored in the terminal device by using the frame number of one image frame as the identifier.
Illustratively, the custom information corresponding to each image frame in the original video may be saved as a separate bin file according to the following data format:
frame number: frame num, unsigned int32;
pose information: (data 1, data 2, data 3, data 4, data 5, data 6, data 7); wherein, the data 1 to 7 can be data in float format;
information of the virtual plane: (num: unsigned int32; planeNum0: unsigned int32; planeNumPoint: unsigned int32; point0 (float, float, float) … point N (float, float, float) … planeNumN …);
for example, when an original video is edited, the original video and the bin file may be loaded at the same time; and synchronously aligning the image frame in the original video with the custom information corresponding to the image frame according to the frame number.
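A minimal sketch of writing and reading this bin layout is shown below, assuming little-endian byte order and the field order listed above; the exact byte order and the helper names are assumptions for illustration.

```python
import struct

def pack_frame_record(frame_num, pose, planes):
    """Serialize one frame's custom information in the bin layout above.

    pose   : 7 floats (x, y, z, q0, q1, q2, q3)
    planes : list of (plane_id, [(x, y, z), ...]) entries
    """
    buf = struct.pack("<I", frame_num)                 # frame number: unsigned int32
    buf += struct.pack("<7f", *pose)                   # pose information: 7 floats
    buf += struct.pack("<I", len(planes))              # num: total number of planes
    for plane_id, vertices in planes:
        buf += struct.pack("<II", plane_id, len(vertices))  # planeNum, planeNumPoint
        for x, y, z in vertices:
            buf += struct.pack("<3f", x, y, z)         # pointN: (float, float, float)
    return buf

def unpack_frame_records(data):
    """Read the records back, keyed by frame number for alignment with video frames."""
    records, offset = {}, 0
    while offset < len(data):
        frame_num, = struct.unpack_from("<I", data, offset); offset += 4
        pose = struct.unpack_from("<7f", data, offset); offset += 28
        num_planes, = struct.unpack_from("<I", data, offset); offset += 4
        planes = []
        for _ in range(num_planes):
            plane_id, n_pts = struct.unpack_from("<II", data, offset); offset += 8
            pts = [struct.unpack_from("<3f", data, offset + 12 * i) for i in range(n_pts)]
            offset += 12 * n_pts
            planes.append((plane_id, pts))
        records[frame_num] = (pose, planes)
    return records
```

Loading the bin file into a dictionary keyed by frame number makes it straightforward to align each image frame of the original video with its custom information during editing.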
In an example, the custom information may include the pose information and the information of the virtual plane, and the terminal device may store the custom information in the supplemental enhancement information (SEI) of the video code stream corresponding to the original video.
For example, the following information may be stored in the SEI information of H.264/H.265 when video compression coding is performed:
pose information: (float, float, float, float, float, float, float);
information of the virtual plane: (num: unsigned int32; planeNum0: unsigned int32; planeNumPoint: unsigned int32; point0 (float, float, float) … pointN (float, float, float) … planeNumN …).
In the case that the custom information is stored in the SEI information of the video compression coding, when the edited video is decoded in step S250, the custom information may be decoded according to the above format.
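For the SEI route, a "user data unregistered" SEI message (payload type 5 in H.264) is one possible carrier for such custom information. The sketch below shows the general message layout only; the UUID is a placeholder, emulation-prevention bytes and muxing into the encoder output are omitted, and the patent does not specify this particular payload type.

```python
import uuid

APP_UUID = uuid.UUID("00000000-0000-0000-0000-000000000000")  # placeholder UUID

def build_user_data_sei(payload: bytes) -> bytes:
    """Wrap a packed pose/plane record in an H.264 'user data unregistered' SEI.

    Emulation-prevention bytes and muxing into the encoder output are omitted,
    so this only illustrates the SEI message layout (payload type 5).
    """
    body = APP_UUID.bytes + payload
    sei = bytearray([0x05])               # payload type: user data unregistered
    size = len(body)
    while size >= 255:                    # payload size uses 255-byte escape coding
        sei.append(0xFF)
        size -= 255
    sei.append(size)
    sei += body
    sei.append(0x80)                      # rbsp_trailing_bits
    return b"\x00\x00\x00\x01\x06" + bytes(sei)   # Annex-B start code + SEI NAL header
```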
In the embodiment of the present application, in order to reduce the storage space of the terminal device occupied by storing the pose information and the information of the virtual plane, at least one of the following ways may be adopted to perform compression processing on the custom information:
the pose information may be saved as the difference between the current image frame and the previous image frame; the plane number of the virtual plane may be saved as an unsigned character (unsigned char); for the description of the vertices of a virtual plane, a horizontal plane may retain the Z-axis information of only one point and delete the Z-axis information of the other points, and a vertical plane may retain the Y-axis information of only one point and delete the Y-axis information of the other points; the positions of the vertices may be described in float16; or, when the information of the virtual plane is stored, only the planes within the current field of view may be stored.
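A simplified sketch of some of these compression ideas (delta-coded pose, one-byte plane identifiers, float16 vertices, and keeping only visible planes) is given below; the function signature and the in-memory container are assumptions for illustration.

```python
import numpy as np

def compress_frame(pose, prev_pose, planes, visible_ids):
    """Shrink one frame's custom information before it is stored.

    Applies some of the ideas listed above in a simplified form: a delta-coded
    pose, one-byte plane identifiers, float16 vertex coordinates, and keeping
    only the planes inside the current field of view.
    """
    # 1. Store the pose as the difference from the previous image frame.
    pose_delta = np.asarray(pose, dtype=np.float32) - np.asarray(prev_pose, dtype=np.float32)

    compressed_planes = []
    for plane_id, vertices in planes:
        if plane_id not in visible_ids:                  # 4. skip planes outside the view
            continue
        verts = np.asarray(vertices, dtype=np.float16)   # 3. half-precision vertices
        compressed_planes.append((np.uint8(plane_id), verts))  # 2. one-byte plane number
    return pose_delta, compressed_planes
```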
In the embodiment of the application, on one hand, the original video is obtained through the AR video APP, so that the pose information and the information of a virtual plane of the original video can be generated and stored when the video is recorded; on the other hand, after the recording of the original video is finished, each image frame in the original video can be edited; such as adding virtual content.
And step S240, finishing the recording of the original video.
For example, as shown in fig. 8, the terminal device detects that the user clicks the shooting control 350 again, and ends recording of the video this time; for example, the video is recorded for 20 seconds.
And S250, opening a visual interface of the virtual plane, and editing the original video.
It should be understood that when any image frame in the original video is edited, the terminal device may call the stored user-defined information corresponding to the image frame; namely, the pose information and the plane information of the image frame are called.
For example, any image frame at the 8th second of the original video is extracted. The display interface 330 shown in fig. 9 may further include an editing option 360; after the terminal device detects that the user clicks the editing option 360, the terminal device may display the interface of the editing mode, as shown in fig. 10. After detecting that the user clicks the AR content selection 361 on the editing mode interface, the terminal device displays the display interface shown in fig. 11. The display interface of fig. 11 further includes a display plane option 362; when the terminal device detects that the user clicks the display plane option 362, the generated virtual plane 363 is displayed in the display interface, see fig. 12. In the embodiment of the application, a visualized plane for placing virtual content can be provided for the user on the display interface of the terminal device; for example, in the process of adding virtual content, the virtual plane 363 may be displayed on the display interface; when the user clicks the screen or performs a gesture operation to place the virtual content, the operation of the user collides with the virtual plane 363, thereby determining the placement position of the virtual content, as shown in fig. 12.
It is to be understood that when editing the virtual content, such as adjusting the position of the virtual content, a virtual plane 363 may be displayed in the interface; after editing is completed, the virtual plane 363 does not appear in the AR video; the virtual plane 363 can serve as a reference plane for the user to determine the location of the virtual content to be added to the video.
And step S260, generating the AR video comprising the virtual content.
Illustratively, a user may edit each image frame in the original video; for example, virtual content may be added to each image frame, and the position information of the virtual content in each image frame may be adjusted; thereby generating an AR video with virtual content.
In one example, a user may play an original video, and clicking a pause key by the user may extract and edit a current image frame, i.e., add virtual content to the current image frame; when the user clicks the play button again, the current image frame editing is completed.
In the embodiment of the application, when the original video is acquired, the pose information corresponding to the original video can be acquired; obtaining a virtual plane according to the pose information and the original video; when virtual content is added to an image frame of an original video, the virtual plane can be used as a reference plane, and the position of the virtual content in the original video can be adjusted according to the virtual plane, so that the virtual content can be better blended into the original video, and the video quality of an AR video is improved.
The implementation mode two is as follows: the AR video processing method is integrated in the camera mode of the terminal device.
As shown in fig. 13, fig. 13 is a schematic flowchart of a processing method of an AR video provided by an embodiment of the present application; the processing method 400 includes steps S410 to S470, which are described in detail below.
And step S410, operating the camera of the terminal equipment.
For example, the terminal device detects an operation of the user clicking the camera; in response to the click operation of the user, the terminal device may run the camera.
FIG. 14 illustrates a GUI of a terminal device, which may be a desktop 510 of the terminal device; when the terminal device detects an operation of clicking an icon 520 of a camera on the desktop 510 by the user, the camera may be operated to display another GUI as shown in fig. 15, and the GUI may be a display interface 530 of the camera; the display interface 530 may include a shooting view finder 540, a control 550 for indicating shooting, and other shooting controls, wherein the preview image may be displayed in real time within the shooting view finder 540.
Step S420, the AR shooting mode is selected.
For example, the terminal device may detect an operation of the user indicating the AR shooting mode. The AR shooting mode may refer to a shooting mode in which virtual content can be added to the original video during processing.
As shown in fig. 16, the shooting interface further includes a setting option 560; after the terminal device detects that the user clicks the setting option 560, the terminal device displays the setting mode interface, as shown in fig. 17; after the terminal device detects that the user clicks the option 561 for indicating AR video on the setting mode interface, the terminal device enters the AR shooting mode.
And step S430, acquiring the original video and pose information.
For example, as shown in fig. 17, the terminal device detects an operation of clicking a control 550 for shooting by the user, and starts recording an image displayed in the shooting finder.
It should be understood that the operation of the user instructing shooting may include pressing a shooting button, instructing the terminal device to shoot by voice, or otherwise instructing the terminal device to shoot; the foregoing is illustrative and does not limit the present application.
For example, the pose information may be used to represent the pose of a camera of the terminal device when acquiring the original video; the pose information may include attitude information and position information.
For example, the terminal device may acquire pose information corresponding to each frame of image through the gyro sensor 180B as shown in fig. 1.
And step S440, storing the pose information and the information of the virtual plane.
The stored pose information may refer to pose information corresponding to each image frame in the original video.
Exemplarily, feature point extraction can be performed on any image frame in an original video according to pose information of the image frame, and sparse point cloud is obtained through calculation; a virtual plane can be generated by algorithm fitting according to the sparse point cloud; when virtual content is added to a real scene, the placement position of the virtual content can be adjusted according to the virtual plane.
In the embodiment of the application, the pose information and the information of the virtual plane are stored, so that after the recording of the original video is finished, virtual content is added into the original video according to the pose information and the information of the virtual plane of the original video to generate a new AR video; because the pose information and the information of the virtual plane are stored, a user can edit an original video for multiple times to generate AR videos containing different virtual contents respectively.
In one example, in the processing method of the AR video, the acquired three-dimensional attitude information may be represented by a quaternion, so that the ambiguity caused by representing the attitude with three parameters can be avoided.
A quaternion is formed by a real part plus three imaginary units i, j, and k; that is, a quaternion is a linear combination of 1, i, j, and k and can generally be expressed as a + bi + cj + dk, where a, b, c, and d are real numbers. The units i, j, and k represent rotations: the i rotation represents a rotation from the positive X-axis toward the positive Y-axis in the plane spanned by the X-axis and the Y-axis, the j rotation represents a rotation from the positive Z-axis toward the positive X-axis in the plane spanned by the Z-axis and the X-axis, and the k rotation represents a rotation from the positive Y-axis toward the positive Z-axis in the plane spanned by the Y-axis and the Z-axis.
Illustratively, the terminal device receives an instruction of the user to instruct shooting; for example, when the user clicks video recording on the terminal device, the terminal device can start the initialization of the pose calculation. Before the initialization succeeds, the pose can be expressed as (position x/y/z, rotation quaternion), namely (0, 0, 0, 0, 0, 0, 0), and the information of the virtual plane is (number 0). When the initialization succeeds, the pose of the designated image frame (the initialization start frame) is expressed as (0, 0, 0, 0, 0, 0, 0), and the information of the virtual plane is expressed as (number x, plane number 0, number of points of plane 0 being n, position (X1, Y1, Z1) of point 0, …, position (Xn, Yn, Zn) of point n).
The number x represents the total number of virtual planes, that is, the total number of image frames included in the video; plane number 0 may be used to represent the first virtual plane among the plurality of virtual planes; the number of points n of plane 0 may be used to indicate that the first virtual plane includes n vertices; the position (X1, Y1, Z1) of point 0 indicates the position information of vertex 0 included in the first virtual plane; and the position (Xn, Yn, Zn) of point n indicates the position information of vertex n included in the first virtual plane.
It should be understood that the information of the virtual plane may include position information of all vertices included in the virtual plane.
For example, during video recording, the pose information corresponding to the acquired current image frame may be represented as (x, y, z, q0, q1, q2, q3), and the information of the virtual plane may be expressed as (number x, plane number A, number of points of plane A being n, position (X1, Y1, Z1) of point 0, …, position (Xq, Yq, Zq) of point q).
Here, x, y, and z respectively represent the coordinates, on the x-axis, y-axis, and z-axis, of the camera acquiring the current image frame; q0, q1, q2, and q3 represent the rotation quaternion, which can also be expressed, for example, as Euler angles such as a pitch angle, an azimuth angle, and a rotation angle; the number x represents the total number of planes; plane number A may be used to represent the identifier of the virtual plane corresponding to the current image frame; the number of points n of plane A indicates that the virtual plane corresponding to the current image frame includes n vertices; the position (X1, Y1, Z1) of point 0 may be used to represent the position information of vertex 0 included in the virtual plane corresponding to the current image frame; and the position (Xn, Yn, Zn) of point n is used to represent the position information of vertex n included in the virtual plane corresponding to the current image frame.
In one example, one image frame in the original video may be acquired; extracting characteristic points of the image frame according to the pose information of the image frame, and calculating to obtain sparse point cloud; fitting and generating a virtual plane according to the sparse point cloud information; when virtual content is added to a video, the position of the virtual content added to the video can be adjusted according to the virtual plane.
For example, when the user clicks the screen/gesture operation to place the virtual content, the operation of the user and the generated virtual plane may collide to determine the placement position of the virtual content.
In the embodiment of the application, after the pose information and the information of the virtual plane are acquired, the terminal device may store the pose information and the information of the virtual plane.
In one example, the custom information includes the above pose information and information of the virtual plane, and the terminal device may save the custom information as an independent binary file (bin).
For example, the original video and the custom information corresponding to the original video may be stored in the same directory.
For example, the customized information corresponding to the original video may be saved in the terminal device with the same name as the original video.
For example, the customized information corresponding to the original video can be stored in the terminal device by using the frame number of one image frame as the identifier.
Illustratively, the custom information corresponding to each image frame in the original video may be saved as a separate bin file according to the following data format:
frame number: frame num, unsigned int32;
pose information: (data 1, data 2, data 3, data 4, data 5, data 6, data 7); wherein, the data 1 to 7 can be data in float format;
information of the virtual plane: (num: unsigned int32; planeNum0: unsigned int32; planeNumPoint: unsigned int32; point0 (float, float, float) … pointN (float, float, float) … planeNumN …);
for example, when an original video is edited, the original video and the bin file may be loaded at the same time; and synchronously aligning the image frame in the original video with the custom information corresponding to the image frame according to the frame number.
In an example, the custom information may include the pose information and the information of the virtual plane, and the terminal device may store the custom information into the supplemental enhancement information in the video code stream corresponding to the original video.
For example, the following information may be stored in the SEI information of h.264/h.265 when video compression encoding is performed:
pose information: (float, float, float, float, float, float, float);
information of the virtual plane: (num: unsigned int32; planeNum0: unsigned int32; planeNumPoint: unsigned int32; point0 (float, float, float) … pointN (float, float, float) … planeNumN …).
In the case that the custom information is stored in the SEI information of the video compression coding, when the edited video is decoded in step S460, the custom information may be decoded according to the above format.
In an embodiment of the present application, in order to reduce a storage space of the terminal device occupied by storing the pose information and the information of the virtual plane, at least one of the following manners may be adopted to perform compression processing on the custom information:
the pose information may be saved as the difference between the current image frame and the previous image frame; the plane number of the virtual plane may be saved as an unsigned character (unsigned char); for the description of the vertices of a virtual plane, a horizontal plane may retain the Z-axis information of only one point and delete the Z-axis information of the other points, and a vertical plane may retain the Y-axis information of only one point and delete the Y-axis information of the other points; the positions of the vertices may be described in float16; or, when the information of the virtual plane is stored, only the planes within the current field of view may be stored.
And step S450, finishing recording the original video.
For example, as shown in fig. 19, the terminal device detects that the user clicks the control 550 for shooting again, and ends recording of the video this time; for example, the video is recorded for 20 seconds.
And step S460, editing the original video.
It should be understood that when any image frame in the original video is edited, the terminal device may call the stored user-defined information corresponding to the image frame; namely, the pose information and the plane information of the image frame are called.
For example, the original video is edited through a visualization interface of the virtual plane; any image frame at the 8th second of the original video can be extracted. The display interface shown in fig. 20 may further include a display plane option 570; when the terminal device detects that the user clicks the display plane option 570, the generated virtual plane 562 may be displayed in the display interface, as shown in fig. 21.
For example, during the process of adding virtual content by the user, a virtual plane 562 may be displayed on the display interface; when the user clicks the screen/gesture operation to place the virtual content, the user's operation collides with the virtual plane 562, thereby determining the placement position of the virtual content.
It should be appreciated that the virtual plane 562 may be displayed in the interface when editing the virtual content, such as adjusting the position of the virtual content; after editing is completed, the virtual plane 562 does not appear in the AR video; the virtual plane 562 is used for the user to determine the location of the virtual content addition in the video.
And step S470, generating the AR video comprising the virtual content.
Illustratively, a user may edit each image frame in the original video; for example, virtual content may be added to each image frame, and the position information of the virtual content in each image frame may be adjusted; thereby generating an AR video with virtual content.
In one example, a user may play an original video, and clicking a pause key by the user may extract and edit a current image frame, i.e., add virtual content to the current image frame; when the user clicks the play button again, the current image frame editing is completed.
In the embodiment of the application, when the original video is acquired, the pose information corresponding to the original video can be acquired; obtaining a virtual plane according to the pose information and the original video; when virtual content is added into an image frame of an original video, the virtual plane can be used as a reference plane, and the position of the virtual content in the original video can be adjusted according to the virtual plane, so that the virtual content can be better integrated into the original video, and the video quality of an AR video is improved.
It is to be understood that the above description is intended to assist those skilled in the art in understanding the embodiments of the present application and is not intended to limit the embodiments of the present application to the particular values or particular scenarios illustrated. It will be apparent to those skilled in the art from the foregoing description that various equivalent modifications or changes may be made, and such modifications or changes are intended to fall within the scope of the embodiments of the present application.
The processing method of the AR video according to the embodiment of the present application is described in detail above with reference to fig. 1 to 21, and the device embodiment of the present application is described in detail below with reference to fig. 22 and 23. It should be understood that the apparatus in this embodiment of the present application may perform the processing method of the AR video in the foregoing embodiment of the present application, that is, the following specific working processes of various products, and reference may be made to corresponding processes in the foregoing method embodiments.
Fig. 22 is a schematic structural diagram of an augmented reality video processing apparatus provided in the present application. The processing device 600 comprises an acquisition unit 610 and a processing unit 620.
The acquiring unit 610 acquires an original video and pose information, wherein the original video is used for representing a video of a real object, and the pose information is used for representing a pose when a terminal device acquires the original video; the processing unit 620 is configured to generate a virtual plane according to the original video and the pose information, where the virtual plane is used to determine position information for adding virtual content in the original video; and adding the virtual content in the original video according to the virtual plane to generate an AR video.
Optionally, as an embodiment, the pose information includes three-dimensional pose information, and the processing unit 620 is further configured to:
and expressing the three-dimensional attitude information by quaternion.
Optionally, as an embodiment, the processing unit 620 is specifically configured to:
extracting feature points of the image frame according to pose information of the image frame in the original video;
and generating the virtual plane according to the characteristic points.
Optionally, as an embodiment, the processing unit 620 is further configured to:
and saving the pose information and the information of the virtual plane.
Optionally, as an embodiment, the processing unit 620 is specifically configured to:
and storing the pose information and the information of the virtual plane in a binary file.
Optionally, as an embodiment, the processing unit 620 is specifically configured to:
and storing the pose information and the information of the virtual plane in the supplementary enhancement information corresponding to the original video.
Optionally, as an embodiment, the processing unit 620 is further configured to:
and compressing the stored pose information and the information of the virtual plane.
Optionally, as an embodiment, the processing unit 620 is specifically configured to:
and after the original video is recorded, adding the virtual content in the original video according to the virtual plane to generate the AR video.
Optionally, as an embodiment, the virtual plane includes a first virtual plane, where the first virtual plane refers to a virtual plane corresponding to a first image frame, and the first image frame is any one image frame in the original video;
the information of the first virtual plane includes a total number of image frames, an identifier of the first virtual plane, a number of vertices included in the first virtual plane, and position information of each vertex included in the first virtual plane, where the total number refers to a total number of image frames included in the original video.
The processing device 600 is embodied in the form of a functional unit. The term "unit" herein may be implemented in software and/or hardware, and is not particularly limited thereto.
For example, a "unit" may be a software program, a hardware circuit, or a combination of both that implement the above-described functions. The hardware circuitry may include an Application Specific Integrated Circuit (ASIC), an electronic circuit, a processor (e.g., a shared processor, a dedicated processor, or a group of processors) and memory that execute one or more software or firmware programs, a combinational logic circuit, and/or other suitable components that support the described functionality.
Thus, the units of each example described in the embodiments of the present application can be implemented in electronic hardware, or a combination of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
Fig. 23 shows a schematic structural diagram of an electronic device provided in the present application. The dashed lines in fig. 23 indicate that the unit or the module is optional. The electronic device 700 may be used to implement the processing methods described in the method embodiments above.
The electronic device 700 includes one or more processors 701, and the one or more processors 701 may support the electronic device 700 to implement the methods in the method embodiments. The processor 701 may be a general purpose processor or a special purpose processor. For example, the processor 701 may be a Central Processing Unit (CPU), a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA), or other programmable logic device, such as a discrete gate, a transistor logic device, or a discrete hardware component.
The processor 701 may be used to control the electronic device 700, execute software programs, and process data of the software programs. The electronic device 700 may further include a communication unit 705 to enable input (reception) and output (transmission) of signals.
For example, the electronic device 700 may be a chip and the communication unit 705 may be an input and/or output circuit of the chip, or the communication unit 705 may be a communication interface of the chip, which may be a component of a terminal device or other electronic device.
Also for example, the electronic device 700 may be a terminal device and the communication unit 705 may be a transceiver of the terminal device, or the communication unit 705 may be a transceiver circuit of the terminal device.
The electronic device 700 may include one or more memories 702, on which programs 704 are stored, and the programs 704 may be executed by the processor 701, and generate instructions 703, so that the processor 701 executes the processing method of the AR video described in the above method embodiment according to the instructions 703.
Optionally, data may also be stored in the memory 702. Alternatively, the processor 701 may also read data stored in the memory 702, the data may be stored at the same memory address as the program 704, or the data may be stored at a different memory address from the program 704.
The processor 701 and the memory 702 may be provided separately or integrated together; for example, on a System On Chip (SOC) of the terminal device.
For example, the memory 702 may be configured to store a program 704 related to the processing method of the AR video provided in the embodiment of the present application, and the processor 701 may be configured to call the program 704 related to the processing method of the AR video stored in the memory 702 when the AR video is edited, and perform the processing of the AR video of the embodiment of the present application; for example, an original video and pose information are obtained, the original video is used for representing a video of a real object, and the pose information is used for representing a pose of a terminal device when the original video is obtained; the processing unit is used for generating a virtual plane according to the original video and the pose information, and the virtual plane is used for determining position information of virtual content added in the original video; and adding the virtual content in the original video according to the virtual plane to generate an AR video.
The present application also provides a computer program product, which when executed by the processor 701 implements the processing method according to any of the method embodiments of the present application.
The computer program product may be stored in the memory 702, for example, as the program 704, and the program 704 is finally converted into an executable object file capable of being executed by the processor 701 through processes such as preprocessing, compiling, assembling and linking.
The present application also provides a computer-readable storage medium having stored thereon a computer program which, when executed by a computer, implements the method of any of the method embodiments of the present application. The computer program may be a high-level language program or an executable object program.
Optionally, the computer-readable storage medium is, for example, the memory 702. The memory 702 may be a volatile memory or a nonvolatile memory, or may include both volatile and nonvolatile memory. The nonvolatile memory may be a read-only memory (ROM), a programmable ROM (PROM), an erasable PROM (EPROM), an electrically erasable PROM (EEPROM), or a flash memory. The volatile memory may be a random access memory (RAM), which is used as an external cache. By way of example but not limitation, many forms of RAM are available, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), and direct rambus RAM (DR RAM).
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes and the generated technical effects of the above-described apparatuses and devices may refer to the corresponding processes and technical effects in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present application, the disclosed system, apparatus and method can be implemented in other ways. For example, some features of the method embodiments described above may be omitted, or not performed. The above-described embodiments of the apparatus are merely exemplary, the division of the unit is only one logical function division, and there may be other division ways in actual implementation, and a plurality of units or components may be combined or integrated into another system. In addition, the coupling between the units or the coupling between the components may be direct coupling or indirect coupling, and the coupling includes electrical, mechanical or other connections.
It should be understood that, in the various embodiments of the present application, the sequence numbers of the processes do not mean the execution sequence, and the execution sequence of the processes should be determined by the functions and the inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present application.
Additionally, the terms "system" and "network" are often used interchangeably herein. The term "and/or" herein is only one kind of association relationship describing the association object, and means that there may be three kinds of relationships, for example, a and/or B, and may mean: a exists alone, A and B exist simultaneously, and B exists alone. In addition, the character "/" herein generally indicates that the former and latter related objects are in an "or" relationship.
In short, the above description is only a preferred embodiment of the present disclosure, and is not intended to limit the scope of the present disclosure. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (13)

1. A processing method of an Augmented Reality (AR) video is characterized by comprising the following steps:
acquiring an original video and pose information, wherein the original video is used for representing a video of a real object, and the pose information is used for representing a pose when a terminal device acquires the original video;
generating a virtual plane according to the original video and the pose information, wherein the virtual plane is used for determining the position of adding virtual content in the original video;
and adding the virtual content in the original video according to the virtual plane to generate an AR video.
2. The processing method of claim 1, wherein the pose information comprises three-dimensional pose information, further comprising:
and expressing the three-dimensional attitude information by quaternion.
3. The processing method according to claim 1 or 2, wherein the generating information of a virtual plane from the original video and the pose information comprises:
extracting characteristic points of the image frames according to pose information of the image frames in the original video;
and generating the virtual plane according to the characteristic points.
4. The process of any one of claims 1 to 3, further comprising:
and storing the pose information and the information of the virtual plane.
5. The processing method of claim 4, wherein the saving the pose information and the information of the virtual plane comprises:
and storing the pose information and the information of the virtual plane in a binary file.
6. The processing method of claim 4, wherein the saving the pose information and the information of the virtual plane comprises:
and storing the pose information and the information of the virtual plane in the supplementary enhancement information corresponding to the original video.
7. The processing method of any of claims 4 to 6, further comprising:
and compressing the stored pose information and the information of the virtual plane.
8. The processing method according to any one of claims 1 to 7, wherein said adding the virtual content to the original video according to the information of the virtual plane to generate an AR video comprises:
and after the original video is recorded, adding the virtual content in the original video according to the virtual plane to generate the AR video.
9. The processing method according to any one of claims 4 to 8, wherein the virtual plane comprises a first virtual plane, the first virtual plane refers to a virtual plane corresponding to a first image frame, the first image frame is any one image frame in the original video;
the information of the first virtual plane includes a total number of image frames, an identification of the first virtual plane, a number of vertices included in the first virtual plane, and position information of each vertex included in the first virtual plane, where the total number refers to a total number of image frames included in the original video.
10. An electronic device, characterized in that the electronic device comprises: one or more processors, memory, and a display screen; the memory coupled with the one or more processors, the memory for storing computer program code, the computer program code comprising computer instructions that the one or more processors invoke to cause the electronic device to perform the processing method of any of claims 1 to 9.
11. A chip system, characterized in that the chip system is applied to an electronic device and comprises one or more processors configured to invoke computer instructions to cause the electronic device to perform the processing method according to any one of claims 1 to 9.
12. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program which, when executed by a processor, causes the processor to perform the processing method of any one of claims 1 to 9.
13. A computer program product, characterized in that the computer program product comprises computer program code which, when executed by a processor, causes the processor to perform the processing method of any one of claims 1 to 9.
CN202110831693.9A 2021-07-22 2021-07-22 Processing method of augmented reality video and electronic equipment Active CN115686182B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202110831693.9A CN115686182B (en) 2021-07-22 2021-07-22 Processing method of augmented reality video and electronic equipment
PCT/CN2022/089308 WO2023000746A1 (en) 2021-07-22 2022-04-26 Augmented reality video processing method and electronic device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110831693.9A CN115686182B (en) 2021-07-22 2021-07-22 Processing method of augmented reality video and electronic equipment

Publications (2)

Publication Number Publication Date
CN115686182A true CN115686182A (en) 2023-02-03
CN115686182B CN115686182B (en) 2024-02-27

Family

ID=84978886

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110831693.9A Active CN115686182B (en) 2021-07-22 2021-07-22 Processing method of augmented reality video and electronic equipment

Country Status (2)

Country Link
CN (1) CN115686182B (en)
WO (1) WO2023000746A1 (en)

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB201805650D0 (en) * 2018-04-05 2018-05-23 Holome Tech Limited Method and apparatus for generating augmented reality images
US10818093B2 (en) * 2018-05-25 2020-10-27 Tiff's Treats Holdings, Inc. Apparatus, method, and system for presentation of multimedia content including augmented reality content
US10740960B2 (en) * 2019-01-11 2020-08-11 Microsoft Technology Licensing, Llc Virtual object placement for augmented reality
CN110378990B (en) * 2019-07-03 2023-01-10 北京悉见科技有限公司 Augmented reality scene display method and device and storage medium
CN110879979B (en) * 2019-11-13 2024-01-02 泉州师范学院 Augmented reality system based on mobile terminal
CN112882576B (en) * 2021-02-26 2023-07-25 北京市商汤科技开发有限公司 AR interaction method and device, electronic equipment and storage medium

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104102678A (en) * 2013-04-15 2014-10-15 腾讯科技(深圳)有限公司 Method and device for realizing augmented reality
CN111372098A (en) * 2015-01-21 2020-07-03 微软技术许可有限责任公司 User equipment, system, method and readable medium for shared scene grid data synchronization
US20190012839A1 (en) * 2017-07-05 2019-01-10 Qualcomm Incorporated Enhanced signaling of regions of interest in container files and video bitstreams
CN107835436A (en) * 2017-09-25 2018-03-23 北京航空航天大学 A kind of real-time virtual reality fusion live broadcast system and method based on WebGL
CN110827411A (en) * 2018-08-09 2020-02-21 北京微播视界科技有限公司 Self-adaptive environment augmented reality model display method, device, equipment and storage medium
CN112039937A (en) * 2019-06-03 2020-12-04 华为技术有限公司 Display method, position determination method and device

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI831665B (en) * 2023-04-10 2024-02-01 晶達光電股份有限公司 Display with usb type-c specification

Also Published As

Publication number Publication date
CN115686182B (en) 2024-02-27
WO2023000746A1 (en) 2023-01-26

Similar Documents

Publication Publication Date Title
CN114679537B (en) Shooting method and terminal
CN113132620B (en) Image shooting method and related device
CN110231905B (en) Screen capturing method and electronic equipment
WO2020000448A1 (en) Flexible screen display method and terminal
CN109559270B (en) Image processing method and electronic equipment
CN114650363B (en) Image display method and electronic equipment
WO2021258814A1 (en) Video synthesis method and apparatus, electronic device, and storage medium
CN112887583A (en) Shooting method and electronic equipment
CN110138999B (en) Certificate scanning method and device for mobile terminal
CN113542580B (en) Method and device for removing light spots of glasses and electronic equipment
CN114089932B (en) Multi-screen display method, device, terminal equipment and storage medium
CN113935898A (en) Image processing method, system, electronic device and computer readable storage medium
CN115967851A (en) Quick photographing method, electronic device and computer readable storage medium
CN110286975B (en) Display method of foreground elements and electronic equipment
CN113970888A (en) Household equipment control method, terminal equipment and computer readable storage medium
CN113703894A (en) Display method and display device of notification message
CN112532508B (en) Video communication method and video communication device
CN115119048B (en) Video stream processing method and electronic equipment
CN116389884B (en) Thumbnail display method and terminal equipment
WO2023000746A1 (en) Augmented reality video processing method and electronic device
CN115599565A (en) Method and device for sending clipboard data
CN114827098A (en) Method and device for close shooting, electronic equipment and readable storage medium
CN113495733A (en) Theme pack installation method and device, electronic equipment and computer readable storage medium
CN114338642A (en) File transmission method and electronic equipment
CN116668764B (en) Method and device for processing video

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant