CN115686182B

CN115686182B - Processing method of augmented reality video and electronic equipment

Info

Publication number: CN115686182B
Application number: CN202110831693.9A
Authority: CN
Inventors: 刘小伟; 陈兵; 王国毅; 周俊伟
Original assignee: Honor Device Co Ltd
Current assignee: Honor Device Co Ltd
Priority date: 2021-07-22
Filing date: 2021-07-22
Publication date: 2024-02-27
Anticipated expiration: 2041-07-22
Also published as: CN115686182A; WO2023000746A1

Abstract

A processing method of an augmented reality video and an electronic device, the processing method comprises the following steps: acquiring original video and pose information, wherein the original video is used for representing a video of a real object, and the pose information is used for representing the pose of terminal equipment when the terminal equipment acquires the original video; generating a virtual plane according to the original video and the pose information, wherein the virtual plane is used for determining position information for adding virtual content in the original video; and adding the virtual content into the original video according to the virtual plane to generate an AR video. Based on the technical method, the video quality of the AR video can be improved.

Description

Processing method of augmented reality video and electronic equipment

Technical Field

The application relates to the field of terminals, in particular to a processing method of an augmented reality video and electronic equipment.

Background

Augmented reality (AR) is a technology that calculates the position and angle of a camera image in real time and adds a corresponding image. It is a new technology that 'seamlessly' integrates real-world information with virtual-world information, with the goal of overlaying the virtual world on the real world on a screen and enabling interaction between the two.

At present, when an AR video is recorded, the virtual content and the video of the real object cannot be fused well. In particular, when a user needs to interact with the virtual content in the shooting scene, the scene has to be shot repeatedly, which wastes time and effort.

Therefore, how to better fuse the virtual content with the real object content when recording an AR video, and thereby improve the video quality of the AR video, is a problem to be solved.

Disclosure of Invention

The application provides a processing method of an augmented reality video and electronic equipment, which enable virtual content to be fused well with the video of a real object when an AR video is recorded, thereby improving the video quality of the AR video.

In a first aspect, a method for processing an augmented reality video is provided, including:

acquiring original video and pose information, wherein the original video is used for representing a video of a real object, and the pose information is used for representing the pose of terminal equipment when the terminal equipment acquires the original video; generating a virtual plane according to the original video and the pose information, wherein the virtual plane is used for determining position information for adding virtual content in the original video; and adding the virtual content into the original video according to the virtual plane to generate an AR video.

In the embodiment of the application, pose information corresponding to an original video can be acquired when the original video is acquired; according to the pose information and the original video, a virtual plane can be obtained; when virtual content is added in an image frame of an original video, the virtual plane can be used as a reference plane, and the position of the virtual content in the original video can be adjusted according to the virtual plane, so that the virtual content can be better blended into the original video, and the video quality of the AR video is improved.
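As a hedged illustration of using the virtual plane as a reference plane, the following sketch anchors virtual content at the centroid of a plane's vertices and projects that anchor into an image frame using the camera pose. The pinhole camera model, the intrinsics (fx, fy, cx, cy), the axis convention (camera looks along +Z, Y points down), the camera-to-world pose convention, and the names Pose, plane_anchor, and world_to_pixel are all assumptions made for illustration; the application does not specify them.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Pose:
    """Camera pose for one image frame (assumed camera-to-world convention)."""
    q: Tuple[float, float, float, float]  # unit quaternion (w, x, y, z)
    t: Vec3                               # camera position in world coordinates


def rotate(q, v):
    """Rotate vector v by the unit quaternion q = (w, x, y, z)."""
    w, x, y, z = q
    vx, vy, vz = v
    # t = 2 * cross(q_vec, v)
    tx = 2 * (y * vz - z * vy)
    ty = 2 * (z * vx - x * vz)
    tz = 2 * (x * vy - y * vx)
    # v' = v + w * t + cross(q_vec, t)
    return (vx + w * tx + (y * tz - z * ty),
            vy + w * ty + (z * tx - x * tz),
            vz + w * tz + (x * ty - y * tx))


def plane_anchor(vertices: Sequence[Vec3]) -> Vec3:
    """Pick the centroid of the plane's vertices as the placement point."""
    n = len(vertices)
    return (sum(v[0] for v in vertices) / n,
            sum(v[1] for v in vertices) / n,
            sum(v[2] for v in vertices) / n)


def world_to_pixel(pose: Pose, point: Vec3,
                   fx: float, fy: float, cx: float, cy: float) -> Optional[Tuple[float, float]]:
    """Project a world point into the frame; return None if it is behind the camera."""
    # World -> camera: apply the inverse rotation to (point - camera position).
    d = tuple(p - c for p, c in zip(point, pose.t))
    w, x, y, z = pose.q
    xc, yc, zc = rotate((w, -x, -y, -z), d)
    if zc <= 0:
        return None
    return (fx * xc / zc + cx, fy * yc / zc + cy)


# Usage: a plane 1 unit below the camera, 3 to 5 units ahead of it.
pose = Pose(q=(1.0, 0.0, 0.0, 0.0), t=(0.0, 0.0, 0.0))
plane = [(-1.0, 1.0, 3.0), (1.0, 1.0, 3.0), (1.0, 1.0, 5.0), (-1.0, 1.0, 5.0)]
pixel = world_to_pixel(pose, plane_anchor(plane), 500.0, 500.0, 320.0, 240.0)
```

Because the anchor lies on the plane, the projected position follows the plane as the pose changes from frame to frame, which is the sense in which the plane acts as a reference in this sketch.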

It should be understood that the pose information is used to represent the pose of a camera of the terminal device when the camera acquires the original video; the pose information may include attitude information and position information.

With reference to the first aspect, in certain implementation manners of the first aspect, the pose information includes three-dimensional attitude information, and the method further includes:

and representing the three-dimensional attitude information by quaternion.

In the embodiment of the application, the three-dimensional attitude information can be converted into a quaternion for representation, which avoids the ambiguity that arises when the attitude is represented by three parameters.
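A minimal sketch of such a conversion is given below, assuming the three parameters are roll, pitch, and yaw angles in radians applied in the common Z-Y-X (yaw, pitch, roll) order. The angle convention and the function name euler_to_quaternion are assumptions; the application does not state which three-parameter representation is used.

```python
import math


def euler_to_quaternion(roll, pitch, yaw):
    """Convert Z-Y-X Euler angles (radians) to a unit quaternion (w, x, y, z)."""
    cr, sr = math.cos(roll / 2), math.sin(roll / 2)
    cp, sp = math.cos(pitch / 2), math.sin(pitch / 2)
    cy, sy = math.cos(yaw / 2), math.sin(yaw / 2)
    w = cr * cp * cy + sr * sp * sy
    x = sr * cp * cy - cr * sp * sy
    y = cr * sp * cy + sr * cp * sy
    z = cr * cp * sy - sr * sp * cy
    return (w, x, y, z)


# At a pitch of 90 degrees, two different (roll, yaw) pairs describe the
# same rotation; both map to the same quaternion.
a = euler_to_quaternion(0.3, math.pi / 2, 0.0)
b = euler_to_quaternion(0.0, math.pi / 2, -0.3)
```

The last two lines illustrate the kind of ambiguity meant here: with three angles, distinct parameter triples can encode one attitude, whereas the quaternion is the same for both (a quaternion and its negation still denote the same rotation).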

With reference to the first aspect, in some implementations of the first aspect, the generating information of a virtual plane according to the original video and the pose information includes:

extracting feature points of the image frames according to pose information of the image frames in the original video;

and generating the virtual plane according to the characteristic points.

It should be understood that the feature points of the image frame may refer to points where the gray value of the image changes drastically, or points where the curvature is large on the edge of the image; the feature points may be used to identify objects in the image.
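To make the notion of "points where the gray value changes drastically" concrete, the following toy sketch marks pixels whose central-difference gradient magnitude exceeds a threshold. This is only an illustration under simplifying assumptions: the function name find_feature_points and the threshold are hypothetical, and a practical system would more likely use an established corner or keypoint detector, which the application does not name.

```python
def find_feature_points(gray, threshold):
    """Return (row, col) of interior pixels with a strong gray-value gradient.

    gray is a list of rows of integer gray values; threshold is the minimum
    gradient magnitude for a pixel to count as a feature point.
    """
    height, width = len(gray), len(gray[0])
    points = []
    for r in range(1, height - 1):
        for c in range(1, width - 1):
            gx = gray[r][c + 1] - gray[r][c - 1]  # horizontal change
            gy = gray[r + 1][c] - gray[r - 1][c]  # vertical change
            if gx * gx + gy * gy >= threshold * threshold:
                points.append((r, c))
    return points


# Usage: a 5x5 frame with a sharp dark-to-bright step between columns 1 and 2.
frame = [[0, 0, 200, 200, 200] for _ in range(5)]
points = find_feature_points(frame, threshold=100)
```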

With reference to the first aspect, in certain implementation manners of the first aspect, the method further includes:

and storing the pose information and the information of the virtual plane.

In the embodiment of the application, the pose information and the information of the virtual plane are saved, so that after the recording of the original video is finished, virtual content can be added to the original video according to the saved pose information and virtual plane information to generate a new AR video. Because the pose information and the virtual plane information are saved, a user can edit the original video multiple times and generate AR videos with different virtual content.

With reference to the first aspect, in certain implementation manners of the first aspect, the saving the pose information and the information of the virtual plane includes:

and storing the pose information and the information of the virtual plane in a binary file.

In one possible implementation, the terminal device may save the pose information and the information of the virtual plane as separate binary files.

In one possible implementation, the original video, the pose information corresponding to the original video, and the information of the virtual plane may be stored in the same directory.

In one possible implementation manner, the pose information corresponding to the original video and the information of the virtual plane may be stored in the terminal device under the same name as the original video.

In one possible implementation manner, pose information and virtual plane information corresponding to the original video may be stored in the terminal device by using a frame number of each image frame as an identifier.
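The storage options above can be sketched as a small sidecar-file scheme. In this sketch, each pose record is keyed by its frame number and packed into a fixed-size binary record, and the file is named after the video. The record layout (a 32-bit frame number followed by seven 32-bit floats), the ".pose.bin" naming pattern, and the helper names are assumptions for illustration, not details given in the application.

```python
import struct
from pathlib import Path

# Assumed layout: frame number, quaternion (w, x, y, z), position (x, y, z).
POSE_RECORD = struct.Struct("<I7f")


def sidecar_path(video_path, kind):
    """Name the metadata file after the video, in the same directory."""
    p = Path(video_path)
    return p.with_name(f"{p.stem}.{kind}.bin")


def pack_poses(poses):
    """Serialize {frame_number: (qw, qx, qy, qz, px, py, pz)} to bytes."""
    return b"".join(POSE_RECORD.pack(n, *poses[n]) for n in sorted(poses))


def unpack_poses(data):
    """Inverse of pack_poses; frame numbers identify the records."""
    return {rec[0]: rec[1:] for rec in POSE_RECORD.iter_unpack(data)}


# Usage: two frames of pose data for a hypothetical recording "clip.mp4".
poses = {
    0: (1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0),
    1: (1.0, 0.0, 0.0, 0.0, 0.25, 0.0, 0.5),
}
blob = pack_poses(poses)
target = sidecar_path("/videos/clip.mp4", "pose")  # .../clip.pose.bin
```

Writing blob to target (for example with target.write_bytes(blob)) would place the metadata next to the video so that later editing sessions can find it by name.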

and storing the pose information and the information of the virtual plane in the supplemental enhancement information (SEI) corresponding to the original video.

In one possible implementation, the pose information and the information of the virtual plane may be stored in the supplemental enhancement information of H.264 or H.265 during video compression encoding.
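One way to carry such metadata in an H.264 stream is an SEI message of the "user data unregistered" type (payload type 5), which holds a 16-byte UUID followed by arbitrary bytes. The sketch below builds such a NAL unit in Annex B byte-stream form. The choice of this SEI payload type, the UUID value, and the function names are assumptions; the application only states that SEI is used. H.265 uses a different (two-byte) NAL header and is not covered here.

```python
import uuid


def escape_rbsp(rbsp):
    """Insert emulation-prevention bytes so 00 00 0x (x <= 3) never appears."""
    out = bytearray()
    zeros = 0
    for b in rbsp:
        if zeros >= 2 and b <= 3:
            out.append(3)  # emulation_prevention_three_byte
            zeros = 0
        out.append(b)
        zeros = zeros + 1 if b == 0 else 0
    return bytes(out)


def build_sei_nal(app_uuid, user_data):
    """Wrap user_data in an H.264 SEI NAL unit (user_data_unregistered)."""
    payload = app_uuid.bytes + user_data
    body = bytearray([5])  # payloadType 5: user_data_unregistered
    size = len(payload)
    while size >= 255:     # payloadSize is coded in 255-byte steps
        body.append(0xFF)
        size -= 255
    body.append(size)
    body += payload
    body.append(0x80)      # rbsp_trailing_bits: stop bit plus alignment
    # Start code, then NAL header 0x06 (nal_ref_idc = 0, nal_unit_type = 6).
    return b"\x00\x00\x00\x01" + b"\x06" + escape_rbsp(bytes(body))


# Usage with a hypothetical application UUID and a few metadata bytes.
APP_UUID = uuid.UUID("12345678-1234-5678-1234-567812345678")
nal = build_sei_nal(APP_UUID, b"\x00\x00\x01")
```

Decoders that do not recognize the UUID skip the message, so the video remains playable while an editing application can still recover the per-frame metadata.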

and compressing the stored pose information and the information of the virtual plane.

In the embodiment of the application, the stored information can be compressed when the pose information and the virtual plane information are stored, so that the memory space occupied by the stored information can be effectively reduced.

In one possible implementation, the compressing of the saved pose information and the information of the virtual plane may be performed in at least one of the following ways:

saving the pose information as the difference between the current image frame and the previous image frame; or saving the plane number of the virtual plane as an unsigned char; or, in the description of the vertices of the virtual plane, retaining the Z-axis information of one point and deleting the Z-axis information of the other points for a horizontal plane, and retaining the Y-axis information of one point and deleting the Y-axis information of the other points for a vertical plane; or describing the vertex positions using float16; or saving only the planes within the current field of view when saving the information of the virtual plane.
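Several of these options can be combined, as in the hedged sketch below: poses are delta-encoded against the previous frame, the plane number is stored in one unsigned byte, the coordinate shared by all vertices of a plane is stored once, and vertex coordinates use 16-bit floats. The exact byte layout, the one-byte vertex count, and the function names are assumptions made for illustration.

```python
import struct


def delta_encode(frames):
    """Keep the first pose in full, then only the change from the previous frame."""
    out, prev = [], None
    for cur in frames:
        out.append(tuple(cur) if prev is None
                   else tuple(c - p for c, p in zip(cur, prev)))
        prev = cur
    return out


def delta_decode(deltas):
    """Rebuild absolute poses by accumulating the stored differences."""
    out, prev = [], None
    for d in deltas:
        cur = tuple(d) if prev is None else tuple(p + x for p, x in zip(prev, d))
        out.append(cur)
        prev = cur
    return out


def pack_plane(plane_id, vertices, shared_axis):
    """Pack a plane; shared_axis is 2 (Z) for horizontal, 1 (Y) for vertical."""
    kept = [i for i in range(3) if i != shared_axis]
    # 'B' = unsigned char, 'e' = IEEE 754 half precision (float16).
    data = struct.pack("<BBe", plane_id, len(vertices), vertices[0][shared_axis])
    for v in vertices:
        data += struct.pack("<2e", v[kept[0]], v[kept[1]])
    return data


def unpack_plane(data, shared_axis):
    """Inverse of pack_plane; the shared coordinate is copied to every vertex."""
    plane_id, count, shared = struct.unpack_from("<BBe", data, 0)
    kept = [i for i in range(3) if i != shared_axis]
    vertices = []
    for k in range(count):
        a, b = struct.unpack_from("<2e", data, 4 + 4 * k)
        v = [0.0, 0.0, 0.0]
        v[shared_axis], v[kept[0]], v[kept[1]] = shared, a, b
        vertices.append(tuple(v))
    return plane_id, vertices


# Usage: a horizontal quad at Z = 0.5 takes 20 bytes instead of 48 bytes
# for four float32 (x, y, z) vertices alone.
quad = [(-1.0, -1.0, 0.5), (1.0, -1.0, 0.5), (1.0, 1.0, 0.5), (-1.0, 1.0, 0.5)]
packed = pack_plane(7, quad, shared_axis=2)
```

Note that delta encoding by itself does not shrink the data; it helps because frame-to-frame differences are small and can then be stored in narrower types or compressed more effectively. Float16 trades precision for size, so it suits vertex positions only where that precision loss is acceptable.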

With reference to the first aspect, in certain implementation manners of the first aspect, the adding the virtual content to the original video according to the information of the virtual plane to generate AR video includes:

after the original video is recorded, adding the virtual content into the original video according to the virtual plane to generate the AR video.

With reference to the first aspect, in certain implementation manners of the first aspect, the virtual plane includes a first virtual plane, where the first virtual plane refers to a virtual plane corresponding to a first image frame, and the first image frame is any image frame in the original video;

the information of the first virtual plane includes a total number of image frames, an identification of the first virtual plane, a number of vertices included in the first virtual plane, and position information of each vertex included in the first virtual plane, and the total number refers to the total number of image frames included in the original video.
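The four fields listed above map naturally onto a small record type. The sketch below models them as a dataclass with a simple uncompressed binary round trip. The field widths (32-bit total frame count and identifier, 16-bit vertex count, 32-bit float coordinates) and the names PlaneInfo, to_bytes, and from_bytes are assumptions; the application lists the fields but not their encoding.

```python
import struct
from dataclasses import dataclass
from typing import List, Tuple

HEADER = "<IIH"  # total frames, plane identifier, number of vertices (assumed widths)


@dataclass
class PlaneInfo:
    total_frames: int                            # total image frames in the original video
    plane_id: int                                # identification of the virtual plane
    vertices: List[Tuple[float, float, float]]   # position of each vertex

    def to_bytes(self) -> bytes:
        head = struct.pack(HEADER, self.total_frames, self.plane_id, len(self.vertices))
        body = b"".join(struct.pack("<3f", *v) for v in self.vertices)
        return head + body

    @classmethod
    def from_bytes(cls, data: bytes) -> "PlaneInfo":
        total, plane_id, count = struct.unpack_from(HEADER, data, 0)
        offset = struct.calcsize(HEADER)
        vertices = [struct.unpack_from("<3f", data, offset + 12 * i)
                    for i in range(count)]
        return cls(total, plane_id, vertices)


# Usage: a triangular plane detected in a 300-frame video.
info = PlaneInfo(total_frames=300, plane_id=2,
                 vertices=[(0.0, 0.0, 0.5), (1.0, 0.0, 0.5), (0.0, 1.0, 0.5)])
restored = PlaneInfo.from_bytes(info.to_bytes())
```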

In a second aspect, there is provided a processing apparatus for AR video, the processing apparatus including an acquisition unit and a processing unit; the acquisition unit is used for acquiring an original video and pose information, wherein the original video is used for representing a video of a real object, and the pose information is used for representing a pose of terminal equipment when the terminal equipment acquires the original video; the processing unit is used for generating a virtual plane according to the original video and the pose information, and the virtual plane is used for determining position information for adding virtual content in the original video; and adding the virtual content into the original video according to the virtual plane to generate an AR video.

With reference to the second aspect, in certain implementations of the second aspect, the pose information includes three-dimensional pose information, and the processing unit is further configured to:

and representing the three-dimensional attitude information by quaternion.

With reference to the second aspect, in certain implementations of the second aspect, the processing unit is specifically configured to:

and generating the virtual plane according to the characteristic points.

With reference to the second aspect, in certain implementations of the second aspect, the processing unit is further configured to:

and storing the pose information and the information of the virtual plane.

With reference to the second aspect, in some implementations of the second aspect, the virtual plane includes a first virtual plane, where the first virtual plane refers to a virtual plane corresponding to a first image frame, and the first image frame is any one image frame in the original video;

In one possible implementation, the processing device of AR video may refer to a chip.

When the processing device is a chip, the acquiring unit may refer to an output interface, a pin, a circuit, or the like; the processing unit may refer to a processing unit inside the chip.

It should be appreciated that the extensions, definitions, explanations and illustrations of the relevant content in the first aspect described above also apply to the same content in the second aspect.

In a third aspect, an electronic device is provided, the electronic device comprising: one or more processors, memory, and a display screen; the memory is coupled with the one or more processors, the memory is for storing computer program code, the computer program code comprising computer instructions that the one or more processors call to cause the electronic device to perform:

With reference to the third aspect, in certain implementations of the third aspect, the pose information includes three-dimensional pose information, and the one or more processors invoke the computer instructions to cause the electronic device to further perform:

and representing the three-dimensional attitude information by quaternion.

With reference to the third aspect, in certain implementations of the third aspect, the one or more processors invoke the computer instructions to cause the electronic device to further perform:

and generating the virtual plane according to the characteristic points.

and storing the pose information and the information of the virtual plane.

With reference to the third aspect, in some implementations of the third aspect, the virtual plane includes a first virtual plane, where the first virtual plane refers to a virtual plane corresponding to a first image frame, and the first image frame is any one image frame in the original video;

It should be appreciated that the extensions, definitions, explanations and illustrations of the relevant content in the first aspect described above also apply to the same content in the third aspect.

In a fourth aspect, there is provided an electronic device comprising: one or more processors, memory, and a display screen; the memory is coupled with the one or more processors, the memory for storing computer program code, the computer program code comprising computer instructions that the one or more processors call to cause the electronic device to perform any of the processing methods of the first aspect.

In a fifth aspect, there is provided a chip system for application to an electronic device, the chip system comprising one or more processors for invoking computer instructions to cause the electronic device to perform any of the processing methods of the first aspect.

In a sixth aspect, there is provided a computer readable storage medium storing computer program code which, when executed by an electronic device, causes the electronic device to perform any one of the processing methods of the first aspect.

In a seventh aspect, there is provided a computer program product comprising: computer program code which, when run by an electronic device, causes the electronic device to perform any of the processing methods of the first aspect.

In the embodiment of the application, as pose information corresponding to the original video can be acquired when the original video is acquired; according to the pose information and the original video, a virtual plane can be obtained; when virtual content is added in an image frame of an original video, the virtual plane can be used as a reference plane, and the position of the virtual content in the original video can be adjusted according to the virtual plane; therefore, in the embodiment of the application, the virtual content can be better integrated into the original video through the virtual plane, so that the video quality of the generated AR video is improved.

Drawings

FIG. 1 is a schematic diagram of a hardware system suitable for use in the apparatus of the present application;

FIG. 2 is a schematic diagram of a software system suitable for use with the apparatus of the present application;

FIG. 3 is a schematic diagram of an application scenario provided in the present application;

FIG. 4 is a schematic diagram of a processing method of an augmented reality video provided in the present application;

FIG. 5 is a schematic diagram of a display interface for AR video processing provided herein;

FIG. 6 is a schematic diagram of a display interface for AR video processing provided herein;

FIG. 7 is a schematic diagram of a display interface for AR video processing provided herein;

FIG. 8 is a schematic diagram of a display interface for AR video processing provided herein;

FIG. 9 is a schematic diagram of a display interface for AR video processing provided herein;

FIG. 10 is a schematic diagram of a display interface for AR video processing provided herein;

FIG. 11 is a schematic diagram of a display interface for AR video processing provided herein;

FIG. 12 is a schematic diagram of a display interface for AR video processing provided herein;

FIG. 13 is a schematic diagram of a processing method of an augmented reality video provided in the present application;

FIG. 14 is a schematic diagram of a display interface for AR video processing provided herein;

FIG. 15 is a schematic diagram of a display interface for AR video processing provided herein;

FIG. 16 is a schematic diagram of a display interface for AR video processing provided herein;

FIG. 17 is a schematic diagram of a display interface for AR video processing provided herein;

FIG. 18 is a schematic diagram of a display interface for AR video processing provided herein;

FIG. 19 is a schematic diagram of a display interface for AR video processing provided herein;

FIG. 20 is a schematic diagram of a display interface for AR video processing provided herein;

FIG. 21 is a schematic diagram of a display interface for AR video processing provided herein;

FIG. 22 is a schematic structural diagram of a processing device for augmented reality video provided in the present application;

FIG. 23 is a schematic structural diagram of an electronic device provided in the present application.

Detailed Description

The technical solutions in the embodiments of the present application will be described below with reference to the accompanying drawings.

Fig. 1 shows a hardware system suitable for a terminal device of the present application.

The terminal device 100 may be a mobile phone, a smart screen, a tablet computer, a wearable electronic device, an in-vehicle electronic device, an augmented reality (AR) device, a virtual reality (VR) device, a notebook computer, an ultra-mobile personal computer (UMPC), a netbook, a personal digital assistant (PDA), a projector, or the like. The specific type of the terminal device 100 is not limited in the embodiments of the present application.

The terminal device 100 may include a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (USB) interface 130, a charge management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, an earphone interface 170D, a sensor module 180, keys 190, a motor 191, an indicator 192, a camera 193, a display 194, and a subscriber identity module (SIM) card interface 195, etc. The sensor module 180 may include a pressure sensor 180A, a gyro sensor 180B, an air pressure sensor 180C, a magnetic sensor 180D, an acceleration sensor 180E, a distance sensor 180F, a proximity sensor 180G, a fingerprint sensor 180H, a temperature sensor 180J, a touch sensor 180K, an ambient light sensor 180L, a bone conduction sensor 180M, and the like.

The configuration shown in fig. 1 does not constitute a specific limitation on the terminal device 100. In other embodiments of the present application, the terminal device 100 may include more or fewer components than those shown in fig. 1, or the terminal device 100 may include a combination of some of the components shown in fig. 1, or the terminal device 100 may include sub-components of some of the components shown in fig. 1. The components shown in fig. 1 may be implemented in hardware, software, or a combination of software and hardware.

The processor 110 may include one or more processing units. For example, the processor 110 may include at least one of the following processing units: an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a video codec, a digital signal processor (DSP), a baseband processor, and a neural-network processing unit (NPU). The different processing units may be separate devices or integrated devices.

The controller can generate operation control signals according to the instruction operation codes and the time sequence signals to finish the control of instruction fetching and instruction execution.

A memory may also be provided in the processor 110 for storing instructions and data. In some embodiments, the memory in the processor 110 is a cache memory. The memory may hold instructions or data that the processor 110 has just used or uses repeatedly. If the processor 110 needs to use the instructions or data again, it can call them directly from this memory. This avoids repeated accesses and reduces the waiting time of the processor 110, thereby improving the efficiency of the system.

In some embodiments, the processor 110 may include one or more interfaces. For example, the processor 110 may include at least one of the following interfaces: inter-integrated circuit (I2C) interfaces, inter-integrated circuit sound (I2S) interfaces, pulse code modulation (PCM) interfaces, universal asynchronous receiver/transmitter (UART) interfaces, mobile industry processor interfaces (MIPI), general-purpose input/output (GPIO) interfaces, SIM interfaces, and USB interfaces.

The I2C interface is a bi-directional synchronous serial bus comprising a serial data line (SDA) and a serial clock line (SCL). In some embodiments, the processor 110 may contain multiple sets of I2C buses. The processor 110 may be coupled to the touch sensor 180K, a charger, a flash, the camera 193, etc., respectively, through different I2C bus interfaces. For example: the processor 110 may be coupled to the touch sensor 180K through an I2C interface, so that the processor 110 and the touch sensor 180K communicate through an I2C bus interface to implement a touch function of the terminal device 100.

The I2S interface may be used for audio communication. In some embodiments, the processor 110 may contain multiple sets of I2S buses. The processor 110 may be coupled to the audio module 170 via an I2S bus to enable communication between the processor 110 and the audio module 170. In some embodiments, the audio module 170 may transmit an audio signal to the wireless communication module 160 through the I2S interface, to implement a function of answering a call through the bluetooth headset.

PCM interfaces may also be used for audio communication to sample, quantize and encode analog signals. In some embodiments, the audio module 170 and the wireless communication module 160 may be coupled through a PCM interface. In some embodiments, the audio module 170 may also transmit audio signals to the wireless communication module 160 through the PCM interface to implement a function of answering a call through the bluetooth headset. Both the I2S interface and the PCM interface may be used for audio communication.

The UART interface is a universal serial data bus for asynchronous communications. The bus may be a bi-directional communication bus. It converts the data to be transmitted between serial communication and parallel communication. In some embodiments, a UART interface is typically used to connect the processor 110 with the wireless communication module 160. For example: the processor 110 communicates with a bluetooth module in the wireless communication module 160 through a UART interface to implement a bluetooth function. In some embodiments, the audio module 170 may transmit an audio signal to the wireless communication module 160 through a UART interface, to implement a function of playing music through a bluetooth headset.

The MIPI interface may be used to connect the processor 110 with peripheral devices such as the display 194 and the camera 193. The MIPI interfaces include a camera serial interface (CSI), a display serial interface (DSI), and the like. In some embodiments, the processor 110 and the camera 193 communicate through a CSI interface to implement the photographing function of the terminal device 100. The processor 110 and the display 194 communicate via a DSI interface to implement the display function of the terminal device 100.

The GPIO interface may be configured by software. The GPIO interface may be configured as a control signal interface as well as a data signal interface. In some embodiments, a GPIO interface may be used to connect the processor 110 with the camera 193, the display 194, the wireless communication module 160, the audio module 170, and the sensor module 180. The GPIO interface may also be configured as an I2C interface, an I2S interface, a UART interface, or a MIPI interface.

The USB interface 130 is an interface conforming to the USB standard specification, and may be, for example, a Mini USB interface, a Micro USB interface, or a USB Type-C interface. The USB interface 130 may be used to connect a charger to charge the terminal device 100, to transfer data between the terminal device 100 and a peripheral device, and to connect a headset to play audio through the headset. The USB interface 130 may also be used to connect other terminal devices, such as AR devices.

The connection relationship between the modules shown in fig. 1 is only schematically illustrated, and does not constitute a limitation on the connection relationship between the modules of the terminal device 100. Alternatively, each module of the terminal device 100 may also use a combination of multiple connection manners in the foregoing embodiments.

The charge management module 140 is used to receive power from a charger. The charger can be a wireless charger or a wired charger. In some wired charging embodiments, the charge management module 140 may receive the current of the wired charger through the USB interface 130. In some wireless charging embodiments, the charge management module 140 may receive electromagnetic waves (current path shown as dashed lines) through the wireless charging coil of the terminal device 100. The charging management module 140 may also supply power to the terminal device 100 through the power management module 141 while charging the battery 142.

The power management module 141 is used to connect the battery 142, the charge management module 140, and the processor 110. The power management module 141 receives input from the battery 142 and/or the charge management module 140 to power the processor 110, the internal memory 121, the display 194, the camera 193, the wireless communication module 160, and the like. The power management module 141 may also be used to monitor parameters such as battery capacity, battery cycle count, and battery state of health (e.g., leakage, impedance). Alternatively, the power management module 141 may be provided in the processor 110, or the power management module 141 and the charge management module 140 may be provided in the same device.

The wireless communication function of the terminal device 100 may be implemented by the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, the modem processor, the baseband processor, and the like.

The antennas 1 and 2 are used for transmitting and receiving electromagnetic wave signals. Each antenna in the terminal device 100 may be used to cover a single or multiple communication bands. Different antennas may also be multiplexed to improve the utilization of the antennas. For example: the antenna 1 may be multiplexed into a diversity antenna of a wireless local area network. In other embodiments, the antenna may be used in conjunction with a tuning switch.

The mobile communication module 150 may provide a solution for wireless communication applied on the terminal device 100, such as at least one of the following: a second-generation (2G) mobile communication solution, a third-generation (3G) mobile communication solution, a fourth-generation (4G) mobile communication solution, and a fifth-generation (5G) mobile communication solution. The mobile communication module 150 may include at least one filter, switch, power amplifier, low noise amplifier (LNA), etc. The mobile communication module 150 may receive electromagnetic waves from the antenna 1, perform processes such as filtering and amplifying the received electromagnetic waves, and then transmit the electromagnetic waves to a modem processor for demodulation. The mobile communication module 150 may further amplify the signal modulated by the modem processor, and the amplified signal is converted into electromagnetic waves by the antenna 1 and radiated. In some embodiments, at least some of the functional modules of the mobile communication module 150 may be provided in the processor 110. In some embodiments, at least some of the functional modules of the mobile communication module 150 may be provided in the same device as at least some of the modules of the processor 110.

The modem processor may include a modulator and a demodulator. The modulator is used for modulating the low-frequency baseband signal to be transmitted into a medium-high frequency signal. The demodulator is used for demodulating the received electromagnetic wave signal into a low-frequency baseband signal. The demodulator then transmits the demodulated low frequency baseband signal to the baseband processor for processing. The low frequency baseband signal is processed by the baseband processor and then transferred to the application processor. The application processor outputs sound signals through audio devices (e.g., speaker 170A, receiver 170B) or displays images or video through display screen 194. In some embodiments, the modem processor may be a stand-alone device. In other embodiments, the modem processor may be provided in the same device as the mobile communication module 150 or other functional module, independent of the processor 110.

Similar to the mobile communication module 150, the wireless communication module 160 may also provide wireless communication solutions applied on the terminal device 100, such as at least one of the following: wireless local area network (WLAN), Bluetooth (BT), Bluetooth low energy (BLE), ultra-wideband (UWB), global navigation satellite system (GNSS), frequency modulation (FM), near field communication (NFC), and infrared (IR) technologies. The wireless communication module 160 may be one or more devices that integrate at least one communication processing module. The wireless communication module 160 receives electromagnetic waves via the antenna 2, frequency-modulates and filters the electromagnetic wave signals, and transmits the processed signals to the processor 110. The wireless communication module 160 may also receive a signal to be transmitted from the processor 110, frequency-modulate and amplify it, and convert the signal into electromagnetic waves to radiate via the antenna 2.

In some embodiments, the antenna 1 of the terminal device 100 is coupled to the mobile communication module 150 and the antenna 2 of the terminal device 100 is coupled to the wireless communication module 160, such that the terminal device 100 may communicate with networks and other electronic devices via wireless communication techniques. The wireless communication technology may include at least one of the following communication technologies: global system for mobile communications (GSM), general packet radio service (GPRS), code division multiple access (CDMA), wideband code division multiple access (WCDMA), time-division synchronous code division multiple access (TD-SCDMA), long term evolution (LTE), BT, GNSS, WLAN, NFC, FM, and IR technologies. The GNSS may include at least one of the following positioning techniques: global positioning system (GPS), global navigation satellite system (GLONASS), BeiDou navigation satellite system (BDS), quasi-zenith satellite system (QZSS), and satellite-based augmentation systems (SBAS).

The terminal device 100 may implement display functions through a GPU, a display screen 194, and an application processor. The GPU is a microprocessor for image processing, and is connected to the display 194 and the application processor. The GPU is used to perform mathematical and geometric calculations for graphics rendering. Processor 110 may include one or more GPUs that execute program instructions to generate or change display information.

The display screen 194 may be used to display images or video. The display screen 194 includes a display panel. The display panel may employ a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a mini light-emitting diode (Mini LED), a micro light-emitting diode (Micro LED), a micro OLED (Micro OLED), or quantum dot light-emitting diodes (QLED). In some embodiments, the terminal device 100 may include 1 or N display screens 194, N being a positive integer greater than 1.

The terminal device 100 may implement a photographing function through an ISP, a camera 193, a video codec, a GPU, a display screen 194, an application processor, and the like.

The ISP is used to process data fed back by the camera 193. For example, when photographing, the shutter is opened, light is transmitted to the camera photosensitive element through the lens, the optical signal is converted into an electric signal, and the camera photosensitive element transmits the electric signal to the ISP for processing and is converted into an image visible to naked eyes. The ISP can carry out algorithm optimization on noise, brightness and color of the image, and can optimize parameters such as exposure, color temperature and the like of a shooting scene. In some embodiments, the ISP may be provided in the camera 193.

The camera 193 is used to capture still images or video. An object generates an optical image through the lens, and the optical image is projected onto the photosensitive element. The photosensitive element may be a charge coupled device (CCD) or a complementary metal oxide semiconductor (CMOS) phototransistor. The photosensitive element converts the optical signal into an electrical signal, which is then transferred to the ISP to be converted into a digital image signal. The ISP outputs the digital image signal to the DSP for processing. The DSP converts the digital image signal into an image signal in a standard format such as red green blue (RGB) or YUV. In some embodiments, the terminal device 100 may include 1 or N cameras 193, N being a positive integer greater than 1.

The digital signal processor is used for processing digital signals, and can process other digital signals in addition to digital image signals. For example, when the terminal device 100 selects a frequency bin, the digital signal processor is used to perform a Fourier transform or similar operation on the frequency bin energy.

Video codecs are used to compress or decompress digital video. The terminal device 100 may support one or more video codecs. In this way, the terminal device 100 can play or record video in various encoding formats, for example: moving picture experts group (MPEG) 1, MPEG2, MPEG3, and MPEG4.

The NPU is a processor modeled on the structure of biological neural networks; for example, it borrows the way signals are transmitted between neurons in the human brain to process input information rapidly, and it can also learn continuously by itself. The NPU may implement functions such as intelligent cognition of the terminal device 100, for example: image recognition, face recognition, speech recognition, and text understanding.

The external memory interface 120 may be used to connect an external memory card, such as a Secure Digital (SD) card, to enable expansion of the memory capabilities of the terminal device 100. The external memory card communicates with the processor 110 through an external memory interface 120 to implement data storage functions. For example, files such as music, video, etc. are stored in an external memory card.

The internal memory 121 may be used to store computer executable program code including instructions. The internal memory 121 may include a storage program area and a storage data area. Wherein the storage program area may store application programs required for at least one function (e.g., a sound playing function and an image playing function) of the operating system. The storage data area may store data (e.g., audio data and phonebook) created during use of the terminal device 100. Further, the internal memory 121 may include a high-speed random access memory, and may also include a nonvolatile memory such as: at least one disk storage device, a flash memory device, and a universal flash memory (universal flash storage, UFS), etc. The processor 110 performs various processing methods of the terminal device 100 by executing instructions stored in the internal memory 121 and/or instructions stored in a memory provided in the processor.

The terminal device 100 may implement audio functions, such as music playing and recording, through an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, an earphone interface 170D, an application processor, and the like.

The audio module 170 is used to convert digital audio information into an analog audio signal output, and may also be used to convert an analog audio input into a digital audio signal. The audio module 170 may also be used to encode and decode audio signals. In some embodiments, the audio module 170 or a portion of the functional modules of the audio module 170 may be disposed in the processor 110.

The speaker 170A, also referred to as a loudspeaker, is used to convert audio electrical signals into sound signals. The user can listen to music or to a hands-free call through the speaker 170A of the terminal device 100.

A receiver 170B, also referred to as an earpiece, converts the audio electrical signal into a sound signal. When a user answers a call or voice information using the terminal device 100, the voice can be answered by bringing the receiver 170B close to the ear.

The microphone 170C, also known as a mic or mouthpiece, is used to convert sound signals into electrical signals. When a user makes a call or sends voice information, a sound signal may be input to the microphone 170C by speaking near the microphone 170C. The terminal device 100 may be provided with at least one microphone 170C. In other embodiments, the terminal device 100 may be provided with two microphones 170C to implement a noise reduction function. In other embodiments, the terminal device 100 may be further provided with three, four or more microphones 170C to perform functions such as identifying a sound source and directional recording. The processor 110 may process the electrical signal output by the microphone 170C; for example, the audio module 170 and the wireless communication module 160 may be coupled through a PCM interface, and after the microphone 170C converts the environmental sound into an electrical signal (such as a PCM signal), the electrical signal is transmitted to the processor 110 through the PCM interface; the processor 110 performs volume analysis and frequency analysis on the electrical signal to determine the volume and frequency of the ambient sound.

The earphone interface 170D is used to connect a wired earphone. The earphone interface 170D may be the USB interface 130, a 3.5 mm open mobile terminal platform (OMTP) standard interface, or a cellular telecommunications industry association of the USA (CTIA) standard interface.

The pressure sensor 180A is used to sense a pressure signal, and may convert the pressure signal into an electrical signal. In some embodiments, the pressure sensor 180A may be disposed on the display screen 194. The pressure sensor 180A may be of various types, such as a resistive pressure sensor, an inductive pressure sensor, or a capacitive pressure sensor. The capacitive pressure sensor may be a device comprising at least two parallel plates with conductive material, and when a force is applied to the pressure sensor 180A, the capacitance between the electrodes changes, and the terminal device 100 determines the strength of the pressure based on the change in capacitance. When a touch operation acts on the display screen 194, the terminal device 100 detects the touch operation according to the pressure sensor 180A. The terminal device 100 may also calculate the position of the touch from the detection signal of the pressure sensor 180A. In some embodiments, touch operations that act on the same touch location, but at different touch operation strengths, may correspond to different operation instructions. For example: executing an instruction for checking the short message when the touch operation with the touch operation intensity smaller than the first pressure threshold acts on the short message application icon; and executing the instruction of newly creating the short message when the touch operation with the touch operation intensity being larger than or equal to the first pressure threshold acts on the short message application icon.
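The threshold-based mapping in the short message example above can be sketched as follows. This is only an illustration: the threshold value, the normalized pressure scale, and the instruction names are assumptions, since the source does not specify them.

```python
# Hypothetical value: the source names a "first pressure threshold"
# but gives no number or unit. A normalized pressure in [0, 1] is assumed.
FIRST_PRESSURE_THRESHOLD = 0.5


def sms_icon_instruction(touch_intensity: float) -> str:
    """Map the touch intensity on the short message icon to an instruction.

    The returned strings are placeholder instruction names.
    """
    if touch_intensity < FIRST_PRESSURE_THRESHOLD:
        # Light press: view the short message.
        return "view_short_message"
    # Intensity greater than or equal to the threshold: create a new message.
    return "create_short_message"
```

The same touch location therefore yields different instructions depending only on the measured intensity.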

The gyro sensor 180B may be used to determine the motion attitude of the terminal device 100. In some embodiments, the angular velocity of the terminal device 100 about three axes (i.e., x-axis, y-axis, and z-axis) may be determined by the gyro sensor 180B. The gyro sensor 180B may be used for anti-shake during photographing. For example, when the shutter is pressed, the gyro sensor 180B detects the shake angle of the terminal device 100, the distance that the lens module needs to compensate is calculated from this angle, and the lens counteracts the shake of the terminal device 100 by moving in the opposite direction, thereby realizing anti-shake. The gyro sensor 180B can also be used in scenarios such as navigation and motion sensing games.

The air pressure sensor 180C is used to measure air pressure. In some embodiments, the terminal device 100 calculates altitude from barometric pressure values measured by the barometric pressure sensor 180C, aiding in positioning and navigation.

The magnetic sensor 180D includes a Hall sensor. The terminal device 100 can detect the opening and closing of a flip holster using the magnetic sensor 180D. In some embodiments, when the terminal device 100 is a flip phone, the terminal device 100 may detect opening and closing of the flip cover according to the magnetic sensor 180D. The terminal device 100 may set features such as automatic unlocking upon flip opening according to the detected open or closed state of the holster or of the flip cover.

The acceleration sensor 180E can detect the magnitude of acceleration of the terminal device 100 in various directions (typically, the x-axis, y-axis, and z-axis). The magnitude and direction of gravity may be detected when the terminal device 100 is stationary. The acceleration sensor 180E may also be used to recognize the attitude of the terminal device 100 as an input parameter for applications such as landscape/portrait switching and pedometers.

The distance sensor 180F is used to measure a distance. The terminal device 100 may measure the distance by infrared or laser. In some embodiments, for example, in a shooting scene, the terminal device 100 may range using the distance sensor 180F to achieve fast focusing.

The proximity light sensor 180G may include, for example, a light-emitting diode (LED) and a light detector, for example, a photodiode. The LED may be an infrared LED. The terminal device 100 emits infrared light outward through the LED. The terminal device 100 detects infrared reflected light from a nearby object using a photodiode. When the reflected light is detected, the terminal device 100 may determine that an object exists nearby. When the reflected light is not detected, the terminal device 100 may determine that there is no object nearby. The terminal device 100 can detect whether the user holds the terminal device 100 close to the ear to talk by using the proximity light sensor 180G, so as to automatically extinguish the screen for the purpose of saving power. The proximity light sensor 180G may also be used for automatic unlocking and automatic screen locking in holster mode or pocket mode.

The ambient light sensor 180L is used to sense ambient light level. The terminal device 100 may adaptively adjust the brightness of the display 194 based on the perceived ambient light level. The ambient light sensor 180L may also be used to automatically adjust white balance when taking a photograph. The ambient light sensor 180L may also cooperate with the proximity light sensor 180G to detect whether the terminal device 100 is in a pocket to prevent false touches.

The fingerprint sensor 180H is used to collect a fingerprint. The terminal device 100 can utilize the collected fingerprint characteristics to realize the functions of unlocking, accessing an application lock, photographing, answering an incoming call and the like.

The temperature sensor 180J is for detecting temperature. In some embodiments, the terminal device 100 performs a temperature processing strategy using the temperature detected by the temperature sensor 180J. For example, when the temperature reported by the temperature sensor 180J exceeds a threshold, the terminal device 100 performs a reduction in the performance of a processor located near the temperature sensor 180J in order to reduce power consumption to implement thermal protection. In other embodiments, when the temperature is below another threshold, the terminal device 100 heats the battery 142 to avoid the low temperature causing the terminal device 100 to shut down abnormally. In other embodiments, when the temperature is below a further threshold, the terminal device 100 performs boosting of the output voltage of the battery 142 to avoid abnormal shutdown caused by low temperatures.

The touch sensor 180K is also referred to as a touch device. The touch sensor 180K may be disposed on the display screen 194, and the touch sensor 180K and the display screen 194 together form a touchscreen, also called a touch-controlled screen. The touch sensor 180K is used for detecting a touch operation acting on or near it. The touch sensor 180K may pass the detected touch operation to the application processor to determine the touch event type. Visual output related to touch operations may be provided through the display screen 194. In other embodiments, the touch sensor 180K may also be disposed on the surface of the terminal device 100 at a location different from that of the display screen 194.

The bone conduction sensor 180M may acquire a vibration signal. In some embodiments, the bone conduction sensor 180M may acquire a vibration signal of the vibrating bone of the human vocal part. The bone conduction sensor 180M may also contact the pulse of the human body to receive a blood pressure pulsation signal. In some embodiments, the bone conduction sensor 180M may also be provided in a headset to form a bone conduction headset. The audio module 170 may parse out a voice signal based on the vibration signal of the vibrating bone of the vocal part obtained by the bone conduction sensor 180M, so as to implement a voice function. The application processor may parse heart rate information based on the blood pressure pulsation signal acquired by the bone conduction sensor 180M, so as to implement a heart rate detection function.

The keys 190 include a power key and a volume key. The keys 190 may be mechanical keys or touch keys. The terminal device 100 may receive a key input signal and implement a function related to the key input signal.

The motor 191 may generate vibration. The motor 191 may be used for incoming call alerting as well as for touch feedback. The motor 191 may generate different vibration feedback effects for touch operations acting on different applications. The motor 191 may also produce different vibration feedback effects for touch operations acting on different areas of the display screen 194. Different application scenarios (e.g., time alert, receipt message, alarm clock, and game) may correspond to different vibration feedback effects. The touch vibration feedback effect may also support customization.

The indicator 192 may be an indicator light, which may be used to indicate the charging status and changes in battery level, or may be used to indicate a message, a missed call, or a notification.

The SIM card interface 195 is used to connect a SIM card. The SIM card may be inserted into the SIM card interface 195 to make contact with the terminal device 100, or may be pulled out of the SIM card interface 195 to separate from the terminal device 100. The terminal device 100 may support 1 or N SIM card interfaces, N being a positive integer greater than 1. Multiple cards may be inserted into the same SIM card interface 195 simultaneously, and these cards may be of the same type or of different types. The SIM card interface 195 may also be compatible with external memory cards. The terminal device 100 interacts with the network through the SIM card to realize functions such as calls and data communication. In some embodiments, the terminal device 100 employs an embedded SIM (eSIM) card, which may be embedded in the terminal device 100 and cannot be separated from the terminal device 100.

The hardware system of the terminal device 100 is described in detail above, and the software system of the terminal device 100 is described below. The software system may employ a layered architecture, an event driven architecture, a microkernel architecture, a micro-service architecture, or a cloud architecture, and the embodiment of the present application exemplarily describes the software system of the terminal device 100.

As shown in fig. 2, the software system using the layered architecture is divided into several layers, each of which has a clear role and division of labor. The layers communicate with each other through software interfaces. In some embodiments, the software system may be divided into four layers, which are, from top to bottom, an application layer, an application framework layer, the Android runtime (Android Runtime) and system libraries, and a kernel layer.

The application layer may include camera, gallery, calendar, conversation, map, navigation, WLAN, bluetooth, music, video, short message, etc. applications.

The application framework layer provides an application programming interface (Application Programming Interface, API) and programming framework for application programs of the application layer. The application framework layer may include some predefined functions.

For example, the application framework layer includes a window manager, a content provider, a view system, a resource manager, a notification manager, a simultaneous localization and mapping (SLAM) pose calculation module, and a plane generation module; the application framework layer may also include a telephony manager.

The SLAM pose calculation module is used for outputting pose information and a sparse point cloud. The pose information refers to the pose information of the camera of the terminal device, and the camera of the terminal device is used for acquiring a video of a real scene. Feature points are extracted from any image frame in the video according to the pose information of that image frame, and the sparse point cloud is obtained through calculation.

The plane generation module is used for generating a virtual plane through algorithmic fitting according to the sparse point cloud provided by the SLAM pose calculation module. When virtual content is added to a real scene, the placement position of the virtual content can be adjusted according to the virtual plane; for example, when a user places virtual content by tapping the screen or by a gesture operation, collision detection may be performed between the user's operation and the generated virtual plane to determine the placement position of the virtual content. It should be understood that program instructions corresponding to the processing method of the augmented reality video provided in the embodiments of the present application may be executed in the SLAM pose calculation module and the plane generation module.
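One common way to realize the collision between a user's tap and a virtual plane is a ray-plane intersection test. The following is a minimal sketch under stated assumptions: the tap has already been converted into a ray in world coordinates, and the plane is described by a point and a normal. The function name `hit_test` and this plane representation are illustrative and are not taken from the source.

```python
from typing import Optional, Tuple

Vec3 = Tuple[float, float, float]


def dot(a: Vec3, b: Vec3) -> float:
    """Dot product of two 3D vectors."""
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def hit_test(ray_origin: Vec3, ray_dir: Vec3,
             plane_point: Vec3, plane_normal: Vec3,
             eps: float = 1e-9) -> Optional[Vec3]:
    """Return the point where the ray meets the plane, or None on a miss."""
    denom = dot(ray_dir, plane_normal)
    if abs(denom) < eps:
        return None  # Ray is parallel to the plane.
    diff = (plane_point[0] - ray_origin[0],
            plane_point[1] - ray_origin[1],
            plane_point[2] - ray_origin[2])
    t = dot(diff, plane_normal) / denom
    if t < 0:
        return None  # Plane lies behind the ray origin.
    return (ray_origin[0] + t * ray_dir[0],
            ray_origin[1] + t * ray_dir[1],
            ray_origin[2] + t * ray_dir[2])


# Example: a camera 2 units above the plane y = 0, looking down and forward.
placement = hit_test((0.0, 2.0, 0.0), (1.0, -1.0, 0.0),
                     (0.0, 0.0, 0.0), (0.0, 1.0, 0.0))
```

The returned point can serve as a candidate placement position for the virtual content. A full implementation would also check that the point lies inside the boundary of the fitted plane.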

The window manager is used for managing window programs. The window manager may obtain the display screen size, determine whether there is a status bar, lock the screen, and capture screenshots.

The content provider is used to store and retrieve data and make such data accessible to applications. The data may include video, images, audio, calls made and received, browsing history and bookmarks, and phonebooks.

The view system includes visual controls, such as controls to display text and controls to display pictures. The view system may be used to build applications. The display interface may be composed of one or more views, for example, a display interface including a text notification icon may include a view displaying text and a view displaying a picture.

The telephone manager is used to provide communication functions of the terminal device 100, such as management of a call state (on or off).

The resource manager provides various resources for the application program, such as localization strings, icons, pictures, layout files, and video files.

The notification manager allows an application to display notification information in the status bar and can be used to convey notification-type messages, which may disappear automatically after a short stay without user interaction. For example, the notification manager is used for download completion notifications and message reminders. The notification manager may also manage notifications that appear in the top status bar of the system in the form of a chart or scroll bar text, such as notifications of applications running in the background. The notification manager may also manage notifications that appear on the screen in the form of dialog windows, as well as prompts such as text in the status bar, a prompt tone, vibration of the electronic device, and a flashing indicator light.

The Android Runtime includes a core library and virtual machines. The Android Runtime is responsible for scheduling and management of the Android system.

The core library consists of two parts: one part is the functions that the Java language needs to call, and the other part is the core library of Android.

The application layer and the application framework layer run in a virtual machine. The virtual machine executes the Java files of the application layer and the application framework layer as binary files. The virtual machine is used for executing functions such as object life cycle management, stack management, thread management, security and exception management, and garbage collection.

The system library may include a plurality of functional modules, such as: a surface manager, media libraries, three-dimensional graphics processing libraries (e.g., open graphics library for embedded systems (OpenGL ES)), and 2D graphics engines (e.g., the Skia graphics library (SGL)).

The surface manager is used to manage the display subsystem and provides a fusion of the 2D and 3D layers for the plurality of applications.

The media library supports playback and recording of multiple audio formats, playback and recording of multiple video formats, and still image files. The media library may support a variety of audio and video coding formats, such as MPEG4, H.264, moving picture experts group audio layer III (MP3), advanced audio coding (AAC), adaptive multi-rate (AMR), joint photographic experts group (JPG), and portable network graphics (PNG).

Three-dimensional graphics processing libraries may be used to implement three-dimensional graphics drawing, image rendering, compositing, and layer processing.

The two-dimensional graphics engine is a drawing engine for 2D drawing.

The kernel layer is a layer between hardware and software. The kernel layer may include a display driver, a camera driver, an audio driver, a sensor driver, and the like.

The workflow of the software system and hardware system of the terminal device 100 is exemplarily described below in connection with a photographing scene.

When a user performs a touch operation on the touch sensor 180K, a corresponding hardware interrupt is sent to the kernel layer, which processes the touch operation into a raw input event, for example, information including touch coordinates and a time stamp of the touch operation. The original input event is stored in the kernel layer, and the application framework layer acquires the original input event from the kernel layer, identifies a control corresponding to the original input event, and notifies an Application (APP) corresponding to the control. For example, the touch operation is a click operation, the APP corresponding to the control is a camera APP, and after the camera APP is awakened by the click operation, the camera APP may call the camera driver of the kernel layer through the API, and the camera driver controls the camera 193 to shoot.
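The flow above (hardware interrupt, raw input event, control lookup, application notification) can be sketched as a small simulation. All class and function names here are hypothetical stand-ins for the kernel layer and the application framework layer; they are not Android APIs.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class RawInputEvent:
    """Raw input event: touch coordinates plus a time stamp."""
    x: int
    y: int
    timestamp_ms: int


@dataclass
class Control:
    """A UI control with a rectangular hit area and a click handler."""
    name: str
    bounds: Tuple[int, int, int, int]  # left, top, right, bottom
    on_click: Callable[[RawInputEvent], str]

    def contains(self, x: int, y: int) -> bool:
        left, top, right, bottom = self.bounds
        return left <= x < right and top <= y < bottom


def kernel_process_touch(x: int, y: int, timestamp_ms: int) -> RawInputEvent:
    """Stand-in for the kernel layer turning an interrupt into a raw event."""
    return RawInputEvent(x, y, timestamp_ms)


def framework_dispatch(event: RawInputEvent,
                       controls: List[Control]) -> Optional[str]:
    """Stand-in for the framework layer: find the control, notify its APP."""
    for control in controls:
        if control.contains(event.x, event.y):
            return control.on_click(event)
    return None  # The touch did not land on any known control.


# The camera APP reacts to a click by asking the camera driver to shoot.
controls = [Control("camera_icon", (0, 0, 100, 100),
                    lambda e: "camera driver: start capture")]
result = framework_dispatch(kernel_process_touch(10, 20, 1000), controls)
```

In this sketch the handler returns a string in place of an actual driver call.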

At present, when an AR video is recorded, the virtual content and the video of a real object cannot be well fused, and particularly when a user is required to interact with the virtual content in a shooting scene, repeated shooting is required for many times, and time and labor are wasted.

In view of this, the present application provides a processing method of AR video, by acquiring pose information corresponding to an original video when the original video is acquired; according to the pose information and the original video, a virtual plane can be obtained; when virtual content is added in an image frame of an original video, the virtual plane can be used as a reference plane, and the position of the virtual content in the original video can be adjusted according to the virtual plane, so that the virtual content can be better blended into the original video, and the video quality of the AR video is improved.

The processing method of the augmented reality video provided in the present application will be described in detail with reference to fig. 3 to 21, taking the terminal device 100 as an example.

FIG. 3 is a schematic diagram of an application scenario of the present application; as shown in fig. 3, the processing method of AR video provided in the embodiment of the present application may be applied to the field of AR video; the original video can be obtained, and the target video can be obtained through AR video processing; the original video may refer to a video of a real object shot by a user, and the target video may refer to an AR video obtained by adding virtual content to the original video.

For example, the processing method of AR video provided in the embodiment of the present Application may also be executed in an Application (APP) to perform AR video editing; for example, the AR video APP may perform the processing method of AR video of the present application. Alternatively, the processing method of AR video provided in the embodiment of the present application may also be integrated in a camera of the terminal device; for example, an AR video mode may be selected in the setting of the camera of the terminal device, so as to implement the processing method of AR video provided in the embodiment of the present application; these two implementations are each described in detail below.

Implementation 1: the AR video processing method is implemented through an application program.

As shown in fig. 4, fig. 4 is a schematic flowchart of a processing method of AR video provided in an embodiment of the present application; the processing method 200 includes steps S210 to S260, which are described in detail below.

Step S210, running the AR video APP.

For example, the user may click on an AR video APP in the display interface of the terminal device; responding to clicking operation of a user, and enabling the terminal equipment to run an AR video APP; as shown in fig. 5, fig. 5 shows a graphical user interface (graphical user interface, GUI) of the terminal device, which may be the desktop 310 of the terminal device. When the terminal device detects that the user clicks the icon 320 of the AR video APP on the desktop 310, the AR video APP may be run, and another GUI as shown in fig. 6 is displayed; the display interface 330 shown in fig. 6 may include a photographing viewfinder 340, and preview images may be displayed in real time in the photographing viewfinder 340; a control 350 for indicating photographing may also be included on the photographing interface, as well as other photographing controls.

In one example, the terminal device detects that the user clicks the icon of the AR video APP on the display interface, and may start the AR video APP and display the display interface of the AR video APP. A shooting viewfinder may be included on the display interface; for example, in the video mode, the shooting viewfinder may occupy part of the screen or the entire display screen. In the preview state, that is, after the user opens the AR video APP and before the user presses the shooting button, the preview image can be displayed in real time in the shooting viewfinder.

It should also be understood that the above description is given by way of example of AR video APP, and the names of the applications in the embodiments of the present application are not limited in any way.

Step S220, obtaining original video and pose information.

For example, as shown in fig. 7, the terminal apparatus detects an operation of the user clicking on the shooting control 350, and starts recording an image displayed in the shooting viewfinder.

It should be understood that the action by which the user instructs shooting may include pressing a shooting button, may include the user instructing the terminal device by voice to perform the shooting action, or may include other ways in which the user instructs the terminal device to perform the shooting action. The foregoing is illustrative and not intended to limit the present application in any way.

For example, the pose information may be used to represent the pose of the camera of the terminal device when acquiring the original video; the pose information may include attitude information and position information.

For example, the terminal device may acquire pose information corresponding to each frame of image through the gyro sensor 180B as shown in fig. 1.

Step S230, the pose information and the information of the virtual plane are saved.

The saved pose information may refer to pose information corresponding to each image frame in the original video.

Illustratively, feature point extraction can be performed on any image frame in the original video according to pose information of the image frame, and sparse point cloud is obtained through calculation; fitting according to the sparse point cloud through an algorithm to generate a virtual plane; when virtual content is added in the video of the real object, the placement position of the virtual content can be adjusted according to the virtual plane.
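The source does not name the fitting algorithm. As one simple possibility, a plane of the form z = a·x + b·y + c can be fitted to the sparse point cloud by ordinary least squares. The sketch below assumes that the plane is not parallel to the z axis and that the points contain no outliers; practical systems often add outlier rejection such as RANSAC. The function names are illustrative.

```python
from typing import List, Sequence, Tuple


def det3(m: Sequence[Sequence[float]]) -> float:
    """Determinant of a 3x3 matrix."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_plane(points: List[Tuple[float, float, float]]
              ) -> Tuple[float, float, float]:
    """Least-squares fit of z = a*x + b*y + c; returns (a, b, c)."""
    n = len(points)
    if n < 3:
        raise ValueError("at least three points are required")
    sx = sum(p[0] for p in points)
    sy = sum(p[1] for p in points)
    sz = sum(p[2] for p in points)
    sxx = sum(p[0] * p[0] for p in points)
    sxy = sum(p[0] * p[1] for p in points)
    syy = sum(p[1] * p[1] for p in points)
    sxz = sum(p[0] * p[2] for p in points)
    syz = sum(p[1] * p[2] for p in points)
    # Normal equations A * (a, b, c)^T = rhs, solved with Cramer's rule.
    a_mat = [[sxx, sxy, sx], [sxy, syy, sy], [sx, sy, float(n)]]
    rhs = [sxz, syz, sz]
    d = det3(a_mat)
    if abs(d) < 1e-12:
        raise ValueError("points are degenerate (for example, collinear)")
    coeffs = []
    for col in range(3):
        m = [row[:] for row in a_mat]
        for r in range(3):
            m[r][col] = rhs[r]
        coeffs.append(det3(m) / d)
    return coeffs[0], coeffs[1], coeffs[2]


# Five sample points lying exactly on z = 2x + 3y + 1.
cloud = [(0, 0, 1), (1, 0, 3), (0, 1, 4), (1, 1, 6), (2, 1, 8)]
a, b, c = fit_plane(cloud)
```

The fitted coefficients describe the virtual plane that later serves as the reference for placing virtual content.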

In the embodiment of the application, the pose information and the information of the virtual plane are saved, so that after recording of the original video is finished, virtual content can be added to the original video according to the saved pose information and virtual plane information to generate a new AR video. Because the pose information and the information of the virtual plane are saved, a user can edit the original video multiple times and generate AR videos that include different virtual content.

In one example, in the processing method of AR video in the present application, the acquired three-dimensional attitude information may be represented by a quaternion, which avoids the ambiguity that arises when the attitude is represented by three parameters.

A quaternion consists of a real number plus three imaginary units i, j, and k; every quaternion is a linear combination of 1, i, j, and k, that is, a quaternion can generally be expressed as a+bi+cj+dk, where a, b, c, and d are real numbers. The units i, j, and k may each represent a rotation: the i rotation may represent the positive rotation between the positive X axis and the positive Y axis in the plane in which the X axis and the Y axis intersect, the j rotation may represent the positive rotation between the positive X axis and the positive Z axis in the plane in which the Z axis and the X axis intersect, and the k rotation may represent the positive rotation between the positive Z axis and the positive Y axis in the plane in which the Y axis and the Z axis intersect.
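As a minimal illustration of how a unit quaternion a+bi+cj+dk encodes a rotation, the following Python sketch uses the standard Hamilton product to rotate a vector. The function names and the axis-angle helper are illustrative assumptions and are not taken from the application; the sketch follows the common right-handed convention.

```python
import math

def quat_mul(p, q):
    """Hamilton product of two quaternions given as (a, b, c, d) = a + bi + cj + dk."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,
    )

def quat_from_axis_angle(axis, angle_rad):
    """Unit quaternion for a rotation of angle_rad about the given (non-zero) axis."""
    x, y, z = axis
    norm = math.sqrt(x * x + y * y + z * z)
    s = math.sin(angle_rad / 2.0) / norm
    return (math.cos(angle_rad / 2.0), x * s, y * s, z * s)

def rotate(q, v):
    """Rotate the 3-vector v by the unit quaternion q via q * (0, v) * conj(q)."""
    a, b, c, d = q
    conjugate = (a, -b, -c, -d)
    _, x, y, z = quat_mul(quat_mul(q, (0.0, v[0], v[1], v[2])), conjugate)
    return (x, y, z)
```

For instance, a quarter turn about the Z axis maps the positive X axis onto the positive Y axis. Because a single quaternion describes the whole rotation, no order of three separate angles has to be agreed upon.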

Illustratively, the terminal device receives an instruction to shoot; for example, when a user taps to record video on the terminal device, the terminal device can start the initialization of pose calculation. Before initialization succeeds, the pose, expressed as (position x/y/z, rotation quaternion), can be (0,0,0,0,0,0,0), and the information of the virtual plane is (number 0). When initialization succeeds, the pose of the designated image frame (the initialization start frame) is expressed as (0,0,0,0,0,0,0), and the information of the virtual plane is expressed as (number x, plane number 0, plane 0 point number n, position of point 0 X₁,Y₁,Z₁ … position of point n Xₙ,Yₙ,Zₙ).

Here, the number x represents the total number of virtual planes, i.e. the total number of image frames included in the video; plane number 0 may be used to represent the first virtual plane of the plurality of virtual planes; plane 0 point number n may be used to represent that the first virtual plane includes n vertices; position of point 0 X₁,Y₁,Z₁ represents the position information of vertex 0 included in the first virtual plane; position of point n Xₙ,Yₙ,Zₙ represents the position information of vertex n included in the first virtual plane.

It should be understood that the information of the virtual plane may include positional information of all vertices included in the virtual plane.

For example, during video recording, the acquired pose information corresponding to the current image frame may be expressed as (X, Y, Z, q₀, q₁, q₂, q₃), and the information of the virtual plane may be expressed as (number x, plane number A, plane A point number n, position of point 0 X₁,Y₁,Z₁ …, position of point n Xₙ,Yₙ,Zₙ).

Here, X, Y, and Z may respectively represent the coordinates, on the x axis, y axis, and z axis, of the camera that acquires the current image frame; q₀, q₁, q₂, q₃ represent a rotation quaternion; for example, the rotation may also be expressed in terms of Euler angles such as a pitch angle, an azimuth angle, and a rotation angle; the number x represents the total number of planes; the plane number A may be used to represent an identification of the virtual plane corresponding to the current image frame; plane A point number n indicates that the virtual plane corresponding to the current image frame includes n vertices; position of point 0 X₁,Y₁,Z₁ represents the position information of vertex 0 included in the virtual plane corresponding to the current image frame; position of point n Xₙ,Yₙ,Zₙ represents the position information of vertex n included in the virtual plane corresponding to the current image frame.
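The relationship between the rotation quaternion and angle-based descriptions can be sketched as follows. This Python example converts a unit quaternion into three Euler angles. It assumes that q₀ is the scalar part and that the common Z-Y-X (yaw, pitch, roll) convention applies; the application does not state which convention is used, so both points and the function name are assumptions.

```python
import math

def quat_to_euler(q0, q1, q2, q3):
    """Return (roll, pitch, yaw) in radians for the unit quaternion q0 + q1*i + q2*j + q3*k.

    Assumes q0 is the scalar part and the Z-Y-X (yaw, pitch, roll) convention.
    """
    roll = math.atan2(2.0 * (q0 * q1 + q2 * q3), 1.0 - 2.0 * (q1 * q1 + q2 * q2))
    # Clamp to guard against values slightly outside [-1, 1] caused by rounding.
    sin_pitch = max(-1.0, min(1.0, 2.0 * (q0 * q2 - q3 * q1)))
    pitch = math.asin(sin_pitch)
    yaw = math.atan2(2.0 * (q0 * q3 + q1 * q2), 1.0 - 2.0 * (q2 * q2 + q3 * q3))
    return roll, pitch, yaw
```

Storing the quaternion and deriving the angles only when they are needed keeps the saved pose free of the ambiguity of a three-parameter representation.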

In one example, one image frame in the original video may be acquired; feature points of the image frame are extracted according to the pose information of the image frame, and a sparse point cloud is obtained through calculation; a virtual plane is generated by fitting the sparse point cloud; when virtual content is added to the video, the position at which the virtual content is added can be adjusted according to the virtual plane.
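The fitting step can be illustrated with a simple least-squares fit. The application does not name the fitting algorithm, so the following Python sketch is only one possible choice: it assumes a roughly horizontal plane with Z as the vertical axis and fits z = a*x + b*y + c to the sparse points by solving the normal equations with Cramer's rule. A practical system would likely add outlier rejection (for example, RANSAC).

```python
def det3(m):
    """Determinant of a 3x3 matrix given as a list of three rows."""
    return (
        m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
        - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
        + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0])
    )

def fit_plane(points):
    """Least-squares fit of z = a*x + b*y + c to (x, y, z) points; returns (a, b, c)."""
    n = len(points)
    if n < 3:
        raise ValueError("at least three points are required")
    sx = sum(p[0] for p in points)
    sy = sum(p[1] for p in points)
    sz = sum(p[2] for p in points)
    sxx = sum(p[0] * p[0] for p in points)
    syy = sum(p[1] * p[1] for p in points)
    sxy = sum(p[0] * p[1] for p in points)
    sxz = sum(p[0] * p[2] for p in points)
    syz = sum(p[1] * p[2] for p in points)
    # Normal equations: matrix * [a, b, c] = rhs.
    matrix = [[sxx, sxy, sx], [sxy, syy, sy], [sx, sy, float(n)]]
    rhs = [sxz, syz, sz]
    det = det3(matrix)
    if abs(det) < 1e-12:
        raise ValueError("points are degenerate (for example, collinear)")
    solution = []
    for col in range(3):
        replaced = [row[:] for row in matrix]
        for i in range(3):
            replaced[i][col] = rhs[i]  # Cramer's rule: swap one column for the rhs
        solution.append(det3(replaced) / det)
    return tuple(solution)
```

The vertices saved as the information of the virtual plane could then be taken from the extent of the inlier points projected onto the fitted plane.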

For example, when the user places virtual content by tapping the screen or by a gesture operation, the user's operation may be tested for collision with the generated virtual plane to determine the placement position of the virtual content.
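One common way to realize such a collision is a ray-plane intersection: the tap is turned into a ray from the camera, and the point where the ray meets the virtual plane becomes the placement position. The Python sketch below assumes the ray has already been computed from the tap position and the saved pose; the function name and parameters are hypothetical and are not taken from the application.

```python
def intersect_ray_plane(origin, direction, plane_point, plane_normal, eps=1e-9):
    """Return the point where a ray hits a plane, or None if there is no hit.

    origin, direction: the ray (camera position and viewing direction through the tap).
    plane_point, plane_normal: any point on the virtual plane and the plane's normal.
    """
    denom = sum(d * n for d, n in zip(direction, plane_normal))
    if abs(denom) < eps:
        return None  # ray is parallel to the plane
    t = sum((p - o) * n for p, o, n in zip(plane_point, origin, plane_normal)) / denom
    if t < 0:
        return None  # plane lies behind the camera
    return tuple(o + t * d for o, d in zip(origin, direction))
```

A complete hit test would additionally check that the returned point lies inside the polygon formed by the vertices of the virtual plane; that check is omitted here for brevity.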

In the embodiment of the application, after the pose information and the information of the virtual plane are acquired, the terminal device may store the pose information and the information of the virtual plane.

In one example, custom information includes the pose information and the information of the virtual plane, and the terminal device may save the custom information as a separately stored binary file (bin).

For example, the original video and the custom information corresponding to the original video may be stored under the same directory.

For example, the custom information corresponding to the original video may be saved in the terminal device under the same name as the original video.

For example, the custom information corresponding to the original video may be stored in the terminal device by using the frame number of one image frame as an identifier.

Illustratively, the custom information corresponding to each image frame in the original video may be saved as an independent bin file according to the following data format:

frame number: frameNum: unsigned int32;

pose information: (data 1, data 2, data 3, data 4, data 5, data 6, data 7); wherein, the data 1 to the data 7 can be data in a float format;

information of virtual plane: (num: unsigned int32; planeNum0: unsigned int32; planeNumPoint: unsigned int32; point0 (float, float, float) … pointN (float, float, float) … planeNumN …);
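The data format above can be sketched with Python's struct module. The field order follows the format described here; the little-endian byte order, the 32-bit float width, and the function names are assumptions, since the application does not specify them.

```python
import struct

def pack_frame_record(frame_num, pose, planes):
    """Serialize one frame's custom information.

    pose: 7 floats (x, y, z, q0, q1, q2, q3).
    planes: list of (plane_num, [(x, y, z), ...]) entries.
    """
    if len(pose) != 7:
        raise ValueError("pose must contain 7 values")
    buf = bytearray(struct.pack("<I7f", frame_num, *pose))
    buf += struct.pack("<I", len(planes))  # num: total number of planes
    for plane_num, points in planes:
        buf += struct.pack("<II", plane_num, len(points))
        for x, y, z in points:
            buf += struct.pack("<3f", x, y, z)
    return bytes(buf)

def unpack_frame_record(data):
    """Inverse of pack_frame_record; returns (frame_num, pose, planes)."""
    frame_num, *pose = struct.unpack_from("<I7f", data, 0)
    offset = struct.calcsize("<I7f")
    (num_planes,) = struct.unpack_from("<I", data, offset)
    offset += 4
    planes = []
    for _ in range(num_planes):
        plane_num, count = struct.unpack_from("<II", data, offset)
        offset += 8
        points = [struct.unpack_from("<3f", data, offset + 12 * i) for i in range(count)]
        offset += 12 * count
        planes.append((plane_num, points))
    return frame_num, tuple(pose), planes
```

With this layout, a frame that carries one plane of four vertices occupies 92 bytes (4 + 28 + 4 + 8 + 48).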

For example, when editing the original video, the original video and the bin file may be loaded simultaneously, and the image frames in the original video are synchronized and aligned with the custom information corresponding to the image frames according to the frame numbers.
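The alignment by frame number can be sketched as a dictionary lookup. The Python example below assumes that each saved record begins with the frame number as a little-endian unsigned 32-bit integer, as in the format above; the function names are hypothetical.

```python
import struct

def index_records_by_frame(records):
    """Map frame number -> raw record bytes by reading each record's leading uint32."""
    index = {}
    for raw in records:
        (frame_num,) = struct.unpack_from("<I", raw, 0)
        index[frame_num] = raw
    return index

def align(frame_numbers, index):
    """Pair each decoded frame number with its custom information (None if missing)."""
    return [(n, index.get(n)) for n in frame_numbers]
```

Keying on the frame number rather than on file order means that the records can be loaded in any order and that a missing record affects only its own frame.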

In one example, the custom information may include pose information and information of a virtual plane, and the terminal device may save the custom information into the supplemental enhancement information (SEI) in the video bitstream corresponding to the original video.

For example, the following information may be stored into the SEI information of H.264/H.265 during video compression encoding:

pose information: (float, float, float, float, float, float, float);

information of virtual plane: (num: unsigned int32; planeNum0: unsigned int32; planeNumPoint: unsigned int32; point0 (float, float, float) … pointN (float, float, float) … planeNumN …).

In the case where the custom information is stored in the SEI information during video compression encoding, the custom information may be decoded according to the above format when the video to be edited is decoded in step S250.
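As a rough sketch of how such data might be carried in an H.264 bitstream, the following Python example wraps the seven pose values in a user_data_unregistered SEI message (payload type 5) inside an SEI NAL unit (type 6). The application does not state which SEI payload type is used, so this choice, the placeholder UUID, the little-endian float layout, and the function names are all assumptions; H.265 uses a different NAL header and is not covered.

```python
import struct

# Hypothetical 16-byte identifier for the custom payload; a real application would pick its own UUID.
CUSTOM_INFO_UUID = bytes(range(16))

def ff_coded(value):
    """Encode an SEI payload type or size as a run of 0xFF bytes plus a final remainder byte."""
    return b"\xff" * (value // 255) + bytes([value % 255])

def add_emulation_prevention(rbsp):
    """Insert 0x03 after two zero bytes whenever the next byte is in 0x00..0x03."""
    out = bytearray()
    zeros = 0
    for byte in rbsp:
        if zeros >= 2 and byte <= 3:
            out.append(3)
            zeros = 0
        out.append(byte)
        zeros = zeros + 1 if byte == 0 else 0
    return bytes(out)

def build_pose_sei_nal(pose):
    """Build an Annex B H.264 SEI NAL unit carrying 7 pose floats (assumed layout)."""
    if len(pose) != 7:
        raise ValueError("pose must contain 7 values")
    payload = CUSTOM_INFO_UUID + struct.pack("<7f", *pose)
    # 5 = user_data_unregistered; the size counts the UUID plus the data bytes.
    sei_message = ff_coded(5) + ff_coded(len(payload)) + payload
    rbsp = sei_message + b"\x80"  # RBSP trailing bits
    # Start code, NAL header byte (nal_unit_type 6 = SEI), escaped body.
    return b"\x00\x00\x00\x01" + b"\x06" + add_emulation_prevention(rbsp)
```

The emulation-prevention step matters here because float values such as 0.0 serialize to runs of zero bytes, which would otherwise be mistaken for start codes by a decoder.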

In an embodiment of the present application, in order to reduce a storage space occupied by the terminal device for storing the pose information and the information of the virtual plane, the compression processing of the custom information may be performed in at least one of the following manners:

storing the pose information as the difference between the current image frame and the previous image frame; alternatively, the plane number of the virtual plane may be saved as an unsigned char; or, for the description of the vertices of a virtual plane, a horizontal plane may retain the Z-axis information of one point and delete the Z-axis information of the other points, and a vertical plane may retain the Y-axis information of one point and delete the Y-axis information of the other points; alternatively, the position description of the vertices may use float16; alternatively, only the planes within the current field of view may be saved when saving the information of the virtual plane.

In the embodiment of the application, the original video is acquired through the AR video APP; on the one hand, pose information and virtual plane information of the original video can be generated and saved while the video is recorded; on the other hand, after the recording of the original video is finished, each image frame in the original video can be edited; for example, virtual content is added.
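Three of these options can be sketched together in Python: difference coding of the pose, an unsigned char plane number, and float16 vertex coordinates with a single shared Z value for a horizontal plane. The byte layout and the function names are assumptions made for illustration only.

```python
import struct

def delta_encode(poses):
    """Keep the first pose in full, then store each later pose as its difference from the previous one."""
    if not poses:
        return []
    out = [tuple(poses[0])]
    for prev, cur in zip(poses, poses[1:]):
        out.append(tuple(c - p for c, p in zip(cur, prev)))
    return out

def delta_decode(deltas):
    """Rebuild absolute poses from the output of delta_encode."""
    if not deltas:
        return []
    poses = [tuple(deltas[0])]
    for d in deltas[1:]:
        poses.append(tuple(p + x for p, x in zip(poses[-1], d)))
    return poses

def pack_horizontal_plane(plane_num, points):
    """Pack a horizontal plane compactly.

    The plane number is one unsigned char, the vertex count is a uint32,
    the Z value of the first vertex is stored once as float16, and each
    vertex then contributes only its X and Y values as float16.
    """
    shared_z = points[0][2]
    buf = struct.pack("<BIe", plane_num, len(points), shared_z)
    for x, y, _ in points:
        buf += struct.pack("<2e", x, y)
    return buf
```

Under these assumptions a four-vertex horizontal plane takes 23 bytes instead of 56 with a uint32 plane number and three float32 values per vertex. Note that float16 has limited precision, so it suits coordinates of modest range, and that difference coding with floating-point values can accumulate rounding error over long sequences.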

Step S240, the original video recording is finished.

For example, as shown in fig. 8, the terminal device detects that the user clicks the shooting control 350 again, and ends the recording of the video; for example, the currently recorded video is 20 seconds long.

Step S250, opening a visual interface of the virtual plane, and editing the original video.

It should be understood that when any one image frame in the original video is edited, the terminal device may call the stored custom information corresponding to the image frame; namely, the pose information and the plane information of the image frame are called.

For example, any image frame at the 8th second of the original video is extracted. The display interface 330 shown in fig. 9 may further include an editing option 360; after the terminal device detects that the user clicks the editing option 360, the terminal device may display an editing mode interface, as shown in fig. 10; after detecting that the user clicks, on the editing mode interface, the AR content selection 361, the terminal device displays the display interface shown in fig. 11; the display interface of fig. 11 further includes a display plane option 362, and when the terminal device detects that the user clicks the display plane option 362, the generated virtual plane 363 is displayed in the display interface, see fig. 12. In an embodiment of the present application, a visualized plane for placing virtual content may be provided to the user on the display interface of the terminal device; for example, while the user adds virtual content, the virtual plane 363 may be displayed on the display interface; when the user places virtual content by tapping the screen or by a gesture operation, the user's operation collides with the virtual plane 363, thereby determining the placement position of the virtual content, as shown in fig. 12.

It should be appreciated that virtual plane 363 may be displayed in the interface when editing virtual content, such as adjusting the position of virtual content; after editing is completed, virtual plane 363 does not appear in the AR video; the virtual plane 363 may be used as a reference plane for the user to determine the location of the virtual content addition in the video.

Step S260, generating AR video including virtual content.

Illustratively, the user may edit each image frame in the original video; for example, virtual content may be added to each image frame, and position information of the virtual content in each image frame may be adjusted; thereby generating AR video with virtual content.

In one example, a user may play an original video, and the user may click a pause key to extract and edit a current image frame, i.e., add virtual content in the current image frame; when the user clicks the play button again, the current image frame editing is completed.

The implementation mode II is as follows: the processing method of the AR video is integrated in the mode of the camera of the terminal equipment.

As shown in fig. 13, fig. 13 is a schematic flowchart of a processing method of AR video provided in an embodiment of the present application; the processing method 400 includes steps S410 to S470, which are described in detail below.

Step S410, running the camera of the terminal device.

For example, the terminal device detects an operation of the user clicking the camera; in response to the click operation by the user, the terminal device may run the camera.

FIG. 14 illustrates a GUI of a terminal device, which may be a desktop 510 of the terminal device; when the terminal device detects an operation that the user clicks an icon 520 of the camera on the desktop 510, the camera may be operated to display another GUI as shown in fig. 15, which may be a display interface 530 of the camera; the display interface 530 may include a photographing viewfinder 540, a control 550 indicating photographing, and other photographing controls, wherein a preview image may be displayed in real time within the photographing viewfinder 540.

Step S420, selecting an AR shooting mode.

For example, the terminal device may detect an operation by which the user indicates the AR shooting mode. Here, the AR shooting mode may refer to a shooting mode in which virtual content can be added when the original video is processed.

As shown in fig. 16, the shooting interface further includes a setting 560; after the terminal device detects that the user clicks the setting 560, the terminal device displays a setting mode interface, as shown in fig. 17; after detecting that the user clicks, on the setting mode interface, the control 561 for indicating AR video, the terminal device enters the AR shooting mode.

Step S430, obtaining original video and pose information.

For example, as shown in fig. 17, the terminal apparatus detects an operation of clicking the shooting control 550 by the user, and starts recording an image displayed in the shooting viewfinder.

It should be understood that the user's action for instructing shooting may include pressing a shooting button, instructing the terminal device to shoot by voice, or another action by which the user instructs the terminal device to shoot; the foregoing is illustrative and is not intended to limit the present application in any way.

Step S440, the pose information and the information of the virtual plane are saved.

Illustratively, feature points can be extracted from any image frame in the original video according to the pose information of the image frame, and a sparse point cloud is obtained through calculation; a virtual plane is generated by fitting the sparse point cloud through an algorithm; when virtual content is added in a real scene, the placement position of the virtual content can be adjusted according to the virtual plane.

Here, X, Y, and Z may respectively represent the coordinates, on the x axis, y axis, and z axis, of the camera that acquires the current image frame; q₀, q₁, q₂, q₃ represent a rotation quaternion; for example, the rotation may also be expressed in terms of Euler angles such as a pitch angle, an azimuth angle, and a rotation angle; the number x represents the total number of planes; the plane number A may be used to represent an identification of the virtual plane corresponding to the current image frame; plane A point number n indicates that the virtual plane corresponding to the current image frame includes n vertices; position of point 0 X₁,Y₁,Z₁ represents the position information of vertex 0 included in the virtual plane corresponding to the current image frame; position of point n Xₙ,Yₙ,Zₙ represents the position information of vertex n included in the virtual plane corresponding to the current image frame.

frame number: frameNum: unsigned int32;

information of virtual plane: (num: unsigned int32; planeNum0: unsigned int32; planeNumPoint: unsigned int32; point0 (float, float, float) … pointN (float, float, float) … planeNumN …);

pose information: (float, float, float, float, float, float, float);

information of virtual plane: (num: unsigned int32; planeNum0: unsigned int32; planeNumPoint: unsigned int32; point0 (float, float, float) … pointN (float, float, float) … planeNumN …).

Step S450, the original video recording is finished.

For example, as shown in fig. 19, the terminal device detects that the user clicks the shooting control 550 again, and ends the recording of the video; for example, the currently recorded video is 20 seconds long.

Step S460, editing the original video.

For example, the original video is edited through a visual interface of the virtual plane; any image frame at the 8th second of the original video may be extracted; the display interface shown in fig. 20 may further include a display plane option 570; when the terminal device detects that the user clicks the display plane option 570, the generated virtual plane 562 may be displayed in the display interface, as shown in fig. 21.

For example, while the user adds virtual content, the virtual plane 562 may be displayed on the display interface; when the user places virtual content by tapping the screen or by a gesture operation, the user's operation collides with the virtual plane 562, thereby determining the placement position of the virtual content.

It should be appreciated that virtual plane 562 may be displayed in the interface as virtual content is edited, such as by adjusting the position of the virtual content; after editing is completed, virtual plane 562 does not appear in the AR video; virtual plane 562 is used for a user to determine where virtual content is added to the video.

Step S470, generating AR video including virtual content.

It should be appreciated that the above illustration is to aid one skilled in the art in understanding the embodiments of the application and is not intended to limit the embodiments of the application to the specific numerical values or the specific scenarios illustrated. It will be apparent to those skilled in the art from the foregoing description that various equivalent modifications or variations can be made, and such modifications or variations are intended to be within the scope of the embodiments of the present application.

The processing method of the AR video according to the embodiment of the present application is described in detail above with reference to fig. 1 to 21, and the device embodiment of the present application will be described in detail below with reference to fig. 22 and 23. It should be understood that, the apparatus in the embodiments of the present application may perform the foregoing processing method of AR video in the embodiments of the present application, that is, the specific working processes of the following various products may refer to the corresponding processes in the foregoing method embodiments.

Fig. 22 is a schematic structural diagram of a processing device for augmented reality video provided in the present application. The processing apparatus 600 includes an acquisition unit 610 and a processing unit 620.

The acquiring unit 610 is configured to acquire an original video and pose information, where the original video is used for representing a video of a real object, and the pose information is used for representing a pose of a terminal device when the terminal device acquires the original video; the processing unit 620 is configured to generate a virtual plane according to the original video and the pose information, where the virtual plane is used to determine location information for adding virtual content to the original video, and to add the virtual content into the original video according to the virtual plane to generate an AR video.

Optionally, as an embodiment, the pose information includes three-dimensional pose information, and the processing unit 620 is further configured to:

and representing the three-dimensional attitude information by quaternion.

Optionally, as an embodiment, the processing unit 620 is specifically configured to:

and generating the virtual plane according to the characteristic points.

Optionally, as an embodiment, the processing unit 620 is further configured to:

and storing the pose information and the information of the virtual plane.

Optionally, as an embodiment, the processing unit 620 is further configured to:

Optionally, as an embodiment, the virtual plane includes a first virtual plane, where the first virtual plane refers to a virtual plane corresponding to a first image frame, and the first image frame is any one image frame in the original video;

The processing device 600 is embodied as a functional unit. The term "unit" herein may be implemented in software and/or hardware, without specific limitation.

For example, a "unit" may be a software program, a hardware circuit or a combination of both that implements the functions described above. The hardware circuitry may include application specific integrated circuits (application specific integrated circuit, ASICs), electronic circuits, processors (e.g., shared, proprietary, or group processors, etc.) and memory for executing one or more software or firmware programs, merged logic circuits, and/or other suitable components that support the described functions.

Thus, the elements of the examples described in the embodiments of the present application can be implemented in electronic hardware, or in a combination of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.

Fig. 23 shows a schematic structural diagram of an electronic device provided in the present application. The dashed line in fig. 23 indicates that the unit or the module is optional. The electronic device 700 may be used to implement the processing methods described in the method embodiments described above.

The electronic device 700 includes one or more processors 701, and the one or more processors 701 may support the electronic device 700 in implementing the methods in the method embodiments. The processor 701 may be a general-purpose processor or a special-purpose processor. For example, the processor 701 may be a central processing unit (central processing unit, CPU), digital signal processor (digital signal processor, DSP), application specific integrated circuit (application specific integrated circuit, ASIC), field programmable gate array (field programmable gate array, FPGA), or other programmable logic device such as discrete gates, transistor logic, or discrete hardware components.

The processor 701 may be used to control the electronic device 700, execute a software program, and process data of the software program. The electronic device 700 may further comprise a communication unit 705 for enabling input (reception) and output (transmission) of signals.

For example, the electronic device 700 may be a chip, the communication unit 705 may be an input and/or output circuit of the chip, or the communication unit 705 may be a communication interface of the chip, which may be an integral part of a terminal device or other electronic device.

For another example, the electronic device 700 may be a terminal device, the communication unit 705 may be a transceiver of the terminal device, or the communication unit 705 may be a transceiver circuit of the terminal device.

The electronic device 700 may include one or more memories 702 having a program 704 stored thereon, the program 704 being executable by the processor 701 to generate instructions 703 such that the processor 701 performs the AR video processing method described in the above method embodiments according to the instructions 703.

Optionally, the memory 702 may also have data stored therein. Alternatively, processor 701 may also read data stored in memory 702, which may be stored at the same memory address as program 704, or which may be stored at a different memory address than program 704.

The processor 701 and the memory 702 may be provided separately or may be integrated together; for example, integrated on a System On Chip (SOC) of the terminal device.

Illustratively, the memory 702 may be used to store a related program 704 of the processing method of AR video provided in the embodiments of the present application, and the processor 701 may be used to invoke the related program 704 stored in the memory 702 when the AR video is edited, to perform the processing of AR video of the embodiments of the present application; for example, acquiring an original video and pose information, wherein the original video is used for representing a video of a real object, and the pose information is used for representing a pose of a terminal device when the terminal device acquires the original video; generating a virtual plane according to the original video and the pose information, wherein the virtual plane is used for determining position information for adding virtual content in the original video; and adding the virtual content into the original video according to the virtual plane to generate an AR video.

The present application also provides a computer program product which, when executed by the processor 701, implements the processing method described in any of the method embodiments of the present application.

The computer program product may be stored in the memory 702, for example, the program 704, and the program 704 is finally converted into an executable object file capable of being executed by the processor 701 through preprocessing, compiling, assembling, and linking.

The present application also provides a computer readable storage medium having stored thereon a computer program which, when executed by a computer, implements a method according to any of the method embodiments of the present application. The computer program may be a high-level language program or an executable object program.

Optionally, the computer readable storage medium is, for example, memory 702. The memory 702 may be volatile memory or nonvolatile memory, or the memory 702 may include both volatile and nonvolatile memory. The nonvolatile memory may be a read-only memory (ROM), a programmable ROM (PROM), an erasable PROM (EPROM), an electrically erasable PROM (EEPROM), or a flash memory. The volatile memory may be random access memory (random access memory, RAM), which acts as an external cache. By way of example, and not limitation, many forms of RAM are available, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), and direct rambus RAM (DR RAM).

It will be clearly understood by those skilled in the art that, for convenience and brevity of description, specific working processes and technical effects of the apparatus and device described above may refer to corresponding processes and technical effects in the foregoing method embodiments, which are not described in detail herein.

In several embodiments provided in the present application, the disclosed systems, apparatuses, and methods may be implemented in other manners. For example, some features of the method embodiments described above may be omitted, or not performed. The above-described apparatus embodiments are merely illustrative; the division of units is merely a logical function division, and there may be additional divisions in actual implementation, and multiple units or components may be combined or integrated into another system. In addition, the coupling between units or between components may be direct or indirect, including electrical, mechanical, or other forms of connection.

It should be understood that, in various embodiments of the present application, the magnitude of the sequence number of each process does not imply an order of execution; the execution order of each process should be determined by its function and internal logic, and the sequence numbers should not constitute any limitation on the implementation process of the embodiments of the present application.

In addition, the terms "system" and "network" are often used interchangeably herein. The term "and/or" herein merely describes an association relationship between associated objects, meaning that three relationships may exist; e.g., A and/or B may represent: A exists alone, A and B exist together, and B exists alone. In addition, the character "/" herein generally indicates that the associated objects before and after it are in an "or" relationship.

In summary, the foregoing description is only a preferred embodiment of the technical solution of the present application, and is not intended to limit the scope of the present application. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the present application should be included in the protection scope of the present application.

Claims

1. A method for processing an augmented reality AR video, comprising:

acquiring original video and pose information, wherein the original video is used for representing a video of a real object, and the pose information is used for representing the pose of terminal equipment when the terminal equipment acquires the original video;

generating a virtual plane according to the original video and the pose information, wherein the virtual plane is used for determining the position for adding virtual content in the original video;

adding the virtual content in the original video according to the virtual plane to generate an AR video;

the method further comprises the steps of:

storing the pose information and the information of the virtual plane, wherein custom information comprises the pose information and the information of the virtual plane;

the storing the pose information and the information of the virtual plane includes:

the pose information and the information of the virtual plane are stored in a binary file, specifically: storing the custom information corresponding to each image frame in the original video as an independent bin file according to the following data format:

frame number: frameNum: unsigned int32;

pose information: (data 1, data 2, data 3, data 4, data 5, data 6, data 7); wherein, the data 1 to the data 7 are data in float format;

information of virtual plane: (num: unsigned int32; planeNum0: unsigned int32; planeNumPoint: unsigned int32; point0 (float, float, float) … pointN (float, float, float) … planeNumN …);

when editing the original video, loading the original video and the bin file at the same time; synchronizing and aligning an image frame in the original video with the custom information corresponding to the image frame according to the frame number;

or,

the pose information and the information of the virtual plane are stored in the supplemental enhancement information corresponding to the original video, specifically: the following information is stored into the SEI information of H.264/H.265 during video compression encoding:

pose information: (float, float, float, float, float, float, float);

when the edited video is decoded, the custom information is decoded according to the format.

2. The processing method of claim 1, wherein the pose information comprises three-dimensional pose information, further comprising:

and representing the three-dimensional attitude information by quaternion.

3. The processing method according to claim 1 or 2, wherein the generating information of a virtual plane from the original video and the pose information includes:

and generating the virtual plane according to the characteristic points.

4. The method of processing according to claim 1, further comprising:

5. The processing method of any of claims 1, 2, 4, wherein the adding the virtual content to the original video according to the information of the virtual plane to generate AR video comprises:

6. The processing method of claim 4, wherein the virtual plane comprises a first virtual plane, the first virtual plane being a virtual plane corresponding to a first image frame, the first image frame being any one of the original video;

7. An electronic device, the electronic device comprising: one or more processors, memory, and a display screen; the memory is coupled with the one or more processors, the memory is for storing computer program code, the computer program code comprising computer instructions that the one or more processors call to cause the electronic device to perform the processing method of any of claims 1-6.

8. A chip system for application to an electronic device, the chip system comprising one or more processors for invoking computer instructions to cause the electronic device to perform the processing method of any of claims 1 to 6.

9. A computer readable storage medium, characterized in that the computer readable storage medium stores a computer program which, when executed by a processor, causes the processor to perform the processing method of any one of claims 1 to 6.