WO2023000746A1 - Processing method for augmented reality video and electronic device - Google Patents

Processing method for augmented reality video and electronic device

Info

Publication number
WO2023000746A1
Authority
WO
WIPO (PCT)
Prior art keywords
virtual plane
information
video
original video
pose information
Prior art date
Application number
PCT/CN2022/089308
Other languages
English (en)
French (fr)
Inventor
刘小伟
陈兵
王国毅
周俊伟
Original Assignee
荣耀终端有限公司
Priority date
Filing date
Publication date
Application filed by 荣耀终端有限公司
Publication of WO2023000746A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T19/00Manipulating 3D models or images for computer graphics

Definitions

  • the present application relates to the field of terminals, in particular to an augmented reality video processing method and electronic equipment.
  • Augmented reality (AR) technology is a technology that calculates the position and angle of camera images in real time and superimposes corresponding virtual images; it is a new technology that "seamlessly" integrates real-world information with virtual-world information. The goal of this technology is to present the virtual world on the screen together with the real world and to enable interaction between them.
  • the present application provides an augmented reality video processing method and an electronic device, which can better integrate virtual content with the video of real objects when recording an AR video, thereby improving the video quality of the AR video.
  • in a first aspect, a processing method for augmented reality video is provided, including: acquiring an original video and pose information, where the original video is used to represent the video of a real object, and the pose information is used to represent the pose of the terminal device when acquiring the original video; generating a virtual plane according to the original video and the pose information, where the virtual plane is used to determine position information for adding virtual content in the original video; and adding the virtual content to the original video according to the virtual plane to generate an AR video.
  • the pose information corresponding to the original video can be obtained when the original video is obtained; a virtual plane can be obtained according to the pose information and the original video; when adding virtual content to an image frame of the original video, the virtual plane can be used as a reference plane, and the position of the virtual content in the original video can be adjusted according to the virtual plane, so that the virtual content can be better integrated into the original video and the video quality of the AR video can be improved.
  • the pose information is used to represent the pose of the camera of the terminal device when acquiring the original video; the pose information may include orientation (attitude) information and position information.
  • the pose information includes three-dimensional pose information, and the method further includes:
  • the three-dimensional pose information is represented by a quaternion.
  • in this way, the three-dimensional pose information can be converted into a quaternion representation, thereby avoiding the ambiguity (for example, gimbal lock) caused by expressing the orientation with only three parameters.
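  • as an illustration of this representation (a minimal sketch, not part of the claimed method), the snippet below converts a three-parameter (Euler-angle) orientation to a quaternion; the function name and the Z-Y-X angle convention are assumptions made for the example:

```python
import math

def euler_to_quaternion(roll: float, pitch: float, yaw: float):
    """Convert a three-parameter (Euler-angle) orientation, given in radians,
    to a unit quaternion (w, x, y, z) using the Z-Y-X convention."""
    cr, sr = math.cos(roll / 2), math.sin(roll / 2)
    cp, sp = math.cos(pitch / 2), math.sin(pitch / 2)
    cy, sy = math.cos(yaw / 2), math.sin(yaw / 2)
    w = cr * cp * cy + sr * sp * sy
    x = sr * cp * cy - cr * sp * sy
    y = cr * sp * cy + sr * cp * sy
    z = cr * cp * sy - sr * sp * cy
    return (w, x, y, z)

# Example: an orientation rotated 90 degrees about the vertical (yaw) axis.
print(euler_to_quaternion(0.0, 0.0, math.pi / 2))
```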
  • the generating a virtual plane according to the original video and the pose information includes: extracting feature points of image frames in the original video, and generating the virtual plane according to the feature points.
  • the feature point of the image frame may refer to a point where the gray value of the image changes drastically, or a point with a large curvature on the edge of the image; the feature point may be used to identify an object in the image.
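  • as one possible way to obtain such feature points (the patent does not mandate a specific detector), the sketch below uses OpenCV's ORB detector, which responds to points where the gray value changes sharply; the function name and parameter values are assumptions for illustration:

```python
import cv2  # OpenCV, used here only as one possible feature detector

def detect_feature_points(frame_bgr):
    """Detect points where the gray value changes sharply (corners/edges);
    such points can later be used to fit the virtual plane."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    orb = cv2.ORB_create(nfeatures=500)
    keypoints = orb.detect(gray, None)
    return [kp.pt for kp in keypoints]  # (x, y) pixel coordinates
```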
  • with reference to the first aspect, in some implementations of the first aspect, the method further includes:
  • the pose information and the information of the virtual plane are saved.
  • saving the pose information and the information of the virtual plane makes it possible, after the recording of the original video is completed, to add virtual content to the original video according to the pose information of the original video and the information of the virtual plane so as to generate a new AR video. Since the pose information and the virtual plane information are saved, users can edit the original video multiple times to generate AR videos with different virtual content.
  • the storing the pose information and the information of the virtual plane includes:
  • the pose information and the information of the virtual plane are saved in a binary file.
  • the terminal device may save the pose information and the virtual plane information as independent binary files.
  • the original video, the pose information corresponding to the original video, and the virtual plane information may be stored in the same directory.
  • the pose information and virtual plane information corresponding to the original video may be saved in the terminal device with the same name as the original video.
  • the frame number of each image frame may be used as an identifier, and the pose information corresponding to the original video and the virtual plane information may be stored in the terminal device.
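  • a minimal sketch of such a standalone binary file is shown below, using the frame number as the identifier of each record; the record layout (a uint32 frame number followed by a quaternion and a position, all little-endian) and the file name are assumptions made for illustration:

```python
import struct

def save_pose_file(path, poses):
    """Write per-frame pose records to a standalone binary file.
    `poses` maps frame number -> (qw, qx, qy, qz, tx, ty, tz)."""
    with open(path, "wb") as f:
        f.write(struct.pack("<I", len(poses)))  # number of records
        for frame_id, pose in sorted(poses.items()):
            f.write(struct.pack("<I7f", frame_id, *pose))

# Example: "ar_0001.mp4" could be accompanied by "ar_0001.pose" in the same directory.
save_pose_file("ar_0001.pose", {0: (1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0)})
```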
  • the storing the pose information and the information of the virtual plane includes:
  • the pose information and the information of the virtual plane are stored in supplementary enhancement information corresponding to the original video.
  • the pose information and the information of the virtual plane may be stored in supplementary enhancement information of h.264 or h.265 during video compression encoding.
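  • the sketch below shows one way such data could be wrapped as an H.264 "user data unregistered" SEI message (payload type 5); the UUID is a placeholder, and emulation-prevention bytes are omitted for brevity, so this is only an illustrative sketch rather than a complete encoder integration:

```python
import uuid

APP_UUID = uuid.UUID("00000000-0000-0000-0000-000000000000").bytes  # placeholder identifier

def build_sei_user_data(payload: bytes) -> bytes:
    """Wrap serialized pose / virtual-plane data in an H.264 SEI NAL unit
    of type 'user data unregistered' (payload type 5)."""
    body = APP_UUID + payload
    size = len(body)
    sei = bytearray([0x00, 0x00, 0x00, 0x01,  # start code
                     0x06,                    # NAL unit type 6: SEI
                     0x05])                   # SEI payload type 5: user data unregistered
    while size >= 255:                        # payload size uses 0xFF-prefixed coding
        sei.append(0xFF)
        size -= 255
    sei.append(size)
    sei += body
    sei.append(0x80)                          # rbsp_trailing_bits
    return bytes(sei)
```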
  • with reference to the first aspect, in some implementations of the first aspect, the method further includes:
  • when storing the pose information and the information of the virtual plane, the stored information can be compressed, so that the memory space occupied by the stored information can be effectively reduced.
  • At least one of the following manners may be used to compress the stored pose information and virtual plane information:
  • the plane number of the virtual plane can be saved as an unsigned character; or, for the description of the vertices of the virtual plane, a horizontal plane can retain the Z-axis (height) value of only one point and delete the Z-axis values of the other points, and a vertical plane can retain the Y-axis value of only one point and delete the Y-axis values of the other points; or, the vertex positions can be described using float16; or, when saving the information of the virtual plane, only the planes within the current field of view can be saved.
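  • the packing sketch below illustrates several of these measures for a horizontal plane: the plane number is stored as an unsigned char, the shared height (Z value) is stored only once, and all coordinates are stored as float16; the exact record layout is an assumption made for the example:

```python
import struct

def pack_horizontal_plane(plane_id: int, vertices):
    """Compactly pack one horizontal virtual plane:
    unsigned-char plane number, unsigned-char vertex count,
    one shared float16 height, then float16 X/Y per vertex."""
    z = vertices[0][2]  # all vertices of a horizontal plane share the same height
    buf = struct.pack("<BBe", plane_id, len(vertices), z)
    for x, y, _ in vertices:
        buf += struct.pack("<2e", x, y)
    return buf

square = [(0.0, 0.0, 1.2), (1.0, 0.0, 1.2), (1.0, 1.0, 1.2), (0.0, 1.0, 1.2)]
print(len(pack_horizontal_plane(3, square)))  # 1 + 1 + 2 + 4 * 4 = 20 bytes
```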
  • adding the virtual content to the original video according to the information of the virtual plane to generate an AR video includes:
  • the AR video is generated by adding the virtual content to the original video according to the virtual plane.
  • the virtual plane includes a first virtual plane, where the first virtual plane refers to a virtual plane corresponding to a first image frame, and the first image frame is any image frame in the original video; the information of the first virtual plane includes a total number of image frames, an identification of the first virtual plane, the number of vertices included in the first virtual plane, and the position information of each vertex included in the first virtual plane, where the total number refers to the total number of image frames included in the original video.
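  • for illustration only, the information of the first virtual plane could be held in a record such as the following; the field names are assumptions, while the content matches the description above:

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class VirtualPlaneRecord:
    """Information of one virtual plane associated with an image frame."""
    total_frames: int                           # total number of image frames in the original video
    plane_id: int                               # identification of the virtual plane
    vertex_count: int                           # number of vertices contained in the plane
    vertices: List[Tuple[float, float, float]]  # position information of each vertex
```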
  • in a second aspect, an AR video processing device is provided, including an acquisition unit and a processing unit;
  • the acquiring unit is used to acquire an original video and pose information, where the original video is used to represent the video of a real object, and the pose information is used to represent the pose of the terminal device when acquiring the original video;
  • the processing unit is used to generate a virtual plane according to the original video and the pose information, where the virtual plane is used to determine position information for adding virtual content in the original video, and to add the virtual content to the original video according to the virtual plane to generate an AR video.
  • the pose information includes three-dimensional pose information
  • the processing unit is further configured to:
  • the three-dimensional pose information is represented by a quaternion.
  • the processing unit is specifically configured to:
  • the virtual plane is generated according to the feature points.
  • the processing unit is further configured to:
  • the pose information and the information of the virtual plane are saved.
  • the processing unit is specifically configured to:
  • the pose information and the information of the virtual plane are saved in a binary file.
  • the processing unit is specifically configured to:
  • the pose information and the information of the virtual plane are stored in supplementary enhancement information corresponding to the original video.
  • the processing unit is further configured to: compress the stored information when storing the pose information and the information of the virtual plane.
  • the processing unit is specifically configured to:
  • the AR video is generated by adding the virtual content to the original video according to the virtual plane.
  • the virtual plane includes a first virtual plane, where the first virtual plane refers to a virtual plane corresponding to a first image frame, and the first image frame is any image frame in the original video; the information of the first virtual plane includes a total number of image frames, an identification of the first virtual plane, the number of vertices included in the first virtual plane, and the position information of each vertex included in the first virtual plane, where the total number refers to the total number of image frames included in the original video.
  • the foregoing AR video processing device may refer to a chip.
  • the acquisition unit may refer to an output interface, a pin or a circuit, etc.; the processing unit may refer to a processing unit inside the chip.
  • in a third aspect, an electronic device is provided, including: one or more processors, a memory, and a display screen; the memory is coupled to the one or more processors and is used to store computer program code, the computer program code including computer instructions that are invoked by the one or more processors to cause the electronic device to perform: acquiring an original video and pose information, where the original video is used to represent the video of a real object, and the pose information is used to represent the pose of the terminal device when acquiring the original video; generating a virtual plane according to the original video and the pose information, where the virtual plane is used to determine position information for adding virtual content in the original video; and adding the virtual content to the original video according to the virtual plane to generate an AR video.
  • the pose information includes three-dimensional pose information
  • the one or more processors invoke the computer instructions so that the electronic device further executes:
  • the three-dimensional pose information is represented by a quaternion.
  • the one or more processors invoke the computer instructions so that the electronic device further executes:
  • the virtual plane is generated according to the feature points.
  • the one or more processors invoke the computer instructions so that the electronic device further executes:
  • the pose information and the information of the virtual plane are saved.
  • the one or more processors invoke the computer instructions so that the electronic device further executes:
  • the pose information and the information of the virtual plane are saved in a binary file.
  • the one or more processors invoke the computer instructions so that the electronic device further executes:
  • the pose information and the information of the virtual plane are stored in supplementary enhancement information corresponding to the original video.
  • the one or more processors invoke the computer instructions so that the electronic device further executes: compressing the stored information when storing the pose information and the information of the virtual plane.
  • the one or more processors invoke the computer instructions so that the electronic device further executes:
  • the AR video is generated by adding the virtual content to the original video according to the virtual plane.
  • the virtual plane includes a first virtual plane, where the first virtual plane refers to a virtual plane corresponding to a first image frame, and the first image frame is any image frame in the original video; the information of the first virtual plane includes a total number of image frames, an identification of the first virtual plane, the number of vertices included in the first virtual plane, and the position information of each vertex included in the first virtual plane, where the total number refers to the total number of image frames included in the original video.
  • in a fourth aspect, an electronic device is provided, including: one or more processors, a memory, and a display screen; the memory is coupled to the one or more processors and is used to store computer program code, the computer program code includes computer instructions, and the one or more processors invoke the computer instructions so that the electronic device executes any one of the processing methods in the first aspect.
  • a chip system is provided, where the chip system is applied to an electronic device and includes one or more processors, and the processor is configured to invoke computer instructions so that the electronic device executes any one of the processing methods in the first aspect.
  • a computer-readable storage medium is provided, which stores computer program code; when the computer program code is run by an electronic device, the electronic device is caused to execute any one of the processing methods in the first aspect.
  • a computer program product is provided, including computer program code; when the computer program code is run by an electronic device, the electronic device is caused to execute any one of the processing methods in the first aspect.
  • the virtual plane can be obtained according to the pose information and the original video; when adding virtual content to an image frame of the original video, the virtual plane can be used as a reference plane, and the position of the virtual content in the original video can be adjusted according to the virtual plane; therefore, in the embodiments of the present application, the virtual content can be better integrated into the original video through the virtual plane, thereby improving the video quality of the generated AR video.
  • Fig. 1 is a schematic diagram of a hardware system applicable to the device of the present application
  • Fig. 2 is a schematic diagram of a software system applicable to the device of the present application
  • Fig. 3 is a schematic diagram of an application scenario provided by the present application.
  • FIG. 4 is a schematic diagram of a processing method for augmented reality video provided by the present application.
  • FIG. 5 is a schematic diagram of a display interface for AR video processing provided by the present application.
  • FIG. 6 is a schematic diagram of a display interface for AR video processing provided by the present application.
  • FIG. 7 is a schematic diagram of a display interface for AR video processing provided by the present application.
  • FIG. 8 is a schematic diagram of a display interface for AR video processing provided by the present application.
  • FIG. 9 is a schematic diagram of a display interface for AR video processing provided by the present application.
  • FIG. 10 is a schematic diagram of a display interface for AR video processing provided by the present application.
  • FIG. 11 is a schematic diagram of a display interface for AR video processing provided by the present application.
  • FIG. 12 is a schematic diagram of a display interface for AR video processing provided by the present application.
  • FIG. 13 is a schematic diagram of a processing method for augmented reality video provided by the present application.
  • FIG. 14 is a schematic diagram of a display interface for AR video processing provided by the present application.
  • FIG. 15 is a schematic diagram of a display interface for AR video processing provided by the present application.
  • FIG. 16 is a schematic diagram of a display interface for AR video processing provided by the present application.
  • FIG. 17 is a schematic diagram of a display interface for AR video processing provided by the present application.
  • FIG. 18 is a schematic diagram of a display interface for AR video processing provided by the present application.
  • FIG. 19 is a schematic diagram of a display interface for AR video processing provided by the present application.
  • FIG. 20 is a schematic diagram of a display interface for AR video processing provided by the present application.
  • FIG. 21 is a schematic diagram of a display interface for AR video processing provided by the present application.
  • Fig. 22 is a schematic structural diagram of an augmented reality video processing device provided by the present application.
  • FIG. 23 is a schematic structural diagram of an electronic device provided by the present application.
  • Fig. 1 shows a hardware system applicable to a terminal device of this application.
  • the terminal device 100 may be a mobile phone, a smart screen, a tablet computer, a wearable electronic device, a vehicle-mounted electronic device, an augmented reality (AR) device, a virtual reality (VR) device, a notebook computer, an ultra-mobile personal computer (UMPC), a netbook, a personal digital assistant (PDA), a projector, or the like; the embodiment of the present application does not impose any limitation on the specific type of the terminal device 100.
  • the terminal device 100 may include a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (USB) interface 130, a charging management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, an earphone jack 170D, a sensor module 180, a button 190, a motor 191, an indicator 192, a camera 193, a display screen 194, a subscriber identification module (SIM) card interface 195, and the like.
  • the sensor module 180 may include a pressure sensor 180A, a gyroscope sensor 180B, an air pressure sensor 180C, a magnetic sensor 180D, an acceleration sensor 180E, a distance sensor 180F, a proximity light sensor 180G, a fingerprint sensor 180H, a temperature sensor 180J, a touch sensor 180K, an ambient light sensor 180L, bone conduction sensor 180M, etc.
  • the structure shown in FIG. 1 does not constitute a specific limitation on the terminal device 100 .
  • the terminal device 100 may include more or fewer components than those shown in FIG. 1, or the terminal device 100 may include a combination of some of the components shown in FIG. 1, or , the terminal device 100 may include subcomponents of some of the components shown in FIG. 1 .
  • the components shown in FIG. 1 can be realized in hardware, software, or a combination of software and hardware.
  • Processor 110 may include one or more processing units.
  • the processor 110 may include at least one of the following processing units: an application processor (application processor, AP), a modem processor, a graphics processing unit (graphics processing unit, GPU), an image signal processor (image signal processor) , ISP), controller, video codec, digital signal processor (digital signal processor, DSP), baseband processor, neural network processor (neural-network processing unit, NPU).
  • the controller can generate an operation control signal according to the instruction opcode and timing signal, and complete the control of fetching and executing the instruction.
  • a memory may also be provided in the processor 110 for storing instructions and data.
  • the memory in processor 110 is a cache memory.
  • the memory may hold instructions or data that the processor 110 has just used or recycled. If the processor 110 needs to use the instruction or data again, it can be called directly from the memory. Repeated access is avoided, and the waiting time of the processor 110 is reduced, thus improving the efficiency of the system.
  • processor 110 may include one or more interfaces.
  • the processor 110 may include at least one of the following interfaces: an inter-integrated circuit (inter-integrated circuit, I2C) interface, an inter-integrated circuit sound (inter-integrated circuit sound, I2S) interface, a pulse code modulation (pulse code modulation, PCM) interface, universal asynchronous receiver/transmitter (UART) interface, mobile industry processor interface (MIPI), general-purpose input/output (GPIO) interface, SIM interface, USB interface.
  • the I2C interface is a bidirectional synchronous serial bus, including a serial data line (SDA) and a serial clock line (SCL).
  • processor 110 may include multiple sets of I2C buses.
  • the processor 110 may be respectively coupled to the touch sensor 180K, the charger, the flashlight, the camera 193 and the like through different I2C bus interfaces.
  • the processor 110 may be coupled to the touch sensor 180K through the I2C interface, so that the processor 110 and the touch sensor 180K communicate through the I2C bus interface to realize the touch function of the terminal device 100 .
  • the I2S interface can be used for audio communication.
  • processor 110 may include multiple sets of I2S buses.
  • the processor 110 may be coupled to the audio module 170 through an I2S bus to implement communication between the processor 110 and the audio module 170 .
  • the audio module 170 can transmit audio signals to the wireless communication module 160 through the I2S interface, so as to realize the function of answering calls through the Bluetooth headset.
  • the PCM interface can also be used for audio communication, sampling, quantizing and encoding the analog signal.
  • the audio module 170 and the wireless communication module 160 may be coupled through a PCM interface.
  • the audio module 170 can also transmit audio signals to the wireless communication module 160 through the PCM interface, so as to realize the function of answering calls through the Bluetooth headset. Both the I2S interface and the PCM interface can be used for audio communication.
  • the UART interface is a universal serial data bus used for asynchronous communication.
  • the bus can be a bidirectional communication bus. It converts the data to be transmitted between serial communication and parallel communication.
  • a UART interface is generally used to connect the processor 110 and the wireless communication module 160 .
  • the processor 110 communicates with the Bluetooth module in the wireless communication module 160 through the UART interface to realize the Bluetooth function.
  • the audio module 170 can transmit audio signals to the wireless communication module 160 through the UART interface, so as to realize the function of playing music through the Bluetooth headset.
  • the MIPI interface can be used to connect the processor 110 with peripheral devices such as the display screen 194 and the camera 193 .
  • MIPI interface includes camera serial interface (camera serial interface, CSI), display serial interface (display serial interface, DSI), etc.
  • the processor 110 communicates with the camera 193 through a CSI interface to realize the shooting function of the terminal device 100 .
  • the processor 110 communicates with the display screen 194 through the DSI interface to realize the display function of the terminal device 100 .
  • the GPIO interface can be configured by software.
  • the GPIO interface can be configured as a control signal interface or as a data signal interface.
  • the GPIO interface can be used to connect the processor 110 with the camera 193 , the display screen 194 , the wireless communication module 160 , the audio module 170 and the sensor module 180 .
  • the GPIO interface can also be configured as an I2C interface, I2S interface, UART interface or MIPI interface.
  • the USB interface 130 is an interface conforming to the USB standard specification, for example, it can be a mini (Mini) USB interface, a micro (Micro) USB interface or a C-type USB (USB Type C) interface.
  • the USB interface 130 can be used to connect a charger to charge the terminal device 100 , can also be used to transmit data between the terminal device 100 and peripheral devices, and can also be used to connect an earphone to play audio through the earphone.
  • the USB interface 130 can also be used to connect other terminal devices 100, such as AR devices.
  • connection relationship between the modules shown in FIG. 1 is only a schematic illustration, and does not constitute a limitation on the connection relationship between the modules of the terminal device 100 .
  • each module of the terminal device 100 may also use a combination of various connection modes in the foregoing embodiments.
  • the charging management module 140 is used to receive power from the charger.
  • the charger may be a wireless charger or a wired charger.
  • the charging management module 140 can receive the current of the wired charger through the USB interface 130 .
  • the charging management module 140 can receive electromagnetic waves through the wireless charging coil of the terminal device 100 (the current path is shown as a dotted line). While the charging management module 140 is charging the battery 142 , it can also supply power to the terminal device 100 through the power management module 141 .
  • the power management module 141 is used for connecting the battery 142 , the charging management module 140 and the processor 110 .
  • the power management module 141 receives the input from the battery 142 and/or the charging management module 140 to provide power for the processor 110 , the internal memory 121 , the display screen 194 , the camera 193 , and the wireless communication module 160 .
  • the power management module 141 can also be used to monitor parameters such as battery capacity, battery cycle times, and battery health status (eg, leakage, impedance).
  • the power management module 141 may be set in the processor 110, or the power management module 141 and the charge management module 140 may be set in the same device.
  • the wireless communication function of the terminal device 100 may be implemented by components such as the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, a modem processor, and a baseband processor.
  • Antenna 1 and Antenna 2 are used to transmit and receive electromagnetic wave signals.
  • Each antenna in the terminal device 100 can be used to cover single or multiple communication frequency bands. Different antennas can also be multiplexed to improve the utilization of the antennas.
  • Antenna 1 can be multiplexed as a diversity antenna of a wireless local area network.
  • the antenna may be used in conjunction with a tuning switch.
  • the mobile communication module 150 may provide a wireless communication solution applied to the terminal device 100, such as at least one of the following solutions: a second generation (2G) mobile communication solution, a third generation (3G) mobile communication solution, a fourth generation (4G) mobile communication solution, and a fifth generation (5G) mobile communication solution.
  • the mobile communication module 150 may include at least one filter, switch, power amplifier, low noise amplifier (low noise amplifier, LNA) and the like.
  • the mobile communication module 150 can receive electromagnetic waves through the antenna 1, filter and amplify the received electromagnetic waves, and then send them to the modem processor for demodulation.
  • the mobile communication module 150 can also amplify the signal modulated by the modem processor, and the amplified signal is converted into electromagnetic waves and radiated by the antenna 1.
  • at least part of the functional modules of the mobile communication module 150 may be set in the processor 110 .
  • at least part of the functional modules of the mobile communication module 150 and at least part of the modules of the processor 110 may be set in the same device.
  • a modem processor may include a modulator and a demodulator.
  • the modulator is used for modulating the low-frequency baseband signal to be transmitted into a medium-high frequency signal.
  • the demodulator is used to demodulate the received electromagnetic wave signal into a low frequency baseband signal. Then the demodulator sends the demodulated low-frequency baseband signal to the baseband processor for processing.
  • the low-frequency baseband signal is passed to the application processor after being processed by the baseband processor.
  • the application processor outputs a sound signal through an audio device (for example, a speaker 170A, a receiver 170B), or displays an image or video through a display screen 194 .
  • the modem processor may be a stand-alone device. In some other embodiments, the modem processor may be independent from the processor 110, and be set in the same device as the mobile communication module 150 or other functional modules.
  • the wireless communication module 160 can also provide a wireless communication solution applied to the terminal device 100, such as at least one of the following solutions: wireless local area networks (wireless local area networks, WLAN), Bluetooth (bluetooth, BT ), Bluetooth low energy (bluetooth low energy, BLE), ultra wide band (ultra wide band, UWB), global navigation satellite system (global navigation satellite system, GNSS), frequency modulation (frequency modulation, FM), near field communication (near field communication, NFC), infrared (infrared, IR) technology.
  • the wireless communication module 160 may be one or more devices integrating at least one communication processing module.
  • the wireless communication module 160 receives electromagnetic waves via the antenna 2 , frequency-modulates and filters the electromagnetic wave signals, and sends the processed signals to the processor 110 .
  • the wireless communication module 160 can also receive the signal to be transmitted from the processor 110 , frequency-modulate and amplify it, and convert the signal into electromagnetic wave and radiate it through the antenna 2 .
  • the antenna 1 of the terminal device 100 is coupled to the mobile communication module 150, and the antenna 2 of the terminal device 100 is coupled to the wireless communication module 160, so that the terminal device 100 can communicate with the network and other electronic devices through wireless communication technology.
  • the wireless communication technology may include at least one of the following communication technologies: global system for mobile communications (GSM), general packet radio service (GPRS), code division multiple access (CDMA), wideband code division multiple access (WCDMA), time-division code division multiple access (TD-SCDMA), long term evolution (LTE), BT, GNSS, WLAN, NFC, FM, and IR technology.
  • the GNSS may include at least one of the following positioning technologies: global positioning system (GPS), global navigation satellite system (GLONASS), BeiDou navigation satellite system (BDS), quasi-zenith satellite system (QZSS), and satellite based augmentation systems (SBAS).
  • the terminal device 100 may implement a display function through a GPU, a display screen 194, and an application processor.
  • the GPU is a microprocessor for image processing, and is connected to the display screen 194 and the application processor. GPUs are used to perform mathematical and geometric calculations for graphics rendering.
  • Processor 110 may include one or more GPUs that execute program instructions to generate or change display information.
  • Display 194 may be used to display images or video.
  • the display screen 194 includes a display panel.
  • the display panel may adopt a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a mini light-emitting diode (Mini LED), a micro light-emitting diode (Micro LED), a micro OLED (Micro OLED), or quantum dot light emitting diodes (QLED).
  • the terminal device 100 may include 1 or N display screens 194, where N is a positive integer greater than 1.
  • the terminal device 100 may realize the shooting function through the ISP, the camera 193 , the video codec, the GPU, the display screen 194 , and the application processor.
  • the ISP is used for processing the data fed back by the camera 193 .
  • the light is transmitted to the photosensitive element of the camera through the lens, and the light signal is converted into an electrical signal, and the photosensitive element of the camera transmits the electrical signal to the ISP for processing, and converts it into an image visible to the naked eye.
  • ISP can optimize the algorithm of image noise, brightness and color, and ISP can also optimize parameters such as exposure and color temperature of the shooting scene.
  • the ISP may be located in the camera 193 .
  • Camera 193 is used to capture still images or video.
  • the object generates an optical image through the lens and projects it to the photosensitive element.
  • the photosensitive element may be a charge coupled device (CCD) or a complementary metal-oxide-semiconductor (CMOS) phototransistor.
  • the photosensitive element converts the light signal into an electrical signal, and then transmits the electrical signal to the ISP to convert it into a digital image signal.
  • the ISP outputs the digital image signal to the DSP for processing.
  • the DSP converts the digital image signal into an image signal in a standard format such as red green blue (RGB) or YUV.
  • the terminal device 100 may include 1 or N cameras 193, where N is a positive integer greater than 1.
  • Digital signal processors are used to process digital signals. In addition to digital image signals, they can also process other digital signals. For example, when the terminal device 100 selects a frequency point, the digital signal processor is used to perform Fourier transform on the energy of the frequency point.
  • Video codecs are used to compress or decompress digital video.
  • the terminal device 100 may support one or more video codecs.
  • the terminal device 100 can play or record videos in multiple encoding formats, for example: moving picture experts group (moving picture experts group, MPEG) 1, MPEG2, MPEG3 and MPEG4.
  • NPU is a processor that draws on the structure of biological neural networks. For example, it can quickly process input information by drawing on the transmission mode between neurons in the human brain, and it can also continuously learn by itself. Functions such as intelligent cognition of the terminal device 100 can be implemented through the NPU, such as image recognition, face recognition, voice recognition and text understanding.
  • the external memory interface 120 can be used to connect an external memory card, such as a secure digital (secure digital, SD) card, so as to expand the storage capacity of the terminal device 100.
  • the external memory card communicates with the processor 110 through the external memory interface 120 to implement a data storage function. Such as saving music, video and other files in the external memory card.
  • the internal memory 121 may be used to store computer-executable program codes including instructions.
  • the internal memory 121 may include an area for storing programs and an area for storing data.
  • the storage program area can store an operating system and an application program required by at least one function (for example, a sound playing function and an image playing function).
  • the storage data area can store data created during the use of the terminal device 100 (for example, audio data and phonebook).
  • the internal memory 121 may include a high-speed random access memory, and may also include a non-volatile memory, for example: at least one magnetic disk storage device, flash memory device, and universal flash storage (universal flash storage, UFS), etc.
  • the processor 110 executes various processing methods of the terminal device 100 by executing instructions stored in the internal memory 121 and/or instructions stored in a memory provided in the processor.
  • the terminal device 100 can implement audio functions, such as music playing and recording, through the audio module 170, the speaker 170A, the receiver 170B, the microphone 170C, the earphone interface 170D, and the application processor.
  • audio functions such as music playing and recording
  • the audio module 170 is used to convert digital audio information into analog audio signal output, and can also be used to convert analog audio input into digital audio signal.
  • the audio module 170 may also be used to encode and decode audio signals.
  • the audio module 170 or some functional modules of the audio module 170 may be set in the processor 110 .
  • the speaker 170A, also referred to as a "loudspeaker", is used to convert audio electrical signals into sound signals.
  • the terminal device 100 can listen to music or make a hands-free call through the speaker 170A.
  • the receiver 170B, also referred to as an "earpiece", is used to convert audio electrical signals into sound signals.
  • the user uses the terminal device 100 to answer calls or voice messages, he can listen to the voice by putting the receiver 170B close to the ear.
  • the microphone 170C, also referred to as a "mic", is used to convert sound signals into electrical signals. When the user makes a call or sends a voice message, a sound signal may be input into the microphone 170C by speaking close to the microphone 170C.
  • the terminal device 100 may be provided with at least one microphone 170C. In other embodiments, the terminal device 100 may be provided with two microphones 170C to implement the noise reduction function. In some other embodiments, the terminal device 100 may also be provided with three, four or more microphones 170C to implement functions such as identifying sound sources and directional recording.
  • the processor 110 can process the electrical signal output by the microphone 170C. For example, the audio module 170 and the wireless communication module 160 can be coupled through a PCM interface. The electrical signal is transmitted to the processor 110; the processor 110 performs volume analysis and frequency analysis on the electrical signal to determine the volume and frequency of the ambient sound.
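  • a minimal sketch of such volume and frequency analysis is shown below, using the RMS level as the volume estimate and the dominant FFT bin as the frequency estimate; the function name and sample rate are assumptions made for the example:

```python
import numpy as np

def analyze_ambient_sound(samples, sample_rate):
    """Estimate loudness (RMS) and the dominant frequency of one PCM frame."""
    samples = np.asarray(samples, dtype=float)
    rms = np.sqrt(np.mean(samples ** 2))                # volume estimate
    spectrum = np.abs(np.fft.rfft(samples))
    freqs = np.fft.rfftfreq(len(samples), d=1.0 / sample_rate)
    dominant = freqs[int(np.argmax(spectrum[1:])) + 1]  # skip the DC bin
    return rms, dominant

# Example: a 440 Hz tone sampled at 16 kHz.
t = np.arange(16000) / 16000.0
print(analyze_ambient_sound(np.sin(2 * np.pi * 440 * t), 16000))
```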
  • the earphone interface 170D is used for connecting wired earphones.
  • the earphone interface 170D may be a USB interface 130, or a 3.5 mm open mobile terminal platform (OMTP) standard interface, or a cellular telecommunications industry association of the USA (CTIA) standard interface.
  • the pressure sensor 180A is used to sense the pressure signal and convert the pressure signal into an electrical signal.
  • pressure sensor 180A may be disposed on display screen 194 .
  • pressure sensor 180A may be a resistive pressure sensor, an inductive pressure sensor or a capacitive pressure sensor.
  • the capacitive pressure sensor may include at least two parallel plates with conductive materials.
  • touch operations acting on the same touch position but with different touch operation intensities may correspond to different operation instructions. For example: when the touch operation with the touch operation intensity less than the first pressure threshold acts on the short message application icon, execute the instruction of viewing the short message; when the touch operation with the intensity greater than or equal to the first pressure threshold acts on the short message application icon , to execute the instruction of creating a new short message.
  • the gyroscope sensor 180B can be used to determine the motion posture of the terminal device 100 .
  • in some embodiments, the angular velocity of the terminal device 100 around three axes (i.e., the x-axis, y-axis, and z-axis) may be determined through the gyroscope sensor 180B.
  • the gyro sensor 180B can be used for image stabilization. For example, when the shutter is pressed, the gyro sensor 180B detects the shaking angle of the terminal device 100, calculates the distance that the lens module needs to compensate according to the angle, and allows the lens to counteract the shaking of the terminal device 100 through reverse movement to achieve anti-shake.
  • the gyro sensor 180B can also be used in scenarios such as navigation and somatosensory games.
  • the air pressure sensor 180C is used to measure air pressure.
  • the terminal device 100 calculates the altitude based on the air pressure value measured by the air pressure sensor 180C to assist positioning and navigation.
  • the magnetic sensor 180D includes a Hall sensor.
  • the terminal device 100 may use the magnetic sensor 180D to detect the opening and closing of the flip holster.
  • the terminal device 100 may detect opening and closing of the clamshell according to the magnetic sensor 180D.
  • the terminal device 100 may set features such as automatic unlocking of the flip cover according to the detected opening and closing state of the leather case or the opening and closing state of the flip cover.
  • the acceleration sensor 180E can detect the magnitude of the acceleration of the terminal device 100 in various directions (generally x-axis, y-axis and z-axis). When the terminal device 100 is stationary, the magnitude and direction of gravity can be detected. The acceleration sensor 180E can also be used to identify the posture of the terminal device 100 as an input parameter for applications such as switching between horizontal and vertical screens and a pedometer.
  • the distance sensor 180F is used to measure distance.
  • the terminal device 100 can measure the distance by infrared or laser. In some embodiments, for example, in a shooting scene, the terminal device 100 may use the distance sensor 180F for distance measurement to achieve fast focusing.
  • the proximity light sensor 180G may include, for example, a light-emitting diode (LED) and a light detector, such as a photodiode.
  • the LEDs may be infrared LEDs.
  • the terminal device 100 emits infrared light through the LED.
  • the terminal device 100 detects infrared reflected light from nearby objects using a photodiode. When the reflected light is detected, the terminal device 100 may determine that there is an object nearby. When no reflected light is detected, the terminal device 100 may determine that there is no object nearby.
  • the terminal device 100 can use the proximity light sensor 180G to detect whether the user is holding the terminal device 100 close to the ear to make a call, so as to automatically turn off the screen to save power.
  • the proximity light sensor 180G can also be used for automatic unlocking and automatic screen locking in leather case mode or pocket mode.
  • the ambient light sensor 180L is used for sensing ambient light brightness.
  • the terminal device 100 can adaptively adjust the brightness of the display screen 194 according to the perceived ambient light brightness.
  • the ambient light sensor 180L can also be used to automatically adjust the white balance when taking pictures.
  • the ambient light sensor 180L can also cooperate with the proximity light sensor 180G to detect whether the terminal device 100 is in the pocket to prevent accidental touch.
  • the fingerprint sensor 180H is used to collect fingerprints.
  • the terminal device 100 can use the collected fingerprint characteristics to implement functions such as unlocking, accessing the application lock, taking pictures, and answering incoming calls.
  • the temperature sensor 180J is used to detect temperature.
  • the terminal device 100 uses the temperature detected by the temperature sensor 180J to implement a temperature processing strategy. For example, when the temperature reported by the temperature sensor 180J exceeds the threshold, the terminal device 100 may reduce the performance of a processor located near the temperature sensor 180J, so as to reduce power consumption and implement thermal protection.
  • the terminal device 100 when the temperature is lower than another threshold, the terminal device 100 heats the battery 142 to avoid abnormal shutdown of the terminal device 100 caused by the low temperature.
  • the terminal device 100 boosts the output voltage of the battery 142 to avoid abnormal shutdown caused by low temperature.
  • the touch sensor 180K is also referred to as a touch device.
  • the touch sensor 180K may be disposed on the display screen 194, and the touch sensor 180K and the display screen 194 form a touch screen, which is also called a touch screen.
  • the touch sensor 180K is used to detect a touch operation on or near it.
  • the touch sensor 180K may transmit the detected touch operation to the application processor to determine the touch event type.
  • Visual output related to the touch operation can be provided through the display screen 194 .
  • the touch sensor 180K may also be disposed on the surface of the terminal device 100, and disposed at a different position from the display screen 194.
  • the bone conduction sensor 180M can acquire vibration signals. In some embodiments, the bone conduction sensor 180M can acquire the vibration signal of the vibrating bone mass of the human voice. The bone conduction sensor 180M can also contact the human pulse and receive the blood pressure beating signal. In some embodiments, the bone conduction sensor 180M can also be disposed in the earphone, combined into a bone conduction earphone.
  • the audio module 170 can analyze the voice signal based on the vibration signal of the vibrating bone mass of the vocal part acquired by the bone conduction sensor 180M, so as to realize the voice function.
  • the application processor can analyze the heart rate information based on the blood pressure beating signal acquired by the bone conduction sensor 180M, so as to realize the heart rate detection function.
  • Keys 190 include a power key and a volume key.
  • the key 190 can be a mechanical key or a touch key.
  • the terminal device 100 can receive key input signals and implement functions related to the key input signals.
  • the motor 191 can generate vibrations.
  • the motor 191 can be used for notification of incoming calls, and can also be used for touch feedback.
  • the motor 191 can generate different vibration feedback effects for touch operations on different application programs. For touch operations acting on different areas of the display screen 194, the motor 191 can also generate different vibration feedback effects. Different application scenarios (for example, time reminder, receiving information, alarm clock and games) may correspond to different vibration feedback effects.
  • the touch vibration feedback effect can also support customization.
  • the indicator 192 can be an indicator light, which can be used to indicate the charging status and the change of the battery capacity, and can also be used to indicate messages, missed calls and notifications.
  • the SIM card interface 195 is used for connecting a SIM card.
  • the SIM card can be inserted into the SIM card interface 195 to realize contact with the terminal device 100 , and can also be pulled out from the SIM card interface 195 to realize separation from the terminal device 100 .
  • the terminal device 100 may support 1 or N SIM card interfaces, where N is a positive integer greater than 1. Multiple cards can be inserted into the same SIM card interface 195 at the same time, and the types of the multiple cards can be the same or different.
  • the SIM card interface 195 is also compatible with external memory cards.
  • the terminal device 100 interacts with the network through the SIM card to implement functions such as calling and data communication.
  • the terminal device 100 adopts an embedded-SIM (embedded-SIM, eSIM) card, and the eSIM card may be embedded in the terminal device 100 and cannot be separated from the terminal device 100 .
  • the hardware system of the terminal device 100 is described in detail above, and the software system of the terminal device 100 is introduced below.
  • the software system may adopt a layered architecture, an event-driven architecture, a micro-kernel architecture, a micro-service architecture, or a cloud architecture.
  • the embodiment of the present application uses a layered architecture as an example to exemplarily describe the software system of the terminal device 100 .
  • a software system adopting a layered architecture is divided into several layers, and each layer has a clear role and division of labor. Layers communicate through software interfaces.
  • the software system can be divided into four layers, which are application program layer, application program framework layer, Android Runtime (Android Runtime) and system library, and kernel layer respectively from top to bottom.
  • the application layer can include applications such as camera, gallery, calendar, call, map, navigation, WLAN, Bluetooth, music, video, and short message.
  • the application framework layer provides an application programming interface (Application Programming Interface, API) and a programming framework for applications in the application layer.
  • the application framework layer can include some predefined functions.
  • the application framework layer includes a window manager, a content provider, a view system, a resource manager, a notification manager, a simultaneous localization and mapping (SLAM) pose calculation module, and a plane generation module;
  • the framework layer may also include a telephony manager.
  • the SLAM pose calculation module is used to output pose information and a sparse point cloud; the pose information refers to the pose information of the camera of the terminal device, and the camera of the terminal device is used to obtain the video of the real scene; according to the pose information of any image frame in the video, the feature points of that image frame can be extracted, and the sparse point cloud can be obtained through calculation.
  • the plane generation module is used to generate a virtual plane through algorithm fitting according to the sparse point cloud provided by SLAM; when adding virtual content to a real scene, the placement position of the virtual content can be adjusted according to the virtual plane. For example, when the user places the virtual content by tapping the screen or through a gesture operation, the user's operation can be collided (hit-tested) against the generated virtual plane to determine the placement position of the virtual content, as sketched below.
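  • the sketch below illustrates both steps under simplifying assumptions: a plane is fitted to the sparse point cloud by least squares (only one possible fitting algorithm), and a tap ray unprojected from the screen is intersected with that plane to obtain the placement position; all function names are illustrative:

```python
import numpy as np

def fit_plane(points):
    """Least-squares plane fit to sparse 3D points.
    Returns (centroid, unit normal) describing the virtual plane."""
    pts = np.asarray(points, dtype=float)
    centroid = pts.mean(axis=0)
    # The right singular vector for the smallest singular value of the
    # centered points is the plane normal.
    _, _, vt = np.linalg.svd(pts - centroid)
    normal = vt[-1]
    return centroid, normal / np.linalg.norm(normal)

def hit_test(ray_origin, ray_dir, plane_point, plane_normal):
    """Intersect a tap ray with the fitted virtual plane; the hit point is
    used as the placement position of the virtual content."""
    ray_origin, ray_dir = np.asarray(ray_origin, float), np.asarray(ray_dir, float)
    plane_point, plane_normal = np.asarray(plane_point, float), np.asarray(plane_normal, float)
    denom = ray_dir.dot(plane_normal)
    if abs(denom) < 1e-6:
        return None                    # ray parallel to the plane
    t = (plane_point - ray_origin).dot(plane_normal) / denom
    return ray_origin + t * ray_dir if t > 0 else None

# Example: camera at the origin, floor plane at height y = -1.5.
centroid, normal = fit_plane([[0, -1.5, -1], [1, -1.5, -2], [-1, -1.5, -3], [0.5, -1.5, -4]])
print(hit_test([0, 0, 0], [0, -0.5, -1], centroid, normal))
```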
  • the program instructions corresponding to the augmented reality video processing method provided in the embodiment of the present application may be executed in the SLAM pose calculation module and the plane generation module.
  • a window manager is used to manage window programs.
  • the window manager can get the size of the display, determine whether there is a status bar, lock the screen, and capture the screen.
  • Content providers are used to store and retrieve data and make it accessible to applications.
  • the data may include video, images, audio, calls made and received, browsing history and bookmarks, and phonebook.
  • the view system includes visual controls, such as those that display text and those that display pictures.
  • the view system can be used to build applications.
  • the display interface may be composed of one or more views, for example, a display interface including an SMS notification icon may include a view for displaying text and a view for displaying pictures.
  • the phone manager is used to provide communication functions of the terminal device 100, such as management of call status (connected or hung up).
  • the resource manager provides various resources to the application, such as localized strings, icons, pictures, layout files, and video files.
  • the notification manager enables the application to display notification information in the status bar, which can be used to convey notification-type messages, and can automatically disappear after a short stay without user interaction.
  • the notification manager is used for download completion notifications and message reminders.
  • the notification manager can also manage notifications that appear in the status bar at the top of the system in the form of charts or scrolling text, such as notifications from applications running in the background.
  • the notification manager can also manage notifications that appear on the screen in the form of dialog windows, such as prompting text messages in the status bar, making alert sounds, vibrating electronic devices, and blinking lights.
  • the Android Runtime includes core library and virtual machine. The Android runtime is responsible for the scheduling and management of the Android system.
  • the core library consists of two parts: one part is the functions that the Java language needs to call, and the other part is the core library of Android.
  • the application layer and the application framework layer run in virtual machines.
  • the virtual machine executes the java files of the application program layer and the application program framework layer as binary files.
  • the virtual machine is used to perform functions such as object life cycle management, stack management, thread management, security and exception management, and garbage collection.
  • the system library can include multiple functional modules, such as a surface manager, media libraries, a three-dimensional graphics processing library (for example, the open graphics library for embedded systems (OpenGL ES)) and a 2D graphics engine (for example, the Skia graphics library (SGL)).
  • the surface manager is used to manage the display subsystem and provides the fusion of 2D layers and 3D layers for multiple applications.
  • the media library supports playback and recording of multiple audio formats, playback and recording of multiple video formats, and still image files.
  • the media library can support multiple audio and video encoding formats, such as MPEG4, H.264, moving picture experts group audio layer III (MP3), advanced audio coding (AAC), adaptive multi-rate (AMR), joint photographic experts group (JPG) and portable network graphics (PNG).
  • the 3D graphics processing library can be used to implement 3D graphics drawing, image rendering, compositing and layer processing.
  • the 2D graphics engine is a drawing engine for 2D drawing.
  • the kernel layer is the layer between hardware and software.
  • the kernel layer may include driver modules such as display driver, camera driver, audio driver and sensor driver.
  • when a touch operation is received, a corresponding hardware interrupt is sent to the kernel layer, and the kernel layer processes the touch operation into an original input event.
  • the original input event includes information such as touch coordinates and a time stamp of the touch operation.
  • the original input event is stored in the kernel layer, and the application framework layer obtains the original input event from the kernel layer, identifies the control corresponding to the original input event, and notifies the corresponding application (application, APP) of the control.
  • for example, if the above-mentioned touch operation is a single-click operation and the control corresponds to a camera APP, the camera APP is awakened by the single-click operation, calls the camera driver of the kernel layer through the API, and controls the camera 193 to take pictures through the camera driver.
  • the present application provides a processing method for AR video, by obtaining the pose information corresponding to the original video when acquiring the original video; a virtual plane can be obtained according to the pose information and the original video; in the image frame of the original video When adding virtual content, the virtual plane can be used as a reference plane, and the position of the virtual content in the original video can be adjusted according to the virtual plane, so that the virtual content can be better integrated into the original video and the video quality of the AR video can be improved.
  • the following uses the terminal device 100 as an example to describe in detail the processing method of the augmented reality video provided by the present application with reference to FIG. 3 to FIG. 21 .
  • Fig. 3 is a schematic diagram of an application scenario of the present application; as shown in Fig. 3, the AR video processing method provided by the embodiment of the present application can be applied to the field of AR video: an original video is obtained, and a target video is obtained through AR video processing; the original video may refer to a video of a real object shot by a user, and the target video may refer to an AR video obtained by adding virtual content to the original video.
  • the AR video processing method provided in the embodiment of the present application can be run in an application (APP) to perform AR video editing; for example, an AR video APP can execute the AR video processing method of the present application.
  • the AR video processing method provided in the embodiment of the present application can also be integrated in the camera of the terminal device; the following two implementation modes are described in detail.
  • Implementation mode 1: implementing the AR video processing method of the embodiment of the present application through an application program.
  • FIG. 4 is a schematic flow chart of the AR video processing method provided by the embodiment of the present application; the processing method 200 includes step S210 to step S260 , and these steps will be described in detail below.
  • Step S210 run the AR video APP.
  • the user can click on the AR video APP in the display interface of the terminal device; in response to the user's click operation, the terminal device can run the AR video APP; as shown in Figure 5, Figure 5 shows a graphical user interface (GUI) of the terminal device, and the GUI may be the desktop 310 of the terminal device.
  • the AR video APP can be run to display another GUI as shown in FIG. 6; the display interface 330 shown in FIG. 6 may include a shooting viewfinder frame 340, in which a preview image can be displayed in real time; the shooting interface may also include a control 350 for instructing shooting and other shooting controls.
  • the terminal device detects that the user clicks the icon of the AR video APP on the display interface, and can start the AR video APP to display the display interface of the AR video APP;
  • the display interface can include a shooting frame; for example, in the video recording mode, the shooting frame can be a part of the screen, or it can be the entire display screen.
  • in the preview state, that is, after the user opens the AR video APP and before the user presses the shooting button, the preview image can be displayed in real time in the shooting viewfinder frame.
  • Step S220 acquiring original video and pose information.
  • the terminal device detects that the user clicks the shooting control 350 and starts to record the image displayed in the shooting frame.
  • the action of the user for instructing the shooting may include pressing a shooting button, may include the user instructing the terminal device to shoot by voice, or may include other actions of the user instructing the terminal device to shoot.
  • the above is for illustration and does not limit the present application in any way.
  • the pose information may be used to represent the pose of the camera of the terminal device when acquiring the original video; the pose information may include attitude (orientation) information and position information.
  • the terminal device may acquire pose information corresponding to each frame of image through the gyroscope sensor 180B shown in FIG. 1 .
  • Step S230 saving the pose information and the virtual plane information.
  • the saved pose information may refer to pose information corresponding to each image frame in the original video.
  • feature points can be extracted from the image frame, and a sparse point cloud can be obtained through calculation; a virtual plane can be generated by algorithm fitting according to the sparse point cloud; When adding virtual content to the video, the placement position of the virtual content can be adjusted according to the virtual plane.
  • saving the pose information and the information of the virtual plane makes it possible, after the recording of the original video ends, to add virtual content to the original video according to the saved pose information and virtual plane information and thereby generate a new AR video; since the pose information and virtual plane information are saved, the user can edit the original video multiple times to generate AR videos with different virtual content.
  • the acquired three-dimensional pose information may be expressed as a quaternion, thereby avoiding the ambiguity caused by expressing the pose with three parameters.
  • a quaternion can refer to a real number plus three imaginary units i, j and k; for example, a quaternion can be a linear combination of 1, i, j and k, that is, a quaternion can generally be expressed as a+bi+cj+dk, where a, b, c and d all represent real numbers; i, j and k can represent rotations: the i rotation can represent a rotation from the positive X-axis towards the positive Y-axis in the plane where the X-axis and the Y-axis intersect, the j rotation can represent a rotation from the positive Z-axis towards the positive X-axis in the plane where the Z-axis and the X-axis intersect, and the k rotation can represent a rotation from the positive Y-axis towards the positive Z-axis in the plane where the Y-axis and the Z-axis intersect.
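  • as a minimal sketch of how the saved rotation quaternion (q0, q1, q2, q3) could be obtained from three Euler-style angles, the snippet below converts yaw/pitch/roll to a unit quaternion; the Z-Y-X axis convention and the function name are assumptions for illustration and are not prescribed by the embodiment:

```python
import math

def euler_to_quaternion(yaw, pitch, roll):
    """Convert Z-Y-X (yaw, pitch, roll) Euler angles in radians to a unit
    quaternion (w, x, y, z). The axis order is an illustrative assumption;
    the terminal device may use a different convention."""
    cy, sy = math.cos(yaw / 2), math.sin(yaw / 2)
    cp, sp = math.cos(pitch / 2), math.sin(pitch / 2)
    cr, sr = math.cos(roll / 2), math.sin(roll / 2)
    w = cr * cp * cy + sr * sp * sy
    x = sr * cp * cy - cr * sp * sy
    y = cr * sp * cy + sr * cp * sy
    z = cr * cp * sy - sr * sp * cy
    return (w, x, y, z)
```

  • the resulting (w, x, y, z) values would correspond to the (q0, q1, q2, q3) part of the per-frame pose record described below.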
  • when the terminal device receives the user's instruction to shoot, for example when the user taps video recording on the terminal device, the terminal device can start the initialization of the pose calculation; before the initialization succeeds, the pose, expressed as (position x/y/z, rotation quaternion), can be recorded as all zeros, and the information of the virtual plane is (number 0); when the initialization succeeds, the pose of the specified image frame (the initialization start frame) is expressed as (0,0,0,0,0,0,0), and the information of the virtual plane is expressed as (quantity x, plane number 0, number of points n of plane 0, position (X1, Y1, Z1) of point 0, ..., position (Xn, Yn, Zn) of point n).
  • the quantity x represents the total number of virtual planes, i.e. the total number of image frames included in the video; plane number 0 can be used to represent the first virtual plane among the multiple virtual planes; the number of points n of plane 0 can be used to indicate that the first virtual plane includes n vertices; the position (X1, Y1, Z1) of point 0 is used to represent the position information of vertex 0 in the first virtual plane; and the position (Xn, Yn, Zn) of point n is used to represent the position information of vertex n in the first virtual plane.
  • the information of the virtual plane may include position information of all vertices included in the virtual plane.
  • the acquired pose information corresponding to the current image frame can be expressed as (X, Y, Z, q0, q1, q2, q3), and the information of the virtual plane can be expressed as (quantity x, plane number A, number of points n of plane A, position (X1, Y1, Z1) of point 0, ..., position (Xn, Yn, Zn) of point n).
  • X, Y and Z can respectively represent the coordinates of the camera that acquires the current image frame on the x-axis, y-axis and z-axis; q0, q1, q2 and q3 represent the rotation quaternion, which can equivalently be expressed as Euler angles such as the pitch angle, azimuth angle and rotation angle; the quantity x represents the total number of planes; the plane number A can be used to indicate the identification of the virtual plane corresponding to the current image frame; the number of points n of plane A is used to indicate that the virtual plane corresponding to the current image frame includes n vertices; the position (X1, Y1, Z1) of point 0 can be used to indicate the position information of vertex 0 in the virtual plane corresponding to the current image frame; and the position (Xn, Yn, Zn) of point n is used to indicate the position information of vertex n in the virtual plane corresponding to the current image frame.
  • an image frame in the original video can be obtained; feature points can be extracted from the image frame according to the pose information of the image frame, and a sparse point cloud can be obtained through calculation; a virtual plane can be generated by fitting according to the sparse point cloud; when adding virtual content to the video, the position of the virtual content added in the video can be adjusted according to the virtual plane.
  • for example, when the user taps the screen or performs a gesture to place virtual content, the user's operation may collide with the generated virtual plane to determine the placement position of the virtual content.
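  • a minimal sketch of such a collision (hit test) between the user's tap and a generated virtual plane is shown below; the pinhole intrinsics (fx, fy, cx, cy) and the helper names are assumptions used only to make the example self-contained:

```python
import numpy as np

def tap_to_ray(u, v, fx, fy, cx, cy, cam_position, cam_rotation):
    """Build a world-space picking ray from a tapped pixel (u, v).
    fx/fy/cx/cy are assumed camera intrinsics; cam_rotation is a 3x3 matrix
    derived from the saved pose quaternion (camera-to-world)."""
    dir_cam = np.array([(u - cx) / fx, (v - cy) / fy, 1.0])
    dir_world = cam_rotation @ dir_cam
    return np.asarray(cam_position, dtype=float), dir_world

def hit_virtual_plane(ray_origin, ray_direction, plane_normal, plane_d):
    """Intersect a picking ray with a virtual plane n . x + d = 0.
    Returns the 3D hit point, or None if the ray is parallel to the plane
    or the intersection lies behind the camera."""
    ray_direction = ray_direction / np.linalg.norm(ray_direction)
    denom = plane_normal.dot(ray_direction)
    if abs(denom) < 1e-6:                      # ray parallel to the plane
        return None
    t = -(plane_normal.dot(ray_origin) + plane_d) / denom
    if t < 0:                                  # plane is behind the camera
        return None
    return ray_origin + t * ray_direction
```

  • the returned hit point can then serve as the placement position of the virtual content on that plane.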
  • the terminal device may save the pose information and the information of the virtual plane.
  • the custom information includes the aforementioned pose information and the information of the virtual plane, and the terminal device may store the custom information as an independent binary (bin) file.
  • the original video and the custom information corresponding to the original video may be saved in the same directory.
  • the custom information corresponding to the original video may be saved in the terminal device in the same name as the original video.
  • the custom information corresponding to the original video may be saved in the terminal device by using the frame number of an image frame as an identifier.
  • custom information corresponding to each image frame in the original video can be saved as an independent bin file according to the following data format:
  • Frame number Frame num: unsigned int32;
  • Pose information (data 1, data 2, data 3, data 4, data 5, data 6, data 7); among them, data 1 to data 7 can be data in float format;
  • Virtual plane information (num:unsigned int32; planeNum0:unsigned int32; planeNumPoint:unsigned int32; point0(float,float,float)...pointN(float,float,float)...planeNumN...);
  • the original video and the above bin file can be loaded at the same time; the image frame in the original video and the custom information corresponding to the image frame are synchronously aligned according to the frame number.
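  • the following sketch serializes and parses one frame's custom information roughly along the lines of the data format described above; the exact field order, the little-endian byte order and the helper names are assumptions, since the embodiment only specifies the field types:

```python
import struct

def pack_frame_record(frame_num, pose, planes):
    """Serialize one frame's custom information: frame number (uint32),
    7 pose floats, then for each virtual plane its id (uint32), vertex
    count (uint32) and vertex (x, y, z) float triples.
    pose: (x, y, z, q0, q1, q2, q3); planes: list of (plane_id, [(x, y, z), ...])."""
    buf = struct.pack("<I", frame_num)
    buf += struct.pack("<7f", *pose)
    buf += struct.pack("<I", len(planes))
    for plane_id, vertices in planes:
        buf += struct.pack("<2I", plane_id, len(vertices))
        for x, y, z in vertices:
            buf += struct.pack("<3f", x, y, z)
    return buf

def unpack_frame_record(buf, offset=0):
    """Inverse of pack_frame_record; returns (frame_num, pose, planes, next_offset)."""
    frame_num, = struct.unpack_from("<I", buf, offset); offset += 4
    pose = struct.unpack_from("<7f", buf, offset); offset += 28
    num_planes, = struct.unpack_from("<I", buf, offset); offset += 4
    planes = []
    for _ in range(num_planes):
        plane_id, n_pts = struct.unpack_from("<2I", buf, offset); offset += 8
        vertices = [struct.unpack_from("<3f", buf, offset + 12 * i) for i in range(n_pts)]
        offset += 12 * n_pts
        planes.append((plane_id, vertices))
    return frame_num, pose, planes, offset
```

  • when the original video is loaded for editing, records parsed in this way can be aligned with the image frames by frame number, as described above.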
  • the custom information may include the aforementioned pose information and virtual plane information, and the terminal device may save the custom information into supplementary enhancement information in the video code stream corresponding to the original video.
  • the following information can be stored in the SEI information of h.264/h.265 when performing video compression encoding:
  • Pose information (float, float, float, float, float, float, float);
  • when the video to be edited is decoded in step S250, the custom information may be decoded according to the above-mentioned format.
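  • as an illustration of carrying the custom information in supplemental enhancement information (SEI), the sketch below builds the body of a user_data_unregistered SEI message (payload type 5); the UUID value and payload layout are assumptions, and NAL-unit wrapping and emulation-prevention bytes are left to the encoder:

```python
import struct
import uuid

# Fixed 16-byte UUID identifying this application's custom SEI payload
# (the value here is arbitrary; only its stability matters).
CUSTOM_SEI_UUID = uuid.UUID("9a2b6f1e-3c4d-4e5f-8a9b-0c1d2e3f4a5b").bytes

def build_user_data_sei(frame_num, pose, plane_bytes):
    """Build the body of an H.264/H.265 user_data_unregistered SEI message
    carrying the per-frame pose and virtual-plane bytes."""
    user_data = (CUSTOM_SEI_UUID
                 + struct.pack("<I", frame_num)
                 + struct.pack("<7f", *pose)
                 + plane_bytes)
    payload_type, payload_size = 5, len(user_data)
    out = bytearray()
    # payload_type and payload_size are coded as runs of 0xFF plus a final byte.
    for value in (payload_type, payload_size):
        while value >= 255:
            out.append(0xFF)
            value -= 255
        out.append(value)
    out += user_data
    return bytes(out)
```

  • on decoding, the same UUID can be used to recognise the application's SEI messages and recover the per-frame custom information.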
  • At least one of the following methods can be used to compress the custom information:
  • the pose information can be saved as the difference between the current image frame and the previous image frame; or, the plane number of the virtual plane can be saved in the form of an unsigned char; or, for the description of the vertices of the virtual plane, a horizontal plane can retain the Z-axis information of one vertex and delete the Z-axis information of the other vertices, and a vertical plane can retain the Y-axis information of one vertex and delete the Y-axis information of the other vertices; or, the position of each vertex can be described using float16; or, when saving the information of the virtual plane, only the planes in the current field of view are saved.
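  • a minimal sketch of two of these compression ideas (keeping the constant axis of a plane only once, storing the remaining vertex coordinates as float16, and saving the pose as a difference from the previous frame) is shown below; the return layouts and function names are illustrative assumptions:

```python
import numpy as np

def compress_plane_vertices(vertices, plane_axis="horizontal"):
    """Shrink the vertex description of one virtual plane: keep the (roughly)
    constant axis value once and store the remaining coordinates as float16.
    vertices: (N, 3) array; plane_axis: "horizontal" or "vertical"."""
    v = np.asarray(vertices, dtype=np.float32)
    if plane_axis == "horizontal":
        # All vertices of a horizontal plane share the same height: keep one Z value.
        shared = np.float16(v[0, 2])
        reduced = v[:, :2].astype(np.float16)      # keep X, Y per vertex
    else:
        # Vertical plane: keep one Y value, store X and Z per vertex.
        shared = np.float16(v[0, 1])
        reduced = v[:, [0, 2]].astype(np.float16)
    return shared, reduced

def delta_pose(current, previous):
    """Save the pose as the position difference from the previous frame,
    keeping the current quaternion unchanged (small motions compress well)."""
    dx = [c - p for c, p in zip(current[:3], previous[:3])]
    return tuple(dx) + tuple(current[3:])
```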
  • in other words, on the one hand, the original video is obtained through the AR video APP, and the pose information and virtual plane information of the original video are generated and saved while the video is being recorded; on the other hand, after the recording of the original video ends, each image frame in the original video can be edited, for example by adding virtual content.
  • Step S240 the original video recording ends.
  • the terminal device detects that the user clicks the shooting control 350 again, and ends the recording of the current video; for example, the duration of the recorded video is 20 seconds.
  • Step S250 open the visual interface of the virtual plane, and edit the original video.
  • the terminal device may call the saved custom information corresponding to the image frame; that is, call the pose information and plane information of the image frame.
  • after the recording ends, the display interface 330 can enter the edit mode interface, as shown in FIG. 10; after the terminal device detects that the user clicks the option for indicating AR content selection 361 on the edit mode interface, it displays the display interface shown in FIG. 11; the display interface in FIG. 11 also includes a display plane option 362; after the terminal device detects the user's operation of clicking the display plane option 362, the generated virtual plane 363 is displayed on the display interface, see FIG. 12.
  • the virtual plane 363 can be understood as a visual reference plane for placing virtual content; for example, in the process of adding virtual content, the virtual plane 363 can be displayed on the display interface; when the user taps the screen or performs a gesture operation to place the virtual content, the user's operation collides with the virtual plane 363, thereby determining the placement position of the virtual content, as shown in FIG. 12.
  • when editing the virtual content, for example when adjusting the position of the virtual content, the virtual plane 363 can be displayed in the interface; after the editing is completed, the virtual plane 363 does not appear in the AR video; the virtual plane 363 can be used as a reference plane for the user to determine where to add virtual content in the video.
  • Step S260 generating an AR video including virtual content.
  • the user can edit each image frame in the original video; for example, virtual content can be added to each image frame, and the position information of the virtual content in each image frame can be adjusted, thereby generating an AR video including the virtual content.
  • the user can play the original video and click the pause button to extract the current image frame and edit it, that is, add virtual content to the current image frame; when the user clicks the play button again, editing of the current image frame is finished.
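  • the following sketch illustrates one way the saved per-frame pose could be used so that content placed on the virtual plane stays attached to the scene during playback: the world-space anchor point obtained from the hit test is re-projected into each frame with that frame's pose; the pinhole intrinsics and the camera-to-world pose convention are assumptions not specified by the embodiment:

```python
import numpy as np

def quaternion_to_rotation(q0, q1, q2, q3):
    """3x3 rotation matrix from a unit quaternion (w, x, y, z)."""
    w, x, y, z = q0, q1, q2, q3
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z),     2 * (x * z + w * y)],
        [2 * (x * y + w * z),     1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y),     2 * (y * z + w * x),     1 - 2 * (x * x + y * y)],
    ])

def project_anchor(anchor_world, frame_pose, fx, fy, cx, cy):
    """Project a world-space anchor point into the pixel coordinates of one
    image frame, given that frame's saved pose (X, Y, Z, q0, q1, q2, q3) and
    assumed pinhole intrinsics. Returns (u, v) or None if behind the camera."""
    X, Y, Z, q0, q1, q2, q3 = frame_pose
    R = quaternion_to_rotation(q0, q1, q2, q3)      # camera-to-world rotation
    t = np.array([X, Y, Z])                         # camera position in world
    p_cam = R.T @ (np.asarray(anchor_world) - t)    # world -> camera coordinates
    if p_cam[2] <= 0:
        return None
    u = fx * p_cam[0] / p_cam[2] + cx
    v = fy * p_cam[1] / p_cam[2] + cy
    return u, v
```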
  • the pose information corresponding to the original video can be obtained when the original video is obtained; a virtual plane can be obtained according to the pose information and the original video; when adding virtual content to the image frame of the original video, the virtual plane can As a reference plane, the position of the virtual content in the original video can be adjusted according to the virtual plane, so that the virtual content can be better integrated into the original video and the video quality of the AR video can be improved.
  • Implementation mode 2: integrating the AR video processing method of the embodiment of the present application into the camera mode of the terminal device.
  • FIG. 13 is a schematic flowchart of the AR video processing method provided by the embodiment of the present application; the processing method 400 includes step S410 to step S470 , and these steps will be described in detail below.
  • Step S410 run the camera of the terminal device.
  • the terminal device detects that the user clicks the camera; in response to the user's click operation, the terminal device can run the camera.
  • Fig. 14 shows a GUI of the terminal device, and this GUI can be the desktop 510 of the terminal device;
  • after the camera is run, another GUI of the camera can be displayed, and this GUI can be the display interface 530 of the camera;
  • the display interface 530 can include a shooting viewfinder frame 540, a control 550 indicating shooting, and other shooting controls, wherein the preview image can be displayed in real time in the shooting viewfinder frame 540 .
  • Step S420 select an AR shooting mode.
  • the terminal device may detect that the user indicates an AR shooting mode operation.
  • the AR shooting mode may refer to a shooting mode in which virtual content can be added by processing the original video.
  • the shooting interface also includes a setting 560.
  • after the terminal device detects that the user clicks the setting 560, the terminal device displays the setting mode interface, as shown in FIG. 17; after the terminal device detects that the user clicks the option for indicating AR video 561 on the setting mode interface, the terminal device enters the AR shooting mode.
  • Step S430 acquiring the original video and pose information.
  • the terminal device detects that the user clicks the shooting control 550 and starts recording the image displayed in the shooting frame.
  • the action of the user for instructing the shooting may include pressing a shooting button, may include the user instructing the terminal device to shoot by voice, or may include other actions of the user instructing the terminal device to shoot; the above is an example description and does not limit the present application in any way.
  • the pose information may be used to represent the pose of the camera of the terminal device when acquiring the original video; the pose information may include attitude (orientation) information and position information.
  • the terminal device may acquire pose information corresponding to each frame of image through the gyroscope sensor 180B shown in FIG. 1 .
  • Step S440 saving the pose information and the virtual plane information.
  • the saved pose information may refer to pose information corresponding to each image frame in the original video.
  • feature points can be extracted from the image frame, and a sparse point cloud can be obtained through calculation; a virtual plane can be generated by algorithm fitting according to the sparse point cloud; when adding virtual content to a real scene, the placement position of the virtual content can be adjusted according to the virtual plane.
  • saving the pose information and the information of the virtual plane makes it possible, after the recording of the original video ends, to add virtual content to the original video according to the saved pose information and virtual plane information and thereby generate a new AR video; since the pose information and virtual plane information are saved, the user can edit the original video multiple times to generate AR videos with different virtual content.
  • the acquired three-dimensional pose information may be expressed as a quaternion, thereby avoiding the ambiguity caused by expressing the pose with three parameters.
  • a quaternion can refer to a real number plus three imaginary units i, j and k; for example, a quaternion can be a linear combination of 1, i, j and k, that is, a quaternion can generally be expressed as a+bi+cj+dk, where a, b, c and d all represent real numbers; i, j and k can represent rotations: the i rotation can represent a rotation from the positive X-axis towards the positive Y-axis in the plane where the X-axis and the Y-axis intersect, the j rotation can represent a rotation from the positive Z-axis towards the positive X-axis in the plane where the Z-axis and the X-axis intersect, and the k rotation can represent a rotation from the positive Y-axis towards the positive Z-axis in the plane where the Y-axis and the Z-axis intersect.
  • when the terminal device receives the user's instruction to shoot, for example when the user taps video recording on the terminal device, the terminal device can start the initialization of the pose calculation; before the initialization succeeds, the pose, expressed as (position x/y/z, rotation quaternion), can be recorded as all zeros, and the information of the virtual plane is (number 0); when the initialization succeeds, the pose of the specified image frame (the initialization start frame) is expressed as (0,0,0,0,0,0,0), and the information of the virtual plane is expressed as (quantity x, plane number 0, number of points n of plane 0, position (X1, Y1, Z1) of point 0, ..., position (Xn, Yn, Zn) of point n).
  • the quantity x represents the total number of virtual planes, i.e. the total number of image frames included in the video; plane number 0 can be used to represent the first virtual plane among the multiple virtual planes; the number of points n of plane 0 can be used to indicate that the first virtual plane includes n vertices; the position (X1, Y1, Z1) of point 0 is used to represent the position information of vertex 0 in the first virtual plane; and the position (Xn, Yn, Zn) of point n is used to represent the position information of vertex n in the first virtual plane.
  • the information of the virtual plane may include position information of all vertices included in the virtual plane.
  • the acquired pose information corresponding to the current image frame can be expressed as (X, Y, Z, q0, q1, q2, q3), and the information of the virtual plane can be expressed as (quantity x, plane number A, number of points n of plane A, position (X1, Y1, Z1) of point 0, ..., position (Xn, Yn, Zn) of point n).
  • X, Y and Z can respectively represent the coordinates of the camera that acquires the current image frame on the x-axis, y-axis and z-axis; q0, q1, q2 and q3 represent the rotation quaternion, which can equivalently be expressed as Euler angles such as the pitch angle, azimuth angle and rotation angle; the quantity x represents the total number of planes; the plane number A can be used to indicate the identification of the virtual plane corresponding to the current image frame; the number of points n of plane A is used to indicate that the virtual plane corresponding to the current image frame includes n vertices; the position (X1, Y1, Z1) of point 0 can be used to indicate the position information of vertex 0 in the virtual plane corresponding to the current image frame; and the position (Xn, Yn, Zn) of point n is used to indicate the position information of vertex n in the virtual plane corresponding to the current image frame.
  • an image frame in the original video can be obtained; feature points can be extracted from the image frame according to the pose information of the image frame, and a sparse point cloud can be obtained through calculation; a virtual plane can be generated by fitting according to the sparse point cloud; when adding virtual content to the video, the position of the virtual content added in the video can be adjusted according to the virtual plane.
  • for example, when the user taps the screen or performs a gesture to place virtual content, the user's operation may collide with the generated virtual plane to determine the placement position of the virtual content.
  • the terminal device may save the pose information and the information of the virtual plane.
  • the custom information includes the aforementioned pose information and the information of the virtual plane, and the terminal device may store the custom information as an independent binary (bin) file.
  • the original video and the custom information corresponding to the original video may be saved in the same directory.
  • the custom information corresponding to the original video may be saved in the terminal device in the same name as the original video.
  • the custom information corresponding to the original video may be saved in the terminal device by using the frame number of an image frame as an identifier.
  • custom information corresponding to each image frame in the original video can be saved as an independent bin file according to the following data format:
  • Frame number Frame num:unsigned int32;
  • Pose information (data 1, data 2, data 3, data 4, data 5, data 6, data 7); among them, data 1 to data 7 can be data in float format;
  • Virtual plane information (num:unsigned int32; planeNum0:unsigned int32; planeNumPoint:unsigned int32; point0(float,float,float)...pointN(float,float,float)...planeNumN...);
  • the original video and the above bin file can be loaded at the same time; the image frame in the original video and the custom information corresponding to the image frame are synchronously aligned according to the frame number.
  • the custom information may include the aforementioned pose information and virtual plane information, and the terminal device may save the custom information into supplementary enhancement information in the video code stream corresponding to the original video.
  • the following information can be stored in the SEI information of h.264/h.265 when performing video compression encoding:
  • Pose information (float, float, float, float, float, float, float);
  • when the video to be edited is decoded in step S460, the custom information may be decoded according to the above-mentioned format.
  • At least one of the following methods can be used to compress the custom information:
  • the pose information can be saved as the difference between the current image frame and the previous image frame; or, the plane number of the virtual plane can be saved in the form of an unsigned char; or, for the description of the vertices of the virtual plane, a horizontal plane can retain the Z-axis information of one vertex and delete the Z-axis information of the other vertices, and a vertical plane can retain the Y-axis information of one vertex and delete the Y-axis information of the other vertices; or, the position of each vertex can be described using float16; or, when saving the information of the virtual plane, only the planes in the current field of view are saved.
  • Step S450 the original video recording ends.
  • the terminal device detects that the user clicks the shooting control 550 again, and ends the recording of the current video; for example, the duration of the recorded video is 20 seconds.
  • Step S460 editing the original video.
  • the terminal device may call the saved custom information corresponding to the image frame; that is, call the pose information and plane information of the image frame.
  • the original video is edited through the visual interface of the virtual plane; for example, any image frame at the 8th second of the original video can be extracted and displayed in the display interface; after the terminal device detects the user's operation of clicking the display plane option 570, the generated virtual plane 562 can be displayed in the display interface, as shown in FIG. 21.
  • in the process of adding virtual content, the virtual plane 562 can be displayed on the display interface; when the user taps the screen or performs a gesture operation to place the virtual content, the user's operation collides with the virtual plane 562, thereby determining the placement position of the virtual content.
  • when editing the virtual content, for example when adjusting the position of the virtual content, the virtual plane 562 can be displayed in the interface; after the editing is completed, the virtual plane 562 does not appear in the AR video; the virtual plane 562 is used for the user to determine where the virtual content is added in the video.
  • Step S470 generating an AR video including virtual content.
  • the user can edit each image frame in the original video; for example, virtual content can be added to each image frame, and the position information of the virtual content in each image frame can be adjusted, thereby generating an AR video including the virtual content.
  • the user can play the original video and click the pause button to extract the current image frame and edit it, that is, add virtual content to the current image frame; when the user clicks the play button again, editing of the current image frame is finished.
  • the pose information corresponding to the original video can be obtained when the original video is obtained; a virtual plane can be obtained according to the pose information and the original video; when adding virtual content to the image frame of the original video, the virtual plane can As a reference plane, the position of the virtual content in the original video can be adjusted according to the virtual plane, so that the virtual content can be better integrated into the original video and the video quality of the AR video can be improved.
  • the AR video processing method of the embodiment of the present application is described in detail above with reference to FIG. 1 to FIG. 21 , and the device embodiment of the present application will be described in detail below in conjunction with FIG. 22 and FIG. 23 . It should be understood that the device in the embodiment of the present application can execute the AR video processing method in the embodiment of the present application, that is, for the specific working process of the following various products, you can refer to the corresponding process in the foregoing method embodiment.
  • FIG. 22 is a schematic structural diagram of an augmented reality video processing device provided by the present application.
  • the processing device 600 includes an acquisition unit 610 and a processing unit 620 .
  • the obtaining unit 610 is used to obtain the original video and pose information, where the original video is used to represent the video of the real object and the pose information is used to represent the pose of the terminal device when it obtains the original video; the processing unit 620 is used to generate a virtual plane according to the original video and the pose information, where the virtual plane is used to determine the position information for adding virtual content in the original video, and to add the virtual content to the original video according to the virtual plane to generate an AR video.
  • the pose information includes three-dimensional pose information
  • the processing unit 620 is further configured to:
  • the three-dimensional pose information is represented by a quaternion.
  • the processing unit 620 is specifically configured to:
  • extract the feature points of an image frame in the original video according to the pose information of the image frame;
  • generate the virtual plane according to the feature points.
  • processing unit 620 is further configured to:
  • the pose information and the information of the virtual plane are saved.
  • the processing unit 620 is specifically configured to:
  • the pose information and the information of the virtual plane are saved in a binary file.
  • the processing unit 620 is specifically configured to:
  • the pose information and the information of the virtual plane are stored in supplementary enhancement information corresponding to the original video.
  • the processing unit 620 is further configured to:
  • perform compression processing on the saved pose information and the information of the virtual plane.
  • the processing unit 620 is specifically configured to:
  • the AR video is generated by adding the virtual content to the original video according to the virtual plane.
  • the virtual plane includes a first virtual plane, the first virtual plane refers to a virtual plane corresponding to a first image frame, and the first image frame is any image frame in the original video;
  • the information of the first virtual plane includes the total number of image frames, the identification of the first virtual plane, the number of vertices included in the first virtual plane, and the position information of each vertex included in the first virtual plane, where the total number refers to the total number of image frames included in the original video.
  • processing device 600 is embodied in the form of a functional unit.
  • unit here may be implemented in the form of software and/or hardware, which is not specifically limited.
  • a "unit” may be a software program, a hardware circuit or a combination of both to realize the above functions.
  • the hardware circuitry may include application specific integrated circuits (ASICs), electronic circuits, processors (such as shared processors, dedicated processors or group processors) and memory for executing one or more software or firmware programs, merged logic circuits, and/or other suitable components that support the described functionality.
  • the units of each example described in the embodiments of the present application can be realized by electronic hardware, or a combination of computer software and electronic hardware. Whether these functions are executed by hardware or software depends on the specific application and design constraints of the technical solution. Those skilled in the art may use different methods to implement the described functions for each specific application, but such implementation should not be regarded as exceeding the scope of the present application.
  • FIG. 23 shows a schematic structural diagram of an electronic device provided by the present application.
  • the dashed line in Figure 23 indicates that the unit or the module is optional.
  • the electronic device 700 may be used to implement the processing methods described in the foregoing method embodiments.
  • the electronic device 700 includes one or more processors 701, and the one or more processors 701 can support the electronic device 700 to implement the method in the method embodiment.
  • Processor 701 may be a general purpose processor or a special purpose processor.
  • the processor 701 may be a central processing unit (central processing unit, CPU), a digital signal processor (digital signal processor, DSP), an application specific integrated circuit (application specific integrated circuit, ASIC), a field programmable gate array (field programmable gate array, FPGA) or other programmable logic devices such as discrete gates, transistor logic devices, or discrete hardware components.
  • the processor 701 can be used to control the electronic device 700, execute software programs, and process data of the software programs.
  • the electronic device 700 may further include a communication unit 705, configured to implement input (reception) and output (send) of signals.
  • the electronic device 700 can be a chip, and the communication unit 705 can be an input and/or output circuit of the chip, or the communication unit 705 can be a communication interface of the chip, and the chip can be used as a component of a terminal device or other electronic devices .
  • the electronic device 700 may be a terminal device, and the communication unit 705 may be a transceiver of the terminal device, or the communication unit 705 may be a transceiver circuit of the terminal device.
  • the electronic device 700 may include one or more memories 702, on which there is a program 704, and the program 704 may be run by the processor 701 to generate an instruction 703, so that the processor 701 executes the AR video described in the above method embodiment according to the instruction 703 processing method.
  • data may also be stored in the memory 702 .
  • the processor 701 may also read the data stored in the memory 702, the data may be stored in the same storage address as the program 704, and the data may also be stored in a different storage address from the program 704.
  • the processor 701 and the memory 702 can be set separately, or can be integrated together; for example, integrated on a system-on-chip (system on chip, SOC) of the terminal device.
  • the memory 702 can be used to store the related program 704 of the AR video processing method provided in the embodiment of the present application, and the processor 701 can be used to call the related program 704 of the AR video processing method stored in the memory 702 when editing an AR video, so as to execute the AR video processing of the embodiment of the present application; for example: acquiring the original video and pose information, where the original video is used to represent the video of a real object and the pose information is used to represent the pose of the terminal device when acquiring the original video; generating a virtual plane according to the original video and the pose information, where the virtual plane is used to determine the position information for adding virtual content in the original video; and adding the virtual content to the original video according to the virtual plane to generate an AR video.
  • the present application also provides a computer program product, which implements the processing method described in any method embodiment in the present application when the computer program product is executed by the processor 701 .
  • the computer program product can be stored in the memory 702, such as a program 704, and the program 704 is finally converted into an executable object file that can be executed by the processor 701 through processes such as preprocessing, compiling, assembling and linking.
  • the present application also provides a computer-readable storage medium, on which a computer program is stored, and when the computer program is executed by a computer, the method described in any method embodiment in the present application is implemented.
  • the computer program may be a high-level language program or an executable object program.
  • the computer-readable storage medium is, for example, the memory 702 .
  • the memory 702 may be a volatile memory or a nonvolatile memory, or, the memory 702 may include both a volatile memory and a nonvolatile memory.
  • the non-volatile memory can be read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM) or flash memory.
  • Volatile memory can be random access memory (RAM), which acts as external cache memory.
  • many forms of RAM can be used, for example: static random access memory (SRAM), dynamic random access memory (DRAM), synchronous dynamic random access memory (SDRAM), double data rate synchronous dynamic random access memory (DDR SDRAM), enhanced synchronous dynamic random access memory (ESDRAM), synchlink dynamic random access memory (SLDRAM) and direct rambus random access memory (DR RAM).
  • the disclosed systems, devices and methods may be implemented in other ways. For example, some features of the method embodiments described above may be omitted, or not implemented.
  • the device embodiments described above are only illustrative, and the division of units is only a logical function division. In actual implementation, there may be other division methods, and multiple units or components may be combined or integrated into another system.
  • the coupling between the various units or the coupling between the various components may be direct coupling or indirect coupling, and the above coupling includes electrical, mechanical or other forms of connection.
  • the sequence numbers of the processes do not mean the order of execution; the execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation process of the embodiments of the present application.
  • system and “network” are often used herein interchangeably.
  • the term “and/or” in this article is just an association relationship describing associated objects, which means that there can be three relationships, for example, A and/or B, which can mean: A exists alone, A and B exist simultaneously, and A and B exist alone. There are three cases of B.
  • the character "/" in this article generally indicates that the contextual objects are an "or” relationship.



Description

增强现实视频的处理方法与电子设备
本申请要求于2021年07月22日提交国家知识产权局、申请号为202110831693.9、申请名称为“增强现实视频的处理方法与电子设备”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本申请涉及终端领域,具体涉及一种增强现实视频的处理方法与电子设备。
背景技术
增强现实(augmented reality,AR)技术是一种实时地计算摄影机影像的位置及角度并加上相应图像的技术,是一种将真实世界信息和虚拟世界信息“无缝”集成的新技术,这种技术的目标是在屏幕上把虚拟世界套在现实世界并进行互动。
目前,在录制AR视频时由于无法将虚拟内容和真实物体的视频较好的融合,尤其是在摄场景中需要用户与虚拟内容进行交互时,需要多次的重复拍摄,费时费力。
因此,如何在录制AR视频时使得虚拟内容和真实物体内容较好的融合,提高AR视频的视频质量成为一个亟需解决的问题。
发明内容
本申请提供了一种增强现实视频的处理方法与电子设备,能够在录制AR视频时使得虚拟内容和真实物体的视频较好融合,提高AR视频的视频质量。
第一方面,提供了一种增强现实视频的处理方法,包括:
获取原始视频与位姿信息,所述原始视频用于表示真实物体的视频,所述位姿信息用于表示终端设备获取所述原始视频时的位姿;根据所述原始视频与所述位姿信息生成虚拟平面,所述虚拟平面用于确定在所述原始视频中添加虚拟内容的位置信息;根据所述虚拟平面在所述原始视频中添加所述虚拟内容生成AR视频。
在本申请的实施例中,在获取原始视频时可以获取原始视频对应的位姿信息;根据位姿信息与原始视频可以得到虚拟平面;在原始视频的图像帧中添加虚拟内容时,虚拟平面可以作为一个基准面,根据虚拟平面可以调整虚拟内容在原始视频中的位置,使得虚拟内容能够更好的融入到原始视频中,提高AR视频的视频质量。
应理解,位姿信息用于表示终端设备的摄像头在获取原始视频时的位姿;位姿信息可以包括姿态信息与位置信息。
结合第一方面,在第一方面的某些实现方式中,所述位姿信息包括三维姿态信息,还包括:
通过四元数表示所述三维姿态信息。
在本申请的实施例中,可以将三维姿态信息转化为四元数来表示,从而避免将姿 态信息通过三个参数表示所产生的歧义。
结合第一方面,在第一方面的某些实现方式中,所述根据所述原始视频与所述位姿信息生成虚拟平面的信息,包括:
根据所述原始视频中图像帧的位姿信息提取所述图像帧的特征点;
根据所述特征点生成所述虚拟平面。
应理解,图像帧的特征点可以是指图像灰度值发生剧烈变化的点,或者在图像边缘上曲率较大的点;特征点可以用于标识图像中物体。
结合第一方面,在第一方面的某些实现方式中,还包括:
保存所述位姿信息与所述虚拟平面的信息。
在本申请的实施例中,保存位姿信息与虚拟平面的信息可以使得在原始视频录制结束后,根据原始视频的位姿信息与虚拟平面的信息在原始视频中添加虚拟内容生成一个新的AR视频;由于保存了位姿信息与虚拟平面的信息,用户可以对原始视频进行多次不同的编辑,分别生成带不同虚拟内容的AR视频。
结合第一方面,在第一方面的某些实现方式中,所述保存所述位姿信息与所述虚拟平面的信息,包括:
将所述位姿信息与所述虚拟平面的信息保存在二进制文件中。
在一种可能的实现方式中,终端设备可以将位姿信息与虚拟平面的信息保存为独立的二进制文件。
在一种可能的实现方式中,可以将原始视频与原始视频对应的位姿信息与虚拟平面的信息保存在相同的目录下。
在一种可能的实现方式中,可以将原始视频对应的位姿信息与虚拟平面的信息与原始视频的命名相同保存在终端设备中。
在一种可能的实现方式中,可以通过每个图像帧的帧号作为标识,将原始视频对应的位姿信息与虚拟平面的信息保存在终端设备中。
结合第一方面,在第一方面的某些实现方式中,所述保存所述位姿信息与所述虚拟平面的信息,包括:
将所述位姿信息与所述虚拟平面的信息保存在所述原始视频对应的补充增强信息中。
在一种可能的实现方式中,可以将位姿信息与虚拟平面的信息进行视频压缩编码的时保存至h.264或者h.265的补充增强信息中。
结合第一方面,在第一方面的某些实现方式中,还包括:
对保存的所述位姿信息与所述虚拟平面的信息进行压缩处理。
在本申请的实施例中,在保存位姿信息与虚拟平面的信息时可以对保存的信息进行压缩处理,从而能够有效的减少保存信息占用的内存空间。
在一种可能的实现方式中,可以采用以下的至少一种方式进行对保存位姿信息与虚拟平面的信息进行压缩处理:
根据当前图像帧与前一图像帧的差保存位姿信息;或者,虚拟平面的平面编号可以采用无符号字符方式保存;或者,对于虚拟平面中顶点的描述,水平面可以保留一个点的Z轴信息删除其他点的Z轴信息,垂直面可以保留一个点的Y轴信息删除其他 点的Y轴信息;或者,顶点的位置描述可以采用float16;或者,保存虚拟平面的信息时可以只保存当前视野范围内的平面。
结合第一方面,在第一方面的某些实现方式中,所述根据所述虚拟平面的信息在所述原始视频中添加所述虚拟内容生成AR视频,包括:
在所述原始视频录制完成后,根据所述虚拟平面在所述原始视频中添加所述虚拟内容生成所述AR视频。
结合第一方面,在第一方面的某些实现方式中,所述虚拟平面包括第一虚拟平面,所述第一虚拟平面是指第一图像帧对应的虚拟平面,所述第一图像帧是所述原始视频中的任意一个图像帧;
所述第一虚拟平面的信息包括图像帧的总数、所述第一虚拟平面的标识、所述第一虚拟平面包括的顶点数量以及所述第一虚拟平面包括的每一个顶点的位置信息,所述总数是指所述原始视频包括图像帧的总数。
第二方面,提供了一种AR视频的处理装置,所述处理装置包括获取单元与处理单元;
其中,所述获取单元用于获取原始视频与位姿信息,所述原始视频用于表示真实物体的视频,所述位姿信息用于表示终端设备获取所述原始视频时的位姿;所述处理单元用于根据所述原始视频与所述位姿信息生成虚拟平面,所述虚拟平面用于确定在所述原始视频中添加虚拟内容的位置信息;根据所述虚拟平面在所述原始视频中添加所述虚拟内容生成AR视频。
结合第二方面,在第二方面的某些实现方式中,所述位姿信息包括三维姿态信息,所述处理单元还用于:
通过四元数表示所述三维姿态信息。
结合第二方面,在第二方面的某些实现方式中,所述处理单元具体用于:
根据所述原始视频中图像帧的位姿信息提取所述图像帧的特征点;
根据所述特征点生成所述虚拟平面。
结合第二方面,在第二方面的某些实现方式中,所述处理单元还用于:
保存所述位姿信息与所述虚拟平面的信息。
结合第二方面,在第二方面的某些实现方式中,所述处理单元具体用于:
将所述位姿信息与所述虚拟平面的信息保存在二进制文件中。
结合第二方面,在第二方面的某些实现方式中,所述处理单元具体用于:
将所述位姿信息与所述虚拟平面的信息保存在所述原始视频对应的补充增强信息中。
结合第二方面,在第二方面的某些实现方式中,所述处理单元还用于:
对保存的所述位姿信息与所述虚拟平面的信息进行压缩处理。
结合第二方面,在第二方面的某些实现方式中,所述处理单元具体用于:
在所述原始视频录制完成后,根据所述虚拟平面在所述原始视频中添加所述虚拟内容生成所述AR视频。
结合第二方面,在第二方面的某些实现方式中,所述虚拟平面包括第一虚拟平面,所述第一虚拟平面是指第一图像帧对应的虚拟平面,所述第一图像帧是所述原始视频 中的任意一个图像帧;
所述第一虚拟平面的信息包括图像帧的总数、所述第一虚拟平面的标识、所述第一虚拟平面包括的顶点数量以及所述第一虚拟平面包括的每一个顶点的位置信息,所述总数是指所述原始视频包括图像帧的总数。
在一种可能的实现方式中,上述AR视频的处理装置可以是指芯片。
在上述处理装置为芯片时,获取单元可以是指输出接口、管脚或电路等;处理单元可以是指芯片内部的处理单元。
应理解,在上述第一方面中对相关内容的扩展、限定、解释和说明也适用于第二方面中相同的内容。
第三方面,提供了一种电子设备,所述电子设备包括:一个或多个处理器、存储器和显示屏;所述存储器与所述一个或多个处理器耦合,所述存储器用于存储计算机程序代码,所述计算机程序代码包括计算机指令,所述一个或多个处理器调用所述计算机指令以使得所述电子设备执行:
获取原始视频与位姿信息,所述原始视频用于表示真实物体的视频,所述位姿信息用于表示终端设备获取所述原始视频时的位姿;根据所述原始视频与所述位姿信息生成虚拟平面,所述虚拟平面用于确定在所述原始视频中添加虚拟内容的位置信息;根据所述虚拟平面在所述原始视频中添加所述虚拟内容生成AR视频。
结合第三方面,在第三方面的某些实现方式中,所述位姿信息包括三维姿态信息,所述一个或多个处理器调用所述计算机指令以使得所述电子设备还执行:
通过四元数表示所述三维姿态信息。
结合第三方面,在第三方面的某些实现方式中,所述一个或多个处理器调用所述计算机指令以使得所述电子设备还执行:
根据所述原始视频中图像帧的位姿信息提取所述图像帧的特征点;
根据所述特征点生成所述虚拟平面。
结合第三方面,在第三方面的某些实现方式中,所述一个或多个处理器调用所述计算机指令以使得所述电子设备还执行:
保存所述位姿信息与所述虚拟平面的信息。
结合第三方面,在第三方面的某些实现方式中,所述一个或多个处理器调用所述计算机指令以使得所述电子设备还执行:
将所述位姿信息与所述虚拟平面的信息保存在二进制文件中。
结合第三方面,在第三方面的某些实现方式中,所述一个或多个处理器调用所述计算机指令以使得所述电子设备还执行:
将所述位姿信息与所述虚拟平面的信息保存在所述原始视频对应的补充增强信息中。
结合第三方面,在第三方面的某些实现方式中,所述一个或多个处理器调用所述计算机指令以使得所述电子设备还执行:
对保存的所述位姿信息与所述虚拟平面的信息进行压缩处理。
结合第三方面,在第三方面的某些实现方式中,所述一个或多个处理器调用所述计算机指令以使得所述电子设备还执行:
在所述原始视频录制完成后,根据所述虚拟平面在所述原始视频中添加所述虚拟内容生成所述AR视频。
结合第三方面,在第三方面的某些实现方式中,所述虚拟平面包括第一虚拟平面,所述第一虚拟平面是指第一图像帧对应的虚拟平面,所述第一图像帧是所述原始视频中的任意一个图像帧;
所述第一虚拟平面的信息包括图像帧的总数、所述第一虚拟平面的标识、所述第一虚拟平面包括的顶点数量以及所述第一虚拟平面包括的每一个顶点的位置信息,所述总数是指所述原始视频包括图像帧的总数。
应理解,在上述第一方面中对相关内容的扩展、限定、解释和说明也适用于第三方面中相同的内容。
第四方面,提供了一种电子设备,所述电子设备包括:一个或多个处理器、存储器和显示屏;所述存储器与所述一个或多个处理器耦合,所述存储器用于存储计算机程序代码,所述计算机程序代码包括计算机指令,所述一个或多个处理器调用所述计算机指令以使得所述电子设备执行第一方面中的任一种处理方法。
第五方面,提供了一种芯片系统,所述芯片系统应用于电子设备,所述芯片系统包括一个或多个处理器,所述处理器用于调用计算机指令以使得所述电子设备执行第一方面中的任一种处理方法。
第六方面,提供了一种计算机可读存储介质,所述计算机可读存储介质存储有计算机程序代码,当所述计算机程序代码被电子设备运行时,使得该电子设备执行第一方面中的任一种处理方法。
第七方面,提供了一种计算机程序产品,所述计算机程序产品包括:计算机程序代码,当所述计算机程序代码被电子设备运行时,使得该电子设备执行第一方面中的任一种处理方法。
在本申请的实施例中,由于在获取原始视频时可以获取原始视频对应的位姿信息;根据位姿信息与原始视频可以得到虚拟平面;在原始视频的图像帧中添加虚拟内容时,虚拟平面可以作为一个基准面,根据虚拟平面可以调整虚拟内容在原始视频中的位置;因此,在本申请的实施例中,通过虚拟平面使得虚拟内容能够更好的融入到原始视频中,从而提高生成的AR视频的视频质量。
附图说明
图1是一种适用于本申请的装置的硬件系统的示意图;
图2是一种适用于本申请的装置的软件系统的示意图;
图3是本申请提供的一种应用场景的示意图;
图4是本申请提供的一种增强现实视频的处理方法的示意图;
图5是本申请提供的一种AR视频处理的显示界面的示意图;
图6是本申请提供的一种AR视频处理的显示界面的示意图;
图7是本申请提供的一种AR视频处理的显示界面的示意图;
图8是本申请提供的一种AR视频处理的显示界面的示意图;
图9是本申请提供的一种AR视频处理的显示界面的示意图;
图10是本申请提供的一种AR视频处理的显示界面的示意图;
图11是本申请提供的一种AR视频处理的显示界面的示意图;
图12是本申请提供的一种AR视频处理的显示界面的示意图;
图13是本申请提供的一种增强现实视频的处理方法的示意图;
图14是本申请提供的一种AR视频处理的显示界面的示意图;
图15是本申请提供的一种AR视频处理的显示界面的示意图;
图16是本申请提供的一种AR视频处理的显示界面的示意图;
图17是本申请提供的一种AR视频处理的显示界面的示意图;
图18是本申请提供的一种AR视频处理的显示界面的示意图;
图19是本申请提供的一种AR视频处理的显示界面的示意图;
图20是本申请提供的一种AR视频处理的显示界面的示意图;
图21是本申请提供的一种AR视频处理的显示界面的示意图;
图22是本申请提供的一种增强现实视频的处理装置的结构示意图;
图23是本申请提供的一种电子设备的结构示意图。
具体实施方式
下面将结合附图,对本申请实施例中的技术方案进行描述。
图1示出了一种适用于本申请的终端设备的硬件系统。
终端设备100可以是手机、智慧屏、平板电脑、可穿戴电子设备、车载电子设备、增强现实(augmented reality,AR)设备、虚拟现实(virtual reality,VR)设备、笔记本电脑、超级移动个人计算机(ultra-mobile personal computer,UMPC)、上网本、个人数字助理(personal digital assistant,PDA)、投影仪等等,本申请实施例对终端设备100的具体类型不作任何限制。
终端设备100可以包括处理器110,外部存储器接口120,内部存储器121,通用串行总线(universal serial bus,USB)接口130,充电管理模块140,电源管理模块141,电池142,天线1,天线2,移动通信模块150,无线通信模块160,音频模块170,扬声器170A,受话器170B,麦克风170C,耳机接口170D,传感器模块180,按键190,马达191,指示器192,摄像头193,显示屏194,以及用户标识模块(subscriber identification module,SIM)卡接口195等。其中传感器模块180可以包括压力传感器180A,陀螺仪传感器180B,气压传感器180C,磁传感器180D,加速度传感器180E,距离传感器180F,接近光传感器180G,指纹传感器180H,温度传感器180J,触摸传感器180K,环境光传感器180L,骨传导传感器180M等。
需要说明的是,图1所示的结构并不构成对终端设备100的具体限定。在本申请另一些实施例中,终端设备100可以包括比图1所示的部件更多或更少的部件,或者,终端设备100可以包括图1所示的部件中某些部件的组合,或者,终端设备100可以包括图1所示的部件中某些部件的子部件。图1示的部件可以以硬件、软件、或软件和硬件的组合实现。
处理器110可以包括一个或多个处理单元。例如,处理器110可以包括以下处理单元中的至少一个:应用处理器(application processor,AP)、调制解调处理器、图 形处理器(graphics processing unit,GPU)、图像信号处理器(image signal processor,ISP)、控制器、视频编解码器、数字信号处理器(digital signal processor,DSP)、基带处理器、神经网络处理器(neural-network processing unit,NPU)。其中,不同的处理单元可以是独立的器件,也可以是集成的器件。
控制器可以根据指令操作码和时序信号,产生操作控制信号,完成取指令和执行指令的控制。
处理器110中还可以设置存储器,用于存储指令和数据。在一些实施例中,处理器110中的存储器为高速缓冲存储器。该存储器可以保存处理器110刚用过或循环使用的指令或数据。如果处理器110需要再次使用该指令或数据,可从所述存储器中直接调用。避免了重复存取,减少了处理器110的等待时间,因而提高了系统的效率。
在一些实施例中,处理器110可以包括一个或多个接口。例如,处理器110可以包括以下接口中的至少一个:内部集成电路(inter-integrated circuit,I2C)接口、内部集成电路音频(inter-integrated circuit sound,I2S)接口、脉冲编码调制(pulse code modulation,PCM)接口、通用异步接收传输器(universal asynchronous receiver/transmitter,UART)接口、移动产业处理器接口(mobile industry processor interface,MIPI)、通用输入输出(general-purpose input/output,GPIO)接口、SIM接口、USB接口。
I2C接口是一种双向同步串行总线,包括一根串行数据线(serial data line,SDA)和一根串行时钟线(derail clock line,SCL)。在一些实施例中,处理器110可以包含多组I2C总线。处理器110可以通过不同的I2C总线接口分别耦合触摸传感器180K、充电器、闪光灯、摄像头193等。例如:处理器110可以通过I2C接口耦合触摸传感器180K,使处理器110与触摸传感器180K通过I2C总线接口通信,实现终端设备100的触摸功能。
I2S接口可以用于音频通信。在一些实施例中,处理器110可以包含多组I2S总线。处理器110可以通过I2S总线与音频模块170耦合,实现处理器110与音频模块170之间的通信。在一些实施例中,音频模块170可以通过I2S接口向无线通信模块160传递音频信号,实现通过蓝牙耳机接听电话的功能。
PCM接口也可以用于音频通信,将模拟信号抽样,量化和编码。在一些实施例中,音频模块170与无线通信模块160可以通过PCM接口耦合。在一些实施例中,音频模块170也可以通过PCM接口向无线通信模块160传递音频信号,实现通过蓝牙耳机接听电话的功能。所述I2S接口和所述PCM接口都可以用于音频通信。
UART接口是一种通用串行数据总线,用于异步通信。该总线可以为双向通信总线。它将要传输的数据在串行通信与并行通信之间转换。在一些实施例中,UART接口通常被用于连接处理器110与无线通信模块160。例如:处理器110通过UART接口与无线通信模块160中的蓝牙模块通信,实现蓝牙功能。在一些实施例中,音频模块170可以通过UART接口向无线通信模块160传递音频信号,实现通过蓝牙耳机播放音乐的功能。
MIPI接口可以被用于连接处理器110与显示屏194和摄像头193等外围器件。MIPI接口包括摄像头串行接口(camera serial interface,CSI)、显示屏串行接口(display  serial interface,DSI)等。在一些实施例中,处理器110和摄像头193通过CSI接口通信,实现终端设备100的拍摄功能。处理器110和显示屏194通过DSI接口通信,实现终端设备100的显示功能。
GPIO接口可以通过软件配置。GPIO接口可以被配置为控制信号接口,也可被配置为数据信号接口。在一些实施例中,GPIO接口可以用于连接处理器110与摄像头193,显示屏194、无线通信模块160、音频模块170和传感器模块180。GPIO接口还可以被配置为I2C接口、I2S接口、UART接口或MIPI接口。
USB接口130是符合USB标准规范的接口,例如可以是迷你(Mini)USB接口、微型(Micro)USB接口或C型USB(USB Type C)接口。USB接口130可以用于连接充电器为终端设备100充电,也可以用于终端设备100与外围设备之间传输数据,还可以用于连接耳机以通过耳机播放音频。USB接口130还可以用于连接其他终端设备100,例如AR设备。
图1所示的各模块间的连接关系只是示意性说明,并不构成对终端设备100的各模块间的连接关系的限定。可选地,终端设备100的各模块也可以采用上述实施例中多种连接方式的组合。
充电管理模块140用于从充电器接收电力。其中,充电器可以是无线充电器,也可以是有线充电器。在一些有线充电的实施例中,充电管理模块140可以通过USB接口130接收有线充电器的电流。在一些无线充电的实施例中,充电管理模块140可以通过终端设备100的无线充电线圈接收电磁波(电流路径如虚线所示)。充电管理模块140为电池142充电的同时,还可以通过电源管理模块141为终端设备100供电。
电源管理模块141用于连接电池142,充电管理模块140与处理器110。电源管理模块141接收电池142和/或充电管理模块140的输入,为处理器110,内部存储器121,显示屏194,摄像头193,和无线通信模块160等供电。电源管理模块141还可以用于监测电池容量、电池循环次数和电池健康状态(例如,漏电、阻抗)等参数。可选地,电源管理模块141可以设置于处理器110中,或者,电源管理模块141和充电管理模块140可以设置于同一个器件中。
终端设备100的无线通信功能可以通过天线1、天线2、移动通信模块150、无线通信模块160、调制解调处理器以及基带处理器等器件实现。
天线1和天线2用于发射和接收电磁波信号。终端设备100中的每个天线可用于覆盖单个或多个通信频带。不同的天线还可以复用,以提高天线的利用率。例如:可以将天线1复用为无线局域网的分集天线。在另外一些实施例中,天线可以和调谐开关结合使用。
移动通信模块150可以提供应用在终端设备100上的无线通信的解决方案,例如下列方案中的至少一个:第二代(2th generation,2G)移动通信解决方案、第三代(3th generation,3G)移动通信解决方案、第四代(4th generation,4G)移动通信解决方案、第五代(5th generation,5G)移动通信解决方案。移动通信模块150可以包括至少一个滤波器,开关,功率放大器,低噪声放大器(low noise amplifier,LNA)等。移动通信模块150可以由天线1接收电磁波,并对接收的电磁波进行滤波和放大等处理,随后传送至调制解调处理器进行解调。移动通信模块150还可以放大经调制解调处理 器调制后的信号,放大后的该信号经天线1转变为电磁波辐射出去。在一些实施例中,移动通信模块150的至少部分功能模块可以被设置于处理器110中。在一些实施例中,移动通信模块150的至少部分功能模块可以与处理器110的至少部分模块被设置在同一个器件中。
调制解调处理器可以包括调制器和解调器。其中,调制器用于将待发送的低频基带信号调制成中高频信号。解调器用于将接收的电磁波信号解调为低频基带信号。随后解调器将解调得到的低频基带信号传送至基带处理器处理。低频基带信号经基带处理器处理后,被传递给应用处理器。应用处理器通过音频设备(例如,扬声器170A、受话器170B)输出声音信号,或通过显示屏194显示图像或视频。在一些实施例中,调制解调处理器可以是独立的器件。在另一些实施例中,调制解调处理器可以独立于处理器110,与移动通信模块150或其他功能模块设置在同一个器件中。
与移动通信模块150类似,无线通信模块160也可以提供应用在终端设备100上的无线通信解决方案,例如下列方案中的至少一个:无线局域网(wireless local area networks,WLAN)、蓝牙(bluetooth,BT)、蓝牙低功耗(bluetooth low energy,BLE)、超宽带(ultra wide band,UWB)、全球导航卫星系统(global navigation satellite system,GNSS)、调频(frequency modulation,FM)、近场通信(near field communication,NFC)、红外(infrared,IR)技术。无线通信模块160可以是集成至少一个通信处理模块的一个或多个器件。无线通信模块160经由天线2接收电磁波,将电磁波信号调频以及滤波处理,并将处理后的信号发送到处理器110。无线通信模块160还可以从处理器110接收待发送的信号,对其进行调频和放大,该信号经天线2转变为电磁波辐射出去。
在一些实施例中,终端设备100的天线1和移动通信模块150耦合,终端设备100的天线2和无线通信模块160耦合,使得终端设备100可以通过无线通信技术与网络和其他电子设备通信。该无线通信技术可以包括以下通信技术中的至少一个:全球移动通讯系统(global system for mobile communications,GSM),通用分组无线服务(general packet radio service,GPRS),码分多址接入(code division multiple access,CDMA),宽带码分多址(wideband code division multiple access,WCDMA),时分码分多址(time-division code division multiple access,TD-SCDMA),长期演进(long term evolution,LTE),BT,GNSS,WLAN,NFC,FM,IR技术。该GNSS可以包括以下定位技术中的至少一个:全球卫星定位系统(global positioning system,GPS),全球导航卫星系统(global navigation satellite system,GLONASS),北斗卫星导航系统(beidou navigation satellite system,BDS),准天顶卫星系统(quasi-zenith satellite system,QZSS),星基增强系统(satellite based augmentation systems,SBAS)。
终端设备100可以通过GPU、显示屏194以及应用处理器实现显示功能。GPU为图像处理的微处理器,连接显示屏194和应用处理器。GPU用于执行数学和几何计算,用于图形渲染。处理器110可包括一个或多个GPU,其执行程序指令以生成或改变显示信息。
显示屏194可以用于显示图像或视频。显示屏194包括显示面板。显示面板可以采用液晶显示屏(liquid crystal display,LCD)、有机发光二极管(organic light-emitting  diode,OLED)、有源矩阵有机发光二极体(active-matrix organic light-emitting diode,AMOLED)、柔性发光二极管(flex light-emitting diode,FLED)、迷你发光二极管(mini light-emitting diode,Mini LED)、微型发光二极管(micro light-emitting diode,Micro LED)、微型OLED(Micro OLED)或量子点发光二极管(quantum dot light emitting diodes,QLED)。在一些实施例中,终端设备100可以包括1个或N个显示屏194,N为大于1的正整数。
终端设备100可以通过ISP、摄像头193、视频编解码器、GPU、显示屏194以及应用处理器等实现拍摄功能。
ISP用于处理摄像头193反馈的数据。例如,拍照时,打开快门,光线通过镜头被传递到摄像头感光元件上,光信号转换为电信号,摄像头感光元件将所述电信号传递给ISP处理,转化为肉眼可见的图像。ISP可以对图像的噪点、亮度和色彩进行算法优化,ISP还可以优化拍摄场景的曝光和色温等参数。在一些实施例中,ISP可以设置在摄像头193中。
摄像头193用于捕获静态图像或视频。物体通过镜头生成光学图像投射到感光元件。感光元件可以是电荷耦合器件(charge coupled device,CCD)或互补金属氧化物半导体(complementary metal-oxide-semiconductor,CMOS)光电晶体管。感光元件把光信号转换成电信号,之后将电信号传递给ISP转换成数字图像信号。ISP将数字图像信号输出到DSP加工处理。DSP将数字图像信号转换成标准的红绿蓝(red green blue,RGB),YUV等格式的图像信号。在一些实施例中,终端设备100可以包括1个或N个摄像头193,N为大于1的正整数。
数字信号处理器用于处理数字信号,除了可以处理数字图像信号,还可以处理其他数字信号。例如,当终端设备100在频点选择时,数字信号处理器用于对频点能量进行傅里叶变换等。
视频编解码器用于对数字视频压缩或解压缩。终端设备100可以支持一种或多种视频编解码器。这样,终端设备100可以播放或录制多种编码格式的视频,例如:动态图像专家组(moving picture experts group,MPEG)1、MPEG2、MPEG3和MPEG4。
NPU是一种借鉴生物神经网络结构的处理器,例如借鉴人脑神经元之间传递模式对输入信息快速处理,还可以不断地自学习。通过NPU可以实现终端设备100的智能认知等功能,例如:图像识别、人脸识别、语音识别和文本理解。
外部存储器接口120可以用于连接外部存储卡,例如安全数码(secure digital,SD)卡,实现扩展终端设备100的存储能力。外部存储卡通过外部存储器接口120与处理器110通信,实现数据存储功能。例如将音乐,视频等文件保存在外部存储卡中。
内部存储器121可以用于存储计算机可执行程序代码，所述可执行程序代码包括指令。内部存储器121可以包括存储程序区和存储数据区。其中，存储程序区可存储操作系统，至少一个功能(例如，声音播放功能和图像播放功能)所需的应用程序。存储数据区可存储终端设备100使用过程中所创建的数据(例如，音频数据和电话本)。此外，内部存储器121可以包括高速随机存取存储器，还可以包括非易失性存储器，例如：至少一个磁盘存储器件、闪存器件和通用闪存存储器(universal flash storage,UFS)等。处理器110通过运行存储在内部存储器121的指令和/或存储在设置于处理器中的存储器的指令，执行终端设备100的各种处理方法。
终端设备100可以通过音频模块170、扬声器170A、受话器170B、麦克风170C、耳机接口170D以及应用处理器等实现音频功能,例如,音乐播放和录音。
音频模块170用于将数字音频信息转换成模拟音频信号输出,也可以用于将模拟音频输入转换为数字音频信号。音频模块170还可以用于对音频信号编码和解码。在一些实施例中,音频模块170或者音频模块170的部分功能模块可以设置于处理器110中。
扬声器170A,也称为喇叭,用于将音频电信号转换为声音信号。终端设备100可以通过扬声器170A收听音乐或免提通话。
受话器170B,也称为听筒,用于将音频电信号转换成声音信号。当用户使用终端设备100接听电话或语音信息时,可以通过将受话器170B靠近耳朵接听语音。
麦克风170C，也称为话筒或传声器，用于将声音信号转换为电信号。当用户拨打电话或发送语音信息时，可以通过靠近麦克风170C发声将声音信号输入麦克风170C。终端设备100可以设置至少一个麦克风170C。在另一些实施例中，终端设备100可以设置两个麦克风170C，以实现降噪功能。在另一些实施例中，终端设备100还可以设置三个、四个或更多麦克风170C，以实现识别声音来源和定向录音等功能。处理器110可以对麦克风170C输出的电信号进行处理，例如，音频模块170与无线通信模块160可以通过PCM接口耦合，麦克风170C将环境声音转换为电信号(如PCM信号)后，通过PCM接口将该电信号传输至处理器110；由处理器110对该电信号进行音量分析和频率分析，确定环境声音的音量和频率。
耳机接口170D用于连接有线耳机。耳机接口170D可以是USB接口130，也可以是3.5mm的开放移动终端平台(open mobile terminal platform,OMTP)标准接口，或者美国蜂窝电信工业协会(cellular telecommunications industry association of the USA,CTIA)标准接口。
压力传感器180A用于感受压力信号,可以将压力信号转换成电信号。在一些实施例中,压力传感器180A可以设置于显示屏194。压力传感器180A的种类很多,例如可以是电阻式压力传感器、电感式压力传感器或电容式压力传感器。电容式压力传感器可以是包括至少两个具有导电材料的平行板,当力作用于压力传感器180A,电极之间的电容改变,终端设备100根据电容的变化确定压力的强度。当触摸操作作用于显示屏194时,终端设备100根据压力传感器180A检测所述触摸操作。终端设备100也可以根据压力传感器180A的检测信号计算触摸的位置。在一些实施例中,作用于相同触摸位置,但不同触摸操作强度的触摸操作,可以对应不同的操作指令。例如:当触摸操作强度小于第一压力阈值的触摸操作作用于短消息应用图标时,执行查看短消息的指令;当触摸操作强度大于或等于第一压力阈值的触摸操作作用于短消息应用图标时,执行新建短消息的指令。
陀螺仪传感器180B可以用于确定终端设备100的运动姿态。在一些实施例中，可以通过陀螺仪传感器180B确定终端设备100围绕三个轴(即，x轴、y轴和z轴)的角速度。陀螺仪传感器180B可以用于拍摄防抖。例如，当快门被按下时，陀螺仪传感器180B检测终端设备100抖动的角度，根据角度计算出镜头模组需要补偿的距离，让镜头通过反向运动抵消终端设备100的抖动，实现防抖。陀螺仪传感器180B还可以用于导航和体感游戏等场景。
气压传感器180C用于测量气压。在一些实施例中,终端设备100通过气压传感器180C测得的气压值计算海拔高度,辅助定位和导航。
磁传感器180D包括霍尔传感器。终端设备100可以利用磁传感器180D检测翻盖皮套的开合。在一些实施例中,当终端设备100是翻盖机时,终端设备100可以根据磁传感器180D检测翻盖的开合。终端设备100可以根据检测到的皮套的开合状态或翻盖的开合状态,设置翻盖自动解锁等特性。
加速度传感器180E可检测终端设备100在各个方向上(一般为x轴、y轴和z轴)加速度的大小。当终端设备100静止时可检测出重力的大小及方向。加速度传感器180E还可以用于识别终端设备100的姿态,作为横竖屏切换和计步器等应用程序的输入参数。
距离传感器180F用于测量距离。终端设备100可以通过红外或激光测量距离。在一些实施例中,例如在拍摄场景中,终端设备100可以利用距离传感器180F测距以实现快速对焦。
接近光传感器180G可以包括例如发光二极管(light-emitting diode,LED)和光检测器,例如,光电二极管。LED可以是红外LED。终端设备100通过LED向外发射红外光。终端设备100使用光电二极管检测来自附近物体的红外反射光。当检测到反射光时,终端设备100可以确定附近存在物体。当检测不到反射光时,终端设备100可以确定附近没有物体。终端设备100可以利用接近光传感器180G检测用户是否手持终端设备100贴近耳朵通话,以便自动熄灭屏幕达到省电的目的。接近光传感器180G也可用于皮套模式或口袋模式的自动解锁与自动锁屏。
环境光传感器180L用于感知环境光亮度。终端设备100可以根据感知的环境光亮度自适应调节显示屏194亮度。环境光传感器180L也可用于拍照时自动调节白平衡。环境光传感器180L还可以与接近光传感器180G配合,检测终端设备100是否在口袋里,以防误触。
指纹传感器180H用于采集指纹。终端设备100可以利用采集的指纹特性实现解锁、访问应用锁、拍照和接听来电等功能。
温度传感器180J用于检测温度。在一些实施例中，终端设备100利用温度传感器180J检测的温度，执行温度处理策略。例如，当温度传感器180J上报的温度超过阈值，终端设备100降低位于温度传感器180J附近的处理器的性能，以便降低功耗实施热保护。在另一些实施例中，当温度低于另一阈值时，终端设备100对电池142加热，以避免低温导致终端设备100异常关机。在其他一些实施例中，当温度低于又一阈值时，终端设备100对电池142的输出电压进行升压，以避免低温导致的异常关机。
触摸传感器180K，也称为触控器件。触摸传感器180K可以设置于显示屏194，由触摸传感器180K与显示屏194组成触摸屏，触摸屏也称为触控屏。触摸传感器180K用于检测作用于其上或其附近的触摸操作。触摸传感器180K可以将检测到的触摸操作传递给应用处理器，以确定触摸事件类型。可以通过显示屏194提供与触摸操作相关的视觉输出。在另一些实施例中，触摸传感器180K也可以设置于终端设备100的表面，并且与显示屏194设置于不同的位置。
骨传导传感器180M可以获取振动信号。在一些实施例中,骨传导传感器180M可以获取人体声部振动骨块的振动信号。骨传导传感器180M也可以接触人体脉搏,接收血压跳动信号。在一些实施例中,骨传导传感器180M也可以设置于耳机中,结合成骨传导耳机。音频模块170可以基于所述骨传导传感器180M获取的声部振动骨块的振动信号,解析出语音信号,实现语音功能。应用处理器可以基于所述骨传导传感器180M获取的血压跳动信号解析心率信息,实现心率检测功能。
按键190包括开机键和音量键。按键190可以是机械按键，也可以是触摸式按键。终端设备100可以接收按键输入信号，实现与按键输入信号相关的功能。
马达191可以产生振动。马达191可以用于来电提示,也可以用于触摸反馈。马达191可以对作用于不同应用程序的触摸操作产生不同的振动反馈效果。对于作用于显示屏194的不同区域的触摸操作,马达191也可产生不同的振动反馈效果。不同的应用场景(例如,时间提醒、接收信息、闹钟和游戏)可以对应不同的振动反馈效果。触摸振动反馈效果还可以支持自定义。
指示器192可以是指示灯,可以用于指示充电状态和电量变化,也可以用于指示消息、未接来电和通知。
SIM卡接口195用于连接SIM卡。SIM卡可以插入SIM卡接口195实现与终端设备100的接触,也可以从SIM卡接口195拔出实现与终端设备100的分离。终端设备100可以支持1个或N个SIM卡接口,N为大于1的正整数。同一个SIM卡接口195可以同时插入多张卡,所述多张卡的类型可以相同,也可以不同。SIM卡接口195也可以兼容外部存储卡。终端设备100通过SIM卡和网络交互,实现通话以及数据通信等功能。在一些实施例中,终端设备100采用嵌入式SIM(embedded-SIM,eSIM)卡,eSIM卡可以嵌在终端设备100中,不能和终端设备100分离。
上文详细描述了终端设备100的硬件系统,下面介绍终端设备100的软件系统。软件系统可以采用分层架构、事件驱动架构、微核架构、微服务架构或云架构,本申请实施例以分层架构为例,示例性地描述终端设备100的软件系统。
如图2所示,采用分层架构的软件系统分成若干个层,每一层都有清晰的角色和分工。层与层之间通过软件接口通信。在一些实施例中,软件系统可以分为四层,从上至下分别为应用程序层、应用程序框架层、安卓运行时(Android Runtime)和系统库、以及内核层。
应用程序层可以包括相机、图库、日历、通话、地图、导航、WLAN、蓝牙、音乐、视频、短信息等应用程序。
应用程序框架层为应用程序层的应用程序提供应用程序编程接口(Application Programming Interface,API)和编程框架。应用程序框架层可以包括一些预定义的函数。
例如,应用程序框架层包括窗口管理器、内容提供器、视图系统、资源管理器和通知管理器、同步定位与建图(Simultaneous Localization And Mapping,SLAM)位姿计算模块以及平面生成模块;应用程序框架层还可以包括电话管理器。
SLAM位姿计算模块用于输出位姿信息与稀疏点云；其中，位姿信息是指终端设备的摄像头的位姿信息，终端设备的摄像头用于获取真实场景的视频；根据视频中任意一帧图像的位姿信息可以对该帧图像进行特征点提取，并通过计算得到稀疏点云。
平面生成模块用于根据SLAM提供的稀疏点云,通过算法拟合生成虚拟平面;在真实场景中添加虚拟内容时,可以根据虚拟平面对虚拟内容的放置位置进行调整;例如,用户点击屏幕/手势操作放置虚拟内容时,用户的操作与生成的虚拟平面可以产生碰撞,确定虚拟内容的放置位置。应理解,本申请实施例提供的增强现实视频的处理方法对应的程序指令可以在SLAM位姿计算模块与平面生成模块中执行。
窗口管理器用于管理窗口程序。窗口管理器可以获取显示屏大小,判断是否有状态栏、锁定屏幕和截取屏幕。
内容提供器用来存放和获取数据,并使这些数据可以被应用程序访问。所述数据可以包括视频、图像、音频、拨打和接听的电话、浏览历史和书签、以及电话簿。
视图系统包括可视控件,例如显示文字的控件和显示图片的控件。视图系统可用于构建应用程序。显示界面可以由一个或多个视图组成,例如,包括短信通知图标的显示界面,可以包括显示文字的视图以及显示图片的视图。
电话管理器用于提供终端设备100的通信功能,例如通话状态(接通或挂断)的管理。
资源管理器为应用程序提供各种资源,比如本地化字符串、图标、图片、布局文件和视频文件。
通知管理器使应用程序可以在状态栏中显示通知信息,可以用于传达告知类型的消息,可以短暂停留后自动消失,无需用户交互。比如通知管理器被用于下载完成告知和消息提醒。通知管理器还可以管理以图表或者滚动条文本形式出现在系统顶部状态栏的通知,例如后台运行的应用程序的通知。通知管理器还可以管理以对话窗口形式出现在屏幕上的通知,例如在状态栏提示文本信息、发出提示音、电子设备振动以及指示灯闪烁。
Android Runtime包括核心库和虚拟机。Android runtime负责安卓系统的调度和管理。
核心库包含两部分:一部分是java语言需要调用的功能函数,另一部分是安卓的核心库。
应用程序层和应用程序框架层运行在虚拟机中。虚拟机将应用程序层和应用程序框架层的java文件执行为二进制文件。虚拟机用于执行对象生命周期的管理、堆栈管理、线程管理、安全和异常的管理、以及垃圾回收等功能。
系统库可以包括多个功能模块，例如：表面管理器(surface manager)，媒体库(Media Libraries)，三维图形处理库(例如：针对嵌入式系统的开放图形库(open graphics library for embedded systems,OpenGL ES))和2D图形引擎(例如：skia图形库(skia graphics library,SGL))。
表面管理器用于对显示子系统进行管理,并且为多个应用程序提供了2D图层和3D图层的融合。
媒体库支持多种音频格式的回放和录制、多种视频格式回放和录制以及静态图像文件。媒体库可以支持多种音视频编码格式,例如:MPEG4、H.264、动态图像专家组 音频层面3(moving picture experts group audio layer III,MP3)、高级音频编码(advanced audio coding,AAC)、自适应多码率(adaptive multi-rate,AMR)、联合图像专家组(joint photographic experts group,JPG)和便携式网络图形(portable network graphics,PNG)。
三维图形处理库可以用于实现三维图形绘图、图像渲染、合成和图层处理。
二维图形引擎是2D绘图的绘图引擎。
内核层是硬件和软件之间的层。内核层可以包括显示驱动、摄像头驱动、音频驱动和传感器驱动等驱动模块。
下面结合显示拍照场景,示例性说明终端设备100的软件系统和硬件系统的工作流程。
当用户在触摸传感器180K上进行触摸操作时,相应的硬件中断被发送至内核层,内核层将触摸操作加工成原始输入事件,原始输入事件例如包括触摸坐标和触摸操作的时间戳等信息。原始输入事件被存储在内核层,应用程序框架层从内核层获取原始输入事件,识别出原始输入事件对应的控件,并通知该控件对应的应用程序(application,APP)。例如,上述触摸操作为单击操作,上述控件对应的APP为相机APP,相机APP被单击操作唤醒后,可以通过API调用内核层的摄像头驱动,通过摄像头驱动控制摄像头193进行拍摄。
目前，在录制AR视频时由于无法将虚拟内容和真实物体的视频很好地融合，尤其是在拍摄场景中需要用户与虚拟内容进行交互时，需要多次的重复拍摄，费时费力。
有鉴于此,本申请提供了一种AR视频的处理方法,通过在获取原始视频时获取原始视频对应的位姿信息;根据位姿信息与原始视频可以得到虚拟平面;在原始视频的图像帧中添加虚拟内容时,虚拟平面可以作为一个基准面,根据虚拟平面可以调整虚拟内容在原始视频中的位置,使得虚拟内容能够更好的融入到原始视频中,提高AR视频的视频质量。
下面以终端设备100为例,结合图3至图21对本申请提供的增强现实视频的处理方法进行详细描述。
图3是本申请的应用场景的示意图；如图3所示，本申请实施例提供的AR视频的处理方法可以应用于AR视频领域；可以获取原始视频，通过AR视频处理得到目标视频；其中，原始视频可以是指用户拍摄的真实物体的视频，目标视频可以是指在原始视频中添加虚拟内容后得到的AR视频。
示例性地,本申请实施例提供的AR视频的处理方法也可以在应用程序(Application,APP)中运行从而执行AR视频编辑;比如,AR视频APP可以执行本申请的AR视频的处理方法。或者,本申请实施例提供的AR视频的处理方法也可以集成在终端设备的相机中;比如,在终端设备的相机的设置中可以选择AR视频模式,从而实现本申请实施例提供的AR视频的处理方法;下面分别对这两种实现方式进行详细描述。
实现方式一:通过应用程序实现本申请实施例的AR视频的处理方法。
如图4所示,图4是本申请实施例提供的AR视频的处理方法的示意性流程图;该处理方法200包括步骤S210至步骤S260,下面分别对这些步骤进行详细的描述。
步骤S210、运行AR视频APP。
例如,用户可以点击终端设备的显示界面中的AR视频APP;响应于用户的点击操作,终端设备可以运行AR视频APP;如图5所示,图5示出了终端设备的一种图形用户界面(graphical user interface,GUI),该GUI可以为终端设备的桌面310。当终端设备检测到用户点击桌面310上的AR视频APP的图标320的操作后,可以运行AR视频APP,显示如图6所示的另一GUI;图6所示的显示界面330上可以包括拍摄取景框340,拍摄取景框340内可以实时显示预览图像;拍摄界面上还可以包括用于指示拍摄的控件350,以及其它拍摄控件。
在一个示例中,终端设备检测到用户点击显示界面上的AR视频APP的图标的操作,可以启动AR视频APP,显示AR视频APP的显示界面;在显示界面上可以包括拍摄取景框;例如,在录像模式下,拍摄取景框可以为部分屏幕,或者也可以为整个显示屏。在预览状态下,即可以是用户打开AR视频APP且未按下拍摄按钮之前,拍摄取景框内均可以实时显示预览图像。
还应理解,上述通过AR视频APP进行举例描述,本申请实施例对应用程序的名称不作任何限定。
步骤S220、获取原始视频与位姿信息。
例如,如图7所示终端设备检测到用户点击拍摄的控件350的操作,开始录制拍摄取景框中显示的图像。
应理解，用户用于指示拍摄的行为可以包括按下拍摄按钮，也可以包括用户通过语音指示终端设备进行拍摄的行为，或者，还可以包括用户其它的指示终端设备进行拍摄的行为。上述为举例说明，并不对本申请作任何限定。
示例性地,位姿信息可以用于表示终端设备的摄像头在获取原始视频时的位姿;位姿信息可以包括姿态信息与位置信息。
例如,终端设备可以通过如图1所示的陀螺仪传感器180B获取每帧图像对应的位姿信息。
步骤S230、保存位姿信息与虚拟平面的信息。
其中,保存的位姿信息可以是指原始视频中每个图像帧对应的位姿信息。
示例性地,根据原始视频中任意一个图像帧的位姿信息可以对该图像帧进行特征点提取,并通过计算得到稀疏点云;根据稀疏点云通过算法拟合可以生成虚拟平面;在真实物体的视频中添加虚拟内容时,可以根据虚拟平面对虚拟内容的放置位置进行调整。
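作为示意，下面给出一段Python代码，说明"根据稀疏点云通过算法拟合生成虚拟平面"的一种可能做法(用SVD做最小二乘平面拟合)；其中的函数名fit_virtual_plane与示例点云数据均为本文为便于说明而假设的内容，并非本申请实施例限定的具体实现。

    import numpy as np

    def fit_virtual_plane(points):
        """根据稀疏点云(N x 3)最小二乘拟合一个平面，返回(平面上一点, 单位法向量)。"""
        points = np.asarray(points, dtype=np.float64)
        centroid = points.mean(axis=0)                # 取点云质心作为平面上的一点
        _, _, vt = np.linalg.svd(points - centroid)   # 对去中心化后的点云做SVD
        normal = vt[-1]                               # 最小奇异值对应的右奇异向量即平面法向量
        return centroid, normal / np.linalg.norm(normal)

    # 用法示意：points可以来自SLAM位姿计算模块输出的稀疏点云(此处为假设数据)
    points = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.01], [0.0, 1.0, -0.01], [1.0, 1.0, 0.02]]
    plane_point, plane_normal = fit_virtual_plane(points)

拟合得到的(平面上一点, 法向量)即可作为虚拟平面的一种参数化表示，后续确定虚拟内容放置位置时可以以它为基准面。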
在本申请的实施例中,保存位姿信息与虚拟平面的信息可以使得在原始视频录制结束后,根据原始视频的位姿信息与虚拟平面的信息在原始视频中添加虚拟内容生成一个新的AR视频;由于保存了位姿信息与虚拟平面的信息,用户可以对原始视频进行多次不同的编辑,分别生成包括不同虚拟内容的AR视频。
在一个示例中，在本申请的AR视频的处理方法中可以将获取的三维姿态信息通过四元数进行表示，从而能够避免通过三个参数表示姿态所产生的歧义。
其中，四元数可以是指由实数加上三个虚数单位i、j、k组成；比如，四元数可以表示为1、i、j和k的线性组合，即四元数一般可表示为a+bi+cj+dk，其中a、b、c、d均表示实数；i、j、k可以表示旋转；其中，i旋转可以表示X轴与Y轴相交平面中X轴正向向Y轴正向的旋转，j旋转可以表示Z轴与X轴相交平面中Z轴正向向X轴正向的旋转，k旋转可以表示Y轴与Z轴相交平面中Y轴正向向Z轴正向的旋转。
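作为示意，下面的Python代码给出把"绕某个单位轴旋转一定角度"的旋转转换为四元数、并与位置组合成(X,Y,Z,q0,q1,q2,q3)形式位姿的一种可能写法；其中的函数名axis_angle_to_quaternion与示例取值均为说明用途的假设。

    import math

    def axis_angle_to_quaternion(axis, angle_rad):
        """把绕单位轴axis旋转angle_rad弧度的旋转表示为四元数(q0, q1, q2, q3)，其中q0为实部。"""
        ax, ay, az = axis
        half = angle_rad / 2.0
        s = math.sin(half)
        return (math.cos(half), ax * s, ay * s, az * s)

    # 位姿 = 位置(X, Y, Z) + 旋转四元数(q0, q1, q2, q3)
    position = (0.0, 0.0, 0.0)
    quat = axis_angle_to_quaternion((0.0, 0.0, 1.0), math.radians(90))  # 绕Z轴旋转90度
    pose = position + quat   # 即(X, Y, Z, q0, q1, q2, q3)，与下文的位姿表示形式一致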
示例性地，在终端设备接收到用户指示拍摄的指令时(比如，用户在终端设备上点击视频录制时)，终端设备可以启动位姿计算的初始化工作；在未成功初始化前位姿可以表示为(位置x/y/z,旋转四元数)，即可以表示为(0,0,0,0,0,0,0)，虚拟平面的信息为(数量0)；初始化成功时，指定图像帧(初始化起始帧)位姿表示为(0,0,0,0,0,0,0)，虚拟平面的信息表示为(数量x,平面编号0,平面0点数n,点0的位置X1,Y1,Z1,…,点n的位置Xn,Yn,Zn)。
其中,数量x表示虚拟平面的总数量,即视频中包括的图像帧的总数;平面编号0可以用于表示多个虚拟平面中的第一个虚拟平面;平面0点数n可以用于表示第一个虚拟平面中包括顶点的数量为n;点0的位置X1,Y1,Z1用于表示第一个虚拟平面中包括顶点0的位置信息;点n的位置Xn,Yn,Zn用于表示第一个虚拟平面中包括顶点n的位置信息。
应理解，虚拟平面的信息可以包括该虚拟平面中包括的所有顶点的位置信息。
例如，在视频录制过程中，获取的当前图像帧对应的位姿信息可以表示为(X,Y,Z,q0,q1,q2,q3)，虚拟平面的信息可以表示为(数量x,平面编号A,平面A点数n,点0的位置X1,Y1,Z1,…,点n的位置Xn,Yn,Zn)。
其中，X,Y,Z可以分别表示获取当前图像帧的摄像头在x轴、y轴以及z轴的坐标；q0,q1,q2,q3表示旋转四元数，比如可以转换为俯仰角、方位角、旋转角等欧拉角表示；数量x表示平面的总数量；平面编号A可以用于表示当前图像帧对应的虚拟平面的标识；平面A点数n用于表示当前图像帧对应的虚拟平面中包括顶点的数量为n；点0的位置X1,Y1,Z1可以用于表示当前图像帧对应的虚拟平面中包括顶点0的位置信息；点n的位置Xn,Yn,Zn用于表示当前图像帧对应的虚拟平面包括的顶点n的位置信息。
在一个示例中,可以获取原始视频中的一个图像帧;根据该图像帧的位姿信息可以对该图像帧进行特征点提取,并通过计算得到稀疏点云;根据稀疏点云信息可以拟合生成虚拟平面;在对视频添加虚拟内容时,可以根据虚拟平面对视频中添加的虚拟内容的所在位置进行调整。
例如,在用户点击屏幕/手势操作放置虚拟内容时,用户的操作与生成的虚拟平面可以产生碰撞,确定虚拟内容的放置位置。
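作为示意，下面的Python代码给出"用户的操作与生成的虚拟平面产生碰撞"的一种简化实现思路：把用户的点击抽象为一条从相机出发的射线，与虚拟平面求交点，交点即虚拟内容的放置位置；其中的函数名ray_plane_hit、射线的构造方式以及示例数值均为本文假设。

    import numpy as np

    def ray_plane_hit(ray_origin, ray_dir, plane_point, plane_normal, eps=1e-6):
        """计算射线与平面的交点；射线与平面近似平行或交点在相机后方时返回None。"""
        ray_dir = ray_dir / np.linalg.norm(ray_dir)
        denom = np.dot(plane_normal, ray_dir)
        if abs(denom) < eps:
            return None                                # 射线与平面近似平行，视为未碰撞
        t = np.dot(plane_normal, plane_point - ray_origin) / denom
        if t < 0:
            return None                                # 交点在射线反方向(相机后方)
        return ray_origin + t * ray_dir                # 交点即虚拟内容的放置位置

    # 用法示意：ray_origin可取相机位置，ray_dir由点击的像素坐标反投影得到(此处为假设值)
    hit = ray_plane_hit(np.array([0.0, 0.0, 0.0]), np.array([0.0, -0.3, 1.0]),
                        np.array([0.0, -1.0, 0.0]), np.array([0.0, 1.0, 0.0]))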
在本申请的实施例中,在获取位姿信息与虚拟平面的信息后,终端设备可以保存位姿信息与虚拟平面的信息。
在一个示例中，自定义信息包括上述位姿信息与虚拟平面的信息，终端设备可以将自定义信息保存为独立的二进制文件(binary,bin)。
例如,可以将原始视频与原始视频对应的自定义信息保存在相同的目录下。
例如，可以将原始视频对应的自定义信息以与原始视频相同的命名保存在终端设备中。
例如,可以通过一个图像帧的帧号作为标识,将原始视频对应的自定义信息保存在终端设备中。
示例性地,可以根据以下数据格式将原始视频中每个图像帧对应的自定义信息保存为独立的bin文件:
帧号:Frame num:unsigned int32;
位姿信息:(数据1,数据2,数据3,数据4,数据5,数据6,数据7);其中,数据1~数据7可以是float格式的数据;
虚拟平面的信息:(num:unsigned int32;planeNum0:unsigned int32;planeNumPoint:unsigned int32;point0(float,float,float)…pointN(float,float,float)…planeNumN…);
例如,在对原始视频进行编辑时,可以同时加载原始视频和上述bin文件;根据帧号对原始视频中的图像帧与该图像帧对应的自定义信息进行同步对齐。
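作为示意，下面的Python代码按照上述数据格式把单个图像帧的帧号、位姿信息与虚拟平面的信息写入独立的bin文件，并按同样的格式读出；其中的字节序、文件名等细节均为本文为说明而作的假设，实际实现可以不同。

    import struct

    def write_frame_info(f, frame_num, pose, planes):
        """pose为7个float；planes为[(平面编号, [(x, y, z), ...]), ...]。"""
        f.write(struct.pack('<I', frame_num))            # 帧号: unsigned int32
        f.write(struct.pack('<7f', *pose))               # 位姿信息: 数据1~数据7(float)
        f.write(struct.pack('<I', len(planes)))          # num: 虚拟平面数量
        for plane_id, points in planes:
            f.write(struct.pack('<I', plane_id))         # planeNum: 平面编号
            f.write(struct.pack('<I', len(points)))      # planeNumPoint: 顶点数量
            for x, y, z in points:
                f.write(struct.pack('<3f', x, y, z))     # 每个顶点的位置

    def read_frame_info(f):
        frame_num, = struct.unpack('<I', f.read(4))
        pose = struct.unpack('<7f', f.read(28))
        num_planes, = struct.unpack('<I', f.read(4))
        planes = []
        for _ in range(num_planes):
            plane_id, n_pts = struct.unpack('<2I', f.read(8))
            pts = [struct.unpack('<3f', f.read(12)) for _ in range(n_pts)]
            planes.append((plane_id, pts))
        return frame_num, pose, planes

    # 用法示意：bin文件与原始视频保存在相同目录、采用相同命名(文件名为假设)
    with open('video_0001.bin', 'wb') as f:
        write_frame_info(f, 0, (0, 0, 0, 1, 0, 0, 0),
                         [(0, [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)])])

编辑时按帧号读取对应的自定义信息，即可与原始视频中的图像帧同步对齐。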
在一个示例中,自定义信息可以包括上述位姿信息与虚拟平面的信息,终端设备可以将自定义信息保存到原始视频对应视频码流中的补充增强信息中。
例如,可以将以下信息进行视频压缩编码的时候存入到h.264/h.265的SEI信息中:
位姿信息:(float,float,float,float,float,float,float);
虚拟平面的信息:(num:unsigned int32;planeNum0:unsigned int32;planeNumPoint:unsigned int32;point0(float,float,float)…pointN(float,float,float)…planeNumN…)。
将自定义信息存入视频压缩编码的SEI信息的情况下，在执行步骤S250对待编辑的视频进行解码时，可以按照上述格式进行自定义信息的解码。
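作为示意，下面的Python代码给出把自定义信息封装为H.264"用户数据未注册(user data unregistered)"类型SEI载荷的一种简化思路，省略了防竞争字节(emulation prevention)的插入等细节；其中的UUID取值、函数名build_sei_user_data均为本文假设，实际写入SEI时需结合具体编码器(如h.264/h.265编码库)的接口完成。

    import struct
    import uuid

    # 假设的自定义UUID，用于标识本方案写入的SEI数据(仅为示意)
    CUSTOM_SEI_UUID = uuid.UUID('00000000-0000-0000-0000-000000000000').bytes

    def build_sei_user_data(payload):
        """把自定义信息打包为H.264 SEI(user data unregistered)的简化RBSP。"""
        body = CUSTOM_SEI_UUID + payload
        out = bytearray([0x06])                 # NAL头: H.264中SEI的nal_unit_type为6
        out.append(0x05)                        # payloadType = 5 (user data unregistered)
        size = len(body)
        while size >= 255:                      # payloadSize采用0xFF级联方式编码
            out.append(0xFF)
            size -= 255
        out.append(size)
        out += body
        out.append(0x80)                        # rbsp_trailing_bits
        return bytes(out)

    # 用法示意：payload可以是按前文格式打包好的位姿信息与虚拟平面的信息
    pose_bytes = struct.pack('<7f', 0, 0, 0, 1, 0, 0, 0)
    sei_nal = build_sei_user_data(pose_bytes)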
在本申请的实施例中，为了减少保存上述位姿信息与虚拟平面的信息所占用终端设备的存储空间，可以采用以下至少一种方式对自定义信息进行压缩处理：
根据当前图像帧与前一图像帧的差保存位姿信息;或者,虚拟平面的平面编号可以采用无符号字符(unsigned char)方式保存;或者,对于虚拟平面中顶点的描述,水平面可以保留一个点的Z轴信息删除其他点的Z轴信息,垂直面可以保留一个点的Y轴信息删除其他点的Y轴信息;或者,顶点的位置描述可以采用float16;或者,保存虚拟平面的信息时可以只保存当前视野范围内的平面。
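作为示意，下面的Python代码演示其中两种压缩手段的一种可能实现：位姿按与前一图像帧的差值保存，顶点位置采用float16保存；函数名与数据均为说明用途的假设。

    import numpy as np

    def compress_frame(pose, prev_pose, plane_points):
        """位姿保存为与前一帧的差值；虚拟平面顶点坐标降精度为float16。"""
        delta_pose = np.asarray(pose, dtype=np.float32) - np.asarray(prev_pose, dtype=np.float32)
        points_f16 = np.asarray(plane_points, dtype=np.float16)
        return delta_pose, points_f16

    def decompress_frame(delta_pose, prev_pose, points_f16):
        """解压时由前一帧位姿与差值恢复当前帧位姿，顶点坐标还原为float32。"""
        pose = np.asarray(prev_pose, dtype=np.float32) + delta_pose
        return pose, points_f16.astype(np.float32)

    # 用法示意
    prev_pose = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
    cur_pose = [0.01, 0.0, 0.02, 1.0, 0.0, 0.0, 0.0]
    delta, pts = compress_frame(cur_pose, prev_pose, [[0, 0, 0], [1, 0, 0], [1, 1, 0]])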
在本申请的实施例中,通过AR视频APP获取原始视频一方面是为了在录制视频时能够生成以及保存原始视频的位姿信息与虚拟平面的信息;另一方面,在原始视频录制结束后,可以对原始视频中每个图像帧进行编辑;比如,添加虚拟内容。
步骤S240、原始视频录制结束。
例如,如图8所示终端设备检测到用户再次点击拍摄的控件350的操作,结束本次视频的录制;比如,本次录制视频为20秒。
步骤S250、打开虚拟平面的可视化界面,对原始视频进行编辑。
应理解,在对原始视频中的任意一个图像帧进行编辑时,终端设备可以调用保存的该图像帧对应的自定义信息;即调用该图像帧的位姿信息与平面信息。
例如，提取原始视频中第8秒的任意一个图像帧，如图9所示显示界面330还可以包括编辑选项360；在终端设备检测到用户点击编辑选项360后，终端设备可以显示编辑模式的界面，如图10所示；终端设备检测到用户点击编辑模式界面上用于指示AR内容选择361后，显示如图11所示的显示界面；在图11的显示界面中还包括显示平面选项362，终端设备检测到用户点击显示平面选项362的操作，在显示界面中显示生成的虚拟平面363，参见图12；在本申请的实施例中，终端设备的显示界面上可以向用户提供用于放置虚拟内容的可视化平面；例如，在用户添加虚拟内容的过程中，在显示界面上可以显示虚拟平面363；在用户点击屏幕/手势操作放置虚拟内容时，用户的操作与虚拟平面363产生碰撞，从而确定虚拟内容的放置位置，如图12所示。
应理解,在对虚拟内容进行编辑比如调整虚拟内容的位置时,可以在界面中显示虚拟平面363;在完成编辑后,虚拟平面363并不会出现在AR视频中;虚拟平面363可以作为一个参考平面,用于用户确定虚拟内容在视频中的添加位置。
步骤S260、生成包括虚拟内容的AR视频。
示例性地,用户可以对原始视频中的每一个图像帧进行编辑;比如,可以在每一个图像帧中添加虚拟内容,对每一个图像帧中的虚拟内容的位置信息进行调整;从而生成带虚拟内容的AR视频。
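作为示意，下面的Python代码给出按帧合成AR视频时的一个关键步骤：利用该图像帧保存的位姿(X,Y,Z,q0,q1,q2,q3)和针孔相机模型，把虚拟内容锚点的世界坐标投影为该帧中的像素坐标，再以该像素位置为基准绘制虚拟内容；其中的相机内参fx、fy、cx、cy、坐标系约定以及函数名均为本文假设。

    import numpy as np

    def quat_to_rot(q0, q1, q2, q3):
        """四元数(实部q0)转3x3旋转矩阵。"""
        w, x, y, z = q0, q1, q2, q3
        return np.array([
            [1 - 2 * (y * y + z * z), 2 * (x * y - w * z),     2 * (x * z + w * y)],
            [2 * (x * y + w * z),     1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
            [2 * (x * z - w * y),     2 * (y * z + w * x),     1 - 2 * (x * x + y * y)],
        ])

    def project_point(world_point, pose, fx, fy, cx, cy):
        """用该帧位姿与针孔内参把世界坐标点投影为像素坐标(u, v)。"""
        t = np.asarray(pose[:3], dtype=np.float64)
        R = quat_to_rot(*pose[3:])
        p_cam = R.T @ (np.asarray(world_point, dtype=np.float64) - t)  # 假设位姿为相机在世界系中的位姿
        u = fx * p_cam[0] / p_cam[2] + cx
        v = fy * p_cam[1] / p_cam[2] + cy
        return u, v

    # 用法示意：把虚拟内容的锚点投影到当前帧(内参与坐标均为假设值)
    u, v = project_point([0.2, 0.1, 2.0], (0, 0, 0, 1, 0, 0, 0), 1000, 1000, 640, 360)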
在一个示例中,用户可以播放原始视频,用户点击暂停键可以提取当前图像帧并对当前图像帧进行编辑,即在当前图像帧中添加虚拟内容;当用户再次点击播放按钮时,当前图像帧编辑完成。
在本申请的实施例中,在获取原始视频时可以获取原始视频对应的位姿信息;根据位姿信息与原始视频可以得到虚拟平面;在原始视频的图像帧中添加虚拟内容时,虚拟平面可以作为一个基准面,根据虚拟平面可以调整虚拟内容在原始视频中的位置,使得虚拟内容能够更好的融入到原始视频中,提高AR视频的视频质量。
实现方式二:将本申请实施例的AR视频的处理方法集成在终端设备的相机的模式中。
如图13所示,图13是本申请实施例提供的AR视频的处理方法的示意性流程图;该处理方法400包括步骤S410至步骤S470,下面分别对这些步骤进行详细的描述。
步骤S410、运行终端设备的相机。
例如，终端设备检测到用户点击相机的操作；响应于用户的点击操作，终端设备可以运行相机。
图14示出了终端设备的一种GUI，该GUI可以为终端设备的桌面510；当终端设备检测到用户点击桌面510上的相机的图标520的操作后，可以运行相机，显示如图15所示的另一GUI，该GUI可以是相机的显示界面530；该显示界面530上可以包括拍摄取景框540、指示拍摄的控件550，以及其它拍摄控件，其中，拍摄取景框540内可以实时显示预览图像。
步骤S420、选择AR拍摄模式。
例如，终端设备可以是检测到用户指示AR拍摄模式的操作。其中，AR拍摄模式可以是指能够对原始视频进行处理、添加虚拟内容的拍摄模式。
如图16所示,拍摄界面上还包括设置560,在终端设备检测到用户点击设置560后,终端设备显示设置模式界面,如图17所示;终端设备检测到用户点击设置模式界面上用于指示AR视频561后,终端设备进入AR拍摄模式。
步骤S430、获取原始视频与位姿信息。
例如，如图18所示，终端设备检测到用户点击拍摄的控件550的操作，开始录制拍摄取景框中显示的图像。
应理解，用户用于指示拍摄的行为可以包括按下拍摄按钮，也可以包括用户通过语音指示终端设备进行拍摄的行为，或者，还可以包括用户其它的指示终端设备进行拍摄的行为；上述为举例说明，并不对本申请作任何限定。
示例性地,位姿信息可以用于表示终端设备的摄像头在获取原始视频时的位姿;位姿信息可以包括姿态信息与位置信息。
例如,终端设备可以通过如图1所示的陀螺仪传感器180B获取每帧图像对应的位姿信息。
步骤S440、保存位姿信息与虚拟平面的信息。
其中,保存的位姿信息可以是指原始视频中每个图像帧对应的位姿信息。
示例性地,根据原始视频中任意一个图像帧的位姿信息可以对该图像帧进行特征点提取,并通过计算得到稀疏点云;根据稀疏点云通过算法拟合可以生成虚拟平面;在真实场景中添加虚拟内容时,可以根据虚拟平面对虚拟内容的放置位置进行调整。
在本申请的实施例中,保存位姿信息与虚拟平面的信息可以使得在原始视频录制结束后,根据原始视频的位姿信息与虚拟平面的信息在原始视频中添加虚拟内容生成一个新的AR视频;由于保存了位姿信息与虚拟平面的信息,用户可以对原始视频进行多次不同的编辑,分别生成包括不同虚拟内容的AR视频。
在一个示例中，在本申请的AR视频的处理方法中可以将获取的三维姿态信息通过四元数进行表示，从而能够避免通过三个参数表示姿态所产生的歧义。
其中，四元数可以是指由实数加上三个虚数单位i、j、k组成；比如，四元数可以表示为1、i、j和k的线性组合，即四元数一般可表示为a+bi+cj+dk，其中a、b、c、d均表示实数；i、j、k可以表示旋转；其中，i旋转可以表示X轴与Y轴相交平面中X轴正向向Y轴正向的旋转，j旋转可以表示Z轴与X轴相交平面中Z轴正向向X轴正向的旋转，k旋转可以表示Y轴与Z轴相交平面中Y轴正向向Z轴正向的旋转。
示例性地，在终端设备接收到用户指示拍摄的指令时(比如，用户在终端设备上点击视频录制时)，终端设备可以启动位姿计算的初始化工作；在未成功初始化前位姿可以表示为(位置x/y/z,旋转四元数)，即可以表示为(0,0,0,0,0,0,0)，虚拟平面的信息为(数量0)；初始化成功时，指定图像帧(初始化起始帧)位姿表示为(0,0,0,0,0,0,0)，虚拟平面的信息表示为(数量x,平面编号0,平面0点数n,点0的位置X1,Y1,Z1,…,点n的位置Xn,Yn,Zn)。
其中,数量x表示虚拟平面的总数量,即视频中包括的图像帧的总数;平面编号0可以用于表示多个虚拟平面中的第一个虚拟平面;平面0点数n可以用于表示第一个虚拟平面中包括顶点的数量为n;点0的位置X1,Y1,Z1用于表示第一个虚拟平面中包括顶点0的位置信息;点n的位置Xn,Yn,Zn用于表示第一个虚拟平面中包括顶点n的位置信息。
应理解，虚拟平面的信息可以包括该虚拟平面中包括的所有顶点的位置信息。
例如，在视频录制过程中，获取的当前图像帧对应的位姿信息可以表示为(X,Y,Z,q0,q1,q2,q3)，虚拟平面的信息可以表示为(数量x,平面编号A,平面A点数n,点0的位置X1,Y1,Z1,…,点n的位置Xn,Yn,Zn)。
其中，X,Y,Z可以分别表示获取当前图像帧的摄像头在x轴、y轴以及z轴的坐标；q0,q1,q2,q3表示旋转四元数，比如可以转换为俯仰角、方位角、旋转角等欧拉角表示；数量x表示平面的总数量；平面编号A可以用于表示当前图像帧对应的虚拟平面的标识；平面A点数n用于表示当前图像帧对应的虚拟平面中包括顶点的数量为n；点0的位置X1,Y1,Z1可以用于表示当前图像帧对应的虚拟平面中包括顶点0的位置信息；点n的位置Xn,Yn,Zn用于表示当前图像帧对应的虚拟平面包括的顶点n的位置信息。
在一个示例中,可以获取原始视频中的一个图像帧;根据该图像帧的位姿信息可以对该图像帧进行特征点提取,并通过计算得到稀疏点云;根据稀疏点云信息可以拟合生成虚拟平面;在对视频添加虚拟内容时,可以根据虚拟平面对视频中添加的虚拟内容的所在位置进行调整。
例如,在用户点击屏幕/手势操作放置虚拟内容时,用户的操作与生成的虚拟平面可以产生碰撞,确定虚拟内容的放置位置。
在本申请的实施例中,在获取位姿信息与虚拟平面的信息后,终端设备可以保存位姿信息与虚拟平面的信息。
在一个示例中，自定义信息包括上述位姿信息与虚拟平面的信息，终端设备可以将自定义信息保存为独立的二进制文件(binary,bin)。
例如,可以将原始视频与原始视频对应的自定义信息保存在相同的目录下。
例如，可以将原始视频对应的自定义信息以与原始视频相同的命名保存在终端设备中。
例如,可以通过一个图像帧的帧号作为标识,将原始视频对应的自定义信息保存在终端设备中。
示例性地,可以根据以下数据格式将原始视频中每个图像帧对应的自定义信息保存为独立的bin文件:
帧号:Frame num:unsigned int32;
位姿信息:(数据1,数据2,数据3,数据4,数据5,数据6,数据7);其中,数据1~数据7可以是float格式的数据;
虚拟平面的信息:(num:unsigned int32;planeNum0:unsigned int32;planeNumPoint:unsigned int32;point0(float,float,float)…pointN(float,float,float)…planeNumN…);
例如,在对原始视频进行编辑时,可以同时加载原始视频和上述bin文件;根据帧号对原始视频中的图像帧与该图像帧对应的自定义信息进行同步对齐。
在一个示例中,自定义信息可以包括上述位姿信息与虚拟平面的信息,终端设备可以将自定义信息保存到原始视频对应视频码流中的补充增强信息中。
例如,可以将以下信息进行视频压缩编码的时候存入到h.264/h.265的SEI信息中:
位姿信息:(float,float,float,float,float,float,float);
虚拟平面的信息:(num:unsigned int32;planeNum0:unsigned int32;planeNumPoint:unsigned int32;point0(float,float,float)…pointN(float,float,float)…planeNumN…)。
将自定义信息存入视频压缩编码的SEI信息的情况下，在执行步骤S460对待编辑的视频进行解码时，可以按照上述格式进行自定义信息的解码。
在本申请的实施例中，为了减少保存上述位姿信息与虚拟平面的信息所占用终端设备的存储空间，可以采用以下至少一种方式对自定义信息进行压缩处理：
根据当前图像帧与前一图像帧的差保存位姿信息;或者,虚拟平面的平面编号可以采用无符号字符(unsigned char)方式保存;或者,对于虚拟平面中顶点的描述,水平面可以保留一个点的Z轴信息删除其他点的Z轴信息,垂直面可以保留一个点的Y轴信息删除其他点的Y轴信息;或者,顶点的位置描述可以采用float16;或者,保存虚拟平面的信息时可以只保存当前视野范围内的平面。
步骤S450、原始视频录制结束。
例如,如图19所示终端设备检测到用户再次点击拍摄的控件550的操作,结束本次视频的录制;比如,本次录制视频为20秒。
步骤S460、对原始视频进行编辑。
应理解,在对原始视频中的任意一个图像帧进行编辑时,终端设备可以调用保存的该图像帧对应的自定义信息;即调用该图像帧的位姿信息与平面信息。
例如,通过虚拟平面的可视化界面对原始视频进行编辑;可以提取原始视频中第8秒的任意一帧图像,如图20所示显示界面还可以包括显示平面选项570,终端设备检测到用户点击显示平面选项570的操作,在显示界面中可以显示生成的虚拟平面562,如图21所示。
例如,在用户添加虚拟内容的过程中,在显示界面上可以显示虚拟平面562;在用户点击屏幕/手势操作放置虚拟内容时,用户的操作与虚拟平面562产生碰撞,从而确定虚拟内容的放置位置。
应理解,在对虚拟内容进行编辑比如调整虚拟内容的位置时,可以在界面中显示虚拟平面562;在完成编辑后,虚拟平面562并不会出现在AR视频中;虚拟平面562用于用户确定虚拟内容在视频中的添加位置。
步骤S470、生成包括虚拟内容的AR视频。
示例性地,用户可以对原始视频中的每一个图像帧进行编辑;比如,可以在每一个图像帧中添加虚拟内容,对每一个图像帧中的虚拟内容的位置信息进行调整;从而生成带虚拟内容的AR视频。
在一个示例中,用户可以播放原始视频,用户点击暂停键可以提取当前图像帧并对当前图像帧进行编辑,即在当前图像帧中添加虚拟内容;当用户再次点击播放按钮时,当前图像帧编辑完成。
在本申请的实施例中,在获取原始视频时可以获取原始视频对应的位姿信息;根据位姿信息与原始视频可以得到虚拟平面;在原始视频的图像帧中添加虚拟内容时,虚拟平面可以作为一个基准面,根据虚拟平面可以调整虚拟内容在原始视频中的位置,使得虚拟内容能够更好的融入到原始视频中,提高AR视频的视频质量。
应理解,上述举例说明是为了帮助本领域技术人员理解本申请实施例,而非要将本申请实施例限于所例示的具体数值或具体场景。本领域技术人员根据所给出的上述举例说明,显然可以进行各种等价的修改或变化,这样的修改或变化也落入本申请实施例的范围内。
上文结合图1至图21,详细描述了本申请实施例的AR视频的处理方法,下面将结合图22和图23,详细描述本申请的装置实施例。应理解,本申请实施例中的装置可以执行前述本申请实施例的AR视频的处理方法,即以下各种产品的具体工作过程, 可以参考前述方法实施例中的对应过程。
图22是本申请提供的一种增强现实视频的处理装置的结构示意图。该处理装置600包括获取单元610和处理单元620。
其中，获取单元610用于获取原始视频与位姿信息，所述原始视频用于表示真实物体的视频，所述位姿信息用于表示终端设备获取所述原始视频时的位姿；处理单元620用于根据所述原始视频与所述位姿信息生成虚拟平面，所述虚拟平面用于确定在所述原始视频中添加虚拟内容的位置信息；根据所述虚拟平面在所述原始视频中添加所述虚拟内容生成AR视频。
可选地,作为一个实施例,所述位姿信息包括三维姿态信息,所述处理单元620还用于:
通过四元数表示所述三维姿态信息。
可选地,作为一个实施例,所述处理单元620具体用于:
根据所述原始视频中图像帧的位姿信息提取所述图像帧的特征点;
根据所述特征点生成所述虚拟平面。
可选地,作为一个实施例,所述处理单元620还用于:
保存所述位姿信息与所述虚拟平面的信息。
可选地,作为一个实施例,所述处理单元620具体用于:
将所述位姿信息与所述虚拟平面的信息保存在二进制文件中。
可选地,作为一个实施例,所述处理单元620具体用于:
将所述位姿信息与所述虚拟平面的信息保存在所述原始视频对应的补充增强信息中。
可选地,作为一个实施例,所述处理单元620还用于:
对保存的所述位姿信息与所述虚拟平面的信息进行压缩处理。
可选地,作为一个实施例,所述处理单元620具体用于:
在所述原始视频录制完成后,根据所述虚拟平面在所述原始视频中添加所述虚拟内容生成所述AR视频。
可选地,作为一个实施例,所述虚拟平面包括第一虚拟平面,所述第一虚拟平面是指第一图像帧对应的虚拟平面,所述第一图像帧是所述原始视频中的任意一个图像帧;
所述第一虚拟平面的信息包括图像帧的总数、所述第一虚拟平面的标识、所述第一虚拟平面包括的顶点数量以及所述第一虚拟平面包括的每一个顶点的位置信息,所述总数是指所述原始视频包括图像帧的总数。
需要说明的是,上述处理装置600以功能单元的形式体现。这里的术语“单元”可以通过软件和/或硬件形式实现,对此不作具体限定。
例如,“单元”可以是实现上述功能的软件程序、硬件电路或二者结合。所述硬件电路可能包括应用特有集成电路(application specific integrated circuit,ASIC)、电子电路、用于执行一个或多个软件或固件程序的处理器(例如共享处理器、专有处理器或组处理器等)和存储器、合并逻辑电路和/或其它支持所描述的功能的合适组件。
因此，在本申请的实施例中描述的各示例的单元，能够以电子硬件、或者计算机软件和电子硬件的结合来实现。这些功能究竟以硬件还是软件方式来执行，取决于技术方案的特定应用和设计约束条件。专业技术人员可以对每个特定的应用来使用不同方法来实现所描述的功能，但是这种实现不应认为超出本申请的范围。
图23示出了本申请提供的一种电子设备的结构示意图。图23中的虚线表示该单元或该模块为可选的。电子设备700可用于实现上述方法实施例中描述的处理方法。
电子设备700包括一个或多个处理器701,该一个或多个处理器701可支持电子设备700实现方法实施例中的方法。处理器701可以是通用处理器或者专用处理器。例如,处理器701可以是中央处理器(central processing unit,CPU)、数字信号处理器(digital signal processor,DSP)、专用集成电路(application specific integrated circuit,ASIC)、现场可编程门阵列(field programmable gate array,FPGA)或者其它可编程逻辑器件,如分立门、晶体管逻辑器件或分立硬件组件。
处理器701可以用于对电子设备700进行控制,执行软件程序,处理软件程序的数据。电子设备700还可以包括通信单元705,用以实现信号的输入(接收)和输出(发送)。
例如,电子设备700可以是芯片,通信单元705可以是该芯片的输入和/或输出电路,或者,通信单元705可以是该芯片的通信接口,该芯片可以作为终端设备或其它电子设备的组成部分。
又例如,电子设备700可以是终端设备,通信单元705可以是该终端设备的收发器,或者,通信单元705可以是该终端设备的收发电路。
电子设备700中可以包括一个或多个存储器702,其上存有程序704,程序704可被处理器701运行,生成指令703,使得处理器701根据指令703执行上述方法实施例中描述的AR视频的处理方法。
可选地,存储器702中还可以存储有数据。可选地,处理器701还可以读取存储器702中存储的数据,该数据可以与程序704存储在相同的存储地址,该数据也可以与程序704存储在不同的存储地址。
处理器701和存储器702可以单独设置,也可以集成在一起;例如,集成在终端设备的系统级芯片(system on chip,SOC)上。
示例性地,存储器702可以用于存储本申请实施例中提供的AR视频的处理方法的相关程序704,处理器701可以用于在AR视频编辑时调用存储器702中存储的AR视频的处理方法的相关程序704,执行本申请实施例的AR视频的处理;例如,获取原始视频与位姿信息,所述原始视频用于表示真实物体的视频,所述位姿信息用于表示终端设备获取所述原始视频时的位姿;处理单元用于根据所述原始视频与所述位姿信息生成虚拟平面,所述虚拟平面用于确定在所述原始视频中添加虚拟内容的位置信息;根据所述虚拟平面在所述原始视频中添加所述虚拟内容生成AR视频。
本申请还提供了一种计算机程序产品,该计算机程序产品被处理器701执行时实现本申请中任一方法实施例所述的处理方法。
该计算机程序产品可以存储在存储器702中,例如是程序704,程序704经过预处理、编译、汇编和链接等处理过程最终被转换为能够被处理器701执行的可执行目标文件。
本申请还提供了一种计算机可读存储介质,其上存储有计算机程序,该计算机程序被计算机执行时实现本申请中任一方法实施例所述的方法。该计算机程序可以是高级语言程序,也可以是可执行目标程序。
可选地,该计算机可读存储介质例如是存储器702。存储器702可以是易失性存储器或非易失性存储器,或者,存储器702可以同时包括易失性存储器和非易失性存储器。其中,非易失性存储器可以是只读存储器(read-only memory,ROM)、可编程只读存储器(programmable ROM,PROM)、可擦除可编程只读存储器(erasable PROM,EPROM)、电可擦除可编程只读存储器(electrically EPROM,EEPROM)或闪存。易失性存储器可以是随机存取存储器(random access memory,RAM),其用作外部高速缓存。通过示例性但不是限制性说明,许多形式的RAM可用,例如静态随机存取存储器(static RAM,SRAM)、动态随机存取存储器(dynamic RAM,DRAM)、同步动态随机存取存储器(synchronous DRAM,SDRAM)、双倍数据速率同步动态随机存取存储器(double data rate SDRAM,DDR SDRAM)、增强型同步动态随机存取存储器(enhanced SDRAM,ESDRAM)、同步连接动态随机存取存储器(synchlink DRAM,SLDRAM)和直接内存总线随机存取存储器(direct rambus RAM,DR RAM)。
本领域的技术人员可以清楚地了解到,为了描述的方便和简洁,上述描述的装置和设备的具体工作过程以及产生的技术效果,可以参考前述方法实施例中对应的过程和技术效果,在此不再赘述。
在本申请所提供的几个实施例中,所揭露的系统、装置和方法,可以通过其它的方式实现。例如,以上所描述的方法实施例的一些特征可以忽略,或不执行。以上所描述的装置实施例仅仅是示意性的,单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,多个单元或组件可以结合或者可以集成到另一个系统。另外,各单元之间的耦合或各个组件之间的耦合可以是直接耦合,也可以是间接耦合,上述耦合包括电的、机械的或其它形式的连接。
应理解,在本申请的各种实施例中,各过程的序号的大小并不意味着执行顺序的先后,各过程的执行顺序应以其功能和内在逻辑确定,而不应对本申请的实施例的实施过程构成任何限定。
另外,本文中术语“系统”和“网络”在本文中常被可互换使用。本文中的术语“和/或”,仅仅是一种描述关联对象的关联关系,表示可以存在三种关系,例如,A和/或B,可以表示:单独存在A,同时存在A和B,单独存在B这三种情况。另外,本文中字符“/”,一般表示前后关联对象是一种“或”的关系。
总之,以上所述仅为本申请技术方案的较佳实施例而已,并非用于限定本申请的保护范围。凡在本申请的精神和原则之内,所作的任何修改、等同替换、改进等,均应包含在本申请的保护范围之内。

Claims (13)

  1. 一种增强现实AR视频的处理方法,其特征在于,包括:
    获取原始视频与位姿信息,所述原始视频用于表示真实物体的视频,所述位姿信息用于表示终端设备获取所述原始视频时的位姿;
    根据所述原始视频与所述位姿信息生成虚拟平面,所述虚拟平面用于确定在所述原始视频中添加虚拟内容的位置;
    根据所述虚拟平面在所述原始视频中添加所述虚拟内容生成AR视频。
  2. 如权利要求1所述的处理方法,其特征在于,所述位姿信息包括三维姿态信息,还包括:
    通过四元数表示所述三维姿态信息。
  3. 如权利要求1或2所述的处理方法,其特征在于,所述根据所述原始视频与所述位姿信息生成虚拟平面的信息,包括:
    根据所述原始视频中图像帧的位姿信息提取所述图像帧的特征点;
    根据所述特征点生成所述虚拟平面。
  4. 如权利要求1至3中任一项所述的处理方法,其特征在于,还包括:
    保存所述位姿信息与所述虚拟平面的信息。
  5. 如权利要求4所述的处理方法,其特征在于,所述保存所述位姿信息与所述虚拟平面的信息,包括:
    将所述位姿信息与所述虚拟平面的信息保存在二进制文件中。
  6. 如权利要求4所述的处理方法,其特征在于,所述保存所述位姿信息与所述虚拟平面的信息,包括:
    将所述位姿信息与所述虚拟平面的信息保存在所述原始视频对应的补充增强信息中。
  7. 如权利要求4至6中任一项所述的处理方法,其特征在于,还包括:
    对保存的所述位姿信息与所述虚拟平面的信息进行压缩处理。
  8. 如权利要求1至7中任一项所述的处理方法,其特征在于,所述根据所述虚拟平面的信息在所述原始视频中添加所述虚拟内容生成AR视频,包括:
    在所述原始视频录制完成后,根据所述虚拟平面在所述原始视频中添加所述虚拟内容生成所述AR视频。
  9. 如权利要求4至8中任一项所述的处理方法,其特征在于,所述虚拟平面包括第一虚拟平面,所述第一虚拟平面是指第一图像帧对应的虚拟平面,所述第一图像帧是所述原始视频中的任意一个图像帧;
    所述第一虚拟平面的信息包括图像帧的总数、所述第一虚拟平面的标识、所述第一虚拟平面包括的顶点数量以及所述第一虚拟平面包括的每一个顶点的位置信息,所述总数是指所述原始视频包括图像帧的总数。
  10. 一种电子设备,其特征在于,所述电子设备包括:一个或多个处理器、存储器和显示屏;所述存储器与所述一个或多个处理器耦合,所述存储器用于存储计算机程序代码,所述计算机程序代码包括计算机指令,所述一个或多个处理器调用所述计算机指令以使得所述电子设备执行如权利要求1至9中任一项所述的处理方法。
  11. 一种芯片系统,其特征在于,所述芯片系统应用于电子设备,所述芯片系统包括一个或多个处理器,所述处理器用于调用计算机指令以使得所述电子设备执行如权利要求1至9中任一项所述的处理方法。
  12. 一种计算机可读存储介质,其特征在于,所述计算机可读存储介质存储了计算机程序,当所述计算机程序被处理器执行时,使得处理器执行权利要求1至9中任一项所述的处理方法。
  13. 一种计算机程序产品,其特征在于,所述计算机程序产品包括计算机程序代码,当所述计算机程序代码被处理器执行时,使得处理器执行权利要求1至9中任一项所述的处理方法。
PCT/CN2022/089308 2021-07-22 2022-04-26 增强现实视频的处理方法与电子设备 WO2023000746A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110831693.9A CN115686182B (zh) 2021-07-22 2021-07-22 增强现实视频的处理方法与电子设备
CN202110831693.9 2021-07-22

Publications (1)

Publication Number Publication Date
WO2023000746A1 true WO2023000746A1 (zh) 2023-01-26

Family

ID=84978886

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/089308 WO2023000746A1 (zh) 2021-07-22 2022-04-26 增强现实视频的处理方法与电子设备

Country Status (2)

Country Link
CN (1) CN115686182B (zh)
WO (1) WO2023000746A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI831665B (zh) * 2023-04-10 2024-02-01 晶達光電股份有限公司 具有USB Type-C規格的顯示器

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104102678A (zh) * 2013-04-15 2014-10-15 腾讯科技(深圳)有限公司 增强现实的实现方法以及实现装置
CN110378990A (zh) * 2019-07-03 2019-10-25 北京悉见科技有限公司 增强现实场景展现方法、装置及存储介质
US20190362555A1 (en) * 2018-05-25 2019-11-28 Tiff's Treats Holdings Inc. Apparatus, method, and system for presentation of multimedia content including augmented reality content
CN110827411A (zh) * 2018-08-09 2020-02-21 北京微播视界科技有限公司 自适应环境的增强现实模型显示方法、装置、设备及存储介质
CN110879979A (zh) * 2019-11-13 2020-03-13 泉州师范学院 一种基于移动终端的增强现实系统
US20200226823A1 (en) * 2019-01-11 2020-07-16 Microsoft Technology Licensing, Llc Virtual object placement for augmented reality
CN112882576A (zh) * 2021-02-26 2021-06-01 北京市商汤科技开发有限公司 Ar交互方法、装置、电子设备及存储介质
US20210166485A1 (en) * 2018-04-05 2021-06-03 Holome Technologies Limited Method and apparatus for generating augmented reality images

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9407865B1 (en) * 2015-01-21 2016-08-02 Microsoft Technology Licensing, Llc Shared scene mesh data synchronization
US10679415B2 (en) * 2017-07-05 2020-06-09 Qualcomm Incorporated Enhanced signaling of regions of interest in container files and video bitstreams
CN107835436B (zh) * 2017-09-25 2019-07-26 北京航空航天大学 一种基于WebGL的实时虚实融合直播系统及方法
CN110381111A (zh) * 2019-06-03 2019-10-25 华为技术有限公司 一种显示方法、位置确定方法及装置

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104102678A (zh) * 2013-04-15 2014-10-15 腾讯科技(深圳)有限公司 增强现实的实现方法以及实现装置
US20210166485A1 (en) * 2018-04-05 2021-06-03 Holome Technologies Limited Method and apparatus for generating augmented reality images
US20190362555A1 (en) * 2018-05-25 2019-11-28 Tiff's Treats Holdings Inc. Apparatus, method, and system for presentation of multimedia content including augmented reality content
CN110827411A (zh) * 2018-08-09 2020-02-21 北京微播视界科技有限公司 自适应环境的增强现实模型显示方法、装置、设备及存储介质
US20200226823A1 (en) * 2019-01-11 2020-07-16 Microsoft Technology Licensing, Llc Virtual object placement for augmented reality
CN110378990A (zh) * 2019-07-03 2019-10-25 北京悉见科技有限公司 增强现实场景展现方法、装置及存储介质
CN110879979A (zh) * 2019-11-13 2020-03-13 泉州师范学院 一种基于移动终端的增强现实系统
CN112882576A (zh) * 2021-02-26 2021-06-01 北京市商汤科技开发有限公司 Ar交互方法、装置、电子设备及存储介质

Also Published As

Publication number Publication date
CN115686182A (zh) 2023-02-03
CN115686182B (zh) 2024-02-27


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22844917

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 22844917

Country of ref document: EP

Kind code of ref document: A1