WO2021083031A1 - Delay error correction method, terminal device, server, and storage medium - Google Patents

Delay error correction method, terminal device, server, and storage medium (一种时延误差校正方法、终端设备、服务器及存储介质)

Info

Publication number
WO2021083031A1
Authority
WO
WIPO (PCT)
Prior art keywords
virtual object
server
pose data
pose
data
Application number
PCT/CN2020/122944
Other languages
English (en)
French (fr)
Inventor
林亚
沈灿
朱方
刘佳
孙健
Original Assignee
中兴通讯股份有限公司 (ZTE Corporation)
Application filed by 中兴通讯股份有限公司 (ZTE Corporation)
Priority to EP20882409.4A (EP3993428A4)
Publication of WO2021083031A1

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/4302Content synchronisation processes, e.g. decoder synchronisation
    • H04N21/4307Synchronising the rendering of multiple content streams or additional data on devices, e.g. synchronisation of audio on a mobile phone with the video output on the TV screen
    • H04N21/43072Synchronising the rendering of multiple content streams or additional data on devices, e.g. synchronisation of audio on a mobile phone with the video output on the TV screen of multiple content streams on the same device
    • GPHYSICS
    • G02OPTICS
    • G02BOPTICAL ELEMENTS, SYSTEMS OR APPARATUS
    • G02B27/00Optical systems or apparatus not provided for by any of the groups G02B1/00 - G02B26/00, G02B30/00
    • G02B27/0093Optical systems or apparatus not provided for by any of the groups G02B1/00 - G02B26/00, G02B30/00 with means for monitoring data relating to the user, e.g. head-tracking, eye-tracking
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/011Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/011Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • G06F3/012Head tracking input arrangements
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
    • H04N21/23418Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
    • H04N21/2343Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements
    • H04N21/234309Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements by transcoding between formats or standards, e.g. from MPEG-2 to MPEG-4 or from Quicktime to Realvideo
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/242Synchronization processes, e.g. processing of PCR [Program Clock References]
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/4302Content synchronisation processes, e.g. decoder synchronisation
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/44008Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/4402Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display
    • H04N21/440218Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display by transcoding between formats or standards, e.g. from MPEG-2 to MPEG-4
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/60Network structure or processes for video distribution between server and client or between remote clients; Control signalling between clients, server and network components; Transmission of management data between server and client, e.g. sending from server to client commands for recording incoming content stream; Communication details between server and client 
    • H04N21/65Transmission of management data between client and server
    • H04N21/658Transmission by the client directed to the server
    • H04N21/6582Data stored in the client, e.g. viewing habits, hardware capabilities, credit card number
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/81Monomedia components thereof
    • H04N21/816Monomedia components thereof involving special video data, e.g 3D video
    • GPHYSICS
    • G02OPTICS
    • G02BOPTICAL ELEMENTS, SYSTEMS OR APPARATUS
    • G02B27/00Optical systems or apparatus not provided for by any of the groups G02B1/00 - G02B26/00, G02B30/00
    • G02B27/01Head-up displays
    • G02B27/0101Head-up displays characterised by optical features
    • G02B2027/014Head-up displays characterised by optical features comprising information/image processing systems

Definitions

  • the embodiments of the present application relate to the field of augmented reality, and in particular, to a delay error correction method, terminal device, server, and storage medium.
  • Augmented Reality (AR) technology is a technology that ingeniously integrates virtual information with the real world. It uses a variety of technical methods such as multimedia, three-dimensional modeling, real-time tracking and registration, intelligent interaction, and sensing. Computer-generated text, images, three-dimensional models, music, video and other virtual information are simulated and applied to the real world. The two kinds of information complement each other, thus realizing the "enhancement" of the real world.
  • With the development of SLAM (simultaneous localization and mapping) technology, augmented reality has been applied more and more widely in education, gaming, industry, and other fields.
  • AR has high requirements on the hardware performance and design of the device.
  • current mainstream AR processing engines, such as Google's ARCore and Apple's ARKit, limit the terminal platforms they support to a few high-end models. In our experience with mainstream AR engines, even qualifying high-end mobile phones run AR services poorly: under heavy load the terminal heats up noticeably, the heat triggers frequency throttling, performance drops, and the AR experience deteriorates.
  • one current solution is to use the computing power of the cloud to handle the time-consuming AR pipeline, offloading computation-heavy AR processing such as feature extraction, matching, and tracking to the server.
  • for AR scenarios with high real-time requirements, however, such a cloud-assisted solution introduces a delay between the terminal sending a request and receiving the pose information returned by the server; this delay includes the server's unpacking and decoding time, the AR processing time, and the network transmission time. By the time the response arrives, the real-world scene seen by the terminal has changed since the request was sent, so superimposing the virtual object on the current frame according to the pose returned by the server produces an error, leaving virtual and real content out of sync and severely degrading the user experience.
  • the implementations of the present application provide a delay error correction method, a terminal device, a server, and a storage medium.
  • an embodiment of the application provides a delay error correction method, including: sending a video data packet to a server, for the server to determine the pose data of the virtual object to be superimposed according to the video data packet; receiving the pose data of the virtual object returned by the server; determining the incremental change of the pose data according to the time difference between the video frame corresponding to the pose data and the video frame currently to be displayed, and correcting the pose data according to the incremental change; and superimposing the virtual object on the video frame currently to be displayed according to the corrected pose data.
  • an embodiment of the present application also provides a delay error correction method, including: receiving a video data packet sent by a terminal device; determining the pose data of the virtual object to be superimposed according to the video data packet; and returning the pose data to the terminal device, so that after the terminal device corrects the pose data, it superimposes the virtual object on the video frame currently to be displayed according to the corrected pose data.
  • an embodiment of the present application also provides a terminal device, including: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can execute the above delay error correction method.
  • an embodiment of the present application also provides a server, including: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can execute the above delay error correction method.
  • an embodiment of the present application also provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the above delay error correction method applied to a terminal, or implements the above delay error correction method applied to a server.
  • FIG. 1 is a flowchart of the delay error correction method in the first embodiment of the present application;
  • FIG. 2 is a flowchart of the delay error correction method in the second embodiment of the present application;
  • FIG. 3 is a diagram of the delay error correction system in the second embodiment of the present application;
  • FIG. 4 is a flowchart of the delay error correction method in the third embodiment of the present application;
  • FIG. 5 is a schematic structural diagram of the terminal device in the fourth embodiment of the present application;
  • FIG. 6 is a schematic structural diagram of the server in the fifth embodiment of the present application.
  • the first embodiment of the present application relates to a method for correcting time delay error, which is applied to a terminal.
  • in this embodiment, the video data packet is sent to the server for the server to determine the pose data of the virtual object to be superimposed according to the video data packet; the pose data of the virtual object returned by the server is received; the incremental change of the pose data is determined according to the time difference between the video frame corresponding to the pose data and the video frame currently to be displayed, and the pose data is corrected according to the incremental change; and the virtual object is superimposed on the video frame currently to be displayed according to the corrected pose data. The details below are provided only for ease of understanding and are not required to implement this solution; the specific flow is shown in FIG. 1.
  • Step 101 Send the video data packet to the server for the server to determine the pose data of the virtual object to be superimposed according to the video data packet.
  • specifically, the terminal sends the collected video data packets of the original images to the server.
  • the server receives the video data packets, parses the video frame data in their payload to calculate the pose data of the virtual object, and then sends the identification information of the virtual object and the corresponding pose data to the terminal device.
  • in a specific example, the terminal device uses a camera or AR glasses to collect original image data of the real world at a certain frame rate, compresses and encodes the obtained image data, encapsulates the encoded data in packets suitable for network transmission, and sends the packed video data packets to the server.
  • the server parses the video data packets to obtain video data in the corresponding format, extracts and compares target features of the video data to complete target recognition, matches the recognized target object against the virtual object database to obtain the corresponding virtual object, and calculates the pose data of the virtual object; finally, the server sends the identification information of the virtual object and the corresponding pose data to the terminal device.
  • the pose data of the virtual object includes the position and posture of the virtual object on the video frame image.
  • the corresponding mathematical form can be a transformation matrix (rotation matrix and translation vector), a homography matrix, an essential matrix, or any other form.
  • in this embodiment, the pose data is described as including the rotation matrix R and the translation vector t; the pose data may also include the video frame number corresponding to the pose.
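  • As a concrete illustration (not part of the patent text), the returned pose data could be carried in a structure like the following Python sketch; the field names and types are assumptions:

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class PoseData:
    """Pose of a virtual object as described above: rotation, translation,
    and the number of the video frame the pose was computed for."""
    R: np.ndarray        # 3x3 rotation matrix
    t: np.ndarray        # 3-vector translation
    frame_no: int        # video frame number corresponding to the pose
    object_id: int = -1  # virtual object ID (used in the second embodiment)
```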
  • Step 102 Receive the pose data of the virtual object returned by the server.
  • the terminal device receives the pose data of the virtual object sent by the server, parses it to obtain the rotation matrix R and the translation vector t as well as the video frame number corresponding to the pose, and downloads the corresponding virtual object.
  • Step 103 Determine the incremental change of the pose data according to the time difference between the video frame corresponding to the pose data and the video frame currently to be displayed, and correct the pose data according to the incremental change.
  • specifically, using the time difference between the video frame corresponding to the pose data and the video frame currently to be displayed, the rotation matrix increment and the translation vector increment are obtained from the angular velocity and acceleration information of the inertial measurement unit (IMU); with these increments, the pose data of the virtual object is corrected, quickly yielding the pose of the virtual object in the video frame currently to be displayed.
  • in a specific example, the terminal sends the Nth video frame to the server; the server processes it, returns the resulting pose data of the virtual object to the terminal device, and notifies the terminal device that the video frame corresponding to the pose data is the Nth frame.
  • the terminal device then determines the time difference between the video frame currently to be displayed and the Nth frame. For example, if the Nth frame has already been played and the (N+k)th frame is about to be played, the (N+k)th frame is the video frame currently to be displayed; the terminal device computes the time difference between the (N+k)th and Nth frames by multiplying the frame number difference k by the frame interval.
  • with the obtained time difference and the IMU's angular velocity and acceleration information, the pose change of the virtual object from the Nth frame to the (N+k)th frame is computed: the rotation matrix increment ΔR is calculated from the angular velocity, and the translation vector increment Δt is obtained by integrating the acceleration. The pose data R and t are then corrected according to this incremental change, as shown in formula (1):
  • R' = ΔR*R, t' = Δt + t   (1)
  • the inertial measurement unit (IMU) involved in this embodiment includes a gyroscope and an accelerometer and can return angular velocity and acceleration information in real time.
  • computing the incremental pose change from the IMU's angular velocity and acceleration requires very little computation and time, so the correction does not impose excessive performance cost on the terminal.
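  • The following is a minimal Python sketch of this correction step, assuming evenly spaced IMU samples covering the k-frame window; the sampling interface is an assumption, not something the patent specifies:

```python
import numpy as np
from scipy.spatial.transform import Rotation


def pose_increment(imu_samples, sample_dt):
    """Integrate gyroscope/accelerometer samples over the k-frame window.

    imu_samples: iterable of (gyro, accel) pairs, gyro in rad/s and accel in
    m/s^2, assumed evenly spaced sample_dt seconds apart over the interval
    between the pose's frame and the frame currently to be displayed
    (i.e. k * frame_interval at the terminal's frame rate).
    """
    dR = np.eye(3)
    v = np.zeros(3)        # velocity accumulated from the accelerometer
    dt_vec = np.zeros(3)   # translation increment (double integral of accel)
    for gyro, accel in imu_samples:
        # Compose the small rotation of this sample into the increment dR.
        dR = dR @ Rotation.from_rotvec(np.asarray(gyro) * sample_dt).as_matrix()
        v += np.asarray(accel) * sample_dt
        dt_vec += v * sample_dt
    return dR, dt_vec


def correct_pose(R, t, dR, dt_vec):
    """Formula (1): R' = dR * R, t' = dt + t."""
    return dR @ R, t + dt_vec
```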
  • to blend the virtual content better with the real world, the correction of the virtual object's pose data may also include ambient light detection and estimation, adjusting the virtual content to match the ambient lighting so that it looks more realistic.
  • Step 104 Superimpose the virtual object on the current video frame to be displayed according to the corrected pose data.
  • specifically, rendering is performed in the (N+k)th video frame according to the corrected rotation matrix and translation vector, producing a video image with the virtual content superimposed, which is presented to the user.
  • in practical applications, the terminal device can start two threads: one thread is responsible for video data collection, encoding, and packet sending, and the other receives the pose data of the virtual object returned by the server and downloads the virtual object for real-time rendering.
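  • A sketch of that two-thread arrangement follows; every helper function (capture, encode, send, receive, render) is a hypothetical placeholder for the units described above:

```python
import threading


def capture_encode_send(running):
    # Thread 1: collect video data, encode it, and send packets to the server.
    while running.is_set():
        frame = capture_frame()          # hypothetical camera/AR-glasses capture
        packet = encode_and_pack(frame)  # hypothetical H.264 encode + packaging
        send_to_server(packet)           # hypothetical network send


def receive_and_render(running):
    # Thread 2: receive returned pose data, fetch the object, render in real time.
    while running.is_set():
        pose = receive_pose()                           # hypothetical blocking receive
        obj = get_virtual_object(pose.object_id)        # cached or downloaded
        render(obj, correct_pose_for_current_frame(pose))


running = threading.Event()
running.set()
threading.Thread(target=capture_encode_send, args=(running,), daemon=True).start()
threading.Thread(target=receive_and_render, args=(running,), daemon=True).start()
```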
  • in this embodiment, the terminal sends video data packets to the server so that the server determines the pose data of the virtual object, using the computing power of an external server to handle the time-consuming pose analysis; since the analysis does not run on the terminal, it does not increase the terminal's performance consumption. The terminal device then determines the incremental change of the pose data from the time difference between the video frame corresponding to the pose data and the video frame currently to be displayed, corrects the virtual object's pose data with that incremental change, and superimposes the virtual object on the video frame currently to be displayed using the corrected pose data. Correcting the pose data ensures that the virtual object is superimposed at an accurate position, eliminating the virtual-real desynchronization caused by delay and improving the user experience.
  • the second embodiment of the present application relates to a time delay error correction method.
  • the second embodiment is roughly the same as the first embodiment.
  • the main difference is that in the second embodiment of the present application, before the video data packet is sent to the server, the delay value from sending a request to the server to receiving the pose data returned by the server is counted; and after the pose data of the virtual object returned by the server is received, the terminal determines whether the virtual object already exists and, if not, downloads the corresponding virtual object from the server.
  • the specific process is shown in Figure 2, and the corresponding time delay correction system is shown in Figure 3:
  • Step 201 Count the time delay value from sending a request to the server to receiving the pose data returned by the server;
  • the delay statistics unit 305 in the terminal device records the moment each video data packet is sent to the server and the moment the pose data returned by the server is received, and computes the difference between the two moments; this difference is the delay value. Owing to network instability, the delay varies in real time.
  • the delay value sent to the server is the one produced during the terminal's previous round trip of sending data to and receiving data back from the server; on the terminal's first video data packet, no round trip has completed yet, so the packet carries no delay value.
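  • A minimal sketch of such a delay statistics unit follows; the per-frame bookkeeping is an assumption about how send and receive times might be matched up:

```python
import time


class DelayStatistics:
    """Delay statistics unit 305: measures the send-to-pose-return round trip."""

    def __init__(self):
        self.send_times = {}    # frame number -> send timestamp
        self.last_delay = None  # delay of the previous completed round trip

    def on_send(self, frame_no):
        # Record Tns, the moment the frame's video data packet is sent.
        self.send_times[frame_no] = time.monotonic()

    def on_receive(self, frame_no):
        # Record Tnr and compute T = Tnr - Tns for this round trip.
        sent = self.send_times.pop(frame_no, None)
        if sent is not None:
            self.last_delay = time.monotonic() - sent
        return self.last_delay  # None until the first round trip completes
```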
  • Step 202 Send the video data packet to the server for the server to determine the pose data of the virtual object to be superimposed according to the video data packet.
  • the delay value calculated in step 201 and the video frames obtained by encoding the collected original image data are encapsulated in data packets suitable for network transmission and sent to the server, for the server to generate the pose data of the virtual object based on the delay value.
  • in a specific example, the terminal device uses the acquisition unit 301, such as a camera or AR glasses, to collect original image data of the real world at a frame rate of 30 fps, and the encoding unit 302 performs H.264 encoding on the obtained data.
  • the packaging unit 303 encapsulates the delay T passed from the delay statistics unit 305 in the RTP extension header and the encoded H.264 video data in the RTP payload according to RFC 3984; the packed result is sent to the server through the packet sending unit 304.
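  • The sketch below illustrates one way the delay could be packed into an RTP header extension alongside the payload; the extension profile ID (0xABCD) and the millisecond encoding of the delay field are illustrative assumptions, since RFC 3984 only governs how the H.264 payload itself is packed:

```python
import struct


def build_rtp_packet(seq, timestamp, ssrc, payload, delay_ms=None, pt=96):
    """RTP packet with the measured delay carried in a one-word header extension."""
    x_bit = 1 if delay_ms is not None else 0
    header = struct.pack('!BBHII',
                         (2 << 6) | (x_bit << 4),   # V=2, P=0, X, CC=0
                         pt,                         # M=0, payload type
                         seq & 0xFFFF,
                         timestamp & 0xFFFFFFFF,
                         ssrc)
    ext = b''
    if delay_ms is not None:
        # 16-bit profile-defined ID, 16-bit length in 32-bit words, then data.
        ext = struct.pack('!HHI', 0xABCD, 1, int(delay_ms))
    return header + ext + payload
```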
  • the server receives the video data packets and parses the video frame data in their payload to complete target recognition, matching the recognized object against the virtual object database to obtain the ID of the corresponding virtual object. The server then uses the delay T to predict the target's posture T time after the current frame, calculates the pose data of the virtual object from that predicted posture, and sends the virtual object's ID and corresponding pose data to the terminal.
  • Step 203 Receive the pose data of the virtual object returned by the server.
  • specifically, the pose data of the virtual object sent by the server is received and parsed to obtain the rotation matrix and translation vector, as well as the video frame number corresponding to the pose and the corresponding virtual content.
  • the receiving unit 306 in the terminal receives the RTP packet sent by the server, and parses out the pose data of the virtual object: the rotation matrix R and the translation vector t, the virtual object ID, and the video frame number corresponding to the pose data .
  • Step 204 Determine the incremental change of the pose data according to the time difference between the video frame corresponding to the pose data and the video frame currently to be displayed, and correct the pose data according to the incremental change;
  • specifically, using the angular velocity and acceleration information of the IMU and the time difference between the video frame corresponding to the pose data and the video frame currently to be displayed, the rotation matrix increment and the translation vector increment are obtained; with these increments, the pose data of the virtual object is corrected, quickly yielding the pose of the virtual object in the video frame currently to be displayed.
  • in a specific example, the terminal sends the Nth video frame to the server; based on the calculated delay value T, the server predicts the pose data of the virtual object in the (N+m)th video frame, returns it to the terminal device, and notifies the terminal device that the video frame corresponding to the pose data is the (N+m)th frame, where m is the number of video frames played within the span of the delay value T.
  • the terminal device then determines the time difference between the video frame currently to be displayed and the (N+m)th frame. For example, if the (N+m)th frame has already been played and the (N+m+k)th frame is about to be played, the (N+m+k)th frame is the video frame currently to be displayed.
  • the secondary correction unit 307 of the terminal device computes the time difference between the (N+m+k)th and (N+m)th frames by multiplying the frame number difference k by the frame interval, which gives the time difference between the video frame corresponding to the pose data and the video frame currently to be displayed.
  • with the obtained time difference and the IMU's angular velocity and acceleration information, the pose change of the virtual object from the (N+m)th frame to the (N+m+k)th frame is obtained: the rotation matrix increment ΔR is calculated from the angular velocity, and the translation vector increment Δt is obtained by integrating the acceleration.
  • the secondary correction unit 307 corrects the pose data R and t according to the obtained incremental change, as shown in formula (2):
  • R' = ΔR*R, t' = Δt + t   (2)
  • the inertial measurement unit IMU in this step includes a gyroscope and an accelerometer, and can return angular velocity and acceleration information in real time.
  • computing the incremental pose change from the IMU's angular velocity and acceleration requires very little computation and time, so the correction does not impose excessive performance cost on the terminal.
  • Step 205 Determine whether the virtual object already exists.
  • specifically, when the terminal device determines that no virtual object matching the received virtual object ID currently exists locally, step 206 is executed; when a matching virtual object already exists, step 207 is executed directly.
  • Step 206 Download the corresponding virtual object from the server.
  • the corresponding virtual object is looked up by its ID in the server's virtual object database and downloaded to the terminal.
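  • A sketch of the existence check and download in steps 205 and 206, with a plain dictionary standing in for the terminal's local object store and a hypothetical download call:

```python
object_cache = {}  # virtual object ID -> locally stored virtual object


def get_virtual_object(object_id):
    """Steps 205/206: reuse a locally stored object or download it by ID."""
    obj = object_cache.get(object_id)
    if obj is None:
        # First appearance of this ID: fetch it from the server's database.
        obj = download_virtual_object(object_id)  # hypothetical download call
        object_cache[object_id] = obj
    return obj
```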
  • Step 207 Superimpose the virtual object on the current video frame to be displayed according to the corrected pose data.
  • the rendering and display unit 308 renders the virtual object into the currently captured real image according to the corrected pose data R' and t'.
  • in the second embodiment, the pose data of the virtual object is first corrected on the server side according to the delay value counted by the terminal, which reduces terminal performance consumption and benefits the accuracy of subsequent pose correction; then, using the time difference between the video frame corresponding to the pose data and the video frame currently to be displayed and the incremental pose change obtained from the terminal's IMU, a secondary correction is applied to the virtual object's pose data, further ensuring the accuracy of the position at which the virtual object is superimposed on the real picture, eliminating the delay-induced desynchronization between virtual and real content, and improving the user experience.
  • since the server needs a certain amount of time to unpack and decode the video data packets transmitted by the terminal, obtaining the pose data of the virtual object is also time-consuming, and network transmission itself takes time, there is a delay from the terminal sending the request to receiving the pose data returned by the server.
  • the server initially does not know the delay of this process, so the terminal sends the delay value produced in each data transmission to the server along with the video data packet; with it, the server can use a tracking algorithm to predict the pose of the virtual object and apply a preliminary correction to the pose of the virtual object to be superimposed on the video frame currently to be displayed, eliminating the virtual-real discrepancy to a certain extent.
  • the third embodiment of the present application relates to a method for correcting time delay errors, which is applied to a server.
  • in this embodiment, a video data packet sent by the terminal device is received; the pose data of the virtual object to be superimposed is determined according to the video data packet; and the pose data is returned to the terminal device, so that after correcting the pose data, the terminal device superimposes the virtual object on the video frame currently to be displayed according to the corrected pose data.
  • the implementation details of the time delay error correction method in this embodiment will be described below in detail. The following content is only provided for ease of understanding and is not necessary for implementing this solution.
  • the specific process is shown in Figure 4, and the corresponding time delay correction system is shown in Figure 3:
  • Step 401 Receive a video data packet sent by a terminal device.
  • the server receives the video data packet sent by the terminal from the network.
  • the video data packet contains the video frame data and the delay value most recently counted by the terminal device, i.e., the time from sending a request to the server to receiving the pose data returned by the server.
  • in one example, the receiving unit 309 receives the video data packet sent by the terminal, which is in RTP format; the unpacking unit 310 parses the packet, extracts the video frame data from the RTP payload into the buffer unit 311, and extracts the delay value T and passes it to the delay filtering unit 314.
  • Step 402 Determine the pose data of the virtual object to be superimposed according to the video data packet.
  • specifically, target perception and recognition is performed on the decoded original image data, and the recognized target is matched against the objects in the virtual object database to obtain the corresponding virtual object; according to the delay T, tracking predicts the target's posture T time after the current frame, and the pose of the virtual object is calculated; the server then sends the virtual object and its corresponding pose data to the terminal.
  • in one example, the decoding unit 312 decodes the video frame data in the buffer unit to obtain the original YUV image data; the intelligent perception recognition unit 313 performs perception and recognition on the decoded YUV data through feature extraction, target recognition, and matching, matches the recognized target against the templates in the virtual object database 318, obtains the matching virtual object, and records its object ID.
  • feature extraction here uses the ORB algorithm with a three-level image pyramid, extracting feature points at each level; to distribute the feature points evenly, the original image is rasterized into grids, and in each grid the point with the highest matching score is taken as the feature point. In practical applications, other target recognition algorithms can also be selected according to server performance.
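  • The following sketch approximates this step with OpenCV's ORB implementation; the grid size is an illustrative choice, since the text only says the image is rasterized and the highest-scoring point per grid cell is kept:

```python
import cv2


def extract_grid_features(image_gray, grid=(8, 8)):
    """ORB features over a 3-level pyramid, keeping the best point per grid cell."""
    orb = cv2.ORB_create(nlevels=3)        # 3-level pyramid as described
    keypoints = orb.detect(image_gray, None)
    h, w = image_gray.shape
    best = {}                              # grid cell -> strongest keypoint
    for kp in keypoints:
        cell = (int(kp.pt[0] * grid[0] / w), int(kp.pt[1] * grid[1] / h))
        if cell not in best or kp.response > best[cell].response:
            best[cell] = kp
    # Compute descriptors only for the selected, evenly distributed points.
    return orb.compute(image_gray, list(best.values()))
```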
  • the tracking prediction unit 315 tracks and predicts the posture, after the delay, of the target on which the virtual object is superimposed, and the pose calculation unit 316 determines the pose data of the virtual object according to the target's posture after the delay.
  • in a specific implementation, before the tracking prediction unit 315 predicts the target's delayed posture, the delay filtering unit 314 can also filter the currently input delay value together with the historical delay values to obtain the average delay Tavg(k), as shown in formula (2):
  • Tavg(k) = a*T(k) + (1-a)*Tavg(k-1)   (2)
  • a can be set according to empirical values, such as 0.95.
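  • This is a standard exponential moving average; a minimal sketch of the delay filtering unit:

```python
class DelayFilter:
    """Delay filtering unit 314: Tavg(k) = a*T(k) + (1-a)*Tavg(k-1)."""

    def __init__(self, a=0.95):
        self.a = a
        self.t_avg = None

    def update(self, t_k):
        # Seed with the first reported delay, then smooth exponentially.
        if self.t_avg is None:
            self.t_avg = t_k
        else:
            self.t_avg = self.a * t_k + (1 - self.a) * self.t_avg
        return self.t_avg
```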
  • the tracking prediction unit 315 uses a Markov chain model: based on the state at the previous moment and the historical states, combined with the transition probability matrix of the Markov chain, it accurately predicts the state at the next moment, and through this algorithm computes the target position and posture delayed by Tavg(k) relative to the current frame.
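  • The patent does not define the chain's state space; the sketch below assumes a discretized set of target motion states and a known transition matrix, and propagates the state distribution over the prediction horizon:

```python
import numpy as np


def predict_delayed_state(current_state, P, n_steps):
    """Propagate the target's motion state n_steps ahead through the chain.

    current_state: index into a discretized set of target motion states
    P: row-stochastic transition matrix, P[i, j] = Prob(next = j | current = i)
    n_steps: prediction horizon, chosen so n_steps * frame_interval ~ Tavg(k)
    """
    dist = np.zeros(P.shape[0])
    dist[current_state] = 1.0
    dist = dist @ np.linalg.matrix_power(P, n_steps)
    return int(np.argmax(dist))  # most probable state after the delay
```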
  • after the posture of the target after the predicted delay Tavg(k) has been obtained, the pose calculation unit 316 uses the PnP (Perspective-n-Point) algorithm to generate the pose data of the virtual object; the pose data includes the rotation matrix R and the translation vector t.
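  • A sketch of this pose computation using OpenCV's solvePnP (one real implementation of PnP); the zero distortion coefficients are an assumption:

```python
import cv2
import numpy as np


def virtual_object_pose(object_points, image_points, camera_matrix):
    """Recover R and t of the target with OpenCV's PnP solver.

    object_points: Nx3 model-space 3D points of the recognized target
    image_points:  Nx2 2D points at the predicted (delay-compensated) location
    """
    dist_coeffs = np.zeros(4)  # assume an undistorted camera in this sketch
    ok, rvec, tvec = cv2.solvePnP(
        np.asarray(object_points, dtype=np.float32),
        np.asarray(image_points, dtype=np.float32),
        camera_matrix, dist_coeffs)
    if not ok:
        raise RuntimeError("PnP failed to find a pose")
    R, _ = cv2.Rodrigues(rvec)  # rotation vector -> 3x3 rotation matrix
    return R, tvec.reshape(3)
```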
  • Step 403 Return the pose data to the terminal device, so that after the terminal device corrects the pose data, the virtual object is superimposed on the current video frame to be displayed according to the corrected pose data.
  • the calculated pose data, the frame number corresponding to the pose data, and the virtual object ID are packaged into an RTP packet, and sent to the terminal through the packet sending unit 317.
  • the terminal determines the incremental change of the pose data according to the time difference between the video frame corresponding to the pose data and the video frame currently to be displayed, corrects the pose data according to the incremental change, and superimposes the virtual object on the video frame currently to be displayed according to the corrected pose data.
  • the server opens two threads, one thread is responsible for receiving packets and buffering after parsing, and the other thread is responsible for taking out video frames from the buffer for AR processing.
  • in the third embodiment, the server receives the video frame data and the delay value sent by the terminal and determines the pose data of the virtual object to be superimposed, reducing terminal performance consumption while applying a first correction to the pose data with which the virtual object is superimposed on the target, which eliminates the delay-induced pose error to a certain extent. The initially corrected pose data is transmitted to the terminal; after the terminal corrects it again, the virtual object is superimposed on the video frame currently to be displayed according to the twice-corrected pose data, further ensuring the accuracy of the position at which the virtual object is superimposed on the real picture, eliminating the delay-induced desynchronization between virtual and real content, reducing perceived delay, and improving the user experience.
  • the server filters the current delay value transmitted by the terminal together with the historical delay values to obtain a predicted delay value. This filtering yields a smoother predicted delay, so the pose of the virtual object subsequently predicted from it is also accurate.
  • the fourth embodiment of the present application relates to a terminal device, as shown in FIG. 5, including: at least one processor 501; and a memory 502 communicatively connected to the at least one processor; wherein the memory 502 stores instructions executable by the at least one processor 501, and the instructions are executed by the at least one processor 501 so that the at least one processor 501 can execute the delay error correction method of the first or second embodiment above.
  • the memory 502 and the processor 501 are connected in a bus manner.
  • the bus may include any number of interconnected buses and bridges, and the bus connects one or more various circuits of the processor 501 and the memory 502 together.
  • the bus can also connect various other circuits such as peripheral devices, voltage regulators, and power management circuits, etc., which are all known in the art, and therefore, no further description will be given herein.
  • the bus interface provides an interface between the bus and the transceiver.
  • the transceiver may be one element or multiple elements, such as multiple receivers and transmitters, providing a unit for communicating with various other devices on the transmission medium.
  • the data processed by the processor 501 is transmitted on the wireless medium through the antenna, and further, the antenna also receives the data and transmits the data to the processor 501.
  • the processor 501 is responsible for managing the bus and general processing, and can also provide various functions, including timing, peripheral interfaces, voltage regulation, power management, and other control functions.
  • the memory 502 may be used to store data used by the processor 501 when performing operations.
  • the fifth embodiment of the present application relates to a server, as shown in FIG. 6, including: at least one processor 601; and a memory 602 communicatively connected to the at least one processor; wherein the memory 602 stores instructions executable by the at least one processor 601, and the instructions are executed by the at least one processor 601 so that the at least one processor 601 can execute the delay error correction method of the third embodiment above.
  • the bus may include any number of interconnected buses and bridges.
  • the bus connects one or more various circuits of the processor 601 and the memory 602 together.
  • the bus can also connect various other circuits such as peripheral devices, voltage regulators, and power management circuits, etc., which are all known in the art, and therefore, no further description will be given herein.
  • the bus interface provides an interface between the bus and the transceiver.
  • the transceiver may be one element or multiple elements, such as multiple receivers and transmitters, providing a unit for communicating with various other devices on the transmission medium.
  • the data processed by the processor 601 is transmitted on the wireless medium through the antenna, and further, the antenna also receives the data and transmits the data to the processor 601.
  • the processor 601 is responsible for managing the bus and general processing, and can also provide various functions, including timing, peripheral interfaces, voltage regulation, power management, and other control functions.
  • the memory 602 may be used to store data used by the processor 601 when performing operations.
  • the sixth embodiment of the present application relates to a computer-readable storage medium storing a computer program which, when executed by a processor, implements the above method embodiments.
  • all or part of the steps of the methods in the above embodiments can be implemented by a program instructing the relevant hardware; the program is stored in a storage medium and includes several instructions to cause a device (which may be a microcontroller, a chip, etc.) or a processor to execute all or part of the steps of the methods described in the embodiments of the present application.
  • the aforementioned storage media include various media that can store program code, such as USB flash drives, removable hard disks, read-only memory (ROM), random access memory (RAM), magnetic disks, and optical disks.
  • Embodiments of the present application provide a delay error correction method, terminal device, server, and storage medium, so that the problem of asynchrony between virtual and reality caused by delay is eliminated without increasing terminal performance consumption, and user experience is improved.
  • in the implementations of the present application, the terminal device sends video data packets to the server so that the server determines the pose data of the virtual object, using the computing power of an external server to handle the time-consuming pose analysis; since the analysis does not run on the terminal, it does not increase the terminal's performance consumption. The terminal device then determines the incremental change of the pose data from the time difference between the video frame corresponding to the pose data and the video frame currently to be displayed, corrects the virtual object's pose data with that incremental change, and superimposes the virtual object on the video frame currently to be displayed using the corrected pose data. Correcting the pose data ensures that the virtual object is superimposed at an accurate position, eliminating the delay-induced desynchronization between virtual and real content and improving the user experience.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Optics & Photonics (AREA)
  • Processing Or Creating Images (AREA)

Abstract

Embodiments of the present application relate to the field of augmented reality and disclose a delay error correction method, a terminal device, a server, and a storage medium. In the implementations of the present application, a video data packet is sent to a server, for the server to determine the pose data of a virtual object to be superimposed according to the video data packet; the pose data of the virtual object returned by the server is received; the incremental change of the pose data is determined according to the time difference between the video frame corresponding to the pose data and the video frame currently to be displayed, and the pose data is corrected according to the incremental change; and the virtual object is superimposed on the video frame currently to be displayed according to the corrected pose data.

Description

Delay error correction method, terminal device, server, and storage medium
Cross-reference to related applications
This application is based on and claims priority to Chinese patent application No. 201911051113.3, filed on October 31, 2019, the entire contents of which are incorporated herein by reference.
Technical field
Embodiments of the present application relate to the field of augmented reality, and in particular to a delay error correction method, a terminal device, a server, and a storage medium.
Background
Augmented Reality (AR) is a technology that seamlessly blends virtual information with the real world. It draws on multimedia, three-dimensional modeling, real-time tracking and registration, intelligent interaction, sensing, and other techniques to simulate computer-generated text, images, three-dimensional models, music, video, and other virtual information and apply it to the real world, where the two kinds of information complement each other, thereby "augmenting" the real world. With the development of SLAM (simultaneous localization and mapping) technology, augmented reality has found ever wider application in education, gaming, industry, and other fields.
AR places high demands on a device's hardware performance and design. Current mainstream AR processing engines, such as Google's ARCore and Apple's ARKit, limit the terminal platforms they support to a few high-end models. In our experience with mainstream AR engines, even qualifying high-end mobile phones run AR services poorly: under heavy load the terminal heats up noticeably, the heat triggers frequency throttling, performance drops, and the AR experience deteriorates.
To prevent terminal performance from degrading the AR experience over longer sessions, one current solution is to use the computing power of the cloud to handle the time-consuming AR pipeline, offloading computation-heavy AR processing such as feature extraction, matching, and tracking to the server.
The inventors found at least the following problem in the related art: for AR scenarios with high real-time requirements, the cloud-assisted approach suffers from the delay between the terminal sending a request and receiving the pose information returned by the server, a delay that comprises the server's unpacking and decoding time, the AR processing time, and the network transmission time. By the time the response arrives, the real-world scene seen by the terminal has changed since the request was sent, so superimposing the virtual object on the current frame according to the returned pose information produces an error, causing the virtual and real content to fall out of sync and severely degrading the user experience.
Summary
Implementations of the present application provide a delay error correction method, a terminal device, a server, and a storage medium.
An implementation of the present application provides a delay error correction method, including: sending a video data packet to a server, for the server to determine the pose data of a virtual object to be superimposed according to the video data packet; receiving the pose data of the virtual object returned by the server; determining the incremental change of the pose data according to the time difference between the video frame corresponding to the pose data and the video frame currently to be displayed, and correcting the pose data according to the incremental change; and superimposing the virtual object on the video frame currently to be displayed according to the corrected pose data.
An implementation of the present application further provides a delay error correction method, including: receiving a video data packet sent by a terminal device; determining the pose data of a virtual object to be superimposed according to the video data packet; and returning the pose data to the terminal device, so that after correcting the pose data, the terminal device superimposes the virtual object on the video frame currently to be displayed according to the corrected pose data.
An implementation of the present application further provides a terminal device, including: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to execute the above delay error correction method.
An implementation of the present application further provides a server, including: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to execute the above delay error correction method.
An implementation of the present application further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the above delay error correction method applied to a terminal, or implements the above delay error correction method applied to a server.
Brief description of the drawings
One or more embodiments are illustrated by the figures in the accompanying drawings; these illustrations do not constitute a limitation on the embodiments.
FIG. 1 is a flowchart of the delay error correction method in the first embodiment of the present application;
FIG. 2 is a flowchart of the delay error correction method in the second embodiment of the present application;
FIG. 3 is a diagram of the delay error correction system in the second embodiment of the present application;
FIG. 4 is a flowchart of the delay error correction method in the third embodiment of the present application;
FIG. 5 is a schematic structural diagram of the terminal device in the fourth embodiment of the present application;
FIG. 6 is a schematic structural diagram of the server in the fifth embodiment of the present application.
Detailed description
To make the objectives, technical solutions, and advantages of the embodiments of the present application clearer, the implementations of the present application are described in detail below with reference to the accompanying drawings. However, those of ordinary skill in the art can understand that many technical details are set forth in the implementations to help the reader better understand the present application; the technical solutions claimed in the present application can still be realized without these details and with various changes and modifications based on the following implementations. The division into the following embodiments is for convenience of description and does not limit the specific implementation of the present application; the embodiments can be combined with and refer to one another where they do not contradict.
The first embodiment of the present application relates to a delay error correction method applied to a terminal. In this embodiment, a video data packet is sent to a server, for the server to determine the pose data of a virtual object to be superimposed according to the video data packet; the pose data of the virtual object returned by the server is received; the incremental change of the pose data is determined according to the time difference between the video frame corresponding to the pose data and the video frame currently to be displayed, and the pose data is corrected according to the incremental change; and the virtual object is superimposed on the video frame currently to be displayed according to the corrected pose data. The implementation details of the delay error correction method of this embodiment are described below; these details are provided only for ease of understanding and are not required to implement this solution. The specific flow is shown in FIG. 1 and includes:
Step 101: Send the video data packet to the server, for the server to determine the pose data of the virtual object to be superimposed according to the video data packet.
Specifically, the terminal sends the collected video data packets of the original images to the server; the server receives the video data packets, parses the video frame data in their payload, calculates the pose data of the virtual object, and then sends the identification information of the virtual object and the corresponding pose data to the terminal device.
In a specific example, the terminal device uses a camera or AR glasses to collect original image data of the real world at a certain frame rate, compresses and encodes the obtained image data, encapsulates the encoded data in packets suitable for network transmission, and sends the packed video data packets to the server. The server parses the video data packets to obtain video data in the corresponding format, extracts and compares target features of the video data to complete target recognition, matches the recognized target object against the virtual object database to obtain the corresponding virtual object, and calculates the pose data of the virtual object; finally, the server sends the identification information of the virtual object and the corresponding pose data to the terminal device.
It should be noted that the virtual object can be, but is not limited to, content in the form of a 3D model, an image, text, a video, and so on; the possibilities are not enumerated here. The pose data of the virtual object includes the position and posture of the virtual object on the video frame image, and its mathematical form can be a transformation matrix (rotation matrix and translation vector), a homography matrix, an essential matrix, or any other form. In this embodiment, the pose data is described as including a rotation matrix R and a translation vector t; the pose data may also include the video frame number corresponding to the pose.
Step 102: Receive the pose data of the virtual object returned by the server.
Specifically, the terminal device receives the pose data of the virtual object sent by the server, parses it to obtain the rotation matrix R and the translation vector t as well as the video frame number corresponding to the pose, and downloads the corresponding virtual object.
Step 103: Determine the incremental change of the pose data according to the time difference between the video frame corresponding to the pose data and the video frame currently to be displayed, and correct the pose data according to the incremental change.
Specifically, using the time difference between the video frame corresponding to the pose data and the video frame currently to be displayed, together with the angular velocity and acceleration information of the inertial measurement unit (IMU), the rotation matrix increment and the translation vector increment are obtained; with these increments, the pose data of the virtual object is corrected, quickly yielding the pose of the virtual object in the video frame currently to be displayed.
In a specific example, the terminal sends the Nth video frame to the server; the server processes it, returns the resulting pose data of the virtual object to the terminal device, and notifies the terminal device that the video frame corresponding to the pose data is the Nth frame. The terminal device then determines the time difference between the video frame currently to be displayed and the Nth frame. For example, if the Nth frame has already been played and the (N+k)th frame is about to be played, the (N+k)th frame is the video frame currently to be displayed, and the terminal device computes the time difference between the (N+k)th and Nth frames by multiplying the frame number difference k by the frame interval.
Using the obtained time difference and the IMU's angular velocity and acceleration information, the pose change of the virtual object from the Nth frame to the (N+k)th frame is computed, where the rotation matrix increment is calculated from the angular velocity and the translation vector increment is obtained by integrating the acceleration. The pose data R and t are corrected according to the obtained incremental change, as shown in formula (1):
R' = ΔR*R, t' = Δt + t   (1)
The IMU involved in this embodiment includes a gyroscope and an accelerometer and can return angular velocity and acceleration information in real time. Computing the incremental pose change from the IMU's angular velocity and acceleration requires very little computation and time, so the correction does not impose excessive performance cost on the terminal.
In another example, to blend the virtual content better with the real world, the correction of the virtual object's pose data can also include ambient light detection and estimation, adjusting the virtual content to match the ambient lighting so that it looks more realistic.
Step 104: Superimpose the virtual object on the video frame currently to be displayed according to the corrected pose data.
Specifically, rendering is performed in the (N+k)th video frame according to the corrected rotation matrix and translation vector, producing a video image with the virtual content superimposed, which is presented to the user.
In practical applications, the terminal device can start two threads: one responsible for video data collection, encoding, and packet sending, and the other for receiving the pose data of the virtual object returned by the server and downloading the virtual object for real-time rendering.
It is easy to see that in this embodiment the terminal sends the video data packets to the server so that the server determines the pose data of the virtual object, using the computing power of an external server to handle the time-consuming pose analysis; since the analysis does not run on the terminal, it does not increase the terminal's performance consumption. The terminal device then determines the incremental change of the pose data from the time difference between the video frame corresponding to the pose data and the video frame currently to be displayed, corrects the virtual object's pose data with that incremental change, and superimposes the virtual object on the video frame currently to be displayed using the corrected pose data. Correcting the pose data ensures that the virtual object is superimposed at an accurate position in the frame, eliminating the delay-induced desynchronization between virtual and real content and improving the user experience.
The second embodiment of the present application relates to a delay error correction method. The second embodiment is roughly the same as the first; the main differences are that in the second embodiment, before the video data packet is sent to the server, the delay value from sending a request to the server to receiving the pose data returned by the server is counted, and after the pose data of the virtual object returned by the server is received, the terminal determines whether the virtual object already exists and, if not, downloads the corresponding virtual object from the server. The specific flow is shown in FIG. 2, and the corresponding delay correction system is shown in FIG. 3.
Step 201: Count the delay value from sending a request to the server to receiving the pose data returned by the server.
Specifically, the delay statistics unit 305 in the terminal device records the moment each video data packet is sent to the server and the moment the pose data returned by the server is received, and computes the difference between the two moments; this difference is the delay value. Owing to network instability, the delay varies in real time.
In one example, the delay statistics unit 305 records the send time of the video data packet for the Nth video frame as Tns and the time the server's pose data packet is received as Tnr; the delay from sending the request to receiving the returned pose data is then T = Tnr - Tns. The terminal can send the delay value to the server so that the server can filter the received delay data to obtain an average delay.
It should be noted that when the terminal sends a video data packet to the server for the first time, no send/receive round trip has yet completed, so the packet carries no delay value. In other words, the delay value sent to the server is the one produced during the terminal's previous round trip of sending data to and receiving data back from the server.
Step 202: Send the video data packet to the server, for the server to determine the pose data of the virtual object to be superimposed according to the video data packet.
Specifically, the delay value computed in step 201 and the video frames obtained by encoding the collected original image data are encapsulated in packets suitable for network transmission and sent to the server, for the server to generate the pose data of the virtual object based on the delay value.
In a specific example, the terminal device uses the acquisition unit 301, such as a camera or AR glasses, to collect original image data of the real world at a frame rate of 30 fps, and the encoding unit 302 performs H.264 encoding on the obtained data. The packaging unit 303 encapsulates the delay T passed from the delay statistics unit 305 in the RTP extension header and the encoded H.264 video data in the RTP payload according to RFC 3984; the packed result is sent to the server through the packet sending unit 304. The server receives the video data packets and parses the video frame data in their payload to complete target recognition, matching the recognized object against the virtual object database to obtain the ID of the corresponding virtual object. The server then uses the delay T to track and predict the target's posture T time after the current frame, calculates the virtual object's pose data from that predicted posture, and sends the virtual object's ID and corresponding pose data to the terminal.
Step 203: Receive the pose data of the virtual object returned by the server.
Specifically, the pose data of the virtual object sent by the server is received and parsed to obtain the rotation matrix and translation vector, as well as the video frame number corresponding to the pose and the corresponding virtual content.
In one example, the receiving unit 306 in the terminal receives the RTP packet sent by the server and parses out the pose data of the virtual object (the rotation matrix R and the translation vector t), the virtual object ID, and the video frame number corresponding to the pose data.
Step 204: Determine the incremental change of the pose data according to the time difference between the video frame corresponding to the pose data and the video frame currently to be displayed, and correct the pose data according to the incremental change.
Specifically, using the angular velocity and acceleration information of the IMU and the time difference between the video frame corresponding to the pose data and the video frame currently to be displayed, the rotation matrix increment and the translation vector increment are obtained; with these increments, the pose data of the virtual object is corrected, quickly yielding the pose of the virtual object in the video frame currently to be displayed.
In a specific example, the terminal sends the Nth video frame to the server; based on the calculated delay value T, the server predicts the pose data of the virtual object in the (N+m)th video frame, returns it to the terminal device, and notifies the terminal device that the video frame corresponding to the pose data is the (N+m)th frame, where m is the number of video frames played within the span of the delay value T. The terminal device then determines the time difference between the video frame currently to be displayed and the (N+m)th frame: for example, if the (N+m)th frame has already been played and the (N+m+k)th frame is about to be played, the (N+m+k)th frame is the video frame currently to be displayed, and the secondary correction unit 307 computes the time difference between the (N+m+k)th and (N+m)th frames by multiplying the frame number difference k by the frame interval.
Using the obtained time difference and the IMU's angular velocity and acceleration information, the pose change of the virtual object from the (N+m)th frame to the (N+m+k)th frame is obtained, where the rotation matrix increment ΔR is calculated from the angular velocity and the translation vector increment Δt is obtained by integrating the acceleration. The secondary correction unit 307 corrects the pose data R and t according to the obtained incremental change, as shown in formula (2):
R' = ΔR*R, t' = Δt + t   (2)
The IMU in this step includes a gyroscope and an accelerometer and can return angular velocity and acceleration information in real time. Computing the incremental pose change from the IMU's angular velocity and acceleration requires very little computation and time, so the correction does not impose excessive performance cost on the terminal.
Step 205: Determine whether the virtual object already exists.
Specifically, when it is determined that no virtual object matching the received virtual object ID currently exists on the terminal device, step 206 is executed; when a matching virtual object already exists, step 207 is executed directly.
In one example, the terminal checks whether the virtual object corresponding to the received ID already exists on the terminal; if not, the virtual object is downloaded from the server, and if so, the virtual object is superimposed on the video currently to be displayed. For instance, during an AR interaction, if the virtual object has appeared in earlier frames, the terminal device has already stored the object with that ID and can use it directly without downloading it from the server; if the virtual object appears in the interaction for the first time, the terminal device has not stored an object with that ID and must download the object with the corresponding ID from the server.
Step 206: Download the corresponding virtual object from the server.
Specifically, the corresponding virtual object is looked up by its ID in the server's virtual object database and downloaded to the terminal.
Step 207: Superimpose the virtual object on the video frame currently to be displayed according to the corrected pose data.
Specifically, the rendering and display unit 308 renders the virtual object into the currently captured real image according to the corrected pose data R' and t'.
It should be noted that steps 204 and 205 have no fixed order of execution; those skilled in the art can arrange the order according to their implementation practice.
It is easy to see that in the second embodiment of the present application, the pose data of the virtual object is first corrected on the server side according to the delay value counted by the terminal, which reduces terminal performance consumption and benefits the accuracy of subsequent pose correction; then, using the time difference between the video frame corresponding to the pose data and the video frame currently to be displayed and the incremental pose change obtained from the terminal's IMU, a secondary correction is applied to the virtual object's pose data, further ensuring the accuracy of the position at which the virtual object is superimposed on the real picture, eliminating the delay-induced desynchronization between virtual and real content, and improving the user experience.
Since the server needs a certain amount of time to unpack and decode the video data packets transmitted by the terminal, obtaining the pose data of the virtual object is also time-consuming, and network transmission itself takes time, there is a delay from the terminal sending the request to receiving the pose data returned by the server. The server initially does not know the delay of this process, so the terminal must send the delay value produced in each data transmission to the server along with the video data packet, so that the server can use the obtained delay value and a tracking algorithm to predict the pose of the virtual object and apply a preliminary correction to the pose of the virtual object to be superimposed on the video frame currently to be displayed, eliminating the virtual-real discrepancy to a certain extent.
The third embodiment of the present application relates to a delay error correction method applied to a server. In this embodiment, a video data packet sent by a terminal device is received; the pose data of a virtual object to be superimposed is determined according to the video data packet; and the pose data is returned to the terminal device, so that after correcting the pose data, the terminal device superimposes the virtual object on the video frame currently to be displayed according to the corrected pose data. The implementation details of the delay error correction method of this embodiment are described below; these details are provided only for ease of understanding and are not required to implement this solution. The specific flow is shown in FIG. 4, and the corresponding delay correction system is shown in FIG. 3.
Step 401: Receive a video data packet sent by a terminal device.
Specifically, the server receives from the network the video data packet sent by the terminal; the packet contains the video frame data and the delay value most recently counted by the terminal device, i.e., the time from sending a request to the server to receiving the pose data returned by the server.
In one example, the receiving unit 309 receives the video data packet sent by the terminal, which is in RTP format; the unpacking unit 310 parses the packet, extracts the video frame data from the RTP payload into the buffer unit 311, and extracts the delay value T and passes it to the delay filtering unit 314.
Step 402: Determine the pose data of the virtual object to be superimposed according to the video data packet.
Specifically, target perception and recognition is performed on the decoded original image data through feature extraction, target recognition, and matching; the recognized target is matched against the objects in the virtual object database to obtain the corresponding virtual object. According to the delay T, tracking predicts the target's posture T time after the current frame, and the pose of the virtual object is calculated; the server then sends the virtual object and its corresponding pose data to the terminal.
In one example, the decoding unit 312 decodes the video frame data in the buffer unit to obtain the original YUV image data; the intelligent perception recognition unit 313 performs perception and recognition on the decoded YUV data through feature extraction, target recognition, and matching, matches the recognized target against the templates in the virtual object database 318, obtains the matching virtual object, and records its object ID. Feature extraction here uses the ORB algorithm with a three-level image pyramid, extracting feature points at each level; to distribute the feature points evenly, the original image is rasterized into grids, and in each grid the point with the highest matching score is taken as the feature point. In practical applications, other target recognition algorithms can also be selected according to server performance.
The tracking prediction unit 315 tracks and predicts the posture, after the delay, of the target on which the virtual object is superimposed, and the pose calculation unit 316 determines the pose data of the virtual object from the target's posture after the delay.
In a specific implementation, before the tracking prediction unit 315 predicts the target's delayed posture, the delay filtering unit 314 can also filter the currently input delay value together with the historical delay values to obtain the average delay Tavg(k), as shown in formula (2):
Tavg(k) = a*T(k) + (1-a)*Tavg(k-1)   (2)
where a can be set empirically, for example to 0.95.
The tracking prediction unit 315 applies a Markov chain model: based on the state at the previous moment and the historical states, combined with the transition probability matrix of the Markov chain, it accurately predicts the state at the next moment, and through this algorithm computes the target position and posture delayed by Tavg(k) relative to the current frame.
After the posture of the target after the predicted delay has been obtained, the pose calculation unit 316 uses the PnP (Perspective-n-Point) algorithm to generate the pose data of the virtual object from the target's posture after the predicted delay Tavg(k); the pose data includes the rotation matrix R and the translation vector t.
Step 403: Return the pose data to the terminal device, so that after correcting the pose data, the terminal device superimposes the virtual object on the video frame currently to be displayed according to the corrected pose data.
Specifically, the computed pose data, the frame number corresponding to the pose data, and the virtual object ID are packed into an RTP packet and sent to the terminal through the packet sending unit 317. The terminal determines the incremental change of the pose data according to the time difference between the video frame corresponding to the pose data and the video frame currently to be displayed, corrects the pose data according to the incremental change, and superimposes the virtual object on the video frame currently to be displayed according to the corrected pose data.
In practical applications, the server opens two threads: one receives packets, parses them, and buffers the result; the other takes video frames from the buffer for AR processing.
In the third embodiment of the present application, the server receives the video frame data and delay value sent by the terminal and determines the pose data of the virtual object to be superimposed, reducing terminal performance consumption while applying a first correction to the pose data with which the virtual object is superimposed on the target, eliminating the delay-induced pose error to a certain extent. The initially corrected pose data is transmitted to the terminal; after the terminal corrects it again, the virtual object is superimposed on the video frame currently to be displayed according to the twice-corrected pose data, further ensuring the accuracy of the position at which the virtual object is superimposed on the real picture, eliminating the delay-induced desynchronization between virtual and real content, reducing perceived delay, and improving the user experience.
By receiving the delay value counted by the terminal device and using it with a tracking algorithm to predict the virtual object's pose, the server applies a preliminary correction to the pose of the virtual object to be superimposed on the video frame currently to be displayed, reducing the position error of the virtual object in that frame.
The server filters the current and historical delay values transmitted by the terminal to obtain a predicted delay value; this filtering yields a smoother predicted delay, so the virtual object pose subsequently predicted from it is also accurate.
The division of the above methods into steps is only for clarity of description; in implementation, steps may be merged into one or split into several, and as long as the same logical relationship is preserved they fall within the protection scope of this patent. Adding insignificant modifications to the algorithm or flow, or introducing insignificant designs, without changing the core design of the algorithm and flow, also falls within the protection scope of this patent.
本申请第四实施方式涉及一种终端设备,如图5所示,包括:至少一个处理器501;以及,
与至少一个处理器通信连接的存储器502;其中,存储器502存储有可被至少一个处理器501执行的指令,指令被至少一个处理器501执行,以使至少一个处理器501能够执行上述第一实施方式或第二实施方式中的时延误差校正方法。
其中,存储器502和处理器501采用总线方式连接,总线可以包括任意数量的互联的总线和桥,总线将一个或多个处理器501和存储器502的各种电路连接在一起。总线还可以将诸如外围设备、稳压器和功率管理电路等之类的各种其他电路连接在一起,这些都是本领域所公知的,因此,本文不再对其进行进一步描述。总线接口在总线和收发机之间提供接口。收发机可以是一个元件,也可以是多个元件,比如多个接收器和发送器,提供用于在传输介质上与各种其他装置通信的单元。经处理器501处理的数据通过天线在无线介质上进行传输,进一步,天线还接收数据并将数据传送给处理器501。
处理器501负责管理总线和通常的处理,还可以提供各种功能,包括定时、外围接口、电压调节、电源管理以及其他控制功能。而存储器502可以被用于存储处理器501在执行操作时所使用的数据。
本申请第五实施方式涉及一种服务器,如图6所示,,包括:至少一个处理器601;以及,
与至少一个处理器通信连接的存储器602;其中,存储器602存储有可被至少一个处理器601执行的指令,指令被至少一个处理器601执行,以使至少一个处理器601能够执行上述第三实施方式中的时延误差校正方法。
其中,存储器602和处理器601采用总线方式连接,总线可以包括任意数量的互联的总 线和桥,总线将一个或多个处理器601和存储器602的各种电路连接在一起。总线还可以将诸如外围设备、稳压器和功率管理电路等之类的各种其他电路连接在一起,这些都是本领域所公知的,因此,本文不再对其进行进一步描述。总线接口在总线和收发机之间提供接口。收发机可以是一个元件,也可以是多个元件,比如多个接收器和发送器,提供用于在传输介质上与各种其他装置通信的单元。经处理器601处理的数据通过天线在无线介质上进行传输,进一步,天线还接收数据并将数据传送给处理器601。
处理器601负责管理总线和通常的处理,还可以提供各种功能,包括定时、外围接口、电压调节、电源管理以及其他控制功能。而存储器602可以被用于存储处理器601在执行操作时所使用的数据。
本申请第六实施方式涉及一种计算机可读存储介质,存储有计算机程序。计算机程序被处理器执行时实现上述方法实施例。
即,本领域技术人员可以理解,实现上述实施例方法中的全部或部分步骤是可以通过程序来指令相关的硬件来完成,该程序存储在一个存储介质中,包括若干指令用以使得一个设备(可以是单片机,芯片等)或处理器(processor)执行本申请各个实施例所述方法的全部或部分步骤。而前述的存储介质包括:U盘、移动硬盘、只读存储器(ROM,Read-Only Memory)、随机存取存储器(RAM,Random Access Memory)、磁碟或者光盘等各种可以存储程序代码的介质。
本申请实施方式提供一种时延误差校正方法、终端设备、服务器及存储介质,使得在不增加终端性能消耗的基础上消除因时延带来的虚拟和现实不同步的问题,提升用户体验。
在本申请实施方式中,终端设备通过将视频数据包发送给服务器,供服务器确定虚拟对象的位姿数据,使得借助外部服务器的运算能力来处理耗时的位姿数据分析过程,因不在终端中进行该分析过程,所以不会增加终端的性能消耗;然后终端设备根据位姿数据对应的视频帧与当前待显示视频帧之间的时间差,确定位姿数据对应的增量变化,以得到的增量变化对虚拟对象的位姿数据进行校正,并以校正后的位姿数据将虚拟对象叠加到当前待显示的视频帧;通过对位姿数据的校正保证虚拟对象叠加到当前待显示视频帧中位置的准确性,消除了因时延带来的虚拟和现实不同步问题,提升了用户体验。
Those of ordinary skill in the art will understand that the above embodiments are specific examples implementing this application, and that in practical applications various changes in form and detail may be made to them without departing from the spirit and scope of this application.

Claims (11)

  1. A delay error correction method, comprising:
    sending a video data packet to a server, for the server to determine, according to the video data packet, pose data of a virtual object to be superimposed;
    receiving the pose data of the virtual object returned by the server;
    determining an incremental change of the pose data according to a time difference between a video frame corresponding to the pose data and a video frame currently to be displayed, and correcting the pose data according to the incremental change; and
    superimposing the virtual object onto the video frame currently to be displayed according to the corrected pose data.
  2. The delay error correction method according to claim 1, wherein before the sending a video data packet to a server, the method further comprises:
    measuring a delay value from sending a request to the server to receiving the pose data returned by the server;
    the sending a video data packet to a server comprises:
    encapsulating the delay value, together with a video frame obtained by encoding collected raw image data, into a data packet and sending it to the server, for the server to determine the pose data of the virtual object according to the delay value.
  3. The delay error correction method according to claim 1, wherein the pose data comprise a rotation matrix and a translation vector;
    the determining an incremental change of the pose data according to a time difference between a video frame corresponding to the pose data and a video frame currently to be displayed comprises:
    obtaining a rotation matrix increment according to angular velocity information of an inertial measurement unit (IMU), and obtaining a translation vector increment according to acceleration information of the IMU and the time difference between the video frame corresponding to the pose data and the video frame currently to be displayed.
  4. The delay error correction method according to claim 3, wherein the correcting the pose data according to the incremental change comprises:
    taking the product of the rotation matrix and the rotation matrix increment as the corrected rotation matrix; and
    taking the sum of the translation vector and the translation vector increment as the corrected translation vector.
  5. The delay error correction method according to any one of claims 1 to 4, wherein after the receiving the pose data of the virtual object returned by the server, the method further comprises:
    determining whether the virtual object already exists; and
    if the virtual object does not exist, downloading the virtual object from the server.
  6. A delay error correction method, comprising:
    receiving a video data packet sent by a terminal device;
    determining, according to the video data packet, pose data of a virtual object to be superimposed; and
    returning the pose data to the terminal device, for the terminal device, after correcting the pose data, to superimpose the virtual object onto a video frame currently to be displayed according to the corrected pose data.
  7. The delay error correction method according to claim 6, wherein the video data packet comprises a delay value, measured by the terminal device, from sending a request to the server to receiving the pose data returned by the server;
    the determining, according to the video data packet, pose data of a virtual object to be superimposed comprises:
    tracking and predicting a pose of a target onto which the virtual object is superimposed after a delay of the delay value; and
    determining the pose data of the virtual object according to the pose of the target after the delay of the delay value.
  8. The delay error correction method according to claim 7, wherein before the tracking and predicting a pose of a target onto which the virtual object is superimposed after a delay of the delay value, the method further comprises:
    filtering the delay value according to the delay value and historical delay values to obtain a predicted delay value;
    the tracking and predicting a pose of a target onto which the virtual object is superimposed after a delay of the delay value is:
    tracking and predicting a pose of the target onto which the virtual object is superimposed after a delay of the predicted delay value.
  9. A terminal device, comprising:
    at least one processor; and
    a memory communicatively connected to the at least one processor; wherein
    the memory stores instructions executable by the at least one processor, the instructions being executed by the at least one processor to enable the at least one processor to perform the delay error correction method according to any one of claims 1 to 5.
  10. A server, comprising:
    at least one processor; and
    a memory communicatively connected to the at least one processor; wherein
    the memory stores instructions executable by the at least one processor, the instructions being executed by the at least one processor to enable the at least one processor to perform the delay error correction method according to any one of claims 6 to 8.
  11. A computer-readable storage medium storing a computer program, wherein when the computer program is executed by a processor, it implements the delay error correction method according to any one of claims 1 to 5, or implements the delay error correction method according to any one of claims 6 to 8.
PCT/CN2020/122944 2019-10-31 2020-10-22 Delay error correction method, terminal device, server and storage medium WO2021083031A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
EP20882409.4A EP3993428A4 (en) 2019-10-31 2020-10-22 TIME DELAY ERROR CORRECTION METHODS, TERMINAL, SERVER AND STORAGE MEDIA

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201911051113.3A CN112752119B (zh) 2019-10-31 2019-10-31 Delay error correction method, terminal device, server and storage medium
CN201911051113.3 2019-10-31

Publications (1)

Publication Number Publication Date
WO2021083031A1 true WO2021083031A1 (zh) 2021-05-06

Family

ID=75641372

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/122944 WO2021083031A1 (zh) 2019-10-31 2020-10-22 一种时延误差校正方法、终端设备、服务器及存储介质

Country Status (3)

Country Link
EP (1) EP3993428A4 (zh)
CN (1) CN112752119B (zh)
WO (1) WO2021083031A1 (zh)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117296082A (zh) * 2021-05-20 2023-12-26 华为技术有限公司 Image processing method and apparatus
CN113747253B (zh) * 2021-08-17 2023-04-28 中移(杭州)信息技术有限公司 Network bandwidth determination method, video RTP receiving end and storage medium
CN117237399A (zh) * 2022-06-08 2023-12-15 华为云计算技术有限公司 Object tracking method and related device

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106710002A (zh) * 2016-12-29 2017-05-24 深圳迪乐普数码科技有限公司 AR implementation method and system based on observer viewpoint positioning
CN106796481A (zh) * 2016-09-27 2017-05-31 深圳市大疆创新科技有限公司 Control method, control device and electronic device
CN107203257A (zh) * 2016-03-17 2017-09-26 深圳多哚新技术有限责任公司 Head pose compensation method and related device
US20170289209A1 (en) * 2016-03-30 2017-10-05 Sony Computer Entertainment Inc. Server-based sound mixing for multiuser voice chat system
CN109126122A (zh) * 2017-06-16 2019-01-04 上海拆名晃信息科技有限公司 Cloud gaming system implementation method for virtual reality

Family Cites Families (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4144888B2 (ja) * 2005-04-01 2008-09-03 キヤノン株式会社 Image processing method and image processing apparatus
ES2745739T3 (es) * 2010-09-20 2020-03-03 Qualcomm Inc An adaptive environment for cloud-assisted augmented reality
WO2014020364A1 (en) * 2012-07-30 2014-02-06 Zinemath Zrt. System and method for generating a dynamic three-dimensional model
WO2014075237A1 (zh) * 2012-11-14 2014-05-22 华为技术有限公司 Method and user equipment for implementing augmented reality
US9514571B2 (en) * 2013-07-25 2016-12-06 Microsoft Technology Licensing, Llc Late stage reprojection
CN107025661B (zh) * 2016-01-29 2020-08-04 成都理想境界科技有限公司 Method, server, terminal and system for implementing augmented reality
WO2018039586A1 (en) * 2016-08-26 2018-03-01 Magic Leap, Inc. Continuous time warp and binocular time warp for virtual and augmented reality display systems and methods
CN106856566B (zh) * 2016-12-16 2018-09-25 中国商用飞机有限责任公司北京民用飞机技术研究中心 Information synchronization method and system based on an AR device
CN106980368B (zh) * 2017-02-28 2024-05-28 深圳市未来感知科技有限公司 Virtual reality interaction device based on visual computing and an inertial measurement unit
CN107360060B (zh) * 2017-08-07 2020-04-10 瑞斯康达科技发展股份有限公司 Delay measurement method and apparatus
CN108161882B (zh) * 2017-12-08 2021-06-08 华南理工大学 Augmented-reality-based robot teaching and playback method and apparatus
CN109375764B (zh) * 2018-08-28 2023-07-18 北京凌宇智控科技有限公司 Head-mounted display, cloud server, VR system and data processing method
CN109276883B (zh) * 2018-09-14 2022-05-31 网易(杭州)网络有限公司 Game information synchronization method, server, client, medium and electronic device
CN109847361B (zh) * 2019-02-27 2020-11-10 腾讯科技(深圳)有限公司 Motion state synchronization method and apparatus, storage medium, and electronic apparatus
US11010921B2 (en) * 2019-05-16 2021-05-18 Qualcomm Incorporated Distributed pose estimation
CN110244840A (zh) * 2019-05-24 2019-09-17 华为技术有限公司 Image processing method, related device and computer storage medium
CN110335317B (zh) * 2019-07-02 2022-03-25 百度在线网络技术(北京)有限公司 Image processing method, apparatus, device and medium based on terminal device positioning

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107203257A (zh) * 2016-03-17 2017-09-26 深圳多哚新技术有限责任公司 Head pose compensation method and related device
US20170289209A1 (en) * 2016-03-30 2017-10-05 Sony Computer Entertainment Inc. Server-based sound mixing for multiuser voice chat system
CN106796481A (zh) * 2016-09-27 2017-05-31 深圳市大疆创新科技有限公司 Control method, control device and electronic device
CN106710002A (zh) * 2016-12-29 2017-05-24 深圳迪乐普数码科技有限公司 AR implementation method and system based on observer viewpoint positioning
CN109126122A (zh) * 2017-06-16 2019-01-04 上海拆名晃信息科技有限公司 Cloud gaming system implementation method for virtual reality

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP3993428A4 *

Also Published As

Publication number Publication date
EP3993428A1 (en) 2022-05-04
CN112752119A (zh) 2021-05-04
CN112752119B (zh) 2023-12-01
EP3993428A4 (en) 2022-09-07

Similar Documents

Publication Publication Date Title
WO2021083031A1 Delay error correction method, terminal device, server and storage medium
CN111627116B Image rendering control method and apparatus, and server
CN110310326B Visual positioning data processing method and apparatus, terminal, and computer-readable storage medium
US10984583B2 Reconstructing views of real world 3D scenes
US11694316B2 Method and apparatus for determining experience quality of VR multimedia
CN112154669A Correlation of video stream frame timestamps based on a system clock
US10198842B2 Method of generating a synthetic image
CN114543797B Pose prediction method and apparatus, device, and medium
CN115802076A Distributed cloud rendering method and system for three-dimensional models, and electronic device
KR102346090B1 Augmented reality remote rendering method for a real-time mixed reality service of volumetric 3D video data
WO2024109317A1 Method and device for transmitting video frames and camera parameter information
US20240019702A1 Control method for head-mounted device and image rendering method
CN115065827A Video encoding method and apparatus, electronic device, and medium
KR102471792B1 Cloud for rendering AR content and operating method thereof
Hasper et al. Remote execution vs. simplification for mobile real-time computer vision
CN110211239B Augmented reality method, apparatus, device and medium based on markerless recognition
CN114143486A Video stream synchronization method and apparatus, computer device, and storage medium
CN113556600A Drive control method and apparatus based on time-sequence information, electronic device, and readable storage medium
WO2023050590A1 Image processing method and apparatus, storage medium, and terminal
CN113630745B UAV communication method, system, apparatus, device and storage medium
US11910068B2 Panoramic render of 3D video
CN114390314B Variable-frame-rate audio and video processing method, device, and storage medium
EP4202611A1 Rendering a virtual object in spatial alignment with a pose of an electronic device
EP4169248A1 Video pass-through computing system
CN116299544A Head-mounted display device, and positioning method, apparatus and storage medium therefor

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20882409

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2020882409

Country of ref document: EP

Effective date: 20220127

NENP Non-entry into the national phase

Ref country code: DE