CN113766119A - Virtual image display method, device, terminal and storage medium


Info

Publication number
CN113766119A
Authority
CN
China
Prior art keywords
timestamp
video frame
data
camera
application program
Prior art date
Legal status
Granted
Application number
CN202110512971.4A
Other languages
Chinese (zh)
Other versions
CN113766119B
Inventor
黄归
Current Assignee
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN202110512971.4A
Publication of CN113766119A
Application granted
Publication of CN113766119B
Status: Active

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/60Control of cameras or camera modules
    • H04N23/68Control of cameras or camera modules for stable pick-up of the scene, e.g. compensating for camera body vibrations
    • H04N23/682Vibration or motion blur correction
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00Details of television systems
    • H04N5/222Studio circuitry; Studio devices; Studio equipment
    • H04N5/262Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects ; Cameras specially adapted for the electronic generation of special effects
    • H04N5/265Mixing


Abstract

The present application provides an avatar display method, apparatus, terminal, and storage medium, and belongs to the field of computer technologies. The method includes: acquiring, through an application program, a target video frame output by the SurfaceTexture, wherein the target video frame is obtained by processing an original video frame currently acquired by a camera; calling, through the application program, getTimestamp of the SurfaceTexture to acquire a first timestamp, wherein the first timestamp represents the time point at which the SurfaceTexture processing produced the target video frame; and determining, through the application program, pose data of the camera corresponding to the first timestamp, determining a target position in the target video frame based on the pose data, and displaying the avatar at the target position. With this method, the degree to which the avatar shakes in the video can be reduced when the camera captures the video while moving, thereby improving the display effect of the avatar.

Description

Virtual image display method, device, terminal and storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to a method, an apparatus, a terminal, and a storage medium for displaying an avatar.
Background
Augmented Reality (AR) can construct, by means of computer graphics and visualization technology, an avatar that does not exist in the real environment, fuse the avatar into a video corresponding to the real environment, and present the fused video to the user, giving the user an enhanced sensory experience of the real environment. When the avatar is fused into the video corresponding to the real environment, a target position for displaying the avatar in a video frame needs to be determined, and the avatar is displayed at that target position.
In the related art, a period of time elapses between the camera acquiring a video frame and the video frame being transmitted to the application program, so the deviation between the time point at which the application program obtains the video frame and the time point at which the camera actually acquired it is large. Because the target position in the video frame is determined from the pose data corresponding to the time point at which the application program obtains the video frame, the accuracy of the determined target position is low. As a result, when the camera captures the video while moving, the avatar shakes noticeably in the video and the display effect is poor.
Disclosure of Invention
The embodiments of the present application provide an avatar display method, apparatus, terminal, and storage medium, which can reduce the degree to which an avatar shakes in a video when the camera captures the video while moving, thereby improving the display effect of the avatar. The technical solutions are as follows:
in one aspect, there is provided an avatar display method performed by a terminal, the method including:
acquiring, through an application program, a target video frame output by a video frame processing component SurfaceTexture, wherein the target video frame is obtained by processing an original video frame currently acquired by a camera;
calling, through the application program, a timestamp acquisition interface getTimestamp of the SurfaceTexture to acquire a first timestamp, wherein the first timestamp is used to represent the time point at which the SurfaceTexture processing produced the target video frame;
determining, by the application program, pose data corresponding to the first timestamp of the camera, determining a target position in the target video frame based on the pose data, and displaying an avatar at the target position.
In another aspect, there is provided an avatar display apparatus, the apparatus including:
a video frame acquisition module, configured to acquire, through an application program, a target video frame output by a video frame processing component SurfaceTexture, wherein the target video frame is obtained by processing an original video frame currently acquired by a camera;
a timestamp obtaining module, configured to call a timestamp acquisition interface getTimestamp of the SurfaceTexture through the application program to acquire a first timestamp, wherein the first timestamp is used to indicate the time point at which the SurfaceTexture processing produced the target video frame;
and an avatar display module, configured to determine, through the application program, pose data of the camera corresponding to the first timestamp, determine a target position in the target video frame based on the pose data, and display the avatar at the target position.
In one possible implementation, the apparatus further includes:
and an interface calling module, configured to call a data update interface updateTexImage through the application program to trigger the SurfaceTexture to process the original video frame currently acquired by the camera, obtaining the target video frame.
In one possible implementation, the avatar display module includes:
a data acquisition unit, configured to acquire, through the application program, a plurality of pieces of pose data of the camera during video capture and a second timestamp corresponding to each piece of pose data, wherein the second timestamp is used to represent the time point at which the corresponding pose data is generated;
a data selecting unit, configured to select the pose data corresponding to a target timestamp from the plurality of second timestamps, wherein the target timestamp is the second timestamp with the smallest deviation from the first timestamp among the plurality of second timestamps;
and a data determining unit, configured to determine the selected pose data as the pose data of the camera corresponding to the first timestamp.
In a possible implementation manner, the data acquisition unit is configured to acquire, through the application program, a plurality of pieces of inertial measurement unit (IMU) data and a second timestamp corresponding to each piece of IMU data from a cache pool, and determine the plurality of pieces of IMU data as the plurality of pieces of pose data of the camera, wherein the cache pool is used to store the IMU data generated during video capture.
In a possible implementation manner, the first timestamp is calibrated based on the normal running clock Uptime, and the second timestamp is calibrated based on the real-time running clock elapsedRealtime;
the avatar display module further includes:
a difference determining unit, configured to acquire a time difference between elapsedRealtime and Uptime at any moment during video capture;
and a timestamp conversion unit, configured to convert the second timestamp into a timestamp under Uptime based on the time difference, or convert the first timestamp into a timestamp under elapsedRealtime based on the time difference.
In a possible implementation manner, the timestamp conversion unit is configured to subtract the time difference from the second timestamp to obtain the second timestamp under the Uptime.
In a possible implementation manner, the timestamp conversion unit is configured to add the time difference to the first timestamp to obtain the first timestamp under elapsedRealtime.
In a possible implementation manner, the avatar display module is configured to determine, based on the pose data, a second horizontal plane in the target video frame corresponding to a first horizontal plane, wherein the first horizontal plane is any horizontal plane in the three-dimensional space captured by the camera; and display the avatar on the second horizontal plane.
In a possible implementation manner, the avatar display module is further configured to render the target video frame in a shooting interface of the application program through a video frame rendering component GLSurfaceView.
In another aspect, a terminal is provided, which includes a processor and a memory, wherein the memory stores at least one computer program, and the computer program is loaded and executed by the processor to implement the operations performed in the avatar display method in any one of the above possible implementation manners.
In another aspect, a computer-readable storage medium is provided, in which at least one computer program is stored, the computer program being loaded and executed by a processor to implement the operations performed in the avatar display method in any of the above possible implementations.
In yet another aspect, a computer program product or a computer program is provided, the computer program product or the computer program comprising a computer program stored in a computer readable storage medium. The processor of the terminal reads the computer program from the computer-readable storage medium, and executes the computer program, so that the terminal performs the operations performed in the avatar display method in the above-described various optional implementations.
The beneficial effects brought by the technical scheme provided by the embodiment of the application at least comprise:
In the embodiments of the present application, considering that the SurfaceTexture can obtain the original video frame captured by the camera relatively early, the application program acquires the target video frame directly from the SurfaceTexture, which avoids the time offset caused by transmitting the target video frame to the application program only after a series of further processing. As a result, the deviation between the first timestamp acquired by the application program and the time at which the camera actually captured the original video frame is small, and therefore the deviation between the pose data of the camera corresponding to the first timestamp and the pose data at the time the camera actually captured the original video frame is also small. Consequently, the target position determined in the video frame based on the pose data corresponding to the first timestamp is highly accurate, the degree to which the avatar shakes in the video can be reduced when the camera captures the video while moving, and the display effect of the avatar is improved.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed for describing the embodiments are briefly introduced below. Apparently, the drawings in the following description show only some embodiments of the present application, and those skilled in the art can obtain other drawings from these drawings without creative effort.
FIG. 1 is a schematic illustration of an implementation environment provided by an embodiment of the present application;
FIG. 2 is a flowchart of an avatar display method according to an embodiment of the present application;
FIG. 3 is a flowchart of an avatar display method according to an embodiment of the present application;
FIG. 4 is a schematic diagram of a process for determining pose data corresponding to a first timestamp according to an embodiment of the present application;
FIG. 5 is a schematic diagram of a process for unifying a first timestamp and a second timestamp under a same calibration clock according to an embodiment of the present application;
FIG. 6 is a schematic diagram of a process for generating video frames according to an embodiment of the present application;
FIG. 7 is a target video frame after an avatar is fused according to an embodiment of the present application;
FIG. 8 is a block diagram of an avatar display apparatus provided in an embodiment of the present application;
FIG. 9 is a block diagram of an avatar display apparatus provided in an embodiment of the present application;
FIG. 10 is a schematic structural diagram of a terminal according to an embodiment of the present application.
Detailed Description
To make the objects, technical solutions and advantages of the present application more clear, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
The terms "first," "second," "third," "fourth," and the like as used herein may be used herein to describe various concepts, but these concepts are not limited by these terms unless otherwise specified. These terms are only used to distinguish one concept from another. For example, a first timestamp may be referred to as a timestamp, and similarly, a second timestamp may be referred to as a first timestamp, without departing from the scope of the present application.
As used herein, the terms "at least one," "a plurality," "each," and "any," at least one of which includes one, two, or more than two, and a plurality of which includes two or more than two, each of which refers to each of the corresponding plurality, and any of which refers to any of the plurality. For example, the plurality of timestamps includes 3 timestamps, each of which refers to each of the 3 timestamps, and any one of the 3 timestamps refers to any one of the 3 timestamps, which may be the first one, the second one, or the third one.
Fig. 1 is a schematic diagram of an implementation environment provided by an embodiment of the present application. Referring to fig. 1, the implementation environment includes a terminal 101 and a server 102. The terminal 101 and the server 102 are connected via a wireless or wired network. Optionally, the terminal 101 is a smartphone, tablet, laptop, desktop computer, smart speaker, smart watch, in-vehicle terminal, video camera, or other terminal. Optionally, the server 102 is an independent physical server, or a server cluster or distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as cloud service, cloud database, cloud computing, cloud function, cloud storage, web service, cloud communication, middleware service, domain name service, security service, CDN (Content Delivery Network), big data and artificial intelligence platform.
The terminal 101 has installed thereon an application program serviced by the server 102, by which the terminal 101 can implement functions such as data transmission, message interaction, and the like. Alternatively, the application is an application in the operating system of the terminal 101, or an application provided by a third party. The application has a video shooting function and an AR function, for example, the application can display various avatars in a shot video frame, and of course, the application can also have other functions, which is not limited in this embodiment of the application. Optionally, the application is a short video application, a photographing application, a gaming application, a shopping application, a chat application, or other application. In the embodiment of the present application, the server 102 is configured to provide an avatar for the terminal 101, and the terminal 101 is configured to collect a video and fuse the avatar in the collected video.
The avatar display method provided by the present application can be applied to AR scenarios. For example, after the user selects a favorite avatar in the application program on the terminal, the terminal can fuse the avatar selected by the user into the currently captured video through the avatar display method provided by the embodiments of the present application.
Fig. 2 is a flowchart of an avatar display method according to an embodiment of the present disclosure. Referring to fig. 2, the embodiment includes:
201. The terminal acquires, through an application program, a target video frame output by a video frame processing component SurfaceTexture, wherein the target video frame is obtained by processing an original video frame currently acquired by a camera.
The application program is any application program in the terminal, which is not limited in the embodiments of the present application. SurfaceTexture is a video frame processing component supported since Android API level 11 (a version of the Android system interface); it can serve both as an output carrier for videos or images and as a place to cache them. When the SurfaceTexture processes a video or an image, the processed video or image does not need to be displayed in an interface of the terminal. After the terminal captures an original video frame through the camera, the original video frame is transmitted to the SurfaceTexture, and the SurfaceTexture processes the original video frame captured by the camera to obtain the target video frame. For example, the original video frame is a bitmap, and the SurfaceTexture converts the bitmap into a texture image, that is, the target video frame.
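To make the data flow concrete, the following is a minimal Kotlin sketch of attaching a SurfaceTexture to the legacy Camera1 preview, assuming a GL texture id has already been generated on the GL thread; the function name and parameter are illustrative and not taken from the patent.

```kotlin
import android.graphics.SurfaceTexture
import android.hardware.Camera

// Sketch of step 201: the camera delivers original frames into a SurfaceTexture
// instead of rendering them directly to the screen.
fun startPreviewInto(oesTexId: Int): Pair<Camera, SurfaceTexture> {
    val surfaceTexture = SurfaceTexture(oesTexId)   // output carrier and cache for camera frames
    val camera = Camera.open()
    camera.setPreviewTexture(surfaceTexture)        // original frames are delivered to the SurfaceTexture
    camera.startPreview()
    return camera to surfaceTexture
}
```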
202. The terminal calls, through the application program, the timestamp acquisition interface getTimestamp of the SurfaceTexture to acquire a first timestamp, wherein the first timestamp is used to represent the time point at which the SurfaceTexture processing produced the target video frame.
A timestamp (timestamp) is a sequence of characters that uniquely represents a certain point in time.
getTimestamp is a timestamp acquisition interface that provides the timestamp of the latest video frame produced by the SurfaceTexture processing. Because the target video frame is obtained by the SurfaceTexture processing the original video frame currently captured by the camera, the target video frame is the latest video frame produced by the SurfaceTexture; correspondingly, the first timestamp acquired by calling getTimestamp is the timestamp of the target video frame, which represents the time point at which the SurfaceTexture processing produced the target video frame.
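A minimal sketch of step 202, assuming the SurfaceTexture created above and a GL thread context; getTimestamp() returns a value in nanoseconds and refers to the frame latched by the preceding updateTexImage() call.

```kotlin
import android.graphics.SurfaceTexture

// Latch the newest frame as the target video frame and read its first timestamp.
fun latchFrameAndReadTimestamp(surfaceTexture: SurfaceTexture): Long {
    surfaceTexture.updateTexImage()     // produce the target video frame from the newest original frame
    return surfaceTexture.timestamp     // Kotlin property access for getTimestamp(), in nanoseconds
}
```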
203. The terminal determines, through the application program, pose data of the camera corresponding to the first timestamp, determines a target position in the target video frame based on the pose data, and displays the avatar at the target position.
The pose data represents the pose of the camera. During video capture the pose of the camera may change; for example, rotating the terminal by a certain angle changes the pose of the camera, and a change in the pose of the camera changes the target position in the captured video frame. Therefore, in order to display the avatar at the target position of a video frame, the target position in the video frame needs to be determined based on the pose data of the camera at the time the video frame was captured. The target position can be set to any position, which is not limited in the embodiments of the present application.
An avatar is a virtual image that does not exist in the real environment and is constructed by computer technology. In terms of type, the avatar includes the image of a cartoon character and the image of a real character; for example, the cartoon character is a cartoon cat, a cartoon person, or another cartoon character, and the real character is a real cat, a real person, or another real character. In terms of display effect, the avatar includes a dynamic avatar and a static avatar. In terms of source, the avatar includes an avatar acquired from the server and an avatar acquired locally from the terminal.
In the embodiments of the present application, considering that the SurfaceTexture can obtain the original video frame captured by the camera relatively early, the application program acquires the target video frame directly from the SurfaceTexture, which avoids the time offset caused by transmitting the target video frame to the application program only after a series of further processing. As a result, the deviation between the first timestamp acquired by the application program and the time at which the camera actually captured the original video frame is small, and therefore the deviation between the pose data of the camera corresponding to the first timestamp and the pose data at the time the camera actually captured the original video frame is also small. Consequently, the target position determined in the video frame based on the pose data corresponding to the first timestamp is highly accurate, the degree to which the avatar shakes in the video can be reduced when the camera captures the video while moving, and the display effect of the avatar is improved.
Fig. 3 is a flowchart of an avatar display method according to an embodiment of the present application. Referring to fig. 3, the embodiment includes:
301. The terminal calls, through an application program, a data update interface updateTexImage to trigger the video frame processing component SurfaceTexture to process the original video frame currently acquired by the camera, obtaining a target video frame.
updateTexImage is a data update interface. When updateTexImage is called, it triggers the SurfaceTexture to acquire the latest original video frame, that is, the currently captured original video frame, and to process that video frame to obtain the target video frame.
Optionally, the step in which the SurfaceTexture processes the original video frame currently captured by the camera to obtain the target video frame includes: the SurfaceTexture converts the original video frame, which is a bitmap, into the target video frame, which is a texture image.
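A sketch of how the updateTexImage call is typically driven, under the assumption that a GLSurfaceView in RENDERMODE_WHEN_DIRTY hosts the GL thread; onFrameAvailable only signals that a new original frame exists, and updateTexImage() must still be called on the GL thread to produce the target video frame.

```kotlin
import android.graphics.SurfaceTexture
import android.opengl.GLSurfaceView

// Wake the renderer whenever the camera has delivered a new original frame;
// the renderer's onDrawFrame() then calls updateTexImage() as early as possible.
fun wireFrameCallback(surfaceTexture: SurfaceTexture, glView: GLSurfaceView) {
    surfaceTexture.setOnFrameAvailableListener {
        glView.requestRender()
    }
}
```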
302. The terminal acquires, through the application program, the target video frame output by the SurfaceTexture.
The target video frame is obtained by processing an original video frame currently acquired by the camera.
303. The terminal calls, through the application program, the timestamp acquisition interface getTimestamp of the SurfaceTexture to acquire a first timestamp, wherein the first timestamp is used to represent the time point at which the SurfaceTexture processing produced the target video frame.
304. The terminal determines, through the application program, pose data of the camera corresponding to the first timestamp.
In a possible implementation manner, determining, by the terminal through the application program, the pose data of the camera corresponding to the first timestamp includes: the terminal acquires, through the application program, a plurality of pieces of pose data of the camera during video capture and a second timestamp corresponding to each piece of pose data; selects, from the plurality of second timestamps, the pose data corresponding to a target timestamp; and determines the selected pose data as the pose data of the camera corresponding to the first timestamp. The second timestamp is used to represent the time point at which the corresponding pose data was generated, and the target timestamp is the second timestamp with the smallest deviation from the first timestamp among the plurality of second timestamps.
In the embodiments of the present application, the piece of pose data whose second timestamp deviates least from the first timestamp is selected from the plurality of pieces of pose data and determined as the pose data of the camera corresponding to the first timestamp. This pose data has the smallest deviation from the pose data at the time the camera actually captured the video frame, which ensures that the target position determined based on it is the most accurate, reduces the degree to which the avatar shakes in the video when the camera captures the video while moving, and improves the display effect of the avatar.
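A sketch of the nearest-timestamp selection described above. PoseSample is an illustrative stand-in for one piece of pose data plus its second timestamp, not an Android type, and both timestamps are assumed to already be in nanoseconds under the same clock.

```kotlin
// One cached pose sample: second timestamp plus the raw IMU readings it was derived from.
data class PoseSample(
    val timestampNs: Long,
    val angularVelocity: FloatArray,
    val acceleration: FloatArray
)

// Pick the sample whose second timestamp deviates least from the first timestamp.
fun selectPoseForFrame(firstTimestampNs: Long, samples: List<PoseSample>): PoseSample? =
    samples.minByOrNull { kotlin.math.abs(it.timestampNs - firstTimestampNs) }
```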
In a possible implementation manner, acquiring, by the terminal through the application program, a plurality of pieces of pose data of the camera during video capture and a second timestamp corresponding to each piece of pose data includes: the terminal acquires, through the application program, a plurality of pieces of IMU (Inertial Measurement Unit) data and a second timestamp corresponding to each piece of IMU data from a cache pool, and determines the plurality of pieces of IMU data as the plurality of pieces of pose data of the camera.
The cache pool is used to store the IMU data generated during video capture. During video capture, the terminal generates IMU data and the corresponding second timestamps in real time through the IMU in the terminal, and stores the generated IMU data together with the corresponding second timestamps in the cache pool; the terminal can subsequently obtain the IMU data and the corresponding second timestamps directly from the cache pool. Optionally, the IMU in the terminal includes an angular velocity sensor and an acceleration sensor. The angular velocity sensor measures the angular velocity of the terminal in three-dimensional space relative to a certain coordinate system, and the acceleration sensor measures the acceleration of the terminal along the three coordinate axes in three-dimensional space. Accordingly, the IMU data includes the angular velocity of the terminal relative to a coordinate system at a certain moment and the accelerations of the terminal along the three coordinate axes. The pose of the terminal can be determined based on the angular velocity and the acceleration, and since the camera is in the terminal, the IMU data can be used as the pose data of the camera.
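A sketch of a cache pool fed by the IMU, assuming SensorEvent.timestamp is calibrated on elapsedRealtime (in nanoseconds) as described above; ImuSample and the buffer length are illustrative choices, not values from the patent.

```kotlin
import android.hardware.Sensor
import android.hardware.SensorEvent
import android.hardware.SensorEventListener
import android.hardware.SensorManager
import java.util.concurrent.ConcurrentLinkedDeque

// One IMU reading plus its second timestamp, as stored in the cache pool.
data class ImuSample(val timestampNs: Long, val values: FloatArray, val sensorType: Int)

class ImuCachePool(sensorManager: SensorManager) : SensorEventListener {
    private val buffer = ConcurrentLinkedDeque<ImuSample>()

    init {
        val gyro = sensorManager.getDefaultSensor(Sensor.TYPE_GYROSCOPE)
        val accel = sensorManager.getDefaultSensor(Sensor.TYPE_ACCELEROMETER)
        sensorManager.registerListener(this, gyro, SensorManager.SENSOR_DELAY_GAME)
        sensorManager.registerListener(this, accel, SensorManager.SENSOR_DELAY_GAME)
    }

    override fun onSensorChanged(event: SensorEvent) {
        buffer.addLast(ImuSample(event.timestamp, event.values.clone(), event.sensor.type))
        while (buffer.size > 400) buffer.pollFirst()   // keep roughly the last two seconds at 200 Hz
    }

    override fun onAccuracyChanged(sensor: Sensor?, accuracy: Int) = Unit

    fun snapshot(): List<ImuSample> = buffer.toList()
}
```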
In a possible implementation manner, the first timestamp is calibrated based on the normal running clock Uptime, and the second timestamp is calibrated based on the real-time running clock elapsedRealtime. Because the clock used to calibrate the timestamp of the video frame is Uptime while the clock used to calibrate the timestamp of the pose data is elapsedRealtime, that is, the two clocks are different, the clock for the video frame and the clock for the pose data need to be unified; in other words, the first timestamp corresponding to the video frame and the second timestamp corresponding to the pose data need to be brought to the same dimension. Correspondingly, before the terminal selects, through the application program, the pose data corresponding to the target timestamp from the plurality of second timestamps, the method further includes: the terminal acquires, through the application program, the time difference between elapsedRealtime and Uptime at any moment during video capture, and converts the second timestamp into a timestamp under Uptime based on the time difference, or converts the first timestamp into a timestamp under elapsedRealtime based on the time difference.
Uptime (a clock) records the total duration of the non-sleep periods from system start-up to the current time. elapsedRealtime (a clock) records the total duration from system start-up to the current time. Among terminals using the Android operating system, some terminals calibrate the timestamp corresponding to the video frame based on Uptime, while others calibrate it based on elapsedRealtime; all terminals using the Android system calibrate the second timestamp corresponding to the pose data based on elapsedRealtime. Therefore, there are cases in which the dimensions of the first timestamp and the second timestamp are not unified. In addition, because the terminal is not in a sleep period during video capture, the time difference between elapsedRealtime and Uptime is constant, so the time difference acquired at any moment during video capture can be used to unify the dimensions of the first timestamp and the second timestamp.
In the embodiments of the present application, considering that the first timestamp corresponding to the video frame and the second timestamp corresponding to the pose data may not be in the same dimension, the second timestamp is converted into a timestamp under Uptime, or the first timestamp is converted into a timestamp under elapsedRealtime, so that the dimensions of the first timestamp and the second timestamp are unified, which ensures the accuracy of the pose data of the camera determined for the first timestamp based on the second timestamps.
In a possible implementation manner, converting, by the terminal through the application program, the second timestamp into a timestamp under Uptime based on the time difference includes: the terminal subtracts the time difference from the second timestamp through the application program to obtain the second timestamp under Uptime. The time difference is the total duration of the sleep periods from system start-up to the current time.
In the embodiments of the present application, because the time difference between elapsedRealtime and Uptime is the total duration of the sleep periods from system start-up of the terminal to the current time, a second timestamp calibrated based on elapsedRealtime is larger than a timestamp calibrated based on Uptime by exactly that total duration. Therefore, the first timestamp and the second timestamp can be unified under Uptime simply by subtracting the time difference from the second timestamp, which is simple and efficient.
In a possible implementation manner, converting, by the terminal through the application program, the first timestamp into a timestamp under elapsedRealtime based on the time difference includes: the terminal adds the time difference to the first timestamp through the application program to obtain the first timestamp under elapsedRealtime.
In the embodiments of the present application, because the time difference between elapsedRealtime and Uptime is the total duration of the sleep periods from system start-up of the terminal to the current time, and a second timestamp calibrated based on elapsedRealtime is larger than a timestamp calibrated based on Uptime by exactly that total duration, the first timestamp and the second timestamp can be unified under elapsedRealtime simply by adding the time difference to the first timestamp, which is simple and efficient.
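A sketch of the clock-unification step under the assumption that both timestamps are in nanoseconds; System.nanoTime() is taken as the Uptime reading (it follows the monotonic uptime clock on Android), while SystemClock.elapsedRealtimeNanos() includes deep sleep, so their difference stays effectively constant while the device remains awake during capture.

```kotlin
import android.os.SystemClock

// Time difference between elapsedRealtime and Uptime, i.e. the total sleep duration since boot.
fun elapsedMinusUptimeNs(): Long =
    SystemClock.elapsedRealtimeNanos() - System.nanoTime()

// Second timestamp (elapsedRealtime-based) converted to the Uptime clock.
fun toUptime(secondTimestampNs: Long, diffNs: Long): Long = secondTimestampNs - diffNs

// First timestamp (Uptime-based) converted to the elapsedRealtime clock.
fun toElapsedRealtime(firstTimestampNs: Long, diffNs: Long): Long = firstTimestampNs + diffNs
```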
FIG. 4 is a schematic diagram of the process of determining the pose data corresponding to the first timestamp. Referring to FIG. 4, the two time axes in FIG. 4 respectively correspond to the video frame calibration clock used to calibrate the timestamps of video frames and the IMU data calibration clock used to calibrate the timestamps of IMU data. As can be seen from the figure, because the frequency at which video frames are captured differs from the frequency at which IMU data is generated, for example, video frames are captured 30 times per second while IMU data is generated 200 times per second, the video frames and the IMU data are not generated in one-to-one correspondence. For each video frame, the piece of IMU data that best matches the video frame, that is, the piece of IMU data whose second timestamp deviates least from the first timestamp of the video frame, needs to be selected from the plurality of pieces of IMU data in the cache pool.
FIG. 5 is a schematic diagram of the process of unifying the first timestamp and the second timestamp under the same clock. Referring to FIG. 5, after the first timestamp of the video frame is acquired, it is determined whether the clock that calibrates the first timestamp is elapsedRealtime or Uptime. If the clock is elapsedRealtime, no clock conversion is required; if the clock is Uptime, the time difference between elapsedRealtime and Uptime needs to be added to the first timestamp to convert it into the first timestamp under elapsedRealtime. In addition, after the first timestamp and the second timestamp are unified under elapsedRealtime, the deviation between the first timestamp of the video frame and the second timestamp of the pose data corresponding to that video frame is determined, so that the deviation can objectively reflect how accurately the timestamp of the video frame is calibrated, and the degree of jitter of the avatar in the video can be quantified.
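The patent does not spell out how the calibration clock of the frame timestamp is detected, so the following is only a hedged sketch that compares the frame timestamp against current readings of both clocks and picks the closer one; timestamps are assumed to be in nanoseconds.

```kotlin
import android.os.SystemClock
import kotlin.math.abs

// Heuristic check of which clock the frame timestamp was calibrated against.
fun frameTimestampUsesElapsedRealtime(frameTimestampNs: Long): Boolean {
    val uptimeNs = System.nanoTime()
    val elapsedNs = SystemClock.elapsedRealtimeNanos()
    return abs(frameTimestampNs - elapsedNs) < abs(frameTimestampNs - uptimeNs)
}
```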
305. The terminal determines, through the application program, a target position in the target video frame based on the pose data, and displays the avatar at the target position.
In a possible implementation manner, this step includes: the terminal determines, through the application program based on the pose data, a second horizontal plane in the target video frame corresponding to a first horizontal plane, and displays the avatar on the second horizontal plane, wherein the first horizontal plane is any horizontal plane in the three-dimensional space captured by the camera.
The first horizontal plane is a horizontal plane randomly selected by the application program from the three-dimensional space captured by the camera, or the first horizontal plane is determined based on a trigger operation performed by the user in the shooting interface of the application program. In the case where the first horizontal plane is determined based on the trigger operation of the user, the terminal acquires any target video frame through the application program, determines the triggered target pixel in the target video frame after the target video frame is displayed in the shooting interface, determines the spatial point corresponding to the target pixel in the three-dimensional space captured by the camera, determines the horizontal plane on which that spatial point lies, and determines that horizontal plane as the first horizontal plane. In this way, the user determines the display position of the avatar in the video, which improves the interaction with the user.
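The patent does not give the geometry for mapping the tapped pixel to the first horizontal plane; the following hedged sketch assumes a camera ray (origin and direction in world space) has already been recovered from the tap position and the pose data, and intersects it with the horizontal plane y = planeHeight. Vec3 is an illustrative helper type.

```kotlin
data class Vec3(val x: Float, val y: Float, val z: Float)

// Intersect a camera ray with the horizontal plane y = planeHeight; returns null when the
// ray is parallel to the plane or the plane lies behind the camera.
fun intersectHorizontalPlane(rayOrigin: Vec3, rayDir: Vec3, planeHeight: Float): Vec3? {
    if (kotlin.math.abs(rayDir.y) < 1e-6f) return null
    val t = (planeHeight - rayOrigin.y) / rayDir.y
    if (t < 0f) return null
    return Vec3(rayOrigin.x + t * rayDir.x, planeHeight, rayOrigin.z + t * rayDir.z)
}
```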
In the embodiment of the application, the application program may obtain a plurality of consecutive target video frames, and since the second horizontal plane in each target video frame corresponds to the first horizontal plane in the three-dimensional space, displaying the avatar on the second horizontal plane in each target video frame can ensure that the avatar in the video is always on a fixed horizontal plane in the three-dimensional space, and the phenomenon that the avatar shakes up and down in the video does not occur, thereby ensuring the display effect of the avatar.
In a possible implementation manner, after the terminal calls the timestamp acquisition interface getTimestamp of the SurfaceTexture through the application program and acquires the first timestamp, the method further includes: the terminal renders the target video frame in the shooting interface of the application program through a video frame rendering component GLSurfaceView.
The terminal processes the original video frame through the SurfaceTexture to obtain the target video frame, transmits the target video frame to the GLSurfaceView, and renders the target video frame in the shooting interface of the application program through the GLSurfaceView for the user to preview. Accordingly, the terminal displays the avatar, through the application program, at the target position in the target video frame shown in the shooting interface. In this case, when the user shoots a video through the application program in the terminal, the video picture in the shooting interface is the video picture with the avatar already fused in.
In the embodiments of the present application, the terminal provides the timestamp corresponding to the target video frame to the application program through getTimestamp before the target video frame is rendered in the shooting interface of the application program through GLSurfaceView, instead of first rendering the target video frame in the shooting interface and only then having the application program calibrate its timestamp. As a result, the timestamp of the target video frame acquired by the application program is closer to the timestamp at which the camera acquired the original video frame, so the pose data determined by the application program based on the acquired timestamp is closer to the pose data at the time the camera acquired the original video frame, which further ensures the accuracy of the target position determined by the application program in the target video frame.
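A sketch of the rendering path described above, assuming an OpenGL ES 2.0 GLSurfaceView; drawCameraTexture and placeAvatar are placeholders for the application's own drawing and avatar-positioning code, not Android APIs. In use, such a renderer would typically be installed with setEGLContextClientVersion(2), setRenderer(...), and RENDERMODE_WHEN_DIRTY so that onDrawFrame runs once per onFrameAvailable signal.

```kotlin
import android.graphics.SurfaceTexture
import android.opengl.GLSurfaceView
import javax.microedition.khronos.egl.EGLConfig
import javax.microedition.khronos.opengles.GL10

class PreviewRenderer(
    private val surfaceTexture: SurfaceTexture,
    private val drawCameraTexture: () -> Unit,
    private val placeAvatar: (frameTimestampNs: Long) -> Unit
) : GLSurfaceView.Renderer {

    override fun onSurfaceCreated(gl: GL10?, config: EGLConfig?) = Unit
    override fun onSurfaceChanged(gl: GL10?, width: Int, height: Int) = Unit

    override fun onDrawFrame(gl: GL10?) {
        surfaceTexture.updateTexImage()            // obtain the target video frame
        val timestampNs = surfaceTexture.timestamp // first timestamp, read before rendering
        drawCameraTexture()                        // render the frame in the shooting interface
        placeAvatar(timestampNs)                   // look up pose data and draw the avatar
    }
}
```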
FIG. 6 is a schematic diagram of the video frame generation process. Referring to FIG. 6, the driver layer of the camera first drives the hardware layer of the camera to capture an original video frame; the system service layer then transmits the original video frame to the SurfaceTexture at the application layer; the SurfaceTexture processes the original video frame to obtain a target video frame; the application program obtains the target video frame; the GLSurfaceView renders the target video frame in the shooting interface of the application program; and after determining the target position of the avatar in the target video frame, the application program displays the avatar at that target position in the shooting interface. As can be seen from this generation process, the real generation time of the video frame lies in the hardware layer of the camera, so what the application layer can do is calibrate the timestamp of the video frame as early as possible, making the calibrated timestamp more accurate. By analyzing the underlying principle of the Android native camera and the interface call flow of the system layer, the embodiments of the present application provide a scheme for calibrating the timestamp of the video frame as early as possible: the video frame and its corresponding timestamp are obtained from the SurfaceTexture, which receives the video frame earlier and provides a timestamp acquisition interface, instead of obtaining the video frame from onFrameAvailable (a video frame acquisition interface) and calibrating the timestamp of the video frame based on the moment it is obtained from onFrameAvailable. The video frame output by the SurfaceTexture still needs further processing before it can be delivered to onFrameAvailable, so the moment at which the SurfaceTexture obtains the video frame is far earlier than onFrameAvailable, and therefore the time point represented by the first timestamp of the target video frame obtained by the SurfaceTexture processing is closer to the time point at which the camera captured the original video frame.
Fig. 7 is a target video frame after fusing the avatar. Referring to fig. 7, the avatar in the video frame is a cat 701, and the cat 701 is displayed on a desktop in the real environment.
It should be noted that the Android native Camera2 (a camera component) provides a timestamp acquisition interface for obtaining the timestamps at which original video frames are captured. However, only about 40% of terminals on the market support Camera2; most terminals do not support Camera2 and only support Camera1 (a camera component). Because Camera1 does not provide an interface for acquiring the timestamp of the original video frame, these terminals can use the method provided by the embodiments of the present application to obtain a more accurate video frame timestamp, thereby reducing the degree to which the avatar shakes in the video when such terminals capture video while moving. In addition, analysis of experimental results on 100 terminal models shows that the avatar display method provided by the embodiments of the present application can relieve the avatar jitter on 97% of terminal models on the market, achieving 97% terminal coverage. Furthermore, the average error of the video frame timestamp obtained in the embodiments of the present application is about 15 ms, and this error does not affect the application program's determination of the target position in the video frame.
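For contrast, the following sketch shows how Camera2 exposes a capture timestamp directly on the devices that do support it (the roughly 40% mentioned above); SENSOR_TIMESTAMP is reported in nanoseconds, and on many devices it shares the same time base as the sensor timestamps, though that is device dependent.

```kotlin
import android.hardware.camera2.CameraCaptureSession
import android.hardware.camera2.CaptureRequest
import android.hardware.camera2.CaptureResult
import android.hardware.camera2.TotalCaptureResult

// Receives the sensor timestamp of each completed capture and forwards it to the caller.
class TimestampCallback(
    private val onFrameTimestamp: (Long) -> Unit
) : CameraCaptureSession.CaptureCallback() {
    override fun onCaptureCompleted(
        session: CameraCaptureSession,
        request: CaptureRequest,
        result: TotalCaptureResult
    ) {
        result.get(CaptureResult.SENSOR_TIMESTAMP)?.let(onFrameTimestamp)
    }
}
```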
In the embodiments of the present application, considering that the SurfaceTexture can obtain the original video frame captured by the camera relatively early, the application program acquires the target video frame directly from the SurfaceTexture, which avoids the time offset caused by transmitting the target video frame to the application program only after a series of further processing. As a result, the deviation between the first timestamp acquired by the application program and the time at which the camera actually captured the original video frame is small, and therefore the deviation between the pose data of the camera corresponding to the first timestamp and the pose data at the time the camera actually captured the original video frame is also small. Consequently, the target position determined in the video frame based on the pose data corresponding to the first timestamp is highly accurate, the degree to which the avatar shakes in the video can be reduced when the camera captures the video while moving, and the display effect of the avatar is improved.
In the embodiments of the present application, the piece of pose data whose second timestamp deviates least from the first timestamp is selected from the plurality of pieces of pose data and determined as the pose data of the camera corresponding to the first timestamp. This pose data has the smallest deviation from the pose data at the time the camera actually captured the video frame, which ensures that the target position determined based on it is the most accurate, reduces the degree to which the avatar shakes in the video when the camera captures the video while moving, and improves the display effect of the avatar.
In the embodiments of the present application, considering that the first timestamp corresponding to the video frame and the second timestamp corresponding to the pose data may not be in the same dimension, the second timestamp is converted into a timestamp under Uptime, or the first timestamp is converted into a timestamp under elapsedRealtime, so that the dimensions of the first timestamp and the second timestamp are unified, which ensures the accuracy of the pose data of the camera determined for the first timestamp based on the second timestamps.
In the embodiments of the present application, because the time difference between elapsedRealtime and Uptime is the total duration of the sleep periods from system start-up of the terminal to the current time, a second timestamp calibrated based on elapsedRealtime is larger than a timestamp calibrated based on Uptime by exactly that total duration. Therefore, the first timestamp and the second timestamp can be unified under Uptime simply by subtracting the time difference from the second timestamp, which is simple and efficient.
In the embodiments of the present application, because the time difference between elapsedRealtime and Uptime is the total duration of the sleep periods from system start-up of the terminal to the current time, and a second timestamp calibrated based on elapsedRealtime is larger than a timestamp calibrated based on Uptime by exactly that total duration, the first timestamp and the second timestamp can be unified under elapsedRealtime simply by adding the time difference to the first timestamp, which is simple and efficient.
In the embodiment of the application, the application program may obtain a plurality of consecutive target video frames, and since the second horizontal plane in each target video frame corresponds to the first horizontal plane in the three-dimensional space, displaying the avatar on the second horizontal plane in each target video frame can ensure that the avatar in the video is always on a fixed horizontal plane in the three-dimensional space, and the phenomenon that the avatar shakes up and down in the video does not occur, thereby ensuring the display effect of the avatar.
In the embodiments of the present application, the terminal provides the timestamp corresponding to the target video frame to the application program through getTimestamp before the target video frame is rendered in the shooting interface of the application program through GLSurfaceView, instead of first rendering the target video frame in the shooting interface and only then having the application program calibrate its timestamp. As a result, the timestamp of the target video frame acquired by the application program is closer to the timestamp at which the camera acquired the original video frame, so the pose data determined by the application program based on the acquired timestamp is closer to the pose data at the time the camera acquired the original video frame, which further ensures the accuracy of the target position determined by the application program in the target video frame.
All the above optional technical solutions may be combined arbitrarily to form optional embodiments of the present application, and are not described herein again.
Fig. 8 is a block diagram of an avatar display apparatus according to an embodiment of the present application. Referring to fig. 8, the embodiment includes:
a video frame acquisition module 801, configured to acquire, through an application program, a target video frame output by a video frame processing component SurfaceTexture, wherein the target video frame is obtained by processing an original video frame currently acquired by a camera;
a timestamp obtaining module 802, configured to call a timestamp acquisition interface getTimestamp of the SurfaceTexture through the application program to acquire a first timestamp, wherein the first timestamp is used to indicate the time point at which the SurfaceTexture processing produced the target video frame;
and an avatar display module 803, configured to determine, by the application program, pose data corresponding to the first timestamp of the camera, determine a target position in the target video frame based on the pose data, and display the avatar at the target position.
In one possible implementation, referring to fig. 9, the apparatus further includes:
an interface calling module 804, configured to call a data update interface updateTexImage through the application program to trigger the SurfaceTexture to process the original video frame currently acquired by the camera, obtaining the target video frame.
In one possible implementation, referring to fig. 9, avatar display module 803 includes:
the data acquisition unit 8031 is configured to acquire, by an application program, a plurality of pieces of pose data of the camera during a video acquisition process, and a second timestamp corresponding to each piece of pose data, where the second timestamp is used to indicate a time point at which the corresponding pose data is generated;
a data selecting unit 8032, configured to select the pose data corresponding to a target timestamp from the plurality of second timestamps, where the target timestamp is the second timestamp with the smallest deviation from the first timestamp among the plurality of second timestamps;
the data determining unit 8033 is configured to determine the selected pose data as pose data corresponding to the first timestamp of the camera.
In a possible implementation manner, the data acquisition unit 8031 is configured to acquire, through the application program, a plurality of pieces of inertial measurement unit (IMU) data and a second timestamp corresponding to each piece of IMU data from a cache pool, and determine the plurality of pieces of IMU data as the plurality of pieces of pose data of the camera, where the cache pool is used to store the IMU data generated during video capture.
In a possible implementation manner, the first timestamp is calibrated based on a normal running clock Uptime, and the second timestamp is calibrated based on a real-time running clock elapsedRealtime;
referring to fig. 9, the avatar display module 803 further includes:
a difference value determining unit 8034, configured to obtain a time difference value between elapsedRealtime and Uptime at any time in a video acquisition process;
the timestamp conversion unit 8035 is configured to convert the second timestamp into a timestamp under the Uptime based on the time difference, or convert the first timestamp into a timestamp under the elapsedRealtime based on the time difference.
In a possible implementation manner, the timestamp conversion unit 8035 is configured to subtract the time difference value from the second timestamp to obtain a second timestamp under Uptime.
In a possible implementation manner, the timestamp conversion unit 8035 is configured to add the time difference to the first timestamp to obtain the first timestamp under elapsedRealtime.
In a possible implementation manner, the avatar display module 803 is configured to determine, based on the pose data, a corresponding second horizontal plane of a first horizontal plane in the target video frame, where the first horizontal plane is any horizontal plane in a three-dimensional space captured by the camera; the avatar is displayed on a second horizontal plane.
In a possible implementation, the avatar display module 803 is further configured to render the target video frame in a shooting interface of the application program through a video frame rendering component GLSurfaceView.
It should be noted that: in the avatar display apparatus provided in the above embodiment, when displaying an avatar, only the division of the above functional modules is used for illustration, in practical applications, the function distribution may be completed by different functional modules according to needs, that is, the internal structure of the terminal is divided into different functional modules to complete all or part of the above described functions. In addition, the avatar display apparatus provided in the above embodiments and the avatar display method embodiments belong to the same concept, and specific implementation processes thereof are detailed in the method embodiments and are not described herein again.
FIG. 10 shows a block diagram of a terminal 1000 according to an exemplary embodiment of the present application. The terminal 1000 may be a smartphone, a tablet computer, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, a notebook computer, or a desktop computer. The terminal 1000 may also be referred to as user equipment, a portable terminal, a laptop terminal, a desktop terminal, or by other names.
Terminal 1000 can include: a processor 1001 and a memory 1002.
Processor 1001 may include one or more processing cores, such as a 4-core processor, an 8-core processor, and so forth. The processor 1001 may be implemented in at least one hardware form of a DSP (Digital Signal Processing), an FPGA (Field-Programmable Gate Array), and a PLA (Programmable Logic Array). Memory 1002 may include one or more computer-readable storage media, which may be non-transitory. The memory 1002 may also include high-speed random access memory, as well as non-volatile memory, such as one or more magnetic disk storage devices, flash memory storage devices. In some embodiments, a non-transitory computer readable storage medium in the memory 1002 is used to store at least one computer program for execution by the processor 1001 to implement the avatar display method provided by the method embodiments herein.
In a possible implementation manner, terminal 1000 can further optionally include: a peripheral interface 1003 and at least one peripheral. The processor 1001, memory 1002 and peripheral interface 1003 may be connected by a bus or signal line. Various peripheral devices may be connected to peripheral interface 1003 via a bus, signal line, or circuit board. Specifically, the peripheral device includes: at least one of a display screen 1004, a camera assembly 1005, and a power supply 1006.
The peripheral interface 1003 may be used to connect at least one peripheral related to I/O (Input/Output) to the processor 1001 and the memory 1002. In some embodiments, processor 1001, memory 1002, and peripheral interface 1003 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 1001, the memory 1002, and the peripheral interface 1003 may be implemented on separate chips or circuit boards, which are not limited by this embodiment.
The display screen 1004 is used to display a UI (User Interface). The UI may include graphics, text, icons, video, and any combination thereof. When the display screen 1004 is a touch display screen, the display screen 1004 also has the ability to capture touch signals on or over the surface of the display screen 1004. The touch signal may be input to the processor 1001 as a control signal for processing. At this point, the display 1004 may also be used to provide virtual buttons and/or a virtual keyboard, also referred to as soft buttons and/or a soft keyboard. In some embodiments, display 1004 can be one, providing a front panel of terminal 1000; in other embodiments, display 1004 can be at least two, either separately disposed on different surfaces of terminal 1000 or in a folded design; in other embodiments, display 1004 can be a flexible display disposed on a curved surface or a folded surface of terminal 1000. Even more, the display 1004 may be arranged in a non-rectangular irregular pattern, i.e., a shaped screen. The Display screen 1004 may be made of LCD (Liquid Crystal Display), OLED (Organic Light-Emitting Diode), or other materials.
The camera assembly 1005 is used to capture images or video. Optionally, camera assembly 1005 includes a front camera and a rear camera. The front camera is disposed on the front panel of the terminal, and the rear camera is disposed on the back of the terminal. In some embodiments, there are at least two rear cameras, each of which is any one of a main camera, a depth-of-field camera, a wide-angle camera, and a telephoto camera, so that the main camera and the depth-of-field camera can be fused to realize a background blurring function, and the main camera and the wide-angle camera can be fused to realize panoramic shooting, VR (Virtual Reality) shooting, or other fusion shooting functions. In some embodiments, camera assembly 1005 may also include a flash. The flash may be a single-color-temperature flash or a dual-color-temperature flash. A dual-color-temperature flash is a combination of a warm-light flash and a cold-light flash and can be used for light compensation at different color temperatures.
Power supply 1006 is used to supply power to the various components in terminal 1000. The power supply 1006 may be an alternating-current power supply, a direct-current power supply, a disposable battery, or a rechargeable battery. When the power supply 1006 includes a rechargeable battery, the rechargeable battery may support wired or wireless charging. The rechargeable battery may also support fast-charging technology.
In some embodiments, terminal 1000 can also include one or more sensors. The one or more sensors include, but are not limited to: an acceleration sensor 1007, a gyro sensor 1008, a pressure sensor 1009, and an optical sensor 1010.
Acceleration sensor 1007 can detect acceleration along the three coordinate axes of a coordinate system established with terminal 1000. For example, the acceleration sensor 1007 may be used to detect the components of gravitational acceleration along the three coordinate axes. The processor 1001 may control the display screen 1004 to display the user interface in a landscape view or a portrait view according to the gravitational acceleration signal collected by the acceleration sensor 1007. The acceleration sensor 1007 may also be used to collect game or user motion data.
The gyro sensor 1008 can detect the body direction and the rotation angle of the terminal 1000, and the gyro sensor 1008 and the acceleration sensor 1007 can cooperate to acquire the 3D motion of the user on the terminal 1000. From the data collected by the gyro sensor 1008, the processor 1001 may implement the following functions: motion sensing (such as changing the UI according to a user's tilting operation), image stabilization at the time of photographing, game control, and inertial navigation.
Pressure sensor 1009 can be disposed on a side bezel of terminal 1000 and/or underneath display screen 1004. When the pressure sensor 1009 is disposed on a side bezel of terminal 1000, it can detect the user's holding signal on terminal 1000, and the processor 1001 performs left/right-hand recognition or a shortcut operation according to the holding signal collected by the pressure sensor 1009. When the pressure sensor 1009 is disposed underneath the display screen 1004, the processor 1001 controls operability controls on the UI according to the user's pressure operation on the display screen 1004. The operability controls include at least one of a button control, a scroll bar control, an icon control, and a menu control.
The optical sensor 1010 is used to collect the ambient light intensity. In one embodiment, the processor 1001 may control the display brightness of the display screen 1004 according to the ambient light intensity collected by the optical sensor 1010. Specifically, when the ambient light intensity is high, the display brightness of the display screen 1004 is increased; when the ambient light intensity is low, the display brightness of the display screen 1004 is reduced. In another embodiment, the processor 1001 may also dynamically adjust the shooting parameters of the camera assembly 1005 according to the ambient light intensity collected by the optical sensor 1010.
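As an illustration of this behaviour only, the Java sketch below registers a light-sensor listener inside an Activity and maps ambient lux to window brightness; the lux-to-brightness mapping (lux / 500) is an arbitrary example and is not a value taken from this document.

    import android.app.Activity;
    import android.content.Context;
    import android.hardware.Sensor;
    import android.hardware.SensorEvent;
    import android.hardware.SensorEventListener;
    import android.hardware.SensorManager;
    import android.os.Bundle;
    import android.view.WindowManager;

    public class AmbientBrightnessActivity extends Activity implements SensorEventListener {
        @Override
        protected void onCreate(Bundle savedInstanceState) {
            super.onCreate(savedInstanceState);
            SensorManager sm = (SensorManager) getSystemService(Context.SENSOR_SERVICE);
            sm.registerListener(this, sm.getDefaultSensor(Sensor.TYPE_LIGHT),
                    SensorManager.SENSOR_DELAY_NORMAL);
        }

        @Override
        public void onSensorChanged(SensorEvent event) {
            float lux = event.values[0];                                        // ambient light intensity
            WindowManager.LayoutParams lp = getWindow().getAttributes();
            lp.screenBrightness = Math.max(0.1f, Math.min(1.0f, lux / 500f));   // brighter surroundings -> brighter screen
            getWindow().setAttributes(lp);
        }

        @Override
        public void onAccuracyChanged(Sensor sensor, int accuracy) { }
    }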
Those skilled in the art will appreciate that the configuration shown in FIG. 10 is not intended to be limiting and that terminal 1000 can include more or fewer components than shown, or some components can be combined, or a different arrangement of components can be employed.
An embodiment of the present application further provides a computer-readable storage medium, in which at least one computer program is stored, and the at least one computer program is loaded and executed by a processor, so as to implement the operations executed in the avatar display method of the above embodiment.
Embodiments of the present application further provide a computer program product or a computer program, including a computer program stored in a computer-readable storage medium. The processor of the terminal reads the computer program from the computer-readable storage medium and executes it, so that the terminal performs the operations performed in the avatar display method in the various alternative implementations described above.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, where the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
The above description is only exemplary of the present application and should not be taken as limiting, as any modification, equivalent replacement, or improvement made within the spirit and principle of the present application should be included in the protection scope of the present application.

Claims (15)

1. An avatar display method, performed by a terminal, the method comprising:
acquiring a target video frame output by a video frame processing component SurfaceText through an application program, wherein the target video frame is obtained by processing an original video frame currently acquired by a camera;
calling a timestamp acquisition interface getTimeStamp of the SurfaceText through the application program to acquire a first timestamp, wherein the first timestamp is used for representing a time point of the SurfaceText for processing to obtain the target video frame;
determining, by the application program, pose data corresponding to the first timestamp of the camera, determining a target position in the target video frame based on the pose data, and displaying an avatar at the target position.
2. The method of claim 1, wherein before the acquiring, by the application program, of the target video frame output by the video frame processing component SurfaceText, the method further comprises:
calling a data update interface updateTextImage through the application program, to trigger the SurfaceText to process the original video frame currently acquired by the camera to obtain the target video frame.
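For orientation only, the Java sketch below shows how this flow maps onto Android, assuming the component the claims call SurfaceText corresponds to android.graphics.SurfaceTexture, whose actual method names are updateTexImage() and getTimestamp(); the class name and the OES texture id parameter are illustrative.

    import android.graphics.SurfaceTexture;

    // Minimal sketch of claims 1-2, assuming SurfaceText maps to Android's SurfaceTexture.
    public class CameraFrameSource {
        private final SurfaceTexture cameraTexture;

        public CameraFrameSource(int oesTextureId) {        // oesTextureId: a GL_TEXTURE_EXTERNAL_OES texture id
            this.cameraTexture = new SurfaceTexture(oesTextureId);
        }

        // Claim 2: trigger processing of the original frame currently captured by the camera.
        // Claim 1: read the first timestamp of the resulting target video frame, in nanoseconds.
        // Must be called on the thread that owns the GL context.
        public long latchFrameAndGetTimestamp() {
            cameraTexture.updateTexImage();
            return cameraTexture.getTimestamp();
        }
    }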
3. The method of claim 1, wherein the determining, by the application program, of the pose data corresponding to the first timestamp of the camera comprises:
acquiring, through the application program, a plurality of pieces of pose data of the camera in a video acquisition process and a second timestamp corresponding to each piece of pose data, wherein the second timestamp is used for representing a time point at which the corresponding pose data is generated;
selecting pose data corresponding to a target timestamp from a plurality of second timestamps, wherein the target timestamp is the second timestamp with the smallest deviation from the first timestamp among the plurality of second timestamps; and
determining the selected pose data as the pose data corresponding to the first timestamp of the camera.
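A minimal Java sketch of this selection step follows; the PoseSample holder and the class name are hypothetical, and both timestamps are assumed to be in nanoseconds on a common clock base (see claims 5 to 7).

    import java.util.List;

    // Minimal sketch of claim 3: pick the pose sample whose second timestamp deviates least
    // from the first timestamp.
    public final class PoseSelector {
        public static final class PoseSample {
            public final long timestampNs;   // second timestamp
            public final float[] values;     // one piece of pose (IMU) data
            public PoseSample(long timestampNs, float[] values) {
                this.timestampNs = timestampNs;
                this.values = values;
            }
        }

        public static PoseSample selectClosest(List<PoseSample> samples, long firstTimestampNs) {
            PoseSample best = null;
            long bestDelta = Long.MAX_VALUE;
            for (PoseSample s : samples) {
                long delta = Math.abs(s.timestampNs - firstTimestampNs);
                if (delta < bestDelta) {
                    bestDelta = delta;
                    best = s;
                }
            }
            return best;   // pose data corresponding to the target timestamp
        }
    }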
4. The method of claim 3, wherein the acquiring, through the application program, of a plurality of pieces of pose data of the camera in the video acquisition process and a second timestamp corresponding to each piece of pose data comprises:
acquiring a plurality of pieces of inertial measurement unit IMU data and a second timestamp corresponding to each piece of IMU data from a cache pool through the application program;
determining the plurality of pieces of IMU data as a plurality of pieces of pose data of the camera;
wherein the cache pool is used for storing IMU data generated in the video acquisition process.
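The cache pool can be sketched as a bounded buffer filled by a sensor listener, as below. Sensor.TYPE_ROTATION_VECTOR and the capacity of 200 samples are illustrative choices, not values taken from this document; the PoseSample holder from the previous sketch is reused, and SensorEvent.timestamp is assumed to serve as the second timestamp.

    import android.hardware.Sensor;
    import android.hardware.SensorEvent;
    import android.hardware.SensorEventListener;
    import android.hardware.SensorManager;
    import java.util.ArrayDeque;
    import java.util.ArrayList;
    import java.util.List;

    // Minimal sketch of claim 4: buffer IMU samples and their second timestamps while the
    // camera captures video.
    public class ImuCachePool implements SensorEventListener {
        private static final int CAPACITY = 200;   // illustrative size
        private final ArrayDeque<PoseSelector.PoseSample> samples = new ArrayDeque<>();

        public void start(SensorManager sm) {
            sm.registerListener(this, sm.getDefaultSensor(Sensor.TYPE_ROTATION_VECTOR),
                    SensorManager.SENSOR_DELAY_GAME);
        }

        @Override
        public synchronized void onSensorChanged(SensorEvent event) {
            if (samples.size() == CAPACITY) {
                samples.pollFirst();               // drop the oldest sample
            }
            // event.timestamp is in nanoseconds and acts as the second timestamp here.
            samples.addLast(new PoseSelector.PoseSample(event.timestamp, event.values.clone()));
        }

        @Override
        public void onAccuracyChanged(Sensor sensor, int accuracy) { }

        // What the application program reads out of the cache pool.
        public synchronized List<PoseSelector.PoseSample> snapshot() {
            return new ArrayList<>(samples);
        }
    }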
5. The method of claim 3, wherein the first timestamp is calibrated based on a normal running clock UpTime and the second timestamp is calibrated based on a real-time running clock ElapsedRealtime;
before the selecting the pose data corresponding to the target timestamp from the plurality of second timestamps, the method further includes:
acquiring a time difference value between the ElapsedRealtime and the UpTime at any time in the video acquisition process; and
converting the second timestamp into a timestamp under the UpTime based on the time difference value, or converting the first timestamp into a timestamp under the ElapsedRealtime based on the time difference value.
6. The method of claim 5, wherein the converting the second timestamp to a timestamp at the UpTime based on the time difference value comprises:
subtracting the time difference value from the second timestamp to obtain the second timestamp under the UpTime.
7. The method of claim 5, wherein the converting the first timestamp to a timestamp under the ElapsedRealtime based on the time difference value comprises:
adding the time difference value to the first timestamp to obtain the first timestamp under the ElapsedRealtime.
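A minimal sketch of the conversions in claims 5 to 7 follows, assuming the first timestamp shares the monotonic base of System.nanoTime() (the UpTime clock) and the second timestamp shares SystemClock.elapsedRealtimeNanos() (the ElapsedRealtime clock), both in nanoseconds; the class and method names are illustrative.

    import android.os.SystemClock;

    public final class ClockAlign {
        // Claim 5: sample the offset between the two clocks at any moment during video capture.
        public static long elapsedMinusUpTimeNs() {
            return SystemClock.elapsedRealtimeNanos() - System.nanoTime();
        }

        // Claim 6: second timestamp (ElapsedRealtime base) -> UpTime base.
        public static long secondToUpTime(long secondTimestampNs, long diffNs) {
            return secondTimestampNs - diffNs;
        }

        // Claim 7: first timestamp (UpTime base) -> ElapsedRealtime base.
        public static long firstToElapsed(long firstTimestampNs, long diffNs) {
            return firstTimestampNs + diffNs;
        }
    }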
8. The method of claim 1, wherein the determining a target position in the target video frame based on the pose data and displaying an avatar at the target position comprises:
determining a second horizontal plane corresponding to a first horizontal plane in the target video frame based on the pose data, wherein the first horizontal plane is any horizontal plane in the three-dimensional space captured by the camera; and
displaying the avatar on the second horizontal plane.
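One way to relate the pose data to a horizontal plane, assuming the selected pose data is a rotation-vector sample (SensorEvent.values from Sensor.TYPE_ROTATION_VECTOR), is sketched below; how the plane is then projected into the target video frame is application-specific and is only indicated by a comment.

    import android.hardware.SensorManager;

    // Minimal sketch for claim 8: recover the world "up" direction in device/camera coordinates
    // from a rotation-vector sample; the plane orthogonal to this direction is horizontal.
    public final class PlaneHelper {
        public static float[] worldUpInDeviceCoords(float[] rotationVectorValues) {
            float[] r = new float[9];
            SensorManager.getRotationMatrixFromVector(r, rotationVectorValues);
            // Third row of r: the world z-axis (up) expressed in the device frame.
            // Intersecting camera rays with the plane orthogonal to this vector yields the
            // second horizontal plane / target position in the frame (projection not shown).
            return new float[] { r[6], r[7], r[8] };
        }
    }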
9. The method of claim 1, wherein after the calling of the timestamp acquisition interface getTimeStamp of the SurfaceText through the application program to acquire the first timestamp, the method further comprises:
rendering the target video frame in a shooting interface of the application program through a video frame rendering component GLSurfaceView.
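A skeleton of such a renderer is sketched below; drawCameraFrame() and drawAvatar() are hypothetical stand-ins for the actual GL drawing code, and the OES texture setup is omitted.

    import javax.microedition.khronos.egl.EGLConfig;
    import javax.microedition.khronos.opengles.GL10;
    import android.graphics.SurfaceTexture;
    import android.opengl.GLSurfaceView;

    // Minimal sketch of claim 9: a GLSurfaceView.Renderer that draws the target video frame
    // in the shooting interface and then the avatar on top of it.
    public class ShootingInterfaceRenderer implements GLSurfaceView.Renderer {
        private SurfaceTexture cameraTexture;   // created with an OES texture id in onSurfaceCreated()

        @Override public void onSurfaceCreated(GL10 gl, EGLConfig config) { /* create texture and shaders */ }
        @Override public void onSurfaceChanged(GL10 gl, int width, int height) { /* set the viewport */ }

        @Override
        public void onDrawFrame(GL10 gl) {
            cameraTexture.updateTexImage();                        // latch the target video frame
            long firstTimestampNs = cameraTexture.getTimestamp();  // first timestamp
            drawCameraFrame();                                     // render the frame into the shooting interface
            drawAvatar(firstTimestampNs);                          // render the avatar at the target position
        }

        private void drawCameraFrame() { /* GL drawing omitted */ }
        private void drawAvatar(long firstTimestampNs) { /* GL drawing omitted */ }
    }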
10. An avatar display apparatus, said apparatus comprising:
the video frame acquisition module is used for acquiring a target video frame output by a video frame processing component SurfaceText through an application program, wherein the target video frame is obtained by processing an original video frame currently acquired by a camera;
a timestamp obtaining module, configured to call a timestamp obtaining interface getTimeStamp of the SurfaceText through the application program, and obtain a first timestamp, where the first timestamp is used to indicate a time point at which the SurfaceText is processed to obtain the target video frame;
and the virtual image display module is used for determining the pose data corresponding to the first timestamp of the camera through the application program, determining the target position in the target video frame based on the pose data, and displaying the virtual image at the target position.
11. The apparatus of claim 10, further comprising:
an interface calling module, which is used for calling a data update interface updateTextImage through the application program to trigger the SurfaceText to process the original video frame currently acquired by the camera to obtain the target video frame.
12. The apparatus of claim 10, wherein the avatar display module comprises:
the data acquisition unit is used for acquiring, through the application program, a plurality of pieces of pose data of the camera in the video acquisition process and a second timestamp corresponding to each piece of pose data, wherein the second timestamp is used for representing a time point at which the corresponding pose data is generated;
the data selecting unit is used for selecting pose data corresponding to a target timestamp from a plurality of second timestamps, wherein the target timestamp is the second timestamp with the smallest deviation from the first timestamp among the plurality of second timestamps;
and the data determining unit is used for determining the selected pose data as the pose data corresponding to the first timestamp of the camera.
13. The apparatus of claim 12, wherein
the data acquisition unit is used for acquiring a plurality of pieces of IMU data of the inertial measurement unit and a second timestamp corresponding to each piece of IMU data from a cache pool through the application program; determining the plurality of pieces of IMU data as a plurality of pieces of pose data of the camera; the cache pool is used for storing IMU data generated in the video acquisition process.
14. A terminal characterized in that it comprises a processor and a memory in which at least one computer program is stored, the computer program being loaded and executed by the processor to implement the operations performed by the avatar display method of any one of claims 1-9.
15. A computer-readable storage medium, in which at least one computer program is stored, the computer program being loaded and executed by a processor to implement the operations performed by the avatar display method of any one of claims 1-9.
CN202110512971.4A 2021-05-11 2021-05-11 Virtual image display method, device, terminal and storage medium Active CN113766119B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110512971.4A CN113766119B (en) 2021-05-11 2021-05-11 Virtual image display method, device, terminal and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110512971.4A CN113766119B (en) 2021-05-11 2021-05-11 Virtual image display method, device, terminal and storage medium

Publications (2)

Publication Number Publication Date
CN113766119A true CN113766119A (en) 2021-12-07
CN113766119B CN113766119B (en) 2023-12-05

Family

ID=78787054

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110512971.4A Active CN113766119B (en) 2021-05-11 2021-05-11 Virtual image display method, device, terminal and storage medium

Country Status (1)

Country Link
CN (1) CN113766119B (en)

Citations (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103673990A (en) * 2012-09-13 2014-03-26 北京同步科技有限公司 Device and method for obtaining camera posture data
EP3301545A1 (en) * 2016-09-30 2018-04-04 Seiko Epson Corporation Computer program, object tracking method, and display device
CN108200334A (en) * 2017-12-28 2018-06-22 广东欧珀移动通信有限公司 Image capturing method, device, storage medium and electronic equipment
DE102017011418A1 (en) * 2017-12-11 2018-07-12 Daimler Ag Method and device for displaying a virtual object
CN108289220A (en) * 2018-01-15 2018-07-17 深圳市奥拓电子股份有限公司 Virtual image processing method, image processing system and storage medium
CN109814710A (en) * 2018-12-27 2019-05-28 青岛小鸟看看科技有限公司 Data processing method and device and virtual reality equipment
US20190355170A1 (en) * 2017-07-04 2019-11-21 Tencent Technology (Shenzhen) Company Limited Virtual reality content display method and apparatus
WO2020038109A1 (en) * 2018-08-22 2020-02-27 Oppo广东移动通信有限公司 Photographing method and device, terminal, and computer-readable storage medium
CN110971930A (en) * 2019-12-19 2020-04-07 广州酷狗计算机科技有限公司 Live virtual image broadcasting method, device, terminal and storage medium
US20200118342A1 (en) * 2018-10-15 2020-04-16 University Of Maryland, College Park Methods and apparatuses for dynamic navigable 360 degree environments
CN111131735A (en) * 2019-12-31 2020-05-08 歌尔股份有限公司 Video recording method, video playing method, video recording device, video playing device and computer storage medium
US20200219267A1 (en) * 2017-09-04 2020-07-09 Universität Zürich Visual-inertial odometry with an event camera
CN111464749A (en) * 2020-05-07 2020-07-28 广州酷狗计算机科技有限公司 Method, device, equipment and storage medium for image synthesis
CN111586360A (en) * 2020-05-14 2020-08-25 佳都新太科技股份有限公司 Unmanned aerial vehicle projection method, device, equipment and storage medium
CN111738220A (en) * 2020-07-27 2020-10-02 腾讯科技(深圳)有限公司 Three-dimensional human body posture estimation method, device, equipment and medium
CN112333491A (en) * 2020-09-23 2021-02-05 字节跳动有限公司 Video processing method, display device and storage medium

Also Published As

Publication number Publication date
CN113766119B (en) 2023-12-05

Similar Documents

Publication Publication Date Title
CN107509038B (en) A kind of image pickup method and mobile terminal
CN110427110B (en) Live broadcast method and device and live broadcast server
KR20210113333A (en) Methods, devices, devices and storage media for controlling multiple virtual characters
CN110213153B (en) Display method, acquisition method, device, terminal and storage medium of unread messages
CN111701238A (en) Virtual picture volume display method, device, equipment and storage medium
KR20200123223A (en) Display adaptation method and apparatus, device, and storage medium for applications
CN110362762B (en) Content data display method and device, electronic equipment and storage medium
CN110032384B (en) Resource updating method, device, equipment and storage medium
CN110196673B (en) Picture interaction method, device, terminal and storage medium
CN112565911B (en) Bullet screen display method, bullet screen generation device, bullet screen equipment and storage medium
CN111026992A (en) Multimedia resource preview method, device, terminal, server and storage medium
CN110662105A (en) Animation file generation method and device and storage medium
CN110839174A (en) Image processing method and device, computer equipment and storage medium
CN111949879A (en) Method and device for pushing message, electronic equipment and readable storage medium
CN111368114A (en) Information display method, device, equipment and storage medium
CN113384880A (en) Virtual scene display method and device, computer equipment and storage medium
CN110968815B (en) Page refreshing method, device, terminal and storage medium
CN108317992A (en) A kind of object distance measurement method and terminal device
CN110045958B (en) Texture data generation method, device, storage medium and equipment
CN114040113A (en) Image processing method and device
CN113032590B (en) Special effect display method, device, computer equipment and computer readable storage medium
CN108492339B (en) Method and device for acquiring resource compression packet, electronic equipment and storage medium
CN109167917A (en) A kind of image processing method and terminal device
CN112181442B (en) Unloading page display method, device, terminal, server and storage medium
CN112023403A (en) Battle process display method and device based on image-text information

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant