CN115767206A - Data processing method and system based on extended reality - Google Patents

Data processing method and system based on extended reality

Info

Publication number
CN115767206A
Authority
CN
China
Prior art keywords
target
data
animation data
animation
augmented reality
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211304735.4A
Other languages
Chinese (zh)
Inventor
汤旭涛
耿文波
蒋佳忆
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba China Co Ltd
Original Assignee
Alibaba China Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba China Co Ltd filed Critical Alibaba China Co Ltd
Priority to CN202211304735.4A priority Critical patent/CN115767206A/en
Publication of CN115767206A publication Critical patent/CN115767206A/en
Pending legal-status Critical Current

Landscapes

  • Processing Or Creating Images (AREA)

Abstract

The embodiments of the present specification provide a data processing method and system based on extended reality. The method includes: collecting, in an extended reality scene, audio data generated by a target virtual user within a preset time interval; determining the virtual action of the target virtual user corresponding to the audio data, and determining animation data of the target virtual user according to the virtual action; adding corresponding timestamps to the encoding result of the audio data according to the preset time interval to generate a target encoding result, and adding corresponding timestamps to the animation data to generate target animation data; and sending the target encoding result to a client through an audio transmission channel and the target animation data through an animation data transmission channel, so that the client decodes the target encoding result and synchronously plays the generated decoding result and the animation data according to the timestamps.

Description

Data processing method and system based on extended reality
Technical Field
The embodiments of the present specification relate to the technical field of extended reality, and in particular to a data processing method based on extended reality.
Background
Extended reality is a new form of human-computer interaction technology. It can simulate a real scene so that users perceive things from the objective physical world through a virtual system, breaking objective limits such as space and time and letting users have experiences they could not have in person in the real world, thereby seamlessly integrating real-world information and virtual-world information.
Currently, common extended reality scenes include virtual game scenes, virtual live-broadcast scenes, and the like. In such scenes, multimedia data is usually sent to a user terminal, which separates the corresponding video data and audio data from the multimedia data, displays the video data on a display, and plays the audio data through a speaker.
However, because video data generally takes longer to process than audio data, the video playback lags behind the audio in this approach, causing audio and video to fall out of sync. An effective method is therefore needed to solve this problem.
Disclosure of Invention
In view of this, the embodiments of the present specification provide a data processing method based on extended reality. One or more embodiments of the present specification also relate to an extended reality-based data processing apparatus, an extended reality-based data processing system, a computing device, a computer-readable storage medium, and a computer program, so as to address the technical shortcomings in the prior art.
According to a first aspect of the embodiments of the present specification, there is provided an extended reality-based data processing method, including:
collecting audio data generated by a target virtual user within a preset time interval in an extended reality scene;
determining a virtual action of the target virtual user corresponding to the audio data, and determining animation data of the target virtual user according to the virtual action;
adding a corresponding timestamp to the encoding result of the audio data according to the preset time interval to generate a target encoding result, and adding a corresponding timestamp to the animation data to generate target animation data; and
sending the target encoding result to a client through an audio transmission channel, and sending the target animation data to the client through an animation data transmission channel, so that the client decodes the target encoding result and synchronously plays the generated decoding result and the animation data according to the timestamps.
According to a second aspect of the embodiments of the present specification, there is provided an extended reality-based data processing apparatus, including:
an acquisition module configured to collect audio data generated by a target virtual user within a preset time interval in an extended reality scene;
a determining module configured to determine a virtual action of the target virtual user corresponding to the audio data, and determine animation data of the target virtual user according to the virtual action;
an adding module configured to add a corresponding timestamp to the encoding result of the audio data according to the preset time interval to generate a target encoding result, and add a corresponding timestamp to the animation data to generate target animation data; and
a sending module configured to send the target encoding result to a client through an audio transmission channel, and send the target animation data to the client through an animation data transmission channel, so that the client decodes the target encoding result and synchronously plays the generated decoding result and the animation data according to the timestamps.
According to a third aspect of the embodiments of the present specification, there is provided another extended reality-based data processing method, including:
receiving, through an animation data transmission channel, animation data generated by a target virtual user within a preset time interval in an extended reality scene, where the animation data includes a first timestamp;
receiving, through an audio transmission channel, an encoding result of audio data generated by the target virtual user within the preset time interval in the extended reality scene, where the encoding result includes a second timestamp;
decoding the encoding result, and performing animation rendering based on the animation data; and
synchronously playing the generated decoding result and the animation rendering result in the extended reality scene according to the first timestamp and the second timestamp.
According to a fourth aspect of the embodiments of the present specification, there is provided another extended reality-based data processing apparatus, including:
a first receiving module configured to receive, through an animation data transmission channel, animation data generated by a target virtual user within a preset time interval in an extended reality scene, where the animation data includes a first timestamp;
a second receiving module configured to receive, through an audio transmission channel, an encoding result of audio data generated by the target virtual user within the preset time interval in the extended reality scene, where the encoding result includes a second timestamp;
a processing module configured to decode the encoding result and perform animation rendering based on the animation data; and
a playing module configured to synchronously play the generated decoding result and animation rendering result in the extended reality scene according to the first timestamp and the second timestamp.
According to a fifth aspect of the embodiments of the present specification, there is provided an extended reality-based data processing system, including:
a first client, a cloud server, and at least one second client;
the first client is configured to collect audio data generated by a target virtual user within a preset time interval in an extended reality scene, determine a virtual action of the target virtual user corresponding to the audio data, determine animation data of the target virtual user according to the virtual action, add a corresponding timestamp to the encoding result of the audio data according to the preset time interval to generate a target encoding result, add a corresponding timestamp to the animation data to generate target animation data, and send the target encoding result and the target animation data to the cloud server;
the cloud server is configured to transmit the target encoding result to the at least one second client through an audio transmission channel, and transmit the target animation data to the at least one second client through an animation data transmission channel; and
the at least one second client is configured to decode the target encoding result, perform animation rendering based on the animation data, and synchronously play the generated decoding result and the animation rendering result according to the timestamps.
According to a sixth aspect of the embodiments of the present specification, there is provided a computing device, including:
a memory and a processor;
the memory is configured to store computer-executable instructions, and the processor is configured to execute the computer-executable instructions to implement the steps of any of the extended reality-based data processing methods described above.
According to a seventh aspect of the embodiments of the present specification, there is provided a computer-readable storage medium storing computer-executable instructions that, when executed by a processor, implement the steps of any of the extended reality-based data processing methods described above.
According to an eighth aspect of the embodiments of the present specification, there is provided a computer program that, when executed in a computer, causes the computer to perform the steps of the extended reality-based data processing method described above.
In an embodiment of the present specification, audio data generated by a target virtual user within a preset time interval in an extended reality scene is collected; the virtual action of the target virtual user corresponding to the audio data is determined, and animation data of the target virtual user is determined according to the virtual action; a corresponding timestamp is added to the encoding result of the audio data according to the preset time interval to generate a target encoding result, and a corresponding timestamp is added to the animation data to generate target animation data; the target encoding result is then sent to a client through an audio transmission channel and the target animation data through an animation data transmission channel, so that the client decodes the target encoding result and synchronously plays the generated decoding result and the animation data according to the timestamps.
In the embodiments of the present specification, timestamps are added to both the audio data and the animation data, so that the client can play the animation data and the audio data synchronously with the timestamps as a reference, which helps improve the sound-picture synchronization effect. In addition, during data transmission, the animation data and the encoding result of the audio data are carried over separate, independent transmission channels, so they do not interfere with each other in transit, which helps guarantee real-time transmission and thus data processing efficiency. Moreover, the animation data is transmitted without compression encoding, which avoids data loss: the client can perform animation rendering based on the animation data and obtain a rendering result that meets expectations, and sound and picture are played synchronously from the animation rendering result and the decoding result of the audio data, which helps improve the user's viewing experience.
Drawings
FIG. 1 is an architecture diagram of an extended reality-based data processing system provided in an embodiment of the present specification;
FIG. 2 is a flowchart of an extended reality-based data processing method provided in an embodiment of the present specification;
FIG. 3 is a schematic diagram of a data processing process based on an extended reality scene provided in an embodiment of the present specification;
FIG. 4 is a schematic diagram of a sound-picture synchronous playing process provided in an embodiment of the present specification;
FIG. 5a is a schematic diagram of an extended reality-based data processing process applied to a virtual live-broadcast scene, provided in an embodiment of the present specification;
FIG. 5b is a schematic diagram of an extended reality-based data processing process applied to a virtual conference scene, provided in an embodiment of the present specification;
FIG. 6 is a schematic structural diagram of an extended reality-based data processing apparatus provided in an embodiment of the present specification;
FIG. 7 is a flowchart of another extended reality-based data processing method provided in an embodiment of the present specification;
FIG. 8 is a flowchart illustrating a processing procedure of an extended reality-based data processing method provided in an embodiment of the present specification;
FIG. 9 is a schematic structural diagram of another extended reality-based data processing apparatus provided in an embodiment of the present specification;
FIG. 10 is a structural block diagram of a computing device provided in an embodiment of the present specification.
Detailed Description
In the following description, numerous specific details are set forth to provide a thorough understanding of the present specification. The present specification can, however, be implemented in many ways other than those described herein, and those skilled in the art can make similar generalizations without departing from its scope; the present specification is therefore not limited to the specific implementations disclosed below.
The terminology used in the present specification is for the purpose of describing particular embodiments only and is not intended to limit the one or more embodiments. As used in one or more embodiments of the present specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used in one or more embodiments of the present specification refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It should be understood that, although the terms first, second, etc. may be used in one or more embodiments to describe various information, the information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, without departing from the scope of one or more embodiments of the present specification, "first" may also be termed "second," and similarly, "second" may be termed "first." Depending on the context, the word "if" as used herein may be interpreted as "when," "upon," or "in response to determining."
First, the terms involved in one or more embodiments of the present specification are explained.
Virtual user (avatar): a three-dimensional model synthesized by digital technology. It exists in digital form in the non-physical world and has a human-like appearance, behavior, and perception as well as anthropomorphic interaction capabilities.
Audio data: sound data collected by a microphone.
Animation (motion) data: key-point data generated when a virtual user interacts, such as natural facial expressions (lip movements, blinks, eyebrow movements), head movements, and body-skeleton movement key points. The animation data must remain consistent end to end; it must not be modified, re-encoded, or lossily compressed.
Taking lip expression data as an example, it consists of floating-point values for 15 visual phonemes (visemes), from which the corresponding lip animation is rendered. If the values differ between the receiving end and the transmitting end, the rendering effect is abnormal.
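As a rough illustration of such a frame of lip expression data, the sketch below packs 15 floating-point viseme values into one record. The viseme names follow the list given later in this description; the function and variable names are illustrative assumptions, not the actual data format used here.

```python
# Hypothetical sketch of one frame of lip expression data: 15
# floating-point values, one per visual phoneme (viseme). Only the
# viseme names come from this specification; the layout is assumed.
VISEMES = ["sil", "PP", "FF", "TH", "DD", "kk", "CH", "SS",
           "nn", "RR", "aa", "E", "ih", "oh", "ou"]

def lip_frame(values):
    """Pack 15 viseme values into a dict keyed by viseme name."""
    if len(values) != len(VISEMES):
        raise ValueError("expected one value per viseme")
    return dict(zip(VISEMES, values))

# A mouth shape dominated by FF (an "f"/"v" sound): FF is 1.0, rest 0.0.
frame = lip_frame([0.0, 0.0, 1.0] + [0.0] * 12)
```

If even one of these values differs between sender and receiver, the rendered mouth shape differs, which is why the data must travel unmodified.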
Synchronization: the handling of audio data and animation data over the whole production-transmission-reception link so that, in the end, sound and animation are displayed consistently.
In the present specification, an extended reality-based data processing method is provided, and the present specification also relates to an extended reality-based data processing apparatus, an extended reality-based data processing system, a computing device, a computer-readable storage medium, and a computer program, which are described in detail one by one in the following embodiments.
Fig. 1 shows an architecture diagram of an extended reality-based data processing system provided according to an embodiment of the present specification, including:
a first client 102, a cloud server 104, and at least one second client 106;
the first client 102 is configured to collect audio data generated by a target virtual user within a preset time interval in an extended reality scene, determine a virtual action of the target virtual user corresponding to the audio data, determine animation data of the target virtual user according to the virtual action, add a corresponding timestamp to the encoding result of the audio data according to the preset time interval to generate a target encoding result, add a corresponding timestamp to the animation data to generate target animation data, and send the target encoding result and the target animation data to the cloud server 104;
the cloud server 104 is configured to transmit the target encoding result to the at least one second client 106 through an audio transmission channel, and transmit the target animation data to the at least one second client 106 through an animation data transmission channel;
the at least one second client 106 is configured to decode the target encoding result, perform animation rendering based on the animation data, and synchronously play the generated decoding result and the animation rendering result according to the timestamps.
Specifically, the first client collects the audio data generated by the target virtual user within a preset time interval in the extended reality scene, determines the virtual lip actions and/or virtual limb actions of the target virtual user corresponding to the audio data, and determines the animation data of the target virtual user according to those actions.
The first client encodes the audio data, adds corresponding timestamps to the encoding result of the audio data and to the animation data according to the preset time interval, and then sends the timestamped encoding result and animation data to the cloud server.
The cloud server transmits the timestamped encoding result to the second client through the audio transmission channel, and transmits the timestamped animation data to the second client through the animation data transmission channel.
The second client decodes the timestamped encoding result, renders the animation based on the animation data, and synchronously plays the generated decoding result and animation rendering result according to the timestamps contained in the encoding result and the animation data.
In the embodiments of the present specification, timestamps are added to both the audio data and the animation data, so that the client can play the animation data and the audio data synchronously with the timestamps as a reference, which helps improve the sound-picture synchronization effect. In addition, during data transmission, the animation data and the encoding result of the audio data are carried over separate, independent transmission channels, so they do not interfere with each other in transit, which helps guarantee real-time transmission and thus data processing efficiency. Moreover, the animation data is transmitted without compression encoding, which avoids data loss: the client can perform animation rendering based on the animation data and obtain a rendering result that meets expectations, and sound and picture are played synchronously from the animation rendering result and the decoding result of the audio data, which helps improve the user's viewing experience.
The foregoing is a schematic description of the extended reality-based data processing system of this embodiment. It should be noted that the technical solution of the extended reality-based data processing system and the technical solution of the extended reality-based data processing method below belong to the same concept; for details not described in the technical solution of the system, refer to the description of the method below.
Fig. 2 is a flowchart of an extended reality-based data processing method according to an embodiment of the present specification, which specifically includes the following steps.
Step 202: collect audio data generated by a target virtual user within a preset time interval in an extended reality scene.
Specifically, the extended reality-based data processing method provided in this embodiment of the present specification is applied to the first client.
Extended reality (XR) is a general term for technologies such as AR (augmented reality), VR (virtual reality), and MR (mixed reality). Using modern high-tech means with a computer at their core, XR creates a digital environment combining the real and the virtual, and through a new form of human-machine interaction brings the experiencer a sense of immersion with seamless transitions between the virtual world and the real world.
The extended reality scene in the embodiments of the present specification may be a virtual conference scene or a virtual live-broadcast scene based on extended reality. The participants or broadcasters in such a scene are virtual users, and the target virtual user may be the virtual user who is currently speaking among them.
In an extended reality scene, the audio data and animation data generated in real time by any virtual user need to be synchronized on the user terminals of the other virtual users. To keep sound and picture consistent on every terminal, the embodiments of the present specification collect the audio data and animation data of the target virtual user separately and add timestamps to both, so that after the data is sent to each user terminal, the terminal can play the animation data and audio data synchronously with the timestamps as a reference.
Step 204: determine the virtual action of the target virtual user corresponding to the audio data, and determine the animation data of the target virtual user according to the virtual action.
Specifically, a virtual action is a movement of the virtual user's face or limbs, such as smiling or waving in greeting.
The embodiments of the present specification aim to play the audio data and animation data of the target virtual user synchronously on each user terminal, which would normally require collecting the virtual user's audio data and animation data separately. In practice, however, the speech content of a virtual user in an extended reality scene, and the virtual actions performed while speaking, can be set in advance. Therefore, the embodiments of the present specification may first collect the audio data generated by the target virtual user within a preset time interval in the extended reality scene, then determine the corresponding virtual actions from the audio data, and determine the animation data of the target virtual user from those virtual actions.
For example, suppose the audio data of the target virtual user is: "Hi, everyone." If it is determined from the audio data that the target virtual user is greeting everyone, it can be determined that the virtual actions include a hand-waving limb action; in this case, the waving animation can be determined as the animation data of the target virtual user.
In specific implementation, collecting the audio data generated by the target virtual user within a preset time interval in the extended reality scene includes:
collecting audio data generated by the target virtual user within at least one first preset time interval in the extended reality scene;
correspondingly, determining the animation data of the target virtual user according to the virtual action includes:
integrating at least one virtual action corresponding to the audio data in the time order of the at least one first preset time interval to generate animation data of the target virtual user within a second preset time interval, where the second preset time interval consists of the at least one first preset time interval.
Further, integrating the at least one virtual action corresponding to the audio data in the time order of the at least one first preset time interval to generate the animation data of the target virtual user within the second preset time interval includes:
determining a first time length corresponding to the second preset time interval according to a preset frame rate;
determining at least one virtual action corresponding to the second preset time interval when the first time length is determined to be greater than a second time length corresponding to the first preset time interval;
sorting the at least one virtual action in the time order of the at least one first preset time interval, and determining a target virtual action according to the sorting result; and
integrating the target virtual actions to generate the animation data of the target virtual user within the second preset time interval.
Specifically, in the extended reality scene, the speech content and the facial or body movements of the target virtual user change in real time while the target virtual user is speaking. To ensure the accuracy of the collected data, the embodiments of the present specification may therefore collect the audio data of the target virtual user in segments according to the first preset time interval. For example, the time length corresponding to the first preset time interval may be 10 ms, and the first preset time intervals may be 0-10 ms, 10-20 ms, and so on, so that 100 segments of the target virtual user's audio data are collected within 1 s.
Further, after the audio data generated by the target virtual user within the at least one first preset time interval has been collected, the virtual actions corresponding to the audio data can be integrated in the time order of the first preset time intervals to generate the animation data of the target virtual user within a second preset time interval.
Specifically, the preset frame rate of the animation can be determined, and the first time length corresponding to the second preset time interval determined from it. Then, when the first time length is greater than the second time length corresponding to the first preset time interval, the at least one virtual action corresponding to the second preset time interval is determined; the virtual actions are sorted in the time order of the first preset time intervals, target virtual actions are determined from the sorting result, and the target virtual actions are integrated to generate the animation data of the target virtual user within the second preset time interval.
For example, if the preset frame rate of the animation is 30 fps, the animation plays 30 frames per second, i.e., one frame roughly every 33 ms, so the time length corresponding to the second preset time interval is 33 ms. The first preset time interval for the audio data, however, is 10 ms long; that is, the first time length corresponding to the second preset time interval is greater than the second time length corresponding to the first preset time interval. When the audio data in each first preset time interval corresponds to one virtual action, one second preset time interval therefore corresponds to three virtual actions of the same type. In this case, one target virtual action needs to be selected from the three same-type virtual actions, and the different types of target virtual actions are then integrated to generate the animation data of the target virtual user within the second preset time interval.
Alternatively, integrating the at least one virtual action corresponding to the audio data in the time order of the at least one first preset time interval to generate the animation data of the target virtual user within the second preset time interval may proceed as follows. The first time length corresponding to the second preset time interval is determined according to the preset frame rate. When the first time length is greater than the second time length corresponding to the first preset time interval, the at least one virtual action corresponding to the second preset time interval is determined and de-duplicated. According to the preset frame rate and the number of virtual actions remaining after de-duplication, it is then determined whether actions need to be added or deleted so that the number of remaining virtual actions equals the preset frame rate. Finally, the remaining virtual actions are integrated in the time order of the first preset time intervals to generate the animation data of the target virtual user within the second preset time interval.
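To make the interval arithmetic above concrete, here is a minimal sketch assuming 10 ms capture intervals, a 30 fps preset frame rate, and a keep-the-latest rule for reducing same-type actions within a frame; the rule and all names are illustrative assumptions, not the specification's prescribed choice.

```python
# Minimal sketch: fold per-10 ms virtual actions into per-frame (~33 ms)
# animation data. The keep-the-latest de-duplication rule is assumed.
FIRST_INTERVAL_MS = 10            # second time length (audio capture)
PRESET_FPS = 30
FRAME_MS = 1000 // PRESET_FPS     # first time length: 33 ms per frame

def integrate(actions):
    """actions: list of (start_ms, action_type, action) in time order.

    Returns one dict of target virtual actions per second preset
    time interval.
    """
    frames = {}
    for start_ms, action_type, action in actions:
        idx = start_ms // FRAME_MS
        # Within one frame, a later same-type action replaces an
        # earlier one, leaving a single target action per type.
        frames.setdefault(idx, {})[action_type] = action
    return [frames[i] for i in sorted(frames)]

# Three 10 ms intervals fall into frame 0; only the last lip action
# is kept as the target virtual action for that frame.
print(integrate([(0, "lip", "FF"), (10, "lip", "FF"), (20, "lip", "sil")]))
```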
In specific implementation, after the audio data of the target virtual user is collected, the virtual action of the target virtual user is determined based on the audio data, and the animation data is determined according to the virtual action. The process of determining the virtual action of the target virtual user corresponding to the audio data can specifically be realized as follows:
obtaining a preset virtual action type of the target virtual user, and determining the visual phonemes corresponding to the target virtual action type;
determining, according to the audio data, the phoneme value corresponding to a target visual phoneme among the visual phonemes; and
determining the virtual action of the target virtual action type corresponding to the target virtual user according to the phoneme value corresponding to the target visual phoneme.
Specifically, the preset virtual action types may include virtual lip actions, virtual eye actions, virtual limb actions, and so on. Different virtual action types may correspond to different numbers of visual phonemes; each visual phoneme represents one type of virtual action, and the phoneme value (0 or 1) corresponding to a visual phoneme indicates whether the virtual user performs that virtual action.
For example, virtual lip actions can correspond to 15 visual phonemes, namely sil, PP, FF, TH, DD, kk, CH, SS, nn, RR, aa, E, ih, oh, and ou, i.e., 15 mouth shapes. Each visual phoneme represents one virtual lip movement (mouth shape).
Here, sil means the lips make no movement: when the phoneme value of sil is 1, the virtual user's mouth is in its neutral, non-speaking shape. Each of the remaining visual phonemes represents the mouth shape for a class of sounds, and when its phoneme value is 1, the virtual user's mouth takes the corresponding shape: PP for a "b," "p," or "m" sound; FF for "f" or "v"; TH for "th"; DD for "t" or "d"; kk for "k" or "g"; CH for "tS," "dZ," or "S"; SS for "s" or "z"; nn for "n" or "l"; RR for "r"; aa for "a"; E for "e"; ih for "i"; oh for "o"; and ou for "ou."
Therefore, the embodiments of the present specification may obtain a preset virtual action type of a target virtual user, determine a visual phoneme corresponding to the target virtual action type, then determine a phoneme value corresponding to a target visual phoneme in the visual phonemes according to the audio data, and determine a virtual action of the target virtual action type corresponding to the target virtual user according to the phoneme value corresponding to the target visual phoneme.
For example, suppose the target virtual action type is a virtual lip action, whose corresponding visual phonemes are the 15 types sil, PP, FF, TH, DD, kk, CH, SS, nn, RR, aa, E, ih, oh, and ou. If it is then determined from the audio data that the target virtual user makes an "f" sound within a first preset time interval, the phoneme value of the visual phoneme FF can be set to 1, and the virtual action of the target virtual action type corresponding to the target virtual user can be determined to be the lips making an "f" sound; the animation data of the target virtual user can then be determined from this action.
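The following sketch walks through these steps for a virtual lip action, assuming some external recognizer has already labeled the audio slice with a phoneme; the (truncated) phoneme-to-viseme table and all names are illustrative assumptions.

```python
# Sketch: determine a virtual lip action from an audio slice, assuming
# an external recognizer supplies a phoneme label. The mapping table
# is truncated and illustrative.
VISEMES = ["sil", "PP", "FF", "TH", "DD", "kk", "CH", "SS",
           "nn", "RR", "aa", "E", "ih", "oh", "ou"]
PHONEME_TO_VISEME = {"b": "PP", "p": "PP", "m": "PP",
                     "f": "FF", "v": "FF", "s": "SS", "z": "SS"}

def lip_action(phoneme):
    """Return the 15 phoneme values (0 or 1) for one audio slice."""
    values = {v: 0 for v in VISEMES}
    values[PHONEME_TO_VISEME.get(phoneme, "sil")] = 1
    return values

# An "f" sound within a first preset time interval sets FF to 1.
assert lip_action("f")["FF"] == 1
```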
Step 206: add a corresponding timestamp to the encoding result of the audio data according to the preset time interval to generate a target encoding result, and add a corresponding timestamp to the animation data to generate target animation data.
Specifically, in the embodiments of the present specification, sound-picture synchronization is achieved when the client plays the audio data and animation data after they have been sent to it. Before the data is sent, the audio data can be compression-encoded and a corresponding timestamp added to the compression-encoding result to generate the target encoding result. The animation data is different: after it is sent, the client needs to perform animation rendering based on it, and if it were compression-encoded before sending, the client would have to decode the compressed result on receipt; the compression-encoding and decoding process could lose part of the animation data, so the rendering effect would not meet expectations. The animation data is therefore sent without compression encoding, with only a timestamp added.
In specific implementation, adding a corresponding timestamp to the encoding result of the audio data according to the preset time interval to generate the target encoding result, and adding a corresponding timestamp to the animation data to generate the target animation data, includes:
adding a corresponding timestamp to the encoding result of the audio data according to the first preset time interval to generate the target encoding result; and
adding a corresponding timestamp to the animation data according to the second preset time interval to generate the target animation data.
Specifically, as mentioned above, the time lengths corresponding to the first preset time interval and the second preset time interval may differ. To ensure accurate timestamping, the corresponding timestamp can therefore be added to the encoding result of the audio data according to the first preset time interval, and to the animation data according to the second preset time interval; the timestamp may be the start time or the end time of the corresponding interval.
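As a small sketch of this step, the snippet below stamps each encoded audio chunk with a VTS at 10 ms spacing and each animation frame with an ATS at 33 ms spacing, using interval start times; the record layout and the fixed spacings are illustrative assumptions.

```python
# Sketch: add interval start times as timestamps. VTS follows the
# first preset time interval (10 ms assumed), ATS the second preset
# time interval (33 ms assumed); the record layout is illustrative.
def stamp_streams(encoded_audio_chunks, animation_frames):
    target_encoding = [{"vts": i * 10, "audio": chunk}
                       for i, chunk in enumerate(encoded_audio_chunks)]
    target_animation = [{"ats": i * 33, "anim": frame}
                        for i, frame in enumerate(animation_frames)]
    return target_encoding, target_animation
```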
Step 208: send the target encoding result to a client through an audio transmission channel, and send the target animation data to the client through an animation data transmission channel, so that the client decodes the target encoding result and synchronously plays the generated decoding result and the animation data according to the timestamps.
Specifically, in the embodiments of the present specification, the target encoding result of the audio data and the target animation data can be transmitted through independent data transmission channels, so that the audio data and the animation data do not interfere with each other during transmission, ensuring real-time data transmission.
The target encoding result of the audio data is sent to the client through the audio transmission channel, and the target animation data through the animation data transmission channel; the client then decodes the target encoding result and synchronously plays the generated decoding result and the animation data according to the timestamps added to the encoding result and the target animation data, achieving the client's sound-picture synchronization effect.
In specific implementation, sending the target encoding result to a client through an audio transmission channel and the target animation data to the client through an animation data transmission channel includes:
sending the target encoding result and the target animation data to a cloud server, so that the cloud server sends the target encoding result to the client through an audio transmission channel and the target animation data to the client through an animation data transmission channel.
Specifically, in the embodiments of the present specification, data transmission can be carried out via a cloud server. That is, when the target encoding result of the audio data and the target animation data need to be transmitted to the client, they can first be sent to the cloud server; the cloud server then sends the target encoding result to the client through the audio transmission channel and the target animation data to the client through the animation data transmission channel.
In addition, before the encoding result of the audio data and the animation data are sent to a cloud server or a client, the target encoding result can be added to an audio data sending queue according to the timestamp in the target encoding result; and
the target animation data can be added to an animation data sending queue according to the timestamp in the target animation data.
Further, sending the target encoding result to the client through the audio transmission channel and the target animation data to the client through the animation data transmission channel includes:
sending the target encoding results in the audio data sending queue to the cloud server according to a first preset time period; and
sending the target animation data in the animation data sending queue to the cloud server according to a second preset time period, so that the cloud server sends the target encoding result to the client through the audio transmission channel and the target animation data to the client through the animation data transmission channel.
Specifically, after the audio data and animation data for a period of time have been collected, the audio data can be preprocessed and added to the audio data sending queue, where the preprocessing includes compression encoding and adding the timestamp VTS. The animation data has the timestamp ATS added, is kept in the form of binary data, and is added to the animation data sending queue.
The target encoding results in the audio data sending queue and the target animation data in the animation data sending queue can then be taken out of their queues at regular times, according to the same or different preset time periods, and sent to the cloud server; the cloud server sends the target encoding results to the client through the audio transmission channel and the target animation data through the animation data transmission channel.
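A minimal sketch of this sending side follows, assuming one queue per stream and a timer thread per queue; the transport function is a placeholder, and the periods shown are only examples.

```python
# Sketch: two independent sending queues, each drained on its own
# preset time period and forwarded to the cloud server. send_to_cloud
# is a placeholder for the real transport.
import queue
import threading
import time

audio_send_queue = queue.Queue()       # target encoding results (VTS order)
animation_send_queue = queue.Queue()   # target animation data (ATS order)

def start_flusher(q, send_to_cloud, period_s):
    """Every period_s seconds, take queued items out and send them."""
    def run():
        while True:
            time.sleep(period_s)
            batch = []
            while not q.empty():
                batch.append(q.get_nowait())
            if batch:
                send_to_cloud(batch)
    threading.Thread(target=run, daemon=True).start()

# Example: flush audio every 10 ms and animation every 33 ms.
start_flusher(audio_send_queue, print, 0.010)
start_flusher(animation_send_queue, print, 0.033)
```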
Based on this, fig. 3 shows a schematic diagram of a data processing process based on an extended reality scene provided in an embodiment of the present specification. In fig. 3, the transmitting end (the first client) first generates audio data and animation data for a period of time, for example 10 ms of audio data and 33 ms of animation data, where the animation data can include audio-related natural facial expressions such as lip movements, blinks and eyebrow movements, as well as head movements, body-skeleton movements, and the like.
The audio data is then preprocessed and added to the audio data sending queue, where the preprocessing includes compression encoding and adding the timestamp VTS; the animation data has the timestamp ATS added, is kept in the form of binary data, and is added to the animation data sending queue. The queued data is taken out and sent to the cloud server at regular times.
The audio transmission channel and the animation data transmission channel of the cloud server are independent services, each responsible for forwarding its data to the receiving end (the second client). The independent animation data transmission channel can carry larger payloads without interfering with the audio data.
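As a rough sketch of this forwarding role, the snippet below relays each stream over its own channel to every second client; the channel objects and client records are illustrative placeholders.

```python
# Sketch: the cloud server relays the audio stream and the animation
# stream over independent channels to each second client. The client
# records and send functions are illustrative placeholders.
def cloud_relay(target_encodings, target_animation, second_clients):
    for client in second_clients:
        for item in target_encodings:
            client["audio_channel"](item)        # audio transmission channel
        for item in target_animation:
            client["animation_channel"](item)    # animation data channel
```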
The receiving end receives the audio data and the animation data, processes them and adds them to buffers, and then synchronizes the animation data with the audio data: while the audio is played, the animation for the current frame is rendered.
In practical applications, the receiving end (the second client) runs an audio data processing thread and an animation data processing thread. Fig. 4 shows a schematic diagram of the sound-picture synchronous playing process provided in an embodiment of the present specification.
In fig. 4, the audio data processing thread first reads the audio data and its timestamp VTS from the buffer, then updates the timestamp CTS of the current cursor so that CTS = VTS, and plays the audio. The animation data processing thread first reads the animation data and its timestamp ATS from the buffer and judges whether the ATS matches the current cursor's CTS; the matching condition can be customized, for example a difference between the two of less than 20 ms. If they match, the animation is rendered and displayed based on the animation data. If not, the thread judges whether (ATS - CTS) is smaller than DeltaTS: if (ATS - CTS) < DeltaTS, the animation data is discarded; if (ATS - CTS) ≥ DeltaTS, the rendering time of the animation data has not yet arrived, and in this case the thread can wait for the CTS to update and defer rendering. DeltaTS denotes a synchronization delay threshold and can be customized to the actual situation, for example set to 50 ms.
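The matching rule just described can be sketched as follows, using the example thresholds from the text (a 20 ms match window and a 50 ms DeltaTS); play_audio and render_animation stand in for the client's real playback and rendering facilities, and the single-module structure is an illustrative simplification of the two threads.

```python
# Sketch of the receiving end's ATS/CTS matching rule. MATCH_MS and
# DELTA_TS use the example values from the text; the media calls are
# placeholders.
MATCH_MS = 20     # customizable matching condition: |ATS - CTS| < 20 ms
DELTA_TS = 50     # synchronization delay threshold (DeltaTS)

cts = 0           # timestamp of the current cursor, driven by the audio

def play_audio(audio):           # placeholder for real audio playback
    pass

def render_animation(anim):      # placeholder for real frame rendering
    pass

def audio_step(vts, audio):
    """Audio thread: read (audio, VTS) from the buffer, set CTS = VTS."""
    global cts
    cts = vts
    play_audio(audio)

def animation_step(ats, anim):
    """Animation thread: render, discard, or defer one animation frame."""
    if abs(ats - cts) < MATCH_MS:
        render_animation(anim)   # matched: render at the current frame
        return "rendered"
    if ats - cts < DELTA_TS:
        return "discarded"       # too late to render: drop the frame
    # (ATS - CTS) >= DeltaTS: rendering time not reached; wait for the
    # CTS to update and defer rendering.
    return "deferred"
```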
In addition, fig. 5a shows a schematic diagram of an extended reality-based data processing process applied to a virtual live-broadcast scene. The broadcaster terminal collects the broadcaster's audio data and animation data such as body movements and facial expressions, generates synchronization timestamps, and adds the timestamps to the audio data and the animation data respectively. The audio data and animation data are transmitted to the viewer terminals through the cloud server, and each viewer terminal performs animation rendering based on the animation data and plays the audio and animation synchronously.
Fig. 5b shows a schematic diagram of an extended reality-based data processing process applied to a virtual conference scene. The speaker terminal collects the audio data and the animation data, such as body movements and facial expressions, of the speaker in the conference, generates synchronization timestamps, adds the timestamps to the audio data and the animation data respectively, and transmits them to the cloud server. A cloud rendering engine service performs animation rendering based on the animation data, and the audio data and the animation rendering result are sent to the viewer terminals, where audio and animation are played synchronously, achieving a live-video-like effect.
In an embodiment of the present specification, audio data generated by a target virtual user within a preset time interval in an extended reality scene is collected; the virtual action of the target virtual user corresponding to the audio data is determined, and animation data of the target virtual user is determined according to the virtual action; a corresponding timestamp is added to the encoding result of the audio data according to the preset time interval to generate a target encoding result, and a corresponding timestamp is added to the animation data to generate target animation data; the target encoding result is then sent to a client through an audio transmission channel and the target animation data through an animation data transmission channel, so that the client decodes the target encoding result and synchronously plays the generated decoding result and the animation data according to the timestamps.
In the embodiments of the present specification, timestamps are added to both the audio data and the animation data, so that the client can play the animation data and the audio data synchronously with the timestamps as a reference, which helps improve the sound-picture synchronization effect. In addition, during data transmission, the target animation data and the target encoding result of the audio data are carried over separate, independent transmission channels, so they do not interfere with each other in transit, which helps guarantee real-time transmission and thus data processing efficiency. Moreover, the animation data is transmitted without compression encoding, which avoids data loss: the client can perform animation rendering based on the animation data and obtain a rendering result that meets expectations, and sound and picture are played synchronously from the animation rendering result and the decoding result of the audio data, which helps improve the user's viewing experience.
Corresponding to the above method embodiments, the present specification further provides embodiments of an extended reality-based data processing apparatus, and fig. 6 shows a schematic structural diagram of an extended reality-based data processing apparatus provided in an embodiment of the present specification. As shown in fig. 6, the apparatus includes:
an acquisition module 602 configured to collect audio data generated by a target virtual user within a preset time interval in an extended reality scene;
a determining module 604 configured to determine a virtual action of the target virtual user corresponding to the audio data, and determine animation data of the target virtual user according to the virtual action;
an adding module 606 configured to add a corresponding timestamp to the encoding result of the audio data according to the preset time interval to generate a target encoding result, and add a corresponding timestamp to the animation data to generate target animation data; and
a sending module 608 configured to send the target encoding result to a client through an audio transmission channel, and send the target animation data to the client through an animation data transmission channel, so that the client decodes the target encoding result and synchronously plays the generated decoding result and the animation data according to the timestamps.
Optionally, the acquisition module 602 is further configured to:
collect audio data generated by a target virtual user within at least one first preset time interval in an extended reality scene;
accordingly, the determining module 604 is further configured to:
integrate at least one virtual action corresponding to the audio data in the time order of the at least one first preset time interval to generate animation data of the target virtual user within a second preset time interval, where the second preset time interval consists of the at least one first preset time interval.
Optionally, the determining module 604 is further configured to:
determine a first time length corresponding to a second preset time interval according to a preset frame rate;
determine at least one virtual action corresponding to the second preset time interval when the first time length is determined to be greater than a second time length corresponding to the first preset time interval;
sort the at least one virtual action in the time order of the at least one first preset time interval, and determine a target virtual action according to the sorting result; and
integrate the target virtual actions to generate animation data of the target virtual user within the second preset time interval.
Optionally, the adding module 606 is further configured to:
add a corresponding timestamp to the encoding result of the audio data according to the first preset time interval to generate a target encoding result; and
add a corresponding timestamp to the animation data according to the second preset time interval to generate target animation data.
Optionally, the determining module 604 is further configured to:
obtain a preset virtual action type of the target virtual user, and determine the visual phonemes corresponding to the target virtual action type;
determine, according to the audio data, a phoneme value corresponding to a target visual phoneme among the visual phonemes; and
determine the virtual action of the target virtual action type corresponding to the target virtual user according to the phoneme value corresponding to the target visual phoneme.
Optionally, the sending module 608 is further configured to:
send the target encoding result and the target animation data to a cloud server, so that the cloud server sends the target encoding result to a client through an audio transmission channel and the target animation data to the client through an animation data transmission channel.
Optionally, the extended reality-based data processing apparatus further includes a processing module configured to:
add the target encoding result to an audio data sending queue according to the timestamp in the target encoding result; and
add the target animation data to an animation data sending queue according to the timestamp in the target animation data.
Optionally, the sending module 608 is further configured to:
send the target encoding results in the audio data sending queue to a cloud server according to a first preset time period; and
send the target animation data in the animation data sending queue to the cloud server according to a second preset time period, so that the cloud server sends the target encoding result to a client through an audio transmission channel and the target animation data to the client through an animation data transmission channel.
The foregoing is a schematic description of the extended reality-based data processing apparatus of this embodiment. It should be noted that the technical solution of the extended reality-based data processing apparatus belongs to the same concept as the technical solution of the extended reality-based data processing method; for details not described in the technical solution of the apparatus, refer to the description of the technical solution of the method.
Fig. 7 is a flowchart illustrating another augmented reality-based data processing method according to an embodiment of the present specification, which specifically includes the following steps.
Step 702, receiving animation data generated by a target virtual user within a preset time interval in an augmented reality scene through an animation data transmission channel, wherein the animation data comprises a first timestamp.
Step 704, receiving an encoding result of audio data generated by the target virtual user within a preset time interval in the augmented reality scene through an audio transmission channel, where the encoding result includes a second timestamp.
Step 706, decoding the encoding result, and performing animation rendering based on the animation data.
Step 708, synchronously playing the generated decoding result and the animation rendering result in the augmented reality scene according to the first timestamp and the second timestamp.
In a specific implementation, the synchronously playing the generated decoding result and the animation rendering result in the augmented reality scene according to the first timestamp and the second timestamp includes:
updating the reference time according to the first timestamp, and determining whether the time difference between the second timestamp and the reference time is less than a first preset time threshold;
and if so, synchronously playing the generated decoding result and the animation rendering result in the augmented reality scene.
In addition, under the condition that the time difference between the second timestamp and the reference time is determined to be greater than a first preset time threshold, playing a generated decoding result in the augmented reality scene;
judging whether the time difference between the second timestamp and the reference time is smaller than a second preset time threshold value or not;
if yes, deleting the animation data;
if not, the step of updating the reference time according to the first time stamp is executed.
Specifically, the augmented reality-based data processing method provided in this embodiment of the present specification is applied to a second client. The second client receives the encoding result of the audio data and the animation data through independent channels, and runs an audio data processing thread and an animation data processing thread. The audio data processing thread first reads the audio data and its timestamp VTS from a buffer, then updates the timestamp CTS of the current cursor so that CTS = VTS, and plays the audio. The animation data processing thread first reads the animation data and its timestamp ATS from the buffer and determines whether ATS matches the timestamp CTS of the current cursor; the matching condition can be customized, for example, requiring the difference between the two to be less than 20 ms. If they match, animation rendering and display are performed based on the animation data. If they do not match, it is determined whether (ATS - CTS) is smaller than DeltaTS: if (ATS - CTS) < DeltaTS, the animation data is discarded; if (ATS - CTS) ≥ DeltaTS, the rendering time of the animation data has not yet been reached, in which case the client may wait for the CTS to be updated and delay rendering. DeltaTS denotes a synchronization delay threshold and can be customized according to the actual situation, for example, set to 50 ms.
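The two-thread logic above reduces to a few comparisons. The sketch below restates it in Python with the 20 ms matching window and the 50 ms DeltaTS from the example in the preceding paragraph; buffer reads, audio playback, and rendering are stubs, and the class and method names are invented for illustration.

```python
MATCH_MS = 20   # assumed matching condition: |ATS - CTS| < 20 ms
DELTA_TS = 50   # assumed synchronization delay threshold DeltaTS = 50 ms

class SecondClientPlayer:
    """Restates the two-thread buffer logic of the second client."""

    def __init__(self) -> None:
        self.cts = 0  # timestamp CTS of the current cursor, in ms

    def on_audio(self, audio: bytes, vts: int) -> None:
        """Audio data processing thread: set CTS = VTS, then play the audio."""
        self.cts = vts
        self.play_audio(audio)

    def on_animation(self, frame: dict, ats: int) -> str:
        """Animation data processing thread: render, discard, or delay."""
        if abs(ats - self.cts) < MATCH_MS:
            self.render(frame)
            return "rendered"
        if (ats - self.cts) < DELTA_TS:
            return "discarded"   # (ATS - CTS) < DeltaTS: drop the animation data
        return "delayed"         # rendering time not reached: wait for a CTS update

    def play_audio(self, audio: bytes) -> None:
        print(f"play audio at CTS={self.cts}")

    def render(self, frame: dict) -> None:
        print(f"render animation at CTS={self.cts}")

player = SecondClientPlayer()
player.on_audio(b"\x00", vts=1000)
print(player.on_animation({"mouth": 0.4}, ats=1010))  # within 20 ms  -> rendered
print(player.on_animation({"mouth": 0.4}, ats=1030))  # 30 ms < 50 ms -> discarded
print(player.on_animation({"mouth": 0.4}, ats=1100))  # 100 ms ahead  -> delayed
```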
In this embodiment of the present specification, timestamps are added to both the audio data and the animation data, so that the client can play the animation data and the audio data synchronously with the timestamps as a reference, which helps improve the effect of synchronized sound and picture playback. In addition, during data transmission, the animation data and the encoding result of the audio data are transmitted over independent transmission channels, so that they do not interfere with each other in transit, which helps guarantee the real-time performance of the transmission and thus the data processing efficiency. Moreover, the animation data is transmitted without compression encoding, which avoids data loss: the client can perform animation rendering based on the animation data and obtain a rendering result that meets expectations, and then play sound and picture synchronously based on the animation rendering result and the decoding result of the audio data, improving the viewing experience of the user.
The above is an illustrative scheme of another augmented reality-based data processing method according to this embodiment. It should be noted that this technical solution belongs to the same concept as the technical solution of the augmented reality-based data processing method described above; for details not described in detail in this technical solution, reference may be made to the description of the technical solution of the augmented reality-based data processing method above.
The following further describes the augmented reality-based data processing method provided in this specification by taking its application in a virtual conference scene as an example, with reference to Fig. 8. Fig. 8 shows a processing flowchart of an augmented reality-based data processing method according to an embodiment of the present specification, which specifically includes the following steps.
Step 802, a first client collects audio data generated by a speaking user within a preset time interval in an augmented reality-based virtual conference scene, determines a virtual lip action and/or a virtual limb action of the speaking user corresponding to the audio data, and determines animation data of the speaking user according to the virtual lip action and/or the virtual limb action.
Step 804, the first client encodes the audio data, and adds corresponding timestamps to the encoding result of the audio data and to the animation data according to the preset time interval.
Step 806, the first client sends the encoding result added with the timestamp and the animation data to the cloud server.
Step 808, the cloud server transmits the encoding result added with the timestamp to the second client through the audio transmission channel.
Step 810, the cloud server transmits the animation data added with the timestamp to the second client through the animation data transmission channel.
Step 812, the second client decodes the encoding result added with the timestamp, and performs animation rendering based on the animation data.
Step 814, the second client synchronously plays the generated decoding result and the animation rendering result according to the timestamps contained in the encoding result and in the animation data.
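Putting steps 802 to 806 together, the first client's per-interval pipeline might look like the following sketch. Audio capture, action derivation, and the encoder are stubs, and the 40 ms interval and 16 kHz sample rate are assumed values for the example.

```python
from typing import List, Tuple

INTERVAL_MS = 40  # assumed preset time interval for capture

def capture_audio(interval_ms: int) -> List[float]:
    """Stub for microphone capture of the speaking user (step 802)."""
    return [0.0] * (16 * interval_ms)   # assumed 16 kHz mono samples

def derive_actions(samples: List[float]) -> dict:
    """Stub mapping audio to virtual lip/limb actions (see the viseme sketch above)."""
    return {"lip": 0.5, "limb": 0.0}

def encode_audio(samples: List[float]) -> bytes:
    """Stub encoder; a real client would use a speech codec here."""
    return b"\x00\x01\x02\x03"

def produce_interval(timestamp_ms: int) -> Tuple[dict, dict]:
    """Steps 802-804: turn one preset time interval into a timestamped coding
    result and timestamped animation data, ready to send to the cloud server
    (step 806)."""
    samples = capture_audio(INTERVAL_MS)
    coding_result = {"timestamp_ms": timestamp_ms, "payload": encode_audio(samples)}
    animation_data = {"timestamp_ms": timestamp_ms, "actions": derive_actions(samples)}
    return coding_result, animation_data

print(produce_interval(0))
```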
In this embodiment of the present specification, timestamps are added to both the audio data and the animation data, so that the client can play the animation data and the audio data synchronously with the timestamps as a reference, which helps improve the effect of synchronized sound and picture playback. In addition, during data transmission, the animation data and the encoding result of the audio data are transmitted over independent transmission channels, so that they do not interfere with each other in transit, which helps guarantee the real-time performance of the transmission and thus the data processing efficiency. Moreover, the animation data is transmitted without compression encoding, which avoids data loss: the client can perform animation rendering based on the animation data and obtain a rendering result that meets expectations, and then play sound and picture synchronously based on the animation rendering result and the decoding result of the audio data, improving the viewing experience of the user.
Corresponding to the above method embodiment, the present specification further provides another embodiment of an augmented reality-based data processing apparatus. Fig. 9 shows a schematic structural diagram of another augmented reality-based data processing apparatus provided in an embodiment of the present specification. As shown in Fig. 9, the apparatus includes:
a first receiving module 902, configured to receive animation data, generated by a target virtual user within a preset time interval, in an augmented reality scene through an animation data transmission channel, where the animation data includes a first timestamp;
a second receiving module 904, configured to receive, through an audio transmission channel, an encoding result of audio data generated by the target virtual user within a preset time interval in an augmented reality scene, where the encoding result includes a second timestamp;
a processing module 906 configured to decode the encoding result and perform animation rendering based on the animation data;
a playing module 908 configured to play the generated decoding result and the animation rendering result synchronously in the augmented reality scene according to the first timestamp and the second timestamp.
Optionally, the playing module 908 is further configured to:
updating the reference time according to the first timestamp, and determining whether the time difference between the second timestamp and the reference time is less than a first preset time threshold;
and if so, synchronously playing the generated decoding result and the animation rendering result in the augmented reality scene.
Optionally, the playing module 908 is further configured to:
playing a generated decoding result in the augmented reality scene under the condition that the time difference between the second timestamp and the reference time is determined to be greater than a first preset time threshold;
judging whether the time difference between the second timestamp and the reference time is smaller than a second preset time threshold value or not;
if yes, deleting the animation data;
and if not, executing the step of updating the reference time according to the first time stamp.
The above is an illustrative scheme of another augmented reality-based data processing apparatus according to this embodiment. It should be noted that the technical solution of this apparatus belongs to the same concept as the technical solution of the other augmented reality-based data processing method; for details not described in detail in the technical solution of the apparatus, reference may be made to the description of the technical solution of that method.
Fig. 10 shows a structural block diagram of a computing device 1000 provided according to an embodiment of the present specification. The components of the computing device 1000 include, but are not limited to, a memory 1010 and a processor 1020. The processor 1020 is connected to the memory 1010 via a bus 1030, and a database 1050 is used to store data.
Computing device 1000 also includes an access device 1040 that enables computing device 1000 to communicate via one or more networks 1060. Examples of such networks include the Public Switched Telephone Network (PSTN), a Local Area Network (LAN), a Wide Area Network (WAN), a Personal Area Network (PAN), or a combination of communication networks such as the Internet. The access device 1040 may include one or more of any type of wired or wireless network interface (e.g., a Network Interface Card (NIC)), such as an IEEE 802.11 Wireless Local Area Network (WLAN) wireless interface, a Worldwide Interoperability for Microwave Access (WiMAX) interface, an Ethernet interface, a Universal Serial Bus (USB) interface, a cellular network interface, a Bluetooth interface, a Near Field Communication (NFC) interface, and so on.
In an embodiment of the present specification, the above components of the computing device 1000, as well as other components not shown in Fig. 10, may also be connected to each other, for example, by a bus. It should be understood that the structural block diagram of the computing device shown in Fig. 10 is for illustration only and is not intended to limit the scope of the present specification. Those skilled in the art may add or replace other components as required.
Computing device 1000 may be any type of stationary or mobile computing device, including a mobile computer or mobile computing device (e.g., tablet, personal digital assistant, laptop, notebook, netbook, etc.), a mobile phone (e.g., smartphone), a wearable computing device (e.g., smartwatch, smartglasses, etc.), or other type of mobile device, or a stationary computing device such as a desktop computer or PC. Computing device 1000 may also be a mobile or stationary server.
The processor 1020 is configured to execute computer-executable instructions that, when executed by the processor, implement the steps of the augmented reality-based data processing method described above.
The foregoing is an illustrative scheme of the computing device of this embodiment. It should be noted that the technical solution of the computing device belongs to the same concept as the technical solution of the augmented reality-based data processing method; for details not described in detail in the technical solution of the computing device, reference may be made to the description of the technical solution of the augmented reality-based data processing method.
An embodiment of the present specification further provides a computer-readable storage medium storing computer-executable instructions, which when executed by a processor, implement the steps of the augmented reality-based data processing method described above.
The above is an illustrative scheme of a computer-readable storage medium of the embodiment. It should be noted that the technical solution of the storage medium belongs to the same concept as the technical solution of the data processing method based on augmented reality, and for details that are not described in detail in the technical solution of the storage medium, reference may be made to the description of the technical solution of the data processing method based on augmented reality.
An embodiment of the present specification further provides a computer program, wherein when the computer program is executed in a computer, the computer is caused to execute the steps of the augmented reality-based data processing method.
The above is an illustrative scheme of the computer program of this embodiment. It should be noted that the technical solution of the computer program belongs to the same concept as the technical solution of the augmented reality-based data processing method; for details not described in detail in the technical solution of the computer program, reference may be made to the description of the technical solution of the data processing method.
The foregoing description has been directed to specific embodiments of this disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
The computer instructions comprise computer program code, which may be in the form of source code, object code, an executable file, some intermediate form, or the like. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and the like. It should be noted that the content contained in the computer-readable medium may be appropriately added or removed according to the requirements of legislation and patent practice in the relevant jurisdiction; for example, in some jurisdictions, the computer-readable medium does not include electrical carrier signals or telecommunications signals in accordance with legislation and patent practice.
It should be noted that, for the sake of simplicity, the foregoing method embodiments are described as a series of combinations of acts, but it should be understood by those skilled in the art that the embodiments are not limited by the described order of acts, as some steps may be performed in other orders or simultaneously according to the embodiments. Further, those skilled in the art should also appreciate that the embodiments described in this specification are preferred embodiments and that acts and modules referred to are not necessarily required for an embodiment of the specification.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
The preferred embodiments of the present specification disclosed above are intended only to aid in the description of the specification. Alternative embodiments are not exhaustive and do not limit the invention to the precise embodiments described. Obviously, many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles of the embodiments and the practical application, and to thereby enable others skilled in the art to best understand the specification and utilize the specification. The specification is limited only by the claims and their full scope and equivalents.

Claims (14)

1. An augmented reality-based data processing method, comprising:
acquiring audio data generated by a target virtual user within a preset time interval in an augmented reality scene;
determining a virtual action of the target virtual user corresponding to the audio data, and determining animation data of the target virtual user according to the virtual action;
adding a corresponding time stamp in the coding result of the audio data according to the preset time interval to generate a target coding result, and adding a corresponding time stamp in the animation data to generate target animation data;
and sending the target coding result to a client through an audio transmission channel, and sending the target animation data to the client through an animation data transmission channel so that the client decodes the target coding result, and synchronously plays the generated decoding result and the animation data according to the timestamp.
2. The augmented reality-based data processing method according to claim 1, wherein the acquiring audio data generated by a target virtual user within a preset time interval in an augmented reality scene comprises:
acquiring audio data generated by a target virtual user in at least one first preset time interval in an augmented reality scene;
correspondingly, the determining the animation data of the target virtual user according to the virtual action comprises the following steps:
integrating at least one virtual action corresponding to the audio data according to the time sequence of the at least one first preset time interval to generate animation data of the target virtual user in a second preset time interval, wherein the second preset time interval consists of the at least one first preset time interval.
3. The augmented reality-based data processing method according to claim 2, wherein the integrating at least one virtual motion corresponding to the audio data according to the time sequence of the at least one first preset time interval to generate animation data of the target virtual user in a second preset time interval includes:
determining a first time length corresponding to a second preset time interval according to a preset frame rate;
under the condition that the first time length is determined to be larger than a second time length corresponding to the first preset time interval, determining at least one virtual action corresponding to the second preset time interval;
sequencing the at least one virtual action according to the time sequence of the at least one first preset time interval, and determining a target virtual action according to a sequencing result;
and integrating the target virtual actions to generate animation data of the target virtual user in the second preset time interval.
4. The augmented reality-based data processing method according to claim 2 or 3, wherein the generating target encoding results by adding corresponding time stamps to the encoding results of the audio data according to the preset time interval and generating target animation data by adding corresponding time stamps to the animation data comprises:
adding a corresponding time stamp in the coding result of the audio data according to the first preset time interval to generate a target coding result; and
adding a corresponding time stamp in the animation data according to the second preset time interval to generate target animation data.
5. The augmented reality-based data processing method of claim 1, wherein the determining a virtual action of the target virtual user corresponding to the audio data comprises:
acquiring a preset target virtual action type of the target virtual user, and determining the visual phonemes corresponding to the target virtual action type;
determining a phoneme value corresponding to a target visual phoneme among the visual phonemes according to the audio data;
and determining the virtual action of the target virtual action type corresponding to the target virtual user according to the phoneme value corresponding to the target visual phoneme.
6. The augmented reality-based data processing method according to claim 1, wherein the sending the target encoding result to a client through an audio transmission channel and the target animation data to the client through an animation data transmission channel comprises:
sending the target coding result and the target animation data to a cloud server, so that the cloud server sends the target coding result to the client through the audio transmission channel and sends the target animation data to the client through the animation data transmission channel.
7. The augmented reality-based data processing method of claim 1, further comprising:
adding the target coding result to an audio data sending queue according to the time stamp in the target coding result; and
adding the target animation data to an animation data sending queue according to the time stamp in the target animation data.
8. The augmented reality-based data processing method according to claim 7, wherein the sending the target encoding result to a client through an audio transmission channel and the target animation data to the client through an animation data transmission channel comprises:
sending the target coding result in the audio data sending queue to a cloud server according to a first preset time period; and
sending the target animation data in the animation data sending queue to the cloud server according to a second preset time period, so that the cloud server sends the target coding result to a client through an audio transmission channel and sends the target animation data to the client through an animation data transmission channel.
9. An augmented reality-based data processing method, comprising:
receiving animation data generated by a target virtual user in an extended reality scene within a preset time interval through an animation data transmission channel, wherein the animation data comprises a first timestamp;
receiving an encoding result of audio data generated by the target virtual user within a preset time interval in an augmented reality scene through an audio transmission channel, wherein the encoding result comprises a second timestamp;
decoding the encoding result, and performing animation rendering based on the animation data;
and synchronously playing the generated decoding result and the animation rendering result in the augmented reality scene according to the first time stamp and the second time stamp.
10. The augmented reality-based data processing method according to claim 9, wherein the synchronous playing of the generated decoding result and the animation rendering result in the augmented reality scene according to the first timestamp and the second timestamp comprises:
updating the reference time according to the first timestamp, and determining whether the time difference between the second timestamp and the reference time is less than a first preset time threshold;
and if so, synchronously playing the generated decoding result and the animation rendering result in the augmented reality scene.
11. The augmented reality-based data processing method of claim 10, further comprising:
playing a generated decoding result in the augmented reality scene under the condition that the time difference between the second timestamp and the reference time is determined to be greater than a first preset time threshold;
judging whether the time difference between the second timestamp and the reference time is smaller than a second preset time threshold value or not;
if yes, deleting the animation data;
if not, the step of updating the reference time according to the first time stamp is executed.
12. An augmented reality based data processing system comprising:
the system comprises a first client, a cloud server and at least one second client;
the first client is used for acquiring audio data generated by a target virtual user in an extended reality scene within a preset time interval, determining a virtual action of the target virtual user corresponding to the audio data, determining animation data of the target virtual user according to the virtual action, adding a corresponding timestamp into an encoding result of the audio data according to the preset time interval to generate a target encoding result, adding a corresponding timestamp into the animation data to generate target animation data, and sending the target encoding result and the target animation data to the cloud server;
the cloud server is used for transmitting the target coding result to the at least one second client through an audio transmission channel and transmitting the target animation data to the at least one second client through an animation data transmission channel;
and the at least one second client is used for decoding the target coding result, performing animation rendering based on the animation data, and synchronously playing the generated decoding result and the animation rendering result according to the timestamp.
13. A computing device, comprising:
a memory and a processor;
the memory is configured to store computer-executable instructions, and the processor is configured to execute the computer-executable instructions, which when executed by the processor, implement the steps of the augmented reality based data processing method of any one of claims 1 to 11.
14. A computer-readable storage medium storing computer-executable instructions which, when executed by a processor, implement the steps of the augmented reality based data processing method of any one of claims 1 to 11.
CN202211304735.4A 2022-10-24 2022-10-24 Data processing method and system based on augmented reality Pending CN115767206A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211304735.4A CN115767206A (en) 2022-10-24 2022-10-24 Data processing method and system based on augmented reality

Publications (1)

Publication Number Publication Date
CN115767206A true CN115767206A (en) 2023-03-07

Family

ID=85353321

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211304735.4A Pending CN115767206A (en) 2022-10-24 2022-10-24 Data processing method and system based on augmented reality

Country Status (1)

Country Link
CN (1) CN115767206A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116614650A (en) * 2023-06-16 2023-08-18 上海随幻智能科技有限公司 Voice and picture synchronous private domain live broadcast method, system, equipment, chip and medium
CN118018584A (en) * 2023-07-25 2024-05-10 广州通则康威科技股份有限公司 XR information transmission control method, device, system and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1868213A (en) * 2003-09-02 2006-11-22 索尼株式会社 Content receiving apparatus, video/audio output timing control method, and content providing system
CN105898506A (en) * 2016-05-03 2016-08-24 乐视控股(北京)有限公司 Method and system for multi-screen playing of media files
CN114221940A (en) * 2021-12-13 2022-03-22 北京百度网讯科技有限公司 Audio data processing method, system, device, equipment and storage medium
CN114554277A (en) * 2020-11-24 2022-05-27 腾讯科技(深圳)有限公司 Multimedia processing method, device, server and computer readable storage medium
WO2022121558A1 (en) * 2020-12-11 2022-06-16 北京字跳网络技术有限公司 Livestreaming singing method and apparatus, device, and medium
CN114782594A (en) * 2022-04-29 2022-07-22 北京慧夜科技有限公司 Animation generation method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination