CN115767206A - Data processing method and system based on extended reality - Google Patents

Data processing method and system based on extended reality

Info

Publication number
CN115767206A
Authority
CN
China
Prior art keywords
target
data
animation data
animation
augmented reality
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211304735.4A
Other languages
Chinese (zh)
Inventor
汤旭涛
耿文波
蒋佳忆
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba China Co Ltd
Original Assignee
Alibaba China Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba China Co Ltd filed Critical Alibaba China Co Ltd
Priority to CN202211304735.4A priority Critical patent/CN115767206A/en
Publication of CN115767206A publication Critical patent/CN115767206A/en
Pending legal-status Critical Current

Landscapes

  • Processing Or Creating Images (AREA)

Abstract

The embodiments of the present specification provide a data processing method and system based on extended reality. The method includes: collecting, in an extended reality scene, audio data generated by a target virtual user within a preset time interval; determining the virtual action of the target virtual user corresponding to the audio data, and determining animation data of the target virtual user according to the virtual action; adding corresponding timestamps to the encoding result of the audio data according to the preset time interval to generate a target encoding result, and adding corresponding timestamps to the animation data to generate target animation data; and sending the target encoding result to a client through an audio transmission channel and the target animation data through an animation data transmission channel, so that the client decodes the target encoding result and synchronously plays the generated decoding result and the animation data according to the timestamps.

Description

Data processing method and system based on extended reality
Technical Field
The embodiments of the present specification relate to the technical field of extended reality, and in particular to a data processing method based on extended reality.
Background
Extended reality is a new form of human-computer interaction technology. It can simulate a real scene so that users perceive things from the objective physical world through a virtual system, breaking objective limits such as space and time and letting users have experiences they could not have in person in the real world, thereby seamlessly integrating real-world information and virtual-world information.
Currently, common extended reality scenes include virtual game scenes, virtual live-broadcast scenes, and the like. In such scenes, multimedia data is usually sent to a user terminal, which separates the corresponding video data and audio data from the multimedia data, displays the video data on a display, and plays the audio data through a speaker.
However, because video data generally takes longer to process than audio data, the video playback lags behind the audio in this approach, causing audio and video to fall out of sync. An effective method is therefore needed to solve this problem.
Disclosure of Invention
In view of this, the embodiments of the present specification provide a data processing method based on extended reality. One or more embodiments of the present specification also relate to an extended reality-based data processing apparatus, an extended reality-based data processing system, a computing device, a computer-readable storage medium, and a computer program, so as to address the technical shortcomings in the prior art.
According to a first aspect of the embodiments of the present specification, there is provided an extended reality-based data processing method, including:
collecting audio data generated by a target virtual user within a preset time interval in an extended reality scene;
determining a virtual action of the target virtual user corresponding to the audio data, and determining animation data of the target virtual user according to the virtual action;
adding a corresponding timestamp to the encoding result of the audio data according to the preset time interval to generate a target encoding result, and adding a corresponding timestamp to the animation data to generate target animation data; and
sending the target encoding result to a client through an audio transmission channel, and sending the target animation data to the client through an animation data transmission channel, so that the client decodes the target encoding result and synchronously plays the generated decoding result and the animation data according to the timestamps.
According to a second aspect of the embodiments of the present specification, there is provided an extended reality-based data processing apparatus, including:
an acquisition module configured to collect audio data generated by a target virtual user within a preset time interval in an extended reality scene;
a determining module configured to determine a virtual action of the target virtual user corresponding to the audio data, and determine animation data of the target virtual user according to the virtual action;
an adding module configured to add a corresponding timestamp to the encoding result of the audio data according to the preset time interval to generate a target encoding result, and add a corresponding timestamp to the animation data to generate target animation data; and
a sending module configured to send the target encoding result to a client through an audio transmission channel, and send the target animation data to the client through an animation data transmission channel, so that the client decodes the target encoding result and synchronously plays the generated decoding result and the animation data according to the timestamps.
According to a third aspect of the embodiments of the present specification, there is provided another extended reality-based data processing method, including:
receiving, through an animation data transmission channel, animation data generated by a target virtual user within a preset time interval in an extended reality scene, where the animation data includes a first timestamp;
receiving, through an audio transmission channel, an encoding result of audio data generated by the target virtual user within the preset time interval in the extended reality scene, where the encoding result includes a second timestamp;
decoding the encoding result, and performing animation rendering based on the animation data; and
synchronously playing the generated decoding result and the animation rendering result in the extended reality scene according to the first timestamp and the second timestamp.
According to a fourth aspect of the embodiments of the present specification, there is provided another extended reality-based data processing apparatus, including:
a first receiving module configured to receive, through an animation data transmission channel, animation data generated by a target virtual user within a preset time interval in an extended reality scene, where the animation data includes a first timestamp;
a second receiving module configured to receive, through an audio transmission channel, an encoding result of audio data generated by the target virtual user within the preset time interval in the extended reality scene, where the encoding result includes a second timestamp;
a processing module configured to decode the encoding result and perform animation rendering based on the animation data; and
a playing module configured to synchronously play the generated decoding result and animation rendering result in the extended reality scene according to the first timestamp and the second timestamp.
According to a fifth aspect of the embodiments of the present specification, there is provided an extended reality-based data processing system, including:
a first client, a cloud server, and at least one second client;
the first client is configured to collect audio data generated by a target virtual user within a preset time interval in an extended reality scene, determine a virtual action of the target virtual user corresponding to the audio data, determine animation data of the target virtual user according to the virtual action, add a corresponding timestamp to the encoding result of the audio data according to the preset time interval to generate a target encoding result, add a corresponding timestamp to the animation data to generate target animation data, and send the target encoding result and the target animation data to the cloud server;
the cloud server is configured to transmit the target encoding result to the at least one second client through an audio transmission channel, and transmit the target animation data to the at least one second client through an animation data transmission channel; and
the at least one second client is configured to decode the target encoding result, perform animation rendering based on the animation data, and synchronously play the generated decoding result and the animation rendering result according to the timestamps.
According to a sixth aspect of the embodiments of the present specification, there is provided a computing device, including:
a memory and a processor;
the memory is configured to store computer-executable instructions, and the processor is configured to execute the computer-executable instructions to implement the steps of any of the extended reality-based data processing methods described above.
According to a seventh aspect of the embodiments of the present specification, there is provided a computer-readable storage medium storing computer-executable instructions that, when executed by a processor, implement the steps of any of the extended reality-based data processing methods described above.
According to an eighth aspect of the embodiments of the present specification, there is provided a computer program that, when executed in a computer, causes the computer to perform the steps of the extended reality-based data processing method described above.
In an embodiment of the present specification, audio data generated by a target virtual user within a preset time interval in an extended reality scene is collected; the virtual action of the target virtual user corresponding to the audio data is determined, and animation data of the target virtual user is determined according to the virtual action; a corresponding timestamp is added to the encoding result of the audio data according to the preset time interval to generate a target encoding result, and a corresponding timestamp is added to the animation data to generate target animation data; the target encoding result is then sent to a client through an audio transmission channel and the target animation data through an animation data transmission channel, so that the client decodes the target encoding result and synchronously plays the generated decoding result and the animation data according to the timestamps.
In the embodiments of the present specification, timestamps are added to both the audio data and the animation data, so that the client can play the animation data and the audio data synchronously with the timestamps as a reference, which helps improve the sound-picture synchronization effect. In addition, during data transmission, the animation data and the encoding result of the audio data are carried over separate, independent transmission channels, so they do not interfere with each other in transit, which helps guarantee real-time transmission and thus data processing efficiency. Moreover, the animation data is transmitted without compression encoding, which avoids data loss: the client can perform animation rendering based on the animation data and obtain a rendering result that meets expectations, and sound and picture are played synchronously from the animation rendering result and the decoding result of the audio data, which helps improve the user's viewing experience.
Drawings
FIG. 1 is an architecture diagram of an extended reality-based data processing system provided in an embodiment of the present specification;
FIG. 2 is a flowchart of an extended reality-based data processing method provided in an embodiment of the present specification;
FIG. 3 is a schematic diagram of a data processing process based on an extended reality scene provided in an embodiment of the present specification;
FIG. 4 is a schematic diagram of a sound-picture synchronous playing process provided in an embodiment of the present specification;
FIG. 5a is a schematic diagram of an extended reality-based data processing process applied to a virtual live-broadcast scene, provided in an embodiment of the present specification;
FIG. 5b is a schematic diagram of an extended reality-based data processing process applied to a virtual conference scene, provided in an embodiment of the present specification;
FIG. 6 is a schematic structural diagram of an extended reality-based data processing apparatus provided in an embodiment of the present specification;
FIG. 7 is a flowchart of another extended reality-based data processing method provided in an embodiment of the present specification;
FIG. 8 is a flowchart illustrating a processing procedure of an extended reality-based data processing method provided in an embodiment of the present specification;
FIG. 9 is a schematic structural diagram of another extended reality-based data processing apparatus provided in an embodiment of the present specification;
FIG. 10 is a structural block diagram of a computing device provided in an embodiment of the present specification.
Detailed Description
In the following description, numerous specific details are set forth to provide a thorough understanding of the present specification. The present specification can, however, be implemented in many ways other than those described herein, and those skilled in the art can make similar generalizations without departing from its scope; the present specification is therefore not limited to the specific implementations disclosed below.
The terminology used in the present specification is for the purpose of describing particular embodiments only and is not intended to limit the one or more embodiments. As used in one or more embodiments of the present specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used in one or more embodiments of the present specification refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It should be understood that, although the terms first, second, etc. may be used in one or more embodiments to describe various information, the information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, without departing from the scope of one or more embodiments of the present specification, "first" may also be termed "second," and similarly, "second" may be termed "first." Depending on the context, the word "if" as used herein may be interpreted as "when," "upon," or "in response to determining."
First, the terms involved in one or more embodiments of the present specification are explained.
Virtual user (avatar): a three-dimensional model synthesized by digital technology. It exists in digital form in the non-physical world and has a human-like appearance, behavior, and perception as well as anthropomorphic interaction capabilities.
Audio data: sound data collected by a microphone.
Animation (motion) data: key-point data generated when a virtual user interacts, such as natural facial expressions (lip movements, blinks, eyebrow movements), head movements, and body-skeleton movement key points. The animation data must remain consistent end to end; it must not be modified, re-encoded, or lossily compressed.
Taking lip expression data as an example, it consists of floating-point values for 15 visual phonemes (visemes), from which the corresponding lip animation is rendered. If the values differ between the receiving end and the transmitting end, the rendering effect is abnormal.
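As a rough illustration of such a frame of lip expression data, the sketch below packs 15 floating-point viseme values into one record. The viseme names follow the list given later in this description; the function and variable names are illustrative assumptions, not the actual data format used here.

```python
# Hypothetical sketch of one frame of lip expression data: 15
# floating-point values, one per visual phoneme (viseme). Only the
# viseme names come from this specification; the layout is assumed.
VISEMES = ["sil", "PP", "FF", "TH", "DD", "kk", "CH", "SS",
           "nn", "RR", "aa", "E", "ih", "oh", "ou"]

def lip_frame(values):
    """Pack 15 viseme values into a dict keyed by viseme name."""
    if len(values) != len(VISEMES):
        raise ValueError("expected one value per viseme")
    return dict(zip(VISEMES, values))

# A mouth shape dominated by FF (an "f"/"v" sound): FF is 1.0, rest 0.0.
frame = lip_frame([0.0, 0.0, 1.0] + [0.0] * 12)
```

If even one of these values differs between sender and receiver, the rendered mouth shape differs, which is why the data must travel unmodified.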
Synchronization: the handling of audio data and animation data over the whole production-transmission-reception link so that, in the end, sound and animation are displayed consistently.
In the present specification, an extended reality-based data processing method is provided, and the present specification also relates to an extended reality-based data processing apparatus, an extended reality-based data processing system, a computing device, a computer-readable storage medium, and a computer program, which are described in detail one by one in the following embodiments.
Fig. 1 shows an architecture diagram of an extended reality-based data processing system provided according to an embodiment of the present specification, including:
a first client 102, a cloud server 104, and at least one second client 106;
the first client 102 is configured to collect audio data generated by a target virtual user within a preset time interval in an extended reality scene, determine a virtual action of the target virtual user corresponding to the audio data, determine animation data of the target virtual user according to the virtual action, add a corresponding timestamp to the encoding result of the audio data according to the preset time interval to generate a target encoding result, add a corresponding timestamp to the animation data to generate target animation data, and send the target encoding result and the target animation data to the cloud server 104;
the cloud server 104 is configured to transmit the target encoding result to the at least one second client 106 through an audio transmission channel, and transmit the target animation data to the at least one second client 106 through an animation data transmission channel;
the at least one second client 106 is configured to decode the target encoding result, perform animation rendering based on the animation data, and synchronously play the generated decoding result and the animation rendering result according to the timestamps.
Specifically, the first client collects the audio data generated by the target virtual user within a preset time interval in the extended reality scene, determines the virtual lip actions and/or virtual limb actions of the target virtual user corresponding to the audio data, and determines the animation data of the target virtual user according to those actions.
The first client encodes the audio data, adds corresponding timestamps to the encoding result of the audio data and to the animation data according to the preset time interval, and then sends the timestamped encoding result and animation data to the cloud server.
The cloud server transmits the timestamped encoding result to the second client through the audio transmission channel, and transmits the timestamped animation data to the second client through the animation data transmission channel.
The second client decodes the timestamped encoding result, renders the animation based on the animation data, and synchronously plays the generated decoding result and animation rendering result according to the timestamps contained in the encoding result and the animation data.
In the embodiments of the present specification, timestamps are added to both the audio data and the animation data, so that the client can play the animation data and the audio data synchronously with the timestamps as a reference, which helps improve the sound-picture synchronization effect. In addition, during data transmission, the animation data and the encoding result of the audio data are carried over separate, independent transmission channels, so they do not interfere with each other in transit, which helps guarantee real-time transmission and thus data processing efficiency. Moreover, the animation data is transmitted without compression encoding, which avoids data loss: the client can perform animation rendering based on the animation data and obtain a rendering result that meets expectations, and sound and picture are played synchronously from the animation rendering result and the decoding result of the audio data, which helps improve the user's viewing experience.
The foregoing is a schematic description of the extended reality-based data processing system of this embodiment. It should be noted that the technical solution of the extended reality-based data processing system and the technical solution of the extended reality-based data processing method below belong to the same concept; for details not described in the technical solution of the system, refer to the description of the method below.
Fig. 2 is a flowchart of an extended reality-based data processing method according to an embodiment of the present specification, which specifically includes the following steps.
Step 202: collect audio data generated by a target virtual user within a preset time interval in an extended reality scene.
Specifically, the extended reality-based data processing method provided in this embodiment of the present specification is applied to the first client.
Extended reality (XR) is a general term for technologies such as AR (augmented reality), VR (virtual reality), and MR (mixed reality). Using modern high-tech means with a computer at their core, XR creates a digital environment combining the real and the virtual, and through a new form of human-machine interaction brings the experiencer a sense of immersion with seamless transitions between the virtual world and the real world.
The extended reality scene in the embodiments of the present specification may be a virtual conference scene or a virtual live-broadcast scene based on extended reality. The participants or broadcasters in such a scene are virtual users, and the target virtual user may be the virtual user who is currently speaking among them.
In an extended reality scene, the audio data and animation data generated in real time by any virtual user need to be synchronized on the user terminals of the other virtual users. To keep sound and picture consistent on every terminal, the embodiments of the present specification collect the audio data and animation data of the target virtual user separately and add timestamps to both, so that after the data is sent to each user terminal, the terminal can play the animation data and audio data synchronously with the timestamps as a reference.
Step 204: determine the virtual action of the target virtual user corresponding to the audio data, and determine the animation data of the target virtual user according to the virtual action.
Specifically, a virtual action is a movement of the virtual user's face or limbs, such as smiling or waving in greeting.
The embodiments of the present specification aim to play the audio data and animation data of the target virtual user synchronously on each user terminal, which would normally require collecting the virtual user's audio data and animation data separately. In practice, however, the speech content of a virtual user in an extended reality scene, and the virtual actions performed while speaking, can be set in advance. Therefore, the embodiments of the present specification may first collect the audio data generated by the target virtual user within a preset time interval in the extended reality scene, then determine the corresponding virtual actions from the audio data, and determine the animation data of the target virtual user from those virtual actions.
For example, suppose the audio data of the target virtual user is: "Hi, everyone." If it is determined from the audio data that the target virtual user is greeting everyone, it can be determined that the virtual actions include a hand-waving limb action; in this case, the waving animation can be determined as the animation data of the target virtual user.
In specific implementation, collecting the audio data generated by the target virtual user within a preset time interval in the extended reality scene includes:
collecting audio data generated by the target virtual user within at least one first preset time interval in the extended reality scene;
correspondingly, determining the animation data of the target virtual user according to the virtual action includes:
integrating at least one virtual action corresponding to the audio data in the time order of the at least one first preset time interval to generate animation data of the target virtual user within a second preset time interval, where the second preset time interval consists of the at least one first preset time interval.
Further, integrating the at least one virtual action corresponding to the audio data in the time order of the at least one first preset time interval to generate the animation data of the target virtual user within the second preset time interval includes:
determining a first time length corresponding to the second preset time interval according to a preset frame rate;
determining at least one virtual action corresponding to the second preset time interval when the first time length is determined to be greater than a second time length corresponding to the first preset time interval;
sorting the at least one virtual action in the time order of the at least one first preset time interval, and determining a target virtual action according to the sorting result; and
integrating the target virtual actions to generate the animation data of the target virtual user within the second preset time interval.
Specifically, in the extended reality scene, the speech content and the facial or body movements of the target virtual user change in real time while the target virtual user is speaking. To ensure the accuracy of the collected data, the embodiments of the present specification may therefore collect the audio data of the target virtual user in segments according to the first preset time interval. For example, the time length corresponding to the first preset time interval may be 10 ms, and the first preset time intervals may be 0-10 ms, 10-20 ms, and so on, so that 100 segments of the target virtual user's audio data are collected within 1 s.
Further, after the audio data generated by the target virtual user within the at least one first preset time interval has been collected, the virtual actions corresponding to the audio data can be integrated in the time order of the first preset time intervals to generate the animation data of the target virtual user within a second preset time interval.
Specifically, the preset frame rate of the animation can be determined, and the first time length corresponding to the second preset time interval determined from it. Then, when the first time length is greater than the second time length corresponding to the first preset time interval, the at least one virtual action corresponding to the second preset time interval is determined; the virtual actions are sorted in the time order of the first preset time intervals, target virtual actions are determined from the sorting result, and the target virtual actions are integrated to generate the animation data of the target virtual user within the second preset time interval.
For example, if the preset frame rate of the animation is 30 fps, the animation plays 30 frames per second, i.e., one frame roughly every 33 ms, so the time length corresponding to the second preset time interval is 33 ms. The first preset time interval for the audio data, however, is 10 ms long; that is, the first time length corresponding to the second preset time interval is greater than the second time length corresponding to the first preset time interval. When the audio data in each first preset time interval corresponds to one virtual action, one second preset time interval therefore corresponds to three virtual actions of the same type. In this case, one target virtual action needs to be selected from the three same-type virtual actions, and the different types of target virtual actions are then integrated to generate the animation data of the target virtual user within the second preset time interval.
Alternatively, integrating the at least one virtual action corresponding to the audio data in the time order of the at least one first preset time interval to generate the animation data of the target virtual user within the second preset time interval may proceed as follows. The first time length corresponding to the second preset time interval is determined according to the preset frame rate. When the first time length is greater than the second time length corresponding to the first preset time interval, the at least one virtual action corresponding to the second preset time interval is determined and de-duplicated. According to the preset frame rate and the number of virtual actions remaining after de-duplication, it is then determined whether actions need to be added or deleted so that the number of remaining virtual actions equals the preset frame rate. Finally, the remaining virtual actions are integrated in the time order of the first preset time intervals to generate the animation data of the target virtual user within the second preset time interval.
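To make the interval arithmetic above concrete, here is a minimal sketch assuming 10 ms capture intervals, a 30 fps preset frame rate, and a keep-the-latest rule for reducing same-type actions within a frame; the rule and all names are illustrative assumptions, not the specification's prescribed choice.

```python
# Minimal sketch: fold per-10 ms virtual actions into per-frame (~33 ms)
# animation data. The keep-the-latest de-duplication rule is assumed.
FIRST_INTERVAL_MS = 10            # second time length (audio capture)
PRESET_FPS = 30
FRAME_MS = 1000 // PRESET_FPS     # first time length: 33 ms per frame

def integrate(actions):
    """actions: list of (start_ms, action_type, action) in time order.

    Returns one dict of target virtual actions per second preset
    time interval.
    """
    frames = {}
    for start_ms, action_type, action in actions:
        idx = start_ms // FRAME_MS
        # Within one frame, a later same-type action replaces an
        # earlier one, leaving a single target action per type.
        frames.setdefault(idx, {})[action_type] = action
    return [frames[i] for i in sorted(frames)]

# Three 10 ms intervals fall into frame 0; only the last lip action
# is kept as the target virtual action for that frame.
print(integrate([(0, "lip", "FF"), (10, "lip", "FF"), (20, "lip", "sil")]))
```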
In specific implementation, after the audio data of the target virtual user is collected, the virtual action of the target virtual user is determined based on the audio data, and the animation data is determined according to the virtual action. The process of determining the virtual action of the target virtual user corresponding to the audio data can specifically be realized as follows:
obtaining a preset virtual action type of the target virtual user, and determining the visual phonemes corresponding to the target virtual action type;
determining, according to the audio data, the phoneme value corresponding to a target visual phoneme among the visual phonemes; and
determining the virtual action of the target virtual action type corresponding to the target virtual user according to the phoneme value corresponding to the target visual phoneme.
Specifically, the preset virtual action types may include virtual lip actions, virtual eye actions, virtual limb actions, and so on. Different virtual action types may correspond to different numbers of visual phonemes; each visual phoneme represents one type of virtual action, and the phoneme value (0 or 1) corresponding to a visual phoneme indicates whether the virtual user performs that virtual action.
For example, virtual lip actions can correspond to 15 visual phonemes, namely sil, PP, FF, TH, DD, kk, CH, SS, nn, RR, aa, E, ih, oh, and ou, i.e., 15 mouth shapes. Each visual phoneme represents one virtual lip movement (mouth shape).
Here, sil means the lips make no movement: when the phoneme value of sil is 1, the virtual user's mouth is in its neutral, non-speaking shape. Each of the remaining visual phonemes represents the mouth shape for a class of sounds, and when its phoneme value is 1, the virtual user's mouth takes the corresponding shape: PP for a "b," "p," or "m" sound; FF for "f" or "v"; TH for "th"; DD for "t" or "d"; kk for "k" or "g"; CH for "tS," "dZ," or "S"; SS for "s" or "z"; nn for "n" or "l"; RR for "r"; aa for "a"; E for "e"; ih for "i"; oh for "o"; and ou for "ou."
Therefore, the embodiments of the present specification may obtain a preset virtual action type of a target virtual user, determine a visual phoneme corresponding to the target virtual action type, then determine a phoneme value corresponding to a target visual phoneme in the visual phonemes according to the audio data, and determine a virtual action of the target virtual action type corresponding to the target virtual user according to the phoneme value corresponding to the target visual phoneme.
For example, suppose the target virtual action type is a virtual lip action, whose corresponding visual phonemes are the 15 types sil, PP, FF, TH, DD, kk, CH, SS, nn, RR, aa, E, ih, oh, and ou. If it is then determined from the audio data that the target virtual user makes an "f" sound within a first preset time interval, the phoneme value of the visual phoneme FF can be set to 1, and the virtual action of the target virtual action type corresponding to the target virtual user can be determined to be the lips making an "f" sound; the animation data of the target virtual user can then be determined from this action.
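The following sketch walks through these steps for a virtual lip action, assuming some external recognizer has already labeled the audio slice with a phoneme; the (truncated) phoneme-to-viseme table and all names are illustrative assumptions.

```python
# Sketch: determine a virtual lip action from an audio slice, assuming
# an external recognizer supplies a phoneme label. The mapping table
# is truncated and illustrative.
VISEMES = ["sil", "PP", "FF", "TH", "DD", "kk", "CH", "SS",
           "nn", "RR", "aa", "E", "ih", "oh", "ou"]
PHONEME_TO_VISEME = {"b": "PP", "p": "PP", "m": "PP",
                     "f": "FF", "v": "FF", "s": "SS", "z": "SS"}

def lip_action(phoneme):
    """Return the 15 phoneme values (0 or 1) for one audio slice."""
    values = {v: 0 for v in VISEMES}
    values[PHONEME_TO_VISEME.get(phoneme, "sil")] = 1
    return values

# An "f" sound within a first preset time interval sets FF to 1.
assert lip_action("f")["FF"] == 1
```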
Step 206: add a corresponding timestamp to the encoding result of the audio data according to the preset time interval to generate a target encoding result, and add a corresponding timestamp to the animation data to generate target animation data.
Specifically, in the embodiments of the present specification, sound-picture synchronization is achieved when the client plays the audio data and animation data after they have been sent to it. Before the data is sent, the audio data can be compression-encoded and a corresponding timestamp added to the compression-encoding result to generate the target encoding result. The animation data is different: after it is sent, the client needs to perform animation rendering based on it, and if it were compression-encoded before sending, the client would have to decode the compressed result on receipt; the compression-encoding and decoding process could lose part of the animation data, so the rendering effect would not meet expectations. The animation data is therefore sent without compression encoding, with only a timestamp added.
In specific implementation, adding a corresponding timestamp to the encoding result of the audio data according to the preset time interval to generate the target encoding result, and adding a corresponding timestamp to the animation data to generate the target animation data, includes:
adding a corresponding timestamp to the encoding result of the audio data according to the first preset time interval to generate the target encoding result; and
adding a corresponding timestamp to the animation data according to the second preset time interval to generate the target animation data.
Specifically, as mentioned above, the time lengths corresponding to the first preset time interval and the second preset time interval may differ. To ensure accurate timestamping, the corresponding timestamp can therefore be added to the encoding result of the audio data according to the first preset time interval, and to the animation data according to the second preset time interval; the timestamp may be the start time or the end time of the corresponding interval.
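As a small sketch of this step, the snippet below stamps each encoded audio chunk with a VTS at 10 ms spacing and each animation frame with an ATS at 33 ms spacing, using interval start times; the record layout and the fixed spacings are illustrative assumptions.

```python
# Sketch: add interval start times as timestamps. VTS follows the
# first preset time interval (10 ms assumed), ATS the second preset
# time interval (33 ms assumed); the record layout is illustrative.
def stamp_streams(encoded_audio_chunks, animation_frames):
    target_encoding = [{"vts": i * 10, "audio": chunk}
                       for i, chunk in enumerate(encoded_audio_chunks)]
    target_animation = [{"ats": i * 33, "anim": frame}
                        for i, frame in enumerate(animation_frames)]
    return target_encoding, target_animation
```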
Step 208: send the target encoding result to a client through an audio transmission channel, and send the target animation data to the client through an animation data transmission channel, so that the client decodes the target encoding result and synchronously plays the generated decoding result and the animation data according to the timestamps.
Specifically, in the embodiments of the present specification, the target encoding result of the audio data and the target animation data can be transmitted through independent data transmission channels, so that the audio data and the animation data do not interfere with each other during transmission, ensuring real-time data transmission.
The target encoding result of the audio data is sent to the client through the audio transmission channel, and the target animation data through the animation data transmission channel; the client then decodes the target encoding result and synchronously plays the generated decoding result and the animation data according to the timestamps added to the encoding result and the target animation data, achieving the client's sound-picture synchronization effect.
In specific implementation, sending the target encoding result to a client through an audio transmission channel and the target animation data to the client through an animation data transmission channel includes:
sending the target encoding result and the target animation data to a cloud server, so that the cloud server sends the target encoding result to the client through an audio transmission channel and the target animation data to the client through an animation data transmission channel.
Specifically, in the embodiments of the present specification, data transmission can be carried out via a cloud server. That is, when the target encoding result of the audio data and the target animation data need to be transmitted to the client, they can first be sent to the cloud server; the cloud server then sends the target encoding result to the client through the audio transmission channel and the target animation data to the client through the animation data transmission channel.
In addition, before the encoding result of the audio data and the animation data are sent to a cloud server or a client, the target encoding result can be added to an audio data sending queue according to the timestamp in the target encoding result; and
the target animation data can be added to an animation data sending queue according to the timestamp in the target animation data.
Further, sending the target encoding result to the client through the audio transmission channel and the target animation data to the client through the animation data transmission channel includes:
sending the target encoding results in the audio data sending queue to the cloud server according to a first preset time period; and
sending the target animation data in the animation data sending queue to the cloud server according to a second preset time period, so that the cloud server sends the target encoding result to the client through the audio transmission channel and the target animation data to the client through the animation data transmission channel.
Specifically, after the audio data and animation data for a period of time have been collected, the audio data can be preprocessed and added to the audio data sending queue, where the preprocessing includes compression encoding and adding the timestamp VTS. The animation data has the timestamp ATS added, is kept in the form of binary data, and is added to the animation data sending queue.
The target encoding results in the audio data sending queue and the target animation data in the animation data sending queue can then be taken out of their queues at regular times, according to the same or different preset time periods, and sent to the cloud server; the cloud server sends the target encoding results to the client through the audio transmission channel and the target animation data through the animation data transmission channel.
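A minimal sketch of this sending side follows, assuming one queue per stream and a timer thread per queue; the transport function is a placeholder, and the periods shown are only examples.

```python
# Sketch: two independent sending queues, each drained on its own
# preset time period and forwarded to the cloud server. send_to_cloud
# is a placeholder for the real transport.
import queue
import threading
import time

audio_send_queue = queue.Queue()       # target encoding results (VTS order)
animation_send_queue = queue.Queue()   # target animation data (ATS order)

def start_flusher(q, send_to_cloud, period_s):
    """Every period_s seconds, take queued items out and send them."""
    def run():
        while True:
            time.sleep(period_s)
            batch = []
            while not q.empty():
                batch.append(q.get_nowait())
            if batch:
                send_to_cloud(batch)
    threading.Thread(target=run, daemon=True).start()

# Example: flush audio every 10 ms and animation every 33 ms.
start_flusher(audio_send_queue, print, 0.010)
start_flusher(animation_send_queue, print, 0.033)
```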
Based on this, fig. 3 shows a schematic diagram of a data processing process based on an extended reality scene provided in an embodiment of the present specification. In fig. 3, the transmitting end (the first client) first generates audio data and animation data for a period of time, for example 10 ms of audio data and 33 ms of animation data, where the animation data can include audio-related natural facial expressions such as lip movements, blinks and eyebrow movements, as well as head movements, body-skeleton movements, and the like.
The audio data is then preprocessed and added to the audio data sending queue, where the preprocessing includes compression encoding and adding the timestamp VTS; the animation data has the timestamp ATS added, is kept in the form of binary data, and is added to the animation data sending queue. The queued data is taken out and sent to the cloud server at regular times.
The audio transmission channel and the animation data transmission channel of the cloud server are independent services, each responsible for forwarding its data to the receiving end (the second client). The independent animation data transmission channel can carry larger payloads without interfering with the audio data.
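As a rough sketch of this forwarding role, the snippet below relays each stream over its own channel to every second client; the channel objects and client records are illustrative placeholders.

```python
# Sketch: the cloud server relays the audio stream and the animation
# stream over independent channels to each second client. The client
# records and send functions are illustrative placeholders.
def cloud_relay(target_encodings, target_animation, second_clients):
    for client in second_clients:
        for item in target_encodings:
            client["audio_channel"](item)        # audio transmission channel
        for item in target_animation:
            client["animation_channel"](item)    # animation data channel
```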
The receiving end receives the audio data and the animation data, processes them and adds them to buffers, and then synchronizes the animation data with the audio data: while the audio is played, the animation for the current frame is rendered.
In practical applications, the receiving end (the second client) runs an audio data processing thread and an animation data processing thread. Fig. 4 shows a schematic diagram of the sound-picture synchronous playing process provided in an embodiment of the present specification.
In fig. 4, the audio data processing thread first reads the audio data and its timestamp VTS from the buffer, then updates the timestamp CTS of the current cursor so that CTS = VTS, and plays the audio. The animation data processing thread first reads the animation data and its timestamp ATS from the buffer and judges whether the ATS matches the current cursor's CTS; the matching condition can be customized, for example a difference between the two of less than 20 ms. If they match, the animation is rendered and displayed based on the animation data. If not, the thread judges whether (ATS - CTS) is smaller than DeltaTS: if (ATS - CTS) < DeltaTS, the animation data is discarded; if (ATS - CTS) ≥ DeltaTS, the rendering time of the animation data has not yet arrived, and in this case the thread can wait for the CTS to update and defer rendering. DeltaTS denotes a synchronization delay threshold and can be customized to the actual situation, for example set to 50 ms.
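The matching rule just described can be sketched as follows, using the example thresholds from the text (a 20 ms match window and a 50 ms DeltaTS); play_audio and render_animation stand in for the client's real playback and rendering facilities, and the single-module structure is an illustrative simplification of the two threads.

```python
# Sketch of the receiving end's ATS/CTS matching rule. MATCH_MS and
# DELTA_TS use the example values from the text; the media calls are
# placeholders.
MATCH_MS = 20     # customizable matching condition: |ATS - CTS| < 20 ms
DELTA_TS = 50     # synchronization delay threshold (DeltaTS)

cts = 0           # timestamp of the current cursor, driven by the audio

def play_audio(audio):           # placeholder for real audio playback
    pass

def render_animation(anim):      # placeholder for real frame rendering
    pass

def audio_step(vts, audio):
    """Audio thread: read (audio, VTS) from the buffer, set CTS = VTS."""
    global cts
    cts = vts
    play_audio(audio)

def animation_step(ats, anim):
    """Animation thread: render, discard, or defer one animation frame."""
    if abs(ats - cts) < MATCH_MS:
        render_animation(anim)   # matched: render at the current frame
        return "rendered"
    if ats - cts < DELTA_TS:
        return "discarded"       # too late to render: drop the frame
    # (ATS - CTS) >= DeltaTS: rendering time not reached; wait for the
    # CTS to update and defer rendering.
    return "deferred"
```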
In addition, fig. 5a shows a schematic diagram of an extended reality-based data processing process applied to a virtual live-broadcast scene. The broadcaster terminal collects the broadcaster's audio data and animation data such as body movements and facial expressions, generates synchronization timestamps, and adds the timestamps to the audio data and the animation data respectively. The audio data and animation data are transmitted to the viewer terminals through the cloud server, and each viewer terminal performs animation rendering based on the animation data and plays the audio and animation synchronously.
Fig. 5b shows a schematic diagram of an extended reality-based data processing process applied to a virtual conference scene. The speaker terminal collects the audio data and the animation data, such as body movements and facial expressions, of the speaker in the conference, generates synchronization timestamps, adds the timestamps to the audio data and the animation data respectively, and transmits them to the cloud server. A cloud rendering engine service performs animation rendering based on the animation data, and the audio data and the animation rendering result are sent to the viewer terminals, where audio and animation are played synchronously, achieving a live-video-like effect.
In an embodiment of the present specification, audio data generated by a target virtual user within a preset time interval in an extended reality scene is collected; the virtual action of the target virtual user corresponding to the audio data is determined, and animation data of the target virtual user is determined according to the virtual action; a corresponding timestamp is added to the encoding result of the audio data according to the preset time interval to generate a target encoding result, and a corresponding timestamp is added to the animation data to generate target animation data; the target encoding result is then sent to a client through an audio transmission channel and the target animation data through an animation data transmission channel, so that the client decodes the target encoding result and synchronously plays the generated decoding result and the animation data according to the timestamps.
In the embodiments of the present specification, timestamps are added to both the audio data and the animation data, so that the client can play the animation data and the audio data synchronously with the timestamps as a reference, which helps improve the sound-picture synchronization effect. In addition, during data transmission, the target animation data and the target encoding result of the audio data are carried over separate, independent transmission channels, so they do not interfere with each other in transit, which helps guarantee real-time transmission and thus data processing efficiency. Moreover, the animation data is transmitted without compression encoding, which avoids data loss: the client can perform animation rendering based on the animation data and obtain a rendering result that meets expectations, and sound and picture are played synchronously from the animation rendering result and the decoding result of the audio data, which helps improve the user's viewing experience.
Corresponding to the above method embodiments, the present specification further provides embodiments of an extended reality-based data processing apparatus, and fig. 6 shows a schematic structural diagram of an extended reality-based data processing apparatus provided in an embodiment of the present specification. As shown in fig. 6, the apparatus includes:
an acquisition module 602 configured to collect audio data generated by a target virtual user within a preset time interval in an extended reality scene;
a determining module 604 configured to determine a virtual action of the target virtual user corresponding to the audio data, and determine animation data of the target virtual user according to the virtual action;
an adding module 606 configured to add a corresponding timestamp to the encoding result of the audio data according to the preset time interval to generate a target encoding result, and add a corresponding timestamp to the animation data to generate target animation data; and
a sending module 608 configured to send the target encoding result to a client through an audio transmission channel, and send the target animation data to the client through an animation data transmission channel, so that the client decodes the target encoding result and synchronously plays the generated decoding result and the animation data according to the timestamps.
Optionally, the acquisition module 602 is further configured to:
collect audio data generated by a target virtual user within at least one first preset time interval in an extended reality scene;
accordingly, the determining module 604 is further configured to:
integrate at least one virtual action corresponding to the audio data in the time order of the at least one first preset time interval to generate animation data of the target virtual user within a second preset time interval, where the second preset time interval consists of the at least one first preset time interval.
Optionally, the determining module 604 is further configured to:
determine a first time length corresponding to a second preset time interval according to a preset frame rate;
determine at least one virtual action corresponding to the second preset time interval when the first time length is determined to be greater than a second time length corresponding to the first preset time interval;
sort the at least one virtual action in the time order of the at least one first preset time interval, and determine a target virtual action according to the sorting result; and
integrate the target virtual actions to generate animation data of the target virtual user within the second preset time interval.
Optionally, the adding module 606 is further configured to:
add a corresponding timestamp to the encoding result of the audio data according to the first preset time interval to generate a target encoding result; and
add a corresponding timestamp to the animation data according to the second preset time interval to generate target animation data.
Optionally, the determining module 604 is further configured to:
obtain a preset virtual action type of the target virtual user, and determine the visual phonemes corresponding to the target virtual action type;
determine, according to the audio data, a phoneme value corresponding to a target visual phoneme among the visual phonemes; and
determine the virtual action of the target virtual action type corresponding to the target virtual user according to the phoneme value corresponding to the target visual phoneme.
Optionally, the sending module 608 is further configured to:
send the target encoding result and the target animation data to a cloud server, so that the cloud server sends the target encoding result to a client through an audio transmission channel and the target animation data to the client through an animation data transmission channel.
Optionally, the extended reality-based data processing apparatus further includes a processing module configured to:
add the target encoding result to an audio data sending queue according to the timestamp in the target encoding result; and
add the target animation data to an animation data sending queue according to the timestamp in the target animation data.
Optionally, the sending module 608 is further configured to:
send the target encoding results in the audio data sending queue to a cloud server according to a first preset time period; and
send the target animation data in the animation data sending queue to the cloud server according to a second preset time period, so that the cloud server sends the target encoding result to a client through an audio transmission channel and the target animation data to the client through an animation data transmission channel.
The foregoing is a schematic description of the extended reality-based data processing apparatus of this embodiment. It should be noted that the technical solution of the extended reality-based data processing apparatus belongs to the same concept as the technical solution of the extended reality-based data processing method; for details not described in the technical solution of the apparatus, refer to the description of the technical solution of the method.
Fig. 7 is a flowchart illustrating another augmented reality-based data processing method according to an embodiment of the present specification, which specifically includes the following steps.
Step 702, receiving animation data generated by a target virtual user within a preset time interval in an augmented reality scene through an animation data transmission channel, wherein the animation data comprises a first timestamp.
Step 704, receiving an encoding result of audio data generated by the target virtual user within a preset time interval in the augmented reality scene through an audio transmission channel, where the encoding result includes a second timestamp.
Step 706, decoding the encoding result, and performing animation rendering based on the animation data.
Step 708, synchronously playing the generated decoding result and the animation rendering result in the augmented reality scene according to the first timestamp and the second timestamp.
In a specific implementation, the synchronously playing the generated decoding result and the animation rendering result in the augmented reality scene according to the first timestamp and the second timestamp includes:
updating the reference time according to the first timestamp, and determining whether the time difference between the second timestamp and the reference time is less than a first preset time threshold;
and if so, synchronously playing the generated decoding result and the animation rendering result in the augmented reality scene.
In addition, under the condition that the time difference between the second timestamp and the reference time is determined to be greater than a first preset time threshold, playing a generated decoding result in the augmented reality scene;
judging whether the time difference between the second timestamp and the reference time is smaller than a second preset time threshold value or not;
if yes, deleting the animation data;
if not, the step of updating the reference time according to the first time stamp is executed.
Specifically, the augmented reality-based data processing method provided in this embodiment of the present specification is applied to a second client. The second client receives the encoding result of the audio data and the animation data through independent channels, and runs an audio data processing thread and an animation data processing thread. The audio data processing thread first reads the audio data and its timestamp VTS from a buffer, then updates the timestamp CTS of the current cursor so that CTS = VTS, and plays the audio. The animation data processing thread first reads the animation data and its timestamp ATS from the buffer and determines whether ATS matches the timestamp CTS of the current cursor; the matching condition can be customized, for example, requiring the difference between the two to be less than 20 ms. If they match, animation rendering and display are performed based on the animation data. If they do not match, it is determined whether (ATS - CTS) is smaller than DeltaTS: if (ATS - CTS) < DeltaTS, the animation data is discarded; if (ATS - CTS) ≥ DeltaTS, the rendering time of the animation data has not yet been reached, in which case the client may wait for the CTS to be updated and delay rendering. DeltaTS denotes a synchronization delay threshold and can be customized according to the actual situation, for example, set to 50 ms.
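The two-thread logic above reduces to a few comparisons. The sketch below restates it in Python with the 20 ms matching window and the 50 ms DeltaTS from the example in the preceding paragraph; buffer reads, audio playback, and rendering are stubs, and the class and method names are invented for illustration.

```python
MATCH_MS = 20   # assumed matching condition: |ATS - CTS| < 20 ms
DELTA_TS = 50   # assumed synchronization delay threshold DeltaTS = 50 ms

class SecondClientPlayer:
    """Restates the two-thread buffer logic of the second client."""

    def __init__(self) -> None:
        self.cts = 0  # timestamp CTS of the current cursor, in ms

    def on_audio(self, audio: bytes, vts: int) -> None:
        """Audio data processing thread: set CTS = VTS, then play the audio."""
        self.cts = vts
        self.play_audio(audio)

    def on_animation(self, frame: dict, ats: int) -> str:
        """Animation data processing thread: render, discard, or delay."""
        if abs(ats - self.cts) < MATCH_MS:
            self.render(frame)
            return "rendered"
        if (ats - self.cts) < DELTA_TS:
            return "discarded"   # (ATS - CTS) < DeltaTS: drop the animation data
        return "delayed"         # rendering time not reached: wait for a CTS update

    def play_audio(self, audio: bytes) -> None:
        print(f"play audio at CTS={self.cts}")

    def render(self, frame: dict) -> None:
        print(f"render animation at CTS={self.cts}")

player = SecondClientPlayer()
player.on_audio(b"\x00", vts=1000)
print(player.on_animation({"mouth": 0.4}, ats=1010))  # within 20 ms  -> rendered
print(player.on_animation({"mouth": 0.4}, ats=1030))  # 30 ms < 50 ms -> discarded
print(player.on_animation({"mouth": 0.4}, ats=1100))  # 100 ms ahead  -> delayed
```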
In this embodiment of the present specification, timestamps are added to both the audio data and the animation data, so that the client can play the animation data and the audio data synchronously with the timestamps as a reference, which helps improve the effect of synchronized sound and picture playback. In addition, during data transmission, the animation data and the encoding result of the audio data are transmitted over independent transmission channels, so that they do not interfere with each other in transit, which helps guarantee the real-time performance of the transmission and thus the data processing efficiency. Moreover, the animation data is transmitted without compression encoding, which avoids data loss: the client can perform animation rendering based on the animation data and obtain a rendering result that meets expectations, and then play sound and picture synchronously based on the animation rendering result and the decoding result of the audio data, improving the viewing experience of the user.
The above is an illustrative scheme of another augmented reality-based data processing method according to this embodiment. It should be noted that this technical solution belongs to the same concept as the technical solution of the augmented reality-based data processing method described above; for details not described in detail in this technical solution, reference may be made to the description of the technical solution of the augmented reality-based data processing method above.
The following further describes the augmented reality-based data processing method provided in this specification by taking its application in a virtual conference scene as an example, with reference to Fig. 8. Fig. 8 shows a processing flowchart of an augmented reality-based data processing method according to an embodiment of the present specification, which specifically includes the following steps.
Step 802, a first client collects audio data generated by a speaking user within a preset time interval in an augmented reality-based virtual conference scene, determines a virtual lip action and/or a virtual limb action of the speaking user corresponding to the audio data, and determines animation data of the speaking user according to the virtual lip action and/or the virtual limb action.
Step 804, the first client encodes the audio data, and adds corresponding timestamps to the encoding result of the audio data and to the animation data according to the preset time interval.
Step 806, the first client sends the encoding result added with the timestamp and the animation data to the cloud server.
Step 808, the cloud server transmits the encoding result added with the timestamp to the second client through the audio transmission channel.
Step 810, the cloud server transmits the animation data added with the timestamp to the second client through the animation data transmission channel.
Step 812, the second client decodes the encoding result added with the timestamp, and performs animation rendering based on the animation data.
Step 814, the second client synchronously plays the generated decoding result and the animation rendering result according to the timestamps contained in the encoding result and in the animation data.
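Putting steps 802 to 806 together, the first client's per-interval pipeline might look like the following sketch. Audio capture, action derivation, and the encoder are stubs, and the 40 ms interval and 16 kHz sample rate are assumed values for the example.

```python
from typing import List, Tuple

INTERVAL_MS = 40  # assumed preset time interval for capture

def capture_audio(interval_ms: int) -> List[float]:
    """Stub for microphone capture of the speaking user (step 802)."""
    return [0.0] * (16 * interval_ms)   # assumed 16 kHz mono samples

def derive_actions(samples: List[float]) -> dict:
    """Stub mapping audio to virtual lip/limb actions (see the viseme sketch above)."""
    return {"lip": 0.5, "limb": 0.0}

def encode_audio(samples: List[float]) -> bytes:
    """Stub encoder; a real client would use a speech codec here."""
    return b"\x00\x01\x02\x03"

def produce_interval(timestamp_ms: int) -> Tuple[dict, dict]:
    """Steps 802-804: turn one preset time interval into a timestamped coding
    result and timestamped animation data, ready to send to the cloud server
    (step 806)."""
    samples = capture_audio(INTERVAL_MS)
    coding_result = {"timestamp_ms": timestamp_ms, "payload": encode_audio(samples)}
    animation_data = {"timestamp_ms": timestamp_ms, "actions": derive_actions(samples)}
    return coding_result, animation_data

print(produce_interval(0))
```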
In this embodiment of the present specification, timestamps are added to both the audio data and the animation data, so that the client can play the animation data and the audio data synchronously with the timestamps as a reference, which helps improve the effect of synchronized sound and picture playback. In addition, during data transmission, the animation data and the encoding result of the audio data are transmitted over independent transmission channels, so that they do not interfere with each other in transit, which helps guarantee the real-time performance of the transmission and thus the data processing efficiency. Moreover, the animation data is transmitted without compression encoding, which avoids data loss: the client can perform animation rendering based on the animation data and obtain a rendering result that meets expectations, and then play sound and picture synchronously based on the animation rendering result and the decoding result of the audio data, improving the viewing experience of the user.
Corresponding to the above method embodiment, the present specification further provides another embodiment of an augmented reality-based data processing apparatus. Fig. 9 shows a schematic structural diagram of another augmented reality-based data processing apparatus provided in an embodiment of the present specification. As shown in Fig. 9, the apparatus includes:
a first receiving module 902, configured to receive animation data, generated by a target virtual user within a preset time interval, in an augmented reality scene through an animation data transmission channel, where the animation data includes a first timestamp;
a second receiving module 904, configured to receive, through an audio transmission channel, an encoding result of audio data generated by the target virtual user within a preset time interval in an augmented reality scene, where the encoding result includes a second timestamp;
a processing module 906 configured to decode the encoding result and perform animation rendering based on the animation data;
a playing module 908 configured to play the generated decoding result and the animation rendering result synchronously in the augmented reality scene according to the first timestamp and the second timestamp.
Optionally, the playing module 908 is further configured to:
updating the reference time according to the first timestamp, and determining whether the time difference between the second timestamp and the reference time is less than a first preset time threshold;
and if so, synchronously playing the generated decoding result and the animation rendering result in the augmented reality scene.
Optionally, the playing module 908 is further configured to:
playing a generated decoding result in the augmented reality scene under the condition that the time difference between the second timestamp and the reference time is determined to be greater than a first preset time threshold;
judging whether the time difference between the second timestamp and the reference time is smaller than a second preset time threshold value or not;
if yes, deleting the animation data;
and if not, executing the step of updating the reference time according to the first time stamp.
The above is an illustrative scheme of another augmented reality-based data processing apparatus according to this embodiment. It should be noted that the technical solution of this apparatus belongs to the same concept as the technical solution of the other augmented reality-based data processing method; for details not described in detail in the technical solution of the apparatus, reference may be made to the description of the technical solution of that method.
Fig. 10 shows a structural block diagram of a computing device 1000 provided according to an embodiment of the present specification. The components of the computing device 1000 include, but are not limited to, a memory 1010 and a processor 1020. The processor 1020 is connected to the memory 1010 via a bus 1030, and a database 1050 is used to store data.
Computing device 1000 also includes an access device 1040 that enables computing device 1000 to communicate via one or more networks 1060. Examples of such networks include the Public Switched Telephone Network (PSTN), a Local Area Network (LAN), a Wide Area Network (WAN), a Personal Area Network (PAN), or a combination of communication networks such as the Internet. The access device 1040 may include one or more of any type of wired or wireless network interface (e.g., a Network Interface Card (NIC)), such as an IEEE 802.11 Wireless Local Area Network (WLAN) wireless interface, a Worldwide Interoperability for Microwave Access (WiMAX) interface, an Ethernet interface, a Universal Serial Bus (USB) interface, a cellular network interface, a Bluetooth interface, a Near Field Communication (NFC) interface, and so on.
In an embodiment of the present specification, the above components of the computing device 1000, as well as other components not shown in Fig. 10, may also be connected to each other, for example, by a bus. It should be understood that the structural block diagram of the computing device shown in Fig. 10 is for illustration only and is not intended to limit the scope of the present specification. Those skilled in the art may add or replace other components as required.
Computing device 1000 may be any type of stationary or mobile computing device, including a mobile computer or mobile computing device (e.g., tablet, personal digital assistant, laptop, notebook, netbook, etc.), a mobile phone (e.g., smartphone), a wearable computing device (e.g., smartwatch, smartglasses, etc.), or other type of mobile device, or a stationary computing device such as a desktop computer or PC. Computing device 1000 may also be a mobile or stationary server.
The processor 1020 is configured to execute computer-executable instructions that, when executed by the processor, implement the steps of the augmented reality-based data processing method described above.
The foregoing is an illustrative scheme of the computing device of this embodiment. It should be noted that the technical solution of the computing device belongs to the same concept as the technical solution of the augmented reality-based data processing method; for details not described in detail in the technical solution of the computing device, reference may be made to the description of the technical solution of the augmented reality-based data processing method.
An embodiment of the present specification further provides a computer-readable storage medium storing computer-executable instructions, which when executed by a processor, implement the steps of the augmented reality-based data processing method described above.
The above is an illustrative scheme of a computer-readable storage medium of the embodiment. It should be noted that the technical solution of the storage medium belongs to the same concept as the technical solution of the data processing method based on augmented reality, and for details that are not described in detail in the technical solution of the storage medium, reference may be made to the description of the technical solution of the data processing method based on augmented reality.
An embodiment of the present specification further provides a computer program, wherein when the computer program is executed in a computer, the computer is caused to execute the steps of the augmented reality-based data processing method.
The above is an illustrative scheme of the computer program of this embodiment. It should be noted that the technical solution of the computer program belongs to the same concept as the technical solution of the augmented reality-based data processing method; for details not described in detail in the technical solution of the computer program, reference may be made to the description of the technical solution of the data processing method.
The foregoing description has been directed to specific embodiments of this disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
The computer instructions comprise computer program code, which may be in the form of source code, object code, an executable file, some intermediate form, or the like. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and the like. It should be noted that the content contained in the computer-readable medium may be appropriately added or removed according to the requirements of legislation and patent practice in the relevant jurisdiction; for example, in some jurisdictions, the computer-readable medium does not include electrical carrier signals or telecommunications signals in accordance with legislation and patent practice.
It should be noted that, for the sake of simplicity, the foregoing method embodiments are described as a series of combinations of acts, but it should be understood by those skilled in the art that the embodiments are not limited by the described order of acts, as some steps may be performed in other orders or simultaneously according to the embodiments. Further, those skilled in the art should also appreciate that the embodiments described in this specification are preferred embodiments and that acts and modules referred to are not necessarily required for an embodiment of the specification.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
The preferred embodiments of the present specification disclosed above are intended only to aid in the description of the specification. Alternative embodiments are not exhaustive and do not limit the invention to the precise embodiments described. Obviously, many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles of the embodiments and the practical application, and to thereby enable others skilled in the art to best understand the specification and utilize the specification. The specification is limited only by the claims and their full scope and equivalents.

Claims (14)

1. An augmented reality-based data processing method, comprising:
acquiring audio data generated by a target virtual user within a preset time interval in an augmented reality scene;
determining a virtual action of the target virtual user corresponding to the audio data, and determining animation data of the target virtual user according to the virtual action;
adding a corresponding time stamp in the coding result of the audio data according to the preset time interval to generate a target coding result, and adding a corresponding time stamp in the animation data to generate target animation data;
and sending the target coding result to a client through an audio transmission channel, and sending the target animation data to the client through an animation data transmission channel so that the client decodes the target coding result, and synchronously plays the generated decoding result and the animation data according to the timestamp.
2. The augmented reality-based data processing method according to claim 1, wherein the acquiring audio data generated by a target virtual user within a preset time interval in an augmented reality scene comprises:
acquiring audio data generated by a target virtual user in at least one first preset time interval in an augmented reality scene;
correspondingly, the determining the animation data of the target virtual user according to the virtual action comprises the following steps:
integrating at least one virtual action corresponding to the audio data according to the time sequence of the at least one first preset time interval to generate animation data of the target virtual user in a second preset time interval, wherein the second preset time interval consists of the at least one first preset time interval.
3. The augmented reality-based data processing method according to claim 2, wherein the integrating at least one virtual motion corresponding to the audio data according to the time sequence of the at least one first preset time interval to generate animation data of the target virtual user in a second preset time interval includes:
determining a first time length corresponding to a second preset time interval according to a preset frame rate;
under the condition that the first time length is determined to be larger than a second time length corresponding to the first preset time interval, determining at least one virtual action corresponding to the second preset time interval;
sequencing the at least one virtual action according to the time sequence of the at least one first preset time interval, and determining a target virtual action according to a sequencing result;
and integrating the target virtual actions to generate animation data of the target virtual user in the second preset time interval.
4. The augmented reality-based data processing method according to claim 2 or 3, wherein the generating target encoding results by adding corresponding time stamps to the encoding results of the audio data according to the preset time interval and generating target animation data by adding corresponding time stamps to the animation data comprises:
adding a corresponding time stamp in the coding result of the audio data according to the first preset time interval to generate a target coding result; and
adding a corresponding time stamp in the animation data according to the second preset time interval to generate target animation data.
5. The augmented reality-based data processing method of claim 1, wherein the determining a virtual action of the target virtual user corresponding to the audio data comprises:
acquiring a preset target virtual action type of the target virtual user, and determining the visual phonemes corresponding to the target virtual action type;
determining a phoneme value corresponding to a target visual phoneme among the visual phonemes according to the audio data;
and determining the virtual action of the target virtual action type corresponding to the target virtual user according to the phoneme value corresponding to the target visual phoneme.
6. The augmented reality-based data processing method according to claim 1, wherein the sending the target encoding result to a client through an audio transmission channel and the target animation data to the client through an animation data transmission channel comprises:
sending the target coding result and the target animation data to a cloud server, so that the cloud server sends the target coding result to the client through the audio transmission channel and sends the target animation data to the client through the animation data transmission channel.
7. The augmented reality-based data processing method of claim 1, further comprising:
adding the target coding result to an audio data sending queue according to the time stamp in the target coding result; and
adding the target animation data to an animation data sending queue according to the time stamp in the target animation data.
8. The augmented reality-based data processing method according to claim 7, wherein the sending the target encoding result to a client through an audio transmission channel and the target animation data to the client through an animation data transmission channel comprises:
sending the target coding result in the audio data sending queue to a cloud server according to a first preset time period; and
sending the target animation data in the animation data sending queue to the cloud server according to a second preset time period, so that the cloud server sends the target coding result to a client through an audio transmission channel and sends the target animation data to the client through an animation data transmission channel.
9. An augmented reality-based data processing method, comprising:
receiving animation data generated by a target virtual user in an extended reality scene within a preset time interval through an animation data transmission channel, wherein the animation data comprises a first timestamp;
receiving an encoding result of audio data generated by the target virtual user within a preset time interval in an augmented reality scene through an audio transmission channel, wherein the encoding result comprises a second timestamp;
decoding the encoding result, and performing animation rendering based on the animation data;
and synchronously playing the generated decoding result and the animation rendering result in the augmented reality scene according to the first time stamp and the second time stamp.
10. The augmented reality-based data processing method according to claim 9, wherein the synchronous playing of the generated decoding result and the animation rendering result in the augmented reality scene according to the first timestamp and the second timestamp comprises:
updating the reference time according to the first timestamp, and determining whether the time difference between the second timestamp and the reference time is less than a first preset time threshold;
and if so, synchronously playing the generated decoding result and the animation rendering result in the augmented reality scene.
11. The augmented reality-based data processing method of claim 10, further comprising:
playing a generated decoding result in the augmented reality scene under the condition that the time difference between the second timestamp and the reference time is determined to be greater than a first preset time threshold;
judging whether the time difference between the second timestamp and the reference time is smaller than a second preset time threshold value or not;
if yes, deleting the animation data;
if not, the step of updating the reference time according to the first time stamp is executed.
12. An augmented reality based data processing system comprising:
the system comprises a first client, a cloud server and at least one second client;
the first client is used for acquiring audio data generated by a target virtual user in an extended reality scene within a preset time interval, determining a virtual action of the target virtual user corresponding to the audio data, determining animation data of the target virtual user according to the virtual action, adding a corresponding timestamp into an encoding result of the audio data according to the preset time interval to generate a target encoding result, adding a corresponding timestamp into the animation data to generate target animation data, and sending the target encoding result and the target animation data to the cloud server;
the cloud server is used for transmitting the target coding result to the at least one second client through an audio transmission channel and transmitting the target animation data to the at least one second client through an animation data transmission channel;
and the at least one second client is used for decoding the target coding result, performing animation rendering based on the animation data, and synchronously playing the generated decoding result and the animation rendering result according to the timestamp.
13. A computing device, comprising:
a memory and a processor;
the memory is configured to store computer-executable instructions, and the processor is configured to execute the computer-executable instructions, which when executed by the processor, implement the steps of the augmented reality based data processing method of any one of claims 1 to 11.
14. A computer-readable storage medium storing computer-executable instructions which, when executed by a processor, implement the steps of the augmented reality based data processing method of any one of claims 1 to 11.
CN202211304735.4A 2022-10-24 2022-10-24 Data processing method and system based on augmented reality Pending CN115767206A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211304735.4A CN115767206A (en) 2022-10-24 2022-10-24 Data processing method and system based on augmented reality

Publications (1)

Publication Number Publication Date
CN115767206A true CN115767206A (en) 2023-03-07

Family

ID=85353321

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211304735.4A Pending CN115767206A (en) 2022-10-24 2022-10-24 Data processing method and system based on augmented reality

Country Status (1)

Country Link
CN (1) CN115767206A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116614650A (en) * 2023-06-16 2023-08-18 上海随幻智能科技有限公司 Voice and picture synchronous private domain live broadcast method, system, equipment, chip and medium
CN118018584A (en) * 2023-07-25 2024-05-10 广州通则康威科技股份有限公司 XR information transmission control method, device, system and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1868213A (en) * 2003-09-02 2006-11-22 索尼株式会社 Content receiving apparatus, video/audio output timing control method, and content providing system
CN105898506A (en) * 2016-05-03 2016-08-24 乐视控股(北京)有限公司 Method and system for multi-screen playing of media files
CN114221940A (en) * 2021-12-13 2022-03-22 北京百度网讯科技有限公司 Audio data processing method, system, device, equipment and storage medium
CN114554277A (en) * 2020-11-24 2022-05-27 腾讯科技(深圳)有限公司 Multimedia processing method, device, server and computer readable storage medium
WO2022121558A1 (en) * 2020-12-11 2022-06-16 北京字跳网络技术有限公司 Livestreaming singing method and apparatus, device, and medium
CN114782594A (en) * 2022-04-29 2022-07-22 北京慧夜科技有限公司 Animation generation method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination