CN113923065B

CN113923065B - Cross-version communication method, system, medium and server based on chat room audio

Info

Publication number: CN113923065B
Application number: CN202111039308.3A
Authority: CN
Inventors: 李维将; 田云翔; 陈正超; 段凌云
Original assignee: Guiyang Yuwan Technology Co ltd
Current assignee: Guiyang Yuwan Technology Co ltd
Priority date: 2021-09-06
Filing date: 2021-09-06
Publication date: 2023-11-24
Anticipated expiration: 2041-09-06
Also published as: CN113923065A

Abstract

The application provides a cross-version communication method, system, medium and server based on chat room audio. The method runs on a voice server and enables cross-version communication between new and old version clients, so that the higher voice audio sampling rate and the updated codec introduced during an audio system upgrade give users of the new version client a better experience. The voice server configures the audio parameters and performs the encoding, decoding and resampling of the audio, so that calls among new version clients, among old version clients, and between new and old version clients are all supported. Moreover, gray (staged-rollout) verification can be carried out conveniently by configuring, on a single server, the audio parameters corresponding to the new version client, so that the audio parameters of a room can be configured on that basis and client audio parameters can be upgraded iteratively and seamlessly across versions.

Description

Cross-version communication method, system, medium and server based on chat room audio

Technical Field

The application relates to the technical field of data processing, in particular to a cross-version communication method, a system, a medium and a server based on chat room audio.

Background

In a chat room scenario, audio parameters often need to be upgraded to improve the user experience. Existing upgrade approaches generally take one of two forms. Direct release: the new audio parameters are released directly in the new version, and the server supports only the new parameters. Pre-embedded version: the new version supports parameter configuration, but the new parameters are not enabled when the new version is released; they are enabled only after most old version users have upgraded to the new version. However, these conventional solutions have the following problems:

Direct release: this approach supports only the new version and disregards old version users. It is therefore unusable for an online system with a large number of existing users and easily causes user churn.

Pre-embedded version: the capability is embedded in the client for a period of time; once most users have upgraded and can support the new sampling rate and the new codec, the server switches over to the upgrade. However, this approach is inconvenient for gray testing, depends on old users upgrading (which is slow), and is unfriendly to old users who are unwilling to upgrade.

Therefore, where the audio parameters of the old version are difficult (very costly) to modify in a chat room scenario, how to seamlessly upgrade the voice audio sampling rate of the chat room without affecting old version users is a challenge in this field.

Disclosure of Invention

The embodiment of the application aims to provide a cross-version communication method, a system, a medium and a server based on chat room audio, so that the seamless upgrading of the voice audio sampling rate of a chat room is realized without affecting the use of old version users under the condition that the parameters of the old version audio in a chat room scene cannot be modified.

In order to achieve the above object, an embodiment of the present application is achieved by:

in a first aspect, an embodiment of the present application provides a cross-version communication method based on chat room audio, an audio system includes a voice server and a plurality of terminals, and the plurality of terminals at least includes a first terminal with an old version client and a second terminal with a new version client, where the old version client adapts to a first sampling rate, the new version client adapts to a second sampling rate, and the first sampling rate is lower than the second sampling rate, and the method is applied to the voice server, and for a virtual chat room associated with the first terminal and the second terminal, the method includes: receiving first audio data sent by the first terminal, calling a first voice decoder to decode the first audio data to obtain first decoded data, and calling a second voice encoder to perform audio sampling of a second sampling rate on the first decoded data to obtain second audio data, so as to determine final audio data based on the second audio data and send the final audio data to the second terminal; or, receiving third audio data sent by the second terminal, calling a second voice decoder to decode the third audio data to obtain second decoded data, and calling a first voice encoder to perform audio sampling with a first sampling rate on the second decoded data to obtain fourth audio data, so as to determine final audio data based on the fourth audio data and send the final audio data to the first terminal.

In the embodiment of the application, the cross-version communication method based on chat room audio runs on the voice server, so that cross-version communication between new version and old version clients can be realized. The higher voice audio sampling rate and the updated codec introduced during the audio system upgrade (in which the old version audio parameters cannot be modified) give users of the new version client a better experience. Meanwhile, the voice server configures the audio parameters (the old version client adapts to the first sampling rate and the new version client adapts to the second sampling rate) and encodes, decodes and resamples the audio, so that calls among new version clients, among old version clients, and between new and old version clients are all supported. Moreover, gray verification can be carried out conveniently by configuring, on a single server, the audio parameters corresponding to the new version client, so that the audio parameters of a room (virtual chat room) can be configured on that basis and client audio parameters can be upgraded iteratively and seamlessly across versions. In addition, the new sampling rate and the new codec can be verified with only a portion of new version client users; if a released new version client proves problematic, it can conveniently fall back to the old version audio parameters and continue to work normally, so that the audio experience of the new version client is improved without affecting the use of the old version client. Finally, because decoding and resampling are performed on the voice server, no additional configuration of the client is required.
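As a rough illustration of the decode, resample and re-encode path of the first aspect, the following Python sketch converts audio from the first sampling rate to the second. The concrete rates (8000 Hz and 16000 Hz), the linear-interpolation resampler, and the raw 16-bit PCM stand-in codec are all assumptions made for the example; the application does not name specific rates or codecs.

```python
import struct
from typing import Callable, List

FIRST_RATE = 8000    # assumed value for the first (old version) sampling rate
SECOND_RATE = 16000  # assumed value for the second (new version) sampling rate

Pcm = List[int]  # decoded samples as 16-bit signed integers


def encode_pcm16(samples: Pcm) -> bytes:
    """Hypothetical stand-in for a voice encoder: packs little-endian int16."""
    return struct.pack(f"<{len(samples)}h", *samples)


def decode_pcm16(packet: bytes) -> Pcm:
    """Hypothetical stand-in for a voice decoder: unpacks little-endian int16."""
    return list(struct.unpack(f"<{len(packet) // 2}h", packet))


def resample_linear(samples: Pcm, src_rate: int, dst_rate: int) -> Pcm:
    """Resample by linear interpolation (illustrative only; a real
    downsampler would low-pass filter first to avoid aliasing)."""
    if src_rate == dst_rate or not samples:
        return list(samples)
    out_len = len(samples) * dst_rate // src_rate
    out: Pcm = []
    for i in range(out_len):
        pos = i * src_rate / dst_rate          # position in the source signal
        left = int(pos)
        right = min(left + 1, len(samples) - 1)
        frac = pos - left
        out.append(round(samples[left] * (1 - frac) + samples[right] * frac))
    return out


def transcode(packet: bytes,
              decode: Callable[[bytes], Pcm],
              encode: Callable[[Pcm], bytes],
              src_rate: int,
              dst_rate: int) -> bytes:
    """Decode with the sender's decoder, resample, encode with the receiver's encoder."""
    decoded = decode(packet)
    return encode(resample_linear(decoded, src_rate, dst_rate))


# First terminal -> second terminal: first audio data becomes second audio data.
first_audio = encode_pcm16([0, 100, 200, 300])
second_audio = transcode(first_audio, decode_pcm16, encode_pcm16,
                         FIRST_RATE, SECOND_RATE)
```

Swapping the rate arguments gives the opposite direction (third audio data to fourth audio data). In a real deployment the two callables would wrap the actual first and second voice codecs.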

With reference to the first aspect, in a first possible implementation manner of the first aspect, the virtual chat room further has an additional terminal associated therewith, where the additional terminal is homogenous with the first terminal, and before determining final audio data based on the second audio data and sending the final audio data to the second terminal, the method further includes: receiving fifth audio data sent by the additional terminal, calling a first voice decoder to decode the fifth audio data to obtain third decoded data, and calling a second voice encoder to perform audio sampling of a second sampling rate on the third decoded data to obtain sixth audio data; correspondingly, determining final audio data based on the second audio data and sending the final audio data to the second terminal, including: and mixing the second audio data with the sixth audio data to obtain final audio data, and sending the final audio data to the second terminal.

In this implementation manner, the virtual chat room is further associated with an additional terminal, and the additional terminal is homogenous with the first terminal (i.e. adapts to the first sampling rate), so that the virtual chat room belongs to a multi-user chat scene, and can perform corresponding audio mixing operation, on one hand, the virtual chat room can be matched with an actual chat scene, so as to improve multi-user chat experience, and on the other hand, the data amount required to be received when the second terminal is used as a receiving end can be reduced, and the final audio data is processed into a form (without additional codec, resampling and audio mixing processing performed by the second terminal) convenient for the second terminal.
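The mixing step can be sketched as a sample-wise sum with saturation. This is a minimal illustration that assumes both inputs are already decoded to 16-bit PCM at the same (second) sampling rate; the function name `mix` and the clipping strategy are choices made for the example and are not prescribed by the application.

```python
from itertools import zip_longest
from typing import List

INT16_MIN, INT16_MAX = -32768, 32767


def mix(*streams: List[int]) -> List[int]:
    """Mix PCM streams that share one sampling rate.

    Samples are summed position by position; shorter streams are padded
    with silence (0) and the sum is clipped to the int16 range.
    """
    mixed: List[int] = []
    for frame in zip_longest(*streams, fillvalue=0):
        total = sum(frame)
        mixed.append(max(INT16_MIN, min(INT16_MAX, total)))
    return mixed


# Hypothetical decoded samples standing in for the second and sixth audio data.
second_audio_pcm = [1000, 30000, -30000]
sixth_audio_pcm = [2000, 10000, -10000, 5]
final_audio_pcm = mix(second_audio_pcm, sixth_audio_pcm)
```

Because the server delivers one mixed stream, the second terminal receives a single stream regardless of how many other terminals are speaking.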

With reference to the first aspect, in a second possible implementation manner of the first aspect, the virtual chat room further has an additional terminal associated therewith, where the additional terminal is homogenous with the first terminal, and before determining final audio data based on the fourth audio data and sending the final audio data to the first terminal, the method further includes: receiving fifth audio data sent by the additional terminal; correspondingly, determining final audio data based on the fourth audio data and sending the final audio data to the first terminal, including: and mixing the fourth audio data with the fifth audio data to obtain final audio data, and sending the final audio data to the first terminal.

In this implementation manner, the virtual chat room is further associated with an additional terminal, and the additional terminal is homogenous with the first terminal (i.e. adapts to the first sampling rate), so that the virtual chat room belongs to a multi-user chat scene, and can perform corresponding audio mixing operation, on one hand, the virtual chat room can be matched with an actual chat scene, so as to improve multi-user chat experience, and on the other hand, the data amount required to be received when the first terminal is used as a receiving end can be reduced, and the final audio data is processed into a form (without additional codec, resampling and audio mixing processing performed by the first terminal) convenient for the first terminal.

With reference to the first aspect, in a third possible implementation manner of the first aspect, the virtual chat room further has an additional terminal associated therewith, where the additional terminal is homogenous with the second terminal, and before determining final audio data based on the second audio data and sending the final audio data to the second terminal, the method further includes: receiving seventh audio data sent by the additional terminal; correspondingly, determining final audio data based on the second audio data and sending the final audio data to the second terminal, including: and mixing the second audio data with the seventh audio data to obtain final audio data, and sending the final audio data to the second terminal.

In this implementation manner, the virtual chat room is further associated with an additional terminal, and the additional terminal is homogenous with the second terminal (i.e. adapts to the second sampling rate), so that the virtual chat room belongs to a multi-user chat scene, and can perform corresponding audio mixing operation, on one hand, the virtual chat room can be matched with an actual chat scene, so as to improve multi-user chat experience, and on the other hand, the data amount required to be received when the second terminal is used as a receiving end can be reduced, and the final audio data is processed into a form (without additional codec, resampling and audio mixing processing performed by the second terminal) convenient for the second terminal.

With reference to the first aspect, in a fourth possible implementation manner of the first aspect, the virtual chat room further has an additional terminal associated therewith, where the additional terminal is homogenous with the second terminal, and before determining final audio data based on the fourth audio data and sending the final audio data to the first terminal, the method further includes: receiving seventh audio data sent by the additional terminal, calling a second voice decoder to decode the seventh audio data to obtain fourth decoded data, and calling a first voice encoder to perform audio sampling of a first sampling rate on the fourth decoded data to obtain eighth audio data; correspondingly, determining final audio data based on the fourth audio data and sending the final audio data to the first terminal, including: and mixing the fourth audio data with the eighth audio data to obtain final audio data, and sending the final audio data to the first terminal.

In this implementation manner, the virtual chat room is further associated with an additional terminal, and the additional terminal is homogenous with the second terminal (i.e. adapts to the second sampling rate), so that the virtual chat room belongs to a multi-user chat scene, and can perform corresponding audio mixing operation, on one hand, the virtual chat room can be matched with an actual chat scene, so as to improve multi-user chat experience, and on the other hand, the data amount required to be received when the first terminal is used as a receiving end can be reduced, and the final audio data is processed into a form (without additional codec, resampling and audio mixing processing performed by the first terminal) which is convenient for the first terminal.
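The four multi-user implementations above differ only in whether the additional terminal's stream must be converted before mixing. A small sketch of that rule, assuming hypothetical sender labels and 8000 Hz / 16000 Hz as the two rates (neither is specified in the application):

```python
from typing import Dict, List, Tuple


def plan_for_receiver(receiver_rate: int,
                      sender_rates: Dict[str, int]) -> List[Tuple[str, str]]:
    """For one receiving terminal, decide per sender whether its stream needs
    decode + resample + re-encode ("transcode") or can be mixed as received
    ("pass-through")."""
    plan: List[Tuple[str, str]] = []
    for sender, rate in sender_rates.items():
        action = "pass-through" if rate == receiver_rate else "transcode"
        plan.append((sender, action))
    return plan


# Third implementation: receiver is the second terminal (second rate); the
# additional terminal is homogenous with it, so only the first terminal's
# stream is transcoded before mixing.
plan = plan_for_receiver(16000, {"first_terminal": 8000,
                                 "additional_terminal": 16000})
```

The same function covers the other three implementations by changing the receiver rate and the additional terminal's rate.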

With reference to the first aspect, in a fifth possible implementation manner of the first aspect, the audio system further includes a management server, and the new version client can also adapt to the first sampling rate. Before a pending terminal with a built-in new version client establishes an association with any virtual chat room, the method further includes: acquiring a first request or a second request sent by the management server, wherein the first request indicates that the management server requests adaptation of the first sampling rate for the pending terminal, and the second request indicates that the management server requests adaptation of the second sampling rate for the pending terminal; based on the first request, configuring a first voice encoder and a first voice decoder corresponding to the first sampling rate for the pending terminal, whereupon the pending terminal is regarded as homogenous with a first terminal with a built-in old version client; and, based on the second request, configuring a second voice encoder and a second voice decoder corresponding to the second sampling rate for the pending terminal, whereupon the pending terminal is a second terminal with a built-in new version client.

In this implementation manner, the first request (the management server requests adaptation of the first sampling rate for the pending terminal) or the second request (the management server requests adaptation of the second sampling rate for the pending terminal) sent by the management server is acquired, and the corresponding configuration (configuring the voice encoder and voice decoder corresponding to the sampling rate) is performed based on that request, which facilitates compatible calls between users of the new and old version clients.

With reference to the fifth possible implementation manner of the first aspect, in a sixth possible implementation manner of the first aspect, before the association between the pending terminal and the virtual chat room is established, the management server determines whether to adapt the first sampling rate or the second sampling rate for the pending terminal by: judging whether the virtual chat room can currently adapt to the second sampling rate; if the virtual chat room cannot currently adapt to the second sampling rate, generating a first request for adapting the first sampling rate for the pending terminal; if the virtual chat room can currently adapt to the second sampling rate, judging whether the model or the unique number of the pending terminal appears in a preset blacklist; if so, generating a first request for adapting the first sampling rate for the pending terminal, and if not, generating a second request for adapting the second sampling rate for the pending terminal.

In this implementation manner, the choice between the first sampling rate and the second sampling rate is based on two checks: whether the virtual chat room can currently adapt to the second sampling rate, and whether the model or the unique number of the pending terminal appears in the preset blacklist. Both the actual state of the virtual chat room and the state of the pending terminal are thereby taken into account, so that the most suitable sampling rate adaptation request is determined, which helps improve the user experience.
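The decision made by the management server in the sixth implementation can be sketched as follows. The class and function names, the string labels for the two requests, and the representation of the blacklist as a set of strings are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class PendingTerminal:
    model: str          # device model (hypothetical field name)
    unique_number: str  # device unique number (hypothetical field name)


def choose_request(room_supports_second_rate: bool,
                   terminal: PendingTerminal,
                   blacklist: FrozenSet[str]) -> str:
    """Return which request the management server would generate."""
    # Room check comes first: a room that cannot use the second rate
    # always yields the first request.
    if not room_supports_second_rate:
        return "first_request"
    # Blacklisted models or unique numbers stay on the first rate.
    if terminal.model in blacklist or terminal.unique_number in blacklist:
        return "first_request"
    return "second_request"


blacklist = frozenset({"model-x", "device-0042"})  # hypothetical entries
ok_terminal = PendingTerminal(model="model-y", unique_number="device-0001")
bad_terminal = PendingTerminal(model="model-x", unique_number="device-0002")
```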

In a second aspect, an embodiment of the present application provides an audio system based on chat room audio, including a voice server and a plurality of terminals, where the plurality of terminals at least includes a first terminal with an old version client and a second terminal with a new version client, where the old version client is adapted to a first sampling rate, the new version client is adapted to a second sampling rate, the first sampling rate is lower than the second sampling rate, and the first terminal is configured to perform audio sampling based on the first sampling rate, obtain first audio data, and send the first audio data to the voice server; the voice server is used for receiving first audio data sent by the first terminal, calling a first voice decoder to decode the first audio data to obtain first decoded data, calling a second voice encoder to conduct audio sampling with a second sampling rate on the first decoded data to obtain second audio data, and determining final audio data based on the second audio data and sending the final audio data to the second terminal; or the second terminal is used for carrying out audio sampling based on the second sampling rate to obtain third audio data and sending the third audio data to the voice server; the voice server is configured to receive third audio data sent by the second terminal, call a second voice decoder to decode the third audio data to obtain second decoded data, and call a first voice encoder to sample audio of a first sampling rate of the second decoded data to obtain fourth audio data, so as to determine final audio data based on the fourth audio data and send the final audio data to the first terminal.

In a third aspect, an embodiment of the present application provides a storage medium, where the storage medium includes a stored program, where the program when executed controls a device in which the storage medium is located to perform the cross-version communication method based on chat room audio according to any one of the first aspect or the possible implementation manners of the first aspect.

In a fourth aspect, an embodiment of the present application provides a voice server, including a memory and a processor, where the memory is configured to store information including program instructions, and the processor is configured to control execution of the program instructions, where the program instructions, when loaded and executed by the processor, implement the chat room audio-based cross-version communication method according to the first aspect or any one of the possible implementation manners of the first aspect.

In order to make the above objects, features and advantages of the present application more comprehensible, preferred embodiments accompanied with figures are described in detail below.

Drawings

In order to illustrate the technical solutions of the embodiments of the present application more clearly, the drawings needed in the embodiments are briefly described below. It should be understood that the following drawings illustrate only some embodiments of the present application and should not be considered as limiting its scope; a person skilled in the art can obtain other related drawings from these drawings without inventive effort.

Fig. 1 is a schematic structural diagram of an audio system based on chat room audio according to an embodiment of the present application.

Fig. 2 is a timing chart of a voice server according to an embodiment of the present application when applying a cross-version communication method based on chat room audio.

Fig. 3 is a block diagram of a voice server according to an embodiment of the present application.

Reference numerals: 100 - audio system; 111 - first terminal; 112 - second terminal; 120 - voice server; 121 - memory; 122 - communication unit; 123 - bus; 124 - processor; 130 - management server.

Detailed Description

The technical solutions in the embodiments of the present application will be described below with reference to the accompanying drawings in the embodiments of the present application.

Referring to fig. 1, fig. 1 is a schematic diagram of an audio system 100 based on chat room audio according to an embodiment of the application.

In this embodiment, the chat room audio based audio system 100 may include a voice server 120 and a plurality of terminals. A terminal may have a built-in old version client or a built-in new version client. The old version client adapts to the first sampling rate; the new version client adapts to the second sampling rate and can also adapt to the first sampling rate. The first sampling rate is lower than the second sampling rate.

For easy understanding and description, the present solution distinguishes between different terminals in the following way:

A terminal with a built-in old version client is taken as the first terminal 111. The first terminal 111 can only adapt to the first sampling rate, and correspondingly the voice server 120 configures a first voice encoder and a first voice decoder for it.

A terminal with a built-in new version client is capable of adapting to either the first sampling rate or the second sampling rate, but a given terminal can adapt to only one sampling rate at a time. A terminal with a built-in new version client whose sampling rate has not yet been determined is therefore called a pending terminal.

A terminal with a built-in new version client that has been determined to adapt to the first sampling rate is regarded as homogenous with the first terminal 111 (i.e. it can be treated as a first terminal 111) until that sampling rate is revoked or changed. In this case the voice server 120 configures the first voice encoder and the first voice decoder for it.

A terminal with a built-in new version client that has been determined to adapt to the second sampling rate is referred to as the second terminal 112 until that sampling rate is revoked or changed. In this case the voice server 120 configures the second voice encoder and the second voice decoder for it.
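The terminal categories above can be summarized in a short sketch. The enum, the function name, and the use of the strings "first" and "second" to denote the adapted sampling rate are illustrative assumptions.

```python
from enum import Enum
from typing import Optional


class Kind(Enum):
    FIRST = "first terminal"      # old client, or new client on the first rate
    SECOND = "second terminal"    # new client on the second rate
    PENDING = "pending terminal"  # new client, rate not yet determined


def classify(is_new_client: bool, adapted_rate: Optional[str]) -> Kind:
    """Map a terminal to its category.

    adapted_rate is "first", "second", or None when no sampling rate has
    been determined yet (or the previous one was revoked).
    """
    if not is_new_client:
        return Kind.FIRST  # an old version client only supports the first rate
    if adapted_rate is None:
        return Kind.PENDING
    return Kind.FIRST if adapted_rate == "first" else Kind.SECOND
```

For instance, `classify(True, "first")` yields `Kind.FIRST`, reflecting that a new version client adapted to the first sampling rate is treated like a first terminal 111.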

In the virtual chat room, a plurality of terminals may be associated, and the present solution mainly solves the problem of compatible communication between the new and old version clients in the seamless upgrade process of the audio system 100, so the description will mainly be given by taking the virtual chat room associated with the first terminal 111 and the second terminal 112 as an example.

But before introducing communication between different terminals associated with a virtual chat room, the process of joining a terminal into the virtual chat room (establishing an association with the virtual chat room) is described herein.

In this embodiment, in order to facilitate management of the sample rate configuration of the terminal, the chat room audio based audio system 100 may further comprise a management server 130.

The sample rate configuration (also called configuration of audio parameters) may be based on the following rules before a terminal joins a virtual chat room:

If the terminal to be added to the virtual chat room is the first terminal 111 (i.e. a terminal with a built-in old version client), it can only adapt to the first sampling rate. Therefore, when the first terminal 111 joins the virtual chat room, it does not need to send a configuration request to the management server 130; the voice server 120 simply configures for it the first voice encoder and the first voice decoder corresponding to the default first sampling rate. Alternatively, the first terminal 111 may send an audio parameter request to the management server 130, and the management server 130 then sends a corresponding configuration request to the voice server 120, so that the voice server 120 configures the first voice encoder and the first voice decoder corresponding to the first sampling rate for the first terminal 111 based on that request; this is not limited herein.

If the terminal to be added to the virtual chat room is a pending terminal with a built-in new version client, the pending terminal may send a configuration request to the management server 130. Based on the configuration request, the management server 130 can determine whether the virtual chat room can currently adapt to the second sampling rate.

For example, the management server 130 may obtain the recommended sampling rate of the virtual chat room (that is, whether a joining terminal is recommended to be configured with the first sampling rate or the second sampling rate), and determine from it whether the virtual chat room can currently adapt to the second sampling rate.

Here, the recommended sampling rate may be decided by the voice server 120 according to the first terminal to enter the virtual chat room. If that terminal is a pending terminal with a built-in new version client (which is configured with the second sampling rate after entering and is then referred to as a second terminal 112), the second voice encoder and the second voice decoder corresponding to the second sampling rate are recommended (i.e., the recommended sampling rate is the second sampling rate). If it is a first terminal 111 with a built-in old version client, the first voice encoder and the first voice decoder corresponding to the first sampling rate are recommended (i.e., the recommended sampling rate is the first sampling rate). Of course, this manner of determining the recommended sampling rate is merely exemplary and should not be considered as limiting the application. For example, the recommended sampling rate may instead be determined by checking whether the number of first terminals 111 (with a built-in old version client) currently in the virtual chat room is zero; or it may be set manually by the user (for example, selected when creating the virtual chat room). This is not limited herein.
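Two of the strategies mentioned above can be sketched as small functions. The function names and the "first"/"second" labels are hypothetical. The text does not spell out the outcome of the count-based strategy, so the mapping used here (no old version terminals present means the second rate is recommended) is an assumption.

```python
def recommend_by_first_entrant(first_entrant_is_new_client: bool) -> str:
    """Recommend a rate from the first terminal that entered the room."""
    return "second" if first_entrant_is_new_client else "first"


def recommend_by_old_terminal_count(old_terminal_count: int) -> str:
    """Recommend a rate from the number of old version terminals present.

    Assumption: the second rate is recommended only when no first
    terminals (old version clients) are currently in the room.
    """
    return "second" if old_terminal_count == 0 else "first"
```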

If the virtual chat room is not currently adaptable to the second sampling rate, the management server 130 may generate a first request requesting adaptation of the first sampling rate for the pending terminal.

If the virtual chat room is currently adaptable to the second sampling rate, the management server 130 may further obtain the model (or the unique number) of the pending terminal, and determine whether the model (or the unique number) of the pending terminal belongs to the model (or the unique number) in the preset blacklist. If so, the management server 130 may generate a first request to adapt the first sampling rate for the pending terminal, and if not, the management server 130 may generate a second request to adapt the second sampling rate for the pending terminal. Here, the first request indicates that the management server 130 requests the voice server 120 to adapt the first sampling rate for the pending terminal, and the second request indicates that the management server 130 requests the voice server 120 to adapt the second sampling rate for the pending terminal.

The choice between the first sampling rate and the second sampling rate is thus based on whether the virtual chat room can currently adapt to the second sampling rate and on whether the model or the unique number of the pending terminal appears in the preset blacklist. Both the actual state of the virtual chat room and the state of the pending terminal are taken into account, so that the most suitable sampling rate adaptation request is determined, which helps improve the user experience.

After the management server 130 generates the first request or the second request, it may transmit that request to the voice server 120.

Then, the voice server 120 may obtain the first request or the second request transmitted by the management server 130. The voice server 120 may configure a first voice encoder and a first voice decoder corresponding to the first sampling rate for the pending terminal based on the first request, where the pending terminal is considered to be homogenous with the first terminal 111 of the legacy client. The voice server 120 may configure a second voice encoder and a second voice decoder corresponding to the second sampling rate for the pending terminal based on the second request, where the pending terminal is the second terminal 112 with the new version client.

By acquiring the first request (the management server 130 requests adaptation of the first sampling rate for the pending terminal) or the second request (the management server 130 requests adaptation of the second sampling rate for the pending terminal) sent by the management server 130, and performing the corresponding configuration (configuring the voice encoder and voice decoder corresponding to the sampling rate) based on that request, compatible communication between users of the new and old version clients is facilitated.

Of course, in order to cope with an emergency (e.g., the released new version client has a problem and a rollback to the old version audio parameters is required), a "switch" (essentially a parameter) may be configured in the voice server 120 to determine whether the voice server 120 enables the audio parameters corresponding to the second sampling rate (i.e., whether the second voice encoder and the second voice decoder corresponding to the second sampling rate may be configured). If the "switch" is turned off, the voice server 120 returns the audio parameters for the first sampling rate (the first voice encoder and the first voice decoder corresponding to the first sampling rate), regardless of whether the client is an old version client or a new version client.
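A minimal sketch of how the voice server might combine the incoming request with the "switch" is given below. The function name, the request labels, and the shape of the returned configuration are assumptions for illustration only.

```python
from typing import Dict


def configure_codecs(request: str, second_rate_enabled: bool) -> Dict[str, str]:
    """Pick the codec pair for a joining terminal.

    request is "first_request" or "second_request" (hypothetical labels).
    When the switch (second_rate_enabled) is off, every terminal falls back
    to the first sampling rate, whatever was requested.
    """
    use_second = request == "second_request" and second_rate_enabled
    rate = "second" if use_second else "first"
    return {
        "sampling_rate": rate,
        "encoder": f"{rate}_voice_encoder",
        "decoder": f"{rate}_voice_decoder",
    }


normal = configure_codecs("second_request", second_rate_enabled=True)
rolled_back = configure_codecs("second_request", second_rate_enabled=False)
```

Keeping the fallback on the server side means a rollback needs only a parameter change on the voice server rather than a new client release.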

It should be noted that, in addition to the foregoing configuration of the sampling rates of the first terminal 111 and the pending terminal, the corresponding configuration parameters need to be returned to the corresponding terminal in order to make the terminal sample based on the corresponding configuration parameters. For example, for the first terminal 111, the voice server 120 returns the configuration parameter including the first sampling rate, so that when the first terminal 111 collects voice information, sampling is performed with the first sampling rate, and corresponding audio data is obtained.

The above describes the process by which a terminal joins a virtual chat room; the communication process between different terminals associated with the virtual chat room is described below.

Referring to fig. 2, fig. 2 is a sequence diagram of the voice server 120 applying the cross-version communication method based on chat room audio according to an embodiment of the application.

In this embodiment, for the virtual chat room associated with the first terminal 111 and the second terminal 112, the cross-version communication method based on chat room audio can be described as two procedures:

In the first procedure, the first terminal 111 is the transmitting end and the second terminal 112 is the receiving end:

The simplest case, i.e. the case where the second terminal 112 receives only the audio data of the first terminal 111, will be discussed here first:

the first terminal 111 may perform audio sampling based on the first sampling rate, obtain first audio data, and transmit the first audio data to the voice server 120.

The voice server 120 may receive the first audio data sent by the first terminal 111, call the first voice decoder to decode the first audio data to obtain first decoded data, and call the second voice encoder to perform audio sampling at the second sampling rate on the first decoded data to obtain second audio data. The voice server 120 may then determine final audio data based on the second audio data and transmit the final audio data to the second terminal 112.

Since the audio data that the second terminal 112 needs to receive here is derived from the first terminal 111 only, the voice server 120 can take the second audio data as final audio data and transmit this final audio data to the second terminal 112.
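
The first procedure (decode, resample to the second sampling rate, re-encode) can be sketched as below. This is a simplified illustration, not the codec used by the embodiment: raw 16-bit little-endian PCM packing stands in for the first voice decoder and second voice encoder, linear interpolation stands in for the resampler, and the rates of 16 kHz and 48 kHz are assumed.

```python
import struct
from typing import List


def decode_pcm16(payload: bytes) -> List[int]:
    """Stand-in 'decoder': unpack 16-bit little-endian PCM samples."""
    count = len(payload) // 2
    return list(struct.unpack(f"<{count}h", payload[: count * 2]))


def encode_pcm16(samples: List[int]) -> bytes:
    """Stand-in 'encoder': pack samples as 16-bit little-endian PCM."""
    return struct.pack(f"<{len(samples)}h", *samples)


def resample_linear(samples: List[int], src_rate: int, dst_rate: int) -> List[int]:
    """Resample by linear interpolation between neighbouring samples."""
    if not samples:
        return []
    out_len = len(samples) * dst_rate // src_rate
    out = []
    for i in range(out_len):
        pos = i * src_rate / dst_rate          # position in the source signal
        left = int(pos)
        right = min(left + 1, len(samples) - 1)
        frac = pos - left
        out.append(round(samples[left] * (1 - frac) + samples[right] * frac))
    return out


def first_to_second(first_audio: bytes, first_rate: int = 16_000,
                    second_rate: int = 48_000) -> bytes:
    """First audio data -> first decoded data -> second audio data."""
    first_decoded = decode_pcm16(first_audio)
    return encode_pcm16(resample_linear(first_decoded, first_rate, second_rate))
```

In the single-sender case the returned bytes would serve directly as the final audio data sent to the second terminal 112.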

In the second procedure, the second terminal 112 is the transmitting end and the first terminal 111 is the receiving end:

The simplest case, i.e. the case where the first terminal 111 receives only the audio data of the second terminal 112, will also be discussed here first:

The second terminal 112 may perform audio sampling based on the second sampling rate to obtain third audio data, and send the third audio data to the voice server 120.

The voice server 120 may receive the third audio data sent by the second terminal 112, call the second voice decoder to decode the third audio data to obtain second decoded data, and call the first voice encoder to perform audio sampling at the first sampling rate on the second decoded data to obtain fourth audio data. Then, the voice server 120 may determine final audio data based on the fourth audio data and transmit the final audio data to the first terminal 111.

Here, the audio data that the first terminal 111 needs to receive is derived from the second terminal 112 only, and thus the voice server 120 may take the fourth audio data as the final audio data and transmit this final audio data to the first terminal 111.
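
For the reverse direction, the resampling step lowers the rate of the second decoded data before the first voice encoder is applied. A minimal sketch of that step is shown below; it assumes the second sampling rate is an integer multiple of the first and uses block averaging as a crude substitute for a proper low-pass filter. The function name and the example rates are hypothetical.

```python
from typing import List


def downsample_by_averaging(samples: List[int], src_rate: int, dst_rate: int) -> List[int]:
    """Reduce the sampling rate by averaging each block of `factor` samples."""
    if dst_rate <= 0 or src_rate % dst_rate != 0:
        raise ValueError("this sketch only handles integer rate ratios")
    factor = src_rate // dst_rate
    usable = len(samples) - len(samples) % factor  # drop an incomplete tail block
    return [
        round(sum(samples[i:i + factor]) / factor)
        for i in range(0, usable, factor)
    ]
```

With assumed rates of 48 000 Hz and 16 000 Hz, every three decoded samples from the second terminal 112 collapse into one sample for the first terminal 111.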

Note that, in this embodiment, communication between any two terminals, or between any terminal and the voice server 120, is performed based on the RTP protocol. In other implementations, other communication protocols may be used, which is not limited herein.
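
To make the transport concrete, the following sketch packs and parses the 12-byte fixed RTP header defined in RFC 3550 (version 2, no padding, no extension, no CSRC entries). The helper names are hypothetical, and how the embodiment actually fills the payload type, sequence number, timestamp and SSRC fields is not stated in the text.

```python
import struct


def build_rtp_packet(payload_type: int, sequence: int, timestamp: int,
                     ssrc: int, payload: bytes, marker: bool = False) -> bytes:
    """Prepend a minimal RTP fixed header (RFC 3550) to an audio payload."""
    byte0 = 2 << 6  # version=2, padding=0, extension=0, CSRC count=0
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    header = struct.pack("!BBHII", byte0, byte1,
                         sequence & 0xFFFF,
                         timestamp & 0xFFFFFFFF,
                         ssrc & 0xFFFFFFFF)
    return header + payload


def parse_rtp_header(packet: bytes) -> dict:
    """Read back the fixed header fields and the payload."""
    byte0, byte1, sequence, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    return {
        "version": byte0 >> 6,
        "marker": bool(byte1 >> 7),
        "payload_type": byte1 & 0x7F,
        "sequence": sequence,
        "timestamp": timestamp,
        "ssrc": ssrc,
        "payload": packet[12:],
    }
```

Here the payload would carry one frame of encoded audio, for instance the first audio data on the way to the voice server 120.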

Since more than two terminals in a virtual chat room may be communicating at the same time (i.e., more than two users in the chat room are engaged in voice chat simultaneously), a slightly more complex situation is described below. From the viewpoint of the second terminal 112 as a receiving end, no matter how many users are chatting at the same time, the second terminal 112 simply receives more than one piece of voice content at the same time, so mixing processing is required.

Therefore, the concept of an additional terminal is introduced in this embodiment to facilitate the description of the case where the second terminal 112, as the receiving end, needs to receive the voice content of a plurality of different terminals.

In this embodiment, the virtual chat room is further associated with an additional terminal. Only one additional terminal is introduced here, and the cases in which it belongs to different types are described separately. This is sufficient to illustrate the communication process when a larger number of terminals is involved, because once the type of a terminal is determined, the processing is the same regardless of the number of terminals.

First, if the additional terminal is homogeneous with (i.e., of the same type as) the first terminal 111, taking the second terminal 112 as the receiving end as an example:

before the voice server 120 determines the final audio data based on the second audio data and transmits the final audio data to the second terminal 112, the voice server 120 also needs to receive the fifth audio data transmitted by the additional terminal (the fifth audio data is obtained by audio sampling by the additional terminal based on the first sampling rate, and the receiving period of the fifth audio data is the same as the receiving period of the first audio data). Then, the voice server 120 may invoke the first voice decoder to decode the fifth audio data to obtain third decoded data, and invoke the second voice encoder to perform audio sampling at the second sampling rate on the third decoded data to obtain sixth audio data.

Then, the voice server 120 may mix the second audio data with the sixth audio data to obtain final audio data, and transmit the final audio data to the second terminal 112.
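
The mixing step can be illustrated with a simple sketch that sums time-aligned samples and clips the result to the 16-bit range. It assumes mixing operates on PCM samples that are already at the receiver's sampling rate and belong to the same receiving period; the text does not specify the mixing algorithm, so additive mixing with clipping is an assumption of this example.

```python
from typing import List

INT16_MIN, INT16_MAX = -32768, 32767


def mix_pcm16(*streams: List[int]) -> List[int]:
    """Sum aligned samples from several streams and clip to the int16 range."""
    if not streams:
        return []
    length = max(len(stream) for stream in streams)
    mixed = []
    for i in range(length):
        # A shorter stream simply contributes nothing past its end.
        total = sum(stream[i] for stream in streams if i < len(stream))
        mixed.append(max(INT16_MIN, min(INT16_MAX, total)))
    return mixed
```

In this scenario the two inputs would correspond to the samples behind the second audio data and the sixth audio data, and the output to the final audio data for the second terminal 112.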

In this way, on the one hand, the method can be matched with an actual chat scene, so as to improve the multi-user chat experience, and on the other hand, the method can also reduce the data amount required to be received when the second terminal 112 is used as a receiving end, and process the final audio data into a form which is convenient for the second terminal 112 to use (no additional codec, resampling and mixing processing is required by the second terminal 112).

Next, if the additional terminal is homogeneous with the first terminal 111, taking the first terminal 111 as the receiving end as an example:

before the voice server 120 determines the final audio data based on the fourth audio data and transmits the final audio data to the first terminal 111, the voice server 120 also needs to receive the fifth audio data transmitted by the additional terminal (the fifth audio data is obtained by audio sampling by the additional terminal based on the first sampling rate, and the receiving period of the fifth audio data is the same as the receiving period of the third audio data).

Then, the voice server 120 may mix the fourth audio data with the fifth audio data to obtain final audio data, and transmit the final audio data to the first terminal 111.

In this way, on the one hand, the method can be matched with an actual chat scene, so as to improve the multi-user chat experience, and on the other hand, the method can also reduce the data amount required to be received when the first terminal 111 is used as a receiving end, and process the final audio data into a form which is convenient for the first terminal 111 to use (no additional codec, resampling and mixing processing is required by the first terminal 111).

Furthermore, if the additional terminal is homogeneous with the second terminal 112, taking the second terminal 112 as the receiving end as an example:

before the voice server 120 determines the final audio data based on the second audio data and transmits the final audio data to the second terminal 112, the voice server 120 also needs to receive the seventh audio data transmitted by the additional terminal (the seventh audio data is obtained by audio sampling by the additional terminal based on the second sampling rate, and the receiving period of the seventh audio data is the same as the receiving period of the first audio data).

Then, the voice server 120 may mix the second audio data with the seventh audio data to obtain final audio data, and transmit the final audio data to the second terminal 112.

Finally, if the additional terminal is homogeneous with the second terminal 112, taking the first terminal 111 as the receiving end as an example:

before the voice server 120 determines the final audio data based on the fourth audio data and transmits the final audio data to the first terminal 111, the voice server 120 also needs to receive the seventh audio data transmitted by the additional terminal (the seventh audio data is obtained by audio sampling by the additional terminal based on the second sampling rate, and the receiving period of the seventh audio data is the same as the receiving period of the third audio data). Then, the voice server 120 may invoke the second voice decoder to decode the seventh audio data to obtain fourth decoded data, and invoke the first voice encoder to perform audio sampling at the first sampling rate on the fourth decoded data to obtain eighth audio data.

Then, the voice server 120 may mix the fourth audio data with the eighth audio data to obtain final audio data, and transmit the final audio data to the first terminal 111.
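
The four cases above follow one pattern: for a given receiver, each incoming stream is converted to the receiver's sampling rate only if its rate differs, and all streams are then mixed. A compact sketch of that pattern is given below. It works on decoded PCM samples, uses nearest-neighbour resampling purely for brevity, and all names and rates are hypothetical.

```python
from typing import List, Tuple

INT16_MIN, INT16_MAX = -32768, 32767


def resample_nearest(samples: List[int], src_rate: int, dst_rate: int) -> List[int]:
    """Nearest-neighbour resampling; streams already at dst_rate pass through."""
    if src_rate == dst_rate:
        return list(samples)
    out_len = len(samples) * dst_rate // src_rate
    return [samples[min(len(samples) - 1, i * src_rate // dst_rate)]
            for i in range(out_len)]


def final_audio_for(receiver_rate: int,
                    sources: List[Tuple[int, List[int]]]) -> List[int]:
    """Build the final samples for one receiver from (rate, samples) sources."""
    aligned = [resample_nearest(samples, rate, receiver_rate)
               for rate, samples in sources]
    length = max((len(stream) for stream in aligned), default=0)
    return [
        max(INT16_MIN, min(INT16_MAX,
                           sum(stream[i] for stream in aligned if i < len(stream))))
        for i in range(length)
    ]
```

For instance, with the second terminal 112 as receiver, a stream from a first-type terminal is resampled while a stream from a second-type additional terminal is mixed in unchanged, matching the cases described above.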

Referring to fig. 3, fig. 3 is a block diagram illustrating a voice server 120 according to an embodiment of the application.

In this embodiment, the voice server 120 may be a cloud server, a server cluster, or the like, which is not limited herein.

By way of example, the voice server 120 may include: a communication module 122 connected to the outside through a network, one or more processors 124 for executing program instructions, a bus 123, and memory 121 in various forms, such as a disk, ROM, or RAM, or any combination thereof. The memory 121, the communication module 122, and the processor 124 may be connected by the bus 123.

Illustratively, the memory 121 has a program stored therein. Processor 124 can call and run these programs from memory 121 so that cross-version communication methods based on chat room audio can be implemented by running the programs.

The embodiment of the application also provides a storage medium comprising a stored program, wherein, when the program runs, the device in which the storage medium resides is controlled to execute the cross-version communication method based on chat room audio of this embodiment.

In summary, the embodiments of the present application provide a cross-version communication method, system, medium and server based on chat room audio. By running the cross-version communication method based on chat room audio on the voice server 120, cross-version communication between new and old version clients can be realized, and the updated voice sampling rate and updated codec introduced when the audio system 100 is upgraded (the audio parameters of the old version cannot be modified) give users of the new version client a better experience. Meanwhile, the voice server 120 configures the audio parameters (the old version client is adapted to the first sampling rate and the new version client is adapted to the second sampling rate), and the server encodes, decodes and resamples the audio, so that calls between new version clients, between old version clients, and between new and old version clients are all supported. Moreover, grayscale verification (gradual rollout) can be carried out very conveniently by configuring, on a single server, the audio parameters corresponding to the new version client, so that the audio parameters of a room (virtual chat room) can be configured on that basis and the audio parameters of the client can be upgraded iteratively and seamlessly across versions. In addition, a subset of new version client users can be supported in verifying the new sampling rate and the new codec; if a problem arises in the release of the new version client, it is easy to roll back to the audio parameters of the old version client and continue normal use, so that the audio experience of the new version client is improved without affecting the use of the old version client. Furthermore, in this way the decoding and resampling of the audio are performed on the voice server 120, and no additional configuration of the client is required.

In this document, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions.

The above description is only an example of the present application and is not intended to limit the scope of the present application, and various modifications and variations will be apparent to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the protection scope of the present application.

Claims

1. A cross-version communication method based on chat room audio, wherein an audio system comprises a voice server and a plurality of terminals, and the plurality of terminals at least comprise a first terminal with an old version client and a second terminal with a new version client, the old version client is adapted to a first sampling rate, the new version client is adapted to a second sampling rate, and the first sampling rate is lower than the second sampling rate, the method is applied to the voice server, and aiming at a virtual chat room associated with the first terminal and the second terminal, the method comprises:

receiving first audio data sent by the first terminal, calling a first voice decoder to decode the first audio data to obtain first decoded data, and calling a second voice encoder to perform audio sampling of a second sampling rate on the first decoded data to obtain second audio data, so as to determine final audio data based on the second audio data and send the final audio data to the second terminal;

or, receiving third audio data sent by the second terminal, calling a second voice decoder to decode the third audio data to obtain second decoded data, and calling a first voice encoder to perform audio sampling with a first sampling rate on the second decoded data to obtain fourth audio data, so as to determine final audio data based on the fourth audio data and send the final audio data to the first terminal;

the audio system further comprises a management server, and the new version client is also adaptable to the first sampling rate; for any virtual chat room, before an undetermined terminal having the new version client built in establishes an association with that virtual chat room, the method further comprises:

acquiring a first request or a second request sent by the management server, wherein the first request represents that the management server requests to adapt a first sampling rate for the undetermined terminal, and the second request represents that the management server requests to adapt a second sampling rate for the undetermined terminal;

based on the first request, configuring a first voice encoder and a first voice decoder corresponding to a first sampling rate for the undetermined terminal, wherein the undetermined terminal is regarded as being homogeneous with a first terminal having a built-in old version client;

and configuring a second voice encoder and a second voice decoder corresponding to a second sampling rate for the undetermined terminal based on the second request, wherein the undetermined terminal is a second terminal with a built-in new version client.

2. The cross-version communication method based on chat room audio according to claim 1, wherein the virtual chat room is further associated with an additional terminal, the additional terminal being homogeneous with the first terminal, the method further comprising, before determining final audio data based on the second audio data and transmitting to the second terminal:

receiving fifth audio data sent by the additional terminal, calling a first voice decoder to decode the fifth audio data to obtain third decoded data, and calling a second voice encoder to perform audio sampling of a second sampling rate on the third decoded data to obtain sixth audio data;

correspondingly, determining final audio data based on the second audio data and sending the final audio data to the second terminal, including:

mixing the second audio data with the sixth audio data to obtain final audio data, and sending the final audio data to the second terminal.

3. The cross-version communication method based on chat room audio according to claim 1, wherein the virtual chat room is further associated with an additional terminal, the additional terminal being homogeneous with the first terminal, the method further comprising, before determining final audio data based on the fourth audio data and transmitting to the first terminal:

receiving fifth audio data sent by the additional terminal;

correspondingly, determining final audio data based on the fourth audio data and sending the final audio data to the first terminal, including:

and mixing the fourth audio data with the fifth audio data to obtain final audio data, and sending the final audio data to the first terminal.

4. The cross-version communication method based on chat room audio according to claim 1, wherein the virtual chat room is further associated with an additional terminal, the additional terminal being homogeneous with the second terminal, the method further comprising, prior to determining final audio data based on the second audio data and transmitting to the second terminal:

receiving seventh audio data sent by the additional terminal;

and mixing the second audio data with the seventh audio data to obtain final audio data, and sending the final audio data to the second terminal.

5. The cross-version communication method based on chat room audio according to claim 1, wherein the virtual chat room is further associated with an additional terminal, the additional terminal being homogeneous with the second terminal, the method further comprising, before determining final audio data based on the fourth audio data and transmitting to the first terminal:

receiving seventh audio data sent by the additional terminal, calling a second voice decoder to decode the seventh audio data to obtain fourth decoded data, and calling a first voice encoder to perform audio sampling of a first sampling rate on the fourth decoded data to obtain eighth audio data;

and mixing the fourth audio data with the eighth audio data to obtain final audio data, and sending the final audio data to the first terminal.

6. The cross-version communication method based on chat room audio according to claim 1, wherein the management server determines to adapt the first sampling rate or the second sampling rate for the pending terminal before the pending terminal establishes an association with the virtual chat room by:

judging whether the virtual chat room is applicable to the second sampling rate or not;

if the virtual chat room is not applicable to the second sampling rate at present, generating a first request for adapting the first sampling rate for the undetermined terminal;

if the virtual chat room can be currently adapted to the second sampling rate, judging whether the model or the unique number of the undetermined terminal belongs to the model or the unique number in the preset blacklist, if so, generating a first request for adapting the first sampling rate to the undetermined terminal, and if not, generating a second request for adapting the second sampling rate to the undetermined terminal.

7. An audio system based on chat room audio is characterized by comprising a voice server and a plurality of terminals, wherein the terminals at least comprise a first terminal internally provided with an old version client and a second terminal internally provided with a new version client, the old version client is adapted to a first sampling rate, the new version client is adapted to a second sampling rate, the first sampling rate is lower than the second sampling rate, and the audio system is used for a virtual chat room associated with the first terminal and the second terminal,

the first terminal is used for carrying out audio sampling based on a first sampling rate to obtain first audio data and sending the first audio data to the voice server;

the voice server is used for receiving first audio data sent by the first terminal, calling a first voice decoder to decode the first audio data to obtain first decoded data, calling a second voice encoder to conduct audio sampling with a second sampling rate on the first decoded data to obtain second audio data, and determining final audio data based on the second audio data and sending the final audio data to the second terminal; or,

the second terminal is used for carrying out audio sampling based on a second sampling rate to obtain third audio data and sending the third audio data to the voice server;

the voice server is used for receiving third audio data sent by the second terminal, calling a second voice decoder to decode the third audio data to obtain second decoded data, calling a first voice encoder to conduct audio sampling of a first sampling rate on the second decoded data to obtain fourth audio data, and determining final audio data based on the fourth audio data and sending the final audio data to the first terminal;

wherein the audio system further comprises a management server, and the new version client is further adaptable to the first sampling rate; for any virtual chat room, before a pending terminal having the new version client built in establishes an association with the virtual chat room,

the voice server is configured to obtain a first request or a second request sent by the management server, where the first request indicates that the management server requests to adapt a first sampling rate for the pending terminal, and the second request indicates that the management server requests to adapt a second sampling rate for the pending terminal; based on the first request, configure a first voice encoder and a first voice decoder corresponding to a first sampling rate for the pending terminal, wherein the pending terminal is regarded as being homogeneous with a first terminal having a built-in old version client; and, based on the second request, configure a second voice encoder and a second voice decoder corresponding to a second sampling rate for the pending terminal, wherein the pending terminal is a second terminal with a built-in new version client.

8. A storage medium comprising a stored program, wherein the program, when run, controls a device in which the storage medium resides to perform the cross-version communication method based on chat room audio of any of claims 1 to 6.

9. A voice server comprising a memory for storing information including program instructions, and a processor for controlling execution of the program instructions which, when loaded and executed by the processor, implement the cross-version communication method based on chat room audio of any of claims 1 to 6.