CN110557226A - Audio transmission method and device - Google Patents

Audio transmission method and device

Info

Publication number
CN110557226A
Authority
CN
China
Prior art keywords
audio
audio data
audio frame
frame
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910841322.1A
Other languages
Chinese (zh)
Inventor
岑裕
杨攀
Current Assignee
Beijing Cloud In Faith Network Technology Co Ltd
Original Assignee
Beijing Cloud In Faith Network Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Cloud In Faith Network Technology Co Ltd filed Critical Beijing Cloud In Faith Network Technology Co Ltd
Priority to CN201910841322.1A priority Critical patent/CN110557226A/en
Publication of CN110557226A publication Critical patent/CN110557226A/en
Pending legal-status Critical Current


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 1/00 Arrangements for detecting or preventing errors in the information received
    • H04L 1/0001 Systems modifying transmission characteristics according to link quality, e.g. power backoff
    • H04L 1/0006 Systems modifying transmission characteristics according to link quality by adapting the transmission format
    • H04L 1/0015 Systems modifying transmission characteristics according to link quality characterised by the adaptation strategy
    • H04L 5/00 Arrangements affording multiple use of the transmission path
    • H04L 5/003 Arrangements for allocating sub-channels of the transmission path
    • H04L 5/0044 Arrangements for allocating sub-channels of the transmission path: allocation of payload

Abstract

The embodiments of the present application provide an audio transmission method and apparatus. The audio transmission method includes the following steps: a first device generates a first audio frame, where the first audio frame includes N audio data and N is a positive integer greater than or equal to 2; the first device sends the first audio frame to a second device. Because the first device can send at least two audio data to the second device in a single first audio frame, the proportion of payload per frame rises, which solves the bandwidth-waste problem of the prior art.

Description

Audio transmission method and device
Technical Field
The present application relates to the field of audio transmission technologies, and in particular, to an audio transmission method and apparatus.
Background
With the rapid development of network technology, audio and video applications have become increasingly popular, and users have a large demand for audio or audio-video communication. For example, different users may exchange voice messages or hold video conferences through audio frames transmitted between their devices.
In the process of implementing the present invention, the inventors found that the proportion of the payload in an audio frame transmitted between devices in the prior art is relatively small, which wastes a large amount of bandwidth when bandwidth resources are scarce.
Disclosure of Invention
An object of the embodiments of the present application is to provide an audio transmission method and apparatus, so as to solve the problem in the prior art that bandwidth resources are wasted due to a relatively small proportion of a payload in an audio frame.
In a first aspect, an embodiment of the present application provides an audio transmission method, where the method includes: the method comprises the steps that a first device generates a first audio frame, wherein the first audio frame comprises N audio data, and N is a positive integer greater than or equal to 2; the first device transmits a first audio frame to the second device.
Therefore, because the first device generates a first audio frame that includes N audio data, where N is a positive integer greater than or equal to 2, the first device can send at least two audio data to the second device in a single first audio frame.
In other words, compared with audio frames in the prior art, a first audio frame in the embodiments of the present application carries more audio data, so the proportion of the payload in the frame is increased. Even when bandwidth resources are scarce, the scheme of the embodiments improves bandwidth utilization compared with the prior art, thereby solving the bandwidth-waste problem.
In one possible embodiment, the first device generates a first audio frame comprising: the first device acquires the N pieces of audio data; and the first equipment combines the N audio data to obtain a first audio frame.
Therefore, the first audio frame is obtained by combining the N audio data, so that the transmission performance under low bandwidth can be improved.
In one possible embodiment, the audio transmission method further includes: the first equipment receives feedback information sent by the second equipment; and the first equipment adjusts the number of the audio data in the first audio frame according to the feedback information.
Therefore, by adjusting the number of audio data in the first audio frame according to the feedback information, the scheme can adapt to different amounts of available bandwidth. For example, if the first device determines from the feedback information that bandwidth is currently tight, it may reduce the number of audio data in the first audio frame, thereby reducing the bandwidth required for the first audio frame.
In addition, the scheme of the embodiments is not limited to any particular audio-data format, so it is generally applicable. Because it adapts to different bandwidth conditions by changing the number of audio data per frame rather than by lowering the audio coding rate, it preserves audio quality and user experience, avoiding the problems caused by reducing the coding rate in the prior art.
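This feedback-driven adjustment can be sketched as a small policy function. The dictionary key `loss_rate` and the numeric thresholds below are illustrative assumptions, not values specified by the application:

```python
def adjust_audio_count(current_n, feedback, min_n=2, max_n=8):
    """Adjust the number of audio data per first audio frame from receiver feedback.

    `feedback` is a hypothetical dict such as {"loss_rate": 0.08}; the
    thresholds are invented for illustration.
    """
    if feedback["loss_rate"] > 0.05:      # bandwidth tight: carry fewer audio data
        return max(min_n, current_n - 1)
    if feedback["loss_rate"] < 0.01:      # bandwidth ample: carry more audio data
        return min(max_n, current_n + 1)
    return current_n                      # otherwise keep the current frame size
```

The lower bound of 2 keeps every first audio frame within the application's definition (N greater than or equal to 2).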
In one possible embodiment, the first device generates a first audio frame comprising: the method comprises the steps that a first device obtains M pieces of audio data, wherein M is a positive integer larger than N; the first device generates a first audio frame through N audio data of the M audio data; the audio transmission method further includes: the first device generates a second audio frame through M-N audio data other than the N audio data among the M audio data; the first device transmits the second audio frame to the second device.
Therefore, in the embodiments of the present application, the first audio frame and the second audio frame together carry M consecutive audio data corresponding to one segment of audio. Even when one of the two frames is lost, the second device can interpolate from the other frame, so the segment can still be played; this solves the prior-art problem that a whole segment of audio becomes unplayable when the network drops packets.
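One way to realize this two-frame scheme, consistent with the interpolation step on the receiving side, is to interleave the M audio data so that each frame carries every other chunk. This is a sketch of that idea under the stated assumption; the application does not mandate this exact partition:

```python
def interleave_split(audio_chunks):
    """Split M consecutive audio data across two frames (even/odd interleave),
    so that losing either frame still leaves every other chunk for interpolation."""
    frame_a = audio_chunks[0::2]   # chunks 0, 2, 4, ... -> first audio frame
    frame_b = audio_chunks[1::2]   # chunks 1, 3, 5, ... -> second audio frame
    return frame_a, frame_b
```

With this split, losing one frame leaves a gap of at most one chunk between any two received chunks, which is what makes neighbour interpolation feasible.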
In a second aspect, an embodiment of the present application provides an audio transmission method, where the method includes: the method comprises the steps that a first audio frame sent by a first device is received by a second device, wherein the first audio frame comprises N pieces of audio data, and N is a positive integer greater than or equal to 2; and the second equipment analyzes the first audio frame to obtain N audio data.
In one possible embodiment, the first audio frame is obtained by the first device by merging the N audio data.
In one possible embodiment, the audio transmission method further includes: the second equipment generates feedback information according to the first audio frame; and the second equipment sends the feedback information to the first equipment so that the first equipment can adjust the number of the audio data in the first audio frame according to the feedback information.
In one possible embodiment, in a case where the first device transmits a first audio frame and a second audio frame to the second device, and the second device receives only the first audio frame, the first audio frame is generated by N audio data out of M audio data acquired by the first device, and the second audio frame is generated by M-N audio data out of the M audio data, excluding the N audio data, the audio transmission method further includes: the second device interpolates the first audio frame to acquire M-N interpolated audio data corresponding to the M-N audio data.
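A naive sketch of what the second device's interpolation might look like on decoded sample chunks; the neighbour-averaging rule is purely illustrative, since the application does not specify an interpolation algorithm:

```python
def interpolate_missing(chunks):
    """Fill lost positions (None) by averaging the nearest received chunks.

    `chunks` is a list of equal-length sample lists with None at lost
    positions; the simple average stands in for whatever interpolation
    the second device actually applies.
    """
    out = list(chunks)
    for i, chunk in enumerate(out):
        if chunk is None:
            prev = next((c for c in reversed(out[:i]) if c is not None), None)
            nxt = next((c for c in out[i + 1:] if c is not None), None)
            if prev is not None and nxt is not None:
                out[i] = [(a + b) / 2 for a, b in zip(prev, nxt)]
            else:
                out[i] = prev if prev is not None else nxt
    return out
```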
In a third aspect, the present application provides an audio transmission apparatus applied to a first device, the audio transmission apparatus including: a first generating module, configured to generate a first audio frame, where the first audio frame includes N audio data and N is a positive integer greater than or equal to 2; and a first sending module, configured to send the first audio frame to a second device.
In one possible embodiment, the first generating module includes: an acquisition module, configured to acquire the N audio data; and a merging module, configured to merge the N audio data to obtain the first audio frame.
In one possible embodiment, the audio transmission apparatus further includes: a receiving module, configured to receive feedback information sent by the second device; and an adjusting module, configured to adjust the number of audio data in the first audio frame according to the feedback information.
In one possible embodiment, the acquisition module is further configured to acquire M audio data, where M is a positive integer greater than N; the first generating module is further configured to generate the first audio frame from N of the M audio data, and to generate a second audio frame from the M-N audio data other than those N; and the first sending module is further configured to send the second audio frame to the second device.
In a fourth aspect, the present application provides an audio transmission apparatus applied to a second device, the audio transmission apparatus including: a receiving module, configured to receive a first audio frame sent by a first device, where the first audio frame includes N audio data and N is a positive integer greater than or equal to 2; and a parsing module, configured to parse the first audio frame to obtain the N audio data.
In one possible embodiment, the first audio frame is obtained by the first device by merging the N audio data.
In one possible embodiment, the audio transmission apparatus further includes: a second generating module, configured to generate feedback information according to the first audio frame; and a second sending module, configured to send the feedback information to the first device, so that the first device adjusts the number of audio data in the first audio frame according to the feedback information.
In one possible embodiment, in a case where the first device transmits a first audio frame and a second audio frame to the second device, and the second device receives only the first audio frame, the first audio frame is generated by N audio data out of M audio data acquired by the first device, and the second audio frame is generated by M-N audio data out of the M audio data, excluding the N audio data, the audio transmission apparatus further includes: and the interpolation module is used for interpolating the first audio frame to acquire M-N interpolated audio data corresponding to the M-N audio data.
In a fifth aspect, an embodiment of the present application provides an electronic device, including: a processor, a memory and a bus, the memory storing machine-readable instructions executable by the processor, the processor and the memory communicating via the bus when the electronic device is running, the machine-readable instructions when executed by the processor performing the method of the first aspect or any of the alternative implementations of the first aspect.
In a sixth aspect, an embodiment of the present application provides an electronic device, including: a processor, a memory and a bus, the memory storing machine-readable instructions executable by the processor, the processor and the memory communicating via the bus when the electronic device is running, the machine-readable instructions when executed by the processor performing the method of the second aspect or any of the alternative implementations of the second aspect.
In a seventh aspect, this application provides a computer-readable storage medium, on which a computer program is stored, where the computer program is executed by a processor to perform the method according to the first aspect or any optional implementation manner of the first aspect.
In an eighth aspect, the present application provides a computer-readable storage medium, on which a computer program is stored, where the computer program is executed by a processor to perform the method of the second aspect or any optional implementation manner of the second aspect.
In a ninth aspect, the present application provides a computer program product, which when run on a computer, causes the computer to execute the method of the first aspect or any possible implementation manner of the first aspect.
In a tenth aspect, embodiments of the present application provide a computer program product, which when run on a computer, causes the computer to execute the method of the second aspect or any possible implementation manner of the second aspect.
In order to make the aforementioned and other objects, features and advantages of the present invention comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
To illustrate the technical solutions of the embodiments of the present application more clearly, the drawings required by the embodiments are briefly described below. It should be understood that the following drawings illustrate only some embodiments of the present application and therefore should not be considered limiting of the scope; those skilled in the art can obtain other related drawings from them without inventive effort.
Fig. 1 shows a flow chart of a conventional audio transmission process;
FIG. 2 is a schematic diagram illustrating a structure of an audio frame;
FIG. 3 is a schematic diagram illustrating one implementation scenario in which examples of the present application may be applied;
Fig. 4 is a flowchart of an audio transmission method provided in an embodiment of the present application;
Fig. 5 is a schematic structural diagram of a first audio frame according to an embodiment of the present application;
Fig. 6 shows a detailed flowchart of an audio transmission method provided in an embodiment of the present application;
Fig. 7 is a schematic diagram illustrating feedback on a first audio frame according to an embodiment of the present application;
FIG. 8 is a schematic diagram illustrating an interpolation of audio data according to an embodiment of the present application;
Fig. 9 is a schematic structural diagram illustrating an audio transmission apparatus according to an embodiment of the present application;
Fig. 10 is a schematic structural diagram of another audio transmission device provided in the embodiment of the present application;
Fig. 11 shows a schematic structural diagram of an electronic device provided in an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application are described below with reference to the drawings.
It should be noted that like reference numerals and letters denote like items in the figures; once an item is defined in one figure, it need not be defined or explained again in subsequent figures. In the description of the present application, the terms "first", "second", and the like are used only to distinguish between descriptions and are not to be construed as indicating or implying relative importance.
Referring to fig. 1, fig. 1 is a flowchart of a conventional audio transmission process, which includes the following steps:
A capture device (e.g., a microphone) in the first device captures audio; the first device converts the captured audio into PCM (Pulse Code Modulation) data; the first device encodes the PCM data into audio data in a specific format, for example the MP3 format or the WAV format; the first device encapsulates (or packs) the audio data into an RTP (Real-time Transport Protocol) packet, where the RTP packet includes an RTP header and an RTP payload; the first device submits the RTP packet to a UDP (User Datagram Protocol) channel or a TCP (Transmission Control Protocol) channel, i.e., it encapsulates the RTP packet into a UDP packet or a TCP packet, where the UDP packet includes a UDP header, the RTP header, and the RTP payload, and the TCP packet likewise includes a TCP header, the RTP header, and the RTP payload; the first device encapsulates the UDP packet or TCP packet into an audio frame (which may also be called an IP frame), where the audio frame includes an IP header, a UDP header or TCP header, the RTP header, and the RTP payload; finally, the first device sends the audio frame to the second device.
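The RTP encapsulation step above can be sketched in a few lines. Note that the minimal RTP header defined by RFC 3550 is 12 bytes (the 20-byte figure used later in this description presumably includes additional fields); this sketch builds only the minimal header and is not the exact packetizer of the application:

```python
import struct

def build_rtp_packet(payload, seq, timestamp, ssrc=0x12345678, payload_type=96):
    """Prepend a minimal 12-byte RTP header (RFC 3550) to an encoded audio payload.

    Version 2, no padding, no extension, no CSRC list, marker bit clear;
    payload_type 96 is a dynamic type chosen here purely for illustration.
    """
    header = struct.pack(
        "!BBHII",
        0x80,                 # V=2, P=0, X=0, CC=0
        payload_type & 0x7F,  # M=0, 7-bit payload type
        seq & 0xFFFF,
        timestamp & 0xFFFFFFFF,
        ssrc,
    )
    return header + payload
```

The resulting RTP packet would then be handed to the UDP or TCP channel exactly as the flow above describes.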
The second device receives the audio frame sent by the first device over the network; the second device parses the RTP packet out of the UDP channel or TCP channel, i.e., it parses the audio frame into a UDP packet or TCP packet and then parses that packet into an RTP packet; the second device extracts the contents of the RTP payload from the RTP packet, thereby obtaining the audio data in the specific format; the second device places the audio data in a receive buffer to buffer it; the second device periodically reads audio data from the receive buffer and decodes it into PCM data; finally, the second device converts the PCM data into audio and plays it through a playback device (e.g., a speaker or loudspeaker).
Although the above describes the process of transmitting audio frames between the first device and the second device, to clearly understand the structure of the transmitted audio frames, the existing audio frame shown in fig. 2 is taken as an example below. It should be understood that the structures of other existing audio frames are similar and are not described one by one here.
Referring to fig. 2, fig. 2 is a schematic diagram of the structure of an audio frame in the prior art. The audio frame includes: an IP header of 20 bytes, a UDP header of 20 bytes, an RTP header of 20 bytes, and an RTP payload carrying 20 ms of audio data.
Thus, from the size of each field in the audio frame shown in fig. 2, it can be seen that the audio data in an existing audio frame is relatively small, i.e., the proportion of the payload in one audio frame is relatively small. When bandwidth resources are scarce, transmitting audio frames like the one shown in fig. 2 therefore wastes a large amount of bandwidth.
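The overhead can be quantified with a quick calculation. Using the header sizes stated above (20-byte IP, UDP, and RTP headers) and assuming, purely for illustration, that 20 ms of encoded audio occupies 160 bytes (64 kbit/s):

```python
def payload_ratio(num_payloads, payload_bytes=160, header_bytes=60):
    """Fraction of an audio frame occupied by actual audio data.

    header_bytes = 20 (IP) + 20 (UDP) + 20 (RTP) as in the description;
    payload_bytes = 160 is an assumed size for 20 ms of 64 kbit/s audio.
    """
    total = header_bytes + num_payloads * payload_bytes
    return num_payloads * payload_bytes / total
```

With these assumed numbers, a one-payload frame is about 73% payload while a three-payload frame is about 89% payload, which illustrates the bandwidth saving the embodiments claim.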
In addition, when network bandwidth is low or the network is unstable, the prior art reduces the coding rate of voice coding by adjusting encoder parameters in order to keep the audio basically usable, i.e., smoothness is preserved by sacrificing audio quality.
However, reducing the coding rate degrades the audio the user receives, resulting in a poor user experience. Moreover, because rate-compression algorithms generally apply only to audio coding of a specific format, the rate-reduction scheme also lacks universality.
Based on this, the present application provides an audio transmission method and apparatus: a first device generates a first audio frame that includes N audio data, where N is a positive integer greater than or equal to 2, so that the first device can send at least two audio data to a second device in a single first audio frame.
In other words, compared with audio frames in the prior art, a first audio frame in the embodiments of the present application carries more audio data, so the proportion of the payload in the frame is increased. Even when bandwidth resources are scarce, the scheme of the embodiments improves bandwidth utilization compared with the prior art, thereby solving the bandwidth-waste problem.
Referring to fig. 3, fig. 3 is a schematic diagram of an implementation scenario 300 in which the present example is applicable. As shown in fig. 3, the implementation scenario 300 includes: a first device 310, a network 320, and a second device 330.
The first device 310 may be a terminal device used by a user, for example a mobile phone, a tablet computer, a notebook computer, a laptop, or a desktop computer. In other words, the specific device type of the first device 310 may be chosen according to actual requirements, and the embodiments of the present application are not limited in this respect.
In addition, the first device 310 has communication capabilities and may run a browser or another application capable of loading and displaying web pages, for example a communication application (e.g., WeChat) or a conference application.
The first device 310 and the second device 330 may communicate over the network 320. The network 320 may be any type of wired or wireless network, or a combination thereof. For example, the network 320 may include wired or wireless network access points, such as base stations and/or network switching nodes, through which the first device 310 and the second device 330 connect to the network 320 to exchange data and/or information.
The second device 330 may likewise be a terminal device used by a user, for example a mobile phone, a tablet computer, a notebook computer, a laptop, or a desktop computer. Its specific device type may also be chosen according to actual requirements, and the embodiments of the present application are not limited in this respect.
In addition, the second device 330 has communication capabilities and may run a browser or another application capable of loading and displaying web pages, for example a communication application (e.g., WeChat) or a conference application.
In the embodiment of the present application, when the user a sends voice to the user B through the application program in the first device 310, the first device 310 may convert audio corresponding to the voice of the user a into a plurality of audio data, and the first device 310 may generate the first audio frame using at least two audio data. Subsequently, the first device 310 may transmit the first audio frame to the second device 330. And after the second device 330 receives the first audio frame, the second device 330 acquires at least two audio data according to the first audio frame, so that the user B can acquire the voice transmitted by the first device 310 through an application program in the second device 330.
It should be noted that the audio transmission method and apparatus provided in the embodiments of the present application may be extended to other suitable implementation scenarios and are not limited to the implementation scenario 300 shown in fig. 3. Although only two devices are shown in fig. 3, those skilled in the art should understand that the implementation scenario 300 may include more devices in practical applications, and the embodiments of the present application are not limited in this respect.
Referring to fig. 4, fig. 4 is a flowchart of an audio transmission method according to an embodiment of the present disclosure.
As shown in fig. 4, the method includes:
In step S410, the first device obtains N audio data, where N is a positive integer greater than or equal to 2.
It should be understood that the audio data in the embodiment of the present application is audio data in a specific format. For example, the audio data may be audio data in MP3 format, or may be audio data in WAV format. In other words, the audio data may be audio data in any format, and the embodiments of the present application are not limited thereto.
It should also be understood that the audio data may also be referred to as audio compressed data, and may also be referred to as audio encoded data, and the embodiments of the present application are not limited thereto.
It should also be understood that the number of audio data acquired by the first device, i.e., the number of audio data carried in the first audio frame, may be set according to actual requirements, and the embodiments of the present application are not limited in this respect.
Specifically, the first device may put audio data encoded from the PCM data into a transmission buffer queue (also called a transmission buffer) that buffers the audio data. That is, compared with the first device in the prior art, the first device in the embodiments of the present application additionally maintains a transmission buffer queue.
With the transmission buffer queue in place, the first device can read audio data from it and thereby acquire the N audio data.
The audio data in the transmission buffer queue may be arranged by sequence number or by playing order, so the first device may acquire (or read) N audio data whose playing order is consecutive, or N audio data whose playing order is not consecutive.
For example, when the transmission buffer queue holds audio data 1, audio data 2, and audio data 3 arranged in playing order, the first device may read the two consecutive audio data 1 and 2, or the two non-consecutive audio data 1 and 3.
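The transmission buffer queue described above might be sketched as follows; the class and method names are invented for illustration and are not part of the application:

```python
from collections import deque

class SendBufferQueue:
    """FIFO buffer holding encoded audio data in playing order."""

    def __init__(self):
        self._queue = deque()

    def put(self, audio_data):
        # Encoded chunks are appended as the encoder produces them.
        self._queue.append(audio_data)

    def read(self, n):
        # Pop up to n chunks, preserving playing order.
        chunks = []
        while self._queue and len(chunks) < n:
            chunks.append(self._queue.popleft())
        return chunks
```

For instance, after buffering audio data 1 through 3, `read(2)` would return the first two chunks in playing order, matching the consecutive-read example above.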
In addition, since the audio data buffered in the sending buffer queue may be in the same format or in different formats, the N audio data acquired by the first device may also be in the same format or in different formats, and the embodiment of the present application is not limited thereto.
For example, the first device acquires 2 audio data, wherein the 2 audio data are all in MP3 format. As another example, the first device acquires 2 audio data, wherein one audio data is in MP3 format and the other audio data is in WAV format.
It should be understood that, although the above takes a transmission buffer queue in the first device as an example, those skilled in the art should understand that the first device may acquire the N audio data in other ways, and the embodiments of the present application are not limited in this respect.
For example, during the encoding of the PCM data by the first device, the first device may wait for a predetermined amount of PCM data to be encoded during the encapsulation of the RTP packet, and then generate the first audio frame using the acquired predetermined amount of audio data. That is, the first device in the embodiment of the present application may not be provided with a transmission buffer queue, and the first device may generate the first audio frame by waiting for a predetermined amount of audio data.
In step S420, the first device merges the N audio data to generate a first audio frame.
Specifically, after acquiring the N audio data, the first device merges them, packs the merged audio data into an RTP payload to generate an RTP packet, and then generates the first audio frame from the RTP packet. The specific process of generating the first audio frame from the RTP packet is described below and is not repeated here.
The audio data may be merged by arranging the plurality of audio data together in their playing order with a separator between every two adjacent audio data, or by arranging them in an arbitrary order rather than the playing order, again with a separator between every two adjacent audio data; the embodiments of the present application are not limited in this respect.
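A minimal sketch of separator-based merging, together with the inverse split the second device would perform. The two-byte delimiter is an assumption; a real implementation would need to escape the delimiter or length-prefix each chunk, since compressed audio bytes could otherwise collide with it:

```python
SEPARATOR = b"\x00\xff"  # assumed delimiter, not specified by the application

def merge_audio_data(chunks):
    """Join N encoded audio chunks with a separator between adjacent chunks."""
    return SEPARATOR.join(chunks)

def split_audio_data(merged):
    """Inverse operation performed by the second device when parsing the frame."""
    return merged.split(SEPARATOR)
```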
For convenience of understanding, the manner of merging audio data in the embodiment of the present application is described below by taking the first audio frame shown in fig. 5 as an example. It should be understood that the first audio frame in fig. 5 is only an example and does not limit the first audio frame in the embodiment of the present application.
Referring to fig. 5, fig. 5 is a schematic structural diagram of a first audio frame according to an embodiment of the present disclosure. The first audio frame shown in fig. 5 includes the following data: a 20-byte IP header, a 20-byte UDP header, a 20-byte RTP header, and three RTP payloads each carrying 20 ms of audio data, where a separator is arranged between every two adjacent ones of the three RTP payloads.
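As a rough illustration only (not part of the normative description), the frame layout of fig. 5 can be sketched as follows; the header contents and the separator byte value are placeholders chosen for the sketch:

```python
# Sketch of the first-audio-frame layout of fig. 5. Header sizes follow the
# text above; header contents and the separator value are placeholders.
SEPARATOR = b"\x00"  # hypothetical delimiter between adjacent RTP payloads

def build_first_audio_frame(payloads):
    """Concatenate the headers and the RTP payloads, inserting a separator
    between every two adjacent payloads."""
    ip_header = bytes(20)   # 20-byte IP header (placeholder bytes)
    udp_header = bytes(20)  # 20-byte UDP header, as stated above
    rtp_header = bytes(20)  # 20-byte RTP header, as stated above
    body = SEPARATOR.join(payloads)
    return ip_header + udp_header + rtp_header + body

# Three RTP payloads, each standing in for 20 ms of audio data.
payloads = [b"\x11" * 160, b"\x22" * 160, b"\x33" * 160]
frame = build_first_audio_frame(payloads)
# 60 header bytes + 3 * 160 payload bytes + 2 separator bytes = 542 bytes
```

The sketch only makes the byte accounting concrete: carrying three payloads behind a single set of headers is what raises the payload proportion of the frame.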
It should also be understood that, in the case where the RTP packet carries multiple audio data, the first device may also identify the relevant information of the audio data carried in each first audio frame by adding a new identification within the RTP header in the RTP packet.
For example, in the case that the first audio frame carries audio data 1 and audio data 2, an identifier indicating that the first audio frame carries 2 pieces of audio data may be added to the RTP header in the first audio frame, and identifiers of the sending time of the first audio frame and the sequence number of the first audio frame may also be added to the RTP header.
It should also be understood that, although the above description is made by taking the way of adding the new identifier in the RTP header as an example, it should be understood by those skilled in the art that the new identifier may also be added in the UDP header, the TCP header, or the IP header in the first audio frame, and the embodiment of the present application is not limited thereto.
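As an illustrative sketch only (the embodiment does not fix a field layout), the new identifiers described above could be appended to a header as fixed-width fields; the field widths below are hypothetical:

```python
import struct

def add_frame_identifiers(header: bytes, audio_count: int,
                          seq_num: int, send_time_ms: int) -> bytes:
    """Append the new identifiers after an existing header (RTP here, but the
    same sketch applies to a UDP, TCP or IP header): a 1-byte count of the
    audio data carried, a 4-byte frame sequence number, and an 8-byte
    sending time in milliseconds. The field widths are hypothetical."""
    return header + struct.pack("!BIQ", audio_count, seq_num, send_time_ms)

rtp_header = bytes(20)  # placeholder RTP header, sized as in fig. 5
extended = add_frame_identifiers(rtp_header, 2, 7, 1_567_000_000_000)
count, seq, sent = struct.unpack("!BIQ", extended[20:])
```

A fixed layout like this is what lets the second device later read back the count, sequence number and sending time to build its feedback information.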
Step S430, the first device sends the first audio frame to the second device.
It should be understood that, in the case that the first device sends multiple first audio frames to the second device, the first device may send multiple audio frames one by one through a serial transmission manner, or may send multiple audio frames at a time through a parallel transmission manner, and the embodiments of the present application are not limited thereto.
In step S440, the second device receives the first audio frame sent by the first device.
It should be understood that in the case where the first device uses a serial transmission scheme, the second device may receive the first audio frames one by one. Alternatively, in the case where the first device uses a parallel transmission scheme, the second device may receive a plurality of first audio frames at a time.
Step S450, the second device parses the first audio frame to obtain N audio data.
Specifically, the second device parses the received first audio frame to obtain N audio data in the first audio frame. Subsequently, the second device may place the N audio data into the receive buffer queue. Subsequently, the playing device of the second device may implement playing of the audio by reading the audio data from the receiving buffer queue (or the receiving buffer queue may also be referred to as a receiving buffer).
In the process of playing audio or audio and video, if the second device determines that the received N audio data are consecutive in playing sequence, the second device may play the audio data normally. If the second device determines that the playing sequence of the received N audio data is discontinuous, and the other first audio frames received by the second device do not carry the missing interval audio data, the second device may play the audio or audio and video by means of interpolation.
The interpolation may be implemented by a weighted average of two adjacent audio data.
For example, the second device parses a received first audio frame to obtain audio data 1 and audio data 3, but one interval audio data 2 is missing between them. If the other first audio frames received by the second device also do not carry the interval audio data 2, the second device may play the audio or audio and video by interpolation. The interpolated audio data is calculated as follows: when the loudness of audio data 1 is 6 and the loudness of audio data 3 is 4, the loudness of the interpolated audio data corresponding to interval audio data 2 is (6+4)/2 = 5.
It should be understood that, although the above description takes 1 interval audio data as an example, those skilled in the art will understand that the number of interval audio data may also be at least two, and the embodiment of the present application is not limited thereto.
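The weighted-average interpolation described above can be sketched minimally as follows (equal weights, matching the loudness example):

```python
def interpolate(prev_value: float, next_value: float) -> float:
    """Interpolated audio data as the (equally weighted) average of the two
    adjacent audio data bounding the gap."""
    return (prev_value + next_value) / 2

# Loudness 6 for audio data 1 and loudness 4 for audio data 3 give the
# interpolated audio data 2 a loudness of (6 + 4) / 2 = 5.
loudness = interpolate(6, 4)
```

Other weightings are possible; the equal-weight average is simply the one used in the numeric example above.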
In step S460, the second device generates feedback information according to the first audio frame.
It should be understood that the feedback information may be the transmission delay, the network packet loss rate, or both the transmission delay of the network and the network packet loss rate. That is to say, the information included in the feedback information in the embodiment of the present application may be set according to requirements, and the embodiment of the present application is not limited to this.
In order to facilitate understanding of the embodiment of the present application, the following description takes as an example that the feedback information includes transmission delay and network packet loss rate, and a new identifier is added in the RTP header.
Specifically, since new identifiers of the sending time of the first audio frame and the sequence number of the first audio frame may be added to the RTP header in the first audio frame, after the second device receives the first audio frame, the second device may determine the sending time and the sequence number of the current first audio frame through the new identifiers in the RTP header.
After the second device acquires the transmission time of the first audio frame, the second device may determine a transmission delay of the network based on the current time and the transmission time of the first audio frame.
In addition to determining the sending time of the first audio frame through the new identifier of the first audio frame, the second device may also determine the sequence number of the first audio frame through the new identifier. Therefore, after the second device obtains the sequence number of the first audio frame, the second device can determine the network packet loss rate.
For example, within a preset time, the second device receives 3 first audio frames whose sequence numbers are 1, 3 and 4. The second device may then determine from the sequence numbers that first audio frame 2 has been lost, and thus determines that the network packet loss rate is 1/4. The preset time may be set according to actual requirements, and the embodiment of the present application is not limited to this.
Therefore, when the second device obtains the transmission delay and the network packet loss rate of the network, the second device may generate the feedback information according to the transmission delay and the network packet loss rate of the network.
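The two feedback quantities derived above might be computed as follows; this is a minimal sketch, and the function names and the assumption that sequence numbers start at 1 are illustrative:

```python
def transmission_delay(current_time_ms: int, send_time_ms: int) -> int:
    """Transmission delay of the network: current time minus the sending
    time carried in the new identifier of the first audio frame."""
    return current_time_ms - send_time_ms

def packet_loss_rate(received_seq_nums: list) -> float:
    """Loss rate within the preset time window, assuming sequence numbers
    start at 1: frames 1, 3 and 4 received means frame 2 was lost, so 1/4."""
    expected = max(received_seq_nums)
    lost = expected - len(received_seq_nums)
    return lost / expected

delay = transmission_delay(1200, 1150)   # 50 ms of delay
rate = packet_loss_rate([1, 3, 4])       # frame 2 missing -> 0.25
```

The second device would bundle these two values into the feedback information sent back in step S470.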
Step S470, the second device sends feedback information to the first device.
It should be understood that the second device may send the feedback information to the first device along a path corresponding to the one by which the second device received the first audio frame.
In step S480, the first device receives the feedback information sent by the second device.
It should be understood that the path along which the first device receives the feedback information may correspond to the protocol path used when the first device generated the first audio frame.
It should be noted that the following schemes relate to two cases: the first device may or may not adjust the number of audio data in the first audio frame. For clarity of the technical solution of the embodiment of the present application, the following scheme is not shown in fig. 4, and reference may be made to the description below for the related contents.
After the first device receives the feedback information sent by the second device, the first device may continue to send according to the current sending mode, that is, make no adjustment after receiving the feedback information. Of course, the first device may instead adjust the sending mode of the first audio frame; for example, after receiving the feedback information, the first device may adjust the number of audio data in the first audio frame. The adjustment may increase or decrease the number of audio data, which is not limited in this embodiment of the application.
In order to facilitate understanding of the embodiments of the present application, the following description will be given by way of specific examples.
After the first device receives the feedback information sent by the second device, the first device determines the usage of the current bandwidth resource according to the transmission delay of the network in the feedback information. If the first device determines that the current bandwidth resource is relatively abundant, that is, the bandwidth resource required by the first device is smaller than the available portion of the current bandwidth resource, the first device may increase the number of audio data in the first audio frame. If the first device determines that the current bandwidth resource is relatively tight, the first device may reduce the number of audio data in the first audio frame in order to ensure the reliability of transmission.
It should be understood that, the determination manner for determining the use condition of the current bandwidth resource by the first device according to the transmission delay of the network in the feedback information may be set according to actual requirements, and the embodiment of the present application is not limited to this.
For example, the first device may determine the current usage of the bandwidth resource by comparing the transmission delay of the network with a first preset value. If the transmission delay of the network is greater than the first preset value, the first device determines that the current network resources are tight. If the transmission delay of the network is less than the first preset value, the first device determines that the current network resources are relatively abundant. The first preset value may be set according to actual requirements, and the embodiment of the present application is not limited to this.
In addition, the feedback information sent by the second device may include a network packet loss rate in addition to the transmission delay of the network described above.
After the first device determines the network packet loss rate, it determines the stability of the current network environment according to the network packet loss rate. The stability of the network environment reflects the size of the network packet loss rate: poor stability indicates a higher network packet loss rate, and good stability indicates a lower one.
If the first device determines, according to the network packet loss rate, that the network environment is stable, the first device may continue to transmit the first audio frame according to the current transmission mode, that is, make no adjustment. If the first device determines that the stability of the network environment is poor, the first device may adjust the current transmission mode of the first audio frame.
It should be understood that, the manner in which the first device determines the stability of the current network environment according to the network packet loss rate may be set according to actual requirements, and the embodiment of the present application is not limited thereto.
For example, the first device may determine the stability of the current network environment by comparing the network packet loss rate with a second preset value. If the network packet loss rate is greater than the second preset value, the first device determines that the stability of the network environment is poor. If the network packet loss rate is smaller than the second preset value, the first device determines that the network environment is stable. The second preset value may be set according to actual requirements, and the embodiment of the present application is not limited to this.
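Putting the two comparisons together, the first device's adjustment of the number of audio data per frame might look like the following sketch; the two preset values and the step size of one audio data are assumptions, since the embodiment leaves them to actual requirements:

```python
def adjust_audio_count(n: int, delay_ms: float, loss_rate: float,
                       first_preset_ms: float = 100.0,
                       second_preset: float = 0.05) -> int:
    """Return the new number of audio data per first audio frame.
    High delay (tight bandwidth) or high loss rate (unstable network)
    -> carry fewer audio data per frame; otherwise carry more."""
    if delay_ms > first_preset_ms or loss_rate > second_preset:
        return max(2, n - 1)  # a first audio frame carries N >= 2 audio data
    return n + 1
```

The floor of 2 reflects the definition of the first audio frame in this embodiment, which always carries at least two audio data.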
In addition, the manner in which the first device adjusts the current transmission mode of the first audio frame may be set according to actual requirements, and the embodiment of the application is not limited to this.
In order to facilitate understanding of the embodiment of the present application, a description is provided below of a specific scheme for adjusting a current transmission mode of a first audio frame by a first device.
In order to avoid losing an entire segment of audio when network packets are lost, in the case where the first device determines that the current network transmission is unstable, the first device may change the transmission mode from transmitting the entire audio in one audio frame to transmitting it in at least two audio frames (including the first audio frame and the second audio frame).
In order to facilitate understanding of the embodiments of the present application, the following description is made by way of specific embodiments.
After the first device acquires the M audio data, the first device may generate a first audio frame from N audio data of the M audio data, and generate a second audio frame from the M-N audio data of the M audio data other than those N audio data.
The interval between any two adjacent audio data in the first audio frame and the second audio frame may be equal or unequal, and the embodiment of the present application is not limited thereto.
For example, if the first audio frame carries audio data 1 and audio data 3, and the second audio frame carries audio data 2 and audio data 4, then the interval between audio data 1 and audio data 3 in the first audio frame is 1 (i.e., one audio data, audio data 2, is absent), and the interval between audio data 2 and audio data 4 in the second audio frame is 1 (i.e., audio data 3 is absent); that is, the interval between any two adjacent audio data is equal in both the first audio frame and the second audio frame.
For another example, if the first audio frame carries audio data 1, audio data 4 and audio data 5, and the second audio frame carries audio data 2, audio data 3 and audio data 6, then in the first audio frame the interval between audio data 1 and audio data 4 is 2 (i.e., audio data 2 and audio data 3 are absent) and the interval between audio data 4 and audio data 5 is 0 (i.e., no audio data is missing), while in the second audio frame the interval between audio data 2 and audio data 3 is 0 and the interval between audio data 3 and audio data 6 is 2 (i.e., audio data 4 and audio data 5 are missing). In this case, the intervals between adjacent audio data are unequal within the first audio frame, and likewise within the second audio frame.
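The equal-interval case above amounts to an interleaved split of the audio data across the two frames, which can be sketched as:

```python
def split_into_two_frames(audio_data: list):
    """Place the odd-positioned audio data in the first audio frame and the
    even-positioned audio data in the second, so that adjacent data within
    each frame are separated by an equal interval of 1."""
    return audio_data[0::2], audio_data[1::2]

first, second = split_into_two_frames(["audio1", "audio2", "audio3", "audio4"])
# first carries audio1 and audio3; second carries audio2 and audio4
```

With this split, losing either one of the two frames still leaves every gap bounded by received audio data, which is what makes the interpolation described below possible.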
In addition, in a case where the first device transmits M audio data corresponding to the entire piece of audio through at least two audio frames, the second device may receive the at least two audio frames transmitted by the first device. And, in the case where the second device determines that there is a lost audio frame among the at least two audio frames, the second device may perform interpolation processing by other audio frames than the lost audio frame among the at least two audio frames, so that the second device can acquire interpolated audio data corresponding to the audio data in the lost audio frame.
In order to facilitate understanding of the embodiments of the present application, the following description is made by way of specific embodiments.
Consider the case where the first device transmits a first audio frame generated from N audio data out of M audio data acquired by the first device, and a second audio frame generated from the M-N audio data out of the M audio data excluding those N audio data, but the second device receives only the first audio frame. In this case, the second device may interpolate the first audio frame to acquire M-N interpolated audio data corresponding to the M-N audio data.
It should be understood that the second audio frame may be deemed lost when the second device does not receive it at all, or when the second device receives only part of the second audio frame and, after the preset time is reached, still has not received the complete second audio frame; in the latter case the second device may also discard the received part of the second audio frame. The embodiment of the present application is not limited thereto, and the preset time here may also be set according to actual requirements.
For example, in the case that the first audio frame carries audio data 1 and audio data 4, the second audio frame carries audio data 2 and audio data 3, and the second device receives only the first audio frame, the second device may parse the first audio frame to obtain audio data 1 and audio data 4. However, interval audio data 2 and interval audio data 3 are missing between audio data 1 and audio data 4, and the second device can acquire the corresponding interpolated audio data 2 and interpolated audio data 3 by interpolation. The interpolation is performed as follows: with the loudness of audio data 1 being 6 and the loudness of audio data 4 being 4, the second device performs interpolation using audio data 1 and audio data 4, which are close in time to interval audio data 2 and interval audio data 3. Treating interval audio data 2 and interval audio data 3 as a whole, the loudness of the corresponding interpolated audio data 2 and interpolated audio data 3 is (6+4)/2 = 5.
It should be noted that, when the number of interval audio data missing from the first audio frame is greater than 1, the consecutive interval audio data may be regarded as a whole, and the interpolated audio data corresponding to that whole is obtained from the two audio data adjacent to it, so that the audio can still be played.
It should also be understood that although the above illustrates the case where the first device transmits the first audio frame and the second audio frame and the second device receives only the first audio frame, those skilled in the art will understand that the case where the second device receives only the second audio frame is similar, and the detailed description is omitted here; refer to the related description above.
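The recovery step described above — treating a run of consecutive missing interval audio data as a whole and filling it from the two bounding audio data — might be sketched as follows (sequence numbers mapped to loudness values, as in the numeric example):

```python
def fill_gaps(received: dict) -> dict:
    """received maps sequence number -> loudness for the audio data that
    arrived. Each run of consecutive missing sequence numbers is treated as
    a whole and filled with the average of the two bounding audio data."""
    seqs = sorted(received)
    filled = dict(received)
    for lo, hi in zip(seqs, seqs[1:]):
        if hi - lo > 1:  # one or more interval audio data are missing
            value = (received[lo] + received[hi]) / 2
            for missing in range(lo + 1, hi):
                filled[missing] = value
    return filled

# First audio frame carried audio data 1 (loudness 6) and 4 (loudness 4);
# interval audio data 2 and 3 are both filled with (6 + 4) / 2 = 5.
recovered = fill_gaps({1: 6, 4: 4})
```

This matches the audio data 1 / audio data 4 example: both missing entries receive the same interpolated loudness because the gap is handled as a single whole.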
In summary, the first device generates the first audio frame, where the first audio frame includes N audio data, where N is a positive integer greater than or equal to 2, so that the first device can send at least two audio data to the second device through one first audio frame.
That is to say, compared with the audio frames in the prior art, the first audio frame in the embodiment of the present application carries an increased number of audio data, so that the proportion of the payload in the first audio frame is increased. Furthermore, even under the condition that bandwidth resources are relatively tight, the scheme in the embodiment of the application can improve the utilization rate of the bandwidth resources compared with the scheme in the prior art, thereby solving the problem of bandwidth resource waste in the prior art.
In order to facilitate understanding of the technical solutions of the embodiments of the present application, the solutions of the present application are described below by specific solutions.
Referring to fig. 6, fig. 6 is a specific flowchart illustrating an audio transmission method according to an embodiment of the present disclosure. The audio transmission method as shown in fig. 6 includes the steps of:
The acquisition device in the first device performs audio acquisition (or the first device performs audio acquisition through the acquisition device); the first device converts the collected audio into PCM data; the first device encodes the PCM data into audio data in a specific format, which may be audio data in MP3 format, audio data in WAV format, or the like; the first device sends the audio data in the specific format to the sending buffer queue; the first device reads at least two audio data from the sending buffer queue and packs the at least two audio data into an RTP packet; the first device submits the RTP packet to a UDP channel or a TCP channel, so that the first device encapsulates the RTP packet into a UDP packet or a TCP packet; the first device encapsulates the UDP packet or the TCP packet into a first audio frame; and the first device sends the first audio frame to the second device.
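The "read at least two audio data from the sending buffer queue and pack them" step can be sketched as follows; the separator byte and the choice of queue type are assumptions made for the sketch:

```python
from collections import deque

def pack_rtp_payload(send_buffer: deque, n: int) -> bytes:
    """Pop up to n encoded audio data from the sending buffer queue and merge
    them into one RTP payload body, with a separator between adjacent data."""
    count = min(n, len(send_buffer))
    chunk = [send_buffer.popleft() for _ in range(count)]
    return b"\x00".join(chunk)

queue = deque([b"audio1", b"audio2", b"audio3"])
payload = pack_rtp_payload(queue, 2)
# payload merges audio1 and audio2; audio3 stays queued for the next packet
```

Data left in the queue is simply carried by the next first audio frame, which is why the sending buffer queue decouples the encoder's output rate from the packing rate.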
The second device receives, through the network, the first audio frame sent by the first device; the second device parses the RTP packet from the UDP channel or the TCP channel, that is, the second device parses the first audio frame into a UDP packet or a TCP packet and further parses the UDP packet or the TCP packet into an RTP packet; the second device parses the content of the RTP payload from the RTP packet, thereby obtaining the audio data in the specific format; the second device puts the at least two audio data parsed from the first audio frame into a receiving buffer to buffer the audio data; the second device may periodically read the audio data from the receiving buffer and decode the audio data into PCM data; and the second device converts the PCM data into audio and plays the audio through the playing device.
It should be noted that, compared with fig. 1 in the prior art, fig. 6 in the embodiment of the present application differs in that: the first device in this embodiment of the present application is provided with a sending buffer queue, and the first audio frame in this embodiment of the present application carries at least two audio data.
As shown in fig. 7, fig. 7 is a schematic diagram illustrating feedback of a first audio frame according to an embodiment of the present application. The feedback process of the second device is shown in fig. 7 as a dashed line from the RTP packet of the second device to the sending buffer queue of the first device, that is, the feedback can be implemented by the RTCP technique. Specifically:
In a case where the second device receives the first audio frame transmitted by the first device, the second device may generate feedback information according to the parsed RTP packet and transmit the feedback information to the sending buffer queue in the first device. Therefore, after the first device receives the feedback information, the first device may adjust the number of audio data in the first audio frame, and may also transmit M audio data through the first audio frame and the second audio frame.
It should be understood that the feedback process is only an example, and the embodiment of the present application is not limited thereto. For example, in the case where a new identifier is added to the UDP header in the first audio frame, the feedback of the second device may also be from the RTP packet of the second device to the transmission buffer queue of the first device.
A scheme of transmitting M audio data through the first audio frame and the second audio frame in the embodiment of the present application is described below with reference to fig. 8.
After the first device acquires the feedback information sent by the second device, in order to avoid the situation that the whole segment of audio is lost when the network loses packets, when the first device reads audio data from the sending buffer queue, the first device may read a plurality of audio data in a jumping manner at certain intervals.
For example, as shown in fig. 8, fig. 8 shows a schematic diagram for interpolating audio data according to an embodiment of the present application. In fig. 8, in the case where a certain entire piece of audio includes audio data 1 through audio data 6, the first device may construct a first audio frame using audio data 1, audio data 3 and audio data 5, and construct a second audio frame using audio data 2, audio data 4 and audio data 6. The first device sends the first audio frame and the second audio frame to the second device. If the second device receives only the first audio frame, that is, the second audio frame is lost, the second device can calculate the interpolated audio data 2 corresponding to audio data 2 from audio data 1 and audio data 3. It should be understood that the acquisition of the interpolated audio data 4 and the interpolated audio data 6 corresponding to audio data 4 and audio data 6 is similar to that of the interpolated audio data 2 and will not be repeated.
It should be understood that the above-mentioned audio transmission method is only exemplary, and those skilled in the art can make various changes, modifications or alterations to the above-mentioned method.
Referring to fig. 9, fig. 9 shows a schematic structural diagram of an audio transmission apparatus 900 provided in an embodiment of the present application, and it should be understood that the apparatus 900 corresponds to the above-mentioned embodiment on the first device side in fig. 4, and is capable of performing the steps related to the first device side in the above-mentioned method embodiment, and specific functions of the apparatus 900 may be referred to the above description, and a detailed description is appropriately omitted here to avoid repetition. The device 900 includes at least one software functional module that can be stored in memory in the form of software or firmware (firmware) or solidified in an Operating System (OS) of the device 900.
Specifically, the apparatus 900 includes:
A first generating module 910, configured to generate a first audio frame, where the first audio frame includes N audio data, and N is a positive integer greater than or equal to 2; a first sending module 920, configured to send the first audio frame to the second device.
In one possible embodiment, the first generating module 910 includes: an acquisition module (not shown) for acquiring N audio data; and a merging module (not shown) configured to merge the N audio data to obtain a first audio frame.
In one possible embodiment, the audio transmission device further comprises: a receiving module (not shown) for receiving the feedback information sent by the second device; and an adjusting module (not shown) for adjusting the number of audio data in the first audio frame according to the feedback information.
In a possible embodiment, the obtaining module is further configured to obtain M audio data, where M is a positive integer greater than N; a first generating module 910, further configured to generate a first audio frame by N audio data of the M audio data; a first generating module 910, further configured to generate a second audio frame by using M-N audio data other than the N audio data from among the M audio data; the first sending module 920 is further configured to send the second audio frame to the second device.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working process of the apparatus described above may refer to the corresponding process in the foregoing method, and will not be described in too much detail herein.
Referring to fig. 10, fig. 10 shows a schematic structural diagram of another audio transmission apparatus 1000 provided in the embodiment of the present application, and it should be understood that the apparatus 1000 corresponds to the embodiment of the second device side in fig. 4, and can perform the steps related to the second device side in the above method embodiment, and the specific functions of the apparatus 1000 may be referred to the description above, and a detailed description is appropriately omitted here to avoid repetition. The device 1000 includes at least one software functional module that can be stored in a memory in the form of software or firmware (firmware) or solidified in an Operating System (OS) of the device 1000. Specifically, the apparatus 1000 includes:
A receiving module 1010, configured to receive a first audio frame sent by a first device, where the first audio frame includes N audio data, and N is a positive integer greater than or equal to 2; the parsing module 1020 is configured to parse the first audio frame to obtain N audio data.
In one possible embodiment, the first audio frame is obtained by the first device by merging the N audio data.
in one possible embodiment, the audio transmission device further comprises: a second generating module (not shown) for generating feedback information from the first audio frame; the second sending module (not shown) is further configured to send the feedback information to the first device, so that the first device adjusts the number of the audio data in the first audio frame according to the feedback information.
In one possible embodiment, in a case where the first device transmits a first audio frame and a second audio frame to the second device, and the second device receives only the first audio frame, the first audio frame is generated by N audio data out of M audio data acquired by the first device, and the second audio frame is generated by M-N audio data out of the M audio data, excluding the N audio data, the audio transmission apparatus further includes: an interpolation module (not shown) is configured to interpolate the first audio frame to obtain M-N interpolated audio data corresponding to the M-N audio data.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working process of the apparatus described above may refer to the corresponding process in the foregoing method, and will not be described in too much detail herein.
The present application further provides an electronic device 1100, which may be provided in the first device or in the second device.
Fig. 11 is a block diagram of an electronic device 1100 according to an embodiment of the present application. The electronic device 1100 may include a processor 1110, a communication interface 1120, a memory 1130, and at least one communication bus 1140, where the communication bus 1140 is used to enable direct communication among these components. The communication interface 1120 is used for signaling or data communication with other node devices. The processor 1110 may be an integrated circuit chip having signal processing capability. The processor 1110 may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components, and may implement or perform the methods, steps, and logic blocks disclosed in the embodiments of the present application. The general-purpose processor may be a microprocessor, or the processor 1110 may be any conventional processor.
The memory 1130 may be, but is not limited to, a Random Access Memory (RAM), a Read-Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), and the like. The memory 1130 stores computer-readable instructions which, when executed by the processor 1110, cause the electronic device 1100 to perform the steps of the corresponding device side in the method embodiment of fig. 4. For example, in the case where the electronic device 1100 is provided in the first device, when the computer-readable instructions stored in the memory 1130 are executed by the processor 1110, the electronic device 1100 may perform the steps of the first device side in the method embodiment of fig. 4.
The electronic device 1100 may further include a memory controller, an input-output unit, an audio unit, and a display unit.
The memory 1130, the memory controller, the processor 1110, the peripheral interface, the input/output unit, the audio unit, and the display unit are electrically connected to each other directly or indirectly to implement data transmission or interaction. For example, these components may be electrically coupled to each other via one or more communication buses 1140. The processor 1110 is configured to execute executable modules stored in the memory 1130, such as software functional modules or computer programs included in the electronic device 1100.
The input/output unit is configured to receive input data from a user so as to enable interaction between the user and the server (or the local terminal). The input/output unit may be, but is not limited to, a mouse, a keyboard, and the like.
The audio unit provides an audio interface to the user, which may include one or more microphones, one or more speakers, and audio circuitry.
The display unit provides an interactive interface (e.g., a user interface) between the electronic device and a user, or displays image data for a user's reference. In this embodiment, the display unit may be a liquid crystal display or a touch display. In the case of a touch display, it can be a capacitive or resistive touch screen supporting single-point and multi-point touch operations, meaning that the touch display can sense touch operations generated simultaneously at one or more positions on it and pass the sensed touch operations to the processor for processing.
It is to be understood that the configuration shown in FIG. 11 is merely exemplary, and that the electronic device 1100 may include more or fewer components than shown in FIG. 11, or have a different configuration than shown in FIG. 11. The components shown in fig. 11 may be implemented in hardware, software, or a combination thereof.
An embodiment of the present application provides a computer-readable storage medium having a computer program stored thereon which, when executed by a processor, performs the method according to any optional implementation of the first device side in fig. 4.
An embodiment of the present application provides a computer-readable storage medium having a computer program stored thereon which, when executed by a processor, performs the method according to any optional implementation of the second device side in fig. 4.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, for the specific working process of the system described above, reference may be made to the corresponding process in the foregoing method, which will not be repeated here.
It should be noted that the embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and the same or similar parts among the embodiments may be referred to one another. Since the apparatus embodiments are substantially similar to the method embodiments, their description is brief; for relevant details, reference may be made to the description of the method embodiments.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method can be implemented in other ways. The apparatus embodiments described above are merely illustrative, and for example, the flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In addition, functional modules in the embodiments of the present application may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.
The functions, if implemented in the form of software functional modules and sold or used as a standalone product, may be stored in a computer-readable storage medium. Based on such an understanding, the technical solution of the present application, in essence or in the part contributing to the prior art, may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. The aforementioned storage medium includes a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, and other media capable of storing program code. It is noted that, herein, relational terms such as first and second may be used solely to distinguish one entity or action from another, without necessarily requiring or implying any actual such relationship or order between them. Also, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
The above description is only a preferred embodiment of the present application and is not intended to limit the present application, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall be included in the protection scope of the present application. It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.
The above description is only for the specific embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present application, and shall be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (10)

1. An audio transmission method, comprising:
generating, by a first device, a first audio frame, wherein the first audio frame comprises N pieces of audio data, and N is a positive integer greater than or equal to 2;
sending, by the first device, the first audio frame to a second device.
2. The audio transmission method of claim 1, wherein generating the first audio frame by the first device comprises:
acquiring, by the first device, the N pieces of audio data;
combining, by the first device, the N pieces of audio data to obtain the first audio frame.
3. The audio transmission method according to claim 2, characterized in that the audio transmission method further comprises:
receiving, by the first device, feedback information sent by the second device;
adjusting, by the first device, the number of pieces of audio data in the first audio frame according to the feedback information.
4. The audio transmission method of claim 1, wherein generating the first audio frame by the first device comprises:
acquiring, by the first device, M pieces of audio data, wherein M is a positive integer greater than N;
generating, by the first device, the first audio frame from N of the M pieces of audio data;
the audio transmission method further comprising:
generating, by the first device, a second audio frame from the M-N pieces of audio data among the M pieces other than the N pieces of audio data;
sending, by the first device, the second audio frame to the second device.
5. An audio transmission method, comprising:
receiving, by a second device, a first audio frame sent by a first device, wherein the first audio frame comprises N pieces of audio data, and N is a positive integer greater than or equal to 2;
parsing, by the second device, the first audio frame to obtain the N pieces of audio data.
6. The audio transmission method according to claim 5, wherein the first audio frame is obtained by the first device combining the N pieces of audio data.
7. The audio transmission method according to claim 6, further comprising:
generating, by the second device, feedback information according to the first audio frame;
sending, by the second device, the feedback information to the first device, so that the first device adjusts the number of pieces of audio data in the first audio frame according to the feedback information.
8. The audio transmission method according to claim 5, wherein, in a case where the first device transmits the first audio frame and a second audio frame to the second device and the second device receives only the first audio frame, the first audio frame being generated from the N pieces of audio data among M pieces of audio data acquired by the first device and the second audio frame being generated from the remaining M-N pieces, the audio transmission method further comprises:
interpolating, by the second device, the first audio frame to obtain M-N pieces of interpolated audio data corresponding to the M-N pieces of audio data.
9. An audio transmission apparatus, applied to a first device, the audio transmission apparatus comprising:
a first generating module, configured to generate a first audio frame, wherein the first audio frame comprises N pieces of audio data, and N is a positive integer greater than or equal to 2; and
a first sending module, configured to send the first audio frame to a second device.
10. An audio transmission apparatus, applied to a second device, the audio transmission apparatus comprising:
a receiving module, configured to receive a first audio frame sent by a first device, wherein the first audio frame comprises N pieces of audio data, and N is a positive integer greater than or equal to 2; and
a parsing module, configured to parse the first audio frame to obtain the N pieces of audio data.
CN201910841322.1A 2019-09-05 2019-09-05 Audio transmission method and device Pending CN110557226A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910841322.1A CN110557226A (en) 2019-09-05 2019-09-05 Audio transmission method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910841322.1A CN110557226A (en) 2019-09-05 2019-09-05 Audio transmission method and device

Publications (1)

Publication Number Publication Date
CN110557226A true CN110557226A (en) 2019-12-10

Family

ID=68739222

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910841322.1A Pending CN110557226A (en) 2019-09-05 2019-09-05 Audio transmission method and device

Country Status (1)

Country Link
CN (1) CN110557226A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112735450A (en) * 2020-12-30 2021-04-30 北京百瑞互联技术有限公司 Method, device, storage medium and equipment for transmitting data based on voice channel
CN114448955A (en) * 2021-12-31 2022-05-06 赛因芯微(北京)电子科技有限公司 Digital audio network transmission method, device, equipment and storage medium
CN115834555A (en) * 2023-02-16 2023-03-21 广州市保伦电子有限公司 Audio flow control and transmission method based on fuzzy control

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101635723A (en) * 2008-07-25 2010-01-27 华为技术有限公司 Method and device for transferring data frames
CN101257419B (en) * 2007-02-27 2010-12-29 中国移动通信集团公司 Wireless VOIP voice frame combining method as well as radio device
US20110150099A1 (en) * 2009-12-21 2011-06-23 Calvin Ryan Owen Audio Splitting With Codec-Enforced Frame Sizes
CN105391523A (en) * 2015-12-18 2016-03-09 福建星海通信科技有限公司 Voice optimization transmission method and device
EP3120354A1 (en) * 2014-03-21 2017-01-25 Nokia Technologies OY Methods, apparatuses for forming audio signal payload and audio signal payload
CN106375063A (en) * 2016-08-30 2017-02-01 上海华为技术有限公司 Data transmission method and equipment
CN106462557A (en) * 2014-06-27 2017-02-22 奥兰治 Resampling of an audio signal by interpolation for low-delay encoding/decoding
CN109819303A (en) * 2019-03-06 2019-05-28 Oppo广东移动通信有限公司 Data output method and relevant device

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101257419B (en) * 2007-02-27 2010-12-29 中国移动通信集团公司 Wireless VOIP voice frame combining method as well as radio device
CN101635723A (en) * 2008-07-25 2010-01-27 华为技术有限公司 Method and device for transferring data frames
US20110150099A1 (en) * 2009-12-21 2011-06-23 Calvin Ryan Owen Audio Splitting With Codec-Enforced Frame Sizes
EP3120354A1 (en) * 2014-03-21 2017-01-25 Nokia Technologies OY Methods, apparatuses for forming audio signal payload and audio signal payload
CN106462557A (en) * 2014-06-27 2017-02-22 奥兰治 Resampling of an audio signal by interpolation for low-delay encoding/decoding
CN105391523A (en) * 2015-12-18 2016-03-09 福建星海通信科技有限公司 Voice optimization transmission method and device
CN106375063A (en) * 2016-08-30 2017-02-01 上海华为技术有限公司 Data transmission method and equipment
CN109819303A (en) * 2019-03-06 2019-05-28 Oppo广东移动通信有限公司 Data output method and relevant device

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112735450A (en) * 2020-12-30 2021-04-30 北京百瑞互联技术有限公司 Method, device, storage medium and equipment for transmitting data based on voice channel
CN112735450B (en) * 2020-12-30 2023-03-17 北京百瑞互联技术有限公司 Method, device, storage medium and equipment for transmitting data based on voice channel
CN114448955A (en) * 2021-12-31 2022-05-06 赛因芯微(北京)电子科技有限公司 Digital audio network transmission method, device, equipment and storage medium
CN114448955B (en) * 2021-12-31 2024-02-02 赛因芯微(北京)电子科技有限公司 Digital audio network transmission method, device, equipment and storage medium
CN115834555A (en) * 2023-02-16 2023-03-21 广州市保伦电子有限公司 Audio flow control and transmission method based on fuzzy control
CN115834555B (en) * 2023-02-16 2023-08-18 广东保伦电子股份有限公司 Audio flow control and transmission method based on fuzzy control

Similar Documents

Publication Publication Date Title
WO2018077083A1 (en) Audio frame loss recovery method and apparatus
US20100128715A1 (en) Protocol Conversion System in Media Communication between a Packet-Switching Network and Circuit-Switiching Network
US20120054818A1 (en) Enhanced video streaming to mobile clients
CN110557226A (en) Audio transmission method and device
US9369508B2 (en) Method for transmitting a scalable HTTP stream for natural reproduction upon the occurrence of expression-switching during HTTP streaming
EP2129126A1 (en) Transmission apparatus, transmission method, and reception apparatus
US8479059B2 (en) Radio communication device, radio communication system, program and radio communication method
JP2016508357A (en) Wireless real-time media communication using multiple media streams
EP2211566B1 (en) Wireless communication apparatus, program and wireless communication method
TW201021576A (en) System and method for dynamic video encoding in multimedia streaming
JP2010521856A (en) Data transmission method in communication system
KR20100083233A (en) Apparatus and method for multimedia file streaming in portable terminal
Herrero Integrating HEC with circuit breakers and multipath RTP to improve RTC media quality
JPWO2011090185A1 (en) Voice quality measuring device, voice quality measuring method and program
CN108540745B (en) High-definition double-stream video transmission method, transmitting end, receiving end and transmission system
JPWO2010007977A1 (en) Gateway apparatus and method and program
EP2903224B1 (en) Method for transmitting audio information and packet communication system
JP4544029B2 (en) Portable terminal, streaming communication system, streaming communication method, and streaming communication program
JP4862262B2 (en) DTMF signal processing method, processing device, relay device, and communication terminal device
KR20140070896A (en) Method for video streaming and an electronic device thereof
JP4909590B2 (en) Media signal receiving device, transmitting device, and transmitting / receiving system
CN108200481B (en) RTP-PS stream processing method, device, equipment and storage medium
TWI523461B (en) Communication system and method
JP6113525B2 (en) Information division transmission apparatus, information division transmission method, and information division transmission processing program
EP2882164A1 (en) Communication system, server apparatus, server apparatus controlling method and program

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20191210
