CN113674751A - Audio processing method and device, electronic equipment and storage medium
- Publication number
- CN113674751A (application number CN202110778572.2A)
- Authority
- CN
- China
- Prior art keywords
- audio
- target audio
- coordinate system
- target
- data packet
- Prior art date: 2021-07-09
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G10L19/008—Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
- H04L65/403—Arrangements for multi-party communication, e.g. for conferences
- H04L65/65—Network streaming protocols, e.g. real-time transport protocol [RTP] or real-time control protocol [RTCP]
- H04L65/70—Media network packetisation
- H04S5/005—Pseudo-stereo systems, e.g. in which additional channel signals are derived from monophonic signals by means of phase shifting, time delay or reverberation, of the pseudo five- or more-channel type, e.g. virtual surround
- H04S2420/03—Application of parametric coding in stereophonic audio systems
Abstract
The disclosure provides an audio processing method, an audio processing apparatus, an electronic device, and a storage medium. One embodiment of the method comprises: determining azimuth information of a target audio according to an extension header of an audio data packet, wherein the azimuth information is used for representing the spatial azimuth of the sound source corresponding to the target audio, and the code of the target audio is recorded in the packet body of the audio data packet; and outputting the target audio according to the azimuth information of the target audio and the coding of the target audio. On one hand, the method does not require the coding of the target audio itself to contain azimuth information, so it is compatible with existing encoding and decoding methods and has a wider application range; on the other hand, the method keeps the azimuth information directly corresponding to the encoded data, thereby reducing the problem of information lag or loss caused by multi-path transmission.
Description
Technical Field
The embodiment of the disclosure relates to the technical field of audio processing, in particular to an audio processing method, an audio processing device, electronic equipment and a storage medium.
Background
In a centralized Real-time Transport Protocol (RTP) audio conference, mono audio codecs such as iLBC, OPUS, and G.711 do not convey the characteristics of the audio source. Stereo or multi-channel codecs can transmit spatial features, but at the cost of higher bandwidth. With the development of directional audio coding (DirAC), audio source locations can be captured and the spatial characteristics of sound recorded with existing microphone systems can be reproduced. In an audio-video conference, real-time remote surround sound reconstruction may improve the interaction between participants.
However, DirAC focuses mainly on how to acquire spatial attributes; how to transmit those attributes remains a technical problem to be solved.
Disclosure of Invention
The embodiment of the disclosure provides an audio processing method and device, an electronic device and a storage medium.
In a first aspect, the present disclosure provides an audio processing method, including:
determining azimuth information of a target audio according to an extension header of an audio data packet, wherein the azimuth information is used for representing the spatial azimuth of a sound source corresponding to the target audio, and the code of the target audio is recorded in a packet body of the audio data packet;
and outputting the target audio according to the azimuth information of the target audio and the coding of the target audio.
In some optional embodiments, before outputting the target audio according to the orientation information of the target audio and the coding of the target audio, the method further includes:
determining reverberation information of the target audio according to an extension header of the audio data packet; and
the outputting the target audio according to the azimuth information of the target audio and the coding of the target audio includes:
and outputting the target audio according to the azimuth information of the target audio, the reverberation information of the target audio and the coding of the target audio.
In some optional embodiments, the determining the azimuth information of the target audio according to the extension header of the audio data packet includes:
determining the type of a coordinate system according to the first data in the extension head;
and determining the azimuth information under the coordinate system type according to the second data in the extension head.
In some optional embodiments, the coordinate system type is a spherical coordinate system, and the orientation information includes a first angle value and a second angle value; and/or
the coordinate system type is a planar rectangular coordinate system, and the orientation information includes a first coordinate value and a second coordinate value.
In some optional embodiments, the coordinate system type is a spherical coordinate system, and the accuracy of the first angle value is non-linear and/or the accuracy of the second angle value is non-linear.
In some optional embodiments, the audio data packet is a real-time transport protocol data packet.
In some optional embodiments, the target audio is audio for audio-video conference.
In some optional embodiments, the outputting the target audio according to the orientation information of the target audio and the encoding of the target audio includes:
and outputting a three-dimensional sound field corresponding to the target audio by using a virtual three-dimensional technology according to the azimuth information.
In a second aspect, the present disclosure provides an audio processing method, including:
recording the code of a target audio in a packet body of an audio data packet, and recording the azimuth information of the target audio in an extension header of the audio data packet, wherein the azimuth information is used for representing the spatial azimuth of a sound source corresponding to the target audio;
and sending the audio data packet to the target equipment.
In some optional embodiments, before the sending the audio data packet to the target device, the method further includes:
and recording the reverberation information of the target audio in the extension header of the audio data packet.
In some optional embodiments, the recording of the azimuth information of the target audio in the extension header of the audio data packet includes:
recording, in the extension header, first data indicating a coordinate system type;
and recording, in the extension header, second data indicating the orientation information under the coordinate system type.
In some optional embodiments, the coordinate system type is a spherical coordinate system, and the orientation information includes a first angle value and a second angle value; and/or
the coordinate system type is a planar rectangular coordinate system, and the orientation information includes a first coordinate value and a second coordinate value.
In some optional embodiments, the coordinate system type is a spherical coordinate system, and the accuracy of the first angle value is non-linear and/or the accuracy of the second angle value is non-linear.
In some optional embodiments, the orientation information is used for the target device to output a three-dimensional sound field corresponding to the target audio by using a virtual three-dimensional technology.
In a third aspect, the present disclosure provides an audio processing apparatus comprising:
the processing unit is used for determining azimuth information of a target audio according to an extension header of an audio data packet, wherein the azimuth information is used for representing the spatial azimuth of a sound source corresponding to the target audio, and the code of the target audio is recorded in a packet body of the audio data packet;
and an output unit for outputting the target audio according to the azimuth information of the target audio and the code of the target audio.
In some optional embodiments, the processing unit is further configured to:
determining reverberation information of the target audio according to an extension header of the audio data packet; and
the output unit is further configured to:
and outputting the target audio according to the azimuth information of the target audio, the reverberation information of the target audio and the coding of the target audio.
In some optional embodiments, the processing unit is further configured to:
determining the type of a coordinate system according to the first data in the extension head;
and determining the azimuth information under the coordinate system type according to the second data in the extension head.
In some optional embodiments, the coordinate system type is a spherical coordinate system, and the orientation information includes a first angle value and a second angle value; and/or
the coordinate system type is a planar rectangular coordinate system, and the orientation information includes a first coordinate value and a second coordinate value.
In some optional embodiments, the coordinate system type is a spherical coordinate system, and the accuracy of the first angle value is non-linear and/or the accuracy of the second angle value is non-linear.
In some optional embodiments, the audio data packet is a real-time transport protocol data packet.
In some optional embodiments, the target audio is audio for audio-video conference.
In some optional embodiments, the output unit is further configured to:
and outputting a three-dimensional sound field corresponding to the target audio by using a virtual three-dimensional technology according to the azimuth information.
In a fourth aspect, the present disclosure also provides an audio processing apparatus, including:
a recording unit, configured to record the code of a target audio in the packet body of an audio data packet, and record azimuth information of the target audio in an extension header of the audio data packet, where the azimuth information is used to represent the spatial azimuth of a sound source corresponding to the target audio;
and the sending unit is used for sending the audio data packet to the target equipment.
In some optional embodiments, the recording unit is further configured to:
and recording the reverberation information of the target audio in the extension header of the audio data packet.
In some optional embodiments, the recording unit is further configured to:
record, in the extension header, first data indicating a coordinate system type;
and record, in the extension header, second data indicating the orientation information under the coordinate system type.
In some optional embodiments, the coordinate system type is a spherical coordinate system, and the orientation information includes a first angle value and a second angle value; and/or
the coordinate system type is a planar rectangular coordinate system, and the orientation information includes a first coordinate value and a second coordinate value.
In some optional embodiments, the coordinate system type is a spherical coordinate system, and the accuracy of the first angle value is non-linear and/or the accuracy of the second angle value is non-linear.
In some optional embodiments, the orientation information is used for the target device to output a three-dimensional sound field corresponding to the target audio by using a virtual three-dimensional technology.
In a fifth aspect, the present disclosure provides an electronic device comprising:
one or more processors;
a storage device having one or more programs stored thereon,
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method as described in any of the embodiments of the first or second aspects of the disclosure.
In a sixth aspect, the present disclosure provides a computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by one or more processors, implements the method as described in any one of the embodiments of the first or second aspect of the present disclosure.
According to the audio processing method and apparatus, the electronic device, and the storage medium provided by the disclosure, the coding of the target audio is transmitted in the packet body of the audio data packet, and the azimuth information of the target audio is transmitted in the extension header of the audio data packet. On one hand, the coding of the target audio does not need to contain azimuth information itself, so the scheme is compatible with existing encoding and decoding methods and has a wider application range; on the other hand, the azimuth information directly corresponds to the encoded data, which reduces the problem of information lag or loss caused by multi-path transmission.
In addition, the audio processing method and apparatus, the electronic device, and the storage medium provided by the disclosure transmit the azimuth information of the target audio in the extension header of the audio data packet, which reduces the transmission bandwidth requirement. Moreover, the transmitting end can adjust the extension header of the audio data packet in real time before the packet is transmitted, so the scheme provided by the disclosure also enables real-time transmission of the azimuth information and dynamic adjustment of the azimuth information by the transmitting end.
Drawings
Other features, objects, and advantages of the disclosure will become apparent from a reading of the following detailed description of non-limiting embodiments which proceeds with reference to the accompanying drawings. The drawings are only for purposes of illustrating the particular embodiments and are not to be construed as limiting the invention. In the drawings:
fig. 1 is a system architecture diagram of one embodiment of an audio processing system according to the present disclosure;
FIG. 2A is a flow diagram for one embodiment of an audio processing method according to the present disclosure;
fig. 2B is a schematic diagram of an extension header of an audio data packet in the audio processing method according to the present disclosure;
fig. 2C is a schematic diagram of a mapping relationship of first angle data and first angle values in the audio processing method according to the present disclosure;
fig. 2D is a schematic diagram of a mapping relationship of second angle data and second angle values in an audio processing method according to the present disclosure;
FIG. 2E is a schematic diagram of a rectangular plane coordinate system in the audio processing method according to the disclosure;
FIG. 3 is a flow diagram of another embodiment of an audio processing method according to the present disclosure;
FIG. 4A is a schematic block diagram of one embodiment of an audio processing device according to the present disclosure;
FIG. 4B is a schematic block diagram of another embodiment of an audio processing device according to the present disclosure;
FIG. 5 is a schematic block diagram of a computer system suitable for use with an electronic device implementing embodiments of the present disclosure.
Detailed Description
The present disclosure is described in further detail below with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings.
It should be noted that, in the present disclosure, the embodiments and features of the embodiments may be combined with each other without conflict. The present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
Fig. 1 illustrates an exemplary system architecture 100 to which embodiments of the audio processing method, apparatus, terminal device, and storage medium of the present disclosure may be applied.
As shown in fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
The user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages or the like. Various communication client applications, such as a voice interaction application, a video conference application, a short video social application, a web browser application, a shopping application, a search application, an instant messaging tool, a mailbox client, social platform software, and the like, may be installed on the terminal devices 101, 102, 103.
The terminal apparatuses 101, 102, and 103 may be hardware or software. When they are hardware, they may be any of various electronic devices having a microphone and a speaker, including but not limited to smartphones, tablet computers, e-book readers, MP3 players (Moving Picture Experts Group Audio Layer III), MP4 players (Moving Picture Experts Group Audio Layer IV), portable computers, desktop computers, and the like. When the terminal apparatuses 101, 102, 103 are software, they can be installed in the electronic apparatuses listed above and may be implemented as multiple pieces of software or software modules, or as a single piece of software or software module. No specific limitation is made here.
The server 105 may be a server that provides various services, such as a background server that processes the audio-video conference requests sent by the terminal devices 101, 102, 103.
In some cases, the audio processing method provided by the present disclosure may be executed by the server 105, and accordingly, the audio processing apparatus may also be disposed in the server 105, and in this case, the system architecture 100 may also not include the terminal devices 101, 102, and 103.
In some cases, the audio processing method provided by the present disclosure may be executed by the terminal devices 101, 102, and 103, and accordingly, the audio processing apparatus may also be disposed in the terminal devices 101, 102, and 103, and in this case, the system architecture 100 may not include the server 105.
The server 105 may be hardware or software. When the server 105 is hardware, it may be implemented as a distributed server cluster composed of a plurality of servers, or may be implemented as a single server. When the server 105 is software, it may be implemented as multiple pieces of software or software modules (e.g., to provide distributed services), or as a single piece of software or software module. And is not particularly limited herein.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
With continuing reference to fig. 2A, a flow 210 of one embodiment of an audio processing method according to the present disclosure is shown, applicable to the terminal devices in fig. 1. The flow 210 includes the following steps:
Step 211: determining azimuth information of the target audio according to the extension header of the audio data packet, wherein the azimuth information is used for representing the spatial azimuth of the sound source corresponding to the target audio, and the code of the target audio is recorded in the packet body of the audio data packet.
In the present embodiment, the azimuth information is used to indicate the spatial azimuth of the sound source corresponding to the target audio, for example, the direction of the sound source with respect to the audio collecting position or the listener.
Generally, a data packet is the unit of transmission in network data transmission. A data packet can be divided into a packet header and a packet body. The header is a reserved field of defined bit length attached to the front of the packet to carry control information; it serves both control and descriptive purposes. The body is the part that carries the data to be transmitted.
Some communication protocols (e.g., the Real-time Transport Protocol, RTP) support extension of packet headers. In this case, the header of the packet can be further divided into a fixed header (Fixed Header) and an extension header (Header Extension).
In the present embodiment, the azimuth information of the target audio may be recorded in the extension header of the audio packet.
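To make the packet layout concrete, the following sketch splits an RTP packet into its header extension and packet body along the lines of RFC 3550. It is an illustration for this scenario rather than code from the disclosure; the function name and the omission of padding and validation are assumptions.

```python
import struct

def split_rtp_packet(packet: bytes):
    """Split an RTP packet into its header extension and packet body.

    A minimal sketch following the RFC 3550 layout: 12-byte fixed header,
    optional CSRC list, optional header extension flagged by the X bit.
    Padding and validation are deliberately omitted.
    """
    first_byte = packet[0]
    csrc_count = first_byte & 0x0F            # CC field: number of CSRCs
    has_extension = bool(first_byte & 0x10)   # X bit: extension present?
    offset = 12 + 4 * csrc_count              # skip fixed header + CSRC list

    extension = b""
    if has_extension:
        # Extension header: 16-bit profile id, 16-bit length in 32-bit words.
        _profile, length_words = struct.unpack_from("!HH", packet, offset)
        offset += 4
        extension = packet[offset:offset + 4 * length_words]
        offset += 4 * length_words

    body = packet[offset:]                    # the encoded audio lives here
    return extension, body
```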
Fig. 2B is a schematic diagram of an extension header of an audio data packet in the audio processing method according to the present disclosure. In the example shown in fig. 2B, the audio position information is carried in the two extension header formats defined in RFC 5285: the upper part of fig. 2B shows the One-Byte extension header, and the lower part shows the Two-Byte extension header.
As shown in fig. 2B, the parameter "m" (i.e., the first data) is used to record the coordinate system type. In one example, the parameter "m" is two bits in length. The parameter "m" has a value of "00" and may indicate that the coordinate system type is a spherical coordinate system. The origin of the spherical coordinate system is, for example, the position of the listener. The parameter "m" has a value of "01", and may indicate that the coordinate system type is a planar rectangular coordinate system. The origin of the rectangular plane coordinate system is, for example, the position of the listener. The mapping relationship between the value of the parameter "m" and the type of the coordinate system may be set according to actual needs, which is not limited by the present disclosure.
As shown in fig. 2B, the parameters "sita (or) X" and "phi (or) Y" (i.e., the second data) are used to record the orientation information corresponding to the above-mentioned coordinate system type. For example, in the case where the coordinate system type is a spherical coordinate system, the parameter "sita (or) X" (i.e., first angle data) represents a horizontal angle (i.e., a first angle value) in the spherical coordinate system, and the parameter "phi (or) Y" (i.e., second angle data) represents a vertical angle (i.e., a second angle value) in the spherical coordinate system. For another example, in the case where the coordinate system type is a planar rectangular coordinate system, the parameter "sita (or) X" represents an abscissa (i.e., a first coordinate value) in the planar rectangular coordinate system, and the parameter "phi (or) Y" represents an ordinate (i.e., a second coordinate value) in the planar rectangular coordinate system.
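As a sketch of how such fields could be packed, the helpers below place "m", "sita (or) X", and "phi (or) Y" into two bytes. The exact bit order of fig. 2B is not reproduced in the text, so the big-endian layout and the constant names here are illustrative assumptions.

```python
SPHERICAL = 0b00    # assumed value of "m" for the spherical coordinate system
RECTANGULAR = 0b01  # assumed value of "m" for the planar rectangular system

def pack_position(m: int, first: int, second: int) -> bytes:
    """Pack "m" (2 bits), "sita (or) X" (7 bits) and "phi (or) Y" (7 bits)
    into two bytes; the bit order is an assumption for illustration."""
    value = ((m & 0b11) << 14) | ((first & 0x7F) << 7) | (second & 0x7F)
    return value.to_bytes(2, "big")

def unpack_position(data: bytes):
    """Inverse of pack_position: returns (m, first, second)."""
    value = int.from_bytes(data[:2], "big")
    return (value >> 14) & 0b11, (value >> 7) & 0x7F, value & 0x7F
```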
In the above example, when the parameter "sita (or) X" represents the horizontal angle in the spherical coordinate system, its length is, for example, 7 bits, where the first bit is the sign bit and the remaining six bits are the angle bits; 0 degrees is defined as directly in front of the listener. Since 7-bit data can represent at most 128 values while the horizontal angle in the spherical coordinate system ranges over [-180, 180], covering 360 integer degrees, a mapping relationship between the parameter "sita (or) X" (i.e., the first angle data) and the horizontal angle (i.e., the first angle value) needs to be determined.
Fig. 2C is a schematic diagram of the mapping relationship between the first angle data and the first angle values in the audio processing method according to the present disclosure. Psychoacoustic studies have confirmed that human localization of sound direction is imprecise: the resolution is approximately ±2 degrees to the front, ±10 degrees to the side, and ±5 degrees to the rear. The first angle values in fig. 2C are therefore spaced non-uniformly: as the first angle value increases, the interval between adjacent first angle values follows a "1-2-10-5" progression, i.e., the accuracy of the first angle value is non-linear. In this way, a limited number of first angle data can represent a wide range of first angle values, while the quantization stays consistent with human directional resolution, so the actual spatial listening experience is not affected.
It should be noted that fig. 2C only shows the mapping relationship between the first angle data and the first angle value when both the first angle data and the first angle value are positive values, and the mapping relationship between the first angle data and the first angle value when both the first angle data and the first angle value are negative values is similar to this, and is not repeated here.
Fig. 2D is a schematic diagram of the mapping relationship between the second angle data and the second angle values in the audio processing method according to the present disclosure. Similar to the example shown in fig. 2C, the second angle values in fig. 2D are also spaced non-uniformly: as the second angle value increases, the interval between adjacent second angle values follows a "2-5" progression, i.e., the accuracy of the second angle value is non-linear. Fig. 2D only shows the mapping for the case where both the second angle data and the second angle values are positive; the mapping for negative values is similar and is not repeated here.
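A non-uniform mapping of this kind can be realized as a small codebook. The sketch below builds a hypothetical codebook for the positive half of the horizontal angle range following the stated "1-2-10-5" step pattern; the exact breakpoints of figs. 2C and 2D are not given in the text, so the region boundaries below are assumptions.

```python
def build_horizontal_codebook():
    """Codebook for the positive half of the horizontal angle range.

    The regions merely follow the stated "1-2-10-5" step pattern
    (finest in front, coarsest to the side); the boundaries are
    illustrative assumptions, not the values of fig. 2C.
    """
    codebook, angle = [], 0
    for region_limit, step in ((10, 1), (30, 2), (130, 10), (180, 5)):
        while angle < region_limit:
            codebook.append(angle)
            angle += step
    codebook.append(180)
    return codebook       # 41 entries, well within the 64 values of 6 bits

def encode_angle(angle: float, codebook) -> int:
    """Return the index of the nearest codebook entry (the six angle
    bits; the sign bit is handled separately)."""
    return min(range(len(codebook)), key=lambda i: abs(codebook[i] - angle))
```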
Fig. 2E is a schematic diagram of the planar rectangular coordinate system in the audio processing method according to the present disclosure. The planar rectangular coordinate system does not involve height information. As shown in fig. 2E, the origin of the coordinate system may be the position of the listener, the abscissa axis may point in the direction the listener faces, with its positive direction consistent with the listener's line of sight, and the ordinate axis is perpendicular to the abscissa axis.
The abscissa (i.e., the first coordinate value) in the planar rectangular coordinate system ranges over [-1, 1] and can be expressed in quantized form by the 7-bit parameter "sita (or) X". Similarly, the ordinate (i.e., the second coordinate value) ranges over [-1, 1] and can be quantized by the 7-bit parameter "phi (or) Y".
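For the planar rectangular case, a plain uniform quantizer suffices. The sketch below maps a coordinate in [-1, 1] to a 7-bit sign-magnitude code and back; the disclosure only fixes the 7-bit length, so the sign-magnitude scheme is an assumption.

```python
def quantize_coord(x: float, bits: int = 7) -> int:
    """Quantize a coordinate in [-1, 1] to a sign-magnitude code of
    `bits` bits (an assumed scheme for illustration)."""
    x = max(-1.0, min(1.0, x))
    levels = (1 << (bits - 1)) - 1            # 63 magnitude levels for 7 bits
    magnitude = round(abs(x) * levels)
    sign_bit = (1 << (bits - 1)) if x < 0 else 0
    return sign_bit | magnitude

def dequantize_coord(code: int, bits: int = 7) -> float:
    """Inverse of quantize_coord."""
    levels = (1 << (bits - 1)) - 1
    sign = -1.0 if code & (1 << (bits - 1)) else 1.0
    return sign * (code & levels) / levels
```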
Based on the above description, the execution subject of the audio processing method in the present embodiment may determine the coordinate system type from the first data in the extension header, and determine the orientation information under the coordinate system type from the second data in the extension header.
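Putting the pieces together, a receiver-side dispatch might look like the sketch below, which reuses the helpers from the previous sketches (unpack_position, build_horizontal_codebook, dequantize_coord); all names and the field layout are illustrative assumptions.

```python
HORIZONTAL_CODEBOOK = build_horizontal_codebook()

def decode_orientation(position_bytes: bytes):
    """Read the first data ("m") to determine the coordinate system type,
    then interpret the second data under that type."""
    m, first, second = unpack_position(position_bytes)
    if m == SPHERICAL:
        # First bit of each 7-bit field is the sign, last six the angle bits.
        sign = -1 if first & 0x40 else 1
        index = min(first & 0x3F, len(HORIZONTAL_CODEBOOK) - 1)
        horizontal = sign * HORIZONTAL_CODEBOOK[index]
        return "spherical", horizontal, second  # vertical decode is analogous
    if m == RECTANGULAR:
        return "rectangular", dequantize_coord(first), dequantize_coord(second)
    raise ValueError(f"unknown coordinate system type: {m:#04b}")
```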
Step 212: outputting the target audio according to the azimuth information of the target audio and the coding of the target audio.
In this embodiment, the execution subject of the audio processing method may output a three-dimensional sound field of the target audio by combining the azimuth information of the target audio with a virtual three-dimensional (3D) technology or the like, thereby realizing a spatial listening experience for the target audio.
Here, the execution subject of the audio processing method may play the target audio directly, or transmit the corresponding audio signal to another electronic device (e.g., a Bluetooth speaker) for playback.
In one example, the extension header of the audio data packet also includes reverberation information. For example, in the example shown in fig. 2B, the parameter "revb" may represent the reverberation type (e.g., hall, room, or tunnel). On this basis, the execution subject of the audio processing method in this embodiment may determine the reverberation information of the target audio from the extension header of the audio data packet, and output the target audio according to the azimuth information of the target audio, the reverberation information of the target audio, and the coding of the target audio. In this way, spatial perception can be reconstructed using a predefined reverberation model, further improving the spatial perception of the target audio.
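As a minimal stand-in for such rendering, the sketch below applies constant-power stereo panning to a decoded mono frame using the horizontal angle. A full implementation would typically use HRTF filtering and a reverberation model selected by the "revb" field; the mapping of negative azimuths to the left is an assumption.

```python
import math

def render_frame(samples, azimuth_deg: float):
    """Constant-power stereo panning of a decoded mono frame.

    A deliberately simple stand-in for the virtual 3D rendering
    described above; negative azimuths are assumed to be to the left.
    """
    clamped = max(-90.0, min(90.0, azimuth_deg))
    pan = (clamped + 90.0) / 180.0 * (math.pi / 2)   # map to [0, pi/2]
    left_gain, right_gain = math.cos(pan), math.sin(pan)
    return [(s * left_gain, s * right_gain) for s in samples]
```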
Unlike methods that transmit azimuth information in the packet body, the audio processing method in this embodiment uses the packet body of the audio data packet to transmit the coding of the target audio and uses the extension header of the audio data packet to transmit the azimuth information of the target audio. Therefore, the method does not need the coding of the target audio itself to contain azimuth information, is compatible with existing encoding and decoding methods, has a wider application range, and keeps the azimuth information directly corresponding to the audio coding, thereby reducing the problem of information lag or loss caused by multi-path transmission.
With continuing reference to fig. 3, a flow 310 of another embodiment of an audio processing method according to the present disclosure is shown, applicable to the server in fig. 1. The flow 310 includes the following steps:
Step 311: recording the code of a target audio in the packet body of an audio data packet, and recording azimuth information of the target audio in the extension header of the audio data packet, wherein the azimuth information is used for representing the spatial azimuth of the sound source corresponding to the target audio.
Step 312: sending the audio data packet to a target device.
Here, the structure of the extension header of the audio data packet may refer to the description in the foregoing embodiment and is not repeated here.
Here, the target device is, for example, the terminal device in fig. 1.
In one example, the target audio may be audio of an audio-video conference, and the execution subject of the audio processing method in this embodiment may be an audio-video conference server. In this case, the conference server can conveniently perform real-time transfer and dynamic control of the azimuth information.
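A sender-side sketch under these assumptions is shown below: the encoded frame goes into the packet body and the position bytes into the header extension before the packet is sent. The fixed-header stub, the RFC 5285 profile value 0xBEDE, and the destination address are illustrative, and the per-element ID/length byte of the One-Byte form is omitted for brevity.

```python
import socket

def send_audio_packet(encoded_frame: bytes, position: bytes,
                      addr=("192.0.2.1", 5004)):
    """Sender-side sketch: encoded audio in the packet body, position
    fields in the header extension; field values are stubs."""
    fixed_header = bytes([0x90, 96]) + bytes(10)   # V=2, X=1, PT=96 (stub)
    words = (len(position) + 3) // 4               # extension length in words
    padding = bytes(4 * words - len(position))
    extension = (0xBEDE).to_bytes(2, "big") + words.to_bytes(2, "big") \
        + position + padding
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.sendto(fixed_header + extension + encoded_frame, addr)
```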
The audio processing method in the present embodiment achieves technical effects similar to those of the audio processing method in the foregoing embodiment. In addition, by transmitting the azimuth information of the target audio in the extension header of the audio data packet, it enables real-time transmission of the azimuth information and dynamic adjustment of the azimuth information by the sending end.
With further reference to fig. 4A, as an implementation of the method shown in fig. 2A described above, the present disclosure provides an embodiment of an audio processing apparatus, which corresponds to the method embodiment shown in fig. 2A, and which is specifically applicable to various terminal devices.
As shown in fig. 4A, the audio processing apparatus 410 of the present embodiment includes: a processing unit 411 and an output unit 412. The processing unit 411 is configured to determine, according to an extension header of an audio data packet, azimuth information of a target audio, where the azimuth information is used to represent a spatial azimuth of a sound source corresponding to the target audio, and a code of the target audio is recorded in a packet body of the audio data packet; an output unit 412, configured to output the target audio according to the azimuth information of the target audio and the coding of the target audio.
In this embodiment, the detailed processing of the processing unit 411 and the output unit 412 and the technical effects thereof can refer to the related descriptions of step 211 and step 212 in the corresponding embodiment of fig. 2A, which are not repeated herein.
In some optional embodiments, the processing unit 411 is further configured to: determine reverberation information of the target audio according to the extension header of the audio data packet; and the output unit 412 is further configured to: output the target audio according to the azimuth information of the target audio, the reverberation information of the target audio, and the coding of the target audio.
In some optional embodiments, the processing unit 411 is further configured to: determining the type of a coordinate system according to the first data in the extension head; and determining the azimuth information under the coordinate system type according to the second data in the extension head.
In some optional embodiments, the coordinate system type is a spherical coordinate system, and the orientation information includes a first angle value and a second angle value; and/or the coordinate system type is a planar rectangular coordinate system, and the orientation information includes a first coordinate value and a second coordinate value.
In some optional embodiments, the coordinate system type is a spherical coordinate system, and the accuracy of the first angle value is non-linear and/or the accuracy of the second angle value is non-linear.
In some optional embodiments, the audio data packet is a real-time transport protocol data packet.
In some optional embodiments, the target audio is audio for audio-video conference.
In some optional embodiments, the output unit 412 is further configured to: and outputting a three-dimensional sound field corresponding to the target audio by using a virtual three-dimensional technology according to the azimuth information.
It should be noted that, for details of implementation and technical effects of each unit in the audio processing apparatus provided in the embodiments of the present disclosure, reference may be made to descriptions of other embodiments in the present disclosure, and details are not described herein again.
With further reference to fig. 4B, as an implementation of the method shown in fig. 3 described above, the present disclosure provides an embodiment of an audio processing apparatus, which corresponds to the embodiment of the method shown in fig. 3, and which is particularly applicable to various servers.
As shown in fig. 4B, the audio processing apparatus 420 of the present embodiment includes: a recording unit 421 and a sending unit 422. The recording unit 421 is configured to record the code of a target audio in the packet body of an audio data packet, and record azimuth information of the target audio in an extension header of the audio data packet, where the azimuth information is used to indicate the spatial azimuth of a sound source corresponding to the target audio; the sending unit 422 is configured to send the audio data packet to a target device.
In this embodiment, the detailed processing of the recording unit 421 and the sending unit 422 and the technical effects thereof can refer to the related descriptions of step 311 and step 312 in the corresponding embodiment of fig. 3, which are not repeated herein.
In some optional embodiments, the recording unit 421 is further configured to: and recording the reverberation information of the target audio in the extension header of the audio data packet.
In some optional embodiments, the recording unit 421 is further configured to: record, in the extension header, first data indicating a coordinate system type; and record, in the extension header, second data indicating the orientation information under the coordinate system type.
In some optional embodiments, the coordinate system type is a spherical coordinate system, and the orientation information includes a first angle value and a second angle value; and/or the coordinate system type is a planar rectangular coordinate system, and the orientation information includes a first coordinate value and a second coordinate value.
In some optional embodiments, the coordinate system type is a spherical coordinate system, and the accuracy of the first angle value is non-linear and/or the accuracy of the second angle value is non-linear.
In some optional embodiments, the orientation information is used for the target device to output a three-dimensional sound field corresponding to the target audio by using a virtual three-dimensional technology.
It should be noted that, for details of implementation and technical effects of each unit in the audio processing apparatus provided in the embodiments of the present disclosure, reference may be made to descriptions of other embodiments in the present disclosure, and details are not described herein again.
Referring now to FIG. 5, there is shown a schematic block diagram of a computer system 500 suitable for use in implementing the terminal device or server of the present disclosure. The computer system 500 shown in fig. 5 is only an example and should not bring any limitations to the functionality or scope of use of the embodiments of the present disclosure.
As shown in fig. 5, computer system 500 may include a processing device (e.g., central processing unit, graphics processor, etc.) 501 that may perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM)502 or a program loaded from a storage device 508 into a Random Access Memory (RAM) 503. In the RAM 503, various programs and data necessary for the operation of the computer system 500 are also stored. The processing device 501, the ROM 502, and the RAM 503 are connected to each other through a bus 504. An input/output (I/O) interface 505 is also connected to bus 504.
Generally, the following devices may be connected to the I/O interface 505: input devices 506 including, for example, a touch screen, a touch pad, a keyboard, a mouse, a camera, a microphone, and the like; output devices 507 including, for example, a liquid crystal display (LCD), speakers, vibrators, and the like; storage devices 508 including, for example, magnetic tape, hard disk, etc.; and a communication device 509. The communication device 509 may allow the computer system 500 to communicate wirelessly or by wire with other devices to exchange data. While fig. 5 illustrates a computer system 500 with various devices, it should be understood that not all of the illustrated devices are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication means 509, or installed from the storage means 508, or installed from the ROM 502. The computer program, when executed by the processing device 501, performs the above-described functions defined in the methods of embodiments of the present disclosure.
It should be noted that the computer readable medium in the present disclosure can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, in the present disclosure, a computer readable signal medium may comprise a propagated data signal with computer readable program code embodied therein, either in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.
The computer readable medium may be embodied in the electronic device; or may exist separately without being assembled into the electronic device.
The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to implement the audio processing method of the embodiment shown in fig. 2A or fig. 3 and their optional embodiments.
Computer program code for carrying out operations for aspects of the present disclosure may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, or C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present disclosure may be implemented by software or hardware. Where the name of a unit does not in some cases constitute a limitation on the unit itself, for example, the processing unit may also be described as "a unit for determining the azimuth information of the target audio from the extension header of the audio packet".
The foregoing description is only of the preferred embodiments of the disclosure and an explanation of the technical principles employed. It will be appreciated by those skilled in the art that the scope of the disclosure is not limited to technical solutions formed by the particular combination of features described above, but also covers other technical solutions formed by any combination of the above features or their equivalents without departing from the concept of the disclosure, for example, technical solutions formed by substituting the features described above with (but not limited to) features having similar functions disclosed in this disclosure.
Claims (18)
1. An audio processing method, comprising:
determining azimuth information of a target audio according to an extension header of an audio data packet, wherein the azimuth information is used for representing the spatial azimuth of a sound source corresponding to the target audio, and the code of the target audio is recorded in a packet body of the audio data packet;
and outputting the target audio according to the azimuth information of the target audio and the coding of the target audio.
2. The method of claim 1, wherein prior to the outputting the target audio based on the orientation information of the target audio and the encoding of the target audio, the method further comprises:
determining reverberation information of the target audio according to an extension header of the audio data packet; and
the outputting the target audio according to the azimuth information of the target audio and the coding of the target audio comprises:
and outputting the target audio according to the azimuth information of the target audio, the reverberation information of the target audio and the coding of the target audio.
3. The method of claim 1, wherein the determining the azimuth information of the target audio according to the extension header of the audio data packet comprises:
determining the type of a coordinate system according to the first data in the extension head;
and determining the azimuth information under the coordinate system type according to the second data in the extension head.
4. The method of claim 3, wherein the coordinate system type is a spherical coordinate system, the orientation information includes a first angle value and a second angle value; and/or
The coordinate system type is a plane rectangular coordinate system, and the orientation information comprises a first coordinate value and a second coordinate value.
5. The method according to claim 4, wherein the coordinate system type is a spherical coordinate system, the accuracy of the first angle values is non-linear, and/or the accuracy of the second angle values is non-linear.
6. The method of any of claims 1-5, wherein the audio data packet is a real-time transport protocol data packet.
7. The method of any of claims 1-5, wherein the target audio is audio for an audio-visual conference.
8. The method according to any one of claims 1-5, wherein the outputting the target audio according to the orientation information of the target audio and the encoding of the target audio comprises:
and outputting a three-dimensional sound field corresponding to the target audio by using a virtual three-dimensional technology according to the azimuth information.
9. An audio processing method, comprising:
recording the code of a target audio in a packet body of an audio data packet, and recording azimuth information of the target audio in an extension header of the audio data packet, wherein the azimuth information is used for representing the spatial azimuth of a sound source corresponding to the target audio;
and sending the audio data packet to a target device.
10. The method of claim 9, wherein prior to said transmitting the audio data packet to a target device, the method further comprises:
recording reverberation information of the target audio in an extension header of the audio data packet.
11. The method of claim 9, wherein the recording of the azimuth information of the target audio in the extension header of the audio data packet comprises:
recording, in the extension header, first data indicating a coordinate system type;
and recording, in the extension header, second data indicating the orientation information under the coordinate system type.
12. The method of claim 11, wherein the coordinate system type is a spherical coordinate system, the orientation information includes a first angle value and a second angle value; and/or
The coordinate system type is a plane rectangular coordinate system, and the orientation information comprises a first coordinate value and a second coordinate value.
13. The method according to claim 12, wherein the type of coordinate system is a spherical coordinate system, the accuracy of the first angle values is non-linear, and/or the accuracy of the second angle values is non-linear.
14. The method of any of claims 9-13, wherein the orientation information is for the target device to output a three-dimensional sound field corresponding to the target audio using virtual three-dimensional techniques.
15. An audio processing apparatus comprising:
the processing unit is used for determining azimuth information of a target audio according to an extension header of an audio data packet, wherein the azimuth information is used for representing the spatial azimuth of a sound source corresponding to the target audio, and the code of the target audio is recorded in a packet body of the audio data packet;
and an output unit for outputting the target audio according to the azimuth information of the target audio and the coding of the target audio.
16. An audio processing apparatus comprising:
the recording unit is used for recording the code of the target audio in the packet body of an audio data packet and recording the azimuth information of the target audio in the extension header of the audio data packet, wherein the azimuth information is used for representing the spatial azimuth of a sound source corresponding to the target audio;
and the sending unit is used for sending the audio data packet to the target equipment.
17. An electronic device, comprising:
one or more processors;
a storage device having one or more programs stored thereon,
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any of claims 1-14.
18. A computer-readable storage medium, having a computer program stored thereon, wherein the computer program, when executed by one or more processors, implements the method of any one of claims 1-14.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title
---|---|---|---
CN202110778572.2A | 2021-07-09 | 2021-07-09 | Audio processing method and device, electronic equipment and storage medium
Publications (1)
Publication Number | Publication Date
---|---
CN113674751A | 2021-11-19
Family
ID=78538806
Family Applications (1)
Application Number | Title | Priority Date | Filing Date
---|---|---|---
CN202110778572.2A (CN113674751A, pending) | Audio processing method and device, electronic equipment and storage medium | 2021-07-09 | 2021-07-09
Country Status (1)
Country | Link
---|---
CN | CN113674751A
Patent Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101414462A (en) * | 2007-10-15 | 2009-04-22 | 华为技术有限公司 | Audio encoding method and multi-point audio signal mixing control method and corresponding equipment |
CN101819776A (en) * | 2009-02-27 | 2010-09-01 | 北京中星微电子有限公司 | Method for embedding and acquiring sound source orientation information and audio coding decoding method and system |
CN102655584A (en) * | 2011-03-04 | 2012-09-05 | 中兴通讯股份有限公司 | Media data transmitting and playing method and system in tele-presence technology |
CN102436814A (en) * | 2011-09-09 | 2012-05-02 | 南京大学 | Audio transmission scheme for stereo sound with low code rate |
US20160227337A1 (en) * | 2015-01-30 | 2016-08-04 | Dts, Inc. | System and method for capturing, encoding, distributing, and decoding immersive audio |
CN105070304A (en) * | 2015-08-11 | 2015-11-18 | 小米科技有限责任公司 | Method, device and electronic equipment for realizing recording of object audio |
CN106774930A (en) * | 2016-12-30 | 2017-05-31 | 中兴通讯股份有限公司 | A kind of data processing method, device and collecting device |
CN112189348A (en) * | 2018-03-27 | 2021-01-05 | 诺基亚技术有限公司 | Spatial audio capture |
CN112219236A (en) * | 2018-04-06 | 2021-01-12 | 诺基亚技术有限公司 | Spatial audio parameters and associated spatial audio playback |
CN112260982A (en) * | 2019-07-22 | 2021-01-22 | 华为技术有限公司 | Audio processing method and device |
CN110995946A (en) * | 2019-12-25 | 2020-04-10 | 苏州科达科技股份有限公司 | Sound mixing method, device, equipment, system and readable storage medium |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication |
| SE01 | Entry into force of request for substantive examination |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20211119