CN113674751A - Audio processing method and device, electronic equipment and storage medium
- Publication number
- CN113674751A (application number CN202110778572.2A)
- Authority
- CN
- China
- Prior art keywords
- audio
- target audio
- coordinate system
- target
- data packet
- Prior art date: 2021-07-09
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G10L19/008—Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
- H04L65/403—Arrangements for multi-party communication, e.g. for conferences
- H04L65/65—Network streaming protocols, e.g. real-time transport protocol [RTP] or real-time control protocol [RTCP]
- H04L65/70—Media network packetisation
- H04S5/005—Pseudo-stereo systems, e.g. in which additional channel signals are derived from monophonic signals by means of phase shifting, time delay or reverberation, of the pseudo five- or more-channel type, e.g. virtual surround
- H04S2420/03—Application of parametric coding in stereophonic audio systems
Abstract
The disclosure provides an audio processing method, an audio processing apparatus, an electronic device, and a storage medium. One embodiment of the method comprises: determining azimuth information of a target audio according to an extension header of an audio data packet, wherein the azimuth information is used for representing the spatial azimuth of the sound source corresponding to the target audio, and the code of the target audio is recorded in the packet body of the audio data packet; and outputting the target audio according to the azimuth information of the target audio and the coding of the target audio. On one hand, the method does not require the coding of the target audio itself to contain azimuth information, so it is compatible with existing encoding and decoding methods and has a wider application range; on the other hand, the method keeps the azimuth information directly corresponding to the encoded data, thereby reducing the problem of information lag or loss caused by multi-path transmission.
Description
Technical Field
The embodiment of the disclosure relates to the technical field of audio processing, in particular to an audio processing method, an audio processing device, electronic equipment and a storage medium.
Background
In a centralized Real-time Transport Protocol (RTP) audio conference, mono audio codecs such as iLBC, OPUS, and G.711 do not convey the characteristics of the audio source. Stereo or multi-channel codecs can transmit spatial features, but at the cost of higher bandwidth. With the development of directional audio coding (DirAC), audio source locations can be captured and the spatial characteristics of sound recorded with existing microphone systems can be reproduced. In an audio-video conference, real-time remote surround sound reconstruction may improve the interaction between participants.
However, DirAC focuses mainly on how to acquire spatial attributes; how to transmit those attributes remains a technical problem to be solved.
Disclosure of Invention
The embodiment of the disclosure provides an audio processing method and device, an electronic device and a storage medium.
In a first aspect, the present disclosure provides an audio processing method, including:
determining azimuth information of a target audio according to an extension header of an audio data packet, wherein the azimuth information is used for representing the spatial azimuth of a sound source corresponding to the target audio, and the code of the target audio is recorded in a packet body of the audio data packet;
and outputting the target audio according to the azimuth information of the target audio and the coding of the target audio.
In some optional embodiments, before outputting the target audio according to the orientation information of the target audio and the coding of the target audio, the method further includes:
determining reverberation information of the target audio according to an extension header of the audio data packet; and
the outputting the target audio according to the azimuth information of the target audio and the coding of the target audio includes:
and outputting the target audio according to the azimuth information of the target audio, the reverberation information of the target audio and the coding of the target audio.
In some optional embodiments, the determining the azimuth information of the target audio according to the extension header of the audio data packet includes:
determining the type of a coordinate system according to the first data in the extension head;
and determining the azimuth information under the coordinate system type according to the second data in the extension head.
In some optional embodiments, the coordinate system type is a spherical coordinate system, and the orientation information includes a first angle value and a second angle value; and/or
the coordinate system type is a planar rectangular coordinate system, and the orientation information includes a first coordinate value and a second coordinate value.
In some optional embodiments, the coordinate system type is a spherical coordinate system, and the accuracy of the first angle value is non-linear and/or the accuracy of the second angle value is non-linear.
In some optional embodiments, the audio data packet is a real-time transport protocol data packet.
In some optional embodiments, the target audio is audio for audio-video conference.
In some optional embodiments, the outputting the target audio according to the orientation information of the target audio and the encoding of the target audio includes:
and outputting a three-dimensional sound field corresponding to the target audio by using a virtual three-dimensional technology according to the azimuth information.
In a second aspect, the present disclosure provides an audio processing method, including:
recording the code of a target audio in a packet body of an audio data packet, and recording the azimuth information of the target audio in an extension header of the audio data packet, wherein the azimuth information is used for representing the spatial azimuth of a sound source corresponding to the target audio;
and sending the audio data packet to the target equipment.
In some optional embodiments, before the sending the audio data packet to the target device, the method further includes:
and recording the reverberation information of the target audio in the extension header of the audio data packet.
In some optional embodiments, the recording of the azimuth information of the target audio in the extension header of the audio data packet includes:
recording, in the extension header, first data indicating a coordinate system type;
and recording, in the extension header, second data indicating the orientation information under the coordinate system type.
In some optional embodiments, the coordinate system type is a spherical coordinate system, and the orientation information includes a first angle value and a second angle value; and/or
the coordinate system type is a planar rectangular coordinate system, and the orientation information includes a first coordinate value and a second coordinate value.
In some optional embodiments, the coordinate system type is a spherical coordinate system, and the accuracy of the first angle value is non-linear and/or the accuracy of the second angle value is non-linear.
In some optional embodiments, the orientation information is used for the target device to output a three-dimensional sound field corresponding to the target audio by using a virtual three-dimensional technology.
In a third aspect, the present disclosure provides an audio processing apparatus comprising:
the processing unit is used for determining azimuth information of a target audio according to an extension header of an audio data packet, wherein the azimuth information is used for representing the spatial azimuth of a sound source corresponding to the target audio, and the code of the target audio is recorded in a packet body of the audio data packet;
and an output unit for outputting the target audio according to the azimuth information of the target audio and the code of the target audio.
In some optional embodiments, the processing unit is further configured to:
determining reverberation information of the target audio according to an extension header of the audio data packet; and
the output unit is further configured to:
and outputting the target audio according to the azimuth information of the target audio, the reverberation information of the target audio and the coding of the target audio.
In some optional embodiments, the processing unit is further configured to:
determining the type of a coordinate system according to the first data in the extension head;
and determining the azimuth information under the coordinate system type according to the second data in the extension head.
In some optional embodiments, the coordinate system type is a spherical coordinate system, and the orientation information includes a first angle value and a second angle value; and/or
the coordinate system type is a planar rectangular coordinate system, and the orientation information includes a first coordinate value and a second coordinate value.
In some optional embodiments, the coordinate system type is a spherical coordinate system, and the accuracy of the first angle value is non-linear and/or the accuracy of the second angle value is non-linear.
In some optional embodiments, the audio data packet is a real-time transport protocol data packet.
In some optional embodiments, the target audio is audio for audio-video conference.
In some optional embodiments, the output unit is further configured to:
and outputting a three-dimensional sound field corresponding to the target audio by using a virtual three-dimensional technology according to the azimuth information.
In a fourth aspect, the present disclosure also provides an audio processing apparatus, including:
a recording unit, configured to record the code of a target audio in the packet body of an audio data packet, and record azimuth information of the target audio in an extension header of the audio data packet, where the azimuth information is used to represent the spatial azimuth of a sound source corresponding to the target audio;
and the sending unit is used for sending the audio data packet to the target equipment.
In some optional embodiments, the recording unit is further configured to:
and recording the reverberation information of the target audio in the extension header of the audio data packet.
In some optional embodiments, the recording unit is further configured to:
record, in the extension header, first data indicating a coordinate system type;
and record, in the extension header, second data indicating the orientation information under the coordinate system type.
In some optional embodiments, the coordinate system type is a spherical coordinate system, and the orientation information includes a first angle value and a second angle value; and/or
the coordinate system type is a planar rectangular coordinate system, and the orientation information includes a first coordinate value and a second coordinate value.
In some optional embodiments, the coordinate system type is a spherical coordinate system, and the accuracy of the first angle value is non-linear and/or the accuracy of the second angle value is non-linear.
In some optional embodiments, the orientation information is used for the target device to output a three-dimensional sound field corresponding to the target audio by using a virtual three-dimensional technology.
In a fifth aspect, the present disclosure provides an electronic device comprising:
one or more processors;
a storage device having one or more programs stored thereon,
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method as described in any of the embodiments of the first or second aspects of the disclosure.
In a sixth aspect, the present disclosure provides a computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by one or more processors, implements the method as described in any one of the embodiments of the first or second aspect of the present disclosure.
According to the audio processing method and apparatus, the electronic device, and the storage medium provided by the disclosure, the coding of the target audio is transmitted in the packet body of the audio data packet, and the azimuth information of the target audio is transmitted in the extension header of the audio data packet. On one hand, the coding of the target audio does not need to contain azimuth information itself, so the scheme is compatible with existing encoding and decoding methods and has a wider application range; on the other hand, the azimuth information directly corresponds to the encoded data, which reduces the problem of information lag or loss caused by multi-path transmission.
In addition, the audio processing method and apparatus, the electronic device, and the storage medium provided by the disclosure transmit the azimuth information of the target audio in the extension header of the audio data packet, which reduces the transmission bandwidth requirement. Moreover, the transmitting end can adjust the extension header of the audio data packet in real time before the packet is transmitted, so the scheme provided by the disclosure also enables real-time transmission of the azimuth information and dynamic adjustment of the azimuth information by the transmitting end.
Drawings
Other features, objects, and advantages of the disclosure will become apparent from a reading of the following detailed description of non-limiting embodiments which proceeds with reference to the accompanying drawings. The drawings are only for purposes of illustrating the particular embodiments and are not to be construed as limiting the invention. In the drawings:
fig. 1 is a system architecture diagram of one embodiment of an audio processing system according to the present disclosure;
FIG. 2A is a flow diagram for one embodiment of an audio processing method according to the present disclosure;
fig. 2B is a schematic diagram of an extension header of an audio data packet in the audio processing method according to the present disclosure;
fig. 2C is a schematic diagram of a mapping relationship of first angle data and first angle values in the audio processing method according to the present disclosure;
fig. 2D is a schematic diagram of a mapping relationship of second angle data and second angle values in an audio processing method according to the present disclosure;
FIG. 2E is a schematic diagram of a rectangular plane coordinate system in the audio processing method according to the disclosure;
FIG. 3 is a flow diagram of another embodiment of an audio processing method according to the present disclosure;
FIG. 4A is a schematic block diagram of one embodiment of an audio processing device according to the present disclosure;
FIG. 4B is a schematic block diagram of another embodiment of an audio processing device according to the present disclosure;
FIG. 5 is a schematic block diagram of a computer system suitable for use with an electronic device implementing embodiments of the present disclosure.
Detailed Description
The present disclosure is described in further detail below with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings.
It should be noted that, in the present disclosure, the embodiments and features of the embodiments may be combined with each other without conflict. The present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
Fig. 1 illustrates an exemplary system architecture 100 to which embodiments of the audio processing method, apparatus, terminal device, and storage medium of the present disclosure may be applied.
As shown in fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
The user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages or the like. Various communication client applications, such as a voice interaction application, a video conference application, a short video social application, a web browser application, a shopping application, a search application, an instant messaging tool, a mailbox client, social platform software, and the like, may be installed on the terminal devices 101, 102, 103.
The terminal apparatuses 101, 102, and 103 may be hardware or software. When they are hardware, they may be any of various electronic devices having a microphone and a speaker, including but not limited to smartphones, tablet computers, e-book readers, MP3 players (Moving Picture Experts Group Audio Layer III), MP4 players (Moving Picture Experts Group Audio Layer IV), portable computers, desktop computers, and the like. When the terminal apparatuses 101, 102, 103 are software, they can be installed in the electronic apparatuses listed above and may be implemented as multiple pieces of software or software modules, or as a single piece of software or software module. No specific limitation is made here.
The server 105 may be a server that provides various services, such as a background server that processes the audio-video conference requests sent by the terminal devices 101, 102, 103.
In some cases, the audio processing method provided by the present disclosure may be executed by the server 105, and accordingly, the audio processing apparatus may also be disposed in the server 105, and in this case, the system architecture 100 may also not include the terminal devices 101, 102, and 103.
In some cases, the audio processing method provided by the present disclosure may be executed by the terminal devices 101, 102, and 103, and accordingly, the audio processing apparatus may also be disposed in the terminal devices 101, 102, and 103, and in this case, the system architecture 100 may not include the server 105.
The server 105 may be hardware or software. When the server 105 is hardware, it may be implemented as a distributed server cluster composed of a plurality of servers, or may be implemented as a single server. When the server 105 is software, it may be implemented as multiple pieces of software or software modules (e.g., to provide distributed services), or as a single piece of software or software module. And is not particularly limited herein.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
With continuing reference to fig. 2A, a flow 210 of one embodiment of an audio processing method according to the present disclosure is shown, applicable to the terminal devices in fig. 1. The flow 210 includes the following steps:
Step 211: determining azimuth information of the target audio according to the extension header of the audio data packet, wherein the azimuth information is used for representing the spatial azimuth of the sound source corresponding to the target audio, and the code of the target audio is recorded in the packet body of the audio data packet.
In the present embodiment, the azimuth information is used to indicate the spatial azimuth of the sound source corresponding to the target audio, for example, the direction of the sound source with respect to the audio collecting position or the listener.
Generally, a data packet is the unit of transmission in network data transmission. A data packet can be divided into a packet header and a packet body. The header is a reserved field of defined bit length attached to the front of the packet to carry control information; it serves both control and descriptive purposes. The body is the part that carries the data to be transmitted.
Some communication protocols (e.g., the Real-time Transport Protocol, RTP) support extension of packet headers. In this case, the header of the packet can be further divided into a fixed header (Fixed Header) and an extension header (Header Extension).
In the present embodiment, the azimuth information of the target audio may be recorded in the extension header of the audio packet.
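To make the packet layout concrete, the following sketch splits an RTP packet into its header extension and packet body along the lines of RFC 3550. It is an illustration for this scenario rather than code from the disclosure; the function name and the omission of padding and validation are assumptions.

```python
import struct

def split_rtp_packet(packet: bytes):
    """Split an RTP packet into its header extension and packet body.

    A minimal sketch following the RFC 3550 layout: 12-byte fixed header,
    optional CSRC list, optional header extension flagged by the X bit.
    Padding and validation are deliberately omitted.
    """
    first_byte = packet[0]
    csrc_count = first_byte & 0x0F            # CC field: number of CSRCs
    has_extension = bool(first_byte & 0x10)   # X bit: extension present?
    offset = 12 + 4 * csrc_count              # skip fixed header + CSRC list

    extension = b""
    if has_extension:
        # Extension header: 16-bit profile id, 16-bit length in 32-bit words.
        _profile, length_words = struct.unpack_from("!HH", packet, offset)
        offset += 4
        extension = packet[offset:offset + 4 * length_words]
        offset += 4 * length_words

    body = packet[offset:]                    # the encoded audio lives here
    return extension, body
```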
Fig. 2B is a schematic diagram of an extension header of an audio data packet in the audio processing method according to the present disclosure. In the example shown in fig. 2B, the audio position information is carried in the two extension header formats defined in RFC 5285: the upper part of fig. 2B shows the One-Byte extension header, and the lower part shows the Two-Byte extension header.
As shown in fig. 2B, the parameter "m" (i.e., the first data) is used to record the coordinate system type. In one example, the parameter "m" is two bits in length. The parameter "m" has a value of "00" and may indicate that the coordinate system type is a spherical coordinate system. The origin of the spherical coordinate system is, for example, the position of the listener. The parameter "m" has a value of "01", and may indicate that the coordinate system type is a planar rectangular coordinate system. The origin of the rectangular plane coordinate system is, for example, the position of the listener. The mapping relationship between the value of the parameter "m" and the type of the coordinate system may be set according to actual needs, which is not limited by the present disclosure.
As shown in fig. 2B, the parameters "sita (or) X" and "phi (or) Y" (i.e., the second data) are used to record the orientation information corresponding to the above-mentioned coordinate system type. For example, in the case where the coordinate system type is a spherical coordinate system, the parameter "sita (or) X" (i.e., first angle data) represents a horizontal angle (i.e., a first angle value) in the spherical coordinate system, and the parameter "phi (or) Y" (i.e., second angle data) represents a vertical angle (i.e., a second angle value) in the spherical coordinate system. For another example, in the case where the coordinate system type is a planar rectangular coordinate system, the parameter "sita (or) X" represents an abscissa (i.e., a first coordinate value) in the planar rectangular coordinate system, and the parameter "phi (or) Y" represents an ordinate (i.e., a second coordinate value) in the planar rectangular coordinate system.
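As a sketch of how such fields could be packed, the helpers below place "m", "sita (or) X", and "phi (or) Y" into two bytes. The exact bit order of fig. 2B is not reproduced in the text, so the big-endian layout and the constant names here are illustrative assumptions.

```python
SPHERICAL = 0b00    # assumed value of "m" for the spherical coordinate system
RECTANGULAR = 0b01  # assumed value of "m" for the planar rectangular system

def pack_position(m: int, first: int, second: int) -> bytes:
    """Pack "m" (2 bits), "sita (or) X" (7 bits) and "phi (or) Y" (7 bits)
    into two bytes; the bit order is an assumption for illustration."""
    value = ((m & 0b11) << 14) | ((first & 0x7F) << 7) | (second & 0x7F)
    return value.to_bytes(2, "big")

def unpack_position(data: bytes):
    """Inverse of pack_position: returns (m, first, second)."""
    value = int.from_bytes(data[:2], "big")
    return (value >> 14) & 0b11, (value >> 7) & 0x7F, value & 0x7F
```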
In the above example, when the parameter "sita (or) X" represents the horizontal angle in the spherical coordinate system, its length is, for example, 7 bits, where the first bit is the sign bit and the remaining six bits are the angle bits; 0 degrees is defined as directly in front of the listener. Since 7-bit data can represent at most 128 values while the horizontal angle in the spherical coordinate system ranges over [-180, 180], covering 360 integer degrees, a mapping relationship between the parameter "sita (or) X" (i.e., the first angle data) and the horizontal angle (i.e., the first angle value) needs to be determined.
Fig. 2C is a schematic diagram of the mapping relationship between the first angle data and the first angle values in the audio processing method according to the present disclosure. Psychoacoustic studies have confirmed that human localization of sound direction is imprecise: the resolution is approximately ±2 degrees to the front, ±10 degrees to the side, and ±5 degrees to the rear. The first angle values in fig. 2C are therefore spaced non-uniformly: as the first angle value increases, the interval between adjacent first angle values follows a "1-2-10-5" progression, i.e., the accuracy of the first angle value is non-linear. In this way, a limited number of first angle data can represent a wide range of first angle values, while the quantization stays consistent with human directional resolution, so the actual spatial listening experience is not affected.
It should be noted that fig. 2C only shows the mapping relationship between the first angle data and the first angle value when both the first angle data and the first angle value are positive values, and the mapping relationship between the first angle data and the first angle value when both the first angle data and the first angle value are negative values is similar to this, and is not repeated here.
Fig. 2D is a schematic diagram of the mapping relationship between the second angle data and the second angle values in the audio processing method according to the present disclosure. Similar to the example shown in fig. 2C, the second angle values in fig. 2D are also spaced non-uniformly: as the second angle value increases, the interval between adjacent second angle values follows a "2-5" progression, i.e., the accuracy of the second angle value is non-linear. Fig. 2D only shows the mapping for the case where both the second angle data and the second angle values are positive; the mapping for negative values is similar and is not repeated here.
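A non-uniform mapping of this kind can be realized as a small codebook. The sketch below builds a hypothetical codebook for the positive half of the horizontal angle range following the stated "1-2-10-5" step pattern; the exact breakpoints of figs. 2C and 2D are not given in the text, so the region boundaries below are assumptions.

```python
def build_horizontal_codebook():
    """Codebook for the positive half of the horizontal angle range.

    The regions merely follow the stated "1-2-10-5" step pattern
    (finest in front, coarsest to the side); the boundaries are
    illustrative assumptions, not the values of fig. 2C.
    """
    codebook, angle = [], 0
    for region_limit, step in ((10, 1), (30, 2), (130, 10), (180, 5)):
        while angle < region_limit:
            codebook.append(angle)
            angle += step
    codebook.append(180)
    return codebook       # 41 entries, well within the 64 values of 6 bits

def encode_angle(angle: float, codebook) -> int:
    """Return the index of the nearest codebook entry (the six angle
    bits; the sign bit is handled separately)."""
    return min(range(len(codebook)), key=lambda i: abs(codebook[i] - angle))
```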
Fig. 2E is a schematic diagram of the planar rectangular coordinate system in the audio processing method according to the present disclosure. The planar rectangular coordinate system does not involve height information. As shown in fig. 2E, the origin of the coordinate system may be the position of the listener, the abscissa axis may point in the direction the listener faces, with its positive direction consistent with the listener's line of sight, and the ordinate axis is perpendicular to the abscissa axis.
The abscissa (i.e., the first coordinate value) in the planar rectangular coordinate system ranges over [-1, 1] and can be expressed in quantized form by the 7-bit parameter "sita (or) X". Similarly, the ordinate (i.e., the second coordinate value) ranges over [-1, 1] and can be quantized by the 7-bit parameter "phi (or) Y".
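For the planar rectangular case, a plain uniform quantizer suffices. The sketch below maps a coordinate in [-1, 1] to a 7-bit sign-magnitude code and back; the disclosure only fixes the 7-bit length, so the sign-magnitude scheme is an assumption.

```python
def quantize_coord(x: float, bits: int = 7) -> int:
    """Quantize a coordinate in [-1, 1] to a sign-magnitude code of
    `bits` bits (an assumed scheme for illustration)."""
    x = max(-1.0, min(1.0, x))
    levels = (1 << (bits - 1)) - 1            # 63 magnitude levels for 7 bits
    magnitude = round(abs(x) * levels)
    sign_bit = (1 << (bits - 1)) if x < 0 else 0
    return sign_bit | magnitude

def dequantize_coord(code: int, bits: int = 7) -> float:
    """Inverse of quantize_coord."""
    levels = (1 << (bits - 1)) - 1
    sign = -1.0 if code & (1 << (bits - 1)) else 1.0
    return sign * (code & levels) / levels
```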
Based on the above description, the execution subject of the audio processing method in the present embodiment may determine the coordinate system type from the first data in the extension header, and determine the orientation information under the coordinate system type from the second data in the extension header.
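Putting the pieces together, a receiver-side dispatch might look like the sketch below, which reuses the helpers from the previous sketches (unpack_position, build_horizontal_codebook, dequantize_coord); all names and the field layout are illustrative assumptions.

```python
HORIZONTAL_CODEBOOK = build_horizontal_codebook()

def decode_orientation(position_bytes: bytes):
    """Read the first data ("m") to determine the coordinate system type,
    then interpret the second data under that type."""
    m, first, second = unpack_position(position_bytes)
    if m == SPHERICAL:
        # First bit of each 7-bit field is the sign, last six the angle bits.
        sign = -1 if first & 0x40 else 1
        index = min(first & 0x3F, len(HORIZONTAL_CODEBOOK) - 1)
        horizontal = sign * HORIZONTAL_CODEBOOK[index]
        return "spherical", horizontal, second  # vertical decode is analogous
    if m == RECTANGULAR:
        return "rectangular", dequantize_coord(first), dequantize_coord(second)
    raise ValueError(f"unknown coordinate system type: {m:#04b}")
```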
Step 212: outputting the target audio according to the azimuth information of the target audio and the coding of the target audio.
In this embodiment, the execution subject of the audio processing method may output a three-dimensional sound field of the target audio by combining the azimuth information of the target audio with a virtual three-dimensional (3D) technology or the like, thereby realizing a spatial listening experience for the target audio.
Here, the execution subject of the audio processing method may play the target audio directly, or transmit the corresponding audio signal to another electronic device (e.g., a Bluetooth speaker) for playback.
In one example, the extension header of the audio data packet also includes reverberation information. For example, in the example shown in fig. 2B, the parameter "revb" may represent the reverberation type (e.g., hall, room, or tunnel). On this basis, the execution subject of the audio processing method in this embodiment may determine the reverberation information of the target audio from the extension header of the audio data packet, and output the target audio according to the azimuth information of the target audio, the reverberation information of the target audio, and the coding of the target audio. In this way, spatial perception can be reconstructed using a predefined reverberation model, further improving the spatial perception of the target audio.
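As a minimal stand-in for such rendering, the sketch below applies constant-power stereo panning to a decoded mono frame using the horizontal angle. A full implementation would typically use HRTF filtering and a reverberation model selected by the "revb" field; the mapping of negative azimuths to the left is an assumption.

```python
import math

def render_frame(samples, azimuth_deg: float):
    """Constant-power stereo panning of a decoded mono frame.

    A deliberately simple stand-in for the virtual 3D rendering
    described above; negative azimuths are assumed to be to the left.
    """
    clamped = max(-90.0, min(90.0, azimuth_deg))
    pan = (clamped + 90.0) / 180.0 * (math.pi / 2)   # map to [0, pi/2]
    left_gain, right_gain = math.cos(pan), math.sin(pan)
    return [(s * left_gain, s * right_gain) for s in samples]
```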
Unlike methods that transmit azimuth information in the packet body, the audio processing method in this embodiment uses the packet body of the audio data packet to transmit the coding of the target audio and uses the extension header of the audio data packet to transmit the azimuth information of the target audio. Therefore, the method does not need the coding of the target audio itself to contain azimuth information, is compatible with existing encoding and decoding methods, has a wider application range, and keeps the azimuth information directly corresponding to the audio coding, thereby reducing the problem of information lag or loss caused by multi-path transmission.
With continuing reference to fig. 3, a flow 310 of another embodiment of an audio processing method according to the present disclosure is shown, applicable to the server in fig. 1. The flow 310 includes the following steps:
Step 311: recording the code of a target audio in the packet body of an audio data packet, and recording azimuth information of the target audio in the extension header of the audio data packet, wherein the azimuth information is used for representing the spatial azimuth of the sound source corresponding to the target audio.
Step 312: sending the audio data packet to a target device.
Here, the structure of the extension header of the audio data packet may refer to the description in the foregoing embodiment and is not repeated here.
Here, the target device is, for example, the terminal device in fig. 1.
In one example, the target audio may be audio of an audio-video conference, and the execution subject of the audio processing method in this embodiment may be an audio-video conference server. In this case, the conference server can conveniently perform real-time transfer and dynamic control of the azimuth information.
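A sender-side sketch under these assumptions is shown below: the encoded frame goes into the packet body and the position bytes into the header extension before the packet is sent. The fixed-header stub, the RFC 5285 profile value 0xBEDE, and the destination address are illustrative, and the per-element ID/length byte of the One-Byte form is omitted for brevity.

```python
import socket

def send_audio_packet(encoded_frame: bytes, position: bytes,
                      addr=("192.0.2.1", 5004)):
    """Sender-side sketch: encoded audio in the packet body, position
    fields in the header extension; field values are stubs."""
    fixed_header = bytes([0x90, 96]) + bytes(10)   # V=2, X=1, PT=96 (stub)
    words = (len(position) + 3) // 4               # extension length in words
    padding = bytes(4 * words - len(position))
    extension = (0xBEDE).to_bytes(2, "big") + words.to_bytes(2, "big") \
        + position + padding
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.sendto(fixed_header + extension + encoded_frame, addr)
```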
The audio processing method in the present embodiment achieves technical effects similar to those of the audio processing method in the foregoing embodiment. In addition, by transmitting the azimuth information of the target audio in the extension header of the audio data packet, it enables real-time transmission of the azimuth information and dynamic adjustment of the azimuth information by the sending end.
With further reference to fig. 4A, as an implementation of the method shown in fig. 2A described above, the present disclosure provides an embodiment of an audio processing apparatus, which corresponds to the method embodiment shown in fig. 2A, and which is specifically applicable to various terminal devices.
As shown in fig. 4A, the audio processing apparatus 410 of the present embodiment includes: a processing unit 411 and an output unit 412. The processing unit 411 is configured to determine, according to an extension header of an audio data packet, azimuth information of a target audio, where the azimuth information is used to represent a spatial azimuth of a sound source corresponding to the target audio, and a code of the target audio is recorded in a packet body of the audio data packet; an output unit 412, configured to output the target audio according to the azimuth information of the target audio and the coding of the target audio.
In this embodiment, the detailed processing of the processing unit 411 and the output unit 412 and the technical effects thereof can refer to the related descriptions of step 211 and step 212 in the corresponding embodiment of fig. 2A, which are not repeated herein.
In some optional embodiments, the processing unit 411 is further configured to: determine reverberation information of the target audio according to the extension header of the audio data packet; and the output unit 412 is further configured to: output the target audio according to the azimuth information of the target audio, the reverberation information of the target audio, and the coding of the target audio.
In some optional embodiments, the processing unit 411 is further configured to: determining the type of a coordinate system according to the first data in the extension head; and determining the azimuth information under the coordinate system type according to the second data in the extension head.
In some optional embodiments, the coordinate system type is a spherical coordinate system, and the orientation information includes a first angle value and a second angle value; and/or the coordinate system type is a planar rectangular coordinate system, and the orientation information includes a first coordinate value and a second coordinate value.
In some optional embodiments, the coordinate system type is a spherical coordinate system, and the accuracy of the first angle value is non-linear and/or the accuracy of the second angle value is non-linear.
In some optional embodiments, the audio data packet is a real-time transport protocol data packet.
In some optional embodiments, the target audio is audio for audio-video conference.
In some optional embodiments, the output unit 412 is further configured to: and outputting a three-dimensional sound field corresponding to the target audio by using a virtual three-dimensional technology according to the azimuth information.
It should be noted that, for details of implementation and technical effects of each unit in the audio processing apparatus provided in the embodiments of the present disclosure, reference may be made to descriptions of other embodiments in the present disclosure, and details are not described herein again.
With further reference to fig. 4B, as an implementation of the method shown in fig. 3 described above, the present disclosure provides an embodiment of an audio processing apparatus, which corresponds to the embodiment of the method shown in fig. 3, and which is particularly applicable to various servers.
As shown in fig. 4B, the audio processing apparatus 420 of the present embodiment includes: a recording unit 421 and a sending unit 422. The recording unit 421 is configured to record the code of a target audio in the packet body of an audio data packet, and record azimuth information of the target audio in an extension header of the audio data packet, where the azimuth information is used to indicate the spatial azimuth of a sound source corresponding to the target audio; the sending unit 422 is configured to send the audio data packet to a target device.
In this embodiment, the detailed processing of the recording unit 421 and the sending unit 422 and the technical effects thereof can refer to the related descriptions of step 311 and step 312 in the corresponding embodiment of fig. 3, which are not repeated herein.
In some optional embodiments, the recording unit 421 is further configured to: and recording the reverberation information of the target audio in the extension header of the audio data packet.
In some optional embodiments, the recording unit 421 is further configured to: record, in the extension header, first data indicating a coordinate system type; and record, in the extension header, second data indicating the orientation information under the coordinate system type.
In some optional embodiments, the coordinate system type is a spherical coordinate system, and the orientation information includes a first angle value and a second angle value; and/or the coordinate system type is a planar rectangular coordinate system, and the orientation information includes a first coordinate value and a second coordinate value.
In some optional embodiments, the coordinate system type is a spherical coordinate system, and the accuracy of the first angle value is non-linear and/or the accuracy of the second angle value is non-linear.
In some optional embodiments, the orientation information is used for the target device to output a three-dimensional sound field corresponding to the target audio by using a virtual three-dimensional technology.
It should be noted that, for details of implementation and technical effects of each unit in the audio processing apparatus provided in the embodiments of the present disclosure, reference may be made to descriptions of other embodiments in the present disclosure, and details are not described herein again.
Referring now to FIG. 5, there is shown a schematic block diagram of a computer system 500 suitable for use in implementing the terminal device or server of the present disclosure. The computer system 500 shown in fig. 5 is only an example and should not bring any limitations to the functionality or scope of use of the embodiments of the present disclosure.
As shown in fig. 5, computer system 500 may include a processing device (e.g., central processing unit, graphics processor, etc.) 501 that may perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM)502 or a program loaded from a storage device 508 into a Random Access Memory (RAM) 503. In the RAM 503, various programs and data necessary for the operation of the computer system 500 are also stored. The processing device 501, the ROM 502, and the RAM 503 are connected to each other through a bus 504. An input/output (I/O) interface 505 is also connected to bus 504.
Generally, the following devices may be connected to the I/O interface 505: input devices 506 including, for example, a touch screen, a touch pad, a keyboard, a mouse, a camera, a microphone, and the like; output devices 507 including, for example, a liquid crystal display (LCD), speakers, vibrators, and the like; storage devices 508 including, for example, magnetic tape, hard disk, etc.; and a communication device 509. The communication device 509 may allow the computer system 500 to communicate wirelessly or by wire with other devices to exchange data. While fig. 5 illustrates a computer system 500 with various devices, it should be understood that not all of the illustrated devices are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication means 509, or installed from the storage means 508, or installed from the ROM 502. The computer program, when executed by the processing device 501, performs the above-described functions defined in the methods of embodiments of the present disclosure.
It should be noted that the computer readable medium in the present disclosure can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, in the present disclosure, a computer readable signal medium may comprise a propagated data signal with computer readable program code embodied therein, either in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.
The computer readable medium may be embodied in the electronic device; or may exist separately without being assembled into the electronic device.
The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to implement the audio processing method of the embodiment shown in fig. 2A or fig. 3 and their optional embodiments.
Computer program code for carrying out operations for aspects of the present disclosure may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, or C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present disclosure may be implemented by software or hardware. Where the name of a unit does not in some cases constitute a limitation on the unit itself, for example, the processing unit may also be described as "a unit for determining the azimuth information of the target audio from the extension header of the audio packet".
The foregoing description is only of the preferred embodiments of the disclosure and an explanation of the technical principles employed. It will be appreciated by those skilled in the art that the scope of the disclosure is not limited to technical solutions formed by the particular combination of features described above, but also covers other technical solutions formed by any combination of the above features or their equivalents without departing from the concept of the disclosure, for example, technical solutions formed by substituting the features described above with (but not limited to) features having similar functions disclosed in this disclosure.
Claims (18)
1. An audio processing method, comprising:
determining azimuth information of a target audio according to an extension header of an audio data packet, wherein the azimuth information is used for representing the spatial azimuth of a sound source corresponding to the target audio, and the code of the target audio is recorded in a packet body of the audio data packet;
and outputting the target audio according to the azimuth information of the target audio and the coding of the target audio.
2. The method of claim 1, wherein prior to the outputting the target audio based on the orientation information of the target audio and the encoding of the target audio, the method further comprises:
determining reverberation information of the target audio according to an extension header of the audio data packet; and
the outputting the target audio according to the azimuth information of the target audio and the coding of the target audio comprises:
and outputting the target audio according to the azimuth information of the target audio, the reverberation information of the target audio and the coding of the target audio.
3. The method of claim 1, wherein the determining the azimuth information of the target audio according to the extension header of the audio data packet comprises:
determining the type of a coordinate system according to the first data in the extension head;
and determining the azimuth information under the coordinate system type according to the second data in the extension head.
4. The method of claim 3, wherein the coordinate system type is a spherical coordinate system, the orientation information includes a first angle value and a second angle value; and/or
The coordinate system type is a plane rectangular coordinate system, and the orientation information comprises a first coordinate value and a second coordinate value.
5. The method according to claim 4, wherein the coordinate system type is a spherical coordinate system, the accuracy of the first angle values is non-linear, and/or the accuracy of the second angle values is non-linear.
6. The method of any of claims 1-5, wherein the audio data packet is a real-time transport protocol data packet.
7. The method of any of claims 1-5, wherein the target audio is audio for an audio-visual conference.
8. The method according to any one of claims 1-5, wherein the outputting the target audio according to the orientation information of the target audio and the encoding of the target audio comprises:
and outputting a three-dimensional sound field corresponding to the target audio by using a virtual three-dimensional technology according to the azimuth information.
9. An audio processing method, comprising:
recording the code of a target audio in a packet body of an audio data packet, and recording azimuth information of the target audio in an extension header of the audio data packet, wherein the azimuth information is used for representing the spatial azimuth of a sound source corresponding to the target audio;
and sending the audio data packet to a target device.
10. The method of claim 9, wherein prior to said transmitting the audio data packet to a target device, the method further comprises:
recording reverberation information of the target audio in an extension header of the audio data packet.
11. The method of claim 9, wherein the recording of the azimuth information of the target audio in the extension header of the audio data packet comprises:
recording, in the extension header, first data indicating a coordinate system type;
and recording, in the extension header, second data indicating the orientation information under the coordinate system type.
12. The method of claim 11, wherein the coordinate system type is a spherical coordinate system, the orientation information includes a first angle value and a second angle value; and/or
The coordinate system type is a plane rectangular coordinate system, and the orientation information comprises a first coordinate value and a second coordinate value.
13. The method according to claim 12, wherein the type of coordinate system is a spherical coordinate system, the accuracy of the first angle values is non-linear, and/or the accuracy of the second angle values is non-linear.
14. The method of any of claims 9-13, wherein the orientation information is for the target device to output a three-dimensional sound field corresponding to the target audio using virtual three-dimensional techniques.
15. An audio processing apparatus comprising:
the processing unit is used for determining azimuth information of a target audio according to an extension header of an audio data packet, wherein the azimuth information is used for representing the spatial azimuth of a sound source corresponding to the target audio, and the code of the target audio is recorded in a packet body of the audio data packet;
and an output unit for outputting the target audio according to the azimuth information of the target audio and the coding of the target audio.
16. An audio processing apparatus comprising:
the recording unit is used for recording the code of the target audio in the packet body of an audio data packet and recording the azimuth information of the target audio in the extension header of the audio data packet, wherein the azimuth information is used for representing the spatial azimuth of a sound source corresponding to the target audio;
and the sending unit is used for sending the audio data packet to the target equipment.
17. An electronic device, comprising:
one or more processors;
a storage device having one or more programs stored thereon,
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any of claims 1-14.
18. A computer-readable storage medium, having a computer program stored thereon, wherein the computer program, when executed by one or more processors, implements the method of any one of claims 1-14.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title
---|---|---|---
CN202110778572.2A | 2021-07-09 | 2021-07-09 | Audio processing method and device, electronic equipment and storage medium
Publications (1)
Publication Number | Publication Date
---|---
CN113674751A | 2021-11-19
Family
ID=78538806
Family Applications (1)
Application Number | Title | Priority Date | Filing Date
---|---|---|---
CN202110778572.2A (CN113674751A, pending) | Audio processing method and device, electronic equipment and storage medium | 2021-07-09 | 2021-07-09
Country Status (1)
Country | Link
---|---
CN | CN113674751A
Patent Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101414462A (en) * | 2007-10-15 | 2009-04-22 | 华为技术有限公司 | Audio encoding method and multi-point audio signal mixing control method and corresponding equipment |
CN101819776A (en) * | 2009-02-27 | 2010-09-01 | 北京中星微电子有限公司 | Method for embedding and acquiring sound source orientation information and audio coding decoding method and system |
CN102655584A (en) * | 2011-03-04 | 2012-09-05 | 中兴通讯股份有限公司 | Media data transmitting and playing method and system in tele-presence technology |
CN102436814A (en) * | 2011-09-09 | 2012-05-02 | 南京大学 | Audio transmission scheme for stereo sound with low code rate |
US20160227337A1 (en) * | 2015-01-30 | 2016-08-04 | Dts, Inc. | System and method for capturing, encoding, distributing, and decoding immersive audio |
CN105070304A (en) * | 2015-08-11 | 2015-11-18 | 小米科技有限责任公司 | Method, device and electronic equipment for realizing recording of object audio |
CN106774930A (en) * | 2016-12-30 | 2017-05-31 | 中兴通讯股份有限公司 | A kind of data processing method, device and collecting device |
CN112189348A (en) * | 2018-03-27 | 2021-01-05 | 诺基亚技术有限公司 | Spatial audio capture |
CN112219236A (en) * | 2018-04-06 | 2021-01-12 | 诺基亚技术有限公司 | Spatial audio parameters and associated spatial audio playback |
CN112260982A (en) * | 2019-07-22 | 2021-01-22 | 华为技术有限公司 | Audio processing method and device |
CN110995946A (en) * | 2019-12-25 | 2020-04-10 | 苏州科达科技股份有限公司 | Sound mixing method, device, equipment, system and readable storage medium |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication |
| SE01 | Entry into force of request for substantive examination |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20211119