WO2023212880A1 - Audio processing method and apparatus, and storage medium - Google Patents

Audio processing method and apparatus, and storage medium

Info

Publication number
WO2023212880A1
Authority
WO
WIPO (PCT)
Prior art keywords
sound
metadata
audio data
sound object
information
Prior art date
Application number
PCT/CN2022/091052
Other languages
French (fr)
Chinese (zh)
Inventor
吕柱良
史润宇
吕雪洋
刘晗宇
Original Assignee
北京小米移动软件有限公司 (Beijing Xiaomi Mobile Software Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京小米移动软件有限公司 (Beijing Xiaomi Mobile Software Co., Ltd.)
Priority to CN202280001320.1A (published as CN117581566A)
Priority to PCT/CN2022/091052
Publication of WO2023212880A1

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
    • G10L19/16 Vocoder architecture
    • G10L19/18 Vocoders using multiple modes
    • G10L19/20 Vocoders using multiple modes using sound class specific coding, hybrid encoders or object based coding
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R3/00 Circuits for transducers, loudspeakers or microphones
    • H04R3/04 Circuits for transducers, loudspeakers or microphones for correcting frequency response
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S7/00 Indicating arrangements; Control arrangements, e.g. balance control

Definitions

  • the present disclosure relates to the field of communication technology, and in particular, to an audio processing method, apparatus, and device, and a storage medium.
  • when the encoding device collects audio data to produce an object audio signal, it includes relative position information between the sound object and the listener's listening position in the metadata of the object audio signal.
  • when the decoding device renders the object audio signal, it can render spatial audio based on the relative position information, so that the listener hears the sound coming from a specific direction, thereby giving the user a better three-dimensional and immersive spatial experience.
  • the audio processing method, apparatus, device, and storage medium proposed by the present disclosure are used to solve the technical problems of low coding efficiency and poor rendering effect of object audio signals in the related art.
  • the audio processing method proposed in one aspect of the present disclosure is applied to encoding equipment, including:
  • the metadata including at least one of the absolute position information of the sound object in the audio data, the relative position information of the sound object, the orientation information of the sound object, and the sound radiation range of the sound object;
  • An object audio signal is obtained based on the metadata of the audio data.
  • determining the metadata of each frame of audio data includes:
  • the relative position information is used to indicate the relative position between the sound object and the listening position of the listener.
  • determining the metadata of each frame of audio data includes:
  • the orientation information of the sound object is included in the metadata, and a mark is included in the metadata, the mark being used to indicate that the metadata includes the orientation information.
  • the orientation information includes absolute orientation information and/or relative orientation information
  • the relative orientation information is used to indicate the relative orientation between the sound object and the listening position.
  • the metadata further includes at least one of the following:
  • the spatial state of the sound object, which includes moving or stationary;
  • the type of the sound object.
  • the method further includes:
  • Audio data of the sound object is sampled in units of frames.
  • in response to the sound object being located in a room, the environmental spatial information includes at least one of the following:
  • the basic information of the sound object includes at least one of the following:
  • the sound source width of the sound object
  • the frame length of each frame of audio data.
  • obtaining an object audio signal based on the metadata of the audio data includes:
  • the header file and the object audio data packet are spliced to obtain at least one object audio signal.
  • the audio processing method proposed by another embodiment of the present disclosure is applied to decoding equipment, including:
  • the metadata including at least one of the absolute position information of the sound object, the relative position information of the sound object, the orientation information of the sound object, and the sound radiation range of the sound object;
  • the object audio signal is rendered based on the metadata.
  • the orientation information includes absolute orientation information and/or relative orientation information
  • the relative orientation information is used to indicate the relative orientation between the sound object and the listening position.
  • the metadata further includes at least one of the following:
  • the spatial state of the sound object, which includes moving or stationary;
  • the type of the sound object.
  • the object audio signal includes a header file and an object audio data packet
  • the header file includes environmental space information of the sound object and basic information of the sound object;
  • the object audio data packet includes audio data metadata and audio data.
  • in response to the sound object being located in a room, the environmental spatial information includes at least one of the following:
  • the basic information of the sound object includes at least one of the following:
  • the sound source width of the sound object
  • the frame length of each frame of audio data.
  • rendering the object audio signal based on the metadata includes:
  • the audio data is rendered based on the metadata and the header file.
  • the method further includes:
  • an audio processing device including:
  • a determining module, used to determine the metadata of each frame of audio data,
  • the metadata including at least one of the absolute position information of the sound object in the audio data, the relative position information of the sound object, the orientation information of the sound object, and the sound radiation range of the sound object;
  • a processing module configured to obtain an object audio signal based on the metadata of the audio data.
  • an audio processing device including:
  • an acquisition module, used to obtain the encoded signal sent by the encoding device;
  • a decoding module, used to decode the encoded signal to obtain the object audio signal;
  • a determining module, used to determine the metadata of the object audio signal, the metadata including at least one of the absolute position information of the sound object, the relative position information of the sound object, the orientation information of the sound object, and the sound radiation range of the sound object;
  • a rendering module configured to render the object audio signal based on the metadata.
  • the device includes a processor and a memory.
  • a computer program is stored in the memory.
  • the processor executes the computer program stored in the memory so that the device performs the method proposed in the above embodiment.
  • a communication device provided by another embodiment of the present disclosure includes: a processor and an interface circuit
  • the interface circuit is used to receive code instructions and transmit them to the processor
  • the processor is configured to run the code instructions to perform the method proposed in another embodiment.
  • a computer-readable storage medium provided by an embodiment of another aspect of the present disclosure is used to store instructions. When the instructions are executed, the method proposed by the embodiment of another aspect is implemented.
  • in the embodiment of the present disclosure, the encoding device determines the metadata of each frame of audio data, where the metadata includes at least one of the absolute position information of the sound object in the audio data, the relative position information of the sound object, the orientation information of the sound object, and the sound radiation range of the sound object; the encoding device then obtains the object audio signal based on the metadata of the audio data.
  • the metadata may include the absolute position information of the sound object. Based on this, when recording audio data or producing an object audio signal, if the absolute positions of multiple sound objects are fixed but the listening position keeps moving, the metadata can include the absolute position information of the sound object.
  • in this case, the metadata of a certain frame of audio data (such as the first frame) may include only the absolute position information of the sound object.
  • the audio data of the other frames can reuse the absolute position information in that frame, without requiring the metadata of every frame of audio data to include it, thereby reducing the amount of metadata and the transmission bandwidth, improving coding efficiency, and ensuring that the renderer can subsequently render the position of the sound object correctly and provide a correct spatial audio perception.
  • in addition, the metadata in the embodiment of the present disclosure may include the orientation information of the sound object and the sound radiation range of the sound object, so that when the decoding device subsequently renders the object audio signal, it can render based on the orientation information and the sound radiation range, simulating the hearing differences caused by the actual orientation of the sounding object and improving the rendering effect.
  • Figure 1 is a schematic flowchart of an audio processing method provided by an embodiment of the present disclosure;
  • Figure 2 is a schematic flowchart of an audio processing method provided by another embodiment of the present disclosure;
  • Figures 3a-3b are schematic flowcharts of an audio processing method provided by yet another embodiment of the present disclosure;
  • Figure 4 is a schematic flowchart of an audio processing method provided by yet another embodiment of the present disclosure;
  • Figure 5 is a schematic flowchart of an audio processing method provided by yet another embodiment of the present disclosure;
  • Figures 6a-6b are schematic flowcharts of an audio processing method provided by yet another embodiment of the present disclosure;
  • Figure 7 is a schematic flowchart of an audio processing method provided by an embodiment of the present disclosure;
  • Figure 8 is a schematic flowchart of an audio processing method provided by an embodiment of the present disclosure;
  • Figures 9a-9e are schematic flowcharts of an audio processing method provided by an embodiment of the present disclosure;
  • Figure 9f is a schematic structural diagram of an audio processing device provided by an embodiment of the present disclosure;
  • Figure 9g is a schematic structural diagram of an audio processing device provided by an embodiment of the present disclosure;
  • Figure 10 is a block diagram of a user equipment provided by an embodiment of the present disclosure;
  • Figure 11 is a block diagram of a network-side device provided by an embodiment of the present disclosure.
  • although the terms first, second, third, etc. may be used to describe various information in the embodiments of the present disclosure, the information should not be limited to these terms. These terms are only used to distinguish information of the same type from each other.
  • first information may also be called second information, and similarly, the second information may also be called first information.
  • the word "if" as used herein may be interpreted as "when", "upon", or "in response to determining".
  • FIG. 1 is a schematic flow chart of an audio processing method provided by an embodiment of the present disclosure. The method is executed by an encoding device. As shown in Figure 1, the audio processing method may include the following steps:
  • Step 101 Determine metadata of each frame of audio data.
  • the metadata may include at least one of the absolute position information of the sound object in each frame of audio data, the relative position information of the sound object, the orientation information of the sound object, and the sound radiation range of the sound object.
  • the above-mentioned relative position information may be used to indicate the relative position between the sound object and the listening position of the listener.
  • the absolute position information and relative position information may specifically be mapping information of the absolute position or relative position of the sound object on the coordinate system.
  • the above-mentioned absolute position may be, for example, the longitude and latitude of the sound object, etc.; the above-mentioned relative position may be, for example, the distance, azimuth angle, pitch angle, etc. between the sound object and the listener.
  • the listening position of the listener can be any position, or the position of any sound object.
  • the method for determining the absolute position information of the sound object may include: first obtaining the absolute position of each sound object, and then establishing an absolute coordinate system.
  • the origin of the absolute coordinate system may be determined as any position, and the origin of the absolute coordinate system is fixed. Then, the absolute position of each sound object is mapped onto the absolute coordinate system to obtain the absolute position information of the sound object.
  • the absolute coordinate system may be a rectangular coordinate system
  • the absolute position information of the sound object may be (x, y, z), where x, y, and z respectively represent the position coordinates of the sound object on the x-axis (such as the axis in the front-back direction), y-axis (such as the axis in the left-right direction), and z-axis (such as the axis in the up-down direction) of the rectangular coordinate system.
  • the absolute coordinate system may be a spherical coordinate system
  • the absolute position information of the sound object may be (θ, φ, r), where θ, φ, and r respectively represent the horizontal direction angle of the sound object in the spherical coordinate system (i.e., the angle between the x-axis and the projection onto the horizontal plane of the line connecting the sound object and the origin), the vertical direction angle (i.e., the angle between the horizontal plane and the line connecting the sound object and the origin), and the straight-line distance of the sound object from the origin.
  • the method for determining the relative position information of the sound object may include: first obtaining the relative position between each sound object and the listening position of the listener, and then establishing a relative coordinate system.
  • the origin of the relative coordinate system is always the listening position, and when the listening position changes, the origin of the relative coordinate system changes with it. Afterwards, the relative position between each sound object and the listening position is mapped onto the relative coordinate system to obtain the relative position information of the sound object.
  • the relative coordinate system may be a rectangular coordinate system
  • the relative position information of the sound object may be (x, y, z), where x, y, and z respectively represent the position coordinates of the sound object on the x-axis (such as the axis in the front-back direction), y-axis (such as the axis in the left-right direction), and z-axis (such as the axis in the up-down direction) of the rectangular coordinate system.
  • the relative coordinate system may be a spherical coordinate system
  • the relative position information of the sound object may be (θ, φ, r), where θ, φ, and r respectively represent the horizontal direction angle of the sound object in the spherical coordinate system (i.e., the angle between the x-axis and the projection onto the horizontal plane of the line connecting the sound object and the origin), the vertical direction angle (i.e., the angle between the horizontal plane and the line connecting the sound object and the origin), and the straight-line distance of the sound object from the origin.
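  • as an illustration of the coordinate mappings above, the following minimal sketch maps an absolute position into the listener-centred relative coordinate system and converts rectangular coordinates to spherical ones; the function and field names are illustrative assumptions, not taken from the present disclosure:

```python
import math

def absolute_to_relative(obj_xyz, listener_xyz):
    """Map an absolute position into the listener-centred (relative)
    rectangular coordinate system: the origin follows the listening position."""
    return tuple(o - l for o, l in zip(obj_xyz, listener_xyz))

def rectangular_to_spherical(x, y, z):
    """Convert (x, y, z) to (theta, phi, r): horizontal direction angle against
    the x-axis, vertical direction angle against the horizontal plane, and the
    straight-line distance from the origin."""
    r = math.sqrt(x * x + y * y + z * z)
    theta = math.degrees(math.atan2(y, x))                # horizontal direction angle
    phi = math.degrees(math.asin(z / r)) if r else 0.0    # vertical direction angle
    return theta, phi, r

# Example: a sound object at (3, 4, 0) heard from a listener at the origin
rel = absolute_to_relative((3.0, 4.0, 0.0), (0.0, 0.0, 0.0))
print(rectangular_to_spherical(*rel))  # -> (~53.13, 0.0, 5.0)
```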
  • the above-mentioned method of obtaining the absolute position or relative position of each sound object may include: using a sensor or a combination of sensors, such as displacement sensors, position sensors, attitude sensors (such as gyroscopes and ultrasonic rangefinders), positioning sensors, geomagnetic sensors, direction sensors, and accelerometers, to obtain the absolute position or relative position of the sound object.
  • the distance between the sound object and the listener in the relative position can also be obtained through inertial navigation technology and initial alignment technology.
  • the absolute position or relative position of each sound object may also be obtained based on user input.
  • the absolute position or relative position of each sound object may also be generated based on a program.
  • the above-mentioned orientation information of the sound object may specifically be absolute orientation information of the sound object (such as facing true south or true north, etc.).
  • the orientation information of the sound object may also be relative orientation information of the sound object, and the relative orientation information may be used to indicate the relative orientation between the sound object and the listening position; for example, the relative orientation information can be: the sound object is located 30° south by west of the listening position.
  • the orientation information of the acoustic object can be obtained using any of the above sensors or obtained based on user input or generated based on a program.
  • the above-mentioned sound radiation range of the sound object may be a parameter used to describe the radiation characteristics of the sound object.
  • the sound radiation range of the sound object can be used to indicate the sound radiation angle of the sound object.
  • for example, the sound radiation range of the sound object can be: the sound object radiates sound 90° to the front; alternatively, the sound radiation range of the sound object can be: the sound object radiates sound over 360°.
  • the sound radiation range of the sound object may be the sound radiation shape of the sound object.
  • for example, the sound radiation range of the sound object may be: the sound object emits sound radiation according to a heart shape (cardioid); alternatively, the sound radiation range may be: the sound object's sound radiation follows a figure-8 shape.
  • the sound radiation range of the sound object can be obtained using any of the above sensors or obtained based on user input or generated based on a program.
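  • as a rough illustration of how a sound radiation range could be used, the sketch below evaluates a directivity gain for the shapes mentioned above; the function name and gain formulas (standard cardioid and figure-8 patterns) are illustrative assumptions, not the patent's definitions:

```python
import math

def radiation_gain(shape, angle_deg):
    """Gain heard at `angle_deg` off the sound object's facing direction."""
    a = math.radians(angle_deg)
    if shape == "omni":        # radiates sound over 360 degrees
        return 1.0
    if shape == "cardioid":    # heart-shaped radiation
        return 0.5 * (1.0 + math.cos(a))
    if shape == "figure8":     # figure-8 shaped radiation
        return abs(math.cos(a))
    raise ValueError(shape)

print(radiation_gain("cardioid", 0))    # 1.0: full level in front
print(radiation_gain("cardioid", 180))  # 0.0: silent behind the object
```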
  • the above-mentioned metadata of each frame of audio data may also include at least one of the following:
  • the spatial state of the sound object including moving or stationary;
  • Type of sound object (such as speech, music, etc.).
  • the size of the sound source of the sound object, the width of the sound object, the height of the sound object, and the spatial state of the sound object can also be obtained through any of the above sensors or obtained based on user input or generated based on a program.
  • each item of content in the metadata is stored with a corresponding flag bit, used to indicate whether the parameters of that content have changed relative to the corresponding content in the metadata of the previous frame of audio data.
  • for example, the azimuth angle in the metadata is stored with a corresponding azimuth angle flag: if the azimuth angle in the metadata of the current frame of audio data has not changed relative to the azimuth angle in the metadata of the previous frame of audio data, the azimuth angle flag can be set to a first value (such as 1); otherwise, it can be set to a second value (such as 0).
  • in this case, that content need not be included in the metadata of the current frame of audio data, and the corresponding content in the metadata of the previous frame of audio data can be directly reused, thereby reducing the data volume and transmission bandwidth of the metadata to a certain extent, reducing the data to be compressed, and improving encoding efficiency without affecting the final decoding and rendering effect.
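  • a minimal sketch of this flag-bit mechanism follows, assuming one flag per metadata field with 1 meaning "unchanged, reuse the previous frame" (matching the azimuth example above); the field names are illustrative:

```python
def encode_frame_metadata(current, previous):
    """Re-send a field only when it changed since the previous frame."""
    packet = {}
    for field, value in current.items():
        unchanged = previous is not None and previous.get(field) == value
        packet[field + "_flag"] = 1 if unchanged else 0
        if not unchanged:
            packet[field] = value  # only changed fields are carried
    return packet

def decode_frame_metadata(packet, previous):
    """Rebuild full metadata, reusing the previous frame where flagged."""
    decoded = {}
    for field in ("azimuth", "pitch", "distance"):
        if packet.get(field + "_flag") == 1:
            decoded[field] = previous[field]  # reuse previous frame's value
        else:
            decoded[field] = packet[field]
    return decoded

prev = {"azimuth": 30.0, "pitch": 0.0, "distance": 2.0}
cur = {"azimuth": 30.0, "pitch": 5.0, "distance": 2.0}
pkt = encode_frame_metadata(cur, prev)   # only "pitch" is carried
assert decode_frame_metadata(pkt, prev) == cur
```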
  • Step 102 Obtain the object audio signal based on the metadata of the audio data.
  • in the embodiment of the present disclosure, the encoding device determines the metadata of each frame of audio data, where the metadata includes at least one of the absolute position information of the sound object in the audio data, the relative position information of the sound object, the orientation information of the sound object, and the sound radiation range of the sound object; the encoding device then obtains the object audio signal based on the metadata of the audio data.
  • the metadata may include the absolute position information of the sound object. Based on this, when recording audio data or producing an object audio signal, if the absolute positions of multiple sound objects are fixed but the listening position keeps moving, the metadata can include the absolute position information of the sound object.
  • in this case, the metadata of a certain frame of audio data (such as the first frame) may include only the absolute position information of the sound object.
  • the audio data of the other frames can reuse the absolute position information in that frame, without requiring the metadata of every frame of audio data to include it, thereby reducing the amount of metadata and the transmission bandwidth, improving coding efficiency, and ensuring that the renderer can subsequently render the position of the sound object correctly and provide a correct spatial audio perception, without affecting the final decoding and rendering effect.
  • in addition, the metadata in the embodiment of the present disclosure may include the orientation information of the sound object and the sound radiation range of the sound object, so that when the decoding device subsequently renders the object audio signal, it can render based on the orientation information and the sound radiation range, simulating the hearing differences caused by the actual orientation of the sounding object and improving the rendering effect.
  • FIG. 2 is a schematic flowchart of an audio processing method provided by an embodiment of the present disclosure. The method is executed by an encoding device. As shown in Figure 2, the audio processing method may include the following steps:
  • Step 201 Determine the environmental space information of the sound object.
  • when the sound object is located in a room, the environmental spatial information includes at least one of the following:
  • Room type (such as large room, small room, conference room, auditorium, hall, etc.);
  • the environmental space information can be obtained using any of the above sensors or obtained based on user input or generated based on a program.
  • the absolute coordinate system or the relative coordinate system can be established based on the environmental space information.
  • Step 202 Determine the basic information of the sound object.
  • the basic information of the sound object may include at least one of the following:
  • the sound source width of the sound object
  • the frame length of each frame of audio data.
  • the basic information of the acoustic object can be obtained using any of the above-mentioned sensors or obtained based on user input or generated based on a program.
  • Step 203 Sample the audio data of the sound object in frame units.
  • a sound collection device (such as a microphone) can be used to sample the audio data of the sound object in frame units, and all sampling points included in the current frame can be saved as PCM (pulse code modulation) data.
  • Table 1 and Table 2 are schematic tables of the storage syntax of audio data provided by embodiments of the present disclosure.
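  • the following sketch illustrates frame-based sampling storage under the assumption of 16-bit little-endian PCM; the actual storage syntax is the one defined by Tables 1 and 2:

```python
import struct

def frames_from_samples(samples, frame_length):
    """Split a mono sample stream into frames of `frame_length` samples,
    each saved as a PCM16 payload."""
    for i in range(0, len(samples), frame_length):
        frame = samples[i:i + frame_length]
        yield struct.pack("<%dh" % len(frame), *frame)  # little-endian int16

pcm_frames = list(frames_from_samples([0, 512, -512, 1024], frame_length=2))
print(len(pcm_frames), len(pcm_frames[0]))  # 2 frames, 4 bytes each
```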
  • Step 204 Determine metadata of each frame of audio data.
  • for a detailed introduction to step 204, reference may be made to the description of the above embodiments, which will not be repeated here.
  • Step 205 Obtain the object audio signal based on the metadata of the audio data.
  • a method of obtaining an object audio signal based on the metadata of the audio data may include the following steps:
  • Step 1 Store the environmental space information of the sound object and the basic information of the sound object as a header file.
  • Table 3 is a schematic table of the storage syntax of the header file provided by the embodiment of the present disclosure.
  • Step 2 Store the metadata of each frame of audio data and each frame of audio data as an object audio data packet.
  • Table 4 is a schematic table of the storage syntax of the object audio data packet provided by the embodiment of the present disclosure.
  • Step 3 Splice the header file and the object audio data packet to obtain at least one object audio signal.
  • in the embodiment of the present disclosure, the encoding device can save or transmit the object audio signal as needed, or it can encode the object audio signal into other formats and then save or transmit it.
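  • the sketch below illustrates steps 1-3 under an assumed length-prefixed JSON container; the actual storage syntax of the header file and the object audio data packet is the one defined by Tables 3 and 4:

```python
import json
import struct

def _chunk(payload: bytes) -> bytes:
    return struct.pack("<I", len(payload)) + payload  # 4-byte length prefix

def build_object_audio_signal(env_info, basic_info, frames):
    """Splice a header file (environmental space information + basic
    information of the sound object) with per-frame object audio data
    packets (metadata + audio data) into one object audio signal."""
    header = json.dumps({"environment": env_info, "basic": basic_info}).encode()
    signal = _chunk(header)                      # header file first
    for metadata, audio in frames:               # then the object audio packets
        signal += _chunk(json.dumps(metadata).encode()) + _chunk(audio)
    return signal

signal = build_object_audio_signal(
    {"room_type": "conference room"},             # environmental space information
    {"source_width": 0.5, "frame_length": 1024},  # basic information
    [({"azimuth": 30.0}, b"\x00\x02"), ({"azimuth": 31.0}, b"\x00\x04")],
)
```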
  • in the embodiment of the present disclosure, the encoding device determines the metadata of each frame of audio data, where the metadata includes at least one of the absolute position information of the sound object in the audio data, the relative position information of the sound object, the orientation information of the sound object, and the sound radiation range of the sound object; the encoding device then obtains the object audio signal based on the metadata of the audio data.
  • the metadata may include the absolute position information of the sound object. Based on this, when recording audio data or producing an object audio signal, if the absolute positions of multiple sound objects are fixed but the listening position keeps moving, the metadata can include the absolute position information of the sound object.
  • in this case, the metadata of a certain frame of audio data (such as the first frame) may include only the absolute position information of the sound object.
  • the audio data of the other frames can reuse the absolute position information in that frame, without requiring the metadata of every frame of audio data to include it, thereby reducing the amount of metadata and the transmission bandwidth, improving coding efficiency, and ensuring that the renderer can subsequently render the position of the sound object correctly and provide a correct spatial audio perception, without affecting the final decoding and rendering effect.
  • in addition, the metadata in the embodiment of the present disclosure may include the orientation information of the sound object and the sound radiation range of the sound object, so that when the decoding device subsequently renders the object audio signal, it can render based on the orientation information and the sound radiation range, simulating the hearing differences caused by the actual orientation of the sounding object and improving the rendering effect.
  • FIG 3a is a schematic flowchart of an audio processing method provided by an embodiment of the present disclosure. The method is executed by an encoding device. As shown in Figure 3a, the audio processing method may include the following steps:
  • Step 301a Determine whether the metadata needs to contain absolute position information or relative position information.
  • whether the metadata needs to contain absolute position information or relative position information is determined mainly based on the characteristics of the application scene or sound object or the simplification of the subsequent rendering process.
  • in response to a first preset condition being met, it is determined that the metadata needs to contain absolute position information;
  • in response to a second preset condition being met, it is determined that the metadata needs to contain relative position information.
  • the first preset condition may include at least one of the following:
  • the amount of data when the metadata includes absolute position information is less than or equal to the amount of data when the metadata includes relative position information
  • the rendering process required when metadata includes absolute position information is simpler than when the metadata includes relative position information.
  • the second preset condition may include at least one of the following:
  • the amount of data when the metadata includes absolute position information is greater than or equal to the amount of data when the metadata includes relative position information
  • the rendering process required when metadata includes relative position information is simpler than when the metadata includes absolute position information.
  • in the embodiment of the present disclosure, whether the metadata needs to contain absolute position information or relative position information can be determined by judging whether the absolute position or relative position of the sound object remains unchanged in the metadata of consecutive frames of audio data. If the absolute position of the sound object does not change in the metadata of consecutive frames of audio data, it is determined that the metadata contains absolute position information. In this case, since the absolute position does not change, only the metadata of the first frame of audio data in the consecutive frames needs to contain the absolute position information of the sound object, and the audio data of the other frames in the consecutive frames can reuse the absolute position information contained in the metadata of that first frame.
  • similarly, if the relative position of the sound object does not change in the metadata of consecutive frames of audio data, only the metadata of the first frame of audio data in the consecutive frames needs to contain the relative position information of the sound object; the audio data of the other frames in the consecutive frames can reuse the relative position information contained in the metadata of that first frame, which can reduce the data volume and transmission bandwidth of the metadata to a certain extent, reduce the data to be compressed, and improve encoding efficiency without affecting the final decoding and rendering effect.
  • in addition, the simplification of the subsequent rendering process is also taken into consideration: whichever of absolute position information and relative position information can simplify the subsequent rendering process is selected for use, thereby improving the efficiency of the subsequent rendering process. For example, in a scene with 6 degrees of freedom, the listener can perform three-dimensional rotation and three-dimensional displacement; in this case, using absolute position information is more conducive to processing such scenes and simplifies the rendering process, as sketched below.
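  • the following sketch shows one possible form of this selection rule, keeping whichever representation stays constant across consecutive frames (an unchanged position can be sent once and reused) and defaulting to absolute positions for 6-degree-of-freedom scenes; it is an illustrative assumption, not the patent's exact decision procedure:

```python
def choose_position_mode(abs_positions, rel_positions):
    """abs_positions / rel_positions: per-frame position tuples for one object."""
    abs_static = all(p == abs_positions[0] for p in abs_positions)
    rel_static = all(p == rel_positions[0] for p in rel_positions)
    if abs_static and not rel_static:
        return "absolute"   # fixed objects, moving listener
    if rel_static and not abs_static:
        return "relative"   # object locked to the listener
    return "absolute"       # e.g. 6-DoF scenes favour absolute positions

mode = choose_position_mode(
    abs_positions=[(1, 0, 0)] * 3,
    rel_positions=[(1, 0, 0), (0, 1, 0), (-1, 0, 0)],
)
print(mode)  # "absolute": the listener moved, the object did not
```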
  • Step 302a In response to determining that the metadata needs to include absolute location information, make the metadata include the absolute location information.
  • Table 5 is a schematic table of storage syntax for metadata containing absolute position information provided by an embodiment of the present disclosure.
  • Step 303a Obtain the object audio signal based on the metadata of the audio data.
  • for steps 302a-303a, please refer to the description of the above embodiments; the embodiments of the present disclosure will not repeat them here.
  • in the embodiment of the present disclosure, the encoding device determines the metadata of each frame of audio data, where the metadata includes at least one of the absolute position information of the sound object in the audio data, the relative position information of the sound object, the orientation information of the sound object, and the sound radiation range of the sound object; the encoding device then obtains the object audio signal based on the metadata of the audio data.
  • the metadata may include the absolute position information of the sound object. Based on this, when recording audio data or producing an object audio signal, if the absolute positions of multiple sound objects are fixed but the listening position keeps moving, the metadata can include the absolute position information of the sound object.
  • in this case, the metadata of a certain frame of audio data (such as the first frame) may include only the absolute position information of the sound object.
  • the audio data of the other frames can reuse the absolute position information in that frame, without requiring the metadata of every frame of audio data to include it, thereby reducing the amount of metadata and the transmission bandwidth, improving coding efficiency, and ensuring that the renderer can subsequently render the position of the sound object correctly and provide a correct spatial audio perception, without affecting the final decoding and rendering effect.
  • in addition, the metadata in the embodiment of the present disclosure may include the orientation information of the sound object and the sound radiation range of the sound object, so that when the decoding device subsequently renders the object audio signal, it can render based on the orientation information and the sound radiation range, simulating the hearing differences caused by the actual orientation of the sounding object and improving the rendering effect.
  • FIG. 3b is a schematic flowchart of an audio processing method provided by an embodiment of the present disclosure. The method is executed by an encoding device. As shown in Figure 3b, the audio processing method may include the following steps:
  • Step 301b Determine the metadata of each frame of audio data, and the metadata includes absolute position information.
  • Step 302b Obtain the object audio signal based on the metadata of the audio data.
  • for steps 301b-302b, please refer to the description of the above embodiments; the embodiments of the present disclosure will not repeat them here.
  • in the embodiment of the present disclosure, the encoding device determines the metadata of each frame of audio data, where the metadata includes at least one of the absolute position information of the sound object in the audio data, the relative position information of the sound object, the orientation information of the sound object, and the sound radiation range of the sound object; the encoding device then obtains the object audio signal based on the metadata of the audio data.
  • the metadata may include the absolute position information of the sound object. Based on this, when recording audio data or producing an object audio signal, if the absolute positions of multiple sound objects are fixed but the listening position keeps moving, the metadata can include the absolute position information of the sound object.
  • in this case, the metadata of a certain frame of audio data (such as the first frame) may include only the absolute position information of the sound object.
  • the audio data of the other frames can reuse the absolute position information in that frame, without requiring the metadata of every frame of audio data to include it, thereby reducing the amount of metadata and the transmission bandwidth, improving coding efficiency, and ensuring that the renderer can subsequently render the position of the sound object correctly and provide a correct spatial audio perception, without affecting the final decoding and rendering effect.
  • FIG 4 is a schematic flowchart of an audio processing method provided by an embodiment of the present disclosure. The method is executed by an encoding device. As shown in Figure 4, the audio processing method may include the following steps:
  • Step 401 Determine whether the metadata needs to contain absolute position information or relative position information.
  • Step 402 In response to determining that the metadata needs to include relative position information, make the metadata include the relative position information.
  • Table 6 is a schematic table of storage syntax for metadata containing relative position information provided by an embodiment of the present disclosure.
  • Step 403 Obtain the object audio signal based on the metadata of the audio data.
  • in the embodiment of the present disclosure, the encoding device determines the metadata of each frame of audio data, where the metadata includes at least one of the absolute position information of the sound object in the audio data, the relative position information of the sound object, the orientation information of the sound object, and the sound radiation range of the sound object; the encoding device then obtains the object audio signal based on the metadata of the audio data.
  • the metadata may include the absolute position information of the sound object. Based on this, when recording audio data or producing an object audio signal, if the absolute positions of multiple sound objects are fixed but the listening position keeps moving, the metadata can include the absolute position information of the sound object.
  • in this case, the metadata of a certain frame of audio data (such as the first frame) may include only the absolute position information of the sound object.
  • the audio data of the other frames can reuse the absolute position information in that frame, without requiring the metadata of every frame of audio data to include it, thereby reducing the amount of metadata and the transmission bandwidth, improving coding efficiency, and ensuring that the renderer can subsequently render the position of the sound object correctly and provide a correct spatial audio perception, without affecting the final decoding and rendering effect.
  • in addition, the metadata in the embodiment of the present disclosure may include the orientation information of the sound object and the sound radiation range of the sound object, so that when the decoding device subsequently renders the object audio signal, it can render based on the orientation information and the sound radiation range, simulating the hearing differences caused by the actual orientation of the sounding object and improving the rendering effect.
  • FIG. 5 is a schematic flowchart of an audio processing method provided by an embodiment of the present disclosure. The method is executed by an encoding device. As shown in Figure 5, the audio processing method may include the following steps:
  • Step 501 Determine whether the sound object has a direction.
  • if the sound object emits sound in all directions, it is considered that the sound object has no orientation; otherwise, the sound emission direction of the sound object is determined as the orientation of the sound object.
  • Step 502 In response to the sound object having an orientation, include the orientation information of the sound object in the metadata, and include a mark in the metadata, the mark being used to indicate that the metadata includes the orientation information.
  • the metadata in the embodiment of the present disclosure also includes the orientation information of the sounding object.
  • when the decoding device subsequently renders the object audio signal, it can perform rendering based on the orientation information, simulating the hearing differences caused by the different actual orientations of the sounding object and improving the rendering effect.
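  • a minimal sketch of steps 501-502 follows, with illustrative field names for the mark and the orientation value (they are assumptions, not the patent's syntax):

```python
def add_orientation(metadata, emits_in_all_directions, orientation_deg=None):
    """Write orientation information into the metadata only when the sound
    object actually has an orientation, together with a mark telling the
    decoder that the orientation information is present."""
    if emits_in_all_directions:
        metadata["has_orientation"] = 0   # no orientation information stored
    else:
        metadata["has_orientation"] = 1   # mark: orientation info follows
        metadata["orientation"] = orientation_deg
    return metadata

print(add_orientation({}, emits_in_all_directions=False, orientation_deg=210.0))
# {'has_orientation': 1, 'orientation': 210.0}  (e.g. 30 degrees south by west)
```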
  • FIG. 6a is a schematic flowchart of an audio processing method provided by an embodiment of the present disclosure. The method is executed by an encoding device. As shown in Figure 6a, the audio processing method may include the following steps:
  • Step 601a Determine whether the sound object has a direction.
  • Step 602a In response to the sound object having no orientation, no orientation information is included in the metadata.
  • for steps 601a-602a, please refer to the description of the above embodiments; the embodiments of the present disclosure will not repeat them here.
  • in the embodiment of the present disclosure, the encoding device determines the metadata of each frame of audio data, where the metadata includes at least one of the absolute position information of the sound object in the audio data, the relative position information of the sound object, the orientation information of the sound object, and the sound radiation range of the sound object; the encoding device then obtains the object audio signal based on the metadata of the audio data.
  • the metadata may include the absolute position information of the sound object. Based on this, when recording audio data or producing an object audio signal, if the absolute positions of multiple sound objects are fixed but the listening position keeps moving, the metadata can include the absolute position information of the sound object.
  • in this case, the metadata of a certain frame of audio data (such as the first frame) may include only the absolute position information of the sound object.
  • the audio data of the other frames can reuse the absolute position information in that frame, without requiring the metadata of every frame of audio data to include it, thereby reducing the amount of metadata and the transmission bandwidth, improving coding efficiency, and ensuring that the renderer can subsequently render the position of the sound object correctly and provide a correct spatial audio perception, without affecting the final decoding and rendering effect.
  • in addition, the metadata in the embodiment of the present disclosure may include the orientation information of the sound object and the sound radiation range of the sound object, so that when the decoding device subsequently renders the object audio signal, it can render based on the orientation information and the sound radiation range, simulating the hearing differences caused by the actual orientation of the sounding object and improving the rendering effect.
  • FIG. 6b is a schematic flowchart of an audio processing method provided by an embodiment of the present disclosure. The method is executed by an encoding device. As shown in Figure 6b, the audio processing method may include the following steps:
  • Step 601b Determine the metadata of each frame of audio data.
  • the metadata includes at least one of the absolute position information of the sound object in the audio data, the relative position information of the sound object, the orientation information of the sound object, and the sound radiation range of the sound object.
  • Step 602b Obtain the object audio signal based on the metadata of the audio data.
  • for steps 601b-602b, please refer to the description of the above embodiments; the embodiments of the present disclosure will not repeat them here.
  • Step 603b Encode the object audio signal.
  • Step 604b Send the encoded signal to the decoding device.
  • in the embodiment of the present disclosure, the encoding device determines the metadata of each frame of audio data, where the metadata includes at least one of the absolute position information of the sound object in the audio data, the relative position information of the sound object, the orientation information of the sound object, and the sound radiation range of the sound object; the encoding device then obtains the object audio signal based on the metadata of the audio data.
  • the metadata may include the absolute position information of the sound object. Based on this, when recording audio data or producing an object audio signal, if the absolute positions of multiple sound objects are fixed but the listening position keeps moving, the metadata can include the absolute position information of the sound object.
  • in this case, the metadata of a certain frame of audio data (such as the first frame) may include only the absolute position information of the sound object.
  • the audio data of the other frames can reuse the absolute position information in that frame, without requiring the metadata of every frame of audio data to include it, thereby reducing the amount of metadata and the transmission bandwidth, improving coding efficiency, and ensuring that the renderer can subsequently render the position of the sound object correctly and provide a correct spatial audio perception, without affecting the final decoding and rendering effect.
  • in addition, the metadata in the embodiment of the present disclosure may include the orientation information of the sound object and the sound radiation range of the sound object, so that when the decoding device subsequently renders the object audio signal, it can render based on the orientation information and the sound radiation range, simulating the hearing differences caused by the actual orientation of the sounding object and improving the rendering effect.
  • FIG. 7 is a schematic flow chart of an audio processing method provided by an embodiment of the present disclosure.
  • a local multi-person conference scenario is taken as an example.
  • the room on the left is the recording end, and the room on the right is the playback end.
  • multiple objects in the scene are regarded as sound objects; their corresponding voice data is obtained through microphones, and positioning and attitude sensors, such as gyroscopes and ultrasonic rangefinders, are used to obtain the spatial information of each object (such as relative position information or absolute position information) and its orientation information.
  • after the audio data, spatial information, and orientation information of each object are encoded, transmitted, decoded, and rendered, the listener can feel as if he were in the conference scene on the left: he can not only perceive the direction and distance of object 1, object 2, and object 4, but also perceive the orientation of each object.
  • in addition, this solution can treat object 3, which has no audio data, as an audio object for encoding and transmission; it can be regarded as the listener in the recording scene. Through this solution, the playback end can completely restore the real listening experience of object 3, including the changes in hearing experience caused by changes in object 3's position and head rotation (changes in orientation).
  • Figure 8 is a schematic flow chart of an audio processing method provided by an embodiment of the present disclosure.
  • on the left side are remote participants; multiple participants are located in different locations and different rooms, and each can be regarded as a sound object for object audio coding.
  • displacement or position sensors and attitude sensors are used to obtain each object's spatial position change information and head orientation information, and a microphone is used to obtain the participant's voice signal as object audio data; the spatial position information, head orientation information, and object audio data are then used for object audio encoding.
  • after the near-end user (right side of Figure 8) obtains multiple encoded remote object audio streams and decodes and renders them, combined with the near-end user's local spatial information, the user can perceive multiple remote participants' voices with a sense of direction that changes over time, and can also perceive the hearing changes caused by each remote participant's orientation.
  • FIG 9a is a schematic flowchart of an audio processing method provided by an embodiment of the present disclosure. The method is executed by a decoding device. As shown in Figure 9a, the audio processing method may include the following steps:
  • Step 901a Obtain the encoded signal sent by the encoding device
  • Step 902a Decode the encoded signal to obtain the object audio signal
  • Step 903a Determine the metadata of the object audio signal, the metadata including at least one of the absolute position information of the sound object, the relative position information of the sound object, the orientation information of the sound object, and the sound radiation range of the sound object;
  • Step 904a Render the object audio signal based on the metadata.
  • the orientation information includes absolute orientation information and/or relative orientation information
  • the relative orientation information is used to indicate the relative orientation between the sound object and the listening position.
  • the metadata further includes at least one of the following:
  • the spatial state of the sound object, which includes moving or stationary;
  • the type of the sound object.
  • the object audio signal includes a header file and an object audio data packet
  • the header file includes environmental space information of the sound object and basic information of the sound object;
  • the object audio data packet includes audio data metadata and audio data.
  • in response to the sound object being located in a room, the environmental spatial information includes at least one of the following:
  • the basic information of the sound object includes at least one of the following:
  • the sound source width of the sound object
  • the frame length of each frame of audio data.
  • rendering the object audio signal based on the metadata includes:
  • the audio data is rendered based on the metadata and the header file.
  • for steps 901a-904a, please refer to the description of the above embodiments; the embodiments of the present disclosure will not repeat them here.
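  • for illustration only, the sketch below shows how relative position metadata (horizontal angle and distance) and a directivity gain could enter a very simple stereo rendering; a real renderer would use HRTFs or panning laws rather than this toy pan, and all names here are assumptions:

```python
import math

def render_frame(samples, theta_deg, r, directivity_gain=1.0):
    """Render one frame to stereo from a horizontal angle in [-90, 90]
    degrees (left to right) and a distance r, with a 1/r attenuation."""
    distance_gain = 1.0 / max(r, 1.0)             # simple distance attenuation
    pan = math.radians((theta_deg + 90.0) / 2.0)  # map [-90, 90] to [0, 90] deg
    left = [s * distance_gain * directivity_gain * math.cos(pan) for s in samples]
    right = [s * distance_gain * directivity_gain * math.sin(pan) for s in samples]
    return left, right

# A sound object 30 degrees to the right at 2 m, facing the listener
left, right = render_frame([0.5, -0.5], theta_deg=30.0, r=2.0)
```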
  • in the embodiment of the present disclosure, the encoding device determines the metadata of each frame of audio data, where the metadata includes at least one of the absolute position information of the sound object in the audio data, the relative position information of the sound object, the orientation information of the sound object, and the sound radiation range of the sound object; the encoding device then obtains the object audio signal based on the metadata of the audio data.
  • the metadata may include the absolute position information of the sound object. Based on this, when recording audio data or producing an object audio signal, if the absolute positions of multiple sound objects are fixed but the listening position keeps moving, the metadata can include the absolute position information of the sound object.
  • in this case, the metadata of a certain frame of audio data (such as the first frame) may include only the absolute position information of the sound object.
  • the audio data of the other frames can reuse the absolute position information in that frame, without requiring the metadata of every frame of audio data to include it, thereby reducing the amount of metadata and the transmission bandwidth, improving coding efficiency, and ensuring that the renderer can subsequently render the position of the sound object correctly and provide a correct spatial audio perception, without affecting the final decoding and rendering effect.
  • in addition, the metadata in the embodiment of the present disclosure may include the orientation information of the sound object and the sound radiation range of the sound object, so that when the decoding device subsequently renders the object audio signal, it can render based on the orientation information and the sound radiation range, simulating the hearing differences caused by the actual orientation of the sounding object and improving the rendering effect.
  • FIG. 9b is a schematic flowchart of an audio processing method provided by an embodiment of the present disclosure. The method is executed by a decoding device. As shown in Figure 9b, the audio processing method may include the following steps:
  • Step 901b Obtain the encoded signal sent by the encoding device
  • Step 902b Decode the encoded signal to obtain the object audio signal
  • Step 903b Determine metadata of the object audio signal, where the metadata includes absolute position information of the sound object;
  • Step 904b Render the object audio signal based on the metadata.
  • in the embodiment of the present disclosure, the encoding device determines the metadata of each frame of audio data, where the metadata includes at least one of the absolute position information of the sound object in the audio data, the relative position information of the sound object, the orientation information of the sound object, and the sound radiation range of the sound object; the encoding device then obtains the object audio signal based on the metadata of the audio data.
  • the metadata may include the absolute position information of the sound object. Based on this, when recording audio data or producing an object audio signal, if the absolute positions of multiple sound objects are fixed but the listening position keeps moving, the metadata can include the absolute position information of the sound object.
  • in this case, the metadata of a certain frame of audio data (such as the first frame) may include only the absolute position information of the sound object.
  • the audio data of the other frames can reuse the absolute position information in that frame, without requiring the metadata of every frame of audio data to include it, thereby reducing the amount of metadata and the transmission bandwidth, improving coding efficiency, and ensuring that the renderer can subsequently render the position of the sound object correctly and provide a correct spatial audio perception, without affecting the final decoding and rendering effect.
  • in addition, the metadata in the embodiment of the present disclosure may include the orientation information of the sound object and the sound radiation range of the sound object, so that when the decoding device subsequently renders the object audio signal, it can render based on the orientation information and the sound radiation range, simulating the hearing differences caused by the actual orientation of the sounding object and improving the rendering effect.
  • Figure 9c is a schematic flowchart of an audio processing method provided by an embodiment of the present disclosure. The method is executed by a decoding device. As shown in Figure 9c, the audio processing method may include the following steps:
• Step 901c: Obtain the encoded signal sent by the encoding device;
• Step 902c: Decode the encoded signal to obtain the object audio signal;
• Step 903c: Determine the metadata of the object audio signal, where the metadata includes the relative position information of the sound object;
• Step 904c: Render the object audio signal based on the metadata.
  • Figure 9d is a schematic flowchart of an audio processing method provided by an embodiment of the present disclosure. The method is executed by a decoding device. As shown in Figure 9d, the audio processing method may include the following steps:
• Step 901d: Obtain the encoded signal sent by the encoding device;
• Step 902d: Decode the encoded signal to obtain the object audio signal;
• Step 903d: Determine the metadata of the object audio signal, where the metadata includes the orientation information of the sound object and a tag, and the tag is used to indicate that the metadata includes orientation information;
• Step 904d: Render the object audio signal based on the metadata.
  • Figure 9e is a schematic flowchart of an audio processing method provided by an embodiment of the present disclosure. The method is executed by a decoding device. As shown in Figure 9e, the audio processing method may include the following steps:
• Step 901e: Obtain the encoded signal sent by the encoding device;
• Step 902e: Decode the encoded signal to obtain the object audio signal;
• Step 903e: Determine the metadata of the object audio signal, where the metadata includes the orientation information of the sound object and a tag, and the tag is used to indicate that the metadata includes orientation information;
• Step 904e: Render the object audio signal based on the metadata.
  • Figure 9f is a schematic structural diagram of an audio processing device provided by an embodiment of the present disclosure. As shown in Figure 9f, the device may include:
• Determining module 901f, configured to determine the metadata of each frame of audio data, where the metadata includes at least one of the absolute position information of the sound object in the audio data, the relative position information of the sound object, the orientation information of the sound object, and the sound radiation range of the sound object;
• Processing module 902f, configured to obtain an object audio signal based on the metadata of the audio data.
• The determining module is further configured to: determine whether the metadata needs to include absolute position information or relative position information; in response to determining that the metadata needs to include absolute position information, include the absolute position information in the metadata; and in response to determining that the metadata needs to include relative position information, include the relative position information in the metadata, where the relative position information is used to indicate the relative position between the sound object and the listening position of the listener.
• The determining module is further configured to: determine whether the sound object has an orientation; in response to the sound object having an orientation, include the orientation information of the sound object in the metadata and include a tag in the metadata, where the tag is used to indicate that the metadata includes orientation information; and in response to the sound object having no orientation, include no orientation information in the metadata.
• The orientation information includes absolute orientation information and/or relative orientation information, where the relative orientation information is used to indicate the relative orientation between the sound object and the listening position.
• The metadata further includes at least one of the following: the sound source size of the sound object; the width of the sound object; the height of the sound object; the spatial state of the sound object, where the spatial state includes moving or stationary; the type of the sound object.
• The device is further configured to: determine the environmental spatial information of the sound object; determine the basic information of the sound object; and sample the audio data of the sound object in units of frames.
• In response to the sound object being located in a room, the environmental spatial information includes at least one of the following: room size; room wall type; wall reflection coefficient; room type; reverberation time.
• The basic information of the sound object includes at least one of the following: the number of sound objects; the sampling rate of the sound source of the sound object; the bit width of the sound source of the sound object; the frame length of each frame of audio data.
• The processing module is further configured to: store the environmental spatial information of the sound object and the basic information of the sound object as a header file; store the metadata of each frame of audio data and that frame of audio data as one object audio data packet; and splice the header file and the object audio data packets to obtain at least one object audio signal.
• The method further includes: encoding the object audio signal, and sending the encoded signal to the decoding device.
  • Figure 9g is a schematic structural diagram of an audio processing device provided by an embodiment of the present disclosure. As shown in Figure 9g, the device may include:
• Acquisition module 901g, configured to acquire the encoded signal sent by the encoding device;
• Decoding module 902g, configured to decode the encoded signal to obtain the object audio signal;
• Determining module 903g, configured to determine the metadata of the object audio signal, where the metadata includes at least one of the absolute position information of the sound object, the relative position information of the sound object, the orientation information of the sound object, and the sound radiation range of the sound object;
• Rendering module 904g, configured to render the object audio signal based on the metadata.
• The orientation information includes absolute orientation information and/or relative orientation information, where the relative orientation information is used to indicate the relative orientation between the sound object and the listening position.
• The metadata further includes at least one of the following: the sound source size of the sound object; the width of the sound object; the height of the sound object; the spatial state of the sound object, where the spatial state includes moving or stationary; the type of the sound object.
• The object audio signal includes a header file and object audio data packets, where the header file includes the environmental spatial information of the sound object and the basic information of the sound object, and each object audio data packet includes the metadata of a frame of audio data and that frame of audio data.
• In response to the sound object being located in a room, the environmental spatial information includes at least one of the following: room size; room wall type; wall reflection coefficient; room type; reverberation time.
• The basic information of the sound object includes at least one of the following: the number of sound objects; the sampling rate of the sound source of the sound object; the bit width of the sound source of the sound object; the frame length of each frame of audio data.
• Rendering the object audio signal based on the metadata includes: rendering the audio data based on the metadata and the header file.
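To make the orientation/radiation-range idea concrete, here is a minimal Python sketch of a toy directivity model: a listener inside the source's radiation cone hears full level, and the level falls off linearly outside it. The cosine geometry and the linear falloff are assumptions made for this example; the disclosure does not specify a particular rendering formula:

```python
import math

def directivity_gain(source_pos, source_orientation, radiation_angle_deg, listener_pos):
    """Toy directivity model: full gain inside the radiation cone,
    linearly reduced gain outside it (illustrative only)."""
    to_listener = [l - s for l, s in zip(listener_pos, source_pos)]
    norm = math.sqrt(sum(c * c for c in to_listener)) or 1.0
    to_listener = [c / norm for c in to_listener]
    # Angle between the source's facing direction and the listener direction.
    cos_angle = sum(o * t for o, t in zip(source_orientation, to_listener))
    angle = math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))
    if angle <= radiation_angle_deg:
        return 1.0
    # Linear falloff from the cone edge to the rear of the source.
    return max(0.0, 1.0 - (angle - radiation_angle_deg) / (180.0 - radiation_angle_deg))

# A listener directly in front hears full level; directly behind, silence.
print(directivity_gain((0, 0, 0), (1, 0, 0), 60.0, (2, 0, 0)))   # 1.0
print(directivity_gain((0, 0, 0), (1, 0, 0), 60.0, (-2, 0, 0)))  # 0.0
```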
  • FIG 10 is a block diagram of a user equipment UE1000 provided by an embodiment of the present disclosure.
  • the UE1000 can be a mobile phone, a computer, a digital broadcast terminal device, a messaging device, a game console, a tablet device, a medical device, a fitness device, a personal digital assistant, etc.
• UE 1000 may include at least one of the following components: a processing component 1002, a memory 1004, a power supply component 1006, a multimedia component 1008, an audio component 1010, an input/output (I/O) interface 1012, a sensor component 1013, and a communication component 1016.
  • Processing component 1002 generally controls the overall operations of UE 1000, such as operations associated with display, phone calls, data communications, camera operations, and recording operations.
  • the processing component 1002 may include at least one processor 1020 to execute instructions to complete all or part of the steps of the above method. Additionally, processing component 1002 may include at least one module to facilitate interaction between processing component 1002 and other components. For example, processing component 1002 may include a multimedia module to facilitate interaction between multimedia component 1008 and processing component 1002.
  • Memory 1004 is configured to store various types of data to support operations at UE 1000. Examples of this data include instructions for any application or method operating on the UE1000, contact data, phonebook data, messages, pictures, videos, etc.
• Memory 1004 may be implemented by any type of volatile or non-volatile storage device, or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic disk, or optical disk.
  • Power supply component 1006 provides power to various components of UE 1000.
  • Power supply components 1006 may include a power management system, at least one power supply, and other components associated with generating, managing, and distributing power to UE 1000.
  • Multimedia component 1008 includes a screen that provides an output interface between the UE 1000 and the user.
  • the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from the user.
• The touch panel includes at least one touch sensor to sense touches, slides, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide operation, but also detect the duration and pressure related to the touch or slide operation.
  • multimedia component 1008 includes a front-facing camera and/or a rear-facing camera. When UE1000 is in an operating mode, such as shooting mode or video mode, the front camera and/or rear camera can receive external multimedia data.
• Each front-facing camera and rear-facing camera may be a fixed optical lens system or have focal length and optical zoom capability.
  • Audio component 1010 is configured to output and/or input audio signals.
  • audio component 1010 includes a microphone (MIC) configured to receive external audio signals when UE 1000 is in operating modes, such as call mode, recording mode, and voice recognition mode. The received audio signals may be further stored in memory 1004 or sent via communication component 1016 .
  • audio component 1010 also includes a speaker for outputting audio signals.
  • the I/O interface 1012 provides an interface between the processing component 1002 and a peripheral interface module.
  • the peripheral interface module may be a keyboard, a click wheel, a button, etc. These buttons may include, but are not limited to: Home button, Volume buttons, Start button, and Lock button.
  • the sensor component 1013 includes at least one sensor for providing various aspects of status assessment for the UE 1000 .
• For example, the sensor component 1013 can detect the open/closed state of the UE 1000 and the relative positioning of components (such as the display and keypad of the UE 1000); the sensor component 1013 can also detect a position change of the UE 1000 or of a component of the UE 1000, the presence or absence of user contact with the UE 1000, the orientation or acceleration/deceleration of the UE 1000, and temperature changes of the UE 1000.
  • Sensor assembly 1013 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact.
  • Sensor assembly 1013 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications.
  • the sensor component 1013 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
  • Communication component 1016 is configured to facilitate wired or wireless communication between UE 1000 and other devices.
  • UE1000 can access wireless networks based on communication standards, such as WiFi, 2G or 3G, or a combination thereof.
  • the communication component 1016 receives broadcast signals or broadcast related information from an external broadcast management system via a broadcast channel.
  • the communications component 1016 also includes a near field communications (NFC) module to facilitate short-range communications.
  • the NFC module can be implemented based on radio frequency identification (RFID) technology, infrared data association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology and other technologies.
• UE 1000 may be implemented by at least one application specific integrated circuit (ASIC), digital signal processor (DSP), digital signal processing device (DSPD), programmable logic device (PLD), field programmable gate array (FPGA), controller, microcontroller, microprocessor, or other electronic component, for executing the above methods.
  • FIG. 11 is a block diagram of a network side device 1100 provided by an embodiment of the present disclosure.
  • the network side device 1100 may be provided as a network side device.
• The network side device 1100 includes a processing component 1110, which further includes at least one processor, and a memory resource represented by a memory 1132 for storing instructions, such as application programs, that can be executed by the processing component 1110.
  • An application stored in memory 1132 may include one or more modules, each of which corresponds to a set of instructions.
  • the processing component 1110 is configured to execute instructions to perform any of the foregoing methods applied to the network side device, for example, the method shown in FIG. 1 .
• The network side device 1100 may also include a power supply component 1126 configured to perform power management of the network side device 1100, a wired or wireless network interface 1150 configured to connect the network side device 1100 to a network, and an input/output (I/O) interface 1158.
• The network side device 1100 may operate based on an operating system stored in the memory 1132, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, or similar.
  • the methods provided by the embodiments of the present disclosure are introduced from the perspectives of network side equipment and UE respectively.
• The network side device and the UE may include a hardware structure and/or a software module, and implement the above functions in the form of a hardware structure, a software module, or a hardware structure plus a software module.
  • a certain function among the above functions can be executed by a hardware structure, a software module, or a hardware structure plus a software module.
  • the communication device may include a transceiver module and a processing module.
  • the transceiver module may include a sending module and/or a receiving module.
  • the sending module is used to implement the sending function
  • the receiving module is used to implement the receiving function.
  • the transceiving module may implement the sending function and/or the receiving function.
  • the communication device may be a terminal device (such as the terminal device in the foregoing method embodiment), a device in the terminal device, or a device that can be used in conjunction with the terminal device.
  • the communication device may be a network device, a device in a network device, or a device that can be used in conjunction with the network device.
• The communication device may be a network device, or may be a terminal device (such as the terminal device in the foregoing method embodiment), or may be a chip, chip system, or processor that supports the network device in implementing the above method, or may be a chip, chip system, or processor that supports the terminal device in implementing the above method.
  • the device can be used to implement the method described in the above method embodiment. For details, please refer to the description in the above method embodiment.
  • a communications device may include one or more processors.
  • the processor may be a general-purpose processor or a special-purpose processor, etc.
  • it can be a baseband processor or a central processing unit.
  • the baseband processor can be used to process communication protocols and communication data
• The central processor can be used to control the communication device (such as a network side device, a baseband chip, a terminal device, a terminal device chip, a DU or a CU, etc.), execute a computer program, and process the data of the computer program.
  • the communication device may also include one or more memories, on which a computer program may be stored, and the processor executes the computer program, so that the communication device executes the method described in the above method embodiment.
  • data may also be stored in the memory.
  • the communication device and the memory can be provided separately or integrated together.
  • the communication device may also include a transceiver and an antenna.
  • the transceiver can be called a transceiver unit, a transceiver, or a transceiver circuit, etc., and is used to implement transceiver functions.
  • the transceiver can include a receiver and a transmitter.
  • the receiver can be called a receiver or a receiving circuit, etc., and is used to implement the receiving function;
  • the transmitter can be called a transmitter or a transmitting circuit, etc., and is used to implement the transmitting function.
  • the communication device may also include one or more interface circuits.
  • Interface circuitry is used to receive code instructions and transmit them to the processor.
  • the processor executes the code instructions to cause the communication device to perform the method described in the above method embodiment.
  • the communication device is a terminal device (such as the terminal device in the foregoing method embodiment): the processor is configured to execute the method shown in any one of Figures 1-4.
  • the communication device is a network device: a transceiver is used to perform the method shown in any one of Figures 5-7.
  • a transceiver for implementing receiving and transmitting functions may be included in the processor.
  • the transceiver can be a transceiver circuit, an interface, or an interface circuit.
  • the transceiver circuits, interfaces or interface circuits used to implement the receiving and transmitting functions can be separate or integrated together.
  • the above-mentioned transceiver circuit, interface or interface circuit can be used for reading and writing codes/data, or the above-mentioned transceiver circuit, interface or interface circuit can be used for signal transmission or transfer.
  • the processor may store a computer program, and the computer program runs on the processor, which can cause the communication device to perform the method described in the above method embodiment.
  • the computer program may be embedded in the processor, in which case the processor may be implemented in hardware.
  • the communication device may include a circuit, and the circuit may implement the functions of sending or receiving or communicating in the foregoing method embodiments.
• The processors and transceivers described in this disclosure can be implemented in integrated circuits (ICs), analog ICs, radio frequency integrated circuits (RFICs), mixed signal ICs, application specific integrated circuits (ASICs), printed circuit boards (PCBs), electronic equipment, etc.
• The processor and transceiver can also be manufactured using various IC process technologies, such as complementary metal oxide semiconductor (CMOS), n-type metal oxide semiconductor (NMOS), p-type metal oxide semiconductor (PMOS), bipolar junction transistor (BJT), bipolar CMOS (BiCMOS), silicon germanium (SiGe), gallium arsenide (GaAs), etc.
• The communication device described in the above embodiments may be a network device or a terminal device (such as the terminal device in the foregoing method embodiment), but the scope of the communication device described in the present disclosure is not limited thereto, and the structure of the communication device is not limited by the above embodiments.
  • the communication device may be a stand-alone device or may be part of a larger device.
• The communication device may be, for example, a collection of one or more ICs, and the IC collection may also include storage components for storing data and computer programs;
  • the communication device may be a chip or a system on a chip
  • the chip includes a processor and an interface.
  • the number of processors may be one or more, and the number of interfaces may be multiple.
  • the chip also includes a memory, which is used to store necessary computer programs and data.
• Embodiments of the present disclosure also provide a communication system.
  • the system includes a communication device as a terminal device in the foregoing embodiment (such as the first terminal device in the foregoing method embodiment) and a communication device as a network device.
  • the present disclosure also provides a readable storage medium on which instructions are stored, and when the instructions are executed by a computer, the functions of any of the above method embodiments are implemented.
  • the present disclosure also provides a computer program product, which, when executed by a computer, implements the functions of any of the above method embodiments.
• In the above embodiments, the methods may be implemented in whole or in part by software, hardware, firmware, or any combination thereof.
• When implemented in software, the methods may be implemented in whole or in part in the form of a computer program product.
  • the computer program product includes one or more computer programs.
• When the computer program is loaded and executed on a computer, the processes or functions described in accordance with the embodiments of the present disclosure are produced in whole or in part.
  • the computer may be a general-purpose computer, a special-purpose computer, a computer network, or other programmable device.
• The computer program may be stored in a computer-readable storage medium or transferred from one computer-readable storage medium to another; for example, the computer program may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired (such as coaxial cable, optical fiber, or digital subscriber line (DSL)) or wireless (such as infrared, radio, or microwave) means.
• The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device, such as a server or a data center, that integrates one or more available media.
• The available media may be magnetic media (e.g., floppy disks, hard disks, magnetic tapes), optical media (e.g., high-density digital video discs (DVDs)), or semiconductor media (e.g., solid state disks (SSDs)), etc.
• "At least one" in the present disclosure can also be described as "one or more", and "a plurality of" can be two, three, four, or more, which is not limited in the present disclosure.
• In the embodiments of the present disclosure, technical features are distinguished by "first", "second", "third", "A", "B", "C", "D", and so on; the technical features described by these terms have no order of precedence and no order of magnitude.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Signal Processing (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Stereophonic System (AREA)

Abstract

The present disclosure provides an audio processing method and apparatus, and a storage medium, which belong to the technical field of communications. The method comprises: determining metadata of each frame of audio data, wherein the metadata comprises at least one among absolute position information of an acoustic object, relative position information of the acoustic object, orientation information of the acoustic object, and a sound emission range of the acoustic object in the audio data; and acquiring an object audio signal on the basis of the metadata of the audio data. By means of using the method provided in the present disclosure, the data volume and transmission bandwidth of the metadata are reduced, the coding efficiency is increased, and it is ensured that a renderer can subsequently correctly render the orientation of an acoustic object and provide a correct spatial audio perception result without affecting a final decoding rendering effect. A hearing difference generated by different actual orientations of the acoustic object can be simulated, so that the rendering effect is improved.

Description

Audio processing method/device/equipment and storage medium

Technical Field
The present disclosure relates to the field of communication technology, and in particular, to an audio processing method/device/equipment and a storage medium.
Background
When the encoding device collects audio data to produce an object audio (Object Audio) signal, it includes the relative position information between the sound object and the listener's listening position in the metadata of the object audio signal. When the decoding device renders the object audio signal, it can render spatial audio based on the relative position information, so that the listener can hear sound coming from a specific direction, giving the user a better three-dimensional and spatially immersive experience.
However, in the related art, when recording audio data or producing an object audio signal, if the absolute positions of multiple sound objects are fixed but the listening position keeps moving, the relative position information between the sound objects and the listening position will be inconsistent from frame to frame, so the metadata of every frame of audio data must include the relative position information between the sound object and the listening position. This increases the data volume of the metadata, occupies transmission bandwidth, and lowers the coding efficiency of the object audio signal; for some application scenarios, including relative position information in the metadata also makes the subsequent rendering process more complicated and affects the rendering efficiency. Moreover, when decoding and rendering the object audio signal in the related art, the hearing differences produced by different actual orientations of the sound object cannot be simulated, which results in a poor rendering effect.
Summary
The audio processing method/device/equipment and storage medium proposed by the present disclosure are used to solve the technical problems of low coding efficiency and poor rendering effect of object audio signals in the related art.
The audio processing method proposed in an embodiment of one aspect of the present disclosure is applied to an encoding device and includes:
Determining the metadata of each frame of audio data, where the metadata includes at least one of the absolute position information of the sound object in the audio data, the relative position information of the sound object, the orientation information of the sound object, and the sound radiation range of the sound object;
Obtaining an object audio signal based on the metadata of the audio data.
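A minimal sketch of these two encoder-side steps, assuming hypothetical `analyze_frame` and `pack_signal` helpers (placeholders for the per-frame metadata determination and the header/packet packing described later; neither name comes from the disclosure):

```python
def encode_object_audio(frames_of_audio, analyze_frame, pack_signal):
    # Step 1: determine the metadata of each frame of audio data.
    metadata_per_frame = [analyze_frame(frame) for frame in frames_of_audio]
    # Step 2: obtain the object audio signal based on the metadata.
    return pack_signal(list(zip(metadata_per_frame, frames_of_audio)))
```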
Optionally, in one embodiment of the present disclosure, determining the metadata of each frame of audio data includes:
Determining whether the metadata needs to include absolute position information or relative position information;
Where, in response to determining that the metadata needs to include absolute position information, the absolute position information is included in the metadata;
And in response to determining that the metadata needs to include relative position information, the relative position information is included in the metadata, the relative position information being used to indicate the relative position between the sound object and the listening position of the listener.
Optionally, in one embodiment of the present disclosure, determining the metadata of each frame of audio data includes:
Determining whether the sound object has an orientation;
In response to the sound object having an orientation, including the orientation information of the sound object in the metadata, and including a tag in the metadata, the tag being used to indicate that the metadata includes orientation information;
In response to the sound object having no orientation, including no orientation information in the metadata.
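A minimal sketch of this conditional signalling, assuming a dict-based metadata container (the key names `has_orientation` and `orientation` are hypothetical, chosen only for the example):

```python
def build_orientation_metadata(sound_object):
    # Hypothetical encoder-side step: only write orientation fields when the
    # object actually has an orientation, plus a tag/flag telling the decoder
    # whether those fields are present.
    metadata = {}
    orientation = getattr(sound_object, "orientation", None)
    if orientation is not None:
        metadata["has_orientation"] = True   # the tag described above
        metadata["orientation"] = orientation
    else:
        metadata["has_orientation"] = False  # no orientation fields follow
    return metadata
```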
Optionally, in one embodiment of the present disclosure, the orientation information includes absolute orientation information and/or relative orientation information;
The relative orientation information is used to indicate the relative orientation between the sound object and the listening position.
Optionally, in one embodiment of the present disclosure, the metadata further includes at least one of the following:
The sound source size of the sound object;
The width of the sound object;
The height of the sound object;
The spatial state of the sound object, where the spatial state includes moving or stationary;
The type of the sound object.
Optionally, in one embodiment of the present disclosure, the method further includes:
Determining the environmental spatial information of the sound object;
Determining the basic information of the sound object;
Sampling the audio data of the sound object in units of frames.
Optionally, in one embodiment of the present disclosure, in response to the sound object being located in a room, the environmental spatial information includes at least one of the following:
Room size;
Room wall type;
Wall reflection coefficient;
Room type;
Reverberation time.
Optionally, in one embodiment of the present disclosure, the basic information of the sound object includes at least one of the following:
The number of sound objects;
The sampling rate of the sound source of the sound object;
The bit width of the sound source of the sound object;
The frame length of each frame of audio data.
Optionally, in one embodiment of the present disclosure, obtaining the object audio signal based on the metadata of the audio data includes:
Storing the environmental spatial information of the sound object and the basic information of the sound object as a header file;
Storing the metadata of each frame of audio data and that frame of audio data as one object audio data packet;
Splicing the header file and the object audio data packets to obtain at least one object audio signal.
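A minimal sketch of this packing, assuming a JSON-encoded header and length-prefixed packets; both serialization choices are assumptions made for the example, since the disclosure does not prescribe a byte layout:

```python
import json
import struct

def pack_object_audio_signal(env_info, basic_info, frames):
    """frames: iterable of (metadata: dict, audio_data: bytes) pairs."""
    # Header file: environmental spatial information + basic information.
    header = json.dumps({"environment": env_info, "basic": basic_info}).encode("utf-8")
    packets = bytearray()
    for metadata, audio_data in frames:
        meta_bytes = json.dumps(metadata).encode("utf-8")
        # One object audio data packet per frame:
        # metadata length, metadata, audio length, audio.
        packets += struct.pack("<I", len(meta_bytes)) + meta_bytes
        packets += struct.pack("<I", len(audio_data)) + audio_data
    # Splice the header file and the object audio data packets into one stream.
    return struct.pack("<I", len(header)) + header + bytes(packets)

signal = pack_object_audio_signal(
    {"room_size": [5.0, 4.0, 3.0]},
    {"num_objects": 1, "sample_rate": 48000, "bit_width": 16, "frame_length": 1024},
    [({"has_absolute_position": True, "position": [1.0, 2.0, 0.0]}, b"\x00" * 2048)],
)
```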
The audio processing method proposed in an embodiment of another aspect of the present disclosure is applied to a decoding device and includes:
Obtaining the encoded signal sent by the encoding device;
Decoding the encoded signal to obtain the object audio signal;
Determining the metadata of the object audio signal, where the metadata includes at least one of the absolute position information of the sound object, the relative position information of the sound object, the orientation information of the sound object, and the sound radiation range of the sound object;
Rendering the object audio signal based on the metadata.
Optionally, in one embodiment of the present disclosure, the orientation information includes absolute orientation information and/or relative orientation information;
The relative orientation information is used to indicate the relative orientation between the sound object and the listening position.
Optionally, in one embodiment of the present disclosure, the metadata further includes at least one of the following:
The sound source size of the sound object;
The width of the sound object;
The height of the sound object;
The spatial state of the sound object, where the spatial state includes moving or stationary;
The type of the sound object.
Optionally, in one embodiment of the present disclosure, the object audio signal includes a header file and object audio data packets;
The header file includes the environmental spatial information of the sound object and the basic information of the sound object;
The object audio data packet includes the metadata of the audio data and the audio data.
Optionally, in one embodiment of the present disclosure, in response to the sound object being located in a room, the environmental spatial information includes at least one of the following:
Room size;
Room wall type;
Wall reflection coefficient;
Room type;
Reverberation time.
Optionally, in one embodiment of the present disclosure, the basic information of the sound object includes at least one of the following:
The number of sound objects;
The sampling rate of the sound source of the sound object;
The bit width of the sound source of the sound object;
The frame length of each frame of audio data.
Optionally, in one embodiment of the present disclosure, rendering the object audio signal based on the metadata includes:
Rendering the audio data based on the metadata and the header file.
Optionally, in one embodiment of the present disclosure, the method further includes:
Encoding the object audio signal;
Sending the encoded signal to the decoding device.
An audio processing device proposed in an embodiment of yet another aspect of the present disclosure includes:
A determining module, configured to determine the metadata of each frame of audio data, where the metadata includes at least one of the absolute position information of the sound object in the audio data, the relative position information of the sound object, the orientation information of the sound object, and the sound radiation range of the sound object;
A processing module, configured to obtain an object audio signal based on the metadata of the audio data.
An audio processing device proposed in an embodiment of yet another aspect of the present disclosure includes:
An acquisition module, configured to acquire the encoded signal sent by the encoding device;
A decoding module, configured to decode the encoded signal to obtain the object audio signal;
A determining module, configured to determine the metadata of the object audio signal, where the metadata includes at least one of the absolute position information of the sound object, the relative position information of the sound object, the orientation information of the sound object, and the sound radiation range of the sound object;
A rendering module, configured to render the object audio signal based on the metadata.
A communication apparatus proposed in an embodiment of yet another aspect of the present disclosure includes a processor and a memory, where a computer program is stored in the memory, and the processor executes the computer program stored in the memory, so that the apparatus performs the method proposed in the embodiments of the above aspects.
A communication apparatus proposed in an embodiment of yet another aspect of the present disclosure includes: a processor and an interface circuit;
The interface circuit is configured to receive code instructions and transmit them to the processor;
The processor is configured to run the code instructions to perform the method proposed in the embodiments of another aspect.
A computer-readable storage medium proposed in an embodiment of yet another aspect of the present disclosure is used to store instructions, and when the instructions are executed, the method proposed in the embodiments of another aspect is implemented.
To sum up, in the audio processing method/device/equipment and storage medium provided by the embodiments of the present disclosure, the encoding device determines the metadata of each frame of audio data, where the metadata includes at least one of the absolute position information of the sound object in the audio data, the relative position information of the sound object, the orientation information of the sound object, and the sound radiation range of the sound object; the encoding device then obtains the object audio signal based on the metadata of the audio data. It can thus be seen that, in the embodiments of the present disclosure, the metadata may include the absolute position information of the sound objects. On this basis, when recording audio data or producing an object audio signal, if the absolute positions of multiple sound objects are fixed but the listening position keeps moving, the metadata can carry the absolute position information of the sound objects. In this case, since the absolute positions of the sound objects are fixed, only the metadata of a certain frame (such as the first frame) of audio data needs to include the absolute position information between the sound object and the listening position, and the audio data of the other frames can reuse that absolute position information, so that the metadata of every frame does not need to include it. This reduces the data volume of the metadata and the transmission bandwidth, improves the coding efficiency, and ensures that the renderer can subsequently render the position of the sound object correctly and provide a correct spatial audio perception result, without affecting the final decoding and rendering effect. In addition, the metadata in the embodiments of the present disclosure may also include the orientation information of the sound object and the sound radiation range of the sound object; when the object audio signal is subsequently rendered, the rendering can be based on the orientation information and the sound radiation range, thereby simulating the hearing differences produced by different actual orientations of the sound object and improving the rendering effect.
Brief Description of the Drawings
The above and/or additional aspects and advantages of the present disclosure will become apparent and readily understood from the following description of the embodiments in conjunction with the accompanying drawings, in which:
Figure 1 is a schematic flowchart of an audio processing method provided by an embodiment of the present disclosure;
Figure 2 is a schematic flowchart of an audio processing method provided by another embodiment of the present disclosure;
Figures 3a-3b are schematic flowcharts of an audio processing method provided by yet another embodiment of the present disclosure;
Figure 4 is a schematic flowchart of an audio processing method provided by yet another embodiment of the present disclosure;
Figure 5 is a schematic flowchart of an audio processing method provided by yet another embodiment of the present disclosure;
Figures 6a-6b are schematic flowcharts of an audio processing method provided by yet another embodiment of the present disclosure;
Figure 7 is a schematic flowchart of an audio processing method provided by an embodiment of the present disclosure;
Figure 8 is a schematic flowchart of an audio processing method provided by an embodiment of the present disclosure;
Figures 9a-9e are schematic flowcharts of an audio processing method provided by an embodiment of the present disclosure;
Figure 9f is a schematic structural diagram of an audio processing device provided by an embodiment of the present disclosure;
Figure 9g is a schematic structural diagram of an audio processing device provided by an embodiment of the present disclosure;
Figure 10 is a block diagram of a user equipment provided by an embodiment of the present disclosure;
Figure 11 is a block diagram of a network side device provided by an embodiment of the present disclosure.
Detailed Description
这里将详细地对示例性实施例进行说明,其示例表示在附图中。下面的描述涉及附图时,除非另有表示,不同附图中的相同数字表示相同或相似的要素。以下示例性实施例中所描述的实施方式并不代表与本公开实施例相一致的所有实施方式。相反,它们仅是与如所附权利要求书中所详述的、本公开实施例的一些方面相一致的装置和方法的例子。Exemplary embodiments will be described in detail herein, examples of which are illustrated in the accompanying drawings. When the following description refers to the drawings, the same numbers in different drawings refer to the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with embodiments of the present disclosure. Rather, they are merely examples of apparatus and methods consistent with aspects of embodiments of the present disclosure as detailed in the appended claims.
在本公开实施例使用的术语是仅仅出于描述特定实施例的目的,而非旨在限制本公开实施例。在本公开实施例和所附权利要求书中所使用的单数形式的“一种”和“该”也旨在包括多数形式,除非上下文清楚地表示其他含义。还应当理解,本文中使用的术语“和/或”是指并包含一个或多个相关联的列出项目的任何或所有可能组合。The terminology used in the embodiments of the present disclosure is for the purpose of describing specific embodiments only and is not intended to limit the embodiments of the present disclosure. As used in the embodiments of the present disclosure and the appended claims, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly dictates otherwise. It will also be understood that the term "and/or" as used herein refers to and includes any and all possible combinations of one or more of the associated listed items.
It should be understood that although the terms first, second, third, etc. may be used in the embodiments of the present disclosure to describe various information, the information should not be limited to these terms. These terms are only used to distinguish information of the same type from each other. For example, without departing from the scope of the embodiments of the present disclosure, first information may also be called second information, and similarly, second information may also be called first information. Depending on the context, the word "if" as used herein may be interpreted as "when", "upon", or "in response to determining".
The audio processing method/device/equipment and storage medium provided by the embodiments of the present disclosure are described in detail below with reference to the accompanying drawings.
Figure 1 is a schematic flowchart of an audio processing method provided by an embodiment of the present disclosure. The method is executed by an encoding device. As shown in Figure 1, the audio processing method may include the following steps:
Step 101: Determine the metadata of each frame of audio data.
In one embodiment of the present disclosure, the metadata may include at least one of: the absolute position information of the sound objects in each frame of audio data, the relative position information of the sound objects, the orientation information of the sound objects, and the sound radiation range of the sound objects.
It should be noted that, in one embodiment of the present disclosure, the above relative position information may be used to indicate the relative position between the sound object and the listening position of the listener. In one embodiment of the present disclosure, the absolute position information and the relative position information may specifically be the mapping of the sound object's absolute or relative position onto a coordinate system. The absolute position may be, for example, the longitude and latitude of the sound object; the relative position may be, for example, the distance, azimuth angle, and pitch angle between the sound object and the listener. The listening position of the listener may be any position, and may also be the position of any sound object.
Specifically, in one embodiment of the present disclosure, the method for determining the absolute position information of the sound objects may include: first obtaining the absolute position of each sound object, then establishing an absolute coordinate system whose origin may be any position and is fixed, and then mapping the absolute position of each sound object into the absolute coordinate system to obtain the absolute position information of the sound objects. For example, in one embodiment of the present disclosure, the absolute coordinate system may be a rectangular coordinate system, and the absolute position information of a sound object may be (x, y, z), where x, y, and z respectively represent the position coordinates of the sound object on the x-axis (e.g., the front-back axis), the y-axis (e.g., the left-right axis), and the z-axis (e.g., the up-down axis) of the rectangular coordinate system. In another embodiment of the present disclosure, the absolute coordinate system may be a spherical coordinate system, and the absolute position information of a sound object may be (θ, γ, r), where θ, γ, and r respectively represent the horizontal angle of the sound object in the spherical coordinate system (i.e., the angle between the x-axis and the projection, onto the horizontal plane, of the line connecting the sound object and the origin), the vertical angle (i.e., the angle between the horizontal plane and the line connecting the sound object and the origin), and the straight-line distance from the sound object to the origin.
In another embodiment of the present disclosure, the method for determining the relative position information of the sound objects may include: first obtaining the position of each sound object relative to the listening position of the listener, then establishing a relative coordinate system whose origin is always the listening position, so that when the listening position changes, the origin of the relative coordinate system changes accordingly. The position of each sound object relative to the listening position is then mapped into the relative coordinate system to obtain the relative position information of the sound objects. For example, in one embodiment of the present disclosure, the relative coordinate system may be a rectangular coordinate system, and the relative position information of a sound object may be (x, y, z), where x, y, and z respectively represent the position coordinates of the sound object on the x-axis (e.g., the front-back axis), the y-axis (e.g., the left-right axis), and the z-axis (e.g., the up-down axis) of the rectangular coordinate system. In another embodiment of the present disclosure, the relative coordinate system may be a spherical coordinate system, and the relative position information of a sound object may be (θ, γ, r), where θ, γ, and r respectively represent the horizontal angle of the sound object in the spherical coordinate system (i.e., the angle between the x-axis and the projection, onto the horizontal plane, of the line connecting the sound object and the origin), the vertical angle (i.e., the angle between the horizontal plane and the line connecting the sound object and the origin), and the straight-line distance from the sound object to the origin.
The above (x, y, z) and (θ, γ, r) can be converted into each other using the following formulas.
$$x = r\cos\gamma\cos\theta,\qquad y = r\cos\gamma\sin\theta,\qquad z = r\sin\gamma$$
$$r = \sqrt{x^{2}+y^{2}+z^{2}},\qquad \theta = \arctan\left(\frac{y}{x}\right),\qquad \gamma = \arcsin\left(\frac{z}{r}\right)$$
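For illustration only (not part of the original disclosure), the conversion can be sketched in code; the function names are hypothetical, angles are in radians, and the angle conventions follow the definitions above (θ measured in the horizontal plane from the x-axis, γ measured from the horizontal plane):

```python
import math

def spherical_to_cartesian(theta, gamma, r):
    """(θ, γ, r) -> (x, y, z)."""
    x = r * math.cos(gamma) * math.cos(theta)
    y = r * math.cos(gamma) * math.sin(theta)
    z = r * math.sin(gamma)
    return x, y, z

def cartesian_to_spherical(x, y, z):
    """(x, y, z) -> (θ, γ, r); atan2 keeps the correct quadrant."""
    r = math.sqrt(x * x + y * y + z * z)
    theta = math.atan2(y, x)
    gamma = math.asin(z / r) if r > 0 else 0.0
    return theta, gamma, r
```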
In one embodiment of the present disclosure, the above method of "obtaining the absolute or relative position of each sound object" may include: using a sensor or a combination of sensors to obtain the absolute or relative position of the sound object, for example, displacement sensors, position sensors, attitude sensors (such as gyroscopes and ultrasonic rangefinders), positioning sensors, geomagnetic sensors, direction sensors, and accelerometers. The distance between the sound object and the listener in the relative position can also be obtained through inertial navigation and initial alignment techniques. In another embodiment of the present disclosure, the absolute or relative position of each sound object may also be obtained based on user input. In yet another embodiment of the present disclosure, the absolute or relative position of each sound object may also be generated by a program.
Furthermore, in one embodiment of the present disclosure, the above orientation information of the sound object may specifically be absolute orientation information of the sound object (such as facing due south or due north). In another embodiment of the present disclosure, the orientation information of the sound object may specifically be relative orientation information, which can be used to indicate the relative orientation between the sound object and the listening position; for example, the relative orientation information may be: the sound object is located 30° west of due south of the listening position. The orientation information of the sound object may be obtained using any of the above sensors, obtained based on user input, or generated by a program.
In one embodiment of the present disclosure, the above sound radiation range of the sound object may be a parameter used to describe the radiation characteristics of the sound object. In one embodiment of the present disclosure, the sound radiation range may be used to indicate the sound radiation angle of the sound object; for example, the sound radiation range may be: the sound object radiates sound over 90° directly ahead, or the sound object radiates sound over 360°. In another embodiment of the present disclosure, the sound radiation range may be the sound radiation shape of the sound object; for example, the sound radiation range may be: the sound object radiates sound in a cardioid pattern, or the sound object radiates sound in a figure-eight pattern. The sound radiation range of the sound object may be obtained using any of the above sensors, obtained based on user input, or generated by a program.
In addition, in one embodiment of the present disclosure, the metadata of each frame of audio data may further include at least one of the following:
the sound source size of the sound object;
the width of the sound object;
the height of the sound object;
the spatial state of the sound object, where the spatial state includes moving or stationary;
the type of the sound object (such as speech, music, etc.).
The sound source size, width, height, and spatial state of the sound object can likewise be obtained through any of the above sensors, obtained based on user input, or generated by a program.
It should also be noted that, in one embodiment of the present disclosure, each item of content in the metadata has a corresponding flag bit stored with it, which is used to indicate whether the parameter of that item has changed relative to the parameter of the same item in the metadata of the previous frame of audio data. For example, the azimuth angle in the metadata has a corresponding azimuth flag bit: if the azimuth angle in the metadata of the current frame of audio data has not changed relative to the azimuth angle in the metadata of the previous frame of audio data, the azimuth flag bit can be set to a first value (such as 1); otherwise, it can be set to a second value (such as 0). Furthermore, in one embodiment of the present disclosure, if part of the content in the metadata of the current frame of audio data has not changed relative to the metadata of the previous frame of audio data, the metadata of the current frame may omit that unchanged content and directly reuse the content in the metadata of the previous frame. This reduces the data volume and transmission bandwidth of the metadata to a certain extent, reduces the data to be compressed, and improves coding efficiency without affecting the final decoding and rendering effect.
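As an illustrative sketch only (field names and layout are assumptions, not the disclosure's actual syntax), the per-content flag bits and the reuse of the previous frame's metadata described above could look like this, with flag 1 meaning "unchanged, value omitted and reused" and 0 meaning "changed, value transmitted":

```python
# Hypothetical field set; the real metadata contents are defined by the disclosure.
FIELDS = ["azimuth", "pitch", "distance", "orientation", "radiation_range"]

def encode_frame_metadata(current, previous):
    """Build a frame's metadata packet, omitting fields unchanged since the previous frame."""
    packet = {"flags": {}, "values": {}}
    for field in FIELDS:
        unchanged = previous is not None and current.get(field) == previous.get(field)
        packet["flags"][field] = 1 if unchanged else 0  # 1: unchanged, 0: changed
        if not unchanged:
            packet["values"][field] = current.get(field)
    return packet

def decode_frame_metadata(packet, previous):
    """Restore full metadata, reusing the previous frame's values for unchanged fields."""
    restored = {}
    for field in FIELDS:
        if packet["flags"][field] == 1:
            restored[field] = previous[field]      # reuse: value was not transmitted
        else:
            restored[field] = packet["values"][field]
    return restored
```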
Step 102: Obtain the object audio signal based on the metadata of the audio data.
The specific method of "obtaining the object audio signal based on the metadata of the audio data" will be introduced in detail in subsequent embodiments.
To sum up, in the audio processing method provided by this embodiment of the present disclosure, the encoding device determines the metadata of each frame of audio data, where the metadata includes at least one of the absolute position information of the sound objects in the audio data, the relative position information of the sound objects, the orientation information of the sound objects, and the sound radiation range of the sound objects; the encoding device then obtains the object audio signal based on the metadata of the audio data. It follows that, in the embodiments of the present disclosure, the metadata may include the absolute position information of the sound objects. On this basis, when audio data is recorded or an object audio signal is produced, if the absolute positions of multiple sound objects are fixed while the listening position keeps moving, the metadata can be made to include the absolute position information of the sound objects. In this case, because the absolute positions of the sound objects are fixed, only the metadata of the audio data of a certain frame (such as the first frame) needs to include the absolute position information between the sound objects and the listening position, and the audio data of the other frames can reuse that information, instead of the metadata of every frame carrying it. This reduces the data volume and transmission bandwidth of the metadata, improves coding efficiency, ensures that the renderer can subsequently render the directions of the sound objects correctly, and provides correct spatial audio perception results without affecting the final decoding and rendering effect. In addition, the metadata in the embodiments of the present disclosure further includes the orientation information and the sound radiation range of the sound objects, so that when the object audio signal is subsequently rendered, rendering can be performed based on the orientation information and the sound radiation range to simulate the differences in perceived sound caused by different actual orientations of the sound object, thereby improving the rendering effect.
Figure 2 is a schematic flowchart of an audio processing method provided by an embodiment of the present disclosure. The method is executed by an encoding device. As shown in Figure 2, the audio processing method may include the following steps:
Step 201: Determine the environmental space information of the sound objects.
In one embodiment of the present disclosure, when the sound object is located in a room, the environmental space information includes at least one of the following:
room size;
room wall type;
wall reflection coefficient;
room type (such as large room, small room, conference room, auditorium, hall, etc.);
reverberation time.
The environmental space information can be obtained using any of the above sensors, obtained based on user input, or generated by a program.
It should be noted that when an absolute or relative coordinate system is subsequently established, it can be established based on this environmental space information.
Step 202: Determine the basic information of the sound objects.
In one embodiment of the present disclosure, the basic information of the sound objects may include at least one of the following:
the number of sound objects;
the sampling rate of the sound sources of the sound objects;
the bit width of the sound sources of the sound objects;
the frame length of each frame of audio data.
The basic information of the sound objects can be obtained using any of the above sensors, obtained based on user input, or generated by a program.
Step 203: Sample the audio data of the sound objects in units of frames.
In one embodiment of the present disclosure, a sound collection device (such as a microphone) can be used to sample the audio data of the sound objects in units of frames, and all sampling points included in the current frame can be saved as PCM (Pulse Code Modulation) data.
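As a minimal sketch (the frame length is a hypothetical value; in practice it comes from the basic information described above), sampling "in units of frames" amounts to slicing the captured PCM sample stream into fixed-length frames:

```python
FRAME_LEN = 960  # hypothetical frame length in samples, e.g. 20 ms at 48 kHz

def split_into_frames(pcm_samples):
    """Slice a mono PCM sample list into fixed-length frames, zero-padding the tail."""
    frames = []
    for start in range(0, len(pcm_samples), FRAME_LEN):
        frame = pcm_samples[start:start + FRAME_LEN]
        if len(frame) < FRAME_LEN:
            frame = frame + [0] * (FRAME_LEN - len(frame))
        frames.append(frame)
    return frames
```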
Table 1 and Table 2 are schematic tables of the storage syntax of the audio data provided by embodiments of the present disclosure.
Table 1—Syntax of object audio data (low-latency mode)
[Table content is provided as an image in the original publication.]
Table 2—Syntax of object raw PCM samples
[Table content is provided as an image in the original publication.]
Step 204: Determine the metadata of each frame of audio data.
For a detailed introduction to step 204, refer to the description of the above embodiments; details are not repeated here.
Step 205: Obtain the object audio signal based on the metadata of the audio data.
In one embodiment of the present disclosure, the method of obtaining the object audio signal based on the metadata of the audio data may include the following steps:
Step 1: Store the environmental space information of the sound objects and the basic information of the sound objects as a header file.
Table 3 is a schematic table of the storage syntax of the header file provided by an embodiment of the present disclosure.
Table 3—Syntax of the object audio file header
[Table content is provided as an image in the original publication.]
Step 2: Store the metadata of each frame of audio data and each frame of audio data as an object audio data packet.
Table 4 is a schematic table of the storage syntax of the object audio data packet provided by an embodiment of the present disclosure.
Table 4—Syntax of the object audio data packet
[Table content is provided as an image in the original publication.]
Step 3: Splice the header file and the object audio data packets to obtain at least one object audio signal.
In one embodiment of the present disclosure, after the object audio signal is obtained, the encoding device may save or transmit the object audio signal as needed, or may encode the object audio signal into another format before saving or transmitting it.
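A hypothetical sketch of steps 1 to 3 (JSON with length prefixes stands in for the table-defined syntax, which this sketch does not reproduce): the header file is written once, followed by one packet per frame carrying that frame's metadata and audio data:

```python
import json
import struct

def build_object_audio_signal(header, frames):
    """Splice a header file and per-frame (metadata, audio bytes) packets into one signal."""
    blob = bytearray()
    head = json.dumps(header).encode("utf-8")
    blob += struct.pack("<I", len(head)) + head    # step 1: header file
    for metadata, audio in frames:                 # step 2: one packet per frame
        meta = json.dumps(metadata).encode("utf-8")
        blob += struct.pack("<I", len(meta)) + meta
        blob += struct.pack("<I", len(audio)) + audio
    return bytes(blob)                             # step 3: spliced object audio signal
```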
To sum up, in the audio processing method provided by this embodiment of the present disclosure, the encoding device determines the metadata of each frame of audio data, where the metadata includes at least one of the absolute position information of the sound objects in the audio data, the relative position information of the sound objects, the orientation information of the sound objects, and the sound radiation range of the sound objects; the encoding device then obtains the object audio signal based on the metadata of the audio data. It follows that, in the embodiments of the present disclosure, the metadata may include the absolute position information of the sound objects. On this basis, when audio data is recorded or an object audio signal is produced, if the absolute positions of multiple sound objects are fixed while the listening position keeps moving, the metadata can be made to include the absolute position information of the sound objects. In this case, because the absolute positions of the sound objects are fixed, only the metadata of the audio data of a certain frame (such as the first frame) needs to include the absolute position information between the sound objects and the listening position, and the audio data of the other frames can reuse that information, instead of the metadata of every frame carrying it. This reduces the data volume and transmission bandwidth of the metadata, improves coding efficiency, ensures that the renderer can subsequently render the directions of the sound objects correctly, and provides correct spatial audio perception results without affecting the final decoding and rendering effect. In addition, the metadata in the embodiments of the present disclosure further includes the orientation information and the sound radiation range of the sound objects, so that when the object audio signal is subsequently rendered, rendering can be performed based on the orientation information and the sound radiation range to simulate the differences in perceived sound caused by different actual orientations of the sound object, thereby improving the rendering effect.
Figure 3a is a schematic flowchart of an audio processing method provided by an embodiment of the present disclosure. The method is executed by an encoding device. As shown in Figure 3a, the audio processing method may include the following steps:
Step 301a: Determine whether the metadata needs to contain absolute position information or relative position information.
In one embodiment of the present disclosure, whether the metadata needs to contain absolute position information or relative position information is determined mainly based on the application scenario, the characteristics of the sound objects, the degree of simplification of the subsequent rendering process, and the like.
Specifically, when a first preset condition is met, it is determined that the metadata needs to contain absolute position information; when a second preset condition is met, it is determined that the metadata needs to contain relative position information.
The first preset condition may include at least one of the following:
the absolute position of the sound object remains unchanged;
the data volume when the metadata includes absolute position information is less than or equal to the data volume when the metadata includes relative position information;
the rendering process required when the metadata includes absolute position information is simpler than the rendering process required when the metadata includes relative position information.
The second preset condition may include at least one of the following:
the relative position of the sound object remains unchanged;
the data volume when the metadata includes absolute position information is greater than or equal to the data volume when the metadata includes relative position information;
the rendering process required when the metadata includes relative position information is simpler than the rendering process required when the metadata includes absolute position information.
That is, whether the metadata needs to contain absolute position information or relative position information can be determined by judging whether the absolute position or the relative position of the sound object remains unchanged in the metadata of consecutive frames of audio data. If the absolute position of the sound object remains unchanged in the metadata of consecutive frames of audio data, it is determined that the metadata contains absolute position information. In this case, because the absolute position is unchanged, only the metadata of the audio data of the first of the consecutive frames needs to contain the absolute position information of the sound object, and the audio data of the other consecutive frames can reuse the absolute position information contained in the metadata of the first frame. Likewise, if the relative position of the sound object remains unchanged in the metadata of consecutive frames of audio data, it is determined that the metadata contains relative position information; in this case, only the metadata of the audio data of the first of the consecutive frames needs to contain the relative position information of the sound object, and the audio data of the other consecutive frames can reuse the relative position information contained in the metadata of the first frame. This reduces the data volume and transmission bandwidth of the metadata to a certain extent, reduces the data to be compressed, and improves coding efficiency without affecting the final decoding and rendering effect.
In addition, in one embodiment of the present disclosure, the degree of simplification of the subsequent rendering process is also considered: from the absolute position information and the relative position information, the one that makes the subsequent rendering process simpler is selected, thereby improving the efficiency of the subsequent rendering process. For example, in a six-degrees-of-freedom scenario, the listener can rotate and translate in three dimensions; in this case, using absolute position information is more conducive to processing such scenarios and simplifies the rendering process.
It follows that, in the embodiments of the present disclosure, whether the metadata needs to include absolute position information or relative position information is considered comprehensively from multiple dimensions (such as a lower data volume and a simpler rendering process), which not only reduces the data volume of the metadata but also simplifies the subsequent rendering process and improves rendering efficiency.
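The selection between the two preset conditions might be sketched as below; how each condition is measured, and the priority when both sets of conditions hold, are assumptions left open by the description above:

```python
def choose_position_mode(abs_pos_unchanged, rel_pos_unchanged,
                         abs_bytes, rel_bytes, abs_render_simpler):
    """Return 'absolute' or 'relative' per the first/second preset conditions."""
    # First preset condition: any of the listed clauses selects absolute mode.
    if abs_pos_unchanged or abs_bytes <= rel_bytes or abs_render_simpler:
        return "absolute"
    # Second preset condition: otherwise fall back to relative mode.
    return "relative"
```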
It should be noted that the above determination logic for "determining whether the metadata needs to contain absolute position information or relative position information" is only an example of the present disclosure; other content related or similar to the above determination logic also falls within the protection scope of the present disclosure.
Step 302a: In response to determining that the metadata needs to contain absolute position information, make the metadata contain the absolute position information.
Table 5 is a schematic table of the storage syntax of metadata containing absolute position information provided by an embodiment of the present disclosure.
Table 5—Syntax of object metadata samples (absolute coordinate mode)
[Table content is provided as an image in the original publication.]
Step 303a: Obtain the object audio signal based on the metadata of the audio data.
For a detailed introduction to steps 302a-303a, refer to the description of the above embodiments; details are not repeated here.
To sum up, in the audio processing method provided by this embodiment of the present disclosure, the encoding device determines the metadata of each frame of audio data, where the metadata includes at least one of the absolute position information of the sound objects in the audio data, the relative position information of the sound objects, the orientation information of the sound objects, and the sound radiation range of the sound objects; the encoding device then obtains the object audio signal based on the metadata of the audio data. It follows that, in the embodiments of the present disclosure, the metadata may include the absolute position information of the sound objects. On this basis, when audio data is recorded or an object audio signal is produced, if the absolute positions of multiple sound objects are fixed while the listening position keeps moving, the metadata can be made to include the absolute position information of the sound objects. In this case, because the absolute positions of the sound objects are fixed, only the metadata of the audio data of a certain frame (such as the first frame) needs to include the absolute position information between the sound objects and the listening position, and the audio data of the other frames can reuse that information, instead of the metadata of every frame carrying it. This reduces the data volume and transmission bandwidth of the metadata, improves coding efficiency, ensures that the renderer can subsequently render the directions of the sound objects correctly, and provides correct spatial audio perception results without affecting the final decoding and rendering effect. In addition, the metadata in the embodiments of the present disclosure further includes the orientation information and the sound radiation range of the sound objects, so that when the object audio signal is subsequently rendered, rendering can be performed based on the orientation information and the sound radiation range to simulate the differences in perceived sound caused by different actual orientations of the sound object, thereby improving the rendering effect.
Figure 3b is a schematic flowchart of an audio processing method provided by an embodiment of the present disclosure. The method is executed by an encoding device. As shown in Figure 3b, the audio processing method may include the following steps:
Step 301b: Determine the metadata of each frame of audio data, where the metadata contains absolute position information.
Step 302b: Obtain the object audio signal based on the metadata of the audio data.
For a detailed introduction to steps 301b-302b, refer to the description of the above embodiments; details are not repeated here.
To sum up, in the audio processing method provided by this embodiment of the present disclosure, the encoding device determines the metadata of each frame of audio data, where the metadata includes at least one of the absolute position information of the sound objects in the audio data, the relative position information of the sound objects, the orientation information of the sound objects, and the sound radiation range of the sound objects; the encoding device then obtains the object audio signal based on the metadata of the audio data. It follows that, in the embodiments of the present disclosure, the metadata may include the absolute position information of the sound objects. On this basis, when audio data is recorded or an object audio signal is produced, if the absolute positions of multiple sound objects are fixed while the listening position keeps moving, the metadata can be made to include the absolute position information of the sound objects. In this case, because the absolute positions of the sound objects are fixed, only the metadata of the audio data of a certain frame (such as the first frame) needs to include the absolute position information between the sound objects and the listening position, and the audio data of the other frames can reuse that information, instead of the metadata of every frame carrying it. This reduces the data volume and transmission bandwidth of the metadata, improves coding efficiency, ensures that the renderer can subsequently render the directions of the sound objects correctly, and provides correct spatial audio perception results without affecting the final decoding and rendering effect.
Figure 4 is a schematic flowchart of an audio processing method provided by an embodiment of the present disclosure. The method is executed by an encoding device. As shown in Figure 4, the audio processing method may include the following steps:
Step 401: Determine whether the metadata needs to contain absolute position information or relative position information.
Step 402: In response to determining that the metadata needs to contain relative position information, make the metadata contain the relative position information.
Table 6 is a schematic table of the storage syntax of metadata containing relative position information provided by an embodiment of the present disclosure.
Table 6—Syntax of object metadata samples (relative coordinate mode)
[Table content is provided as an image in the original publication.]
Step 403: Obtain the object audio signal based on the metadata of the audio data.
For a detailed introduction to steps 401-403, refer to the description of the above embodiments; details are not repeated here.
To sum up, in the audio processing method provided by this embodiment of the present disclosure, the encoding device determines the metadata of each frame of audio data, where the metadata includes at least one of the absolute position information of the sound objects in the audio data, the relative position information of the sound objects, the orientation information of the sound objects, and the sound radiation range of the sound objects; the encoding device then obtains the object audio signal based on the metadata of the audio data. It follows that, in the embodiments of the present disclosure, the metadata may include the absolute position information of the sound objects. On this basis, when audio data is recorded or an object audio signal is produced, if the absolute positions of multiple sound objects are fixed while the listening position keeps moving, the metadata can be made to include the absolute position information of the sound objects. In this case, because the absolute positions of the sound objects are fixed, only the metadata of the audio data of a certain frame (such as the first frame) needs to include the absolute position information between the sound objects and the listening position, and the audio data of the other frames can reuse that information, instead of the metadata of every frame carrying it. This reduces the data volume and transmission bandwidth of the metadata, improves coding efficiency, ensures that the renderer can subsequently render the directions of the sound objects correctly, and provides correct spatial audio perception results without affecting the final decoding and rendering effect. In addition, the metadata in the embodiments of the present disclosure further includes the orientation information and the sound radiation range of the sound objects, so that when the object audio signal is subsequently rendered, rendering can be performed based on the orientation information and the sound radiation range to simulate the differences in perceived sound caused by different actual orientations of the sound object, thereby improving the rendering effect.
Moreover, as can be seen from the embodiments of Figure 4 and Figure 5 above, by combining relative position information and absolute position information in encoding, the present disclosure can achieve the most efficient spatial audio metadata scheme both in scenarios where the relative positions remain unchanged and in scenarios where the absolute positions remain unchanged.
Figure 5 is a schematic flowchart of an audio processing method provided by an embodiment of the present disclosure. The method is executed by an encoding device. As shown in Figure 5, the audio processing method may include the following steps:
Step 501: Determine whether the sound object has an orientation.
In one embodiment of the present disclosure, if a sound object emits sound in all directions, the sound object is considered to have no orientation; otherwise, the sound-emitting direction of the sound object is determined as the orientation of the sound object.
Step 502: In response to the sound object having an orientation, include the orientation information of the sound object in the metadata, and include a marker in the metadata, where the marker is used to indicate that the metadata includes the orientation information.
For a detailed introduction to steps 501-502, refer to the description of the above embodiments; details are not repeated here.
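A minimal sketch of steps 501-502 under assumed field names: the marker is written for every object, and the orientation field itself is written only when the object has an orientation:

```python
def add_orientation(metadata, emits_in_all_directions, orientation=None):
    """Include orientation info and a presence marker only for directional sound objects."""
    if emits_in_all_directions:
        metadata["has_orientation"] = 0     # no orientation: field omitted
    else:
        metadata["has_orientation"] = 1     # marker: metadata includes orientation info
        metadata["orientation"] = orientation
    return metadata
```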
To sum up, the metadata in the embodiments of the present disclosure further includes the orientation information of the sound object, so that when the object audio signal is subsequently rendered, rendering can be performed based on the orientation information to simulate the differences in perceived sound caused by different actual orientations of the sound object, thereby improving the rendering effect.
Figure 6a is a schematic flowchart of an audio processing method provided by an embodiment of the present disclosure. The method is executed by an encoding device. As shown in Figure 6a, the audio processing method may include the following steps:
Step 601a: Determine whether the sound object has an orientation.
Step 602a: In response to the sound object having no orientation, do not include orientation information in the metadata.
For a detailed introduction to steps 601a-602a, refer to the description of the above embodiments; details are not repeated here.
To sum up, in the audio processing method provided by this embodiment of the present disclosure, the encoding device determines the metadata of each frame of audio data, where the metadata includes at least one of the absolute position information of the sound objects in the audio data, the relative position information of the sound objects, the orientation information of the sound objects, and the sound radiation range of the sound objects; the encoding device then obtains the object audio signal based on the metadata of the audio data. It follows that, in the embodiments of the present disclosure, the metadata may include the absolute position information of the sound objects. On this basis, when audio data is recorded or an object audio signal is produced, if the absolute positions of multiple sound objects are fixed while the listening position keeps moving, the metadata can be made to include the absolute position information of the sound objects. In this case, because the absolute positions of the sound objects are fixed, only the metadata of the audio data of a certain frame (such as the first frame) needs to include the absolute position information between the sound objects and the listening position, and the audio data of the other frames can reuse that information, instead of the metadata of every frame carrying it. This reduces the data volume and transmission bandwidth of the metadata, improves coding efficiency, ensures that the renderer can subsequently render the directions of the sound objects correctly, and provides correct spatial audio perception results without affecting the final decoding and rendering effect. In addition, the metadata in the embodiments of the present disclosure further includes the orientation information and the sound radiation range of the sound objects, so that when the object audio signal is subsequently rendered, rendering can be performed based on the orientation information and the sound radiation range to simulate the differences in perceived sound caused by different actual orientations of the sound object, thereby improving the rendering effect.
Figure 6b is a schematic flowchart of an audio processing method provided by an embodiment of the present disclosure. The method is executed by an encoding device. As shown in Figure 6b, the audio processing method may include the following steps:
Step 601b: Determine the metadata of each frame of audio data, where the metadata includes at least one of the absolute position information of the sound objects in the audio data, the relative position information of the sound objects, the orientation information of the sound objects, and the sound radiation range of the sound objects.
Step 602b: Obtain the object audio signal based on the metadata of the audio data.
For a detailed introduction to steps 601b-602b, refer to the description of the above embodiments; details are not repeated here.
Step 603b: Encode the object audio signal.
Step 604b: Send the encoded signal to a decoding device.
To sum up, in the audio processing method provided by this embodiment of the present disclosure, the encoding device determines the metadata of each frame of audio data, where the metadata includes at least one of the absolute position information of the sound objects in the audio data, the relative position information of the sound objects, the orientation information of the sound objects, and the sound radiation range of the sound objects; the encoding device then obtains the object audio signal based on the metadata of the audio data. It follows that, in the embodiments of the present disclosure, the metadata may include the absolute position information of the sound objects. On this basis, when audio data is recorded or an object audio signal is produced, if the absolute positions of multiple sound objects are fixed while the listening position keeps moving, the metadata can be made to include the absolute position information of the sound objects. In this case, because the absolute positions of the sound objects are fixed, only the metadata of the audio data of a certain frame (such as the first frame) needs to include the absolute position information between the sound objects and the listening position, and the audio data of the other frames can reuse that information, instead of the metadata of every frame carrying it. This reduces the data volume and transmission bandwidth of the metadata, improves coding efficiency, ensures that the renderer can subsequently render the directions of the sound objects correctly, and provides correct spatial audio perception results without affecting the final decoding and rendering effect. In addition, the metadata in the embodiments of the present disclosure further includes the orientation information and the sound radiation range of the sound objects, so that when the object audio signal is subsequently rendered, rendering can be performed based on the orientation information and the sound radiation range to simulate the differences in perceived sound caused by different actual orientations of the sound object, thereby improving the rendering effect.
The above audio processing method is illustrated below with examples.
Figure 7 is a schematic flowchart of an audio processing method provided by an embodiment of the present disclosure. As shown in Figure 7, a local multi-person conference scenario is taken as an example. The room on the left is the recording end, and the room on the right is the playback end. There are multiple objects in the recording-end room, including object 1, object 2, and object 4, who are speaking, and object 3, who is not speaking. All of these objects are regarded as sound objects in the scene: their corresponding speech data is captured through microphones, and positioning and attitude sensors such as gyroscopes and ultrasonic rangefinders are used to obtain the spatial information (such as relative position information or absolute position information) and orientation information of each object. After the audio data, spatial information, and orientation information of each object are encoded, transmitted, decoded, and rendered, the listener can feel as if they were in the conference scene on the left: they can perceive not only the directions and distances of object 1, object 2, and object 4, but also the orientations of the objects. In addition, this scheme can encode and transmit object 3, which has no audio data, as a sound object as well; it can be regarded as a listener in the recording scene. With this scheme, the playback end can fully reproduce the real listening experience of object 3, including the changes in the listening experience caused by changes in object 3's position and head rotation (orientation changes).
Figure 8 is a schematic flowchart of an audio processing method provided by an embodiment of the present disclosure. As shown in Figure 8, in a multi-person remote conference scenario, the far-end participants are on the left; multiple participants are located in different places and different rooms, and each can be regarded as a sound object for object audio coding. Displacement or position sensors and attitude sensors can be used to obtain each object's spatial position change information and head orientation information, and a microphone is used to obtain the participant's speech signal as the object audio data; the spatial position information, head orientation information, and object audio data are then used for object audio encoding. For the near-end user (the right side of Figure 8), after the multiple encoded remote object audio streams are obtained, decoded, and rendered in combination with the near-end user's local spatial information, the user can perceive the voices of the multiple far-end participants with a sense of direction that changes over time, and can also perceive the changes in the listening experience caused by the orientations of the far-end participants.
Figure 9a is a schematic flowchart of an audio processing method provided by an embodiment of the present disclosure. The method is performed by a decoding device. As shown in Figure 9a, the audio processing method may include the following steps:
Step 901a: obtaining the encoded signal sent by the encoding device;
Step 902a: decoding the encoded signal to obtain an object audio signal;
Step 903a: determining metadata of the object audio signal, the metadata including at least one of absolute position information of a sound object, relative position information of the sound object, orientation information of the sound object, and a sound radiation range of the sound object;
Step 904a: rendering the object audio signal based on the metadata.
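To make the decoding-side flow of steps 901a to 904a concrete, the following is a minimal Python sketch under stated assumptions: decode_bitstream and render_spatial are hypothetical placeholder callables, and the packet/metadata dictionary layout is invented for illustration; none of these names come from the disclosure itself.

```python
# Minimal sketch of the decoding-side flow (steps 901a-904a); illustrative only.
# decode_bitstream() and render_spatial() are hypothetical placeholder callables.

def process_encoded_signal(encoded_signal, decode_bitstream, render_spatial):
    # Step 902a: decode the encoded signal into an object audio signal.
    object_audio_signal = decode_bitstream(encoded_signal)

    rendered_frames = []
    for packet in object_audio_signal["packets"]:
        # Step 903a: read the per-frame metadata; any of the four fields may
        # be absent, but at least one is present.
        metadata = packet["metadata"]  # e.g. {"absolute_position": (x, y, z),
                                       #       "orientation": (...), ...}
        # Step 904a: render the frame's audio data based on its metadata.
        rendered_frames.append(render_spatial(packet["audio"], metadata))
    return rendered_frames
```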
Optionally, in an embodiment of the present disclosure, the orientation information includes absolute orientation information and/or relative orientation information;
the relative orientation information is used to indicate the relative orientation between the sound object and the listening position.
Optionally, in an embodiment of the present disclosure, the metadata further includes at least one of the following:
the sound source size of the sound object;
the width of the sound object;
the height of the sound object;
the spatial state of the sound object, the spatial state including moving or stationary;
the type of the sound object.
Optionally, in an embodiment of the present disclosure, the object audio signal includes a header file and object audio data packets;
the header file includes environmental space information of the sound object and basic information of the sound object;
each object audio data packet includes metadata of the audio data and the audio data (illustrative containers for this layout are sketched below).
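As one way such a layout could be represented in code, the following Python sketch defines plausible containers for the header file and the object audio data packets. All field names and default values are assumptions made for illustration; the disclosure fixes only the categories of information, not a concrete format.

```python
from dataclasses import dataclass, field
from typing import Optional

# Illustrative containers for the object audio signal described above.
# All field names and defaults are assumptions, not part of the disclosure.

@dataclass
class HeaderFile:
    # Environmental space information (when the sound object is in a room).
    room_size: Optional[tuple] = None           # e.g. (w, d, h), units assumed
    wall_type: Optional[str] = None
    wall_reflection: Optional[float] = None
    room_type: Optional[str] = None
    reverberation_time: Optional[float] = None  # seconds, assumed
    # Basic information of the sound objects.
    num_objects: int = 1
    sample_rate: int = 48000                    # Hz, assumed default
    bit_depth: int = 16                         # bit width of the sound source
    frame_length: int = 1024                    # samples per frame, assumed

@dataclass
class ObjectAudioPacket:
    metadata: dict                              # per-frame metadata
    audio: bytes                                # one frame of audio data

@dataclass
class ObjectAudioSignal:
    header: HeaderFile
    packets: list = field(default_factory=list)
```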
Optionally, in an embodiment of the present disclosure, in response to the sound object being located in a room, the environmental space information includes at least one of the following:
the room size;
the room wall type;
the wall reflection coefficient;
the room type;
the reverberation time.
Optionally, in an embodiment of the present disclosure, the basic information of the sound object includes at least one of the following:
the number of sound objects;
the sampling rate of the sound source of the sound object;
the bit width of the sound source of the sound object;
the frame length of each frame of audio data.
Optionally, in an embodiment of the present disclosure, rendering the object audio signal based on the metadata includes:
rendering the audio data based on the metadata and the header file, as in the sketch that follows.
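A minimal sketch of such rendering, assuming per-frame metadata dictionaries, the HeaderFile container sketched earlier, and caller-supplied pan_to_position and apply_reverb callables (all hypothetical), might look as follows. The directivity model here is a deliberately crude stand-in.

```python
# Illustrative rendering of one frame from its metadata plus the header file.
# pan_to_position() and apply_reverb() are hypothetical placeholder callables.

def orientation_gain(angle_deg, radiation_range_deg):
    # Toy directivity model: full level inside the sound object's radiation
    # range, attenuated outside it. A real renderer would be far more refined.
    if radiation_range_deg is None:
        return 1.0
    return 1.0 if abs(angle_deg) <= radiation_range_deg / 2 else 0.5

def render_frame(audio, metadata, header, pan_to_position, apply_reverb):
    # Directional rendering from position metadata (absolute or relative).
    position = metadata.get("absolute_position") or metadata.get("relative_position")
    out = pan_to_position(audio, position)
    # Orientation and radiation range shape the perceived level.
    g = orientation_gain(metadata.get("orientation_deg", 0.0),
                         metadata.get("radiation_range_deg"))
    out = [sample * g for sample in out]   # audio assumed as float samples
    # The header file's room information drives environment simulation.
    if getattr(header, "reverberation_time", None):
        out = apply_reverb(out, header.reverberation_time)
    return out
```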
For a detailed description of steps 901a to 904a, reference may be made to the foregoing embodiments; it is not repeated here.
To sum up, in the audio processing method provided by the embodiments of the present disclosure, the encoding device determines metadata for each frame of audio data, the metadata including at least one of absolute position information of a sound object in the audio data, relative position information of the sound object, orientation information of the sound object, and a sound radiation range of the sound object; the encoding device then obtains an object audio signal based on the metadata of the audio data. Accordingly, in the embodiments of the present disclosure, the metadata may include absolute position information of the sound object. On this basis, when audio data is recorded or an object audio signal is produced, if the absolute positions of multiple sound objects are fixed while the listening position keeps moving, the metadata can carry the absolute position information of the sound objects. Because the absolute positions are fixed, only the metadata of a certain frame (for example, the first frame) needs to include the absolute position information between the sound object and the listening position, and the audio data of the other frames can reuse that information rather than every frame's metadata carrying it. This reduces the data volume of the metadata and the transmission bandwidth, improves the coding efficiency, and still ensures that the renderer can subsequently render the direction of the sound object correctly and provide a correct spatial audio perception result, without affecting the final decoded rendering effect. In addition, the metadata in the embodiments of the present disclosure also includes the orientation information of the sound object and the sound radiation range of the sound object, so that when the object audio signal is subsequently rendered, the rendering can be based on this orientation information and radiation range, simulating the differences in perception produced by different actual orientations of the sounding object and improving the rendering effect.
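The frame-to-frame reuse described above can be sketched as follows. The sketch assumes a simple convention, not mandated by the disclosure, in which a frame whose metadata omits the position field inherits the most recently signaled absolute position.

```python
# Illustrative decoder-side reuse of absolute position information.
# Assumed convention: a packet without "absolute_position" reuses the last
# position that was signaled (e.g. in the first frame).

def resolve_positions(packets):
    last_position = None
    resolved = []
    for packet in packets:
        meta = dict(packet["metadata"])
        if "absolute_position" in meta:
            last_position = meta["absolute_position"]
        elif last_position is not None:
            meta["absolute_position"] = last_position  # reuse earlier frame
        resolved.append({"metadata": meta, "audio": packet["audio"]})
    return resolved

# Example: only the first frame carries the position.
frames = [
    {"metadata": {"absolute_position": (1.0, 2.0, 0.0)}, "audio": b"..."},
    {"metadata": {}, "audio": b"..."},
]
assert resolve_positions(frames)[1]["metadata"]["absolute_position"] == (1.0, 2.0, 0.0)
```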
Figure 9b is a schematic flowchart of an audio processing method provided by an embodiment of the present disclosure. The method is performed by a decoding device. As shown in Figure 9b, the audio processing method may include the following steps:
Step 901b: obtaining the encoded signal sent by the encoding device;
Step 902b: decoding the encoded signal to obtain an object audio signal;
Step 903b: determining metadata of the object audio signal, the metadata including absolute position information of a sound object;
Step 904b: rendering the object audio signal based on the metadata.
In summary, the technical effects of this embodiment are the same as those set out above in connection with Figure 9a and are not repeated here.
Figure 9c is a schematic flowchart of an audio processing method provided by an embodiment of the present disclosure. The method is performed by a decoding device. As shown in Figure 9c, the audio processing method may include the following steps:
Step 901c: obtaining the encoded signal sent by the encoding device;
Step 902c: decoding the encoded signal to obtain an object audio signal;
Step 903c: determining metadata of the object audio signal, the metadata including relative position information of a sound object;
Step 904c: rendering the object audio signal based on the metadata.
In summary, the technical effects of this embodiment are the same as those set out above in connection with Figure 9a and are not repeated here.
Figure 9d is a schematic flowchart of an audio processing method provided by an embodiment of the present disclosure. The method is performed by a decoding device. As shown in Figure 9d, the audio processing method may include the following steps:
Step 901d: obtaining the encoded signal sent by the encoding device;
Step 902d: decoding the encoded signal to obtain an object audio signal;
Step 903d: determining metadata of the object audio signal, the metadata including orientation information of a sound object and a flag, the flag being used to indicate that the metadata includes orientation information;
Step 904d: rendering the object audio signal based on the metadata.
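On the decoding side, one way such a flag might be read is sketched below. The one-byte flag and signed-byte yaw/pitch/roll layout are assumptions for illustration; the disclosure specifies the flag's purpose, not its encoding.

```python
# Illustrative parsing of an orientation-presence flag (step 903d).
# The one-byte flag and signed-byte angle layout are assumptions.

def parse_frame_metadata(buf):
    meta = {}
    has_orientation = buf[0] & 0x01    # assumed: bit 0 marks orientation
    offset = 1
    if has_orientation:
        # Assumed fixed-point yaw/pitch/roll, one signed byte each.
        yaw, pitch, roll = (
            int.from_bytes(buf[offset + i:offset + i + 1], "big", signed=True)
            for i in range(3)
        )
        meta["orientation"] = (yaw, pitch, roll)
        offset += 3
    return meta, offset

meta, _ = parse_frame_metadata(bytes([0x01, 10, 0, 251]))
assert meta["orientation"] == (10, 0, -5)
```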
In summary, the technical effects of this embodiment are the same as those set out above in connection with Figure 9a and are not repeated here.
Figure 9e is a schematic flowchart of an audio processing method provided by an embodiment of the present disclosure. The method is performed by a decoding device. As shown in Figure 9e, the audio processing method may include the following steps:
Step 901e: obtaining the encoded signal sent by the encoding device;
Step 902e: decoding the encoded signal to obtain an object audio signal;
Step 903e: determining metadata of the object audio signal, the metadata including orientation information of a sound object and a flag, the flag being used to indicate that the metadata includes orientation information;
Step 904e: rendering the object audio signal based on the metadata.
In summary, the technical effects of this embodiment are the same as those set out above in connection with Figure 9a and are not repeated here.
Figure 9f is a schematic structural diagram of an audio processing apparatus provided by an embodiment of the present disclosure. As shown in Figure 9f, the apparatus may include:
a determining module 901f, configured to determine metadata of each frame of audio data, the metadata including at least one of absolute position information of a sound object in the audio data, relative position information of the sound object, orientation information of the sound object, and a sound radiation range of the sound object;
a processing module 902f, configured to obtain an object audio signal based on the metadata of the audio data.
In summary, the technical effects of this apparatus are the same as those set out above in connection with the audio processing method and are not repeated here.
Optionally, in an embodiment of the present disclosure, the determining module is further configured to:
determine whether the metadata needs to contain absolute position information or relative position information (a sketch of this choice follows the list);
wherein, in response to determining that the metadata needs to contain absolute position information, the metadata is caused to contain absolute position information;
in response to determining that the metadata needs to contain relative position information, the metadata is caused to contain relative position information, the relative position information being used to indicate the relative position between the sound object and the listener.
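A minimal sketch of this choice on the encoding side follows. The heuristic (fixed sound objects with a moving listening position favor absolute positions) is drawn from the summary above, while the function and field names are assumptions.

```python
# Illustrative encoder-side choice between absolute and relative position.
# Heuristic follows the discussion above: fixed sound objects with a moving
# listening position are best described by absolute positions.

def build_position_metadata(object_position, listener_position,
                            object_is_fixed, listener_is_moving):
    if object_is_fixed and listener_is_moving:
        # Absolute position: can be signaled once and reused by later frames.
        return {"absolute_position": object_position}
    # Otherwise signal the position of the object relative to the listener.
    rel = tuple(o - l for o, l in zip(object_position, listener_position))
    return {"relative_position": rel}

meta = build_position_metadata((3.0, 1.0, 0.0), (1.0, 1.0, 0.0), False, False)
assert meta == {"relative_position": (2.0, 0.0, 0.0)}
```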
Optionally, in an embodiment of the present disclosure, the determining module is further configured to:
determine whether the sound object has an orientation;
in response to the sound object having an orientation, include the orientation information of the sound object in the metadata and include a flag in the metadata, the flag being used to indicate that the metadata includes orientation information (an encoder-side sketch follows);
in response to the sound object having no orientation, include no orientation information in the metadata.
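The encoder-side counterpart of the decoding sketch after Figure 9d might look as follows, again assuming the same hypothetical one-byte flag and signed-byte angle layout.

```python
# Illustrative encoder-side flag handling: write a presence bit, then the
# orientation only when the object actually has one. The byte layout mirrors
# the assumed decoder sketch and is not part of the disclosure.

def pack_frame_metadata(orientation=None):
    if orientation is None:
        return bytes([0x00])                 # flag cleared: no orientation
    yaw, pitch, roll = orientation
    out = bytearray([0x01])                  # flag set: orientation follows
    for angle in (yaw, pitch, roll):
        out += int(angle).to_bytes(1, "big", signed=True)
    return bytes(out)

assert pack_frame_metadata((10, 0, -5)) == bytes([0x01, 10, 0, 251])
assert pack_frame_metadata() == bytes([0x00])
```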
Optionally, in an embodiment of the present disclosure, the orientation information includes absolute orientation information and/or relative orientation information;
the relative orientation information is used to indicate the relative orientation between the sound object and the listener.
Optionally, in an embodiment of the present disclosure, the metadata further includes at least one of the following:
the sound source size of the sound object;
the width of the sound object;
the height of the sound object;
the spatial state of the sound object, the spatial state including moving or stationary;
the type of the sound object.
Optionally, in an embodiment of the present disclosure, the apparatus is further configured to:
determine the environmental space information of the sound object;
determine the basic information of the sound object;
sample the audio data of the sound object in units of frames (a sketch of frame-based sampling follows).
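As a rough illustration of frame-based sampling, the following sketch splits a stream of samples into fixed-length frames using the frame length carried in the basic information; the zero-padding policy for the final frame is an assumption.

```python
# Illustrative frame-based sampling of a sound object's audio data.
# frame_length would come from the basic information in the header file.

def frames_of(samples, frame_length):
    # Yield successive fixed-length frames; the tail is zero-padded so every
    # frame has the same length (padding policy is an assumption).
    for start in range(0, len(samples), frame_length):
        frame = samples[start:start + frame_length]
        if len(frame) < frame_length:
            frame = frame + [0.0] * (frame_length - len(frame))
        yield frame

frames = list(frames_of([0.1, -0.2, 0.3, 0.05, -0.1], frame_length=4))
assert len(frames) == 2 and len(frames[1]) == 4
```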
Optionally, in an embodiment of the present disclosure, in response to the sound object being located in a room, the environmental space information includes at least one of the following:
the room size;
the room wall type;
the wall reflection coefficient;
the room type;
the reverberation time.
Optionally, in an embodiment of the present disclosure, the basic information of the sound object includes at least one of the following:
the number of sound objects;
the sampling rate of the sound source of the sound object;
the bit width of the sound source of the sound object;
the frame length of each frame of audio data.
Optionally, in an embodiment of the present disclosure, the processing module is further configured to:
store the environmental space information of the sound object and the basic information of the sound object as a header file;
store the metadata of each frame of audio data together with that frame of audio data as one object audio data packet;
splice the header file and the object audio data packets to obtain at least one object audio signal, as in the sketch that follows.
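The splicing step could be sketched as follows, reusing the packet notion from the earlier container sketch. The length-prefixed, JSON-based serialization is purely an assumption for illustration; the disclosure does not define a byte layout.

```python
import json

# Illustrative assembly of an object audio signal: header first, then one
# packet (metadata + audio) per frame. The serialization is an assumption.

def splice_object_audio_signal(header_dict, frames):
    header_bytes = json.dumps(header_dict).encode("utf-8")
    blob = len(header_bytes).to_bytes(4, "big") + header_bytes
    for metadata, audio in frames:
        meta_bytes = json.dumps(metadata).encode("utf-8")
        blob += len(meta_bytes).to_bytes(4, "big") + meta_bytes
        blob += len(audio).to_bytes(4, "big") + audio
    return blob

signal = splice_object_audio_signal(
    {"num_objects": 1, "sample_rate": 48000},
    [({"absolute_position": [1.0, 2.0, 0.0]}, b"\x00\x01")],
)
```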
Optionally, in an embodiment of the present disclosure, the method further includes:
encoding the object audio signal;
sending the encoded signal to a decoding device.
Figure 9g is a schematic structural diagram of an audio processing apparatus provided by an embodiment of the present disclosure. As shown in Figure 9g, the apparatus may include:
an obtaining module 901g, configured to obtain the encoded signal sent by the encoding device;
a decoding module 902g, configured to decode the encoded signal to obtain an object audio signal;
a determining module 903g, configured to determine metadata of the object audio signal, the metadata including at least one of absolute position information of a sound object, relative position information of the sound object, orientation information of the sound object, and a sound radiation range of the sound object;
a rendering module 904g, configured to render the object audio signal based on the metadata.
In summary, the technical effects of this apparatus are the same as those set out above in connection with the audio processing method and are not repeated here.
Optionally, in an embodiment of the present disclosure, the orientation information includes absolute orientation information and/or relative orientation information;
the relative orientation information is used to indicate the relative orientation between the sound object and the listening position.
Optionally, in an embodiment of the present disclosure, the metadata further includes at least one of the following:
the sound source size of the sound object;
the width of the sound object;
the height of the sound object;
the spatial state of the sound object, the spatial state including moving or stationary;
the type of the sound object.
Optionally, in an embodiment of the present disclosure, the object audio signal includes a header file and object audio data packets;
the header file includes environmental space information of the sound object and basic information of the sound object;
each object audio data packet includes metadata of the audio data and the audio data.
Optionally, in an embodiment of the present disclosure, in response to the sound object being located in a room, the environmental space information includes at least one of the following:
the room size;
the room wall type;
the wall reflection coefficient;
the room type;
the reverberation time.
Optionally, in an embodiment of the present disclosure, the basic information of the sound object includes at least one of the following:
the number of sound objects;
the sampling rate of the sound source of the sound object;
the bit width of the sound source of the sound object;
the frame length of each frame of audio data.
Optionally, in an embodiment of the present disclosure, rendering the object audio signal based on the metadata includes:
rendering the audio data based on the metadata and the header file.
Figure 10 is a block diagram of a user equipment UE1000 provided by an embodiment of the present disclosure. For example, the UE1000 may be a mobile phone, a computer, a digital broadcast terminal device, a messaging device, a game console, a tablet device, a medical device, a fitness device, a personal digital assistant, or the like.
Referring to Figure 10, the UE1000 may include at least one of the following components: a processing component 1002, a memory 1004, a power supply component 1006, a multimedia component 1008, an audio component 1010, an input/output (I/O) interface 1012, a sensor component 1013 and a communication component 1016.
The processing component 1002 generally controls the overall operation of the UE1000, such as operations associated with display, telephone calls, data communication, camera operation and recording operation. The processing component 1002 may include at least one processor 1020 to execute instructions so as to complete all or part of the steps of the methods described above. In addition, the processing component 1002 may include at least one module that facilitates interaction between the processing component 1002 and other components. For example, the processing component 1002 may include a multimedia module to facilitate interaction between the multimedia component 1008 and the processing component 1002.
The memory 1004 is configured to store various types of data to support operation at the UE1000. Examples of such data include instructions for any application or method operated on the UE1000, contact data, phonebook data, messages, pictures, videos and the like. The memory 1004 may be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as a static random access memory (SRAM), an electrically erasable programmable read-only memory (EEPROM), an erasable programmable read-only memory (EPROM), a programmable read-only memory (PROM), a read-only memory (ROM), a magnetic memory, a flash memory, a magnetic disk or an optical disc.
The power supply component 1006 provides power for the various components of the UE1000. The power supply component 1006 may include a power management system, at least one power supply, and other components associated with generating, managing and distributing power for the UE1000.
The multimedia component 1008 includes a screen that provides an output interface between the UE1000 and the user. In some embodiments, the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from the user. The touch panel includes at least one touch sensor to sense touches, swipes and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or swipe action but also detect the duration and pressure associated with the touch or swipe operation. In some embodiments, the multimedia component 1008 includes a front camera and/or a rear camera. When the UE1000 is in an operating mode, such as a shooting mode or a video mode, the front camera and/or the rear camera can receive external multimedia data. Each front camera and rear camera may be a fixed optical lens system or have focal length and optical zoom capability.
The audio component 1010 is configured to output and/or input audio signals. For example, the audio component 1010 includes a microphone (MIC); when the UE1000 is in an operating mode, such as a call mode, a recording mode or a speech recognition mode, the microphone is configured to receive external audio signals. The received audio signals may be further stored in the memory 1004 or sent via the communication component 1016. In some embodiments, the audio component 1010 further includes a speaker for outputting audio signals.
The I/O interface 1012 provides an interface between the processing component 1002 and peripheral interface modules, which may be a keyboard, a click wheel, buttons and the like. These buttons may include, but are not limited to, a home button, volume buttons, a start button and a lock button.
The sensor component 1013 includes at least one sensor for providing status assessments of various aspects of the UE1000. For example, the sensor component 1013 may detect the on/off state of the UE1000 and the relative positioning of components (for example, the display and keypad of the UE1000); the sensor component 1013 may also detect a change in the position of the UE1000 or of a component of the UE1000, the presence or absence of user contact with the UE1000, the orientation or acceleration/deceleration of the UE1000, and changes in the temperature of the UE1000. The sensor component 1013 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact. The sensor component 1013 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor component 1013 may further include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor or a temperature sensor.
The communication component 1016 is configured to facilitate wired or wireless communication between the UE1000 and other devices. The UE1000 can access a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof. In an exemplary embodiment, the communication component 1016 receives a broadcast signal or broadcast-related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 1016 further includes a near field communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on radio frequency identification (RFID) technology, Infrared Data Association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology and other technologies.
In an exemplary embodiment, the UE1000 may be implemented by at least one application-specific integrated circuit (ASIC), digital signal processor (DSP), digital signal processing device (DSPD), programmable logic device (PLD), field programmable gate array (FPGA), controller, microcontroller, microprocessor or other electronic component, for executing the methods described above.
Figure 11 is a block diagram of a network-side device 1100 provided by an embodiment of the present disclosure. For example, the network-side device 1100 may be provided as a network-side device. Referring to Figure 11, the network-side device 1100 includes a processing component 1110, which further includes at least one processor, and memory resources represented by a memory 1132 for storing instructions, such as application programs, executable by the processing component 1110. An application program stored in the memory 1132 may include one or more modules, each corresponding to a set of instructions. In addition, the processing component 1110 is configured to execute instructions so as to perform any of the foregoing methods applied to the network-side device, for example the method shown in Figure 1.
The network-side device 1100 may further include a power supply component 1126 configured to perform power management of the network-side device 1100, a wired or wireless network interface 1150 configured to connect the network-side device 1100 to a network, and an input/output (I/O) interface 1158. The network-side device 1100 may operate based on an operating system stored in the memory 1132, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™ or the like.
In the foregoing embodiments provided by the present disclosure, the methods provided by the embodiments of the present disclosure are described from the perspectives of the network-side device and the UE respectively. To implement the functions of the methods provided by the foregoing embodiments of the present disclosure, the network-side device and the UE may include a hardware structure and software modules, and implement the above functions in the form of a hardware structure, software modules, or a hardware structure plus software modules. A given one of the above functions may be executed by a hardware structure, a software module, or a hardware structure plus a software module.
An embodiment of the present disclosure provides a communication apparatus. The communication apparatus may include a transceiver module and a processing module. The transceiver module may include a sending module and/or a receiving module, the sending module being used to implement a sending function and the receiving module being used to implement a receiving function; the transceiver module may implement the sending function and/or the receiving function.
The communication apparatus may be a terminal device (such as the terminal device in the foregoing method embodiments), an apparatus within a terminal device, or an apparatus capable of being used in combination with a terminal device. Alternatively, the communication apparatus may be a network device, an apparatus within a network device, or an apparatus capable of being used in combination with a network device.
An embodiment of the present disclosure provides another communication apparatus. The communication apparatus may be a network device or a terminal device (such as the terminal device in the foregoing method embodiments), or it may be a chip, chip system or processor that supports the network device in implementing the above methods, or a chip, chip system or processor that supports the terminal device in implementing the above methods. The apparatus can be used to implement the methods described in the foregoing method embodiments; for details, reference may be made to the descriptions in those embodiments.
The communication apparatus may include one or more processors. A processor may be a general-purpose processor, a dedicated processor or the like, for example a baseband processor or a central processing unit. The baseband processor may be used to process communication protocols and communication data, and the central processing unit may be used to control the communication apparatus (for example, a network-side device, a baseband chip, a terminal device, a terminal device chip, a DU or a CU), execute computer programs and process data of the computer programs.
Optionally, the communication apparatus may further include one or more memories, on which a computer program may be stored; the processor executes the computer program so that the communication apparatus performs the methods described in the foregoing method embodiments. Optionally, data may also be stored in the memory. The communication apparatus and the memory may be provided separately or may be integrated together.
Optionally, the communication apparatus may further include a transceiver and an antenna. The transceiver may be called a transceiver unit, a transceiver or a transceiver circuit, and is used to implement transceiving functions. The transceiver may include a receiver and a transmitter; the receiver may be called a receiving machine or a receiving circuit and is used to implement a receiving function, and the transmitter may be called a transmitting machine or a transmitting circuit and is used to implement a transmitting function.
Optionally, the communication apparatus may further include one or more interface circuits. The interface circuits are used to receive code instructions and transmit them to the processor. The processor runs the code instructions so that the communication apparatus performs the methods described in the foregoing method embodiments.
Where the communication apparatus is a terminal device (such as the terminal device in the foregoing method embodiments), the processor is configured to execute the method shown in any one of Figures 1 to 4.
Where the communication apparatus is a network device, the transceiver is configured to execute the method shown in any one of Figures 5 to 7.
In one implementation, the processor may include a transceiver for implementing the receiving and sending functions. The transceiver may be, for example, a transceiver circuit, an interface or an interface circuit. The transceiver circuits, interfaces or interface circuits used to implement the receiving and sending functions may be separate or may be integrated together. The above transceiver circuit, interface or interface circuit may be used for reading and writing code/data, or may be used for signal transmission or transfer.
In one implementation, the processor may store a computer program which, when run on the processor, causes the communication apparatus to perform the methods described in the foregoing method embodiments. The computer program may be embedded in the processor, in which case the processor may be implemented by hardware.
In one implementation, the communication apparatus may include circuits that implement the sending, receiving or communicating functions of the foregoing method embodiments. The processors and transceivers described in the present disclosure may be implemented in integrated circuits (ICs), analog ICs, radio frequency integrated circuits (RFICs), mixed-signal ICs, application-specific integrated circuits (ASICs), printed circuit boards (PCBs), electronic devices and the like. The processors and transceivers may also be manufactured using various IC process technologies, such as complementary metal oxide semiconductor (CMOS), N-type metal oxide semiconductor (NMOS), P-type metal oxide semiconductor (PMOS), bipolar junction transistor (BJT), bipolar CMOS (BiCMOS), silicon germanium (SiGe), gallium arsenide (GaAs) and so on.
The communication apparatus described in the above embodiments may be a network device or a terminal device (such as the terminal device in the foregoing method embodiments), but the scope of the communication apparatus described in the present disclosure is not limited thereto, and the structure of the communication apparatus is not limited either. The communication apparatus may be a stand-alone device or may be part of a larger device. For example, the communication apparatus may be:
(1) a stand-alone integrated circuit (IC), a chip, or a chip system or subsystem;
(2) a collection of one or more ICs, where, optionally, the IC collection may also include storage components for storing data and computer programs;
(3) an ASIC, such as a modem;
(4) a module that can be embedded in other devices;
(5) a receiver, a terminal device, an intelligent terminal device, a cellular phone, a wireless device, a handheld device, a mobile unit, a vehicle-mounted device, a network device, a cloud device, an artificial intelligence device, or the like;
(6) others, and so on.
Where the communication apparatus may be a chip or a chip system, the chip includes a processor and an interface. There may be one or more processors, and there may be multiple interfaces.
Optionally, the chip further includes a memory, which is used to store necessary computer programs and data.
Those skilled in the art will also appreciate that the various illustrative logical blocks and steps listed in the embodiments of the present disclosure may be implemented by electronic hardware, computer software or a combination of both. Whether such functions are implemented by hardware or software depends on the specific application and the design requirements of the overall system. Those skilled in the art may use various methods to implement the described functions for each specific application, but such implementations should not be understood as going beyond the protection scope of the embodiments of the present disclosure.
An embodiment of the present disclosure further provides a system for determining a sidelink duration, the system including the communication apparatus serving as the terminal device (such as the first terminal device in the foregoing method embodiments) and the communication apparatus serving as the network device in the foregoing embodiments.
The present disclosure further provides a readable storage medium having instructions stored thereon which, when executed by a computer, implement the functions of any of the foregoing method embodiments.
The present disclosure further provides a computer program product which, when executed by a computer, implements the functions of any of the foregoing method embodiments.
In the above embodiments, the implementation may be entirely or partly by software, hardware, firmware or any combination thereof. When implemented by software, it may be implemented entirely or partly in the form of a computer program product. The computer program product includes one or more computer programs. When the computer programs are loaded and executed on a computer, the processes or functions described in the embodiments of the present disclosure are produced in whole or in part. The computer may be a general-purpose computer, a dedicated computer, a computer network or another programmable apparatus. The computer programs may be stored in a computer-readable storage medium or transferred from one computer-readable storage medium to another; for example, the computer programs may be transmitted from one website, computer, server or data center to another website, computer, server or data center in a wired manner (for example, via coaxial cable, optical fiber or digital subscriber line (DSL)) or a wireless manner (for example, via infrared, radio or microwave). The computer-readable storage medium may be any usable medium accessible to a computer, or a data storage device such as a server or data center integrating one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, a hard disk or a magnetic tape), an optical medium (for example, a high-density digital video disc (DVD)) or a semiconductor medium (for example, a solid state disk (SSD)), and so on.
本领域普通技术人员可以理解:本公开中涉及的第一、第二等各种数字编号仅为描述方便进行的区分,并不用来限制本公开实施例的范围,也表示先后顺序。Those of ordinary skill in the art can understand that the first, second, and other numerical numbers involved in this disclosure are only for convenience of description and are not used to limit the scope of the embodiments of the disclosure, nor to indicate the order.
本公开中的至少一个还可以描述为一个或多个,多个可以是两个、三个、四个或者更多个,本公开不做限制。在本公开实施例中,对于一种技术特征,通过“第一”、“第二”、“第三”、“A”、“B”、“C”和“D”等区分该种技术特征中的技术特征,该“第一”、“第二”、“第三”、“A”、“B”、“C”和“D”描述的技术特征间无先后顺序或者大小顺序。At least one in the present disclosure can also be described as one or more, and the plurality can be two, three, four or more, and the present disclosure is not limited. In the embodiment of the present disclosure, for a technical feature, the technical feature is distinguished by “first”, “second”, “third”, “A”, “B”, “C” and “D” etc. The technical features described in "first", "second", "third", "A", "B", "C" and "D" are in no particular order or order.
本领域技术人员在考虑说明书及实践这里公开的发明后,将容易想到本发明的其它实施方案。本公开旨在涵盖本发明的任何变型、用途或者适应性变化,这些变型、用途或者适应性变化遵循本发明的一般性原理并包括本公开未公开的本技术领域中的公知常识或惯用技术手段。说明书和实施例仅被视为示例性的,本公开的真正范围和精神由下面的权利要求指出。Other embodiments of the invention will be readily apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. The present disclosure is intended to cover any variations, uses, or adaptations of the invention that follow the general principles of the invention and include common common sense or customary technical means in the technical field that are not disclosed in the present disclosure. . It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
应当理解的是,本公开并不局限于上面已经描述并在附图中示出的精确结构,并且可以在不脱离其范围进行各种修改和改变。本公开的范围仅由所附的权利要求来限制。It is to be understood that the present disclosure is not limited to the precise structures described above and illustrated in the accompanying drawings, and various modifications and changes may be made without departing from the scope thereof. The scope of the disclosure is limited only by the appended claims.

Claims (25)

  1. An audio processing method, characterized in that the method is performed by an encoding device and comprises:
    determining metadata of each frame of audio data, the metadata comprising at least one of: absolute position information of a sound object in the audio data, relative position information of the sound object, orientation information of the sound object, and a sound radiation range of the sound object;
    obtaining an object audio signal based on the metadata of the audio data.
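By way of illustration only, the per-frame metadata of claim 1 could be represented as follows in Python; the field names and types here are assumptions made for this sketch, not identifiers defined by the present disclosure.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class FrameMetadata:
    """Illustrative per-frame metadata for one sound object (claim 1 sketch)."""
    # Absolute position of the sound object in the scene, e.g. (x, y, z) in meters.
    absolute_position: Optional[Tuple[float, float, float]] = None
    # Position of the sound object relative to the listening position.
    relative_position: Optional[Tuple[float, float, float]] = None
    # Orientation of the sound object, e.g. (azimuth, elevation, roll) in degrees.
    orientation: Optional[Tuple[float, float, float]] = None
    # Sound radiation range of the object, e.g. an opening angle in degrees.
    radiation_range: Optional[float] = None

    def is_valid(self) -> bool:
        # Claim 1 requires at least one of the four kinds of information.
        return any(v is not None for v in (
            self.absolute_position, self.relative_position,
            self.orientation, self.radiation_range))
```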
  2. The method according to claim 1, characterized in that determining the metadata of each frame of audio data comprises:
    determining whether the metadata needs to contain absolute position information or relative position information;
    wherein, in response to determining that the metadata needs to contain absolute position information, the absolute position information is included in the metadata;
    in response to determining that the metadata needs to contain relative position information, the relative position information is included in the metadata.
  3. The method according to claim 1, characterized in that determining the metadata of each frame of audio data comprises:
    determining whether the sound object has an orientation;
    in response to the sound object having an orientation, including the orientation information of the sound object in the metadata, and including a flag in the metadata, the flag being used to indicate that the metadata contains orientation information;
    in response to the sound object having no orientation, including no orientation information in the metadata.
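A minimal sketch of the flag-gated carriage described in claim 3, reusing the FrameMetadata sketch above; the single flag byte and little-endian layout are illustrative assumptions, since the disclosure does not fix a serialization format at this point.

```python
import struct

def pack_orientation(meta: "FrameMetadata") -> bytes:
    """Pack an orientation flag plus the optional orientation (claim 3 sketch)."""
    if meta.orientation is not None:
        # Flag byte 0x01 signals that three orientation floats follow.
        return struct.pack("<B3f", 1, *meta.orientation)
    # Flag byte 0x00 signals that no orientation information is present.
    return struct.pack("<B", 0)
```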
  4. The method according to claim 3, characterized in that the orientation information comprises absolute orientation information and/or relative orientation information;
    the relative orientation information is used to indicate a relative orientation between the sound object and a listening position.
  5. The method according to claim 1, characterized in that the metadata further comprises at least one of the following:
    a sound source size of the sound object;
    a width of the sound object;
    a height of the sound object;
    a spatial state of the sound object, the spatial state comprising moving or stationary;
    a type of the sound object.
  6. The method according to any one of claims 1 to 5, characterized in that the method further comprises:
    determining environmental space information of the sound object;
    determining basic information of the sound object;
    sampling audio data of the sound object in units of frames.
  7. The method according to claim 6, characterized in that, in response to the sound object being located in a room, the environmental space information comprises at least one of the following:
    a room size;
    a room wall type;
    a wall reflection coefficient;
    a room type;
    a reverberation time.
  8. The method according to claim 6, characterized in that the basic information of the sound object comprises at least one of the following:
    a number of sound objects;
    a sampling rate of a sound source of the sound object;
    a bit width of the sound source of the sound object;
    a frame length of each frame of audio data.
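Claims 6 through 8 describe scene-level information determined once rather than per frame. A sketch of how these items might be grouped, assuming a one-to-one mapping from the listed items to fields (all names and defaults hypothetical):

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class EnvironmentInfo:
    """Environmental space information when the object is in a room (claim 7 sketch)."""
    room_size: Optional[Tuple[float, float, float]] = None  # (w, d, h) in meters
    wall_type: Optional[str] = None                         # e.g. "concrete"
    wall_reflection: Optional[float] = None                 # reflection coefficient
    room_type: Optional[str] = None                         # e.g. "office"
    rt60: Optional[float] = None                            # reverberation time, s

@dataclass
class BasicInfo:
    """Basic information of the sound objects (claim 8 sketch)."""
    num_objects: int = 1
    sample_rate: int = 48000  # Hz, sampling rate of the sound source
    bit_depth: int = 16       # bit width of the sound source
    frame_length: int = 1024  # samples per frame of audio data
```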
  9. The method according to claim 6, characterized in that obtaining the object audio signal based on the metadata of the audio data comprises:
    storing the environmental space information of the sound object and the basic information of the sound object as a header file;
    storing the metadata of each frame of audio data together with that frame of audio data as one object audio data packet;
    splicing the header file and the object audio data packets to obtain at least one object audio signal.
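One way to read claim 9 is as a simple concatenated container: the header once, followed by one packet per frame, each packet carrying that frame's metadata and samples. The length-prefixed layout below is purely an assumption for illustration; the disclosure does not specify a byte format.

```python
import json
import struct

def splice_signal(header: dict, frames: list) -> bytes:
    """Concatenate a header and per-frame object audio packets (claim 9 sketch).

    `frames` is a list of (metadata_dict, pcm_bytes) tuples.
    Every field is length-prefixed so a decoder can walk the stream.
    """
    header_bytes = json.dumps(header).encode("utf-8")
    out = struct.pack("<I", len(header_bytes)) + header_bytes
    for metadata, pcm in frames:
        meta_bytes = json.dumps(metadata).encode("utf-8")
        out += struct.pack("<II", len(meta_bytes), len(pcm))
        out += meta_bytes + pcm
    return out
```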
  10. The method according to claim 1, characterized in that the method further comprises:
    encoding the object audio signal;
    sending the encoded signal to a decoding device.
  11. An audio processing method, characterized in that the method is performed by a decoding device and comprises:
    obtaining an encoded signal sent by an encoding device;
    decoding the encoded signal to obtain an object audio signal;
    determining metadata of the object audio signal, the metadata comprising at least one of: absolute position information of a sound object, relative position information of the sound object, orientation information of the sound object, and a sound radiation range of the sound object;
    rendering the object audio signal based on the metadata.
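On the decoding side, claim 11's "determining metadata of the object audio signal" amounts to walking the container back apart. A sketch matching the assumed layout above:

```python
import json
import struct

def parse_signal(blob: bytes):
    """Split a spliced object audio signal into header and frames (claim 11 sketch)."""
    offset = 0
    (header_len,) = struct.unpack_from("<I", blob, offset)
    offset += 4
    header = json.loads(blob[offset:offset + header_len])
    offset += header_len
    frames = []
    while offset < len(blob):
        # Each packet: metadata length, PCM length, then the two payloads.
        meta_len, pcm_len = struct.unpack_from("<II", blob, offset)
        offset += 8
        metadata = json.loads(blob[offset:offset + meta_len])
        offset += meta_len
        pcm = blob[offset:offset + pcm_len]
        offset += pcm_len
        frames.append((metadata, pcm))
    return header, frames
```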
  12. The method according to claim 11, characterized in that the orientation information comprises absolute orientation information and/or relative orientation information;
    the relative orientation information is used to indicate a relative orientation between the sound object and a listening position.
  13. The method according to claim 11, characterized in that the metadata further comprises at least one of the following:
    a sound source size of the sound object;
    a width of the sound object;
    a height of the sound object;
    a spatial state of the sound object, the spatial state comprising moving or stationary.
  14. The method according to claim 11, characterized in that the object audio signal comprises a header file and object audio data packets;
    the header file comprises environmental space information of the sound object and basic information of the sound object;
    each object audio data packet comprises metadata of audio data and the audio data.
  15. The method according to claim 14, characterized in that, in response to the sound object being located in a room, the environmental space information comprises at least one of the following:
    a room size;
    a room wall type;
    a wall reflection coefficient;
    a room type;
    a reverberation time.
  16. The method according to claim 14, characterized in that the basic information of the sound object comprises at least one of the following:
    a number of sound objects;
    a sampling rate of a sound source of the sound object;
    a bit width of the sound source of the sound object;
    a frame length of each frame of audio data.
  17. The method according to any one of claims 14 to 16, characterized in that rendering the object audio signal based on the metadata comprises:
    rendering the audio data based on the metadata and the header file.
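Claim 17 leaves the rendering method open. One common family of techniques is amplitude panning driven by the relative position metadata; the constant-power stereo sketch below is offered only as an assumed example, since real object renderers (HRTF, VBAP, room modeling from the header's environment information) are far richer.

```python
import math

def render_frame_stereo(samples, relative_position):
    """Pan a mono frame to stereo from the object's relative position (sketch).

    `relative_position` is (x, y, z) with +x to the listener's right
    and +y straight ahead. Uses constant-power panning.
    """
    x, y, _ = relative_position
    azimuth = math.atan2(x, max(y, 1e-9))               # 0 rad = straight ahead
    pan = max(-1.0, min(1.0, azimuth / (math.pi / 2)))  # clamp to [-1, 1]
    theta = (pan + 1.0) * math.pi / 4                   # map to [0, pi/2]
    left_gain, right_gain = math.cos(theta), math.sin(theta)
    left = [s * left_gain for s in samples]
    right = [s * right_gain for s in samples]
    return left, right
```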
  18. An audio processing apparatus, characterized by comprising:
    a determining module, configured to determine metadata of each frame of audio data, the metadata comprising at least one of: absolute position information of a sound object in the audio data, relative position information of the sound object, orientation information of the sound object, and a sound radiation range of the sound object;
    a processing module, configured to obtain an object audio signal based on the metadata of the audio data.
  19. An audio processing apparatus, characterized by comprising:
    an obtaining module, configured to obtain an encoded signal sent by an encoding device;
    a decoding module, configured to decode the encoded signal to obtain an object audio signal;
    a determining module, configured to determine metadata of the object audio signal, the metadata comprising at least one of: absolute position information of a sound object, relative position information of the sound object, orientation information of the sound object, and a sound radiation range of the sound object;
    a rendering module, configured to render the object audio signal based on the metadata.
  20. A communication apparatus, characterized in that the apparatus comprises a processor and a memory, wherein the memory stores a computer program, and the processor executes the computer program stored in the memory to cause the apparatus to perform the method according to any one of claims 1 to 10.
  21. A communication apparatus, characterized in that the apparatus comprises a processor and a memory, wherein the memory stores a computer program, and the processor executes the computer program stored in the memory to cause the apparatus to perform the method according to any one of claims 11 to 17.
  22. A communication apparatus, characterized by comprising a processor and an interface circuit, wherein:
    the interface circuit is configured to receive code instructions and transmit them to the processor;
    the processor is configured to run the code instructions to perform the method according to any one of claims 1 to 10.
  23. A communication apparatus, characterized by comprising a processor and an interface circuit, wherein:
    the interface circuit is configured to receive code instructions and transmit them to the processor;
    the processor is configured to run the code instructions to perform the method according to any one of claims 11 to 17.
  24. A computer-readable storage medium storing instructions which, when executed, cause the method according to any one of claims 1 to 10 to be implemented.
  25. A computer-readable storage medium storing instructions which, when executed, cause the method according to any one of claims 11 to 17 to be implemented.
PCT/CN2022/091052 2022-05-05 2022-05-05 Audio processing method and apparatus, and storage medium WO2023212880A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202280001320.1A CN117581566A (en) 2022-05-05 2022-05-05 Audio processing method, device and storage medium
PCT/CN2022/091052 WO2023212880A1 (en) 2022-05-05 2022-05-05 Audio processing method and apparatus, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2022/091052 WO2023212880A1 (en) 2022-05-05 2022-05-05 Audio processing method and apparatus, and storage medium

Publications (1)

Publication Number Publication Date
WO2023212880A1 true WO2023212880A1 (en) 2023-11-09

Family

ID=88646108

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/091052 WO2023212880A1 (en) 2022-05-05 2022-05-05 Audio processing method and apparatus, and storage medium

Country Status (2)

Country Link
CN (1) CN117581566A (en)
WO (1) WO2023212880A1 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103650539A (en) * 2011-07-01 2014-03-19 杜比实验室特许公司 System and method for adaptive audio signal generation, coding and rendering
US20160133261A1 (en) * 2013-05-31 2016-05-12 Sony Corporation Encoding device and method, decoding device and method, and program
US20180192186A1 (en) * 2015-07-02 2018-07-05 Dolby Laboratories Licensing Corporation Determining azimuth and elevation angles from stereo recordings
CN113905321A (en) * 2021-09-01 2022-01-07 赛因芯微(北京)电子科技有限公司 Object-based audio channel metadata and generation method, device and storage medium

Also Published As

Publication number Publication date
CN117581566A (en) 2024-02-20

Similar Documents

Publication Publication Date Title
WO2020244495A1 (en) Screen projection display method and electronic device
JP5882964B2 (en) Audio spatialization by camera
US10051368B2 (en) Mobile apparatus and control method thereof
WO2022135527A1 (en) Video recording method and electronic device
WO2022068613A1 (en) Audio processing method and electronic device
US20230026812A1 (en) Device Positioning Method and Related Apparatus
CN113921002A (en) Equipment control method and related device
CN113596241B (en) Sound processing method and device
WO2023212880A1 (en) Audio processing method and apparatus, and storage medium
CN114598984B (en) Stereo synthesis method and system
CN105577521B (en) Good friend's group technology, apparatus and system
CN116368460A (en) Audio processing method and device
CN114040319B (en) Method, device, equipment and medium for optimizing playback quality of terminal equipment
WO2022143310A1 (en) Double-channel screen projection method and electronic device
CN114667744B (en) Real-time communication method, device and system
CN116797767A (en) Augmented reality scene sharing method and electronic device
KR20230002968A (en) Bit allocation method and apparatus for audio signal
WO2023193148A1 (en) Audio playback method/apparatus/device, and storage medium
WO2023197646A1 (en) Audio signal processing method and electronic device
US20240080406A1 (en) Video Conference Calls
CN115552518B (en) Signal encoding and decoding method and device, user equipment, network side equipment and storage medium
WO2022206643A1 (en) Method for estimating angle of arrival of signal and electronic device
WO2023202445A1 (en) Demonstration system, method, graphical interface, and related apparatus
WO2023212879A1 (en) Object audio data generation method and apparatus, electronic device, and storage medium
CN116048241A (en) Prompting method, augmented reality device and medium

Legal Events

Date Code Title Description
WWE Wipo information: entry into national phase

Ref document number: 202280001320.1

Country of ref document: CN

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22940576

Country of ref document: EP

Kind code of ref document: A1