CN115134737A - Sound bed output rendering item determination method, device, equipment and storage medium - Google Patents

Sound bed output rendering item determination method, device, equipment and storage medium

Info

Publication number
CN115134737A
CN115134737A
Authority
CN
China
Prior art keywords
audio
rendering
item
rendering item
output
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210600880.0A
Other languages
Chinese (zh)
Inventor
吴健 (Wu Jian)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Saiyinxin Micro Beijing Electronic Technology Co ltd
Original Assignee
Saiyinxin Micro Beijing Electronic Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Saiyinxin Micro Beijing Electronic Technology Co ltd filed Critical Saiyinxin Micro Beijing Electronic Technology Co ltd
Priority to CN202210600880.0A priority Critical patent/CN115134737A/en
Publication of CN115134737A publication Critical patent/CN115134737A/en
Pending legal-status Critical Current

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S7/00 Indicating arrangements; Control arrangements, e.g. balance control
    • H04S7/30 Control circuits for electronic adaptation of the sound field
    • H04S7/305 Electronic adaptation of stereophonic audio signals to reverberation of the listening space
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S3/00 Systems employing more than two channels, e.g. quadraphonic
    • H04S3/008 Systems employing more than two channels, e.g. quadraphonic, in which the audio signals are in digital form, i.e. employing more than two discrete digital channels

Abstract

The present disclosure relates to a sound bed output rendering item determination method, device, equipment and storage medium. The method includes: acquiring rendering items generated in advance by a rendering item generator; selecting, according to the audio model metadata structure path, a rendering item generated in advance by the rendering item generator, and determining an output rendering item according to preset attributes of the audio model metadata; transmitting the audio signals corresponding to the output rendering items through a sound bed renderer to a mixer to form the audio output and deliver it to all loudspeaker configurations; and, after the definition type of the output rendering item is determined, converting each audio channel format and the corresponding audio track specification into a sound bed output rendering item in audio channel allocation. In this way, audio signals can be rendered to all speaker configurations specified in an advanced sound system.

Description

Sound bed output rendering item determination method, device, equipment and storage medium
Technical Field
The present disclosure relates to the field of audio processing technologies, and in particular, to a method, an apparatus, a device, and a storage medium for determining an output rendering item of a sound bed.
Background
With the development of technology, audio has become increasingly complex. Early monaural audio gave way to stereo, and attention shifted to the correct handling of the left and right channels. With the arrival of surround sound, however, processing became more involved. The surround 5.1 speaker system imposes an ordering constraint on multiple channels, and systems such as surround 6.1 and surround 7.1 diversify audio processing further: the correct signal must be delivered to the appropriate speaker so that the speakers work together as a whole. Thus, as sound becomes more immersive and interactive, the complexity of audio processing also increases greatly.
Audio channels (or sound channels) are mutually independent audio signals that are captured or played back at different spatial positions when sound is recorded or played. The number of channels is the number of sound sources when recording, or the corresponding number of speakers when playing back. For example, a surround 5.1 speaker system comprises audio signals at 6 different spatial positions, each separate audio signal driving a speaker at the corresponding spatial position; a surround 7.1 speaker system comprises audio signals at 8 different spatial positions, each separate audio signal driving a speaker at the corresponding spatial position.
Therefore, the effect achieved by current loudspeaker systems depends on the number and spatial positions of the loudspeakers. For example, a two-channel speaker system cannot achieve the effect of a surround 5.1 speaker system.
Disclosure of Invention
The present disclosure is directed to a sound bed output rendering item determination method, apparatus, device and storage medium, which converts audio model metadata into a set of renderable items and can render audio signals to all speaker configurations specified in an advanced sound system.
A first aspect of the present disclosure provides a sound bed output rendering item determining method, including:
acquiring a rendering item generated in advance by a rendering item generator;
selecting a rendering item generated in advance by a rendering item generator according to the audio model metadata structure path, and determining an output rendering item according to the preset attribute of the audio model metadata; transmitting the audio signal corresponding to the output rendering item to a mixer through a sound bed renderer to form audio output and transmit the audio output to all loudspeaker configurations;
and after the definition type of the output rendering item is determined, converting each audio channel format and the corresponding audio track specification into a sound bed output rendering item in audio channel allocation.
A second aspect of the present disclosure provides a sound bed output rendering item determining apparatus, including:
the acquisition module is used for acquiring the rendering item generated in advance by the rendering item generator;
the determining module is used for selecting a rendering item generated in advance by the rendering item generator according to the audio model metadata structure path and then determining an output rendering item according to the preset attribute of the audio model metadata; transmitting the audio signal corresponding to the output rendering item to a mixer through a sound bed renderer to form audio output and transmit the audio output to all loudspeaker configurations;
and the conversion module is used for converting each audio channel format and the corresponding audio track specification into a sound bed output rendering item in audio channel allocation after the definition type of the output rendering item is determined.
A third aspect of the present disclosure provides an electronic device, comprising: a memory and one or more processors;
the memory for storing one or more programs;
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the sound bed output rendering item determination method provided by any of the embodiments.
A fourth aspect of the present disclosure provides a storage medium containing computer-executable instructions which, when executed by a computer processor, implement the sound bed output rendering item determination method provided in any of the embodiments.
From the above, it can be seen that the sound bed output rendering item determination method of the present disclosure provides conversion of audio model metadata to a set of renderable items, and can render audio signals to all speaker configurations specified in an advanced sound system. The renderer receives the audio and metadata, along with information about the desired output format (typically speaker layout), and processes the input audio channels after parsing the metadata in a sound-generating manner described by the metadata.
Drawings
Fig. 1 is a schematic diagram of a three-dimensional acoustic audio model provided in an embodiment of the present disclosure;
FIG. 2 is a schematic diagram of an audio renderer provided in an embodiment of the disclosure;
FIG. 3 is a flow chart of a method for determining a sound bed output rendering item in an embodiment of the present disclosure;
fig. 4 is a schematic structural diagram of a sound bed output rendering item determination apparatus in an embodiment of the present disclosure;
fig. 5 is a schematic structural diagram of an electronic device in an embodiment of the present disclosure;
fig. 6 is a schematic diagram of rendering item set selection in an embodiment of the present disclosure.
Detailed Description
The present disclosure will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the disclosure and are not to be construed as limiting it. It should further be noted that, for convenience of description, the drawings show only the structures related to the present disclosure, not all of the structures.
Examples
As shown in fig. 1, a three-dimensional acoustic audio model is composed of a set of elements, each element describing one stage of audio, and includes a content production section and a format production section.
Wherein the content production part comprises: an audio program element, an audio content element, an audio object element, and an audio track unique identification element; the format production part includes: an audio packet format element, an audio channel format element, an audio stream format element, and an audio track format element;
the audio program element references at least one of the audio content elements; the audio content element references at least one audio object element; the audio object element references the corresponding audio package format element and the corresponding audio track unique identification element; the audio track unique identification element refers to the corresponding audio track format element and the corresponding audio package format element;
the audio package format element references at least one of the audio channel format elements; the audio stream format element references the corresponding audio channel format element and the corresponding audio packet format element; the audio track format element and the corresponding audio stream format element are referenced to each other. The reference relationships between elements are indicated by arrows in fig. 1.
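The reference structure above can be sketched as a minimal set of hypothetical Python dataclasses; the class and field names are illustrative assumptions, not the patent's actual data structures:

```python
from dataclasses import dataclass, field

@dataclass
class AudioChannelFormat:
    id: str

@dataclass
class AudioPackFormat:
    id: str
    channel_formats: list = field(default_factory=list)  # references at least one channel format

@dataclass
class AudioObject:
    id: str
    pack_format: AudioPackFormat = None  # references the corresponding pack format

@dataclass
class AudioContent:
    id: str
    objects: list = field(default_factory=list)  # references at least one audio object

@dataclass
class AudioProgramme:
    id: str
    contents: list = field(default_factory=list)  # references at least one audio content

def channel_formats_of(programme: AudioProgramme) -> list:
    """Follow programme -> content -> object -> pack -> channel format references."""
    found = []
    for content in programme.contents:
        for obj in content.objects:
            if obj.pack_format is not None:
                found.extend(obj.pack_format.channel_formats)
    return found
```

Walking a tiny stereo bed built from these classes yields its two channel formats, mirroring the arrows of fig. 1.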
An audio program may include, but is not limited to, narration, sound effects, and background music. An audio program element describes a program; a program contains at least one content, and each audio content element describes one content of the corresponding audio program element. An audio program element may reference one or more audio content elements, which are grouped together to construct a complete audio program.
The audio content elements describe the content of a component of an audio program, such as background music, and relate the content to its format by reference to one or more audio object elements.
The audio object elements are used to establish the relationships among content, format and asset information, and to determine the audio track unique identification of the actual audio track.
The format production part comprises: an audio packet format element, an audio channel format element, an audio stream format element, and an audio track format element.
The audio packet format element may be configured to describe a format used when the audio object element and the original audio data are packed according to channel packets.
The audio channel format element may be used to represent a single sequence of audio samples and preset operations performed on it, such as movement of rendering objects in a scene. The audio channel format element may comprise at least one audio block format element. The audio block format elements may be considered to be sub-elements of the audio channel format elements, and therefore there is an inclusion relationship between the audio channel format elements and the audio block format elements.
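The containment relationship above (an audio channel format *contains* its audio block formats as sub-elements, rather than referencing them) can be sketched as follows; the field names are assumptions for illustration:

```python
from dataclasses import dataclass, field

@dataclass
class AudioBlockFormat:
    rtime: float      # start time of the block within the channel
    duration: float   # length of the block
    position: tuple   # e.g. (azimuth, elevation, distance) for a moving rendering object

@dataclass
class AudioChannelFormat:
    id: str
    blocks: list = field(default_factory=list)  # contained audioBlockFormat sub-elements

def block_count(channel: AudioChannelFormat) -> int:
    """Number of audio block formats contained in the channel format."""
    return len(channel.blocks)
```

A channel describing an object that moves from +30° to -30° azimuth would simply contain two successive block formats.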
An audio stream is a combination of the audio tracks needed to render a channel, an object, a higher-order ambient sound component, or a packet. The audio stream format elements establish the relationship between a set of audio track format elements and a set of audio channel format elements, or between a set of audio track formats and an audio packet format.
The audio track format elements correspond to a set of samples or data in a single audio track, and are used to describe the format of the original audio data, and the decoded signals of the renderer, and also to identify the combination of audio tracks required to successfully decode the audio track data.
After the original audio data is produced through the three-dimensional acoustic audio model, synthesized audio data containing metadata is generated.
Metadata is information describing the characteristics of data; the functions it supports include indicating storage locations, recording historical data, resource lookup, and file recording.
After the synthesized audio data is transmitted to the far end over a communication link, the far-end renderer parses it based on the metadata, restoring the original sound scene or rendering a new sound scene in real time.
Channel-based audio refers to an audio representation in which content is mixed into a predetermined number of signal channels during production, each channel being associated with a speaker at a particular static position. Each channel is reproduced by routing it to the associated speaker (if present), or to one or more available speakers (e.g., by channel down-mixing) that best represent the intended playback. The production flow, the broadcast network and the reproduction system are each defined by the positions of a series of loudspeakers.
As shown in fig. 2, the renderer architecture is based on the provided input metadata, the target environment (parameters/configuration) and the audio stream. The processing steps are as follows: rendering item determination is the conversion of the audio model metadata into a set of renderable items; rendering item processing is optional processing for applying importance and simulating conversion; the type definition (typeDefinition) of a rendering item is a sub-component split from the rendering item itself, handled by an object-based renderer, a direct speakers (DirectSpeakers) signal based renderer, a scene (HOA) based renderer, and renderer components shared by all parts. Note that matrix-type processing is not shown in the figure, as matrix types are processed while the rendering items are created and are part of the other types of renderers.
Target environment behavior: at initialization, the user selects a speaker layout from the speaker layouts defined for advanced sound system programming. The nominal position (polar_nominal_position) of each loudspeaker is as specified, with the nominal azimuth angles of M+SC and M-SC being 15° and -15°. The actual position of each speaker may be specified by the user; if it is not, the nominal position is used. The given actual position is checked against the permitted range; if it falls outside, an error is issued. Furthermore, the absolute azimuth of the M+SC and M-SC speakers must be between 5° and 25°, or between 35° and 60°, where "+/-SC" denotes a pair of speakers to the left and right of the screen. The International Telecommunication Union ITU-R BS.2051 standard specifies in detail the loudspeaker layouts (i.e. the BS.2051 layout group of loudspeakers) of an advanced sound system for programming.
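The M+SC / M-SC placement constraint above can be expressed as a small check; the function name is an assumption, and the text's "between 35° or 60°" is read here as the range 35°-60°:

```python
def valid_sc_azimuth(azimuth_deg: float) -> bool:
    """Placement rule for the M+SC / M-SC screen speakers: the absolute
    azimuth must lie in [5, 25] or [35, 60] degrees."""
    a = abs(azimuth_deg)
    return 5.0 <= a <= 25.0 or 35.0 <= a <= 60.0
```

The nominal positions at +/-15° pass this check, while a speaker placed at 30° azimuth would trigger the error path described above.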
As shown in fig. 3, the present disclosure provides a method for determining a bed output rendering item, the method including:
s210, obtaining a rendering item generated in advance by a rendering item generator;
s220, according to the audio model metadata structure path, selecting a rendering item generated in advance by a rendering item generator, and then determining an output rendering item according to the preset attribute of the audio model metadata; transmitting the audio signal corresponding to the output rendering item to a mixer through a sound bed renderer to form audio output and transmit the audio output to all loudspeaker configurations;
and S230, after the definition type of the output rendering item is determined, converting each audio channel format and the corresponding audio track specification into a sound bed output rendering item in audio channel allocation. In the rendering item determination, the process of rendering item determination of the object (Objects) and the sound Bed (Bed) is similar, only the selection of the type and parameters involved is different.
Optionally, as shown in fig. 6, a rendering item generated in advance by the rendering item generator is selected and the audio model structure is analyzed; when a rendering item is in the selected state, it is selected from among the rendering items within a single audio object. When fully populated, the rendering item selection state represents all the parts constituting a single rendering item (RenderingItem); each rendering item accepts a single rendering item selection state and returns a copy of the rendering item.
Optionally, there is at least one copy of each rendering item, and the copy of the rendering item is filled with more rendering items; each rendering item modifies a nested loop on the state in turn.
Optionally, determining the output rendering item includes: selecting an input starting point, selecting an audio program, selecting audio content, selecting an audio object, complementing audio object processing, audio packet format matching, and outputting a rendering item;
the input starting point selection starts from a plurality of input points in an audio model structure according to elements contained in the audio file; selecting a single audio program (audioprogram) if there are audio program elements; otherwise, if there is an audio object (audioObject) element, all audio objects should be selected; otherwise, all track unique identification (audioTrackUID) sets (called channel only assignment (CHNA) mode) will be selected;
the audio program selection: the user may select the program to be used; if no audio program is selected, the audio program with the lowest ID value is selected;
the audio content selection selects all audio content (audioContent) sets referenced by the selected audio program;
the audio object selection, audio objects being all paths through the audio object hierarchy, starting from the selected audio content (following audio object links);
the complementing audio object processing means that a group of audio objects is selected, from the default audio objects in a defined audio object group to all non-default audio objects in the group, and the default audio objects are overwritten after the audio objects are copied; defining a set of audio objects determines a set of audio objects to be ignored;
the audio packet format matching is to match the audio packet format, the unique audio track identifier and the list of the number of silent audio tracks in the audio object according to the audio packet format (audioPackFormat) and audio channel format (audioChannelFormat) structures; or matching a list of all audio track unique identifiers under the channel allocation mode according to the audio packet format and the audio channel format structure;
and the output rendering item determination finds the root audio packet format, allocates a corresponding track specification (TrackSpec) to each audio channel, and converts all the information found for the root audio packet format into one or more output rendering items (RenderingItems). The set of output rendering items is determined according to the type provided by the root audio packet format. The root audio packet format is the root pack (root_pack), i.e. the top-level audio packet format of all the channels to be allocated, represented in a software program as AudioPackFormat root_pack.
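The program-selection fallback and the final conversion step above can be sketched together in Python; the function names and dict shapes are assumptions for illustration, not the patent's actual structures:

```python
def select_programme(programmes, chosen_id=None):
    """Use the user's chosen audio program if given; otherwise fall back
    to the program with the lowest ID value, as described above."""
    if chosen_id is not None:
        return next(p for p in programmes if p["id"] == chosen_id)
    return min(programmes, key=lambda p: p["id"])

def bed_rendering_items(root_pack, track_spec_by_channel):
    """Given a root audio packet format (the top-level pack of all channels
    to be allocated), pair each channel format with its allocated track
    specification and emit one output rendering item per channel."""
    items = []
    for channel in root_pack["channels"]:
        # Channels without an allocated track fall back to a silent spec.
        spec = track_spec_by_channel.get(channel, {"type": "silent"})
        items.append({"channel_format": channel, "track_spec": spec})
    return items
```

For a three-channel root pack with only two allocated tracks, the third channel yields a rendering item with a silent track specification.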
The additional data, track specifications, and importance of the rendered item are determined by shared component attribute derivation.
A shared component: some data in the rendering item is shared between types and is derived in the same way. The importance data (ImportanceData) object is derived from the item selection state and takes the following values:
the lowest importance specified in all the audio object sets on the path; and the lowest importance specified in any audio packet format on the path from the root audio packet format to the audio channel format.
In both cases, an unspecified importance (None) is treated as the highest importance.
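The importance-combination rule above (take the minimum along the path, with None counting as the highest importance) can be written as a short helper; the function name is an assumption:

```python
def effective_importance(values):
    """Combine importance values along a path: take the minimum of the
    specified values; an unspecified importance (None) counts as the
    highest importance, so it never lowers the result."""
    specified = [v for v in values if v is not None]
    return min(specified) if specified else None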
The extra data (ExtraData) object is derived from the item selection state and takes the following values: the object start (object_start) is the start time of the last audio object on the path (not specified in the channel-only allocation mode).
The object duration (object_duration) is the duration of the last audio object on the path (not specified in the channel-only allocation mode).
The screen reference (reference_screen) is the audio program screen reference of the selected audio program (not specified if none is selected).
The channel frequency (channel_frequency) is the frequency element of the selected audio channel format (not specified if none is selected, such as when creating a scene rendering item).
Optionally, for the sound bed output rendering item, a metadata source (MetadataSource) is created and a rendering item of the preset type is generated for each audio block format (audioBlockFormat) in the selected audio channel format; the audio packet format set attribute includes all audio packet formats on the path between the root audio packet format and the audio channel format. The sound bed output rendering item is encapsulated in a rendering item object of the preset type.
For a sound bed of a defined type, the metadata type (TypeMetadata) of the rendering item contains the audio block format, the list of audio packet format sets containing the audio channel format, and the generic data collected in the extra data. Each audio channel format of the sound bed type can be processed independently, and the rendering item includes an audio track specification. The track definition types are supported by track specification types, including the silent track specification, direct track specification, matrix coefficient track specification, and mixed track specification.
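The four track-specification kinds named above can be sketched as a small class hierarchy; the field names are assumptions, not taken from the patent:

```python
from dataclasses import dataclass, field

@dataclass
class SilentTrackSpec:
    """Channel that carries no input signal."""

@dataclass
class DirectTrackSpec:
    track_index: int  # read the samples of one input track unchanged

@dataclass
class MatrixCoefficientTrackSpec:
    input_spec: object  # another track spec supplying the input signal
    gain: float         # matrix coefficient applied to it

@dataclass
class MixTrackSpec:
    inputs: list = field(default_factory=list)  # track specs whose signals are summed
```

Because matrix-coefficient and mix specs reference other track specs, a single channel's signal can be described as a small tree of these objects.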
For different sound bed (channel) types, the attributes of the audio packet format element are: the audio packet ID, the audio packet name, the channel type descriptor, the channel type description, and the importance of the audio packet.
The channel-based audio type is audio transmitted without any signal modification, each channel's audio signal being sent to its own loudspeaker; each channel carries a preset identifier to ensure that it is routed to the correct speaker. Preferably, channel-based audio channels are processed and converted to other configurations. The renderer receives the audio and metadata, together with information about the desired output format (typically a speaker layout), and after parsing the metadata processes the input audio channels in the sound-producing manner the metadata describes. When the channels match the speaker layout in use, channel-based audio does not need to be rendered; if the input channel configuration does not match the speaker layout, rendering must be used.
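The passthrough-versus-render decision above reduces to a layout comparison; the function name and list-of-labels representation are assumptions:

```python
def needs_rendering(input_channels, speaker_layout):
    """Channel-based audio passes straight through when its channels match
    the target speaker layout; otherwise it must be rendered (e.g.
    down-mixed) to the available speakers."""
    return list(input_channels) != list(speaker_layout)
```

A 5.1 program played on a stereo layout therefore takes the rendering path, while stereo on stereo is passed through unchanged.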
The determination method provides a conversion of audio model metadata into a set of renderable items and is capable of rendering audio signals to all speaker configurations specified in an advanced sound system.
Fig. 4 is a sound bed output rendering item determining apparatus provided in an embodiment of the present disclosure, including:
an obtaining module 310, configured to obtain a rendering item generated in advance by the rendering item generator;
the determining module 320 is configured to select a rendering item pre-generated by the rendering item generator according to the audio model metadata structure path, and determine an output rendering item according to an attribute of preset audio model metadata; transmitting the audio signal corresponding to the output rendering item to a mixer through a sound bed renderer to form audio output and transmit the audio output to all loudspeaker configurations;
a converting module 330, configured to convert each audio channel format and the corresponding audio track specification into a sound bed output rendering item in audio channel allocation after the definition type of the output rendering item is determined.
Optionally, a rendering item generated in advance by the rendering item generator is selected and the audio model structure is analyzed; when a rendering item is in the selected state, it is selected from among the rendering items within a single audio object. When fully populated, the rendering item selection state represents all the parts that constitute a single rendering item; each rendering item accepts a single rendering item selection state and returns a copy of the rendering item.
Optionally, at least one copy of each rendering item is used, and the copies of the rendering items are filled with more rendering items; each rendering item modifies a nested loop on the state in turn.
Optionally, determining the output rendering item includes: selecting an input starting point, selecting an audio program, selecting audio content, selecting an audio object, complementing audio object processing, audio packet format matching, and outputting a rendering item;
the input starting point selection starts from one of several entry points in the audio model structure, according to the elements contained in the audio file: if there are audio program elements, a single audio program is selected; otherwise, if there are audio object elements, all audio objects are selected; otherwise, all audio track unique identification sets are selected (called the channel-only allocation mode);
the audio program selection: the user may select the program to be used; if no audio program is selected, the audio program with the lowest ID value is selected;
the audio content selection selects all audio content sets referenced by the selected audio program;
the audio object selection, audio objects being all paths through the audio object hierarchy, starting from the selected audio content (following audio object links);
the complementing audio object processing means that a group of audio objects is selected, from the default audio objects in a defined audio object group to all non-default audio objects in the group, and the default audio objects are overwritten after the audio objects are copied; defining a set of audio objects determines a set of audio objects to be ignored;
the audio packet format matching is to match the audio packet format, the audio track unique identifier and the list of the silent audio track number in the audio object according to the audio packet format and the audio channel format structure; or matching a list of all audio track unique identifiers under the channel allocation mode according to the audio packet format and the audio channel format structure;
and the output rendering item determination finds the root audio packet format, allocates a corresponding audio track specification to each audio channel, and converts all the information found for the root audio packet format into one or more output rendering item sets. The set of output rendering items is determined according to the type provided by the root audio packet format.
Optionally, the sound bed output rendering item module creates a metadata source, generates a rendering item of the preset type for each audio block format in the selected audio channel format, and the audio packet format set attribute includes all audio packet formats on the path between the root audio packet format and the audio channel format; the sound bed output rendering item is encapsulated in a rendering item object of the preset type.
The sound bed output rendering item determining apparatus provided by the embodiment of the present disclosure can execute the sound bed output rendering item determining method provided by any embodiment of the present disclosure, and has the functional modules and beneficial effects corresponding to the executed method.
Fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure. As shown in fig. 5, the electronic device includes: a processor 410, a memory 420, an input device 430, and an output device 440. There may be one or more processors 410 in the electronic device; one processor 410 is taken as an example in fig. 5. There may be one or more memories 420 in the electronic device; one memory 420 is taken as an example in fig. 5. The processor 410, the memory 420, the input device 430 and the output device 440 of the electronic device may be connected by a bus or by other means; connection by a bus is illustrated in fig. 5. The electronic device can be a computer, a server, or the like. The embodiment of the present disclosure takes a server as the electronic device by way of example; the server may be an independent server or a cluster server. The memory 420, as a computer-readable storage medium, stores software programs, computer-executable programs, and modules, such as the program instructions/modules of the sound bed output rendering item determination apparatus according to any embodiment of the present disclosure. The memory 420 may mainly include a program storage area and a data storage area: the program storage area may store an operating system and the application programs required for at least one function; the data storage area may store data created according to the use of the device, and the like. Further, the memory 420 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. In some examples, the memory 420 may further include memory located remotely from the processor 410, which may be connected to the device through a network.
Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The input device 430 may be used to receive input numeric or character information and to generate key signal inputs related to user settings and function control of the electronic device; it may also be a camera for acquiring images or a sound pickup device for acquiring audio data. The output device 440 may include an audio device such as a speaker. The specific composition of the input device 430 and the output device 440 can be set according to the actual situation.
The processor 410 executes the various functional applications and data processing of the device by running the software programs, instructions, and modules stored in the memory 420, thereby implementing the sound bed output rendering item determination method described above.
The embodiments of the present disclosure also provide a storage medium containing computer-executable instructions which, when executed by a computer processor, perform the sound bed output rendering item determination method of any of the embodiments above.
Of course, in the storage medium containing computer-executable instructions provided by the embodiments of the present disclosure, the instructions are not limited to the method operations described above, and may also perform related operations in the method provided by any embodiment of the present disclosure, with the corresponding functions and advantages.
From the above description of the embodiments, those skilled in the art will clearly appreciate that the present disclosure can be implemented by software together with the necessary general-purpose hardware, and certainly also by hardware alone, although in many cases the former is the preferred implementation. Based on this understanding, the technical solutions of the present disclosure may be embodied in the form of a software product stored in a computer-readable storage medium, such as a floppy disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a flash memory (FLASH), a hard disk, or an optical disk, and including several instructions that cause a computer device (which may be a robot, a personal computer, a server, or a network device) to execute the method according to any embodiment of the present disclosure.
It should be noted that the units and modules included in the above electronic device are merely divided according to functional logic; the division is not limited to the above as long as the corresponding functions can be realized. In addition, the specific names of the functional units are only used to distinguish one unit from another and do not limit the protection scope of the present disclosure.
It should be understood that portions of the present disclosure may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented by software or firmware stored in a memory and executed by a suitable instruction execution system. If implemented in hardware, as in another embodiment, any one or a combination of the following techniques known in the art may be used: a discrete logic circuit having logic gates for implementing logic functions on data signals, an application-specific integrated circuit having suitable combinational logic gates, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), and the like.
In the description of this specification, reference to the terms "in an embodiment," "in another embodiment," "exemplary," or "in a particular embodiment" means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present disclosure. In this specification, such schematic expressions do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
Although the present disclosure has been described in detail above by way of general description, specific embodiments, and experiments, it will be apparent to those skilled in the art that modifications or improvements may be made on the basis of the present disclosure. Accordingly, such modifications and improvements are intended to fall within the claimed scope of this disclosure.

Claims (8)

1. A method for determining a sound bed output rendering item, comprising:
acquiring a rendering item generated in advance by a rendering item generator;
selecting, according to an audio model metadata structure path, a rendering item generated in advance by the rendering item generator, and then determining an output rendering item according to a preset attribute of the audio model metadata; and transmitting the audio signal corresponding to the output rendering item through a sound bed renderer to a mixer to form an audio output, which is transmitted to all loudspeaker configurations;
and, after the definition type of the output rendering item is determined, converting each audio channel format and the corresponding audio track specification in the audio channel allocation into a sound bed output rendering item.
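The selection-then-filtering flow of claim 1 can be sketched as follows. All class, function, and channel names here are illustrative assumptions for the sketch, not part of the claimed method or any real API.

```python
from dataclasses import dataclass

@dataclass
class RenderingItem:
    channel_format: str   # audioChannelFormat the item renders (assumed field)
    track_spec: str       # audio track specification, e.g. a track UID (assumed field)

def determine_bed_output_items(generator_items, metadata_path, preset_attrs):
    """Keep the pre-generated rendering items that lie on the audio model
    metadata structure path, then keep those whose channel format satisfies
    the preset metadata attributes (a sketch of claim 1's first two steps)."""
    on_path = [it for it in generator_items if it.channel_format in metadata_path]
    return [it for it in on_path if preset_attrs.get(it.channel_format, True)]

items = [RenderingItem("FrontLeft", "ATU_00000001"),
         RenderingItem("FrontRight", "ATU_00000002"),
         RenderingItem("LFE", "ATU_00000003")]
# Only items on the metadata path and enabled by the preset attributes survive:
out = determine_bed_output_items(items, {"FrontLeft", "FrontRight"},
                                 {"FrontRight": False})
```

The selected items would then be handed to the bed renderer and mixer; that signal-path stage is omitted from the sketch.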
2. The method of claim 1, wherein selecting the rendering items generated in advance by the rendering item generator comprises analyzing the audio model structure; when a rendering item selection state is selected, a rendering item selection is performed among the rendering items within a single audio object; when fully populated, the rendering item selection state represents all the elements that make up a single rendering item; and each rendering item accepts a single rendering item selection state and returns a copy of the rendering item.
3. The method of claim 2, wherein there is at least one copy of each rendering item, and each rendering item in turn modifies the selection state within nested loops.
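The selection-state mechanism of claims 2 and 3, where each step copies the state inside nested loops until it is fully populated, can be sketched as chained generators. The dictionary layout and step names are assumptions made for illustration.

```python
import copy

def select_programme(states, adm):
    """First selection step: one state copy per audio programme (claim 2:
    each step accepts a single state and returns a copy)."""
    for state in states:
        for programme in adm["programmes"]:
            new_state = copy.deepcopy(state)
            new_state["programme"] = programme
            yield new_state

def select_content(states):
    """Second step: nested loop over the incoming states (claim 3),
    expanding each by the contents of its selected programme."""
    for state in states:
        for content in state["programme"]["contents"]:
            new_state = copy.deepcopy(state)
            new_state["content"] = content
            yield new_state

adm = {"programmes": [{"name": "Main", "contents": ["Bed", "Commentary"]}]}
# Chaining the steps expands one empty state into one fully populated
# selection state per rendering item:
final_states = list(select_content(select_programme([{}], adm)))
```

Copying at each step keeps the partially populated states independent, so one branch of the nested loops cannot corrupt another.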
4. The method of claim 1, wherein determining the output rendering item comprises: input starting point selection, audio program selection, audio content selection, audio object selection, complementary audio object processing, audio packet format matching, and output rendering item determination;
the input starting point selection starts from a plurality of input points in the audio model structure, according to the elements contained in the audio file;
the audio program selection selects the audio program to be used;
the audio content selection selects all audio content sets referenced by the selected audio program;
the audio object selection covers all paths through the audio object hierarchy, starting from the selected audio content;
the complementary audio object processing selects, in each defined complementary audio object group, one audio object from among the default audio object and all non-default audio objects in the group; after the audio objects are copied, the selected object overwrites the default audio object;
the audio packet format matching matches, according to the audio packet format and audio channel format structures, the audio packet format against the list of audio track unique identifiers and silent audio track numbers in the audio object; or, in the channel allocation mode, matches the list of all audio track unique identifiers;
and the output rendering item determination determines the root audio packet format, allocates a corresponding audio track specification to each audio channel, and converts all the information found for the root audio packet format into one or more sets of output rendering items, the sets being determined according to the type of the provided root audio packet format.
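The audio packet format matching step of claim 4 can be sketched as a check that the audio object's track list (with `None` marking a silent audio track) lines up one-to-one with the channel formats of a candidate audio packet format. The data layout is an assumption; a real matcher would also compare format identifiers.

```python
def match_pack_format(pack_channel_formats, track_uids):
    """Return (channel format, track UID) pairs if the candidate pack
    matches the object's track list, else None. A None entry in
    track_uids stands for a silent audio track (claim 4's matching step)."""
    if len(pack_channel_formats) != len(track_uids):
        return None   # track count must match the pack's channel count
    return list(zip(pack_channel_formats, track_uids))

# A stereo pack matches a two-entry track list with a silent right channel:
pairs = match_pack_format(["FrontLeft", "FrontRight"], ["ATU_00000001", None])
# A one-entry track list cannot match a stereo pack:
mismatch = match_pack_format(["FrontLeft", "FrontRight"], ["ATU_00000001"])
```

The successful match yields the per-channel track specifications that the output rendering item determination then converts into rendering item sets.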
5. The method of claim 4, wherein, for the sound bed output rendering item, resource metadata is created and a rendering item of a preset type is generated for each audio block format in the selected audio channel format; the attribute of the audio packet format set includes all audio packet formats on the path between the root audio packet format and the audio channel format; and the sound bed output rendering item is packaged in a rendering item object of the preset type.
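Claim 5's packaging, one preset-type rendering item per selected audio channel format whose pack-format set holds every audio packet format on the path from the root pack to that channel, might look like the sketch below. The tree layout, the names, and the "DirectSpeakers" type (borrowed from common object-based audio practice) are all assumptions, and the per-block-format generation is collapsed to one item per channel for brevity.

```python
def packs_on_path(pack, channel, trail=()):
    """Collect the audio packet formats between the root pack and the
    pack that directly contains the given channel format."""
    trail = trail + (pack["name"],)
    if channel in pack.get("channels", []):
        return list(trail)
    for sub in pack.get("subpacks", []):
        found = packs_on_path(sub, channel, trail)
        if found:
            return found
    return []

def make_bed_items(root_pack, selected_channels):
    """Package each selected channel format into a preset-type rendering
    item carrying its root-to-channel pack-format set (claim 5)."""
    return [{"type": "DirectSpeakers",            # assumed preset type
             "channel_format": ch,
             "pack_formats": packs_on_path(root_pack, ch)}
            for ch in selected_channels]

root = {"name": "5.1",
        "subpacks": [{"name": "Stereo",
                      "channels": ["FrontLeft", "FrontRight"]}]}
items = make_bed_items(root, ["FrontLeft"])
```

Recording the whole root-to-channel path lets the renderer recover any nesting-level metadata (e.g. downmix hints on an intermediate pack) when rendering the bed channel.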
6. A sound bed output rendering item determination apparatus, comprising:
the acquisition module is used for acquiring the rendering item generated in advance by the rendering item generator;
the determining module is used for selecting, according to the audio model metadata structure path, a rendering item generated in advance by the rendering item generator, and then determining an output rendering item according to the preset attribute of the audio model metadata; and for transmitting the audio signal corresponding to the output rendering item through a sound bed renderer to a mixer to form an audio output, which is transmitted to all loudspeaker configurations;
and the conversion module is used for converting, after the definition type of the output rendering item is determined, each audio channel format and the corresponding audio track specification in the audio channel allocation into a sound bed output rendering item.
7. An electronic device, comprising: a memory and one or more processors;
the memory for storing one or more programs;
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-5.
8. A storage medium containing computer-executable instructions for implementing the method of any one of claims 1-5 when executed by a computer processor.
CN202210600880.0A 2022-05-30 2022-05-30 Sound bed output rendering item determination method, device, equipment and storage medium Pending CN115134737A (en)


Publications (1)

Publication Number Publication Date
CN115134737A 2022-09-30



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination