CN114121036A - Audio track unique identification metadata and generation method, electronic device and storage medium - Google Patents

Audio track unique identification metadata and generation method, electronic device and storage medium

Info

Publication number
CN114121036A
Authority
CN
China
Prior art keywords
audio
audio track
unique identification
format
track
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111205630.9A
Other languages
Chinese (zh)
Inventor
吴健
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Saiyinxin Micro Beijing Electronic Technology Co ltd
Original Assignee
Saiyinxin Micro Beijing Electronic Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Saiyinxin Micro Beijing Electronic Technology Co ltd
Priority to CN202111205630.9A
Publication of CN114121036A
Legal status: Pending

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S7/00 Indicating arrangements; Control arrangements, e.g. balance control
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/60 Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G06F16/68 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00 Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/18 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being spectral information of each sub-band

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Acoustics & Sound (AREA)
  • General Physics & Mathematics (AREA)
  • Signal Processing (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Library & Information Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Computer Graphics (AREA)
  • Geometry (AREA)
  • Software Systems (AREA)
  • Stereophonic System (AREA)

Abstract

The present disclosure relates to audio track unique identification metadata and a method for generating it, an electronic device, and a storage medium. The audio track unique identification metadata includes: an attribute area comprising audio track unique identification information and preset audio track description information; and a sub-element area comprising an audio material exchange format lookup sub-element, audio track format reference information, and audio packet format reference information. With this metadata, audio data can reproduce three-dimensional sound in space during rendering, thereby improving the quality of the sound scene.

Description

Audio track unique identification metadata and generation method, electronic device and storage medium
Technical Field
The present disclosure relates to the field of audio processing technologies, and in particular to audio track unique identification metadata and a method for generating it, an electronic device, and a storage medium.
Background
With the development of technology, audio has become more and more complex. Early single-channel audio gave way to stereo, and the focus of work shifted to handling the left and right channels correctly. Processing became more complex once surround sound appeared. The surround 5.1 speaker system imposes an ordering constraint on multiple channels, and the surround 6.1 speaker system, the surround 7.1 speaker system, and the like made audio processing more varied still: the correct signal must be delivered to the proper speaker so that the speakers work together to produce the intended effect. Thus, as sound becomes more immersive and interactive, the complexity of audio processing also increases greatly.
Sound channels (or audio channels) are mutually independent audio signals that are captured or played back at different spatial locations when sound is recorded or played. The number of channels is the number of sound sources during recording or the number of corresponding speakers during playback. For example, a surround 5.1 speaker system comprises audio signals at 6 different spatial locations, each separate audio signal driving a speaker at the corresponding spatial location; a surround 7.1 speaker system comprises audio signals at 8 different spatial locations, each separate audio signal driving a speaker at the corresponding spatial location.
Therefore, the effect achieved by current loudspeaker systems depends on the number and spatial positions of the loudspeakers. For example, a two-channel speaker system cannot achieve the effect of a surround 5.1 speaker system.
To solve the above technical problem, the present disclosure provides audio track unique identification metadata and a method for constructing it.
Disclosure of Invention
The present disclosure is directed to audio track unique identification metadata and a method for generating it, an electronic device, and a storage medium, so as to solve one of the above technical problems.
To achieve the above object, a first aspect of the present disclosure provides audio track unique identification metadata, including:
an attribute area comprising audio track unique identification information and preset audio track description information;
and a sub-element area comprising an audio material exchange format lookup sub-element, audio track format reference information, and audio packet format reference information.
To achieve the above object, a second aspect of the present disclosure provides a method for generating audio track unique identification metadata, including:
generating audio track unique identification metadata as described in the first aspect.
To achieve the above object, a third aspect of the present disclosure provides an electronic device, including: a memory and one or more processors;
the memory being configured to store one or more programs;
the one or more programs, when executed by the one or more processors, cause the one or more processors to generate a data stream comprising audio track unique identification metadata as described in the first aspect.
To achieve the above object, a fourth aspect of the present disclosure provides a storage medium containing computer-executable instructions which, when executed by a computer processor, generate audio track unique identification metadata as described in the first aspect.
As can be seen from the above, the audio track unique identification metadata of the present disclosure uniquely identifies an audio track or asset in a file or recording of an audio scene, so that three-dimensional sound can be reproduced in space, thereby improving the quality of the sound scene.
Drawings
Fig. 1 is a schematic diagram of a three-dimensional sound audio production model provided in embodiment 1 of the present disclosure;
Fig. 2 is a flowchart of a method for generating audio track unique identification metadata according to embodiment 2 of the present disclosure;
Fig. 3 is a schematic structural diagram of an electronic device provided in embodiment 3 of the present disclosure.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be further noted that, for the convenience of description, only some of the structures related to the present invention are shown in the drawings, not all of the structures.
As shown in Fig. 1, the three-dimensional sound audio production model is composed of a set of production elements, each describing one stage of audio production, and includes a content production section and a format production section.
The content production section includes: an audio program element, an audio content element, an audio object element, and an audio track unique identification element. The format production section includes: an audio packet format element, an audio channel format element, an audio stream format element, and an audio track format element.
The audio program element references at least one audio content element; the audio content element references at least one audio object element; the audio object element references the corresponding audio packet format element and the corresponding audio track unique identification element; the audio track unique identification element references the corresponding audio track format element and the corresponding audio packet format element.
The audio packet format element references at least one audio channel format element; the audio stream format element references the corresponding audio channel format element and the corresponding audio packet format element; the audio track format element and the corresponding audio stream format element reference each other. The reference relationships between elements are indicated by arrows in Fig. 1.
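The reference relationships listed above can be written down as a small directed graph. The following is only an illustrative sketch: the dictionary keys are descriptive labels chosen here, not identifiers defined by the disclosure, and the `reachable` helper is a hypothetical utility for exploring the graph.

```python
# Reference relationships between production elements, as described in the text.
# Keys and values are descriptive labels chosen for this sketch.
REFERENCES = {
    "audio program": ["audio content"],
    "audio content": ["audio object"],
    "audio object": ["audio packet format", "audio track unique identification"],
    "audio track unique identification": ["audio track format", "audio packet format"],
    "audio packet format": ["audio channel format"],
    "audio stream format": [
        "audio channel format",
        "audio packet format",
        "audio track format",
    ],
    "audio track format": ["audio stream format"],
}


def reachable(start):
    """Return every element type reachable from `start` by following references."""
    seen = set()
    stack = [start]
    while stack:
        node = stack.pop()
        for target in REFERENCES.get(node, []):
            if target not in seen:
                seen.add(target)
                stack.append(target)
    return seen


if __name__ == "__main__":
    for name in sorted(reachable("audio track unique identification")):
        print(name)
```

Starting from the audio program element, every other element type in the model can be reached, which reflects how a renderer could walk from a program down to the format of an individual track.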
An audio program may include, but is not limited to, narration, sound effects, and background music. The audio program element describes a program that includes at least one piece of content, and each audio content element describes one piece of content of the corresponding audio program element. An audio program element may reference one or more audio content elements, which are grouped together to construct a complete audio program.
The audio content element describes the content of one component of an audio program, such as background music, and relates that content to its format by referencing one or more audio object elements.
The audio object element is used to establish the relationship between content, format, and assets, and to determine the audio track unique identification of the actual audio track.
The audio packet format element may be used to describe the format adopted when the audio object element and the original audio data are packed according to channel packets.
The audio channel format element may be used to represent a single sequence of audio samples and the preset operations performed on it, such as the movement of a rendered object in a scene. The audio channel format element may comprise at least one audio block format element. The audio block format elements may be regarded as sub-elements of the audio channel format element, so there is an inclusion relationship between the audio channel format element and the audio block format elements.
A stream is a combination of audio tracks needed to render a channel, an object, a higher-order ambisonics component, or a packet. The audio stream format establishes a relationship between a set of audio track formats and a set of audio channel formats or audio packet formats.
An audio track format element corresponds to a set of samples or data in a single audio track in a storage medium. It describes the format of the data, allowing the renderer to decode the signal correctly. It is derived from an audio stream format element, which identifies the combination of audio tracks needed to successfully decode the audio track data.
After the original audio data is processed through the three-dimensional sound audio production model, synthesized audio data containing metadata is generated.
Metadata is information describing the characteristics of data; the functions it supports include indicating a storage location, recording history, looking up resources, and recording files.
After the synthesized audio data is transmitted to a remote end over a communication link, the remote end renders the synthesized audio data based on the metadata to restore the original sound scene.
The division between content production, format production, and the BW64 (Broadcast Wave 64 bit) file is shown in Fig. 1. The content production section and the format production section together constitute metadata in XML format, which is typically contained in one block (the "axml" block) of the BW64 file. The BW64 file portion at the bottom contains a "channel allocation (chna)" block, which is a look-up table used to link metadata to the audio programs in the file.
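The look-up role of the "chna" block can be pictured as a table whose rows tie a track position in the file to the identifiers used in the XML metadata. The sketch below models only that idea in memory; the row fields, the 1-based track index, and the function name are assumptions made for illustration and do not describe the binary layout of the block.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ChnaRow:
    """One row of a hypothetical in-memory channel allocation table."""
    track_index: int        # position of the track in the file (assumed 1-based)
    track_uid: str          # audio track unique identification, e.g. "ATU_00000001"
    track_format_ref: str   # referenced audio track format identifier
    pack_format_ref: str    # referenced audio packet format identifier


def find_track_index(table: List[ChnaRow], track_uid: str) -> Optional[int]:
    """Return the file track index linked to `track_uid`, or None if absent."""
    for row in table:
        if row.track_uid == track_uid:
            return row.track_index
    return None


table = [
    ChnaRow(1, "ATU_00000001", "AT_00010001_01", "AP_00010002"),
    ChnaRow(2, "ATU_00000002", "AT_00010002_01", "AP_00010002"),
]

if __name__ == "__main__":
    print(find_track_index(table, "ATU_00000002"))
```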
Example 1
The present disclosure provides, and describes in detail, audio track unique identification metadata in the three-dimensional sound audio production model.
The audio track unique identification metadata includes:
an attribute area comprising audio track unique identification information and preset audio track description information;
and a sub-element area comprising an audio material exchange format lookup sub-element, audio track format reference information, and audio packet format reference information.
The attribute area includes the generic definition of the audio track unique identification (audioTrackUID) metadata. The audio track unique identification information uniquely identifies an audio track or asset in a file or recording of an audio scene. It may be composed of letters and digits and serves as the identifier of the actual audio track to which the audio track unique identification metadata corresponds; for example, the audio track unique identification information may be ATU_00000001. The attribute area may further include preset audio track description information, that is, information on the bit depth and sampling rate of the track. The preset audio track description information may include: the track sampling rate and the track bit depth.
The audio track unique identification metadata may also include sub-elements that allow the three-dimensional sound audio production model to be used in non-BW64 applications by performing the job of the "channel allocation (chna)" block. When the three-dimensional sound audio production model is used together with a Material Exchange Format (MXF) file, the audio material exchange format lookup sub-element is used. The sub-elements reference audio elements in the file, that is, the audio track format reference information and the audio packet format reference information. The audio packet format reference information indicates the audio packet format referenced by the audio track unique identification metadata; it may be audio packet identification information that uniquely identifies that audio packet format, where the audio packet identification information is the information that identifies the audio packet format in an audio packet format element. The audio track format reference information indicates the audio track format referenced by the audio track unique identification metadata; it may be audio track identification information that uniquely identifies that audio track format, where the audio track identification information is the information that identifies the audio track format in an audio track format element.
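A minimal sketch of how the attribute area and the two reference sub-elements might be held and sanity-checked in code follows. The class name, field names, and the identifier pattern are assumptions: the text only states that the identifier consists of letters and digits and gives ATU_00000001 as an example, so the "ATU_ plus eight hexadecimal digits" rule is a guess made for illustration.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Assumed pattern: "ATU_" followed by eight hexadecimal digits, matching the
# example "ATU_00000001". The source text does not spell out this rule.
UID_PATTERN = re.compile(r"^ATU_[0-9A-Fa-f]{8}$")


@dataclass
class AudioTrackUid:
    """Hypothetical container for audio track unique identification metadata."""
    uid: str                                  # audio track unique identification information
    sample_rate: Optional[int] = None         # track sampling rate in Hz
    bit_depth: Optional[int] = None           # track bit depth in bits
    track_format_ref: Optional[str] = None    # audio track format reference information
    pack_format_ref: Optional[str] = None     # audio packet format reference information

    def problems(self) -> List[str]:
        """Return human-readable problems; an empty list means none were found."""
        found = []
        if not UID_PATTERN.match(self.uid):
            found.append(f"unexpected UID form: {self.uid!r}")
        if self.sample_rate is not None and self.sample_rate <= 0:
            found.append("sample rate must be positive")
        if self.bit_depth is not None and self.bit_depth <= 0:
            found.append("bit depth must be positive")
        return found


if __name__ == "__main__":
    track = AudioTrackUid("ATU_00000001", 48000, 24, "AT_00010001_01", "AP_00010002")
    print(track.problems())
```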
The attribute area includes the information shown in Table 1.
TABLE 1
[Table 1 appears as an image in the source publication (BDA0003306714630000061).]
In Table 1, the requirement item indicates whether the attribute must be set when generating the audio track unique identification metadata: "yes" indicates that the attribute is required, and "optional" indicates that the attribute is optional.
The sub-element area includes the information shown in Table 2.
TABLE 2
[Table 2 appears as an image in the source publication (BDA0003306714630000062).]
The quantity item in Table 2 indicates the number of sub-elements that can be set. The audio track unique identification may reference the audio track format and may also reference the audio packet format; for example, when the audio track format is referenced, the number of audioTrackFormatIDRef sub-elements is 1.
Optionally, the audio material interchange format lookup sub-element includes MXF package track identification reference information, MXF track reference information, and channel track reference information.
MXF uses the terms "track" and "channel" with meanings different from those in the three-dimensional sound audio production model of the embodiments of the present disclosure. In MXF, a "track" is a storage medium containing audio or video, which for audio can be subdivided into "channels". The audio material exchange format lookup sub-element includes the information shown in Table 3.
TABLE 3
[Table 3 appears as an image in the source publication (BDA0003306714630000071).]
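Because Table 3 is not reproduced here, the exact element names of the lookup sub-element cannot be taken from this text. The sketch below therefore uses the names `audioMXFLookUp`, `packageUIDRef`, `trackIDRef`, and `channelIDRef` purely as assumptions, chosen to mirror the three pieces of reference information named above (MXF package, MXF track, and channel); the values are placeholders.

```python
import xml.etree.ElementTree as ET


def build_mxf_lookup(package_uid: str, track_id: str, channel_id: str) -> ET.Element:
    """Build a lookup sub-element with three references (element names assumed)."""
    lookup = ET.Element("audioMXFLookUp")
    ET.SubElement(lookup, "packageUIDRef").text = package_uid  # MXF package reference
    ET.SubElement(lookup, "trackIDRef").text = track_id        # MXF track reference
    ET.SubElement(lookup, "channelIDRef").text = channel_id    # channel within the MXF track
    return lookup


# Placeholder values only; real identifiers would come from the MXF file.
lookup = build_mxf_lookup("example-package-uid", "example-track-id", "example-channel-id")

if __name__ == "__main__":
    print(ET.tostring(lookup, encoding="unicode"))
```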
Example 2
The present disclosure also provides a method embodiment corresponding to the above embodiment: a method for generating audio track unique identification metadata. Terms with the same names have the same meanings as in the above embodiment and provide the same technical effects, so the details are not repeated here.
A method for generating audio track unique identification metadata, as shown in Fig. 2, comprises the following step:
Step S110: in response to a setting operation by a user for audio track unique identification metadata, generating audio track unique identification metadata including:
an attribute area comprising audio track unique identification information and preset audio track description information;
and a sub-element area comprising an audio material exchange format lookup sub-element, audio track format reference information, and audio packet format reference information.
The setting operation of the user for the audio track unique identification metadata may take one of several forms. The user may set the relevant attributes of the audio track unique identification metadata directly, for example by entering them item by item. Alternatively, the audio track unique identification metadata may be generated automatically in response to the user's operation of a preset metadata generation program that sets all attributes of the metadata according to the system's default attributes. As a further alternative, the preset metadata generation program may set some of the attributes according to the system's default attributes and then receive the remaining attributes from the user.
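The three ways of setting attributes described above (all entered by the user, all taken from system defaults, or a mix of both) can be covered by one small merge step. This is a sketch under stated assumptions: the function name, the attribute keys, and the default values are hypothetical and are not specified by the disclosure.

```python
def resolve_attributes(user_input: dict, system_defaults: dict) -> dict:
    """Start from the system defaults and let user-entered values override them."""
    resolved = dict(system_defaults)
    for name, value in user_input.items():
        if value is not None:  # None means the user left the item blank
            resolved[name] = value
    return resolved


# Hypothetical default attributes for illustration only.
SYSTEM_DEFAULTS = {"UID": "ATU_00000001", "sampleRate": "48000", "bitDepth": "24"}

# All attributes from defaults, a mix, and all attributes entered by the user.
all_defaults = resolve_attributes({}, SYSTEM_DEFAULTS)
partly_user = resolve_attributes({"bitDepth": "16", "sampleRate": None}, SYSTEM_DEFAULTS)
all_user = resolve_attributes(
    {"UID": "ATU_00000002", "sampleRate": "96000", "bitDepth": "16"}, {}
)

if __name__ == "__main__":
    print(all_defaults)
    print(partly_user)
    print(all_user)
```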
Optionally, the preset audio track description information includes: track sampling rate and track bit depth.
Optionally, the audio packet format reference information is audio packet identification information of the audio packet format referenced by the audio track unique identification metadata.
Optionally, the audio track format reference information is audio track identification information of the audio track format referenced by the audio track unique identification metadata.
Optionally, the audio material interchange format lookup sub-element includes MXF package track identification reference information, MXF track reference information, and channel track reference information.
For example, the audio track unique identification metadata may be encoded as follows:
<audioTrackUID UID="ATU_00000001" sampleRate="48000" bitDepth="24">
  <audioTrackFormatIDRef>AT_00010001_01</audioTrackFormatIDRef>
  <audioPackFormatIDRef>AP_00010002</audioPackFormatIDRef>
</audioTrackUID>
The audio track unique identification metadata generated by this method uniquely identifies an audio track or asset in a file or recording of an audio scene, so that three-dimensional sound can be reproduced in space, thereby improving the quality of the sound scene.
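As an illustration of step S110, the encoding shown above could be produced with Python's standard `xml.etree.ElementTree` module. The function name and parameter names are hypothetical; the element and attribute names are taken from the example encoding, and the reference sub-elements are written only when a value is supplied.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def build_audio_track_uid(
    uid: str,
    sample_rate: int,
    bit_depth: int,
    track_format_ref: Optional[str] = None,
    pack_format_ref: Optional[str] = None,
) -> ET.Element:
    """Build an audioTrackUID element with its attribute area and sub-elements."""
    element = ET.Element(
        "audioTrackUID",
        {"UID": uid, "sampleRate": str(sample_rate), "bitDepth": str(bit_depth)},
    )
    if track_format_ref is not None:
        ET.SubElement(element, "audioTrackFormatIDRef").text = track_format_ref
    if pack_format_ref is not None:
        ET.SubElement(element, "audioPackFormatIDRef").text = pack_format_ref
    return element


element = build_audio_track_uid(
    "ATU_00000001", 48000, 24, "AT_00010001_01", "AP_00010002"
)
xml_text = ET.tostring(element, encoding="unicode")

if __name__ == "__main__":
    print(xml_text)
```

Parsing `xml_text` back with `ET.fromstring` gives access to the same attributes and references, which is one way a receiving side could read the metadata before rendering.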
Example 3
Fig. 3 is a schematic structural diagram of an electronic device provided in embodiment 3 of the present disclosure. As shown in fig. 3, the electronic apparatus includes: a processor 30, a memory 31, an input device 32, and an output device 33. The number of the processors 30 in the electronic device may be one or more, and one processor 30 is taken as an example in fig. 3. The number of the memories 31 in the electronic device may be one or more, and one memory 31 is taken as an example in fig. 3. The processor 30, the memory 31, the input device 32 and the output device 33 of the electronic apparatus may be connected by a bus or other means, and fig. 3 illustrates the connection by a bus as an example. The electronic device can be a computer, a server and the like. The embodiment of the present disclosure describes in detail by taking an electronic device as a server, and the server may be an independent server or a cluster server.
The memory 31 serves as a computer-readable storage medium that may be used to store software programs, computer-executable programs, and modules, such as program instructions/modules for generating unique identification metadata for audio tracks, as described in any embodiment of the present disclosure. The memory 31 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created according to use of the device, and the like. Further, the memory 31 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device. In some examples, the memory 31 may further include memory located remotely from the processor 30, which may be connected to the device over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The input device 32 may be used to receive input numeric or character information and to generate key signal inputs related to user settings and function controls of the electronic device; it may also include a camera for capturing images and a sound pickup device for capturing audio data. The output device 33 may include an audio device such as a speaker. It should be noted that the specific composition of the input device 32 and the output device 33 can be set according to actual conditions.
The processor 30 executes the various functional applications and data processing of the device, that is, generates the audio track unique identification metadata, by running the software programs, instructions, and modules stored in the memory 31.
Example 4
Embodiment 4 of the present disclosure also provides a storage medium containing computer-executable instructions which, when executed by a computer processor, generate the audio track unique identification metadata described in embodiment 1.
Of course, the computer-executable instructions contained in the storage medium provided by the embodiments of the present disclosure are not limited to the operations of the method described above; they may also perform related operations of the method provided by any embodiment of the present disclosure, with the corresponding functions and advantages.
From the above description of the embodiments, it is clear to a person skilled in the art that the present disclosure can be implemented by software together with the necessary general-purpose hardware, and can also be implemented by hardware alone, although in many cases the former is the better implementation. Based on this understanding, the technical solutions of the present disclosure may be embodied in the form of a software product, which may be stored in a computer-readable storage medium, such as a floppy disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a flash memory (FLASH), a hard disk, or an optical disk of a computer, and which includes several instructions for enabling a computer device (which may be a robot, a personal computer, a server, or a network device) to execute the method according to any embodiment of the present disclosure.
It should be noted that, in the electronic device, the units and modules included in the electronic device are merely divided according to functional logic, but are not limited to the above division as long as the corresponding functions can be realized; in addition, specific names of the functional units are only used for distinguishing one functional unit from another, and are not used for limiting the protection scope of the present disclosure.
It should be understood that portions of the present disclosure may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
In the description herein, references to the description of the term "in an embodiment," "in yet another embodiment," "exemplary" or "in a particular embodiment," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the disclosure. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
Although the present disclosure has been described in detail hereinabove with respect to general description, specific embodiments and experiments, it will be apparent to those skilled in the art that some modifications or improvements may be made based on the present disclosure. Accordingly, such modifications and improvements are intended to be within the scope of this disclosure, as claimed.

Claims (8)

1. Audio track unique identification metadata, comprising:
an attribute area comprising audio track unique identification information and preset audio track description information;
and a sub-element area comprising an audio material exchange format lookup sub-element, audio track format reference information, and audio packet format reference information.
2. The audio track unique identification metadata according to claim 1, wherein the preset audio track description information comprises: a track sampling rate and a track bit depth.
3. The audio track unique identification metadata according to claim 1, wherein the audio packet format reference information is audio packet identification information of an audio packet format to which the audio track unique identification metadata refers.
4. The audio track unique identification metadata according to claim 1, wherein the audio track format reference information is audio track identification information of an audio track format to which the audio track unique identification metadata refers.
5. The audio track unique identification metadata according to claim 1, wherein the audio material exchange format lookup sub-element comprises MXF package track identification reference information, MXF track reference information, and channel track reference information.
6. A method of generating audio track unique identification metadata, characterized in that it comprises generating the audio track unique identification metadata according to any of claims 1-5.
7. An electronic device, comprising: a memory and one or more processors;
the memory being configured to store one or more programs;
the one or more programs, when executed by the one or more processors, cause the one or more processors to generate a data stream comprising audio track unique identification metadata as recited in any of claims 1-5.
8. A storage medium containing computer-executable instructions which, when executed by a computer processor, generate audio track unique identification metadata as claimed in any one of claims 1 to 5.
CN202111205630.9A 2021-10-15 2021-10-15 Audio track unique identification metadata and generation method, electronic device and storage medium Pending CN114121036A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111205630.9A CN114121036A (en) 2021-10-15 2021-10-15 Audio track unique identification metadata and generation method, electronic device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111205630.9A CN114121036A (en) 2021-10-15 2021-10-15 Audio track unique identification metadata and generation method, electronic device and storage medium

Publications (1)

Publication Number Publication Date
CN114121036A (en) 2022-03-01

Family

ID=80376336

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111205630.9A Pending CN114121036A (en) 2021-10-15 2021-10-15 Audio track unique identification metadata and generation method, electronic device and storage medium

Country Status (1)

Country Link
CN (1) CN114121036A (en)

Similar Documents

Publication Publication Date Title
JP6174326B2 (en) Acoustic signal generating device and acoustic signal reproducing device
CN104506920A (en) Method and device for playing omnimedia data information
CN113905321A (en) Object-based audio channel metadata and generation method, device and storage medium
CN114121036A (en) Audio track unique identification metadata and generation method, electronic device and storage medium
US20090088879A1 (en) Audio reproduction device and method for audio reproduction
CN114512152A (en) Method, device and equipment for generating broadcast audio format file and storage medium
CN114051194A (en) Audio track metadata and generation method, electronic equipment and storage medium
CN114023339A (en) Audio-bed-based audio packet format metadata and generation method, device and medium
CN114143695A (en) Audio stream metadata and generation method, electronic equipment and storage medium
CN113889128A (en) Audio production model and generation method, electronic equipment and storage medium
US8117241B2 (en) Method and apparatus for generating media-exchangeable multimedia data and method and apparatus for reconstructing media-exchangeable multimedia data
CN114203189A (en) Method, apparatus and medium for generating metadata based on binaural audio packet format
CN115190412A (en) Method, device and equipment for generating internal data structure of renderer and storage medium
CN114023340A (en) Object-based audio packet format metadata and generation method, apparatus, and medium
CN113905322A (en) Method, device and storage medium for generating metadata based on binaural audio channel
CN113923264A (en) Scene-based audio channel metadata and generation method, device and storage medium
CN114360556A (en) Serial audio metadata frame generation method, device, equipment and storage medium
CN114203188A (en) Scene-based audio packet format metadata and generation method, device and storage medium
CN114530157A (en) Audio metadata channel allocation block generation method, apparatus, device and medium
CN113938811A (en) Audio channel metadata based on sound bed, generation method, equipment and storage medium
CN114363792A (en) Transmission audio track format serial metadata generation method, device, equipment and medium
CN115529548A (en) Speaker channel generation method and device, electronic device and medium
CN115426611A (en) Method and apparatus for rendering object-based audio using metadata
CN113963724A (en) Audio content metadata and generation method, electronic device and storage medium
CN114979935A (en) Object output rendering item determination method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination