CN113990355A - Audio program metadata and generation method, electronic device and storage medium - Google Patents


Info

Publication number
CN113990355A
CN113990355A
Authority
CN
China
Prior art keywords
audio
audio program
information
screen
program
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111102045.6A
Other languages
Chinese (zh)
Inventor
吴健 (Wu Jian)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Saiyinxin Micro Beijing Electronic Technology Co ltd
Original Assignee
Saiyinxin Micro Beijing Electronic Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Saiyinxin Micro Beijing Electronic Technology Co ltd filed Critical Saiyinxin Micro Beijing Electronic Technology Co ltd
Priority to CN202111102045.6A
Publication of CN113990355A
Legal status: Pending (current)

Classifications

    • G PHYSICS
        • G06 COMPUTING; CALCULATING OR COUNTING
            • G06F ELECTRIC DIGITAL DATA PROCESSING
                • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
                    • G06F 16/60 of audio data
                        • G06F 16/68 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
                            • G06F 16/683 using metadata automatically derived from the content
                            • G06F 16/686 using information manually generated, e.g. tags, keywords, comments, title or artist information, time, location or usage information, user ratings
        • G11 INFORMATION STORAGE
            • G11B INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
                • G11B 20/00 Signal processing not specific to the method of recording or reproducing; Circuits therefor
                    • G11B 20/10 Digital recording or reproducing
                        • G11B 20/10527 Audio or video recording; Data buffering arrangements
                            • G11B 2020/10537 Audio or video recording
                                • G11B 2020/10546 specifically adapted for audio data
                                • G11B 2020/10592 specifically adapted for recording or reproducing multichannel signals
                                    • G11B 2020/10601 surround sound signal
                        • G11B 20/12 Formatting, e.g. arrangement of data block or words on the record carriers
                            • G11B 2020/1291 wherein the formatting serves a specific purpose
                                • G11B 2020/1298 Enhancement of the signal quality

Abstract

The present disclosure relates to audio program metadata, a method for generating them, an electronic device, and a storage medium. The audio program metadata comprise: an attribute area containing an audio program identifier and an audio program name of an audio program, the audio program identifier comprising audio program information created with reference to one or more audio contents; and a sub-element area containing audio content reference information, i.e. information on the one or more audio contents referenced when the audio program is played. The audio program metadata describe the metadata format of audio program playback and enable control of audio content acquisition, playing time, playing loudness and the playing screen during playback, thereby improving the quality of the audio playing scene.

Description

Audio program metadata and generation method, electronic device and storage medium
Technical Field
The present disclosure relates to the field of audio processing technologies, and in particular to audio program metadata, a method for generating audio program metadata, an electronic device, and a storage medium.
Background
With the development of technology, audio has become more and more complex. Early mono audio gave way to stereo, and production work then centered on handling the left and right channels correctly. The process became more involved once surround sound appeared. A surround 5.1 speaker system imposes an ordering constraint on multiple channels, and surround 6.1, surround 7.1 and similar systems diversify audio processing further: the correct signal must be delivered to the proper speaker so that the speakers work together as a whole. Thus, as sound becomes more immersive and interactive, the complexity of audio processing also increases greatly.
An audio channel (or sound channel) refers to an audio signal that is captured or played back at a particular spatial position and is independent of the other channels. The number of channels is the number of sound sources during recording, or the number of corresponding speakers during playback. For example, a surround 5.1 speaker system comprises audio signals at 6 different spatial positions, and each separate audio signal drives the speaker at its corresponding position; a surround 7.1 speaker system comprises audio signals at 8 different spatial positions, each driving the speaker at its corresponding position.
Therefore, the effect achieved by current loudspeaker systems depends on the number and spatial positions of the loudspeakers. For example, a two-channel speaker system cannot achieve the effect of a surround 5.1 speaker system.
To provide metadata capable of addressing the above technical problems, the present disclosure provides audio program metadata and a method for generating them.
Disclosure of Invention
The present disclosure is directed to audio program metadata, a generation method, an electronic device, and a storage medium, so as to solve at least one of the above technical problems.
To achieve the above object, a first aspect of the present disclosure provides audio program metadata, including:
the attribute area comprises an audio program identifier and an audio program name of an audio program, wherein the audio program identifier comprises audio program information created by referring to one or more audio contents;
a sub-element region comprising: audio content reference information including audio content information referenced when an audio program is played, the audio content reference information including one or more audio content information referenced.
To achieve the above object, a second aspect of the present disclosure provides a method for generating audio program metadata, including:
generating audio program metadata as described in the first aspect.
To achieve the above object, a third aspect of the present disclosure provides an electronic device, including: a memory and one or more processors;
the memory for storing one or more programs;
the one or more programs, when executed by the one or more processors, cause the one or more processors to generate audio program metadata as described in the first aspect.
To achieve the above object, a fourth aspect of the present disclosure provides a storage medium containing computer-executable instructions which, when executed by a computer processor, generate audio program metadata as described in the first aspect.
From the above, the audio program metadata of the present disclosure comprise: an attribute area containing an audio program identifier and an audio program name of an audio program, the audio program identifier comprising audio program information created with reference to one or more audio contents; and a sub-element area containing audio content reference information, i.e. information on the one or more audio contents referenced when the audio program is played. The audio program metadata describe the metadata format of audio program playback and enable control of audio content acquisition, playing time, playing loudness and the playing screen during playback, thereby improving the quality of the audio playing scene.
Drawings
Fig. 1 is a schematic diagram of a multi-dimensional acoustic audio production model provided in embodiment 1 of the present disclosure;
fig. 2 is a schematic structural diagram of audio program metadata provided in embodiment 1 of the present disclosure;
fig. 3 is a flowchart of a method for generating audio program metadata provided in embodiment 2 of the present disclosure;
fig. 4 is a schematic structural diagram of an electronic device provided in embodiment 3 of the present disclosure.
Detailed Description
The following examples are intended to illustrate the present disclosure, but are not intended to limit the scope of the present disclosure.
Metadata is information that describes the structural characteristics of data; the functions it supports include indicating storage locations, recording historical data, looking up resources, and keeping file records.
As shown in fig. 1, the multi-dimensional audio production model is composed of a set of production elements, each of which uses metadata to describe the structural characteristics of the data produced at the corresponding stage of audio production. The model comprises a content production part and a format production part.
The production elements of the content production part include: an audio program element, an audio content element, an audio object element, and an audio track unique identification element.
An audio program includes narration, sound effects and background music; it references one or more audio contents, which are combined to construct a complete audio program. The audio program element is used to produce the audio program, and the generated audio program metadata describe the structural characteristics of the audio program.
Audio content describes one component of an audio program, such as the background music, and relates that content to its format by referencing one or more audio objects. The audio content element is used to produce the audio content, and the generated audio content metadata describe the structural characteristics of the audio content.
An audio object is used to establish content, format and other valuable information and to determine the audio track unique identifications of the actual audio tracks. The audio object element is used to produce the audio object, and the generated audio object metadata describe the structural characteristics of the audio object.
The audio track unique identification element is used to produce the audio track unique identification, and the generated metadata describe the structural characteristics of the audio track unique identification.
The production elements of the format production part include: an audio packet format element, an audio channel format element, an audio stream format element, and an audio track format element.
The audio packet format is the format used when the metadata of audio objects and the audio stream data are packed into channel packets; an audio packet format may contain nested audio packet formats. The audio packet format element is used to produce audio packet data. The audio packet data include audio packet format metadata, which describe the structural characteristics of the audio packet format.
The audio channel format represents a single sequence of audio samples on which certain operations may be performed, such as moving a rendered object in a scene. An audio channel format may contain nested audio channel formats. The audio channel format element is used to produce audio channel data. The audio channel data include audio channel format metadata, which describe the structural characteristics of the audio channel format.
An audio stream is a combination of the audio tracks needed to render a channel, an object, a higher-order ambisonic component or a packet. The audio stream format establishes the relationship between a set of audio track formats and a set of audio channel formats or audio packet formats. The audio stream format element is used to produce audio stream data. The audio stream data include audio stream format metadata, which describe the structural characteristics of the audio stream format.
The audio track format corresponds to a set of samples or data in a single audio track on the storage medium; it describes the original audio data and the signal decoded for the renderer. The audio track format is derived from the original audio data and identifies the combination of audio tracks required to decode the track data successfully. The audio track format element is used to produce audio track data. The audio track data include audio track format metadata, which describe the structural characteristics of the audio track format.
Each stage of the multi-dimensional audio production model produces metadata that describes the characteristics of that stage.
After the audio channel data produced with the multi-dimensional audio production model are transmitted to the far end over a communication link, the far end renders the audio channel data stage by stage based on the metadata, restoring the produced sound scene.
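For orientation, the following is a hedged sketch of how the content-production elements could reference one another in an XML metadata file. It loosely follows the layout of the audio definition model in Recommendation ITU-R BS.2076 (cited below); every element name, identifier and value shown here is illustrative rather than a definitive representation of the present model:
<audioProgramme audioProgrammeID="APR_1001" audioProgrammeName="one song">
<audioContentIDRef>ACO_1001</audioContentIDRef>
</audioProgramme>
<audioContent audioContentID="ACO_1001" audioContentName="background music">
<audioObjectIDRef>AO_1001</audioObjectIDRef>
</audioContent>
<audioObject audioObjectID="AO_1001" audioObjectName="music bed">
<audioPackFormatIDRef>AP_00010002</audioPackFormatIDRef>
<audioTrackUIDRef>ATU_00000001</audioTrackUIDRef>
</audioObject>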
Example 1
The present disclosure provides and describes in detail audio program metadata in a multi-dimensional acoustic audio model.
The audio program (audioProgramme) is the top-level element in the multi-dimensional sound audio model. It represents a whole program, or one version of a program, and is composed of one or more audio contents; one or more audio programs can form an audio file. The audio program carries the time at which it begins and the time at which it ends, so audio and video can be synchronized through time control. It also contains a loudness element, which records the loudness information of the program and, where needed, can provide the screen size used during audio production to a production center or to the user. Furthermore, if an audio file contains several audio programs and neither the playing order nor the first program to play has been established, the audio programs are played according to their ordering information.
As shown in fig. 2, the audio program metadata 100 includes an attribute section 110 and a sub-element section 120.
The attribute area 110 includes an audio program identifier 111 and an audio program name 112 of the audio program.
The audio program identification 111 includes audio program information created with reference to one or more audio contents.
In the embodiment of the present disclosure, the audio program identifier 111 is an identifier or label of an audio program. For example, if the identifier of an audio program is "music001" and the corresponding audio program name is set to "one song", then the audio program named "one song" can be obtained through the identifier "music001". In a computer language, the audio program identifier 111 may be described as:
<audioProgramme audioProgrammeID="APR_1001">
This indicates that the audio program identifier 111 of a certain audio program is "APR_1001"; through this identifier, the corresponding audio program name and audio content information can be obtained.
The audio program name 112 is the specific name of an audio program. For example, if the name of an audio program is "one song" and its corresponding audio program identifier 111 is "music001", the audio program named "one song" can be obtained through the identifier "music001".
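Putting the identifier and the name together, a hedged sketch of the attribute area could read as follows (the attribute names and values are illustrative):
<audioProgramme audioProgrammeID="APR_1001" audioProgrammeName="one song">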
The sub-element region 120 includes audio content reference information 121, which records the audio content information referenced when the audio program is played; one or more audio contents may be referenced.
In the embodiment of the present disclosure, an audio program may be composed of one or more independent audio contents. For example, during audio editing an entire audio file may be split into several small audio files; to play the entire audio program, these small audio files must be played as the contents of the program. The role of the audio content reference information 121 is to locate those small audio files and play them as the audio contents of the program. In a concrete implementation, each referenced audio content is added to the list of the audio program, for example:
<audioContentIDRef>ACO_1001</audioContentIDRef>
<audioContentIDRef>ACO_1002</audioContentIDRef>
<audioContentIDRef>ACO_1003</audioContentIDRef>
This indicates that the audio content reference information 121 references three audio contents, namely "ACO_1001", "ACO_1002" and "ACO_1003", which are played sequentially in that order during playback.
Optionally, the attribute section 110 further includes audio language information 113, which indicates the language displayed on the screen when the audio program is played, such as "en" for English, "cn" for Chinese, "jp" for Japanese and "kr" for Korean. When one of the languages in the audio language information 113 is selected, the text content of the program is displayed on the screen in the selected language; if none is selected, the default option is "en".
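As a hedged sketch of the language selection (the attribute name audioProgrammeLanguage is an assumption and the value is illustrative):
<audioProgramme audioProgrammeID="APR_1001" audioProgrammeName="one song" audioProgrammeLanguage="en">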
Optionally, the attribute section 110 further includes start time information 114 indicating when playing of the audio program starts. The seconds field of the start time information 114 carries no fewer than five decimal places; providing at least five decimal places ensures that the timing can be captured accurately. In a concrete implementation, it may be described as:
start="00:01:00.00000"
This indicates that playing of the audio program starts at the 1st minute.
Optionally, the attribute section 110 further includes end time information 115 indicating when playing of the audio program ends. The seconds field of the end time information 115 likewise carries no fewer than five decimal places, which ensures accurate timing. In a concrete implementation, it may be described as:
end="00:10:00.00000"
This indicates that playing of the audio program ends at the 10th minute.
Optionally, the attribute section 110 further includes maximum allowed ducking information 116, which indicates the maximum amount of ducking (loudness attenuation) allowed when the audio program is played.
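As a hedged sketch of how this could appear in the metadata (the attribute name maxDuckingDepth and the value, a negative gain in dB, are assumptions and purely illustrative):
<audioProgramme audioProgrammeID="APR_1001" maxDuckingDepth="-15.0">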
Specifically, the sub-element area 120 further includes program loudness information 122 of the audio program, expressed as a decibel value. Because the loudness of an audio program needs to be adjusted during playback, the program loudness information 122 is used to measure and control the loudness of the program. Specifically, the program loudness information 122 includes: loudness calculation algorithm information (loudnessMethod), loudness compliance standard information (loudnessRecType), and loudness correction type information (loudnessCorrectionType);
The loudness calculation algorithm information characterizes the algorithm followed when adjusting the loudness of the audio program during playback, for example Recommendation ITU-R BS.1770;
The loudness compliance standard information represents the industry standard that loudness correction must satisfy when the audio program is played; for example, "EBU R128" indicates compliance with the European EBU R128 standard;
The loudness correction type information characterizes the type of correction on which the loudness correction is based when the audio program is played, including file-based correction and real-time correction. In a concrete implementation, it may be described as:
<loudnessMetadata loudnessMethod="ITU-R BS.1770" loudnessRecType="EBU R128">
<integratedLoudness>-23.0</integratedLoudness>
</loudnessMetadata>
This indicates that the decibel value of the audio program playback is -23.0.
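The loudness correction type mentioned above does not appear in this snippet; a hedged variant that also carries it might read (the attribute name loudnessCorrectionType and the value "file" are assumptions for illustration):
<loudnessMetadata loudnessMethod="ITU-R BS.1770" loudnessRecType="EBU R128" loudnessCorrectionType="file">
<integratedLoudness>-23.0</integratedLoudness>
</loudnessMetadata>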
Optionally, the sub-element area 120 further includes screen information 123 of the audio program playback, which represents the screen aspect ratio, center point position and screen width of the display screen used when the audio program is played.
Specifically, the screen information 123 includes: screen aspect ratio information, screen center position information, and screen width information;
The screen aspect ratio information represents the ratio of the horizontal (width) dimension to the vertical (height) dimension of the screen on which the audio program is played, i.e. the aspect ratio of the screen displayed during playback. In a concrete implementation, it may be described as:
<aspectRatio>1.778</aspectRatio>
indicating a screen aspect ratio of 1.778.
The screen center position information includes: the center position azimuth angle is used for representing the azimuth angle of the center of the screen; the elevation angle is used for representing the elevation angle of the center of the screen; a distance for characterizing a distance of a screen center; an X value for characterizing the position of the screen center on the X axis; a Y value for representing the position of the screen center on the Y axis; and the Z value is used for representing the position of the screen center on the Z axis. In specific implementation, the computer language is described as:
<screenCenterPosition X="0.0" Y="0.0" Z="0.0"/>
This indicates that the center position of the screen is at the origin.
The screen width information includes: the width azimuth angle is used for representing the azimuth angle measured in the width direction of the screen; and the width X value is used for representing the measured width of the screen in the X-axis direction.
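No code snippet is given above for the screen width; as a hedged sketch (the element name screenWidth, its attribute and the value are assumptions for illustration), it could be expressed as:
<screenWidth X="1.5"/>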
Through the audio program metadata 100, the embodiment of the present disclosure describes the metadata format of audio program playback and enables control of audio content acquisition, playing time, playing loudness and the playing screen during playback, thereby improving the quality of the audio playing scene.
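As a consolidated illustration of this embodiment, the following hedged sketch gathers the attribute area and the sub-element area described above into one audio program metadata element. All identifiers and values are illustrative, and the names not shown in the snippets above (for example audioProgrammeLanguage, maxDuckingDepth and audioProgrammeReferenceScreen) are assumptions rather than a definitive format:
<audioProgramme audioProgrammeID="APR_1001" audioProgrammeName="one song" audioProgrammeLanguage="en" start="00:01:00.00000" end="00:10:00.00000" maxDuckingDepth="-15.0">
<audioContentIDRef>ACO_1001</audioContentIDRef>
<audioContentIDRef>ACO_1002</audioContentIDRef>
<audioContentIDRef>ACO_1003</audioContentIDRef>
<loudnessMetadata loudnessMethod="ITU-R BS.1770" loudnessRecType="EBU R128">
<integratedLoudness>-23.0</integratedLoudness>
</loudnessMetadata>
<audioProgrammeReferenceScreen aspectRatio="1.778">
<screenCenterPosition X="0.0" Y="0.0" Z="0.0"/>
<screenWidth X="1.5"/>
</audioProgrammeReferenceScreen>
</audioProgramme>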
Example 2
The present disclosure also provides an embodiment of a method for generating audio program metadata. This embodiment is similar to the embodiment above; terms with the same names have the same meanings and the same technical effects as described above and are not repeated here.
As shown in fig. 3, a method for generating metadata of an audio program includes the steps of:
step S210, generating audio program metadata, where the audio program metadata includes:
an attribute zone comprising an audio program identification and an audio program name of an audio program, the audio program identification comprising audio program information created with reference to one or more audio contents;
a sub-element region comprising: audio content reference information including audio content information referenced when an audio program is played, the audio content reference information including one or more audio content information referenced.
Optionally, the attribute area further includes audio language information indicating the language displayed on the display screen when the audio program is played.
Optionally, the attribute area further includes start time information indicating when playing of the audio program starts, the seconds field of the start time information carrying no fewer than five decimal places.
Optionally, the attribute area further includes end time information indicating when playing of the audio program ends, the seconds field of the end time information carrying no fewer than five decimal places.
Optionally, the attribute area further includes maximum allowed ducking information indicating the maximum ducking allowed when the audio program is played.
Optionally, the sub-element region further includes program loudness information of the audio program, where the program loudness information is a decibel value of the audio program.
Optionally, the sub-element area further includes screen information for playing the audio program, where the screen information is used to represent screen aspect ratio, center point position, and screen width information of the display screen during playing the audio program.
Optionally, the program loudness information includes loudness calculation algorithm information, loudness compliance standard information and loudness correction type information. The loudness calculation algorithm information characterizes the algorithm followed when adjusting the loudness of the audio program during playback; the loudness compliance standard information characterizes the industry standard that loudness correction must satisfy during playback; and the loudness correction type information characterizes the type of correction, file-based or real-time, on which the loudness correction is based.
Optionally, the screen information includes: screen aspect ratio information, screen center position information and screen width information. The screen aspect ratio information characterizes the aspect ratio of the screen displayed when the audio program is played. The screen center position information includes: a center position azimuth angle, characterizing the azimuth angle of the screen center; an elevation angle, characterizing the elevation angle of the screen center; a distance, characterizing the distance of the screen center; an X value, characterizing the position of the screen center on the X axis; a Y value, characterizing the position of the screen center on the Y axis; and a Z value, characterizing the position of the screen center on the Z axis. The screen width information includes: a width azimuth angle, characterizing the azimuth angle measured across the width of the screen; and a width X value, characterizing the measured width of the screen along the X axis.
The embodiment of the disclosure generates the audio program metadata, and the audio program metadata describes the metadata format of the audio program playing, so that the control of the acquisition of audio content, the playing time, the playing loudness and the playing screen during the audio program playing can be realized, and the quality of an audio playing scene is improved.
Example 3
Fig. 4 is a schematic structural diagram of an electronic device provided in embodiment 3 of the present disclosure. As shown in fig. 4, the electronic device includes: a processor 30, a memory 31, an input device 32, and an output device 33. The number of processors 30 in the electronic device may be one or more; one processor 30 is taken as an example in fig. 4. The number of memories 31 in the electronic device may be one or more; one memory 31 is taken as an example in fig. 4. The processor 30, the memory 31, the input device 32 and the output device 33 of the electronic device may be connected by a bus or in other ways; connection by a bus is taken as an example in fig. 4. The electronic device may be a computer, a server or the like. This embodiment is described in detail by taking a server as an example of the electronic device; the server may be an independent server or a cluster server.
Memory 31 serves as a computer-readable storage medium that may be used to store software programs, computer-executable programs, and modules, such as program instructions/modules for generating audio program metadata as described in any embodiment of the present disclosure. The memory 31 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created according to use of the device, and the like. Further, the memory 31 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device. In some examples, the memory 31 may further include memory located remotely from the processor 30, which may be connected to the device over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The input device 32 may be used to receive input numeric or character information and to generate key signal inputs related to user settings and function control of the electronic device; it may also include a camera for capturing images and a sound pickup device for capturing audio data. The output device 33 may include an audio device such as a speaker. It should be noted that the specific composition of the input device 32 and the output device 33 can be set according to actual conditions.
Processor 30 executes various functional applications of the device and data processing, i.e., generating audio program metadata, by executing software programs, instructions, and modules stored in memory 31.
Example 4
Embodiment 4 of the present disclosure also provides a storage medium containing computer-executable instructions which, when executed by a computer processor, generate audio program metadata as described in embodiment 1.
Of course, the computer-executable instructions contained in the storage medium provided by the embodiments of the present disclosure are not limited to the operations described above; they may also perform related operations of the method provided by any embodiment of the present disclosure, with the corresponding functions and advantages.
From the above description of the embodiments, it is obvious for a person skilled in the art that the present disclosure can be implemented by software and necessary general hardware, and certainly can be implemented by hardware, but in many cases, the former is a better embodiment. Based on such understanding, the technical solutions of the present disclosure may be embodied in the form of a software product, which may be stored in a computer-readable storage medium, such as a floppy disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a FLASH Memory (FLASH), a hard disk or an optical disk of a computer, and includes several instructions for enabling a computer device (which may be a robot, a personal computer, a server, or a network device) to execute the electronic method according to any embodiment of the present disclosure.
It should be noted that, in the electronic device, the units and modules included in the electronic device are merely divided according to functional logic, but are not limited to the above division as long as the corresponding functions can be realized; in addition, specific names of the functional units are only used for distinguishing one functional unit from another, and are not used for limiting the protection scope of the present disclosure.
It should be understood that portions of the present disclosure may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
In the description herein, references to the description of the term "in an embodiment," "in yet another embodiment," "exemplary" or "in a particular embodiment," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the disclosure. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
Although the present disclosure has been described in detail hereinabove with respect to general description, specific embodiments and experiments, it will be apparent to those skilled in the art that some modifications or improvements may be made based on the present disclosure. Accordingly, such modifications and improvements are intended to be within the scope of this disclosure, as claimed.

Claims (12)

1. Audio program metadata, comprising:
an attribute zone comprising an audio program identification and an audio program name of an audio program, the audio program identification comprising audio program information created with reference to one or more audio contents;
a sub-element region comprising: audio content reference information including audio content information referenced when an audio program is played, the audio content reference information including one or more audio content information referenced.
2. The audio program metadata of claim 1, wherein said attribute section further comprises audio language information indicating the language displayed on a display screen when the audio program is played.
3. The audio program metadata according to claim 1, wherein said attribute section further comprises start time information indicating when audio program playback starts, the seconds field of said start time information having no fewer than five decimal places.
4. The audio program metadata according to claim 1, wherein said attribute section further comprises end time information indicating when audio program playback ends, the seconds field of said end time information having no fewer than five decimal places.
5. The audio program metadata of claim 1, wherein the attribute section further comprises maximum allowed ducking information indicating the maximum ducking allowed during audio program playback.
6. The audio program metadata of claim 1 wherein said sub-element region further comprises program loudness information for an audio program play, said program loudness information being a decibel value for an audio program play.
7. The audio program metadata according to claim 1, wherein said sub-element region further comprises screen information of the audio program playback, said screen information characterizing the screen aspect ratio, center point position and screen width of the display screen when the audio program is played.
8. The audio program metadata as recited in claim 6, wherein the program loudness information comprises loudness calculation algorithm information, loudness compliance standard information and loudness correction type information; the loudness calculation algorithm information characterizes the algorithm followed when adjusting the loudness of the audio program during playback, the loudness compliance standard information characterizes the industry standard that loudness correction must satisfy during playback, and the loudness correction type information characterizes the type of correction, file-based or real-time, on which the loudness correction is based.
9. The audio program metadata according to claim 7, wherein the screen information includes:
screen aspect ratio information, screen center position information, and screen width information;
the screen aspect ratio information is used for representing the aspect ratio of a screen displayed when the audio program is played;
the screen center position information includes:
the center position azimuth angle is used for representing the azimuth angle of the center of the screen; the elevation angle is used for representing the elevation angle of the center of the screen;
a distance for characterizing a distance of a screen center;
an X value for characterizing the position of the screen center on the X axis;
a Y value for representing the position of the screen center on the Y axis;
the Z value is used for representing the position of the screen center on the Z axis;
the screen width information includes:
the width azimuth angle is used for representing the azimuth angle measured in the width direction of the screen;
and the width X value is used for representing the measured width of the screen in the X-axis direction.
10. A method for generating metadata for an audio program, comprising:
generating audio program metadata as claimed in any one of claims 1 to 9.
11. An electronic device, comprising: a memory and one or more processors;
the memory for storing one or more programs;
the one or more programs, when executed by the one or more processors, cause the one or more processors to generate audio program metadata as recited in any one of claims 1 to 9.
12. A storage medium containing computer-executable instructions which, when executed by a computer processor, generate audio program metadata as claimed in any one of claims 1 to 9.
CN202111102045.6A 2021-09-18 2021-09-18 Audio program metadata and generation method, electronic device and storage medium Pending CN113990355A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111102045.6A CN113990355A (en) 2021-09-18 2021-09-18 Audio program metadata and generation method, electronic device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111102045.6A CN113990355A (en) 2021-09-18 2021-09-18 Audio program metadata and generation method, electronic device and storage medium

Publications (1)

Publication Number Publication Date
CN113990355A 2022-01-28

Family

ID=79736143

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111102045.6A Pending CN113990355A (en) 2021-09-18 2021-09-18 Audio program metadata and generation method, electronic device and storage medium

Country Status (1)

Country Link
CN (1) CN113990355A (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1771492A (en) * 2003-03-13 2006-05-10 韩国电子通信研究院 Extended metadata and adaptive program service providing system and method for providing digital broadcast program service
CN101094407A (en) * 2006-06-23 2007-12-26 美国博通公司 Video circuit, video system and video processing method
CN105075295A (en) * 2013-04-03 2015-11-18 杜比实验室特许公司 Methods and systems for generating and rendering object based audio with conditional rendering metadata
CN104240709A (en) * 2013-06-19 2014-12-24 杜比实验室特许公司 Audio encoder and decoder with program information or substream structure metadata
CN107861711A (en) * 2016-09-22 2018-03-30 腾讯科技(深圳)有限公司 page adaptation method and device
US20190265943A1 (en) * 2018-02-23 2019-08-29 Bose Corporation Content based dynamic audio settings

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
International Telecommunication Union: "Audio Definition Model", Recommendation ITU-R BS.2076-1, pages 1-106 *
Sui Aina, Cao Gang, Wang Yongbin: "Digital Content Security Technology", 1 October 2016, page 32 *

Similar Documents

Publication Publication Date Title
US9818448B1 (en) Media editing with linked time-based metadata
US20020136529A1 (en) Caption subject matter creating system, caption subject matter creating method and a recording medium in which caption subject matter creating program is stored
CN113990355A (en) Audio program metadata and generation method, electronic device and storage medium
CN113905321A (en) Object-based audio channel metadata and generation method, device and storage medium
CN114512152A (en) Method, device and equipment for generating broadcast audio format file and storage medium
CN113963724A (en) Audio content metadata and generation method, electronic device and storage medium
CN114121036A (en) Audio track unique identification metadata and generation method, electronic device and storage medium
CN114023339A (en) Audio-bed-based audio packet format metadata and generation method, device and medium
CN115529548A (en) Speaker channel generation method and device, electronic device and medium
CN114203189A (en) Method, apparatus and medium for generating metadata based on binaural audio packet format
CN114023340A (en) Object-based audio packet format metadata and generation method, apparatus, and medium
CN114203188A (en) Scene-based audio packet format metadata and generation method, device and storage medium
CN111601157B (en) Audio output method and display device
CN114051194A (en) Audio track metadata and generation method, electronic equipment and storage medium
CN114530157A (en) Audio metadata channel allocation block generation method, apparatus, device and medium
CN114286179B (en) Video editing method, apparatus, and computer-readable storage medium
CN113963725A (en) Audio object metadata and generation method, electronic device, and storage medium
CN114360556A (en) Serial audio metadata frame generation method, device, equipment and storage medium
CN114363790A (en) Method, apparatus, device and medium for generating metadata of serial audio block format
CN113938811A (en) Audio channel metadata based on sound bed, generation method, equipment and storage medium
CN114143695A (en) Audio stream metadata and generation method, electronic equipment and storage medium
JP4792819B2 (en) Remote editing method and remote editing system
CN113889128A (en) Audio production model and generation method, electronic equipment and storage medium
CN115190412A (en) Method, device and equipment for generating internal data structure of renderer and storage medium
CN115426612A (en) Metadata parsing method, device, equipment and medium for object renderer

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination