CN114510212A - Data transmission method, device and equipment based on serial digital audio interface - Google Patents

Data transmission method, device and equipment based on serial digital audio interface

Info

Publication number
CN114510212A
CN114510212A (application number CN202111675350.4A)
Authority
CN
China
Prior art keywords
data
audio
burst
field
aes3
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111675350.4A
Other languages
Chinese (zh)
Other versions
CN114510212B (en)
Inventor
吴健
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Saiyinxin Micro Beijing Electronic Technology Co ltd
Original Assignee
Saiyinxin Micro Beijing Electronic Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Saiyinxin Micro Beijing Electronic Technology Co ltd
Priority to CN202111675350.4A
Publication of CN114510212A
Application granted
Publication of CN114510212B
Legal status: Active (granted)

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00: Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/16: Sound input; Sound output
    • G06F 3/162: Interface to dedicated audio devices, e.g. audio drivers, interface to CODECs
    • G06F 3/165: Management of the audio stream, e.g. setting of volume, audio stream path
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 30/00: Reducing energy consumption in communication networks
    • Y02D 30/70: Reducing energy consumption in communication networks in wireless communication networks

Abstract

The application relates to a data transmission method, device and equipment based on a serial digital audio interface. The method comprises the following steps: acquiring data stream data for generating a data burst, wherein the data stream data comprises audio data and/or audio model metadata; placing audio samples of the data stream data in a payload field of a subframe of the data burst according to a data pattern of the data stream data; setting fields other than the payload field in the data burst; and transmitting the data burst over a serial digital audio interface. The technical scheme of the application enables data to be transmitted over an AES3 serial digital audio interface in professional applications.

Description

Data transmission method, device and equipment based on serial digital audio interface
Technical Field
The present application relates to the field of audio processing technologies, and in particular, to a data transmission method, apparatus, and device based on a serial digital audio interface.
Background
With the development of technology, audio has become increasingly complex. Early single-channel audio gave way to stereo, and attention shifted to handling the left and right channels correctly. Processing became more complex once surround sound appeared: a surround 5.1 speaker system imposes an ordering constraint on multiple channels, and surround 6.1, surround 7.1 and similar speaker systems further diversify audio processing, requiring the correct signal to be delivered to the appropriate speaker so that the channels work together. Thus, as sound becomes more immersive and interactive, the complexity of audio processing also increases greatly.
An audio channel refers to a mutually independent audio signal that is captured or played back at a particular spatial location when sound is recorded or reproduced. The number of channels is the number of sound sources during recording, or the number of corresponding speakers during playback. For example, a surround 5.1 speaker system comprises audio signals for 6 different spatial locations, and each separate audio signal is used to drive a speaker at the corresponding spatial location; a surround 7.1 speaker system comprises audio signals for 8 different spatial locations, each driving a speaker at the corresponding spatial location.
The AES3 interface is widely used in the industry to transfer linear PCM audio between digital audio devices.
Disclosure of Invention
The application aims to provide a data transmission method, a device and equipment based on a serial digital audio interface, so as to transmit data by using an AES3 serial digital audio interface in professional application.
A first aspect of the present application provides a data transmission method based on a serial digital audio interface, comprising the following steps:
acquiring data stream data for generating a data burst; wherein the data stream data comprises audio data and/or audio model metadata;
placing audio samples of the data stream data in a payload field of a subframe of the data burst according to a data pattern of the data stream data;
setting a field other than the payload field in the data burst;
the data bursts are transmitted over a serial digital audio interface.
A second aspect of the present application provides a data transmission device based on a serial digital audio interface, including:
the data acquisition module is used for acquiring data stream data used for generating data bursts; wherein the data stream data comprises audio data and/or audio model metadata;
a data placement module to place audio samples of the data stream data in a payload field of a subframe of the data burst according to a data pattern of the data stream data;
a field setting module, configured to set a field other than the payload field in the data burst;
and the data transmission module is used for transmitting the data burst through a serial digital audio interface.
A third aspect of the present application provides an electronic device comprising: a memory and one or more processors;
the memory for storing one or more programs;
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the data transmission method based on a serial digital audio interface as provided in any of the embodiments.
A fourth aspect of the present application provides a storage medium containing computer-executable instructions which, when executed by a computer processor, implement the data transmission method based on a serial digital audio interface as provided in any of the embodiments.
According to the data transmission method based on the serial digital audio interface, audio data stream data are placed in data bursts, and relevant fields of the data bursts are set so as to transmit the data stream data by using the AES3 serial digital audio interface.
Drawings
Fig. 1 is a schematic diagram of a three-dimensional audio model provided in an embodiment of the present application;
Fig. 2 is a flow chart of a data transmission method based on a serial digital audio interface according to an embodiment of the present application;
Fig. 3 is a schematic structural diagram of a data transmission apparatus based on a serial digital audio interface according to an embodiment of the present application;
Fig. 4 is a schematic structural diagram of an electronic device in an embodiment of the present application;
Fig. 5 is a diagram illustrating the mode-switching decision process of a receiver according to an embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be further noted that, for the convenience of description, only some of the structures related to the present invention are shown in the drawings, not all of the structures.
Examples
As shown in fig. 1, a three-dimensional acoustic audio model is composed of a set of elements, each element describing one stage of audio, and includes a content production section and a format production section.
The content production part comprises: an audio program element, an audio content element, an audio object element, and an audio track unique identification element. The format production part comprises: an audio packet format element, an audio channel format element, an audio stream format element, and an audio track format element.
the audio program element references at least one of the audio content elements; the audio content element references at least one audio object element; the audio object element references the corresponding audio package format element and the corresponding audio track unique identification element; the audio track unique identification element refers to the corresponding audio track format element and the corresponding audio package format element;
the audio package format element references at least one of the audio channel format elements; the audio stream format element references the corresponding audio channel format element and the corresponding audio packet format element; the audio track format element and the corresponding audio stream format element are referenced to each other. The reference relationships between elements are indicated by arrows in fig. 1.
An audio program may include, but is not limited to, narration, sound effects, and background music. An audio program element may be used to describe a program; a program includes at least one item of content, and an audio content element is used to describe a corresponding item of content within the audio program element. An audio program element may reference one or more audio content elements, which are grouped together to construct a complete audio program element.
The audio content elements describe the content of a component of an audio program, such as background music, and relate the content to its format by reference to one or more audio object elements.
The audio object elements are used to establish the relationship between the content, the format and the actual audio assets, and to determine the audio track unique identification of the actual audio track.
The format making part comprises: an audio packet format element, an audio channel format element, an audio stream format element, an audio track format element.
The audio packet format element may be configured to describe a format adopted when the audio object element and the original audio data are packed according to channel packets.
The audio channel format element may be used to represent a single sequence of audio samples and preset operations performed on it, such as movement of rendering objects in a scene. The audio channel format element may comprise at least one audio block format element. The audio block format elements may be considered to be sub-elements of the audio channel format elements, and therefore there is an inclusion relationship between the audio channel format elements and the audio block format elements.
An audio stream is a combination of audio tracks needed to render a channel, an object, a higher-order ambisonics component, or a packet. The audio stream format element is used to establish the relationship between a set of audio track format elements and a set of audio channel format elements, or between a set of audio track formats and an audio packet format.
The audio track format elements correspond to a set of samples or data in a single audio track, and are used to describe the format of the original audio data, and the decoded signals of the renderer, and also to identify the combination of audio tracks required to successfully decode the audio track data.
And generating synthetic audio data containing metadata after the original audio data are produced through the three-dimensional sound audio model.
The Metadata (Metadata) is information describing characteristics of data, and functions supported by the Metadata include indicating a storage location, history data, resource lookup, or file record.
And after the synthesized audio data are transmitted to the far end in a communication mode, the far end analyzes the synthesized audio data based on the metadata, and restores the original sound scene or renders the original sound scene into a new sound scene in real time.
The division between content production, format production and BW64(Broadcast Wave 64 bit) files is shown in fig. 1. Both the content production portion and the format production portion constitute metadata in XML format, which is typically contained in one block ("axml" block) of the BW64 file. The bottom BW64 file portion contains a "channel allocation (chna)" block, which is a look-up table used to link metadata to the audio programs in the file.
The content production section describes the technical content of the audio, e.g. whether it contains dialogue or a specific language, and loudness metadata. The format production section describes the channel types of the audio tracks and how they are combined together, e.g. the left and right channels in a stereo pair. The metadata of the content production part is typically unique to the audio and the program, while the elements of the format production part may be reused.
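For orientation only, the division just described can be sketched as follows. The chunk names "axml" and "chna" follow the description above, while the concrete field names and values are hypothetical illustrations rather than a definition of the BW64 format. (Python is used here and in the later sketches purely for illustration.)

    # Illustrative sketch of the BW64 division described above; the field names and
    # values are hypothetical examples, not a normative definition of BW64.
    bw64_file = {
        "axml": "<audioFormatExtended> ...content and format metadata in XML... </audioFormatExtended>",
        "chna": [
            # look-up table entries linking the metadata to the audio in the file
            {"track_index": 1, "track_uid": "ATU_00000001"},
            {"track_index": 2, "track_uid": "ATU_00000002"},
        ],
        "data": b"...audio samples...",
    }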
The audio model is an open compatible metadata generic model, but the audio model metadata is not suitable for real-time production and streaming audio applications, but rather for local file storage. When remote real-time transmission of metadata with digital audio is involved, a serial audio metadata schema is required to allow slicing of existing audio and its associated audio model metadata files into frames and streaming.
A frame of serial audio metadata contains a set of audio model metadata describing the audio frames within a certain time period associated with the frame. The serial audio metadata has the same structure, attributes and elements as the audio model metadata, as well as additional attributes for specifying the frame format. The frames of serial audio metadata do not overlap and are linked to a specified start time and duration. Metadata contained in a frame of serial audio metadata is likely to be used to describe the audio itself over the duration of the frame.
The parent element of the serial audio metadata is the frame (frame), which comprises two sub-elements: the frame header (frameHeader) and the audio format extension (audioFormatExtended). The frame header in turn includes 2 sub-elements: the frame format (frameFormat) and the transport track format (transportTrackFormat).
The audio format extension includes 8 sub-elements: audio program (audioProgram), audio content (audioContent), audio object (audioObject), audio track unique identifier (audioTrackUID), audio packet format (audioPackFormat), audio channel format (audioChannelFormat), audio stream format (audioStreamFormat), and audio track format (audioTrackFormat).
The audio model metadata consists of a content portion (e.g., audio program elements) and a format portion (e.g., audio channel format elements). Only three elements, audio program element, audio object element and audio block format element, have time-related parameters stored. In the content portion, the start time, end time and duration of an audio program element or audio object element are used to determine the start time, end time or duration of the element, these parameters are typically fixed. In the format part, all parameters in the audio block format elements are time-varying parameters.
The audio model metadata can be divided into two groups: namely dynamic metadata (e.g., audio block format elements in an audio channel format element) and static metadata (e.g., audio program elements and audio content elements).
A serial audio metadata frame consists of one or more metadata chunks.
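The element hierarchy described above can be summarized with the following sketch. It mirrors only the sub-element names listed in this description and is not a normative schema of the serial audio metadata.

    # Sketch of the serial audio metadata frame hierarchy described above.
    serial_metadata_frame = {
        "frame": {
            "frameHeader": {
                "frameFormat": {},            # frame start time, duration and format attributes
                "transportTrackFormat": {},   # description of the transport tracks
            },
            "audioFormatExtended": {
                "audioProgram": [],           # content part (static metadata)
                "audioContent": [],
                "audioObject": [],
                "audioTrackUID": [],
                "audioPackFormat": [],        # format part
                "audioChannelFormat": [],     # contains the time-varying audioBlockFormat elements
                "audioStreamFormat": [],
                "audioTrackFormat": [],
            },
        }
    }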
The application provides a data transmission method based on a serial digital audio interface, as shown in fig. 2, the method includes:
s210, acquiring data stream data for generating data burst; wherein the data stream data comprises audio data and/or audio model metadata;
s220, according to the data mode of the data stream data, placing the audio samples of the data stream data in the payload field of the sub-frame of the data burst;
s230, setting fields except the payload field in the data burst;
and S240, transmitting the data burst through a serial digital audio interface.
The serial audio mode is applied to real-time production and streaming audio: a serial audio file is encoded and then sliced into frames, or frames are generated directly, and the frames are transmitted over a serial digital audio interface using the data transmission method provided in this embodiment. The serial digital audio interface may be an AES3 interface, which can transfer linear PCM audio, non-PCM data, or both non-PCM data and linear PCM audio in separate channels. The AES3 interface used to transfer data in this embodiment is based on the physical and logical specifications of the existing AES3 format and allows non-PCM data and serial audio to be exchanged between different devices. The format can accommodate a variety of non-PCM audio and data formats and allows multiple data streams to be transmitted over a single interface.
The AES3 interface is widely used in the industry to transfer linear PCM audio between digital audio devices, but it is limited to two channels. A significant problem arises when multiple AES3 links are used to transmit correlated audio of more than two channels.
The existing AES3 format is modified in this embodiment to transport non-PCM data, including non-PCM audio bitstreams, which are typically (but not necessarily) reduced-bit-rate bitstreams. This allows a single audio program of more than 2 channels, or multiple audio programs, each possibly containing more than 2 channels, to be carried on a single AES3 interface.
The present embodiment is a partial modification of the existing AES3 standard logic and is compatible with the existing AES3 interface for transferring linear PCM audio. Thus, interconnection of devices that may work with linear PCM or non-PCM audio and data may be facilitated. This may allow some existing devices capable of recording linear PCM to also record non-PCM data. The present interface supports independent use of the AES3 channel to allow one linear PCM audio channel and one non-PCM data channel to be carried in a single AES3 signal.
The AES3 interface in this embodiment accommodates synchronization methods for reconstruction of the original source audio signal encoded in the non-PCM audio stream and for temporal alignment with other information streams, such as an associated video stream. Other standards and recommended practices, however, may contain important and necessary information regarding the synchronization requirements of a particular data type; in general, such information must be referenced in order to correctly transmit and receive non-PCM streams using this interface.
Because of the wide variety of data types that may be transferred over the present interface, no global synchronization is required by the interface itself. However, synchronization of the non-PCM data content is very important for correct use of the interface, both in terms of the encoded audio sample rate relative to the AES3 frame frequency (when transferring non-PCM audio) and in terms of time synchronization with other information streams. Furthermore, the synchronization requirements of particular data types may impose buffering requirements on devices that support those data types. Other documents containing the synchronization requirements of specific data types therefore need to be referenced in order to maintain compatibility with those data types.
There is a dedicated data type, i.e. a timestamp data type, to support the synchronization method. Many data types may utilize information contained in time-stamped data bursts, including SMPTE12M timecode information, to maintain time synchronization with other information streams.
The interface is intended only for professional sound equipment. The existing standard IEC 60958 covers the transmission of non-PCM data in a consumer environment. The present interface allows for some interoperability between professional and consumer equipment and describes specific compatibility requirements.
The logical format of the AES3 interface consists of a series of subframes. Each subframe is intended to carry one linear PCM sample and contains 32 time slots, each of which (excluding the four slots used for synchronization) carries a single bit of information. A pair of subframes, each containing a PCM word of one audio channel, constitutes an AES3 frame; the frame thus contains two PCM words, one from channel 1 and one from channel 2. A sequence of 192 frames constitutes one block, and the 192 channel status bits of each channel during a block constitute the 192-bit (24-byte) channel status word for that channel. The standard usage of the 32 AES3 time slots is modified when transmitting non-PCM data; the subframe bit-field usage for AES3 non-PCM data is shown in Table 1:
TABLE 1
[Table 1 is reproduced as an image in the original publication.]
The non-PCM data stream to be transmitted is formed into data bursts, each data burst comprising a burst preamble and a burst payload. The data bursts are placed in the audio sample word/auxiliary data field of the AES3 subframe in one of two modes. In frame mode, the data space from each sub-frame within the AES3 frame will combine to allow up to 48 bits of data to be placed in each frame. In subframe mode, each channel will be processed independently and data is not shared between intra-frame subframes. In this mode, each sub-frame may contain linear PCM audio or non-PCM data. This allows the AES3 interface to transfer two linear PCM channels simultaneously, or one linear PCM channel and one set of data bit streams, or two sets of data bit stream data types.
Data bursts are marked with a number indicating the data stream to which they belong. Up to seven different non-PCM data streams, plus an additional stream number dedicated to time-stamp data bursts, may be time-multiplexed together to form a set of data bitstreams. In subframe mode, this allows up to 14 independent non-PCM data streams to be multiplexed within a single AES3 interface.
The data burst is placed in the audio sample word/auxiliary data field of the AES3 subframe using 16, 20, or 24 bits of available space within each subframe. While the 24-bit mode allows for more efficient use of the AES3 data capacity, the 16-bit and 20-bit modes may be required when interfacing with existing devices that are limited to 16-bit or 20-bit operation.
Optionally, the data patterns include a 16-bit pattern, a 20-bit pattern, and a 24-bit pattern;
the placing of audio samples of the data stream data in a payload field of a subframe of the data burst according to a data pattern of the data stream data comprises:
if the data mode is 16-bit mode or 20-bit mode, placing audio samples of the data stream data in an audio sample field of a subframe of the data burst;
if the data mode is a 24-bit mode, audio samples of the data stream data are placed in the audio sample field and auxiliary data field of the sub-frame of the data burst.
Optionally, the format of the data stream data is PCM data or non-PCM data;
if the data stream data is non-PCM data, the setting of the field other than the payload field in the data burst includes:
and setting preset bytes in a channel state field in the data burst.
Optionally, the setting a preset byte in a channel state field in the data burst includes:
setting the preset bytes in the channel state field of the data burst to comprise byte 0, byte 1, byte 2, and byte 23; wherein:
byte 0 bit 0 is set to 1 to indicate professional use of the channel state block; byte 0 bit 1 is set to 1 to indicate a non-audio mode; byte 0 bits 2-4 are set to 000; byte 0 bit 5 is set according to AES3 section 4 to indicate the source frame frequency lock state; bits 6-7 of byte 0 are set according to paragraph 4 of AES3 to indicate the frame rate of the AES3 interface;
bytes 1 bits 0-3 are set to 0000; bits 4-7 of byte 1 are set according to section 4 of AES 3;
byte 2 bits 0-2 are set according to AES3 section 4 to indicate the use of the auxiliary sample bit; bytes 3-5 of byte 2 are set according to AES3 section 4 to indicate a non-PCM data word size; bits 6-7 of byte 2 are set to 00;
bits 0-7 of byte 23 are set according to section 4 of AES3 to indicate the valid CRCC value of the channel state block.
Optionally, after setting the field in the data burst other than the preset bit field, the method further includes:
marking the data burst with a numeric identifier to indicate the data stream to which the data burst belongs.
Optionally, the setting a field other than the payload field in the data burst includes:
setting a preamble field of the data burst, wherein the preamble field is set at the beginning of each data burst, before the payload field, and contains 4 words representing, respectively, a first synchronization word, a second synchronization word, a burst information value, and a length code.
Optionally, the setting the preamble field of the data burst includes:
in a frame mode, setting 4 words included in the preamble in 2 consecutive frames, wherein a preceding frame of the 2 consecutive frames is a frame for starting the data burst, a first sub-frame of the preceding frame includes the first synchronization word, a second sub-frame of the preceding frame includes the second synchronization word, a first sub-frame of a following frame includes the burst information value, and a second sub-frame of the following frame includes the length code;
in a subframe mode, setting 4 words contained in the preamble in 4 sequential subframes of a single channel for transmitting non-PCM data, wherein a first subframe of the 4 sequential subframes is a subframe for starting the data burst, and the first synchronization word, the second synchronization word, the burst information value and the length code are sequentially set in the 4 sequential subframes.
Optionally, in frame mode, the payload field is arranged to use two AES3 lanes to transport a set of non-PCM data streams;
in the sub-frame mode, the payload field is set to carry a set of non-PCM burst sequences or linear PCM audio streams individually for each AES3 channel.
In the modified AES3 interface format in this embodiment, the logical interface format conforms to the specification of AES3 unless otherwise specified. The electrical and mechanical properties of the interface conform to AES3 or ANSI/SMPTE 276M. When one channel is used for transmitting linear PCM in the subframe mode, the channel transmitting linear PCM should be used according to AES 3.
As described in this embodiment, non-PCM data should be placed in bursts in the available AES3 data space. The non-PCM data should occupy some or all of bit positions 4-27 of the AES3 subframe. Unused bit positions within a subframe, or in subframes between bursts, should be set to 0.
Channel state mode
For the AES3 channel to transport non-PCM data, byte 0, byte 1, byte 2, and byte 23 of the channel state word should be used as described in the present embodiment. For channels that transport non-PCM data, the usage of the remaining bytes of the channel state word is undefined. Each bit of the undefined channel state byte is set to 0.
AES3 defines three types of implementation that are related to the use of channel state characteristics: min, standard and enhanced. In order to be compatible with existing implementations of AES3, a standard implementation must be used. Although the specific interpretation of certain bit fields differs from the AES3 specification, the use of the channel state bytes 0, 1, 2 and 23 defined in this interface is consistent with the standard implementation described in AES 3.
Bit 0 of byte 0 of the channel status word should be set to 1, indicating professional use of the channel status block, regardless of the application using the AES3 bitstream.
Byte 0 bit 1 should be set to 1, indicating a non-audio mode.
Byte 0 bits 2-4 should be set to 000.
Byte 0, bit 5, should be set according to AES3, paragraph 4, but should indicate the source frame frequency lock state. The source frame frequency should be interpreted as the source rate of the AES3 interface frame rate. This bit is not necessarily used to indicate that the source sample rate of the audio signal encoded in the non-PCM audio stream in the AES3 signal is locked to the AES3 frame rate, although such use may be specified by certain data stream types.
Byte 0 bits 6-7 should be set according to AES3 section 4, but should represent the frame rate of the AES3 interface. These bits are not necessarily used to indicate the source sample rate of the audio signal encoded in the non-PCM audio stream in the AES3 signal, although such use may be specified by certain data stream types.
In this embodiment, the state 00 (interpreted as frame rate not indicated) is allowed, but it is recommended that the actual frame rate be indicated.
Table 2 provides a summary of the settings of channel status word byte 0 when transferring non-PCM data. Table 2 shows the channel status bits in byte 0:
TABLE 2
[Table 2 is reproduced as an image in the original publication.]
Byte 1, bits 0-3, are set to 0000. Bits 4-7 of byte 1 should be set according to AES3 section 4.
Table 3 provides a summary of the settings of channel status word byte 1 when transferring non-PCM data. Table 3 shows the channel status bits in byte 1:
TABLE 3
[Table 3 is reproduced as an image in the original publication.]
Bits 0-2 of byte 2 are set according to AES3, paragraph 4, to indicate the use of the auxiliary sample bit. In this case, the audio sample word length refers to the non-PCM data word length (the number of bits used to transmit non-PCM data) defined by the data mode parameters. If multiple non-PCM data streams are carried within the AES3 interface and there are multiple data pattern words, the data pattern corresponding to the maximum non-PCM data word length is used as a reference.
Bits 3-5 of byte 2 are set according to paragraph 4 of AES3, indicating the non-PCM data word length (number of bits used to transfer non-PCM data) defined by the data mode parameter. If multiple non-PCM data streams are carried within the AES3 interface and there are multiple data pattern words, the data pattern corresponding to the largest non-PCM data word length is used as a reference.
The state 000 (interpreted as non-PCM data word length not indicated) is allowed, but it is recommended that the actual non-PCM data word length be indicated. Bits 6-7 of byte 2 should be set to 00.
Table 4 provides a summary of the settings of channel status word byte 2 when transferring non-PCM data. Table 4 shows the channel status bits in byte 2:
TABLE 4
[Table 4 is reproduced as an image in the original publication.]
Byte 23, bits 0-7, are set according to AES3 section 4 to indicate a valid CRCC value for the channel status block (the default state of 0 is not allowed). Table 5 shows the channel status bits in byte 23:
TABLE 5
Bits | Value | Notes
0-7 | CRCC word | Set according to AES3
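A hedged sketch of how the byte 0, byte 1, byte 2 and byte 23 settings above might be assembled is given below. The AES3-defined codings (frame-frequency lock, frame rate, auxiliary-bit use, word length, byte 1 bits 4-7) are passed in as raw bit values because their encodings are defined in AES3 section 4. The CRC-8 generator polynomial x^8 + x^4 + x^3 + x^2 + 1 with an all-ones preset, and the mapping of channel status bit 0 to the least significant bit of each Python byte, are assumptions that should be verified against AES3 before use.

    def crc8_channel_status(data: bytes, poly: int = 0x1D, init: int = 0xFF) -> int:
        # Assumed CRC-8 (x^8 + x^4 + x^3 + x^2 + 1, all-ones preset); verify the
        # polynomial and bit ordering against AES3 before relying on this value.
        crc = init
        for byte in data:
            crc ^= byte
            for _ in range(8):
                crc = ((crc << 1) ^ poly) & 0xFF if crc & 0x80 else (crc << 1) & 0xFF
        return crc

    def non_pcm_channel_status(lock_bit: int = 0, frame_rate_bits: int = 0b00,
                               aux_bits: int = 0b000, word_len_bits: int = 0b000,
                               byte1_high_bits: int = 0b0000) -> bytes:
        cs = bytearray(24)
        # Byte 0: bit 0 = 1 (professional use), bit 1 = 1 (non-audio mode),
        # bits 2-4 = 000, bit 5 = source frame-frequency lock, bits 6-7 = frame rate.
        cs[0] = 0b11 | ((lock_bit & 1) << 5) | ((frame_rate_bits & 0b11) << 6)
        # Byte 1: bits 0-3 = 0000, bits 4-7 set according to AES3 section 4.
        cs[1] = (byte1_high_bits & 0b1111) << 4
        # Byte 2: bits 0-2 auxiliary-bit use, bits 3-5 non-PCM data word length, bits 6-7 = 00.
        cs[2] = (aux_bits & 0b111) | ((word_len_bits & 0b111) << 3)
        # Byte 23: CRCC computed over channel status bytes 0-22.
        cs[23] = crc8_channel_status(bytes(cs[:23]))
        return bytes(cs)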
Sample rate synchronization: the present interface does not require synchronization between the AES3 interface rate (frame rate) and the sample rate of the audio encoded in the non-PCM data stream. However, other standards or recommended practices may dictate a fixed relationship between the AES3 interface rate and the encoded audio sample rate for a particular data type.
Data burst format: the non-PCM data stream to be transmitted forms data bursts consisting of data words in a contiguous sequence of AES3 frames. Each data burst includes a burst preamble and a burst payload. When there are multiple streams, bursts from each stream are placed in the AES3 stream in a time-division multiplexed manner.
The burst preamble occurs at the beginning of each data burst and is followed by the burst payload. The burst preamble occupies 16, 20 or 24 bits in each of 4 consecutive subframes, in one of two ways depending on whether frame mode or subframe mode is in use. The preamble consists of four words, Pa, Pb, Pc and Pd. When placed in the AES3 subframe, the MSB of each preamble word is placed in slot 27 of the subframe, and the LSB of each preamble word is placed in slot 12, 8 or 4, depending on the data mode. In 16-bit mode, slots 11-8 are set to 0 for each subframe containing a preamble word; in the 16-bit and 20-bit modes, slots 7-4 are also set to 0 for each subframe containing a preamble word. Table 6 shows the preamble:
TABLE 6
[Table 6 is reproduced as an image in the original publication.]
In frame mode, the 4 preamble words are contained in 2 consecutive frames. The frame that starts the data burst contains the preamble word Pa in the channel 1 subframe and Pb in the channel 2 subframe; the next frame contains Pc in channel 1 and Pd in channel 2.
In subframe mode, the 4 preamble words are contained in 4 sequential subframes of the single channel (channel 1 or channel 2) used to transmit non-PCM data. The subframe (of the channel being used) at which the data burst starts should contain the preamble word Pa, the next subframe of that channel contains Pb, and so on.
Burst information (burst_info): the value of burst_info shall contain information about the content of the burst payload, composed of the fields described below (see Table 7). Bit 15, 19 or 23 of burst_info is regarded as the MSB, depending on the data mode (16-bit, 20-bit or 24-bit); the burst_info MSB is therefore always in slot 27 of the AES3 subframe.
Data type (data_type): the 5-bit data_type field indicates the data type contained in the burst payload. The MSB of the data_type field is placed in bit 4, 8 or 12 of the burst information word, depending on the data mode, and is therefore always in slot 16 of the AES3 subframe. The data_type value applies only to a single data burst; when multiple data streams are carried within the AES3 interface, the data type may vary between different stream numbers. The supported data types and the mapping of data_type values to specific data types are defined in SMPTE 338M. Other standards may contain format requirements that depend on the specific data type. Table 7 shows the burst information values:
TABLE 7
[Table 7 is reproduced as an image in the original publication.]
Data mode (data_mode): the 2-bit data_mode field indicates the mode in which burst_payload data is placed in the AES3 subframe. The MSB of the data_mode field is placed in bit 6, 10 or 14 of the burst information word, depending on the data mode, and is therefore always in slot 18 of the AES3 subframe. In each data mode, the burst_payload data words occupy the subframe slots shown in Table 8; in the 16-bit and 20-bit modes, the unused slots contain the value 0. The data_mode value applies only to a single data burst. The data mode may vary between consecutive data bursts for a given data_stream_number, or between data bursts of different stream numbers when multiple data streams are carried within the AES3 interface. Table 8 shows the data mode values:
TABLE 8
Value | Data mode | Burst payload position
0 | 16-bit mode | Subframe time slots 27-12
1 | 20-bit mode | Subframe time slots 27-8
2 | 24-bit mode | Subframe time slots 27-4
3 | Reserved | N/A
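The placement in Table 8 can be expressed directly as a small helper. The sketch below models an AES3 subframe as a 32-bit integer whose bit index equals the time-slot number, which is an illustrative convention only; slots 0-3 (synchronization) and 28-31 (V, U, C, P bits) are left untouched.

    SLOT_RANGE = {      # data_mode value -> (MSB slot, LSB slot) from Table 8
        0: (27, 12),    # 16-bit mode
        1: (27, 8),     # 20-bit mode
        2: (27, 4),     # 24-bit mode
    }

    def place_word_in_subframe(word: int, data_mode: int) -> int:
        msb_slot, lsb_slot = SLOT_RANGE[data_mode]
        width = msb_slot - lsb_slot + 1
        if word >> width:
            raise ValueError(f"word does not fit in the {width}-bit data mode")
        # The word's MSB lands in slot 27; slots below lsb_slot remain 0.
        return (word & ((1 << width) - 1)) << lsb_slot

    # Example: a 16-bit word occupies slots 27-12, leaving slots 11-4 at 0.
    assert place_word_in_subframe(0xF872, 0) == 0xF872 << 12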
Error flag (error_flag): provides an error indication for the data in the burst_payload. If the data in the burst payload is known to be error-free, or is not known to contain errors, this bit is set to 0; if the data in the burst payload is known to contain errors, this bit is set to 1. Note that the error flag is always located in slot 19 of the AES3 subframe.
Data-type-dependent field (data_type_dependent): this field contains 5 bits whose meaning depends on the value of data_type. The specific encoding of this field may be found in other standards or recommended practices applicable to the particular data type. References to documents describing this field for particular data types may be found in SMPTE 338M.
Data stream number (data_stream_number): the 3-bit data_stream_number denotes the number of the data stream to which the burst belongs. Its MSB is placed at bit 15, 19 or 23 of the burst information word, depending on the data mode, and is therefore always in slot 27 of the AES3 subframe.
Each independent data stream uses a unique value of data_stream_number. Eight data stream numbers (0-7) are available. Data stream number 7 is reserved for the time-stamp data type: all time-stamp data bursts are encoded with data_stream_number set to 7. Data stream numbers 0-6 may be used for all data types except the time-stamp data type. Thus, up to 7 independent data streams may be time-multiplexed in the AES3 interface in frame mode. In subframe mode, each AES3 channel is processed separately, and the requirement that each data stream have a unique stream number applies only within a given AES3 channel; in this mode, up to 14 independent data streams (7 per channel) can be time-multiplexed in the AES3 interface.
It should be noted that a single time-stamp data burst applies to a particular data burst of another data type. Although all time-stamp data bursts are identified as data stream number 7, they should not be considered a single stream of related time-stamp values. When time-code information is carried within time-stamp data bursts, multiple time-code streams may be transmitted within the data bursts identified as data stream number 7. Other standards or recommended practices contain further information about the time-stamp data type.
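As an illustration of the stream numbering rules above, the sketch below time-multiplexes bursts from several streams in frame mode, keeping stream number 7 for time-stamp bursts. The round-robin ordering is only an example; as noted further below, the interface itself does not mandate a particular multiplexing method.

    from itertools import zip_longest

    TIME_STAMP_STREAM = 7          # reserved for the time-stamp data type

    def multiplex(bursts_by_stream: dict) -> list:
        """bursts_by_stream maps a data_stream_number (0-7) to its list of bursts."""
        for stream_number in bursts_by_stream:
            if not 0 <= stream_number <= 7:
                raise ValueError("only eight data stream numbers (0-7) are available")
        ordered = sorted(bursts_by_stream.items())
        interleaved = []
        for round_bursts in zip_longest(*(bursts for _, bursts in ordered)):
            for (stream_number, _), burst in zip(ordered, round_bursts):
                if burst is not None:
                    interleaved.append((stream_number, burst))
        return interleaved

    # Example: two ordinary streams plus a time-stamp stream.
    schedule = multiplex({0: ["a0", "a1"], 1: ["b0"], TIME_STAMP_STREAM: ["t0"]})
    # schedule == [(0, 'a0'), (1, 'b0'), (7, 't0'), (0, 'a1')]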
Length code (length_code): represents the length of the burst_payload in bits. Depending on the data mode, length_code occupies 16, 20 or 24 bits of the AES3 subframe. The length_code MSB is always located in slot 27 of the AES3 subframe.
The size of the burst payload field is limited according to the data mode: 0 to 65535 bits in 16-bit mode, 0 to 1048575 bits in 20-bit mode, and 0 to 16777215 bits in 24-bit mode. The size of the burst_preamble is not counted in the value of length_code.
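Putting the preamble fields together for the 16-bit mode gives the following hedged sketch. The bit positions follow the descriptions above for the 16-bit data mode (data_type MSB at bit 4, data_mode MSB at bit 6, error_flag in bit 7, data_stream_number MSB at bit 15); placing the 5-bit data_type_dependent field in the remaining bits 8-12 is an assumption made here for illustration. Pa = 0xF872 and Pb = 0x4E1F are the 16-bit synchronization words quoted in the auto-detection discussion below.

    PA_16BIT, PB_16BIT = 0xF872, 0x4E1F    # 16-bit synchronization words Pa, Pb

    def burst_preamble_16bit(data_type: int, data_stream_number: int,
                             payload_length_bits: int,
                             data_type_dependent: int = 0, error: bool = False) -> list:
        if not 0 <= payload_length_bits <= 0xFFFF:
            raise ValueError("16-bit mode limits the burst payload to 0-65535 bits")
        burst_info = (
            (data_type & 0x1F)                      # bits 0-4: data_type (per SMPTE 338M)
            | (0 << 5)                              # bits 5-6: data_mode = 0 (16-bit mode)
            | ((1 if error else 0) << 7)            # bit 7: error_flag
            | ((data_type_dependent & 0x1F) << 8)   # bits 8-12: data_type_dependent (assumed position)
            | ((data_stream_number & 0x7) << 13)    # bits 13-15: data_stream_number
        )
        length_code = payload_length_bits           # the burst_preamble itself is not counted
        return [PA_16BIT, PB_16BIT, burst_info, length_code]

    # Example: data stream 0, data type 1, a 1536-bit burst payload.
    pa, pb, pc, pd = burst_preamble_16bit(data_type=1, data_stream_number=0,
                                          payload_length_bits=1536)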
The burst payload is divided into data words and placed in a continuous sequence of AES3 frames in one of two modes:
in frame mode, two AES3 lanes should be used to transmit a set of non-PCM data streams. When packing a burst of data into a continuous sequence of frames, the available data space for each sub-frame within the AES3 frame should be combined. This mode would allow up to 32, 40 or 48 data bits to be placed in a single AES3 frame, depending on the data _ mode setting in the burst _ preamble. Considering the burst payload as a serial stream of bits, the first bit of the first data word of the payload in the burst occupies the MSB bit position of sub-frame 1 (slot 27) and the last bit of the first data word occupies the LSB bit position of sub-frame 2 (set according to the data pattern). The last data bit of the burst payload may occupy only a small portion of the last frame. Any unused bits in the last frame are set to 0.
In the sub-frame mode, each AES3 channel should be used separately to carry a set of non-PCM data streams or linear PCM audio. When packing bursts of data into a continuous sequence of frames, the sub-frames of each AES3 channel within a frame should be considered individually. This mode would allow up to 16, 20 or 24 data bits per lane to be placed in a single AES3 frame, depending on the data _ mode setting in the burst _ preamble.
Considering burst _ payload as a serial bit stream, the first bit of the first data word of the payload in the burst should occupy the MSB bit position of the sub-frame (slot 27) and the last bit of the first data word should occupy the LSB bit position of the sub-frame (set according to data _ mode). The last data bit of the burst payload may occupy only a small portion of the last frame. Any unused bits in the last frame should be set to 0.
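A hedged sketch of this packing, treating the burst payload as a serial bit string whose first bit goes to the MSB position (slot 27), is given below; the string representation of the payload is an illustrative choice only.

    WORD_WIDTH = {0: 16, 1: 20, 2: 24}      # data_mode -> payload bits per subframe

    def split_payload(bits: str, data_mode: int) -> list:
        """bits is a serial bit string such as '1011...'; the last word is zero-padded."""
        width = WORD_WIDTH[data_mode]
        words = []
        for i in range(0, len(bits), width):
            chunk = bits[i:i + width].ljust(width, "0")   # unused bits set to 0
            words.append(int(chunk, 2))
        return words

    def pack_frame_mode(bits: str, data_mode: int) -> list:
        """Frame mode: returns (subframe 1 word, subframe 2 word) pairs, one per AES3 frame."""
        words = split_payload(bits, data_mode)
        if len(words) % 2:
            words.append(0)                               # pad the last frame with zeros
        return list(zip(words[0::2], words[1::2]))

    def pack_subframe_mode(bits: str, data_mode: int) -> list:
        """Subframe mode: one word per frame for the single channel carrying the data."""
        return split_payload(bits, data_mode)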
In the subframe mode, the channel status word for each channel should be processed separately. Any channel carrying non-PCM data sets the channel status bit according to the present interface. Any channel transmitting linear PCM audio sets the channel status bit according to AES 3.
Burst interval: there shall be no sequence of 4096 or more AES3 frames (frame mode) or subframes (subframe mode) that does not contain at least one data burst, and the beginning of at least one such data burst shall be preceded by four AES3 subframes in which time slots 8-27 are all 0. This requirement ensures that the extended synchronization code (zero words followed by Pa, Pb) occurs. Data bursts from a given non-PCM data stream are placed in the AES3 interface in sequential order. If multiple non-PCM data streams are placed in the AES3 interface (or in a single channel in subframe mode), the data bursts from each stream should be interleaved in a time-division multiplexed manner. The present interface places no other requirements on the multiplexing method as long as each stream satisfies the above requirements. Other standards or recommended practices may define specific multiplexing techniques, which may depend on the data type.
The present interface does not require that data bursts be placed at fixed intervals or positions in the AES3 stream, although these features may be used to convey synchronization information. Other standards or recommended practices may specify fixed reference positions and repetition rates for this purpose, depending on the data type. The time_stamp data type may also be used to transmit specific synchronization information for certain data types.
Data-type-dependent fields: the data formats contained in the data_type_dependent and burst_payload fields depend on the data_type field and are beyond the scope of this specification. The specific encoding of these fields may be found in other standards or recommended practices applicable to the particular data type.
Consumer format compatibility: the present interface is intended only for professional equipment. Some users in professional environments may nevertheless wish to be compatible with consumer devices. To be fully compatible with consumer equipment, a professional device should implement an interface, or a specific interface mode, that complies with the consumer format specification IEC 60958. However, some professional devices may be expected to be compatible at the level of bitstream formatting while still using the AES3 interface in professional mode. This embodiment lists specific compatibility requirements and issues to be considered with respect to the bitstream format.
Receiving devices: the consumer bitstream format is generally a subset of the professional bitstream format defined by this specification. Professional equipment implementing this interface can read a consumer burst preamble and correctly extract consumer data bursts carried in the AES3 interface. This does not guarantee that a professional receiver can correctly receive and decode all data types; certain consumer data types may not be defined by the present interface. In operation, a professional receiver will discard any data burst containing an undefined data type.
Source devices: for professional equipment to generate an AES3 output bitstream compatible with the consumer format, the data bursts should be formatted in the following manner:
all data bursts should be limited to 16-bit frame mode (data mode 0, using two AES3 subframes).
There should be at least one bitstream with data_stream_number 0, and that bitstream should contain the encoded audio information considered to be the primary audio service. Consumer equipment may not be able to receive bitstreams whose data stream number is greater than 0.
Some professional data types defined by the present interface may not be defined in the consumer format. Consumer equipment may be expected (but is not guaranteed) to ignore these data types.
Consumer equipment should be able to read the burst preamble and extract the data bursts correctly when the above recommendations are followed. This does not guarantee that a consumer receiver can correctly receive and decode all data types. Professional devices should also take into account consumer synchronization requirements for certain data types, which may affect the ability of a consumer device to receive and decode the encoded data. Many consumer receivers rely on channel status information to detect non-PCM encoded data; in an AES3 signal whose channel status is set according to the present interface, such a consumer receiver may not be able to detect the bitstream.
Data-type-related issues: for data types common to professional and consumer use, particular data-type-related issues may still affect bitstream compatibility between professional and consumer devices. For some data types, the data_type_dependent field may differ between the professional and consumer specifications; such differences may prevent proper exchange of encoded data and/or synchronization information. The encoding of the burst_payload may also differ for certain data types. Furthermore, the synchronization methods required by consumer formats for certain data types may impose restrictions on data burst intervals that, in some cases, differ from those allowed by the professional format. Data-type-specific format issues are outside the scope of the present interface; compatibility for a particular data type may be addressed by other standards and recommended practices. References to documents containing specific data-type format requirements may be found in SMPTE 338M.
Automatic detection of the audio/data mode: the AES3 interface may transmit PCM audio, non-PCM data, or both simultaneously in separate channels. A receiving device capable of receiving an AES3 stream containing PCM and non-PCM data needs to identify whether the AES3 information should be treated as PCM audio, non-PCM data, or both. This information may be conveyed by setting bit 1 of the channel status word to indicate data. In some applications it is useful for the receiver to be able to determine whether the AES3 content is PCM audio or non-PCM data without reference to bit 1 of the channel status word. This can be achieved very reliably by recognizing the synchronization code formed by the first two words of the preamble (Pa, Pb), which is unlikely to occur often in natural PCM audio.
By looking for an extended synchronization code consisting of six words (four zero words followed by Pa and Pb; in 16-bit mode 0x0000, 0x0000, 0x0000, 0x0000, 0xF872, 0x4E1F), the probability of a false synchronization is very small. A decision process that may be followed is shown in fig. 5, in which the mode of the receiver is switched between PCM and data. The SYNC function indicates whether an extended synchronization code has been found within a range of 4096 AES3 frames. Note that if the AES3 stream is idle (all zeros), the auto-detector will enter PCM mode and will switch back to data mode only when a data burst occurs. If such behavior is not desired, it can be prevented by inserting a null data burst at least once every 4096 AES3 frames.
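A simplified sketch of this decision process for the 16-bit case follows. It models the receiver mode as a string and the SYNC test as the detection of the six-word extended synchronization code within a window of 4096 words; the exact behavior of fig. 5 may differ in detail, so this is an illustrative assumption rather than a reproduction of the figure.

    EXTENDED_SYNC = (0x0000, 0x0000, 0x0000, 0x0000, 0xF872, 0x4E1F)   # 16-bit case
    WINDOW = 4096

    def detect_mode(subframe_words, start_mode: str = "PCM") -> str:
        """subframe_words: 16-bit data words taken from successive AES3 frames."""
        mode = start_mode
        recent = []
        words_since_sync = 0
        for word in subframe_words:
            recent = (recent + [word])[-len(EXTENDED_SYNC):]
            if tuple(recent) == EXTENDED_SYNC:
                mode = "DATA"               # a data burst has been found
                words_since_sync = 0
            else:
                words_since_sync += 1
                if words_since_sync >= WINDOW:
                    mode = "PCM"            # no burst for 4096 frames: treat as PCM (or idle)
        return mode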
Fig. 3 is a data transmission device based on a serial digital audio interface according to an embodiment of the present application, including:
a data obtaining module 310, configured to obtain data stream data used for generating a data burst; wherein the data stream data comprises audio data and/or audio model metadata;
a data placement module 320 for placing audio samples of the data stream data in a payload field of a subframe of the data burst according to a data pattern of the data stream data;
a field setting module 330, configured to set a field other than the payload field in the data burst;
a data transmission module 340, configured to transmit the data burst through a serial digital audio interface.
Optionally, the data patterns include a 16-bit pattern, a 20-bit pattern, and a 24-bit pattern;
the data placement module is specifically configured to:
if the data mode is 16-bit mode or 20-bit mode, placing audio samples of the data stream data in an audio sample field of a subframe of the data burst;
if the data mode is a 24-bit mode, audio samples of the data stream data are placed in the audio sample field and auxiliary data field of the sub-frame of the data burst.
Optionally, the format of the data stream data is PCM data or non-PCM data;
if the data stream data is non-PCM data, the field setting module is specifically configured to:
and setting preset bytes in a channel state field in the data burst.
Optionally, the setting a preset byte in a channel state field in the data burst includes:
setting the preset bytes in the channel state field of the data burst to comprise byte 0, byte 1, byte 2, and byte 23; wherein:
byte 0 bit 0 is set to 1 to indicate professional use of the channel state block; byte 0 bit 1 is set to 1 to indicate a non-audio mode; byte 0 bits 2-4 are set to 000; byte 0 bit 5 is set according to AES3 section 4 to indicate the source frame frequency lock state; bits 6-7 of byte 0 are set according to paragraph 4 of AES3 to indicate the frame rate of the AES3 interface;
bytes 1 bits 0-3 are set to 0000; bits 4-7 of byte 1 are set according to section 4 of AES 3;
byte 2 bits 0-2 are set according to AES3 section 4 to indicate the use of the auxiliary sample bit; bytes 3-5 of byte 2 are set according to AES3 section 4 to indicate a non-PCM data word size; bits 6-7 of byte 2 are set to 00;
bits 0-7 of byte 23 are set according to section 4 of AES3 to indicate the valid CRCC value of the channel state block.
Optionally, the data transmission apparatus based on the serial digital audio interface further includes:
and after the field outside the preset bit field in the data burst is set, marking the data burst as a digital identifier to indicate the data stream to which the data burst belongs.
Optionally, the field setting module is specifically configured to:
setting a preamble field of the data burst, wherein the preamble field is set at the beginning of each data burst, before the payload field, and contains 4 words representing, respectively, a first synchronization word, a second synchronization word, a burst information value, and a length code.
Optionally, the field setting module is specifically configured to:
the setting the preamble field of the data burst includes:
in a frame mode, setting 4 words included in the preamble in 2 consecutive frames, wherein a preceding frame of the 2 consecutive frames is a frame for starting the data burst, a first sub-frame of the preceding frame includes the first synchronization word, a second sub-frame of the preceding frame includes the second synchronization word, a first sub-frame of a following frame includes the burst information value, and a second sub-frame of the following frame includes the length code;
in a subframe mode, setting 4 words contained in the preamble in 4 sequential subframes of a single channel for transmitting non-PCM data, wherein a first subframe of the 4 sequential subframes is a subframe for starting the data burst, and the first synchronization word, the second synchronization word, the burst information value and the length code are sequentially set in the 4 sequential subframes.
Optionally, in frame mode, the payload field is arranged to use two AES3 lanes to transport a set of non-PCM data streams;
in the sub-frame mode, the payload field is set to carry a set of non-PCM data burst sequences or linear PCM audio streams individually for each AES3 channel.
The data transmission device based on the serial digital audio interface provided by the embodiment of the invention can execute the data transmission method based on the serial digital audio interface provided by any embodiment of the invention, and has corresponding functional modules and beneficial effects of the execution method.
Fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present application. As shown in fig. 4, the electronic device includes: a processor 410, a memory 420, an input device 430, and an output device 440. The number of processors 410 in the electronic device may be one or more, and one processor 410 is taken as an example in fig. 4. The number of memories 420 in the electronic device may be one or more, and one memory 420 is taken as an example in fig. 4. The processor 410, the memory 420, the input device 430, and the output device 440 of the electronic device may be connected by a bus or other means; fig. 4 illustrates connection by a bus as an example. The electronic device can be a computer, a server, or the like. In the embodiment of the present application, the electronic device is described as a server, which may be an independent server or a cluster server.
The memory 420 serves as a computer readable storage medium for storing software programs, computer executable programs, and modules, such as program instructions/modules of the data transmission apparatus based on the serial digital audio interface according to any embodiment of the present application. The memory 420 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required for at least one function; the storage data area may store data created according to use of the device, and the like. Further, the memory 420 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device. In some examples, memory 420 may further include memory located remotely from processor 410, which may be connected to devices through a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The input device 430 may be used to receive input numeric or character information and to generate key signal inputs related to user settings and function control of the electronic device; it may also include a camera for capturing images and a sound pickup device for capturing audio data. The output device 440 may include an audio device such as a speaker. It should be noted that the specific composition of the input device 430 and the output device 440 may be set according to actual requirements.
The processor 410 executes various functional applications of the device and data processing, i.e., implements a data transmission method based on a serial digital audio interface, by executing software programs, instructions, and modules stored in the memory 420.
An embodiment of the present application also provides a storage medium containing computer-executable instructions which, when executed by a computer processor, are used to perform the data transmission method based on a serial digital audio interface provided by any of the above embodiments.
Of course, the computer-executable instructions contained in the storage medium provided in the embodiments of the present application are not limited to the method operations described above, and may also perform related operations in the data transmission method based on a serial digital audio interface provided in any embodiment of the present application, with the corresponding functions and advantages.
From the above description of the embodiments, it will be clear to those skilled in the art that the present application may be implemented by software plus necessary general-purpose hardware, and may certainly also be implemented by hardware alone, although the former is the preferred implementation in many cases. Based on such understanding, the technical solutions of the present application may be embodied in the form of a software product, which may be stored in a computer-readable storage medium, such as a floppy disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a FLASH memory (FLASH), a hard disk or an optical disk of a computer, and which includes several instructions for enabling a computer device (which may be a robot, a personal computer, a server, or a network device) to execute the method according to any embodiment of the present application.
It should be noted that the units and modules included in the above device are merely divided according to functional logic, but the division is not limited thereto as long as the corresponding functions can be realized; in addition, the specific names of the functional units are only used for distinguishing one functional unit from another and are not intended to limit the protection scope of the present application.
It should be understood that portions of the present application may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
In the description herein, references to the description of the term "in an embodiment," "in another embodiment," "exemplary" or "in a particular embodiment," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the application. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
Although the present application has been described in detail above by way of the general description, the specific embodiments and the experiments, it will be apparent to those skilled in the art that modifications or improvements may be made on the basis of the present application. Accordingly, such modifications and improvements are intended to fall within the scope of protection claimed by the present application.

Claims (11)

1. A data transmission method based on a serial digital audio interface is characterized by comprising the following steps:
acquiring data stream data for generating a data burst; wherein the data stream data comprises audio data and/or audio model metadata;
placing audio samples of the data stream data in a payload field of a subframe of the data burst according to a data pattern of the data stream data;
setting a field in the data burst other than the payload field;
transmitting the data burst over the serial digital audio interface.
2. The method of claim 1, wherein the data patterns comprise a 16-bit pattern, a 20-bit pattern, and a 24-bit pattern;
the placing of audio samples of the data stream data in a payload field of a subframe of the data burst according to a data pattern of the data stream data comprises:
if the data pattern is the 16-bit pattern or the 20-bit pattern, placing the audio samples of the data stream data in an audio sample field of the subframe of the data burst;
if the data pattern is the 24-bit pattern, placing the audio samples of the data stream data in the audio sample field and an auxiliary data field of the subframe of the data burst.
3. The method according to claim 1 or 2, wherein the format of the data stream data is PCM data or non-PCM data;
if the data stream data is non-PCM data, the setting of the field other than the payload field in the data burst includes:
setting preset bytes in a channel state field in the data burst.
4. The method of claim 3, wherein setting the preset bytes in the channel state field in the data burst comprises:
setting the preset bytes in the channel state field in the data burst to comprise byte 0, byte 1, byte 2 and byte 23; wherein:
byte 0 bit 0 is set to 1 to indicate professional use of the channel state block; byte 0 bit 1 is set to 1 to indicate the non-audio mode; byte 0 bits 2-4 are set to 000; byte 0 bit 5 is set according to AES3 section 4 to indicate the lock state of the source frame frequency; byte 0 bits 6-7 are set according to AES3 section 4 to indicate the frame rate of the AES3 interface;
byte 1 bits 0-3 are set to 0000; byte 1 bits 4-7 are set according to AES3 section 4;
byte 2 bits 0-2 are set according to AES3 section 4 to indicate the use of the auxiliary sample bits; byte 2 bits 3-5 are set according to AES3 section 4 to indicate the non-PCM data word size; byte 2 bits 6-7 are set to 00;
byte 23 bits 0-7 are set according to AES3 section 4 to indicate the valid CRCC value of the channel state block.
5. The method according to claim 1, further comprising, after the setting of the field other than the payload field in the data burst:
marking the data burst with a numeric identifier to indicate the data stream to which the data burst belongs.
6. The method of claim 3, wherein setting a field other than the payload field in the data burst comprises:
setting a preamble field of the data burst, wherein the preamble field is set at the beginning of each data burst, before the payload field, and contains 4 words respectively representing a first synchronization word, a second synchronization word, a burst information value and a length code.
7. The method of claim 6, wherein setting the preamble field of the data burst comprises:
in frame mode, setting the 4 words contained in the preamble in 2 consecutive frames, wherein the first of the 2 consecutive frames is the frame that starts the data burst, the first subframe of the first frame contains the first synchronization word, the second subframe of the first frame contains the second synchronization word, the first subframe of the second frame contains the burst information value, and the second subframe of the second frame contains the length code;
in subframe mode, setting the 4 words contained in the preamble in 4 sequential subframes of a single channel used for transmitting non-PCM data, wherein the first of the 4 sequential subframes is the subframe that starts the data burst, and the first synchronization word, the second synchronization word, the burst information value and the length code are placed in the 4 sequential subframes in that order.
8. The method of claim 6, wherein:
in frame mode, the payload field is set to use two AES3 channels to transmit a set of non-PCM data streams;
in subframe mode, the payload field is set so that each AES3 channel individually carries a set of non-PCM data burst sequences or a linear PCM audio stream.
9. A data transmission device based on a serial digital audio interface, comprising:
the data acquisition module is used for acquiring data stream data used for generating data bursts; wherein the data stream data comprises audio data and/or audio model metadata;
a data placement module to place audio samples of the data stream data in a payload field of a subframe of the data burst according to a data pattern of the data stream data;
a field setting module, configured to set a field other than the payload field in the data burst;
and the data transmission module is used for transmitting the data burst through a serial digital audio interface.
10. An electronic device, comprising: a memory and one or more processors;
the memory for storing one or more programs;
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-8.
11. A storage medium containing computer-executable instructions for implementing the method of any one of claims 1-8 when executed by a computer processor.
CN202111675350.4A 2021-12-31 2021-12-31 Data transmission method, device and equipment based on serial digital audio interface Active CN114510212B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111675350.4A CN114510212B (en) 2021-12-31 2021-12-31 Data transmission method, device and equipment based on serial digital audio interface

Publications (2)

Publication Number Publication Date
CN114510212A true CN114510212A (en) 2022-05-17
CN114510212B CN114510212B (en) 2023-08-08

Family

ID=81548155

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111675350.4A Active CN114510212B (en) 2021-12-31 2021-12-31 Data transmission method, device and equipment based on serial digital audio interface

Country Status (1)

Country Link
CN (1) CN114510212B (en)

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6076062A (en) * 1995-12-07 2000-06-13 U.S. Philips Corporation Method and device for transferring and decoding a non-PCM bitstream between a digital video disc and a multi-channel reproduction apparatus
EP1054513A2 (en) * 1999-05-20 2000-11-22 Lg Electronics Inc. Method and apparatus for transceiving an audio data stream through a digital interface
US20040234263A1 (en) * 2003-05-19 2004-11-25 Shlomo Ovadia Architecture and method for framing optical control and data bursts within optical transport unit structures in photonic burst-switched networks
WO2006042108A1 (en) * 2004-10-11 2006-04-20 Texas Instruments Incorporated Multi-threaded direct memory access
US20060241796A1 (en) * 2005-04-25 2006-10-26 Microsoft Corporation Digital audio processing
US20070276670A1 (en) * 2006-05-26 2007-11-29 Larry Pearlstein Systems, methods, and apparatus for synchronization of audio and video signals
US7782805B1 (en) * 2005-02-08 2010-08-24 Med Belhadj High speed packet interface and method
US20130058214A1 (en) * 2011-02-17 2013-03-07 Andreas Foglar Method and apparatus to avoid overloads on subscriber access lines
CN104363192A (en) * 2014-10-21 2015-02-18 江苏中兴微通信息科技有限公司 Receiving method and device of MIMO (multiple-input multiple-output) communication system compatible to multiple frame formats
US20150348558A1 (en) * 2010-12-03 2015-12-03 Dolby Laboratories Licensing Corporation Audio Bitstreams with Supplementary Data and Encoding and Decoding of Such Bitstreams
AU2018203734A1 (en) * 2011-07-01 2018-06-21 Dolby Laboratories Licensing Corporation System and Method for Adaptive Audio Signal Generation, Coding and Rendering

Also Published As

Publication number Publication date
CN114510212B (en) 2023-08-08

Similar Documents

Publication Publication Date Title
US7672743B2 (en) Digital audio processing
US7831127B2 (en) Combining video material and data
JP6323518B2 (en) Transmitting apparatus, transmitting method, receiving apparatus, and receiving method
CN101321279B (en) Method and system for processing data
CN106464961B (en) Receiving apparatus, transmitting apparatus, and data processing method
CN101243490A (en) Method and apparatus for encoding and decoding an audio signal
BRPI0412889B1 (en) conversion, combination and decoding methods, conversion and decoding apparatus, and computer readable media
WO2009036630A1 (en) An update and transmission method of electronic service guide content information
EP2276192A2 (en) Method and apparatus for transmitting/receiving multi - channel audio signals using super frame
JP2021107943A (en) Reception apparatus and reception method
CN114500475B (en) Network data transmission method, device and equipment based on real-time transmission protocol
CN114510212B (en) Data transmission method, device and equipment based on serial digital audio interface
EP2093911A2 (en) Receiving system and audio data processing method thereof
CN114512152A (en) Method, device and equipment for generating broadcast audio format file and storage medium
CN114448955B (en) Digital audio network transmission method, device, equipment and storage medium
WO2010009657A1 (en) A method for transmitting service data stream
KR101531510B1 (en) Receiving system and method of processing audio data
CN114360556A (en) Serial audio metadata frame generation method, device, equipment and storage medium
CN110463209B (en) Apparatus and method for transmitting and receiving signal in multimedia system
KR20070003574A (en) Method and apparatus for encoding and decoding an audio signal
CN113889128A (en) Audio production model and generation method, electronic equipment and storage medium
CN114143695A (en) Audio stream metadata and generation method, electronic equipment and storage medium
CN114051194A (en) Audio track metadata and generation method, electronic equipment and storage medium
CN114519121A (en) Audio serial metadata block generation method, device, equipment and storage medium
CN114363791A (en) Serial audio metadata generation method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant