CN111445914B

CN111445914B - Processing method and device for detachable and re-editable audio signals

Info

Publication number: CN111445914B
Application number: CN202010209390.9A
Authority: CN
Inventors: Pan Xingde (潘兴德); Huang Xu (黄旭); Tan Minqiang (谭敏强)
Original assignee: Wavarts Technologies Co ltd
Current assignee: Wavarts Technologies Co ltd
Priority date: 2020-03-23
Filing date: 2020-03-23
Publication date: 2023-10-17
Anticipated expiration: 2040-03-23
Also published as: CN111445914A; WO2021190039A1

Abstract

The invention discloses a processing method and device for detachable and re-editable audio signals, relating to the technical fields of digital signal processing and audio production. It addresses the following problem: while ensuring that compressed audio can be decoded completely and correctly, the whole production process from original signal input to compressed sound signal output cannot be completed in one physical device, so additional physical devices and transmission steps are required. In the disclosed solution, an audio editing module adds, deletes or replaces any audio track to generate a new track set, and an audio encoding module encodes the tracks and the auxiliary data to obtain a compressed sound signal. The entire production flow from original signal input to compressed sound signal output can therefore be completed in one physical device, and operations such as adding, deleting and replacing can be performed on any audio track.

Description

Processing method and device for detachable and re-editable audio signals

Technical Field

The present disclosure relates to the fields of digital signal processing and audio production, and in particular to a method and apparatus for processing detachable and re-editable audio signals.

Background

Audio technology has developed over many years, and stereo, 5.1 and 7.1 surround sound systems are widely used. Because these systems lack sound height information, however, they can at most present two-dimensional sound. In the real world, panoramic sound (also called three-dimensional sound) is the most realistic way of presenting and expressing sound, and it is the future direction of development in nature, the arts and audiovisual entertainment.

Panoramic sound is sometimes also called three-dimensional sound or immersive sound. A panoramic sound signal is generally divided into audio data and auxiliary data. The audio data may be mono or multi-channel audio signals, such as mono, stereo, 4.0, 5.1, 7.1, 9.1, 11.1, 13.1 or 22.2 channels, or combinations of these channel types, for example a 7.1-channel signal + a 4.0-channel signal + 6 stereo signals. The auxiliary data are generally used to define the spatial position or rendering mode of the audio data and thereby improve how the audio data are presented: three-dimensional positioning information, for example, strengthens the sense of space and immersion, while sound effects applied during processing (such as equalization and reverberation) diversify the audio and enrich the listening experience. One piece of audio data together with its auxiliary data is collectively called a sound object, and audio data without auxiliary data are called a sound bed. Typical commercially available panoramic sound technologies include the national three-dimensional panoramic sound standard AVS2-P3 (GB/T 33475.3), the international standard MPEG-H (ISO/IEC 23008-3), Dolby Atmos and WANOS.
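To illustrate how auxiliary data can define the rendering of audio data that itself stays unmixed, the following Python sketch treats a mono track plus a single azimuth value as a sound object and renders it to stereo with a constant-power pan law. The pan law, the azimuth range of -45° to +45° and the function name are assumptions chosen for illustration; the standards cited above define their own, more elaborate rendering.

```python
import math

def pan_mono_to_stereo(samples, azimuth_deg):
    """Render a mono track to stereo using a constant-power pan law.

    azimuth_deg: -45 (hard left) .. 0 (centre) .. +45 (hard right); this
    parameterisation is a hypothetical stand-in for spatial auxiliary data.
    """
    if not -45.0 <= azimuth_deg <= 45.0:
        raise ValueError("azimuth must be within [-45, 45] degrees")
    # Map the azimuth to a pan angle in [0, pi/2]; cos^2 + sin^2 = 1 keeps power constant.
    theta = math.radians(azimuth_deg + 45.0)
    g_left, g_right = math.cos(theta), math.sin(theta)
    left = [s * g_left for s in samples]
    right = [s * g_right for s in samples]
    return left, right
```

Because the azimuth is stored next to the track rather than baked into it, changing the position later only means editing one number.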

In a panoramic sound signal, the audio data may be a mono signal, a stereo signal, a single-layer multi-channel signal, a multi-layer multi-channel signal (i.e. a combination of several multi-channel signals distributed over different height planes), and so on. For example, some panoramic sound signals use two planes, a middle layer and a top layer (e.g. 5.1.4 channels combine a 5.1-channel signal in the middle layer with a 4.0-channel signal in the top layer), while others use three planes; some panoramic sound signals contain only multi-layer audio data without auxiliary data, such as the SMPTE 22.2 three-dimensional sound system and the AURO 9.1 system; others contain both multi-layer multi-channel signals and auxiliary data, such as MPEG-H, Dolby Atmos, WANOS and DTS:X. A panoramic sound signal may of course also consist entirely of mono or stereo signals plus auxiliary data.
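The channel arithmetic behind layouts such as 5.1.4 can be made concrete with a short Python sketch. It assumes the common "main.LFE[.height]" reading of layout names (so "5.1.4" is 5 main + 1 LFE + 4 height channels); the helper names and the "mono"/"stereo" aliases are hypothetical.

```python
ALIASES = {"mono": "1.0", "stereo": "2.0"}

def channel_count(layout: str) -> int:
    """Number of channels in a layout written as main.lfe[.height].

    "5.1" -> 6, "22.2" -> 24, "5.1.4" -> 10 (5 main + 1 LFE + 4 height).
    """
    layout = ALIASES.get(layout, layout)
    parts = layout.split(".")
    if not 2 <= len(parts) <= 3 or not all(p.isdigit() for p in parts):
        raise ValueError(f"unrecognised layout: {layout!r}")
    return sum(int(p) for p in parts)

def total_channels(layouts) -> int:
    """Channels in a combined signal such as 7.1 + 4.0 + 6 x stereo."""
    return sum(channel_count(layout) for layout in layouts)
```

For instance, the combination mentioned above (7.1 + 4.0 + 6 stereo signals) gives `total_channels(["7.1", "4.0"] + ["stereo"] * 6)`, i.e. 8 + 4 + 12 = 24 channels.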

Like AAC, AC3 and MP3, panoramic sound formats are compressed audio formats. Two types of production tools are currently in common use for producing compressed audio signals:

the first type is the digital audio workstation (DAW), such as Pro Tools, Nuendo, Cubase, Logic Pro and Adobe Audition. DAWs are widely used in film and music production and, with professional audio plug-ins, can produce high-quality audio signals.

The second type is audio and video applications such as karaoke (K-song) apps, short-video apps and dubbing software. These applications are widely used by the general public and have quietly changed people's daily life and work. They support editing and production in conventional audio formats (PCM and commonly used compressed formats such as MP3, AAC, WMA and AC3), support secondary creation of audio signals (such as multi-person chorus and ensemble, or relay/collaborative production of a work), and are highly entertaining and interactive.

A conventional method for producing an audio signal is shown in FIG. 1 and comprises the following steps:

101: Add audio data (hereinafter referred to as audio tracks): record an input audio signal or import an audio file in a conventional format; an imported file is decoded into PCM data. When adding is complete, the result is recorded as track set B;

102: Add auxiliary data. In a DAW, one or more pieces of auxiliary data may be configured per track; in karaoke or short-video software, a piece of auxiliary data may be added to the vocal. When adding is complete, the result is recorded as auxiliary data set E0;

103: Edit any track in track set B and any auxiliary data in auxiliary data set E0, including adding, deleting and replacing operations. Steps 101 to 103 may be performed selectively or repeatedly, in any order; when production is complete, a track set B' and an auxiliary data set E0' are generated;

104: Encode the produced tracks and auxiliary data into a compressed audio signal S0'. If the output format is AAC, AC3 or similar, the auxiliary data set E0' is applied to track set B' during production to generate a pure track set B'', and B'' is encoded into a compressed audio file. If the output format is a panoramic sound format, track set B' and auxiliary data set E0' are transmitted to a dedicated panoramic sound encoding device, which encodes them into a panoramic sound signal.
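For the AAC/AC3 branch of step 104, the following Python sketch shows what "applying E0' to B' to generate B''" amounts to: per-track gains and pan gains (hypothetical stand-ins for auxiliary data) are multiplied into the samples and all tracks are summed into one stereo pair. The function name and data layout are assumptions for illustration only.

```python
def bake_and_downmix(tracks, gains, pans):
    """Sketch of conventional step 104 for a stereo AAC/AC3-style output.

    tracks: dict name -> mono samples (track set B', equal lengths required)
    gains:  dict name -> linear gain; pans: dict name -> (left, right) gains.
    Both dicts stand in for the auxiliary data set E0' (hypothetical layout).
    Returns the single stereo pair B'' that would then be encoded.
    """
    if not tracks:
        return [], []
    length = len(next(iter(tracks.values())))
    left, right = [0.0] * length, [0.0] * length
    for name, samples in tracks.items():
        if len(samples) != length:
            raise ValueError("all tracks must have the same length")
        g = gains.get(name, 1.0)
        pl, pr = pans.get(name, (0.5, 0.5))    # default: centred
        for n, s in enumerate(samples):
            left[n] += g * pl * s              # effect and mix are baked in together
            right[n] += g * pr * s
    return left, right
```

After this step only the summed pair remains; the individual tracks, gains and pan settings are no longer stored anywhere in B''.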

Steps 101 to 104 can produce high-quality audio signals but still have several disadvantages:

(1) If the output signal is in a panoramic sound format, two physical devices or software systems are required to complete encoding; no case of editing and encoding with a single piece of software or device is known so far. Moreover, the audio tracks and the auxiliary data are transmitted separately: the tracks over audio protocols (such as MADI and AES) and the auxiliary data over network protocols (such as TCP/IP), so latency and synchronization between the audio data and the network data must also be handled, which makes the process complex.

(2) If the output signal is in a panoramic sound format, panoramic sound can currently only be produced on a PC, with relatively high hardware requirements; no case is known of panoramic sound editing and production in interactive applications such as karaoke, short-video or dubbing software.

(3) Furthermore, a DAW can only serve as a professional production system that outputs finished results. The output sound signal has been down-mixed, so multiple sound elements are mixed into one PCM stream and cannot be separated. Consumer software such as short-video and karaoke apps can only add to, or apply simple processing to, audio signals that have already been down-mixed into their final form; specific sound elements cannot be removed.

(4) In Internet applications it is sometimes necessary to take the output compressed audio signal S0' as a new input signal and make temporary modifications or secondary creations based on it. In that case the components of S0' cannot be disassembled and individually added, deleted or replaced; S0' can only be edited as a whole, so a specific sound component can be neither removed nor replaced, nor can its sound effects be modified. For a rock song, for example, existing DAWs typically down-mix the guitar, bass, drums, keyboard and vocals into 2-channel or 5.1-channel PCM and encode the result. Even after the encoded song is decoded, the guitar, bass, drum, keyboard and vocal parts cannot be separated, specific parts cannot be deleted or replaced, and effects originally applied to part or all of the work, such as reverberation, EQ and compression/limiting, cannot be removed or modified. The only options are to add new parts on top of the original song or to apply effects to the song as a whole.
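Why a down-mix cannot be undone can be shown with a tiny worked example: two different sets of stems can sum to exactly the same mixed signal, so the mix alone cannot tell which parts produced it. The sketch below uses hypothetical integer PCM sample values and plain summation as the down-mix.

```python
def downmix_mono(stems):
    """Sum equal-length integer PCM stems sample by sample into one mono signal."""
    return [sum(frame) for frame in zip(*stems.values())]

# Two different arrangements (hypothetical 16-bit sample values).
song_a = {"guitar": [2, 4, -1], "vocal": [3, -2, 5]}
song_b = {"guitar": [5, 2, 4], "vocal": [0, 0, 0]}

mix_a = downmix_mono(song_a)     # [5, 2, 4]
mix_b = downmix_mono(song_b)     # [5, 2, 4]

# Keeping the stems makes "delete the vocal" a trivial, exact operation.
karaoke_a = downmix_mono({k: v for k, v in song_a.items() if k != "vocal"})   # [2, 4, -1]
```

Given only `[5, 2, 4]`, there is no way to decide whether the vocal was `[3, -2, 5]` or silent; given the stems, removing it is exact.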

In view of the above, no single physical device (or software or method) has so far been able to provide the following functions:

(1) Sound decoding, track editing and production, and auxiliary data (including sound effect) editing, production and encoding can all be completed in one physical device (or piece of software), without additional physical devices (or software) or data transmission;

(2) Each sound component can be independently encoded, edited and decoded by anyone, at any time and place, without being mixed with other sound components;

(3) Auxiliary data such as spatial information, rendering information, gain, reverberation and equalization of a single sound component, some sound components or all sound components can be freely decoded, edited and encoded by anyone, at any time and place, without being inseparably mixed with other sound information;

(4) Compatibility across devices such as DAWs, karaoke software, video software and dubbing software, so that any sound work can be decoded, edited, encoded and shared by anyone (professionals and amateurs alike) at any time and place.

Disclosure of Invention

The invention provides a processing method and device for detachable and re-editable audio signals. Its technical aim is, while ensuring that the audio can be decoded completely and correctly, to complete the whole production process from original signal input to signal output in one physical device, without additional physical devices or transmission steps; every track and every piece of auxiliary data contained in the code stream can be fully separated during decoding, and any track or auxiliary data can be added, deleted or replaced, or subjected to any combination of these three operations. The disclosed method and apparatus provide the following functions:

1. sound decoding, track editing and production, and auxiliary data (including sound effect) editing, production and encoding can all be completed in one physical device (or piece of software), without additional physical devices (or software) or data transmission;

2. each sound component can be independently encoded, edited and decoded by anyone, at any time and place, without being mixed with other sound components;

3. auxiliary data such as spatial information, rendering information, gain, reverberation and equalization of a single sound component, some sound components or all sound components can be freely decoded, edited and encoded by anyone, at any time and place, without being inseparably mixed with other sound information;

4. compatibility across various devices (such as DAWs, karaoke software, video software and dubbing software), so that anyone (professionals and amateurs alike) can decode, edit, encode and share the same sound work at any time and place.

The technical aim of the disclosure is achieved by the following technical solution:

a method of processing a detachable and re-editable audio signal, comprising:

inputting m1 PCM signals, where m1 > 0; the m1 PCM signals form the track set $C1 = \{C_{1i}\},\ 0 \le i \le m1 - 1$;

adding, deleting or replacing tracks in the track set C1, or any combination of these three operations, to generate a new track set C1';

adding at least one group of auxiliary data to the track set C1' to obtain an auxiliary data set E1';

encoding the track set C1' and the auxiliary data set E1' to obtain a compressed sound signal $S_q'$.
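The steps above can be sketched in Python as follows. The sketch edits the track set C1 into C1', attaches an auxiliary record to form E1', and "encodes" both into one byte stream. The encoder here is a hypothetical lossless JSON container standing in for a real codec such as AVS2-P3 or MPEG-H; it only shows that tracks and auxiliary data remain individually recoverable. All names and the record layout are assumptions.

```python
import json

def edit_tracks(tracks, add=None, delete=(), replace=None):
    """Return C1' from C1 after any combination of add / delete / replace.

    tracks: dict track_id -> list of integer PCM samples (the track set C1).
    """
    edited = {tid: list(pcm) for tid, pcm in tracks.items()}   # leave C1 untouched
    for tid in delete:
        del edited[tid]                       # KeyError if the track is missing
    for tid, pcm in (replace or {}).items():
        if tid not in edited:
            raise KeyError(f"cannot replace missing track {tid!r}")
        edited[tid] = list(pcm)
    for tid, pcm in (add or {}).items():
        if tid in edited:
            raise KeyError(f"track {tid!r} already exists")
        edited[tid] = list(pcm)
    return edited

def encode(tracks, aux):
    """Stand-in for the audio encoding module: a lossless JSON container, NOT a
    real codec. Tracks and auxiliary data are stored side by side, never mixed."""
    return json.dumps({"tracks": tracks, "aux": aux}, sort_keys=True).encode("utf-8")

def decode(stream):
    """Inverse of encode(): recovers every track and auxiliary record individually."""
    payload = json.loads(stream.decode("utf-8"))
    return payload["tracks"], payload["aux"]

C1 = {"vocal": [1, 2, 3], "guitar": [4, 5, 6]}
C1_edited = edit_tracks(C1, add={"drums": [7, 8, 9]}, delete=["guitar"])
E1_added = [{"type": "reverb", "tracks": ["vocal"], "room": 0.3}]
S_q = encode(C1_edited, E1_added)
```

A real codec would compress the PCM data, but the design point carried over is that the bitstream keeps one entry per track and per auxiliary record.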

Further, the method comprises the steps of:

when m2 auxiliary data are input and m2 > 0, there is an auxiliary data set $E1 = \{E_{1j}\},\ 0 \le j \le m2 - 1$;

encoding the track set C1' and the auxiliary data sets E1 and E1' to obtain a compressed sound signal $S_q''$.

Further, the method comprises the steps of:

inputting n3 PCM signals and n4 auxiliary data, where n3 > 0 and n4 > 0; the track set is $C3 = \{C_{3k}\},\ 0 \le k \le n3 - 1$, and the auxiliary data set is $E3 = \{E_{3t}\},\ 0 \le t \le n4 - 1$;

adding, deleting or replacing tracks in the track set C3, or any combination of the three operations, to generate a new track set C3';

adding, deleting or replacing entries in the auxiliary data set E3, or any combination of the three operations, to obtain an auxiliary data set E3';

encoding the track set C3' and the auxiliary data set E3' to obtain a compressed sound signal $S_q'''$.
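Editing the auxiliary data set E3 into E3' can be sketched the same way as track editing. In the Python sketch below, each auxiliary record lists the tracks it acts on (a hypothetical layout), and an extra, assumed housekeeping step drops references to tracks that were deleted from C3, so that no record in E3' points at a missing track. The source does not prescribe this clean-up; it is one reasonable design choice.

```python
def edit_aux(aux, add=None, delete=(), replace=None):
    """Return E3' from E3 after any combination of add / delete / replace.

    aux: dict aux_id -> record, where each record lists the tracks it acts on
    (hypothetical layout, e.g. {"type": "eq", "tracks": ["vocal"]}).
    """
    edited = {aid: dict(rec) for aid, rec in aux.items()}   # leave E3 untouched
    for aid in delete:
        del edited[aid]
    for aid, rec in (replace or {}).items():
        if aid not in edited:
            raise KeyError(f"cannot replace missing auxiliary data {aid!r}")
        edited[aid] = dict(rec)
    for aid, rec in (add or {}).items():
        if aid in edited:
            raise KeyError(f"auxiliary data {aid!r} already exists")
        edited[aid] = dict(rec)
    return edited


def drop_dangling(aux, live_tracks):
    """Assumed clean-up: keep only references to tracks still present in C3'."""
    kept = {}
    for aid, rec in aux.items():
        targets = [t for t in rec["tracks"] if t in live_tracks]
        if targets:                      # a record acting on no track is removed
            kept[aid] = {**rec, "tracks": targets}
    return kept
```

For example, after the guitar track is deleted from C3, a down-mix record that acted on `["vocal", "guitar"]` is kept but narrowed to `["vocal"]`.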

Further, some or all of the input PCM signals may come from a recording device, local storage, network input, or any combination of the three.

Further, locally stored or network-input PCM signals may be obtained by decoding compressed audio signals.

Further, the auxiliary data may be obtained by decoding the compressed audio signal.

Further, the auxiliary data may be a down-mix scheme for the tracks, spatial position information, spatial trajectory information, reverberation parameters, equalizer parameters, and the like.

Further, the auxiliary data may be applied to all or some of the tracks in the track set.

Further, the auxiliary data may be fixed or may vary over time.
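Time-varying auxiliary data such as spatial trajectory information are often stored as keyframes and interpolated at rendering time, while fixed data are simply the one-keyframe case. The Python sketch below uses linear interpolation between (time, (x, y, z)) keyframes; the keyframe layout and the choice of linear interpolation are assumptions, not something the disclosure specifies.

```python
from bisect import bisect_right

def position_at(keyframes, t):
    """Linearly interpolate a position from (time, (x, y, z)) keyframes.

    Keyframe times must be strictly increasing. Before the first and after the
    last keyframe the position is held; one keyframe means a fixed position.
    """
    if not keyframes:
        raise ValueError("at least one keyframe is required")
    times = [k[0] for k in keyframes]
    if t <= times[0]:
        return keyframes[0][1]
    if t >= times[-1]:
        return keyframes[-1][1]
    i = bisect_right(times, t)          # times[i-1] <= t < times[i]
    (t0, p0), (t1, p1) = keyframes[i - 1], keyframes[i]
    a = (t - t0) / (t1 - t0)
    return tuple(c0 + a * (c1 - c0) for c0, c1 in zip(p0, p1))
```

Storing the trajectory rather than a rendered result keeps it editable: moving a sound object only changes a few keyframes.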

A processing apparatus for detachable and re-editable audio signals, comprising:

the audio input module comprises a PCM input unit, which inputs m1 PCM signals, where m1 > 0; the m1 PCM signals form the track set $C1 = \{C_{1i}\},\ 0 \le i \le m1 - 1$;

the audio editing module comprises a track editing unit, which adds, deletes or replaces tracks in the track set C1, or applies any combination of the three operations, to generate a new track set C1';

the auxiliary data adding module adds at least one group of auxiliary data to the track set C1' to obtain an auxiliary data set E1';

the audio encoding module encodes the track set C1' and the auxiliary data set E1' to obtain a compressed sound signal $S_q'$.

Further, the audio input module further comprises an auxiliary data input unit, which inputs m2 auxiliary data; if m2 > 0, there is an auxiliary data set $E1 = \{E_{1j}\},\ 0 \le j \le m2 - 1$;

the audio encoding module encodes the track set C1' and the auxiliary data sets E1 and E1' to obtain a compressed sound signal $S_q''$.

Further, the audio editing module further comprises an auxiliary data editing unit, which adds, deletes or replaces auxiliary data in the auxiliary data set, or applies any combination of the three operations, to obtain a new auxiliary data set.

Further, some or all of the PCM signals input by the PCM input unit may come from a recording device, local storage, network input, or any combination of the three.

Further, the apparatus further comprises a decoding module including an audio decoding unit, through which locally stored or network-input PCM signals can be obtained by decoding compressed audio signals.

Further, the decoding module further includes an auxiliary data decoding unit through which the auxiliary data is obtained by decoding the compressed audio signal.

The beneficial effects of the present disclosure are as follows. In the disclosed audio signal processing method and device, an audio input module inputs audio signals and an auxiliary data adding module can add auxiliary data to tracks; an audio editing module adds, deletes or replaces any track or auxiliary data, or applies any combination of the three operations, generating a new track set and auxiliary data set; and an audio encoding module encodes the tracks and auxiliary data into a compressed sound signal.

The entire production process from original signal input to compressed sound signal output can be completed in one physical device, without additional physical devices or transmission steps, and any track or auxiliary data can be added, deleted or replaced, or subjected to any combination of the three operations.

Drawings

FIG. 1 is a flow chart of a conventional audio production method;

FIG. 2 is a flow chart of a first embodiment of the disclosed method;

FIG. 3 is a flow chart of a second embodiment of the disclosed method;

FIG. 4 is a schematic diagram of a first embodiment of the disclosed apparatus;

FIG. 5 is a schematic diagram of a second embodiment of the disclosed apparatus;

FIG. 6 is a schematic diagram of a third embodiment of the disclosed apparatus.

Detailed Description

The technical scheme of the present disclosure will be described in detail below with reference to the accompanying drawings.

In the description of the present disclosure, it should be understood that PCM (pulse-code modulation) track data represent a single, separate sound component, not several sound components mixed together inseparably. That is, PCM track data are an independent part, instrument or voice, rather than several parts, instruments or voices mixed together so that they cannot be disassembled. PCM track data may be PCM data of independent sound components obtained by recording, input, decoding and so on, such as individual instruments (guitar, bass, drums, keyboard, violin, etc.), vocal parts, or combinations thereof. As a special case of the invention, PCM track data may also contain sound components that are mixed together inseparably; in that case, track editing and sound effect editing can only be applied to the mixed components as a whole, and the components within the PCM track data cannot be disassembled and processed separately.

Embodiment one: adding shared auxiliary data to the edited tracks.

The processing method and device for detachable and re-editable audio signals can perform editing operations such as adding, deleting and replacing on the input tracks, and can add one or more pieces of shared auxiliary data to all or some of the tracks. As shown in FIG. 2, the method comprises the following steps:

(301) Input m PCM tracks ($m \ge 1$). After input, the total number of existing tracks is denoted x, and all tracks are recorded as the track set $C[0, \ldots, x-1]$. Some or all of the input tracks may come from recording device input, local storage, network input, or any combination of the three.

(302) Editing: perform adding, deleting and replacing operations on the existing tracks, always keeping x equal to the current number of tracks; the produced track set is recorded as $C'[0, \ldots, x-1]$. Adding a track works the same way as in step (301);

(303) In addition, n pieces of auxiliary data may be added to y tracks of the produced track set C' and recorded as the auxiliary data set $E'[0, \ldots, n-1]$. Each piece of auxiliary data in E' acts on the y tracks simultaneously, i.e. E' is shared by the y tracks, where $n \ge 0$ and $1 \le y \le x$;

the track adding, deleting and replacing operations and the auxiliary data adding operation may be performed selectively and repeatedly, in any order.

(304) Audio coding: the track set C' and the corresponding auxiliary data set E' are jointly encoded into a compressed audio signal S'. For the encoding technique, reference may be made to the national three-dimensional panoramic sound standard AVS2-P3 (GB/T 33475.3), the international standard MPEG-H (ISO/IEC 23008-3), Dolby Atmos, WANOS, etc.
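What "E' is shared by y tracks" means at rendering time can be sketched as follows: each shared record is stored once but applied to every track it targets, subject to $1 \le y \le x$. A linear gain stands in for any shared effect (automatic gain is one of the shared examples given in this document); the record layout and function name are hypothetical, and in the disclosed method the record would be encoded alongside C' rather than baked in before encoding.

```python
def render_shared(tracks, shared_aux):
    """Apply each shared auxiliary record in E' to every track it targets.

    tracks: dict track_id -> list of float samples, i.e. the produced set C'
    (x = len(tracks)). shared_aux: list of {"gain": g, "tracks": [...]}
    records, i.e. E'[0..n-1] (hypothetical record layout).
    """
    x = len(tracks)
    out = {tid: list(pcm) for tid, pcm in tracks.items()}   # C' itself is left untouched
    for rec in shared_aux:
        targets = set(rec["tracks"])                         # the y tracks sharing rec
        if not 1 <= len(targets) <= x or not targets.issubset(out):
            raise ValueError(f"a shared record must target 1..{x} existing tracks")
        for tid in targets:
            out[tid] = [s * rec["gain"] for s in out[tid]]
    return out
```

With n = 0 (no shared records) the tracks come back unchanged, matching the $n \ge 0$ condition in step (303).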

Embodiment two: tracks and auxiliary data are input, and various types of auxiliary data are added, deleted and replaced during editing and production.

The processing method and device for detachable and re-editable audio signals provided by the invention can, on the basis of Embodiment one, perform editing operations such as adding, deleting and replacing on auxiliary data, and can edit various types of auxiliary data. As shown in FIG. 3, the method comprises the following steps:

(401) Input data, comprising:

(401.1) Adding audio signals: some or all of the added audio signals may come from recording device input, local storage, network input, or any combination of the three; for local storage and network input, the audio format may be PCM, compressed audio, or a mixture of the two. If the added audio signals comprise m3 recorded PCM tracks, m4 locally imported PCM signals, m5 locally imported compressed audio signals and m6 network compressed audio signals, the m5 local compressed signals are decoded into m5' PCM signals and the m6 network compressed signals into m6' PCM signals; the total number of existing tracks is denoted x, and all tracks form the track set $C[0, \ldots, x-1]$. Here $m3, m4, m5, m6 \ge 0$, $m3 + m4 + m5 + m6 \ge 1$, $m5' \ge m5$, $m6' \ge m6$ and $x = m3 + m4 + m5' + m6'$. Audio formats of the local and network compressed signals include, but are not limited to, AAC, AC3, MP3, WANOS and Atmos; for decoding, reference may be made to AAC (ISO/IEC 13818-7), AC3 (ATSC A/52), MP3, the national three-dimensional panoramic sound standard AVS2-P3 (GB/T 33475.3), the international standard MPEG-H (ISO/IEC 23008-3), Dolby Atmos, WANOS, etc.

(401.2) Adding auxiliary data. Auxiliary data are added to the existing tracks and recorded as set E. Auxiliary data correspond to tracks: a piece of auxiliary data may apply to a single track (e.g. equalizer, reverberation, spatial information) or to several tracks simultaneously (e.g. down-mix, automatic gain). From the tracks' perspective, each track may have one or more pieces of auxiliary data, and several tracks may share one or more pieces of auxiliary data; effects on a single track and effects shared by several tracks may coexist in any combination.

For auxiliary data on individual tracks, m pieces of auxiliary data are added to arbitrary tracks of the existing track set C and grouped by track into the auxiliary data set E4, where $E4[i][0, \ldots, e_i - 1]$ are the auxiliary data of track C[i] and $e_i$ is the current number of auxiliary data on the i-th track. For auxiliary data shared by several tracks, n pieces of auxiliary data are added to y tracks of C and recorded as $E5[0, \ldots, n-1]$; each piece of auxiliary data in E5 acts on, i.e. is shared by, the y tracks simultaneously. Here $m \ge 0$, $n \ge 0$, $m + n \ge 1$, $e_i \ge 0$ ($e_i = 0$ means the i-th track has no auxiliary data), $0 \le i < x$ and $1 \le y \le x$ ($y = x$ means the auxiliary data in E5 act on all tracks in C; $1 \le y < x$ means they act on some of the tracks in C), and $E = E4 + E5$.
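The bookkeeping of step (401) can be checked with a short Python sketch: one helper computes $x = m3 + m4 + m5' + m6'$ from how many PCM tracks each compressed input decoded into, and another assembles E = E4 + E5 and derives each $e_i$. The argument layouts and helper names are hypothetical; treating m as the total of all $e_i$ follows from m pieces of auxiliary data being grouped by track.

```python
def count_tracks(m3, m4, local_decoded, network_decoded):
    """x = m3 + m4 + m5' + m6' for step (401.1).

    m3, m4: recorded PCM tracks and locally imported PCM signals (one track each).
    local_decoded / network_decoded: for each compressed input, how many PCM
    tracks it decoded into (so m5 = len(local_decoded), m5' = sum(local_decoded)).
    """
    if m3 < 0 or m4 < 0:
        raise ValueError("m3 and m4 must be non-negative")
    if any(k < 1 for k in (*local_decoded, *network_decoded)):
        raise ValueError("each compressed input decodes to at least one PCM track")
    if m3 + m4 + len(local_decoded) + len(network_decoded) < 1:
        raise ValueError("at least one audio signal must be added")
    return m3 + m4 + sum(local_decoded) + sum(network_decoded)


def build_aux_set(x, per_track, shared):
    """Assemble E = E4 + E5 for step (401.2) and return the counts (e, m, n).

    per_track: dict track index i -> list of records, i.e. E4[i] with e_i entries.
    shared: list of (record, target_indices) pairs, i.e. E5 with n entries.
    """
    if any(not 0 <= i < x for i in per_track):
        raise IndexError("per-track auxiliary data must use indices 0..x-1")
    for record, targets in shared:
        y = len(set(targets))
        if not 1 <= y <= x or any(not 0 <= t < x for t in targets):
            raise ValueError(f"shared record {record!r} must target 1..{x} valid tracks")
    e = [len(per_track.get(i, [])) for i in range(x)]   # e_i = 0 means no auxiliary data
    m, n = sum(e), len(shared)
    if m + n < 1:
        raise ValueError("m + n must be at least 1")
    return {"E4": per_track, "E5": shared, "e": e, "m": m, "n": n}
```

For example, one recorded track, two imported PCM files, a local file decoding into 6 tracks plus a mono MP3, and one network stream decoding into 2 tracks give `count_tracks(1, 2, [6, 1], [2])`, i.e. x = 1 + 2 + 7 + 2 = 12.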

(402) Editing and production

Perform adding, deleting and replacing operations on the existing tracks, always keeping x equal to the current number of tracks; the produced track set is recorded as $C'[0, \ldots, x-1]$. Adding a track works the same way as in step (401.1);

perform adding, deleting and replacing operations on the existing auxiliary data, always keeping $e_i$ equal to the number of auxiliary data on the i-th track; the produced auxiliary data set is recorded as $E'[0, \ldots, x-1]$. Adding auxiliary data works the same way as in step (401.2);

the adding, deleting and replacing operations on tracks and auxiliary data may be performed selectively and repeatedly, in any order.

(403) Audio coding. The track set C' and its corresponding auxiliary data set E' are jointly encoded into a compressed audio signal S'. For the encoding technique, reference may be made to the national three-dimensional panoramic sound standard AVS2-P3 (GB/T 33475.3), the international standard MPEG-H (ISO/IEC 23008-3), Dolby Atmos, etc.

Embodiment three: the input audio signal contains auxiliary data, and the output audio signal can be used for secondary production.

The processing method and apparatus for detachable and re-editable audio signals according to the invention can add auxiliary data to each track, and can use an audio signal (e.g. the final output signal S') as an input source for secondary production. As shown in FIG. 3, the method comprises the following steps:

(501) Input m7 compressed audio signals containing auxiliary data. The m7 signals are decoded (for the decoding technique, reference may be made to the national three-dimensional panoramic sound standard AVS2-P3 (GB/T 33475.3), the international standard MPEG-H (ISO/IEC 23008-3), Dolby Atmos, WANOS, etc.), so that the track data and auxiliary data they contain are completely separated, yielding m8 PCM tracks and m9 pieces of auxiliary data. The m8 tracks are recorded as the set $C[0, \ldots, m8-1]$; the m9 pieces of auxiliary data are grouped by track and recorded as the set $E[0, \ldots, m8-1]$, meaning that the m9 pieces of auxiliary data correspond to the m8 tracks and the auxiliary data of the i-th track are $E[i][0, \ldots, e_i - 1]$. Here $1 \le m7 \le m8$, $0 \le i < m8$, $e_i \ge 0$ ($e_i = 0$ means the i-th track carries no auxiliary data), $m9 > 0$ and $\sum_i e_i = m9$;

Let the current track number be x, then x=m8 at this time;
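Grouping the decoded auxiliary data of step (501) by track can be sketched as below. The sketch assumes, hypothetically, that the decoder hands back each auxiliary record together with the index of the track it belongs to; real bitstream formats carry this association in their own way. By construction the returned counts satisfy $\sum_i e_i = m9$.

```python
def group_aux_by_track(m8, decoded_aux):
    """Group decoded auxiliary data into E[0..m8-1] as in step (501).

    decoded_aux: list of (track_index, record) pairs; this pair layout is an
    assumption about what a decoder hands back, not a standard format.
    Returns (E, e) where E[i] holds the e_i records of track i.
    """
    E = [[] for _ in range(m8)]
    for i, record in decoded_aux:
        if not 0 <= i < m8:
            raise IndexError(f"auxiliary data refers to missing track {i}")
        E[i].append(record)
    e = [len(records) for records in E]     # e_i = 0: track i is a sound bed
    return E, e
```

Tracks whose list stays empty are exactly the ones without auxiliary data.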

(502) Editing is performed on the basis of the track set C and the auxiliary data set E, including but not limited to:

adding, deleting and replacing operations on the existing tracks, always keeping x equal to the current number of tracks and C containing all current tracks;

adding, deleting and replacing operations on the existing auxiliary data, always keeping $e_i$ equal to the number of auxiliary data on the i-th track and E containing the current auxiliary data of every track. In addition to the properties described in (401.2), auxiliary data may vary over time (e.g. spatial position information; see the national standard GB/T 33475.3, Dolby Atmos, etc.) or be fixed (e.g. equalizer parameters).

The produced track set is recorded as $C'[0, \ldots, x-1]$ and the produced auxiliary data set as $E'[0, \ldots, x-1]$.

These adding, deleting and replacing operations may be performed selectively and repeatedly, in any order.

(503) Audio coding. The track set C' and its corresponding auxiliary data set E' are jointly encoded into a compressed audio signal S'. During encoding, fixed and time-varying auxiliary data may be processed differently; for details, reference may be made to the national three-dimensional panoramic sound standard AVS2-P3 (GB/T 33475.3), the international standard MPEG-H (ISO/IEC 23008-3), Dolby Atmos, etc.

(504) Secondary production. If S' is to be temporarily modified or used for secondary production (e.g. multi-person chorus/ensemble, or multi-person relay/collaboration to complete a work), steps (501) to (503) are repeated with S' as the input source until production is complete, and the final compressed audio signal is output by step (503).
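The relay loop of step (504) can be sketched in Python: each contributor decodes the current S', edits tracks or auxiliary data, and re-encodes. The container is again a hypothetical lossless JSON stand-in, not a real panoramic sound codec, and the three contributors are invented for illustration. The point shown is that after several rounds every part and every effect is still individually addressable, so a later contributor can remove an effect added by an earlier one.

```python
import json

def encode(tracks, aux):
    """Stand-in container (lossless JSON), NOT a real panoramic-sound codec."""
    return json.dumps({"tracks": tracks, "aux": aux}, sort_keys=True).encode("utf-8")

def decode(stream):
    payload = json.loads(stream.decode("utf-8"))
    return payload["tracks"], payload["aux"]

def produce(stream, edit):
    """One pass of steps (501)-(503): decode S' (if any), edit, re-encode."""
    tracks, aux = decode(stream) if stream else ({}, {})
    tracks, aux = edit(tracks, aux)
    return encode(tracks, aux)

# Hypothetical relay: three contributors work on the same output S' in turn.
def singer_a(tracks, aux):
    tracks["vocal_a"] = [1, 2, 3]
    aux["reverb_a"] = {"type": "reverb", "tracks": ["vocal_a"], "room": 0.5}
    return tracks, aux

def singer_b(tracks, aux):
    tracks["vocal_b"] = [4, 5, 6]          # chorus part added on top of S'
    return tracks, aux

def remixer(tracks, aux):
    del aux["reverb_a"]                    # remove an effect someone else added
    return tracks, aux

s_prime = None
for contributor in (singer_a, singer_b, remixer):
    s_prime = produce(s_prime, contributor)
```

With a down-mixed output, the remixer's step would be impossible; here it is a single dictionary deletion.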

One or more pieces of auxiliary data may be added to each track: a track may have no auxiliary data, one piece, or several. The auxiliary data set E1' is therefore the collection of auxiliary data of all tracks in the track set C1'. In general, a track without auxiliary data is called a sound bed, and a track with auxiliary data is called a sound object.

After tracks and their auxiliary data are added, deleted or replaced, the sound objects and sound beds may change. The tracks of the changed sound objects together with the sound beds form a new track set, and all auxiliary data of the changed sound objects form a new auxiliary data set; that is, the changed sound objects and sound beds are encoded to obtain the compressed audio signal.
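Re-deriving sound objects and sound beds after editing can be sketched as a simple partition: a track referenced by at least one auxiliary record is a sound object, the rest form the sound bed, and the new auxiliary data set keeps only records that still reference an existing track. The record layout and function name are hypothetical.

```python
def split_objects_and_beds(tracks, aux):
    """Classify edited tracks into sound objects and sound beds.

    tracks: dict track_id -> PCM samples (the edited track set)
    aux: dict aux_id -> {"tracks": [...], ...} (hypothetical record layout)
    Returns (object_ids, bed_ids, new_aux).
    """
    live = set(tracks)
    # Keep only auxiliary records that still act on at least one existing track.
    new_aux = {aid: rec for aid, rec in aux.items() if live.intersection(rec["tracks"])}
    object_ids = set()
    for rec in new_aux.values():
        object_ids.update(live.intersection(rec["tracks"]))
    bed_ids = live - object_ids
    return sorted(object_ids), sorted(bed_ids), new_aux
```

The new track set is the union of object and bed tracks, and `new_aux` is the new auxiliary data set that is encoded with them.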

Fig. 4 is a schematic diagram of an embodiment of the present apparatus, where the apparatus includes an audio input module, an audio editing module, an auxiliary data adding module, and an audio encoding module, and the audio input module includes a PCM input unit, and the PCM input unit inputs PCM signals, for example, m1 PCM signals, where m1 is greater than 0, and the m1 PCM signals are an audio track set C1, where c1= { C is present _1i },0≤i≤m1-1。

The audio editing module comprises an audio track editing unit, wherein the audio track editing unit adds, deletes or replaces the audio track set C1 or any combination of the three modes to generate a new audio track set C1'; the auxiliary data adding module adds at least one group of auxiliary data for the track set C1 'to obtain an auxiliary data set E1'; the audio encoding module encodes the track set C1 'and the auxiliary data set E1' to obtain a compressed sound signal S _q '。

Fig. 5 is a schematic diagram of a second embodiment of the apparatus. Building on the first embodiment, the audio input module further includes an auxiliary data input unit that inputs an auxiliary data set E1. E1 may be a single group of auxiliary data shared by several tracks in C1, or a collection of auxiliary data added separately to different tracks in C1. The audio encoding module then encodes C1', E1 and E1' to obtain a compressed sound signal $S_q''$.
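One way to represent an auxiliary data set in which some groups are shared by several tracks and others belong to a single track is to attach a set of target track indices to each group. This is a hypothetical representation for illustration; the `AuxGroup` type and its fields are not taken from the disclosure.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List


@dataclass
class AuxGroup:
    params: Dict[str, object]   # e.g. {"reverb": 0.4}; keys are hypothetical
    targets: FrozenSet[int]     # indices i of the tracks C_1i it applies to


def aux_for_track(e1: List[AuxGroup], i: int) -> List[Dict[str, object]]:
    """Collect every auxiliary data group in E1 that applies to track C_1i."""
    return [g.params for g in e1 if i in g.targets]


e1 = [
    AuxGroup({"reverb": 0.4}, frozenset({0, 1, 2})),        # shared by three tracks
    AuxGroup({"position": (1.0, 0.0, 0.0)}, frozenset({2})),  # track 2 only
]
```

The same structure also covers auxiliary data applied to all or only part of the tracks: a group targets whichever indices it lists.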

Fig. 6 is a schematic diagram of a third embodiment of the apparatus, in which the audio input module further includes a compressed signal input unit, and the apparatus further includes a decoding module that decodes the compressed signal once it has been input. The decoding module includes an audio decoding unit and an auxiliary data decoding unit: if the input signal is a compressed audio signal (e.g., from local storage or network input), the audio decoding unit decodes it to obtain the corresponding PCM data; if the input compressed signal also contains auxiliary data, the auxiliary data decoding unit decodes it to obtain that auxiliary data.
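The disclosure does not specify a codec or bitstream layout, so the following is only a toy round trip that shows the "detachable" property: the decoder returns the PCM tracks and the auxiliary data as separate outputs. It packs 16-bit samples with `struct`, stores auxiliary data as JSON, and uses `zlib` (general-purpose lossless compression) as a stand-in for a real audio codec; the container layout is an assumption.

```python
import json
import struct
import zlib
from typing import Dict, List, Tuple


def encode(tracks: List[List[int]], aux: List[Dict]) -> bytes:
    """Toy encoder: pack 16-bit PCM tracks and JSON auxiliary data, then deflate."""
    header = json.dumps({"lengths": [len(t) for t in tracks], "aux": aux}).encode()
    pcm = b"".join(struct.pack(f"<{len(t)}h", *t) for t in tracks)
    return zlib.compress(struct.pack("<I", len(header)) + header + pcm)


def decode(signal: bytes) -> Tuple[List[List[int]], List[Dict]]:
    """Toy decoder: recover the PCM tracks and the auxiliary data separately."""
    payload = zlib.decompress(signal)
    (header_len,) = struct.unpack_from("<I", payload, 0)
    header = json.loads(payload[4:4 + header_len])
    tracks, offset = [], 4 + header_len
    for n in header["lengths"]:  # audio decoding part: one PCM track at a time
        tracks.append(list(struct.unpack_from(f"<{n}h", payload, offset)))
        offset += 2 * n
    return tracks, header["aux"]  # auxiliary data decoding part


tracks = [[0, 1, -1], [100, -200]]
aux = [{"track": 1, "position": [0.0, 1.0, 0.0]}]
decoded_tracks, decoded_aux = decode(encode(tracks, aux))
```

JSON turns tuples into lists, so the example stores positions as lists to keep the round trip exact.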

The audio editing module further includes an auxiliary data editing unit, which performs addition, deletion, replacement, or any combination of the three on the auxiliary data set to obtain a new auxiliary data set.
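The auxiliary data editing unit can be sketched in the same style as the track editing unit. Here the auxiliary data set is assumed, hypothetically, to be a dictionary keyed by an identifier for each auxiliary data item, which lets replacement and deletion address items by name.

```python
from typing import Dict, Optional, Sequence, Tuple

AuxSet = Dict[str, Dict[str, object]]  # hypothetical: item id -> parameters
AuxEdit = Tuple[str, str, Optional[Dict[str, object]]]


def edit_aux_set(aux_set: AuxSet, edits: Sequence[AuxEdit]) -> AuxSet:
    """Apply add/delete/replace operations, in order, to an auxiliary data set."""
    new_set = dict(aux_set)  # the input set is left unchanged
    for op, key, params in edits:
        if op == "add":
            if key in new_set:
                raise KeyError(f"auxiliary data {key!r} already exists")
            new_set[key] = dict(params)
        elif op == "delete":
            del new_set[key]
        elif op == "replace":
            if key not in new_set:
                raise KeyError(f"no auxiliary data {key!r} to replace")
            new_set[key] = dict(params)
        else:
            raise ValueError(f"unknown edit operation: {op}")
    return new_set


e = {"vocal_pos": {"position": [0, 1, 0]}, "hall": {"reverb": 0.5}}
e_new = edit_aux_set(e, [
    ("delete", "hall", None),
    ("replace", "vocal_pos", {"position": [1, 0, 0]}),
    ("add", "eq", {"gain_db": 3}),
])
```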

As a specific embodiment, the output of the audio encoding module is fed back to the audio input module as a new input.

As a specific embodiment, the input PCM signals may come partly or wholly from a recording device, local storage, network input, or any combination of these three sources.

As a specific embodiment, the channel configurations of the audio signal input by the audio input module include mono, stereo, 4.0, 5.1, 7.1, 9.1, 11.1, 13.1 and 22.2 channels, as well as any combination of these channel types.
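To count the tracks a given input contributes, the sketch below reads a layout written as "X.Y" as X full-range channels plus Y low-frequency effects channels (so 22.2 carries 24 channels). This counting convention is an assumption for illustration, and the function names are hypothetical.

```python
from typing import Iterable

NAMED_LAYOUTS = {"mono": 1, "stereo": 2}


def channel_count(layout: str) -> int:
    """Channels in a layout named 'mono', 'stereo', or written as 'X.Y'."""
    if layout in NAMED_LAYOUTS:
        return NAMED_LAYOUTS[layout]
    full_range, lfe = layout.split(".")  # assumed: X full-range + Y LFE channels
    return int(full_range) + int(lfe)


def total_channels(layouts: Iterable[str]) -> int:
    """Total track count for an input that combines several channel types."""
    return sum(channel_count(layout) for layout in layouts)
```

For example, an input combining a stereo stem with a 5.1 stem contributes 2 + 6 = 8 tracks to C1.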

As a specific embodiment, the auxiliary data may include a down-mix scheme for the tracks, spatial position information, spatial trajectory information, reverberation parameters, equalizer parameters, and the like.
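Spatial trajectory information is naturally time-dependent, whereas a fixed position is the single-keyframe case. The sketch below assumes, hypothetically, that a trajectory is a time-sorted list of (time, (x, y, z)) keyframes and that positions between keyframes are linearly interpolated; the disclosure does not specify an interpolation rule.

```python
from bisect import bisect_right
from typing import List, Tuple

Point = Tuple[float, float, float]


def position_at(trajectory: List[Tuple[float, Point]], t: float) -> Point:
    """Position at time t from keyframes [(time, (x, y, z)), ...] sorted by time."""
    times = [key_time for key_time, _ in trajectory]
    if t <= times[0]:
        return trajectory[0][1]   # before the first keyframe: hold the start
    if t >= times[-1]:
        return trajectory[-1][1]  # after the last keyframe: hold the end
    j = bisect_right(times, t)    # first keyframe strictly after t
    (t0, p0), (t1, p1) = trajectory[j - 1], trajectory[j]
    w = (t - t0) / (t1 - t0)      # t0 <= t < t1, so t1 > t0 here
    return tuple(a + w * (b - a) for a, b in zip(p0, p1))


moving = [(0.0, (0.0, 0.0, 0.0)), (2.0, (2.0, 4.0, 0.0))]
fixed = [(0.0, (1.0, 1.0, 1.0))]  # fixed auxiliary data: a single keyframe
```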

As a specific example, the auxiliary data may be applied to all or part of the tracks of the set of tracks.

As a specific embodiment, whether or not the auxiliary data adding module adds auxiliary data does not affect the implementation of the disclosure.

The foregoing is an exemplary embodiment of the disclosure, the scope of which is defined by the claims and their equivalents.

Claims

1. A method of processing a detachable and re-editable audio signal, comprising:

inputting m1 PCM signals, wherein m1 is greater than 0 and the m1 PCM signals form a track set C1, $C1 = \{C_{1i}\}$, $0 \le i \le m1 - 1$;

encoding the track set C1' and the auxiliary data set E1' to obtain a compressed sound signal $S_q'$;

wherein the locally stored or network-input PCM signal is obtained by decoding a compressed audio signal;

the auxiliary data is obtained by decoding the compressed audio signal.

2. The method for processing a detachable and re-editable audio signal as claimed in claim 1, comprising:

3. A method of processing a detachable and re-editable audio signal as claimed in claim 2, comprising:

4. A method of processing a detachable and re-editable audio signal as claimed in any one of claims 1 to 3, wherein part or all of the input PCM signal comes from a recording device, local storage, network input, or any combination of the three.

5. A method of processing a detachable and re-editable audio signal as claimed in any one of claims 1 to 3, wherein the auxiliary data is a down-mix scheme of the audio track, spatial position information, spatial trajectory information, reverberation parameters, or equalizer parameters.

6. A method of processing a detachable and re-editable audio signal as claimed in any one of claims 1 to 3, wherein the auxiliary data is applied to all or part of the tracks of the set of tracks.

7. A method of processing a detachable and re-editable audio signal as claimed in any one of claims 1 to 3, wherein the auxiliary data is fixed or time-dependent.

8. A processing apparatus for detachable and re-editable audio signals, comprising:

an audio encoding module configured to encode the track set C1' and the auxiliary data set E1' to obtain a compressed sound signal $S_q'$;

wherein the apparatus further comprises a decoding module, the decoding module comprising an audio decoding unit through which a locally stored or network-input PCM signal is obtained by decoding a compressed audio signal;

the decoding module further includes an auxiliary data decoding unit by which the auxiliary data is obtained by decoding the compressed audio signal.

9. The apparatus for processing a detachable and re-editable audio signal according to claim 8, wherein the audio input module further comprises an auxiliary data input unit for inputting m2 auxiliary data, m2 being greater than 0, the auxiliary data set being $E1 = \{E_{1j}\}$, $0 \le j \le m2 - 1$;

10. The apparatus according to claim 9, wherein the audio editing module further comprises an auxiliary data editing unit that performs addition, deletion, replacement, or any combination of the three on the auxiliary data set to obtain a new auxiliary data set.

11. A processing apparatus for detachable and re-editable audio signals according to any one of claims 8 to 10, wherein the PCM signal input by the PCM input unit comes partly or wholly from a recording device, local storage, network input, or any combination of the three.