CN111445914A - Processing method and device capable of disassembling and re-editing audio signal - Google Patents


Info

Publication number
CN111445914A
Authority
CN
China
Prior art keywords
audio
auxiliary data
editing
signal
track
Legal status
Granted
Application number
CN202010209390.9A
Other languages
Chinese (zh)
Other versions
CN111445914B (en)
Inventor
潘兴德
黄旭
谭敏强
Current Assignee
Wavarts Technologies Co ltd
Original Assignee
Wavarts Technologies Co ltd
Application filed by Wavarts Technologies Co ltd
Priority to CN202010209390.9A
Publication of CN111445914A
Priority to PCT/CN2020/140722 (published as WO2021190039A1)
Application granted
Publication of CN111445914B
Status: Active

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00: Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04: Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
    • G10L19/16: Vocoder architecture
    • G10L19/167: Audio streaming, i.e. formatting and decoding of an encoded audio signal representation into a data stream for transmission or storage purposes

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Management Or Editing Of Information On Record Carriers (AREA)
  • Signal Processing For Digital Recording And Reproducing (AREA)

Abstract

The invention discloses a processing method and device for disassembling and re-editing an audio signal, in the technical field of digital signal processing and audio production. It addresses the problem that, while ensuring that compressed audio can be decoded completely and correctly, the entire production flow from input of the original signal to output of the compressed sound signal cannot be completed on a single physical device, so that additional physical devices and transmission steps are required. In the disclosed solution, an audio input module inputs the audio signals, an auxiliary data adding module adds auxiliary data to the audio tracks, an audio editing module adds, deletes, or replaces any audio track to generate a new audio track set, and an audio encoding module encodes the audio tracks and auxiliary data to obtain a compressed sound signal. The entire production flow from input of the original signal to output of the compressed sound signal can thus be completed on one physical device, and any audio track can be added, deleted, or replaced.

Description

Processing method and device capable of disassembling and re-editing audio signal
Technical Field
The present disclosure relates to the field of digital signal processing and audio production technologies, and in particular, to a processing method and apparatus for disassembling and re-editing an audio signal.
Background
After many years of development, stereo, 5.1, and 7.1 surround sound systems have become widely used, but because they lack sound height information they can at most present two-dimensional sound. In the real world, panoramic sound (also called three-dimensional sound) is the most realistic way to present and express sound, and it is the future development trend, whether in nature, the arts, or audiovisual entertainment.
Panoramic sound is sometimes also called three-dimensional sound or immersive sound. A panoramic sound signal is generally divided into audio data and auxiliary data. The audio data may be a mono or multi-channel audio signal, for example mono, stereo, 4.0, 5.1, 7.1, 9.1, 11.1, 13.1, or 22.2 channels, or a combination of these channel types, such as a 7.1-channel signal + a 4.0-channel signal + 6 stereo signals. The auxiliary data generally defines the spatial position or rendering mode of the audio data and can improve its presentation: three-dimensional positioning information, for example, strengthens the sense of space and immersion, while sound effect processing information (such as equalizer and reverberation settings) makes the audio more varied and enriches the listening experience. A piece of audio data together with its auxiliary data is sometimes called a sound object, and audio data without auxiliary data is called a sound bed. For typical commercially available panoramic sound technologies, see the three-dimensional panoramic sound national standard AVS2-P3 (GB/T33475.3), the international standard MPEG-H (ISO/IEC 23008-3), Dolby Atmos, WANOS, and the like.
In a panoramic sound signal, the audio data may be a mono signal, a stereo signal, a single-layer multi-channel signal, a multi-layer multi-channel signal (i.e., a combination of multiple channel signals distributed over different height planes), and so on. For example, some panoramic sound signals use two layers (e.g., 5.1.4 channels combine a 5.1-channel signal in the middle layer with a 4.0-channel signal in the top layer), while others use three layers. Some panoramic sound signals contain only multi-layer audio data and no auxiliary data, such as the SMPTE 22.2 three-dimensional sound system and the AURO 9.1 system; others contain both multi-layer multi-channel signals and auxiliary data, such as MPEG-H, Dolby Atmos, WANOS, and DTS:X. A panoramic sound signal may also consist entirely of a mono or stereo signal plus auxiliary data.
Like AAC, AC3, and MP3, panoramic sound formats are compressed audio formats. Two types of production tools are currently in common use for producing compressed audio signals:
the first is the digital audio workstation (DAW, such as Pro Tools, Nuendo, Cubase, Logic Pro, Adobe Audio, etc.), which is widely used in film and music production and can produce high-quality audio signals with professional audio plug-ins.
The second is audio-video applications such as karaoke, short-video, and dubbing software. Such software has spread deeply into everyday life and is subtly changing how people live and work. These applications support editing and producing conventional audio formats (PCM and currently common compressed formats such as mp3, aac, wma, and ac3) and support secondary creation of audio signals (such as multi-person chorus or ensemble, or relay/collaborative production of one work), which makes them highly entertaining and interactive.
The conventional audio signal production method, shown in FIG. 1, comprises the following steps:
101: Add audio data (hereinafter referred to as audio tracks). Audio signals are input by recording or by importing audio files in conventional formats; imported audio files are decoded into PCM data. After the audio data is added, the tracks are recorded as the audio track set B.
102: Add auxiliary data. In a DAW, each track may be configured with one or more auxiliary data; in karaoke, short-video, and similar software, auxiliary data can be added to the voice. After the addition, the auxiliary data are recorded as the auxiliary data set E0.
103: Edit any track in the audio track set B and any auxiliary data in the auxiliary data set E0; the editing operations include addition, deletion, and replacement. Steps 101 to 103 can be performed selectively or repeatedly and in any order; when production is finished, a track set B' and an auxiliary data set E0' are produced.
104: Encode the produced tracks and auxiliary data into a compressed audio signal S0'. If the output format is a conventional format such as AAC or AC3, the auxiliary data set E0' is applied to the track set B' during production to generate a plain track set without auxiliary data, which is then encoded into a compressed audio file. If the output format is a panoramic sound format, the track set B' and the auxiliary data set E0' are transmitted to a dedicated panoramic sound encoding device, which encodes them into a panoramic sound signal.
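To make the conventional-format branch of step 104 concrete, here is a minimal Python sketch under simplifying assumptions: each track is a list of PCM samples, and the only auxiliary data is a per-track gain. The names `render_and_downmix`, `B_prime`, and `E0_prime` are hypothetical, and a real DAW applies far richer effects. What it illustrates is that once E0' has been applied and the tracks summed, only one mixed PCM signal is left for the encoder.

```python
from typing import Dict, List

Track = List[float]  # one track = a list of PCM samples (simplifying assumption)


def render_and_downmix(tracks: Dict[str, Track], gains: Dict[str, float]) -> Track:
    """Apply each track's gain (its only 'auxiliary data' in this sketch) and sum
    all tracks into a single mono PCM signal, ready for a conventional encoder."""
    length = max(len(t) for t in tracks.values())
    mix = [0.0] * length
    for name, samples in tracks.items():
        g = gains.get(name, 1.0)  # tracks without auxiliary data pass through unchanged
        for n, s in enumerate(samples):
            mix[n] += g * s
    return mix


B_prime = {"vocal": [0.5, 0.25, 0.0], "guitar": [0.25, 0.25, 0.5]}
E0_prime = {"vocal": 2.0}
print(render_and_downmix(B_prime, E0_prime))  # [1.25, 0.75, 0.5]
```

The returned list no longer records which samples came from the vocal and which from the guitar.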
Steps 101 to 104 enable the production of high quality audio signals, but there are still some disadvantages:
(1) If the output signal is in a panoramic sound format, encoding requires two physical devices or software systems; so far no single piece of software or device performs both editing and encoding. Moreover, the tracks and the auxiliary data are transmitted separately, the tracks over audio protocols (such as MADI and AES) and the auxiliary data over network protocols (such as TCP/IP), so latency and synchronization between the audio data and the network data must also be handled, which makes the process complex.
(2) If the output signal is in a panoramic sound format, it can currently be produced only on a PC, which demands a high-end PC configuration, and there is no case of editing and producing panoramic sound in interactive applications such as karaoke, short-video, and dubbing software.
(3) Furthermore, a DAW can only serve as a professional production system that outputs the finished result, and its output sound signal is down-mixed: multiple sound elements are mixed into one PCM signal and cannot be separated. Consumer software such as short-video and karaoke applications can only add to or simply process the down-mixed audio signal and cannot remove specific sound elements.
(4) In internet applications, the output compressed audio signal S0' sometimes needs to serve as a new input signal for temporary modification or secondary creation. In that case the components of S0' cannot be separated, so editing operations such as addition, deletion, and replacement cannot be applied to them: S0' can only be edited as a whole, a specific sound component can be neither removed nor replaced, and its sound effects cannot be modified. For example, for a rock song, an existing DAW typically down-mixes the guitar, bass, drum, keyboard, and vocal parts into 2 or 5.1 PCM channels and encodes the output. Even after the encoded song is decoded, the guitar, bass, drum, keyboard, and vocal parts cannot be separated, specific parts cannot be deleted or replaced, and sound effects originally applied to some or all of the work, such as reverberation, EQ, and compression/limiting, cannot be removed or modified. The only options are to add further parts on top of the original song or to apply overall sound effects to the finished mix.
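Disadvantage (4) follows from the fact that summing tracks is not invertible. The short sketch below, with a hypothetical `downmix` helper and made-up sample values, shows two different two-track arrangements that produce an identical mix, so the mixed signal alone cannot tell a decoder which components it contained.

```python
from typing import List


def downmix(tracks: List[List[float]]) -> List[float]:
    """Sum equally long PCM tracks sample by sample into one channel."""
    return [sum(samples) for samples in zip(*tracks)]


# Two different hypothetical arrangements (say, guitar + vocal) ...
arrangement_a = [[0.5, 0.0], [0.25, 0.5]]
arrangement_b = [[0.25, 0.25], [0.5, 0.25]]

# ... yield the same mixed signal, so neither part can be recovered or deleted.
assert downmix(arrangement_a) == downmix(arrangement_b) == [0.75, 0.5]
```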
In summary, there is no standalone physical device (or software or method) that provides the following functions:
(1) sound decoding, track editing and production, auxiliary data (including sound effect) editing and production, and encoding can all be completed in one physical device (or piece of software), without additional physical devices (or software) or data transmission;
(2) each sound component can be independently encoded, edited, and decoded by anyone at any time and place, without being mixed with other sound components;
(3) auxiliary data such as spatial information, rendering information, gain, reverberation, and equalization of a single sound component, some sound components, or all sound components can be freely decoded, edited, and encoded by anyone at any time and place, rather than being inseparably mixed with other sound information;
(4) compatibility across many kinds of devices and applications, such as DAWs, karaoke software, video software, and dubbing software, so that anyone (professional or non-professional) can decode, edit, encode, and share the same sound work at any time and place.
Disclosure of Invention
The present disclosure provides a processing method and device for disassembling and re-editing an audio signal which, while ensuring that the audio can be decoded completely and correctly, can complete the entire production flow from input of the original signal to output of the signal on one physical device, without additional physical devices or transmission steps. During decoding, every track and every auxiliary datum contained in the bitstream can be completely separated, and addition, deletion, replacement, or any combination of the three can be applied to any track and auxiliary data. Specifically, the method and device provide the following functions:
1. sound decoding, track editing and production, auxiliary data (including sound effect) editing and production, and encoding can all be completed in one physical device (or piece of software), without additional physical devices (or software) or data transmission;
2. each sound component can be independently encoded, edited, and decoded by anyone at any time and place, without being mixed with other sound components;
3. auxiliary data such as spatial information, rendering information, gain, reverberation, and equalization of a single sound component, some sound components, or all sound components can be freely decoded, edited, and encoded by anyone at any time and place, rather than being inseparably mixed with other sound information;
4. the method and device are compatible across many kinds of devices (such as DAWs, karaoke software, video software, dubbing software, and other applications), so that anyone (professional or non-professional) can use them to decode, edit, encode, and share the same sound work at any time and place.
The technical purpose of the present disclosure is achieved by the following technical solutions:
a processing method for a disassemblable and re-editable audio signal, comprising:
inputting m1 PCM signals, $m1 > 0$, the m1 PCM signals forming the track set $C1 = \{C1_i\},\ 0 \le i \le m1-1$;
performing addition, deletion, or replacement, or any combination of the three, on the track set C1 to generate a new track set C1';
adding at least one set of auxiliary data to the track set C1' to obtain an auxiliary data set E1';
encoding the track set C1' and the auxiliary data set E1' to obtain a compressed sound signal Sq'.
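A minimal Python sketch of the four steps of this claim, under stated assumptions: tracks are lists of PCM samples, auxiliary data are plain dicts, and `encode` uses zlib-compressed JSON purely as a placeholder for a real object-based codec such as AVS2-P3 or MPEG-H. All function names are hypothetical. The point is that C1' and E1' are stored side by side rather than mixed.

```python
import json
import zlib
from typing import Any, Dict, List

# Hypothetical stand-ins: a track is a list of PCM samples, and an auxiliary
# datum is a small dict such as {"type": "gain", "tracks": [0], "value": 0.5}.
Track = List[float]
Aux = Dict[str, Any]


def add_track(tracks: List[Track], track: Track) -> List[Track]:
    """Addition: append a new PCM track."""
    return tracks + [track]


def delete_track(tracks: List[Track], i: int) -> List[Track]:
    """Deletion: drop track i."""
    return tracks[:i] + tracks[i + 1:]


def replace_track(tracks: List[Track], i: int, track: Track) -> List[Track]:
    """Replacement: swap track i for new PCM data."""
    return tracks[:i] + [track] + tracks[i + 1:]


def encode(tracks: List[Track], aux: List[Aux]) -> bytes:
    """Keep tracks and auxiliary data side by side, never mixed together.
    zlib-compressed JSON is only a placeholder for a real codec."""
    payload = {"tracks": tracks, "aux": aux}
    return zlib.compress(json.dumps(payload).encode("utf-8"))


C1 = [[0.1, 0.2], [0.3, 0.4]]                  # m1 = 2 input PCM signals
C1_prime = replace_track(delete_track(add_track(C1, [0.5, 0.6]), 0), 0, [0.7, 0.8])
E1_prime = [{"type": "gain", "tracks": [0, 1], "value": 0.5}]
Sq_prime = encode(C1_prime, E1_prime)          # corresponds to Sq'
```

Because the container is lossless, decompressing Sq_prime returns C1' and E1' exactly, which is the property the later decoding steps rely on.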
Further, comprising:
inputting m2 auxiliary data; if $m2 > 0$, there is an auxiliary data set $E1 = \{E1_j\},\ 0 \le j \le m2-1$;
encoding the track set C1' and the auxiliary data sets E1 and E1' to obtain a compressed sound signal Sq''.
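When auxiliary data E1 is also input, the encoder simply receives two auxiliary data sets. The sketch below (hypothetical names, with the same zlib/JSON container standing in for a real codec) concatenates E1 and E1' in order so that both reach the stream unchanged.

```python
import json
import zlib
from typing import Any, Dict, List

Track = List[float]
Aux = Dict[str, Any]


def encode(tracks: List[Track], *aux_sets: List[Aux]) -> bytes:
    """Hypothetical container: tracks plus every auxiliary data set, concatenated
    in order, so the input set E1 and the newly added set E1' are both kept."""
    aux: List[Aux] = [a for s in aux_sets for a in s]
    return zlib.compress(json.dumps({"tracks": tracks, "aux": aux}).encode("utf-8"))


C1_prime = [[0.7, 0.8], [0.5, 0.6]]
E1 = [{"type": "position", "tracks": [0], "xyz": [0.0, 1.0, 0.0]}]  # m2 = 1 input datum
E1_prime = [{"type": "reverb", "tracks": [1], "room": "hall"}]       # added in production
Sq_2 = encode(C1_prime, E1, E1_prime)                               # corresponds to Sq''
```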
Further, comprising:
inputting n3 PCM signals and n4 auxiliary data, where $n3 > 0$ and $n4 > 0$, giving the track set $C3 = \{C3_k\},\ 0 \le k \le n3-1$ and the auxiliary data set $E3 = \{E3_t\},\ 0 \le t \le n4-1$;
performing addition, deletion, or replacement, or any combination of the three, on the track set C3 to generate a new track set C3';
performing addition, deletion, or replacement, or any combination of the three, on the auxiliary data set E3 to obtain an auxiliary data set E3';
encoding the track set C3' and the auxiliary data set E3' to obtain a compressed sound signal Sq'''.
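In this variant both sets are editable. One way to sketch it, assuming the addition, deletion, and replacement operations are modelled by a single hypothetical `edit` helper over Python lists, is to apply the same helper to C3 and to E3. The example also shows something the claim does not spell out: if auxiliary data refer to tracks by index (an assumption of this sketch), deleting a track may call for a matching replacement in E3.

```python
from typing import Any, List, Optional


def edit(items: List[Any], op: str, index: Optional[int] = None, item: Any = None) -> List[Any]:
    """Apply one addition, deletion, or replacement and return a new list.
    The same helper serves the track set C3 and the auxiliary data set E3."""
    out = list(items)
    if op == "add":
        out.append(item)
    elif op == "delete":
        del out[index]
    elif op == "replace":
        out[index] = item
    else:
        raise ValueError(f"unknown operation: {op!r}")
    return out


C3 = [[0.1, 0.1], [0.2, 0.2], [0.3, 0.3]]                              # n3 = 3 tracks
E3 = [{"type": "eq", "tracks": [0]}, {"type": "gain", "tracks": [2]}]  # n4 = 2

C3_prime = edit(C3, "delete", 1)
# Deleting track 1 moves the old track 2 to index 1, so the gain that pointed at
# it is replaced; then a reverb shared by both remaining tracks is added.
E3_prime = edit(edit(E3, "replace", 1, {"type": "gain", "tracks": [1]}),
                "add", item={"type": "reverb", "tracks": [0, 1]})
```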
Further, the input PCM signal may come in part or in whole from a sound recording device input or a local storage or network input or any combination of the three.
Further, the PCM signal of the local storage or the network input may be obtained by decoding the compressed audio signal.
Further, the auxiliary data may be obtained by decoding the compressed audio signal.
Further, the auxiliary data may be a downmix scheme of the audio track, spatial position information, spatial trajectory information, reverberation parameters, equalizer parameters, etc.
Further, the auxiliary data may act on all or part of the tracks of the set of tracks.
Further, the auxiliary data may be fixed or may vary over time.
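The three properties above (auxiliary data acting on all or only some tracks, and being either fixed or time-varying) can be sketched with one small hypothetical data type. `GainAux`, its keyframe format, and the linear interpolation between keyframes are assumptions chosen for illustration; the standards cited in this document define their own metadata formats.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class GainAux:
    """Hypothetical auxiliary datum: a gain applied to the tracks in `targets`.
    One keyframe means a fixed value; several keyframes describe a value that
    varies over time (linear interpolation between them is an assumption)."""
    targets: List[int]
    keyframes: List[Tuple[int, float]]  # (sample index, gain), sorted by index

    def value_at(self, n: int) -> float:
        times = [t for t, _ in self.keyframes]
        k = bisect_right(times, n)
        if k == 0:
            return self.keyframes[0][1]       # before the first keyframe
        if k == len(self.keyframes):
            return self.keyframes[-1][1]      # at or after the last keyframe
        (t0, g0), (t1, g1) = self.keyframes[k - 1], self.keyframes[k]
        return g0 + (g1 - g0) * (n - t0) / (t1 - t0)


def apply(tracks: List[List[float]], aux: GainAux) -> List[List[float]]:
    """Return new tracks; only targeted tracks are scaled, the rest are copied."""
    return [
        [s * aux.value_at(n) for n, s in enumerate(tr)] if i in aux.targets else list(tr)
        for i, tr in enumerate(tracks)
    ]


tracks = [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]
fixed = GainAux(targets=[0, 1], keyframes=[(0, 0.5)])       # fixed, acts on all tracks
fade = GainAux(targets=[1], keyframes=[(0, 0.0), (2, 1.0)])  # time-varying, track 1 only
print(apply(apply(tracks, fixed), fade))  # [[0.5, 0.5, 0.5], [0.0, 0.25, 0.5]]
```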
A processing apparatus that can disassemble and re-edit an audio signal, comprising:
an audio input module comprising a PCM input unit that inputs m1 PCM signals, $m1 > 0$, the m1 PCM signals forming the track set $C1 = \{C1_i\},\ 0 \le i \le m1-1$;
an audio editing module comprising a track editing unit, which performs addition, deletion, or replacement, or any combination of the three, on the track set C1 to generate a new track set C1';
an auxiliary data adding module, which adds at least one set of auxiliary data to the track set C1' to obtain an auxiliary data set E1';
an audio encoding module, which encodes the track set C1' and the auxiliary data set E1' to obtain a compressed sound signal Sq'.
Further, the audio input module further comprises an auxiliary data input unit, which inputs m2 auxiliary data, $m2 > 0$, giving the auxiliary data set $E1 = \{E1_j\},\ 0 \le j \le m2-1$;
the audio encoding module encodes the track set C1' and the auxiliary data sets E1 and E1' into a compressed sound signal Sq''.
Further, the audio editing module further comprises an auxiliary data editing unit, which performs addition, deletion, or replacement, or any combination of the three, on the auxiliary data set to obtain a new auxiliary data set.
Further, the PCM signal input by the PCM input unit may be partially or completely from a recording device input or a local storage or a network input or any combination of the three inputs.
Further, the apparatus further includes a decoding module including an audio decoding unit by which the PCM signal locally stored or network-inputted may be obtained by decoding the compressed audio signal.
Further, the decoding module further comprises an auxiliary data decoding unit, by which auxiliary data is obtained by decoding the compressed audio signal.
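The apparatus claims can be pictured as one object that hosts every module. The Python class below is a hypothetical sketch, not the patented implementation: method names are invented, only deletion is shown for the editing unit, and zlib-compressed JSON stands in for a real codec. It shows a second device decoding a stream, removing one component, and re-encoding, all without leaving that device.

```python
import json
import zlib
from typing import Any, Dict, List, Optional, Tuple

Track = List[float]
Aux = Dict[str, Any]


class EditDevice:
    """Hypothetical single device that hosts every module named in the claims."""

    def __init__(self) -> None:
        self.tracks: List[Track] = []
        self.aux: List[Aux] = []

    @staticmethod
    def decode(stream: bytes) -> Tuple[List[Track], List[Aux]]:
        """Decoding module: audio decoding unit plus auxiliary data decoding unit."""
        payload = json.loads(zlib.decompress(stream))
        return payload["tracks"], payload["aux"]

    def load(self, tracks: List[Track], aux: Optional[List[Aux]] = None) -> None:
        """Audio input module: PCM input unit plus auxiliary data input unit."""
        self.tracks.extend(tracks)
        self.aux.extend(aux or [])

    def delete_track(self, i: int) -> None:
        """Audio editing module, track editing unit (add/replace would be analogous)."""
        del self.tracks[i]

    def add_aux(self, datum: Aux) -> None:
        """Auxiliary data adding module."""
        self.aux.append(datum)

    def encode(self) -> bytes:
        """Audio encoding module; zlib-compressed JSON stands in for a real codec."""
        body = json.dumps({"tracks": self.tracks, "aux": self.aux})
        return zlib.compress(body.encode("utf-8"))


first = EditDevice()
first.load([[0.1, 0.2], [0.3, 0.4]])
first.add_aux({"type": "eq", "tracks": [0]})
stream = first.encode()

second = EditDevice()                    # e.g. another user's copy of the same software
second.load(*EditDevice.decode(stream))  # tracks and auxiliary data arrive separated
second.delete_track(1)                   # remove one sound component
tracks, aux = EditDevice.decode(second.encode())
```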
The beneficial effects of the present disclosure are as follows. In the disclosed audio signal processing method and device, the audio input module inputs the audio signal and the auxiliary data adding module can add auxiliary data to the tracks; the audio editing module applies addition, deletion, or replacement, or any combination of the three, to any track or auxiliary data to generate a new track set and auxiliary data set; and the audio encoding module encodes the tracks and auxiliary data to obtain a compressed sound signal.
The entire production flow from original signal input to compressed sound signal output can therefore be completed on one physical device, without additional physical devices or transmission steps, and addition, deletion, replacement, or any combination of the three can be applied to any track and auxiliary data.
Drawings
FIG. 1 is a flow chart of a conventional audio production method;
FIG. 2 is a flowchart of a first embodiment of the disclosed method;
FIG. 3 is a flowchart of a second embodiment of the disclosed method;
FIG. 4 is a schematic diagram of a first embodiment of the disclosed apparatus;
FIG. 5 is a schematic diagram of a second embodiment of the disclosed apparatus;
FIG. 6 is a schematic diagram of a third embodiment of the disclosed apparatus.
Detailed Description
The technical scheme of the disclosure will be described in detail with reference to the accompanying drawings.
In the description of the present disclosure, it should be understood that PCM (pulse-code modulation) track data is an independent sound component, not an inseparable mixture of sound components. That is, each PCM track holds an independent part, instrument, or voice, not an inseparable mixture of several parts, instruments, or voices. PCM track data may be PCM data of independent sound components obtained by recording, input, decoding, and so on, such as individual instruments, parts, or combinations of them, for example guitar, bass, drums, keyboard, vocals, or violin. As a special case of the present invention, already-mixed, inseparable sound components are also accepted as PCM track input; in that case, however, track editing and sound effect editing apply to the mixture as a whole, and its components cannot be separated and processed individually.
Embodiment 1: adding shared auxiliary data to the edited tracks.
The processing method and apparatus for disassembling and re-editing an audio signal according to the present invention can perform editing operations such as addition, deletion, and replacement on the input tracks and add one or more shared auxiliary data to all or some of the tracks. As shown in FIG. 2, the method includes the following steps:
(301) Input m PCM tracks ($m \ge 1$). After input, the total number of existing tracks is denoted x, and all tracks are denoted as the track set C[0, ..., x-1]. The input track data may come partly or wholly from a recording device, local storage, network input, or any combination of the three.
(302) Editing and production: add, delete, and replace existing tracks, always keeping x equal to the current number of tracks; the track set after production is denoted C'[0, ..., x-1]. Tracks are added in the same way as in step (301).
(303) Add n auxiliary data to y tracks of the post-production track set C', denoted as the auxiliary data set E'[0, ..., n-1]. Each auxiliary datum in E' acts on all y tracks simultaneously, that is, E' is shared by the y tracks; $n \ge 0$ and $1 \le y \le x$.
The addition, deletion, and replacement of tracks and the addition of auxiliary data can be performed selectively and repeatedly, in any order.
(304) Audio encoding: the track set C' and its corresponding auxiliary data set E' are jointly encoded into a compressed audio signal S'. For the encoding technique, see the three-dimensional panoramic sound national standard AVS2-P3 (GB/T33475.3), the international standard MPEG-H (ISO/IEC 23008-3), Dolby Atmos, WANOS, and the like.
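Embodiment 1 attaches n shared auxiliary data to y of the x tracks. A minimal sketch, using a hypothetical `SharedAuxSet` class, records which tracks share E', checks the constraint 1 ≤ y ≤ x from step (303), and reports which shared data a given track sees. Storing the y tracks as a list of indices is an assumption of this sketch.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class SharedAuxSet:
    """E'[0..n-1]: n auxiliary data that all act on the same y tracks of C'."""
    track_indices: List[int]                                  # the y tracks sharing E'
    data: List[Dict[str, Any]] = field(default_factory=list)  # n >= 0 auxiliary data

    def is_valid(self, x: int) -> bool:
        """Check 1 <= y <= x and that every index lies inside C'[0..x-1]."""
        y = len(set(self.track_indices))
        in_range = all(0 <= i < x for i in self.track_indices)
        return 1 <= y <= x and in_range

    def acting_on(self, i: int) -> List[Dict[str, Any]]:
        """Shared auxiliary data seen by track i (empty if i is not one of the y)."""
        return list(self.data) if i in self.track_indices else []


C_prime = [[0.0] * 4 for _ in range(3)]  # x = 3 edited tracks
E_prime = SharedAuxSet(track_indices=[0, 2], data=[{"type": "reverb"}, {"type": "eq"}])
```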
Embodiment 2: tracks and auxiliary data are input, and various types of auxiliary data are added, deleted, and replaced during editing and production.
The processing method and apparatus for disassembling and re-editing an audio signal according to the present invention can, on the basis of Embodiment 1, also perform editing operations such as addition, deletion, and replacement on auxiliary data, and can edit various types of auxiliary data. As shown in FIG. 3, the method includes the following steps:
(401) Input data, including:
(401.1) Add audio signals. The added audio signals may come partly or wholly from a recording device, local storage, network input, or any combination of the three; for local storage and network input, the audio may be PCM, compressed audio, or a combination of the two. If the added audio signals comprise m3 recorded PCM tracks, m4 locally imported PCM signals, m5 locally imported compressed audio signals, and m6 compressed audio signals from the network, the m5 local compressed signals are decoded into m5' PCM signals and the m6 network compressed signals into m6' PCM signals. The total number of existing tracks is denoted x, and all tracks are denoted as the track set C[0, ..., x-1], where $m3, m4, m5, m6 \ge 0$, $m3 + m4 + m5 + m6 \ge 1$, $m5' \ge m5$, $m6' \ge m6$, and $x = m3 + m4 + m5' + m6'$. The formats of the local and network compressed audio signals include, but are not limited to, AAC, AC3, MP3, WANOS, and Atmos; for decoding techniques, see AAC (ISO/IEC 13818-7), AC3 (ATSC A/52), MP3, the three-dimensional panoramic sound national standard AVS2-P3 (GB/T33475.3), the international standard MPEG-H (ISO/IEC 23008-3), Dolby Atmos, WANOS, and the like.
(401.2) Add auxiliary data. Auxiliary data is added to the existing tracks and denoted as the set E. Auxiliary data corresponds to tracks and may apply to a single track (e.g., equalizer, reverberation, spatial information) or to several tracks simultaneously (e.g., downmix, automatic gain). From the point of view of a track, each track may have one or more auxiliary data, and several tracks may share one or more auxiliary data; sound effects on single tracks and sound effects shared by several tracks can coexist and be combined freely.
For auxiliary data on single tracks, the operation is to add m auxiliary data to any tracks of the existing track set C; these are grouped by track into the auxiliary data set E4, where the auxiliary data of track C[i] are $E4[i][0, \dots, e_i - 1]$ and $e_i$ is the current number of auxiliary data on the i-th track. For auxiliary data shared by several tracks, the operation is to add n auxiliary data to y tracks of C, denoted E5[0, ..., n-1], meaning that each auxiliary datum in E5 acts on the y tracks simultaneously, i.e., is shared by them. Here $m \ge 0$, $n \ge 0$, $m + n \ge 1$, $e_i \ge 0$ ($e_i = 0$ means the i-th track has no auxiliary data), $0 \le i < x$, and $1 \le y \le x$ ($y = x$ means the auxiliary data in E5 act on all tracks of C; $1 \le y < x$ means they act on some of them); finally, $E = E4 + E5$.
(402) Editing and production
Add, delete, and replace existing tracks, always keeping x equal to the current number of tracks; the track set after production is denoted C'[0, ..., x-1]. Tracks are added in the same way as in step (401.1).
Add, delete, and replace existing auxiliary data, always keeping $e_i$ equal to the number of auxiliary data of the i-th track; the auxiliary data set after production is denoted E'[0, ..., x-1]. Auxiliary data are added in the same way as in step (401.2).
The addition, deletion, and replacement of tracks and auxiliary data can be performed selectively and repeatedly, in any order.
(403) Audio encoding. The track set C' and its corresponding auxiliary data set E' are jointly encoded into a compressed audio signal S'. For the encoding technique, see the three-dimensional panoramic sound national standard AVS2-P3 (GB/T33475.3), the international standard MPEG-H (ISO/IEC 23008-3), Dolby Atmos, and the like.
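A sketch of the bookkeeping in steps (401.2) and (402), assuming the per-track auxiliary data E4 and the shared auxiliary data E5 are held in one hypothetical `TrackSetWithAux` object so that x and each e_i stay consistent after every edit. Re-indexing the shared track list when a track is deleted is a design choice of this sketch, not something the embodiment prescribes.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List

Aux = Dict[str, Any]


@dataclass
class TrackSetWithAux:
    """Hypothetical in-memory form of C[0..x-1] together with E = E4 + E5.

    E4[i] holds the e_i auxiliary data owned by track i; E5 holds the n shared
    auxiliary data, which act on the tracks listed in shared_on (the y tracks).
    """
    C: List[List[float]] = field(default_factory=list)
    E4: List[List[Aux]] = field(default_factory=list)
    E5: List[Aux] = field(default_factory=list)
    shared_on: List[int] = field(default_factory=list)

    @property
    def x(self) -> int:
        return len(self.C)  # x always equals the current number of tracks

    def e(self, i: int) -> int:
        return len(self.E4[i])  # e_i: auxiliary data owned by track i

    def add_track(self, pcm: List[float]) -> int:
        self.C.append(pcm)
        self.E4.append([])  # a new track starts with e_i = 0
        return self.x - 1

    def delete_track(self, i: int) -> None:
        del self.C[i]
        del self.E4[i]  # the track's own auxiliary data are removed with it
        # Drop i from the shared list and shift higher indices down by one.
        self.shared_on = [k - 1 if k > i else k for k in self.shared_on if k != i]

    def acting_on(self, i: int) -> List[Aux]:
        """All auxiliary data acting on track i: its own plus any shared ones."""
        return self.E4[i] + (self.E5 if i in self.shared_on else [])


s = TrackSetWithAux()
for pcm in ([0.1], [0.2], [0.3]):
    s.add_track(pcm)
s.E4[0].append({"type": "eq"})    # single-track auxiliary data (E4)
s.E5.append({"type": "downmix"})  # shared auxiliary data (E5) ...
s.shared_on = [1, 2]              # ... acting on y = 2 tracks
s.delete_track(1)
```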
Embodiment 3: the input audio signal contains auxiliary data, and the output audio signal can be used for secondary production.
The processing method and apparatus for disassembling and re-editing an audio signal according to the present invention can add auxiliary data to each track and can use a produced audio signal (e.g., the final output signal S' of Embodiment 2) as an input source for secondary production. As also shown in FIG. 3, the method includes the following steps:
(501) Input m7 compressed audio signals containing auxiliary data. The m7 audio signals are decoded (for the decoding technique, see the three-dimensional panoramic sound national standard AVS2-P3 (GB/T33475.3), the international standard MPEG-H (ISO/IEC 23008-3), Dolby Atmos, WANOS, and the like), completely separating the track data and auxiliary data they contain and producing m8 PCM tracks and m9 auxiliary data. The m8 tracks are recorded as the set C[0, ..., m8-1]; the m9 auxiliary data are grouped by track and recorded as the set E[0, ..., m8-1], so that the m9 auxiliary data correspond to the m8 tracks and the auxiliary data of the i-th track are $E[i][0, \dots, e_i - 1]$, where $1 \le m7 \le m8$, $0 \le i < m8$, $e_i \ge 0$ ($e_i = 0$ means the i-th track has no auxiliary data), $m9 > 0$, and $\sum_i e_i = m9$.
The current number of tracks is recorded as x, with $x = m8$.
(502) editing and producing are carried out on the basis of the audio track set C and the auxiliary data set E, and the editing and producing method comprises the following steps:
and performing addition, deletion and replacement operations on the existing audio track. And always maintain: the value of x is equal to the current number of tracks; the content in C is the current all tracks.
Adding, deleting and replacing the existing auxiliary data, and always keeping: e.g. of the typeiIs equal to the amount of auxiliary data of the ith track; e content is auxiliary data corresponding to each current audio track. The auxiliary data may vary over time (e.g., spatial location information, reference national standard GB/T33475.3, Dolby Atmos, etc.) or be fixed (e.g., equalizer parameters) in addition to the characteristics described (401.2).
The set of audio tracks after production is denoted C '[0.,. x-1], and the set of auxiliary data is denoted E' [0.,. x-1 ].
The operations of addition, deletion and replacement can be selectively carried out and repeated, and have no sequence.
(503) And (5) audio coding. The set of tracks C ' and its corresponding set of auxiliary data E ' are jointly encoded into a compressed audio signal S '. During encoding, the fixed auxiliary data and the auxiliary data which changes along with time can be processed differently, and specifically, reference can be made to three-dimensional panoramic sound national standard AVS2-P3(GB/T33475.3), international standard MPEG-H (ISO/IEC 23008-3), DolbyAtmos and the like.
(504) Secondary production. S' may need to be modified later or produced a second time, for example when several people sing or play together, or complete a work in relay or collaboration. In that case S' is used as the signal input source and steps (501) to (503) are repeated until production is finished. The final compressed audio signal is output by step (503).
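The re-editability that steps (501) to (504) rely on is easiest to see as a round trip. The sketch below uses JSON purely as a stand-in for a real codec; the hypothetical encode and decode functions do no compression. It shows successive production passes, for example two participants adding their parts in relay. Each pass decodes S', adds a track and re-encodes.

```python
import json
from typing import List, Tuple


def encode(tracks: List[List[float]], aux: List[List[dict]]) -> bytes:
    """Stand-in for step (503): serialize C' and E' together.

    A real system would use a codec such as AVS2-P3 or MPEG-H; JSON is
    used here only so that the round trip is easy to inspect.
    """
    if len(tracks) != len(aux):
        raise ValueError("one auxiliary-data list per track is required")
    return json.dumps({"tracks": tracks, "aux": aux}).encode("utf-8")


def decode(signal: bytes) -> Tuple[List[List[float]], List[List[dict]]]:
    """Stand-in for step (501): separate tracks and auxiliary data again."""
    payload = json.loads(signal.decode("utf-8"))
    return payload["tracks"], payload["aux"]


def production_round(signal: bytes, new_track: List[float],
                     new_aux: List[dict]) -> bytes:
    """One pass of steps (501) to (503): decode, add a track, re-encode."""
    tracks, aux = decode(signal)
    tracks.append(new_track)
    aux.append(new_aux)
    return encode(tracks, aux)
```

Usage: start from `s = encode([[0.0, 0.1]], [[]])`, then call `production_round` once per participant. Because tracks and auxiliary data stay separable inside the signal, each later pass can still reach the earlier parts individually.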
A track may carry no auxiliary data, one auxiliary data item or several auxiliary data items, so one or more items may be added to each track. The auxiliary data set E1' is therefore the collection of the auxiliary data of all tracks in the track set C1'. In general, tracks without auxiliary data are called the sound bed, and tracks with auxiliary data are called sound objects.
Adding, deleting or replacing tracks and their auxiliary data may change the sound objects and the sound bed. The tracks of the changed sound objects, together with the sound bed, form a new track set. All auxiliary data of the changed sound objects form a new auxiliary data set. Encoding the changed sound objects and sound bed then yields the compressed sound signal.
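The sound bed/sound object convention above amounts to partitioning tracks by whether their auxiliary data list is empty. Below is a minimal sketch, assuming the parallel-list layout for C and E. Placing object tracks before bed tracks in the new track set is an arbitrary choice made here for illustration.

```python
from typing import List, Tuple


def split_bed_and_objects(aux: List[List[dict]]) -> Tuple[List[int], List[int]]:
    """Return (bed, objects) as lists of track indices.

    Tracks with no auxiliary data form the sound bed; tracks with at
    least one auxiliary data item are sound objects.
    """
    bed = [i for i, items in enumerate(aux) if not items]
    objects = [i for i, items in enumerate(aux) if items]
    return bed, objects


def regroup(tracks: List[List[float]], aux: List[List[dict]]):
    """Build the new track set (object tracks, then bed tracks) and the
    new auxiliary data set (all items of the sound objects)."""
    bed, objects = split_bed_and_objects(aux)
    new_tracks = [tracks[i] for i in objects] + [tracks[i] for i in bed]
    new_aux = [item for i in objects for item in aux[i]]
    return new_tracks, new_aux
```

After an edit that gives a bed track its first auxiliary data item, re-running the partition moves that track into the sound objects.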
Fig. 4 is a schematic diagram of a first embodiment of the apparatus. It includes an audio input module, an audio editing module, an auxiliary data adding module and an audio encoding module. The audio input module includes a PCM input unit, which inputs PCM signals, for example m1 PCM signals with m1 > 0. The m1 PCM signals form the track set C1, where $C1 = \{c1_i\}$, $0 \le i \le m1-1$.
The audio editing module includes a track editing unit. This unit adds to, deletes from or replaces tracks in the track set C1, or applies any combination of the three, to generate a new track set C1'. The auxiliary data adding module adds at least one group of auxiliary data to the track set C1' to obtain an auxiliary data set E1'. The audio encoding module encodes the track set C1' and the auxiliary data set E1' to obtain a compressed sound signal $S_q'$.
Fig. 5 is a schematic diagram of a second embodiment of the apparatus. Building on the first embodiment, the audio input module further includes an auxiliary data input unit, which inputs an auxiliary data set E1. E1 may be one group of auxiliary data shared by several tracks in C1, or a collection of auxiliary data attached to different tracks in C1. Finally, the audio encoding module encodes C1', E1 and E1' to obtain a compressed sound signal $S_q''$.
Fig. 6 is a schematic diagram of a third embodiment of the apparatus, in which the audio input module further includes a compressed signal input unit. After input, the compressed signal is decoded by the decoding module, which includes an audio decoding unit and an auxiliary data decoding unit. If the input signal is a compressed audio signal (e.g., from local storage or network input), the audio decoding unit decodes it to obtain the corresponding PCM data. If the input compressed signal also contains auxiliary data, the auxiliary data decoding unit decodes it to obtain the auxiliary data.
The audio editing module further includes an auxiliary data editing unit. This unit adds, deletes or replaces auxiliary data in the auxiliary data set, or applies any combination of the three, to obtain a new auxiliary data set.
As a specific embodiment, the output of the audio encoding module is fed back to the audio input module.
As a specific example, the input PCM signal may come partly or entirely from a recording device, local storage, network input, or any combination of the three.
As a specific embodiment, the number of channels of the audio signal input by the audio input module includes mono, stereo, 4.0 channels, 5.1 channels, 7.1 channels, 9.1 channels, 11.1 channels, 13.1 channels, 22.2 channels, and any combination of the above types of channels.
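For the layouts listed above, the usual "X.Y" notation denotes X full-range channels plus Y low-frequency effects channels. The channel count is therefore X + Y; 22.2, for example, has 24 channels. Below is a small hypothetical helper, assuming that a combination of layouts needs the sum of their channel counts.

```python
from typing import Iterable


def channel_count(layout: str) -> int:
    """Channels in one layout name such as 'mono', 'stereo' or '22.2'."""
    named = {"mono": 1, "stereo": 2}
    if layout in named:
        return named[layout]
    # 'X.Y' -> X full-range channels plus Y LFE channels
    main, _, lfe = layout.partition(".")
    return int(main) + int(lfe or 0)


def total_channels(layouts: Iterable[str]) -> int:
    """Channels needed for a combination of input layouts."""
    return sum(channel_count(name) for name in layouts)
```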
As a specific example, the auxiliary data may be a downmix scheme of the tracks, spatial position information, spatial trajectory information, reverberation parameters, equalizer parameters, and the like.
As a specific embodiment, the auxiliary data may act on all or part of the tracks of the set of tracks.
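Whether one auxiliary data item covers all tracks or only some of them can be modelled with an optional list of target track indices. In this sketch, targets=None means "all tracks". A plain broadband gain stands in, as a simplification, for a real equalizer parameter. The function names and the gain semantics are assumptions, not taken from the text.

```python
from typing import List, Optional, Sequence


def resolve_targets(num_tracks: int,
                    targets: Optional[Sequence[int]]) -> List[int]:
    """Track indices affected by one auxiliary data item.

    targets=None means the item applies to every track in the set;
    otherwise it applies only to the listed track indices.
    """
    if targets is None:
        return list(range(num_tracks))
    bad = [t for t in targets if not 0 <= t < num_tracks]
    if bad:
        raise IndexError(f"track indices out of range: {bad}")
    return sorted(set(targets))


def apply_gain(tracks: List[List[float]], gain: float,
               targets: Optional[Sequence[int]] = None) -> List[List[float]]:
    """Scale the targeted tracks by a broadband gain (a hypothetical,
    simplified stand-in for an equalizer parameter); others are copied."""
    chosen = set(resolve_targets(len(tracks), targets))
    return [[s * gain for s in t] if i in chosen else list(t)
            for i, t in enumerate(tracks)]
```

The same target list could also express the shared auxiliary data set E1 of the second apparatus embodiment, where one group of auxiliary data serves several tracks.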
As a specific embodiment, whether the auxiliary data adding module adds the auxiliary data or not does not affect the implementation of the present disclosure.
The foregoing is an exemplary embodiment of the present disclosure, and the scope of the present disclosure is defined by the claims and their equivalents.

Claims (15)

1. A method for processing a detachable and re-editable audio signal, comprising:
inputting m1 PCM signals, where m1 > 0 and the m1 PCM signals form a track set C1, $C1 = \{c1_i\}$, $0 \le i \le m1-1$;
adding to, deleting from or replacing tracks in the track set C1, or applying any combination of the three, to generate a new track set C1';
adding at least one set of auxiliary data to the track set C1' to obtain an auxiliary data set E1';
encoding the track set C1' and the auxiliary data set E1' to obtain a compressed sound signal $S_q'$.
2. The method for processing a detachable and re-editable audio signal according to claim 1, comprising:
inputting m2 auxiliary data items, where, if m2 > 0, there is an auxiliary data set $E1 = \{e1_j\}$, $0 \le j \le m2-1$;
encoding the track set C1' and the auxiliary data sets E1 and E1' to obtain a compressed sound signal $S_q''$.
3. The method for processing a detachable and re-editable audio signal according to claim 2, comprising:
inputting n3 PCM signals and n4 auxiliary data items, where, if n3 and n4 are both greater than 0, there is a track set $C3 = \{c3_k\}$, $0 \le k \le n3-1$, and an auxiliary data set $E3 = \{e3_t\}$, $0 \le t \le n4-1$;
adding to, deleting from or replacing tracks in the track set C3, or applying any combination of the three, to generate a new track set C3';
adding, deleting or replacing auxiliary data in the auxiliary data set E3, or applying any combination of the three, to obtain an auxiliary data set E3';
encoding the track set C3' and the auxiliary data set E3' to obtain a compressed sound signal $S_q'''$.
4. The method for processing a detachable and re-editable audio signal according to any one of claims 1 to 3, wherein the input PCM signal may come partly or entirely from a recording device, local storage, network input, or any combination of the three.
5. The method for processing a detachable and re-editable audio signal according to claim 4, wherein the locally stored or network-input PCM signal is obtained by decoding a compressed audio signal.
6. The method for processing a detachable and re-editable audio signal according to claim 5, wherein the auxiliary data are obtained by decoding a compressed audio signal.
7. The method for processing a detachable and re-editable audio signal according to any one of claims 1 to 3, wherein the auxiliary data are a downmix scheme of the tracks, spatial position information, spatial trajectory information, reverberation parameters, equalizer parameters, etc.
8. The panoramic sound processing method for a detachable and re-editable audio signal according to any one of claims 1 to 3, wherein the auxiliary data are applied to all or some of the tracks of a track set.
9. The panoramic sound processing method for a detachable and re-editable audio signal according to any one of claims 1 to 3, wherein the auxiliary data may be fixed or may vary over time.
10. An apparatus for processing a detachable and re-editable audio signal, comprising:
an audio input module, comprising a PCM input unit that inputs m1 PCM signals, where m1 > 0 and the m1 PCM signals form a track set C1, $C1 = \{c1_i\}$, $0 \le i \le m1-1$;
an audio editing module, comprising a track editing unit that adds to, deletes from or replaces tracks in the track set C1, or applies any combination of the three, to generate a new track set C1';
an auxiliary data adding module, which adds at least one set of auxiliary data to the track set C1' to obtain an auxiliary data set E1';
an audio encoding module, which encodes the track set C1' and the auxiliary data set E1' to obtain a compressed sound signal $S_q'$.
11. The apparatus for processing a detachable and re-editable audio signal according to claim 10, wherein the audio input module further comprises an auxiliary data input unit that inputs m2 auxiliary data items, m2 > 0, forming an auxiliary data set $E1 = \{e1_j\}$, $0 \le j \le m2-1$;
the audio encoding module encodes the track set C1' and the auxiliary data sets E1 and E1' into a compressed sound signal $S_q''$.
12. The apparatus for processing a detachable and re-editable audio signal according to claim 11, wherein the audio editing module further comprises an auxiliary data editing unit. The auxiliary data editing unit adds, deletes or replaces auxiliary data in the auxiliary data set, or applies any combination of the three, to obtain a new auxiliary data set.
13. The apparatus for processing a detachable and re-editable audio signal according to any one of claims 10 to 12, wherein the PCM signal input by the PCM input unit comes partly or entirely from a recording device, local storage, network input, or any combination of the three.
14. The apparatus for processing a detachable and re-editable audio signal according to claim 13, further comprising a decoding module, wherein the decoding module comprises an audio decoding unit, and a locally stored or network-input PCM signal can be obtained by decoding a compressed audio signal with the audio decoding unit.
15. The apparatus for processing a detachable and re-editable audio signal according to claim 14, wherein the decoding module further comprises an auxiliary data decoding unit, and the auxiliary data are obtained by decoding the compressed audio signal with the auxiliary data decoding unit.
CN202010209390.9A 2020-03-23 2020-03-23 Processing method and device for detachable and re-editable audio signals Active CN111445914B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202010209390.9A CN111445914B (en) 2020-03-23 2020-03-23 Processing method and device for detachable and re-editable audio signals
PCT/CN2020/140722 WO2021190039A1 (en) 2020-03-23 2020-12-29 Processing method and apparatus capable of disassembling and re-editing audio signal

Publications (2)

Publication Number Publication Date
CN111445914A true CN111445914A (en) 2020-07-24
CN111445914B CN111445914B (en) 2023-10-17

Family

ID=71650637

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010209390.9A Active CN111445914B (en) 2020-03-23 2020-03-23 Processing method and device for detachable and re-editable audio signals

Country Status (2)

Country Link
CN (1) CN111445914B (en)
WO (1) WO2021190039A1 (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101136022A (en) * 2006-09-01 2008-03-05 李筑 Panorama manufacturing and displaying system of resource information
JP2008225232A (en) * 2007-03-14 2008-09-25 Crimson Technology Inc Signal processing method and audio content distribution method
WO2009093421A1 (en) * 2008-01-21 2009-07-30 Panasonic Corporation Sound reproducing device
CN102682776A (en) * 2012-05-28 2012-09-19 深圳市茁壮网络股份有限公司 Method for processing audio data and server
CN105336348A (en) * 2015-11-16 2016-02-17 合一网络技术(北京)有限公司 Processing system and method for multiple audio tracks in video editing
US20160284355A1 (en) * 2015-03-23 2016-09-29 Microsoft Technology Licensing, Llc Replacing an encoded audio output signal
CN107094277A (en) * 2016-02-18 2017-08-25 谷歌公司 Signal processing method and system for the rendering audio on virtual speaker array
CN108550369A (en) * 2018-04-14 2018-09-18 全景声科技南京有限公司 A kind of panorama acoustical signal decoding method of variable-length

Family Cites Families (4)

Publication number Priority date Publication date Assignee Title
JP2004029377A (en) * 2002-06-26 2004-01-29 Namco Ltd Compression data processor, compression data processing method and compression data processing program
JP4311541B2 (en) * 2003-10-06 2009-08-12 アルパイン株式会社 Audio signal compression device
CN108550377B (en) * 2018-03-15 2020-06-19 北京雷石天地电子技术有限公司 Method and system for rapidly switching audio tracks
CN111445914B (en) * 2020-03-23 2023-10-17 全景声科技南京有限公司 Processing method and device for detachable and re-editable audio signals

Cited By (3)

Publication number Priority date Publication date Assignee Title
WO2021190039A1 (en) * 2020-03-23 2021-09-30 全景声科技南京有限公司 Processing method and apparatus capable of disassembling and re-editing audio signal
CN113691860A (en) * 2021-07-19 2021-11-23 北京全景声信息科技有限公司 UGC media content generation method, device, equipment and storage medium
CN113691860B (en) * 2021-07-19 2023-12-08 北京全景声信息科技有限公司 UGC media content generation method, device, equipment and storage medium

Also Published As

Publication number Publication date
CN111445914B (en) 2023-10-17
WO2021190039A1 (en) 2021-09-30

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant