WO2024000534A1 - Audio signal encoding method and apparatus, and electronic device and storage medium - Google Patents


Info

Publication number
WO2024000534A1
Authority
WO
WIPO (PCT)
Prior art keywords
audio signal
encoding
channels
downmix
rate
Prior art date
Application number
PCT/CN2022/103170
Other languages
French (fr)
Chinese (zh)
Inventor
高硕
Original Assignee
北京小米移动软件有限公司
Priority date
Filing date
Publication date
Application filed by 北京小米移动软件有限公司 (Beijing Xiaomi Mobile Software Co., Ltd.)
Priority to PCT/CN2022/103170
Priority to CN202280002189.0A
Publication of WO2024000534A1

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 19/00: Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; coding or decoding of speech or audio signals using source filter models or psychoacoustic analysis
    • G10L 19/04: Speech or audio signals analysis-synthesis techniques using predictive techniques
    • G10L 19/16: Vocoder architecture
    • G10L 19/18: Vocoders using multiple modes
    • G10L 19/20: Vocoders using sound class specific coding, hybrid encoders or object based coding
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04S: STEREOPHONIC SYSTEMS
    • H04S 3/00: Systems employing more than two channels, e.g. quadraphonic

Definitions

  • The present disclosure relates to the field of communication technology, and in particular to an audio signal encoding method, apparatus, electronic device and storage medium.
  • In the related art, the audio signal is uniformly encoded.
  • Because the number of bits available to each channel differs at different encoding rates, uniform encoding can leave a channel with more bits than encoding requires, wasting bits, or with fewer bits than encoding requires, making it impossible to provide remote users with audio services that match the encoding rate. This is a problem that urgently needs to be solved.
  • Embodiments of the present disclosure provide an audio signal encoding method, device, electronic equipment and storage medium.
  • the audio signal is encoded according to the number of channels and the encoding rate.
  • In this way, the number of bits that can be used can be fully utilized, avoiding the waste of bits and providing remote users with audio services that match the encoding rate.
  • an embodiment of the present disclosure provides a method for encoding audio signals.
  • The method includes: acquiring a scene-based audio signal; determining the number of channels and the encoding rate of the audio signal; and encoding the audio signal according to the number of channels and the encoding rate to generate an encoded code stream.
  • a scene-based audio signal is obtained; the number of channels and the encoding rate of the audio signal are determined; and the audio signal is encoded according to the number of channels and the encoding rate to generate a coded code stream.
  • the audio signal is encoded according to the number of channels and the encoding rate.
  • In some embodiments, encoding the audio signal according to the number of channels and the encoding rate to generate an encoded code stream includes: performing downmix processing on the audio signal according to the number of channels and the encoding rate to generate downmix parameters and a downmix channel signal;
  • performing encoding processing on the downmix channel signal to generate encoding parameters;
  • and performing code stream multiplexing on the downmix parameters and the encoding parameters to generate the encoded code stream.
  • In some embodiments, performing downmix processing on the audio signal according to the number of channels and the encoding rate to generate downmix parameters and a downmix channel signal includes: determining a target control parameter for the audio signal according to the number of channels and the encoding rate; determining a downmix processing algorithm according to the target control parameter; and performing downmix processing on the audio signal according to the downmix processing algorithm to generate the downmix parameters and the downmix channel signal.
  • In some embodiments, determining a target control parameter for the audio signal according to the number of channels and the encoding rate includes: calculating an initial average rate of each channel according to the number of channels and the encoding rate; determining a target average rate based on the initial average rate and a preset average rate threshold; and determining the target control parameter for the audio signal based on the initial average rate and the target average rate.
  • In some embodiments, before encoding the audio signal, the method further includes: preprocessing the audio signal with pre-emphasis and/or high-pass filtering.
  • an embodiment of the present disclosure provides an audio signal encoding device.
  • The audio signal encoding device includes: a signal acquisition unit configured to acquire a scene-based audio signal; an information determination unit configured to determine the number of channels and the encoding rate of the audio signal; and an encoding processing unit configured to encode the audio signal according to the number of channels and the encoding rate to generate an encoded code stream.
  • In some embodiments, the encoding processing unit includes: a downmix processing module configured to perform downmix processing on the audio signal according to the number of channels and the encoding rate to generate downmix parameters and a downmix channel signal; a parameter generation module configured to perform encoding processing on the downmix channel signal to generate encoding parameters; and a code stream generation module configured to perform code stream multiplexing on the downmix parameters and the encoding parameters to generate the encoded code stream.
  • In some embodiments, the downmix processing module includes: a parameter determination submodule configured to determine target control parameters for the audio signal according to the number of channels and the encoding rate; an algorithm determination submodule configured to determine the downmix processing algorithm according to the target control parameters; and a downmix processing submodule configured to perform downmix processing on the audio signal according to the downmix processing algorithm to generate the downmix parameters and the downmix channel signal.
  • In some embodiments, the parameter determination submodule is further configured to: calculate an initial average rate of each channel based on the number of channels and the encoding rate; determine a target average rate based on the initial average rate and a preset average rate threshold; and determine the target control parameters for the audio signal based on the initial average rate and the target average rate.
  • In some embodiments, the device further includes: a preprocessing unit configured to perform pre-emphasis and/or high-pass filtering preprocessing on the audio signal.
  • Embodiments of the present disclosure further provide an electronic device, including: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can perform the method described in the first aspect.
  • embodiments of the present disclosure provide a non-transitory computer-readable storage medium storing computer instructions, wherein the computer instructions are used to cause the computer to execute the method described in the first aspect.
  • embodiments of the present disclosure provide a computer program product, including computer instructions, characterized in that, when executed by a processor, the computer instructions implement the method described in the first aspect.
  • Figure 1 is a flow chart of an audio signal encoding method provided by an embodiment of the present disclosure
  • Figure 2 is a schematic coordinate diagram of an audio signal in FOA format provided by an embodiment of the present disclosure
  • Figure 3 is a flow chart of another audio signal encoding method provided by an embodiment of the present disclosure.
  • Figure 4 is a flow chart of an audio signal encoding method in the related art, provided by an embodiment of the present disclosure
  • Figure 5 is a flow chart of yet another audio signal encoding method provided by an embodiment of the present disclosure.
  • Figure 6 is a flowchart of the sub-steps of S30 in the audio signal encoding method provided by an embodiment of the present disclosure
  • Figure 7 is a flowchart of the sub-steps of S301 in the audio signal encoding method provided by an embodiment of the present disclosure
  • Figure 8 is a structural diagram of an audio signal encoding device provided by an embodiment of the present disclosure.
  • Figure 9 is a structural diagram of the encoding processing unit in the audio signal encoding device provided by an embodiment of the present disclosure.
  • Figure 10 is a structural diagram of the downmix processing module in the audio signal encoding device provided by an embodiment of the present disclosure
  • Figure 11 is a structural diagram of another audio signal encoding device provided by an embodiment of the present disclosure.
  • FIG. 12 is a structural diagram of an electronic device according to an embodiment of the present disclosure.
  • In the present disclosure, "at least one" can also be described as one or more, and "a plurality" can be two, three, four or more; the present disclosure does not limit this.
  • Technical features may be distinguished by "first", "second", "third", "A", "B", "C", "D", etc.
  • The technical features described with "first", "second", "third", "A", "B", "C" and "D" imply no particular order or precedence.
  • each table in this disclosure can be configured or predefined.
  • the values of the information in each table are only examples and can be configured as other values, which is not limited by this disclosure.
  • it is not necessarily required to configure all the correspondences shown in each table.
  • the corresponding relationships shown in some rows may not be configured.
  • Appropriate adjustments, such as splitting or merging, can be made based on the above tables.
  • the names of the parameters shown in the titles of the above tables may also be other names understandable by the communication device, and the values or expressions of the parameters may also be other values or expressions understandable by the communication device.
  • Other data structures can also be used, such as arrays, queues, containers, stacks, linear lists, pointers, linked lists, trees, graphs, structures, classes, heaps, hash tables, etc.
  • the first generation of mobile communication technology is the first generation of wireless cellular technology and is an analog mobile communication network.
  • When upgrading from 1G to 2G, mobile phones moved from analog to digital communication, using the GSM (Global System for Mobile Communications) network standard; the speech coders AMR (Adaptive Multi-Rate narrowband speech codec), EFR (Enhanced Full Rate), FR (Full Rate) and HR (Half Rate) provided single-channel narrowband voice services. The 3G mobile communication system was proposed by the ITU (International Telecommunication Union) as IMT-2000 (International Mobile Telecommunications-2000); it can use TD-SCDMA, CDMA2000 or WCDMA, and its voice coder uses AMR-WB to provide single-channel wideband voice services.
  • 4G is a further improvement on 3G technology.
  • In 4G, data and voice are all IP-based, providing real-time HD (High Definition) Voice services for voice and audio; the EVS (Enhanced Voice Services) codec used is capable of high-quality compression of both speech and audio.
  • The voice and audio communication services described above have expanded from narrowband signals to ultra-wideband and even full-band services, but they are still monophonic. People's demand for high-quality audio continues to increase; compared with monophonic audio, stereo audio gives each sound source a sense of orientation and distribution and improves clarity.
  • Three signal formats, namely channel-based multi-channel audio signals, object-based audio signals and scene-based audio signals, can provide three-dimensional audio services.
  • The IVAS (Immersive Voice and Audio Services) codec being standardized by 3GPP (3rd Generation Partnership Project) SA4 can support the coding and decoding requirements of the above three signal formats.
  • Terminal devices that can support 3D audio services include mobile phones, computers, tablets, conference system equipment, AR (augmented reality)/VR (virtual reality) equipment, cars, etc.
  • FOA (First-Order Ambisonics) and HOA (Higher-Order Ambisonics) audio information is an immersive audio format in which the audio quality gradually increases as the order increases.
  • Different Ambisonics orders correspond to different numbers of audio signal components; that is, for an N-order Ambisonics signal, the number of Ambisonics coefficients (channels) is (N+1)*(N+1).
  • the number of Ambisonics channels increases rapidly with the increase of order, and the amount of encoded data also increases rapidly.
  • the complexity of encoding also increases significantly.
  • the encoding performance also decreases.
  • Therefore, the input initial channels need to be downmixed. After downmixing, the number of channels becomes smaller and the complexity of encoding decreases, thereby achieving a balanced compromise between encoding complexity and encoding performance.
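The quadratic channel growth noted above can be illustrated with a small sketch (ours, not from the patent) computing the (N+1)*(N+1) coefficient count:

```python
def ambisonics_channel_count(order: int) -> int:
    """Number of Ambisonics coefficients (channels) for an N-order signal: (N+1)^2."""
    return (order + 1) ** 2

# FOA (order 1) has 4 channels (W, X, Y, Z); the count grows quadratically
# with the order, which is why higher orders are downmixed before encoding.
counts = {order: ambisonics_channel_count(order) for order in range(1, 5)}
print(counts)  # {1: 4, 2: 9, 3: 16, 4: 25}
```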
  • Accordingly, embodiments of the present disclosure provide an audio signal encoding method and device to at least solve the above problems in the related art, in order to make full use of the available bits, provide remote users with audio services that match the encoding rate, and improve the user experience.
  • FIG. 1 is a flow chart of an audio signal encoding method provided by an embodiment of the present disclosure.
  • the method may include but is not limited to the following steps:
  • When a local user establishes voice communication with any remote user, the local user can establish voice communication with the remote user's terminal device through the local user's own terminal device. The local user's terminal device can obtain the sound information of the local user's environment in real time, so as to obtain scene-based audio signals.
  • the sound information of the environment where the local user is located includes the sound information emitted by the local user, the sound information of surrounding things, etc.
  • Sound information of surrounding things such as: sound information of driving vehicles, bird calls, wind sound information, sound information of other users around the local user, etc.
  • the terminal device is an entity on the user side that is used to receive or transmit signals, such as mobile phones, computers, tablets, watches, walkie-talkies, conference system equipment, augmented reality AR/virtual reality VR equipment, cars, etc.
  • Terminal equipment can also be called user equipment (user equipment, UE), mobile station (mobile station, MS), mobile terminal equipment (mobile terminal, MT), etc.
  • For example, the terminal device can be a car with communication functions, a smart car, a mobile phone, a wearable device, a tablet computer (Pad), a computer with wireless transceiver functions, a virtual reality (VR) terminal device, an augmented reality (AR) terminal device, or a wireless terminal device in industrial control, self-driving, remote medical surgery, smart grid, transportation safety, smart city, smart home, etc.
  • the embodiments of the present disclosure do not limit the specific technology and specific equipment form used by the terminal equipment.
  • The local user's terminal device obtains the sound information of the environment where the local user is located through a recording device, such as a microphone, that is provided in the terminal device or cooperates with it, and further generates and obtains the scene-based audio signal.
  • the scene-based audio signal may be an audio signal in FOA format or an audio signal in HOA format.
  • the number of channels and the encoding rate of the scene-based audio signal may be determined.
  • For an audio signal in FOA format, the number of channels is 4: W, X, Y, and Z.
  • W represents a component that includes all sounds in all directions in the sound field superimposed with the same gain and phase
  • X represents the front-to-back direction component in the sound field
  • Y represents the left-right direction component in the sound field
  • Z represents the up-down direction component in the sound field.
  • S3: Encode the audio signal according to the number of channels and the encoding rate to generate an encoded code stream.
  • a scene-based audio signal is obtained, the number of channels and the encoding rate of the audio signal are determined, and the audio signal is encoded according to the number of channels and the encoding rate to generate an encoded code stream.
  • the audio signal is encoded according to the number of channels and the encoding rate.
  • The encoding rate of each channel can be determined according to the number of channels and the encoding rate: for example, the average encoding rate of each channel, the maximum encoding rate of each channel, or the encoding rate of each channel. The average encoding rate of each channel is the encoding rate divided by the number of channels; the maximum encoding rate of each channel is equal to the encoding rate; and the encoding rate of each channel is the encoding rate itself.
  • the number of bits that can be used by each channel under different coding rates can be considered according to the coding rate of each channel, and then during the encoding process , can make full use of the number of bits that can be used, avoid the waste of bits, and provide remote users with audio services that match the encoding rate.
  • The generated encoded code stream can ensure clear, stable and intelligible audio services when the encoding rate is low, and can ensure high-definition, stable and immersive audio services when the encoding rate is high, thereby providing remote users with audio services that match the encoding rate and improving the user experience.
  • In some embodiments, before encoding the audio signal, the method further includes: preprocessing the audio signal with pre-emphasis and/or high-pass filtering.
  • When the scene-based audio signal is obtained and the number of channels and encoding rate of the audio signal are determined, pre-emphasis preprocessing can be performed on the audio signal. Pre-emphasis emphasizes the high-frequency part of the audio information, increasing the high-frequency resolution of the audio information.
  • Alternatively, after obtaining a scene-based audio signal and determining its number of channels and encoding rate, the audio signal can be preprocessed by high-pass filtering to filter out signal components below a certain frequency threshold.
  • The cutoff frequency of the high-pass filter can be set as needed; for example, it can be set to 20 Hz.
  • In this way, the audio signal components of the frequency band to be encoded can be obtained.
  • This prevents ultra-low-frequency signals from affecting the encoding process.
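The pre-emphasis and high-pass preprocessing described above can be sketched as follows. This is a generic illustration, not the patent's filter design: the 0.97 pre-emphasis coefficient, the first-order filter form, and the 48 kHz sample rate are common assumed values.

```python
import math

def pre_emphasis(x, alpha=0.97):
    """Pre-emphasis y[n] = x[n] - alpha * x[n-1]: boosts the high-frequency part."""
    return [x[0]] + [x[n] - alpha * x[n - 1] for n in range(1, len(x))]

def high_pass(x, cutoff_hz=20.0, fs=48000.0):
    """First-order high-pass filter removing components below cutoff_hz."""
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    a = rc / (rc + 1.0 / fs)  # filter coefficient from cutoff and sample rate
    y = [x[0]]
    for n in range(1, len(x)):
        y.append(a * (y[-1] + x[n] - x[n - 1]))
    return y

# A constant (0 Hz) input is progressively attenuated by the high-pass filter.
dc = [1.0] * 1000
print(high_pass(dc)[-1] < 0.1)  # True
```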
  • a scene-based audio signal is obtained, the number of channels and the encoding rate of the audio signal are determined, and the audio signal is encoded according to the number of channels and the encoding rate to generate an encoded code stream.
  • the audio signal is encoded according to the number of channels and the encoding rate.
  • In this way, the number of bits that can be used can be fully utilized, avoiding the waste of bits and providing remote users with audio services that match the encoding rate.
  • FIG. 3 is a flow chart of another audio signal encoding method provided by an embodiment of the present disclosure.
  • the method may include but is not limited to the following steps:
  • a scene-based audio signal is obtained, the number of channels and the encoding rate of the audio signal are determined, and the audio signal is encoded according to the number of channels and the encoding rate to generate an encoded code stream.
  • The audio signal is encoded according to the number of channels and the encoding rate: the audio signal can be downmixed according to the number of channels and the encoding rate to generate the downmix parameters and the downmix channel signal; the downmix channel signal is then encoded to generate encoding parameters; and the code stream is multiplexed based on the downmix parameters and the encoding parameters to generate the encoded code stream.
  • In the related art, the audio signal is uniformly downmixed.
  • The number of channels is reduced compared with the original number of channels, and all of the reduced channels are encoded using the core encoder.
  • the downmix parameters and core encoder output parameters generated by the downmix processing are multiplexed and the encoded code stream is output.
  • In the related art, the audio signal is uniformly downmixed without taking into account that the number of bits available to each channel differs at different encoding rates.
  • The number of channels after downmixing does not match the number of channels the core encoder can encode, resulting in the following: when the number of channels after downmixing is much smaller than the number of input channels, it is impossible to provide better audio services to remote users at high encoding rates (because the number of bits available to each channel exceeds the number of bits necessary for encoding, resulting in wasted bits); and when the number of channels after downmixing is not much different from the number of input channels, it is impossible at a low encoding rate to provide remote users with an audio service that matches the rate value (because the number of bits available to each channel is far lower than the number of bits necessary for encoding, resulting in poor encoding quality for each channel).
  • In the embodiment of the present disclosure, the scene-based audio signal (an audio signal in FOA format or in HOA format) is input to the encoder, and the encoder can determine the number of channels and the encoding rate of the audio signal. The encoding rate, the number of channels and the audio signal are input to the mode analysis module; alternatively, the audio signal can first be preprocessed with high-pass filtering, and the preprocessed audio signal then input to the mode analysis module.
  • the mode analysis module can output control parameters according to the selected encoding rate and number of channels, and use the control parameters to guide the downmix processing module to select the corresponding downmix processing algorithm.
  • After the downmix processing module processes the audio signal, it outputs the downmix parameters and the downmix channel signal.
  • After the downmix channel signal is encoded by the core encoder, the core encoder outputs the encoding parameters; the encoding parameters and the downmix parameters are input to the code stream multiplexer, which outputs the encoded code stream.
  • In this way, a matching downmix processing algorithm is automatically selected based on the number of channels of the input audio signal and the number of bits that can be used, so that the number of channels after downmixing matches the number of channels the core encoder can encode at this encoding rate, achieving full (optimal) utilization of the available bits. That is, at low rates clear, stable and intelligible audio services can be ensured, and at high rates high-definition, stable and immersive audio services can be ensured, improving the user experience.
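The encoder flow just described (mode analysis producing control parameters, control parameters selecting the downmix algorithm, core encoding, then bitstream multiplexing) might be organised as below. All function names, bodies, and the 32 kbps switch point are placeholders invented for illustration; the patent does not specify them.

```python
def mode_analysis(num_channels, rate_bps):
    """Derive a control parameter from the encoding rate and channel count."""
    return {"avg_rate_bps": rate_bps / num_channels}

def downmix(signal, control):
    """Pick a downmix algorithm from the control parameter; return downmix
    parameters and the (here unchanged) downmix channel signal."""
    algorithm = "aggressive" if control["avg_rate_bps"] < 32000 else "light"
    return {"algorithm": algorithm}, signal

def core_encode(dm_signal):
    """Stand-in for the core encoder; returns encoding parameters."""
    return {"num_frames": len(dm_signal)}

def encode(signal, num_channels, rate_bps):
    """Mode analysis -> downmix -> core encoding -> bitstream multiplexing."""
    control = mode_analysis(num_channels, rate_bps)
    dm_params, dm_signal = downmix(signal, control)
    enc_params = core_encode(dm_signal)
    return {**dm_params, **enc_params}  # multiplexed "code stream"

stream = encode([0.0] * 960, num_channels=4, rate_bps=96000)
print(stream)  # {'algorithm': 'aggressive', 'num_frames': 960}
```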
  • After the encoder outputs the encoded code stream, the code stream can be sent to the decoder for decoding, so that the remote terminal can obtain the sound information transmitted by the local terminal.
  • S30 perform downmix processing on the audio signal according to the number of channels and coding rate to generate downmix parameters and downmix channel signals, including:
  • S301 Determine the target control parameters for the audio signal according to the number of channels and the encoding rate.
  • When the audio signal is to be downmixed according to the number of channels and the encoding rate, the target control parameters of the audio signal can first be determined based on the number of channels and the encoding rate.
  • the target control parameters for the audio signal are determined based on the number of channels and the coding rate.
  • When determining the target control parameters, the encoding rate of each channel can first be determined based on the number of channels and the encoding rate: for example, the average encoding rate of each channel, the maximum encoding rate of each channel, or the encoding rate of each channel. The average encoding rate of each channel is the encoding rate divided by the number of channels; the maximum encoding rate of each channel is equal to the encoding rate; and the encoding rate of each channel is the encoding rate itself.
  • the target control parameters for the audio signal can be determined according to the coding rate of each channel.
  • The target control parameters for the audio signal can also be determined by presetting a correspondence between control parameters and combinations of channel number and encoding rate; given the number of channels and the encoding rate of the audio signal, the target control parameters can then be determined from this correspondence.
  • the target number of channels can also be determined based on the number of channels and the encoding rate, and then the target control parameters for the audio signal can be determined based on the target number of channels, and so on.
  • the target number of channels is determined based on the number of channels and the coding rate.
  • For example, N thresholds of the average encoding rate are preset, where N is a positive integer; the N thresholds determine N+1 threshold intervals, and different threshold intervals are set to correspond to different numbers of channels after downmix processing.
  • The initial average encoding rate is calculated according to the number of channels and the encoding rate; based on the threshold interval into which it falls, the target number of channels can be determined, and the target control parameters of the audio signal can then be determined based on the target number of channels.
  • In addition, the average rate that can be allocated to each channel after downmix processing can be obtained, and the target control parameters of the audio signal can be determined based on the target number of channels and/or this post-downmix average rate.
  • A correspondence between control parameters and the target number of channels and/or the post-downmix average rate can be set in advance, from which the target control parameters of the audio signal can be determined.
  • the downmix processing algorithm may be determined based on the target control parameters.
  • the downmix processing algorithm may be determined by determining the downmix processing algorithm corresponding to each channel, and the downmix processing algorithms determined for different channels may be the same or different.
  • S303 Perform downmix processing on the audio signal according to the downmix processing algorithm to generate downmix parameters and downmix channel signals.
  • When the downmix processing algorithm corresponding to each channel is determined, the audio signal can be downmixed according to the downmix processing algorithm to generate the downmix parameters and the downmix channel signal.
  • S301 determine the target control parameters for the audio signal according to the number of channels and the encoding rate, including:
  • S3011 Calculate the initial average rate of each channel according to the number of channels and encoding rate.
  • S3012 Determine the target average rate based on the initial average rate and the preset average rate threshold.
  • S3013 Determine the target control parameters for the audio signal based on the initial average rate and the target average rate.
  • The initial average rate of each channel can be determined as the encoding rate divided by the number of channels. For example, when the number of channels is 4 and the encoding rate is 96 kbps, the initial average rate of each channel is 96 kbps / 4 = 24 kbps.
  • When the initial average rate of each channel has been calculated, the target average rate may be determined based on the initial average rate and a preset average rate threshold.
  • The preset average rate thresholds can be set according to the scene-based audio signal. For example, set the first average rate threshold Thres1 to 13.2 kbps and the second average rate threshold Thres2 to 32 kbps. These two average rate thresholds divide the range of the average rate into 3 intervals, as follows:
  • Average rate interval one: less than or equal to 13.2 kbps;
  • Average rate interval two: greater than 13.2 kbps and less than 32 kbps;
  • Average rate interval three: greater than or equal to 32 kbps.
  • the target average rate is determined based on the initial average rate and the preset average rate threshold.
  • Average rate intervals are determined from the average rate thresholds, and a corresponding number of output channels is set for each interval. Thus, the target number of output channels can be determined from the average rate interval to which the initial average rate belongs.
  • The target average rate can then be calculated from the target number of output channels and the encoding rate.
  • the number of output channels corresponding to average rate interval one is 2
  • the number of output channels corresponding to average rate interval two is 3
  • the number of output channels corresponding to average rate interval three is 4.
  • For example, when the initial average rate is 24 kbps, it falls into average rate interval two, so the target number of output channels is 3 and the target average rate is 96/3 = 32 kbps.
  • The target average rate in average rate interval two is thus increased compared with the initial average rate, so that when the target control parameters of the audio signal are subsequently determined, appropriate target control parameters can be chosen, and the downmix processing algorithm can be determined based on those target control parameters.
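The interval logic above can be sketched in a few lines. This is an illustrative sketch, not code from the disclosure: the thresholds (13.2 kbps, 32 kbps) and the interval-to-channel mapping follow the example in the text, while the function name and structure are assumptions.

```python
THRES1_KBPS = 13.2  # first average rate threshold (example value from the text)
THRES2_KBPS = 32.0  # second average rate threshold (example value from the text)

def target_channels_and_rate(num_channels, encoding_rate_kbps):
    """Return (target number of output channels, target average rate in kbps)."""
    initial_avg = encoding_rate_kbps / num_channels
    if initial_avg <= THRES1_KBPS:      # average rate interval one
        target_out = 2
    elif initial_avg < THRES2_KBPS:     # average rate interval two
        target_out = 3
    else:                               # average rate interval three
        target_out = 4
    return target_out, encoding_rate_kbps / target_out

# Example from the text: 4 channels at 96 kbps -> initial average 24 kbps,
# which falls into interval two -> 3 output channels, target average 32 kbps.
```

Note that the target average rate (encoding rate divided by the reduced channel count) is never lower than the initial average rate, which is the property the text relies on.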
  • three different types of downmix processing algorithms can be selected for three average rate intervals for scene-based audio signals.
  • Downmix processing is applied for average rate interval one and average rate interval two.
  • After downmix processing, the average rate that can be used by each channel increases.
  • For some scene-based audio signals, Table 2 lists the initial average rate (the initial average rate that can be allocated to each channel), the preset average rate thresholds, the corresponding number of output channels (the number of channels after downmix processing), and the determined target average rate (the average rate that can be allocated to each channel after downmix processing).
  • The average rate that can be allocated to each channel after downmix processing is greater than or equal to the initial average rate, so the available bits can be fully utilized, the waste of bits is avoided, and remote users are provided with an audio service that matches the encoding rate.
  • Each element in Table 2 exists independently; the elements are listed in the same table only by way of example, and this does not mean that all elements in the table must exist at the same time as shown.
  • The value of each element does not depend on the value of any other element in Table 2. Therefore, those skilled in the art can understand that the value of each element in Table 2 constitutes an independent embodiment.
  • the target average rate is determined based on the initial average rate and a preset average rate threshold.
  • Alternatively, the average rate threshold closest to the initial average rate may be determined as the target average rate; or the initial average rate may be directly used as the target average rate; or, among the average rate thresholds greater than the initial average rate, the one closest to the initial average rate may be determined as the target average rate; and so on. The embodiments of the present disclosure place no specific restrictions on this.
  • the target control parameters for the audio signal are determined based on the initial average rate and the target average rate.
  • The correspondence among the initial average rate, the target average rate, and the control parameters can be preset. For example: set a correspondence between the pair of initial average rate and target average rate and the control parameters; or set a correspondence between the difference between the initial average rate and the target average rate and the control parameters; and so on.
  • One downmix processing algorithm is to design a downmix conversion matrix based on the target number of output channels and the number of channels of the acquired scene-based audio signal. For example, if the number of channels is N and the target number of output channels is M, the conversion matrix is of size M*N, where N and M are positive integers and M is less than or equal to N.
  • The downmix can be written as [M*1] = [M*N] × [N*1], where [M*1] represents a matrix of M rows by 1 column, [M*N] represents a matrix of M rows by N columns, and [N*1] represents a matrix of N rows by 1 column.
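The matrix form of downmixing can be sketched in plain Python. This is only an illustration of the shape relationship between an M*N conversion matrix, an N*1 input frame, and an M*1 output frame; the gain values in the example matrix are hypothetical, not values given by the disclosure.

```python
# Sketch of downmixing as a matrix product: an assumed [2*3] conversion
# matrix maps a [3*1] input frame to a [2*1] downmix frame (M = 2 <= N = 3).
def downmix(conversion_matrix, channel_frame):
    """Apply an [M*N] downmix conversion matrix to an [N*1] channel frame."""
    m = len(conversion_matrix)
    n = len(channel_frame)
    assert all(len(row) == n for row in conversion_matrix) and m <= n
    # Each output channel is a weighted sum of the input channels.
    return [sum(conversion_matrix[i][j] * channel_frame[j] for j in range(n))
            for i in range(m)]

A = [[1.0, 0.5, 0.0],
     [0.0, 0.5, 1.0]]        # hypothetical downmix gains ([M*N])
x = [0.2, 0.4, 0.6]          # one sample per input channel ([N*1])
y = downmix(A, x)            # [M*1] downmix channel signals
```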
  • the embodiment of the present disclosure provides an exemplary embodiment.
  • The number of channels is 4, namely W, X, Y, Z, and the selected encoding rate is 96 kbps.
  • The target number of output channels after downmixing is 3, where W represents a component containing all sounds in all directions of the sound field superimposed with the same gain and phase, X represents the front-rear component of the sound field, Y represents the left-right component of the sound field, and Z represents the up-down component of the sound field; the coordinate diagram is shown in Figure 2.
  • The Z component in the up-down direction is discarded, and only the 3 channel components W, X, and Y are retained.
  • This strategy is motivated by two observations: first, when the sound field is reconstructed, the listener at the playback end is more sensitive to components in the front-rear and left-right directions and less sensitive to components in the up-down direction; second, in the sound field of general audio scenes there are fewer sound sources in the up-down direction; the sound after downmix processing
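The FOA example above (keep W, X, Y; drop Z) can be expressed as a [3*4] conversion matrix. The selection matrix below is one plausible realization with unit gains; the disclosure does not fix the matrix values, and the sample values are invented for illustration.

```python
# One plausible [3*4] conversion matrix for the FOA example: keep W, X, Y
# with unit gain and discard the up-down component Z.
DROP_Z = [
    [1.0, 0.0, 0.0, 0.0],  # keep W (omnidirectional component)
    [0.0, 1.0, 0.0, 0.0],  # keep X (front-rear component)
    [0.0, 0.0, 1.0, 0.0],  # keep Y (left-right component)
]                          # Z (up-down component) is dropped

def apply_matrix(matrix, frame):
    # Weighted sum of the input channels for each output channel.
    return [sum(g * s for g, s in zip(row, frame)) for row in matrix]

foa_frame = [0.5, 0.2, -0.1, 0.3]            # one sample each of W, X, Y, Z
downmixed = apply_matrix(DROP_Z, foa_frame)  # -> [0.5, 0.2, -0.1]
```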
  • FIG. 8 is a structural diagram of an audio signal encoding device provided by an embodiment of the present disclosure.
  • the audio signal encoding device 1 includes: a signal acquisition unit 11 , an information determination unit 12 and an encoding processing unit 13 .
  • the signal acquisition unit 11 is configured to acquire scene-based audio signals.
  • the information determining unit 12 is configured to determine the number of channels and the encoding rate of the audio signal.
  • the encoding processing unit 13 is configured to encode the audio signal according to the number of channels and the encoding rate to generate an encoded code stream.
  • The signal acquisition unit 11 acquires a scene-based audio signal.
  • The information determination unit 12 determines the number of channels and the encoding rate of the audio signal.
  • The encoding processing unit 13 encodes the audio signal according to the number of channels and the encoding rate to generate an encoded code stream; the audio signal is thus encoded according to the number of channels and the encoding rate.
  • In the encoding process, the available bits can be fully utilized, the waste of bits is avoided, and remote users are provided with an audio service that matches the encoding rate.
  • the encoding processing unit 13 includes: a downmix processing module 131, a parameter generation module 132, and a code stream generation module 133.
  • the downmix processing module 131 is configured to perform downmix processing on the audio signal according to the number of channels and the encoding rate to generate downmix parameters and downmix channel signals.
  • the parameter generation module 132 is configured to perform encoding processing on the downmix channel signal and generate encoding parameters.
  • the code stream generation module 133 is configured to perform code stream multiplexing on downmix parameters and encoding parameters to generate an encoded code stream.
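The code-stream multiplexing step performed by module 133 might be sketched as follows. The length-prefixed byte layout, function names, and field order are all assumptions made for illustration; the disclosure does not define a byte format.

```python
import struct

# Hypothetical sketch: serialize the downmix parameters and the encoding
# parameters into one encoded code stream, length-prefixing each field so
# the decoder can split them again.
def mux_code_stream(downmix_params: bytes, encoding_params: bytes) -> bytes:
    return (struct.pack(">I", len(downmix_params)) + downmix_params +
            struct.pack(">I", len(encoding_params)) + encoding_params)

def demux_code_stream(stream: bytes):
    n = struct.unpack_from(">I", stream, 0)[0]
    downmix = stream[4:4 + n]
    m = struct.unpack_from(">I", stream, 4 + n)[0]
    encoding = stream[8 + n:8 + n + m]
    return downmix, encoding
```

A round trip through `mux_code_stream` and `demux_code_stream` recovers both fields unchanged, which is the only property the sketch is meant to show.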
  • the downmix processing module 131 includes: a parameter determination sub-module 1311, an algorithm determination sub-module 1312 and a downmix processing sub-module 1313.
  • the parameter determination sub-module 1311 is configured to determine target control parameters for the audio signal according to the number of channels and the encoding rate.
  • the algorithm determination sub-module 1312 is configured to determine the downmix processing algorithm according to the target control parameters.
  • the downmix processing sub-module 1313 is configured to perform downmix processing on the audio signal according to the downmix processing algorithm to generate downmix parameters and downmix channel signals.
  • The parameter determination sub-module 1311 is also configured to: calculate the initial average rate of each channel according to the number of channels and the encoding rate; determine the target average rate according to the initial average rate and a preset average rate threshold; and determine the target control parameters for the audio signal according to the initial average rate and the target average rate.
  • the audio signal encoding device 1 further includes: a pre-processing unit 14.
  • the preprocessing unit 14 is configured to perform pre-emphasis and/or high-pass filtering preprocessing on the audio signal.
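The pre-emphasis preprocessing mentioned above is commonly realized as a first-order filter y[n] = x[n] - a * x[n-1]. The sketch below uses this common form with a typical coefficient of 0.97; neither the filter structure nor the coefficient is specified by the disclosure.

```python
# Sketch of pre-emphasis preprocessing: y[n] = x[n] - a * x[n-1].
# The coefficient 0.97 is a conventional choice, assumed here for illustration.
def pre_emphasis(samples, a=0.97):
    out = []
    prev = 0.0  # x[-1] taken as zero
    for x in samples:
        out.append(x - a * prev)
        prev = x
    return out
```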
  • the audio signal encoding device provided by the embodiments of the present disclosure can perform the audio signal encoding method as described in some of the above embodiments. Its beneficial effects are the same as those of the audio signal encoding method described above, and will not be described again here.
  • FIG. 12 is a structural diagram of an electronic device 100 for performing an audio signal encoding method according to an exemplary embodiment.
  • the electronic device 100 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, a fitness device, a personal digital assistant, or the like.
  • The electronic device 100 may include one or more of the following components: a processing component 101, a memory 102, a power supply component 103, a multimedia component 104, an audio component 105, an input/output (I/O) interface 106, a sensor component 107, and a communication component 108.
  • the processing component 101 generally controls the overall operations of the electronic device 100, such as operations associated with display, phone calls, data communications, camera operations, and recording operations.
  • the processing component 101 may include one or more processors 1011 to execute instructions to complete all or part of the steps of the above method. Additionally, processing component 101 may include one or more modules that facilitate interaction between processing component 101 and other components. For example, processing component 101 may include a multimedia module to facilitate interaction between multimedia component 104 and processing component 101 .
  • Memory 102 is configured to store various types of data to support operations at electronic device 100 . Examples of such data include instructions for any application or method operating on the electronic device 100, contact data, phonebook data, messages, pictures, videos, etc.
  • The memory 102 can be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as SRAM (Static Random-Access Memory), EEPROM (Electrically Erasable Programmable Read-Only Memory), EPROM (Erasable Programmable Read-Only Memory), PROM (Programmable Read-Only Memory), ROM (Read-Only Memory), magnetic memory, flash memory, magnetic disk, or optical disk.
  • Power supply component 103 provides power to various components of electronic device 100 .
  • Power supply components 103 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power to electronic device 100 .
  • Multimedia component 104 includes a touch-sensitive display screen that provides an output interface between the electronic device 100 and the user.
  • the touch display screen may include LCD (Liquid Crystal Display) and TP (Touch Panel).
  • the touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide action.
  • multimedia component 104 includes a front-facing camera and/or a rear-facing camera. When the electronic device 100 is in an operating mode, such as a shooting mode or a video mode, the front camera and/or the rear camera may receive external multimedia data.
  • Each front-facing camera and rear-facing camera may be a fixed optical lens system or have focal length and optical zoom capability.
  • Audio component 105 is configured to output and/or input audio signals.
  • the audio component 105 includes a MIC (Microphone), and when the electronic device 100 is in an operating mode, such as a call mode, a recording mode, and a voice recognition mode, the microphone is configured to receive an external audio signal.
  • the received audio signals may be further stored in memory 102 or sent via communications component 108 .
  • audio component 105 also includes a speaker for outputting audio signals.
  • The I/O interface 106 provides an interface between the processing component 101 and a peripheral interface module.
  • the peripheral interface module may be a keyboard, a click wheel, a button, etc. These buttons may include, but are not limited to: Home button, Volume buttons, Start button, and Lock button.
  • Sensor component 107 includes one or more sensors for providing various aspects of status assessment for electronic device 100 .
  • The sensor component 107 can detect the open/closed state of the electronic device 100 and the relative positioning of components, such as the display and keypad of the electronic device 100.
  • The sensor component 107 can also detect a change in position of the electronic device 100 or a component of the electronic device 100, the presence or absence of user contact with the electronic device 100, the orientation or acceleration/deceleration of the electronic device 100, and a change in temperature of the electronic device 100.
  • Sensor assembly 107 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact.
  • the sensor component 107 may also include a light sensor, such as a CMOS (Complementary Metal Oxide Semiconductor) or a CCD (Charge-coupled Device) image sensor for use in imaging applications.
  • the sensor component 107 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
  • Communication component 108 is configured to facilitate wired or wireless communication between electronic device 100 and other devices.
  • the electronic device 100 may access a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof.
  • the communication component 108 receives broadcast signals or broadcast related information from an external broadcast management system via a broadcast channel.
  • the communication component 108 also includes an NFC (Near Field Communication) module to facilitate short-range communication.
  • The NFC module can be implemented based on RFID (Radio Frequency Identification) technology, IrDA (Infrared Data Association) technology, UWB (Ultra Wide Band) technology, BT (Bluetooth) technology, and other technologies.
  • The electronic device 100 may be implemented by one or more ASICs (Application Specific Integrated Circuits), DSPs (Digital Signal Processors), DSPDs (Digital Signal Processing Devices), PLDs (Programmable Logic Devices), FPGAs (Field Programmable Gate Arrays), controllers, microcontrollers, microprocessors, or other electronic components for executing the above audio signal encoding method.
  • the electronic device 100 provided by the embodiments of the present disclosure can perform the audio signal encoding method as described in some of the above embodiments, and its beneficial effects are the same as those of the audio signal encoding method described above, which will not be described again here.
  • the present disclosure also proposes a storage medium.
  • The storage medium can be a ROM (Read-Only Memory), RAM (Random Access Memory), CD-ROM (Compact Disc Read-Only Memory), magnetic tape, floppy disk, optical data storage device, etc.
  • the present disclosure also provides a computer program product.
  • When the computer program is executed by a processor of an electronic device, the electronic device can perform the audio signal encoding method as described above.

Abstract

Disclosed in the embodiments of the present disclosure are an audio signal encoding method and apparatus, an electronic device, and a storage medium. The method comprises: acquiring a scene-based audio signal; determining the number of channels and an encoding rate of the audio signal; and encoding the audio signal according to the number of channels and the encoding rate to generate an encoded code stream. By encoding the audio signal according to the number of channels and the encoding rate, the available bits can be fully utilized during encoding, the waste of bits is avoided, and a remote user is provided with an audio service that matches the encoding rate.

Description

Audio signal encoding method, device, electronic equipment and storage medium
Technical field
The present disclosure relates to the field of communication technology, and in particular to an audio signal encoding method, device, electronic equipment and storage medium.
Background
In the related art, after an audio signal is acquired, unified encoding processing is applied to it. This unified encoding does not take into account that the number of bits available to each channel differs at different encoding rates, which can cause the number of bits used by each channel to exceed or fall below the number of bits required for encoding, resulting in wasted bits or in the inability to provide remote users with an audio service that matches the encoding rate. This is a problem that urgently needs to be solved.
Summary
Embodiments of the present disclosure provide an audio signal encoding method, device, electronic equipment and storage medium. The audio signal is encoded according to the number of channels and the encoding rate; during encoding, the available bits can be fully utilized, the waste of bits is avoided, and remote users are provided with an audio service that matches the encoding rate.
In a first aspect, an embodiment of the present disclosure provides an audio signal encoding method. The method includes: acquiring a scene-based audio signal; determining the number of channels and the encoding rate of the audio signal; and encoding the audio signal according to the number of channels and the encoding rate to generate an encoded code stream.
In this technical solution, a scene-based audio signal is acquired; the number of channels and the encoding rate of the audio signal are determined; and the audio signal is encoded according to the number of channels and the encoding rate to generate an encoded code stream. The audio signal is thus encoded according to the number of channels and the encoding rate; during encoding, the available bits can be fully utilized, the waste of bits is avoided, and remote users are provided with an audio service that matches the encoding rate.
In some embodiments, encoding the audio signal according to the number of channels and the encoding rate to generate an encoded code stream includes: performing downmix processing on the audio signal according to the number of channels and the encoding rate to generate downmix parameters and a downmix channel signal; encoding the downmix channel signal to generate encoding parameters; and multiplexing the downmix parameters and the encoding parameters into a code stream to generate the encoded code stream.
In some embodiments, performing downmix processing on the audio signal according to the number of channels and the encoding rate to generate downmix parameters and a downmix channel signal includes: determining target control parameters for the audio signal according to the number of channels and the encoding rate; determining a downmix processing algorithm according to the target control parameters; and performing downmix processing on the audio signal according to the downmix processing algorithm to generate the downmix parameters and the downmix channel signal.
In some embodiments, determining the target control parameters for the audio signal according to the number of channels and the encoding rate includes: calculating the initial average rate of each channel according to the number of channels and the encoding rate; determining a target average rate according to the initial average rate and a preset average rate threshold; and determining the target control parameters for the audio signal according to the initial average rate and the target average rate.
In some embodiments, before encoding the audio signal, the method further includes: pre-processing the audio signal with pre-emphasis and/or high-pass filtering.
In a second aspect, an embodiment of the present disclosure provides an audio signal encoding device. The audio signal encoding device includes: a signal acquisition unit configured to acquire a scene-based audio signal; an information determination unit configured to determine the number of channels and the encoding rate of the audio signal; and an encoding processing unit configured to encode the audio signal according to the number of channels and the encoding rate to generate an encoded code stream.
In some embodiments, the encoding processing unit includes: a downmix processing module configured to perform downmix processing on the audio signal according to the number of channels and the encoding rate to generate downmix parameters and a downmix channel signal; a parameter generation module configured to encode the downmix channel signal and generate encoding parameters; and a code stream generation module configured to multiplex the downmix parameters and the encoding parameters into a code stream to generate the encoded code stream.
In some embodiments, the downmix processing module includes: a parameter determination sub-module configured to determine target control parameters for the audio signal according to the number of channels and the encoding rate; an algorithm determination sub-module configured to determine a downmix processing algorithm according to the target control parameters; and a downmix processing sub-module configured to perform downmix processing on the audio signal according to the downmix processing algorithm to generate the downmix parameters and the downmix channel signal.
In some embodiments, the parameter determination sub-module is further configured to: calculate the initial average rate of each channel according to the number of channels and the encoding rate; determine a target average rate according to the initial average rate and a preset average rate threshold; and determine the target control parameters for the audio signal according to the initial average rate and the target average rate.
In some embodiments, the device further includes: a preprocessing unit configured to pre-process the audio signal with pre-emphasis and/or high-pass filtering.
In a third aspect, embodiments of the present disclosure provide an electronic device, which includes: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can execute the method described in the first aspect.
In a fourth aspect, embodiments of the present disclosure provide a non-transitory computer-readable storage medium storing computer instructions, wherein the computer instructions are used to cause a computer to execute the method described in the first aspect.
In a fifth aspect, embodiments of the present disclosure provide a computer program product including computer instructions, wherein the computer instructions, when executed by a processor, implement the method described in the first aspect.
It should be understood that the foregoing general description and the following detailed description are exemplary and explanatory only, and do not limit the present disclosure.
Description of drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present disclosure or in the background art, the drawings required for the embodiments or the background art are described below.
Figure 1 is a flow chart of an audio signal encoding method provided by an embodiment of the present disclosure;
Figure 2 is a schematic coordinate diagram of an audio signal in FOA format provided by an embodiment of the present disclosure;
Figure 3 is a flow chart of another audio signal encoding method provided by an embodiment of the present disclosure;
Figure 4 is a flow chart of an audio signal encoding method in the related art;
Figure 5 is a flow chart of yet another audio signal encoding method provided by an embodiment of the present disclosure;
Figure 6 is a flow chart of the sub-steps of S30 in the audio signal encoding method provided by an embodiment of the present disclosure;
Figure 7 is a flow chart of the sub-steps of S301 in the audio signal encoding method provided by an embodiment of the present disclosure;
Figure 8 is a structural diagram of an audio signal encoding device provided by an embodiment of the present disclosure;
Figure 9 is a structural diagram of the encoding processing unit in the audio signal encoding device provided by an embodiment of the present disclosure;
Figure 10 is a structural diagram of the downmix processing module in the audio signal encoding device provided by an embodiment of the present disclosure;
Figure 11 is a structural diagram of another audio signal encoding device provided by an embodiment of the present disclosure;
Figure 12 is a structural diagram of an electronic device according to an embodiment of the present disclosure.
具体实施方式Detailed ways
为了使本领域普通人员更好地理解本公开的技术方案,下面将结合附图,对本公开实施例中的技术方案进行清楚、完整地描述。In order to allow ordinary people in the art to better understand the technical solutions of the present disclosure, the technical solutions in the embodiments of the present disclosure will be clearly and completely described below in conjunction with the accompanying drawings.
除非上下文另有要求,否则,在整个说明书和权利要求书中,术语“包括”被解释为开放、包含的意思,即为“包含,但不限于”。在说明书的描述中,术语“一些实施例”等旨在表明与该实施例或示例相关的特定特征、结构、材料或特性包括在本公开的至少一个实施例或示例中。上述术语的示意性表示不一定是指同一实施例或示例。此外,所述的特定特征、结构、材料或特点可以以任何适当方式包括在任何一个或多个实施例或示例中。Unless the context requires otherwise, throughout the specification and claims, the term "including" is to be interpreted in an open, inclusive sense, that is, "including, but not limited to." In the description of this specification, the terms "some embodiments" and the like are intended to indicate that a particular feature, structure, material, or characteristic associated with the embodiment or example is included in at least one embodiment or example of the present disclosure. The schematic representations of the above terms do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials or characteristics described may be included in any suitable manner in any one or more embodiments or examples.
需要说明的是,本公开的说明书和权利要求书及附图中的术语“第一”、“第二”等是用于区别类似的对象,而不必用于描述特定的顺序或先后次序。术语“第一”、“第二”仅用于描述目的,而不能理解为指示或暗示相对重要性或者隐含指明所指示的技术特征的数量。由此,限定有“第一”、“第二”的特征可以明示或者隐含地包括一个或者更多个该特征。应该理解这样使用的数据在适当情况下可以互换,以便这里描述的本公开的实施例能够以除了在这里图示或描述的那些以外的顺序实施。以下示例性实施例中所描述的实施方式并不代表与本公开相一致的所有实施方式。相反,它们仅是与如所附权利要求书中所详述的、本公开的一些方面相一致的装置和方法的例子。It should be noted that the terms "first", "second", etc. in the description, claims and drawings of the present disclosure are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. The terms “first” and “second” are used for descriptive purposes only and shall not be understood as indicating or implying relative importance or implicitly indicating the quantity of indicated technical features. Therefore, features defined as "first" and "second" may explicitly or implicitly include one or more of these features. It is to be understood that the data so used are interchangeable under appropriate circumstances so that the embodiments of the disclosure described herein can be practiced in sequences other than those illustrated or described herein. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with aspects of the disclosure as detailed in the appended claims.
"At least one" in the present disclosure may also be described as one or more, and "a plurality" may be two, three, four, or more, which the present disclosure does not limit. In the embodiments of the present disclosure, instances of a technical feature are distinguished by "first", "second", "third", "A", "B", "C", "D", and the like; the technical features so labeled carry no order of precedence or magnitude.
The correspondences shown in the tables of this disclosure may be configured or may be predefined. The values of the information in the tables are merely examples and may be configured as other values, which the present disclosure does not limit. When configuring the correspondence between information and parameters, it is not necessarily required that all correspondences shown in the tables be configured. For example, the correspondences shown in some rows of a table in this disclosure may be left unconfigured. As another example, the tables above may be suitably adapted, for example by splitting or merging. The parameter names shown in the table headings may be replaced by other names understandable to the communication apparatus, and the values or representations of the parameters may likewise be other values or representations understandable to the communication apparatus. When implemented, the tables above may also adopt other data structures, such as arrays, queues, containers, stacks, linear lists, pointers, linked lists, trees, graphs, structures, classes, heaps, or hash tables.
Those of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, or in a combination of computer software and electronic hardware. Whether these functions are performed in hardware or in software depends on the specific application and design constraints of the technical solution. Skilled artisans may implement the described functionality in different ways for each particular application, but such implementations should not be considered beyond the scope of this disclosure.
The first generation of mobile communication technology (1G) was the first generation of wireless cellular technology and used an analog mobile communication network. The upgrade from 1G to 2G moved handsets from analog to digital communication, adopting the GSM (Global System for Mobile Communications) network standard; its speech coders included AMR (Adaptive Multi-Rate narrowband speech codec), EFR (Enhanced Full Rate), FR (Full Rate), and HR (Half Rate), providing single-channel narrowband voice services. The 3G mobile communication system was proposed by the ITU (International Telecommunication Union) for International Mobile Telecommunications-2000 and may use TD-SCDMA, CDMA2000, or WCDMA; its speech coder uses AMR-WB to provide single-channel wideband voice services. 4G is a further improvement on 3G technology: both data and voice are carried over all-IP, providing real-time HD (High Definition) Voice services, and the EVS (Enhanced Voice Services) codec it adopts delivers high-quality compression of both speech and audio.
The voice and audio communication services described above have expanded from narrowband signals to super-wideband and even full-band services, but they are still monophonic. Demand for high-quality audio keeps growing: compared with mono audio, stereo audio conveys a sense of direction and distribution for each sound source and can improve intelligibility.
With increases in transmission bandwidth, upgrades to the signal acquisition hardware of terminal devices, improvements in signal processor performance, and upgrades to terminal playback devices, three signal formats, namely channel-based multi-channel audio signals, object-based audio signals, and scene-based audio signals, can provide three-dimensional audio services. The IVAS (Immersive Voice and Audio Services) codec being standardized by 3GPP (3rd Generation Partnership Project) SA4 can support the coding and decoding requirements of all three of these signal formats. Terminal devices that can support 3D audio services include mobile phones, computers, tablets, conference system equipment, AR (augmented reality)/VR (virtual reality) devices, cars, and the like.
The FOA (First-Order Ambisonics)/HOA (Higher-Order Ambisonics) signal is a principal scene-based audio signal format. It represents the audio information captured at a certain position in an audio scene, and it is an immersive audio format whose quality increases gradually with the order. Different Ambisonics orders correspond to different numbers of audio signal components; that is, for an N-th order Ambisonics signal, the number of Ambisonics coefficients is (N+1)*(N+1):
The relationship between the Ambisonics signal order and the number of Ambisonics coefficients is shown in Table 1 below:
Ambisonics order    Ambisonics coefficients / number of channels
0                   1
1                   4
2                   9
3                   16
4                   25
5                   36
6                   49

Table 1
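The (N+1)*(N+1) relationship underlying Table 1 can be checked with a short sketch (the function name is illustrative, not from the disclosure):

```python
# Channel count for an N-th order Ambisonics signal, per the
# (N+1)*(N+1) coefficient formula given above.
def ambisonics_channels(order: int) -> int:
    return (order + 1) ** 2

# Reproduce Table 1: orders 0..6 map to 1, 4, 9, 16, 25, 36, 49 channels.
table_1 = {n: ambisonics_channels(n) for n in range(7)}
```

For example, a 4th-order HOA signal carries 25 channels, consistent with the row for order 4 in Table 1.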
As Table 1 shows, the number of Ambisonics channels grows rapidly with the order, and the amount of data to encode grows rapidly as well; correspondingly, the encoding complexity also increases sharply, while the encoding performance drops significantly under the constraint of the encoding rate. To reduce encoding complexity, the initial input channels need to be downmixed: after downmixing, the number of channels is smaller and the encoding complexity decreases, reaching a balance between encoding complexity and encoding performance.
In view of the problems in the related art that bits are wasted or that remote users cannot be provided with audio services matching the encoding rate, embodiments of the present disclosure provide an audio signal encoding method and apparatus so as to at least solve these problems, making full use of the available bits, providing remote users with audio services matching the encoding rate, and improving user experience.
Please refer to FIG. 1, which is a flowchart of an audio signal encoding method provided by an embodiment of the present disclosure.
As shown in FIG. 1, the method may include, but is not limited to, the following steps:
S1: obtain a scene-based audio signal.
It can be understood that when the local user establishes voice communication with any remote user, the local user's terminal device can establish voice communication with the remote user's terminal device; the local user's terminal device can acquire in real time the sound information of the environment in which the local user is located, thereby obtaining a scene-based audio signal.
The sound information of the local user's environment includes the sound produced by the local user as well as the sounds of surrounding things, for example the sound of passing vehicles, birdsong, wind, or the voices of other users near the local user.
It should be noted that a terminal device is a user-side entity for receiving or transmitting signals, such as a mobile phone, computer, tablet, watch, walkie-talkie, conference system device, augmented reality (AR)/virtual reality (VR) device, or car. A terminal device may also be called user equipment (UE), a mobile station (MS), a mobile terminal (MT), and so on. The terminal device may be a car with communication functions, a smart car, a mobile phone, a wearable device, a tablet computer (Pad), a computer with wireless transceiver functions, a virtual reality (VR) terminal device, an augmented reality (AR) terminal device, a wireless terminal device in industrial control, a wireless terminal device in self-driving, a wireless terminal device in remote medical surgery, a wireless terminal device in a smart grid, a wireless terminal device in transportation safety, a wireless terminal device in a smart city, a wireless terminal device in a smart home, and so on. The embodiments of the present disclosure do not limit the specific technology or device form adopted by the terminal device.
In the embodiments of the present disclosure, the local user's terminal device may obtain the scene-based audio signal through a recording apparatus, such as a microphone, built into or cooperating with the terminal device: the recording apparatus captures the sound information of the local user's environment, from which the scene-based audio signal is generated and obtained.
In the embodiments of the present disclosure, the scene-based audio signal may be an audio signal in FOA format or an audio signal in HOA format.
S2: determine the number of channels and the encoding rate of the audio signal.
In the embodiments of the present disclosure, after the scene-based audio signal is obtained, the number of channels and the encoding rate of the scene-based audio signal may be determined.
For example, as shown in FIG. 2, when the obtained scene-based audio signal is in FOA format, the number of channels of the audio signal is determined to be 4, namely W, X, Y, and Z, where W is the component containing all sounds from every direction in the sound field superimposed with equal gain and phase, X is the front-back component of the sound field, Y is the left-right component, and Z is the up-down component. In addition, the selected encoding rate may be determined to be 96 kbps.
S3: encode the audio signal according to the number of channels and the encoding rate to generate an encoded bitstream.
In the embodiments of the present disclosure, a scene-based audio signal is obtained, the number of channels and the encoding rate of the audio signal are determined, and the audio signal is encoded according to the number of channels and the encoding rate to generate an encoded bitstream.
When encoding the audio signal according to the number of channels and the encoding rate, the per-channel rate situation can be determined from the number of channels and the encoding rate; for example, the average encoding rate of each channel, or the maximum encoding rate of each channel, or the encoding rate of each channel may be determined. The average encoding rate of each channel can be obtained by dividing the encoding rate by the number of channels; the maximum encoding rate of each channel is equal to the encoding rate; and the encoding rate of each channel is the encoding rate.
On the basis of the per-channel encoding rate, the number of bits available to each channel at different encoding rates can be taken into account, so that during the encoding process the available bits are fully utilized, waste of bits is avoided, and remote users are provided with audio services matching the encoding rate. The generated encoded bitstream can guarantee a clear, stable, and intelligible audio service when the encoding rate is low, and a high-definition, stable, and immersive audio service when the encoding rate is high, providing remote users with audio services matching the encoding rate and improving user experience.
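The three per-channel rate figures described above can be sketched as follows (rates in kbps; the function and variable names are illustrative, not from the disclosure):

```python
# Per-channel rate figures derived from the total encoding rate and
# the number of channels, as described above.
def per_channel_rates(total_rate_kbps: float, num_channels: int):
    average = total_rate_kbps / num_channels  # average encoding rate per channel
    maximum = total_rate_kbps                 # maximum rate any one channel could use
    return average, maximum

# FOA example from the disclosure: 96 kbps shared over 4 channels.
avg, mx = per_channel_rates(96, 4)
```

With the 4-channel FOA signal and 96 kbps rate of the example above, the average per-channel rate is 24 kbps.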
In some embodiments, before the audio signal is encoded, the method further includes: preprocessing the audio signal with pre-emphasis and/or high-pass filtering.
In the embodiments of the present disclosure, once the scene-based audio signal is obtained and its number of channels and encoding rate are determined, pre-emphasis preprocessing may be applied to the audio signal; pre-emphasis boosts the high-frequency part of the audio information and increases its high-frequency resolution.
In the embodiments of the present disclosure, once the scene-based audio signal is obtained and its number of channels and encoding rate are determined, high-pass filtering may also be applied as preprocessing, to filter out signal components below a certain frequency threshold. The cutoff frequency of the high-pass filter can be set as needed; for example, it can be set to 20 Hz.
After the high-pass filtering preprocessing, the audio signal components of the frequency band to be encoded are obtained, so that ultra-low-frequency signals do not degrade the effect of the encoding process.
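The two preprocessing steps can be sketched as below. The coefficients here (a pre-emphasis factor of 0.97, a first-order RC high-pass at 20 Hz, 48 kHz sampling) are typical illustrative values, not values specified by the disclosure:

```python
import math

def pre_emphasis(x, a=0.97):
    """y[n] = x[n] - a*x[n-1]: boosts the high-frequency part of the signal."""
    return [x[0]] + [x[n] - a * x[n - 1] for n in range(1, len(x))]

def high_pass(x, cutoff_hz=20.0, fs=48000.0):
    """One-pole RC high-pass: attenuates components below cutoff_hz."""
    rc = 1.0 / (2 * math.pi * cutoff_hz)
    alpha = rc / (rc + 1.0 / fs)
    y = [x[0]]
    for n in range(1, len(x)):
        y.append(alpha * (y[-1] + x[n] - x[n - 1]))
    return y
```

A DC (0 Hz) input decays toward zero at the high-pass output, which is the intended behavior: content below the cutoff is removed before encoding.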
By implementing the embodiments of the present disclosure, a scene-based audio signal is obtained, the number of channels and the encoding rate of the audio signal are determined, and the audio signal is encoded according to the number of channels and the encoding rate to generate an encoded bitstream. Encoding the audio signal according to the number of channels and the encoding rate allows the available bits to be fully utilized during the encoding process, avoids wasting bits, and provides remote users with audio services matching the encoding rate.
Please refer to FIG. 3, which is a flowchart of an audio signal encoding method provided by an embodiment of the present disclosure.
As shown in FIG. 3, the method may include, but is not limited to, the following steps:
S10: obtain a scene-based audio signal.
S20: determine the number of channels and the encoding rate of the audio signal.
In the embodiments of the present disclosure, for the descriptions of S10 and S20, reference may be made to the corresponding descriptions in the foregoing embodiments; identical content is not repeated here.
S30: downmix the audio signal according to the number of channels and the encoding rate to generate downmix parameters and a downmix channel signal.
S40: encode the downmix channel signal to generate encoding parameters.
S50: multiplex the downmix parameters and the encoding parameters into a bitstream to generate an encoded bitstream.
In the embodiments of the present disclosure, a scene-based audio signal is obtained, its number of channels and encoding rate are determined, and the audio signal is encoded according to the number of channels and the encoding rate to generate an encoded bitstream. Specifically, encoding the audio signal according to the number of channels and the encoding rate may comprise: downmixing the audio signal according to the number of channels and the encoding rate to generate downmix parameters and a downmix channel signal; then encoding the downmix channel signal to generate encoding parameters; and then multiplexing the downmix parameters and the encoding parameters into the encoded bitstream.
As shown in FIG. 4, in the related art, after the audio signal (in FOA or HOA format) is obtained, a uniform downmix is applied to it. The number of channels after downmixing is smaller than the original number, all of the remaining channels are encoded with the core encoder, and the downmix parameters produced by the downmix and the core encoder output parameters are multiplexed to output the encoded bitstream.
This uniform downmix does not take into account that the number of bits available to each channel differs at different encoding rates, so the number of channels after downmixing does not match the number of channels the core encoder can encode. As a result: when the number of channels after downmixing is far smaller than the number of input channels, a better audio service cannot be provided to remote users at a high encoding rate (because the number of bits available to each channel exceeds the number necessary for encoding, wasting bits); and when the number of channels after downmixing differs little from the number of input channels, an audio service matching the rate cannot be provided to remote users at a low encoding rate (because the number of bits available to each channel is far below the number necessary for encoding, so each channel is encoded with poor quality).
However, as shown in FIG. 5, in the embodiments of the present disclosure, the scene-based audio signal (in FOA or HOA format) is input to the encoder side, which can determine the number of channels and the encoding rate of the audio signal. The encoding rate, the number of channels, and the audio signal are input to the mode analysis module; alternatively, the audio signal may first be preprocessed with high-pass filtering, and the preprocessed audio signal then input to the mode analysis module.
The mode analysis module can output control parameters according to the selected encoding rate and the number of channels, and these control parameters guide the downmix processing module in selecting the corresponding downmix processing algorithm. After processing the audio signal, the downmix processing module outputs downmix parameters and a downmix channel signal; the downmix channel signal is encoded by the core encoder, which outputs encoding parameters; and the encoding parameters and the downmix parameters are input to the bitstream multiplexer, which outputs the encoded bitstream.
In the embodiments of the present disclosure, when the input scene-based audio signal is an FOA-format or HOA-format audio signal, a matching downmix processing algorithm is selected adaptively according to the number of channels of the input audio signal and the number of bits available, so that the number of channels after downmixing matches the number of channels the core encoder can encode at the given encoding rate. This achieves full (optimal) utilization of the available bits: at low rates a clear, stable, and intelligible audio service is guaranteed, and at high rates a high-definition, stable, and immersive audio service is guaranteed, improving the user experience.
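The encoder-side module flow of FIG. 5 can be traced end to end with the following minimal sketch. Every function body here is a trivial placeholder so the data flow can be executed; none of them is the actual mode analysis, downmix, or core coding algorithm of the disclosure:

```python
# Placeholder encoder-side pipeline: mode analysis -> downmix -> core
# encoder -> bitstream multiplexer, mirroring the module flow of FIG. 5.

def mode_analysis(num_channels, rate_kbps):
    # One plausible control parameter: average rate per input channel.
    return {"avg_rate_kbps": rate_kbps / num_channels}

def downmix(channels, control):
    # Placeholder downmix: keep only the first channel, record how many kept.
    kept = channels[:1]
    return {"kept_channels": len(kept)}, kept

def core_encode(downmix_signal):
    # Placeholder core encoder output.
    return {"frames": len(downmix_signal)}

def multiplex(downmix_params, coding_params):
    # Bitstream multiplexer: combine both parameter sets into one payload.
    return {**downmix_params, **coding_params}

control = mode_analysis(4, 96)                      # FOA example: 4 channels, 96 kbps
dm_params, dm_signal = downmix([[0.0]] * 4, control)
bitstream = multiplex(dm_params, core_encode(dm_signal))
```

The point of the sketch is the data flow: the downmix parameters and the core encoder's output parameters both reach the multiplexer, matching the description of the encoded bitstream above.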
In the embodiments of the present disclosure, after the encoder side outputs the encoded bitstream, the bitstream can be sent to the decoder side for decoding, so that the remote terminal can obtain the sound information transmitted by the local terminal.
As shown in FIG. 6, in some embodiments, S30, downmixing the audio signal according to the number of channels and the encoding rate to generate downmix parameters and a downmix channel signal, includes:
S301: determine target control parameters for the audio signal according to the number of channels and the encoding rate.
In the embodiments of the present disclosure, when downmixing the audio signal according to the number of channels and the encoding rate, the target control parameters for the audio signal may be determined from the number of channels and the encoding rate.
When determining the target control parameters for the audio signal from the number of channels and the encoding rate, the per-channel rate situation can be determined from the number of channels and the encoding rate; for example, the average encoding rate of each channel, or the maximum encoding rate of each channel, or the encoding rate of each channel may be determined. The average encoding rate of each channel can be obtained by dividing the encoding rate by the number of channels; the maximum encoding rate of each channel is equal to the encoding rate; and the encoding rate of each channel is the encoding rate.
In the embodiments of the present disclosure, on the basis of the per-channel encoding rate determined from the number of channels and the encoding rate, the target control parameters for the audio signal can be determined according to the per-channel encoding rate.
Of course, when determining the target control parameters for the audio signal from the number of channels and the encoding rate, a correspondence between the number of channels plus the encoding rate on the one hand and the control parameters on the other may also be set in advance, so that once the number of channels and the encoding rate of the audio signal are determined, the target control parameters for the audio signal can be determined.
Alternatively, a target number of channels may be determined from the number of channels and the encoding rate, and the target control parameters for the audio signal then determined from the target number of channels, and so on.
The target number of channels may be determined from the number of channels and the encoding rate as follows: N thresholds of the average encoding rate are preset, where N is a positive integer; the N thresholds define N+1 threshold intervals, and different threshold intervals are set to correspond to different numbers of channels after downmixing. On this basis, the initial average encoding rate is computed from the number of channels and the encoding rate, the target number of channels is determined from the threshold interval to which the initial average rate belongs, and the target control parameters for the audio signal are then determined from the target number of channels.
It can be understood that, once the encoding rate and the number of channels after downmixing are known, the average rate that can be allocated to each channel after downmixing can also be obtained, and the target control parameters for the audio signal can be determined from the target number of channels and/or the average rate allocable to each channel after downmixing.
When determining the target control parameters for the audio signal from the target number of channels and/or the average rate allocable to each channel after downmixing, a correspondence between these quantities and the control parameters may be set in advance; the target control parameters for the audio signal can then be determined from the target number of channels and/or the average per-channel rate after downmixing, together with the preset correspondence.
S302: determine a downmix processing algorithm according to the target control parameters.
In the embodiments of the present disclosure, once the target control parameters for the audio signal have been determined from the number of channels and the encoding rate, the downmix processing algorithm can be determined from the target control parameters. Determining the downmix processing algorithm may mean determining the downmix processing algorithm corresponding to each channel, and the algorithms determined for different channels may be the same or different.
S303: downmix the audio signal according to the downmix processing algorithm to generate the downmix parameters and the downmix channel signal.
In the embodiments of the present disclosure, once the downmix processing algorithm corresponding to each channel is determined, the audio signal can be downmixed according to that algorithm to generate the downmix parameters and the downmix channel signal.
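As one concrete illustration of a downmix step, a 4-channel FOA frame (W, X, Y, Z) can be reduced to two channels by a fixed matrix. The matrix below is only a simple example; it is not the algorithm selected by the control parameters in the disclosure:

```python
# Illustrative matrix downmix of one FOA frame [W, X, Y, Z] to 2 channels.
DOWNMIX_MATRIX = [
    [0.5, 0.0,  0.5, 0.0],   # left  = 0.5*W + 0.5*Y
    [0.5, 0.0, -0.5, 0.0],   # right = 0.5*W - 0.5*Y
]

def matrix_downmix(frame_wxyz, matrix=DOWNMIX_MATRIX):
    return [sum(g * s for g, s in zip(row, frame_wxyz)) for row in matrix]

left, right = matrix_downmix([1.0, 0.2, 0.4, 0.1])
```

Here the downmix matrix itself would be part of the downmix parameters carried in the bitstream, while the two output channels form the downmix channel signal passed to the core encoder.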
As shown in FIG. 7, in some embodiments, S301, determining the target control parameters for the audio signal according to the number of channels and the encoding rate, includes:
S3011: calculate the initial average rate of each channel according to the number of channels and the encoding rate.
S3012: determine a target average rate according to the initial average rate and a preset average rate threshold.
S3013: determine the target control parameters for the audio signal according to the initial average rate and the target average rate.
其中，根据声道数和编码速率，计算每个声道的初始平均速率，可以根据编码速率除以声道数进行确定。示例性地，在声道数为4个，编码速率为96kbps的情况下，根据声道数和编码速率，计算每个声道的初始平均速率为24kbps。Here, the initial average rate of each channel is computed from the number of channels and the encoding rate, and may be determined by dividing the encoding rate by the number of channels. For example, when the number of channels is 4 and the encoding rate is 96 kbps, the initial average rate of each channel is calculated as 96 kbps / 4 = 24 kbps.
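As an illustrative sketch (not part of the disclosed embodiment), the computation of S3011 can be expressed as follows; the function name is hypothetical:

```python
def initial_average_rate(encoding_rate_kbps: float, num_channels: int) -> float:
    # S3011: initial per-channel average rate = encoding rate / number of channels
    return encoding_rate_kbps / num_channels

# Example from the text: 4 channels at 96 kbps
print(initial_average_rate(96.0, 4))  # 24.0 kbps per channel
```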
本公开实施例中,在计算得到每个声道的初始平均速率的情况下,可以根据初始平均速率和预先设置的平均速率阈值确定目标平均速率。In the embodiment of the present disclosure, when the initial average rate of each channel is calculated, the target average rate may be determined based on the initial average rate and a preset average rate threshold.
其中，预先设置的平均速率阈值，可以根据基于场景的音频信号进行设置，例如，设置第一个平均速率阈值Thres1为13.2kbps，第二个平均速率阈值Thres2为32kbps，根据上述两个平均速率阈值将平均速率对应的区间划分为3个平均速率区间，依次如下：Here, the preset average rate thresholds may be set for the scene-based audio signal. For example, the first average rate threshold Thres1 is set to 13.2 kbps and the second average rate threshold Thres2 is set to 32 kbps, and according to these two average rate thresholds, the range of average rates is divided into 3 average rate intervals, as follows:
平均速率区间一:小于等于13.2kbps;Average rate interval one: less than or equal to 13.2kbps;
平均速率区间二:大于13.2kbps小于32kbps;Average rate interval two: greater than 13.2kbps and less than 32kbps;
平均速率区间三:大于等于32kbps。Average rate interval three: greater than or equal to 32kbps.
本公开实施例中，根据初始平均速率和预先设置的平均速率阈值确定目标平均速率，在根据平均速率阈值确定平均速率阈值区间的情况下，针对不同平均速率阈值区间分别设置对应的输出声道数，从而，可以根据初始平均速率所属的平均速率阈值区间，确定对应的目标输出声道数。In the embodiment of the present disclosure, the target average rate is determined based on the initial average rate and the preset average rate thresholds. With the average rate intervals determined by the thresholds, a corresponding number of output channels is set for each interval; thus, the corresponding target number of output channels can be determined from the average rate interval to which the initial average rate belongs.
基于此，在确定目标输出声道数的情况下，可以根据目标输出声道数和编码速率，计算得到目标平均速率。Based on this, once the target number of output channels has been determined, the target average rate can be calculated from the target number of output channels and the encoding rate.
示例性地，平均速率区间一对应的输出声道数为2，平均速率区间二对应的输出声道数为3，平均速率区间三对应的输出声道数为4，在初始平均速率为24kbps的情况下，确定属于平均速率区间二，可以确定目标输出声道数为3，则可以计算得到目标平均速率为96kbps/3=32kbps。可见，平均速率区间二目标平均速率相比于初始平均速率有所上升，以在后续确定对音频信号的目标控制参数时，能够确定合适的目标控制参数，并根据目标控制参数确定下混处理算法，从而使得下混处理后的输出声道数与此编码速率下核心编码器所能编码的声道数相匹配，达成对所能使用比特的最优利用，即低速率时能够保证提供清晰稳定可懂的音频服务，高速率时能够保证提供高清稳定沉浸式的音频服务。For example, suppose interval one corresponds to 2 output channels, interval two to 3, and interval three to 4. With an initial average rate of 24 kbps, the rate falls in interval two, so the target number of output channels is determined to be 3, and the target average rate is calculated as 96 kbps / 3 = 32 kbps. In interval two the target average rate is thus higher than the initial average rate, so that when the target control parameters for the audio signal are subsequently determined, suitable parameters can be chosen and the downmix processing algorithm determined from them. The number of output channels after downmix processing then matches the number of channels that the core encoder can encode at this encoding rate, achieving optimal utilization of the available bits: at low rates a clear, stable, and intelligible audio service can be guaranteed, while at high rates a high-definition, stable, and immersive audio service can be guaranteed.
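A minimal sketch of this interval-to-channel-count mapping, using the example thresholds (13.2 kbps, 32 kbps) and the example output channel counts (2/3/4) from the text; all names are hypothetical and the values are only the worked example, not a normative table:

```python
THRES1_KBPS = 13.2  # first average rate threshold from the example
THRES2_KBPS = 32.0  # second average rate threshold from the example

def target_output_channels(initial_avg_kbps: float) -> int:
    """Map the initial per-channel average rate to a target output channel count."""
    if initial_avg_kbps <= THRES1_KBPS:   # interval one
        return 2
    elif initial_avg_kbps < THRES2_KBPS:  # interval two
        return 3
    else:                                 # interval three
        return 4

def target_average_rate(encoding_rate_kbps: float, initial_avg_kbps: float) -> float:
    """Target average rate = total encoding rate / target number of output channels."""
    return encoding_rate_kbps / target_output_channels(initial_avg_kbps)

# Worked example from the text: 96 kbps over 4 channels -> 24 kbps initial rate,
# which falls in interval two -> 3 output channels -> 96/3 = 32 kbps target rate.
print(target_average_rate(96.0, 24.0))  # 32.0
```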
本公开实施例中，针对3个平均速率区间对基于场景的音频信号可以选择使用3种不同类型的下混处理算法，经过选择的下混处理后，平均速率区间一和平均速率区间二每个声道所能使用的平均速率上升，平均速率区间三因为编码速率足够丰富而选择不进行下混处理，即将输入信号直接作为下混处理的输出信号，即下混处理后每个声道所能使用的平均速率保持不变。In the embodiment of the present disclosure, three different types of downmix processing algorithms may be selected for the three average rate intervals of the scene-based audio signal. After the selected downmix processing, the average rate usable by each channel rises for average rate intervals one and two; for average rate interval three, because the encoding rate is already sufficient, no downmixing is performed, i.e., the input signal is used directly as the output of the downmix processing, so the average rate usable by each channel remains unchanged.
示例性地，如下表2所示，一些基于场景的音频信号，初始平均速率（初始每个声道所能分配平均速率）和预先设置的平均速率阈值，以及对应的输出声道数（下混处理后声道数），和确定的目标平均速率（下混处理后每个声道能分配的平均速率）的情况。Exemplarily, Table 2 below shows, for some scene-based audio signals, the initial average rate (the average rate initially allocatable to each channel), the preset average rate thresholds, the corresponding number of output channels (the number of channels after downmix processing), and the determined target average rate (the average rate allocatable to each channel after downmix processing).
由下表2可以看出，下混处理后每个声道能分配的平均速率大于或者等于获取的每个声道所能用的平均比特数，能够对所能使用的比特数进行充分利用，避免比特数的浪费，为远端用户提供与编码速率相匹配的音频服务。As can be seen from Table 2 below, the average rate allocatable to each channel after downmix processing is greater than or equal to the average number of bits available per channel, so the available bits are fully utilized, waste of bits is avoided, and the far-end user is provided with an audio service that matches the encoding rate.
Figure PCTCN2022103170-appb-000001
表2Table 2
可以理解的是，表2中的每一个元素都是独立存在的，这些元素被示例性地列在同一张表格中，但是并不代表表格中的所有元素必须如表格所示同时存在。其中每一个元素的值，不依赖于表2中任何其他元素的值。因此本领域内技术人员可以理解，该表2中的每一个元素的取值都是一个独立的实施例。It should be understood that each element in Table 2 exists independently; these elements are exemplarily listed in the same table, but this does not mean that all elements in the table must coexist as shown. The value of each element does not depend on the value of any other element in Table 2. Therefore, those skilled in the art will appreciate that the value of each element in Table 2 constitutes an independent embodiment.
本公开实施例中，根据初始平均速率和预先设置的平均速率阈值确定目标平均速率，除上述示例的方法外，还可以为确定与初始平均速率最接近的平均速率阈值为目标平均速率，或者，还可以为直接确定初始平均速率为目标平均速率，或者，还可以为确定大于初始平均速率的平均速率阈值中，与初始平均速率最接近的平均速率阈值为目标平均速率，等等，本公开实施例对此不作具体限制。In the embodiment of the present disclosure, besides the method exemplified above, determining the target average rate based on the initial average rate and the preset average rate thresholds may also be: taking the threshold closest to the initial average rate as the target average rate; or directly taking the initial average rate as the target average rate; or, among the thresholds greater than the initial average rate, taking the one closest to the initial average rate as the target average rate; and so on. The embodiments of the present disclosure impose no specific limitation on this.
本公开实施例中，在确定目标平均速率之后，根据初始平均速率和目标平均速率，确定对音频信号的目标控制参数，可以为预先设置初始平均速率和目标平均速率与控制参数的对应关系，例如：设置初始平均速率和目标平均速率与控制参数的对应关系，或者设置初始平均速率和目标平均速率之间的差值与控制参数的对应关系，或者设置初始平均速率和目标平均速率之间的差值绝对值与控制参数的对应关系，或者设置初始平均速率和目标平均速率的和与控制参数的对应关系，等等，本公开实施例对此不作具体限制。In the embodiment of the present disclosure, after the target average rate is determined, the target control parameters for the audio signal are determined from the initial average rate and the target average rate. A correspondence between the initial and target average rates and the control parameters may be preset, for example: a correspondence between the pair (initial average rate, target average rate) and the control parameters; or between the difference of the initial and target average rates and the control parameters; or between the absolute value of that difference and the control parameters; or between the sum of the initial and target average rates and the control parameters; and so on. The embodiments of the present disclosure impose no specific limitation on this.
一种下混处理算法是根据目标输出声道数和获取基于场景的音频信号的声道数设计下混转换矩阵，例如声道数为N，目标输出声道数为M，则转换矩阵为M*N，N和M均为正整数，M小于或等于N。One downmix processing algorithm designs a downmix conversion matrix from the target number of output channels and the number of channels of the acquired scene-based audio signal. For example, if the number of channels is N and the target number of output channels is M, the conversion matrix is M*N, where N and M are both positive integers and M is less than or equal to N.
转换矩阵M*N满足如下关系:The transformation matrix M*N satisfies the following relationship:
[M*1]=[M*N]*[N*1][M*1]=[M*N]*[N*1]
其中,[M*1]表示M乘1的矩阵;[M*N]表示M乘N的矩阵;[N*1]表示N乘1的矩阵。Among them, [M*1] represents a matrix of M by 1; [M*N] represents a matrix of M by N; [N*1] represents a matrix of N by 1.
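The relation [M*1] = [M*N] * [N*1] can be sketched, for a single sample frame, as a plain matrix-vector product; this is an illustrative implementation only, and the coefficients in the example are hypothetical:

```python
def apply_downmix_matrix(matrix, sample):
    # matrix: M rows of N coefficients; sample: the N input channel values.
    # Returns the M downmixed channel values: [M*1] = [M*N] * [N*1].
    assert all(len(row) == len(sample) for row in matrix)
    return [sum(c * x for c, x in zip(row, sample)) for row in matrix]

# Hypothetical 2x2 example: mid channel (average) and side channel (difference).
m = [[0.5, 0.5],
     [1.0, -1.0]]
print(apply_downmix_matrix(m, [8, 4]))  # [6.0, 4.0]
```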
为了方便理解,本公开实施例提供一示例性实施例。For ease of understanding, the embodiment of the present disclosure provides an exemplary embodiment.
示例性实施例中，获取基于场景的音频信号为FOA格式的音频信号，则其声道数为4个，即：W，X，Y，Z，选择的编码速率为96kbps。下混处理后目标输出声道数为3个声道，其中W表示一个包含了声场中各个方向所有的声音以相同的增益和相位叠加后的分量，X表示声场中前后方向的分量，Y表示声场中左右方向的分量，Z表示声场中上下方向的分量，坐标示意图如图2所示。In an exemplary embodiment, the acquired scene-based audio signal is in FOA format, so its number of channels is 4, namely W, X, Y, and Z, and the selected encoding rate is 96 kbps. The target number of output channels after downmix processing is 3, where W denotes a component containing all sounds from every direction in the sound field superimposed with identical gain and phase, X denotes the front-back component of the sound field, Y denotes the left-right component, and Z denotes the up-down component; a coordinate diagram is shown in Figure 2.
当下混处理后目标声道数为3个时，采用忽略掉上下方向的Z分量，只保留W，X，Y共3个声道分量，这种策略的考虑点有两个：第一，重建声场中，回放端听者对前后和左右方向的分量比较敏感，对上下方向的分量敏感度较低；第二，一般音频场景的声场中上下分量的声源较少。下混处理后的声道数为3个，每个声道所能分配的平均编码速率为96kbps/3=32kbps，编码核在此平均编码速率下能够编码重建质量很高的音频信号，从而达到给远端用户提供高清稳定沉浸式的音频服务。When the target number of channels after downmix processing is 3, the up-down Z component is discarded and only the 3 channel components W, X, and Y are retained. This strategy is based on two considerations: first, in the reconstructed sound field, the listener at the playback end is more sensitive to the front-back and left-right components and less sensitive to the up-down component; second, in the sound field of typical audio scenes there are fewer sound sources in the up-down component. The number of channels after downmixing is 3, so the average encoding rate allocatable to each channel is 96 kbps / 3 = 32 kbps; at this average encoding rate the encoding core can encode audio signals with very high reconstruction quality, thereby providing the far-end user with a high-definition, stable, and immersive audio service.
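Under the strategy described above, the 3*4 conversion matrix simply selects W, X, and Y and discards Z. A sketch follows; the identity-style matrix values are one obvious realization of "keep W, X, Y", not coefficients taken from the disclosure:

```python
# Hypothetical 3x4 selection matrix: keep W, X, Y; drop the vertical Z
# component, to which playback-side listeners are less sensitive.
FOA_TO_3CH = [
    [1.0, 0.0, 0.0, 0.0],  # W
    [0.0, 1.0, 0.0, 0.0],  # X
    [0.0, 0.0, 1.0, 0.0],  # Y
]

def downmix_foa_sample(wxyz):
    """Downmix one FOA sample [W, X, Y, Z] to [W, X, Y] via [M*1]=[M*N]*[N*1]."""
    return [sum(c * x for c, x in zip(row, wxyz)) for row in FOA_TO_3CH]

print(downmix_foa_sample([0.5, 0.1, -0.2, 0.3]))  # [0.5, 0.1, -0.2]
```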
图8是本公开实施例提供的一种音频信号的编码装置的结构图。FIG. 8 is a structural diagram of an audio signal encoding device provided by an embodiment of the present disclosure.
如图8所示,音频信号的编码装置1,包括:信号获取单元11、信息确定单元12和编码处理单元13。As shown in FIG. 8 , the audio signal encoding device 1 includes: a signal acquisition unit 11 , an information determination unit 12 and an encoding processing unit 13 .
信号获取单元11,被配置为获取基于场景的音频信号。The signal acquisition unit 11 is configured to acquire scene-based audio signals.
信息确定单元12,被配置为确定音频信号的声道数和编码速率。The information determining unit 12 is configured to determine the number of channels and the encoding rate of the audio signal.
编码处理单元13,被配置为根据声道数和编码速率,对音频信号进行编码处理,以生成编码码流。The encoding processing unit 13 is configured to encode the audio signal according to the number of channels and the encoding rate to generate an encoded code stream.
通过实施本公开实施例，信号获取单元11获取基于场景的音频信号，信息确定单元12确定音频信号的声道数和编码速率，编码处理单元13根据声道数和编码速率，对音频信号进行编码处理，以生成编码码流，由此，根据声道数和编码速率对音频信号进行编码处理，在编码过程中，能够对所能使用的比特数进行充分利用，避免比特数的浪费，为远端用户提供与编码速率相匹配的音频服务。By implementing the embodiments of the present disclosure, the signal acquisition unit 11 acquires a scene-based audio signal, the information determination unit 12 determines the number of channels and the encoding rate of the audio signal, and the encoding processing unit 13 encodes the audio signal according to the number of channels and the encoding rate to generate an encoded code stream. The audio signal is thus encoded according to the number of channels and the encoding rate; during encoding, the available bits can be fully utilized, waste of bits is avoided, and the far-end user is provided with an audio service that matches the encoding rate.
如图9所示,在一些实施例中,编码处理单元13,包括:下混处理模块131、参数生成模块132和码流生成模块133。As shown in Figure 9, in some embodiments, the encoding processing unit 13 includes: a downmix processing module 131, a parameter generation module 132, and a code stream generation module 133.
下混处理模块131,被配置为根据声道数和编码速率,对音频信号进行下混处理,以生成下混参数和下混声道信号。The downmix processing module 131 is configured to perform downmix processing on the audio signal according to the number of channels and the encoding rate to generate downmix parameters and downmix channel signals.
参数生成模块132,被配置为对下混声道信号进行编码处理,生成编码参数。The parameter generation module 132 is configured to perform encoding processing on the downmix channel signal and generate encoding parameters.
码流生成模块133,被配置为将下混参数和编码参数进行码流复用,以生成编码码流。The code stream generation module 133 is configured to perform code stream multiplexing on downmix parameters and encoding parameters to generate an encoded code stream.
如图10所示,在一些实施例中,下混处理模块131,包括:参数确定子模块1311、算法确定子模块1312和下混处理子模块1313。As shown in Figure 10, in some embodiments, the downmix processing module 131 includes: a parameter determination sub-module 1311, an algorithm determination sub-module 1312 and a downmix processing sub-module 1313.
参数确定子模块1311,被配置为根据声道数和编码速率,确定对音频信号的目标控制参数。The parameter determination sub-module 1311 is configured to determine target control parameters for the audio signal according to the number of channels and the encoding rate.
算法确定子模块1312,被配置为根据目标控制参数,确定下混处理算法。The algorithm determination sub-module 1312 is configured to determine the downmix processing algorithm according to the target control parameters.
下混处理子模块1313,被配置为根据下混处理算法,对音频信号进行下混处理,以生成下混参数和下混声道信号。The downmix processing sub-module 1313 is configured to perform downmix processing on the audio signal according to the downmix processing algorithm to generate downmix parameters and downmix channel signals.
在一些实施例中,参数确定子模块1311,还被配置为:In some embodiments, the parameter determination sub-module 1311 is also configured to:
根据声道数和编码速率，计算每个声道的初始平均速率；Calculate the initial average rate of each channel according to the number of channels and the encoding rate;
根据初始平均速率和预先设置的平均速率阈值确定目标平均速率;Determine the target average rate based on the initial average rate and a preset average rate threshold;
根据初始平均速率和目标平均速率，确定对音频信号的目标控制参数。According to the initial average rate and the target average rate, the target control parameters for the audio signal are determined.
如图11所示,在一些实施例中,音频信号的编码装置1,还包括:预处理单元14。As shown in Figure 11, in some embodiments, the audio signal encoding device 1 further includes: a pre-processing unit 14.
预处理单元14,被配置为对音频信号进行预加重和/或高通滤波的预处理。The preprocessing unit 14 is configured to perform pre-emphasis and/or high-pass filtering preprocessing on the audio signal.
关于上述实施例中的装置,其中各个模块执行操作的具体方式已经在有关该方法的实施例中进行了详细描述,此处将不做详细阐述说明。Regarding the devices in the above embodiments, the specific manner in which each module performs operations has been described in detail in the embodiments related to the method, and will not be described in detail here.
本公开实施例提供的音频信号的编码装置,可以执行如上面一些实施例所述的音频信号的编码方法,其有益效果与上述的音频信号的编码方法的有益效果相同,此处不再赘述。The audio signal encoding device provided by the embodiments of the present disclosure can perform the audio signal encoding method as described in some of the above embodiments. Its beneficial effects are the same as those of the audio signal encoding method described above, and will not be described again here.
图12是根据一示例性实施例示出的一种用于执行音频信号的编码方法的电子设备100的结构图。FIG. 12 is a structural diagram of an electronic device 100 for performing an audio signal encoding method according to an exemplary embodiment.
示例性地,电子设备100可以是移动电话,计算机,数字广播终端,消息收发设备,游戏控制台,平板设备,医疗设备,健身设备,个人数字助理等。Illustratively, the electronic device 100 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, a fitness device, a personal digital assistant, or the like.
如图12所示,电子设备100可以包括以下一个或多个组件:处理组件101,存储器102,电源组件103,多媒体组件104,音频组件105,输入/输出(I/O)的接口106,传感器组件107,以及通信组件108。As shown in FIG. 12 , the electronic device 100 may include one or more of the following components: a processing component 101 , a memory 102 , a power supply component 103 , a multimedia component 104 , an audio component 105 , an input/output (I/O) interface 106 , and a sensor. component 107, and communications component 108.
处理组件101通常控制电子设备100的整体操作,诸如与显示,电话呼叫,数据通信,相机操作和记录操作相关联的操作。处理组件101可以包括一个或多个处理器1011来执行指令,以完成上述的方法的全部或部分步骤。此外,处理组件101可以包括一个或多个模块,便于处理组件101和其他组件之间的交互。例如,处理组件101可以包括多媒体模块,以方便多媒体组件104和处理组件101之间的交互。The processing component 101 generally controls the overall operations of the electronic device 100, such as operations associated with display, phone calls, data communications, camera operations, and recording operations. The processing component 101 may include one or more processors 1011 to execute instructions to complete all or part of the steps of the above method. Additionally, processing component 101 may include one or more modules that facilitate interaction between processing component 101 and other components. For example, processing component 101 may include a multimedia module to facilitate interaction between multimedia component 104 and processing component 101 .
存储器102被配置为存储各种类型的数据以支持在电子设备100的操作。这些数据的示例包括用于在电子设备100上操作的任何应用程序或方法的指令，联系人数据，电话簿数据，消息，图片，视频等。存储器102可以由任何类型的易失性或非易失性存储设备或者它们的组合实现，如SRAM(Static Random-Access Memory，静态随机存取存储器)，EEPROM(Electrically Erasable Programmable Read-Only Memory，带电可擦可编程只读存储器)，EPROM(Erasable Programmable Read-Only Memory，可擦除可编程只读存储器)，PROM(Programmable Read-Only Memory，可编程只读存储器)，ROM(Read-Only Memory，只读存储器)，磁存储器，快闪存储器，磁盘或光盘。The memory 102 is configured to store various types of data to support operation of the electronic device 100. Examples of such data include instructions for any application or method operating on the electronic device 100, contact data, phonebook data, messages, pictures, videos, and the like. The memory 102 may be implemented by any type of volatile or non-volatile storage device, or a combination thereof, such as SRAM (Static Random-Access Memory), EEPROM (Electrically Erasable Programmable Read-Only Memory), EPROM (Erasable Programmable Read-Only Memory), PROM (Programmable Read-Only Memory), ROM (Read-Only Memory), magnetic memory, flash memory, a magnetic disk, or an optical disk.
电源组件103为电子设备100的各种组件提供电力。电源组件103可以包括电源管理系统,一个或多个电源,及其他与为电子设备100生成、管理和分配电力相关联的组件。 Power supply component 103 provides power to various components of electronic device 100 . Power supply components 103 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power to electronic device 100 .
多媒体组件104包括在所述电子设备100和用户之间的提供一个输出接口的触控显示屏。在一些实施例中,触控显示屏可以包括LCD(Liquid Crystal Display,液晶显示器)和TP(Touch Panel,触摸面板)。触摸面板包括一个或多个触摸传感器以感测触摸、滑动和触摸面板上的手势。所述触摸传感器可以不仅感测触摸或滑动动作的边界,而且还检测与所述触摸或滑动操作相关的持续时间和压力。在一些实施例中,多媒体组件104包括一个前置摄像头和/或后置摄像头。当电子设备100处于操作模式,如拍摄模式或视频模式时,前置摄像头和/或后置摄像头可以接收外部的多媒体数据。每个前置摄像头和后置摄像头可以是一个固定的光学透镜系统或具有焦距和光学变焦能力。 Multimedia component 104 includes a touch-sensitive display screen that provides an output interface between the electronic device 100 and the user. In some embodiments, the touch display screen may include LCD (Liquid Crystal Display) and TP (Touch Panel). The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide action. In some embodiments, multimedia component 104 includes a front-facing camera and/or a rear-facing camera. When the electronic device 100 is in an operating mode, such as a shooting mode or a video mode, the front camera and/or the rear camera may receive external multimedia data. Each front-facing camera and rear-facing camera can be a fixed optical lens system or have a focal length and optical zoom capabilities.
音频组件105被配置为输出和/或输入音频信号。例如,音频组件105包括一个MIC(Microphone,麦克风),当电子设备100处于操作模式,如呼叫模式、记录模式和语音识别模式时,麦克风被配置为接收外部音频信号。所接收的音频信号可以被进一步存储在存储器102或经由通信组件108发送。在一些实施例中,音频组件105还包括一个扬声器,用于输出音频信号。 Audio component 105 is configured to output and/or input audio signals. For example, the audio component 105 includes a MIC (Microphone), and when the electronic device 100 is in an operating mode, such as a call mode, a recording mode, and a voice recognition mode, the microphone is configured to receive an external audio signal. The received audio signals may be further stored in memory 102 or sent via communications component 108 . In some embodiments, audio component 105 also includes a speaker for outputting audio signals.
I/O接口106为处理组件101和外围接口模块之间提供接口，上述外围接口模块可以是键盘，点击轮，按钮等。这些按钮可包括但不限于：主页按钮、音量按钮、启动按钮和锁定按钮。The I/O interface 106 provides an interface between the processing component 101 and peripheral interface modules, which may be a keyboard, a click wheel, buttons, and the like. These buttons may include, but are not limited to: a home button, volume buttons, a start button, and a lock button.
传感器组件107包括一个或多个传感器，用于为电子设备100提供各个方面的状态评估。例如，传感器组件107可以检测到电子设备100的打开/关闭状态，组件的相对定位，例如所述组件为电子设备100的显示器和小键盘，传感器组件107还可以检测电子设备100或电子设备100一个组件的位置改变，用户与电子设备100接触的存在或不存在，电子设备100方位或加速/减速和电子设备100的温度变化。传感器组件107可以包括接近传感器，被配置用来在没有任何的物理接触时检测附近物体的存在。传感器组件107还可以包括光传感器，如CMOS(Complementary Metal Oxide Semiconductor，互补金属氧化物半导体)或CCD(Charge-coupled Device，电荷耦合元件)图像传感器，用于在成像应用中使用。在一些实施例中，该传感器组件107还可以包括加速度传感器，陀螺仪传感器，磁传感器，压力传感器或温度传感器。The sensor component 107 includes one or more sensors for providing status assessments of various aspects of the electronic device 100. For example, the sensor component 107 can detect the open/closed state of the electronic device 100 and the relative positioning of components, for example when the components are the display and keypad of the electronic device 100; the sensor component 107 can also detect a change in position of the electronic device 100 or of one of its components, the presence or absence of user contact with the electronic device 100, the orientation or acceleration/deceleration of the electronic device 100, and changes in the temperature of the electronic device 100. The sensor component 107 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact. The sensor component 107 may also include a light sensor, such as a CMOS (Complementary Metal Oxide Semiconductor) or CCD (Charge-coupled Device) image sensor, for use in imaging applications. In some embodiments, the sensor component 107 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
通信组件108被配置为便于电子设备100和其他设备之间有线或无线方式的通信。电子设备100可以接入基于通信标准的无线网络，如WiFi，2G或3G，或它们的组合。在一个示例性实施例中，通信组件108经由广播信道接收来自外部广播管理系统的广播信号或广播相关信息。在一个示例性实施例中，所述通信组件108还包括NFC(Near Field Communication，近场通信)模块，以促进短程通信。例如，NFC模块可基于RFID(Radio Frequency Identification，射频识别)技术，IrDA(Infrared Data Association，红外数据协会)技术，UWB(Ultra Wide Band，超宽带)技术，BT(Bluetooth，蓝牙)技术和其他技术来实现。The communication component 108 is configured to facilitate wired or wireless communication between the electronic device 100 and other devices. The electronic device 100 may access a wireless network based on a communication standard, such as WiFi, 2G, or 3G, or a combination thereof. In an exemplary embodiment, the communication component 108 receives a broadcast signal or broadcast-related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 108 also includes an NFC (Near Field Communication) module to facilitate short-range communication. For example, the NFC module may be implemented based on RFID (Radio Frequency Identification) technology, IrDA (Infrared Data Association) technology, UWB (Ultra Wide Band) technology, BT (Bluetooth) technology, and other technologies.
在示例性实施例中，电子设备100可以被一个或多个ASIC(Application Specific Integrated Circuit，专用集成电路)、DSP(Digital Signal Processor，数字信号处理器)、数字信号处理设备(DSPD)、PLD(Programmable Logic Device，可编程逻辑器件)、FPGA(Field Programmable Gate Array，现场可编程逻辑门阵列)、控制器、微控制器、微处理器或其他电子元件实现，用于执行上述音频信号的编码方法。需要说明的是，本实施例的电子设备的实施过程和技术原理参见前述对本公开实施例的音频信号的编码方法的解释说明，此处不再赘述。In an exemplary embodiment, the electronic device 100 may be implemented by one or more ASICs (Application Specific Integrated Circuits), DSPs (Digital Signal Processors), digital signal processing devices (DSPDs), PLDs (Programmable Logic Devices), FPGAs (Field Programmable Gate Arrays), controllers, microcontrollers, microprocessors, or other electronic components for performing the above audio signal encoding method. It should be noted that, for the implementation process and technical principles of the electronic device of this embodiment, reference may be made to the foregoing explanation of the audio signal encoding method of the embodiments of the present disclosure, which will not be repeated here.
本公开实施例提供的电子设备100,可以执行如上面一些实施例所述的音频信号的编码方法,其有益效果与上述的音频信号的编码方法的有益效果相同,此处不再赘述。The electronic device 100 provided by the embodiments of the present disclosure can perform the audio signal encoding method as described in some of the above embodiments, and its beneficial effects are the same as those of the audio signal encoding method described above, which will not be described again here.
为了实现上述实施例,本公开还提出一种存储介质。In order to implement the above embodiments, the present disclosure also proposes a storage medium.
其中，该存储介质中的指令由电子设备的处理器执行时，使得电子设备能够执行如前所述的音频信号的编码方法。例如，所述存储介质可以是ROM(Read-Only Memory，只读存储器)、RAM(Random Access Memory，随机存取存储器)、CD-ROM(Compact Disc Read-Only Memory，紧凑型光盘只读存储器)、磁带、软盘和光数据存储设备等。When the instructions in the storage medium are executed by a processor of the electronic device, the electronic device is enabled to perform the audio signal encoding method described above. For example, the storage medium may be a ROM (Read-Only Memory), RAM (Random Access Memory), CD-ROM (Compact Disc Read-Only Memory), magnetic tape, floppy disk, optical data storage device, or the like.
为了实现上述实施例,本公开还提供一种计算机程序产品,该计算机程序由电子设备的处理器执行时,使得电子设备能够执行如前所述的音频信号的编码方法。In order to implement the above embodiments, the present disclosure also provides a computer program product. When the computer program is executed by a processor of an electronic device, the electronic device can perform the audio signal encoding method as described above.
本领域技术人员在考虑说明书及实践这里公开的发明后，将容易想到本公开的其它实施方案。本申请旨在涵盖本公开的任何变型、用途或者适应性变化，这些变型、用途或者适应性变化遵循本公开的一般性原理并包括本公开未公开的本技术领域中的公知常识或惯用技术手段。说明书和实施例仅被视为示例性的，本公开的真正范围和精神由下面的权利要求指出。Other embodiments of the present disclosure will readily occur to those skilled in the art upon consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the present disclosure that follow its general principles and include common knowledge or customary technical means in the technical field not disclosed herein. The specification and embodiments are to be regarded as exemplary only, with the true scope and spirit of the present disclosure being indicated by the following claims.
所属领域的技术人员可以清楚地了解到,为描述的方便和简洁,上述描述的系统、装置和单元的具体工作过程,可以参考前述方法实施例中的对应过程,在此不再赘述。Those skilled in the art can clearly understand that for the convenience and simplicity of description, the specific working processes of the systems, devices and units described above can be referred to the corresponding processes in the foregoing method embodiments, and will not be described again here.
以上所述，仅为本公开的具体实施方式，但本公开的保护范围并不局限于此，任何熟悉本技术领域的技术人员在本公开揭露的技术范围内，可轻易想到变化或替换，都应涵盖在本公开的保护范围之内。因此，本公开的保护范围应以所述权利要求的保护范围为准。The above are only specific embodiments of the present disclosure, but the protection scope of the present disclosure is not limited thereto. Any change or substitution readily conceivable by a person skilled in the art within the technical scope disclosed by the present disclosure shall fall within the protection scope of the present disclosure. Therefore, the protection scope of the present disclosure shall be subject to the protection scope of the claims.

Claims (13)

  1. 一种音频信号的编码方法,其特征在于,包括:An audio signal encoding method, characterized by including:
    获取基于场景的音频信号;Obtain scene-based audio signals;
    确定所述音频信号的声道数和编码速率;Determine the number of channels and coding rate of the audio signal;
    根据所述声道数和所述编码速率,对所述音频信号进行编码处理,以生成编码码流。The audio signal is encoded according to the number of channels and the encoding rate to generate an encoded code stream.
  2. 如权利要求1所述的方法,其特征在于,所述根据所述声道数和所述编码速率,对所述音频信号进行编码处理,以生成编码码流,包括:The method of claim 1, wherein encoding the audio signal according to the number of channels and the encoding rate to generate an encoded code stream includes:
    根据所述声道数和所述编码速率,对所述音频信号进行下混处理,以生成下混参数和下混声道信号;Perform downmix processing on the audio signal according to the number of channels and the encoding rate to generate downmix parameters and downmix channel signals;
    对所述下混声道信号进行编码处理,生成编码参数;Perform encoding processing on the downmix channel signal to generate encoding parameters;
    将所述下混参数和所述编码参数进行码流复用,以生成所述编码码流。The downmix parameters and the encoding parameters are code stream multiplexed to generate the encoded code stream.
  3. 如权利要求2所述的方法,其特征在于,所述根据所述声道数和所述编码速率,对所述音频信号进行下混处理,以生成下混参数和下混声道信号,包括:The method of claim 2, wherein performing downmix processing on the audio signal according to the number of channels and the encoding rate to generate downmix parameters and downmix channel signals includes:
    根据所述声道数和所述编码速率,确定对所述音频信号的目标控制参数;Determine target control parameters for the audio signal according to the number of channels and the encoding rate;
    根据所述目标控制参数,确定下混处理算法;Determine the downmix processing algorithm according to the target control parameters;
    根据所述下混处理算法,对所述音频信号进行下混处理,以生成所述下混参数和所述下混声道信号。According to the downmix processing algorithm, the audio signal is downmixed to generate the downmix parameters and the downmix channel signal.
  4. 如权利要求3所述的方法,其特征在于,所述根据所述声道数和所述编码速率,确定对所述音频信号的目标控制参数,包括:The method of claim 3, wherein determining target control parameters for the audio signal based on the number of channels and the encoding rate includes:
    根据所述声道数和所述编码速率,计算每个声道的初始平均速率;Calculate the initial average rate of each channel according to the number of channels and the encoding rate;
    根据所述初始平均速率和预先设置的平均速率阈值确定目标平均速率;Determine a target average rate according to the initial average rate and a preset average rate threshold;
    根据所述初始平均速率和所述目标平均速率,确定对所述音频信号的所述目标控制参数。The target control parameter for the audio signal is determined based on the initial average rate and the target average rate.
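The per-channel rate derivation recited in claim 4 can be sketched in a few lines. The claim only fixes the inputs (number of channels, encoding rate, preset threshold) and outputs (initial average rate, target average rate, target control parameter); the even split across channels, the clamping rule, and the `low_rate_mode` flag below are hypothetical assumptions added purely for illustration.

```python
def derive_rate_params(total_bitrate_bps: float, num_channels: int,
                       rate_threshold_bps: float) -> dict:
    """Illustrative sketch of the rate derivation in claim 4.

    The clamping rule and the control-parameter mapping are assumptions;
    the claim leaves both open.
    """
    # Initial average rate: the total encoding rate split evenly
    # across the channels of the scene-based signal.
    initial_avg_rate = total_bitrate_bps / num_channels

    # Target average rate: one plausible reading is to clamp the initial
    # average rate against the preset threshold.
    target_avg_rate = min(initial_avg_rate, rate_threshold_bps)

    # Target control parameter: here a single hypothetical flag derived
    # from comparing the two rates.
    return {
        "initial_avg_rate": initial_avg_rate,
        "target_avg_rate": target_avg_rate,
        "low_rate_mode": target_avg_rate < initial_avg_rate,
    }
```

For example, 64 kbit/s over 4 channels with a 24 kbit/s threshold yields a 16 kbit/s initial and target average rate, while 256 kbit/s over the same 4 channels is clamped to the 24 kbit/s threshold.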
5. 如权利要求1至4中任一项所述的方法，其特征在于，在对所述音频信号进行编码处理之前，还包括：The method according to any one of claims 1 to 4, further comprising, before encoding the audio signal:
    对所述音频信号进行预加重和/或高通滤波的预处理。The audio signal is preprocessed by pre-emphasis and/or high-pass filtering.
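The preprocessing in claim 5 (pre-emphasis and/or high-pass filtering) corresponds to standard first-order filters; a minimal sketch is shown below. The coefficients 0.97 and 0.995 are common textbook values, not taken from the application.

```python
def preprocess(signal, pre_emphasis=0.97, hp_coeff=0.995):
    """Sketch of claim 5's preprocessing with illustrative coefficients."""
    # Pre-emphasis: first-order FIR, y[n] = x[n] - a * x[n-1].
    emphasized = [signal[0]] + [
        x - pre_emphasis * x_prev
        for x_prev, x in zip(signal, signal[1:])
    ]
    # High-pass: first-order DC-blocking IIR,
    # y[n] = x[n] - x[n-1] + r * y[n-1].
    out = []
    prev_x = prev_y = 0.0
    for x in emphasized:
        y = x - prev_x + hp_coeff * prev_y
        out.append(y)
        prev_x, prev_y = x, y
    return out
```

Feeding a constant (DC) signal through this chain decays toward zero, which is the expected behavior of a high-pass/DC-blocking stage.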
  6. 一种音频信号的编码装置,其特征在于,包括:An audio signal encoding device, characterized by including:
    信号获取单元,被配置为获取基于场景的音频信号;a signal acquisition unit configured to acquire scene-based audio signals;
    信息确定单元,被配置为确定所述音频信号的声道数和编码速率;an information determination unit configured to determine the number of channels and the encoding rate of the audio signal;
    编码处理单元,被配置为根据所述声道数和所述编码速率,对所述音频信号进行编码处理,以生成编码码流。An encoding processing unit is configured to perform encoding processing on the audio signal according to the number of channels and the encoding rate to generate an encoded code stream.
  7. 如权利要求6所述的装置,其特征在于,所述编码处理单元,包括:The device of claim 6, wherein the encoding processing unit includes:
下混处理模块，被配置为根据所述声道数和所述编码速率，对所述音频信号进行下混处理，以生成下混参数和下混声道信号；A downmix processing module configured to perform downmix processing on the audio signal according to the number of channels and the encoding rate to generate downmix parameters and downmix channel signals;
    参数生成模块,被配置为对所述下混声道信号进行编码处理,生成编码参数;A parameter generation module configured to perform encoding processing on the downmix channel signal and generate encoding parameters;
码流生成模块，被配置为将所述下混参数和所述编码参数进行码流复用，以生成所述编码码流。A code stream generation module configured to perform code stream multiplexing on the downmix parameters and the encoding parameters to generate the encoded code stream.
  8. 如权利要求7所述的装置,其特征在于,所述下混处理模块,包括:The device according to claim 7, wherein the downmix processing module includes:
    参数确定子模块,被配置为根据所述声道数和所述编码速率,确定对所述音频信号的目标控制参数;A parameter determination submodule configured to determine target control parameters for the audio signal according to the number of channels and the encoding rate;
    算法确定子模块,被配置为根据所述目标控制参数,确定下混处理算法;The algorithm determination submodule is configured to determine the downmix processing algorithm according to the target control parameter;
    下混处理子模块,被配置为根据所述下混处理算法,对所述音频信号进行下混处理,以生成所述下混参数和所述下混声道信号。The downmix processing submodule is configured to perform downmix processing on the audio signal according to the downmix processing algorithm to generate the downmix parameters and the downmix channel signal.
  9. 如权利要求8所述的装置,其特征在于,所述参数确定子模块,还被配置为:The device of claim 8, wherein the parameter determination sub-module is further configured to:
根据所述声道数和所述编码速率，计算每个声道的初始平均速率；Calculate the initial average rate of each channel according to the number of channels and the encoding rate;
    根据所述初始平均速率和预先设置的平均速率阈值确定目标平均速率;Determine a target average rate according to the initial average rate and a preset average rate threshold;
根据所述初始平均速率和所述目标平均速率，确定对所述音频信号的所述目标控制参数。The target control parameter for the audio signal is determined based on the initial average rate and the target average rate.
  10. 如权利要求6至9中任一项所述的装置,其特征在于,还包括:The device according to any one of claims 6 to 9, further comprising:
    预处理单元,被配置为对所述音频信号进行预加重和/或高通滤波的预处理。A preprocessing unit configured to perform pre-emphasis and/or high-pass filtering preprocessing on the audio signal.
  11. 一种电子设备,其特征在于,包括:An electronic device, characterized by including:
    至少一个处理器;以及at least one processor; and
    与所述至少一个处理器通信连接的存储器;其中,a memory communicatively connected to the at least one processor; wherein,
所述存储器存储有可被所述至少一个处理器执行的指令，所述指令被所述至少一个处理器执行，以使所述至少一个处理器能够执行权利要求1至5中任一项所述的方法。The memory stores instructions executable by the at least one processor, and the instructions, when executed by the at least one processor, enable the at least one processor to perform the method of any one of claims 1 to 5.
  12. 一种存储有计算机指令的非瞬时计算机可读存储介质,其特征在于,所述计算机指令用于使所述计算机执行权利要求1至5中任一项所述的方法。A non-transitory computer-readable storage medium storing computer instructions, characterized in that the computer instructions are used to cause the computer to execute the method described in any one of claims 1 to 5.
  13. 一种计算机程序产品,包括计算机指令,其特征在于,所述计算机指令在被处理器执行时实现权利要求1至5中任一项所述的方法。A computer program product comprising computer instructions, characterized in that, when executed by a processor, the computer instructions implement the method of any one of claims 1 to 5.
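The overall flow of claims 1 to 3 (acquire a scene-based signal, downmix according to channel count and encoding rate, encode the downmix channels, then multiplex downmix parameters and encoding parameters into one bitstream) can be sketched end to end. The trivial passive downmix, the placeholder quantizer, and the dict standing in for the bitstream syntax are all hypothetical; the claims do not fix those algorithms.

```python
def encode_audio(channels, bitrate_bps):
    """End-to-end sketch of claims 1-3: downmix -> encode -> multiplex.

    `channels` is a list of per-channel sample lists from a scene-based
    (e.g. Ambisonics) signal; every concrete step below is a placeholder.
    """
    num_channels = len(channels)
    # Step 1: downmix according to channel count and encoding rate.
    # A trivial equal-weight downmix to one channel stands in for the
    # rate-dependent algorithm of claim 3.
    downmix_params = {"num_channels": num_channels, "bitrate": bitrate_bps}
    downmix = [sum(frame) / num_channels for frame in zip(*channels)]
    # Step 2: encode the downmix channel signal (placeholder 8-bit
    # quantizer standing in for the core codec).
    encoding_params = [round(s * 127) for s in downmix]
    # Step 3: multiplex downmix parameters and encoding parameters into
    # a single "bitstream" (a dict standing in for the real syntax).
    return {"downmix_params": downmix_params, "payload": encoding_params}
```

A caller would pass the captured scene-based channels and the negotiated encoding rate, and transmit or store the multiplexed result.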
PCT/CN2022/103170 2022-06-30 2022-06-30 Audio signal encoding method and apparatus, and electronic device and storage medium WO2024000534A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/CN2022/103170 WO2024000534A1 (en) 2022-06-30 2022-06-30 Audio signal encoding method and apparatus, and electronic device and storage medium
CN202280002189.0A CN117643073A (en) 2022-06-30 2022-06-30 Audio signal encoding method, device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2022/103170 WO2024000534A1 (en) 2022-06-30 2022-06-30 Audio signal encoding method and apparatus, and electronic device and storage medium

Publications (1)

Publication Number Publication Date
WO2024000534A1 true WO2024000534A1 (en) 2024-01-04

Family

ID=89383874

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/103170 WO2024000534A1 (en) 2022-06-30 2022-06-30 Audio signal encoding method and apparatus, and electronic device and storage medium

Country Status (2)

Country Link
CN (1) CN117643073A (en)
WO (1) WO2024000534A1 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110002393A1 (en) * 2009-07-03 2011-01-06 Fujitsu Limited Audio encoding device, audio encoding method, and video transmission device
CN109243488A (en) * 2018-10-30 2019-01-18 腾讯音乐娱乐科技(深圳)有限公司 Audio-frequency detection, device and storage medium
CN110335615A (en) * 2019-05-05 2019-10-15 北京字节跳动网络技术有限公司 Processing method, device, electronic equipment and the storage medium of audio data
CN114582357A (en) * 2020-11-30 2022-06-03 华为技术有限公司 Audio coding and decoding method and device

Also Published As

Publication number Publication date
CN117643073A (en) 2024-03-01

Similar Documents

Publication Publication Date Title
EP3139640A2 (en) Method and device for achieving object audio recording and electronic apparatus
CN106412772B (en) Camera driven audio spatialization
KR101431934B1 (en) An apparatus and a method for converting a first parametric spatial audio signal into a second parametric spatial audio signal
US20140341280A1 (en) Multiple region video conference encoding
US20150034643A1 (en) Sealing disk for induction sealing a container
CN112673649B (en) Spatial audio enhancement
WO2009051857A2 (en) System and method for video coding using variable compression and object motion tracking
US11870941B2 (en) Audio processing method and electronic device
KR20210072736A (en) Converting audio signals captured in different formats to a reduced number of formats to simplify encoding and decoding operations.
CN114600188A (en) Apparatus and method for audio coding
CN113810589A (en) Electronic device, video shooting method and medium thereof
EP4138381A1 (en) Method and device for video playback
WO2023216119A1 (en) Audio signal encoding method and apparatus, electronic device and storage medium
WO2024000534A1 (en) Audio signal encoding method and apparatus, and electronic device and storage medium
US9930467B2 (en) Sound recording method and device
CN116368460A (en) Audio processing method and device
CN115550559B (en) Video picture display method, device, equipment and storage medium
EP3923280A1 (en) Adapting multi-source inputs for constant rate encoding
CN110166797B (en) Video transcoding method and device, electronic equipment and storage medium
CN116830193A (en) Audio code stream signal processing method, device, electronic equipment and storage medium
CN114631332A (en) Signaling of audio effect metadata in a bitstream
US20240114310A1 (en) Method and System For Efficiently Encoding Scene Positions
EP4280211A1 (en) Sound signal processing method and electronic device
WO2023240653A1 (en) Audio signal format determination method and apparatus
EP4152770A1 (en) A method and apparatus for communication audio handling in immersive audio scene rendering

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22948636

Country of ref document: EP

Kind code of ref document: A1