WO2021052293A1 - 音频编码方法和装置 - Google Patents

音频编码方法和装置 Download PDF

Info

Publication number
WO2021052293A1
WO2021052293A1 (PCT/CN2020/115123; CN2020115123W)
Authority
WO
WIPO (PCT)
Prior art keywords
parameter set
audio data
packet type
audio
value combination
Prior art date
Application number
PCT/CN2020/115123
Other languages
English (en)
French (fr)
Inventor
王卓
王萌
范泛
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司 filed Critical 华为技术有限公司
Priority to JP2022517444A priority Critical patent/JP7387879B2/ja
Priority to EP20865475.6A priority patent/EP4024394A4/en
Priority to KR1020227012578A priority patent/KR20220066316A/ko
Publication of WO2021052293A1 publication Critical patent/WO2021052293A1/zh
Priority to US17/697,455 priority patent/US20220208200A1/en

Classifications

    • G10L19/005 — Correction of errors induced by the transmission channel, if related to the coding algorithm
    • G10L19/002 — Dynamic bit allocation
    • G10L19/008 — Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
    • G10L19/02 — Speech or audio coding using spectral analysis, e.g. transform vocoders or subband vocoders
    • G10L19/032 — Quantisation or dequantisation of spectral components
    • G10L19/167 — Audio streaming, i.e. formatting and decoding of an encoded audio signal representation into a data stream for transmission or storage purposes
    • G10L19/24 — Variable rate codecs, e.g. for generating different qualities using a scalable representation such as hierarchical encoding or layered encoding
    • G06N3/08 — Learning methods
    • G06N3/084 — Backpropagation, e.g. using gradient descent
    • H04R3/12 — Circuits for distributing signals to two or more loudspeakers
    • H04R2420/07 — Applications of wireless loudspeakers or wireless microphones
    • H04W4/80 — Services using short range communication, e.g. near-field communication [NFC], radio-frequency identification [RFID] or low energy communication

Definitions

  • This application relates to audio processing technology, and in particular to an audio coding method and device.
  • Bluetooth audio codecs include the sub-band coding (SBC) codec that is the default for the Advanced Audio Distribution Profile (A2DP), the Moving Picture Experts Group (MPEG) Advanced Audio Coding (AAC) family, Sony's LDAC, Qualcomm's aptX family, and others.
  • A2DP: Advanced Audio Distribution Profile
  • SBC: sub-band coding
  • MPEG: Moving Picture Experts Group
  • AAC: Advanced Audio Coding
  • With these codecs, audio quality depends strictly on the throughput and stability of the Bluetooth link.
  • When the Bluetooth link suffers channel interference and the bit rate fluctuates sharply, audio data is lost in transit and the sound is interrupted during playback, which seriously degrades the user experience.
  • Related technologies can bound the range of bit-rate fluctuation, but the control is relatively crude and cannot balance sound continuity against sound quality.
  • In view of this, the present application provides an audio coding method and device that adaptively match the Bluetooth channel conditions, maximizing sound quality while delivering a continuous audio listening experience.
  • this application provides an audio coding method, including:
  • acquiring first audio data; acquiring a target bit rate and a Bluetooth packet type, where the target bit rate and the Bluetooth packet type correspond to the current Bluetooth channel conditions;
  • obtaining, according to the first audio data, the target bit rate, and the Bluetooth packet type, one or more of a bit pool parameter set, a psychoacoustic parameter set, and a spectral bandwidth parameter set through a pre-trained neural network, where the parameters in the bit pool parameter set represent the number of remaining bitstream bits available for encoding, the parameters in the psychoacoustic parameter set indicate how the bits required for encoding are allocated across frequencies, and the parameters in the spectral bandwidth parameter set indicate the highest cut-off frequency of the encoded audio spectrum;
  • encoding the first audio data according to one or more of the bit pool parameter set, the psychoacoustic parameter set, and the spectral bandwidth parameter set to obtain a bitstream to be sent.
  • In this way, this application derives the encoding parameters from the audio data through a neural network, which not only adaptively matches the Bluetooth channel conditions but also effectively reduces bit-rate fluctuation during audio encoding, improves the interference resistance of audio transmission, and maximizes sound quality while providing a continuous listening experience.
  • In a possible implementation, obtaining one or more of the bit pool parameter set, the psychoacoustic parameter set, and the spectral bandwidth parameter set through the pre-trained neural network according to the first audio data, the target bit rate, and the Bluetooth packet type includes: performing feature extraction on the first audio data, the target bit rate, and the Bluetooth packet type to obtain a first feature vector; and inputting the first feature vector into the neural network to obtain one or more of the bit pool parameter set, the psychoacoustic parameter set, and the spectral bandwidth parameter set.
  • the Bluetooth packet type refers to a packet type transmitted by Bluetooth, and may include any one of 2DH1, 2DH3, 2DH5, 3DH1, 3DH3, and 3DH5.
  • The target bit rate indicates the average number of bytes of the data packets generated by encoding within a set time period.
  • In a possible implementation, before acquiring the first audio data, the method further includes: constructing a training data set for the neural network, where the training data set includes the correspondence between a first value combination and a second value combination; the first value combination is any one of multiple value combinations of audio data, target bit rate, and Bluetooth packet type, and the second value combination is one of multiple value combinations of the bit pool parameter set, psychoacoustic parameter set, and spectral bandwidth parameter set; those multiple value combinations correspond to multiple ODG scores, among which the ODG score of the second value combination is the highest; and training the neural network on the training data set.
  • Because the target bit rate and Bluetooth packet type correspond to the Bluetooth channel conditions, the optimal value combination of the corresponding bit pool, psychoacoustic, and spectral bandwidth parameter sets also corresponds to those conditions; the neural network thus accounts for changes in the Bluetooth channel and learns the optimal combination of related parameters matching the channel conditions.
  • In a possible implementation, constructing the training data set of the neural network includes: acquiring multiple audio data items; under the first value combination, encoding second audio data with each of multiple value combinations of the bit pool parameter set, psychoacoustic parameter set, and spectral bandwidth parameter set, where the second audio data is any one of the multiple audio data items; obtaining the multiple ODG scores from the encoding results; determining the value combination corresponding to the highest of the multiple ODG scores as the second value combination; and adding the first value combination and the second value combination to the training data set.
  • this application provides an audio coding device, including:
  • an input module, configured to acquire first audio data, and to acquire a target bit rate and a Bluetooth packet type that correspond to the current Bluetooth channel conditions;
  • a parameter acquisition module, configured to obtain, according to the first audio data, the target bit rate, and the Bluetooth packet type, one or more of a bit pool parameter set, a psychoacoustic parameter set, and a spectral bandwidth parameter set through a pre-trained neural network;
  • the parameters in the bit pool parameter set indicate the number of remaining bitstream bits available for encoding;
  • the parameters in the psychoacoustic parameter set indicate how the bits required for encoding are allocated across frequencies;
  • the parameters in the spectral bandwidth parameter set indicate the highest cut-off frequency of the encoded audio spectrum.
  • In a possible implementation, the parameter acquisition module is specifically configured to perform feature extraction on the first audio data, the target bit rate, and the Bluetooth packet type to obtain a first feature vector, and to input the first feature vector into the neural network to obtain one or more of the bit pool parameter set, the psychoacoustic parameter set, and the spectral bandwidth parameter set.
  • the Bluetooth packet type refers to a packet type transmitted by Bluetooth, and may include any one of 2DH1, 2DH3, 2DH5, 3DH1, 3DH3, and 3DH5.
  • the target code rate is used to indicate the average number of bytes of data packets generated by encoding within a set time period.
  • In a possible implementation, the parameter acquisition module is further configured to construct a training data set for the neural network, where the training data set includes the correspondence between a first value combination and a second value combination;
  • the first value combination is any one of multiple value combinations of audio data, target bit rate, and Bluetooth packet type;
  • the second value combination is one of multiple value combinations of the bit pool parameter set, psychoacoustic parameter set, and spectral bandwidth parameter set;
  • those multiple value combinations correspond to multiple ODG scores, among which the ODG score of the second value combination is the highest; the neural network is trained on the training data set.
  • In a possible implementation, the parameter acquisition module is specifically configured to: acquire multiple audio data items; under the first value combination, encode second audio data with each of multiple value combinations of the bit pool parameter set, psychoacoustic parameter set, and spectral bandwidth parameter set, where the second audio data is any one of the multiple audio data items; obtain the multiple ODG scores from the encoding results; determine the value combination corresponding to the highest ODG score as the second value combination; and add the first value combination and the second value combination to the training data set.
  • this application provides a terminal device, including:
  • one or more processors; and
  • a memory, configured to store one or more programs;
  • where, when the one or more programs are executed by the one or more processors, the one or more processors implement the method according to any one of the above first aspects.
  • the present application provides a computer-readable storage medium, including a computer program, which when executed on a computer, causes the computer to execute the method described in any one of the above-mentioned first aspects.
  • In a fifth aspect, the present application provides a computer program product comprising computer program code which, when run on a computer, causes the computer to execute the method described in any one of the above-mentioned first aspects.
  • Fig. 1 exemplarily shows an application scenario to which the audio coding method of the present application is applicable;
  • Fig. 2 exemplarily shows the audio coding system of the present application;
  • Fig. 3 is a flowchart of an embodiment of an audio coding method of this application;
  • Fig. 4 exemplarily shows a schematic diagram of the psychoacoustic process;
  • Fig. 5 exemplarily shows a schematic diagram of a parameter acquisition method;
  • Fig. 6 shows a schematic diagram of a method for constructing a training data set;
  • Fig. 7 is a schematic structural diagram of an embodiment of an audio encoding device of this application;
  • Fig. 8 is a schematic structural diagram of a terminal device provided by this application.
  • At least one (item) refers to one or more, and “multiple” refers to two or more.
  • "And/or" describes an association between objects and indicates three possible relationships; for example, "A and/or B" can mean: only A, only B, or both A and B, where A and B may each be singular or plural.
  • the character “/” generally indicates that the associated objects before and after are in an “or” relationship.
  • the following at least one item (a) or similar expressions refers to any combination of these items, including any combination of a single item (a) or a plurality of items (a).
  • "At least one of a, b, or c" can mean: a; b; c; "a and b"; "a and c"; "b and c"; or "a and b and c", where a, b, and c may each be single or multiple.
  • Figure 1 exemplarily shows an example diagram of an application scenario to which the audio coding method of the present application is applicable.
  • the application scenario includes a terminal device and a Bluetooth device.
  • The terminal device and the Bluetooth device are devices that have Bluetooth connectivity and support the AAC family of standards.
  • Terminal devices can be, for example, mobile phones, computers (including laptops and desktops), and tablets (including handheld and in-vehicle tablets).
  • Bluetooth playback devices can be, for example, TWS earbuds and other wireless headsets, as well as Bluetooth devices such as smart speakers, smart watches, smart glasses, and car audio systems.
  • Fig. 2 exemplarily shows an example diagram of the audio coding system of the present application.
  • the audio coding system includes an input module, a processing module, and an output module.
  • The data acquired by the input module includes audio data, such as an audio pulse code modulation (PCM) stream, together with the target bit rate and Bluetooth packet type determined from the Bluetooth channel conditions; the target bit rate and Bluetooth packet type correspond to the current Bluetooth channel conditions.
  • The target bit rate indicates the average number of bytes of the data packets generated by encoding within a set time period.
  • the Bluetooth packet type refers to the packet type transmitted by Bluetooth.
  • The Bluetooth packet types used on the Asynchronous Connection-Less (ACL) link that carries the audio stream can include 2DH1 (audio payload limited to at most 31 bytes per packet), 2DH3 (at most 356 bytes), 2DH5 (at most 656 bytes), 3DH1 (at most 11 bytes), 3DH3 (at most 536 bytes), and 3DH5 (at most 986 bytes). 2DH1, 2DH3, and 2DH5 use π/4 Differential Quadrature Phase Shift Keying (DQPSK) modulation, while 3DH1, 3DH3, and 3DH5 use 8-phase Differential Phase Shift Keying (8DPSK) modulation.
  • DQPSK: Differential Quadrature Phase Shift Keying
  • When Bluetooth interference is light and the channel is in good condition, 2DH5 or 3DH5 is preferred: these packet types have greater data-carrying capacity but weaker interference resistance, allowing the audio encoder to run at target bit rates above 128 kbps for higher-quality transmission. When Bluetooth interference is heavy and the channel is in poor condition, 2DH3, 3DH3, 2DH1, or 3DH1 is preferred: these packet types have stronger interference resistance but smaller data-carrying capacity, so the audio encoder works at target bit rates below 96 kbps, prioritizing the continuity of audio transmission.
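  • The packet-type selection policy above can be sketched as a small decision rule. This is an illustrative sketch, not the patent's algorithm: the function name, the boolean channel-quality input, and the concrete return bit rates (192 kbps and 64 kbps) are assumptions; only the preference order and the 128/96 kbps boundaries come from the text.

```python
def select_packet_and_rate(channel_good: bool):
    """Pick a Bluetooth packet type and target bit rate (kbps).

    Mirrors the policy described above: prefer 2DH5/3DH5 above 128 kbps
    when the channel is clean; fall back to smaller, more robust packet
    types below 96 kbps when interference is heavy. The concrete values
    returned are illustrative assumptions.
    """
    if channel_good:
        # Large payload, weaker interference resistance, high quality.
        return "3DH5", 192
    # Small payload, stronger interference resistance, continuity first.
    return "2DH1", 64

pkt, rate = select_packet_and_rate(channel_good=True)
```

A real implementation would derive `channel_good` from link-quality statistics rather than take it as a flag.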
  • the processing module includes a parameter adjustment sub-module, an encoding sub-module and an auxiliary sub-module.
  • The parameter adjustment submodule provides two functions, feature extraction and the neural network, and determines the optimal combination of encoding parameter values from the data supplied by the input module;
  • the encoding submodule provides three functions, parameter configuration, encoding, and decoding, and encodes the audio data (and decodes the bitstream) using the optimal combination of encoding parameter values;
  • the auxiliary submodule provides two functions, bit-rate fluctuation statistics and objective difference grading (ODG scoring): it tracks changes in the byte counts of the packets produced by encoding, and scores the sound quality of audio that has been encoded and then decoded.
  • ODG scores are obtained with the Perceptual Evaluation of Audio Quality (PEAQ) algorithm specified in International Telecommunication Union (ITU) recommendation BS.1387-1.
  • PEAQ Perceptual Evaluation of Audio Quality
  • ITU International Telecommunication Union
  • The score ranges from -4 to 0; the closer the score is to 0, the better the audio quality after encoding and decoding.
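  • For reference, the ODG range maps onto the five-grade impairment scale used by PEAQ. The helper below is an illustrative mapping: the label strings follow the BS.1387 grading scale, while the bucketing by rounding to the nearest integer grade is an assumption of this sketch.

```python
def odg_label(odg: float) -> str:
    """Map an ODG score in [-4, 0] to the five PEAQ impairment labels."""
    if not -4.0 <= odg <= 0.0:
        raise ValueError("ODG must lie in [-4, 0]")
    labels = ["very annoying", "annoying", "slightly annoying",
              "perceptible, but not annoying", "imperceptible"]
    # Round to the nearest integer grade, then index into the labels.
    return labels[round(odg) + 4]
```

So a codec configuration scoring near 0 is graded "imperceptible", while one near -4 is graded "very annoying".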
  • The output module outputs the data packets produced by encoding, encapsulated according to the Bluetooth packet type into the audio bitstream.
  • Fig. 3 is a flowchart of an embodiment of an audio coding method of this application.
  • The method of this embodiment can be performed by the terminal device in Fig. 1, for example a mobile phone, a computer (including laptops and desktops), or a tablet (including handheld and in-vehicle tablets).
  • The audio coding method may include:
  • Step 301 Acquire first audio data.
  • the first audio data is audio data to be encoded.
  • the terminal device can directly read the first audio data from the local memory, or can receive the first audio data from other devices, which is not specifically limited in this application.
  • Step 302 Obtain the target bit rate and the Bluetooth packet type, where the target bit rate and the Bluetooth packet type correspond to the current Bluetooth channel status.
  • The target bit rate indicates the average number of bytes of the data packets generated by encoding within the set time period; that is, it can be regarded as the average packet size expected after encoding the first audio data. Because of various influencing factors, the byte count (i.e. bit rate) of each individual packet rarely hits the target exactly, so the rate of each packet is allowed to fluctuate within a small range around the target bit rate; it is only required that the average bit rate of the packets within the set time period meet the target.
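  • The averaged-rate interpretation can be checked with a short helper. This is a hypothetical sketch: the function name and the 5% tolerance are assumptions; the text only requires that the average over the window meet the target.

```python
def meets_target_rate(packet_sizes_bytes, target_kbps, window_seconds):
    """Check whether the average bit rate over a window meets the target.

    Individual packets may deviate from the target; only the mean rate
    over the window must match, here within an assumed 5% tolerance.
    """
    total_bits = 8 * sum(packet_sizes_bytes)
    avg_kbps = total_bits / window_seconds / 1000
    return abs(avg_kbps - target_kbps) / target_kbps <= 0.05

# 100 packets fluctuating around 160 bytes over 1 s average to 128 kbps.
sizes = [150, 170] * 50
```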
  • the Bluetooth packet type refers to the packet type transmitted by Bluetooth.
  • The Bluetooth packet type can be any of 2DH1, 2DH3, 2DH5, 3DH1, 3DH3, and 3DH5, and each packet type corresponds to an upper limit on bit-rate fluctuation.
  • The target bit rate and Bluetooth packet type in this application correspond to the current Bluetooth channel conditions; that is, they are determined from the channel conditions and therefore also reflect them.
  • Steps 301 and 302 may be performed in either order.
  • Step 303 Acquire one or more of a bit pool parameter set, a psychoacoustic parameter set, and a spectrum bandwidth parameter set through a neural network obtained by pre-training according to the first audio data, the target bit rate and the Bluetooth packet type.
  • the parameters in the bit pool parameter set are used to indicate the number of remaining code stream bits that can be used for encoding.
  • The size of the bit pool is adjusted to control bit-rate fluctuation in the constant bit rate (CBR) encoding mode, so that the bit rate fluctuates instantaneously but converges over the long term.
  • CBR: constant bit rate
  • This approach allows a certain degree of bit-rate fluctuation in the CBR encoding mode and better preserves sound quality by allocating different numbers of bits to different audio data.
  • When the actual number of bits allocated (the bit rate) is less than the target number of bits (the target bit rate), the surplus bits are deposited into the bit pool; when the actual number of bits allocated exceeds the target, bits are withdrawn from the bit pool for use.
  • The bit pool state in this method is jointly determined by all historical frames and the current frame, reflecting the bit-rate fluctuation and the compressibility of the audio over the whole period up to the present. A larger bit pool permits larger bit-rate fluctuation and higher encoded audio quality; a smaller bit pool permits less fluctuation and lower encoded audio quality.
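  • The deposit/withdraw behaviour described above can be illustrated with a toy allocator. This is not the AAC bit-reservoir algorithm, only a minimal sketch of the mechanism, assuming per-frame bit demands are known; the names and numbers are illustrative.

```python
def allocate_with_bit_pool(frame_demands, target_bits, pool_cap):
    """Toy bit-reservoir allocator (illustrative, not the AAC algorithm).

    Frames that need fewer bits than the per-frame target deposit the
    surplus into the pool (up to its capacity); frames that need more
    may withdraw from it. The long-run average thus converges toward
    the target while individual frames fluctuate.
    """
    pool, grants = 0, []
    for demand in frame_demands:
        if demand <= target_bits:
            deposit = min(target_bits - demand, pool_cap - pool)
            pool += deposit
            grants.append(demand)
        else:
            withdraw = min(demand - target_bits, pool)
            pool -= withdraw
            grants.append(target_bits + withdraw)
    return grants, pool
```

An easy frame (demand 800 < target 1000) banks 200 bits that a later hard frame (demand 1200) can spend; with an empty pool, a hard frame is clamped to the target.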
  • the parameters in the psychoacoustic parameter set are used to indicate the allocation of the number of bits required for encoding at different frequencies.
  • The psychoacoustic model determines which information in a piece of audio is primary and must be preserved during encoding, and which information is secondary and can be ignored.
  • Figure 4 exemplarily shows a schematic diagram of the psychoacoustic process. As shown in Figure 4, there is a masker at 900 Hz with very high energy. Energy near the masker that falls below the dotted line (the masking threshold) will not be heard, which means the information below the dotted line need not be encoded, reducing the number of encoded bits.
  • Masking is determined by three parts: the in-band masking parameter dr, the attenuation slope k1 of masking toward lower frequencies, and the attenuation slope k2 of masking toward higher frequencies.
  • The three parameters dr, k1, and k2 directly determine the number of bits (the bit rate) of the packets produced during AAC quantization. If the actual packet bit rate is greater than the target bit rate, dr is reduced; if it is less than the target, dr is increased.
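  • The dr feedback rule reduces to a simple controller. The step size, function name, and return convention are assumptions of this sketch; only the direction of adjustment (reduce dr when over the target rate, raise it when under) comes from the text.

```python
def adjust_dr(dr, actual_kbps, target_kbps, step=0.1):
    """One step of the dr feedback rule described above.

    Lowering dr widens in-band masking (fewer bits spent); raising it
    narrows masking (more bits spent). The step size is an assumption.
    """
    if actual_kbps > target_kbps:
        return dr - step   # overshooting the target rate: mask more
    if actual_kbps < target_kbps:
        return dr + step   # undershooting: mask less, spend more bits
    return dr
```

A real encoder would clamp dr to a valid range and likely use a proportional rather than fixed step.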
  • the parameters in the spectrum bandwidth parameter set are used to represent the highest cutoff frequency of the encoded audio spectrum.
  • the terminal device can perform feature extraction on the first audio data, target bit rate, and Bluetooth packet type to obtain the first feature vector, and input the first feature vector into the neural network to obtain the bit pool parameter set, the psychoacoustic parameter set, and the spectrum bandwidth parameter set.
  • Figure 5 exemplarily shows a schematic diagram of the parameter acquisition method. As shown in Figure 5, the terminal device performs a feature transformation on the first audio data, the target bit rate, and the Bluetooth packet type, and extracts feature vectors such as the bit rate and cepstral features representing the music, for example Mel-frequency cepstral coefficients or linear prediction cepstral coefficients.
  • the feature extraction process can reduce the data dimension, thereby reducing the amount of calculation.
  • the terminal device inputs the feature vector into the pre-trained neural network to obtain one or more of the above-mentioned bit pool parameter set, psychoacoustic parameter set, and spectrum bandwidth parameter set.
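  • A hypothetical front end for this step might look like the following. The document names cepstral features (MFCC/LPCC) but does not fix the feature set, so this sketch substitutes simple frame statistics for the audio features; the function name, the packet-type index encoding, and the vector layout are all assumptions.

```python
import numpy as np

def build_feature_vector(pcm_frame, target_kbps, packet_type):
    """Assemble a feature vector from audio, bit rate, and packet type.

    Stand-in features: frame mean, standard deviation, and peak level,
    plus the two channel-side inputs. A real system would use cepstral
    coefficients here instead of raw statistics.
    """
    x = np.asarray(pcm_frame, dtype=np.float64)
    packet_index = ["2DH1", "2DH3", "2DH5",
                    "3DH1", "3DH3", "3DH5"].index(packet_type)
    return np.array([x.mean(), x.std(), np.abs(x).max(),
                     float(target_kbps), float(packet_index)])
```

The resulting vector is what would be fed into the pre-trained network to predict the parameter sets.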
  • the terminal device can construct a training data set of a neural network, the training data set includes a corresponding relationship between a first value combination and a second value combination, and the first value combination is audio data, target bit rate, and Bluetooth packet Any one of the multiple value combinations of the type, the second value combination is one of the multiple value combinations of the bit pool parameter set, the psychoacoustic parameter set, and the spectrum bandwidth parameter set, the bit pool parameter set, the psychoacoustic parameter The multiple value combinations of the set and the spectrum bandwidth parameter set correspond to multiple ODG scores, where the second value combination corresponds to the highest ODG score.
  • the neural network is trained according to the training data set.
  • FIG. 6 shows a schematic diagram of a method for constructing a training data set.
  • As shown in Fig. 6, a terminal device acquires multiple audio data items. Under the first value combination, multiple value combinations of the bit pool parameter set, psychoacoustic parameter set, and spectral bandwidth parameter set are used to encode second audio data, where the second audio data is any one of the multiple audio data items. Multiple ODG scores are obtained from the encoding results.
  • the value combination corresponding to the highest one of the multiple ODG scores is determined as the second value combination.
  • the first value combination and the second value combination are added to the training data set.
  • that is, the terminal device first collects a large number of music files of different styles and genres; then, for the audio data in each music file, under each value combination of audio data, target bit rate, and Bluetooth packet type,
  • the audio data is encoded with the corresponding value combination, and for each encoding the bit-rate fluctuation of the generated data packets is measured and a score is computed using the ODG method.
  • the terminal device can input the extracted feature vectors into the neural network for training, output the bit pool parameter set, the psychoacoustic parameter set, and the spectral bandwidth parameter set, and compare them against the optimal value combinations in the training data set to obtain the loss of the neural network.
  • after extensive back-propagation training, a converged neural network covering different target bit rates, different Bluetooth packet types, and different audio data is finally obtained.
  • because the target bit rate and the Bluetooth packet type both correspond to Bluetooth channel conditions, the corresponding optimal combination of the bit pool parameter set, the psychoacoustic parameter set, and the spectral bandwidth parameter set also corresponds to the Bluetooth channel conditions. The neural network thus accounts for changes in Bluetooth channel conditions and for the optimal value combination of related parameters that matches those conditions.
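The supervised training described above (predict the parameter sets, compare against the optimal combination from the training data, back-propagate the loss) can be reduced to a minimal sketch. The one-weight linear model, learning rate, and synthetic (feature, optimal parameter) pairs below are all assumptions for illustration; the patent's actual network architecture is not specified here.

```python
def train_linear_predictor(dataset, lr=0.1, epochs=200):
    """Minimal stand-in for the parameter-predicting network: a
    one-weight linear model fit by gradient descent on the squared
    loss between its prediction and the optimal value from the
    training data set."""
    w = 0.0
    for _ in range(epochs):
        for x, y in dataset:
            pred = w * x
            grad = 2.0 * (pred - y) * x  # d/dw of (w*x - y)^2
            w -= lr * grad
    return w

# Synthetic (feature, optimal parameter) pairs consistent with y = 2x.
dataset = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
w = train_linear_predictor(dataset)
```

After convergence the learned mapping reproduces the optimal parameter for each input, which is exactly the role the trained network plays at encoding time.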
  • Step 304 Encode the first audio data according to one or more of the bit pool parameter set, the psychoacoustic parameter set, and the spectrum bandwidth parameter set to obtain a bit stream to be sent.
  • the terminal device may set the parameters in one or more of the bit pool parameter set, the psychoacoustic parameter set, and the spectral bandwidth parameter set in the encoder, and then encode the first audio data to obtain the encoded bitstream.
  • with reference to the encoding techniques in step 303 above, the first audio data can be encoded according to one or more of the bit pool parameter set, psychoacoustic parameter set, and spectral bandwidth parameter set obtained in this step;
  • the implementation principles are similar and are not repeated here. This both satisfies the Bluetooth limit on bit-rate fluctuation and ensures a higher level of sound quality.
  • the encoding end (i.e., the terminal device) obtains the encoding parameters through the neural network according to the target bit rate and Bluetooth packet type corresponding to the current Bluetooth channel conditions, together with the audio data. This can adaptively match the Bluetooth channel conditions, effectively reduce the bit-rate fluctuation of audio encoding, improve interference resistance during audio transmission, and deliver a continuous audio listening experience while maximally preserving sound quality.
  • FIG. 7 is a schematic structural diagram of an embodiment of an audio encoding device of this application.
  • the device 700 of this embodiment may include: an input module 701, a parameter acquisition module 702, and an encoding module 703.
  • the input module 701 is configured to acquire first audio data, and to acquire a target bit rate and a Bluetooth packet type, where the target bit rate and the Bluetooth packet type correspond to current Bluetooth channel conditions.
  • the target bit rate is used to indicate the average number of bytes of the data packets generated by encoding within a set time period.
  • the Bluetooth packet type refers to the type of packet transmitted over Bluetooth.
  • the parameter acquisition module 702 is configured to obtain, according to the first audio data, the target bit rate, and the Bluetooth packet type, one or more of a bit pool parameter set, a psychoacoustic parameter set, and a spectral bandwidth parameter set through a pre-trained neural network.
  • the parameters in the bit pool parameter set are used to indicate the number of remaining bitstream bits available for encoding.
  • the parameters in the psychoacoustic parameter set are used to indicate the allocation, at different frequencies, of the number of bits required for encoding.
  • the parameters in the spectral bandwidth parameter set are used to indicate the highest cut-off frequency of the encoded audio spectrum.
  • the encoding module 703 is configured to encode the first audio data according to one or more of the bit pool parameter set, the psychoacoustic parameter set, and the spectral bandwidth parameter set to obtain a bitstream to be sent.
  • the parameter acquisition module 702 is specifically configured to perform feature extraction on the first audio data, the target bit rate, and the Bluetooth packet type to obtain a first feature vector;
  • the first feature vector is input to the neural network to obtain one or more of the bit pool parameter set, the psychoacoustic parameter set, and the spectral bandwidth parameter set.
  • the Bluetooth packet type includes any one of 2DH1, 2DH3, 2DH5, 3DH1, 3DH3, and 3DH5.
  • the parameter acquisition module 702 is further configured to construct a training data set for the neural network, the training data set including a correspondence between a first value combination and a second value combination.
  • the first value combination is any one of multiple value combinations of audio data, target bit rate, and Bluetooth packet type.
  • the second value combination is one of multiple value combinations of the bit pool parameter set, the psychoacoustic parameter set, and the spectral bandwidth parameter set.
  • the multiple value combinations of the bit pool parameter set, the psychoacoustic parameter set, and the spectral bandwidth parameter set correspond to multiple ODG scores, among which the second value combination corresponds to the highest ODG score; the neural network is trained according to the training data set.
  • the parameter acquisition module 702 is specifically configured to acquire multiple audio data; under the first value combination, separately encode second audio data with multiple value combinations of the bit pool parameter set, the psychoacoustic parameter set, and the spectral bandwidth parameter set, where the second audio data is any one of the multiple audio data; obtain the multiple ODG scores according to the encoding results; determine the value combination corresponding to the highest of the multiple ODG scores as the second value combination; and add the first value combination and the second value combination to the training data set.
  • the device 700 of this embodiment may be used to implement the technical solutions of the method embodiments shown in FIGS. 3 to 6, and its implementation principles and technical effects are similar, and will not be repeated here.
  • FIG. 8 is a schematic structural diagram of a terminal device provided by this application. As shown in FIG. 8, the terminal device 800 includes a processor 801 and a transceiver 802.
  • the terminal device 800 further includes a memory 803.
  • the processor 801, the transceiver 802, and the memory 803 can communicate with each other through an internal connection path to transfer control signals and/or data signals.
  • the memory 803 is used to store a computer program.
  • the processor 801 is configured to execute a computer program stored in the memory 803, so as to implement various functions of the audio encoding device in the foregoing device embodiment.
  • the memory 803 may also be integrated in the processor 801 or independent of the processor 801.
  • the terminal device 800 may further include an antenna 804 for transmitting the signal output by the transceiver 802.
  • the transceiver 802 receives signals through an antenna.
  • the terminal device 800 may further include a power supply 805 for providing power to various devices or circuits in the terminal device.
  • to make the functions of the terminal device more complete, the terminal device 800 may also include one or more of an input unit 806, a display unit 807 (which may also be regarded as an output unit), an audio circuit 808, a camera 809, a sensor 810, and the like.
  • the audio circuit may also include a speaker 8081, a microphone 8082, etc., which will not be described in detail.
  • the apparatus 800 of this embodiment may be used to implement the technical solutions of the method embodiments shown in FIG. 3 to FIG. 6, and its implementation principles and technical effects are similar, and will not be repeated here.
  • the steps of the foregoing method embodiments can be completed by hardware integrated logic circuits in the processor or instructions in the form of software.
  • the processor can be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
  • the general-purpose processor may be a microprocessor or the processor may also be any conventional processor or the like.
  • the steps of the methods disclosed in the embodiments of the present application may be directly embodied as being executed and completed by a hardware encoding processor, or by a combination of hardware and software modules in the encoding processor.
  • the software module can be located in a storage medium mature in the field, such as a random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, or register.
  • the storage medium is located in the memory, and the processor reads the information in the memory and completes the steps of the above method in combination with its hardware.
  • the memory mentioned in the above embodiments may be volatile memory or non-volatile memory, or may include both volatile and non-volatile memory.
  • the non-volatile memory can be a read-only memory (ROM), a programmable read-only memory (programmable ROM, PROM), an erasable programmable read-only memory (erasable PROM, EPROM), an electrically erasable programmable read-only memory (electrically EPROM, EEPROM), or a flash memory.
  • the volatile memory may be random access memory (RAM), which is used as an external cache.
  • by way of example and not limitation, many forms of RAM are available, such as static random access memory (static RAM, SRAM), dynamic random access memory (dynamic RAM, DRAM), synchronous dynamic random access memory (synchronous DRAM, SDRAM), double data rate synchronous dynamic random access memory (double data rate SDRAM, DDR SDRAM), enhanced synchronous dynamic random access memory (enhanced SDRAM, ESDRAM), synchlink dynamic random access memory (synchlink DRAM, SLDRAM), and direct rambus random access memory (direct rambus RAM, DR RAM).
  • the disclosed system, device, and method may be implemented in other ways.
  • the device embodiments described above are merely illustrative. For example, the division of the units is only a logical function division; in actual implementation there may be other divisions, for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • the functional units in the various embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
  • if the functions are implemented in the form of a software functional unit and sold or used as an independent product, they may be stored in a computer-readable storage medium.
  • based on such an understanding, the technical solution of the present application essentially, or the part contributing to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for instructing a computer device (a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods described in the embodiments of the present application.
  • the aforementioned storage media include various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (read-only memory, ROM), a random access memory (random access memory, RAM), a magnetic disk, or an optical disc.


Abstract

An audio encoding method and apparatus. The audio encoding method includes: acquiring first audio data; acquiring a target bit rate and a Bluetooth packet type, where the target bit rate and the Bluetooth packet type correspond to current Bluetooth channel conditions; obtaining, according to the first audio data, the target bit rate, and the Bluetooth packet type, one or more of a bit pool parameter set, a psychoacoustic parameter set, and a spectral bandwidth parameter set through a pre-trained neural network; and encoding the first audio data according to one or more of the bit pool parameter set, the psychoacoustic parameter set, and the spectral bandwidth parameter set to obtain a bitstream to be sent. The method can adaptively match Bluetooth channel conditions and deliver a continuous audio listening experience while maximally preserving sound quality.

Description

Audio encoding method and apparatus
This application claims priority to Chinese Patent Application No. 201910883038.0, filed with the Chinese Patent Office on September 18, 2019 and entitled "Audio Encoding Method and Apparatus", which is incorporated herein by reference in its entirety.
Technical Field
This application relates to audio processing technologies, and in particular, to an audio encoding method and apparatus.
Background
With the widespread use of wireless Bluetooth devices such as True Wireless Stereo (TWS) earphones, smart speakers, and smart watches in daily life, the demand for a high-quality music playback experience in various scenarios has become increasingly urgent. Because the Bluetooth channel limits the size of transmitted data, audio data must be compressed by the audio encoder at the transmitting end of the Bluetooth device before it can be transmitted to the receiving end for decoding and playback. Current mainstream Bluetooth codec technologies include the Sub-band Coding (SBC) used by default in the Advanced Audio Distribution Profile (A2DP), the Advanced Audio Coding (AAC) family of the Moving Picture Experts Group (MPEG), Sony's LDAC, and Qualcomm's aptX family.
At present, during audio transmission, audio quality depends strictly on the throughput and stability of the Bluetooth connection link. When the channel quality of the link suffers interference, large bit-rate fluctuations cause audio data to be lost in transit, which in turn produces stuttering and interruptions during playback and severely degrades the user experience. Related technologies can limit the range of bit-rate fluctuation, but the control methods are relatively coarse and cannot ensure both sound continuity and sound quality.
Summary
This application provides an audio encoding method and apparatus to adaptively match Bluetooth channel conditions and deliver a continuous audio listening experience while maximally preserving sound quality.
According to a first aspect, this application provides an audio encoding method, including:
acquiring first audio data; acquiring a target bit rate and a Bluetooth packet type, where the target bit rate and the Bluetooth packet type correspond to current Bluetooth channel conditions; obtaining, according to the first audio data, the target bit rate, and the Bluetooth packet type, one or more of a bit pool parameter set, a psychoacoustic parameter set, and a spectral bandwidth parameter set through a pre-trained neural network, where the parameters in the bit pool parameter set are used to indicate the number of remaining bitstream bits available for encoding, the parameters in the psychoacoustic parameter set are used to indicate the allocation, at different frequencies, of the number of bits required for encoding, and the parameters in the spectral bandwidth parameter set are used to indicate the highest cut-off frequency of the encoded audio spectrum; and encoding the first audio data according to one or more of the bit pool parameter set, the psychoacoustic parameter set, and the spectral bandwidth parameter set to obtain a bitstream to be sent.
In this application, the encoding parameters are obtained through a neural network according to the target bit rate and Bluetooth packet type corresponding to the current Bluetooth channel conditions, together with the audio data. This can adaptively match the Bluetooth channel conditions, effectively reduce the bit-rate fluctuation of audio encoding, improve interference resistance during audio transmission, and deliver a continuous audio listening experience while maximally preserving sound quality.
In a possible implementation, the obtaining, according to the first audio data, the target bit rate, and the Bluetooth packet type, of one or more of a bit pool parameter set, a psychoacoustic parameter set, and a spectral bandwidth parameter set through a pre-trained neural network includes: performing feature extraction on the first audio data, the target bit rate, and the Bluetooth packet type to obtain a first feature vector; and inputting the first feature vector into the neural network to obtain one or more of the bit pool parameter set, the psychoacoustic parameter set, and the spectral bandwidth parameter set.
In a possible implementation, the Bluetooth packet type refers to the type of packet transmitted over Bluetooth and may include any one of 2DH1, 2DH3, 2DH5, 3DH1, 3DH3, and 3DH5.
In a possible implementation, the target bit rate is used to indicate the average number of bytes of the data packets generated by encoding within a set time period.
In a possible implementation, before the acquiring of the first audio data, the method further includes: constructing a training data set for the neural network, where the training data set includes a correspondence between a first value combination and a second value combination, the first value combination is any one of multiple value combinations of audio data, target bit rate, and Bluetooth packet type, the second value combination is one of multiple value combinations of the bit pool parameter set, the psychoacoustic parameter set, and the spectral bandwidth parameter set, the multiple value combinations of the bit pool parameter set, the psychoacoustic parameter set, and the spectral bandwidth parameter set correspond to multiple ODG scores, and the second value combination corresponds to the highest ODG score; and training the neural network according to the training data set.
In this application, because the target bit rate and the Bluetooth packet type both correspond to Bluetooth channel conditions during the training of the neural network, the corresponding optimal value combination of the bit pool parameter set, the psychoacoustic parameter set, and the spectral bandwidth parameter set also corresponds to the Bluetooth channel conditions. The neural network therefore accounts for changes in Bluetooth channel conditions and for the optimal value combination of the related parameters that matches those conditions.
In a possible implementation, the constructing of the training data set for the neural network includes: acquiring multiple audio data; under the first value combination, separately encoding second audio data with multiple value combinations of the bit pool parameter set, the psychoacoustic parameter set, and the spectral bandwidth parameter set, where the second audio data is any one of the multiple audio data; obtaining the multiple ODG scores according to the encoding results; determining the value combination corresponding to the highest of the multiple ODG scores as the second value combination; and adding the first value combination and the second value combination to the training data set.
According to a second aspect, this application provides an audio encoding apparatus, including:
an input module, configured to acquire first audio data, and to acquire a target bit rate and a Bluetooth packet type, where the target bit rate and the Bluetooth packet type correspond to current Bluetooth channel conditions; a parameter acquisition module, configured to obtain, according to the first audio data, the target bit rate, and the Bluetooth packet type, one or more of a bit pool parameter set, a psychoacoustic parameter set, and a spectral bandwidth parameter set through a pre-trained neural network, where the parameters in the bit pool parameter set are used to indicate the number of remaining bitstream bits available for encoding, the parameters in the psychoacoustic parameter set are used to indicate the allocation, at different frequencies, of the number of bits required for encoding, and the parameters in the spectral bandwidth parameter set are used to indicate the highest cut-off frequency of the encoded audio spectrum; and an encoding module, configured to encode the first audio data according to one or more of the bit pool parameter set, the psychoacoustic parameter set, and the spectral bandwidth parameter set to obtain a bitstream to be sent.
In a possible implementation, the parameter acquisition module is specifically configured to perform feature extraction on the first audio data, the target bit rate, and the Bluetooth packet type to obtain a first feature vector, and to input the first feature vector into the neural network to obtain one or more of the bit pool parameter set, the psychoacoustic parameter set, and the spectral bandwidth parameter set.
In a possible implementation, the Bluetooth packet type refers to the type of packet transmitted over Bluetooth and may include any one of 2DH1, 2DH3, 2DH5, 3DH1, 3DH3, and 3DH5.
In a possible implementation, the target bit rate is used to indicate the average number of bytes of the data packets generated by encoding within a set time period.
In a possible implementation, the parameter acquisition module is further configured to construct a training data set for the neural network, where the training data set includes a correspondence between a first value combination and a second value combination, the first value combination is any one of multiple value combinations of audio data, target bit rate, and Bluetooth packet type, the second value combination is one of multiple value combinations of the bit pool parameter set, the psychoacoustic parameter set, and the spectral bandwidth parameter set, the multiple value combinations of the bit pool parameter set, the psychoacoustic parameter set, and the spectral bandwidth parameter set correspond to multiple ODG scores, and the second value combination corresponds to the highest ODG score; and to train the neural network according to the training data set.
In a possible implementation, the parameter acquisition module is specifically configured to acquire multiple audio data; under the first value combination, separately encode second audio data with multiple value combinations of the bit pool parameter set, the psychoacoustic parameter set, and the spectral bandwidth parameter set, where the second audio data is any one of the multiple audio data; obtain the multiple ODG scores according to the encoding results; determine the value combination corresponding to the highest of the multiple ODG scores as the second value combination; and add the first value combination and the second value combination to the training data set.
According to a third aspect, this application provides a terminal device, including:
one or more processors; and
a memory configured to store one or more programs,
where when the one or more programs are executed by the one or more processors, the one or more processors are caused to implement the method according to any one of the implementations of the first aspect.
According to a fourth aspect, this application provides a computer-readable storage medium, including a computer program, where when the computer program is executed on a computer, the computer is caused to execute the method according to any one of the implementations of the first aspect.
According to a fifth aspect, this application provides a computer program product, where the computer program product includes computer program code, and when the computer program code is run on a computer, the computer is caused to execute the method according to any one of the implementations of the first aspect.
Brief Description of Drawings
FIG. 1 is an example diagram of an application scenario to which the audio encoding method of this application is applicable;
FIG. 2 is an example diagram of the audio encoding system of this application;
FIG. 3 is a flowchart of an embodiment of the audio encoding method of this application;
FIG. 4 is a schematic diagram of a psychoacoustic process;
FIG. 5 is a schematic diagram of a parameter acquisition method;
FIG. 6 is a schematic diagram of a method for constructing a training data set;
FIG. 7 is a schematic structural diagram of an embodiment of the audio encoding apparatus of this application;
FIG. 8 is a schematic structural diagram of the terminal device provided by this application.
Detailed Description
To make the objectives, technical solutions, and advantages of this application clearer, the following clearly and completely describes the technical solutions in this application with reference to the accompanying drawings. Apparently, the described embodiments are some rather than all of the embodiments of this application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of this application without creative effort shall fall within the protection scope of this application.
In the specification, claims, and accompanying drawings of this application, the terms "first", "second", and so on are intended only to distinguish between descriptions and shall not be understood as indicating or implying relative importance or order. Moreover, the terms "include" and "have" and any variants thereof are intended to cover a non-exclusive inclusion, for example, the inclusion of a series of steps or units. A method, system, product, or device is not necessarily limited to those steps or units expressly listed, but may include other steps or units not expressly listed or inherent to such a process, method, product, or device.
It should be understood that in this application, "at least one (item)" means one or more, and "multiple" means two or more. The term "and/or" describes an association between associated objects and indicates three possible relationships; for example, "A and/or B" may indicate that only A exists, only B exists, or both A and B exist, where A and B may be singular or plural. The character "/" generally indicates an "or" relationship between the associated objects. "At least one of the following items" or a similar expression means any combination of these items, including any combination of single items or plural items. For example, at least one of a, b, or c may represent a, b, c, "a and b", "a and c", "b and c", or "a and b and c", where a, b, and c each may be singular or plural.
FIG. 1 is an example diagram of an application scenario to which the audio encoding method of this application is applicable. As shown in FIG. 1, the application scenario includes a terminal device and a Bluetooth device, each of which may be a device that supports a Bluetooth connection and the AAC family of standards. The terminal device may be, for example, a mobile phone, a computer (including a laptop or desktop), or a tablet (including a handheld or in-vehicle tablet). The Bluetooth playback device may be, for example, TWS earphones, wireless over-ear headphones, or wireless neckband earphones; the Bluetooth device may also be, for example, a smart speaker, a smart watch, smart glasses, or an in-vehicle speaker. The most common application scenario of this application lies between a mobile phone and a Bluetooth device, that is, between a mobile phone and TWS earphones, wireless over-ear headphones, or wireless neckband earphones, or between a mobile phone and a smart speaker, smart watch, smart glasses, in-vehicle speaker, or the like. However, this application is not limited thereto.
FIG. 2 is an example diagram of the audio encoding system of this application. As shown in FIG. 2, the audio encoding system includes an input module, a processing module, and an output module.
The data acquired by the input module includes audio data, for example an audio Pulse Code Modulation (PCM) bitstream, and a target bit rate and a Bluetooth packet type determined based on Bluetooth channel conditions; the target bit rate and the Bluetooth packet type correspond to the current Bluetooth channel conditions. The target bit rate indicates the average number of bytes of the data packets generated by encoding within a set time period. The Bluetooth packet type refers to the type of packet transmitted over Bluetooth. In a Bluetooth connection link, the Bluetooth packet type used on the Asynchronous Connection-Less (ACL) link that carries the audio bitstream may include any one of 2DH1 (limiting the maximum data packet in the transmitted audio bitstream to 31 bytes), 2DH3 (maximum 356 bytes), 2DH5 (maximum 656 bytes), 3DH1 (maximum 11 bytes), 3DH3 (maximum 536 bytes), and 3DH5 (maximum 986 bytes), where 2DH1, 2DH3, and 2DH5 use π/4 Differential Quadrature Phase Shift Keying (DQPSK) modulation, and 3DH1, 3DH3, and 3DH5 use 8DQPSK modulation. When Bluetooth interference is low and the channel state is good, 2DH5 or 3DH5 is preferred: these two packet types have larger data-transmission capacity and weaker interference resistance, and allow the audio encoder to work at a target bit rate above 128 kbps for higher-quality transmission. When Bluetooth interference is high and the channel state is poor, 2DH3, 3DH3, 2DH1, or 3DH1 is preferred: these packet types have stronger interference resistance and smaller data-transmission capacity, allow the audio encoder to work at a target bit rate below 96 kbps, and give priority to the continuity of audio transmission.
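The packet-type preference described above can be sketched as a small lookup plus a selection rule. The payload limits are the ones stated in the description; the two-way good/poor channel split and the `prefer_8dqpsk` flag are simplifying assumptions for illustration, since the text lists several acceptable fallback types rather than a single one.

```python
# Payload limits in bytes, as stated in the description above.
MAX_PAYLOAD = {"2DH1": 31, "2DH3": 356, "2DH5": 656,
               "3DH1": 11, "3DH3": 536, "3DH5": 986}

def pick_packet_type(channel_good, prefer_8dqpsk=False):
    """Follow the preference in the text: large-capacity packets
    (2DH5/3DH5) when the channel is clean, robust small packets when
    it is heavily interfered. The binary choice here is a deliberate
    simplification of the text's fallback list."""
    if channel_good:
        return "3DH5" if prefer_8dqpsk else "2DH5"
    return "3DH1" if prefer_8dqpsk else "2DH1"

ptype = pick_packet_type(channel_good=True)
```

The chosen packet type then bounds the size of every encoded data packet, which is why it constrains the encoder's allowed bit-rate fluctuation.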
The processing module includes a parameter-tuning submodule, an encoding submodule, and an auxiliary submodule. The parameter-tuning submodule includes two functions, feature extraction and a neural network, and is used to determine the optimal value combination of encoding parameters from the data provided by the input module. The encoding submodule includes three functions, parameter configuration, encoding, and decoding, and is used to encode the audio data with the optimal value combination of encoding parameters and to decode the bitstream. The auxiliary submodule includes two functions, bit-rate fluctuation statistics and objective difference grading (ODG scoring), and is used to collect statistics on the variation in the number of bytes of the data packets generated by encoding and to score the sound quality of the audio after encoding and decoding. The ODG score is obtained through the Perceptual Evaluation of Audio Quality (PEAQ) algorithm in International Telecommunication Union (ITU) standard BS.1387-1; the score ranges from -4 to 0, and the closer the score is to 0, the better the sound quality of the encoded-and-decoded audio.
The data output by the output module is the audio bitstream composed of the data packets generated by encoding, encapsulated according to the Bluetooth packet type.
FIG. 3 is a flowchart of an embodiment of the audio encoding method of this application. As shown in FIG. 3, the method of this embodiment may be executed by the terminal device in FIG. 1, for example a mobile phone, a computer (including a laptop or desktop), or a tablet (including a handheld or in-vehicle tablet). The audio encoding method may include:
Step 301: Acquire first audio data.
The first audio data is the audio data to be encoded. The terminal device may read the first audio data directly from a local memory, or may receive the first audio data from another device; this application does not specifically limit this.
Step 302: Acquire a target bit rate and a Bluetooth packet type, where the target bit rate and the Bluetooth packet type correspond to current Bluetooth channel conditions.
The target bit rate indicates the average number of bytes of the data packets generated by encoding within a set time period; that is, the target bit rate may be regarded as the expected average number of bytes per data packet after the first audio data is encoded. Owing to many factors, it is unlikely that the byte count (bit rate) of every data packet produced by encoding will exactly reach the target bit rate, so the bit rate of individual data packets is allowed to fluctuate within a small range around the target bit rate; it is only required that the average bit rate of the multiple data packets within the set time period satisfy the target bit rate. The Bluetooth packet type refers to the type of packet transmitted over Bluetooth and may include any one of 2DH1, 2DH3, 2DH5, 3DH1, 3DH3, and 3DH5, each of which corresponds to an upper limit of bit-rate fluctuation. In this application, the target bit rate and the Bluetooth packet type both correspond to the current Bluetooth channel conditions, that is, both are determined based on the Bluetooth channel conditions; the target bit rate and the Bluetooth packet type are therefore themselves a reflection of the Bluetooth channel conditions.
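The long-run constraint just described (individual packets may fluctuate, but the windowed average must meet the target) can be expressed in a few lines. The 2% tolerance below is an illustrative assumption, not a value from this application.

```python
def meets_target_rate(packet_sizes, target_bytes, tolerance=0.02):
    """Check the long-run constraint: individual packet sizes may
    fluctuate, but their average over the window must stay within
    `tolerance` of the target byte count."""
    avg = sum(packet_sizes) / len(packet_sizes)
    return abs(avg - target_bytes) <= tolerance * target_bytes

# Packets fluctuate around 300 bytes but average out to the target.
packets = [290, 310, 305, 295]
ok = meets_target_rate(packets, 300)
```

This is the sense in which the target bit rate constrains the average rather than each individual packet.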
In the embodiment shown in FIG. 3, step 301 and step 302 may be performed in either order.
Step 303: Obtain, according to the first audio data, the target bit rate, and the Bluetooth packet type, one or more of a bit pool parameter set, a psychoacoustic parameter set, and a spectral bandwidth parameter set through a pre-trained neural network.
The parameters in the bit pool parameter set indicate the number of remaining bitstream bits available for encoding. In the related art, the size of the bit pool is adjusted to control bit-rate fluctuation in the constant bitrate (CBR) encoding mode, so that the bit rate may fluctuate instantaneously while converging over the long term. This method allows a certain degree of bit-rate fluctuation in CBR mode and provides better sound quality by allocating different numbers of bits to different audio data. When the actual number of allocated bits (the bit rate) is less than the target number of bits (the target bit rate), the surplus bits are put into the bit pool; when the actual number exceeds the target, some bits are taken out of the bit pool for use. Because the bit pool is not infinitely large, the average bit rate over a long period of encoding remains constrained near the target bit rate of the CBR mode. The bit pool state in this method is determined jointly by all historical frames and the current frame, and reflects the bit-rate fluctuation and compressibility over the entire period from the past to the current state. A large bit pool permits large bit-rate fluctuation and hence higher encoded sound quality; a small bit pool permits only small fluctuation and hence lower encoded sound quality.
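The deposit/withdraw behaviour of the bit pool described above can be sketched directly. This is a minimal illustration of the mechanism, not the AAC encoder's actual bit-reservoir accounting; the capacity and bit counts are arbitrary example values.

```python
class BitPool:
    """Sketch of the bit-reservoir behaviour described above: frames
    cheaper than the target deposit their surplus bits, expensive
    frames withdraw from the pool, and the bounded capacity keeps the
    long-run average near the target."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.level = 0

    def account(self, target_bits, needed_bits):
        """Return the number of bits granted to the current frame."""
        if needed_bits <= target_bits:
            # Deposit the surplus, up to the pool capacity.
            self.level = min(self.capacity,
                             self.level + target_bits - needed_bits)
            return needed_bits
        # Withdraw as much of the deficit as the pool can cover.
        withdrawn = min(self.level, needed_bits - target_bits)
        self.level -= withdrawn
        return target_bits + withdrawn

pool = BitPool(capacity=100)
pool.account(target_bits=500, needed_bits=440)            # deposits 60
granted = pool.account(target_bits=500, needed_bits=580)  # withdraws 60
```

Because the pool capacity bounds how far any run of frames can drift from the target, a larger capacity permits larger instantaneous fluctuation (and better quality), exactly as the text states.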
The parameters in the psychoacoustic parameter set indicate the allocation, at different frequencies, of the number of bits required for encoding. In the related art, a psychoacoustic model determines which information in a piece of audio is primary and must be retained during encoding, and which information is secondary and may be ignored. For example, FIG. 4 is a schematic diagram of a psychoacoustic process. As shown in FIG. 4, there is a masker at 900 Hz with very high energy; audio near the masker whose energy, in decibels, lies below the dashed line will not be heard. This shows that the information below the dashed line need not be encoded, reducing the number of encoded bits. Masking is determined by three parts: the in-band masking parameter dr, the attenuation slope k1 masking the lower frequency band, and the attenuation slope k2 masking the higher frequency band. In the AAC quantization process, the three parameters dr, k1, and k2 directly determine the number of bits (the bit rate) of the data packets generated by encoding. If the bit rate of the actual data packets is greater than the target bit rate, dr is decreased; if it is less than the target bit rate, dr is increased.
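The role of dr, k1, and k2 can be illustrated with a simple triangular masking curve. The linear-in-Hz slopes and all numeric values below are illustrative assumptions (real psychoacoustic models work on a Bark-like frequency scale); only the qualitative roles of the three parameters follow the text.

```python
def masking_threshold(masker_freq, masker_level, freq, dr, k1, k2):
    """Triangular masking curve implied by the text: `dr` sets the
    in-band drop below the masker level, while `k1` and `k2` set the
    attenuation slopes (dB per Hz, an illustrative simplification) on
    the low- and high-frequency sides. Spectral components below the
    returned threshold need not be coded."""
    if freq <= masker_freq:
        return masker_level - dr - k1 * (masker_freq - freq)
    return masker_level - dr - k2 * (freq - masker_freq)

# Masker at 900 Hz and 80 dB; threshold evaluated 100 Hz above it.
thr = masking_threshold(900, 80.0, 1000, dr=10.0, k1=0.03, k2=0.01)
```

Raising dr lowers the whole threshold curve, so more components fall above it and must be coded, which is why the text decreases dr when packets exceed the target bit rate and increases it when they fall short.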
The parameters in the spectral bandwidth parameter set indicate the highest cut-off frequency of the encoded audio spectrum. The higher the cut-off frequency, the richer the corresponding high-frequency components of the audio, which to a certain extent improves sound quality.
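The effect of the spectral-bandwidth parameter can be shown as a band limit applied to spectral bins. The uniform bin spacing and the inclusive cut-off below are simplifying assumptions for illustration.

```python
def apply_bandwidth_limit(spectrum, bin_hz, cutoff_hz):
    """Zero every spectral bin above the configured cut-off frequency,
    which is the effect of the spectral-bandwidth parameter described
    above: bins past the cut-off are simply not coded."""
    limit = int(cutoff_hz // bin_hz)
    return [v if i <= limit else 0.0 for i, v in enumerate(spectrum)]

# 8 bins of 1 kHz each, cut off at 4 kHz: bins 5-7 are dropped.
out = apply_bandwidth_limit([1.0] * 8, 1000, 4000)
```

A lower cut-off saves bits at the cost of high-frequency detail, which is the trade-off the parameter set lets the network steer.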
The terminal device may perform feature extraction on the first audio data, the target bit rate, and the Bluetooth packet type to obtain a first feature vector, and input the first feature vector into the neural network to obtain one or more of the bit pool parameter set, the psychoacoustic parameter set, and the spectral bandwidth parameter set. FIG. 5 is a schematic diagram of the parameter acquisition method. As shown in FIG. 5, the terminal device performs feature transformation on the first audio data, the target bit rate, and the Bluetooth packet type and extracts a feature vector, for example the bit rate and the Mel-frequency cepstral coefficients representing music features, or the linear prediction cepstral coefficients representing music features. This feature extraction reduces the data dimensionality and thereby the amount of computation. The terminal device inputs the feature vector into the pre-trained neural network to obtain one or more of the bit pool parameter set, the psychoacoustic parameter set, and the spectral bandwidth parameter set.
It should be noted that, besides using the above neural network to obtain one or more of the bit pool parameter set, the psychoacoustic parameter set, and the spectral bandwidth parameter set, this application may also obtain these parameter sets through other artificial intelligence (AI) methods, mathematical operations, and the like, which is not specifically limited in this application.
In this application, the terminal device may construct a training data set for the neural network. The training data set includes a correspondence between a first value combination and a second value combination: the first value combination is any one of multiple value combinations of audio data, target bit rate, and Bluetooth packet type; the second value combination is one of multiple value combinations of the bit pool parameter set, the psychoacoustic parameter set, and the spectral bandwidth parameter set; and the multiple value combinations of the bit pool parameter set, the psychoacoustic parameter set, and the spectral bandwidth parameter set correspond to multiple ODG scores, among which the second value combination corresponds to the highest ODG score. The neural network is trained according to the training data set.
For example, FIG. 6 is a schematic diagram of the method for constructing the training data set. As shown in FIG. 6, the terminal device acquires multiple audio data. Under the first value combination, second audio data is encoded separately with multiple value combinations of the bit pool parameter set, the psychoacoustic parameter set, and the spectral bandwidth parameter set, where the second audio data is any one of the multiple audio data. Multiple ODG scores are obtained according to the encoding results. The value combination corresponding to the highest of the multiple ODG scores is determined as the second value combination. The first value combination and the second value combination are added to the training data set. That is, the terminal device first collects a large number of music files of different styles and genres. Then, for the audio data in each music file, under each value combination of audio data, target bit rate, and Bluetooth packet type, the value combinations of the bit pool parameter set, the psychoacoustic parameter set, and the spectral bandwidth parameter set are varied continuously, the audio data is encoded with each value combination, and for each encoding the bit-rate fluctuation of the generated data packets is measured and an ODG score is computed. Finally, the value combination of the bit pool parameter set, the psychoacoustic parameter set, and the spectral bandwidth parameter set corresponding to the highest ODG score that satisfies the bit-rate fluctuation requirement is output, yielding the correspondence between x = (one value combination of Bluetooth packet type, target bit rate, and audio data) and y = (the optimal value combination of the bit pool parameter set, the psychoacoustic parameter set, and the spectral bandwidth parameter set), where x is the input of the neural network, y is its output, and (x, y) constitutes the training data set of the neural network.
Based on the above training data set, the terminal device may input the extracted feature vectors into the neural network for training, output the bit pool parameter set, the psychoacoustic parameter set, and the spectral bandwidth parameter set, and compare them with the optimal value combinations in the training data set to obtain the loss of the neural network. After extensive back-propagation training, a converged neural network capable of prediction for different target bit rates, different Bluetooth packet types, and different audio data is finally obtained.
Because the target bit rate and the Bluetooth packet type both correspond to Bluetooth channel conditions during the training of the neural network, the corresponding optimal value combination of the bit pool parameter set, the psychoacoustic parameter set, and the spectral bandwidth parameter set also corresponds to the Bluetooth channel conditions. The neural network thus accounts for changes in Bluetooth channel conditions and for the optimal value combination of the related parameters that matches those conditions.
Step 304: Encode the first audio data according to one or more of the bit pool parameter set, the psychoacoustic parameter set, and the spectral bandwidth parameter set to obtain a bitstream to be sent.
The terminal device may set the parameters in one or more of the bit pool parameter set, the psychoacoustic parameter set, and the spectral bandwidth parameter set in the encoder, and then encode the first audio data to obtain the encoded bitstream. With reference to the encoding techniques in step 303 above, the first audio data may be encoded according to one or more of the bit pool parameter set, the psychoacoustic parameter set, and the spectral bandwidth parameter set obtained in this step; the implementation principles are similar and are not repeated here. This both satisfies the Bluetooth limit on bit-rate fluctuation and ensures a higher level of sound quality.
In this application, the encoding end (that is, the terminal device) obtains the encoding parameters through the neural network according to the target bit rate and Bluetooth packet type corresponding to the current Bluetooth channel conditions, together with the audio data. This can adaptively match the Bluetooth channel conditions, effectively reduce the bit-rate fluctuation of audio encoding, improve interference resistance during audio transmission, and deliver a continuous audio listening experience while maximally preserving sound quality.
FIG. 7 is a schematic structural diagram of an embodiment of the audio encoding apparatus of this application. As shown in FIG. 7, the apparatus 700 of this embodiment may include an input module 701, a parameter acquisition module 702, and an encoding module 703. The input module 701 is configured to acquire first audio data, and to acquire a target bit rate and a Bluetooth packet type, where the target bit rate and the Bluetooth packet type correspond to current Bluetooth channel conditions; the target bit rate indicates the average number of bytes of the data packets generated by encoding within a set time period, and the Bluetooth packet type refers to the type of packet transmitted over Bluetooth. The parameter acquisition module 702 is configured to obtain, according to the first audio data, the target bit rate, and the Bluetooth packet type, one or more of a bit pool parameter set, a psychoacoustic parameter set, and a spectral bandwidth parameter set through a pre-trained neural network, where the parameters in the bit pool parameter set are used to indicate the number of remaining bitstream bits available for encoding, the parameters in the psychoacoustic parameter set are used to indicate the allocation, at different frequencies, of the number of bits required for encoding, and the parameters in the spectral bandwidth parameter set are used to indicate the highest cut-off frequency of the encoded audio spectrum. The encoding module 703 is configured to encode the first audio data according to one or more of the bit pool parameter set, the psychoacoustic parameter set, and the spectral bandwidth parameter set to obtain a bitstream to be sent.
In a possible implementation, the parameter acquisition module 702 is specifically configured to perform feature extraction on the first audio data, the target bit rate, and the Bluetooth packet type to obtain a first feature vector, and to input the first feature vector into the neural network to obtain one or more of the bit pool parameter set, the psychoacoustic parameter set, and the spectral bandwidth parameter set.
In a possible implementation, the Bluetooth packet type includes any one of 2DH1, 2DH3, 2DH5, 3DH1, 3DH3, and 3DH5.
In a possible implementation, the parameter acquisition module 702 is further configured to construct a training data set for the neural network, where the training data set includes a correspondence between a first value combination and a second value combination, the first value combination is any one of multiple value combinations of audio data, target bit rate, and Bluetooth packet type, the second value combination is one of multiple value combinations of the bit pool parameter set, the psychoacoustic parameter set, and the spectral bandwidth parameter set, the multiple value combinations of the bit pool parameter set, the psychoacoustic parameter set, and the spectral bandwidth parameter set correspond to multiple ODG scores, and the second value combination corresponds to the highest ODG score; and to train the neural network according to the training data set.
In a possible implementation, the parameter acquisition module 702 is specifically configured to acquire multiple audio data; under the first value combination, separately encode second audio data with multiple value combinations of the bit pool parameter set, the psychoacoustic parameter set, and the spectral bandwidth parameter set, where the second audio data is any one of the multiple audio data; obtain the multiple ODG scores according to the encoding results; determine the value combination corresponding to the highest of the multiple ODG scores as the second value combination; and add the first value combination and the second value combination to the training data set.
The apparatus 700 of this embodiment may be used to execute the technical solutions of the method embodiments shown in FIG. 3 to FIG. 6; their implementation principles and technical effects are similar and are not repeated here.
FIG. 8 is a schematic structural diagram of the terminal device provided by this application. As shown in FIG. 8, the terminal device 800 includes a processor 801 and a transceiver 802.
Optionally, the terminal device 800 further includes a memory 803. The processor 801, the transceiver 802, and the memory 803 may communicate with one another through an internal connection path to transfer control signals and/or data signals.
The memory 803 is configured to store a computer program. The processor 801 is configured to execute the computer program stored in the memory 803 to implement the functions of the audio encoding apparatus in the foregoing apparatus embodiment.
Optionally, the memory 803 may be integrated into the processor 801 or be independent of the processor 801.
Optionally, the terminal device 800 may further include an antenna 804 for transmitting the signals output by the transceiver 802. Alternatively, the transceiver 802 receives signals through the antenna.
Optionally, the terminal device 800 may further include a power supply 805 for supplying power to the devices or circuits in the terminal device.
In addition, to make the functions of the terminal device more complete, the terminal device 800 may further include one or more of an input unit 806, a display unit 807 (which may also be regarded as an output unit), an audio circuit 808, a camera 809, and a sensor 810. The audio circuit may further include a speaker 8081, a microphone 8082, and the like, which are not described in detail.
The apparatus 800 of this embodiment may be used to execute the technical solutions of the method embodiments shown in FIG. 3 to FIG. 6; their implementation principles and technical effects are similar and are not repeated here.
In the implementation process, the steps of the foregoing method embodiments may be completed by integrated logic circuits of hardware in the processor or by instructions in the form of software. The processor may be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of the methods disclosed in the embodiments of this application may be directly embodied as being executed and completed by a hardware encoding processor, or by a combination of hardware and software modules in the encoding processor. The software module may be located in a storage medium mature in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register. The storage medium is located in the memory, and the processor reads the information in the memory and completes the steps of the foregoing methods in combination with its hardware.
The memory mentioned in the foregoing embodiments may be a volatile memory or a non-volatile memory, or may include both volatile and non-volatile memory. The non-volatile memory may be a read-only memory (ROM), a programmable ROM (PROM), an erasable PROM (EPROM), an electrically erasable PROM (EEPROM), or a flash memory. The volatile memory may be a random access memory (RAM), used as an external cache. By way of example and not limitation, many forms of RAM are available, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), and direct rambus RAM (DR RAM). It should be noted that the memories of the systems and methods described herein are intended to include, but not be limited to, these and any other suitable types of memory.
A person of ordinary skill in the art may be aware that the units and algorithm steps of the examples described with reference to the embodiments disclosed herein can be implemented by electronic hardware or by a combination of computer software and electronic hardware. Whether these functions are performed by hardware or software depends on the particular application and design constraints of the technical solution. A skilled person may use different methods to implement the described functions for each particular application, but such implementation shall not be considered beyond the scope of this application.
It may be clearly understood by a person skilled in the art that, for convenience and brevity of description, for the specific working processes of the systems, apparatuses, and units described above, reference may be made to the corresponding processes in the foregoing method embodiments, and details are not repeated here.
In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative; the division of the units is only a logical function division, and there may be other divisions in actual implementation, for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented. In addition, the displayed or discussed mutual couplings or direct couplings or communication connections may be indirect couplings or communication connections through some interfaces, apparatuses, or units, and may be in electrical, mechanical, or other forms.
The units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
In addition, the functional units in the embodiments of this application may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit.
If the functions are implemented in the form of a software functional unit and sold or used as an independent product, they may be stored in a computer-readable storage medium. Based on such an understanding, the technical solution of this application essentially, or the part contributing to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for instructing a computer device (a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods described in the embodiments of this application. The aforementioned storage media include various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The foregoing is only the specific implementation of this application, but the protection scope of this application is not limited thereto. Any variation or replacement readily conceivable by a person skilled in the art within the technical scope disclosed in this application shall fall within the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.

Claims (15)

  1. An audio encoding method, comprising:
    acquiring first audio data;
    acquiring a target bit rate and a Bluetooth packet type, wherein the target bit rate and the Bluetooth packet type correspond to current Bluetooth channel conditions;
    obtaining, according to the first audio data, the target bit rate, and the Bluetooth packet type, one or more of a bit pool parameter set, a psychoacoustic parameter set, and a spectral bandwidth parameter set through a pre-trained neural network, wherein parameters in the bit pool parameter set are used to indicate a number of remaining bitstream bits available for encoding, parameters in the psychoacoustic parameter set are used to indicate an allocation, at different frequencies, of a number of bits required for encoding, and parameters in the spectral bandwidth parameter set are used to indicate a highest cut-off frequency of an encoded audio spectrum; and
    encoding the first audio data according to one or more of the bit pool parameter set, the psychoacoustic parameter set, and the spectral bandwidth parameter set to obtain a bitstream to be sent.
  2. The method according to claim 1, wherein the obtaining, according to the first audio data, the target bit rate, and the Bluetooth packet type, one or more of a bit pool parameter set, a psychoacoustic parameter set, and a spectral bandwidth parameter set through a pre-trained neural network comprises:
    performing feature extraction on the first audio data, the target bit rate, and the Bluetooth packet type to obtain a first feature vector; and
    inputting the first feature vector into the neural network to obtain one or more of the bit pool parameter set, the psychoacoustic parameter set, and the spectral bandwidth parameter set.
  3. The method according to claim 1 or 2, wherein the Bluetooth packet type comprises any one of 2DH1, 2DH3, 2DH5, 3DH1, 3DH3, and 3DH5.
  4. The method according to any one of claims 1-3, wherein the target bit rate is used to indicate an average number of bytes of data packets generated by encoding within a set time period.
  5. The method according to any one of claims 1-4, wherein before the acquiring of the first audio data, the method further comprises:
    constructing a training data set for the neural network, wherein the training data set comprises a correspondence between a first value combination and a second value combination, the first value combination is any one of multiple value combinations of audio data, target bit rate, and Bluetooth packet type, the second value combination is one of multiple value combinations of the bit pool parameter set, the psychoacoustic parameter set, and the spectral bandwidth parameter set, the multiple value combinations of the bit pool parameter set, the psychoacoustic parameter set, and the spectral bandwidth parameter set correspond to multiple ODG scores, and the second value combination corresponds to the highest ODG score; and
    training the neural network according to the training data set.
  6. The method according to claim 5, wherein the constructing of the training data set for the neural network comprises:
    acquiring multiple audio data;
    under the first value combination, separately encoding second audio data with multiple value combinations of the bit pool parameter set, the psychoacoustic parameter set, and the spectral bandwidth parameter set, wherein the second audio data is any one of the multiple audio data;
    obtaining the multiple ODG scores according to the encoding results;
    determining the value combination corresponding to the highest of the multiple ODG scores as the second value combination; and
    adding the first value combination and the second value combination to the training data set.
  7. An audio encoding apparatus, comprising:
    an input module, configured to acquire first audio data, and to acquire a target bit rate and a Bluetooth packet type, wherein the target bit rate and the Bluetooth packet type correspond to current Bluetooth channel conditions;
    a parameter acquisition module, configured to obtain, according to the first audio data, the target bit rate, and the Bluetooth packet type, one or more of a bit pool parameter set, a psychoacoustic parameter set, and a spectral bandwidth parameter set through a pre-trained neural network, wherein parameters in the bit pool parameter set are used to indicate a number of remaining bitstream bits available for encoding, parameters in the psychoacoustic parameter set are used to indicate an allocation, at different frequencies, of a number of bits required for encoding, and parameters in the spectral bandwidth parameter set are used to indicate a highest cut-off frequency of an encoded audio spectrum; and
    an encoding module, configured to encode the first audio data according to one or more of the bit pool parameter set, the psychoacoustic parameter set, and the spectral bandwidth parameter set to obtain a bitstream to be sent.
  8. The apparatus according to claim 7, wherein the parameter acquisition module is specifically configured to perform feature extraction on the first audio data, the target bit rate, and the Bluetooth packet type to obtain a first feature vector, and to input the first feature vector into the neural network to obtain one or more of the bit pool parameter set, the psychoacoustic parameter set, and the spectral bandwidth parameter set.
  9. The apparatus according to claim 7 or 8, wherein the Bluetooth packet type comprises any one of 2DH1, 2DH3, 2DH5, 3DH1, 3DH3, and 3DH5.
  10. The apparatus according to any one of claims 7-9, wherein the target bit rate is used to indicate an average number of bytes of data packets generated by encoding within a set time period.
  11. The apparatus according to any one of claims 7-10, wherein the parameter acquisition module is further configured to construct a training data set for the neural network, wherein the training data set comprises a correspondence between a first value combination and a second value combination, the first value combination is any one of multiple value combinations of audio data, target bit rate, and Bluetooth packet type, the second value combination is one of multiple value combinations of the bit pool parameter set, the psychoacoustic parameter set, and the spectral bandwidth parameter set, the multiple value combinations of the bit pool parameter set, the psychoacoustic parameter set, and the spectral bandwidth parameter set correspond to multiple ODG scores, and the second value combination corresponds to the highest ODG score; and to train the neural network according to the training data set.
  12. The apparatus according to claim 11, wherein the parameter acquisition module is specifically configured to: acquire multiple audio data; under the first value combination, separately encode second audio data with multiple value combinations of the bit pool parameter set, the psychoacoustic parameter set, and the spectral bandwidth parameter set, wherein the second audio data is any one of the multiple audio data; obtain the multiple ODG scores according to the encoding results; determine the value combination corresponding to the highest of the multiple ODG scores as the second value combination; and add the first value combination and the second value combination to the training data set.
  13. A terminal device, comprising:
    one or more processors; and
    a memory configured to store one or more programs,
    wherein when the one or more programs are executed by the one or more processors, the one or more processors are caused to implement the method according to any one of claims 1-6.
  14. A computer-readable storage medium, comprising a computer program, wherein when the computer program is executed on a computer, the computer is caused to execute the method according to any one of claims 1-6.
  15. A computer program product, comprising computer program code, wherein when the computer program code is run on a computer, the computer is caused to execute the method according to any one of claims 1-6.
PCT/CN2020/115123 2019-09-18 2020-09-14 Audio encoding method and apparatus WO2021052293A1 (zh)

Priority Applications (4)

Application Number Priority Date Filing Date Title
JP2022517444A JP7387879B2 (ja) 2019-09-18 2020-09-14 Audio encoding method and apparatus
EP20865475.6A EP4024394A4 (en) 2019-09-18 2020-09-14 AUDIO CODING METHOD AND APPARATUS
KR1020227012578A KR20220066316A (ko) 2019-09-18 2020-09-14 오디오 코딩 방법 및 장치
US17/697,455 US20220208200A1 (en) 2019-09-18 2022-03-17 Audio coding method and apparatus

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910883038.0 2019-09-18
CN201910883038.0A CN112530444B (zh) 2019-09-18 Audio encoding method and apparatus

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/697,455 Continuation US20220208200A1 (en) 2019-09-18 2022-03-17 Audio coding method and apparatus

Publications (1)

Publication Number Publication Date
WO2021052293A1 true WO2021052293A1 (zh) 2021-03-25

Family

ID=74883171

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/115123 WO2021052293A1 (zh) 2019-09-18 2020-09-14 Audio encoding method and apparatus

Country Status (6)

Country Link
US (1) US20220208200A1 (zh)
EP (1) EP4024394A4 (zh)
JP (1) JP7387879B2 (zh)
KR (1) KR20220066316A (zh)
CN (1) CN112530444B (zh)
WO (1) WO2021052293A1 (zh)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112530444B (zh) * 2019-09-18 2023-10-03 华为技术有限公司 Audio encoding method and apparatus
CN114550732B (zh) * 2022-04-15 2022-07-08 腾讯科技(深圳)有限公司 Encoding/decoding method for high-frequency audio signals and related apparatus
CN114783452B (zh) * 2022-06-17 2022-12-13 荣耀终端有限公司 Audio playback method, apparatus, and storage medium
CN114863940B (zh) * 2022-07-05 2022-09-30 北京百瑞互联技术有限公司 Model training method for sound-quality conversion, method for improving sound quality, apparatus, and medium
CN117440440B (zh) * 2023-12-21 2024-03-15 艾康恩(深圳)电子科技有限公司 Low-latency transmission method for Bluetooth earphones

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5185800A (en) * 1989-10-13 1993-02-09 Centre National D'etudes Des Telecommunications Bit allocation device for transformed digital audio broadcasting signals with adaptive quantization based on psychoauditive criterion
CN1677492A (zh) * 2004-04-01 2005-10-05 北京宫羽数字技术有限责任公司 Enhanced audio encoding/decoding apparatus and method
CN101136202A (zh) * 2006-08-29 2008-03-05 华为技术有限公司 Audio signal processing system and method, and audio signal transceiver apparatus
CN101163248A (zh) * 2007-11-19 2008-04-16 华为技术有限公司 Bitstream scheduling method, apparatus, and system
US20100121632A1 (en) * 2007-04-25 2010-05-13 Panasonic Corporation Stereo audio encoding device, stereo audio decoding device, and their method
CN101853663A (zh) * 2009-03-30 2010-10-06 华为技术有限公司 Bit allocation method, encoding apparatus, and decoding apparatus
CN102436819A (zh) * 2011-10-25 2012-05-02 杭州微纳科技有限公司 Wireless audio compression and decompression methods, audio encoder, and audio decoder
CN103532936A (zh) * 2013-09-28 2014-01-22 福州瑞芯微电子有限公司 Bluetooth audio adaptive transmission method
CN109981545A (zh) * 2017-12-28 2019-07-05 北京松果电子有限公司 Encoding bit rate adjustment apparatus, method, and electronic device

Family Cites Families (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5956674A (en) * 1995-12-01 1999-09-21 Digital Theater Systems, Inc. Multi-channel predictive subband audio coder using psychoacoustic adaptive bit allocation in frequency, time and over the multiple channels
EP1438673B1 (en) * 2001-09-26 2012-11-21 Interact Devices Inc. System and method for communicating media signals
EP1873753A1 (en) * 2004-04-01 2008-01-02 Beijing Media Works Co., Ltd Enhanced audio encoding/decoding device and method
CN101308659B (zh) * 2007-05-16 2011-11-30 中兴通讯股份有限公司 Processing method based on the psychoacoustic model of an advanced audio coder
US20090099851A1 (en) * 2007-10-11 2009-04-16 Broadcom Corporation Adaptive bit pool allocation in sub-band coding
CN101350199A (zh) * 2008-07-29 2009-01-21 北京中星微电子有限公司 Audio encoder and audio encoding method
CN101847413B (zh) * 2010-04-09 2011-11-16 北京航空航天大学 Method for implementing digital audio coding using a novel psychoacoustic model and fast bit allocation
CN102479514B (zh) * 2010-11-29 2014-02-19 华为终端有限公司 Encoding method, decoding method, apparatus, and system
US8793557B2 (en) * 2011-05-19 2014-07-29 Cambridge Silicon Radio Limited Method and apparatus for real-time multidimensional adaptation of an audio coding system
US8666753B2 (en) 2011-12-12 2014-03-04 Motorola Mobility Llc Apparatus and method for audio encoding
US8787403B2 (en) 2012-05-14 2014-07-22 Texas Instruments Incorporated Audio convergence control facilitating bitpool value converging to stable level
WO2015140292A1 (en) * 2014-03-21 2015-09-24 Thomson Licensing Method for compressing a higher order ambisonics (hoa) signal, method for decompressing a compressed hoa signal, apparatus for compressing a hoa signal, and apparatus for decompressing a compressed hoa signal
US10721471B2 (en) * 2017-10-26 2020-07-21 Intel Corporation Deep learning based quantization parameter estimation for video encoding
EP3483882A1 (en) * 2017-11-10 2019-05-15 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Controlling bandwidth in encoders and/or decoders
US11416742B2 (en) * 2017-11-24 2022-08-16 Electronics And Telecommunications Research Institute Audio signal encoding method and apparatus and audio signal decoding method and apparatus using psychoacoustic-based weighted error function
CN109785847B (zh) * 2019-01-25 2021-04-30 东华大学 Audio compression algorithm based on dynamic residual network
EP3771238B1 (en) * 2019-07-26 2022-09-21 Google LLC Method for managing a plurality of multimedia communication links in a point-to-multipoint bluetooth network
CN112530444B (zh) * 2019-09-18 2023-10-03 华为技术有限公司 Audio coding method and apparatus
EP4182044A1 (en) * 2020-07-20 2023-05-24 Telefonaktiebolaget LM Ericsson (publ) 5g optimized game rendering
US20240022787A1 (en) * 2020-10-13 2024-01-18 Nokia Technologies Oy Carriage and signaling of neural network representations

Also Published As

Publication number Publication date
EP4024394A1 (en) 2022-07-06
CN112530444B (zh) 2023-10-03
CN112530444A (zh) 2021-03-19
US20220208200A1 (en) 2022-06-30
JP2022548299A (ja) 2022-11-17
JP7387879B2 (ja) 2023-11-28
EP4024394A4 (en) 2022-10-26
KR20220066316A (ko) 2022-05-24

Similar Documents

Publication Publication Date Title
WO2021052293A1 (zh) Audio coding method and apparatus
WO2021012872A1 (zh) Encoding parameter adjustment method, apparatus, device, and storage medium
CN101217038B (zh) Audio data subband coding algorithm encoding method and Bluetooth stereo subsystem
US20050186993A1 (en) Communication apparatus for playing sound signals
US20230131892A1 (en) Inter-channel phase difference parameter encoding method and apparatus
US20230069653A1 (en) Audio Transmission Method and Electronic Device
CN113365129B (zh) Bluetooth audio data processing method, transmitter, receiver, and transceiver device
WO2023273701A1 (zh) Encoding control method, apparatus, wireless earphone, and storage medium
CN111681664A (zh) Method, system, storage medium, and device for reducing audio encoding bit rate
US20240105188A1 (en) Downmixed signal calculation method and apparatus
WO2021213128A1 (zh) Audio signal encoding method and apparatus
WO2020232631A1 (zh) Voice frequency-division transmission method, source end, playback end, source-end circuit, and playback-end circuit
US20210312934A1 (en) Communication method, apparatus, and system for digital enhanced cordless telecommunications (DECT) base station
WO2024021730A1 (zh) Audio signal processing method and apparatus
WO2024021729A1 (zh) Quantization method, dequantization method, and apparatus
US20230154472A1 (en) Multi-channel audio signal encoding method and apparatus
CN114863940B (zh) Model training method for sound quality conversion, method for improving sound quality, apparatus, and medium
Lee et al. Robust Bluetooth Call Streaming with Frame-Overlapped Transmission and High Efficient Speech Compression
EP3143755A1 (en) Far-end context dependent pre-processing
CN117476013A (zh) Audio signal processing method, apparatus, storage medium, and computer program product
CN113347614A (zh) Audio processing device, system, and method
CN116709418A (zh) Dynamic delay adjustment method, apparatus, Bluetooth playback device, and storage medium
CN117792665A (zh) Data transmission method, apparatus, and device
CN115910082A (zh) Audio data encoding method, apparatus, device, and storage medium
CN116092506A (zh) Communication method and system, audio device, and chip

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 20865475

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2022517444

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 20227012578

Country of ref document: KR

Kind code of ref document: A

Ref document number: 2020865475

Country of ref document: EP

Effective date: 20220331