WO2017206842A1 - Voice signal processing method and related apparatus and system - Google Patents

Voice signal processing method and related apparatus and system

Info

Publication number
WO2017206842A1
WO2017206842A1 PCT/CN2017/086374 CN2017086374W WO2017206842A1 WO 2017206842 A1 WO2017206842 A1 WO 2017206842A1 CN 2017086374 W CN2017086374 W CN 2017086374W WO 2017206842 A1 WO2017206842 A1 WO 2017206842A1
Authority
WO
WIPO (PCT)
Prior art keywords
voice
signal
decoding
terminal
encoded signal
Application number
PCT/CN2017/086374
Inventor
王宾
夏丙寅
刘泽新
苗磊
Original Assignee
华为技术有限公司
Application filed by 华为技术有限公司
Publication of WO2017206842A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M11/00 Telephonic communication systems specially adapted for combination with other electrical systems
    • H04M11/06 Simultaneous speech and data transmission, e.g. telegraphic transmission over the same conductors
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/038 Speech enhancement, e.g. noise reduction or echo cancellation using band spreading techniques
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/02 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
    • G10L19/16 Vocoder architecture
    • G10L19/167 Audio streaming, i.e. formatting and decoding of an encoded audio signal representation into a data stream for transmission or storage purposes
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
    • G10L19/16 Vocoder architecture
    • G10L19/173 Transcoding, i.e. converting between two coded representations avoiding cascaded coding-decoding
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M7/00 Arrangements for interconnection between switching centres
    • H04M7/006 Networks other than PSTN/ISDN providing telephone service, e.g. Voice over Internet Protocol (VoIP), including next generation networks with a packet-switched transport layer
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00 Television systems
    • H04N7/14 Systems for two-way working
    • H04N7/15 Conference systems
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
    • G10L19/16 Vocoder architecture
    • G10L19/18 Vocoders using multiple modes
    • G10L19/24 Variable rate codecs, e.g. for generating different qualities using a scalable representation such as hierarchical encoding or layered encoding

Definitions

  • the present invention relates to the field of audio technologies, and in particular, to a voice signal processing method and related apparatus and system.
  • the bandwidth of voice signals transmitted by the main telephone network is generally less than 4 kHz, and the frequency band is often limited to the range of 300 Hz to 3.4 kHz.
  • Second-generation (2G), third-generation (3G), and fourth-generation (4G) networks coexist, so terminals that support a variety of different voice bandwidths may coexist.
  • NB narrowband
  • WB wideband
  • SWB super wideband
  • When a call is set up between two terminals that support the same voice bandwidth, a voice service of the corresponding bandwidth can be established.
  • However, when a terminal whose maximum supported voice bandwidth is relatively small (for example, an NB terminal) establishes a call with a terminal whose maximum supported voice bandwidth is relatively large (for example, a WB or SWB terminal), the traditional scheme usually only allows the terminal with the larger maximum supported voice bandwidth (e.g., a WB terminal) to enjoy the voice bandwidth service (e.g., NB bandwidth service) of the terminal with the smaller maximum supported voice bandwidth (e.g., an NB terminal).
  • Embodiments of the present invention provide a voice signal processing method and related apparatus and system.
  • a first aspect of the present invention provides a voice signal processing method, including:
  • the network device receives the first voice encoded signal from the first terminal.
  • the network device performs voice decoding processing on the first voice encoded signal to obtain a voice decoding parameter and a first voice decoding signal; the network device performs virtual frequency band extension processing using the voice decoding parameter to obtain a spread-band voice decoding signal corresponding to the first voice decoding signal; and the network device combines the first voice decoding signal and the spread-band voice decoding signal and performs voice encoding processing to obtain a second voice encoded signal;
  • the network device sends the second voice encoded signal to a second terminal that has established a call connection with the first terminal.
  • the maximum bandwidth supported by the first terminal is smaller than the maximum bandwidth supported by the second terminal, or the maximum bandwidth supported by the first network where the first terminal is located is smaller than the maximum bandwidth supported by the second network where the second terminal is located.
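  • The uplink flow of the first aspect can be sketched as a small pipeline. This is only an illustration, assuming the decoder, virtual band expander, combiner, and encoder are supplied as plain callables; the names `DecodeResult` and `transcode_uplink` are hypothetical and do not come from the application.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Samples = List[float]


@dataclass
class DecodeResult:
    """Output of the voice decoder (hypothetical container)."""
    params: Dict[str, object]  # voice decoding parameters, e.g. pitch period, LPC
    signal: Samples            # first voice decoding signal


def transcode_uplink(
    first_encoded: bytes,
    decode: Callable[[bytes], DecodeResult],
    extend: Callable[[Dict[str, object]], Samples],
    combine: Callable[[Samples, Samples], Samples],
    encode: Callable[[Samples], bytes],
) -> bytes:
    """Turn the first voice encoded signal into the second voice encoded signal."""
    decoded = decode(first_encoded)              # parameters + first decoded signal
    spread_band = extend(decoded.params)         # virtual band extension
    merged = combine(decoded.signal, spread_band)
    return encode(merged)                        # second voice encoded signal
```

  • Passing the stages in as callables keeps the sketch independent of any particular codec; a real network device would plug in concrete codec implementations.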
  • the maximum bandwidth supported by the first terminal and the maximum bandwidth supported by the second terminal may be, for example, two of the following typical bandwidths: narrowband (NB), wideband (WB), super wideband (SWB), and fullband (FB). That is, the first terminal may be, for example, a narrowband terminal, a wideband terminal, or a super wideband terminal, and the second terminal may be a wideband terminal, a super wideband terminal, or a fullband terminal.
  • the maximum bandwidth supported by the first terminal and the maximum bandwidth supported by the second terminal are also not limited to the typical bandwidths of the above example.
  • the maximum bandwidth supported by the first network and the maximum bandwidth supported by the second network may be, for example, two of the following typical bandwidths: narrowband (NB), wideband (WB), super wideband (SWB), and fullband (FB). That is, the first network may, for example, support narrowband, wideband, or super wideband, and the second network may support wideband, super wideband, or fullband.
  • the maximum bandwidth supported by the first network and the maximum bandwidth supported by the second network are also not limited to the typical bandwidth of the above example.
  • the frequency bandwidth of the first voice encoded signal is smaller than the frequency bandwidth of the second voice encoded signal, wherein a sampling rate of the first voice encoded signal is smaller than a sampling rate of the second voice encoded signal.
  • the frequency bandwidth of the first voice encoded signal is less than or equal to a maximum bandwidth supported by the first terminal, or less than or equal to a maximum bandwidth supported by the first network.
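  • for example, when the maximum bandwidth supported by the first terminal is narrowband, the first voice encoded signal can be a narrowband voice encoded signal.
  • when the maximum bandwidth supported by the first terminal is wideband, the first voice encoded signal may be a wideband voice encoded signal or a narrowband voice encoded signal.
  • when the maximum bandwidth supported by the first terminal is super wideband, the first voice encoded signal may be a super wideband voice encoded signal, a wideband voice encoded signal, or a narrowband voice encoded signal.
  • similarly, when the maximum bandwidth supported by the first network is narrowband, the first voice encoded signal can be a narrowband voice encoded signal.
  • when the maximum bandwidth supported by the first network is wideband, the first voice encoded signal may be a wideband voice encoded signal or a narrowband voice encoded signal.
  • when the maximum bandwidth supported by the first network is super wideband, the first voice encoded signal may be a super wideband voice encoded signal, a wideband voice encoded signal, or a narrowband voice encoded signal.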
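  • The bandwidth relationships above reduce to an ordering on the bandwidth classes: the first voice encoded signal may use any class up to the maximum supported on the first side, and extension is useful only when the second side supports more. A minimal sketch follows; the names `Band`, `allowed_signal_bands`, and `needs_band_extension` are hypothetical, and the sampling rates are commonly used values assumed here, not figures stated in the text.

```python
from enum import IntEnum
from typing import List


class Band(IntEnum):
    """Typical bandwidth classes, ordered from narrowest to widest."""
    NB = 1   # narrowband
    WB = 2   # wideband
    SWB = 3  # super wideband
    FB = 4   # fullband


# Commonly used sampling rates per class (an assumption for illustration).
TYPICAL_SAMPLE_RATE_HZ = {
    Band.NB: 8000,
    Band.WB: 16000,
    Band.SWB: 32000,
    Band.FB: 48000,
}


def allowed_signal_bands(max_supported: Band) -> List[Band]:
    """Bandwidth classes the first voice encoded signal may use."""
    return [band for band in Band if band <= max_supported]


def needs_band_extension(first_max: Band, second_max: Band) -> bool:
    """True when the second side supports a wider band than the first side."""
    return first_max < second_max
```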
  • the frequency bandwidth of the first voice encoded signal is smaller than the frequency bandwidth of the second voice encoded signal.
  • the sampling rate of the first voice encoded signal is smaller than the sampling rate of the second voice encoded signal.
  • the frequency band of the first voice encoded signal may be a subset of the frequency band of the second voice encoded signal; alternatively, the intersection of the frequency band of the first voice encoded signal and the frequency band of the second voice encoded signal may not be equal to the frequency band of the first voice encoded signal.
  • the spread-band voice decoding signal may include a high-band extended voice decoding signal, and the spread-band voice decoding signal may further include a low-band extended voice decoding signal.
  • the network device mentioned in this embodiment may be, for example, a base station, a radio network controller, a core network device, or other network device.
  • the network device may be a base station or a radio network controller of the radio access network accessed by the second terminal, or a base station or a radio network controller of the radio access network accessed by the first terminal, or may be a core network device such as a packet data gateway or a serving gateway.
  • the first terminal and the second terminal may be user equipment with a call function, such as a mobile phone, a tablet computer, a personal computer, or a notebook computer.
  • It can be seen that, in the above example, after receiving the first voice encoded signal from the first terminal supporting the relatively narrow bandwidth, the network device performs voice decoding processing on the first voice encoded signal to obtain voice decoding parameters and a first voice decoding signal, performs virtual band extension processing using the voice decoding parameters to obtain a spread-band voice decoding signal corresponding to the first voice decoding signal, combines the first voice decoding signal and the spread-band voice decoding signal and performs voice encoding processing to obtain the second voice encoded signal, and then sends the second voice encoded signal to the second terminal supporting the relatively wide bandwidth. Because the network device in the transit position performs virtual band extension on the voice encoded signal that the first terminal supporting the relatively narrow bandwidth sends to the second terminal supporting the relatively wide bandwidth, the downlink voice encoded signal of the second terminal can better match the maximum bandwidth support capability of the second terminal. This helps the second terminal enjoy, as far as possible, a voice bandwidth service that matches its maximum bandwidth support capability, without requiring any special functional enhancement of the second terminal, which helps improve the user's call experience. The above example therefore helps improve the quality of service when the maximum bandwidth support capabilities of the terminals are asymmetric.
  • It can also be seen that, in the above example, after receiving the first voice encoded signal from the first terminal in the first network supporting the relatively narrow bandwidth, the network device performs voice decoding processing on the first voice encoded signal to obtain voice decoding parameters and a first voice decoding signal, performs virtual band extension processing using the voice decoding parameters to obtain a spread-band voice decoding signal corresponding to the first voice decoding signal, combines the first voice decoding signal and the spread-band voice decoding signal and performs voice encoding processing to obtain the second voice encoded signal, and then sends the second voice encoded signal to the second terminal in the second network supporting the relatively wide bandwidth. Because the network device in the transit position performs virtual band extension on the voice encoded signal that the first terminal in the relatively narrow-bandwidth first network sends to the second terminal in the relatively wide-bandwidth second network, the downlink voice encoded signal of the second terminal can better match the maximum bandwidth support capability of the second network. This helps the second terminal in the second network enjoy a voice bandwidth service that matches the maximum bandwidth support capability of that network, without requiring any special functional enhancement of the second network, which helps improve the user's call experience. The above example therefore helps improve the quality of service when the maximum bandwidth support capabilities of the networks are asymmetric.
  • performing the virtual band extension processing by using the voice decoding parameter to obtain the spread-band voice decoding signal corresponding to the first voice decoding signal includes: estimating, by using the voice decoding parameter, a spread spectrum excitation signal corresponding to the first voice decoding signal; estimating, by using the voice decoding parameter, a spread spectrum spectral envelope corresponding to the first voice decoding signal; and synthesizing the spread spectrum excitation signal by using a filter corresponding to the spread spectrum spectral envelope to obtain the spread-band voice decoding signal corresponding to the first voice decoding signal.
  • estimating the spread spectrum excitation signal includes: estimating, by using the voice decoding parameter, the spread spectrum excitation signal corresponding to the first voice decoding signal based on a spectral folding algorithm, a white noise excitation algorithm, or a harmonic noise model algorithm.
  • estimating the spread spectrum spectral envelope includes: estimating, by using the voice decoding parameter, the spread spectrum spectral envelope corresponding to the first voice decoding signal based on a linear mapping method, a codebook mapping method, or a statistical mapping method.
  • the voice decoding parameter includes a pitch period, a voicing factor, and linear predictive coding parameters.
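  • Of the excitation estimation options listed, spectral folding is the simplest to illustrate. In one common formulation, the narrowband excitation is upsampled by two through zero insertion, which mirrors its spectrum around the original Nyquist frequency and so fills the new high band. The sketch below assumes that formulation; `spectral_fold` and `dft_magnitude` are hypothetical helper names, and the application does not specify this exact procedure.

```python
import math
from typing import List, Sequence


def spectral_fold(narrowband_excitation: Sequence[float]) -> List[float]:
    """Upsample by 2 via zero insertion.

    The output runs at twice the sampling rate; its spectrum contains the
    original band plus a mirror image above the old Nyquist frequency.
    """
    folded: List[float] = []
    for sample in narrowband_excitation:
        folded.append(sample)
        folded.append(0.0)
    return folded


def dft_magnitude(x: Sequence[float], k: int) -> float:
    """Magnitude of DFT bin k, computed directly (fine for short frames)."""
    n = len(x)
    re = sum(v * math.cos(2 * math.pi * k * i / n) for i, v in enumerate(x))
    im = -sum(v * math.sin(2 * math.pi * k * i / n) for i, v in enumerate(x))
    return math.hypot(re, im)
```

  • For instance, a 1 kHz tone sampled at 8 kHz yields, after folding to 16 kHz, energy at both 1 kHz and its mirror at 7 kHz. The folded excitation would still need to be shaped by the estimated spectral envelope.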
  • performing voice decoding processing on the first voice encoded signal to obtain the voice decoding parameter and the first voice decoding signal includes: selecting, from a plurality of voice decoders, a voice decoder corresponding to the maximum bandwidth supported by the first terminal, and performing voice decoding processing on the first voice encoded signal to obtain the voice decoding parameter and the first voice decoding signal; or selecting, from a plurality of voice decoders, a voice decoder corresponding to the maximum bandwidth supported by the first network, and performing voice decoding processing on the first voice encoded signal to obtain the voice decoding parameter and the first voice decoding signal.
  • a plurality of voice decoders are built in the network device, and an appropriate voice decoder is selected according to the need to perform decoding of the voice coded signal, which is beneficial to improving the transcoding support capability and the response processing speed of the network device.
  • performing the virtual band extension processing by using the voice decoding parameter to obtain the spread-band voice decoding signal corresponding to the first voice decoding signal includes: selecting, from a plurality of virtual band expanders, a virtual band expander corresponding to the maximum bandwidth supported by the second terminal, and performing virtual band extension processing using the voice decoding parameter to obtain the spread-band voice decoding signal corresponding to the first voice decoding signal; or selecting, from a plurality of virtual band expanders, a virtual band expander corresponding to the maximum bandwidth supported by the second network, and performing virtual band extension processing using the voice decoding parameter to obtain the spread-band voice decoding signal corresponding to the first voice decoding signal.
  • combining the first voice decoding signal and the spread-band voice decoding signal and performing voice encoding processing to obtain the second voice encoded signal includes: selecting, from a plurality of voice encoders, a voice encoder corresponding to the maximum bandwidth supported by the second terminal, combining the first voice decoding signal and the spread-band voice decoding signal, and performing voice encoding processing to obtain the second voice encoded signal; or selecting, from a plurality of voice encoders, a voice encoder corresponding to the maximum bandwidth supported by the second network, combining the first voice decoding signal and the spread-band voice decoding signal, and performing voice encoding processing to obtain the second voice encoded signal.
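  • The selection rules above amount to table lookups: the decoder is keyed by the first side's maximum bandwidth, while the virtual band expander and the encoder are keyed by the second side's. A minimal sketch, assuming each bank is a dictionary keyed by a bandwidth label; the bank contents and the function name `select_uplink_chain` are hypothetical placeholders.

```python
from typing import Dict, Tuple

# Hypothetical component banks; the values stand in for real codec objects.
DECODERS: Dict[str, str] = {"NB": "nb_decoder", "WB": "wb_decoder", "SWB": "swb_decoder"}
EXPANDERS: Dict[str, str] = {"WB": "wb_expander", "SWB": "swb_expander", "FB": "fb_expander"}
ENCODERS: Dict[str, str] = {"WB": "wb_encoder", "SWB": "swb_encoder", "FB": "fb_encoder"}


def select_uplink_chain(first_max: str, second_max: str) -> Tuple[str, str, str]:
    """Pick decoder, virtual band expander, and encoder for one call.

    first_max / second_max are the maximum bandwidths supported by the first
    and second terminal (or by the first and second network).
    """
    for bank_name, bank, key in (
        ("decoder", DECODERS, first_max),
        ("expander", EXPANDERS, second_max),
        ("encoder", ENCODERS, second_max),
    ):
        if key not in bank:
            raise ValueError(f"no {bank_name} available for bandwidth {key!r}")
    return DECODERS[first_max], EXPANDERS[second_max], ENCODERS[second_max]
```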
  • the frequency bandwidths of the first voice encoded signal and the second voice encoded signal are two of the following bandwidths: narrowband, wideband, super wideband, and fullband.
  • the voice signal processing method further includes:
  • the network device receives a third voice encoded signal from the second terminal;
  • the network device performs voice decoding processing on the third voice encoded signal to obtain a third voice decoding signal; the network device downsamples the third voice decoding signal to obtain a fourth voice decoding signal; the network device performs voice encoding processing on the fourth voice decoding signal to obtain a fourth voice encoded signal, where the frequency bandwidth of the fourth voice encoded signal is smaller than the frequency bandwidth of the third voice encoded signal, and the sampling rate of the fourth voice encoded signal is smaller than the sampling rate of the third voice encoded signal;
  • the network device sends the fourth voice encoded signal to the first terminal; or, after performing voice enhancement processing on the fourth voice encoded signal to obtain a voice-enhanced fourth voice encoded signal, the network device sends the voice-enhanced fourth voice encoded signal to the first terminal.
  • performing voice decoding processing on the third voice encoded signal to obtain the third voice decoding signal includes: selecting, from a plurality of voice decoders, a voice decoder corresponding to the maximum bandwidth supported by the second terminal, and performing voice decoding processing on the third voice encoded signal to obtain the third voice decoding signal; or selecting, from a plurality of voice decoders, a voice decoder corresponding to the maximum bandwidth supported by the second network, and performing voice decoding processing on the third voice encoded signal to obtain the third voice decoding signal.
  • downsampling the third voice decoding signal to obtain the fourth voice decoding signal includes: selecting, from a plurality of downsamplers, a downsampler corresponding to the maximum bandwidth supported by the first terminal, and downsampling the third voice decoding signal to obtain the fourth voice decoding signal; or selecting, from a plurality of downsamplers, a downsampler corresponding to the maximum bandwidth supported by the first network, and downsampling the third voice decoding signal to obtain the fourth voice decoding signal.
  • performing voice encoding processing on the fourth voice decoding signal to obtain the fourth voice encoded signal includes: selecting, from a plurality of voice encoders, a voice encoder corresponding to the maximum bandwidth supported by the first terminal, and performing voice encoding processing on the fourth voice decoding signal to obtain the fourth voice encoded signal; or selecting, from a plurality of voice encoders, a voice encoder corresponding to the maximum bandwidth supported by the first network, and performing voice encoding processing on the fourth voice decoding signal to obtain the fourth voice encoded signal.
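  • The downsampling step in the reverse direction can be illustrated with a factor-of-two decimator, as would fit a wideband-to-narrowband conversion. The sketch assumes a Hamming-windowed sinc low-pass filter followed by dropping every other sample; the filter design, tap count, and the names `lowpass_taps` and `downsample_by_two` are illustrative assumptions, not details given in the application.

```python
import math
from typing import List, Optional, Sequence


def lowpass_taps(num_taps: int = 31, cutoff: float = 0.25) -> List[float]:
    """Hamming-windowed sinc low-pass; cutoff is a fraction of the input rate."""
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        t = n - mid
        sinc = 2 * cutoff if t == 0 else math.sin(2 * math.pi * cutoff * t) / (math.pi * t)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(sinc * window)
    total = sum(taps)
    return [tap / total for tap in taps]  # normalise to unity gain at DC


def downsample_by_two(x: Sequence[float], taps: Optional[Sequence[float]] = None) -> List[float]:
    """Low-pass filter to the new Nyquist band, then keep every second sample."""
    taps = list(taps) if taps is not None else lowpass_taps()
    filtered = []
    for n in range(len(x)):
        acc = 0.0
        for k, h in enumerate(taps):
            if 0 <= n - k < len(x):
                acc += h * x[n - k]
        filtered.append(acc)
    return filtered[::2]
```

  • Filtering before decimation matters: without it, content above the new Nyquist frequency would alias into the narrower band.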
  • the network device is a base station, a radio network controller, or a core network device.
  • a second aspect of the embodiments of the present invention provides a network device, including:
  • a communication interface configured to receive a first voice encoded signal from the first terminal;
  • a first voice decoder configured to perform voice decoding processing on the first voice encoded signal to obtain a voice decoding parameter and a first voice decoding signal;
  • a first virtual band extension processor configured to perform virtual band extension processing using the voice decoding parameter to obtain a spread-band voice decoding signal corresponding to the first voice decoding signal;
  • a first voice encoder configured to combine the first voice decoding signal and the spread-band voice decoding signal and perform voice encoding processing to obtain a second voice encoded signal; wherein the frequency bandwidth of the first voice encoded signal is smaller than the frequency bandwidth of the second voice encoded signal, and the sampling rate of the first voice encoded signal is smaller than the sampling rate of the second voice encoded signal;
  • the communication interface is configured to send the second voice encoded signal to a second terminal that establishes a call connection with the first terminal.
  • the maximum bandwidth supported by the first terminal is smaller than the maximum bandwidth supported by the second terminal, or the maximum bandwidth supported by the first network where the first terminal is located is smaller than the maximum bandwidth supported by the second network where the second terminal is located.
  • the first voice decoder, the first virtual band extension processor, and the first voice encoder may also be integrated together.
  • the first virtual band extension processor is specifically configured to: estimate, by using the voice decoding parameter, a spread spectrum excitation signal corresponding to the first voice decoding signal; estimate, by using the voice decoding parameter, a spread spectrum spectral envelope corresponding to the first voice decoding signal; and synthesize the spread spectrum excitation signal by using a filter corresponding to the spread spectrum spectral envelope to obtain the spread-band voice decoding signal corresponding to the first voice decoding signal.
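  • The final synthesis step can be illustrated with an all-pole filter, the usual form when a spectral envelope is represented by linear prediction coefficients. The sketch assumes the envelope is given as coefficients a_1..a_p of A(z) = 1 + a_1 z^-1 + ... + a_p z^-p and that the excitation is filtered through 1/A(z); the function name `synthesize` and this representation are assumptions, since the text only states that a filter corresponding to the envelope is used.

```python
from typing import List, Sequence


def synthesize(excitation: Sequence[float], lpc: Sequence[float]) -> List[float]:
    """Filter the excitation through the all-pole filter 1/A(z).

    Implements y[n] = e[n] - sum_k lpc[k-1] * y[n-k] for k = 1..len(lpc).
    """
    out: List[float] = []
    for n, e in enumerate(excitation):
        acc = e
        for k, a in enumerate(lpc, start=1):
            if n - k >= 0:
                acc -= a * out[n - k]
        out.append(acc)
    return out


# A single coefficient of -0.5 gives an impulse response that halves each step.
impulse_response = synthesize([1.0, 0.0, 0.0, 0.0], [-0.5])
```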
  • in the aspect of estimating, by using the voice decoding parameter, the spread spectrum excitation signal corresponding to the first voice decoding signal, the first virtual band extension processor is specifically configured to: estimate, by using the voice decoding parameter, the spread spectrum excitation signal corresponding to the first voice decoding signal based on a spectral folding algorithm, a white noise excitation algorithm, or a harmonic noise model algorithm.
  • in the aspect of estimating the spread spectrum spectral envelope, the first virtual band extension processor is specifically configured to: estimate, by using the voice decoding parameter, the spread spectrum spectral envelope corresponding to the first voice decoding signal based on a linear mapping method, a codebook mapping method, or a statistical mapping method.
  • the voice decoding parameter includes a pitch period, a voicing factor, and linear predictive coding parameters.
  • the network device includes a plurality of voice decoders, the first voice decoder being the voice decoder, among the plurality of voice decoders, corresponding to the maximum bandwidth supported by the first terminal; or the first voice decoder being the voice decoder, among the plurality of voice decoders, corresponding to the maximum bandwidth supported by the first network.
  • the network device includes a plurality of virtual band extension processors, the first virtual band extension processor being the virtual band expander, among the plurality of virtual band expanders, corresponding to the maximum bandwidth supported by the second terminal; or the first virtual band extension processor being the virtual band expander, among the plurality of virtual band expanders, corresponding to the maximum bandwidth supported by the second network.
  • the network device includes a plurality of voice encoders, the first voice encoder being the voice encoder, among the plurality of voice encoders, corresponding to the maximum bandwidth supported by the second terminal; or the first voice encoder being the voice encoder, among the plurality of voice encoders, corresponding to the maximum bandwidth supported by the second network.
  • the frequency bandwidths of the first voice encoded signal and the second voice encoded signal are two of the following bandwidths: narrowband, wideband, super wideband, and fullband.
  • the network device further includes: a second voice decoder, a second voice encoder, and a first downsampler;
  • the communication interface is further configured to receive a third voice encoded signal from the second terminal;
  • the second voice decoder is configured to perform voice decoding processing on the third voice encoded signal to obtain a third voice decoding signal
  • the first downsampler is configured to downsample the third voice decoding signal to obtain a fourth voice decoding signal
  • the second voice encoder is configured to perform voice encoding processing on the fourth voice decoding signal to obtain a fourth voice encoded signal, where the frequency bandwidth of the fourth voice encoded signal is smaller than the frequency bandwidth of the third voice encoded signal, and the sampling rate of the fourth voice encoded signal is smaller than the sampling rate of the third voice encoded signal;
  • the communication interface is further configured to send the fourth voice encoded signal to the first terminal; or the communication interface is further configured to, after voice enhancement processing is performed on the fourth voice encoded signal to obtain a voice-enhanced fourth voice encoded signal, send the voice-enhanced fourth voice encoded signal to the first terminal.
  • the network device includes a plurality of voice decoders, and the second voice decoder is the voice decoder, among the plurality of voice decoders, corresponding to the maximum bandwidth supported by the second terminal; or the second voice decoder is the voice decoder corresponding to the maximum bandwidth supported by the second network.
  • the network device includes multiple downsamplers
  • the first downsampler is a downsampler corresponding to a maximum bandwidth supported by the first terminal among the plurality of downsamplers; or
  • the first downsampler is a downsampler corresponding to a maximum bandwidth supported by the first network among the plurality of downsamplers.
  • the network device includes a plurality of voice encoders
  • the second voice encoder is the voice encoder, among the plurality of voice encoders, corresponding to the maximum bandwidth supported by the first terminal; or the second voice encoder is the voice encoder corresponding to the maximum bandwidth supported by the first network.
  • the network device is a base station, a radio network controller, or a core network device.
  • a third aspect of the embodiments of the present invention provides a network device, including a storage unit, a communication interface, and a processor coupled to the storage unit and the communication interface.
  • the storage unit is configured to store instructions
  • the processor is configured to execute the instructions
  • the communication interface is configured to communicate with other devices under the control of the processor. The method performed by the network device in the first aspect may be performed in accordance with the instructions when the processor is executing the instructions.
  • a fourth aspect of the embodiments of the present invention provides a computer readable storage medium storing program code for voice signal processing performed by a network device.
  • the program code includes instructions for performing the method performed by the network device in the first aspect.
  • a fifth aspect of the embodiments of the present invention further provides a network device, the network device comprising a unit capable of performing the method performed by the network device in the first aspect.
  • a sixth aspect of the embodiments of the present invention provides a communication system, which may include any network device according to an embodiment of the present invention.
  • a seventh aspect of the embodiments of the present invention provides a computer program product comprising program code, the program code comprising instructions for performing the method performed by the network device in the first aspect.
  • FIG. 1A and FIG. 1B are schematic diagrams showing two network architectures according to an embodiment of the present invention.
  • FIG. 2A is a schematic diagram of a voice signal processing method according to an embodiment of the present invention.
  • 2B is a schematic diagram showing an exemplary spectrum range of a typical frequency band according to an embodiment of the present invention.
  • FIG. 3 is a schematic diagram of another voice signal processing method according to an embodiment of the present invention.
  • 3B is a schematic diagram of internal devices of a network device according to an embodiment of the present invention.
  • 3C is a schematic diagram of a spectrum range of a narrowband according to an embodiment of the present invention.
  • FIG. 3 is a schematic diagram of a spectrum range of a narrowband extended broadband according to an embodiment of the present invention.
  • FIG. 3E is a schematic diagram of a flow direction of a voice signal according to an embodiment of the present invention.
  • FIG. 4A is a schematic diagram of another voice signal processing method according to an embodiment of the present invention.
  • FIG. 4B is a schematic diagram of a flow direction of a voice signal according to an embodiment of the present invention.
  • FIG. 5 is a schematic structural diagram of a network device according to an embodiment of the present disclosure.
  • FIG. 6 is a schematic structural diagram of another network device according to an embodiment of the present invention.
  • FIG. 1A and FIG. 1B are schematic diagrams of two possible network architectures provided in the embodiment of the present application, in the network architecture shown in FIG.
  • the terminal establishes a call connection with the core network through the access network.
  • the voice signal processing method provided in some embodiments of the present application may be performed by an access network device or a core network device.
  • the voice signal processing method provided in some embodiments of the present application may be performed by, for example, some servers in the Internet (e.g., conference server, VoIP server, etc.).
  • the terminal mentioned in the embodiment of the present invention may be a device having functions of collecting, storing, and transmitting voice signals to the outside.
  • the terminal may be, for example, a mobile phone, a tablet computer, a personal computer, or a notebook computer.
  • FIG. 2-A is a schematic flowchart of a voice signal processing method according to an embodiment of the present application.
  • a voice signal processing method provided by an embodiment of the present application may include:
  • the first terminal sends a first voice encoded signal.
  • the first terminal may send the first voice coded signal based on the call connection.
  • the second terminal can also transmit a voice encoded signal based on the call connection.
  • the call connection may be a call connection based on a mobile communication network, or may be an internet-based call connection.
  • the maximum bandwidth supported by the first terminal is smaller than the maximum bandwidth supported by the second terminal.
  • the maximum bandwidth supported by the first network where the first terminal is located is smaller than the maximum bandwidth supported by the second network where the second terminal is located.
  • the first network where the first terminal is located refers to the network to which the first terminal is attached during the current call, and the second network where the second terminal is located refers to the network to which the second terminal is attached during the current call.
  • the terminal capability is limited by the network capability. For example, although the first terminal is a full-band terminal, because the first network supports only wideband, the first terminal can use only wideband while it is attached to the first network.
  • the maximum bandwidth supported by the first terminal and the maximum bandwidth supported by the second terminal may be, for example, two of the following typical bandwidths: narrowband (NB), wideband (WB), ultra-wideband (SWB), and fullband (FB). That is, the first terminal may be, for example, a narrowband terminal, a broadband terminal, or an ultra wideband terminal, and the second terminal may be a broadband terminal, an ultra wideband terminal, or a full band terminal.
  • the maximum bandwidth supported by the first terminal and the maximum bandwidth supported by the second terminal are also not limited to the typical bandwidth of the above example.
  • the maximum bandwidth supported by the first network and the maximum bandwidth supported by the second network may be, for example, two of the following typical bandwidths: narrowband (NB), wideband (WB), ultra-wideband (SWB), and fullband (FB). That is, the first network may support, for example, narrowband, wideband, or ultra-wideband, while the second network may support wideband, ultra-wideband, or fullband.
  • the maximum bandwidth supported by the first network and the maximum bandwidth supported by the second network are also not limited to the typical bandwidth of the above example.
  • FIG. 2-B illustrates example ranges of typical bands such as narrowband (NB), wideband (WB), ultra-wideband (SWB), and fullband (FB). Of course, the range boundaries of these typical bands are not limited to the examples in the figure.
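  • as a non-limiting illustration, the ordering of the typical bands can be sketched as follows. The band edges are commonly cited values and are assumptions here, since the text states that the boundaries are not limited to the figure. The 8 kHz, 16 kHz, and 32 kHz sampling rates appear in the embodiments below, while 48 kHz for FB is an assumption. The names `Band`, `TYPICAL`, and `needs_band_extension` are hypothetical.

```python
from enum import IntEnum


class Band(IntEnum):
    """Typical voice bands, ordered from narrowest to widest."""
    NB = 1   # narrowband
    WB = 2   # wideband
    SWB = 3  # called ultra-wideband in this text
    FB = 4   # fullband


# Illustrative values only: (low edge in Hz, high edge in Hz, sampling rate in Hz).
TYPICAL = {
    Band.NB: (300, 3400, 8000),
    Band.WB: (50, 7000, 16000),
    Band.SWB: (50, 14000, 32000),
    Band.FB: (20, 20000, 48000),  # 48 kHz is an assumption, not taken from the text
}


def needs_band_extension(sender_max: Band, receiver_max: Band) -> bool:
    """True when the sender's maximum band is narrower than the receiver's."""
    return sender_max < receiver_max


# Every illustrative band fits below half of its sampling rate.
all_fit = all(high <= rate / 2 for (_, high, rate) in TYPICAL.values())
```

  • with these definitions, a call from an NB terminal to a WB terminal is a candidate for band extension, whereas the reverse direction is not.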
  • the frequency bandwidth of the first voice encoded signal is less than or equal to the maximum bandwidth supported by the first terminal.
  • for example, when the first terminal is a narrowband terminal, the first voice encoded signal may be a narrowband voice encoded signal.
  • when the first terminal is a broadband terminal, the first voice encoded signal may be a wideband voice encoded signal or a narrowband voice encoded signal.
  • when the first terminal is an ultra-wideband terminal, the first voice encoded signal may be an ultra-wideband voice encoded signal, a wideband voice encoded signal, or a narrowband voice encoded signal.
  • the frequency bandwidth of the first voice encoded signal is less than or equal to the maximum bandwidth supported by the first network.
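  • for example, when the first network supports at most narrowband, the first voice encoded signal may be a narrowband voice encoded signal.
  • when the first network supports at most wideband, the first voice encoded signal may be a wideband voice encoded signal or a narrowband voice encoded signal.
  • when the first network supports at most ultra-wideband, the first voice encoded signal may be an ultra-wideband voice encoded signal, a wideband voice encoded signal, or a narrowband voice encoded signal.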
  • the network device receives the first voice coded signal from the first terminal, and the network device performs voice decoding processing on the first voice coded signal to obtain a voice decoding parameter and a first voice decoding signal.
  • the voice decoding parameters may include, for example, a pitch period, a voicing factor, and linear predictive coding parameters.
  • the network device performs virtual band extension processing by using the voice decoding parameters to obtain a spread-band voice decoding signal corresponding to the first voice decoding signal.
  • the main mechanism of performing virtual band extension processing by using the voice decoding parameters to obtain the spread-band voice decoding signal corresponding to the first voice decoding signal is to exploit the correlation between the low-band voice signal and the high-band voice signal. Therefore, various band extension algorithms based on this mechanism are optional for VBWE (Virtual Band Width Extension) processing.
  • the spread-band voice decoding signal may include a high-band extended voice decoding signal, and the spread-band voice decoding signal may further include a low-band extended voice decoding signal.
  • the network device combines the first voice decoding signal and the spread-band voice decoding signal to perform voice coding processing to obtain a second voice encoded signal.
  • the frequency bandwidth of the first voice encoded signal is smaller than the frequency bandwidth of the second voice encoded signal. The sampling rate of the first voice encoded signal is smaller than the sampling rate of the second voice encoded signal.
  • the frequency band of the first voice encoded signal may be a subset of the frequency band of the second voice encoded signal. Of course, the intersection of the frequency band of the first voice encoded signal and the frequency band of the second voice encoded signal may also be unequal to the frequency band of the first voice encoded signal.
  • the network device sends the second voice coded signal to a second terminal that establishes a call connection with the first terminal.
  • the second terminal can receive the second voice encoded signal and perform decoding and playing on the second voice encoded signal.
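  • the steps above (decode, extend, combine, re-encode) can be sketched as follows, as a non-limiting illustration. All names (`DecodeResult`, `relay_to_wider_terminal`, and the toy stand-ins) are hypothetical. The codecs are passed in as callables because the text does not fix a particular implementation, and the toy stand-ins only demonstrate the data flow, not real speech coding.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Samples = List[float]


@dataclass
class DecodeResult:
    params: Dict[str, float]  # e.g. pitch period, voicing factor, LPC parameters
    signal: Samples           # the first voice decoding signal


def relay_to_wider_terminal(
    first_encoded: bytes,
    decode: Callable[[bytes], DecodeResult],
    extend: Callable[[Dict[str, float]], Samples],
    combine: Callable[[Samples, Samples], Samples],
    encode: Callable[[Samples], bytes],
) -> bytes:
    """Produce the second voice encoded signal from the first one."""
    decoded = decode(first_encoded)                     # voice decoding
    spread_band = extend(decoded.params)                # virtual band extension
    merged = combine(decoded.signal, spread_band)       # combine both signals
    return encode(merged)                               # voice encoding


# Toy stand-ins (assumptions): they only make the sketch executable.
def _toy_decode(payload: bytes) -> DecodeResult:
    return DecodeResult(params={"n": float(len(payload))},
                        signal=[b / 255 for b in payload])


def _toy_extend(params: Dict[str, float]) -> Samples:
    return [0.5] * int(params["n"])


def _toy_combine(low: Samples, ext: Samples) -> Samples:
    return low + ext  # list concatenation stands in for merging two bands


def _toy_encode(signal: Samples) -> bytes:
    return bytes(round(s * 255) for s in signal)


second_encoded = relay_to_wider_terminal(
    b"\x00\xff", _toy_decode, _toy_extend, _toy_combine, _toy_encode)
```

  • note that the extension step receives only the decoding parameters, which mirrors the description that the virtual band extension is performed by using the voice decoding parameters.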
  • the network device mentioned in this embodiment may be, for example, a base station, a radio network controller, a core network device, or other network device.
  • the network device may be a base station or a radio network controller of the radio access network accessed by the second terminal, or a base station or a radio network controller of the radio access network accessed by the first terminal, or may be a core network device such as a packet data gateway or a serving gateway.
  • the first terminal and the second terminal may be user devices with a call function such as a mobile phone, a tablet computer, a personal computer or a notebook computer.
  • after receiving the first voice encoded signal from the first terminal supporting the relatively narrow bandwidth, the network device performs voice decoding processing on the first voice encoded signal to obtain the voice decoding parameters and the first voice decoding signal, performs virtual band extension processing by using the voice decoding parameters to obtain the spread-band voice decoding signal corresponding to the first voice decoding signal, then combines the first voice decoding signal and the spread-band voice decoding signal and performs voice encoding processing to obtain the second voice encoded signal, and sends the second voice encoded signal to the second terminal supporting the relatively wide bandwidth.
  • because the network device in the transit position performs virtual band extension on the voice encoded signal sent by the first terminal supporting the relatively narrow bandwidth to the second terminal supporting the relatively wide bandwidth, the downlink voice encoded signal of the second terminal can better match the maximum bandwidth support capability of the second terminal. This helps the second terminal supporting the relatively wide bandwidth enjoy, as far as possible, a voice signal bandwidth service that matches its maximum bandwidth support capability.
  • similarly, after receiving the first voice encoded signal from the first terminal in the first network supporting the relatively narrow bandwidth, the network device performs voice decoding processing on the first voice encoded signal to obtain the voice decoding parameters and the first voice decoding signal, performs virtual band extension processing by using the voice decoding parameters to obtain the spread-band voice decoding signal corresponding to the first voice decoding signal, then combines the first voice decoding signal and the spread-band voice decoding signal and performs voice encoding processing to obtain the second voice encoded signal, and sends the second voice encoded signal to the second terminal in the second network supporting the relatively wide bandwidth.
  • because the network device in the transit position performs virtual band extension on the voice encoded signal sent by the first terminal in the first network supporting the relatively narrow bandwidth to the second terminal in the second network supporting the relatively wide bandwidth, the downlink voice encoded signal of the second terminal can better match the maximum bandwidth support capability of the second network. This helps the second terminal in the second network enjoy, as far as possible, a voice signal bandwidth service that matches the maximum bandwidth support capability, without requiring special function enhancement of the second network, thereby improving the user's call experience.
  • FIG. 3-A is a schematic flowchart of a voice signal processing method according to an embodiment of the present application.
  • a voice signal processing method provided by an embodiment of the present application may include:
  • Narrowband terminal voice coding obtains a first voice coded signal, and sends the first voice coded signal based on the call connection.
  • the first terminal is a narrowband terminal
  • the second terminal is a broadband terminal.
  • the narrowband terminal can use an AMR-NB encoder or another NB encoder to perform speech encoding on the sampled speech signal to obtain the first speech encoded signal.
  • the sampling rate of the first speech encoded signal is 8 kHz.
  • the RNC receives the first voice coded signal from the narrowband terminal, and the RNC performs voice decoding processing on the first voice coded signal to obtain a voice decoding parameter and a first voice decoding signal.
  • the voice decoding parameters may include, for example, a pitch period, a voicing factor, and linear predictive coding parameters.
  • the RNC may, based on the maximum bandwidth (e.g., NB) supported by the narrowband terminal, select an NB decoder from the plurality of voice decoders included in the voice decoder group to perform voice decoding processing on the first voice encoded signal to obtain the voice decoding parameters and the first voice decoding signal.
  • the network device (such as an RNC) exemplified in FIG. 3-B includes a voice decoder group (including multiple voice decoders), a VBWE processor group (including multiple VBWE processors), a voice encoder group (including multiple voice encoders), and a downsampler group (including multiple downsamplers).
  • the RNC can select the appropriate device from the corresponding device group to perform the corresponding operation as needed.
  • the NB decoder is, for example, an AMR (Adaptive Multi-Rate)-NB decoder or other type of NB decoder.
  • the SWB decoder may be, for example, an EVS (Enhanced Voice Services)-SWB decoder or another type of SWB decoder.
  • the WB decoder is, for example, an AMR-WB decoder or other type of WB decoder.
  • the FB decoder can be, for example, an EVS-FB decoder or other type of FB decoder.
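  • as a non-limiting illustration, selecting devices from the device groups according to the maximum bandwidths of the two terminals can be sketched as follows. The dictionary layout and the function `plan_chain` are hypothetical, and the component names are plain labels rather than real codec objects.

```python
from typing import Dict, List

ORDER = ["NB", "WB", "SWB", "FB"]  # narrowest to widest

# Each group holds one device per typical band, mirroring the groups in FIG. 3-B.
GROUPS: Dict[str, Dict[str, str]] = {
    "decoder": {b: f"{b} decoder" for b in ORDER},
    "vbwe": {b: f"{b} VBWE processor" for b in ORDER},
    "downsampler": {b: f"{b} downsampler" for b in ORDER},
    "encoder": {b: f"{b} encoder" for b in ORDER},
}


def plan_chain(sender_max: str, receiver_max: str) -> List[str]:
    """Return the devices applied, in order, to a signal sent toward the receiver."""
    s, r = ORDER.index(sender_max), ORDER.index(receiver_max)
    chain = [GROUPS["decoder"][sender_max]]            # decode at the sender's band
    if s < r:
        chain.append(GROUPS["vbwe"][receiver_max])     # extend toward the wider band
    elif s > r:
        chain.append(GROUPS["downsampler"][receiver_max])  # reduce to the narrower band
    chain.append(GROUPS["encoder"][receiver_max])      # encode at the receiver's band
    return chain


narrow_to_wide = plan_chain("NB", "WB")
wide_to_narrow = plan_chain("WB", "NB")
```

  • for a narrowband sender and a broadband receiver this yields the NB decoder, the WB VBWE processor, and the WB encoder, and for the reverse direction the WB decoder, the NB downsampler, and the NB encoder, which matches the selections described in this embodiment.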
  • the RNC performs VBWE processing by using the voice decoding parameters to obtain a spread-band voice decoding signal corresponding to the first voice decoding signal.
  • the voice decoding parameters may include a pitch period, a voicing factor, and linear predictive coding parameters.
  • the VBWE processor group may include, for example, an NB-VBWE processor, a WB-VBWE processor, an SWB-VBWE processor, and an FB-VBWE processor. The RNC may select the WB-VBWE processor from the plurality of VBWE processors based on the maximum bandwidth (such as WB) supported by the broadband terminal, and perform VBWE processing by using the voice decoding parameters to obtain the spread-band voice decoding signal corresponding to the first voice decoding signal.
  • performing the virtual band extension processing by using the voice decoding parameters to obtain the spread-band voice decoding signal corresponding to the first voice decoding signal may include, for example: estimating, by using the voice decoding parameters, a spread-band excitation signal corresponding to the first voice decoding signal; estimating, by using the voice decoding parameters, a spread-band spectral envelope corresponding to the first voice decoding signal; and synthesizing the spread-band excitation signal by using a filter corresponding to the spread-band spectral envelope to obtain the spread-band voice decoding signal corresponding to the first voice decoding signal.
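  • as a non-limiting illustration of the synthesis step, the filter corresponding to the spectral envelope can be sketched as an all-pole filter driven by the excitation signal. Representing the envelope by linear prediction coefficients, the sign convention A(z) = 1 + a1·z^-1 + ... + ap·z^-p, and the function name `lpc_synthesis` are assumptions and are not taken from the text.

```python
from typing import List, Sequence


def lpc_synthesis(excitation: Sequence[float], lpc: Sequence[float]) -> List[float]:
    """Filter the excitation through 1/A(z).

    Implements y[n] = x[n] - sum_{k=1..p} a_k * y[n-k], where lpc = [a_1, ..., a_p].
    """
    out: List[float] = []
    for n, x in enumerate(excitation):
        acc = x
        for k, a in enumerate(lpc, start=1):
            if n - k >= 0:            # samples before the start are treated as zero
                acc -= a * out[n - k]
        out.append(acc)
    return out


# A unit impulse through a one-pole envelope with a_1 = -0.5 decays geometrically.
response = lpc_synthesis([1.0, 0.0, 0.0, 0.0], [-0.5])
```

  • in this sketch the excitation carries the fine structure of the signal and the coefficients shape its spectrum, which is why the two are estimated separately in the steps above.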
  • estimating, by using the voice decoding parameters, the spread-band excitation signal corresponding to the first voice decoding signal may include, for example: estimating the spread-band excitation signal corresponding to the first voice decoding signal by using the voice decoding parameters (for example, the pitch period and the voicing factor), based on a spectral folding algorithm, a white noise excitation algorithm, or a harmonic noise model algorithm.
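  • as a non-limiting illustration of spectral folding, inserting a zero after every sample doubles the sampling rate and mirrors the low-band spectrum into the new high band. The sketch below applies this to a stand-in low-band signal (a 1 kHz tone sampled at 8 kHz). How the embodiment derives the low-band excitation from the decoding parameters is not specified in the text, and the function names are hypothetical.

```python
import cmath
import math
from typing import List, Sequence


def spectral_fold(low_band: Sequence[float]) -> List[float]:
    """Upsample by 2 through zero insertion, without an anti-imaging filter."""
    out: List[float] = []
    for sample in low_band:
        out.extend((sample, 0.0))
    return out


def dft_magnitude(signal: Sequence[float], k: int) -> float:
    """Magnitude of DFT bin k, used here only to inspect the result."""
    n = len(signal)
    return abs(sum(s * cmath.exp(-2j * math.pi * k * m / n)
                   for m, s in enumerate(signal)))


# 64 samples of a 1 kHz tone at 8 kHz become 128 samples at 16 kHz.
low = [math.cos(math.pi * i / 4) for i in range(64)]
folded = spectral_fold(low)

# Bin spacing is 16000 / 128 = 125 Hz, so bin 8 is 1 kHz and bin 56 is 7 kHz.
mag_1k = dft_magnitude(folded, 8)
mag_7k = dft_magnitude(folded, 56)
mag_3k = dft_magnitude(folded, 24)
```

  • the folded signal contains the original 1 kHz component together with a mirror image at 7 kHz, while an unrelated bin such as 3 kHz stays empty, so the high band is filled with content that is correlated with the low band.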
  • estimating, by using the voice decoding parameters, the spread-band spectral envelope corresponding to the first voice decoding signal may include, for example: estimating the spread-band spectral envelope corresponding to the first voice decoding signal by using the voice decoding parameters (for example, the linear predictive coding parameters), based on a linear mapping method, a codebook mapping method, or a statistical mapping method.
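  • as a non-limiting illustration of a codebook mapping method, the low-band envelope vector can be matched against the low-band half of a paired codebook, and the high-band half of the nearest entry is returned. The codebook contents below are made-up toy numbers, and the function name `codebook_map` is hypothetical. A real codebook would be trained offline on wideband speech.

```python
from typing import List, Sequence, Tuple

Pair = Tuple[Sequence[float], Sequence[float]]  # (low-band vector, high-band vector)


def codebook_map(low_env: Sequence[float], codebook: Sequence[Pair]) -> List[float]:
    """Return the high-band vector paired with the nearest low-band entry."""
    def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))

    best = min(codebook, key=lambda pair: squared_distance(low_env, pair[0]))
    return list(best[1])


# Toy paired codebook (assumed values, for illustration only).
TOY_CODEBOOK: List[Pair] = [
    ([1.0, 0.0], [0.3]),
    ([0.0, 1.0], [0.8]),
]

estimated_high = codebook_map([0.9, 0.1], TOY_CODEBOOK)
```

  • the input [0.9, 0.1] is closest to the first entry, so the sketch returns that entry's high-band vector.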
  • the RNC combines the first voice decoding signal and the spread-band voice decoding signal, and performs voice coding processing to obtain a second voice encoded signal.
  • the frequency bandwidth of the first voice encoded signal is smaller than the frequency bandwidth of the second voice encoded signal.
  • the sampling rate of the first voice encoded signal is smaller than the sampling rate of the second voice encoded signal.
  • the sampling rate of the second speech encoded signal is, for example, 16 kHz, and the bandwidth of the second speech encoded signal is BW2.
  • the spread-band voice decoding signal corresponding to the first voice decoding signal includes a high-band extended voice decoding signal and a low-band extended voice decoding signal
  • the RNC may, based on the maximum bandwidth (WB) supported by the broadband terminal, select the WB encoder from the plurality of voice encoders included in the voice encoder group, combine the first voice decoding signal and the spread-band voice decoding signal, and perform voice encoding processing to obtain the second voice encoded signal.
  • the NB encoder may be, for example, an AMR-NB encoder or other type of NB encoder.
  • the SWB encoder can be, for example, an EVS-SWB encoder or other type of SWB encoder.
  • the WB encoder is, for example, an AMR-WB encoder or other type of WB encoder.
  • the FB encoder can be, for example, an EVS-FB encoder or other type of FB encoder.
  • the RNC sends the second voice coded signal to a broadband terminal that establishes a call connection with the narrowband terminal.
  • the broadband terminal can receive the second voice encoded signal and perform decoding and playing on the second voice encoded signal.
  • the broadband terminal voice coding obtains a third voice encoded signal, and the third voice encoded signal is sent based on the call connection.
  • the RNC receives a third voice encoded signal from the broadband terminal, and the RNC performs voice decoding processing on the third voice encoded signal to obtain a third voice decoding signal.
  • the RNC may, based on the maximum bandwidth (WB) supported by the broadband terminal, select the WB decoder from the plurality of voice decoders included in the voice decoder group to perform voice decoding processing on the third voice encoded signal to obtain the third voice decoding signal.
  • the RNC downsamples the third voice decoding signal to obtain a fourth voice decoding signal.
  • the RNC may, based on the maximum bandwidth (NB) supported by the narrowband terminal, select the NB downsampler from the plurality of downsamplers included in the downsampler group to downsample the third voice decoding signal to obtain the fourth voice decoding signal.
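  • as a non-limiting illustration, downsampling from 16 kHz to 8 kHz can be sketched as low-pass filtering followed by keeping every second sample. The Hamming-windowed sinc design, the tap count, and the function names are assumptions, since the text does not specify how the downsampler is implemented.

```python
import math
from typing import List, Optional, Sequence


def lowpass_taps(num_taps: int = 31, cutoff: float = 0.25) -> List[float]:
    """Hamming-windowed sinc low-pass filter; cutoff is in cycles per input sample."""
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        t = n - mid
        ideal = 2 * cutoff if t == 0 else math.sin(2 * math.pi * cutoff * t) / (math.pi * t)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    gain = sum(taps)
    return [h / gain for h in taps]  # normalize to unit gain at 0 Hz


def downsample_by_2(signal: Sequence[float],
                    taps: Optional[Sequence[float]] = None) -> List[float]:
    """Low-pass filter the signal, then keep every second output sample."""
    taps = taps or lowpass_taps()
    out = []
    for n in range(0, len(signal), 2):
        acc = 0.0
        for k, h in enumerate(taps):
            if n - k >= 0:            # samples before the start are treated as zero
                acc += h * signal[n - k]
        out.append(acc)
    return out


RATE = 16000
tone_1k = [math.sin(2 * math.pi * 1000 * n / RATE) for n in range(256)]
tone_6k = [math.sin(2 * math.pi * 6000 * n / RATE) for n in range(256)]
kept = downsample_by_2(tone_1k)      # inside the narrowband range, passes through
removed = downsample_by_2(tone_6k)   # above 4 kHz, suppressed before decimation
```

  • the filter is applied before decimation so that content above 4 kHz, which an 8 kHz signal cannot represent, is removed rather than aliased into the narrowband signal.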
  • the RNC performs voice coding processing on the fourth voice decoding signal to obtain a fourth voice encoded signal.
  • the RNC may, based on the maximum bandwidth (NB) supported by the narrowband terminal, select the NB encoder from the plurality of voice encoders included in the voice encoder group to perform voice encoding processing on the fourth voice decoding signal to obtain the fourth voice encoded signal.
  • the frequency bandwidth (NB) of the fourth voice encoded signal is smaller than the frequency bandwidth (WB) of the third voice encoded signal, and the sampling rate (8 kHz) of the fourth voice encoded signal is smaller than the sampling rate (16 kHz) of the third voice encoded signal.
  • the RNC sends the fourth voice encoded signal to the narrowband terminal; or the RNC performs voice enhancement processing on the fourth voice encoded signal to obtain a voice-enhanced fourth voice encoded signal, and then sends the voice-enhanced fourth voice encoded signal to the narrowband terminal.
  • through the voice enhancement processing, the gain MOS score of the fourth voice encoded signal can be made no lower than the gain MOS score obtained when the broadband terminal directly sends a voice signal whose frequency bandwidth is BW1.
  • the product form of the narrowband terminal and the broadband terminal may be a user terminal having a call function such as a user equipment (UE).
  • referring to FIG. 3-E, the flow of the voice signal between the narrowband terminal, the RNC (an example of a network device), and the broadband terminal may be as illustrated in FIG. 3-E.
  • the narrowband terminal, the RNC, and the broadband terminal may have the functional devices exemplified in FIG. 3-E.
  • after receiving the narrowband voice encoded signal from the narrowband terminal, the RNC performs voice decoding processing on the narrowband voice encoded signal to obtain voice decoding parameters and a narrowband voice decoding signal, performs virtual band extension processing by using the voice decoding parameters to obtain a spread-band voice decoding signal corresponding to the narrowband voice decoding signal, then combines the narrowband voice decoding signal and the spread-band voice decoding signal, performs voice encoding processing to obtain a wideband voice encoded signal, and sends the wideband voice encoded signal to the broadband terminal.
  • because the network device in the transit position performs virtual band extension on the voice encoded signal sent by the narrowband terminal to the broadband terminal, the downlink voice encoded signal of the broadband terminal can better match the maximum bandwidth support capability of the broadband terminal.
  • this helps the broadband terminal enjoy a voice signal bandwidth service that matches its maximum bandwidth support capability without requiring special function enhancement of the broadband terminal, which helps improve the user's call experience.
  • FIG. 3 describes a process in which a narrowband terminal and a broadband terminal perform a call. A process in which a terminal in a narrowband network and a terminal in a broadband network perform a call is similar to that process and is not described again.
  • FIG. 4-A is a schematic flowchart of a voice signal processing method according to an embodiment of the present application.
  • a voice signal processing method provided by an embodiment of the present application may include:
  • the broadband terminal voice encoding obtains the first voice encoded signal, and sends the first voice encoded signal based on the call connection.
  • the first terminal is a broadband terminal
  • the second terminal is an ultra-wideband terminal.
  • the broadband terminal can use an AMR-WB encoder or another WB encoder to perform speech encoding on the sampled speech signal to obtain the first speech encoded signal.
  • the sampling rate of the first voice encoded signal is 16 kHz.
  • the RNC receives the first voice encoded signal from the broadband terminal, and the RNC performs voice decoding processing on the first voice encoded signal to obtain a voice decoding parameter and a first voice decoding signal.
  • the voice decoding parameters may include, for example, a pitch period, a voicing factor, and linear predictive coding parameters.
  • the RNC may, based on the maximum bandwidth (e.g., WB) supported by the broadband terminal, select a WB decoder from the plurality of voice decoders included in the voice decoder group to perform voice decoding processing on the first voice encoded signal to obtain the voice decoding parameters and the first voice decoding signal.
  • the RNC performs VBWE processing by using the voice decoding parameters to obtain a spread-band voice decoding signal corresponding to the first voice decoding signal.
  • the voice decoding parameters may include a pitch period, a voicing factor, and linear predictive coding parameters.
  • the VBWE processor group may include, for example, an NB-VBWE processor, a WB-VBWE processor, an SWB-VBWE processor, and an FB-VBWE processor. The RNC may select the SWB-VBWE processor from the plurality of VBWE processors based on the maximum bandwidth (e.g., SWB) supported by the ultra-wideband terminal, and perform VBWE processing by using the voice decoding parameters to obtain the spread-band voice decoding signal corresponding to the first voice decoding signal.
  • performing the virtual band extension processing by using the voice decoding parameters to obtain the spread-band voice decoding signal corresponding to the first voice decoding signal may include, for example: estimating, by using the voice decoding parameters, a spread-band excitation signal corresponding to the first voice decoding signal; estimating, by using the voice decoding parameters, a spread-band spectral envelope corresponding to the first voice decoding signal; and synthesizing the spread-band excitation signal by using a filter corresponding to the spread-band spectral envelope to obtain the spread-band voice decoding signal corresponding to the first voice decoding signal.
  • estimating, by using the voice decoding parameters, the spread-band excitation signal corresponding to the first voice decoding signal may include, for example: estimating the spread-band excitation signal corresponding to the first voice decoding signal by using the voice decoding parameters (for example, the pitch period and the voicing factor), based on a spectral folding algorithm, a white noise excitation algorithm, or a harmonic noise model algorithm.
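  • as a non-limiting illustration of how the pitch period and the voicing factor might drive an excitation estimate, the sketch below mixes a pulse train at the pitch period with white noise, weighted by the voicing factor. This is a simplified stand-in loosely in the spirit of a harmonic-plus-noise excitation and is not the algorithm of the embodiment. The square-root weighting, the value range [0, 1] for the voicing factor, and the function name are assumptions.

```python
import math
import random
from typing import List


def mixed_excitation(num_samples: int, pitch_period: int,
                     voicing: float, seed: int = 0) -> List[float]:
    """Blend a periodic pulse train and white noise.

    pitch_period is in samples; voicing = 1.0 gives a pure pulse train and
    voicing = 0.0 gives pure noise.
    """
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    harmonic_gain = math.sqrt(voicing)
    noise_gain = math.sqrt(1.0 - voicing)
    out = []
    for n in range(num_samples):
        pulse = 1.0 if n % pitch_period == 0 else 0.0
        noise = rng.gauss(0.0, 1.0)
        out.append(harmonic_gain * pulse + noise_gain * noise)
    return out


fully_voiced = mixed_excitation(160, pitch_period=40, voicing=1.0)
unvoiced = mixed_excitation(160, pitch_period=40, voicing=0.0)
```

  • a strongly voiced frame therefore yields a mostly periodic excitation, while an unvoiced frame yields a noise-like excitation.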
  • estimating, by using the voice decoding parameters, the spread-band spectral envelope corresponding to the first voice decoding signal may include, for example: estimating the spread-band spectral envelope corresponding to the first voice decoding signal by using the voice decoding parameters (for example, the linear predictive coding parameters), based on a linear mapping method, a codebook mapping method, or a statistical mapping method.
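  • as a non-limiting illustration of a linear mapping method, the spread-band envelope parameters can be computed as a weight matrix applied to the low-band parameters plus a bias. The weights and bias below are made-up toy numbers, and the function name `linear_map` is hypothetical. In practice such a mapping would be fitted offline on paired low-band and high-band data.

```python
from typing import List, Sequence


def linear_map(low_params: Sequence[float],
               weights: Sequence[Sequence[float]],
               bias: Sequence[float]) -> List[float]:
    """Compute high = W * low + b with plain Python lists."""
    return [
        sum(w * x for w, x in zip(row, low_params)) + b
        for row, b in zip(weights, bias)
    ]


# Toy mapping (assumed values, for illustration only).
W = [[0.5, 0.0],
     [0.0, 0.25]]
B = [0.25, 0.0]

high_params = linear_map([1.0, 2.0], W, B)
```

  • compared with a codebook lookup, a linear mapping produces a continuous range of envelope estimates at the cost of assuming a linear relation between the two bands.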
  • the RNC combines the first voice decoding signal and the spread-band voice decoding signal, and performs voice coding processing to obtain a second voice encoded signal.
  • the frequency bandwidth of the first voice encoded signal is smaller than the frequency bandwidth of the second voice encoded signal.
  • the sampling rate of the first voice encoded signal is smaller than the sampling rate of the second voice encoded signal.
  • the sampling rate of the second speech encoded signal is, for example, 32 kHz, and the bandwidth of the second speech encoded signal is BW2.
  • the RNC may, based on the maximum bandwidth (SWB) supported by the ultra-wideband terminal, select the SWB encoder from the plurality of voice encoders included in the voice encoder group, combine the first voice decoding signal and the spread-band voice decoding signal, and perform voice encoding processing to obtain the second voice encoded signal.
  • the RNC sends the second voice coded signal to an ultra wideband terminal that establishes a call connection with the broadband terminal.
  • the ultra wideband terminal can receive the second voice encoded signal and perform decoding and playing on the second voice encoded signal.
  • the ultra-wideband terminal performs voice encoding to obtain a third voice encoded signal, and sends the third voice encoded signal based on the call connection.
  • the RNC receives a third voice encoded signal from the ultra wideband terminal, and the RNC performs voice decoding processing on the third voice encoded signal to obtain a third voice decoded signal.
  • the RNC may, based on the maximum bandwidth (SWB) supported by the ultra-wideband terminal, select the SWB decoder from the plurality of voice decoders included in the voice decoder group to perform voice decoding processing on the third voice encoded signal to obtain the third voice decoding signal.
  • the RNC downsamples the third voice decoding signal to obtain a fourth voice decoding signal.
  • the RNC may, based on the maximum bandwidth (WB) supported by the broadband terminal, select the WB downsampler from the plurality of downsamplers included in the downsampler group to downsample the third voice decoding signal to obtain the fourth voice decoding signal.
  • the RNC performs voice coding processing on the fourth voice decoding signal to obtain a fourth voice encoded signal.
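  • the RNC may, based on the maximum bandwidth (WB) supported by the broadband terminal, select the WB encoder from the plurality of voice encoders included in the voice encoder group to perform voice encoding processing on the fourth voice decoding signal to obtain the fourth voice encoded signal.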
  • the frequency bandwidth (WB) of the fourth voice encoded signal is smaller than the frequency bandwidth (SWB) of the third voice encoded signal, and the sampling rate (16 kHz) of the fourth voice encoded signal is smaller than the sampling rate (32 kHz) of the third voice encoded signal.
  • the RNC sends the fourth voice encoded signal to the broadband terminal; or the RNC performs voice enhancement processing on the fourth voice encoded signal to obtain a voice-enhanced fourth voice encoded signal, and then sends the voice-enhanced fourth voice encoded signal to the broadband terminal.
  • through the voice enhancement processing, the gain MOS score of the fourth voice encoded signal can be made no lower than the gain MOS score obtained when the ultra-wideband terminal directly sends a voice signal whose frequency bandwidth is BW1.
  • the product form of the broadband terminal and the ultra-wideband terminal may be a user terminal having a call function such as a user equipment (UE).
  • referring to FIG. 4-B, the flow of the voice signal between the broadband terminal, the RNC (an example of a network device), and the ultra-wideband terminal may be as illustrated in FIG. 4-B.
  • the broadband terminal, the RNC, and the ultra wideband terminal may have the functional devices exemplified in FIG. 4-B.
  • after receiving the wideband voice encoded signal from the broadband terminal, the network device (RNC) performs voice decoding processing on the wideband voice encoded signal to obtain voice decoding parameters and a wideband voice decoding signal, performs virtual band extension processing by using the voice decoding parameters to obtain a spread-band voice decoding signal corresponding to the wideband voice decoding signal, then combines the wideband voice decoding signal and the spread-band voice decoding signal, performs voice encoding processing, and sends the resulting voice encoded signal to the ultra-wideband terminal.
  • because the RNC in the transit position performs virtual band extension on the voice encoded signal sent by the broadband terminal to the ultra-wideband terminal, the downlink voice encoded signal of the ultra-wideband terminal can better match the maximum bandwidth support capability of the ultra-wideband terminal.
  • FIG. 4 describes the flow of a call between the broadband terminal and the ultra-wideband terminal. The flow of a call between a terminal in a broadband network and a terminal in an ultra-wideband network is similar to the flow corresponding to FIG. 4 and is not described again.
  • the foregoing examples take the cases in which the first terminal is a narrowband terminal and the second terminal is a wideband terminal, or the first terminal is a wideband terminal and the second terminal is an ultra-wideband terminal; cases in which the first terminal and the second terminal are other types of terminals can be deduced by analogy.
  • an embodiment of the present application provides a network device 500, including:
  • the communication interface 510 is configured to receive a first voice encoded signal from the first terminal.
  • the first voice decoder 520 is configured to perform voice decoding processing on the first voice encoded signal to obtain voice decoding parameters and a first voice decoded signal.
  • the first virtual band extension processor 530 is configured to perform virtual band extension processing using the voice decoding parameters to obtain a spread-band voice decoded signal corresponding to the first voice decoded signal.
  • the first voice encoder 540 is configured to combine the first voice decoded signal and the spread-band voice decoded signal and perform voice encoding processing to obtain a second voice encoded signal, wherein the frequency bandwidth of the first voice encoded signal is smaller than the frequency bandwidth of the second voice encoded signal, and the sampling rate of the first voice encoded signal is smaller than the sampling rate of the second voice encoded signal.
  • the communication interface 510 is further configured to send the second voice encoded signal to a second terminal that establishes a call connection with the first terminal.
  • the maximum frequency bandwidth supported by the first terminal is smaller than the maximum frequency bandwidth supported by the second terminal, or the maximum frequency bandwidth supported by the first network in which the first terminal is located is smaller than the maximum frequency bandwidth supported by the second network in which the second terminal is located.
  • the first virtual band extension processor 530 is specifically configured to: estimate, using the voice decoding parameters, a spread-band excitation signal corresponding to the first voice decoded signal; estimate, using the voice decoding parameters, a spread-band spectral envelope corresponding to the first voice decoded signal; and synthesize the spread-band excitation signal using a filter corresponding to the spread-band spectral envelope to obtain the spread-band voice decoded signal corresponding to the first voice decoded signal.
  • in the aspect of estimating the spread-band excitation signal corresponding to the first voice decoded signal using the voice decoding parameters, the first virtual band extension processor 530 is specifically configured to estimate the spread-band excitation signal, using the voice decoding parameters, based on a spectral folding algorithm, a white noise excitation algorithm, or a harmonic noise model algorithm.
  • in the aspect of estimating the spread-band spectral envelope corresponding to the first voice decoded signal using the voice decoding parameters, the first virtual band extension processor 530 is specifically configured to estimate the spread-band spectral envelope, using the voice decoding parameters, based on a linear mapping method, a codebook mapping method, or a statistical mapping method.
  • the voice decoding parameters include a pitch period, a voicing factor, and linear predictive coding parameters.
  • the network device includes a plurality of voice decoders, and the first voice decoder is the one among them that corresponds to the maximum frequency bandwidth supported by the first terminal; or the first voice decoder is the one among them that corresponds to the maximum frequency bandwidth supported by the first network.
  • the network device includes a plurality of virtual band extension processors, and the first virtual band extension processor is the one among them that corresponds to the maximum frequency bandwidth supported by the second terminal; or the first virtual band extension processor is the one among them that corresponds to the maximum frequency bandwidth supported by the second network.
  • the network device includes a plurality of voice encoders, and the first voice encoder is the one among them that corresponds to the maximum frequency bandwidth supported by the second terminal; or the first voice encoder is the one among them that corresponds to the maximum frequency bandwidth supported by the second network.
  • the frequency bandwidth of the first voice encoded signal and the second voice encoded signal is, for example, two of the following bandwidths: narrowband, wideband, ultra-wideband, and fullband.
  • the network device 500 further includes a second voice decoder 550, a second voice encoder 570, and a first downsampler 560.
  • the communication interface 510 is further configured to receive a third voice encoded signal from the second terminal.
  • the second voice decoder 550 is configured to perform voice decoding processing on the third voice encoded signal to obtain a third voice decoded signal.
  • the first downsampler 560 is configured to downsample the third voice decoded signal to obtain a fourth voice decoded signal.
  • the second voice encoder 570 is configured to perform voice encoding processing on the fourth voice decoded signal to obtain a fourth voice encoded signal, wherein the frequency bandwidth of the fourth voice encoded signal is smaller than the frequency bandwidth of the third voice encoded signal, and the sampling rate of the fourth voice encoded signal is smaller than the sampling rate of the third voice encoded signal.
  • the communication interface 510 is further configured to send the fourth voice encoded signal to the first terminal; or the communication interface 510 is further configured to send, after voice enhancement processing is performed on the fourth voice encoded signal to obtain a voice-enhanced fourth voice encoded signal, the voice-enhanced fourth voice encoded signal to the first terminal.
  • the network device 500 includes a plurality of voice decoders, and the second voice decoder 550 is the voice decoder corresponding to the maximum frequency bandwidth supported by the second terminal; or the second voice decoder 550 is the voice decoder corresponding to the maximum frequency bandwidth supported by the second network.
  • the network device includes a plurality of downsamplers, and the first downsampler is the one among them that corresponds to the maximum frequency bandwidth supported by the first terminal; or the first downsampler is the one among them that corresponds to the maximum frequency bandwidth supported by the first network.
  • the network device 500 includes a plurality of voice encoders, and the second voice encoder 570 is the voice encoder corresponding to the maximum frequency bandwidth supported by the first terminal; or the second voice encoder 570 is the voice encoder corresponding to the maximum frequency bandwidth supported by the first network.
  • the network device is a base station, a radio network controller, or a core network device.
  • an embodiment of the present invention provides a network device 600, including a storage unit 620, a communication interface 610, and a processor 630 coupled to the storage unit 620 and the communication interface 610.
  • the storage unit 620 is configured to store instructions.
  • the processor 630 is configured to execute the instructions, and the communication interface 610 is configured to communicate with other devices under the control of the processor 630.
  • when the processor 630 executes the instructions, it may perform any one of the voice signal processing methods in the above embodiments according to the instructions.
  • the processor 630 is configured to: receive a first voice encoded signal from the first terminal through the communication interface 610; perform voice decoding processing on the first voice encoded signal to obtain voice decoding parameters and a first voice decoded signal; perform virtual band extension processing using the voice decoding parameters to obtain a spread-band voice decoded signal corresponding to the first voice decoded signal; combine the first voice decoded signal and the spread-band voice decoded signal and perform voice encoding processing to obtain a second voice encoded signal, wherein the frequency bandwidth of the first voice encoded signal is smaller than the frequency bandwidth of the second voice encoded signal, and the sampling rate of the first voice encoded signal is smaller than the sampling rate of the second voice encoded signal; and send, through the communication interface 610, the second voice encoded signal to a second terminal that has established a call connection with the first terminal.
  • the maximum frequency bandwidth supported by the first terminal is smaller than the maximum frequency bandwidth supported by the second terminal, or the maximum frequency bandwidth supported by the first network in which the first terminal is located is smaller than the maximum frequency bandwidth supported by the second network in which the second terminal is located.
  • the processor 630 is specifically configured to: estimate, using the voice decoding parameters, a spread-band excitation signal corresponding to the first voice decoded signal; estimate, using the voice decoding parameters, a spread-band spectral envelope corresponding to the first voice decoded signal; and synthesize the spread-band excitation signal using a filter corresponding to the spread-band spectral envelope to obtain the spread-band voice decoded signal corresponding to the first voice decoded signal.
  • in the aspect of estimating the spread-band excitation signal corresponding to the first voice decoded signal using the voice decoding parameters, the processor 630 is specifically configured to estimate the spread-band excitation signal, using the voice decoding parameters, based on a spectral folding algorithm, a white noise excitation algorithm, or a harmonic noise model algorithm.
  • in the aspect of estimating the spread-band spectral envelope corresponding to the first voice decoded signal using the voice decoding parameters, the processor 630 is specifically configured to estimate the spread-band spectral envelope, using the voice decoding parameters, based on a linear mapping method, a codebook mapping method, or a statistical mapping method.
  • the voice decoding parameters include a pitch period, a voicing factor, and linear predictive coding parameters.
  • the frequency bandwidth of the first voice encoded signal and the second voice encoded signal is, for example, two of the following bandwidths: narrowband, wideband, ultra-wideband, and fullband.
  • the processor 630 further: receives a third voice encoded signal from the second terminal through the communication interface 610; performs voice decoding processing on the third voice encoded signal to obtain a third voice decoded signal; downsamples the third voice decoded signal to obtain a fourth voice decoded signal; performs voice encoding processing on the fourth voice decoded signal to obtain a fourth voice encoded signal, wherein the frequency bandwidth of the fourth voice encoded signal is smaller than the frequency bandwidth of the third voice encoded signal, and the sampling rate of the fourth voice encoded signal is smaller than the sampling rate of the third voice encoded signal; and sends the fourth voice encoded signal to the first terminal through the communication interface 610, or, after performing voice enhancement processing on the fourth voice encoded signal to obtain a voice-enhanced fourth voice encoded signal, sends the voice-enhanced fourth voice encoded signal to the first terminal through the communication interface 610.
  • the network device 600 may be, for example, a base station, a radio network controller or a core network device or a network telephony server, or the like.
  • after receiving the first voice encoded signal from the first terminal, which supports a relatively narrow bandwidth, the network device 600 performs voice decoding processing on the first voice encoded signal to obtain voice decoding parameters and a first voice decoded signal, performs virtual band extension processing using the voice decoding parameters to obtain a spread-band voice decoded signal corresponding to the first voice decoded signal, combines the first voice decoded signal and the spread-band voice decoded signal and performs voice encoding processing to obtain a second voice encoded signal, and then sends the second voice encoded signal to the second terminal, which supports a relatively wide bandwidth. Because the network device, in the transit position, performs virtual band extension on the voice encoded signal that the first terminal sends toward the second terminal, the downlink voice encoded signal of the second terminal can better match the maximum frequency bandwidth that the second terminal supports. This helps the second terminal enjoy, as far as possible, a voice bandwidth service that matches its maximum frequency bandwidth support capability, without requiring any special functional enhancement of the second terminal, which helps improve the user's call experience. The above example therefore helps improve quality of service when the maximum frequency bandwidth support capabilities of the terminals are asymmetric.
  • after receiving the first voice encoded signal from the first terminal in the first network, which supports a relatively narrow bandwidth, the network device 600 performs voice decoding processing on the first voice encoded signal to obtain voice decoding parameters and a first voice decoded signal, performs virtual band extension processing using the voice decoding parameters to obtain a spread-band voice decoded signal corresponding to the first voice decoded signal, combines the first voice decoded signal and the spread-band voice decoded signal and performs voice encoding processing to obtain a second voice encoded signal, and then sends the second voice encoded signal to the second terminal in the second network, which supports a relatively wide bandwidth. Because the network device, in the transit position, performs virtual band extension on the voice encoded signal that the first terminal in the first network sends toward the second terminal in the second network, the downlink voice encoded signal of the second terminal can better match the maximum frequency bandwidth that the second network supports, which helps the second terminal in the second network enjoy, as far as possible, a voice bandwidth service that matches that capability.
  • the disclosed apparatus may be implemented in other ways.
  • the device embodiments described above are merely illustrative.
  • the division of the units is only a division by logical function, and an actual implementation may use another division; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed.
  • the mutual coupling, direct coupling, or communication connection shown or discussed herein may be an indirect coupling or communication connection through some interface, device, or unit, and may be electrical or take another form.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
  • each functional unit in each embodiment of the present invention may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit.
  • the above integrated unit can be implemented in the form of hardware or in the form of a software functional unit.
  • the integrated unit if implemented in the form of a software functional unit and sold or used as a standalone product, may be stored in a computer readable storage medium.
  • the technical solution of the present invention in essence, or the part of it that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a storage medium, which includes a number of instructions to cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the embodiments of the present invention.
  • the foregoing storage medium includes a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, an optical disk, and the like.

Landscapes

  • Engineering & Computer Science (AREA)
  • Signal Processing (AREA)
  • Multimedia (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Computational Linguistics (AREA)
  • Quality & Reliability (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Telephonic Communication Services (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

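A voice signal processing method and a related apparatus and system. The voice signal processing method includes: a network device receives a first voice encoded signal from a first terminal (201) and performs voice decoding processing on the first voice encoded signal to obtain voice decoding parameters and a first voice decoded signal (202); performs virtual band extension processing using the voice decoding parameters to obtain a spread-band voice decoded signal corresponding to the first voice decoded signal (203); combines the first voice decoded signal and the spread-band voice decoded signal and then performs voice encoding processing to obtain a second voice encoded signal (204); and sends the second voice encoded signal to a second terminal that has established a call connection with the first terminal (205). This helps improve quality of service when the maximum frequency bandwidth support capabilities of the terminals, or of the networks, are asymmetric.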

Description

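Voice signal processing method and related apparatus and system
Technical Field
The present invention relates to the field of audio technologies, and in particular to a voice signal processing method and a related apparatus and system.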
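Background
Due to limits on transmission bandwidth and other conditions, the frequency bandwidth of voice signals carried by today's mainstream telephone networks is generally below 4 kHz, with the band often restricted to the range of 300 Hz to 3.4 kHz. As communication bandwidth gradually increases, users place ever higher demands on voice quality and sense of presence, and traditional narrowband voice is increasingly unable to meet them. In addition, the lack of high-frequency information in traditional telephone voice greatly affects hearing-impaired users, who often have difficulty holding a telephone conversation. These needs are why wideband voice and even ultra-wideband voice are increasingly favored.
Today, second-generation (2G), third-generation (3G), and fourth-generation (4G) networks coexist, so terminals that support different voice bandwidths may coexist as well. For example, there may simultaneously be narrowband terminals that support at most narrowband (NB, Narrow Band) voice bandwidth, wideband terminals that support at most wideband (WB, Wide Band) voice bandwidth, ultra-wideband terminals that support at most ultra-wideband (SWB, Super Wide Band) voice bandwidth, and fullband terminals that support at most fullband (FB, Full Band) voice bandwidth.
When two terminals that support the same maximum voice bandwidth are in a call, the corresponding voice bandwidth service can be established. However, when a terminal with a relatively small maximum supported voice bandwidth (for example, NB) is in a call with a terminal with a relatively large maximum supported voice bandwidth (for example, WB or SWB), conventional solutions usually let the terminal with the larger maximum supported voice bandwidth (for example, a WB terminal) enjoy only a voice bandwidth service (for example, NB bandwidth service) that is essentially equivalent to that of the terminal with the smaller maximum supported voice bandwidth (for example, an NB terminal).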
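Summary
Embodiments of the present invention provide a voice signal processing method and a related apparatus and system.
A first aspect of the present invention provides a voice signal processing method, including:
A network device receives a first voice encoded signal from a first terminal. The network device performs voice decoding processing on the first voice encoded signal to obtain voice decoding parameters and a first voice decoded signal; the network device performs virtual band extension processing using the voice decoding parameters to obtain a spread-band voice decoded signal corresponding to the first voice decoded signal; the network device combines the first voice decoded signal and the spread-band voice decoded signal and then performs voice encoding processing to obtain a second voice encoded signal; and the network device sends the second voice encoded signal to a second terminal that has established a call connection with the first terminal. The maximum frequency bandwidth supported by the first terminal is smaller than the maximum frequency bandwidth supported by the second terminal, or the maximum frequency bandwidth supported by a first network in which the first terminal is located is smaller than the maximum frequency bandwidth supported by a second network in which the second terminal is located.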
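For example, the maximum frequency bandwidth supported by the first terminal and the maximum frequency bandwidth supported by the second terminal may be two of the following typical frequency bandwidths: narrowband (NB), wideband (WB), ultra-wideband (SWB), and fullband (FB). That is, the first terminal may be, for example, a narrowband, wideband, or ultra-wideband terminal, and the second terminal may be a wideband, ultra-wideband, or fullband terminal. Of course, the maximum frequency bandwidths supported by the first terminal and the second terminal are not limited to these typical examples.
For example, the maximum frequency bandwidth supported by the first network and the maximum frequency bandwidth supported by the second network may be two of the following typical frequency bandwidths: narrowband (NB), wideband (WB), ultra-wideband (SWB), and fullband (FB). That is, the first network may support, for example, narrowband, wideband, or ultra-wideband, and the second network may support wideband, ultra-wideband, or fullband. Of course, the maximum frequency bandwidths supported by the first network and the second network are not limited to these typical examples.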
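The frequency bandwidth of the first voice encoded signal is smaller than the frequency bandwidth of the second voice encoded signal, and the sampling rate of the first voice encoded signal is smaller than the sampling rate of the second voice encoded signal.
The frequency bandwidth of the first voice encoded signal is smaller than or equal to the maximum frequency bandwidth supported by the first terminal, or smaller than or equal to the maximum frequency bandwidth supported by the first network. For example, when the first terminal is a narrowband terminal, the first voice encoded signal may be a narrowband voice encoded signal. When the first terminal is a wideband terminal, the first voice encoded signal may be a wideband or narrowband voice encoded signal. When the first terminal is an ultra-wideband terminal, the first voice encoded signal may be an ultra-wideband, narrowband, or wideband voice encoded signal. Likewise, when the first network supports narrowband, the first voice encoded signal may be a narrowband voice encoded signal. When the first network supports wideband, the first voice encoded signal may be a wideband or narrowband voice encoded signal. When the first network supports ultra-wideband, the first voice encoded signal may be an ultra-wideband, narrowband, or wideband voice encoded signal.
For example, the band of the first voice encoded signal may be a subset of the band of the second voice encoded signal; of course, the intersection of the band of the first voice encoded signal and the band of the second voice encoded signal may also be unequal to the band of the first voice encoded signal.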
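The spread-band voice decoded signal may include a high-band extension voice decoded signal, and may further include a low-band extension voice decoded signal. For example, assuming the frequency bandwidth of the first voice decoded signal is 3400 Hz - 300 Hz = 3100 Hz, the corresponding spread-band voice decoded signal may include a high-band extension voice decoded signal (for example, 7000 Hz - 3400 Hz = 3600 Hz) and may further include a low-band extension voice decoded signal (for example, 300 Hz - 50 Hz = 250 Hz). As another example, assuming the frequency bandwidth of the first voice decoded signal is 7000 Hz - 50 Hz = 6950 Hz, the corresponding spread-band voice decoded signal may include a high-band extension voice decoded signal (for example, 14000 Hz - 7000 Hz = 7000 Hz). As yet another example, assuming the frequency bandwidth of the first voice decoded signal is 14 kHz - 50 Hz = 13950 Hz, the corresponding spread-band voice decoded signal may include a high-band extension voice decoded signal (for example, 24 kHz - 14 kHz = 10 kHz). Other cases follow by analogy.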
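The band arithmetic in the examples above can be sketched in a few lines. This is only an illustration: the function names `extension_bands` and `width` are hypothetical, and the band edges are the example values quoted in the text rather than normative limits.

```python
def extension_bands(src_low_hz, src_high_hz, dst_low_hz, dst_high_hz):
    """Return (low_ext, high_ext) as (start_hz, stop_hz) tuples.

    An entry is None when the target band does not reach beyond the
    source band on that side, so no extension is needed there.
    """
    low_ext = (dst_low_hz, src_low_hz) if dst_low_hz < src_low_hz else None
    high_ext = (src_high_hz, dst_high_hz) if dst_high_hz > src_high_hz else None
    return low_ext, high_ext


def width(band):
    """Width in Hz of a (start_hz, stop_hz) band, or 0 for None."""
    return 0 if band is None else band[1] - band[0]


# Narrowband (300-3400 Hz) extended to wideband (50-7000 Hz).
low, high = extension_bands(300, 3400, 50, 7000)
print(width(low), width(high))
```

For the narrowband-to-wideband case this reproduces the 250 Hz low-band and 3600 Hz high-band extensions given in the text.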
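The network device mentioned in this embodiment may be, for example, a base station, a radio network controller, a core network device, or another network device. For example, the network device may be a base station or radio network controller of the radio access network accessed by the second terminal, a base station or radio network controller of the radio access network accessed by the first terminal, or a core network device such as a packet data gateway or a serving gateway.
The first terminal and the second terminal may be user equipment with a call function, such as a mobile phone, a tablet computer, a personal computer, or a notebook computer.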
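As can be seen, in the above example solution, after receiving the first voice encoded signal from the first terminal, which supports a relatively narrow bandwidth, the network device performs voice decoding processing on the first voice encoded signal to obtain voice decoding parameters and a first voice decoded signal, performs virtual band extension processing using the voice decoding parameters to obtain a spread-band voice decoded signal corresponding to the first voice decoded signal, combines the first voice decoded signal and the spread-band voice decoded signal and performs voice encoding processing to obtain a second voice encoded signal, and then sends the second voice encoded signal to the second terminal, which supports a relatively wide bandwidth. Because the network device, in the transit position, performs virtual band extension on the voice encoded signal that the first terminal sends toward the second terminal, the downlink voice encoded signal of the second terminal can better match the maximum frequency bandwidth that the second terminal supports. This helps the second terminal enjoy, as far as possible, a voice bandwidth service that matches its maximum frequency bandwidth support capability, without requiring any special functional enhancement of the second terminal, which helps improve the user's call experience. The above example therefore helps improve quality of service when the maximum frequency bandwidth support capabilities of the terminals are asymmetric.
As can also be seen, in the above example solution, after receiving the first voice encoded signal from the first terminal in a network that supports a relatively narrow bandwidth, the network device performs voice decoding processing on the first voice encoded signal to obtain voice decoding parameters and a first voice decoded signal, performs virtual band extension processing using the voice decoding parameters to obtain a spread-band voice decoded signal corresponding to the first voice decoded signal, combines the first voice decoded signal and the spread-band voice decoded signal and performs voice encoding processing to obtain a second voice encoded signal, and then sends the second voice encoded signal to the second terminal in the second network, which supports a relatively wide bandwidth. Because the network device, in the transit position, performs virtual band extension on the voice encoded signal that the first terminal in the first network sends toward the second terminal in the second network, the downlink voice encoded signal of the second terminal can better match the maximum frequency bandwidth that the second network supports. This helps the second terminal in the second network enjoy, as far as possible, a voice bandwidth service that matches that capability, without requiring any special functional enhancement of the second network, which helps improve the user's call experience. The above example therefore helps improve quality of service when the maximum frequency bandwidth support capabilities of the networks are asymmetric.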
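With reference to the first aspect, in a first possible implementation of the first aspect, the performing virtual band extension processing using the voice decoding parameters to obtain a spread-band voice decoded signal corresponding to the first voice decoded signal includes:
estimating, using the voice decoding parameters, a spread-band excitation signal corresponding to the first voice decoded signal; estimating, using the voice decoding parameters, a spread-band spectral envelope corresponding to the first voice decoded signal; and synthesizing the spread-band excitation signal using a filter corresponding to the spread-band spectral envelope to obtain the spread-band voice decoded signal corresponding to the first voice decoded signal.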
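The three steps above (estimate an excitation, estimate an envelope, run the excitation through the envelope's filter) can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes the spectral envelope is represented by linear prediction coefficients of an all-pole filter 1/A(z), and the names `synthesize`, `virtual_band_extension`, `estimate_excitation`, and `estimate_envelope` are hypothetical.

```python
def synthesize(excitation, lpc):
    """All-pole synthesis filter 1/A(z), with A(z) = 1 + sum_k lpc[k-1] * z^-k.

    Each output sample is the excitation sample minus a weighted sum of
    previous output samples.
    """
    out = []
    for n, e in enumerate(excitation):
        acc = e
        for k, a in enumerate(lpc, start=1):
            if n - k >= 0:
                acc -= a * out[n - k]
        out.append(acc)
    return out


def virtual_band_extension(params, estimate_excitation, estimate_envelope):
    """Compose the three steps; both estimators are supplied by the caller."""
    excitation = estimate_excitation(params)   # spread-band excitation signal
    lpc = estimate_envelope(params)            # spread-band spectral envelope
    return synthesize(excitation, lpc)         # spread-band decoded signal


# An impulse through a one-pole filter gives a decaying response.
print(synthesize([1.0, 0.0, 0.0, 0.0], [-0.5]))
```

Passing the two estimators in as callables keeps the sketch neutral about which excitation and envelope algorithm is chosen.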
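With reference to the first possible implementation of the first aspect, in a second possible implementation of the first aspect, the estimating, using the voice decoding parameters, a spread-band excitation signal corresponding to the first voice decoded signal includes: estimating the spread-band excitation signal, using the voice decoding parameters, based on a spectral folding algorithm, a white noise excitation algorithm, or a harmonic noise model algorithm.
With reference to the first or second possible implementation of the first aspect, in a third possible implementation of the first aspect, the estimating, using the voice decoding parameters, a spread-band spectral envelope corresponding to the first voice decoded signal includes: estimating the spread-band spectral envelope, using the voice decoding parameters, based on a linear mapping method, a codebook mapping method, or a statistical mapping method.
With reference to the first aspect or any one of the first to third possible implementations of the first aspect, in a fourth possible implementation of the first aspect, the voice decoding parameters include a pitch period, a voicing factor, and linear predictive coding parameters.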
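With reference to the first aspect or any one of the first to fourth possible implementations of the first aspect, in a fifth possible implementation of the first aspect, the performing voice decoding processing on the first voice encoded signal to obtain voice decoding parameters and a first voice decoded signal includes: selecting, from a plurality of voice decoders, the voice decoder corresponding to the maximum frequency bandwidth supported by the first terminal, and performing voice decoding processing on the first voice encoded signal to obtain the voice decoding parameters and the first voice decoded signal; or selecting, from a plurality of voice decoders, the voice decoder corresponding to the maximum frequency bandwidth supported by the first network, and performing voice decoding processing on the first voice encoded signal to obtain the voice decoding parameters and the first voice decoded signal.
It can be understood that building multiple voice decoders into the network device and selecting a suitable one as needed to decode the voice encoded signal helps improve the transcoding support capability and response speed of the network device.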
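With reference to the first aspect or any one of the first to fifth possible implementations of the first aspect, in a sixth possible implementation of the first aspect, the performing virtual band extension processing using the voice decoding parameters to obtain a spread-band voice decoded signal corresponding to the first voice decoded signal includes: selecting, from a plurality of virtual band extension processors, the one corresponding to the maximum frequency bandwidth supported by the second terminal, and performing virtual band extension processing using the voice decoding parameters to obtain the spread-band voice decoded signal; or selecting, from a plurality of virtual band extension processors, the one corresponding to the maximum frequency bandwidth supported by the second network, and performing virtual band extension processing using the voice decoding parameters to obtain the spread-band voice decoded signal.
With reference to the first aspect or any one of the first to sixth possible implementations of the first aspect, in a seventh possible implementation of the first aspect, the combining the first voice decoded signal and the spread-band voice decoded signal and then performing voice encoding processing to obtain a second voice encoded signal includes: selecting, from a plurality of voice encoders, the voice encoder corresponding to the maximum frequency bandwidth supported by the second terminal, and combining the first voice decoded signal and the spread-band voice decoded signal and then performing voice encoding processing to obtain the second voice encoded signal; or selecting, from a plurality of voice encoders, the voice encoder corresponding to the maximum frequency bandwidth supported by the second network, and combining the first voice decoded signal and the spread-band voice decoded signal and then performing voice encoding processing to obtain the second voice encoded signal.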
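With reference to the first aspect or any one of the first to seventh possible implementations of the first aspect, in an eighth possible implementation of the first aspect, the frequency bandwidths of the first voice encoded signal and the second voice encoded signal are two of the following frequency bandwidths: narrowband, wideband, ultra-wideband, and fullband.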
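With reference to the first aspect or any one of the first to eighth possible implementations of the first aspect, in a ninth possible implementation of the first aspect, the voice signal processing method further includes:
the network device receives a third voice encoded signal from the second terminal;
the network device performs voice decoding processing on the third voice encoded signal to obtain a third voice decoded signal;
the network device downsamples the third voice decoded signal to obtain a fourth voice decoded signal; the network device performs voice encoding processing on the fourth voice decoded signal to obtain a fourth voice encoded signal, where the frequency bandwidth of the fourth voice encoded signal is smaller than the frequency bandwidth of the third voice encoded signal, and the sampling rate of the fourth voice encoded signal is smaller than the sampling rate of the third voice encoded signal;
the network device sends the fourth voice encoded signal to the first terminal; or, after performing voice enhancement processing on the fourth voice encoded signal to obtain a voice-enhanced fourth voice encoded signal, the network device sends the voice-enhanced fourth voice encoded signal to the first terminal.
It can be understood that performing voice enhancement processing on the voice encoded signal sent to the first terminal helps raise the gain of the voice encoded signal received by the first terminal, which in turn helps improve the call experience of a terminal with a narrower bandwidth support capability, or of a terminal in a network with a narrower bandwidth support capability.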
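The downsampling step in the reverse direction can be illustrated with a deliberately crude sketch. It assumes a rate reduction by a factor of two (for example, from 16 kHz to 8 kHz) and uses pair averaging as a stand-in for the anti-aliasing low-pass filter; a practical transcoder would use a properly designed filter. The function name `downsample_by_2` is hypothetical.

```python
def downsample_by_2(samples):
    """Halve the sample rate.

    Averaging each adjacent pair acts as a very rough low-pass filter
    before one value per pair is kept; a trailing unpaired sample is
    dropped.
    """
    return [(samples[i] + samples[i + 1]) / 2
            for i in range(0, len(samples) - 1, 2)]


# Four samples at the higher rate become two at the lower rate.
print(downsample_by_2([0, 2, 4, 6]))
```

The point of the sketch is only the ordering: filtering must come before discarding samples, otherwise content above the new Nyquist frequency would alias into the narrower band.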
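With reference to the ninth possible implementation of the first aspect, in a tenth possible implementation of the first aspect, the performing voice decoding processing on the third voice encoded signal to obtain a third voice decoded signal includes: selecting, from a plurality of voice decoders, the voice decoder corresponding to the maximum frequency bandwidth supported by the second terminal, and performing voice decoding processing on the third voice encoded signal to obtain the third voice decoded signal; or selecting, from a plurality of voice decoders, the voice decoder corresponding to the maximum frequency bandwidth supported by the second network, and performing voice decoding processing on the third voice encoded signal to obtain the third voice decoded signal.
With reference to the ninth or tenth possible implementation of the first aspect, in an eleventh possible implementation of the first aspect, the downsampling the third voice decoded signal to obtain a fourth voice decoded signal includes: selecting, from a plurality of downsamplers, the downsampler corresponding to the maximum frequency bandwidth supported by the first terminal, and downsampling the third voice decoded signal to obtain the fourth voice decoded signal; or selecting, from a plurality of downsamplers, the downsampler corresponding to the maximum frequency bandwidth supported by the first network, and downsampling the third voice decoded signal to obtain the fourth voice decoded signal.
With reference to any one of the ninth to eleventh possible implementations of the first aspect, in a twelfth possible implementation of the first aspect, the performing voice encoding processing on the fourth voice decoded signal to obtain a fourth voice encoded signal includes: selecting, from a plurality of voice encoders, the voice encoder corresponding to the maximum frequency bandwidth supported by the first terminal, and performing voice encoding processing on the fourth voice decoded signal to obtain the fourth voice encoded signal; or selecting, from a plurality of voice encoders, the voice encoder corresponding to the maximum frequency bandwidth supported by the first network, and performing voice encoding processing on the fourth voice decoded signal to obtain the fourth voice encoded signal.
With reference to the first aspect or any one of the first to twelfth possible implementations of the first aspect, in a thirteenth possible implementation of the first aspect, the network device is a base station, a radio network controller, or a core network device.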
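A second aspect of the embodiments of the present invention provides a network device, including:
a communication interface, configured to receive a first voice encoded signal from a first terminal;
a first voice decoder, configured to perform voice decoding processing on the first voice encoded signal to obtain voice decoding parameters and a first voice decoded signal;
a first virtual band extension processor, configured to perform virtual band extension processing using the voice decoding parameters to obtain a spread-band voice decoded signal corresponding to the first voice decoded signal;
a first voice encoder, configured to combine the first voice decoded signal and the spread-band voice decoded signal and then perform voice encoding processing to obtain a second voice encoded signal, where the frequency bandwidth of the first voice encoded signal is smaller than the frequency bandwidth of the second voice encoded signal, and the sampling rate of the first voice encoded signal is smaller than the sampling rate of the second voice encoded signal;
the communication interface being further configured to send the second voice encoded signal to a second terminal that has established a call connection with the first terminal.
The maximum frequency bandwidth supported by the first terminal is smaller than the maximum frequency bandwidth supported by the second terminal, or the maximum frequency bandwidth supported by a first network in which the first terminal is located is smaller than the maximum frequency bandwidth supported by a second network in which the second terminal is located.
It can be understood that the first voice decoder, the first virtual band extension processor, and the first voice encoder may also be integrated into one unit.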
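With reference to the second aspect, in a first possible implementation of the second aspect, the first virtual band extension processor is specifically configured to: estimate, using the voice decoding parameters, a spread-band excitation signal corresponding to the first voice decoded signal; estimate, using the voice decoding parameters, a spread-band spectral envelope corresponding to the first voice decoded signal; and synthesize the spread-band excitation signal using a filter corresponding to the spread-band spectral envelope to obtain the spread-band voice decoded signal corresponding to the first voice decoded signal.
With reference to the first possible implementation of the second aspect, in a second possible implementation of the second aspect, in the aspect of estimating the spread-band excitation signal corresponding to the first voice decoded signal using the voice decoding parameters, the first virtual band extension processor is specifically configured to estimate the spread-band excitation signal, using the voice decoding parameters, based on a spectral folding algorithm, a white noise excitation algorithm, or a harmonic noise model algorithm.
With reference to the first or second possible implementation of the second aspect, in a third possible implementation of the second aspect, in the aspect of estimating the spread-band spectral envelope corresponding to the first voice decoded signal using the voice decoding parameters, the first virtual band extension processor is specifically configured to estimate the spread-band spectral envelope, using the voice decoding parameters, based on a linear mapping method, a codebook mapping method, or a statistical mapping method.
With reference to the second aspect or any one of the first to third possible implementations of the second aspect, in a fourth possible implementation of the second aspect, the voice decoding parameters include a pitch period, a voicing factor, and linear predictive coding parameters.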
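With reference to the second aspect or any one of the first to fourth possible implementations of the second aspect, in a fifth possible implementation of the second aspect, the network device includes a plurality of voice decoders, and the first voice decoder is the one among them that corresponds to the maximum frequency bandwidth supported by the first terminal; or the first voice decoder is the one among them that corresponds to the maximum frequency bandwidth supported by the first network.
With reference to the second aspect or any one of the first to fifth possible implementations of the second aspect, in a sixth possible implementation of the second aspect, the network device includes a plurality of virtual band extension processors, and the first virtual band extension processor is the one among them that corresponds to the maximum frequency bandwidth supported by the second terminal; or the first virtual band extension processor is the one among them that corresponds to the maximum frequency bandwidth supported by the second network.
With reference to the second aspect or any one of the first to sixth possible implementations of the second aspect, in a seventh possible implementation of the second aspect, the network device includes a plurality of voice encoders, and the first voice encoder is the one among them that corresponds to the maximum frequency bandwidth supported by the second terminal; or the first voice encoder is the one among them that corresponds to the maximum frequency bandwidth supported by the second network.
With reference to the second aspect or any one of the first to seventh possible implementations of the second aspect, in an eighth possible implementation of the second aspect, the frequency bandwidths of the first voice encoded signal and the second voice encoded signal are two of the following frequency bandwidths: narrowband, wideband, ultra-wideband, and fullband.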
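With reference to the second aspect or any one of the first to eighth possible implementations of the second aspect, in a ninth possible implementation of the second aspect, the network device further includes a second voice decoder, a second voice encoder, and a first downsampler;
the communication interface is further configured to receive a third voice encoded signal from the second terminal;
the second voice decoder is configured to perform voice decoding processing on the third voice encoded signal to obtain a third voice decoded signal;
the first downsampler is configured to downsample the third voice decoded signal to obtain a fourth voice decoded signal;
the second voice encoder is configured to perform voice encoding processing on the fourth voice decoded signal to obtain a fourth voice encoded signal, where the frequency bandwidth of the fourth voice encoded signal is smaller than the frequency bandwidth of the third voice encoded signal, and the sampling rate of the fourth voice encoded signal is smaller than the sampling rate of the third voice encoded signal;
the communication interface is further configured to send the fourth voice encoded signal to the first terminal; or the communication interface is further configured to send, after voice enhancement processing is performed on the fourth voice encoded signal to obtain a voice-enhanced fourth voice encoded signal, the voice-enhanced fourth voice encoded signal to the first terminal.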
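With reference to the ninth possible implementation of the second aspect, in a tenth possible implementation of the second aspect, the network device includes a plurality of voice decoders, and the second voice decoder is the voice decoder corresponding to the maximum frequency bandwidth supported by the second terminal; or the second voice decoder is the voice decoder corresponding to the maximum frequency bandwidth supported by the second network.
With reference to the ninth or tenth possible implementation of the second aspect, in an eleventh possible implementation of the second aspect, the network device includes a plurality of downsamplers, and the first downsampler is the one among them that corresponds to the maximum frequency bandwidth supported by the first terminal; or the first downsampler is the one among them that corresponds to the maximum frequency bandwidth supported by the first network.
With reference to any one of the ninth to eleventh possible implementations of the second aspect, in a twelfth possible implementation of the second aspect, the network device includes a plurality of voice encoders, and the second voice encoder is the voice encoder corresponding to the maximum frequency bandwidth supported by the first terminal; or the second voice encoder is the voice encoder corresponding to the maximum frequency bandwidth supported by the first network.
With reference to the second aspect or any one of the first to twelfth possible implementations of the second aspect, in a thirteenth possible implementation of the second aspect, the network device is a base station, a radio network controller, or a core network device.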
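In addition, a third aspect of the embodiments of the present invention provides a network device, including a storage unit, a communication interface, and a processor coupled to the storage unit and the communication interface. The storage unit is configured to store instructions, the processor is configured to execute the instructions, and the communication interface is configured to communicate with other devices under the control of the processor. When executing the instructions, the processor may perform, according to the instructions, the method performed by the network device in the first aspect.
In addition, a fourth aspect of the embodiments of the present invention provides a computer-readable storage medium that stores program code, executed by a network device, for voice signal processing. The program code includes instructions for performing the method performed by the network device in the first aspect.
In addition, a fifth aspect of the embodiments of the present invention provides a network device that includes units capable of performing the method performed by the network device in the first aspect.
A sixth aspect of the embodiments of the present invention provides a communication system, which may include any one of the network devices provided in the embodiments of the present invention.
A seventh aspect of the embodiments of the present invention provides a computer program product whose program code includes instructions for performing the method performed by the network device in the first aspect.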
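Brief Description of the Drawings
FIG. 1-A and FIG. 1-B are schematic diagrams of two network architectures according to embodiments of the present invention;
FIG. 2-A is a schematic diagram of a voice signal processing method according to an embodiment of the present invention;
FIG. 2-B is a schematic diagram of example spectral ranges of typical frequency bands according to an embodiment of the present invention;
FIG. 3-A is a schematic diagram of another voice signal processing method according to an embodiment of the present invention;
FIG. 3-B is a schematic diagram of the internal components of a network device according to an embodiment of the present invention;
FIG. 3-C is a schematic diagram of a narrowband spectral range according to an embodiment of the present invention;
FIG. 3-D is a schematic diagram of the spectral range of a narrowband extended to wideband according to an embodiment of the present invention;
FIG. 3-E is a schematic diagram of a voice signal flow according to an embodiment of the present invention;
FIG. 4-A is a schematic diagram of another voice signal processing method according to an embodiment of the present invention;
FIG. 4-B is a schematic diagram of a voice signal flow according to an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of a network device according to an embodiment of the present invention;
FIG. 6 is a schematic structural diagram of another network device according to an embodiment of the present invention.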
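Detailed Description
The terms "include" and "have" and any variants of them in the specification, claims, and drawings of the present invention are intended to cover non-exclusive inclusion. For example, a process, method, system, product, or device that includes a series of steps or units is not limited to the listed steps or units, but optionally also includes steps or units that are not listed, or optionally also includes other steps or units inherent to the process, method, product, or device. In addition, the terms "first", "second", "third", and so on are used to distinguish different objects and not to describe a particular order.
Refer first to FIG. 1-A and FIG. 1-B, which are schematic diagrams of two possible network architectures according to embodiments of this application. In the network architecture shown in FIG. 1-A, terminals establish a call connection through an access network and a core network. The voice signal processing methods provided in some embodiments of this application may be performed by an access network device or a core network device. The voice signal processing methods provided in some embodiments of this application may also be performed, for example, by servers on the Internet (such as a conference server or a network telephony server).
The terminal mentioned in the embodiments of the present invention may be an apparatus capable of capturing, storing, and transmitting voice signals; specifically, the terminal may be, for example, a mobile phone, a tablet computer, a personal computer, or a notebook computer.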
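Referring to FIG. 2-A, FIG. 2-A is a schematic flowchart of a voice signal processing method according to an embodiment of this application. As shown by way of example in FIG. 2-A, the method may include:
201. The first terminal sends a first voice encoded signal.
After a call connection is established between the first terminal and the second terminal, the first terminal may send the first voice encoded signal over the call connection. The second terminal may also send voice encoded signals over the call connection.
The call connection may be based on a mobile communication network or on the Internet.
In the embodiments of this application, the case in which the maximum frequency bandwidth supported by the first terminal is smaller than the maximum frequency bandwidth supported by the second terminal is used as an example. Alternatively, the maximum frequency bandwidth supported by the first network in which the first terminal is located is smaller than the maximum frequency bandwidth supported by the second network in which the second terminal is located. Here, the first network is the network to which the first terminal is attached during the current call, and the second network is the network to which the second terminal is attached during that call. When a terminal is attached to a network, the terminal's capability is limited by the network's capability; for example, even if the first terminal is a fullband terminal, if the first network supports only wideband, the first terminal can use only wideband while attached to the first network.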
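The constraint described above, that a terminal's usable bandwidth is capped by the network it is attached to, can be expressed as a small sketch. The ordering list and the names `effective_band` and `needs_vbwe` are hypothetical, and combining the terminal and network limits with a minimum on both sides of the call is an assumption made for illustration; the text itself states the two conditions (terminal capability, network capability) as alternatives.

```python
# Typical bands from narrowest to widest.
BAND_ORDER = ["NB", "WB", "SWB", "FB"]


def effective_band(terminal_max, network_max):
    """Band a terminal can actually use: the narrower of the two limits."""
    return min(terminal_max, network_max, key=BAND_ORDER.index)


def needs_vbwe(first_terminal, first_network, second_terminal, second_network):
    """True when the sender's usable band is narrower than the receiver's."""
    src = effective_band(first_terminal, first_network)
    dst = effective_band(second_terminal, second_network)
    return BAND_ORDER.index(src) < BAND_ORDER.index(dst)


# A fullband terminal attached to a wideband-only network can use only WB.
print(effective_band("FB", "WB"))
```

Under this assumption, a narrowband sender talking to a wideband receiver would trigger band extension in the network device, while two sides that both end up limited to wideband would not.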
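For example, the maximum frequency bandwidth supported by the first terminal and the maximum frequency bandwidth supported by the second terminal may be two of the following typical frequency bandwidths: narrowband (NB), wideband (WB), ultra-wideband (SWB), and fullband (FB). That is, the first terminal may be, for example, a narrowband, wideband, or ultra-wideband terminal, and the second terminal may be a wideband, ultra-wideband, or fullband terminal. Of course, the maximum frequency bandwidths supported by the first terminal and the second terminal are not limited to these typical examples.
For example, the maximum frequency bandwidth supported by the first network and the maximum frequency bandwidth supported by the second network may be two of the following typical frequency bandwidths: narrowband (NB), wideband (WB), ultra-wideband (SWB), and fullband (FB). That is, the first network may support, for example, narrowband, wideband, or ultra-wideband, and the second network may support wideband, ultra-wideband, or fullband. Of course, the maximum frequency bandwidths supported by the first network and the second network are not limited to these typical examples.
Referring to FIG. 2-B, FIG. 2-B shows example ranges of typical frequency bandwidths such as narrowband (NB), wideband (WB), ultra-wideband (SWB), and fullband (FB); of course, the range boundaries of these typical bands are not limited to the examples in the figure.
The frequency bandwidth of the first voice encoded signal is smaller than or equal to the maximum frequency bandwidth supported by the first terminal. For example, when the first terminal is a narrowband terminal, the first voice encoded signal may be a narrowband voice encoded signal. When the first terminal is a wideband terminal, the first voice encoded signal may be a wideband or narrowband voice encoded signal. When the first terminal is an ultra-wideband terminal, the first voice encoded signal may be an ultra-wideband, narrowband, or wideband voice encoded signal.
The frequency bandwidth of the first voice encoded signal is smaller than or equal to the maximum frequency bandwidth supported by the first network. For example, when the first network supports narrowband, the first voice encoded signal may be a narrowband voice encoded signal. When the first network supports wideband, the first voice encoded signal may be a wideband or narrowband voice encoded signal. When the first network supports ultra-wideband, the first voice encoded signal may be an ultra-wideband, narrowband, or wideband voice encoded signal.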
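202. The network device receives the first voice encoded signal from the first terminal and performs voice decoding processing on the first voice encoded signal to obtain voice decoding parameters and a first voice decoded signal.
The voice decoding parameters may include, for example, a pitch period, a voicing factor, and linear predictive coding parameters.
203. The network device performs virtual band extension processing using the voice decoding parameters to obtain a spread-band voice decoded signal corresponding to the first voice decoded signal.
The main mechanism behind this step is the correlation between the low-band voice signal and the high-band voice signal; therefore, any band extension algorithm based on this mechanism may be chosen to perform the VBWE (Virtual Band Width Extension) processing.
The spread-band voice decoded signal may include a high-band extension voice decoded signal, and may further include a low-band extension voice decoded signal. For example, assuming the frequency bandwidth of the first voice decoded signal is 3400 Hz - 300 Hz = 3100 Hz, the corresponding spread-band voice decoded signal may include a high-band extension voice decoded signal (for example, 7000 Hz - 3400 Hz = 3600 Hz) and may further include a low-band extension voice decoded signal (for example, 300 Hz - 50 Hz = 250 Hz). As another example, assuming the frequency bandwidth of the first voice decoded signal is 7000 Hz - 50 Hz = 6950 Hz, the corresponding spread-band voice decoded signal may include a high-band extension voice decoded signal (for example, 14000 Hz - 7000 Hz = 7000 Hz). As yet another example, assuming the frequency bandwidth of the first voice decoded signal is 14 kHz - 50 Hz = 13950 Hz, the corresponding spread-band voice decoded signal may include a high-band extension voice decoded signal (for example, 24 kHz - 14 kHz = 10 kHz). Other cases follow by analogy.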
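204. The network device combines the first voice decoded signal and the spread-band voice decoded signal and then performs voice encoding processing to obtain a second voice encoded signal.
The frequency bandwidth of the first voice encoded signal is smaller than the frequency bandwidth of the second voice encoded signal, and the sampling rate of the first voice encoded signal is smaller than the sampling rate of the second voice encoded signal. For example, the band of the first voice encoded signal may be a subset of the band of the second voice encoded signal; of course, the intersection of the band of the first voice encoded signal and the band of the second voice encoded signal may also be unequal to the band of the first voice encoded signal.
205. The network device sends the second voice encoded signal to the second terminal that has established a call connection with the first terminal.
Correspondingly, the second terminal may receive the second voice encoded signal, decode it, and play it back.
The network device mentioned in this embodiment may be, for example, a base station, a radio network controller, a core network device, or another network device. For example, the network device may be a base station or radio network controller of the radio access network accessed by the second terminal, a base station or radio network controller of the radio access network accessed by the first terminal, or a core network device such as a packet data gateway or a serving gateway.
The first terminal and the second terminal may be user equipment with a call function, such as a mobile phone, a tablet computer, a personal computer, or a notebook computer.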
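As can be seen, in the solution of this embodiment, after receiving the first voice encoded signal from the first terminal, which supports a relatively narrow bandwidth, the network device performs voice decoding processing on the first voice encoded signal to obtain voice decoding parameters and a first voice decoded signal, performs virtual band extension processing using the voice decoding parameters to obtain a spread-band voice decoded signal corresponding to the first voice decoded signal, combines the first voice decoded signal and the spread-band voice decoded signal and performs voice encoding processing to obtain a second voice encoded signal, and then sends the second voice encoded signal to the second terminal, which supports a relatively wide bandwidth. Because the network device, in the transit position, performs virtual band extension on the voice encoded signal that the first terminal sends toward the second terminal, the downlink voice encoded signal of the second terminal can better match the maximum frequency bandwidth that the second terminal supports. This helps the second terminal enjoy, as far as possible, a voice bandwidth service that matches its maximum frequency bandwidth support capability, without requiring any special functional enhancement of the second terminal, which helps improve the user's call experience.
As can also be seen, in the solution of this embodiment, after receiving the first voice encoded signal from the first terminal in the first network, which supports a relatively narrow bandwidth, the network device performs voice decoding processing on the first voice encoded signal to obtain voice decoding parameters and a first voice decoded signal, performs virtual band extension processing using the voice decoding parameters to obtain a spread-band voice decoded signal corresponding to the first voice decoded signal, combines the first voice decoded signal and the spread-band voice decoded signal and performs voice encoding processing to obtain a second voice encoded signal, and then sends the second voice encoded signal to the second terminal in the second network, which supports a relatively wide bandwidth. Because the network device, in the transit position, performs virtual band extension on the voice encoded signal that the first terminal in the first network sends toward the second terminal in the second network, the downlink voice encoded signal of the second terminal can better match the maximum frequency bandwidth that the second network supports. This helps the second terminal in the second network enjoy, as far as possible, a voice bandwidth service that matches that capability, without requiring any special functional enhancement of the second network, which helps improve the user's call experience.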
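The network-side part of the flow (steps 202 to 204) is a simple composition, which can be sketched as below. The function name `transcode_uplink` and its four callable parameters are hypothetical placeholders; real decoders, band extension processors, and encoders are codec-specific and are not modeled here.

```python
def transcode_uplink(encoded, decode, vbwe, combine, encode):
    """Apply steps 202-204 to one first voice encoded signal.

    decode  -> returns (voice decoding parameters, first decoded signal)
    vbwe    -> returns the spread-band decoded signal from the parameters
    combine -> merges the decoded signal with the spread-band signal
    encode  -> returns the second voice encoded signal
    """
    params, decoded = decode(encoded)            # step 202
    extension = vbwe(params)                     # step 203
    return encode(combine(decoded, extension))   # step 204


# Toy stand-ins, only to show how the pieces connect.
second = transcode_uplink(
    b"frame",
    decode=lambda e: ({"lpc": [0.1]}, [1, 2]),
    vbwe=lambda p: [9],
    combine=lambda a, b: a + b,
    encode=lambda s: tuple(s),
)
print(second)
```

Step 205 (sending the result to the second terminal) is left to the caller, since it depends on the transport in use.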
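A more detailed description is given below with reference to the related drawings.
Referring to FIG. 3-A, FIG. 3-A is a schematic flowchart of a voice signal processing method according to an embodiment of this application. As shown by way of example in FIG. 3-A, the method may include:
301. A call connection is established between a narrowband terminal and a wideband terminal.
302. The narrowband terminal performs voice encoding to obtain a first voice encoded signal and sends the first voice encoded signal over the call connection.
In this embodiment, the first terminal is assumed to be a narrowband terminal and the second terminal a wideband terminal. The maximum frequency bandwidth supported by the narrowband terminal is narrowband (for example, 3400 Hz - 300 Hz = 3100 Hz), and the maximum frequency bandwidth supported by the wideband terminal is wideband (for example, 7000 Hz - 50 Hz = 6950 Hz).
Specifically, for example, the narrowband terminal may use an AMR-NB encoder or another NB encoder to encode the sampled voice signal and obtain the first voice encoded signal. The sampling rate of the first voice encoded signal is 8 kHz, and its frequency bandwidth is BW1 = 3400 Hz - 300 Hz = 3100 Hz.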
303、RNC接收来自所述窄带终端的所述第一话音编码信号,所述RNC对所述第一话音编码信号进行话音解码处理以得到话音解码参数和第一话音解码信号。
其中,所述话音解码参数例如可包括基因周期、浊音度因子和线性预测编码参数等。
假设RNC中存在包括多个话音解码器的话音解码器组(话音解码器组例如可包括NB解码器、WB解码器、SWB解码器和FB解码器等),那么,RNC可基于窄带终端支持的最大频带带宽(如NB),从话音解码器组中选用NB解码器对所述第一话音编码信号进行话音解码处理以得到话音解码参数和第一话音解码信号。
参见图3-B,图3-B举例的网络设备(如RNC)包括话音解码器组(包括多个话音解码器)、VBWE处理器组(包括多个VBWE处理器组)、话音编码器组(包括多个话音编码器)和降采样器组(包括多个降采样器)等。RNC可根据需要来从相应器件组中选用相应器件执行相应操作。
本申请的各实施例,NB解码器例如为AMR(Adaptive Multi-Rate,自适应多速率)-NB解码器或其他类型的NB解码器。其中,SWB解码器例如可为EVS(Enhanced Voice Services,增强型语音服务)-SWB解码器或者其他类型的SWB解码器。其中,WB解码器例如为AMR-WB解码器或者其他类型的WB解码器。其中,FB解码器例如可为EVS-FB解码器或者其他类型的FB解码器。
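The decoder selection described for step 303 amounts to a lookup keyed by the sending terminal's maximum supported bandwidth. The following is a minimal sketch of that idea only; the `Band` enum, the `DECODER_GROUP` table, and `select_decoder` are names invented for illustration, and the decoders are represented by plain strings rather than real codec objects.

```python
from enum import IntEnum


class Band(IntEnum):
    """Bandwidth classes named in the text, ordered from narrowest to widest."""
    NB = 1   # narrowband
    WB = 2   # wideband
    SWB = 3  # super-wideband
    FB = 4   # fullband


# Hypothetical decoder group; the codec names are the examples given in the text.
DECODER_GROUP = {
    Band.NB: "AMR-NB decoder",
    Band.WB: "AMR-WB decoder",
    Band.SWB: "EVS-SWB decoder",
    Band.FB: "EVS-FB decoder",
}


def select_decoder(sender_max_band: Band) -> str:
    """Pick the decoder that matches the sending terminal's maximum bandwidth."""
    return DECODER_GROUP[sender_max_band]


# A narrowband sender is decoded with the NB decoder.
print(select_decoder(Band.NB))
```

The same table-driven pattern would apply to the VBWE processor, encoder, and downsampler groups, keyed by whichever terminal (or network) the text names for that step.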
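304. The RNC uses the voice decoding parameters to perform VBWE processing to obtain the band-extended voice decoded signal corresponding to the first voice decoded signal. The voice decoding parameters may include a pitch period, a voicing factor, linear predictive coding parameters, and the like.
Assuming the RNC contains a VBWE processor group made up of multiple VBWE processors (for example, an NB VBWE processor, a WB VBWE processor, an SWB VBWE processor, and an FB VBWE processor), the RNC may, based on the maximum frequency bandwidth supported by the wideband terminal (for example, WB), select the WB VBWE processor and use the voice decoding parameters to perform VBWE processing to obtain the band-extended voice decoded signal corresponding to the first voice decoded signal.
Using the voice decoding parameters to perform virtual bandwidth extension to obtain the band-extended voice decoded signal corresponding to the first voice decoded signal may include, for example: estimating, from the voice decoding parameters, a band-extension excitation signal corresponding to the first voice decoded signal; estimating, from the voice decoding parameters, a band-extension spectral envelope corresponding to the first voice decoded signal; and synthesizing the band-extension excitation signal with a filter corresponding to the band-extension spectral envelope to obtain the band-extended voice decoded signal.
Estimating the band-extension excitation signal may include, for example: using the voice decoding parameters (such as the pitch period and the voicing factor) to estimate the band-extension excitation signal based on a spectral folding algorithm, a white-noise excitation algorithm, or a harmonic noise model algorithm.
Estimating the band-extension spectral envelope may include, for example: using the voice decoding parameters (such as the linear predictive coding parameters) to estimate the band-extension spectral envelope based on linear mapping, codebook mapping, or statistical mapping.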
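The three sub-steps of VBWE (excitation estimate, envelope estimate, synthesis filtering) can be sketched as follows. This is an illustration under stated assumptions, not the method of this application: it assumes the decoder also exposes a narrowband excitation signal, it uses plain zero-insertion spectral folding and therefore ignores the pitch period and voicing factor, and it takes a hypothetical `mapping_matrix` for the linear mapping of LPC coefficients. All function names are invented for the example.

```python
from typing import List, Sequence


def spectral_folding(nb_excitation: Sequence[float]) -> List[float]:
    """Upsample by 2 via zero insertion, which mirrors the low-band
    spectrum into the newly created upper band."""
    out: List[float] = []
    for sample in nb_excitation:
        out.extend((sample, 0.0))
    return out


def map_envelope(nb_lpc: Sequence[float],
                 matrix: Sequence[Sequence[float]]) -> List[float]:
    """Linear mapping: extension-band LPC = matrix * narrowband LPC.
    The matrix is a placeholder; a real one would be trained offline."""
    return [sum(m * a for m, a in zip(row, nb_lpc)) for row in matrix]


def lpc_synthesis(excitation: Sequence[float],
                  lpc: Sequence[float]) -> List[float]:
    """All-pole synthesis filter: y[n] = x[n] - sum_k lpc[k] * y[n-1-k]."""
    out: List[float] = []
    for n, x in enumerate(excitation):
        acc = x
        for k, a in enumerate(lpc):
            if n - 1 - k >= 0:
                acc -= a * out[n - 1 - k]
        out.append(acc)
    return out


def vbwe(nb_excitation: Sequence[float],
         nb_lpc: Sequence[float],
         mapping_matrix: Sequence[Sequence[float]]) -> List[float]:
    """Excitation estimate -> envelope estimate -> synthesis filtering."""
    ext_excitation = spectral_folding(nb_excitation)
    ext_lpc = map_envelope(nb_lpc, mapping_matrix)
    return lpc_synthesis(ext_excitation, ext_lpc)
```

A practical system would also band-limit the result so that only the extension band is kept before it is combined with the first voice decoded signal; that step is omitted here for brevity.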
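305. The RNC combines the first voice decoded signal and the band-extended voice decoded signal and then performs voice encoding to obtain a second voice encoded signal.
The frequency bandwidth of the first voice encoded signal is less than that of the second voice encoded signal, and the sampling rate of the first voice encoded signal is less than that of the second voice encoded signal. The sampling rate of the second voice encoded signal is, for example, 16 kHz, and its frequency bandwidth is denoted BW2.
The band-extended voice decoded signal corresponding to the first voice decoded signal includes a high-band extension voice decoded signal whose frequency bandwidth is BWE1 = 7000 Hz - 3400 Hz = 3600 Hz. If the band-extended voice decoded signal includes the high-band extension but not a low-band extension, then, as shown by way of example in FIG. 3-C and FIG. 3-D,
BW2 = BW1 + BWE1 = 3100 Hz + (7000 Hz - 3400 Hz) = 6700 Hz.
In addition, if the band-extended voice decoded signal includes both the high-band extension voice decoded signal and a low-band extension voice decoded signal, then
BW2 = BW1 + BWE1 + BWE2 = 3100 Hz + (7000 Hz - 3400 Hz) + (300 Hz - 50 Hz) = 6950 Hz, where BWE2 = 300 Hz - 50 Hz = 250 Hz is the frequency bandwidth of the low-band extension voice decoded signal.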
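The two bandwidth sums above can be checked mechanically. In this small sketch a band is a hypothetical `(low_hz, high_hz)` tuple and the helper names are invented for illustration; the band edges are the example figures from the text.

```python
from typing import Iterable, Tuple

BandEdges = Tuple[int, int]  # (low edge in Hz, high edge in Hz)

NB_CORE: BandEdges = (300, 3400)    # first voice decoded signal, BW1
HIGH_EXT: BandEdges = (3400, 7000)  # high-band extension, BWE1
LOW_EXT: BandEdges = (50, 300)      # low-band extension, BWE2


def bandwidth_hz(band: BandEdges) -> int:
    """Width of a single band."""
    low, high = band
    return high - low


def combined_bandwidth(core: BandEdges, extensions: Iterable[BandEdges]) -> int:
    """BW2 = BW1 plus the widths of all extension bands."""
    return bandwidth_hz(core) + sum(bandwidth_hz(b) for b in extensions)


print(combined_bandwidth(NB_CORE, [HIGH_EXT]))           # high band only
print(combined_bandwidth(NB_CORE, [HIGH_EXT, LOW_EXT]))  # high and low bands
```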
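Assuming the RNC contains a voice encoder group made up of multiple voice encoders (for example, an NB encoder, a WB encoder, an SWB encoder, and an FB encoder), the RNC may, based on the maximum frequency bandwidth supported by the wideband terminal (WB), select the WB encoder to encode the combination of the first voice decoded signal and the band-extended voice decoded signal and obtain the second voice encoded signal.
In the embodiments of this application, the NB encoder may be, for example, an AMR-NB encoder or another type of NB encoder. The WB encoder is, for example, an AMR-WB encoder or another type of WB encoder. The SWB encoder may be, for example, an EVS-SWB encoder or another type of SWB encoder. The FB encoder may be, for example, an EVS-FB encoder or another type of FB encoder.
306. The RNC sends the second voice encoded signal to the wideband terminal that has established a call connection with the narrowband terminal.
Correspondingly, the wideband terminal may receive the second voice encoded signal, decode it, and play it back.
307. The wideband terminal performs voice encoding to obtain a third voice encoded signal and sends the third voice encoded signal over the call connection.
Assume that the sampling rate of the third voice encoded signal is, for example, 16 kHz, and that its frequency bandwidth is BW3 = 7000 Hz - 300 Hz = 6700 Hz.
308. The RNC receives the third voice encoded signal from the wideband terminal and performs voice decoding on it to obtain a third voice decoded signal.
Assuming the RNC contains a voice decoder group made up of multiple voice decoders (for example, an NB decoder, a WB decoder, an SWB decoder, and an FB decoder), the RNC may, based on the maximum frequency bandwidth supported by the wideband terminal (WB), select the WB decoder to decode the third voice encoded signal and obtain the third voice decoded signal.
309. The RNC downsamples the third voice decoded signal to obtain a fourth voice decoded signal.
Assuming the RNC contains a downsampler group made up of multiple downsamplers (for example, an NB downsampler, a WB downsampler, an SWB downsampler, and an FB downsampler), the RNC may, based on the maximum frequency bandwidth supported by the narrowband terminal (NB), select the NB downsampler from the group to downsample the third voice decoded signal and obtain the fourth voice decoded signal.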
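A downsampler that halves the sampling rate (for example, 16 kHz to 8 kHz) can be sketched as a low-pass filter followed by keeping every second sample. The filter used below, a 31-tap Hamming-windowed sinc, is an assumption chosen for illustration; the text does not specify a filter design, and the function names are invented for the example.

```python
import math
from typing import List, Sequence


def lowpass_taps(num_taps: int, cutoff: float) -> List[float]:
    """Hamming-windowed sinc low-pass filter.
    cutoff is in cycles per sample (0 < cutoff < 0.5); num_taps must be odd."""
    mid = (num_taps - 1) // 2
    taps = []
    for n in range(num_taps):
        t = n - mid
        if t == 0:
            ideal = 2 * cutoff
        else:
            ideal = math.sin(2 * math.pi * cutoff * t) / (math.pi * t)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    gain = sum(taps)
    return [tap / gain for tap in taps]  # normalize to unity gain at DC


def downsample_by_2(signal: Sequence[float], num_taps: int = 31) -> List[float]:
    """Low-pass at a quarter of the input rate, then keep every second sample."""
    taps = lowpass_taps(num_taps, cutoff=0.25)
    filtered = []
    for n in range(len(signal)):
        acc = 0.0
        for k, tap in enumerate(taps):
            if n - k >= 0:
                acc += tap * signal[n - k]
        filtered.append(acc)
    return filtered[::2]
```

The low-pass stage matters: dropping samples without it would alias the 4 kHz to 8 kHz content of the wideband signal into the narrowband output.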
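310. The RNC performs voice encoding on the fourth voice decoded signal to obtain a fourth voice encoded signal.
Assuming the RNC contains a voice encoder group made up of multiple voice encoders (for example, an NB encoder, a WB encoder, an SWB encoder, and an FB encoder), the RNC may, based on the maximum frequency bandwidth supported by the narrowband terminal (NB), select the NB encoder from the group to encode the fourth voice decoded signal and obtain the fourth voice encoded signal.
It can be seen that the frequency bandwidth of the fourth voice encoded signal (NB) is less than that of the third voice encoded signal (WB), and the sampling rate of the fourth voice encoded signal (8 kHz) is less than that of the third voice encoded signal (16 kHz).
The frequency bandwidth of the fourth voice encoded signal is BW4 = 3400 Hz - 300 Hz = 3100 Hz.
311. The RNC sends the fourth voice encoded signal to the narrowband terminal; alternatively, the RNC performs voice enhancement on the fourth voice encoded signal and then sends the enhanced fourth voice encoded signal to the narrowband terminal.
Through voice enhancement, the gain MOS of the fourth voice encoded signal can be made no lower than the gain MOS obtained when the wideband terminal directly sends a voice signal whose frequency bandwidth is BW1.
The narrowband terminal and the wideband terminal may take the product form of user terminals with a call function, such as user equipment (UE).
It can be understood that there is no required execution order between steps 301 to 306 and steps 307 to 311.
FIG. 3-E shows, by way of example, how the voice signal flows among the narrowband terminal, the RNC (one example of a network device), and the wideband terminal. The narrowband terminal, the RNC, and the wideband terminal may have the functional components illustrated in FIG. 3-E.
It can be seen that, in the example solution of this embodiment, after receiving the narrowband voice encoded signal from the narrowband terminal, the RNC performs voice decoding on it to obtain voice decoding parameters and a narrowband voice decoded signal, and uses the voice decoding parameters to perform virtual bandwidth extension to obtain the band-extended voice decoded signal corresponding to the narrowband voice decoded signal. The RNC then combines the narrowband voice decoded signal and the band-extended voice decoded signal, performs voice encoding to obtain a wideband voice encoded signal, and sends it to the wideband terminal. Because the network device at the relay position performs virtual bandwidth extension on the voice encoded signal sent from the narrowband terminal to the wideband terminal, the downlink voice encoded signal of the wideband terminal can better match the maximum frequency bandwidth that the wideband terminal supports. This helps the wideband terminal enjoy, as far as possible, a voice signal bandwidth service that matches its maximum supported frequency bandwidth, without requiring any special functional enhancement of the wideband terminal, which helps improve the user's call experience.
It can be understood that the embodiment corresponding to FIG. 3 describes a call between a narrowband terminal and a wideband terminal. The procedure for a call between a terminal in a narrowband network and a terminal in a wideband network is similar and is not described again.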
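Referring to FIG. 4-A, FIG. 4-A is a schematic flowchart of a voice signal processing method according to an embodiment of this application. As shown by way of example in FIG. 4-A, the method may include:
401. A wideband terminal and a super-wideband terminal establish a call connection.
402. The wideband terminal performs voice encoding to obtain a first voice encoded signal and sends the first voice encoded signal over the call connection.
In this embodiment, the first terminal is assumed to be the wideband terminal and the second terminal the super-wideband terminal. The maximum frequency bandwidth supported by the wideband terminal is wideband (for example, 7 kHz - 50 Hz = 6950 Hz), and the maximum frequency bandwidth supported by the super-wideband terminal is super-wideband (for example, 14 kHz - 50 Hz = 13950 Hz).
Specifically, for example, the wideband terminal may use an AMR-WB encoder or another WB encoder to encode the sampled voice signal and obtain the first voice encoded signal. The sampling rate of the first voice encoded signal is 16 kHz, and its frequency bandwidth is BW1 = 7 kHz - 50 Hz = 6950 Hz.
403. The RNC receives the first voice encoded signal from the wideband terminal and performs voice decoding on it to obtain voice decoding parameters and a first voice decoded signal.
The voice decoding parameters may include, for example, a pitch period, a voicing factor, and linear predictive coding parameters.
Assuming the RNC contains a voice decoder group made up of multiple voice decoders (for example, an NB decoder, a WB decoder, an SWB decoder, and an FB decoder), the RNC may, based on the maximum frequency bandwidth supported by the wideband terminal (for example, WB), select the WB decoder from the group to decode the first voice encoded signal and obtain the voice decoding parameters and the first voice decoded signal.
404. The RNC uses the voice decoding parameters to perform VBWE processing to obtain the band-extended voice decoded signal corresponding to the first voice decoded signal. The voice decoding parameters may include a pitch period, a voicing factor, linear predictive coding parameters, and the like.
Assuming the RNC contains a VBWE processor group made up of multiple VBWE processors (for example, an NB VBWE processor, a WB VBWE processor, an SWB VBWE processor, and an FB VBWE processor), the RNC may, based on the maximum frequency bandwidth supported by the super-wideband terminal (for example, SWB), select the SWB VBWE processor and use the voice decoding parameters to perform VBWE processing to obtain the band-extended voice decoded signal corresponding to the first voice decoded signal.
Using the voice decoding parameters to perform virtual bandwidth extension to obtain the band-extended voice decoded signal corresponding to the first voice decoded signal may include, for example: estimating, from the voice decoding parameters, a band-extension excitation signal corresponding to the first voice decoded signal; estimating, from the voice decoding parameters, a band-extension spectral envelope corresponding to the first voice decoded signal; and synthesizing the band-extension excitation signal with a filter corresponding to the band-extension spectral envelope to obtain the band-extended voice decoded signal.
Estimating the band-extension excitation signal may include, for example: using the voice decoding parameters (such as the pitch period and the voicing factor) to estimate the band-extension excitation signal based on a spectral folding algorithm, a white-noise excitation algorithm, or a harmonic noise model algorithm.
Estimating the band-extension spectral envelope may include, for example: using the voice decoding parameters (such as the linear predictive coding parameters) to estimate the band-extension spectral envelope based on linear mapping, codebook mapping, or statistical mapping.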
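405. The RNC combines the first voice decoded signal and the band-extended voice decoded signal and then performs voice encoding to obtain a second voice encoded signal.
The frequency bandwidth of the first voice encoded signal is less than that of the second voice encoded signal, and the sampling rate of the first voice encoded signal is less than that of the second voice encoded signal. The sampling rate of the second voice encoded signal is, for example, 32 kHz, and its frequency bandwidth is denoted BW2.
The band-extended voice decoded signal corresponding to the first voice decoded signal includes a high-band extension voice decoded signal whose frequency bandwidth is BWE1 = 14 kHz - 7 kHz = 7 kHz. If the band-extended voice decoded signal includes the high-band extension but not a low-band extension, then
BW2 = BW1 + BWE1 = 6950 Hz + (14 kHz - 7 kHz) = 13950 Hz.
Assuming the RNC contains a voice encoder group made up of multiple voice encoders (for example, an NB encoder, a WB encoder, an SWB encoder, and an FB encoder), the RNC may, based on the maximum frequency bandwidth supported by the super-wideband terminal (SWB), select the SWB encoder to encode the combination of the first voice decoded signal and the band-extended voice decoded signal and obtain the second voice encoded signal.
406. The RNC sends the second voice encoded signal to the super-wideband terminal that has established a call connection with the wideband terminal.
Correspondingly, the super-wideband terminal may receive the second voice encoded signal, decode it, and play it back.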
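407. The super-wideband terminal performs voice encoding to obtain a third voice encoded signal and sends the third voice encoded signal over the call connection.
Assume that the sampling rate of the third voice encoded signal is, for example, 32 kHz, and that its frequency bandwidth is BW3 = 14 kHz - 50 Hz = 13950 Hz.
408. The RNC receives the third voice encoded signal from the super-wideband terminal and performs voice decoding on it to obtain a third voice decoded signal.
Assuming the RNC contains a voice decoder group made up of multiple voice decoders (for example, an NB decoder, a WB decoder, an SWB decoder, and an FB decoder), the RNC may, based on the maximum frequency bandwidth supported by the super-wideband terminal (SWB), select the SWB decoder to decode the third voice encoded signal and obtain the third voice decoded signal.
409. The RNC downsamples the third voice decoded signal to obtain a fourth voice decoded signal.
Assuming the RNC contains a downsampler group made up of multiple downsamplers (for example, an NB downsampler, a WB downsampler, an SWB downsampler, and an FB downsampler), the RNC may, based on the maximum frequency bandwidth supported by the wideband terminal (WB), select the WB downsampler from the group to downsample the third voice decoded signal and obtain the fourth voice decoded signal.
410. The RNC performs voice encoding on the fourth voice decoded signal to obtain a fourth voice encoded signal.
Assuming the RNC contains a voice encoder group made up of multiple voice encoders (for example, an NB encoder, a WB encoder, an SWB encoder, and an FB encoder), the RNC may, based on the maximum frequency bandwidth supported by the wideband terminal (WB), select the WB encoder from the group to encode the fourth voice decoded signal and obtain the fourth voice encoded signal.
It can be seen that the frequency bandwidth of the fourth voice encoded signal (WB) is less than that of the third voice encoded signal (SWB), and the sampling rate of the fourth voice encoded signal (16 kHz) is less than that of the third voice encoded signal (32 kHz).
The frequency bandwidth of the fourth voice encoded signal is BW4 = 7000 Hz - 50 Hz = 6950 Hz.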
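The example sampling rates and band edges quoted in the embodiments are consistent with the sampling theorem, which requires a sampling rate of at least twice the highest frequency carried. A small check using only figures from the text (the names `EXAMPLES` and `satisfies_nyquist` are invented for this sketch):

```python
# (upper band edge in Hz, sampling rate in Hz) as quoted in the embodiments
EXAMPLES = {
    "NB": (3400, 8000),
    "WB": (7000, 16000),
    "SWB": (14000, 32000),
}


def satisfies_nyquist(upper_edge_hz: int, sampling_rate_hz: int) -> bool:
    """True if the sampling rate is at least twice the highest frequency."""
    return sampling_rate_hz >= 2 * upper_edge_hz


for name, (upper, rate) in EXAMPLES.items():
    print(name, satisfies_nyquist(upper, rate))
```

This is also why each step up in bandwidth class in the examples comes with a doubling of the sampling rate, and why the reverse direction needs a downsampler before re-encoding.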
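411. The RNC sends the fourth voice encoded signal to the wideband terminal; alternatively, the RNC performs voice enhancement on the fourth voice encoded signal and then sends the enhanced fourth voice encoded signal to the wideband terminal.
Through voice enhancement, the gain MOS of the fourth voice encoded signal can be made no lower than the gain MOS obtained when the super-wideband terminal directly sends a voice signal whose frequency bandwidth is BW1.
The wideband terminal and the super-wideband terminal may take the product form of user terminals with a call function, such as user equipment (UE).
It can be understood that there is no required execution order between steps 401 to 406 and steps 407 to 411.
FIG. 4-B shows, by way of example, how the voice signal flows among the wideband terminal, the RNC (one example of a network device), and the super-wideband terminal. The wideband terminal, the RNC, and the super-wideband terminal may have the functional components illustrated in FIG. 4-B.
It can be seen that, in the example solution of this embodiment, after receiving the wideband voice encoded signal from the wideband terminal, the network device (RNC) performs voice decoding on it to obtain voice decoding parameters and a wideband voice decoded signal, and uses the voice decoding parameters to perform virtual bandwidth extension to obtain the band-extended voice decoded signal corresponding to the wideband voice decoded signal. The RNC then combines the wideband voice decoded signal and the band-extended voice decoded signal, performs voice encoding to obtain a super-wideband voice encoded signal, and sends it to the super-wideband terminal. Because the RNC at the relay position performs virtual bandwidth extension on the voice encoded signal sent from the wideband terminal to the super-wideband terminal, the downlink voice encoded signal of the super-wideband terminal can better match the maximum frequency bandwidth that the super-wideband terminal supports. This helps the super-wideband terminal enjoy, as far as possible, a voice signal bandwidth service that matches its maximum supported frequency bandwidth, without requiring any special functional enhancement of the super-wideband terminal, which helps improve the user's call experience.
It can be understood that the embodiment corresponding to FIG. 4 describes a call between a wideband terminal and a super-wideband terminal. The procedure for a call between a terminal in a wideband network and a terminal in a super-wideband network is similar and is not described again.
FIG. 3-A uses the example in which the first terminal is a narrowband terminal and the second terminal is a wideband terminal, and FIG. 4-B uses the example in which the first terminal is a wideband terminal and the second terminal is a super-wideband terminal. Cases in which the first terminal and the second terminal are of other types follow by analogy.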
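Referring to FIG. 5, an embodiment of this application provides a network device 500, including:
A communication interface 510, configured to receive a first voice encoded signal from a first terminal.
A first voice decoder 520, configured to perform voice decoding on the first voice encoded signal to obtain voice decoding parameters and a first voice decoded signal.
A first virtual bandwidth extension processor 530, configured to use the voice decoding parameters to perform virtual bandwidth extension to obtain a band-extended voice decoded signal corresponding to the first voice decoded signal.
A first voice encoder 540, configured to combine the first voice decoded signal and the band-extended voice decoded signal and then perform voice encoding to obtain a second voice encoded signal, where the frequency bandwidth of the first voice encoded signal is less than that of the second voice encoded signal, and the sampling rate of the first voice encoded signal is less than that of the second voice encoded signal.
The communication interface 510 is further configured to send the second voice encoded signal to a second terminal that has established a call connection with the first terminal. The maximum frequency bandwidth supported by the first terminal is less than that supported by the second terminal, or the maximum frequency bandwidth supported by the first network in which the first terminal is located is less than that supported by the second network in which the second terminal is located.
In some possible implementations of the present invention, the first virtual bandwidth extension processor 530 is specifically configured to: estimate, from the voice decoding parameters, a band-extension excitation signal corresponding to the first voice decoded signal; estimate, from the voice decoding parameters, a band-extension spectral envelope corresponding to the first voice decoded signal; and synthesize the band-extension excitation signal with a filter corresponding to the band-extension spectral envelope to obtain the band-extended voice decoded signal corresponding to the first voice decoded signal.
In some possible implementations of the present invention, in estimating the band-extension excitation signal, the first virtual bandwidth extension processor 530 is specifically configured to use the voice decoding parameters to estimate the band-extension excitation signal corresponding to the first voice decoded signal based on a spectral folding algorithm, a white-noise excitation algorithm, or a harmonic noise model algorithm.
In some possible implementations of the present invention, in estimating the band-extension spectral envelope, the first virtual bandwidth extension processor 530 is specifically configured to use the voice decoding parameters to estimate the band-extension spectral envelope corresponding to the first voice decoded signal based on linear mapping, codebook mapping, or statistical mapping.
In some possible implementations of the present invention, the voice decoding parameters include a pitch period, a voicing factor, and linear predictive coding parameters.
In some possible implementations of the present invention, the network device includes multiple voice decoders, and the first voice decoder is the one among them that corresponds to the maximum frequency bandwidth supported by the first terminal; or the first voice decoder is the one among them that corresponds to the maximum frequency bandwidth supported by the first network.
In some possible implementations of the present invention, the network device includes multiple virtual bandwidth extension processors, and the first virtual bandwidth extension processor is the one among them that corresponds to the maximum frequency bandwidth supported by the second terminal; or the first virtual bandwidth extension processor is the one among them that corresponds to the maximum frequency bandwidth supported by the second network.
In some possible implementations of the present invention, the network device includes multiple voice encoders, and the first voice encoder is the one among them that corresponds to the maximum frequency bandwidth supported by the second terminal; or the first voice encoder is the one among them that corresponds to the maximum frequency bandwidth supported by the second network.
In some possible implementations of the present invention, the frequency bandwidths of the first voice encoded signal and the second voice encoded signal are, for example, two of the following: narrowband, wideband, super-wideband, and fullband.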
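Because the first voice encoded signal must be narrower than the second, choosing "two of" the four bandwidth classes yields six possible (first, second) pairings. The sketch below simply enumerates them; the list name `BANDS` and the helper `valid_band_pairs` are invented for illustration.

```python
from itertools import combinations

# Bandwidth classes from the text, ordered narrowest to widest.
BANDS = ["NB", "WB", "SWB", "FB"]


def valid_band_pairs():
    """All (first, second) pairs in which the first band is narrower."""
    # combinations() preserves input order, so each pair is (narrower, wider).
    return list(combinations(BANDS, 2))


for first, second in valid_band_pairs():
    print(f"{first} -> {second}")
```

The embodiments of FIG. 3 and FIG. 4 correspond to the NB to WB and WB to SWB pairings respectively.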
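In some possible implementations of the present invention, the network device 500 further includes a second voice decoder 550, a second voice encoder 570, and a first downsampler 560.
The communication interface 510 is further configured to receive a third voice encoded signal from the second terminal.
The second voice decoder 550 is configured to perform voice decoding on the third voice encoded signal to obtain a third voice decoded signal.
The first downsampler 560 is configured to downsample the third voice decoded signal to obtain a fourth voice decoded signal.
The second voice encoder 570 is configured to perform voice encoding on the fourth voice decoded signal to obtain a fourth voice encoded signal, where the frequency bandwidth of the fourth voice encoded signal is less than that of the third voice encoded signal, and the sampling rate of the fourth voice encoded signal is less than that of the third voice encoded signal.
The communication interface 510 is further configured to send the fourth voice encoded signal to the first terminal; or the communication interface 510 is further configured to send, after voice enhancement has been performed on the fourth voice encoded signal to obtain an enhanced fourth voice encoded signal, the enhanced fourth voice encoded signal to the first terminal.
In some possible implementations of the present invention, the network device 500 includes multiple voice decoders, and the second voice decoder 550 is the voice decoder corresponding to the maximum frequency bandwidth supported by the second terminal; or the second voice decoder 550 is the voice decoder corresponding to the maximum frequency bandwidth supported by the second network.
In some possible implementations of the present invention, the network device includes multiple downsamplers, and the first downsampler is the one among them that corresponds to the maximum frequency bandwidth supported by the first terminal; or the first downsampler is the one among them that corresponds to the maximum frequency bandwidth supported by the first network.
In some possible implementations of the present invention, the network device 500 includes multiple voice encoders, and the second voice encoder 570 is the voice encoder corresponding to the maximum frequency bandwidth supported by the first terminal; or the second voice encoder 570 is the voice encoder corresponding to the maximum frequency bandwidth supported by the first network.
In some possible implementations of the present invention, the network device is a base station, a radio network controller, or a core network device.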
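Referring to FIG. 6, an embodiment of the present invention provides a network device 600, including:
A storage unit 620, a communication interface 610, and a processor 630 coupled to the storage unit 620 and the communication interface 610. The storage unit 620 is configured to store instructions, the processor 630 is configured to execute the instructions, and the communication interface 610 is configured to communicate with other devices under the control of the processor 630. When executing the instructions, the processor 630 may perform any of the voice signal processing methods in the foregoing embodiments according to the instructions.
Specifically, the processor 630 is configured to: receive a first voice encoded signal from a first terminal through the communication interface 610; perform voice decoding on the first voice encoded signal to obtain voice decoding parameters and a first voice decoded signal; use the voice decoding parameters to perform virtual bandwidth extension to obtain a band-extended voice decoded signal corresponding to the first voice decoded signal; combine the first voice decoded signal and the band-extended voice decoded signal and then perform voice encoding to obtain a second voice encoded signal, where the frequency bandwidth of the first voice encoded signal is less than that of the second voice encoded signal, and the sampling rate of the first voice encoded signal is less than that of the second voice encoded signal; and send, through the communication interface 610, the second voice encoded signal to a second terminal that has established a call connection with the first terminal. The maximum frequency bandwidth supported by the first terminal is less than that supported by the second terminal, or the maximum frequency bandwidth supported by the first network in which the first terminal is located is less than that supported by the second network in which the second terminal is located.
In some possible implementations of the present invention, the processor 630 is specifically configured to: estimate, from the voice decoding parameters, a band-extension excitation signal corresponding to the first voice decoded signal; estimate, from the voice decoding parameters, a band-extension spectral envelope corresponding to the first voice decoded signal; and synthesize the band-extension excitation signal with a filter corresponding to the band-extension spectral envelope to obtain the band-extended voice decoded signal corresponding to the first voice decoded signal.
In some possible implementations of the present invention, in estimating the band-extension excitation signal, the processor 630 is specifically configured to use the voice decoding parameters to estimate the band-extension excitation signal corresponding to the first voice decoded signal based on a spectral folding algorithm, a white-noise excitation algorithm, or a harmonic noise model algorithm.
In some possible implementations of the present invention, in estimating the band-extension spectral envelope, the processor 630 is specifically configured to use the voice decoding parameters to estimate the band-extension spectral envelope corresponding to the first voice decoded signal based on linear mapping, codebook mapping, or statistical mapping.
In some possible implementations of the present invention, the voice decoding parameters include a pitch period, a voicing factor, and linear predictive coding parameters.
In some possible implementations of the present invention, the frequency bandwidths of the first voice encoded signal and the second voice encoded signal are, for example, two of the following: narrowband, wideband, super-wideband, and fullband.
In some possible implementations of the present invention, the processor 630 is further configured to: receive a third voice encoded signal from the second terminal through the communication interface 610; perform voice decoding on the third voice encoded signal to obtain a third voice decoded signal; downsample the third voice decoded signal to obtain a fourth voice decoded signal; perform voice encoding on the fourth voice decoded signal to obtain a fourth voice encoded signal, where the frequency bandwidth of the fourth voice encoded signal is less than that of the third voice encoded signal, and the sampling rate of the fourth voice encoded signal is less than that of the third voice encoded signal; and send the fourth voice encoded signal to the first terminal through the communication interface 610. Alternatively, the communication interface 610 is further configured to send, after voice enhancement has been performed on the fourth voice encoded signal to obtain an enhanced fourth voice encoded signal, the enhanced fourth voice encoded signal to the first terminal.
In some possible implementations of the present invention, the network device 600 may be, for example, a base station, a radio network controller, a core network device, or a network telephony server.
It can be seen that, in the foregoing example solution, after receiving the first voice encoded signal from the first terminal, which supports a relatively narrow bandwidth, the network device 600 performs voice decoding on it to obtain voice decoding parameters and a first voice decoded signal, and uses the voice decoding parameters to perform virtual bandwidth extension to obtain the band-extended voice decoded signal corresponding to the first voice decoded signal. The network device then combines the first voice decoded signal and the band-extended voice decoded signal, performs voice encoding to obtain the second voice encoded signal, and sends it to the second terminal, which supports a relatively wide bandwidth. Because the network device at the relay position performs virtual bandwidth extension on the voice encoded signal sent from the narrower-bandwidth first terminal to the wider-bandwidth second terminal, the downlink voice encoded signal of the second terminal can better match the maximum frequency bandwidth that the second terminal supports. This helps the second terminal enjoy, as far as possible, a voice signal bandwidth service that matches its maximum supported frequency bandwidth, without requiring any special functional enhancement of the second terminal, which helps improve the user's call experience. The foregoing example therefore helps improve service quality when the terminals' maximum supported frequency bandwidths are asymmetric.
The same holds at the network level. After receiving the first voice encoded signal from the first terminal in a first network that supports a relatively narrow bandwidth, the network device 600 performs voice decoding on it to obtain voice decoding parameters and a first voice decoded signal, and uses the voice decoding parameters to perform virtual bandwidth extension to obtain the band-extended voice decoded signal corresponding to the first voice decoded signal. The network device then combines the two signals, performs voice encoding to obtain the second voice encoded signal, and sends it to the second terminal in a second network that supports a relatively wide bandwidth. Because the network device at the relay position performs virtual bandwidth extension on the voice encoded signal sent from the first terminal in the first network to the second terminal in the second network, the downlink voice encoded signal of the second terminal can better match the maximum frequency bandwidth that the second network supports. This helps the second terminal in the second network enjoy, as far as possible, a voice signal bandwidth service that matches that maximum supported frequency bandwidth, without requiring any special functional enhancement of the second network or the second terminal, which helps improve the user's call experience. The foregoing example therefore helps improve service quality when the maximum supported frequency bandwidths are asymmetric.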
在上述实施例中,对各个实施例的描述都各有侧重,某个实施例中没有详述的部分,可以参见其他实施例的相关描述。
在本申请所提供的几个实施例中,应该理解到,所揭露的装置,可通过其它的方式实现。例如以上所描述的装置实施例仅仅是示意性的,例如所述单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,例如多个单元或组件可结合或者可以集成到另一个系统,或一些特征可以忽略或不执行。另一点,所显示或讨论的相互之间的间接耦合或者直接耦合或通信连接可以是通过一些接口,装置或单元的间接耦合或通信连接,可以是电性或其它的形式。
所述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部单元来实现本实施例的方案的目的。
另外,在本发明各实施例中的各功能单元可集成在一个处理单元中,也可以是各单元单独物理存在,也可两个或两个以上单元集成在一个单元中。上述集成的单元既可以采用硬件的形式实现,或者也可以采用软件功能单元的形式实现。
所述集成的单元如果以软件功能单元的形式实现并作为独立的产品销售或使用时,可以存储在一个计算机可读取存储介质中。基于这样的理解,本发明的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的全部或部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储介质中,包括若干指令用以使得一台计算机设备(可为个人计算机、服务器或者网络设备等)执行本发明各个实施例所述方法的全部或部分步骤。而前述的存储介质包括:U盘、只读存储器(ROM,Read-Only Memory)、随机存取存储器(RAM,Random Access Memory)、移动硬盘、磁碟或者光盘等各种可以存储程序代码的介质。

Claims (29)

  1. 一种话音信号处理方法,其特征在于,包括:
    网络设备接收来自第一终端的第一话音编码信号;
    所述网络设备对所述第一话音编码信号进行话音解码处理以得到话音解码参数和第一话音解码信号;
    所述网络设备利用所述话音解码参数进行虚拟频带扩展处理以得到与所述第一话音解码信号对应的扩频带话音解码信号;所述网络设备将所述第一话音解码信号和所述扩频带话音解码信号组合后进行话音编码处理以得到第二话音编码信号;其中,所述第一话音编码信号的频带带宽小于所述第二话音编码信号的频带带宽,所述第一话音编码信号的采样率小于所述第二话音编码信号的采样率;
    所述网络设备向与所述第一终端建立了通话连接的第二终端发送所述第二话音编码信号。
  2. 根据权利要求1所述的方法,其特征在于,所述利用所述话音解码参数进行虚拟频带扩展处理以得到与所述第一话音解码信号对应的扩频带话音解码信号包括:
    利用所述话音解码参数估计出与所述第一话音解码信号对应的扩频带激励信号;利用所述话音解码参数估计出与所述第一话音解码信号对应的扩频带谱包络;利用与所述扩频带谱包络对应的滤波器对所述扩频带激励信号进行合成处理以得到与所述第一话音解码信号对应的扩频带话音解码信号。
  3. 根据权利要求2所述的方法,其特征在于,所述利用所述话音解码参数估计出与所述第一话音解码信号对应的扩频带激励信号包括:利用所述话音解码参数,基于谱折叠算法、白噪声激励算法或谐波噪声模型算法估计出与所述第一话音解码信号对应的扩频带激励信号。
  4. 根据权利要求2或3所述的方法,其特征在于,所述利用所述话音解码参数估计出与所述第一话音解码信号对应的扩频带谱包络,包括:利用所述话音解码参数,基于线性映射法、码本映射法或统计映射法估计出与所述第一话音解码信号对应的扩频带谱包络。
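  5. The method according to any one of claims 1 to 4, characterized in that the voice decoding parameters include a pitch period, a voicing factor, and linear predictive coding parameters.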
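  6. The method according to any one of claims 1 to 5, characterized in that performing voice decoding on the first voice encoded signal to obtain the voice decoding parameters and the first voice decoded signal includes: selecting, from multiple voice decoders, the voice decoder corresponding to the maximum frequency bandwidth supported by the first terminal, and performing voice decoding on the first voice encoded signal to obtain the voice decoding parameters and the first voice decoded signal; or selecting, from multiple voice decoders, the voice decoder corresponding to the maximum frequency bandwidth supported by the first network in which the first terminal is located, and performing voice decoding on the first voice encoded signal to obtain the voice decoding parameters and the first voice decoded signal;
    or, using the voice decoding parameters to perform virtual bandwidth extension to obtain the band-extended voice decoded signal corresponding to the first voice decoded signal includes: selecting, from multiple virtual bandwidth extenders, the virtual bandwidth extender corresponding to the maximum frequency bandwidth supported by the second terminal, and using the voice decoding parameters to perform virtual bandwidth extension to obtain the band-extended voice decoded signal corresponding to the first voice decoded signal; or selecting, from multiple virtual bandwidth extenders, the virtual bandwidth extender corresponding to the maximum frequency bandwidth supported by the second network in which the second terminal is located, and using the voice decoding parameters to perform virtual bandwidth extension to obtain the band-extended voice decoded signal corresponding to the first voice decoded signal;
    or, combining the first voice decoded signal and the band-extended voice decoded signal and then performing voice encoding to obtain the second voice encoded signal includes: selecting, from multiple voice encoders, the voice encoder corresponding to the maximum frequency bandwidth supported by the second terminal, and combining the first voice decoded signal and the band-extended voice decoded signal and then performing voice encoding to obtain the second voice encoded signal; or selecting, from multiple voice encoders, the voice encoder corresponding to the maximum frequency bandwidth supported by the second network in which the second terminal is located, and combining the first voice decoded signal and the band-extended voice decoded signal and then performing voice encoding to obtain the second voice encoded signal.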
  7. 根据权利要求1至6任一项所述的方法,其特征在于,
    所述第一话音编码信号和所述第二话音编码信号的频带带宽为如下频带带宽中的其中两个:窄带、宽带、超宽带和全带。
  8. 根据权利要求1至7任一项所述的方法,其特征在于,所述话音信号处理方法还包括:
    所述网络设备接收来自所述第二终端的第三话音编码信号;
    所述网络设备对所述第三话音编码信号进行话音解码处理以得到第三话音解码信号;
    所述网络设备对所述第三话音解码信号降采样处理以得到第四话音解码信号;所述网络设备对所述第四话音解码信号进行话音编码处理以得到第四话音编码信号,其中,所述第四话音编码信号的频带带宽小于所述第三话音编码信号的频带带宽,所述第四话音编码信号的采样率小于所述第三话音编码信号的采样率;
    所述网络设备向所述第一终端发送所述第四话音编码信号;或者所述网络设备在对所述第四话音编码信号进行话音增强处理以得到话音增强处理后的第四话音编码信号之后,向所述第一终端发送所述话音增强处理后的第四话音编码信号。
  9. 根据权利要求8所述的方法,其特征在于,所述对所述第三话音编码信号进行话音解码处理以得到第三话音解码信号,包括:从多个话音解码器中选用与所述第二终端所支持的最大频带带宽对应的话音解码器,对所述第三话音编码信号进行话音解码处理以得到第三话音解码信号;或者,从多个话音解码器中选用与所述第二终端所处第二网络所支持的最大频带带宽对应的话音解码器,对所述第三话音编码信号进行话音解码处理以得到第三话音解码信号;
    或者,所述对所述第三话音解码信号降采样处理以得到第四话音解码信号包括:从多个降采样器中选用与所述第一终端所支持的最大频带带宽对应的降采样器,对所述第三话音解码信号降采样处理以得到第四话音解码信号;或者,从多个降采样器中选用与所述第一终端所处第一网络所支持的最大频带带宽对应的降采样器,对所述第三话音解码信号降采样处理以得到第四话音解码信号;
    或者,所述对所述第四话音解码信号进行话音编码处理以得到第四话音编码信号,包括:从多个话音编码器中选用与所述第一终端所支持的最大频带带宽对应的话音编码器,对所述第四话音解码信号进行话音编码处理以得到第四话音编码信号;或者,从多个话音编码器中选用与所述第一终端所处第一网络所支持的最大频带带宽对应的话音编码器,对所述第四话音解码信号进行话音编码处理以得到第四话音编码信号。
  10. 根据权利要求1至9任一项所述的方法,其特征在于,所述网络设备为基站、无线网络控制器或核心网设备。
  11. 一种网络设备,其特征在于,包括:
    通信接口,用于接收来自第一终端的第一话音编码信号;
    第一话音解码器,用于对所述第一话音编码信号进行话音解码处理以得到话音解码参数和第一话音解码信号;
    第一虚拟频带扩展处理器,用于利用所述话音解码参数进行虚拟频带扩展处理以得到与所述第一话音解码信号对应的扩频带话音解码信号;
    第一话音编码器,将所述第一话音解码信号和所述扩频带话音解码信号组合后进行话音编码处理以得到第二话音编码信号;其中,所述第一话音编码信号的频带带宽小于所述第二话音编码信号的频带带宽,所述第一话音编码信号的采样率小于所述第二话音编码信号的采样率;
    所述通信接口,用于向与所述第一终端建立了通话连接的第二终端发送所述第二话音编码信号。
  12. 根据权利要求11所述的网络设备,其特征在于,
    所述第一虚拟频带扩展处理器具体用于,利用所述话音解码参数估计出与所述第一话音解码信号对应的扩频带激励信号;利用所述话音解码参数估计出与所述第一话音解码信号对应的扩频带谱包络;利用与所述扩频带谱包络对应的滤波器对所述扩频带激励信号进行合成处理以得到与所述第一话音解码信号对应的扩频带话音解码信号。
  13. 根据权利要求12所述的网络设备,其特征在于,
    在利用所述话音解码参数估计出与所述第一话音解码信号对应的扩频带激励信号的方面,所述第一虚拟频带扩展处理器具体用于,利用所述话音解码参数,基于谱折叠算法、白噪声激励算法或谐波噪声模型算法估计出与所述第一话音解码信号对应的扩频带激励信号。
  14. 根据权利要求12或13所述的网络设备,其特征在于,
    在所述利用所述话音解码参数估计出与所述第一话音解码信号对应的扩频带谱包络的方面,所述第一虚拟频带扩展处理器具体用于利用所述话音解码参数,基于线性映射法、码本映射法或统计映射法估计出与所述第一话音解码信号对应的扩频带谱包络。
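  15. The network device according to any one of claims 11 to 14, characterized in that the voice decoding parameters include a pitch period, a voicing factor, and linear predictive coding parameters.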
  16. 根据权利要求11至15任一项所述的网络设备,其特征在于,所述网络设备包括多个话音解码器,所述第一话音解码器为多个话音解码器中与所述第一终端所支持的最大频带带宽对应的话音解码器;或者,所述第一话音解码器为多个话音解码器中与所述第一终端所处第一网络所支持的最大频带带宽对应的话音解码器;
    或者,
    所述网络设备包括多个虚拟频带扩展处理器,所述第一虚拟频带扩展处理器为多个虚拟频带扩展器中的与所述第二终端所支持的最大频带带宽对应的虚拟频带扩展器;或者,所述第一虚拟频带扩展处理器为多个虚拟频带扩展器中的与所述第二终端所处第二网络所支持的最大频带带宽对应的虚拟频带扩展器;
    或者,所述网络设备包括多个话音编码器,所述第一话音编码器为多个话音编码器中与所述第二终端所支持的最大频带带宽对应的话音编码器;或者,所述第一话音编码器为多个话音编码器中与所述第二终端所处第二网络所支持的最大频带带宽对应的话音编码器。
  17. 根据权利要求11至16任一项所述的网络设备,其特征在于,所述第一话音编码信号和所述第二话音编码信号的频带带宽为如下频带带宽中的其中两个:窄带、宽带、超宽带和全带。
  18. 根据权利要求11至17任一项所述的网络设备,其特征在于,所述网络设备还包括:第二话音解码器、第二话音编码器和第一降采样器;
    所述通信接口还用于,接收来自所述第二终端的第三话音编码信号;
    所述第二话音解码器用于,对所述第三话音编码信号进行话音解码处理以得到第三话音解码信号;
    所述第一降采样器,用于对所述第三话音解码信号降采样处理以得到第四话音解码信号;
    所述第二话音编码器,用于对所述第四话音解码信号进行话音编码处理以得到第四话音编码信号,其中,所述第四话音编码信号的频带带宽小于所述第三话音编码信号的频带带宽,所述第四话音编码信号的采样率小于所述第三话音编码信号的采样率;
    所述通信接口还用于,向所述第一终端发送所述第四话音编码信号;或者所述通信接口还用于,在对所述第四话音编码信号进行话音增强处理以得到话音增强处理后的第四话音编码信号之后,向所述第一终端发送所述话音增强处理后的第四话音编码信号。
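  19. The network device according to claim 18, characterized in that the network device includes multiple voice decoders, and the second voice decoder is the voice decoder corresponding to the maximum frequency bandwidth supported by the second terminal; or the second voice decoder is the voice decoder corresponding to the maximum frequency bandwidth supported by the second network in which the second terminal is located;
    or, the network device includes multiple downsamplers, and the first downsampler is the one among the multiple downsamplers that corresponds to the maximum frequency bandwidth supported by the first terminal; or the first downsampler is the one among the multiple downsamplers that corresponds to the maximum frequency bandwidth supported by the first network in which the first terminal is located;
    or, the network device includes multiple voice encoders, and the second voice encoder is the voice encoder corresponding to the maximum frequency bandwidth supported by the first terminal; or the second voice encoder is the voice encoder corresponding to the maximum frequency bandwidth supported by the first network in which the first terminal is located.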
  20. 根据权利要求11至19任一项所述的网络设备,其特征在于,所述网络设备为基站、无线网络控制器或核心网设备。
  21. 一种网络设备,其特征在于,包括:相互耦合的处理器、存储器和通信接口;
    所述通信接口,用于接收来自第一终端的第一话音编码信号;
    所述处理器,用于对所述第一话音编码信号进行话音解码处理以得到话音解码参数和第一话音解码信号;利用所述话音解码参数进行虚拟频带扩展处理以得到与所述第一话音解码信号对应的扩频带话音解码信号;将所述第一话音解码信号和所述扩频带话音解码信号组合后进行话音编码处理以得到第二话音编码信号;其中,所述第一话音编码信号的频带带宽小于所述第二话音编码信号的频带带宽,所述第一话音编码信号的采样率小于所述第二话音编码信号的采样率;
    所述通信接口还用于向与所述第一终端建立了通话连接的第二终端发送所述第二话音编码信号。
  22. 根据权利要求21所述的网络设备,其特征在于,在利用所述话音解码参数进行虚拟频带扩展处理以得到与所述第一话音解码信号对应的扩频带话音解码信号的方面,所述处理器用于,利用所述话音解码参数估计出与所述第一话音解码信号对应的扩频带激励信号;利用所述话音解码参数估计出与所述第一话音解码信号对应的扩频带谱包络;利用与所述扩频带谱包络对应的滤波器对所述扩频带激励信号进行合成处理以得到与所述第一话音解码信号对应的扩频带话音解码信号。
  23. 根据权利要求22所述的网络设备,其特征在于,在利用所述话音解码参数估计出与所述第一话音解码信号对应的扩频带激励信号的方面,所述处理器用于,利用所述话音解码参数,基于谱折叠算法、白噪声激励算法或谐波噪声模型算法估计出与所述第一话音解码信号对应的扩频带激励信号。
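  24. The network device according to claim 22 or 23, characterized in that,
    in estimating, from the voice decoding parameters, the band-extension spectral envelope corresponding to the first voice decoded signal, the processor is specifically configured to use the voice decoding parameters to estimate the band-extension spectral envelope corresponding to the first voice decoded signal based on linear mapping, codebook mapping, or statistical mapping.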
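  25. The network device according to any one of claims 21 to 24, characterized in that the voice decoding parameters include a pitch period, a voicing factor, and linear predictive coding parameters.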
  26. 根据权利要求21至25任一项所述的网络设备,其特征在于,所述第一话音编码信号和所述第二话音编码信号的频带带宽为如下频带带宽中的其中两个:窄带、宽带、超宽带和全带。
  27. 根据权利要求21至26任一项所述的网络设备,其特征在于,所述通信接口还用于,接收来自所述第二终端的第三话音编码信号;
    其中,所述处理器还用于,对所述第三话音编码信号进行话音解码处理以得到第三话音解码信号;对所述第三话音解码信号降采样处理以得到第四话音解码信号;对所述第四话音解码信号进行话音编码处理以得到第四话音编码信号,其中,所述第四话音编码信号的频带带宽小于所述第三话音编码信号的频带带宽,所述第四话音编码信号的采样率小于所述第三话音编码信号的采样率;
    所述通信接口还用于,向所述第一终端发送所述第四话音编码信号;或者所述通信接口还用于,在对所述第四话音编码信号进行话音增强处理以得到话音增强处理后的第四话音编码信号之后,向所述第一终端发送所述话音增强处理后的第四话音编码信号。
  28. 根据权利要求21至27任一项所述的网络设备,其特征在于,所述网络设备为基站、无线网络控制器或核心网设备。
  29. 一种通信系统,其特征在于,包括:如权利要求11至28任一项所述的网络设备。
PCT/CN2017/086374 2016-05-31 2017-05-27 话音信号处理方法和相关装置和系统 WO2017206842A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201610379386.0 2016-05-31
CN201610379386.0A CN105869653B (zh) 2016-05-31 2016-05-31 话音信号处理方法和相关装置和系统

Publications (1)

Publication Number Publication Date
WO2017206842A1 true WO2017206842A1 (zh) 2017-12-07

Family

ID=56642997

Family Applications (2)

Application Number Title Priority Date Filing Date
PCT/CN2016/104157 WO2017206432A1 (zh) 2016-05-31 2016-10-31 话音信号处理方法和相关装置和系统
PCT/CN2017/086374 WO2017206842A1 (zh) 2016-05-31 2017-05-27 话音信号处理方法和相关装置和系统

Family Applications Before (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/104157 WO2017206432A1 (zh) 2016-05-31 2016-10-31 话音信号处理方法和相关装置和系统

Country Status (5)

Country Link
US (1) US10218856B2 (zh)
EP (1) EP3252767B1 (zh)
CN (1) CN105869653B (zh)
BR (1) BR102017011008A2 (zh)
WO (2) WO2017206432A1 (zh)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105869653B (zh) * 2016-05-31 2019-07-12 华为技术有限公司 话音信号处理方法和相关装置和系统
CN108156307B (zh) * 2016-12-02 2020-09-08 塞舌尔商元鼎音讯股份有限公司 语音处理的方法以及语音通讯装置
CN107070854A (zh) * 2016-12-09 2017-08-18 西安华为技术有限公司 一种传输语音数据的方法、设备和装置
CN107087069B (zh) * 2017-04-19 2020-02-28 维沃移动通信有限公司 一种语音通话方法及移动终端
US10778729B2 (en) * 2017-11-07 2020-09-15 Verizon Patent And Licensing, Inc. Codec parameter adjustment based on call endpoint RF conditions in a wireless network
KR102645659B1 (ko) 2019-01-04 2024-03-11 삼성전자주식회사 뉴럴 네트워크 모델에 기반하여 무선 통신을 수행하는 장치 및 방법
KR102423977B1 (ko) * 2019-12-27 2022-07-22 삼성전자 주식회사 인공신경망 기반의 음성 신호 송수신 방법 및 장치
CN113113032A (zh) * 2020-01-10 2021-07-13 华为技术有限公司 一种音频编解码方法和音频编解码设备
US11699452B2 (en) 2020-12-08 2023-07-11 T-Mobile Usa, Inc. Machine learning-based audio codec switching
US11425259B2 (en) 2020-12-08 2022-08-23 T-Mobile Usa, Inc. Machine learning-based audio codec switching

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1552164A (zh) * 2001-06-26 2004-12-01 诺基亚公司 对音频信号进行代码变换的方法、码变换器、网元、无线通信网和通信系统
CN101208972A (zh) * 2005-06-30 2008-06-25 摩托罗拉公司 用于语音通信的带宽扩展的方法及系统
US20090138272A1 (en) * 2007-10-17 2009-05-28 Gwangju Institute Of Science And Technology Wideband audio signal coding/decoding device and method
CN104658547A (zh) * 2013-11-20 2015-05-27 大连佑嘉软件科技有限公司 一种人工语音带宽扩展的方法
CN105869653A (zh) * 2016-05-31 2016-08-17 华为技术有限公司 话音信号处理方法和相关装置和系统

Family Cites Families (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7423983B1 (en) * 1999-09-20 2008-09-09 Broadcom Corporation Voice and data exchange over a packet based network
US6765931B1 (en) * 1999-04-13 2004-07-20 Broadcom Corporation Gateway with voice
US7924752B2 (en) * 1999-09-20 2011-04-12 Broadcom Corporation Voice and data exchange over a packet based network with AGC
US7920697B2 (en) * 1999-12-09 2011-04-05 Broadcom Corp. Interaction between echo canceller and packet voice processing
ATE388542T1 (de) * 1999-12-13 2008-03-15 Broadcom Corp Sprach-durchgangsvorrichtung mit sprachsynchronisierung in abwärtsrichtung
US7330812B2 (en) * 2002-10-04 2008-02-12 National Research Council Of Canada Method and apparatus for transmitting an audio stream having additional payload in a hidden sub-channel
US7486719B2 (en) 2002-10-31 2009-02-03 Nec Corporation Transcoder and code conversion method
JP4438280B2 (ja) * 2002-10-31 2010-03-24 日本電気株式会社 トランスコーダ及び符号変換方法
US7461003B1 (en) * 2003-10-22 2008-12-02 Tellabs Operations, Inc. Methods and apparatus for improving the quality of speech signals
US20050267739A1 (en) 2004-05-25 2005-12-01 Nokia Corporation Neuroevolution based artificial bandwidth expansion of telephone band speech
US7707034B2 (en) * 2005-05-31 2010-04-27 Microsoft Corporation Audio codec post-filter
EP2045800A1 (en) * 2007-10-05 2009-04-08 Nokia Siemens Networks Oy Method and apparatus for transcoding
CN104517610B (zh) * 2013-09-26 2018-03-06 华为技术有限公司 频带扩展的方法及装置

Also Published As

Publication number Publication date
BR102017011008A2 (pt) 2017-12-19
CN105869653A (zh) 2016-08-17
EP3252767B1 (en) 2019-06-26
US10218856B2 (en) 2019-02-26
CN105869653B (zh) 2019-07-12
WO2017206432A1 (zh) 2017-12-07
EP3252767A1 (en) 2017-12-06
US20170346954A1 (en) 2017-11-30

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17805787

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 17805787

Country of ref document: EP

Kind code of ref document: A1