WO2021135340A1 - Voice signal processing method, system and apparatus, computer device, and storage medium - Google Patents

Voice signal processing method, system and apparatus, computer device, and storage medium

Info

Publication number
WO2021135340A1
Authority
WO
WIPO (PCT)
Prior art keywords
sub
voice
residual signal
speech
signal
Prior art date
Application number
PCT/CN2020/113219
Other languages
French (fr)
Chinese (zh)
Inventor
许慎愉
林绪虹
陈建峰
Original Assignee
广州华多网络科技有限公司
Priority date
Filing date
Publication date
Application filed by 广州华多网络科技有限公司 filed Critical 广州华多网络科技有限公司
Publication of WO2021135340A1 publication Critical patent/WO2021135340A1/en

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L 19/005 Correction of errors induced by the transmission channel, if related to the coding algorithm

Definitions

  • This application relates to the technical field of audio and video coding and decoding, and in particular to a voice signal processing method, system, device, computer equipment, and storage medium.
  • Current speech encoders generally use parametric coding: according to a model of human speech production, the speech signal is converted into vocal tract parameters and excitation parameters, which are quantized and encoded to generate a code stream, and the code stream is then sent over the channel for transmission. After receiving the code stream, the receiver decodes the vocal tract parameters and excitation parameters and re-synthesizes the speech signal according to the speech production model.
  • In a first aspect, an embodiment of the present application provides a voice signal processing method, which includes: obtaining a speech residual signal and splitting it to obtain multiple sub-speech residual signals, where the speech residual signal is the uncorrelated or weakly correlated signal obtained after processing the original speech signal; obtaining compensation information for each sub-speech residual signal based on a preset compensation configuration; and sending to the decoder a code stream that includes each sub-speech residual signal and the corresponding compensation information, the code stream being used to instruct the decoder to decode according to each sub-speech residual signal and the corresponding compensation information.
  • In a second aspect, an embodiment of the present application provides a voice signal processing method, which includes: receiving the code stream sent by the encoder, where the code stream includes multiple sub-speech residual signals and corresponding compensation information, each sub-speech residual signal is obtained by splitting the speech residual signal, and the compensation information is determined based on a preset compensation configuration; and decoding according to each sub-speech residual signal in the code stream and the corresponding compensation information.
  • In a third aspect, an embodiment of the present application provides a voice signal processing system, which includes an encoder and a decoder. The encoder is used to implement the steps of any one of the voice signal processing methods provided by the embodiments of the first aspect and the second aspect; the decoder is used to implement the steps of any one of the voice signal processing methods provided by the embodiments of the first aspect and the second aspect.
  • In a fourth aspect, an embodiment of the present application provides a voice signal processing device, and the device includes:
  • the shunt module is used to obtain the voice residual signal and shunt the voice residual signal to obtain multiple sub-voice residual signals;
  • the voice residual signal is an uncorrelated signal or a weakly correlated signal obtained after processing the original voice signal;
  • the acquisition module is used to acquire the compensation information of each sub-speech residual signal based on the preset compensation configuration
  • the sending module is used to send a code stream including each sub-speech residual signal and corresponding compensation information to the decoder; the code stream is used to instruct the decoder to decode according to each sub-speech residual signal and corresponding compensation information.
  • In a fifth aspect, an embodiment of the present application provides a voice signal processing device, and the device includes:
  • the receiving module is used to receive the code stream sent by the encoder; the code stream includes multiple sub-speech residual signals and corresponding compensation information; each sub-speech residual signal is obtained by splitting from the speech residual signal; the compensation information is based on preset compensation The configuration is determined;
  • the decoding module is used for decoding according to each sub-speech residual signal in the code stream and the corresponding compensation information.
  • In a sixth aspect, an embodiment of the present application provides a computer device, including a memory and a processor; the memory stores a computer program, and when the processor executes the computer program, it implements the steps of any one of the voice signal processing methods provided by the embodiments of the first aspect and the second aspect.
  • In a seventh aspect, an embodiment of the present application provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, it implements the steps of any one of the voice signal processing methods provided by the embodiments of the first aspect and the second aspect.
  • In the voice signal processing method, system, device, computer equipment and storage medium provided by the embodiments of the present application, after the encoder obtains the voice residual signal from the original voice signal, it splits the voice residual signal to obtain multiple sub-voice residual signals; based on a preset compensation configuration, it obtains the compensation information of each sub-voice residual signal; it then sends to the decoder a code stream including each sub-voice residual signal and the corresponding compensation information, where the code stream is used to instruct the decoder to decode according to each sub-voice residual signal and the corresponding compensation information.
  • By splitting the voice residual signal, the encoder in effect sends the decoder multiple descriptions of the voice encoder parameters, and compensation information is added to each split description.
  • The decoder can make effective use of this compensation information during decoding to recover a better voice signal. In this way, because the residual voice signal is described multiple times, the decoder can recover a good voice signal even if packet loss occurs during transmission, so this method can effectively improve the anti-packet-loss performance of the voice encoder.
  • Fig. 1 is a block diagram of a voice signal processing system provided by an embodiment
  • FIG. 2 is a schematic flowchart of a voice signal processing method provided by an embodiment
  • FIG. 3 is an interaction diagram of an encoder and a decoder of a speech signal processing method provided by an embodiment
  • FIG. 4 is a schematic flowchart of a voice signal processing method provided by an embodiment
  • FIG. 5 is a schematic flowchart of a voice signal processing method provided by an embodiment
  • FIG. 6 is a schematic flowchart of a voice signal processing method provided by an embodiment
  • FIG. 7 is a schematic flowchart of a voice signal processing method provided by an embodiment
  • FIG. 8 is a schematic flowchart of a voice signal processing method provided by an embodiment
  • FIG. 9 is a structural block diagram of a voice signal processing device provided by an embodiment.
  • FIG. 10 is a structural block diagram of a voice signal processing device provided by an embodiment
  • Fig. 11 is a diagram of the internal structure of a computer device in an embodiment.
  • a voice signal processing method provided by this application can be applied to the voice signal processing system shown in FIG. 1.
  • the system includes an encoder 01 and a decoder 02, where the encoder 01 and the decoder 02 can perform data transmission.
  • the encoder 01 includes, but is not limited to, a contact encoder, a non-contact encoder, an incremental encoder, an absolute encoder, etc.
  • the embodiment of the present application does not specifically limit the type of the encoder.
  • the decoder 02 includes, but is not limited to, a hardware decoder, a wireless decoder, a software decoder, a multi-channel decoder, a single-channel decoder, etc.
  • the type of the decoder is not specifically limited in this embodiment.
  • Usually, under an extremely weak network (for example 20 kbps or even lower), transmission-oriented anti-packet-loss strategies are no longer applicable; in that case an anti-packet-loss voice encoder needs to be developed to improve the packet-loss resilience of the voice encoder itself.
  • Split multi-description is one implementation of an anti-packet-loss voice encoder; here, split multi-description refers to transmitting the voice code stream to be transmitted in multiple split streams.
  • Taking the SILK encoder as an example, the voice residual signal generally occupies the largest share of traffic in the SILK encoder's code stream, so it is necessary to consider splitting the voice residual signal in an anti-packet-loss voice encoder.
  • The speech residual signal is the uncorrelated or weakly correlated signal that remains after the speech encoder removes the short-term and long-term correlation of the original speech signal and performs gain control and noise shaping; it is generally a random pulse sequence. Based on this, the embodiments of the present application provide a voice signal processing method, system, device, computer equipment and storage medium, which improve the anti-packet-loss performance of the voice encoder by splitting the voice residual signal.
  • The execution subject of FIGS. 2 to 5 is an encoder, and the execution subject of FIGS. 6 to 8 is a decoder; the execution subject may also be a voice signal processing device, where the device can be implemented as part or all of the encoder through software, hardware, or a combination of software and hardware.
  • The following embodiments take the encoder as the execution subject.
  • FIG. 2 provides a voice signal processing method. This embodiment relates to the specific process in which, after the encoder obtains the voice residual signal from the original voice signal, it splits the voice residual signal, adds compensation information to each split stream, and sends the result to the decoder. As shown in FIG. 2, the method includes:
  • S101 Obtain a voice residual signal, and shunt the voice residual signal to obtain multiple sub-voice residual signals; the voice residual signal is an uncorrelated signal or a weakly correlated signal obtained after processing the original voice signal.
  • The speech residual signal is the uncorrelated or weakly correlated signal remaining after the encoder removes the short-term and long-term correlation of the original speech signal and performs gain control and noise shaping.
  • Obtaining the speech residual signal can be understood as follows: after receiving a segment of the original speech signal, the encoder (hereinafter also referred to as the speech encoder) divides the original speech signal into the speech residual signal and other parameters, where the other parameters are the remaining parameters of the original speech signal.
  • The other parameters are not a single parameter but multiple parameters; which parameters are specifically included is not limited in this embodiment.
  • After the encoder obtains the voice residual signal from the original voice signal, it splits the voice residual signal. It is understandable that, because the voice signal is actually a bit stream after entering the encoder, the signal is in essence a signal sequence; splitting the voice residual signal therefore means splitting the entire code stream of the voice residual signal sequence into multiple signal sequences.
  • The voice residual signal can be split into two code streams, or into another number of code streams, which is not limited in this embodiment.
  • Taking a split into two as an example, the voice residual signal is split into a first sub-voice residual signal and a second sub-voice residual signal.
  • When the speech residual signal is split into the first sub-speech residual signal and the second sub-speech residual signal,
  • the other parameters in the original speech signal, apart from the speech residual signal, are copied and saved in each sub-stream at the same time.
  • In this way, each code stream formed after the final split includes one sub-voice residual signal and also carries the complete other parameters, so that if the voice residual signal is recovered at the decoding end, the original voice signal can be restored in combination with the other parameters.
  • S102 Acquire compensation information of each sub-speech residual signal based on a preset compensation configuration.
  • The compensation configuration indicates how the compensation information is configured; the compensation information is preset extra information for each sub-speech residual signal, according to which the decoder can compensate each sub-speech residual signal.
  • This additional compensation information enables the decoder to recover a better speech signal during decoding.
  • the compensation configuration may include code rate configuration and compensation parameters.
  • the code rate configuration determines the upper limit of the traffic of each transmission packet when transmitting the voice signal code stream.
  • the change of the compensation parameter will cause the proportion of the compensation information in a packet of data to change.
  • The compensation parameters may include the number of small frames into which each sub-speech residual signal is divided, and the number of non-zero pulses in each small frame.
  • the compensation configuration information can be determined by the preset packet loss rate. Generally, under the same average code rate, the lower the packet loss rate, the less compensation information; conversely, the higher the packet loss rate, the more compensation information. In extreme cases, such as no packet loss, the compensation information size is 0.
  • the compensation configuration needs to be preset, and the preset can be determined according to historical big data and in combination with actual conditions, which is not limited in this embodiment.
  • The encoder can obtain the compensation information of each sub-speech residual signal through a preset algorithm or a pre-trained neural network model, directly determining the corresponding compensation information with the compensation configuration information as input.
  • it can also be in other ways, which is not limited in this embodiment.
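  • As an illustration of the compensation configuration described above, the following Python sketch shows one way such a configuration could be represented and chosen from a preset packet loss rate. The field names, thresholds, and scaling rule are assumptions for illustration only, not values or structures defined by this application.

```python
# Illustrative sketch only: field names and the loss-rate thresholds below are
# assumptions, not values defined by this application.
from dataclasses import dataclass

@dataclass
class CompensationConfig:
    bitrate_bps: int    # code rate configuration: caps the traffic of each transmission packet
    n_subframes: int    # N1: number of small frames each sub-speech residual signal is divided into
    n_pulses: int       # N2: number of non-zero pulses kept per small frame

def config_for_loss_rate(loss_rate: float, bitrate_bps: int = 20_000) -> CompensationConfig:
    """Pick a compensation configuration from an expected packet loss rate.

    Higher loss rate -> more compensation information; with no packet loss
    the compensation size is 0 (no pulses are kept).
    """
    if loss_rate <= 0.0:
        return CompensationConfig(bitrate_bps, n_subframes=4, n_pulses=0)
    if loss_rate < 0.1:
        return CompensationConfig(bitrate_bps, n_subframes=4, n_pulses=2)
    return CompensationConfig(bitrate_bps, n_subframes=4, n_pulses=4)
```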
  • S103 Send a code stream including each sub-speech residual signal and corresponding compensation information to the decoder; the code stream is used to instruct the decoder to decode according to each sub-speech residual signal and corresponding compensation information.
  • After determining each sub-speech residual signal from the speech residual signal and obtaining the compensation information of each sub-speech residual signal, the encoder transmits the code streams to the decoder, and each code stream includes one sub-speech residual signal and the corresponding compensation information.
  • Each code stream also needs to include the other parameters mentioned above; this embodiment describes only the speech residual signal, and the other parameters are not repeated in some embodiments.
  • The code stream transmitted from the encoder to the decoder is used to instruct the decoder to decode and recover the speech residual signal according to the sub-speech residual signals and the corresponding compensation information in the code stream.
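  • A minimal sketch of what each split code stream carries, per the description above: one sub-speech residual signal, its compensation information, and a full copy of the other parameters. The class and field names are hypothetical, not the bitstream format of this application.

```python
# Hedged sketch: the packet layout and names are illustrative only.
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class CodeStreamPacket:
    sub_residual: List[int]   # one sub-speech residual signal (e.g. the odd or even stream)
    compensation: Dict        # compensation gains, position sequences and symbol sequences
    other_parameters: Dict    # complete copy of the remaining encoder parameters

def build_packets(sub_residuals, compensations, other_parameters):
    # Every packet carries the same copy of the other parameters, so any single
    # received packet is enough for the decoder to resynthesize a speech signal.
    return [CodeStreamPacket(r, c, dict(other_parameters))
            for r, c in zip(sub_residuals, compensations)]
```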
  • In FIG. 3, a schematic diagram of an encoder sending split code streams to a decoder is provided.
  • the main decoder and side decoder in Figure 3 can be considered as one decoder.
  • the main decoder and the side decoder can be regarded as sub-decoders that implement different decoding methods in one decoder.
  • the specific decoding process of the decoder please refer to the description in the embodiment with the decoder as the execution subject, and will not be repeated here.
  • In this embodiment, after the encoder obtains the speech residual signal from the original speech signal, it splits the speech residual signal to obtain multiple sub-speech residual signals, obtains the compensation information of each sub-speech residual signal based on a preset compensation configuration, and then sends to the decoder a code stream including each sub-speech residual signal and the corresponding compensation information, where the code stream is used to instruct the decoder to decode according to each sub-speech residual signal and the corresponding compensation information.
  • By splitting the voice residual signal, the encoder in effect sends the decoder multiple descriptions of the voice encoder parameters, with compensation information added to each split description; the decoder can use this compensation information effectively during decoding.
  • In this way, a better voice signal can be recovered: because the voice residual signal is described multiple times, the decoder can recover a good voice signal even if packet loss occurs during transmission, so this multi-description method can effectively improve the anti-packet-loss performance of the speech encoder.
  • the embodiment of the present application also provides a voice signal processing method, which relates to the specific process of the voice encoder shunting the voice residual signal into two sub-voice residual signals.
  • the multiple sub-speech residual signals include a first sub-speech residual signal and a second sub-speech residual signal; as shown in FIG. 4, the above S101 step includes:
  • S201 Quantize the speech residual signal to obtain a quantized sequence corresponding to the speech residual signal.
  • the speech encoder in this embodiment is described by taking the SILK encoder as an example.
  • the splitting method that can be used is odd-even splitting.
  • the SILK encoder needs to quantize the speech residual signal before shunting the speech residual signal to obtain the quantized sequence corresponding to the speech residual signal.
  • S202 Perform odd-even splitting on the quantized sequence to obtain an odd quantized sequence and an even quantized sequence.
  • the quantized sequence is odd-even split, and after the split, the final odd quantized sequence and the even quantized sequence need to be further determined based on the algorithm of the SILK encoder.
  • The parity split sequences, the sign function, the random seed pair sequence, and the random seed odd sequence are defined by the formulas given in the original description (the formulas are not reproduced here). Among them, seed_init keeps a copy in both the odd stream and the even stream, and its size is generated by the SILK encoder.
  • The final odd quantization sequence and even quantization sequence are then determined by the corresponding formulas, where Q represents the quantization algorithm in the SILK encoder (provided by the SILK encoder) and the offset is obtained by table lookup according to the small frame type (also provided by the SILK encoder).
  • S203 Determine the odd quantization sequence as the first sub-speech residual signal, and determine the even quantization sequence as the second sub-speech residual signal.
  • the encoder determines the odd quantization sequence as the first sub-speech residual signal and the even quantization sequence as the second sub-speech residual signal.
  • Alternatively, the even quantization sequence can be determined as the first sub-speech residual signal
  • and the odd quantization sequence as the second sub-speech residual signal; the correspondence between the first and second sub-speech residual signals and the odd and even quantization sequences is not limited in this embodiment.
  • In this embodiment, the parity split is performed on the quantized residual sequence and, combined with the algorithm in the speech encoder, the final parity quantization sequences are determined. Transmitting the parity quantization sequences as the encoder's final code streams facilitates the transmission of the voice residual signal.
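  • The following sketch illustrates the odd-even split of a quantized residual sequence. It is not the SILK implementation: the actual quantization algorithm Q, the offset table, and the random-seed handling mentioned above are replaced here by simple rounding, purely for illustration.

```python
# Illustrative parity split; rounding stands in for the SILK quantizer Q (an assumption).
def quantize(residual):
    return [round(x) for x in residual]

def parity_split(quantized):
    """Split a quantized residual sequence into even-index and odd-index sub-sequences."""
    even = quantized[0::2]   # samples at even positions -> one sub-speech residual signal
    odd = quantized[1::2]    # samples at odd positions  -> the other sub-speech residual signal
    return odd, even

residual = [0.4, -1.2, 2.7, 0.1, -0.8, 1.9, -2.3, 0.6]
odd_seq, even_seq = parity_split(quantize(residual))
print(odd_seq)   # [-1, 0, 2, 1]
print(even_seq)  # [0, 3, -1, -2]
```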
  • In this embodiment, the compensation parameters in the above compensation configuration are used as an example to describe the process of obtaining the compensation information of each sub-speech residual signal. The compensation parameters include the number N1 of small frames into which each sub-speech residual signal is divided and the number N2 of non-zero pulses in each small frame, where N1 is a positive integer and N2 is a non-negative integer.
  • In practice, the compensation parameters are preset and can be determined according to the packet loss rate to ensure that the set N1 and N2 are reasonable. As shown in FIG. 5, the above step S102 includes:
  • S301 Obtain the compensation gain, position sequence, and symbol sequence corresponding to each sub-speech residual signal; the length of the compensation gain is N1, and the length of the position sequence and the symbol sequence are both N2.
  • That the length of the compensation gain is N1 and the lengths of the position sequence and symbol sequence are both N2 means that the position sequence and symbol sequence are obtained for each small frame in the sub-speech residual signal; in other words, what the encoder obtains in this step is the compensation gain of each sub-speech residual signal together with the position sequence and symbol sequence of each of its small frames.
  • Here rq(i) denotes the i-th small frame, and crq(i) denotes the i-th small frame of the sequence recovered based on the compensation information when the decoder receives a single stream of the split code streams; i takes values from 0 to cfc-1 (cfc being the number of small frames).
  • The function ABS takes the absolute value of the items in the speech signal sequence, where the items are determined by rq(i) and crq(i).
  • The MAX_POS function returns the position sequence of the first nz largest items.
  • S302 Construct a compensation sequence of each sub-speech residual signal according to the N1 position sequences and symbol sequences corresponding to each sub-speech residual signal;
  • Based on the position sequences and symbol sequences of each sub-speech residual signal determined by the encoder above, all small frames are spliced to form a complete sequence; that is, the N1 position sequences and symbol sequences of the small frames are used to construct a complete sequence.
  • The completed sequence is the compensation sequence of the sub-speech residual signal.
  • The compensation sequence is denoted by cq. Since N1 is the number of small frames into which each sub-speech residual signal is divided, the length of cq of each sub-speech residual signal is L/2, where L is the length of the entire speech residual signal.
  • S303 Determine the compensation information of each sub-speech residual signal according to the compensation sequence of each sub-speech residual signal and the compensation gain of each sub-speech residual signal.
  • The compensation sequence and the compensation gain of each sub-speech residual signal determined above are taken together as the final compensation information.
  • In this embodiment, each sub-speech residual signal is first divided into multiple small frames; the gain value and compensation sequence of each small frame are then obtained, and the final compensation information is determined based on the entire compensation sequence.
  • In this way, the determined compensation information provides the decoder with sufficient additional information to ensure the quality of the speech signal it recovers.
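  • As a concrete illustration of steps S301 to S303, the sketch below builds compensation information for one sub-speech residual signal by comparing, frame by frame, a residual sequence rq with the sequence crq that a decoder would recover from a single stream. The gain rule (mean magnitude of the kept pulses) is an assumption; the text above does not spell out how the gain is computed.

```python
# Hedged sketch of S301-S303; the gain computation is an assumption.
def frame_compensation(rq_frame, crq_frame, n_pulses):
    diff = [a - b for a, b in zip(rq_frame, crq_frame)]
    # MAX_POS: positions of the n_pulses (N2) largest items by absolute value.
    positions = sorted(range(len(diff)), key=lambda i: abs(diff[i]), reverse=True)[:n_pulses]
    signs = [1 if diff[p] >= 0 else -1 for p in positions]            # symbol sequence
    gain = sum(abs(diff[p]) for p in positions) / max(n_pulses, 1)    # assumed gain rule
    return gain, positions, signs

def compensation_info(rq, crq, n_subframes, n_pulses):
    """Build the compensation information (gains, positions, signs) for one sub-signal."""
    frame_len = len(rq) // n_subframes
    gains, pos_seq, sign_seq = [], [], []
    for i in range(n_subframes):                  # i = 0 .. N1-1, one small frame at a time
        lo, hi = i * frame_len, (i + 1) * frame_len
        g, p, s = frame_compensation(rq[lo:hi], crq[lo:hi], n_pulses)
        gains.append(g)        # compensation gain, length N1 overall
        pos_seq.append(p)      # position sequence of this small frame, length N2
        sign_seq.append(s)     # symbol sequence of this small frame, length N2
    return {"gain": gains, "positions": pos_seq, "signs": sign_seq}
```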
  • the above compensation configuration also includes a code rate configuration, which is a parameter that determines the upper limit of the transmission packet flow, and the code rate configuration can be determined by a preset packet loss rate.
  • The method further includes: determining the space size of the compensation information of each sub-voice residual signal according to the bit rate configuration; the space size is used to indicate the space capacity for storing the compensation information when the code stream is sent.
  • the capacity of the space for storing the compensation information needs to be further determined.
  • In general, the storage space of each split bit stream is allocated to a similar size, which is beneficial to subsequent transmission performance testing; after the compensation information is added to the transmitted bit stream, the size of the code stream increases to a certain extent.
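  • A rough sketch of how the code rate configuration could bound the compensation space per packet. Splitting the budget equally across streams and giving the compensation whatever remains after the sub-residual and the copied parameters is an assumption, not a rule stated in this application.

```python
# Assumed sizing rule, for illustration only.
def compensation_budget_bytes(bitrate_bps, frame_ms, n_streams, residual_bytes, params_bytes):
    packet_bytes = bitrate_bps * frame_ms / 1000 / 8 / n_streams   # upper limit per packet
    return max(0, int(packet_bytes) - residual_bytes - params_bytes)

print(compensation_budget_bytes(20_000, 20, 2, 14, 6))  # -> 5 bytes left for compensation
```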
  • The following embodiments take the decoder as the execution subject. It should be noted that, although this application separates the embodiments in which the decoder is the execution subject from the embodiments in which the encoder is the execution subject, in reality the decoder and the encoder interact with each other to complete the speech signal processing. Therefore, the process descriptions in the embodiments in which the encoder is the execution subject and in the embodiments in which the decoder is the execution subject may refer to each other, rather than limiting the scope of execution of either.
  • a method for processing a voice signal is provided.
  • This embodiment relates to a specific process of decoding after a decoder receives a code stream sent by an encoder.
  • the method includes:
  • S401 Receive a code stream sent by an encoder; the code stream includes multiple sub-speech residual signals and corresponding compensation information; each sub-speech residual signal is obtained by splitting from the speech residual signal; the compensation information is determined based on a preset compensation configuration .
  • S402 Perform decoding according to each sub-speech residual signal in the bitstream and corresponding compensation information.
  • When the decoder receives the code streams sent by the encoder, it receives either all of them or only part of them, i.e., packet loss occurs. For these two different situations, the decoder adopts different decoding methods to restore the residual voice signal; refer to the descriptions in the following embodiments for the specific processes.
  • After the decoder receives the code streams sent by the encoder, it decodes according to the sub-speech residual signal and the corresponding compensation information carried in each code stream. At the encoder side, after the voice residual signal is obtained from the original voice signal, it is split to obtain multiple sub-voice residual signals, the compensation information of each sub-voice residual signal is obtained based on the preset compensation configuration, and the code streams including each sub-voice residual signal and the corresponding compensation information are then sent to the decoder.
  • Splitting the voice residual signal at the encoder side is equivalent to sending the decoder multiple descriptions of the voice encoder parameters, with compensation information added to each split description; the compensation information can be used by the decoder during decoding to effectively recover a better voice signal. In this way, because the residual voice signal is described multiple times, the decoder can recover a good voice signal even if packet loss occurs during transmission, so this method can effectively improve the anti-packet-loss performance of the speech encoder.
  • In this embodiment, the case in which the multiple sub-speech residual signals include a first sub-speech residual signal and a second sub-speech residual signal is taken as an example for description.
  • Optionally, the multiple sub-speech residual signals include a first sub-speech residual signal and a second sub-speech residual signal, and the received code streams are the first sub-speech residual signal with its corresponding compensation information and the second sub-speech residual signal with its corresponding compensation information; then, as shown in FIG. 7, the above step S402 includes:
  • S501 Restore a corresponding even voice residual signal according to the first sub-speech residual signal, and restore a corresponding odd voice residual signal according to the second sub-speech residual signal.
  • The difference between the even voice residual signal and the first sub-speech residual signal is that the first sub-speech residual signal is the sub-signal sequence obtained by the encoder quantizing and splitting the voice residual signal, while the even voice residual signal is the voice residual information that the decoder recovers from that sub-signal sequence;
  • the same relationship holds between the odd voice residual signal and the second sub-voice residual signal.
  • the first sub-speech residual signal is an even quantization sequence
  • the second sub-speech residual signal is an odd quantization sequence.
  • The correspondence between the two can also be exchanged, because "first" and "second" are only used to distinguish the sub-speech residual signals, which is not limited in this embodiment.
  • Assuming that the odd quantization sequence and the even quantization sequence take integer values, the even voice residual signal and the odd voice residual signal can be expressed by the corresponding formulas in the original description (not reproduced here), where:
  • q(n) is the quantized sequence quantized from r(n), and rq(n) represents the speech residual signal recovered from q(n).
  • the process of the decoder recovering rq(n) from q(n) can be performed by using some commonly used decoding algorithms, which is not limited in this embodiment.
  • S502 Perform interleaving and interpolation on the even voice residual signal and the odd voice residual signal to determine the voice residual signal.
  • Based on the restored even voice residual signal and odd voice residual signal, the decoder performs interleaving and interpolation on them, that is, it interleaves the odd and even items respectively to obtain the complete voice residual signal.
  • After the decoder restores the speech residual signal, it can restore the original speech signal by combining the other parameters carried in the code stream.
  • In this embodiment, the even voice residual signal and the odd voice residual signal are interleaved sample by sample to recover the optimal voice residual signal, from which the original voice signal can be restored with higher sound quality.
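  • When both code streams are received, the reconstruction reduces to a sample-by-sample interleave of the recovered even and odd residual signals, as sketched below (the recovery of rq(n) from q(n) itself is stubbed out; the values reuse the parity-split example above).

```python
# Sketch of S501-S502 for the no-loss case; decoding q(n) -> rq(n) is stubbed out.
def interleave(even_rq, odd_rq):
    """Rebuild rq(n): even-position samples come from the even stream, odd-position samples from the odd stream."""
    out = []
    for e, o in zip(even_rq, odd_rq):
        out.extend([e, o])
    if len(even_rq) > len(odd_rq):   # odd overall length: the even stream carries one extra sample
        out.append(even_rq[-1])
    return out

print(interleave([0, 3, -1, -2], [-1, 0, 2, 1]))  # [0, -1, 3, 0, -1, 2, -2, 1]
```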
  • Optionally, the multiple sub-speech residual signals include a first sub-speech residual signal and a second sub-speech residual signal, and the received code stream is only the first sub-speech residual signal with its corresponding compensation information, or only the second sub-speech residual signal with its corresponding compensation information; then the above step S402 includes:
  • S601 Restore the corresponding even voice residual signal according to the first sub-speech residual signal, or restore the corresponding odd voice residual signal according to the second sub-speech residual signal.
  • only one sub-speech residual signal is received, for example, only the first sub-speech residual signal is received or only the second sub-speech residual signal is received.
  • In this case, the decoder recovers only the even voice residual signal or only the odd voice residual signal; that is, whichever sub-voice residual signal is received, what is restored is the voice residual signal corresponding to that sub-voice residual signal.
  • S602 Restore a similar voice residual signal based on the even voice residual signal, or restore a similar voice residual signal based on the odd voice residual signal.
  • Then the similar voice residual signal is restored, where the similar voice residual signal, denoted crq(n), represents the voice residual signal recovered when only a single stream of the split code streams is received; there is a small error between the similar voice residual signal and rq(n), which is why it is called the similar speech residual signal.
  • The even speech residual signal recovered in the above steps can be expressed as rq_e,
  • the odd speech residual signal can be expressed as rq_o,
  • and the similar speech residual signal is expressed as crq.
  • The above formulas indicate that crq_e is determined based on rq_e and that crq_o is determined based on rq_o, and the sequence index in the formulas fully covers 0 to L-1, so the recovered crq_e
  • and crq_o both have length L; crq_e and crq_o can therefore be collectively referred to as crq, that is, the similar speech residual signal.
  • S603 Determine the target similar voice residual signal based on the compensation information and the similar voice residual signal corresponding to the first sub-voice residual signal, or determine the target similar voice residual signal according to the compensation information and the similar voice residual signal corresponding to the second sub-voice residual signal.
  • the compensation information carried in each sub-speech residual signal is merged into the similar speech residual signal to obtain the final target similar speech residual signal.
  • The voice residual signal is then determined from the target similar voice residual signal obtained in the above steps. It is understandable that, although there is a certain small error between the target similar voice residual signal determined in this embodiment and the optimal voice residual signal, this implementation recovers the voice residual signal when the decoder receives only a single code stream; in other words, even in the case of packet loss, the decoder can still recover the target similar voice residual signal.
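  • For the packet-loss case (S601 to S603), the sketch below fills a full-length similar residual from the single received stream and then merges the compensation pulses into it. The application expresses the construction of crq with formulas not reproduced here; simple sample repetition is used in their place as an assumption, and the compensation format matches the earlier compensation_info sketch.

```python
# Hedged sketch of S601-S603; sample repetition stands in for the formulas of this application.
def similar_residual(received_rq, full_length):
    crq = []
    for x in received_rq:
        crq.extend([x, x])       # assumed stand-in for the missing parity positions
    return crq[:full_length]

def apply_compensation(crq, comp, n_subframes):
    """Merge the per-frame compensation pulses into the similar residual signal."""
    frame_len = len(crq) // n_subframes
    out = list(crq)
    for i in range(n_subframes):
        base = i * frame_len
        for pos, sign in zip(comp["positions"][i], comp["signs"][i]):
            out[base + pos] += sign * comp["gain"][i]   # re-insert the strongest missing pulses
    return out
```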
  • Sound quality verification is performed on the target similar voice residual signal obtained by the method provided in this embodiment. As shown in Table 1 of the original description (not reproduced here), using the same test stream ch_f1.wav, the MOS of the split method of this application is compared with the MOS of the SILK encoder under the same packet loss strategy and similar actual traffic.
  • It can be seen that, by splitting and multiply describing the encoder parameters, the encoder provided by the embodiments of the present application has strong anti-packet-loss performance.
  • Even if the decoder receives only one packet, it can decode speech of medium sound quality; if the decoder receives both packets in time, it can restore higher sound quality.
  • The embodiment of the present application also provides a voice signal processing system, which can be seen in FIG. 1.
  • The system includes an encoder and a decoder, where the encoder is used to implement the processes in all of the previous embodiments in which the encoder is the execution subject,
  • and the decoder is used to implement the processes in all of the previous embodiments in which the decoder is the execution subject.
  • a virtual device corresponding to the above-mentioned voice signal processing method is also provided.
  • a voice signal processing device includes: a shunt module 10, an acquisition module 11, and a sending module 12, of which,
  • the shunting module 10 is used to obtain the speech residual signal and shunt the speech residual signal to obtain multiple sub-speech residual signals;
  • the speech residual signal is an uncorrelated signal or a weakly correlated signal obtained after processing the original speech signal;
  • the obtaining module 11 is configured to obtain compensation information of each sub-speech residual signal based on a preset compensation configuration
  • the sending module 12 is used to send a code stream including each sub-speech residual signal and corresponding compensation information to the decoder; the code stream is used to instruct the decoder to decode according to each sub-speech residual signal and corresponding compensation information.
  • a voice signal processing device is provided. If the multiple sub-voice residual signals include a first sub-voice residual signal and a second sub-voice residual signal, the above-mentioned shunt module 10 includes:
  • the quantization unit is used to quantize the speech residual signal to obtain a quantized sequence corresponding to the speech residual signal
  • the shunting unit is used to perform odd-even shunting of the quantized sequence to obtain the odd quantized sequence and the even quantized sequence;
  • the sub-signal determining unit is configured to determine the odd quantization sequence as the first sub-speech residual signal, and determine the even quantization sequence as the second sub-speech residual signal.
  • In one embodiment, a voice signal processing device is provided in which the compensation configuration includes compensation parameters; the compensation parameters include the number of small frames into which each sub-speech residual signal is divided, N1, and the number of non-zero pulses in each small frame, N2, where N1 is a positive integer and N2 is a non-negative integer. The above-mentioned obtaining module 11 then includes:
  • the acquiring unit is used to acquire the compensation gain, position sequence and symbol sequence corresponding to each sub-speech residual signal; the length of the compensation gain is N1, and the length of the position sequence and symbol sequence are both N2;
  • the construction unit is used to construct the compensation sequence of each sub-speech residual signal according to the N1 position sequences and symbol sequences corresponding to each sub-speech residual signal;
  • the compensation information determining unit is used to determine the compensation information of each sub-speech residual signal according to the compensation sequence of each sub-speech residual signal and the compensation gain of each sub-speech residual signal.
  • a voice signal processing device is provided, and the compensation configuration further includes a bit rate configuration; the bit rate configuration is determined according to a preset packet loss rate; the device further includes: a space determining module, configured to configure according to the bit rate Determine the space size of the compensation information of each sub-voice residual signal; the space size is used to indicate the space capacity for storing the compensation information when the code stream is sent.
  • a voice signal processing device is provided, and the device includes:
  • the receiving module 13 is used to receive the code stream sent by the encoder; the code stream includes multiple sub-voice residual signals and corresponding compensation information; each sub-voice residual signal is obtained by splitting from the voice residual signal; the compensation information is based on a preset The compensation configuration is determined;
  • the decoding module 14 is used for decoding according to each sub-speech residual signal in the bitstream and the corresponding compensation information.
  • In one embodiment, a voice signal processing device is provided. If the plurality of sub-voice residual signals include a first sub-voice residual signal and a second sub-voice residual signal, and the received code streams are the first sub-voice residual signal with its corresponding compensation information and the second sub-voice residual signal with its corresponding compensation information, then the above-mentioned decoding module 14 includes:
  • the first restoration unit is configured to restore the corresponding even voice residual signal according to the first sub-voice residual signal, and restore the corresponding odd voice residual signal according to the second sub-voice residual signal;
  • the first voice signal determining unit is used to perform interleaving and interpolation on the even voice residual signal and the odd voice residual signal to determine the voice residual signal.
  • In one embodiment, a voice signal processing device is provided. If the plurality of sub-voice residual signals include a first sub-voice residual signal and a second sub-voice residual signal, and the received code stream is the first sub-voice residual signal with its corresponding compensation information or the second sub-voice residual signal with its corresponding compensation information, then the decoding module 14 includes:
  • the second restoration unit is configured to restore the corresponding even voice residual signal according to the first sub-voice residual signal, or restore the corresponding odd voice residual signal according to the second sub-voice residual signal;
  • the third restoration unit is configured to restore similar voice residual signals based on even voice residual signals, or restore similar voice residual signals based on odd voice residual signals;
  • the target similar signal unit is used to determine the target similar voice residual signal based on the compensation information and the similar voice residual signal corresponding to the first sub-voice residual signal, or, according to the compensation information and the similar voice residual signal corresponding to the second sub-voice residual signal, Determine the target similar voice residual signal.
  • the second voice signal determining unit is used to determine the voice residual signal according to the target similar voice residual signal.
  • Each module in the above-mentioned speech signal processing device can be implemented in whole or in part by software, hardware, and a combination thereof.
  • the above-mentioned modules may be embedded in the form of hardware or independent of the processor in the computer equipment, or may be stored in the memory of the computer equipment in the form of software, so that the processor can call and execute the operations corresponding to the above-mentioned modules.
  • a computer device is provided.
  • the computer device may be a terminal, and its internal structure diagram may be as shown in FIG. 11.
  • the computer equipment includes a processor, a memory, a network interface, a display screen and an input device connected through a system bus.
  • the processor of the computer device is used to provide calculation and control capabilities.
  • the memory of the computer device includes a non-volatile storage medium and an internal memory.
  • the non-volatile storage medium stores an operating system and a computer program.
  • the internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage medium.
  • the network interface of the computer device is used to communicate with an external terminal through a network connection.
  • the computer program is executed by the processor to realize a voice signal processing method.
  • the display screen of the computer device can be a liquid crystal display or an electronic ink display screen
  • The input device of the computer device can be a touch layer covering the display screen, or a button, trackball or touchpad provided on the housing of the computer device, or an external keyboard, touchpad or mouse.
  • FIG. 11 is only a block diagram of a part of the structure related to the solution of the present application, and does not constitute a limitation on the computer device to which the solution of the present application is applied.
  • The specific computer device may include more or fewer components than shown in the figure, or combine some components, or have a different arrangement of components.
  • In one embodiment, a computer device is provided, including a memory and a processor; a computer program is stored in the memory, and when the processor executes the computer program, the following steps are implemented:
  • obtaining a speech residual signal and splitting it to obtain multiple sub-speech residual signals, where the speech residual signal is the uncorrelated or weakly correlated signal obtained after processing the original speech signal; obtaining compensation information of each sub-speech residual signal based on a preset compensation configuration; and sending to the decoder a code stream including each sub-speech residual signal and the corresponding compensation information;
  • or, receiving the code stream sent by the encoder, where the code stream includes multiple sub-speech residual signals and corresponding compensation information, each sub-speech residual signal is obtained by splitting the speech residual signal, and the compensation information is determined based on the preset compensation configuration;
  • and decoding according to each sub-speech residual signal in the code stream and the corresponding compensation information.
  • In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored; when the computer program is executed by a processor, the following steps are implemented:
  • obtaining a speech residual signal and splitting it to obtain multiple sub-speech residual signals, where the speech residual signal is the uncorrelated or weakly correlated signal obtained after processing the original speech signal; obtaining compensation information of each sub-speech residual signal based on a preset compensation configuration; and sending to the decoder a code stream including each sub-speech residual signal and the corresponding compensation information;
  • or, receiving the code stream sent by the encoder, where the code stream includes multiple sub-speech residual signals and corresponding compensation information, each sub-speech residual signal is obtained by splitting the speech residual signal, and the compensation information is determined based on the preset compensation configuration;
  • and decoding according to each sub-speech residual signal in the code stream and the corresponding compensation information.
  • Non-volatile memory may include read only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
  • Volatile memory may include random access memory (RAM) or external cache memory.
  • RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM), etc.

Abstract

A voice signal processing method. The method comprises: an encoder obtains a voice residual signal from an original voice signal, and shunts same to obtain multiple sub-voice residual signals (S101), obtains compensation information of the sub-voice residual signals on the basis of a preset compensation configuration (S102), and sends a code stream comprising the sub-voice residual signals and corresponding compensation information to a decoder, the code stream being used for instructing the decoder to decode according to the sub-voice residual signals and the corresponding compensation information (S103). The method can effectively improve the anti-packet loss performance of a speech encoder. Also provided are a voice signal processing system and apparatus, a computer device, and a storage medium.

Description

Voice signal processing method, system, device, computer equipment and storage medium
This application claims priority to the Chinese patent application filed with the Chinese Patent Office on December 31, 2019, with application number 201911422259.4 and invention title "Voice signal processing method, system, device, computer equipment and storage medium", the entire contents of which are incorporated in this application by reference.
Technical field
This application relates to the technical field of audio and video coding and decoding, and in particular to a voice signal processing method, system, device, computer equipment, and storage medium.
Background art
Current speech encoders generally use parametric coding: according to a model of human speech production, the speech signal is converted into vocal tract parameters and excitation parameters, which are quantized and encoded to generate a code stream, and the code stream is then sent over the channel for transmission. After receiving the code stream, the receiver decodes the vocal tract parameters and excitation parameters and re-synthesizes the speech signal according to the speech production model.
In practical applications, packet loss often occurs when the code stream is transmitted. Based on this reality, many anti-packet-loss strategies have been developed, which fall mainly into two categories. One category is transmission-oriented; the main idea is retransmission under low latency and forward error correction (FEC) under high latency. Transmission-oriented strategies such as FEC and retransmission are no longer applicable under extremely weak networks (for example 20 kbps or even lower), so another type of anti-packet-loss strategy needs to be adopted, namely improving the encoder itself, which is also known as an anti-packet-loss speech encoder.
However, the anti-packet-loss performance of existing speech encoders is generally poor.
Summary of the invention
Based on this, it is necessary to provide a voice signal processing method, system, device, computer equipment, and storage medium in response to the above technical problems.
In a first aspect, an embodiment of the present application provides a voice signal processing method, which includes:
obtaining a speech residual signal and splitting the speech residual signal to obtain multiple sub-speech residual signals, where the speech residual signal is the uncorrelated or weakly correlated signal obtained after processing the original speech signal;
obtaining compensation information of each sub-speech residual signal based on a preset compensation configuration;
sending a code stream including each sub-speech residual signal and the corresponding compensation information to the decoder, where the code stream is used to instruct the decoder to decode according to each sub-speech residual signal and the corresponding compensation information.
In a second aspect, an embodiment of the present application provides a voice signal processing method, which includes:
receiving the code stream sent by the encoder, where the code stream includes multiple sub-speech residual signals and corresponding compensation information, each sub-speech residual signal is obtained by splitting the speech residual signal, and the compensation information is determined based on a preset compensation configuration;
decoding according to each sub-speech residual signal in the code stream and the corresponding compensation information.
In a third aspect, an embodiment of the present application provides a voice signal processing system, which includes an encoder and a decoder; the encoder is used to implement the steps of any one of the voice signal processing methods provided by the embodiments of the first aspect and the second aspect, and the decoder is used to implement the steps of any one of the voice signal processing methods provided by the embodiments of the first aspect and the second aspect.
In a fourth aspect, an embodiment of the present application provides a voice signal processing device, and the device includes:
a shunt module, used to obtain the voice residual signal and shunt the voice residual signal to obtain multiple sub-voice residual signals, where the voice residual signal is an uncorrelated or weakly correlated signal obtained after processing the original voice signal;
an acquisition module, used to acquire the compensation information of each sub-speech residual signal based on a preset compensation configuration;
a sending module, used to send a code stream including each sub-speech residual signal and the corresponding compensation information to the decoder, where the code stream is used to instruct the decoder to decode according to each sub-speech residual signal and the corresponding compensation information.
In a fifth aspect, an embodiment of the present application provides a voice signal processing device, and the device includes:
a receiving module, used to receive the code stream sent by the encoder, where the code stream includes multiple sub-speech residual signals and corresponding compensation information, each sub-speech residual signal is obtained by splitting the speech residual signal, and the compensation information is determined based on a preset compensation configuration;
a decoding module, used to decode according to each sub-speech residual signal in the code stream and the corresponding compensation information.
In a sixth aspect, an embodiment of the present application provides a computer device, including a memory and a processor; the memory stores a computer program, and when the processor executes the computer program, it implements the steps of any one of the voice signal processing methods provided by the embodiments of the first aspect and the second aspect.
In a seventh aspect, an embodiment of the present application provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, it implements the steps of any one of the voice signal processing methods provided by the embodiments of the first aspect and the second aspect.
In the voice signal processing method, system, device, computer equipment and storage medium provided by the embodiments of the present application, after the encoder obtains the voice residual signal from the original voice signal, it splits the voice residual signal to obtain multiple sub-voice residual signals; based on a preset compensation configuration, it obtains the compensation information of each sub-voice residual signal; it then sends to the decoder a code stream including each sub-voice residual signal and the corresponding compensation information, where the code stream is used to instruct the decoder to decode according to each sub-voice residual signal and the corresponding compensation information. By splitting the voice residual signal, the encoder in effect sends the decoder multiple descriptions of the voice encoder parameters, with compensation information added to each split description; the decoder can use this compensation information effectively during decoding to recover a better voice signal. In this way, because the residual voice signal is described multiple times, the decoder can recover a good voice signal even if packet loss occurs during transmission, so this method can effectively improve the anti-packet-loss performance of the voice encoder.
Description of the drawings
Fig. 1 is a block diagram of a voice signal processing system provided by an embodiment;
FIG. 2 is a schematic flowchart of a voice signal processing method provided by an embodiment;
FIG. 3 is an interaction diagram of an encoder and a decoder of a voice signal processing method provided by an embodiment;
FIG. 4 is a schematic flowchart of a voice signal processing method provided by an embodiment;
FIG. 5 is a schematic flowchart of a voice signal processing method provided by an embodiment;
FIG. 6 is a schematic flowchart of a voice signal processing method provided by an embodiment;
FIG. 7 is a schematic flowchart of a voice signal processing method provided by an embodiment;
FIG. 8 is a schematic flowchart of a voice signal processing method provided by an embodiment;
FIG. 9 is a structural block diagram of a voice signal processing device provided by an embodiment;
FIG. 10 is a structural block diagram of a voice signal processing device provided by an embodiment;
Fig. 11 is a diagram of the internal structure of a computer device in an embodiment.
Detailed description of the embodiments
为了使本申请的目的、技术方案及优点更加清楚明白,以下结合附图及实施例,对本申请进行进一步详细说明。应当理解,此处描述的具体实施例仅仅用以解释本申请,并不用于限定本申请。In order to make the purpose, technical solutions, and advantages of this application clearer and clearer, the following further describes the application in detail with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are only used to explain the present application, and are not used to limit the present application.
为了更好的理解本申请实施例提供的语音信号处理方法,提供一个本申请实施例适用的应用环境。请参见图1,本申请提供的一种语音信号处理方法,可以应用于如图1所示的语音信号处理系统。该系统包括编码器01以及解码器02,其中编码器01可以与解码器02进行数据传输。其中,编码器01包括但不限于是接触式编码器、非接触式编码器、增量式编码器、绝对值编码器等,本申请实施例对编码器的类型不作具体限定。其中,解码器02包括但不限于是硬件解码器、无线解码器、软件解码器、多路解码器、单路解码器等,本实施例对解码器的类型也不作具体限定。In order to better understand the voice signal processing method provided by the embodiment of the present application, an application environment to which the embodiment of the present application is applicable is provided. Please refer to FIG. 1, a voice signal processing method provided by this application can be applied to the voice signal processing system shown in FIG. 1. The system includes an encoder 01 and a decoder 02, where the encoder 01 and the decoder 02 can perform data transmission. The encoder 01 includes, but is not limited to, a contact encoder, a non-contact encoder, an incremental encoder, an absolute encoder, etc. The embodiment of the present application does not specifically limit the type of the encoder. The decoder 02 includes, but is not limited to, a hardware decoder, a wireless decoder, a software decoder, a multi-channel decoder, a single-channel decoder, etc. The type of the decoder is not specifically limited in this embodiment.
Under an extremely weak network (for example 20 kbps or even lower), transmission-oriented packet-loss countermeasures are no longer applicable. In this case it is necessary to develop a packet-loss-resilient speech encoder, that is, to improve the packet-loss resistance of the speech encoder itself. Split multiple description is one way to implement a packet-loss-resilient speech encoder; here, split multiple description means that the speech code stream to be transmitted is transmitted in several split streams.
Take the SILK encoder as an example. In a speech signal, the speech residual signal generally occupies the largest share of the SILK encoder's code stream, so it is necessary to consider splitting the speech residual signal in a packet-loss-resilient speech encoder. The speech residual signal is the uncorrelated or weakly correlated signal that remains after the speech encoder removes short-term and long-term correlation from the original speech signal and performs gain control and noise shaping; it is generally a random pulse sequence. Based on this, the embodiments of the present application provide a voice signal processing method, system, apparatus, computer device and storage medium, which improve the packet-loss resilience of the speech encoder by splitting the speech residual signal.
下面将通过实施例并结合附图具体地对本申请的技术方案以及本申请的技术方案如何解决上述技术问题进行详细说明。下面这几个具体的实施例可以相互结合,对于相同或相似的概念或过程可能在某些实施例中不再赘述。需要说明的是,本申请提供的一种语音信号处理方法,图2-图5的执 行主体为编码器,图6-图8的执行主体为解码器,其中,其执行主体还可以是语音信号处理装置,其中该装置可以通过软件、硬件或者软硬件结合的方式实现成为编码器的部分或者全部。Hereinafter, the technical solution of the present application and how the technical solution of the present application solves the above-mentioned technical problems will be described in detail through the embodiments and the accompanying drawings. The following specific embodiments can be combined with each other, and the same or similar concepts or processes may not be repeated in some embodiments. It should be noted that, in the voice signal processing method provided by the present application, the execution body of FIGS. 2 to 5 is an encoder, and the execution body of FIGS. 6 to 8 is a decoder, where the execution body may also be a voice signal. A processing device, where the device can be implemented as part or all of the encoder through software, hardware, or a combination of software and hardware.
为使本申请实施例的目的、技术方案和优点更加清楚,下面将结合本申请实施例中的附图,对本申请实施例中的技术方案进行清楚、完整地描述,显然,所描述的实施例是本申请一部分实施例,而不是全部的实施例。In order to make the purpose, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be described clearly and completely in conjunction with the accompanying drawings in the embodiments of the present application. Obviously, the described embodiments It is a part of the embodiments of the present application, but not all of the embodiments.
下面对执行主体为编码器的一侧实施例进行说明。The following describes an embodiment where the execution body is an encoder.
在一个实施例中,图2提供了一种语音信号处理方法,本实施例涉及的是编码器从原始语音信号中获取语音残留信号后,将语音残留信号分流以及对分流后的语音编码器增加补偿信息,并发送给解码器的具体过程,如图2所示,所述方法包括:In one embodiment, FIG. 2 provides a method for processing a voice signal. This embodiment relates to that after an encoder obtains a voice residual signal from an original voice signal, it splits the voice residual signal and adds a voice encoder to the split voice signal. The specific process of compensating information and sending it to the decoder is shown in Figure 2. The method includes:
S101,获取语音残留信号,并对语音残留信号进行分流,得到多个子语音残留信号;语音残留信号为对原始语音信号进行处理后得到的无相关性信号或弱相关性信号。S101: Obtain a voice residual signal, and shunt the voice residual signal to obtain multiple sub-voice residual signals; the voice residual signal is an uncorrelated signal or a weakly correlated signal obtained after processing the original voice signal.
The speech residual signal is the uncorrelated or weakly correlated signal that remains after the encoder removes short-term and long-term correlation from the original speech signal and performs gain control and noise shaping.
Obtaining the speech residual signal can be understood as follows: after the encoder (hereinafter also called the speech encoder) receives a segment of the original speech signal, it divides the original speech signal into the speech residual signal and other parameters. Here, "other parameters" is a collective term for everything in the original speech signal other than the speech residual signal; in other words, it covers not just one parameter but multiple parameters, and this embodiment does not limit which parameters are specifically included.
In this step, after the encoder obtains the speech residual signal from the original speech signal, it splits the speech residual signal. It can be understood that, once the speech signal enters the encoder, it is in fact a bit stream, and the signal is essentially a signal sequence; splitting the speech residual signal therefore means splitting the entire code stream of the speech residual signal sequence into multiple signal sequences.
In this embodiment, the speech residual signal may be split into two code streams or into some other number of code streams, which is not limited here. For example, it may be split into two, that is, into a first sub-speech residual signal and a second sub-speech residual signal. It should be noted that after the speech residual signal is split into the first sub-speech residual signal and the second sub-speech residual signal, the other parameters of the original speech signal (everything except the speech residual signal) are copied into each sub-stream, so that each code stream formed after splitting carries not only its sub-speech residual signal but also a complete copy of the other parameters. In this way, once the decoding end has recovered the speech residual signal, it can recover the original speech signal in combination with the other parameters.
S102,基于预设的补偿配置,获取各子语音残留信号的补偿信息。S102: Acquire compensation information of each sub-speech residual signal based on a preset compensation configuration.
The compensation configuration specifies how the compensation information is configured, and the compensation information is extra information, preset for each sub-speech residual signal, that the decoder can use to compensate each sub-speech residual signal. This extra information allows the decoder to recover a better speech signal during decoding. For example, the compensation configuration may include a bit rate configuration and compensation parameters: the bit rate configuration determines the upper limit on the size of each transmission packet when the speech code stream is transmitted, and changing the compensation parameters changes the proportion of a packet occupied by compensation information. As another example, the compensation parameters may include the number of small frames into which each sub-speech residual signal is divided, and the number of non-zero pulses in each small frame. The compensation configuration may be determined by a preset packet loss rate: in general, at the same average bit rate, the lower the packet loss rate, the less compensation information is needed; conversely, the higher the packet loss rate, the more compensation information is needed. In the extreme case of no packet loss, the size of the compensation information is 0.
In practical applications, the compensation information of each sub-speech residual signal must be determined on the basis of the compensation configuration, so the compensation configuration needs to be determined in advance. The compensation configuration is usually preset before the speech residual signal is transmitted, and the preset values can be determined from historical data combined with the actual situation, which this embodiment does not limit. Specifically, based on the preset compensation configuration, the encoder may obtain the compensation information of each sub-speech residual signal through a preset algorithm, or through a pre-trained neural network model that takes the compensation configuration as input and directly outputs the corresponding compensation information. Other approaches are of course possible, and this embodiment does not limit them.
S103,向解码器发送包括各子语音残留信号和对应的补偿信息的码流;码流用于指示解码器根据各子语音残留信号和对应补偿信息进行解码。S103: Send a code stream including each sub-speech residual signal and corresponding compensation information to the decoder; the code stream is used to instruct the decoder to decode according to each sub-speech residual signal and corresponding compensation information.
从语音残留信号中确定了各子语音残留信号后,以及获取了各子语音残留信号的补偿信息后,编码器向解码器传输码流时,每条码流中包括各子语音残留信号和对应的补偿信息,当然还需要包括上述所述的其他参数,本实施例中只要针对语音残留信号进行说明,部分实施例中不会再赘述该其他参数。After determining each sub-speech residual signal from the speech residual signal and obtaining the compensation information of each sub-speech residual signal, when the encoder transmits the code stream to the decoder, each code stream includes each sub-speech residual signal and the corresponding The compensation information, of course, also needs to include the other parameters mentioned above. In this embodiment, only the speech residual signal is described, and the other parameters will not be repeated in some embodiments.
可以理解的是,编码器向解码器传输码流是用于指示解码器根据码流中的各子语音残留信号和对应补偿信息解码恢复出语音残留信号。It can be understood that the transmission of the code stream from the encoder to the decoder is used to instruct the decoder to decode and recover the speech residual signal according to the sub-speech residual signals and corresponding compensation information in the code stream.
示例地,如图3所示,提供一种编码器向解码器发送分流后码流的示意图。其中,图3中的主解码器和边路解码器可以认为是一个解码器,该解码器在接收到码流后,会根据收到码流的数量采用不同的解码方法,也即是说,主解码器和边路解码器可以看作是一个解码器中实现不同解码方法的子解码器。对于解码器具体的解码过程,可参见以解码器为执行主体的实施例中进行说明,这里不再赘述。Illustratively, as shown in FIG. 3, a schematic diagram of an encoder sending a split code stream to a decoder is provided. Among them, the main decoder and side decoder in Figure 3 can be considered as one decoder. After the decoder receives the code stream, it will adopt different decoding methods according to the number of received code streams, that is, The main decoder and the side decoder can be regarded as sub-decoders that implement different decoding methods in one decoder. For the specific decoding process of the decoder, please refer to the description in the embodiment with the decoder as the execution subject, and will not be repeated here.
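To make the encoder-side flow of steps S101-S103 concrete, the following is a minimal Python sketch assuming two sub-streams. The even/odd split, the dict-based packet layout and the function names are illustrative assumptions rather than the exact format of the embodiment, and the compensation computation is stubbed out (a fuller sketch appears later):

def compute_compensation(sub_residual, config):
    # Placeholder: in the embodiment this is derived from the preset compensation
    # configuration (bit rate configuration and compensation parameters).
    return {"gains": [], "positions": [], "signs": []}

def encode_frame(residual, other_params, compensation_config):
    # S101: split the residual signal into two sub-speech residual signals.
    sub_even = residual[0::2]
    sub_odd = residual[1::2]

    packets = []
    for sub in (sub_even, sub_odd):
        # S102: compensation information for this sub-stream.
        comp = compute_compensation(sub, compensation_config)
        # S103: each packet carries one sub-signal, its compensation information,
        # and a full copy of the other parameters, so either packet alone can be decoded.
        packets.append({
            "sub_residual": sub,
            "compensation": comp,
            "other_params": other_params,
        })
    return packets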
In the speech signal processing method provided by this embodiment, after the encoder obtains the speech residual signal from the original speech signal, it splits the speech residual signal to obtain multiple sub-speech residual signals; based on a preset compensation configuration, it obtains the compensation information of each sub-speech residual signal; it then sends to the decoder a code stream including each sub-speech residual signal and the corresponding compensation information, where the code stream instructs the decoder to decode according to each sub-speech residual signal and the corresponding compensation information. Splitting the speech residual signal amounts to sending multiple descriptions of the speech encoder parameters to the decoder, and each description is augmented with compensation information that the decoder can use to recover a good speech signal during decoding. Even if packets are lost during transmission, the decoder can still recover a good speech signal from the descriptions it does receive; therefore, by describing the speech residual signal multiple times, this method effectively improves the packet-loss resilience of the speech encoder.
在以上实施例的基础上,本申请实施例还提供了一种语音信号处理方法,其涉及的是语音编码器将语音残留信号分流成两条子语音残留信号的具体过程,则在一个实施例中,若多个子语音残留信号包括第一子语音残留信号和第二子语音残留信号;则如图4所示,上述S101步骤包括:On the basis of the above embodiment, the embodiment of the present application also provides a voice signal processing method, which relates to the specific process of the voice encoder shunting the voice residual signal into two sub-voice residual signals. In one embodiment, If the multiple sub-speech residual signals include a first sub-speech residual signal and a second sub-speech residual signal; as shown in FIG. 4, the above S101 step includes:
S201,对语音残留信号进行量化,得到语音残留信号对应的量化序列。S201: quantize the speech residual signal to obtain a quantized sequence corresponding to the speech residual signal.
本实施例中的语音编码器以SILK编码器为例进行说明。The speech encoder in this embodiment is described by taking the SILK encoder as an example.
若将语音残留信号分流成第一子语音残留信号和第二子语音残留信号,则可以采用的分流方式是奇偶分流。SILK编码器在对语音残留信号进行分流前需要先对语音残留信号进行量化,得到语音残留信号对应的量化序列。If the voice residual signal is split into the first sub-voice residual signal and the second sub-voice residual signal, the splitting method that can be used is odd-even splitting. The SILK encoder needs to quantize the speech residual signal before shunting the speech residual signal to obtain the quantized sequence corresponding to the speech residual signal.
例如,定义量化前的语音残留信号为:r[n],n=0,1,...,L-1;则量化后的语音残留信号序列可以表示为q[n],n=0,1,...,L-1。For example, define the speech residual signal before quantization as: r[n],n=0,1,...,L-1; then the quantized speech residual signal sequence can be expressed as q[n], n=0, 1,...,L-1.
S202,对量化序列进行奇偶分流,获取奇量化序列和偶量化序列。S202: Perform odd-even splitting on the quantized sequence to obtain an odd quantized sequence and an even quantized sequence.
基于上述量化的语音残留信号的量化序列,对该量化序列进行奇偶分流,分流后,需要基于SILK编码器的本身的算法进一步确定最终的奇量化序列和偶量化序列。Based on the quantized sequence of the quantized speech residual signal, the quantized sequence is odd-even split, and after the split, the final odd quantized sequence and the even quantized sequence need to be further determined based on the algorithm of the SILK encoder.
例如,将序列q[n],n=0,1,...,L-1,奇偶分流序列可表示为:For example, if the sequence q[n],n=0,1,...,L-1, the parity split sequence can be expressed as:
q_e[n] = q[2*n], n = 0, 1, ..., L/2-1
q_o[n] = q[2*n+1], n = 0, 1, ..., L/2-1
Based on the random seed sequence and sign function of the SILK encoder itself, a sign function sign(·) is determined, and from it an even random seed sequence s_e[n] and an odd random seed sequence s_o[n] are derived (the exact expressions are given as equation images in the original filing and are not reproduced here). The initial seed seed_init is kept in both the odd stream and the even stream, and its value is generated by the SILK encoder.
进一步地,基于上述确定的符号函数和奇偶随机种子序列,以及语音残留信号量化后的奇偶分流序列,可确定出最终的奇量化序列和偶量化序列为:Further, based on the above determined symbol function and parity random seed sequence, and the parity split sequence after the speech residual signal quantization, the final odd quantization sequence and even quantization sequence can be determined as:
q e[n]=Q(r[2*n]*sign(s e[n])-offset) q e [n]=Q(r[2*n]*sign(s e [n])-offset)
q o[n]=Q(r[2*n+1]*sign(s o[n])-offset) q o [n]=Q(r[2*n+1]*sign(s o [n])-offset)
Here, Q denotes the quantization algorithm in the SILK encoder and is provided by the SILK encoder. The value of offset is obtained by a table lookup according to the small-frame type and is also provided by the SILK encoder.
S203,将奇量化序列确定为第一子语音残留信号,以及将偶量化序列确定为第二子语音残留信号。S203: Determine the odd quantization sequence as the first sub-speech residual signal, and determine the even quantization sequence as the second sub-speech residual signal.
基于上述确定的奇偶量化序列,编码器将奇量化序列确定为第一子语音残留信号,以及将偶量化序列确定为第二子语音残留信号,当然还可以将偶量化序列确定为第一子语音残留信号,以及将奇量化序列确定为第二子语音残留信号,对于第一第二与奇偶量化序列的对应关系,本实施例不 作限定。Based on the above-determined parity quantization sequence, the encoder determines the odd quantization sequence as the first sub-speech residual signal and the even quantization sequence as the second sub-speech residual signal. Of course, the even quantization sequence can also be determined as the first sub-speech The residual signal, and the odd quantization sequence is determined to be the second sub-speech residual signal, the correspondence between the first and second quantization sequences and the odd and even quantization sequence is not limited in this embodiment.
本实施例中,基于量化后的语音残留信号量化序列,进行奇偶分流,且结合了语音编码器中本身的算法,确定出最终的奇偶量化序列,这样,将奇偶量化序列作为最终编码器传输的码流,方便了对语音残留信号的传输。In this embodiment, the parity split is performed based on the quantized residual speech signal quantization sequence, and combined with the algorithm in the speech encoder, the final parity quantization sequence is determined. In this way, the parity quantization sequence is transmitted as the final encoder. The code stream facilitates the transmission of voice residual signals.
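The following is a minimal Python sketch of the odd/even split of steps S201-S203, following the structure of the formulas q_e[n] = Q(r[2*n]*sign(s_e[n]) - offset) and q_o[n] = Q(r[2*n+1]*sign(s_o[n]) - offset). The SILK quantizer Q, the sign sequences derived from the random seed, and the offset value are not reproduced in the text, so they are stubbed out with illustrative placeholders (simple rounding, all-positive signs, offset 0):

def quantize(x):
    # Placeholder for the SILK quantization algorithm Q (here: simple rounding).
    return int(round(x))

def parity_split_quantize(r, sign_even, sign_odd, offset=0.0):
    # Split the residual r[0..L-1] into odd and even quantized sub-sequences.
    L = len(r)
    q_even = [quantize(r[2 * n] * sign_even[n] - offset) for n in range(L // 2)]
    q_odd = [quantize(r[2 * n + 1] * sign_odd[n] - offset) for n in range(L // 2)]
    return q_even, q_odd  # first and second sub-speech residual signals

# Example with dummy data; signs are fixed to +1 here (in SILK they come from the seed).
r = [0.4, -1.2, 0.9, 0.1, -0.7, 2.3]
half = len(r) // 2
q_e, q_o = parity_split_quantize(r, [1.0] * half, [1.0] * half)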
In one embodiment, the compensation parameters of the above compensation configuration are taken as an example to describe the process of obtaining the compensation information of each sub-speech residual signal. The compensation parameters include the number N1 of small frames into which each sub-speech residual signal is divided and the number N2 of non-zero pulses in each small frame, where N1 is a positive integer and N2 is a non-negative integer. In practice, the compensation parameters are preset and may be determined according to the packet loss rate to ensure that the chosen N1 and N2 are reasonable. As shown in FIG. 5, the above step S102 includes:
S301,获取各子语音残留信号对应的补偿增益、位置序列和符号序列;补偿增益的长度为N1,位置序列和符号序列的长度均为N2。S301: Obtain the compensation gain, position sequence, and symbol sequence corresponding to each sub-speech residual signal; the length of the compensation gain is N1, and the length of the position sequence and the symbol sequence are both N2.
Here, the compensation gain, the position sequence and the symbol sequence are all defined per small frame of the sub-speech residual signal: each small frame has one compensation gain, one position sequence of length N2 and one symbol sequence of length N2, so over the N1 small frames the compensation gains form a sequence of length N1. In other words, in this step the encoder obtains the compensation gain, position sequence and symbol sequence of every small frame of each sub-speech residual signal.
例如,以N1=cfc,N2=nz为例For example, take N1=cfc, N2=nz as an example
The position sequence x_i can be expressed as x_i = MAX_POS_nz(ABS(rq_i - crq_i)), where rq_i denotes the i-th small frame of the optimal sequence recovered by the decoder when all split code streams are received, and crq_i denotes the i-th small frame of the sequence recovered from the compensation information when the decoder receives only a single one of the split code streams; i takes values from 0 to cfc-1. The function ABS takes the absolute value of each item of the sequence, where the items are determined by rq_i and crq_i, and the function MAX_POS_nz returns the position sequence of the nz largest items.
The compensation gain can be determined based on the position sequence; for example, the compensation gain g_i is computed from the differences between rq_i and crq_i at the positions in x_i (the exact expression is given as an equation image in the original filing and is not reproduced here). Similarly, the symbol sequence can be determined based on the position sequence, and the symbol sequence s_i can be expressed as: s_i = sign(rq_i[x_i] - crq_i[x_i]).
S302,根据各子语音残留信号对应的N1个位置序列和符号序列,构建 各子语音残留信号的补偿序列;S302: Construct a compensation sequence of each sub-speech residual signal according to the N1 position sequences and symbol sequences corresponding to each sub-speech residual signal;
Based on the position sequence and symbol sequence of each sub-speech residual signal determined by the encoder as described above, all small frames are concatenated into a complete sequence; that is, the N1 per-small-frame position sequences and symbol sequences are assembled into a complete sequence, and this complete sequence is the compensation sequence of the sub-speech residual signal. The compensation sequence is denoted cq. Since N1 is the number of small frames into which each sub-speech residual signal is divided, corresponding to the length of each sub-speech residual signal, the length of cq for each sub-speech residual signal is L/2, where L is the length of the entire speech residual signal.
S303,根据各子语音残留信号的补偿序列和各子语音残留信号的补偿增益,确定各子语音残留信号的补偿信息。S303: Determine the compensation information of each sub-speech residual signal according to the compensation sequence of each sub-speech residual signal and the compensation gain of each sub-speech residual signal.
The compensation sequence and the compensation gain of each sub-speech residual signal determined above are taken together, and the compensation sequence and compensation gain of each sub-speech residual signal are determined as the final compensation information.
When determining the compensation information in this embodiment, each sub-speech residual signal is first divided into multiple small frames, then the gain value and compensation sequence of each small frame are obtained, and the final compensation information is determined based on the complete compensation sequence assembled from them. In this way, the determined compensation information provides the decoder with complete additional information during decoding, which guarantees the quality of the speech signal recovered by the decoder.
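As a rough Python sketch of steps S301-S303, the following computes, for each of the N1 small frames, a position sequence of the N2 largest absolute differences between rq_i and crq_i, the corresponding sign sequence, and a per-frame gain. Since the exact gain formula is given only as an equation image in the filing, the mean absolute difference at the selected positions is used here purely as an illustrative assumption:

def compensation_info(rq_frames, crq_frames, nz):
    # rq_frames / crq_frames: lists of N1 small frames (equal-length lists of floats).
    gains, comp_seq = [], []
    for rq_i, crq_i in zip(rq_frames, crq_frames):
        diff = [a - b for a, b in zip(rq_i, crq_i)]
        # Position sequence x_i: indices of the nz largest absolute differences.
        x_i = sorted(range(len(diff)), key=lambda k: abs(diff[k]), reverse=True)[:nz]
        # Symbol sequence s_i: sign of the difference at each selected position.
        s_i = [1 if diff[k] >= 0 else -1 for k in x_i]
        # Per-frame gain g_i: assumed here to be the mean |difference| at those positions.
        g_i = sum(abs(diff[k]) for k in x_i) / nz if nz > 0 else 0.0
        gains.append(g_i)
        # Per-frame compensation sequence: zero everywhere except the signed positions.
        frame_cq = [0] * len(diff)
        for k, s in zip(x_i, s_i):
            frame_cq[k] = s
        comp_seq.extend(frame_cq)
    return gains, comp_seq  # compensation information: gains (length N1) and cq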
另外,上述补偿配置还包括码率配置,该码率配置是决定传输包流量上限的参数,且该码率配置可以由预设的丢包率确定,则在一个实施例中,该方法还包括:根据码率配置确定各子语音残留信号的补偿信息的空间大小;空间大小用于指示在发送码流时,存储补偿信息的空间容量。In addition, the above compensation configuration also includes a code rate configuration, which is a parameter that determines the upper limit of the transmission packet flow, and the code rate configuration can be determined by a preset packet loss rate. In one embodiment, the method further includes : Determine the space size of the compensation information of each sub-voice residual signal according to the bit rate configuration; the space size is used to indicate the space capacity for storing the compensation information when the code stream is sent.
在确定了补偿信息后,需要进一步确定存储补偿信息空间的容量,这样对每条分流后的码流的存储空间划分差不多大小的容量,有利于后续进行传输性能测试,且在传输的码流中增加补偿信息后对于码流的大小会有一定的增加,通过确定补偿信息存储空间容量可以有效检测出影响码流传输的效率的因素。After the compensation information is determined, the capacity of the space for storing the compensation information needs to be further determined. In this way, the storage space of each divided bit stream is divided into a similar size, which is beneficial to the subsequent transmission performance test, and in the transmitted bit stream After the compensation information is added, the size of the code stream will increase to a certain extent. By determining the storage space capacity of the compensation information, the factors that affect the efficiency of the code stream transmission can be effectively detected.
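The text does not give a formula for the space occupied by the compensation information; the short Python sketch below only illustrates the idea that the bit rate configuration fixes a per-packet byte budget, and whatever remains after the sub-stream payload is the capacity available for storing compensation information. The 20 ms frame duration and the parameter names are assumptions made for illustration:

def compensation_capacity_bytes(bitrate_bps, payload_bytes, frame_ms=20):
    packet_budget = bitrate_bps * frame_ms // (1000 * 8)  # bytes allowed per packet
    return max(packet_budget - payload_bytes, 0)          # space left for compensation info

# Example: at 20 kbps and 20 ms frames, each packet may carry about 50 bytes.
print(compensation_capacity_bytes(20000, 38))  # -> 12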
下面对执行主体为解码器的一侧实施例进行说明。需要说明的是,本申请虽然分了解码器为执行主体的实施例和编码器为执行主体的实施例,但实际中,解码器与编码器是相互配合着交互完成语音信号处理的,因此,编码器为执行主体的实施例与解码器为执行主体的实施例中的过程描述可 互相参考,而不是作为两者执行范围的限定。The following describes an embodiment where the execution subject is a decoder. It should be noted that although this application divides the embodiment in which the decoder is the main body of execution and the embodiment in which the encoder is the main body of execution, in reality, the decoder and the encoder interact with each other to complete speech signal processing. Therefore, The process descriptions in the embodiment in which the encoder is the execution subject and the embodiment in which the decoder is the execution subject may refer to each other, rather than as a limitation of the execution range of the two.
如图6所示,在一个实施例中,提供了一种语音信号处理方法,本实施例涉及的是解码器在接收到编码器发送的码流后,进行解码的具体过程,该方法包括:As shown in Figure 6, in one embodiment, a method for processing a voice signal is provided. This embodiment relates to a specific process of decoding after a decoder receives a code stream sent by an encoder. The method includes:
S401,接收编码器发送的码流;码流包括多个子语音残留信号和对应的补偿信息;各子语音残留信号为从语音残留信号中分流得到的;补偿信息是基于预设的补偿配置确定的。S401: Receive a code stream sent by an encoder; the code stream includes multiple sub-speech residual signals and corresponding compensation information; each sub-speech residual signal is obtained by splitting from the speech residual signal; the compensation information is determined based on a preset compensation configuration .
S402,根据码流中的各子语音残留信号和对应的补偿信息进行解码。S402: Perform decoding according to each sub-speech residual signal in the bitstream and corresponding compensation information.
本实施例中码流以及补偿配置等原理过程可参见执行主体为编码器的实施例中的描述,在此不再赘述。The principle process of the code stream and compensation configuration in this embodiment can be referred to the description in the embodiment in which the execution body is the encoder, which will not be repeated here.
其中,解码器接收编码器发送的码流时,要么是全部接收到,要么就是接收到其中的部分码流,即发生了丢包现象,针对两种不同的情况,解码器采用不同的解码方法恢复语音残留信号,具体过程可参见下述实施例中的说明。Among them, when the decoder receives the code stream sent by the encoder, it either receives all or part of the code stream, that is, packet loss occurs. For two different situations, the decoder adopts different decoding methods. For restoring the residual voice signal, refer to the description in the following embodiment for the specific process.
本实施例提供的语音信号处理方法,解码器接收编码器发送的码流后,根据各码流中携带的根据各子语音残留信号和对应补偿信息进行解码,该码流为编码器从原始语音信号中获取语音残留信号后,对语音残留信号进行分流,得到多个子语音残留信号,并基于预设的补偿配置,获取各子语音残留信号的补偿信息,然后向解码器发送包括各子语音残留信号和对应的补偿信息的码流。该方法中,在编码器端对语音残留信号进行分流,相当于对语音编码器参数进行多描述后发送解码器,且对各分流进行描述时均增加了补偿信息,该补偿信息可用于解码器解码时有效恢复出较好的语音信号,这样,通过对语音残留信号多描述的方式,即使在传输过程中发生丢包,解码器也可以恢复出较好语音信号,因此,该方法可以有效提高语音编码器的抗丢包性能。In the speech signal processing method provided by this embodiment, after the decoder receives the code stream sent by the encoder, it decodes according to the sub-speech residual signal and corresponding compensation information carried in each code stream. After the voice residual signal is obtained from the signal, the voice residual signal is shunted to obtain multiple sub-voice residual signals, and based on the preset compensation configuration, the compensation information of each sub-voice residual signal is obtained, and then sent to the decoder including each sub-voice residual signal The code stream of the signal and the corresponding compensation information. In this method, the voice residual signal is split at the encoder end, which is equivalent to sending the decoder after multiple descriptions of the voice encoder parameters, and compensation information is added when each split is described, and the compensation information can be used in the decoder During decoding, a better voice signal can be recovered effectively. In this way, the decoder can recover a better voice signal even if packet loss occurs during the transmission process by describing the residual voice signal. Therefore, this method can effectively improve Anti-packet loss performance of the speech encoder.
下面通过两个实施例,对解码器接收到所有码流的情况,和解码器只接收到其中单条码流的情况的解码过程进行说明。下面将以多个子语音残留信号包括第一子语音残留信号和第二子语音残留信号为例进行说明。In the following, two embodiments are used to describe the decoding process in the case where the decoder receives all the code streams and the case where the decoder only receives a single code stream. In the following, the multiple sub-speech residual signals include the first sub-speech residual signal and the second sub-speech residual signal as an example for description.
则在一个实施例中,若多个子语音残留信号包括第一子语音残留信号 和第二子语音残留信号;且接收到码流为第一子语音残留信号和对应的补偿信息,和,第二子语音残留信号和对应的补偿信息;则如图7所示,上述S402步骤包括:In an embodiment, if the multiple sub-speech residual signals include a first sub-speech residual signal and a second sub-speech residual signal; and the received code stream is the first sub-speech residual signal and corresponding compensation information, and, the second The sub-speech residual signal and the corresponding compensation information; then, as shown in FIG. 7, the above step S402 includes:
S501,根据第一子语音残留信号恢复对应的偶语音残留信号,以及根据第二子语音残留信号恢复对应的奇语音残留信号。S501: Restore a corresponding even voice residual signal according to the first sub-speech residual signal, and restore a corresponding odd voice residual signal according to the second sub-speech residual signal.
The difference between the even speech residual signal and the first sub-speech residual signal is as follows: the first sub-speech residual signal is the sub-signal sequence obtained at the encoder by quantizing and splitting the speech residual signal, whereas the even speech residual signal is the speech residual information recovered at the decoder from that sub-signal sequence; the same distinction applies between the odd speech residual signal and the second sub-speech residual signal.
In this step, the first sub-speech residual signal is taken to be the even quantization sequence and the second sub-speech residual signal the odd quantization sequence. In practical applications the correspondence between the two can be swapped, because "first" and "second" are only used to distinguish the sub-speech residual signals, which this embodiment does not limit.
例如,若定义奇量化序列和偶量化序列为:For example, if you define odd quantization sequence and even quantization sequence as:
q e[n]=Q(r[2*n]*sign(s e[n])-offset) q e [n]=Q(r[2*n]*sign(s e [n])-offset)
q o[n]=Q(r[2*n+1]*sign(s o[n])-offset) q o [n]=Q(r[2*n+1]*sign(s o [n])-offset)
The even speech residual signal rq_e[n] and the odd speech residual signal rq_o[n] are then recovered from q_e[n] and q_o[n], respectively (the exact recovery expressions are given as an equation image in the original filing and are not reproduced here).
可见q(n)是从r(n)量化后的量化序列,rq(n)表示的是从q(n)恢复后的语音残留信号。It can be seen that q(n) is the quantized sequence quantized from r(n), and rq(n) represents the speech residual signal recovered from q(n).
示例地,本实施例中,解码器从q(n)恢复rq(n)的过程,可采用一些常用的解码算法进行,本实施例对此不作限定。For example, in this embodiment, the process of the decoder recovering rq(n) from q(n) can be performed by using some commonly used decoding algorithms, which is not limited in this embodiment.
S502,对偶语音残留信号和奇语音残留信号进行交织插值,确定语音残留信号。S502: Perform interleaving and interpolation on the even voice residual signal and the odd voice residual signal to determine the voice residual signal.
基于上述恢复的偶语音残留信号和奇语音残留信号,解码器对偶语音残留信号和奇语音残留信号进行交织插值,即将奇偶项分别交织插入即可得到完整语音残留信号。前面有说在实际中发送码流时还会将一开始分出的其他参数携带上,那解码器恢复了语音残留信号后,可以结合码流中携 带的其他参数恢复出原始语音信号。Based on the restored even voice residual signal and odd voice residual signal, the decoder performs interleaving and interpolation on the even voice residual signal and the odd voice residual signal, that is, interleaving and inserting the odd and even items respectively to obtain a complete voice residual signal. As mentioned earlier, when the code stream is sent in practice, other parameters that were separated at the beginning will be carried. After the decoder restores the speech residual signal, it can restore the original speech signal by combining other parameters carried in the code stream.
本实施例中由于解码器接收的所有码流,也就是说将编码器发送的所有码流均接收到了,所以对偶语音残留信号和奇语音残留信号逐采样交织,就恢复出最优的语音残留信号,从而可恢复出较高音质的原始语音信号。In this embodiment, since all the code streams received by the decoder, that is to say, all the code streams sent by the encoder are received, the even voice residual signal and the odd voice residual signal are interleaved sample by sample to recover the optimal voice residual signal. Signal, which can restore the original voice signal with higher sound quality.
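When both packets arrive, the decoding of steps S501-S502 reduces to recovering the two half-length residuals and interleaving them sample by sample. A minimal Python sketch follows; the recovery of rq_e and rq_o from the quantized sub-streams uses the SILK dequantization, which is not reproduced here, so it is stubbed out with a placeholder:

def dequantize(q):
    # Placeholder for recovering rq from the quantized sub-stream q.
    return [float(v) for v in q]

def decode_both_streams(q_even, q_odd):
    rq_e = dequantize(q_even)   # even speech residual signal
    rq_o = dequantize(q_odd)    # odd speech residual signal
    rq = [0.0] * (len(rq_e) + len(rq_o))
    rq[0::2] = rq_e             # interleave: even samples
    rq[1::2] = rq_o             # interleave: odd samples
    return rq                   # full speech residual signal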
在另外一个实施例中,若多个子语音残留信号包括第一子语音残留信号和第二子语音残留信号;且接收到码流为第一子语音残留信号和对应的补偿信息,或者,第二子语音残留信号和对应的补偿信息;则如图8所示,上述S402步骤包括:In another embodiment, if the multiple sub-speech residual signals include a first sub-speech residual signal and a second sub-speech residual signal; and the received code stream is the first sub-speech residual signal and corresponding compensation information, or the second sub-speech residual signal The sub-voice residual signal and the corresponding compensation information; then, as shown in FIG. 8, the above step S402 includes:
S601,根据第一子语音残留信号恢复对应的偶语音残留信号,或者,根据第二子语音残留信号恢复对应的奇语音残留信号。S601: Restore the corresponding even voice residual signal according to the first sub-speech residual signal, or restore the corresponding odd voice residual signal according to the second sub-speech residual signal.
本实施例是只接收都一个子语音残留信号,例如,只接收到第一子语音残留信号或者只接收到第二子语音残留信号。相应地,编码器恢复的就只有偶语音残留信号或者奇语音残留信号,即接收到哪个子语音残留信号,恢复的就是该子语音残留信号对应的语音残留信号。In this embodiment, only one sub-speech residual signal is received, for example, only the first sub-speech residual signal is received or only the second sub-speech residual signal is received. Correspondingly, the encoder recovers only the even voice residual signal or the odd voice residual signal, that is, which sub voice residual signal is received, and what is restored is the voice residual signal corresponding to the sub voice residual signal.
S602,根据偶语音残留信号恢复相似语音残留信号,或者,根据奇语音残留信号恢复相似语音残留信号。S602: Restore a similar voice residual signal based on the even voice residual signal, or restore a similar voice residual signal based on the odd voice residual signal.
Based on the even speech residual signal or the odd speech residual signal obtained above, a similar speech residual signal is recovered. Here, the similar speech residual signal refers to the speech residual signal recovered on the basis of the compensation information, denoted crq(n); there is a small error between it and rq(n), which is why it is called a similar speech residual signal.
上述步骤中恢复的偶语音残留信号可以表示为rq e,奇语音残留信号可以表示为rq o,相似语音残留信号表示为crq。 The even speech residual signal recovered in the above steps can be expressed as rq e , the odd speech residual signal can be expressed as rq o , and the similar speech residual signal is expressed as crq.
The decoder recovers crq from rq_e or rq_o by expanding the received half-length sequence to the full length (the exact recovery formulas are given as equation images in the original filing and are not reproduced here). According to these formulas, crq_e is determined from rq_e and crq_o is determined from rq_o, and the recovery covers every index from 0 to L-1, so the recovered crq_e and crq_o both have length L. crq_e and crq_o can therefore be collectively referred to as crq, that is, the similar speech residual signal.
S603,基于第一子语音残留信号对应的补偿信息和相似语音残留信号,确定目标相似语音残留信号,或者,根据第二子语音残留信号对应的补偿信息和相似语音残留信号,确定目标相似语音残留信号。S603: Determine the target similar voice residual signal based on the compensation information and the similar voice residual signal corresponding to the first sub-voice residual signal, or determine the target similar voice residual signal according to the compensation information and the similar voice residual signal corresponding to the second sub-voice residual signal signal.
上述在确定相似语音残留信号时,还未考虑补偿信息,由于补偿的目的是通过额外的信息,使得crq序列更接近rq序列。因此,为了最终恢复的语音残留信号的质量更好,将各子语音残留信号中携带的补偿信息融进相似语音残留信号中,得到最终的目标相似语音残留信号。When determining the similar voice residual signal, compensation information has not been considered, because the purpose of compensation is to use additional information to make the crq sequence closer to the rq sequence. Therefore, in order to have a better quality of the finally recovered speech residual signal, the compensation information carried in each sub-speech residual signal is merged into the similar speech residual signal to obtain the final target similar speech residual signal.
例如,先求取语音残留信号中的每小帧的目标相似语音残留信号,将补偿信息中的每小帧补偿增益,乘以每小帧补偿序列后,再加上相似语音残留信号,即目标crq i=crq i+g i*cq i。基于每小帧的目标相似语音残留信号可以确定出整条语音残留信号的目标相似语音残留信号。 For example, first obtain the target similar voice residual signal of each small frame in the voice residual signal, multiply the compensation gain of each small frame in the compensation information by the compensation sequence of each small frame, and then add the similar voice residual signal, that is, the target crq i = crq i +g i *cq i . Based on the target similar voice residual signal of each small frame, the target similar voice residual signal of the entire voice residual signal can be determined.
S604,根据目标相似语音残留信号,确定语音残留信号。S604: Determine a voice residual signal according to the target similar voice residual signal.
The target similar speech residual signal determined in the above step is taken as the determined speech residual signal. It can be understood that, although there is a small error between the target similar speech residual signal determined in this embodiment and the optimal speech residual signal, this embodiment is based on the speech residual signal recovered when the decoder receives only a single code stream. In other words, even when packet loss occurs, the decoder can still recover the target similar speech residual signal.
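For the single-stream case of steps S601-S604, the following Python sketch first expands the received half-length residual to full length and then adds the per-small-frame compensation. The expansion by sample repetition is an assumption made only for illustration (the actual recovery formulas are given as equation images in the filing), while the compensation step follows target crq_i = crq_i + g_i * cq_i from the text, with comp_seq assumed to be aligned sample by sample with crq:

def decode_single_stream(rq_half, gains, comp_seq, frame_len):
    # Expand the half-length residual to full length (assumed: repeat each sample).
    crq = []
    for v in rq_half:
        crq.extend([v, v])
    # Apply the compensation per small frame: target crq_i = crq_i + g_i * cq_i.
    for i, g in enumerate(gains):
        start = i * frame_len
        for k in range(start, min(start + frame_len, len(crq), len(comp_seq))):
            crq[k] += g * comp_seq[k]
    return crq  # target similar speech residual signal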
In practice, the sound quality of the target similar speech residual signal obtained by the method provided in this embodiment was verified. Table 1 below compares the MOS score of this method with that of the SILK encoder, using the same audio stream ch_f1.wav, under the same packet loss strategy and at similar actual bit rates.
Table 1 (MOS comparison of this method and the SILK encoder; the table is provided as an image in the original filing and is not reproduced here)
As can be seen from the MOS scores in the table above, medium sound quality can still be recovered even when the packet loss rate is high, and higher sound quality can be recovered when the packet loss rate is not high. Therefore, the encoder provided by the embodiments of the present application, by splitting the encoder parameters into multiple descriptions, has strong packet-loss resilience: when packet loss occurs during transmission, the decoder can decode medium sound quality even if it receives only one packet, and if it receives both packets in time it can recover higher sound quality.
应该理解的是,虽然图2-8的流程图中的各个步骤按照箭头的指示依次显示,但是这些步骤并不是必然按照箭头指示的顺序依次执行。除非本文中有明确的说明,这些步骤的执行并没有严格的顺序限制,这些步骤可以以其它的顺序执行。而且,图2-8中的至少一部分步骤可以包括多个子步骤或者多个阶段,这些子步骤或者阶段并不必然是在同一时刻执行完成,而是可以在不同的时刻执行,这些子步骤或者阶段的执行顺序也不必然是依次进行,而是可以与其它步骤或者其它步骤的子步骤或者阶段的至少一部分轮流或者交替地执行。It should be understood that although the various steps in the flowcharts of FIGS. 2-8 are displayed in sequence as indicated by the arrows, these steps are not necessarily executed in sequence in the order indicated by the arrows. Unless specifically stated in this article, the execution of these steps is not strictly limited in order, and these steps can be executed in other orders. Moreover, at least some of the steps in Figures 2-8 may include multiple sub-steps or multiple stages. These sub-steps or stages are not necessarily executed at the same time, but can be executed at different times. These sub-steps or stages The execution order of is not necessarily performed sequentially, but may be performed alternately or alternately with at least a part of other steps or sub-steps or stages of other steps.
另外,本申请实施例还提供了一种语音信号处理系统,可参照上述图1所示,该系统包括:编码器和解码器;其中编码器,用于实现前面以编码器为执行主体的所有实施例中的过程;解码器,用于实现前面以解码器为执行主体的所有实施例中的过程。In addition, the embodiment of the present application also provides a voice signal processing system, which can be referred to as shown in Figure 1. The system includes: an encoder and a decoder; wherein the encoder is used to implement all the previous implementations of the encoder as the main body. The process in the embodiment; the decoder is used to implement the processes in all the previous embodiments where the decoder is the main body of execution.
上述实施例提供的一种语音信号处理系统,其实现原理和技术效果与上述语音信号处理方法实施例类似,在此不再赘述。The implementation principle and technical effect of a voice signal processing system provided by the foregoing embodiment are similar to those of the foregoing voice signal processing method embodiment, and will not be repeated here.
此外,还提供了上述语音信号处理方法对应的虚拟装置,如图9所示,在一个实施例中,提供了一种语音信号处理装置,该装置包括:分流模块10、获取模块11和发送模块12,其中,In addition, a virtual device corresponding to the above-mentioned voice signal processing method is also provided. As shown in FIG. 9, in one embodiment, a voice signal processing device is provided. The device includes: a shunt module 10, an acquisition module 11, and a sending module 12, of which,
分流模块10,用于获取语音残留信号,并对语音残留信号进行分流, 得到多个子语音残留信号;语音残留信号为对原始语音信号进行处理后得到的无相关性信号或弱相关性信号;The shunting module 10 is used to obtain the speech residual signal and shunt the speech residual signal to obtain multiple sub-speech residual signals; the speech residual signal is an uncorrelated signal or a weakly correlated signal obtained after processing the original speech signal;
获取模块11,用于基于预设的补偿配置,获取各子语音残留信号的补偿信息;The obtaining module 11 is configured to obtain compensation information of each sub-speech residual signal based on a preset compensation configuration;
发送模块12,用于向解码器发送包括各子语音残留信号和对应的补偿信息的码流;码流用于指示解码器根据各子语音残留信号和对应补偿信息进行解码。The sending module 12 is used to send a code stream including each sub-speech residual signal and corresponding compensation information to the decoder; the code stream is used to instruct the decoder to decode according to each sub-speech residual signal and corresponding compensation information.
在一个实施例中,提供了一种语音信号处理装置,若多个子语音残留信号包括第一子语音残留信号和第二子语音残留信号;则上述分流模块10包括:In one embodiment, a voice signal processing device is provided. If the multiple sub-voice residual signals include a first sub-voice residual signal and a second sub-voice residual signal, the above-mentioned shunt module 10 includes:
量化单元,用于对语音残留信号进行量化,得到语音残留信号对应的量化序列;The quantization unit is used to quantize the speech residual signal to obtain a quantized sequence corresponding to the speech residual signal;
分流单元,用于对量化序列进行奇偶分流,获取奇量化序列和偶量化序列;The shunting unit is used to perform odd-even shunting of the quantized sequence to obtain the odd quantized sequence and the even quantized sequence;
子信号确定单元,用于将奇量化序列确定为第一子语音残留信号,以及将偶量化序列确定为第二子语音残留信号。The sub-signal determining unit is configured to determine the odd quantization sequence as the first sub-speech residual signal, and determine the even quantization sequence as the second sub-speech residual signal.
在一个实施例中,提供了一种语音信号处理装置,上述补偿配置包括补偿参数,补偿参数包括各子语音残留信号被划分的小帧数量N1,以及每小帧中非零脉冲的数量N2;N1为正整数、N2为非负整数;则上述获取模块11包括:In one embodiment, a voice signal processing device is provided, the above compensation configuration includes compensation parameters, and the compensation parameters include the number of small frames into which each sub-speech residual signal is divided, N1, and the number of non-zero pulses in each small frame, N2; N1 is a positive integer and N2 is a non-negative integer; then the above-mentioned obtaining module 11 includes:
获取单元,用于获取各子语音残留信号对应的补偿增益、位置序列和符号序列;补偿增益的长度为N1,位置序列和符号序列的长度均为N2;The acquiring unit is used to acquire the compensation gain, position sequence and symbol sequence corresponding to each sub-speech residual signal; the length of the compensation gain is N1, and the length of the position sequence and symbol sequence are both N2;
构建单元,用于根据各子语音残留信号对应的N1个位置序列和符号序列,构建各子语音残留信号的补偿序列;The construction unit is used to construct the compensation sequence of each sub-speech residual signal according to the N1 position sequences and symbol sequences corresponding to each sub-speech residual signal;
补偿信息确定单元,用于根据各子语音残留信号的补偿序列和各子语音残留信号的补偿增益,确定各子语音残留信号的补偿信息。The compensation information determining unit is used to determine the compensation information of each sub-speech residual signal according to the compensation sequence of each sub-speech residual signal and the compensation gain of each sub-speech residual signal.
在一个实施例中,提供了一种语音信号处理装置,补偿配置还包括码率配置;码率配置根据预设的丢包率确定;该装置还包括:空间确定模块,用于根据码率配置确定各子语音残留信号的补偿信息的空间大小;空间大 小用于指示在发送码流时,存储补偿信息的空间容量。In one embodiment, a voice signal processing device is provided, and the compensation configuration further includes a bit rate configuration; the bit rate configuration is determined according to a preset packet loss rate; the device further includes: a space determining module, configured to configure according to the bit rate Determine the space size of the compensation information of each sub-voice residual signal; the space size is used to indicate the space capacity for storing the compensation information when the code stream is sent.
在一个实施例中,如图10所示,提供了一种语音信号处理装置,该装置包括:In one embodiment, as shown in FIG. 10, a voice signal processing device is provided, and the device includes:
接收模块13,用于接收编码器发送的码流;码流包括多个子语音残留信号和对应的补偿信息;各子语音残留信号为从语音残留信号中分流得到的;补偿信息是基于预设的补偿配置确定的;The receiving module 13 is used to receive the code stream sent by the encoder; the code stream includes multiple sub-voice residual signals and corresponding compensation information; each sub-voice residual signal is obtained by splitting from the voice residual signal; the compensation information is based on a preset The compensation configuration is determined;
解码模块14,用于根据码流中的各子语音残留信号和对应的补偿信息进行解码。The decoding module 14 is used for decoding according to each sub-speech residual signal in the bitstream and the corresponding compensation information.
在一个实施例中,提供了一种语音信号处理装置,若多个子语音残留信号包括第一子语音残留信号和第二子语音残留信号;且接收到码流为第一子语音残留信号和对应的补偿信息,和,第二子语音残留信号和对应的补偿信息;则上述解码模块14包括:In one embodiment, a voice signal processing device is provided. If the plurality of sub-voice residual signals include a first sub-voice residual signal and a second sub-voice residual signal; and the received code stream is the first sub-voice residual signal and the corresponding The compensation information of, and, the second sub-speech residual signal and the corresponding compensation information; then the above-mentioned decoding module 14 includes:
第一恢复单元,用于根据第一子语音残留信号恢复对应的偶语音残留信号,以及根据第二子语音残留信号恢复对应的奇语音残留信号;The first restoration unit is configured to restore the corresponding even voice residual signal according to the first sub-voice residual signal, and restore the corresponding odd voice residual signal according to the second sub-voice residual signal;
第一确定语音信号单元,用于对偶语音残留信号和奇语音残留信号进行交织插值,确定语音残留信号。The first voice signal determining unit is used to perform interleaving and interpolation on the even voice residual signal and the odd voice residual signal to determine the voice residual signal.
在一个实施例中,提供了一种语音信号处理装置,若多个子语音残留信号包括第一子语音残留信号和第二子语音残留信号;且接收到码流为第一子语音残留信号和对应的补偿信息,或者,第二子语音残留信号和对应的补偿信息;上述解码模块14包括:In one embodiment, a voice signal processing device is provided. If the plurality of sub-voice residual signals include a first sub-voice residual signal and a second sub-voice residual signal; and the received code stream is the first sub-voice residual signal and the corresponding Or, the second sub-speech residual signal and the corresponding compensation information; the decoding module 14 includes:
第二恢复单元,用于根据第一子语音残留信号恢复对应的偶语音残留信号,或者,根据第二子语音残留信号恢复对应的奇语音残留信号;The second restoration unit is configured to restore the corresponding even voice residual signal according to the first sub-voice residual signal, or restore the corresponding odd voice residual signal according to the second sub-voice residual signal;
第三恢复单元,用于根据偶语音残留信号恢复相似语音残留信号,或者,根据奇语音残留信号恢复相似语音残留信号;The third restoration unit is configured to restore similar voice residual signals based on even voice residual signals, or restore similar voice residual signals based on odd voice residual signals;
目标相似信号单元,用于基于第一子语音残留信号对应的补偿信息和相似语音残留信号,确定目标相似语音残留信号,或者,根据第二子语音残留信号对应的补偿信息和相似语音残留信号,确定目标相似语音残留信号。The target similar signal unit is used to determine the target similar voice residual signal based on the compensation information and the similar voice residual signal corresponding to the first sub-voice residual signal, or, according to the compensation information and the similar voice residual signal corresponding to the second sub-voice residual signal, Determine the target similar voice residual signal.
第二确定语音信号单元,用于根据目标相似语音残留信号,确定语音 残留信号。The second voice signal determining unit is used to determine the voice residual signal according to the target similar voice residual signal.
上述实施例提供的所有语音信号处理装置,其实现原理和技术效果与上述语音信号处理方法实施例类似,在此不再赘述。The implementation principles and technical effects of all the voice signal processing devices provided in the foregoing embodiments are similar to those of the foregoing voice signal processing method embodiments, and will not be repeated here.
关于语音信号处理装置的具体限定可以参见上文中对于语音信号处理方法的限定,在此不再赘述。上述语音信号处理装置中的各个模块可全部或部分通过软件、硬件及其组合来实现。上述各模块可以硬件形式内嵌于或独立于计算机设备中的处理器中,也可以以软件形式存储于计算机设备中的存储器中,以便于处理器调用执行以上各个模块对应的操作。For the specific limitation of the voice signal processing device, please refer to the above limitation on the voice signal processing method, which will not be repeated here. Each module in the above-mentioned speech signal processing device can be implemented in whole or in part by software, hardware, and a combination thereof. The above-mentioned modules may be embedded in the form of hardware or independent of the processor in the computer equipment, or may be stored in the memory of the computer equipment in the form of software, so that the processor can call and execute the operations corresponding to the above-mentioned modules.
在一个实施例中,提供了一种计算机设备,该计算机设备可以是终端,其内部结构图可以如图11所示。该计算机设备包括通过系统总线连接的处理器、存储器、网络接口、显示屏和输入装置。其中,该计算机设备的处理器用于提供计算和控制能力。该计算机设备的存储器包括非易失性存储介质、内存储器。该非易失性存储介质存储有操作系统和计算机程序。该内存储器为非易失性存储介质中的操作系统和计算机程序的运行提供环境。该计算机设备的网络接口用于与外部的终端通过网络连接通信。该计算机程序被处理器执行时以实现一种语音信号处理方法。该计算机设备的显示屏可以是液晶显示屏或者电子墨水显示屏,该计算机设备的输入装置可以是显示屏上覆盖的触摸层,也可以是计算机设备外壳上设置的按键、轨迹球或触控板,还可以是外接的键盘、触控板或鼠标等。In one embodiment, a computer device is provided. The computer device may be a terminal, and its internal structure diagram may be as shown in FIG. 11. The computer equipment includes a processor, a memory, a network interface, a display screen and an input device connected through a system bus. Among them, the processor of the computer device is used to provide calculation and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage medium. The network interface of the computer device is used to communicate with an external terminal through a network connection. The computer program is executed by the processor to realize a voice signal processing method. The display screen of the computer device can be a liquid crystal display or an electronic ink display screen, and the input device of the computer device can be a touch layer covered on the display screen, or it can be a button, trackball or touchpad set on the computer device shell , It can also be an external keyboard, touchpad, or mouse.
本领域技术人员可以理解,图11中示出的结构,仅仅是与本申请方案相关的部分结构的框图,并不构成对本申请方案所应用于其上的计算机设备的限定,具体的计算机设备可以包括比图中所示更多或更少的部件,或者组合某些部件,或者具有不同的部件布置。Those skilled in the art can understand that the structure shown in FIG. 11 is only a block diagram of a part of the structure related to the solution of the present application, and does not constitute a limitation on the computer device to which the solution of the present application is applied. The specific computer device may Including more or fewer parts than shown in the figure, or combining some parts, or having a different arrangement of parts.
In one embodiment, a computer device is provided, including a memory and a processor. The memory stores a computer program, and the processor implements the following steps when executing the computer program:
obtaining a voice residual signal and splitting the voice residual signal to obtain a plurality of sub-voice residual signals, where the voice residual signal is an uncorrelated or weakly correlated signal obtained after processing the original voice signal;
obtaining compensation information of each sub-voice residual signal based on a preset compensation configuration;
sending, to the decoder, a code stream including each sub-voice residual signal and the corresponding compensation information, where the code stream is used to instruct the decoder to decode according to each sub-voice residual signal and the corresponding compensation information.
Alternatively, the processor implements the following steps when executing the computer program:
receiving a code stream sent by the encoder, where the code stream includes a plurality of sub-voice residual signals and corresponding compensation information, each sub-voice residual signal is obtained by splitting the voice residual signal, and the compensation information is determined based on a preset compensation configuration;
decoding according to each sub-voice residual signal in the code stream and the corresponding compensation information.
The implementation principle and technical effects of the computer device provided in the foregoing embodiment are similar to those of the foregoing method embodiments and are not repeated here.
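As a rough, non-authoritative illustration of the encoder-side steps above, the following sketch splits a quantized residual into two sub-signals and attaches compensation information to each. The odd/even parity convention, the sample values N1 = 4 and N2 = 5, the per-frame gain taken as the mean magnitude of the selected pulses, and the dictionary standing in for the code stream are all assumptions made for the example.

```python
import numpy as np

def encode_residual(quantized_residual, n1=4, n2=5):
    """Split a quantized voice residual signal and attach per-sub-signal compensation info."""
    quantized_residual = np.asarray(quantized_residual, dtype=float)
    sub1 = quantized_residual[0::2]   # "odd" quantization sequence (1st, 3rd, ... samples)
    sub2 = quantized_residual[1::2]   # "even" quantization sequence (2nd, 4th, ... samples)

    def compensation(sub):
        gains, positions, signs = [], [], []
        for frame in np.array_split(sub, n1):         # N1 small frames
            idx = np.argsort(np.abs(frame))[-n2:]     # positions of the N2 strongest pulses
            gains.append(float(np.mean(np.abs(frame[idx]))))
            positions.append(idx.tolist())
            signs.append(np.sign(frame[idx]).tolist())
        return {"gain": gains, "pos": positions, "sign": signs}

    # Here each sub-signal carries compensation info describing its missing counterpart,
    # so a decoder that receives only one packet can still approximate the other half.
    return {"sub1": sub1.tolist(), "comp1": compensation(sub2),
            "sub2": sub2.tolist(), "comp2": compensation(sub1)}
```

In a real encoder each field would additionally be quantized and entropy-coded before being packed into the transmitted code stream.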
In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored. When the computer program is executed by a processor, the following steps are implemented:
obtaining a voice residual signal and splitting the voice residual signal to obtain a plurality of sub-voice residual signals, where the voice residual signal is an uncorrelated or weakly correlated signal obtained after processing the original voice signal;
obtaining compensation information of each sub-voice residual signal based on a preset compensation configuration;
sending, to the decoder, a code stream including each sub-voice residual signal and the corresponding compensation information, where the code stream is used to instruct the decoder to decode according to each sub-voice residual signal and the corresponding compensation information.
Alternatively, when the computer program is executed by the processor, the following steps are implemented:
receiving a code stream sent by the encoder, where the code stream includes a plurality of sub-voice residual signals and corresponding compensation information, each sub-voice residual signal is obtained by splitting the voice residual signal, and the compensation information is determined based on a preset compensation configuration;
decoding according to each sub-voice residual signal in the code stream and the corresponding compensation information.
The implementation principle and technical effects of the computer-readable storage medium provided in the foregoing embodiment are similar to those of the foregoing method embodiments and are not repeated here.
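For the decoding side, a correspondingly small sketch of the lossless case (both sub-streams received) is given below; it simply re-interleaves the two sub-voice residual signals, mirroring the split assumed in the encoder sketch above. If one sub-stream were lost, the decoder would instead fall back to a concealment path such as the apply_compensation sketch shown earlier. The code-stream field names are assumptions carried over from that encoder sketch.

```python
import numpy as np

def decode_residual(code_stream):
    """Re-interleave both received sub-voice residual signals into the voice residual signal."""
    sub1 = np.asarray(code_stream["sub1"], dtype=float)
    sub2 = np.asarray(code_stream["sub2"], dtype=float)
    residual = np.empty(sub1.size + sub2.size)
    residual[0::2] = sub1   # samples carried by the first sub-stream
    residual[1::2] = sub2   # samples carried by the second sub-stream
    return residual
```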
A person of ordinary skill in the art can understand that all or part of the processes in the methods of the above embodiments may be implemented by a computer program instructing the relevant hardware. The computer program may be stored in a non-volatile computer-readable storage medium, and when executed, may include the processes of the above method embodiments. Any reference to memory, storage, a database, or other media used in the embodiments provided in this application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of the technical features in the above embodiments are described; however, as long as there is no contradiction in a combination of these technical features, it shall be considered within the scope of this specification.
The above embodiments express only several implementations of the present application, and their descriptions are relatively specific and detailed, but they should not therefore be construed as limiting the scope of the patent. It should be noted that those of ordinary skill in the art can make several modifications and improvements without departing from the concept of the present application, and these all fall within the protection scope of the present application. Therefore, the protection scope of this patent shall be subject to the appended claims.

Claims (12)

  1. A voice signal processing method, characterized in that the method comprises:
    acquiring a voice residual signal, and splitting the voice residual signal to obtain a plurality of sub-voice residual signals, wherein the voice residual signal is an uncorrelated signal or a weakly correlated signal obtained after processing an original voice signal;
    acquiring compensation information of each of the sub-voice residual signals based on a preset compensation configuration;
    sending, to a decoder, a code stream comprising each of the sub-voice residual signals and the corresponding compensation information, wherein the code stream is used to instruct the decoder to decode according to each of the sub-voice residual signals and the corresponding compensation information.
  2. The voice signal processing method according to claim 1, wherein, if the plurality of sub-voice residual signals comprise a first sub-voice residual signal and a second sub-voice residual signal,
    the splitting the voice residual signal to obtain a plurality of sub-voice residual signals comprises:
    quantizing the voice residual signal to obtain a quantization sequence corresponding to the voice residual signal;
    performing odd-even splitting on the quantization sequence to obtain an odd quantization sequence and an even quantization sequence;
    determining the odd quantization sequence as the first sub-voice residual signal, and determining the even quantization sequence as the second sub-voice residual signal.
  3. The voice signal processing method according to claim 2, wherein the compensation configuration comprises compensation parameters, the compensation parameters comprising a number N1 of small frames into which each of the sub-voice residual signals is divided and a number N2 of non-zero pulses in each small frame, where N1 is a positive integer and N2 is a non-negative integer;
    the acquiring compensation information of each of the sub-voice residual signals based on the preset compensation configuration comprises:
    acquiring a compensation gain, a position sequence, and a sign sequence corresponding to each of the sub-voice residual signals, wherein the length of the compensation gain is N1, and the lengths of the position sequence and the sign sequence are both N2;
    constructing a compensation sequence of each of the sub-voice residual signals according to the N1 position sequences and sign sequences corresponding to each of the sub-voice residual signals;
    determining the compensation information of each of the sub-voice residual signals according to the compensation sequence of each of the sub-voice residual signals and the compensation gain of each of the sub-voice residual signals.
  4. The voice signal processing method according to claim 3, wherein the compensation configuration further comprises a code rate configuration, the code rate configuration being determined according to a preset packet loss rate;
    the method further comprises:
    determining, according to the code rate configuration, a space size of the compensation information of each of the sub-voice residual signals, wherein the space size is used to indicate the space capacity for storing the compensation information when the code stream is sent.
  5. A voice signal processing method, characterized in that the method comprises:
    receiving a code stream sent by an encoder, wherein the code stream comprises a plurality of sub-voice residual signals and corresponding compensation information, each of the sub-voice residual signals is obtained by splitting a voice residual signal, and the compensation information is determined based on a preset compensation configuration;
    decoding according to each of the sub-voice residual signals in the code stream and the corresponding compensation information.
  6. The voice signal processing method according to claim 5, wherein, if the plurality of sub-voice residual signals comprise a first sub-voice residual signal and a second sub-voice residual signal, and the received code stream contains the first sub-voice residual signal and the corresponding compensation information as well as the second sub-voice residual signal and the corresponding compensation information,
    the decoding according to each of the sub-voice residual signals in the code stream and the corresponding compensation information comprises:
    recovering a corresponding even voice residual signal according to the first sub-voice residual signal, and recovering a corresponding odd voice residual signal according to the second sub-voice residual signal;
    performing interleaving interpolation on the even voice residual signal and the odd voice residual signal to determine the voice residual signal.
  7. The voice signal processing method according to claim 5, wherein, if the plurality of sub-voice residual signals comprise a first sub-voice residual signal and a second sub-voice residual signal, and the received code stream contains the first sub-voice residual signal and the corresponding compensation information, or the second sub-voice residual signal and the corresponding compensation information,
    the decoding according to each of the sub-voice residual signals in the code stream and the corresponding compensation information comprises:
    recovering a corresponding even voice residual signal according to the first sub-voice residual signal, or recovering a corresponding odd voice residual signal according to the second sub-voice residual signal;
    recovering a similar voice residual signal according to the even voice residual signal, or recovering the similar voice residual signal according to the odd voice residual signal;
    determining a target similar voice residual signal based on the compensation information corresponding to the first sub-voice residual signal and the similar voice residual signal, or determining the target similar voice residual signal according to the compensation information corresponding to the second sub-voice residual signal and the similar voice residual signal;
    determining the voice residual signal according to the target similar voice residual signal.
  8. A voice signal processing system, characterized in that the system comprises an encoder and a decoder;
    the encoder is configured to implement the steps of the voice signal processing method according to any one of claims 1 to 4;
    the decoder is configured to implement the steps of the voice signal processing method according to any one of claims 5 to 7.
  9. A voice signal processing device, characterized in that the device comprises:
    a splitting module, configured to acquire a voice residual signal and split the voice residual signal to obtain a plurality of sub-voice residual signals, wherein the voice residual signal is an uncorrelated signal or a weakly correlated signal obtained after processing an original voice signal;
    an acquiring module, configured to acquire compensation information of each of the sub-voice residual signals based on a preset compensation configuration;
    a sending module, configured to send, to a decoder, a code stream comprising each of the sub-voice residual signals and the corresponding compensation information, wherein the code stream is used to instruct the decoder to decode according to each of the sub-voice residual signals and the corresponding compensation information.
  10. A voice signal processing device, characterized in that the device comprises:
    a receiving module, configured to receive a code stream sent by an encoder, wherein the code stream comprises a plurality of sub-voice residual signals and corresponding compensation information, each of the sub-voice residual signals is obtained by splitting a voice residual signal, and the compensation information is determined based on a preset compensation configuration;
    a decoding module, configured to decode according to each of the sub-voice residual signals in the code stream and the corresponding compensation information.
  11. A computer device, comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the voice signal processing method according to any one of claims 1 to 7 when executing the computer program.
  12. A computer-readable storage medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the steps of the voice signal processing method according to any one of claims 1 to 7.
PCT/CN2020/113219 2019-12-31 2020-09-03 Voice signal processing method, system and apparatus, computer device, and storage medium WO2021135340A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201911422259.4 2019-12-31
CN201911422259.4A CN111063361B (en) 2019-12-31 2019-12-31 Voice signal processing method, system, device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
WO2021135340A1 true WO2021135340A1 (en) 2021-07-08

Family

ID=70306113

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/113219 WO2021135340A1 (en) 2019-12-31 2020-09-03 Voice signal processing method, system and apparatus, computer device, and storage medium

Country Status (2)

Country Link
CN (1) CN111063361B (en)
WO (1) WO2021135340A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111063361B (en) * 2019-12-31 2023-02-21 广州方硅信息技术有限公司 Voice signal processing method, system, device, computer equipment and storage medium
CN111554322A (en) * 2020-05-15 2020-08-18 腾讯科技(深圳)有限公司 Voice processing method, device, equipment and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030158728A1 (en) * 2002-02-19 2003-08-21 Ning Bi Speech converter utilizing preprogrammed voice profiles
CN101115051A (en) * 2006-07-25 2008-01-30 华为技术有限公司 Audio signal processing method, system and audio signal transmitting/receiving device
CN101777960A (en) * 2008-11-17 2010-07-14 华为终端有限公司 Audio encoding method, audio decoding method, related device and communication system
CN108109629A (en) * 2016-11-18 2018-06-01 南京大学 A kind of more description voice decoding methods and system based on linear predictive residual classification quantitative
CN111063361A (en) * 2019-12-31 2020-04-24 广州华多网络科技有限公司 Voice signal processing method, system, device, computer equipment and storage medium

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE60100131T2 (en) * 2000-09-14 2003-12-04 Lucent Technologies Inc Method and device for diversity operation control in voice transmission
CN101325058B (en) * 2007-06-15 2012-04-25 华为技术有限公司 Method and apparatus for coding-transmitting and receiving-decoding speech
CN101630509B (en) * 2008-07-14 2012-04-18 华为技术有限公司 Method, device and system for coding and decoding
TWI390503B (en) * 2009-11-19 2013-03-21 Gemtek Technolog Co Ltd Dual channel voice transmission system, broadcast scheduling design module, packet coding and missing sound quality damage estimation algorithm
CN108231083A (en) * 2018-01-16 2018-06-29 重庆邮电大学 A kind of speech coder code efficiency based on SILK improves method
CN109616129B (en) * 2018-11-13 2021-07-30 南京南大电子智慧型服务机器人研究院有限公司 Mixed multi-description sinusoidal coder method for improving voice frame loss compensation performance

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030158728A1 (en) * 2002-02-19 2003-08-21 Ning Bi Speech converter utilizing preprogrammed voice profiles
CN101115051A (en) * 2006-07-25 2008-01-30 华为技术有限公司 Audio signal processing method, system and audio signal transmitting/receiving device
CN101777960A (en) * 2008-11-17 2010-07-14 华为终端有限公司 Audio encoding method, audio decoding method, related device and communication system
CN108109629A (en) * 2016-11-18 2018-06-01 南京大学 A kind of more description voice decoding methods and system based on linear predictive residual classification quantitative
CN111063361A (en) * 2019-12-31 2020-04-24 广州华多网络科技有限公司 Voice signal processing method, system, device, computer equipment and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
XU MINGLIANG: "Research on Robust Audio Coding and Transmission Algorithms Based on Multiple Description Coding", CHINESE MASTER'S THESES FULL-TEXT DATABASE, TIANJIN POLYTECHNIC UNIVERSITY, CN, 15 December 2011 (2011-12-15), CN, XP055828298, ISSN: 1674-0246 *

Also Published As

Publication number Publication date
CN111063361B (en) 2023-02-21
CN111063361A (en) 2020-04-24

Similar Documents

Publication Publication Date Title
JP4660545B2 (en) Method, apparatus and system for enhancing predictive video codec robustness utilizing side channels based on distributed source coding techniques
US10224040B2 (en) Packet loss concealment apparatus and method, and audio processing system
WO2018077083A1 (en) Audio frame loss recovery method and apparatus
EP2805325B1 (en) Devices, methods and computer-program product for redundant frame coding and decoding
CN100426715C (en) Lost frame hiding method and device
US8428959B2 (en) Audio packet loss concealment by transform interpolation
WO2021135340A1 (en) Voice signal processing method, system and apparatus, computer device, and storage medium
US11031020B2 (en) Speech/audio bitstream decoding method and apparatus
ES2966665T3 (en) Audio coding device and method
US9325544B2 (en) Packet-loss concealment for a degraded frame using replacement data from a non-degraded frame
US10121484B2 (en) Method and apparatus for decoding speech/audio bitstream
WO2008067763A1 (en) A decoding method and device
US10652120B2 (en) Voice quality monitoring system
CN110770822B (en) Audio signal encoding and decoding
US20050169387A1 (en) Method and system for the error resilient transmission of predictively encoded signals
KR20200038297A (en) Method and device for signal reconstruction in stereo signal encoding
KR102654181B1 (en) Method and apparatus for low-cost error recovery in predictive coding
WO2023049628A1 (en) Efficient packet-loss protected data encoding and/or decoding
Alhussain REAL TIME VOICE COMMUNICATION
Florêncio Error-Resilient Coding and Error Concealment Strategies for Audio Communication

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20910931

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20910931

Country of ref document: EP

Kind code of ref document: A1