WO2019000178A1 - Frame loss compensation method and device - Google Patents

Frame loss compensation method and device (Procédé et dispositif de compensation de perte de trame)

Info

Publication number
WO2019000178A1
WO2019000178A1 (PCT/CN2017/090035)
Authority
WO
WIPO (PCT)
Prior art keywords
frame
information
future
historical
current
Prior art date
Application number
PCT/CN2017/090035
Other languages
English (en)
Chinese (zh)
Inventor
高振东
肖建良
刘泽新
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Priority to PCT/CN2017/090035 (WO2019000178A1)
Priority to CN201780046044.XA (CN109496333A)
Publication of WO2019000178A1

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00: Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L1/00: Arrangements for detecting or preventing errors in the information received

Definitions

  • the present application relates to the field of voice processing technologies, and in particular, to a frame loss compensation method and device.
  • PS: packet switching
  • VoIP: voice over Internet Protocol
  • the vocoder has a Packet Loss Concealment (PLC) function and can estimate the code stream information of the currently lost frame from the information of the good frames (history frames) before the lost frame.
  • the code stream information of the lost frame includes formant spectrum information, pitch frequency, fractional pitch, adaptive codebook gain, fixed codebook gain, or energy.
  • the present application provides a frame loss compensation method and device, which can improve the accuracy of frame loss compensation.
  • the embodiment of the present application provides a frame loss compensation method, including:
  • Receiving a voice code stream sequence, and acquiring historical frame information and future frame information in the sequence, where the sequence includes frame information of multiple voice frames; the multiple voice frames include at least one history frame, at least one current frame, and at least one future frame; the at least one history frame precedes the at least one current frame in the time domain, and the at least one future frame follows the at least one current frame; the historical frame information is the frame information of the at least one history frame, and the future frame information is the frame information of the at least one future frame. The frame information of the at least one current frame is then estimated from the historical frame information and the future frame information, thereby improving the accuracy of frame loss compensation.
  • the type or state of the voice frames in the voice code stream sequence may be determined, including: whether there is a good frame before the at least one current frame, whether the previous good frame is a silence frame, whether there is a valid future frame, and so on. For different types or states of the speech frames in the sequence, different compensation measures are taken for the current frame, so that the recovered signal is closer to the original signal and a better frame loss compensation effect is achieved.
  • the voice code stream sequence can be stored in a buffer, such as an adaptive jitter buffer (AJB). The frame information of the sequence in the buffer is then decoded to obtain the decoded history frame information, and the undecoded future frame information is obtained from the buffer.
  • the historical frame information includes formant spectrum information of the historical frame
  • the future frame information includes formant spectrum information of the future frame.
  • the formant spectrum information of the at least one current frame may be determined according to the formant spectrum information of the historical frame and the formant spectrum information of the future frame.
  • the formant spectrum information is the excitation response of the vocal tract at the time of utterance.
  • Optionally, the frame state in the voice code stream sequence may be judged before the formant spectrum information of the at least one current frame is determined from the formant spectrum information of the history frame and that of the future frame.
  • The judgment includes: how many frames are lost, whether there are future good frames, whether there are good frames before the current frame, and so on. Then, according to the frame state in the voice code stream sequence, different methods are used to calculate the formant spectrum information of the lost frame.
  • the historical frame information includes the pitch value of the historical frame
  • the future frame information includes the pitch value of the future frame.
  • the pitch value of at least one current frame may be determined based on the pitch value of the historical frame and the pitch value of the future frame.
  • the pitch value is the pitch frequency of the vocal cord vibration at the time of sounding;
  • the pitch period is the reciprocal of the pitch frequency.
  • Optionally, the frame state of the voice code stream sequence may be determined, including: how many frames are lost, whether there are future good frames, whether there are good frames before the current frame, and so on; then, according to the frame state of the sequence, different methods are used to calculate the pitch value of the lost frame.
  • the size of the spectral tilt of the at least one current frame is determined according to the size of the time domain signal obtained by decoding the history frame, and the frame type of the at least one current frame is determined according to the size of its spectral tilt.
  • the time domain signal is the time domain representation of the decoded frame information.
  • a pitch change state of a plurality of subframes in at least one current frame may be acquired; and a frame type of at least one current frame is determined according to a pitch change state of the plurality of subframes.
  • voiced sound is mainly produced by vocal cord vibration and therefore has a pitch; since the vocal cord vibration changes relatively slowly, the pitch also changes relatively slowly.
  • each sub-frame has a pitch, so the pitch is used to determine the frame type of at least one current frame.
  • the frame type of at least one current frame is determined, and at least one of an adaptive codebook gain and a fixed codebook gain of the at least one current frame is determined according to the frame type.
  • the current frame includes a pitched speech component and a noise component; the adaptive codebook gain is the energy gain of the pitch portion, and the fixed codebook gain is the energy gain of the noise portion.
  • if the frame type is voiced, the adaptive codebook gain of the at least one current frame is determined according to the adaptive codebook gain and pitch period of a history frame and the energy gain of the at least one current frame, and the median of the fixed codebook gains of several history frames is taken as the fixed codebook gain of the at least one current frame.
  • the energy gain of the at least one current frame is determined according to a time domain signal size in the decoded historical frame information and a length of each subframe in the historical frame.
  • the energy gain of the current frame includes the energy gain of the current frame in the voiced sound or the energy gain of the current frame in the unvoiced sound.
  • the embodiment of the present application provides a frame loss compensation apparatus, where the apparatus is configured to implement the method and functions performed by the user equipment in the foregoing first aspect; it is implemented by hardware/software, and the hardware/software includes units corresponding to the foregoing functions.
  • the present application provides a frame loss compensation device, comprising a vocoder, a memory, and a communication bus, the memory being coupled to the vocoder via the communication bus; the communication bus implements connection and communication between the vocoder and the memory, and the vocoder executes the program stored in the memory to implement the steps of the frame loss compensation method provided by the first aspect.
  • Yet another aspect of the present application provides a computer readable storage medium having stored therein instructions that, when executed on a computer, cause the computer to perform the methods described in the above aspects.
  • Yet another aspect of the present application provides a computer program product comprising instructions which, when run on a computer, cause the computer to perform the methods described in the various aspects above.
  • FIG. 1 is a schematic structural diagram of a frame loss compensation system according to an embodiment of the present application.
  • FIG. 2 is a schematic diagram of a voice code stream sequence provided by an embodiment of the present application.
  • FIG. 3 is a schematic flowchart of a frame loss compensation method according to an embodiment of the present application.
  • FIG. 4 is a schematic flowchart of another frame loss compensation method provided by an embodiment of the present application.
  • FIG. 5 is a schematic structural diagram of a frame loss compensation apparatus according to an embodiment of the present disclosure.
  • FIG. 6 is a schematic structural diagram of a frame loss compensation apparatus according to an embodiment of the present application.
  • FIG. 1 is a schematic structural diagram of a frame loss compensation system according to an embodiment of the present application.
  • the system can be applied to PS voice call scenarios (including but not limited to VoLTE, VoWiFi, and VoIP), and can also be applied to Circuit Switched (CS) calls to which buffering has been added.
  • the system includes a base station and a receiving device. The receiving device may be a device that provides voice and/or data connectivity to the user, i.e., a user device; it may be connected to a computing device such as a laptop or desktop computer, or it may be a standalone device such as a Personal Digital Assistant (PDA).
  • a receiving device may also be referred to as a system, subscriber unit, subscriber station, mobile station, remote station, access point, remote terminal, access terminal, user terminal, user agent, or user device.
  • a base station may be an access point, a NodeB, an evolved NodeB (eNB), or a 5G base station, and refers to a device in an access network that communicates with wireless terminals over one or more sectors via an air interface.
  • IP: Internet Protocol
  • the base station can act as a signal relay between the wireless terminal and the rest of the access network, which may include an Internet Protocol (IP) network.
  • the base station can also coordinate the management of the attributes of the air interface.
  • Both the base station and the user equipment in this embodiment may adopt the frame loss compensation method mentioned in the following embodiments, and include a corresponding frame loss compensation device to implement frame loss compensation for the voice signal.
  • when any device receives a voice code stream sequence from the peer device, it can decode the sequence to obtain the decoded frame information, compensate for lost frames, and perform subsequent decoding.
  • FIG. 2 is a schematic diagram of a voice code stream sequence provided by an embodiment of the present application.
  • the voice code stream sequence consists of common voice signals, that is, voice frames, classified relative to a time T: a history frame is at least one frame before time T, a current frame is at least one frame at time T, and a future frame is at least one frame after time T; the time T is a unit of time or a point in time.
  • for example, the history frames include the N-1, N-2, N-3, and N-4 frames; the current frames are the lost frames, namely the N and N+1 frames; and the future frames include the N+2 frame.
  • the lost frames involved in frame loss compensation in this embodiment may include frames lost in transmission, damaged frames, frames that cannot be received or decoded correctly, or frames that are unavailable for a specific reason.
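For illustration, the FIG. 2 layout can be written out directly; the dictionary layout, the frame numbers, and the None marker for lost frames are assumptions of this sketch, not structures prescribed by the patent:

```python
# Jitter-buffer contents of FIG. 2: frames N-4..N-1 received, frames
# N and N+1 (N = 10 here) lost, frame N+2 already received.
buffer = {6: b'..', 7: b'..', 8: b'..', 9: b'..',   # history (good)
          10: None, 11: None,                        # current (lost)
          12: b'..'}                                 # future (good)

lost    = [n for n, f in buffer.items() if f is None]
history = [n for n, f in buffer.items() if f is not None and n < min(lost)]
future  = [n for n, f in buffer.items() if f is not None and n > max(lost)]
# history == [6, 7, 8, 9], lost == [10, 11], future == [12]
```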
  • FIG. 3 is a schematic flowchart of a frame loss compensation method according to an embodiment of the present application. As shown in FIG. 3, the method in this embodiment of the present application includes:
  • The voice code stream sequence includes frame information of multiple voice frames, where the multiple voice frames include at least one history frame, at least one current frame, and at least one future frame; the at least one history frame precedes the at least one current frame in the time domain, and the at least one future frame follows the at least one current frame in the time domain; the historical frame information is the frame information of the at least one history frame, and the future frame information is the frame information of the at least one future frame.
  • the at least one current frame may be a frame lost for any of various reasons.
  • the voice code stream sequence may be stored in a buffer in memory, such as an AJB buffer.
  • the frame information of the voice code stream sequence in the buffer is then decoded to obtain the decoded history frame information and the undecoded future frame information in the buffer.
  • the decoded history frame is a voice analog signal, while the history frame before decoding is a voice digital signal.
  • the future frame is not decoded, but the frame loss compensation device or system can parse the future frame to obtain partially valid frame information, such as formant spectrum information and pitch values.
  • information or data of a plurality of frames including a history frame, a current frame, and a future frame are buffered in the buffer.
  • Optionally, the type or state of the voice frames in the voice code stream sequence may be determined, including: whether there is a good frame (i.e., a normal frame usable for compensation) before the lost frame, whether the good frame before the lost frame is a silence frame, whether there is a valid future frame, and so on. For different types or states of the speech frames in the sequence, different compensation measures are taken for the current frame in S302, so that the recovered signal is closer to the original signal and a better frame loss compensation effect is achieved.
  • FIG. 4 is a schematic flowchart of another frame loss compensation method according to an embodiment of the present disclosure. The method includes:
  • S401. Determine whether the current frame is a lost frame or a bad frame.
  • S402. If the current frame is a good frame, decode the good frame.
  • S403. If the current frame is a bad frame or a lost frame, determine whether the good frame before the current frame is a Silence Insertion Descriptor (SID), i.e., a silence frame.
  • S404. If the good frame before the current frame is a silence frame, decode the silence frame directly.
  • S405. If the good frame before the current frame is not a silent frame, determine whether there is a valid future frame after the current frame.
  • S406. If there is no valid future frame after the current frame, the current frame is compensated according to the historical frame information.
  • S407. If there is a valid future frame, the current frame loss is compensated according to the historical frame information and the future frame information. The specific implementation manner of this step is described in detail below.
  • S408. After the frame information of the current frame has been compensated, decode the compensated current frame.
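Taken together, S401-S408 form a small decision tree. A minimal sketch follows; the decoder and the compensation routines are passed in as callables because the patent does not fix their interfaces:

```python
def conceal(frame_is_good, prev_good_is_sid, have_future,
            decode, decode_sid, from_history, from_history_and_future):
    """Decision flow of S401-S408 (a sketch, not the patent's code)."""
    if frame_is_good:                  # S401/S402: decode the good frame
        return decode(None)
    if prev_good_is_sid:               # S403/S404: previous good frame is
        return decode_sid()            #   a silence (SID) frame
    if not have_future:                # S405/S406: no valid future frame,
        params = from_history()        #   compensate from history only
    else:                              # S405/S407: use history and future
        params = from_history_and_future()
    return decode(params)              # S408: decode the compensated frame
```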
  • Step S407 is described in detail below; the current lost frame can be compensated by considering both the historical frame information and the future frame information.
  • The specific methods are as follows:
  • the historical frame information includes formant spectrum information of the historical frame
  • the future frame information includes formant spectrum information of the future frame.
  • the formant spectrum information of the at least one current frame may be determined according to the formant spectrum information of the historical frame and the formant spectrum information of the future frame.
  • the formant spectrum information is the excitation response of the vocal tract when the sound is produced and includes Immittance Spectral Frequency (ISF) information; an ISF vector is used to represent the formant spectrum information.
  • For example, suppose the N-2 and N-1 frames in the voice code stream sequence are good frames, the N and N+1 frames are lost, and future N+2 and N+3 frames exist; a first-order polynomial fit is then used to calculate the formant spectrum information of the N frame:
  • ISF_i(N-1) = a + b×(N-1)
  • ISF_i(N+2) = a + b×(N+2)
  • The formant spectrum information of the N-1 frame and the N+2 frame is represented by the formant spectrum information of a plurality of points; the formant spectrum information is processed by a filter, and each point represents a coefficient pair of the filter. ISF_i(N-1) is the formant spectrum information corresponding to the i-th point of the N-1 frame, and ISF_i(N+2) is the formant spectrum information corresponding to the i-th point of the N+2 frame.
  • Solving these two equations for a and b and substituting gives the formant spectrum information of the N frame:
  • ISF_i(N) = a + b×N
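Because the fit uses exactly two points per coefficient, it reduces to per-coefficient linear interpolation. A minimal NumPy sketch; the 16-coefficient vector length and the numeric values are assumptions for illustration:

```python
import numpy as np

def isf_linear_fit(isf_hist, n_hist, isf_fut, n_fut, n_lost):
    """Solve ISF_i(n) = a_i + b_i*n for every coefficient i from the last
    good frame (index n_hist) and the first future good frame (index
    n_fut), then evaluate at the lost frame index n_lost."""
    isf_hist = np.asarray(isf_hist, dtype=float)
    isf_fut = np.asarray(isf_fut, dtype=float)
    b = (isf_fut - isf_hist) / (n_fut - n_hist)   # slope per coefficient
    a = isf_hist - b * n_hist                     # intercept per coefficient
    return a + b * n_lost

# Frames N = 10 and N+1 lost; N-1 and N+2 known.
isf_n = isf_linear_fit(np.full(16, 0.3), 9, np.full(16, 0.5), 12, 10)
```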
  • Optionally, before the calculation, the frame state in the voice code stream sequence may be judged, including: how many frames are lost, whether there are future good frames, whether there are good frames before the current frame, and so on. Different methods are then used to calculate the formant spectrum information of the lost frame according to the frame state:
  • In one case, the first-order polynomial of the prior art is used to fit using the two frames before the lost frame.
  • If the previous frame of the lost frame is a good frame, one or more frames are lost, and there is no future good frame, the previous frame of the lost frame is used to fit together with ISF_mean(i).
  • If three or more frames are lost and future good frames exist, a first-order polynomial is used to fit with ISF_mean(i) and the future good frames.
  • In another case, the first-order polynomial is used to fit the good frame before the lost frame and the future good frame; this situation has been described in detail above.
  • ISF_mean(i) is calculated as a weighted combination of the two quantities below:
  • ISF_mean(i) = β×past_ISF_q(i) + (1-β)×ISF_const_mean(i)
  • where past_ISF_q(i) is the formant spectrum information corresponding to the i-th point of the previous frame of the lost frame, β is a preset constant, and ISF_const_mean(i) is the average of the formant spectrum information over a period of time.
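Spelled out under the weighted-combination form reconstructed above (the value beta = 0.9 is an assumed constant, not a value from the patent):

```python
import numpy as np

def isf_mean(past_isf_q, isf_const_mean, beta=0.9):
    # Weighted combination of the previous frame's ISF vector and the
    # long-term average; beta is the preset constant.
    return beta * np.asarray(past_isf_q, dtype=float) \
        + (1.0 - beta) * np.asarray(isf_const_mean, dtype=float)
```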
  • the historical frame information includes a pitch value of the historical frame
  • the future frame information includes a pitch value of the future frame.
  • the pitch value of the at least one current frame may be determined according to a pitch value of the historical frame and a pitch value of the future frame.
  • the pitch frequency is the frequency of the vocal cord vibration when the sound is produced and is the reciprocal of the pitch period;
  • in the calculations below, the pitch value refers to the pitch period.
  • in the voice code stream sequence, each frame includes four pitch values, one per subframe.
  • Suppose the N-2 and N-1 frames are good frames, the N and N+1 frames are lost, and the N+2 and N+3 frames exist; a second-order polynomial is then fitted from the N-1 and N+2 frames.
  • The pitch values pitch_1(N-1), pitch_2(N-1), pitch_3(N-1), pitch_4(N-1) of the N-1 frame and the pitch values pitch_1(N+2), pitch_2(N+2), pitch_3(N+2), pitch_4(N+2) of the N+2 frame are known, where pitch denotes a pitch value, N denotes a frame number, and the subscript indicates the position of the subframe within the frame (each subframe has one pitch value).
  • The second-order polynomial is:
  • y = a_0 + a_1×x + a_2×x²
  • where a_0, a_1, and a_2 are the coefficients of the fitting curve, whose initial values may be preset according to engineering design experience. Following the principle of the smallest squared deviation (least squares), the coefficients are obtained by solving the normal equations formed from the known points (x_i, y_i), where x_i is the time point of the i-th subframe in the N-1 and N+2 frames and y_i is the corresponding known pitch value.
  • The pitch_1(N-1) time point is defined as 4×(N-1)+1, the pitch_2(N-1) time point as 4×(N-1)+2, and so on, up to the pitch_4(N+2) time point, which is defined as 4×(N+2)+4. Evaluating the fitted polynomial at the time points of the lost frames gives their pitch values.
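The least-squares fit can be carried out with numpy.polyfit; the frame index and pitch values below are illustrative, not from the patent:

```python
import numpy as np

N = 10                                   # index of the first lost frame
# Subframe time points 4*(N-1)+k and 4*(N+2)+k of the two known frames,
# and their eight known subframe pitch values.
x = np.array([4*(N-1) + k for k in (1, 2, 3, 4)]
             + [4*(N+2) + k for k in (1, 2, 3, 4)], dtype=float)
y = np.array([58, 58, 59, 59, 62, 62, 63, 63], dtype=float)

# Fit y = a0 + a1*x + a2*x^2 by least squares; np.polyfit returns the
# highest-degree coefficient first.
a2, a1, a0 = np.polyfit(x, y, deg=2)

x_lost = 4*N + np.arange(1, 5)           # subframe time points of frame N
pitch_lost = a0 + a1*x_lost + a2*x_lost**2
```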
  • Optionally, before the calculation, the frame state of the voice code stream sequence may be determined, including: how many frames are lost, whether there are future good frames, whether there are good frames before the lost frame, and so on; then, according to the frame state, different methods are used to calculate the pitch value of the lost frame:
  • In one case, the second-order polynomial is used to fit the pitch values of the lost frame using the good frames before it.
  • The lost frame includes a pitched speech component and a noise component; if four or more frames are lost, the pitch energy is reduced and only the pitch value of the noise is compensated.
  • If the three frames before the lost frame are good frames, one to three frames are lost, and future good frames exist, the second-order polynomial is used to fit the pitch values of the lost frame using both the good frames before it and the future good frames; this situation has been introduced above.
  • S303. Determine the frame type of the current frame.
  • the frame type includes unvoiced and voiced.
  • the vocal characteristics of voiced and unvoiced voices are quite different.
  • the frame type of the current frame is different, and the frame loss compensation strategy used is different.
  • the difference between voiced and unvoiced is that the voiced signal has significant periodicity due to vocal cord vibration.
  • Periodicity detection can employ algorithms such as the zero-crossing rate, correlation, spectral tilt, or pitch change rate. Among them, zero-crossing rate and correlation calculations are widely used in the prior art and are not described here. The following describes determining the frame state of a speech signal by the spectral tilt and by the pitch change rate.
  • The size of the spectral tilt of the at least one current frame may be determined according to the size of the time domain signal obtained by decoding the history frame, and the frame type of the at least one current frame is determined according to the size of its spectral tilt. The pitch frequency of a voiced speech signal is below 500 Hz, so a periodic signal can be identified from the spectral tilt, which can be estimated, for example, as the normalized first-lag autocorrelation of the signal:
  • tilt = Σ_i s(i)×s(i-1) / Σ_i s(i)²
  • where tilt is the magnitude of the spectral tilt of the current frame, s is the analog time domain signal obtained by decoding the history frame, and i is the time point of the time domain signal in the time direction.
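A minimal sketch of this classification; the tilt estimate is the first-lag autocorrelation given above, and the 0.5 decision threshold is an assumption for illustration:

```python
import numpy as np

def spectral_tilt(s):
    """Normalized first-lag autocorrelation of the decoded time-domain
    signal s; values near 1 suggest a voiced-like (low-pass) spectrum."""
    s = np.asarray(s, dtype=float)
    return float(np.dot(s[1:], s[:-1]) / (np.dot(s, s) + 1e-12))

def looks_voiced(s, threshold=0.5):
    # Classify the frame type from the tilt of the last decoded signal.
    return spectral_tilt(s) > threshold
```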
  • a pitch change state of the plurality of subframes in the at least one current frame may be acquired; and a frame type of the at least one current frame is determined according to a pitch change state of the multiple subframes.
  • voiced sound is mainly produced by vocal cord vibration and therefore has a pitch; since the vocal cord vibration changes relatively slowly, the pitch also changes relatively slowly.
  • each subframe has a pitch, so the pitch is used to determine the frame type of the current lost frame.
  • where pitch_change is the pitch change state of the four subframes in a frame, and pitch(i) is the pitch value of the i-th subframe.
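The patent's exact pitch-change formula is not reproduced in this extract; the sketch below uses one plausible measure, the mean relative change between adjacent subframe pitch values, with an assumed 5% threshold:

```python
import numpy as np

def pitch_change(pitch):
    """Mean relative change of the four subframe pitch values in a frame."""
    p = np.asarray(pitch, dtype=float)
    return float(np.mean(np.abs(np.diff(p)) / p[:-1]))

# Slowly varying pitch suggests voiced speech.
is_voiced = pitch_change([60, 60, 61, 61]) < 0.05
```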
  • a frame type of the at least one current frame may be determined, and at least one of an adaptive codebook gain and a fixed codebook gain of the at least one current frame is determined according to the frame type.
  • the current frame includes a pitched speech component and a noise component; the adaptive codebook gain is the energy gain of the pitch portion, and the fixed codebook gain is the energy gain of the noise portion.
  • if the frame type is voiced, the adaptive codebook gain of the at least one current frame is determined according to the adaptive codebook gain and pitch period of a history frame and the energy gain of the at least one current frame, and the median of the fixed codebook gains of several history frames is used as the fixed codebook gain of the at least one current frame.
  • a history frame may be the latest history frame before the current frame.
  • if the frame type is unvoiced, the fixed codebook gain of the at least one current frame is determined according to the fixed codebook gain and pitch period of a history frame and the energy gain of the at least one current frame, and the median of the adaptive codebook gains of several history frames is used as the adaptive codebook gain of the at least one current frame.
  • Energy adjustment of the current lost frame is enhanced as follows. If the frame state of the current lost frame is voiced, the adaptive codebook gain of the current lost frame is determined from the adaptive codebook gain g_p(n-1) of the most recent history frame, the pitch period T_c of the most recent history frame, and the energy gain G_voice of the lost frame in voiced sound; the fixed codebook gain of the current lost frame is g_c = median5(g_c(n-1), ..., g_c(n-5)), where median5(g_c(n-1), ..., g_c(n-5)) is the median of the fixed codebook gains of the last five history frames.
  • If the frame state of the current lost frame is unvoiced, the adaptive codebook gain of the current lost frame is g_p = median5(g_p(n-1), ..., g_p(n-5)), where median5(g_p(n-1), ..., g_p(n-5)) is the median of the adaptive codebook gains of the last five history frames; the fixed codebook gain of the current lost frame is determined from the fixed codebook gain g_c(n-1) of the most recent history frame, the pitch period T_c, and the energy gain G_noise of the lost frame in unvoiced sound.
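A sketch of this gain logic; the energy-driven gain formulas are not reproduced in this extract, so energy_driven_gain is a hypothetical callable standing in for them:

```python
import numpy as np

def conceal_gains(gp_hist, gc_hist, voiced, energy_driven_gain):
    """gp_hist/gc_hist: adaptive/fixed codebook gains of the last five
    history frames, most recent first."""
    if voiced:
        g_p = energy_driven_gain(gp_hist[0])   # from g_p(n-1), T_c, G_voice
        g_c = float(np.median(gc_hist[:5]))    # median5 of fixed gains
    else:
        g_p = float(np.median(gp_hist[:5]))    # median5 of adaptive gains
        g_c = energy_driven_gain(gc_hist[0])   # from g_c(n-1), T_c, G_noise
    return g_p, g_c
```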
  • the energy gain of the at least one current frame may be determined according to the size of the time domain signal in the decoded history frame information and the length of each subframe in the history frame.
  • The energy gain of the current frame is the energy gain of the current lost frame in voiced sound or in unvoiced sound. Since the future frame is not decoded at the current time, the energy gain of the lost frame can only be determined from the history frame information. When the previous good frame is voiced, the energy gain G_voice of the lost frame is calculated from the following quantities: S is the time domain signal obtained by decoding the previous good frame of the current frame, whose length is 4×L_subfr; L_subfr denotes the length of one subframe; T_c denotes the pitch period of the previous good frame; and i is the time point of the time domain signal in the time direction. To prevent G_voice from being too large or too small, which would make the energy of the restored frame unpredictable, G_voice is limited to [0, 2].
  • When the previous good frame is unvoiced, the energy gain G_noise of the current frame is calculated analogously from S, the time domain signal obtained by decoding the previous good frame of the current frame; L_subfr, the length of one subframe; and i, the time point of the time domain signal in the time direction.
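As one reading of the voiced case, G_voice can be sketched as a pitch-synchronous energy ratio over the previous good frame's signal; the ratio itself is an assumption (the patent's formula is not reproduced here), while the [0, 2] clamp comes from the text above:

```python
import numpy as np

def voiced_energy_gain(s, t_c):
    """s: time-domain signal of the previous good frame (length 4*L_subfr,
    assumed >= 2*t_c samples); t_c: pitch period of that frame in samples."""
    s = np.asarray(s, dtype=float)
    last, prior = s[-t_c:], s[-2 * t_c:-t_c]   # last two pitch cycles
    g = np.sqrt(np.dot(last, last) / (np.dot(prior, prior) + 1e-12))
    return float(np.clip(g, 0.0, 2.0))         # limit G_voice to [0, 2]
```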
  • In this embodiment, the historical frame information and the future frame information in the voice code stream sequence are obtained first, and the formant spectrum information, pitch value, fixed codebook gain, adaptive codebook gain, energy, and so on of the lost frame are then estimated from them, improving the accuracy of frame loss compensation.
  • FIG. 5 is a schematic structural diagram of a frame loss compensation apparatus according to an embodiment of the present disclosure.
  • the device may be a vocoder and may include, for example, a receiving module 501, an obtaining module 502, and a processing module 503. The modules are described in detail as follows.
  • the receiving module 501 is configured to receive a voice code stream sequence.
  • the obtaining module 502 is configured to acquire historical frame information and future frame information in the voice code stream sequence, where the sequence includes frame information of multiple voice frames; the multiple voice frames include at least one history frame, at least one current frame, and at least one future frame; the at least one history frame precedes the at least one current frame in the time domain, and the at least one future frame follows the at least one current frame; the historical frame information is the frame information of the at least one history frame, and the future frame information is the frame information of the at least one future frame;
  • the processing module 503 is configured to estimate frame information of the at least one current frame according to the historical frame information and the future frame information.
  • if the voice code stream sequence is stored in a buffer, the processing module 503 is specifically configured to: decode the frame information of the multiple voice frames of the sequence in the buffer to obtain the decoded history frame information, and obtain the undecoded future frame information from the buffer.
  • the historical frame information includes formant spectrum information of the at least one historical frame, and the future frame information includes formant spectrum information of the at least one future frame;
  • the processing module 503 is specifically configured to: determine formant spectrum information of the at least one current frame according to formant spectrum information of the historical frame and formant spectrum information of the future frame.
  • the historical frame information includes a pitch value of the at least one historical frame
  • the future frame information includes a pitch value of the at least one future frame
  • the processing module 503 is specifically configured to: determine a pitch value of the at least one current frame according to a pitch value of the at least one historical frame and a pitch value of the at least one future frame.
  • optionally, the historical frame information includes the energy of the at least one history frame, and the future frame information includes the energy of the at least one future frame;
  • the processing module 503 is specifically configured to: determine, according to the energy of the at least one historical frame and the energy of the at least one future frame, the energy of the at least one current frame.
  • the processing module 503 is specifically configured to: determine a frame type of the at least one current frame, where the frame type includes unvoiced or voiced sound;
  • the processing module 503 is further configured to determine a size of a spectral tilt of the at least one current frame
  • the processing module 503 is further configured to acquire a pitch change state of the multiple subframes in the at least one current frame.
  • the processing module 503 is specifically configured to: if the frame type is voiced, determine the adaptive codebook gain of the at least one current frame according to the adaptive codebook gain and pitch period of a history frame and the energy gain of the at least one current frame, and use the median of the fixed codebook gains of several history frames as the fixed codebook gain of the at least one current frame.
  • the processing module 503 is specifically configured to: if the frame type is unvoiced, determine the fixed codebook gain of the at least one current frame according to the fixed codebook gain and pitch period of a history frame and the energy gain of the at least one current frame, and use the median of the adaptive codebook gains of several history frames as the adaptive codebook gain of the at least one current frame.
  • the processing module 503 is further configured to determine the energy gain of the at least one current frame according to the size of the time domain signal in the decoded history frame information and the length of each subframe in the history frame.
  • each module may further perform the methods and functions described in the corresponding method embodiments above.
  • FIG. 6 is a schematic structural diagram of a frame loss compensation device according to the present application.
  • the apparatus can include at least one vocoder 601, such as an Adaptive Multi-Rate Wideband (AMR-WB) vocoder, at least one communication interface 602, at least one memory 603, and at least one communication bus 604.
  • the communication bus 604 is used to implement connection communication between these components.
  • the communication interface 602 of the device in the embodiment of the present application is used for signaling or data communication with other node devices.
  • the memory 603 may be a high speed RAM memory or a non-volatile memory such as at least one disk memory.
  • the memory 603 can optionally also be at least one storage device located remotely from the vocoder 601.
  • a set of program codes is stored in the memory 603, and may be further used to store temporary data such as intermediate operation data of the vocoder 601.
  • the vocoder 601 executes the program code in the memory 603 to implement the method mentioned in the previous embodiment, and can be specifically referred to the description of the previous embodiment. Further, the vocoder 601 can also cooperate with the memory 603 and the communication interface 602 to perform the operations of the receiving device in the above-mentioned application embodiment.
  • the vocoder 601 may specifically include a processor that executes the program code, such as a central processing unit (CPU) or a digital signal processor (DSP) or the like.
  • the communication interface 602 can be used to receive the voice code stream sequence.
  • Alternatively, the memory 603 may store no program code, and the vocoder 601 may include a hardware processor that does not need to execute program code, such as an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), or a hardware accelerator formed by an integrated circuit; in this case, the memory 603 may be used only for storing temporary data such as intermediate operation data of the vocoder 601.
  • the functions of the method may be implemented in whole or in part by software, hardware, firmware or any combination thereof.
  • When implemented in software, the functions may be implemented in whole or in part in the form of a computer program product.
  • the computer program product includes one or more computer instructions.
  • the computer instructions can be stored in a computer readable storage medium or transferred from one computer readable storage medium to another; for example, the computer instructions can be transferred from a website, computer, server, or data center to another website, computer, server, or data center by wired means (e.g., coaxial cable, optical fiber, or digital subscriber line (DSL)) or wireless means (e.g., infrared, radio, or microwave).
  • the computer readable storage medium can be any available medium that can be accessed by a computer, or a data storage device such as a server or data center that integrates one or more available media.
  • the usable medium may be a magnetic medium (e.g., a floppy disk, hard disk, or magnetic tape), an optical medium (e.g., a DVD), or a semiconductor medium (e.g., a solid state drive (SSD)).

Landscapes

  • Engineering & Computer Science (AREA)
  • Signal Processing (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

Disclosed are a frame loss compensation method and device, involving: receiving a voice code stream sequence; acquiring historical frame information and future frame information in the voice code stream sequence, the sequence comprising frame information for multiple voice frames that include at least one history frame, at least one current frame, and at least one future frame, the history frame(s) being located before the current frame(s) in the time domain and the future frame(s) being located after the current frame(s) in the time domain, the historical frame information being the frame information of the history frame(s), and the future frame information being the frame information of the future frame(s); and estimating, on the basis of the historical frame information and the future frame information, the frame information of the current frame(s), so as to increase the accuracy of frame loss compensation.
PCT/CN2017/090035 2017-06-26 2017-06-26 Frame loss compensation method and device WO2019000178A1 (fr)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/CN2017/090035 WO2019000178A1 (fr) 2017-06-26 2017-06-26 Frame loss compensation method and device
CN201780046044.XA CN109496333A (zh) 2017-06-26 2017-06-26 Frame loss compensation method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2017/090035 WO2019000178A1 (fr) 2017-06-26 2017-06-26 Frame loss compensation method and device

Publications (1)

Publication Number Publication Date
WO2019000178A1 (fr) 2019-01-03

Family

ID=64740767

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/090035 WO2019000178A1 (fr) 2017-06-26 2017-06-26 Frame loss compensation method and device

Country Status (2)

Country Link
CN (1) CN109496333A (fr)
WO (1) WO2019000178A1 (fr)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111836117B (zh) * 2019-04-15 2022-08-09 深信服科技股份有限公司 Method and apparatus for sending frame compensation data, and related components
CN111554308A (zh) * 2020-05-15 2020-08-18 腾讯科技(深圳)有限公司 Speech processing method, apparatus, device, and storage medium
CN111711992B (zh) * 2020-06-23 2023-05-02 瓴盛科技有限公司 Method for calibrating CS voice downlink jitter
CN112489665B (zh) * 2020-11-11 2024-02-23 北京融讯科创技术有限公司 Speech processing method and apparatus, and electronic device
CN112634912B (zh) * 2020-12-18 2024-04-09 北京猿力未来科技有限公司 Packet loss compensation method and apparatus

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2004239930A (ja) * 2003-02-03 2004-08-26 Iwatsu Electric Co Ltd Pitch detection method and apparatus in packet loss compensation
CN101147190A (zh) * 2005-01-31 2008-03-19 高通股份有限公司 Frame erasure concealment in voice communication
CN101894558A (zh) * 2010-08-04 2010-11-24 华为技术有限公司 Frame loss recovery method and device, and speech enhancement method, device, and system
CN102449690A (zh) * 2009-06-04 2012-05-09 高通股份有限公司 System and method for reconstructing erased speech frames
CN103714820A (zh) * 2013-12-27 2014-04-09 广州华多网络科技有限公司 Packet loss concealment method and apparatus in the parameter domain
CN106251875A (zh) * 2016-08-12 2016-12-21 广州市百果园网络科技有限公司 Frame loss compensation method and terminal

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CA2388439A1 (fr) * 2002-05-31 2003-11-30 Voiceage Corporation Method and device for frame erasure concealment in linear prediction speech codecs
KR100542435B1 (ko) * 2003-09-01 2006-01-11 한국전자통신연구원 Method and apparatus for frame loss concealment in packet networks
US8255207B2 (en) * 2005-12-28 2012-08-28 Voiceage Corporation Method and device for efficient frame erasure concealment in speech codecs
CN101009098B (zh) * 2007-01-26 2011-01-26 清华大学 Mode-dependent channel error resistance method for vocoder gain parameters
KR100998396B1 (ko) * 2008-03-20 2010-12-03 광주과학기술원 Frame loss concealment method, frame loss concealment apparatus, and voice transceiver apparatus
CN101630242B (zh) * 2009-07-28 2011-01-12 苏州国芯科技有限公司 Module for fast calculation of the adaptive codebook contribution in a G.723.1 encoder
CN103325375B (zh) * 2013-06-05 2016-05-04 上海交通大学 Very-low-bit-rate speech codec device and coding/decoding method


Also Published As

Publication number Publication date
CN109496333A (zh) 2019-03-19

Similar Documents

Publication Publication Date Title
WO2019000178A1 (fr) Frame loss compensation method and device
US9047863B2 (en) Systems, methods, apparatus, and computer-readable media for criticality threshold control
JP5232151B2 (ja) Packet-based echo cancellation and suppression
KR100581413B1 (ko) Improved spectral parameter substitution for frame error concealment in a speech decoder
JP5571235B2 (ja) Signal encoding using pitch-regularizing and non-pitch-regularizing coding
US8352252B2 (en) Systems and methods for preventing the loss of information within a speech frame
US7778824B2 (en) Device and method for frame lost concealment
EP2140637B1 (fr) Method for transmitting data in a communication system
US8401865B2 (en) Flexible parameter update in audio/speech coded signals
CN107248411B (zh) Frame loss compensation processing method and apparatus
CN112489665A (zh) Speech processing method and apparatus, and electronic device
JP2023166423A (ja) Spectral shape prediction from MDCT coefficients
JP5553760B2 (ja) Speech energy estimation from encoded parameters
JP6264673B2 (ja) Method and decoder for processing lost frames
US20040138878A1 (en) Method for estimating a codec parameter
JP6759927B2 (ja) Speech evaluation device, speech evaluation method, and speech evaluation program

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17915498

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 17915498

Country of ref document: EP

Kind code of ref document: A1