WO2006016439A1 - Erasure compensation apparatus, erasure compensation method, and erasure compensation program - Google Patents
Erasure compensation apparatus, erasure compensation method, and erasure compensation program
- Publication number
- WO2006016439A1 (PCT/JP2005/006850)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- periodic signal
- unit
- erasure
- loss
- interpolation
- Prior art date
Links
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L65/00—Network arrangements, protocols or services for supporting real-time applications in data packet communication
- H04L65/80—Responding to QoS
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L1/00—Arrangements for detecting or preventing errors in the information received
- H04L1/004—Arrangements for detecting or preventing errors in the information received by using forward error control
- H04L1/0045—Arrangements at the receiver end
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M3/00—Automatic or semi-automatic exchanges
- H04M3/22—Arrangements for supervision, monitoring or testing
- H04M3/2236—Quality of speech transmission monitoring
Definitions
- Erasure compensation apparatus, erasure compensation method, and erasure compensation program
- The present invention relates to an erasure compensation device, an erasure compensation method, and an erasure compensation program, and is suitable for application to real-time communication such as voice calls, for example.
- The technique of Non-Patent Document 1 below assumes, as its coding system, the PCM (pulse code modulation) coding system described in Non-Patent Document 2 below.
- With the technology of Non-Patent Document 1, the decoded audio signal (hereinafter, decoding result) obtained by decoding audio encoded data, that is, an audio signal encoded by the PCM encoding method of Non-Patent Document 2, is stored in a functional unit capable of storage (such as a memory). Meanwhile, voice loss is monitored for each voice frame (frame), which is the unit of decoding processing, and compensation processing is executed every time a voice loss occurs.
- F1 to F7 indicate frames (decoded audio signals) to be received in time series.
- F1 is received earliest, and F2, F3, ... are received in sequence.
- voice loss is detected in the three sections corresponding to these three frames F4 to F6.
- FIG. 2 (B) is a waveform representation of the decoding result stored in the memory.
- Here, the basic period length T is assumed to be shorter than the decoding result for one frame, although it is also acceptable for the basic period length T to be longer than the decoding result for one frame.
- Fig. 2(C) shows the compensation processing for the section corresponding to frame F4, Fig. 2(D) shows the compensation processing for the section corresponding to frame F5, and Fig. 2(E) shows the compensation processing for the section corresponding to frame F6.
- In Fig. 2(C), interpolated speech data to compensate for the speech loss is generated based on the decoding result of the interval Ta, corresponding to one basic period, stored in the memory immediately before frame F4.
- the interval Ta is an interval corresponding to the basic period T1.
- The oldest position B4 of the section Ta is set as the start position of the interpolated audio data, and one frame's worth of data is acquired to generate the interpolated audio data.
- Since one basic period is shorter than one frame, there is still a shortfall after the decoding result S41 for one basic period has been acquired; to make up for this shortfall, the read position returns to the oldest position B4 and the decoding result S42 is acquired.
- The concatenation of S41 and S42 is inserted as interpolated speech data into the section corresponding to frame F4. Processing such as overlap-add is performed so that the junction of S41 and S42 forms a continuous waveform.
- section Tb is a section corresponding to the basic periods T1 and T2.
- The position B5 at which acquisition of interpolated audio data starts within the two-basic-period section Tb is determined as follows. In general, E4 (the right end of S42), the end position of S42 obtained previously in Fig. 2(C), is selected as the position B5. However, as in the illustrated example, when E4 does not fall within T2, the oldest one-basic-period portion of the section Tb, E4 is moved toward the oldest side one basic period at a time until it falls within T2, and the resulting position becomes B5. In the illustrated example, the position obtained by moving E4 toward the oldest side by one basic period corresponds to B5.
- The section Tc corresponds to the basic periods T1, T2, and T3. In Fig. 2(E) as well, the position B6 at which acquisition of interpolated audio data starts is determined in the same way as in Fig. 2(D); from there, one frame of data S61 and S62 is obtained, and interpolated voice data to be inserted into the section corresponding to frame F6 is generated.
- The position B6 (the left end of S61) corresponds to the position moved toward the oldest side by one basic period from the position B5.
- Furthermore, the interpolated voice data is gradually attenuated in the second and subsequent frames (F5 and F6 in the case of FIG. 2), for example linearly by 20% per 10 ms. This suppresses abnormal sounds, such as beep sounds, that occur when the same audio data is output continuously.
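- As a rough illustration of this kind of pitch-period repetition with per-frame attenuation, the following Python sketch cycles the last stored basic period and fades it out; it is not the literal procedure of Non-Patent Document 1, and the 8 kHz sampling rate, 10 ms frame, and 20% attenuation step are assumptions taken from the description above.

```python
import numpy as np

def conceal_lost_frames(history, pitch, num_lost, frame_len=80, atten=0.2):
    """Fill lost frames by cycling the last basic period of `history`.

    history   : most recent decoded samples (at least `pitch` samples long)
    pitch     : basic period T in samples
    num_lost  : number of consecutive lost frames to fill
    frame_len : samples per frame (80 = 10 ms at 8 kHz, an assumption)
    atten     : linear attenuation applied per frame after the first
    """
    period = np.asarray(history[-pitch:], dtype=float)  # one basic period (S41)
    frames, pos, gain = [], 0, 1.0
    for i in range(num_lost):
        if i >= 1:
            gain = max(0.0, gain - atten)    # attenuate the 2nd and later frames
        out = np.empty(frame_len)
        for n in range(frame_len):
            out[n] = period[pos] * gain
            pos = (pos + 1) % pitch          # wrap back to the oldest sample (B4)
        frames.append(out)
    return np.concatenate(frames)
```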
- Non-Patent Document 1: ITU-T Recommendation G.711 Appendix I
- Non-Patent Document 2: ITU-T Recommendation G.711
- However, with the technique of Non-Patent Document 1, in order to avoid generating abnormal sounds such as beeps, silence is output after a certain period when voice loss continues over a plurality of frames (for example, when continuous voice loss of 60 ms or more occurs). Therefore, long-term speech loss compensation cannot be performed, flexibility is poor, and communication quality in a broad sense is low.
- In addition, a storage capacity sufficient to store the decoding results for three basic periods is required, so storage resources such as memory are consumed and efficiency is low.
- As will be described later, in an actual implementation there is a high possibility that a storage capacity sufficient to store decoding results of three basic periods or more is required.
- In order to solve these problems, the erasure compensation device of the present invention compensates for an erasure when the erasure occurs in an arbitrary section of a periodic signal that is divided into predetermined sections and received in time series, and comprises: (1) a periodic signal storage unit for storing a newly received periodic signal of one or more sections for a predetermined time; (2) an erasure detection unit for detecting erasure of the periodic signal for each section; (3) an element periodic signal generation unit that, when an erasure is detected, generates a plurality of element periodic signals for interpolation having different waveforms based on the periodic signal stored in the periodic signal storage unit at that time; and (4) a synthesis unit that combines the plurality of element periodic signals generated by the element periodic signal generation unit and arranges the combined result in the section where the loss of the periodic signal occurred.
- The erasure compensation method of the present invention likewise compensates for an erasure when the erasure occurs in an arbitrary section of a periodic signal that is divided into predetermined sections and received in time series, wherein: (1) a periodic signal storage unit stores a newly received periodic signal of one or more sections for a predetermined time; (2) an erasure detection unit detects erasure of the periodic signal for each section; (3) when an erasure is detected, an element periodic signal generation unit generates a plurality of element periodic signals for interpolation having different waveforms based on the periodic signal stored in the periodic signal storage unit at that time; and (4) the plurality of element periodic signals generated by the element periodic signal generation unit are combined, and the combined result is arranged in the section where the loss of the periodic signal occurred.
- The erasure compensation program of the present invention compensates for an erasure when the erasure occurs in an arbitrary section of a periodic signal that is divided into predetermined sections and received in time series, and causes a computer to realize: (1) a periodic signal storage function for storing a newly received periodic signal of one or more sections for a predetermined time; (2) an erasure detection function for detecting the disappearance of the periodic signal for each section; (3) an element periodic signal generation function that, when an erasure is detected by the erasure detection function, generates a plurality of element periodic signals for interpolation having different waveforms based on the periodic signal stored by the periodic signal storage function at that time; and (4) a function that combines the plurality of element periodic signals generated by the element periodic signal generation function and places the combined result in the section where the loss of the periodic signal occurred.
- FIG. 1 is an operation explanatory diagram of the first embodiment.
- FIG. 2 is a schematic diagram showing a conventional interpolated speech creation operation.
- FIG. 3 is a schematic diagram showing an example of an internal configuration of a communication terminal according to the first to fifth embodiments.
- FIG. 4 is a schematic diagram showing an example of the internal configuration of a compensator used in the first embodiment.
- FIG. 5 is a schematic diagram showing an example of the internal configuration of a compensator used in the second embodiment.
- FIG. 6 is a schematic diagram showing an example of the internal configuration of a compensator used in the third embodiment.
- FIG. 7 is a schematic diagram showing an example of the internal configuration of a compensator used in the fourth embodiment.
- FIG. 8 is a schematic diagram showing an example of the internal configuration of a synthesis unit used in the fourth embodiment.
- FIG. 9 is a schematic diagram showing an example of the internal configuration of a compensator used in the fifth embodiment.
- FIG. 10 is a schematic diagram showing an example of the overall configuration of a communication system according to the first to fifth embodiments.
- An example of the overall configuration of the communication system 20 according to the present embodiment is shown in FIG. 10.
- the communication system 20 includes a network 21 and communication terminals 22 and 23.
- The network 21 may be the Internet, or it may be an IP network, provided by a telecommunications carrier, that guarantees communication quality to some extent.
- the communication terminal 22 is a communication device that can execute a voice call such as an IP telephone (VoIP compatible telephone) in real time.
- IP phones use VoIP technology to allow voice data to be exchanged over a network that uses the IP protocol.
- the communication terminal 23 is also the same communication device as the communication terminal 22.
- the communication terminal 22 is used by the user U1, and the communication terminal 23 is used by the user U2.
- IP telephones have the ability to exchange voice in both directions to establish a conversation between users.
- In the following, packets refer to IP packets and frames refer to voice frames.
- the length of one frame is not limited, but may be 10 ms, for example.
- the PCM speech coding method may be used as the coding method.
- Since the frames included in the packets PK11 to PK13 contain audio data representing the content (audio information) uttered by the user U1, the communication terminal 23 performs only reception processing as far as this direction is concerned, and user U2 only listens to the voice spoken by user U1. Although a configuration in which a single packet includes a plurality of frames is possible, it is assumed here for simplicity that a single packet includes a single frame.
- the packet PK11 corresponds to the frame F2
- the packet PK12 corresponds to the frame F3
- the packet PK13 corresponds to the frame F4. For this reason, for example, when the packet PK13 is lost on the network 21, the frame F4 is lost, and voice loss occurs in the section corresponding to the frame F4.
- the communication terminal 23 includes a decoder 10, a compensator 11, and an erasure determiner 12.
- The decoder 10 decodes the voice data (e.g., CD11) extracted from each packet (e.g., PK11) received by the communication terminal 23, and outputs the decoding result (e.g., DC11).
- the unit (processing unit) of decoding processing by the decoder 10 is the frame.
- the decoding result obtained from packet PK11 is DC11
- the decoding result obtained from packet PK12 is DC12
- the decoding result obtained from packet PK13 is DC13. If the voice call continues and no voice loss occurs, it is natural that a decoding result after DC13 can be obtained.
- When individual pieces of audio data are referred to, codes such as CD11 to CD13 are used; when audio data is referred to generically, the code CD is used. Similarly, codes such as DC11 to DC13 are used for individual decoding results, and the code DC is used when decoding results are referred to collectively.
- The decoding result DC may indicate part of the decoding result (for example, DC1) obtained from one packet, or part of the decoding result (for example, DC1 and DC2) obtained from a plurality of continuous packets.
- a voice uttered by a human has a noise part whose amplitude changes randomly and a periodic sound part that repeats at almost the same period.
- The repetition period of the periodic sound part is called the basic period. The basic period can therefore also be obtained from the decoding results DC11 to DC13.
- The number of samples included in one frame is determined arbitrarily; for example, it may be about 160 samples.
- the compensator 11 is a characteristic component in the present embodiment, and performs interpolation when voice loss occurs. Details of the configuration and function of the compensator 11 will be described later.
- the erasure determination unit 12 is a part for determining the presence or absence of voice loss, and outputs the determination result as voice loss information ER.
- The presence or absence of voice loss may be determined by various methods; for example, it can be determined that voice loss has occurred based on the fact that a packet that should have been received was not received.
- Specifically, it may be determined that voice loss has occurred when the sequence number, a serial number assigned on the transmission side and included in the RTP header of each transmitted packet, is found to be missing, or when a packet arrives whose sequence number is older than that of a packet that has already arrived because the order has changed. Alternatively, it may be determined that voice loss has occurred when a packet is received whose delay, based on the time stamp value (the transmission time information added on the transmission side and included in the RTP header), is larger than a predetermined value. A detected transmission error may also be treated as voice loss.
- Such a function of the erasure determination unit 12 may also be realized within the decoder 10.
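- As an illustrative sketch only (not an implementation of the erasure determination unit 12 itself), packet loss can be inferred from the gap between consecutive RTP sequence numbers; the 16-bit wrap-around handling below is an assumption based on standard RTP.

```python
def count_lost_frames(prev_seq, new_seq, seq_bits=16):
    """Infer lost frames from RTP sequence numbers (illustrative only).

    Returns the number of packets missing between the last accepted packet
    and the one just received, or None if the new packet is a duplicate or
    arrived out of order (older than one already received).
    """
    modulus = 1 << seq_bits
    gap = (new_seq - prev_seq) % modulus
    if gap == 0 or gap > modulus // 2:   # duplicate, or reordered/late packet
        return None
    return gap - 1                       # 0 means no loss, 1 means one lost frame, ...

# Example: count_lost_frames(1000, 1003) -> 2 frames lost in between.
```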
- The internal configuration of the compensator 11 is, for example, as shown in FIG. 4.
- the compensator 11 includes two interpolation function units 35a and 35b and a synthesis unit 34.
- The interpolation function units 35a and 35b have the same internal configuration. That is, the interpolation function unit 35a includes a control unit 30a, an interpolation execution unit 31a, a basic period calculation unit 32a, and a decoding result storage unit 33a, and the interpolation function unit 35b includes a control unit 30b, an interpolation execution unit 31b, a basic period calculation unit 32b, and a decoding result storage unit 33b.
- The control unit 30a corresponds to the control unit 30b, the interpolation execution unit 31a to the interpolation execution unit 31b, the basic period calculation unit 32a to the basic period calculation unit 32b, and the decoding result storage unit 33a to the decoding result storage unit 33b. Since the functions of the interpolation function units 35a and 35b are the same, the following description focuses mainly on the interpolation function unit 35a.
- The control unit 30a corresponds to a CPU (central processing unit) in terms of hardware and to a control program such as an OS (operating system) in terms of software. The constituent elements 31a to 33a in the interpolation function unit 35a are controlled by the control unit 30a.
- A signal sequence of the decoding result DC having exactly the same contents is supplied to the interpolation function units 35a and 35b. Within the interpolation function unit 35a, the decoding result DC is received by the interpolation execution unit 31a.
- The role of the interpolation execution unit 31a differs between normal times when no voice loss occurs (corresponding to the normal state described later) and times when voice loss occurs (corresponding to the loss compensation state described later).
- In normal times, the interpolation execution unit 31a merely supplies the decoding result DC received from the decoder 10 to the control unit 30a and the synthesis unit 34. When voice loss occurs, the interpolated speech data TP1 supplied from the control unit 30a is inserted into the lost section, and a signal sequence including the insertion result is supplied to the synthesis unit 34.
- The signal sequence, consisting mainly of the decoding result DC, that is supplied from the interpolation execution unit 31a to the synthesis unit 34 is referred to as the intermediate signal V1. In normal times, the intermediate signal V1 is a signal sequence with exactly the same content as the decoding result DC.
- The decoding result storage unit 33a is a part for storing the decoding result DC supplied from the interpolation execution unit 31a to the control unit 30a, and is constituted by volatile or nonvolatile storage means. Although it depends on the implementation, an expensive storage means capable of high-speed read/write access is likely to be used as the decoding result storage unit 33a, given that real-time performance matters for a voice call.
- The upper limit of the storage capacity of the decoding result storage unit 33a can be determined freely, but here it is assumed to be able to store the decoding result DC for one basic period. Since the length of one basic period varies with the content of the audio, storing exactly one basic period (except in special cases where its length is known in advance) requires calculating the basic period of the decoding result before storing it and acquiring only that one basic period, and there is a high possibility that a working storage area temporarily holding one basic period or more of the decoding result will be needed for that calculation. This, however, is a problem that always arises when the technology of Non-Patent Document 1 is implemented in an actual device.
- The above description of Non-Patent Document 1 assumes that the decoding results for exactly three basic periods are stored. In practice, however, the basic period must be calculated before storage, and it is highly likely that the working storage area will require more than three basic periods of storage capacity. In general, one basic period may correspond to the decoding result of a single frame or to that of a plurality of frames.
- For the decoding result storage unit 33a, for example, a storage capacity sufficiently larger than the upper limit (a fixed value) of the fluctuation range of the length of one basic period may be prepared, and the latest decoding result filling that capacity may be stored.
- Alternatively, the decoding result storage unit 33a may have a storage capacity sufficient to store the decoding result DC for one frame. In this case, calculation of the basic period before storage in the decoding result storage unit 33a can be omitted, which contributes to saving the working storage area and reducing the amount of calculation.
- Since the storage capacity is only one basic period (one frame), every time a decoding result DC for one new basic period is obtained from a new frame (every time a new frame arrives), the previously stored decoding result DC of one basic period (one frame) must be overwritten, so that only the newest decoding result DC of one basic period (one frame) is always held in the decoding result storage unit 33a.
- Note that since no valid decoding result DC is supplied in a section where speech loss has occurred, the decoding result storage unit 33a keeps the decoding result DC for the one basic period stored immediately before, without its being overwritten. The same applies when speech loss continues over multiple frames.
- Since the control unit 30a can recognize that voice loss has occurred in that section, it can perform control so as to maintain the stored contents of the decoding result storage unit 33a.
- The basic period calculation unit 32a calculates the basic period using the decoding result DC stored in the decoding result storage unit 33a at the time when speech loss occurs and the state transitions to the loss compensation state.
- The basic period may be obtained by various methods. For example, it can be calculated by obtaining a known autocorrelation function from the decoding result DC stored in the decoding result storage unit 33a and finding the amount of delay (lag) that maximizes the autocorrelation function.
- the interpolation function unit 35a can obtain the basic period almost simultaneously with the interpolation function unit 35b.
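- A minimal sketch of such an autocorrelation-based basic period search is shown below; the sampling rate and the 2.5-20 ms search range are assumptions (2.5 ms is the lower bound mentioned later for the second period Pb).

```python
import numpy as np

def estimate_basic_period(history, fs=8000, min_ms=2.5, max_ms=20.0):
    """Return the lag (in samples) that maximises the autocorrelation of
    the stored decoding result; this lag is taken as the basic period."""
    x = np.asarray(history, dtype=float)
    x = x - x.mean()
    lo = max(1, int(fs * min_ms / 1000))
    hi = min(int(fs * max_ms / 1000), len(x) - 1)
    best_lag, best_corr = lo, -np.inf
    for lag in range(lo, hi + 1):
        corr = np.dot(x[lag:], x[:-lag])     # autocorrelation at this lag
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    return best_lag
```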
- the interpolated audio data TP1 supplied from the control unit 30a to the interpolation execution unit 31a and the interpolated audio data TP2 supplied from the control unit 30b to the interpolation execution unit 31b need to be different.
- Since the decoding result DC having the same content is stored in the decoding result storage units 33a and 33b, if the basic period were obtained in the same way, the interpolated speech data TP1 obtained by the interpolation function unit 35a and the interpolated speech data TP2 obtained by the interpolation function unit 35b would be identical; therefore, the basic periods are made different.
- For example, the basic period calculated in the interpolation function unit 35a may be transmitted from the control unit 30a to the control unit 30b, and the control unit 30b may have its basic period calculated while excluding that value.
- Let Pa be the basic period calculated by the basic period calculation unit 32a in the interpolation function unit 35a, and let Pb be the basic period calculated by the basic period calculation unit 32b in the interpolation function unit 35b from a search range that does not include the value of Pa.
- For example, the basic period Pb may be searched for between 2.5 ms and Pa (excluding Pa itself).
- The calculated basic period Pa may be transmitted from the control unit 30a to the control unit 30b as described above; alternatively, if its value is stored in the decoding result storage unit 33b together with the decoding result DC, the control unit 30b can recognize Pa and search for the basic period Pb over a search range that differs from Pa.
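- A sketch of how the second branch might search for Pb while excluding Pa follows; the small exclusion margin around Pa is an assumption, since the document only requires that Pb differ from Pa.

```python
import numpy as np

def estimate_second_period(history, pa, fs=8000, min_ms=2.5, margin=2):
    """Search for a basic period Pb in [2.5 ms, Pa), skipping lags within
    `margin` samples of Pa so the two branches repeat different periods."""
    x = np.asarray(history, dtype=float)
    x = x - x.mean()
    lo = max(1, int(fs * min_ms / 1000))
    hi = min(pa, len(x) - 1)                 # search below Pa only
    best_lag, best_corr = None, -np.inf
    for lag in range(lo, hi):
        if abs(lag - pa) <= margin:
            continue                         # exclude Pa itself (and a margin)
        corr = np.dot(x[lag:], x[:-lag])
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    return best_lag
```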
- The synthesis unit 34 assigns weighting coefficients to the intermediate signal V1 supplied from the interpolation execution unit 31a in the interpolation function unit 35a and to the intermediate signal V2 supplied from the interpolation execution unit 31b in the interpolation function unit 35b, synthesizes them, and outputs the synthesis result as the final output audio signal V. If the weighting coefficient assigned to the intermediate signal V1 is α and the weighting coefficient assigned to the intermediate signal V2 is β, the synthesis result is, for example, V = α·V1 + β·V2.
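- A one-line sketch of this weighted synthesis is given below; the constraint α + β = 1 is an assumption used to keep the output level comparable to the inputs.

```python
import numpy as np

def synthesize(v1, v2, alpha=0.5, beta=0.5):
    """Combine the intermediate signals V1 and V2 of one frame into the
    output audio signal V using the weighting coefficients alpha and beta."""
    return alpha * np.asarray(v1, dtype=float) + beta * np.asarray(v2, dtype=float)
```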
- The operation of the communication terminal 23 at the time of reception can be divided into four. The first is the normal operation executed in the normal state, in which frames continue to be received normally without voice loss. The second is the erasure transition operation executed when a voice loss of one frame is detected and the terminal shifts from the normal state to the loss compensation state. The third is the normal transition operation performed when shifting from the loss compensation state back to the normal state. The remaining operation concerns the loss compensation state itself, which divides into the case where the voice loss ends after one frame and the case where voice loss continues over a plurality of frames.
- the normal state is a state in which the immediately preceding frame and the current frame are effectively received, and an effective decoding result DC is obtained.
- the erasure transition operation is executed when the previous frame is received and a valid decoding result DC is obtained, but the current frame cannot be received and a valid decoding result DC is not obtained.
- The normal transition operation is executed when the previous frame could not be received and no valid decoding result DC was obtained, but the current frame is received and a valid decoding result DC is obtained. When voice loss occurs over multiple frames, neither the previous frame nor the current frame can be received, and no valid decoding result DC is obtained for either.
- In the normal operation, the communication terminal 23 used by the user U2 receives the time-series packets PK11, PK12, PK13, ... (each containing a frame) transmitted from the communication terminal 22 used by the user U1, and VoIP communication is performed. Since the voice uttered by the user U1 is output from the communication terminal 23, the user U2 can listen to it.
- When this normal operation is performed, the decoder 10 in the communication terminal 23 outputs a signal sequence of the decoding result DC composed of the decoding results DC11, DC12, DC13, ..., and this signal sequence is supplied to the synthesis unit 34 via the interpolation execution units 31a and 31b.
- The interpolation execution units 31a and 31b supply the signal sequence received from the decoder 10 to the synthesis unit 34 as the intermediate signals V1 and V2 as they are.
- the synthesizer 34 outputs an output audio signal V that is a result of the synthesis.
- the user U2 listens to the voice corresponding to the output voice signal V by the communication terminal 23.
- In addition, the interpolation execution units 31a and 31b supply the decoding result DC received from the decoder 10 to the control units 30a and 30b, and every time a new one-frame decoding result DC is supplied, the control units 30a and 30b extract the decoding result DC for one new basic period (one frame) and store it in the decoding result storage units 33a and 33b. Since the decoding result storage units 33a and 33b have only a storage capacity sufficient to store the decoding result for one basic period (one frame), the decoding result corresponding to a certain basic period (for example, DC11) is overwritten when the decoding result corresponding to the next basic period (for example, part of DC12) is stored. Accordingly, only the latest decoding result DC for one basic period (one frame) remains in the decoding result storage units 33a and 33b.
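- The keep-only-the-newest behaviour of the decoding result storage units can be sketched with a fixed-capacity buffer like the following; class and method names are illustrative, not taken from the document.

```python
from collections import deque

class DecodingResultStore:
    """Holds only the newest `capacity` decoded samples (roughly one frame
    or one basic period). When a frame is lost, `update` is simply not
    called, so the previously stored samples remain available."""
    def __init__(self, capacity):
        self.buf = deque(maxlen=capacity)    # oldest samples fall off automatically

    def update(self, decoded_frame):
        self.buf.extend(decoded_frame)       # new frame overwrites the oldest content

    def snapshot(self):
        return list(self.buf)                # samples used to build TP1 / TP2
```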
- When the loss determination unit 12 detects a speech loss, voice loss information ER indicating that fact is supplied to the control units 30a and 30b, the erasure transition operation is executed, and the compensator 11 transitions from the normal state to the erasure compensation state.
- The control units 30a and 30b recognize the need to generate the interpolated speech data TP1 and TP2 in order to compensate for the decoding result DC13 lost due to speech loss, and instruct the basic period calculation units 32a and 32b to calculate the basic period using the decoding result for one basic period (for example, DC12) stored in the decoding result storage units 33a and 33b at that time.
- The calculation result may be stored and reused.
- As a result, the basic period calculation unit 32a calculates the basic period Pa, and the basic period calculation unit 32b calculates a basic period Pb having a value different from Pa.
- a decoding result DC slightly larger than the original one basic period Pa is stored in the decoding result storage units 33a and 33b.
- the decoding result DC may be a decoding result for one frame.
- The control unit 30a in the interpolation function unit 35a obtains the decoding result W1 for one basic period Pa from the decoding result stored in the decoding result storage unit 33a, and uses W11, a part of the same decoding result W1, to make up for the temporal shortfall.
- The processing here may be the same as the processing by which S42 is connected to S41 in FIG. 2.
- The interpolated speech data TP1 generated in this way by the control unit 30a is supplied to the interpolation execution unit 31a and then to the synthesis unit 34 as part of the signal sequence of the intermediate signal V1.
- Similarly, in the interpolation function unit 35b, the basic period calculation unit 32b calculates a basic period Pb of a different value, and the control unit 30b obtains the decoding result W2 for the basic period Pb from the one-frame decoding result stored in the decoding result storage unit 33b, compensating for the temporal shortfall by using W21, a part of the same decoding result W2.
- the interpolated audio data TP2 generated by the control unit 30b in this way has a waveform different from that of TP1.
- the interpolated speech data TP2 is supplied to the interpolation execution unit 31b and supplied to the synthesis unit 34 as part of the signal sequence of the intermediate signal V2.
- The synthesis unit 34 synthesizes the interpolated audio data TP1 and TP2 by assigning the weighting coefficients α and β, and outputs the synthesis result as the output audio signal V.
- The output audio signal V generated by synthesizing TP1 and TP2 still reflects the characteristics of the decoding result DC representing the original audio, so there is a high possibility of achieving sound quality that the listening user U2 finds almost free of discomfort. Moreover, the output audio signal V generated in this way usually has a waveform different from that of the output audio signal V in the section corresponding to the immediately preceding frame for which a valid decoding result DC was obtained.
- When a valid decoding result DC14 (not shown) is again supplied from the decoder 10, the compensator 11 performs the normal transition operation.
- In the normal transition operation, the decoding result storage units 33a and 33b store the decoding result for one frame (or one basic period) corresponding to the decoding result DC14. At this time, overlap-add processing may be performed so that the junction between the valid decoding result and the generated interpolated speech forms a continuous waveform.
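- The overlap-add at the junction can be sketched as a simple linear crossfade; the crossfade length below (the shorter of the two inputs) is an assumption.

```python
import numpy as np

def overlap_add_junction(interp_tail, decoded_head):
    """Crossfade the end of the interpolated speech into the newly decoded
    speech so that the junction is waveform-continuous."""
    n = min(len(interp_tail), len(decoded_head))
    fade_out = np.linspace(1.0, 0.0, n)      # weight on the interpolated speech
    fade_in = 1.0 - fade_out                 # weight on the valid decoded speech
    return (fade_out * np.asarray(interp_tail[:n], dtype=float)
            + fade_in * np.asarray(decoded_head[:n], dtype=float))
```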
- When voice loss continues over a plurality of frames, interpolated voice data TP1 and TP2 with waveforms like those shown in Fig. 1 may again be output from the control units 30a and 30b to the interpolation execution units 31a and 31b.
- In that case it is desirable to start generation from the sample following the last sample used to generate the interpolated audio data in the previous frame; this preserves waveform continuity.
- It is also desirable to vary, within the limited decoding result DC stored in the decoding result storage units 33a and 33b, the period used for the interpolated audio data TP1 and TP2. In this way the stored, limited decoding result DC can be used effectively, and the repetition frequency per unit time of the same waveform in the output audio signal V can be reduced.
- If the interpolated speech data TP1 and TP2 obtained cyclically from the limited decoding results DC stored in the decoding result storage units 33a and 33b were output as they are, output audio signals V with the waveform shown in Fig. 1 would be output repeatedly, which could cause a beep sound.
- Since the storage capacity of the decoding result storage units 33a and 33b is only about one frame or one basic period, it is smaller than in the conventional technique.
- According to this embodiment, effective voice loss compensation can be continued even when voice loss continues for a long period of time, so flexibility is improved, communication quality in a broad sense (for example, the sound quality perceived by user U2) is enhanced, and storage resources can be saved.
- This embodiment is different from the first embodiment only in the point relating to the internal configuration of the compensator.
- An example of the internal configuration of the compensator 13 of this embodiment is shown in FIG. 5.
- The compensator 13 includes two interpolation function units 325a and 325b, a synthesis unit 34, and switching units 321 and 322.
- The interpolation function units 325a and 325b have the same internal configuration. That is, the interpolation function unit 325a includes a control unit 30a, an interpolation execution unit 31a, a basic period calculation unit 32a, and a decoding result storage unit 323a, and the interpolation function unit 325b includes a control unit 30b, an interpolation execution unit 31b, a basic period calculation unit 32b, and a decoding result storage unit 323b.
- In the normal state, the switching unit 321 supplies the decoding result DC, as the decoding result IN1, only to the interpolation execution unit 31a.
- The decoding result storage unit 323a stores the decoding result IN1 corresponding to each new frame (or one basic period); no such storage is performed in the decoding result storage unit 323b.
- In the normal state, each part in the interpolation function unit 325b, such as the interpolation execution unit 31b, the basic period calculation unit 32b, and the decoding result storage unit 323b, does not perform any effective operation and remains in a sleep state. In the sleep state, almost no storage resources or computing power are used, which contributes to saving storage resources and reducing the amount of calculation.
- The switching unit 322 receives only the intermediate signal V1 supplied from the interpolation execution unit 31a, and outputs it either to the synthesis unit 34 or to the connection point P1.
- In the normal state, the switching unit 322 outputs the intermediate signal V1 to the connection point P1, so the intermediate signal V1 is output as the final output audio signal V as it is. In this case the synthesis unit 34 does not need to perform the processing related to the weighting coefficients α and β described in the first embodiment, which saves storage resources and suppresses the amount of calculation.
- When voice loss occurs and the compensator shifts to the loss compensation state, the switching unit 322 outputs the intermediate signal V1 to the synthesis unit 34. At this time, since the control units 30a and 30b are informed by the voice loss information ER that voice loss has been detected, the decoding result DC stored in the decoding result storage unit 323a during the normal state is copied to the decoding result storage unit 323b (DCa). After this copy, the same contents are stored in the decoding result storage units 323a and 323b, so the interpolated speech data TP1 and TP2 can be generated in the two interpolation function units 325a and 325b in the same manner as in the first embodiment. The generated interpolated speech data TP1 and TP2 are supplied to the interpolation execution units 31a and 31b, respectively, and synthesized by the synthesis unit 34, as in the first embodiment.
- The weighting coefficients α and β may be used in the present embodiment as well. As a result, even when voice loss continues for a long time, effective voice loss compensation can be continued while suppressing the generation of abnormal noise.
- In the loss compensation state, a decoding result IN2 having the same content as the decoding result IN1 may be supplied from the switching unit 321 to the interpolation execution unit 31b. However, since the decoder 10 supplies no valid decoding result DC during speech loss, there is little need to supply the decoding result IN2 to the interpolation execution unit 31b, except when a control signal such as a frame-timing signal must be supplied.
- If the decoding result IN2 is never supplied, the switching unit 321 can be omitted, and the decoding result DC output from the decoder 10 need only be supplied to the interpolation execution unit 31a.
- When the normal state is restored, the switching unit 322 again outputs the intermediate signal V1 to the connection point P1, and each unit in the interpolation function unit 325b returns to the sleep state in which no effective operation is executed.
- This embodiment is different from the first and second embodiments only in respect of the internal configuration of the compensator. Of the first and second embodiments, the first embodiment is closer to the present embodiment.
- An example of the internal configuration of the compensator 14 of this embodiment is shown in FIG. 6.
- The compensator 14 includes two interpolation function units 35a and 35b and a synthesis unit 334.
- The interpolation function units 35a and 35b have the same internal configuration. That is, the interpolation function unit 35a includes a control unit 30a, an interpolation execution unit 31a, a basic period calculation unit 32a, and a decoding result storage unit 33a, and the interpolation function unit 35b includes a control unit 30b, an interpolation execution unit 31b, a basic period calculation unit 32b, and a decoding result storage unit 33b.
- the internal configuration of the synthesis unit 334 is different from that of the first embodiment.
- The random weight generation unit 331 operates only when the voice loss information ER reports the occurrence of voice loss, and generates the weighting coefficient α as a nearly white-noise-like random value.
- The range of α may be further restricted; for example, 0.7 ≤ α ≤ 0.8.
- Any method can be used to determine α. For example, when the value of α is updated in time series, the amount of change from the value of α before the update may be generated randomly.
- The frequency with which the value of α is updated can also be varied; for example, it may be updated once per frame.
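- A sketch of such a random weight generator is shown below; the step size of the random walk is an assumption, and β is taken as 1 − α, which the document does not state explicitly.

```python
import random

class RandomWeightGenerator:
    """Produces a new weighting coefficient alpha once per frame while loss
    compensation is active, either drawn directly from [lo, hi] or by a
    small random step from the previous value."""
    def __init__(self, lo=0.7, hi=0.8, max_step=0.02):
        self.lo, self.hi, self.max_step = lo, hi, max_step
        self.alpha = random.uniform(lo, hi)

    def next_frame(self, random_walk=True):
        if random_walk:
            step = random.uniform(-self.max_step, self.max_step)
            self.alpha = min(self.hi, max(self.lo, self.alpha + step))
        else:
            self.alpha = random.uniform(self.lo, self.hi)
        return self.alpha, 1.0 - self.alpha   # (alpha, beta)
```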
- This embodiment is different from the first to third embodiments only in respect of the internal configuration of the compensator.
- the second embodiment is closest to the present embodiment.
- An example of the internal configuration of the compensator 15 of this embodiment is shown in FIG. 7.
- The compensator 15 includes two interpolation function units 75a and 75b, switching units 321 and 322, and a synthesis unit 72.
- The interpolation function units 75a and 75b have the same internal configuration. That is, the interpolation function unit 75a includes a control unit 71a, an interpolation execution unit 31a, a basic period calculation unit 32a, and a decoding result storage unit 323a, and the interpolation function unit 75b includes a control unit 71b, an interpolation execution unit 31b, a basic period calculation unit 32b, and a decoding result storage unit 323b.
- This embodiment is different from the second embodiment in some of the functions of the control units 71a and 71b and the internal configuration of the synthesis unit 72.
- The control units 71a and 71b of the present embodiment have a function of transmitting to the synthesis unit 72 the basic periods P1 and P2 calculated by the basic period calculation units 32a and 32b in their own interpolation function units 75a and 75b.
- The basic period P1 may correspond to the Pa described above, and the basic period P2 to Pb.
- The synthesis unit 72 has, for example, the internal configuration shown in FIG. 8, and its weight update unit 81 generates the weighting coefficient α based on the basic periods P1 and P2; in the expression used for this, PM denotes the maximum value of the basic period that can be calculated, and Pm denotes the minimum value of the basic period that can be calculated.
- The value of α may also be determined according to a predetermined rule based on feature quantities other than the basic period (for example, power or spectrum).
- The value of α is set when the erasure compensation state is entered, and the same value of α is used until the normal state is restored.
- Alternatively, the control units 71a and 71b may output, every frame period, the power of the waveform of the decoding result DC (for example, the mean square of the sample values over one basic period) instead of the basic periods P1 and P2, and the synthesis unit 72 may update the value of α every frame period until the normal state is restored.
- In this way, various feature quantities of the decoding results (DC) held in the decoding result storage units (323a, 323b), such as the basic periods (P1, P2), can be reflected in the values of the generated weighting coefficients (α, β), so various forms of compensation can be provided.
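- As an illustration of reflecting a feature quantity in the weights, the sketch below makes each branch's weight proportional to the mean-square power of its stored basic period; the exact rule used by the weight update unit 81 is not reproduced here, so this is an assumption.

```python
import numpy as np

def weights_from_power(hist_a, hist_b, pitch_a, pitch_b):
    """Derive (alpha, beta) from the waveform power of each stored basic
    period (mean square of the sample values over one basic period)."""
    pow_a = float(np.mean(np.square(hist_a[-pitch_a:])))
    pow_b = float(np.mean(np.square(hist_b[-pitch_b:])))
    total = pow_a + pow_b
    if total == 0.0:
        return 0.5, 0.5                      # fall back to equal weights
    alpha = pow_a / total
    return alpha, 1.0 - alpha
```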
- This embodiment is different from the first to fourth embodiments only in respect to the internal configuration of the compensator.
- the fourth embodiment is closest to the present embodiment in that it has the switching units 321, 322, and the like.
- An example of the internal configuration of the compensator 16 of this embodiment is shown in FIG. 9.
- the compensator 16 includes one interpolation function unit 95, switching units 321 and 322, and a combining unit 72.
- The interpolation function unit 95 includes interpolation execution units 31a and 31b, a control unit 90, a basic period calculation unit 92, a decoding result storage unit 93, a control switching unit 96, and a state holding unit 97.
- The control unit 90 corresponds to the control unit 71a or 71b, the basic period calculation unit 92 to the basic period calculation unit 32a or 32b, and the decoding result storage unit 93 to the decoding result storage unit 323a or 323b, so detailed descriptions of them are omitted.
- This embodiment differs from the fourth embodiment in that the control unit 90, the basic period calculation unit 92, and the decoding result storage unit 93 serve to generate not only the interpolated speech data TP1 and the basic period P1 but also the interpolated speech data TP2 and the basic period P2.
- Since the decoding result IN1 is stored in the decoding result storage unit 93 via the interpolation execution unit 31a in the normal state, when the erasure compensation state is entered the control unit 90 can cause the basic period calculation unit 92 to calculate the basic period P1 for generating the interpolated voice data TP1, and can generate the interpolated voice data TP1 based on that basic period P1. Since voice loss may continue in subsequent frame sections, various information necessary for generating the subsequent interpolated voice data TP1 is stored in the state holding unit 97 as generation state information Q1.
- Similarly, the control unit 90 causes the basic period calculation unit 92 to calculate the basic period P2 for generating the interpolated voice data TP2, and generates the interpolated voice data TP2 based on that basic period P2. Since voice loss may continue in subsequent frame sections, various information necessary for generating the subsequent interpolated voice data TP2 is stored in the state holding unit 97 as generation state information Q2.
- The generation state information Q1 and Q2 may include various types of information, for example information concerning the decoding result storage unit 93.
- The control unit 90 then selects the interpolation execution unit 31a via the control switching unit 96 and inserts the generated interpolated voice data TP1 into the signal sequence of the intermediate signal V1; likewise, it selects the interpolation execution unit 31b via the control switching unit 96 and inserts the generated interpolated audio data TP2 into the signal sequence of the intermediate signal V2.
- The synthesis unit 72 can then assign the weighting coefficients α and β using the basic periods P1 and P2 already supplied from the control unit 90, and output the output audio signal V as the synthesis result.
- When the erasure compensation state does not end in one frame but continues over a plurality of frames, the subsequent interpolated audio data TP1 and TP2 can be generated using the generation state information Q1 and Q2 stored in the state holding unit 97, and voice loss compensation can be continued.
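- The generation state information Q1/Q2 can be pictured as the small amount of state needed to keep cycling the stored period across consecutive lost frames; the field names below are illustrative, not taken from the document.

```python
class GenerationState:
    """Per-branch state (a sketch of Q1 or Q2) held in the state holding unit."""
    def __init__(self, pitch, gain=1.0, read_pos=0):
        self.pitch = pitch        # basic period P1 or P2, in samples
        self.gain = gain          # current attenuation gain
        self.read_pos = read_pos  # next sample to read inside the stored period

def continue_interpolation(period, state, frame_len=80, atten=0.2):
    """Generate one more frame of interpolated data, resuming from the sample
    following the last one used for the previous frame (frame length and
    attenuation step are assumptions carried over from the earlier sketch)."""
    out = []
    for _ in range(frame_len):
        out.append(period[state.read_pos] * state.gain)
        state.read_pos = (state.read_pos + 1) % state.pitch
    state.gain = max(0.0, state.gain - atten)   # attenuate the following frame
    return out
```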
- In the above embodiments, two interpolation function units are provided in one compensator, but the number of interpolation function units in one compensator may be three or more. Likewise, where two interpolation execution units are provided in one interpolation function unit, the number of interpolation execution units in one interpolation function unit may also be three or more.
- PCM is used as the encoding method, but the present invention can be applied to various encoding methods. For example, it can be applied to differential quantization methods such as ADPCM.
- In the above, the present invention has been described taking the communication of voice signals by telephone (VoIP-compatible telephone) as an example, but it is also applicable to voice signals other than telephone voice; for example, it can be applied widely to communication using periodic signals such as tone signals.
- Furthermore, the application range of the present invention is not necessarily limited to voice, tones, and the like; it may also be applicable to image signals such as moving images.
- the communication protocol to which the present invention is applied need not be limited to the IP protocol described above.
- In the above, the present invention has been described as being realized mainly in hardware, but it can also be realized in software.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US11/659,205 US7793202B2 (en) | 2004-08-12 | 2005-04-07 | Loss compensation device, loss compensation method and loss compensation program |
GB0702838A GB2435749B (en) | 2004-08-12 | 2007-02-14 | Loss compensation device, loss compensation method, and loss compensation program |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2004-235461 | 2004-08-12 | ||
JP2004235461A JP4419748B2 (ja) | 2004-08-12 | 2004-08-12 | 消失補償装置、消失補償方法、および消失補償プログラム |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2006016439A1 true WO2006016439A1 (ja) | 2006-02-16 |
Family
ID=35839215
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2005/006850 WO2006016439A1 (ja) | 2004-08-12 | 2005-04-07 | 消失補償装置、消失補償方法、および消失補償プログラム |
Country Status (5)
Country | Link |
---|---|
US (1) | US7793202B2 (ja) |
JP (1) | JP4419748B2 (ja) |
CN (1) | CN100445716C (ja) |
GB (1) | GB2435749B (ja) |
WO (1) | WO2006016439A1 (ja) |
Families Citing this family (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP4504389B2 (ja) * | 2007-02-22 | 2010-07-14 | 富士通株式会社 | 隠蔽信号生成装置、隠蔽信号生成方法および隠蔽信号生成プログラム |
JP5233986B2 (ja) * | 2007-03-12 | 2013-07-10 | 富士通株式会社 | 音声波形補間装置および方法 |
JP5059677B2 (ja) * | 2008-04-18 | 2012-10-24 | ルネサスエレクトロニクス株式会社 | ノイズ除去装置、及びノイズ除去方法 |
JP5584157B2 (ja) * | 2011-03-22 | 2014-09-03 | 株式会社タムラ製作所 | 無線受信機 |
KR20140067512A (ko) * | 2012-11-26 | 2014-06-05 | 삼성전자주식회사 | 신호 처리 장치 및 그 신호 처리 방법 |
FR3004876A1 (fr) * | 2013-04-18 | 2014-10-24 | France Telecom | Correction de perte de trame par injection de bruit pondere. |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPS57132486A (en) * | 1981-02-10 | 1982-08-16 | Sony Corp | Magnetic recorder and reproducer |
KR100197366B1 (ko) * | 1995-12-23 | 1999-06-15 | 전주범 | 영상 에러 복구 장치 |
JP3157116B2 (ja) * | 1996-03-29 | 2001-04-16 | 三菱電機株式会社 | 音声符号化伝送システム |
US6128369A (en) * | 1997-05-14 | 2000-10-03 | A.T.&T. Corp. | Employing customer premises equipment in communications network maintenance |
-
2004
- 2004-08-12 JP JP2004235461A patent/JP4419748B2/ja active Active
-
2005
- 2005-04-07 US US11/659,205 patent/US7793202B2/en active Active
- 2005-04-07 WO PCT/JP2005/006850 patent/WO2006016439A1/ja active Application Filing
- 2005-04-07 CN CNB2005800272467A patent/CN100445716C/zh active Active
-
2007
- 2007-02-14 GB GB0702838A patent/GB2435749B/en active Active
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH0573097A (ja) * | 1991-09-17 | 1993-03-26 | Nippon Telegr & Teleph Corp <Ntt> | 低遅延符号駆動形予測符号化方法 |
JPH06282298A (ja) * | 1993-03-29 | 1994-10-07 | Nippon Telegr & Teleph Corp <Ntt> | 音声の符号化方法 |
JPH07271391A (ja) * | 1994-04-01 | 1995-10-20 | Toshiba Corp | 音声復号装置 |
JPH08305398A (ja) * | 1995-04-28 | 1996-11-22 | Matsushita Electric Ind Co Ltd | 音声復号化装置 |
JPH09120297A (ja) * | 1995-06-07 | 1997-05-06 | At & T Ipm Corp | フレーム消失の間のコードブック利得減衰 |
JP2003249957A (ja) * | 2002-02-22 | 2003-09-05 | Nippon Telegr & Teleph Corp <Ntt> | パケット構成方法及び装置、パケット構成プログラム、並びにパケット分解方法及び装置、パケット分解プログラム |
Also Published As
Publication number | Publication date |
---|---|
JP4419748B2 (ja) | 2010-02-24 |
CN101002079A (zh) | 2007-07-18 |
CN100445716C (zh) | 2008-12-24 |
GB0702838D0 (en) | 2007-03-28 |
GB2435749B (en) | 2009-02-18 |
US7793202B2 (en) | 2010-09-07 |
JP2006053394A (ja) | 2006-02-23 |
GB2435749A (en) | 2007-09-05 |
US20090019343A1 (en) | 2009-01-15 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Gunduzhan et al. | Linear prediction based packet loss concealment algorithm for PCM coded speech | |
US9336783B2 (en) | Method and apparatus for performing packet loss or frame erasure concealment | |
RU2407071C2 (ru) | Способ генерации кадров маскирования в системе связи | |
CN100426715C (zh) | 一种丢帧隐藏方法和装置 | |
JP5405659B2 (ja) | 消去されたスピーチフレームを再構成するためのシステムおよび方法 | |
JP4473869B2 (ja) | 音響信号のパケット通信方法、送信方法、受信方法、これらの装置およびプログラム | |
US20060167693A1 (en) | Method and apparatus for performing packet loss or frame erasure concealment | |
JP2008529423A (ja) | 音声通信におけるフレーム消失キャンセル | |
WO2006016439A1 (ja) | 消失補償装置、消失補償方法、および消失補償プログラム | |
JP4485690B2 (ja) | マルチメディア信号を伝送する伝送システム | |
KR20160002920A (ko) | 노이즈 주입이 가중된 프레임 손실 보정 | |
Ogunfunmi et al. | Speech over VoIP networks: Advanced signal processing and system implementation | |
US6584104B1 (en) | Lost-packet replacement for a digital voice signal | |
TW432855B (en) | Echo eliminator | |
Kim et al. | Enhancing VoIP speech quality using combined playout control and signal reconstruction | |
JPH10340097A (ja) | 快適雑音発生装置及び該装置の構成要素を含む音声エンコーダ及びデコーダ | |
JP3833490B2 (ja) | データ伝送において発生する遅延ジッタを吸収する装置および方法 | |
Rodbro et al. | Time-scaling of sinusoids for intelligent jitter buffer in packet based telephony | |
JP2005274917A (ja) | 音声復号装置 | |
JP4093174B2 (ja) | 受信装置および方法 | |
JP3225256B2 (ja) | 擬似背景雑音生成方法 | |
Gournay et al. | Performance analysis of a decoder-based time scaling algorithm for variable jitter buffering of speech over packet networks | |
JP3508850B2 (ja) | 疑似背景雑音生成方法 | |
Ulseth et al. | VoIP speech quality-Better than PSTN? | |
Lee et al. | A forward-backward voice packet loss concealment algorithm for multimedia over IP network services |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AK | Designated states |
Kind code of ref document: A1 Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BW BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE EG ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KM KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NA NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SM SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW |
|
AL | Designated countries for regional patents |
Kind code of ref document: A1 Designated state(s): GM KE LS MW MZ NA SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IS IT LT LU MC NL PL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG |
|
121 | Ep: the epo has been informed by wipo that ep was designated in this application | ||
DPE1 | Request for preliminary examination filed after expiration of 19th month from priority date (pct application filed from 20040101) | ||
WWE | Wipo information: entry into national phase |
Ref document number: 11659205 Country of ref document: US |
|
WWE | Wipo information: entry into national phase |
Ref document number: 200580027246.7 Country of ref document: CN |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
ENP | Entry into the national phase |
Ref document number: 0702838 Country of ref document: GB Kind code of ref document: A Free format text: PCT FILING DATE = 20050407 |
|
WWE | Wipo information: entry into national phase |
Ref document number: 0702838.4 Country of ref document: GB |
|
122 | Ep: pct application non-entry in european phase | ||
NENP | Non-entry into the national phase |
Ref country code: JP |