WO2021200151A1 - Transmission device, transmission method, reception device, and reception method - Google Patents


Info

Publication number
WO2021200151A1
Authority
WO
WIPO (PCT)
Prior art keywords
waveform
range
data
noise
spectrum
Prior art date
Application number
PCT/JP2021/010803
Other languages
French (fr)
Japanese (ja)
Inventor
Takashi Hattori (服部 崇史)
Chisato Kenmochi (劔持 千智)
Yasuhiro Toguri (戸栗 康裕)
Ryuji Tokunaga (徳永 竜二)
Akiho Tanaka (田中 朗穂)
Original Assignee
Sony Group Corporation (ソニーグループ株式会社)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sony Group Corporation
Publication of WO2021200151A1

Classifications

    • G — PHYSICS
    • G10 — MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L — SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 19/00 — Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L 19/005 — Correction of errors induced by the transmission channel, if related to the coding algorithm
    • G10L 25/00 — Speech or voice analysis techniques not restricted to a single one of groups G10L 15/00 – G10L 21/00
    • G10L 25/48 — Speech or voice analysis techniques specially adapted for particular use
    • G10L 25/69 — Speech or voice analysis techniques specially adapted for evaluating synthetic or decoded voice signals

Definitions

  • This technology relates to a transmitting device, a transmitting method, a receiving device, and a receiving method applicable to data communication.
  • Patent Document 1 describes an audio decoder including an error concealment unit.
  • When a frame loss or the like occurs, this error concealment unit synthesizes the audio information component representing the high-frequency side of the lost portion by concealment processing in the frequency domain, and synthesizes the component representing the low-frequency side by concealment processing in the time domain.
  • This enables error concealment that avoids the click sounds and beep sounds associated with the synthesis process (paragraphs [0016], [0017], [0094], [0095], Fig. 1, Fig. 2, etc. of the specification of Patent Document 1).
  • an object of the present technology is to provide a transmission device, a transmission method, a reception device, and a reception method capable of realizing high-quality error concealment while suppressing the amount of data transmission.
  • the transmission device includes a quality prediction unit, a range setting unit, and a data generation unit.
  • the quality prediction unit predicts the waveform quality of the restored waveform with respect to the target frame of the waveform data.
  • the range setting unit sets at least one target range as a frequency range assigned to redundant data for generating the restored waveform from the waveform data included in the target frame.
  • the data generation unit generates the redundant data based on the target range, and generates transmission data including the redundant data.
  • the waveform quality of the restored waveform is predicted for the target frame of the waveform data. Based on this waveform quality, at least one target range, which is a frequency range allocated to redundant data for generating a restored waveform, is set. Then, transmission data including redundant data generated based on the target range is generated. This makes it possible to realize high-quality error concealment while suppressing the amount of data transmission.
  • the restored waveform may be a waveform generated based on the redundant data and the composite waveform related to the target frame.
  • the quality prediction unit may predict the waveform quality of the composite waveform as the waveform quality of the restored waveform.
  • the target frame may be a frame in the vicinity of the transmission frame transmitted as the transmission data.
  • the quality prediction unit may generate the composite waveform for the target frame based on the waveform data included in the transmission frame.
  • the quality prediction unit may calculate a noise spectrum representing the waveform quality of the composite waveform based on the composite waveform and the original waveform represented by the waveform data included in the target frame.
  • the range setting unit may set the target range based on the noise spectrum.
  • the range setting unit may calculate, based on the noise spectrum and the quantization noise accompanying the coding of the redundant data, the total noise amount of the interpolated data obtained by interpolating the redundant data with the composite waveform, and may set the target range so that the total noise amount is minimized.
  • the noise spectrum may be either a spectrum obtained by frequency-converting the difference between the original waveform and the composite waveform, or a spectrum representing the difference between the spectrum of the original waveform and the spectrum of the composite waveform.
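The two noise-spectrum variants named in the preceding item can be sketched as follows. This is an illustrative sketch, not the patent's implementation; the function names and the use of an FFT magnitude are assumptions.

```python
import numpy as np

def noise_spectrum_time(original, composite):
    """Variant 1: frequency-convert the time-domain difference.
    `original` and `composite` are time waveforms of one frame."""
    diff = original - composite
    return np.abs(np.fft.rfft(diff))

def noise_spectrum_freq(X_orig, X_comp):
    """Variant 2: difference between the spectrum of the original
    waveform and the spectrum of the composite waveform."""
    return np.abs(np.abs(X_orig) - np.abs(X_comp))
```

When the composite waveform matches the original exactly, both variants return an all-zero noise spectrum; the larger the mismatch in a band, the larger the noise-spectrum values in that band.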
  • the range setting unit may set an integration range for calculating the integrated value of the noise spectrum, and may set the target range based on the smallest integration range in which the integrated value exceeds the first threshold value.
  • the range setting unit may set the minimum frequency of the integration range to the minimum frequency of the noise spectrum and change the maximum frequency of the integration range to calculate the integration value.
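The sweep described above — fixing the lower edge of the integration range at the lowest frequency of the noise spectrum and raising the upper edge until the integrated value exceeds the first threshold — can be sketched like this. Bin indices stand in for frequencies; the names are illustrative, not from the patent.

```python
def min_integration_range(noise_spec, threshold):
    """Return the smallest range [0, k_max] (inclusive bin indices)
    whose integrated noise exceeds `threshold`, or None if the total
    never exceeds it."""
    total = 0.0
    for k, value in enumerate(noise_spec):
        total += value
        if total > threshold:
            return (0, k)
    return None
```

For example, with a noise spectrum of [1, 1, 1, 5] and a threshold of 3, the integrated value first exceeds the threshold when the range reaches bin 3.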
  • the range setting unit may calculate at least one excess range in which the noise spectrum exceeds a second threshold set for each frequency, and set the target range based on the at least one excess range.
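Finding the excess ranges — the contiguous frequency stretches where the noise spectrum exceeds a per-frequency second threshold — amounts to a run-detection pass over the bins. A minimal sketch (names and bin-index convention are assumptions):

```python
def excess_ranges(noise_spec, threshold_per_bin):
    """Return contiguous (start, end) bin ranges where the noise
    spectrum exceeds the per-frequency threshold."""
    ranges, start = [], None
    for k, (n, t) in enumerate(zip(noise_spec, threshold_per_bin)):
        if n > t and start is None:
            start = k                      # a run of excess bins begins
        elif n <= t and start is not None:
            ranges.append((start, k - 1))  # the run ends
            start = None
    if start is not None:                  # run reaches the last bin
        ranges.append((start, len(noise_spec) - 1))
    return ranges
```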
  • the range setting unit may calculate a plurality of candidate ranges that are candidates for the target range, and set the target range based on the plurality of candidate ranges.
  • the range setting unit may calculate a connection cost representing the amount of noise that changes by connecting the candidate ranges adjacent to each other, and connect the candidate ranges based on the connection cost.
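The connection of adjacent candidate ranges can be sketched as a greedy merge driven by a connection cost. The patent only states that the cost represents the amount of noise that changes by connecting ranges; the concrete cost used below — noise admitted in the gap between two ranges minus a fixed per-range overhead — is an assumption for illustration.

```python
def connect_ranges(cands, noise_spec, overhead):
    """Greedily connect adjacent candidate ranges (non-empty, sorted,
    non-overlapping (lo, hi) bin pairs) when the assumed connection
    cost is negative, i.e. merging is cheaper than keeping two ranges."""
    out = [cands[0]]
    for lo, hi in cands[1:]:
        prev_lo, prev_hi = out[-1]
        gap_noise = sum(noise_spec[prev_hi + 1:lo])  # noise coded needlessly
        if gap_noise - overhead < 0:                 # merging saves more
            out[-1] = (prev_lo, hi)
        else:
            out.append((lo, hi))
    return out
```

With a quiet gap the two ranges merge into one; with a noisy gap they stay separate.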
  • the restored waveform may be a waveform generated based on the redundant data and the composite waveform related to the target frame.
  • the range setting unit may extract the tonal frequency components included in the waveform data of the target frame and, when a tonal frequency component is present on the high-frequency side of a predetermined threshold frequency, adjust the width of the candidate range.
  • the range setting unit may adjust the width of the candidate range based on the noise components at the highest frequency and the lowest frequency of the candidate range.
  • the restored waveform may be a waveform generated based on the redundant data and the composite waveform related to the target frame.
  • the range setting unit may set one of the plurality of synthesis methods for generating the composite waveform in a non-target range which is a frequency range other than the target range.
  • the quality prediction unit may predict the waveform quality of the composite waveform for each of the plurality of synthesis methods.
  • the range setting unit may set the target range and the synthesis method assigned to the non-target range based on the waveform quality predicted for each of the plurality of synthesis methods.
  • the range setting unit may calculate at least one candidate range for the target range for each of the plurality of synthesis methods, set the target range to the frequency range given by the intersection of the candidate ranges calculated for the respective synthesis methods, and assign to the non-target range the synthesis method whose integrated noise-spectrum value is the smallest among the plurality of synthesis methods.
  • a transmission method is a transmission method executed by a computer system and includes predicting the waveform quality of a restored waveform with respect to a target frame of waveform data. Based on the waveform quality, at least one target range is set as a frequency range assigned to redundant data for generating the restored waveform from the waveform data included in the target frame. The redundant data is generated based on the target range, and transmission data including the redundant data is generated.
  • the receiving device includes a receiving unit and a waveform restoring unit.
  • the receiving unit receives redundant data assigned, based on the waveform quality of the restored waveform with respect to the target frame of the waveform data, to at least one target range of the frequency range of the waveform data included in the target frame.
  • the waveform restoration unit generates the restoration waveform based on the redundant data.
  • the restored waveform may be a waveform generated based on the redundant data and the composite waveform related to the target frame.
  • the receiving unit may receive designated information that specifies a method of synthesizing the composite waveform for each non-target range that is a frequency range other than the target range.
  • the waveform restoration unit may interpolate the redundant data for each non-target range by using the composite waveform generated by the synthesis method specified by the designated information.
  • the receiving method is a receiving method executed by a computer system, and includes receiving redundant data assigned, based on the waveform quality of the restored waveform with respect to the target frame of the waveform data, to at least one target range of the frequency range of the waveform data included in the target frame. The restored waveform is generated based on the redundant data.
  • FIG. 1 is a diagram schematically showing the appearance of a transmission / reception system according to the first embodiment of the present technology.
  • the transmission / reception system 100 includes a transmission device 20 and a reception device 50, and is a system that transmits waveform data from the transmission device 20 to the reception device 50.
  • the waveform data is, for example, data representing a waveform that changes with time.
  • voice data representing a voice waveform is transmitted as waveform data.
  • waveform data is transmitted by wireless communication between the transmission device 20 and the reception device 50.
  • the communication standard for wireless communication is not limited, and a communication standard capable of transmitting waveform data such as Bluetooth (registered trademark) may be appropriately used.
  • the transmission device 20 is, for example, a portable terminal device (a smartphone, a tablet terminal, a portable music player, etc.).
  • the reception device 50 is, for example, a voice reproducing device (wireless headphones, wireless earphones, wireless speakers, etc.).
  • the configurations of the transmitting device 20 and the receiving device 50 are not limited.
  • Waveform data such as voice transmitted from the transmitting device 20 is received by the receiving device 50.
  • the waveform represented by the waveform data is restored and reproduced as sound from the speaker mounted on the receiving device 50.
  • a part of the waveform data transmitted from the transmitting device 20 may not be received by the receiving device 50.
  • wireless communication between the transmitting device 20 and the receiving device 50 may be hindered, and a situation may occur in which waveform data is partially lost.
  • error concealment is executed when such data loss occurs.
  • the error concealment is, for example, a process of compensating for the lost portion when a part of the waveform data transmitted from the transmitting device 20 to the receiving device 50 is lost.
  • FIG. 2 is a schematic diagram for explaining an outline of error concealment.
  • FIG. 2A is a schematic diagram showing a process in which waveform data is transmitted.
  • FIG. 2B is a schematic diagram showing an example of a waveform compensated by error concealment.
  • the transmission device 20 transmits waveform data encoded by using an encoder for transmission. The receiving device 50 decodes the waveform data using a decoder corresponding to the encoder used for coding. Data processing such as coding and decoding is executed for each frame, into which the waveform data is divided along the time axis.
  • the frame is, for example, a processing unit standardized by a coding method.
  • the length of a frame (the division period for dividing the waveform) is set to a period (for example, 10 ms) according to the coding method used in the transmission device 20.
  • in the transmission device 20, the waveform data (voice data) for one frame before encoding is referred to as the original data. Its time waveform is written as x(n), and the frequency spectrum obtained by time-frequency conversion of the original data is written as X(k).
  • the encoded original data is packed in packet 1 for each frame and transmitted.
  • the packet 1 is a data transmission unit between the transmitting device 20 and the receiving device 50.
  • FIG. 2A schematically illustrates three packets 1 transmitted from the transmitting device 20 to the receiving device 50.
  • the packet 1 transmitted from the transmission device 20 corresponds to the transmission data.
  • when packet 1 is not received, the receiving device 50 transmits an error signal indicating the failure.
  • the transmission device 20 that has received the error signal executes a retransmission process for retransmitting the target packet 1. By repeating such processing, it is possible to prevent the loss of the packet 1.
  • a limit is provided on the number of times a packet is retransmitted. A packet 1 that exceeds this limit is discarded, so a missing packet (the center packet 1 in FIG. 2A) occurs. If the voice or the like is reproduced as it is with the packet missing, a discontinuity may occur in the voice signal and be perceived as audible discomfort.
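The retransmit-until-limit behavior can be sketched as follows. This is an illustrative sketch only; the `channel_send` callback (returning True when an acknowledgement arrives) is an assumed interface, not part of the patent or of any real Bluetooth API.

```python
def send_with_retries(channel_send, packet, max_retries):
    """Try the initial transmission plus up to `max_retries`
    retransmissions; beyond the limit the packet is discarded and
    treated as a lost packet that the receiver must conceal."""
    for _ in range(1 + max_retries):
        if channel_send(packet):
            return True   # acknowledged: delivery succeeded
    return False          # retry limit exceeded: packet is lost
```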
  • the transmission device 20 generates a packet 1 including data in which redundant data 3 is added to the main data 2. That is, one packet 1 includes a set of main data 2 and redundant data 3.
  • the main data 2 is a coded version of the original data that is originally desired to be transmitted.
  • the frame of the main data 2 packed in the packet 1 and transmitted will be referred to as a transmission frame.
  • the redundant data 3 is data for one frame separately added to the main data 2 for the purpose of being used for error concealment (PLC in this case).
  • the target frame for generating the redundant data 3 will be referred to as a target frame.
  • the data encoded by using a part of the original data included in the frame near the frame (transmission frame) of the main data 2 is used as the redundant data 3. Therefore, the target frame is a frame in the vicinity of the transmission frame 8 transmitted as the packet 1.
  • one packet 1 includes one main data 2 and one redundant data 3.
  • the main data 2 is, for example, data obtained by encoding the original data of the Mth frame (M) with the encoder for the main data 2.
  • the redundant data 3 is data obtained by encoding the original data of the M + 1th frame (M + 1), which is a frame in the vicinity of the main data 2.
  • the frame (M) is the transmission frame
  • the frame (M + 1) is the target frame.
  • for the redundant data 3, an encoder whose settings (coding method, compression rate, etc.) differ from those of the main-data encoder is used, generally of lower quality (higher compression rate).
  • the data amount of the redundant data 3 is smaller than the data amount of the main data 2.
  • the target frame used for the redundant data 3 is not limited; for example, redundant data 3 for the M+2nd frame (M+2) or the M−1st frame (M−1) may be added. Further, the number of sets of main data 2 (redundant data 3) packed in the packet 1 is not limited. For example, the present technology can be applied even when a packet 1 contains sets of main data 2 and redundant data 3 for a plurality of frames.
  • the packet 1 including the redundant data 3 is sequentially generated and transmitted to the receiving device 50.
  • the receiving device 50 compensates for the main data 2 included in the lost packet (hereinafter referred to as the loss data) by using the corresponding redundant data 3.
  • the loss data is interpolated using the redundant data 3 of the frame (M + 1) that has already been received.
  • Such a PLC method is generally classified into Media-Specific FEC (Forward Error Correction).
  • the loss data can be immediately compensated by using the already received redundant data 3. It is also possible to compensate for the lost data by receiving the necessary redundant data 3 after the packet is lost.
  • FIG. 2B schematically illustrates the waveform restored by the receiving device 50 when a lost packet occurs.
  • the time range shown by the dotted line in the figure is the loss period during which data was lost due to the occurrence of lost packets.
  • the receiving device 50 generates a restored waveform 5 in which the waveform (original waveform 4) represented by the original data in the loss period is restored.
  • interpolated data is generated by interpolating the redundant data 3 corresponding to the loss data using a composite waveform related to the loss data.
  • the composite waveform is a waveform synthesized based on the data (preferably the main data) of a nearby frame normally received on the receiving terminal side for the purpose of using it for error concealment (PLC in this case).
  • the interpolated data is data generated by combining the redundant data 3 and the composite waveform, and is waveform data (audio data) used for the final concealment.
  • the waveform represented by this interpolated data is used as the restored waveform 5. Therefore, it can be said that the restored waveform 5 is a waveform generated based on the redundant data 3 and the composite waveform related to the target frame.
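The combination described above — redundant data inside the coded frequency range, composite waveform everywhere else — can be sketched directly in the spectral domain. This is an illustrative sketch; the bin-index representation of the target range is an assumption.

```python
import numpy as np

def restore_spectrum(redundant_spec, composite_spec, target_range):
    """Build the restored frame's spectrum: inside the target range
    (coded frequency range) use the decoded redundant data; in the
    non-target range fall back on the composite waveform's spectrum."""
    lo, hi = target_range                       # inclusive bin indices
    restored = composite_spec.copy()
    restored[lo:hi + 1] = redundant_spec[lo:hi + 1]
    return restored
```

An inverse time-frequency transform of the result then yields the restored waveform 5 for the lost frame.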
  • the original waveform 4 represented by the original data is shown by a solid line
  • the restored waveform 5 represented by the interpolated data is shown by a dotted line.
  • the transmission device 20 predicts in advance the waveform quality of the restored waveform 5 to be restored by the reception device 50. That is, the waveform quality of the restored waveform 5 with respect to the target frame 7 of the waveform data is predicted. Typically, an index (such as the noise spectrum described later) indicating the waveform quality of the composite waveform is calculated and used as the index of the waveform quality of the restored waveform. Then, the frequency range assigned to the redundant data 3 is set based on the predicted waveform quality.
  • FIG. 3 is a schematic graph showing the original waveform 4 represented by the original data 6 (target frame 7) for one frame.
  • the horizontal axis of the graph is time, and the vertical axis is the amplitude value x(n), where n is an index representing the time within the frame.
  • FIG. 4 is a schematic diagram showing an example of a frequency range of redundant data according to the first embodiment.
  • FIG. 4 shows a schematic graph of the frequency spectrum of the original waveform 4 shown in FIG. 3. The horizontal axis of the graph is the frequency, and the vertical axis is the spectrum value X(k), where k is an index (frequency bin) representing the frequency of each spectral value.
  • redundant data 3 is generated by encoding the frequency components (spectral components) of the original data 6 included in the frequency range assigned to it.
  • the frequency range assigned to the redundant data 3 will be referred to as a coded frequency range 70.
  • the coded frequency range 70 is, for example, a frequency range in which the waveform quality of the composite waveform is low (there is a lot of noise and the like).
  • the coded frequency range 70 corresponds to the target range.
  • the frequency range other than the coded frequency range 70 is the frequency range interpolated by using the composite waveform at the time of packet loss.
  • a frequency range other than the coded frequency range 70 will be referred to as an interpolation range 71.
  • the interpolation range 71 is, for example, a frequency range in which the waveform quality of the composite waveform is high (noise and the like are small). In the present embodiment, the interpolation range 71 corresponds to the non-target range. The method of evaluating the waveform quality and of setting the coded frequency range 70 (interpolation range 71) will be described in detail later.
  • FIG. 5 is a schematic view showing an example of error concealment given as a comparative example.
  • the transmission device 20 encodes only a specific frequency band (the coded frequency range 70), selected on the low-frequency side according to the waveform quality of the composite waveform, of the data of the target frame 7 in the vicinity of the transmission frame of the main data 2 to be originally transmitted, and thereby generates the redundant data 3. The generated redundant data 3 is added to the packet 1 that carries the main data 2.
  • when packet loss occurs, the receiving device 50 generates interpolated data in which the redundant data 3 corresponding to the lost main data 2 is interpolated with a composite waveform. More specifically, the range (interpolation range 71) other than the valid range (coded frequency range 70) of the frequency spectrum of the redundant data 3 is replaced with the frequency spectrum of the composite waveform generated from neighboring frames normally received in the past.
  • the amount of redundant data 3 is as small as possible within the acceptable quality.
  • the amount of redundant data 3 can be further reduced by predicting the waveform quality of the composite waveform in advance with the transmission device 20 and encoding only the band in which the quality falls below a certain level. This makes it possible to realize high-quality error concealment while suppressing the amount of data transmission.
  • one example is a process of replacing the out-of-band spectrum of the redundant data 3 with the spectrum of a nearby frame. This process effectively uses a waveform copied from a nearby frame as the composite waveform. Even such simple, low-computation processing can significantly mitigate sound-quality degradation, such as a muffled or discontinuous sound caused by energy attenuation in the high-frequency range. In other words, a high level of error-concealment quality can be maintained at all times with a small amount of redundant data and without increasing the amount of calculation.
  • FIG. 6 is a schematic diagram showing an example of a waveform data transmission method.
  • a waveform data transmission method including coding and decoding will be described with reference to FIG.
  • data representing an audio signal (input signal) such as voice is assumed.
  • the input signal is divided into frames of N samples, and analysis frames of 2N samples, each overlapping the previous frame by the frame length (N samples), are generated.
  • the analysis frame of this 2N sample is used as the transmission frame 8.
  • in FIG. 6, the time range corresponding to x(n), the data included in the transmission frame 8, is schematically illustrated using arrows.
  • x_prev(n) and x_next(n) are the transmission frames 8 temporally preceding and following x(n) (the previous frame and the next frame, respectively).
  • the original data 6 (input signal) of these transmission frames 8 is subjected to a predetermined analysis window to calculate a time-frequency-converted frequency spectrum.
  • the type of analysis window is not limited.
  • FIG. 6 schematically illustrates the outline of the function representing the analysis window.
  • a modified discrete cosine transform (MDCT) or the like is used for the time-frequency conversion.
  • the frequency spectrum of the transmission frame 8 is encoded, and the encoded data is packed in the packet 1 as the main data 2 and transmitted. At this time, redundant data 3 related to a frame (target frame 7) in the vicinity of the main data 2 is generated and added to the same packet 1.
  • the type and setting of the coding method are not limited.
  • after receiving the transmitted data (packet 1), the receiving device 50 decodes the data and restores the frequency spectrum. For decoding, a decoding method corresponding to the coding method used by the transmission device 20 is used. An inverse modified discrete cosine transform (IMDCT) is applied to the decoded frequency spectrum to calculate a time waveform of 2N samples.
  • FIG. 6 schematically shows y(n), the data included in a received frame. An output signal is generated by applying a synthesis window to each y(n) and overlap-adding it with the preceding and following frames, which have the same positional relationship as on the transmitting side. The output signal reconstructs the same waveform as the input signal.
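The analysis/synthesis chain of FIG. 6 — 2N-sample frames hopped by N samples, MDCT, IMDCT, windowing, overlap-add — can be demonstrated with the standard textbook MDCT and a sine (Princen-Bradley) window. This is a generic sketch of the transform chain, not the patent's specific codec; interior samples, covered by two overlapping frames, reconstruct exactly.

```python
import numpy as np

def mdct(frame, win):
    """MDCT of a 2N-sample windowed frame -> N coefficients."""
    N = len(frame) // 2
    n = np.arange(2 * N)
    k = np.arange(N)
    C = np.cos(np.pi / N * (n[None, :] + 0.5 + N / 2) * (k[:, None] + 0.5))
    return C @ (win * frame)

def imdct(X, win):
    """Inverse MDCT -> windowed 2N-sample time segment (with aliasing
    that cancels on overlap-add of adjacent frames)."""
    N = len(X)
    n = np.arange(2 * N)
    k = np.arange(N)
    C = np.cos(np.pi / N * (n[None, :] + 0.5 + N / 2) * (k[:, None] + 0.5))
    return win * ((2.0 / N) * (C.T @ X))

N = 64
# sine window satisfying the Princen-Bradley condition w[n]^2 + w[n+N]^2 = 1
win = np.sin(np.pi / (2 * N) * (np.arange(2 * N) + 0.5))
rng = np.random.default_rng(0)
x = rng.standard_normal(4 * N)

# analyze 2N-sample frames hopped by N, then overlap-add the synthesized frames
y = np.zeros_like(x)
for start in (0, N, 2 * N):
    y[start:start + 2 * N] += imdct(mdct(x[start:start + 2 * N], win), win)
```

After overlap-add, `y[N:3*N]` matches `x[N:3*N]` to numerical precision; only the first and last half-frames, which lack an overlapping neighbor, remain aliased.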
  • FIG. 7 is a block diagram showing a configuration example of the transmission / reception system 100.
  • the transmission / reception system 100 is a system that transmits waveform data 10 stored as an audio file from the transmission device 20 to the reception device 50 according to, for example, a BLE communication method.
  • the transmission / reception system 100 is designed assuming a use case in which, for example, both the transmission device 20 and the reception device 50 have restrictions on the amount of calculation. Examples of such a configuration include a combination of a transmitting device 20 such as a smartphone or a digital audio player and a receiving device 50 having a limited computing power such as wireless earphones or wireless headphones.
  • this technique can be applied even when a device having sufficient computing power (for example, a PC on the transmitting side and a stationary audio player on the receiving side) is used.
  • the transmission device 20 includes a retransmission timeout time calculation unit 21, a signal processing unit 22, an input buffer 23, a redundant data generation unit 24, a coding unit 25, a mux unit 26, and a transmission buffer 27.
  • the transmission device 20 is configured by using, for example, a computer including a CPU and a memory.
  • when the transmission device 20 executes the program according to the present embodiment and each unit operates, the transmission method according to the present embodiment is executed.
  • the retransmission timeout time calculation unit 21 acquires the parameters determined according to the combination of the transmission device 20 and the reception device 50, and calculates the retransmission timeout time.
  • the retransmission timeout time is a time limit for allowing the receiving device 50 to retransmit the packet 1 that has not been received.
  • the packet 1 that is not received even if the retransmission timeout time is exceeded is processed as a lost packet.
  • the signal processing unit 22 reads the audio data (waveform data 10) required for generating one frame from the audio file, executes predetermined signal processing, and generates the original data 6. For example, MDCT is executed and the frequency spectrum of each frame is calculated as the original data 6. In addition, signal processing for adjusting gain and sound quality may be executed.
  • the input buffer 23 is a buffer that temporarily stores the data processed by the signal processing unit 22.
  • the input buffer 23 stores the original data 6 representing the frequency spectrum and the time waveform of the waveform data. When the capacity of the input buffer 23 is full, the original data 6 having the lowest priority (typically, the oldest original data 6) is discarded.
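The discard policy of the input buffer 23 (drop the lowest-priority entry, typically the oldest, when full) can be sketched with a bounded deque. The class and method names are illustrative, not from the patent.

```python
from collections import deque

class InputBuffer:
    """Bounded frame buffer: when capacity is reached, the oldest
    frame is discarded automatically on the next push."""
    def __init__(self, capacity):
        self.frames = deque(maxlen=capacity)  # deque drops from the left

    def push(self, frame):
        self.frames.append(frame)

    def __len__(self):
        return len(self.frames)
```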
  • the redundant data generation unit 24 reads the original data 6 stored in the input buffer 23 and generates the redundant data 3 and the main data 2.
  • the coded frequency range 70 to which the redundant data 3 is assigned is set based on the waveform quality of the composite waveform. In setting the coded frequency range 70, it is also possible to use information on the quantization settings output from the coding unit 25 described later (for example, the resolution used when quantizing the frequency-spectrum values) and information on the amount of data that can be transmitted (for example, the remaining data capacity of packet 1). The specific configuration and operation of the redundant data generation unit 24 will be described in detail later.
  • the coding unit 25 encodes the main data 2 and the redundant data 3 output from the redundant data generation unit 24 according to the corresponding coding methods, respectively.
  • the main data 2 is encoded with a relatively low compression ratio
  • the redundant data 3 is encoded with a higher compression ratio than the main data 2.
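The asymmetry between the two encoders — fine quantization for the main data 2, coarse quantization (higher compression) for the redundant data 3 — can be illustrated with a simple uniform quantizer. This is a stand-in sketch for the idea only; the patent does not specify this quantizer or these bit depths.

```python
import numpy as np

def quantize(spec, n_bits, peak):
    """Uniform mid-rise quantizer over [-peak, peak): fewer bits means
    a coarser step and larger reconstruction error."""
    levels = 2 ** n_bits
    step = 2 * peak / levels
    q = np.clip(np.round(spec / step), -levels // 2, levels // 2 - 1)
    return q * step

spec = np.linspace(-1.0, 1.0, 9)
fine = quantize(spec, 8, 1.0)    # main data: fine resolution, more bits
coarse = quantize(spec, 3, 1.0)  # redundant data: coarse resolution, fewer bits
```

The coarsely quantized spectrum costs far fewer bits per value but carries a larger quantization noise, which is exactly the trade-off the range setting accounts for when it weighs quantization noise against concealment noise.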
  • the mux unit 26 stores the main data 2 and the redundant data 3 encoded by the coding unit 25 in a predetermined packet 1.
  • the data capacity of packet 1 and the like are set according to the communication method used (here, BLE communication).
  • the transmission buffer 27 is a buffer that temporarily stores the packet 1 generated by the mux unit 26.
  • the packets 1 stored in the transmission buffer 27 are transmitted in a predetermined order via the transmission module (not shown).
  • FIG. 8 is a block diagram showing a configuration example of the redundant data generation unit.
  • the redundant data generation unit 24 includes an original data selection unit 30, a composite waveform generation unit 31, a generated noise calculation unit 32, a coding range setting unit 33, and a coding spectrum selection unit 34.
  • both the data X(k), representing the time-frequency-converted frequency spectrum, and the data x(n), representing the corresponding time waveform, are generated as the original data 6.
  • the original data selection unit 30 selects and acquires necessary data from the original data 6 stored in the input buffer 23. Specifically, data representing the frequency spectrum and time waveform of the original data 6 to be processed is read from the input buffer 23. Further, as shown in FIG. 8, the delivery destination of the data differs depending on the data to be acquired.
• the frequency spectrum of the original data 6 (the original data 6 included in the transmission frame 8) corresponding to the main data 2 is passed through as it is to the coding unit 25 in the subsequent stage.
  • the frequency spectrum and time waveform of the original data 6 (original data 6 included in the target frame 7) corresponding to the redundant data 3 are input to the generated noise calculation unit 32 and the coded spectrum selection unit 34.
  • the frequency spectrum and time waveform of the original data 6 for the composite waveform are input to the composite waveform generation unit 31.
  • the original data 6 for the composite waveform is data included in a frame (for example, a transmission frame 8) in the vicinity of the target frame 7 which is the redundant data 3.
• the composite waveform generation unit 31 generates a composite waveform for the target frame 7 based on the original data 6 for the composite waveform. For example, composite data representing the frequency spectrum of the composite waveform is generated. Alternatively, composite data representing the time waveform of the composite waveform may be calculated. In this case, a process of converting the time waveform into a frequency spectrum or the like may be executed. As described above, in the present disclosure, both generating composite data representing the frequency spectrum of the composite waveform and generating composite data representing the time waveform of the composite waveform are included in generating the composite waveform.
• As a method for generating the composite waveform in the present embodiment, a method is adopted in which the frequency spectrum and the time waveform of the original data 6 for the composite waveform are used as they are as the composite waveform.
  • the data obtained by copying the original data 6 for the composite waveform becomes the composite data.
  • the method of generating the composite waveform is not limited, and for example, a method of appropriately processing the original data 6 for the composite waveform to generate the composite waveform may be used.
  • the composite waveform (composite data) is input to the generated noise calculation unit 32.
  • the generated noise calculation unit 32 acquires the original data 6 corresponding to the redundant data 3 and the composite data representing the composite waveform, and calculates the noise information related to the composite waveform.
  • the noise information is information representing noise generated when data is interpolated using, for example, a composite waveform. For example, the deviation of the composite waveform with respect to the original waveform 4 of the target frame 7 is calculated as the noise of the composite waveform.
  • the generated noise calculation unit 32 calculates the frequency distribution (noise spectrum) of such noise, the total amount of noise, and the like as noise information.
  • the noise information is not only the information indicating the waveform quality of the composite waveform but also the information representing the waveform quality of the restored waveform 5 generated by using the composite waveform.
  • the generated noise calculation unit 32 predicts the waveform quality of the composite waveform as the waveform quality of the restored waveform.
  • the generated noise calculation unit 32 corresponds to the quality prediction unit.
  • the coding range setting unit 33 acquires noise information and sets a frequency range (coding frequency range 70) to be encoded as redundant data 3.
  • one coded frequency range 70 is set on the low frequency side with respect to the original data 6 included in the target frame 7 based on the noise information (waveform quality of the composite waveform).
  • the coded frequency range 70 can be set by using other information indicating the waveform quality of the restored waveform 5.
• Based on the waveform quality of the restored waveform 5, the coding range setting unit 33 sets the coded frequency range 70 as the frequency range allocated, from the original data 6 (waveform data) included in the target frame 7, to the redundant data 3 for generating the restored waveform 5.
• Further, as shown in FIG. 8, information regarding the quantization setting in coding, the data amount of the packet, and the like is input to the coding range setting unit 33.
  • the coded frequency range 70 may be set using this information.
  • the coded spectrum selection unit 34 acquires the frequency spectrum of the original data 6 corresponding to the redundant data 3 and the coded frequency range 70, and extracts the spectrum component to be used as the redundant data 3. Specifically, only the spectral components included in the coded frequency range 70 are extracted from the original data 6. The data representing these spectral components becomes the redundant data 3 before coding. In this way, the coded spectrum selection unit 34 generates the redundant data 3 based on the coded frequency range 70.
  • the redundant data 3 and the main data 2 before encoding are input to the coding unit 25 shown in FIG. 7, and the encoded redundant data 3 and the main data 2 are generated. Then, the mux unit 26 generates a packet 1 (transmission data) including the encoded main data 2 and the redundant data 3.
  • the coded spectrum selection unit 34, the coding unit 25, and the mux unit 26 cooperate to realize a data generation unit that generates transmission data including redundant data.
  • a single coded frequency range 70 is set on the low frequency side of the graph of the frequency spectrum.
  • the maximum frequency k max of the coded frequency range 70 is set based on the above-mentioned noise information (noise spectrum).
• As a method by which the receiving device 50 generates a composite waveform for the target frame 7 (redundant data 3), a method of copying the original data 6 one frame before the target frame 7 is adopted.
  • the redundant data 3 is the data related to x (n) shown in FIG.
• By limiting the coded frequency range 70 to the low frequency side in this way, information for designating the coded frequency range 70 (for example, the meta information used in the embodiments described later) becomes unnecessary, and a situation in which the data amount of the redundant data 3 increases can be avoided. Further, by adopting a method of copying the original data 6 of one frame before as the waveform synthesis method, it is possible to reduce the amount of calculation on the receiving device 50 side.
  • FIG. 9 is a flowchart showing an example of the generation process of the redundant data 3.
  • the process shown in FIG. 9 is an example of the process executed by the redundant data generation unit 24 and the coding unit 25. This process is, for example, a loop process that is executed every time packet 1 is generated.
• the original data selection unit 30 acquires the original data 6 to be processed from the input buffer 23 (step 101). Specifically, the original data 6 of the transmission frame 8 for generating the main data 2, the original data 6 of the target frame 7 for generating the redundant data 3, and the original data 6 for generating the composite waveform are read from the input buffer 23.
  • the transmission frame 8 is the Mth frame (M)
  • the target frame 7 is the M + 1th frame (M + 1) (see FIG. 2A and the like).
• the composite waveform is generated from the original data 6 of the frame immediately before the target frame 7. Therefore, the original data 6 for the composite waveform is the main data 2 itself. If the target frame 7 (redundant data 3) is not the frame immediately after the transmission frame 8 (main data 2), the original data 6 one frame before the target frame 7 in chronological order is read, separately from the main data 2, as the original data 6 for the composite waveform.
  • the composite waveform generation unit 31 executes a composite waveform generation process for generating a composite waveform for the target frame 7 based on the original data 6 for the composite waveform (step 102).
  • the composite waveform generation unit 31 generates a composite waveform for the target frame 7 based on the original data 6 (main data 2) included in the transmission frame 8. It is desirable that the method for generating the composite waveform is exactly the same as the waveform synthesis method used in the receiving device 50.
  • a method of copying the original data 6 one frame before the target frame 7 (redundant data 3) is used. Therefore, the process by the composite waveform generation unit 31 is a process of passing through the main data 2 which is the original data 6 for the composite waveform as it is.
  • a composite waveform is appropriately generated based on the original data 6 for the composite waveform according to the set method.
  • the generated noise calculation unit 32 executes a generated noise prediction process for predicting the noise generated by using the composite waveform (step 103). Specifically, by using the composite waveform, the frequency spectrum (hereinafter, referred to as noise spectrum) of the noise generated with respect to the original waveform (original waveform 4) is calculated.
  • the noise spectrum is typically calculated as a power spectrum representing the intensity (power) of noise for each frequency.
  • the noise spectrum can also be used as a measure of the waveform quality of the composite waveform. That is, it can be said that the generated noise prediction process is a process for predicting the waveform quality (noise spectrum) of the composite waveform.
  • the noise spectrum 13 representing the waveform quality of the composite waveform is calculated based on the composite waveform 11 and the original waveform 4 represented by the original data 6 included in the target frame 7.
  • FIG. 10 is a schematic diagram showing a calculation example of a noise spectrum.
  • 10A to 10C are schematic graphs showing an example of a time waveform for one frame of the original waveform 4, the composite waveform 11, and the difference waveform 12.
  • FIG. 10D is a schematic graph showing an example of the noise spectrum 13.
  • a method of calculating the noise spectrum 13 by using the original waveform 4 which is a time waveform and the composite waveform 11 will be described.
• the generated noise calculation unit 32 reads the original waveform x (n) represented by the original data 6 used for the redundant data 3 and the composite waveform x'(n) calculated by the composite waveform generation unit 31, and calculates the difference waveform 12 representing the difference between the waveforms.
• Specifically, the original waveform 4 (original data 6) shown in FIG. 10A and the composite waveform 11 (composite data) shown in FIG. 10B are read.
  • the composite waveform 11 is, for example, the waveform of the previous frame of the target frame 7 including the original waveform 4, and therefore does not completely match the shape of the original waveform 4.
  • the difference waveform 12 (x (n) ⁇ x'(n)) between the original waveform 4 and the composite waveform 11 is calculated.
  • the difference waveform 12 is a waveform representing the difference between the original waveform 4 and the composite waveform 11 at each timing n.
  • FIG. 10C schematically shows a difference waveform 12 between the original waveform 4 and the composite waveform 11 shown in FIGS. 10A and 10B.
  • Time-frequency conversion (for example, Fourier transform) is executed on the difference waveform 12, and the frequency spectrum of the difference waveform 12 is calculated.
  • the power spectrum representing the absolute value of this frequency spectrum is calculated as the noise spectrum 13 (P noise (k)).
• P noise (k) is expressed using the following equation:

P noise (k) = |F[x(n) − x'(n)]|^2 ... (Equation 1)
  • F in (Equation 1) represents a Fast Fourier Transform (FFT) for the difference waveform 12.
  • the noise spectrum 13 is a spectrum obtained by frequency-converting the difference between the original waveform 4 and the composite waveform 11. This makes it possible to evaluate the noise generated in the actual time waveform for each frequency, and it is possible to accurately predict the waveform quality of the composite waveform 11.
  • FIG. 10D schematically shows a power spectrum obtained by Fourier transforming the difference waveform 12 shown in FIG. 10C.
  • the noise spectrum 13 (P noise (k)) is a frequency spectrum representing the intensity of the difference between the original waveform 4 and the composite waveform 11 for each frequency k.
  • P noise (k) is a frequency spectrum representing the intensity of the difference between the original waveform 4 and the composite waveform 11 for each frequency k.
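• The difference-and-FFT computation of (Equation 1) can be sketched as follows; the function name, the use of NumPy, and the toy signals are illustrative assumptions, not part of the disclosed embodiment.

```python
import numpy as np

def noise_spectrum(x, x_synth):
    # Difference waveform x(n) - x'(n) between the original and composite waveforms
    diff = x - x_synth
    # Time-frequency conversion (FFT) of the difference waveform
    spectrum = np.fft.rfft(diff)
    # Power per frequency bin: the noise spectrum P_noise(k)
    return np.abs(spectrum) ** 2

# Toy example: the "composite" waveform is the original with a phase offset,
# so the prediction error is concentrated at the signal's own frequency bin.
n = np.arange(256)
x = np.sin(2 * np.pi * 8 * n / 256)
x_synth = np.sin(2 * np.pi * 8 * n / 256 + 0.3)
p_noise = noise_spectrum(x, x_synth)
```

On this toy input the noise power peaks at bin 8, the frequency of the sinusoid itself, matching the intuition that the composite waveform fails exactly where the signal lives.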
  • the composite waveform 11 can be regarded as a waveform deviated from the original waveform 4, and the quality of the composite waveform can be regarded as low.
  • the noise spectrum 13 is preferably calculated by applying the analysis window w (n).
• the noise spectrum 13 is calculated using the following equation instead of the equation (Equation 1):

P noise (k) = |F[w(n) · (x(n) − x'(n))]|^2 ... (Equation 2)
  • the analysis window w (n) is appropriately set according to the process of interpolating the data using, for example, the composite waveform 11. This makes it possible to accurately predict the noise that actually occurs by using the composite waveform.
  • the method of calculating the noise spectrum 13 by executing the FFT after taking the difference between the two time waveforms (the original waveform 4 and the composite waveform 11) has been described. Instead of this, it is also possible to calculate the noise spectrum 13 using the frequency spectra of the original waveform 4 and the composite waveform 11.
  • the generated noise calculation unit 32 reads the frequency spectrum X (k) of the original waveform 4 and the frequency spectrum X'(k) of the composite waveform 11 to calculate a difference spectrum representing the difference between the respective spectra.
• As X (k) and X'(k), for example, the MDCT spectra obtained by MDCT-converting x (n) and x'(n) (for example, the original data 6 of the main data 2) in advance in the signal processing unit 22 can be used.
• In this case, the noise spectrum 13 is a spectrum representing the difference between the spectrum of the original waveform 4 and the spectrum of the composite waveform 11, and is calculated, for example, as follows:

P noise (k) = |X(k) − X'(k)|^2 ... (Equation 3)
• In this case, the original data 6 for the composite waveform (here, the main data 2) that has already been MDCT-converted may be read directly. Therefore, it is not necessary to perform the time-frequency conversion process (FFT or the like) again in order to calculate the noise spectrum 13, and the MDCT spectrum X'(k) used when generating the main data 2 can be reused. As a result, the amount of calculation can be sufficiently suppressed.
  • the moving average of the spectra calculated using (Equation 1) to (Equation 3) may be calculated as the noise spectrum 13.
  • the moving average is a process of moving a predetermined bin range (for example, bins for three spectra) and calculating the average of each spectrum value included in the bin range.
• the noise spectrum 13 (P noise-smoothed (k)) calculated by the moving average is expressed using the following equation (here for a bin range of three spectra):

P noise-smoothed (k) = (P noise (k−1) + P noise (k) + P noise (k+1)) / 3 ... (Equation 4)
  • the noise spectrum 13 can be smoothed, and the data processing in the subsequent stage can be easily executed.
• a spectrum obtained by smoothing the noise spectrum 13 calculated by using (Equation 1) or the like (P noise-smoothed (k)) is also included in the noise spectrum 13.
• Hereinafter, P noise-smoothed (k) will be described simply as P noise (k).
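• The moving average described above can be sketched as below; the three-bin window and the shrink-at-the-edges normalization are assumptions made for illustration.

```python
import numpy as np

def smooth_noise_spectrum(p_noise, width=3):
    # Moving average over a small bin range (three bins by default).
    kernel = np.ones(width)
    # Per-bin count of contributing neighbours, so that edge bins average
    # over a shrunken window instead of averaging in zero padding.
    counts = np.convolve(np.ones_like(p_noise), kernel, mode="same")
    return np.convolve(p_noise, kernel, mode="same") / counts

p = np.array([0.0, 3.0, 6.0, 3.0, 0.0])
p_smoothed = smooth_noise_spectrum(p)  # [1.5, 3.0, 4.0, 3.0, 1.5]
```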
  • the coding range setting unit 33 executes the coding range setting process for setting the coding frequency range 70 (step 104).
  • the coding frequency range 70 is set based on the noise spectrum 13 (P noise (k)) calculated by the generated noise calculation unit 32.
  • a frequency range in which noise is large is calculated from the noise spectrum 13, and is set to a coded frequency range 70 to be assigned to the redundant data 3.
  • the frequency range assigned to the redundant data 3 is a frequency range in which the waveform quality of the composite waveform is low. In this way, by using the noise spectrum, it is possible to accurately set the frequency range to be assigned to the redundant data 3 (that is, the composite waveform should not be used).
  • the coding range setting process will be described in detail later.
• the coded spectrum selection unit 34 executes a coded spectrum selection process for extracting only the spectral components corresponding to the coded frequency range 70 from the original data 6 of the target frame 7 (step 105).
• For example, the original data 6 representing the MDCT spectrum X (k) of the target frame 7 is input to the coded spectrum selection unit 34.
• Of the spectral components (frequency components) included in X (k), the components included in the coded frequency range 70 are extracted.
  • the data including the extracted components becomes the redundant data 3 before coding. Therefore, the data amount of the redundant data 3 before coding changes according to the width of the coding frequency range 70 (the state of the noise spectrum 13).
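• In the simplest case of a single range starting at the lowest frequency, the spectrum selection step reduces to a slice of the spectrum array; the helper name below is hypothetical.

```python
def select_coded_spectrum(X, redun_area):
    # Keep only the spectral components in the coded frequency range
    # [0, redun_area]; the remaining components are left to the
    # composite waveform on the receiving side.
    return X[: redun_area + 1]

# 10-bin toy spectrum; only bins 0..3 become the pre-coding redundant data.
redundant = select_coded_spectrum(list(range(10)), 3)
```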
• When the redundant data 3 is extracted, it is determined whether or not the original data 6 to be processed remains (step 106). Specifically, it is determined whether or not redundant data 3 to be packed in one packet 1 remains. For example, when the original data 6 to be processed (that is, the redundant data 3 to be generated) remains (YES in step 106), the processes of steps 101 to 105 are executed for the remaining original data 6.
  • the configuration of the main data 2 and the redundant data 3 packed in one packet 1 is not limited. For example, in the example described with reference to FIG. 2A, packet 1 including one data set (main data 2 and redundant data 3) is generated. In this case, the loop processing of step 106 is not executed. On the other hand, when a plurality of data sets are packed in one packet 1, the processes up to step 105 are executed for the number of redundant data 3 to be generated.
  • the coding process for encoding the redundant data 3 is executed (step 107).
• the unencoded redundant data 3 (the spectral components included in the coded frequency range 70) generated in the above process is encoded by a predetermined coding method, and the encoded redundant data 3 is generated.
• In the coding process, it is possible to adjust the data amount of the encoded redundant data 3 by appropriately setting the compression rate (bit rate) and the like at the time of coding. For example, when a target data amount of the redundant data 3 is set, the redundant data 3 is encoded with a compression rate that fits within the target data amount. Alternatively, the compression rate of the redundant data 3 may be fixed. In this case, the amount of the encoded redundant data 3 varies depending on the width of the coded frequency range 70 and the like.
  • the coding of the main data 2 is executed separately from the coding of the redundant data 3.
  • the target data amount of the main data 2 is set (step 108). Specifically, the free space of the packet 1 is calculated from the data amount of the encoded redundant data 3.
  • the free capacity of the packet 1 is, for example, the capacity obtained by subtracting the total amount of the encoded redundant data 3 from the data size of the packet 1.
  • the free space of the packet 1 is set as the target data amount of the main data 2. For example, when encoding the main data 2, the compression rate or the like is appropriately set so that the amount of the encoded main data 2 fits within the target data amount set here.
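• The step-108 bookkeeping amounts to a simple subtraction; the function and parameter names below are illustrative only.

```python
def main_data_budget(packet_size_bits, encoded_redundant_sizes_bits):
    # Free capacity of the packet after the encoded redundant data is stored,
    # used as the target data amount when encoding the main data.
    return packet_size_bits - sum(encoded_redundant_sizes_bits)

# e.g. a 2048-bit packet carrying two 300-bit encoded redundant blocks
budget = main_data_budget(2048, [300, 300])  # 1448 bits remain for the main data
```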
  • FIG. 11 is a schematic diagram showing a calculation example of the coded frequency range.
  • FIG. 11 shows a schematic graph showing a noise spectrum 13 (P noise (k)). The horizontal axis of the graph is the frequency, and the vertical axis is the spectrum value of P noise (k).
• the coding range setting unit 33 sets the integration range 72 for calculating the integrated value of the noise spectrum 13, and sets the coded frequency range 70 based on the minimum integration range 72 in which the integrated value exceeds the first threshold value.
  • the integrated value of the noise spectrum 13 is a value representing the total amount of noise in the target frequency range. Therefore, it can be said that the coded frequency range 70 is set to a frequency range in which the total noise amount is about the first threshold value.
• Hereinafter, an example in which the minimum integration range 72 whose integrated value exceeds the first threshold value is set as the coded frequency range 70 will be described.
  • the total noise amount P noise-sum in the preset total calculation range 73 is calculated.
  • the total calculation range 73 is schematically shown as a dotted line range.
  • the minimum index (lowest frequency) of the total calculation range 73 is set to 0.
• the maximum index (maximum frequency) of the total calculation range 73 can be arbitrarily set to a value less than N, that is, half of the total number 2N of FFT indexes (the FFT length).
  • the maximum index of the total calculation range 73 is described as total_area.
• the total noise amount P noise-sum in the total calculation range 73 is expressed using the following equation:

P noise-sum = Σ_{k=0}^{total_area} P noise (k) ... (Equation 5)
• Next, the integration range 72 is set, and the total noise amount P noise-redun-sum in the integration range 72 is calculated.
  • the integration range 72 is schematically shown as a solid line range.
  • the minimum index (lowest frequency) of the integration range 72 is set to 0.
  • the maximum index (maximum frequency) of the integration range 72 is set to be included in the total calculation range 73.
  • redun_area the maximum index of the integration range 72 will be referred to as redun_area.
• the total noise amount P noise-redun-sum in the integration range 72 is expressed using the following equation:

P noise-redun-sum = Σ_{k=0}^{redun_area} P noise (k) ... (Equation 6)
• Then, the minimum integration range 72 is calculated in which the total noise amount P noise-redun-sum in the integration range 72 expressed by the equation (Equation 6) becomes equal to or more than a predetermined ratio α (for example, 0.7) of the total noise amount P noise-sum in the total calculation range 73. That is, the minimum value of redun_area that satisfies the following equation is calculated:

α × P noise-sum ≤ P noise-redun-sum ... (Equation 7)
  • the left side of the equation (Equation 7) (the product of the predetermined ratio ⁇ and the total noise amount P noise-sum in the total calculation range 73) corresponds to the above-mentioned first threshold value.
  • the minimum value of redun_area satisfying the equation (Equation 7) is set to the maximum index (maximum frequency) of the coded frequency range 70. Further, as described with reference to FIG. 4, the minimum index (minimum frequency) of the coding frequency range 70 is set to 0.
  • the coded frequency range 70 is set to the frequency range according to the distribution of noise due to the composite waveform. This makes it possible to allocate only the necessary frequency range to the redundant data 3, as compared with the case where, for example, the entire frequency region or a fixed fixed region of the original data 6 is allocated to the redundant data 3. As a result, it is possible to reduce the amount of data without degrading the quality of the redundant data 3.
• If the quantization noise associated with the coding of the redundant data 3 is ignored, the noise P noise-residue remaining in the region not retained as the redundant data 3 satisfies the following equation:

P noise-residue ≤ (1 − α) × P noise-sum ... (Equation 8)

• That is, the noise P noise-residue remaining in the region not retained as the redundant data 3 is reduced to (1 − α) or less of the noise originally existing in the total calculation range 73.
  • the total calculation range 73 is set based on the human audible range.
  • P noise-sum is the total amount of noise due to the composite waveform 11 in the frequency range that can be heard by humans.
  • the above method is a method of setting the encoded frequency range 70 of the redundant data so as to reduce the noise amount by a predetermined ratio ⁇ in the total amount of noise in such an audible range.
• Note that the larger the maximum index redun_area is set, the larger the coded frequency range 70 becomes, and the quantization noise described later may increase. For example, when the target data amount of the redundant data 3 is predetermined, the larger the coded frequency range 70, the larger the quantization noise.
  • FIG. 12 is a flowchart showing an example of the coding range setting process.
  • the process shown in FIG. 12 is an example of the coding range setting process described with reference to FIG.
  • P noise (k) is read by the coding range setting unit 33 (step 201).
  • the total noise amount P noise-sum in the total calculation range 73 is calculated (step 202).
  • the maximum index redun_area of the integration range 72 is initialized (step 203).
  • the redun_area functions as a variable used to set the coding frequency range 70.
• Specifically, 0 is assigned to redun_area. Next, the total noise amount P noise-redun-sum in the integration range 72 up to redun_area is calculated using the equation (Equation 6) (step 204).
• Next, it is determined whether or not P noise-redun-sum satisfies the condition shown in the equation (Equation 7) (step 205). That is, by comparing P noise-redun-sum and P noise-sum, it is determined whether or not P noise-redun-sum is α (0 < α < 1) times P noise-sum or more.
• If the condition of the equation (Equation 7) is satisfied (YES in step 205), redun_area is set to the maximum value (maximum index) of the coded frequency range 70 (step 208). If the condition of the equation (Equation 7) is not satisfied (NO in step 205), redun_area is incremented, that is, 1 is added to redun_area (step 206).
• Next, it is determined whether or not the incremented redun_area is within the total calculation range 73 (step 207). That is, it is determined whether or not redun_area is smaller than total_area (the maximum index of the total calculation range 73). If redun_area is smaller than total_area (YES in step 207), the processes after step 204 are repeated. If redun_area is greater than or equal to total_area (NO in step 207), step 208 is executed.
  • the minimum frequency of the integration range 72 is set to the minimum frequency of the noise spectrum 13, and the integration value is calculated by changing the maximum frequency of the integration range. This makes it possible to easily calculate redundant data 3 that suppresses noise in a frequency range that is easily heard by humans.
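• The flowchart of FIG. 12 (steps 201 to 208) can be sketched as the loop below, using the sums of (Equation 5) and (Equation 6) and the condition of (Equation 7); variable names follow the text, everything else is an illustrative assumption.

```python
import numpy as np

def set_coding_range(p_noise, alpha=0.7, total_area=None):
    # total_area: maximum index of the total calculation range
    if total_area is None:
        total_area = len(p_noise) - 1
    # Total noise amount in the total calculation range (Equation 5, step 202)
    p_noise_sum = p_noise[: total_area + 1].sum()
    redun_area = 0                                      # step 203
    while redun_area < total_area:                      # step 207
        # Noise integrated over [0, redun_area] (Equation 6, step 204)
        p_noise_redun_sum = p_noise[: redun_area + 1].sum()
        if p_noise_redun_sum >= alpha * p_noise_sum:    # (Equation 7), step 205
            break
        redun_area += 1                                 # step 206
    return redun_area                                   # step 208

# Noise concentrated at low frequencies: 70% of the total is reached at bin 1.
p = np.array([4.0, 3.0, 1.0, 1.0, 1.0])
k_max = set_coding_range(p, alpha=0.7)
```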
• In the above processing, the coded frequency range 70 is set focusing mainly on the noise (noise spectrum 13) generated by using the composite waveform 11.
• Meanwhile, the interpolated data may also include quantization noise associated with the coding of the redundant data 3.
  • the original data 6 is quantized according to the set compression rate and the like. At this time, the higher the compression rate and the lower the quantization accuracy (bit rate, etc.), the smaller the amount of data, but on the other hand, the larger the quantization noise.
  • the target data amount (hereinafter referred to as nbit) of the redundant data 3 is predetermined so that the size of the redundant data 3 does not become larger than necessary with respect to the main data 2.
  • the compression rate or the like is set so that the amount of the redundant data 3 after encoding is contained in nbit. Therefore, for example, when the coding frequency range 70 is large, the compression rate is set high and the quantization noise may increase.
  • FIG. 13 is a schematic diagram for explaining the total noise amount of the interpolated data.
  • 13A and 13B are schematic graphs showing the frequency distribution of noise included in the interpolated data.
  • the horizontal axis of the graph is the frequency, and the vertical axis is the noise intensity at each frequency.
• the coded frequency range 70 set on the low frequency side is a region where the redundant data 3 is used, and the noise of the interpolated data there is represented by the quantization noise N q (k).
  • the frequency side higher than the coded frequency range 70 is a region where the composite waveform 11 is used, and the noise of the interpolated data is represented by using the noise P noise (k) due to the composite waveform.
• In FIG. 13B, a coded frequency range 70 wider than that in FIG. 13A is set. In this case, the total amount of P noise (k) decreases and the total amount of N q (k) increases.
  • FIG. 13C is a graph showing the relationship between the total noise amount of the interpolated data and the coded frequency range.
• FIG. 13C shows graphs of the total amount P noise of noise due to the composite waveform, the total amount N q of quantization noise, and the total noise amount (P noise + N q) of the interpolated data, respectively, as the maximum index (redun_area) of the coded frequency range 70 is changed.
• As redun_area shifts to the high frequency side, P noise decreases but N q increases. That is, P noise and N q have a trade-off relationship with respect to redun_area. Therefore, the total noise amount (P noise + N q) of the interpolated data is represented by a downwardly convex graph as shown in FIG. 13C, and takes a minimum value when redun_area is set to a certain frequency.
  • the coding range setting process in consideration of the quantization noise will be described.
• In this process, for the frequency range assigned to the redundant data 3, instead of P noise (k), the quantization noise N q (k) is simply calculated from information such as the quantization accuracy determined according to the coding method of the redundant data 3.
  • the coding frequency range 70 is set so that the noise power (total noise amount) in the entire interpolated data is minimized.
  • the target data amount of the redundant data 3 is set.
  • the target data amount (nbit) and the information of the coding method used for coding the redundant data 3 are input to the coding range setting unit 33.
  • the quantization noise N q (k) generated for each frequency is calculated.
  • a value obtained by estimating the quantization noise N q (k) is calculated according to the coding method of the redundant data 3.
• In this case, the total noise amount P noise-residue of the interpolated data is expressed using the following equation:

P noise-residue = Σ_{k=redun_area+1}^{total_area} P noise (k) + Σ_{k=0}^{redun_area} N q (k) ... (Equation 9)

• That is, P noise-residue is expressed as the sum of the total amount of noise remaining in the range other than the coded frequency range 70 (the total amount of noise due to the composite waveform 11) and the total amount of the quantization noise N q (k) in the coded frequency range 70.
• Then, the redun_area that minimizes the P noise-residue shown in the equation (Equation 9) is calculated and set as the maximum index of the coded frequency range 70.
• In this way, based on the noise spectrum 13 and the quantization noise N q (k) accompanying the coding of the redundant data 3, the coding range setting unit 33 calculates the total noise amount P noise-residue of the interpolated data in which the redundant data 3 is supplemented by the composite waveform 11, and sets the coded frequency range 70 so that the total noise amount P noise-residue is minimized. This makes it possible to minimize the sum of the noise over all bands important for hearing. As a result, the quality of the interpolated data can be sufficiently improved.
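• A brute-force sketch of choosing redun_area to minimize the (Equation 9) total is shown below; the per-bin quantization-noise estimate n_q is assumed to be supplied by the coding-method model described in the text, and the scan over all candidates is one simple way to find the minimum for a single frame.

```python
import numpy as np

def best_redun_area(p_noise, n_q, total_area):
    # Scan all candidate maximum indices and keep the one that minimizes
    # P_noise-residue = residual composite-waveform noise + quantization noise.
    best_k, best_residue = 0, np.inf
    for redun_area in range(total_area + 1):
        residue = (p_noise[redun_area + 1 : total_area + 1].sum()
                   + n_q[: redun_area + 1].sum())      # (Equation 9)
        if residue < best_residue:
            best_k, best_residue = redun_area, residue
    return best_k

p = np.array([5.0, 4.0, 1.0, 1.0])   # noise due to the composite waveform
nq = np.array([1.0, 1.0, 1.0, 1.0])  # estimated quantization noise per bin
k_best = best_redun_area(p, nq, total_area=3)
```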
  • the coding frequency range 70 is set based on the intensity (power) of the spectral component that becomes the redundant data 3.
  • the target power P target is set as a threshold value for the total power in the frequency range to be used as the redundant data 3.
  • the P target is calculated using, for example, a table in which the P target corresponding to the target data amount (nbit) of the redundant data 3 is recorded, a calculation formula for calculating the P target according to the target data amount, or the like. The following conditional expression is set based on this P target.
  • the maximum redun_area satisfying (Equation 10) is calculated and set as the maximum index of the coded frequency range 70. That is, the largest frequency range in which the total power of the redundant data 3 is less than the target power P target is set as the coded frequency range 70. By using this method, the coded frequency range 70 can be easily set.
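  • A minimal sketch of this selection (our own illustration; the function name and test values are assumptions, not from the patent) accumulates spectral power until the target P target would be reached:

```python
# Sketch of (Equation 10): choose the largest boundary index whose cumulative
# spectral power stays below the target power p_target.

def max_range_under_target(power, p_target):
    """Largest redun_area such that the total power below it is < p_target."""
    total, redun_area = 0.0, 0
    for p in power:
        if total + p >= p_target:
            break  # adding this bin would reach or exceed the target power
        total += p
        redun_area += 1
    return redun_area
```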
  • the threshold thresh (k), set for each frequency based on human auditory characteristics, is subtracted from the value of each frequency distribution.
  • This thresh (k) is set using, for example, a loudness curve showing the frequency distribution of the volume that humans can hear. If the value after subtraction becomes negative, it is set to 0.
  • Alternatively, a process of weighting the values of each frequency distribution according to thresh (k) may be executed. In this way, by adding corrections according to human auditory characteristics, it is possible to avoid a situation in which noise components that would otherwise be inaudible are counted. As a result, the coded frequency range 70 can be set appropriately.
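  • The subtraction with clamping described above can be sketched as follows (our own illustration; the threshold values are placeholders, not taken from any loudness curve in the patent):

```python
# Sketch of the auditory correction: subtract the per-frequency threshold
# thresh(k) and clamp negative results to 0 so inaudible noise is not counted.

def apply_hearing_threshold(values, thresh):
    """Return values(k) - thresh(k), clamped at zero."""
    return [max(v - t, 0.0) for v, t in zip(values, thresh)]
```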
  • the receiving device 50 includes a communication controller 51, a receiving buffer 52, a demux unit 53, a main data buffer 54, a redundant data buffer 55, a reproduction data selection unit 56, a decoding unit 57, a signal processing unit 58, and an audio DAC 59.
  • the receiving device 50 is configured by using, for example, a computer including a CPU and a memory. When the receiving device 50 executes the program related to the present embodiment and each unit operates, the receiving method according to the present embodiment is executed.
  • the communication controller 51 monitors, for example, BLE communication and controls the communication state.
  • the communication controller 51 generates packet loss information, for example, when packet 1 is lost. Based on this information, error concealment in the receiving device 50 is started.
  • the reception buffer 52 is a buffer that receives the packet 1 transmitted from the transmission device 20 and temporarily stores the packet 1.
  • the packet 1 includes the encoded main data 2 and the redundant data 3.
  • the main data 2 is data in which the original data 6 included in the transmission frame 8 is encoded in the transmission device 20.
  • the redundant data 3 is data in which the spectral components of the coded frequency range 70 in the original data 6 included in the target frame 7 are encoded. That is, the reception buffer 52 receives the redundant data 3 assigned to the coded frequency range 70, which is set within the frequency range of the waveform data (original data 6) included in the target frame 7 based on the waveform quality of the restored waveform 5 for the target frame 7 of the waveform data.
  • the reception buffer 52 corresponds to the receiving unit.
  • the demux unit 53 appropriately reads the packet 1 stored in the reception buffer 52 and separates it into the encoded main data 2 and the encoded redundant data 3. A data number (frame ID), used to identify the data in the reproduction data selection unit 56 described later, is added to each separated piece of data. Further, the demux unit 53 queries the communication controller 51 for packet loss information. When packet loss occurs, the packet loss information is output to each subsequent unit.
  • the main data buffer 54 and the redundant data buffer 55 are buffers that temporarily store the encoded main data 2 and the redundant data 3 separated by the demux unit 53.
  • the reproduction data selection unit 56 reads data to be reproduced (hereinafter, referred to as reproduction data) from the main data buffer 54 or the redundant data buffer 55.
  • the playback data is selected in chronological order so that the frames are played back in an appropriate order. Further, the reproduction data selection unit 56 notifies the signal processing unit 58 of the presence / absence of the reproduction data.
  • the decoding unit 57 reads the reproduction data (main data 2 or one or more redundant data 3) selected by the reproduction data selection unit 56, and decodes each data according to the corresponding coding method.
  • the signal processing unit 58 performs signal processing on the data (main data 2 or redundant data 3) decoded by the decoding unit 57 and generates digital data representing the final time waveform. For example, when there is no packet loss and the main data 2 is properly received, frequency-time conversion (for example, IMDCT) is executed on the decoded main data 2. When packet loss occurs and the main data 2 to be reproduced does not exist, interpolated data is generated based on the corresponding redundant data 3 and the data for generating the composite waveform (for example, the main data 2 of a frame near the lost data). Processing such as frequency-time conversion is then executed on this interpolated data.
  • As the method of generating the composite waveform, it is assumed here that a method of copying the original data 6 (main data 2) of the frame one frame before the target frame 7 is used.
  • the audio DAC 59 performs digital-analog conversion on the digital data processed by the signal processing unit 58 to generate an analog audio signal.
  • This audio signal is input to a reproduction element such as a speaker (not shown), and the sound of the audio file (waveform of waveform data 10) is reproduced.
  • FIG. 14 is a block diagram showing a configuration example of the signal processing unit 58 included in the receiving device 50.
  • the signal processing unit 58 includes a spectrum replacement unit 60, a spectrum buffer 61, an IMDCT unit 62, and a time signal output unit 63.
  • the spectrum replacement unit 60 executes a spectrum component replacement process on the redundant data 3 decoded by the decoding unit 57 in the previous stage. If no packet loss has occurred, the decoded main data 2 is acquired and output as it is to the subsequent stage.
  • the data output from the spectrum replacement unit 60 will be referred to as spectrum data.
  • information for specifying the presence / absence of redundant data 3 and information for specifying a method for replacing spectrum components are input to the spectrum replacement unit 60.
  • the information for specifying the replacement method includes information for specifying the data used for the replacement and information for specifying the frequency range for the replacement. Based on this information, the spectral component replacement process is executed.
  • when packet loss occurs, the redundant data 3 for the lost frame (target frame 7) and the spectrum data of the frame one frame before the target frame 7, which is stored in the spectrum buffer 61, are input to the spectrum replacement unit 60.
  • the spectrum data one frame before is the composite data representing the spectrum of the composite waveform 11.
  • the spectrum replacement unit 60 replaces the spectral components of the interpolation range 71, that is, the range other than the coded frequency range 70 assigned to the redundant data 3, using the spectrum data (composite data) one frame before, and outputs new spectrum data (interpolated data).
  • In this way, the spectrum replacement unit 60 and the spectrum buffer 61 generate the composite waveform 11 for the target frame 7 and produce the interpolated data in which the redundant data 3 is interpolated with the composite waveform 11.
  • the waveform represented by the interpolated data is the restored waveform 5.
  • the spectrum data output from the spectrum replacement unit 60 becomes the decoded main data 2 when the packet loss does not occur, and becomes the interpolated data when the packet loss occurs.
  • These spectral data are appropriately stored in the spectral buffer 61 in case the subsequent frames need to be replaced with spectral components.
  • the spectrum replacement unit 60 and the spectrum buffer 61 work together to realize a waveform restoration unit that generates a restoration waveform based on redundant data.
  • the IMDCT unit 62 executes the IMDCT on the spectrum data (decoded main data 2 or interpolated data) output from the spectrum replacement unit 60. As a result, the data representing the time waveform is restored in frame units.
  • the time signal output unit 63 applies a synthesis window (see FIG. 6) to the result of the IMDCT and executes overlap-add with the result of the previous IMDCT. This makes it possible to reconstruct a digital audio signal (digital data) that is continuous in time. The result of the overlap-add is output to the audio DAC in the subsequent stage.
  • FIG. 15 is a flowchart showing an example of the operation of the signal processing unit 58.
  • This process is a loop process that is continuously executed on a frame-by-frame basis.
  • It is assumed that the signal processing unit 58 (spectrum replacement unit 60) can acquire the spectrum of the main data 2 or the redundant data 3 whose decoding was completed in the preceding stage.
  • the spectrum replacement unit 60 determines whether or not the acquired data is the main data 2 (step 301).
  • if it is the main data 2, the main data 2 is stored in the spectrum buffer 61 (step 306).
  • if it is not, the spectrum replacement unit 60 acquires the corresponding redundant data 3.
  • the spectrum of the preprocessing result is acquired from the spectrum buffer 61 (step 302).
  • the processing result one frame before, that is, the main data 2 one frame before is acquired.
  • the spectrum of the preprocessing result is an MDCT spectrum generated by the MDCT executed by the transmission device 20.
  • the spectrum replacement unit 60 executes the waveform / spectrum synthesis processing (step 303). Specifically, it is a process of generating the spectrum X'dec [] of the composite waveform 11 from the processing result one frame before by using a predetermined waveform synthesis method.
  • X [] means an array using the index k corresponding to the frequency.
  • As the waveform synthesis method, a method in which a copy of the previous frame is used as the composite waveform is adopted here, so the spectrum of the preprocessing result stored in the spectrum buffer is used as it is as the spectrum X'dec [] of the composite waveform 11.
  • the spectrum substitution unit 60 executes a substitution region setting process for setting a substitution region for replacing the spectrum component (step 304).
  • the spectrum X out [] of the frame to be reproduced is prepared. This X out [] will be output as interpolated data.
  • the replacement region setting process is a process of calculating, for X out [], the indexes to which the spectrum X redun [] of the redundant data 3 is assigned and the indexes to which the spectrum X'dec [] of the composite waveform 11 is assigned.
  • the spectrum replacement unit 60 executes a spectrum replacement process for replacing the spectrum components (step 305). Specifically, each spectral component of X out [] is replaced with the spectral component of the redundant data 3 or the composite waveform 11 based on the index assigned in the replacement region setting process. As a result, X out [] becomes interpolated data obtained by interpolating the redundant data 3 with the composite waveform 11. The substitution region setting process and the spectrum substitution process will be described in detail later. The X out [] generated as the interpolated data is stored in the spectrum buffer 61 for processing the next frame (step 306).
  • the spectrum data (main data 2 or interpolated data) processed by the spectrum replacement unit 60 is input to the IMDCT unit 62, and the IMDCT process is executed (step 307).
  • the MDCT spectrum is converted into data representing a time waveform.
  • the time signal output unit 63 applies a synthesis window to the result of the IMDCT, performs overlap-add with the result of the IMDCT one frame before, and reconstructs the digital audio signal.
  • FIG. 16 is a flowchart showing an example of the replacement area setting process.
  • the process shown in FIG. 16 is an example of the internal process of step 304 in FIG.
  • FIG. 17 is a schematic diagram showing an example of a frequency range set by the replacement region setting process.
  • the variable indicating the index of each spectrum data is described as isp.
  • the array in which the indexes of the spectrum of the redundant data 3 are stored is described as redun_isp [].
  • the array in which the indexes of the spectrum to be replaced by the composite waveform 11 are stored is described as replace_isp [].
  • the replacement area setting process is a process of substituting appropriate indexes into redun_isp [] and replace_isp [].
  • First, it is determined whether or not the redundant data 3 corresponding to the frame ID to be reproduced exists (step 401).
  • when the redundant data 3 exists (YES in step 401), the process of storing indexes in redun_isp [] (step 402) is executed, and then the process of storing indexes in replace_isp [] (step 403) is executed.
  • FIG. 17A illustrates the index ranges (frequency ranges) added to redun_isp [] and replace_isp [], respectively.
  • the indexes included in the interpolation range 71, which is the frequency range other than the coded frequency range 70, are sequentially stored in replace_isp [].
  • that is, indexes other than those stored in redun_isp [] are stored in replace_isp []. For example, when the maximum index of the coded frequency range 70 is 100 and the total number of spectra is 256, the numbers from 101 to 255 are input to replace_isp []. For the indexes stored in replace_isp [], the spectrum X'dec [] of the composite waveform 11 is used.
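  • The index allocation above can be sketched as follows (our own illustration; the function name is an assumption, while the 100/256 example values come from the text):

```python
# Sketch of the replacement area setting: redun_isp[] gets the indexes of the
# coded frequency range, replace_isp[] gets every remaining index, which will
# be filled from the composite-waveform spectrum X'_dec[].

def build_index_arrays(max_coded_index, total_spectra):
    """Split [0, total_spectra) at the coded-range boundary."""
    redun_isp = list(range(max_coded_index + 1))
    replace_isp = list(range(max_coded_index + 1, total_spectra))
    return redun_isp, replace_isp
```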
  • when the redundant data 3 does not exist, redun_isp [] is processed assuming that there are no elements to be input (step 404). Therefore, redun_isp [] becomes empty.
  • In this case, a process of generating alternative data using only the spectrum of the preprocessing result stored in the spectrum buffer 61 is executed.
  • the tone component is a spectral component having a tone property.
  • a spectral component that reproduces a sound (tone) having a constant frequency is a tone component.
  • FIG. 17B illustrates the range (frequency range) of the indexes added to replace_isp [].
  • replace_isp [] is set to the frequency range excluding tone_isp [] (the shaded range in the figure).
  • For the indexes stored in replace_isp [], the spectrum X'dec [] of the composite waveform 11 is used.
  • the spectral component corresponding to tone_isp [] is not used as data for reproduction. This makes it possible to suppress a sense of discomfort in hearing caused by discontinuity of tone components.
  • when the tone component is not excluded, all of the indexes of the spectrum of the preprocessing result are stored in replace_isp []. That is, the spectrum X'dec [] of the composite waveform 11 is used as it is as the alternative data.
  • FIG. 18 is a flowchart showing an example of the spectrum replacement process.
  • the process shown in FIG. 18 is an example of the internal process in step 305 of FIG.
  • the spectrum of the redundant data 3 is described as X redun [].
  • the spectrum of the composite waveform 11 is described as X'dec [].
  • the spectrum data output as the result of the spectrum replacement processing is described as X out [].
  • X redun [], X'dec [], and X out [] are N spectra satisfying 0 ≤ k < N when an MDCT with an analysis length of 2N is used.
  • Let the variable indicating the index of the spectrum be isp. By scanning this isp, the loop processing shown in FIG. 18 is executed.
  • When isp is included in replace_isp [] (YES in step 504), the corresponding spectral component (X'dec [isp]) exists in the spectrum of the composite waveform 11, so that component is substituted into X out [isp] (step 505). When the state in which the redundant data 3 does not exist (that is, redun_isp [] is empty) continues for multiple consecutive frames, X'dec [isp] may be weighted by a factor less than 1 that changes according to the number of frames for which the redundant data 3 was absent, so that the audio fades out. If the current isp belongs to neither redun_isp [] nor replace_isp [] (NO in step 504), no substitution into X out [isp] is executed.
  • When step 503 or step 505 is completed, or when NO is determined in step 504, isp is incremented to scan the next index (step 506). It is then determined whether the incremented isp is smaller than the total number of MDCT spectra (step 507). If isp is smaller than the total number of spectra (YES in step 507), the processes from step 502 onward are executed again. In this way, the spectrum (interpolated data) combining the redundant data 3 and the composite waveform 11 is stored in X out [] until isp reaches (total number of spectra N - 1). When isp reaches the total number of spectra (NO in step 507), the spectrum replacement process is completed, and X out [] is output to the IMDCT unit 62.
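  • The replacement loop described above can be sketched as follows (our own illustration; the patent describes a per-isp flowchart, while this sketch uses Python lists and an assumed fade factor for the case where redundant data has been missing):

```python
# Sketch of the spectrum replacement loop (FIG. 18): fill X_out[] with
# redundant-data components where redun_isp lists them, composite-waveform
# components (optionally faded) where replace_isp lists them, and leave the
# remaining bins at zero.

def replace_spectrum(x_redun, x_dec, redun_isp, replace_isp, fade=1.0):
    """Build X_out[] by combining redundant data and the composite spectrum."""
    n = len(x_dec)
    x_out = [0.0] * n
    for isp in range(n):
        if isp in redun_isp:
            x_out[isp] = x_redun[isp]
        elif isp in replace_isp:
            # fade < 1 when redundant data has been absent for several frames
            x_out[isp] = fade * x_dec[isp]
    return x_out
```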
  • the waveform quality (noise spectrum 13) of the composite waveform 11 of the target frame 7 of the waveform data is predicted. Based on this waveform quality, one coded frequency range 70 to be assigned to the redundant data 3 is set in the frequency range of the waveform data included in the target frame 7. Then, the transmission data (packet 1) including the redundant data 3 generated based on the coded frequency range 70 is generated. This makes it possible to realize high-quality error concealment while suppressing the amount of data transmission.
  • the receiving device 50 receives the redundant data 3 assigned to one of the coded frequency ranges 70 of the frequency ranges of the waveform data included in the target frame 7.
  • the coded frequency range 70 is set using the waveform quality of the composite waveform 11 with respect to the target frame 7. Further, the interpolated data obtained by interpolating the received redundant data 3 with the composite waveform 11 is generated. This makes it possible to realize high-quality error concealment while suppressing the amount of data transmission.
  • a packet loss concealment method for generating interpolated data from a frame in the vicinity of a loss frame has been proposed.
  • a "hybrid concealment method” that generates interpolation data using a waveform synthesis method that differs for each frequency band (see Patent Document 1).
  • Waveform synthesis performed in the time domain is generally effective for voice, which is distributed in the range up to about 4 kHz to 5 kHz, but the amount of calculation is large, and in higher bands an extraneous harmonic structure may be generated, producing a beep-like sound.
  • Waveform synthesis performed in the frequency domain works effectively especially for high-frequency noise components and often requires little calculation, but click sounds due to phase discontinuity may occur for voice. Further, in a general sound source, voice is contained mainly in the mid-low range, and noise is often contained in the high range. Utilizing this, the "hybrid concealment method" reduces the amount of calculation and maintains the quality of the synthesized voice by performing waveform synthesis with a time-domain method for the low frequency band and a frequency-domain method for the high frequency band.
  • However, since this method is a combination of existing waveform synthesis methods, it may not give good results, for example, when the waveform is aperiodic and pitch-period detection is difficult, as in an ensemble of multiple musical instruments such as an orchestra, or when the power fluctuates. Further, a sound having a harmonic structure, such as that of a musical instrument, does not always lie in the mid-low range as voice does. Therefore, if the waveform synthesis methods are switched at only a single frequency, noise components associated with each concealment method may become audible. As described above, the quality may deteriorate depending on the type of sound source. In addition, since concealment for the mid-low range is performed in the time domain with a large amount of calculation, a certain calculation load remains.
  • Another possible method is to add redundant data of neighboring frames in advance to the data (main data) that is originally desired to be transmitted, and use it as interpolated data when a packet is lost.
  • When redundant data is used in this way, the amount of calculation on the receiving side caused by the concealment processing hardly increases and high quality can be achieved, but the amount of data to be transmitted increases significantly.
  • the frames in the vicinity of the main data to be originally transmitted are encoded only in a part of the frequency range (encoded frequency range 70) to generate the redundant data 3.
  • the coded frequency range 70 is set based on the waveform quality (noise spectrum 13, etc.) of the composite waveform used in the receiving device 50.
  • the interpolation range 71 other than the coded frequency range 70 is replaced with the frequency spectrum of the composite waveform 11 generated from a nearby frame that was normally received in the past. In this way, the interpolated data in which the redundant data 3 is interpolated using the composite waveform 11 is used as the data for reproduction.
  • the waveform quality of the composite waveform is predicted in advance, and the width of the coding frequency range 70 is set based on that quality. This makes it possible to set the width of the coded frequency range 70 appropriately for each frame. As a result, it becomes possible to generate redundant data 3 for an appropriate range in each frame. Further, since the width of the coded frequency range 70 is changed appropriately, the amount of redundant data 3 is reduced compared with the case of using, for example, a fixed-width frequency range, and the amount of transmitted data can be suppressed.
  • When the target data amount of the redundant data 3 is set, the quality of the entire interpolated data, including the quality of the redundant data 3, is predicted, and the coding frequency range 70 is set so that the quality of the interpolated data is maximized. This makes it possible to realize the highest-quality error concealment within the determined amount of data.
  • the optimum coding frequency range 70 can be set according to the characteristics of the waveform synthesis method by using the waveform quality of the composite waveform.
  • the coding frequency range 70 can be set so as to reduce the final amount of noise even when a simple waveform synthesis method such as copying the previous frame is used.
  • it is possible to significantly reduce the calculation load on the receiving device 50 side while maintaining the quality of the error concealment.
  • the coding frequency range 70 set mainly on the low frequency side has been described.
  • the lowest frequency of the coded frequency range 70 may be fixedly set to an arbitrary frequency according to a use case or the like.
  • the maximum frequency of the coded frequency range 70 is set according to the noise of the composite waveform. This makes it possible to realize high-quality error concealment according to the use case.
  • FIG. 19 is a schematic diagram showing an example of the coded frequency range 70 according to the second embodiment.
  • a plurality of coded frequency ranges 70 are set as the frequency ranges assigned to the redundant data 3. Further, the position, width, number, and the like of the coded frequency ranges 70 can be set freely. Therefore, in the present embodiment, the data of the target frame 7 is encoded by the transmission device 20 only in the coded frequency ranges 70, which are not necessarily limited to the low frequency range, according to the waveform quality of the composite waveform, and the redundant data 3 is generated.
  • two coding frequency ranges 70 are set. The spectral components included in the coded frequency range 70 are used as the redundant data 3.
  • the range other than the coded frequency range 70 is the interpolation range 71 that is interpolated using the composite waveform.
  • the configuration of the transmission / reception system according to the present embodiment is substantially the same as the configuration of the transmission / reception system 100 (transmission device 20 and reception device 50) described in the above embodiment, for example.
  • each configuration will be described using the same reference numerals as those of the transmission / reception system 100.
  • the processing content of the coding range setting process for setting the coding frequency range 70 and the configuration of the data transmitted as the packet 1 are different from the above-described embodiment. Specifically, in the coding range setting process, a process of setting a plurality of coding frequency ranges 70 is executed as shown in FIG. Further, meta information for specifying a plurality of coded frequency ranges 70 is added to the packet 1. In the transmission device 20, this meta information is encoded by the coding unit 25 as a part of the redundant data 3.
  • a plurality of coding frequency ranges 70 can be freely selected. Therefore, indexes at both ends are used to specify each coded frequency range 70. Specifically, information that specifies the lowest frequency index lsp and the highest frequency index hsp is generated as meta information.
  • the meta information increases, which may put pressure on the amount of data that can be used to transmit the main data 2 and the redundant data 3.
  • the maximum number of coded frequency ranges 70 that can be tolerated is set in advance. Further, in the coding range setting process, a plurality of candidate ranges that are candidates for the coding frequency range 70 are calculated. These candidate ranges are aggregated to fit in the maximum number. This point will be described in detail later with reference to FIGS. 24 and 25 and the like.
  • the coding range setting unit 33 calculates at least one excess range in which the noise spectrum 13 exceeds the second threshold value, and sets the coding frequency range 70 based on at least one excess range.
  • the noise spectrum 13 is a power spectrum P noise (k) of noise generated by using the composite waveform, and is calculated by, for example, the method described with reference to FIG.
  • the second threshold is set according to, for example, the power of noise allowed.
  • the excess range is a frequency range in which the spectral component (noise power) of the noise spectrum 13 exceeds the second threshold value. Therefore, the number of calculated excess ranges changes according to the state of the noise spectrum 13. That is, the number of excess ranges may vary from frame to frame. In this way, by using the second threshold value, it is possible to selectively detect the frequency range in which the noise power exceeds the permissible level. As a result, for example, a range with less noise can be excluded from the frequency range of the redundant data 3, and the amount of data of the redundant data 3 can be suppressed.
  • FIG. 20 is a schematic diagram showing a calculation example of the coded frequency range 70.
  • In FIG. 20, the noise spectrum 13 (P noise (k)) is shown together with the second threshold value, which is indicated by the dotted line.
  • the threshold curve 14 set for each frequency is used as the second threshold.
  • the threshold curve 14 is a threshold thresh (k) set for each frequency based on human auditory characteristics (for example, a loudness curve): a low threshold is set for frequencies that are easy for humans to hear, and a high threshold is set for frequencies that are hard for humans to hear.
  • the coding range setting unit 33 compares the noise spectrum 13 (P noise (k)) with the threshold curve 14 (thresh (k)) and calculates the frequency range in which the noise power exceeds thresh (k) (excess range 74).
  • Specifically, the set of indexes k satisfying the following relation is calculated: P noise (k) > thresh (k) (Equation 11).
  • As P noise (k), a moving average of the noise power (P noise-smoothed (k)) calculated using (Equation 4) above is typically used.
  • Alternatively, the noise spectrum calculated using (Equation 1) to (Equation 3) may be used.
  • the frequency range including a predetermined number of indexes (for example, three indexes) located before and after the excess range 74 is set as the coded frequency range 70.
  • the region set in the coding frequency range 70 is schematically illustrated as a shaded region.
  • the excess range 74 can be used as it is as the coding frequency range 70.
  • the coded frequency range 70 set by the method shown in FIG. 20 can be aggregated and adjusted by the subsequent processing. Therefore, it can be said that the frequency range set by using the excess range 74 and the indexes before and after the excess range 74 is the candidate range 75 which is a candidate for the coded frequency range 70. As described above, in the present embodiment, a plurality of candidate ranges 75 that are candidates for the coding frequency range 70 are calculated, and the coding frequency range 70 is set based on the plurality of candidate ranges 75.
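  • A minimal sketch of the candidate-range calculation (our own illustration; the function name, the margin handling, and the test values are assumptions) flags every index where the smoothed noise exceeds the threshold, widens each excess range by a few indexes, and merges overlaps:

```python
# Sketch of the excess-range / candidate-range detection (Equation 11):
# flag indexes where p_noise(k) > thresh(k), widen by `margin` indexes on
# both sides, then collect contiguous flagged runs as (start, end) ranges.

def candidate_ranges(p_noise, thresh, margin=3):
    """Return the list of candidate ranges as inclusive (start, end) pairs."""
    n = len(p_noise)
    flagged = [False] * n
    for k in range(n):
        if p_noise[k] > thresh[k]:
            for j in range(max(0, k - margin), min(n, k + margin + 1)):
                flagged[j] = True
    ranges, start = [], None
    for k, f in enumerate(flagged):
        if f and start is None:
            start = k               # a new candidate range opens here
        elif not f and start is not None:
            ranges.append((start, k - 1))
            start = None
    if start is not None:
        ranges.append((start, n - 1))
    return ranges
```

Because the number of flagged runs depends on the noise spectrum, the number of returned ranges naturally varies from frame to frame, as the text describes.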
  • FIG. 21 is a flowchart showing an example of the coding range setting process.
  • the process shown in FIG. 21 is a process executed by, for example, the coding range setting unit 33 in place of step 104 shown in FIG.
  • the candidate range 75 is set based on the noise spectrum 13 (P noise (k)) and the threshold curve 14 (thresh (k)) (step 601).
  • the candidate range 75 is set according to the method described with reference to FIG. That is, an index range (excess range 74) that satisfies the relationship shown in (Equation 11) is calculated, and a frequency range including a predetermined number of indexes before and after that range is set as the candidate range 75.
  • a process of adjusting the threshold curve 14 (thresh (k)) based on the spectrum X'(k) of the composite waveform 11 and readjusting the candidate range 75 is executed (step 602).
  • the threshold is adjusted in the same manner as calculating a masking threshold, with reference to the power of the spectrum X'(k) of the composite waveform 11.
  • In general, sounds in bands near a loud spectral component (masked bands) become difficult to hear; thresh (k) is adjusted based on the volume (masking threshold) at which such sounds become difficult to hear.
  • Specifically, spectral components that may be reproduced are detected from the power of X'(k) in the region other than the candidate range 75 once set.
  • For the bands masked by such components, thresh (k) is set high, since there is little auditory discomfort even if the noise power is high.
  • As a result, the width of the candidate range 75 is narrowed, and the amount of redundant data 3 can be suppressed.
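  • One way such a masking-based adjustment could be sketched (our own simplified illustration; the patent does not disclose this formula, and the gain and spread parameters are placeholders) is to raise thresh (k) near strong components of the composite spectrum:

```python
# Simplified masking sketch: a bin near a loud component of X'(k) tolerates
# more noise, so its threshold is raised to a fraction of that component's
# power. mask_gain and spread are illustrative parameters, not from the patent.

def adjust_threshold_with_masking(thresh, x_power, mask_gain=0.5, spread=1):
    """Raise thresh(k) to mask_gain * max power within +/- spread bins."""
    n = len(thresh)
    adjusted = list(thresh)
    for k in range(n):
        for j in range(max(0, k - spread), min(n, k + spread + 1)):
            masking = mask_gain * x_power[j]
            if masking > adjusted[k]:
                adjusted[k] = masking
    return adjusted
```

A real masking model would use a frequency-dependent spreading function; this sketch only shows how a raised threshold narrows the candidate range.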
  • the non-tone component exclusion process for excluding the non-tone component from the candidate range 75 is executed (step 603).
  • the non-tone component is a component other than the tone components included in the spectrum X (k) of the original waveform 4 or the spectrum X'(k) of the composite waveform 11. A non-tone component can be regarded as a simple noise component that disappears when the frame is switched, for example. Therefore, among the spectral components satisfying the relation of (Equation 11), the components corresponding to non-tone components have little effect on hearing even if they are replaced with, for example, the composite waveform 11.
  • the non-tone component exclusion process is a process of excluding such non-tone components from the candidate range 75 to narrow the candidate range 75. This point will be described in detail later.
• Next, a frequency range aggregation process for aggregating the candidate ranges 75 calculated in the steps up to this point is executed (step 604). This process reduces the number of candidate ranges 75. Finally, a frequency range adjustment process is performed to reduce the width of the remaining candidate ranges 75 (step 605). This process is particularly effective in use cases where a target data amount is set for the redundant data 3 and that amount is small, so that quantization noise becomes a problem.
  • the processes of steps 602, 603, and 605 may be appropriately executed according to, for example, the required noise level.
  • FIG. 22 is a schematic diagram for explaining the non-tone component exclusion process.
  • the tone component 15 is extracted from the original waveform 4 (original data 6 of the target frame 7). More specifically, the tone component 15 included in the spectrum X (k) of the original waveform 4 is extracted.
• For example, a method of comparing the spectrum with the characteristics of neighboring frames is used.
• In FIG. 22, the tone component 15 is schematically illustrated using circles.
• A set (array) of the indexes of the tone component 15 and its vicinity (for example, three spectra) is described as tone_isp[].
  • a component included in a range other than tone_isp [] is a non-tone component.
• A set of indexes indicating the candidate range 75 calculated in the preceding process (step 601 or step 602) is described as enc_isp[].
• In FIG. 22, the frequency ranges corresponding to tone_isp[] and enc_isp[] are schematically illustrated.
• In a frequency range at or above a certain frequency fc (for example, 2 kHz), even when P_noise(k) other than the tone component 15 is large, auditory discomfort is unlikely to occur. That is, noise derived from non-tone components is hard to hear above fc.
  • the value of the frequency fc is not limited, and can be arbitrarily set according to, for example, the required noise level.
  • the frequency fc corresponds to a predetermined threshold frequency.
• Accordingly, the range at or above the frequency fc is divided into tonal and noisy frequency ranges. Then, a process is executed that excludes the frequency ranges judged to be noisy from the coded frequency range 70 (candidate range 75), regardless of the magnitude of P_noise(k), and leaves only the frequency ranges judged to be tonal.
  • the candidate range 75 adjusted to exclude the non-tone component is schematically shown.
• Specifically, the intersection of tone_isp[] and enc_isp[] (tone_isp[] ∩ enc_isp[]) is calculated.
• The frequency range represented by this intersection is set as the new candidate range 75.
  • the width of enc_isp [] is reduced leaving the index of the tone component 15 and its vicinity. In this way, the width of the candidate range 75 is adjusted so that the tone component 15 is included on the high frequency side of the frequency fc.
  • the width of the candidate range 75 (enc_isp []) may be expanded so that the tone component 15 and the index in the vicinity thereof are completely included in the candidate range 75 on the high frequency side of the frequency fc. As a result, the tone component 15 can be reliably replaced with the redundant data 3.
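The intersection operation above can be sketched as follows (an illustrative sketch only, not the patent's implementation; the function name and the expression of fc as a spectrum index fc_index are assumptions):

```python
def exclude_non_tonal(enc_isp, tone_isp, fc_index):
    """Narrow the candidate range: at or above the threshold frequency only
    indexes in tone_isp[] (tone components and their vicinity) are kept;
    below it, enc_isp[] is left unchanged."""
    tone = set(tone_isp)
    low = [k for k in enc_isp if k < fc_index]
    high = [k for k in enc_isp if k >= fc_index and k in tone]
    return sorted(low + high)

enc_isp = list(range(10, 30))          # candidate range indexes
tone_isp = [12, 13, 14, 24, 25, 26]    # tone components and neighbours
print(exclude_non_tonal(enc_isp, tone_isp, fc_index=20))
# → [10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 24, 25, 26]
```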
  • FIG. 23 is a flowchart showing an example of the non-tone component exclusion process.
  • the process shown in FIG. 23 is an example of the internal process of step 603 of FIG.
  • data (X (k) and X'(k)) representing each spectrum of the original waveform 4 and the composite waveform 11 are acquired (step 701).
  • Each spectrum may be an FFT spectrum or an MDCT spectrum.
  • the composite waveform 11 is generated by copying the data one frame before the redundant data 3. That is, X'(k) is the spectrum X prev (k) of the previous frame.
  • a past spectrum or the like necessary for the tone component detection process described later may be acquired as necessary. In the following, it is assumed that the processing is performed using the two spectra of X (k) and X'(k).
• Next, the power of each spectrum is calculated (step 702).
  • the tone component detection process for detecting the tone component 15 is executed (step 703).
  • the tone component 15 is detected from each spectrum of X (k) and X'(k).
  • a spectral component having a strong tone property in each spectrum is calculated as the tone component 15 in consideration of the shape of the power spectrum, the temporal correlation between the spectra in the preceding and following frames, and the like.
  • the method for calculating the tone component 15 is not limited.
• Next, a set tone_isp[] of indexes including the tone component 15 is generated (step 704). Specifically, the indexes of the tone component 15 and its vicinity (for example, the indexes of three spectra before and after on the frequency axis) are acquired, and these indexes are stored in tone_isp[].
• Next, the already calculated candidate range 75 (enc_isp[]) is updated based on tone_isp[] (step 705). Specifically, as described with reference to FIG. 22, in the range above the frequency fc the intersection of tone_isp[] and enc_isp[] is set as the candidate range 75, and in the range below the frequency fc, enc_isp[] is left as the candidate range 75 as it is.
• As a result, the width of the candidate range 75 is reduced while the tone components 15 of X(k) and X'(k) are retained.
  • the tone component 15 of X (k) is a component that is lost when the composite waveform 11 is used.
• The tone component 15 of X'(k) is a component added when the composite waveform 11 is used. Therefore, by narrowing the width of the candidate range 75 so that each tone component 15 is included, it becomes possible to reduce the amount of the redundant data 3 while avoiding such loss or addition of tone components 15.
  • FIG. 24 is a schematic diagram for explaining the frequency range aggregation process.
  • five candidate ranges 75 (ranges 1 to 5) are generated as shown in the upper part of FIG. 24 by the above-mentioned non-tone component exclusion process.
• The candidate ranges 75 after the aggregation process are shown in the middle part of FIG. 24.
• First, the coding range setting unit 33 calculates a concatenation cost representing the amount of noise that changes when candidate ranges 75 adjacent to each other are concatenated. The candidate ranges are then concatenated based on the concatenation cost.
• The concatenation cost is, for example, an index indicating the increase or decrease in the amount of noise caused by concatenating the candidate ranges 75. For example, when candidate ranges 75 are concatenated, the total amount of quantization noise included in the interpolated data generated on the receiving side changes.
• The concatenation cost is set so as to increase when the amount of noise increases and to decrease when the amount of noise decreases, for example. The concatenation cost will be described in detail later.
• The first-stage process calculates the concatenation cost for the unencoded range between candidate ranges 75 and, when the cost is equal to or less than a certain threshold value, combines the frequency ranges at both ends into one. This process is always performed to prevent the frequency ranges from being needlessly fragmented.
  • the aggregation process from the upper stage to the middle stage shown in FIG. 24 is an example of the first stage process.
• The second-stage process is executed when the number N of candidate ranges 75 still exceeds the maximum number N_max after the first-stage process; it repeatedly concatenates, in order, the pair of candidate ranges 75 with the minimum concatenation cost. This process is executed until the number N of candidate ranges 75 falls within the maximum number N_max.
  • the aggregation process from the middle stage to the lower stage shown in FIG. 24 is an example of the second stage process.
  • the process of connecting the candidate ranges 75 is executed by, for example, the following method.
  • a number representing each candidate range 75 will be referred to as a range number i.
• Here, combining candidate ranges 75 means deleting the candidate ranges 75 of range numbers i and i+1 and generating a new candidate range 75 spanning from the "lowest frequency index lsp_i of the original range number i" to the "highest frequency index hsp_(i+1) of the original range number i+1".
• In FIG. 24, for example, the candidate ranges 75 of range numbers 2 and 3 are combined to form a new candidate range 75 (range 2 in the middle part) spanning lsp2 to hsp3.
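The combination operation just described can be sketched as follows (illustrative only; representing each candidate range as an (lsp, hsp) pair and using a 0-based index i are assumptions of this sketch):

```python
def concat_ranges(ranges, i):
    """Combine candidate ranges i and i+1 (0-based here) into one range
    spanning lsp_i .. hsp_(i+1); the two original ranges are deleted."""
    lsp_i, _ = ranges[i]
    _, hsp_next = ranges[i + 1]
    return ranges[:i] + [(lsp_i, hsp_next)] + ranges[i + 2:]

ranges = [(5, 9), (14, 20), (24, 30)]   # (lsp, hsp) pairs
print(concat_ranges(ranges, 1))          # → [(5, 9), (14, 30)]
```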
• The first method determines whether or not the quantization noise of the entire frame is reduced by combining candidate ranges 75, which reduces the meta information by one set.
• First, the sum P_NQ-sum of the quantization noise N_q(k) over the entire target frame 7, that is, the sum of the quantization noise N_q(k) over the N candidate ranges 75, is calculated according to the following formula. In (Equation 12), the range number of the candidate range 75 is denoted j.
• Similarly, P'_NQ-sum is the sum of the quantization noise N'_q(k) over the N-1 candidate ranges 75 that include the combined candidate range 75. Note that because the meta information is reduced, the number of bits available for allocation increases, so N'_q(k) is likely to be smaller than N_q(k).
  • P'NQ-sum is calculated according to the following formula.
  • lsp'_j and hsp'_j are indexes of the lowest and highest frequencies of the range number j changed by coupling.
  • ⁇ P all-noise is the amount of change in the total amount of quantization noise due to the combination of the j-th and j + 1-th candidate ranges 75. This ⁇ P all-noise is used as the connection cost.
• Here, the condition is that the total amount of quantization noise is reduced. That is, when ΔP_all-noise satisfies the following condition, the j-th and j+1-th candidate ranges 75 are combined. This makes it possible to reduce the number of candidate ranges 75 and the amount of meta information data without increasing the quantization noise.
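The formulas (Equation 12) to (Equation 14) do not survive in this extract; a hedged reconstruction consistent with the surrounding description (symbols as in the text, j running over candidate ranges) would be:

```latex
% Hedged reconstruction; not necessarily the patent's exact notation.
P_{NQ\text{-}sum}  = \sum_{j=1}^{N}   \;\sum_{k=lsp\_j}^{hsp\_j}   N_q(k)   % total quantization noise over N ranges
P'_{NQ\text{-}sum} = \sum_{j=1}^{N-1} \;\sum_{k=lsp'\_j}^{hsp'\_j} N'_q(k)  % after combining ranges j and j+1
\Delta P_{all\text{-}noise} = P'_{NQ\text{-}sum} - P_{NQ\text{-}sum} < 0    % combine when the total noise decreases
```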
• The second method determines whether or not the sum of the spectral power in the frequency range between candidate ranges 75 is equal to or less than a threshold value.
• Here, the sum of the spectral power between the candidate ranges 75 is used as the concatenation cost. For example, a frequency range in which the spectral power is small is expected to generate little noise even if it is replaced with the composite waveform 11.
• Specifically, the sum P_sum_inter_i of the spectral power over the frequency range (intermediate range) between the i-th and i+1-th candidate ranges 75 is calculated according to the following equation.
• When the P_sum_inter_i calculated in this way is equal to or less than a predetermined threshold value, the i-th and i+1-th candidate ranges 75 are combined.
  • the threshold value for determining P sum_inter_i may be a value set in relation to the amount of data of meta information or the like, or may be a predetermined fixed value.
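A minimal sketch of the second method (illustrative only; the function name, the spectrum as a plain list, and the example values are assumptions; the summation bounds follow the intermediate-range definition given later, (hsp_i + 1) to (lsp_(i+1) - 1)):

```python
def intermediate_power(x, hsp_i, lsp_next):
    """Sum of spectral power |X(k)|^2 over the intermediate range
    (hsp_i + 1) .. (lsp_next - 1) between two adjacent candidate ranges."""
    return sum(abs(x[k]) ** 2 for k in range(hsp_i + 1, lsp_next))

x = [0.0] * 40
for k in range(10, 15):
    x[k] = 1.0          # first candidate range
for k in range(15, 20):
    x[k] = 0.1          # quiet intermediate range
for k in range(20, 25):
    x[k] = 1.0          # second candidate range

p = intermediate_power(x, hsp_i=14, lsp_next=20)
print(p <= 0.1)          # small power, so the two ranges may be combined
```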
• The third method determines whether or not the interval between candidate ranges 75 (the difference in spectrum indexes) is equal to or less than a predetermined threshold value.
• Here, the interval between the candidate ranges 75 is used as the concatenation cost. For example, where the interval between candidate ranges 75 is small, the amount of noise generated is estimated to be small even if that portion is replaced with the composite waveform 11. Utilizing this, whether to combine the candidate ranges 75 is determined based on their interval.
  • This method does not require a process of calculating the total power of the spectra, and can be said to be a simplified version of the second method.
• Specifically, the total number of indexes between the i-th and i+1-th candidate ranges 75 is calculated according to the following formula.
• When this number is equal to or less than a predetermined threshold value, the i-th and i+1-th candidate ranges 75 are combined.
• The above three methods are used for the first-stage process (the aggregation from the upper part to the middle part of FIG. 24) for concatenating the candidate ranges 75. If the number N of candidate ranges 75 is still larger than N_max after these processes, the process of combining the pair of two candidate ranges 75 with the minimum cost into one frequency range is repeatedly executed. As a result, the candidate ranges 75 can be aggregated down to the specified number (N_max).
  • FIG. 25 is a flowchart showing an example of frequency range aggregation processing.
  • the process shown in FIG. 25 is an example of the internal process in step 604 of FIG.
• Here, the concatenation cost (P_sum_inter_i) described in the second method above is used.
  • the variable N representing the number of candidate ranges 75 and the variable i representing the range number are initialized (step 801).
  • the number of currently existing candidate ranges 75 (the number of candidate ranges 75 calculated in the processes up to the previous stage) is substituted into N. Further, 1 is assigned to the variable i for holding the range number for scanning each candidate range 75.
• In step 802, it is determined whether or not the range number i is N-1 or less. If i is N-1 or less (YES in step 802), the first stage of concatenating the candidate ranges 75 (steps 803 to 806) is executed.
• In the first-stage process, first, the indexes representing the frequency range (the i-th intermediate range) between the i-th and i+1-th candidate ranges 75 are acquired (step 803). Specifically, the index hsp_i of the highest frequency of the i-th candidate range 75 and the index lsp_(i+1) of the lowest frequency of the i+1-th candidate range 75 are read.
• The range from (hsp_i + 1) to (lsp_(i+1) - 1) is the i-th intermediate range.
• Next, the total power P_sum_inter_i of the spectrum of the original waveform 4 in the i-th intermediate range is calculated according to (Equation 16), and it is determined whether or not it is equal to or less than a predetermined threshold value (step 804). If P_sum_inter_i is equal to or less than the threshold value (YES in step 804), the i-th and i+1-th candidate ranges 75 are combined (step 805).
• If P_sum_inter_i is larger than the threshold value (NO in step 804), the candidate ranges 75 are not combined. Next, the range number i is incremented (step 806), and it is determined in step 802 whether or not the incremented range number i is N-1 or less. In this way, the aggregation process of comparing P_sum_inter_i with the threshold value is repeated until the range number i reaches N-1.
• When N is N_max or less (YES in step 808), that is, when the number of candidate ranges 75 has been sufficiently reduced by the first-stage aggregation process, the frequency range aggregation process ends.
• When N is larger than N_max (NO in step 808), the second-stage aggregation process is executed (step 809).
• In the second-stage aggregation process, the candidate ranges 75 on both sides of the intermediate range that minimizes P_sum_inter_i are combined.
• That is, the i-th candidate range 75 for which the sum of the spectral power in the i-th intermediate range is minimized is detected, and the i-th and i+1-th candidate ranges 75 are combined.
  • the range number N is updated (step 810). Specifically, as the number of ranges N is reduced by one in step 809, N-1 is substituted for the number of ranges N. Then, step 808 is executed again, and it is determined whether or not the updated range number N is N max or less. As a result, the candidate ranges 75 are aggregated in ascending order of connection cost until the range number N drops to the maximum allowable number N max.
  • FIG. 26 is a schematic diagram for explaining the frequency range adjustment process.
• First, the coding range setting unit 33 adjusts the width of each candidate range 75 based on the noise components at its highest and lowest frequencies. Specifically, among the indexes at both ends (highest and lowest frequencies) of each candidate range 75 calculated in the processes up to the previous stage, the index that minimizes P_noise(k) (hereinafter, k_noise-min) is detected, and the candidate range 75 is reduced by repeatedly excluding the index k_noise-min from it.
• Such processing is repeatedly executed while a predetermined condition is satisfied.
  • the amount of noise change ⁇ P all-noise in the entire target frame 7 generated by excluding k noise-min and narrowing the candidate range 75 is calculated.
• The amount of noise change ΔP_all-noise is expressed as the sum of (the decrease in quantization noise) and (the increase in noise due to narrowing of the frequency range). Specifically, ΔP_all-noise is calculated according to the following equation.
• When ΔP_all-noise has been calculated, it is determined whether or not the following equation is satisfied. As shown in (Equation 21), when ΔP_all-noise is negative, it means that the total amount of noise in the entire target frame 7 is reduced by excluding k_noise-min. In this case, a new candidate range 75 excluding k_noise-min is set. That is, lsp'_i and hsp'_i are set as the new lsp_i and hsp_i. Thereafter, the reduction of the candidate range 75 is repeatedly executed until (Equation 21) is no longer satisfied. This makes it possible to reduce the total amount of noise in the entire target frame 7. Therefore, even in a case where, for example, the data amount (target data amount) of the redundant data 3 is determined in advance and quantization noise becomes a problem because that amount is small, the total amount of noise in the entire target frame 7 can be sufficiently suppressed.
  • the total power P target (nbit) corresponding to the target data amount (nbit) of the redundant data 3 is calculated using a predetermined formula or table.
• Next, the sum P_redun-sum of the spectral power of the original waveform 4 in all the candidate ranges 75 is calculated according to the following equation. As shown in (Equation 22), the value of P_redun-sum becomes smaller by excluding, for example, k_noise-min.
• Next, it is determined whether or not P_redun-sum satisfies the following equation. When the relationship shown in (Equation 23) is not satisfied, in the same manner as described above, the index k_noise-min that minimizes P_noise(k) is detected from among the lsp_i and hsp_i, and a new candidate range 75 excluding k_noise-min is set. For the new candidate range 75, the total power P_redun-sum is recalculated. In other words, the spectral power of the original waveform 4 at k_noise-min is subtracted from the P_redun-sum calculated before excluding k_noise-min.
• The process from the detection of k_noise-min to the recalculation of the total power shown in (Equation 24) is repeatedly executed until the relationship of (Equation 23) is satisfied.
• A candidate range 75 that satisfies this relationship is set as the coded frequency range 70 that is finally assigned to the redundant data 3.
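The fit-to-budget loop can be sketched as follows (illustrative only; the budget p_target standing in for P_target(nbit), the function names, and the example values are assumptions, and the exclusion of the lowest-noise endpoint is written inline):

```python
def fit_to_target(ranges, x, p_noise, p_target):
    """Shrink (lsp, hsp) candidate ranges until the total spectral power
    they cover, P_redun_sum, is at or below the budget p_target."""
    def total_power(rs):
        return sum(abs(x[k]) ** 2 for lsp, hsp in rs for k in range(lsp, hsp + 1))

    while ranges and total_power(ranges) > p_target:
        # find k_noise_min: the range endpoint with the smallest P_noise(k)
        best = None
        for i, (lsp, hsp) in enumerate(ranges):
            for k in (lsp, hsp):
                if best is None or p_noise[k] < p_noise[best[1]]:
                    best = (i, k)
        i, k = best
        lsp, hsp = ranges[i]
        if lsp == hsp:                                   # range becomes empty
            ranges = ranges[:i] + ranges[i + 1:]
        elif k == lsp:
            ranges = ranges[:i] + [(lsp + 1, hsp)] + ranges[i + 1:]
        else:
            ranges = ranges[:i] + [(lsp, hsp - 1)] + ranges[i + 1:]
    return ranges

x = [1.0] * 8
p_noise = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
print(fit_to_target([(0, 3)], x, p_noise, p_target=2.0))  # → [(2, 3)]
```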
• In general, even if the spectral power is the same, the intensity perceived by humans may vary from frequency range to frequency range. Therefore, for example, P_noise(k), N_q(k), and the like may be weighted according to such auditory characteristics before the above determinations are made.
  • the candidate range 75 may be expanded based on the noise component.
• In general, frequency components with similarly high noise levels exist in the vicinity of a frequency with high P_noise(k). Therefore, for example, when P_noise(k) exceeds a certain level at either end of the candidate range 75, a process of expanding the candidate range 75 so that nearby noise components are also replaced with the redundant data 3 may be executed. This makes it possible to reduce noise caused by the composite waveform.
• The operation of the receiving device 50 according to the second embodiment is substantially the same as the operation described in the first embodiment, but differs from the above embodiment in that the meta information is received in packet 1 and in the content of the replacement region setting process.
  • a plurality of frequency ranges (encoded frequency range 70) based on meta information are set in the substitution region setting process.
• Of these, the spectrum of the redundant data 3 (X_redun[]) is used for the plurality of coded frequency ranges 70. Further, in X_out[], all interpolation ranges 71 other than the coded frequency ranges 70 are replaced with the spectrum (X'_dec[]) of the composite waveform 11 (see FIGS. 16 and 17). Therefore, the array redun_isp[], in which the spectrum indexes of the redundant data 3 are stored, stores the indexes of all lsp_i to hsp_i in the meta information. Further, the array reply_isp[], in which the spectrum indexes of the composite waveform 11 are stored, stores all indexes other than those stored in redun_isp[].
  • redun_isp [] stores indexes of 10, 11, 12, 13, 14, 15, 33, 34, 35, 36, 55, 56, 57, 58, 59, 60.
  • the spectrum replacement process shown in FIG. 18 can be applied as it is to generate appropriate spectrum data X out [] for reproduction.
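Building the two index arrays can be sketched as follows (illustrative only; the spectrum length n_spec=64 is an assumed value, and the complement array is named repl_isp here on the assumption that the translated text's "reply_isp[]" denotes the replacement-index array). The coded ranges below reproduce the example indexes given above:

```python
def build_index_arrays(coded_ranges, n_spec):
    """Build redun_isp[] (indexes taken from the redundant data) and the
    complementary array of indexes taken from the composite waveform."""
    redun_isp = [k for lsp, hsp in coded_ranges for k in range(lsp, hsp + 1)]
    taken = set(redun_isp)
    repl_isp = [k for k in range(n_spec) if k not in taken]
    return redun_isp, repl_isp

coded = [(10, 15), (33, 36), (55, 60)]
redun_isp, repl_isp = build_index_arrays(coded, n_spec=64)
print(redun_isp)
# → [10, 11, 12, 13, 14, 15, 33, 34, 35, 36, 55, 56, 57, 58, 59, 60]
```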
  • FIG. 27 is a schematic diagram showing an example of the coded frequency range 70 according to the third embodiment.
  • a plurality of coded frequency ranges 70 are set to arbitrary frequency ranges.
  • an arbitrary waveform synthesis method is individually set for the interpolation range 71 between the coding frequency ranges 70. That is, in the present embodiment, the coding frequency range 70 and the waveform synthesis method can be freely set.
  • the waveform synthesis methods that can be executed on the receiving side are tried on the transmitting side, and the noise (noise spectrum 13) generated by the synthesized waveform 11 generated by each method is calculated.
  • the coding frequency range 70 assigned to the redundant data 3 and the optimum waveform synthesis method to be set in the interpolation range 71 are set.
  • the synthesis method 1, the synthesis method 2, and the synthesis method 3 are selected in order from the low frequency side (left in the figure) for the three interpolation ranges 71. As a result, it is possible to realize high-quality error concealment with less redundant data 3.
• The configuration of the transmission/reception system according to the present embodiment is substantially the same as that of the transmission/reception system 100 (transmission device 20 and reception device 50) described in the above embodiments, but the configurations of the redundant data generation unit included in the transmission device 20 and of the signal processing unit included in the reception device 50 are different.
  • the same configuration as the transmission / reception system 100 will be described using the same reference numerals.
  • FIG. 28 is a block diagram showing a configuration example of the redundant data generation unit 324 according to the third embodiment.
  • the redundant data generation unit 324 includes an original data selection unit 330, a composite waveform generation unit 331a to 331c, a generated noise calculation unit 332, a coding range setting unit 333, and a coding spectrum selection unit 334.
• The redundant data generation unit 324 differs from the redundant data generation unit 24 described in the above embodiments mainly in that a plurality of composite waveform generation units 331 are provided. Along with this, a plurality of pieces of noise information (noise spectra 13) are output from the generated noise calculation unit 332.
  • the redundant data generation unit 324 is provided with three composite waveform generation units 331a to 331c corresponding to the three types of waveform synthesis methods.
• For example, when n types of waveform synthesis methods are used, the number of composite waveform generation units 331 is also n.
  • the original data selection unit 330 selects and acquires necessary data from the original data 6 stored in the input buffer 23 (see FIG. 7). Specifically, the original data 6 corresponding to the main data 2 (transmission frame 8) and the redundant data 3 (target frame 7) is read from the input buffer 23 in the previous stage. Further, the necessary original data 6 is delivered to the composite waveform generation units 331a to 331c.
  • the original data 6 is at least one of the spectrum data X (k) and the time waveform data x (n).
• The composite waveform generation units 331a to 331c each generate a composite waveform 11 according to the waveform synthesis method set for them, based on the original data 6 acquired from the original data selection unit 330, and hand over the data (composite data) of each composite waveform 11 to the generated noise calculation unit 332.
• As the waveform synthesis methods, it is desirable to combine a plurality of methods with a low calculation amount (for example, a method of copying the previous frame). This makes it possible to reduce the calculation load on the receiving side.
  • this technique can be applied regardless of the type of waveform synthesis method.
  • the generated noise calculation unit 332 acquires the original data 6 from the original data selection unit 330, and acquires the three composite waveforms 11 from the composite waveform generation units 331a to 331c. Further, the generated noise calculation unit 332 calculates noise information related to noise generated by using the composite waveform 11 for each waveform synthesis method, and delivers this to the coding range setting unit. Specifically, the noise spectrum 13 representing the waveform quality of each composite waveform 11 is calculated as noise information. In this way, the generated noise calculation unit 332 predicts the waveform quality of the composite waveform 11 for each of the plurality of waveform synthesis methods.
• The coding range setting unit 333 acquires the noise information (noise spectrum 13) calculated for each waveform synthesis method from the generated noise calculation unit 332, and sets at least one coded frequency range 70 assigned to the redundant data 3 based on each noise spectrum 13.
• Further, the coding range setting unit 333 sets one of the plurality of waveform synthesis methods for generating the composite waveform 11 for each interpolation range 71, which is a frequency range other than the coded frequency range 70. Specifically, the optimum waveform synthesis method is selected for each interpolation range 71 based on each noise spectrum 13.
  • the coding frequency range 70 and the waveform synthesis method assigned to the interpolation range 71 are determined based on the waveform quality (noise spectrum 13) predicted for each of the plurality of waveform synthesis methods.
• The coding range setting unit 333 generates meta information including information designating the coded frequency range 70 and information specifying the waveform synthesis method assigned to each interpolation range 71, and delivers the meta information to the coding spectrum selection unit 334.
  • the coded spectrum selection unit 334 extracts a spectrum component to be used as the redundant data 3 based on the coded frequency range 70 from the original data 6 (spectral data X (k)) corresponding to the redundant data 3.
  • the operation of the coded spectrum selection unit 334 is the same as that of the above embodiment, but the meta information about the waveform synthesis method is also handled as a part of the redundant data 3.
  • FIG. 29 is a flowchart showing an example of the generation process of the redundant data 3. This process is, for example, a loop process that is executed every time packet 1 is generated.
• As the waveform synthesis methods, it is assumed that methods generating the composite waveform 11 using one or both of the spectrum data and the time waveform data are used. Further, it is assumed that the data required for generating the composite waveform 11 are stored in advance in the input buffer 23 of the preceding stage.
  • the original data 6 (original data 6 included in the target frame 7) corresponding to the redundant data 3 is acquired from the input buffer 23 (step 901). Further, spectrum data is acquired from the input buffer 23 as the original data 6 required for waveform synthesis (step 902). In addition, time waveform data is acquired from the input buffer 23 as the original data 6 required for waveform synthesis (step 903).
  • the order of processing in steps 901 to 903 is not limited.
  • a composite waveform generation process for generating the composite waveform 11 based on the target waveform synthesis method is executed (step 904).
  • This composite waveform generation process corresponds to step 102 in FIG.
  • the generated noise prediction process for calculating the noise spectrum 13 related to the composite waveform is executed (step 905).
  • This generated noise prediction process is a process corresponding to step 103 of FIG.
  • the coding range setting process for setting the frequency range assigned to the redundant data 3 is executed based on the noise spectrum 13 (step 906).
  • This coding range setting process is a process corresponding to the coding range setting process shown in FIG.
  • At least one frequency range is set.
  • the frequency range set in step 906 is a candidate range 75 that is a candidate for the coded frequency range 70.
  • at least one candidate range 75 that is a candidate for the coding frequency range 70 is calculated for each of the plurality of synthesis methods.
• It is determined whether or not an unprocessed waveform synthesis method remains (step 907). If a waveform synthesis method remains (YES in step 907), the processing from step 904 onward is executed again using a waveform synthesis method that has not yet been processed. As a result, as many candidate ranges 75 (enc_isp_i[]) as there are planned waveform synthesis methods are generated, each corresponding to the i-th waveform synthesis method (see FIG. 30).
• When no waveform synthesis method remains (NO in step 907), the coding range synthesis process, which synthesizes the candidate ranges 75 corresponding to each waveform synthesis method and sets the coded frequency range 70, is executed (step 908).
  • the coding frequency range 70 is set, and an appropriate waveform synthesis method is assigned to the interpolation range 71 between the coding frequency ranges 70. That is, which composite waveform 11 spectrum is applied to each interpolation range 71 is set, and this setting result is generated as meta information.
  • a coded spectrum selection process for extracting only the spectral components corresponding to the coded frequency range 70 from the original data 6 of the target frame 7 is executed (step 909).
  • This coded spectrum selection process corresponds to step 105 in FIG.
  • the redundant data 3 before coding is generated using only the spectral components in the coded frequency range 70.
  • an index indicating the coding frequency range 70 and meta information indicating the waveform synthesis method set for each interpolation range 71 are added to the redundant data 3.
  • the coefficient calculated in the waveform synthesis process may be added as meta information. This makes it possible to reduce the calculation load on the receiving device 50.
• When the redundant data 3 has been extracted, it is determined whether or not original data 6 to be processed remains (step 910). If original data 6 to be processed remains (YES in step 910), the processing from step 901 onward is executed again for the remaining original data 6. When no original data 6 to be processed remains (NO in step 910), the coding process for encoding the redundant data 3 is executed (step 911).
  • the target data amount of the main data 2 is set (step 912). Specifically, the free space of the packet 1 is calculated from the data amount of the encoded redundant data 3.
  • the processes in steps 910 to 912 correspond to the processes in steps 106 to 108 shown in FIG. 9, for example.
  • FIG. 30 is a schematic diagram showing an example of the coding range synthesis process.
  • the outline of the coding range composition processing will be described below.
  • The candidate range 75 (enc_isp_i[]) corresponding to the i-th waveform synthesis method is read, as calculated by the composite waveform generation process, the generated noise prediction process, and the coding range setting process in the preceding stages.
  • a region showing a candidate range 75 calculated corresponding to the waveform synthesis methods 1 to 3 is shown.
  • the candidate range 75 (enc_isp_1 []) of the waveform synthesis method 1 includes one frequency range.
  • the candidate ranges 75 (enc_isp_2 [] and enc_isp_3 []) of the waveform synthesis methods 2 and 3 include two frequency ranges, respectively.
  • Of the indices in each enc_isp_i[], only the spectrum of the indices common to all waveform synthesis methods needs to be encoded as the redundant data 3; all other frequency ranges can be appropriately replaced with the spectrum of the composite waveform 11.
  • enc_isp[] = enc_isp_1[] ∩ enc_isp_2[] ∩ enc_isp_3[]  (Equation 26), where enc_isp[] represents the coded frequency range 70.
  • the frequency range represented by the intersection of the candidate ranges 75 calculated for each of the plurality of waveform synthesis methods is set to the coded frequency range 70.
  • the frequency range other than the coded frequency range 70 is the “uncoded frequency range”, and is the interpolation range 71 interpolated using the composite waveform 11. It can be said that the interpolation range 71 is a composite frequency range in which the composite waveform 11 is used.
  • two coded frequency ranges 70 represented by the intersection of each candidate range 75 are schematically illustrated using shaded areas. A region other than these coding frequency ranges 70 is the interpolation range 71.
  • the interpolation range 71 is uniquely determined as follows.
  • (lsp_j, hsp_j) = (0, 1), (56, 76), (81, N)
  • N is, for example, the largest index of the MDCT spectrum.
  • j is an index indicating the interpolation range 71.
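  • Because the interpolation ranges 71 are simply the maximal runs of spectrum indices not contained in enc_isp[], they can be derived mechanically. A minimal sketch (the function name is illustrative):

```python
def interpolation_ranges(enc_isp, n_max):
    """Return the interpolation ranges 71 as (lsp_j, hsp_j) pairs: the maximal
    runs of spectrum indices 0..n_max that are NOT in the coded set enc_isp."""
    coded = set(enc_isp)
    ranges, isp = [], 0
    while isp <= n_max:
        if isp not in coded:
            lsp = isp
            while isp <= n_max and isp not in coded:
                isp += 1
            ranges.append((lsp, isp - 1))
        else:
            isp += 1
    return ranges
```

With coded indices 2-55 and 77-80 and N = 100, this reproduces the (0, 1), (56, 76), (81, N) example from the text.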
  • For each waveform synthesis method i, the sum P_noise-sum(i, j) of the generated noise in the j-th interpolation range 71 is calculated according to the following formula: P_noise-sum(i, j) = Σ_{isp = lsp_j}^{hsp_j} P_noise(i, isp)  (Equation 27), where P_noise(i, isp) is the noise spectrum predicted for the i-th waveform synthesis method.
  • The waveform synthesis method i that minimizes P_noise-sum(i, j) in (Equation 27) is set as the optimum waveform synthesis method for the j-th interpolation range 71. That is, the waveform synthesis method having the smallest total sum of generated noise in the j-th interpolation range 71 is selected. In this way, the method that minimizes the integrated value of the noise spectrum 13 among the plurality of waveform synthesis methods is set for the interpolation range 71. Such processing is executed for all the interpolation ranges 71, and the optimum waveform synthesis method is set for each interpolation range 71.
  • the waveform synthesis method 3, the waveform synthesis method 2, and the waveform synthesis method 1 are set in order from the low frequency side for the three interpolation ranges 71.
  • the same waveform synthesis method may be set for all the interpolation ranges 71.
  • On the receiving side, the interpolation ranges 71 can also be specified from the coded frequency ranges 70. Therefore, only the assignment of a waveform synthesis method to each interpolation range 71 needs to be added to the meta information, in order from, for example, the low frequency band. That is, information such as an index specifying each interpolation range 71 is unnecessary.
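  • The per-range selection described above — summing each method's predicted noise spectrum over an interpolation range and taking the minimum, as in (Equation 27) — can be sketched as follows (names are illustrative):

```python
def assign_methods(noise_spectra, interp_ranges):
    """noise_spectra[i][isp]: predicted generated-noise power of waveform
    synthesis method i+1 at spectrum index isp. For each interpolation range j,
    pick the method whose noise sum over [lsp_j, hsp_j] is smallest."""
    assignment = []
    for lsp, hsp in interp_ranges:
        sums = [sum(spec[lsp:hsp + 1]) for spec in noise_spectra]
        assignment.append(sums.index(min(sums)) + 1)  # 1-based method index
    return assignment
```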
  • After the coded frequency range 70 is set, the frequency range adjustment process in step 605 of FIG. 21 may be executed.
  • FIG. 31 is a flowchart showing an example of the coding range synthesis process.
  • the process shown in FIG. 31 is an example of the coded range synthesis process described with reference to FIG. 29.
  • the sequence representing the index (encoding frequency range 70) of the spectrum to be finally encoded will be referred to as enc_isp [].
  • the variable representing the waveform synthesis method is described as i
  • the variable that specifies the interpolation range 71 is described as j.
  • the index enc_isp_i [] of the candidate range 75 calculated for each waveform synthesis method by the processing up to the previous stage is acquired (step 1001).
  • enc_isp [] is initialized with the index enc_isp_1 [] of the first candidate range 75 (step 1002).
  • the variable i representing the waveform synthesis method is initialized to 1 (step 1003).
  • The intersection of enc_isp[] and the index enc_isp_i[] of the candidate range 75 obtained for each waveform synthesis method is calculated, and enc_isp[] is updated (step 1004). Then, the variable i is incremented (step 1005). It is determined whether the incremented i is less than or equal to the number of waveform synthesis methods used, that is, whether the intersection of the candidate ranges 75 has been calculated for all waveform synthesis methods (step 1006). If a waveform synthesis method remains (YES in step 1006), the processing from step 1004 onward is executed again. When the intersection of the candidate ranges 75 has been calculated for all the waveform synthesis methods, enc_isp[] becomes an array representing the indices of the coded frequency range 70 shown in (Equation 26).
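  • Steps 1002 to 1006 amount to a running set intersection over the per-method candidate index arrays; a compact sketch (the function name is illustrative):

```python
def coded_frequency_range(candidate_ranges):
    """candidate_ranges: list of index arrays enc_isp_i[], one per waveform
    synthesis method. Returns enc_isp[]: their intersection, sorted."""
    enc_isp = set(candidate_ranges[0])   # step 1002: initialize with enc_isp_1[]
    for cand in candidate_ranges[1:]:    # steps 1004-1006: intersect the rest
        enc_isp &= set(cand)
    return sorted(enc_isp)
```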
  • When no waveform synthesis method remains (NO in step 1006), the interpolation ranges 71 are calculated based on enc_isp[] (step 1007), and the variable j representing the interpolation range 71 is initialized to 1 (step 1008).
  • the indexes representing the lowest frequency and the highest frequency of the j-th interpolation range 71 are referred to as (lsp_j, hsp_j).
  • The number of interpolation ranges 71 depends on the distribution of the coding frequency ranges 70, but is within ±1 of the number of coding frequency ranges 70.
  • an appropriate waveform synthesis method is sequentially set for the j-th interpolation range 71.
  • P_noise-sum(i, j) is calculated based on (Equation 27) for all the waveform synthesis methods, and the waveform synthesis method that minimizes the result is selected (step 1009).
  • The waveform synthesis method selected in the previous stage is set as the waveform synthesis method used for the j-th interpolation range 71 (step 1010). Then, the variable j is incremented (step 1011), and it is determined whether or not the variable j is less than or equal to the total number of interpolation ranges 71, that is, whether or not the waveform synthesis method has been set for all the interpolation ranges 71 (step 1012).
  • If an unprocessed interpolation range 71 remains (YES in step 1012), the processing from step 1009 onward is executed again.
  • When the waveform synthesis method has been set for all the interpolation ranges 71 (NO in step 1012), the coding range synthesis process is completed.
  • a method of generating the composite waveform 11 that minimizes the total amount of generated noise is set for each interpolation range 71.
  • the amount of noise in the restored waveform 5 restored on the receiving side is reduced, and even if the amount of redundant data 3 is small, it is possible to sufficiently reduce the sense of discomfort in hearing.
  • the packet 1 received by the receiving device 50 includes the main data 2 and the redundant data 3 to which meta information is added.
  • This meta information includes information on the waveform synthesis method set for each interpolation range 71, in addition to information for designating a plurality of coded frequency ranges 70. That is, it can be said that the meta information is information for designating the synthesis method of the composite waveform 11 for each interpolation range 71, which is a frequency range other than the coding frequency range 70.
  • the meta information is received by the reception buffer 52 and is appropriately referred to in the subsequent processing.
  • the meta information corresponds to the designated information.
  • the configuration of the signal processing unit will be mainly described as the configuration on the receiving device 50 side.
  • FIG. 32 is a block diagram showing a configuration example of the signal processing unit according to the third embodiment.
  • the signal processing unit 358 has a spectrum replacement unit 360, a spectrum buffer 361, an IMDCT unit 362, and a time signal output unit 363. Further, the signal processing unit 358 has a plurality of composite waveform generation units 364a to 364c, an MDCT unit 365, and a time waveform buffer 366.
  • The spectrum replacement unit 360 acquires spectrum data as the decoded redundant data 3 from the preceding stage. It also acquires the spectrum data of the composite waveforms 11 generated by the composite waveform generation units 364a and 364b, and the spectrum data (MDCT spectrum) of the composite waveform 11 generated by the composite waveform generation unit 364c and converted by the MDCT unit 365. Further, the spectrum replacement unit 360 generates, based on the above-mentioned meta information, interpolated data in which parts of the decoded redundant data 3 (the interpolation ranges 71) are replaced with the spectrum data of the respective composite waveforms 11. This interpolated data is passed to the spectrum buffer 361 and the IMDCT unit 362.
  • the IMDCT unit 362 acquires the spectrum of the interpolated data from the spectrum replacement unit 360, performs IMDCT on the spectrum, and converts the interpolated data into time domain data.
  • the result of this IMDCT is passed to the time signal output unit 363.
  • the time signal output unit 363 acquires the result of the IMDCT from the IMDCT unit 362.
  • A synthesis window is applied to this, and the result of the previous IMDCT is overlap-added to reconstruct the audio signal (time waveform data), which is delivered to the time waveform buffer 366.
  • the time waveform data stored in the time waveform buffer 366 is used for audio reproduction at a timing required in the subsequent stage.
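  • The windowed overlap-add performed by the time signal output unit 363 can be sketched for one frame step as follows — a simplified model; the actual window shape and frame handling follow the codec's MDCT conventions:

```python
def overlap_add(prev_tail, cur_frame, window):
    """One step of windowed overlap-add (sketch): window the IMDCT result,
    add the saved second half of the previous frame to the first half of the
    current one, and keep the new second half as the tail for the next step."""
    n = len(cur_frame) // 2
    w = [c * win for c, win in zip(cur_frame, window)]
    out = [p + c for p, c in zip(prev_tail, w[:n])]  # reconstructed samples
    return out, w[n:]                                # (output, tail for next step)
```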
  • the composite waveform generation units 364a and 364b appropriately acquire spectrum data of past frames (one frame before or two frames before) from the spectrum buffer 361, for example. Then, the spectrum data of the composite waveform 11 is generated based on each spectrum data and passed to the spectrum replacement unit 360.
  • the composite waveform generation unit 364c acquires the time waveform data of the past reproduced waveform from the time waveform buffer 366. Then, the time waveform data of the composite waveform 11 is generated based on the time waveform data, and is delivered to the MDCT unit 365.
  • the MDCT unit 365 acquires the time waveform data of the composite waveform 11 generated by the composite waveform generation unit 364c, and performs MDCT on this to create an MDCT spectrum (spectrum data) of the composite waveform 11.
  • the spectrum data of the composite waveform 11 is passed to the spectrum replacement unit 360.
  • the waveform synthesis method used by the composite waveform generation units 364a, 364b, and 364c is the waveform synthesis method assigned to the interpolation range 71 on the transmitting side.
  • each configuration of the signal processing unit 358 may be appropriately set according to the type of waveform synthesis method and the like.
  • For example, the composite waveform generation units 364a and 364b perform waveform synthesis methods in the frequency domain (for example, copying a spectrum from a previous frame). The composite waveform generation unit 364c performs a waveform synthesis method using a time waveform (for example, a method of extrapolating a waveform using linear predictive coding (LPC)). Regardless of which waveform synthesis method is used, the composite waveform 11 is input to the spectrum replacement unit 360 after being converted, if necessary, into a spectrum in the frequency domain (MDCT domain).
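  • As an illustration of the simplest frequency-domain method mentioned above (copying a spectrum from a previous frame), a hedged sketch — the attenuation factor and function name are assumptions for illustration, not values from the document:

```python
def synthesize_spectrum_copy(spectrum_buffer, attenuation=0.9):
    """Frequency-domain waveform synthesis (sketch): reuse the MDCT spectrum
    of the most recent past frame in the buffer, slightly attenuated."""
    prev_spectrum = spectrum_buffer[-1]  # spectrum of the frame one before
    return [attenuation * x for x in prev_spectrum]
```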
  • FIG. 33 is a flowchart showing an example of the operation of the signal processing unit 358.
  • This process is a loop process that is continuously executed on a frame-by-frame basis.
  • It is assumed that the signal processing unit 358 (spectrum replacement unit 360) can acquire the spectrum of the main data 2 or the redundant data 3, for which decoding has been completed, from the processing in the preceding stage.
  • First, the spectrum replacement unit 360 determines whether or not the acquired spectrum is the main data 2 (step 1101).
  • If it is the main data 2, the main data 2 is stored in the spectrum buffer 361 (step 1109).
  • If it is the redundant data 3, the composite waveform generation unit 364c acquires the time waveform data required to generate the composite waveform 11 from the time waveform buffer 366 (step 1102). Subsequently, the composite waveform generation units 364a and 364b acquire the spectrum data necessary for generating the composite waveforms 11 from the spectrum buffer 361.
  • any of the composite waveform generation units 364a to 364c executes the waveform / spectrum synthesis process, and the composite waveform 11 is generated according to the waveform synthesis method set in each unit (step 1104).
  • When the composite waveform 11 has been generated, it is determined whether or not any waveform synthesis method has not yet been executed (step 1105). If an unprocessed waveform synthesis method remains (YES in step 1105), the process returns to step 1104 and the composite waveform 11 is generated according to the next waveform synthesis method.
  • The time waveform data of the composite waveform 11 that requires MDCT processing (here, the data generated by the composite waveform generation unit 364c) is converted into a spectrum by the MDCT unit 365.
  • The spectrum replacement unit 360 then executes a replacement region setting process for setting the regions in which spectrum components are replaced (step 1107). Specifically, indices for allocating the spectrum components of the composite waveforms 1 to 3 and indices for allocating the spectrum components of the redundant data are set for the spectrum X_out[] of the frame to be reproduced.
  • The basic processing of the replacement region setting process is substantially the same as the processing of step 304 shown in FIG. 15, except that as many arrays reply_isp[] for storing the indices to be replaced are prepared as there are waveform synthesis methods.
  • reply_isp_1 [], reply_isp_2 [], and reply_isp_3 [] are prepared.
  • an array of indexes redund_isp [] to which the redundant data 3 is assigned is configured.
  • an index that specifies an interpolation range 71 other than each coded frequency range 70 is calculated.
  • an array of interpolation ranges 71 corresponding to the composite waveforms 1 to 3 is configured.
  • For example, the indices included in the interpolation range 71 assigned to composite waveform 1 are stored in reply_isp_1[].
  • the spectrum replacement process is executed by the spectrum replacement unit 360 using the processing result of the replacement region setting process.
  • That is, the spectral components of the redundant data 3 and the composite waveforms 1 to 3 are substituted into X_out[] according to the indices specified by redund_isp[] and each reply_isp_i[].
  • In other words, the spectrum replacement unit 360 interpolates the redundant data 3, for each interpolation range 71, using the composite waveform 11 generated by the synthesis method specified by the meta information, and generates the interpolated data X_out[].
  • the basic processing of the spectrum replacement processing is substantially the same as the processing of step 305 shown in FIG. 15, but the number of sequences used is different.
  • For example, the reply_isp[] referenced in step 504 in the detailed processing flow of step 305 is extended to reply_isp_1[] to reply_isp_3[].
  • The value assigned to X_out[isp] is one of X'_1dec[isp], X'_2dec[isp], and X'_3dec[isp], corresponding to composite waveforms 1 to 3.
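  • The replacement just described — the redundant-data spectrum at the coded indices, and each composite waveform's spectrum at its assigned interpolation indices — can be sketched as follows (array names follow the text; the function name is illustrative):

```python
def spectrum_replace(n_spec, redund_spec, redund_isp, synth_specs, reply_isps):
    """Fill X_out[] with the redundant-data spectrum at the indices in
    redund_isp[], and with the spectrum of composite waveform i at the
    indices in reply_isp_i[]."""
    x_out = [0.0] * n_spec
    for isp in redund_isp:
        x_out[isp] = redund_spec[isp]
    for synth_spec, reply_isp in zip(synth_specs, reply_isps):
        for isp in reply_isp:
            x_out[isp] = synth_spec[isp]
    return x_out
```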
  • The X_out[] generated as the interpolated data is stored in the spectrum buffer for the next and subsequent processing (step 1109).
  • IMDCT processing is executed on the spectrum data X_out[] (main data 2 or interpolated data) processed by the spectrum replacement unit 360 (step 1110).
  • the time signal output unit 363 reconstructs the audio signal from the result of IMDCT (step 1111).
  • the processing of steps 1109 to 1111 is the same as the processing of steps 306 to 308 shown in FIG.
  • the time signal output unit 363 stores the generated audio signal in the time waveform buffer 366 for the next and subsequent processing.
  • As described above, in this embodiment, the redundant data 3 is interpolated using a plurality of composite waveforms generated by different waveform synthesis methods.
  • the allocation of these synthesis methods is preset on the transmission device 20 side so as to reduce the amount of noise in the interpolation result. As a result, even if the packet 1 is lost or the like, it is possible to realize high-quality error concealment with less noise.
  • the transmission / reception system that mainly performs wireless communication has been described.
  • the present technology is not limited to this, and can be applied to, for example, a system for transmitting and receiving waveform data by wired communication.
  • this technique may be used as a PLC method when playing music by using network streaming or the like.
  • the present technology can also adopt the following configurations.
  • a quality prediction unit that predicts the waveform quality of the restored waveform related to the target frame of the waveform data
  • a range setting unit that sets at least one target range as a frequency range assigned to redundant data for generating the restored waveform from the waveform data included in the target frame based on the waveform quality.
  • a transmission device including a data generation unit that generates the redundant data based on the target range and generates transmission data including the redundant data.
  • (2) The transmitter according to (1), wherein the restored waveform is a waveform generated based on the redundant data and the composite waveform related to the target frame.
  • the quality prediction unit is a transmission device that predicts the waveform quality of the composite waveform as the waveform quality of the restored waveform.
  • the target frame is a frame in the vicinity of the transmission frame transmitted as the transmission data.
  • the quality prediction unit is a transmission device that generates the composite waveform of the target frame based on the waveform data included in the transmission frame.
  • the quality prediction unit calculates a noise spectrum representing the waveform quality of the composite waveform based on the composite waveform and the original waveform represented by the waveform data included in the target frame.
  • the range setting unit is a transmission device that sets the target range based on the noise spectrum.
  • (5) The transmitter according to (4).
  • the range setting unit is a transmission device that calculates, based on the noise spectrum and the quantization noise accompanying the coding of the redundant data, the total noise amount of the interpolated data obtained by interpolating the redundant data with the composite waveform, and sets the target range so that the total noise amount is minimized.
  • (6) The transmitter according to (4) or (5).
  • the noise spectrum is one of a spectrum obtained by frequency-converting the difference between the original waveform and the composite waveform, or a spectrum representing the difference between the spectrum of the original waveform and the spectrum of the composite waveform.
  • (7) The transmitter according to any one of (4) to (6).
  • the range setting unit is a transmission device that sets an integration range for calculating an integrated value of the noise spectrum, and sets the target range based on the minimum integrated range in which the integrated value exceeds a first threshold value.
  • (8) The transmitter according to (7).
  • the range setting unit is a transmission device that sets the minimum frequency of the integration range to the minimum frequency of the noise spectrum, changes the maximum frequency of the integration range, and calculates the integrated value.
  • (9) The transmitter according to any one of (4) to (6).
  • the range setting unit is a transmission device that calculates at least one excess range in which the noise spectrum exceeds a second threshold value set for each frequency, and sets the target range based on the at least one excess range.
  • (10) The transmitter according to any one of (1) to (9).
  • the range setting unit is a transmission device that calculates a plurality of candidate ranges that are candidates for the target range and sets the target range based on the plurality of candidate ranges.
  • the range setting unit is a transmission device that calculates a connection cost representing the amount of noise that changes by connecting the candidate ranges adjacent to each other, and connects the candidate ranges based on the connection cost.
  • the restored waveform is a waveform generated based on the redundant data and the composite waveform related to the target frame.
  • the range setting unit is a transmission device that extracts the tonal frequency component included in the waveform data of the target frame, and adjusts the width of the candidate range so that the tonal frequency component is included on the high frequency side of a predetermined threshold frequency.
  • (13) The transmitter according to any one of (10) to (12).
  • the range setting unit is a transmission device that adjusts the width of the candidate range based on noise components at the highest and lowest frequencies of the candidate range.
  • the restored waveform is a waveform generated based on the redundant data and the composite waveform related to the target frame.
  • the range setting unit is a transmission device that sets one of a plurality of synthesis methods for generating the composite waveform in a non-target range that is a frequency range other than the target range.
  • the quality prediction unit predicts the waveform quality of the composite waveform for each of the plurality of synthesis methods.
  • the range setting unit is a transmission device that sets the target range and the synthesis method assigned to the non-target range based on the waveform quality predicted for each of the plurality of synthesis methods.
  • (16) The transmitter according to (15).
  • the range setting unit is a transmission device that calculates at least one candidate range that is a candidate for the target range for each of the plurality of synthesis methods, sets the frequency range represented by the intersection of the candidate ranges calculated for each of the plurality of synthesis methods as the target range, and sets, in the non-target range, the method that minimizes the integrated value of the noise spectrum among the plurality of synthesis methods.
  • At least one target range is set as a frequency range to be assigned to the redundant data for generating the restored waveform from the waveform data included in the target frame.
  • a receiving unit that receives redundant data assigned to at least one target range of the frequency range of the waveform data included in the target frame based on the waveform quality of the restored waveform related to the target frame of the waveform data.
  • the restored waveform is a waveform generated based on the redundant data and the composite waveform related to the target frame.
  • the receiving unit receives designated information that specifies a method for synthesizing the synthesized waveform for each non-target range that is a frequency range other than the target range.
  • the waveform restoration unit is a receiving device that interpolates the redundant data, for each non-target range, using the composite waveform generated by the synthesis method specified by the designated information.
  • (20) A receiving method performed by a computer system, comprising: receiving, based on the waveform quality of the restored waveform related to the target frame of the waveform data, the redundant data assigned to at least one target range of the frequency range of the waveform data included in the target frame; and generating the restored waveform based on the redundant data.
  • a step of predicting the waveform quality of the restored waveform with respect to the target frame of the waveform data, and a step of setting, based on the waveform quality, at least one target range as a frequency range to be assigned to redundant data for generating the restored waveform from the waveform data included in the target frame.
  • (22) A step of receiving redundant data assigned to at least one target range of the frequency range of the waveform data included in the target frame based on the waveform quality of the restored waveform related to the target frame of the waveform data.

Abstract

A transmission device according to one embodiment of the present technology is provided with a quality prediction unit, a range setting unit, and a data generation unit. The quality prediction unit predicts waveform quality of a restoration waveform pertaining to a target frame in waveform data. On the basis of the waveform quality, the range setting unit sets at least one target range as a frequency range that is to be allotted to redundant data in order to generate the restoration waveform from the waveform data included in the target frame. The data generation unit generates the redundant data on the basis of the target range, and further generates transmission data that contains said redundant data.

Description

Transmission device, transmission method, reception device, and reception method
The present technology relates to a transmission device, a transmission method, a reception device, and a reception method applicable to data communication.
Conventionally, techniques for transmitting waveform data such as audio have been developed. For example, the original waveform is restored by combining, on the receiving side, waveform data divided on the transmitting side. There is also a known technique for performing error concealment to compensate for a lost portion when part of the waveform data is lost.
For example, Patent Document 1 describes an audio decoder including an error concealment unit. In this error concealment unit, when a frame loss or the like occurs, an audio information component representing the high frequency side of the lost portion is synthesized by concealment processing in the frequency domain, and an audio information component representing the low frequency side of the lost portion is synthesized by concealment processing in the time domain. By combining these components, error concealment that avoids the click and beep sounds associated with the synthesis processing is possible (paragraphs [0016], [0017], [0094], and [0095] of the specification of Patent Document 1, Figs. 1 and 2, etc.).
International Publication No. WO 2017/153006
In recent years, devices that transmit and reproduce waveform data such as audio have become widespread, and there is a demand for technology that can realize high-quality error concealment while suppressing the amount of data transmitted.
In view of the above circumstances, an object of the present technology is to provide a transmission device, a transmission method, a reception device, and a reception method capable of realizing high-quality error concealment while suppressing the amount of data transmitted.
In order to achieve the above object, the transmission device according to one embodiment of the present technology includes a quality prediction unit, a range setting unit, and a data generation unit.
The quality prediction unit predicts the waveform quality of the restored waveform with respect to the target frame of the waveform data.
Based on the waveform quality, the range setting unit sets at least one target range as a frequency range assigned to redundant data for generating the restored waveform from the waveform data included in the target frame.
The data generation unit generates the redundant data based on the target range, and generates transmission data including the redundant data.
In this transmission device, the waveform quality of the restored waveform is predicted for the target frame of the waveform data. Based on this waveform quality, at least one target range, which is a frequency range allocated to the redundant data for generating the restored waveform, is set. Then, transmission data including the redundant data generated based on the target range is generated. This makes it possible to realize high-quality error concealment while suppressing the amount of data transmitted.
The restored waveform may be a waveform generated based on the redundant data and the composite waveform related to the target frame. In this case, the quality prediction unit may predict the waveform quality of the composite waveform as the waveform quality of the restored waveform.
The target frame may be a frame in the vicinity of the transmission frame transmitted as the transmission data. In this case, the quality prediction unit may generate the composite waveform for the target frame based on the waveform data included in the transmission frame.
The quality prediction unit may calculate a noise spectrum representing the waveform quality of the composite waveform based on the composite waveform and the original waveform represented by the waveform data included in the target frame. In this case, the range setting unit may set the target range based on the noise spectrum.
The range setting unit may calculate, based on the noise spectrum and the quantization noise accompanying the coding of the redundant data, the total noise amount of the interpolated data obtained by interpolating the redundant data with the composite waveform, and may set the target range so that the total noise amount is minimized.
The noise spectrum may be either a spectrum obtained by frequency-converting the difference between the original waveform and the composite waveform, or a spectrum representing the difference between the spectrum of the original waveform and the spectrum of the composite waveform.
The range setting unit may set an integration range for calculating the integrated value of the noise spectrum, and may set the target range based on the minimum integration range in which the integrated value exceeds a first threshold value.
 前記範囲設定部は、前記積算範囲の最低周波数を前記ノイズスペクトルの最低周波数に設定し、前記積算範囲の最高周波数を変化させて前記積算値を算出してもよい。 The range setting unit may set the minimum frequency of the integration range to the minimum frequency of the noise spectrum and change the maximum frequency of the integration range to calculate the integration value.
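The search described above can be sketched as follows (hypothetical names; the patent does not prescribe a data layout): the lower edge of the integration range is fixed at the lowest frequency bin, and the upper edge grows until the integrated noise first exceeds the first threshold.

```python
def smallest_integration_range(noise_spectrum, threshold):
    # Grow the upper edge of the integration range, starting from the
    # lowest frequency bin, until the integrated noise exceeds the
    # threshold. Returns (k_min, k_max) as inclusive bin indices, or
    # None if the threshold is never exceeded.
    total = 0.0
    for k, value in enumerate(noise_spectrum):
        total += value
        if total > threshold:
            return (0, k)
    return None
```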
The range setting unit may calculate at least one excess range in which the noise spectrum exceeds a second threshold set for each frequency, and may set the target range based on the at least one excess range.
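Finding the excess ranges amounts to finding contiguous runs of bins where the noise spectrum exceeds a per-frequency threshold. A minimal sketch (hypothetical interface, bin ranges expressed as inclusive pairs):

```python
def excess_ranges(noise_spectrum, threshold_per_bin):
    # Return contiguous bin ranges [(start, end), ...] (end inclusive)
    # where the noise spectrum exceeds the frequency-dependent threshold.
    ranges, start = [], None
    for k, (n, t) in enumerate(zip(noise_spectrum, threshold_per_bin)):
        if n > t and start is None:
            start = k                      # run begins
        elif n <= t and start is not None:
            ranges.append((start, k - 1))  # run ends
            start = None
    if start is not None:
        ranges.append((start, len(noise_spectrum) - 1))
    return ranges
```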
The range setting unit may calculate a plurality of candidate ranges as candidates for the target range, and may set the target range based on the plurality of candidate ranges.
The range setting unit may calculate a connection cost representing the change in noise amount caused by connecting mutually adjacent candidate ranges, and may connect the candidate ranges based on the connection cost.
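One plausible reading of the connection cost is sketched below; the specific cost model is an assumption, not given in this passage. Encoding the gap between two candidate ranges removes the gap's interpolation noise but adds quantization noise, and the ranges are connected when the net change is not positive.

```python
def merge_candidates(ranges, noise_spectrum, quant_noise_per_bin):
    # Greedily connect adjacent candidate ranges (inclusive bin pairs,
    # sorted by frequency). The connection cost for a gap is the
    # quantization noise added minus the interpolation noise removed;
    # connect when the cost is zero or negative. Hypothetical model.
    merged = [ranges[0]]
    for start, end in ranges[1:]:
        prev_start, prev_end = merged[-1]
        gap = range(prev_end + 1, start)
        cost = sum(quant_noise_per_bin - noise_spectrum[k] for k in gap)
        if cost <= 0:
            merged[-1] = (prev_start, end)
        else:
            merged.append((start, end))
    return merged
```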
The restored waveform may be a waveform generated based on the redundant data and the composite waveform for the target frame. In this case, the range setting unit may extract tonal frequency components contained in the waveform data of the target frame, and may adjust the width of each candidate range so that, on the high-frequency side of a predetermined threshold frequency, the tonal frequency components are included.
The range setting unit may adjust the width of each candidate range based on the noise components at the highest frequency and the lowest frequency of the candidate range.
The restored waveform may be a waveform generated based on the redundant data and the composite waveform for the target frame. In this case, the range setting unit may assign one of a plurality of synthesis methods for generating the composite waveform to each non-target range, that is, each frequency range other than the target range.
The quality prediction unit may predict the waveform quality of the composite waveform for each of the plurality of synthesis methods. In this case, the range setting unit may set the target range and the synthesis method assigned to each non-target range based on the waveform quality predicted for each of the plurality of synthesis methods.
The range setting unit may calculate, for each of the plurality of synthesis methods, at least one candidate range as a candidate for the target range, may set the frequency range represented by the intersection of the candidate ranges calculated for the respective synthesis methods as the target range, and may assign to each non-target range the synthesis method, among the plurality of synthesis methods, that minimizes the integrated value of the noise spectrum.
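The intersection step above can be sketched as a set intersection over frequency bins: a bin belongs to the target range only if every synthesis method flags it as a candidate, i.e. no available synthesis method can conceal it acceptably. This is a minimal sketch with hypothetical names.

```python
def target_range_from_methods(candidate_ranges_per_method, num_bins):
    # candidate_ranges_per_method: one list of inclusive (lo, hi) bin
    # pairs per synthesis method. Returns the sorted bins common to all
    # methods' candidate ranges; these bins must be carried by the
    # redundant data regardless of which synthesis method is chosen.
    common = set(range(num_bins))
    for ranges in candidate_ranges_per_method:
        bins = set()
        for lo, hi in ranges:
            bins.update(range(lo, hi + 1))
        common &= bins
    return sorted(common)
```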
A transmission method according to an embodiment of the present technology is a transmission method executed by a computer system, and includes predicting the waveform quality of a restored waveform for a target frame of waveform data.
Based on the waveform quality, at least one target range is set as a frequency range to be assigned to redundant data for generating the restored waveform from the waveform data included in the target frame.
The redundant data is generated based on the target range, and transmission data including the redundant data is generated.
A receiving device according to an embodiment of the present technology includes a receiving unit and a waveform restoration unit.
The receiving unit receives redundant data assigned, based on the waveform quality of a restored waveform for a target frame of waveform data, to at least one target range within the frequency range of the waveform data included in the target frame.
The waveform restoration unit generates the restored waveform based on the redundant data.
The restored waveform may be a waveform generated based on the redundant data and the composite waveform for the target frame. In this case, the receiving unit may receive designation information that specifies, for each non-target range (a frequency range other than the target range), the synthesis method for the composite waveform. The waveform restoration unit may then, for each non-target range, interpolate the redundant data using the composite waveform generated by the synthesis method specified by the designation information.
A receiving method according to an embodiment of the present technology is a receiving method executed by a computer system, and includes receiving redundant data assigned, based on the waveform quality of a restored waveform for a target frame of waveform data, to at least one target range within the frequency range of the waveform data included in the target frame.
The restored waveform is generated based on the redundant data.
A diagram schematically showing the appearance of a transmission/reception system according to a first embodiment of the present technology.
A schematic diagram for explaining an outline of error concealment.
A schematic graph showing the original waveform represented by one frame of original data.
A schematic diagram showing an example of the frequency range of redundant data according to the first embodiment.
A schematic diagram showing an example of error concealment given as a comparative example.
A schematic diagram showing an example of a waveform data transmission method.
A block diagram showing a configuration example of the transmission/reception system.
A block diagram showing a configuration example of a redundant data generation unit.
A flowchart showing an example of redundant data generation processing.
A schematic diagram showing an example of noise spectrum calculation.
A schematic diagram showing an example of calculating the encoding frequency range.
A flowchart showing an example of encoding range setting processing.
A schematic diagram for explaining the total noise amount of interpolated data.
A block diagram showing a configuration example of a signal processing unit included in the receiving device.
A flowchart showing an example of the operation of the signal processing unit.
A flowchart showing an example of replacement region setting processing.
A schematic diagram showing an example of the frequency range set by the replacement region setting processing.
A flowchart showing an example of spectrum replacement processing.
A schematic diagram showing an example of the encoding frequency range according to a second embodiment.
A schematic diagram showing an example of calculating the encoding frequency range.
A flowchart showing an example of encoding range setting processing.
A schematic diagram for explaining non-tonal component exclusion processing.
A flowchart showing an example of non-tonal component exclusion processing.
A schematic diagram for explaining frequency range aggregation processing.
A flowchart showing an example of frequency range aggregation processing.
A schematic diagram for explaining frequency range adjustment processing.
A schematic diagram showing an example of the encoding frequency range according to a third embodiment.
A block diagram showing a configuration example of a redundant data generation unit according to the third embodiment.
A flowchart showing an example of redundant data generation processing.
A schematic diagram showing an example of encoding range synthesis processing.
A flowchart showing an example of encoding range synthesis processing.
A block diagram showing a configuration example of a signal processing unit according to the third embodiment.
A flowchart showing an example of the operation of the signal processing unit.
Hereinafter, embodiments according to the present technology will be described with reference to the drawings.
<First Embodiment>
[Transmission/Reception System]
FIG. 1 is a diagram schematically showing the appearance of a transmission/reception system according to the first embodiment of the present technology. The transmission/reception system 100 includes a transmission device 20 and a reception device 50, and transmits waveform data from the transmission device 20 to the reception device 50.
Here, waveform data is, for example, data representing a waveform that changes over time. In the transmission/reception system 100, audio data representing an audio waveform, for example, is transmitted as the waveform data.
In the transmission/reception system 100, the waveform data is transmitted by wireless communication between the transmission device 20 and the reception device 50. The communication standard for the wireless communication is not limited; any standard capable of transmitting waveform data, such as Bluetooth (registered trademark), may be used as appropriate.
In the example shown in FIG. 1, a portable terminal device (for example, a smartphone, a tablet terminal, or a portable music player) is used as the transmission device 20, and an audio reproduction device capable of connecting wirelessly to the transmission device 20 (for example, wireless headphones, wireless earphones, or a wireless speaker) is used as the reception device 50. The configurations of the transmission device 20 and the reception device 50 are not otherwise limited.
Waveform data such as audio transmitted from the transmission device 20 is received by the reception device 50. The reception device 50 restores the waveform represented by the waveform data and reproduces it as sound from a speaker mounted on the reception device 50.
At this time, part of the waveform data transmitted from the transmission device 20 may not be received by the reception device 50. For example, depending on the communication environment, the wireless communication between the transmission device 20 and the reception device 50 may be disrupted, and the waveform data may be partially lost.
In the transmission/reception system 100, error concealment is executed when such data loss occurs. In the present disclosure, error concealment is a process of compensating for a lost portion when, for example, part of the waveform data transmitted from the transmission device 20 to the reception device 50 is lost.
[Overview of Error Concealment]
FIG. 2 is a schematic diagram for explaining an outline of error concealment. FIG. 2A is a schematic diagram showing the process by which waveform data is transmitted. FIG. 2B is a schematic diagram showing an example of a waveform compensated by error concealment.
First, a method of transmitting waveform data will be described with reference to FIG. 2A.
When transmitting waveform data, the transmission device 20 transmits waveform data encoded using a transmission encoder, and the reception device 50 decodes the waveform data using a decoder corresponding to that encoder.
Data processing such as this encoding and decoding is executed for each frame into which the waveform data is divided along the time axis. Here, a frame is, for example, a processing unit standardized by the coding scheme. The frame length (the period into which the waveform is divided) is set according to the coding scheme used by the transmission device 20 (for example, 10 msec).
In the following, one frame of waveform data (audio data) before encoding in the transmission device 20 is referred to as original data. The time waveform of the original data is denoted x(n), and the frequency spectrum obtained by time-frequency transforming the original data is denoted X(k).
The encoded original data is packed into packets 1 and transmitted frame by frame. Here, a packet 1 is the unit of data transmission between the transmission device 20 and the reception device 50. FIG. 2A schematically illustrates three packets 1 transmitted from the transmission device 20 to the reception device 50. In the present embodiment, a packet 1 transmitted from the transmission device 20 corresponds to the transmission data.
For example, when a packet 1 transmitted from the transmission device 20 is not received by the reception device 50, the reception device 50 transmits an error signal indicating that the packet 1 has not been received. On receiving the error signal, the transmission device 20 executes retransmission processing to transmit the target packet 1 again. By repeating such processing, loss of packets 1 can be prevented.
On the other hand, among the various audio transmission methods, audio transmission over BLE (Bluetooth Low Energy), for example, limits the number of times a packet may be retransmitted. In this case, a packet 1 that exceeds the retransmission limit is discarded, producing a missing packet (the center packet 1 in FIG. 2A). If audio or the like is reproduced as-is with a missing packet, a discontinuity occurs in the audio signal, which may be perceived as audible unnaturalness.
For this reason, in the transmission/reception system 100, packet loss concealment (PLC), which compensates for missing packets, is executed as error concealment.
Specifically, the transmission device 20 generates a packet 1 containing data in which redundant data 3 is added to main data 2. That is, one packet 1 contains a set of main data 2 and redundant data 3.
Here, the main data 2 is the encoded version of the original data that is actually to be transmitted. In the following, the frame of main data 2 packed into a packet 1 and transmitted is referred to as a transmission frame.
The redundant data 3 is one frame's worth of data added separately to the main data 2 for use in error concealment (here, PLC). In the following, the frame for which the redundant data 3 is generated is referred to as a target frame.
In the present embodiment, data encoded using part of the information in the original data of a frame near the frame of the main data 2 (the transmission frame) is used as the redundant data 3. The target frame is therefore a frame in the vicinity of the transmission frame 8 transmitted as a packet 1.
The lower part of FIG. 2A schematically illustrates an example of the data structure of a packet 1. In this example, one packet 1 contains one main data 2 and one redundant data 3. The main data 2 is, for example, the original data of the M-th frame (M) encoded by the encoder for main data 2. The redundant data 3 is the encoded original data of the (M+1)-th frame (M+1), a frame adjacent to that of the main data 2. In this case, frame (M) is the transmission frame and frame (M+1) is the target frame.
For encoding the redundant data 3, a generally lower-quality (higher-compression) encoder is used, whose settings, such as coding method and compression ratio, differ from those of the encoder for the main data 2. The data amount of the redundant data 3 is therefore smaller than that of the main data 2.
The target frame from which the redundant data 3 is generated is not limited; for example, redundant data 3 for the (M+2)-th frame (M+2) or the (M-1)-th frame (M-1) may be added.
The number of main data 2 (and redundant data 3) packed into one packet 1 is also not limited. The present technology is applicable even when, for example, a packet 1 contains sets of main data 2 and redundant data 3 for a plurality of frames.
Packets 1 containing redundant data 3 are generated sequentially in this way and transmitted to the reception device 50. When a lost packet occurs, the reception device 50 compensates for the lost data using the redundant data 3 corresponding to the main data 2 contained in the lost packet (hereinafter referred to as loss data).
For example, when the packet 1 containing the main data 2 of frame (M+1) is discarded, the loss data is interpolated using the redundant data 3 of frame (M+1) that has already been received.
This type of PLC is generally classified as Media-Specific FEC (Forward Error Correction). It makes it possible to compensate for loss data immediately, using redundant data 3 that has already been received. It is also possible to compensate for loss data by receiving the necessary redundant data 3 after the packet has been lost.
FIG. 2B schematically illustrates the waveform restored by the reception device 50 when a lost packet occurs. The time range indicated by the dotted line in the figure, for example, is the loss period during which data was lost due to the lost packet.
The reception device 50 generates a restored waveform 5 that restores the waveform represented by the original data in the loss period (the original waveform 4). Specifically, interpolated data is generated by interpolating the redundant data 3 corresponding to the loss data using a composite waveform for the loss data.
Here, the composite waveform is a waveform (audio data) synthesized, for use in error concealment (here, PLC), from the data of nearby frames (preferably main data) received normally at the receiving terminal.
The interpolated data is data generated by combining the redundant data 3 and the composite waveform, and is the waveform data (audio data) used for the final concealment.
The waveform represented by this interpolated data is used as the restored waveform 5. The restored waveform 5 can therefore be said to be a waveform generated based on the redundant data 3 and the composite waveform for the target frame. In FIG. 2B, the original waveform 4 represented by the original data is drawn as a solid line, and the restored waveform 5 represented by the interpolated data is drawn as a dotted line.
By combining the redundant data 3 and the composite waveform in this way, although the original data is not restored completely, various kinds of noise are sufficiently reduced and audible unnaturalness is hard to perceive, making high-quality error concealment possible.
[Redundant Data]
In the transmission/reception system 100, the transmission device 20 predicts in advance the waveform quality of the restored waveform 5 to be restored by the reception device 50. That is, the waveform quality of the restored waveform 5 for the target frame 7 of the waveform data is predicted.
Typically, an index representing the waveform quality of the composite waveform (such as the noise spectrum described later) is calculated and used as an index of the waveform quality of the restored waveform. The frequency range to be assigned to the redundant data 3 is then set based on the predicted waveform quality.
For example, in a frequency range where the waveform quality of the composite waveform reaches an acceptable level (a frequency range of high waveform quality), interpolating the loss data with the composite waveform is unlikely to produce perceptible unnaturalness. In a frequency range where the waveform quality does not reach an acceptable level (a frequency range of low waveform quality), on the other hand, using the composite waveform may reproduce unnatural audio, and audible unnaturalness may be perceived.
For this reason, in the transmission/reception system 100, error concealment using the redundant data 3 is executed for frequency ranges of low waveform quality.
The following description focuses on the method of setting the frequency range of the redundant data using the prediction result for the waveform quality of the composite waveform.
FIG. 3 is a schematic graph showing the original waveform 4 represented by one frame of original data 6 (target frame 7). The horizontal axis of the graph is time, and the vertical axis is the amplitude value x(n), where n is an index representing time within the frame.
FIG. 4 is a schematic diagram showing an example of the frequency range of the redundant data according to the first embodiment. FIG. 4 shows a schematic graph of the frequency spectrum of the original waveform 4 shown in FIG. 3. The horizontal axis of the graph is frequency, and the vertical axis is the spectrum value X(k), where k is an index (frequency bin) representing the frequency of each spectrum value.
In FIG. 4, an example of a frequency range set based on the waveform quality is schematically indicated by an arrow. The frequency components (spectral components) of the original data 6 contained in this frequency range are encoded to generate the redundant data 3.
In the following, the frequency range assigned to the redundant data 3 is referred to as the encoding frequency range 70. The encoding frequency range 70 is, for example, a frequency range in which the waveform quality of the composite waveform is low (noisy). In the present embodiment, the encoding frequency range 70 corresponds to the target range.
The frequency ranges other than the encoding frequency range 70 (the hatched ranges in FIG. 4) are the frequency ranges interpolated using the composite waveform at packet loss. In the following, a frequency range other than the encoding frequency range 70 is referred to as an interpolation range 71. The interpolation range 71 is, for example, a frequency range in which the waveform quality of the composite waveform is high (little noise). In the present embodiment, the interpolation range 71 corresponds to the non-target range.
The method of evaluating the waveform quality and the method of setting the encoding frequency range 70 (interpolation range 71) will be described in detail later.
FIG. 5 is a schematic diagram showing an example of error concealment given as a comparative example.
In the method shown in FIG. 5, the frequency range used for the redundant data 3 is set to a fixed range on the low-frequency side including the lowest frequency (k = 0 Hz). The frequency range above the fixed range is a blank range in which no data is restored.
In this case, at packet loss, audio containing the spectral components within the fixed range is restored. Such audio, however, is reproduced only within the fixed range of the redundant data 3 added at concealment. As a result, audible unnaturalness may arise from the lost high-frequency energy.
 本実施形態では、図4に示すように、送信装置20により、本来送信したい主データ2の送信フレームの近傍の対象フレーム7のデータが、合成波形の波形品質に応じて、低周波数側に設定された特定の周波数帯域(符号化周波数範囲70)に限り符号化され、冗長データ3が生成される。生成された冗長データ3は、主データ2を送信するパケット1に付加される。
 パケット損失が生じた場合、受信装置50により、損失した主データ2に対応する冗長データ3を合成波形で補間した補間データが生成される。より詳しくは、冗長データ3の周波数スペクトルのうち有効な範囲(符号化周波数範囲70)以外の範囲(補間範囲71)が、過去に正常に受信した近傍フレームから生成した合成波形の周波数スペクトルに置換される。
In the present embodiment, as shown in FIG. 4, the transmission device 20 sets the data of the target frame 7 in the vicinity of the transmission frame of the main data 2 to be originally transmitted to the low frequency side according to the waveform quality of the composite waveform. Only the specific frequency band (encoded frequency range 70) is encoded, and the redundant data 3 is generated. The generated redundant data 3 is added to the packet 1 that transmits the main data 2.
When packet loss occurs, the receiving device 50 generates interpolated data in which the redundant data 3 corresponding to the lost main data 2 is interpolated with a composite waveform. More specifically, the range (interpolation range 71) other than the valid range (encoded frequency range 70) of the frequency spectrum of the redundant data 3 is replaced with the frequency spectrum of the composite waveform generated from the neighboring frames normally received in the past. Will be done.
As a result, the audible discomfort can be suppressed far more effectively than in the case shown in FIG. 5, where only the redundant data 3, whose band is narrower than that of the main data 2, is simply decoded and used as the interpolated data.
It is also desirable that the amount of the redundant data 3 be as small as possible within an acceptable quality. In the present embodiment, the transmission device 20 predicts the waveform quality of the composite waveform in advance and encodes only the bands in which this quality falls below a certain level, so that the amount of the redundant data 3 can be further reduced. This makes it possible to realize high-quality error concealment while suppressing the amount of transmitted data.
One method of interpolating the data is, for example, a process of replacing the spectrum outside the band of the redundant data 3 with the spectrum of a neighboring frame. This process can also be regarded as using a waveform copied from a neighboring frame as the composite waveform.
Even such a simple process with a small amount of computation can greatly reduce degradation in sound quality, such as the muffled or discontinuous sound caused by attenuation of high-frequency energy. In other words, the quality of the error concealment can be kept at a high level at all times with a small amount of redundant data and without increasing the amount of computation.
[Waveform data transmission method]
FIG. 6 is a schematic diagram showing an example of a waveform data transmission method.
Here, an example of a waveform data transmission method including encoding and decoding will be described with reference to FIG. 6. As the waveform data, data representing an audio signal (input signal) such as voice is assumed.
In the transmission device 20, the input signal is divided into frames of N samples each, and a 2N-sample analysis frame is generated that overlaps the following frame by one frame length. This 2N-sample analysis frame is used as the transmission frame 8.
In FIG. 6, the time range corresponding to x(n), the data contained in the transmission frame 8, is schematically illustrated with arrows. In the figure, x_prev(n) and x_next(n) denote the transmission frames 8 temporally before (previous frame) and after (next frame) x(n).
A predetermined analysis window is applied to the original data 6 (input signal) of these transmission frames 8, and the time-frequency-transformed frequency spectrum is calculated. The type of analysis window is not limited. FIG. 6 schematically illustrates the shape of the function representing the analysis window. For the time-frequency transform, a modified discrete cosine transform (MDCT), for example, is used.
The frequency spectrum of the transmission frame 8 is encoded, and the encoded data is packed into the packet 1 as the main data 2 and transmitted. At this time, redundant data 3 for a frame near the main data 2 (the target frame 7) is generated and added to the same packet 1. The type and settings of the encoding method are not limited.
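The framing described above can be sketched in code. The following is a minimal illustration only, assuming a 50% overlap and a sine window (the disclosure leaves the window type open); the function names are hypothetical:

```python
import math

def make_analysis_frames(signal, N):
    # Divide the input into N-sample frames; each analysis frame spans
    # 2N samples, overlapping the following frame by one frame length (N).
    frames = []
    for start in range(0, len(signal) - N, N):
        frames.append(signal[start:start + 2 * N])
    return frames

def apply_analysis_window(frame):
    # Apply a sine window, one common choice of MDCT analysis window
    # (an assumption here; the disclosure does not fix the window type).
    L = len(frame)
    return [x * math.sin(math.pi * (n + 0.5) / L) for n, x in enumerate(frame)]
```

Each windowed 2N-sample frame would then be passed to the MDCT and encoded as the main data 2.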
After receiving the transmitted data (packet 1), the receiving device 50 decodes the data back into a frequency spectrum. For decoding, a decoding method corresponding to the encoding method of the transmission device 20 is used.
An inverse modified discrete cosine transform (IMDCT) is applied to the decoded frequency spectrum to calculate a 2N-sample time waveform. FIG. 6 schematically shows y(n), the data contained in a received frame.
An output signal is generated by applying a synthesis window to each y(n) and overlap-adding it with the previous and next frames in the same positional relationship as on the transmitting side. The output signal is a signal in which a waveform similar to the input signal is reconstructed.
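The overlap-add reconstruction on the receiving side can be sketched as follows (a simplified illustration with the synthesis window omitted; for perfect reconstruction the analysis and synthesis windows must additionally satisfy the usual MDCT window condition):

```python
def overlap_add(frames, N):
    # Each decoded 2N-sample frame y(n) is shifted by N samples relative
    # to the previous one, and the overlapping halves are summed.
    out = [0.0] * (N * (len(frames) + 1))
    for i, frame in enumerate(frames):
        for n, v in enumerate(frame):
            out[i * N + n] += v
    return out
```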
Now suppose that, due to a packet loss, the data corresponding to y(n) (that is, the data of the encoded transmission frame 8 corresponding to x(n)) is lost. In this case, it becomes difficult to properly calculate the output signal over the period of 2N samples corresponding to y(n).
For this reason, the receiving device 50 generates interpolated data that compensates for the missing frame by using, in addition to the redundant data 3 corresponding to the missing frame, a composite waveform generated from y_prev(n) and y_next(n), the decoding results of neighboring frames, or from already reconstructed past frames.
[Transmission / reception system configuration]
FIG. 7 is a block diagram showing a configuration example of the transmission / reception system 100.
The transmission / reception system 100 is a system that transmits waveform data 10 stored as an audio file from the transmission device 20 to the reception device 50 according to, for example, a BLE communication method.
The transmission / reception system 100 is designed assuming a use case in which both the transmission device 20 and the receiving device 50 are constrained in the amount of computation available. An example of such a configuration is the combination of a transmission device 20 such as a smartphone or a digital audio player with a receiving device 50 of limited computing power, such as wireless earphones or wireless headphones.
Of course, the present technique is also applicable when devices with sufficient computing power are used (for example, a PC on the transmitting side and a stationary audio player on the receiving side).
[Transmission device configuration]
Hereinafter, the configuration of the transmission device 20 in the transmission / reception system 100 will be described.
The transmission device 20 includes a retransmission timeout time calculation unit 21, a signal processing unit 22, an input buffer 23, a redundant data generation unit 24, a coding unit 25, a mux unit 26, and a transmission buffer 27. The transmission device 20 is configured using, for example, a computer including a CPU and a memory. The transmission method according to the present embodiment is executed when the transmission device 20 runs the program according to the present embodiment and each unit operates.
When the BLE connection is completed, the retransmission timeout time calculation unit 21 acquires parameters determined according to the combination of the transmission device 20 and the receiving device 50, and calculates the retransmission timeout time. Here, the retransmission timeout time is the time limit within which retransmission of a packet 1 not received by the receiving device 50 is permitted. A packet 1 that has still not been received when the retransmission timeout time elapses is treated as a lost packet.
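The retransmission timeout behavior can be illustrated with a small sketch (the data structure and names are assumptions, not part of the disclosed configuration):

```python
def find_lost_packets(pending, now, retransmission_timeout):
    # pending maps a packet id to the time it was first sent.
    # A packet still unreceived after the retransmission timeout is
    # treated as a lost packet; the others may still be retransmitted.
    return [pid for pid, sent_at in pending.items()
            if now - sent_at > retransmission_timeout]
```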
The signal processing unit 22 reads the audio data (waveform data 10) required to generate one frame from the audio file and executes predetermined signal processing to generate the original data 6. For example, an MDCT is executed, and the frequency spectrum of each frame is calculated as the original data 6. In addition, signal processing for adjusting gain, sound quality, and the like may be executed.
The input buffer 23 is a buffer that temporarily stores the data processed by the signal processing unit 22. The input buffer 23 stores the original data 6 representing the frequency spectrum and the time waveform of the waveform data. When the input buffer 23 is full, the original data 6 with the lowest priority (typically, the oldest original data 6) is discarded.
The redundant data generation unit 24 reads the original data 6 stored in the input buffer 23 and generates the redundant data 3 and the main data 2. At this time, the coded frequency range 70 to which the redundant data 3 is assigned is set based on the waveform quality of the composite waveform.
In setting the coded frequency range 70, it is also possible to use information output from the coding unit 25 described later about the quantization settings (for example, the resolution used when quantizing the values of the frequency spectrum) and about the amount of data to be transmitted (for example, the remaining data capacity of the packet 1).
The specific configuration, operation, and the like of the redundant data generation unit 24 will be described in detail later.
The coding unit 25 encodes the main data 2 and the redundant data 3 output from the redundant data generation unit 24 according to the corresponding coding methods, respectively. For example, the main data 2 is encoded with a relatively low compression ratio, and the redundant data 3 is encoded with a higher compression ratio than the main data 2.
The mux unit 26 stores the main data 2 and the redundant data 3 encoded by the coding unit 25 in a predetermined packet 1. The data capacity of packet 1 and the like are set according to the communication method used (here, BLE communication).
The transmission buffer 27 is a buffer that temporarily stores the packet 1 generated by the mux unit 26. The packets 1 stored in the transmission buffer 27 are transmitted in a predetermined order via the transmission module (not shown).
FIG. 8 is a block diagram showing a configuration example of the redundant data generation unit.
The redundant data generation unit 24 includes an original data selection unit 30, a composite waveform generation unit 31, a generated noise calculation unit 32, a coding range setting unit 33, and a coding spectrum selection unit 34.
In the following, it is assumed that the signal processing unit 22 preceding the redundant data generation unit 24 generates as the original data 6 both the data X(k), representing the time-frequency-transformed frequency spectrum, and the data x(n), representing the corresponding time waveform.
The original data selection unit 30 selects and acquires necessary data from the original data 6 stored in the input buffer 23. Specifically, data representing the frequency spectrum and time waveform of the original data 6 to be processed is read from the input buffer 23. Further, as shown in FIG. 8, the delivery destination of the data differs depending on the data to be acquired.
The frequency spectrum of the original data 6 (original data 6 included in the transmission frame 8) corresponding to the main data 2 is passed through to the coding unit 25 in the subsequent stage as it is.
The frequency spectrum and time waveform of the original data 6 (original data 6 included in the target frame 7) corresponding to the redundant data 3 are input to the generated noise calculation unit 32 and the coded spectrum selection unit 34.
The frequency spectrum and the time waveform of the original data 6 for the composite waveform are input to the composite waveform generation unit 31. The original data 6 for the composite waveform is data contained in a frame near the target frame 7 for which the redundant data 3 is generated (for example, the transmission frame 8).
The composite waveform generation unit 31 generates a composite waveform for the target frame 7 based on the original data 6 for the composite waveform. For example, synthetic data representing the frequency spectrum of the composite waveform is generated.
Alternatively, synthetic data representing the time waveform of the composite waveform may be calculated. In this case, a process of converting the time waveform into a frequency spectrum or the like may be executed.
As described above, in the present disclosure, generating synthetic data representing the frequency spectrum of the synthetic waveform and generating synthetic data representing the time waveform of the synthetic waveform are included in generating the synthetic waveform.
As a method for generating the composite waveform, in the present embodiment, a method is adopted in which the frequency spectrum and the time waveform of the original data 6 for the composite waveform are used as they are as the composite waveform. In this case, the data obtained by copying the original data 6 for the composite waveform becomes the composite data. The method of generating the composite waveform is not limited, and for example, a method of appropriately processing the original data 6 for the composite waveform to generate the composite waveform may be used.
The composite waveform (composite data) is input to the generated noise calculation unit 32.
The generated noise calculation unit 32 acquires the original data 6 corresponding to the redundant data 3 and the composite data representing the composite waveform, and calculates the noise information related to the composite waveform.
The noise information is information representing, for example, the noise that is generated when data is interpolated using the composite waveform. For example, the deviation of the composite waveform from the original waveform 4 of the target frame 7 is calculated as the noise of the composite waveform.
As will be described later, the generated noise calculation unit 32 calculates the frequency distribution (noise spectrum) of such noise, the total amount of noise, and the like as noise information.
For example, a frequency region where noise is large is a region where the waveform quality of the composite waveform is low, and a frequency region where noise is small is a region where the waveform quality of the composite waveform is high. Therefore, it can be said that the noise information is not only the information indicating the waveform quality of the composite waveform but also the information representing the waveform quality of the restored waveform 5 generated by using the composite waveform.
In this way, the generated noise calculation unit 32 predicts the waveform quality of the composite waveform as the waveform quality of the restored waveform. In the present embodiment, the generated noise calculation unit 32 corresponds to the quality prediction unit.
The coding range setting unit 33 acquires noise information and sets a frequency range (coding frequency range 70) to be encoded as redundant data 3.
In the present embodiment, one coded frequency range 70 is set on the low frequency side with respect to the original data 6 included in the target frame 7 based on the noise information (waveform quality of the composite waveform). In addition to the waveform quality of the composite waveform, the coded frequency range 70 can be set by using other information indicating the waveform quality of the restored waveform 5.
In this way, based on the waveform quality of the restored waveform 5, the coding range setting unit 33 sets the coded frequency range 70 as the frequency range allocated to the redundant data 3 used for generating the restored waveform 5 from the original data 6 (waveform data) contained in the target frame 7.
Further, as shown in FIG. 8, information regarding the quantization setting in coding, the data amount of the packet, and the like is input to the coding range setting unit 33. The coded frequency range 70 may be set using this information.
The coded spectrum selection unit 34 acquires the frequency spectrum of the original data 6 corresponding to the redundant data 3 and the coded frequency range 70, and extracts the spectrum component to be used as the redundant data 3. Specifically, only the spectral components included in the coded frequency range 70 are extracted from the original data 6. The data representing these spectral components becomes the redundant data 3 before coding.
In this way, the coded spectrum selection unit 34 generates the redundant data 3 based on the coded frequency range 70.
The redundant data 3 and the main data 2 before encoding are input to the coding unit 25 shown in FIG. 7, and the encoded redundant data 3 and the main data 2 are generated. Then, the mux unit 26 generates a packet 1 (transmission data) including the encoded main data 2 and the redundant data 3.
In the present embodiment, the coded spectrum selection unit 34, the coding unit 25, and the mux unit 26 cooperate to realize a data generation unit that generates transmission data including redundant data.
[Redundant data generation process]
In the present embodiment, the transmission device 20 adds only the spectral components at or below a designated frequency to the packet 1 as the redundant data 3.
Specifically, as shown in FIG. 4, a single coded frequency range 70 is set on the low-frequency side of the frequency spectrum. The minimum frequency k_min of the coded frequency range 70 is set to the lower limit of the frequencies in the frequency spectrum of the original data 6 (k_min = 0). The maximum frequency k_max of the coded frequency range 70 is set based on the noise information (noise spectrum) described above.
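As one hypothetical illustration of how k_max could be derived from the noise spectrum (the concrete rule used by the embodiment is described later; the threshold rule below is an assumption for illustration only):

```python
def choose_k_max(noise_spectrum, threshold):
    # Encode up to the highest bin whose predicted noise power exceeds
    # the threshold; bins above k_max are left to the composite waveform,
    # where the predicted noise is small enough to be tolerated.
    k_max = 0
    for k, p in enumerate(noise_spectrum):
        if p > threshold:
            k_max = k
    return k_max
```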
As the method (waveform synthesis method) by which the receiving device 50 generates the composite waveform for the target frame 7 (redundant data 3), a method of copying the original data 6 of the frame one frame before the target frame 7 is adopted.
For example, suppose the redundant data 3 is the data for x(n) shown in FIG. 6. In this case, the waveform obtained by copying x_prev(n), the original data 6 of the frame one frame before the frame containing x(n) (the target frame 7), becomes the composite waveform. That is, the composite waveform x'(n) of x(n) is expressed as x'(n) = x_prev(n).
For example, when error concealment is executed to compensate for the loss of the packet 1 containing x(n), the spectral components of x_prev(n), the original data 6 of the frame one frame before the target frame 7, are used as they are in the frequency range above the maximum frequency k_max.
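The spectral substitution performed at concealment time can be sketched as follows (a minimal illustration; the array-of-bins representation and names are assumptions):

```python
def conceal_spectrum(redundant_spectrum, prev_spectrum, k_max):
    # Bins 0..k_max come from the decoded redundant data 3; bins above
    # k_max are copied from the previous frame's spectrum, i.e. the
    # composite waveform x'(n) = x_prev(n) in the frequency domain.
    return list(redundant_spectrum[:k_max + 1]) + list(prev_spectrum[k_max + 1:])
```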
By limiting the coded frequency range 70 to the low-frequency side in this way, information for specifying the coded frequency range 70 (for example, the meta information used in embodiments described later) becomes unnecessary, and a situation in which the amount of the redundant data 3 increases can be avoided.
Further, by adopting the method of copying the original data 6 of the previous frame as the waveform synthesis method, the amount of computation on the receiving device 50 side can be reduced.
FIG. 9 is a flowchart showing an example of the generation process of the redundant data 3. The process shown in FIG. 9 is an example of the process executed by the redundant data generation unit 24 and the coding unit 25. This process is, for example, a loop process that is executed every time packet 1 is generated.
First, the original data selection unit 30 acquires the original data 6 to be processed from the input buffer 23 (step 101).
Specifically, the original data 6 of the transmission frame 8 for generating the main data 2, the original data 6 of the target frame 7 for generating the redundant data 3, and the original data 6 for generating the composite waveform are read from the input buffer 23.
Here, the transmission frame 8 is the Mth frame (M), and the target frame 7 is the M + 1th frame (M + 1) (see FIG. 2A and the like). Further, as described above, in the present embodiment, the composite waveform is generated from the original data 6 of the frame immediately before the target frame 7. Therefore, the original data 6 for the composite waveform becomes the main data 2.
If the target frame 7 (redundant data 3) is not the frame immediately after the transmission frame (main data 2), the original data 6 of the frame one frame before the target frame 7 in the time series is read as the original data 6 for the composite waveform, separately from the main data 2.
The composite waveform generation unit 31 executes a composite waveform generation process for generating a composite waveform for the target frame 7 based on the original data 6 for the composite waveform (step 102).
In the present embodiment, the composite waveform generation unit 31 generates a composite waveform for the target frame 7 based on the original data 6 (main data 2) included in the transmission frame 8.
It is desirable that the method for generating the composite waveform is exactly the same as the waveform synthesis method used in the receiving device 50. As described above, in the present embodiment, a method of copying the original data 6 one frame before the target frame 7 (redundant data 3) is used. Therefore, the process by the composite waveform generation unit 31 is a process of passing through the main data 2 which is the original data 6 for the composite waveform as it is.
When another waveform synthesis method is used, a composite waveform is appropriately generated based on the original data 6 for the composite waveform according to the set method.
The generated noise calculation unit 32 executes a generated noise prediction process for predicting the noise generated by using the composite waveform (step 103).
Specifically, by using the composite waveform, the frequency spectrum (hereinafter, referred to as noise spectrum) of the noise generated with respect to the original waveform (original waveform 4) is calculated. The noise spectrum is typically calculated as a power spectrum representing the intensity (power) of noise for each frequency. The noise spectrum can also be used as a measure of the waveform quality of the composite waveform. That is, it can be said that the generated noise prediction process is a process for predicting the waveform quality (noise spectrum) of the composite waveform.
In the present embodiment, the noise spectrum 13 representing the waveform quality of the composite waveform is calculated based on the composite waveform 11 and the original waveform 4 represented by the original data 6 included in the target frame 7.
FIG. 10 is a schematic diagram showing a calculation example of a noise spectrum. 10A to 10C are schematic graphs showing an example of a time waveform for one frame of the original waveform 4, the composite waveform 11, and the difference waveform 12. FIG. 10D is a schematic graph showing an example of the noise spectrum 13.
Here, a method of calculating the noise spectrum 13 by using the original waveform 4 which is a time waveform and the composite waveform 11 will be described.
The generated noise calculation unit 32 reads the original waveform x(n) represented by the original data 6 used for the redundant data 3 and the composite waveform x'(n) calculated by the composite waveform generation unit 31, and calculates the difference waveform 12 representing the difference between the two waveforms.
For example, the original waveform 4 (original data 6) shown in FIG. 10A and the composite waveform 11 (composite data) shown in FIG. 10B are read. As shown in FIGS. 10A and 10B, the composite waveform 11 is, for example, the waveform of the previous frame of the target frame 7 including the original waveform 4, and therefore does not completely match the shape of the original waveform 4.
When each waveform is read, the difference waveform 12 (x (n) −x'(n)) between the original waveform 4 and the composite waveform 11 is calculated. The difference waveform 12 is a waveform representing the difference between the original waveform 4 and the composite waveform 11 at each timing n. FIG. 10C schematically shows a difference waveform 12 between the original waveform 4 and the composite waveform 11 shown in FIGS. 10A and 10B.
A time-frequency transform (for example, a Fourier transform) is applied to the difference waveform 12, and the frequency spectrum of the difference waveform 12 is calculated. The power spectrum representing the absolute value of this frequency spectrum is calculated as the noise spectrum 13 (Pnoise(k)).
In this case, Pnoise(k) is expressed using the following equation.
  Pnoise(k) = |F[x(n) - x'(n)]|   (Equation 1)
Note that F in (Equation 1) represents the Fast Fourier Transform (FFT) applied to the difference waveform 12.
As described above, the noise spectrum 13 is a spectrum obtained by frequency-transforming the difference between the original waveform 4 and the composite waveform 11. This makes it possible to evaluate the noise arising in the actual time waveform for each frequency, so that the waveform quality of the composite waveform 11 can be predicted accurately.
FIG. 10D schematically shows the power spectrum obtained by Fourier-transforming the difference waveform 12 shown in FIG. 10C. As shown in FIG. 10D, the noise spectrum 13 (Pnoise(k)) is a frequency spectrum representing the intensity of the difference between the original waveform 4 and the composite waveform 11 for each frequency k.
For example, in a frequency range where the value of Pnoise(k) is small, the composite waveform 11 can be regarded as close to the original waveform 4, that is, a high-quality composite waveform has been generated. Conversely, in a frequency range where the value of Pnoise(k) is large, the composite waveform 11 can be regarded as deviating from the original waveform 4, and the quality of the composite waveform can be regarded as low.
The noise spectrum 13 is preferably calculated by applying an analysis window w(n). In this case, the noise spectrum 13 is calculated using the following equation instead of (Equation 1).
  Pnoise(k) = |F[w(n)·(x(n) - x'(n))]|   (Equation 2)
The analysis window w(n) is set as appropriate according to, for example, the process by which data is interpolated using the composite waveform 11. This makes it possible to accurately predict the noise that actually occurs when the composite waveform is used.
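As an illustrative sketch (not part of the claimed embodiment), the calculation of (Equation 1) and (Equation 2) can be written as follows. A naive one-sided DFT stands in for the FFT, and the function and parameter names are hypothetical:

```python
import cmath

def noise_spectrum(x, x_synth, window=None):
    # Difference waveform x(n) - x'(n); optionally weighted by the
    # analysis window w(n) as in (Equation 2).
    n = len(x)
    diff = [a - b for a, b in zip(x, x_synth)]
    if window is not None:
        diff = [w * d for w, d in zip(window, diff)]
    # Naive DFT standing in for the transform F of (Equation 1);
    # the magnitude of each bin gives Pnoise(k).
    spectrum = []
    for k in range(n // 2):  # one-sided spectrum, k = 0 .. N/2 - 1
        s = sum(diff[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                for t in range(n))
        spectrum.append(abs(s))
    return spectrum
```

When the two waveforms are identical, every bin of the resulting spectrum is zero, which corresponds to a composite waveform of ideal quality.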
The above described a method of calculating the noise spectrum 13 by taking the difference between the two time waveforms (the original waveform 4 and the composite waveform 11) and then executing an FFT. Alternatively, the noise spectrum 13 can also be calculated using the frequency spectra of the original waveform 4 and the composite waveform 11.
For example, the generated noise calculation unit 32 reads the frequency spectrum X(k) of the original waveform 4 and the frequency spectrum X'(k) of the composite waveform 11, and calculates a difference spectrum representing the difference between the two spectra. As X(k) and X'(k), for example, MDCT spectra obtained by the signal processing unit 22 MDCT-transforming x(n) and x'(n) (for example, the original data 6 of the main data 2) in advance can be used.
In this case, a difference spectrum is calculated by directly taking the difference between the MDCT spectra X(k) and X'(k). Then, the power spectrum representing the absolute value or the square of the difference spectrum is calculated as the noise spectrum 13.
In this case, Pnoise(k) is expressed using the following equation.
  Pnoise(k) = |X(k) - X'(k)|  (or |X(k) - X'(k)|^2)   (Equation 3)
As described above, the noise spectrum 13 is a spectrum representing the difference between the spectrum of the original waveform 4 and the spectrum of the composite waveform 11.
By adopting this method, it also becomes possible to predict noise caused by phase discontinuities, which is difficult to detect by, for example, comparing power spectra with each other.
Further, when a copy from the previous frame is used as the waveform synthesis method, the already MDCT-transformed original data 6 for the composite waveform (here, the main data 2) can simply be read directly. Therefore, it is not necessary to perform the time-frequency transform (FFT or the like) again to calculate the noise spectrum 13, and the MDCT spectrum X'(k) used when generating the main data 2 can be reused. As a result, the amount of computation can be kept sufficiently low.
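As a minimal sketch of the spectral-domain variant of (Equation 3) (illustrative names, not from the embodiment), no further transform is required once the MDCT spectra are available:

```python
def noise_spectrum_from_mdct(X, X_synth, squared=False):
    # Difference spectrum taken directly between the MDCT spectra
    # X(k) and X'(k); its absolute value (or square) is the noise
    # spectrum Pnoise(k) of (Equation 3).  X'(k) can be reused from
    # main-data encoding, so no extra FFT is needed.
    if squared:
        return [(a - b) ** 2 for a, b in zip(X, X_synth)]
    return [abs(a - b) for a, b in zip(X, X_synth)]
```

Because the MDCT coefficients are signed, this difference also reflects phase mismatches that a comparison of power spectra alone would miss.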
Further, the moving average of the spectrum calculated using (Equation 1) to (Equation 3) may be calculated as the noise spectrum 13. The moving average is a process of sliding a predetermined bin range (for example, a range of three bins) over the spectrum and calculating the average of the spectral values included in the bin range.
The noise spectrum 13 (Pnoise-smoothed(k)) calculated by the moving average is expressed using the following equation.
  Pnoise-smoothed(k) = (1/3)·(Pnoise(k-1) + Pnoise(k) + Pnoise(k+1))   (Equation 4, for a bin range of three bins)
This smooths the noise spectrum 13, for example, making the subsequent data processing easier to execute.
In the present disclosure, a spectrum (Pnoise-smoothed(k)) obtained by smoothing the noise spectrum 13 calculated using (Equation 1) or the like is also included in the noise spectrum 13. Hereinafter, Pnoise-smoothed(k) is simply written as Pnoise(k).
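The moving average of (Equation 4) can be sketched as follows; the edge handling (averaging only the bins actually available at the spectrum boundaries) is an assumption, since the text does not specify it:

```python
def smooth_noise_spectrum(p_noise, half_width=1):
    # Slide a bin range of 2*half_width + 1 bins (3 bins for
    # half_width = 1) over the spectrum and average the values
    # inside; at the edges only the available bins are averaged.
    n = len(p_noise)
    out = []
    for k in range(n):
        lo, hi = max(0, k - half_width), min(n, k + half_width + 1)
        out.append(sum(p_noise[lo:hi]) / (hi - lo))
    return out
```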
Returning to FIG. 9, when the noise spectrum 13 has been calculated, the coding range setting unit 33 executes the coding range setting process of setting the coding frequency range 70 (step 104). Specifically, the coding frequency range 70 is set based on the noise spectrum 13 (Pnoise(k)) calculated by the generated noise calculation unit 32.
For example, a frequency range in which the noise is large is determined from the noise spectrum 13 and set as the coding frequency range 70 assigned to the redundant data 3. That is, the frequency range assigned to the redundant data 3 is a frequency range in which the waveform quality of the composite waveform is low.
By using the noise spectrum in this way, the frequency range that should be assigned to the redundant data 3 (that is, in which the composite waveform should not be used) can be set accurately.
The coding range setting process will be described in detail later.
When the coding frequency range 70 has been set, the coded spectrum selection unit 34 executes the coded spectrum selection process of extracting, from the original data 6 of the target frame 7, only the spectral components corresponding to the coding frequency range 70 (step 105).
For example, the original data 6 representing the MDCT spectrum X(k) of the target frame 7 is input to the coded spectrum selection unit 34. Of the spectral components (frequency components) included in X(k), the components included in the coding frequency range 70 are extracted. The data containing the extracted components becomes the redundant data 3 before coding.
Therefore, the data amount of the redundant data 3 before coding varies according to the width of the coding frequency range 70 (that is, the state of the noise spectrum 13).
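Since the coding frequency range starts at index 0 (see FIG. 4), the selection of step 105 reduces to keeping a low-frequency prefix of the spectrum. A sketch with illustrative names:

```python
def select_coded_spectrum(X, redun_area):
    # Coded-spectrum selection (step 105): keep only the components
    # of the target frame's MDCT spectrum X(k) that lie inside the
    # coding frequency range [0, redun_area]; everything above it is
    # left to the composite waveform on the receiving side.
    return X[:redun_area + 1]
```

The length of the returned list is redun_area + 1, which is why the pre-coding size of the redundant data tracks the width of the coding frequency range.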
When the redundant data 3 is extracted, it is determined whether or not the original data 6 to be processed remains (step 106). Specifically, it is determined whether or not the redundant data 3 to be packed in one packet 1 remains.
For example, when the original data 6 to be processed (that is, the redundant data 3 to be generated) remains (YES in step 106), the processes of steps 101 to 105 are executed for the remaining original data 6.
The configuration of the main data 2 and the redundant data 3 packed in one packet 1 is not limited. For example, in the example described with reference to FIG. 2A, packet 1 including one data set (main data 2 and redundant data 3) is generated. In this case, the loop processing of step 106 is not executed. On the other hand, when a plurality of data sets are packed in one packet 1, the processes up to step 105 are executed for the number of redundant data 3 to be generated.
When no original data 6 to be processed remains (NO in step 106), the coding process of encoding the redundant data 3 is executed (step 107). Here, the pre-coding redundant data 3 generated in the above process (the spectral components included in the coding frequency range 70) is encoded using a predetermined coding method, and the encoded redundant data 3 is generated.
In the coding process, it is possible to adjust the data amount of the encoded redundant data 3 by appropriately setting the compression rate (bit rate) and the like at the time of coding. For example, when the target data amount of the redundant data 3 is set, the redundant data 3 is encoded with a compression rate that fits in the target data amount.
Further, the compression rate of the redundant data 3 may be fixed. In this case, the amount of coded redundant data 3 varies depending on the width of the coded frequency range 70 and the like.
The coding of the main data 2 is executed separately from the coding of the redundant data 3.
When the redundant data 3 has been encoded, the target data amount of the main data 2 is set (step 108). Specifically, the free space of the packet 1 is calculated from the data amount of the encoded redundant data 3. The free space of the packet 1 is, for example, the capacity obtained by subtracting the total data amount of the encoded redundant data 3 from the data size of the packet 1. This free space of the packet 1 is set as the target data amount of the main data 2. For example, when the main data 2 is encoded, the compression rate and the like are set as appropriate so that the data amount of the encoded main data 2 fits within the target data amount set here.
[Coding range setting process]
FIG. 11 is a schematic diagram showing a calculation example of the coding frequency range. FIG. 11 shows a schematic graph of the noise spectrum 13 (Pnoise(k)). The horizontal axis of the graph is frequency, and the vertical axis is the spectral value of Pnoise(k).
The details of the coding range setting process (step 104 in FIG. 9) are described below.
In the present embodiment, the coding range setting unit 33 sets an integration range 72 over which the integrated value of the noise spectrum 13 is calculated, and the coding frequency range 70 is set based on the minimum integration range 72 whose integrated value exceeds a first threshold value.
The integrated value of the noise spectrum 13 represents the total amount of noise in the target frequency range. It can therefore also be said that the coding frequency range 70 is set to a frequency range in which the total noise amount is approximately the first threshold value.
In the following, the case where the minimum integration range 72 whose integrated value exceeds the first threshold value is set directly as the coding frequency range 70 will be described.
First, the total noise amount Pnoise-sum in the preset total calculation range 73 is calculated. In FIG. 11, the total calculation range 73 is schematically shown as the dotted-line range.
The minimum index (lowest frequency) of the total calculation range 73 is set to 0. The maximum index (highest frequency) of the total calculation range 73 can be set arbitrarily below half (N) of the total number 2N of FFT indices (the FFT length). In the following, the maximum index of the total calculation range 73 is written as total_area.
The total noise amount Pnoise-sum in the total calculation range 73 is expressed using the following equation.
  Pnoise-sum = Σ_{k=0}^{total_area} Pnoise(k)   (Equation 5)
Further, the integration range 72 is set, and the total noise amount Pnoise-redun-sum in the integration range 72 is calculated. In FIG. 11, the integration range 72 is schematically shown as the solid-line range.
The minimum index (lowest frequency) of the integration range 72 is set to 0, and the maximum index (highest frequency) of the integration range 72 is set so as to be included in the total calculation range 73. In the following, the maximum index of the integration range 72 is written as redun_area.
The total noise amount Pnoise-redun-sum in the integration range 72 is expressed using the following equation.
  Pnoise-redun-sum = Σ_{k=0}^{redun_area} Pnoise(k)   (Equation 6)
The minimum integration range 72 is calculated such that its total noise amount Pnoise-redun-sum given by (Equation 6) is equal to or greater than a predetermined ratio α (for example, 0.7) of the total noise amount Pnoise-sum in the total calculation range 73. That is, the minimum value of redun_area satisfying the following equation is calculated.
  α·Pnoise-sum ≤ Pnoise-redun-sum   (Equation 7)
The left side of (Equation 7) (the product of the predetermined ratio α and the total noise amount Pnoise-sum in the total calculation range 73) corresponds to the first threshold value described above.
The minimum value of redun_area satisfying (Equation 7) is set as the maximum index (highest frequency) of the coding frequency range 70. Further, as described with reference to FIG. 4, the minimum index (lowest frequency) of the coding frequency range 70 is set to 0.
As described above, in the present embodiment, the coding frequency range 70 is set to a frequency range that reflects the distribution of the noise caused by the composite waveform. This makes it possible to assign only the necessary frequency range to the redundant data 3, compared with, for example, assigning the entire frequency range of the original data 6 or a fixed range to the redundant data 3. As a result, the data amount can be reduced without degrading the quality of the redundant data 3.
Here, the noise Pnoise-residue (= Pnoise-sum - Pnoise-redun-sum) remaining in the frequency range not retained as the redundant data 3, that is, the frequency range in which the composite waveform 11 is used, is estimated.
When (Equation 7) is satisfied, and the quantization noise associated with the coding of the redundant data 3 is ignored, Pnoise-residue satisfies the following equation.
  Pnoise-residue ≤ (1 - α)·Pnoise-sum   (Equation 8)
As shown in (Equation 8), the noise Pnoise-residue remaining in the region not retained as the redundant data 3 is reduced to (1 - α) times or less of the noise originally present in the total calculation range 73.
For example, the total calculation range 73 is set based on the human audible range. In this case, Pnoise-sum is the total amount of noise due to the composite waveform 11 within the frequency range audible to humans. The above method can thus be described as setting the coding frequency range 70 of the redundant data so as to reduce the noise by the predetermined ratio α out of the total amount of noise in the audible range.
Furthermore, by setting the coding frequency range 70 on the low-frequency side, with redun_area as its maximum index, it is possible to avoid generating noise due to the composite waveform on the low-frequency side, greatly reducing the perceptual discomfort.
When the ratio α of Pnoise-redun-sum to Pnoise-sum is set high, the coding frequency range 70 becomes large. In this case, although the noise due to the composite waveform decreases, the quantization noise described later may increase. For example, when the target data amount of the redundant data 3 is predetermined, the quantization noise increases as the coding frequency range 70 becomes larger.
Taking this noise trade-off into account, the value of α can also be set so that the total noise amount of the finally restored waveform 5 is reduced.
FIG. 12 is a flowchart showing an example of the coding range setting process. The process shown in FIG. 12 is an example of the coding range setting process described with reference to FIG. 11.
First, Pnoise(k) is read by the coding range setting unit 33 (step 201).
Next, for Pnoise(k), the total noise amount Pnoise-sum in the total calculation range 73 is calculated (step 202). Specifically, the integrated value of the noise spectrum 13 from k = 0 to k = total_area is calculated as Pnoise-sum according to (Equation 5).
Next, the maximum index redun_area of the integration range 72 is initialized (step 203). redun_area functions as a variable used to set the coding frequency range 70; here, 0 is assigned to redun_area.
Next, for Pnoise(k), the total noise amount Pnoise-redun-sum in the integration range 72 is calculated (step 204). Specifically, the integrated value of the noise spectrum 13 from k = 0 to k = redun_area is calculated as Pnoise-redun-sum according to (Equation 6).
Next, it is determined whether Pnoise-redun-sum satisfies the condition shown in (Equation 7) (step 205). That is, Pnoise-redun-sum is compared with Pnoise-sum, and it is determined whether Pnoise-redun-sum is at least α (0 < α < 1) times Pnoise-sum.
When the condition of (Equation 7) is satisfied (YES in step 205), redun_area is set as the maximum value (maximum index) of the coding frequency range 70 (step 208).
When the condition of (Equation 7) is not satisfied (NO in step 205), redun_area is incremented, that is, 1 is added to redun_area (step 206).
Next, it is determined whether the incremented redun_area is within the total calculation range 73 (step 207). That is, it is determined whether redun_area is smaller than total_area (the maximum index of the total calculation range 73).
When redun_area is smaller than total_area (YES in step 207), the processing from step 204 onward is repeated. When redun_area is equal to or greater than total_area (NO in step 207), step 208 is executed.
As described above, in the process shown in FIG. 12, each time the maximum index redun_area of the integration range 72 is increased from k = 0, it is determined whether (Equation 7) is satisfied. The first index satisfying (Equation 7) is then set as the maximum value of the coding frequency range 70.
The method of calculating the integration range 72 satisfying (Equation 7) is not limited to this. For example, redun_area may instead be decreased from k = total_area while determining whether (Equation 7) is satisfied, and the last index satisfying (Equation 7) may be set as the maximum value of the coding frequency range 70.
As described above, in the present embodiment, the lowest frequency of the integration range 72 is set to the lowest frequency of the noise spectrum 13, and the integrated value is calculated while varying the highest frequency of the integration range. This makes it possible to easily calculate redundant data 3 that suppresses noise in the frequency range easily heard by humans.
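Steps 201 to 208 of FIG. 12 can be sketched as follows (illustrative names; the fallback when (Equation 7) is never satisfied follows the NO branch of step 207):

```python
def set_coding_range(p_noise, total_area, alpha=0.7):
    # Accumulate the noise spectrum from k = 0 upward and return the
    # first index redun_area at which the accumulated noise reaches
    # alpha times the total noise over [0, total_area] (Equation 7).
    p_sum = sum(p_noise[:total_area + 1])        # (Equation 5)
    redun_sum = 0.0
    for redun_area in range(total_area + 1):
        redun_sum += p_noise[redun_area]         # (Equation 6)
        if redun_sum >= alpha * p_sum:           # (Equation 7), step 205
            return redun_area
    return total_area                            # step 207 NO -> step 208
```

Because the noise spectrum is non-negative, the accumulated sum is monotone, so the first index satisfying the condition is also the minimum integration range.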
[Coding range setting process considering quantization noise]
The above described a method of setting the coding frequency range 70 that focuses mainly on the noise generated by using the composite waveform 11 (the noise spectrum 13), among the noise components included in the interpolated data generated at the time of packet loss (see FIG. 2B).
In addition to the noise due to the composite waveform 11, the interpolated data may include quantization noise associated with the coding of the redundant data 3. For example, in the process of encoding the redundant data 3, the original data 6 is quantized according to the set compression rate and the like. The higher the compression rate and the lower the quantization precision (bit rate, etc.), the smaller the data amount, but also the larger the quantization noise.
For example, a use case is also conceivable in which the target data amount of the redundant data 3 (hereinafter written as nbit) is predetermined so that the size of the redundant data 3 does not become larger than necessary relative to the main data 2. In this case, the compression rate and the like are set so that the data amount of the encoded redundant data 3 fits within nbit. Therefore, for example, when the coding frequency range 70 is large, the compression rate is set high, and the quantization noise may increase.
FIG. 13 is a schematic diagram for explaining the total noise amount of the interpolated data. FIGS. 13A and 13B are schematic graphs showing the frequency distribution of the noise included in the interpolated data. The horizontal axis of the graphs is frequency, and the vertical axis is the noise intensity at each frequency.
As shown in FIG. 13A, the coding frequency range 70 set on the low-frequency side is the region where the redundant data 3 is used, and there the noise of the interpolated data is represented by the quantization noise Nq(k). The region above the coding frequency range 70 is the region where the composite waveform 11 is used, and there the noise of the interpolated data is represented by the noise Pnoise(k) due to the composite waveform.
In FIG. 13B, a coding frequency range 70 wider than that in FIG. 13A is set. In this case, the total amount of Pnoise(k) decreases and the total amount of Nq(k) increases.
FIG. 13C is a graph showing the relationship between the total noise amount of the interpolated data and the coding frequency range. Here, curves of the total amount Pnoise of the noise due to the composite waveform, the total amount Nq of the quantization noise, and the total noise amount (Pnoise + Nq) of the interpolated data are shown for varying values of the maximum index (redun_area) of the coding frequency range 70.
For example, when redun_area shifts to the high-frequency side, Pnoise decreases but Nq increases. That is, Pnoise and Nq are in a trade-off relationship with respect to redun_area. Therefore, the total noise amount (Pnoise + Nq) of the interpolated data is represented by a downward-convex curve, as shown in FIG. 13C, and takes its minimum value when redun_area is set to a certain frequency.
Hereinafter, the coding range setting process taking the quantization noise into account will be described.
In this process, for the frequency range assigned to the redundant data 3, the quantization noise Nq(k) that will occur is calculated in a simplified manner, in place of Pnoise(k), from information such as the quantization precision determined according to the coding method of the redundant data 3. The coding frequency range 70 is then set so that the noise power (total noise amount) over the entire interpolated data is minimized.
For example, suppose that the target data amount of the redundant data 3 is set. In this case, the target data amount (nbit) and information on the coding method used for coding the redundant data 3 are input to the coding range setting unit 33. Then, the quantization noise Nq(k) generated at each frequency is calculated. For example, a value approximating the quantization noise Nq(k) is calculated according to the coding method of the redundant data 3.
Using this quantization noise Nq(k), the total noise amount Pnoise-residue of the interpolated data is expressed using the following equation.
Figure JPOXMLDOC01-appb-M000009
That is, P noise-residue is expressed as the total amount of noise remaining in the range other than the coded frequency range 70 (total amount of noise according to the composite waveform 11) and the total amount of quantization noise N q (k) in the coded frequency range 70. Will be done.
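The placeholder above stands for an equation image (Equation 9). Based on the surrounding description (composite-waveform noise outside the coded range plus quantization noise inside it), a plausible reconstruction is:

```latex
P_{\mathrm{noise\text{-}residue}}
  = \sum_{k=\mathrm{redun\_area}+1}^{N-1} P_{\mathrm{noise}}(k)
  + \sum_{k=0}^{\mathrm{redun\_area}} N_q(k)
```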
The redun_area that minimizes Pnoise-residue shown in (Equation 9) is calculated.
As explained with reference to FIG. 13C, Pnoise-residue is expected to be convex downward in the region of redun_area < N. Therefore, for example, if Pnoise-residue at redun_area = ki is smaller than its values at ki-1 and ki+1, then that ki is the redun_area that minimizes Pnoise-residue.
For example, by varying redun_area appropriately and comparing the magnitudes of Pnoise-residue before and after each change, the redun_area that minimizes Pnoise-residue is calculated and set as the maximum index of the coded frequency range 70.
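The search described above can be illustrated with a short sketch (not part of the patented implementation; the function name and plain-list inputs are assumptions). For simplicity it evaluates the residue at every candidate redun_area rather than only comparing neighboring values, which gives the same answer for the downwardly convex profile of FIG. 13C:

```python
def find_redun_area(p_noise, n_q):
    """Return the redun_area index that minimizes the interpolated data's
    total noise:  Pnoise-residue = sum(p_noise[r+1:]) + sum(n_q[:r+1]).

    p_noise : per-frequency noise power of the composite waveform
    n_q     : per-frequency quantization noise of the redundant data
    """
    n = len(p_noise)
    best_r, best_residue = 0, float("inf")
    tail = sum(p_noise)      # composite-waveform noise not yet covered
    head = 0.0               # quantization noise accumulated so far
    for r in range(n):
        tail -= p_noise[r]   # indices 0..r are now covered by redundant data
        head += n_q[r]
        residue = tail + head
        if residue < best_residue:
            best_r, best_residue = r, residue
    return best_r
```

For instance, with a decreasing p_noise and a constant n_q, the minimum falls where the marginal quantization noise first outweighs the composite-waveform noise removed.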
In this way, the coding range setting unit 33 calculates, based on the noise spectrum 13 and the quantization noise Nq(k) accompanying the coding of the redundant data 3, the total noise amount Pnoise-residue of the interpolated data in which the redundant data 3 is interpolated with the composite waveform 11, and sets the coded frequency range 70 so that the total noise amount Pnoise-residue is minimized. This makes it possible to minimize the sum of noise over the entire band that is important for hearing. As a result, the quality of the interpolated data can be sufficiently improved.
[Coding range setting process using total power]
It is also possible to set the coded frequency range 70 based on the intensity (power) of the spectral components that become the redundant data 3.
For example, a target power Ptarget is set as a threshold for the total power of the frequency range to be used as the redundant data 3. Ptarget is calculated using, for example, a table that records the Ptarget corresponding to the target data amount (nbit) of the redundant data 3, or a formula that calculates Ptarget according to the target data amount. Based on this Ptarget, the following conditional expression is set.
Figure JPOXMLDOC01-appb-M000010
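The placeholder above stands for an equation image (Equation 10); from the description that follows it, a plausible reconstruction is:

```latex
\sum_{k=0}^{\mathrm{redun\_area}} \lvert X(k) \rvert^{2} < P_{\mathrm{target}}
```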
The left side of (Equation 10) is the integrated power of the spectral components of the original waveform 4 (original data 6) over k = 0 to redun_area. This integrated value represents the total power of the spectral components extracted as the redundant data 3.
The coding range setting unit 33 calculates the maximum redun_area that satisfies (Equation 10) and sets it as the maximum index of the coded frequency range 70. That is, the maximum frequency range in which the total power of the redundant data 3 is less than the target power Ptarget is set as the coded frequency range 70. By using this method, the coded frequency range 70 can be set easily.
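The condition of (Equation 10) can be sketched as follows (an illustrative fragment; the function name and list-based spectrum input are assumptions):

```python
def set_range_by_power(spectrum, p_target):
    """Largest redun_area such that the cumulative power of the original
    spectrum X(k) over k = 0..redun_area stays below p_target.

    Returns -1 if even the first bin already reaches the target power.
    """
    total = 0.0
    redun_area = -1
    for k, x in enumerate(spectrum):
        total += abs(x) ** 2
        if total >= p_target:
            break
        redun_area = k
    return redun_area
```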
Note that, as a characteristic of human hearing, even the same power may be perceived with different intensity in different frequency regions. For this reason, it is desirable to correct in advance the frequency distributions such as Pnoise(k), Nq(k), and |X(k)| used in the coding range setting processes described above.
For example, a threshold thresh(k), set for each frequency based on human auditory characteristics, is subtracted from the value of each frequency distribution. This thresh(k) is set using, for example, a loudness curve showing the frequency distribution of volumes audible to humans. If the value after subtraction is negative, it is set to 0. Alternatively, a process of weighting the values of each frequency distribution according to thresh(k) may be executed.
By applying a correction according to human auditory characteristics in this way, it is possible to avoid counting noise components that would be inaudible anyway. As a result, the coded frequency range 70 can be set appropriately.
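The threshold correction described above can be sketched as follows (illustrative only; names are assumptions). Each distribution value is reduced by the per-frequency audibility threshold thresh(k) and clamped at 0:

```python
def correct_for_loudness(dist, thresh):
    """Subtract a per-frequency audibility threshold thresh(k) from a
    frequency distribution (e.g. Pnoise(k), Nq(k) or |X(k)|), clamping
    negative results to 0 so inaudible components are not counted."""
    return [max(d - t, 0.0) for d, t in zip(dist, thresh)]
```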
[Receiver configuration]
Hereinafter, the configuration of the receiving device 50 in the transmitting/receiving system 100 will be described.
As shown in FIG. 7, the receiving device 50 includes a communication controller 51, a reception buffer 52, a demux unit 53, a main data buffer 54, a redundant data buffer 55, a reproduction data selection unit 56, a decoding unit 57, a signal processing unit 58, and an audio DAC 59. The receiving device 50 is configured using, for example, a computer including a CPU and a memory. The receiving method according to the present embodiment is executed when the receiving device 50 executes the program according to the present embodiment and each unit operates.
The communication controller 51 monitors, for example, BLE communication and controls the communication state. The communication controller 51 generates packet loss information, for example, when a packet 1 is lost. Based on this information, error concealment in the receiving device 50 is started.
The reception buffer 52 is a buffer that receives the packet 1 transmitted from the transmission device 20 and temporarily stores it.
As described above, the packet 1 includes the encoded main data 2 and redundant data 3.
Of these, the main data 2 is data obtained by encoding, in the transmission device 20, the original data 6 included in the transmission frame 8.
The redundant data 3 is data obtained by encoding the spectral components of the coded frequency range 70 out of the original data 6 included in the target frame 7. That is, the reception buffer 52 receives the redundant data 3 assigned, based on the waveform quality of the restored waveform 5 for the target frame 7 of the waveform data, to the coded frequency range 70 within the frequency range of the waveform data (original data 6) included in the target frame 7. In the present embodiment, the reception buffer 52 corresponds to the receiving unit.
The demux unit 53 reads the packet 1 stored in the reception buffer 52 as appropriate and separates it into the encoded main data 2 and the encoded redundant data 3. A data number (frame ID) for identifying the data in the reproduction data selection unit 56 described later is added to each separated piece of data.
The demux unit 53 also inquires of the communication controller 51 about packet loss information. When packet loss occurs, the packet loss information is output to each subsequent unit.
The main data buffer 54 and the redundant data buffer 55 are buffers that temporarily store the encoded main data 2 and redundant data 3 separated by the demux unit 53.
The reproduction data selection unit 56 reads data to be reproduced (hereinafter referred to as reproduction data) from the main data buffer 54 or the redundant data buffer 55. The reproduction data is selected in chronological order so that the frames are reproduced in the proper order.
The reproduction data selection unit 56 also notifies the signal processing unit 58 of the presence or absence of reproduction data.
The decoding unit 57 reads the reproduction data (the main data 2 or one or more pieces of redundant data 3) selected by the reproduction data selection unit 56, and decodes each piece of data according to the corresponding coding method.
The signal processing unit 58 performs signal processing on the data (main data 2 or redundant data 3) decoded by the decoding unit 57 and generates digital data representing the final time waveform.
For example, when there is no packet loss and the main data 2 has been properly received, frequency-time conversion (for example, IMDCT) is executed on the decoded main data 2.
Also, for example, when packet loss occurs and the main data 2 to be reproduced does not exist, interpolated data is generated based on the corresponding redundant data 3 and the data for generating a composite waveform (such as the main data 2 of frames near the lost data). Processing such as frequency-time conversion is then executed on this interpolated data.
Here, as the method of generating the composite waveform, a method of copying the original data 6 (main data 2) of the frame one frame before the target frame 7 is used.
The audio DAC 59 performs digital-to-analog conversion on the digital data processed by the signal processing unit 58 to generate an analog audio signal. This audio signal is input to a reproduction element such as a speaker (not shown), and the sound of the audio file (the waveform of the waveform data 10) is reproduced.
FIG. 14 is a block diagram showing a configuration example of the signal processing unit 58 included in the receiving device 50.
The signal processing unit 58 includes a spectrum replacement unit 60, a spectrum buffer 61, an IMDCT unit 62, and a time signal output unit 63.
When packet loss occurs, the spectrum replacement unit 60 executes spectral component replacement processing on the redundant data 3 decoded by the preceding decoding unit 57. When no packet loss has occurred, it acquires the decoded main data 2 and outputs it to the subsequent stage as-is. In the following, the data output from the spectrum replacement unit 60 is referred to as spectrum data.
Information specifying the presence or absence of the redundant data 3 and information specifying the spectral component replacement method are also input to the spectrum replacement unit 60. The information specifying the replacement method includes information specifying the data used for replacement and information specifying the frequency range to be replaced. The spectral component replacement processing is executed based on this information.
For example, when packet loss occurs, the redundant data 3 for the lost frame (target frame 7) and the spectrum data of the frame one frame before the target frame 7, stored in the spectrum buffer 61, are input to the spectrum replacement unit 60.
As described above, the receiving device 50 uses the method in which a copy of the main data 2 of the frame one frame before the target frame is used as the composite waveform 11. Therefore, the spectrum data of the previous frame serves as the composite data representing the spectrum of the composite waveform 11.
The spectrum replacement unit 60 uses the spectrum data (composite data) of the previous frame to replace the spectral components of the interpolation range 71, i.e., the range other than the coded frequency range 70 assigned to the redundant data 3, and outputs new spectrum data (interpolated data).
That is, the spectrum replacement unit 60 and the spectrum buffer 61 generate the composite waveform 11 for the target frame 7 and generate interpolated data in which the redundant data 3 is interpolated with the composite waveform 11. The waveform represented by this interpolated data is the restored waveform 5.
Thus, the spectrum data output from the spectrum replacement unit 60 is the decoded main data 2 when no packet loss has occurred, and the interpolated data when packet loss has occurred. These spectrum data are stored in the spectrum buffer 61 as appropriate, in case spectral component replacement processing is required for subsequent frames.
In the present embodiment, the spectrum replacement unit 60 and the spectrum buffer 61 work together to realize a waveform restoration unit that generates the restored waveform based on the redundant data.
The IMDCT unit 62 executes IMDCT on the spectrum data (the decoded main data 2 or the interpolated data) output from the spectrum replacement unit 60. As a result, data representing the time waveform is restored in frame units.
The time signal output unit 63 applies a synthesis window (see FIG. 6) to the IMDCT result and executes overlap-add with the previous IMDCT result. This makes it possible to reconstruct a temporally continuous digital audio signal (digital data). The result of the overlap-add is output to the audio DAC in the subsequent stage.
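One step of the windowed overlap-add performed by the time signal output unit 63 can be sketched as follows (a generic 50%-overlap illustration with assumed names; the actual synthesis window of FIG. 6 is not reproduced here):

```python
def overlap_add(prev_tail, cur_imdct, window):
    """One windowed overlap-add step with 50% overlap.

    prev_tail : second half of the previous windowed IMDCT frame (length N)
    cur_imdct : current IMDCT output (length 2N)
    window    : synthesis window (length 2N)
    Returns (output_samples, new_tail) where output_samples are the N
    reconstructed samples and new_tail is kept for the next frame.
    """
    n = len(cur_imdct) // 2
    windowed = [w * x for w, x in zip(window, cur_imdct)]
    out = [prev_tail[i] + windowed[i] for i in range(n)]
    return out, windowed[n:]
```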
FIG. 15 is a flowchart showing an example of the operation of the signal processing unit 58. This process is a loop process executed continuously on a frame-by-frame basis.
In the following, it is assumed that the presence or absence of the redundant data 3 has been notified and that, when the corresponding main data 2 or redundant data 3 exists, the signal processing unit 58 (spectrum replacement unit 60) has acquired the spectrum of the main data 2 or redundant data 3 whose decoding was completed in the preceding processing.
First, it is determined whether the data acquired by the spectrum replacement unit 60 is the main data 2 (step 301). When the acquired data is the main data 2 (YES in step 301), the main data 2 is stored in the spectrum buffer 61 (step 306).
When proper main data 2 cannot be acquired, for example due to loss of a packet 1, the spectrum replacement unit 60 acquires the corresponding redundant data 3. When the acquired data is not the main data 2 in this way (NO in step 301), the spectrum of the previous processing result is acquired from the spectrum buffer 61 (step 302). For example, the processing result of one frame before, that is, the main data 2 of one frame before, is acquired. Note that the spectrum of the previous processing result is an MDCT spectrum generated by the MDCT executed in the transmission device 20.
When the processing result of one frame before is acquired, the spectrum replacement unit 60 executes waveform/spectrum synthesis processing (step 303). Specifically, this is processing that generates the spectrum X'dec[] of the composite waveform 11 from the processing result of one frame before, using a predetermined waveform synthesis method. Note that X[] denotes an array indexed by the index k corresponding to frequency.
Here, since the waveform synthesis method of using a copy of the previous frame as the composite waveform is employed, the spectrum of the previous processing result stored in the spectrum buffer is used as-is as the spectrum X'dec[] of the composite waveform 11.
Next, the spectrum replacement unit 60 executes replacement region setting processing that sets the regions in which spectral components are to be replaced (step 304). In this processing, the spectrum Xout[] of the frame to be reproduced is prepared. This Xout[] will be output as the interpolated data.
The replacement region setting processing calculates, for Xout[], the indices to which the spectrum Xredun[] of the redundant data 3 is assigned and the indices to which the spectrum X'dec[] of the composite waveform 11 is assigned.
Next, the spectrum replacement unit 60 executes spectrum replacement processing that replaces the spectral components (step 305). Specifically, based on the indices assigned in the replacement region setting processing, each spectral component of Xout[] is replaced with a spectral component of the redundant data 3 or of the composite waveform 11. As a result, Xout[] becomes the interpolated data in which the redundant data 3 is interpolated with the composite waveform 11.
The replacement region setting processing and the spectrum replacement processing will be described in detail later.
The Xout[] generated as the interpolated data is stored in the spectrum buffer 61 for the processing of the next frame (step 306).
The spectrum data (main data 2 or interpolated data) processed by the spectrum replacement unit 60 is input to the IMDCT unit 62, and IMDCT processing is executed (step 307). As a result, the MDCT spectrum is converted into data representing the time waveform.
Finally, the time signal output unit 63 applies the synthesis window to the IMDCT result and overlap-adds it with the IMDCT result of one frame before, reconstructing the digital audio signal.
FIG. 16 is a flowchart showing an example of the replacement region setting processing. The processing shown in FIG. 16 is an example of the internal processing of step 304 in FIG. 15.
FIG. 17 is a schematic diagram showing an example of the frequency ranges set by the replacement region setting processing.
In the following, the variable indicating the index of each spectrum data is written as isp. The array storing the spectrum indices of the redundant data 3 is written as redun_isp[], and the array storing the indices of the spectrum to be replaced with the composite waveform 11 is written as replace_isp[].
The replacement region setting processing is processing that assigns appropriate indices to redun_isp[] and replace_isp[].
First, it is determined whether the redundant data 3 corresponding to the frame ID to be reproduced exists (step 401). When the redundant data 3 exists (YES in step 401), the processing of storing indices in redun_isp[] (step 402) is executed, followed by the processing of storing indices in replace_isp[] (step 403).
FIG. 17A illustrates the index ranges (frequency ranges) added to redun_isp[] and replace_isp[], respectively.
In step 402, the indices included in the coded frequency range 70 (here, from 0 to the index indicating the highest frequency to be coded) are stored in redun_isp[] as consecutive numbers. That is, redun_isp[] stores the indices of the decoded redundant data 3 (Xredun[]). For example, when the spectrum up to isp = 100 is used as the redundant data 3, the 101 numbers from 0 to 100 are entered in redun_isp[].
In step 403, the indices included in the interpolation range 71, which is the frequency range other than the coded frequency range 70, are stored in replace_isp[] as consecutive numbers. That is, the indices other than those stored in redun_isp[] are stored in replace_isp[]. For example, when the maximum index of the coded frequency range 70 is 100 and the total number of spectra is set to 256, the numbers 101 to 255 are entered in replace_isp[]. For the indices stored in replace_isp[], the spectrum X'dec[] of the composite waveform 11 is used.
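Steps 402 and 403 can be sketched as follows (illustrative; the function name and arguments are assumptions):

```python
def set_replacement_regions(max_coded_index, total_spectra):
    """Build redun_isp / replace_isp index lists for the case where
    redundant data exists (FIG. 17A): indices 0..max_coded_index come
    from the redundant data, the remaining indices up to total_spectra-1
    come from the composite waveform."""
    redun_isp = list(range(max_coded_index + 1))
    replace_isp = list(range(max_coded_index + 1, total_spectra))
    return redun_isp, replace_isp
```

With max_coded_index = 100 and total_spectra = 256 this reproduces the example in the text: 101 redundant-data indices (0 to 100) and composite-waveform indices 101 to 255.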
When the redundant data 3 corresponding to the frame ID to be reproduced does not exist (NO in step 401), redun_isp[] is processed as having no elements to enter (step 404). Therefore, redun_isp[] is left empty.
Also, in the present embodiment, when the redundant data 3 does not exist, processing that generates substitute data using only the spectrum of the previous processing result stored in the spectrum buffer 61 is executed.
In this processing, a mode that includes tone components and a mode that excludes tone components can be selected when using the spectrum of the previous processing result. Here, a tone component is a tonal spectral component. For example, a spectral component that reproduces a sound (tone) of a constant frequency is a tone component. If the spectrum of the previous processing result is used as-is, the phase of a tone component may shift and the time waveform may become discontinuous, which can cause an audible sense of unnaturalness.
In the processing shown in FIG. 16, it is determined whether to use the mode that excludes tone components, that is, the mode that separates tone components from the spectrum of the previous processing result (step 405).
When the mode that excludes tone components is selected (YES in step 405), tone component detection processing that detects tone components in the spectrum of the previous processing result is executed (step 406). The method of detecting tone components is not limited, and any method capable of detecting tonal frequency ranges may be used.
When tone components are detected, tone_isp[] is generated as a list of indices specifying the tone components and the frequency ranges in their vicinity (step 407). For example, for each detected tone component, indices specifying the frequency range of ±j (for example, j = 3) around the tone component are calculated as consecutive numbers and entered in tone_isp[].
Next, the processing of storing the necessary indices in replace_isp[] is executed. Specifically, of the spectrum indices of the previous processing result, all indices other than those in tone_isp[] are stored in replace_isp[].
FIG. 17B illustrates the index ranges (frequency ranges) added to replace_isp[]. As shown in FIG. 17B, replace_isp[] is set to the frequency ranges excluding tone_isp[] (the shaded ranges in the figure). For the indices stored in replace_isp[], the spectrum X'dec[] of the composite waveform 11 is used. Note that the spectral components corresponding to tone_isp[] are not used as data for reproduction. This makes it possible to suppress the audible sense of unnaturalness caused by discontinuous tone components.
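The tone-exclusion mode (steps 406 to 408) can be sketched as follows, assuming the tone peaks have already been detected by some method. The names and the clamping of the ±j neighbourhood at the band edges are assumptions:

```python
def replace_isp_excluding_tones(total_spectra, tone_peaks, j=3):
    """Build replace_isp for the tone-exclusion mode (FIG. 17B): all
    indices of the previous frame's spectrum except a +/-j neighbourhood
    around each detected tone peak. tone_isp collects the excluded
    indices, which are not used as data for reproduction."""
    tone_isp = set()
    for peak in tone_peaks:
        for k in range(max(0, peak - j), min(total_spectra, peak + j + 1)):
            tone_isp.add(k)
    replace_isp = [k for k in range(total_spectra) if k not in tone_isp]
    return sorted(tone_isp), replace_isp
```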
When the mode that excludes tone components is not selected (NO in step 405), all spectrum indices of the previous processing result are stored in replace_isp[]. That is, when tone components are not excluded, the spectrum X'dec[] of the composite waveform 11 is used as-is as the substitute data.
FIG. 18 is a flowchart showing an example of the spectrum replacement processing. The processing shown in FIG. 18 is an example of the internal processing of step 305 in FIG. 15.
Here, as in the description of FIG. 15, the spectrum of the redundant data 3 is written as Xredun[] and the spectrum of the composite waveform 11 as X'dec[]. The spectrum data output as the result of the spectrum replacement processing is written as Xout[].
Note that X'dec[] and Xout[] are N spectra satisfying 0 ≤ k < N when MDCT with an analysis length of 2N is used. Also, let isp be the variable indicating the spectrum index. The loop processing shown in FIG. 18 is executed by scanning this isp.
First, isp and Xout[] are initialized (step 501). Specifically, isp and all the elements of Xout[] are set to 0 (isp = 0, Xout[] = 0).
Next, it is determined whether the current isp is included in redun_isp[] (step 502). When isp is included in redun_isp[] (YES in 502), the corresponding spectral component (Xredun[isp]) exists in the redundant data 3, so that component is assigned to Xout[isp] (step 503).
When isp is not included in redun_isp[] (NO in 502), it is determined whether isp is included in replace_isp[] (step 504).
When isp is included in replace_isp[] (YES in 504), the corresponding spectral component (X'dec[isp]) exists in the spectrum of the composite waveform 11, so that component is assigned to Xout[isp] (step 505).
If the redundant data 3 does not exist and redun_isp[] remains empty over several consecutive frames, X'dec[isp] may be multiplied by a weight less than 1 that changes according to the number of times the redundant data 3 was absent, so that the sound can be faded out.
When the current isp is included in neither redun_isp[] nor replace_isp[] (NO in step 504), no assignment is performed for Xout[isp].
When step 503 or step 505 is completed, and when NO is determined in step 504, isp is incremented to scan the next index (step 506).
It is then determined whether the incremented isp is smaller than the total number of MDCT spectra (step 507). If isp is smaller than the total number of spectra (YES in step 507), the processing from step 502 onward is executed again. In this way, the spectrum in which the redundant data 3 and the composite waveform 11 are combined (the interpolated data) is stored in Xout[] until isp reaches (total number of spectra N) − 1.
When isp becomes equal to or larger than the total number of spectra (NO in step 507), the spectrum replacement process is completed and Xout[] is output to the IMDCT unit 62.
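The loop of steps 501 through 507 can be expressed as a minimal Python sketch. The function name and data representations below are assumptions for illustration: redun_isp and replace_isp are modeled as sets of spectral indices, and the spectra as plain lists.

```python
def spectrum_replacement(x_redun, x_dec, redun_isp, replace_isp, n_spectra):
    """Combine the redundant-data spectrum with the composite-waveform spectrum.

    x_redun     : spectrum of the redundant data 3 (Xredun[])
    x_dec       : spectrum of the composite waveform 11 (X'dec[])
    redun_isp   : indices covered by the coded frequency range(s)
    replace_isp : indices covered by the interpolation range
    n_spectra   : total number of MDCT spectra N
    """
    x_out = [0.0] * n_spectra          # step 501: initialize Xout[]
    for isp in range(n_spectra):       # steps 502-507: scan isp
        if isp in redun_isp:           # step 502 YES -> step 503
            x_out[isp] = x_redun[isp]
        elif isp in replace_isp:       # step 504 YES -> step 505
            x_out[isp] = x_dec[isp]
        # otherwise (step 504 NO) Xout[isp] stays 0
    return x_out                       # handed to the IMDCT unit
```

Indices covered by neither set remain zero, matching the NO branch of step 504.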
As described above, in the transmission device 20 according to the present embodiment, the waveform quality (noise spectrum 13) of the composite waveform 11 is predicted for the target frame 7 of the waveform data. Based on this waveform quality, one coded frequency range 70 to be assigned to the redundant data 3 is set within the frequency range of the waveform data included in the target frame 7. Transmission data (packet 1) including the redundant data 3 generated on the basis of the coded frequency range 70 is then generated. This makes it possible to realize high-quality error concealment while suppressing the amount of transmitted data.
Further, the receiving device 50 according to the present embodiment receives the redundant data 3 assigned to one coded frequency range 70 within the frequency range of the waveform data included in the target frame 7. This coded frequency range 70 is set using the waveform quality of the composite waveform 11 for the target frame 7. Interpolated data in which the received redundant data 3 is interpolated with the composite waveform 11 is then generated. This makes it possible to realize high-quality error concealment while suppressing the amount of transmitted data.
Conventionally, packet loss concealment methods that generate interpolated data from frames in the vicinity of a lost frame have been proposed.
For example, a "hybrid concealment method" that generates interpolated data using a different waveform synthesis method for each frequency band is known (see Patent Document 1).
Waveform synthesis performed in the time domain is generally effective for speech, whose energy is distributed up to about 4 kHz to 5 kHz, but its computational cost is large, and in higher bands a spurious harmonic structure may arise and produce a beeping artifact. On the other hand, waveform synthesis performed in the frequency domain works effectively on noise-like components, particularly in the high band, and its computational cost is often small, but for speech it may produce click noise due to phase discontinuity.
In a typical sound source, speech is concentrated in the mid-low band, while the high band often contains noise. Exploiting this, the "hybrid concealment method" performs waveform synthesis using a time-domain technique for the low band and a frequency-domain technique for the high band, thereby reducing the computational cost while maintaining the quality of the synthesized audio.
However, this method is a combination of existing waveform synthesis methods, and may not give good results when the waveform is aperiodic and pitch-period detection is difficult, for example with a combination of multiple instruments such as an orchestra, or when the power fluctuates. Moreover, a sound having a harmonic structure, such as a musical instrument, does not always lie in the same mid-low band as speech. Therefore, if the waveform synthesis methods are merely divided at a single frequency, the noise components associated with each concealment method may become audible. Thus, depending on the type of sound source, quality may deteriorate. In addition, since concealment in the mid-low band is performed in the time domain, which has a large computational cost, a certain computational load may remain.
Another possible method is to add redundant data of neighboring frames in advance to the data that is originally to be transmitted (the main data) and use it as interpolated data when a packet is lost. When redundant data is used in this way, the computational load on the receiving side caused by the concealment processing hardly increases and high quality can be achieved, but the amount of transmitted data increases significantly.
In contrast, a method of controlling whether or not to add redundant data has been proposed (Japanese Patent Laid-Open No. 2003-249957). In this method, the SN ratio or cepstrum distance between the waveform corresponding to the preceding and following frames synthesized from the main data and the original waveform of those frames is compared with a threshold, and redundant data is added only when the value is equal to or less than the threshold. This makes it possible to reduce the average amount of redundant data, but the amount of redundant data is not reduced when frames requiring redundant data continue.
In the present embodiment, the transmission device 20 encodes a frame in the vicinity of the main data that is originally to be transmitted only within a partial frequency range (the coded frequency range 70) to generate the redundant data 3.
The coded frequency range 70 is set based on the waveform quality (noise spectrum 13, etc.) of the composite waveform used in the receiving device 50.
When packet loss occurs, the receiving device 50 replaces, within the frequency spectrum of the redundant data 3 corresponding to the lost data, the interpolation range 71 other than the coded frequency range 70 with the frequency spectrum of the composite waveform 11 generated from neighboring frames that were received normally in the past. In this way, the interpolated data in which the redundant data 3 is interpolated using the composite waveform 11 is used as data for reproduction.
As a result, it is possible, for example, to use the redundant data 3 in ranges where the noise caused by the composite waveform would be large, and to use the composite waveform 11 in ranges where that noise is small. Consequently, the quality of the interpolated data can be sufficiently improved, and high-quality error concealment can be realized.
Further, the transmission device 20 predicts the waveform quality of the composite waveform in advance and sets the width of the coded frequency range 70 based on that quality. This makes it possible to set the width of the coded frequency range 70 appropriately for each frame. As a result, redundant data 3 covering an appropriate range can be generated for each frame. Moreover, since the width of the coded frequency range 70 changes as appropriate, the amount of redundant data 3 is reduced compared with, for example, using a fixed-width frequency range, and the amount of transmitted data can be suppressed.
When a target data amount is set for the redundant data 3, the quality of the interpolated data as a whole, including the quality of the redundant data 3, is predicted, and the coded frequency range 70 is set so that the quality of the interpolated data is maximized. This makes it possible to realize the highest-quality error concealment within the given data budget.
Also, in the present embodiment, by using the waveform quality of the composite waveform, the optimum coded frequency range 70 can be set according to the characteristics of the waveform synthesis method. As a result, even when a simple waveform synthesis method such as copying the previous frame is used, the coded frequency range 70 can be set so as to reduce the final amount of noise. This makes it possible to significantly reduce the computational load on the receiving device 50 side while maintaining the quality of the error concealment.
Note that the above description mainly concerned a coded frequency range 70 set on the low-frequency side. The present technology is not limited to this; for example, the lowest frequency of the coded frequency range 70 may be fixed to an arbitrary frequency according to the use case or the like. In this case, the highest frequency of the coded frequency range 70 is set according to the noise of the composite waveform. This makes it possible to realize high-quality error concealment tailored to the use case.
<Second embodiment>
The transmission/reception system of a second embodiment according to the present technology will be described. In the following description, the description of parts similar in configuration and operation to the transmission/reception system 100 described in the above embodiment is omitted or simplified.
FIG. 19 is a schematic diagram showing an example of the coded frequency ranges 70 according to the second embodiment.
In the present embodiment, a plurality of coded frequency ranges 70 are set as the frequency ranges assigned to the redundant data 3. The position, width, number, and the like of the coded frequency ranges 70 can be set freely.
Therefore, in the present embodiment, the transmission device 20 encodes the data of the target frame 7 only within coded frequency ranges 70 that are not necessarily limited to the low band, according to the waveform quality of the composite waveform, to generate the redundant data 3.
In the example shown in FIG. 19, two coded frequency ranges 70 are set. The spectral components included in the coded frequency ranges 70 are used as the redundant data 3. The range other than the coded frequency ranges 70 becomes the interpolation range 71, which is interpolated using the composite waveform.
The configuration of the transmission/reception system according to the present embodiment is substantially the same as, for example, the configuration of the transmission/reception system 100 (transmission device 20 and reception device 50) described in the above embodiment. Hereinafter, each component is described using the same reference numerals as in the transmission/reception system 100.
In the present embodiment, the content of the coding range setting process that sets the coded frequency ranges 70 and the structure of the data transmitted as the packet 1 differ from those of the above embodiment.
Specifically, in the coding range setting process, a process of setting a plurality of coded frequency ranges 70 is executed as shown in FIG. 19. In addition, meta information for specifying the plurality of coded frequency ranges 70 is added to the packet 1. In the transmission device 20, this meta information is encoded by the coding unit 25 as a part of the redundant data 3.
As described above, the transmission device 20 can freely select a plurality of coded frequency ranges 70. For this reason, the indices at both ends are used to specify each coded frequency range 70. Specifically, information specifying the lowest-frequency index lsp and the highest-frequency index hsp is generated as meta information.
In the meta information, the i-th coded frequency range 70 is specified as the frequency range from lsp_i to hsp_i. For example, suppose the second coded frequency range 70 is the frequency range from k = 20 to k = 33. In this case, the second coded frequency range 70 is expressed in the form (lsp_2, hsp_2) = (20, 33).
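The (lsp_i, hsp_i) metadata format described above can be illustrated with a short Python sketch. The list-of-pairs representation and the helper name below are assumptions, not the document's actual encoding.

```python
def expand_ranges(ranges):
    """Expand (lsp, hsp) index pairs into the set of coded spectral indices.

    Each pair specifies one coded frequency range 70; both endpoints are
    included, matching the text's example of k = 20 .. 33.
    """
    isp = set()
    for lsp, hsp in ranges:
        isp.update(range(lsp, hsp + 1))  # inclusive of hsp
    return isp

# Hypothetical metadata: two coded frequency ranges, the second being
# (lsp_2, hsp_2) = (20, 33) as in the text.
ranges = [(5, 12), (20, 33)]
```

The resulting index set plays the role of redun_isp[] on the receiving side.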
Note that as the number of coded frequency ranges 70 increases, the meta information also increases, which may squeeze the amount of data available for transmitting the main data 2 and the redundant data 3.
To prevent such a situation, in the present embodiment, the maximum allowable number of coded frequency ranges 70 is set in advance. In the coding range setting process, a plurality of candidate ranges that are candidates for the coded frequency ranges 70 are calculated, and these candidate ranges are aggregated so as to fit within the maximum number. This point will be described in detail later with reference to FIGS. 24, 25, and the like.
[Calculation of the coded frequency ranges]
In the present embodiment, the coding range setting unit 33 calculates at least one excess range in which the noise spectrum 13 exceeds a second threshold, and sets the coded frequency ranges 70 based on the at least one excess range.
The noise spectrum 13 is the power spectrum Pnoise(k) of the noise caused by using the composite waveform, and is calculated, for example, by the method described with reference to FIG. 10. The second threshold is set according to, for example, the allowable noise power.
The excess range is a frequency range in which the spectral components (noise power) of the noise spectrum 13 exceed the second threshold. Accordingly, the number of calculated excess ranges changes according to the state of the noise spectrum 13; that is, the number of excess ranges may differ from frame to frame.
By using the second threshold in this way, it is possible to selectively detect the frequency ranges in which the noise power exceeds the allowable level. This makes it possible, for example, to exclude low-noise ranges from the frequency range of the redundant data 3, and thus to suppress the amount of the redundant data 3.
FIG. 20 is a schematic diagram showing a calculation example of the coded frequency ranges 70. In FIG. 20, the noise spectrum 13 (Pnoise(k)) is drawn as a thick solid line, and the second threshold is drawn as a dotted line.
As shown in FIG. 20, in the present embodiment, a threshold curve 14 defined per frequency is used as the second threshold. Specifically, a threshold thresh(k) set for each frequency based on human auditory characteristics (for example, a loudness curve) is used as the threshold curve 14.
In thresh(k), for example, a low threshold is set for frequencies that humans hear easily, and a high threshold is set for frequencies that humans hear poorly. This makes it possible to selectively assign to the redundant data 3 the ranges in which the composite waveform 11 would produce critical, easily audible noise. In addition, by assigning to the composite waveform 11 the ranges in which the generated noise is hard to hear, the amount of the redundant data 3 can be suppressed.
In the coding range setting process, the coding range setting unit 33 compares the noise spectrum 13 (Pnoise(k)) with the threshold curve 14 (thresh(k)), and calculates the frequency ranges (excess ranges 74) in which the noise power exceeds thresh(k). Specifically, the set of indices k satisfying the following relationship is calculated:

  Pnoise(k) > thresh(k)   …(11)

As Pnoise(k), the moving average of the noise power (Pnoise-smoothed(k)) calculated using the above (Equation 4) is typically used. Using the averaged noise spectrum 13 keeps down the number of calculated excess ranges 74, which reduces the computational load required for the subsequent processing. Alternatively, a noise spectrum calculated using (Equation 1) to (Equation 3) may be used.
In the present embodiment, a frequency range that includes, in addition to the excess range 74 satisfying the relationship of (Equation 11), a predetermined number of indices located before and after the excess range 74 (for example, three indices) is set as the coded frequency range 70.
In FIG. 20, the regions set as the coded frequency ranges 70 are schematically shown as shaded regions. By adding the spectra in the vicinity of the excess range 74 to the redundant data 3 in this way, the generation of noise due to the composite waveform 11 can be reliably suppressed. Of course, the excess range 74 may also be used as the coded frequency range 70 as it is.
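The comparison of (Equation 11) followed by the widening with neighboring indices can be sketched in Python. The function below is an illustration under assumed data representations (plain lists for the spectra, a `pad` parameter for the predetermined number of neighboring indices); it is not the document's actual implementation.

```python
def candidate_ranges(p_noise, thresh, pad=3):
    """Find indices where Pnoise(k) > thresh(k) (Eq. 11, the excess
    ranges 74), widen each by `pad` neighboring indices, and return
    the merged candidate ranges as inclusive (lsp, hsp) pairs."""
    n = len(p_noise)
    cand = set()
    for k in range(n):
        if p_noise[k] > thresh[k]:  # index belongs to an excess range 74
            cand.update(range(max(0, k - pad), min(n, k + pad + 1)))
    # merge contiguous indices into (lsp, hsp) candidate ranges 75
    ranges, start = [], None
    for k in range(n + 1):
        if k < n and k in cand:
            if start is None:
                start = k
        elif start is not None:
            ranges.append((start, k - 1))
            start = None
    return ranges
```

With `pad=0` the function returns the excess ranges 74 themselves, corresponding to using them directly as coded frequency ranges.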
Note that the coded frequency ranges 70 set by the method shown in FIG. 20 can be aggregated and adjusted by subsequent processing. Therefore, the frequency range set using the excess range 74 and the indices before and after it can be regarded as a candidate range 75, that is, a candidate for a coded frequency range 70.
Thus, in the present embodiment, a plurality of candidate ranges 75 that are candidates for the coded frequency ranges 70 are calculated, and the coded frequency ranges 70 are set based on the plurality of candidate ranges 75.
FIG. 21 is a flowchart showing an example of the coding range setting process. The process shown in FIG. 21 is executed, for example, by the coding range setting unit 33 in place of step 104 shown in FIG. 9.
First, the candidate ranges 75 are set based on the noise spectrum 13 (Pnoise(k)) and the threshold curve 14 (thresh(k)) (step 601). Here, the candidate ranges 75 are set according to the method described with reference to FIG. 20. That is, the index ranges satisfying the relationship shown in (Equation 11) (the excess ranges 74) are calculated, and frequency ranges including a predetermined number of indices before and after them are set as the candidate ranges 75.
Next, a process of adjusting the threshold curve 14 (thresh(k)) based on the spectrum X'(k) of the composite waveform 11 and readjusting the candidate ranges 75 is executed (step 602). Specifically, thresh(k) is adjusted in the manner of calculating a masking threshold, with reference to the power of the spectrum X'(k) of the composite waveform 11.
In general, when a sound of a certain frequency is present, there may be bands (masked bands) in which sounds below a certain volume become hard to hear. thresh(k) is adjusted on the basis of the volume at which sound becomes hard to hear (the masking threshold).
For example, spectral components that may be reproduced are detected from the power of X'(k) in regions other than the candidate ranges 75 set once. For bands rendered inaudible by these spectral components, thresh(k) is set high on the assumption that even high noise power causes little auditory discomfort. As a result, the candidate ranges 75 become narrower, and the amount of the redundant data 3 can be suppressed.
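The threshold adjustment of step 602 could be sketched as follows. The masking model here is a deliberately crude assumption: the threshold is simply raised to a fixed offset (in dB) below the composite waveform's own power at each bin, whereas a real psychoacoustic model would also spread the masking effect across neighboring frequencies. All names and the offset value are illustrative.

```python
def adjust_threshold(thresh, p_synth, candidate, offset_db=10.0):
    """Raise thresh(k) at bins outside the candidate ranges where the
    composite waveform's own spectral power p_synth(k) would mask the
    concealment noise (crude single-bin masking assumption)."""
    out = list(thresh)
    for k in range(len(thresh)):
        if k not in candidate:
            # assumed masking level: p_synth attenuated by offset_db
            mask = p_synth[k] / (10.0 ** (offset_db / 10.0))
            out[k] = max(out[k], mask)
    return out
```

Re-running the excess-range calculation with the raised threshold then narrows the candidate ranges 75, as the text describes.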
Next, a non-tone component exclusion process that excludes non-tone components from the candidate ranges 75 is executed (step 603). Here, a non-tone component is a component other than the tone components included in the spectrum X(k) of the original waveform 4 or the spectrum X'(k) of the composite waveform 11. A non-tone component can be regarded as a simple noise-like component that disappears, for example, when the frame changes. Therefore, among the spectral components satisfying the relationship of (Equation 11), the components corresponding to non-tone components have little perceptual effect even if they are replaced with, for example, the composite waveform 11.
The non-tone component exclusion process excludes such non-tone components from the candidate ranges 75 to narrow them. This point will be described in detail later.
Next, a frequency range aggregation process that aggregates the candidate ranges 75 calculated in the steps above is executed (step 604). This process reduces the number of candidate ranges 75.
Finally, a frequency range adjustment process that reduces the width of the remaining candidate ranges 75 is executed (step 605). This process is particularly effective in use cases where a target data amount exists for the redundant data 3 and that amount is small, so that quantization noise becomes a problem.
The processes of steps 602, 603, and 605 may be executed as appropriate according to, for example, the required noise level.
[Non-tone component exclusion process]
FIG. 22 is a schematic diagram for explaining the non-tone component exclusion process.
In the non-tone component exclusion process, the tone components 15 are extracted from the original waveform 4 (the original data 6 of the target frame 7). More specifically, the tone components 15 included in the spectrum X(k) of the original waveform 4 are extracted. For extracting the tone components 15, a method such as comparing spectral features with neighboring frames is used.
In the upper part of FIG. 22, as an example, the tone components 15 in |X(k)| are schematically indicated by circles.
In the following, the set (array) of indices of the tone components 15 and their vicinity (for example, three spectra) is written as tone_isp[]. Components included in ranges other than tone_isp[] are the non-tone components.
The set of indices indicating the candidate ranges 75 calculated in the preceding processing (step 601 or step 602) is written as enc_isp[].
In the upper and middle parts of FIG. 22, the frequency ranges corresponding to tone_isp[] and enc_isp[] are schematically illustrated.
Within the ranges indicated by enc_isp[], in the frequency range at or above a certain frequency fc (for example, 2 kHz), auditory discomfort is unlikely to occur even when Pnoise(k) outside the tone components 15 is large. That is, noise derived from non-tone components is hard to hear above fc.
The value of the frequency fc is not limited and can be set arbitrarily according to, for example, the required noise level. In the present embodiment, the frequency fc corresponds to a predetermined threshold frequency.
Thus, for ordinary noise-like components that are not tonal (non-tone components), there may be little auditory discomfort even when Pnoise(k) becomes large.
In the present embodiment, the range at or above the frequency fc is divided into tonal and noise-like frequency ranges. The frequency ranges judged to be noise-like are excluded from the coded frequency ranges 70 (candidate ranges 75) regardless of the magnitude of Pnoise(k), and only the frequency ranges judged to be tonal are retained.
In the lower part of FIG. 22, the candidate ranges 75 adjusted to exclude the non-tone components are schematically illustrated.
For example, in the region where the frequency is fc or higher, the intersection of tone_isp[] and enc_isp[] (tone_isp[] ∩ enc_isp[]) is calculated. The frequency ranges represented by this intersection are set as the new candidate ranges 75. As a result, the width of enc_isp[] is reduced, leaving the indices of the tone components 15 and their vicinity. In this way, on the high-frequency side of the frequency fc, the width of the candidate ranges 75 is adjusted so as to include the tone components 15.
Note that, on the high-frequency side of the frequency fc, the width of the candidate ranges 75 (enc_isp[]) may instead be expanded so that the tone components 15 and the indices in their vicinity are completely included in the candidate ranges 75. This makes it possible to reliably replace the tone components 15 with the redundant data 3.
Further, in the region below the frequency fc, not only tone components but also noise derived from non-tone components is likely to cause perceptual discomfort. Therefore, on the low-frequency side of the frequency fc, enc_isp[] is set as the candidate range 75 as it is.
For example, as shown in FIG. 22, for a candidate range 75 that includes k = 0 Hz and lies below the frequency fc, enc_isp[] is set as the new candidate range 75 without modification.
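The adjustment described above (keep enc_isp[] unchanged below fc, and intersect it with tone_isp[] at fc and above) can be sketched as follows. This is a minimal illustrative Python sketch, not part of the embodiment; the index sets are modeled as Python sets, and the names update_candidate_range and kc (the spectral index corresponding to fc) are assumptions.

```python
# Illustrative sketch: update the candidate range around the threshold index kc.
# enc_isp and tone_isp are modeled as sets of spectral indexes (assumed names).

def update_candidate_range(enc_isp, tone_isp, kc):
    """Keep enc_isp unchanged below index kc; at or above kc, keep only the
    indexes that also belong to tone_isp (tone components and their vicinity)."""
    low = {k for k in enc_isp if k < kc}                # below fc: kept as-is
    high = {k for k in enc_isp if k >= kc} & tone_isp   # at/above fc: intersection
    return low | high

# Example: enc_isp spans indexes 0..9, tone indexes are 6..8, threshold index kc = 5.
new_range = update_candidate_range(set(range(10)), {6, 7, 8}, 5)
print(sorted(new_range))  # [0, 1, 2, 3, 4, 6, 7, 8] -> indexes 5 and 9 are dropped
```
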
FIG. 23 is a flowchart showing an example of the non-tone component exclusion process. The process shown in FIG. 23 is an example of the internal process of step 603 in FIG. 21.
First, data representing the spectra of the original waveform 4 and the composite waveform 11 (X(k) and X'(k)) are acquired (step 701). Each spectrum may be an FFT spectrum or an MDCT spectrum. In the present embodiment, the composite waveform 11 is generated by copying the data one frame before the redundant data 3; that is, X'(k) is the spectrum Xprev(k) of the previous frame.
In step 701, past spectra and the like required for the tone component detection process described later may also be acquired as necessary. In the following, it is assumed that the processing is performed using the two spectra X(k) and X'(k).
Next, the power of each spectrum of the original waveform 4 and the composite waveform 11 (|X(k)| and |X'(k)|) is calculated (step 702). This process is executed when the power of each spectrum is required in the tone component detection process.
Next, the tone component detection process for detecting the tone component 15 is executed (step 703). In this process, the tone component 15 is detected from each of the spectra X(k) and X'(k). For example, a spectral component with a strong tonal character within each spectrum is calculated as the tone component 15, taking into account the shape of the power spectrum, the temporal correlation between the spectra of the preceding and following frames, and the like. The method for calculating the tone component 15 is not limited.
Next, a set tone_isp[] of indexes including the tone component 15 is generated (step 704). Specifically, the indexes of the tone component 15 and its vicinity (for example, the indexes of three spectra before and after it on the frequency axis) are acquired, and these indexes are stored in tone_isp[].
Finally, the already calculated candidate range 75 (enc_isp[]) is updated based on tone_isp[] (step 705). Specifically, as described with reference to FIG. 22, in the range at or above the frequency fc, the intersection of tone_isp[] and enc_isp[] is set as the candidate range 75, and in the range below the frequency fc, enc_isp[] is set as the candidate range 75 as it is.
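Step 704 above can be illustrated by the following sketch, which collects each detected tone index together with its neighborhood on the frequency axis (here three indexes on each side, as in the example above). The function name and the clipping to a spectrum_size bound are assumptions added for illustration.

```python
def build_tone_isp(tone_indexes, n_neighbors=3, spectrum_size=1024):
    """Collect each detected tone index and n_neighbors indexes on each side
    along the frequency axis, clipped to the valid index range."""
    tone_isp = set()
    for k in tone_indexes:
        for d in range(-n_neighbors, n_neighbors + 1):
            if 0 <= k + d < spectrum_size:
                tone_isp.add(k + d)
    return sorted(tone_isp)

# Two detected tone components at indexes 10 and 100:
print(build_tone_isp([10, 100]))  # indexes 7..13 and 97..103
```
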
As a result, at or above the frequency fc, for example, the width of the candidate range 75 is reduced so as to include the tone components 15 of X(k) and X'(k).
For example, the tone component 15 of X(k) is a component that would be lost if the composite waveform 11 were used, while the tone component 15 of X'(k) is a component that would be added if the composite waveform 11 were used.
Therefore, by narrowing the width of the candidate range 75 so that each tone component 15 is included, it is possible to reduce the data amount of the redundant data 3 while avoiding such loss or addition of tone components 15.
[Frequency range aggregation process]
FIG. 24 is a schematic diagram for explaining the frequency range aggregation process.
Here, it is assumed that five candidate ranges 75 (ranges 1 to 5) have been generated by the above-described non-tone component exclusion process, as shown in the upper part of FIG. 24. As shown in the lower part of FIG. 24, a process is executed to aggregate the total number N of these candidate ranges 75 down to a preset maximum number Nmax (here, Nmax = 2). The middle part of FIG. 24 shows the candidate ranges 75 generated in the course of this aggregation.
In the present embodiment, the coding range setting unit 33 calculates a connection cost representing the amount of noise that changes when candidate ranges 75 adjacent to each other are connected. The candidate ranges are then connected based on the connection cost.
The connection cost is, for example, an index indicating the increase or decrease in the amount of noise caused by connecting candidate ranges 75. For example, when candidate ranges 75 are connected, the total amount of quantization noise contained in the interpolation data generated on the receiving side changes. The connection cost is set so as to become higher when the amount of noise increases and lower when the amount of noise decreases. The connection cost will be described in detail later.
In the present embodiment, a two-stage aggregation process using the connection cost is executed.
The first stage calculates the connection cost for each unencoded range lying between candidate ranges 75 and, when the connection cost is at or below a certain threshold, combines the frequency ranges on both sides into one. This stage is always performed in order to prevent careless fragmentation of the frequency ranges.
The aggregation from the upper part to the middle part of FIG. 24 is an example of the first stage.
The second stage is executed when the number N of candidate ranges 75 after the first stage still exceeds the maximum number Nmax; it repeatedly connects the pair of candidate ranges 75 with the smallest connection cost. This stage is executed until the number N of candidate ranges 75 falls within the maximum number Nmax.
The aggregation from the middle part to the lower part of FIG. 24 is an example of the second stage.
The process of connecting candidate ranges 75 is executed, for example, by the following method.
First, pairs of two candidate ranges 75 are compared from the low-frequency side, and when the connection cost satisfies the "combining condition" described later, the two target candidate ranges 75 are combined into one frequency range (candidate range 75). In the following, the number representing each candidate range 75 is referred to as range number i.
Combining candidate ranges 75 means deleting the candidate ranges 75 with range numbers i and i+1 and generating, as a new candidate range 75, the frequency range from the lowest-frequency index lsp_i of the original range number i to the highest-frequency index hsp_(i+1) of the original range number i+1.
For example, in the process of aggregating the candidate ranges 75 from the upper part to the middle part of FIG. 24, the candidate ranges 75 of range numbers 2 and 3 (ranges 2 and 3 in the upper part) are combined to generate a new candidate range 75 (range 2 in the middle part) spanning lsp_2 to hsp_3. The indexes of the newly generated candidate range are written as lsp_2' (= lsp_2) and hsp_2' (= hsp_3).
By performing this process in order on the 1st to (N-1)th of the originally N candidate ranges 75, the candidate ranges 75 are aggregated.
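The combining operation described above (delete ranges i and i+1 and generate the range lsp_i to hsp_(i+1)) can be sketched as follows, with each candidate range modeled as a (lsp, hsp) index pair; merge_pair is a hypothetical helper name used only for illustration.

```python
def merge_pair(ranges, i):
    """Merge ranges[i] and ranges[i+1] (each a (lsp, hsp) tuple) into one
    range spanning lsp_i .. hsp_(i+1)."""
    lsp_i, _ = ranges[i]
    _, hsp_next = ranges[i + 1]
    return ranges[:i] + [(lsp_i, hsp_next)] + ranges[i + 2:]

# Ranges 2 and 3 (0-based positions 1 and 2) are combined:
print(merge_pair([(0, 10), (20, 30), (35, 50), (80, 90)], 1))
# [(0, 10), (20, 50), (80, 90)]
```
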
Three methods for the condition for combining candidate ranges 75 are described below. Methods other than these three may also be used.
The first method determines whether the quantization noise of the entire frame decreases when one set of meta information is eliminated by combining candidate ranges 75.
In this method, first, the sum PNQ-sum of the quantization noise Nq(k) over the entire target frame 7, that is, the sum of the quantization noise Nq(k) over the N candidate ranges 75, is calculated according to the following equation.
  PNQ-sum = Σ_{j=1}^{N} Σ_{k=lsp_j}^{hsp_j} Nq(k)   (Equation 12)
In (Equation 12), the range number of the candidate range 75 is written as j.
Next, the sum P'NQ-sum of the quantization noise N'q(k) over the entire target frame 7 when the j-th and (j+1)-th candidate ranges 75 are combined is calculated. P'NQ-sum is the sum of the quantization noise N'q(k) over the N-1 candidate ranges 75 including the combined candidate range 75.
Since the reduction in meta information increases the number of bits available for allocation, N'q(k) may be smaller than Nq(k). P'NQ-sum is calculated according to the following equation.
  P'NQ-sum = Σ_{j=1}^{N-1} Σ_{k=lsp'_j}^{hsp'_j} N'q(k)   (Equation 13)
Here, lsp'_j and hsp'_j are the lowest- and highest-frequency indexes of the range number j changed by the combination.
Next, the difference ΔPall-noise between the sums of the quantization noise calculated according to (Equation 12) and (Equation 13) is calculated according to the following equation.
  ΔPall-noise = P'NQ-sum - PNQ-sum   (Equation 14)
That is, ΔPall-noise is the change in the total amount of quantization noise caused by combining the j-th and (j+1)-th candidate ranges 75. This ΔPall-noise is used as the connection cost.
As the condition for combining the j-th and (j+1)-th candidate ranges 75, the condition that the total amount of quantization noise decreases is set here. That is, when ΔPall-noise satisfies the following condition, the j-th and (j+1)-th candidate ranges 75 are combined.
  ΔPall-noise < 0   (Equation 15)
This makes it possible to reduce the number of candidate ranges 75, and thus the data amount of the meta information, without increasing the quantization noise.
The second method determines whether the sum of the power |X(k)| of the spectrum of the original waveform 4 contained between candidate ranges 75 is at or below a predetermined threshold. In this method, the sum of |X(k)| between the candidate ranges 75 is used as the connection cost.
For example, if a frequency range in which |X(k)| is large is replaced with the composite waveform 11, the amount of generated noise is likely to increase. Conversely, even if a frequency range in which |X(k)| is small is replaced with the composite waveform 11, the amount of generated noise is estimated to be small. Using this property, whether to combine candidate ranges 75 is determined based on the sum of |X(k)|. This method does not require calculating the quantization noise and can be regarded as a simplified version of the first method.
Specifically, the sum Psum_inter_i of the spectral power over the frequency range (intermediate range) between the i-th and (i+1)-th candidate ranges 75 is calculated according to the following equation.
  Psum_inter_i = Σ_{k=hsp_i+1}^{lsp_(i+1)-1} |X(k)|   (Equation 16)
When Psum_inter_i calculated in this way is at or below a predetermined threshold, the i-th and (i+1)-th candidate ranges 75 are combined.
The threshold for judging Psum_inter_i may be a value set in relation to the data amount of the meta information or the like, or may be a predetermined fixed value.
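A minimal sketch of (Equation 16), summing the power |X(k)| over the intermediate range between two candidate ranges, is shown below; the list power and the function name are assumptions for illustration.

```python
def inter_range_power(power, hsp_i, lsp_next):
    """Sum the original-waveform spectral power |X(k)| over the
    intermediate range hsp_i+1 .. lsp_(i+1)-1 (Equation 16)."""
    return sum(power[k] for k in range(hsp_i + 1, lsp_next))

# Hypothetical per-index power values; indexes 5..7 form the intermediate range.
power = [0] * 20
power[5], power[6] = 2, 1
print(inter_range_power(power, 4, 8))  # 3 -> combine the ranges if 3 <= threshold
```
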
The third method determines whether the interval between candidate ranges 75 (the difference between spectral indexes) is at or below a predetermined threshold. In this method, the interval between the candidate ranges 75 is used as the connection cost.
For example, where the interval between candidate ranges 75 is small, the amount of generated noise is estimated to be small even if that portion is replaced with the composite waveform 11. Using this property, whether to combine candidate ranges 75 is determined based on their interval. This method does not require calculating the sum of spectral power and can be regarded as a simplified version of the second method.
Specifically, the total number of indexes between the i-th and (i+1)-th candidate ranges 75 is calculated according to the following equation.
  lsp_(i+1) - hsp_i - 1   (Equation 17)
When the total number of indexes between the candidate ranges 75 calculated in this way (the interval between the candidate ranges 75) is at or below a predetermined threshold, the i-th and (i+1)-th candidate ranges 75 are combined.
The three methods described above are used in the first stage of connecting candidate ranges 75 (the aggregation from the upper part to the middle part of FIG. 24).
If the number N of candidate ranges 75 is still larger than Nmax after these processes are executed, the process of combining the pair of two candidate ranges 75 with the smallest cost into one frequency range is repeatedly executed. This makes it possible to aggregate the candidate ranges 75 down to the specified number (Nmax).
FIG. 25 is a flowchart showing an example of the frequency range aggregation process. The process shown in FIG. 25 is an example of the internal process of step 604 in FIG. 21. Here, the connection cost (Psum_inter_i) described in the second method above is used.
First, the variable N representing the number of candidate ranges 75 and the variable i representing the range number are initialized (step 801). The number of currently existing candidate ranges 75 (the number calculated in the preceding processes) is assigned to N, and 1 is assigned to the variable i, which holds the range number for scanning the candidate ranges 75.
Next, it is determined whether or not the range number i is N-1 or less (step 802). If i is N-1 or less (YES in step 802), the first stage of connecting candidate ranges 75 (steps 803 to 806) is executed.
In the first stage, first, the indexes representing the frequency range between the i-th and (i+1)-th candidate ranges 75 (the i-th intermediate range) are acquired (step 803). Specifically, the highest-frequency index hsp_i of the i-th candidate range 75 and the lowest-frequency index lsp_(i+1) of the (i+1)-th candidate range 75 are read.
Here, the range from (hsp_i + 1) to (lsp_(i+1) - 1) is the i-th intermediate range.
Next, for the i-th intermediate range, the sum Psum_inter_i of the spectral power of the original waveform 4 is calculated according to (Equation 16), and it is determined whether or not it is at or below the predetermined threshold (step 804). Here, it is determined whether the sum of |X(hsp_i+1)| to |X(lsp_(i+1)-1)| is at or below the threshold.
If Psum_inter_i is at or below the threshold (YES in step 804), the two candidate ranges 75 (the i-th and (i+1)-th candidate ranges 75) are combined to generate one new candidate range 75 (step 805). If Psum_inter_i is larger than the threshold (NO in step 804), the candidate ranges 75 are not combined.
Next, the range number i is incremented (step 806), and step 802 determines whether the incremented range number i is N-1 or less. In this way, the aggregation process of comparing Psum_inter_i with the threshold is repeated until the range number i reaches N-1.
If i is larger than N-1 in step 802, that is, if i = N (NO in step 802), the number of ranges N is updated (step 807). Specifically, the current total number of candidate ranges 75 (the number of candidate ranges 75 resulting from steps 803 to 806) is assigned to N.
Next, it is determined whether or not the updated number of ranges N is at or below the maximum number Nmax (step 808). If N is at or below Nmax (YES in step 808), that is, if the number of candidate ranges 75 has been sufficiently reduced by the first-stage aggregation, the frequency range aggregation process ends.
If N is larger than Nmax (NO in step 808), the second-stage aggregation is executed (step 809). Here, the candidate ranges 75 on both sides of the intermediate range for which Psum_inter_i is smallest are combined. That is, the i-th candidate range 75 for which the sum of |X(hsp_i+1)| to |X(lsp_(i+1)-1)| is smallest is combined with the (i+1)-th candidate range 75.
Next, the number of ranges N is updated (step 810). Specifically, since the number of ranges has decreased by one in step 809, N-1 is assigned to N. Step 808 is then executed again to determine whether the updated number of ranges N is at or below Nmax. In this way, the candidate ranges 75 are aggregated in ascending order of connection cost until the number of ranges N falls to the allowable maximum number Nmax.
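One way to realize the flow of FIG. 25 is sketched below, using the inter-range power sum as the connection cost. This is an illustrative variant rather than the exact flowchart: index bookkeeping after a merge is simplified (the scan does not advance past a merged pair), and all names are assumptions.

```python
def aggregate(ranges, power, threshold, n_max):
    """Two-stage aggregation sketch (second method). ranges is a list of
    (lsp, hsp) index pairs sorted by frequency; power holds |X(k)| per index."""
    def cost(i):
        # Psum_inter_i: power summed over the gap between ranges i and i+1.
        return sum(power[k] for k in range(ranges[i][1] + 1, ranges[i + 1][0]))

    # Stage 1: merge each adjacent pair whose cost is at or below the threshold.
    i = 0
    while i < len(ranges) - 1:
        if cost(i) <= threshold:
            ranges = ranges[:i] + [(ranges[i][0], ranges[i + 1][1])] + ranges[i + 2:]
        else:
            i += 1
    # Stage 2: while more than n_max ranges remain, merge the cheapest pair.
    while len(ranges) > n_max:
        i = min(range(len(ranges) - 1), key=cost)
        ranges = ranges[:i] + [(ranges[i][0], ranges[i + 1][1])] + ranges[i + 2:]
    return ranges
```
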
[Frequency range adjustment process]
FIG. 26 is a schematic diagram for explaining the frequency range adjustment process. Here, an outline of the frequency range adjustment process executed in step 605 of FIG. 21 is described.
In the present embodiment, the coding range setting unit 33 adjusts the width of each candidate range 75 based on the noise components at its highest and lowest frequencies. Specifically, among the indexes at both ends (highest and lowest frequencies) of each candidate range 75 calculated in the preceding processes, the index at which Pnoise(k) is smallest (hereinafter written as knoise-min) is detected, and excluding that index knoise-min from the candidate range 75 is repeated, thereby reducing the candidate range 75.
For example, as shown in the upper part of FIG. 26, assume that the two candidate ranges 75 of range numbers i = 1 and 2 have been calculated by the preceding processes, and that, among the indexes at both ends of range 1 (lsp_1 and hsp_1) and the indexes at both ends of range 2 (lsp_2 and hsp_2), knoise-min is hsp_1. In this case, if a predetermined condition is satisfied, hsp_1 is excluded from range 1 as shown in the middle part of FIG. 26, and the index indicating the highest frequency of range 1 becomes hsp'_1 = (hsp_1) - 1.
Next, if knoise-min is lsp_2 and the predetermined condition is satisfied, lsp_2 is excluded from range 2 as shown in the lower part of FIG. 26, and the index indicating the lowest frequency of range 2 becomes lsp'_2 = (lsp_2) + 1.
Such processing is repeatedly executed as long as the predetermined condition is satisfied.
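The endpoint selection and exclusion described above can be sketched as follows, with candidate ranges modeled as (lsp, hsp) index pairs and Pnoise(k) as a list p_noise; both helper names are assumptions for illustration.

```python
def min_noise_endpoint(ranges, p_noise):
    """Among the endpoint indexes lsp_i and hsp_i of all candidate ranges,
    return the index k at which P_noise(k) is smallest (k_noise-min)."""
    endpoints = [k for (lsp, hsp) in ranges for k in (lsp, hsp)]
    return min(endpoints, key=lambda k: p_noise[k])

def exclude_endpoint(ranges, k):
    """Exclude index k from whichever range boundary it belongs to."""
    return [(lsp + 1, hsp) if k == lsp else (lsp, hsp - 1) if k == hsp else (lsp, hsp)
            for (lsp, hsp) in ranges]

p_noise = [0] * 12
p_noise[2], p_noise[5], p_noise[8], p_noise[11] = 5, 1, 9, 4
k_min = min_noise_endpoint([(2, 5), (8, 11)], p_noise)
print(k_min)                                       # 5 (hsp of range 1)
print(exclude_endpoint([(2, 5), (8, 11)], k_min))  # [(2, 4), (8, 11)]
```
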
The details of the frequency range adjustment process are described below.
First, the sum PNQ-sum of the quantization noise Nq(k) generated in the N candidate ranges 75 over the entire target frame 7 is calculated according to the following equation.
  PNQ-sum = Σ_{i=1}^{N} Σ_{k=lsp_i}^{hsp_i} Nq(k)   (Equation 18)
This is the same equation as (Equation 12) described above.
Next, the index knoise-min at which Pnoise(k) is smallest is detected from among the indexes (lsp_i and hsp_i, with i = 1 to N) corresponding to the candidate ranges 75.
Next, for the ranges excluding knoise-min (lsp'_i and hsp'_i), the sum P'NQ-sum of the quantization noise N'q(k) over the entire target frame 7 is calculated. With knoise-min excluded, the quantization noise N'q(k) takes a value different from the above Nq(k). P'NQ-sum is calculated according to the following equation.
  P'NQ-sum = Σ_{i=1}^{N} Σ_{k=lsp'_i}^{hsp'_i} N'q(k)   (Equation 19)
In this way, the sums of the quantization noise before and after excluding knoise-min are calculated by (Equation 18) and (Equation 19).
Next, the noise change amount ΔPall-noise over the entire target frame 7 caused by excluding knoise-min and narrowing the candidate range 75 is calculated. The noise change amount ΔPall-noise is expressed as the sum of (the decrease in quantization noise) and (the increase in noise due to the narrowed frequency range). Specifically, ΔPall-noise is calculated according to the following equation.
  ΔPall-noise = (P'NQ-sum - PNQ-sum) + Pnoise(knoise-min)   (Equation 20)
When ΔPall-noise has been calculated, it is determined whether or not the following expression is satisfied.
  ΔPall-noise < 0   (Equation 21)
As shown in (Equation 21), when ΔPall-noise is negative, excluding knoise-min reduces the total amount of noise over the entire target frame 7. In this case, a new candidate range 75 excluding knoise-min is set; that is, lsp'_i and hsp'_i are set as the new lsp_i and hsp_i.
Thereafter, this reduction of the candidate range 75 is repeatedly executed until (Equation 21) is no longer satisfied. This makes it possible to reduce the total amount of noise over the entire target frame 7. Therefore, even in a case where the data amount (target data amount) of the redundant data 3 is determined in advance and quantization noise becomes a problem because that amount is small, the total amount of noise over the entire target frame 7 can be sufficiently suppressed.
Instead of the method referring to the quantization noise as described above, another method may be used.
In general, when audio is encoded, more bits tend to be allocated to spectral components with larger power. Here, this characteristic is used to determine whether or not to exclude knoise-min by referring to the power |X(k)| of the spectrum of the original waveform 4.
For example, the total power Ptarget(nbit) corresponding to the target data amount (nbit) of the redundant data 3 is calculated using a predetermined formula or table.
Next, the total power Predun-sum of |X(k)| over all the candidate ranges 75 is calculated according to the following equation.
  Predun-sum = Σ_{i=1}^{N} Σ_{k=lsp_i}^{hsp_i} |X(k)|   (Equation 22)
As shown in (Equation 22), the value of Predun-sum becomes smaller when, for example, knoise-min is excluded.
Next, it is determined whether or not Predun-sum satisfies the following expression.
  Predun-sum ≤ Ptarget(nbit)   (Equation 23)
For example, when the relation shown in (Equation 23) is not satisfied, the index knoise-min at which Pnoise(k) is smallest is detected from among the lsp_i and hsp_i, as in the method described above, and a new candidate range 75 excluding knoise-min is set.
For the new candidate range 75, the total power Predun-sum is recalculated. That is, the power |X(knoise-min)| of the spectrum of the original waveform 4 at knoise-min is subtracted from the Predun-sum calculated before knoise-min was excluded. The recalculated total power P'redun-sum is thus calculated according to the following equation.
  P'redun-sum = Predun-sum - |X(knoise-min)|   (Equation 24)
The process from the detection of k noise-min to the recalculation of the total power shown in Eq. (Equation 24) is repeatedly executed until the following relationship is satisfied.
Figure JPOXMLDOC01-appb-M000025
A candidate range 75 that satisfies this relationship is set as a coded frequency range 70 that is finally assigned to the redundant data 3.
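The iterative trimming described above can be sketched in code. This is a minimal illustration under stated assumptions, not the patent's implementation: the function and variable names are hypothetical, the endpoint restriction (excluding only lsp_i/hsp_i) is omitted for brevity, and the toy data stand in for real spectra.

```python
# Sketch of the candidate-range trimming loop of (Equation 22)-(Equation 25).
# In the actual encoder, P_target would come from a predetermined formula
# or table for the target data amount (nbit).

def trim_candidate_ranges(candidates, X_abs, P_noise, P_target):
    """candidates: set of spectral indexes k in the candidate ranges 75.
    X_abs[k]: power |X(k)| of the original waveform's spectrum.
    P_noise[k]: predicted noise power of the composite waveform at k.
    P_target: total power allowed by the target data amount."""
    candidates = set(candidates)
    # (Equation 22): total power over all candidate ranges.
    P_redun_sum = sum(X_abs[k] for k in candidates)
    # Repeat until (Equation 25) holds: P_redun_sum <= P_target.
    while candidates and P_redun_sum > P_target:
        # Exclude the index with the smallest predicted noise.
        k_noise_min = min(candidates, key=lambda k: P_noise[k])
        candidates.remove(k_noise_min)
        # (Equation 24): subtract the excluded component's power.
        P_redun_sum -= X_abs[k_noise_min]
    return candidates

X_abs = {10: 4.0, 11: 3.0, 12: 5.0, 13: 2.0}
P_noise = {10: 0.9, 11: 0.1, 12: 0.8, 13: 0.2}
kept = trim_candidate_ranges({10, 11, 12, 13}, X_abs, P_noise, P_target=10.0)
# kept == {10, 12}: indexes 11 and 13 (lowest noise) were excluded
```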
Note that for the frequency range aggregation process described with reference to FIGS. 24 and 25 and the frequency range adjustment process described with reference to FIG. 26, the spectrum data required for each process may be adjusted before use. For example, the intensity perceived by humans may differ from one frequency range to another even when the spectral power is the same. Therefore, it is desirable to use values such as P_noise(k), N_q(k), and |X(k)|, which are used in the calculation of the connection cost, after subtracting the value of the threshold curve thresh(k) (clipping the result to 0 if it becomes negative) or after weighting them for each frequency. If another cost more appropriate to the coding method exists, it may be used instead.
Further, as for the frequency range adjustment process, a process that excludes one or more indexes in a single operation may be executed.
In the above, the process of reducing the candidate range 75 based on the noise components at both ends of the candidate range 75 has been described, but the candidate range 75 may instead be expanded based on the noise components.
For example, frequency components with similarly high noise levels are likely to exist in the vicinity of a frequency at which P_noise(k) is high. Therefore, when P_noise(k) exceeds a certain level at either end of the candidate range 75, a process of expanding the candidate range 75 so that the nearby noise components are replaced by the redundant data 3 may be executed. This makes it possible to reduce the noise caused by the composite waveform.
[Operation of the receiving device]
The operation of the receiving device 50 according to the second embodiment is substantially the same as the operation described in the first embodiment, except that meta information is received as part of the packet 1 and that the content of the replacement area setting process differs from that of the above embodiment.
In the present embodiment, a plurality of frequency ranges (coding frequency ranges 70) based on the meta information are set in the replacement area setting process.
Specifically, in the array Xout[] set as the spectrum of the frame to be reproduced, the spectrum of the redundant data 3 (Xredun[]) is used for the plurality of coding frequency ranges 70. Further, in Xout[], all the interpolation ranges 71 other than the coding frequency ranges 70 are replaced with the spectrum of the composite waveform 11 (X'dec[]) (see FIGS. 16 and 17).
Therefore, the array redun_isp[], in which the spectrum indexes of the redundant data 3 are stored, stores all the indexes from lsp_i to hsp_i contained in the meta information. The array replace_isp[], in which the spectrum indexes of the composite waveform 11 are stored, stores all the indexes other than those stored in redun_isp[].
For example, suppose that meta information indicating (lsp1, hsp1) = (10, 15), (lsp2, hsp2) = (33, 36), and (lsp3, hsp3) = (55, 60) is received. In this case, redun_isp[] stores the indexes 10, 11, 12, 13, 14, 15, 33, 34, 35, 36, 55, 56, 57, 58, 59, and 60.
Thereby, the spectrum replacement process shown in FIG. 18 can be applied as it is to generate appropriate spectrum data Xout[] for reproduction.
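The expansion of the meta information into index arrays can be sketched as follows. This is an illustrative sketch, not the patent's implementation; the function name and the spectrum length of 64 are assumptions, while the range pairs are the worked example from the text.

```python
# Sketch of how a receiver could expand the meta information into the
# index arrays redun_isp[] and replace_isp[].

def build_index_arrays(meta_ranges, n_spectrum):
    """meta_ranges: list of (lsp_i, hsp_i) pairs, both ends inclusive.
    n_spectrum: number of spectral bins in the frame (assumed)."""
    # Indexes filled from the redundant data's spectrum Xredun[].
    redun_isp = [k for lsp, hsp in meta_ranges for k in range(lsp, hsp + 1)]
    in_redun = set(redun_isp)
    # Every remaining index is filled from the composite waveform X'dec[].
    replace_isp = [k for k in range(n_spectrum) if k not in in_redun]
    return redun_isp, replace_isp

meta = [(10, 15), (33, 36), (55, 60)]
redun_isp, replace_isp = build_index_arrays(meta, n_spectrum=64)
# redun_isp == [10, 11, 12, 13, 14, 15, 33, 34, 35, 36,
#               55, 56, 57, 58, 59, 60]
```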
<Third embodiment>
FIG. 27 is a schematic diagram showing an example of the coding frequency ranges 70 according to the third embodiment.
In the third embodiment, as in the second embodiment, a plurality of coding frequency ranges 70 are set to arbitrary frequency ranges. Furthermore, as shown in FIG. 27, an arbitrary waveform synthesis method is individually set for each interpolation range 71 between the coding frequency ranges 70. That is, in the present embodiment, both the coding frequency ranges 70 and the waveform synthesis methods can be freely set.
In the present embodiment, all the waveform synthesis methods executable on the receiving side are tried on the transmitting side, and the noise (noise spectrum 13) generated by the composite waveform 11 produced by each method is calculated. Based on these noise spectra 13, the coding frequency ranges 70 assigned to the redundant data 3 and the optimum waveform synthesis method to be set for each interpolation range 71 are determined.
In the example shown in FIG. 27, synthesis method 1, synthesis method 2, and synthesis method 3 are selected for the three interpolation ranges 71 in order from the low-frequency side (left in the figure).
As a result, it is possible to realize high-quality error concealment with less redundant data 3.
The configuration of the transmission/reception system according to the present embodiment is substantially the same as that of the transmission/reception system 100 (transmission device 20 and receiving device 50) described in the above embodiments, except for the configurations of the redundant data generation unit included in the transmission device 20 and the signal processing unit included in the receiving device 50.
Hereinafter, the same components as those of the transmission/reception system 100 are described using the same reference numerals.
[Transmission device]
Hereinafter, as the configuration on the transmission device 20 side, the configuration of the redundant data generation unit will be mainly described.
FIG. 28 is a block diagram showing a configuration example of the redundant data generation unit 324 according to the third embodiment. The redundant data generation unit 324 includes an original data selection unit 330, composite waveform generation units 331a to 331c, a generated noise calculation unit 332, a coding range setting unit 333, and a coding spectrum selection unit 334.
The redundant data generation unit 324 differs from the redundant data generation unit 24 described in the above embodiments mainly in that a plurality of composite waveform generation units 331 are provided. Accordingly, the generated noise calculation unit 332 outputs a plurality of pieces of noise information (noise spectra 13).
Here, it is assumed that three types of waveform synthesis methods are selectable in the receiving device 50. In this case, the redundant data generation unit 324 is provided with three composite waveform generation units 331a to 331c corresponding to the three types of waveform synthesis methods.
When there are n types of waveform synthesis methods, n composite waveform generation units 331 are provided.
The original data selection unit 330 selects and acquires the necessary data from the original data 6 stored in the input buffer 23 (see FIG. 7). Specifically, it reads the original data 6 corresponding to the main data 2 (transmission frame 8) and the redundant data 3 (target frame 7) from the preceding input buffer 23, and passes the necessary original data 6 to the composite waveform generation units 331a to 331c. The original data 6 is at least one of the spectrum data X(k) and the time waveform data x(n).
The composite waveform generation units 331a to 331c each generate a composite waveform 11 from the original data 6 acquired from the original data selection unit 330 according to the waveform synthesis method set for that unit, and pass the data of each composite waveform 11 (composite data) to the generated noise calculation unit 332.
As the waveform synthesis methods, it is desirable to combine a plurality of methods with a low amount of computation (for example, a method of copying the previous frame). This makes it possible to reduce the computational load on the receiving side. Of course, the present technology is applicable regardless of the types of waveform synthesis methods.
The generated noise calculation unit 332 acquires the original data 6 from the original data selection unit 330 and the three composite waveforms 11 from the composite waveform generation units 331a to 331c. The generated noise calculation unit 332 then calculates, for each waveform synthesis method, noise information regarding the noise caused by using the composite waveform 11, and passes it to the coding range setting unit.
Specifically, the noise spectrum 13 representing the waveform quality of each composite waveform 11 is calculated as the noise information. In this way, the generated noise calculation unit 332 predicts the waveform quality of the composite waveform 11 for each of the plurality of waveform synthesis methods.
The coding range setting unit 333 acquires the noise information (noise spectra 13) calculated for each waveform synthesis method from the generated noise calculation unit 332, and sets at least one coding frequency range 70 to be assigned to the redundant data 3 based on each noise spectrum 13.
The coding range setting unit 333 also assigns one of the plurality of waveform synthesis methods for generating the composite waveform 11 to each interpolation range 71, which is a frequency range other than the coding frequency ranges 70. Specifically, the optimum waveform synthesis method is selected for each interpolation range 71 based on each noise spectrum 13.
In this way, the coding range setting unit 333 sets the coding frequency ranges 70 and the waveform synthesis method assigned to each interpolation range 71 based on the waveform quality (noise spectrum 13) predicted for each of the plurality of waveform synthesis methods.
The coding range setting unit 333 further generates meta information including information specifying the coding frequency ranges 70 and information specifying the waveform synthesis method assigned to each interpolation range 71, and passes it to the coding spectrum selection unit 334.
The coding spectrum selection unit 334 extracts, from the original data 6 (spectrum data X(k)) corresponding to the redundant data 3, the spectral components to be used as the redundant data 3 based on the coding frequency ranges 70. The operation of the coding spectrum selection unit 334 is the same as that in the above embodiments, except that the meta information about the waveform synthesis methods is also handled as part of the redundant data 3.
FIG. 29 is a flowchart showing an example of the process of generating the redundant data 3. This process is, for example, a loop process executed every time a packet 1 is generated.
In the following, it is assumed that a method of generating the composite waveform 11 using one or both of spectrum data and time waveform data is used as the waveform synthesis method, and that the data required for generating the composite waveform 11 are stored in advance in the preceding input buffer 23.
First, the original data 6 corresponding to the redundant data 3 (the original data 6 included in the target frame 7) is acquired from the input buffer 23 (step 901). Spectrum data is acquired from the input buffer 23 as the original data 6 required for waveform synthesis (step 902), and time waveform data is likewise acquired from the input buffer 23 (step 903). The order of steps 901 to 903 is not limited.
Next, the processes of steps 904 to 906 are sequentially executed for each waveform synthesis method.
First, a composite waveform generation process for generating the composite waveform 11 based on the target waveform synthesis method is executed (step 904). This composite waveform generation process corresponds to step 102 in FIG. 9.
When the composite waveform 11 has been generated, a generated noise prediction process for calculating the noise spectrum 13 of the composite waveform is executed (step 905). This generated noise prediction process corresponds to step 103 in FIG. 9.
When the noise spectrum 13 has been calculated, a coding range setting process for setting the frequency ranges to be assigned to the redundant data 3 based on the noise spectrum 13 is executed (step 906). This coding range setting process corresponds to the coding range setting process shown in FIG. 21. Here, at least one frequency range is set. Each frequency range set in step 906 is a candidate range 75, that is, a candidate for a coding frequency range 70.
In this way, in the present embodiment, at least one candidate range 75 serving as a candidate for the coding frequency ranges 70 is calculated for each of the plurality of synthesis methods.
It is then determined whether any waveform synthesis method remains to be processed (step 907). If a waveform synthesis method remains (YES in step 907), the processes from step 904 onward are executed again using a waveform synthesis method that has not yet been processed.
As a result, candidate ranges 75 (enc_isp_i[]) are generated, one set for each planned waveform synthesis method i (see FIG. 30).
When the processing for all the waveform synthesis methods is completed (NO in step 907), a coding range synthesis process that combines the candidate ranges 75 corresponding to the respective waveform synthesis methods and sets the coding frequency ranges 70 is executed (step 908).
In the coding range synthesis process, the coding frequency ranges 70 are set, and an appropriate waveform synthesis method is assigned to each interpolation range 71 between the coding frequency ranges 70. That is, which composite waveform 11 spectrum is applied to each interpolation range 71 is determined, and this setting result is generated as meta information.
Next, a coding spectrum selection process that extracts only the spectral components corresponding to the coding frequency ranges 70 from the original data 6 of the target frame 7 is executed (step 909). This coding spectrum selection process corresponds to step 105 in FIG. 9.
For example, the redundant data 3 before encoding is generated using only the spectral components in the coding frequency ranges 70. Indexes indicating the coding frequency ranges 70 and meta information specifying the waveform synthesis method set for each interpolation range 71 are added to the redundant data 3.
At this time, for example, coefficients calculated in the waveform synthesis process may also be added as meta information. This makes it possible to reduce the computational load on the receiving device 50.
When the redundant data 3 has been extracted, it is determined whether any original data 6 remains to be processed (step 910). If original data 6 remains (YES in step 910), the processes from step 901 onward are executed again for the remaining original data 6. If no original data 6 remains (NO in step 910), an encoding process that encodes the redundant data 3 is executed (step 911). When the redundant data 3 has been encoded, the target data amount of the main data 2 is set (step 912). Specifically, the free space of the packet 1 is calculated from the data amount of the encoded redundant data 3.
The processes of steps 910 to 912 correspond to, for example, the processes of steps 106 to 108 shown in FIG. 9.
[Coding range synthesis process]
FIG. 30 is a schematic diagram showing an example of the coding range synthesis process.
The outline of the coding range synthesis process is described below. Here, it is assumed that the candidate ranges 75 (enc_isp_i[]) corresponding to the i-th waveform synthesis method have been read in by the preceding composite waveform generation process, generated noise prediction process, and coding range setting process.
The upper three rows of FIG. 30 show regions representing the candidate ranges 75 calculated for the waveform synthesis methods 1 to 3. For example, the candidate range 75 of the waveform synthesis method 1 (enc_isp_1[]) includes one frequency range, while the candidate ranges 75 of the waveform synthesis methods 2 and 3 (enc_isp_2[] and enc_isp_3[]) each include two frequency ranges.
In the present embodiment, these plural waveform synthesis methods can be combined. Therefore, of each enc_isp_i[], only the spectrum at the common indexes needs to be encoded as the redundant data 3, and all the other frequency ranges can be appropriately replaced with the spectrum of a composite waveform 11.
When n waveform synthesis methods are used, the set of indexes to be finally encoded, enc_isp[] (the coding frequency ranges 70), is expressed by the following equation.

(Equation 26)  enc_isp[] = enc_isp_1[] ∩ enc_isp_2[] ∩ ... ∩ enc_isp_n[]
In this way, in the present embodiment, the frequency ranges represented by the intersection of the candidate ranges 75 calculated for the plurality of waveform synthesis methods are set as the coding frequency ranges 70.
The frequency ranges other than the coding frequency ranges 70 become the "frequency ranges not to be encoded", that is, the interpolation ranges 71 interpolated using composite waveforms 11. The interpolation ranges 71 can also be said to be the synthesis frequency ranges in which the composite waveforms 11 are used.
In FIG. 30, the two coding frequency ranges 70 represented by the intersection of the candidate ranges 75 are schematically illustrated using shaded regions. The regions other than these coding frequency ranges 70 are the interpolation ranges 71.
For example, suppose the array enc_isp[] representing the coding frequency ranges 70 contains the following indexes:
(lsp_i, hsp_i) = (2, 55), (77, 80).
In this case, the interpolation ranges 71 are uniquely determined as follows:
(lsp_j, hsp_j) = (0, 1), (56, 76), (81, N)
Here, N is, for example, the maximum index of the MDCT spectrum, and j is an index identifying an interpolation range 71.
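The derivation of the interpolation ranges from the coding frequency ranges can be sketched as follows. This is an illustrative sketch reproducing the worked example above; the function name and the value of N are assumptions.

```python
# Sketch: derive the interpolation ranges 71 as the complement of the
# coding frequency ranges 70 over the spectral indexes 0..n_max.

def interpolation_ranges(coding_ranges, n_max):
    """coding_ranges: sorted list of inclusive (lsp_i, hsp_i) pairs.
    n_max: maximum spectral index (e.g. of the MDCT spectrum)."""
    ranges, next_low = [], 0
    for lsp, hsp in coding_ranges:
        if next_low <= lsp - 1:           # gap below this coding range
            ranges.append((next_low, lsp - 1))
        next_low = hsp + 1                # continue above this coding range
    if next_low <= n_max:                 # gap up to the top of the spectrum
        ranges.append((next_low, n_max))
    return ranges

print(interpolation_ranges([(2, 55), (77, 80)], n_max=100))
# -> [(0, 1), (56, 76), (81, 100)]
```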
Next, a method of assigning a waveform synthesis method to each interpolation range 71 is described. Here, one waveform synthesis method is set for each interpolation range 71; a process of dividing one interpolation range 71 into a plurality of parts and assigning a waveform synthesis method to each part is not performed. This suppresses the increase in meta information.
For example, consider the process of setting a waveform synthesis method for the j-th interpolation range 71. In this case, for the noise spectrum P_noise-i(k) of the composite waveform 11 generated using the i-th waveform synthesis method, the sum P_noise-sum(i, j) over the j-th interpolation range 71 is calculated according to the following equation.

(Equation 27)  P_noise-sum(i, j) = Σ_{k=lsp_j}^{hsp_j} P_noise-i(k)
Of the plurality of waveform synthesis methods, the waveform synthesis method i that minimizes P_noise-sum(i, j) in (Equation 27) is set as the optimum waveform synthesis method for the j-th interpolation range 71. That is, the waveform synthesis method with the smallest total generated noise in the j-th interpolation range 71 is selected.
In this way, the method that minimizes the integrated value of the noise spectrum 13 among the plurality of waveform synthesis methods is set for each interpolation range 71.
This process is executed for all the interpolation ranges 71, and the optimum waveform synthesis method is set for each interpolation range 71.
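The per-range selection of (Equation 27) can be sketched as follows. This is an illustrative sketch: the function name is an assumption and the noise spectra are toy values, not real predictions.

```python
# Sketch: for each interpolation range 71, pick the waveform synthesis
# method i whose P_noise-sum(i, j) of (Equation 27) is smallest.

def select_methods(noise_spectra, interp_ranges):
    """noise_spectra[i][k]: P_noise-i(k) for synthesis method i.
    interp_ranges: inclusive (lsp_j, hsp_j) pairs for each range j."""
    chosen = []
    for lsp_j, hsp_j in interp_ranges:
        # (Equation 27): sum of the noise spectrum over the j-th range.
        sums = [sum(p[k] for k in range(lsp_j, hsp_j + 1))
                for p in noise_spectra]
        chosen.append(sums.index(min(sums)))  # method with least noise
    return chosen

noise = [
    [1.0] * 8,                 # method 0: flat noise everywhere
    [2.0] * 4 + [0.1] * 4,     # method 1: quiet in the upper half
]
print(select_methods(noise, [(0, 3), (4, 7)]))
# -> [0, 1]
```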
For example, in the example shown in FIG. 30, the waveform synthesis method 3, the waveform synthesis method 2, and the waveform synthesis method 1 are set for the three interpolation ranges 71 in order from the low-frequency side. Depending on the result of the determination by (Equation 27), the same waveform synthesis method may be set for all the interpolation ranges 71.
As described above, once the coding frequency ranges 70 are known, the interpolation ranges 71 can also be identified. Therefore, only the assignments of waveform synthesis methods to the interpolation ranges 71 need to be added to the meta information, for example in order from the low-frequency side; information such as indexes specifying the interpolation ranges 71 is unnecessary.
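The meta information layout implied here can be sketched as follows. This is an illustrative sketch only: the field names are assumptions, not the patent's bitstream format; the point shown is that no indexes for the interpolation ranges need to be transmitted.

```python
# Sketch of the meta information: the coding frequency ranges 70 plus,
# since the interpolation ranges 71 are derivable from them, only the
# per-range method ids ordered from the low band.

def pack_meta(coding_ranges, method_ids):
    return {"ranges": list(coding_ranges),    # (lsp_i, hsp_i) pairs
            "methods": list(method_ids)}      # one id per interpolation range

meta = pack_meta([(2, 55), (77, 80)], [3, 2, 1])
# On the receiver side, the j-th method id applies to the j-th
# interpolation range derived from meta["ranges"].
```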
After the coding frequency ranges 70 are set, the frequency range adjustment process of step 605 in FIG. 21 may be executed, for example, when quantization noise becomes a problem because the target data amount of the redundant data 3 is extremely small.
FIG. 31 is a flowchart showing an example of the coding range synthesis process. The process shown in FIG. 31 is an example of the coding range synthesis process described with reference to FIG. 29.
In the following, the array representing the indexes of the spectrum to be finally encoded (the coding frequency ranges 70) is written as enc_isp[], the variable representing the waveform synthesis method is written as i, and the variable specifying the interpolation range 71 is written as j.
First, the indexes enc_isp_i[] of the candidate ranges 75 calculated for each waveform synthesis method in the preceding processes are acquired (step 1001).
Next, enc_isp[] is initialized with the indexes enc_isp_1[] of the first candidate range 75 (step 1002), and the variable i representing the waveform synthesis method is initialized to 1 (step 1003).
Next, the intersection of enc_isp[] with the indexes enc_isp_i[] of the candidate range 75 obtained for the i-th waveform synthesis method is calculated, and enc_isp[] is updated with the result (step 1004). The variable i is then incremented (step 1005).
It is determined whether the incremented i is less than or equal to the number of waveform synthesis methods used, that is, whether the intersection of the candidate ranges 75 has been calculated for all the waveform synthesis methods (step 1006). If a waveform synthesis method remains (YES in step 1006), the processes from step 1004 onward are executed again.
When the intersection of the candidate ranges 75 has been calculated for all the waveform synthesis methods, enc_isp[] is the array representing the indexes of the coding frequency ranges 70 shown in (Equation 26).
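Steps 1002 to 1006 can be sketched as a set intersection. This is an illustrative sketch; the function name and the index sets, loosely modeled on FIG. 30, are assumptions.

```python
# Sketch of steps 1002-1006: intersect the per-method candidate ranges
# enc_isp_i[] to obtain enc_isp[] of (Equation 26).

def intersect_candidates(enc_isp_lists):
    enc_isp = set(enc_isp_lists[0])          # step 1002: initialize
    for enc_isp_i in enc_isp_lists[1:]:      # steps 1005-1006: loop over i
        enc_isp &= set(enc_isp_i)            # step 1004: intersection
    return sorted(enc_isp)

enc_isp_1 = range(2, 81)                                 # one wide range
enc_isp_2 = list(range(0, 60)) + list(range(70, 85))     # two ranges
enc_isp_3 = list(range(2, 56)) + list(range(77, 90))     # two ranges
print(intersect_candidates([enc_isp_1, enc_isp_2, enc_isp_3]))
# -> indexes 2..55 and 77..80, i.e. two coding frequency ranges
```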
If no waveform synthesis method remains to be processed (NO in step 1006), the interpolation ranges 71 are calculated based on enc_isp[] (step 1007), and the variable j representing the interpolation range 71 is initialized to 1 (step 1008). Hereinafter, the indexes representing the lowest and highest frequencies of the j-th interpolation range 71 are written as (lsp_j, hsp_j). The number of interpolation ranges 71 depends on the distribution of the coding frequency ranges 70, but is within ±1 of the number of coding frequency ranges 70.
 次に、ステップ1009~1012のループ処理により、j番目の補間範囲71について適切な波形合成方法が順番に設定される。
 まず、j番目の補間範囲71(k=lsp_j~hsp_j)におけるPnoise-i(k)の総和Pnoise-sum(i,j)が最小となる波形合成方法が探索される(ステップ1009)。具体的には、全ての波形合成方法に対して、(数27)式に基づいてPnoise-sum(i,j)が算出され、その結果が最小となる波形合成方法が選択される。
 前段で選択された波形合成方法が、j番目の補間範囲71に用いる波形合成方法として設定される(ステップ1010)。そして、変数jがインクリメントされ(ステップ1011)、変数jが補間範囲71の総数以下であるか否か、すなわち全ての補間範囲71に対して波形合成方法が設定されたか否かが判定される(ステップ1012)。
Next, by the loop processing of steps 1009 to 1012, an appropriate waveform synthesis method is sequentially set for the j-th interpolation range 71.
First, a waveform synthesis method that minimizes the sum Pnoise-sum(i, j) of Pnoise-i(k) in the j-th interpolation range 71 (k = lsp_j to hsp_j) is searched for (step 1009). Specifically, Pnoise-sum(i, j) is calculated based on (Equation 27) for all the waveform synthesis methods, and the waveform synthesis method that minimizes the result is selected.
The waveform synthesis method selected in the previous step is set as the waveform synthesis method used for the j-th interpolation range 71 (step 1010). Then, the variable j is incremented (step 1011), and it is determined whether the variable j is less than or equal to the total number of interpolation ranges 71, that is, whether the waveform synthesis method has been set for all the interpolation ranges 71 (step 1012).
If unprocessed interpolation ranges 71 remain (YES in step 1012), the processing from step 1009 onward is executed again. When the waveform synthesis method has been set for all the interpolation ranges 71, the frequency range synthesis processing is completed.
Through this processing, a method of generating the composite waveform 11 that minimizes the total amount of generated noise is set for each interpolation range 71. As a result, the amount of noise in the restored waveform 5 restored on the receiving side is reduced, and a sense of auditory unnaturalness can be sufficiently suppressed even when the data amount of the redundant data 3 is small.
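The per-range selection of steps 1009 to 1012 can be sketched as below: for each interpolation range 71, the method with the smallest noise sum over that range is chosen. This follows (Equation 27) only in spirit; the noise values and range boundaries are invented for illustration.

```python
# Sketch of steps 1009-1012: assign to each interpolation range 71 the
# waveform synthesis method whose predicted noise spectrum P_noise_i(k)
# sums to the smallest value over that range.

def assign_methods(noise_spectra, interp_ranges):
    """noise_spectra: per-method lists of P_noise_i(k) values.
    interp_ranges: list of (lsp_j, hsp_j) index pairs, inclusive."""
    assignment = []
    for lsp_j, hsp_j in interp_ranges:
        sums = [sum(p[lsp_j:hsp_j + 1]) for p in noise_spectra]
        assignment.append(sums.index(min(sums)))  # least-noise method index
    return assignment

noise_spectra = [
    [1.0, 0.2, 0.2, 0.9, 0.9],  # method 0
    [0.1, 0.8, 0.8, 0.3, 0.3],  # method 1
]
print(assign_methods(noise_spectra, [(0, 0), (1, 2), (3, 4)]))  # -> [1, 0, 1]
```

Because each range picks its minimizer independently, the total noise over all interpolation ranges is also minimized, matching the intent described above.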
[Receiver]
The packet 1 received by the receiving device 50 (reception buffer 52 shown in FIG. 7) according to the present embodiment includes the main data 2 and the redundant data 3 to which meta information is added. This meta information includes information on the waveform synthesis method set for each interpolation range 71, in addition to information for designating a plurality of coded frequency ranges 70.
That is, it can be said that the meta information is information for designating the synthesis method of the composite waveform 11 for each interpolation range 71, which is a frequency range other than the coding frequency range 70. Such meta information is received by the reception buffer 52 and is appropriately referred to in the subsequent processing. In the present embodiment, the meta information corresponds to the designated information.
Hereinafter, the configuration of the signal processing unit will be mainly described as the configuration on the receiving device 50 side.
FIG. 32 is a block diagram showing a configuration example of the signal processing unit according to the third embodiment. The signal processing unit 358 has a spectrum replacement unit 360, a spectrum buffer 361, an IMDCT unit 362, and a time signal output unit 363. Further, the signal processing unit 358 has a plurality of composite waveform generation units 364a to 364c, an MDCT unit 365, and a time waveform buffer 366.
The spectrum replacement unit 360 acquires spectrum data as the decoded redundant data 3 from the preceding stage. It also acquires the spectrum data of the composite waveforms 11 generated by the composite waveform generation units 364a and 364b, and the spectrum data (MDCT spectrum) of the composite waveform 11 generated by the composite waveform generation unit 364c and converted by the MDCT unit 365.
Further, the spectrum replacement unit 360 generates interpolation data in which a part of the decoded redundant data 3 (interpolation range 71) is replaced with the spectrum data of each composite waveform 11 based on the above-mentioned meta information. This interpolated data is passed to the spectrum buffer 361 and the IMDCT unit 362.
The IMDCT unit 362 acquires the spectrum of the interpolated data from the spectrum replacement unit 360, performs IMDCT on the spectrum, and converts the interpolated data into time domain data. The result of this IMDCT is passed to the time signal output unit 363.
The time signal output unit 363 acquires the IMDCT result from the IMDCT unit 362, applies a synthesis window to it, and performs overlap-add with the previous IMDCT result to reconstruct the audio signal (time waveform data), which is passed to the time waveform buffer 366.
The time waveform data stored in the time waveform buffer 366 is used for audio reproduction at a timing required in the subsequent stage.
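The windowing and overlap-add performed by the time signal output unit 363 can be sketched as follows. This is a minimal illustration, not the patented implementation; the sine window and the frame length N are assumptions made for the example.

```python
# Sketch of the overlap-add step: window the IMDCT output, add its first
# half to the tail carried over from the previous frame, and carry the
# second half forward for the next frame.

import math

N = 8  # IMDCT output length (twice the frame hop); illustrative only

def sine_window(n):
    # A common MDCT synthesis window choice (assumed here).
    return [math.sin(math.pi / n * (i + 0.5)) for i in range(n)]

def overlap_add(prev_tail, imdct_out, window):
    windowed = [w * x for w, x in zip(window, imdct_out)]
    out = [a + b for a, b in zip(prev_tail, windowed[:N // 2])]
    return out, windowed[N // 2:]  # (output samples, tail for next frame)

win = sine_window(N)
tail = [0.0] * (N // 2)
out, tail = overlap_add(tail, [1.0] * N, win)
print(len(out), len(tail))  # 4 samples output, 4 carried forward
```

Each call emits one hop of reconstructed audio; the carried tail is what makes consecutive frames cross-fade smoothly.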
The composite waveform generation units 364a and 364b appropriately acquire spectrum data of past frames (one frame before or two frames before) from the spectrum buffer 361, for example. Then, the spectrum data of the composite waveform 11 is generated based on each spectrum data and passed to the spectrum replacement unit 360.
The composite waveform generation unit 364c acquires the time waveform data of the past reproduced waveform from the time waveform buffer 366. Then, the time waveform data of the composite waveform 11 is generated based on the time waveform data, and is delivered to the MDCT unit 365.
The MDCT unit 365 acquires the time waveform data of the composite waveform 11 generated by the composite waveform generation unit 364c, and performs MDCT on this to create an MDCT spectrum (spectrum data) of the composite waveform 11. The spectrum data of the composite waveform 11 is passed to the spectrum replacement unit 360.
The waveform synthesis methods used by the composite waveform generation units 364a, 364b, and 364c are the waveform synthesis methods assigned to the interpolation ranges 71 on the transmitting side.
Note that the configuration of the signal processing unit 358 shown in FIG. 32 is merely an example, and is not limited to this. For example, it is conceivable that the types of required data (spectral data, time waveform data, number and position of past frames) differ depending on the waveform synthesis method adopted. Therefore, each configuration of the signal processing unit 358 (composite waveform generation unit 364, etc.) may be appropriately set according to the type of waveform synthesis method and the like.
In the example shown in FIG. 32, for example, the composite waveform generation units 364a and 364b perform waveform synthesis methods in the frequency domain (for example, copying the spectrum from the previous frame). The composite waveform generation unit 364c performs a waveform synthesis method using a time waveform (for example, a method of extrapolating the waveform using linear prediction coefficients (LPC)).
Regardless of which waveform synthesis method is used, the composite waveform 11 may be converted into a spectrum in the frequency domain (MDCT domain) as necessary and then input to the spectrum replacement unit 360.
FIG. 33 is a flowchart showing an example of the operation of the signal processing unit 358. This process is a loop process that is continuously executed on a frame-by-frame basis.
In the following, as in the processing shown in FIG. 15, it is assumed that the presence or absence of the redundant data 3 is notified and that, when the corresponding main data 2 or redundant data 3 exists, the signal processing unit 358 (spectrum replacement unit 360) has acquired the spectrum of the main data 2 or redundant data 3 whose decoding was completed in the preceding stage.
First, it is determined whether or not the data acquired by the spectrum replacement unit 360 is the main data 2 (step 1101). When the acquired data is the main data 2 (YES in step 1101), the main data 2 is stored in the spectrum buffer 361 (step 1109).
When loss of the packet 1 or the like occurs and the redundant data 3 is acquired (NO in step 1101), the composite waveform generation unit 364c acquires the time waveform data required to generate the composite waveform 11 from the time waveform buffer 366 (step 1102). Subsequently, the composite waveform generation units 364a and 364b acquire the spectrum data necessary for generating the composite waveforms 11 from the spectrum buffer 361.
Next, the waveform/spectrum synthesis processing is executed by one of the composite waveform generation units 364a to 364c, and a composite waveform 11 is generated according to the waveform synthesis method set for that unit (step 1104). When one composite waveform 11 has been generated, it is determined whether any waveform synthesis methods have not yet been executed (step 1105). If unprocessed waveform synthesis methods remain (YES in step 1105), the composite waveform 11 is generated in step 1104 according to the next waveform synthesis method.
When all the waveform synthesis methods have been executed (NO in step 1105), the time waveform data of the composite waveform 11 that requires MDCT processing (here, the data generated by the composite waveform generation unit 364c) is converted into spectrum data by the MDCT unit 365 (step 1106).
As a result, the spectrum data X'idec[k] of each composite waveform 11 is acquired. Here, the spectrum data of the composite waveforms 1 to 3 (X'1dec[k], X'2dec[k], and X'3dec[k]) are calculated.
Next, the spectrum replacement unit 360 executes the replacement region setting processing for setting the replacement regions in which spectrum components are replaced (step 1107). Specifically, for the spectrum Xout[] of the frame to be reproduced, the indexes to which the spectrum components of the composite waveforms 1 to 3 are assigned and the indexes to which the spectrum components of the redundant data are assigned are set.
The basic flow of the replacement region setting processing is substantially the same as the processing of step 304 shown in FIG. 15, except that the same number of arrays as there are waveform synthesis methods are prepared as the arrays replace_isp[] for storing the indexes to be replaced.
Here, replace_isp_1[], replace_isp_2[], and replace_isp_3[] are prepared as the arrays corresponding to the composite waveforms 1 to 3.
For example, based on the meta information that designates each coded frequency range 70, the array redun_isp[] of indexes to which the redundant data 3 is assigned is constructed. Further, based on the same meta information, the indexes designating the interpolation ranges 71 other than the coded frequency ranges 70 are calculated.
Further, with reference to the meta information that designates the waveform synthesis method set for each interpolation range 71, the arrays of interpolation ranges 71 corresponding to the composite waveforms 1 to 3 are constructed. For example, based on the designation that the composite waveform 1 is used for the interpolation range 71 on the lowest frequency side, the indexes included in that interpolation range 71 are stored in replace_isp_1[].
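The construction of redun_isp[] and the per-method replace_isp arrays can be sketched as follows. The coded ranges, bin count, and method numbers below are invented for illustration; only the array roles follow the text.

```python
# Sketch of the replacement-region setup: from meta information giving the
# coded frequency ranges 70 and the synthesis method assigned to each
# interpolation range 71, build redun_isp[] and one replace_isp array per
# composite waveform.

NUM_BINS = 16
coded_ranges = [(4, 7), (12, 14)]   # (low, high) index pairs, inclusive
range_methods = [1, 2, 1]           # method assigned to each interpolation range

redun_isp = [k for lo, hi in coded_ranges for k in range(lo, hi + 1)]

# Interpolation ranges 71 are the gaps between (and around) the coded ranges.
in_coded = set(redun_isp)
interp_ranges, start = [], None
for k in range(NUM_BINS + 1):
    if k < NUM_BINS and k not in in_coded:
        start = k if start is None else start
    elif start is not None:
        interp_ranges.append((start, k - 1))
        start = None

replace_isp = {m: [] for m in set(range_methods)}
for (lo, hi), method in zip(interp_ranges, range_methods):
    replace_isp[method].extend(range(lo, hi + 1))

print(interp_ranges)   # -> [(0, 3), (8, 11), (15, 15)]
print(replace_isp[1])  # -> [0, 1, 2, 3, 15]
```

Note that, consistent with the text, the number of interpolation ranges here (three) is the number of coded ranges (two) plus one.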
Next, using the result of the replacement region setting processing, the spectrum replacement processing is executed by the spectrum replacement unit 360. Here, the spectrum components of the redundant data 3 and the composite waveforms 1 to 3 are substituted into Xout[] according to the indexes designated by redun_isp[] and each replace_isp_i[]. In this way, in the spectrum replacement processing, the spectrum replacement unit 360 interpolates the redundant data 3 for each interpolation range 71 using the composite waveform 11 generated by the synthesis method designated by the meta information, and the interpolation data Xout[] is generated.
The basic flow of the spectrum replacement processing is substantially the same as the processing of step 305 shown in FIG. 15, except for the number of arrays used. For example, the replace_isp[] referenced in step 504 of the detailed processing flow of step 305 (see FIG. 18) is extended to replace_isp_1[] through replace_isp_3[]. The value substituted into Xout[isp] is one of X'1dec[isp], X'2dec[isp], and X'3dec[isp], corresponding to the composite waveforms 1 to 3.
Xout[] generated as the interpolation data is stored in the spectrum buffer for use in subsequent processing (step 1109).
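The substitution itself reduces to filling Xout[] from the index arrays, which can be sketched as below. All spectral values here are dummies; only the indexing scheme follows the text.

```python
# Sketch of the spectrum replacement: fill X_out[] with the redundant-data
# spectrum at redun_isp[] indices and with the spectrum of the assigned
# composite waveform at each replace_isp_i[] index.

NUM_BINS = 8
redun_spec = [10.0] * NUM_BINS  # decoded redundant data 3 (dummy values)
synth_specs = {                 # X'_1dec and X'_2dec (dummy values)
    1: [1.0] * NUM_BINS,
    2: [2.0] * NUM_BINS,
}

redun_isp = [2, 3, 4]
replace_isp = {1: [0, 1], 2: [5, 6, 7]}

X_out = [0.0] * NUM_BINS
for isp in redun_isp:
    X_out[isp] = redun_spec[isp]            # coded frequency range 70
for method, indices in replace_isp.items():
    for isp in indices:
        X_out[isp] = synth_specs[method][isp]  # interpolation ranges 71

print(X_out)  # -> [1.0, 1.0, 10.0, 10.0, 10.0, 2.0, 2.0, 2.0]
```

Because the index sets are disjoint by construction, every bin of X_out[] is written exactly once, either from the redundant data or from one composite waveform.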
IMDCT processing is executed on the spectrum data Xout[] (main data 2 or interpolation data) processed by the spectrum replacement unit 360 (step 1110). Next, the time signal output unit 363 reconstructs the audio signal from the IMDCT result (step 1111). The processing of steps 1109 to 1111 is the same as the processing of steps 306 to 308 shown in FIG. 15.
Finally, the time signal output unit 363 stores the generated audio signal in the time waveform buffer 366 for the next and subsequent processing.
As described above, in the receiving device according to the present embodiment, the redundant data 3 is interpolated using a plurality of different waveform synthesis methods. The allocation of these synthesis methods is set in advance on the transmission device 20 side so as to reduce the amount of noise in the interpolation result. As a result, even when loss of the packet 1 or the like occurs, it is possible to realize high-quality error concealment with little noise.
<Other Embodiments>
The present technology is not limited to the embodiments described above, and various other embodiments can be realized.
In the above, a transmission/reception system that mainly performs wireless communication has been described. The present technology is not limited to this, and can be applied to, for example, a system for transmitting and receiving waveform data by wired communication. For example, this technique may be used as a PLC method when playing music by using network streaming or the like.
It is also possible to combine at least two feature parts among the feature parts related to the present technology described above. That is, the various feature portions described in each embodiment may be arbitrarily combined without distinction between the respective embodiments. Further, the various effects described above are merely examples and are not limited, and other effects may be exhibited.
In the present disclosure, "same", "equal", "orthogonal", etc. are concepts including "substantially the same", "substantially equal", "substantially orthogonal", and the like. For example, a state included in a predetermined range (for example, a range of ±10%) based on "exactly the same", "exactly equal", "exactly orthogonal", etc. is also included.
The present technology can also adopt the following configurations.
(1) A quality prediction unit that predicts the waveform quality of the restored waveform related to the target frame of the waveform data,
A range setting unit that sets at least one target range as a frequency range assigned to redundant data for generating the restored waveform from the waveform data included in the target frame based on the waveform quality.
A transmission device including a data generation unit that generates the redundant data based on the target range and generates transmission data including the redundant data.
(2) The transmitter according to (1).
The restored waveform is a waveform generated based on the redundant data and the composite waveform related to the target frame.
The quality prediction unit is a transmission device that predicts the waveform quality of the composite waveform as the waveform quality of the restored waveform.
(3) The transmitter according to (2).
The target frame is a frame in the vicinity of the transmission frame transmitted as the transmission data.
The quality prediction unit is a transmission device that generates the composite waveform of the target frame based on the waveform data included in the transmission frame.
(4) The transmitter according to (2) or (3).
The quality prediction unit calculates a noise spectrum representing the waveform quality of the composite waveform based on the composite waveform and the original waveform represented by the waveform data included in the target frame.
The range setting unit is a transmission device that sets the target range based on the noise spectrum.
(5) The transmitter according to (4).
The range setting unit is a transmission device that calculates the total noise amount of interpolation data obtained by interpolating the redundant data with the composite waveform, based on the noise spectrum and the quantization noise accompanying the coding of the redundant data, and sets the target range so that the total noise amount is minimized.
(6) The transmitter according to (4) or (5).
The noise spectrum is one of a spectrum obtained by frequency-converting the difference between the original waveform and the composite waveform, or a spectrum representing the difference between the spectrum of the original waveform and the spectrum of the composite waveform.
(7) The transmitter according to any one of (4) to (6).
The range setting unit is a transmission device that sets an integration range for calculating an integrated value of the noise spectrum, and sets the target range based on the minimum integrated range in which the integrated value exceeds a first threshold value.
(8) The transmitter according to (7).
The range setting unit is a transmission device that sets the minimum frequency of the integration range to the minimum frequency of the noise spectrum, changes the maximum frequency of the integration range, and calculates the integrated value.
(9) The transmitter according to any one of (4) to (6).
The range setting unit is a transmission device that calculates at least one excess range in which the noise spectrum exceeds a second threshold value set for each frequency, and sets the target range based on the at least one excess range.
(10) The transmitter according to any one of (1) to (9).
The range setting unit is a transmission device that calculates a plurality of candidate ranges that are candidates for the target range and sets the target range based on the plurality of candidate ranges.
(11) The transmitter according to (10).
The range setting unit is a transmission device that calculates a connection cost representing the amount of noise that changes by connecting the candidate ranges adjacent to each other, and connects the candidate ranges based on the connection cost.
(12) The transmitter according to (10) or (11).
The restored waveform is a waveform generated based on the redundant data and the composite waveform related to the target frame.
The range setting unit is a transmission device that extracts a tonal frequency component included in the waveform data of the target frame and adjusts the width of the candidate range so that the tonal frequency component is included on the high-frequency side of a predetermined threshold frequency.
(13) The transmitter according to any one of (10) to (12).
The range setting unit is a transmission device that adjusts the width of the candidate range based on noise components at the highest and lowest frequencies of the candidate range.
(14) The transmitter according to any one of (1) to (13).
The restored waveform is a waveform generated based on the redundant data and the composite waveform related to the target frame.
The range setting unit is a transmission device that sets one of a plurality of synthesis methods for generating the composite waveform in a non-target range that is a frequency range other than the target range.
(15) The transmitter according to (14).
The quality prediction unit predicts the waveform quality of the composite waveform for each of the plurality of synthesis methods.
The range setting unit is a transmission device that sets the target range and the synthesis method assigned to the non-target range based on the waveform quality predicted for each of the plurality of synthesis methods.
(16) The transmitter according to (15).
The range setting unit is a transmission device that calculates, for each of the plurality of synthesis methods, at least one candidate range that is a candidate for the target range, sets the frequency range represented by the intersection of the candidate ranges calculated for each of the plurality of synthesis methods as the target range, and sets, for the non-target range, the method among the plurality of synthesis methods that minimizes the integrated value of the noise spectrum.
(17) Predict the waveform quality of the restored waveform for the target frame of the waveform data,
Based on the waveform quality, at least one target range is set as a frequency range to be assigned to the redundant data for generating the restored waveform from the waveform data included in the target frame.
A transmission method in which a computer system executes to generate the redundant data based on the target range and generate transmission data including the redundant data.
(18) A receiving unit that receives redundant data assigned to at least one target range of the frequency range of the waveform data included in the target frame based on the waveform quality of the restored waveform related to the target frame of the waveform data.
A receiving device including a waveform restoration unit that generates the restoration waveform based on the redundant data.
(19) The receiving device according to (18).
The restored waveform is a waveform generated based on the redundant data and the composite waveform related to the target frame.
The receiving unit receives designated information that specifies a method for synthesizing the synthesized waveform for each non-target range that is a frequency range other than the target range.
The waveform restoration unit is a receiving device that interpolates the redundant data using the composite waveform generated by the synthesis method specified by the designated information for each non-target range.
(20) A receiving method in which a computer system executes: receiving redundant data assigned to at least one target range of the frequency range of the waveform data included in a target frame, based on the waveform quality of a restored waveform relating to the target frame of the waveform data; and generating the restored waveform based on the redundant data.
(21) Steps for predicting the waveform quality of the restored waveform with respect to the target frame of the waveform data, and
A step of setting at least one target range as a frequency range to be assigned to redundant data for generating the restored waveform from the waveform data included in the target frame based on the waveform quality.
A program that causes a computer system to perform a step of generating the redundant data based on the target range and generating transmission data including the redundant data.
(22) A step of receiving redundant data assigned to at least one target range of the frequency range of the waveform data included in the target frame based on the waveform quality of the restored waveform related to the target frame of the waveform data.
A program that causes a computer system to perform steps to generate the restored waveform based on the redundant data.
1 … Packet
2 … Main data
3 … Redundant data
4 … Original waveform
5 … Restored waveform
6 … Original data
7 … Target frame
8 … Transmission frame
10 … Waveform data
11 … Composite waveform
13 … Noise spectrum
14 … Threshold curve
15 … Tone component
20 … Transmission device
24 … Redundant data generation unit
50 … Receiving device
58 … Signal processing unit
70 … Coded frequency range
71 … Interpolation range
72 … Integration range
74 … Excess range
75 … Candidate range
100 … Transmission/reception system

Claims (20)

  1.  A transmission device, comprising:
     a quality prediction unit that predicts a waveform quality of a restored waveform relating to a target frame of waveform data;
     a range setting unit that sets, based on the waveform quality, at least one target range as a frequency range to be assigned to redundant data for generating the restored waveform from the waveform data included in the target frame; and
     a data generation unit that generates the redundant data based on the target range and generates transmission data including the redundant data.
  2.  The transmission device according to claim 1, wherein
     the restored waveform is a waveform generated on the basis of the redundant data and a composite waveform related to the target frame, and
     the quality prediction unit predicts, as the waveform quality of the restored waveform, a waveform quality of the composite waveform.
  3.  The transmission device according to claim 2, wherein
     the target frame is a frame in the vicinity of a transmission frame transmitted as the transmission data, and
     the quality prediction unit generates the composite waveform related to the target frame on the basis of the waveform data included in the transmission frame.
  4.  The transmission device according to claim 2, wherein
     the quality prediction unit calculates a noise spectrum representing the waveform quality of the composite waveform on the basis of the composite waveform and an original waveform represented by the waveform data included in the target frame, and
     the range setting unit sets the target range on the basis of the noise spectrum.
  5.  The transmission device according to claim 4, wherein
     the range setting unit calculates, on the basis of the noise spectrum and quantization noise accompanying encoding of the redundant data, a total noise amount of interpolated data obtained by interpolating the redundant data with the composite waveform, and sets the target range such that the total noise amount is minimized.
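As a rough illustration only (this sketch is not taken from the application; the function name, the per-bin quantization-noise model, and the fixed-width search are all placeholder assumptions), the minimization in this claim can be pictured as trying each candidate target range, replacing the concealment noise inside it with quantization noise, and keeping the range with the smallest total:

```python
def best_target_range(noise_spectrum, quant_noise_per_bin, width):
    """Try every contiguous range of the given width. Inside the range the
    redundant data replaces the concealment noise with quantization noise;
    outside it the concealment noise remains. Return the range that
    minimizes the total noise of the interpolated data, with that total."""
    base = sum(noise_spectrum)
    best, best_total = None, float("inf")
    for lo in range(len(noise_spectrum) - width + 1):
        covered = sum(noise_spectrum[lo:lo + width])
        total = base - covered + quant_noise_per_bin * width
        if total < best_total:
            best, best_total = (lo, lo + width - 1), total
    return best, best_total

print(best_target_range([1, 5, 6, 1, 1], quant_noise_per_bin=0.5, width=2))
# -> ((1, 2), 4.0): covering bins 1-2 removes 11 of 14 units of noise and adds 1.0
```

In the application the range width and the quantization-noise estimate would themselves depend on the coding of the redundant data; here they are fixed inputs for simplicity.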
  6.  The transmission device according to claim 4, wherein
     the noise spectrum is either a spectrum obtained by frequency-transforming a difference between the original waveform and the composite waveform, or a spectrum representing a difference between a spectrum of the original waveform and a spectrum of the composite waveform.
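The two alternative definitions in this claim can be written out with NumPy's FFT as follows (an illustrative sketch only; the function names are placeholders, not from the application):

```python
import numpy as np

def noise_spectrum_time_domain(original, composite):
    # Definition (a): frequency-transform the time-domain difference
    return np.abs(np.fft.rfft(np.asarray(original) - np.asarray(composite)))

def noise_spectrum_freq_domain(original, composite):
    # Definition (b): difference between the two magnitude spectra
    return np.abs(np.abs(np.fft.rfft(original)) - np.abs(np.fft.rfft(composite)))
```

The two generally differ: definition (a) counts phase mismatch between the waveforms as noise, while definition (b) compares magnitudes only, so by the reverse triangle inequality (a) is never smaller than (b) in any bin.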
  7.  The transmission device according to claim 4, wherein
     the range setting unit sets an integration range over which an integrated value of the noise spectrum is calculated, and sets the target range on the basis of the smallest integration range in which the integrated value exceeds a first threshold value.
  8.  The transmission device according to claim 7, wherein
     the range setting unit sets the lowest frequency of the integration range to the lowest frequency of the noise spectrum, and calculates the integrated value while varying the highest frequency of the integration range.
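A minimal sketch of the selection described in claims 7 and 8 (illustrative only, not the application's implementation): fix the lowest bin of the integration range at the bottom of the noise spectrum and raise the highest bin until the integrated value first exceeds the first threshold value.

```python
def minimal_integration_range(noise_spectrum, first_threshold):
    """Grow the integration range upward from the lowest bin and stop at
    the smallest range whose integrated noise exceeds the threshold."""
    total = 0.0
    for hi, value in enumerate(noise_spectrum):
        total += value
        if total > first_threshold:
            return (0, hi)  # (lowest bin, highest bin) of the minimal range
    return (0, len(noise_spectrum) - 1)  # threshold never exceeded

print(minimal_integration_range([0.5, 1.0, 2.0, 0.5], 3.0))  # -> (0, 2)
```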
  9.  The transmission device according to claim 4, wherein
     the range setting unit calculates at least one excess range in which the noise spectrum exceeds a second threshold value set for each frequency, and sets the target range on the basis of the at least one excess range.
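The excess-range computation in this claim amounts to scanning the noise spectrum against a per-frequency threshold curve (such as the threshold curve 14 in the reference numerals) and collecting the contiguous stretches that exceed it. A placeholder sketch, with bin indices standing in for frequencies:

```python
def excess_ranges(noise_spectrum, threshold_curve):
    """Return contiguous bin ranges where the noise spectrum exceeds a
    per-frequency threshold curve."""
    ranges, start = [], None
    for i, (noise, thresh) in enumerate(zip(noise_spectrum, threshold_curve)):
        if noise > thresh and start is None:
            start = i                       # an excess range opens here
        elif noise <= thresh and start is not None:
            ranges.append((start, i - 1))   # the range closed at the previous bin
            start = None
    if start is not None:
        ranges.append((start, len(noise_spectrum) - 1))
    return ranges

print(excess_ranges([1, 5, 6, 1, 7, 1], [2, 2, 2, 2, 2, 2]))  # -> [(1, 2), (4, 4)]
```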
  10.  The transmission device according to claim 1, wherein
     the range setting unit calculates a plurality of candidate ranges serving as candidates for the target range, and sets the target range on the basis of the plurality of candidate ranges.
  11.  The transmission device according to claim 10, wherein
     the range setting unit calculates a connection cost representing an amount of noise that changes when candidate ranges adjacent to each other are connected, and connects the candidate ranges on the basis of the connection cost.
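One way to picture the connection step in this claim is a greedy merge of adjacent candidate ranges. The cost model below (extra noise picked up in the gap between two ranges) is a toy stand-in for the application's "amount of noise that changes" metric, which is not specified in this claim; all names and the threshold are placeholders.

```python
def connect_ranges(candidates, noise, cost_threshold=0.0):
    """Greedily connect adjacent candidate ranges when the connection cost,
    modeled here as the noise added by bridging the gap between them, does
    not exceed a threshold."""
    merged = [candidates[0]]
    for lo, hi in candidates[1:]:
        prev_lo, prev_hi = merged[-1]
        gap_noise = sum(noise[prev_hi + 1:lo])  # cost of covering the gap bins
        if gap_noise <= cost_threshold:
            merged[-1] = (prev_lo, hi)          # connect into one range
        else:
            merged.append((lo, hi))
    return merged

noise = [0, 0, 9, 0, 0, 0, 1, 0]
print(connect_ranges([(0, 1), (3, 5), (7, 7)], noise, cost_threshold=2.0))
# -> [(0, 1), (3, 7)]: bridging bin 2 costs 9 (too much); bridging bin 6 costs 1
```

In practice such a cost would also credit the per-range coding overhead saved by having fewer, longer ranges; that term is omitted here for brevity.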
  12.  The transmission device according to claim 10, wherein
     the restored waveform is a waveform generated on the basis of the redundant data and a composite waveform related to the target frame, and
     the range setting unit extracts a tonal frequency component included in the waveform data of the target frame, and adjusts the width of the candidate range such that, on the high-frequency side of a predetermined threshold frequency, the tonal frequency component is included.
  13.  The transmission device according to claim 10, wherein
     the range setting unit adjusts the width of the candidate range on the basis of noise components at the highest frequency and the lowest frequency of the candidate range.
  14.  The transmission device according to claim 1, wherein
     the restored waveform is a waveform generated on the basis of the redundant data and a composite waveform related to the target frame, and
     the range setting unit sets, for each non-target range that is a frequency range other than the target range, one of a plurality of synthesis methods for generating the composite waveform.
  15.  The transmission device according to claim 14, wherein
     the quality prediction unit predicts the waveform quality of the composite waveform for each of the plurality of synthesis methods, and
     the range setting unit sets, on the basis of the waveform quality predicted for each of the plurality of synthesis methods, the target range and the synthesis method assigned to the non-target range.
  16.  The transmission device according to claim 15, wherein
     the range setting unit calculates, for each of the plurality of synthesis methods, at least one candidate range serving as a candidate for the target range, sets as the target range the frequency range represented by the intersection of the candidate ranges calculated for the plurality of synthesis methods, and sets for the non-target range the synthesis method, among the plurality of synthesis methods, that minimizes the integrated value of the noise spectrum.
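The intersection step in this claim — keeping only the frequencies that every synthesis method flags as a candidate — can be sketched over discrete bins as follows (illustrative placeholder code, not from the application):

```python
def range_intersection(ranges_per_method, n_bins):
    """Return, as sorted bin indices, the frequencies flagged as candidate
    ranges by every synthesis method; these form the common target range."""
    common = set(range(n_bins))
    for ranges in ranges_per_method:
        flagged = set()
        for lo, hi in ranges:
            flagged.update(range(lo, hi + 1))
        common &= flagged  # keep only bins every method agrees on
    return sorted(common)

# Two methods' candidate ranges over 8 bins:
print(range_intersection([[(1, 4)], [(3, 6)]], 8))  # -> [3, 4]
```

Bins outside the intersection become non-target ranges, for which the claim selects whichever synthesis method gives the smallest integrated noise.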
  17.  A transmission method executed by a computer system, the method comprising:
     predicting a waveform quality of a restored waveform related to a target frame of waveform data;
     setting, on the basis of the waveform quality, at least one target range as a frequency range to be assigned to redundant data for generating the restored waveform from the waveform data included in the target frame; and
     generating the redundant data on the basis of the target range, and generating transmission data including the redundant data.
  18.  A reception device, comprising:
     a reception unit that receives redundant data assigned, on the basis of a waveform quality of a restored waveform related to a target frame of waveform data, to at least one target range of the frequency range of the waveform data included in the target frame; and
     a waveform restoration unit that generates the restored waveform on the basis of the redundant data.
  19.  The reception device according to claim 18, wherein
     the restored waveform is a waveform generated on the basis of the redundant data and a composite waveform related to the target frame,
     the reception unit receives designation information that designates, for each non-target range that is a frequency range other than the target range, a synthesis method for the composite waveform, and
     the waveform restoration unit interpolates, for each non-target range, the redundant data using the composite waveform generated by the synthesis method designated by the designation information.
  20.  A reception method executed by a computer system, the method comprising:
     receiving redundant data assigned, on the basis of a waveform quality of a restored waveform related to a target frame of waveform data, to at least one target range of the frequency range of the waveform data included in the target frame; and
     generating the restored waveform on the basis of the redundant data.
PCT/JP2021/010803 2020-03-30 2021-03-17 Transmission device, transmission method, reception device, and reception method WO2021200151A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2020-061543 2020-03-30
JP2020061543 2020-03-30

Publications (1)

Publication Number Publication Date
WO2021200151A1 true WO2021200151A1 (en) 2021-10-07

Family

ID=77928486

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2021/010803 WO2021200151A1 (en) 2020-03-30 2021-03-17 Transmission device, transmission method, reception device, and reception method

Country Status (1)

Country Link
WO (1) WO2021200151A1 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2004504744A * 2000-07-18 2004-02-12 Robert Bosch GmbH Error concealment method for digital audio data transmission errors
JP2013524732A * 2010-04-12 2013-06-17 Qualcomm Atheros, Inc. Delayed acknowledgment for low overhead communication in networks
US20170103761A1 * 2015-10-10 2017-04-13 Dolby Laboratories Licensing Corporation Adaptive Forward Error Correction Redundant Payload Generation
JP2017529565A * 2014-08-27 2017-10-05 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Encoder, decoder and method for encoding and decoding audio content using parameters for enhancing concealment


Similar Documents

Publication Publication Date Title
US10546594B2 (en) Signal processing apparatus and signal processing method, encoder and encoding method, decoder and decoding method, and program
US9659573B2 (en) Signal processing apparatus and signal processing method, encoder and encoding method, decoder and decoding method, and program
JP5129888B2 (en) Transcoding method, transcoding system, and set top box
JP4876574B2 (en) Signal encoding apparatus and method, signal decoding apparatus and method, program, and recording medium
US9583112B2 (en) Signal processing apparatus and signal processing method, encoder and encoding method, decoder and decoding method, and program
AU2012297804B2 (en) Encoding device and method, decoding device and method, and program
JP3983668B2 (en) How to enhance the performance of coding systems that use high-frequency reconstruction methods
JP5048697B2 (en) Encoding device, decoding device, encoding method, decoding method, program, and recording medium
JP3926726B2 (en) Encoding device and decoding device
JP5942358B2 (en) Encoding apparatus and method, decoding apparatus and method, and program
AU2018201468B2 (en) Audio decoding device, audio encoding device, audio decoding method, audio encoding method, audio decoding program, and audio encoding program
KR20050030887A (en) Signal encoding device, method, signal decoding device, and method
JP2015500514A (en) Apparatus, method and computer program for avoiding clipping artifacts
JPWO2007116809A1 (en) Stereo speech coding apparatus, stereo speech decoding apparatus, and methods thereof
JP2012032803A (en) Full-band scalable audio codec
JP4736812B2 (en) Signal encoding apparatus and method, signal decoding apparatus and method, program, and recording medium
WO2006001159A1 (en) Signal encoding device and method, and signal decoding device and method
JP5589631B2 (en) Voice processing apparatus, voice processing method, and telephone apparatus
JP4558734B2 (en) Signal decoding device
WO2021200151A1 (en) Transmission device, transmission method, reception device, and reception method
JPWO2008155835A1 (en) Decoding device, decoding method, and program
JP2005114814A (en) Method, device, and program for speech encoding and decoding, and recording medium where same is recorded
JP5093514B2 (en) Audio encoding apparatus, audio encoding method and program thereof
JPH07225597A (en) Method and device for encoding/decoding acoustic signal
JP2009103974A (en) Masking level calculating device, encoder, masking level calculating method and masking level calculation program

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21778921

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21778921

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: JP