WO2021200151A1 - Transmission device, transmission method, reception device, and reception method - Google Patents


Info

Publication number
WO2021200151A1
Authority
WO
WIPO (PCT)
Prior art keywords
waveform
range
data
noise
spectrum
Prior art date
Application number
PCT/JP2021/010803
Other languages
French (fr)
Japanese (ja)
Inventor
Takashi Hattori (服部 崇史)
Chisato Kenmochi (劔持 千智)
Yasuhiro Toguri (戸栗 康裕)
Ryuji Tokunaga (徳永 竜二)
Akiho Tanaka (田中 朗穂)
Original Assignee
Sony Group Corporation (ソニーグループ株式会社)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sony Group Corporation
Publication of WO2021200151A1

Classifications

    • G — PHYSICS
    • G10 — MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L — SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 19/00 — Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L 19/005 — Correction of errors induced by the transmission channel, if related to the coding algorithm
    • G10L 25/00 — Speech or voice analysis techniques not restricted to a single one of groups G10L 15/00 – G10L 21/00
    • G10L 25/48 — Speech or voice analysis techniques specially adapted for particular use
    • G10L 25/69 — Speech or voice analysis techniques specially adapted for evaluating synthetic or decoded voice signals

Definitions

  • This technology relates to a transmitting device, a transmitting method, a receiving device, and a receiving method applicable to data communication.
  • Patent Document 1 describes an audio decoder including an error concealment unit.
  • When a frame loss or the like occurs, this error concealment unit synthesizes the audio information component representing the high-frequency side of the lost portion by concealment processing in the frequency domain, and synthesizes the component representing the low-frequency side by concealment processing in the time domain.
  • This enables error concealment that avoids the click sounds and beep sounds associated with the synthesis process (paragraphs [0016], [0017], [0094], [0095], Fig. 1, Fig. 2, etc. of the specification of Patent Document 1).
  • an object of the present technology is to provide a transmission device, a transmission method, a reception device, and a reception method capable of realizing high-quality error concealment while suppressing the amount of data transmission.
  • the transmission device includes a quality prediction unit, a range setting unit, and a data generation unit.
  • the quality prediction unit predicts the waveform quality of the restored waveform with respect to the target frame of the waveform data.
  • the range setting unit sets at least one target range as a frequency range assigned to redundant data for generating the restored waveform from the waveform data included in the target frame.
  • the data generation unit generates the redundant data based on the target range, and generates transmission data including the redundant data.
  • the waveform quality of the restored waveform is predicted for the target frame of the waveform data. Based on this waveform quality, at least one target range, which is a frequency range allocated to redundant data for generating a restored waveform, is set. Then, transmission data including redundant data generated based on the target range is generated. This makes it possible to realize high-quality error concealment while suppressing the amount of data transmission.
  • the restored waveform may be a waveform generated based on the redundant data and the composite waveform related to the target frame.
  • the quality prediction unit may predict the waveform quality of the composite waveform as the waveform quality of the restored waveform.
  • the target frame may be a frame in the vicinity of the transmission frame transmitted as the transmission data.
  • the quality prediction unit may generate the composite waveform for the target frame based on the waveform data included in the transmission frame.
  • the quality prediction unit may calculate a noise spectrum representing the waveform quality of the composite waveform based on the composite waveform and the original waveform represented by the waveform data included in the target frame.
  • the range setting unit may set the target range based on the noise spectrum.
  • the range setting unit may calculate, based on the noise spectrum and the quantization noise accompanying the coding of the redundant data, the total noise amount of the interpolated data obtained by interpolating the redundant data with the composite waveform, and may set the target range so that the total noise amount is minimized.
  • the noise spectrum may be either a spectrum obtained by frequency-converting the difference between the original waveform and the composite waveform, or a spectrum representing the difference between the spectrum of the original waveform and the spectrum of the composite waveform.
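The two noise-spectrum variants named in the preceding item can be sketched as follows. This is an illustrative sketch, not the patent's implementation; the function names and the use of an FFT magnitude are assumptions.

```python
import numpy as np

def noise_spectrum_time(original, composite):
    """Variant 1: frequency-convert the time-domain difference.
    `original` and `composite` are time waveforms of one frame."""
    diff = original - composite
    return np.abs(np.fft.rfft(diff))

def noise_spectrum_freq(X_orig, X_comp):
    """Variant 2: difference between the spectrum of the original
    waveform and the spectrum of the composite waveform."""
    return np.abs(np.abs(X_orig) - np.abs(X_comp))
```

When the composite waveform matches the original exactly, both variants return an all-zero noise spectrum; the larger the mismatch in a band, the larger the noise-spectrum values in that band.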
  • the range setting unit may set an integration range for calculating the integrated value of the noise spectrum, and may set the target range based on the smallest integration range in which the integrated value exceeds the first threshold value.
  • the range setting unit may set the minimum frequency of the integration range to the minimum frequency of the noise spectrum and change the maximum frequency of the integration range to calculate the integration value.
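The sweep described above — fixing the lower edge of the integration range at the lowest frequency of the noise spectrum and raising the upper edge until the integrated value exceeds the first threshold — can be sketched like this. Bin indices stand in for frequencies; the names are illustrative, not from the patent.

```python
def min_integration_range(noise_spec, threshold):
    """Return the smallest range [0, k_max] (inclusive bin indices)
    whose integrated noise exceeds `threshold`, or None if the total
    never exceeds it."""
    total = 0.0
    for k, value in enumerate(noise_spec):
        total += value
        if total > threshold:
            return (0, k)
    return None
```

For example, with a noise spectrum of [1, 1, 1, 5] and a threshold of 3, the integrated value first exceeds the threshold when the range reaches bin 3.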
  • the range setting unit may calculate at least one excess range in which the noise spectrum exceeds a second threshold set for each frequency, and set the target range based on the at least one excess range.
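Finding the excess ranges — the contiguous frequency stretches where the noise spectrum exceeds a per-frequency second threshold — amounts to a run-detection pass over the bins. A minimal sketch (names and bin-index convention are assumptions):

```python
def excess_ranges(noise_spec, threshold_per_bin):
    """Return contiguous (start, end) bin ranges where the noise
    spectrum exceeds the per-frequency threshold."""
    ranges, start = [], None
    for k, (n, t) in enumerate(zip(noise_spec, threshold_per_bin)):
        if n > t and start is None:
            start = k                      # a run of excess bins begins
        elif n <= t and start is not None:
            ranges.append((start, k - 1))  # the run ends
            start = None
    if start is not None:                  # run reaches the last bin
        ranges.append((start, len(noise_spec) - 1))
    return ranges
```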
  • the range setting unit may calculate a plurality of candidate ranges that are candidates for the target range, and set the target range based on the plurality of candidate ranges.
  • the range setting unit may calculate a connection cost representing the amount of noise that changes by connecting the candidate ranges adjacent to each other, and connect the candidate ranges based on the connection cost.
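The connection of adjacent candidate ranges can be sketched as a greedy merge driven by a connection cost. The patent only states that the cost represents the amount of noise that changes by connecting ranges; the concrete cost used below — noise admitted in the gap between two ranges minus a fixed per-range overhead — is an assumption for illustration.

```python
def connect_ranges(cands, noise_spec, overhead):
    """Greedily connect adjacent candidate ranges (non-empty, sorted,
    non-overlapping (lo, hi) bin pairs) when the assumed connection
    cost is negative, i.e. merging is cheaper than keeping two ranges."""
    out = [cands[0]]
    for lo, hi in cands[1:]:
        prev_lo, prev_hi = out[-1]
        gap_noise = sum(noise_spec[prev_hi + 1:lo])  # noise coded needlessly
        if gap_noise - overhead < 0:                 # merging saves more
            out[-1] = (prev_lo, hi)
        else:
            out.append((lo, hi))
    return out
```

With a quiet gap the two ranges merge into one; with a noisy gap they stay separate.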
  • the restored waveform may be a waveform generated based on the redundant data and the composite waveform related to the target frame.
  • the range setting unit may extract the tonal frequency components included in the waveform data of the target frame and, when a tonal frequency component is present on the high-frequency side of a predetermined threshold frequency, adjust the width of the candidate range.
  • the range setting unit may adjust the width of the candidate range based on the noise components at the highest frequency and the lowest frequency of the candidate range.
  • the restored waveform may be a waveform generated based on the redundant data and the composite waveform related to the target frame.
  • the range setting unit may set one of the plurality of synthesis methods for generating the composite waveform in a non-target range which is a frequency range other than the target range.
  • the quality prediction unit may predict the waveform quality of the composite waveform for each of the plurality of synthesis methods.
  • the range setting unit may set the target range and the synthesis method assigned to the non-target range based on the waveform quality predicted for each of the plurality of synthesis methods.
  • the range setting unit may calculate at least one candidate range for the target range for each of the plurality of synthesis methods, set the target range to the frequency range given by the intersection of the candidate ranges calculated for the respective synthesis methods, and assign to the non-target range the synthesis method whose integrated noise-spectrum value is the smallest among the plurality of synthesis methods.
  • a transmission method is a transmission method executed by a computer system and includes predicting the waveform quality of a restored waveform with respect to a target frame of waveform data. Based on the waveform quality, at least one target range is set as a frequency range assigned to redundant data for generating the restored waveform from the waveform data included in the target frame. The redundant data is generated based on the target range, and transmission data including the redundant data is generated.
  • the receiving device includes a receiving unit and a waveform restoring unit.
  • the receiving unit receives redundant data assigned, based on the waveform quality of the restored waveform with respect to the target frame of the waveform data, to at least one target range of the frequency range of the waveform data included in the target frame.
  • the waveform restoration unit generates the restoration waveform based on the redundant data.
  • the restored waveform may be a waveform generated based on the redundant data and the composite waveform related to the target frame.
  • the receiving unit may receive designated information that specifies a method of synthesizing the composite waveform for each non-target range that is a frequency range other than the target range.
  • the waveform restoration unit may interpolate the redundant data for each non-target range by using the composite waveform generated by the synthesis method specified by the designated information.
  • the receiving method is a receiving method executed by a computer system, and includes receiving redundant data assigned, based on the waveform quality of the restored waveform with respect to the target frame of the waveform data, to at least one target range of the frequency range of the waveform data included in the target frame. The restored waveform is generated based on the redundant data.
  • FIG. 1 is a diagram schematically showing the appearance of a transmission / reception system according to the first embodiment of the present technology.
  • the transmission / reception system 100 includes a transmission device 20 and a reception device 50, and is a system that transmits waveform data from the transmission device 20 to the reception device 50.
  • the waveform data is, for example, data representing a waveform that changes with time.
  • voice data representing a voice waveform is transmitted as waveform data.
  • waveform data is transmitted by wireless communication between the transmission device 20 and the reception device 50.
  • the communication standard for wireless communication is not limited, and a communication standard capable of transmitting waveform data such as Bluetooth (registered trademark) may be appropriately used.
  • the transmission device 20 is, for example, a portable terminal device (a smartphone, a tablet terminal, a portable music player, etc.).
  • the reception device 50 is, for example, a voice reproducing device (wireless headphones, wireless earphones, wireless speakers, etc.).
  • the configurations of the transmitting device 20 and the receiving device 50 are not limited.
  • Waveform data such as voice transmitted from the transmitting device 20 is received by the receiving device 50.
  • the waveform represented by the waveform data is restored and reproduced as sound from the speaker mounted on the receiving device 50.
  • a part of the waveform data transmitted from the transmitting device 20 may not be received by the receiving device 50.
  • wireless communication between the transmitting device 20 and the receiving device 50 may be hindered, and a situation may occur in which waveform data is partially lost.
  • error concealment is executed when such data loss occurs.
  • the error concealment is, for example, a process of compensating for the lost portion when a part of the waveform data transmitted from the transmitting device 20 to the receiving device 50 is lost.
  • FIG. 2 is a schematic diagram for explaining an outline of error concealment.
  • FIG. 2A is a schematic diagram showing a process in which waveform data is transmitted.
  • FIG. 2B is a schematic diagram showing an example of a waveform compensated by error concealment.
  • the transmission device 20 transmits waveform data encoded by using an encoder for transmission. The receiving device 50 decodes the waveform data using a decoder corresponding to the encoder used for coding. Data processing such as coding and decoding is executed for each frame, into which the waveform data is divided along the time axis.
  • the frame is, for example, a processing unit standardized by a coding method.
  • the length of a frame (the division period for dividing the waveform) is set to a period (for example, 10 ms) according to the coding method used in the transmission device 20.
  • in the transmission device 20, the waveform data (voice data) for one frame before encoding is referred to as the original data. Its time waveform is written as x(n), and the frequency spectrum obtained by time-frequency conversion of the original data is written as X(k).
  • the encoded original data is packed in packet 1 for each frame and transmitted.
  • the packet 1 is a data transmission unit between the transmitting device 20 and the receiving device 50.
  • FIG. 2A schematically illustrates three packets 1 transmitted from the transmitting device 20 to the receiving device 50.
  • the packet 1 transmitted from the transmission device 20 corresponds to the transmission data.
  • when packet 1 is not received, the receiving device 50 transmits an error signal indicating the failure.
  • the transmission device 20 that has received the error signal executes a retransmission process for retransmitting the target packet 1. By repeating such processing, it is possible to prevent the loss of the packet 1.
  • a limit is provided on the number of times a packet is retransmitted. A packet 1 that exceeds this limit is discarded, so a missing packet (the center packet 1 in FIG. 2A) occurs. If the voice or the like is reproduced as it is with the packet missing, a discontinuity may occur in the voice signal and be perceived as audible discomfort.
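The retransmit-until-limit behavior can be sketched as follows. This is an illustrative sketch only; the `channel_send` callback (returning True when an acknowledgement arrives) is an assumed interface, not part of the patent or of any real Bluetooth API.

```python
def send_with_retries(channel_send, packet, max_retries):
    """Try the initial transmission plus up to `max_retries`
    retransmissions; beyond the limit the packet is discarded and
    treated as a lost packet that the receiver must conceal."""
    for _ in range(1 + max_retries):
        if channel_send(packet):
            return True   # acknowledged: delivery succeeded
    return False          # retry limit exceeded: packet is lost
```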
  • the transmission device 20 generates a packet 1 including data in which redundant data 3 is added to the main data 2. That is, one packet 1 includes a set of main data 2 and redundant data 3.
  • the main data 2 is a coded version of the original data that is originally desired to be transmitted.
  • the frame of the main data 2 packed in the packet 1 and transmitted will be referred to as a transmission frame.
  • the redundant data 3 is data for one frame separately added to the main data 2 for the purpose of being used for error concealment (PLC in this case).
  • the target frame for generating the redundant data 3 will be referred to as a target frame.
  • the data encoded by using a part of the original data included in the frame near the frame (transmission frame) of the main data 2 is used as the redundant data 3. Therefore, the target frame is a frame in the vicinity of the transmission frame 8 transmitted as the packet 1.
  • one packet 1 includes one main data 2 and one redundant data 3.
  • the main data 2 is, for example, data obtained by encoding the original data of the Mth frame (M) with the encoder for the main data 2.
  • the redundant data 3 is data obtained by encoding the original data of the M + 1th frame (M + 1), which is a frame in the vicinity of the main data 2.
  • the frame (M) is the transmission frame
  • the frame (M + 1) is the target frame.
  • for the redundant data 3, an encoder whose settings (coding method, compression rate, etc.) differ from those of the main-data encoder is used, generally of lower quality (higher compression rate).
  • the data amount of the redundant data 3 is smaller than the data amount of the main data 2.
  • the target frame used for the redundant data 3 is not limited; for example, redundant data 3 for the M+2nd frame (M+2) or the M−1st frame (M−1) may be added. Further, the number of sets of main data 2 (redundant data 3) packed in the packet 1 is not limited. For example, the present technology can be applied even when a packet 1 contains sets of main data 2 and redundant data 3 for a plurality of frames.
  • the packet 1 including the redundant data 3 is sequentially generated and transmitted to the receiving device 50.
  • the receiving device 50 compensates for the main data 2 included in the lost packet (hereinafter referred to as the loss data) by using the corresponding redundant data 3.
  • the loss data is interpolated using the redundant data 3 of the frame (M + 1) that has already been received.
  • Such a PLC method is generally classified into Media-Specific FEC (Forward Error Correction).
  • the loss data can be immediately compensated by using the already received redundant data 3. It is also possible to compensate for the lost data by receiving the necessary redundant data 3 after the packet is lost.
  • FIG. 2B schematically illustrates the waveform restored by the receiving device 50 when a lost packet occurs.
  • the time range shown by the dotted line in the figure is the loss period during which data was lost due to the occurrence of lost packets.
  • the receiving device 50 generates a restored waveform 5 in which the waveform (original waveform 4) represented by the original data in the loss period is restored.
  • interpolated data is generated by interpolating the redundant data 3 corresponding to the loss data using a composite waveform related to the loss data.
  • the composite waveform is a waveform synthesized based on the data (preferably the main data) of a nearby frame normally received on the receiving terminal side for the purpose of using it for error concealment (PLC in this case).
  • the interpolated data is data generated by combining the redundant data 3 and the composite waveform, and is waveform data (audio data) used for the final concealment.
  • the waveform represented by this interpolated data is used as the restored waveform 5. Therefore, it can be said that the restored waveform 5 is a waveform generated based on the redundant data 3 and the composite waveform related to the target frame.
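The combination described above — redundant data inside the coded frequency range, composite waveform everywhere else — can be sketched directly in the spectral domain. This is an illustrative sketch; the bin-index representation of the target range is an assumption.

```python
import numpy as np

def restore_spectrum(redundant_spec, composite_spec, target_range):
    """Build the restored frame's spectrum: inside the target range
    (coded frequency range) use the decoded redundant data; in the
    non-target range fall back on the composite waveform's spectrum."""
    lo, hi = target_range                       # inclusive bin indices
    restored = composite_spec.copy()
    restored[lo:hi + 1] = redundant_spec[lo:hi + 1]
    return restored
```

An inverse time-frequency transform of the result then yields the restored waveform 5 for the lost frame.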
  • the original waveform 4 represented by the original data is shown by a solid line
  • the restored waveform 5 represented by the interpolated data is shown by a dotted line.
  • the transmission device 20 predicts in advance the waveform quality of the restored waveform 5 to be restored by the reception device 50. That is, the waveform quality of the restored waveform 5 with respect to the target frame 7 of the waveform data is predicted. Typically, an index (such as the noise spectrum described later) indicating the waveform quality of the composite waveform is calculated and used as the index of the waveform quality of the restored waveform. Then, the frequency range assigned to the redundant data 3 is set based on the predicted waveform quality.
  • FIG. 3 is a schematic graph showing the original waveform 4 represented by the original data 6 (target frame 7) for one frame.
  • the horizontal axis of the graph is time, and the vertical axis is the amplitude value x(n), where n is an index representing the time within the frame.
  • FIG. 4 is a schematic diagram showing an example of a frequency range of redundant data according to the first embodiment.
  • FIG. 4 shows a schematic graph of the frequency spectrum of the original waveform 4 shown in FIG. 3. The horizontal axis of the graph is the frequency, and the vertical axis is the spectrum value X(k), where k is an index (frequency bin) representing the frequency of each spectral value.
  • redundant data 3 is generated by encoding the frequency components (spectral components) of the original data 6 included in the frequency range assigned to it.
  • the frequency range assigned to the redundant data 3 will be referred to as a coded frequency range 70.
  • the coded frequency range 70 is, for example, a frequency range in which the waveform quality of the composite waveform is low (there is a lot of noise and the like).
  • the coded frequency range 70 corresponds to the target range.
  • the frequency range other than the coded frequency range 70 is the frequency range interpolated by using the composite waveform at the time of packet loss.
  • a frequency range other than the coded frequency range 70 will be referred to as an interpolation range 71.
  • the interpolation range 71 is, for example, a frequency range in which the waveform quality of the composite waveform is high (noise and the like are small). In the present embodiment, the interpolation range 71 corresponds to the non-target range. The method of evaluating the waveform quality and of setting the coded frequency range 70 (interpolation range 71) will be described in detail later.
  • FIG. 5 is a schematic view showing an example of error concealment given as a comparative example.
  • the transmission device 20 encodes only a specific frequency band (the coded frequency range 70), selected on the low-frequency side according to the waveform quality of the composite waveform, of the data of the target frame 7 in the vicinity of the transmission frame of the main data 2 to be originally transmitted, and thereby generates the redundant data 3. The generated redundant data 3 is added to the packet 1 that carries the main data 2.
  • when packet loss occurs, the receiving device 50 generates interpolated data in which the redundant data 3 corresponding to the lost main data 2 is interpolated with a composite waveform. More specifically, the range (interpolation range 71) other than the valid range (coded frequency range 70) of the frequency spectrum of the redundant data 3 is replaced with the frequency spectrum of the composite waveform generated from neighboring frames normally received in the past.
  • the amount of redundant data 3 is as small as possible within the acceptable quality.
  • the amount of redundant data 3 can be further reduced by predicting the waveform quality of the composite waveform in advance with the transmission device 20 and encoding only the band in which the quality falls below a certain level. This makes it possible to realize high-quality error concealment while suppressing the amount of data transmission.
  • one example is a process of replacing the out-of-band spectrum of the redundant data 3 with the spectrum of a nearby frame. This process effectively uses a waveform copied from a nearby frame as the composite waveform. Even such simple, low-computation processing can significantly mitigate sound-quality degradation, such as a muffled or discontinuous sound caused by energy attenuation in the high-frequency range. In other words, a high level of error-concealment quality can be maintained at all times with a small amount of redundant data and without increasing the amount of calculation.
  • FIG. 6 is a schematic diagram showing an example of a waveform data transmission method.
  • a waveform data transmission method including coding and decoding will be described with reference to FIG.
  • data representing an audio signal (input signal) such as voice is assumed.
  • the input signal is divided into frames of N samples, and analysis frames of 2N samples, each overlapping the previous frame by the frame length (N samples), are generated.
  • the analysis frame of this 2N sample is used as the transmission frame 8.
  • in FIG. 6, the time range corresponding to x(n), the data included in the transmission frame 8, is schematically illustrated using arrows.
  • x_prev(n) and x_next(n) are the transmission frames 8 temporally preceding and following x(n) (the previous frame and the next frame, respectively).
  • the original data 6 (input signal) of these transmission frames 8 is subjected to a predetermined analysis window to calculate a time-frequency-converted frequency spectrum.
  • the type of analysis window is not limited.
  • FIG. 6 schematically illustrates the outline of the function representing the analysis window.
  • a modified discrete cosine transform (MDCT) or the like is used for the time-frequency conversion.
  • the frequency spectrum of the transmission frame 8 is encoded, and the encoded data is packed in the packet 1 as the main data 2 and transmitted. At this time, redundant data 3 related to a frame (target frame 7) in the vicinity of the main data 2 is generated and added to the same packet 1.
  • the type and setting of the coding method are not limited.
  • after receiving the transmitted data (packet 1), the receiving device 50 decodes the data and restores the frequency spectrum. For decoding, a decoding method corresponding to the coding method used by the transmission device 20 is used. An inverse modified discrete cosine transform (IMDCT) is applied to the decoded frequency spectrum to calculate a time waveform of 2N samples.
  • FIG. 6 schematically shows y(n), the data included in a received frame. An output signal is generated by applying a synthesis window to each y(n) and overlap-adding it with the preceding and following frames, which have the same positional relationship as on the transmitting side. The output signal reconstructs the same waveform as the input signal.
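The analysis/synthesis chain of FIG. 6 — 2N-sample frames hopped by N samples, MDCT, IMDCT, windowing, overlap-add — can be demonstrated with the standard textbook MDCT and a sine (Princen-Bradley) window. This is a generic sketch of the transform chain, not the patent's specific codec; interior samples, covered by two overlapping frames, reconstruct exactly.

```python
import numpy as np

def mdct(frame, win):
    """MDCT of a 2N-sample windowed frame -> N coefficients."""
    N = len(frame) // 2
    n = np.arange(2 * N)
    k = np.arange(N)
    C = np.cos(np.pi / N * (n[None, :] + 0.5 + N / 2) * (k[:, None] + 0.5))
    return C @ (win * frame)

def imdct(X, win):
    """Inverse MDCT -> windowed 2N-sample time segment (with aliasing
    that cancels on overlap-add of adjacent frames)."""
    N = len(X)
    n = np.arange(2 * N)
    k = np.arange(N)
    C = np.cos(np.pi / N * (n[None, :] + 0.5 + N / 2) * (k[:, None] + 0.5))
    return win * ((2.0 / N) * (C.T @ X))

N = 64
# sine window satisfying the Princen-Bradley condition w[n]^2 + w[n+N]^2 = 1
win = np.sin(np.pi / (2 * N) * (np.arange(2 * N) + 0.5))
rng = np.random.default_rng(0)
x = rng.standard_normal(4 * N)

# analyze 2N-sample frames hopped by N, then overlap-add the synthesized frames
y = np.zeros_like(x)
for start in (0, N, 2 * N):
    y[start:start + 2 * N] += imdct(mdct(x[start:start + 2 * N], win), win)
```

After overlap-add, `y[N:3*N]` matches `x[N:3*N]` to numerical precision; only the first and last half-frames, which lack an overlapping neighbor, remain aliased.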
  • FIG. 7 is a block diagram showing a configuration example of the transmission / reception system 100.
  • the transmission / reception system 100 is a system that transmits waveform data 10 stored as an audio file from the transmission device 20 to the reception device 50 according to, for example, a BLE communication method.
  • the transmission / reception system 100 is designed assuming a use case in which, for example, both the transmission device 20 and the reception device 50 have restrictions on the amount of calculation. Examples of such a configuration include a combination of a transmitting device 20 such as a smartphone or a digital audio player and a receiving device 50 having a limited computing power such as wireless earphones or wireless headphones.
  • this technique can be applied even when a device having sufficient computing power (for example, a PC on the transmitting side and a stationary audio player on the receiving side) is used.
  • the transmission device 20 includes a retransmission timeout time calculation unit 21, a signal processing unit 22, an input buffer 23, a redundant data generation unit 24, a coding unit 25, a mux unit 26, and a transmission buffer 27.
  • the transmission device 20 is configured by using, for example, a computer including a CPU and a memory.
  • when the transmission device 20 executes the program according to the present embodiment and each unit operates, the transmission method according to the present embodiment is executed.
  • the retransmission timeout time calculation unit 21 acquires the parameters determined according to the combination of the transmission device 20 and the reception device 50, and calculates the retransmission timeout time.
  • the retransmission timeout time is a time limit for allowing the receiving device 50 to retransmit the packet 1 that has not been received.
  • the packet 1 that is not received even if the retransmission timeout time is exceeded is processed as a lost packet.
  • the signal processing unit 22 reads the audio data (waveform data 10) required for generating one frame from the audio file, executes predetermined signal processing, and generates the original data 6. For example, MDCT is executed and the frequency spectrum of each frame is calculated as the original data 6. In addition, signal processing for adjusting gain and sound quality may be executed.
  • the input buffer 23 is a buffer that temporarily stores the data processed by the signal processing unit 22.
  • the input buffer 23 stores the original data 6 representing the frequency spectrum and the time waveform of the waveform data. When the capacity of the input buffer 23 is full, the original data 6 having the lowest priority (typically, the oldest original data 6) is discarded.
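The discard policy of the input buffer 23 (drop the lowest-priority entry, typically the oldest, when full) can be sketched with a bounded deque. The class and method names are illustrative, not from the patent.

```python
from collections import deque

class InputBuffer:
    """Bounded frame buffer: when capacity is reached, the oldest
    frame is discarded automatically on the next push."""
    def __init__(self, capacity):
        self.frames = deque(maxlen=capacity)  # deque drops from the left

    def push(self, frame):
        self.frames.append(frame)

    def __len__(self):
        return len(self.frames)
```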
  • the redundant data generation unit 24 reads the original data 6 stored in the input buffer 23 and generates the redundant data 3 and the main data 2.
  • the coded frequency range 70 to which the redundant data 3 is assigned is set based on the waveform quality of the composite waveform. In setting the coded frequency range 70, it is also possible to use information on the quantization settings output from the coding unit 25 described later (for example, the resolution used when quantizing the frequency-spectrum values) and information on the amount of data that can be transmitted (for example, the remaining data capacity of packet 1). The specific configuration and operation of the redundant data generation unit 24 will be described in detail later.
  • the coding unit 25 encodes the main data 2 and the redundant data 3 output from the redundant data generation unit 24 according to the corresponding coding methods, respectively.
  • the main data 2 is encoded with a relatively low compression ratio
  • the redundant data 3 is encoded with a higher compression ratio than the main data 2.
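The asymmetry between the two encoders — fine quantization for the main data 2, coarse quantization (higher compression) for the redundant data 3 — can be illustrated with a simple uniform quantizer. This is a stand-in sketch for the idea only; the patent does not specify this quantizer or these bit depths.

```python
import numpy as np

def quantize(spec, n_bits, peak):
    """Uniform mid-rise quantizer over [-peak, peak): fewer bits means
    a coarser step and larger reconstruction error."""
    levels = 2 ** n_bits
    step = 2 * peak / levels
    q = np.clip(np.round(spec / step), -levels // 2, levels // 2 - 1)
    return q * step

spec = np.linspace(-1.0, 1.0, 9)
fine = quantize(spec, 8, 1.0)    # main data: fine resolution, more bits
coarse = quantize(spec, 3, 1.0)  # redundant data: coarse resolution, fewer bits
```

The coarsely quantized spectrum costs far fewer bits per value but carries a larger quantization noise, which is exactly the trade-off the range setting accounts for when it weighs quantization noise against concealment noise.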
  • the mux unit 26 stores the main data 2 and the redundant data 3 encoded by the coding unit 25 in a predetermined packet 1.
  • the data capacity of packet 1 and the like are set according to the communication method used (here, BLE communication).
  • the transmission buffer 27 is a buffer that temporarily stores the packet 1 generated by the mux unit 26.
  • the packets 1 stored in the transmission buffer 27 are transmitted in a predetermined order via the transmission module (not shown).
  • FIG. 8 is a block diagram showing a configuration example of the redundant data generation unit.
  • the redundant data generation unit 24 includes an original data selection unit 30, a composite waveform generation unit 31, a generated noise calculation unit 32, a coding range setting unit 33, and a coding spectrum selection unit 34.
  • both the data X(k), representing the time-frequency-converted frequency spectrum, and the data x(n), representing the corresponding time waveform, are generated as the original data 6.
  • the original data selection unit 30 selects and acquires necessary data from the original data 6 stored in the input buffer 23. Specifically, data representing the frequency spectrum and time waveform of the original data 6 to be processed is read from the input buffer 23. Further, as shown in FIG. 8, the delivery destination of the data differs depending on the data to be acquired.
• the frequency spectrum of the original data 6 (the original data 6 included in the transmission frame 8) corresponding to the main data 2 is passed through as it is to the coding unit 25 in the subsequent stage.
  • the frequency spectrum and time waveform of the original data 6 (original data 6 included in the target frame 7) corresponding to the redundant data 3 are input to the generated noise calculation unit 32 and the coded spectrum selection unit 34.
  • the frequency spectrum and time waveform of the original data 6 for the composite waveform are input to the composite waveform generation unit 31.
  • the original data 6 for the composite waveform is data included in a frame (for example, a transmission frame 8) in the vicinity of the target frame 7 which is the redundant data 3.
• the composite waveform generation unit 31 generates a composite waveform for the target frame 7 based on the original data 6 for the composite waveform. For example, composite data representing the frequency spectrum of the composite waveform is generated. Alternatively, composite data representing the time waveform of the composite waveform may be calculated. In this case, a process of converting the time waveform into a frequency spectrum or the like may be executed. As described above, in the present disclosure, both generating composite data representing the frequency spectrum of the composite waveform and generating composite data representing the time waveform of the composite waveform are included in generating the composite waveform.
• As a method for generating the composite waveform in the present embodiment, a method is adopted in which the frequency spectrum and the time waveform of the original data 6 for the composite waveform are used as they are as the composite waveform.
  • the data obtained by copying the original data 6 for the composite waveform becomes the composite data.
  • the method of generating the composite waveform is not limited, and for example, a method of appropriately processing the original data 6 for the composite waveform to generate the composite waveform may be used.
  • the composite waveform (composite data) is input to the generated noise calculation unit 32.
  • the generated noise calculation unit 32 acquires the original data 6 corresponding to the redundant data 3 and the composite data representing the composite waveform, and calculates the noise information related to the composite waveform.
  • the noise information is information representing noise generated when data is interpolated using, for example, a composite waveform. For example, the deviation of the composite waveform with respect to the original waveform 4 of the target frame 7 is calculated as the noise of the composite waveform.
  • the generated noise calculation unit 32 calculates the frequency distribution (noise spectrum) of such noise, the total amount of noise, and the like as noise information.
  • the noise information is not only the information indicating the waveform quality of the composite waveform but also the information representing the waveform quality of the restored waveform 5 generated by using the composite waveform.
  • the generated noise calculation unit 32 predicts the waveform quality of the composite waveform as the waveform quality of the restored waveform.
  • the generated noise calculation unit 32 corresponds to the quality prediction unit.
  • the coding range setting unit 33 acquires noise information and sets a frequency range (coding frequency range 70) to be encoded as redundant data 3.
  • one coded frequency range 70 is set on the low frequency side with respect to the original data 6 included in the target frame 7 based on the noise information (waveform quality of the composite waveform).
  • the coded frequency range 70 can be set by using other information indicating the waveform quality of the restored waveform 5.
• Based on the waveform quality of the restored waveform 5, the coding range setting unit 33 sets the coded frequency range 70 as the frequency range allocated, from the original data 6 (waveform data) included in the target frame 7, to the redundant data 3 for generating the restored waveform 5.
• Further, as shown in FIG. 8, information regarding the quantization setting in coding, the data amount of the packet, and the like is input to the coding range setting unit 33.
  • the coded frequency range 70 may be set using this information.
  • the coded spectrum selection unit 34 acquires the frequency spectrum of the original data 6 corresponding to the redundant data 3 and the coded frequency range 70, and extracts the spectrum component to be used as the redundant data 3. Specifically, only the spectral components included in the coded frequency range 70 are extracted from the original data 6. The data representing these spectral components becomes the redundant data 3 before coding. In this way, the coded spectrum selection unit 34 generates the redundant data 3 based on the coded frequency range 70.
  • the redundant data 3 and the main data 2 before encoding are input to the coding unit 25 shown in FIG. 7, and the encoded redundant data 3 and the main data 2 are generated. Then, the mux unit 26 generates a packet 1 (transmission data) including the encoded main data 2 and the redundant data 3.
  • the coded spectrum selection unit 34, the coding unit 25, and the mux unit 26 cooperate to realize a data generation unit that generates transmission data including redundant data.
  • a single coded frequency range 70 is set on the low frequency side of the graph of the frequency spectrum.
  • the maximum frequency k max of the coded frequency range 70 is set based on the above-mentioned noise information (noise spectrum).
• As a method by which the receiving device 50 generates a composite waveform for the target frame 7 (redundant data 3), a method of copying the original data 6 one frame before the target frame 7 is adopted.
  • the redundant data 3 is the data related to x (n) shown in FIG.
• By limiting the coded frequency range 70 to the low frequency side in this way, information for designating the coded frequency range 70 (for example, the meta information used in the embodiments described later) becomes unnecessary, and a situation in which the data amount of the redundant data 3 increases can be avoided. Further, by adopting a method of copying the original data 6 of one frame before as the waveform synthesis method, it is possible to reduce the amount of calculation on the receiving device 50 side.
  • FIG. 9 is a flowchart showing an example of the generation process of the redundant data 3.
  • the process shown in FIG. 9 is an example of the process executed by the redundant data generation unit 24 and the coding unit 25. This process is, for example, a loop process that is executed every time packet 1 is generated.
• the original data selection unit 30 acquires the original data 6 to be processed from the input buffer 23 (step 101). Specifically, the original data 6 of the transmission frame 8 for generating the main data 2, the original data 6 of the target frame 7 for generating the redundant data 3, and the original data 6 for generating the composite waveform are read from the input buffer 23.
  • the transmission frame 8 is the Mth frame (M)
  • the target frame 7 is the M + 1th frame (M + 1) (see FIG. 2A and the like).
• the composite waveform is generated from the original data 6 of the frame immediately before the target frame 7. Therefore, the original data 6 for the composite waveform is the main data 2 itself. If the target frame 7 (redundant data 3) is not the frame immediately after the transmission frame 8 (main data 2), the original data 6 one frame before the target frame 7 in chronological order is read, separately from the main data 2, as the original data 6 for the composite waveform.
  • the composite waveform generation unit 31 executes a composite waveform generation process for generating a composite waveform for the target frame 7 based on the original data 6 for the composite waveform (step 102).
  • the composite waveform generation unit 31 generates a composite waveform for the target frame 7 based on the original data 6 (main data 2) included in the transmission frame 8. It is desirable that the method for generating the composite waveform is exactly the same as the waveform synthesis method used in the receiving device 50.
  • a method of copying the original data 6 one frame before the target frame 7 (redundant data 3) is used. Therefore, the process by the composite waveform generation unit 31 is a process of passing through the main data 2 which is the original data 6 for the composite waveform as it is.
  • a composite waveform is appropriately generated based on the original data 6 for the composite waveform according to the set method.
  • the generated noise calculation unit 32 executes a generated noise prediction process for predicting the noise generated by using the composite waveform (step 103). Specifically, by using the composite waveform, the frequency spectrum (hereinafter, referred to as noise spectrum) of the noise generated with respect to the original waveform (original waveform 4) is calculated.
  • the noise spectrum is typically calculated as a power spectrum representing the intensity (power) of noise for each frequency.
  • the noise spectrum can also be used as a measure of the waveform quality of the composite waveform. That is, it can be said that the generated noise prediction process is a process for predicting the waveform quality (noise spectrum) of the composite waveform.
  • the noise spectrum 13 representing the waveform quality of the composite waveform is calculated based on the composite waveform 11 and the original waveform 4 represented by the original data 6 included in the target frame 7.
  • FIG. 10 is a schematic diagram showing a calculation example of a noise spectrum.
  • 10A to 10C are schematic graphs showing an example of a time waveform for one frame of the original waveform 4, the composite waveform 11, and the difference waveform 12.
  • FIG. 10D is a schematic graph showing an example of the noise spectrum 13.
  • a method of calculating the noise spectrum 13 by using the original waveform 4 which is a time waveform and the composite waveform 11 will be described.
• the generated noise calculation unit 32 reads the original waveform x (n) represented by the original data 6 used for the redundant data 3 and the composite waveform x'(n) calculated by the composite waveform generation unit 31, and calculates the difference waveform 12 representing the difference between the waveforms.
• Specifically, the original waveform 4 (original data 6) shown in FIG. 10A and the composite waveform 11 (composite data) shown in FIG. 10B are read.
  • the composite waveform 11 is, for example, the waveform of the previous frame of the target frame 7 including the original waveform 4, and therefore does not completely match the shape of the original waveform 4.
  • the difference waveform 12 (x (n) ⁇ x'(n)) between the original waveform 4 and the composite waveform 11 is calculated.
  • the difference waveform 12 is a waveform representing the difference between the original waveform 4 and the composite waveform 11 at each timing n.
  • FIG. 10C schematically shows a difference waveform 12 between the original waveform 4 and the composite waveform 11 shown in FIGS. 10A and 10B.
  • Time-frequency conversion (for example, Fourier transform) is executed on the difference waveform 12, and the frequency spectrum of the difference waveform 12 is calculated.
  • the power spectrum representing the absolute value of this frequency spectrum is calculated as the noise spectrum 13 (P noise (k)).
• P noise (k) is expressed using the following equation:

P noise (k) = |F[x(n) − x'(n)]|^2 ... (Equation 1)
  • F in (Equation 1) represents a Fast Fourier Transform (FFT) for the difference waveform 12.
  • the noise spectrum 13 is a spectrum obtained by frequency-converting the difference between the original waveform 4 and the composite waveform 11. This makes it possible to evaluate the noise generated in the actual time waveform for each frequency, and it is possible to accurately predict the waveform quality of the composite waveform 11.
  • FIG. 10D schematically shows a power spectrum obtained by Fourier transforming the difference waveform 12 shown in FIG. 10C.
  • the noise spectrum 13 (P noise (k)) is a frequency spectrum representing the intensity of the difference between the original waveform 4 and the composite waveform 11 for each frequency k.
  • P noise (k) is a frequency spectrum representing the intensity of the difference between the original waveform 4 and the composite waveform 11 for each frequency k.
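• The difference-and-FFT computation of (Equation 1) can be sketched as follows; the function name, the use of NumPy, and the toy signals are illustrative assumptions, not part of the disclosed embodiment.

```python
import numpy as np

def noise_spectrum(x, x_synth):
    # Difference waveform x(n) - x'(n) between the original and composite waveforms
    diff = x - x_synth
    # Time-frequency conversion (FFT) of the difference waveform
    spectrum = np.fft.rfft(diff)
    # Power per frequency bin: the noise spectrum P_noise(k)
    return np.abs(spectrum) ** 2

# Toy example: the "composite" waveform is the original with a phase offset,
# so the prediction error is concentrated at the signal's own frequency bin.
n = np.arange(256)
x = np.sin(2 * np.pi * 8 * n / 256)
x_synth = np.sin(2 * np.pi * 8 * n / 256 + 0.3)
p_noise = noise_spectrum(x, x_synth)
```

On this toy input the noise power peaks at bin 8, the frequency of the sinusoid itself, matching the intuition that the composite waveform fails exactly where the signal lives.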
  • the composite waveform 11 can be regarded as a waveform deviated from the original waveform 4, and the quality of the composite waveform can be regarded as low.
  • the noise spectrum 13 is preferably calculated by applying the analysis window w (n).
• the noise spectrum 13 is calculated using the following equation instead of the equation (Equation 1):

P noise (k) = |F[w(n) · (x(n) − x'(n))]|^2 ... (Equation 2)
  • the analysis window w (n) is appropriately set according to the process of interpolating the data using, for example, the composite waveform 11. This makes it possible to accurately predict the noise that actually occurs by using the composite waveform.
  • the method of calculating the noise spectrum 13 by executing the FFT after taking the difference between the two time waveforms (the original waveform 4 and the composite waveform 11) has been described. Instead of this, it is also possible to calculate the noise spectrum 13 using the frequency spectra of the original waveform 4 and the composite waveform 11.
  • the generated noise calculation unit 32 reads the frequency spectrum X (k) of the original waveform 4 and the frequency spectrum X'(k) of the composite waveform 11 to calculate a difference spectrum representing the difference between the respective spectra.
• As X (k) and X'(k), for example, the MDCT spectra obtained by MDCT-converting x (n) and x'(n) (for example, the original data 6 of the main data 2) in advance in the signal processing unit 22 can be used.
• In this case, the noise spectrum 13 is a spectrum representing the difference between the spectrum of the original waveform 4 and the spectrum of the composite waveform 11, and is calculated, for example, as follows:

P noise (k) = |X(k) − X'(k)|^2 ... (Equation 3)
• In this case, the original data 6 for the composite waveform (here, the main data 2) that has already been MDCT-converted may be read directly. Therefore, it is not necessary to perform the time-frequency conversion process (FFT or the like) again in order to calculate the noise spectrum 13, and the MDCT spectrum X'(k) used when generating the main data 2 can be reused. As a result, the amount of calculation can be sufficiently suppressed.
  • the moving average of the spectra calculated using (Equation 1) to (Equation 3) may be calculated as the noise spectrum 13.
  • the moving average is a process of moving a predetermined bin range (for example, bins for three spectra) and calculating the average of each spectrum value included in the bin range.
• the noise spectrum 13 (P noise-smoothed (k)) calculated by the moving average is expressed using the following equation (here for a bin range of three spectra):

P noise-smoothed (k) = (P noise (k−1) + P noise (k) + P noise (k+1)) / 3 ... (Equation 4)
  • the noise spectrum 13 can be smoothed, and the data processing in the subsequent stage can be easily executed.
• a spectrum obtained by smoothing the noise spectrum 13 calculated by using (Equation 1) or the like (P noise-smoothed (k)) is also included in the noise spectrum 13.
• Hereinafter, P noise-smoothed (k) will be described simply as P noise (k).
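• The moving average described above can be sketched as below; the three-bin window and the shrink-at-the-edges normalization are assumptions made for illustration.

```python
import numpy as np

def smooth_noise_spectrum(p_noise, width=3):
    # Moving average over a small bin range (three bins by default).
    kernel = np.ones(width)
    # Per-bin count of contributing neighbours, so that edge bins average
    # over a shrunken window instead of averaging in zero padding.
    counts = np.convolve(np.ones_like(p_noise), kernel, mode="same")
    return np.convolve(p_noise, kernel, mode="same") / counts

p = np.array([0.0, 3.0, 6.0, 3.0, 0.0])
p_smoothed = smooth_noise_spectrum(p)  # [1.5, 3.0, 4.0, 3.0, 1.5]
```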
  • the coding range setting unit 33 executes the coding range setting process for setting the coding frequency range 70 (step 104).
  • the coding frequency range 70 is set based on the noise spectrum 13 (P noise (k)) calculated by the generated noise calculation unit 32.
  • a frequency range in which noise is large is calculated from the noise spectrum 13, and is set to a coded frequency range 70 to be assigned to the redundant data 3.
  • the frequency range assigned to the redundant data 3 is a frequency range in which the waveform quality of the composite waveform is low. In this way, by using the noise spectrum, it is possible to accurately set the frequency range to be assigned to the redundant data 3 (that is, the composite waveform should not be used).
  • the coding range setting process will be described in detail later.
• the coded spectrum selection unit 34 executes a coded spectrum selection process for extracting only the spectral components corresponding to the coded frequency range 70 from the original data 6 of the target frame 7 (step 105).
• For example, the original data 6 representing the MDCT spectrum X (k) of the target frame 7 is input to the coded spectrum selection unit 34.
• Of the spectral components (frequency components) included in X (k), the components included in the coded frequency range 70 are extracted.
  • the data including the extracted components becomes the redundant data 3 before coding. Therefore, the data amount of the redundant data 3 before coding changes according to the width of the coding frequency range 70 (the state of the noise spectrum 13).
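• In the simplest case of a single range starting at the lowest frequency, the spectrum selection step reduces to a slice of the spectrum array; the helper name below is hypothetical.

```python
def select_coded_spectrum(X, redun_area):
    # Keep only the spectral components in the coded frequency range
    # [0, redun_area]; the remaining components are left to the
    # composite waveform on the receiving side.
    return X[: redun_area + 1]

# 10-bin toy spectrum; only bins 0..3 become the pre-coding redundant data.
redundant = select_coded_spectrum(list(range(10)), 3)
```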
• When the redundant data 3 is extracted, it is determined whether or not the original data 6 to be processed remains (step 106). Specifically, it is determined whether or not redundant data 3 to be packed in one packet 1 remains. For example, when the original data 6 to be processed (that is, the redundant data 3 to be generated) remains (YES in step 106), the processes of steps 101 to 105 are executed for the remaining original data 6.
  • the configuration of the main data 2 and the redundant data 3 packed in one packet 1 is not limited. For example, in the example described with reference to FIG. 2A, packet 1 including one data set (main data 2 and redundant data 3) is generated. In this case, the loop processing of step 106 is not executed. On the other hand, when a plurality of data sets are packed in one packet 1, the processes up to step 105 are executed for the number of redundant data 3 to be generated.
  • the coding process for encoding the redundant data 3 is executed (step 107).
• the unencoded redundant data 3 (the spectral components included in the coded frequency range 70) generated in the above process is encoded by a predetermined coding method, and the encoded redundant data 3 is generated.
• In the coding process, it is possible to adjust the data amount of the encoded redundant data 3 by appropriately setting the compression rate (bit rate) and the like at the time of coding. For example, when a target data amount of the redundant data 3 is set, the redundant data 3 is encoded with a compression rate that fits within the target data amount. Alternatively, the compression rate of the redundant data 3 may be fixed. In this case, the amount of the encoded redundant data 3 varies depending on the width of the coded frequency range 70 and the like.
  • the coding of the main data 2 is executed separately from the coding of the redundant data 3.
  • the target data amount of the main data 2 is set (step 108). Specifically, the free space of the packet 1 is calculated from the data amount of the encoded redundant data 3.
  • the free capacity of the packet 1 is, for example, the capacity obtained by subtracting the total amount of the encoded redundant data 3 from the data size of the packet 1.
  • the free space of the packet 1 is set as the target data amount of the main data 2. For example, when encoding the main data 2, the compression rate or the like is appropriately set so that the amount of the encoded main data 2 fits within the target data amount set here.
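• The step-108 bookkeeping amounts to a simple subtraction; the function and parameter names below are illustrative only.

```python
def main_data_budget(packet_size_bits, encoded_redundant_sizes_bits):
    # Free capacity of the packet after the encoded redundant data is stored,
    # used as the target data amount when encoding the main data.
    return packet_size_bits - sum(encoded_redundant_sizes_bits)

# e.g. a 2048-bit packet carrying two 300-bit encoded redundant blocks
budget = main_data_budget(2048, [300, 300])  # 1448 bits remain for the main data
```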
  • FIG. 11 is a schematic diagram showing a calculation example of the coded frequency range.
  • FIG. 11 shows a schematic graph showing a noise spectrum 13 (P noise (k)). The horizontal axis of the graph is the frequency, and the vertical axis is the spectrum value of P noise (k).
• the coding range setting unit 33 sets the integration range 72 for calculating the integrated value of the noise spectrum 13, and sets the coded frequency range 70 based on the minimum integration range 72 in which the integrated value exceeds the first threshold value.
  • the integrated value of the noise spectrum 13 is a value representing the total amount of noise in the target frequency range. Therefore, it can be said that the coded frequency range 70 is set to a frequency range in which the total noise amount is about the first threshold value.
• Hereinafter, an example in which the minimum integration range 72 whose integrated value exceeds the first threshold value is set as the coded frequency range 70 will be described.
  • the total noise amount P noise-sum in the preset total calculation range 73 is calculated.
  • the total calculation range 73 is schematically shown as a dotted line range.
  • the minimum index (lowest frequency) of the total calculation range 73 is set to 0.
• the maximum index (maximum frequency) of the total calculation range 73 can be arbitrarily set to a value less than N, that is, half of the total number 2N of FFT indexes (the FFT length).
  • the maximum index of the total calculation range 73 is described as total_area.
• the total noise amount P noise-sum in the total calculation range 73 is expressed using the following equation:

P noise-sum = Σ_{k=0}^{total_area} P noise (k) ... (Equation 5)
• Next, the integration range 72 is set, and the total noise amount P noise-redun-sum in the integration range 72 is calculated.
  • the integration range 72 is schematically shown as a solid line range.
  • the minimum index (lowest frequency) of the integration range 72 is set to 0.
  • the maximum index (maximum frequency) of the integration range 72 is set to be included in the total calculation range 73.
  • redun_area the maximum index of the integration range 72 will be referred to as redun_area.
• the total noise amount P noise-redun-sum in the integration range 72 is expressed using the following equation:

P noise-redun-sum = Σ_{k=0}^{redun_area} P noise (k) ... (Equation 6)
• Then, the minimum integration range 72 is calculated in which the total noise amount P noise-redun-sum in the integration range 72 expressed by the equation (Equation 6) becomes equal to or more than a predetermined ratio α (for example, 0.7) of the total noise amount P noise-sum in the total calculation range 73. That is, the minimum value of redun_area that satisfies the following equation is calculated:

α × P noise-sum ≤ P noise-redun-sum ... (Equation 7)
  • the left side of the equation (Equation 7) (the product of the predetermined ratio ⁇ and the total noise amount P noise-sum in the total calculation range 73) corresponds to the above-mentioned first threshold value.
  • the minimum value of redun_area satisfying the equation (Equation 7) is set to the maximum index (maximum frequency) of the coded frequency range 70. Further, as described with reference to FIG. 4, the minimum index (minimum frequency) of the coding frequency range 70 is set to 0.
  • the coded frequency range 70 is set to the frequency range according to the distribution of noise due to the composite waveform. This makes it possible to allocate only the necessary frequency range to the redundant data 3, as compared with the case where, for example, the entire frequency region or a fixed fixed region of the original data 6 is allocated to the redundant data 3. As a result, it is possible to reduce the amount of data without degrading the quality of the redundant data 3.
• If the quantization noise associated with the coding of the redundant data 3 is ignored, the noise P noise-residue remaining in the region not retained as the redundant data 3 satisfies the following equation:

P noise-residue ≤ (1 − α) × P noise-sum ... (Equation 8)

• That is, the noise P noise-residue remaining in the region not retained as the redundant data 3 is reduced to (1 − α) or less of the noise originally existing in the total calculation range 73.
  • the total calculation range 73 is set based on the human audible range.
  • P noise-sum is the total amount of noise due to the composite waveform 11 in the frequency range that can be heard by humans.
  • the above method is a method of setting the encoded frequency range 70 of the redundant data so as to reduce the noise amount by a predetermined ratio ⁇ in the total amount of noise in such an audible range.
• Note that the larger the maximum index redun_area is set, the larger the coded frequency range 70 becomes, and the quantization noise described later may increase. For example, when the target data amount of the redundant data 3 is predetermined, the larger the coded frequency range 70, the larger the quantization noise.
  • FIG. 12 is a flowchart showing an example of the coding range setting process.
  • the process shown in FIG. 12 is an example of the coding range setting process described with reference to FIG.
  • P noise (k) is read by the coding range setting unit 33 (step 201).
  • the total noise amount P noise-sum in the total calculation range 73 is calculated (step 202).
  • the maximum index redun_area of the integration range 72 is initialized (step 203).
  • the redun_area functions as a variable used to set the coding frequency range 70.
• Specifically, 0 is assigned to redun_area. Next, the total noise amount P noise-redun-sum in the integration range 72 up to redun_area is calculated using the equation (Equation 6) (step 204).
• Next, it is determined whether or not P noise-redun-sum satisfies the condition shown in the equation (Equation 7) (step 205). That is, by comparing P noise-redun-sum and P noise-sum, it is determined whether or not P noise-redun-sum is α (0 < α < 1) times P noise-sum or more.
• If the condition of the equation (Equation 7) is satisfied (YES in step 205), redun_area is set to the maximum value (maximum index) of the coded frequency range 70 (step 208). If the condition of the equation (Equation 7) is not satisfied (NO in step 205), redun_area is incremented, that is, 1 is added to redun_area (step 206).
• Next, it is determined whether or not the incremented redun_area is within the total calculation range 73 (step 207). That is, it is determined whether or not redun_area is smaller than total_area (the maximum index of the total calculation range 73). If redun_area is smaller than total_area (YES in step 207), the processes after step 204 are repeated. If redun_area is greater than or equal to total_area (NO in step 207), step 208 is executed.
  • the minimum frequency of the integration range 72 is set to the minimum frequency of the noise spectrum 13, and the integration value is calculated by changing the maximum frequency of the integration range. This makes it possible to easily calculate redundant data 3 that suppresses noise in a frequency range that is easily heard by humans.
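• The flowchart of FIG. 12 (steps 201 to 208) can be sketched as the loop below, using the sums of (Equation 5) and (Equation 6) and the condition of (Equation 7); variable names follow the text, everything else is an illustrative assumption.

```python
import numpy as np

def set_coding_range(p_noise, alpha=0.7, total_area=None):
    # total_area: maximum index of the total calculation range
    if total_area is None:
        total_area = len(p_noise) - 1
    # Total noise amount in the total calculation range (Equation 5, step 202)
    p_noise_sum = p_noise[: total_area + 1].sum()
    redun_area = 0                                      # step 203
    while redun_area < total_area:                      # step 207
        # Noise integrated over [0, redun_area] (Equation 6, step 204)
        p_noise_redun_sum = p_noise[: redun_area + 1].sum()
        if p_noise_redun_sum >= alpha * p_noise_sum:    # (Equation 7), step 205
            break
        redun_area += 1                                 # step 206
    return redun_area                                   # step 208

# Noise concentrated at low frequencies: 70% of the total is reached at bin 1.
p = np.array([4.0, 3.0, 1.0, 1.0, 1.0])
k_max = set_coding_range(p, alpha=0.7)
```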
• In the above processing, the coded frequency range 70 is set focusing mainly on the noise (noise spectrum 13) generated by using the composite waveform 11.
• Meanwhile, the interpolated data may also include quantization noise associated with the coding of the redundant data 3.
  • the original data 6 is quantized according to the set compression rate and the like. At this time, the higher the compression rate and the lower the quantization accuracy (bit rate, etc.), the smaller the amount of data, but on the other hand, the larger the quantization noise.
  • the target data amount (hereinafter referred to as nbit) of the redundant data 3 is predetermined so that the size of the redundant data 3 does not become larger than necessary with respect to the main data 2.
  • the compression rate or the like is set so that the amount of the redundant data 3 after encoding is contained in nbit. Therefore, for example, when the coding frequency range 70 is large, the compression rate is set high and the quantization noise may increase.
  • FIG. 13 is a schematic diagram for explaining the total noise amount of the interpolated data.
  • 13A and 13B are schematic graphs showing the frequency distribution of noise included in the interpolated data.
  • the horizontal axis of the graph is the frequency, and the vertical axis is the noise intensity at each frequency.
• the coded frequency range 70 set on the low frequency side is a region where the redundant data 3 is used, and the noise of the interpolated data there is represented by the quantization noise N q (k).
  • the frequency side higher than the coded frequency range 70 is a region where the composite waveform 11 is used, and the noise of the interpolated data is represented by using the noise P noise (k) due to the composite waveform.
• In FIG. 13B, a coded frequency range 70 wider than that in FIG. 13A is set. In this case, the total amount of P noise (k) decreases and the total amount of N q (k) increases.
  • FIG. 13C is a graph showing the relationship between the total noise amount of the interpolated data and the coded frequency range.
• FIG. 13C shows graphs of the total amount P noise of noise due to the composite waveform, the total amount N q of quantization noise, and the total noise amount (P noise + N q) of the interpolated data, respectively, as the maximum index (redun_area) of the coded frequency range 70 is changed.
• As redun_area shifts to the high frequency side, P noise decreases but N q increases. That is, P noise and N q have a trade-off relationship with respect to redun_area. Therefore, the total noise amount (P noise + N q) of the interpolated data is represented by a downwardly convex graph as shown in FIG. 13C, and takes a minimum value when redun_area is set to a certain frequency.
  • the coding range setting process in consideration of the quantization noise will be described.
• In this process, for the frequency range assigned to the redundant data 3, instead of P noise (k), the quantization noise N q (k) is simply calculated from information such as the quantization accuracy determined according to the coding method of the redundant data 3.
  • the coding frequency range 70 is set so that the noise power (total noise amount) in the entire interpolated data is minimized.
  • the target data amount of the redundant data 3 is set.
  • the target data amount (nbit) and the information of the coding method used for coding the redundant data 3 are input to the coding range setting unit 33.
  • the quantization noise N q (k) generated for each frequency is calculated.
  • a value obtained by estimating the quantization noise N q (k) is calculated according to the coding method of the redundant data 3.
• In this case, the total noise amount P noise-residue of the interpolated data is expressed using the following equation:

P noise-residue = Σ_{k=redun_area+1}^{total_area} P noise (k) + Σ_{k=0}^{redun_area} N q (k) ... (Equation 9)

• That is, P noise-residue is expressed as the sum of the total amount of noise remaining in the range other than the coded frequency range 70 (the total amount of noise due to the composite waveform 11) and the total amount of the quantization noise N q (k) in the coded frequency range 70.
• Then, the redun_area that minimizes the P noise-residue shown in the equation (Equation 9) is calculated and set as the maximum index of the coded frequency range 70.
• In this way, based on the noise spectrum 13 and the quantization noise N q (k) accompanying the coding of the redundant data 3, the coding range setting unit 33 calculates the total noise amount P noise-residue of the interpolated data in which the redundant data 3 is supplemented by the composite waveform 11, and sets the coded frequency range 70 so that the total noise amount P noise-residue is minimized. This makes it possible to minimize the sum of the noise over all bands important for hearing. As a result, the quality of the interpolated data can be sufficiently improved.
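• A brute-force sketch of choosing redun_area to minimize the (Equation 9) total is shown below; the per-bin quantization-noise estimate n_q is assumed to be supplied by the coding-method model described in the text, and the scan over all candidates is one simple way to find the minimum for a single frame.

```python
import numpy as np

def best_redun_area(p_noise, n_q, total_area):
    # Scan all candidate maximum indices and keep the one that minimizes
    # P_noise-residue = residual composite-waveform noise + quantization noise.
    best_k, best_residue = 0, np.inf
    for redun_area in range(total_area + 1):
        residue = (p_noise[redun_area + 1 : total_area + 1].sum()
                   + n_q[: redun_area + 1].sum())      # (Equation 9)
        if residue < best_residue:
            best_k, best_residue = redun_area, residue
    return best_k

p = np.array([5.0, 4.0, 1.0, 1.0])   # noise due to the composite waveform
nq = np.array([1.0, 1.0, 1.0, 1.0])  # estimated quantization noise per bin
k_best = best_redun_area(p, nq, total_area=3)
```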
  • the coding frequency range 70 is set based on the intensity (power) of the spectral component that becomes the redundant data 3.
  • the target power P target is set as a threshold value for the total power in the frequency range to be used as the redundant data 3.
  • the P target is calculated using, for example, a table in which the P target corresponding to the target data amount (nbit) of the redundant data 3 is recorded, a calculation formula for calculating the P target according to the target data amount, or the like. The following conditional expression is set based on this P target.
  • the maximum redun_area satisfying (Equation 10) is calculated and set as the maximum index of the coded frequency range 70. That is, the largest frequency range in which the total power of the redundant data 3 is less than the target power P target is set as the coded frequency range 70. By using this method, the coded frequency range 70 can be easily set.
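  • A minimal sketch of this selection (our own illustration; the function name and test values are assumptions, not from the patent) accumulates spectral power until the target P target would be reached:

```python
# Sketch of (Equation 10): choose the largest boundary index whose cumulative
# spectral power stays below the target power p_target.

def max_range_under_target(power, p_target):
    """Largest redun_area such that the total power below it is < p_target."""
    total, redun_area = 0.0, 0
    for p in power:
        if total + p >= p_target:
            break  # adding this bin would reach or exceed the target power
        total += p
        redun_area += 1
    return redun_area
```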
  • the threshold thresh (k), set for each frequency based on human auditory characteristics, is subtracted from the value of each frequency distribution.
  • This thresh (k) is set using, for example, a loudness curve showing the frequency distribution of the volume that humans can hear. If the value after subtraction becomes negative, it is set to 0.
  • Alternatively, a process of weighting the values of each frequency distribution according to thresh (k) may be executed. In this way, by adding corrections according to human auditory characteristics, it is possible to avoid a situation in which noise components that would otherwise be inaudible are counted. As a result, the coded frequency range 70 can be set appropriately.
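  • The subtraction with clamping described above can be sketched as follows (our own illustration; the threshold values are placeholders, not taken from any loudness curve in the patent):

```python
# Sketch of the auditory correction: subtract the per-frequency threshold
# thresh(k) and clamp negative results to 0 so inaudible noise is not counted.

def apply_hearing_threshold(values, thresh):
    """Return values(k) - thresh(k), clamped at zero."""
    return [max(v - t, 0.0) for v, t in zip(values, thresh)]
```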
  • the receiving device 50 includes a communication controller 51, a receiving buffer 52, a demux unit 53, a main data buffer 54, a redundant data buffer 55, a reproduction data selection unit 56, a decoding unit 57, a signal processing unit 58, and an audio DAC 59.
  • the receiving device 50 is configured by using, for example, a computer including a CPU and a memory. When the receiving device 50 executes the program related to the present embodiment and each unit operates, the receiving method according to the present embodiment is executed.
  • the communication controller 51 monitors, for example, BLE communication and controls the communication state.
  • the communication controller 51 generates packet loss information, for example, when packet 1 is lost. Based on this information, error concealment in the receiving device 50 is started.
  • the reception buffer 52 is a buffer that receives the packet 1 transmitted from the transmission device 20 and temporarily stores the packet 1.
  • the packet 1 includes the encoded main data 2 and the redundant data 3.
  • the main data 2 is data in which the original data 6 included in the transmission frame 8 is encoded in the transmission device 20.
  • the redundant data 3 is data in which the spectral components of the coded frequency range 70 in the original data 6 included in the target frame 7 are encoded. That is, the reception buffer 52 receives the redundant data 3 assigned to the coded frequency range 70, which is set within the frequency range of the waveform data (original data 6) included in the target frame 7 based on the waveform quality of the restored waveform 5 for the target frame 7 of the waveform data.
  • the reception buffer 52 corresponds to the receiving unit.
  • the demux unit 53 appropriately reads the packet 1 stored in the reception buffer 52 and separates it into the encoded main data 2 and the encoded redundant data 3. A data number (frame ID), used to identify the data in the reproduction data selection unit 56 described later, is added to each separated piece of data. Further, the demux unit 53 queries the communication controller 51 for packet loss information. When packet loss occurs, the packet loss information is output to each subsequent unit.
  • the main data buffer 54 and the redundant data buffer 55 are buffers that temporarily store the encoded main data 2 and the redundant data 3 separated by the demux unit 53.
  • the reproduction data selection unit 56 reads data to be reproduced (hereinafter, referred to as reproduction data) from the main data buffer 54 or the redundant data buffer 55.
  • the playback data is selected in chronological order so that the frames are played back in an appropriate order. Further, the reproduction data selection unit 56 notifies the signal processing unit 58 of the presence / absence of the reproduction data.
  • the decoding unit 57 reads the reproduction data (main data 2 or one or more redundant data 3) selected by the reproduction data selection unit 56, and decodes each data according to the corresponding coding method.
  • the signal processing unit 58 performs signal processing on the data (main data 2 or redundant data 3) decoded by the decoding unit 57 and generates digital data representing the final time waveform. For example, when there is no packet loss and the main data 2 is properly received, frequency-time conversion (for example, IMDCT) is executed on the decoded main data 2. When packet loss occurs and the main data 2 to be reproduced does not exist, interpolated data is generated based on the corresponding redundant data 3 and the data for generating the composite waveform (for example, the main data 2 of a frame near the lost data). Processing such as frequency-time conversion is then executed on this interpolated data.
  • As the method of generating the composite waveform, it is assumed here that a method of copying the original data 6 (main data 2) of the frame one frame before the target frame 7 is used.
  • the audio DAC 59 performs digital-analog conversion on the digital data processed by the signal processing unit 58 to generate an analog audio signal.
  • This audio signal is input to a reproduction element such as a speaker (not shown), and the sound of the audio file (waveform of waveform data 10) is reproduced.
  • FIG. 14 is a block diagram showing a configuration example of the signal processing unit 58 included in the receiving device 50.
  • the signal processing unit 58 includes a spectrum replacement unit 60, a spectrum buffer 61, an IMDCT unit 62, and a time signal output unit 63.
  • the spectrum replacement unit 60 executes a spectrum component replacement process on the redundant data 3 decoded by the decoding unit 57 in the previous stage. If no packet loss has occurred, the decoded main data 2 is acquired and output as it is to the subsequent stage.
  • the data output from the spectrum replacement unit 60 will be referred to as spectrum data.
  • information for specifying the presence / absence of redundant data 3 and information for specifying a method for replacing spectrum components are input to the spectrum replacement unit 60.
  • the information for specifying the replacement method includes information for specifying the data used for the replacement and information for specifying the frequency range for the replacement. Based on this information, the spectral component replacement process is executed.
  • when packet loss occurs, the redundant data 3 for the lost frame (target frame 7) and the spectrum data of the frame one frame before the target frame 7, which is stored in the spectrum buffer 61, are input to the spectrum replacement unit 60.
  • the spectrum data one frame before is the composite data representing the spectrum of the composite waveform 11.
  • the spectrum replacement unit 60 replaces the spectral components of the interpolation range 71, that is, the range other than the coded frequency range 70 assigned to the redundant data 3, using the spectrum data (composite data) one frame before, and outputs new spectrum data (interpolated data).
  • In this way, the spectrum replacement unit 60 and the spectrum buffer 61 generate the composite waveform 11 for the target frame 7 and produce the interpolated data in which the redundant data 3 is interpolated with the composite waveform 11.
  • the waveform represented by the interpolated data is the restored waveform 5.
  • the spectrum data output from the spectrum replacement unit 60 becomes the decoded main data 2 when the packet loss does not occur, and becomes the interpolated data when the packet loss occurs.
  • These spectral data are appropriately stored in the spectral buffer 61 in case the subsequent frames need to be replaced with spectral components.
  • the spectrum replacement unit 60 and the spectrum buffer 61 work together to realize a waveform restoration unit that generates a restoration waveform based on redundant data.
  • the IMDCT unit 62 executes the IMDCT on the spectrum data (decoded main data 2 or interpolated data) output from the spectrum replacement unit 60. As a result, the data representing the time waveform is restored in frame units.
  • the time signal output unit 63 applies a synthesis window (see FIG. 6) to the result of the IMDCT and executes overlap-add with the result of the previous IMDCT. This makes it possible to reconstruct a digital audio signal (digital data) that is continuous in time. The result of the overlap-add is output to the audio DAC in the subsequent stage.
  • FIG. 15 is a flowchart showing an example of the operation of the signal processing unit 58.
  • This process is a loop process that is continuously executed on a frame-by-frame basis.
  • It is assumed that the signal processing unit 58 (spectrum replacement unit 60) can acquire the spectrum of the main data 2 or the redundant data 3 whose decoding was completed in the preceding stage.
  • the spectrum replacement unit 60 determines whether or not the acquired data is the main data 2 (step 301).
  • if it is the main data 2, the main data 2 is stored in the spectrum buffer 61 (step 306).
  • if it is not, the spectrum replacement unit 60 acquires the corresponding redundant data 3.
  • the spectrum of the preprocessing result is acquired from the spectrum buffer 61 (step 302).
  • the processing result one frame before, that is, the main data 2 one frame before is acquired.
  • the spectrum of the preprocessing result is an MDCT spectrum generated by the MDCT executed by the transmission device 20.
  • the spectrum replacement unit 60 executes the waveform / spectrum synthesis processing (step 303). Specifically, it is a process of generating the spectrum X'dec [] of the composite waveform 11 from the processing result one frame before by using a predetermined waveform synthesis method.
  • X [] means an array using the index k corresponding to the frequency.
  • As the waveform synthesis method, a method in which a copy of the previous frame is used as the composite waveform is adopted here, so the spectrum of the preprocessing result stored in the spectrum buffer is used as it is as the spectrum X'dec [] of the composite waveform 11.
  • the spectrum substitution unit 60 executes a substitution region setting process for setting a substitution region for replacing the spectrum component (step 304).
  • the spectrum X out [] of the frame to be reproduced is prepared. This X out [] will be output as interpolated data.
  • the replacement region setting process is a process of calculating, for X out [], the indexes to which the spectrum X redun [] of the redundant data 3 is assigned and the indexes to which the spectrum X'dec [] of the composite waveform 11 is assigned.
  • the spectrum replacement unit 60 executes a spectrum replacement process for replacing the spectrum components (step 305). Specifically, each spectral component of X out [] is replaced with the spectral component of the redundant data 3 or the composite waveform 11 based on the index assigned in the replacement region setting process. As a result, X out [] becomes interpolated data obtained by interpolating the redundant data 3 with the composite waveform 11. The substitution region setting process and the spectrum substitution process will be described in detail later. The X out [] generated as the interpolated data is stored in the spectrum buffer 61 for processing the next frame (step 306).
  • the spectrum data (main data 2 or interpolated data) processed by the spectrum replacement unit 60 is input to the IMDCT unit 62, and the IMDCT process is executed (step 307).
  • the MDCT spectrum is converted into data representing a time waveform.
  • the time signal output unit 63 applies a synthesis window to the result of the IMDCT, performs overlap-add with the result of the IMDCT one frame before, and reconstructs the digital audio signal.
  • FIG. 16 is a flowchart showing an example of the replacement area setting process.
  • the process shown in FIG. 16 is an example of the internal process of step 304 in FIG.
  • FIG. 17 is a schematic diagram showing an example of a frequency range set by the replacement region setting process.
  • the variable indicating the index of each spectrum data is described as isp.
  • the array in which the indexes of the spectrum of the redundant data 3 are stored is described as redun_isp [].
  • the array in which the indexes of the spectrum to be replaced by the composite waveform 11 are stored is described as replace_isp [].
  • the replacement area setting process is a process of substituting appropriate indexes into redun_isp [] and replace_isp [].
  • First, it is determined whether or not the redundant data 3 corresponding to the frame ID to be reproduced exists (step 401).
  • when the redundant data 3 exists (YES in step 401), the process of storing indexes in redun_isp [] (step 402) is executed, and then the process of storing indexes in replace_isp [] (step 403) is executed.
  • FIG. 17A illustrates the index ranges (frequency ranges) added to redun_isp [] and replace_isp [], respectively.
  • the indexes included in the interpolation range 71, which is the frequency range other than the coded frequency range 70, are sequentially stored in replace_isp [].
  • that is, indexes other than those stored in redun_isp [] are stored in replace_isp []. For example, when the maximum index of the coded frequency range 70 is 100 and the total number of spectra is 256, the numbers from 101 to 255 are input to replace_isp []. For the indexes stored in replace_isp [], the spectrum X'dec [] of the composite waveform 11 is used.
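  • The index allocation above can be sketched as follows (our own illustration; the function name is an assumption, while the 100/256 example values come from the text):

```python
# Sketch of the replacement area setting: redun_isp[] gets the indexes of the
# coded frequency range, replace_isp[] gets every remaining index, which will
# be filled from the composite-waveform spectrum X'_dec[].

def build_index_arrays(max_coded_index, total_spectra):
    """Split [0, total_spectra) at the coded-range boundary."""
    redun_isp = list(range(max_coded_index + 1))
    replace_isp = list(range(max_coded_index + 1, total_spectra))
    return redun_isp, replace_isp
```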
  • when the redundant data 3 does not exist, redun_isp [] is processed assuming that there are no elements to be input (step 404). Therefore, redun_isp [] becomes empty.
  • In this case, a process of generating alternative data using only the spectrum of the preprocessing result stored in the spectrum buffer 61 is executed.
  • the tone component is a spectral component having a tone property.
  • a spectral component that reproduces a sound (tone) having a constant frequency is a tone component.
  • FIG. 17B illustrates the range (frequency range) of the indexes added to replace_isp [].
  • replace_isp [] is set to the frequency range excluding tone_isp [] (the shaded range in the figure).
  • For the indexes stored in replace_isp [], the spectrum X'dec [] of the composite waveform 11 is used.
  • the spectral component corresponding to tone_isp [] is not used as data for reproduction. This makes it possible to suppress a sense of discomfort in hearing caused by discontinuity of tone components.
  • when the tone component is not excluded, all of the indexes of the spectrum of the preprocessing result are stored in replace_isp []. That is, the spectrum X'dec [] of the composite waveform 11 is used as it is as the alternative data.
  • FIG. 18 is a flowchart showing an example of the spectrum replacement process.
  • the process shown in FIG. 18 is an example of the internal process in step 305 of FIG.
  • the spectrum of the redundant data 3 is described as X redun [].
  • the spectrum of the composite waveform 11 is described as X'dec [].
  • the spectrum data output as the result of the spectrum replacement processing is described as X out [].
  • X redun [], X'dec [], and X out [] are N spectra satisfying 0 ≤ k < N when an MDCT with an analysis length of 2N is used.
  • Let the variable indicating the index of the spectrum be isp. By scanning this isp, the loop processing shown in FIG. 18 is executed.
  • When isp is included in replace_isp [] (YES in step 504), the corresponding spectral component (X'dec [isp]) exists in the spectrum of the composite waveform 11, so that component is substituted into X out [isp] (step 505). When the state in which the redundant data 3 does not exist (that is, redun_isp [] is empty) continues for multiple consecutive frames, X'dec [isp] may be weighted by a factor less than 1 that changes according to the number of frames for which the redundant data 3 was absent, so that the audio fades out. If the current isp belongs to neither redun_isp [] nor replace_isp [] (NO in step 504), no substitution into X out [isp] is executed.
  • When step 503 or step 505 is completed, or when NO is determined in step 504, isp is incremented to scan the next index (step 506). It is then determined whether the incremented isp is smaller than the total number of MDCT spectra (step 507). If isp is smaller than the total number of spectra (YES in step 507), the processes from step 502 onward are executed again. In this way, the spectrum (interpolated data) combining the redundant data 3 and the composite waveform 11 is stored in X out [] until isp reaches (total number of spectra N - 1). When isp reaches the total number of spectra (NO in step 507), the spectrum replacement process is completed, and X out [] is output to the IMDCT unit 62.
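  • The replacement loop described above can be sketched as follows (our own illustration; the patent describes a per-isp flowchart, while this sketch uses Python lists and an assumed fade factor for the case where redundant data has been missing):

```python
# Sketch of the spectrum replacement loop (FIG. 18): fill X_out[] with
# redundant-data components where redun_isp lists them, composite-waveform
# components (optionally faded) where replace_isp lists them, and leave the
# remaining bins at zero.

def replace_spectrum(x_redun, x_dec, redun_isp, replace_isp, fade=1.0):
    """Build X_out[] by combining redundant data and the composite spectrum."""
    n = len(x_dec)
    x_out = [0.0] * n
    for isp in range(n):
        if isp in redun_isp:
            x_out[isp] = x_redun[isp]
        elif isp in replace_isp:
            # fade < 1 when redundant data has been absent for several frames
            x_out[isp] = fade * x_dec[isp]
    return x_out
```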
  • the waveform quality (noise spectrum 13) of the composite waveform 11 of the target frame 7 of the waveform data is predicted. Based on this waveform quality, one coded frequency range 70 to be assigned to the redundant data 3 is set in the frequency range of the waveform data included in the target frame 7. Then, the transmission data (packet 1) including the redundant data 3 generated based on the coded frequency range 70 is generated. This makes it possible to realize high-quality error concealment while suppressing the amount of data transmission.
  • the receiving device 50 receives the redundant data 3 assigned to one of the coded frequency ranges 70 of the frequency ranges of the waveform data included in the target frame 7.
  • the coded frequency range 70 is set using the waveform quality of the composite waveform 11 with respect to the target frame 7. Further, the interpolated data obtained by interpolating the received redundant data 3 with the composite waveform 11 is generated. This makes it possible to realize high-quality error concealment while suppressing the amount of data transmission.
  • a packet loss concealment method for generating interpolated data from a frame in the vicinity of a loss frame has been proposed.
  • a "hybrid concealment method” that generates interpolation data using a waveform synthesis method that differs for each frequency band (see Patent Document 1).
  • Waveform synthesis performed in the time domain is generally effective for voice, which is distributed in the range up to about 4 kHz to 5 kHz, but the amount of calculation is large, and in higher bands an extraneous harmonic structure may be generated, producing a beep-like sound.
  • Waveform synthesis performed in the frequency domain works effectively especially for high-frequency noise components and often requires little calculation, but click sounds due to phase discontinuity may occur for voice. Further, in a general sound source, voice is contained mainly in the mid-low range, and noise is often contained in the high range. Utilizing this, the "hybrid concealment method" reduces the amount of calculation and maintains the quality of the synthesized voice by performing waveform synthesis with a time-domain method for the low frequency band and a frequency-domain method for the high frequency band.
  • However, since this method is a combination of existing waveform synthesis methods, it may not give good results, for example, when the waveform is aperiodic and pitch-period detection is difficult, as in an ensemble of multiple musical instruments such as an orchestra, or when the power fluctuates. Further, a sound having a harmonic structure, such as that of a musical instrument, does not always lie in the mid-low range as voice does. Therefore, if the waveform synthesis methods are switched at only a single frequency, noise components associated with each concealment method may become audible. As described above, the quality may deteriorate depending on the type of sound source. In addition, since concealment for the mid-low range is performed in the time domain with a large amount of calculation, a certain calculation load remains.
  • Another possible method is to add redundant data of neighboring frames in advance to the data (main data) that is originally desired to be transmitted, and use it as interpolated data when a packet is lost.
  • When redundant data is used in this way, the amount of calculation on the receiving side caused by the concealment processing hardly increases and high quality can be achieved, but the amount of data to be transmitted increases significantly.
  • the frames in the vicinity of the main data to be originally transmitted are encoded only in a part of the frequency range (encoded frequency range 70) to generate the redundant data 3.
  • the coded frequency range 70 is set based on the waveform quality (noise spectrum 13, etc.) of the composite waveform used in the receiving device 50.
  • the interpolation range 71 other than the coded frequency range 70 is replaced with the frequency spectrum of the composite waveform 11 generated from a nearby frame that was normally received in the past. In this way, the interpolated data in which the redundant data 3 is interpolated using the composite waveform 11 is used as the data for reproduction.
  • the waveform quality of the composite waveform is predicted in advance, and the width of the coding frequency range 70 is set based on that quality. This makes it possible to set the width of the coded frequency range 70 appropriately for each frame. As a result, it becomes possible to generate redundant data 3 for an appropriate range in each frame. Further, since the width of the coded frequency range 70 is changed appropriately, the amount of redundant data 3 is reduced compared with the case of using, for example, a fixed-width frequency range, and the amount of transmitted data can be suppressed.
  • When the target data amount of the redundant data 3 is set, the quality of the entire interpolated data, including the quality of the redundant data 3, is predicted, and the coding frequency range 70 is set so that the quality of the interpolated data is maximized. This makes it possible to realize the highest-quality error concealment within the determined amount of data.
  • the optimum coding frequency range 70 can be set according to the characteristics of the waveform synthesis method by using the waveform quality of the composite waveform.
  • the coding frequency range 70 can be set so as to reduce the final amount of noise even when a simple waveform synthesis method such as copying the previous frame is used.
  • it is possible to significantly reduce the calculation load on the receiving device 50 side while maintaining the quality of the error concealment.
  • the coding frequency range 70 set mainly on the low frequency side has been described.
  • the lowest frequency of the coded frequency range 70 may be fixedly set to an arbitrary frequency according to a use case or the like.
  • the maximum frequency of the coded frequency range 70 is set according to the noise of the composite waveform. This makes it possible to realize high-quality error concealment according to the use case.
  • FIG. 19 is a schematic diagram showing an example of the coded frequency range 70 according to the second embodiment.
  • a plurality of coded frequency ranges 70 are set as the frequency ranges assigned to the redundant data 3. Further, the position, width, number, and the like of the coded frequency ranges 70 can be set freely. Therefore, in the present embodiment, the data of the target frame 7 is encoded by the transmission device 20 only in the coded frequency ranges 70, which are not necessarily limited to the low frequency range, according to the waveform quality of the composite waveform, and the redundant data 3 is generated.
  • two coding frequency ranges 70 are set. The spectral components included in the coded frequency range 70 are used as the redundant data 3.
  • the range other than the coded frequency range 70 is the interpolation range 71 that is interpolated using the composite waveform.
  • the configuration of the transmission / reception system according to the present embodiment is substantially the same as the configuration of the transmission / reception system 100 (transmission device 20 and reception device 50) described in the above embodiment, for example.
  • each configuration will be described using the same reference numerals as those of the transmission / reception system 100.
  • the processing content of the coding range setting process for setting the coding frequency range 70 and the configuration of the data transmitted as the packet 1 are different from the above-described embodiment. Specifically, in the coding range setting process, a process of setting a plurality of coding frequency ranges 70 is executed as shown in FIG. Further, meta information for specifying a plurality of coded frequency ranges 70 is added to the packet 1. In the transmission device 20, this meta information is encoded by the coding unit 25 as a part of the redundant data 3.
  • a plurality of coding frequency ranges 70 can be freely selected. Therefore, indexes at both ends are used to specify each coded frequency range 70. Specifically, information that specifies the lowest frequency index lsp and the highest frequency index hsp is generated as meta information.
  • the meta information increases, which may put pressure on the amount of data that can be used to transmit the main data 2 and the redundant data 3.
  • the maximum number of coded frequency ranges 70 that can be tolerated is set in advance. Further, in the coding range setting process, a plurality of candidate ranges that are candidates for the coding frequency range 70 are calculated. These candidate ranges are aggregated to fit in the maximum number. This point will be described in detail later with reference to FIGS. 24 and 25 and the like.
  • the coding range setting unit 33 calculates at least one excess range in which the noise spectrum 13 exceeds the second threshold value, and sets the coding frequency range 70 based on at least one excess range.
  • the noise spectrum 13 is a power spectrum P noise (k) of noise generated by using the composite waveform, and is calculated by, for example, the method described with reference to FIG.
  • the second threshold is set according to, for example, the power of noise allowed.
  • the excess range is a frequency range in which the spectral component (noise power) of the noise spectrum 13 exceeds the second threshold value. Therefore, the number of calculated excess ranges changes according to the state of the noise spectrum 13. That is, the number of excess ranges may vary from frame to frame. In this way, by using the second threshold value, it is possible to selectively detect the frequency range in which the noise power exceeds the permissible level. As a result, for example, a range with less noise can be excluded from the frequency range of the redundant data 3, and the amount of data of the redundant data 3 can be suppressed.
  • FIG. 20 is a schematic diagram showing a calculation example of the coded frequency range 70.
  • In FIG. 20, the noise spectrum 13 (P noise (k)) is shown together with the second threshold value, which is indicated by the dotted line.
  • the threshold curve 14 set for each frequency is used as the second threshold.
  • the threshold curve 14 is a threshold thresh (k) set for each frequency based on human auditory characteristics (for example, a loudness curve): a low threshold is set for frequencies that are easy for humans to hear, and a high threshold is set for frequencies that are hard for humans to hear.
  • the coding range setting unit 33 compares the noise spectrum 13 (P noise (k)) with the threshold curve 14 (thresh (k)) and calculates the frequency range in which the noise power exceeds thresh (k) (excess range 74).
  • Specifically, the set of indexes k satisfying the following relation is calculated: P noise (k) > thresh (k) (Equation 11).
  • As P noise (k), a moving average of the noise power (P noise-smoothed (k)) calculated using (Equation 4) above is typically used.
  • Alternatively, the noise spectrum calculated using (Equation 1) to (Equation 3) may be used.
  • the frequency range including a predetermined number of indexes (for example, three indexes) located before and after the excess range 74 is set as the coded frequency range 70.
  • the region set in the coding frequency range 70 is schematically illustrated as a shaded region.
  • the excess range 74 can be used as it is as the coding frequency range 70.
  • the coded frequency range 70 set by the method shown in FIG. 20 can be aggregated and adjusted by the subsequent processing. Therefore, it can be said that the frequency range set by using the excess range 74 and the indexes before and after the excess range 74 is the candidate range 75 which is a candidate for the coded frequency range 70. As described above, in the present embodiment, a plurality of candidate ranges 75 that are candidates for the coding frequency range 70 are calculated, and the coding frequency range 70 is set based on the plurality of candidate ranges 75.
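  • A minimal sketch of the candidate-range calculation (our own illustration; the function name, the margin handling, and the test values are assumptions) flags every index where the smoothed noise exceeds the threshold, widens each excess range by a few indexes, and merges overlaps:

```python
# Sketch of the excess-range / candidate-range detection (Equation 11):
# flag indexes where p_noise(k) > thresh(k), widen by `margin` indexes on
# both sides, then collect contiguous flagged runs as (start, end) ranges.

def candidate_ranges(p_noise, thresh, margin=3):
    """Return the list of candidate ranges as inclusive (start, end) pairs."""
    n = len(p_noise)
    flagged = [False] * n
    for k in range(n):
        if p_noise[k] > thresh[k]:
            for j in range(max(0, k - margin), min(n, k + margin + 1)):
                flagged[j] = True
    ranges, start = [], None
    for k, f in enumerate(flagged):
        if f and start is None:
            start = k               # a new candidate range opens here
        elif not f and start is not None:
            ranges.append((start, k - 1))
            start = None
    if start is not None:
        ranges.append((start, n - 1))
    return ranges
```

Because the number of flagged runs depends on the noise spectrum, the number of returned ranges naturally varies from frame to frame, as the text describes.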
  • FIG. 21 is a flowchart showing an example of the coding range setting process.
  • the process shown in FIG. 21 is a process executed by, for example, the coding range setting unit 33 in place of step 104 shown in FIG.
  • the candidate range 75 is set based on the noise spectrum 13 (P noise (k)) and the threshold curve 14 (thresh (k)) (step 601).
  • the candidate range 75 is set according to the method described with reference to FIG. That is, an index range (excess range 74) that satisfies the relationship shown in (Equation 11) is calculated, and a frequency range including a predetermined number of indexes before and after that range is set as the candidate range 75.
  • a process of adjusting the threshold curve 14 (thresh (k)) based on the spectrum X'(k) of the composite waveform 11 and readjusting the candidate range 75 is executed (step 602).
  • the threshold is adjusted in the same manner as calculating a masking threshold, with reference to the power of the spectrum X'(k) of the composite waveform 11.
  • In general, sounds in bands near a loud spectral component (masked bands) become difficult to hear; thresh (k) is adjusted based on the volume (masking threshold) at which such sounds become difficult to hear.
  • Specifically, spectral components that may be reproduced are detected from the power of X'(k) in the region other than the candidate range 75 once set.
  • For the bands masked by such components, thresh (k) is set high, since there is little auditory discomfort even if the noise power is high.
  • As a result, the width of the candidate range 75 is narrowed, and the amount of redundant data 3 can be suppressed.
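  • One way such a masking-based adjustment could be sketched (our own simplified illustration; the patent does not disclose this formula, and the gain and spread parameters are placeholders) is to raise thresh (k) near strong components of the composite spectrum:

```python
# Simplified masking sketch: a bin near a loud component of X'(k) tolerates
# more noise, so its threshold is raised to a fraction of that component's
# power. mask_gain and spread are illustrative parameters, not from the patent.

def adjust_threshold_with_masking(thresh, x_power, mask_gain=0.5, spread=1):
    """Raise thresh(k) to mask_gain * max power within +/- spread bins."""
    n = len(thresh)
    adjusted = list(thresh)
    for k in range(n):
        for j in range(max(0, k - spread), min(n, k + spread + 1)):
            masking = mask_gain * x_power[j]
            if masking > adjusted[k]:
                adjusted[k] = masking
    return adjusted
```

A real masking model would use a frequency-dependent spreading function; this sketch only shows how a raised threshold narrows the candidate range.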
  • the non-tone component exclusion process for excluding the non-tone component from the candidate range 75 is executed (step 603).
  • the non-tone component is a component other than the tone components included in the spectrum X (k) of the original waveform 4 or the spectrum X'(k) of the composite waveform 11. A non-tone component can be regarded as a simple noise component that disappears when the frame is switched, for example. Therefore, among the spectral components satisfying the relation of (Equation 11), the components corresponding to non-tone components have little effect on hearing even if they are replaced with, for example, the composite waveform 11.
  • the non-tone component exclusion process is a process of excluding such non-tone components from the candidate range 75 to narrow the candidate range 75. This point will be described in detail later.
• Next, a frequency range aggregation process for aggregating the candidate ranges 75 calculated in the steps up to this point is executed (step 604). This process reduces the number of candidate ranges 75. Finally, a frequency range adjustment process is performed to reduce the width of the remaining candidate ranges 75 (step 605). This process is particularly effective in use cases where a target data amount is set for the redundant data 3 and that amount is small, so that quantization noise becomes a problem.
  • the processes of steps 602, 603, and 605 may be appropriately executed according to, for example, the required noise level.
  • FIG. 22 is a schematic diagram for explaining the non-tone component exclusion process.
  • the tone component 15 is extracted from the original waveform 4 (original data 6 of the target frame 7). More specifically, the tone component 15 included in the spectrum X (k) of the original waveform 4 is extracted.
• For example, a method of comparing the spectrum with the characteristics of neighboring frames is used.
• In FIG. 22, the tone component 15 is schematically illustrated using circles.
• A set (array) of the indexes of the tone component 15 and its vicinity (for example, three spectra) is described as tone_isp[].
  • a component included in a range other than tone_isp [] is a non-tone component.
• A set of indexes indicating the candidate range 75 calculated in the preceding process (step 601 or step 602) is described as enc_isp[].
• In FIG. 22, the frequency ranges corresponding to tone_isp[] and enc_isp[] are schematically illustrated.
• In a frequency range at or above a certain frequency fc (for example, 2 kHz), even when P_noise(k) other than the tone component 15 is large, auditory discomfort is unlikely to occur. That is, noise derived from non-tone components is hard to hear above fc.
  • the value of the frequency fc is not limited, and can be arbitrarily set according to, for example, the required noise level.
  • the frequency fc corresponds to a predetermined threshold frequency.
• Accordingly, the range at or above the frequency fc is divided into tonal and noisy frequency ranges. Then, a process is executed that excludes the frequency ranges judged to be noisy from the coded frequency range 70 (candidate range 75), regardless of the magnitude of P_noise(k), and leaves only the frequency ranges judged to be tonal.
  • the candidate range 75 adjusted to exclude the non-tone component is schematically shown.
• Specifically, the intersection of tone_isp[] and enc_isp[] (tone_isp[] ∩ enc_isp[]) is calculated.
• The frequency range represented by this intersection is set as the new candidate range 75.
  • the width of enc_isp [] is reduced leaving the index of the tone component 15 and its vicinity. In this way, the width of the candidate range 75 is adjusted so that the tone component 15 is included on the high frequency side of the frequency fc.
  • the width of the candidate range 75 (enc_isp []) may be expanded so that the tone component 15 and the index in the vicinity thereof are completely included in the candidate range 75 on the high frequency side of the frequency fc. As a result, the tone component 15 can be reliably replaced with the redundant data 3.
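The intersection operation above can be sketched as follows (an illustrative sketch only, not the patent's implementation; the function name and the expression of fc as a spectrum index fc_index are assumptions):

```python
def exclude_non_tonal(enc_isp, tone_isp, fc_index):
    """Narrow the candidate range: at or above the threshold frequency only
    indexes in tone_isp[] (tone components and their vicinity) are kept;
    below it, enc_isp[] is left unchanged."""
    tone = set(tone_isp)
    low = [k for k in enc_isp if k < fc_index]
    high = [k for k in enc_isp if k >= fc_index and k in tone]
    return sorted(low + high)

enc_isp = list(range(10, 30))          # candidate range indexes
tone_isp = [12, 13, 14, 24, 25, 26]    # tone components and neighbours
print(exclude_non_tonal(enc_isp, tone_isp, fc_index=20))
# → [10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 24, 25, 26]
```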
  • FIG. 23 is a flowchart showing an example of the non-tone component exclusion process.
  • the process shown in FIG. 23 is an example of the internal process of step 603 of FIG.
  • data (X (k) and X'(k)) representing each spectrum of the original waveform 4 and the composite waveform 11 are acquired (step 701).
  • Each spectrum may be an FFT spectrum or an MDCT spectrum.
  • the composite waveform 11 is generated by copying the data one frame before the redundant data 3. That is, X'(k) is the spectrum X prev (k) of the previous frame.
  • a past spectrum or the like necessary for the tone component detection process described later may be acquired as necessary. In the following, it is assumed that the processing is performed using the two spectra of X (k) and X'(k).
• Next, the power of each spectrum is calculated (step 702).
  • the tone component detection process for detecting the tone component 15 is executed (step 703).
  • the tone component 15 is detected from each spectrum of X (k) and X'(k).
  • a spectral component having a strong tone property in each spectrum is calculated as the tone component 15 in consideration of the shape of the power spectrum, the temporal correlation between the spectra in the preceding and following frames, and the like.
  • the method for calculating the tone component 15 is not limited.
• Next, a set tone_isp[] of indexes including the tone component 15 is generated (step 704). Specifically, the indexes of the tone component 15 and its vicinity (for example, the indexes of three spectra before and after on the frequency axis) are acquired, and these indexes are stored in tone_isp[].
• Next, the already calculated candidate range 75 (enc_isp[]) is updated based on tone_isp[] (step 705). Specifically, as described with reference to FIG. 22, in the range above the frequency fc the intersection of tone_isp[] and enc_isp[] is set as the candidate range 75, and in the range below the frequency fc, enc_isp[] is left as the candidate range 75 as it is.
• As a result, the width of the candidate range 75 is reduced while the tone components 15 of X(k) and X'(k) are retained.
  • the tone component 15 of X (k) is a component that is lost when the composite waveform 11 is used.
• The tone component 15 of X'(k) is a component added when the composite waveform 11 is used. Therefore, by narrowing the width of the candidate range 75 so that each tone component 15 is included, it becomes possible to reduce the amount of the redundant data 3 while avoiding such loss or addition of tone components 15.
  • FIG. 24 is a schematic diagram for explaining the frequency range aggregation process.
  • five candidate ranges 75 (ranges 1 to 5) are generated as shown in the upper part of FIG. 24 by the above-mentioned non-tone component exclusion process.
• The candidate ranges 75 after the aggregation process are shown in the middle part of FIG. 24.
• First, the coding range setting unit 33 calculates a concatenation cost representing the amount of noise that changes when candidate ranges 75 adjacent to each other are concatenated. The candidate ranges are then concatenated based on the concatenation cost.
• The concatenation cost is, for example, an index indicating the increase or decrease in the amount of noise caused by concatenating the candidate ranges 75. For example, when candidate ranges 75 are concatenated, the total amount of quantization noise included in the interpolated data generated on the receiving side changes.
• The concatenation cost is set so as to increase when the amount of noise increases and to decrease when the amount of noise decreases, for example. The concatenation cost will be described in detail later.
• The first-stage process calculates the concatenation cost for the unencoded range between candidate ranges 75 and, when the cost is equal to or less than a certain threshold value, combines the frequency ranges at both ends into one. This process is always performed to prevent the frequency ranges from being needlessly fragmented.
  • the aggregation process from the upper stage to the middle stage shown in FIG. 24 is an example of the first stage process.
• The second-stage process is executed when the number N of candidate ranges 75 still exceeds the maximum number N_max after the first-stage process; it repeatedly concatenates, in order, the pair of candidate ranges 75 with the minimum concatenation cost. This process is executed until the number N of candidate ranges 75 falls within the maximum number N_max.
  • the aggregation process from the middle stage to the lower stage shown in FIG. 24 is an example of the second stage process.
  • the process of connecting the candidate ranges 75 is executed by, for example, the following method.
  • a number representing each candidate range 75 will be referred to as a range number i.
• Here, combining candidate ranges 75 means deleting the candidate ranges 75 of range numbers i and i+1 and generating a new candidate range 75 spanning from the "lowest frequency index lsp_i of the original range number i" to the "highest frequency index hsp_(i+1) of the original range number i+1".
• In FIG. 24, for example, the candidate ranges 75 of range numbers 2 and 3 are combined to form a new candidate range 75 (range 2 in the middle part) spanning lsp2 to hsp3.
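The combination operation just described can be sketched as follows (illustrative only; representing each candidate range as an (lsp, hsp) pair and using a 0-based index i are assumptions of this sketch):

```python
def concat_ranges(ranges, i):
    """Combine candidate ranges i and i+1 (0-based here) into one range
    spanning lsp_i .. hsp_(i+1); the two original ranges are deleted."""
    lsp_i, _ = ranges[i]
    _, hsp_next = ranges[i + 1]
    return ranges[:i] + [(lsp_i, hsp_next)] + ranges[i + 2:]

ranges = [(5, 9), (14, 20), (24, 30)]   # (lsp, hsp) pairs
print(concat_ranges(ranges, 1))          # → [(5, 9), (14, 30)]
```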
• The first method determines whether or not the quantization noise of the entire frame is reduced by combining candidate ranges 75, which reduces the meta information by one set.
• First, the sum P_NQ-sum of the quantization noise N_q(k) over the entire target frame 7, that is, the sum of the quantization noise N_q(k) over the N candidate ranges 75, is calculated according to the following formula. In (Equation 12), the range number of the candidate range 75 is denoted j.
• Similarly, P'_NQ-sum is the sum of the quantization noise N'_q(k) over the N-1 candidate ranges 75 that include the combined candidate range 75. Note that because the meta information is reduced, the number of bits available for allocation increases, so N'_q(k) is likely to be smaller than N_q(k).
  • P'NQ-sum is calculated according to the following formula.
  • lsp'_j and hsp'_j are indexes of the lowest and highest frequencies of the range number j changed by coupling.
  • ⁇ P all-noise is the amount of change in the total amount of quantization noise due to the combination of the j-th and j + 1-th candidate ranges 75. This ⁇ P all-noise is used as the connection cost.
• Here, the condition is that the total amount of quantization noise is reduced. That is, when ΔP_all-noise satisfies the following condition, the j-th and j+1-th candidate ranges 75 are combined. This makes it possible to reduce the number of candidate ranges 75 and the amount of meta information data without increasing the quantization noise.
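The formulas (Equation 12) to (Equation 14) do not survive in this extract; a hedged reconstruction consistent with the surrounding description (symbols as in the text, j running over candidate ranges) would be:

```latex
% Hedged reconstruction; not necessarily the patent's exact notation.
P_{NQ\text{-}sum}  = \sum_{j=1}^{N}   \;\sum_{k=lsp\_j}^{hsp\_j}   N_q(k)   % total quantization noise over N ranges
P'_{NQ\text{-}sum} = \sum_{j=1}^{N-1} \;\sum_{k=lsp'\_j}^{hsp'\_j} N'_q(k)  % after combining ranges j and j+1
\Delta P_{all\text{-}noise} = P'_{NQ\text{-}sum} - P_{NQ\text{-}sum} < 0    % combine when the total noise decreases
```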
• The second method determines whether or not the sum of the spectral power in the frequency range between candidate ranges 75 is equal to or less than a threshold value.
• Here, the sum of the spectral power between the candidate ranges 75 is used as the concatenation cost. For example, a frequency range in which the spectral power is small is expected to generate little noise even if it is replaced with the composite waveform 11.
• Specifically, the sum P_sum_inter_i of the spectral power over the frequency range (intermediate range) between the i-th and i+1-th candidate ranges 75 is calculated according to the following equation.
• When the P_sum_inter_i calculated in this way is equal to or less than a predetermined threshold value, the i-th and i+1-th candidate ranges 75 are combined.
  • the threshold value for determining P sum_inter_i may be a value set in relation to the amount of data of meta information or the like, or may be a predetermined fixed value.
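A minimal sketch of the second method (illustrative only; the function name, the spectrum as a plain list, and the example values are assumptions; the summation bounds follow the intermediate-range definition given later, (hsp_i + 1) to (lsp_(i+1) - 1)):

```python
def intermediate_power(x, hsp_i, lsp_next):
    """Sum of spectral power |X(k)|^2 over the intermediate range
    (hsp_i + 1) .. (lsp_next - 1) between two adjacent candidate ranges."""
    return sum(abs(x[k]) ** 2 for k in range(hsp_i + 1, lsp_next))

x = [0.0] * 40
for k in range(10, 15):
    x[k] = 1.0          # first candidate range
for k in range(15, 20):
    x[k] = 0.1          # quiet intermediate range
for k in range(20, 25):
    x[k] = 1.0          # second candidate range

p = intermediate_power(x, hsp_i=14, lsp_next=20)
print(p <= 0.1)          # small power, so the two ranges may be combined
```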
• The third method determines whether or not the interval between candidate ranges 75 (the difference in spectrum indexes) is equal to or less than a predetermined threshold value.
• Here, the interval between the candidate ranges 75 is used as the concatenation cost. For example, where the interval between candidate ranges 75 is small, the amount of noise generated is estimated to be small even if that portion is replaced with the composite waveform 11. Utilizing this, whether to combine the candidate ranges 75 is determined based on their interval.
  • This method does not require a process of calculating the total power of the spectra, and can be said to be a simplified version of the second method.
• Specifically, the total number of indexes between the i-th and i+1-th candidate ranges 75 is calculated according to the following formula.
• When this number is equal to or less than a predetermined threshold value, the i-th and i+1-th candidate ranges 75 are combined.
• The above three methods are used for the first-stage process (the aggregation from the upper part to the middle part of FIG. 24) for concatenating the candidate ranges 75. If the number N of candidate ranges 75 is still larger than N_max after these processes, the process of combining the pair of two candidate ranges 75 with the minimum cost into one frequency range is repeatedly executed. As a result, the candidate ranges 75 can be aggregated down to the specified number (N_max).
  • FIG. 25 is a flowchart showing an example of frequency range aggregation processing.
  • the process shown in FIG. 25 is an example of the internal process in step 604 of FIG.
• Here, the concatenation cost (P_sum_inter_i) described in the second method above is used.
  • the variable N representing the number of candidate ranges 75 and the variable i representing the range number are initialized (step 801).
  • the number of currently existing candidate ranges 75 (the number of candidate ranges 75 calculated in the processes up to the previous stage) is substituted into N. Further, 1 is assigned to the variable i for holding the range number for scanning each candidate range 75.
• In step 802, it is determined whether or not the range number i is N-1 or less. If i is N-1 or less (YES in step 802), the first stage of concatenating the candidate ranges 75 (steps 803 to 806) is executed.
• In the first-stage process, first, the indexes representing the frequency range (the i-th intermediate range) between the i-th and i+1-th candidate ranges 75 are acquired (step 803). Specifically, the index hsp_i of the highest frequency of the i-th candidate range 75 and the index lsp_(i+1) of the lowest frequency of the i+1-th candidate range 75 are read.
• The range from (hsp_i + 1) to (lsp_(i+1) - 1) is the i-th intermediate range.
• Next, the total power P_sum_inter_i of the spectrum of the original waveform 4 in the i-th intermediate range is calculated according to (Equation 16), and it is determined whether or not it is equal to or less than a predetermined threshold value (step 804). If P_sum_inter_i is equal to or less than the threshold value (YES in step 804), the i-th and i+1-th candidate ranges 75 are combined (step 805).
• If P_sum_inter_i is larger than the threshold value (NO in step 804), the candidate ranges 75 are not combined. Next, the range number i is incremented (step 806), and it is determined in step 802 whether or not the incremented range number i is N-1 or less. In this way, the aggregation process of comparing P_sum_inter_i with the threshold value is repeated until the range number i reaches N-1.
• When N is N_max or less (YES in step 808), that is, when the number of candidate ranges 75 has been sufficiently reduced by the first-stage aggregation process, the frequency range aggregation process ends.
• When N is larger than N_max (NO in step 808), the second-stage aggregation process is executed (step 809).
• In the second-stage aggregation process, the candidate ranges 75 on both sides of the intermediate range that minimizes P_sum_inter_i are combined.
• That is, the i-th candidate range 75 for which the sum of the spectral power in the i-th intermediate range is minimized is detected, and the i-th and i+1-th candidate ranges 75 are combined.
  • the range number N is updated (step 810). Specifically, as the number of ranges N is reduced by one in step 809, N-1 is substituted for the number of ranges N. Then, step 808 is executed again, and it is determined whether or not the updated range number N is N max or less. As a result, the candidate ranges 75 are aggregated in ascending order of connection cost until the range number N drops to the maximum allowable number N max.
  • FIG. 26 is a schematic diagram for explaining the frequency range adjustment process.
• First, the coding range setting unit 33 adjusts the width of each candidate range 75 based on the noise components at its highest and lowest frequencies. Specifically, among the indexes at both ends (highest and lowest frequencies) of each candidate range 75 calculated in the processes up to the previous stage, the index that minimizes P_noise(k) (hereinafter, k_noise-min) is detected, and the candidate range 75 is reduced by repeatedly excluding the index k_noise-min from it.
• Such processing is repeatedly executed while a predetermined condition is satisfied.
  • the amount of noise change ⁇ P all-noise in the entire target frame 7 generated by excluding k noise-min and narrowing the candidate range 75 is calculated.
• The amount of noise change ΔP_all-noise is expressed as the sum of (the decrease in quantization noise) and (the increase in noise due to narrowing of the frequency range). Specifically, ΔP_all-noise is calculated according to the following equation.
• When ΔP_all-noise has been calculated, it is determined whether or not the following equation is satisfied. As shown in (Equation 21), when ΔP_all-noise is negative, it means that the total amount of noise in the entire target frame 7 is reduced by excluding k_noise-min. In this case, a new candidate range 75 excluding k_noise-min is set. That is, lsp'_i and hsp'_i are set as the new lsp_i and hsp_i. Thereafter, the reduction of the candidate range 75 is repeatedly executed until (Equation 21) is no longer satisfied. This makes it possible to reduce the total amount of noise in the entire target frame 7. Therefore, even in a case where, for example, the data amount (target data amount) of the redundant data 3 is determined in advance and quantization noise becomes a problem because that amount is small, the total amount of noise in the entire target frame 7 can be sufficiently suppressed.
  • the total power P target (nbit) corresponding to the target data amount (nbit) of the redundant data 3 is calculated using a predetermined formula or table.
• Next, the sum P_redun-sum of the spectral power of the original waveform 4 in all the candidate ranges 75 is calculated according to the following equation. As shown in (Equation 22), the value of P_redun-sum becomes smaller by excluding, for example, k_noise-min.
• Next, it is determined whether or not P_redun-sum satisfies the following equation. When the relationship shown in (Equation 23) is not satisfied, in the same manner as described above, the index k_noise-min that minimizes P_noise(k) is detected from among the lsp_i and hsp_i, and a new candidate range 75 excluding k_noise-min is set. For the new candidate range 75, the total power P_redun-sum is recalculated. In other words, the spectral power of the original waveform 4 at k_noise-min is subtracted from the P_redun-sum calculated before excluding k_noise-min.
• The process from the detection of k_noise-min to the recalculation of the total power shown in (Equation 24) is repeatedly executed until the relationship of (Equation 23) is satisfied.
• A candidate range 75 that satisfies this relationship is set as the coded frequency range 70 that is finally assigned to the redundant data 3.
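The fit-to-budget loop can be sketched as follows (illustrative only; the budget p_target standing in for P_target(nbit), the function names, and the example values are assumptions, and the exclusion of the lowest-noise endpoint is written inline):

```python
def fit_to_target(ranges, x, p_noise, p_target):
    """Shrink (lsp, hsp) candidate ranges until the total spectral power
    they cover, P_redun_sum, is at or below the budget p_target."""
    def total_power(rs):
        return sum(abs(x[k]) ** 2 for lsp, hsp in rs for k in range(lsp, hsp + 1))

    while ranges and total_power(ranges) > p_target:
        # find k_noise_min: the range endpoint with the smallest P_noise(k)
        best = None
        for i, (lsp, hsp) in enumerate(ranges):
            for k in (lsp, hsp):
                if best is None or p_noise[k] < p_noise[best[1]]:
                    best = (i, k)
        i, k = best
        lsp, hsp = ranges[i]
        if lsp == hsp:                                   # range becomes empty
            ranges = ranges[:i] + ranges[i + 1:]
        elif k == lsp:
            ranges = ranges[:i] + [(lsp + 1, hsp)] + ranges[i + 1:]
        else:
            ranges = ranges[:i] + [(lsp, hsp - 1)] + ranges[i + 1:]
    return ranges

x = [1.0] * 8
p_noise = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
print(fit_to_target([(0, 3)], x, p_noise, p_target=2.0))  # → [(2, 3)]
```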
• In general, even if the spectral power is the same, the intensity perceived by humans may vary from frequency range to frequency range. Therefore, for example, P_noise(k), N_q(k), and the like may be weighted according to such auditory characteristics before the above determinations are made.
  • the candidate range 75 may be expanded based on the noise component.
• In general, frequency components with similarly high noise levels exist in the vicinity of a frequency with high P_noise(k). Therefore, for example, when P_noise(k) exceeds a certain level at either end of the candidate range 75, a process of expanding the candidate range 75 so that nearby noise components are also replaced with the redundant data 3 may be executed. This makes it possible to reduce noise caused by the composite waveform.
• The operation of the receiving device 50 according to the second embodiment is substantially the same as the operation described in the first embodiment, but differs from the above embodiment in that the meta information is received in packet 1 and in the content of the replacement region setting process.
  • a plurality of frequency ranges (encoded frequency range 70) based on meta information are set in the substitution region setting process.
• Of these, the spectrum of the redundant data 3 (X_redun[]) is used for the plurality of coded frequency ranges 70. Further, in X_out[], all interpolation ranges 71 other than the coded frequency ranges 70 are replaced with the spectrum (X'_dec[]) of the composite waveform 11 (see FIGS. 16 and 17). Therefore, the array redun_isp[], in which the spectrum indexes of the redundant data 3 are stored, stores the indexes of all lsp_i to hsp_i in the meta information. Further, the array reply_isp[], in which the spectrum indexes of the composite waveform 11 are stored, stores all indexes other than those stored in redun_isp[].
  • redun_isp [] stores indexes of 10, 11, 12, 13, 14, 15, 33, 34, 35, 36, 55, 56, 57, 58, 59, 60.
  • the spectrum replacement process shown in FIG. 18 can be applied as it is to generate appropriate spectrum data X out [] for reproduction.
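Building the two index arrays can be sketched as follows (illustrative only; the spectrum length n_spec=64 is an assumed value, and the complement array is named repl_isp here on the assumption that the translated text's "reply_isp[]" denotes the replacement-index array). The coded ranges below reproduce the example indexes given above:

```python
def build_index_arrays(coded_ranges, n_spec):
    """Build redun_isp[] (indexes taken from the redundant data) and the
    complementary array of indexes taken from the composite waveform."""
    redun_isp = [k for lsp, hsp in coded_ranges for k in range(lsp, hsp + 1)]
    taken = set(redun_isp)
    repl_isp = [k for k in range(n_spec) if k not in taken]
    return redun_isp, repl_isp

coded = [(10, 15), (33, 36), (55, 60)]
redun_isp, repl_isp = build_index_arrays(coded, n_spec=64)
print(redun_isp)
# → [10, 11, 12, 13, 14, 15, 33, 34, 35, 36, 55, 56, 57, 58, 59, 60]
```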
  • FIG. 27 is a schematic diagram showing an example of the coded frequency range 70 according to the third embodiment.
  • a plurality of coded frequency ranges 70 are set to arbitrary frequency ranges.
  • an arbitrary waveform synthesis method is individually set for the interpolation range 71 between the coding frequency ranges 70. That is, in the present embodiment, the coding frequency range 70 and the waveform synthesis method can be freely set.
  • the waveform synthesis methods that can be executed on the receiving side are tried on the transmitting side, and the noise (noise spectrum 13) generated by the synthesized waveform 11 generated by each method is calculated.
  • the coding frequency range 70 assigned to the redundant data 3 and the optimum waveform synthesis method to be set in the interpolation range 71 are set.
  • the synthesis method 1, the synthesis method 2, and the synthesis method 3 are selected in order from the low frequency side (left in the figure) for the three interpolation ranges 71. As a result, it is possible to realize high-quality error concealment with less redundant data 3.
• The configuration of the transmission/reception system according to the present embodiment is substantially the same as that of the transmission/reception system 100 (transmission device 20 and reception device 50) described in the above embodiments, but the configurations of the redundant data generation unit included in the transmission device 20 and of the signal processing unit included in the reception device 50 are different.
  • the same configuration as the transmission / reception system 100 will be described using the same reference numerals.
  • FIG. 28 is a block diagram showing a configuration example of the redundant data generation unit 324 according to the third embodiment.
  • the redundant data generation unit 324 includes an original data selection unit 330, a composite waveform generation unit 331a to 331c, a generated noise calculation unit 332, a coding range setting unit 333, and a coding spectrum selection unit 334.
• The redundant data generation unit 324 differs from the redundant data generation unit 24 described in the above embodiments mainly in that a plurality of composite waveform generation units 331 are provided. Along with this, a plurality of pieces of noise information (noise spectra 13) are output from the generated noise calculation unit 332.
  • the redundant data generation unit 324 is provided with three composite waveform generation units 331a to 331c corresponding to the three types of waveform synthesis methods.
• For example, when n types of waveform synthesis methods are used, the number of composite waveform generation units 331 is also n.
  • the original data selection unit 330 selects and acquires necessary data from the original data 6 stored in the input buffer 23 (see FIG. 7). Specifically, the original data 6 corresponding to the main data 2 (transmission frame 8) and the redundant data 3 (target frame 7) is read from the input buffer 23 in the previous stage. Further, the necessary original data 6 is delivered to the composite waveform generation units 331a to 331c.
  • the original data 6 is at least one of the spectrum data X (k) and the time waveform data x (n).
• The composite waveform generation units 331a to 331c each generate a composite waveform 11 according to the waveform synthesis method set for them, based on the original data 6 acquired from the original data selection unit 330, and hand over the data (composite data) of each composite waveform 11 to the generated noise calculation unit 332.
• As the waveform synthesis methods, it is desirable to combine a plurality of methods with a low calculation amount (for example, a method of copying the previous frame). This makes it possible to reduce the calculation load on the receiving side.
  • this technique can be applied regardless of the type of waveform synthesis method.
  • the generated noise calculation unit 332 acquires the original data 6 from the original data selection unit 330, and acquires the three composite waveforms 11 from the composite waveform generation units 331a to 331c. Further, the generated noise calculation unit 332 calculates noise information related to noise generated by using the composite waveform 11 for each waveform synthesis method, and delivers this to the coding range setting unit. Specifically, the noise spectrum 13 representing the waveform quality of each composite waveform 11 is calculated as noise information. In this way, the generated noise calculation unit 332 predicts the waveform quality of the composite waveform 11 for each of the plurality of waveform synthesis methods.
• The coding range setting unit 333 acquires the noise information (noise spectrum 13) calculated for each waveform synthesis method from the generated noise calculation unit 332, and sets at least one coded frequency range 70 assigned to the redundant data 3 based on each noise spectrum 13.
• Further, the coding range setting unit 333 sets one of the plurality of waveform synthesis methods for generating the composite waveform 11 for each interpolation range 71, which is a frequency range other than the coded frequency range 70. Specifically, the optimum waveform synthesis method is selected for each interpolation range 71 based on each noise spectrum 13.
  • the coding frequency range 70 and the waveform synthesis method assigned to the interpolation range 71 are determined based on the waveform quality (noise spectrum 13) predicted for each of the plurality of waveform synthesis methods.
• The coding range setting unit 333 generates meta information including information designating the coded frequency range 70 and information specifying the waveform synthesis method assigned to each interpolation range 71, and delivers the meta information to the coding spectrum selection unit 334.
  • the coded spectrum selection unit 334 extracts a spectrum component to be used as the redundant data 3 based on the coded frequency range 70 from the original data 6 (spectral data X (k)) corresponding to the redundant data 3.
  • the operation of the coded spectrum selection unit 334 is the same as that of the above embodiment, but the meta information about the waveform synthesis method is also handled as a part of the redundant data 3.
  • FIG. 29 is a flowchart showing an example of the generation process of the redundant data 3. This process is, for example, a loop process that is executed every time packet 1 is generated.
• As the waveform synthesis methods, it is assumed that methods generating the composite waveform 11 using one or both of the spectrum data and the time waveform data are used. Further, it is assumed that the data required for generating the composite waveform 11 are stored in advance in the input buffer 23 of the preceding stage.
  • the original data 6 (original data 6 included in the target frame 7) corresponding to the redundant data 3 is acquired from the input buffer 23 (step 901). Further, spectrum data is acquired from the input buffer 23 as the original data 6 required for waveform synthesis (step 902). In addition, time waveform data is acquired from the input buffer 23 as the original data 6 required for waveform synthesis (step 903).
  • the order of processing in steps 901 to 903 is not limited.
  • a composite waveform generation process for generating the composite waveform 11 based on the target waveform synthesis method is executed (step 904).
  • This composite waveform generation process corresponds to step 102 in FIG.
  • the generated noise prediction process for calculating the noise spectrum 13 related to the composite waveform is executed (step 905).
  • This generated noise prediction process is a process corresponding to step 103 of FIG.
  • the coding range setting process for setting the frequency range assigned to the redundant data 3 is executed based on the noise spectrum 13 (step 906).
  • This coding range setting process is a process corresponding to the coding range setting process shown in FIG.
  • At least one frequency range is set.
  • the frequency range set in step 906 is a candidate range 75 that is a candidate for the coded frequency range 70.
  • at least one candidate range 75 that is a candidate for the coding frequency range 70 is calculated for each of the plurality of synthesis methods.
• It is determined whether or not an unprocessed waveform synthesis method remains (step 907). If a waveform synthesis method remains (YES in step 907), the processing from step 904 onward is executed again using a waveform synthesis method that has not yet been processed. As a result, as many candidate ranges 75 (enc_isp_i[]) as there are planned waveform synthesis methods are generated, each corresponding to the i-th waveform synthesis method (see FIG. 30).
• When no waveform synthesis method remains (NO in step 907), the coding range synthesis process, which synthesizes the candidate ranges 75 corresponding to each waveform synthesis method and sets the coded frequency range 70, is executed (step 908).
  • the coding frequency range 70 is set, and an appropriate waveform synthesis method is assigned to the interpolation range 71 between the coding frequency ranges 70. That is, which composite waveform 11 spectrum is applied to each interpolation range 71 is set, and this setting result is generated as meta information.
  • a coded spectrum selection process for extracting only the spectral components corresponding to the coded frequency range 70 from the original data 6 of the target frame 7 is executed (step 909).
  • This coded spectrum selection process corresponds to step 105 in FIG.
  • the redundant data 3 before coding is generated using only the spectral components in the coded frequency range 70.
  • an index indicating the coding frequency range 70 and meta information indicating the waveform synthesis method set for each interpolation range 71 are added to the redundant data 3.
  • the coefficient calculated in the waveform synthesis process may be added as meta information. This makes it possible to reduce the calculation load on the receiving device 50.
• When the redundant data 3 has been extracted, it is determined whether or not original data 6 to be processed remains (step 910). If original data 6 to be processed remains (YES in step 910), the processing from step 901 onward is executed again for the remaining original data 6. When no original data 6 to be processed remains (NO in step 910), the coding process for encoding the redundant data 3 is executed (step 911).
  • the target data amount of the main data 2 is set (step 912). Specifically, the free space of the packet 1 is calculated from the data amount of the encoded redundant data 3.
  • the processes in steps 910 to 912 correspond to the processes in steps 106 to 108 shown in FIG. 9, for example.
  • FIG. 30 is a schematic diagram showing an example of the coding range synthesis process.
  • the outline of the coding range composition processing will be described below.
  • The candidate range 75 (enc_isp_i[]) corresponding to the i-th waveform synthesis method is read, as calculated by the composite waveform generation process, the generated noise prediction process, and the coding range setting process in the preceding stages.
  • a region showing a candidate range 75 calculated corresponding to the waveform synthesis methods 1 to 3 is shown.
  • the candidate range 75 (enc_isp_1 []) of the waveform synthesis method 1 includes one frequency range.
  • the candidate ranges 75 (enc_isp_2 [] and enc_isp_3 []) of the waveform synthesis methods 2 and 3 include two frequency ranges, respectively.
  • Of the indices in each enc_isp_i[], only the spectrum of the indices common to all waveform synthesis methods needs to be encoded as the redundant data 3; all other frequency ranges can be appropriately replaced with the spectrum of the composite waveform 11.
  • enc_isp[] = enc_isp_1[] ∩ enc_isp_2[] ∩ enc_isp_3[]  (Equation 26), where enc_isp[] represents the coded frequency range 70.
  • the frequency range represented by the intersection of the candidate ranges 75 calculated for each of the plurality of waveform synthesis methods is set to the coded frequency range 70.
  • the frequency range other than the coded frequency range 70 is the “uncoded frequency range”, and is the interpolation range 71 interpolated using the composite waveform 11. It can be said that the interpolation range 71 is a composite frequency range in which the composite waveform 11 is used.
  • two coded frequency ranges 70 represented by the intersection of each candidate range 75 are schematically illustrated using shaded areas. A region other than these coding frequency ranges 70 is the interpolation range 71.
  • the interpolation range 71 is uniquely determined as follows.
  • (lsp_j, hsp_j) = (0, 1), (56, 76), (81, N)
  • N is, for example, the largest index of the MDCT spectrum.
  • j is an index indicating the interpolation range 71.
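  • Because the interpolation ranges 71 are simply the maximal runs of spectrum indices not contained in enc_isp[], they can be derived mechanically. A minimal sketch (the function name is illustrative):

```python
def interpolation_ranges(enc_isp, n_max):
    """Return the interpolation ranges 71 as (lsp_j, hsp_j) pairs: the maximal
    runs of spectrum indices 0..n_max that are NOT in the coded set enc_isp."""
    coded = set(enc_isp)
    ranges, isp = [], 0
    while isp <= n_max:
        if isp not in coded:
            lsp = isp
            while isp <= n_max and isp not in coded:
                isp += 1
            ranges.append((lsp, isp - 1))
        else:
            isp += 1
    return ranges
```

With coded indices 2-55 and 77-80 and N = 100, this reproduces the (0, 1), (56, 76), (81, N) example from the text.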
  • For each waveform synthesis method i, the sum P_noise-sum(i, j) of the generated noise in the j-th interpolation range 71 is calculated according to the following formula: P_noise-sum(i, j) = Σ_{isp = lsp_j}^{hsp_j} P_noise(i, isp)  (Equation 27), where P_noise(i, isp) is the noise spectrum predicted for the i-th waveform synthesis method.
  • The waveform synthesis method i that minimizes P_noise-sum(i, j) in (Equation 27) is set as the optimum waveform synthesis method for the j-th interpolation range 71. That is, the waveform synthesis method having the smallest total sum of generated noise in the j-th interpolation range 71 is selected. In this way, the method that minimizes the integrated value of the noise spectrum 13 among the plurality of waveform synthesis methods is set for the interpolation range 71. Such processing is executed for all the interpolation ranges 71, and the optimum waveform synthesis method is set for each interpolation range 71.
  • the waveform synthesis method 3, the waveform synthesis method 2, and the waveform synthesis method 1 are set in order from the low frequency side for the three interpolation ranges 71.
  • the same waveform synthesis method may be set for all the interpolation ranges 71.
  • On the receiving side, the interpolation ranges 71 can also be specified from the coded frequency ranges 70. Therefore, only the assignment of a waveform synthesis method to each interpolation range 71 needs to be added to the meta information, in order from, for example, the low frequency band. That is, information such as an index specifying each interpolation range 71 is unnecessary.
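  • The per-range selection described above — summing each method's predicted noise spectrum over an interpolation range and taking the minimum, as in (Equation 27) — can be sketched as follows (names are illustrative):

```python
def assign_methods(noise_spectra, interp_ranges):
    """noise_spectra[i][isp]: predicted generated-noise power of waveform
    synthesis method i+1 at spectrum index isp. For each interpolation range j,
    pick the method whose noise sum over [lsp_j, hsp_j] is smallest."""
    assignment = []
    for lsp, hsp in interp_ranges:
        sums = [sum(spec[lsp:hsp + 1]) for spec in noise_spectra]
        assignment.append(sums.index(min(sums)) + 1)  # 1-based method index
    return assignment
```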
  • After the coded frequency range 70 is set, the frequency range adjustment process in step 605 of FIG. 21 may be executed.
  • FIG. 31 is a flowchart showing an example of the coding range synthesis process.
  • the process shown in FIG. 31 is an example of the coded range synthesis process described with reference to FIG. 29.
  • the sequence representing the index (encoding frequency range 70) of the spectrum to be finally encoded will be referred to as enc_isp [].
  • the variable representing the waveform synthesis method is described as i
  • the variable that specifies the interpolation range 71 is described as j.
  • the index enc_isp_i [] of the candidate range 75 calculated for each waveform synthesis method by the processing up to the previous stage is acquired (step 1001).
  • enc_isp [] is initialized with the index enc_isp_1 [] of the first candidate range 75 (step 1002).
  • the variable i representing the waveform synthesis method is initialized to 1 (step 1003).
  • The intersection of enc_isp[] and the index enc_isp_i[] of the candidate range 75 obtained for each waveform synthesis method is calculated, and enc_isp[] is updated (step 1004). Then, the variable i is incremented (step 1005). It is determined whether the incremented i is less than or equal to the number of waveform synthesis methods used, that is, whether the intersection of the candidate ranges 75 has been calculated for all waveform synthesis methods (step 1006). If a waveform synthesis method remains (YES in step 1006), the processing from step 1004 onward is executed again. When the intersection of the candidate ranges 75 has been calculated for all the waveform synthesis methods, enc_isp[] becomes an array representing the indices of the coded frequency range 70 shown in (Equation 26).
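  • Steps 1002 to 1006 amount to a running set intersection over the per-method candidate index arrays; a compact sketch (the function name is illustrative):

```python
def coded_frequency_range(candidate_ranges):
    """candidate_ranges: list of index arrays enc_isp_i[], one per waveform
    synthesis method. Returns enc_isp[]: their intersection, sorted."""
    enc_isp = set(candidate_ranges[0])   # step 1002: initialize with enc_isp_1[]
    for cand in candidate_ranges[1:]:    # steps 1004-1006: intersect the rest
        enc_isp &= set(cand)
    return sorted(enc_isp)
```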
  • When no waveform synthesis method remains (NO in step 1006), the interpolation ranges 71 are calculated based on enc_isp[] (step 1007), and the variable j representing the interpolation range 71 is initialized to 1 (step 1008).
  • the indexes representing the lowest frequency and the highest frequency of the j-th interpolation range 71 are referred to as (lsp_j, hsp_j).
  • The number of interpolation ranges 71 depends on the distribution of the coding frequency ranges 70, but is within ±1 of the number of coding frequency ranges 70.
  • an appropriate waveform synthesis method is sequentially set for the j-th interpolation range 71.
  • P_noise-sum(i, j) is calculated based on (Equation 27) for all the waveform synthesis methods, and the waveform synthesis method that minimizes the result is selected (step 1009).
  • The waveform synthesis method selected in the previous stage is set as the waveform synthesis method used for the j-th interpolation range 71 (step 1010). Then, the variable j is incremented (step 1011), and it is determined whether or not the variable j is less than or equal to the total number of interpolation ranges 71, that is, whether or not the waveform synthesis method has been set for all the interpolation ranges 71 (step 1012).
  • If an unprocessed interpolation range 71 remains (YES in step 1012), the processing from step 1009 onward is executed again.
  • When the waveform synthesis method has been set for all the interpolation ranges 71 (NO in step 1012), the coding range synthesis process is completed.
  • a method of generating the composite waveform 11 that minimizes the total amount of generated noise is set for each interpolation range 71.
  • the amount of noise in the restored waveform 5 restored on the receiving side is reduced, and even if the amount of redundant data 3 is small, it is possible to sufficiently reduce the sense of discomfort in hearing.
  • the packet 1 received by the receiving device 50 includes the main data 2 and the redundant data 3 to which meta information is added.
  • This meta information includes information on the waveform synthesis method set for each interpolation range 71, in addition to information for designating a plurality of coded frequency ranges 70. That is, it can be said that the meta information is information for designating the synthesis method of the composite waveform 11 for each interpolation range 71, which is a frequency range other than the coding frequency range 70.
  • the meta information is received by the reception buffer 52 and is appropriately referred to in the subsequent processing.
  • the meta information corresponds to the designated information.
  • the configuration of the signal processing unit will be mainly described as the configuration on the receiving device 50 side.
  • FIG. 32 is a block diagram showing a configuration example of the signal processing unit according to the third embodiment.
  • the signal processing unit 358 has a spectrum replacement unit 360, a spectrum buffer 361, an IMDCT unit 362, and a time signal output unit 363. Further, the signal processing unit 358 has a plurality of composite waveform generation units 364a to 364c, an MDCT unit 365, and a time waveform buffer 366.
  • The spectrum replacement unit 360 acquires spectrum data as the decoded redundant data 3 from the preceding stage. It also acquires the spectrum data of the composite waveforms 11 generated by the composite waveform generation units 364a and 364b, and the spectrum data (MDCT spectrum) of the composite waveform 11 generated by the composite waveform generation unit 364c and converted by the MDCT unit 365. Further, the spectrum replacement unit 360 generates, based on the above-mentioned meta information, interpolated data in which parts of the decoded redundant data 3 (the interpolation ranges 71) are replaced with the spectrum data of the respective composite waveforms 11. This interpolated data is passed to the spectrum buffer 361 and the IMDCT unit 362.
  • the IMDCT unit 362 acquires the spectrum of the interpolated data from the spectrum replacement unit 360, performs IMDCT on the spectrum, and converts the interpolated data into time domain data.
  • the result of this IMDCT is passed to the time signal output unit 363.
  • the time signal output unit 363 acquires the result of the IMDCT from the IMDCT unit 362.
  • A synthesis window is applied to this, and the result of the previous IMDCT is overlap-added to reconstruct the audio signal (time waveform data), which is delivered to the time waveform buffer 366.
  • the time waveform data stored in the time waveform buffer 366 is used for audio reproduction at a timing required in the subsequent stage.
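  • The windowed overlap-add performed by the time signal output unit 363 can be sketched for one frame step as follows — a simplified model; the actual window shape and frame handling follow the codec's MDCT conventions:

```python
def overlap_add(prev_tail, cur_frame, window):
    """One step of windowed overlap-add (sketch): window the IMDCT result,
    add the saved second half of the previous frame to the first half of the
    current one, and keep the new second half as the tail for the next step."""
    n = len(cur_frame) // 2
    w = [c * win for c, win in zip(cur_frame, window)]
    out = [p + c for p, c in zip(prev_tail, w[:n])]  # reconstructed samples
    return out, w[n:]                                # (output, tail for next step)
```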
  • the composite waveform generation units 364a and 364b appropriately acquire spectrum data of past frames (one frame before or two frames before) from the spectrum buffer 361, for example. Then, the spectrum data of the composite waveform 11 is generated based on each spectrum data and passed to the spectrum replacement unit 360.
  • the composite waveform generation unit 364c acquires the time waveform data of the past reproduced waveform from the time waveform buffer 366. Then, the time waveform data of the composite waveform 11 is generated based on the time waveform data, and is delivered to the MDCT unit 365.
  • the MDCT unit 365 acquires the time waveform data of the composite waveform 11 generated by the composite waveform generation unit 364c, and performs MDCT on this to create an MDCT spectrum (spectrum data) of the composite waveform 11.
  • the spectrum data of the composite waveform 11 is passed to the spectrum replacement unit 360.
  • the waveform synthesis method used by the composite waveform generation units 364a, 364b, and 364c is the waveform synthesis method assigned to the interpolation range 71 on the transmitting side.
  • each configuration of the signal processing unit 358 may be appropriately set according to the type of waveform synthesis method and the like.
  • For example, the composite waveform generation units 364a and 364b perform waveform synthesis methods in the frequency domain (for example, copying a spectrum from a previous frame). The composite waveform generation unit 364c performs a waveform synthesis method using a time waveform (for example, a method of extrapolating a waveform using linear predictive coding (LPC)). Regardless of which waveform synthesis method is used, the composite waveform 11 is input to the spectrum replacement unit 360 after being converted, if necessary, into a spectrum in the frequency domain (MDCT domain).
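  • As an illustration of the simplest frequency-domain method mentioned above (copying a spectrum from a previous frame), a hedged sketch — the attenuation factor and function name are assumptions for illustration, not values from the document:

```python
def synthesize_spectrum_copy(spectrum_buffer, attenuation=0.9):
    """Frequency-domain waveform synthesis (sketch): reuse the MDCT spectrum
    of the most recent past frame in the buffer, slightly attenuated."""
    prev_spectrum = spectrum_buffer[-1]  # spectrum of the frame one before
    return [attenuation * x for x in prev_spectrum]
```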
  • FIG. 33 is a flowchart showing an example of the operation of the signal processing unit 358.
  • This process is a loop process that is continuously executed on a frame-by-frame basis.
  • It is assumed that the signal processing unit 358 (spectrum replacement unit 360) can acquire the spectrum of the main data 2 or the redundant data 3, for which decoding has been completed, from the processing in the preceding stage.
  • First, the spectrum replacement unit 360 determines whether or not the acquired spectrum is the main data 2 (step 1101).
  • If it is the main data 2, the main data 2 is stored in the spectrum buffer 361 (step 1109).
  • If it is the redundant data 3, the composite waveform generation unit 364c acquires the time waveform data required to generate the composite waveform 11 from the time waveform buffer 366 (step 1102). Subsequently, the composite waveform generation units 364a and 364b acquire the spectrum data necessary for generating the composite waveforms 11 from the spectrum buffer 361.
  • any of the composite waveform generation units 364a to 364c executes the waveform / spectrum synthesis process, and the composite waveform 11 is generated according to the waveform synthesis method set in each unit (step 1104).
  • When the composite waveform 11 has been generated, it is determined whether or not any waveform synthesis method has not yet been executed (step 1105). If an unprocessed waveform synthesis method remains (YES in step 1105), the process returns to step 1104 and the composite waveform 11 is generated according to the next waveform synthesis method.
  • The time waveform data of the composite waveform 11 that requires MDCT processing (here, the data generated by the composite waveform generation unit 364c) is converted into a spectrum by the MDCT unit 365.
  • The spectrum replacement unit 360 then executes a replacement region setting process for setting the regions in which spectrum components are replaced (step 1107). Specifically, indices for allocating the spectrum components of the composite waveforms 1 to 3 and indices for allocating the spectrum components of the redundant data are set for the spectrum X_out[] of the frame to be reproduced.
  • The basic processing of the replacement region setting process is substantially the same as the processing of step 304 shown in FIG. 15, except that as many arrays reply_isp[] for storing the indices to be replaced are prepared as there are waveform synthesis methods.
  • reply_isp_1 [], reply_isp_2 [], and reply_isp_3 [] are prepared.
  • an array of indexes redund_isp [] to which the redundant data 3 is assigned is configured.
  • an index that specifies an interpolation range 71 other than each coded frequency range 70 is calculated.
  • an array of interpolation ranges 71 corresponding to the composite waveforms 1 to 3 is configured.
  • For example, the indices included in the interpolation range 71 assigned to composite waveform 1 are stored in reply_isp_1[].
  • the spectrum replacement process is executed by the spectrum replacement unit 360 using the processing result of the replacement region setting process.
  • That is, the spectral components of the redundant data 3 and the composite waveforms 1 to 3 are substituted into X_out[] according to the indices specified by redund_isp[] and each reply_isp_i[].
  • In other words, the spectrum replacement unit 360 interpolates the redundant data 3, for each interpolation range 71, using the composite waveform 11 generated by the synthesis method specified by the meta information, and generates the interpolated data X_out[].
  • the basic processing of the spectrum replacement processing is substantially the same as the processing of step 305 shown in FIG. 15, but the number of sequences used is different.
  • For example, the reply_isp[] referenced in step 504 in the detailed processing flow of step 305 is extended to reply_isp_1[] to reply_isp_3[].
  • The value assigned to X_out[isp] is one of X'_1dec[isp], X'_2dec[isp], and X'_3dec[isp], corresponding to composite waveforms 1 to 3.
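  • The replacement just described — the redundant-data spectrum at the coded indices, and each composite waveform's spectrum at its assigned interpolation indices — can be sketched as follows (array names follow the text; the function name is illustrative):

```python
def spectrum_replace(n_spec, redund_spec, redund_isp, synth_specs, reply_isps):
    """Fill X_out[] with the redundant-data spectrum at the indices in
    redund_isp[], and with the spectrum of composite waveform i at the
    indices in reply_isp_i[]."""
    x_out = [0.0] * n_spec
    for isp in redund_isp:
        x_out[isp] = redund_spec[isp]
    for synth_spec, reply_isp in zip(synth_specs, reply_isps):
        for isp in reply_isp:
            x_out[isp] = synth_spec[isp]
    return x_out
```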
  • The X_out[] generated as the interpolated data is stored in the spectrum buffer for the next and subsequent processing (step 1109).
  • IMDCT processing is executed on the spectrum data X_out[] (main data 2 or interpolated data) processed by the spectrum replacement unit 360 (step 1110).
  • the time signal output unit 363 reconstructs the audio signal from the result of IMDCT (step 1111).
  • the processing of steps 1109 to 1111 is the same as the processing of steps 306 to 308 shown in FIG.
  • the time signal output unit 363 stores the generated audio signal in the time waveform buffer 366 for the next and subsequent processing.
  • As described above, in this embodiment, the redundant data 3 is interpolated using a plurality of composite waveforms generated by different waveform synthesis methods.
  • the allocation of these synthesis methods is preset on the transmission device 20 side so as to reduce the amount of noise in the interpolation result. As a result, even if the packet 1 is lost or the like, it is possible to realize high-quality error concealment with less noise.
  • the transmission / reception system that mainly performs wireless communication has been described.
  • the present technology is not limited to this, and can be applied to, for example, a system for transmitting and receiving waveform data by wired communication.
  • this technique may be used as a PLC method when playing music by using network streaming or the like.
  • the present technology can also adopt the following configurations.
  • a quality prediction unit that predicts the waveform quality of the restored waveform related to the target frame of the waveform data
  • a range setting unit that sets at least one target range as a frequency range assigned to redundant data for generating the restored waveform from the waveform data included in the target frame based on the waveform quality.
  • a transmission device including a data generation unit that generates the redundant data based on the target range and generates transmission data including the redundant data.
  • (2) The transmitter according to (1), wherein the restored waveform is a waveform generated based on the redundant data and the composite waveform related to the target frame.
  • the quality prediction unit is a transmission device that predicts the waveform quality of the composite waveform as the waveform quality of the restored waveform.
  • the target frame is a frame in the vicinity of the transmission frame transmitted as the transmission data.
  • the quality prediction unit is a transmission device that generates the composite waveform of the target frame based on the waveform data included in the transmission frame.
  • the quality prediction unit calculates a noise spectrum representing the waveform quality of the composite waveform based on the composite waveform and the original waveform represented by the waveform data included in the target frame.
  • the range setting unit is a transmission device that sets the target range based on the noise spectrum.
  • (5) The transmitter according to (4).
  • the range setting unit is a transmission device that calculates, based on the noise spectrum and the quantization noise accompanying the coding of the redundant data, the total noise amount of the interpolated data obtained by interpolating the redundant data with the composite waveform, and sets the target range so that the total noise amount is minimized.
  • (6) The transmitter according to (4) or (5).
  • the noise spectrum is one of a spectrum obtained by frequency-converting the difference between the original waveform and the composite waveform, or a spectrum representing the difference between the spectrum of the original waveform and the spectrum of the composite waveform.
  • (7) The transmitter according to any one of (4) to (6).
  • the range setting unit is a transmission device that sets an integration range for calculating an integrated value of the noise spectrum, and sets the target range based on the minimum integrated range in which the integrated value exceeds a first threshold value.
  • (8) The transmitter according to (7).
  • the range setting unit is a transmission device that sets the minimum frequency of the integration range to the minimum frequency of the noise spectrum, changes the maximum frequency of the integration range, and calculates the integrated value.
  • (9) The transmitter according to any one of (4) to (6).
  • the range setting unit is a transmission device that calculates at least one excess range in which the noise spectrum exceeds a second threshold value set for each frequency, and sets the target range based on the at least one excess range.
  • (10) The transmitter according to any one of (1) to (9).
  • the range setting unit is a transmission device that calculates a plurality of candidate ranges that are candidates for the target range and sets the target range based on the plurality of candidate ranges.
  • the range setting unit is a transmission device that calculates a connection cost representing the amount of noise that changes by connecting the candidate ranges adjacent to each other, and connects the candidate ranges based on the connection cost.
  • the restored waveform is a waveform generated based on the redundant data and the composite waveform related to the target frame.
  • the range setting unit is a transmission device that extracts the tonal frequency component included in the waveform data of the target frame, and adjusts the width of the candidate range so that the tonal frequency component is included on the high frequency side of a predetermined threshold frequency.
  • (13) The transmitter according to any one of (10) to (12).
  • the range setting unit is a transmission device that adjusts the width of the candidate range based on noise components at the highest and lowest frequencies of the candidate range.
  • the restored waveform is a waveform generated based on the redundant data and the composite waveform related to the target frame.
  • the range setting unit is a transmission device that sets one of a plurality of synthesis methods for generating the composite waveform in a non-target range that is a frequency range other than the target range.
  • the quality prediction unit predicts the waveform quality of the composite waveform for each of the plurality of synthesis methods.
  • the range setting unit is a transmission device that sets the target range and the synthesis method assigned to the non-target range based on the waveform quality predicted for each of the plurality of synthesis methods.
  • (16) The transmitter according to (15).
  • the range setting unit is a transmission device that calculates at least one candidate range that is a candidate for the target range for each of the plurality of synthesis methods, sets the frequency range represented by the intersection of the candidate ranges calculated for each of the plurality of synthesis methods as the target range, and sets, in the non-target range, the method that minimizes the integrated value of the noise spectrum among the plurality of synthesis methods.
  • At least one target range is set as a frequency range to be assigned to the redundant data for generating the restored waveform from the waveform data included in the target frame.
  • a receiving unit that receives redundant data assigned to at least one target range of the frequency range of the waveform data included in the target frame based on the waveform quality of the restored waveform related to the target frame of the waveform data.
  • the restored waveform is a waveform generated based on the redundant data and the composite waveform related to the target frame.
  • the receiving unit receives designated information that specifies a method for synthesizing the synthesized waveform for each non-target range that is a frequency range other than the target range.
  • the waveform restoration unit is a receiving device that interpolates the redundant data, for each non-target range, using the composite waveform generated by the synthesis method specified by the designated information.
  • (20) A receiving method performed by a computer system, comprising: receiving, based on the waveform quality of the restored waveform related to the target frame of the waveform data, the redundant data assigned to at least one target range of the frequency range of the waveform data included in the target frame; and generating the restored waveform based on the redundant data.
  • a step of predicting the waveform quality of the restored waveform with respect to the target frame of the waveform data, and a step of setting, based on the waveform quality, at least one target range as a frequency range to be assigned to redundant data for generating the restored waveform from the waveform data included in the target frame.
  • (22) A step of receiving redundant data assigned to at least one target range of the frequency range of the waveform data included in the target frame based on the waveform quality of the restored waveform related to the target frame of the waveform data.

Abstract

A transmission device according to one embodiment of the present technology is provided with a quality prediction unit, a range setting unit, and a data generation unit. The quality prediction unit predicts waveform quality of a restoration waveform pertaining to a target frame in waveform data. On the basis of the waveform quality, the range setting unit sets at least one target range as a frequency range that is to be allotted to redundant data in order to generate the restoration waveform from the waveform data included in the target frame. The data generation unit generates the redundant data on the basis of the target range, and further generates transmission data that contains said redundant data.

Description

Transmission device, transmission method, reception device, and reception method
The present technology relates to a transmission device, a transmission method, a reception device, and a reception method applicable to data communication.
Conventionally, techniques for transmitting waveform data such as audio have been developed. For example, the original waveform is restored by combining, on the receiving side, waveform data divided on the transmitting side. There is also a known technique for performing error concealment to compensate for a lost portion when part of the waveform data is lost.
For example, Patent Document 1 describes an audio decoder including an error concealment unit. In this error concealment unit, when a frame loss or the like occurs, an audio information component representing the high frequency side of the lost portion is synthesized by concealment processing in the frequency domain, and an audio information component representing the low frequency side of the lost portion is synthesized by concealment processing in the time domain. By combining these components, error concealment that avoids the click and beep sounds associated with the synthesis processing is possible (paragraphs [0016], [0017], [0094], and [0095] of the specification of Patent Document 1, Figs. 1 and 2, etc.).
International Publication No. WO 2017/153006
In recent years, devices that transmit and reproduce waveform data such as audio have become widespread, and there is a demand for technology that can realize high-quality error concealment while suppressing the amount of data transmitted.
In view of the above circumstances, an object of the present technology is to provide a transmission device, a transmission method, a reception device, and a reception method capable of realizing high-quality error concealment while suppressing the amount of data transmitted.
In order to achieve the above object, the transmission device according to one embodiment of the present technology includes a quality prediction unit, a range setting unit, and a data generation unit.
The quality prediction unit predicts the waveform quality of the restored waveform with respect to the target frame of the waveform data.
Based on the waveform quality, the range setting unit sets at least one target range as a frequency range assigned to redundant data for generating the restored waveform from the waveform data included in the target frame.
The data generation unit generates the redundant data based on the target range, and generates transmission data including the redundant data.
In this transmission device, the waveform quality of the restored waveform is predicted for the target frame of the waveform data. Based on this waveform quality, at least one target range, which is a frequency range allocated to the redundant data for generating the restored waveform, is set. Then, transmission data including the redundant data generated based on the target range is generated. This makes it possible to realize high-quality error concealment while suppressing the amount of data transmitted.
The restored waveform may be a waveform generated based on the redundant data and the composite waveform related to the target frame. In this case, the quality prediction unit may predict the waveform quality of the composite waveform as the waveform quality of the restored waveform.
The target frame may be a frame in the vicinity of the transmission frame transmitted as the transmission data. In this case, the quality prediction unit may generate the composite waveform for the target frame based on the waveform data included in the transmission frame.
The quality prediction unit may calculate a noise spectrum representing the waveform quality of the composite waveform based on the composite waveform and the original waveform represented by the waveform data included in the target frame. In this case, the range setting unit may set the target range based on the noise spectrum.
The range setting unit may calculate, based on the noise spectrum and the quantization noise accompanying the coding of the redundant data, the total noise amount of the interpolated data obtained by interpolating the redundant data with the composite waveform, and may set the target range so that the total noise amount is minimized.
The noise spectrum may be either a spectrum obtained by frequency-converting the difference between the original waveform and the composite waveform, or a spectrum representing the difference between the spectrum of the original waveform and the spectrum of the composite waveform.
The range setting unit may set an integration range for calculating the integrated value of the noise spectrum, and may set the target range based on the minimum integration range in which the integrated value exceeds a first threshold value.
 前記範囲設定部は、前記積算範囲の最低周波数を前記ノイズスペクトルの最低周波数に設定し、前記積算範囲の最高周波数を変化させて前記積算値を算出してもよい。 The range setting unit may set the minimum frequency of the integration range to the minimum frequency of the noise spectrum and change the maximum frequency of the integration range to calculate the integration value.
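The search described above can be sketched as follows (hypothetical names; the patent does not prescribe a data layout): the lower edge of the integration range is fixed at the lowest frequency bin, and the upper edge grows until the integrated noise first exceeds the first threshold.

```python
def smallest_integration_range(noise_spectrum, threshold):
    # Grow the upper edge of the integration range, starting from the
    # lowest frequency bin, until the integrated noise exceeds the
    # threshold. Returns (k_min, k_max) as inclusive bin indices, or
    # None if the threshold is never exceeded.
    total = 0.0
    for k, value in enumerate(noise_spectrum):
        total += value
        if total > threshold:
            return (0, k)
    return None
```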
The range setting unit may calculate at least one excess range in which the noise spectrum exceeds a second threshold set for each frequency, and may set the target range based on the at least one excess range.
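Finding the excess ranges amounts to finding contiguous runs of bins where the noise spectrum exceeds a per-frequency threshold. A minimal sketch (hypothetical interface, bin ranges expressed as inclusive pairs):

```python
def excess_ranges(noise_spectrum, threshold_per_bin):
    # Return contiguous bin ranges [(start, end), ...] (end inclusive)
    # where the noise spectrum exceeds the frequency-dependent threshold.
    ranges, start = [], None
    for k, (n, t) in enumerate(zip(noise_spectrum, threshold_per_bin)):
        if n > t and start is None:
            start = k                      # run begins
        elif n <= t and start is not None:
            ranges.append((start, k - 1))  # run ends
            start = None
    if start is not None:
        ranges.append((start, len(noise_spectrum) - 1))
    return ranges
```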
The range setting unit may calculate a plurality of candidate ranges as candidates for the target range, and may set the target range based on the plurality of candidate ranges.
The range setting unit may calculate a connection cost representing the change in noise amount caused by connecting mutually adjacent candidate ranges, and may connect the candidate ranges based on the connection cost.
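One plausible reading of the connection cost is sketched below; the specific cost model is an assumption, not given in this passage. Encoding the gap between two candidate ranges removes the gap's interpolation noise but adds quantization noise, and the ranges are connected when the net change is not positive.

```python
def merge_candidates(ranges, noise_spectrum, quant_noise_per_bin):
    # Greedily connect adjacent candidate ranges (inclusive bin pairs,
    # sorted by frequency). The connection cost for a gap is the
    # quantization noise added minus the interpolation noise removed;
    # connect when the cost is zero or negative. Hypothetical model.
    merged = [ranges[0]]
    for start, end in ranges[1:]:
        prev_start, prev_end = merged[-1]
        gap = range(prev_end + 1, start)
        cost = sum(quant_noise_per_bin - noise_spectrum[k] for k in gap)
        if cost <= 0:
            merged[-1] = (prev_start, end)
        else:
            merged.append((start, end))
    return merged
```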
The restored waveform may be a waveform generated based on the redundant data and the composite waveform for the target frame. In this case, the range setting unit may extract tonal frequency components contained in the waveform data of the target frame, and may adjust the width of each candidate range so that, on the high-frequency side of a predetermined threshold frequency, the tonal frequency components are included.
The range setting unit may adjust the width of each candidate range based on the noise components at the highest frequency and the lowest frequency of the candidate range.
The restored waveform may be a waveform generated based on the redundant data and the composite waveform for the target frame. In this case, the range setting unit may assign one of a plurality of synthesis methods for generating the composite waveform to each non-target range, that is, each frequency range other than the target range.
The quality prediction unit may predict the waveform quality of the composite waveform for each of the plurality of synthesis methods. In this case, the range setting unit may set the target range and the synthesis method assigned to each non-target range based on the waveform quality predicted for each of the plurality of synthesis methods.
The range setting unit may calculate, for each of the plurality of synthesis methods, at least one candidate range as a candidate for the target range, may set the frequency range represented by the intersection of the candidate ranges calculated for the respective synthesis methods as the target range, and may assign to each non-target range the synthesis method, among the plurality of synthesis methods, that minimizes the integrated value of the noise spectrum.
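The intersection step above can be sketched as a set intersection over frequency bins: a bin belongs to the target range only if every synthesis method flags it as a candidate, i.e. no available synthesis method can conceal it acceptably. This is a minimal sketch with hypothetical names.

```python
def target_range_from_methods(candidate_ranges_per_method, num_bins):
    # candidate_ranges_per_method: one list of inclusive (lo, hi) bin
    # pairs per synthesis method. Returns the sorted bins common to all
    # methods' candidate ranges; these bins must be carried by the
    # redundant data regardless of which synthesis method is chosen.
    common = set(range(num_bins))
    for ranges in candidate_ranges_per_method:
        bins = set()
        for lo, hi in ranges:
            bins.update(range(lo, hi + 1))
        common &= bins
    return sorted(common)
```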
A transmission method according to an embodiment of the present technology is a transmission method executed by a computer system, and includes predicting the waveform quality of a restored waveform for a target frame of waveform data.
Based on the waveform quality, at least one target range is set as a frequency range to be assigned to redundant data for generating the restored waveform from the waveform data included in the target frame.
The redundant data is generated based on the target range, and transmission data including the redundant data is generated.
A receiving device according to an embodiment of the present technology includes a receiving unit and a waveform restoration unit.
The receiving unit receives redundant data assigned, based on the waveform quality of a restored waveform for a target frame of waveform data, to at least one target range within the frequency range of the waveform data included in the target frame.
The waveform restoration unit generates the restored waveform based on the redundant data.
The restored waveform may be a waveform generated based on the redundant data and the composite waveform for the target frame. In this case, the receiving unit may receive designation information that specifies, for each non-target range (a frequency range other than the target range), the synthesis method for the composite waveform. The waveform restoration unit may then, for each non-target range, interpolate the redundant data using the composite waveform generated by the synthesis method specified by the designation information.
A receiving method according to an embodiment of the present technology is a receiving method executed by a computer system, and includes receiving redundant data assigned, based on the waveform quality of a restored waveform for a target frame of waveform data, to at least one target range within the frequency range of the waveform data included in the target frame.
The restored waveform is generated based on the redundant data.
A diagram schematically showing the appearance of a transmission/reception system according to a first embodiment of the present technology.
A schematic diagram for explaining an outline of error concealment.
A schematic graph showing the original waveform represented by one frame of original data.
A schematic diagram showing an example of the frequency range of redundant data according to the first embodiment.
A schematic diagram showing an example of error concealment given as a comparative example.
A schematic diagram showing an example of a waveform data transmission method.
A block diagram showing a configuration example of the transmission/reception system.
A block diagram showing a configuration example of a redundant data generation unit.
A flowchart showing an example of redundant data generation processing.
A schematic diagram showing an example of noise spectrum calculation.
A schematic diagram showing an example of calculating the encoding frequency range.
A flowchart showing an example of encoding range setting processing.
A schematic diagram for explaining the total noise amount of interpolated data.
A block diagram showing a configuration example of a signal processing unit included in the receiving device.
A flowchart showing an example of the operation of the signal processing unit.
A flowchart showing an example of replacement region setting processing.
A schematic diagram showing an example of the frequency range set by the replacement region setting processing.
A flowchart showing an example of spectrum replacement processing.
A schematic diagram showing an example of the encoding frequency range according to a second embodiment.
A schematic diagram showing an example of calculating the encoding frequency range.
A flowchart showing an example of encoding range setting processing.
A schematic diagram for explaining non-tonal component exclusion processing.
A flowchart showing an example of non-tonal component exclusion processing.
A schematic diagram for explaining frequency range aggregation processing.
A flowchart showing an example of frequency range aggregation processing.
A schematic diagram for explaining frequency range adjustment processing.
A schematic diagram showing an example of the encoding frequency range according to a third embodiment.
A block diagram showing a configuration example of a redundant data generation unit according to the third embodiment.
A flowchart showing an example of redundant data generation processing.
A schematic diagram showing an example of encoding range synthesis processing.
A flowchart showing an example of encoding range synthesis processing.
A block diagram showing a configuration example of a signal processing unit according to the third embodiment.
A flowchart showing an example of the operation of the signal processing unit.
Hereinafter, embodiments according to the present technology will be described with reference to the drawings.
<First Embodiment>
[Transmission/Reception System]
FIG. 1 is a diagram schematically showing the appearance of a transmission/reception system according to the first embodiment of the present technology. The transmission/reception system 100 includes a transmission device 20 and a reception device 50, and transmits waveform data from the transmission device 20 to the reception device 50.
Here, waveform data is, for example, data representing a waveform that changes over time. In the transmission/reception system 100, audio data representing an audio waveform, for example, is transmitted as the waveform data.
In the transmission/reception system 100, the waveform data is transmitted by wireless communication between the transmission device 20 and the reception device 50. The communication standard for the wireless communication is not limited; any standard capable of transmitting waveform data, such as Bluetooth (registered trademark), may be used as appropriate.
In the example shown in FIG. 1, a portable terminal device (for example, a smartphone, a tablet terminal, or a portable music player) is used as the transmission device 20, and an audio reproduction device capable of connecting wirelessly to the transmission device 20 (for example, wireless headphones, wireless earphones, or a wireless speaker) is used as the reception device 50. The configurations of the transmission device 20 and the reception device 50 are not otherwise limited.
Waveform data such as audio transmitted from the transmission device 20 is received by the reception device 50. The reception device 50 restores the waveform represented by the waveform data and reproduces it as sound from a speaker mounted on the reception device 50.
At this time, part of the waveform data transmitted from the transmission device 20 may not be received by the reception device 50. For example, depending on the communication environment, the wireless communication between the transmission device 20 and the reception device 50 may be disrupted, and the waveform data may be partially lost.
In the transmission/reception system 100, error concealment is executed when such data loss occurs. In the present disclosure, error concealment is a process of compensating for a lost portion when, for example, part of the waveform data transmitted from the transmission device 20 to the reception device 50 is lost.
[Overview of Error Concealment]
FIG. 2 is a schematic diagram for explaining an outline of error concealment. FIG. 2A is a schematic diagram showing the process by which waveform data is transmitted. FIG. 2B is a schematic diagram showing an example of a waveform compensated by error concealment.
First, a method of transmitting waveform data will be described with reference to FIG. 2A.
When transmitting waveform data, the transmission device 20 transmits waveform data encoded using a transmission encoder, and the reception device 50 decodes the waveform data using a decoder corresponding to that encoder.
Data processing such as this encoding and decoding is executed for each frame into which the waveform data is divided along the time axis. Here, a frame is, for example, a processing unit standardized by the coding scheme. The frame length (the period into which the waveform is divided) is set according to the coding scheme used by the transmission device 20 (for example, 10 msec).
In the following, one frame of waveform data (audio data) before encoding in the transmission device 20 is referred to as original data. The time waveform of the original data is denoted x(n), and the frequency spectrum obtained by time-frequency transforming the original data is denoted X(k).
The encoded original data is packed into packets 1 and transmitted frame by frame. Here, a packet 1 is the unit of data transmission between the transmission device 20 and the reception device 50. FIG. 2A schematically illustrates three packets 1 transmitted from the transmission device 20 to the reception device 50. In the present embodiment, a packet 1 transmitted from the transmission device 20 corresponds to the transmission data.
For example, when a packet 1 transmitted from the transmission device 20 is not received by the reception device 50, the reception device 50 transmits an error signal indicating that the packet 1 has not been received. On receiving the error signal, the transmission device 20 executes retransmission processing to transmit the target packet 1 again. By repeating such processing, loss of packets 1 can be prevented.
On the other hand, among the various audio transmission methods, audio transmission over BLE (Bluetooth Low Energy), for example, limits the number of times a packet may be retransmitted. In this case, a packet 1 that exceeds the retransmission limit is discarded, producing a missing packet (the center packet 1 in FIG. 2A). If audio or the like is reproduced as-is with a missing packet, a discontinuity occurs in the audio signal, which may be perceived as audible unnaturalness.
For this reason, in the transmission/reception system 100, packet loss concealment (PLC), which compensates for missing packets, is executed as error concealment.
Specifically, the transmission device 20 generates a packet 1 containing data in which redundant data 3 is added to main data 2. That is, one packet 1 contains a set of main data 2 and redundant data 3.
Here, the main data 2 is the encoded version of the original data that is actually to be transmitted. In the following, the frame of main data 2 packed into a packet 1 and transmitted is referred to as a transmission frame.
The redundant data 3 is one frame's worth of data added separately to the main data 2 for use in error concealment (here, PLC). In the following, the frame for which the redundant data 3 is generated is referred to as a target frame.
In the present embodiment, data encoded using part of the information in the original data of a frame near the frame of the main data 2 (the transmission frame) is used as the redundant data 3. The target frame is therefore a frame in the vicinity of the transmission frame 8 transmitted as a packet 1.
The lower part of FIG. 2A schematically illustrates an example of the data structure of a packet 1. In this example, one packet 1 contains one main data 2 and one redundant data 3. The main data 2 is, for example, the original data of the M-th frame (M) encoded by the encoder for main data 2. The redundant data 3 is the encoded original data of the (M+1)-th frame (M+1), a frame adjacent to that of the main data 2. In this case, frame (M) is the transmission frame and frame (M+1) is the target frame.
For encoding the redundant data 3, a generally lower-quality (higher-compression) encoder is used, whose settings, such as coding method and compression ratio, differ from those of the encoder for the main data 2. The data amount of the redundant data 3 is therefore smaller than that of the main data 2.
The target frame from which the redundant data 3 is generated is not limited; for example, redundant data 3 for the (M+2)-th frame (M+2) or the (M-1)-th frame (M-1) may be added.
The number of main data 2 (and redundant data 3) packed into one packet 1 is also not limited. The present technology is applicable even when, for example, a packet 1 contains sets of main data 2 and redundant data 3 for a plurality of frames.
Packets 1 containing redundant data 3 are generated sequentially in this way and transmitted to the reception device 50. When a lost packet occurs, the reception device 50 compensates for the lost data using the redundant data 3 corresponding to the main data 2 contained in the lost packet (hereinafter referred to as loss data).
For example, when the packet 1 containing the main data 2 of frame (M+1) is discarded, the loss data is interpolated using the redundant data 3 of frame (M+1) that has already been received.
This type of PLC is generally classified as Media-Specific FEC (Forward Error Correction). It makes it possible to compensate for loss data immediately, using redundant data 3 that has already been received. It is also possible to compensate for loss data by receiving the necessary redundant data 3 after the packet has been lost.
FIG. 2B schematically illustrates the waveform restored by the reception device 50 when a lost packet occurs. The time range indicated by the dotted line in the figure, for example, is the loss period during which data was lost due to the lost packet.
The reception device 50 generates a restored waveform 5 that restores the waveform represented by the original data in the loss period (the original waveform 4). Specifically, interpolated data is generated by interpolating the redundant data 3 corresponding to the loss data using a composite waveform for the loss data.
Here, the composite waveform is a waveform (audio data) synthesized, for use in error concealment (here, PLC), from the data of nearby frames (preferably main data) received normally at the receiving terminal.
The interpolated data is data generated by combining the redundant data 3 and the composite waveform, and is the waveform data (audio data) used for the final concealment.
The waveform represented by this interpolated data is used as the restored waveform 5. The restored waveform 5 can therefore be said to be a waveform generated based on the redundant data 3 and the composite waveform for the target frame. In FIG. 2B, the original waveform 4 represented by the original data is drawn as a solid line, and the restored waveform 5 represented by the interpolated data is drawn as a dotted line.
By combining the redundant data 3 and the composite waveform in this way, although the original data is not restored completely, various kinds of noise are sufficiently reduced and audible unnaturalness is hard to perceive, making high-quality error concealment possible.
[Redundant Data]
In the transmission/reception system 100, the transmission device 20 predicts in advance the waveform quality of the restored waveform 5 to be restored by the reception device 50. That is, the waveform quality of the restored waveform 5 for the target frame 7 of the waveform data is predicted.
Typically, an index representing the waveform quality of the composite waveform (such as the noise spectrum described later) is calculated and used as an index of the waveform quality of the restored waveform. The frequency range to be assigned to the redundant data 3 is then set based on the predicted waveform quality.
For example, in a frequency range where the waveform quality of the composite waveform reaches an acceptable level (a frequency range of high waveform quality), interpolating the loss data with the composite waveform is unlikely to produce perceptible unnaturalness. In a frequency range where the waveform quality does not reach an acceptable level (a frequency range of low waveform quality), on the other hand, using the composite waveform may reproduce unnatural audio, and audible unnaturalness may be perceived.
For this reason, in the transmission/reception system 100, error concealment using the redundant data 3 is executed for frequency ranges of low waveform quality.
The following description focuses on the method of setting the frequency range of the redundant data using the prediction result for the waveform quality of the composite waveform.
FIG. 3 is a schematic graph showing the original waveform 4 represented by one frame of original data 6 (target frame 7). The horizontal axis of the graph is time, and the vertical axis is the amplitude value x(n), where n is an index representing time within the frame.
FIG. 4 is a schematic diagram showing an example of the frequency range of the redundant data according to the first embodiment. FIG. 4 shows a schematic graph of the frequency spectrum of the original waveform 4 shown in FIG. 3. The horizontal axis of the graph is frequency, and the vertical axis is the spectrum value X(k), where k is an index (frequency bin) representing the frequency of each spectrum value.
In FIG. 4, an example of a frequency range set based on the waveform quality is schematically indicated by an arrow. The frequency components (spectral components) of the original data 6 contained in this frequency range are encoded to generate the redundant data 3.
In the following, the frequency range assigned to the redundant data 3 is referred to as the encoding frequency range 70. The encoding frequency range 70 is, for example, a frequency range in which the waveform quality of the composite waveform is low (noisy). In the present embodiment, the encoding frequency range 70 corresponds to the target range.
The frequency ranges other than the encoding frequency range 70 (the hatched ranges in FIG. 4) are the frequency ranges interpolated using the composite waveform at packet loss. In the following, a frequency range other than the encoding frequency range 70 is referred to as an interpolation range 71. The interpolation range 71 is, for example, a frequency range in which the waveform quality of the composite waveform is high (little noise). In the present embodiment, the interpolation range 71 corresponds to the non-target range.
The method of evaluating the waveform quality and the method of setting the encoding frequency range 70 (interpolation range 71) will be described in detail later.
FIG. 5 is a schematic diagram showing an example of error concealment given as a comparative example.
In the method shown in FIG. 5, the frequency range used for the redundant data 3 is set to a fixed range on the low-frequency side including the lowest frequency (k = 0 Hz). The frequency range above the fixed range is a blank range in which no data is restored.
In this case, at packet loss, audio containing the spectral components within the fixed range is restored. Such audio, however, is reproduced only within the fixed range of the redundant data 3 added at concealment. As a result, audible unnaturalness may arise from the lost high-frequency energy.
 本実施形態では、図4に示すように、送信装置20により、本来送信したい主データ2の送信フレームの近傍の対象フレーム7のデータが、合成波形の波形品質に応じて、低周波数側に設定された特定の周波数帯域(符号化周波数範囲70)に限り符号化され、冗長データ3が生成される。生成された冗長データ3は、主データ2を送信するパケット1に付加される。
 パケット損失が生じた場合、受信装置50により、損失した主データ2に対応する冗長データ3を合成波形で補間した補間データが生成される。より詳しくは、冗長データ3の周波数スペクトルのうち有効な範囲(符号化周波数範囲70)以外の範囲(補間範囲71)が、過去に正常に受信した近傍フレームから生成した合成波形の周波数スペクトルに置換される。
In the present embodiment, as shown in FIG. 4, the transmission device 20 sets the data of the target frame 7 in the vicinity of the transmission frame of the main data 2 to be originally transmitted to the low frequency side according to the waveform quality of the composite waveform. Only the specific frequency band (encoded frequency range 70) is encoded, and the redundant data 3 is generated. The generated redundant data 3 is added to the packet 1 that transmits the main data 2.
When packet loss occurs, the receiving device 50 generates interpolated data in which the redundant data 3 corresponding to the lost main data 2 is interpolated with a composite waveform. More specifically, the range (interpolation range 71) other than the valid range (encoded frequency range 70) of the frequency spectrum of the redundant data 3 is replaced with the frequency spectrum of the composite waveform generated from the neighboring frames normally received in the past. Will be done.
As a result, the audible discomfort can be suppressed far more effectively than in the case shown in FIG. 5, where only the redundant data 3, whose band is narrower than that of the main data 2, is simply decoded and used as the interpolated data.
It is also desirable that the amount of the redundant data 3 be as small as possible within an acceptable quality. In the present embodiment, the transmission device 20 predicts the waveform quality of the composite waveform in advance and encodes only the bands in which this quality falls below a certain level, so that the amount of the redundant data 3 can be further reduced. This makes it possible to realize high-quality error concealment while suppressing the amount of transmitted data.
One method of interpolating the data is, for example, a process of replacing the spectrum outside the band of the redundant data 3 with the spectrum of a neighboring frame. This process can also be regarded as using a waveform copied from a neighboring frame as the composite waveform.
Even such a simple process with a small amount of computation can greatly reduce degradation in sound quality, such as the muffled or discontinuous sound caused by attenuation of high-frequency energy. In other words, the quality of the error concealment can be kept at a high level at all times with a small amount of redundant data and without increasing the amount of computation.
[Waveform data transmission method]
FIG. 6 is a schematic diagram showing an example of a waveform data transmission method.
Here, an example of a waveform data transmission method including encoding and decoding will be described with reference to FIG. 6. As the waveform data, data representing an audio signal (input signal) such as voice is assumed.
In the transmission device 20, the input signal is divided into frames of N samples each, and a 2N-sample analysis frame is generated that overlaps the following frame by one frame length. This 2N-sample analysis frame is used as the transmission frame 8.
In FIG. 6, the time range corresponding to x(n), the data contained in the transmission frame 8, is schematically illustrated with arrows. In the figure, x_prev(n) and x_next(n) denote the transmission frames 8 temporally before (previous frame) and after (next frame) x(n).
A predetermined analysis window is applied to the original data 6 (input signal) of these transmission frames 8, and the time-frequency-transformed frequency spectrum is calculated. The type of analysis window is not limited. FIG. 6 schematically illustrates the shape of the function representing the analysis window. For the time-frequency transform, a modified discrete cosine transform (MDCT), for example, is used.
The frequency spectrum of the transmission frame 8 is encoded, and the encoded data is packed into the packet 1 as the main data 2 and transmitted. At this time, redundant data 3 for a frame near the main data 2 (the target frame 7) is generated and added to the same packet 1. The type and settings of the encoding method are not limited.
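The framing described above can be sketched in code. The following is a minimal illustration only, assuming a 50% overlap and a sine window (the disclosure leaves the window type open); the function names are hypothetical:

```python
import math

def make_analysis_frames(signal, N):
    # Divide the input into N-sample frames; each analysis frame spans
    # 2N samples, overlapping the following frame by one frame length (N).
    frames = []
    for start in range(0, len(signal) - N, N):
        frames.append(signal[start:start + 2 * N])
    return frames

def apply_analysis_window(frame):
    # Apply a sine window, one common choice of MDCT analysis window
    # (an assumption here; the disclosure does not fix the window type).
    L = len(frame)
    return [x * math.sin(math.pi * (n + 0.5) / L) for n, x in enumerate(frame)]
```

Each windowed 2N-sample frame would then be passed to the MDCT and encoded as the main data 2.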
After receiving the transmitted data (packet 1), the receiving device 50 decodes the data back into a frequency spectrum. For decoding, a decoding method corresponding to the encoding method of the transmission device 20 is used.
An inverse modified discrete cosine transform (IMDCT) is applied to the decoded frequency spectrum to calculate a 2N-sample time waveform. FIG. 6 schematically shows y(n), the data contained in a received frame.
An output signal is generated by applying a synthesis window to each y(n) and overlap-adding it with the previous and next frames in the same positional relationship as on the transmitting side. The output signal is a signal in which a waveform similar to the input signal is reconstructed.
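The overlap-add reconstruction on the receiving side can be sketched as follows (a simplified illustration with the synthesis window omitted; for perfect reconstruction the analysis and synthesis windows must additionally satisfy the usual MDCT window condition):

```python
def overlap_add(frames, N):
    # Each decoded 2N-sample frame y(n) is shifted by N samples relative
    # to the previous one, and the overlapping halves are summed.
    out = [0.0] * (N * (len(frames) + 1))
    for i, frame in enumerate(frames):
        for n, v in enumerate(frame):
            out[i * N + n] += v
    return out
```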
Now suppose that, due to a packet loss, the data corresponding to y(n) (that is, the data of the encoded transmission frame 8 corresponding to x(n)) is lost. In this case, it becomes difficult to properly calculate the output signal over the period of 2N samples corresponding to y(n).
For this reason, the receiving device 50 generates interpolated data that compensates for the missing frame by using, in addition to the redundant data 3 corresponding to the missing frame, a composite waveform generated from y_prev(n) and y_next(n), the decoding results of neighboring frames, or from already reconstructed past frames.
[Transmission / reception system configuration]
FIG. 7 is a block diagram showing a configuration example of the transmission / reception system 100.
The transmission / reception system 100 is a system that transmits waveform data 10 stored as an audio file from the transmission device 20 to the reception device 50 according to, for example, a BLE communication method.
The transmission / reception system 100 is designed assuming a use case in which both the transmission device 20 and the receiving device 50 are constrained in the amount of computation available. An example of such a configuration is the combination of a transmission device 20 such as a smartphone or a digital audio player with a receiving device 50 of limited computing power, such as wireless earphones or wireless headphones.
Of course, the present technique is also applicable when devices with sufficient computing power are used (for example, a PC on the transmitting side and a stationary audio player on the receiving side).
[Transmission device configuration]
Hereinafter, the configuration of the transmission device 20 in the transmission / reception system 100 will be described.
The transmission device 20 includes a retransmission timeout time calculation unit 21, a signal processing unit 22, an input buffer 23, a redundant data generation unit 24, a coding unit 25, a mux unit 26, and a transmission buffer 27. The transmission device 20 is configured using, for example, a computer including a CPU and a memory. The transmission method according to the present embodiment is executed when the transmission device 20 runs the program according to the present embodiment and each unit operates.
When the BLE connection is completed, the retransmission timeout time calculation unit 21 acquires parameters determined according to the combination of the transmission device 20 and the receiving device 50, and calculates the retransmission timeout time. Here, the retransmission timeout time is the time limit within which retransmission of a packet 1 not received by the receiving device 50 is permitted. A packet 1 that has still not been received when the retransmission timeout time elapses is treated as a lost packet.
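The retransmission timeout behavior can be illustrated with a small sketch (the data structure and names are assumptions, not part of the disclosed configuration):

```python
def find_lost_packets(pending, now, retransmission_timeout):
    # pending maps a packet id to the time it was first sent.
    # A packet still unreceived after the retransmission timeout is
    # treated as a lost packet; the others may still be retransmitted.
    return [pid for pid, sent_at in pending.items()
            if now - sent_at > retransmission_timeout]
```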
The signal processing unit 22 reads the audio data (waveform data 10) required to generate one frame from the audio file and executes predetermined signal processing to generate the original data 6. For example, an MDCT is executed, and the frequency spectrum of each frame is calculated as the original data 6. In addition, signal processing for adjusting gain, sound quality, and the like may be executed.
The input buffer 23 is a buffer that temporarily stores the data processed by the signal processing unit 22. The input buffer 23 stores the original data 6 representing the frequency spectrum and the time waveform of the waveform data. When the input buffer 23 is full, the original data 6 with the lowest priority (typically, the oldest original data 6) is discarded.
The redundant data generation unit 24 reads the original data 6 stored in the input buffer 23 and generates the redundant data 3 and the main data 2. At this time, the coded frequency range 70 to which the redundant data 3 is assigned is set based on the waveform quality of the composite waveform.
In setting the coded frequency range 70, it is also possible to use information output from the coding unit 25 described later about the quantization settings (for example, the resolution used when quantizing the values of the frequency spectrum) and about the amount of data to be transmitted (for example, the remaining data capacity of the packet 1).
The specific configuration, operation, and the like of the redundant data generation unit 24 will be described in detail later.
The coding unit 25 encodes the main data 2 and the redundant data 3 output from the redundant data generation unit 24 according to the corresponding coding methods, respectively. For example, the main data 2 is encoded with a relatively low compression ratio, and the redundant data 3 is encoded with a higher compression ratio than the main data 2.
The mux unit 26 stores the main data 2 and the redundant data 3 encoded by the coding unit 25 in a predetermined packet 1. The data capacity of packet 1 and the like are set according to the communication method used (here, BLE communication).
The transmission buffer 27 is a buffer that temporarily stores the packet 1 generated by the mux unit 26. The packets 1 stored in the transmission buffer 27 are transmitted in a predetermined order via the transmission module (not shown).
FIG. 8 is a block diagram showing a configuration example of the redundant data generation unit.
The redundant data generation unit 24 includes an original data selection unit 30, a composite waveform generation unit 31, a generated noise calculation unit 32, a coding range setting unit 33, and a coding spectrum selection unit 34.
In the following, it is assumed that the signal processing unit 22 preceding the redundant data generation unit 24 generates as the original data 6 both the data X(k), representing the time-frequency-transformed frequency spectrum, and the data x(n), representing the corresponding time waveform.
The original data selection unit 30 selects and acquires necessary data from the original data 6 stored in the input buffer 23. Specifically, data representing the frequency spectrum and time waveform of the original data 6 to be processed is read from the input buffer 23. Further, as shown in FIG. 8, the delivery destination of the data differs depending on the data to be acquired.
The frequency spectrum of the original data 6 (original data 6 included in the transmission frame 8) corresponding to the main data 2 is passed through to the coding unit 25 in the subsequent stage as it is.
The frequency spectrum and time waveform of the original data 6 (original data 6 included in the target frame 7) corresponding to the redundant data 3 are input to the generated noise calculation unit 32 and the coded spectrum selection unit 34.
The frequency spectrum and the time waveform of the original data 6 for the composite waveform are input to the composite waveform generation unit 31. The original data 6 for the composite waveform is data contained in a frame near the target frame 7 for which the redundant data 3 is generated (for example, the transmission frame 8).
The composite waveform generation unit 31 generates a composite waveform for the target frame 7 based on the original data 6 for the composite waveform. For example, synthetic data representing the frequency spectrum of the composite waveform is generated.
Alternatively, synthetic data representing the time waveform of the composite waveform may be calculated. In this case, a process of converting the time waveform into a frequency spectrum or the like may be executed.
As described above, in the present disclosure, generating synthetic data representing the frequency spectrum of the synthetic waveform and generating synthetic data representing the time waveform of the synthetic waveform are included in generating the synthetic waveform.
As a method for generating the composite waveform, in the present embodiment, a method is adopted in which the frequency spectrum and the time waveform of the original data 6 for the composite waveform are used as they are as the composite waveform. In this case, the data obtained by copying the original data 6 for the composite waveform becomes the composite data. The method of generating the composite waveform is not limited, and for example, a method of appropriately processing the original data 6 for the composite waveform to generate the composite waveform may be used.
The composite waveform (composite data) is input to the generated noise calculation unit 32.
The generated noise calculation unit 32 acquires the original data 6 corresponding to the redundant data 3 and the composite data representing the composite waveform, and calculates the noise information related to the composite waveform.
The noise information is information representing, for example, the noise that is generated when data is interpolated using the composite waveform. For example, the deviation of the composite waveform from the original waveform 4 of the target frame 7 is calculated as the noise of the composite waveform.
As will be described later, the generated noise calculation unit 32 calculates the frequency distribution (noise spectrum) of such noise, the total amount of noise, and the like as noise information.
For example, a frequency region where noise is large is a region where the waveform quality of the composite waveform is low, and a frequency region where noise is small is a region where the waveform quality of the composite waveform is high. Therefore, it can be said that the noise information is not only the information indicating the waveform quality of the composite waveform but also the information representing the waveform quality of the restored waveform 5 generated by using the composite waveform.
In this way, the generated noise calculation unit 32 predicts the waveform quality of the composite waveform as the waveform quality of the restored waveform. In the present embodiment, the generated noise calculation unit 32 corresponds to the quality prediction unit.
The coding range setting unit 33 acquires noise information and sets a frequency range (coding frequency range 70) to be encoded as redundant data 3.
In the present embodiment, one coded frequency range 70 is set on the low frequency side with respect to the original data 6 included in the target frame 7 based on the noise information (waveform quality of the composite waveform). In addition to the waveform quality of the composite waveform, the coded frequency range 70 can be set by using other information indicating the waveform quality of the restored waveform 5.
In this way, based on the waveform quality of the restored waveform 5, the coding range setting unit 33 sets the coded frequency range 70 as the frequency range allocated to the redundant data 3 used for generating the restored waveform 5 from the original data 6 (waveform data) contained in the target frame 7.
Further, as shown in FIG. 8, information regarding the quantization setting in coding, the data amount of the packet, and the like is input to the coding range setting unit 33. The coded frequency range 70 may be set using this information.
The coded spectrum selection unit 34 acquires the frequency spectrum of the original data 6 corresponding to the redundant data 3 and the coded frequency range 70, and extracts the spectrum component to be used as the redundant data 3. Specifically, only the spectral components included in the coded frequency range 70 are extracted from the original data 6. The data representing these spectral components becomes the redundant data 3 before coding.
In this way, the coded spectrum selection unit 34 generates the redundant data 3 based on the coded frequency range 70.
The redundant data 3 and the main data 2 before encoding are input to the coding unit 25 shown in FIG. 7, and the encoded redundant data 3 and the main data 2 are generated. Then, the mux unit 26 generates a packet 1 (transmission data) including the encoded main data 2 and the redundant data 3.
In the present embodiment, the coded spectrum selection unit 34, the coding unit 25, and the mux unit 26 cooperate to realize a data generation unit that generates transmission data including redundant data.
[Redundant data generation process]
In the present embodiment, the transmission device 20 adds only the spectral components at or below a designated frequency to the packet 1 as the redundant data 3.
Specifically, as shown in FIG. 4, a single coded frequency range 70 is set on the low-frequency side of the frequency spectrum. The minimum frequency k_min of the coded frequency range 70 is set to the lower limit of the frequencies in the frequency spectrum of the original data 6 (k_min = 0). The maximum frequency k_max of the coded frequency range 70 is set based on the noise information (noise spectrum) described above.
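As one hypothetical illustration of how k_max could be derived from the noise spectrum (the concrete rule used by the embodiment is described later; the threshold rule below is an assumption for illustration only):

```python
def choose_k_max(noise_spectrum, threshold):
    # Encode up to the highest bin whose predicted noise power exceeds
    # the threshold; bins above k_max are left to the composite waveform,
    # where the predicted noise is small enough to be tolerated.
    k_max = 0
    for k, p in enumerate(noise_spectrum):
        if p > threshold:
            k_max = k
    return k_max
```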
As the method (waveform synthesis method) by which the receiving device 50 generates the composite waveform for the target frame 7 (redundant data 3), a method of copying the original data 6 of the frame one frame before the target frame 7 is adopted.
For example, suppose the redundant data 3 is the data for x(n) shown in FIG. 6. In this case, the waveform obtained by copying x_prev(n), the original data 6 of the frame one frame before the frame containing x(n) (the target frame 7), becomes the composite waveform. That is, the composite waveform x'(n) of x(n) is expressed as x'(n) = x_prev(n).
For example, when error concealment is executed to compensate for the loss of the packet 1 containing x(n), the spectral components of x_prev(n), the original data 6 of the frame one frame before the target frame 7, are used as they are in the frequency range above the maximum frequency k_max.
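The spectral substitution performed at concealment time can be sketched as follows (a minimal illustration; the array-of-bins representation and names are assumptions):

```python
def conceal_spectrum(redundant_spectrum, prev_spectrum, k_max):
    # Bins 0..k_max come from the decoded redundant data 3; bins above
    # k_max are copied from the previous frame's spectrum, i.e. the
    # composite waveform x'(n) = x_prev(n) in the frequency domain.
    return list(redundant_spectrum[:k_max + 1]) + list(prev_spectrum[k_max + 1:])
```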
By limiting the coded frequency range 70 to the low-frequency side in this way, information for specifying the coded frequency range 70 (for example, the meta information used in embodiments described later) becomes unnecessary, and a situation in which the amount of the redundant data 3 increases can be avoided.
Further, by adopting the method of copying the original data 6 of the previous frame as the waveform synthesis method, the amount of computation on the receiving device 50 side can be reduced.
FIG. 9 is a flowchart showing an example of the generation process of the redundant data 3. The process shown in FIG. 9 is an example of the process executed by the redundant data generation unit 24 and the coding unit 25. This process is, for example, a loop process that is executed every time packet 1 is generated.
First, the original data selection unit 30 acquires the original data 6 to be processed from the input buffer 23 (step 101).
Specifically, the original data 6 of the transmission frame 8 for generating the main data 2, the original data 6 of the target frame 7 for generating the redundant data 3, and the original data 6 for generating the composite waveform are read from the input buffer 23.
Here, the transmission frame 8 is the Mth frame (M), and the target frame 7 is the M + 1th frame (M + 1) (see FIG. 2A and the like). Further, as described above, in the present embodiment, the composite waveform is generated from the original data 6 of the frame immediately before the target frame 7. Therefore, the original data 6 for the composite waveform becomes the main data 2.
If the target frame 7 (redundant data 3) is not the frame immediately after the transmission frame (main data 2), the original data 6 of the frame one frame before the target frame 7 in the time series is read as the original data 6 for the composite waveform, separately from the main data 2.
The composite waveform generation unit 31 executes a composite waveform generation process for generating a composite waveform for the target frame 7 based on the original data 6 for the composite waveform (step 102).
In the present embodiment, the composite waveform generation unit 31 generates a composite waveform for the target frame 7 based on the original data 6 (main data 2) included in the transmission frame 8.
It is desirable that the method for generating the composite waveform is exactly the same as the waveform synthesis method used in the receiving device 50. As described above, in the present embodiment, a method of copying the original data 6 one frame before the target frame 7 (redundant data 3) is used. Therefore, the process by the composite waveform generation unit 31 is a process of passing through the main data 2 which is the original data 6 for the composite waveform as it is.
When another waveform synthesis method is used, a composite waveform is appropriately generated based on the original data 6 for the composite waveform according to the set method.
The generated noise calculation unit 32 executes a generated noise prediction process for predicting the noise generated by using the composite waveform (step 103).
Specifically, by using the composite waveform, the frequency spectrum (hereinafter, referred to as noise spectrum) of the noise generated with respect to the original waveform (original waveform 4) is calculated. The noise spectrum is typically calculated as a power spectrum representing the intensity (power) of noise for each frequency. The noise spectrum can also be used as a measure of the waveform quality of the composite waveform. That is, it can be said that the generated noise prediction process is a process for predicting the waveform quality (noise spectrum) of the composite waveform.
In the present embodiment, the noise spectrum 13 representing the waveform quality of the composite waveform is calculated based on the composite waveform 11 and the original waveform 4 represented by the original data 6 included in the target frame 7.
FIG. 10 is a schematic diagram showing a calculation example of a noise spectrum. 10A to 10C are schematic graphs showing an example of a time waveform for one frame of the original waveform 4, the composite waveform 11, and the difference waveform 12. FIG. 10D is a schematic graph showing an example of the noise spectrum 13.
Here, a method of calculating the noise spectrum 13 by using the original waveform 4 which is a time waveform and the composite waveform 11 will be described.
The generated noise calculation unit 32 reads the original waveform x(n) represented by the original data 6 used for the redundant data 3 and the composite waveform x'(n) calculated by the composite waveform generation unit 31, and calculates the difference waveform 12 representing the difference between the two waveforms.
For example, the original waveform 4 (original data 6) shown in FIG. 10A and the composite waveform 11 (composite data) shown in FIG. 10B are read. As shown in FIGS. 10A and 10B, the composite waveform 11 is, for example, the waveform of the previous frame of the target frame 7 including the original waveform 4, and therefore does not completely match the shape of the original waveform 4.
When each waveform is read, the difference waveform 12 (x (n) −x'(n)) between the original waveform 4 and the composite waveform 11 is calculated. The difference waveform 12 is a waveform representing the difference between the original waveform 4 and the composite waveform 11 at each timing n. FIG. 10C schematically shows a difference waveform 12 between the original waveform 4 and the composite waveform 11 shown in FIGS. 10A and 10B.
A time-frequency transform (for example, a Fourier transform) is applied to the difference waveform 12, and the frequency spectrum of the difference waveform 12 is calculated. The power spectrum representing the absolute value of this frequency spectrum is calculated as the noise spectrum 13 (Pnoise(k)).
In this case, Pnoise(k) is expressed using the following equation.
  Pnoise(k) = |F[x(n) - x'(n)]|   (Equation 1)
Note that F in (Equation 1) represents the Fast Fourier Transform (FFT) applied to the difference waveform 12.
As described above, the noise spectrum 13 is a spectrum obtained by frequency-transforming the difference between the original waveform 4 and the composite waveform 11. This makes it possible to evaluate the noise arising in the actual time waveform for each frequency, so that the waveform quality of the composite waveform 11 can be predicted accurately.
FIG. 10D schematically shows the power spectrum obtained by Fourier-transforming the difference waveform 12 shown in FIG. 10C. As shown in FIG. 10D, the noise spectrum 13 (Pnoise(k)) is a frequency spectrum representing the intensity of the difference between the original waveform 4 and the composite waveform 11 for each frequency k.
For example, in a frequency range where the value of Pnoise(k) is small, the composite waveform 11 can be regarded as close to the original waveform 4, that is, a high-quality composite waveform has been generated. Conversely, in a frequency range where the value of Pnoise(k) is large, the composite waveform 11 can be regarded as deviating from the original waveform 4, and the quality of the composite waveform can be regarded as low.
The noise spectrum 13 is preferably calculated by applying an analysis window w(n). In this case, the noise spectrum 13 is calculated using the following equation instead of (Equation 1).
  Pnoise(k) = |F[w(n)·(x(n) - x'(n))]|   (Equation 2)
The analysis window w(n) is set as appropriate according to, for example, the process by which data is interpolated using the composite waveform 11. This makes it possible to accurately predict the noise that actually occurs when the composite waveform is used.
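As an illustrative sketch (not part of the claimed embodiment), the calculation of (Equation 1) and (Equation 2) can be written as follows. A naive one-sided DFT stands in for the FFT, and the function and parameter names are hypothetical:

```python
import cmath

def noise_spectrum(x, x_synth, window=None):
    # Difference waveform x(n) - x'(n); optionally weighted by the
    # analysis window w(n) as in (Equation 2).
    n = len(x)
    diff = [a - b for a, b in zip(x, x_synth)]
    if window is not None:
        diff = [w * d for w, d in zip(window, diff)]
    # Naive DFT standing in for the transform F of (Equation 1);
    # the magnitude of each bin gives Pnoise(k).
    spectrum = []
    for k in range(n // 2):  # one-sided spectrum, k = 0 .. N/2 - 1
        s = sum(diff[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                for t in range(n))
        spectrum.append(abs(s))
    return spectrum
```

When the two waveforms are identical, every bin of the resulting spectrum is zero, which corresponds to a composite waveform of ideal quality.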
The above described a method of calculating the noise spectrum 13 by taking the difference between the two time waveforms (the original waveform 4 and the composite waveform 11) and then executing an FFT. Alternatively, the noise spectrum 13 can also be calculated using the frequency spectra of the original waveform 4 and the composite waveform 11.
For example, the generated noise calculation unit 32 reads the frequency spectrum X(k) of the original waveform 4 and the frequency spectrum X'(k) of the composite waveform 11, and calculates a difference spectrum representing the difference between the two spectra. As X(k) and X'(k), for example, MDCT spectra obtained by the signal processing unit 22 MDCT-transforming x(n) and x'(n) (for example, the original data 6 of the main data 2) in advance can be used.
In this case, a difference spectrum is calculated by directly taking the difference between the MDCT spectra X(k) and X'(k). Then, the power spectrum representing the absolute value or the square of the difference spectrum is calculated as the noise spectrum 13.
In this case, Pnoise(k) is expressed using the following equation.
  Pnoise(k) = |X(k) - X'(k)|  (or |X(k) - X'(k)|^2)   (Equation 3)
As described above, the noise spectrum 13 is a spectrum representing the difference between the spectrum of the original waveform 4 and the spectrum of the composite waveform 11.
By adopting this method, it also becomes possible to predict noise caused by phase discontinuities, which is difficult to detect by, for example, comparing power spectra with each other.
Further, when a copy from the previous frame is used as the waveform synthesis method, the already MDCT-transformed original data 6 for the composite waveform (here, the main data 2) can simply be read directly. Therefore, it is not necessary to perform the time-frequency transform (FFT or the like) again to calculate the noise spectrum 13, and the MDCT spectrum X'(k) used when generating the main data 2 can be reused. As a result, the amount of computation can be kept sufficiently low.
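As a minimal sketch of the spectral-domain variant of (Equation 3) (illustrative names, not from the embodiment), no further transform is required once the MDCT spectra are available:

```python
def noise_spectrum_from_mdct(X, X_synth, squared=False):
    # Difference spectrum taken directly between the MDCT spectra
    # X(k) and X'(k); its absolute value (or square) is the noise
    # spectrum Pnoise(k) of (Equation 3).  X'(k) can be reused from
    # main-data encoding, so no extra FFT is needed.
    if squared:
        return [(a - b) ** 2 for a, b in zip(X, X_synth)]
    return [abs(a - b) for a, b in zip(X, X_synth)]
```

Because the MDCT coefficients are signed, this difference also reflects phase mismatches that a comparison of power spectra alone would miss.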
Further, the moving average of the spectrum calculated using (Equation 1) to (Equation 3) may be calculated as the noise spectrum 13. The moving average is a process of sliding a predetermined bin range (for example, a range of three bins) over the spectrum and calculating the average of the spectral values included in the bin range.
The noise spectrum 13 (Pnoise-smoothed(k)) calculated by the moving average is expressed using the following equation.
  Pnoise-smoothed(k) = (1/3)·(Pnoise(k-1) + Pnoise(k) + Pnoise(k+1))   (Equation 4, for a bin range of three bins)
This smooths the noise spectrum 13, for example, making the subsequent data processing easier to execute.
In the present disclosure, a spectrum (Pnoise-smoothed(k)) obtained by smoothing the noise spectrum 13 calculated using (Equation 1) or the like is also included in the noise spectrum 13. Hereinafter, Pnoise-smoothed(k) is simply written as Pnoise(k).
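The moving average of (Equation 4) can be sketched as follows; the edge handling (averaging only the bins actually available at the spectrum boundaries) is an assumption, since the text does not specify it:

```python
def smooth_noise_spectrum(p_noise, half_width=1):
    # Slide a bin range of 2*half_width + 1 bins (3 bins for
    # half_width = 1) over the spectrum and average the values
    # inside; at the edges only the available bins are averaged.
    n = len(p_noise)
    out = []
    for k in range(n):
        lo, hi = max(0, k - half_width), min(n, k + half_width + 1)
        out.append(sum(p_noise[lo:hi]) / (hi - lo))
    return out
```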
Returning to FIG. 9, when the noise spectrum 13 has been calculated, the coding range setting unit 33 executes the coding range setting process of setting the coding frequency range 70 (step 104). Specifically, the coding frequency range 70 is set based on the noise spectrum 13 (Pnoise(k)) calculated by the generated noise calculation unit 32.
For example, a frequency range in which the noise is large is determined from the noise spectrum 13 and set as the coding frequency range 70 assigned to the redundant data 3. That is, the frequency range assigned to the redundant data 3 is a frequency range in which the waveform quality of the composite waveform is low.
By using the noise spectrum in this way, the frequency range that should be assigned to the redundant data 3 (that is, in which the composite waveform should not be used) can be set accurately.
The coding range setting process will be described in detail later.
When the coding frequency range 70 has been set, the coded spectrum selection unit 34 executes the coded spectrum selection process of extracting, from the original data 6 of the target frame 7, only the spectral components corresponding to the coding frequency range 70 (step 105).
For example, the original data 6 representing the MDCT spectrum X(k) of the target frame 7 is input to the coded spectrum selection unit 34. Of the spectral components (frequency components) included in X(k), the components included in the coding frequency range 70 are extracted. The data containing the extracted components becomes the redundant data 3 before coding.
Therefore, the data amount of the redundant data 3 before coding varies according to the width of the coding frequency range 70 (that is, the state of the noise spectrum 13).
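Since the coding frequency range starts at index 0 (see FIG. 4), the selection of step 105 reduces to keeping a low-frequency prefix of the spectrum. A sketch with illustrative names:

```python
def select_coded_spectrum(X, redun_area):
    # Coded-spectrum selection (step 105): keep only the components
    # of the target frame's MDCT spectrum X(k) that lie inside the
    # coding frequency range [0, redun_area]; everything above it is
    # left to the composite waveform on the receiving side.
    return X[:redun_area + 1]
```

The length of the returned list is redun_area + 1, which is why the pre-coding size of the redundant data tracks the width of the coding frequency range.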
When the redundant data 3 is extracted, it is determined whether or not the original data 6 to be processed remains (step 106). Specifically, it is determined whether or not the redundant data 3 to be packed in one packet 1 remains.
For example, when the original data 6 to be processed (that is, the redundant data 3 to be generated) remains (YES in step 106), the processes of steps 101 to 105 are executed for the remaining original data 6.
The configuration of the main data 2 and the redundant data 3 packed in one packet 1 is not limited. For example, in the example described with reference to FIG. 2A, packet 1 including one data set (main data 2 and redundant data 3) is generated. In this case, the loop processing of step 106 is not executed. On the other hand, when a plurality of data sets are packed in one packet 1, the processes up to step 105 are executed for the number of redundant data 3 to be generated.
When no original data 6 to be processed remains (NO in step 106), the coding process of encoding the redundant data 3 is executed (step 107). Here, the pre-coding redundant data 3 generated in the above process (the spectral components included in the coding frequency range 70) is encoded using a predetermined coding method, and the encoded redundant data 3 is generated.
In the coding process, it is possible to adjust the data amount of the encoded redundant data 3 by appropriately setting the compression rate (bit rate) and the like at the time of coding. For example, when the target data amount of the redundant data 3 is set, the redundant data 3 is encoded with a compression rate that fits in the target data amount.
Further, the compression rate of the redundant data 3 may be fixed. In this case, the amount of coded redundant data 3 varies depending on the width of the coded frequency range 70 and the like.
The coding of the main data 2 is executed separately from the coding of the redundant data 3.
When the redundant data 3 has been encoded, the target data amount of the main data 2 is set (step 108). Specifically, the free space of the packet 1 is calculated from the data amount of the encoded redundant data 3. The free space of the packet 1 is, for example, the capacity obtained by subtracting the total data amount of the encoded redundant data 3 from the data size of the packet 1. This free space of the packet 1 is set as the target data amount of the main data 2. For example, when the main data 2 is encoded, the compression rate and the like are set as appropriate so that the data amount of the encoded main data 2 fits within the target data amount set here.
[Coding range setting process]
FIG. 11 is a schematic diagram showing a calculation example of the coding frequency range. FIG. 11 shows a schematic graph of the noise spectrum 13 (Pnoise(k)). The horizontal axis of the graph is frequency, and the vertical axis is the spectral value of Pnoise(k).
The details of the coding range setting process (step 104 in FIG. 9) are described below.
In the present embodiment, the coding range setting unit 33 sets an integration range 72 over which the integrated value of the noise spectrum 13 is calculated, and the coding frequency range 70 is set based on the minimum integration range 72 whose integrated value exceeds a first threshold value.
The integrated value of the noise spectrum 13 represents the total amount of noise in the target frequency range. It can therefore also be said that the coding frequency range 70 is set to a frequency range in which the total noise amount is approximately the first threshold value.
In the following, the case where the minimum integration range 72 whose integrated value exceeds the first threshold value is set directly as the coding frequency range 70 will be described.
First, the total noise amount Pnoise-sum in the preset total calculation range 73 is calculated. In FIG. 11, the total calculation range 73 is schematically shown as the dotted-line range.
The minimum index (lowest frequency) of the total calculation range 73 is set to 0. The maximum index (highest frequency) of the total calculation range 73 can be set arbitrarily below half (N) of the total number 2N of FFT indices (the FFT length). In the following, the maximum index of the total calculation range 73 is written as total_area.
The total noise amount Pnoise-sum in the total calculation range 73 is expressed using the following equation.
  Pnoise-sum = Σ_{k=0}^{total_area} Pnoise(k)   (Equation 5)
Further, the integration range 72 is set, and the total noise amount Pnoise-redun-sum in the integration range 72 is calculated. In FIG. 11, the integration range 72 is schematically shown as the solid-line range.
The minimum index (lowest frequency) of the integration range 72 is set to 0, and the maximum index (highest frequency) of the integration range 72 is set so as to be included in the total calculation range 73. In the following, the maximum index of the integration range 72 is written as redun_area.
The total noise amount Pnoise-redun-sum in the integration range 72 is expressed using the following equation.
  Pnoise-redun-sum = Σ_{k=0}^{redun_area} Pnoise(k)   (Equation 6)
The minimum integration range 72 is calculated such that its total noise amount Pnoise-redun-sum given by (Equation 6) is equal to or greater than a predetermined ratio α (for example, 0.7) of the total noise amount Pnoise-sum in the total calculation range 73. That is, the minimum value of redun_area satisfying the following equation is calculated.
  α·Pnoise-sum ≤ Pnoise-redun-sum   (Equation 7)
The left side of (Equation 7) (the product of the predetermined ratio α and the total noise amount Pnoise-sum in the total calculation range 73) corresponds to the first threshold value described above.
The minimum value of redun_area satisfying (Equation 7) is set as the maximum index (highest frequency) of the coding frequency range 70. Further, as described with reference to FIG. 4, the minimum index (lowest frequency) of the coding frequency range 70 is set to 0.
As described above, in the present embodiment, the coding frequency range 70 is set to a frequency range that reflects the distribution of the noise caused by the composite waveform. This makes it possible to assign only the necessary frequency range to the redundant data 3, compared with, for example, assigning the entire frequency range of the original data 6 or a fixed range to the redundant data 3. As a result, the data amount can be reduced without degrading the quality of the redundant data 3.
Here, the noise Pnoise-residue (= Pnoise-sum - Pnoise-redun-sum) remaining in the frequency range not retained as the redundant data 3, that is, the frequency range in which the composite waveform 11 is used, is estimated.
When (Equation 7) is satisfied, and the quantization noise associated with the coding of the redundant data 3 is ignored, Pnoise-residue satisfies the following equation.
  Pnoise-residue ≤ (1 - α)·Pnoise-sum   (Equation 8)
As shown in (Equation 8), the noise Pnoise-residue remaining in the region not retained as the redundant data 3 is reduced to (1 - α) times or less of the noise originally present in the total calculation range 73.
For example, the total calculation range 73 is set based on the human audible range. In this case, Pnoise-sum is the total amount of noise due to the composite waveform 11 within the frequency range audible to humans. The above method can thus be described as setting the coding frequency range 70 of the redundant data so as to reduce the noise by the predetermined ratio α out of the total amount of noise in the audible range.
Furthermore, by setting the coding frequency range 70 on the low-frequency side, with redun_area as its maximum index, it is possible to avoid generating noise due to the composite waveform on the low-frequency side, greatly reducing the perceptual discomfort.
When the ratio α of Pnoise-redun-sum to Pnoise-sum is set high, the coding frequency range 70 becomes large. In this case, although the noise due to the composite waveform decreases, the quantization noise described later may increase. For example, when the target data amount of the redundant data 3 is predetermined, the quantization noise increases as the coding frequency range 70 becomes larger.
Taking this noise trade-off into account, the value of α can also be set so that the total noise amount of the finally restored waveform 5 is reduced.
FIG. 12 is a flowchart showing an example of the coding range setting process. The process shown in FIG. 12 is an example of the coding range setting process described with reference to FIG. 11.
First, Pnoise(k) is read by the coding range setting unit 33 (step 201).
Next, for Pnoise(k), the total noise amount Pnoise-sum in the total calculation range 73 is calculated (step 202). Specifically, the integrated value of the noise spectrum 13 from k = 0 to k = total_area is calculated as Pnoise-sum according to (Equation 5).
Next, the maximum index redun_area of the integration range 72 is initialized (step 203). redun_area functions as a variable used to set the coding frequency range 70; here, 0 is assigned to redun_area.
Next, for Pnoise(k), the total noise amount Pnoise-redun-sum in the integration range 72 is calculated (step 204). Specifically, the integrated value of the noise spectrum 13 from k = 0 to k = redun_area is calculated as Pnoise-redun-sum according to (Equation 6).
Next, it is determined whether Pnoise-redun-sum satisfies the condition shown in (Equation 7) (step 205). That is, Pnoise-redun-sum is compared with Pnoise-sum, and it is determined whether Pnoise-redun-sum is at least α (0 < α < 1) times Pnoise-sum.
When the condition of (Equation 7) is satisfied (YES in step 205), redun_area is set as the maximum value (maximum index) of the coding frequency range 70 (step 208).
When the condition of (Equation 7) is not satisfied (NO in step 205), redun_area is incremented, that is, 1 is added to redun_area (step 206).
Next, it is determined whether the incremented redun_area is within the total calculation range 73 (step 207). That is, it is determined whether redun_area is smaller than total_area (the maximum index of the total calculation range 73).
When redun_area is smaller than total_area (YES in step 207), the processing from step 204 onward is repeated. When redun_area is equal to or greater than total_area (NO in step 207), step 208 is executed.
As described above, in the process shown in FIG. 12, each time the maximum index redun_area of the integration range 72 is increased from k = 0, it is determined whether (Equation 7) is satisfied. The first index satisfying (Equation 7) is then set as the maximum value of the coding frequency range 70.
The method of calculating the integration range 72 satisfying (Equation 7) is not limited to this. For example, redun_area may instead be decreased from k = total_area while determining whether (Equation 7) is satisfied, and the last index satisfying (Equation 7) may be set as the maximum value of the coding frequency range 70.
As described above, in the present embodiment, the lowest frequency of the integration range 72 is set to the lowest frequency of the noise spectrum 13, and the integrated value is calculated while varying the highest frequency of the integration range. This makes it possible to easily calculate redundant data 3 that suppresses noise in the frequency range easily heard by humans.
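Steps 201 to 208 of FIG. 12 can be sketched as follows (illustrative names; the fallback when (Equation 7) is never satisfied follows the NO branch of step 207):

```python
def set_coding_range(p_noise, total_area, alpha=0.7):
    # Accumulate the noise spectrum from k = 0 upward and return the
    # first index redun_area at which the accumulated noise reaches
    # alpha times the total noise over [0, total_area] (Equation 7).
    p_sum = sum(p_noise[:total_area + 1])        # (Equation 5)
    redun_sum = 0.0
    for redun_area in range(total_area + 1):
        redun_sum += p_noise[redun_area]         # (Equation 6)
        if redun_sum >= alpha * p_sum:           # (Equation 7), step 205
            return redun_area
    return total_area                            # step 207 NO -> step 208
```

Because the noise spectrum is non-negative, the accumulated sum is monotone, so the first index satisfying the condition is also the minimum integration range.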
[Coding range setting process considering quantization noise]
The above described a method of setting the coding frequency range 70 that focuses mainly on the noise generated by using the composite waveform 11 (the noise spectrum 13), among the noise components included in the interpolated data generated at the time of packet loss (see FIG. 2B).
In addition to the noise due to the composite waveform 11, the interpolated data may include quantization noise associated with the coding of the redundant data 3. For example, in the process of encoding the redundant data 3, the original data 6 is quantized according to the set compression rate and the like. The higher the compression rate and the lower the quantization precision (bit rate, etc.), the smaller the data amount, but also the larger the quantization noise.
For example, a use case is also conceivable in which the target data amount of the redundant data 3 (hereinafter written as nbit) is predetermined so that the size of the redundant data 3 does not become larger than necessary relative to the main data 2. In this case, the compression rate and the like are set so that the data amount of the encoded redundant data 3 fits within nbit. Therefore, for example, when the coding frequency range 70 is large, the compression rate is set high, and the quantization noise may increase.
FIG. 13 is a schematic diagram for explaining the total noise amount of the interpolated data. FIGS. 13A and 13B are schematic graphs showing the frequency distribution of the noise included in the interpolated data. The horizontal axis of the graphs is frequency, and the vertical axis is the noise intensity at each frequency.
As shown in FIG. 13A, the coding frequency range 70 set on the low-frequency side is the region where the redundant data 3 is used, and there the noise of the interpolated data is represented by the quantization noise Nq(k). The region above the coding frequency range 70 is the region where the composite waveform 11 is used, and there the noise of the interpolated data is represented by the noise Pnoise(k) due to the composite waveform.
In FIG. 13B, a coding frequency range 70 wider than that in FIG. 13A is set. In this case, the total amount of Pnoise(k) decreases and the total amount of Nq(k) increases.
FIG. 13C is a graph showing the relationship between the total noise amount of the interpolated data and the coding frequency range. Here, curves of the total amount Pnoise of the noise due to the composite waveform, the total amount Nq of the quantization noise, and the total noise amount (Pnoise + Nq) of the interpolated data are shown for varying values of the maximum index (redun_area) of the coding frequency range 70.
For example, when redun_area shifts to the high-frequency side, Pnoise decreases but Nq increases. That is, Pnoise and Nq are in a trade-off relationship with respect to redun_area. Therefore, the total noise amount (Pnoise + Nq) of the interpolated data is represented by a downward-convex curve, as shown in FIG. 13C, and takes its minimum value when redun_area is set to a certain frequency.
Hereinafter, the coding range setting process taking the quantization noise into account will be described.
In this process, for the frequency range assigned to the redundant data 3, the quantization noise Nq(k) that will occur is calculated in a simplified manner, in place of Pnoise(k), from information such as the quantization precision determined according to the coding method of the redundant data 3. The coding frequency range 70 is then set so that the noise power (total noise amount) over the entire interpolated data is minimized.
For example, suppose that the target data amount of the redundant data 3 is set. In this case, the target data amount (nbit) and information on the coding method used for coding the redundant data 3 are input to the coding range setting unit 33. Then, the quantization noise Nq(k) generated at each frequency is calculated. For example, a value approximating the quantization noise Nq(k) is calculated according to the coding method of the redundant data 3.
Using this quantization noise Nq(k), the total noise amount Pnoise-residue of the interpolated data is expressed using the following equation.
Figure JPOXMLDOC01-appb-M000009
That is, P noise-residue is expressed as the total amount of noise remaining in the range other than the coded frequency range 70 (total amount of noise according to the composite waveform 11) and the total amount of quantization noise N q (k) in the coded frequency range 70. Will be done.
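The placeholder above stands for an equation image (Equation 9). Based on the surrounding description (composite-waveform noise outside the coded range plus quantization noise inside it), a plausible reconstruction is:

```latex
P_{\mathrm{noise\text{-}residue}}
  = \sum_{k=\mathrm{redun\_area}+1}^{N-1} P_{\mathrm{noise}}(k)
  + \sum_{k=0}^{\mathrm{redun\_area}} N_q(k)
```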
The redun_area that minimizes Pnoise-residue shown in (Equation 9) is calculated.
As explained with reference to FIG. 13C, Pnoise-residue is expected to be convex downward in the region of redun_area < N. Therefore, for example, if Pnoise-residue at redun_area = ki is smaller than its values at ki-1 and ki+1, then that ki is the redun_area that minimizes Pnoise-residue.
For example, by varying redun_area appropriately and comparing the magnitudes of Pnoise-residue before and after each change, the redun_area that minimizes Pnoise-residue is calculated and set as the maximum index of the coded frequency range 70.
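The search described above can be illustrated with a short sketch (not part of the patented implementation; the function name and plain-list inputs are assumptions). For simplicity it evaluates the residue at every candidate redun_area rather than only comparing neighboring values, which gives the same answer for the downwardly convex profile of FIG. 13C:

```python
def find_redun_area(p_noise, n_q):
    """Return the redun_area index that minimizes the interpolated data's
    total noise:  Pnoise-residue = sum(p_noise[r+1:]) + sum(n_q[:r+1]).

    p_noise : per-frequency noise power of the composite waveform
    n_q     : per-frequency quantization noise of the redundant data
    """
    n = len(p_noise)
    best_r, best_residue = 0, float("inf")
    tail = sum(p_noise)      # composite-waveform noise not yet covered
    head = 0.0               # quantization noise accumulated so far
    for r in range(n):
        tail -= p_noise[r]   # indices 0..r are now covered by redundant data
        head += n_q[r]
        residue = tail + head
        if residue < best_residue:
            best_r, best_residue = r, residue
    return best_r
```

For instance, with a decreasing p_noise and a constant n_q, the minimum falls where the marginal quantization noise first outweighs the composite-waveform noise removed.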
In this way, the coding range setting unit 33 calculates, based on the noise spectrum 13 and the quantization noise Nq(k) accompanying the coding of the redundant data 3, the total noise amount Pnoise-residue of the interpolated data in which the redundant data 3 is interpolated with the composite waveform 11, and sets the coded frequency range 70 so that the total noise amount Pnoise-residue is minimized. This makes it possible to minimize the sum of noise over the entire band that is important for hearing. As a result, the quality of the interpolated data can be sufficiently improved.
[Coding range setting process using total power]
It is also possible to set the coded frequency range 70 based on the intensity (power) of the spectral components that become the redundant data 3.
For example, a target power Ptarget is set as a threshold for the total power of the frequency range to be used as the redundant data 3. Ptarget is calculated using, for example, a table that records the Ptarget corresponding to the target data amount (nbit) of the redundant data 3, or a formula that calculates Ptarget according to the target data amount. Based on this Ptarget, the following conditional expression is set.
Figure JPOXMLDOC01-appb-M000010
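The placeholder above stands for an equation image (Equation 10); from the description that follows it, a plausible reconstruction is:

```latex
\sum_{k=0}^{\mathrm{redun\_area}} \lvert X(k) \rvert^{2} < P_{\mathrm{target}}
```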
The left side of (Equation 10) is the integrated power of the spectral components of the original waveform 4 (original data 6) over k = 0 to redun_area. This integrated value represents the total power of the spectral components extracted as the redundant data 3.
The coding range setting unit 33 calculates the maximum redun_area that satisfies (Equation 10) and sets it as the maximum index of the coded frequency range 70. That is, the maximum frequency range in which the total power of the redundant data 3 is less than the target power Ptarget is set as the coded frequency range 70. By using this method, the coded frequency range 70 can be set easily.
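The condition of (Equation 10) can be sketched as follows (an illustrative fragment; the function name and list-based spectrum input are assumptions):

```python
def set_range_by_power(spectrum, p_target):
    """Largest redun_area such that the cumulative power of the original
    spectrum X(k) over k = 0..redun_area stays below p_target.

    Returns -1 if even the first bin already reaches the target power.
    """
    total = 0.0
    redun_area = -1
    for k, x in enumerate(spectrum):
        total += abs(x) ** 2
        if total >= p_target:
            break
        redun_area = k
    return redun_area
```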
Note that, as a characteristic of human hearing, even the same power may be perceived with different intensity in different frequency regions. For this reason, it is desirable to correct in advance the frequency distributions such as Pnoise(k), Nq(k), and |X(k)| used in the coding range setting processes described above.
For example, a threshold thresh(k), set for each frequency based on human auditory characteristics, is subtracted from the value of each frequency distribution. This thresh(k) is set using, for example, a loudness curve showing the frequency distribution of volumes audible to humans. If the value after subtraction is negative, it is set to 0. Alternatively, a process of weighting the values of each frequency distribution according to thresh(k) may be executed.
By applying a correction according to human auditory characteristics in this way, it is possible to avoid counting noise components that would be inaudible anyway. As a result, the coded frequency range 70 can be set appropriately.
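The threshold correction described above can be sketched as follows (illustrative only; names are assumptions). Each distribution value is reduced by the per-frequency audibility threshold thresh(k) and clamped at 0:

```python
def correct_for_loudness(dist, thresh):
    """Subtract a per-frequency audibility threshold thresh(k) from a
    frequency distribution (e.g. Pnoise(k), Nq(k) or |X(k)|), clamping
    negative results to 0 so inaudible components are not counted."""
    return [max(d - t, 0.0) for d, t in zip(dist, thresh)]
```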
[Receiver configuration]
Hereinafter, the configuration of the receiving device 50 in the transmitting/receiving system 100 will be described.
As shown in FIG. 7, the receiving device 50 includes a communication controller 51, a reception buffer 52, a demux unit 53, a main data buffer 54, a redundant data buffer 55, a reproduction data selection unit 56, a decoding unit 57, a signal processing unit 58, and an audio DAC 59. The receiving device 50 is configured using, for example, a computer including a CPU and a memory. The receiving method according to the present embodiment is executed when the receiving device 50 executes the program according to the present embodiment and each unit operates.
The communication controller 51 monitors, for example, BLE communication and controls the communication state. The communication controller 51 generates packet loss information, for example, when a packet 1 is lost. Based on this information, error concealment in the receiving device 50 is started.
The reception buffer 52 is a buffer that receives the packet 1 transmitted from the transmission device 20 and temporarily stores it.
As described above, the packet 1 includes the encoded main data 2 and redundant data 3.
Of these, the main data 2 is data obtained by encoding, in the transmission device 20, the original data 6 included in the transmission frame 8.
The redundant data 3 is data obtained by encoding the spectral components of the coded frequency range 70 out of the original data 6 included in the target frame 7. That is, the reception buffer 52 receives the redundant data 3 assigned, based on the waveform quality of the restored waveform 5 for the target frame 7 of the waveform data, to the coded frequency range 70 within the frequency range of the waveform data (original data 6) included in the target frame 7. In the present embodiment, the reception buffer 52 corresponds to the receiving unit.
The demux unit 53 reads the packet 1 stored in the reception buffer 52 as appropriate and separates it into the encoded main data 2 and the encoded redundant data 3. A data number (frame ID) for identifying the data in the reproduction data selection unit 56 described later is added to each separated piece of data.
The demux unit 53 also inquires of the communication controller 51 about packet loss information. When packet loss occurs, the packet loss information is output to each subsequent unit.
The main data buffer 54 and the redundant data buffer 55 are buffers that temporarily store the encoded main data 2 and redundant data 3 separated by the demux unit 53.
The reproduction data selection unit 56 reads data to be reproduced (hereinafter referred to as reproduction data) from the main data buffer 54 or the redundant data buffer 55. The reproduction data is selected in chronological order so that the frames are reproduced in the proper order.
The reproduction data selection unit 56 also notifies the signal processing unit 58 of the presence or absence of reproduction data.
The decoding unit 57 reads the reproduction data (the main data 2 or one or more pieces of redundant data 3) selected by the reproduction data selection unit 56, and decodes each piece of data according to the corresponding coding method.
The signal processing unit 58 performs signal processing on the data (main data 2 or redundant data 3) decoded by the decoding unit 57 and generates digital data representing the final time waveform.
For example, when there is no packet loss and the main data 2 has been properly received, frequency-time conversion (for example, IMDCT) is executed on the decoded main data 2.
Also, for example, when packet loss occurs and the main data 2 to be reproduced does not exist, interpolated data is generated based on the corresponding redundant data 3 and the data for generating a composite waveform (such as the main data 2 of frames near the lost data). Processing such as frequency-time conversion is then executed on this interpolated data.
Here, as the method of generating the composite waveform, a method of copying the original data 6 (main data 2) of the frame one frame before the target frame 7 is used.
The audio DAC 59 performs digital-to-analog conversion on the digital data processed by the signal processing unit 58 to generate an analog audio signal. This audio signal is input to a reproduction element such as a speaker (not shown), and the sound of the audio file (the waveform of the waveform data 10) is reproduced.
FIG. 14 is a block diagram showing a configuration example of the signal processing unit 58 included in the receiving device 50.
The signal processing unit 58 includes a spectrum replacement unit 60, a spectrum buffer 61, an IMDCT unit 62, and a time signal output unit 63.
When packet loss occurs, the spectrum replacement unit 60 executes spectral component replacement processing on the redundant data 3 decoded by the preceding decoding unit 57. When no packet loss has occurred, it acquires the decoded main data 2 and outputs it to the subsequent stage as-is. In the following, the data output from the spectrum replacement unit 60 is referred to as spectrum data.
Information specifying the presence or absence of the redundant data 3 and information specifying the spectral component replacement method are also input to the spectrum replacement unit 60. The information specifying the replacement method includes information specifying the data used for replacement and information specifying the frequency range to be replaced. The spectral component replacement processing is executed based on this information.
For example, when packet loss occurs, the redundant data 3 for the lost frame (target frame 7) and the spectrum data of the frame one frame before the target frame 7, stored in the spectrum buffer 61, are input to the spectrum replacement unit 60.
As described above, the receiving device 50 uses the method in which a copy of the main data 2 of the frame one frame before the target frame is used as the composite waveform 11. Therefore, the spectrum data of the previous frame serves as the composite data representing the spectrum of the composite waveform 11.
The spectrum replacement unit 60 uses the spectrum data (composite data) of the previous frame to replace the spectral components of the interpolation range 71, i.e., the range other than the coded frequency range 70 assigned to the redundant data 3, and outputs new spectrum data (interpolated data).
That is, the spectrum replacement unit 60 and the spectrum buffer 61 generate the composite waveform 11 for the target frame 7 and generate interpolated data in which the redundant data 3 is interpolated with the composite waveform 11. The waveform represented by this interpolated data is the restored waveform 5.
Thus, the spectrum data output from the spectrum replacement unit 60 is the decoded main data 2 when no packet loss has occurred, and the interpolated data when packet loss has occurred. These spectrum data are stored in the spectrum buffer 61 as appropriate, in case spectral component replacement processing is required for subsequent frames.
In the present embodiment, the spectrum replacement unit 60 and the spectrum buffer 61 work together to realize a waveform restoration unit that generates the restored waveform based on the redundant data.
The IMDCT unit 62 executes IMDCT on the spectrum data (the decoded main data 2 or the interpolated data) output from the spectrum replacement unit 60. As a result, data representing the time waveform is restored in frame units.
The time signal output unit 63 applies a synthesis window (see FIG. 6) to the IMDCT result and executes overlap-add with the previous IMDCT result. This makes it possible to reconstruct a temporally continuous digital audio signal (digital data). The result of the overlap-add is output to the audio DAC in the subsequent stage.
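One step of the windowed overlap-add performed by the time signal output unit 63 can be sketched as follows (a generic 50%-overlap illustration with assumed names; the actual synthesis window of FIG. 6 is not reproduced here):

```python
def overlap_add(prev_tail, cur_imdct, window):
    """One windowed overlap-add step with 50% overlap.

    prev_tail : second half of the previous windowed IMDCT frame (length N)
    cur_imdct : current IMDCT output (length 2N)
    window    : synthesis window (length 2N)
    Returns (output_samples, new_tail) where output_samples are the N
    reconstructed samples and new_tail is kept for the next frame.
    """
    n = len(cur_imdct) // 2
    windowed = [w * x for w, x in zip(window, cur_imdct)]
    out = [prev_tail[i] + windowed[i] for i in range(n)]
    return out, windowed[n:]
```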
FIG. 15 is a flowchart showing an example of the operation of the signal processing unit 58. This process is a loop process executed continuously on a frame-by-frame basis.
In the following, it is assumed that the presence or absence of the redundant data 3 has been notified and that, when the corresponding main data 2 or redundant data 3 exists, the signal processing unit 58 (spectrum replacement unit 60) has acquired the spectrum of the main data 2 or redundant data 3 whose decoding was completed in the preceding processing.
First, it is determined whether the data acquired by the spectrum replacement unit 60 is the main data 2 (step 301). When the acquired data is the main data 2 (YES in step 301), the main data 2 is stored in the spectrum buffer 61 (step 306).
When proper main data 2 cannot be acquired, for example due to loss of a packet 1, the spectrum replacement unit 60 acquires the corresponding redundant data 3. When the acquired data is not the main data 2 in this way (NO in step 301), the spectrum of the previous processing result is acquired from the spectrum buffer 61 (step 302). For example, the processing result of one frame before, that is, the main data 2 of one frame before, is acquired. Note that the spectrum of the previous processing result is an MDCT spectrum generated by the MDCT executed in the transmission device 20.
When the processing result of one frame before is acquired, the spectrum replacement unit 60 executes waveform/spectrum synthesis processing (step 303). Specifically, this is processing that generates the spectrum X'dec[] of the composite waveform 11 from the processing result of one frame before, using a predetermined waveform synthesis method. Note that X[] denotes an array indexed by the index k corresponding to frequency.
Here, since the waveform synthesis method of using a copy of the previous frame as the composite waveform is employed, the spectrum of the previous processing result stored in the spectrum buffer is used as-is as the spectrum X'dec[] of the composite waveform 11.
Next, the spectrum replacement unit 60 executes replacement region setting processing that sets the regions in which spectral components are to be replaced (step 304). In this processing, the spectrum Xout[] of the frame to be reproduced is prepared. This Xout[] will be output as the interpolated data.
The replacement region setting processing calculates, for Xout[], the indices to which the spectrum Xredun[] of the redundant data 3 is assigned and the indices to which the spectrum X'dec[] of the composite waveform 11 is assigned.
Next, the spectrum replacement unit 60 executes spectrum replacement processing that replaces the spectral components (step 305). Specifically, based on the indices assigned in the replacement region setting processing, each spectral component of Xout[] is replaced with a spectral component of the redundant data 3 or of the composite waveform 11. As a result, Xout[] becomes the interpolated data in which the redundant data 3 is interpolated with the composite waveform 11.
The replacement region setting processing and the spectrum replacement processing will be described in detail later.
The Xout[] generated as the interpolated data is stored in the spectrum buffer 61 for the processing of the next frame (step 306).
The spectrum data (main data 2 or interpolated data) processed by the spectrum replacement unit 60 is input to the IMDCT unit 62, and IMDCT processing is executed (step 307). As a result, the MDCT spectrum is converted into data representing the time waveform.
Finally, the time signal output unit 63 applies the synthesis window to the IMDCT result and overlap-adds it with the IMDCT result of one frame before, reconstructing the digital audio signal.
FIG. 16 is a flowchart showing an example of the replacement region setting processing. The processing shown in FIG. 16 is an example of the internal processing of step 304 in FIG. 15.
FIG. 17 is a schematic diagram showing an example of the frequency ranges set by the replacement region setting processing.
In the following, the variable indicating the index of each spectrum data is written as isp. The array storing the spectrum indices of the redundant data 3 is written as redun_isp[], and the array storing the indices of the spectrum to be replaced with the composite waveform 11 is written as replace_isp[].
The replacement region setting processing is processing that assigns appropriate indices to redun_isp[] and replace_isp[].
First, it is determined whether the redundant data 3 corresponding to the frame ID to be reproduced exists (step 401). When the redundant data 3 exists (YES in step 401), the processing of storing indices in redun_isp[] (step 402) is executed, followed by the processing of storing indices in replace_isp[] (step 403).
FIG. 17A illustrates the index ranges (frequency ranges) added to redun_isp[] and replace_isp[], respectively.
In step 402, the indices included in the coded frequency range 70 (here, from 0 to the index indicating the highest frequency to be coded) are stored in redun_isp[] as consecutive numbers. That is, redun_isp[] stores the indices of the decoded redundant data 3 (Xredun[]). For example, when the spectrum up to isp = 100 is used as the redundant data 3, the 101 numbers from 0 to 100 are entered in redun_isp[].
In step 403, the indices included in the interpolation range 71, which is the frequency range other than the coded frequency range 70, are stored in replace_isp[] as consecutive numbers. That is, the indices other than those stored in redun_isp[] are stored in replace_isp[]. For example, when the maximum index of the coded frequency range 70 is 100 and the total number of spectra is set to 256, the numbers 101 to 255 are entered in replace_isp[]. For the indices stored in replace_isp[], the spectrum X'dec[] of the composite waveform 11 is used.
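Steps 402 and 403 can be sketched as follows (illustrative; the function name and arguments are assumptions):

```python
def set_replacement_regions(max_coded_index, total_spectra):
    """Build redun_isp / replace_isp index lists for the case where
    redundant data exists (FIG. 17A): indices 0..max_coded_index come
    from the redundant data, the remaining indices up to total_spectra-1
    come from the composite waveform."""
    redun_isp = list(range(max_coded_index + 1))
    replace_isp = list(range(max_coded_index + 1, total_spectra))
    return redun_isp, replace_isp
```

With max_coded_index = 100 and total_spectra = 256 this reproduces the example in the text: 101 redundant-data indices (0 to 100) and composite-waveform indices 101 to 255.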
When the redundant data 3 corresponding to the frame ID to be reproduced does not exist (NO in step 401), redun_isp[] is processed as having no elements to enter (step 404). Therefore, redun_isp[] is left empty.
Also, in the present embodiment, when the redundant data 3 does not exist, processing that generates substitute data using only the spectrum of the previous processing result stored in the spectrum buffer 61 is executed.
In this processing, a mode that includes tone components and a mode that excludes tone components can be selected when using the spectrum of the previous processing result. Here, a tone component is a tonal spectral component. For example, a spectral component that reproduces a sound (tone) of a constant frequency is a tone component. If the spectrum of the previous processing result is used as-is, the phase of a tone component may shift and the time waveform may become discontinuous, which can cause an audible sense of unnaturalness.
In the processing shown in FIG. 16, it is determined whether to use the mode that excludes tone components, that is, the mode that separates tone components from the spectrum of the previous processing result (step 405).
When the mode that excludes tone components is selected (YES in step 405), tone component detection processing that detects tone components in the spectrum of the previous processing result is executed (step 406). The method of detecting tone components is not limited, and any method capable of detecting tonal frequency ranges may be used.
When tone components are detected, tone_isp[] is generated as a list of indices specifying the tone components and the frequency ranges in their vicinity (step 407). For example, for each detected tone component, indices specifying the frequency range of ±j (for example, j = 3) around the tone component are calculated as consecutive numbers and entered in tone_isp[].
Next, the processing of storing the necessary indices in replace_isp[] is executed. Specifically, of the spectrum indices of the previous processing result, all indices other than those in tone_isp[] are stored in replace_isp[].
FIG. 17B illustrates the index ranges (frequency ranges) added to replace_isp[]. As shown in FIG. 17B, replace_isp[] is set to the frequency ranges excluding tone_isp[] (the shaded ranges in the figure). For the indices stored in replace_isp[], the spectrum X'dec[] of the composite waveform 11 is used. Note that the spectral components corresponding to tone_isp[] are not used as data for reproduction. This makes it possible to suppress the audible sense of unnaturalness caused by discontinuous tone components.
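The tone-exclusion mode (steps 406 to 408) can be sketched as follows, assuming the tone peaks have already been detected by some method. The names and the clamping of the ±j neighbourhood at the band edges are assumptions:

```python
def replace_isp_excluding_tones(total_spectra, tone_peaks, j=3):
    """Build replace_isp for the tone-exclusion mode (FIG. 17B): all
    indices of the previous frame's spectrum except a +/-j neighbourhood
    around each detected tone peak. tone_isp collects the excluded
    indices, which are not used as data for reproduction."""
    tone_isp = set()
    for peak in tone_peaks:
        for k in range(max(0, peak - j), min(total_spectra, peak + j + 1)):
            tone_isp.add(k)
    replace_isp = [k for k in range(total_spectra) if k not in tone_isp]
    return sorted(tone_isp), replace_isp
```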
When the mode that excludes tone components is not selected (NO in step 405), all spectrum indices of the previous processing result are stored in replace_isp[]. That is, when tone components are not excluded, the spectrum X'dec[] of the composite waveform 11 is used as-is as the substitute data.
FIG. 18 is a flowchart showing an example of the spectrum replacement processing. The processing shown in FIG. 18 is an example of the internal processing of step 305 in FIG. 15.
Here, as in the description of FIG. 15, the spectrum of the redundant data 3 is written as Xredun[] and the spectrum of the composite waveform 11 as X'dec[]. The spectrum data output as the result of the spectrum replacement processing is written as Xout[].
Note that X'dec[] and Xout[] are N spectra satisfying 0 ≤ k < N when MDCT with an analysis length of 2N is used. Also, let isp be the variable indicating the spectrum index. The loop processing shown in FIG. 18 is executed by scanning this isp.
First, isp and Xout[] are initialized (step 501). Specifically, isp and all the elements of Xout[] are set to 0 (isp = 0, Xout[] = 0).
Next, it is determined whether the current isp is included in redun_isp[] (step 502). When isp is included in redun_isp[] (YES in 502), the corresponding spectral component (Xredun[isp]) exists in the redundant data 3, so that component is assigned to Xout[isp] (step 503).
When isp is not included in redun_isp[] (NO in 502), it is determined whether isp is included in replace_isp[] (step 504).
When isp is included in replace_isp[] (YES in 504), the corresponding spectral component (X'dec[isp]) exists in the spectrum of the composite waveform 11, so that component is assigned to Xout[isp] (step 505).
If the redundant data 3 does not exist and redun_isp[] remains empty over several consecutive frames, X'dec[isp] may be multiplied by a weight less than 1 that changes according to the number of times the redundant data 3 was absent, so that the sound can be faded out.
When the current isp is included in neither redun_isp[] nor replace_isp[] (NO in step 504), no assignment is performed for Xout[isp].
When step 503 or step 505 is completed, and when NO is determined in step 504, isp is incremented to scan the next index (step 506).
It is then determined whether the incremented isp is smaller than the total number of MDCT spectra (step 507). If isp is smaller than the total number of spectra (YES in step 507), the processing from step 502 onward is executed again. In this way, the spectrum in which the redundant data 3 and the composite waveform 11 are combined (the interpolated data) is stored in Xout[] until isp reaches (total number of spectra N) − 1.
When isp becomes equal to or larger than the total number of spectra (NO in step 507), the spectrum replacement process is completed and Xout[] is output to the IMDCT unit 62.
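The loop of steps 501 through 507 can be expressed as a minimal Python sketch. The function name and data representations below are assumptions for illustration: redun_isp and replace_isp are modeled as sets of spectral indices, and the spectra as plain lists.

```python
def spectrum_replacement(x_redun, x_dec, redun_isp, replace_isp, n_spectra):
    """Combine the redundant-data spectrum with the composite-waveform spectrum.

    x_redun     : spectrum of the redundant data 3 (Xredun[])
    x_dec       : spectrum of the composite waveform 11 (X'dec[])
    redun_isp   : indices covered by the coded frequency range(s)
    replace_isp : indices covered by the interpolation range
    n_spectra   : total number of MDCT spectra N
    """
    x_out = [0.0] * n_spectra          # step 501: initialize Xout[]
    for isp in range(n_spectra):       # steps 502-507: scan isp
        if isp in redun_isp:           # step 502 YES -> step 503
            x_out[isp] = x_redun[isp]
        elif isp in replace_isp:       # step 504 YES -> step 505
            x_out[isp] = x_dec[isp]
        # otherwise (step 504 NO) Xout[isp] stays 0
    return x_out                       # handed to the IMDCT unit
```

Indices covered by neither set remain zero, matching the NO branch of step 504.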
As described above, in the transmission device 20 according to the present embodiment, the waveform quality (noise spectrum 13) of the composite waveform 11 is predicted for the target frame 7 of the waveform data. Based on this waveform quality, one coded frequency range 70 to be assigned to the redundant data 3 is set within the frequency range of the waveform data included in the target frame 7. Transmission data (packet 1) including the redundant data 3 generated on the basis of the coded frequency range 70 is then generated. This makes it possible to realize high-quality error concealment while suppressing the amount of transmitted data.
Further, the receiving device 50 according to the present embodiment receives the redundant data 3 assigned to one coded frequency range 70 within the frequency range of the waveform data included in the target frame 7. This coded frequency range 70 is set using the waveform quality of the composite waveform 11 for the target frame 7. Interpolated data in which the received redundant data 3 is interpolated with the composite waveform 11 is then generated. This makes it possible to realize high-quality error concealment while suppressing the amount of transmitted data.
Conventionally, packet loss concealment methods that generate interpolated data from frames in the vicinity of a lost frame have been proposed.
For example, a "hybrid concealment method" that generates interpolated data using a different waveform synthesis method for each frequency band is known (see Patent Document 1).
Waveform synthesis performed in the time domain is generally effective for speech, whose energy is distributed up to about 4 kHz to 5 kHz, but its computational cost is large, and in higher bands a spurious harmonic structure may arise and produce a beeping artifact. On the other hand, waveform synthesis performed in the frequency domain works effectively on noise-like components, particularly in the high band, and its computational cost is often small, but for speech it may produce click noise due to phase discontinuity.
In a typical sound source, speech is concentrated in the mid-low band, while the high band often contains noise. Exploiting this, the "hybrid concealment method" performs waveform synthesis using a time-domain technique for the low band and a frequency-domain technique for the high band, thereby reducing the computational cost while maintaining the quality of the synthesized audio.
However, this method is a combination of existing waveform synthesis methods, and may not give good results when the waveform is aperiodic and pitch-period detection is difficult, for example with a combination of multiple instruments such as an orchestra, or when the power fluctuates. Moreover, a sound having a harmonic structure, such as a musical instrument, does not always lie in the same mid-low band as speech. Therefore, if the waveform synthesis methods are merely divided at a single frequency, the noise components associated with each concealment method may become audible. Thus, depending on the type of sound source, quality may deteriorate. In addition, since concealment in the mid-low band is performed in the time domain, which has a large computational cost, a certain computational load may remain.
Another possible method is to add redundant data of neighboring frames in advance to the data that is originally to be transmitted (the main data) and use it as interpolated data when a packet is lost. When redundant data is used in this way, the computational load on the receiving side caused by the concealment processing hardly increases and high quality can be achieved, but the amount of transmitted data increases significantly.
In contrast, a method of controlling whether or not to add redundant data has been proposed (Japanese Patent Laid-Open No. 2003-249957). In this method, the SN ratio or cepstrum distance between the waveform corresponding to the preceding and following frames synthesized from the main data and the original waveform of those frames is compared with a threshold, and redundant data is added only when the value is equal to or less than the threshold. This makes it possible to reduce the average amount of redundant data, but the amount of redundant data is not reduced when frames requiring redundant data continue.
In the present embodiment, the transmission device 20 encodes a frame in the vicinity of the main data that is originally to be transmitted only within a partial frequency range (the coded frequency range 70) to generate the redundant data 3.
The coded frequency range 70 is set based on the waveform quality (noise spectrum 13, etc.) of the composite waveform used in the receiving device 50.
When packet loss occurs, the receiving device 50 replaces, within the frequency spectrum of the redundant data 3 corresponding to the lost data, the interpolation range 71 other than the coded frequency range 70 with the frequency spectrum of the composite waveform 11 generated from neighboring frames that were received normally in the past. In this way, the interpolated data in which the redundant data 3 is interpolated using the composite waveform 11 is used as data for reproduction.
As a result, it is possible, for example, to use the redundant data 3 in ranges where the noise caused by the composite waveform would be large, and to use the composite waveform 11 in ranges where that noise is small. Consequently, the quality of the interpolated data can be sufficiently improved, and high-quality error concealment can be realized.
Further, the transmission device 20 predicts the waveform quality of the composite waveform in advance and sets the width of the coded frequency range 70 based on that quality. This makes it possible to set the width of the coded frequency range 70 appropriately for each frame. As a result, redundant data 3 covering an appropriate range can be generated for each frame. Moreover, since the width of the coded frequency range 70 changes as appropriate, the amount of redundant data 3 is reduced compared with, for example, using a fixed-width frequency range, and the amount of transmitted data can be suppressed.
When a target data amount is set for the redundant data 3, the quality of the interpolated data as a whole, including the quality of the redundant data 3, is predicted, and the coded frequency range 70 is set so that the quality of the interpolated data is maximized. This makes it possible to realize the highest-quality error concealment within the given data budget.
Also, in the present embodiment, by using the waveform quality of the composite waveform, the optimum coded frequency range 70 can be set according to the characteristics of the waveform synthesis method. As a result, even when a simple waveform synthesis method such as copying the previous frame is used, the coded frequency range 70 can be set so as to reduce the final amount of noise. This makes it possible to significantly reduce the computational load on the receiving device 50 side while maintaining the quality of the error concealment.
Note that the above description mainly concerned a coded frequency range 70 set on the low-frequency side. The present technology is not limited to this; for example, the lowest frequency of the coded frequency range 70 may be fixed to an arbitrary frequency according to the use case or the like. In this case, the highest frequency of the coded frequency range 70 is set according to the noise of the composite waveform. This makes it possible to realize high-quality error concealment tailored to the use case.
<Second embodiment>
The transmission/reception system of a second embodiment according to the present technology will be described. In the following description, the description of parts similar in configuration and operation to the transmission/reception system 100 described in the above embodiment is omitted or simplified.
FIG. 19 is a schematic diagram showing an example of the coded frequency ranges 70 according to the second embodiment.
In the present embodiment, a plurality of coded frequency ranges 70 are set as the frequency ranges assigned to the redundant data 3. The position, width, number, and the like of the coded frequency ranges 70 can be set freely.
Therefore, in the present embodiment, the transmission device 20 encodes the data of the target frame 7 only within coded frequency ranges 70 that are not necessarily limited to the low band, according to the waveform quality of the composite waveform, to generate the redundant data 3.
In the example shown in FIG. 19, two coded frequency ranges 70 are set. The spectral components included in the coded frequency ranges 70 are used as the redundant data 3. The range other than the coded frequency ranges 70 becomes the interpolation range 71, which is interpolated using the composite waveform.
The configuration of the transmission/reception system according to the present embodiment is substantially the same as, for example, the configuration of the transmission/reception system 100 (transmission device 20 and reception device 50) described in the above embodiment. Hereinafter, each component is described using the same reference numerals as in the transmission/reception system 100.
In the present embodiment, the content of the coding range setting process that sets the coded frequency ranges 70 and the structure of the data transmitted as the packet 1 differ from those of the above embodiment.
Specifically, in the coding range setting process, a process of setting a plurality of coded frequency ranges 70 is executed as shown in FIG. 19. In addition, meta information for specifying the plurality of coded frequency ranges 70 is added to the packet 1. In the transmission device 20, this meta information is encoded by the coding unit 25 as a part of the redundant data 3.
As described above, the transmission device 20 can freely select a plurality of coded frequency ranges 70. For this reason, the indices at both ends are used to specify each coded frequency range 70. Specifically, information specifying the lowest-frequency index lsp and the highest-frequency index hsp is generated as meta information.
In the meta information, the i-th coded frequency range 70 is specified as the frequency range from lsp_i to hsp_i. For example, suppose the second coded frequency range 70 is the frequency range from k = 20 to k = 33. In this case, the second coded frequency range 70 is expressed in the form (lsp_2, hsp_2) = (20, 33).
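The (lsp_i, hsp_i) metadata format described above can be illustrated with a short Python sketch. The list-of-pairs representation and the helper name below are assumptions, not the document's actual encoding.

```python
def expand_ranges(ranges):
    """Expand (lsp, hsp) index pairs into the set of coded spectral indices.

    Each pair specifies one coded frequency range 70; both endpoints are
    included, matching the text's example of k = 20 .. 33.
    """
    isp = set()
    for lsp, hsp in ranges:
        isp.update(range(lsp, hsp + 1))  # inclusive of hsp
    return isp

# Hypothetical metadata: two coded frequency ranges, the second being
# (lsp_2, hsp_2) = (20, 33) as in the text.
ranges = [(5, 12), (20, 33)]
```

The resulting index set plays the role of redun_isp[] on the receiving side.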
Note that as the number of coded frequency ranges 70 increases, the meta information also increases, which may squeeze the amount of data available for transmitting the main data 2 and the redundant data 3.
To prevent such a situation, in the present embodiment, the maximum allowable number of coded frequency ranges 70 is set in advance. In the coding range setting process, a plurality of candidate ranges that are candidates for the coded frequency ranges 70 are calculated, and these candidate ranges are aggregated so as to fit within the maximum number. This point will be described in detail later with reference to FIGS. 24, 25, and the like.
[Calculation of the coded frequency ranges]
In the present embodiment, the coding range setting unit 33 calculates at least one excess range in which the noise spectrum 13 exceeds a second threshold, and sets the coded frequency ranges 70 based on the at least one excess range.
The noise spectrum 13 is the power spectrum Pnoise(k) of the noise caused by using the composite waveform, and is calculated, for example, by the method described with reference to FIG. 10. The second threshold is set according to, for example, the allowable noise power.
The excess range is a frequency range in which the spectral components (noise power) of the noise spectrum 13 exceed the second threshold. Accordingly, the number of calculated excess ranges changes according to the state of the noise spectrum 13; that is, the number of excess ranges may differ from frame to frame.
By using the second threshold in this way, it is possible to selectively detect the frequency ranges in which the noise power exceeds the allowable level. This makes it possible, for example, to exclude low-noise ranges from the frequency range of the redundant data 3, and thus to suppress the amount of the redundant data 3.
FIG. 20 is a schematic diagram showing a calculation example of the coded frequency ranges 70. In FIG. 20, the noise spectrum 13 (Pnoise(k)) is drawn as a thick solid line, and the second threshold is drawn as a dotted line.
As shown in FIG. 20, in the present embodiment, a threshold curve 14 defined per frequency is used as the second threshold. Specifically, a threshold thresh(k) set for each frequency based on human auditory characteristics (for example, a loudness curve) is used as the threshold curve 14.
In thresh(k), for example, a low threshold is set for frequencies that humans hear easily, and a high threshold is set for frequencies that humans hear poorly. This makes it possible to selectively assign to the redundant data 3 the ranges in which the composite waveform 11 would produce critical, easily audible noise. In addition, by assigning to the composite waveform 11 the ranges in which the generated noise is hard to hear, the amount of the redundant data 3 can be suppressed.
In the coding range setting process, the coding range setting unit 33 compares the noise spectrum 13 (Pnoise(k)) with the threshold curve 14 (thresh(k)), and calculates the frequency ranges (excess ranges 74) in which the noise power exceeds thresh(k). Specifically, the set of indices k satisfying the following relationship is calculated:

  Pnoise(k) > thresh(k)   …(11)

As Pnoise(k), the moving average of the noise power (Pnoise-smoothed(k)) calculated using the above (Equation 4) is typically used. Using the averaged noise spectrum 13 keeps down the number of calculated excess ranges 74, which reduces the computational load required for the subsequent processing. Alternatively, a noise spectrum calculated using (Equation 1) to (Equation 3) may be used.
In the present embodiment, a frequency range that includes, in addition to the excess range 74 satisfying the relationship of (Equation 11), a predetermined number of indices located before and after the excess range 74 (for example, three indices) is set as the coded frequency range 70.
In FIG. 20, the regions set as the coded frequency ranges 70 are schematically shown as shaded regions. By adding the spectra in the vicinity of the excess range 74 to the redundant data 3 in this way, the generation of noise due to the composite waveform 11 can be reliably suppressed. Of course, the excess range 74 may also be used as the coded frequency range 70 as it is.
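The comparison of (Equation 11) followed by the widening with neighboring indices can be sketched in Python. The function below is an illustration under assumed data representations (plain lists for the spectra, a `pad` parameter for the predetermined number of neighboring indices); it is not the document's actual implementation.

```python
def candidate_ranges(p_noise, thresh, pad=3):
    """Find indices where Pnoise(k) > thresh(k) (Eq. 11, the excess
    ranges 74), widen each by `pad` neighboring indices, and return
    the merged candidate ranges as inclusive (lsp, hsp) pairs."""
    n = len(p_noise)
    cand = set()
    for k in range(n):
        if p_noise[k] > thresh[k]:  # index belongs to an excess range 74
            cand.update(range(max(0, k - pad), min(n, k + pad + 1)))
    # merge contiguous indices into (lsp, hsp) candidate ranges 75
    ranges, start = [], None
    for k in range(n + 1):
        if k < n and k in cand:
            if start is None:
                start = k
        elif start is not None:
            ranges.append((start, k - 1))
            start = None
    return ranges
```

With `pad=0` the function returns the excess ranges 74 themselves, corresponding to using them directly as coded frequency ranges.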
Note that the coded frequency ranges 70 set by the method shown in FIG. 20 can be aggregated and adjusted by subsequent processing. Therefore, the frequency range set using the excess range 74 and the indices before and after it can be regarded as a candidate range 75, that is, a candidate for a coded frequency range 70.
Thus, in the present embodiment, a plurality of candidate ranges 75 that are candidates for the coded frequency ranges 70 are calculated, and the coded frequency ranges 70 are set based on the plurality of candidate ranges 75.
FIG. 21 is a flowchart showing an example of the coding range setting process. The process shown in FIG. 21 is executed, for example, by the coding range setting unit 33 in place of step 104 shown in FIG. 9.
First, the candidate ranges 75 are set based on the noise spectrum 13 (Pnoise(k)) and the threshold curve 14 (thresh(k)) (step 601). Here, the candidate ranges 75 are set according to the method described with reference to FIG. 20. That is, the index ranges satisfying the relationship shown in (Equation 11) (the excess ranges 74) are calculated, and frequency ranges including a predetermined number of indices before and after them are set as the candidate ranges 75.
Next, a process of adjusting the threshold curve 14 (thresh(k)) based on the spectrum X'(k) of the composite waveform 11 and readjusting the candidate ranges 75 is executed (step 602). Specifically, thresh(k) is adjusted in the manner of calculating a masking threshold, with reference to the power of the spectrum X'(k) of the composite waveform 11.
In general, when a sound of a certain frequency is present, there may be bands (masked bands) in which sounds below a certain volume become hard to hear. thresh(k) is adjusted on the basis of the volume at which sound becomes hard to hear (the masking threshold).
For example, spectral components that may be reproduced are detected from the power of X'(k) in regions other than the candidate ranges 75 set once. For bands rendered inaudible by these spectral components, thresh(k) is set high on the assumption that even high noise power causes little auditory discomfort. As a result, the candidate ranges 75 become narrower, and the amount of the redundant data 3 can be suppressed.
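The threshold adjustment of step 602 could be sketched as follows. The masking model here is a deliberately crude assumption: the threshold is simply raised to a fixed offset (in dB) below the composite waveform's own power at each bin, whereas a real psychoacoustic model would also spread the masking effect across neighboring frequencies. All names and the offset value are illustrative.

```python
def adjust_threshold(thresh, p_synth, candidate, offset_db=10.0):
    """Raise thresh(k) at bins outside the candidate ranges where the
    composite waveform's own spectral power p_synth(k) would mask the
    concealment noise (crude single-bin masking assumption)."""
    out = list(thresh)
    for k in range(len(thresh)):
        if k not in candidate:
            # assumed masking level: p_synth attenuated by offset_db
            mask = p_synth[k] / (10.0 ** (offset_db / 10.0))
            out[k] = max(out[k], mask)
    return out
```

Re-running the excess-range calculation with the raised threshold then narrows the candidate ranges 75, as the text describes.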
Next, a non-tone component exclusion process that excludes non-tone components from the candidate ranges 75 is executed (step 603). Here, a non-tone component is a component other than the tone components included in the spectrum X(k) of the original waveform 4 or the spectrum X'(k) of the composite waveform 11. A non-tone component can be regarded as a simple noise-like component that disappears, for example, when the frame changes. Therefore, among the spectral components satisfying the relationship of (Equation 11), the components corresponding to non-tone components have little perceptual effect even if they are replaced with, for example, the composite waveform 11.
The non-tone component exclusion process excludes such non-tone components from the candidate ranges 75 to narrow them. This point will be described in detail later.
Next, a frequency range aggregation process that aggregates the candidate ranges 75 calculated in the steps above is executed (step 604). This process reduces the number of candidate ranges 75.
Finally, a frequency range adjustment process that reduces the width of the remaining candidate ranges 75 is executed (step 605). This process is particularly effective in use cases where a target data amount exists for the redundant data 3 and that amount is small, so that quantization noise becomes a problem.
The processes of steps 602, 603, and 605 may be executed as appropriate according to, for example, the required noise level.
[Non-tone component exclusion process]
FIG. 22 is a schematic diagram for explaining the non-tone component exclusion process.
In the non-tone component exclusion process, the tone components 15 are extracted from the original waveform 4 (the original data 6 of the target frame 7). More specifically, the tone components 15 included in the spectrum X(k) of the original waveform 4 are extracted. For extracting the tone components 15, a method such as comparing spectral features with neighboring frames is used.
In the upper part of FIG. 22, as an example, the tone components 15 in |X(k)| are schematically indicated by circles.
In the following, the set (array) of indices of the tone components 15 and their vicinity (for example, three spectra) is written as tone_isp[]. Components included in ranges other than tone_isp[] are the non-tone components.
The set of indices indicating the candidate ranges 75 calculated in the preceding processing (step 601 or step 602) is written as enc_isp[].
In the upper and middle parts of FIG. 22, the frequency ranges corresponding to tone_isp[] and enc_isp[] are schematically illustrated.
Within the ranges indicated by enc_isp[], in the frequency range at or above a certain frequency fc (for example, 2 kHz), auditory discomfort is unlikely to occur even when Pnoise(k) outside the tone components 15 is large. That is, noise derived from non-tone components is hard to hear above fc.
The value of the frequency fc is not limited and can be set arbitrarily according to, for example, the required noise level. In the present embodiment, the frequency fc corresponds to a predetermined threshold frequency.
Thus, for ordinary noise-like components that are not tonal (non-tone components), there may be little auditory discomfort even when Pnoise(k) becomes large.
In the present embodiment, the range at or above the frequency fc is divided into tonal and noise-like frequency ranges. The frequency ranges judged to be noise-like are excluded from the coded frequency ranges 70 (candidate ranges 75) regardless of the magnitude of Pnoise(k), and only the frequency ranges judged to be tonal are retained.
In the lower part of FIG. 22, the candidate ranges 75 adjusted to exclude the non-tone components are schematically illustrated.
For example, in the region where the frequency is fc or higher, the intersection of tone_isp[] and enc_isp[] (tone_isp[] ∩ enc_isp[]) is calculated. The frequency ranges represented by this intersection are set as the new candidate ranges 75. As a result, the width of enc_isp[] is reduced, leaving the indices of the tone components 15 and their vicinity. In this way, on the high-frequency side of the frequency fc, the width of the candidate ranges 75 is adjusted so as to include the tone components 15.
Note that, on the high-frequency side of the frequency fc, the width of the candidate ranges 75 (enc_isp[]) may instead be expanded so that the tone components 15 and the indices in their vicinity are completely included in the candidate ranges 75. This makes it possible to reliably replace the tone components 15 with the redundant data 3.
Further, in the region below the frequency fc, not only tone components but also noise derived from non-tone components is likely to cause perceptual discomfort. Therefore, on the low-frequency side of the frequency fc, enc_isp[] is set as the candidate range 75 as it is.
For example, as shown in FIG. 22, for a candidate range 75 that includes k = 0 Hz and lies below the frequency fc, enc_isp[] is set as the new candidate range 75 without modification.
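The adjustment described above (keep enc_isp[] unchanged below fc, and intersect it with tone_isp[] at fc and above) can be sketched as follows. This is a minimal illustrative Python sketch, not part of the embodiment; the index sets are modeled as Python sets, and the names update_candidate_range and kc (the spectral index corresponding to fc) are assumptions.

```python
# Illustrative sketch: update the candidate range around the threshold index kc.
# enc_isp and tone_isp are modeled as sets of spectral indexes (assumed names).

def update_candidate_range(enc_isp, tone_isp, kc):
    """Keep enc_isp unchanged below index kc; at or above kc, keep only the
    indexes that also belong to tone_isp (tone components and their vicinity)."""
    low = {k for k in enc_isp if k < kc}                # below fc: kept as-is
    high = {k for k in enc_isp if k >= kc} & tone_isp   # at/above fc: intersection
    return low | high

# Example: enc_isp spans indexes 0..9, tone indexes are 6..8, threshold index kc = 5.
new_range = update_candidate_range(set(range(10)), {6, 7, 8}, 5)
print(sorted(new_range))  # [0, 1, 2, 3, 4, 6, 7, 8] -> indexes 5 and 9 are dropped
```
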
FIG. 23 is a flowchart showing an example of the non-tone component exclusion process. The process shown in FIG. 23 is an example of the internal process of step 603 in FIG. 21.
First, data representing the spectra of the original waveform 4 and the composite waveform 11 (X(k) and X'(k)) are acquired (step 701). Each spectrum may be an FFT spectrum or an MDCT spectrum. In the present embodiment, the composite waveform 11 is generated by copying the data one frame before the redundant data 3; that is, X'(k) is the spectrum Xprev(k) of the previous frame.
In step 701, past spectra and the like required for the tone component detection process described later may also be acquired as necessary. In the following, it is assumed that the processing is performed using the two spectra X(k) and X'(k).
Next, the power of each spectrum of the original waveform 4 and the composite waveform 11 (|X(k)| and |X'(k)|) is calculated (step 702). This process is executed when the power of each spectrum is required in the tone component detection process.
Next, the tone component detection process for detecting the tone component 15 is executed (step 703). In this process, the tone component 15 is detected from each of the spectra X(k) and X'(k). For example, a spectral component with a strong tonal character within each spectrum is calculated as the tone component 15, taking into account the shape of the power spectrum, the temporal correlation between the spectra of the preceding and following frames, and the like. The method for calculating the tone component 15 is not limited.
Next, a set tone_isp[] of indexes including the tone component 15 is generated (step 704). Specifically, the indexes of the tone component 15 and its vicinity (for example, the indexes of three spectra before and after it on the frequency axis) are acquired, and these indexes are stored in tone_isp[].
Finally, the already calculated candidate range 75 (enc_isp[]) is updated based on tone_isp[] (step 705). Specifically, as described with reference to FIG. 22, in the range at or above the frequency fc, the intersection of tone_isp[] and enc_isp[] is set as the candidate range 75, and in the range below the frequency fc, enc_isp[] is set as the candidate range 75 as it is.
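Step 704 above can be illustrated by the following sketch, which collects each detected tone index together with its neighborhood on the frequency axis (here three indexes on each side, as in the example above). The function name and the clipping to a spectrum_size bound are assumptions added for illustration.

```python
def build_tone_isp(tone_indexes, n_neighbors=3, spectrum_size=1024):
    """Collect each detected tone index and n_neighbors indexes on each side
    along the frequency axis, clipped to the valid index range."""
    tone_isp = set()
    for k in tone_indexes:
        for d in range(-n_neighbors, n_neighbors + 1):
            if 0 <= k + d < spectrum_size:
                tone_isp.add(k + d)
    return sorted(tone_isp)

# Two detected tone components at indexes 10 and 100:
print(build_tone_isp([10, 100]))  # indexes 7..13 and 97..103
```
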
As a result, at or above the frequency fc, for example, the width of the candidate range 75 is reduced so as to include the tone components 15 of X(k) and X'(k).
For example, the tone component 15 of X(k) is a component that would be lost if the composite waveform 11 were used, while the tone component 15 of X'(k) is a component that would be added if the composite waveform 11 were used.
Therefore, by narrowing the width of the candidate range 75 so that each tone component 15 is included, it is possible to reduce the data amount of the redundant data 3 while avoiding such loss or addition of tone components 15.
[Frequency range aggregation process]
FIG. 24 is a schematic diagram for explaining the frequency range aggregation process.
Here, it is assumed that five candidate ranges 75 (ranges 1 to 5) have been generated by the above-described non-tone component exclusion process, as shown in the upper part of FIG. 24. As shown in the lower part of FIG. 24, a process is executed to aggregate the total number N of these candidate ranges 75 down to a preset maximum number Nmax (here, Nmax = 2). The middle part of FIG. 24 shows the candidate ranges 75 generated in the course of this aggregation.
In the present embodiment, the coding range setting unit 33 calculates a connection cost representing the amount of noise that changes when candidate ranges 75 adjacent to each other are connected. The candidate ranges are then connected based on the connection cost.
The connection cost is, for example, an index indicating the increase or decrease in the amount of noise caused by connecting candidate ranges 75. For example, when candidate ranges 75 are connected, the total amount of quantization noise contained in the interpolation data generated on the receiving side changes. The connection cost is set so as to become higher when the amount of noise increases and lower when the amount of noise decreases. The connection cost will be described in detail later.
In the present embodiment, a two-stage aggregation process using the connection cost is executed.
The first stage calculates the connection cost for each unencoded range lying between candidate ranges 75 and, when the connection cost is at or below a certain threshold, combines the frequency ranges on both sides into one. This stage is always performed in order to prevent careless fragmentation of the frequency ranges.
The aggregation from the upper part to the middle part of FIG. 24 is an example of the first stage.
The second stage is executed when the number N of candidate ranges 75 after the first stage still exceeds the maximum number Nmax; it repeatedly connects the pair of candidate ranges 75 with the smallest connection cost. This stage is executed until the number N of candidate ranges 75 falls within the maximum number Nmax.
The aggregation from the middle part to the lower part of FIG. 24 is an example of the second stage.
The process of connecting candidate ranges 75 is executed, for example, by the following method.
First, pairs of two candidate ranges 75 are compared from the low-frequency side, and when the connection cost satisfies the "combining condition" described later, the two target candidate ranges 75 are combined into one frequency range (candidate range 75). In the following, the number representing each candidate range 75 is referred to as range number i.
Combining candidate ranges 75 means deleting the candidate ranges 75 with range numbers i and i+1 and generating, as a new candidate range 75, the frequency range from the lowest-frequency index lsp_i of the original range number i to the highest-frequency index hsp_(i+1) of the original range number i+1.
For example, in the process of aggregating the candidate ranges 75 from the upper part to the middle part of FIG. 24, the candidate ranges 75 of range numbers 2 and 3 (ranges 2 and 3 in the upper part) are combined to generate a new candidate range 75 (range 2 in the middle part) spanning lsp_2 to hsp_3. The indexes of the newly generated candidate range are written as lsp_2' (= lsp_2) and hsp_2' (= hsp_3).
By performing this process in order on the 1st to (N-1)th of the originally N candidate ranges 75, the candidate ranges 75 are aggregated.
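The combining operation described above (delete ranges i and i+1 and generate the range lsp_i to hsp_(i+1)) can be sketched as follows, with each candidate range modeled as a (lsp, hsp) index pair; merge_pair is a hypothetical helper name used only for illustration.

```python
def merge_pair(ranges, i):
    """Merge ranges[i] and ranges[i+1] (each a (lsp, hsp) tuple) into one
    range spanning lsp_i .. hsp_(i+1)."""
    lsp_i, _ = ranges[i]
    _, hsp_next = ranges[i + 1]
    return ranges[:i] + [(lsp_i, hsp_next)] + ranges[i + 2:]

# Ranges 2 and 3 (0-based positions 1 and 2) are combined:
print(merge_pair([(0, 10), (20, 30), (35, 50), (80, 90)], 1))
# [(0, 10), (20, 50), (80, 90)]
```
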
Three methods for the condition for combining candidate ranges 75 are described below. Methods other than these three may also be used.
The first method determines whether the quantization noise of the entire frame decreases when one set of meta information is eliminated by combining candidate ranges 75.
In this method, first, the sum PNQ-sum of the quantization noise Nq(k) over the entire target frame 7, that is, the sum of the quantization noise Nq(k) over the N candidate ranges 75, is calculated according to the following equation.
  PNQ-sum = Σ_{j=1}^{N} Σ_{k=lsp_j}^{hsp_j} Nq(k)   (Equation 12)
In (Equation 12), the range number of the candidate range 75 is written as j.
Next, the sum P'NQ-sum of the quantization noise N'q(k) over the entire target frame 7 when the j-th and (j+1)-th candidate ranges 75 are combined is calculated. P'NQ-sum is the sum of the quantization noise N'q(k) over the N-1 candidate ranges 75 including the combined candidate range 75.
Since the reduction in meta information increases the number of bits available for allocation, N'q(k) may be smaller than Nq(k). P'NQ-sum is calculated according to the following equation.
  P'NQ-sum = Σ_{j=1}^{N-1} Σ_{k=lsp'_j}^{hsp'_j} N'q(k)   (Equation 13)
Here, lsp'_j and hsp'_j are the lowest- and highest-frequency indexes of the range number j changed by the combination.
Next, the difference ΔPall-noise between the sums of the quantization noise calculated according to (Equation 12) and (Equation 13) is calculated according to the following equation.
  ΔPall-noise = P'NQ-sum - PNQ-sum   (Equation 14)
That is, ΔPall-noise is the change in the total amount of quantization noise caused by combining the j-th and (j+1)-th candidate ranges 75. This ΔPall-noise is used as the connection cost.
As the condition for combining the j-th and (j+1)-th candidate ranges 75, the condition that the total amount of quantization noise decreases is set here. That is, when ΔPall-noise satisfies the following condition, the j-th and (j+1)-th candidate ranges 75 are combined.
  ΔPall-noise < 0   (Equation 15)
This makes it possible to reduce the number of candidate ranges 75, and thus the data amount of the meta information, without increasing the quantization noise.
The second method determines whether the sum of the power |X(k)| of the spectrum of the original waveform 4 contained between candidate ranges 75 is at or below a predetermined threshold. In this method, the sum of |X(k)| between the candidate ranges 75 is used as the connection cost.
For example, if a frequency range in which |X(k)| is large is replaced with the composite waveform 11, the amount of generated noise is likely to increase. Conversely, even if a frequency range in which |X(k)| is small is replaced with the composite waveform 11, the amount of generated noise is estimated to be small. Using this property, whether to combine candidate ranges 75 is determined based on the sum of |X(k)|. This method does not require calculating the quantization noise and can be regarded as a simplified version of the first method.
Specifically, the sum Psum_inter_i of the spectral power over the frequency range (intermediate range) between the i-th and (i+1)-th candidate ranges 75 is calculated according to the following equation.
  Psum_inter_i = Σ_{k=hsp_i+1}^{lsp_(i+1)-1} |X(k)|   (Equation 16)
When Psum_inter_i calculated in this way is at or below a predetermined threshold, the i-th and (i+1)-th candidate ranges 75 are combined.
The threshold for judging Psum_inter_i may be a value set in relation to the data amount of the meta information or the like, or may be a predetermined fixed value.
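A minimal sketch of (Equation 16), summing the power |X(k)| over the intermediate range between two candidate ranges, is shown below; the list power and the function name are assumptions for illustration.

```python
def inter_range_power(power, hsp_i, lsp_next):
    """Sum the original-waveform spectral power |X(k)| over the
    intermediate range hsp_i+1 .. lsp_(i+1)-1 (Equation 16)."""
    return sum(power[k] for k in range(hsp_i + 1, lsp_next))

# Hypothetical per-index power values; indexes 5..7 form the intermediate range.
power = [0] * 20
power[5], power[6] = 2, 1
print(inter_range_power(power, 4, 8))  # 3 -> combine the ranges if 3 <= threshold
```
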
The third method determines whether the interval between candidate ranges 75 (the difference between spectral indexes) is at or below a predetermined threshold. In this method, the interval between the candidate ranges 75 is used as the connection cost.
For example, where the interval between candidate ranges 75 is small, the amount of generated noise is estimated to be small even if that portion is replaced with the composite waveform 11. Using this property, whether to combine candidate ranges 75 is determined based on their interval. This method does not require calculating the sum of spectral power and can be regarded as a simplified version of the second method.
Specifically, the total number of indexes between the i-th and (i+1)-th candidate ranges 75 is calculated according to the following equation.
  lsp_(i+1) - hsp_i - 1   (Equation 17)
When the total number of indexes between the candidate ranges 75 calculated in this way (the interval between the candidate ranges 75) is at or below a predetermined threshold, the i-th and (i+1)-th candidate ranges 75 are combined.
The three methods described above are used in the first stage of connecting candidate ranges 75 (the aggregation from the upper part to the middle part of FIG. 24).
If the number N of candidate ranges 75 is still larger than Nmax after these processes are executed, the process of combining the pair of two candidate ranges 75 with the smallest cost into one frequency range is repeatedly executed. This makes it possible to aggregate the candidate ranges 75 down to the specified number (Nmax).
FIG. 25 is a flowchart showing an example of the frequency range aggregation process. The process shown in FIG. 25 is an example of the internal process of step 604 in FIG. 21. Here, the connection cost (Psum_inter_i) described in the second method above is used.
First, the variable N representing the number of candidate ranges 75 and the variable i representing the range number are initialized (step 801). The number of currently existing candidate ranges 75 (the number calculated in the preceding processes) is assigned to N, and 1 is assigned to the variable i, which holds the range number for scanning the candidate ranges 75.
Next, it is determined whether or not the range number i is N-1 or less (step 802). If i is N-1 or less (YES in step 802), the first stage of connecting candidate ranges 75 (steps 803 to 806) is executed.
In the first stage, first, the indexes representing the frequency range between the i-th and (i+1)-th candidate ranges 75 (the i-th intermediate range) are acquired (step 803). Specifically, the highest-frequency index hsp_i of the i-th candidate range 75 and the lowest-frequency index lsp_(i+1) of the (i+1)-th candidate range 75 are read.
Here, the range from (hsp_i + 1) to (lsp_(i+1) - 1) is the i-th intermediate range.
Next, for the i-th intermediate range, the sum Psum_inter_i of the spectral power of the original waveform 4 is calculated according to (Equation 16), and it is determined whether or not it is at or below the predetermined threshold (step 804). Here, it is determined whether the sum of |X(hsp_i+1)| to |X(lsp_(i+1)-1)| is at or below the threshold.
If Psum_inter_i is at or below the threshold (YES in step 804), the two candidate ranges 75 (the i-th and (i+1)-th candidate ranges 75) are combined to generate one new candidate range 75 (step 805). If Psum_inter_i is larger than the threshold (NO in step 804), the candidate ranges 75 are not combined.
Next, the range number i is incremented (step 806), and step 802 determines whether the incremented range number i is N-1 or less. In this way, the aggregation process of comparing Psum_inter_i with the threshold is repeated until the range number i reaches N-1.
If i is larger than N-1 in step 802, that is, if i = N (NO in step 802), the number of ranges N is updated (step 807). Specifically, the current total number of candidate ranges 75 (the number of candidate ranges 75 resulting from steps 803 to 806) is assigned to N.
Next, it is determined whether or not the updated number of ranges N is at or below the maximum number Nmax (step 808). If N is at or below Nmax (YES in step 808), that is, if the number of candidate ranges 75 has been sufficiently reduced by the first-stage aggregation, the frequency range aggregation process ends.
If N is larger than Nmax (NO in step 808), the second-stage aggregation is executed (step 809). Here, the candidate ranges 75 on both sides of the intermediate range for which Psum_inter_i is smallest are combined. That is, the i-th candidate range 75 for which the sum of |X(hsp_i+1)| to |X(lsp_(i+1)-1)| is smallest is combined with the (i+1)-th candidate range 75.
Next, the number of ranges N is updated (step 810). Specifically, since the number of ranges has decreased by one in step 809, N-1 is assigned to N. Step 808 is then executed again to determine whether the updated number of ranges N is at or below Nmax. In this way, the candidate ranges 75 are aggregated in ascending order of connection cost until the number of ranges N falls to the allowable maximum number Nmax.
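One way to realize the flow of FIG. 25 is sketched below, using the inter-range power sum as the connection cost. This is an illustrative variant rather than the exact flowchart: index bookkeeping after a merge is simplified (the scan does not advance past a merged pair), and all names are assumptions.

```python
def aggregate(ranges, power, threshold, n_max):
    """Two-stage aggregation sketch (second method). ranges is a list of
    (lsp, hsp) index pairs sorted by frequency; power holds |X(k)| per index."""
    def cost(i):
        # Psum_inter_i: power summed over the gap between ranges i and i+1.
        return sum(power[k] for k in range(ranges[i][1] + 1, ranges[i + 1][0]))

    # Stage 1: merge each adjacent pair whose cost is at or below the threshold.
    i = 0
    while i < len(ranges) - 1:
        if cost(i) <= threshold:
            ranges = ranges[:i] + [(ranges[i][0], ranges[i + 1][1])] + ranges[i + 2:]
        else:
            i += 1
    # Stage 2: while more than n_max ranges remain, merge the cheapest pair.
    while len(ranges) > n_max:
        i = min(range(len(ranges) - 1), key=cost)
        ranges = ranges[:i] + [(ranges[i][0], ranges[i + 1][1])] + ranges[i + 2:]
    return ranges
```
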
[Frequency range adjustment process]
FIG. 26 is a schematic diagram for explaining the frequency range adjustment process. Here, an outline of the frequency range adjustment process executed in step 605 of FIG. 21 is described.
In the present embodiment, the coding range setting unit 33 adjusts the width of each candidate range 75 based on the noise components at its highest and lowest frequencies. Specifically, among the indexes at both ends (highest and lowest frequencies) of each candidate range 75 calculated in the preceding processes, the index at which Pnoise(k) is smallest (hereinafter written as knoise-min) is detected, and excluding that index knoise-min from the candidate range 75 is repeated, thereby reducing the candidate range 75.
For example, as shown in the upper part of FIG. 26, assume that the two candidate ranges 75 of range numbers i = 1 and 2 have been calculated by the preceding processes, and that, among the indexes at both ends of range 1 (lsp_1 and hsp_1) and the indexes at both ends of range 2 (lsp_2 and hsp_2), knoise-min is hsp_1. In this case, if a predetermined condition is satisfied, hsp_1 is excluded from range 1 as shown in the middle part of FIG. 26, and the index indicating the highest frequency of range 1 becomes hsp'_1 = (hsp_1) - 1.
Next, if knoise-min is lsp_2 and the predetermined condition is satisfied, lsp_2 is excluded from range 2 as shown in the lower part of FIG. 26, and the index indicating the lowest frequency of range 2 becomes lsp'_2 = (lsp_2) + 1.
Such processing is repeatedly executed as long as the predetermined condition is satisfied.
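The endpoint selection and exclusion described above can be sketched as follows, with candidate ranges modeled as (lsp, hsp) index pairs and Pnoise(k) as a list p_noise; both helper names are assumptions for illustration.

```python
def min_noise_endpoint(ranges, p_noise):
    """Among the endpoint indexes lsp_i and hsp_i of all candidate ranges,
    return the index k at which P_noise(k) is smallest (k_noise-min)."""
    endpoints = [k for (lsp, hsp) in ranges for k in (lsp, hsp)]
    return min(endpoints, key=lambda k: p_noise[k])

def exclude_endpoint(ranges, k):
    """Exclude index k from whichever range boundary it belongs to."""
    return [(lsp + 1, hsp) if k == lsp else (lsp, hsp - 1) if k == hsp else (lsp, hsp)
            for (lsp, hsp) in ranges]

p_noise = [0] * 12
p_noise[2], p_noise[5], p_noise[8], p_noise[11] = 5, 1, 9, 4
k_min = min_noise_endpoint([(2, 5), (8, 11)], p_noise)
print(k_min)                                       # 5 (hsp of range 1)
print(exclude_endpoint([(2, 5), (8, 11)], k_min))  # [(2, 4), (8, 11)]
```
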
The details of the frequency range adjustment process are described below.
First, the sum PNQ-sum of the quantization noise Nq(k) generated in the N candidate ranges 75 over the entire target frame 7 is calculated according to the following equation.
  PNQ-sum = Σ_{i=1}^{N} Σ_{k=lsp_i}^{hsp_i} Nq(k)   (Equation 18)
This is the same equation as (Equation 12) described above.
Next, the index knoise-min at which Pnoise(k) is smallest is detected from among the indexes (lsp_i and hsp_i, with i = 1 to N) corresponding to the candidate ranges 75.
Next, for the ranges excluding knoise-min (lsp'_i and hsp'_i), the sum P'NQ-sum of the quantization noise N'q(k) over the entire target frame 7 is calculated. With knoise-min excluded, the quantization noise N'q(k) takes a value different from the above Nq(k). P'NQ-sum is calculated according to the following equation.
  P'NQ-sum = Σ_{i=1}^{N} Σ_{k=lsp'_i}^{hsp'_i} N'q(k)   (Equation 19)
In this way, the sums of the quantization noise before and after excluding knoise-min are calculated by (Equation 18) and (Equation 19).
Next, the noise change amount ΔPall-noise over the entire target frame 7 caused by excluding knoise-min and narrowing the candidate range 75 is calculated. The noise change amount ΔPall-noise is expressed as the sum of (the decrease in quantization noise) and (the increase in noise due to the narrowed frequency range). Specifically, ΔPall-noise is calculated according to the following equation.
  ΔPall-noise = (P'NQ-sum - PNQ-sum) + Pnoise(knoise-min)   (Equation 20)
When ΔPall-noise has been calculated, it is determined whether or not the following expression is satisfied.
  ΔPall-noise < 0   (Equation 21)
As shown in (Equation 21), when ΔPall-noise is negative, excluding knoise-min reduces the total amount of noise over the entire target frame 7. In this case, a new candidate range 75 excluding knoise-min is set; that is, lsp'_i and hsp'_i are set as the new lsp_i and hsp_i.
Thereafter, this reduction of the candidate range 75 is repeatedly executed until (Equation 21) is no longer satisfied. This makes it possible to reduce the total amount of noise over the entire target frame 7. Therefore, even in a case where the data amount (target data amount) of the redundant data 3 is determined in advance and quantization noise becomes a problem because that amount is small, the total amount of noise over the entire target frame 7 can be sufficiently suppressed.
Instead of the method referring to the quantization noise as described above, another method may be used.
In general, when audio is encoded, more bits tend to be allocated to spectral components with larger power. Here, this characteristic is used to determine whether or not to exclude knoise-min by referring to the power |X(k)| of the spectrum of the original waveform 4.
For example, the total power Ptarget(nbit) corresponding to the target data amount (nbit) of the redundant data 3 is calculated using a predetermined formula or table.
Next, the total power Predun-sum of |X(k)| over all the candidate ranges 75 is calculated according to the following equation.
  Predun-sum = Σ_{i=1}^{N} Σ_{k=lsp_i}^{hsp_i} |X(k)|   (Equation 22)
As shown in (Equation 22), the value of Predun-sum becomes smaller when, for example, knoise-min is excluded.
Next, it is determined whether or not Predun-sum satisfies the following expression.
  Predun-sum ≤ Ptarget(nbit)   (Equation 23)
For example, when the relation shown in (Equation 23) is not satisfied, the index knoise-min at which Pnoise(k) is smallest is detected from among the lsp_i and hsp_i, as in the method described above, and a new candidate range 75 excluding knoise-min is set.
For the new candidate range 75, the total power Predun-sum is recalculated. That is, the power |X(knoise-min)| of the spectrum of the original waveform 4 at knoise-min is subtracted from the Predun-sum calculated before knoise-min was excluded. The recalculated total power P'redun-sum is thus calculated according to the following equation.
  P'redun-sum = Predun-sum - |X(knoise-min)|   (Equation 24)
The process from the detection of k noise-min to the recalculation of the total power shown in Eq. (Equation 24) is repeatedly executed until the following relationship is satisfied.
Figure JPOXMLDOC01-appb-M000025
A candidate range 75 that satisfies this relationship is set as a coded frequency range 70 that is finally assigned to the redundant data 3.
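The iterative trimming described above can be sketched in code. This is a minimal illustration under stated assumptions, not the patent's implementation: the function and variable names are hypothetical, the endpoint restriction (excluding only lsp_i/hsp_i) is omitted for brevity, and the toy data stand in for real spectra.

```python
# Sketch of the candidate-range trimming loop of (Equation 22)-(Equation 25).
# In the actual encoder, P_target would come from a predetermined formula
# or table for the target data amount (nbit).

def trim_candidate_ranges(candidates, X_abs, P_noise, P_target):
    """candidates: set of spectral indexes k in the candidate ranges 75.
    X_abs[k]: power |X(k)| of the original waveform's spectrum.
    P_noise[k]: predicted noise power of the composite waveform at k.
    P_target: total power allowed by the target data amount."""
    candidates = set(candidates)
    # (Equation 22): total power over all candidate ranges.
    P_redun_sum = sum(X_abs[k] for k in candidates)
    # Repeat until (Equation 25) holds: P_redun_sum <= P_target.
    while candidates and P_redun_sum > P_target:
        # Exclude the index with the smallest predicted noise.
        k_noise_min = min(candidates, key=lambda k: P_noise[k])
        candidates.remove(k_noise_min)
        # (Equation 24): subtract the excluded component's power.
        P_redun_sum -= X_abs[k_noise_min]
    return candidates

X_abs = {10: 4.0, 11: 3.0, 12: 5.0, 13: 2.0}
P_noise = {10: 0.9, 11: 0.1, 12: 0.8, 13: 0.2}
kept = trim_candidate_ranges({10, 11, 12, 13}, X_abs, P_noise, P_target=10.0)
# kept == {10, 12}: indexes 11 and 13 (lowest noise) were excluded
```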
Note that for the frequency range aggregation process described with reference to FIGS. 24 and 25 and the frequency range adjustment process described with reference to FIG. 26, the spectrum data required for each process may be adjusted before use. For example, the intensity perceived by humans may differ from one frequency range to another even when the spectral power is the same. Therefore, it is desirable to use values such as P_noise(k), N_q(k), and |X(k)|, which are used in the calculation of the connection cost, after subtracting the value of the threshold curve thresh(k) (clipping the result to 0 if it becomes negative) or after weighting them for each frequency. If another cost more appropriate to the coding method exists, it may be used instead.
Further, as for the frequency range adjustment process, a process that excludes one or more indexes in a single operation may be executed.
In the above, the process of reducing the candidate range 75 based on the noise components at both ends of the candidate range 75 has been described, but the candidate range 75 may instead be expanded based on the noise components.
For example, frequency components with similarly high noise levels are likely to exist in the vicinity of a frequency at which P_noise(k) is high. Therefore, when P_noise(k) exceeds a certain level at either end of the candidate range 75, a process of expanding the candidate range 75 so that the nearby noise components are replaced by the redundant data 3 may be executed. This makes it possible to reduce the noise caused by the composite waveform.
[Operation of the receiving device]
The operation of the receiving device 50 according to the second embodiment is substantially the same as the operation described in the first embodiment, except that meta information is received as part of the packet 1 and that the content of the replacement area setting process differs from that of the above embodiment.
In the present embodiment, a plurality of frequency ranges (coding frequency ranges 70) based on the meta information are set in the replacement area setting process.
Specifically, in the array Xout[] set as the spectrum of the frame to be reproduced, the spectrum of the redundant data 3 (Xredun[]) is used for the plurality of coding frequency ranges 70. Further, in Xout[], all the interpolation ranges 71 other than the coding frequency ranges 70 are replaced with the spectrum of the composite waveform 11 (X'dec[]) (see FIGS. 16 and 17).
Therefore, the array redun_isp[], in which the spectrum indexes of the redundant data 3 are stored, stores all the indexes from lsp_i to hsp_i contained in the meta information. The array replace_isp[], in which the spectrum indexes of the composite waveform 11 are stored, stores all the indexes other than those stored in redun_isp[].
For example, suppose that meta information indicating (lsp1, hsp1) = (10, 15), (lsp2, hsp2) = (33, 36), and (lsp3, hsp3) = (55, 60) is received. In this case, redun_isp[] stores the indexes 10, 11, 12, 13, 14, 15, 33, 34, 35, 36, 55, 56, 57, 58, 59, and 60.
Thereby, the spectrum replacement process shown in FIG. 18 can be applied as it is to generate appropriate spectrum data Xout[] for reproduction.
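The expansion of the meta information into index arrays can be sketched as follows. This is an illustrative sketch, not the patent's implementation; the function name and the spectrum length of 64 are assumptions, while the range pairs are the worked example from the text.

```python
# Sketch of how a receiver could expand the meta information into the
# index arrays redun_isp[] and replace_isp[].

def build_index_arrays(meta_ranges, n_spectrum):
    """meta_ranges: list of (lsp_i, hsp_i) pairs, both ends inclusive.
    n_spectrum: number of spectral bins in the frame (assumed)."""
    # Indexes filled from the redundant data's spectrum Xredun[].
    redun_isp = [k for lsp, hsp in meta_ranges for k in range(lsp, hsp + 1)]
    in_redun = set(redun_isp)
    # Every remaining index is filled from the composite waveform X'dec[].
    replace_isp = [k for k in range(n_spectrum) if k not in in_redun]
    return redun_isp, replace_isp

meta = [(10, 15), (33, 36), (55, 60)]
redun_isp, replace_isp = build_index_arrays(meta, n_spectrum=64)
# redun_isp == [10, 11, 12, 13, 14, 15, 33, 34, 35, 36,
#               55, 56, 57, 58, 59, 60]
```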
<Third embodiment>
FIG. 27 is a schematic diagram showing an example of the coding frequency ranges 70 according to the third embodiment.
In the third embodiment, as in the second embodiment, a plurality of coding frequency ranges 70 are set to arbitrary frequency ranges. Furthermore, as shown in FIG. 27, an arbitrary waveform synthesis method is individually set for each interpolation range 71 between the coding frequency ranges 70. That is, in the present embodiment, both the coding frequency ranges 70 and the waveform synthesis methods can be freely set.
In the present embodiment, all the waveform synthesis methods executable on the receiving side are tried on the transmitting side, and the noise (noise spectrum 13) generated by the composite waveform 11 produced by each method is calculated. Based on these noise spectra 13, the coding frequency ranges 70 assigned to the redundant data 3 and the optimum waveform synthesis method to be set for each interpolation range 71 are determined.
In the example shown in FIG. 27, synthesis method 1, synthesis method 2, and synthesis method 3 are selected for the three interpolation ranges 71 in order from the low-frequency side (left in the figure).
As a result, it is possible to realize high-quality error concealment with less redundant data 3.
The configuration of the transmission/reception system according to the present embodiment is substantially the same as that of the transmission/reception system 100 (transmission device 20 and receiving device 50) described in the above embodiments, except for the configurations of the redundant data generation unit included in the transmission device 20 and the signal processing unit included in the receiving device 50.
Hereinafter, the same components as those of the transmission/reception system 100 are described using the same reference numerals.
[Transmission device]
Hereinafter, as the configuration on the transmission device 20 side, the configuration of the redundant data generation unit will be mainly described.
FIG. 28 is a block diagram showing a configuration example of the redundant data generation unit 324 according to the third embodiment. The redundant data generation unit 324 includes an original data selection unit 330, composite waveform generation units 331a to 331c, a generated noise calculation unit 332, a coding range setting unit 333, and a coding spectrum selection unit 334.
The redundant data generation unit 324 differs from the redundant data generation unit 24 described in the above embodiments mainly in that a plurality of composite waveform generation units 331 are provided. Accordingly, the generated noise calculation unit 332 outputs a plurality of pieces of noise information (noise spectra 13).
Here, it is assumed that three types of waveform synthesis methods are selectable in the receiving device 50. In this case, the redundant data generation unit 324 is provided with three composite waveform generation units 331a to 331c corresponding to the three types of waveform synthesis methods.
When there are n types of waveform synthesis methods, n composite waveform generation units 331 are provided.
The original data selection unit 330 selects and acquires the necessary data from the original data 6 stored in the input buffer 23 (see FIG. 7). Specifically, it reads the original data 6 corresponding to the main data 2 (transmission frame 8) and the redundant data 3 (target frame 7) from the preceding input buffer 23, and passes the necessary original data 6 to the composite waveform generation units 331a to 331c. The original data 6 is at least one of the spectrum data X(k) and the time waveform data x(n).
The composite waveform generation units 331a to 331c each generate a composite waveform 11 from the original data 6 acquired from the original data selection unit 330 according to the waveform synthesis method set for that unit, and pass the data of each composite waveform 11 (composite data) to the generated noise calculation unit 332.
As the waveform synthesis methods, it is desirable to combine a plurality of methods with a low amount of computation (for example, a method of copying the previous frame). This makes it possible to reduce the computational load on the receiving side. Of course, the present technology is applicable regardless of the types of waveform synthesis methods.
The generated noise calculation unit 332 acquires the original data 6 from the original data selection unit 330 and the three composite waveforms 11 from the composite waveform generation units 331a to 331c. The generated noise calculation unit 332 then calculates, for each waveform synthesis method, noise information regarding the noise caused by using the composite waveform 11, and passes it to the coding range setting unit.
Specifically, the noise spectrum 13 representing the waveform quality of each composite waveform 11 is calculated as the noise information. In this way, the generated noise calculation unit 332 predicts the waveform quality of the composite waveform 11 for each of the plurality of waveform synthesis methods.
The coding range setting unit 333 acquires the noise information (noise spectra 13) calculated for each waveform synthesis method from the generated noise calculation unit 332, and sets at least one coding frequency range 70 to be assigned to the redundant data 3 based on each noise spectrum 13.
The coding range setting unit 333 also assigns one of the plurality of waveform synthesis methods for generating the composite waveform 11 to each interpolation range 71, which is a frequency range other than the coding frequency ranges 70. Specifically, the optimum waveform synthesis method is selected for each interpolation range 71 based on each noise spectrum 13.
In this way, the coding range setting unit 333 sets the coding frequency ranges 70 and the waveform synthesis method assigned to each interpolation range 71 based on the waveform quality (noise spectrum 13) predicted for each of the plurality of waveform synthesis methods.
The coding range setting unit 333 further generates meta information including information specifying the coding frequency ranges 70 and information specifying the waveform synthesis method assigned to each interpolation range 71, and passes it to the coding spectrum selection unit 334.
The coding spectrum selection unit 334 extracts, from the original data 6 (spectrum data X(k)) corresponding to the redundant data 3, the spectral components to be used as the redundant data 3 based on the coding frequency ranges 70. The operation of the coding spectrum selection unit 334 is the same as that in the above embodiments, except that the meta information about the waveform synthesis methods is also handled as part of the redundant data 3.
FIG. 29 is a flowchart showing an example of the process of generating the redundant data 3. This process is, for example, a loop process executed every time a packet 1 is generated.
In the following, it is assumed that a method of generating the composite waveform 11 using one or both of spectrum data and time waveform data is used as the waveform synthesis method, and that the data required for generating the composite waveform 11 are stored in advance in the preceding input buffer 23.
First, the original data 6 corresponding to the redundant data 3 (the original data 6 included in the target frame 7) is acquired from the input buffer 23 (step 901). Spectrum data is acquired from the input buffer 23 as the original data 6 required for waveform synthesis (step 902), and time waveform data is likewise acquired from the input buffer 23 (step 903). The order of steps 901 to 903 is not limited.
Next, the processes of steps 904 to 906 are sequentially executed for each waveform synthesis method.
First, a composite waveform generation process for generating the composite waveform 11 based on the target waveform synthesis method is executed (step 904). This composite waveform generation process corresponds to step 102 in FIG. 9.
When the composite waveform 11 has been generated, a generated noise prediction process for calculating the noise spectrum 13 of the composite waveform is executed (step 905). This generated noise prediction process corresponds to step 103 in FIG. 9.
When the noise spectrum 13 has been calculated, a coding range setting process for setting the frequency ranges to be assigned to the redundant data 3 based on the noise spectrum 13 is executed (step 906). This coding range setting process corresponds to the coding range setting process shown in FIG. 21. Here, at least one frequency range is set. Each frequency range set in step 906 is a candidate range 75, that is, a candidate for a coding frequency range 70.
In this way, in the present embodiment, at least one candidate range 75 serving as a candidate for the coding frequency ranges 70 is calculated for each of the plurality of synthesis methods.
It is then determined whether any waveform synthesis method remains to be processed (step 907). If a waveform synthesis method remains (YES in step 907), the processes from step 904 onward are executed again using a waveform synthesis method that has not yet been processed.
As a result, candidate ranges 75 (enc_isp_i[]) are generated, one set for each planned waveform synthesis method i (see FIG. 30).
When the processing for all the waveform synthesis methods is completed (NO in step 907), a coding range synthesis process that combines the candidate ranges 75 corresponding to the respective waveform synthesis methods and sets the coding frequency ranges 70 is executed (step 908).
In the coding range synthesis process, the coding frequency ranges 70 are set, and an appropriate waveform synthesis method is assigned to each interpolation range 71 between the coding frequency ranges 70. That is, which composite waveform 11 spectrum is applied to each interpolation range 71 is determined, and this setting result is generated as meta information.
Next, a coding spectrum selection process that extracts only the spectral components corresponding to the coding frequency ranges 70 from the original data 6 of the target frame 7 is executed (step 909). This coding spectrum selection process corresponds to step 105 in FIG. 9.
For example, the redundant data 3 before encoding is generated using only the spectral components in the coding frequency ranges 70. Indexes indicating the coding frequency ranges 70 and meta information specifying the waveform synthesis method set for each interpolation range 71 are added to the redundant data 3.
At this time, for example, coefficients calculated in the waveform synthesis process may also be added as meta information. This makes it possible to reduce the computational load on the receiving device 50.
When the redundant data 3 has been extracted, it is determined whether any original data 6 remains to be processed (step 910). If original data 6 remains (YES in step 910), the processes from step 901 onward are executed again for the remaining original data 6. If no original data 6 remains (NO in step 910), an encoding process that encodes the redundant data 3 is executed (step 911). When the redundant data 3 has been encoded, the target data amount of the main data 2 is set (step 912). Specifically, the free space of the packet 1 is calculated from the data amount of the encoded redundant data 3.
The processes of steps 910 to 912 correspond to, for example, the processes of steps 106 to 108 shown in FIG. 9.
[Coding range synthesis process]
FIG. 30 is a schematic diagram showing an example of the coding range synthesis process.
The outline of the coding range synthesis process is described below. Here, it is assumed that the candidate ranges 75 (enc_isp_i[]) corresponding to the i-th waveform synthesis method have been read in by the preceding composite waveform generation process, generated noise prediction process, and coding range setting process.
The upper three rows of FIG. 30 show regions representing the candidate ranges 75 calculated for the waveform synthesis methods 1 to 3. For example, the candidate range 75 of the waveform synthesis method 1 (enc_isp_1[]) includes one frequency range, while the candidate ranges 75 of the waveform synthesis methods 2 and 3 (enc_isp_2[] and enc_isp_3[]) each include two frequency ranges.
In the present embodiment, these plural waveform synthesis methods can be combined. Therefore, of each enc_isp_i[], only the spectrum at the common indexes needs to be encoded as the redundant data 3, and all the other frequency ranges can be appropriately replaced with the spectrum of a composite waveform 11.
When n waveform synthesis methods are used, the set of indexes to be finally encoded, enc_isp[] (the coding frequency ranges 70), is expressed by the following equation.

(Equation 26)  enc_isp[] = enc_isp_1[] ∩ enc_isp_2[] ∩ ... ∩ enc_isp_n[]
In this way, in the present embodiment, the frequency ranges represented by the intersection of the candidate ranges 75 calculated for the plurality of waveform synthesis methods are set as the coding frequency ranges 70.
The frequency ranges other than the coding frequency ranges 70 become the "frequency ranges not to be encoded", that is, the interpolation ranges 71 interpolated using composite waveforms 11. The interpolation ranges 71 can also be said to be the synthesis frequency ranges in which the composite waveforms 11 are used.
In FIG. 30, the two coding frequency ranges 70 represented by the intersection of the candidate ranges 75 are schematically illustrated using shaded regions. The regions other than these coding frequency ranges 70 are the interpolation ranges 71.
For example, suppose the array enc_isp[] representing the coding frequency ranges 70 contains the following indexes:
(lsp_i, hsp_i) = (2, 55), (77, 80).
In this case, the interpolation ranges 71 are uniquely determined as follows:
(lsp_j, hsp_j) = (0, 1), (56, 76), (81, N)
Here, N is, for example, the maximum index of the MDCT spectrum, and j is an index identifying an interpolation range 71.
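The derivation of the interpolation ranges from the coding frequency ranges can be sketched as follows. This is an illustrative sketch reproducing the worked example above; the function name and the value of N are assumptions.

```python
# Sketch: derive the interpolation ranges 71 as the complement of the
# coding frequency ranges 70 over the spectral indexes 0..n_max.

def interpolation_ranges(coding_ranges, n_max):
    """coding_ranges: sorted list of inclusive (lsp_i, hsp_i) pairs.
    n_max: maximum spectral index (e.g. of the MDCT spectrum)."""
    ranges, next_low = [], 0
    for lsp, hsp in coding_ranges:
        if next_low <= lsp - 1:           # gap below this coding range
            ranges.append((next_low, lsp - 1))
        next_low = hsp + 1                # continue above this coding range
    if next_low <= n_max:                 # gap up to the top of the spectrum
        ranges.append((next_low, n_max))
    return ranges

print(interpolation_ranges([(2, 55), (77, 80)], n_max=100))
# -> [(0, 1), (56, 76), (81, 100)]
```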
Next, a method of assigning a waveform synthesis method to each interpolation range 71 is described. Here, one waveform synthesis method is set for each interpolation range 71; a process of dividing one interpolation range 71 into a plurality of parts and assigning a waveform synthesis method to each part is not performed. This suppresses the increase in meta information.
For example, consider the process of setting a waveform synthesis method for the j-th interpolation range 71. In this case, for the noise spectrum P_noise-i(k) of the composite waveform 11 generated using the i-th waveform synthesis method, the sum P_noise-sum(i, j) over the j-th interpolation range 71 is calculated according to the following equation.

(Equation 27)  P_noise-sum(i, j) = Σ_{k=lsp_j}^{hsp_j} P_noise-i(k)
Of the plurality of waveform synthesis methods, the waveform synthesis method i that minimizes P_noise-sum(i, j) in (Equation 27) is set as the optimum waveform synthesis method for the j-th interpolation range 71. That is, the waveform synthesis method with the smallest total generated noise in the j-th interpolation range 71 is selected.
In this way, the method that minimizes the integrated value of the noise spectrum 13 among the plurality of waveform synthesis methods is set for each interpolation range 71.
This process is executed for all the interpolation ranges 71, and the optimum waveform synthesis method is set for each interpolation range 71.
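The per-range selection of (Equation 27) can be sketched as follows. This is an illustrative sketch: the function name is an assumption and the noise spectra are toy values, not real predictions.

```python
# Sketch: for each interpolation range 71, pick the waveform synthesis
# method i whose P_noise-sum(i, j) of (Equation 27) is smallest.

def select_methods(noise_spectra, interp_ranges):
    """noise_spectra[i][k]: P_noise-i(k) for synthesis method i.
    interp_ranges: inclusive (lsp_j, hsp_j) pairs for each range j."""
    chosen = []
    for lsp_j, hsp_j in interp_ranges:
        # (Equation 27): sum of the noise spectrum over the j-th range.
        sums = [sum(p[k] for k in range(lsp_j, hsp_j + 1))
                for p in noise_spectra]
        chosen.append(sums.index(min(sums)))  # method with least noise
    return chosen

noise = [
    [1.0] * 8,                 # method 0: flat noise everywhere
    [2.0] * 4 + [0.1] * 4,     # method 1: quiet in the upper half
]
print(select_methods(noise, [(0, 3), (4, 7)]))
# -> [0, 1]
```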
For example, in the example shown in FIG. 30, the waveform synthesis method 3, the waveform synthesis method 2, and the waveform synthesis method 1 are set for the three interpolation ranges 71 in order from the low-frequency side. Depending on the result of the determination by (Equation 27), the same waveform synthesis method may be set for all the interpolation ranges 71.
As described above, once the coding frequency ranges 70 are known, the interpolation ranges 71 can also be identified. Therefore, only the assignments of waveform synthesis methods to the interpolation ranges 71 need to be added to the meta information, for example in order from the low-frequency side; information such as indexes specifying the interpolation ranges 71 is unnecessary.
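The meta information layout implied here can be sketched as follows. This is an illustrative sketch only: the field names are assumptions, not the patent's bitstream format; the point shown is that no indexes for the interpolation ranges need to be transmitted.

```python
# Sketch of the meta information: the coding frequency ranges 70 plus,
# since the interpolation ranges 71 are derivable from them, only the
# per-range method ids ordered from the low band.

def pack_meta(coding_ranges, method_ids):
    return {"ranges": list(coding_ranges),    # (lsp_i, hsp_i) pairs
            "methods": list(method_ids)}      # one id per interpolation range

meta = pack_meta([(2, 55), (77, 80)], [3, 2, 1])
# On the receiver side, the j-th method id applies to the j-th
# interpolation range derived from meta["ranges"].
```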
After the coding frequency ranges 70 are set, the frequency range adjustment process of step 605 in FIG. 21 may be executed, for example, when quantization noise becomes a problem because the target data amount of the redundant data 3 is extremely small.
FIG. 31 is a flowchart showing an example of the coding range synthesis process. The process shown in FIG. 31 is an example of the coding range synthesis process described with reference to FIG. 29.
In the following, the array representing the indexes of the spectrum to be finally encoded (the coding frequency ranges 70) is written as enc_isp[], the variable representing the waveform synthesis method is written as i, and the variable specifying the interpolation range 71 is written as j.
First, the indexes enc_isp_i[] of the candidate ranges 75 calculated for each waveform synthesis method in the preceding processes are acquired (step 1001).
Next, enc_isp[] is initialized with the indexes enc_isp_1[] of the first candidate range 75 (step 1002), and the variable i representing the waveform synthesis method is initialized to 1 (step 1003).
Next, the intersection of enc_isp[] with the indexes enc_isp_i[] of the candidate range 75 obtained for the i-th waveform synthesis method is calculated, and enc_isp[] is updated with the result (step 1004). The variable i is then incremented (step 1005).
It is determined whether the incremented i is less than or equal to the number of waveform synthesis methods used, that is, whether the intersection of the candidate ranges 75 has been calculated for all the waveform synthesis methods (step 1006). If a waveform synthesis method remains (YES in step 1006), the processes from step 1004 onward are executed again.
When the intersection of the candidate ranges 75 has been calculated for all the waveform synthesis methods, enc_isp[] is the array representing the indexes of the coding frequency ranges 70 shown in (Equation 26).
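Steps 1002 to 1006 can be sketched as a set intersection. This is an illustrative sketch; the function name and the index sets, loosely modeled on FIG. 30, are assumptions.

```python
# Sketch of steps 1002-1006: intersect the per-method candidate ranges
# enc_isp_i[] to obtain enc_isp[] of (Equation 26).

def intersect_candidates(enc_isp_lists):
    enc_isp = set(enc_isp_lists[0])          # step 1002: initialize
    for enc_isp_i in enc_isp_lists[1:]:      # steps 1005-1006: loop over i
        enc_isp &= set(enc_isp_i)            # step 1004: intersection
    return sorted(enc_isp)

enc_isp_1 = range(2, 81)                                 # one wide range
enc_isp_2 = list(range(0, 60)) + list(range(70, 85))     # two ranges
enc_isp_3 = list(range(2, 56)) + list(range(77, 90))     # two ranges
print(intersect_candidates([enc_isp_1, enc_isp_2, enc_isp_3]))
# -> indexes 2..55 and 77..80, i.e. two coding frequency ranges
```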
If no waveform synthesis method remains to be processed (NO in step 1006), the interpolation ranges 71 are calculated based on enc_isp[] (step 1007), and the variable j representing the interpolation range 71 is initialized to 1 (step 1008). Hereinafter, the indexes representing the lowest and highest frequencies of the j-th interpolation range 71 are written as (lsp_j, hsp_j). The number of interpolation ranges 71 depends on the distribution of the coding frequency ranges 70, but is within ±1 of the number of coding frequency ranges 70.
 次に、ステップ1009~1012のループ処理により、j番目の補間範囲71について適切な波形合成方法が順番に設定される。
 まず、j番目の補間範囲71(k=lsp_j~hsp_j)におけるPnoise-i(k)の総和Pnoise-sum(i,j)が最小となる波形合成方法が探索される(ステップ1009)。具体的には、全ての波形合成方法に対して、(数27)式に基づいてPnoise-sum(i,j)が算出され、その結果が最小となる波形合成方法が選択される。
 前段で選択された波形合成方法が、j番目の補間範囲71に用いる波形合成方法として設定される(ステップ1010)。そして、変数jがインクリメントされ(ステップ1011)、変数jが補間範囲71の総数以下であるか否か、すなわち全ての補間範囲71に対して波形合成方法が設定されたか否かが判定される(ステップ1012)。
Next, by the loop processing of steps 1009 to 1012, an appropriate waveform synthesis method is sequentially set for the j-th interpolation range 71.
First, a waveform synthesis method that minimizes the sum Pnoise-sum(i, j) of Pnoise-i(k) in the j-th interpolation range 71 (k = lsp_j to hsp_j) is searched for (step 1009). Specifically, Pnoise-sum(i, j) is calculated based on (Equation 27) for all the waveform synthesis methods, and the waveform synthesis method that minimizes the result is selected.
The waveform synthesis method selected in the previous step is set as the waveform synthesis method used for the j-th interpolation range 71 (step 1010). Then, the variable j is incremented (step 1011), and it is determined whether the variable j is less than or equal to the total number of interpolation ranges 71, that is, whether the waveform synthesis method has been set for all the interpolation ranges 71 (step 1012).
If unprocessed interpolation ranges 71 remain (YES in step 1012), the processing from step 1009 onward is executed again. When the waveform synthesis method has been set for all the interpolation ranges 71, the frequency range synthesis processing is completed.
Through this processing, a method of generating the composite waveform 11 that minimizes the total amount of generated noise is set for each interpolation range 71. As a result, the amount of noise in the restored waveform 5 restored on the receiving side is reduced, and a sense of auditory unnaturalness can be sufficiently suppressed even when the data amount of the redundant data 3 is small.
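The per-range selection of steps 1009 to 1012 can be sketched as below: for each interpolation range 71, the method with the smallest noise sum over that range is chosen. This follows (Equation 27) only in spirit; the noise values and range boundaries are invented for illustration.

```python
# Sketch of steps 1009-1012: assign to each interpolation range 71 the
# waveform synthesis method whose predicted noise spectrum P_noise_i(k)
# sums to the smallest value over that range.

def assign_methods(noise_spectra, interp_ranges):
    """noise_spectra: per-method lists of P_noise_i(k) values.
    interp_ranges: list of (lsp_j, hsp_j) index pairs, inclusive."""
    assignment = []
    for lsp_j, hsp_j in interp_ranges:
        sums = [sum(p[lsp_j:hsp_j + 1]) for p in noise_spectra]
        assignment.append(sums.index(min(sums)))  # least-noise method index
    return assignment

noise_spectra = [
    [1.0, 0.2, 0.2, 0.9, 0.9],  # method 0
    [0.1, 0.8, 0.8, 0.3, 0.3],  # method 1
]
print(assign_methods(noise_spectra, [(0, 0), (1, 2), (3, 4)]))  # -> [1, 0, 1]
```

Because each range picks its minimizer independently, the total noise over all interpolation ranges is also minimized, matching the intent described above.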
[Receiver]
The packet 1 received by the receiving device 50 (reception buffer 52 shown in FIG. 7) according to the present embodiment includes the main data 2 and the redundant data 3 to which meta information is added. This meta information includes information on the waveform synthesis method set for each interpolation range 71, in addition to information for designating a plurality of coded frequency ranges 70.
That is, it can be said that the meta information is information for designating the synthesis method of the composite waveform 11 for each interpolation range 71, which is a frequency range other than the coding frequency range 70. Such meta information is received by the reception buffer 52 and is appropriately referred to in the subsequent processing. In the present embodiment, the meta information corresponds to the designated information.
Hereinafter, the configuration of the signal processing unit will be mainly described as the configuration on the receiving device 50 side.
FIG. 32 is a block diagram showing a configuration example of the signal processing unit according to the third embodiment. The signal processing unit 358 has a spectrum replacement unit 360, a spectrum buffer 361, an IMDCT unit 362, and a time signal output unit 363. Further, the signal processing unit 358 has a plurality of composite waveform generation units 364a to 364c, an MDCT unit 365, and a time waveform buffer 366.
The spectrum replacement unit 360 acquires spectrum data as the decoded redundant data 3 from the preceding stage. It also acquires the spectrum data of the composite waveforms 11 generated by the composite waveform generation units 364a and 364b, and the spectrum data (MDCT spectrum) of the composite waveform 11 generated by the composite waveform generation unit 364c and converted by the MDCT unit 365.
Further, the spectrum replacement unit 360 generates interpolation data in which a part of the decoded redundant data 3 (interpolation range 71) is replaced with the spectrum data of each composite waveform 11 based on the above-mentioned meta information. This interpolated data is passed to the spectrum buffer 361 and the IMDCT unit 362.
The IMDCT unit 362 acquires the spectrum of the interpolated data from the spectrum replacement unit 360, performs IMDCT on the spectrum, and converts the interpolated data into time domain data. The result of this IMDCT is passed to the time signal output unit 363.
The time signal output unit 363 acquires the IMDCT result from the IMDCT unit 362, applies a synthesis window to it, and performs overlap-add with the previous IMDCT result to reconstruct the audio signal (time waveform data), which is passed to the time waveform buffer 366.
The time waveform data stored in the time waveform buffer 366 is used for audio reproduction at a timing required in the subsequent stage.
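The windowing and overlap-add performed by the time signal output unit 363 can be sketched as follows. This is a minimal illustration, not the patented implementation; the sine window and the frame length N are assumptions made for the example.

```python
# Sketch of the overlap-add step: window the IMDCT output, add its first
# half to the tail carried over from the previous frame, and carry the
# second half forward for the next frame.

import math

N = 8  # IMDCT output length (twice the frame hop); illustrative only

def sine_window(n):
    # A common MDCT synthesis window choice (assumed here).
    return [math.sin(math.pi / n * (i + 0.5)) for i in range(n)]

def overlap_add(prev_tail, imdct_out, window):
    windowed = [w * x for w, x in zip(window, imdct_out)]
    out = [a + b for a, b in zip(prev_tail, windowed[:N // 2])]
    return out, windowed[N // 2:]  # (output samples, tail for next frame)

win = sine_window(N)
tail = [0.0] * (N // 2)
out, tail = overlap_add(tail, [1.0] * N, win)
print(len(out), len(tail))  # 4 samples output, 4 carried forward
```

Each call emits one hop of reconstructed audio; the carried tail is what makes consecutive frames cross-fade smoothly.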
The composite waveform generation units 364a and 364b appropriately acquire spectrum data of past frames (one frame before or two frames before) from the spectrum buffer 361, for example. Then, the spectrum data of the composite waveform 11 is generated based on each spectrum data and passed to the spectrum replacement unit 360.
The composite waveform generation unit 364c acquires the time waveform data of the past reproduced waveform from the time waveform buffer 366. Then, the time waveform data of the composite waveform 11 is generated based on the time waveform data, and is delivered to the MDCT unit 365.
The MDCT unit 365 acquires the time waveform data of the composite waveform 11 generated by the composite waveform generation unit 364c, and performs MDCT on this to create an MDCT spectrum (spectrum data) of the composite waveform 11. The spectrum data of the composite waveform 11 is passed to the spectrum replacement unit 360.
The waveform synthesis methods used by the composite waveform generation units 364a, 364b, and 364c are the waveform synthesis methods assigned to the interpolation ranges 71 on the transmitting side.
Note that the configuration of the signal processing unit 358 shown in FIG. 32 is merely an example, and is not limited to this. For example, it is conceivable that the types of required data (spectral data, time waveform data, number and position of past frames) differ depending on the waveform synthesis method adopted. Therefore, each configuration of the signal processing unit 358 (composite waveform generation unit 364, etc.) may be appropriately set according to the type of waveform synthesis method and the like.
In the example shown in FIG. 32, for example, the composite waveform generation units 364a and 364b perform waveform synthesis methods in the frequency domain (for example, copying the spectrum from the previous frame). The composite waveform generation unit 364c performs a waveform synthesis method using a time waveform (for example, a method of extrapolating the waveform using linear prediction coefficients (LPC)).
Regardless of which waveform synthesis method is used, the composite waveform 11 may be converted into a spectrum in the frequency domain (MDCT domain) as necessary and then input to the spectrum replacement unit 360.
FIG. 33 is a flowchart showing an example of the operation of the signal processing unit 358. This process is a loop process that is continuously executed on a frame-by-frame basis.
In the following, as in the processing shown in FIG. 15, it is assumed that the presence or absence of the redundant data 3 is notified and that, when the corresponding main data 2 or redundant data 3 exists, the signal processing unit 358 (spectrum replacement unit 360) has acquired the spectrum of the main data 2 or redundant data 3 whose decoding was completed in the preceding stage.
First, it is determined whether or not the data acquired by the spectrum replacement unit 360 is the main data 2 (step 1101). When the acquired data is the main data 2 (YES in step 1101), the main data 2 is stored in the spectrum buffer 361 (step 1109).
When loss of the packet 1 or the like occurs and the redundant data 3 is acquired (NO in step 1101), the composite waveform generation unit 364c acquires the time waveform data required to generate the composite waveform 11 from the time waveform buffer 366 (step 1102). Subsequently, the composite waveform generation units 364a and 364b acquire the spectrum data necessary for generating the composite waveforms 11 from the spectrum buffer 361.
Next, the waveform/spectrum synthesis processing is executed by one of the composite waveform generation units 364a to 364c, and a composite waveform 11 is generated according to the waveform synthesis method set for that unit (step 1104). When one composite waveform 11 has been generated, it is determined whether any waveform synthesis methods have not yet been executed (step 1105). If unprocessed waveform synthesis methods remain (YES in step 1105), the composite waveform 11 is generated in step 1104 according to the next waveform synthesis method.
When all the waveform synthesis methods have been executed (NO in step 1105), the time waveform data of the composite waveform 11 that requires MDCT processing (here, the data generated by the composite waveform generation unit 364c) is converted into spectrum data by the MDCT unit 365 (step 1106).
As a result, the spectrum data X'idec[k] of each composite waveform 11 is acquired. Here, the spectrum data of the composite waveforms 1 to 3 (X'1dec[k], X'2dec[k], and X'3dec[k]) are calculated.
Next, the spectrum replacement unit 360 executes the replacement region setting processing for setting the replacement regions in which spectrum components are replaced (step 1107). Specifically, for the spectrum Xout[] of the frame to be reproduced, the indexes to which the spectrum components of the composite waveforms 1 to 3 are assigned and the indexes to which the spectrum components of the redundant data are assigned are set.
The basic flow of the replacement region setting processing is substantially the same as the processing of step 304 shown in FIG. 15, except that the same number of arrays as there are waveform synthesis methods are prepared as the arrays replace_isp[] for storing the indexes to be replaced.
Here, replace_isp_1[], replace_isp_2[], and replace_isp_3[] are prepared as the arrays corresponding to the composite waveforms 1 to 3.
For example, based on the meta information that designates each coded frequency range 70, the array redun_isp[] of indexes to which the redundant data 3 is assigned is constructed. Further, based on the same meta information, the indexes designating the interpolation ranges 71 other than the coded frequency ranges 70 are calculated.
Further, with reference to the meta information that designates the waveform synthesis method set for each interpolation range 71, the arrays of interpolation ranges 71 corresponding to the composite waveforms 1 to 3 are constructed. For example, based on the designation that the composite waveform 1 is used for the interpolation range 71 on the lowest frequency side, the indexes included in that interpolation range 71 are stored in replace_isp_1[].
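The construction of redun_isp[] and the per-method replace_isp arrays can be sketched as follows. The coded ranges, bin count, and method numbers below are invented for illustration; only the array roles follow the text.

```python
# Sketch of the replacement-region setup: from meta information giving the
# coded frequency ranges 70 and the synthesis method assigned to each
# interpolation range 71, build redun_isp[] and one replace_isp array per
# composite waveform.

NUM_BINS = 16
coded_ranges = [(4, 7), (12, 14)]   # (low, high) index pairs, inclusive
range_methods = [1, 2, 1]           # method assigned to each interpolation range

redun_isp = [k for lo, hi in coded_ranges for k in range(lo, hi + 1)]

# Interpolation ranges 71 are the gaps between (and around) the coded ranges.
in_coded = set(redun_isp)
interp_ranges, start = [], None
for k in range(NUM_BINS + 1):
    if k < NUM_BINS and k not in in_coded:
        start = k if start is None else start
    elif start is not None:
        interp_ranges.append((start, k - 1))
        start = None

replace_isp = {m: [] for m in set(range_methods)}
for (lo, hi), method in zip(interp_ranges, range_methods):
    replace_isp[method].extend(range(lo, hi + 1))

print(interp_ranges)   # -> [(0, 3), (8, 11), (15, 15)]
print(replace_isp[1])  # -> [0, 1, 2, 3, 15]
```

Note that, consistent with the text, the number of interpolation ranges here (three) is the number of coded ranges (two) plus one.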
Next, using the result of the replacement region setting processing, the spectrum replacement processing is executed by the spectrum replacement unit 360. Here, the spectrum components of the redundant data 3 and the composite waveforms 1 to 3 are substituted into Xout[] according to the indexes designated by redun_isp[] and each replace_isp_i[]. In this way, in the spectrum replacement processing, the spectrum replacement unit 360 interpolates the redundant data 3 for each interpolation range 71 using the composite waveform 11 generated by the synthesis method designated by the meta information, and the interpolation data Xout[] is generated.
The basic flow of the spectrum replacement processing is substantially the same as the processing of step 305 shown in FIG. 15, except for the number of arrays used. For example, the replace_isp[] referenced in step 504 of the detailed processing flow of step 305 (see FIG. 18) is extended to replace_isp_1[] through replace_isp_3[]. The value substituted into Xout[isp] is one of X'1dec[isp], X'2dec[isp], and X'3dec[isp], corresponding to the composite waveforms 1 to 3.
Xout[] generated as the interpolation data is stored in the spectrum buffer for use in subsequent processing (step 1109).
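The substitution itself reduces to filling Xout[] from the index arrays, which can be sketched as below. All spectral values here are dummies; only the indexing scheme follows the text.

```python
# Sketch of the spectrum replacement: fill X_out[] with the redundant-data
# spectrum at redun_isp[] indices and with the spectrum of the assigned
# composite waveform at each replace_isp_i[] index.

NUM_BINS = 8
redun_spec = [10.0] * NUM_BINS  # decoded redundant data 3 (dummy values)
synth_specs = {                 # X'_1dec and X'_2dec (dummy values)
    1: [1.0] * NUM_BINS,
    2: [2.0] * NUM_BINS,
}

redun_isp = [2, 3, 4]
replace_isp = {1: [0, 1], 2: [5, 6, 7]}

X_out = [0.0] * NUM_BINS
for isp in redun_isp:
    X_out[isp] = redun_spec[isp]            # coded frequency range 70
for method, indices in replace_isp.items():
    for isp in indices:
        X_out[isp] = synth_specs[method][isp]  # interpolation ranges 71

print(X_out)  # -> [1.0, 1.0, 10.0, 10.0, 10.0, 2.0, 2.0, 2.0]
```

Because the index sets are disjoint by construction, every bin of X_out[] is written exactly once, either from the redundant data or from one composite waveform.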
IMDCT processing is executed on the spectrum data Xout[] (main data 2 or interpolation data) processed by the spectrum replacement unit 360 (step 1110). Next, the time signal output unit 363 reconstructs the audio signal from the IMDCT result (step 1111). The processing of steps 1109 to 1111 is the same as the processing of steps 306 to 308 shown in FIG. 15.
Finally, the time signal output unit 363 stores the generated audio signal in the time waveform buffer 366 for the next and subsequent processing.
As described above, in the receiving device according to the present embodiment, the redundant data 3 is interpolated using a plurality of different waveform synthesis methods. The allocation of these synthesis methods is set in advance on the transmission device 20 side so as to reduce the amount of noise in the interpolation result. As a result, even when loss of the packet 1 or the like occurs, it is possible to realize high-quality error concealment with little noise.
<Other Embodiments>
The present technology is not limited to the embodiments described above, and various other embodiments can be realized.
In the above, a transmission/reception system that mainly performs wireless communication has been described. The present technology is not limited to this, and can be applied to, for example, a system for transmitting and receiving waveform data by wired communication. For example, this technique may be used as a PLC method when playing music by using network streaming or the like.
It is also possible to combine at least two feature parts among the feature parts related to the present technology described above. That is, the various feature portions described in each embodiment may be arbitrarily combined without distinction between the respective embodiments. Further, the various effects described above are merely examples and are not limited, and other effects may be exhibited.
In the present disclosure, "same", "equal", "orthogonal", etc. are concepts including "substantially the same", "substantially equal", "substantially orthogonal", and the like. For example, a state included in a predetermined range (for example, a range of ±10%) based on "exactly the same", "exactly equal", "exactly orthogonal", etc. is also included.
The present technology can also adopt the following configurations.
(1) A quality prediction unit that predicts the waveform quality of the restored waveform related to the target frame of the waveform data,
A range setting unit that sets at least one target range as a frequency range assigned to redundant data for generating the restored waveform from the waveform data included in the target frame based on the waveform quality.
A transmission device including a data generation unit that generates the redundant data based on the target range and generates transmission data including the redundant data.
(2) The transmitter according to (1).
The restored waveform is a waveform generated based on the redundant data and the composite waveform related to the target frame.
The quality prediction unit is a transmission device that predicts the waveform quality of the composite waveform as the waveform quality of the restored waveform.
(3) The transmitter according to (2).
The target frame is a frame in the vicinity of the transmission frame transmitted as the transmission data.
The quality prediction unit is a transmission device that generates the composite waveform of the target frame based on the waveform data included in the transmission frame.
(4) The transmitter according to (2) or (3).
The quality prediction unit calculates a noise spectrum representing the waveform quality of the composite waveform based on the composite waveform and the original waveform represented by the waveform data included in the target frame.
The range setting unit is a transmission device that sets the target range based on the noise spectrum.
(5) The transmitter according to (4).
The range setting unit is a transmission device that calculates the total noise amount of interpolation data obtained by interpolating the redundant data with the composite waveform, based on the noise spectrum and the quantization noise accompanying the coding of the redundant data, and sets the target range so that the total noise amount is minimized.
(6) The transmitter according to (4) or (5).
The noise spectrum is one of a spectrum obtained by frequency-converting the difference between the original waveform and the composite waveform, or a spectrum representing the difference between the spectrum of the original waveform and the spectrum of the composite waveform.
(7) The transmitter according to any one of (4) to (6).
The range setting unit is a transmission device that sets an integration range for calculating an integrated value of the noise spectrum, and sets the target range based on the minimum integrated range in which the integrated value exceeds a first threshold value.
(8) The transmitter according to (7).
The range setting unit is a transmission device that sets the minimum frequency of the integration range to the minimum frequency of the noise spectrum, changes the maximum frequency of the integration range, and calculates the integrated value.
(9) The transmitter according to any one of (4) to (6).
The range setting unit is a transmission device that calculates at least one excess range in which the noise spectrum exceeds a second threshold value set for each frequency, and sets the target range based on the at least one excess range.
(10) The transmitter according to any one of (1) to (9).
The range setting unit is a transmission device that calculates a plurality of candidate ranges that are candidates for the target range and sets the target range based on the plurality of candidate ranges.
(11) The transmitter according to (10).
The range setting unit is a transmission device that calculates a connection cost representing the amount of noise that changes by connecting the candidate ranges adjacent to each other, and connects the candidate ranges based on the connection cost.
(12) The transmitter according to (10) or (11).
The restored waveform is a waveform generated based on the redundant data and the composite waveform related to the target frame.
The range setting unit is a transmission device that extracts a tonal frequency component included in the waveform data of the target frame and adjusts the width of the candidate range so that the tonal frequency component is included on the high-frequency side of a predetermined threshold frequency.
(13) The transmitter according to any one of (10) to (12).
The range setting unit is a transmission device that adjusts the width of the candidate range based on noise components at the highest and lowest frequencies of the candidate range.
(14) The transmitter according to any one of (1) to (13).
The restored waveform is a waveform generated based on the redundant data and the composite waveform related to the target frame.
The range setting unit is a transmission device that sets one of a plurality of synthesis methods for generating the composite waveform in a non-target range that is a frequency range other than the target range.
(15) The transmitter according to (14).
The quality prediction unit predicts the waveform quality of the composite waveform for each of the plurality of synthesis methods.
The range setting unit is a transmission device that sets the target range and the synthesis method assigned to the non-target range based on the waveform quality predicted for each of the plurality of synthesis methods.
(16) The transmitter according to (15).
The range setting unit is a transmission device that calculates, for each of the plurality of synthesis methods, at least one candidate range that is a candidate for the target range, sets the frequency range represented by the intersection of the candidate ranges calculated for each of the plurality of synthesis methods as the target range, and sets, for the non-target range, the method among the plurality of synthesis methods that minimizes the integrated value of the noise spectrum.
(17) Predict the waveform quality of the restored waveform for the target frame of the waveform data,
Based on the waveform quality, at least one target range is set as a frequency range to be assigned to the redundant data for generating the restored waveform from the waveform data included in the target frame.
A transmission method in which a computer system executes to generate the redundant data based on the target range and generate transmission data including the redundant data.
(18) A receiving unit that receives redundant data assigned to at least one target range of the frequency range of the waveform data included in the target frame based on the waveform quality of the restored waveform related to the target frame of the waveform data.
A receiving device including a waveform restoration unit that generates the restoration waveform based on the redundant data.
(19) The receiving device according to (18).
The restored waveform is a waveform generated based on the redundant data and the composite waveform related to the target frame.
The receiving unit receives designated information that specifies a method for synthesizing the synthesized waveform for each non-target range that is a frequency range other than the target range.
The waveform restoration unit is a receiving device that interpolates the redundant data using the composite waveform generated by the synthesis method specified by the designated information for each non-target range.
(20) A receiving method in which a computer system executes: receiving redundant data assigned to at least one target range of the frequency range of the waveform data included in a target frame, based on the waveform quality of a restored waveform relating to the target frame of the waveform data; and generating the restored waveform based on the redundant data.
(21) Steps for predicting the waveform quality of the restored waveform with respect to the target frame of the waveform data, and
A step of setting at least one target range as a frequency range to be assigned to redundant data for generating the restored waveform from the waveform data included in the target frame based on the waveform quality.
A program that causes a computer system to perform a step of generating the redundant data based on the target range and generating transmission data including the redundant data.
(22) A step of receiving redundant data assigned to at least one target range of the frequency range of the waveform data included in the target frame based on the waveform quality of the restored waveform related to the target frame of the waveform data.
A program that causes a computer system to perform steps to generate the restored waveform based on the redundant data.
1 … Packet
2 … Main data
3 … Redundant data
4 … Original waveform
5 … Restored waveform
6 … Original data
7 … Target frame
8 … Transmission frame
10 … Waveform data
11 … Composite waveform
13 … Noise spectrum
14 … Threshold curve
15 … Tone component
20 … Transmission device
24 … Redundant data generation unit
50 … Receiving device
58 … Signal processing unit
70 … Coded frequency range
71 … Interpolation range
72 … Integration range
74 … Excess range
75 … Candidate range
100 … Transmission/reception system

Claims (20)

  1.  A transmission device, comprising:
     a quality prediction unit that predicts a waveform quality of a restored waveform relating to a target frame of waveform data;
     a range setting unit that sets, based on the waveform quality, at least one target range as a frequency range to be assigned to redundant data for generating the restored waveform from the waveform data included in the target frame; and
     a data generation unit that generates the redundant data based on the target range and generates transmission data including the redundant data.
  2.  The transmission device according to claim 1, wherein
     the restored waveform is a waveform generated on the basis of the redundant data and a composite waveform related to the target frame, and
     the quality prediction unit predicts, as the waveform quality of the restored waveform, a waveform quality of the composite waveform.
  3.  The transmission device according to claim 2, wherein
     the target frame is a frame in the vicinity of a transmission frame transmitted as the transmission data, and
     the quality prediction unit generates the composite waveform related to the target frame on the basis of the waveform data included in the transmission frame.
  4.  The transmission device according to claim 2, wherein
     the quality prediction unit calculates a noise spectrum representing the waveform quality of the composite waveform on the basis of the composite waveform and an original waveform represented by the waveform data included in the target frame, and
     the range setting unit sets the target range on the basis of the noise spectrum.
  5.  The transmission device according to claim 4, wherein
     the range setting unit calculates, on the basis of the noise spectrum and quantization noise accompanying encoding of the redundant data, a total noise amount of interpolated data obtained by interpolating the redundant data with the composite waveform, and sets the target range such that the total noise amount is minimized.
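As a rough illustration only (this sketch is not taken from the application; the function name, the per-bin quantization-noise model, and the fixed-width search are all placeholder assumptions), the minimization in this claim can be pictured as trying each candidate target range, replacing the concealment noise inside it with quantization noise, and keeping the range with the smallest total:

```python
def best_target_range(noise_spectrum, quant_noise_per_bin, width):
    """Try every contiguous range of the given width. Inside the range the
    redundant data replaces the concealment noise with quantization noise;
    outside it the concealment noise remains. Return the range that
    minimizes the total noise of the interpolated data, with that total."""
    base = sum(noise_spectrum)
    best, best_total = None, float("inf")
    for lo in range(len(noise_spectrum) - width + 1):
        covered = sum(noise_spectrum[lo:lo + width])
        total = base - covered + quant_noise_per_bin * width
        if total < best_total:
            best, best_total = (lo, lo + width - 1), total
    return best, best_total

print(best_target_range([1, 5, 6, 1, 1], quant_noise_per_bin=0.5, width=2))
# -> ((1, 2), 4.0): covering bins 1-2 removes 11 of 14 units of noise and adds 1.0
```

In the application the range width and the quantization-noise estimate would themselves depend on the coding of the redundant data; here they are fixed inputs for simplicity.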
  6.  The transmission device according to claim 4, wherein
     the noise spectrum is either a spectrum obtained by frequency-transforming a difference between the original waveform and the composite waveform, or a spectrum representing a difference between a spectrum of the original waveform and a spectrum of the composite waveform.
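The two alternative definitions in this claim can be written out with NumPy's FFT as follows (an illustrative sketch only; the function names are placeholders, not from the application):

```python
import numpy as np

def noise_spectrum_time_domain(original, composite):
    # Definition (a): frequency-transform the time-domain difference
    return np.abs(np.fft.rfft(np.asarray(original) - np.asarray(composite)))

def noise_spectrum_freq_domain(original, composite):
    # Definition (b): difference between the two magnitude spectra
    return np.abs(np.abs(np.fft.rfft(original)) - np.abs(np.fft.rfft(composite)))
```

The two generally differ: definition (a) counts phase mismatch between the waveforms as noise, while definition (b) compares magnitudes only, so by the reverse triangle inequality (a) is never smaller than (b) in any bin.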
  7.  The transmission device according to claim 4, wherein
     the range setting unit sets an integration range over which an integrated value of the noise spectrum is calculated, and sets the target range on the basis of the smallest integration range in which the integrated value exceeds a first threshold value.
  8.  The transmission device according to claim 7, wherein
     the range setting unit sets the lowest frequency of the integration range to the lowest frequency of the noise spectrum, and calculates the integrated value while varying the highest frequency of the integration range.
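A minimal sketch of the selection described in claims 7 and 8 (illustrative only, not the application's implementation): fix the lowest bin of the integration range at the bottom of the noise spectrum and raise the highest bin until the integrated value first exceeds the first threshold value.

```python
def minimal_integration_range(noise_spectrum, first_threshold):
    """Grow the integration range upward from the lowest bin and stop at
    the smallest range whose integrated noise exceeds the threshold."""
    total = 0.0
    for hi, value in enumerate(noise_spectrum):
        total += value
        if total > first_threshold:
            return (0, hi)  # (lowest bin, highest bin) of the minimal range
    return (0, len(noise_spectrum) - 1)  # threshold never exceeded

print(minimal_integration_range([0.5, 1.0, 2.0, 0.5], 3.0))  # -> (0, 2)
```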
  9.  The transmission device according to claim 4, wherein
     the range setting unit calculates at least one excess range in which the noise spectrum exceeds a second threshold value set for each frequency, and sets the target range on the basis of the at least one excess range.
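The excess-range computation in this claim amounts to scanning the noise spectrum against a per-frequency threshold curve (such as the threshold curve 14 in the reference numerals) and collecting the contiguous stretches that exceed it. A placeholder sketch, with bin indices standing in for frequencies:

```python
def excess_ranges(noise_spectrum, threshold_curve):
    """Return contiguous bin ranges where the noise spectrum exceeds a
    per-frequency threshold curve."""
    ranges, start = [], None
    for i, (noise, thresh) in enumerate(zip(noise_spectrum, threshold_curve)):
        if noise > thresh and start is None:
            start = i                       # an excess range opens here
        elif noise <= thresh and start is not None:
            ranges.append((start, i - 1))   # the range closed at the previous bin
            start = None
    if start is not None:
        ranges.append((start, len(noise_spectrum) - 1))
    return ranges

print(excess_ranges([1, 5, 6, 1, 7, 1], [2, 2, 2, 2, 2, 2]))  # -> [(1, 2), (4, 4)]
```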
  10.  The transmission device according to claim 1, wherein
     the range setting unit calculates a plurality of candidate ranges serving as candidates for the target range, and sets the target range on the basis of the plurality of candidate ranges.
  11.  The transmission device according to claim 10, wherein
     the range setting unit calculates a connection cost representing an amount of noise that changes when candidate ranges adjacent to each other are connected, and connects the candidate ranges on the basis of the connection cost.
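One way to picture the connection step in this claim is a greedy merge of adjacent candidate ranges. The cost model below (extra noise picked up in the gap between two ranges) is a toy stand-in for the application's "amount of noise that changes" metric, which is not specified in this claim; all names and the threshold are placeholders.

```python
def connect_ranges(candidates, noise, cost_threshold=0.0):
    """Greedily connect adjacent candidate ranges when the connection cost,
    modeled here as the noise added by bridging the gap between them, does
    not exceed a threshold."""
    merged = [candidates[0]]
    for lo, hi in candidates[1:]:
        prev_lo, prev_hi = merged[-1]
        gap_noise = sum(noise[prev_hi + 1:lo])  # cost of covering the gap bins
        if gap_noise <= cost_threshold:
            merged[-1] = (prev_lo, hi)          # connect into one range
        else:
            merged.append((lo, hi))
    return merged

noise = [0, 0, 9, 0, 0, 0, 1, 0]
print(connect_ranges([(0, 1), (3, 5), (7, 7)], noise, cost_threshold=2.0))
# -> [(0, 1), (3, 7)]: bridging bin 2 costs 9 (too much); bridging bin 6 costs 1
```

In practice such a cost would also credit the per-range coding overhead saved by having fewer, longer ranges; that term is omitted here for brevity.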
  12.  The transmission device according to claim 10, wherein
     the restored waveform is a waveform generated on the basis of the redundant data and a composite waveform related to the target frame, and
     the range setting unit extracts a tonal frequency component included in the waveform data of the target frame, and adjusts the width of the candidate range such that, on the high-frequency side of a predetermined threshold frequency, the tonal frequency component is included.
  13.  The transmission device according to claim 10, wherein
     the range setting unit adjusts the width of the candidate range on the basis of noise components at the highest frequency and the lowest frequency of the candidate range.
  14.  The transmission device according to claim 1, wherein
     the restored waveform is a waveform generated on the basis of the redundant data and a composite waveform related to the target frame, and
     the range setting unit sets, for each non-target range that is a frequency range other than the target range, one of a plurality of synthesis methods for generating the composite waveform.
  15.  The transmission device according to claim 14, wherein
     the quality prediction unit predicts the waveform quality of the composite waveform for each of the plurality of synthesis methods, and
     the range setting unit sets, on the basis of the waveform quality predicted for each of the plurality of synthesis methods, the target range and the synthesis method assigned to the non-target range.
  16.  The transmission device according to claim 15, wherein
     the range setting unit calculates, for each of the plurality of synthesis methods, at least one candidate range serving as a candidate for the target range, sets as the target range the frequency range represented by the intersection of the candidate ranges calculated for the plurality of synthesis methods, and sets for the non-target range the synthesis method, among the plurality of synthesis methods, that minimizes the integrated value of the noise spectrum.
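The intersection step in this claim — keeping only the frequencies that every synthesis method flags as a candidate — can be sketched over discrete bins as follows (illustrative placeholder code, not from the application):

```python
def range_intersection(ranges_per_method, n_bins):
    """Return, as sorted bin indices, the frequencies flagged as candidate
    ranges by every synthesis method; these form the common target range."""
    common = set(range(n_bins))
    for ranges in ranges_per_method:
        flagged = set()
        for lo, hi in ranges:
            flagged.update(range(lo, hi + 1))
        common &= flagged  # keep only bins every method agrees on
    return sorted(common)

# Two methods' candidate ranges over 8 bins:
print(range_intersection([[(1, 4)], [(3, 6)]], 8))  # -> [3, 4]
```

Bins outside the intersection become non-target ranges, for which the claim selects whichever synthesis method gives the smallest integrated noise.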
  17.  A transmission method executed by a computer system, the method comprising:
     predicting a waveform quality of a restored waveform related to a target frame of waveform data;
     setting, on the basis of the waveform quality, at least one target range as a frequency range to be assigned to redundant data for generating the restored waveform from the waveform data included in the target frame; and
     generating the redundant data on the basis of the target range, and generating transmission data including the redundant data.
  18.  A reception device, comprising:
     a reception unit that receives redundant data assigned, on the basis of a waveform quality of a restored waveform related to a target frame of waveform data, to at least one target range of the frequency range of the waveform data included in the target frame; and
     a waveform restoration unit that generates the restored waveform on the basis of the redundant data.
  19.  The reception device according to claim 18, wherein
     the restored waveform is a waveform generated on the basis of the redundant data and a composite waveform related to the target frame,
     the reception unit receives designation information that designates, for each non-target range that is a frequency range other than the target range, a synthesis method for the composite waveform, and
     the waveform restoration unit interpolates, for each non-target range, the redundant data using the composite waveform generated by the synthesis method designated by the designation information.
  20.  A reception method executed by a computer system, the method comprising:
     receiving redundant data assigned, on the basis of a waveform quality of a restored waveform related to a target frame of waveform data, to at least one target range of the frequency range of the waveform data included in the target frame; and
     generating the restored waveform on the basis of the redundant data.
PCT/JP2021/010803 2020-03-30 2021-03-17 Transmission device, transmission method, reception device, and reception method WO2021200151A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2020-061543 2020-03-30
JP2020061543 2020-03-30

Publications (1)

Publication Number Publication Date
WO2021200151A1 true WO2021200151A1 (en) 2021-10-07

Family

ID=77928486

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2021/010803 WO2021200151A1 (en) 2020-03-30 2021-03-17 Transmission device, transmission method, reception device, and reception method

Country Status (1)

Country Link
WO (1) WO2021200151A1 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2004504744A * 2000-07-18 2004-02-12 Robert Bosch GmbH Error concealment method for digital audio data transmission errors
JP2013524732A * 2010-04-12 2013-06-17 Qualcomm Atheros, Inc. Delayed acknowledgment for low overhead communication in networks
US20170103761A1 * 2015-10-10 2017-04-13 Dolby Laboratories Licensing Corporation Adaptive Forward Error Correction Redundant Payload Generation
JP2017529565A * 2014-08-27 2017-10-05 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Encoder, decoder and method for encoding and decoding audio content using parameters for enhancing concealment


Similar Documents

Publication Publication Date Title
US10546594B2 (en) Signal processing apparatus and signal processing method, encoder and encoding method, decoder and decoding method, and program
US9659573B2 (en) Signal processing apparatus and signal processing method, encoder and encoding method, decoder and decoding method, and program
JP5129888B2 (en) Transcoding method, transcoding system, and set top box
JP4876574B2 (en) Signal encoding apparatus and method, signal decoding apparatus and method, program, and recording medium
US9583112B2 (en) Signal processing apparatus and signal processing method, encoder and encoding method, decoder and decoding method, and program
AU2012297804B2 (en) Encoding device and method, decoding device and method, and program
JP3983668B2 (en) How to enhance the performance of coding systems that use high-frequency reconstruction methods
JP5048697B2 (en) Encoding device, decoding device, encoding method, decoding method, program, and recording medium
JP3926726B2 (en) Encoding device and decoding device
JP5942358B2 (en) Encoding apparatus and method, decoding apparatus and method, and program
AU2018201468B2 (en) Audio decoding device, audio encoding device, audio decoding method, audio encoding method, audio decoding program, and audio encoding program
KR20050030887A (en) Signal encoding device, method, signal decoding device, and method
JP2015500514A (en) Apparatus, method and computer program for avoiding clipping artifacts
JPWO2007116809A1 (en) Stereo speech coding apparatus, stereo speech decoding apparatus, and methods thereof
JP2012032803A (en) Full-band scalable audio codec
JP4736812B2 (en) Signal encoding apparatus and method, signal decoding apparatus and method, program, and recording medium
WO2006001159A1 (en) Signal encoding device and method, and signal decoding device and method
JP5589631B2 (en) Voice processing apparatus, voice processing method, and telephone apparatus
JP4558734B2 (en) Signal decoding device
WO2021200151A1 (en) Transmission device, transmission method, reception device, and reception method
JPWO2008155835A1 (en) Decoding device, decoding method, and program
JP2005114814A (en) Method, device, and program for speech encoding and decoding, and recording medium where same is recorded
JP5093514B2 (en) Audio encoding apparatus, audio encoding method and program thereof
JPH07225597A (en) Method and device for encoding/decoding acoustic signal
JP2009103974A (en) Masking level calculating device, encoder, masking level calculating method and masking level calculation program

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21778921

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21778921

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: JP