WO2019008700A1

WO2019008700A1 - Audio decoding device for digital broadcasting

Info

Publication number: WO2019008700A1
Application number: PCT/JP2017/024652
Authority: WO
Inventors: 忠俊大久保
Original assignee: Mitsubishi Electric Corporation (三菱電機株式会社)
Priority date: 2017-07-05
Filing date: 2017-07-05
Publication date: 2019-01-10
Also published as: DE112017007504T5; DE112017007504B4

Abstract

A detection unit (20a) detects, in the bit allocation information of each subband, the subbands to which zero bits are allocated. A correction unit (20b) sets to zero the bit allocation of every subband higher in frequency than the lowest-frequency subband among those to which zero bits are allocated.

Description

Audio decoding device for digital broadcasting

The present invention relates to an apparatus for decoding digital broadcast audio.

In digital broadcasting, audio compressed with an audio codec is received and decoded. If reception conditions are poor, the data before decoding, that is, the data to be decoded, contains errors. In particular, MPEG Audio Layer 2, an older audio codec used in Europe for digital radio broadcasting (DAB: Digital Audio Broadcasting), digital terrestrial television broadcasting (DVB-T: Digital Video Broadcasting - Terrestrial), and the like, has low error resilience. Important information such as the bit allocation information and the scale factor information is carried unprotected in the compressed data frame. Consequently, if an error corrupts such important information, the result is often loud, unpleasant noise.
To address this, Patent Document 1, for example, describes an audio reproduction apparatus that detects abnormal frames in audio data and replaces the data of a detected abnormal frame with data from the normal frames before and after it. This apparatus also mutes abnormal frames when they occur consecutively.

Patent Document 1: Japanese Patent No. 3596978

However, when replacement with the preceding and following normal frames or muting is performed, as in the audio reproduction apparatus of Patent Document 1, the data of a frame judged to be abnormal is not used for reproduction at all. In other words, not only the erroneous part of the abnormal frame but also its error-free part is discarded. It was therefore not possible to preserve as much of the original audio component as possible.

The present invention has been made to solve the above problem, and an object of the present invention is to provide an audio decoding apparatus for digital broadcasting that can suppress the influence of errors causing abnormal noise while preserving the original audio components.

The audio decoding apparatus for digital broadcasting according to the present invention includes a detection unit that, for the bit allocation information of each subband, detects the lowest-frequency subband among the subbands whose bit allocation is 0, and a correction unit that corrects to 0 the bit allocation of the subbands higher in frequency than the lowest-frequency subband detected by the detection unit.

Another audio decoding apparatus for digital broadcasting according to the present invention includes a determination unit that determines whether the total data amount of one frame, calculated using the bit allocation information and the scale factor information, exceeds the maximum data amount of one frame specified by the bit rate, and a correction unit that, as long as the total data amount exceeds the maximum data amount, repeatedly corrects to 0 the bit allocation of the highest-frequency subband among the subbands whose bit allocation is not 0.

According to the present invention, the influence of errors causing abnormal noise can be suppressed while the original audio components are preserved.

FIG. 1 is a diagram showing the data frame format of MPEG Audio Layer 2.
FIG. 2A is a diagram showing the structure of the header.
FIG. 2B is a diagram showing the meaning of an example header.
FIG. 3 is an example of a bit stream that can be decoded normally without noise.
FIG. 4A is a diagram showing the structure of the sample data of the subbands of group 0.
FIG. 4B is a diagram showing the structure of the sample data of the subbands of group 1.
FIG. 4C is a diagram showing the structure of the sample data of the subbands of group 11.
FIG. 5 is a diagram showing the structure of the data representing the bit allocation index.
FIG. 6A is a diagram showing the maximum value + 1 that the sample data of each subband can take.
FIG. 6B is a diagram showing the number of bits allocated to the sample data of each subband.
FIG. 7 is a diagram showing the bit allocation index information corresponding to FIG. 3.
FIG. 8A is a diagram showing the sample data of the subbands of group 0 corresponding to FIG. 7.
FIG. 8B is a diagram showing the sample data of the subbands of group 1 corresponding to FIG. 7.
FIG. 8C is a diagram showing the sample data of the subbands of group 11 corresponding to FIG. 7.
FIG. 9 is a diagram showing the meaning of the scale factor selection information.
FIG. 10 is a diagram showing the relationship between the scale factor index and the scale factor.
FIG. 11 is a diagram showing the structure of the scale factor information.
FIG. 12 is a diagram showing the scale factor information corresponding to FIG. 3.
FIG. 13 is an example of a bit stream that produces noise.
FIG. 14 is another example, different from FIG. 13, of a bit stream that produces noise.
FIG. 15 is a diagram showing the bit allocation index information corresponding to FIG. 13.
FIG. 16 is a diagram showing the bit allocation index information of FIG. 15 after correction.
FIG. 17A is a diagram showing the sample data of the subbands of group 0 corresponding to FIG. 13.
FIG. 17B is a diagram showing the sample data of the subbands of group 1 corresponding to FIG. 13.
FIG. 17C is a diagram showing the sample data of the subbands of group 11 corresponding to FIG. 13.
FIG. 18A is a diagram showing the sample data of the subbands of group 0 when correction is applied to FIG. 13.
FIG. 18B is a diagram showing the sample data of the subbands of group 1 when correction is applied to FIG. 13.
FIG. 18C is a diagram showing the sample data of the subbands of group 11 when correction is applied to FIG. 13.
FIG. 19 is a block diagram showing the configuration of the audio decoding apparatus according to Embodiment 1.
FIGS. 20A and 20B are diagrams showing examples of the hardware configuration of the audio decoding apparatus according to Embodiment 1.
FIG. 21 is a flowchart showing the processing of the audio decoding apparatus according to Embodiment 1.
FIG. 22 is a block diagram showing the configuration of the audio decoding apparatus according to Embodiment 2.
FIG. 23 is a flowchart showing the processing of the audio decoding apparatus according to Embodiment 2.
FIG. 24 is a diagram showing the scale factors corresponding to FIG. …
FIG. 25 is a block diagram showing the configuration of the audio decoding apparatus according to Embodiment 3.
FIG. 26 is a flowchart showing the processing of the audio decoding apparatus according to Embodiment 3.
FIG. 27 is a decoding result of the bit stream of FIG. …
FIG. 28 is a decoding result of the bit stream of FIG. …
FIG. 29 is a decoding result after bit allocation correction is applied to the bit stream of FIG. …
FIG. 30 is a diagram showing the scale factor index values when the decoding result of FIG. 29 is obtained.
FIG. 31 is a diagram showing the scale factor values corresponding to FIG. 30.
FIG. 32 is a diagram showing, for the scale factor values of FIG. 31, the ratio of each value to the scale factor values of the adjacent subbands.
FIG. 33 is a graph of FIG. …
FIG. 34 is a diagram showing the scale factor indices after correction.
FIG. 35 is a diagram showing the scale factor values corresponding to FIG. 34.
FIG. 36 is a diagram showing, for the scale factor values of FIG. 35, the ratio of each value to the scale factor values of the adjacent subbands.
FIG. 37 is a decoding result when the scale factor index values are corrected.
FIG. 38 is a block diagram showing the configuration of the audio decoding apparatus according to Embodiment 4.
FIG. 39 is a flowchart showing the processing of the audio decoding apparatus according to Embodiment 4.
FIG. 40 is a block diagram showing the configuration of the audio decoding apparatus according to Embodiment 5.
FIG. 41 is a block diagram showing the configuration of an audio decoding apparatus as a reference example.

Hereinafter, to describe the present invention in more detail, embodiments for carrying out the present invention are described with reference to the accompanying drawings.
Embodiment 1
First, the decoding process of MPEG Audio Layer 2 is briefly described using the audio decoding apparatus 100 for digital broadcasting shown in FIG. 41. The processing of MPEG Audio is described in general textbooks, which may be consulted for details; see, for example, "Point Illustrated: Latest MPEG Textbook" (pages 167-187), published by ASCII Corporation on August 1, 1994.

The audio decoding apparatus 100 performs MPEG Audio Layer 2 decoding and, as shown in FIG. 41, includes a synchronization detector 10, a frame separator 11, a bit allocation decoder 12, an inverse quantizer 13, a scale factor decoder 14, an inverse normalizer 15, a subband synthesizer 16, an error detector 17, and a mute controller 18.
The synchronization detector 10 detects synchronization in the compressed audio bit stream input to the audio decoding apparatus 100 and outputs the synchronized bit stream to the frame separator 11.

The frame separator 11 separates the bit allocation information, the scale factor information, and the quantized sample data from the bit stream synchronized by the synchronization detector 10. It outputs the bit allocation information to the bit allocation decoder 12, the scale factor information to the scale factor decoder 14, and the quantized sample data to the inverse quantizer 13.

The bit allocation decoder 12 decodes, from the bit allocation information, the number of bits allocated to the quantized sample data of each subband, and outputs the decoded numbers of allocated bits to the inverse quantizer 13.
The inverse quantizer 13 separates the quantized sample data into individual sample data for each subband, using the number of allocated bits output from the bit allocation decoder 12. The dequantizer 13 outputs the separated individual sample data to the denormalizer 15.

The scale factor decoder 14 decodes the scale factor index value of each subband from the scale factor information, and outputs the value to the denormalizer 15.
The denormalizer 15 denormalizes the sample data output from the dequantizer 13 using a scale factor value corresponding to the scale factor index value. The denormalizer 15 outputs the denormalized sample data to the subband synthesizer 16.

The subband synthesizer 16 synthesizes the denormalized sample data of each subband and outputs it as time-series audio data.
In the audio decoding apparatus 100 of FIG. 41, when the error detector 17 detects an error, the mute controller 18 stops the output of audio data, and the audio decoding apparatus 100 enters the mute state.

For simplicity, the following description uses as an example a sampling frequency of 48 kHz, a bit rate of 256 kbps, and a stereo channel configuration.
FIG. 1 shows the data frame format of MPEG Audio Layer 2. The bit stream consists of a sequence of frames, and each frame consists of a header, bit allocation information, a cyclic redundancy check (CRC), scale factor information, quantized sample data, and additional data.

The header is a fixed length of 4 bytes. The bit allocation information is 22 bytes when the bit rate is 256 kbps. The presence or absence of the CRC is designated by the header, and is 2 bytes if it is present, and 0 bytes if it is absent. The additional data is also referred to as padding data.

Since one frame carries 24 ms of audio and the sampling frequency is 48 kHz, the number of discrete audio data obtained by decoding one frame, that is, the number of consecutive samples in the time domain, is given by the following equation (1).
$$24 \times 10^{-3} \times 48 \times 10^{3} = 1152\ \text{[samples]} \qquad (1)$$
Because the signal is stereo, 1152 samples are obtained for each of the left channel (Lch) and the right channel (Rch).
The structure of the header is as shown in FIG. 2A.
When the bit rate is 256 kbps, the number of bytes of one frame before decoding is given by the following equation (2).
$$256 \times 10^{3} \times 24 \times 10^{-3} \times \frac{1}{8} = 768\ \text{[bytes]} \qquad (2)$$
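Equations (1) and (2) can be checked with a short Python sketch. It assumes, as the text does, a fixed frame duration of 24 ms (which holds for 1152-sample frames at 48 kHz); the function names are illustrative only. Exact fractions are used so that the results are not affected by floating-point rounding.

```python
from fractions import Fraction

# Frame duration assumed in the text: 1152 samples at 48 kHz = 24 ms.
FRAME_DURATION_S = Fraction(24, 1000)


def samples_per_frame(sample_rate_hz: int) -> Fraction:
    """Equation (1): frame duration x sampling frequency."""
    return FRAME_DURATION_S * sample_rate_hz


def bytes_per_frame(bitrate_bps: int) -> Fraction:
    """Equation (2): bit rate x frame duration / 8 bits per byte."""
    return bitrate_bps * FRAME_DURATION_S / 8


print(samples_per_frame(48_000))  # 1152
print(bytes_per_frame(256_000))   # 768
```

The 768-byte (6144-bit) result is the upper bound that any correctly formed 256 kbps frame must respect.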

In the following specific examples, the bit stream shown in FIG. 3 is used as an example of a data frame that can be decoded normally without noise. In FIG. 3, the boundaries between the header, the bit allocation information, the scale factor information, and the quantized sample data are indicated by dashed lines. The quantized sample data is 58 bytes per group.
In the bit stream shown in FIG. 3, the header is the 32 bits (4 bytes) FF FD C4 00, whose meaning is shown in FIG. 2B.
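The meaning of this header can be reproduced by unpacking its 32 bits. The following Python sketch assumes the MPEG-1 audio header layout and the Layer II bit rate table of ISO/IEC 11172-3; `parse_header` and the table names are illustrative and not part of any library.

```python
# MPEG-1 Layer II tables per ISO/IEC 11172-3 (bit rate index 0 = free format, 15 = forbidden).
LAYER2_BITRATES_KBPS = [None, 32, 48, 56, 64, 80, 96, 112, 128,
                        160, 192, 224, 256, 320, 384, None]
SAMPLE_RATES_HZ = [44_100, 48_000, 32_000, None]  # index 3 is reserved
MODES = ["stereo", "joint_stereo", "dual_channel", "single_channel"]
LAYERS = {0b11: 1, 0b10: 2, 0b01: 3}  # 0b00 is reserved


def parse_header(header: bytes) -> dict:
    """Unpack the 32-bit MPEG-1 audio frame header (most significant bit first)."""
    if len(header) != 4:
        raise ValueError("header must be exactly 4 bytes")
    word = int.from_bytes(header, "big")
    if word >> 20 != 0xFFF:
        raise ValueError("12-bit syncword 0xFFF not found")
    return {
        "id": (word >> 19) & 0b1,                  # 1 = MPEG-1
        "layer": LAYERS.get((word >> 17) & 0b11),
        "crc_present": ((word >> 16) & 0b1) == 0,  # protection_bit 0 -> CRC follows
        "bitrate_kbps": LAYER2_BITRATES_KBPS[(word >> 12) & 0xF],
        "sample_rate_hz": SAMPLE_RATES_HZ[(word >> 10) & 0b11],
        "padding": (word >> 9) & 0b1,
        "mode": MODES[(word >> 6) & 0b11],
        "mode_extension": (word >> 4) & 0b11,
        "copyright": (word >> 3) & 0b1,
        "original": (word >> 2) & 0b1,
        "emphasis": word & 0b11,
    }


print(parse_header(bytes.fromhex("FFFDC400")))
```

For FF FD C4 00 the sketch reports Layer II, 256 kbps, 48 kHz, stereo, and no CRC, which matches the 0-byte CRC used in the frame size calculation of this example.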

In MPEG Audio Layer 2 encoding, the audio data is divided into 32 subbands by a subband filter bank, unnecessary data is removed from the signal of each subband, and the data amount is reduced by optimizing the bit allocation for each subband.
In the subband division filter used for encoding, the 1152 time-domain samples of one frame are expanded 32-fold in the frequency direction and reduced to 1/32 in the time direction, giving 32 (the number of subbands, in the frequency direction) × 36 (the number of samples per subband, in the time direction). That is, each subband in one frame has 36 samples.

At the time of decoding, the sub-band synthesizer 16 combines data of 32 sub-bands × 36 samples to restore audio data of 1152 samples in the time domain.

The quantized sample data consists of 12 groups, groups 0 to 11, arranged as shown in FIG. … Each group holds three samples in the time direction: group 0 holds samples 0, 1, and 2, group 1 holds samples 3, 4, and 5, and so on. FIGS. 4A to 4C show the basic structure of the sample data of the subbands. FIG. 4A shows group 0, FIG. 4B shows group 1, and FIG. 4C shows group 11; the other groups have the same structure.
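The mapping between the 36 samples of each subband and the 12 groups can be written down directly. This Python sketch only restates the layout described above; the constant and function names are illustrative.

```python
SUBBANDS = 32             # frequency direction
SAMPLES_PER_SUBBAND = 36  # time direction, per frame
SAMPLES_PER_GROUP = 3
GROUPS = SAMPLES_PER_SUBBAND // SAMPLES_PER_GROUP  # 12 groups: 0..11


def group_of(sample_index: int) -> int:
    """Return the group (0-11) that holds a given subband sample index (0-35)."""
    if not 0 <= sample_index < SAMPLES_PER_SUBBAND:
        raise ValueError("sample index must be in 0..35")
    return sample_index // SAMPLES_PER_GROUP


def samples_in_group(group: int) -> list[int]:
    """Return the three time-direction sample indices carried by a group."""
    if not 0 <= group < GROUPS:
        raise ValueError("group must be in 0..11")
    start = group * SAMPLES_PER_GROUP
    return list(range(start, start + SAMPLES_PER_GROUP))


assert SUBBANDS * SAMPLES_PER_SUBBAND == 1152  # 32 x 36 time-domain samples per channel
print(samples_in_group(0), samples_in_group(1), samples_in_group(11))
# [0, 1, 2] [3, 4, 5] [33, 34, 35]
```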

The number of bits actually allocated to each sample of the quantized sample data of each subband is given by the bit allocation information. In addition, the amplitude of each subband is normalized in the quantized sample data, and the actual amplitude of each subband is given separately by the scale factor information.

The data structure and processing of these bit allocation information and scale factor information necessary for decoding will be described below.
First, bit allocation information will be described.
In the bit allocation information, data having a structure shown in FIG. 5 is transmitted as a bit allocation index.

In ascending order of subband number, that is, starting from the lowest frequency components, 4 bits are provided for each of subbands 0 to 10, 3 bits for each of subbands 11 to 22, 2 bits for each of subbands 23 to 26, and 0 bits for each of subbands 27 to 31, separately for Lch and Rch of the stereo signal, so that a total of 22 bytes of information is transmitted. Because the bit allocation index is at most 4 bits long, it can take values from 0 to 15. Note that no information is actually transmitted for subbands 27 to 31.
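The field widths above can be turned into a small reader for the bit allocation information. This Python sketch assumes the subband-major, channel-minor read order that ISO/IEC 11172-3 uses for non-joint stereo (Lch then Rch for each subband); `BitReader`, `nbal`, and `read_bit_allocation` are hypothetical helpers, not a library API.

```python
def nbal(subband: int) -> int:
    """Bits of the bit allocation index for one subband (48 kHz, 256 kbps, stereo)."""
    if 0 <= subband <= 10:
        return 4
    if 11 <= subband <= 22:
        return 3
    if 23 <= subband <= 26:
        return 2
    if 27 <= subband <= 31:
        return 0  # nothing is transmitted
    raise ValueError("subband must be in 0..31")


class BitReader:
    """Minimal MSB-first bit reader (illustrative helper)."""

    def __init__(self, data: bytes) -> None:
        self.data = data
        self.pos = 0  # current bit position

    def read(self, n: int) -> int:
        value = 0
        for _ in range(n):
            byte = self.data[self.pos >> 3]
            value = (value << 1) | ((byte >> (7 - (self.pos & 7))) & 1)
            self.pos += 1
        return value


def read_bit_allocation(reader: BitReader, channels: int = 2) -> list[list[int]]:
    """Return allocation[ch][sb]; read order (sb outer, ch inner) is an assumption."""
    allocation = [[0] * 32 for _ in range(channels)]
    for sb in range(32):
        for ch in range(channels):
            allocation[ch][sb] = reader.read(nbal(sb))
    return allocation


total_bits = 2 * sum(nbal(sb) for sb in range(32))
print(total_bits, total_bits // 8)  # 176 22
```

The 176 bits computed here are the bit allocation term that appears in the frame size calculation of equation (6).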

The combination of subband number (0 to 31) and bit allocation index value (0 to 15) gives the quantization level of each subband, as shown in FIG. 6A. These values are defined by ISO/IEC 11172-3, the standard that specifies MPEG Audio Layer 2. Nbal in FIG. 6A corresponds to the number of bits in FIG. 5, and index in FIG. 6A denotes the bit allocation index.

Let N be the quantization level shown in FIG. 6A; each value of the quantized sample data is then quantized to a value from 0 to N-1. Since the values shown in FIG. 6A are generally of the form 2^n - 1 (n is an integer of 2 or more), n bits are allocated to each value. However, when the quantization level is 3, 5, or 9, three sample values are first combined into one granule by equations (3) to (5), and bits are then allocated to the granule, which reduces the required number of bits. Equation (3) applies when the quantization level is 3 (that is, values 0 to 2), equation (4) when it is 5 (values 0 to 4), and equation (5) when it is 9 (values 0 to 8).
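Equations (3) to (5) themselves are not reproduced in this text. The Python sketch below assumes they match the grouping defined in ISO/IEC 11172-3, where three consecutive samples x, y, z with N levels are packed as v = x + N·y + N²·z into a 5-, 7-, or 10-bit codeword; `pack_granule` and `unpack_granule` are illustrative names. It also compares the codeword length with coding the three samples separately.

```python
# Codeword length for the grouped quantization levels (levels**3 must fit).
GROUPED_CODEWORD_BITS = {3: 5, 5: 7, 9: 10}


def pack_granule(samples: tuple[int, int, int], levels: int) -> int:
    """Combine three consecutive samples (x, y, z) into one granule codeword."""
    if levels not in GROUPED_CODEWORD_BITS:
        raise ValueError("grouping applies only to 3, 5 or 9 levels")
    if not all(0 <= s < levels for s in samples):
        raise ValueError("sample out of range")
    x, y, z = samples
    return x + levels * y + levels * levels * z


def unpack_granule(code: int, levels: int) -> tuple[int, int, int]:
    """Inverse of pack_granule: peel off base-`levels` digits, x first."""
    x = code % levels
    code //= levels
    y = code % levels
    z = code // levels
    return x, y, z


for levels, bits in GROUPED_CODEWORD_BITS.items():
    separate = 3 * (levels - 1).bit_length()  # three samples coded separately
    assert levels ** 3 <= 2 ** bits           # every granule fits in `bits`
    print(f"{levels} levels: {bits} bits grouped vs {separate} bits separate")
```

The saving (5 vs 6, 7 vs 9, and 10 vs 12 bits) is why grouped samples effectively cost a fraction of a bit less per sample.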

FIG. 6A shows, for each subband, the maximum value that the sample data (three sample values) can take, plus 1. FIG. 6B shows the number of bits allocated to the sample data (three sample values) of each subband.

FIG. 7 shows, as a specific example that can be decoded normally without abnormal noise, the relationship between the bit allocation index values and the bit allocation to the quantized sample data of each subband.
In FIG. 7, the bit allocation index values of the high-frequency components, from subband 17 upward for Lch and from subband 18 upward for Rch, are 0. In this case, as shown in FIGS. 8A to 8C, which show the sample data of the subbands, no subband sample data is transmitted for Lch subbands 17 and above or for Rch subbands 18 and above. For Lch subbands 16 and below and Rch subbands 17 and below, the quantization levels and numbers of bits per sample are obtained from FIGS. 6A, 6B, and 7.

Because some of the quantized sample data are allocated bits as granules, three samples form one group, so that the bit allocation per group is always a whole number of bits; for granulated samples, the allocation per sample is a multiple of 1/3 bit. In the example of FIG. 7, the number of bits allocated to one group of quantized sample data (32 subbands × 3 samples in each channel), that is, three times the total number of bits per sample over all 32 subbands, is therefore 464 bits, or 58.0 bytes. The quantized sample data of the whole frame consists of 12 such groups.
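The per-group bit count can be computed from the quantization level of each channel and subband. In this Python sketch, `levels[ch][sb]` is a hypothetical input (0 meaning no allocation); grouped levels use the codeword lengths of ISO/IEC 11172-3, and other levels of the form 2^n - 1 use n bits per sample. The example values are illustrative and are not the data of FIG. 7.

```python
from fractions import Fraction

GROUPED_CODEWORD_BITS = {3: 5, 5: 7, 9: 10}
GROUPS_PER_FRAME = 12


def bits_for_three_samples(levels: int) -> int:
    """Bits used by one subband's three samples in one group."""
    if levels == 0:
        return 0  # no allocation: nothing transmitted
    if levels in GROUPED_CODEWORD_BITS:
        return GROUPED_CODEWORD_BITS[levels]  # one granule codeword
    n = levels.bit_length()
    if levels != 2 ** n - 1:
        raise ValueError("expected 3, 5, 9 or 2**n - 1 levels")
    return 3 * n


def group_bits(levels: list[list[int]]) -> int:
    """Bits of one group: all channels and subbands, three samples each."""
    return sum(bits_for_three_samples(lv) for ch in levels for lv in ch)


example = [[3, 7, 0], [9, 0, 0]]  # hypothetical levels: 2 channels x 3 subbands
g = group_bits(example)
print(g, g * GROUPS_PER_FRAME)                 # 24 288
print(Fraction(bits_for_three_samples(3), 3))  # 5/3 bit per sample
```

The last line illustrates the fractional per-sample cost of granulated samples, while the per-group total stays an integer.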

Subsequently, scale factor information will be described.
The scale factor information consists of two parts: scale factor selection information and scale factor index information. For both, information is transmitted only for subbands to which bits are allocated according to the bit allocation index.
One scale factor selection information value is assigned to each channel (Lch and Rch) and each subband. Three scale factor index values are assigned to each channel and each subband.
However, depending on the value of the scale factor selection information, two or all three of these scale factor index values may share a single transmitted value.

The scale factor selection information (ScFsi) is a 2-bit value whose meaning is shown in FIG. 9. The scale factor index information (ScFi) is a 6-bit value with an index range of 0 to 62, and the scale factor value shown in FIG. 10 is used for each index value. In FIG. 10, "value" denotes the scale factor value.
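The sharing rule and the index-to-value mapping can be sketched in Python, assuming FIGS. 9 and 10 follow ISO/IEC 11172-3: ScFsi 0 transmits three indices, 1 uses its first index for parts 0 and 1, 2 uses one index for all three parts, and 3 uses its second index for parts 1 and 2; the scale factor value for index k is 2 × 2^(-k/3). Function names are illustrative.

```python
def expand_scale_factors(scfsi: int, transmitted: list[int]) -> tuple[int, int, int]:
    """Return the scale factor indices for the three parts of a frame."""
    expected = {0: 3, 1: 2, 2: 1, 3: 2}[scfsi]
    if len(transmitted) != expected:
        raise ValueError(f"ScFsi {scfsi} expects {expected} indices")
    if scfsi == 0:
        a, b, c = transmitted
    elif scfsi == 1:
        a, c = transmitted
        b = a  # parts 0 and 1 share the first index
    elif scfsi == 2:
        (a,) = transmitted
        b = c = a  # one index for all three parts
    else:
        a, b = transmitted
        c = b  # parts 1 and 2 share the second index
    return a, b, c


def scale_factor_value(index: int) -> float:
    """Scale factor for a 6-bit index 0..62 (63 is not used)."""
    if not 0 <= index <= 62:
        raise ValueError("scale factor index must be in 0..62")
    return 2.0 * 2.0 ** (-index / 3)


print(expand_scale_factors(3, [4, 9]))                   # (4, 9, 9)
print([scale_factor_value(k) for k in (0, 3, 6)])        # [2.0, 1.0, 0.5]
```

Every three index steps halve the scale factor, which is why large index values correspond to very quiet subbands.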

FIG. 12 shows a specific example of the scale factor information that can be decoded normally without noise. FIG. 11 shows the structure of scale factor information.
According to the scale factor selection information (ScFsi), one to three scale factor index values are transmitted for each channel and subband with bit allocation.
The required data amount of the scale factor selection information (ScFsi) is 2 bits for each channel and subband with bit allocation, which is 70 bits (8 bytes + 6 bits) in the example of FIG. 12.
The required data amount of the scale factor index information (ScFi) is determined by the scale factor selection information (ScFsi) of each channel and subband with bit allocation, and is 294 bits (36 bytes + 6 bits) in the example of FIG. 12.
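Both amounts follow from the bit allocation and ScFsi values. The Python sketch below counts them, assuming 2 bits of ScFsi and 6 bits per transmitted index as described above; `allocation` and `scfsi` are hypothetical per-channel lists. As a consistency check, FIG. 7 has allocations up to Lch subband 16 and Rch subband 17, i.e. 17 + 18 = 35 subbands, and 35 × 2 = 70 bits, while 294 bits correspond to 49 transmitted 6-bit indices.

```python
# Number of 6-bit scale factor indices transmitted for each ScFsi value.
SCF_COUNT = {0: 3, 1: 2, 2: 1, 3: 2}


def scale_factor_bits(allocation: list[list[int]],
                      scfsi: list[list[int]]) -> tuple[int, int]:
    """Return (ScFsi bits, ScFi bits) for one frame.

    allocation[ch][sb] and scfsi[ch][sb] are hypothetical decoded inputs;
    only subbands with a non-zero allocation carry scale factor data.
    """
    scfsi_bits = scf_bits = 0
    for ch_alloc, ch_sel in zip(allocation, scfsi):
        for alloc, sel in zip(ch_alloc, ch_sel):
            if alloc:
                scfsi_bits += 2
                scf_bits += 6 * SCF_COUNT[sel]
    return scfsi_bits, scf_bits


print(scale_factor_bits([[1, 1, 0], [2, 0, 0]], [[0, 2, 3], [1, 0, 0]]))  # (6, 36)
```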

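FIGS. 7 and 12 use the same normally decodable data as an example. The data amount of one frame in this case (header + bit allocation information + CRC + scale factor selection information and scale factor index information + quantized sample data) is given by the following equation (6).
$$32 + 176 + 0 + (70 + 294) + 464 \times 12 = 6140\ \text{[bits]} = 767\ \text{[bytes]} + 4\ \text{[bits]} \qquad (6)$$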
The amount of data of one frame can be calculated using bit allocation information and scale factor information as shown in equation (6).
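Equation (6) and the bit-rate limit of equation (2) can be combined into a simple size check. This Python sketch ignores padding and uses illustrative function names; it is not the determination logic of any particular embodiment.

```python
HEADER_BITS = 32
GROUPS_PER_FRAME = 12


def frame_bits(bit_alloc_bits: int, crc_bits: int, scfsi_bits: int,
               scf_bits: int, group_bits: int) -> int:
    """Equation (6): header + bit allocation + CRC + scale factors + 12 groups."""
    return (HEADER_BITS + bit_alloc_bits + crc_bits
            + (scfsi_bits + scf_bits) + group_bits * GROUPS_PER_FRAME)


def max_frame_bits(bitrate_bps: int, frame_duration_ms: int = 24) -> int:
    """Largest frame size allowed by the bit rate (padding ignored)."""
    return bitrate_bps * frame_duration_ms // 1000


total = frame_bits(176, 0, 70, 294, 464)
print(total, divmod(total, 8))           # 6140 (767, 4)
print(total <= max_frame_bits(256_000))  # True (limit 6144 bits)
```

By comparison, a group size of 73 bytes + 2 bits (586 bits), as in the noisy frame of FIG. 13, would give 586 × 12 = 7032 bits for the quantized sample data alone, already above the 6144-bit limit.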

This completes the outline of the MPEG Audio Layer 2 decoding process, described using data that can be decoded normally. Next, the audio decoding apparatus 1 according to Embodiment 1 is described. The audio decoding apparatus 1 decodes digital broadcast audio.

Unlike FIG. 3, FIG. 13 is an example of a data frame in which abnormal noise occurs when it is decoded by the audio decoding apparatus 100 of FIG. 41 described above. In FIG. 13, the boundaries between the header, the bit allocation information, the scale factor information, and the quantized sample data are indicated by dashed lines. The quantized sample data is 73 bytes + 2 bits per group. For reference, FIG. 14 shows another example of a data frame in which abnormal noise occurs.
In the data frame shown in FIG. 13, the bit allocation index information has the values shown in FIG. 15. As FIG. 15 shows, after the bit allocation becomes 0 at Lch subband 16 and above and at Rch subband 17 and above, non-zero bit allocations appear again in the high-frequency subbands 23 to 26.

When actual audio data is encoded with MPEG Audio Layer 2, the main information is generally concentrated in the low-frequency components, so the bit allocations and scale factor values tend to be large at low frequencies and to decrease as the frequency increases.
The bit allocation shown in FIG. 15 departs from this general tendency, so an error is presumed to have entered the high-frequency subband components of the bit allocation information.

Therefore, in Embodiment 1, when a value other than 0, that is, a value indicating that bits are allocated, appears in a subband higher in frequency than a subband whose bit allocation has once become 0, the bit allocation is corrected so that no bits are allocated to that subband.
For the data frame of FIG. 13, for example, this correction changes the bit allocation index information shown in FIG. 15 to that shown in FIG. 16.
Note that performing this correction whenever even a single subband has a bit allocation of 0 may over-correct. Therefore, the apparatus may instead detect that subbands with a bit allocation of 0 occur consecutively, and set to 0 the bit allocation of the subbands higher in frequency than the highest-frequency subband in that run of consecutive zero-allocation subbands.
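A minimal Python sketch of this per-channel correction follows. It operates on one channel's list of bit allocation indices (Lch and Rch are handled separately). `detect_zero_subband` and `correct_allocation` are hypothetical names loosely mirroring the detection unit 20a and correction unit 20b of FIG. 19, and `min_zero_run` is an assumed parameter for the consecutive-zero variant, whose required run length the text does not specify.

```python
from typing import Optional


def detect_zero_subband(allocation: list[int], min_zero_run: int = 1) -> Optional[int]:
    """Return the subband above which allocations should be cleared, or None.

    min_zero_run=1: the lowest-frequency subband whose index is 0.
    min_zero_run>1 (hypothetical): the subband at which the first run of that
    many consecutive zeros is reached; clearing everything above it gives the
    same result as clearing above the top of the run, since the run is all 0.
    """
    if min_zero_run < 1:
        raise ValueError("min_zero_run must be at least 1")
    run = 0
    for sb, index in enumerate(allocation):
        run = run + 1 if index == 0 else 0
        if run == min_zero_run:
            return sb
    return None


def correct_allocation(allocation: list[int], zero_subband: Optional[int]) -> list[int]:
    """Return a copy with every subband above zero_subband set to 0."""
    corrected = list(allocation)
    if zero_subband is not None:
        for sb in range(zero_subband + 1, len(corrected)):
            corrected[sb] = 0
    return corrected


# Hypothetical Lch indices shaped like FIG. 15: zero from subband 16,
# then spurious non-zero values in subbands 23-26.
lch = [6] * 16 + [0] * 7 + [2, 1, 3, 1] + [0] * 5
fixed = correct_allocation(lch, detect_zero_subband(lch))
print(fixed == [6] * 16 + [0] * 16)  # True
print(fixed != lch)                  # True: the frame must be re-separated
```

Comparing the corrected list with the original is a cheap way to decide whether the frame needs to be separated again with the new allocation.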
FIGS. 17A to 17C show the sample data of the subbands when bit allocation correction is not applied to the data frame of FIG. 13. FIGS. 18A to 18C show the sample data of the subbands when bit allocation correction is applied to the data frame of FIG. 13.

FIG. 19 is a block diagram showing the configuration of the audio decoding apparatus 1 according to Embodiment 1, which performs this correction.
The audio decoding apparatus 1 includes a synchronization detector 10, a frame separator 11, a bit allocation decoder 12, an inverse quantizer 13, a scale factor decoder 14, an inverse normalizer 15, a subband synthesizer 16, and a bit allocation error corrector 20.

The synchronization detector 10, the frame separator 11, the bit allocation decoder 12, the inverse quantizer 13, the scale factor decoder 14, the inverse normalizer 15, and the subband synthesizer 16 perform the same processing as described above with reference to FIG. 41.

The bit allocation error corrector 20 includes a detection unit 20a and a correction unit 20b.
The detection unit 20a examines the bit allocation information of each subband, separated by the frame separator 11 and decoded by the bit allocation decoder 12, and detects the subbands whose bit allocation value is 0, that is, the subbands to which no bits are allocated. The detection unit 20a then notifies the correction unit 20b of the lowest-frequency subband among the detected subbands with a bit allocation value of 0.

The correction unit 20b corrects to 0 the bit allocation of the subbands higher in frequency than the lowest-frequency zero-allocation subband notified by the detection unit 20a. The correction unit 20b outputs the corrected bit allocation information to the frame separator 11, causing the frame separator 11 to perform the separation again. The separation must be redone because the positions of the scale factor information and the quantized sample data in the frame depend on which subbands have bits allocated.

After the frame separator 11 performs the separation again, the bit allocation decoder 12 through the subband synthesizer 16 repeat their processing, and the subband synthesizer 16 outputs the audio data.

An example hardware configuration of the audio decoding apparatus 1 is described below with reference to FIGS. 20A and 20B.
Each function of the synchronization detector 10, the frame separator 11, the bit allocation decoder 12, the inverse quantizer 13, the scale factor decoder 14, the inverse normalizer 15, the subband synthesizer 16, and the bit allocation error corrector 20 of the audio decoding apparatus 1 is realized by a processing circuit. The processing circuit may be dedicated hardware or a CPU (Central Processing Unit) that executes a program stored in a memory. The CPU is also called a central processing unit, a processing unit, a computing unit, a microprocessor, a microcomputer, a processor, or a DSP (Digital Signal Processor).

FIG. 20A is a diagram showing an example hardware configuration in which the functions of the synchronization detector 10, the frame separator 11, the bit allocation decoder 12, the inverse quantizer 13, the scale factor decoder 14, the inverse normalizer 15, the subband synthesizer 16, and the bit allocation error corrector 20 are realized by a processing circuit 101 that is dedicated hardware. The processing circuit 101 is, for example, a single circuit, a composite circuit, a programmed processor, a parallel-programmed processor, an ASIC (Application Specific Integrated Circuit), an FPGA (Field Programmable Gate Array), or a combination thereof. The functions of these units may be realized by combining separate processing circuits 101, or may all be realized by a single processing circuit 101.

FIG. 20B is a diagram showing an example hardware configuration in which the functions of the synchronization detector 10, the frame separator 11, the bit allocation decoder 12, the inverse quantizer 13, the scale factor decoder 14, the inverse normalizer 15, the subband synthesizer 16, and the bit allocation error corrector 20 are realized by a CPU 103 that executes a program stored in a memory 102. In this case, the functions of these units are realized by software, firmware, or a combination of software and firmware. The software and firmware are written as programs and stored in the memory 102. The CPU 103 realizes the functions of these units by reading and executing the programs stored in the memory 102. That is, the audio decoding apparatus 1 includes the memory 102 for storing programs and the like that, when executed, result in the execution of steps ST1 to ST13 shown in the flowchart of FIG. 21 described later. These programs can also be said to cause a computer to execute the procedures or methods of the synchronization detector 10, the frame separator 11, the bit allocation decoder 12, the inverse quantizer 13, the scale factor decoder 14, the inverse normalizer 15, the subband synthesizer 16, and the bit allocation error corrector 20. Here, the memory 102 is, for example, a nonvolatile or volatile semiconductor memory such as a RAM (Random Access Memory), a ROM (Read Only Memory), a flash memory, an EPROM (Erasable Programmable ROM), or an EEPROM (Electrically Erasable Programmable ROM); a disk-shaped recording medium such as a magnetic disk, a flexible disk, an optical disk, a compact disc, a MiniDisc, or a DVD (Digital Versatile Disc); or a combination thereof.

Some of the functions of the synchronization detector 10, the frame separator 11, the bit allocation decoder 12, the inverse quantizer 13, the scale factor decoder 14, the inverse normalizer 15, the subband synthesizer 16, and the bit allocation error corrector 20 may be realized by dedicated hardware and others by software or firmware. For example, the functions of the synchronization detector 10, the frame separator 11, the bit allocation decoder 12, and the inverse quantizer 13 can be realized by a processing circuit serving as dedicated hardware, while the functions of the scale factor decoder 14, the inverse normalizer 15, the subband synthesizer 16, and the bit allocation error corrector 20 can be realized by a processing circuit reading and executing programs stored in a memory.

In this way, the processing circuit can realize the functions of the synchronization detector 10, the frame separator 11, the bit allocation decoder 12, the inverse quantizer 13, the scale factor decoder 14, the inverse normalizer 15, the subband synthesizer 16, and the bit allocation error corrector 20 by hardware, software, firmware, or a combination thereof.

Next, an example of the processing of the audio decoding apparatus 1 configured as described above is described with reference to the flowchart shown in FIG. 21. The processing shown in FIG. 21 starts when a compressed audio bit stream is input to the audio decoding apparatus 1.
First, the synchronization detector 10 detects synchronization in the input compressed audio bit stream and outputs the bit stream to the frame separator 11 (step ST1).
Next, the frame separator 11 separates the bit allocation information, the scale factor information, and the quantized sample data from the bit stream synchronized by the synchronization detector 10. It outputs the bit allocation information to the bit allocation decoder 12, the scale factor information to the scale factor decoder 14, and the quantized sample data to the inverse quantizer 13 (step ST2).

The bit allocation decoder 12 decodes the number of allocated bits of the quantized sample data of each subband from the bit allocation information (step ST3). The bit allocation decoder 12 outputs the decoded allocation bit number to the bit allocation error corrector 20.

The detection unit 20a first sets the sub-band i to be processed to i = 0 (step ST4).
Subsequently, the detection unit 20a determines whether the bit allocation of sub-band i is other than 0 (step ST5).
If the bit allocation of sub-band i is other than 0 (step ST5; YES), the process proceeds to step ST7.
On the other hand, if the bit allocation of sub-band i is 0 (step ST5; NO), the detection unit 20a determines whether a sub-band with a bit allocation of 0 exists among the sub-bands lower than sub-band i (step ST6).

If no sub-band with a bit allocation of 0 exists among the sub-bands lower than sub-band i (step ST6; NO), the detection unit 20a increments i so that i = i + 1 (step ST7).
Subsequently, the detection unit 20a determines whether i is smaller than 32 (step ST8).
If i is smaller than 32 (step ST8; YES), the process returns to step ST6.
On the other hand, if i is 32 or more (step ST8; NO), the bit allocation error corrector 20 ends the process, and the correction unit 20b performs no correction. In this case, the dequantizer 13 dequantizes the quantized sample data into sample data for each sub-band using the uncorrected numbers of allocated bits (step ST10). The scale factor decoder 14 decodes the scale factor index value of each sub-band from the scale factor information (step ST11). The denormalizer 15 then denormalizes the sample data output from the dequantizer 13 using the scale factor values that correspond to the scale factor index values (step ST12). Finally, the sub-band synthesizer 16 synthesizes the denormalized sample data of all sub-bands and outputs the result as time-series audio data (step ST13).

If a sub-band with a bit allocation of 0 exists among the sub-bands lower than sub-band i (step ST6; YES), the correction unit 20b sets the bit allocation of the sub-bands higher in frequency than sub-band i to 0 (step ST9). In this case, the processing from step ST2 onward is performed again using the corrected bit allocation, and time-series audio data is finally output.
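The detection and correction rule of claim 1 can be sketched in Python. This is a minimal illustration that assumes the decoded allocations for one channel are held in a plain list indexed by sub-band. The function name `correct_bit_allocation` is hypothetical. The sketch follows the rule stated in the abstract and claim 1, which zeroes every sub-band above the lowest-frequency zero-allocated sub-band. It does not reproduce the step-by-step branching of the flowchart.

```python
from typing import List


def correct_bit_allocation(alloc: List[int]) -> List[int]:
    """Zero the allocation of every sub-band above the lowest zero-allocated one.

    Hypothetical helper illustrating claim 1: the detection step finds the
    lowest-frequency sub-band whose allocation is 0, and the correction step
    forces all higher-frequency sub-bands to 0. The input list is not modified.
    """
    corrected = list(alloc)
    for i, bits in enumerate(corrected):
        if bits == 0:
            # Detection: lowest-frequency zero-allocated sub-band is i.
            for j in range(i + 1, len(corrected)):
                corrected[j] = 0  # Correction: drop suspected high-band data.
            break
    return corrected
```

For example, `correct_bit_allocation([4, 3, 0, 2, 1, 0])` returns `[4, 3, 0, 0, 0, 0]`. In a stereo stream, the function would be applied to each channel's allocations separately. This per-channel application is an assumption of the sketch.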

As described above, the speech decoding device 1 performs a correction that deletes only the data of the high-frequency sub-bands estimated to contain an error. The data of the low-frequency sub-bands, which do not contain the error that causes the abnormal noise, is decoded as is. The speech decoding device 1 can therefore suppress abnormal noise while reducing the loss of the original audio components.
In contrast, the conventional approach replaces the entire frame in which an error is detected with the preceding or following normal frame, or mutes the entire frame. In that case, even sub-band data that contains no error is lost.

As described above, the speech decoding device 1 according to the first embodiment removes errors contained in the bit allocation information of high-frequency components while keeping the bit allocation information of low-frequency components. The speech decoding device 1 can thereby suppress the influence of errors that cause abnormal noise during decoding while preserving the original audio components.

Second Embodiment
If a sub-band with a bit allocation of 0 appears partway through sub-bands 0 to 31, the speech decoding device 1 of Embodiment 1 can detect the frequencies estimated to contain an error. Conversely, if no sub-band partway through sub-bands 0 to 31 has a bit allocation of 0, the speech decoding device 1 of Embodiment 1 cannot detect the frequencies estimated to contain an error.
The second embodiment therefore calculates the size of a frame from the bit allocation information and the scale factor information. It then decides whether correction is needed by checking whether that size exceeds the size of one frame determined by the bit rate. For example, at a bit rate of 256 kbps and a sampling frequency of 48 kHz, one frame is 768 bytes.
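The 768-byte figure follows from the MPEG-1 Audio Layer II frame length. Each frame carries 1152 samples per channel, so one frame holds bitrate × 1152 / 8 / sampling_rate bytes, which simplifies to 144 × bitrate / sampling_rate. The following is a minimal sketch of that arithmetic. It assumes a 48 kHz sampling rate and ignores the optional padding byte used at 44.1 kHz. The function name is hypothetical.

```python
def max_frame_bytes(bitrate_bps: int, sample_rate_hz: int) -> int:
    """Return the size in bytes of one MPEG-1 Audio Layer II frame.

    Hypothetical helper: a Layer II frame carries 1152 samples per channel,
    so it lasts 1152 / sample_rate_hz seconds and holds
    bitrate_bps * 1152 / 8 / sample_rate_hz = 144 * bitrate_bps / sample_rate_hz
    bytes. The optional padding byte (44.1 kHz streams) is ignored.
    """
    return 144 * bitrate_bps // sample_rate_hz


print(max_frame_bytes(256_000, 48_000))  # 768
```

The maximum data amount of one frame that the determination unit compares against would be this value, converted to bits if the computed total is expressed in bits.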

FIG. 22 is a block diagram showing the configuration of a speech decoding device 1A according to Embodiment 2. Components whose functions are identical or equivalent to those already described in Embodiment 1 are given the same reference numerals, and their description is omitted or simplified.

The bit allocation error corrector 20 includes a determination unit 20c and a correction unit 20d.
The determination unit 20c acquires the bit allocation information from the bit allocation decoder 12 and the scale factor information from the scale factor decoder 14. The determination unit 20c also acquires header information from the frame separator 11, for example via the bit allocation decoder 12.

As described with reference to Equation (6) above, the determination unit 20c calculates the total data amount of one frame. It uses the bit allocation information decoded by the bit allocation decoder 12 and the scale factor information decoded by the scale factor decoder 14. The determination unit 20c also identifies the bit rate from the header information and calculates the maximum data amount of one frame from that bit rate. It then compares the calculated total data amount of one frame with the maximum data amount of one frame determined by the bit rate, and outputs the comparison result to the correction unit 20d.

If the total data amount of one frame calculated from the bit allocation information and the scale factor information exceeds the maximum data amount of one frame calculated from the bit rate, the correction unit 20d deletes high-frequency bit allocation information. Specifically, among the sub-bands with non-zero bit allocations, the correction unit 20d sets the bit allocation of the highest-frequency sub-band to 0. It then outputs the corrected bit allocation information to the frame separator 11, which causes the frame separator 11 to perform separation again. This correction is repeated until the total data amount of one frame, calculated as in Equation (6), is equal to or less than the maximum data amount of one frame calculated from the bit rate. This suppresses abnormal noise caused by incorrect bit allocations for high-frequency sub-band components.
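The repeated correction of the second embodiment can be sketched as a loop. This is a minimal illustration, not the device's implementation. Equation (6) is not reproduced here, so the frame size is supplied by a caller-provided function `frame_bits`. The toy size model `toy_frame_bits` below is purely hypothetical. In the device, the frame is re-separated after every correction. The sketch only recomputes the size from the allocations.

```python
from typing import Callable, List


def trim_until_fits(
    alloc: List[int],
    frame_bits: Callable[[List[int]], int],
    max_bits: int,
) -> List[int]:
    """Zero the highest-frequency non-zero allocation until the frame fits.

    frame_bits is a stand-in for Equation (6): given the allocations, it
    returns the total size of one frame in bits.
    """
    corrected = list(alloc)
    while frame_bits(corrected) > max_bits:
        nonzero = [i for i, bits in enumerate(corrected) if bits != 0]
        if not nonzero:
            break  # nothing left to drop; further handling is up to the caller
        corrected[nonzero[-1]] = 0  # highest-frequency non-zero sub-band
    return corrected


# Hypothetical size model for illustration only (NOT Equation (6)):
# a 32-bit header plus 36 samples per sub-band at the allocated word length.
def toy_frame_bits(alloc: List[int]) -> int:
    return 32 + 36 * sum(alloc)


print(trim_until_fits([4, 4, 4, 4], toy_frame_bits, max_bits=400))  # [4, 4, 0, 0]
```

Removing one sub-band per iteration mirrors the text: only the highest-frequency non-zero allocation is dropped each time. Lower sub-bands are therefore preserved for as long as the size budget allows.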

Similarly to the speech decoding device 1 of the first embodiment, the speech decoding device 1A of the second embodiment can be realized by the processing circuit 101 shown in FIG. 20A or the memory 102 and the CPU 103 shown in FIG. 20B.

Next, an example of the processing performed by the speech decoding device 1A configured as described above will be described with reference to the flowchart shown in FIG. 23. The process shown in FIG. 23 starts when a bit stream of compressed audio is input to the speech decoding device 1A. Processing identical or equivalent to that described with reference to FIG. 21 is given the same step numbers, and its description is omitted or simplified.
After steps ST1 and ST2, the determination unit 20c obtains the header information and calculates the maximum data amount N of one frame from the bit rate (step ST20). In addition, after steps ST3 and ST11, the determination unit 20c calculates the data amount of the quantized sample data from the bit allocation information (step ST21). It also calculates the data amount of the scale factor information from the scale factor information (step ST22).
Then, using the amount of data calculated in step ST21 and the amount of data calculated in step ST22, the determination unit 20c calculates a total data amount n of one frame as in equation (6) (step ST23).

Subsequently, the determination unit 20c determines whether the total data amount n exceeds the maximum data amount N (step ST24).
If the total data amount n exceeds the maximum data amount N (step ST24; YES), the correction unit 20d sets to 0 the bit allocation of the highest-frequency sub-band whose bit allocation is not 0. It then outputs the corrected bit allocation information to the frame separator 11 (step ST25). In this case, the processing from step ST2 onward is performed again using the corrected bit allocation, and time-series audio data is finally output.

On the other hand, if the total data amount n is less than or equal to the maximum data amount N (step ST24; NO), the bit allocation error corrector 20 ends the process, and the correction unit 20d performs no correction. In this case, time-series audio data is output by the processing of steps ST10, ST12 and ST13.

As described above, the speech decoding device 1A according to the second embodiment removes errors contained in the bit allocation information of high-frequency components while keeping the bit allocation information of low-frequency components. As a result, the speech decoding device 1A can suppress the influence of errors that cause abnormal noise during decoding while preserving the original audio components.

Third Embodiment
The first and second embodiments correct errors by setting to 0 the bit allocation of high-frequency sub-bands assumed to contain an error. However, in some cases the presence of an error cannot be estimated this way. If an error is still suspected after the bit allocation information has been corrected, it is therefore desirable to perform final error handling, such as muting the entire frame.
The third embodiment therefore performs an error check on the scale factor index values after the bit allocation information has been corrected as described in the first and second embodiments. A frame in which an error is detected is treated as an error frame.

When two or three scale factor indices are transmitted for the same channel and the same sub-band, that is, when the scale factor selection information ScFsi is 0, 1 or 3, adjacent scale factor indices generally do not have the same value.
For example, FIG. 24 shows the scale factors before bit allocation correction for the data frame that produces the abnormal noise shown in FIG. In FIG. 24, the two scale factor indices given to each of Rch sub-bands 2, 10 and 13 have the same value. In such a case, an encoder would normally set the scale factor selection information to 2 and send only one scale factor index value. Therefore, if adjacent scale factor indices in the same channel and the same sub-band have the same value, the bit allocation is likely to be incorrect.

Based on the above, the third embodiment performs an error check on the scale factor indices obtained after the bit allocation correction of the first and second embodiments. The check determines whether adjacent scale factor index values in the same channel and the same sub-band are identical.

The error check is performed by the error detector 21 shown in FIG. 25, which is a block diagram showing the configuration of a speech decoding device 1B according to Embodiment 3. Components whose functions are identical or equivalent to those already described in Embodiments 1 and 2 are given the same reference numerals, and their description is omitted or simplified.
When the bit allocation error corrector 20 has performed a correction as described in the first and second embodiments, the error detector 21 performs an error check. It uses the scale factor information obtained from the frame separator 11 after the correction. Specifically, as described above, the error detector 21 detects whether the same scale factor index value appears consecutively in the same channel and the same sub-band. When the channel configuration is monaural rather than stereo, the error detector 21 may simply detect whether the same scale factor index value appears consecutively in the same sub-band.

When the error detector 21 detects that the same scale factor index value appears consecutively in the same channel and the same sub-band, it determines that an error is present and treats the frame as an error frame. It then notifies the mute controller 18 that the frame is an error frame.
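The check performed by the error detector 21 can be sketched as follows. This is a minimal illustration. It assumes that, for each (channel, sub-band) pair, the ScFsi value and the list of scale factor indices actually transmitted for that sub-band are available. That dictionary layout and both function names are hypothetical. The counts in `TRANSMITTED` follow MPEG-1 Audio Layer II: ScFsi 0 sends three indices, 1 and 3 send two, and 2 sends one.

```python
from typing import Dict, List, Tuple

# Number of scale factor indices transmitted per sub-band for each ScFsi value.
TRANSMITTED = {0: 3, 1: 2, 2: 1, 3: 2}


def has_duplicate_scale_factors(scfsi: int, indices: List[int]) -> bool:
    """True if adjacent transmitted indices repeat where ScFsi implies they differ."""
    if scfsi == 2:
        return False  # only one index is sent, so there is nothing to compare
    if len(indices) != TRANSMITTED[scfsi]:
        raise ValueError("index count does not match ScFsi")
    return any(a == b for a, b in zip(indices, indices[1:]))


def is_error_frame(subbands: Dict[Tuple[int, int], Tuple[int, List[int]]]) -> bool:
    """subbands maps (channel, sub-band) -> (ScFsi, transmitted indices)."""
    return any(has_duplicate_scale_factors(s, idx) for s, idx in subbands.values())
```

For a monaural stream, the channel part of the key is simply always the same. This matches the simplified check described above for monaural configurations.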

The mute controller 18 mutes the entire frame regarded as an error frame. Instead of the mute controller 18, the speech decoding device 1B may be configured to output audio in which the error frame is replaced with the preceding or following frame.

Similarly to the speech decoding device 1 of the first embodiment, the speech decoding device 1B of the third embodiment can be realized by the processing circuit 101 shown in FIG. 20A or the memory 102 and the CPU 103 shown in FIG. 20B.

Next, an example of the processing performed by the speech decoding device 1B configured as described above will be described with reference to the flowchart shown in FIG. 26. The process shown in FIG. 26 starts when a bit stream of compressed audio is input to the speech decoding device 1B. Processing identical or equivalent to that described with reference to FIG. 21 is given the same step numbers, and its description is omitted or simplified.
It is assumed that the correction by the bit allocation error corrector 20 as described in the first and second embodiments is performed after the steps ST1 to ST3 (step ST30).

The error detector 21 performs the above-described error check on the scale factor index value, and determines whether there is an error (step ST31).
If there is no error (step ST31; YES), the mute controller 18 does not mute the frame, and time-series audio data is output by the processing of steps ST10 to ST13.

On the other hand, if there is an error (step ST31; NO), the error detector 21 treats the frame as an error frame, and the mute controller 18 mutes the entire error frame (step ST32).

As described above, the speech decoding device 1B according to the third embodiment can suppress abnormal noise even when the corrections of the first and second embodiments alone cannot suppress it.

Fourth Embodiment
The bit stream of FIG. 3 is an example of a data frame that decodes normally without noise. The bit stream of FIG. 13 is an example of a data frame that produces noise. The two bit streams are adjacent in time. FIG. 27 shows the result of decoding the bit stream of FIG. 3 as it is, and FIG. 28 shows the result of decoding the bit stream of FIG. 13.

FIG. 29 shows the result of decoding the bit stream of FIG. 13 after applying the bit allocation correction described in Embodiment 1. Compared with the decoding result in FIG. 28, the decoding result in FIG. 29 shows that both the magnitude of the abnormal noise in Rch and the length of the section in which it occurs are reduced. However, the decoding result in FIG. 29 still contains strong abnormal-noise components and suspicious high-frequency components.

Such strong components are caused by outliers of the scale factor. Therefore, in the fourth embodiment, an embodiment will be described in which the occurrence of abnormal noise is suppressed by checking the value of the scale factor and correcting the abnormal value.

Hereinafter, detection of an abnormal value of a scale factor and a correction method thereof will be described using a specific data example.
FIG. 30 shows the scale factor index values that yield the decoding result of FIG. 29. FIG. 31 shows the corresponding actual scale factor values, obtained from these indices according to FIG. 10.
The scale factors of Lch sub-bands 3 and 11 and the first scale factor of Rch sub-band 10 (the left end of Rch in FIG. 31) are clearly singular values. They are much larger than the scale factors of the adjacent sub-bands.

Sub-band 0 also has a larger value than the others. However, sub-bands 0 to 2 carry the main features of the audio, as can be seen from the fact that they use the most detailed bit allocation table, shown in FIG. 6A. These sub-bands are therefore not compared with their adjacent sub-bands.

FIG. 32 shows, for each scale factor value in FIG. 31, the ratio of its magnitude to the scale factor values of the adjacent sub-bands. Each ratio in FIG. 32 is the scale factor value divided by half the sum of the scale factor values of the preceding and following sub-bands. For example, the ratio for sub-band 3 of (Lch)[0] is the scale factor value of sub-band 3 of (Lch)[0] in FIG. 31 divided by half the sum of the scale factor values of sub-bands 2 and 4 of (Lch)[0].
As shown in FIG. 32, singularly large values among the scale factors of sub-bands higher in frequency than the setting sub-bands can be detected by taking the ratio to the scale factors of the adjacent sub-bands. Here, a setting sub-band is a sub-band that carries the main features of the audio, such as sub-bands 0 to 2.
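The ratio used in FIG. 32 can be sketched as below. This is an illustrative sketch that assumes one scale factor value per sub-band, such as the [0] values of one channel, held in a list. The function name is hypothetical. The highest sub-band has only one neighbour, and the text does not say how it is handled, so the sketch leaves its ratio undefined (`None`), as it does for the setting sub-bands 0 to 2.

```python
from typing import List, Optional

SETTING_SUBBANDS = 3  # sub-bands 0 to 2 carry the main features and are skipped


def neighbour_ratios(scale_factors: List[float]) -> List[Optional[float]]:
    """Ratio of each scale factor to the mean of its two neighbours.

    ratio[k] = sf[k] / ((sf[k - 1] + sf[k + 1]) / 2); entries for the
    setting sub-bands and the last sub-band are left as None.
    """
    ratios: List[Optional[float]] = [None] * len(scale_factors)
    for k in range(SETTING_SUBBANDS, len(scale_factors) - 1):
        mean = (scale_factors[k - 1] + scale_factors[k + 1]) / 2
        ratios[k] = scale_factors[k] / mean  # scale factors are always positive
    return ratios
```

A ratio far above 1 marks a singularly large scale factor. For instance, in `[1.0, 1.0, 1.0, 0.25, 4.0, 0.25, 0.25]` the ratio for sub-band 4 is 16.0.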

As shown in FIG. 33, the scale factor index values (index) of FIG. 10 can be plotted on the X axis and the scale factor values (value) of FIG. 10 on a logarithmic Y axis. The plot shows that the scale factor value halves each time the scale factor index value increases by 3.
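The relationship plotted in FIG. 33 matches the MPEG-1 Audio Layer II scale factor table, where the value for index i is 2.0 × 2^(−i/3) for indices 0 to 62. The following minimal sketch computes that value. The function name is hypothetical. The formula reproduces the standard table but is not quoted from this document.

```python
def scale_factor_value(index: int) -> float:
    """Layer II scale factor for index 0..62: 2.0 * 2 ** (-index / 3).

    The value halves every time the index increases by 3.
    """
    if not 0 <= index <= 62:
        raise ValueError("scale factor index must be in 0..62")
    return 2.0 ** (1 - index / 3)
```

Because of this exponential form, the ratio between two scale factors depends only on the difference of their indices. For example, adding 16 to an index divides the scale factor by 2^(16/3), roughly 40.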
Thus, the ratio of a scale factor value to the scale factor value of an adjacent sub-band is determined by the difference between the two scale factor index values. In addition, scale factor index values are often around 20, except for very low-frequency components. An abnormally large scale factor value can therefore be identified from its index alone. When bits 4 and 5 of the scale factor index value are both 0, that is, when the index is less than 16, check whether the difference from the adjacent sub-bands' index values is larger than it would be with bit 4 set to 1. Here, bit 4 refers to the 2^4 place and bit 5 to the 2^5 place of the scale factor index value expressed in binary.

When this determination is performed on the scale factor index values of sub-band 3 and higher in FIG. 30, the scale factor indices of Lch sub-bands 3 and 11 and the first scale factor index of Rch sub-band 10 are detected as abnormal values, as in FIG. Each detected abnormal value is corrected by setting bit 4 to 1, that is, by adding 16. FIG. 34 shows the correction result. The corrected scale factor values in FIG. 35 and the scale factor values of the adjacent sub-bands in FIG. 36 now differ only slightly. The corrected values are therefore no longer detected as singular values.
FIG. 37 shows the decoding result obtained with the scale factor index values corrected as shown in FIG. 34. The resulting waveform has no abnormal noise and resembles the waveform of the adjacent frame shown in FIG. 27.

The scale factor index values are corrected as described above by the scale factor error corrector 22 shown in FIG. 38, which is a block diagram showing the configuration of a speech decoding device 1C according to Embodiment 4. Components whose functions are identical or equivalent to those described in the first to third embodiments are given the same reference numerals, and their description is omitted or simplified.
When the bit allocation error corrector 20 has performed a correction as described in the first and second embodiments, the scale factor error corrector 22 corrects the scale factor index values. It uses the scale factor information obtained from the scale factor decoder 14. Specifically, as described above, it examines each sub-band higher in frequency than the setting sub-bands. If changing a specific bit of that sub-band's scale factor index value to 1 would reduce the difference from the scale factor index values of the adjacent sub-bands, the scale factor error corrector 22 changes that bit to 1. The specific bit is chosen according to the average scale factor index value. In the above description, the average is around 20, so bit 4 is the specific bit.
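The rule applied by the scale factor error corrector 22 can be sketched as follows. This is a minimal sketch under stated assumptions. It assumes one list of scale factor indices per channel, indexed by sub-band. It also assumes that "difference from the adjacent sub-bands" means the sum of absolute index differences to the lower and upper neighbours. That metric and the function names are hypothetical; the text does not specify the exact measure.

```python
from typing import List

SETTING_SUBBANDS = 3   # sub-bands 0 to 2 carry the main features and are not checked
SPECIFIC_BIT = 1 << 4  # bit 4 (value 16), chosen because typical indices are around 20
HIGH_BITS = 0b110000   # bits 4 and 5


def neighbour_distance(indices: List[int], k: int, value: int) -> int:
    """Sum of |value - neighbour| over the sub-bands adjacent to sub-band k (assumed metric)."""
    neighbours = [indices[j] for j in (k - 1, k + 1) if 0 <= j < len(indices)]
    return sum(abs(value - n) for n in neighbours)


def correct_scale_factor_indices(indices: List[int]) -> List[int]:
    """Set bit 4 of an index when doing so brings it closer to its neighbours."""
    corrected = list(indices)
    for k in range(SETTING_SUBBANDS, len(indices)):
        idx = indices[k]
        if idx & HIGH_BITS:
            continue  # bit 4 or bit 5 already set (index >= 16): not a candidate
        candidate = idx | SPECIFIC_BIT  # equivalent to idx + 16 here
        # Neighbours come from the uncorrected input so the result does not
        # depend on the order in which sub-bands are visited.
        if neighbour_distance(indices, k, candidate) < neighbour_distance(indices, k, idx):
            corrected[k] = candidate
    return corrected
```

For example, `correct_scale_factor_indices([10, 12, 15, 3, 20, 21, 5, 22, 20])` raises index 3 to 19 and index 5 to 21. The setting sub-bands keep their values even though they are below 16.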

Similarly to the speech decoding device 1 of the first embodiment, the speech decoding device 1C of the fourth embodiment can be realized by the processing circuit 101 shown in FIG. 20A or the memory 102 and the CPU 103 shown in FIG. 20B.

Next, an example of the processing performed by the speech decoding device 1C configured as described above will be described with reference to the flowchart shown in FIG. 39. The process shown in FIG. 39 starts when a bit stream of compressed audio is input to the speech decoding device 1C. Processing identical or equivalent to that described with reference to FIG. 21 is given the same step numbers, and its description is omitted or simplified.
It is assumed that the correction by the bit allocation error corrector 20 as described in the first and second embodiments is performed after the steps ST1 to ST3 (step ST40).

The scale factor error corrector 22 corrects the scale factor index values as needed, as described above, using the scale factor information obtained from the scale factor decoder 14 (step ST41). Time-series audio data is then output by the processing of steps ST10, ST12 and ST13.

As described above, the speech decoding device 1C according to the fourth embodiment can suppress abnormal noise even when the corrections of the first and second embodiments alone cannot suppress it.

Fifth Embodiment
The data corrections made during decoding in the first to fourth embodiments estimate and correct errors by checking for features that are generally considered to cause abnormal noise. Consequently, an original audio component may be corrected by mistake.
Therefore, in the fifth embodiment, the correction processing as shown in the first to fourth embodiments is performed only when the reception state of the digital broadcast is bad.

FIG. 40 is a block diagram showing a configuration of speech decoding apparatus 1D according to Embodiment 5. The components having the same or corresponding functions as those described in the first to fourth embodiments are designated by the same reference numerals, and the description thereof will be omitted or simplified.
The demodulator 23 demodulates the digital broadcast signal received via the antenna and outputs a bit stream to the synchronization detector 10. The demodulator 23 also outputs information indicating the reception state of the digital broadcast to the reception state determiner 24. This information is, for example, the received signal level, the carrier-to-noise ratio, or the error rate.

The reception state determiner 24 uses the information indicating the reception state of the digital broadcast to determine whether the reception state is better than a set level. It outputs the determination result to the correction controller 25. The set level is, for example, the reception state at which errors start to appear in frames.

When the reception state determiner 24 determines that the reception state is below the set level, that is, poor, the correction controller 25 causes the bit allocation error corrector 20 and the scale factor error corrector 22 to perform correction. Specifically, the correction controller 25 outputs a control signal instructing correction to the bit allocation error corrector 20 and the scale factor error corrector 22.
On the other hand, when the reception state determiner 24 determines that the reception state is better than the set level, the correction controller 25 prevents the bit allocation error corrector 20 and the scale factor error corrector 22 from performing correction. Specifically, the correction controller 25 outputs a control signal inhibiting correction to the bit allocation error corrector 20 and the scale factor error corrector 22.
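The gating performed by the reception state determiner 24 and the correction controller 25 reduces to a threshold comparison. The following sketch is hypothetical. It uses the carrier-to-noise ratio as the reception-state metric, although the text equally allows the received signal level or the error rate. It also assumes that a reading exactly at the set level counts as "not better", so corrections stay enabled.

```python
from dataclasses import dataclass


@dataclass
class ReceptionState:
    cn_ratio_db: float  # carrier-to-noise ratio reported by the demodulator (assumed metric)


def corrections_enabled(state: ReceptionState, set_level_db: float) -> bool:
    """Return True when the error correctors should run.

    Reception better than the set level inhibits correction, which avoids
    mistakenly altering genuine audio components; otherwise correction is
    allowed.
    """
    reception_is_good = state.cn_ratio_db > set_level_db
    return not reception_is_good
```

In a decoder loop, the returned flag would play the role of the control signal that instructs or inhibits correction for each frame.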

As with the speech decoding device 1 of the first embodiment, the speech decoding device 1D of the fifth embodiment can be realized by the processing circuit 101 shown in FIG. 20A or by the memory 102 and the CPU 103 shown in FIG. 20B.

As described above, the speech decoding device 1D according to the fifth embodiment can suppress erroneous correction processing when the reception state is good.

Within the scope of the invention, the embodiments may be freely combined, and any component of each embodiment may be modified or omitted.

As described above, the audio decoding device for digital broadcasting according to the present invention can suppress the influence of errors that cause abnormal noise while preserving the original audio components. It is therefore suitable for mounting in vehicles and the like.

1 to 1D speech decoding device, 10 synchronization detector, 11 frame separator, 12 bit allocation decoder, 13 dequantizer, 14 scale factor decoder, 15 denormalizer, 16 sub-band synthesizer, 17 error detection, 18 mute controller, 20 bit allocation error corrector, 20a detection unit, 20b correction unit, 20c determination unit, 20d correction unit, 21 error detector, 22 scale factor error corrector, 23 demodulator, 24 reception state determiner, 25 correction controller, 100 speech decoder, 101 processing circuit, 102 memory, 103 CPU.

Claims

1. An audio decoding device for digital broadcasting, comprising:
a detection unit that detects, from the bit allocation information of each sub-band, the lowest-frequency sub-band among the sub-bands whose bit allocation is 0; and
a correction unit that sets to 0 the bit allocation of the sub-bands higher in frequency than the lowest-frequency zero-allocated sub-band detected by the detection unit.
2. An audio decoding device for digital broadcasting, comprising:
a determination unit that determines whether the total data amount of one frame, calculated using bit allocation information and scale factor information, exceeds the maximum data amount of one frame specified by the bit rate; and
a correction unit that, when the total data amount exceeds the maximum data amount, repeats a correction that sets to 0 the bit allocation of the highest-frequency sub-band among the sub-bands with non-zero bit allocations.
3. The audio decoding device for digital broadcasting according to claim 1 or 2, further comprising an error detector that treats a frame as an error frame when it detects that the same scale factor index value appears consecutively in the same sub-band after the correction by the correction unit.
4. The audio decoding device for digital broadcasting according to claim 1 or 2, further comprising a scale factor error corrector that, after the correction by the correction unit, changes a specific bit of the scale factor index value of a sub-band higher in frequency than a setting sub-band to 1 when doing so reduces the difference from the scale factor index value of the adjacent sub-band.
5. The audio decoding device for digital broadcasting according to claim 1 or 2, further comprising:
a reception state determiner that determines, using information indicating the reception state of the digital broadcast, whether the reception state is better than a set level; and
a correction controller that prevents the correction unit from performing correction when the reception state determiner determines that the reception state is better than the set level.