WO2011048815A1 - Audio encoding device, decoding device, method, circuit, and program - Google Patents
- Publication number
- WO2011048815A1 (PCT/JP2010/006234)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- pitch
- parameter
- range
- audio signal
- encoded
- Prior art date
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/04—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
- G10L19/26—Pre-filtering or post-filtering
- G10L19/265—Pre-filtering, e.g. high frequency emphasis prior to encoding
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/008—Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/02—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
- G10L19/0212—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders using orthogonal transformation
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/04—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
- G10L19/08—Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters
- G10L19/09—Long term prediction, i.e. removing periodical redundancies, e.g. by using adaptive codebook or pitch predictor
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/90—Pitch determination of speech signals
Definitions
- The present invention relates generally to a transform audio coding system, and more particularly to a transform audio coding system that improves coding efficiency and sound quality by shifting the pitch frequency of an input audio signal using a time expansion / contraction technique.
- The audio coding system can be applied not only to audio but also to speech signals, and can be used for mobile phones and for telephone / video conferencing.
- The transform coding technique is designed to encode an audio signal efficiently.
- In human speech, the fundamental frequency of the signal changes from moment to moment.
- The energy of the speech signal is spread over a wide frequency band.
- The time expansion / contraction technique is used in the prior art documents [3] and [4] to compensate for the influence of pitch change.
- FIG. 10 is a diagram showing an example of the concept of shifting the fundamental frequency.
- Time expansion / contraction technology is used to realize the pitch shift.
- The spectrum in column (a) of FIG. 10 is the original spectrum, and the spectrum in column (b) of FIG. 10 is the spectrum after the pitch shift.
- The fundamental frequency is shifted from 200 Hz to 100 Hz. In this way, the pitch is stabilized by shifting the pitch of the next frame to match the pitch of the preceding frame.
- FIG. 11 is a diagram showing the spectrum after the pitch shift.
- The signal in column (a) of FIG. 11 is a sweep signal. The signal in column (b) of FIG. 11 is the signal after the pitch shift, and its pitch is constant.
- The two spectra in column (c) of FIG. 11 are the spectra of signal (a) and signal (b).
- The energy of signal (b) is confined to a narrow band.
- The pitch shift described above is achieved using a resampling method.
- The resampling rate changes according to the pitch change rate.
- the pitch contour of the input frame is obtained by applying the pitch tracking algorithm.
- FIG. 8 is a diagram for explaining segmentation of one audio frame.
- the frame is segmented into small sections for pitch tracking.
- Adjacent sections may overlap. That is, for at least one pair of adjacent sections, part of one section may overlap part of the other.
- Each section has a pitch value corresponding to that section.
- FIG. 15 is a diagram showing processing for calculating the pitch contour.
- the signal in the column (a) in FIG. 15 is a signal having a time-varying pitch.
- One pitch value is calculated from one section of the signal.
- a pitch contour is a chain of pitch values.
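The segmentation and per-section pitch calculation described above can be sketched as follows. This is a minimal illustration only: the source does not name its pitch tracking algorithm, so a plain autocorrelation peak search is swapped in here, and the function names, section length, hop size, and lag limits are all assumptions.

```python
import math

def split_into_sections(frame, section_len, hop):
    """Split one frame into sections; hop < section_len makes neighbours overlap."""
    return [frame[start:start + section_len]
            for start in range(0, len(frame) - section_len + 1, hop)]

def estimate_pitch(section, sample_rate, min_lag, max_lag):
    """Return one pitch value (Hz): the lag with the largest autocorrelation."""
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        score = sum(section[n] * section[n + lag]
                    for n in range(len(section) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag

def pitch_contour(frame, sample_rate, section_len=200, hop=100,
                  min_lag=10, max_lag=100):
    """Chain the per-section pitch values into a pitch contour."""
    return [estimate_pitch(s, sample_rate, min_lag, max_lag)
            for s in split_into_sections(frame, section_len, hop)]

# Usage: a steady 400 Hz tone sampled at 8 kHz (hypothetical test signal).
sample_rate = 8000
frame = [math.sin(2 * math.pi * 400 * n / sample_rate) for n in range(600)]
contour = pitch_contour(frame, sample_rate)
```

With a hop of half the section length, each section shares half of its samples with its neighbour, which is one way to realise the overlapping sections mentioned above.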
- the resampling rate is proportional to the pitch change rate.
- the pitch change information is extracted from the pitch contour.
- cents and semitones are frequently used to measure the pitch change rate.
- FIG. 12 is a diagram showing the lengths of cents and semitones. The cent is calculated from the pitch ratio of adjacent pitches.
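As a small worked example of the cent measure, the standard conversion between a pitch ratio and cents (1200 cents per octave, 100 cents per semitone) can be written as below. The function names are illustrative and do not come from the source.

```python
import math

def ratio_to_cents(ratio):
    """Convert a pitch ratio (f_next / f_prev) to cents; an octave (2.0) is 1200."""
    return 1200.0 * math.log2(ratio)

def cents_to_ratio(cents):
    """Convert cents back to a pitch ratio."""
    return 2.0 ** (cents / 1200.0)

# Usage: a ratio of 1.0 is 0 cents, and one semitone is 100 cents.
unchanged = ratio_to_cents(1.0)
semitone_ratio = cents_to_ratio(100.0)
```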
- Resampling is applied to the time domain signal according to the pitch change rate.
- The pitch of the other sections is shifted to the reference pitch to obtain a stable pitch. For example, if the pitch of the next section is higher than the pitch of the previous section, the resampling rate is set lower in proportion to the cent difference between the two pitches. Otherwise, the resampling rate must be set higher.
- The sound range is shifted to a lower frequency by lowering the playback speed of a high-pitched sound. This is similar to the concept of resampling the signal in proportion to the pitch change rate.
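The idea of shifting a section to the reference pitch by resampling can be sketched as follows. This is only an illustration under stated assumptions: linear interpolation stands in for whatever resampler the actual codec uses, the names are hypothetical, and "stretch factor" here means the ratio of output length to input length at a fixed playback rate.

```python
def stretch_factor(section_pitch_hz, reference_pitch_hz):
    """Length ratio needed to move a section's pitch onto the reference pitch."""
    return section_pitch_hz / reference_pitch_hz

def resample_linear(samples, factor):
    """Resample by linear interpolation; factor > 1 lengthens (lowers the pitch)."""
    n_out = int(round(len(samples) * factor))
    last = len(samples) - 1
    out = []
    for i in range(n_out):
        pos = i / factor                 # position in the input signal
        i0 = min(int(pos), last)
        i1 = min(i0 + 1, last)           # clamp at the final sample
        frac = pos - i0
        out.append(samples[i0] * (1.0 - frac) + samples[i1] * frac)
    return out

# Usage: a 200 Hz section mapped to a 100 Hz reference is stretched to twice its length.
factor = stretch_factor(200.0, 100.0)
stretched = resample_linear([0.0, 1.0, 2.0, 3.0], factor)
```

Played back at the original sample rate, the stretched section has twice as many samples per period, so its pitch is halved, matching the 200 Hz to 100 Hz example of FIG. 10.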
- FIG. 13 and FIG. 14 show an encoding system incorporating a time expansion / contraction method.
- FIG. 13 is a block diagram of time expansion and contraction in the encoder (encoder 13A).
- FIG. 14 is a block diagram of time expansion and contraction in the decoder (decoder 14A).
- the time domain signal is time stretched before transform coding.
- Pitch information is required for inverse time expansion and contraction in the decoder.
- the pitch ratio must be encoded with an encoder.
- The motivation for using time expansion / contraction is to stabilize the pitch within one frame and thereby improve coding efficiency.
- The performance of time expansion / contraction depends to some extent on the accuracy of pitch tracking.
- A problem with pitch contour detection is that difficulties may arise due to changes in signal amplitude and trajectory.
- Post-processing methods such as smoothing and fine adjustment of threshold parameters have been introduced to improve pitch detection accuracy, but these methods are based on a specific database.
- If time expansion / contraction is applied based on an inaccurate pitch contour, the sound quality deteriorates and the bits used for transmitting the time expansion / contraction information are wasted. Therefore, it is necessary to design time expansion / contraction that does not use the detected pitch contour as a guideline.
- The saved bits can be used for transform coding, so that the sound quality can be improved and signals with a large pitch change can be handled.
- a simple method of incorporating the time expansion / contraction method into the transform encoding system is to directly connect the time expansion / contraction method to the transform encoding.
- the time scaling scheme is independent of transform coding. Since the purpose of the time expansion / contraction is to improve the efficiency of transform coding, it is useful for the time stretching to use some coding information from the transform coding system.
- the current transform coding structure using time expansion / contraction needs to be improved.
- Another object is to provide an encoding device, a decoding device, and the like in which the range of the pitch change ratio (see the ratio 88 in FIG. 18) can be an appropriate range (see the range 86). Another object is to provide an encoding device or the like that achieves high sound quality by performing appropriate processing with pitch change ratios in a wider range (see the ratio 88 in FIG. 18). Another object is to provide an encoding device or the like that can reduce the amount (for example, the average amount) of data (see data 90L in FIG. 22) of the code (see reference numeral 90 in FIG. 18) into which the pitch (see pitch 822 in FIG. 16, ratio 83, ratio 88 in FIG. 18, etc.) is encoded. A further object is to provide an encoding device or the like that performs processing relatively appropriately under a standard, such as an ISO standard, that will be defined in the future.
- The encoding device includes: a pitch detector that detects pitch contour information of an input audio signal; a pitch parameter generator that generates, based on the detected pitch contour information, a pitch parameter including a pitch change ratio (Tw_ratio, Tw_ratio_index: FIG. 18) whose range (range 86) includes a range (range 86a) in which the absolute value of the cent number (cent: 60, 50, -40, -50, -60) of the pitch change ratio (Tw_ratio: 1.0416, 1.0293, 0.9772, 0.9715, 0.9604) is 42 or more; a first encoder that encodes the generated pitch parameter; a pitch shifter that shifts the pitch frequency of the input audio signal according to the pitch contour information; a second encoder that encodes the shifted audio signal output from the pitch shifter; and a multiplexer that generates a bit stream by combining the encoded pitch parameter output from the first encoder with the encoded data output from the second encoder.
- The first encoder encodes the pitch parameter (see the ratio 88 in FIG. 18) as follows. When the pitch parameter is a pitch parameter of a pitch change ratio (see the ratio 88a) whose cent number (see cent in FIG. 18) has a relatively small absolute value, it is encoded into an encoded pitch parameter (see the code 90a) of a code having a relatively short code length. When the pitch parameter is a pitch parameter of a pitch change ratio (see the ratio 88b) whose cent number has a relatively large absolute value, it is encoded into an encoded pitch parameter (see the code 90b) of a code having a relatively long code length.
- The decoding device decodes a bit stream that includes encoded data of a pitch-shifted audio signal and encoded pitch parameter information. It includes: a demultiplexer that separates, from the bit stream to be decoded, the encoded data and the encoded pitch parameter information included in it; a first decoder that generates, from the separated encoded pitch parameter information, a decoded pitch parameter including a pitch change ratio (Tw_ratio, Tw_ratio_index: FIG. 18) whose range (range 86) includes a range (range 86a) in which the absolute value of the cent number (cent: 60, 50, -40, -50, -60) of the pitch change ratio (Tw_ratio: 1.0416, 1.0293, 0.9772, 0.9715, 0.9604) is 42 or more; a pitch contour reconstructor that restores pitch contour information according to the generated decoded pitch parameter; a second decoder that decodes the separated encoded data to generate the pitch-shifted audio signal; and an audio signal reconstructor that converts the pitch-shifted audio signal into the original audio signal according to the reconstructed pitch contour information, which is the restored pitch contour information.
- The first decoder decodes the separated encoded pitch parameter information as follows. When the encoded pitch parameter information is a code having a relatively short code length, it is decoded into a pitch parameter of a pitch change ratio whose cent number has a relatively small absolute value. When it is a code having a relatively long code length, it is decoded into a pitch parameter of a pitch change ratio whose cent number has a relatively large absolute value.
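The pairing of small cent values with short codes and large cent values with long codes can be illustrated with a small prefix-free code table. This is a sketch under assumptions, not the table of FIG. 18: only the 1-bit code "0" for a ratio of 1.0 (0 cents) and the existence of a 6-bit code "111100" come from the text; every other entry, and the assignment of "111100" to +50 cents, is hypothetical.

```python
# Hypothetical prefix-free table: cents -> code (shorter codes for smaller |cents|).
CODE_TABLE = {
    0: "0",
    10: "100", -10: "101",
    20: "1100", -20: "1101",
    50: "111100", -50: "111101",
}
DECODE_TABLE = {code: cents for cents, code in CODE_TABLE.items()}

def encode(cents_per_section):
    """First-encoder sketch: concatenate the code of each section's cent value."""
    return "".join(CODE_TABLE[c] for c in cents_per_section)

def decode(bits):
    """First-decoder sketch: read bits until they match a complete code."""
    out, current = [], ""
    for b in bits:
        current += b
        if current in DECODE_TABLE:
            out.append(DECODE_TABLE[current])
            current = ""
    if current:
        raise ValueError("trailing bits do not form a complete code")
    return out

# Usage: most sections have no pitch change, so most codes are 1 bit long.
sections = [0, 0, 0, 10, 0, 0, -50, 0]
bits = encode(sections)
restored = decode(bits)
```

In this hypothetical frame the eight sections cost 15 bits, whereas a fixed 3-bit index per section would cost 24 bits. The saving depends entirely on how often the pitch actually stays unchanged.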
- the following signal processing system including an encoding device and a decoding device may be constructed (see also the description at the beginning of the embodiment, etc.).
- In the encoding device of this signal processing system, the pitch shifter generates, from a first signal, a second signal in which the pitch of the first signal is shifted to a predetermined pitch; the second encoder encodes the generated second signal into a third signal; the pitch parameter generator calculates a pitch change ratio that identifies the pitch of the first signal before the shift; and the first encoder encodes the calculated pitch change ratio into a code.
- In the decoding device, the second decoder decodes the encoded third signal into the second signal, that is, the signal generated from the first signal by shifting the pitch of the first signal to the predetermined pitch; the audio signal reconstructor generates the first signal from the decoded second signal; the first decoder decodes the code into the pitch change ratio; and the pitch contour reconstructor calculates the pitch of the first signal to be generated, as specified by the decoded pitch change ratio.
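One way to picture the pitch contour reconstructor is as a running product of the decoded pitch change ratios. The sketch below assumes that each ratio is the pitch of a section divided by the pitch of the preceding section and that the contour is expressed relative to a starting value; both points, and the function name, are assumptions rather than details confirmed by the source.

```python
def reconstruct_relative_contour(ratios, start=1.0):
    """Rebuild a pitch contour from per-section pitch change ratios.

    Assumption: ratios[i] = pitch of section i+1 / pitch of section i.
    """
    contour = [start]
    for ratio in ratios:
        contour.append(contour[-1] * ratio)
    return contour

# Usage: no change, then up an octave, then back down.
relative = reconstruct_relative_contour([1.0, 2.0, 0.5])
absolute = reconstruct_relative_contour([1.0, 2.0, 0.5], start=100.0)
```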
- Among the codes that are encoded from the pitch change ratio and decoded into the pitch change ratio, a first code, whose corresponding first pitch change ratio differs relatively little from the pitch change ratio between two pitches having a pitch difference of 0 cents, has a relatively short code length, and a second code, whose corresponding second pitch change ratio differs relatively much from it, has a relatively long code length.
- The operation in which the encoding device generates the third signal by encoding the shifted second signal, and the decoding device decodes the first signal before the shift, is performed only when the difference between the pitch change ratio of the pitch of the first signal and the pitch change ratio of 0 cents is equal to or smaller than a threshold value, and is not performed when the difference is larger than the threshold value. The threshold is not a value at a pitch of less than 42 cents, but a value at a pitch greater than 42 cents.
- the harmonics are corrected along with the pitch shift, so it is necessary to consider the harmonic structure of the signal during time expansion and contraction.
- the proposed harmonic time expansion / contraction method improves the sound quality by correcting the pitch contour and taking into account the harmonic structure during time expansion / contraction based on the analysis of the harmonic structure.
- the proposed dynamic time expansion and contraction also evaluates the efficiency of time expansion and contraction by comparing the harmonic structures before and after the time expansion and contraction, and decides whether to use the time expansion and contraction for the target frame. It removes the inaccuracy caused by inaccurate pitch contours.
- the pitch contour information is sent directly to the decoder without being compressed.
- In the proposed dynamic time expansion / contraction, a method for encoding the time expansion / contraction parameters more efficiently is proposed. Statistical analysis of the pitch contour used for time expansion / contraction shows that time expansion / contraction is enabled only at a few positions in the signal frame where the pitch changes.
- The proposed dynamic time expansion / contraction also supports a wide range of time expansion / contraction values. Here, "supports" means that appropriate operation is possible for those values.
- the saved bits are used for transform coding and the sound quality is improved by a wide range of time scaling values.
- MS stereo mode (mid/side stereo mode) is used to encode stereo audio signals.
- When the left and right channels have similar characteristics, it is more efficient to use the same time expansion / contraction parameter for the left and right signals.
- In other cases, sharing the time expansion / contraction parameter may lower the coding efficiency. Therefore, the MS mode is introduced for time expansion and contraction in the proposed transform coding structure.
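For readers unfamiliar with mid/side coding, the usual mid/side transform and a simple similarity check can be sketched as below. The transform itself is the conventional one; the similarity test (side energy small compared with mid energy) and its threshold are purely hypothetical stand-ins for whatever criterion the codec uses to decide whether the channels can share time expansion / contraction parameters.

```python
def to_mid_side(left, right):
    """Conventional mid/side transform: mid = (L + R) / 2, side = (L - R) / 2."""
    mid = [(l + r) / 2.0 for l, r in zip(left, right)]
    side = [(l - r) / 2.0 for l, r in zip(left, right)]
    return mid, side

def to_left_right(mid, side):
    """Inverse transform: L = mid + side, R = mid - side."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right

def channels_are_similar(left, right, threshold=0.1):
    """Hypothetical test: the side energy is small relative to the mid energy."""
    mid, side = to_mid_side(left, right)
    mid_energy = sum(m * m for m in mid)
    side_energy = sum(s * s for s in side)
    return side_energy <= threshold * mid_energy

# Usage: identical channels are "similar"; opposite-phase channels are not.
share_parameters = channels_are_similar([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
keep_separate = not channels_are_similar([1.0, -1.0], [-1.0, 1.0])
```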
- the bit stream (see the bit streams 106x, 205i, etc.) received by the decoding device has a plurality of positions (see sections 841 to 84M) in one frame (see the frame 84F in FIG. 16).
- The signal at the pitch change position (see position 704p in FIG. 9) is time-warped (TimeWarp, that is, pitch shifted) by the audio signal reconstructor, and the signals at other positions are not time-warped.
- The pitch contour is corrected based on information obtained by analyzing the harmonic structure of the audio signal, and the efficiency of time expansion / contraction is evaluated by comparing the harmonic structures before and after the time expansion / contraction process.
- the time expansion / contraction technique of the present invention can improve sound quality and encoding efficiency of an audio encoding system by using MS stereo mode information from transform encoding.
- the range of the pitch change ratio can be an appropriate range (see the range 86).
- The data amount (for example, the average data amount) of the code (see the reference numeral 90 in FIG. 18) into which the pitch (see the pitch 822 in FIG. 16, the ratio 83, the ratio 88 in FIG. 18, etc.) is encoded can be reduced.
- FIG. 1 is a block diagram of an encoder that uses dynamic time stretching.
- FIG. 2 is a block diagram of a decoder that uses dynamic time stretching.
- FIG. 3 is a block diagram of a decoder that uses modified dynamic time warping.
- FIG. 4 is a block diagram of an encoder that uses dynamic time stretching using the MS mode.
- FIG. 5 is a block diagram of a decoder using dynamic time warping utilizing the MS mode.
- FIG. 6 is a block diagram of an encoder that uses a modified dynamic time warping utilizing the MS mode.
- FIG. 7 is a block diagram of an encoder using closed loop dynamic time stretching.
- FIG. 8 is a diagram for explaining segmentation of one audio frame.
- FIG. 9 is a diagram illustrating the calculation of the vector C.
- FIG. 10 is a diagram for explaining the pitch shift.
- FIG. 11 shows the spectrum after the pitch shift.
- FIG. 12 is a diagram illustrating cents and semitones.
- FIG. 13 is a block diagram of time expansion and contraction in the encoder.
- FIG. 14 is a block diagram of time expansion / contraction in the decoder.
- FIG. 15 is a diagram for explaining the calculation of the pitch contour.
- FIG. 16 shows a spectrum based on a logarithmic scale.
- FIG. 17 is a diagram illustrating pitch shift using harmonics.
- FIG. 18 is a diagram showing a table.
- FIG. 19 is a diagram showing a table in the preceding example.
- FIG. 20 is a diagram illustrating an encoding device and a decoding device.
- FIG. 21 is a flowchart showing the flow of processing.
- FIG. 22 is a diagram illustrating data in each of the preceding example and the present apparatus.
- The encoding apparatus (encoding apparatus 1) of the embodiment, provided in the system of the embodiment (system 2S in FIG. 20), includes: a pitch detector (pitch contour analysis block (pitch contour analysis unit) 101) that detects pitch contour information (information (pitch) 101x, pitch 822 (FIG. 15)) of an input audio signal (signal 101i (FIG. 1); see signal 811 in FIG. 11); a pitch parameter generator (dynamic time expansion / contraction block 102) that generates, based on the detected pitch contour information (information 101x), a pitch parameter (parameter (pitch change ratio) 102x, ratio 88 (FIG. 18)) including a pitch change ratio (Tw_ratio (FIG. 18), ratio 83 (FIG. 15), ratio 88 (FIG. 18)) whose range (range 86) includes a range (range 86a) in which the absolute value of the cent number (cent: 60, 50, -40, -50, -60) of the pitch change ratio (Tw_ratio: 1.0416, 1.0293, 0.9772, 0.9715, 0.9604) is 42 or more; a first encoder that encodes the generated pitch parameter (parameter 102x) into a code (code 90 in FIG. 18); a pitch shifter that, according to the pitch contour information (information (pitch) 101x, pitch 822), shifts the pitch frequency (pitch 822: FIG. 15) of the input audio signal (signal (first signal) 101i) to the reference pitch 82r; and a multiplexer (multiplexer circuit 106).
- One cent is, for example, a pitch width of only 1/100 of the 100-cent pitch width 90j (FIG. 12) that constitutes a semitone (a difference between two pitches; see the two pitches 821 and 822 in FIG. 15), in other words, a pitch width of only 1/1200 of one octave.
- the entire pitch parameter to be generated may be the pitch change ratio, or a part may be the pitch change ratio. Then, such a pitch parameter whose part or the like is the pitch change ratio may be one of a plurality of generated pitch parameters.
- The first encoder encodes the pitch parameter (parameter 102x (FIG. 1), ratio 88 (FIG. 18)) as follows. When the pitch parameter (ratio 88) is a pitch parameter (ratio 88a) of a pitch change ratio (for example, 1.0) between two pitches (see pitches 821 and 822 in FIG. 15) whose pitch width has a cent number (0: see cent in FIG. 18) with a relatively small absolute value (0), it is encoded into an encoded pitch parameter (code 90a) of a code (code 90a: "0") having a relatively short code length (length 1: see bits in FIG. 18). When the pitch parameter is a pitch parameter (ratio 88b) of a pitch change ratio whose cent number has a relatively large absolute value, it is encoded into an encoded pitch parameter (code 90b) of a code (code 90b: "111100") having a relatively long code length (length 6).
- The decoding apparatus (decoding device 2) decodes a bit stream (stream 205i (stream 106x)) that includes encoded data (third signal 204i) of the pitch-shifted audio signal (second signal 203ib: FIG. 2) and encoded pitch parameter information (parameter 201i, code 90). It includes: a demultiplexer (multiplexer block 205) that separates, from the bit stream (stream 205i) to be decoded, the encoded data (third signal 204i in FIG. 2 (third signal 105x in FIG. 1)) and the encoded pitch parameter information (parameter 201i, code 90) included in the stream; a first decoder (lossless decoding block) that generates, from the separated encoded pitch parameter information, a decoded pitch parameter (parameter 202i, code 90) including a pitch change ratio (ratio 88, Tw_ratio_index, Tw_ratio: FIG. 18) whose range (range 86) includes the range (range 86a) in which the absolute value of the cent number (cent: 60, 50, -40, -50, -60) of the pitch change ratio (Tw_ratio: 1.0416, 1.0293, 0.9772, 0.9715, 0.9604) is 42 or more; a pitch contour reconstructor (dynamic time expansion / contraction reconstruction block 202) that restores pitch contour information according to the generated decoded pitch parameter; a second decoder that decodes the separated encoded data (third signal 204i) to generate the pitch-shifted audio signal (second signal 203ib); and an audio signal reconstructor (time expansion / contraction block 203) that, according to the reconstructed pitch contour information (information 203ia, pitch 822), which is the restored pitch contour information, converts the pitch-shifted audio signal (second signal 203ib) into the original audio signal (first signal 203x) having the pitch specified by the reconstructed pitch contour information.
- The first decoder decodes the separated encoded pitch parameter information (parameter 201i (FIG. 2), code 90 (FIG. 18)) as follows. When the encoded pitch parameter information (code 90 (FIG. 18)) is encoded pitch parameter information (code 90a) of a code (code 90a: "0") having a relatively short code length (length 1: see bits in FIG. 18), it is decoded into a pitch parameter (ratio 88a) of a pitch change ratio (1.0, ratio 88a) whose cent number (0: see cent in FIG. 18) has a relatively small absolute value (0).
- An encoding device (see, for example, encoding device 1 (FIG. 1, FIG. 20), step S1 (FIG. 21), etc.).
- A decoding device (see decoding device 2, step S2, etc.).
- the following signal processing system may be constructed.
- In the encoding device, the pitch shifter (time expansion / contraction unit 104) generates, from the first signal (first signal 101i, the input audio signal described above: FIG. 1), the second signal (second signal 104x, the shifted audio signal described above) in which the pitch of the first signal (pitch 822: FIG. 15) is shifted to a predetermined pitch (reference pitch 82r), and the second encoder (conversion encoder 105) encodes the generated second signal (second signal 104x) into the third signal (third signal 105x, the encoded data described above, into which the audio signal output from the pitch shifter is encoded). The pitch parameter generator (dynamic time expansion / contraction block 102) calculates the pitch change ratio (parameter 102x), and the first encoder encodes it into the code (parameter 103x, code 90).
- In the decoding device, the second decoder (conversion decoder 204) decodes the third signal (third signal 204i (third signal 105x)) into the second signal (second signal 203ib (second signal 104x)), that is, the signal generated from the first signal (first signal 203x (first signal 101i)) by shifting its pitch (pitch 822: FIG. 15) to the predetermined pitch (reference pitch 82r). The audio signal reconstructor (time expansion / contraction unit 203) generates the first signal (first signal 203x) from the decoded second signal (second signal 203ib). The first decoder (lossless decoding unit 201) decodes the code (parameter 201i (parameter 103x), code 90 (FIG. 18)) into the pitch change ratio (parameter 202i (parameter 102x), ratio 88, Tw_ratio, Tw_ratio_index), and the pitch contour reconstructor (202) calculates the pitch (pitch 822) of the first signal (first signal 203x) to be generated, as specified by the decoded pitch change ratio (ratio 88). This constitutes the decoding device (decoding device 2: decoding devices 2c, 2g (FIG. 2, FIG. 5, etc.)).
- The technical development of this type of signal processing system is currently in progress (see Non-Patent Documents 1 to 4, etc.), and such signal processing systems are often not yet well understood.
- this signal processing system is a signal processing system in a standard to be determined in the future.
- the shifted second signal (second signal 104x, 203ib) is encoded into the third signal (third signal 105x, 204i).
- the converted third signal is decoded into the second signal.
- the sound data (third signal) subjected to processing such as communication from the encoding device to the decoding device can be made into more appropriate data such as data having a small data amount.
- the pitch change ratio is calculated and the second signal decoded from the third signal is shifted, the shift to the pitch specified by the calculated pitch change ratio is reliably performed.
- the pitch of the shift destination can be set to an appropriate pitch.
- the calculated pitch change ratio is encoded into a code, and the encoded code is decoded into the pitch change ratio so that a code having a data amount smaller than the data amount of the pitch change ratio is communicated.
- the amount of data of the pitch data (the code in which the pitch change ratio is encoded (code 90)) to be processed can be reduced.
- the code (the ratio 88) is encoded and the code (the ratio 88) is decoded into the pitch change ratio (the ratio 88).
- Reference numeral 90 indicates that the pitch change ratio (ratio 88) corresponding to the reference numeral (reference numeral 90) is a pitch change ratio between two pitches having a pitch difference of 0 cents (a ratio 88x of 1.0: FIG. 18).
- the first pitch change ratio (ratio 88a) having a relatively small difference (0 cent)
- the first code (code 90a) having a relatively short code length (length 1).
- the second code (symbol 90b) having a relatively long code length is used.
- Variable length coding according to the difference (that is, how close the ratio is to, or how far it is from, the 0 cent ratio 88x) may be used.
- the data amount of the third signal (signals 105x and 204i) is reduced, and the data amount of the pitch data (signals 103x and 201i) to be processed such as communication can be further sufficiently reduced.
- the third signal in which the shifted second signal (signal 104x, 203ib) is encoded.
- 105x) is generated by the encoding device and decoded by the decoding device (S1, S2 in FIG. 21) is the first signal (first signal 101i, 203x) before being shifted.
- the range (range) of the pitch change ratio (ratio 88) in which the above-described operation is performed may be set to a range 86 (FIG. 18) wider than (range 87 in the previous example).
- the pitch change ratio in a wider range is encoded, and the data amount of the encoded code 90 data (data 90L in FIG. 22) is further increased.
- the data amount of the encoded data 90L is, for example, a data amount (substantially) smaller than the data amount of the data 91L (FIG. 19) encoded with the fixed-length code 91 in the preceding example. It is avoided that the data amount becomes too small, the data amount is relatively close (for example, the same data amount may be sufficient), and the data amount after encoding can be made an appropriate data amount. .
- the range of the pitch change ratio range is such that the amount of data of the encoded code 90 (data 90L) is, for example, a fixed length.
- a range (threshold value) that is an appropriate data amount such as a data amount that is relatively close to the data amount of data (for example, data 91L) at the time of encoding (preceding example).
- the pitch change ratio (ratio 88) is as large as the pitch change ratio in the range 86a in which the cent number is greater (42 cents) than the previous pitch (pitch 821: FIG. 15). It was noticed that the pitch change ratio of the changed pitch (pitch 822: FIG. 15) is often (to some extent).
- the pitch change ratio belongs to the above-mentioned wider range (range 86), and the third signal 105x is generated.
- the sound quality can be improved, for example, by avoiding the process of generating another signal having a sound quality lower than that of the third signal 105x.
- the code 90a having the short code length (length 1) described above is the code 90 having the pitch change ratio 88a in the range 87 in less than 42 cents.
- a code 90b having a long code length (length 6) is a code 90 having a pitch change ratio 88b in a range 86a of 42 cents or more.
- This threshold value ("0.0416" in the above description) is, for example, the value, among the values belonging to the range of the pitch change ratio (range 86 in FIG. 18, 1.0416 to 0.9604), whose cent number has the largest absolute value (1.0416). That is, by setting the threshold value to such a high value (for example, "0.0416" described above), the range 86 may include not only the range 87 of less than 42 cents (see 1.02285 to 0.982857 in FIG. 19) but also the range 86a of 42 cents or more (the ranges 1.0416 to 1.0293 and 0.9772 to 0.9604 in FIG. 18), and may thus be a wider range.
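- The ranges above are given both as pitch change ratios and as cent values. As a rough aid only, the following sketch converts between the two using the standard musical definition of the cent (1200 cents per octave). The function names are hypothetical and do not appear in the description, and the values in FIG. 18 and FIG. 19 may follow a different rounding or convention, so they are not reproduced here.

```python
import math


def ratio_to_cents(ratio: float) -> float:
    """Convert a pitch change ratio to cents (assumes 1200 cents per octave)."""
    if ratio <= 0.0:
        raise ValueError("a pitch change ratio must be positive")
    return 1200.0 * math.log2(ratio)


def cents_to_ratio(cents: float) -> float:
    """Convert a pitch difference in cents back to a pitch change ratio."""
    return 2.0 ** (cents / 1200.0)


if __name__ == "__main__":
    # A ratio of 1.0 is a pitch change of 0 cents (the ratio 88x in the text).
    print(ratio_to_cents(1.0))
    # Round trip for one of the range limits quoted in the text.
    print(cents_to_ratio(ratio_to_cents(1.0416)))
```

A ratio above 1.0 gives a positive cent value and a ratio below 1.0 gives a negative one, which is why the ranges in the text extend on both sides of 1.0.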
- FIG. 1 is a diagram showing an example of a proposed encoder (encoding device).
- one frame of the left and right signals is transmitted to a block 101 which is a pitch contour analysis block.
- the pitch contours of the left and right channels are calculated separately. That is, the pitch contour of each channel is calculated.
- the pitch contour detection algorithm described in the prior art can be used here (pitch contour analysis unit 101).
- one frame is segmented into M overlapping segments.
- M pitches are calculated from M sections.
- The pitch contours of the left and right channels extracted in block 101 are sent to block 102, which is a dynamic time expansion / contraction block. Based on the extracted pitch contour information, block 102 generates pitch parameters consisting of the pitch change section information (time expansion / contraction positions) in each audio frame and the pitch change ratios (time expansion / contraction values) of the corresponding adjacent sections.
- the pitch parameter is also referred to as a dynamic time expansion / contraction parameter.
- This dynamic time expansion / contraction parameter is sent to the block 103 which is a lossless encoding block.
- the lossless encoding block further compresses the time expansion / contraction value to generate an encoding time expansion / contraction parameter.
- a general lossless encoding technique is used.
- the generated encoding time expansion / contraction parameter is sent to a block 106 which is a multiplexer (multiplexer block, multiplexer circuit), and a bit stream is generated.
- the dynamic time expansion / contraction parameter is sent to the block 104 which is a time expansion / contraction block.
- Block 104 resamples the input signal according to the time stretch parameter.
- the left and right signal pitches are shifted (time stretched) separately according to the corresponding dynamic time stretch parameters.
- the signal after the time expansion / contraction is sent to the block 105 which is a conversion encoder.
- the encoded signal and related information are also sent to block 106, which is a multiplexer.
- the input signal of the block 101 in the first embodiment is not necessarily a stereo signal, and may be a monaural signal or a multi-signal.
- the dynamic time stretching method can be applied to any number of channels.
- the pitch contour is processed by a dynamic time expansion / contraction method to generate a dynamic time expansion / contraction parameter.
- the generated dynamic time expansion / contraction parameter represents a position to which time expansion / contraction is applied and a time expansion / contraction value at the position. Sound quality is improved by the proposed dynamic time expansion and contraction method.
- lossless encoding is also introduced.
- pitch detection is a difficult problem because the amplitude and period of the signal change. That is, when the pitch contour information is directly used for time expansion / contraction, the inaccuracy of the pitch contour affects the time expansion / contraction performance. Since the harmonics of the signal are corrected in proportion to the pitch shift during the time expansion / contraction, it is necessary to consider the influence of the time expansion / contraction on the harmonics.
- the pitch contour is corrected and a more efficient dynamic time expansion / contraction parameter is generated. This consists of three parts.
- the performance of time expansion and contraction is evaluated by comparing the harmonic structures before and after time expansion and contraction.
- the pitch contour is modified. Similar to the first embodiment, an audio frame is segmented into M sections for pitch calculation.
- The pitch contour has M pitch values (pitch 1, pitch 2, ..., pitch M).
- the pitch is shifted close to the reference pitch value.
- a stable reference pitch is obtained after time expansion and contraction.
- the signal harmonics can be shifted to the vicinity of the reference pitch harmonics.
- FIG. 17 is a diagram illustrating pitch shift using harmonics.
- An example is shown in FIG. 17. As shown in FIG. 17, the reference pitch and its harmonics are shown by broken lines (three places).
- the detected pitch is close to the harmonics of the reference pitch.
- Δf 1 > Δf 2 means the following: a larger expansion / contraction value (see Δf 1 in FIG. 17) is needed to shift the detected pitch to the reference pitch, whereas a smaller expansion / contraction value (see Δf 2 in FIG. 17) is needed to shift the detected pitch to a harmonic of the reference pitch.
- the dynamic time expansion / contraction process corrects the pitch contour and enables the shift of harmonic components. Details of this correction processing will be described below.
- the proposed dynamic time expansion and contraction compares the difference between the detected pitch and the reference pitch.
- pitch ref in Equation 2 below (Expression 2) represents a reference pitch value.
- pitch i represents the pitch value detected in section i.
- k is an integer and k > 1.
- If there is a value of k that satisfies Equation 2, the value pitch i must be shifted to "k × pitch ref", the harmonic of the reference pitch value for that value of k. The detected pitch i is corrected to pitch i / 2.
- pitch i is corrected to k × pitch i.
- time expansion / contraction is applied, and the performance is evaluated by comparing the harmonic structures before and after the time expansion / contraction.
- the sum of the harmonic components before and after the time expansion and contraction is used as a performance evaluation criterion in the second embodiment.
- the harmonics of the pitch value of section i are calculated as follows.
- q is the number of harmonic components.
- S (•) represents the spectrum of the signal.
- pitch i is a pitch value detected in the pitch contour pitch 1, pitch 2, ..., pitch M.
- S ′ (•) represents the spectrum of the signal after time expansion and contraction.
- Prior to time scaling, the signal consists of the harmonics of pitch 1, pitch 2, ..., pitch M.
- the harmonic ratio HR is defined to represent the energy distribution between these harmonic components as follows.
- the harmonics ratio HR ′ is calculated as follows.
- H ′ (pitch ref ) is the sum of the harmonics of the reference pitch after time expansion and contraction.
- dynamic time stretching parameters are generated using an efficient method.
- Since there are not many pitch change positions in a frame, an efficient scheme can be designed in which the pitch change positions and the values Δp i are encoded separately.
- the corrected pitch contour is normalized.
- the difference between adjacent corrected pitches is calculated as follows.
- FIG. 9 is a diagram illustrating the calculation process of the vector C.
- An example of the contents of the vector C is shown in the figure. N is defined as the number of sections in which the pitch varies, that is, in which Δp i ≠ 1.
- A dynamic scheme is used to encode the vector C and the time scaling values Δp i for which Δp i ≠ 1.
- a flag A is then generated to indicate which method has been selected.
- The time stretch values Δp i for which Δp i ≠ 1 and the vector C must be sent to the decoder.
- N 0 and If so, it means that the number of pitch change points is small. In this case, it is more efficient to directly encode the position of the pitch change point.
- the flag A is set to 2 and log 2 M bits are used to encode the positions marked 0 in vector C.
- the position of the pitch change point is 2, and 3 bits are used for encoding position 2.
- The flag A, the number N of pitch change points, the pitch change positions, and the values Δp i for which Δp i ≠ 1 are sent to block 103.
- To save bit rate, lossless encoding may be used.
- ⁇ p i 1.
- Only the first two schemes may be used for block 102 for the purpose of reducing complexity.
- Dynamic time stretching allows the harmonic structure to be rebuilt through time stretching. Since the energy is limited to the reference pitch and its harmonic components, the coding efficiency is improved.
- the evaluation scheme reduces the dependency on the accuracy of pitch detection and improves the performance of the coding system.
- An efficient method for encoding the time expansion / contraction parameter can improve the sound quality by reducing the bit rate and can cope with the encoding of a signal having a larger pitch change rate.
- FIG. 2 is a block diagram of the third embodiment.
- a block 205 which is a demultiplexer, divides the input bitstream into an encoded time stretch parameter, an encoded audio signal, and associated transform encoder information.
- the encoding time expansion / contraction parameter is sent to the block 201 which is a lossless decoding block.
- dynamic time expansion / contraction parameters are generated.
- The dynamic time expansion / contraction parameter consists of a flag, information on the positions to which time expansion / contraction is applied, and the corresponding time expansion / contraction values Δp i.
- Dynamic time expansion / contraction information is sent to block 202 which is a dynamic time expansion / contraction reconstruction block.
- Block 202 decodes the time stretch parameter from the dynamic time stretch parameter.
- the block 204 which is a conversion decoder decodes the encoded signal based on the conversion encoder information from the demultiplexer block 205. It decodes the time stretched signal.
- The time expansion / contraction block 203 receives the time-expanded / contracted signal and applies time expansion / contraction to it. This time expansion / contraction process is the same as the process in the block 104 in the first embodiment. The signal is unstretched according to the time stretch parameter to obtain the audio signal.
- The dynamic time expansion / contraction parameter received by the dynamic time expansion / contraction reconstruction block consists of a flag, information on the positions to which time expansion / contraction is applied, and the corresponding time expansion / contraction values Δp i.
- The flag is checked first. If the flag is 0, time expansion / contraction is not applied to the target frame. In this case, all elements of the reconstructed pitch contour vector are set to 1.
- If the flag is 1, M bits are used to encode the vector C, which indicates the positions to which time expansion / contraction is applied. One bit corresponds to one position: 1 marks no pitch change, while 0 marks time expansion / contraction. By counting the number of zeros in vector C, the total number N of time expansion / contraction points can be determined. Then N time expansion / contraction values Δp i are read from the buffer.
- c (i) 0.
- the pseudo code is as follows.
- the normalized pitch contour is reconstructed as follows.
- the pitch contour is later used for time expansion and contraction.
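- The flag handling described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the pseudo code of the description: the function name and argument layout are hypothetical, only the flag values 0 and 1 described for the decoder are handled, and the subsequent normalization of the pitch contour is not reproduced because its formula is not given here.

```python
from typing import List, Sequence


def reconstruct_stretch_values(flag: int, m: int,
                               vector_c: Sequence[int] = (),
                               values: Sequence[float] = ()) -> List[float]:
    """Rebuild one time-stretch value per section from the decoded parameters.

    flag     -- 0: no time stretching in this frame; 1: vector C is present
    m        -- number of sections M in the frame
    vector_c -- M bits; 1 = no pitch change, 0 = time stretching applied
    values   -- buffered values (delta p_i), one per zero bit in vector C
    """
    if flag == 0:
        # No stretching: every element of the reconstructed vector is 1.
        return [1.0] * m
    if flag == 1:
        if len(vector_c) != m:
            raise ValueError("vector C must contain M bits")
        # N = number of stretch points = number of zeros in vector C.
        n = sum(1 for bit in vector_c if bit == 0)
        if len(values) < n:
            raise ValueError("not enough values in the buffer")
        buffered = iter(values)
        return [1.0 if bit == 1 else float(next(buffered)) for bit in vector_c]
    raise ValueError("flag value not covered by this sketch")


if __name__ == "__main__":
    print(reconstruct_stretch_values(1, 4, [1, 0, 1, 0], [1.02, 0.98]))
```

Sections marked 1 in vector C keep the value 1, so only the N sections marked 0 consume values from the buffer.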
- FIG. 3 is a diagram showing the proposed encoder.
- the difference between the encoding system shown in FIG. 1 and the encoder shown in FIG. 3 is in blocks 306 and 307.
- the function of the lossless decoding 306 in FIG. 3 is the same as 201 in FIG.
- the dynamic time expansion / contraction reconstruction block 307 is the same as 202 in FIG.
- the encoder uses exactly the same time expansion / contraction parameters as the decoder.
- the fifth embodiment increases the accuracy of time expansion and contraction in the encoder.
- FIG. 4 is a diagram illustrating a configuration of the encoding device according to the sixth embodiment.
- The MS mode is frequently used for encoding stereo audio signals, for example in the AAC codec.
- MS mode detects the similarity between the left and right channel subbands in the frequency domain. MS stereo mode is enabled when the left and right channel subbands are similar. Otherwise, the MS mode is not enabled.
- Since the MS mode information is available in many transform coders, it can also be used to improve the performance of harmonic time expansion / contraction in dynamic time expansion / contraction.
- the left and right channel signals are sent to block 401, which is an MS calculation block.
- the MS calculation block calculates the similarity between the left and right signals in the frequency domain. This is the same as MS detection in general transform coding.
- Block 401 generates a flag. If the MS mode is enabled for all subbands of the stereo audio signal, the flag is set to 1, otherwise the flag is set to 0.
- block 402 which is a downmix block
- the left and right channel signals are downmixed into a middle signal and a side signal.
- the middle signal is sent to block 403, which is a pitch contour analysis block.
- a block 403 which is a pitch contour analysis block, calculates pitch contour information in the same manner as the block 102 in FIG.
- If the flag is 1, a single set of pitch contours is generated for the downmixed signal. Otherwise, the pitch contours of the left and right signals are generated separately.
- The operations of blocks 404, 405, 406, and 408 are the same as those described for blocks 103, 104, 105, and 106.
- dynamic time compression is modified to be more suitable for stereo coding.
- the left and right channels may have different characteristics.
- different time compression parameters are calculated for different channels.
- the left and right channels may have similar characteristics. It is reasonable to use the same time compression parameter for both channels. If the left and right channels are similar, more efficient audio coding can be achieved by using the same set of time compression parameters.
- FIG. 5 is a block diagram of a decoding device according to the seventh embodiment.
- the input bit stream is sent to the demultiplexer block 506.
- the output of the block 506 is an encoding time compression parameter, transform encoder information, and an encoded signal.
- the block 505 which is a conversion decoder decodes the encoded signal into a time compression signal according to the conversion encoder information, and extracts the MS mode information.
- MS mode information is sent to the MS mode detection block 504.
- If the MS mode is enabled for all subbands of this frame, the MS mode is also enabled for time compression and the flag is set to 1. Otherwise, the MS mode is not used to reconstruct the harmonic time stretch and the flag is set to 0. The MS mode flag is sent to the harmonics time stretch reconstruction block 502.
- the dynamic time expansion / contraction parameter is inversely quantized from the block 501 which is a lossless decoding block.
- the dynamic time expansion / contraction reconstruction block 502 reconstructs the time expansion / contraction parameters according to the MS flag.
- FIG. 6 is a block diagram of an encoder that uses a modified dynamic time warping utilizing the MS mode.
- the fourth embodiment is changed so as to improve the accuracy of time expansion and contraction in the encoder.
- a lossless encoding block 608 and a dynamic time expansion / contraction reconstruction block 609 are added to the encoding structure.
- the purpose is to ensure that the encoder uses the same time scaling parameters as the decoder.
- the description of blocks 608 and 609 is the same as the description of blocks 501 and 502 in FIG.
- FIG. 7 is a diagram illustrating an encoding apparatus according to the ninth embodiment.
- the configuration of the ninth embodiment is based on the configuration of the eighth embodiment, but a comparison scheme (comparison scheme 710) is added. Prior to sending the encoded signal and the time stretch parameter to the multiplexer 711 of FIG. 7, the encoded signal is verified in a comparison scheme 710. After decoding the time expansion / contraction, it is determined whether the overall sound quality is improved.
- One example is to compare the SNR of the decoded signal with the original signal.
- the time-scaled encoded signal is decoded by the transform decoder.
- the time expansion / contraction is applied to the decoded time expansion / contraction signal using the same time expansion / contraction parameter as 708 in FIG. 7, and a non-expansion / contraction signal is generated.
- SNR 1 is calculated.
- This encoded signal is decoded by the same transform decoder, and the SNR 2 is calculated by comparing the decoded signal with the original signal.
- the determination is made by comparing SNR 1 and SNR 2 . If SNR 1 > SNR 2 , the time stretch is selected and the first encoded signal, transform encoder information, and encoded time stretch parameters are sent to the decoder. Otherwise, time scaling is not selected and the second encoded signal and transform encoder information are transmitted to the decoder.
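- The comparison between SNR 1 and SNR 2 can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the signals are plain lists of samples, and the SNR is taken as the usual ratio of signal energy to error energy in decibels, which the description does not spell out.

```python
import math
from typing import Sequence


def snr_db(original: Sequence[float], decoded: Sequence[float]) -> float:
    """Signal-to-noise ratio in dB of a decoded signal against the original."""
    if len(original) != len(decoded):
        raise ValueError("signals must have the same length")
    signal_energy = sum(x * x for x in original)
    noise_energy = sum((x - y) ** 2 for x, y in zip(original, decoded))
    if noise_energy == 0.0:
        return math.inf       # perfect reconstruction
    if signal_energy == 0.0:
        return -math.inf      # silent original, non-zero error
    return 10.0 * math.log10(signal_energy / noise_energy)


def select_time_stretch(original: Sequence[float],
                        decoded_with_stretch: Sequence[float],
                        decoded_without_stretch: Sequence[float]) -> bool:
    """Return True when the time-stretched path gives the higher SNR."""
    snr_1 = snr_db(original, decoded_with_stretch)     # SNR 1 in the text
    snr_2 = snr_db(original, decoded_without_stretch)  # SNR 2 in the text
    return snr_1 > snr_2


if __name__ == "__main__":
    ref = [1.0, -1.0, 0.5, -0.5]
    print(select_time_stretch(ref, [1.0, -1.0, 0.5, -0.4], [0.9, -1.1, 0.4, -0.6]))
```

When the function returns True, the first encoded signal and the encoded time stretch parameters would be sent; otherwise the second encoded signal would be sent without them.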
- bit consumption can be compared instead of SNR.
- the time expansion / contraction technique is used to compensate for the influence of pitch change in the audio encoding system.
- a dynamic time expansion / contraction method is proposed.
- the time expansion / contraction method of the present invention improves the sound quality by correcting the pitch contour and taking into account the harmonic structure during time expansion / contraction based on the analysis of the harmonic structure.
- the dynamic time expansion / contraction method also evaluates the effectiveness of time expansion / contraction by comparing the harmonic structures before and after the time expansion / contraction, and determines whether or not the time expansion / contraction should be used for the target audio frame. This removes inaccuracies caused by inaccurate pitch contour information.
- Dynamic time stretching also provides a more efficient way to encode time stretching parameters and uses MS mode information obtained from transform coding to improve sound quality and coding efficiency.
- the encoding device 1 and the decoding device 2 may be constructed.
- the following operation may be performed.
- Some (or all) of the above-described processes may be the same (similar) to the operations described below.
- the following processing may be performed in the encoding device 1.
- the pitch of the signal 101i is the reference pitch (previously described: for example, the reference pitch in FIG. 15).
- 82r may be generated (refer to signal 812 in FIGS. 1 and 11) (time expansion / contraction unit 104, step S104 in FIG. 21).
- a shift to a shift destination pitch may be performed in this way.
- the shift destination pitch may not be the reference pitch but may be a harmonic of the reference pitch (harmonic) or the like (see Formula 2 etc.).
- The signal 101i (signal 104x) may be, for example, the signal of one channel among a plurality of channels, such as 2-channel stereo or 5.1-channel or 7.1-channel multichannel audio.
- the signal 101i refers to, for example, a plurality of sections (for example, M sections 84 (section 841 to section 84M) included in the frame 84F (FIG. 16) shown in FIG.
- the signal in one or a part of the sections 84 may be used.
- M in FIG. 16 may specifically be 16, for example.
- The reference pitch (reference pitch 82r) described above is, for example, a pitch for which encoding is more appropriate when the signal 104x after being shifted to the reference pitch is encoded than when the signal 101i is encoded.
- the term “appropriate” here means, for example, the amount of data after encoding (while maintaining the sound quality) when it is assumed that the signal 101 i before being shifted is encoded.
- This also means that the data amount of the signal 105x (FIG. 1) obtained by encoding the signal 104x after the shift is smaller. That is, for example, the smaller data amount refers to a data amount that is the same as the sound quality of the data of that data amount and is smaller than the data amount of the other data in which the sound quality is maintained.
- the reference pitch is a shift in another section (for example, the section 821 s adjacent to the section 822 s) other than the section of the signal 101 i (for example, the section 822 s in FIG. 15). ) Is the same pitch (reference pitch 82r) as the previous pitch (for example, reference pitch 82r).
- the signal 104x (FIG. 1) after the shift may be encoded into the signal 105x (conversion encoder 105, step S105).
- the signal 104x after the shift is easily spectrally encoded, and the signal that has been easily encoded is encoded, whereby the signal that is not shifted (the first signal 101i) is encoded.
- the sound quality is the same, the amount of data required for encoding can be reduced.
- the third signal 105x having a data amount smaller than the data amount of the directly encoded signal is encoded, and the third signal 105x having a smaller data amount is obtained as the encoded signal of the sound of the first signal 101i.
- Signal 105x is used.
- the parameter 102x (dynamic time expansion / contraction parameter, pitch parameter described above) for specifying the pitch of the signal 101i before the shift (see pitch 822 (FIG. 15)) may be calculated (pitch parameter).
- the calculated parameter 102x may be a predetermined ratio (ratio 88 (Tw_ratio) in FIG. 18: pitch change ratio described above).
- the calculated ratio (ratio 88, parameter 102x) is changed from the predetermined pitch (see, for example, pitch 821 in FIG. 15) by the ratio (see ratio 83 shown in FIG. 15).
- the specified pitch (pitch 822) can be specified (see the ratio 83 shown in FIG. 15).
- the data of the ratio 88 is number data for specifying the number of the ratio 88 (FIG. Tw_ratio_index), and the ratio is indirectly determined by specifying the ratio of the specified number. May be specified.
- Such number data may be calculated as the parameter 102x.
- the ratio indicated by reference numeral 83 is schematically illustrated as the ratio between the pitch 821 and the pitch 822 depending on the position of the tip of the arrow line indicated by reference numeral 83.
- the calculated parameter 102x is a signal obtained by decoding the signal 105x (the signal 204i in FIG. 2) when the encoded sound signal 105x is decoded (for example, by the decoding device 2) (FIG. 2).
- Signal 203ib (signal 104x in FIG. 1)
- a signal (signal 203x in FIG. 2 (signal 101i in FIG. 1)) having a pitch (see pitch 822) specified by the parameter 102x is generated (reverse shift). Parameter).
- the parameter 102x is communicated from the encoding device 1 to the decoding device (decoding device 2), and the communicated parameter 102x (see the signal 201i in FIG. 2)
- the above processing may be performed.
- the pitch of the decoded signal (signal 203x in FIG. 2) can be surely set to an appropriate pitch (see pitch 822).
- the sound data (pitch identification parameter 102x) is used together with the sound data (signal 104x, signal 105x in FIG. 1, signal 203ib, signal 204i in FIG. 2), and the sound data and pitch Two types of data may be used.
- the calculated parameter 102x is encoded into the encoded parameter 103x (parameter 201i in FIGS. 1 and 2) having a data amount smaller than the data amount of the parameter 102x.
- Lossless encoding (Huffman code, arithmetic encoding, etc.) may be used (lossless encoding block 103, step S103).
- the parameter 102x (pitch data) can also be encoded (lossless encoding) to reduce the data amount of the parameter 102x (pitch data).
- the section of the time adjacent to the time of the section (section 822s) of the pitch (see, for example, pitch 822 of FIG. 15) that can be specified by the calculated parameter 102x (parameter 204i of FIGS. 1 and 2) 821s) (pitch 821).
- the calculated parameter 102x is a ratio (ratio 83, Tw_ratio in FIG. 18) between the pitch (pitch 821) of the adjacent (section (section 821s)) and the pitch (pitch 822) of the parameter 102x.
- the specified parameter may be used.
- The ratio is calculated (specified), lossless encoding is performed on the calculated ratio, and the data obtained by losslessly encoding the ratio may be used as the encoded time expansion / contraction parameter (see the description above).
- the calculated parameter 102x specifies a pitch (pitch 822) having a change by a ratio specified by the parameter 102x (ratio 83 in FIG. 15) from the adjacent pitch (pitch 821), and determines the pitch ( The pitch 822) may be indirectly specified by the ratio.
- It was noticed that ratios 88a relatively close to the ratio 88x of a 0 cent pitch change (the ratio 1.0: FIG. 18), such as the ratio 88x itself, occur at a high frequency (appearance frequency), while ratios 88b relatively distant from the ratio 88x (for example, the ratio "1.0293" shown in FIG. 18) occur at a low frequency.
- That is, it was noticed that the frequency at which a ratio 88 occurs depends on how close the ratio 88 is to the 0 cent ratio 88x: the closer it is, the higher the frequency, and the farther it is, the lower the frequency.
- When the calculated ratio 88 (parameter 102x) is a ratio 88a that is relatively close to the 0 cent ratio 88x (FIG. 18) and appears at a relatively high appearance frequency, it may be encoded into a code having a relatively short code length (bit length), such as the code (bit string) 90a (FIG. 18), for example the code "0" having a length of 1 (see FIG. 18).
- When the calculated ratio 88 (parameter 102x) is a ratio 88b that is relatively far from the 0 cent ratio 88x and appears at a relatively low appearance frequency, it may be encoded into a relatively long code (code 90b, for example the code "111110" shown in FIG. 18, which has a code length of 6).
- each ratio 88 thus calculated (parameter 102x: ratio 88a, ratio 88b, etc.) may be encoded into a variable length code 90 (codes 90a, 90b, etc.) whose code length corresponds to the appearance frequency of that ratio, which depends on how close the ratio 88 is to the 0-cent ratio 88x (how large the difference from the ratio 88x is).
- a table 103t (table data; table 85: see FIG. 18, FIG. 20, FIG. 1, etc.) that associates each ratio 88 (ratio 88a, 88b, etc.) with the appropriate variable length code 90 (code 90a, 90b, etc.) may be stored.
- specifically, this table 103t may be stored.
- the calculated ratio 88 (ratio 88a, 88b: parameter 102x (FIG. 1)) may be encoded into the variable length code 90 (code 90a, 90b: parameter 103x (FIG. 1)) associated with it, thereby performing variable length coding.
- as a result, the data amount of the encoded parameter 103x (code 90) becomes smaller, the amount of encoded data available to the transform encoder is indirectly increased, and the encoded sound quality can be improved.
- the following processing may be performed in the decoding device 2 (FIG. 2 and the like).
- the signal 204i, obtained by encoding the sound signal 203ib, may be decoded into the signal 203ib (signal 104x) (transform decoder 204, step S204).
- the transform decoder may use an orthogonal transform coding method such as MPEG (Moving Picture Experts Group)-AAC (Advanced Audio Coding), or ACELP (Algebraic Code Excited Linear Prediction).
- MPEG: Moving Picture Experts Group
- AAC: Advanced Audio Coding
- ACELP: Algebraic Code Excited Linear Prediction
- the signal 204i to be decoded is the encoded signal (signal 105x) obtained by encoding the signal 203ib (signal 104x), which is generated from the sound signal 203x (signal 101i) before the shift by shifting its pitch (pitch 822) to the reference pitch (reference pitch 82r).
- the signal 204i to be decoded may be, for example, the signal 105x after being encoded by the encoding device 1 described above.
- the signal 204i to be decoded may be a signal included in the data (stream 106x in FIG. 1, stream 205i in FIG. 2) communicated from the encoding device 1 that performed the encoding to the decoding device 2, that is, a signal communicated from the encoding device 1 to the decoding device 2.
- from the signal 203ib decoded from the signal 204i, a signal 203x may be generated in which the reference pitch (reference pitch 82r) of the decoded signal 203ib is shifted back (reverse-shifted) to the pitch (pitch 822) before the shift (time expansion/contraction unit 203, step S203).
- the encoded time expansion/contraction parameter 201i is losslessly decoded to obtain the dynamic time expansion/contraction parameter 202i.
- the acquired dynamic time expansion/contraction parameter 202i is represented by TW_Ratio_Index.
- the time expansion/contraction parameter TW_Ratio is obtained from the acquired dynamic time expansion/contraction parameter 202i and the table 103t, which represents the relationship between TW_Ratio_Index and TW_Ratio.
- the signal 203ib is converted by the time expansion/contraction circuit (time expansion/contraction unit) 203 into a signal 203x without expansion/contraction, corresponding to the pitch before the shift (reverse shift).
- the parameter 201i (parameter 103x in FIG. 1), obtained by encoding the ratio 88 (parameter 202i, parameter 102x), is decoded into the ratio 88 (parameter 202i, parameter 102x),
- and a shift to the pitch (pitch 822) specified by the decoded ratio 88 (parameter 202i) may be performed (lossless decoding unit 201, S201).
- the data amount of the pitch data is also made small in the encoded data (parameter 201i, parameter 103x), and the data amount of the pitch data can be reduced.
- the inventor noticed that a ratio 88 appears frequently when it is a ratio 88a close to the 0-cent ratio 88x, and less frequently when it is a ratio 88b far from the 0-cent ratio 88x.
- a relatively short code 90a may be decoded to a ratio 88a close to the 0 cent ratio 88x
- a relatively long code 90b may be decoded to a ratio 88b far from the 0 cent ratio 88x.
- decoding in accordance with the appearance frequency according to whether or not the ratio is close to the 0 cent ratio 88x may be performed.
- when the code 90 (FIG. 18) of the parameter 201i to be decoded is the code 90 (code 90a) of a ratio 88a close to the 0-cent ratio 88x,
- the code 90 is the short code 90a,
- and for the code 90 (code 90b) of a ratio 88b far from the ratio 88x, the long code 90b may be used.
- the short code 90a may be decoded into the ratio 88a close to the 0 cent ratio 88x
- the long code 90b may be decoded into the ratio 88b away from the 0 cent ratio 88x.
- a decoding table 201t (FIG. 18, FIG. 2, FIG. 20, etc.: table 85) corresponding to the above-described table 103t (table 85: FIG. 18) may be stored.
- the table 201t may be stored by the lossless decoding unit 201 (second pitch processing unit 201A: see FIG. 2, FIG. 20, etc.).
- the variable length code 90 (encoded parameter 201i) may be decoded into the ratio 88 (parameter 202i) associated with it in the stored table 201t, so that appropriate decoding is performed.
- a technique is known in which pitch data (ratio 88 (FIG. 18); see the parameters in FIG. 1 and parameter 202 (FIG. 2 etc.)) is fixed-length encoded into a code of fixed length (see the fixed-length code 91 (reference numerals 91a and 91b) having a length of 3 bits in FIG. 19).
- the data 91L (first row and second column in FIG. 22) communicated for each frame 84F consists of, for example, 16 fixed-length codes corresponding to the 16 sections 84 of the frame 84F.
- the data 90L (second row and third row in FIG. 22) communicated for each frame 84F includes fifteen codes 90c of length 1, indicated in the figure by fifteen "1" characters.
- the data 90L in the present embodiment also includes, for example, a single code 90d (code 90ds of data 90Ls, code 90dt of data 90Lt) of length 6 (length 4 in the data 90Ls), indicated in the figure by one "6" ("4" in the data 90Ls).
- that is, the data 90L in the present embodiment includes a large number (for example, 15 in the example of the data 90L in FIG. 22) of codes 90c (code 90a in FIG. 18, that is, the code 90a "0" in the table of FIG. 18) that appear at a high frequency (for example, 15/16 in the example of FIG. 22) and have a short length (for example, the length 1 of the codes 90c in FIG. 22).
- the data 90L includes only a small number (for example, one, as illustrated in FIG. 22) of codes 90d that have a long length (see, for example, the length 6 in FIG. 22 (length 4 in the data 90Ls) and the length 6 of the code 90b "111110" in FIG. 18).
- the data amount of the data 90L in the processing such as communication of each frame 84F is reduced from the data amount in the data 91L (first row in FIG. 22) in the previous example.
- these reduction widths are merely examples that are theoretically assumed by calculation.
- the principle of reduction described above may be used to obtain a reduction width that is the same as or close to these reduction widths (27 bits, 29 bits), or it may be used to obtain other reduction widths, such as a relatively small reduction width.
- the reduction amount of the data amount to be reduced can be a relatively large reduction amount (for example, 27 bits, 29 bits, etc. described above).
- FIG. 12 shows a pitch interval 90j of exactly 100 cents, which constitutes a semitone (one cent is 1/1200 of one octave).
- a pitch interval that is 1/100 of the semitone interval 90j is 1 cent.
- each row indicates how many cents separate two pitches that differ by the ratio 88 of that row, i.e. the number of cents of the ratio 88 in that row.
- for example, it is shown that the number of cents of the ratio 88 of 1.0288 is 50 cents.
- the range 861 (FIG. 18: a part of the range 86a) is the range of ratios 88 (1.0293, 1.0416) that lie more than 42 cents above the 0-cent ratio 88x (the eighth row in FIG. 18) (the ratio is larger than the ratio 88x and the absolute value of the difference from the ratio 88x is 42 cents or more).
- the range 862 (a part of the range 86a) is the range of ratios 88 (0.9772, 0.9715, 0.9604) that lie below -42 cents (the ratio is smaller than the ratio 88x and the absolute value of the difference from the ratio 88x is 42 cents or more).
- that is, the range 86a is the range of ratios 88 whose absolute difference from the 0-cent ratio 88x (line 8) is 42 cents or more, i.e. ratios that are 42 cents or more away from the ratio 88x.
- the range 87 is the range of ratios 88 that are less than 42 cents away.
- the ratio 88a (ratio 83a in FIG. 15) is, for example, a ratio 88 belonging to the range 87 of less than 42 cents described above, and the ratio 88b (ratio 83b in FIG. 15) is a ratio 88 belonging to the range 86a of 42 cents or more.
- the ratio 83a is a ratio 83 in the range 87 of less than 42 cents.
- the ratio 83a represents a relatively small difference,
- and the ratio 83b represents a relatively large difference.
- the ratio 88a is, for example, a ratio 88a (in FIG. 18, the ratio 88x itself) that is relatively close to the 0 cent ratio 88x (Tw_ratio “1”).
- the other ratio 88b is a ratio 88b that is relatively far from the ratio 88x.
- the length (length 1) of the code 90a (code “0”) corresponding to the ratio 88a is shorter than the length of the code 90b (“111100”) corresponding to the ratio 88b.
- the code 90a (parameter 103x in FIG. 1) corresponding to the calculated ratio 88a may be generated (see reference numeral 103 in FIG. 1), the generated code 90a may be decoded into the ratio 88a (parameter 202i in FIG. 2) (decoding device 2), and the processing described above may be performed.
- the code 90b corresponding to the ratio 88b is generated, and the generated code 90b is decoded into the ratio 88b.
- the above-described processing may be performed to reduce the data amount of the sound data (see the signal 105x (FIG. 1) and the signal 204i (FIG. 2)).
- even when a ratio 88b in the range 86a is calculated, that is, even when the ratio 83 between the two pitches (pitches 822, 821) is 42 cents or more, the above-described processing is performed and the data amount is reduced, so the data amount of the sound data can be reduced more reliably.
- the data amount of the sound data is reduced not only when the ratio 83 (FIG. 15) is a ratio 83a of less than 42 cents and the change between the two pitches (see pitches 822 and 821 in FIG. 15) is small, but also when the ratio is a ratio 83b of 42 cents or more and the change is large. That is, regardless of whether the change in pitch is large or small, the data amount of the sound data is reduced, and it can be reduced reliably.
- in the preceding example, the data amount is reduced only when the ratio 89 (FIG. 19) between two pitches (see pitches 822 and 821) belongs to the range 87 of less than 42 cents, so the data amount of the sound data cannot be reduced reliably.
- the range in which appropriate processing is performed is expanded from the relatively narrow range of the preceding example (a range consisting only of the range 87) to a range wider than the range 87:
- the range is extended to the range 86, which further includes the range 86a, so that appropriate processing can be performed over a wider range.
- the above-mentioned range 87 is an example of such an expanded range.
- the range (range 87) in which appropriate processing is performed in the preceding example includes at least the ratios (see ratio 88 etc.) of less than 42 cents.
- the ratio 83p (FIG. 9) between two pitches (see pitches 822 and 821 in FIG. 15) at the position 704p (FIG. 9) is a ratio of 0 cents 90x (FIG. 18) (near).
- the position 704p (the position at which the pitch changes) as described above and the ratio 83q (FIG. 9) at the position 704q (FIG. 9) are the position 704q (previously described) that is the ratio 90x (near) of 0 cent.
- the constructed encoding device may, for example, record in the encoded frame the locations where pitch variation is present (704p in FIG. 9) and the locations where it is not (704q in FIG. 9) (see vector C, 102m in FIG. 9), and may transmit the location information (vector C, 102m) together with the TW_Ratio or TW_Ratio_Index information at the pitch variation points (704p) to the decoding device. In this way, only the TW_Ratio (or TW_Ratio_Index) of the pitch variation points needs to be transmitted, so the encoding/decoding devices can be configured with the minimum necessary amount of communication data (encoding amount).
- it was noticed that, in many cases, most positions 704x are positions 704q where the pitch does not change, and only a few are positions 704p where the pitch changes.
- the parameter 102x (parameter 202i in FIGS. 1 and 2) may include, for example, the data 102m (FIG. 9 and the like) specifying the changing positions 704p and the ratio 83p at each changing position 704p specified by the data 102m.
- the parameter 102x may specify the ratio (ratio 83p) at a position 704p specified by the included data 102m as the ratio 83p included in the parameter 102x.
- the parameter 102x may specify the ratio (ratio 83q) at a position other than the positions 704p specified by the included data 102m (a position 704q where the pitch does not change) as the ratio 83q of a position where the pitch does not change, for example the 0-cent ratio 90x (see FIG. 18).
- the parameter 102x includes only the data of the ratio 83p at the changing positions 704p and does not include data for the non-changing positions 704q.
- because data for the many positions that do not change (positions 704q) is not included, the data amount of the pitch data (parameters 102x and 103x in FIG. 1, 204i and 203 b in FIG. 2) can be made sufficiently small.
- in this way, a format (table 85 in FIG. 18) of the codes (variable length code 90, data 90L (FIGS. 20 and 22)) that encode the pitch (ratio 88 between pitch 822 and pitch 821) of the signal 204i (stream 205i) input to the decoding device 2 is disclosed.
- a ratio 88a code (variable length code 90, code 90a) that is relatively close to a 0 cent ratio 88x is a shorter length (length 1) code 90a ("0").
- the code (variable length code 90, code 90b) having a ratio 88b far from the 0 cent ratio 88x is a code 90b ("111100") having a longer length (length 6).
- the data amount of the pitch data (parameters 103x, 203x) is reduced, for example, from the 48 bits in the first row and third column in FIG. 22 to the 21 bits in the second row and third column (19 bits in the third row and third column), so the data amount of the pitch data can be reduced further.
- a plurality of configurations (such as the lossless encoding unit 103) are combined, and the combination produces a synergistic effect.
- in the conventional example, some or all of the plurality of configurations are lacking, and the synergistic effect of the present technology does not occur.
- the present technology is therefore considered to be an advance over the conventional example.
- a part (or all) of the encoding device 1 may be an integrated circuit in which one or more functions of the encoding device 1 are mounted (for example, refer to the integrated circuit 1C in FIG. 20).
- a computer program for causing a computer that is a part (or all) of the encoding device 1 to execute one or more functions of the encoding device 1 may be constructed.
- similarly, an integrated circuit (see integrated circuit 2C), a computer program (see program 2P), and the like on which the functions of the decoding device 2 are mounted may be constructed.
- a storage medium storing this computer program may be constructed, or a data structure of data of this computer program may be constructed.
- steps S101, S104, etc. may be executed in any order within a range in which appropriate operation is possible.
- step S101 may be executed earlier or later than step S104, or the two may be executed in parallel at the same time.
- various ranges can be considered as the range handled by the processing.
- the range (ranges 86, 87) of the pitch change ratio (the ratio 88 in FIG. 18 and the ratio 89 in FIG. 19) is not limited to the various ranges described above.
- the bit stream (bit streams 106x and 205i) received by the decoding device (decoding device 2) may include position information (for example, data 102m in FIG. 9) that specifies the pitch change positions (position 704p) among the plurality of positions (sections 841 to 84M) in one frame (frame 84F: FIG. 16). Only the signal at a pitch change position (position 704p) is time-warped (subjected to time expansion/contraction processing) by the audio signal reconstructor (time expansion/contraction block (time expansion/contraction unit) 203), and signals at other positions are not time-warped.
- a decoding device that receives a bit stream including such position information may be constructed.
- the pitch parameter generator may generate, based on the detected pitch contour information (information 101x), a pitch parameter (parameter 102x) that includes the pitch change position (see position 704p (FIG. 9), data 102m) and the pitch change ratio (see ratio 83p), for example two pitch parameters: a first pitch parameter 102x specifying the pitch change position and a second pitch parameter 102x specifying the pitch change ratio.
- an encoding device that generates such a pitch parameter 102x may be constructed.
- the number of pitch change positions is small (small), and the number of other positions is large.
- an encoding device (encoding device 1e: FIG. 3) further provided with a pitch contour reconstructor (dynamic time expansion/contraction reconstruction block 307: FIG. 3) and the like may be constructed.
- in this device, a first decoder (lossless decoding block 306) generates, from the encoded pitch parameter (parameter 303x: FIG. 3 (parameter 103x)) output from the first encoder (lossless encoding unit 303: FIG. 3 (lossless encoding unit 103: FIG. 1)), a decoded pitch parameter (parameter 306x) including a pitch change position (see position 704p).
- the pitch contour reconstructor (dynamic time expansion/contraction reconstruction block 307) restores pitch contour information (information 307x (see information 301x)) according to the decoded pitch parameter (parameter 306x), and the pitch shifter (time expansion/contraction block 304) shifts the pitch frequency (pitch 822: FIG. 15) of the input audio signal (signal 301i) according to the restored (reconstructed) pitch contour information (information 307x).
- an encoding device (encoding device 1e, pitch contour analysis unit 301 to multiplexer circuit 308) configured in this way may be constructed.
- because the restored information 307x, which is the same as the information restored and used in the decoding device 2, is used for the shift, more appropriate (accurate) information is available.
- an MS mode selector (MS operation block (MS operation unit) 401) checks whether to apply the middle-side stereo mode (MS stereo mode) to each audio frame of the input stereo audio signal (signal 401i: FIG. 4) and generates a flag (flag 401x) indicating application of the MS stereo mode.
- a downmixer (downmix block 402) downmixes the input stereo audio signal (signal 401i) according to the generated flag (flag 401x).
- the pitch detector detects, according to the generated flag (flag 401x), pitch contour information (information 403x) of the downmix signal (signal 402a) obtained by downmixing the input stereo audio signal (signal 401i), or of the input stereo audio signal (signal 402b).
- the pitch shifter (time expansion/contraction block 406) shifts the pitch frequency (see pitch 822 (FIG. 15)) of the input stereo audio signal or the downmix signal (signal 402x (signal 402a or 402b)) according to the pitch contour information (information 403x) and the flag (flag 401x).
- an encoding device (encoding device 1f, MS operation unit 401 to multiplexer circuit 408) configured in this way may be constructed.
- a flag may be generated and processing according to the generated flag may be performed.
- an MS mode selector selects an MS stereo mode according to an input stereo audio signal (signal 601i: FIG. 6) and generates a flag (flag 601x) indicating application of the MS stereo mode.
- a downmixer (downmix block 602) that downmixes the input stereo audio signal (signal 601i) according to the generated flag (flag 601x), a first decoder (lossless decoding block 608), and a pitch contour reconstructor are provided.
- the pitch detector (pitch contour analysis block 603) detects, according to the generated flag (flag 601x), pitch contour information (information 603x) of the downmix signal (signal 602a) obtained by downmixing the input stereo audio signal (signal 601i), or of the input stereo audio signal (signal 602b).
- the first decoder (lossless decoding block 608) generates, from the encoded pitch parameter (parameter 605x) output from the first encoder (lossless encoding block 605), a decoded pitch parameter (parameter 608x) including a decoded pitch change position (see position 704p (see FIG. 8)) and a decoded pitch change ratio (see ratio 83p).
- the pitch contour reconstructor (dynamic time expansion/contraction reconstruction block 609) reconstructs pitch contour information according to the generated decoded pitch parameter (parameter 608x) and the flag (flag 601x).
- the pitch shifter (time expansion/contraction block 606) shifts the pitch frequency of the input stereo audio signal or the downmix signal (signal 602x (signal 602a or 602b)) according to the reconstructed pitch contour information (information 609x).
- an encoding device (encoding device 1h, MS operation unit 601 to multiplexer circuit 408) configured in this way may be constructed.
- because the same information as that used in the decoding device 2 is used, more appropriate information is available and the operation can be performed easily.
- comparison means for determining whether to use the pitch shifter (time expansion/contraction block 708 in FIG. 7) may be provided, and the multiplexer (multiplexer block 711) may generate a bit stream (stream 711x) by combining the encoded data (signal 709x) with the encoded pitch parameter (parameter 710x) output from the comparison means.
- an encoding device (encoding device 1i, MS operation unit 701 to multiplexer circuit 711) configured in this way may be constructed.
- the comparison scheme 710 may select, from among the generated third signal 709x (third signal 105x (FIG. 1)) and other signals, the more appropriate signal (for example, a signal with a higher SNR (signal-to-noise ratio) and therefore less noise, or a signal with a smaller amount of data) as the signal to be used by the decoding device (decoding device 2 or the like).
- the other signal may be, for example, a signal other than the third signal 709x in which the same sound as that recorded in the third signal 709x is recorded.
- the SNR of the third signal 709x and the SNR of the other signal may each be calculated, and the above selection may be made based on the two calculated SNRs.
- the calculated SNR may be, for example, the value obtained when the difference between the signal in question (the third signal 709x or another signal) and the signal before the shift (see the signal 101i in FIG. 1 and the like) is taken as the noise of that signal.
- even when the third signal 709x is not appropriate, the other signal is used, so that an appropriate signal continues to be available.
- a pitch parameter generator (for example, the dynamic time expansion/contraction block 102 in FIG. 1) provided in the encoding device (encoding device 1) may be constructed that compares a first harmonic structure before the pitch shift with a second harmonic structure after the pitch shift, and thereby modifies the pitch contour (information 101x) and determines whether the pitch shift should be used.
- when the first pitch contour is not modified, it may be determined to use the pitch shift with the first pitch contour; when the first pitch contour is modified into a second pitch contour, it may be determined to use the pitch shift with the second pitch contour.
- the harmonic structure (data) may be, for example, data including a plurality of values, each indicating the amplitude of the corresponding harmonic among one or more harmonics of the signal.
- an evaluation value indicating the quality of the processed signal may be calculated from the harmonic structure of the signal before the pitch shift and the harmonic structure of the signal after processing.
- when the quality indicated by the evaluation value calculated for the pitch shift with the first pitch contour is higher than the quality indicated by the evaluation value calculated for the pitch shift with the second pitch contour, it may be determined that the first pitch contour is not modified; when it is lower, it may be determined that the first pitch contour is modified.
- even when the quality with the first pitch contour is not high,
- the pitch shift is then performed with the second pitch contour, so the signal quality after the pitch shift is kept high and high signal quality can be ensured.
- the first decoder may generate, from the separated encoded pitch parameter information (parameter 201i), a decoded pitch parameter including the pitch change position (position 704p (FIG. 9)) and the pitch change ratio (see the ratio 83p), for example two parameters 202i: a first parameter 202i specifying the pitch change position and a second parameter 202i specifying the pitch change ratio.
- a decoding device (decoding device 2c) that generates such parameters 202i may be constructed.
- the decoding device may decode the bit stream (stream 506i) including the encoded data (signal 505i: FIG. 5) of the pitch-shifted stereo audio signal (signal 503ibL, etc.: FIG. 5) and may include an MS mode detector (MS mode detection block 504).
- the second decoder (transform decoder block 505) decodes the separated encoded data (signal 505i) to generate the pitch-shifted stereo audio signal (signal 503ibL, etc.) and MS mode encoding information (information 504i).
- the MS mode detector (MS mode detection block 504) detects whether the MS mode is enabled according to the generated MS mode encoding information (information 504i),
- and generates an MS mode flag (flag 504F: FIG. 5) indicating whether the MS mode should be enabled.
- the pitch contour reconstructor (dynamic time expansion/contraction reconstruction unit 502) restores the pitch contour information according to the decoded pitch parameter output from the first decoder (lossless decoding block 501) and the generated MS mode flag.
- a block refers to a so-called functional block.
- the above-described effects occur, and the operation of the encoding device 1 and the like can be performed more appropriately.
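The bit counts quoted in the bullets above (48 bits for sixteen 3-bit fixed-length codes, 21 or 19 bits with variable-length codes) and the cent arithmetic can be checked with a small sketch. This is a minimal illustration, not the table of FIG. 18: only the code "0" for the 0-cent ratio and a 6-bit code for a distant ratio come from the text, and `CODE_TABLE`, the index convention, and all function names are assumptions made for the example.

```python
import math

def ratio_to_cents(ratio: float) -> float:
    """Convert a pitch ratio to cents (one octave = 1200 cents)."""
    return 1200.0 * math.log2(ratio)

# Hypothetical prefix-free table: index 0 stands for the 0-cent ratio and gets
# the shortest code; larger indices stand for ratios farther from 0 cents.
CODE_TABLE = ["0", "10", "110", "1110", "11110", "111110"]

def encode(indices):
    """Concatenate the variable-length code of every index in a frame."""
    return "".join(CODE_TABLE[i] for i in indices)

def decode(bits):
    """Recover the indices by matching complete codes from left to right."""
    out, cur = [], ""
    for b in bits:
        cur += b
        if cur in CODE_TABLE:
            out.append(CODE_TABLE.index(cur))
            cur = ""
    if cur:
        raise ValueError("trailing bits do not form a complete code")
    return out

FIXED_BITS = 3                                # fixed-length code of 3 bits per section
frame_long = [0] * 15 + [5]                   # fifteen unchanged sections, one 6-bit code
frame_short = [0] * 15 + [3]                  # fifteen unchanged sections, one 4-bit code

fixed_len = FIXED_BITS * len(frame_long)      # 16 sections * 3 bits
var_len_long = len(encode(frame_long))        # 15 * 1 bit + 6 bits
var_len_short = len(encode(frame_short))      # 15 * 1 bit + 4 bits
```

Under these assumptions the savings per frame are `fixed_len - var_len_long` and `fixed_len - var_len_short`, which correspond to the 27-bit and 29-bit reduction widths mentioned above.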
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Signal Processing (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Mathematical Physics (AREA)
- Compression, Expansion, Code Conversion, And Decoders (AREA)
Abstract
Description
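Note that the processing described above (encoding and so on) is, as explained in detail later, the same as the processing in a standard, such as an ISO (International Organization for Standardization) standard, that is expected to be defined in the future.
The first embodiment proposes an encoding device that uses a dynamic time expansion/contraction (time warping) scheme.
In the first embodiment, the pitch contour is processed by the dynamic time expansion/contraction scheme to generate dynamic time expansion/contraction parameters. The generated parameters represent the positions at which time expansion/contraction is applied and the time expansion/contraction value at each of those positions. The proposed dynamic time expansion/contraction scheme improves sound quality. Lossless coding is also introduced to further reduce the number of bits used to encode the time expansion/contraction values.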
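The idea that the parameters carry only the positions where time expansion/contraction is applied, plus the value at each such position, can be sketched as follows. This is a minimal illustration under stated assumptions: the function names `pack` and `unpack`, the flag-list layout, and the use of 1.0 as the neutral (0-cent) ratio are hypothetical and are not taken from the patent text.

```python
def pack(ratios, neutral=1.0):
    """Split per-section ratios into a position flag list and the changed values only.

    flags[i] is 1 where section i has a pitch change, 0 otherwise;
    values holds the ratios of the flagged sections, in order.
    """
    flags = [0 if r == neutral else 1 for r in ratios]
    values = [r for r in ratios if r != neutral]
    return flags, values

def unpack(flags, values, neutral=1.0):
    """Rebuild the per-section ratios; unflagged sections get the neutral ratio."""
    it = iter(values)
    return [next(it) if f else neutral for f in flags]

# Example frame of 16 sections with a single pitch change in section 5.
frame_ratios = [1.0] * 16
frame_ratios[5] = 1.0293
flags, values = pack(frame_ratios)
restored = unpack(flags, values)
```

With few changing sections per frame, `values` stays short, which is the reason only the changed positions need a transmitted ratio.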
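The second embodiment describes a dynamic time expansion/contraction method that uses a scheme for encoding the time expansion/contraction parameters more efficiently.
Dynamic time expansion/contraction makes it possible to reconstruct the harmonic structure through time expansion/contraction. Because the energy is confined to the reference pitch and its harmonic components, coding efficiency improves. The evaluation scheme reduces the dependence on the accuracy of pitch detection and improves the performance of the coding system. An efficient scheme for encoding the time expansion/contraction parameters reduces the bit rate, which improves sound quality and makes it possible to encode signals with larger pitch change rates.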
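The third embodiment proposes a decoding device that uses the dynamic time expansion/contraction scheme.
A specific example of dynamic time expansion/contraction reconstruction is described in the fourth embodiment.
The fifth embodiment proposes another encoding device that uses the dynamic time expansion/contraction scheme.
The sixth embodiment describes an encoding device that incorporates the middle-side stereo mode (MS mode).
In the sixth embodiment, dynamic time compression is modified so that it is better suited to stereo coding. In stereo coding, the left and right channels may have different characteristics. In that case, different time compression parameters are calculated for the different channels. The left and right channels may also have similar characteristics, in which case it is reasonable to use the same time compression parameters for both channels. When the left and right channels are similar, more efficient audio coding can be achieved by using the same set of time compression parameters.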
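The choice between one shared parameter set and separate per-channel sets can be sketched as below. The similarity criterion is an assumption made only for illustration: the text does not say how similarity is measured, so this sketch compares the two pitch contours section by section and treats them as similar when every pair is within a hypothetical tolerance `tol_cents`. The function names are also hypothetical.

```python
import math

def cents_between(f_a: float, f_b: float) -> float:
    """Absolute distance between two pitch frequencies, in cents."""
    return abs(1200.0 * math.log2(f_a / f_b))

def choose_parameter_sets(left_contour, right_contour, tol_cents=5.0):
    """Return one shared contour if the channels are similar, else one per channel.

    Assumed criterion: the channels count as similar when every pair of
    corresponding contour values differs by at most tol_cents.
    """
    similar = all(
        cents_between(l, r) <= tol_cents
        for l, r in zip(left_contour, right_contour)
    )
    if similar:
        # A single averaged contour drives the parameters of both channels.
        shared = [(l + r) / 2.0 for l, r in zip(left_contour, right_contour)]
        return [shared]
    return [list(left_contour), list(right_contour)]

same = choose_parameter_sets([200.0, 202.0], [200.0, 202.0])
different = choose_parameter_sets([200.0, 220.0], [300.0, 330.0])
```

Returning a single set in the similar case halves the number of parameters to encode, which is the efficiency gain the paragraph above describes.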
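The seventh embodiment describes a decoding device that supports the MS mode.
FIG. 6 is a block diagram of an encoder that uses modified dynamic time expansion/contraction and makes use of the MS mode.
The ninth embodiment introduces an encoding device provided with closed-loop dynamic time expansion/contraction means.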
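2 Decoding device
2S System
101 Pitch contour analysis unit
102 Dynamic time expansion/contraction unit
103 Lossless encoding unit
104 Time expansion/contraction unit
105 Transform encoder
106 Multiplexer
201 Lossless decoding unit
202 Dynamic time expansion/contraction reconstruction unit
203 Time expansion/contraction unit
204 Transform decoder
205 Demultiplexer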
Claims (19)
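- An encoding device comprising: a pitch detector that detects pitch contour information of an input audio signal;
a pitch parameter generator that generates, based on the detected pitch contour information, a pitch parameter including a pitch change ratio whose range includes a range in which the absolute value of the number of cents of the pitch change ratio is 42 or more;
a first encoder that encodes the generated pitch parameter;
a pitch shifter that shifts the pitch frequency of the input audio signal according to the pitch contour information;
a second encoder that encodes the shifted audio signal output from the pitch shifter; and
a multiplexer that generates a bit stream including the encoded pitch parameter and the data by combining the encoded pitch parameter output from the first encoder with the data, output from the second encoder, in which the audio signal output from the pitch shifter is encoded.
- The encoding device according to claim 1, wherein the pitch parameter generator generates, based on the detected pitch contour information, the pitch parameter including a pitch change position and the pitch change ratio.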
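- The encoding device according to claim 2, further comprising: a first decoder that generates, from the encoded pitch parameter output from the first encoder, a decoded pitch parameter including a decoded pitch change position and a decoded pitch change ratio; and
a pitch contour reconstructor that restores pitch contour information according to the generated decoded pitch parameter,
wherein the pitch shifter shifts the pitch frequency of the input audio signal according to reconstructed pitch contour information, which is the restored pitch contour information.
- The encoding device according to claim 2 or 3, further comprising: an MS mode selector that checks whether to apply a middle-side stereo mode (MS stereo mode) to each audio frame of an input stereo audio signal and generates a flag indicating application of the MS stereo mode; and
a downmixer that downmixes the input stereo audio signal according to the generated flag,
wherein the pitch detector detects, according to the generated flag, pitch contour information of a downmix signal obtained by downmixing the input stereo audio signal or of the input stereo audio signal, and
the pitch shifter shifts the pitch frequency of the input stereo audio signal or the downmix signal according to the pitch contour information and the flag.
- The encoding device according to claim 2, further comprising: an MS mode selector that selects an MS stereo mode according to an input stereo audio signal and generates a flag indicating application of the MS stereo mode;
a downmixer that downmixes the input stereo audio signal according to the generated flag;
a first decoder; and
a pitch contour reconstructor,
wherein the pitch detector detects, according to the generated flag, pitch contour information of a downmix signal obtained by downmixing the input stereo audio signal or of the input stereo audio signal,
the first decoder generates, from the encoded pitch parameter output from the first encoder, a decoded pitch parameter including a decoded pitch change position and a decoded pitch change ratio,
the pitch contour reconstructor restores reconstructed pitch contour information according to the generated decoded pitch parameter and the flag, and
the pitch shifter shifts the pitch frequency of the input stereo audio signal or the downmix signal according to the restored reconstructed pitch contour information.
- The encoding device according to claim 5, further comprising comparison means for determining whether to use the pitch shifter,
wherein the multiplexer generates the bit stream by combining encoded data with the encoded pitch parameter output from the comparison means.
- The pitch parameter generator provided in the encoding device according to any one of claims 1 to 6,
wherein the pitch parameter generator modifies the pitch contour and determines whether the pitch shift should be used by comparing a first harmonic structure before the pitch shift with a second harmonic structure after the pitch shift.
- The encoding device according to any one of claims 1 to 6, wherein the first encoder encodes the pitch parameter
into an encoded pitch parameter having a code of relatively short code length when the pitch parameter is a pitch parameter of a pitch change ratio whose number of cents has a relatively small absolute value, and
into an encoded pitch parameter having a code of relatively long code length when the pitch parameter is a pitch parameter of a pitch change ratio whose number of cents has a relatively large absolute value.
- A decoding device that decodes a bit stream including encoded data of a pitch-shifted audio signal and encoded pitch parameter information, the decoding device comprising:
a demultiplexer that separates, from the bit stream to be decoded, the encoded data and the encoded pitch parameter information included in the bit stream;
a first decoder that generates, from the separated encoded pitch parameter information, a decoded pitch parameter including a pitch change ratio whose range includes a range in which the absolute value of the number of cents of the pitch change ratio is 42 or more;
a pitch contour reconstructor that restores pitch contour information according to the generated decoded pitch parameter;
a second decoder that decodes the separated encoded data to generate the pitch-shifted audio signal; and
an audio signal reconstructor that converts the pitch-shifted audio signal into the original audio signal according to reconstructed pitch contour information, which is the restored pitch contour information.
- The decoding device according to claim 9, wherein the first decoder generates, from the separated encoded pitch parameter information, the decoded pitch parameter including a pitch change position and the pitch change ratio.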
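- The decoding device according to claim 10, wherein the decoding device decodes the bit stream including the encoded data of a pitch-shifted stereo audio signal
and comprises an MS mode detector,
the second decoder decodes the separated encoded data to generate the pitch-shifted stereo audio signal and MS mode encoding information,
the MS mode detector detects, according to the generated MS mode encoding information, whether the MS mode is enabled, and generates an MS mode flag indicating whether the MS mode should be enabled, and
the pitch contour reconstructor restores the pitch contour information according to the generated decoded pitch parameter output from the first decoder and the generated MS mode flag.
- The decoding device according to any one of claims 9 to 11, wherein the first decoder decodes the separated encoded pitch parameter information
into a pitch parameter of a pitch change ratio whose number of cents has a relatively small absolute value when the encoded pitch parameter information is a code of relatively short code length, and
into a pitch parameter of a pitch change ratio whose number of cents has a relatively large absolute value when the encoded pitch parameter information is a code of relatively long code length.
- A signal processing system comprising the encoding device according to claim 8 and the decoding device according to claim 12.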
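- An encoding method comprising:
a pitch detector step of detecting pitch contour information of an input audio signal;
a pitch parameter generator step of generating, based on the detected pitch contour information, a pitch parameter including a pitch change ratio whose range includes a range in which the absolute value of the pitch change ratio in cents is 42 or more;
a first encoder step of encoding the generated pitch parameter;
a pitch shifter step of shifting the pitch frequency of the input audio signal according to the pitch contour information;
a second encoder step of encoding the shifted audio signal output in the pitch shifter step; and
a multiplexer step of generating a bitstream that includes the encoded pitch parameter and the encoded data by combining the encoded pitch parameter output in the first encoder step with the data, output in the second encoder step, obtained by encoding the audio signal output in the pitch shifter step.
- A decoding method for decoding a bitstream including encoded data of a pitch-shifted audio signal and encoded pitch parameter information, the decoding method comprising:
a demultiplexer step of separating, from the bitstream to be decoded, the encoded data and the encoded pitch parameter information included in the bitstream;
a first decoder step of generating, from the separated encoded pitch parameter information, a decoded pitch parameter including a pitch change ratio whose range includes a range in which the absolute value of the pitch change ratio in cents is 42 or more;
a pitch contour reconstructor step of restoring pitch contour information according to the generated decoded pitch parameter;
a second decoder step of decoding the separated encoded data to generate the pitch-shifted audio signal; and
an audio signal reconstructor step of converting the pitch-shifted audio signal into the original audio signal according to reconstructed pitch contour information, which is the restored pitch contour information.
- An integrated circuit comprising:
a pitch detector that detects pitch contour information of an input audio signal;
a pitch parameter generator that generates, based on the detected pitch contour information, a pitch parameter including a pitch change ratio whose range includes a range in which the absolute value of the pitch change ratio in cents is 42 or more;
a first encoder that encodes the generated pitch parameter;
a pitch shifter that shifts the pitch frequency of the input audio signal according to the pitch contour information;
a second encoder that encodes the shifted audio signal output from the pitch shifter; and
a multiplexer that generates a bitstream that includes the encoded pitch parameter and the encoded data by combining the encoded pitch parameter output from the first encoder with the data, output from the second encoder, obtained by encoding the audio signal output from the pitch shifter.
- An integrated circuit that decodes a bitstream including encoded data of a pitch-shifted audio signal and encoded pitch parameter information, the integrated circuit comprising:
a demultiplexer that separates, from the bitstream to be decoded, the encoded data and the encoded pitch parameter information included in the bitstream;
a first decoder that generates, from the separated encoded pitch parameter information, a decoded pitch parameter including a pitch change ratio whose range includes a range in which the absolute value of the pitch change ratio in cents is 42 or more;
a pitch contour reconstructor that restores pitch contour information according to the generated decoded pitch parameter;
a second decoder that decodes the separated encoded data to generate the pitch-shifted audio signal; and
an audio signal reconstructor that converts the pitch-shifted audio signal into the original audio signal according to reconstructed pitch contour information, which is the restored pitch contour information.
- A computer program for causing a computer to execute:
a pitch detector step of detecting pitch contour information of an input audio signal;
a pitch parameter generator step of generating, based on the detected pitch contour information, a pitch parameter including a pitch change ratio whose range includes a range in which the absolute value of the pitch change ratio in cents is 42 or more;
a first encoder step of encoding the generated pitch parameter;
a pitch shifter step of shifting the pitch frequency of the input audio signal according to the pitch contour information;
a second encoder step of encoding the shifted audio signal output in the pitch shifter step; and
a multiplexer step of generating a bitstream that includes the encoded pitch parameter and the encoded data by combining the encoded pitch parameter output in the first encoder step with the data, output in the second encoder step, obtained by encoding the audio signal output in the pitch shifter step.
- A computer program for causing a computer to decode a bitstream including encoded data of a pitch-shifted audio signal and encoded pitch parameter information, the computer program causing the computer to execute:
a demultiplexer step of separating, from the bitstream to be decoded, the encoded data and the encoded pitch parameter information included in the bitstream;
a first decoder step of generating, from the separated encoded pitch parameter information, a decoded pitch parameter including a pitch change ratio whose range includes a range in which the absolute value of the pitch change ratio in cents is 42 or more;
a pitch contour reconstructor step of restoring pitch contour information according to the generated decoded pitch parameter;
a second decoder step of decoding the separated encoded data to generate the pitch-shifted audio signal; and
an audio signal reconstructor step of converting the pitch-shifted audio signal into the original audio signal according to reconstructed pitch contour information, which is the restored pitch contour information.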
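As an illustration of the quantities recited in the claims, the following Python sketch converts a pitch change ratio to cents and applies a toy variable-length code in which smaller absolute cent values receive shorter codewords. The quantization levels (`LEVELS_CENTS`), the 14-cent step, the unary codeword layout, and all function names are assumptions made for this example only. The claims state only that the range of the pitch change ratio reaches absolute values of 42 cents or more and that code length grows with the absolute cent value; they do not specify this codebook.

```python
import math

# Hypothetical quantization levels for the pitch change ratio, in cents,
# ordered by increasing absolute value. The 14-cent step is an illustrative
# assumption; only the reach to +/-42 cents is taken from the claims.
LEVELS_CENTS = [0, -14, 14, -28, 28, -42, 42]


def ratio_to_cents(ratio: float) -> float:
    """Convert a frequency ratio to cents (1200 cents per octave)."""
    return 1200.0 * math.log2(ratio)


def cents_to_ratio(cents: float) -> float:
    """Convert a cent value back to a frequency ratio."""
    return 2.0 ** (cents / 1200.0)


def codeword(level_index: int) -> str:
    """Unary prefix code: the level at rank k maps to k ones and a zero."""
    return "1" * level_index + "0"


def encode(cents_values):
    """Encode a sequence of quantized cent values into a bit string."""
    return "".join(codeword(LEVELS_CENTS.index(c)) for c in cents_values)


def decode(bits: str):
    """Decode a bit string produced by encode() back into cent values."""
    out, run = [], 0
    for b in bits:
        if b == "1":
            run += 1          # still inside the unary run
        else:
            out.append(LEVELS_CENTS[run])  # a zero terminates the codeword
            run = 0
    return out


if __name__ == "__main__":
    values = [0, 14, -42, 0]
    bits = encode(values)
    print(bits, decode(bits) == values)
    print(round(cents_to_ratio(42), 6))  # ratio corresponding to +42 cents
```

With this layout a frame with no pitch change costs one bit, while a 42-cent change costs the longest codeword, which mirrors the short-code/long-code relationship of claims 8 and 12 without claiming to reproduce the actual code table.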
Priority Applications (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2011537144A JP5530454B2 (ja) | 2009-10-21 | 2010-10-21 | オーディオ符号化装置、復号装置、方法、回路およびプログラム |
EP10824667.9A EP2492911B1 (en) | 2009-10-21 | 2010-10-21 | Audio encoding apparatus, decoding apparatus, method, circuit and program |
CN2010800036592A CN102257564B (zh) | 2009-10-21 | 2010-10-21 | 音频编码装置、解码装置、方法、电路及程序 |
US13/141,169 US8886548B2 (en) | 2009-10-21 | 2010-10-21 | Audio encoding device, decoding device, method, circuit, and program |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2009-242302 | 2009-10-21 | ||
JP2009242302 | 2009-10-21 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2011048815A1 (ja) | 2011-04-28 |
Family ID=43900059
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2010/006234 WO2011048815A1 (ja) | 2009-10-21 | 2010-10-21 | オーディオ符号化装置、復号装置、方法、回路およびプログラム |
Country Status (5)
Country | Link |
---|---|
US (1) | US8886548B2 (ja) |
EP (1) | EP2492911B1 (ja) |
JP (1) | JP5530454B2 (ja) |
CN (1) | CN102257564B (ja) |
WO (1) | WO2011048815A1 (ja) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10173036B2 (en) | 2012-02-07 | 2019-01-08 | Marie-Andrea I. Wilborn | Apparatus operable to protect and maintain positioning of an IV catheter |
Families Citing this family (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7720677B2 (en) * | 2005-11-03 | 2010-05-18 | Coding Technologies Ab | Time warped modified transform coding of audio signals |
MY154452A (en) * | 2008-07-11 | 2015-06-15 | Fraunhofer Ges Forschung | An apparatus and a method for decoding an encoded audio signal |
ES2379761T3 (es) * | 2008-07-11 | 2012-05-03 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Proporcinar una señal de activación de distorsión de tiempo y codificar una señal de audio con la misma |
US8855303B1 (en) * | 2012-12-05 | 2014-10-07 | The Boeing Company | Cryptography using a symmetric frequency-based encryption algorithm |
US9798974B2 (en) | 2013-09-19 | 2017-10-24 | Microsoft Technology Licensing, Llc | Recommending audio sample combinations |
US9257954B2 (en) * | 2013-09-19 | 2016-02-09 | Microsoft Technology Licensing, Llc | Automatic audio harmonization based on pitch distributions |
US9280313B2 (en) | 2013-09-19 | 2016-03-08 | Microsoft Technology Licensing, Llc | Automatically expanding sets of audio samples |
US9372925B2 (en) | 2013-09-19 | 2016-06-21 | Microsoft Technology Licensing, Llc | Combining audio samples by automatically adjusting sample characteristics |
CN106571145A (zh) * | 2015-10-08 | 2017-04-19 | 重庆邮电大学 | 一种语音模仿方法和装置 |
GB201621434D0 (en) | 2016-12-16 | 2017-02-01 | Palantir Technologies Inc | Processing sensor logs |
CN107181928A (zh) * | 2017-07-21 | 2017-09-19 | 苏睿 | 会议系统及数据传输方法 |
CN113112993B (zh) * | 2020-01-10 | 2024-04-02 | 阿里巴巴集团控股有限公司 | 一种音频信息处理方法、装置、电子设备以及存储介质 |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPS60263377A (ja) * | 1984-06-08 | 1985-12-26 | Ricoh Elemex Corp | 音響信号の時間軸変換装置 |
JPS60263375A (ja) * | 1984-06-08 | 1985-12-26 | Ricoh Elemex Corp | 音響信号の時間軸変換装置 |
JPH10111694A (ja) * | 1996-10-08 | 1998-04-28 | Sony Corp | 音声信号多重化装置および方法 |
JP2001188600A (ja) * | 1999-12-28 | 2001-07-10 | Matsushita Electric Ind Co Ltd | 音程変換装置 |
JP2002162996A (ja) * | 2000-11-24 | 2002-06-07 | Matsushita Electric Ind Co Ltd | オーディオ信号符号化方法、オーディオ信号符号化装置、音楽配信方法、および、音楽配信システム |
JP2002268694A (ja) * | 2001-03-13 | 2002-09-20 | Nippon Hoso Kyokai <Nhk> | ステレオ信号の符号化方法及び符号化装置 |
JP2003521721A (ja) * | 1998-11-24 | 2003-07-15 | マイクロソフト コーポレイション | ピッチ追跡方法および装置 |
WO2006046761A1 (ja) * | 2004-10-27 | 2006-05-04 | Yamaha Corporation | ピッチ変換装置 |
US20080004869A1 (en) | 2006-06-30 | 2008-01-03 | Juergen Herre | Audio Encoder, Audio Decoder and Audio Processor Having a Dynamically Variable Warping Characteristic |
Family Cites Families (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2001069474A1 (fr) | 2000-03-14 | 2001-09-20 | Kabushiki Kaisha Toshiba | Centre de systemes d'irm et systeme d'irm |
FR2850781B1 (fr) * | 2003-01-30 | 2005-05-06 | Jean Luc Crebouw | Procede pour le traitement numerique differencie de la voix et de la musique, le filtrage du bruit, la creation d'effets speciaux et dispositif pour la mise en oeuvre dudit procede |
SE0301272D0 (sv) * | 2003-04-30 | 2003-04-30 | Coding Technologies Sweden Ab | Adaptive voice enhancement for low bit rate audio coding |
US7840014B2 (en) * | 2005-04-05 | 2010-11-23 | Roland Corporation | Sound apparatus with howling prevention function |
US7974837B2 (en) | 2005-06-23 | 2011-07-05 | Panasonic Corporation | Audio encoding apparatus, audio decoding apparatus, and audio encoded information transmitting apparatus |
US9058812B2 (en) | 2005-07-27 | 2015-06-16 | Google Technology Holdings LLC | Method and system for coding an information signal using pitch delay contour adjustment |
US7720677B2 (en) * | 2005-11-03 | 2010-05-18 | Coding Technologies Ab | Time warped modified transform coding of audio signals |
US7734053B2 (en) * | 2005-12-06 | 2010-06-08 | Fujitsu Limited | Encoding apparatus, encoding method, and computer product |
WO2009038512A1 (en) * | 2007-09-19 | 2009-03-26 | Telefonaktiebolaget Lm Ericsson (Publ) | Joint enhancement of multi-channel audio |
CN101552005A (zh) | 2008-04-03 | 2009-10-07 | 华为技术有限公司 | 编码方法、解码方法、系统及装置 |
-
2010
- 2010-10-21 JP JP2011537144A patent/JP5530454B2/ja not_active Expired - Fee Related
- 2010-10-21 US US13/141,169 patent/US8886548B2/en not_active Expired - Fee Related
- 2010-10-21 EP EP10824667.9A patent/EP2492911B1/en not_active Not-in-force
- 2010-10-21 CN CN2010800036592A patent/CN102257564B/zh not_active Expired - Fee Related
- 2010-10-21 WO PCT/JP2010/006234 patent/WO2011048815A1/ja active Application Filing
Non-Patent Citations (4)
Title |
---|
BERND EDLER: "A Time-warped MDCT Approach To Speech Transform Coding", AES 126TH CONVENTION, May 2000 (2000-05-01) |
MILAN JELINEK: "Wideband Speech Coding Advances in VMR-WB Standard", IEEE TRANSACTIONS ON AUDIO, SPEECH AND LANGUAGE PROCESSING, vol. 15, no. 4, May 2007 (2007-05-01), XP011177208, DOI: doi:10.1109/TASL.2007.894514 |
See also references of EP2492911A4 |
XUEJING SUN: "Pitch Detection and Voice Quality Analysis Using Subharmonic-to-Harmonic Ratio", IEEE ICASSP, 2002, pages 333 - 336 |
Also Published As
Publication number | Publication date |
---|---|
JP5530454B2 (ja) | 2014-06-25 |
EP2492911A1 (en) | 2012-08-29 |
US20110268279A1 (en) | 2011-11-03 |
EP2492911B1 (en) | 2017-08-16 |
CN102257564B (zh) | 2013-07-10 |
JPWO2011048815A1 (ja) | 2013-03-07 |
CN102257564A (zh) | 2011-11-23 |
US8886548B2 (en) | 2014-11-11 |
EP2492911A4 (en) | 2015-04-15 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
JP5530454B2 (ja) | オーディオ符号化装置、復号装置、方法、回路およびプログラム | |
JP5208901B2 (ja) | 音声信号および音楽信号を符号化する方法 | |
KR101274827B1 (ko) | 다수 채널 오디오 신호를 디코딩하기 위한 장치 및 방법, 및 다수 채널 오디오 신호를 코딩하기 위한 방법 | |
TWI405187B (zh) | 可縮放語音及音訊編碼解碼器、包括可縮放語音及音訊編碼解碼器之處理器、及用於可縮放語音及音訊編碼解碼器之方法及機器可讀媒體 | |
JP6704037B2 (ja) | 音声符号化装置および方法 | |
US8340976B2 (en) | Method and apparatus for generating an enhancement layer within a multiple-channel audio coding system | |
KR101275892B1 (ko) | 오디오 신호를 인코딩하고 디코딩하기 위한 방법 및 장치 | |
KR101274802B1 (ko) | 오디오 신호를 인코딩하기 위한 장치 및 방법 | |
TW200841743A (en) | Encoder, decoder and methods for encoding and decoding data segments representing a time-domain data stream | |
KR20140005277A (ko) | 저-지연 통합 스피치 및 오디오 코딩에서 에러 은닉을 위한 장치 및 방법 | |
WO2016016724A2 (ko) | 패킷 손실 은닉방법 및 장치와 이를 적용한 복호화방법 및 장치 | |
KR101809298B1 (ko) | 부호화 장치, 복호 장치, 부호화 방법 및 복호 방법 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
WWE | Wipo information: entry into national phase | Ref document number: 201080003659.2; Country of ref document: CN |
121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 10824667; Country of ref document: EP; Kind code of ref document: A1 |
REEP | Request for entry into the european phase | Ref document number: 2010824667; Country of ref document: EP |
WWE | Wipo information: entry into national phase | Ref document number: 2010824667; Country of ref document: EP |
WWE | Wipo information: entry into national phase | Ref document number: 13141169; Country of ref document: US. Ref document number: 2011537144; Country of ref document: JP |