WO2024142357A1 - Sound signal processing device, sound signal processing method, and program - Google Patents

Sound signal processing device, sound signal processing method, and program

Info

Publication number
WO2024142357A1
Authority
WO
WIPO (PCT)
Prior art keywords
channel
signal
value
input sound
sound signal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/JP2022/048528
Other languages
English (en)
French (fr)
Japanese (ja)
Inventor
健弘 守谷
登 原田
優 鎌本
亮介 杉浦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
NTT Inc
Original Assignee
Nippon Telegraph and Telephone Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nippon Telegraph and Telephone Corp
Priority to PCT/JP2022/048528 (WO2024142357A1)
Priority to JP2024567128A (JPWO2024142357A1)
Publication of WO2024142357A1
Anticipated expiration
Ceased

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 - Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/008 - Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing

Definitions

  • the present invention relates to a technology for processing two-channel stereo sound signals so as to suppress deterioration in the auditory quality of the decoded sound signal obtained by stereo encoding and decoding.
  • Patent Document 1 and Patent Document 2 describe technologies for processing the L-channel signal and the R-channel signal, respectively, to obtain an L-channel processed signal and an R-channel processed signal, and subjecting the L-channel processed signal and the R-channel processed signal to subsequent encoding processing.
  • In Patent Document 1, the energy ratio and time difference between the L channel signal and the R channel signal are obtained as spatial information, and the signal of one of the channels is processed using the spatial information to obtain an L channel processed signal and an R channel processed signal that are more similar to each other than the original L channel signal and R channel signal are.
  • In Patent Document 2, for each channel, the energy ratio and time difference between the channel signal and a monaural signal that is the average of the left channel signal and the right channel signal are obtained as spatial information for that channel, and the channel signal is made closer to the monaural signal using the spatial information for that channel to obtain an L channel processed signal and an R channel processed signal.
  • the audio signal encoding system 300 is as shown in FIG.
  • the sound signal coding system 300 performs the processes of step S100 and step S200 shown in Fig. 2 for each frame. If the number of samples per frame is T, the first channel input sound signals x1 (1), x1 (2), ..., x1 (T) and the second channel input sound signals x2 (1), x2 (2), ..., x2 (T) are input to the sound signal coding system 300 on a frame-by-frame basis, and the sound signal coding system 300 obtains and outputs a stereo code CS from the first channel input sound signals x1 (1), x1 (2), ..., x1 (T) and the second channel input sound signals x2 (1), x2 (2), ..., x2 (T) on a frame-by-frame basis.
  • T is a positive integer, and for example, if the frame length is 20 ms and the sampling frequency is 48 kHz, T is 960.
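
The relationship between frame length, sampling frequency, and T stated above is simple arithmetic. As a minimal sketch (the helper name samples_per_frame is hypothetical and not part of the source):

```python
def samples_per_frame(frame_length_ms: float, sampling_rate_hz: int) -> int:
    """Return the number of samples T in one frame (hypothetical helper)."""
    t = frame_length_ms * sampling_rate_hz / 1000.0
    # T must be a positive integer, as stated in the text.
    if t <= 0 or t != int(t):
        raise ValueError("frame length and sampling rate must give a positive integer T")
    return int(t)


# The example from the text: 20 ms frames at 48 kHz.
print(samples_per_frame(20, 48_000))  # 960
```
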
  • the fact that the second type of value is in a broadly monotonically decreasing relationship with the first type of value means that in the entire range in which the first type of value can be, the second type of value is in a monotonically decreasing relationship with the first type of value, or that in a portion of the range in which the first type of value can be (the first type of range), the second type of value is constant regardless of the first type of value, and in a range other than the portion of the range in which the first type of value can be (the range other than the first type of range, the second type of range), the second type of value is in a monotonically decreasing relationship with the first type of value.
  • There are one or more ranges for each of the first type of range and the second type of range. That is, there may be a plurality of first type ranges, and there may be a plurality of second type ranges. Naturally, "broadly monotonically decreasing" may be read as "monotonically non-increasing".
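
Read as "monotonically non-increasing", the definition above can be checked on sampled values. The following is a minimal sketch, assuming the second type of value has been sampled at increasing first-type values; the function name is hypothetical.

```python
def is_broadly_monotonically_decreasing(values):
    """True if no value exceeds the one before it (constant stretches are allowed).

    `values` is assumed to hold the second type of value sampled at
    increasing first-type values.
    """
    return all(later <= earlier for earlier, later in zip(values, values[1:]))


# Constant over a first-type range, then decreasing over a second-type range.
print(is_broadly_monotonically_decreasing([0.5, 0.5, 0.25, 0.0]))
# An increase anywhere breaks the relationship.
print(is_broadly_monotonically_decreasing([0.5, 0.75]))
```
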
  • the signal mixing unit 120 obtains, for each channel, a signal to be encoded by mixing the input sound signal of that channel with the input sound signal of the other channel (step S120). Either the obtained signal is closer to the input sound signal of that channel the higher the stereo encoding bit rate is, over the entire range of possible stereo encoding bit rates; or, in a portion of that range (a first type of range), the closeness to the input sound signal of that channel is the same regardless of the stereo encoding bit rate, and in the remainder of the range (a second type of range), the obtained signal is closer to the input sound signal of that channel the higher the stereo encoding bit rate is.
  • the signal mixing unit 120 may obtain, for each channel, a signal that is a weighted addition of the input sound signal of that channel and the input sound signal of the other channel, where the weight of the input sound signal of that channel in the weighted addition is a value that has a broad-sense monotonically increasing relationship with the stereo encoding bit rate, and the weight of the input sound signal of the other channel in the weighted addition is a value that has a broad-sense monotonically decreasing relationship with the stereo encoding bit rate, as the signal to be encoded for that channel.
  • the value having a broad monotonically increasing relationship with the stereo encoding bit rate is, for example, a function value of a broad monotonically increasing function with the stereo encoding bit rate as an argument. Therefore, for example, a broad monotonically increasing function for each channel is stored in the signal mixing unit 120 in advance, and the signal mixing unit 120 obtains a function value for each channel of each frame by providing the stereo encoding bit rate of the frame as an argument to the broad monotonically increasing function for that channel, and sets the obtained function value as the weight of the input sound signal of that channel.
  • a set of each bit rate and each weight value corresponding to each bit rate that is predetermined so that the weight value has a broad monotonically increasing relationship with the bit rate is stored in the signal mixing unit 120 in advance, and the signal mixing unit 120 obtains a weight value corresponding to the stereo encoding bit rate of the frame from the stored weight values for each channel of each frame, and sets the obtained weight value as the weight of the input sound signal of that channel.
  • the value having a broad monotonically decreasing relationship with the stereo encoding bit rate is, for example, a function value of a broad monotonically decreasing function with the stereo encoding bit rate as an argument. Therefore, for example, a broad monotonically decreasing function for each channel is stored in the signal mixing unit 120 in advance, and the signal mixing unit 120 obtains a function value for each channel of each frame by providing the stereo encoding bit rate of the frame as an argument to the broad monotonically decreasing function for that channel, and sets the obtained function value as the weight of the input sound signal of the other channel.
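
The table-based variant of the weighted addition described above can be sketched as follows. This is an illustration only, not the patent's equations (2-1) and (2-2), which are not reproduced in this text. The table values, the kbit/s units, the helper names own_weight and mix_channels, the use of one table for both channels, and the choice of 1 - w as the other channel's weight are all assumptions.

```python
# Hypothetical table: (bit rate in kbit/s, weight of the channel's own signal).
# The weights never decrease as the bit rate rises (broad-sense monotonic increase).
WEIGHT_TABLE = [(24, 0.5), (32, 0.625), (48, 0.75), (64, 0.875), (96, 1.0)]


def own_weight(bitrate_kbps):
    """Weight stored for the highest table bit rate not exceeding bitrate_kbps."""
    weight = WEIGHT_TABLE[0][1]  # bit rates below the table use the first entry
    for rate, w in WEIGHT_TABLE:
        if bitrate_kbps >= rate:
            weight = w
    return weight


def mix_channels(x1, x2, bitrate_kbps):
    """Return the encoding target signals of both channels for one frame."""
    w = own_weight(bitrate_kbps)
    other = 1.0 - w  # assumption: the two weights sum to 1
    y1 = [w * a + other * b for a, b in zip(x1, x2)]
    y2 = [w * b + other * a for a, b in zip(x1, x2)]
    return y1, y2


print(mix_channels([1.0, 0.0], [0.0, 1.0], 48))
```

Under these assumptions the highest bit rate leaves both channels unchanged, and the lowest bit rate makes both encoding target signals equal to the average of the two inputs.
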
  • the signal to be encoded for each channel may be obtained using a weighting value that lies between the weighting value determined from the bit rate of the immediately preceding frame and the weighting value determined from the bit rate of the current frame.
  • the weighting value of the second channel determined from the bit rate of the previous frame is wp2
  • the weighting value of the second channel determined from the bit rate of the current frame is wc2
  • the second channel signal mixing unit 120-2 may use the value obtained by the following equation (2-5) as the weighting value w2(t) for each time from the first time (i.e., the 1st time) to the (T0 - 1)th time of the current frame, and use wc2 as the weighting value w2(t) for each time from the T0th time to the last time (i.e., the Tth time) of the current frame, to obtain the second channel encoding target signal x'2(t) represented by the following equation (2-6) instead of the above equation (2-2) for each time t of the current frame.
  • the first channel signal mixing unit 120-1 stores the weight value wc1 of the current frame and uses it as the weight value wp1 when processing the next frame.
  • the second channel signal mixing unit 120-2 stores the weight value wc2 of the current frame and uses it as the weight value wp2 when processing the next frame.
  • when the signal mixing unit 120 obtains the encoding target signal for each channel using the above equations (2-4) and (2-6), the continuity of the waveform of the encoding target signal at the frame boundary can be maintained even if the bit rate of the current frame differs from the bit rate of the immediately preceding frame.
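
The per-sample weighting across a frame boundary can be sketched as follows. The interpolation equations themselves are not reproduced in this text, so the linear ramp used here is an assumption, as are the function and parameter names. Only the overall shape comes from the description: a transition from the previous frame's weight over times 1 to T0 - 1, then the current frame's weight from time T0 to T.

```python
def crossfaded_weights(w_prev, w_cur, T, T0):
    """Per-sample weights w(1), ..., w(T) for the current frame.

    For t < T0 the weight moves from w_prev towards w_cur along a linear
    ramp (assumed shape); from t = T0 onwards it equals w_cur.
    """
    if not 1 <= T0 <= T:
        raise ValueError("T0 must satisfy 1 <= T0 <= T")
    weights = []
    for t in range(1, T + 1):
        if t < T0:
            weights.append(w_prev + (w_cur - w_prev) * t / T0)
        else:
            weights.append(w_cur)
    return weights


# Previous frame weight 0.5, current frame weight 1.0, 8 samples, ramp ends at T0 = 4.
print(crossfaded_weights(0.5, 1.0, T=8, T0=4))
```

Because the ramp starts near the stored previous weight and reaches the current weight by time T0, the weight does not jump at the frame boundary, which is the continuity property the text describes.
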
  • since weight value w1 and weight value wc1 are both values that have a broad-sense monotonically increasing relationship with the stereo encoding bit rate, the weight value w1(t) obtained by the above formula (2-3) is also a value that has a broad-sense monotonically increasing relationship with the stereo encoding bit rate.
  • since weight value w2 and weight value wc2 are both values that have a broad-sense monotonically increasing relationship with the stereo encoding bit rate, the weight value w2(t) obtained by the above formula (2-5) is also a value that has a broad-sense monotonically increasing relationship with the stereo encoding bit rate.
  • the index value calculation unit 110 calculates an index value ⁇ that has a broad-sense monotonically increasing relationship with the stereo encoding bit rate of the stereo encoding device 200, or an index value ⁇ ' that has a broad-sense monotonically decreasing relationship with the stereo encoding bit rate of the stereo encoding device 200 (step S110).
  • the index value ⁇ or the index value ⁇ ' obtained by the index value calculation unit 110 is output to the signal mixer 120.
  • the index value calculation unit 110 may store in advance a set of information specifying the stereo encoding bit rates belonging to each of a plurality of partial ranges that divide the possible range of the stereo encoding bit rate, and each function value corresponding to each partial range, predetermined so that the function value has a broad-sense monotonically increasing relationship with the stereo encoding bit rate, and the index value calculation unit 110 may obtain, for each frame, the function value corresponding to the stereo encoding bit rate of the frame from among the stored function values, and use the obtained function value as the index value ⁇.
  • the index value calculation unit 110 may use the stereo encoding bit rate itself as the index value ⁇ .
  • the value that has a broadly monotonically decreasing relationship with the stereo encoding bit rate of stereo encoding device 200 is, for example, the function value of a broadly monotonically decreasing function that has the stereo encoding bit rate of stereo encoding device 200 as an argument. Therefore, for example, the broadly monotonically decreasing function can be stored in advance in index value calculation unit 110, and index value calculation unit 110 can obtain a function value for each frame by providing the broadly monotonically decreasing function with the stereo encoding bit rate of the frame as an argument, and obtain the obtained function value as index value ⁇ '.
  • a set of information specifying the stereo encoding bit rate belonging to each partial range and each function value corresponding to each partial range that is predefined so that the function value has a broad-sense monotonically decreasing relationship with the stereo encoding bit rate is stored in the index value calculation unit 110 in advance, and the index value calculation unit 110 acquires, for each frame, a function value that corresponds to the stereo encoding bit rate of that frame from among the stored function values, and obtains the acquired function value as the index value ⁇ '.
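
The lookup by partial range described above can be sketched as follows. The range boundaries, the stored values, the kbit/s units, and the function name are all hypothetical; the source only requires that the stored values be broadly monotonically increasing (or decreasing) with the bit rate.

```python
from bisect import bisect_right

# Hypothetical partial ranges in kbit/s: <32, 32 to <64, 64 to <96, >=96.
RANGE_BOUNDS = [32, 64, 96]
# One stored value per partial range (assumed numbers).
INCREASING_VALUES = [0.5, 0.625, 0.75, 1.0]   # broadly monotonically increasing
DECREASING_VALUES = [0.5, 0.375, 0.25, 0.0]   # broadly monotonically decreasing


def index_value(bitrate_kbps, values):
    """Return the stored value of the partial range that contains bitrate_kbps."""
    return values[bisect_right(RANGE_BOUNDS, bitrate_kbps)]


print(index_value(48, INCREASING_VALUES))
print(index_value(48, DECREASING_VALUES))
```
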
  • the signal mixing unit 120 receives as input a first channel input sound signal and a second channel input sound signal which are input sound signals of two channels constituting the two-channel stereo input sound signal input to the sound signal processing device 100, and the index value ⁇ or the index value ⁇ ' output from the index value calculation unit 110.
  • the signal mixing unit 120 to which the index value ⁇ is input obtains, for each of the first and second channels, a signal obtained by mixing the input sound signal of the channel with the input sound signal of the other channel, where the larger the index value ⁇ , the closer the signal is to the input sound signal of the channel, as a signal to be coded for the channel
  • the signal mixing unit 120 to which the index value ⁇ ' is input obtains, for each of the first and second channels, a signal obtained by mixing the input sound signal of the channel with the input sound signal of the other channel, where the smaller the index value ⁇ ', the closer the signal is to the input sound signal of the channel (step S120).
  • the two-channel encoding target signals (i.e., two-channel stereo encoding target signals) obtained by the signal mixing unit 120 are output to the stereo encoding device 200 as output signals of the sound signal processing device 100.
  • the signal mixing unit 120 may include a first channel signal mixing unit 120-1 and a second channel signal mixing unit 120-2, as shown in Figure 3.
  • the first channel signal mixing unit 120-1 to which the index value ⁇ is input may obtain, as the first channel encoding target signal, a signal obtained by mixing the first channel input sound signal and the second channel input sound signal, where the larger the index value ⁇ , the closer the signal is to the first channel input sound signal
  • the first channel signal mixing unit 120-1 to which the index value ⁇ ' is input may obtain, as the first channel encoding target signal, a signal obtained by mixing the first channel input sound signal and the second channel input sound signal, where the smaller the index value ⁇ ', the closer the signal is to the first channel input sound signal.
  • the second channel signal mixing unit 120-2 to which the index value ⁇ is input may obtain, as the second channel encoding target signal, a signal obtained by mixing the second channel input sound signal and the first channel input sound signal, where the larger the index value ⁇ , the closer the signal is to the second channel input sound signal; and the second channel signal mixing unit 120-2 to which the index value ⁇ ' is input may obtain, as the second channel encoding target signal, a signal obtained by mixing the second channel input sound signal and the first channel input sound signal, where the smaller the index value ⁇ ', the closer the signal is to the second channel input sound signal.
  • the signal mixing unit 120 to which the index value ⁇ is input may, when the index value ⁇ is greater than a predetermined value, obtain, for each channel, the input sound signal of that channel as is as the signal to be coded for that channel, and in other cases, that is, when the index value ⁇ is equal to or less than the predetermined value, may obtain, for each channel, as the signal to be coded for that channel, a signal in which the input sound signal of that channel is mixed with the input sound signal of the other channel and which is closer to the input sound signal of that channel the larger the index value ⁇ is (step S120).
  • the signal mixing unit 120 may operate by replacing the previously described "greater than the predetermined value" and "equal to or less than the predetermined value" with "equal to or greater than the predetermined value" and "less than the predetermined value", respectively.
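
The threshold behaviour can be sketched as follows: pass the input signals through unchanged when the index value exceeds the predetermined value, and mix otherwise. The parameter names alpha and threshold, the assumption that the index value lies between 0.5 and 1, and the mixing form alpha * own + (1 - alpha) * other are illustrative choices, not the patent's equations.

```python
def mix_or_pass_through(x1, x2, alpha, threshold):
    """Return the encoding target signals of both channels for one frame.

    alpha is an index value that grows with the bit rate (assumed to lie
    in [0.5, 1]); threshold is the predetermined value.
    """
    if alpha > threshold:
        # High bit rate: each channel is encoded as is.
        return list(x1), list(x2)
    # Otherwise mix; a larger alpha keeps each channel closer to its own input.
    y1 = [alpha * a + (1.0 - alpha) * b for a, b in zip(x1, x2)]
    y2 = [alpha * b + (1.0 - alpha) * a for a, b in zip(x1, x2)]
    return y1, y2


print(mix_or_pass_through([1.0, 0.0], [0.0, 1.0], alpha=0.75, threshold=0.875))
```

The variant with "equal to or greater than" and "less than" only changes the comparison to `alpha >= threshold`.
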
  • the signal mixer 120 obtains a first-channel encoding target signal x'1 (t) represented by the following equation (2-7) and a second-channel encoding target signal x'2 (t) represented by the following equation (2-8).
  • the signal mixer 120 may, for each frame, set the index value ⁇ calculated by the index value calculation unit 110 for the immediately preceding frame as ⁇p and the index value ⁇ calculated by the index value calculation unit 110 for the current frame as ⁇c, set the value obtained by the following equation (2-9) as the index value ⁇(t) for each time from the first time (i.e., the 1st time) to the (T0 - 1)th time of the current frame, and set ⁇c as the index value ⁇(t) for each time from the T0th time to the last time (i.e., the Tth time) of the current frame, and may obtain a first-channel encoding target signal x'1(t) represented by the following equation (2-10) instead of the above equation (2-7), or may obtain a second-channel encoding target signal x'2(t) represented by the following equation (2-11) instead of the above equation (2-8), for each time t of
  • Index value calculation unit 110 obtains index value ⁇ ' which is greater than or equal to 0 and less than or equal to 0.5 and which has a monotonically decreasing relationship in a broad sense with the stereo encoding bitrate of stereo encoding device 200. For example, index value calculation unit 110 obtains index value ⁇ ' which is 0 when the stereo encoding bitrate of stereo encoding device 200 is the maximum value that the bitrate can take, is 0.5 when the stereo encoding bitrate of stereo encoding device 200 is the minimum value that the bitrate can take, and is a larger value as the stereo encoding bitrate of stereo encoding device 200 is lower.
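
One mapping that satisfies the endpoint conditions stated above (0 at the maximum bit rate, 0.5 at the minimum, larger for lower bit rates) is a clamped linear ramp. The linear shape and the function name are assumptions; the source fixes only the endpoints and the broad-sense monotonic decrease.

```python
def decreasing_index(bitrate, min_rate, max_rate):
    """Index value in [0, 0.5]: 0.5 at min_rate, 0 at max_rate.

    The linear interpolation between the endpoints is an assumed shape.
    """
    if not min_rate < max_rate:
        raise ValueError("min_rate must be smaller than max_rate")
    clamped = min(max(bitrate, min_rate), max_rate)
    return 0.5 * (max_rate - clamped) / (max_rate - min_rate)


print(decreasing_index(60, min_rate=24, max_rate=96))
```
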
  • the signal mixer 120 may, for each frame, use the index value ⁇' calculated by the index value calculation unit 110 for the immediately preceding frame as ⁇'p and the index value ⁇' calculated by the index value calculation unit 110 for the current frame as ⁇'c, use a value obtained by the following equation (2-14) as the index value ⁇'(t) for each time from the first time (i.e., the 1st time) to the (T0 - 1)th time of the current frame, and use ⁇'c as the index value ⁇'(t) for each time from the T0th time to the last time (i.e., the Tth time) of the current frame.
  • the downmix signal generated by the downmix signal generating unit 1201 may be any signal that is a mixture of a first channel input sound signal and a second channel input sound signal.
  • the downmix signal generating unit 1201 may generate a signal that is an average of the first channel input sound signal and the second channel input sound signal, or a signal that is an average of the first channel input sound signal and the second channel input sound signal while taking into account the time difference between the first channel input sound signal and the second channel input sound signal, etc. as the downmix signal.
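
Both downmix options mentioned above can be sketched as follows. The plain average follows the text directly. The time-aligned variant is only one possible reading of "taking into account the time difference": the sign convention of shift, the zero padding at the frame edge, and the function names are assumptions.

```python
def downmix_average(x1, x2):
    """Downmix as the sample-wise average of the two channels."""
    return [(a + b) / 2 for a, b in zip(x1, x2)]


def downmix_time_aligned(x1, x2, shift):
    """Average x1(t) with x2(t + shift).

    A positive shift is assumed to mean that the second channel lags the
    first by `shift` samples; samples outside the frame count as 0.
    """
    out = []
    for t, a in enumerate(x1):
        u = t + shift
        b = x2[u] if 0 <= u < len(x2) else 0.0
        out.append((a + b) / 2)
    return out


print(downmix_average([1.0, 3.0], [3.0, 1.0]))
print(downmix_time_aligned([1.0, 2.0, 3.0], [0.0, 1.0, 2.0], shift=1))
```
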
  • the mixing unit 1211 receives as input a first channel input sound signal and a second channel input sound signal which are input sound signals of two channels constituting the two-channel stereo input sound signal input to the sound signal processing device 100, and a downmix signal output from the downmix signal generation unit 1201.
  • the mixing unit 1211 obtains, as an encoding target signal for that channel (step S1211), a signal obtained by mixing the downmix signal with the input sound signal of that channel, and the higher the stereo encoding bit rate of the stereo encoding device 200, the closer the signal is to the input sound signal of that channel, and the lower the stereo encoding bit rate of the stereo encoding device 200, the closer the signal is to the downmix signal.
  • a signal in which the input sound signal of the channel is mixed with the downmix signal, and the higher the stereo encoding bit rate is, the closer the signal is to the input sound signal of the channel (i.e., the lower the stereo encoding bit rate is, the closer the signal is to the downmix signal) is obtained as the encoding target signal of the channel (step S1211).
  • Each of the first type of range and the second type of range is one or more ranges. That is, there may be a plurality of first type ranges, and there may be a plurality of second type ranges.
  • the value having a broad monotonically increasing relationship with the stereo encoding bit rate is, for example, a function value of a broad monotonically increasing function with the stereo encoding bit rate as an argument. Therefore, for example, a broad monotonically increasing function for each channel may be stored in the mixer 1211 in advance, and the mixer 1211 may obtain a function value for each channel of each frame by providing the stereo encoding bit rate of the frame as an argument to the broad monotonically increasing function for that channel, and use the obtained function value as the weight of the input sound signal of that channel.
  • when the weighting value w1 is 1, the first-channel encoding target signal x'1(t) expressed by the above equation (2-17) is the same as the first-channel input sound signal x1(t), and when the weighting value w2 is 1, the second-channel encoding target signal x'2(t) expressed by the above equation (2-18) is the same as the second-channel input sound signal x2(t).
  • when the weighting value w1 is 0, the first-channel encoding target signal x'1(t) expressed by the above equation (2-17) is the same as the downmix signal xM(t)
  • when the weighting value w2 is 0, the second-channel encoding target signal x'2(t) expressed by the above equation (2-18) is the same as the downmix signal xM(t).
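
The two limiting cases above (weight 1 gives the channel's input signal, weight 0 gives the downmix signal) are consistent with a weighted addition of the form w * x(t) + (1 - w) * xM(t). Equations (2-17) and (2-18) are not reproduced in this text, so that form is an assumption, as is the function name.

```python
def mix_with_downmix(x, x_m, w):
    """Encoding target signal of one channel: w * x(t) + (1 - w) * xM(t).

    w is the weight of the channel's own input signal and is assumed to
    lie in [0, 1] and to grow with the stereo encoding bit rate.
    """
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie in [0, 1]")
    return [w * a + (1.0 - w) * m for a, m in zip(x, x_m)]


x1 = [1.0, -1.0, 0.5]
x_m = [0.0, 0.0, 0.25]
print(mix_with_downmix(x1, x_m, 1.0))  # same as x1
print(mix_with_downmix(x1, x_m, 0.0))  # same as the downmix signal
```
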
  • the mixing unit 1211 obtains, for each channel, the input sound signal of that channel as is as the signal to be encoded for that channel, and in a second range, which is the range of possible stereo encoding bit rates other than the first range (i.e., in the second case, which is any case other than the first case; specifically, when the stereo encoding bit rate is equal to or less than the predetermined value described above), obtains, for each channel, a signal that is a weighted addition of the input sound signal of that channel and the downmix signal, wherein the weight of the input sound signal of that channel in the weighted addition is a value that has a broad-sense monotonically increasing relationship with the stereo encoding bit rate in the second range, and the weight of the downmix signal in the weighted
  • the mixer 1211 may perform an operation in which the above-mentioned "smaller than a predetermined value" and "equal to or greater than a predetermined value" are respectively read as "equal to or less than a predetermined value" and "greater than a predetermined value".
  • Each of the first type of range and the second type of range is one or more ranges. That is, there may be multiple first-type ranges, and there may be multiple second-type ranges.
  • the mixing unit 1211 may operate by replacing the previously mentioned “greater than a predetermined first value” and “less than or equal to a predetermined first value” with “greater than or equal to a predetermined first value” and “less than a predetermined first value”, respectively, and may operate by replacing the previously mentioned "greater than a predetermined second value” and “less than or equal to a predetermined second value” with “greater than or equal to a predetermined second value” and “less than a predetermined second value", respectively.
  • the weighting value of the first channel determined from the bit rate of the previous frame may be wp1
  • the weighting value of the first channel determined from the bit rate of the current frame may be wc1
  • the first channel mixing unit 1211-1 may use the value obtained by the following equation (2-19) as the weighting value w1(t) for each time from the first time (i.e., the 1st time) to the (T0 - 1)th time of the current frame, and use wc1 as the weighting value w1(t) for each time from the T0th time to the last time (i.e., the Tth time) of the current frame, to obtain a first channel encoding target signal x'1(t) represented by the following equation (2-20) instead of the above equation (2-17) for each time t of the current frame.
  • the second modification of the second embodiment may be implemented by including a process of calculating an index value according to a bit rate of stereo encoding by the stereo encoding device 200.
  • a form including a process of calculating an index value according to a bit rate of stereo encoding will be described as a third modification of the second embodiment.
  • the sound signal processing device 100 of the third modification of the second embodiment is as shown by the dashed and solid lines in FIG. 5, and includes an index value calculation unit 110 and a signal mixing unit 120, and the signal mixing unit 120 includes a downmix signal generation unit 1201 and a mixing unit 1211.
  • the sound signal processing device 100 performs a process of step S110 and a process of step S120 by steps S1201 and S1211.
  • the third modification of the second embodiment will be described mainly with respect to the differences from the second modification of the second embodiment.
  • the mixing unit 1211 receives as input a first channel input sound signal and a second channel input sound signal, which are two channel input sound signals constituting the two-channel stereo input sound signal input to the sound signal processing device 100, the downmix signal output from the downmix signal generation unit 1201, and the index value ⁇ or the index value ⁇ ' output from the index value calculation unit 110.
  • the mixer 1211 to which the index value ⁇ is input may obtain, for each channel, the input sound signal of that channel as is as the signal to be coded for that channel if the index value ⁇ is greater than a predetermined value, and otherwise may obtain, for each channel, as the signal to be coded for that channel, a signal in which the input sound signal of that channel is mixed with the downmix signal and which is closer to the input sound signal of that channel the larger the index value ⁇ is (i.e., closer to the downmix signal the smaller the index value ⁇ is) (step S1211).
  • the mixer 1211 may perform an operation in which the previously described "greater than the predetermined value" and "equal to or less than the predetermined value" are respectively interpreted as "equal to or greater than the predetermined value" and "less than the predetermined value".
  • the mixer 1211 to which the index value ⁇ is input may obtain, for each channel, the downmix signal as is as the encoding target signal for that channel when the index value ⁇ is smaller than a predetermined value, and otherwise may obtain, for each channel, as the encoding target signal for that channel, a signal in which the input sound signal of that channel is mixed with the downmix signal and which is closer to the input sound signal of that channel the larger the index value ⁇ is (i.e., closer to the downmix signal the smaller the index value ⁇ is) (step S1211).
  • the mixer 1211 may perform an operation in which the above-mentioned "smaller than the predetermined value" and "equal to or greater than the predetermined value" are interpreted as "equal to or less than the predetermined value" and "greater than the predetermined value", respectively.
  • the mixing unit 1211 may operate by replacing the previously mentioned “greater than a predetermined first value” and “less than or equal to a predetermined first value” with “greater than or equal to a predetermined first value” and “less than a predetermined first value”, respectively, and may operate by replacing the previously mentioned "greater than a predetermined second value” and “less than or equal to a predetermined second value” with “greater than or equal to a predetermined second value” and “less than a predetermined second value", respectively.
  • the first example is an example using the absolute value of the correlation coefficient. For each number of candidate samples ⁇ cand from a predetermined positive number ⁇ max to a predetermined negative number ⁇ min , the index value calculation unit 110 obtains an absolute value ⁇ cand of the correlation coefficient between a sample sequence of the first channel input sound signal and a sample sequence of the second channel input sound signal that is shifted backward from the sample sequence by each number of candidate samples ⁇ cand (step S110-A1). The index value calculation unit 110 then obtains the absolute value of ⁇ cand when the absolute value ⁇ cand of the correlation coefficient is maximum as the absolute value of the inter-channel time difference
  • of the inter-channel time difference of the previous frame is defined as wp2
  • of the inter-channel time difference of the current frame is defined as wc2.
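
The search in the first example (step S110-A1) can be sketched as follows: for each candidate shift, compute the absolute value of a correlation coefficient between the first channel and the shifted second channel, and keep the absolute value of the shift that maximizes it. The normalized cross-correlation without mean removal, the handling of frame edges by truncating to the overlapping samples, and all names are assumptions; the source does not specify these details here.

```python
import math


def correlation(a, b):
    """Normalized cross-correlation of two equal-length sequences (assumed definition)."""
    num = sum(p * q for p, q in zip(a, b))
    den = math.sqrt(sum(p * p for p in a) * sum(q * q for q in b))
    return num / den if den > 0 else 0.0


def abs_time_difference(x1, x2, shift_min, shift_max):
    """Absolute value of the candidate shift with the largest |correlation|.

    shift_min is the (negative) lower bound and shift_max the (positive)
    upper bound of the candidate numbers of samples.
    """
    best_shift, best_corr = 0, -1.0
    for shift in range(shift_min, shift_max + 1):
        # Compare x1(t) with x2(t + shift) over the overlapping samples.
        if shift >= 0:
            a, b = x1[:len(x1) - shift], x2[shift:]
        else:
            a, b = x1[-shift:], x2[:len(x2) + shift]
        corr = abs(correlation(a, b))
        if corr > best_corr:
            best_shift, best_corr = shift, corr
    return abs(best_shift)


# The second channel carries the same pulse two samples later than the first.
first = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
second = [0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
print(abs_time_difference(first, second, shift_min=-3, shift_max=3))  # 2
```
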
  • the signal mixing unit 120 receives a first channel input sound signal and a second channel input sound signal, which are input sound signals of two channels constituting the two-channel stereo input sound signal input to the sound signal processing device 100, and the index value ⁇ or the index value ⁇ ' output from the index value calculation unit 110.
  • the signal mixing unit 120 to which the index value ⁇ is input may, when the index value ⁇ is greater than a predetermined value, obtain, for each channel, the input sound signal of that channel as is as the signal to be coded for that channel, and in other cases, that is, when the index value ⁇ is equal to or less than the predetermined value, may obtain, for each channel, as the signal to be coded for that channel, a signal in which the input sound signal of that channel is mixed with the input sound signal of the other channel and which is closer to the input sound signal of that channel the larger the index value ⁇ is (step S120).
  • the signal mixing unit 120 may operate by replacing the previously described "greater than the predetermined value” and “equal to or less than the predetermined value” with “equal to or greater than the predetermined value” and “equal to or less than the predetermined value", respectively.
  • the signal mixer 120 obtains, for each time t, the first-channel encoding target signal x'1 (t) represented by the above equation (2-7) and the second-channel encoding target signal x'2 (t) represented by the above equation (2-8).
  • the signal mixer 120 may, for each frame, take the index value ⁇ calculated by the index value calculation unit 110 for the immediately preceding frame as ⁇ p and the index value ⁇ calculated by the index value calculation unit 110 for the current frame as ⁇ c , set the value obtained by the above equation (2-9) as the index value ⁇ (t) for each time from the first time (i.e., the 1st time) to the T 0 -1th time of the current frame, and set ⁇ c as the index value ⁇ (t) for each time from the T 0th time to the last time (i.e., the Tth time) of the current frame, and may obtain the first-channel encoding target signal x' 1 (t) represented by the above equation (2-10) instead of the above equation (2-7) for each time t of the current frame, or may obtain the second-channel encoding target signal x' 2 (t) represented by the above equation (2-11) instead of the above equation (2
  • the index value calculation unit 110 obtains an index value ⁇ ' that is greater than or equal to 0 and less than or equal to 0.5 and has a monotonically increasing relationship in a broad sense with the absolute value
  • the signal mixer 120 may, for each frame, use the index value ⁇ ' calculated by the index value calculation unit 110 for the immediately preceding frame as ⁇ 'p and the index value ⁇ ' calculated by the index value calculation unit 110 for the current frame as ⁇ 'c , use the value obtained by the above equation (2-14) as the index value ⁇ '(t) for each time from the first time (i.e., the 1st time) to the T 0 -1th time of the current frame, and use ⁇ 'c as the index value ⁇ '(t) for each time from the T 0th time to the last time (i.e., the Tth time) of the current frame.
  • the signal mixer 120 may obtain the first-channel encoding target signal x' 1 (t) represented by the above equation (2-15) instead of the above equation (2-12), or may obtain the second-channel encoding target signal x' 2 (t) represented by the above equation (2-16) instead of the above equation (2-13).
  • the third embodiment may be implemented by including a process of mixing two-channel stereo input sound signals to generate a downmix signal.
  • An embodiment including a process of generating a downmix signal will be described as Modification 2 of the third embodiment.
  • the sound signal processing device 100 of Modification 2 of the third embodiment is as shown by the dashed line, dashed line, and solid line in Fig. 5, and includes an index value calculation unit 110 and a signal mixing unit 120, and the signal mixing unit 120 includes a downmix signal generation unit 1201 and a mixing unit 1211.
  • the sound signal processing device 100 performs the process of step S110, and performs the process of step S120 through steps S1201 and S1211.
  • the modification 2 of the third embodiment will be described mainly with respect to the differences from the third embodiment.
  • index value calculation unit 110 receives a first channel input sound signal and a second channel input sound signal, which are input sound signals of two channels constituting the two-channel stereo input sound signal input to the sound signal processing device 100.
  • the index value calculation unit 110 calculates an absolute value
  • of the inter-channel time difference obtained by the index value calculation unit 110 is output to the signal mixing unit 120.
  • the input/output and operation of the downmix signal generation unit 1201 are the same as those of the second and third modifications of the second embodiment, and are as described in the second modification of the second embodiment.
  • the downmix signal generation unit 1201 receives a first channel input sound signal and a second channel input sound signal, which are input sound signals of two channels constituting a two-channel stereo input sound signal input to the sound signal processing device 100.
  • the downmix signal generation unit 1201 mixes the first channel input sound signal and the second channel input sound signal to generate a downmix signal (step S1201).
  • the downmix signal obtained by the downmix signal generation unit 1201 is output to the mixer 1211.
  • the mixer 1211 receives as input a first channel input sound signal and a second channel input sound signal which are input sound signals of two channels constituting the two-channel stereo input sound signal input to the sound signal processing device 100, a downmix signal output from the downmix signal generation unit 1201, and an absolute value
  • the mixer 1211 obtains, as a coding target signal for that channel (step S1211), a signal obtained by mixing the downmix signal with the input sound signal of that channel, where the smaller the absolute value
  • the mixer 1211 obtains, as the encoding target signal for that channel, a signal obtained by mixing the input sound signal and the downmix signal for that channel, where the smaller the absolute value
  • the encoding target signals for the two channels obtained by the mixer 1211 i.e., two-channel stereo encoding target signals
  • the mixing unit 1211 may include a first channel mixing unit 1211-1 and a second channel mixing unit 1211-2.
  • the first channel mixing unit 1211-1 may obtain, as a first channel encoding target signal, a signal obtained by mixing a first channel input sound signal and a downmix signal, in which the smaller the absolute value
  • the second channel mixing unit 1211-2 may obtain, as a second channel encoding target signal, a signal obtained by mixing a second channel input sound signal and a downmix signal, in which the smaller the absolute value
  • the first channel mixer 1211-1 may obtain the first-channel encoding target signal x' 1 (t) represented by the above formula (2-17) for each time t
  • the second channel mixer 1211-2 may obtain the second-channel encoding target signal x' 2 (t) represented by the above formula (2-18) for each time t, using weight values w 1 and w 2 that are between 0 and 1 and have a negative correlation with the absolute value
  • the weight values w 1 and w 2 may be the same value or different values.
  • the mixer 1211 may obtain, for each channel, a signal obtained by weighting and adding the input sound signal and downmix signal of that channel, where the weight of the input sound signal of that channel in the weighting and addition is a value that has a broad-sense monotonically decreasing relationship with the absolute value of the inter-channel time difference
  • of the inter-channel time difference is, for example, the function value of a broad-sense monotonically decreasing function with the absolute value
  • the mixer 1211 may store in advance a set of information for identifying the absolute value
  • when the weighting value w1 is 1, the first-channel encoding target signal x'1 (t) expressed by the above equation (2-17) is the same as the first-channel input sound signal x1 (t), and when the weighting value w2 is 1, the second-channel encoding target signal x'2 (t) expressed by the above equation (2-18) is the same as the second-channel input sound signal x2 (t).
  • the weighting value w1 and the weighting value w2 are 1 when the absolute value
  • when the weighting value w1 is 0, the first-channel encoding target signal x'1 (t) expressed by the above formula (2-17) is the same as the downmix signal xM (t), and when the weighting value w2 is 0, the second-channel encoding target signal x'2 (t) expressed by the above formula (2-18) is the same as the downmix signal xM (t).
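The weighted mixing of each channel with the downmix signal can be sketched as follows. Equations (2-17) and (2-18) are not reproduced in this excerpt, so the form x'i(t) = wi·xi(t) + (1 − wi)·xM(t) used here is an assumption chosen to match the limiting cases stated above (a weight of 1 gives the input sound signal, a weight of 0 gives the downmix signal). The function name `mix_with_downmix`, the plain average used as the downmix, and the linear weight curve are also illustrative assumptions.

```python
import numpy as np

def mix_with_downmix(x1, x2, abs_tau, abs_tau_max):
    """Mix each channel with the downmix; weights fall as |time difference| grows."""
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    # Assumed downmix: plain average of the two channels.
    x_m = 0.5 * (x1 + x2)
    # Assumed weight curve: 1 at |tau| = 0, falling linearly to 0 at abs_tau_max.
    w = 1.0 - min(abs_tau, abs_tau_max) / abs_tau_max
    w1 = w2 = w  # the two weights may also be chosen differently
    y1 = w1 * x1 + (1.0 - w1) * x_m
    y2 = w2 * x2 + (1.0 - w2) * x_m
    return y1, y2
```

With `abs_tau = 0` both outputs equal the inputs, and with `abs_tau = abs_tau_max` both outputs equal the downmix signal.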
  • the mixer 1211 may treat the downmix signal as it is for each channel as the encoding target signal for that channel when the absolute value
  • the mixer 1211 obtains, for each channel, the input sound signal of that channel as is as the signal to be coded for that channel, and in cases other than the above, i.e., when the absolute value
  • the mixing unit 1211 may perform an operation in which the above-mentioned "smaller than a predetermined value" and "equal to or greater than a predetermined value" are respectively interpreted as "equal to or less than a predetermined value" and "greater than a predetermined value".
  • the mixer 1211 obtains, for each channel, the input sound signal of that channel as is as the signal to be encoded for that channel, and in a second range in which the absolute value
  • the mixer 1211 may perform an operation in which the above-mentioned "smaller than a predetermined value" and "equal to or greater than a predetermined value" are respectively interpreted as "equal to or less than a predetermined value" and "greater than a predetermined value".
  • the mixer 1211 obtains, for each channel, the downmix signal as it is as the signal to be coded for that channel, and in any other case, i.e., when the absolute value
  • the mixer 1211 may perform an operation in which the above-mentioned "greater than a predetermined value" and "equal to or smaller than a predetermined value" are respectively read as "equal to or larger than a predetermined value" and "smaller than a predetermined value".
  • the mixer 1211 obtains, for each channel, the input sound signal of the channel as is as the signal to be coded for the channel, and when the absolute value
  • of the inter-channel time difference may be obtained as the signal to be coded for the channel, and in a range other than the part of the ranges (ranges other than the first type of range, second type of range) of the possible ranges of the absolute value
  • the mixer 1211 may operate by replacing the above-mentioned "smaller than a predetermined first value" and "equal to or greater than a predetermined first value" with "equal to or smaller than a predetermined first value" and "greater than a predetermined first value", respectively, and may operate by replacing the above-mentioned "smaller than a predetermined second value" and "equal to or greater than a predetermined second value" with "equal to or smaller than a predetermined second value" and "greater than a predetermined second value", respectively.
  • the first type of range and the second type of range each include one or more ranges. That is, there may be multiple first type ranges, and there may be multiple second type ranges.
  • the mixing unit 1211 may operate by replacing the previously mentioned "smaller than a predetermined first value" and "greater than or equal to a predetermined first value" with "smaller than or equal to a predetermined first value" and "greater than a predetermined first value", respectively, and may operate by replacing the previously mentioned "smaller than a predetermined second value" and "greater than or equal to a predetermined second value" with "smaller than or equal to a predetermined second value" and "greater than a predetermined second value", respectively.
  • of the previous frame is defined as w p1
  • of the current frame is defined as w c1
  • the first channel mixing unit 1211-1 may use the value obtained by the above equation (2-19) as the weighting value w 1 (t) for each time from the first time (i.e., the 1st time) to the T 0 -1th time of the current frame, and use w c1 as the weighting value w 1 (t) for each time from the T 0th time to the last time (i.e., the Tth time) of the current frame, thereby obtaining the first channel encoding target signal x' 1 (t) represented by the above equation (2-20) instead of the above equation ( 2-17 ) for each
  • the second channel mixing unit 1211-2 may use w p2 as the weighting value for the second channel determined from the absolute value
  • the second modification of the third embodiment may be implemented by including a process of calculating an index value according to the absolute value
  • of the inter-channel time difference will be described as a third modification of the third embodiment.
  • the sound signal processing device 100 of the third modification of the third embodiment is as shown by a dashed line, a dashed line, and a solid line in FIG. 5, and includes an index value calculation unit 110 and a signal mixing unit 120, and the signal mixing unit 120 includes a downmix signal generation unit 1201 and a mixing unit 1211.
  • the sound signal processing device 100 performs the process of step S110, and performs the process of step S120 through steps S1201 and S1211.
  • the third modification of the third embodiment will be described mainly with respect to the differences from the second modification of the third embodiment.
  • [Index value calculation unit 110] The input/output and operation of the index value calculation unit 110 are the same as those of the first modification of the third embodiment, and are as described in the first modification of the third embodiment.
  • the first channel input sound signal and the second channel input sound signal which are input sound signals of two channels constituting the two-channel stereo input sound signal input to the sound signal processing device 100, are input to the index value calculation unit 110.
  • the input/output and operation of the downmix signal generation unit 1201 are the same as those of Modifications 2 and 3 of the second embodiment and Modification 2 of the third embodiment, and the details are as described in Modification 2 of the second embodiment.
  • a first channel input sound signal and a second channel input sound signal which are input sound signals of two channels constituting a two-channel stereo input sound signal input to the sound signal processing device 100, are input to the downmix signal generation unit 1201.
  • the downmix signal generation unit 1201 mixes the first channel input sound signal and the second channel input sound signal to generate a downmix signal (step S1201).
  • the downmix signal obtained by the downmix signal generation unit 1201 is output to the mixer 1211.
  • the mixer 1211 receives, as inputs, a first channel input sound signal and a second channel input sound signal, which are input sound signals of two channels constituting the two-channel stereo input sound signal input to the sound signal processing device 100, the downmix signal output from the downmix signal generation unit 1201, and the index value ⁇ or the index value ⁇ ' output from the index value calculation unit 110.
  • the mixer 1211 to which the index value ⁇ is input obtains, for each of the first and second channels, a signal obtained by mixing the input sound signal of the channel with the downmix signal, and the larger the index value ⁇ , the closer the signal is to the input sound signal of the channel (i.e., the smaller the index value ⁇ , the closer the signal is to the downmix signal), as a signal to be coded for the channel
  • the mixer 1211 to which the index value ⁇ ' is input obtains, for each of the first and second channels, a signal obtained by mixing the input sound signal of the channel with the downmix signal, and the smaller the index value ⁇ ', the closer the signal is to the input sound signal of the channel (i.e., the larger the index value ⁇ ', the closer the signal is to the downmix signal), as a signal to be coded for the channel (step S1211).
  • the coding target signals of the two channels obtained by the mixer 1211 i.e., two-channel stereo coding target signals
  • the mixer 1211 may perform an operation in which the previously described "greater than the predetermined value" and "equal to or less than the predetermined value" are respectively interpreted as "equal to or greater than the predetermined value" and "less than the predetermined value".
  • the mixer 1211 to which the index value ⁇ is input may obtain, for each channel, the downmix signal as is as the encoding target signal for that channel when the index value ⁇ is smaller than a predetermined value, and may obtain, for each channel, a signal obtained by mixing the input sound signal and the downmix signal for that channel, and the larger the index value ⁇ , the closer the signal is to the input sound signal for that channel (i.e., the smaller the index value ⁇ , the closer the signal is to the downmix signal), as the encoding target signal for that channel (step S1211).
  • the mixing unit 1211 to which the index value ⁇ is input may obtain, for each channel, the input sound signal of that channel as is as the signal to be encoded for that channel if the index value ⁇ is greater than a predetermined first value, and may obtain, for each channel, the downmix signal as is as the signal to be encoded for that channel if the index value ⁇ is equal to or less than a predetermined second value which is smaller than the predetermined first value described above, and may obtain, for each channel, a signal obtained by mixing the input sound signal and the downmix signal for that channel, where the larger the index value ⁇ , the closer the signal is to the input sound signal for that channel (i.e., the smaller the index value ⁇ , the closer the signal is to the downmix signal), as the signal to be encoded for that channel (step S1211).
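The three-way selection described above (input sound signal as is, downmix signal as is, or a mixture in between) can be expressed as a single weight on the input sound signal. The sketch below is an illustration under stated assumptions: the function name `input_weight` is hypothetical, and the linear ramp between the two thresholds is only one possible monotonically increasing choice, not a form given in the specification.

```python
def input_weight(index, first_value, second_value):
    """Weight of a channel's own input signal; (1 - weight) goes to the downmix."""
    if not second_value < first_value:
        raise ValueError("second_value must be smaller than first_value")
    if index > first_value:
        return 1.0  # input sound signal used as is
    if index <= second_value:
        return 0.0  # downmix signal used as is
    # Assumed linear ramp: a larger index gives a signal closer to the input.
    return (index - second_value) / (first_value - second_value)
```

The signal to be encoded for a channel would then be `w * x + (1 - w) * x_m`, where `w` is the returned weight, `x` the input sound signal of that channel, and `x_m` the downmix signal.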
  • the mixer 1211 to which the index value ⁇ ' is input may obtain, for each channel, the input sound signal of that channel as is as the encoding target signal for that channel when the index value ⁇ ' is smaller than a predetermined value, and may obtain, for each channel, a signal obtained by mixing the input sound signal of that channel with the downmix signal, in which the smaller the index value ⁇ ' is, the closer the signal is to the input sound signal of that channel (i.e., the larger the index value ⁇ ' is, the closer the signal is to the downmix signal), as the encoding target signal for that channel (step S1211).
  • the mixer 1211 may perform an operation in which the above-mentioned "smaller than the predetermined value" and "equal to or greater than the predetermined value" are interpreted as "equal to or less than the predetermined value" and "greater than the predetermined value", respectively.
  • the mixing unit 1211 may operate by replacing the previously mentioned "smaller than a predetermined first value" and "greater than or equal to a predetermined first value" with "smaller than or equal to a predetermined first value" and "greater than a predetermined first value", respectively, and may operate by replacing the previously mentioned "smaller than a predetermined second value" and "greater than or equal to a predetermined second value" with "smaller than or equal to a predetermined second value" and "greater than a predetermined second value", respectively.
  • the index value calculation unit 110 obtains an index value ⁇ that is greater than or equal to 0 and less than or equal to 1 and has a monotonically decreasing relationship in a broad sense with the absolute value
  • the index value calculation unit 110 obtains an index value ⁇ that is 0 when the absolute value
  • the index value calculation unit 110 uses the absolute value of the inter-channel time difference
  • the index value calculation unit 110 may obtain the index value ⁇ expressed by the following equation (3-8).
  • the mixer 1211 obtains, for each time t, the first-channel encoding target signal x' 1 (t) expressed by the above equation (2-23), and obtains the second-channel encoding target signal x' 2 (t) expressed by the above equation (2-24).
  • the mixer 1211 may take the index value ⁇ calculated by the index value calculation unit 110 for the immediately preceding frame as ⁇ p and the index value ⁇ calculated by the index value calculation unit 110 for the current frame as ⁇ c , set the value obtained by the above equation (2-25) as the index value ⁇ (t) for each time from the first time (i.e., the 1st time) to the T 0 -1th time of the current frame, and set ⁇ c as the index value ⁇ (t) for each time from the T 0th time to the last time (i.e., the Tth time) of the current frame.
  • the mixer 1211 may obtain the first-channel encoding target signal x' 1 (t) represented by the above equation (2-26) instead of the above equation (2-23), or may obtain the second-channel encoding target signal x' 2 (t) represented by the above equation (2-27) instead of the above equation (2-24).
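The frame-boundary smoothing of the index value can be sketched as follows. Equation (2-25) is not reproduced in this excerpt, so the linear interpolation from the previous frame's value toward the current frame's value is an assumption; the function name `index_per_time` is hypothetical. Times are 1-based, as in the text.

```python
def index_per_time(index_prev, index_cur, T, T0):
    """Per-time index values for one frame of T samples (times t = 1..T)."""
    values = []
    for t in range(1, T + 1):
        if t < T0:
            # Assumed linear transition over times 1 .. T0-1.
            values.append(index_prev + (index_cur - index_prev) * t / T0)
        else:
            # From time T0 to time T the current frame's value is used as is.
            values.append(index_cur)
    return values
```

Smoothing of this kind avoids an abrupt change in the mixing ratio at the frame boundary when the index value differs between consecutive frames.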
  • the index value calculation unit 110 obtains an index value ⁇ ' that is greater than or equal to 0 and less than or equal to 1 and has a monotonically increasing relationship in a broad sense with the absolute value
  • the index value calculation unit 110 obtains an index value ⁇ ' that is 0 when the absolute value
  • the mixer 1211 may obtain, for each frame, the first-channel encoding target signal x' 1 ( t) represented by the above equation (2-31) instead of the above equation (2-28) or the second-channel encoding target signal x' 2 ( t ) represented by the above equation (2-32) instead of the above equation (2-29), using, for each frame, the index value ⁇ ' calculated by the index value calculation unit 110 for the immediately preceding frame as ⁇ ' p and the index value ⁇ ' calculated by the index value calculation unit 110 for the current frame as ⁇ ' c , and using the value obtained by the above equation (2-30) as the index value ⁇ '(t) for each time from the first time (i.e., the 1st time ) to the T 0 -1th time of the current frame, and using ⁇ ' c as the index value ⁇ ' (t) for each time from the T 0th time to the last time (i.e.
  • the process of obtaining the index value ⁇ from the index value of the single sound source-likeness of the two-channel stereo input sound signal can be performed, for example, as follows. The range of values that the index value of the single sound source-likeness can take is divided into a plurality of partial ranges. For each partial range, the index value calculation unit 110 stores in advance a set consisting of information for identifying the index values of the single sound source-likeness that belong to that partial range and the function value corresponding to that partial range, where the function values are predetermined so as to have a broad-sense monotonically increasing relationship with the index value of the single sound source-likeness. For each frame, the index value calculation unit 110 acquires, from among the stored function values, the function value corresponding to the index value of the single sound source-likeness of the two-channel stereo input sound signal of that frame, and sets the acquired function value as the index value ⁇ .
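The table lookup over partial ranges can be sketched as follows. The boundary values, the stored function values, and the names `BOUNDARIES`, `VALUES`, and `lookup_index` are hypothetical and chosen only for illustration; the only property taken from the text is that the stored values are non-decreasing across the partial ranges.

```python
from bisect import bisect_right

# Hypothetical boundaries dividing the likeness range [0, 1] into four partial
# ranges: [0, 0.25), [0.25, 0.5), [0.5, 0.75), [0.75, 1].
BOUNDARIES = [0.25, 0.5, 0.75]
# One stored function value per partial range, non-decreasing (broad-sense monotonic).
VALUES = [0.5, 0.6, 0.8, 1.0]

def lookup_index(likeness):
    """Return the stored value for the partial range that contains `likeness`."""
    return VALUES[bisect_right(BOUNDARIES, likeness)]
```

Because the mapping is a step function, different likeness values inside one partial range give the same index value, which is why the relationship is monotonic only in the broad sense.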
  • the index value calculation unit 110 may use a predetermined positive number ⁇ range to obtain an average value for each ⁇ cand using the above formula (3-5), and obtain a normalized correlation value obtained by the above formula (3-6) using the obtained average value ⁇ c ( ⁇ cand ) and the phase difference signal ⁇ ( ⁇ cand ) as ⁇ cand (step S110-C1-B5').
  • the index value calculation unit 110 may obtain the maximum value of ⁇ cand obtained in step S110-C1-B5 or step S110-C1-B5' as an index value of the single sound source-likeness of the two-channel stereo input sound signals (step S110-C1-B6').
  • a set of information that specifies the index value ⁇ that belongs to each partial range and each weight value corresponding to each partial range that is predetermined so that the weight value has a monotonically increasing relationship with the index value ⁇ is stored in the signal mixing unit 120 in advance for each channel, and the signal mixing unit 120 obtains a weight value that corresponds to the index value ⁇ of the frame from the stored weight values for each channel of each frame, and sets the obtained weight value as the weight of the input sound signal of that channel.
  • Each set that is stored in advance may be the same or different for the first and second channels.
  • a value that has a monotonically decreasing relationship with the index value ⁇ is, for example, a function value of a monotonically decreasing function with the index value ⁇ as an argument. Therefore, for example, a monotonically decreasing function for each channel is stored in advance in the signal mixing unit 120, and for each channel in each frame, the signal mixing unit 120 provides the index value ⁇ as an argument to the monotonically decreasing function for that channel to obtain a function value, and sets the obtained function value as the weight of the input sound signal for the other channel.
  • the monotonically decreasing function for the first channel and the monotonically decreasing function for the second channel may be the same or different.
  • the signal mixing unit 120 to which the index value ⁇ ' is input obtains, for each channel, a signal obtained by weighting and adding the input sound signal of that channel and the input sound signal of the other channel, where the weight of the input sound signal of that channel in the weighting and addition is a value that has a monotonically decreasing relationship with the index value ⁇ ', and the weight of the input sound signal of the other channel in the weighting and addition is a value that has a monotonically increasing relationship with the index value ⁇ ' or is the index value ⁇ ' itself, as the signal to be coded for that channel.
  • a value that has a monotonically increasing relationship with the index value ⁇ ' is, for example, the function value of a monotonically increasing function with the index value ⁇ ' as an argument. Therefore, for example, a monotonically increasing function for each channel is stored in advance in the signal mixing unit 120, and for each channel of each frame, the signal mixing unit 120 provides the index value ⁇ ' as an argument to the monotonically increasing function for that channel to obtain a function value, and sets the obtained function value as the weight of the input sound signal of the other channel.
  • the monotonically increasing function for the first channel and the monotonically increasing function for the second channel may be the same or different.
  • stereo coding methods are designed to reproduce both the sound itself emitted by each sound source and the localization of that sound source.
  • when a signal to be coded on two channels contains mainly sounds emitted by a single sound source, the amount of information required to represent the localization of the sound source is small, so not only is the reproducibility of the localization of the sound source high, but the reproducibility of the sound itself emitted by the sound source is also high.
  • when a signal to be coded on two channels contains mainly sounds emitted by multiple sound sources, a large amount of information is required to represent the localization of the multiple sound sources, which may result in poor reproducibility of the sound itself emitted by the sound sources.
  • the signal mixing unit 120 to which the index value ⁇ is input may obtain, for each channel, the input sound signal of that channel as is as the signal to be encoded for that channel in a first range, among the ranges that the index value ⁇ can take, in which the index value ⁇ is greater than a predetermined value (i.e., the first case in which the index value ⁇ is greater than the predetermined value), and may obtain, for each channel, a signal obtained by weighted addition of the input sound signal of that channel and the input sound signal of the other channel, wherein the weight of the input sound signal of that channel in the weighted addition is the index value ⁇ itself or a value that is monotonically increasing with respect to the index value ⁇ in the second range, and the weight of the input sound signal of the other channel in the weighted addition is a value that is monotonically decreasing with respect to the index value ⁇ in the second range.
  • the signal mixing unit 120 may operate by replacing the previously mentioned "greater than a predetermined value” and “less than a predetermined value” with "greater than a predetermined value” and "less
  • the signal mixing unit 120 to which the index value ⁇ ' is input may, when the index value ⁇ ' is smaller than a predetermined value, obtain, for each channel, the input sound signal of that channel as is as the signal to be coded for that channel, and in any other case, that is, when the index value ⁇ ' is equal to or greater than the predetermined value described above, may obtain, for each channel, a signal in which the input sound signal of that channel is mixed with the input sound signal of the other channel, and the smaller the index value ⁇ ', the closer the signal is to the input sound signal of that channel (step S120).
  • the signal mixing unit 120 may operate by replacing the previously described "smaller than a predetermined value" and "equal to or greater than a predetermined value" with "equal to or less than a predetermined value" and "greater than a predetermined value", respectively.
  • the signal mixing unit 120 to which the index value ⁇ ' is input may obtain, for each channel, the input sound signal of that channel as is as the signal to be encoded for that channel in a first range, among the ranges that the index value ⁇ ' can take, in which the index value ⁇ ' is smaller than a predetermined value (i.e., the first case in which the index value ⁇ ' is smaller than the predetermined value), and may obtain, for each channel, a signal obtained by weighted addition of the input sound signal of that channel and the input sound signal of the other channel, wherein the weight of the input sound signal of that channel in the weighted addition is a value that is monotonically decreasing with respect to the index value ⁇ ' in the second range, and the weight of the input sound signal of the other channel in the weighted addition is the index value ⁇ ' itself or a value that is monotonically increasing with respect to the index value ⁇ ' in the second range.
  • the signal mixing unit 120 may operate by replacing the previously mentioned "smaller than a predetermined value" and "greater than or
  • the index value calculation unit 110 obtains an index value ⁇ that is equal to or greater than 0.5 and equal to or less than 1 and has a monotonically increasing relationship with respect to the single sound source-likeness in a broad sense.
  • the index value calculation unit 110 obtains an index value ⁇ that is 0.5 when the index value of the single sound source-likeness is the minimum value that the index value can take, and that is 1 when the index value of the single sound source-likeness is the maximum value that the index value can take, and that the larger the index value of the single sound source-likeness is, the larger the value that the index value calculation unit 110 obtains as the index value ⁇ .
  • the signal mixer 120 obtains, for each time t, the first-channel encoding target signal x'1 (t) represented by the above equation (2-7) and the second-channel encoding target signal x'2 (t) represented by the above equation (2-8).
  • the signal mixer 120 may, for each frame, take the index value ⁇ calculated by the index value calculation unit 110 for the immediately preceding frame as ⁇ p and the index value ⁇ calculated by the index value calculation unit 110 for the current frame as ⁇ c , set the value obtained by the above equation (2-9) as the index value ⁇ (t) for each time from the first time (i.e., the 1st time) to the T 0 -1th time of the current frame, and set ⁇ c as the index value ⁇ (t) for each time from the T 0th time to the last time (i.e., the Tth time) of the current frame, and may obtain the first-channel encoding target signal x' 1 (t) represented by the above equation (2-10) instead of the above equation (2-7) for each time t of the current frame, or may obtain the second-channel encoding target signal x' 2 (t) represented by the above equation (2-11) instead of the above equation (2
  • the index value calculation unit 110 obtains an index value ⁇ ' that is greater than or equal to 0 and less than or equal to 0.5 and has a monotonically decreasing relationship in a broad sense with respect to the single sound source-likeness. For example, the index value calculation unit 110 obtains an index value ⁇ ' that is 0 when the index value of the single sound source-likeness is the maximum value that the index value can take, is 0.5 when the index value of the single sound source-likeness is the minimum value that the index value can take, and is a larger value as the index value of the single sound source-likeness is smaller.
  • the signal mixer 120 obtains, for each time t, the first-channel encoding target signal x'1 (t) expressed by the above equation (2-12) and the second-channel encoding target signal x'2 (t) expressed by the above equation (2-13).
  • the signal mixer 120 may, for each frame, use the index value ⁇ ' calculated by the index value calculation unit 110 for the immediately preceding frame as ⁇ 'p and the index value ⁇ ' calculated by the index value calculation unit 110 for the current frame as ⁇ 'c , use the value obtained by the above equation (2-14) as the index value ⁇ '(t) for each time from the first time (i.e., the 1st time) to the T 0 -1th time of the current frame, and use ⁇ 'c as the index value ⁇ '(t) for each time from the T 0th time to the last time (i.e., the Tth time) of the current frame.
  • the signal mixer 120 may obtain the first-channel encoding target signal x' 1 (t) represented by the above equation (2-15) instead of the above equation (2-12), or may obtain the second-channel encoding target signal x' 2 (t) represented by the above equation (2-16) instead of the above equation (2-13).
  • the fourth embodiment may be implemented by including a process of mixing two-channel stereo input sound signals to generate a downmix signal.
  • An embodiment including a process of generating a downmix signal will be described as Modification 1 of the fourth embodiment.
  • the sound signal processing device 100 of Modification 1 of the fourth embodiment is as shown by the dashed line, dashed line, and solid line in Fig. 5, and includes an index value calculation unit 110 and a signal mixing unit 120, and the signal mixing unit 120 includes a downmix signal generation unit 1201 and a mixing unit 1211.
  • the sound signal processing device 100 performs the process of step S110, and performs the process of step S120 as steps S1201 and S1211.
  • the modification 1 of the fourth embodiment will be described mainly with respect to the differences from the fourth embodiment.
  • index value calculation unit 110 receives a first channel input sound signal and a second channel input sound signal, which are two channel input sound signals constituting the two-channel stereo input sound signal input to the sound signal processing device 100.
  • the index value calculation unit 110 calculates an index value ⁇ that is in a broadly monotonically increasing relationship with respect to the single sound source-likeness of the two-channel stereo input sound signal, or an index value ⁇ ' that is in a broadly monotonically decreasing relationship with respect to the single sound source-likeness of the two-channel stereo input sound signal (step S110).
  • the index value ⁇ or the index value ⁇ ' obtained by the index value calculation unit 110 is output to the signal mixing unit 120.
  • the input/output and operation of the downmix signal generation unit 1201 are the same as those of the second and third modifications of the second embodiment and the third modification, and are as described in the second modification of the second embodiment in detail.
  • the downmix signal generation unit 1201 receives a first channel input sound signal and a second channel input sound signal, which are input sound signals of two channels constituting a two-channel stereo input sound signal input to the sound signal processing device 100.
  • the downmix signal generation unit 1201 mixes the first channel input sound signal and the second channel input sound signal to generate a downmix signal (step S1201).
  • the downmix signal obtained by the downmix signal generation unit 1201 is output to the mixer 1211.
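As a minimal sketch of step S1201, the downmix below is the sample-wise average of the two channels. The averaging rule and the function name `downmix` are assumptions made for illustration; the text only states that the two input sound signals are mixed.

```python
def downmix(x1, x2):
    """Return a mono downmix of two equal-length channel frames.

    Assumption: the downmix is the plain average of the two channels.
    The description does not fix the mixing rule, so this is only one
    possible choice.
    """
    if len(x1) != len(x2):
        raise ValueError("both channels must have the same number of samples")
    return [0.5 * (a + b) for a, b in zip(x1, x2)]
```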
  • the mixer 1211 receives, as inputs, a first channel input sound signal and a second channel input sound signal, which are input sound signals of two channels constituting the two-channel stereo input sound signal input to the sound signal processing device 100, the downmix signal output from the downmix signal generation unit 1201, and the index value ⁇ or the index value ⁇ ' output from the index value calculation unit 110.
  • the mixer 1211 to which the index value ⁇ is input obtains, for each of the first and second channels, as the signal to be coded for the channel, a signal obtained by mixing the input sound signal of the channel with the downmix signal such that the larger the index value ⁇, the closer the signal is to the input sound signal of the channel (i.e., the smaller the index value ⁇, the closer the signal is to the downmix signal) (step S1211).
  • the mixer 1211 to which the index value ⁇' is input obtains, for each of the first and second channels, as the signal to be coded for the channel, a signal obtained by mixing the input sound signal of the channel with the downmix signal such that the smaller the index value ⁇', the closer the signal is to the input sound signal of the channel (i.e., the larger the index value ⁇', the closer the signal is to the downmix signal) (step S1211).
  • the coding target signals of the two channels obtained by the mixer 1211 i.e., two-channel stereo coding target signals
  • the mixer 1211 to which the index value ⁇ is input obtains, for each channel, a signal obtained by weighting and adding the input sound signal and downmix signal of that channel, where the weight of the input sound signal of that channel in the weighting and addition is a value or index value ⁇ that has a monotonically increasing relationship with the index value ⁇ , and the weight of the downmix signal in the weighting and addition is a value that has a monotonically decreasing relationship with the index value ⁇ , as the encoding target signal for that channel.
  • the value that is in a monotonically increasing relationship with the index value ⁇ is, for example, a function value of a monotonically increasing function with the index value ⁇ as an argument. Therefore, for example, a monotonically increasing function for each channel is stored in the mixer 1211 in advance, and the mixer 1211 obtains a function value for each channel of each frame by giving the index value ⁇ as an argument to the monotonically increasing function for that channel, and sets the obtained function value as the weight of the input sound signal of that channel.
  • the monotonically increasing function for the first channel and the monotonically increasing function for the second channel may be the same or different.
  • a set of information that specifies the index value ⁇ that belongs to each partial range and each weight value corresponding to each partial range that is predetermined so that the weight value has a monotonically increasing relationship with the index value ⁇ is stored in the mixer 1211 in advance for each channel, and the mixer 1211 obtains a weight value that corresponds to the index value ⁇ of the frame from the stored weight values for each channel of each frame, and sets the obtained weight value as the weight of the input sound signal of that channel.
  • Each set that is stored in advance may be the same or different for the first and second channels.
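The stored set of partial ranges and weights can be pictured as a small lookup table. The sketch below is hypothetical: the boundary values in `EDGES`, the weights in `WEIGHTS`, and the function name `input_weight` are invented for illustration. The only property taken from the text is that the weight does not decrease as the index value increases.

```python
from bisect import bisect_right

# Hypothetical table for one channel: three boundaries split the index
# range into four partial ranges, each with one stored weight.
EDGES = [0.25, 0.5, 0.75]
WEIGHTS = [0.0, 0.3, 0.7, 1.0]  # non-decreasing, as the text requires


def input_weight(index_value):
    """Look up the input-signal weight for the partial range that
    contains index_value (a boundary value belongs to the upper range)."""
    return WEIGHTS[bisect_right(EDGES, index_value)]
```

A separate table could be stored for the second channel, or the same table could be shared by both channels.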
  • the value that is in a monotonically decreasing relationship with the index value ⁇ is, for example, a function value of a monotonically decreasing function with the index value ⁇ as an argument. Therefore, for example, a monotonically decreasing function for each channel may be stored in the mixer 1211 in advance, and the mixer 1211 may obtain a function value for each channel of each frame by providing the index value ⁇ as an argument to the monotonically decreasing function for that channel, and use the obtained function value as the weight of the downmix signal.
  • the monotonically decreasing function for the first channel and the monotonically decreasing function for the second channel may be the same or different.
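The weighted addition with function-valued weights can be sketched as below. The default weight functions (the index value itself for the input signal, and one minus the index value for the downmix signal) are assumptions chosen as the simplest monotonic pair; the name `mix_channel` and its parameters are likewise hypothetical.

```python
def mix_channel(x, dmx, index_value,
                w_in=lambda a: a,
                w_dmx=lambda a: 1.0 - a):
    """Weighted addition of one channel's input frame and the downmix.

    w_in must be monotonically increasing and w_dmx monotonically
    decreasing in the index value. The defaults are illustrative
    assumptions, not functions prescribed by the description.
    """
    wi = w_in(index_value)
    wd = w_dmx(index_value)
    return [wi * s + wd * d for s, d in zip(x, dmx)]
```

With these defaults an index value of 1 returns the input signal unchanged and an index value of 0 returns the downmix signal.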
  • the mixer 1211 to which the index value ⁇' is input obtains, for each channel, as the signal to be coded for that channel, a signal obtained by weighted addition of the input sound signal of that channel and the downmix signal, where the weight of the input sound signal of that channel is a value that has a monotonically decreasing relationship with the index value ⁇', and the weight of the downmix signal is the index value ⁇' itself or a value that has a monotonically increasing relationship with the index value ⁇'.
  • a value that has a monotonically decreasing relationship with the index value ⁇ ' is, for example, a function value of a monotonically decreasing function with the index value ⁇ ' as an argument. Therefore, for example, a monotonically decreasing function for each channel is stored in advance in the mixer 1211, and for each channel of each frame, the mixer 1211 obtains a function value by providing the index value ⁇ ' as an argument to the monotonically decreasing function for that channel, and sets the obtained function value as the weight of the input sound signal for that channel.
  • the monotonically decreasing function for the first channel and the monotonically decreasing function for the second channel may be the same or different.
  • a set of information specifying the index value ⁇ ' that belongs to each partial range and each weight value corresponding to each partial range that is predetermined so that the weight value has a monotonically decreasing relationship with the index value ⁇ ' may be stored in the mixer 1211 for each channel in advance, and the mixer 1211 may acquire, for each channel of each frame, a weight value that corresponds to the index value ⁇ ' of that frame from the stored weight values, and set the acquired weight value as the weight of the input sound signal of that channel.
  • the sets stored in advance may be the same or different for the first and second channels.
  • the value that has a monotonically increasing relationship with the index value ⁇ ' is, for example, the function value of a monotonically increasing function with the index value ⁇ ' as an argument. Therefore, for example, a monotonically increasing function for each channel is stored in advance in the mixer 1211, and for each channel of each frame, the mixer 1211 obtains a function value by providing the index value ⁇ ' as an argument to the monotonically increasing function for that channel, and sets the obtained function value as the weight of the downmix signal.
  • the monotonically increasing function for the first channel and the monotonically increasing function for the second channel may be the same or different.
  • the mixer 1211 to which the index value ⁇ is input may obtain, for each channel, the input sound signal of that channel as is as the signal to be coded for that channel if the index value ⁇ is greater than a predetermined value, and otherwise (i.e., if the index value ⁇ is equal to or less than the predetermined value) may obtain, for each channel, as the signal to be coded for that channel, a signal obtained by mixing the input sound signal of that channel with the downmix signal such that the larger the index value ⁇, the closer the signal is to the input sound signal of that channel (i.e., the smaller the index value ⁇, the closer the signal is to the downmix signal) (step S1211).
  • the mixer 1211 may perform an operation in which the previously described "greater than the predetermined value" and "equal to or less than the predetermined value" are respectively interpreted as "equal to or greater than the predetermined value" and "less than the predetermined value".
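The single-threshold behaviour can be sketched as follows. The threshold default of 0.5, the linear weight below the threshold, and the name `encode_target` are assumptions for illustration only. The strict comparison corresponds to the "greater than" wording, and replacing `>` with `>=` gives the alternative boundary handling.

```python
def encode_target(x, dmx, index_value, threshold=0.5):
    """Return the coding target frame for one channel.

    Above the threshold the input frame is passed through as is.
    At or below it, input and downmix are mixed with a weight that
    rises linearly from 0 to 1 as the index value approaches the
    threshold (a hypothetical choice of monotonic weight).
    """
    if threshold <= 0.0:
        raise ValueError("threshold must be positive")
    if index_value > threshold:
        return list(x)
    w = index_value / threshold
    return [w * s + (1.0 - w) * d for s, d in zip(x, dmx)]
```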
  • the mixing unit 1211 to which the index value ⁇ is input may obtain, for each channel, the input sound signal of that channel as is as the signal to be encoded for that channel in a first range, which is the part of the range of values the index value ⁇ can take in which the index value ⁇ is greater than a predetermined value (i.e., in the first case in which the index value ⁇ is greater than the predetermined value), and may obtain, for each channel, in a second range other than the first range (i.e., in the second case in which the index value ⁇ is equal to or less than the predetermined value), a signal obtained by weighted addition of the input sound signal of that channel and the downmix signal as the signal to be encoded for that channel, where the weight of the input sound signal of that channel is the index value ⁇ itself or a value that is monotonically increasing with respect to the index value ⁇ in the second range, and the weight of the downmix signal is a value that is monotonically decreasing with respect to the index value ⁇ in the second range.
  • the mixing unit 1211 may operate by replacing the previously mentioned "greater than a specified value" and “less than or equal to a specified value” with "greater than or equal to a specified value” and "less than a specified value
  • the mixer 1211 to which the index value ⁇ is input may obtain, for each channel, the downmix signal as is as the encoding target signal for that channel when the index value ⁇ is smaller than a predetermined value, and otherwise (i.e., when the index value ⁇ is equal to or greater than the predetermined value) may obtain, for each channel, as the encoding target signal for that channel, a signal obtained by mixing the input sound signal of that channel with the downmix signal such that the larger the index value ⁇, the closer the signal is to the input sound signal of that channel (i.e., the smaller the index value ⁇, the closer the signal is to the downmix signal) (step S1211).
  • the mixer 1211 may perform an operation in which the above-mentioned "smaller than the predetermined value" and "equal to or greater than the predetermined value" are respectively interpreted as "equal to or less than the predetermined value" and "greater than the predetermined value".
  • the mixing unit 1211 to which the index value ⁇ is input may obtain, for each channel, the downmix signal as is as the signal to be encoded for that channel in a first range, which is the part of the range of values the index value ⁇ can take in which the index value ⁇ is smaller than a predetermined value (i.e., in the first case in which the index value ⁇ is smaller than the predetermined value), and may obtain, for each channel, in a second range other than the first range (i.e., in the second case in which the index value ⁇ is equal to or greater than the predetermined value), a signal obtained by weighted addition of the input sound signal of that channel and the downmix signal as the signal to be encoded for that channel, where the weight of the input sound signal of that channel is the index value ⁇ itself or a value that is monotonically increasing with respect to the index value ⁇ in the second range, and the weight of the downmix signal is a value that is monotonically decreasing with respect to the index value ⁇ in the second range.
  • the mixing unit 1211 may operate by replacing the previously mentioned "smaller than a predetermined value" and "greater than or equal to a predetermined value” with "less than or equal to a predetermined
  • the mixing unit 1211 to which the index value ⁇ is input may obtain, for each channel, the input sound signal of that channel as is as the signal to be encoded for that channel if the index value ⁇ is greater than a predetermined first value, may obtain, for each channel, the downmix signal as is as the signal to be encoded for that channel if the index value ⁇ is equal to or less than a predetermined second value that is smaller than the predetermined first value, and otherwise (i.e., if the index value ⁇ is equal to or less than the predetermined first value and greater than the predetermined second value) may obtain, for each channel, as the signal to be encoded for that channel, a signal obtained by mixing the input sound signal of that channel with the downmix signal such that the larger the index value ⁇, the closer the signal is to the input sound signal of that channel (i.e., the smaller the index value ⁇, the closer the signal is to the downmix signal) (step S1211).
  • the mixing unit 1211 may operate by replacing the previously mentioned “greater than a predetermined first value” and “less than or equal to a predetermined first value” with “greater than or equal to a predetermined first value” and “less than a predetermined first value”, respectively, and may operate by replacing the previously mentioned "greater than a predetermined second value” and “less than or equal to a predetermined second value” with “greater than or equal to a predetermined second value” and “less than a predetermined second value", respectively.
  • the mixing unit 1211 to which the index value ⁇ is input obtains, for each channel, the input sound signal of the channel as is as the encoding target signal for the channel in a first range, which is the part of the range of values the index value ⁇ can take in which the index value ⁇ is greater than a predetermined first value (i.e., in the first case where the index value ⁇ is greater than the predetermined first value), and obtains, for each channel, the downmix signal as is as the encoding target signal for the channel in a second range, which is the part of the range of values the index value ⁇ can take in which the index value ⁇ is equal to or less than a predetermined second value smaller than the first value (i.e., in the second case where the index value ⁇ is equal to or less than the predetermined second value smaller than the first value).
  • in a third range, which is neither the first range nor the second range (i.e., in the third case, which is neither the first case nor the second case; specifically, when the index value ⁇ is equal to or less than the predetermined first value and greater than the predetermined second value), the mixing unit 1211 may obtain, for each channel, as the encoding target signal of the channel, a signal obtained by weighted addition of the input sound signal of the channel and the downmix signal, where the weight of the input sound signal of the channel is the index value ⁇ itself or a value that has a monotonically increasing relationship with the index value ⁇ in the third range, and the weight of the downmix signal is a value that has a monotonically decreasing relationship with the index value ⁇ in the third range.
  • the mixing unit 1211 may operate by replacing the previously mentioned “greater than a predetermined first value” and “less than or equal to a predetermined first value” with “greater than or equal to a predetermined first value” and “less than a predetermined first value”, respectively, and may operate by replacing the previously mentioned "greater than a predetermined second value” and “less than or equal to a predetermined second value” with “greater than or equal to a predetermined second value” and “less than a predetermined second value", respectively.
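The three-range behaviour with two thresholds can be sketched as below. The threshold defaults, the linear weight in the third range, and the name `encode_target_3` are hypothetical. The text only requires that the input weight increase and the downmix weight decrease with the index value inside the third range.

```python
def encode_target_3(x, dmx, index_value, first=0.75, second=0.25):
    """Return the coding target frame for one channel.

    first range  (index > first):            input frame as is
    second range (index <= second):          downmix frame as is
    third range  (second < index <= first):  weighted addition, with an
        assumed linear input weight that grows from 0 to 1 across the range
    """
    if not second < first:
        raise ValueError("second threshold must be smaller than first")
    if index_value > first:
        return list(x)
    if index_value <= second:
        return list(dmx)
    w = (index_value - second) / (first - second)
    return [w * s + (1.0 - w) * d for s, d in zip(x, dmx)]
```

Swapping the strict and non-strict comparisons gives the alternative boundary handling described in the preceding item.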
  • the mixer 1211 to which the index value ⁇ ' is input may obtain, for each channel, the input sound signal of that channel as is as the encoding target signal for that channel when the index value ⁇ ' is smaller than a predetermined value, and in other cases, i.e., when the index value ⁇ ' is equal to or greater than the above-mentioned predetermined value, may obtain, for each channel, a signal obtained by mixing the input sound signal of that channel with the downmix signal, in which the smaller the index value ⁇ ' is, the closer the signal is to the input sound signal of that channel (i.e., the larger the index value ⁇ ' is, the closer the signal is to the downmix signal) as the encoding target signal for that channel (step S1211).
  • the mixer 1211 may perform an operation in which the above-mentioned "smaller than the predetermined value" and "equal to or greater than the predetermined value" are interpreted as "equal to or less than the predetermined value" and "greater than the predetermined value", respectively.
  • the mixing unit 1211 to which the index value ⁇' is input may obtain, for each channel, the input sound signal of that channel as is as the signal to be encoded for that channel in a first range, which is the part of the range of values the index value ⁇' can take in which the index value ⁇' is smaller than a predetermined value (i.e., in the first case in which the index value ⁇' is smaller than the predetermined value), and may obtain, for each channel, in a second range other than the first range (i.e., in the second case in which the index value ⁇' is equal to or greater than the predetermined value), a signal obtained by weighted addition of the input sound signal of that channel and the downmix signal as the signal to be encoded for that channel, where the weight of the input sound signal of that channel is a value that is in a monotonically decreasing relationship with the index value ⁇' in the second range, and the weight of the downmix signal is the index value ⁇' itself or a value that is in a monotonically increasing relationship with the index value ⁇' in the second range.
  • the mixing unit 1211 may operate by replacing the previously mentioned "smaller than a predetermined value" and "greater than or equal to a
  • the mixer 1211 to which the index value ⁇' is input may obtain, for each channel, the downmix signal as is as the encoding target signal for that channel when the index value ⁇' is greater than a predetermined value, and otherwise (i.e., when the index value ⁇' is equal to or less than the predetermined value) may obtain, for each channel, as the encoding target signal for that channel, a signal obtained by mixing the input sound signal of that channel with the downmix signal such that the smaller the index value ⁇' is, the closer the signal is to the input sound signal of that channel (i.e., the larger the index value ⁇' is, the closer the signal is to the downmix signal) (step S1211).
  • the mixer 1211 may perform an operation in which the above-mentioned "greater than the predetermined value" and "equal to or less than the predetermined value" are respectively interpreted as "equal to or greater than the predetermined value" and "less than the predetermined value".
  • the mixing unit 1211 to which the index value ⁇' is input may obtain, for each channel, the downmix signal as is as the signal to be encoded for that channel in a first range, which is the part of the range of values the index value ⁇' can take in which the index value ⁇' is greater than a predetermined value (i.e., in the first case in which the index value ⁇' is greater than the predetermined value), and may obtain, for each channel, in a second range other than the first range (i.e., in the second case in which the index value ⁇' is equal to or less than the predetermined value), a signal obtained by weighted addition of the input sound signal of that channel and the downmix signal as the signal to be encoded for that channel, where the weight of the input sound signal of that channel is a value that is in a monotonically decreasing relationship with the index value ⁇' in the second range, and the weight of the downmix signal is the index value ⁇' itself or a value that is in a monotonically increasing relationship with the index value ⁇' in the second range.
  • the mixing unit 1211 may operate by replacing the previously mentioned "greater than a specified value” and "less than a specified value” with "greater than
  • the mixing unit 1211 to which the index value ⁇' is input may obtain, for each channel, the input sound signal of that channel as is as the signal to be encoded for that channel if the index value ⁇' is smaller than a predetermined first value, may obtain, for each channel, the downmix signal as is as the signal to be encoded for that channel if the index value ⁇' is equal to or greater than a predetermined second value that is greater than the predetermined first value, and otherwise (i.e., if the index value ⁇' is equal to or greater than the predetermined first value and smaller than the predetermined second value) may obtain, for each channel, as the signal to be encoded for that channel, a signal obtained by mixing the input sound signal of that channel with the downmix signal such that the smaller the index value ⁇' is, the closer the signal is to the input sound signal of that channel (i.e., the larger the index value ⁇' is, the closer the signal is to the downmix signal) (step S1211).
  • the mixing unit 1211 may operate by replacing the previously mentioned "smaller than a predetermined first value" and "greater than or equal to a predetermined first value" with "less than or equal to a predetermined first value" and "greater than a predetermined first value", respectively, and may operate by replacing the previously mentioned "smaller than a predetermined second value" and "greater than or equal to a predetermined second value" with "less than or equal to a predetermined second value" and "greater than a predetermined second value", respectively.
  • the mixer 1211 to which the index value ⁇' is input obtains, for each channel, the input sound signal of the channel as is as the signal to be coded for the channel in a first range, which is the part of the range of values the index value ⁇' can take in which the index value ⁇' is smaller than a predetermined first value (i.e., in the first case where the index value ⁇' is smaller than the predetermined first value), and obtains, for each channel, the downmix signal as is as the signal to be coded for the channel in a second range, which is the part of the range of values the index value ⁇' can take in which the index value ⁇' is equal to or greater than a predetermined second value larger than the first value (i.e., in the second case where the index value ⁇' is equal to or greater than the predetermined second value larger than the first value).
  • in a third range, which is neither the first range nor the second range (i.e., in the third case, which is neither the first case nor the second case; specifically, when the index value ⁇' is equal to or greater than the predetermined first value and smaller than the predetermined second value), the mixer 1211 may obtain, for each channel, as the encoding target signal of the channel, a signal obtained by weighted addition of the input sound signal of the channel and the downmix signal, where the weight of the input sound signal of the channel is a value that has a monotonically decreasing relationship with the index value ⁇' in the third range, and the weight of the downmix signal is the index value ⁇' itself or a value that has a monotonically increasing relationship with the index value ⁇' in the third range.
  • the mixing unit 1211 may operate by replacing the previously mentioned "smaller than a predetermined first value" and "greater than or equal to a predetermined first value" with "less than or equal to a predetermined first value" and "greater than a predetermined first value", respectively, and may operate by replacing the previously mentioned "smaller than a predetermined second value" and "greater than or equal to a predetermined second value" with "less than or equal to a predetermined second value" and "greater than a predetermined second value", respectively.
  • the index value calculation unit 110 obtains an index value ⁇ that is greater than or equal to 0 and less than or equal to 1 and has a monotonically increasing relationship with respect to the single sound source-likeness. For example, the index value calculation unit 110 obtains index value ⁇ such that the index value is 0 when the index value of the single sound source-likeness is the minimum value that the index value can take, and the index value is 1 when the index value of the single sound source-likeness is the maximum value that the index value can take, and the larger the index value of the single sound source-likeness is, the larger the value that the index value calculation unit 110 obtains as index value ⁇ .
  • the index value calculation unit 110 obtains an index value for the single sound source-likeness of the two-channel stereo input sound signal by any of the above-mentioned methods from [First example of a method in which the index value calculation unit 110 obtains an index value for the single sound source-likeness of the two-channel stereo input sound signal] to [Third example of a method in which the index value calculation unit 110 obtains an index value for the single sound source-likeness of the two-channel stereo input sound signal], and obtains, as index value ⁇ , a value normalized so that the index value for the single sound source-likeness of the two-channel stereo input sound signal falls within the range of 0 to 1.
  • since the index values of the single sound source-likeness of the two-channel stereo input sound signal obtained in step S110-C1-A2' of [first example of the method in which the index value calculation unit 110 obtains an index value of the single sound source-likeness of the two-channel stereo input sound signal] and in step S110-C1-B6' of [second example of the method in which the index value calculation unit 110 obtains an index value of the single sound source-likeness of the two-channel stereo input sound signal] fall within the range of 0 to 1, the index value calculation unit 110 may use either of these index values of the single sound source-likeness directly as the index value ⁇.
  • the index value calculation unit 110 may obtain an index value of the single sound source-likeness of the two-channel stereo input sound signal by any of the above-mentioned [First example of a method in which the index value calculation unit 110 obtains an index value of the single sound source-likeness of the two-channel stereo input sound signal] to [Third example of a method in which the index value calculation unit 110 obtains an index value of the single sound source-likeness of the two-channel stereo input sound signal], and normalize the index value of the single sound source-likeness of the two-channel stereo input sound signal so that the index value falls within a range of 0 to 1, as y, or obtain an index value ⁇ expressed by the following formula (4-2) by using the index value of the single sound source-likeness of the two-channel stereo input sound signal obtained in any of step S110-C1-A2′ of [First example of a method in which the index value calculation unit 110 obtains an index value of the single sound source-likeness of the two-channel stereo input sound signal] and step S
  • the mixer 1211 obtains, for each time t, the first-channel encoding target signal x' 1 (t) expressed by the above equation (2-23), and obtains the second-channel encoding target signal x' 2 (t) expressed by the above equation (2-24).
  • the mixer 1211 may take the index value ⁇ calculated by the index value calculation unit 110 for the immediately preceding frame as ⁇ p and the index value ⁇ calculated by the index value calculation unit 110 for the current frame as ⁇ c , set the value obtained by the above equation (2-25) as the index value ⁇ (t) for each time from the first time (i.e., the 1st time) to the T 0 -1th time of the current frame, and set ⁇ c as the index value ⁇ (t) for each time from the T 0th time to the last time (i.e., the Tth time) of the current frame.
  • the mixer 1211 may obtain the first-channel encoding target signal x' 1 (t) represented by the above equation (2-26) instead of the above equation (2-23), or may obtain the second-channel encoding target signal x' 2 (t) represented by the above equation (2-27) instead of the above equation (2-24).
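The frame-to-frame transition of the index value can be sketched as below. Equation (2-25) is not reproduced in this passage, so the linear interpolation used here is an assumption chosen only to illustrate the idea of moving gradually from the previous frame's value to the current frame's value over the first T0 - 1 sample times. The name `index_per_sample` and its parameters are hypothetical.

```python
def index_per_sample(prev, cur, T, T0):
    """Return the per-sample index values for sample times 1..T.

    Assumption: for times 1..T0-1 the value is linearly interpolated
    from the previous frame's index (prev) toward the current frame's
    index (cur). From time T0 onward the value is cur, as stated in
    the description.
    """
    if not 1 <= T0 <= T:
        raise ValueError("T0 must satisfy 1 <= T0 <= T")
    return [prev + (cur - prev) * t / T0 if t < T0 else cur
            for t in range(1, T + 1)]
```

The per-sample values would then take the place of the single per-frame index value when the weighted addition is evaluated at each time t.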
  • the index value calculation unit 110 obtains an index value ⁇ ' that is greater than or equal to 0 and less than or equal to 1 and has a monotonically decreasing relationship in a broad sense with respect to the single sound source-likeness. For example, the index value calculation unit 110 obtains an index value ⁇ ' that is 0 when the index value of the single sound source-likeness is the maximum value that the index value can take, is 1 when the index value of the single sound source-likeness is the minimum value that the index value can take, and is a larger value as the index value of the single sound source-likeness is smaller.
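The decreasing mapping described above (0 at the maximum single sound source-likeness, 1 at the minimum) can be sketched as follows. The linear form, the name `decreasing_index`, and the parameters `lo` and `hi` (the assumed minimum and maximum of the likeness score) are illustrative assumptions.

```python
def decreasing_index(likeness, lo, hi):
    """Map a single-source-likeness score in [lo, hi] onto [0.0, 1.0],
    decreasing in the score: 1.0 at lo and 0.0 at hi.

    Hypothetical linear mapping; scores outside [lo, hi] are clipped.
    """
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    clipped = min(max(likeness, lo), hi)
    return (hi - clipped) / (hi - lo)
```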
  • the mixer 1211 obtains, for each time t, the first-channel encoding target signal x' 1 (t) expressed by the above equation (2-28) and the second-channel encoding target signal x' 2 (t) expressed by the above equation (2-29).
  • the mixer 1211 may obtain, for each frame, the first-channel encoding target signal x' 1 ( t) represented by the above equation (2-31) instead of the above equation (2-28) or the second-channel encoding target signal x' 2 ( t ) represented by the above equation (2-32) instead of the above equation (2-29), using, for each frame, the index value ⁇ ' calculated by the index value calculation unit 110 for the immediately preceding frame as ⁇ ' p and the index value ⁇ ' calculated by the index value calculation unit 110 for the current frame as ⁇ ' c , and may use the value obtained by the above equation (2-30) as the index value ⁇ '(t) for each time from the first time (i.e., the 1st time) to the T 0 -1th time of the current frame, and may use ⁇ ' c as the index value ⁇ '(t) for each time from the T 0th time to the last time (i.e.
  • a sound signal processing device 100 will be described that performs processing according to two or more of the bit rate of stereo encoding of the stereo encoding device 200, the absolute value of the inter-channel time difference of the two-channel stereo input sound signal input to the sound signal processing device 100, and the single sound source likeliness of the two-channel stereo input sound signal input to the sound signal processing device 100.
  • the sound signal processing device 100 of the fifth embodiment is as shown by the dashed line, dashed line, and solid line in Fig. 3, and includes an index value calculation unit 110 and a signal mixing unit 120.
  • the sound signal processing device 100 performs processing of steps S110 and S120 shown by the dashed line and solid line in Fig. 4. The following mainly describes the points where the fifth embodiment is different from the second embodiment.
  • the index value calculation unit 110 receives a first channel input sound signal and a second channel input sound signal, which are input sound signals of two channels constituting the two-channel stereo input sound signal input to the sound signal processing device 100.
  • the index value calculation unit 110 calculates a value that satisfies two or more of the following first, second and third conditions as an index value ⁇ , or calculates a value that satisfies two or more of the following fourth, fifth and sixth conditions as an index value ⁇ ' (step S110).
  • the index value ⁇ or index value ⁇ ' obtained by the index value calculation unit 110 is output to the signal mixing unit 120.
  • the first condition is that, when all conditions other than the stereo encoding bit rate of the stereo encoding device 200 are the same, there is a broadly monotonically increasing relationship with the stereo encoding bit rate of the stereo encoding device 200.
  • the second condition is that when all conditions are the same except for the absolute value
  • the index value ⁇ calculated by the index value calculation unit 110 is one of the following four types.
  • the stereo encoding bit rate of the stereo encoding device 200 is BR
  • a certain predetermined broadly monotonically increasing function is f 1 ()
  • a certain predetermined broadly monotonically decreasing function is f 2 ()
  • ) is an example of the first type of index value ⁇ .
  • the fourth type of index value ⁇ is a value that satisfies the first condition, the second condition, and the third condition.
  • index value calculation unit 110 calculates the fourth type of index value ⁇ , for example, a function that broadly monotonically increases with respect to the first argument when the second argument and the third argument are the same value, that broadly monotonically decreases with respect to the second argument when the first argument and the third argument are the same value, and that broadly monotonically increases with respect to the third argument when the first argument and the second argument are the same value is stored in index value calculation unit 110, and index value calculation unit 110 may obtain a function value for each frame by providing the stereo encoding bit rate of the frame as a first argument, the absolute value
  • the fourth condition is that when all conditions other than the stereo encoding bit rate of the stereo encoding device 200 are the same, there is a broadly monotonically decreasing relationship with the stereo encoding bit rate of the stereo encoding device 200.
  • the fifth condition is that when all conditions are the same except for the absolute value
  • the sixth condition is that, when all conditions other than the single-source-likeness of the two-channel stereo input sound signal are the same, there is a broad-sense monotonically decreasing relationship with respect to the single-source-likeness of the two-channel stereo input sound signal.
  • the sixth condition can also be said to be that, when all conditions other than the multiple-source-likeness of the two-channel stereo input sound signal are the same, there is a broad-sense monotonically increasing relationship with respect to the multiple-source-likeness of the two-channel stereo input sound signal.
  • the index value ⁇ ' calculated by the index value calculation unit 110 is one of the following four types.
  • the first type of index value ⁇ ' is an index value that satisfies the fourth and fifth conditions.
  • the index value calculation unit 110 calculates the first type of index value ⁇ ', for example, a function that monotonically decreases in a broad sense with respect to the first argument when the second argument is the same value and monotonically increases in a broad sense with respect to the second argument when the first argument is the same value is stored in the index value calculation unit 110, and the index value calculation unit 110 may obtain a function value for each frame by providing the stereo encoding bit rate of the frame as a first argument and the absolute value
  • f 4 () a certain predetermined broadly monotonically decreasing function
  • f 5 () a certain predetermined broadly monotonically increasing function
  • ) is an example of the first type of index value ⁇ '.
  • the second type of index value ⁇ ' is an index value that satisfies the fourth and sixth conditions.
  • when the index value calculation unit 110 calculates the second type of index value ⁇ ', for example, a function that monotonically decreases in a broad sense with respect to the first argument when the second argument is the same value, and that monotonically decreases in a broad sense with respect to the second argument when the first argument is the same value, is stored in advance in the index value calculation unit 110. For each frame, the index value calculation unit 110 provides the function with the bit rate of stereo encoding of the frame as the first argument and the index value of the single sound source-likeness of the frame as the second argument to obtain a function value, and sets the obtained function value as the index value ⁇ ' of the frame.
  • a certain predetermined monotonically decreasing function is f 6 ()
  • the function value f 4 (BR)+f 6 (SS) is an example of the second type of index value ⁇ '.
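The sum f 4 (BR)+f 6 (SS) can be illustrated with a minimal sketch. The concrete shapes of `f4` and `f6`, the 16 to 64 kbps breakpoints, and the assumption that SS lies in [0, 1] are all hypothetical choices made for illustration; the description only requires that both functions be broadly monotonically decreasing.

```python
def f4(bit_rate_kbps: float) -> float:
    """Hypothetical broadly monotonically decreasing function of the bit rate BR."""
    low, high = 16.0, 64.0  # assumed breakpoints, not taken from the description
    clipped = min(max(bit_rate_kbps, low), high)
    return 0.25 * (high - clipped) / (high - low)


def f6(single_source_likeness: float) -> float:
    """Hypothetical broadly monotonically decreasing function of SS, assumed in [0, 1]."""
    clipped = min(max(single_source_likeness, 0.0), 1.0)
    return 0.25 * (1.0 - clipped)


def index_value_prime_type2(bit_rate_kbps: float, single_source_likeness: float) -> float:
    """Second-type index value: the sum of the two decreasing functions."""
    return f4(bit_rate_kbps) + f6(single_source_likeness)
```

With these particular choices the result stays between 0 and 0.5, and it shrinks as either the bit rate or the single sound source-likeness grows, which is the behaviour the fourth and sixth conditions ask for.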
  • the third type of index value ⁇ ' is an index value that satisfies the fifth and sixth conditions.
  • the index value calculation unit 110 calculates the third type of index value ⁇ ', for example, a function that monotonically increases in a broad sense with respect to the first argument when the second argument is the same value and monotonically decreases in a broad sense with respect to the second argument when the first argument is the same value is stored in the index value calculation unit 110, and for each frame, the index value calculation unit 110 provides the absolute value
  • )+ f6 (SS) is an example of the third type of index value ⁇ '.
  • the fourth type of index value ⁇ ' is an index value that satisfies the fourth condition, the fifth condition, and the sixth condition.
  • index value calculation unit 110 calculates the fourth type of index value ⁇ ', for example, a function that monotonically decreases in a broad sense with respect to the first argument when the second argument and the third argument are the same value, that monotonically increases in a broad sense with respect to the second argument when the first argument and the third argument are the same value, and that monotonically decreases in a broad sense with respect to the third argument when the first argument and the second argument are the same value is stored in index value calculation unit 110, and index value calculation unit 110 may obtain a function value for each frame by providing the stereo encoding bit rate of the frame as a first argument, the absolute value
  • the index value calculation unit 110 may, for example, calculate the absolute value
  • the index value calculation unit 110 may, for example, calculate the single-sound-source-likeness index value or the multiple-sound-source-likeness index value of the two-channel stereo input sound signal in the same manner as the index value calculation unit 110 of the fourth embodiment, and then calculate the index value ⁇ or the index value ⁇ '.
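A three-argument function of the kind described for the fourth type of index value ⁇ might look like the sketch below. It assumes that the three arguments are the stereo encoding bit rate, the absolute value of the inter-channel time difference, and the single sound source-likeness index value, and that the result should lie between 0.5 and 1. The normalisation constants (16 to 64 kbps, 32 samples) and the equal weighting of the three terms are hypothetical.

```python
def index_value_type4(bit_rate_kbps: float,
                      abs_time_diff_samples: float,
                      single_source_likeness: float) -> float:
    """Hypothetical fourth-type index value in [0.5, 1].

    Broadly increasing in the bit rate, broadly decreasing in the absolute
    inter-channel time difference, broadly increasing in the likeness value.
    """
    br_term = min(max((bit_rate_kbps - 16.0) / 48.0, 0.0), 1.0)
    td_term = 1.0 - min(max(abs_time_diff_samples, 0.0) / 32.0, 1.0)
    ss_term = min(max(single_source_likeness, 0.0), 1.0)
    return 0.5 + 0.5 * (br_term + td_term + ss_term) / 3.0
```

Each term is clipped to [0, 1] before averaging, so changing one argument while holding the other two fixed can only move the result in the direction required by the corresponding condition.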
  • the signal mixing unit 120 receives a first channel input sound signal and a second channel input sound signal, which are input sound signals of two channels constituting the two-channel stereo input sound signal input to the sound signal processing device 100, and the index value ⁇ or the index value ⁇ ' output from the index value calculation unit 110.
  • the signal mixing unit 120 to which the index value ⁇ is input obtains, for each of the first and second channels, a signal obtained by mixing the input sound signal of that channel and the input sound signal of the other channel, where the larger the index value ⁇ , the closer the signal is to the input sound signal of that channel, as the signal to be coded for that channel. The signal mixing unit 120 to which the index value ⁇ ' is input obtains, for each of the first and second channels, a signal obtained by weighting and adding the input sound signal of that channel and the input sound signal of the other channel, where the smaller the index value ⁇ ', the closer the signal is to the input sound signal of that channel, as the signal to be coded for that channel (step S120).
  • the encoding target signals of the two channels obtained by the signal mixer 120 i.e., two-channel stereo encoding target signals
  • the signal mixing unit 120 to which the index value ⁇ is input obtains, for each channel, as the signal to be coded for that channel, a signal obtained by weighting and adding the input sound signal of that channel and the input sound signal of the other channel, where the weight of the input sound signal of that channel in the weighted addition is the index value ⁇ itself or a value that has a monotonically increasing relationship with the index value ⁇ , and the weight of the input sound signal of the other channel in the weighted addition is a value that has a monotonically decreasing relationship with the index value ⁇ .
  • the value that is in a monotonically increasing relationship with the index value ⁇ is, for example, the function value of a monotonically increasing function with the index value ⁇ as an argument. Therefore, for example, a monotonically increasing function for each channel is stored in the signal mixing unit 120 in advance, and the signal mixing unit 120 obtains a function value for each channel of each frame by giving the index value ⁇ as an argument to the monotonically increasing function for that channel, and sets the obtained function value as the weight of the input sound signal of that channel.
  • the monotonically increasing function for the first channel and the monotonically increasing function for the second channel may be the same or different.
  • a set of information that specifies the index value ⁇ that belongs to each partial range and each weight value corresponding to each partial range that is predetermined so that the weight value has a monotonically increasing relationship with the index value ⁇ is stored in the signal mixing unit 120 in advance for each channel, and the signal mixing unit 120 obtains a weight value that corresponds to the index value ⁇ of the frame from the stored weight values for each channel of each frame, and sets the obtained weight value as the weight of the input sound signal of that channel.
  • Each set that is stored in advance may be the same or different for the first and second channels.
  • a value that has a monotonically decreasing relationship with the index value ⁇ is, for example, a function value of a monotonically decreasing function with the index value ⁇ as an argument. Therefore, for example, a monotonically decreasing function for each channel is stored in advance in the signal mixing unit 120, and for each channel in each frame, the signal mixing unit 120 provides the index value ⁇ as an argument to the monotonically decreasing function for that channel to obtain a function value, and sets the obtained function value as the weight of the input sound signal for the other channel.
  • the monotonically decreasing function for the first channel and the monotonically decreasing function for the second channel may be the same or different.
  • a set of information specifying the index value ⁇ that belongs to each partial range and each weight value corresponding to each partial range that is predetermined so that the weight value has a monotonically decreasing relationship with the index value ⁇ may be stored in advance in the signal mixing unit 120 for each channel, and the signal mixing unit 120 may acquire, for each channel of each frame, a weight value that corresponds to the index value ⁇ of that frame from among the stored weight values, and set the acquired weight value as the weight of the input sound signal of the other channel.
  • the sets stored in advance may be the same or different for the first and second channels.
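The function-based weighting described above for the index value ⁇ can be sketched as follows. The default weight functions (the index value itself for the own channel, and one minus the index value for the other channel) are assumptions made for illustration; the description only requires one increasing and one decreasing relationship.

```python
from typing import Callable, List, Sequence, Tuple


def mix_with_index(x1: Sequence[float],
                   x2: Sequence[float],
                   idx: float,
                   w_own: Callable[[float], float] = lambda a: a,
                   w_other: Callable[[float], float] = lambda a: 1.0 - a,
                   ) -> Tuple[List[float], List[float]]:
    """Weighted addition of each channel with the other channel for one frame.

    w_own is assumed monotonically increasing and w_other monotonically
    decreasing in idx; both defaults are hypothetical.
    """
    own, other = w_own(idx), w_other(idx)
    y1 = [own * a + other * b for a, b in zip(x1, x2)]
    y2 = [own * b + other * a for a, b in zip(x1, x2)]
    return y1, y2
```

Under these default weights an index value of 1 leaves both channels unchanged, and an index value of 0.5 turns both outputs into the same average of the two inputs.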
  • the signal mixing unit 120 to which the index value ⁇ ' is input obtains, for each channel, as the signal to be coded for that channel, a signal obtained by weighting and adding the input sound signal of that channel and the input sound signal of the other channel, where the weight of the input sound signal of that channel in the weighted addition is a value that has a monotonically decreasing relationship with the index value ⁇ ', and the weight of the input sound signal of the other channel in the weighted addition is the index value ⁇ ' itself or a value that has a monotonically increasing relationship with the index value ⁇ '.
  • a set of information specifying the index value ⁇ ' that belongs to each partial range and each weight value corresponding to each partial range that is predetermined so that the weight value has a monotonically decreasing relationship with the index value ⁇ ' may be stored in advance in the signal mixing unit 120 for each channel, and the signal mixing unit 120 may acquire, for each channel of each frame, the weight value that corresponds to the index value ⁇ ' of that frame from the stored weight values, and set the acquired weight value as the weight of the input sound signal of that channel.
  • the sets stored in advance may be the same or different for the first and second channels.
  • a value that has a monotonically increasing relationship with the index value ⁇ ' is, for example, a function value of a monotonically increasing function with the index value ⁇ ' as an argument. Therefore, for example, a monotonically increasing function for each channel is stored in advance in the signal mixing unit 120, and for each channel of each frame, the signal mixing unit 120 provides the index value ⁇ ' as an argument to the monotonically increasing function for that channel to obtain a function value, and sets the obtained function value as the weight of the input sound signal of the other channel.
  • the monotonically increasing function for the first channel and the monotonically increasing function for the second channel may be the same or different.
  • a set of information specifying the index value ⁇ ' that belongs to each partial range and each weight value corresponding to each partial range that is predetermined so that the weight value has a monotonically increasing relationship with the index value ⁇ ' may be stored in advance in the signal mixing unit 120 for each channel, and the signal mixing unit 120 may acquire, for each channel of each frame, the weight value that corresponds to the index value ⁇ ' of that frame from the stored weight values, and set the acquired weight value as the weight of the input sound signal of the other channel.
  • the sets stored in advance may be the same or different for the first and second channels.
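The table-based alternative, in which each partial range of the index value ⁇ ' maps to a stored weight, can be sketched with a sorted list of range boundaries. The boundaries and weight values below are hypothetical; the only property taken from the description is that the own-channel weights do not increase and the other-channel weights do not decrease as the index value grows.

```python
import bisect
from typing import Tuple

# Hypothetical table: upper bound of each partial range of the index value.
RANGE_UPPER_BOUNDS = [0.1, 0.2, 0.3, 0.4, 0.5]
# Weight of the own channel per range (non-increasing).
OWN_WEIGHTS = [1.0, 0.9, 0.8, 0.65, 0.5]
# Weight of the other channel per range (non-decreasing).
OTHER_WEIGHTS = [0.0, 0.1, 0.2, 0.35, 0.5]


def lookup_weights(idx_prime: float) -> Tuple[float, float]:
    """Return (own-channel weight, other-channel weight) for one frame."""
    k = bisect.bisect_left(RANGE_UPPER_BOUNDS, idx_prime)
    k = min(k, len(RANGE_UPPER_BOUNDS) - 1)  # clamp values above the last bound
    return OWN_WEIGHTS[k], OTHER_WEIGHTS[k]
```

A separate table could be kept per channel, as the description allows; a single shared table is used here only to keep the sketch short.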
  • the signal mixing unit 120 to which the index value ⁇ is input may, when the index value ⁇ is greater than a predetermined value, obtain, for each channel, the input sound signal of that channel as is as the signal to be coded for that channel, and in other cases, that is, when the index value ⁇ is equal to or less than the predetermined value, may obtain, for each channel, as the signal to be coded for that channel, a signal in which the input sound signal of that channel is mixed with the input sound signal of the other channel, where the larger the index value ⁇ , the closer the signal is to the input sound signal of that channel (step S120).
  • the signal mixing unit 120 may operate by replacing the previously described "greater than the predetermined value" and "equal to or less than the predetermined value" with "equal to or greater than the predetermined value" and "less than the predetermined value", respectively.
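The threshold behaviour for the index value ⁇ can be sketched as a bypass in front of the weighted addition. The threshold of 0.9 and the weights (the index value itself and one minus the index value) are hypothetical; the description leaves both the predetermined value and the exact weights open.

```python
from typing import List, Sequence, Tuple


def mix_or_bypass(x1: Sequence[float],
                  x2: Sequence[float],
                  idx: float,
                  threshold: float = 0.9,  # hypothetical predetermined value
                  ) -> Tuple[List[float], List[float]]:
    """Pass both channels through unchanged when idx exceeds the threshold."""
    if idx > threshold:
        return list(x1), list(x2)
    own, other = idx, 1.0 - idx  # assumed weights for the mixing case
    y1 = [own * a + other * b for a, b in zip(x1, x2)]
    y2 = [own * b + other * a for a, b in zip(x1, x2)]
    return y1, y2
```

Changing `idx > threshold` to `idx >= threshold` gives the variant in which the boundary value itself is treated as the pass-through case.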
  • the signal mixing unit 120 to which the index value ⁇ is input may obtain, for each channel, the input sound signal of that channel as is as the signal to be encoded for that channel when the index value ⁇ is in a first range, that is, the part of the range that the index value ⁇ can take in which the index value ⁇ is greater than a predetermined value (i.e., the first case in which the index value ⁇ is greater than the predetermined value), and, when the index value ⁇ is in the remaining second range, may obtain, for each channel, as the signal to be encoded for that channel, a signal obtained by weighted addition of the input sound signal of that channel and the input sound signal of the other channel, wherein the weight of the input sound signal of that channel in the weighted addition is the index value ⁇ itself or a value that is monotonically increasing with respect to the index value ⁇ in the second range, and the weight of the input sound signal of the other channel in the weighted addition is a value that is monotonically decreasing with respect to the index value ⁇ in the second range.
  • the signal mixing unit 120 may operate by replacing the previously mentioned "greater than a predetermined value” and "less than a predetermined value” with "greater than a predetermined value” and "
  • the signal mixing unit 120 to which the index value ⁇ ' is input may, when the index value ⁇ ' is smaller than a predetermined value, obtain, for each channel, the input sound signal of that channel as is as the signal to be coded for that channel, and in any other case, that is, when the index value ⁇ ' is equal to or greater than the predetermined value, may obtain, for each channel, as the signal to be coded for that channel, a signal in which the input sound signal of that channel is mixed with the input sound signal of the other channel, where the smaller the index value ⁇ ', the closer the signal is to the input sound signal of that channel (step S120).
  • the signal mixing unit 120 to which the index value ⁇ ' is input may obtain, for each channel, the input sound signal of that channel as is as the signal to be encoded for that channel when the index value ⁇ ' is in a first range, that is, the part of the range that the index value ⁇ ' can take in which the index value ⁇ ' is smaller than a predetermined value (i.e., the first case in which the index value ⁇ ' is smaller than the predetermined value), and, when the index value ⁇ ' is in the remaining second range, may obtain, for each channel, as the signal to be encoded for that channel, a signal obtained by weighted addition of the input sound signal of that channel and the input sound signal of the other channel, wherein the weight of the input sound signal of that channel in the weighted addition is a value that is monotonically decreasing with respect to the index value ⁇ ' in the second range, and the weight of the input sound signal of the other channel in the weighted addition is the index value ⁇ ' itself or a value that is monotonically increasing with respect to the index value ⁇ ' in the second range.
  • the signal mixing unit 120 may operate by replacing the previously mentioned "smaller than a predetermined value" and "greater than
  • the index value calculation unit 110 obtains an index value ⁇ that is 0.5 or more and 1 or less and satisfies two or more of the first condition, the second condition, and the third condition. Specifically, the index value calculation unit 110 obtains any one of the index value ⁇ that is 0.5 or more and 1 or less and satisfies the first condition and the second condition, the index value ⁇ that is 0.5 or more and 1 or less and satisfies the first condition and the third condition, the index value ⁇ that is 0.5 or more and 1 or less and satisfies the second condition and the third condition, and the index value ⁇ that is 0.5 or more and 1 or less and satisfies the first condition, the second condition, and the third condition.
  • the signal mixer 120 obtains, for each time t, the first-channel encoding target signal x'1 (t) represented by the above equation (2-7) and the second-channel encoding target signal x'2 (t) represented by the above equation (2-8).
  • the signal mixer 120 may, for each frame, take the index value ⁇ calculated by the index value calculation unit 110 for the immediately preceding frame as ⁇ p and the index value ⁇ calculated by the index value calculation unit 110 for the current frame as ⁇ c , set the value obtained by the above equation (2-9) as the index value ⁇ (t) for each time from the first time (i.e., the 1st time) to the T 0 -1th time of the current frame, and set ⁇ c as the index value ⁇ (t) for each time from the T 0th time to the last time (i.e., the Tth time) of the current frame, and may obtain the first-channel encoding target signal x' 1 (t) represented by the above equation (2-10) instead of the above equation (2-7) for each time t of the current frame, or may obtain the second-channel encoding target signal x' 2 (t) represented by the above equation (2-11) instead of the above equation (2
  • the index value calculation unit 110 obtains an index value ⁇ ' that is equal to or greater than 0 and equal to or less than 0.5 and satisfies two or more of the fourth, fifth, and sixth conditions.
  • the index value calculation unit 110 obtains any one of the index value ⁇ ' that is equal to or greater than 0 and equal to or less than 0.5 and satisfies the fourth and fifth conditions, the index value ⁇ ' that is equal to or greater than 0 and equal to or less than 0.5 and satisfies the fourth and sixth conditions, the index value ⁇ ' that is equal to or greater than 0 and equal to or less than 0.5 and satisfies the fifth and sixth conditions, and the index value ⁇ ' that is equal to or greater than 0 and equal to or less than 0.5 and satisfies the fourth, fifth, and sixth conditions.
  • the signal mixer 120 obtains, for each time t, the first-channel encoding target signal x'1 (t) expressed by the above equation (2-12), and obtains the second-channel encoding target signal x'2 (t) expressed by the above equation (2-13).
  • the signal mixer 120 may, for each frame, use the index value ⁇ ' calculated by the index value calculation unit 110 for the immediately preceding frame as ⁇ 'p and the index value ⁇ ' calculated by the index value calculation unit 110 for the current frame as ⁇ 'c , use the value obtained by the above equation (2-14) as the index value ⁇ '(t) for each time from the first time (i.e., the 1st time) to the T 0 -1th time of the current frame, and use ⁇ 'c as the index value ⁇ '(t) for each time from the T 0th time to the last time (i.e., the Tth time) of the current frame.
  • the signal mixer 120 may obtain the first-channel encoding target signal x' 1 (t) represented by the above equation (2-15) instead of the above equation (2-12), or may obtain the second-channel encoding target signal x' 2 (t) represented by the above equation (2-16) instead of the above equation (2-13).
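The per-time index value ⁇ '(t) described above changes gradually from the previous frame's value to the current frame's value over the first T 0 -1 times and then stays at the current value. The sketch below assumes a linear transition; the referenced equation (2-14) is not reproduced in this part of the description, so the interpolation formula here is a hypothetical stand-in, as are the function and parameter names.

```python
from typing import List


def per_sample_index(prev_idx: float, cur_idx: float,
                     frame_len: int, t0: int) -> List[float]:
    """Index value for times t = 1..frame_len of the current frame.

    For t < t0 a linear cross-fade from prev_idx toward cur_idx is assumed;
    from t = t0 to the last time the current frame's value is used as is.
    """
    values = []
    for t in range(1, frame_len + 1):  # times are 1-based, as in the text
        if t < t0:
            frac = t / t0
            values.append((1.0 - frac) * prev_idx + frac * cur_idx)
        else:
            values.append(cur_idx)
    return values
```

Smoothing the index value in this way avoids an abrupt change of mixing weights at the frame boundary; the weights applied to each sample would then be derived from the per-time value instead of the per-frame value.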
  • the fifth embodiment may be implemented by including a process of mixing two-channel stereo input sound signals to generate a downmix signal.
  • An embodiment including a process of generating a downmix signal will be described as a first modified example of the fifth embodiment.
  • the sound signal processing device 100 of the first modified example of the fifth embodiment is as shown by a dashed line, a dashed line, and a solid line in Fig. 5, and includes an index value calculation unit 110 and a signal mixing unit 120, and the signal mixing unit 120 includes a downmix signal generation unit 1201 and a mixing unit 1211.
  • the sound signal processing device 100 performs the process of step S110 and, as the process of step S120, the processes of steps S1201 and S1211.
  • the first modified example of the fifth embodiment will be described mainly with respect to the differences from the fifth embodiment.
  • index value calculation unit 110 receives a first channel input sound signal and a second channel input sound signal, which are input sound signals of two channels constituting the two-channel stereo input sound signal input to the sound signal processing device 100.
  • the index value calculation unit 110 calculates a value that satisfies two or more of the first condition, the second condition, and the third condition as the index value ⁇ , or calculates a value that satisfies two or more of the fourth condition, the fifth condition, and the sixth condition as the index value ⁇ ' (step S110).
  • the index value ⁇ or the index value ⁇ ' obtained by the index value calculation unit 110 is output to the signal mixing unit 120.
  • the input/output and operation of the downmix signal generation unit 1201 are the same as those of Modifications 2 and 3 of the second embodiment, Modifications 2 and 3 of the third embodiment, and Modification 1 of the fourth embodiment, and are as described in detail in Modification 2 of the second embodiment.
  • the downmix signal generation unit 1201 receives a first channel input sound signal and a second channel input sound signal, which are input sound signals of two channels constituting a two-channel stereo input sound signal input to the sound signal processing device 100.
  • the downmix signal generation unit 1201 mixes the first channel input sound signal and the second channel input sound signal to generate a downmix signal (step S1201).
  • the downmix signal obtained by the downmix signal generation unit 1201 is output to the mixer 1211.
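As a minimal sketch of step S1201, the downmix signal can be formed as a sample-wise average of the two input channels. Equal weights of 0.5 are an assumption made here for illustration; the detailed downmix procedure is given in Modification 2 of the second embodiment and is not restated in this part of the description.

```python
from typing import List, Sequence


def downmix(x1: Sequence[float], x2: Sequence[float]) -> List[float]:
    """Hypothetical downmix: sample-wise average of the two input channels."""
    if len(x1) != len(x2):
        raise ValueError("both channels must have the same number of samples")
    return [0.5 * (a + b) for a, b in zip(x1, x2)]
```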
  • the mixing unit 1211 receives, as input, a first channel input sound signal and a second channel input sound signal, which are input sound signals of two channels constituting the two-channel stereo input sound signal input to the sound signal processing device 100, the downmix signal output from the downmix signal generation unit 1201, and the index value ⁇ or the index value ⁇ ' output from the index value calculation unit 110.
  • the mixer 1211 to which the index value ⁇ is input obtains, for each of the first and second channels, as the signal to be coded for that channel, a signal obtained by mixing the input sound signal of that channel and the downmix signal, where the larger the index value ⁇ , the closer the signal is to the input sound signal of that channel (i.e., the smaller the index value ⁇ , the closer the signal is to the downmix signal) (step S1211).
  • the mixer 1211 to which the index value ⁇ ' is input obtains, for each of the first and second channels, as the signal to be coded for that channel, a signal obtained by mixing the input sound signal of that channel and the downmix signal, where the smaller the index value ⁇ ', the closer the signal is to the input sound signal of that channel (i.e., the larger the index value ⁇ ', the closer the signal is to the downmix signal) (step S1211).
  • the coding target signals of the two channels obtained by the mixer 1211 i.e., two-channel stereo coding target signals
  • the mixer 1211 to which the index value ⁇ is input obtains, for each channel, as the encoding target signal for that channel, a signal obtained by weighting and adding the input sound signal of that channel and the downmix signal, where the weight of the input sound signal of that channel in the weighted addition is the index value ⁇ itself or a value that has a monotonically increasing relationship with the index value ⁇ , and the weight of the downmix signal in the weighted addition is a value that has a monotonically decreasing relationship with the index value ⁇ .
  • the value that is in a monotonically increasing relationship with the index value ⁇ is, for example, a function value of a monotonically increasing function with the index value ⁇ as an argument. Therefore, for example, a monotonically increasing function for each channel is stored in the mixer 1211 in advance, and the mixer 1211 obtains a function value for each channel of each frame by providing the index value ⁇ as an argument to the monotonically increasing function for that channel, and sets the obtained function value as the weight of the input sound signal of that channel.
  • the monotonically increasing function for the first channel and the monotonically increasing function for the second channel may be the same or different.
  • a set of information that specifies the index value ⁇ that belongs to each partial range and each weight value that corresponds to each partial range that is predetermined so that the weight value has a monotonically increasing relationship with the index value ⁇ is stored in the mixer 1211 in advance for each channel, and the mixer 1211 obtains a weight value that corresponds to the index value ⁇ of the frame from the stored weight values for each channel of each frame, and sets the obtained weight value as the weight of the input sound signal of that channel.
  • Each set that is stored in advance may be the same or different for the first and second channels.
  • the value that is in a monotonically decreasing relationship with the index value ⁇ is, for example, a function value of a monotonically decreasing function with the index value ⁇ as an argument. Therefore, for example, a monotonically decreasing function for each channel may be stored in the mixer 1211 in advance, and the mixer 1211 may obtain a function value for each channel of each frame by providing the index value ⁇ as an argument to the monotonically decreasing function for that channel, and use the obtained function value as the weight of the downmix signal.
  • the monotonically decreasing function for the first channel and the monotonically decreasing function for the second channel may be the same or different.
  • a set of information that specifies the index value ⁇ that belongs to each partial range and each weight value corresponding to each partial range that is predetermined so that the weight value has a monotonically decreasing relationship with the index value ⁇ may be stored in the mixer 1211 in advance for each channel, and the mixer 1211 may obtain a weight value that corresponds to the index value ⁇ of the frame from the stored weight values for each channel of each frame, and use the obtained weight value as the weight of the downmix signal.
  • Each set that is stored in advance may be the same or different for the first and second channels.
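The weighted addition of each channel with the downmix signal can be sketched as follows for the index value ⁇ . The downmix is assumed to be a plain average of the two channels, the index value is assumed to lie between 0 and 1, and the weights (the index value itself for the input signal and one minus the index value for the downmix signal) are hypothetical choices consistent with the increasing and decreasing relationships described above.

```python
from typing import List, Sequence, Tuple


def mix_with_downmix(x1: Sequence[float],
                     x2: Sequence[float],
                     idx: float) -> Tuple[List[float], List[float]]:
    """Mix each input channel with an assumed average downmix for one frame."""
    xm = [0.5 * (a + b) for a, b in zip(x1, x2)]  # hypothetical downmix
    w_in, w_dm = idx, 1.0 - idx                   # assumed weights
    y1 = [w_in * a + w_dm * m for a, m in zip(x1, xm)]
    y2 = [w_in * b + w_dm * m for b, m in zip(x2, xm)]
    return y1, y2
```

With these weights an index value of 1 reproduces the input channels, and an index value of 0 makes both encoding target signals equal to the downmix signal.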
  • the mixer 1211 to which the index value ⁇ ' is input obtains, for each channel, as the signal to be coded for that channel, a signal obtained by weighting and adding the input sound signal of that channel and the downmix signal, where the weight of the input sound signal of that channel in the weighted addition is a value that has a monotonically decreasing relationship with the index value ⁇ ', and the weight of the downmix signal in the weighted addition is the index value ⁇ ' itself or a value that has a monotonically increasing relationship with the index value ⁇ '.
  • a value that has a monotonically decreasing relationship with the index value ⁇ ' is, for example, a function value of a monotonically decreasing function with the index value ⁇ ' as an argument. Therefore, for example, a monotonically decreasing function for each channel is stored in advance in the mixer 1211, and for each channel of each frame, the mixer 1211 obtains a function value by providing the index value ⁇ ' as an argument to the monotonically decreasing function for that channel, and sets the obtained function value as the weight of the input sound signal for that channel.
  • the monotonically decreasing function for the first channel and the monotonically decreasing function for the second channel may be the same or different.
  • a set of information specifying the index value ⁇ ' that belongs to each partial range and each weight value corresponding to each partial range that is predetermined so that the weight value has a monotonically decreasing relationship with the index value ⁇ ' may be stored in the mixer 1211 for each channel in advance, and the mixer 1211 may acquire, for each channel of each frame, a weight value that corresponds to the index value ⁇ ' of that frame from the stored weight values, and set the acquired weight value as the weight of the input sound signal of that channel.
  • the sets stored in advance may be the same or different for the first and second channels.
  • the value that has a monotonically increasing relationship with the index value ⁇ ' is, for example, the function value of a monotonically increasing function with the index value ⁇ ' as an argument. Therefore, for example, a monotonically increasing function for each channel is stored in advance in the mixer 1211, and for each channel of each frame, the mixer 1211 obtains a function value by providing the index value ⁇ ' as an argument to the monotonically increasing function for that channel, and sets the obtained function value as the weight of the downmix signal.
  • the monotonically increasing function for the first channel and the monotonically increasing function for the second channel may be the same or different.
  • a set of information specifying the index value ⁇ ' belonging to each partial range and each weight value corresponding to each partial range that is predetermined so that the weight value has a monotonically increasing relationship with the index value ⁇ ' may be stored in the mixer 1211 for each channel in advance, and the mixer 1211 may acquire, for each channel of each frame, a weight value corresponding to the index value ⁇ ' of the frame from among the stored weight values, and set the acquired weight value as the weight of the downmix signal.
  • the sets stored in advance may be the same or different for the first and second channels.
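The table-based variant described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the range edges, the stored weights, and the choice of making the input-signal weight equal to one minus the downmix weight are all assumptions, and the names `RANGE_EDGES`, `DOWNMIX_WEIGHTS`, `lookup_weight`, and `mix_frame` are hypothetical. The sketch takes the ⁇ ' convention, in which the downmix weight does not decrease as the index value grows.

```python
from bisect import bisect_right

# Hypothetical stored set: upper edges of the partial ranges of the index
# value, and one weight per partial range (assumed values, non-decreasing).
RANGE_EDGES = [0.25, 0.5, 0.75]          # splits [0, 1] into four partial ranges
DOWNMIX_WEIGHTS = [0.0, 0.3, 0.6, 1.0]   # weight of the downmix signal per range


def lookup_weight(index_value, edges=RANGE_EDGES, weights=DOWNMIX_WEIGHTS):
    """Return the stored weight of the partial range that contains index_value."""
    return weights[bisect_right(edges, index_value)]


def mix_frame(x, downmix, index_value):
    """Weighted addition of one channel's input signal and the downmix signal.

    Assumption: the two weights sum to one, so the input-signal weight is
    non-increasing whenever the downmix weight is non-decreasing.
    """
    w_dmx = lookup_weight(index_value)
    w_in = 1.0 - w_dmx
    return [w_in * a + w_dmx * b for a, b in zip(x, downmix)]
```

A separate table per channel can be passed through the `edges` and `weights` parameters of `lookup_weight`, which corresponds to the case where the stored sets differ between the first and second channels.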
  • the mixer 1211 to which the index value ⁇ is input may obtain, for each channel, the input sound signal of that channel as is as the signal to be coded for that channel if the index value ⁇ is greater than a predetermined value, and in other cases, i.e., when the index value ⁇ is equal to or less than the predetermined value, may obtain, for each channel, a signal obtained by mixing the input sound signal of that channel with the downmix signal such that the larger the index value ⁇ , the closer the signal is to the input sound signal of that channel (i.e., the smaller the index value ⁇ , the closer the signal is to the downmix signal), as the signal to be coded for that channel (step S1211).
  • the mixer 1211 may instead operate with the above-mentioned "greater than the predetermined value" and "equal to or less than the predetermined value" read as "equal to or greater than the predetermined value" and "less than the predetermined value", respectively.
  • the mixing unit 1211 to which the index value ⁇ is input may obtain, for each channel, the input sound signal of that channel as is as the signal to be encoded for that channel in a first range, which is the part of the range that the index value ⁇ can take in which the index value ⁇ is greater than a predetermined value (i.e., in the first case, in which the index value ⁇ is greater than the predetermined value), and may obtain, for each channel, in a second range, which is the part of the range that the index value ⁇ can take that is not the first range (i.e., in the second case, in which the index value ⁇ is equal to or less than the predetermined value), a signal obtained by weighted addition of the input sound signal of that channel and the downmix signal, in which the weight of the input sound signal of that channel is the index value ⁇ itself or a value that increases monotonically with the index value ⁇ in the second range, and the weight of the downmix signal is a value that decreases monotonically with the index value ⁇ in the second range, as the signal to be encoded for that channel.
  • the mixing unit 1211 may instead operate with the previously mentioned "greater than a predetermined value" and "equal to or less than a predetermined value" replaced by "equal to or greater than a predetermined value" and "less than a predetermined value", respectively.
  • the mixer 1211 to which the index value ⁇ is input may obtain, for each channel, the downmix signal as is as the encoding target signal for that channel when the index value ⁇ is smaller than a predetermined value, and in other cases, i.e., when the index value ⁇ is equal to or greater than the predetermined value, may obtain, for each channel, a signal obtained by mixing the input sound signal of that channel with the downmix signal such that the larger the index value ⁇ , the closer the signal is to the input sound signal of that channel (i.e., the smaller the index value ⁇ , the closer the signal is to the downmix signal), as the encoding target signal for that channel (step S1211).
  • the mixer 1211 may instead operate with the above-mentioned "smaller than the predetermined value" and "equal to or greater than the predetermined value" read as "equal to or less than the predetermined value" and "greater than the predetermined value", respectively.
  • the mixing unit 1211 to which the index value ⁇ is input may obtain, for each channel, the downmix signal as is as the signal to be encoded for that channel in a first range, which is the part of the range that the index value ⁇ can take in which the index value ⁇ is smaller than a predetermined value (i.e., in the first case, in which the index value ⁇ is smaller than the predetermined value), and may obtain, for each channel, in a second range, which is the part of the range that the index value ⁇ can take that is not the first range (i.e., in the second case, in which the index value ⁇ is equal to or greater than the predetermined value), a signal obtained by weighted addition of the input sound signal of that channel and the downmix signal, in which the weight of the input sound signal of that channel is the index value ⁇ itself or a value that increases monotonically with the index value ⁇ in the second range, and the weight of the downmix signal is a value that decreases monotonically with the index value ⁇ in the second range, as the signal to be encoded for that channel.
  • the mixing unit 1211 may operate by replacing the previously mentioned "smaller than a predetermined value" and "greater than or equal to a predetermined value” with "less than or equal to a predetermined
  • the mixing unit 1211 to which the index value ⁇ is input may obtain, for each channel, the input sound signal of that channel as is as the signal to be encoded for that channel if the index value ⁇ is greater than a predetermined first value, may obtain, for each channel, the downmix signal as is as the signal to be encoded for that channel if the index value ⁇ is equal to or less than a predetermined second value that is smaller than the predetermined first value, and in other cases, i.e., when the index value ⁇ is equal to or less than the predetermined first value and greater than the predetermined second value, may obtain, for each channel, a signal obtained by mixing the input sound signal of that channel with the downmix signal such that the larger the index value ⁇ , the closer the signal is to the input sound signal of that channel (i.e., the smaller the index value ⁇ , the closer the signal is to the downmix signal), as the signal to be encoded for that channel (step S1211).
  • the mixing unit 1211 may operate by replacing the previously mentioned “greater than a predetermined first value” and “less than or equal to a predetermined first value” with “greater than or equal to a predetermined first value” and “less than a predetermined first value”, respectively, and may operate by replacing the previously mentioned "greater than a predetermined second value” and “less than or equal to a predetermined second value” with “greater than or equal to a predetermined second value” and “less than a predetermined second value", respectively.
  • the mixing unit 1211 to which the index value ⁇ is input obtains, for each channel, the input sound signal of the channel as is as the encoding target signal for the channel in a first range, which is the part of the range that the index value ⁇ can take in which the index value ⁇ is greater than a predetermined first value (i.e., in the first case, where the index value ⁇ is greater than the predetermined first value), and obtains, for each channel, the downmix signal as is as the encoding target signal for the channel in a second range, which is the part of the range that the index value ⁇ can take in which the index value ⁇ is equal to or less than a predetermined second value smaller than the first value (i.e., in the second case, where the index value ⁇ is equal to or less than the predetermined second value). In a third range, which is neither the first range nor the second range (that is, in the third case, which is neither the first case nor the second case, specifically, when the index value ⁇ is equal to or less than the predetermined first value and greater than the predetermined second value), a signal obtained by weighted addition of the input sound signal of the channel and the downmix signal, in which the weight of the input sound signal of the channel is the index value ⁇ itself or a value that increases monotonically with the index value ⁇ in the third range, and the weight of the downmix signal is a value that decreases monotonically with the index value ⁇ in the third range, may be obtained as the encoding target signal of the channel.
  • the mixing unit 1211 may operate by replacing the previously mentioned “greater than a predetermined first value” and “less than or equal to a predetermined first value” with “greater than or equal to a predetermined first value” and “less than a predetermined first value”, respectively, and may operate by replacing the previously mentioned "greater than a predetermined second value” and “less than or equal to a predetermined second value” with “greater than or equal to a predetermined second value” and “less than a predetermined second value", respectively.
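The three-range behavior described above can be sketched as follows. The threshold values, the linear ramp used in the third range, and the names `mix_weights` and `encoding_target` are assumptions made for illustration; the text only requires that the input-signal weight increase monotonically, and the downmix weight decrease monotonically, with the index value in the third range.

```python
def mix_weights(index_value, first_value=0.8, second_value=0.2):
    """Return (input_weight, downmix_weight) for one channel of one frame.

    first_value and second_value are hypothetical thresholds with
    second_value < first_value.
    """
    if index_value > first_value:
        # First range: the input sound signal is used as is.
        return 1.0, 0.0
    if index_value <= second_value:
        # Second range: the downmix signal is used as is.
        return 0.0, 1.0
    # Third range: assumed linear ramp, so the input weight increases
    # monotonically with the index value and the downmix weight decreases.
    w_in = (index_value - second_value) / (first_value - second_value)
    return w_in, 1.0 - w_in


def encoding_target(x, downmix, index_value):
    """Weighted addition of one channel's input signal and the downmix signal."""
    w_in, w_dmx = mix_weights(index_value)
    return [w_in * a + w_dmx * b for a, b in zip(x, downmix)]
```

With the ramp chosen here the weights are continuous at the first threshold, so a small change in the index value near that threshold does not cause a jump in the encoding target signal. At the second threshold the weights jump only if the ramp does not reach zero there, which it does in this sketch.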
  • the mixer 1211 to which the index value ⁇ ' is input may obtain, for each channel, the input sound signal of that channel as is as the encoding target signal for that channel when the index value ⁇ ' is smaller than a predetermined value, and in other cases, i.e., when the index value ⁇ ' is equal to or greater than the above-mentioned predetermined value, may obtain, for each channel, a signal obtained by mixing the input sound signal of that channel with the downmix signal, in which the smaller the index value ⁇ ' is, the closer the signal is to the input sound signal of that channel (i.e., the larger the index value ⁇ ' is, the closer the signal is to the downmix signal) as the encoding target signal for that channel (step S1211).
  • the mixer 1211 may instead operate with the above-mentioned "smaller than the predetermined value" and "equal to or greater than the predetermined value" read as "equal to or less than the predetermined value" and "greater than the predetermined value", respectively.
  • the mixing unit 1211 to which the index value ⁇ ' is input may obtain, for each channel, the input sound signal of that channel as is as the signal to be encoded for that channel in a first range, which is the part of the range that the index value ⁇ ' can take in which the index value ⁇ ' is smaller than a predetermined value (i.e., in the first case, in which the index value ⁇ ' is smaller than the predetermined value), and may obtain, for each channel, in a second range, which is the part of the range that the index value ⁇ ' can take that is not the first range (i.e., in the second case, in which the index value ⁇ ' is equal to or greater than the predetermined value), a signal obtained by weighted addition of the input sound signal of that channel and the downmix signal, where the weight of the input sound signal of that channel is a value that decreases monotonically with the index value ⁇ ' in the second range, and the weight of the downmix signal is the index value ⁇ ' itself or a value that increases monotonically with the index value ⁇ ' in the second range, as the signal to be encoded for that channel.
  • the mixing unit 1211 may operate by replacing the previously mentioned "smaller than a predetermined value" and "greater than or equal to a
  • the mixer 1211 to which the index value ⁇ ' is input may obtain, for each channel, the downmix signal as is as the encoding target signal for that channel when the index value ⁇ ' is greater than a predetermined value, and in other cases, i.e., when the index value ⁇ ' is equal to or less than the predetermined value, may obtain, for each channel, a signal obtained by mixing the input sound signal of that channel with the downmix signal such that the smaller the index value ⁇ ' is, the closer the signal is to the input sound signal of that channel (i.e., the larger the index value ⁇ ' is, the closer the signal is to the downmix signal), as the encoding target signal for that channel (step S1211).
  • the mixer 1211 may instead operate with the above-mentioned "greater than the predetermined value" and "equal to or less than the predetermined value" read as "equal to or greater than the predetermined value" and "less than the predetermined value", respectively.
  • the mixing unit 1211 to which the index value ⁇ ' is input may obtain, for each channel, the downmix signal as is as the signal to be encoded for that channel in a first range, which is the part of the range that the index value ⁇ ' can take in which the index value ⁇ ' is greater than a predetermined value (i.e., in the first case, in which the index value ⁇ ' is greater than the predetermined value), and may obtain, for each channel, in a second range, which is the part of the range that the index value ⁇ ' can take that is not the first range (i.e., in the second case, in which the index value ⁇ ' is equal to or less than the predetermined value), a signal obtained by weighted addition of the input sound signal of that channel and the downmix signal, where the weight of the input sound signal of that channel is a value that decreases monotonically with the index value ⁇ ' in the second range, and the weight of the downmix signal is the index value ⁇ ' itself or a value that increases monotonically with the index value ⁇ ' in the second range, as the signal to be encoded for that channel.
  • the mixing unit 1211 may operate by replacing the previously mentioned "greater than a specified value” and "less than a specified value” with "greater than
  • the mixing unit 1211 to which the index value ⁇ ' is input may obtain, for each channel, the input sound signal of that channel as is as the signal to be encoded for that channel if the index value ⁇ ' is smaller than a predetermined first value, may obtain, for each channel, the downmix signal as is as the signal to be encoded for that channel if the index value ⁇ ' is equal to or greater than a predetermined second value that is greater than the predetermined first value, and in other cases, i.e., when the index value ⁇ ' is equal to or greater than the predetermined first value and smaller than the predetermined second value, may obtain, for each channel, a signal obtained by mixing the input sound signal of that channel with the downmix signal such that the smaller the index value ⁇ ' is, the closer the signal is to the input sound signal of that channel (i.e., the larger the index value ⁇ ' is, the closer the signal is to the downmix signal), as the signal to be encoded for that channel (step S1211).
  • the mixing unit 1211 may instead operate with the previously mentioned "smaller than a predetermined first value" and "greater than or equal to a predetermined first value" replaced by "less than or equal to a predetermined first value" and "greater than a predetermined first value", respectively, and with the previously mentioned "smaller than a predetermined second value" and "greater than or equal to a predetermined second value" replaced by "less than or equal to a predetermined second value" and "greater than a predetermined second value", respectively.
  • the mixer 1211 to which the index value ⁇ ' is input obtains, for each channel, the input sound signal of the channel as is as the signal to be coded for the channel in a first range, which is the part of the range that the index value ⁇ ' can take in which the index value ⁇ ' is smaller than a predetermined first value (i.e., in the first case, where the index value ⁇ ' is smaller than the predetermined first value), and obtains, for each channel, the downmix signal as is as the signal to be coded for the channel in a second range, which is the part of the range that the index value ⁇ ' can take in which the index value ⁇ ' is equal to or greater than a predetermined second value larger than the first value (i.e., in the second case, where the index value ⁇ ' is equal to or greater than the predetermined second value). In a third range, which is neither the first range nor the second range (that is, in the third case, which is neither the first case nor the second case, specifically, when the index value ⁇ ' is equal to or greater than the predetermined first value and smaller than the predetermined second value), a signal obtained by weighted addition of the input sound signal of the channel and the downmix signal, in which the weight of the input sound signal of the channel is a value that decreases monotonically with the index value ⁇ ' in the third range, and the weight of the downmix signal is the index value ⁇ ' itself or a value that increases monotonically with the index value ⁇ ' in the third range, may be obtained as the encoding target signal of the channel.
  • the mixing unit 1211 may instead operate with the previously mentioned "smaller than a predetermined first value" and "greater than or equal to a predetermined first value" replaced by "less than or equal to a predetermined first value" and "greater than a predetermined first value", respectively, and with the previously mentioned "smaller than a predetermined second value" and "greater than or equal to a predetermined second value" replaced by "less than or equal to a predetermined second value" and "greater than a predetermined second value", respectively.
  • the index value calculation unit 110 obtains an index value ⁇ that is 0 or more and 1 or less and satisfies two or more of the first condition, the second condition, and the third condition. Specifically, the index value calculation unit 110 obtains any one of the index value ⁇ that is 0 or more and 1 or less and satisfies the first condition and the second condition, the index value ⁇ that is 0 or more and 1 or less and satisfies the first condition and the third condition, the index value ⁇ that is 0 or more and 1 or less and satisfies the second condition and the third condition, and the index value ⁇ that is 0 or more and 1 or less and satisfies the first condition, the second condition, and the third condition.
  • y be the index value of the single sound source-likeness of the two-channel stereo input sound signal obtained in any of steps S110-C1-B6', let the value expressed by the following equation (5-1) using y be u, let the value expressed by the following equation (5-2) using bias, range, and u be v, let the value expressed by the following equation (5-3) using the absolute value
  • the index value calculation unit 110 may obtain w expressed by the following equation (5-6) when the inter-channel time difference ITD is greater than 0 or equal to or greater than 0, and obtain w expressed by the following equation (5-7) in cases other than the above, i.e., when the inter-channel time difference ITD is less than or equal to 0, and may define the value expressed by the following equation (5-8) as u, define the value expressed by the above equation (5-2) using bias, range, and u as v, and obtain the value expressed by the above equation (5-3) using the absolute value of the inter-channel time difference
  • the index value calculation unit 110 may obtain the value v expressed by the above formula (5-2) as the index value ⁇ that satisfies the first and third conditions.
  • the mixer 1211 obtains, for each time t, the first-channel encoding target signal x' 1 (t) expressed by the above equation (2-23), and obtains the second-channel encoding target signal x' 2 (t) expressed by the above equation (2-24).
  • the mixer 1211 may, for each frame, set the index value ⁇ calculated by the index value calculation unit 110 for the immediately preceding frame as ⁇ p and the index value ⁇ calculated by the index value calculation unit 110 for the current frame as ⁇ c , set the value obtained by the above equation (2-25) as the index value ⁇ (t) for each time from the first time (i.e., the 1st time) to the T 0 -1th time of the current frame, and set ⁇ c as the index value ⁇ (t) for each time from the T 0th time to the last time (i.e., the Tth time) of the current frame, and obtain the first-channel-coding-target signal x' 1 (t) represented by the above equation (2-26) instead of the above equation (2-23) for each time t of the current frame, or may obtain the second-channel-coding-target signal x' 2 (t) represented by the above equation (2-27) instead of the above equation (2-24).
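The per-time index value described above can be sketched as follows. Equations (2-25) to (2-27) are not reproduced in this part of the text, so the sketch rests on two assumptions: a linear interpolation from the previous frame's index value to the current frame's index value over the first T 0 -1 times, and an encoding target signal formed as the index-weighted sum of the input sound signal and the downmix signal. The names `per_sample_index` and `mix_with_ramp` are hypothetical.

```python
def per_sample_index(prev_value, cur_value, frame_len, ramp_len):
    """Index value for each time t = 1..frame_len of the current frame.

    For t < ramp_len the value is interpolated between the previous and the
    current frame's index value (assumed linear form); from t = ramp_len on,
    the current frame's index value is used unchanged.
    """
    values = []
    for t in range(1, frame_len + 1):
        if t < ramp_len:
            values.append(prev_value + (cur_value - prev_value) * t / ramp_len)
        else:
            values.append(cur_value)
    return values


def mix_with_ramp(x, downmix, prev_value, cur_value, ramp_len):
    """Per-time weighted addition (assumed form): idx * input + (1 - idx) * downmix."""
    idx = per_sample_index(prev_value, cur_value, len(x), ramp_len)
    return [a * xi + (1.0 - a) * di for a, xi, di in zip(idx, x, downmix)]
```

The point of such a ramp is that the mixing weights change gradually across a frame boundary instead of switching abruptly when the index value differs between consecutive frames.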
  • the index value calculation unit 110 obtains an index value ⁇ ' that is 0 or more and 1 or less and satisfies two or more of the fourth, fifth, and sixth conditions. Specifically, the index value calculation unit 110 obtains any one of the index value ⁇ ' that is 0 or more and 1 or less and satisfies the fourth and fifth conditions, the index value ⁇ ' that is 0 or more and 1 or less and satisfies the fourth and sixth conditions, the index value ⁇ ' that is 0 or more and 1 or less and satisfies the fifth and sixth conditions, and the index value ⁇ ' that is 0 or more and 1 or less and satisfies the fourth, fifth, and sixth conditions.
  • the mixer 1211 obtains, for each time t, the first-channel encoding target signal x' 1 (t) expressed by the above equation (2-28) and the second-channel encoding target signal x' 2 (t) expressed by the above equation (2-29).
  • the mixer 1211 may obtain, for each frame, the first-channel encoding target signal x' 1 ( t) represented by the above equation (2-31) instead of the above equation (2-28) or the second-channel encoding target signal x' 2 ( t ) represented by the above equation (2-32) instead of the above equation (2-29), using, for each frame, the index value ⁇ ' calculated by the index value calculation unit 110 for the immediately preceding frame as ⁇ ' p and the index value ⁇ ' calculated by the index value calculation unit 110 for the current frame as ⁇ ' c , and may use the value obtained by the above equation (2-30) as the index value ⁇ '(t) for each time from the first time (i.e., the 1st time) to the T 0 -1th time of the current frame, and may use ⁇ ' c as the index value ⁇ '(t) for each time from the T 0th time to the last time (i.e.
  • the downmix signal generating unit 1201 receives a first channel input sound signal and a second channel input sound signal, which are input sound signals of two channels constituting the two-channel stereo input sound signal input to the sound signal processing device 100.
  • the downmix signal generating unit 1201 generates a signal obtained by weighting and adding the first channel input sound signal and the second channel input sound signal so that the input sound signal of the preceding channel out of the first channel input sound signal and the second channel input sound signal is included to a greater extent the greater the correlation between the first channel input sound signal and the second channel input sound signal (step S1201).
  • the downmix signal generating unit 1201 obtains the downmix signal by performing each of the following processes.
  • the downmix signal generating unit 1201 first performs the same processing as step S110-A1 of the first example of the method in which the index value calculating unit 110 of the third embodiment calculates the absolute value
  • ⁇ cand is a value representing the magnitude of correlation between a sample sequence of a first channel input sound signal and a sample sequence of a second channel input sound signal that is shifted backward from the sample sequence by each candidate sample number ⁇ cand .
  • the downmix signal generation unit 1201 does not need to perform processing to obtain ⁇ cand . As indicated by the two-dot chain line in FIG. 5 , it is sufficient that the ⁇ cand obtained by the index value calculation unit 110 is input to the downmix signal generation unit 1201, and the downmix signal generation unit 1201 uses the input ⁇ cand .
  • the downmix signal generating unit 1201 then obtains the maximum value ⁇ of ⁇ cand .
  • if the candidate sample number ⁇ cand at which the correlation value ⁇ cand takes the maximum value ⁇ is a positive value, the downmix signal generating unit 1201 obtains information indicating that the first channel is leading as the leading channel information; if that candidate sample number ⁇ cand is a negative value, the downmix signal generating unit 1201 obtains information indicating that the second channel is leading as the leading channel information.
  • if that candidate sample number ⁇ cand is 0, the downmix signal generating unit 1201 may obtain information indicating that neither channel is leading as the leading channel information, but may instead obtain information indicating that the first channel is leading, or information indicating that the second channel is leading, as the leading channel information.
  • the leading channel information is information that corresponds to whether the sound emitted by the main sound source in a space reaches the first channel microphone placed in that space first, or the second channel microphone placed in that space first.
  • the leading channel information is information that indicates whether the same sound signal is contained first in the first channel input sound signal or the second channel input sound signal. If the same sound signal is contained first in the first channel input sound signal, it is said that the first channel is leading, and if the same sound signal is contained first in the second channel input sound signal, it is said that the second channel is leading.
  • the leading channel information is information that indicates whether the first channel or the second channel is leading.
  • the downmix signal generating unit 1201 then generates a downmix signal that is a weighted addition of the first channel input sound signal and the second channel input sound signal, such that the input sound signal of the preceding channel out of the first channel input sound signal and the second channel input sound signal is included to a greater extent the greater the correlation between the first channel input sound signal and the second channel input sound signal.
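The downmix generation described above can be sketched as follows. The exact correlation measure and the exact weights are not given in this part of the text, so several choices here are assumptions: the absolute normalized cross-correlation as the correlation value, a search over integer lags from -max_lag to max_lag, and weights of (1 + c)/2 for the leading channel and (1 - c)/2 for the other channel, where c is the maximum correlation value. The names `normalized_correlation` and `leading_channel_downmix` are hypothetical.

```python
import math


def normalized_correlation(x1, x2, lag):
    """Absolute normalized correlation between x1(t) and x2(t + lag)."""
    if lag >= 0:
        a, b = x1[:len(x1) - lag], x2[lag:]
    else:
        a, b = x1[-lag:], x2[:len(x2) + lag]
    num = sum(p * q for p, q in zip(a, b))
    den = math.sqrt(sum(p * p for p in a) * sum(q * q for q in b))
    return abs(num) / den if den > 0.0 else 0.0


def leading_channel_downmix(x1, x2, max_lag):
    """Return (leading, downmix); leading is 1, 2, or 0 (neither channel)."""
    best_lag, best_corr = 0, -1.0
    # Visit lags in order of increasing magnitude so ties favour small lags.
    for lag in sorted(range(-max_lag, max_lag + 1), key=abs):
        c = normalized_correlation(x1, x2, lag)
        if c > best_corr:
            best_lag, best_corr = lag, c
    if best_lag > 0:
        leading, w1 = 1, (1.0 + best_corr) / 2.0   # first channel leads
    elif best_lag < 0:
        leading, w1 = 2, (1.0 - best_corr) / 2.0   # second channel leads
    else:
        leading, w1 = 0, 0.5                       # neither channel leads
    w2 = 1.0 - w1
    return leading, [w1 * a + w2 * b for a, b in zip(x1, x2)]
```

With these assumed weights, a correlation of 0 gives a plain average of the two channels, and a correlation of 1 gives the leading channel alone, which matches the stated requirement that the leading channel be included to a greater extent as the correlation grows.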
  • each part of the above-mentioned system and each device may be realized by a computer, in which case the processing contents of the functions that each device should have are described by a program. Then, by loading this program into the storage unit 2020 of the computer 2000 shown in Fig. 9 and operating the arithmetic processing unit 2010, the input unit 2030, the output unit 2040, etc., various processing functions of the above-mentioned system and each of the above-mentioned devices are realized on the computer.
  • the system and device of the present invention, as a single hardware entity, have, for example, an input unit capable of inputting signals from outside the hardware entity, an output unit capable of outputting signals to outside the hardware entity, a communication unit to which a communication device (e.g., a communication cable) capable of communicating with the outside of the hardware entity can be connected, a CPU (Central Processing Unit, which may also have cache memory, registers, etc.), memories such as RAM and ROM, an external storage device such as a hard disk, and buses connecting the input unit, output unit, communication unit, CPU, RAM, ROM, and external storage device so that data can be exchanged between them.
  • the hardware entity may also be provided with a device (drive) capable of reading and writing recording media such as a CD-ROM.
  • An example of a physical entity equipped with such hardware resources is a general-purpose computer.
  • the external storage device of the hardware entity stores the programs required to realize the above-mentioned functions and the data required in the processing of these programs (not limited to an external storage device, the programs may be stored in a ROM, which is a read-only storage device, for example). Data obtained by the processing of these programs is stored appropriately in the RAM, the external storage device, etc.
  • in the hardware entity, each program stored in the external storage device (or ROM, etc.) and the data required to process each program are loaded into memory as necessary, and are interpreted, executed, and processed by the CPU as appropriate. As a result, the CPU realizes the specified functions (the components represented above as "... unit," "... means," etc.).
  • each component of an embodiment of the present invention may be configured by a processing circuit.
  • the program describing this processing can be recorded on a computer-readable recording medium.
  • a computer-readable recording medium is, for example, a non-transitory recording medium, specifically, a magnetic recording device, an optical disk, etc.
  • the program may be distributed, for example, by selling, transferring, lending, etc. portable recording media such as DVDs and CD-ROMs on which the program is recorded. Furthermore, the program may be distributed by storing the program in a storage device of a server computer and transferring the program from the server computer to other computers via a network.
  • a computer that executes such a program for example, first stores the program recorded on a portable recording medium or the program transferred from a server computer in its own non-transient storage device, auxiliary storage unit 2050. Then, when executing processing, the computer loads the program stored in its own non-transient storage device, auxiliary storage unit 2050, into storage unit 2020, and executes processing according to the loaded program. As another execution form of this program, the computer may load the program directly from a portable recording medium into storage unit 2020 and execute processing according to the program, or, each time a program is transferred to this computer from the server computer, the computer may execute processing according to the received program.
  • the server computer may not transfer the program to this computer, but may instead execute the above-mentioned processing using a so-called ASP (Application Service Provider) type service that realizes processing functions only by issuing execution instructions and obtaining results.
  • the program includes information used for processing by an electronic computer that is equivalent to a program (such as data that is not a direct command to a computer but has properties that dictate computer processing).
  • system and device are configured by executing a specific program on a computer, but at least a portion of the processing content may be realized by hardware.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Signal Processing (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Computational Linguistics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Mathematical Physics (AREA)
  • Transmission Systems Not Characterized By The Medium Used For Transmission (AREA)
  • Stereophonic System (AREA)
  • Stereo-Broadcasting Methods (AREA)
PCT/JP2022/048528 2022-12-28 2022-12-28 Sound signal processing device, sound signal processing method, and program Ceased WO2024142357A1 (ja)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/JP2022/048528 WO2024142357A1 (ja) 2022-12-28 2022-12-28 Sound signal processing device, sound signal processing method, and program
JP2024567128A JPWO2024142357A1 2022-12-28 2022-12-28

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2022/048528 WO2024142357A1 (ja) 2022-12-28 2022-12-28 Sound signal processing device, sound signal processing method, and program

Publications (1)

Publication Number Publication Date
WO2024142357A1 WO2024142357A1 (ja) 2024-07-04

Family

ID=91716847

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2022/048528 Ceased WO2024142357A1 (ja) Sound signal processing device, sound signal processing method, and program 2022-12-28 2022-12-28

Country Status (2)

Country Link
JP (1) JPWO2024142357A1
WO (1) WO2024142357A1

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
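JPH1132399A (ja) * 1997-05-13 1999-02-02 Sony Corp Coding method and apparatus, and recording medium
WO2010016270A1 (ja) * 2008-08-08 2010-02-11 Panasonic Corporation Quantizing device, encoding device, quantizing method, and encoding method
WO2010140350A1 (ja) * 2009-06-02 2010-12-09 Panasonic Corporation Down-mixing device, encoding device, and methods therefor
JP2013033189A (ja) * 2011-07-01 2013-02-14 Sony Corp Audio encoding device, audio encoding method, and program
JP2018533056A (ja) * 2015-09-25 2018-11-08 VoiceAge Corporation Method and system using a long-term correlation difference between left and right channels for time-domain down-mixing a stereo sound signal into primary and secondary channels
WO2022097244A1 (ja) * 2020-11-05 2022-05-12 Nippon Telegraph and Telephone Corporation Sound signal high-frequency compensation method, sound signal post-processing method, sound signal decoding method, devices therefor, program, and recording medium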

Also Published As

Publication number Publication date
JPWO2024142357A1 2024-07-04

Similar Documents

Publication Publication Date Title
US10607629B2 (en) Methods and apparatus for decoding based on speech enhancement metadata
US8532999B2 (en) Apparatus and method for generating a multi-channel synthesizer control signal, multi-channel synthesizer, method of generating an output signal from an input signal and machine-readable storage medium
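JP7517461B2 (ja) Sound signal high-frequency compensation method, sound signal post-processing method, sound signal decoding method, devices therefor, program, and recording medium
JP7544139B2 (ja) Sound signal high-frequency compensation method, sound signal post-processing method, sound signal decoding method, devices therefor, program, and recording medium
JP7517459B2 (ja) Sound signal high-frequency compensation method, sound signal post-processing method, sound signal decoding method, devices therefor, program, and recording medium
JP7491393B2 (ja) Sound signal refinement method, sound signal decoding method, devices therefor, program, and recording medium
JP7537512B2 (ja) Sound signal refinement method, sound signal decoding method, devices therefor, program, and recording medium
JP7491394B2 (ja) Sound signal refinement method, sound signal decoding method, devices therefor, program, and recording medium
JP7537511B2 (ja) Sound signal refinement method, sound signal decoding method, devices therefor, program, and recording medium
JP7491395B2 (ja) Sound signal refinement method, sound signal decoding method, devices therefor, program, and recording medium
JP7517460B2 (ja) Sound signal high-frequency compensation method, sound signal post-processing method, sound signal decoding method, devices therefor, program, and recording medium
JP7517458B2 (ja) Sound signal high-frequency compensation method, sound signal post-processing method, sound signal decoding method, devices therefor, program, and recording medium
JP2026001181A (ja) Sound signal downmixing method, sound signal downmixing device, and program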
US20250149047A1 (en) Downmixer and Method of Downmixing
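WO2024142357A1 (ja) Sound signal processing device, sound signal processing method, and program
WO2024142359A1 (ja) Sound signal processing device, sound signal processing method, and program
WO2024142358A1 (ja) Sound signal processing device, sound signal processing method, and program
WO2024142360A1 (ja) Sound signal processing device, sound signal processing method, and program
JP7380837B2 (ja) Sound signal coding method, sound signal decoding method, sound signal coding device, sound signal decoding device, program, and recording medium
JP7521595B2 (ja) Sound signal refinement method, sound signal decoding method, devices therefor, program, and recording medium
JP7521596B2 (ja) Sound signal refinement method, sound signal decoding method, devices therefor, program, and recording medium
JP7380838B2 (ja) Sound signal coding method, sound signal decoding method, sound signal coding device, sound signal decoding device, program, and recording medium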

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 22970145; Country of ref document: EP; Kind code of ref document: A1)
WWE WIPO information: entry into national phase (Ref document number: 2024567128; Country of ref document: JP)
NENP Non-entry into the national phase (Ref country code: DE)
122 EP: PCT application non-entry in European phase (Ref document number: 22970145; Country of ref document: EP; Kind code of ref document: A1)