EP3820161A1

EP3820161A1 - Audio signal processing device and method, impulse response generation device and method, and program

Info

Publication number: EP3820161A1
Application number: EP19831112.8A
Authority: EP
Inventors: Takao Fukui
Original assignee: Sony Corp
Current assignee: Sony Group Corp
Priority date: 2018-07-04
Filing date: 2019-06-20
Publication date: 2021-05-12
Also published as: WO2020008889A1; JP7359146B2; EP3820161A4; JPWO2020008889A1

Abstract

The present technology relates to an audio signal processing device and method, an impulse response generation device and method, and a program that enable a desired phase characteristic to be obtained. The audio signal processing device includes an acquisition part that acquires an impulse response having a flat or substantially flat amplitude characteristic and a predetermined phase characteristic, and a phase characteristic convolution part that convolves the impulse response into an input audio signal. The present technology can be applied to an audio signal processing device and an impulse response generation device.

Description

TECHNICAL FIELD

The present technology relates to an audio signal processing device and method, an impulse response generation device and method, and a program, and particularly to an audio signal processing device and method, an impulse response generation device and method, and a program that enable a desired phase characteristic to be obtained.

BACKGROUND ART

For example, in a case of reproducing audio such as music, there is a known technology of applying an effect to the music or the like to be reproduced by performing filter processing on an audio signal.
As such a technology, for example, a technology of changing an amplitude characteristic of an audio signal so as to bring about a bass enhancement effect by combining a plurality of filters has been proposed (see, for example, Patent Document 1).

CITATION LIST

PATENT DOCUMENT

Patent Document 1: Japanese Patent Application Laid-Open No. 2002-171589

SUMMARY OF THE INVENTION

PROBLEMS TO BE SOLVED BY THE INVENTION

By the way, in recent years, the number of users who reproduce and listen to music with headphones instead of speakers is increasing, and audio reproduction with headphones is becoming mainstream.
On the other hand, most commercial content is mastered on speakers. Therefore, there are complaints that, when content such as music mastered while being reproduced on speakers is reproduced on headphones, the bass cannot be heard with the same sense of volume as when reproduced on the speakers. That is, when the content is reproduced with headphones, the way the low frequencies are heard differs from that of speaker reproduction, and it may not be possible to realize reproduction with the sound quality that the creator originally wanted to convey.
Therefore, the applicant investigated and found that the way the low frequencies were heard was greatly affected by a phase characteristic. That is, it was found that one of the causes is that a low-frequency phase characteristic of the speaker and a low-frequency phase characteristic of the headphones are significantly different.
However, with the above-mentioned technology, although the amplitude characteristic of the audio signal can be adjusted, it is not possible to make the phase characteristic of the audio signal a desired characteristic.
The present technology has been made in view of such circumstances, and is intended to enable a desired phase characteristic to be obtained.

SOLUTIONS TO PROBLEMS

An audio signal processing device of a first aspect of the present technology includes an acquisition part that acquires an impulse response having a flat or substantially flat amplitude characteristic and a predetermined phase characteristic, and a phase characteristic convolution part that convolves the impulse response into an input audio signal.
An audio signal processing method or program of the first aspect of the present technology includes steps of acquiring an impulse response having a flat or substantially flat amplitude characteristic and a predetermined phase characteristic, and convolving the impulse response into an input audio signal.
In the first aspect of the present technology, an impulse response having a flat or substantially flat amplitude characteristic and a predetermined phase characteristic is acquired, and the impulse response is convolved into an input audio signal.
An impulse response generation device of a second aspect of the present technology generates a target characteristic impulse response having a flat or substantially flat amplitude characteristic and a predetermined phase characteristic.
An impulse response generation method or program of the second aspect of the present technology includes a step of generating a target characteristic impulse response having a flat or substantially flat amplitude characteristic and a predetermined phase characteristic.
In the second aspect of the present technology, a target characteristic impulse response having a flat or substantially flat amplitude characteristic and a predetermined phase characteristic is generated.

EFFECTS OF THE INVENTION

According to a first aspect and a second aspect of the present technology, a desired phase characteristic can be obtained.
Note that the effects described herein are not necessarily limited, and any of the effects described in the present disclosure may be applied.

BRIEF DESCRIPTION OF DRAWINGS

Fig. 1 is a diagram for explaining a relationship between a frequency characteristic and an impulse response.
Fig. 2 is a diagram for explaining reconstruction of the impulse response.
Fig. 3 is a diagram showing a frequency characteristic of a reconstructed impulse response.
Fig. 4 is a diagram for explaining reconstruction of the impulse response.
Fig. 5 is a diagram showing a frequency characteristic of a reconstructed impulse response.
Fig. 6 is a diagram for explaining reconstruction of the impulse response.
Fig. 7 is a diagram showing a frequency characteristic of a reconstructed impulse response.
Fig. 8 is a diagram showing a configuration example of an impulse response generation device.
Fig. 9 is a flowchart for explaining impulse response generation processing.
Fig. 10 is a diagram showing a configuration example of an impulse response generation device.
Fig. 11 is a flowchart for explaining impulse response generation processing.
Fig. 12 is a diagram for explaining content mastering.
Fig. 13 is a diagram showing a configuration example of a reproduction device.
Fig. 14 is a flowchart for explaining reproduction processing.
Fig. 15 is a diagram showing a configuration example of a reproduction device.
Fig. 16 is a flowchart for explaining reproduction processing.
Fig. 17 is a diagram showing a configuration example of a reproduction device.
Fig. 18 is a flowchart for explaining reproduction processing.
Fig. 19 is a diagram showing a configuration example of a reproduction device.
Fig. 20 is a diagram showing a configuration example of a computer.

MODE FOR CARRYING OUT THE INVENTION

Hereinafter, embodiments to which the present technology is applied will be described with reference to the drawings.

The present technology enables only a phase characteristic to be adjusted without changing an amplitude characteristic (gain characteristic) of an audio signal.
That is, the present technology enables only a phase characteristic to be adjusted while maintaining an amplitude characteristic of an audio signal by generating an impulse response having a flat or substantially flat amplitude characteristic and a desired phase characteristic, so that a desired phase characteristic can be obtained.
In the present technology, by performing fast Fourier transform (FFT) and inverse fast Fourier transform (IFFT) on an impulse response having a target phase characteristic, an impulse response having a flat or substantially flat amplitude characteristic and a desired phase characteristic can be obtained. Here, the amplitude characteristic being flat or substantially flat means that, for example, a value of amplitude (gain) at each frequency of the amplitude characteristic is 1 or substantially 1.
Specifically, in the present technology, a target impulse response is generated by a method A1 or method A2 below.
That is, in the method A1, first, 0 data (zero data) of an appropriate length is inserted before an impulse response for which a phase is to be simulated, and fast Fourier transform (FFT) is performed.
An amplitude characteristic and a phase characteristic can be obtained by such an FFT, the amplitude (gain) value at each frequency of the amplitude characteristic is set to 1 so that the amplitude characteristic becomes flat, and inverse fast Fourier transform (IFFT) is performed on the basis of the flat amplitude characteristic and the phase characteristic obtained by the FFT. Then, the subsequent stage of the impulse response obtained by the IFFT is fade-processed with an appropriate time constant to obtain a desired impulse response.
The impulse response obtained as described above functions as an infinite impulse response (IIR) filter that changes only the phase characteristic while maintaining the amplitude characteristic. Therefore, only the phase characteristic can be adjusted by convolving such an impulse response into the audio signal.
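The steps of method A1 can be sketched in Python with NumPy as follows. This is a minimal sketch under stated assumptions, not the implementation of the present technology: the function name `make_phase_only_ir`, its parameters, the toy input, and the half-cosine fade shape are all hypothetical choices made for illustration (the text only calls for fade processing with an appropriate time constant).

```python
import numpy as np

def make_phase_only_ir(ir, pre_zeros, total_len, fade_len):
    """Method A1 sketch: front zero padding, FFT, flat amplitude, IFFT, fade-out."""
    ir = np.asarray(ir, dtype=float)
    if pre_zeros + len(ir) > total_len:
        raise ValueError("total_len is too short for the padded impulse response")
    # 1) Insert zero data before the impulse response (and after it, up to total_len).
    padded = np.zeros(total_len)
    padded[pre_zeros:pre_zeros + len(ir)] = ir
    # 2) The FFT yields the amplitude and phase characteristics; keep only the phase.
    phase = np.angle(np.fft.rfft(padded))
    # 3) Set the amplitude to 1 at every frequency and apply the IFFT.
    flat = np.fft.irfft(np.exp(1j * phase), n=total_len)
    # 4) Fade out the rear end so that the response converges to 0.
    #    The half-cosine ramp is an assumed fade shape.
    faded = flat.copy()
    if fade_len > 0:
        ramp = 0.5 * (1.0 + np.cos(np.pi * np.arange(fade_len) / fade_len))
        faded[-fade_len:] *= ramp
    return faded, flat

# Toy two-tap impulse response, used only to exercise the function.
faded, flat = make_phase_only_ir([1.0, -0.5], pre_zeros=64, total_len=256, fade_len=32)
```

The function returns both the faded response and the response before fading, so that the effect of the fade on the amplitude characteristic can be inspected separately.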
Furthermore, in the method A2, the FFT is performed without inserting 0 data for the impulse response for which the phase is to be simulated, and 0 data is inserted for the simple impulse and the FFT is performed.
Then, the phase characteristic obtained by the FFT for the impulse response and the phase characteristic obtained by the FFT for the simple impulse for which 0 data is inserted are added, and the IFFT is performed on the basis of the phase characteristic obtained as a result and a flat amplitude characteristic of which the value of the amplitude at each frequency is 1. Moreover, the subsequent stage of the impulse response obtained by the IFFT is fade-processed with an appropriate time constant to obtain a desired impulse response.
In the method A2, an impulse response that has a similar characteristic to that in the method A1 can be obtained. In other words, it is possible to obtain an IIR filter that changes only the phase characteristic while maintaining the amplitude characteristic.
In addition, in the method A2, if the phase characteristic obtained by the FFT for the impulse response and the phase characteristic obtained by the FFT for the simple impulse for which 0 data is inserted are subtracted instead of added, an impulse response having an inverse phase characteristic from that of the original impulse response can be obtained.
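Method A2 can be sketched in the same way. The sketch below is an illustration under assumptions: the function name `phase_ir_method_a2` and the `sign` parameter are hypothetical, both FFTs are taken at the same length `total_len` so that the phases can be combined bin by bin, the subtraction is read as "phase of the delayed simple impulse minus phase of the impulse response", and the fade processing is omitted for brevity.

```python
import numpy as np

def phase_ir_method_a2(ir, delay, total_len, sign=+1):
    """Method A2 sketch: combine the phase of `ir` with that of a delayed simple impulse.

    sign=+1 adds the phase of `ir` to the linear phase of the delay;
    sign=-1 subtracts it, giving the inverse phase characteristic.
    """
    # FFT of the impulse response with no zero data in front.
    # Assumption: the FFT pads the tail to total_len so both spectra share one grid.
    phase_ir = np.angle(np.fft.rfft(np.asarray(ir, dtype=float), n=total_len))
    # FFT of a simple impulse with `delay` zero samples inserted before it.
    simple = np.zeros(total_len)
    simple[delay] = 1.0
    phase_delay = np.angle(np.fft.rfft(simple))
    # Flat amplitude of 1 at every frequency, combined phase, then IFFT.
    spectrum = np.exp(1j * (phase_delay + sign * phase_ir))
    return np.fft.irfft(spectrum, n=total_len)

forward = phase_ir_method_a2([1.0, -0.5], delay=64, total_len=256, sign=+1)
inverse = phase_ir_method_a2([1.0, -0.5], delay=64, total_len=256, sign=-1)
```

With this reading, cascading `forward` and `inverse` cancels the phase of the original impulse response and leaves only the combined delay.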
By using the impulse responses obtained by the above method A1 and method A2, it is possible to reproduce sounds with the same sound quality even with different reproduction devices.
As a specific example, suppose that, for example, there is content that has been mastered while reproducing sound by a speaker, and that content is reproduced by headphones.
In such a case, by convolving the impulse response having the inverse characteristic of the phase characteristic of the headphones into the audio signal of the content and convolving the impulse response having the same phase characteristic as that of the speaker, the phase characteristic of the headphones is canceled and the phase characteristic of the speaker can be simulated. That is, even in a case where content is reproduced by headphones, the sound having the same sound quality as that in mastering can be reproduced.
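The two convolutions described above might look as follows. This is a hedged sketch only: `unit_amplitude_ir` and `process` are hypothetical names, the two short impulse responses are placeholders rather than measured headphone or speaker data, and fade processing is left out.

```python
import numpy as np

def unit_amplitude_ir(ir, delay, total_len, sign):
    """Phase-only response: phase of `ir` (sign=+1) or its inverse (sign=-1), plus a delay."""
    k = np.arange(total_len // 2 + 1)
    linear = -2.0 * np.pi * k * delay / total_len   # phase of a `delay`-sample delay
    phase = np.angle(np.fft.rfft(np.asarray(ir, dtype=float), n=total_len))
    return np.fft.irfft(np.exp(1j * (linear + sign * phase)), n=total_len)

# Placeholder impulse responses; real ones would be measured or modeled.
headphone_ir = np.array([1.0, -0.3])
speaker_ir = np.array([1.0, -0.5])

cancel_headphone = unit_amplitude_ir(headphone_ir, delay=64, total_len=256, sign=-1)
simulate_speaker = unit_amplitude_ir(speaker_ir, delay=64, total_len=256, sign=+1)

def process(audio):
    # Convolve both phase-only responses into the audio signal of the content.
    return np.convolve(np.convolve(audio, cancel_headphone), simulate_speaker)
```

Each response carries its own delay, so the processed signal is delayed by the sum of the two delays (128 samples in this toy setup).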
Next, the present technology will be described in more detail.
For example, consider a case where there is content that has been mastered while reproducing sound by a speaker, and that content is reproduced by headphones.
In this case, it is assumed that, if only the low-frequency phase characteristic of the speaker used for mastering can be added to the sound source of the content, that is, the audio signal of the content when reproducing by headphones, almost the same sound as the sound that has been created by the creator in the mastering studio can be experienced.
In general, however, it is difficult to identify the speaker used in the mastering studio from the content itself. It may become possible in the future to identify the speaker from metadata of the content or the like, but this is difficult at present.
Therefore, since the characteristic of the IIR type high pass filter (HPF) with a cutoff frequency Fc = 50 Hz is close to the characteristic of the speaker, such a characteristic of the HPF is considered as a pseudo speaker characteristic.
For example, the relationship between the frequency characteristic of the IIR type HPF having a cutoff frequency Fc of 50 Hz, that is, the amplitude characteristic (gain characteristic) and the phase characteristic, and the impulse response of the HPF is as shown in Fig. 1.
In Fig. 1, the portion shown by an arrow Q11 shows the amplitude characteristic of the frequency characteristic of the HPF, and the portion shown by an arrow Q12 shows the phase characteristic of the frequency characteristic of the HPF.
In particular, in the amplitude characteristic, the vertical axis indicates the gain (amplitude), and the horizontal axis indicates the frequency. In the phase characteristic, the vertical axis indicates the phase, and the horizontal axis indicates the frequency. From this frequency characteristic, it can be seen that the gain is small and the phase is a positive value on the low frequency side of the HPF.
On the other hand, the impulse response of the HPF is shown in the portion shown by an arrow Q13. The vertical axis of the impulse response indicates the amplitude, and the horizontal axis indicates the time, that is, the time sample (sample). Note that, here, the impulse response of the HPF is enlarged around the 0th sample.
Such an impulse response can be used as an IIR type filter, and by convolving the impulse response shown by the arrow Q13 into an audio signal, the audio signal can be subjected to filter processing of the HPF.
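As an illustration, an impulse response of this kind can be generated and convolved as in the sketch below. The text does not state the filter order or the sampling frequency, so a first-order high-pass filter designed with the bilinear transform at an assumed sampling frequency of 44.1 kHz is used; the function names are hypothetical.

```python
import numpy as np

def hpf_impulse_response(fc=50.0, fs=44100.0, n=1024):
    """Impulse response of a first-order IIR high-pass filter (bilinear transform)."""
    k = np.tan(np.pi * fc / fs)           # pre-warped cutoff
    b0, b1 = 1.0 / (1.0 + k), -1.0 / (1.0 + k)
    a1 = (k - 1.0) / (k + 1.0)
    h = np.zeros(n)
    x_prev = y_prev = 0.0
    for i in range(n):
        x = 1.0 if i == 0 else 0.0        # unit impulse input
        y = b0 * x + b1 * x_prev - a1 * y_prev
        h[i] = y
        x_prev, y_prev = x, y
    return h

def apply_filter(audio, h):
    # Convolving the impulse response into the audio signal applies the HPF.
    return np.convolve(audio, h)

h = hpf_impulse_response()
```

For this assumed design, the gain at the cutoff frequency is about 0.707 and the phase there is about +45 degrees, consistent with the positive low-frequency phase described for Fig. 1.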
Moreover, the frequency characteristic of the HPF, that is, the amplitude characteristic and the phase characteristic, and the impulse response of the HPF have a lossless relationship, although there is a conversion error.
Specifically, when the IFFT is performed on the frequency characteristic including the amplitude characteristic shown by the arrow Q11 and the phase characteristic shown by the arrow Q12, the impulse response shown by the arrow Q13 is ideally obtained. On the other hand, when the FFT is performed on the impulse response shown by the arrow Q13, the frequency characteristic including the amplitude characteristic shown by the arrow Q11 and the phase characteristic shown by the arrow Q12 is ideally obtained.
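The round trip between the impulse response and its frequency characteristic can be checked numerically. The short sketch below uses an arbitrary four-tap impulse response chosen only for illustration; the variable names are assumptions.

```python
import numpy as np

ir = np.array([1.0, -0.6, 0.2, 0.05])     # arbitrary example impulse response
n = 16                                     # FFT length (tail is zero-padded)

spectrum = np.fft.rfft(ir, n=n)
amplitude = np.abs(spectrum)               # amplitude characteristic
phase = np.angle(spectrum)                 # phase characteristic

# The IFFT of amplitude and phase returns the original impulse response,
# up to floating-point conversion error.
rebuilt = np.fft.irfft(amplitude * np.exp(1j * phase), n=n)
error = np.max(np.abs(rebuilt[:len(ir)] - ir))
```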
By convolving the impulse response of such an HPF into the audio signal of the above-mentioned content mastered while reproducing the sound with the speaker, a similar phase characteristic to that of the speaker can be added to the sound of content, but the low frequency gain (amplitude) is reduced.
Therefore, suppose that the FFT is actually performed on the impulse response shown by the arrow Q13, the amplitude (gain) at all frequencies of the resulting amplitude characteristic is set to 1 to obtain a flat amplitude characteristic, and then the IFFT is performed to reconstruct the impulse response.
At this time, the impulse response to be obtained by the reconstruction is an impulse response in which only the phase characteristic shown in Fig. 1 is added to the audio signal without changing the amplitude characteristic, that is, an impulse response in which only a desired phase characteristic is added without changing the amplitude characteristic.
Then, by convolving the impulse response obtained by the reconstruction into the audio signal of the above-mentioned content mastered while reproducing the sound with the speaker, the target phase characteristic of the speaker can be added to the sound of content without changing the amplitude characteristic. Therefore, the listener (user) can experience almost the same sound as the sound created by the creator in the mastering studio.
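The reasoning above can be written compactly. The symbols below ($h$, $H$, $A$, $\theta$, $g$, $G$, $X$, $Y$) are notation assumed here for illustration and do not appear in the original text; fade processing and conversion error are ignored, and convolution in time is treated as multiplication in frequency.

```latex
\begin{align}
H(k) &= \mathrm{FFT}\{h(n)\} = A(k)\,e^{j\theta(k)} \\
G(k) &= 1 \cdot e^{j\theta(k)}, \qquad g(n) = \mathrm{IFFT}\{G(k)\} \\
Y(k) &= X(k)\,G(k) \;\Rightarrow\; |Y(k)| = |X(k)|, \quad \arg Y(k) = \arg X(k) + \theta(k)
\end{align}
```

Here $h$ is the original HPF impulse response, $g$ is the reconstructed impulse response, and $X$ and $Y$ are the spectra of the audio signal before and after convolution with $g$.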
Note that, hereinafter, an impulse response that functions as a filter that adds a desired phase characteristic without changing the amplitude characteristic will be also referred to as a target phase characteristic impulse response in particular.
In a case where the target phase characteristic impulse response is to be obtained by reconstructing the impulse response, it is conceivable to perform the reconstruction as shown in Fig. 2, for example.
In Fig. 2, the portion shown by an arrow Q21 shows the impulse response shown by the arrow Q13 in Fig. 1.
This impulse response converges in approximately 1024 samples. Here, however, in consideration of the tone after conversion, 0-filling processing is performed on the rear side of the impulse response in the time direction, that is, the future side, as shown by an arrow Q22, so that the impulse response is extended to 4096 samples.
That is, 0-filling processing of adding 0 data, which is a sample having a sample value of 0, to the rear side (end) of the impulse response in the time direction is performed so that the total length (number of samples) of the impulse response is 4096 samples.
When the FFT is performed on the impulse response that has been subjected to the 0-filling processing as described above as shown by an arrow Q23, an amplitude characteristic and a phase characteristic similar to those shown in Fig. 1 can be obtained.
Here, the target phase characteristic impulse response has a flat or substantially flat amplitude characteristic, and the phase characteristic is the phase characteristic shown by the arrow Q12.
Therefore, the value of the amplitude (gain) of each frequency in the amplitude characteristic obtained by the FFT is adjusted to "1". In other words, the amplitude of the amplitude characteristic obtained by the FFT is adjusted so that the amplitude characteristic becomes flat.
Furthermore, since the phase characteristic obtained by the FFT should be the target phase characteristic shown by the arrow Q12, no particular phase adjustment is performed on the phase characteristic obtained by the FFT.
Next, as shown by an arrow Q24, the IFFT is performed on the frequency characteristic including the flat amplitude characteristic obtained by the amplitude adjustment and the phase characteristic obtained by the FFT.
Moreover, since the impulse response obtained by the IFFT does not converge to 0, fade processing is performed to fade out the rear side (end side) of the impulse response in the time direction so that it converges to 0.
The impulse response is reconstructed by such fade processing, and as a result, the target phase characteristic impulse response shown by an arrow Q25 is obtained. Here, an impulse response having a length of 4096 samples is obtained as the target phase characteristic impulse response.
The target phase characteristic impulse response shown by the arrow Q25 should ideally have a flat or substantially flat amplitude characteristic and the same phase characteristic as that of the original HPF.
However, since conversion distortion actually occurs in conversions such as the FFT and the IFFT, the frequency characteristic of the target phase characteristic impulse response shown by the arrow Q25 is as shown in Fig. 3.
In Fig. 3, the portion shown by an arrow Q31 shows the amplitude characteristic, and the portion shown by an arrow Q32 shows the phase characteristic. Note that, in the amplitude characteristic, the vertical axis indicates the gain (amplitude), and the horizontal axis indicates the frequency. In the phase characteristic, the vertical axis indicates the phase, and the horizontal axis indicates the frequency.
In the portion shown by the arrow Q31, a curve L11 shows the amplitude characteristic of the target phase characteristic impulse response shown by the arrow Q25 in Fig. 2, and a curve L12 shows the amplitude characteristic of the original HPF shown by the arrow Q21 in Fig. 2. From the curve L11, it can be seen that, in the amplitude characteristic of the target phase characteristic impulse response, the gain of the low frequency portion, that is, the portion shown by an arrow W11 is reduced, although not as much as the original HPF, and the amplitude characteristic is not flat.
Furthermore, in the portion shown by the arrow Q32, a curve L13 shows the phase characteristic of the target phase characteristic impulse response shown by the arrow Q25 in Fig. 2, and a curve L14 shows the phase characteristic of the original HPF shown by the arrow Q21 in Fig. 2, that is, the target phase characteristic.
In this example, the curve L13 is substantially the same as the curve L14, and it can be seen that the target characteristic is obtained with respect to the phase characteristic in the target phase characteristic impulse response.
Incidentally, in the case opposite to that of the target phase characteristic impulse response, that is, a case where the phase characteristic is flat (a straight line) and the amplitude (gain) changes, the impulse response is generally known to have a basically symmetrical shape.
Therefore, the applicant thought that, by performing 0-filling processing so that the impulse response is substantially symmetrical around the portion where the pulse rises, that is, so that the portions before and after the rise of the pulse have the same length, it would be possible to obtain an impulse response having a flat amplitude characteristic.
Here, it is assumed that the impulse response of the original HPF is subjected to 0-filling processing at least on the front side (past side) in the time direction so that the impulse response has a substantially symmetrical shape, and then the FFT and the IFFT are performed to generate a target phase characteristic impulse response.
In such a case, for example, as shown in Fig. 4, the impulse response is reconstructed and the target phase characteristic impulse response is generated.
In Fig. 4, the portion shown by an arrow Q41 shows the impulse response of the HPF shown by the arrow Q13 in Fig. 1, and this impulse response converges in approximately 1024 samples.
In this example, the impulse response of the HPF shown by the arrow Q41 is subjected to 0-filling processing as shown by an arrow Q42.
That is, 0 data is added not only to the rear side (end side) in the time direction of the impulse response but also to the front side (head side) according to the length of the impulse response.
In particular, here, 8192 samples of 0 data are added to the front side of the impulse response in the time direction, and 0 data is also added to the rear side of the impulse response in the time direction so that the impulse response itself, together with the rear 0 data, becomes 8192 samples long. Due to such 0-filling processing, the impulse response shown by the arrow Q42 has a substantially symmetrical shape, and the total length is 16384 samples.
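The 0-filling processing described here amounts to a simple concatenation. In the sketch below, `zero_fill` and its parameter names are hypothetical, and an array of ones stands in for the roughly 1024-sample HPF impulse response so that the layout is easy to check.

```python
import numpy as np

def zero_fill(ir, front, total_len):
    """Add `front` zero samples before `ir`, then zeros after it up to `total_len`."""
    ir = np.asarray(ir, dtype=float)
    rear = total_len - front - len(ir)
    if rear < 0:
        raise ValueError("total_len is shorter than front padding plus impulse response")
    return np.concatenate([np.zeros(front), ir, np.zeros(rear)])

ir = np.ones(1024)                         # stand-in for the HPF impulse response
padded = zero_fill(ir, front=8192, total_len=16384)
```

With `front=0` and `total_len=4096`, the same helper reproduces the rear-only 0-filling of Fig. 2.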
Next, as shown by an arrow Q43, when the FFT is performed on the impulse response that has been subjected to the 0-filling processing, an amplitude characteristic and a phase characteristic are obtained, similarly to the case of the arrow Q23 in Fig. 2.
In this example as well, the target phase characteristic impulse response has a flat amplitude characteristic, so the amplitude (gain) value of each frequency in the amplitude characteristic obtained by the FFT is adjusted to "1" so that a flat amplitude characteristic is obtained.
Furthermore, since the phase characteristic obtained by the FFT should be the target phase characteristic, no particular phase adjustment is performed on the phase characteristic obtained by the FFT.
Subsequently, as shown by an arrow Q44, the IFFT is performed on the frequency characteristic including the flat amplitude characteristic obtained by the amplitude adjustment and the phase characteristic obtained by the FFT, and the impulse response obtained as a result is subjected to the fade processing in a manner similar to the case of the arrow Q24 in Fig. 2.
Then, the impulse response obtained by the fade processing is used as the target phase characteristic impulse response. Here, the target phase characteristic impulse response shown by an arrow Q45 is obtained, and the target phase characteristic impulse response has a shape close to symmetry. Furthermore, the length of the target phase characteristic impulse response is 16384 samples.
The frequency characteristic of the target phase characteristic impulse response shown by the arrow Q45 thus obtained is as shown in Fig. 5.
In Fig. 5, the portion shown by an arrow Q51 shows the amplitude characteristic, and the portion shown by an arrow Q52 shows the phase characteristic. Note that, in the amplitude characteristic, the vertical axis indicates the gain (amplitude), and the horizontal axis indicates the frequency. In the phase characteristic, the vertical axis indicates the phase, and the horizontal axis indicates the frequency.
In the portion shown by the arrow Q51, a curve L31 shows the amplitude characteristic of the target phase characteristic impulse response shown by the arrow Q45 in Fig. 4, and a curve L32 shows the amplitude characteristic of the original HPF shown by the arrow Q41 in Fig. 4.
The amplitude characteristic of the target phase characteristic impulse response shown in the curve L31 is within ±0.2 dB of the amplitude (gain) value at each frequency, and it can be seen that a substantially flat characteristic is obtained. That is, it can be seen that the target amplitude characteristic is obtained.
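A flatness figure such as "within ±0.2 dB" can be measured as in the following sketch. The function name `max_amplitude_deviation_db` is hypothetical, and the measurement is taken only at the FFT bins of the given response; a bin with exactly zero amplitude would yield an infinite value.

```python
import numpy as np

def max_amplitude_deviation_db(ir):
    """Largest deviation of the amplitude characteristic from 0 dB (gain 1), in dB."""
    amplitude = np.abs(np.fft.rfft(np.asarray(ir, dtype=float)))
    return float(np.max(np.abs(20.0 * np.log10(amplitude))))

# A delayed simple impulse has a perfectly flat amplitude characteristic.
delta = np.zeros(64)
delta[8] = 1.0
deviation = max_amplitude_deviation_db(delta)
```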
Furthermore, in the portion shown by the arrow Q52, a curve L33 shows the phase characteristic of the target phase characteristic impulse response shown by the arrow Q45 in Fig. 4, and a curve L34 shows the phase characteristic of the original HPF shown by the arrow Q41 in Fig. 4, that is, the target phase characteristic. Moreover, a curve L35 shows the phase characteristic of a simple impulse with only 8192 samples delayed, that is, the linear phase.
Here, the curve L33 and the curve L34 almost overlap each other, and it can be seen that a characteristic that is substantially equivalent to the target characteristic as the phase characteristic of the target phase characteristic impulse response is obtained.
Furthermore, the curve L35 is shown for comparison. Since the curve L35 shows the phase characteristic of a simple impulse, which is a linear phase, the target characteristic is obtained as the phase characteristic of the target phase characteristic impulse response if the difference between the curve L33 and the curve L35 at each frequency equals the phase value at that frequency of the phase characteristic shown by the arrow Q12 in Fig. 1. Note that the phase characteristic of the original HPF shown by the arrow Q41 in Fig. 4 is the same as the phase characteristic shown by the arrow Q12 in Fig. 1.
In this example, when the curve L33 and the curve L35 are compared, the difference in their phases becomes smaller as the frequency increases. Therefore, it can also be seen from the curves L33 and L35 that substantially the same characteristic as the phase characteristic shown by the arrow Q12 in Fig. 1 is obtained as the phase characteristic of the target phase characteristic impulse response.
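The comparison against the linear phase of a delayed simple impulse can be expressed numerically. In this sketch, `residual_phase` is a hypothetical helper, and a toy two-tap impulse response delayed by 64 samples replaces the HPF response delayed by 8192 samples.

```python
import numpy as np

def residual_phase(ir_padded, delay):
    """Phase of `ir_padded` after removing the linear phase of a `delay`-sample delay."""
    n = len(ir_padded)
    k = np.arange(n // 2 + 1)
    spectrum = np.fft.rfft(ir_padded)
    # Multiplying by exp(+j*2*pi*k*delay/n) undoes the delay before taking the angle,
    # which avoids unwrapping the steep linear phase.
    return np.angle(spectrum * np.exp(2j * np.pi * k * delay / n))

ir = np.array([1.0, -0.5])                 # toy impulse response
padded = np.zeros(256)
padded[64:64 + len(ir)] = ir               # 64 zero samples in front

difference = residual_phase(padded, delay=64)
original = np.angle(np.fft.rfft(ir, n=256))
```

In this toy case, `difference` matches `original`, which is the relationship the text describes between the curves L33 and L35 and the phase characteristic of the arrow Q12.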
From the above, it can be seen that, by performing 0-filling processing on the impulse response having the target phase characteristic at least on the front side in the time direction, and performing the FFT, the IFFT, and the fade processing on the impulse response that has been subjected to the 0-filling processing, a target phase characteristic impulse response having a flat or substantially flat amplitude characteristic and a target phase characteristic can be obtained.
The method of generating the target phase characteristic impulse response described with reference to Fig. 4 as described above is the above-mentioned method A1.
Note that, in generating the target phase characteristic impulse response, the longer the impulse response that has been subjected to the 0-filling processing, that is, the larger the number of samples, the closer the frequency characteristic of the target phase characteristic impulse response becomes to the target characteristic. That is, a better characteristic can be obtained. In particular, as the length of the impulse response that has been subjected to the 0-filling processing approaches infinity, the error between the frequency characteristic of the target phase characteristic impulse response and the target characteristic approaches zero.
Furthermore, in a case of generating the target phase characteristic impulse response, it may be desired to reduce the processing amount even if an error from the target characteristic is allowed to some extent. For example, if the length of the target phase characteristic impulse response is shortened, the processing amount is reduced both at the time of generation and at the time of convolution after generation.
In such a case, for example, as shown in Fig. 6, the number of pieces of 0 data added to the impulse response in the 0-filling processing may be reduced so that the processing amount is reduced and the target phase characteristic impulse response having a sufficient characteristic can be obtained.
In Fig. 6, the portion shown by an arrow Q61 shows the impulse response of the HPF shown by the arrow Q13 in Fig. 1, and this impulse response converges in approximately 1024 samples.
In this example, the impulse response of the HPF shown by the arrow Q61 is subjected to 0-filling processing as shown by an arrow Q62.
Here, 0 data is added to the front side of the impulse response in the time direction by the amount of 384 samples, and 0 data is also added to the rear side of the impulse response in the time direction so that the total length of the impulse response becomes 4096 samples.
In this 0-filling processing, the number of pieces of 0 data added to the front side in the time direction in the impulse response is small, so that the impulse response after the 0-filling processing shown by the arrow Q62 does not have a symmetrical shape.
Next, as shown by an arrow Q63, when the FFT is performed on the impulse response that has been subjected to the 0-filling processing, an amplitude characteristic and a phase characteristic are obtained, similarly to the case of the arrow Q23 in Fig. 2.
Also in this example, similarly to the case of Fig. 4, the value of the amplitude (gain) of each frequency in the amplitude characteristic obtained by the FFT is adjusted to "1" to obtain a flat amplitude characteristic, and no particular phase adjustment is performed on the phase characteristic obtained by the FFT.
Subsequently, as shown by an arrow Q64, the IFFT is performed on the frequency characteristic including the flat amplitude characteristic obtained by the amplitude adjustment and the phase characteristic obtained by the FFT, and the impulse response obtained as a result is subjected to the fade processing in a manner similar to the case of the arrow Q24 in Fig. 2.
Then, the impulse response obtained by the fade processing is used as the target phase characteristic impulse response. Here, the target phase characteristic impulse response shown by an arrow Q65 is obtained, and the length of the target phase characteristic impulse response is 4096 samples.
Note that, in this example, since the number of pieces of 0 data added to the front side in the time direction in the impulse response is small, the target phase characteristic impulse response shown by the arrow Q65 does not have a symmetrical shape.
The frequency characteristic of the target phase characteristic impulse response shown by the arrow Q65 thus obtained is as shown in Fig. 7.
In Fig. 7, the portion shown by an arrow Q71 shows the amplitude characteristic, and the portion shown by an arrow Q72 shows the phase characteristic. Note that, in the amplitude characteristic, the vertical axis indicates the gain (amplitude), and the horizontal axis indicates the frequency. In the phase characteristic, the vertical axis indicates the phase, and the horizontal axis indicates the frequency.
In the portion shown by the arrow Q71, a curve L51 shows the amplitude characteristic of the target phase characteristic impulse response shown by the arrow Q65 in Fig. 6, and a curve L52 shows the amplitude characteristic of the original HPF shown by the arrow Q61 in Fig. 6.
In the amplitude characteristic of the target phase characteristic impulse response shown by the curve L51, the amplitude (gain) at each frequency deviates by no more than ±1 dB, and it can be seen that a substantially flat characteristic is obtained. That is, it can be seen that a sufficiently flat amplitude characteristic is obtained.
In particular, the amplitude characteristic shown in the curve L51 here has a larger error from the target characteristic than the amplitude characteristic shown in the curve L31 in Fig. 5, but it can be seen that the error is within a sufficiently small range.
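A flatness check of this kind can be sketched as follows. The helper names `max_gain_deviation_db` and `is_substantially_flat` are hypothetical, and the test signals are simple scaled impulses rather than the responses of the figures; only the ±1 dB tolerance comes from the text.

```python
import numpy as np

def max_gain_deviation_db(impulse_response):
    """Largest absolute gain in dB over the FFT bins; 0 dB means perfectly flat."""
    spectrum = np.fft.rfft(impulse_response)
    gain_db = 20.0 * np.log10(np.abs(spectrum))
    return float(np.max(np.abs(gain_db)))

def is_substantially_flat(impulse_response, tolerance_db=1.0):
    """True if every bin gain stays within +/- tolerance_db of 0 dB."""
    return max_gain_deviation_db(impulse_response) <= tolerance_db

# A delayed unit impulse has a gain of exactly 1 (0 dB) at every frequency.
delayed = np.zeros(64)
delayed[10] = 1.0

print(is_substantially_flat(delayed))        # gain 1.0 at every bin
print(is_substantially_flat(1.1 * delayed))  # about 0.83 dB at every bin
print(is_substantially_flat(2.0 * delayed))  # about 6.02 dB at every bin
```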
Furthermore, in the portion shown by the arrow Q72, a curve L53 shows the phase characteristic of the target phase characteristic impulse response shown by the arrow Q65 in Fig. 6, and a curve L54 shows the phase characteristic of the original HPF shown by the arrow Q61 in Fig. 6, that is, the target phase characteristic. Moreover, a curve L55 shows the phase characteristic of the delayed simple impulse, like the curve L35 of Fig. 5.
Here, although the error is larger than in the case of Fig. 5, the curve L53 and the curve L54 almost overlap each other, and it can be seen that a phase characteristic substantially equivalent to the target characteristic is obtained for the target phase characteristic impulse response.
Furthermore, when the curve L53 and the curve L55 are compared, the difference in their phases becomes smaller as the frequency increases, and it can be seen that, as in the case of Fig. 5, substantially the same characteristic as the phase characteristic shown by the arrow Q12 in Fig. 1 is obtained as the phase characteristic of the target phase characteristic impulse response.
As described above, even if the number of pieces of 0 data to be added to the front side of the impulse response having the target phase characteristic in the time direction is reduced to some extent, it is possible to obtain a target phase characteristic impulse response having a flat or substantially flat amplitude characteristic and a target phase characteristic.
Note that the number of pieces of 0 data to be added to the front side of the impulse response in the time direction trades off against both the error relative to the target characteristic and the processing amount, and therefore the number of pieces of 0 data to be added is only required to be adjusted as necessary.
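One way to explore this trade-off numerically is sketched below. It rests on several assumptions that are not in the text: a toy high-pass-like response stands in for the HPF, the total length is 1024 samples, and the energy remaining in the rear half of the buffer is used only as a rough proxy for how much a rear fade-out would disturb the result. The helper names are hypothetical.

```python
import numpy as np

def zero_fill(h, front, total_length):
    """Add `front` zeros before `h` and pad the rear up to `total_length`."""
    rear = total_length - front - len(h)
    return np.concatenate([np.zeros(front), h, np.zeros(rear)])

def flat_amplitude_response(padded):
    """Keep the phase of `padded`, set every bin gain to 1, return to the time domain."""
    phase = np.angle(np.fft.rfft(padded))
    return np.fft.irfft(np.exp(1j * phase), n=len(padded))

# Toy high-pass-like response (an assumption, not the HPF of the figures).
a, taps, n_total = 0.9, 64, 1024
h_in = -(1.0 - a) * a ** np.arange(taps)
h_in[0] += 1.0

reference = flat_amplitude_response(zero_fill(h_in, 0, n_total))
rows = []
for front in (0, 16, 64, 256):
    result = flat_amplitude_response(zero_fill(h_in, front, n_total))
    # Before any fade, front padding only delays the result circularly.
    is_shift = bool(np.allclose(result, np.roll(reference, front), atol=1e-9))
    # Energy left in the rear half, where a fade-out would act (rough proxy only).
    tail_energy = float(np.sum(result[n_total // 2:] ** 2))
    rows.append((front, is_shift, tail_energy))

for row in rows:
    print(row)
```

In this sketch, the front padding does not change the shape of the response obtained by the IFFT; it only moves it within the buffer. The amount of padding therefore matters mainly through what the subsequent fade processing removes, which is consistent with the adjustment "as necessary" described above.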
Furthermore, instead of performing 0-filling processing on the impulse response having the target phase characteristic, the 0-filling processing may be performed on a simple impulse as shown in the curve L35 in Fig. 5 to generate the target phase characteristic impulse response, for example. Such a method of generating the target phase characteristic impulse response is the above-mentioned method A2.
In the method A2, 0-filling processing of adding 0 data to the front side of the simple impulse in the time direction is performed, and the FFT is performed on the simple impulse that has been subjected to the 0-filling processing.
Note that, hereinafter, the phase characteristic of the frequency characteristic obtained by the FFT with respect to the simple impulse after 0-filling processing will be also referred to as the phase characteristic of the simple impulse in particular.
Furthermore, in the method A2, the 0-filling processing is not performed on the impulse response having the target phase characteristic, and the FFT is performed on the impulse response as it is. Hereinafter, the phase characteristic of the frequency characteristic obtained by the FFT on the impulse response having the target phase characteristic will be also referred to as the target phase characteristic in particular.
When the phase characteristic of the simple impulse and the target phase characteristic are obtained by the FFT as described above, the phase characteristic of the simple impulse and the target phase characteristic are added, and the frequency characteristic including the phase characteristic obtained by the addition and the flat amplitude characteristic is subjected to the IFFT.
Then, the fade processing is performed on the impulse response obtained by the IFFT, and the impulse response obtained as a result is used as the target phase characteristic impulse response.
The target phase characteristic impulse response thus obtained is an impulse response having a flat or substantially flat amplitude characteristic and a target phase characteristic.
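The addition variant of the method A2 can be sketched as follows, leaving out the fade processing. The function name `method_a2_add`, the toy input response, the delay of 256 samples, and the FFT length of 1024 are all assumptions made for illustration. A further assumption is that both FFTs use the same length, with the input impulse response extended by zeros at the rear only so that the two phase characteristics have matching bins.

```python
import numpy as np

def method_a2_add(input_ir, delay, n_fft):
    """Sketch of method A2: add the delayed-impulse phase to the target phase."""
    # 0-filling of the simple impulse: `delay` zeros in front of a single 1.
    simple = np.zeros(n_fft)
    simple[delay] = 1.0
    phase_simple = np.angle(np.fft.rfft(simple))
    # Target phase from the input response; n=n_fft appends zeros at the rear only.
    phase_target = np.angle(np.fft.rfft(input_ir, n=n_fft))
    # Flat amplitude of 1 combined with the summed phase, then back to the time domain.
    spectrum = np.exp(1j * (phase_simple + phase_target))
    return np.fft.irfft(spectrum, n=n_fft), phase_target

# Toy high-pass-like response (an assumption, not the HPF of the figures).
a, taps = 0.9, 64
h_in = -(1.0 - a) * a ** np.arange(taps)
h_in[0] += 1.0

h_target, phase_target = method_a2_add(h_in, delay=256, n_fft=1024)

spectrum = np.fft.rfft(h_target)
print(np.allclose(np.abs(spectrum), 1.0))  # the amplitude is flat at every FFT bin
```

Before the fade processing, the result has unit gain at every FFT bin, and its phase is the target phase plus the linear phase of the delay.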
Note that, in the method A2, instead of adding the phase characteristic of the simple impulse and the target phase characteristic, by subtracting the target phase characteristic from the phase characteristic of the simple impulse, an impulse response having an inverse characteristic of the target phase characteristic can be obtained as a target phase characteristic impulse response.
Specifically, for example, the phase characteristic obtained by performing the FFT on the impulse response of a predetermined HPF without performing the 0-filling processing is subtracted from the phase characteristic of the simple impulse, and the IFFT is performed on the frequency characteristic including the phase characteristic obtained as a result and the flat amplitude characteristic. Then, the fade processing is performed on the impulse response obtained by the IFFT, and the impulse response obtained as a result is used as the target phase characteristic impulse response.
In this case, the phase characteristic of the obtained target phase characteristic impulse response is the inverse characteristic of the phase characteristic of the original HPF.
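The subtraction variant can be sketched in the same way, again without the fade processing. The names, the toy "device" response, the delay, and the FFT length are assumptions for illustration. The check at the end multiplies the two spectra, which corresponds to a circular convolution, and confirms that only the delay of the simple impulse remains in the phase while the amplitude of the device is untouched.

```python
import numpy as np

def inverse_phase_response(input_ir, delay, n_fft):
    """Sketch of method A2 with subtraction: simple-impulse phase minus target phase."""
    simple = np.zeros(n_fft)
    simple[delay] = 1.0
    phase_simple = np.angle(np.fft.rfft(simple))
    phase_target = np.angle(np.fft.rfft(input_ir, n=n_fft))
    return np.fft.irfft(np.exp(1j * (phase_simple - phase_target)), n=n_fft)

# Toy response standing in for the original HPF (an assumption).
a, taps = 0.9, 64
h_dev = -(1.0 - a) * a ** np.arange(taps)
h_dev[0] += 1.0

n_fft, delay = 1024, 256
h_inv = inverse_phase_response(h_dev, delay, n_fft)

# Circular convolution of the inverse with the original, as a product of spectra.
device_spectrum = np.fft.rfft(h_dev, n=n_fft)
combined = np.fft.rfft(h_inv) * device_spectrum

# Remove the known delay; what is left should have (almost) zero phase.
k = np.arange(len(combined))
residual_phase = np.angle(combined * np.exp(2j * np.pi * k * delay / n_fft))
print(float(np.max(np.abs(residual_phase))))
```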
As described above, in the method A1, the 0-filling processing is performed on the impulse response having the target phase characteristic, and then the FFT, the IFFT, and the fade processing are performed to generate the target phase characteristic impulse response. On the other hand, in the method A2, the simple impulse is subjected to 0-filling processing, and then the FFT, the IFFT, and the fade processing are performed to generate the target phase characteristic impulse response.
The impulse response used in the method A1 and the simple impulse used in the method A2 are both impulse information, that is, information associated with the impulse. Therefore, if the method A1 and the method A2 are generalized, it can be said that the 0-filling processing is performed on the impulse information, and the FFT, the IFFT, and the fade processing are then performed on the basis of the result to generate the target phase characteristic impulse response.
By using the target phase characteristic impulse response obtained as described above, it is possible to add a desired phase characteristic to the audio signal without changing the amplitude characteristic.
As a specific example, consider a case where content has been mastered while its sound is reproduced by a mastering speaker, and that content is then reproduced by headphones or a speaker on a reproduction side.
In this case, by convolving the target phase characteristic impulse response having the inverse characteristic of the phase characteristic of the headphones or speaker on the reproduction side into the audio signal of the content, the phase characteristic of the headphones or speaker on the reproduction side can be canceled for the audio signal of the content. Here, the audio signal in which the phase characteristic of the headphones or speaker on the reproduction side is canceled is referred to as a corrected audio signal.
Note that the target phase characteristic impulse response having the inverse characteristic of the phase characteristic of the headphones or speaker on the reproduction side may be generated by the above-mentioned method A2 or method A1. For example, in a case where such a target phase characteristic impulse response is generated by the method A1, it is sufficient that the impulse response having the inverse characteristic of the phase characteristic of the headphones or speaker on the reproduction side is subjected to the 0-filling processing, and then the FFT, the IFFT, and the fade processing are performed.
Moreover, by convolving the target phase characteristic impulse response having the same characteristic as the phase characteristic of the mastering speaker into the corrected audio signal, the phase characteristic of the mastering speaker can be added to the corrected audio signal, that is, the sound of content.
Therefore, if the sound of content is reproduced on the basis of the corrected audio signal to which the phase characteristic of the mastering speaker is added as described above, the listener (user) can experience the sound that is almost the same as the sound created by the creator in the mastering studio.
In addition, for example, in a case where the listener reproduces the sound of content with headphones on the reproduction side, if the head transmission characteristic, that is, the head-related transfer function (HRTF), is used, it is possible to present sound that is closer to the sound created by the creator in the mastering studio.
Here, the HRTF is a function indicating the sound transmission characteristic from the sound source to the listener's ear, and more specifically, to the vicinity of the listener's eardrum or the entrance of the ear canal.
In this example, by further convolving the HRTF into the corrected audio signal to which the phase characteristic of the mastering speaker is added, the listener can experience sound closer to the sound heard when the creator is creating in the mastering studio.

Subsequently, a specific configuration and operation of an impulse response generation device that generates the target phase characteristic impulse response described above will be described.
Fig. 8 is a diagram showing a configuration example of an impulse response generation device that generates a target phase characteristic impulse response, that is, an impulse response having a flat or substantially flat amplitude characteristic and a desired phase characteristic by the above-mentioned method A1.
An impulse response generation device 11 shown in Fig. 8 has a 0-filling processing part 21, an FFT processing part 22, an IFFT processing part 23, and a fade processing part 24.
The 0-filling processing part 21 is supplied with an impulse response having the target phase characteristic used for generating the target phase characteristic impulse response. Hereinafter, an impulse response having such a target phase characteristic will be referred to as an input impulse response.
The 0-filling processing part 21 performs 0-filling processing on the supplied input impulse response and supplies the result to the FFT processing part 22.
The FFT processing part 22 performs the FFT on the input impulse response after the 0-filling processing supplied from the 0-filling processing part 21, and supplies the phase characteristic of the frequency characteristic obtained as a result to the IFFT processing part 23.
A flat amplitude characteristic (gain characteristic) in which the gain (amplitude) of each frequency is "1" is supplied to the IFFT processing part 23 from the outside.
The IFFT processing part 23 performs the IFFT on the frequency characteristic including the flat amplitude characteristic supplied from the outside and the phase characteristic supplied from the FFT processing part 22, and supplies the impulse response obtained as a result to the fade processing part 24. In other words, the IFFT is performed on the basis of the flat amplitude characteristic and the phase characteristic supplied from the FFT processing part 22, and an impulse response is generated.
Note that the IFFT processing part 23 need not use the flat amplitude characteristic supplied from the outside. Instead, a flat amplitude characteristic may be generated by adjusting the gain of the amplitude characteristic of the frequency characteristic obtained by the FFT in the FFT processing part 22, and that amplitude characteristic may be used for the IFFT.
The fade processing part 24 performs fade processing on the impulse response supplied from the IFFT processing part 23, and outputs the impulse response obtained as a result as a target phase characteristic impulse response.

Next, the operation of the impulse response generation device 11 will be described.
That is, the impulse response generation processing performed by the impulse response generation device 11 will be described below with reference to the flowchart of Fig. 9.
In step S11, the 0-filling processing part 21 performs 0-filling processing on the supplied input impulse response and supplies the result to the FFT processing part 22.
For example, in step S11, as described with reference to Figs. 4 and 6, 0-filling processing is performed in which 0 data is added to the rear side and the front side in the time direction in the input impulse response. In the 0-filling processing, 0 data is added at least to the front side in the time direction in the input impulse response.
In step S12, the FFT processing part 22 performs the FFT on the input impulse response after the 0-filling processing supplied from the 0-filling processing part 21, and supplies the phase characteristic of the frequency characteristic obtained as a result to the IFFT processing part 23.
In step S13, the IFFT processing part 23 performs the IFFT on the frequency characteristic including the flat amplitude characteristic supplied from the outside and the phase characteristic supplied from the FFT processing part 22, and supplies the impulse response obtained as a result to the fade processing part 24.
In step S14, the fade processing part 24 performs fade processing on the impulse response supplied from the IFFT processing part 23, and outputs the impulse response obtained as a result as a target phase characteristic impulse response.
For example, in the fade processing, the target phase characteristic impulse response is generated by fading out the rear side (end side) of the impulse response supplied from the IFFT processing part 23 in the time direction and converging it to 0. Note that if the impulse response obtained by the IFFT converges to 0, no special fade processing is required.
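A fade-out of this kind can be sketched as follows. The text does not specify the fade curve or its length, so the raised-cosine ramp and the `fade_length` parameter used here are assumptions, and the function name `fade_out` is hypothetical.

```python
import numpy as np

def fade_out(impulse_response, fade_length):
    """Fade the last `fade_length` samples to 0 with a raised-cosine ramp."""
    h = np.array(impulse_response, dtype=float)  # work on a copy
    if not 2 <= fade_length <= len(h):
        raise ValueError("fade_length must be between 2 and len(impulse_response)")
    # Ramp runs from 1 at the start of the fade down to 0 at the last sample.
    ramp = 0.5 * (1.0 + np.cos(np.pi * np.linspace(0.0, 1.0, fade_length)))
    h[-fade_length:] *= ramp
    return h

# A constant sequence makes the shape of the fade easy to see.
h = np.ones(16)
faded = fade_out(h, fade_length=8)
print(faded)
```

The samples before the fade region are left unchanged, so a response that has already converged to 0 at its end is not affected in practice.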
Furthermore, if, for example, an impulse response having the inverse characteristic of the phase characteristic of the headphones is used as the input impulse response, a target phase characteristic impulse response capable of canceling the phase characteristic of the headphones, that is, an impulse response having the inverse characteristic of the phase characteristic of the headphones, can be obtained.
When the target phase characteristic impulse response is generated as described above, the impulse response generation processing ends.
As described above, the impulse response generation device 11 performs 0-filling processing of adding 0 data at least to the front side in the time direction in the input impulse response, and then performs the FFT, the IFFT, and the fade processing on the input impulse response that has been subjected to the 0-filling processing, so that the target phase characteristic impulse response is generated.
As a result, it is possible to obtain the target phase characteristic impulse response that functions as a filter capable of adding the target phase characteristic without changing the amplitude characteristic. Therefore, it is possible to obtain a desired phase characteristic by using the target phase characteristic impulse response without changing the amplitude characteristic.
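Steps S11 to S14 can be put together in a short sketch such as the following. It is an illustration under stated assumptions rather than the device of Fig. 8: the function name, the toy input response, the raised-cosine fade, and the fade length of 512 samples are introduced here, while the 384 front zeros and the total length of 4096 samples follow the example of Fig. 6.

```python
import numpy as np

def generate_target_phase_ir(input_ir, front, total_length, fade_length):
    """Sketch of steps S11 to S14 for the method A1."""
    h = np.asarray(input_ir, dtype=float)
    # S11: 0-filling on the front side and the rear side.
    rear = total_length - front - len(h)
    padded = np.concatenate([np.zeros(front), h, np.zeros(rear)])
    # S12: FFT, keeping only the phase characteristic.
    phase = np.angle(np.fft.rfft(padded))
    # S13: IFFT of a flat amplitude (gain 1) combined with that phase.
    flat = np.fft.irfft(np.exp(1j * phase), n=total_length)
    # S14: fade out the rear side (raised-cosine ramp is an assumption).
    ramp = 0.5 * (1.0 + np.cos(np.pi * np.linspace(0.0, 1.0, fade_length)))
    flat[-fade_length:] *= ramp
    return flat

# Toy high-pass-like input impulse response (an assumption).
a, taps = 0.9, 64
h_in = -(1.0 - a) * a ** np.arange(taps)
h_in[0] += 1.0

target_ir = generate_target_phase_ir(h_in, front=384, total_length=4096, fade_length=512)
print(len(target_ir), float(np.sum(target_ir ** 2)))
```

Where the amplitude of a bin is not exactly zero, building the spectrum from `np.exp(1j * phase)` corresponds to the alternative of adjusting the gain of the FFT result to "1" instead of supplying a flat amplitude characteristic from the outside.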

Furthermore, in a case where the target phase characteristic impulse response is generated by the above-mentioned method A2, the impulse response generation device is configured as shown in Fig. 10, for example. Note that, in Fig. 10, the same reference numerals are given to the parts corresponding to the case in Fig. 8, and the description thereof will be omitted as appropriate.
An impulse response generation device 51 shown in Fig. 10 has an FFT processing part 61, a 0-filling processing part 62, an FFT processing part 63, an operation processing part 64, an IFFT processing part 23, and a fade processing part 24. The impulse response generation device 51 includes the FFT processing part 61 to the operation processing part 64 in place of the 0-filling processing part 21 and the FFT processing part 22 in the impulse response generation device 11.
The FFT processing part 61 is supplied with an impulse response having the target phase characteristic used for generating the target phase characteristic impulse response, that is, the input impulse response.
The FFT processing part 61 performs the FFT on the supplied input impulse response, and supplies the phase characteristic of the frequency characteristic obtained as a result to the operation processing part 64. Note that if the target phase characteristic itself can be obtained and the target phase characteristic can be supplied to the operation processing part 64, the FFT processing part 61 does not need to be provided.
The 0-filling processing part 62 is supplied with a simple impulse used for generating the target phase characteristic impulse response. The 0-filling processing part 62 performs 0-filling processing on the supplied simple impulse and supplies the result to the FFT processing part 63.
The FFT processing part 63 performs the FFT on the simple impulse that has been subjected to the 0-filling processing and supplied from the 0-filling processing part 62, and supplies the phase characteristic of the frequency characteristic obtained as a result to the operation processing part 64.
The operation processing part 64 performs operation processing based on the phase characteristic supplied from the FFT processing part 61 and the phase characteristic supplied from the FFT processing part 63, and supplies the phase characteristic obtained as a result to the IFFT processing part 23. Here, addition processing or subtraction processing is performed as operation processing.

Next, the operation of the impulse response generation device 51 will be described.
That is, the impulse response generation processing performed by the impulse response generation device 51 will be described below with reference to the flowchart of Fig. 11.
In step S41, the FFT processing part 61 performs the FFT on the supplied input impulse response, and supplies the phase characteristic of the frequency characteristic obtained as a result to the operation processing part 64.
In step S42, the 0-filling processing part 62 performs 0-filling processing on the supplied simple impulse and supplies the result to the FFT processing part 63. In the 0-filling processing, 0 data is added to the front side of the simple impulse in the time direction, and the simple impulse is appropriately delayed.
In step S43, the FFT processing part 63 performs the FFT on the simple impulse that has been subjected to the 0-filling processing and supplied from the 0-filling processing part 62, and supplies the phase characteristic of the frequency characteristic obtained as a result to the operation processing part 64.
In step S44, the operation processing part 64 performs operation processing based on the phase characteristic supplied from the FFT processing part 61 and the phase characteristic supplied from the FFT processing part 63, and supplies the phase characteristic obtained as a result to the IFFT processing part 23.
For example, in a case of obtaining the same characteristic as the phase characteristic of the input impulse response as the phase characteristic of the target phase characteristic impulse response, the operation processing part 64 adds the phase characteristic of the input impulse response supplied from the FFT processing part 61 and the phase characteristic of the simple impulse that has been subjected to the 0-filling processing and supplied from the FFT processing part 63, and supplies the phase characteristic obtained as a result to the IFFT processing part 23.
On the other hand, in a case of obtaining the inverse characteristic of the phase characteristic of the input impulse response as the phase characteristic of the target phase characteristic impulse response, the operation processing part 64 subtracts the phase characteristic of the input impulse response supplied from the FFT processing part 61 from the phase characteristic of the simple impulse that has been subjected to the 0-filling processing and supplied from the FFT processing part 63, and supplies the phase characteristic obtained as a result to the IFFT processing part 23.
When addition or subtraction is performed as operation processing for the phase characteristic as described above, the processing of step S45 and step S46 is performed thereafter, and the impulse response generation processing ends. These pieces of processing are similar to those in step S13 and step S14 of Fig. 9, and therefore, the description thereof will be omitted.
As described above, the impulse response generation device 51 performs 0-filling processing of adding 0 data to the front side of the simple impulse in the time direction, and generates the target phase characteristic impulse response on the basis of the simple impulse that has been subjected to the 0-filling processing and the input impulse response.
As a result, it is possible to obtain the target phase characteristic impulse response that functions as a filter capable of adding the target phase characteristic without changing the amplitude characteristic. Therefore, it is possible to obtain a desired phase characteristic by using the target phase characteristic impulse response without changing the amplitude characteristic.

Here, a reproduction device that reproduces content by using the target phase characteristic impulse response generated by the impulse response generation device 11 and the impulse response generation device 51 described above will be described.
Hereinafter, for the sake of specific explanation, it is assumed that the content to be reproduced is mastered in a predetermined studio as shown in Fig. 12.
In the example shown in Fig. 12, there is a creator M11 who performs mastering in the studio, and the creator M11 performs the amplitude adjustment and the like of each band of the content while reproducing the sound of content with the speaker 91 arranged in the studio, as a mastering task.
Furthermore, the audio signal of the content obtained by mastering is reproduced by a reproduction system including a reproduction device and the like of the listener. Note that, anything such as headphones, speakers, and earphones may be used for reproducing the sound of content, but the description will be continued below assuming that headphones are used as a specific example.
The reproduction device used for reproducing the content is configured as shown in Fig. 13, for example.
In the example shown in Fig. 13, a reproduction device 121 is, for example, a portable player, a smartphone, a personal computer, or the like capable of controlling reproduction of audio content, and headphones 122 are connected to the reproduction device 121.
The reproduction device 121 has an acquisition part 131, a speaker phase characteristic convolution part 132, and a reproduction control part 133.
In the reproduction device 121, the audio signal of the content obtained by mastering by the creator M11 is supplied to the speaker phase characteristic convolution part 132.
The acquisition part 131 acquires and holds the target phase characteristic impulse response from an external device such as the impulse response generation device 11 or the impulse response generation device 51 at an arbitrary timing. Furthermore, the acquisition part 131 supplies the held target phase characteristic impulse response to the speaker phase characteristic convolution part 132.
The target phase characteristic impulse response acquired by the acquisition part 131 is generated by the impulse response generation device 11 or the impulse response generation device 51 using the input impulse response having the phase characteristic of the speaker 91 used for mastering. That is, the target phase characteristic impulse response is an impulse response having the same phase characteristic as the phase characteristic of the speaker 91.
Note that the target phase characteristic impulse response need not be acquired by the acquisition part 131 at an arbitrary timing, and may instead be held in advance by the acquisition part 131.
Furthermore, hereinafter, a target phase characteristic impulse response having the same phase characteristic as the phase characteristic of the speaker 91 will be also referred to as a speaker characteristic impulse response in particular.
The speaker phase characteristic convolution part 132 convolves the speaker characteristic impulse response supplied from the acquisition part 131 into the supplied audio signal, and supplies the audio signal obtained as a result to the reproduction control part 133.
The reproduction control part 133 supplies the audio signal supplied from the speaker phase characteristic convolution part 132 to the headphones 122, and reproduces the sound of content. In other words, the reproduction control part 133 controls the reproduction of the sound of content on the headphones 122.
The headphones 122 reproduce the sound of content on the basis of the audio signal supplied from the reproduction control part 133.
Note that although the reproduction device 121 is not provided with the headphones 122 here, the headphones 122 may be provided in the reproduction device 121, or the acquisition part 131 to the reproduction control part 133 may be provided inside the headphones 122.

Subsequently, the operation of the reproduction device 121 will be described. That is, the reproduction processing by the reproduction device 121 will be described below with reference to the flowchart of Fig. 14. Note that at the timing when this reproduction processing is started, the speaker characteristic impulse response has already been acquired by the acquisition part 131.
In step S71, the speaker phase characteristic convolution part 132 convolves the speaker characteristic impulse response supplied from the acquisition part 131 into the supplied audio signal, and supplies the audio signal obtained as a result to the reproduction control part 133.
Therefore, the phase characteristic of the speaker characteristic impulse response, that is, the phase characteristic of the speaker 91 can be added to the sound of content based on the audio signal.
In step S72, the reproduction control part 133 supplies the audio signal supplied from the speaker phase characteristic convolution part 132 to the headphones 122, and reproduces the sound of content, and the reproduction processing ends.
Since the sound of content reproduced by the headphones 122 has the same characteristic as the phase characteristic of the speaker 91, the listener listening to the sound of content hears the sound with almost the same sound quality as the sound of content that the creator M11 was listening to in the studio. Moreover, since, with the speaker characteristic impulse response, only the desired phase characteristic can be added to the sound of content without changing the amplitude characteristic, the gain of the sound of content does not change.
As described above, the reproduction device 121 reproduces the sound of content after convolving the speaker characteristic impulse response into the audio signal of the content. As a result, even in a case where the sound of content is reproduced by the headphones 122, the phase characteristic of the speaker 91 used for mastering can be added to the sound of content. That is, a desired phase characteristic can be obtained.
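The roles of the acquisition part 131 and the speaker phase characteristic convolution part 132 can be loosely sketched as follows. The class and method names are hypothetical, and a pure three-sample delay is used as a placeholder response so that the effect of the convolution is easy to verify; an actual speaker characteristic impulse response would be generated by the method A1 or the method A2.

```python
import numpy as np

class ReproductionDevice:
    """Loose sketch of Fig. 13: acquisition part and convolution part only."""

    def __init__(self):
        self.speaker_ir = None  # held by the acquisition part

    def acquire(self, speaker_ir):
        """Acquire and hold the speaker characteristic impulse response."""
        self.speaker_ir = np.asarray(speaker_ir, dtype=float)

    def process(self, audio):
        """Step S71: convolve the held response into the audio signal."""
        if self.speaker_ir is None:
            raise RuntimeError("no speaker characteristic impulse response acquired")
        return np.convolve(np.asarray(audio, dtype=float), self.speaker_ir)

device = ReproductionDevice()

# Placeholder response (an assumption): a pure delay of 3 samples.
ir = np.zeros(8)
ir[3] = 1.0
device.acquire(ir)

audio = np.array([0.5, -1.0, 0.25])
out = device.process(audio)
print(out)
```

The output would then be passed to the headphones by the reproduction control part, which is outside the scope of this sketch.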

Note that, in the above description, in the reproduction device 121, the same characteristic as the phase characteristic of the speaker 91 is added to the sound of content. However, when the sound of content is reproduced by the headphones 122, the phase characteristic of the headphones 122 is also added to the sound.
Therefore, in addition to adding the same characteristic as the phase characteristic of the speaker 91 to the sound of content, the phase characteristic of the headphones 122 may be canceled (removed) so that the listener hears sound closer to the sound of content that the creator M11 was listening to in the studio.
In such a case, the reproduction device is configured as shown in Fig. 15, for example. Note that, in Fig. 15, the same reference numerals are given to the parts corresponding to the case in Fig. 13, and the description thereof will be omitted as appropriate.
Headphones 122 are connected to a reproduction device 161 shown in Fig. 15. Furthermore, the reproduction device 161 has an acquisition part 131, a headphone inverse characteristic convolution part 171, a speaker phase characteristic convolution part 132, and a reproduction control part 133.
In particular, the reproduction device 161 is configured such that the headphone inverse characteristic convolution part 171 is provided in the preceding stage of the speaker phase characteristic convolution part 132 in the reproduction device 121.
In the reproduction device 161, not only the above-mentioned speaker characteristic impulse response but also the target phase characteristic impulse response having the inverse characteristic of the phase characteristic of the headphones 122 is acquired by the acquisition part 131 from an external device such as the impulse response generation device 11 and the impulse response generation device 51 and is held. Hereinafter, the target phase characteristic impulse response having the inverse characteristic of the phase characteristic of the headphones 122 will also be referred to as a headphone inverse characteristic impulse response.
This headphone inverse characteristic impulse response is a target phase characteristic impulse response generated by the impulse response generation device 51 by, for example, using an input impulse response having the phase characteristic of the headphones 122, and performing subtraction as the operation processing in the operation processing part 64.
Note that the headphone inverse characteristic impulse response need not be acquired by the acquisition part 131 from an external device, and may instead be held in advance by the acquisition part 131.
The acquisition part 131 supplies the held headphone inverse characteristic impulse response to the headphone inverse characteristic convolution part 171.
The headphone inverse characteristic convolution part 171 convolves the headphone inverse characteristic impulse response supplied from the acquisition part 131 into the audio signal of the supplied content, and supplies the audio signal obtained as a result to the speaker phase characteristic convolution part 132.

Next, the operation of the reproduction device 161 will be described. That is, the reproduction processing by the reproduction device 161 will be described below with reference to the flowchart of Fig. 16. Note that at the timing when this reproduction processing is started, the speaker characteristic impulse response and the headphone inverse characteristic impulse response have already been acquired by the acquisition part 131.
In step S101, the headphone inverse characteristic convolution part 171 convolves the headphone inverse characteristic impulse response supplied from the acquisition part 131 into the supplied audio signal of content, and supplies the audio signal obtained as a result to the speaker phase characteristic convolution part 132.
Therefore, it is possible to add the inverse characteristic of the phase characteristic of the headphones 122 to the sound of content. In other words, the phase characteristic of the headphones 122, which is added when the sound of content is reproduced with the headphones 122, is canceled. Moreover, in the convolution of the headphone inverse characteristic impulse response, only the phase characteristic can be adjusted without changing the amplitude (gain) of the sound of content.
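The cancellation described above can be illustrated with a minimal sketch. It is not the implementation of the present specification: the headphone phase curve is made up, the headphones are modeled as a filter with a flat amplitude characteristic, and circular convolution via the FFT stands in for the convolution part. All names (`unit_magnitude_ir`, `circular_convolve`, `headphone_phase`) are hypothetical.

```python
import numpy as np

N = 1024  # block length in samples (assumption)

def unit_magnitude_ir(half_phase):
    """Real impulse response with unit amplitude and the given phase per rfft bin."""
    return np.fft.irfft(np.exp(1j * half_phase), n=N)

def circular_convolve(x, h):
    """Circular convolution via the FFT (stands in for a convolution part)."""
    return np.fft.irfft(np.fft.rfft(x) * np.fft.rfft(h), n=len(x))

# Made-up headphone phase curve. It is zero at DC and at Nyquist so that the
# spectrum of the real-valued impulse response stays exactly unit-magnitude.
k = np.arange(N // 2 + 1)
headphone_phase = 1.2 * np.sin(np.pi * k / (N // 2))

headphone_ir = unit_magnitude_ir(headphone_phase)           # models the headphones
headphone_inverse_ir = unit_magnitude_ir(-headphone_phase)  # negated phase

rng = np.random.default_rng(0)
audio = rng.standard_normal(N)  # placeholder for the audio signal of content

# Pre-compensation (as in step S101), then the phase added by the headphones.
compensated = circular_convolve(audio, headphone_inverse_ir)
heard = circular_convolve(compensated, headphone_ir)
```

In this sketch `heard` matches `audio` to within rounding error, and the amplitude spectrum of `compensated` equals that of `audio`, which corresponds to adjusting only the phase characteristic without changing the gain.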
After the headphone inverse characteristic impulse response has been convolved into the audio signal, the processing of step S102 and step S103 is performed, and the reproduction processing ends. These pieces of processing are similar to those in step S71 and step S72 of Fig. 14, and therefore the description thereof will be omitted.
In the reproduction of the sound of content by the reproduction device 161, the phase characteristic of the headphones 122 is first canceled with respect to the sound of content, and then the phase characteristic of the speaker 91, which is a characteristic to be added, is added.
Note that a single target phase characteristic impulse response that simultaneously adds the inverse characteristic of the phase characteristic of the headphones 122 and the phase characteristic of the speaker 91 may be generated, and this target phase characteristic impulse response may be convolved into the audio signal of the content.
However, as in the reproduction device 161, by separately convolving the speaker characteristic impulse response and the headphone inverse characteristic impulse response, the phase characteristic added to the sound of content can be freely changed. That is, for example, in the reproduction device 161, it is possible to select an arbitrary speaker 91 from a plurality of speakers 91 of different manufacturers or the like, and convolve the speaker characteristic impulse response having the phase characteristic of the selected speaker 91.
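The trade-off between the two approaches can be sketched as follows. Because convolution is associative, a single combined impulse response gives the same output as two separate convolutions, whereas keeping the responses separate allows the speaker to be swapped without regenerating anything. The impulse responses below are random placeholder data, and the speaker names and function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
audio = rng.standard_normal(256)                 # placeholder content signal
headphone_inverse_ir = rng.standard_normal(32)   # placeholder data
speaker_irs = {                                  # hypothetical selectable speakers
    "speaker_a": rng.standard_normal(32),
    "speaker_b": rng.standard_normal(32),
}

def reproduce_separately(audio, headphone_inverse_ir, speaker_ir):
    """Two-stage processing: cancel the headphone phase, then add the speaker phase."""
    canceled = np.convolve(audio, headphone_inverse_ir)
    return np.convolve(canceled, speaker_ir)

def reproduce_combined(audio, combined_ir):
    """Single-stage processing with one pre-combined impulse response."""
    return np.convolve(audio, combined_ir)

# Selecting a different speaker only changes which entry is looked up here.
selected_ir = speaker_irs["speaker_b"]
combined_ir = np.convolve(headphone_inverse_ir, selected_ir)

out_separate = reproduce_separately(audio, headphone_inverse_ir, selected_ir)
out_combined = reproduce_combined(audio, combined_ir)
```

Both outputs agree to within rounding error. The combined form saves one convolution per signal, but a new `combined_ir` has to be prepared for every speaker that can be selected.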
As described above, the reproduction device 161 convolves the headphone inverse characteristic impulse response into the audio signal of content, further convolves the speaker characteristic impulse response into the audio signal, and then reproduces the sound of content.
As a result, even in a case where the sound of the content is reproduced by the headphones 122, the phase characteristic added by the headphones 122 can be canceled and the phase characteristic of the speaker 91 used for mastering can be added to the sound of the content. That is, a desired phase characteristic can be obtained. In particular, in the reproduction processing described with reference to Fig. 16, it is possible to hear sound closer to the sound of content that the creator M11 was listening to in the studio than in the case of reproduction processing described with reference to Fig. 14.

Note that, in a case of reproducing the sound of content with the headphones 122, by also convolving a head-related transfer function (HRTF) indicating the sound transmission characteristic from the sound source, for example the speaker 91, to the creator M11, it is possible to hear sound closer to the sound of content that the creator M11 was listening to in the studio. That is, it is possible to reproduce the listening environment of the studio at the time of mastering.
In a case of convolving the HRTF into the audio signal of the content, the reproduction device is configured, for example, as shown in Fig. 17. Note that, in Fig. 17, the same reference numerals are given to the parts corresponding to the case in Fig. 15, and the description thereof will be omitted as appropriate.
Headphones 122 are connected to a reproduction device 201 shown in Fig. 17. Furthermore, the reproduction device 201 has an acquisition part 131, a headphone inverse characteristic convolution part 171, a speaker phase characteristic convolution part 132, an HRTF convolution part 211, and a reproduction control part 133.
In particular, the reproduction device 201 is configured such that the HRTF convolution part 211 is provided in the subsequent stage of the speaker phase characteristic convolution part 132 in the reproduction device 161.
In the reproduction device 201, not only the speaker characteristic impulse response and the headphone inverse characteristic impulse response described above, but also the HRTF is acquired from an external device by the acquisition part 131 and is held. Note that the HRTF need not be acquired from an external device by the acquisition part 131, and may instead be held in advance by the acquisition part 131.
The acquisition part 131 supplies the held HRTF to the HRTF convolution part 211.
The HRTF convolution part 211 convolves the HRTF supplied from the acquisition part 131 into the audio signal supplied from the speaker phase characteristic convolution part 132, and supplies the audio signal obtained as a result to the reproduction control part 133.
Note that the reproduction device 121 shown in Fig. 13 may be provided with the HRTF convolution part 211.

Next, the operation of the reproduction device 201 will be described. That is, the reproduction processing by the reproduction device 201 will be described below with reference to the flowchart of Fig. 18. Note that at the timing when this reproduction processing is started, the speaker characteristic impulse response, the headphone inverse characteristic impulse response and the HRTF have already been acquired by the acquisition part 131.
When the reproduction processing is started, the processing of step S131 and step S132 is performed. These pieces of processing are similar to those in step S101 and step S102 of Fig. 16, and therefore the description thereof will be omitted.
In step S133, the HRTF convolution part 211 convolves the HRTF supplied from the acquisition part 131 into the audio signal supplied from the speaker phase characteristic convolution part 132, and supplies the audio signal obtained as a result to the reproduction control part 133.
In step S134, the reproduction control part 133 supplies the audio signal supplied from the HRTF convolution part 211 to the headphones 122, and reproduces the sound of the content, and the reproduction processing ends. Therefore, when reproducing the sound of content, the phase characteristic of the headphones 122 is canceled, and the phase characteristic of the speaker 91 and the sound transmission characteristic in the studio are added.
As described above, the reproduction device 201 convolves the headphone inverse characteristic impulse response, the speaker characteristic impulse response, and the HRTF into the audio signal, and then reproduces the sound of content.
As a result, even in a case where the sound of content is reproduced by the headphones 122, it is possible to add the desired phase characteristic and the transmission characteristic in a desired listening environment such as a studio, and allow the listener to hear almost the same sound as the sound of content that the creator M11 was listening to in the studio.
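The three-stage processing of the reproduction device 201 can be sketched as a chain of convolutions. This is an illustration only: it assumes that the HRTF is supplied in the time domain as one head-related impulse response per ear (`hrir_left`, `hrir_right`, both of the same length), which the description above does not specify, and all function and variable names are hypothetical.

```python
import numpy as np

def render_for_headphones(audio, headphone_inverse_ir, speaker_ir,
                          hrir_left, hrir_right):
    """Return a (2, n) array holding the left and right headphone signals."""
    canceled = np.convolve(audio, headphone_inverse_ir)  # as in step S131
    with_speaker = np.convolve(canceled, speaker_ir)     # as in step S132
    left = np.convolve(with_speaker, hrir_left)          # as in step S133
    right = np.convolve(with_speaker, hrir_right)
    return np.stack([left, right])                       # passed on for step S134

# Toy data chosen so that the effect of each stage is easy to follow.
audio = np.array([1.0, 2.0, 3.0])
headphone_inverse_ir = np.array([1.0])   # identity (no change)
speaker_ir = np.array([0.0, 0.0, 1.0])   # pure delay of 2 samples
hrir_left = np.array([1.0, 0.0])         # identity
hrir_right = np.array([0.0, 0.5])        # delay of 1 sample, half gain

stereo = render_for_headphones(audio, headphone_inverse_ir, speaker_ir,
                               hrir_left, hrir_right)
```

With this toy data the left channel is the input delayed by two samples, and the right channel is the input delayed by three samples at half the gain.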

Note that a generation part that generates a target phase characteristic impulse response may be provided inside the reproduction device 121, the reproduction device 161, and the reproduction device 201.
For example, in a case where such a generation part is provided inside the reproduction device 161, the reproduction device 161 is configured as shown in Fig. 19. Note that, in Fig. 19, the same reference numerals are given to the parts corresponding to the case in Fig. 15, and the description thereof will be omitted as appropriate.
The reproduction device 161 shown in Fig. 19 has a generation part 241, an acquisition part 131, a headphone inverse characteristic convolution part 171, a speaker phase characteristic convolution part 132, and a reproduction control part 133.
The configuration of the reproduction device 161 shown in Fig. 19 is such that the reproduction device 161 shown in Fig. 15 is further provided with the generation part 241.
The generation part 241 corresponds to the impulse response generation device 11 and the impulse response generation device 51. That is, the generation part 241 performs similar processing to the impulse response generation processing described with reference to Figs. 9 and 11 to generate the headphone inverse characteristic impulse response and the speaker characteristic impulse response, and supplies the result to the acquisition part 131.
According to the present technology described in each of the above embodiments and modifications, it is possible to obtain a desired phase characteristic by adjusting only the phase characteristic without changing the amplitude characteristic.
For example, the phase characteristic of any speaker used for mastering in music creation, especially the low-frequency phase characteristic, can be added to the sound source while the amplitude characteristic remains flat. Therefore, even when the sound is reproduced using headphones, a low-frequency sound quality effect equivalent to that obtained in a mastering studio can be obtained.
Moreover, even in a case where the target speaker is unknown, if the impulse response of any general IIR filter that imitates the phase characteristic of the speaker is used as the above-mentioned input impulse response, it is possible to use the obtained speaker characteristic impulse response to add a low-frequency phase characteristic equivalent to that of a speaker without changing the amplitude characteristic.
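As an illustration of such an input impulse response, the following sketch computes the impulse response of a first-order IIR high-pass filter. The choice of a first-order filter, the 50 Hz cutoff, the 48 kHz sample rate, and the function name are assumptions made for the example and are not taken from the present specification.

```python
import numpy as np

def first_order_hpf_impulse_response(cutoff_hz, sample_rate_hz, length):
    """Impulse response of y[i] = alpha * (y[i-1] + x[i] - x[i-1])."""
    alpha = 1.0 / (1.0 + 2.0 * np.pi * cutoff_hz / sample_rate_hz)
    x = np.zeros(length)
    x[0] = 1.0                       # unit impulse as the filter input
    y = np.zeros(length)
    prev_x = 0.0
    prev_y = 0.0
    for i in range(length):
        y[i] = alpha * (prev_y + x[i] - prev_x)
        prev_x, prev_y = x[i], y[i]
    return y, alpha

sample_rate_hz = 48000.0   # assumed sample rate
cutoff_hz = 50.0           # assumed low-frequency cutoff imitating a speaker
input_ir, alpha = first_order_hpf_impulse_response(cutoff_hz, sample_rate_hz, 4096)

# Inspect the phase near the cutoff frequency; a high-pass filter shows a
# phase lead (positive phase) in the low-frequency range.
spectrum = np.fft.rfft(input_ir)
freqs = np.fft.rfftfreq(len(input_ir), d=1.0 / sample_rate_hz)
bin_near_cutoff = int(np.argmin(np.abs(freqs - cutoff_hz)))
phase_near_cutoff = np.angle(spectrum[bin_near_cutoff])
```

Only the phase of `spectrum` would be carried over into the speaker characteristic impulse response; its amplitude, which rolls off toward low frequencies, would be replaced by a flat amplitude characteristic.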
Furthermore, if the inverse characteristic of the phase characteristic of the headphones, particularly of the low-frequency phase characteristic, is added by the headphone inverse characteristic impulse response, it is possible to cancel the phase characteristic of the headphones, particularly the low-frequency phase characteristic. Then, if the phase characteristic of the speaker, particularly the low-frequency phase characteristic, is further added by the speaker characteristic impulse response after the phase characteristic of the headphones has been canceled, an effect closer to the low-frequency sound quality effect in the mastering studio can be obtained.
Note that, in a case where the speaker used in the mastering studio can be specified by metadata or the like in the future, an impulse response having the phase characteristic of the speaker may be used as the input impulse response. Furthermore, in a case where the speaker used in the mastering studio cannot be specified, an impulse response of a filter that imitates the phase characteristic of the speaker, such as an IIR-type high-pass filter (HPF), may be used as the input impulse response.
Moreover, when the sound of content is reproduced with headphones, in addition to canceling the phase characteristic of the headphones and adding the phase characteristic of the speaker, by convolving the HRTF into the audio signal of the content, the low-frequency phase characteristic of the listening environment in the mastering studio can be simulated with headphones.

Incidentally, the series of processing described above can be performed by hardware or by software. In a case where the series of processing is performed by software, a program constituting the software is installed in a computer. Here, the computer includes, for example, a computer incorporated in dedicated hardware and a general-purpose personal computer capable of executing various functions by installing various programs.
Fig. 20 is a block diagram showing a configuration example of the hardware of a computer that executes the above-described series of processing by a program.
In a computer, a central processing unit (CPU) 501, a read only memory (ROM) 502, and a random access memory (RAM) 503 are mutually connected by a bus 504.
An input and output interface 505 is further connected to the bus 504. An input part 506, an output part 507, a recording part 508, a communication part 509, and a drive 510 are connected to the input and output interface 505.
The input part 506 includes a keyboard, a mouse, a microphone, an imaging element, and the like. The output part 507 includes a display, a speaker, and the like. The recording part 508 includes a hard disk, a nonvolatile memory, and the like. The communication part 509 includes a network interface and the like. The drive 510 drives a removable recording medium 511 such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory.
In the computer configured as described above, for example, the CPU 501 loads the program recorded in the recording part 508 into the RAM 503 via the input and output interface 505 and the bus 504, and executes the program, so that the above-described series of processing is performed.
The program executed by the computer (CPU 501) can be provided by being recorded on the removable recording medium 511 as a package medium or the like, for example. Furthermore, the program can be provided via a wired or wireless transmission medium such as a local area network, the Internet, or digital satellite broadcasting.
In the computer, a program can be installed in the recording part 508 via the input and output interface 505 by mounting the removable recording medium 511 to the drive 510. Furthermore, the program can be received by the communication part 509 via a wired or wireless transmission medium and installed in the recording part 508. In addition, the program can be installed in the ROM 502 or the recording part 508 in advance.
Note that the program executed by the computer may be a program in which processing is performed in chronological order according to the order described in the present specification, or may be a program in which processing is performed in parallel or at necessary timing, such as when a call is made.
Furthermore, the embodiments of the present technology are not limited to the above-described embodiments, and various modifications are possible without departing from the gist of the present technology.
For example, in the present technology, it is possible to adopt a configuration of cloud computing in which one function is shared by a plurality of devices via a network, and is collaboratively processed.
Furthermore, each step described in the above-described flowchart can be executed by one device or shared by a plurality of devices.
Moreover, in a case where a plurality of processes is included in one step, the plurality of processes included in the one step can be executed by one device or shared and executed by a plurality of devices.
Moreover, the present technology may have the configurations below.

(1) An audio signal processing device including:
- an acquisition part that acquires an impulse response having a flat or substantially flat amplitude characteristic and a predetermined phase characteristic; and
- a phase characteristic convolution part that convolves the impulse response into an input audio signal.
(2) The audio signal processing device according to (1),
in which the predetermined phase characteristic is a phase characteristic of a predetermined speaker.
(3) The audio signal processing device according to (1) or (2), further including
a reproduction control part that controls reproduction by headphones of sound based on an audio signal obtained by convolution of the impulse response.
(4) The audio signal processing device according to (3), further including
an inverse characteristic convolution part that convolves an impulse response having an inverse characteristic of a phase characteristic of the headphones into the input audio signal.
(5) The audio signal processing device according to any one of (1) to (4), further including
an HRTF convolution part that convolves an HRTF into an audio signal obtained by convolution by the phase characteristic convolution part.
(6) The audio signal processing device according to any one of (1) to (5), further including
an impulse response generation part that generates the impulse response.
(7) An audio signal processing method executed by an audio signal processing device, including:
- acquiring an impulse response having a flat or substantially flat amplitude characteristic and a predetermined phase characteristic; and
- convolving the impulse response into an input audio signal.
(8) A program that causes a computer to perform processing including steps of:
- acquiring an impulse response having a flat or substantially flat amplitude characteristic and a predetermined phase characteristic; and
- convolving the impulse response into an input audio signal.
(9) An impulse response generation device that generates a target characteristic impulse response having a flat or substantially flat amplitude characteristic and a predetermined phase characteristic.
(10) The impulse response generation device according to (9), further including:
- a 0-filling processing part that performs 0-filling processing of adding 0 data to predetermined impulse information;
- an impulse information FFT processing part that performs FFT on the impulse information to which the 0 data has been added; and
- an IFFT processing part that performs IFFT on the basis of a phase characteristic obtained by the FFT and a flat amplitude characteristic to generate the target characteristic impulse response.
(11) The impulse response generation device according to (10),
in which the 0-filling processing part adds the 0 data to at least a front side of the impulse information in a time direction.
(12) The impulse response generation device according to (10) or (11), further including
a fade processing part that performs fade processing on an impulse response obtained by the IFFT to obtain the target characteristic impulse response.
(13) The impulse response generation device according to any one of (10) to (12),
in which the impulse information is an impulse response having the predetermined phase characteristic.
(14) The impulse response generation device according to any one of (10) to (12),
in which the impulse information is a simple impulse,
the impulse response generation device further includes
an impulse response FFT processing part that performs FFT on an impulse response having the predetermined phase characteristic, and
an operation processing part that performs operation based on a phase characteristic obtained by the FFT by the impulse information FFT processing part and the phase characteristic obtained by the FFT by the impulse response FFT processing part, and
the IFFT processing part performs the IFFT on the basis of a phase characteristic obtained by the operation and the flat amplitude characteristic.
(15) The impulse response generation device according to (14),
in which, as the operation, the operation processing part adds the phase characteristic obtained by the FFT by the impulse information FFT processing part and the phase characteristic obtained by the FFT by the impulse response FFT processing part.
(16) The impulse response generation device according to (14),
in which, as the operation, the operation processing part subtracts the phase characteristic obtained by the FFT by the impulse response FFT processing part from the phase characteristic obtained by the FFT by the impulse information FFT processing part.
(17) An impulse response generation method including,
by an impulse response generation device,
generating a target characteristic impulse response having a flat or substantially flat amplitude characteristic and a predetermined phase characteristic.
(18) A program that causes a computer to perform processing including a step of
generating a target characteristic impulse response having a flat or substantially flat amplitude characteristic and a predetermined phase characteristic.
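
The generation flow of configurations (10) to (16) can be sketched as follows. This is a sketch under stated assumptions and not the implementation of the present specification: the buffer length, the number of leading zeros, the half-Hann fade shape, the toy input impulse response, and all function names are hypothetical.

```python
import numpy as np

def zero_fill_front(impulse_info, n_front, n_total):
    """0-filling: place impulse_info after n_front zeros in n_total samples."""
    out = np.zeros(n_total)
    out[n_front:n_front + len(impulse_info)] = impulse_info
    return out

def fade(ir, fade_len):
    """Fade-in and fade-out with half-Hann ramps (window shape is an assumption)."""
    out = ir.copy()
    if fade_len > 0:
        ramp = 0.5 - 0.5 * np.cos(np.pi * np.arange(fade_len) / fade_len)
        out[:fade_len] *= ramp
        out[-fade_len:] *= ramp[::-1]
    return out

def generate_target_ir(input_ir, n_front, n_total, operation=None, fade_len=0):
    """Flat-amplitude impulse response carrying a phase derived from input_ir."""
    if operation is None:
        # As in (13): the impulse information is the input impulse response.
        padded = zero_fill_front(input_ir, n_front, n_total)
        phase = np.angle(np.fft.rfft(padded))
    else:
        # As in (14): the impulse information is a simple impulse.
        impulse = zero_fill_front([1.0], n_front, n_total)
        phase_impulse = np.angle(np.fft.rfft(impulse))
        phase_ir = np.angle(np.fft.rfft(input_ir, n=n_total))
        if operation == "add":            # as in (15)
            phase = phase_impulse + phase_ir
        elif operation == "subtract":     # as in (16): inverse characteristic
            phase = phase_impulse - phase_ir
        else:
            raise ValueError("operation must be None, 'add' or 'subtract'")
    # IFFT based on the phase and a flat (unit) amplitude characteristic.
    flat_ir = np.fft.irfft(np.exp(1j * phase), n=n_total)
    return fade(flat_ir, fade_len)

input_ir = np.array([1.0, -0.5, 0.25])   # toy input impulse response
n_front, n_total = 256, 1024

ir_direct = generate_target_ir(input_ir, n_front, n_total)
ir_add = generate_target_ir(input_ir, n_front, n_total, operation="add")
ir_sub = generate_target_ir(input_ir, n_front, n_total, operation="subtract")
ir_faded = generate_target_ir(input_ir, n_front, n_total, fade_len=64)

# Convolving the "add" and "subtract" results circularly leaves a pure delay.
cascade = np.fft.irfft(np.fft.rfft(ir_add) * np.fft.rfft(ir_sub), n=n_total)
```

In this sketch `ir_direct` and `ir_add` coincide, because 0-filling the front of the input impulse response adds the same linear phase as the delayed simple impulse. Before the fade is applied the amplitude characteristic is exactly flat; the fade makes it only substantially flat.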

REFERENCE SIGNS LIST

11: Impulse response generation device
21: 0-filling processing part
22: FFT processing part
23: IFFT processing part
24: Fade processing part
61: FFT processing part
62: 0-filling processing part
63: FFT processing part
64: Operation processing part
121: Reproduction device
131: Acquisition part
132: Speaker phase characteristic convolution part
133: Reproduction control part
171: Headphone inverse characteristic convolution part
211: HRTF convolution part
241: Generation part

Claims

1. An audio signal processing device comprising:
an acquisition part that acquires an impulse response having a flat or substantially flat amplitude characteristic and a predetermined phase characteristic; and
a phase characteristic convolution part that convolves the impulse response into an input audio signal.
2. The audio signal processing device according to claim 1,
wherein the predetermined phase characteristic is a phase characteristic of a predetermined speaker.
3. The audio signal processing device according to claim 1, further comprising
a reproduction control part that controls reproduction by headphones of sound based on an audio signal obtained by convolution of the impulse response.
4. The audio signal processing device according to claim 3, further comprising
an inverse characteristic convolution part that convolves an impulse response having an inverse characteristic of a phase characteristic of the headphones into the input audio signal.
5. The audio signal processing device according to claim 1, further comprising
an HRTF convolution part that convolves an HRTF into an audio signal obtained by convolution by the phase characteristic convolution part.
6. The audio signal processing device according to claim 1, further comprising
an impulse response generation part that generates the impulse response.
7. An audio signal processing method executed by an audio signal processing device, comprising:
acquiring an impulse response having a flat or substantially flat amplitude characteristic and a predetermined phase characteristic; and
convolving the impulse response into an input audio signal.
8. A program that causes a computer to perform processing comprising steps of:
acquiring an impulse response having a flat or substantially flat amplitude characteristic and a predetermined phase characteristic; and
convolving the impulse response into an input audio signal.
9. An impulse response generation device that generates a target characteristic impulse response having a flat or substantially flat amplitude characteristic and a predetermined phase characteristic.
10. The impulse response generation device according to claim 9, further comprising:
a 0-filling processing part that performs 0-filling processing of adding 0 data to predetermined impulse information;
an impulse information FFT processing part that performs FFT on the impulse information to which the 0 data has been added; and
an IFFT processing part that performs IFFT on a basis of a phase characteristic obtained by the FFT and a flat amplitude characteristic to generate the target characteristic impulse response.
11. The impulse response generation device according to claim 10,
wherein the 0-filling processing part adds the 0 data to at least a front side of the impulse information in a time direction.
12. The impulse response generation device according to claim 10, further comprising
a fade processing part that performs fade processing on an impulse response obtained by the IFFT to obtain the target characteristic impulse response.
13. The impulse response generation device according to claim 10,
wherein the impulse information is an impulse response having the predetermined phase characteristic.
14. The impulse response generation device according to claim 10,
wherein the impulse information is a simple impulse,
the impulse response generation device further comprises
an impulse response FFT processing part that performs FFT on an impulse response having the predetermined phase characteristic, and
an operation processing part that performs operation based on a phase characteristic obtained by the FFT by the impulse information FFT processing part and the phase characteristic obtained by the FFT by the impulse response FFT processing part, and
the IFFT processing part performs the IFFT on a basis of a phase characteristic obtained by the operation and the flat amplitude characteristic.
15. The impulse response generation device according to claim 14,
wherein, as the operation, the operation processing part adds the phase characteristic obtained by the FFT by the impulse information FFT processing part and the phase characteristic obtained by the FFT by the impulse response FFT processing part.
16. The impulse response generation device according to claim 14,
wherein, as the operation, the operation processing part subtracts the phase characteristic obtained by the FFT by the impulse response FFT processing part from the phase characteristic obtained by the FFT by the impulse information FFT processing part.
17. An impulse response generation method comprising,
by an impulse response generation device,
generating a target characteristic impulse response having a flat or substantially flat amplitude characteristic and a predetermined phase characteristic.
18. A program that causes a computer to perform processing comprising a step of
generating a target characteristic impulse response having a flat or substantially flat amplitude characteristic and a predetermined phase characteristic.