WO2006120829A1 - Mixed sound separation device - Google Patents
- Publication number
- WO2006120829A1 (PCT/JP2006/307673)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- waveform
- frequency
- local
- analysis
- frequency information
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0272—Voice signal separating
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/02—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
- G10L19/0204—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders using subband decomposition
Definitions
- the present invention relates to a mixed sound separation device that separates a desired sound from a mixed sound.
- a mixed sound separation device as a device for separating a desired sound from a mixed sound.
- frequency analysis is performed on the mixed sound, and a spectrogram is created with the vertical axis representing frequency and the horizontal axis representing time, and the intensity of power at each point is shown in shades.
- a desired sound is separated from the mixed sound on the spectrogram.
- Fourier transform is generally used as a method for converting speech power into a spectrogram, that is, a speech frequency analysis method. For this reason, the Fourier transform plays an important role in the mixed sound separation processing.
- determining the time width of the analysis waveform is equivalent to determining the analysis frame width (time width) in Fourier transform.
- frequency analysis may be performed by applying, to the waveform to be analyzed, a window function whose value is non-zero only in the analysis target section (the time section in which the analysis waveform exists).
- FIG. 1 is a diagram for explaining a method of Fourier transform (discrete Fourier transform).
- by obtaining the cross-correlation (convolution) (Fig. 1 (b)) between the analysis waveform and the waveform to be analyzed shown in Fig. 1 (c), the frequency information (amplitude spectrum and phase spectrum) is obtained.
- the index k in Equation 1 is an index indicating the frequency to be analyzed.
- frequency information at a plurality of frequencies to be analyzed is obtained simultaneously. The larger the index value, the higher the frequency to which the analysis result corresponds.
- time resolution is the length of the time interval that is averaged when obtaining the cross-correlation (convolution) between the waveform to be analyzed and the analysis waveform.
- Frequency resolution means a frequency bandwidth through which a frequency component of a waveform to be analyzed passes, and the bandwidth exists around the frequency to be analyzed.
- FIG. 2 is a diagram showing a relationship between an analysis waveform having a predetermined time width and a frequency characteristic when the waveform to be analyzed is subjected to frequency analysis using the analysis waveform.
- Figure 2 shows the frequency characteristics when frequency analysis is performed using three types of time resolution. From the left column, it shows the relationship between the analysis waveform and the frequency characteristics when frequency analysis is performed using analysis waveforms whose time resolution is one period, two periods, and three periods.
- when frequency analysis is performed with a fine time resolution using the cosine waveform for one cycle as the analysis waveform, the frequency resolution becomes coarse; when frequency analysis is performed with a coarser time resolution using the cosine waveform for three cycles (an analysis waveform with a time width three times that of the one-cycle cosine waveform of Fig. 2), it can be seen that the frequency resolution becomes finer.
- the time resolution (the length of the time interval that is averaged when obtaining the cross-correlation between the waveform to be analyzed and the analysis waveform) and the frequency resolution are in a trade-off relationship.
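The trade-off described above can be checked numerically. The following is a minimal sketch, not the patent's method: it assumes a sampling rate of 8000 Hz and an analyzed frequency of 100 Hz, uses a complex exponential in place of the cosine analysis waveform so that the result does not depend on phase, and measures the band in which the response stays above 1/sqrt(2). The names `gain` and `bandwidth` are hypothetical.

```python
import cmath
import math

FS = 8000.0  # sampling rate in Hz (assumed for illustration)
F0 = 100.0   # frequency to be analyzed in Hz (assumed for illustration)


def gain(n_cycles, f_probe):
    """Normalised magnitude of the correlation between a unit tone at f_probe
    and an analysis waveform made of n_cycles of a complex exponential at F0."""
    n = round(n_cycles * FS / F0)
    acc = sum(cmath.exp(2j * math.pi * (f_probe - F0) * t / FS) for t in range(n))
    return abs(acc) / n


def bandwidth(n_cycles, step=0.5):
    """Width in Hz of the band around F0 where the gain stays above 1/sqrt(2)."""
    df = 0.0
    while gain(n_cycles, F0 + df) > 1 / math.sqrt(2):
        df += step
    return 2 * df


for cycles in (1, 2, 3):
    width_ms = cycles / F0 * 1000
    print(f"{cycles} cycle(s): time width {width_ms:.0f} ms, passband {bandwidth(cycles)} Hz")
```

Under these assumptions the passband narrows roughly in proportion to the number of cycles in the analysis waveform, which is the coupling between time width and frequency resolution that the text describes.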
- frequency analysis is performed using a cosine waveform having a time width determined from the time resolution (spatial resolution) and the frequency resolution (an analysis waveform having a zero value in time intervals other than the above time width).
- FIG. 3 is a diagram for explaining cosine transform (discrete cosine transform).
- cosine transform (discrete cosine transform)
- an analysis waveform having a zero value in the time interval other than the above time width
- by obtaining the cross-correlation (convolution) (Fig. 3 (b)) between the analysis waveform and the waveform to be analyzed shown in Fig. 3 (c), the frequency information (represented by combining the amplitude spectrum and the phase spectrum) is obtained.
- the index k in Equations 5 and 6 is an index indicating the frequency to be analyzed, and in the cosine transform, frequency information at a plurality of frequencies to be analyzed is obtained simultaneously. The larger the index value, the higher the frequency to which the analysis result corresponds.
- the time resolution (the length of the time interval to be averaged when obtaining the cross-correlation between the waveform to be analyzed and the analysis waveform) and the frequency resolution are both automatically determined. This mechanism is the same as in the case of the Fourier transform (Fig.
- Equation 5 performs frequency analysis using, in the form of an integration, the cross-correlation (convolution) between the waveform to be analyzed and the analysis waveform.
- frequency analysis is performed using a wavelet basis function having a time width determined from time resolution (spatial resolution) and frequency resolution.
- FIG. 4 is a diagram for explaining wavelet transform.
- a wavelet basis function (an analysis waveform having a zero value in time intervals other than the above time range)
- an analysis waveform having a predetermined time width as shown in Fig. 4 (a) is used.
- the cross-correlation (convolution)
- the frequency information (amplitude spectrum and phase spectrum) is obtained.
- a is a wavelet basis function.
- the time resolution (the length of the time interval to be averaged when obtaining the cross-correlation between the waveform to be analyzed and the analysis waveform) and the frequency resolution are both automatically determined.
- This mechanism is the same as in the case of the Fourier transform (see Figure 2).
- time resolution (or frequency resolution) can be set independently for each frequency to be analyzed.
- all analyzed frequencies have the same time resolution (time width of the time window to be analyzed) and frequency resolution, and these cannot be set independently for each frequency to be analyzed.
- the frequency resolution (or time resolution) is automatically determined by the time resolution (or frequency resolution).
- the wavelet transform is described here using a Mexican hat as the wavelet basis function; wavelet transforms using other wavelet basis functions such as Daubechies, Meyer, and Gabor also exist.
- Non-Patent Document 1 Hironobu Nakano and 2 others, “Signal processing and image processing by wavelet”, August 15, 1999, Kyoritsu Publishing Co., pp. 35-39, pp. 49-52
- Non-Patent Document 2 Seiichi Nakagawa, “Pattern Information Processing”, March 30, 1999, Maruzen Co., Ltd., pp. 14-19
- the time resolution (the length of the time interval to be averaged when obtaining the cross-correlation between the waveform to be analyzed and the analysis waveform) and the frequency resolution (the frequency bandwidth around the analysis frequency through which the frequency component of the waveform to be analyzed passes) interfere with each other. Therefore, if the time width of the analysis waveform is shortened to make the time resolution finer, the frequency resolution becomes coarser, and if the time width of the analysis waveform is lengthened to make the frequency resolution finer, the time resolution becomes coarser. Consequently, there is a problem that time resolution and frequency resolution cannot be set independently.
- the present invention has been made in view of such problems, and its object is to provide a mixed sound separation device that can separate a specific sound from the mixed sound with high accuracy, based on a result equivalent to frequency analysis performed with both the time resolution (the length of the time interval that is averaged when obtaining the cross-correlation between the waveform to be analyzed and the analysis waveform) and the frequency resolution (the frequency bandwidth around the analysis frequency through which the frequency component of the waveform to be analyzed passes) set finely at the same time.
- a mixed sound separation device is a mixed sound separation device that separates a specific sound from mixed sound composed of a plurality of sounds, and is a predetermined sound
- a local frequency information creating means for obtaining a plurality of pieces of local frequency information corresponding to the local analysis waveform including at least one of the phase spectra
- a specific sound frequency feature extraction means that performs pattern matching with a set of frequency information and extracts the sets of the plurality of local frequency information based on the result of the pattern matching, and a sound signal generation means that generates a signal of the specific sound based on the sets of the plurality of local frequency information extracted by the specific sound frequency feature extraction means.
- the time resolution and the frequency resolution can be set independently, and a plurality of sets of local frequency information each analyzed by a plurality of frequency resolutions (a plurality of time resolutions) are determined in advance.
- a plurality of sets of local frequency information each analyzed by a plurality of frequency resolutions are determined in advance.
- the above-described mixed sound separation device may further include an analysis waveform time width determining means for determining the time width of the analysis waveform based on the predetermined frequency resolution.
- the analysis waveform includes a cosine waveform or a sine waveform
- the analysis waveform time width determining means determines the time width of the analysis waveform, based on the predetermined frequency resolution, so that the analysis waveform includes a cosine waveform or a sine waveform for an integer number of periods.
- the integer period is one period.
- the above-described mixed sound separation device further includes frequency resolution input receiving means for receiving an input of frequency resolution, and the analysis waveform time width determining means is based on the input frequency resolution. The time width of the analysis waveform may be determined.
- the frequency resolution can be controlled based on the properties of the waveform to be analyzed, the application specifications, and the like.
- the mixed sound separation device described above may further include an analysis waveform dividing means that creates the plurality of local analysis waveforms by dividing the analysis waveform, based on the predetermined spatiotemporal resolution, so that the pieces do not overlap in time.
- the analysis waveform dividing means may generate the plurality of local analysis waveforms by dividing the analysis waveform so as to have a plurality of spatiotemporal resolutions.
- the mixed sound separation device described above may further include a spatiotemporal resolution input receiving unit that receives an input of a spatiotemporal resolution, and the analysis waveform dividing unit may divide the analysis waveform based on the input spatiotemporal resolution to create the plurality of local analysis waveforms. This makes it possible to control the frequency resolution based on the characteristics of the waveform to be analyzed, application specifications, and the like.
- a frequency analysis device is a device that performs frequency analysis of a waveform to be analyzed using an analysis waveform for analyzing a predetermined frequency, and includes: a local frequency information creation means that obtains, from the waveform to be analyzed and a plurality of local analysis waveforms that are each configured from a part of the analysis waveform and have a predetermined spatiotemporal resolution, a plurality of pieces of local frequency information corresponding to the local analysis waveforms, each including at least one of an amplitude spectrum and a phase spectrum at the predetermined frequency; and an analyzed waveform frequency feature quantity extracting means that treats the plurality of pieces of local frequency information obtained by the local frequency information creation means as a set and extracts, from the set, the frequency feature quantity contained in the waveform to be analyzed at a predetermined frequency resolution.
- FIG. 5 is a diagram illustrating the overall configuration of the present invention.
- the time width of the analysis waveform is determined based on a predetermined frequency resolution as shown in FIG. 5 (a). That is, as shown in Fig. 5 (b), the cosine waveform for three cycles is used as the analysis waveform.
- it is necessary to set the frequency resolution to be fine, so the time width of the analysis waveform is set so that the frequency resolution is about 15 Hz.
- the time resolution (the length of the time interval averaged when obtaining the cross-correlation between the waveform to be analyzed and the analysis waveform) is determined by the time width of the analysis waveform, so it becomes the time width of the cosine waveform for three cycles, resulting in coarse time resolution.
- the fine temporal structure of the waveform to be analyzed (change in frequency information at time intervals smaller than the time width of the cosine waveform for three cycles) cannot be expressed.
- the analysis waveform is temporally divided based on a desired time resolution.
- the analysis waveform is divided into time intervals smaller than the length of the fundamental waveform so that the structure of the fundamental waveform of speech can be seen.
- the analysis waveform is divided into cosine waveforms for one period to create three local analysis waveforms.
- the time resolution (the length of the time interval that is averaged when obtaining the cross-correlation between the waveform to be analyzed and the analysis waveform) is the time width of the cosine waveform for one cycle, which is finer than the time width of the cosine waveform for three cycles. In other words, the time resolution is set finely, independently of the frequency resolution (however, the three local analysis waveforms are extracted from the same analysis waveform).
- three pieces of local frequency information are obtained by performing frequency analysis using three local analysis waveforms.
- the local frequency information is obtained by calculating the cross-correlation (convolution) between the waveform to be analyzed and the local analysis waveform by replacing the analysis waveform with the local analysis waveform in the conventional frequency analysis.
- let us consider the relationship between the frequency information obtained by the discrete cosine transform, which is the conventional technique, using the analysis waveform (a cosine waveform for three cycles), and the three pieces of local frequency information obtained in the present invention using the local analysis waveforms created by temporally dividing that cosine waveform for three cycles. In the case of the example in FIG. 5, the frequency information obtained by the discrete cosine transform, which is the conventional technique, is expressed by Equation 11.
- the three pieces of local frequency information in the present invention are expressed by Equation 12, Equation 13, and Equation 14.
- it can be seen that the frequency information obtained by the discrete cosine transform is equal to the sum of the three pieces of local frequency information obtained by the present invention.
- the three pieces of local frequency information obtained by the present invention include frequency information having the frequency resolution obtained by the discrete cosine transform. In other words, if three pieces of local frequency information are considered together, frequency information with fine frequency resolution can be obtained.
- it is possible to extract the frequency feature quantity contained in the waveform to be analyzed as if the frequency analysis had been performed with both the time resolution (the length of the time period to be averaged when obtaining the cross-correlation between the waveform to be analyzed and the analysis waveform) and the frequency resolution set finely at the same time.
- an analyzed waveform with a time width equivalent to a cosine waveform for three periods is required to obtain three pieces of local frequency information. Therefore, the length of the time interval of the waveform to be analyzed necessary for frequency analysis is the same as the conventional analysis method.
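The Fig. 5 example can be sketched in a few lines. This is an illustration under stated assumptions rather than the patent's implementation: the sampling rate (8000 Hz), the analyzed frequency (100 Hz), the test signal (a tone that starts after the first cycle), and the helper name `correlate` are all hypothetical.

```python
import math

FS = 8000           # sampling rate in Hz (assumed)
F0 = 100            # frequency to be analyzed in Hz (assumed)
PERIOD = FS // F0   # samples per cycle of the analysis waveform


def correlate(signal, waveform, start):
    """Inner product of the signal with a waveform placed at sample `start`."""
    return sum(signal[start + i] * w for i, w in enumerate(waveform))


# Analysis waveform: a cosine for three cycles (sets the frequency resolution).
analysis = [math.cos(2 * math.pi * F0 * i / FS) for i in range(3 * PERIOD)]

# Local analysis waveforms: three non-overlapping one-cycle pieces of it
# (sets the time resolution).
local_waveforms = [analysis[k * PERIOD:(k + 1) * PERIOD] for k in range(3)]

# Hypothetical waveform to be analyzed: silent for one cycle, then a 100 Hz tone.
signal = [0.0 if i < PERIOD else math.cos(2 * math.pi * F0 * i / FS)
          for i in range(3 * PERIOD)]

conventional = correlate(signal, analysis, 0)
local_info = [correlate(signal, w, k * PERIOD) for k, w in enumerate(local_waveforms)]

print("conventional value:", round(conventional, 6))
print("local values      :", [round(v, 6) for v in local_info])
print("sum of local      :", round(sum(local_info), 6))
```

The three local values add up to the conventional value, while the first local value stays near zero and so records that the tone was absent during the first cycle, which the single conventional value cannot show.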
- FIG. 6 is a diagram illustrating an example in which frequency analysis is performed based on another frequency resolution.
- the cosine waveform for 4 cycles is used as the analysis waveform.
- the time resolution (the length of the time interval to be averaged when obtaining the cross-correlation between the waveform to be analyzed and the analysis waveform) becomes the time width of the cosine waveform for 4 cycles, and the time resolution becomes coarse. This makes it impossible to represent the detailed temporal structure of the waveform to be analyzed.
- the analysis waveform is temporally divided based on a desired time resolution.
- the analysis waveform is divided into cosine waveforms for two periods to create two local analysis waveforms.
- the time resolution (the length of the time interval that is averaged when obtaining the cross-correlation between the waveform to be analyzed and the analysis waveform) is the time width of the cosine waveform for two cycles, and is set finely, independently of the frequency resolution (however, the two local analysis waveforms are extracted from the same analysis waveform).
- frequency analysis is performed using two local analysis waveforms to obtain two pieces of local frequency information.
- the local frequency information is obtained by replacing the analysis waveform with the local analysis waveform in the conventional frequency analysis and calculating the cross-correlation (convolution) between the waveform to be analyzed and the local analysis waveform.
- let us consider the relationship between the frequency information obtained by the discrete cosine transform, which is a conventional technique, using the analysis waveform (a cosine waveform for four cycles), and the two pieces of local frequency information obtained in the present invention using the cosine waveforms for two cycles created by dividing that analysis waveform.
- the frequency information obtained by the discrete cosine transform, which is a conventional technique, is expressed by Equation 17.
- the two pieces of local frequency information in the present invention are expressed by Equations 18 and 19.
- it can be seen that the frequency information obtained by the discrete cosine transform is equal to the sum of the two pieces of local frequency information obtained by the present invention.
- it can be seen that the two pieces of local frequency information obtained in the present invention include the frequency information having the desired frequency resolution obtained by the discrete cosine transform. In other words, if two pieces of local frequency information are considered together, frequency information with fine frequency resolution can be obtained.
- there are a plurality of combinations of local frequency information values (Equations 18 and 19) that correspond to the same frequency information value (Equation 17) obtained by discrete cosine transform with a desired frequency resolution.
- there are the combinations shown in Equation 21. That is, X
- two pieces of local frequency information treated as a set of data can be used to resolve frequency information having a desired frequency resolution to a desired fine time resolution.
- it can be seen that this distributed representation by two pieces of local frequency information is the frequency information obtained by the conventional discrete cosine transform with information on changes in the temporal frequency structure added.
- it is possible to extract the frequency feature quantity contained in the waveform to be analyzed as if the frequency analysis had been performed with both the time resolution (the length of the time interval to be averaged when obtaining the cross-correlation between the waveform to be analyzed and the analysis waveform) and the frequency resolution set finely at the same time.
- an analyzed waveform with a time width equivalent to a cosine waveform for four periods is required to obtain two pieces of local frequency information. Therefore, the length of the time interval of the waveform to be analyzed necessary for frequency analysis is the same as the conventional analysis method.
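The statement that several combinations of local values correspond to one conventional value can be illustrated as follows. This is a sketch under assumptions: the sampling rate, the analyzed frequency, and the two test signals (`steady` and `early`) are invented for illustration, and `conventional` and `local_pair` are hypothetical helper names, not terms from the patent.

```python
import math

FS, F0 = 8000, 100   # sampling rate and analyzed frequency in Hz (assumed)
PERIOD = FS // F0    # samples per cycle


def cos_wave(n_samples):
    """Cosine at F0, n_samples long, starting at phase zero."""
    return [math.cos(2 * math.pi * F0 * i / FS) for i in range(n_samples)]


# Analysis waveform: a cosine for four cycles, divided into two
# non-overlapping local analysis waveforms of two cycles each.
analysis = cos_wave(4 * PERIOD)
halves = [(0, analysis[:2 * PERIOD]), (2 * PERIOD, analysis[2 * PERIOD:])]


def conventional(signal):
    """Single value obtained with the whole four-cycle analysis waveform."""
    return sum(s * a for s, a in zip(signal, analysis))


def local_pair(signal):
    """Set of two local values, one per two-cycle local analysis waveform."""
    return tuple(sum(signal[start + i] * a for i, a in enumerate(piece))
                 for start, piece in halves)


# Two hypothetical waveforms to be analyzed.
steady = cos_wave(4 * PERIOD)                      # constant tone
early = [2 * s if i < 2 * PERIOD else 0.0          # louder tone, first half only
         for i, s in enumerate(steady)]

for name, sig in (("steady", steady), ("early", early)):
    pair = local_pair(sig)
    print(name, "conventional:", round(conventional(sig), 6),
          "local pair:", tuple(round(v, 6) for v in pair))
```

Both signals give the same conventional value, but their local pairs differ, so the pair treated as one set of data carries the temporal information that the single value loses.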
- FIG. 7 is a diagram showing an example of creating a local analysis waveform by temporally overlapping and dividing the analysis waveform.
- Fig. 7 (a) is a diagram showing the frequency resolution in this example, which is the same as the frequency resolution shown in Fig. 6 (a).
- as shown in Fig. 7 (b), the same cosine waveform for four cycles as in Fig. 6 is used as the analysis waveform.
- the time resolution (the length of the time interval that is averaged when obtaining the cross-correlation between the waveform to be analyzed and the analysis waveform) becomes the time width of the cosine waveform for 4 cycles, and the time resolution becomes coarse. This makes it impossible to represent the detailed temporal structure of the waveform to be analyzed.
- the analysis waveform is temporally divided based on a desired time resolution.
- the analysis waveform is divided into cosine waveforms for two periods that overlap in time, creating three local analysis waveforms.
- the time resolution (the length of the time interval that is averaged when obtaining the cross-correlation between the waveform to be analyzed and the analysis waveform) here is the time width of the cosine waveform for two periods (however, the local analysis waveforms are extracted from the same analysis waveform).
- three pieces of local frequency information are obtained by performing frequency analysis using three local analysis waveforms.
- the local frequency information is obtained by calculating the cross-correlation (convolution) between the waveform to be analyzed and the local analysis waveform by replacing the analysis waveform with the local analysis waveform in the conventional frequency analysis.
- let us consider the relationship between the frequency information obtained by the discrete cosine transform, which is a conventional technique, using the analysis waveform (a cosine waveform for four cycles), and the three pieces of local frequency information obtained in the present invention using the cosine waveforms for two cycles.
- the sum of the three pieces of local frequency information is approximately twice the frequency information obtained by the discrete cosine transform.
- the three pieces of local frequency information contain frequency information obtained with fine frequency resolution by discrete cosine transform.
- FIG. 8 is a diagram illustrating an example in which frequency analysis is performed based on another time resolution.
- Figure 8 (a) shows the frequency resolution in this example, which is the same as the frequency resolution shown in Figure 5 (a).
- frequency analysis is performed with a finer time resolution (the length of the time interval that is averaged when obtaining the cross-correlation between the waveform to be analyzed and the analysis waveform) than in the example of Fig. 5.
- the cosine waveform for the same three cycles as in Fig. 5 is used as the analysis waveform.
- the time resolution becomes the time width of the cosine waveform for three periods, and the time resolution becomes coarse. Therefore, in the example of Fig. 8, as shown in Fig. 8 (c), the analysis waveform is divided into cosine waveforms for 0.5 cycles to create six local analysis waveforms.
- the time resolution here is the time width of a cosine waveform for 0.5 period. Then, frequency analysis is performed using six local analysis waveforms to obtain six pieces of local frequency information.
- the relationship between the frequency information obtained by the discrete cosine transform, which is the conventional technique, using the analysis waveform (cosine waveform for three cycles) and the six pieces of local frequency information in the present invention is considered.
- the frequency information obtained by the discrete cosine transform is obtained from the sum of the six pieces of local frequency information.
- the six pieces of local frequency information include frequency information obtained by discrete cosine transform obtained with a predetermined frequency resolution.
- treating the six pieces of local frequency information as one set of data gives a distributed representation of frequency information whose frequency resolution is finer than that of a single piece of local frequency information, with the six pieces of local frequency information, each having fine temporal resolution, as its components. It is apparent that this is the frequency information obtained by the conventional discrete cosine transform plus information on changes in the temporal frequency structure.
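The division into 0.5-cycle pieces in Fig. 8 is one case of a more general operation, sketched below. The function name `local_frequency_info`, the sampling rate, the analyzed frequency, and the test signal (a tone burst lasting only the second cycle) are assumptions made for illustration; the sketch only handles piece lengths that divide the analysis waveform evenly.

```python
import math

FS, F0 = 8000, 100   # sampling rate and analyzed frequency in Hz (assumed)
PERIOD = FS // F0    # samples per cycle


def local_frequency_info(signal, total_cycles, piece_cycles):
    """Correlate the signal with non-overlapping pieces of a cosine analysis
    waveform. total_cycles sets the frequency resolution, piece_cycles the
    time resolution. The piece length must divide the total length."""
    total = round(total_cycles * PERIOD)
    piece = round(piece_cycles * PERIOD)
    analysis = [math.cos(2 * math.pi * F0 * i / FS) for i in range(total)]
    return [sum(signal[i] * analysis[i] for i in range(start, start + piece))
            for start in range(0, total, piece)]


# Hypothetical waveform to be analyzed: a 100 Hz burst present only in cycle 2.
signal = [math.cos(2 * math.pi * F0 * i / FS) if PERIOD <= i < 2 * PERIOD else 0.0
          for i in range(3 * PERIOD)]

six = local_frequency_info(signal, 3, 0.5)   # six half-cycle pieces
one = local_frequency_info(signal, 3, 3)     # conventional single value

print("six local values:", [round(v, 6) for v in six])
print("sum             :", round(sum(six), 6))
print("conventional    :", round(one[0], 6))
```

With these assumptions only the third and fourth local values are non-zero, locating the burst to within half a cycle, and the six values still add up to the conventional value.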
- FIG. 9 is a diagram showing the relationship between frequency information based on a cosine waveform for one period and frequency information based on Fourier transform.
- a cosine waveform for one period corresponding to the frequency to be analyzed is used as a local analysis waveform in the same manner as in the example of Fig. 5.
- the frequency to be analyzed is expressed as fn when the fundamental frequency is f1 as shown in Fig. 9 (c).
- fn indicates a frequency n times f1. Then, as shown in Fig.
- the frequency information of the Fourier transform can be created by obtaining the sum of the local frequency information that falls within the time window in the Fourier transform, as in the example of Fig. 5.
- the number of local frequency information entering the time window in the Fourier transform corresponds to one for the local frequency information corresponding to the frequency f1 and to the frequency f2.
- waveform information can be easily created from frequency information by inverse Fourier transform. This shows that the local frequency information in the present invention can be converted into waveform information.
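The relationship between one-period local frequency information and a Fourier coefficient can be checked with a short sketch. Several things here are assumptions not stated in the text: the Fourier time window is taken to be exactly one period of the fundamental f1 (64 samples), a complex exponential is used instead of the cosine waveform so that amplitude and phase are captured together, and `local_info` and `dft_bin` are hypothetical helper names.

```python
import cmath
import math

N = 64  # samples in the Fourier time window = one period of f1 (assumed)


def dft_bin(signal, n):
    """Conventional Fourier coefficient at frequency f_n = n * f1."""
    return sum(signal[t] * cmath.exp(-2j * math.pi * n * t / N) for t in range(N))


def local_info(signal, n):
    """Local frequency information at f_n: one complex value per one-period
    piece of the analysis waveform (n must divide N in this sketch)."""
    period = N // n  # samples in one period of f_n
    return [sum(signal[p * period + i]
                * cmath.exp(-2j * math.pi * n * (p * period + i) / N)
                for i in range(period))
            for p in range(n)]


# Hypothetical waveform to be analyzed: two harmonics of f1.
signal = [math.sin(2 * math.pi * t / N) + 0.5 * math.cos(2 * math.pi * 2 * t / N + 0.3)
          for t in range(N)]

for n in (1, 2, 4):
    pieces = local_info(signal, n)
    print(f"f{n}: {len(pieces)} local value(s), "
          f"|sum - Fourier coefficient| = {abs(sum(pieces) - dft_bin(signal, n)):.2e}")
```

Under these assumptions n local values fall inside the window for the frequency fn, and their sum reproduces the Fourier coefficient, so an inverse Fourier transform can then be applied to the summed values to return to waveform information.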
- the local frequency information of the sound to be extracted can be extracted from the mixed sound with high accuracy by using a set of local frequency information for each frequency, expressed with a fine frequency resolution and a fine time resolution (the length of the time interval to be averaged when obtaining the cross-correlation between the waveform to be analyzed and the analysis waveform), so that a clear extracted sound (waveform information of the extracted sound) can be provided to the user.
- this method is characterized in that an analysis time width (corresponding to the time width of the analysis waveform) is determined based on the desired frequency resolution, a plurality of analysis waveforms (corresponding to local analysis waveforms), each extracted from the same analysis waveform having the predetermined frequency, are prepared so as to be within the analysis time width, a plurality of pieces of frequency information (corresponding to local frequency information) are created using the plurality of analysis waveforms (local analysis waveforms), and these are treated as a set of data to analyze the frequency features of the waveform to be analyzed.
- the time resolution (the length of the time interval to be averaged when obtaining the cross-correlation between the waveform to be analyzed and the analysis waveform) and the frequency resolution can be set independently.
- a mixed sound separation device and a frequency analysis device are provided that can perform frequency analysis as if it were performed with both the time resolution and the frequency resolution set finely at the same time. They can be used as a basic technology in a wide range of fields such as speech recognition, sound recognition, character recognition, face recognition, and iris authentication, and their practical value is extremely high.
- FIG. 1 is a diagram for explaining a conventional Fourier transform (discrete Fourier transform) method.
- FIG. 2 is a diagram showing a relationship between an analysis waveform having a predetermined time width and a frequency characteristic when the analyzed waveform is subjected to frequency analysis using the analysis waveform.
- FIG. 3 is a diagram for explaining cosine transform (discrete cosine transform), which is a conventional technique.
- FIG. 4 is a diagram for explaining wavelet transform, which is a conventional technique.
- FIG. 5 is a diagram for explaining the overall configuration of the present invention.
- FIG. 6 is a diagram illustrating an example in which frequency analysis is performed based on another frequency resolution.
- FIG. 7 is a diagram showing an example of creating a local analysis waveform by dividing the analysis waveform by overlapping in time.
- FIG. 8 is a diagram illustrating an example in which frequency analysis is performed based on another time resolution.
- FIG. 9 is a diagram showing the relationship between frequency information based on a cosine waveform for one period and frequency information based on Fourier transform.
- FIG. 10 is a block diagram showing the overall configuration of the frequency analyzer according to the embodiment of the present invention.
- FIG. 11 is a flowchart showing an operation procedure of the mixed sound separation system 100.
- FIG. 12 shows an example of mixed sound S100.
- FIG. 13 shows an analysis waveform and local frequency information.
- FIG. 14 is a diagram showing local frequency information obtained by experiments.
- FIG. 15 is a diagram showing an example of a method for extracting the local frequency information of the extracted sound included in the mixed sound S100.
- FIG. 16 is a diagram comparing the configuration of the conventional method and the method of the present invention for the extraction of frequency feature values.
- FIG. 17 is a diagram showing a spatial image of local frequency information.
- FIG. 18 is a diagram showing an example of the local frequency information of the extracted sound included in the mixed sound S100.
- FIG. 19 is a block diagram showing another example of the overall configuration of the frequency analyzer according to the embodiment of the present invention.
- FIG. 20 is a diagram for explaining the local frequency information DB created by the local frequency information creating unit.
- FIG. 21 is a diagram for explaining the local frequency information DB created by the local frequency information creation unit.
- FIG. 22 is a diagram showing an example of local frequency information DB.
- FIG. 23 is a diagram showing an example of a frequency feature amount analysis method using the local frequency information DB.
- FIG. 24 is a diagram showing an example of a frequency feature amount analysis method using the local frequency information DB.
- FIG. 25 is a diagram for explaining the local frequency information DB created by the local frequency information creating unit.
- FIG. 26 is a diagram showing an example of local frequency information DB.
- FIG. 27 is a diagram showing an example of a frequency feature amount analysis method using the local frequency information DB.
- FIG. 28 is a diagram showing an example of a frequency feature amount analysis method using the local frequency information DB.
- FIG. 10 is a block diagram showing the overall configuration of the frequency analyzer according to the embodiment of the present invention.
- the frequency analyzer according to the present invention is incorporated in a mixed sound separation system.
- an example will be described in which the voice of one speaker is separated from a mixed sound composed of the voices of three speakers by frequency analysis of the mixed sound.
- the mixed sound separation system 100 is a system that extracts the voice of one speaker from the mixed sound in which the voices of a plurality of speakers are mixed.
- the microphone 101, the frequency analyzer 102, the sound conversion unit 107, and the speaker 108 are provided.
- the frequency analysis device 102 is a processing device that analyzes frequency components included in the mixed sound and extracts frequency feature amounts.
- the analysis waveform time width determination unit 103, the analysis waveform division unit 104, the local frequency information creation unit 105, and the analyzed waveform frequency feature quantity extraction unit 106 are provided.
- Microphone 101 takes in the mixed sound S100 and outputs it to the local frequency information creation unit 105.
- the analysis waveform time width determination unit 103 determines the time width of the analysis waveform corresponding to the frequency to be analyzed based on a predetermined frequency resolution.
- based on a predetermined time resolution (the length of the time interval that is averaged when obtaining the correlation between the waveform to be analyzed and the analysis waveform), the analysis waveform division unit 104 divides the analysis waveform S101 created by the analysis waveform time width determination unit 103, allowing temporal overlap, and creates a plurality of local analysis waveforms S102.
- the local frequency information creation unit 105 obtains, with the predetermined time resolution, a plurality of pieces of local frequency information S103 corresponding to the local analysis waveforms S102, each including at least one of an amplitude spectrum and a phase spectrum.
- the analyzed waveform frequency feature quantity extraction unit 106 treats the plurality of pieces of local frequency information S103 as a set of data, thereby extracting, at the above frequency resolution, the local frequency information of the extracted sound included in the mixed sound S100. Using that local frequency information, it creates the Fourier coefficient S104 of the extracted sound, which is one of the frequency feature quantities included in the mixed sound S100.
- the sound converter 107 creates an extracted sound (extracted sound waveform) S105 using the Fourier coefficient S104 of the extracted sound.
- the speaker 108 outputs the extracted sound S105 to the user.
- FIG. 11 is a flowchart showing an operation procedure of the mixed sound separation system 100.
- the mixed sound S100, which consists of the voices of three speakers, is taken into the local frequency information creation unit 105 of the frequency analyzer 102 (step 200 in FIG. 11).
- Fig. 12 shows an example of the mixed sound S100.
- FIG. 12 (a) shows the waveform of the mixed sound S100
- FIG. 12 (b) shows the spectrum of the mixed sound S100 obtained by Fourier transform, which is a conventional technique.
- speech can be expressed by repeating the basic waveform.
- the amplitude of the basic waveform is not large for all times, but there is a time region close to zero.
- if the analysis is performed with a finer time resolution, the characteristics of the basic waveforms of the three speakers' voices in the mixed sound can be analyzed.
- the mixed sound waveform in Fig. 12 (a) is displayed with coarse time resolution, so it is difficult to see the characteristics of the basic waveforms of the three speakers' voices. This shows that improving the time resolution is important for separating mixed sounds.
- in the spectrogram based on the Fourier transform in Fig. 12 (b), the time resolution and the frequency resolution cannot both be made fine at the same time in the Fourier transform, so it is difficult to separate the spectral features of the voices.
- the analysis waveform time width determination unit 103 determines the time width of the analysis waveform corresponding to the frequency to be analyzed based on a predetermined frequency resolution, and creates the analysis waveform S101 (step 201 in FIG. 11).
- the time width of the analysis waveform S101 is set to the time width (the time window in the Fourier transform) of one period of the fundamental frequency.
- FIGS. 13 (a) and 13 (b) are diagrams for explaining frequency analysis using a cosine waveform.
- FIGS. 13 (c) and 13 (d) are diagrams for explaining frequency analysis using a sine waveform.
- FIGS. 13 (a) and 13 (c) show the analysis waveforms described above.
- FIGS. 13 (b) and 13 (d) show the local frequency information corresponding to the analysis waveforms shown in FIGS. 13 (a) and 13 (c).
- the analysis waveforms shown in Fig. 13 (a) and Fig. 13 (c) are the waveforms obtained by combining both the solid and dashed waveforms (a waveform drawn with a solid line only represents one local analysis waveform).
- an analysis waveform with the same time width is used for all frequencies to be analyzed.
- because the frequencies to be analyzed differ, the number of periods contained in the analysis waveform differs depending on the frequency to be analyzed. Specifically, as shown in Fig. 13 (a) and Fig. 13 (c), the analysis waveform for the fundamental frequency f1 consists of a cosine waveform and a sine waveform of one period; the analysis waveform for f2, which is twice the fundamental frequency f1, consists of a cosine waveform and a sine waveform of two periods; and the analysis waveform for f3, which is three times the fundamental frequency f1, consists of a cosine waveform and a sine waveform of three periods.
- the frequency resolution of the analysis waveform before it is divided into local analysis waveforms is the same as that shown in Fig. 9 (c): it is a frequency resolution at which the frequency characteristics of the frequencies f1, f2, and f3 to be analyzed are orthogonal.
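The relationship between the common time width and the number of periods per frequency can be sketched as follows. This is a minimal illustration, not the patent's implementation: the helper names `analysis_waveform` and `inner`, and the choice of 48 samples for one period of f1, are assumptions made for the example.

```python
import math

def analysis_waveform(k, n_samples, kind="cos"):
    """Return k periods of a cosine (or sine) spanning exactly n_samples samples."""
    fn = math.cos if kind == "cos" else math.sin
    return [fn(2 * math.pi * k * n / n_samples) for n in range(n_samples)]

def inner(a, b):
    """Inner product of two equal-length waveforms."""
    return sum(x * y for x, y in zip(a, b))

# Assumed: one period of the fundamental frequency f1 spans 48 samples.
N = 48
w1, w2, w3 = (analysis_waveform(k, N) for k in (1, 2, 3))  # f1, f2 = 2*f1, f3 = 3*f1

# Over the common time width the three analysis waveforms are mutually orthogonal.
print(abs(inner(w1, w2)) < 1e-9, abs(inner(w1, w3)) < 1e-9, abs(inner(w2, w3)) < 1e-9)
```

All three waveforms share the same length, so only the number of periods inside the window changes with the frequency to be analyzed.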
- determining the time width of the analysis waveform is equivalent to determining the analysis frame width in the short-time Fourier transform.
- in the short-time Fourier transform a window function may be applied to the waveform to be analyzed; this example is equivalent to applying to the waveform to be analyzed a rectangular window having the same time width as the analysis waveform.
- frequency analysis may be performed by applying a window function having a non-zero value in the analysis target section (time section in which the analysis waveform exists) to the analyzed waveform.
- by further including a frequency resolution input receiving unit, the frequency analyzer 102 can determine the frequency resolution based on the nature of the waveform S100 to be analyzed and the specifications of the application.
- Such frequency resolution may be input from the outside. For example, the feature quantities of a sudden sound can be analyzed with coarse frequency resolution (fewer pieces of local frequency information are grouped at the same time resolution), whereas for musical sounds the feature quantities need to be analyzed with fine frequency resolution (more pieces of local frequency information are grouped at the same time resolution). Since the amount of calculation when extracting feature quantities depends on the number of data items grouped, the calculation cost can be reduced by controlling the frequency resolution according to the nature of the input waveform to be analyzed.
- based on a predetermined time resolution, the analysis waveform dividing unit 104 divides the analysis waveform S101 created by the analysis waveform time width determination unit 103, allowing temporal overlap, and creates a plurality of local analysis waveforms S102 (step 202 in FIG. 11).
- for each frequency to be analyzed, the analysis waveform S101 (the waveform combining both solid and dashed lines) is divided into cosine and sine waveforms of one period each to create the local analysis waveforms S102 (a solid-line waveform is one local analysis waveform). Specifically, as shown in the drawings, the local analysis waveform for the fundamental frequency f1 is the analysis waveform itself; the analysis waveform for f2, which is twice the fundamental frequency f1, yields two local analysis waveforms, each consisting of a cosine waveform and a sine waveform of one period at frequency f2; and the analysis waveform for f3, which is three times the fundamental frequency f1, yields three local analysis waveforms, each consisting of a cosine waveform and a sine waveform of one period at frequency f3. For each frequency to be analyzed, these are the same as the local analysis waveforms shown in Fig. 5 (c).
- the time resolution at this time (the length of the time interval that is averaged when obtaining the cross-correlation between the waveform to be analyzed and the analysis waveform) is the time width of one cycle of the analysis waveform of the frequency to be analyzed. This shows that the time resolution can be set independently of the frequency resolution.
- the plurality of local analysis waveforms are waveforms extracted from the same analysis waveform. In this example the analysis waveform S101 is divided without temporal overlap, but local analysis waveforms may also be created as shown in Fig. 6, Fig. 7 and Fig. 8.
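The division step can be illustrated with a short sketch. It assumes the non-overlapping, one-period-per-piece division used in this example; the function name `split_into_local` and the 48-sample window are hypothetical choices for illustration.

```python
import math

def split_into_local(waveform, k):
    """Split an analysis waveform holding k periods into k one-period local waveforms."""
    period = len(waveform) // k  # assumes len(waveform) is divisible by k
    return [waveform[i * period:(i + 1) * period] for i in range(k)]

# Assumed: the common analysis time width is 48 samples (one period of f1).
N = 48
for k in (1, 2, 3):
    analysis = [math.cos(2 * math.pi * k * n / N) for n in range(N)]
    local = split_into_local(analysis, k)
    # k local analysis waveforms, each one period long (time resolution N / k samples)
    print(k, len(local), len(local[0]))
```

The frequency resolution is fixed by the full 48-sample width, while the time resolution is set by the length of each piece, which is the sense in which the two are chosen independently.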
- the frequency analysis apparatus 102 further includes a spatiotemporal resolution input receiving unit, so that the time resolution can be determined based on the property of the waveform S100 to be analyzed and the specification of the application. Such time resolution may be input from the outside. For example, sudden sound needs to be analyzed with fine temporal resolution. When analyzing a mixed sound in which sudden sounds, voices, musical sounds, etc. appear alternately, it is possible to analyze with high accuracy by controlling the time resolution based on the input waveform to be analyzed.
- the memory capacity for storing frequency information can also be reduced (the number of local frequency information to be stored can be reduced by coarsening the time resolution when fine time resolution is not required).
- based on the cross-correlation (convolution) between the mixed sound S100 and the local analysis waveforms S102, the local frequency information creation unit 105 obtains local frequency information S103 with the above predetermined time resolution (the length of the time interval that is averaged when obtaining the cross-correlation between the waveform to be analyzed and the analysis waveform) (step 203 in FIG. 11).
- the local frequency information is obtained by replacing the analysis waveform with the local analysis waveform in the analysis method used in the Fourier transform (see Equation 11, Equation 12, Equation 13, and Equation 14). As shown in the example in the drawings, one piece of local frequency information is obtained when the frequency to be analyzed is the fundamental frequency, two pieces when the frequency to be analyzed is f2, which is twice the fundamental frequency, and three pieces when the frequency to be analyzed is f3, which is three times the fundamental frequency, in each of the analyses using the cosine waveform and the sine waveform (see also Fig. 5).
- the amplitude spectrum and phase spectrum can be obtained. That is, in this example, the local frequency information is frequency information including both an amplitude spectrum and a phase spectrum.
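How an amplitude and a phase follow from the cosine and sine analyses can be sketched as below. Equations 11 to 14 are not reproduced in this text, so the normalization (2 / period) and the phase convention used here are assumptions, and `local_frequency_info` is a hypothetical name.

```python
import math

def local_frequency_info(x, start, period):
    """Correlate one period of x, beginning at `start`, with a cosine and a sine.

    Returns (amplitude, phase) under the assumed convention
    x[n] ~ amplitude * cos(2*pi*n/period - phase).
    """
    c = s = 0.0
    for n in range(period):
        angle = 2 * math.pi * n / period
        c += x[start + n] * math.cos(angle)
        s += x[start + n] * math.sin(angle)
    c, s = 2 * c / period, 2 * s / period
    return math.hypot(c, s), math.atan2(s, c)

# Two periods of a tone with amplitude 0.5 and phase 0.3 rad, 16 samples per period.
x = [0.5 * math.cos(2 * math.pi * n / 16 - 0.3) for n in range(32)]
print(local_frequency_info(x, 0, 16))   # first local analysis waveform
print(local_frequency_info(x, 16, 16))  # second local analysis waveform
```

Each call covers only one period of the analyzed frequency, so the result is one piece of local frequency information rather than a coefficient for the whole analysis window.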
- Fig. 14 shows results for a mixed sound sampled at 16 kHz. As shown in Fig. 14 (a), a cosine waveform of one period, the same as in the example of Fig. 5, is used as the local analysis waveform, and the local frequency information is obtained at every sampling point while shifting the time by one sampling point at a time.
- Figure 14 (b) is a graph in which the local frequency information for all sampling points is arranged in time series when the frequency to be analyzed is ⁇ , with the horizontal axis representing time and the vertical axis representing power.
- Figure 14 (b) shows three graphs for spoken Japanese: from the top, the local frequency information of a female speaker's utterance of "e", the local frequency information of a male speaker's utterance of "N", and the local frequency information of the mixed sound.
- Fig. 14 (c) is a graph in which the local frequency information at all sampling points is arranged in time series when the frequency to be analyzed is 2 kHz; it differs from the graph shown in Fig. 14 (b) only in the frequency to be analyzed.
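The per-sample procedure behind graphs of this kind can be sketched as follows, using the 16 kHz sampling rate and the 2 kHz analysis frequency mentioned above. The function name `sliding_local_cosine`, the normalization, and the synthetic test signal are assumptions for illustration; the experimental data of Fig. 14 is not reproduced.

```python
import math

def sliding_local_cosine(x, period):
    """Correlate a one-period cosine with x at every sample offset."""
    kernel = [math.cos(2 * math.pi * n / period) for n in range(period)]
    out = []
    for t in range(len(x) - period + 1):
        acc = sum(x[t + n] * kernel[n] for n in range(period))
        out.append(2 * acc / period)  # assumed normalization
    return out

fs_hz, f_hz = 16000, 2000
period = fs_hz // f_hz  # 8 samples per period at 2 kHz

# Synthetic input: 16 samples of silence followed by 16 samples of a 2 kHz cosine.
x = [0.0] * 16 + [math.cos(2 * math.pi * n / period) for n in range(16)]
series = sliding_local_cosine(x, period)
print(len(series), series[0], round(series[16], 6))
```

Because each value averages over only one period, the series follows the onset of the tone within a few samples, which a single long analysis window would smear.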
- the analyzed waveform frequency feature quantity extraction unit 106 treats the plurality of pieces of local frequency information S103 as a set of data, thereby extracting, at the above frequency resolution, the local frequency information of the extracted sound included in the mixed sound S100. Using that local frequency information, it creates the Fourier coefficient S104 of the extracted sound, which is one of the frequency feature quantities included in the mixed sound S100 (step 204 in FIG. 11).
- Fig. 15 shows an example of a method for extracting the local frequency information of the extracted sound included in the mixed sound S100.
- FIG. 15 (a) shows an example of the local analysis waveform S102.
- FIG. 15 (b) is a diagram showing local frequency information for each of the fundamental frequency f1, the double frequency f2 of the fundamental frequency f1, and the triple frequency f3 of the fundamental frequency f1.
- Fig. 15 (c) is a diagram showing a pattern of local frequency information for a group of sounds to be extracted. Here, two patterns of local frequency information for female speech are shown.
- as in Fig. 15 (c), sets of local frequency information (collections of the local frequency information within the time window of the Fourier transform) of the sound to be extracted are collected in advance.
- the local frequency information of the extracted sound included in the mixed sound S100 is extracted.
- the female voice pattern is stored as described above.
- the set of local frequency information S103 of the mixed sound S100 is compared with the stored sets of local frequency information (female voice patterns), and the stored voice pattern that minimizes the error distance (the reciprocal of similarity) is selected. If the error distance is equal to or less than a predetermined threshold value, the local frequency information of the mixed sound S100 is extracted. If the error distance is larger than the threshold value, the values of the local frequency information of the female voice to be extracted (for example, indicated by Z in FIG. 18 described later) may be created using the stored voice pattern with the smallest error distance. Specifically, the error distance is calculated using Equation 22.
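The selection logic can be sketched as follows. Equation 22 is not reproduced in this text, so a sum of squared differences stands in for the error distance; that choice, the function names, and the example values are assumptions and not the patent's formula.

```python
def error_distance(a, b):
    """Stand-in for Equation 22 (assumed): sum of squared differences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def match_pattern(observed, patterns, threshold):
    """Compare a set of local frequency information against stored patterns.

    Returns (values, distance). If the closest pattern is within the threshold,
    the observed values are kept; otherwise the closest stored pattern is used.
    """
    best = min(patterns, key=lambda p: error_distance(observed, p))
    d = error_distance(observed, best)
    values = list(observed) if d <= threshold else list(best)
    return values, d

# Hypothetical stored patterns, each a set of three pieces of local frequency information.
patterns = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
print(match_pattern([0.9, 0.1, 0.0], patterns, threshold=0.05))
print(match_pattern([0.5, 0.4, 0.3], patterns, threshold=0.05))
```

The distance is computed over the whole set at once, which mirrors the point that the set of local frequency information is handled as one pattern rather than piece by piece.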
- the configuration of the conventional method and the method of the present invention will be compared using FIG.
- the conventional method calculates the error distance for each piece of local frequency information and selects the pattern with the minimum distance.
- in the method of the present invention, as shown in Fig. 16 (b), the error distance is calculated using a set of local frequency information as one pattern, and the pattern with the smallest distance is selected. For this reason, the error distance of each piece of local frequency information is reduced, and so is the error of the frequency information at the desired frequency resolution obtained when the multiple pieces of local frequency information are grouped together.
- FIG. 17 is a diagram showing an image of the space of the local frequency information.
- Equations 27 and 28, which are frequency information at the desired frequency resolution, indicate the values of the intercepts of a plane on each axis, and Equations 29 and 30, which are sets of local frequency information, indicate points on the plane represented by Equation 27 and on the plane represented by Equation 28, respectively.
- in the method of the present invention, the frequency feature amount is analyzed in consideration of both the distance between the planes having the desired frequency resolution (the intercepts in FIG. 17) and the distance between the points on those planes (the points shown by Equation 29 and Equation 30), which express the change in frequency over minute time intervals at the desired frequency resolution.
- the conventional method measures only the distance between points on the plane.
- in the above, a pattern was created by collecting the sets of local frequency information of all the frequencies to be analyzed, but a female voice pattern may instead be stored for each frequency to be analyzed, and the error distance may be calculated using the set of local frequency information for each frequency to be analyzed.
- the frequency information at the desired frequency resolution obtained when a plurality of pieces of local frequency information are grouped may be calculated separately, and the error distance may be calculated by explicitly using this frequency information together with the set of local frequency information.
- the degree of similarity may be calculated using the ratio of each value of a group of local frequency information instead of Equation 22 as an evaluation formula for calculating the error distance.
- the Fourier coefficient S104 of the extracted sound is obtained using the local frequency information of the extracted sound.
- FIG. 18 (a) shows an example of the local frequency information of the extracted sound included in the mixed sound S100.
- the Fourier coefficient (Y in Fig. 18) as shown in Fig. 18 (b) is obtained by calculating the sum of the local frequency information (Z in Fig. 18) within the time window in the Fourier transform.
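The statement that the Fourier coefficient is the sum of the local frequency information within the time window can be checked numerically for the cosine part. This sketch assumes unnormalized sums and non-overlapping one-period local analysis waveforms; the function names and the test signal are hypothetical.

```python
import math

def local_cosine_sums(x, k):
    """Unnormalized one-period cosine correlations for the k periods inside the window."""
    period = len(x) // k  # assumes len(x) is divisible by k
    return [
        sum(x[i * period + n] * math.cos(2 * math.pi * n / period) for n in range(period))
        for i in range(k)
    ]

def fourier_cosine_coefficient(x, k):
    """Unnormalized cosine (real) part of the k-th DFT coefficient over the whole window."""
    N = len(x)
    return sum(x[n] * math.cos(2 * math.pi * k * n / N) for n in range(N))

# Arbitrary 48-sample analysis window; k = 3 corresponds to f3 = 3 * f1.
x = [((7 * n) % 11) - 5 for n in range(48)]
z = local_cosine_sums(x, 3)             # three pieces of local frequency information (Z)
y = fourier_cosine_coefficient(x, 3)    # Fourier coefficient over the window (Y)
print(abs(sum(z) - y) < 1e-9)
```

The equality holds because the k-period cosine over the full window is the one-period cosine repeated k times, so splitting the sum by period changes nothing.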
- the sound conversion unit 107 creates an extracted sound (extracted sound waveform) S105 using the Fourier coefficient S104 of the extracted sound (step 205 in FIG. 11).
- the extracted sound S105 is created by inverse Fourier transform.
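As a reminder of how Fourier coefficients map back to a waveform, the following is a minimal sketch of a discrete Fourier transform pair written directly from its definition. It illustrates the general inverse transform only; the scaling convention (1/N on the inverse) and the function names are assumptions, and a practical system would use an FFT library instead.

```python
import cmath

def dft(x):
    """Discrete Fourier transform by direct summation."""
    N = len(x)
    return [
        sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
        for k in range(N)
    ]

def inverse_dft(X):
    """Inverse DFT; returns the real part, assuming a real-valued waveform."""
    N = len(X)
    return [
        sum(X[k] * cmath.exp(2j * cmath.pi * k * n / N) for k in range(N)).real / N
        for n in range(N)
    ]

waveform = [0.0, 1.0, 0.5, -0.25, -1.0, 0.75, 0.0, 0.25]
restored = inverse_dft(dft(waveform))
print(all(abs(a - b) < 1e-9 for a, b in zip(waveform, restored)))
```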
- the speaker 108 outputs the extracted sound S105 to the user (step 206 in FIG. 11).
- time resolution and frequency resolution can be set independently, and by treating as a set the multiple pieces of local frequency information obtained by frequency analysis with fine time resolution, it is possible to obtain a result as if the frequency analysis had been performed with fine time resolution and fine frequency resolution at the same time. Therefore, it is possible to extract the sound to be extracted with high accuracy from the mixed sound.
- in this embodiment the frequency analysis device is incorporated in a mixed sound separation system, but it may also be incorporated into a speech recognition system, a sound recognition system, a character recognition system, a face recognition system, or an iris authentication system.
- the time waveform is the analyzed waveform.
- when a spatial waveform is the analyzed waveform, "time resolution" corresponds to "spatial resolution".
- “temporal resolution” and “spatial resolution” are collectively referred to as “spatio-temporal resolution”. “Spatial resolution” refers to the size of the spatial region that is averaged when obtaining the cross-correlation (convolution) between the waveform to be analyzed and the analysis waveform.
- Frequency analysis apparatus 102 can also be configured as follows.
- the frequency analyzer 102A can be configured from two devices: a device 1000 that creates the local frequency information and compiles it into a database, and a frequency feature quantity analysis device 1001 that analyzes the frequency feature quantity S104 using the local frequency information DB S1000 created by the device 1000.
- the analysis waveform time width determination unit 103A creates the analysis waveform S101 by determining the time width of the analysis waveform corresponding to the frequency to be analyzed, based on the finest frequency resolution that the frequency feature amount analysis device 1001 will use when analyzing the frequency feature amount S104. That is, the upper limit of the frequency resolution at which the frequency feature quantity analyzer 1001 can analyze the frequency feature quantity S104 is determined by the time width of the analysis waveform determined by the analysis waveform time width determination unit 103A.
- based on the cross-correlation (convolution) between the mixed sound S100 captured from the microphone 101 and the local analysis waveforms S102, the local frequency information creation unit 105A obtains, with a predetermined time resolution, a plurality of pieces of local frequency information S103 corresponding to the local analysis waveforms S102, each including at least one of the amplitude spectrum and the phase spectrum.
- it then creates a local frequency information DB consisting of at least (1) the analyzed frequency, (2) information on the shape of the local analysis waveform, and (3) the local frequency information S103 and the time of the analyzed waveform at which the corresponding local frequency information was obtained.
- FIG. 20 (a) shows an example of the local frequency information DBS 1000.
- the local frequency information DB S1000 holds (1) the analyzed frequency, 1 kHz; (2) as information on the local analysis waveform, that the local analysis waveforms do not overlap, that the analysis waveform consists of five cycles of a cosine waveform, and that the time resolution is 1 ms (the length of one cycle of the analyzed frequency, that is, the length of one cycle of the analysis waveform); and (3) five pieces of local frequency information (values equivalent to the discrete cosine transform coefficients for the five local analysis waveforms) and the time of the waveform to be analyzed at which the corresponding local frequency information was obtained.
- FIG. 20 (b) and FIG. 20 (c) also show an image diagram for explanation.
- the image shown in Fig. 20 (b) shows that there is no overlap between the local analysis waveforms.
- FIG. 20 (c) shows that a group of five local frequency information pieces is obtained while shifting the waveform to be analyzed over time. This time shift interval (0.3 ms) can be set independently of the time interval (1 ms) of the five local analysis waveforms used to obtain the five pieces of local frequency information.
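A database with the three kinds of content listed above might be laid out as in the sketch below. The dictionary keys, the function name `build_local_frequency_db`, the unnormalized sums, and the 10 kHz sampling rate (chosen so that 0.3 ms is a whole number of samples) are all assumptions made for illustration.

```python
import math

def build_local_frequency_db(x, fs_hz, freq_hz, n_local, shift_samples):
    """Build a toy local frequency information DB from waveform x.

    For each shift position, n_local non-overlapping one-period cosine
    correlations are stored together with the time at which they were taken.
    """
    period = round(fs_hz / freq_hz)
    kernel = [math.cos(2 * math.pi * n / period) for n in range(period)]
    entries = []
    t = 0
    while t + n_local * period <= len(x):
        values = [
            sum(x[t + i * period + n] * kernel[n] for n in range(period))
            for i in range(n_local)
        ]
        entries.append({"time_ms": 1000.0 * t / fs_hz, "local": values})
        t += shift_samples  # shift interval is independent of the local waveform length
    return {
        "frequency_hz": freq_hz,                               # (1) analyzed frequency
        "waveform": {"shape": "cosine", "periods": n_local,    # (2) local waveform info
                     "overlap": False,
                     "time_resolution_ms": 1000.0 * period / fs_hz},
        "entries": entries,                                    # (3) values and times
    }

fs = 10000  # assumed sampling rate: 1 ms = 10 samples, 0.3 ms = 3 samples
x = [math.cos(2 * math.pi * 1000 * n / fs) for n in range(100)]
db = build_local_frequency_db(x, fs, 1000, n_local=5, shift_samples=3)
print(db["waveform"]["time_resolution_ms"], len(db["entries"]), db["entries"][1]["time_ms"])
```

Changing `shift_samples` alters how densely groups are stored without touching the 1 ms length of each local analysis waveform.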
- the frequency resolution when five pieces of local frequency information are collected is the finest frequency resolution that can be analyzed by the frequency feature quantity analyzer 1001.
- FIG. 21 (a) shows another example of the local frequency information DBS 1000.
- This example shows an example of the local frequency information DB obtained from a local analysis waveform with multiple time resolutions.
- (1) the analyzed frequency is 2 kHz;
- (2) the information about the local analysis waveforms states that the analysis waveform consists of four cycles of a cosine waveform, and that the time resolution is 0.5 ms for the local analysis waveform corresponding to the first cycle of the analysis waveform, 0.5 ms for the local analysis waveform corresponding to the second cycle, and 1.0 ms for the local analysis waveform corresponding to the third and fourth cycles; and
- (3) three pieces of local frequency information (values equivalent to the discrete cosine transform coefficients for the three local analysis waveforms) are held together with the times of the waveform to be analyzed at which the corresponding local frequency information was obtained.
- FIG. 21 (b) and FIG. 21 (c) also show image diagrams for explanation.
- the image shown in Fig. 21 (b) shows that there is no overlap between the local analysis waveforms.
- Fig. 21 (c) shows that a group of three pieces of local frequency information is obtained while shifting the waveform to be analyzed over time. This time shift interval (0.3 ms) can be set independently of the time intervals (0.5 ms, 0.5 ms, 1.0 ms) of the three local analysis waveforms used to obtain the three pieces of local frequency information.
- the frequency resolution when the three pieces of local frequency information are collected is the finest frequency resolution that can be analyzed by the frequency feature analyzer 1001.
- FIG. 22 shows another example of local frequency information DBS 1000.
- the above-mentioned frequency information (refer to Equation 11, Equation 12, Equation 13, Equation 14, and Equation 15), which is the sum of the values of the multiple pieces of local frequency information that are grouped together, is also added to the database.
- the local frequency information DBS 1000 is created and stored.
- the analyzed waveform frequency feature quantity extraction unit 106A includes a frequency resolution determination unit 1002.
- the analyzed waveform frequency feature quantity extraction unit 106A receives the local frequency information DB S1000 and, based on the frequency resolution determined by the frequency resolution determination unit 1002, determines the number of pieces of local frequency information to be handled as a set of data from among (3) the plurality of pieces of local frequency information, and the times of the analyzed waveform at which they were obtained, held in the local frequency information DB S1000.
- the local frequency information DB S1000 may be received over a communication channel, or may be acquired from a recording medium such as a memory card.
- the frequency resolution determination unit 1002 may be omitted.
- FIG. 23 shows an example of a frequency feature amount analysis method using the local frequency information DBS1000.
- frequency feature quantities are analyzed using all (5) local frequency information enclosed in a circle in the figure as a set of data.
- a specific analysis method of the frequency feature quantity using a piece of local frequency information is performed in the same manner as the analyzed waveform frequency feature quantity extraction unit 106 in FIG. In this example, the frequency resolution determining unit 1002 is not necessary.
- FIG. 24 shows another example of the frequency feature amount analysis method using the local frequency information DBS1000.
- the frequency resolution determination unit 1002 calculates the relationship between the number of pieces of local frequency information to be grouped and the frequency resolution from the frequency to be analyzed and the time resolution (1 ms) held in the local frequency information DB S1000. Based on the determined frequency resolution, frequency feature quantities are analyzed using the three pieces of local frequency information enclosed in a circle in the figure as a set of data. The specific analysis method of the frequency feature quantity using a set of local frequency information is the same as that of the analyzed waveform frequency feature quantity extraction unit 106 in FIG. As shown in the example of FIG. 24, by using a part of the local frequency information held in the local frequency information DB, it is possible to analyze the frequency feature quantity with a desired frequency resolution.
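The relationship between the number of grouped pieces and the frequency resolution is not spelled out in this text. The sketch below assumes the usual window-length relation, resolution of roughly 1 / (n × time resolution), which is an assumption and not taken from the patent; the function names are hypothetical.

```python
import math

def frequency_resolution_hz(n_grouped, time_resolution_ms):
    """Assumed relation: resolution = 1 / (total grouped duration in seconds)."""
    return 1000.0 / (n_grouped * time_resolution_ms)

def pieces_needed(target_resolution_hz, time_resolution_ms, n_available):
    """Smallest number of grouped pieces meeting the target, capped at what the DB holds."""
    n = math.ceil(1000.0 / (target_resolution_hz * time_resolution_ms))
    return max(1, min(n, n_available))

# With 1 ms local analysis waveforms and up to five pieces stored per group:
print(frequency_resolution_hz(5, 1.0))   # all five pieces: finest resolution in the DB
print(frequency_resolution_hz(3, 1.0))   # three pieces: coarser resolution
print(pieces_needed(350.0, 1.0, 5))      # pieces to group for a 350 Hz target

group = [0.8, 0.1, -0.2, 0.4, 0.3]       # hypothetical stored group of five values
subset = group[:pieces_needed(350.0, 1.0, 5)]
print(subset)
```

Under this assumption, grouping fewer pieces trades frequency resolution for less computation, while the stored time resolution of each piece is unchanged.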
- the frequency feature value may be analyzed using a piece of local frequency information at a time of 1.2 ms. In this case, the frequency feature amount is analyzed using a part of the local frequency information DBS1000.
- instead of the error function of Equation 22 used in the operation of the analyzed waveform frequency feature quantity extraction unit 106 in FIG., the error distance may be calculated using the following Equation 31, which uses the "frequency information" held in the local frequency information DB S1000 of Fig. 22, that is, the frequency information at the desired frequency resolution obtained when a plurality of pieces of local frequency information are collected as a group.
- W is a weighting factor
- Alternatively, the “frequency information” may be obtained as the sum of the values of the local frequency information, and the error distance may then be calculated using the error function of Equation 31.
- Another example of the local frequency information creation unit 105A, the local frequency information DBS1000, and the analyzed waveform frequency feature quantity extraction unit 106A is shown.
- The local frequency information creation unit 105A obtains the local frequency information S103 at a predetermined temporal resolution (the temporal resolution of the correlation between the waveform to be analyzed and the analysis waveform), based on the correlation (convolution) between the mixed sound S100 and the local analysis waveform S102.
- FIG. 25 (a) shows an example of the local frequency information DBS 1000.
- In this example, the representation of (3) the local frequency information S103 and the time of the analyzed waveform at which the corresponding local frequency information was obtained is arranged in parallel along the time direction. That is, the three pieces of local frequency information at time 1.0 ms are the local frequency information at times 1.0 ms, 2.0 ms, and 3.0 ms, and the five pieces of local frequency information at time 2.0 ms are the local frequency information at times 2.0 ms, 3.0 ms, 4.0 ms, 5.0 ms, and 6.0 ms.
- This representation is possible because the time resolution of 1.0 ms, which is one period of the analyzed frequency of 1 kHz, is the same as the 1.0 ms interval by which a set of local frequency information spanning an integer number of periods is shifted in time relative to the waveform to be analyzed (see Fig. 25 (b) and Fig. 25 (c)).
- the local frequency information of the second and subsequent periods at the previous time can be expressed by the local frequency information of the first period shifted in time.
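- The reuse of time-shifted local frequency information described above can be checked numerically. The following Python sketch is a simplified illustration rather than the apparatus itself: it assumes a sampling rate of 16 kHz, uses a plain inner product at non-overlapping 1.0 ms positions in place of the correlation (convolution), and takes a one-period complex sinusoid at 1 kHz as the local analysis waveform. The test tone and all names are hypothetical.

```python
import cmath
import math

SAMPLE_RATE_HZ = 16000          # assumed sampling rate (not from the description)
ANALYZED_FREQ_HZ = 1000         # analyzed frequency of 1 kHz
PERIOD_SAMPLES = SAMPLE_RATE_HZ // ANALYZED_FREQ_HZ  # 16 samples = 1.0 ms


def analysis_waveform(num_samples):
    """Complex sinusoid at the analyzed frequency, num_samples long."""
    step = 2.0 * math.pi * ANALYZED_FREQ_HZ / SAMPLE_RATE_HZ
    return [cmath.exp(-1j * step * n) for n in range(num_samples)]


def correlate(segment, waveform):
    """Inner product of a signal segment with an analysis waveform."""
    return sum(x * w for x, w in zip(segment, waveform))


def local_frequency_information(signal):
    """One correlation value per 1.0 ms (one period) of the signal."""
    one_period = analysis_waveform(PERIOD_SAMPLES)
    return [
        correlate(signal[start:start + PERIOD_SAMPLES], one_period)
        for start in range(0, len(signal) - PERIOD_SAMPLES + 1, PERIOD_SAMPLES)
    ]


# A 5 ms test tone at 1 kHz stands in for the waveform to be analyzed.
signal = [math.cos(2.0 * math.pi * ANALYZED_FREQ_HZ * n / SAMPLE_RATE_HZ)
          for n in range(5 * PERIOD_SAMPLES)]
pieces = local_frequency_information(signal)

# Correlating 3 ms of signal with a three-period waveform directly ...
direct = correlate(signal[:3 * PERIOD_SAMPLES],
                   analysis_waveform(3 * PERIOD_SAMPLES))
# ... gives the same value as summing the one-period pieces at 0, 1 and 2 ms.
grouped = sum(pieces[0:3])
```

- Because the analysis waveform repeats every period, the second and third periods of the longer window contribute exactly the one-period values already computed at the later times, which is why only first-period values need to be stored in this sketch.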
- (1) The analyzed frequency and (2) the information on the shape of the local analysis waveform are the same as in the example of the local frequency information DB in FIG.
- FIG. 26 shows another example of local frequency information DB1000.
- Unlike the example of the local frequency information DB in Fig. 25, for each of a plurality of analyzed frequencies, (1) the analyzed frequency, (2) information on the shape of the local analysis waveform, and (3) the local frequency information S103 and the time of the waveform to be analyzed at which the corresponding local frequency information was obtained are stored in the database.
- In this way, the local frequency information is compiled into a database, and the local frequency information DBS1000 is created and stored.
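- One way to picture a database that holds items (1) to (3) for several analyzed frequencies is sketched below in Python. This is only an illustrative in-memory layout under stated assumptions: the class names, the string used to describe the waveform shape, and the placeholder values are hypothetical and are not taken from this description.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class LocalFrequencyRecord:
    analyzed_frequency_hz: float   # (1) analyzed frequency
    waveform_shape: str            # (2) shape of the local analysis waveform
    # (3) pairs of (time in ms, local frequency information value)
    entries: List[Tuple[float, complex]] = field(default_factory=list)


class LocalFrequencyDB:
    """Hypothetical in-memory store keyed by analyzed frequency."""

    def __init__(self) -> None:
        self._records: Dict[float, LocalFrequencyRecord] = {}

    def add(self, frequency_hz: float, waveform_shape: str,
            time_ms: float, value: complex) -> None:
        """Append one piece of local frequency information."""
        record = self._records.setdefault(
            frequency_hz, LocalFrequencyRecord(frequency_hz, waveform_shape))
        record.entries.append((time_ms, value))

    def values_between(self, frequency_hz: float,
                       start_ms: float, end_ms: float) -> List[complex]:
        """Values for one analyzed frequency with start_ms <= time < end_ms."""
        record = self._records[frequency_hz]
        return [v for t, v in record.entries if start_ms <= t < end_ms]


db = LocalFrequencyDB()
for t in range(5):
    # Placeholder values; real entries would come from the correlation step.
    db.add(1000.0, "one-period sinusoid", float(t), complex(t, 0))
db.add(2000.0, "one-period sinusoid", 0.0, 1j)
```

- Reading out only the entries between two times for one analyzed frequency corresponds to using part of the stored local frequency information as a set of data.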
- The analyzed waveform frequency feature quantity extraction unit 106A includes a frequency resolution determination unit 1002.
- The analyzed waveform frequency feature quantity extraction unit 106A receives the local frequency information DBS1000 and, based on the frequency resolution determined by the frequency resolution determination unit 1002, determines the number of pieces of local frequency information to be handled as one set of data from (3) the plurality of pieces of local frequency information held in the local frequency information DBS1000 and the times of the analyzed waveform at which the corresponding local frequency information was obtained.
- Fig. 27 shows an example of a frequency feature amount analysis method using the local frequency information DBS1000.
- The frequency resolution determination unit 1002 calculates the relationship between the frequency resolution and the number of pieces of local frequency information to be grouped from the frequency to be analyzed and the time resolution of 1 ms stored in the local frequency information DB. Based on the determined frequency resolution, frequency feature values are analyzed using three pieces of local frequency information as one set of data.
- In this example, the three pieces of local frequency information at time 0.0 ms are the local frequency information at time 0.0 ms, the local frequency information at time 1.0 ms, and the local frequency information at time 2.0 ms.
- the local frequency information at time 2.0 ms, the local frequency information at time 3.0 ms, and the local frequency information at time 4.0 ms are enclosed in a broken-line circle in the figure.
- In this example, a set of local frequency information is obtained at time shift intervals of 1.0 ms.
- The specific method of analyzing the frequency feature amount from a set of local frequency information is the same as that performed by the analyzed waveform frequency feature amount extraction unit 106 in FIG.
- FIG. 28 shows an example of another analysis method of the frequency feature amount using the local frequency information DBS1000.
- a group of local frequency information is obtained at time shift intervals of 3.0 ms (solid circles and dashed circles in the figure). This time shift interval may be 5.0 ms or 8.0 ms. In this way, the time shift interval can be set freely.
- The specific method of analyzing the frequency feature amount from a group of local frequency information is the same as that performed by the analyzed waveform frequency feature amount extraction unit 106 in FIG.
- the frequency feature amount S104 is extracted.
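- The grouping with a freely chosen time shift interval can be sketched as follows in Python. The sketch assumes that the stored pieces are spaced at the time resolution of 1.0 ms and that a group is combined by summing its values; summation is one simple choice used here for illustration and is not claimed to be the analysis method of the extraction unit. The function name and the placeholder values are hypothetical.

```python
from typing import List, Tuple


def group_local_information(
    pieces: List[complex],
    pieces_per_group: int,
    shift_pieces: int,
    time_resolution_ms: float = 1.0,
) -> List[Tuple[float, complex]]:
    """Return (start time in ms, summed value) for each group of pieces."""
    if pieces_per_group < 1 or shift_pieces < 1:
        raise ValueError("pieces_per_group and shift_pieces must be >= 1")
    groups = []
    last_start = len(pieces) - pieces_per_group
    for start in range(0, last_start + 1, shift_pieces):
        total = sum(pieces[start:start + pieces_per_group])
        groups.append((start * time_resolution_ms, total))
    return groups


# Placeholder values for the pieces at 0.0 ms, 1.0 ms, ..., 6.0 ms.
pieces = [1, 2, 3, 4, 5, 6, 7]

# Groups of three pieces, shifted by 1.0 ms.
every_1ms = group_local_information(pieces, 3, 1)
# Groups of three pieces, shifted by 3.0 ms.
every_3ms = group_local_information(pieces, 3, 3)
```

- With a shift of one piece the groups overlap and start at every 1.0 ms; with a shift of three pieces they start at 0.0 ms and 3.0 ms and do not overlap. Any other shift, such as five or eight pieces, only changes the third argument.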
- Frequency feature analysis apparatus 1001 further includes a frequency resolution input receiving unit, so that the frequency resolution can be determined based on application specifications and the like. Such frequency resolution may be input from the outside.
- the present invention can be used in systems such as a mixed sound separation system, a speech recognition system, a sound recognition system, a character recognition system, a face recognition system, and an iris authentication system.
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Quality & Reliability (AREA)
- Signal Processing (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Measurement Of Mechanical Vibrations Or Ultrasonic Waves (AREA)
Priority Applications (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
DE602006018282T DE602006018282D1 (de) | 2005-05-13 | 2006-04-11 | Vorrichtung zur trennung gemischter audiosignale |
EP06731620A EP1881489B1 (en) | 2005-05-13 | 2006-04-11 | Mixed audio separation apparatus |
JP2006522162A JP4041154B2 (ja) | 2005-05-13 | 2006-04-11 | 混合音分離装置 |
US11/665,265 US7974420B2 (en) | 2005-05-13 | 2006-04-11 | Mixed audio separation apparatus |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2005-141939 | 2005-05-13 | ||
JP2005141939 | 2005-05-13 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2006120829A1 true WO2006120829A1 (ja) | 2006-11-16 |
Family
ID=37396345
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2006/307673 WO2006120829A1 (ja) | 2005-05-13 | 2006-04-11 | 混合音分離装置 |
Country Status (6)
Country | Link |
---|---|
US (1) | US7974420B2 (ja) |
EP (1) | EP1881489B1 (ja) |
JP (1) | JP4041154B2 (ja) |
CN (1) | CN100585701C (ja) |
DE (1) | DE602006018282D1 (ja) |
WO (1) | WO2006120829A1 (ja) |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2009270896A (ja) * | 2008-05-02 | 2009-11-19 | Tektronix Japan Ltd | 信号分析装置及び周波数領域データ表示方法 |
WO2013005550A1 (ja) * | 2011-07-01 | 2013-01-10 | クラリオン株式会社 | 直接音抽出装置および残響音抽出装置 |
JP2016161573A (ja) * | 2015-02-27 | 2016-09-05 | キーサイト テクノロジーズ, インク. | 広帯域位相スペクトル測定における使用に適合した位相勾配基準 |
WO2018055673A1 (ja) * | 2016-09-20 | 2018-03-29 | 三菱電機株式会社 | 干渉識別装置および干渉識別方法 |
TWI740315B (zh) * | 2019-08-23 | 2021-09-21 | 大陸商北京市商湯科技開發有限公司 | 聲音分離方法、電子設備和電腦可讀儲存媒體 |
WO2022059869A1 (ko) * | 2020-09-15 | 2022-03-24 | 삼성전자 주식회사 | 영상의 음질을 향상시키는 디바이스 및 방법 |
JP2022521244A (ja) * | 2019-02-19 | 2022-04-06 | 株式会社ソニー・インタラクティブエンタテインメント | ハイブリッドスピーカ及びコンバータ |
Families Citing this family (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2007080764A1 (ja) * | 2006-01-12 | 2007-07-19 | Matsushita Electric Industrial Co., Ltd. | 対象音分析装置、対象音分析方法および対象音分析プログラム |
US20070299657A1 (en) * | 2006-06-21 | 2007-12-27 | Kang George S | Method and apparatus for monitoring multichannel voice transmissions |
US8219409B2 (en) * | 2008-03-31 | 2012-07-10 | Ecole Polytechnique Federale De Lausanne | Audio wave field encoding |
US8620646B2 (en) * | 2011-08-08 | 2013-12-31 | The Intellisis Corporation | System and method for tracking sound pitch across an audio signal using harmonic envelope |
US8925058B1 (en) * | 2012-03-29 | 2014-12-30 | Emc Corporation | Authentication involving authentication operations which cross reference authentication factors |
US9670492B2 (en) | 2013-08-28 | 2017-06-06 | Ionis Pharmaceuticals, Inc. | Modulation of prekallikrein (PKK) expression |
CN103871417A (zh) * | 2014-03-25 | 2014-06-18 | 北京工业大学 | 一种移动手机特定连续语音过滤方法及过滤装置 |
EP3137091B1 (en) | 2014-05-01 | 2020-12-02 | Ionis Pharmaceuticals, Inc. | Conjugates of modified antisense oligonucleotides and their use for modulating pkk expression |
JP6696221B2 (ja) * | 2016-02-26 | 2020-05-20 | セイコーエプソン株式会社 | 制御装置、受電装置、電子機器及び電力伝送システム |
CN106128472A (zh) * | 2016-07-12 | 2016-11-16 | 乐视控股(北京)有限公司 | 演唱者声音的处理方法及装置 |
JP6907859B2 (ja) * | 2017-09-25 | 2021-07-21 | 富士通株式会社 | 音声処理プログラム、音声処理方法および音声処理装置 |
CN109801644B (zh) | 2018-12-20 | 2021-03-09 | 北京达佳互联信息技术有限公司 | 混合声音信号的分离方法、装置、电子设备和可读介质 |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2004028640A (ja) * | 2002-06-21 | 2004-01-29 | Sony Corp | スペクトラムアナライザー装置、再生装置、スペクトラム解析方法、プログラム、記録媒体 |
Family Cites Families (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
DE4121356C2 (de) * | 1991-06-28 | 1995-01-19 | Siemens Ag | Verfahren und Einrichtung zur Separierung eines Signalgemisches |
US6317703B1 (en) * | 1996-11-12 | 2001-11-13 | International Business Machines Corporation | Separation of a mixture of acoustic sources into its components |
SE521024C2 (sv) * | 1999-03-08 | 2003-09-23 | Ericsson Telefon Ab L M | Metod och anordning för att separera en blandning av källsignaler |
EP1887561A3 (en) * | 1999-08-26 | 2008-07-02 | Sony Corporation | Information retrieving method, information retrieving device, information storing method and information storage device |
JP4491700B2 (ja) | 1999-08-26 | 2010-06-30 | ソニー株式会社 | 音響検索処理方法、音響情報検索装置、音響情報蓄積方法、音響情報蓄積装置および音響映像検索処理方法、音響映像情報検索装置、音響映像情報蓄積方法、音響映像情報蓄積装置 |
US6879952B2 (en) * | 2000-04-26 | 2005-04-12 | Microsoft Corporation | Sound source separation using convolutional mixing and a priori sound source knowledge |
JP2002236494A (ja) | 2001-02-09 | 2002-08-23 | Denso Corp | 音声区間判別装置、音声認識装置、プログラム及び記録媒体 |
JP2003061198A (ja) * | 2001-08-10 | 2003-02-28 | Pioneer Electronic Corp | オーディオ再生装置 |
JP3931237B2 (ja) * | 2003-09-08 | 2007-06-13 | 独立行政法人情報通信研究機構 | ブラインド信号分離システム、ブラインド信号分離方法、ブラインド信号分離プログラムおよびその記録媒体 |
US7454333B2 (en) * | 2004-09-13 | 2008-11-18 | Mitsubishi Electric Research Lab, Inc. | Separating multiple audio signals recorded as a single mixed signal |
JP2007034184A (ja) * | 2005-07-29 | 2007-02-08 | Kobe Steel Ltd | 音源分離装置,音源分離プログラム及び音源分離方法 |
US8014536B2 (en) * | 2005-12-02 | 2011-09-06 | Golden Metallic, Inc. | Audio source separation based on flexible pre-trained probabilistic source models |
WO2007080764A1 (ja) * | 2006-01-12 | 2007-07-19 | Matsushita Electric Industrial Co., Ltd. | 対象音分析装置、対象音分析方法および対象音分析プログラム |
JP4672611B2 (ja) * | 2006-07-28 | 2011-04-20 | 株式会社神戸製鋼所 | 音源分離装置、音源分離方法及び音源分離プログラム |
2006
- 2006-04-11 US US11/665,265 patent/US7974420B2/en active Active
- 2006-04-11 CN CN200680001027A patent/CN100585701C/zh active Active
- 2006-04-11 DE DE602006018282T patent/DE602006018282D1/de active Active
- 2006-04-11 EP EP06731620A patent/EP1881489B1/en active Active
- 2006-04-11 JP JP2006522162A patent/JP4041154B2/ja active Active
- 2006-04-11 WO PCT/JP2006/307673 patent/WO2006120829A1/ja active Application Filing
Patent Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2004028640A (ja) * | 2002-06-21 | 2004-01-29 | Sony Corp | スペクトラムアナライザー装置、再生装置、スペクトラム解析方法、プログラム、記録媒体 |
Non-Patent Citations (5)
Title |
---|
HIROKI NAKANO; OTHER TWO AUTHORS: "Ueiburetto ni yoru Shingo Shori to Gazo Shori (Signal Processing and Image Processing through Wavelet)", 15 August 1999, KYORITSU PRESS, pages: 35 - 39 |
KAMEOKA H. ET AL.: "Audio Stream Segregation Based on Time-Space Clustering Using Gaussian Kernel 2-Dimensional Model", ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2005. PROCEEDINGS. (ICASSP'05). 2005 IEEE INTERNATIONAL CONFERENCE, vol. 3, March 2005 (2005-03-01), pages 5 - 8, XP010792315 * |
SEIICHI NAKAGAWA: "Patan Joho Shori (Pattern Image Processing)", 30 March 1999, MARUZEN CO. LTD., pages: 14 - 19 |
SRINIVASAN S.H. AND KANKANHALLI M.: "Harmonicity and dynamics based audio separation", ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2003. PROCEEDINGS. (ICASSP'03). 2003 IEEE INTERNATIONAL CONFERENCE, vol. 5, 6 April 2003 (2003-04-06), pages 640 - 643, XP010639353 * |
THOMAS F. QUATIERI; RONALD G. DANISEWICZ: "An Approach to Co-Channel Talker Interference Suppression Using a Sinusoidal Model for Speech", IEEE TRANSACTIONS ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, vol. 38, no. 1, January 1990 (1990-01-01), pages 56 - 69 |
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2009270896A (ja) * | 2008-05-02 | 2009-11-19 | Tektronix Japan Ltd | 信号分析装置及び周波数領域データ表示方法 |
WO2013005550A1 (ja) * | 2011-07-01 | 2013-01-10 | クラリオン株式会社 | 直接音抽出装置および残響音抽出装置 |
JP2013015606A (ja) * | 2011-07-01 | 2013-01-24 | Clarion Co Ltd | 直接音抽出装置および残響音抽出装置 |
CN103503066A (zh) * | 2011-07-01 | 2014-01-08 | 歌乐株式会社 | 直达声提取装置和混响声提取装置 |
JP2016161573A (ja) * | 2015-02-27 | 2016-09-05 | キーサイト テクノロジーズ, インク. | 広帯域位相スペクトル測定における使用に適合した位相勾配基準 |
WO2018055673A1 (ja) * | 2016-09-20 | 2018-03-29 | 三菱電機株式会社 | 干渉識別装置および干渉識別方法 |
DE112016007146B4 (de) * | 2016-09-20 | 2019-12-24 | Mitsubishi Electric Corporation | Störungsidentifizierungsvorrichtung und Störungsidentifizierungsverfahren |
JP2022521244A (ja) * | 2019-02-19 | 2022-04-06 | 株式会社ソニー・インタラクティブエンタテインメント | ハイブリッドスピーカ及びコンバータ |
JP7271695B2 (ja) | 2019-02-19 | 2023-05-11 | 株式会社ソニー・インタラクティブエンタテインメント | ハイブリッドスピーカ及びコンバータ |
US11832071B2 (en) | 2019-02-19 | 2023-11-28 | Sony Interactive Entertainment Inc. | Hybrid speaker and converter |
TWI740315B (zh) * | 2019-08-23 | 2021-09-21 | 大陸商北京市商湯科技開發有限公司 | 聲音分離方法、電子設備和電腦可讀儲存媒體 |
WO2022059869A1 (ko) * | 2020-09-15 | 2022-03-24 | 삼성전자 주식회사 | 영상의 음질을 향상시키는 디바이스 및 방법 |
Also Published As
Publication number | Publication date |
---|---|
US20090067647A1 (en) | 2009-03-12 |
EP1881489B1 (en) | 2010-11-17 |
EP1881489A4 (en) | 2008-05-28 |
CN101040324A (zh) | 2007-09-19 |
EP1881489A1 (en) | 2008-01-23 |
CN100585701C (zh) | 2010-01-27 |
DE602006018282D1 (de) | 2010-12-30 |
JP4041154B2 (ja) | 2008-01-30 |
JPWO2006120829A1 (ja) | 2008-12-18 |
US7974420B2 (en) | 2011-07-05 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2006120829A1 (ja) | 混合音分離装置 | |
JP4065314B2 (ja) | 対象音分析装置、対象音分析方法および対象音分析プログラム | |
Wang et al. | Specaugment++: A hidden space data augmentation method for acoustic scene classification | |
US20060064299A1 (en) | Device and method for analyzing an information signal | |
US20050228518A1 (en) | Filter set for frequency analysis | |
JP2001184083A (ja) | 自動音声認識のための特徴量抽出方法 | |
JP2015138053A (ja) | 音響信号処理装置およびその方法 | |
Do et al. | Speech Separation in the Frequency Domain with Autoencoder. | |
Chu et al. | A noise-robust FFT-based auditory spectrum with application in audio classification | |
Dziubinski et al. | Estimation of musical sound separation algorithm effectiveness employing neural networks | |
JP4119112B2 (ja) | 混合音の分離装置 | |
Agcaer et al. | Optimization of amplitude modulation features for low-resource acoustic scene classification | |
JP3699912B2 (ja) | 音声特徴量抽出方法と装置及びプログラム | |
Muhsina et al. | Signal enhancement of source separation techniques | |
Olivero et al. | Sound morphing strategies based on alterations of time-frequency representations by Gabor multipliers | |
Dang et al. | THLNet: two-stage heterogeneous lightweight network for monaural speech enhancement | |
Zhang et al. | Improving Design of Input Condition Invariant Speech Enhancement | |
Jiang et al. | A Complex Neural Network Adaptive Beamforming for Multi-channel Speech Enhancement in Time Domain | |
Fitzgerald et al. | On inpainting the adress algorithm | |
Becker et al. | Adaptive weights for NMF with additional priors | |
Lee et al. | Adversarial audio synthesis using a harmonic-percussive discriminator | |
EP2840570A1 (en) | Enhanced estimation of at least one target signal | |
Ragano et al. | Exploring a Perceptually-Weighted DNN-based Fusion Model for Speech Separation. | |
JP3223564B2 (ja) | ピッチ抽出方法 | |
Sharma et al. | Time-varying sinusoidal demodulation for non-stationary modeling of speech |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| WWE | Wipo information: entry into national phase | Ref document number: 2006522162; Country of ref document: JP |
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | |
| WWE | Wipo information: entry into national phase | Ref document number: 11665265; Country of ref document: US. Ref document number: 2006731620; Country of ref document: EP. Ref document number: 200680001027.6; Country of ref document: CN |
| NENP | Non-entry into the national phase | Ref country code: DE |
| NENP | Non-entry into the national phase | Ref country code: RU |
| WWP | Wipo information: published in national office | Ref document number: 2006731620; Country of ref document: EP |