JP2006343544A - Voice recognition method

Voice recognition method

Info

Publication number: JP2006343544A
Application number: JP2005169217A
Authority: JP (Japan)
Prior art keywords: contribution, section, spectrum, frequency, speech
Legal status: Granted; Expired - Fee Related
Other languages: Japanese (ja)
Other versions: JP4890792B2 (en), JP2006343544A5 (en)
Inventor: Takashi Nakayama (隆 中山)
Current Assignee: Miyazaki Prefecture
Original Assignee: Miyazaki Prefecture
Application JP2005169217A filed by Miyazaki Prefecture; granted as JP4890792B2


Abstract

PROBLEM TO BE SOLVED: To discriminate speech against a fixed reference irrespective of the level of the audio signal.
SOLUTION: The ratio of the amplitude or power of the fundamental wave and of each harmonic component to the total amplitude or power of the fundamental wave and all harmonic components contained in the voice frequency range is obtained as a contribution ratio, and consonant and vowel phonemes are identified from the pattern in which these contribution ratios appear, a pattern that is unaffected by the level of the audio signal.
COPYRIGHT: (C)2007,JPO&INPIT

Description

The present invention relates to a speech recognition method capable of recognizing language from a speaker's voice using a simple processing device. More specifically, it relates to a speech recognition method that can analyze speech against the same reference regardless of the level of the speech signal.

Conventionally, a speech recognition method has been proposed in which a vowel region and a consonant region are separated from the speech waveform, and the vowel and the consonant are then identified from the waveforms of the separated vowel and consonant regions (see, for example, Patent Document 1).

As a method of identifying the vowel and the consonant, it has also been proposed that, for the separated vowel region, the vowel is identified by detecting the time from when the speech signal level crosses zero volts, passes through the positive voltage region, and crosses zero volts again, while for the separated consonant region, the consonant is identified by detecting the time from when the signal level crosses zero volts or rises from near zero volts, passes through the positive voltage region, and again crosses zero volts or reaches the vicinity of zero volts (see, for example, Patent Document 2).

Patent Document 1: Japanese Patent Laid-Open No. 9-101797
Patent Document 2: Japanese Patent Laid-Open No. 2001-265379

Both of the above conventional speech recognition methods attempt to identify vowels and consonants separately, but they do so against the raw speech waveform collected from a microphone or the like. They are therefore particularly susceptible to the loudness of the voice (the level of the speech signal), and accurate identification is difficult in environments where conditions vary, such as everyday conversation.

The present invention has been made in view of the problems of these conventional speech recognition methods, and its object is to make it possible to identify speech against a constant reference regardless of the level of the speech signal.

To this end, the present invention provides a speech recognition method characterized in that a group of speech data sampled from a speech signal and A/D-converted is frequency-analyzed; that, from the resulting amplitude spectrum or power spectrum, the ratio of the amplitude or power of the fundamental wave and of each harmonic component to the total amplitude or power of the fundamental wave and all harmonic components contained in the speech frequency range is obtained as a contribution ratio; and that consonant and vowel phonemes are identified from the pattern in which these contribution ratios appear.

Preferred aspects of the invention include:
- dividing the speech data group into a consonant region and a vowel region, frequency-analyzing the speech data of each region to obtain contribution ratios, and identifying the consonant and vowel phonemes from the pattern of the contribution ratios in each data group;
- using as the contribution ratio the ratio of the amplitude of the fundamental wave and of each harmonic component to the sum of the amplitudes of the fundamental wave and all harmonic components contained in the speech frequency range;
- frequency-analyzing the speech data group sequentially, one analysis section of N speech data at a time, and obtaining contribution ratios for each analysis section;
- applying window function processing to the speech data group before the frequency analysis; and
- using a Hamming window for the window function processing and a fast Fourier transform for the frequency analysis.

The speech recognition method of the present invention identifies consonants and vowels using contribution ratios.

The contribution ratio in the present invention is the ratio of the amplitude of the fundamental wave and of each harmonic component to the sum of the amplitudes of the fundamental wave and all harmonic components contained in the speech frequency range, or the ratio of the power of the fundamental wave and of each harmonic component to the sum of their powers. Being such a ratio, the contribution ratio is a value unaffected by the level of the speech signal, and since the present invention performs speech recognition on the basis of this contribution ratio, it can identify speech with high accuracy regardless of the signal level.

The basic procedure of the speech identification method according to the present invention will now be described. For convenience, the description takes as an example the case where a subject utters, and the system samples, a single sound of the Japanese syllabary.

First, an example of the speech recognition method according to the present invention will be described with reference to FIG. 1.

The speech signal is collected as an analog signal, for example with a microphone, amplified or filtered as necessary, then sampled, A/D-converted, and temporarily stored in memory as a group of speech data.

The frequency range that must be analyzed to recognize speech differs somewhat between languages; for Japanese, for example, analysis up to about 5 to 5.5 kHz is considered necessary. Furthermore, to capture the frequency components of a continuous signal correctly in the sampled data, the sampling frequency must be at least twice the upper limit of the frequencies contained in the signal, so a sampling frequency of 10 kHz or higher is preferable. The specific examples described later use 50 kHz, but in practice such a high frequency is unnecessary. It is also preferable to cut frequency components exceeding half the sampling frequency in advance with a low-pass filter.
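As a minimal sketch of this Nyquist bookkeeping (an illustration only, assuming NumPy/SciPy): the patent performs the low-pass filtering before A/D conversion, but for an already-captured 50 kHz recording the same constraint appears when reducing the rate, where `scipy.signal.decimate` applies its own anti-aliasing low-pass before keeping every q-th sample. The constants and function name are assumptions, not part of the patent.

```python
import numpy as np
from scipy.signal import decimate

FS = 50_000      # sampling rate used in the patent's examples (Hz)
F_MAX = 5_500    # upper analysis frequency assumed necessary for Japanese (Hz)

def reduce_rate(voice, fs=FS, target=10_000):
    """Illustrative rate reduction: decimate() low-pass filters the signal
    (anti-aliasing) before downsampling by the integer factor q."""
    return decimate(voice, q=fs // target)   # q = 5 for 50 kHz -> 10 kHz
```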

The speech data stored in memory are normally read out excluding the no-signal (silent) regions at the beginning and end and then frequency-analyzed; to keep the frequency analysis error as small as possible, it is preferable to apply window function processing as a pre-processing step.

Window functions usable for this processing include the Hanning, Hamming, Blackman, and rectangular windows. Any of them may be used, but since speech is a random waveform, the Hamming window, the one most commonly used in speech analysis, is preferred.

When the Hamming window is used, with d the original speech data value, n the data number, and N the number of data used for the frequency analysis, the converted data X are as follows.

X = d × [0.54 − 0.46 × cos{2 × π × n / (N − 1)}]
n = 0 to (N − 1)
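As a minimal sketch (assuming NumPy), the windowing step can be written directly from this formula; `np.hamming(N)` produces the same coefficients.

```python
import numpy as np

def hamming_window(d):
    """Apply X = d * (0.54 - 0.46*cos(2*pi*n/(N-1))) for n = 0..N-1."""
    N = len(d)
    n = np.arange(N)
    return d * (0.54 - 0.46 * np.cos(2 * np.pi * n / (N - 1)))
```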

If the number N of data used for the frequency analysis is small, the frequency resolution falls but the time resolution within the analysis section increases; conversely, if N is large, the time resolution within the section decreases but the frequency resolution improves. If the frequency resolution becomes too low, the contribution ratios described later no longer reflect the frequency components of the speech waveform correctly and the characteristic patterns of the contribution ratios become hard to grasp, so it is preferable to adjust the number of data according to the sampling frequency so that a resolution of 40 to 100 Hz is obtained. For fast Fourier analysis, the number of data must be an integer power of 2.

The speech data group is subjected to the above window function processing as necessary and then frequency-analyzed to obtain the amplitude spectrum and/or the power spectrum. Fourier analysis, in particular the fast Fourier transform with its short processing time, is preferred for this frequency analysis.

When Fourier analysis (fast Fourier analysis) is used, a single analysis of N speech data from the data group (N a power of two, for example 512 or 1024) yields, for the fundamental frequency at m = 1 (m being the order) and for each harmonic at an integer multiple (order multiple) of the fundamental frequency, the coefficient a_m of the corresponding sine component and the coefficient b_m of the corresponding cosine component. Using these coefficients, the amplitude spectrum X_m and the power spectrum X_m² can be obtained as follows. Note that m = 0 corresponds to the DC component.

X_m = √(a_m² + b_m²)
X_m² = a_m² + b_m²
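In a sketch built on `np.fft.rfft` (an assumption; the patent does not name an FFT routine), the real and imaginary parts of each transform line carry the cosine and sine coefficients up to sign and a constant scale, so the magnitudes give X_m directly. The scale factor is irrelevant here because the contribution ratio defined next is a ratio.

```python
def spectra(frame):
    """Amplitude spectrum X_m and power spectrum X_m^2 of one N-point frame."""
    F = np.fft.rfft(frame)   # line m holds b_m and a_m (up to sign/scale); m = 0 is DC
    X = np.abs(F)            # X_m = sqrt(a_m^2 + b_m^2), up to a constant scale
    return X, X ** 2
```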

The contribution ratio in the present invention can be obtained as the ratio C of each amplitude spectrum line X_m to the sum of the amplitude spectrum lines from the fundamental frequency (m = 1) through the harmonic components of m = 2 and above, or as the ratio C′ of each power spectrum line X_m² to the corresponding sum of the power spectrum lines. C or C′ may be expressed either as a ratio or as a percentage: as a ratio they are given by the formulas below, and as percentages each value is multiplied by 100.

C = (1/ΣX_m) × X_m
C′ = (1/ΣX_m²) × X_m²

The upper limit of m depends on the frequency resolution of the analysis, but it suffices to go up to the order that covers the frequencies needed for speech recognition. Concretely, with a sampling frequency of 50 kHz and N = 1024 data, the highest order obtained by the analysis is (1024/2) − 1 = 511; but since the frequency resolution is 50000 ÷ 1024 ≈ 48 Hz and, as noted above, Japanese speech recognition requires analysis only up to about 5.5 kHz, m = 5500 ÷ 48 ≈ 114 is sufficient.
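Combining the formula for C with this bound, a sketch of the percentage contributions of one analysis section might look as follows. The 1-indexed array layout (element m holds order m) is an assumption for readability; the exact cut-off (112 here versus the patent's 114) depends only on how the ≈48 Hz resolution is rounded.

```python
def contribution(X, fs=50_000, f_max=5_500):
    """Percentage contribution C_m of each spectral line up to f_max.
    X is an rfft amplitude spectrum; the DC line (m = 0) is excluded.
    Being a ratio, C is independent of the overall signal level."""
    N = 2 * (len(X) - 1)             # frame length behind the rfft output
    m_max = int(f_max / (fs / N))    # e.g. 112 for 50 kHz / 1024 points
    C = np.zeros(m_max + 1)
    C[1:] = 100.0 * X[1:m_max + 1] / X[1:m_max + 1].sum()
    return C                         # C[m] = contribution of order m (%)
```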

Either C or C′ may be used as the contribution ratio in the present invention, but since variations in C′ appear roughly as squares, the differences between frequency components are emphasized more strongly than in the amplitude spectrum; the use of C is therefore preferred.

The frequency analysis may, for example, be performed only once, on N speech data taken from a suitable region of the data group. Usually, however, the number of data N and the number of samples are chosen so that the data group contains more than N speech data; it is then preferable to treat N speech data as one analysis section (one frame) and to frequency-analyze the whole data group in several passes, shifting each analysis section by a predetermined number of speech data. Analyzing the entire data group in this way improves accuracy. In this case a contribution ratio is obtained for every analysis section. With j the analysis section number, the contribution ratios C and C′ can be expressed as follows.

C_j = (1/ΣX_jm) × X_jm
C_j′ = (1/ΣX_jm²) × X_jm²
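Putting the pieces above together, a framewise sketch (reusing the `hamming_window`, `spectra`, and `contribution` helpers from the earlier snippets) might slide the analysis section in steps of 400 samples, the shift used in the examples described later; both numbers are taken from those examples, not prescribed by the method.

```python
def framewise_contributions(voice, n_fft=1024, hop=400):
    """Contribution ratios C_jm for every analysis section j:
    n_fft-point frames shifted by `hop` samples across the data group."""
    rows = []
    for start in range(0, len(voice) - n_fft + 1, hop):
        frame = hamming_window(voice[start:start + n_fft])
        X, _ = spectra(frame)
        rows.append(contribution(X))
    return np.array(rows)            # shape: (sections, m_max + 1)
```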

Through the above frequency analysis, contribution ratios up to order m are normally obtained for every analysis section. The vowel and consonant phonemes identifiable from the speech data group are then determined by comparing the state of the obtained contribution ratios with predetermined judgment criteria. For example, if the consonant phoneme is identified as “k” and the vowel phoneme as “a”, the recognition result is “ka” (カ); if only “a” is identified from the data group, the result is “a” (ア).

Next, another example of the speech identification method according to the present invention will be described with reference to FIG. 2.

The sampling and A/D conversion of the speech signal are the same as in the example of FIG. 1.

In this example, the speech data group stored in memory is divided, prior to the window function processing, into a vowel-region data group and a consonant-region data group. This division into vowel and consonant regions can be performed, for example, as follows (a rough code sketch follows step 5).

1. A predetermined number of speech data are compared one after another from the head of the signal region of the data group to find the value and position of the largest peak in the data (the maximum peak P_max, which lies in the middle portion of the data group).

2. Sections of an appropriate number of speech data are set, and, moving from the position of the maximum peak P_max toward the head of the data group, the largest peak within each section (the section peak P_n) is found in turn.

3. Since there is no abrupt drop in peak level within a vowel region, the maximum peak P_max is compared with the section peak P_1 of the adjacent section, P_1 with the section peak P_2 of the next adjacent section, and so on. If, for example, the section peak P_1 is 60% or more of the maximum peak P_max, the section can be judged to belong to the vowel region; likewise, if a section peak P_n is 60% or more of the preceding section peak P_(n-1), it can be judged to be a continuation of the vowel region.

4. The above comparison is continued to find the position where the section peak P_n drops sharply relative to the preceding section peak P_(n-1). If this position is the head of the data, the whole group can be judged to be a vowel region; if it lies in the middle of the data group, it can be judged to be the boundary between the consonant region and the vowel region. Even when the position of the sharp drop is not the head, if the number of data from the head to that position is extremely small, the portion can be judged to be the rising part of a vowel.

5. Performing the same peak comparison from the position of the maximum peak P_max toward the tail of the data group detects the position of the end of the vowel region. When several sounds are uttered in succession, detecting this position makes it possible to detect the boundaries between the sounds.
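A compressed sketch of steps 1 to 4 follows. The section length and the 60% threshold are illustrative parameters (the patent leaves the section size "appropriate" and gives 60% only as an example), and the function name is an assumption.

```python
def consonant_vowel_boundary(voice, section=400, ratio=0.6):
    """Walk from the maximum peak toward the head of the data group and
    return the position where the section peaks collapse, i.e. the
    consonant/vowel boundary (0 if the whole group is a vowel region)."""
    pos = int(np.argmax(np.abs(voice)))
    prev = abs(voice[pos])                             # maximum peak P_max
    while pos - section >= 0:
        cur = np.abs(voice[pos - section:pos]).max()   # section peak P_n
        if cur < ratio * prev:                         # sharp drop vs P_(n-1)
            return pos - section                       # consonant | vowel
        prev, pos = cur, pos - section
    return 0
```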

The above method of dividing the vowel and consonant regions is only one example; any conventionally known method can be applied in the present invention, for instance the method of Patent Document 1 cited in the background art, and several division methods can also be used in combination.

After the speech data group has been divided into the vowel-region data group and the consonant-region data group, each is subjected as necessary to the same window function processing as described above and then frequency-analyzed in the same way to obtain the amplitude spectrum and/or the power spectrum.

The frequency analysis is applied to the vowel-region data group and the consonant-region data group separately, and contribution ratios up to order m are obtained for every analysis section of each. The vowel and the consonant are then identified by comparing the obtained contribution ratios with predetermined judgment criteria. For example, if the consonant is identified as “k” and the vowel as “a”, the result is “ka” (カ); if there is only a vowel region and “a” is identified from its data group, the result is “a” (ア). In particular, in this example the contribution ratios obtained from the consonant-region data group need be compared only with the criteria for identifying consonant phonemes, and those from the vowel-region data group only with the criteria for identifying vowel phonemes; dividing the data into vowel and consonant regions in advance thus simplifies the comparison.

The judgment criteria for identifying consonant and vowel phonemes can be prepared by obtaining, in advance, the contribution ratios of the syllabary sounds from as many subjects as possible and organizing how the contribution ratios appear for each phoneme of each subject. Concretely, the criteria can be prepared by building a database, covering all phonemes of the syllabary, of such features as how many contribution ratios of what magnitude appear in which frequency regions, the frequency region producing the largest contribution ratio, and the magnitude relationships between the contribution ratios of particular frequency regions.

When comparison with the judgment criteria yields a result matching several phonemes, one of them can be selected, for example, by assigning priorities to the phonemes in advance and identifying in that order, or by referring to the original waveform.

Recognition accuracy can be raised further by introducing a neural network or the like when the judgment criteria are created or when unknown speech is identified. The object can also be achieved with suitable electronic circuits instead of a computer.

Next, examples in which contribution ratios were actually obtained will be described.

―About “a” (ア)―
The phonemes were determined according to the procedure shown in FIG. 1.

First, the subject was asked to utter “a” as a single sound; the sound was collected with a microphone, sampled, A/D-converted, and stored in the memory of a personal computer with data numbers assigned in time sequence starting from 1. The sampling frequency was 50 kHz, and frequency components exceeding 25 kHz were cut with a low-pass filter during the A/D conversion.

The collected speech waveform is shown in FIG. 3.

The speech signal region (the region excluding the no-signal regions) of the stored speech data group was taken out, subjected to window function processing with the Hamming window, and fast-Fourier-transformed. The number of data N for the transform was 1024, and the frequency analysis order m went up to 114. In this case spectra up to the (1024/2) − 1 = 511th order are obtained, but the spectra from the 115th to the 511th order were all negligible (close to 0).

The contribution ratios, expressed as percentages, are shown in Tables 1 to 18.

Table 1 shows the contribution ratios obtained by fast-Fourier-transforming the 1024 speech data of data numbers 314 to 1338 as one analysis section (one frame), and Table 2 those obtained from the 1024 speech data of data numbers 714 to 1738. That Table 1 starts at data number 314 while Table 2 starts at 714 shows that the analysis was carried out while shifting each analysis section by 400 speech data. The same applies to the other tables below: 1024 speech data form one frame and successive frames are shifted by 400 speech data.

The “judgment” column at the end of each table shows the identified vowel or consonant phoneme, and the “criterion” column refers to the codes shown in parentheses in the “phoneme” column of Tables 311 to 322 described later. Blank “judgment” and “criterion” columns indicate data that were not used for the judgment (data that matched none of the criteria described later). The same applies to the other tables below.

―About “i” (イ)―
The subject was asked to utter “i” as a single sound, and the contribution ratios were obtained in the same way as in the measurement of “a”.

The collected speech waveform is shown in FIG. 4, and the contribution ratios, as percentages, in Tables 19 to 43. Whereas the data numbers of Table 1 start at 314, those of Table 19 start at 21; this is because in Table 1 the data up to number 313 were in the no-signal (silent) state and were excluded from processing, while in Table 19 this was the case only up to number 20. The same explains the shifts in data numbers in the tables for the other sounds below.

―About “u” (ウ)―
The subject was asked to utter “u” as a single sound, and the contribution ratios were obtained in the same way as in the measurement of “a”.

The collected speech waveform is shown in FIG. 5, and the contribution ratios, as percentages, in Tables 44 to 68.

―About “e” (エ)―
The subject was asked to utter “e” as a single sound, and the contribution ratios were obtained in the same way as in the measurement of “a”.

The collected speech waveform is shown in FIG. 6, and the contribution ratios, as percentages, in Tables 69 to 93.

―About “o” (オ)―
The subject was asked to utter “o” as a single sound, and the contribution ratios were obtained in the same way as in the measurement of “a”.

The collected speech waveform is shown in FIG. 7, and the contribution ratios, as percentages, in Tables 94 to 123.

―About the “ka” (カ) row―
The subject was asked to utter “ka” as a single sound, and the contribution ratios were obtained in the same way as in the measurement of “a”.

The collected speech waveform is shown in FIG. 8, and the contribution ratios, as percentages, in Tables 124 to 144.

“ki”, “ku”, “ke”, and “ko” are omitted, since the discrimination of the consonant phoneme itself is the same as for “ka”.

―About the “sa” (サ) row―
The subject was asked to utter “sa” as a single sound, and the contribution ratios were obtained in the same way as in the measurement of “a”.

The collected speech waveform is shown in FIG. 9, and the contribution ratios, as percentages, in Tables 145 to 173.

“shi”, “su”, “se”, and “so” are omitted, since the discrimination of the consonant phoneme itself is the same as for “sa”.

―About the “ta” (タ) row―
The subject was asked to utter “ta” as a single sound, and the contribution ratios were obtained in the same way as in the measurement of “a”.

The collected speech waveform is shown in FIG. 10, and the contribution ratios, as percentages, in Tables 174 to 194.

“chi”, “tsu”, “te”, and “to” are omitted, since the discrimination of the consonant phoneme itself is the same as for “ta”.

―About the “na” (ナ) row―
The subject was asked to utter “na” as a single sound, and the contribution ratios were obtained in the same way as in the measurement of “a”.

The collected speech waveform is shown in FIG. 11, and the contribution ratios, as percentages, in Tables 195 to 223.

“ni”, “nu”, “ne”, and “no” are omitted, since the discrimination of the consonant phoneme itself is the same as for “na”.

―About the “ha” (ハ) row―
The subject was asked to utter “ha” as a single sound, and the contribution ratios were obtained in the same way as in the measurement of “a”.

The collected speech waveform is shown in FIG. 12, and the contribution ratios, as percentages, in Tables 224 to 250.

“hi”, “fu”, “he”, and “ho” are omitted, since the discrimination of the consonant phoneme itself is the same as for “ha”.

―About the “ma” (マ) row―
The subject was asked to utter “ma” as a single sound, and the contribution ratios were obtained in the same way as in the measurement of “a”.

The collected speech waveform is shown in FIG. 13, and the contribution ratios, as percentages, in Tables 251 to 280.

“mi”, “mu”, “me”, and “mo” are omitted, since the discrimination of the consonant phoneme itself is the same as for “ma”.

―About the “ya” (ヤ) row―
“ya”, “yu”, and “yo” are omitted, since they are considered equivalent to “ia”, “iu”, and “io”.

―About the “ra” (ラ) row―
The subject was asked to utter “ra” as a single sound, and the contribution ratios were obtained in the same way as in the measurement of “a”.

The collected speech waveform is shown in FIG. 15, and the contribution ratios, as percentages, in Tables 281 to 310.

“ri”, “ru”, “re”, and “ro” are omitted, since the discrimination of the consonant phoneme itself is the same as for “ra”.

―About the “wa” (ワ) row―
“wa” and “wo” are omitted, since they are considered equivalent to “ua” and “uo”.

―About “n” (ン)―
“n” (ン) is omitted, since it is considered to correspond to “un”, “n”, or “m”.

―About the judgment criteria―
Tables 311 to 322 show an example of the judgment criteria obtained by measuring the syllabary sounds of a plurality of male and female subjects.

In Tables 311 to 322, to simplify the presentation, the value obtained by adding the contribution ratios of the 1st harmonic (49 Hz) and the 2nd harmonic (98 Hz) is shown as the contribution ratio at 98 Hz, the sum of the 3rd harmonic (147 Hz) and the 4th harmonic (196 Hz) as the contribution ratio at 196 Hz, and so on: the sum of the contribution ratios of the (m − 1)th and mth harmonics is shown as the contribution ratio at the mth-order frequency (where m here is an even integer of 2 or more). The criteria need not, however, be based on this paired representation; the contribution ratios from the 1st to the mth order of each analysis section can also be used as they are.
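A sketch of this paired representation, assuming the 1-indexed `contribution()` output from the earlier snippets (so C[1] is the ≈49 Hz line):

```python
def pair_to_98hz(C):
    """Collapse the ~49 Hz lines into the 98 Hz grid of Tables 311-322:
    bin k holds C[2k-1] + C[2k], i.e. the value shown at k*98 Hz."""
    K = (len(C) - 1) // 2
    out = np.zeros(K + 1)                 # index 0 unused, like C
    for k in range(1, K + 1):
        out[k] = C[2 * k - 1] + C[2 * k]
    return out
```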

In Tables 311 to 322, the upper and lower digits in the “frequency” row indicate the multiplier to be applied to 98 Hz: the upper digit is the tens place and the lower digit the ones place. The codes A, B, C, … in the “section” row denote the regions indicated by arrows in the “frequency” row; they are attached for convenience in the following description, and identical codes in different tables do not denote the same frequency regions.

Supplementary explanations of Tables 311 to 322 follow.
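The criteria below all take the same form: counting, within a band of the 98 Hz grid, the spectral lines whose contribution ratio clears a threshold, sometimes combined with comparisons of band maxima. As a minimal sketch (the helper names and the 1-indexed `c98` array from `pair_to_98hz()` above are assumptions, not part of the patent), criterion A-1 for “a”, described in (1) below, can be transcribed directly:

```python
def count_in(c98, lo, hi, th):
    """Lines in sections lo*98 .. hi*98 Hz whose contribution is >= th."""
    return int(np.sum(c98[lo:hi + 1] >= th))

def is_a_by_A1(c98):
    """Criterion A-1 for the vowel "a" (Table 311), condition by condition."""
    return (count_in(c98, 1, 4, 10) == 0     # section A: none >= 10
        and count_in(c98, 5, 9, 10) < 2      # section B: fewer than two >= 10
        and count_in(c98, 8, 15, 3) > 3      # section C: more than three >= 3
        and count_in(c98, 13, 25, 3) != 0)   # section D: at least one >= 3
```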

(1) Judgment criteria for “a”
As shown in Table 311, “a” can be judged when either of the two criteria A-1 and A-2 is satisfied.

A-1 judges “a” when all of the following conditions are satisfied:
- In section A (1×98 to 4×98 Hz), no spectral line has a contribution ratio of 10 or more.
- In section B (5×98 to 9×98 Hz), fewer than two spectral lines have a contribution ratio of 10 or more.
- In section C (8×98 to 15×98 Hz), more than three spectral lines have a contribution ratio of 3 or more.
- In section D (13×98 to 25×98 Hz), the number of spectral lines with a contribution ratio of 3 or more is not zero.

A-2 judges “a” when all of the following conditions are satisfied:
- In section A (1×98 to 4×98 Hz), no spectral line has a contribution ratio of 10 or more.
- In section B (2×98 to 7×98 Hz), more than one spectral line has a contribution ratio of 3 or more.
- In section C (5×98 to 9×98 Hz), fewer than two spectral lines have a contribution ratio of 10 or more.
- In section D (9×98 to 15×98 Hz), more than two spectral lines have a contribution ratio of 10 or more.
- In section E (13×98 to 25×98 Hz), the number of spectral lines with a contribution ratio of 3 or more is not zero.

(2) Judgment criteria for “i”
As shown in Table 312, “i” can be judged when either of the two criteria I-1 and I-2 is satisfied.

The I-1 table is read in the same way as Table 311, which shows the criteria for “a”.

I-2 judges “i” when all of the following conditions are satisfied:
- In section A (2×98 to 4×98 Hz), the number of spectral lines with a contribution ratio of 9 or more is not zero.
- In section B (11×98 to 15×98 Hz), no spectral line has a contribution ratio of 2.5 or more.
- In section C (17×98 to 26×98 Hz), fewer than six spectral lines have a contribution ratio of 2.5 or more.
- In section D (17×98 to 20×98 Hz), no spectral line has a contribution ratio of 1.5 or more.
- In section E1 (28×98 to 41×98 Hz), eight or more spectral lines have a contribution ratio of 0.5 or more; or, in section E2 (28×98 to 41×98 Hz), three or more spectral lines have a contribution ratio of 1 or more; or, in section F (28×98 to 41×98 Hz), three or more spectral lines have a contribution ratio of 0.5 or more and, in section G (28×98 to 41×98 Hz), the number of spectral lines with a contribution ratio of 1 or more is not zero.
- In section H (35×98 to 46×98 Hz), no spectral line has a contribution ratio of 2.5 or more.
- In the range 1×98 to 10×98 Hz, no spectral line with a contribution ratio of 3 or more exists at or above 7×98 Hz.

(3) Judgment criteria for “u”, “e”, “o”, “s”, and “t”
“u” can be judged by the criteria shown in Table 313, “e” by Table 314, “o” by Table 315, “s” by Table 317, and “t” by Table 318. The tables for “u”, “e”, “o”, and “s” are read in the same way as Table 311, which shows the criteria for “a”; the T-1 table for “t” is read in the same way as the K-2 criterion described next.

(4) Judgment criteria for “k”
As shown in Table 316, “k” can be judged when any one of the three criteria K-1, K-2, and K-3 is satisfied.

The K-1 and K-3 tables are read in the same way as Table 311, which shows the criteria for “a”.

K-2 judges “k” when all of the following conditions are satisfied:
- In section A (1×98 to 5×98 Hz), no spectral line has a contribution ratio of 6 or more.
- In section B (16×98 to 20×98 Hz), no spectral line has a contribution ratio of 2.5 or more.
- In section C1 (36×98 to 40×98 Hz), at least one spectral line has a contribution ratio of 2 or more; or, in section C2 (46×98 to 55×98 Hz), at least one spectral line has a contribution ratio of 2 or more.
- In section D (41×98 to 45×98 Hz), at least one spectral line has a contribution ratio of 3 or more.

(5) Judgment criteria for “n”
As shown in Table 319, “n” can be judged when all of the following conditions are satisfied:
- In section A (1×98 to 6×98 Hz), no spectral line has a contribution ratio of 30 or more.
- In section B (1×98 to 6×98 Hz), more than one spectral line has a contribution ratio of 10 or more.
- In section C (1×98 to 6×98 Hz), more than two spectral lines have a contribution ratio of 5 or more.
- With p0 the maximum contribution ratio in section D (7×98 to 9×98 Hz), p1 that in section E (10×98 to 15×98 Hz), p2 that in section F (16×98 to 21×98 Hz), and p3 that in section G (22×98 to 30×98 Hz), at least one of p0, p2, and p3 is larger than p1, and at least one of p0, p2, and p3 is 2 or more.
- In section H (31×98 to 55×98 Hz), no spectral line has a contribution ratio of 2 or more.

(6) Judgment criteria for “h”
As shown in Table 320, “h” can be judged when any one of the four criteria H-1 to H-4 is satisfied.

The H-2 entry in Table 320 is read in the same way as the K-2 criterion above, and the H-3 entry in the same way as Table 311, which shows the criteria for “a”.

H-1 judges “h” when all of the following conditions are satisfied:
- In section A1 (1×98 to 5×98 Hz), the number of spectral lines with a contribution ratio of 7 or more is not zero; or, in section A2 (21×98 to 26×98 Hz), the number of spectral lines with a contribution ratio of 3 or more is not zero.
- In section B (6×98 to 10×98 Hz), the number of spectral lines with a contribution ratio of 3 or more is not zero.
- In section C (11×98 to 15×98 Hz), the number of spectral lines with a contribution ratio of 3 or more is not zero.
- In section D (16×98 to 20×98 Hz), the number of spectral lines with a contribution ratio of 3 or more is not zero.
- The maximum spectral contribution ratio p0 lies in section E (6×98 to 30×98 Hz), and this p0 is 8 or more.

H-4 judges “h” when all of the following conditions are satisfied:
- In section A (1×98 to 5×98 Hz), no spectral line has a contribution ratio of 20 or more.
- The maximum spectral contribution ratio p0 lies in section C (1×98 to 26×98 Hz), and this p0 is 8 or more.
- In two or more of the sections B1 to B8, excluding the section to which the maximum contribution ratio p0 belongs, at least one spectral line has a contribution ratio of 4 or more.

(7) Judgment criteria for “m”
As shown in Table 321, “m” can be judged when either of the two criteria M-1 and M-2 is satisfied.

M-1 judges “m” when all of the following conditions are satisfied:
- In section A (1×98 to 6×98 Hz), more than one spectral line has a contribution ratio of 10 or more.
- In section B (1×98 to 6×98 Hz), more than two spectral lines have a contribution ratio of 5 or more.
- With p0 the maximum contribution ratio in section C (7×98 to 10×98 Hz), p1 that in section D (11×98 to 15×98 Hz), p2 that in section E (16×98 to 21×98 Hz), and p3 that in section F (22×98 to 30×98 Hz), p1 is larger than each of p0, p2, and p3, and p1 is 2 or more.
- In section G (31×98 to 55×98 Hz), no spectral line has a contribution ratio of 4 or more.

The M-2 table is read in the same way as Table 311, which shows the criteria for “a”.

(8) Judgment criteria for “r”
“r” can be judged by the criteria shown in Table 322. That table is read in the same way as M-1 in Table 321 above.

[Tables 1 to 322 and the accompanying figures are image data in the original publication and are not reproduced here.]
Figure 2006343544

Figure 2006343544
Figure 2006343544

Figure 2006343544
Figure 2006343544

Figure 2006343544
Figure 2006343544

Figure 2006343544
Figure 2006343544

Figure 2006343544
Figure 2006343544

Figure 2006343544
Figure 2006343544

Figure 2006343544
Figure 2006343544

Figure 2006343544
Figure 2006343544

Figure 2006343544
Figure 2006343544

Figure 2006343544
Figure 2006343544

Figure 2006343544
Figure 2006343544

Figure 2006343544
Figure 2006343544

Figure 2006343544
Figure 2006343544

Figure 2006343544
Figure 2006343544

Figure 2006343544
Figure 2006343544

Figure 2006343544
Figure 2006343544

Figure 2006343544
Figure 2006343544

Figure 2006343544
Figure 2006343544

Figure 2006343544
Figure 2006343544

Figure 2006343544
Figure 2006343544

Figure 2006343544
Figure 2006343544

Figure 2006343544
Figure 2006343544

Figure 2006343544
Figure 2006343544

Figure 2006343544
Figure 2006343544

Figure 2006343544
Figure 2006343544

Figure 2006343544
Figure 2006343544

Figure 2006343544
Figure 2006343544

Figure 2006343544
Figure 2006343544

Figure 2006343544
Figure 2006343544

Figure 2006343544
Figure 2006343544

Figure 2006343544
Figure 2006343544

Figure 2006343544
Figure 2006343544

Figure 2006343544
Figure 2006343544

Figure 2006343544
Figure 2006343544

Block diagram showing an example of the speech recognition method according to the present invention.
Block diagram showing another example of the speech recognition method according to the present invention.
Diagram showing the speech waveform of "a" (ア).
Diagram showing the speech waveform of "i" (イ).
Diagram showing the speech waveform of "u" (ウ).
Diagram showing the speech waveform of "e" (エ).
Diagram showing the speech waveform of "o" (オ).
Diagram showing the speech waveform of "ka" (カ).
Diagram showing the speech waveform of "sa" (サ).
Diagram showing the speech waveform of "ta" (タ).
Diagram showing the speech waveform of "na" (ナ).
Diagram showing the speech waveform of "ha" (ハ).
Diagram showing the speech waveform of "ma" (マ).
Diagram showing the speech waveform of "ra" (ラ).

Claims (6)

1. A speech recognition method characterized in that a group of speech data sampled from a speech signal and A/D-converted is frequency-analyzed; from the resulting amplitude spectrum or power spectrum, a contribution ratio is obtained for the fundamental wave and for each harmonic component, defined as the ratio of that component's amplitude or power to the sum of the amplitudes or powers of the fundamental wave and all harmonic components contained in the voice frequency range; and consonant and vowel phonemes are identified from the pattern in which these contribution ratios appear.

2. The speech recognition method according to claim 1, wherein the speech data group is divided into a consonant region and a vowel region, the speech data of the consonant region and the speech data of the vowel region are each frequency-analyzed to obtain contribution ratios, and the consonant and vowel phonemes are identified from how the contribution ratios appear in each data group.

3. The speech recognition method according to claim 1 or 2, wherein the contribution ratio used is the ratio of the amplitude of the fundamental wave and of each harmonic component to the sum of the amplitudes of the fundamental wave and all harmonic components contained in the voice frequency range.

4. The speech recognition method according to any one of claims 1 to 3, wherein frequency analysis is applied to the speech data group sequentially, one analysis section of N speech data at a time, and a contribution ratio is obtained for each analysis section.

5. The speech recognition method according to any one of claims 1 to 4, wherein window-function processing is applied to the speech data group before the frequency analysis.

6. The speech recognition method according to any one of claims 1 to 5, wherein a Hamming window is used for the window-function processing and a fast Fourier transform is used for the frequency analysis.
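As a reading aid, the processing chain in claims 1 and 4 to 6 (Hamming window, FFT, and an amplitude-based contribution ratio computed per analysis section of N samples) can be sketched in a few lines of NumPy. This is a minimal sketch under stated assumptions, not the patented implementation: the function name contribution_ratios, the 80-400 Hz pitch-search range, the tolerance of two FFT bins around each harmonic, and the choice of N = 512 samples at 8 kHz are all illustrative.

import numpy as np

def contribution_ratios(section, fs, f0_range=(80.0, 400.0), n_harmonics=8):
    """One analysis section of N samples -> (estimated f0, contribution ratios)."""
    n = len(section)
    windowed = section * np.hamming(n)          # window-function processing (claims 5, 6)
    spectrum = np.abs(np.fft.rfft(windowed))    # amplitude spectrum via FFT (claim 6)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)

    # Estimate the fundamental as the strongest spectral peak in the pitch range.
    band = (freqs >= f0_range[0]) & (freqs <= f0_range[1])
    f0 = freqs[band][np.argmax(spectrum[band])]

    # Amplitude of the fundamental and of each harmonic k*f0, taking the
    # largest bin within two bins of the target frequency.
    amps = []
    for k in range(1, n_harmonics + 1):
        target = k * f0
        if target >= fs / 2:
            break
        idx = int(np.argmin(np.abs(freqs - target)))
        amps.append(spectrum[max(idx - 2, 0):idx + 3].max())
    amps = np.asarray(amps)

    # Contribution ratio (claims 1, 3): each component's share of the summed amplitude.
    total = amps.sum()
    return f0, (amps / total if total > 0 else amps)

# Sequential analysis, one section of N data at a time (claim 4), on a
# synthetic voiced signal with a 150 Hz fundamental and two harmonics.
fs, N = 8000, 512
t = np.arange(4 * N) / fs
x = (np.sin(2 * np.pi * 150 * t)
     + 0.5 * np.sin(2 * np.pi * 300 * t)
     + 0.25 * np.sin(2 * np.pi * 450 * t))
for start in range(0, len(x) - N + 1, N):
    f0, ratios = contribution_ratios(x[start:start + N], fs)
    print(round(f0, 1), np.round(ratios, 3))

Because the ratios are normalized by the summed amplitude, scaling x by any constant leaves them unchanged, which is the level-independence the claimed method relies on.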
JP2005169217A 2005-06-09 2005-06-09 Speech recognition method Expired - Fee Related JP4890792B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2005169217A JP4890792B2 (en) 2005-06-09 2005-06-09 Speech recognition method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP2005169217A JP4890792B2 (en) 2005-06-09 2005-06-09 Speech recognition method

Publications (3)

Publication Number Publication Date
JP2006343544A true JP2006343544A (en) 2006-12-21
JP2006343544A5 JP2006343544A5 (en) 2008-08-21
JP4890792B2 JP4890792B2 (en) 2012-03-07

Family

ID=37640558

Family Applications (1)

Application Number Title Priority Date Filing Date
JP2005169217A Expired - Fee Related JP4890792B2 (en) 2005-06-09 2005-06-09 Speech recognition method

Country Status (1)

Country Link
JP (1) JP4890792B2 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2014041292A (en) * 2012-08-23 2014-03-06 Daihen Corp Welding system and welding control device

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS56129000A (en) * 1980-03-14 1981-10-08 Hitachi Ltd Wind hanging calculator
JPS6180298A (en) * 1984-09-28 1986-04-23 松下電器産業株式会社 Voice recognition equipment
JPS62299899A (en) * 1986-06-19 1987-12-26 富士通株式会社 Contracted sound-direct sound speech evaluation system
JPS6389900A (en) * 1986-10-03 1988-04-20 沖電気工業株式会社 Voice recognition equipment
JPS63234299A (en) * 1987-03-20 1988-09-29 株式会社日立製作所 Voice analysis/synthesization system
JPH03230200A (en) * 1990-02-05 1991-10-14 Sekisui Chem Co Ltd Voice recognizing method
JP2000298495A (en) * 1999-03-19 2000-10-24 Koninkl Philips Electronics Nv Specifying method of regression class tree structure for voice recognition device

Also Published As

Publication number Publication date
JP4890792B2 (en) 2012-03-07

Similar Documents

Publication Publication Date Title
JP3162994B2 (en) Method for recognizing speech words and system for recognizing speech words
US20200160839A1 (en) Method and system for generating advanced feature discrimination vectors for use in speech recognition
US8942977B2 (en) System and method for speech recognition using pitch-synchronous spectral parameters
US7908142B2 (en) Apparatus and method for identifying prosody and apparatus and method for recognizing speech
He et al. Automatic syllable segmentation algorithm of Chinese speech based on MF-DFA
Deb et al. Exploration of phase information for speech emotion classification
Mary et al. Automatic syllabification of speech signal using short time energy and vowel onset points
JP4890792B2 (en) Speech recognition method
Hasija et al. Recognition of Children Punjabi Speech using Tonal Non-Tonal Classifier
JPH0229232B2 (en)
KR0136608B1 (en) Phoneme recognizing device for voice signal status detection
JPH07191696A (en) Speech recognition device
Aadit et al. Pitch and formant estimation of bangla speech signal using autocorrelation, cepstrum and LPC algorithm
Every et al. Enhancement of harmonic content of speech based on a dynamic programming pitch tracking algorithm
Awais et al. Continuous arabic speech segmentation using FFT spectrogram
Kepuska et al. Using formants to compare short and long vowels in modern standard Arabic
Pyž et al. Modelling of Lithuanian speech diphthongs
JP2001083978A (en) Speech recognition device
Pietrowicz et al. Acoustic correlates for perceived effort levels in expressive speech.
Loni et al. Singing voice identification using harmonic spectral envelope
Li SPEech Feature Toolbox (SPEFT) design and emotional speech feature extraction
Latha et al. Performance Analysis of Kannada Phonetics: Vowels, Fricatives and Stop Consonants Using LP Spectrum
Yusof et al. Speech recognition application based on malaysian spoken vowels using autoregressive model of the vocal tract
JPS6068000A (en) Pitch extractor
Rahaman et al. Special feature extraction techniques for Bangla speech

Legal Events

Date Code Title Description
A521 Written amendment

Free format text: JAPANESE INTERMEDIATE CODE: A523

Effective date: 20080605

A621 Written request for application examination

Free format text: JAPANESE INTERMEDIATE CODE: A621

Effective date: 20080605

A711 Notification of change in applicant

Free format text: JAPANESE INTERMEDIATE CODE: A711

Effective date: 20080605

A521 Written amendment

Free format text: JAPANESE INTERMEDIATE CODE: A821

Effective date: 20080606

A521 Written amendment

Free format text: JAPANESE INTERMEDIATE CODE: A523

Effective date: 20080725

A521 Written amendment

Free format text: JAPANESE INTERMEDIATE CODE: A821

Effective date: 20080725

A977 Report on retrieval

Free format text: JAPANESE INTERMEDIATE CODE: A971007

Effective date: 20100921

A131 Notification of reasons for refusal

Free format text: JAPANESE INTERMEDIATE CODE: A131

Effective date: 20101102

A521 Written amendment

Free format text: JAPANESE INTERMEDIATE CODE: A523

Effective date: 20101214

A131 Notification of reasons for refusal

Free format text: JAPANESE INTERMEDIATE CODE: A131

Effective date: 20110913

A521 Written amendment

Free format text: JAPANESE INTERMEDIATE CODE: A523

Effective date: 20111024

TRDD Decision of grant or rejection written
A01 Written decision to grant a patent or to grant a registration (utility model)

Free format text: JAPANESE INTERMEDIATE CODE: A01

Effective date: 20111206

A61 First payment of annual fees (during grant procedure)

Free format text: JAPANESE INTERMEDIATE CODE: A61

Effective date: 20111215

R150 Certificate of patent or registration of utility model

Ref document number: 4890792

Country of ref document: JP

Free format text: JAPANESE INTERMEDIATE CODE: R150

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20141222

Year of fee payment: 3

R250 Receipt of annual fees

Free format text: JAPANESE INTERMEDIATE CODE: R250

R250 Receipt of annual fees

Free format text: JAPANESE INTERMEDIATE CODE: R250

R250 Receipt of annual fees

Free format text: JAPANESE INTERMEDIATE CODE: R250

R250 Receipt of annual fees

Free format text: JAPANESE INTERMEDIATE CODE: R250

LAPS Cancellation because of no payment of annual fees