JP2018033540A - Lingual position/lingual habit determination device, lingual position/lingual habit determination method and program

Info

Publication number: JP2018033540A
Application number: JP2016167180A
Authority: JP
Inventors: Shunsuke Ishimitsu; Hitoshi Nakayama; Kazutaka Kasai; Satoshi Horibatake; Kaori Ishii; Kimiko Yamashita
Original assignee: Nihon University; Hiroshima City University
Current assignee: Nihon University; Hiroshima City University
Priority date: 2016-08-29
Filing date: 2016-08-29
Publication date: 2018-03-08
Anticipated expiration: 2036-08-29
Also published as: JP6782940B2

Abstract

PROBLEM TO BE SOLVED: To provide a lingual position/lingual habit determination device, a lingual position/lingual habit determination method, and a program capable of determining a lingual position or lingual habit non-invasively.
SOLUTION: A lingual position/lingual habit determination device 1 includes: an audio input unit 2 that inputs audio data of an utterance by a speaker h; a measuring unit 3 that extracts, from the input audio data, the audio data to be used for determination and measures acoustic feature values (the number of zero crossings and mel-frequency cepstral coefficients) related to lingual position and lingual habit in the extracted audio data; and an estimation unit 4 that estimates the lingual position or lingual habit of the speaker h based on the measured acoustic feature values (an acoustic feature vector).
SELECTED DRAWING: Figure 1

Description

The present invention relates to a lingual position/lingual habit determination device, a lingual position/lingual habit determination method, and a program.

The oral cavity is an organ used for vocalization, breathing, chewing, and swallowing. Because keeping the oral environment in a normal state is extremely important for physical health, the oral environment has conventionally been measured (see, for example, Patent Document 1).

Malocclusion is one of the conditions that greatly affect the oral environment and may impair physical health. About 25% of malocclusions are caused by oral habits such as finger sucking and lingual habits (tongue thrusting). To correct malocclusion, oral myofunctional therapy (MTF), which improves oral habits by improving the muscle balance of the tongue, lips, and face, is therefore practiced.

Patent Document 1: International Publication No. 2001/078602

The treatment for malocclusion differs depending on which oral habit causes it. It is therefore important to analyze the function of the tongue and identify the oral habit responsible for the malocclusion. In conventional clinical practice, tongue function is analyzed using X-rays, palatogram pressure sensors, and the like in order to identify oral habits, so the examination carries a risk of radiation exposure and requires instruments to be placed in the oral cavity. A method for identifying and determining lingual position and lingual habit non-invasively is therefore desired.

Meanwhile, acoustic recognition technology using acoustic models has been developing rapidly. However, no acoustic recognition technology capable of determining lingual position or lingual habit has yet been devised.

An object of the present invention is to provide a lingual position/lingual habit determination device, a lingual position/lingual habit determination method, and a program capable of determining lingual position and lingual habit non-invasively.

To achieve the above object, a lingual position/lingual habit determination device according to the present invention includes:
a measuring unit that measures, based on audio data of an utterance by a speaker, an acoustic feature value related to the lingual position and lingual habit of the speaker; and
an estimation unit that estimates the lingual position or lingual habit of the speaker based on the measured acoustic feature value.

In this case, the measuring unit may measure, as the acoustic feature value, the number of zero crossings, which is the number of times the waveform of the input audio data crosses the zero level or a fixed band around the zero level, and
the estimation unit may estimate the lingual position or lingual habit of the speaker based on the measured number of zero crossings.

The measuring unit may also measure the mel-frequency cepstral coefficients of the input audio data as the acoustic feature value, and
the estimation unit may estimate the lingual position or lingual habit of the speaker based on the measured mel-frequency cepstral coefficients.

Each lingual position or lingual habit may be stored in association with information on a reference acoustic feature value, and
the estimation unit may estimate, as the lingual position or lingual habit of the speaker, the lingual position or lingual habit associated with the reference acoustic feature value closest to the measured acoustic feature value.

A reference acoustic feature vector, whose elements are the number of zero crossings and the mel-frequency cepstral coefficients obtained from audio data of utterances by a plurality of speakers having the same lingual position or lingual habit, may be stored as the information on the reference acoustic feature value, and
the estimation unit may estimate the lingual position or lingual habit of a subject by comparing an acoustic feature vector, whose elements are the number of zero crossings and the mel-frequency cepstral coefficients obtained from audio data of the subject's utterance, with the reference acoustic feature vector.

The measuring unit may extract the audio data of a consonant section as the audio data for determination.

The measuring unit may extract, as the audio data of a consonant section, the audio data of a section in which the number of zero crossings of the audio data is equal to or greater than a threshold.

A lingual position/lingual habit determination method according to a second aspect of the present invention includes:
a measuring step of measuring, based on audio data of an utterance by a speaker, an acoustic feature value related to the lingual position and lingual habit of the speaker; and
an estimation step of estimating the lingual position or lingual habit of the speaker based on the measured acoustic feature value.

A program according to a third aspect of the present invention causes a computer to function as:
a measuring unit that measures, based on audio data of an utterance by a speaker, an acoustic feature value related to the lingual position and lingual habit of the speaker; and
an estimation unit that estimates the lingual position or lingual habit of the speaker based on the measured acoustic feature value.

According to the present invention, the lingual position or lingual habit of a speaker is estimated based on acoustic feature values obtained from audio data of the speaker's utterance, so lingual position and lingual habit can be determined non-invasively.

FIG. 1 is a block diagram showing the functional configuration of a lingual position/lingual habit determination device according to an embodiment of the present invention.
FIG. 2 is a diagram showing an example of lingual positions and lingual habits.
FIGS. 3(A) and 3(B) are diagrams showing the distribution of the number of zero crossings for each lingual position and lingual habit.
FIGS. 4(A) to 4(E) are diagrams showing the audio data of a consonant section within audio data.
FIGS. 5(A) and 5(B) are diagrams showing the distribution of the fourth mel-frequency cepstral coefficient for each lingual position and lingual habit.
FIGS. 6(A) to 6(E) are graphs showing the measured mel-frequency cepstral coefficients for each lingual position and lingual habit.
FIG. 7 is a diagram showing the vector space obtained by converting the acoustic feature vectors to two dimensions.
FIG. 8 is a block diagram showing the hardware configuration of the lingual position/lingual habit determination device of FIG. 1.
FIG. 9 is a flowchart of the determination process in the lingual position/lingual habit determination device.
FIG. 10 is a flowchart of the measuring step.

Hereinafter, an embodiment of the present invention will be described in detail with reference to the drawings.

As shown in FIG. 1, a device capable of inputting the voice of a speaker h, such as a mobile phone, a smartphone, a recorder, or a personal computer, can be used as the lingual position/lingual habit determination device 1 according to the present embodiment.

The lingual positions and lingual habits to be determined are "healthy (Origin)", "low tongue (Lower tongue)", "mandibular prognathism (Mandibular)", "tongue protrusion (Protruding tongue)", and "low tongue + mandibular prognathism (Mandibular + Lower tongue)". As shown in FIG. 2, "healthy (Origin)" is the state in which the tip of the tongue 22 rests immediately behind the upper front teeth 20 and the broad part of the tongue 22 lightly touches the palate (the roof of the mouth), that is, the state of the tongue 22 drawn with a solid line in FIG. 2. "Low tongue (Lower tongue)" is the state in which the tip of the tongue 22 sits low and touches the back of the lower front teeth 21, that is, the state of the tongue 22 drawn with a dotted line in FIG. 2. "Mandibular prognathism (Mandibular)" is the state in which, when the teeth are brought together, all the teeth of the lower jaw (including the front teeth 21) protrude forward of all the teeth of the upper jaw (including the front teeth 20), which is the reverse of the state shown in FIG. 2. "Tongue protrusion (Protruding tongue)" is the state in which the tongue 22 protrudes from between the front teeth 20 and 21. "Low tongue + mandibular prognathism (Mandibular + Lower tongue)" is the state in which low tongue and mandibular prognathism occur together.

Based on the input audio data of the utterance by the speaker h, the lingual position/lingual habit determination device 1 determines which of "healthy (Origin)", "low tongue (Lower tongue)", "mandibular prognathism (Mandibular)", "tongue protrusion (Protruding tongue)", and "low tongue + mandibular prognathism (Mandibular + Lower tongue)" the lingual position or lingual habit of the speaker h corresponds to.

As shown in FIG. 1, the lingual position/lingual habit determination device 1 according to the present embodiment includes an audio input unit 2 that inputs audio data of the utterance by the speaker h, a measuring unit 3 that measures, based on the input audio data, acoustic feature values related to the lingual position and lingual habit of the speaker h, and an estimation unit 4 that estimates the lingual position or lingual habit of the speaker h based on the measured acoustic feature values. The lingual position/lingual habit determination device 1 also includes a storage unit 5, which is a storage device that stores various data.

The audio input unit 2 is a microphone and inputs audio data of the utterance by the speaker h. The input audio data is stored in the storage unit 5 as audio data 10. The speaker h utters a predetermined phrase, for example "ishi-ishi-ishi...". The audio input unit 2 inputs the sound of this utterance as audio data. The phrase uttered by the speaker h is one that contains "S", a consonant in which formants are present. Here, a formant is one of the clusters of peaks, moving over time, contained in the spectrum of a speaking person's voice. The inventors have found that the lingual positions and lingual habits described above correlate strongly with formants.

The measuring unit 3 measures acoustic feature values related to lingual position and lingual habit in the extracted audio data. One such acoustic feature value is the number of zero crossings of the signal level in the extracted audio data. That is, the measuring unit 3 measures the number of zero crossings, which is the number of times the waveform of the extracted audio data crosses the zero level or a fixed band around the zero level.
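As a minimal sketch of the zero-crossing count described above, the Python function below counts sign changes of the waveform and treats samples inside a small band around zero as neutral, so that low-level noise does not register as crossings. The function name `count_zero_crossings` and the `dead_band` parameter are illustrative assumptions; the specification does not state how the band around the zero level is handled.

```python
def count_zero_crossings(samples, dead_band=0.0):
    """Count sign changes in a waveform.

    Samples whose magnitude is at most dead_band are skipped, which
    approximates "crossing a fixed band around the zero level".
    dead_band is a hypothetical parameter, not from the specification.
    """
    count = 0
    last_sign = 0  # sign of the most recent sample outside the band
    for x in samples:
        if abs(x) <= dead_band:
            continue  # inside the band: neither positive nor negative
        sign = 1 if x > 0 else -1
        if last_sign != 0 and sign != last_sign:
            count += 1
        last_sign = sign
    return count
```

With `dead_band=0.0` this reduces to a plain sign-change count; a larger value suppresses crossings caused by small fluctuations around zero.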

It is known that, when vowels and consonants are uttered, the number of zero crossings is small in vowel sections and large in consonant sections. It is also known that, even when several people utter the same sound, the number of zero crossings differs depending on the speaker's lingual position or lingual habit described above. As shown in FIGS. 3(A) and 3(B), when the distribution of the number of zero crossings within consonants was examined for "low tongue (Lower tongue)", "mandibular prognathism (Mandibular)", "low tongue + mandibular prognathism (Mandibular + Lower tongue)", "healthy (Origin)", and "tongue protrusion (Protruding tongue)", the distribution differed greatly from one lingual position or lingual habit to another.

In FIG. 3(A), the horizontal line partway up each box is the mean value for that lingual position or lingual habit, the top of the box is the upper-quartile value, and the bottom of the box is the lower-quartile value. The top of the vertical line is the maximum value for each lingual position or lingual habit, and the bottom of the vertical line is the minimum value. In FIG. 3(B), the horizontal axis is the number of zero crossings and the vertical axis is the density (probability of occurrence) for each lingual position or lingual habit. All of these data were obtained from audio data in consonant sections.

The measuring unit 3 also extracts, from the audio data 10 stored in the storage unit 5, the audio data of the section to be used for determination. For example, the measuring unit 3 extracts the audio data of a consonant section as the audio data for determination. The number of zero crossings described above can be used, for example, to identify the consonant section.

The audio data of a consonant section is extracted from the number of zero crossings as follows. When audio data (waveform data) such as that shown in FIG. 4(A) is obtained, the spectrogram of the waveform data is as shown in FIG. 4(B). From the waveform of the audio data shown in FIG. 4(A), the measuring unit 3 detects the zero-cross points (Z cross (Only trigger)) at which the waveform crosses the zero level (FIG. 4(C)). The number of zero-cross points per frame (unit time) (Z cross (Each frame)) is then as shown in FIG. 4(D). The measuring unit 3 extracts, as the audio data for determination, the audio data of the portion in which the number of zero-cross points per frame is equal to or greater than a threshold (FIG. 4(E)), that is, the audio data of the section indicated by Z cross (Detected result).
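The frame-wise extraction just described can be sketched as follows. This is an illustration under stated assumptions rather than the implementation of the embodiment: the frames are taken as non-overlapping, and the names `zero_crossings_per_frame`, `consonant_frames`, `frame_len`, and `threshold` are hypothetical. The specification gives neither the frame length nor the threshold value.

```python
def zero_crossings_per_frame(samples, frame_len):
    """Return the number of sign changes in each non-overlapping frame."""
    counts = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        # A crossing occurs when adjacent samples lie on opposite
        # sides of zero (zero itself is treated as non-negative).
        crossings = sum(
            1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
        )
        counts.append(crossings)
    return counts


def consonant_frames(samples, frame_len, threshold):
    """Return indices of frames whose zero-crossing count is >= threshold."""
    counts = zero_crossings_per_frame(samples, frame_len)
    return [i for i, c in enumerate(counts) if c >= threshold]
```

The returned frame indices mark the candidate consonant section; the samples of those frames would then be passed on as the audio data for determination.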

The measuring unit 3 further measures the mel-frequency cepstral coefficients (MFCC) of the audio data as acoustic feature values. Specifically, the measuring unit 3 measures the mel-frequency cepstral coefficients of the audio data for determination extracted as the audio data of the consonant section. Like the cepstrum, the mel-frequency cepstral coefficients are acoustic feature values that represent vocal tract characteristics. Here, the cepstrum is the result of regarding the spectrum of a sound as a signal and applying a frequency transform (for example, a Fourier transform) to it. "Mel" indicates that the coefficients are calculated taking the characteristics of human speech perception into account.

The measuring unit 3 emphasizes the high-frequency components of the waveform of the audio data for determination with a pre-emphasis filter. The pre-emphasis filter is used to bring out vocal tract characteristics clearly by emphasizing high-frequency components. For example, the following expression can be used for the filter:
y(n) = x(n) - p * x(n-1)
Here, n is a natural number and is the sample number. x(n) is the audio waveform data for determination, and x(n-1) is the value of the preceding sample. p is the pre-emphasis coefficient; 0.97 is often used, but any value may be set. y(n) is the output of the filter.
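The filter expression above translates directly into code. The sketch below is illustrative: the function name `pre_emphasis` is hypothetical, and passing the first sample through unchanged is an assumption, since the expression does not define x(n-1) for the first sample.

```python
def pre_emphasis(x, p=0.97):
    """Apply y(n) = x(n) - p * x(n-1) to a list of samples.

    The first sample has no predecessor and is passed through as-is
    (an assumption; the specification does not define this case).
    """
    if not x:
        return []
    y = [x[0]]
    for n in range(1, len(x)):
        y.append(x[n] - p * x[n - 1])
    return y
```

A constant input is strongly attenuated while a rapidly alternating input is amplified, which is the high-frequency emphasis the text describes.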

The measuring unit 3 then applies a window function (Hamming window) to the audio data whose high-frequency components have been emphasized, performs a fast Fourier transform (FFT) on the result, and obtains the amplitude spectrum of the audio data.
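The windowing and transform step can be sketched with the standard library alone. For brevity the example evaluates the discrete Fourier transform directly instead of using an FFT; the two give the same values, and the FFT is only faster. The names `hamming` and `amplitude_spectrum` are hypothetical, and the symmetric form of the Hamming window used here is an assumption.

```python
import cmath
import math


def hamming(n_samples):
    """Hamming window: w(n) = 0.54 - 0.46 * cos(2*pi*n / (N - 1))."""
    if n_samples == 1:
        return [1.0]
    return [
        0.54 - 0.46 * math.cos(2 * math.pi * n / (n_samples - 1))
        for n in range(n_samples)
    ]


def amplitude_spectrum(frame):
    """Window a frame and return |X(k)| for k = 0 .. N/2.

    Uses a naive O(N^2) DFT; an FFT would produce the same values.
    """
    n_samples = len(frame)
    windowed = [s * w for s, w in zip(frame, hamming(n_samples))]
    spectrum = []
    for k in range(n_samples // 2 + 1):
        acc = sum(
            windowed[n] * cmath.exp(-2j * math.pi * k * n / n_samples)
            for n in range(n_samples)
        )
        spectrum.append(abs(acc))
    return spectrum
```

Only bins 0 to N/2 are kept because the spectrum of a real-valued signal is symmetric about N/2.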

Next, the measuring unit 3 compresses the amplitude spectrum by applying a mel filter bank to it. A mel filter bank is a set of band-pass filters, for example triangular ones, arranged at equal intervals on the mel scale. The mel scale is a frequency axis that reflects human speech perception, and its unit is the mel. That is, the band-pass filters of the mel filter bank are spaced more narrowly at lower frequencies and more widely at higher frequencies. The number of band-pass filters is called the number of channels.
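A triangular mel filter bank of this kind might be built as in the sketch below. The specification does not give a mel conversion formula; the widely used form mel = 2595 * log10(1 + f / 700) is assumed here, and all function and parameter names (`hz_to_mel`, `mel_filter_bank`, `apply_filter_bank`, `n_channels`, `n_bins`, `sample_rate`) are hypothetical.

```python
import math


def hz_to_mel(f):
    """Convert Hz to mel (assumed formula, not from the specification)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filter_bank(n_channels, n_bins, sample_rate):
    """Return n_channels triangular filters over n_bins spectrum bins.

    Bin k is taken to lie at frequency k * (sample_rate / 2) / (n_bins - 1).
    """
    nyquist = sample_rate / 2.0
    max_mel = hz_to_mel(nyquist)
    # n_channels + 2 edge points, equally spaced on the mel scale
    edges_hz = [
        mel_to_hz(max_mel * i / (n_channels + 1))
        for i in range(n_channels + 2)
    ]
    bin_hz = [k * nyquist / (n_bins - 1) for k in range(n_bins)]
    bank = []
    for c in range(n_channels):
        lo, mid, hi = edges_hz[c], edges_hz[c + 1], edges_hz[c + 2]
        weights = []
        for f in bin_hz:
            if lo < f <= mid:
                weights.append((f - lo) / (mid - lo))  # rising edge
            elif mid < f < hi:
                weights.append((hi - f) / (hi - mid))  # falling edge
            else:
                weights.append(0.0)
        bank.append(weights)
    return bank


def apply_filter_bank(spectrum, bank):
    """Compress an amplitude spectrum to one value per channel."""
    return [sum(w * s for w, s in zip(weights, spectrum)) for weights in bank]
```

Because the edge points are equally spaced in mel, the triangles are narrow at low frequencies and wide at high frequencies when viewed on the Hz axis.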

The measuring unit 3 then regards the compressed sequence of values as a signal and applies a discrete cosine transform to it to obtain a cepstrum. The low-order components of the resulting cepstrum are the mel-frequency cepstral coefficients (MFCC), and the measuring unit 3 extracts these MFCC. In order of increasing order, the coefficients are written MFCC(1) to MFCC(20), and so on.
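The final step can be sketched as a discrete cosine transform over the filter bank outputs. Two details below are assumptions rather than statements from the specification: the logarithm taken before the transform (customary in MFCC computation) and the dropping of the 0th coefficient so that the returned list starts at MFCC(1). The names `dct_ii`, `mfcc_from_channels`, `n_coeffs`, and `floor` are hypothetical.

```python
import math


def dct_ii(values):
    """Unnormalized DCT-II: c(k) = sum_n v(n) * cos(pi * k * (n + 0.5) / N)."""
    n_values = len(values)
    return [
        sum(
            v * math.cos(math.pi * k * (n + 0.5) / n_values)
            for n, v in enumerate(values)
        )
        for k in range(n_values)
    ]


def mfcc_from_channels(channel_energies, n_coeffs=12, floor=1e-10):
    """Log-compress filter bank outputs, apply the DCT, keep low orders.

    Returns MFCC(1) .. MFCC(n_coeffs); the 0th coefficient is dropped.
    floor avoids taking the logarithm of zero.
    """
    log_energies = [math.log10(max(e, floor)) for e in channel_energies]
    cepstrum = dct_ii(log_energies)
    return cepstrum[1:n_coeffs + 1]
```

A flat set of channel values yields coefficients of zero for every order above the 0th, so the retained coefficients describe the shape of the spectral envelope and not its overall level.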

For example, as shown in FIGS. 5(A) and 5(B), when the distribution of MFCC(4) (the fourth coefficient) within consonants was examined for "low tongue (Lower tongue)", "mandibular prognathism (Mandibular)", "low tongue + mandibular prognathism (Mandibular + Lower tongue)", "healthy (Origin)", and "tongue protrusion (Protruding tongue)", the distribution of MFCC(4) differed greatly from one lingual position or lingual habit to another. FIGS. 5(A) and 5(B) are read in the same way as FIGS. 3(A) and 3(B).

FIGS. 6(A) to 6(E) show the measured values of MFCC(2) to MFCC(13) for "low tongue (Lower tongue)", "mandibular prognathism (Mandibular)", "low tongue + mandibular prognathism (Mandibular + Lower tongue)", "healthy (Origin)", and "tongue protrusion (Protruding tongue)". As shown in FIGS. 6(A) to 6(E), the change patterns (profiles) of MFCC(2) to MFCC(13) agree well within each lingual position or lingual habit, and the profiles differ between lingual positions and lingual habits.

The estimation unit 4 estimates the lingual position or lingual habit of the speaker h based on the measured number of zero crossings and mel-frequency cepstral coefficients (MFCC). Specifically, the storage unit 5 stores, as reference data 12, information on acoustic feature values obtained from audio data of utterances by a plurality of speakers (other than the speaker h) having the same lingual position or lingual habit. The estimation unit 4 estimates, as the lingual position or lingual habit of the speaker h, the lingual position or lingual habit associated with the reference data 12 closest to the measured acoustic feature values of the speaker h.

More specifically, the reference data 12 stored in the storage unit 5 are reference acoustic feature vectors whose elements are the number of zero crossings and the mel-frequency cepstral coefficients (MFCC) obtained from audio data of utterances by a plurality of speakers (excluding the speaker h) having the same lingual position or lingual habit. The estimation unit 4 estimates the lingual position or lingual habit of the speaker h by comparing an acoustic feature vector, whose elements are the number of zero crossings and the MFCC obtained from audio data of the utterance by the speaker h, with the reference data (reference acoustic feature vectors) 12.

When the reference acoustic feature vectors (reference data 12), whose elements are the number of zero crossings and MFCC(1) to MFCC(8) grouped by lingual position or lingual habit, are converted to a two-dimensional plane and plotted, the regions occupied by the vectors for "low tongue (Lower tongue)", "mandibular prognathism (Mandibular)", "low tongue + mandibular prognathism (Mandibular + Lower tongue)", "healthy (Origin)", and "tongue protrusion (Protruding tongue)" are clearly separated, as shown in FIG. 7. The estimation unit 4 measures the number of zero crossings and MFCC(1) to MFCC(8) from the audio data of the utterance by the speaker h and estimates the lingual position or lingual habit of the speaker h by determining which region the acoustic feature vector made up of those values belongs to. For example, if the acoustic feature vector falls within the "Lower tongue" region in the space shown in FIG. 7, the lingual position or lingual habit of the speaker h is estimated to be "low tongue (Lower tongue)".

As shown in FIG. 8, the lingual position/lingual habit determination device 1 includes, as its hardware configuration, a control unit 31, a main storage unit 32, an external storage unit 33, an operation unit 34, a display unit 35, and an input unit 36. The main storage unit 32, the external storage unit 33, the operation unit 34, the display unit 35, and the input unit 36 are all connected to the control unit 31 via an internal bus 30.

The control unit 31 is composed of a CPU (Central Processing Unit) and the like. The components of the lingual position/lingual habit determination device 1 shown in FIG. 1 are realized by this CPU executing a program 39 stored in the external storage unit 33.

The main storage unit 32 is composed of a RAM (Random-Access Memory) and the like. The program 39 stored in the external storage unit 33 is loaded into the main storage unit 32. The main storage unit 32 is also used as a work area (temporary data storage area) for the control unit 31.

The external storage unit 33 is composed of non-volatile memory such as flash memory, a hard disk, a DVD-RAM (Digital Versatile Disc Random-Access Memory), or a DVD-RW (Digital Versatile Disc ReWritable). The program 39 to be executed by the control unit 31 is stored in the external storage unit 33 in advance. In accordance with instructions from the control unit 31, the external storage unit 33 supplies the control unit 31 with the data used when the program 39 is executed and stores the data supplied from the control unit 31.

The measuring unit 3 and the estimation unit 4 described above correspond to the control unit 31, and the storage unit 5 corresponds to the main storage unit 32 and the external storage unit 33.

The operation unit 34 is composed of a keyboard, a pointing device such as a mouse, and an interface device that connects the keyboard, the pointing device, and the like to the internal bus 30. Information on the operations performed by the operator is input to the control unit 31 via the operation unit 34. Operation input from the operation unit 34 starts the operation of the audio input unit 2, the measuring unit 3, and the estimation unit 4.

The display unit 35 is composed of a CRT (Cathode Ray Tube), an LCD (Liquid Crystal Display), or the like, and displays an operation screen when the operator inputs operation information. The display unit 35 also displays, for example, the result of the lingual position determination.

The input unit 36 is composed of a microphone. The input unit 36 picks up ambient sound and outputs it to the internal bus 30 as audio data. The audio input unit 2 is made up of the control unit 31 and the input unit 36.

In addition, the device may have a communication interface capable of communicating over a communication network. Audio data received via such a communication interface can also be used for determination.

The components of the lingual position/lingual habit determination device 1 shown in FIG. 1 perform their functions when the program 39 is executed using the control unit 31, the main storage unit 32, the external storage unit 33, the operation unit 34, the display unit 35, the input unit 36, and the like as hardware resources.

Next, the operation of the lingual position/lingual habit determination device 1 according to the present embodiment will be described. FIG. 9 is a flowchart of the determination process executed by the lingual position/lingual habit determination device 1.

As shown in FIG. 9, the voice input unit 2 performs a voice input step of inputting voice data of the utterance of the speaker h (step S1). The voice input unit 2 stores the input voice data in the storage unit 5 as voice data 10.

Subsequently, the measurement unit 3 performs a measurement step of extracting the voice data to be determined from the voice data of the utterance of the speaker h and measuring the acoustic features related to the tongue position or tongue habit in the extracted voice data (step S2).

In step S2, as shown in FIG. 8, the measurement unit 3 first reads the voice data 10 and performs zero-crossing measurement, counting the number of times the waveform of the voice data 10 crosses the zero level (step S10). The measurement unit 3 then performs consonant section extraction, extracting the voice data of sections in which the number of zero crossings is equal to or greater than a threshold as the voice data of consonant sections (step S11). For the voice data extracted in this way, the measurement unit 3 performs acoustic feature calculation, computing the number of zero crossings and the values of MFCC(1) to MFCC(8) (step S12). In this acoustic feature calculation step, the measurement unit 3 stores the calculated acoustic features in the storage unit 5 as acoustic feature data 11.
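Steps S10 and S11 can be illustrated with a minimal sketch. The function names, the fixed non-overlapping framing, and the parameters `frame_len` and `threshold` are assumptions made for illustration; the text above does not specify frame length, overlap, or the threshold value.

```python
def zero_crossings(frame):
    """Count sign changes between consecutive samples in one frame."""
    count = 0
    for prev, cur in zip(frame, frame[1:]):
        # A crossing occurs when adjacent samples lie on opposite sides of zero.
        if (prev >= 0) != (cur >= 0):
            count += 1
    return count


def consonant_frames(samples, frame_len, threshold):
    """Return (start_index, zero_crossing_count) for frames at or above threshold.

    Framing is non-overlapping here, which is an assumption of this sketch.
    """
    selected = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        zc = zero_crossings(frame)
        if zc >= threshold:
            selected.append((start, zc))
    return selected
```

Frames that pass the threshold would then be handed to the feature calculation of step S12; the MFCC computation itself is not shown here.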

Returning to FIG. 9, the estimation unit 4 then performs an estimation step of estimating the tongue position or tongue habit of the speaker (subject) h based on the acoustic feature data 11 stored in the storage unit 5 (step S3). Basically, the estimation unit 4 calculates the distance between the acoustic feature data 11 (the acoustic feature vector of the speaker h) and each reference acoustic feature vector stored as reference data 12 for each tongue position and tongue habit, and outputs as the determination result the tongue position or tongue habit corresponding to the reference vector with the shortest distance. This determination result is, for example, displayed on a screen and can be presented to the speaker h, a doctor, or the like.
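The shortest-distance selection in step S3 might look like the following sketch. The Euclidean metric is an assumption, since the text says only "distance"; the function name `nearest_label` and the numeric vectors in the usage note are hypothetical.

```python
import math


def nearest_label(feature, references):
    """Return the label whose reference vector is closest to `feature`.

    `references` maps a tongue position/habit label to one reference vector.
    Euclidean distance is assumed; the source does not name the metric.
    """
    best_label = None
    best_dist = math.inf
    for label, ref in references.items():
        dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(feature, ref)))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label
```

For instance, with hypothetical two-element references `{"Origin": [10.0, 1.0], "Lower tongue": [20.0, 3.0]}`, the vector `[18.0, 2.5]` is assigned to "Lower tongue". In practice the vector would hold the zero-crossing count and MFCC(1) to MFCC(8), and some feature scaling may be needed so that no single element dominates the distance.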

The reference data 12 stored in the storage unit 5 may be prepared before the above processing is executed: the voices of a plurality of subjects whose tongue position or tongue habit is known are input through the voice input unit 2, and the acoustic features that the measurement unit 3 measures from that voice data are stored in the storage unit 5 as the reference data 12. Alternatively, reference data 12 built from a very large amount of data collected as a nationwide average may be stored in the storage unit 5.

As described above in detail, according to the present embodiment, the tongue position or tongue habit of the speaker h is estimated based on the acoustic features obtained from the voice data of the utterance of the speaker h, so the tongue position or tongue habit can be determined non-invasively.

In the above embodiment, the tongue position or tongue habit is determined using an acoustic feature vector whose elements are the number of zero crossings and MFCC(1) to MFCC(8), but the present invention is not limited to this. For example, MFCC(9) and higher-order coefficients may also be included as elements of the vector.

The method of calculating the mel-frequency cepstrum coefficients is also not limited to the one described above. For example, the harmonic components may be emphasized by a harmonic filter (high-pass filter) other than the pre-emphasis filter. Instead of the Hamming window, another window function such as a rectangular window, a Gaussian window, or a Hann window may be used. Further, the frequency transform may be performed using a fast Fourier transform instead of the discrete cosine transform.
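To make the window-function alternatives concrete, the sketch below generates Hamming and Hann windows from their standard symmetric textbook definitions. The function names are hypothetical, and the coefficients 0.54/0.46 are the conventional Hamming values rather than values taken from this document.

```python
import math


def cosine_window(n, a0):
    """Generalized cosine window: w[k] = a0 - (1 - a0) * cos(2*pi*k / (n - 1))."""
    if n < 2:
        return [1.0] * n  # degenerate lengths: no tapering possible
    return [a0 - (1.0 - a0) * math.cos(2.0 * math.pi * k / (n - 1))
            for k in range(n)]


def hamming(n):
    """Hamming window (a0 = 0.54); endpoints taper to about 0.08."""
    return cosine_window(n, 0.54)


def hann(n):
    """Hann window (a0 = 0.5); endpoints taper to 0."""
    return cosine_window(n, 0.5)


def apply_window(frame, window):
    """Multiply a frame by a window of the same length, sample by sample."""
    return [s * w for s, w in zip(frame, window)]
```

A rectangular window corresponds to leaving the frame unchanged, so swapping windows only changes the list passed to `apply_window`.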

In the above embodiment, the speaker h utters a fixed phrase, for example "ishi-ishi-ishi...", but the present invention is not limited to this. The phrase uttered by the speaker h may be any other phrase containing "S", a consonant in which a formant exists.

In the above embodiment, "healthy (Origin)", "low tongue (Lower tongue)", "mandibular prognathism (Mandibular)", and "tongue protrusion (Protruding tongue)" are determined as the tongue positions and tongue habits. However, the present invention is not limited to this. Other tongue positions and tongue habits may be used as determination targets. For example, a tongue habit in which the tongue is held between the upper and lower front teeth 20 and 21 may be detected.

The determination may also be limited to "healthy (Origin)" versus "low tongue (Lower tongue)". That is, the apparatus may be configured to determine only some of the tongue positions and tongue habits described above.

In the above embodiment, the tongue position or tongue habit of the speaker h is determined using the number of zero crossings and the mel-frequency cepstrum coefficients of the voice data as the acoustic features, but the present invention is not limited to this. For example, the tongue position or tongue habit of the speaker h can also be determined from the number of zero crossings alone. The distribution of the number of zero crossings of the speaker h may be measured and compared with the distribution for each tongue position and tongue habit (reference data 12), and the tongue position or tongue habit whose distribution curve is closest may be taken as the determination result. Thus, even with a single acoustic feature, the tongue position or tongue habit of the speaker h can be determined by a statistical method. Other acoustic features may also be used, provided that they allow the tongue position or tongue habit of the speaker h to be determined.
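One way to compare zero-crossing distributions is sketched below, under loud assumptions: each distribution is approximated by a normalized histogram over shared bins, and closeness is measured by the sum of absolute bin differences. The text does not state how "closest distribution curve" is computed, and the function names and bin edges are hypothetical.

```python
def histogram(values, bin_edges):
    """Normalized histogram of `values` over half-open bins [edge_i, edge_i+1)."""
    counts = [0] * (len(bin_edges) - 1)
    for v in values:
        for i in range(len(counts)):
            if bin_edges[i] <= v < bin_edges[i + 1]:
                counts[i] += 1
                break
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(counts)


def closest_distribution(values, reference_values, bin_edges):
    """Return the label whose reference histogram is closest to that of `values`.

    `reference_values` maps a label to a list of zero-crossing counts.
    The L1 distance between histograms is an assumption of this sketch.
    """
    target = histogram(values, bin_edges)

    def l1_distance(label):
        ref = histogram(reference_values[label], bin_edges)
        return sum(abs(a - b) for a, b in zip(target, ref))

    return min(reference_values, key=l1_distance)
```

Other measures of distribution similarity could be substituted for the L1 distance without changing the overall structure.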

In the present embodiment, the tongue position/tongue habit determination apparatus 1 includes the voice input unit 2, but the present invention is not limited to this. That is, the voice input unit 2 need not be provided. For example, a tongue position/tongue habit determination apparatus that determines the tongue position from voice data sent from a remote location may be used.

In the above embodiment, the tongue position/tongue habit determination apparatus 1 is, for example, a mobile phone, a smartphone, a voice recorder, or a personal computer, but it is not limited to these. The tongue position/tongue habit determination apparatus 1 may be a dedicated apparatus.

In addition, the hardware configuration and software configuration of the tongue position/tongue habit determination apparatus 1 are merely examples and can be changed and modified as desired.

The central part that performs the processing of the tongue position/tongue habit determination apparatus 1, comprising the control unit 31, the main storage unit 32, the external storage unit 33, the operation unit 34, the display unit 35, the input unit 36, the internal bus 30, and the like, can be realized using an ordinary computer system rather than a dedicated system, as described above. For example, a computer program for executing the above operations may be stored on a computer-readable recording medium (flexible disk, CD-ROM, DVD-ROM, or the like) and distributed, and the tongue position/tongue habit determination apparatus 1 that executes the above processing may be configured by installing the computer program on a computer. Alternatively, the computer program may be stored in a storage device of a server on a communication network such as the Internet, and the tongue position/tongue habit determination apparatus 1 may be configured by an ordinary computer system downloading the program.

When the functions of the computer are realized by dividing them between an OS (operating system) and an application program, or by cooperation between the OS and an application program, only the application program portion may be stored on the recording medium or in the storage device.

It is also possible to superimpose the computer program on a carrier wave and distribute it via a communication network. For example, the computer program may be posted on a bulletin board system (BBS) on a communication network and distributed via the network. The apparatus may then be configured so that the above processing can be executed by starting this computer program and running it under the control of the OS in the same manner as other application programs.

Various embodiments and modifications of the present invention are possible without departing from its broad spirit and scope. The embodiment described above is intended to explain the present invention and does not limit its scope. In other words, the scope of the present invention is defined by the claims, not by the embodiment. Various modifications made within the scope of the claims, and within the meaning of inventions equivalent to them, are regarded as falling within the scope of the present invention.

The present invention is useful for estimating the tongue position and tongue habit of a speaker.

DESCRIPTION OF SYMBOLS: 1 tongue position/tongue habit determination apparatus, 2 voice input unit, 3 measurement unit, 4 estimation unit, 5 storage unit, 10 voice data, 11 acoustic feature data, 12 reference data, 13 reference data, 20, 21 front teeth, 22 tongue, 30 internal bus, 31 control unit, 32 main storage unit, 33 external storage unit, 34 operation unit, 35 display unit, 36 input unit, 39 program, h speaker

Claims

A tongue position/tongue habit determination apparatus comprising:
a measurement unit that measures, based on voice data of a speaker's utterance, an acoustic feature related to the tongue position and tongue habit of the speaker; and
an estimation unit that estimates the tongue position or tongue habit of the speaker based on the measured acoustic feature.

The tongue position/tongue habit determination apparatus according to claim 1, wherein
the measurement unit measures, as the acoustic feature, the number of zero crossings, which is the number of times the waveform of the input voice data crosses the zero level or a fixed band near the zero level, and
the estimation unit estimates the tongue position or tongue habit of the speaker based on the measured number of zero crossings.

The tongue position/tongue habit determination apparatus according to claim 1 or 2, wherein
the measurement unit measures, as the acoustic feature, the mel-frequency cepstrum coefficients of the input voice data, and
the estimation unit estimates the tongue position or tongue habit of the speaker
based on the measured mel-frequency cepstrum coefficients.

The tongue position/tongue habit determination apparatus according to any one of claims 1 to 3, wherein
each tongue position or tongue habit is stored in association with information on a reference acoustic feature, and
the estimation unit estimates, as the tongue position or tongue habit of the speaker, the tongue position or tongue habit associated with the reference acoustic feature closest to the measured acoustic feature.

The tongue position/tongue habit determination apparatus according to claim 4, wherein
a reference acoustic feature vector, whose elements are the number of zero crossings and the mel-frequency cepstrum coefficients obtained from voice data of utterances by a plurality of speakers having the same tongue position or tongue habit, is stored as the information on the reference acoustic feature, and
the estimation unit estimates the tongue position or tongue habit by comparing an acoustic feature vector, whose elements are the number of zero crossings and the mel-frequency cepstrum coefficients obtained from voice data of the subject's utterance, with the reference acoustic feature vector.

The tongue position/tongue habit determination apparatus according to any one of claims 1 to 5, wherein
the measurement unit extracts the voice data of a consonant section as the voice data to be determined.

The tongue position/tongue habit determination apparatus according to claim 6, wherein
the measurement unit extracts, as the voice data of the consonant section, the voice data of a section in which the number of zero crossings of the voice data is equal to or greater than a threshold.

A tongue position/tongue habit determination method comprising:
a measurement step of measuring, based on voice data of a speaker's utterance, an acoustic feature related to the tongue position and tongue habit of the speaker; and
an estimation step of estimating the tongue position or tongue habit of the speaker based on the measured acoustic feature.

A program for causing a computer
to function as:
a measurement unit that measures, based on voice data of a speaker's utterance, an acoustic feature related to the tongue position and tongue habit of the speaker; and
an estimation unit that estimates the tongue position or tongue habit of the speaker based on the measured acoustic feature.