JP2005308993A

JP2005308993A - Learning support system

Info

Publication number: JP2005308993A
Application number: JP2004124406A
Authority: JP
Inventors: Etsuko Ebara (江原 枝津子); Koichi Fujita (藤田 孝一)
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 2004-04-20
Filing date: 2004-04-20
Publication date: 2005-11-04

Abstract

PROBLEM TO BE SOLVED: To provide a learning support system that reduces the influence of individual differences between speakers and makes it easy to compare the speech waveforms of the teaching material and the learner.

SOLUTION: A waveform display processing part 103 generates a pitch waveform comparison image for comparing teaching material pitch waveform information with learner pitch waveform information. In doing so, the waveform display processing part 103 adjusts the pitch waveform display position to absorb the difference between the representative pitch of the teaching material speech and the representative pitch of the learner. The waveform display processing part 103 also generates a sound pressure waveform comparison image for comparing teaching material sound pressure waveform information with learner sound pressure waveform information. In this case, it adjusts the sound pressure waveform to absorb the difference between the sound pressure variation width of the teaching material speech and that of the learner's speech.

COPYRIGHT: (C)2006,JPO&NCIPI

Description

The present invention relates to a learning support system having a function of extracting a pitch waveform and a sound pressure waveform from speech and displaying them.

Learning support systems for language learning that apply multimedia technology have been proposed (see, for example, Patent Document 1). As a convenient way to compare the teaching material's speech visually with the learner's speech, a conventional learning support system can display sound pressure waveforms and pitch waveforms. A sound pressure waveform and a pitch waveform are extracted from the teaching material's speech, and likewise from the learner's input speech. These waveforms are displayed on the screen in response to the learner's operations.

In one example of a conventional learning support system, the sound pressure and pitch waveforms are displayed as follows. First, an audio file of the teaching material is loaded, and its sound pressure waveform is displayed in the upper area of the screen. Next, the learner speaks, the speech is recorded, and the sound pressure waveform of the recording is displayed in the lower area. The portions to be compared are then selected in the upper and lower areas. When the button that opens the pitch extraction waveform screen is pressed, that screen opens, and the pitch waveforms of the selected ranges of the teaching material speech and the learner's speech are displayed separately in upper and lower panels.

In speech practice, the learner uses these functions: looking at the waveform comparison image, the learner compares the teaching material's waveform with his or her own and tries to bring his or her waveform closer to the teaching material's. Such practice helps the learner improve his or her language skills.

Patent Document 1: JP 2002-23613 (pages 3-4, FIG. 1)

However, as described below, the conventional learning support system has the problem that the waveforms are hard to compare, because the waveform comparison image is affected by individual differences between the teaching material's speaker and the learner.

For example, suppose the teaching material's speaker speaks louder than the learner. When the learner speaks normally, the sound pressure waveforms differ by the difference in loudness. To match the teaching material's sound pressure waveform, the learner must speak unusually loudly. Because the learner's attention is drawn to loudness, the practice may become less effective.

As this example shows, sound pressure depends on the loudness of the individual speaker's voice and also on the position of the microphone. Pitch varies between individuals, for example with the speaker's gender, and also changes with the speaker's emotional state. These individual differences are reflected directly in the sound pressure and pitch waveforms. Consequently, to match the teaching material's waveform, the learner must speak at an unusual volume or an unusual pitch. Such vocal adjustment, however, has nothing to do with improving speaking skill. Rather, as the example above illustrates, such practice may end up merely imitating utterance features unrelated to speaking technique.

To absorb these individual differences, the learner could adjust the microphone input volume to suit the teaching material, or a pitch search frequency band could be set according to gender. However, such settings must be made every time the teaching material changes and require effort and knowledge of speech acoustics, so it is undesirable to require them of the learner.

In addition, the conventional system displays the teaching material's waveform and the learner's waveform in separate areas; because the two waveforms are far apart on the screen, they are not easy to compare.

The present invention has been made to solve the above problems. Its object is to provide a learning support system that reduces the influence of individual differences between speakers so that the speech waveforms of the teaching material and the learner can be compared easily.

The learning support system of the present invention includes: teaching material pitch acquisition means for acquiring teaching material pitch waveform information, which is the pitch waveform of the teaching material speech; learner pitch acquisition means for acquiring learner pitch waveform information, which is the pitch waveform of the learner's speech; pitch waveform comparison image generating means for generating a pitch waveform comparison image for comparing the teaching material pitch waveform information with the learner pitch waveform information; and display position adjusting means, provided in the pitch waveform comparison image generating means, for adjusting the pitch waveform display position so as to absorb the difference between a representative pitch of the teaching material speech and a representative pitch of the learner.

With this configuration, the pitch waveform display position is adjusted to absorb the difference between the representative pitch of the teaching material speech and that of the learner. The system can therefore provide a comparison image in which the pitch waveforms of the teaching material and the learner are drawn at positions that are easy to compare, without the learner forcing his or her voice pitch to match the teaching material. This reduces the influence of individual differences between speakers and makes the speech waveforms of the teaching material and the learner easy to compare.

In the learning support system of the present invention, the display position adjusting means uses the average pitch as the representative pitch. This makes it possible to provide a comparison image in which the difference between the average pitches of the teaching material and the learner is absorbed.
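As a rough illustration of absorbing the average-pitch difference, the following Python sketch shifts the learner's pitch contour by the difference between the two mean pitches. It assumes pitch contours are per-frame F0 values in Hz with `None` for unvoiced frames; `mean_pitch` and `align_learner_pitch` are hypothetical names, not part of the disclosed system.

```python
from statistics import mean
from typing import List, Optional

# A pitch contour is a list of F0 values in Hz, one per analysis frame.
# None marks an unvoiced frame with no detected pitch (assumed convention).
PitchContour = List[Optional[float]]


def mean_pitch(contour: PitchContour) -> float:
    """Representative pitch: the average F0 over voiced frames only."""
    voiced = [f for f in contour if f is not None]
    if not voiced:
        raise ValueError("contour has no voiced frames")
    return mean(voiced)


def align_learner_pitch(teacher: PitchContour, learner: PitchContour) -> PitchContour:
    """Shift the learner contour so that its average pitch equals the teacher's.

    Only the vertical display position changes; the contour's shape
    (its rises and falls) is preserved for comparison.
    """
    offset = mean_pitch(teacher) - mean_pitch(learner)
    return [None if f is None else f + offset for f in learner]


if __name__ == "__main__":
    teacher = [200.0, 220.0, None, 240.0]  # mean 220 Hz
    learner = [100.0, 120.0, None, 140.0]  # mean 120 Hz, same intonation shape
    print(align_learner_pitch(teacher, learner))  # [200.0, 220.0, None, 240.0]
```

A shift in hertz is the most literal reading of "absorbing the difference"; shifting on a semitone (logarithmic) scale is a possible alternative that better preserves perceived interval sizes, but the source does not say which is used.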

In the learning support system of the present invention, the pitch waveform comparison image generating means also superimposes the teaching material pitch waveform information and the learner pitch waveform information. This makes the pitches of the teaching material and the learner's speech even easier to compare.

In the learning support system of the present invention, the pitch waveform comparison image generating means may generate a pitch waveform image in real time from the learner's input speech, with the display position adjusting means adjusting the display position using a representative pitch obtained from the learner's previously recorded speech. This allows the pitch waveform display position to be adjusted in real time.

Alternatively, in the learning support system of the present invention, the pitch waveform comparison image generating means generates a pitch waveform image in real time from the learner's input speech, and the display position adjusting means adjusts the display position using the pitch of the initial portion of the learner's input speech. This configuration also allows the pitch waveform display position to be adjusted in real time.
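The two real-time variants (a representative pitch taken from past recordings, or one taken from the initial portion of the current utterance) could be sketched as a small streaming helper. `RealtimePitchAligner`, its parameters, and the five-frame default warm-up are illustrative assumptions; in this sketch, frames received during the warm-up are simply not drawn.

```python
from typing import List, Optional


class RealtimePitchAligner:
    """Shift live learner pitch frames by (teacher pitch - learner pitch).

    Hypothetical helper. The learner's representative pitch is either a value
    stored from earlier recordings (stored_learner_pitch) or, when none is
    given, the average of the first `warmup_frames` voiced frames.
    """

    def __init__(self, teacher_pitch: float,
                 stored_learner_pitch: Optional[float] = None,
                 warmup_frames: int = 5) -> None:
        self.teacher_pitch = teacher_pitch
        self.learner_pitch = stored_learner_pitch
        self.warmup_frames = warmup_frames
        self._warmup: List[float] = []

    def push(self, f0: Optional[float]) -> Optional[float]:
        """Return the display value for one frame, or None if nothing is drawn."""
        if f0 is None:
            return None  # unvoiced frame
        if self.learner_pitch is None:
            # Initial stage: collect voiced frames until the estimate is ready.
            self._warmup.append(f0)
            if len(self._warmup) < self.warmup_frames:
                return None
            self.learner_pitch = sum(self._warmup) / len(self._warmup)
        return f0 + (self.teacher_pitch - self.learner_pitch)


if __name__ == "__main__":
    live = RealtimePitchAligner(teacher_pitch=220.0, warmup_frames=2)
    print([live.push(f) for f in [100.0, 140.0, None, 130.0]])
    # [None, 240.0, None, 230.0]
```

Passing `stored_learner_pitch` corresponds to the past-recording variant and skips the warm-up entirely.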

The learning support system of the present invention further includes variation width adding means, provided in the pitch waveform comparison image generating means, for adding information indicating the pitch variation width of at least one of the teaching material pitch waveform information and the learner pitch waveform information to the pitch waveform comparison image.

With this configuration, the learner can easily compare the pitch waveforms of the teaching material and the learner by focusing on the pitch variation width.

In the learning support system of the present invention, the variation width adding means adds information indicating the highest and lowest pitch of the speech to the pitch waveform comparison image. Because the highest and lowest pitches are displayed, the pitch is even easier to grasp.
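A minimal way to derive the highest/lowest pitch annotation, assuming the same per-frame contour format (Hz, `None` for unvoiced frames); the function name and the label text are hypothetical:

```python
from typing import List, Optional, Tuple


def pitch_range_label(contour: List[Optional[float]]) -> Tuple[float, float, str]:
    """Return (highest, lowest, label) for the voiced frames of a contour.

    The label format is an illustrative choice for annotating the
    pitch waveform comparison image.
    """
    voiced = [f for f in contour if f is not None]
    if not voiced:
        raise ValueError("contour has no voiced frames")
    highest, lowest = max(voiced), min(voiced)
    label = f"max {highest:.0f} Hz / min {lowest:.0f} Hz (width {highest - lowest:.0f} Hz)"
    return highest, lowest, label


if __name__ == "__main__":
    print(pitch_range_label([180.0, None, 250.0, 210.0]))
    # (250.0, 180.0, 'max 250 Hz / min 180 Hz (width 70 Hz)')
```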

A learning support system according to another aspect of the present invention includes: teaching material sound pressure acquisition means for acquiring teaching material sound pressure waveform information, which is the sound pressure waveform of the teaching material speech; learner sound pressure acquisition means for acquiring learner sound pressure waveform information, which is the sound pressure waveform of the learner's speech; sound pressure waveform comparison image generating means for generating a sound pressure waveform comparison image for comparing the teaching material sound pressure waveform information with the learner sound pressure waveform information; and waveform adjusting means, provided in the sound pressure waveform comparison image generating means, for adjusting the sound pressure waveform so as to absorb the difference between the sound pressure variation width of the teaching material speech and that of the learner's speech.

With this configuration, the difference between the sound pressure variation widths of the teaching material speech and the learner's speech is absorbed, so the system can provide a comparison image in which the two sound pressure waveforms are drawn in an easily comparable form, without the learner forcing his or her loudness to match the teaching material. This reduces the influence of individual differences between speakers and makes the speech waveforms of the teaching material and the learner easy to compare.

In the learning support system of the present invention, the waveform adjusting means uses, as the sound pressure variation width, the range between the maximum and minimum sound pressure of the speech. This makes it possible to provide a comparison image in which the difference between the sound pressure dynamic ranges of the teaching material and the learner is absorbed.
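Absorbing the difference in dynamic range can be read as a linear mapping of the learner's [minimum, maximum] sound pressure onto the teaching material's. The sketch below assumes per-frame levels in dB; `match_dynamic_range` and the mid-range placement of a flat envelope are illustrative choices, not taken from the source.

```python
from typing import List


def match_dynamic_range(teacher: List[float], learner: List[float]) -> List[float]:
    """Linearly map the learner's [min, max] sound pressure onto the teacher's.

    Levels are assumed to be per-frame values in dB. After mapping, the
    learner's quietest and loudest frames line up with the teacher's, so
    the envelope shapes can be compared regardless of absolute loudness.
    """
    t_lo, t_hi = min(teacher), max(teacher)
    l_lo, l_hi = min(learner), max(learner)
    if l_hi == l_lo:
        # A flat learner envelope has no variation to stretch; draw it at
        # the middle of the teacher's range (an arbitrary choice).
        return [(t_lo + t_hi) / 2.0 for _ in learner]
    scale = (t_hi - t_lo) / (l_hi - l_lo)
    return [t_lo + (x - l_lo) * scale for x in learner]


if __name__ == "__main__":
    print(match_dynamic_range([40.0, 60.0, 80.0], [30.0, 35.0, 40.0]))
    # [40.0, 60.0, 80.0]
```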

In the learning support system of the present invention, the sound pressure waveform comparison image generating means superimposes the teaching material sound pressure waveform information and the learner sound pressure waveform information. This makes the sound pressures of the teaching material and the learner's speech even easier to compare.

In the learning support system of the present invention, the sound pressure waveform comparison image generating means may generate a sound pressure waveform image in real time from the learner's input speech, with the waveform adjusting means adjusting the sound pressure waveform using the sound pressure variation width obtained from the learner's previously recorded speech. This allows a sound pressure waveform adjusted in real time to be displayed.

Alternatively, in the learning support system of the present invention, the sound pressure waveform comparison image generating means generates a sound pressure waveform image in real time from the learner's input speech, and the display position adjusting means adjusts the display position using the sound pressure of the initial portion of the learner's input speech. This configuration also allows a sound pressure waveform adjusted in real time to be displayed.
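For the real-time sound pressure variants, a streaming counterpart might look like the following. `RealtimeLevelScaler` is hypothetical; the learner's range comes either from a stored value (past recordings) or from the first frames of the utterance, and the ten-frame warm-up is an arbitrary assumption.

```python
from typing import List, Optional, Tuple


class RealtimeLevelScaler:
    """Map live learner sound pressure frames (dB) onto the teacher's range.

    Hypothetical helper. The learner's (min, max) range comes either from
    earlier recordings (stored_range) or from the first `warmup_frames`
    frames of the current utterance.
    """

    def __init__(self, teacher_lo: float, teacher_hi: float,
                 stored_range: Optional[Tuple[float, float]] = None,
                 warmup_frames: int = 10) -> None:
        if stored_range is not None and stored_range[1] <= stored_range[0]:
            raise ValueError("stored_range must satisfy min < max")
        self.teacher_lo, self.teacher_hi = teacher_lo, teacher_hi
        self.learner_range = stored_range
        self.warmup_frames = warmup_frames
        self._warmup: List[float] = []

    def push(self, level: float) -> Optional[float]:
        """Return the scaled level for one frame, or None during warm-up."""
        if self.learner_range is None:
            self._warmup.append(level)
            lo, hi = min(self._warmup), max(self._warmup)
            # Keep collecting until enough frames arrived and the level varies.
            if len(self._warmup) < self.warmup_frames or hi == lo:
                return None
            self.learner_range = (lo, hi)
        l_lo, l_hi = self.learner_range
        scale = (self.teacher_hi - self.teacher_lo) / (l_hi - l_lo)
        return self.teacher_lo + (level - l_lo) * scale


if __name__ == "__main__":
    live = RealtimeLevelScaler(40.0, 80.0, warmup_frames=2)
    print([live.push(x) for x in [30.0, 40.0, 35.0]])  # [None, 80.0, 60.0]
```

Frames louder or quieter than the learner range used for scaling are mapped outside the teaching material's range; a display might clip them or widen the range over time, which the source does not specify.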

The learning support system of the present invention further includes variation width adding means, provided in the sound pressure waveform comparison image generating means, for adding information indicating the sound pressure variation width of at least one of the teaching material sound pressure waveform information and the learner sound pressure waveform information to the sound pressure waveform comparison image.

With this configuration, the learner can easily compare the sound pressure waveforms of the teaching material and the learner by focusing on the sound pressure variation width.

In the learning support system of the present invention, the variation width adding means adds information indicating the maximum and minimum sound pressure of the speech to the sound pressure waveform comparison image. Because the maximum and minimum sound pressures are displayed, the sound pressure is even easier to grasp.

A learning support system according to yet another aspect of the present invention includes: teaching material waveform acquisition means for acquiring teaching material waveform information, which is the waveform of a speech parameter of the teaching material; learner waveform acquisition means for acquiring learner waveform information, which is the waveform of the learner's speech parameter; waveform comparison image generating means for generating a waveform comparison image for comparing the teaching material waveform information with the learner waveform information; and adjusting means, provided in the waveform comparison image generating means, for adjusting the waveform image so as to absorb the difference between the representative values of the speech parameter of the teaching material and of the learner. The speech parameter is, for example, the pitch or sound pressure described above. This configuration also provides the advantages of the present invention described above: it reduces the influence of individual differences between speakers and makes the speech waveforms of the teaching material and the learner easy to compare.

The speech information processing method of the present invention includes the steps of: acquiring teaching material pitch waveform information, which is the pitch waveform of the teaching material speech; acquiring learner pitch waveform information, which is the pitch waveform of the learner's speech; generating a pitch waveform comparison image for comparing the teaching material pitch waveform information with the learner pitch waveform information; and, when generating the pitch waveform comparison image, adjusting the pitch waveform display position so as to absorb the difference between the representative pitch of the teaching material speech and the representative pitch of the learner. This configuration also provides the advantages of the present invention described above.

A speech information processing method according to another aspect of the present invention includes the steps of: acquiring teaching material sound pressure waveform information, which is the sound pressure waveform of the teaching material speech; acquiring learner sound pressure waveform information, which is the sound pressure waveform of the learner's speech; generating a sound pressure waveform comparison image for comparing the teaching material sound pressure waveform information with the learner sound pressure waveform information; and, when generating the sound pressure waveform comparison image, adjusting the sound pressure waveform so as to absorb the difference between the sound pressure variation width of the teaching material speech and that of the learner's speech. This configuration also provides the advantages of the present invention described above.

The speech information processing program of the present invention causes a computer to execute the steps of: acquiring teaching material pitch waveform information, which is the pitch waveform of the teaching material speech; acquiring learner pitch waveform information, which is the pitch waveform of the learner's speech; generating a pitch waveform comparison image for comparing the teaching material pitch waveform information with the learner pitch waveform information; and, when generating the pitch waveform comparison image, adjusting the pitch waveform display position so as to absorb the difference between the representative pitch of the teaching material speech and the representative pitch of the learner. This configuration also provides the advantages of the present invention described above.

A speech information processing program according to another aspect of the present invention causes a computer to execute the steps of: acquiring teaching material sound pressure waveform information, which is the sound pressure waveform of the teaching material speech; acquiring learner sound pressure waveform information, which is the sound pressure waveform of the learner's speech; generating a sound pressure waveform comparison image for comparing the teaching material sound pressure waveform information with the learner sound pressure waveform information; and, when generating the sound pressure waveform comparison image, adjusting the sound pressure waveform so as to absorb the difference between the sound pressure variation width of the teaching material speech and that of the learner's speech. This configuration also provides the advantages of the present invention described above.

Because the present invention provides adjusting means that absorb differences, such as the difference in representative pitch, between the teaching material speech and the learner's speech, it can provide a comparison image in which the teaching material and the learner are easy to compare, without the learner forcing his or her voice pitch or other characteristics to match the teaching material. The invention therefore provides a learning support system that reduces the influence of individual differences between speakers and allows the speech waveforms of the teaching material and the learner to be compared easily.

A learning support system according to an embodiment of the present invention is described below with reference to the drawings.

FIG. 1 shows a learning support system according to an embodiment of the present invention. In the following description, the waveform of the teaching material speech is called the teaching material waveform, and the waveform of the learner's speech is called the learner waveform. Pitch waveforms and sound pressure waveforms are used as the waveforms. An image for comparing the teaching material waveform with the learner waveform is called a waveform comparison image.

In FIG. 1, the learning support system 1 includes a learner terminal 10 and a server terminal 20 connected via a network. Although not shown, a plurality of learner terminals 10 with the same configuration are connected to the network. The learner terminal 10 and the server terminal 20 are computers, and the processing functions of each terminal are realized by the CPU executing programs installed on the computer. The network may be a LAN, for example within a school, or the Internet. Running this system on the Web makes it possible to provide pronunciation correction through e-learning.

Although not shown, a teacher terminal that controls the server terminal 20 is also connected to the network. Alternatively, no teacher terminal may be provided, and the teacher may operate the server terminal 20 directly.

As shown in FIG. 1, the learner terminal 10 includes a voice input/output unit 101, a voice extraction calculation processing unit 102, a waveform display processing unit 103, a display unit 104, an authentication processing unit 105, a voice-related information storage unit 106, a sound pressure calculation processing unit 107, an audio teaching material data storage unit 108, a learner voice recording unit 109, and an operation unit 110.

The voice input/output unit 101 consists of headphones and a microphone; it outputs the teaching material speech and inputs the learner's speech.

The voice extraction calculation processing unit 102 has a computation function for extracting pitch from speech, and also controls data input and output for the audio teaching material data storage unit 108, the voice-related information storage unit 106, and the learner voice recording unit 109. The pitch extraction function performs processing such as FFT or autocorrelation. Pitch is extracted from both the teaching material speech and the learner's speech. The voice extraction calculation processing unit 102 also supplies the teaching material speech to the headphones of the voice input/output unit 101 and acquires the learner's speech input to its microphone.
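The source names FFT or autocorrelation as pitch extraction methods without giving details. A minimal autocorrelation sketch for a single analysis frame follows; the 60 to 400 Hz search band, the 0.3 voicing threshold, and the function name are assumptions, and practical extractors add refinements such as windowing and octave-error checks.

```python
import math
from typing import Optional, Sequence


def estimate_pitch_autocorr(frame: Sequence[float], sample_rate: int,
                            f_min: float = 60.0, f_max: float = 400.0,
                            voicing_threshold: float = 0.3) -> Optional[float]:
    """Estimate F0 (Hz) of one frame by picking the autocorrelation peak.

    Returns None for silent or unvoiced frames. The search band
    [f_min, f_max] and the voicing threshold are illustrative defaults.
    """
    n = len(frame)
    if n == 0:
        return None
    avg = sum(frame) / n
    x = [s - avg for s in frame]          # remove DC offset
    energy = sum(s * s for s in x)        # autocorrelation at lag 0
    if energy == 0.0:
        return None
    lag_min = max(1, int(sample_rate / f_max))      # shortest period searched
    lag_max = min(n - 1, int(sample_rate / f_min))  # longest period searched
    best_lag, best_r = 0, 0.0
    for lag in range(lag_min, lag_max + 1):
        # Normalized autocorrelation between the frame and its lagged copy.
        r = sum(x[i] * x[i + lag] for i in range(n - lag)) / energy
        if r > best_r:
            best_lag, best_r = lag, r
    if best_lag == 0 or best_r < voicing_threshold:
        return None  # no sufficiently periodic structure found
    return sample_rate / best_lag


if __name__ == "__main__":
    sr = 8000
    tone = [math.sin(2 * math.pi * 200 * t / sr) for t in range(400)]  # 50 ms at 200 Hz
    print(estimate_pitch_autocorr(tone, sr))  # 200.0
```

Running this frame by frame over an utterance yields the per-frame pitch contour, with `None` for unvoiced frames, that the display sketches above assume.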

The sound pressure calculation processing unit 107 calculates sound pressure from the speech data supplied by the voice extraction calculation processing unit 102. Sound pressure is likewise calculated from both the teaching material speech and the learner's speech. The waveform display processing unit 103 performs the processing for displaying pitch waveforms and sound pressure waveforms: it generates pitch waveform images from the pitch supplied by the voice extraction calculation processing unit 102, and sound pressure waveform images from the sound pressure supplied by the sound pressure calculation processing unit 107. The pitch and sound pressure waveforms are shown on the display unit 104, which consists of a display.
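The source does not state how sound pressure is computed. One common approach, shown here as an assumption, is the per-frame RMS level in dB; without microphone calibration the result is a relative level (dB relative to full scale), not an absolute sound pressure. `frame_levels_db` and its `floor_db` default are hypothetical.

```python
import math
from typing import List, Sequence


def frame_levels_db(samples: Sequence[float], frame_len: int, hop: int,
                    floor_db: float = -100.0) -> List[float]:
    """Per-frame RMS level in dB relative to full scale (samples in [-1, 1]).

    Without microphone calibration this is a relative level, not an absolute
    sound pressure in dB SPL. Silent frames are clamped to floor_db.
    """
    levels = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        rms = math.sqrt(sum(s * s for s in frame) / frame_len)
        levels.append(20.0 * math.log10(rms) if rms > 0 else floor_db)
    return levels


if __name__ == "__main__":
    print(frame_levels_db([1.0, -1.0, 1.0, -1.0], frame_len=2, hop=2))  # [0.0, 0.0]
```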

The audio teaching material data storage unit 108 stores audio teaching material data. This data is supplied to the learner terminal 10 from the audio teaching material database 204 of the server terminal 20 and stored in the audio teaching material data storage unit 108 by the voice extraction calculation processing unit 102. The learner voice recording unit 109 stores the learner's speech input to the voice input/output unit 101 as audio files.

The voice-related information storage unit 106 stores information obtained from the learner's past learning (hereinafter, voice-related information). Voice-related information includes "sound pressure (voice intensity)", "pitch band (average, highest, and lowest values)", "frequency characteristics of vowels and consonants", "microphone input level", and "learning evaluation results". This information is held for each learner and depends on the usage environment. A learning evaluation result is a measured value of the difference between the teaching material waveform and the learner waveform. The voice-related information is obtained from the server terminal 20 via the authentication processing unit 105 described below and written into the voice-related information storage unit 106 by the voice extraction calculation processing unit 102. The voice extraction calculation processing unit 102 then updates the voice-related information to reflect learning done on the learner terminal 10.
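One way to hold the per-learner voice-related information listed above is a simple record type. The field names and the moving-average update policy below are hypothetical; the source only says the information is updated to reflect learning on the learner terminal.

```python
from dataclasses import asdict, dataclass, field
from typing import Dict, List


@dataclass
class VoiceRelatedInfo:
    """Per-learner voice-related information (field names are hypothetical)."""
    learner_id: str
    sound_pressure_db: float          # typical sound pressure (voice intensity)
    pitch_mean_hz: float              # pitch band: average
    pitch_max_hz: float               # pitch band: highest
    pitch_min_hz: float               # pitch band: lowest
    mic_input_level: float = 1.0      # microphone input level (environment-dependent)
    # Frequency characteristics of vowels and consonants, keyed by phoneme label.
    phoneme_spectra: Dict[str, List[float]] = field(default_factory=dict)
    # Learning evaluation results: measured teacher-vs-learner waveform differences.
    evaluation_results: List[float] = field(default_factory=list)

    def update_after_session(self, mean_hz: float, max_hz: float, min_hz: float,
                             result: float, alpha: float = 0.2) -> None:
        """Blend one session's pitch statistics into the stored values.

        An exponential moving average with weight alpha is an assumed policy.
        """
        self.pitch_mean_hz = (1 - alpha) * self.pitch_mean_hz + alpha * mean_hz
        self.pitch_max_hz = (1 - alpha) * self.pitch_max_hz + alpha * max_hz
        self.pitch_min_hz = (1 - alpha) * self.pitch_min_hz + alpha * min_hz
        self.evaluation_results.append(result)
```

`asdict` converts such a record to a plain dictionary, which would be convenient if the record were to be synchronized with the server-side voice-related information database.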

The authentication processing unit 105 sends an authentication request to the authentication processing unit 201 of the server terminal 20 and receives the authentication result from the server terminal 20. It then obtains the voice-related information of the authenticated learner from the server terminal 20 and supplies it to the voice extraction calculation processing unit 102.

The operation unit 110 consists of devices such as a keyboard and a mouse and accepts the learner's various operations. The learner's instructions are entered through the operation unit 110 and passed to the relevant processing units.

As described above, the learner terminal 10 is implemented on a computer. Programs that implement the processing units described above are installed on the computer. The CPU executes them using memory and other resources, thereby realizing the learner terminal 10.

As shown in the figure, the server terminal 20 includes an authentication processing unit 201, a voice-related information database 202, a learner information database 203, and an audio teaching material database 204. The audio teaching material database 204 stores various audio teaching materials, which are distributed to the learner terminal 10 via the network.

The learner information database 203 stores information about each learner, including name, number, and an ID and password for authentication. It also stores each learner's attendance and learning history. The voice-related information database 202 stores each learner's voice-related information. Its contents are as described for the voice-related information storage unit 106 of the learner terminal 10.

The authentication processing unit 201 receives an authentication request from the learner terminal 10, performs authentication, and sends the result back to the learner terminal 10. When authentication succeeds, it provides the learner terminal 10 with the information stored in the voice-related information database 202 and the learner information database 203.

As noted above, the server terminal 20 is also implemented on a computer. Programs that implement the authentication processing unit 201 and other information-providing functions are installed on the computer. The CPU executes them using memory and other resources, thereby realizing the server terminal 20.

An example of the overall operation of the learning support system 1 is as follows. As shown in FIG. 2, authentication is performed when the system starts. When information such as an ID and password is entered through the operation unit 110, the authentication processing unit 105 sends an authentication request to the authentication processing unit 201 of the server terminal 20 (S10). The authentication processing unit 201 performs authentication by referring to the learner information database 203. It then notifies the authentication processing unit 105 of the learner terminal 10 of the result (S12).

If authentication succeeds, the authentication processing unit 201 of the server terminal 20 sends the voice-related information stored in the voice-related information database 202 to the authentication processing unit 105 of the learner terminal 10 (S14). Only the authenticated learner's information is sent. The voice extraction calculation processing unit 102 then obtains this information from the authentication processing unit 105 and stores it in the voice-related information storage unit 106 (S16). It later reads the information back from the storage unit 106 for use, as needed, in processing on the learner terminal 10.

During a learning session after authentication, for example, the server terminal 20 reads audio teaching material data from the audio teaching material database 204 and supplies it to the learner terminal 10. There, the voice extraction calculation processing unit 102 stores it in the audio teaching material data storage unit 108. The voice extraction calculation processing unit 102 supplies the teaching material audio to the voice input/output unit 101. This audio comes from either the audio teaching material data storage unit 108 or the audio teaching material database 204.

The teaching material audio is output from the voice input/output unit 101. The learner's voice is input to the same unit and supplied to the voice extraction calculation processing unit 102. The voice extraction calculation processing unit 102 stores the learner's voice in the learner voice recording unit 109. It also extracts the pitch from both the learner's voice and the teaching material audio. The extracted pitches are sent to the waveform display processing unit 103.

The waveform display processing unit 103 generates a pitch waveform image. Specifically, it generates a waveform comparison image that shows the teaching material's and the learner's pitch waveforms on a single screen. This pitch waveform comparison image is displayed on the display unit 104.

The teaching material and learner voice data are also supplied from the voice extraction calculation processing unit 102 to the sound pressure calculation processing unit 107. That unit calculates the sound pressure of each voice. The sound pressure information is likewise sent to the waveform display processing unit 103, which generates sound pressure waveform images. As with the pitch waveforms, a waveform comparison image shows the teaching material's and the learner's sound pressure waveforms on a single screen. It is displayed on the display unit 104.

At system shutdown, as shown in FIG. 3, the voice extraction calculation processing unit 102 reads the voice-related information from the voice-related information storage unit 106 and supplies it to the authentication processing unit 105 (S20). The authentication processing unit 105 sends it to the authentication processing unit 201 of the server terminal 20 (S22). The authentication processing unit 201 then stores the data in the voice-related information database 202.
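The login and logout exchange (S10 to S16 and S20 to S22) can be sketched as follows. This is a hypothetical, in-memory stand-in for the server terminal 20. The class and method names are assumptions. The unsalted SHA-256 password hash is a simplification for illustration; a real system would use a salted, deliberately slow key-derivation function.

```python
import hashlib
import hmac


class ServerTerminal:
    """Hypothetical in-memory stand-in for server terminal 20."""

    def __init__(self):
        self.learners = {}    # stands in for learner information DB 203: id -> password hash
        self.voice_info = {}  # stands in for voice-related information DB 202: id -> dict

    @staticmethod
    def _hash(password):
        # Illustration only: plain SHA-256, no salt.
        return hashlib.sha256(password.encode()).hexdigest()

    def register(self, learner_id, password, info):
        self.learners[learner_id] = self._hash(password)
        self.voice_info[learner_id] = dict(info)

    def authenticate(self, learner_id, password):
        """S10-S14: verify credentials; on success return a copy of that learner's info."""
        stored = self.learners.get(learner_id)
        if stored is None or not hmac.compare_digest(stored, self._hash(password)):
            return None
        return dict(self.voice_info[learner_id])

    def save(self, learner_id, info):
        """S22: store the voice-related information uploaded at logout."""
        self.voice_info[learner_id] = dict(info)
```

Returning a copy keeps the terminal's working data separate from the stored record until logout. Only the explicit `save` call writes it back.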

As described above, the learning support system 1 displays waveform comparison images of the teaching material's and the learner's waveforms, for both sound pressure and pitch. The learner can thus see how their pronunciation differs from the teaching material and correct it. This helps the learner acquire speaking skills closer to those of a native speaker.

In the learning support system 1, the voice-related information of multiple learners is stored centrally on the server terminal 20. It is provided to the learner terminal 10 in conjunction with the authentication function. At login, the voice-related information of the learner who logged in is downloaded to the learner terminal 10. There it is used, for example, for the waveform display described above. At logout, the voice-related information is updated. For example, the information from the current session is compared with that from past sessions, and the resulting difference is uploaded to and stored on the server terminal 20.
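The difference-based update at logout can be sketched as below. The record type and its field names are hypothetical. They cover only the numeric items listed for the voice-related information; vowel/consonant frequency characteristics and learning evaluation results are omitted. "Difference" is interpreted here as a per-field subtraction, which the text does not specify.

```python
from dataclasses import dataclass, asdict


@dataclass
class VoiceRelatedInfo:
    """Hypothetical per-learner record; field names are illustrative, not from the source."""
    sound_pressure_db: float  # typical sound pressure (voice intensity)
    pitch_avg_hz: float       # pitch band: average value
    pitch_max_hz: float       # pitch band: highest value
    pitch_min_hz: float       # pitch band: lowest value
    mic_input_level: float    # microphone input level


def diff_on_logout(past: VoiceRelatedInfo, current: VoiceRelatedInfo) -> dict:
    """Terminal side: per-field difference (current - past), keeping only changed fields."""
    old, new = asdict(past), asdict(current)
    return {name: new[name] - old[name] for name in new if new[name] != old[name]}


def apply_diff(stored: VoiceRelatedInfo, diff: dict) -> VoiceRelatedInfo:
    """Server side: add an uploaded difference to the stored record."""
    values = asdict(stored)
    for name, delta in diff.items():
        values[name] += delta
    return VoiceRelatedInfo(**values)
```

Uploading only the changed fields keeps the logout message small. The server reconstructs the current record by adding the difference to what it already holds.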

Consequently, learners can study with their own voice-related information on any learner terminal 10, and less per-learner device configuration is needed. Moreover, because the voice-related information is continually updated, it contributes to increasingly accurate voice waveform display.

Conventional systems also provide a parameter setting function for matching the waveform display to the learner's voice characteristics. However, learners find it hard to know which parameters to set, and how, to get a display suited to their own characteristics. Even with correct settings, the learner must go through the tedious work of resetting them whenever the day's learner terminal 10 changes. In the present embodiment, by contrast, learner-related information is stored centrally in a database and downloaded to the learner terminal 10. This lets terminal functions be configured to suit each learner, and the information applies on whichever learner terminal 10 is used.

FIG. 4 shows the waveform display processing unit 103. A data storage processing unit 1031 acquires pitch waveform information for the teaching material and the learner from the voice extraction calculation processing unit 102. It acquires sound pressure waveform information for both from the sound pressure calculation processing unit 107. The pitch and sound pressure waveform information is written to a memory 1035. Besides sound pressure and pitch band, the data storage processing unit 1031 acquires information such as vowel frequency characteristics and microphone input level, and writes it to the memory 1035.

Using the memory 1035, the waveform display processing unit 103 generates the waveform images shown on the display unit 104, as described above. It does so from the pitch and sound pressure waveform information acquired by the data storage processing unit 1031. Within the waveform display processing unit 103, three units handle drawing positions:
- a pitch extraction waveform drawing position processing unit 1032 sets and adjusts the drawing position of the pitch waveform (the extracted pitch waveform);
- a sound pressure waveform drawing position processing unit 1033 sets and adjusts the drawing position of the sound pressure waveform;
- a time axis waveform drawing position processing unit 1034 sets and adjusts the drawing positions of both waveforms along the time axis.
Together, these units transform the pitch and sound pressure waveforms both in the waveform height direction and along the time axis.

The waveform display processing unit 103 also accepts a variable-speed playback setting parameter entered by the learner through the operation unit 110. This parameter specifies the playback speed of the teaching material in variable-speed playback mode, so the learner can change how fast the material plays. The parameter is written to the memory 1035.

FIG. 5 shows the configuration of the voice extraction calculation processing unit 102. As already described, it acquires the learner's voice from the voice input/output unit 101 or the learner voice recording unit 109. It acquires the teaching material audio from the audio teaching material data storage unit 108 and the audio teaching material database 204.

In the voice extraction calculation processing unit 102, an A/D conversion unit 1021 converts the input voice into digital data. A pitch extraction processing unit 1022 applies analysis such as FFT to the audio data to extract pitch waveform information. Pitch is extracted step by step, and the duration of one step is preset. An averaging processing unit 1023 averages the pitch waveform to smooth it. A duration count processing unit 1024 then counts how long each pitch lasts, that is, the number of consecutive steps over which it continues.
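The smoothing and duration-count steps (units 1023 and 1024) might look like the following sketch. It operates on a pitch track with one value per step. The moving-average window and the `tolerance` that decides whether a pitch "continues" are assumptions; the text specifies neither.

```python
def smooth_pitch(pitches, window=3):
    """Centered moving average over a pitch track (one value per step).

    Near the edges the window is truncated, so no padding values are invented.
    """
    half = window // 2
    smoothed = []
    for i in range(len(pitches)):
        lo, hi = max(0, i - half), min(len(pitches), i + half + 1)
        segment = pitches[lo:hi]
        smoothed.append(sum(segment) / len(segment))
    return smoothed


def count_durations(pitches, tolerance=0.0):
    """Run-length count: list of (pitch, number_of_consecutive_steps).

    A step joins the current run if it is within `tolerance` of the run's first pitch.
    """
    runs = []
    for p in pitches:
        if runs and abs(p - runs[-1][0]) <= tolerance:
            runs[-1][1] += 1
        else:
            runs.append([p, 1])
    return [(p, n) for p, n in runs]
```

Multiplying a run's step count by the preset step length gives the pitch's duration in seconds.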

The audio data converted by the A/D conversion unit 1021 is also supplied to the sound pressure calculation processing unit 107 and an audio playback processing unit 1025. The sound pressure calculation processing unit 107 calculates the sound pressure as described above. The audio playback processing unit 1025 plays back the audio and supplies it to the voice input/output unit 101.
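The text does not say how unit 107 computes sound pressure. One common choice, assumed here, is the RMS level of each step in dB relative to full scale, with samples normalized to [-1, 1]. The 16 kHz sample rate, the 10 ms step (borrowed from the real-time description), and the -100 dB floor for silent steps are also assumptions.

```python
import math


def sound_pressure_db(samples, sample_rate=16000, step_ms=10, floor_db=-100.0):
    """Per-step RMS level in dBFS; a trailing partial step is ignored."""
    step = int(sample_rate * step_ms / 1000)
    levels = []
    for start in range(0, len(samples) - step + 1, step):
        frame = samples[start:start + step]
        rms = math.sqrt(sum(x * x for x in frame) / len(frame))
        # log10(0) is undefined, so silent steps get a fixed floor value.
        levels.append(20 * math.log10(rms) if rms > 0 else floor_db)
    return levels
```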

The audio playback processing unit 1025 also receives the variable-speed playback setting parameter from the voice extraction calculation processing unit 102. This is the same parameter that passes from the operation unit 110, through the voice extraction calculation processing unit 102, to the waveform display processing unit 103. The audio playback processing unit 1025 adjusts the speed of the played-back audio to match the parameter, so the audio is reproduced at the indicated speed.

This concludes the description of the voice extraction calculation processing unit 102 and the waveform display processing unit 103. Next, the distinctive configuration of the waveform adjustment processing in the present embodiment is described.

As described above, the waveform display processing unit 103 generates one image comparing the teaching material's and the learner's pitch waveforms and another comparing their sound pressure waveforms. In conventional waveform comparison images, the two waveforms are simply stacked one above the other. Such images retain individual differences in pitch range and voice volume between the teaching material's speaker and the learner. This makes the waveforms hard for the learner to compare. The present embodiment therefore adjusts the waveforms to absorb these individual differences, as described below. The adjustment is performed by the pitch extraction waveform drawing position processing unit 1032 and the sound pressure waveform drawing position processing unit 1033 of the waveform display processing unit 103.

(1) Pitch waveform adjustment
Pitch waveform adjustment uses a representative pitch of the teaching material's voice and a representative pitch of the learner's voice. The display position of the pitch waveform is adjusted to absorb the difference between these two representative pitches. This processing is performed by the pitch extraction waveform drawing position processing unit 1032 of the waveform display processing unit 103. In the present embodiment, the representative pitch is the average pitch.

FIG. 6 shows an example of pitch waveform adjustment. The left side of FIG. 6 shows the teaching material pitch waveform and the learner pitch waveform before adjustment. The learner pitch waveform is the pitch obtained from previously recorded speech.

The pitch extraction waveform drawing position processing unit 1032 calculates the average pitch of the teaching material pitch waveform and the average pitch of the learner pitch waveform. The averages may instead be calculated by another processing function at an earlier stage. The unit then slides the waveform image in the pitch-height direction (vertically in the figure) by the difference between the averages. In the example of FIG. 6, the teaching material pitch waveform is slid and, as shown on the right side of FIG. 6, superimposed on the learner pitch waveform. The superimposed waveform comparison image is supplied from the waveform display processing unit 103 to the display unit 104 and shown on the display.
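The slide in FIG. 6 amounts to adding one constant offset, the difference between the two average pitches, to every point of a track. The following is a minimal sketch. It assumes both tracks contain only voiced pitch values in Hz, with unvoiced steps removed beforehand, and that the shift is linear in Hz. The document does not say whether the vertical axis is linear or logarithmic (e.g. semitones). The function names are hypothetical.

```python
from statistics import fmean


def slide_to_mean(track, target_mean):
    """Shift a pitch track vertically so that its average equals target_mean."""
    offset = target_mean - fmean(track)
    return [p + offset for p in track]


def align_teacher_to_learner(teacher, learner):
    """FIG. 6 case: slide the teaching-material track onto the learner's average pitch."""
    return slide_to_mean(teacher, fmean(learner))
```

The variant in which both waveforms are aligned to a common reference pitch is `slide_to_mean(track, reference)` applied to each track.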

In the example shown, the learner's voice is lower than that of the teaching material's speaker. In the waveform comparison image, however, this difference in voice pitch is absorbed. The teaching material and learner waveforms appear in approximately the same range on the vertical (frequency) axis, which makes them easier to compare. Superimposing the two waveforms also shows clearly how closely they match and where they diverge.

In the present embodiment, the teaching material waveform is slid. Alternatively, the learner's waveform may be slid, or both waveforms may be slid. For example, each waveform may be repositioned so that its average pitch coincides with a reference pitch.

FIG. 7 shows a further preferred display process. Here, information indicating the pitch variation width of the teaching material waveform is added to the waveform comparison image.

In FIG. 7, as in FIG. 6, the teaching material pitch waveform is slid and superimposed on the learner pitch waveform. In addition, a maximum-pitch line and a minimum-pitch line for the teaching material are added to the image. The maximum-pitch line passes through the teaching material's highest pitch, and the minimum-pitch line passes through its lowest pitch. Both lines are parallel to the time axis.

Because the highest and lowest pitches are displayed, the learner can make reproducing the teaching material's pitch range a practice goal. Here, pitch range means the difference between the material's highest and lowest pitch. This kind of practice does not require attention to fine waveform differences, such as matching the overall shape. Matching the shape would require speaking at the same speed as the teaching material. Focusing on the pitch range lets the learner pay less attention to speed. Adding this suitable target to the image can be expected to improve learning motivation and learning effectiveness.

The highest and lowest pitch values are calculated automatically by the pitch extraction waveform drawing position processing unit 1032 or another processing function. The maximum and minimum lines are then displayed automatically.

Alternatively, the learner may specify the maximum and minimum values using the operation unit 110. The entered values are accepted, and lines are added at the corresponding positions. A teacher may also set the maximum and minimum values from another terminal. Furthermore, lines may be displayed at the highest value and the next-highest value. This enables practice focused specifically on, for example, primary and secondary stress (accent) in English.
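The guide lines of FIG. 7 need only the track's extreme values. The "next-highest value" variant is interpreted here as the second-highest local peak, because the sample next to the maximum would otherwise almost always be the runner-up. That interpretation, the peak definition, and the function names are assumptions.

```python
def range_lines(track):
    """Heights of the horizontal maximum and minimum guide lines: (max, min)."""
    return max(track), min(track)


def top_two_peaks(track):
    """Highest and second-highest local maxima of a track.

    A local maximum is strictly greater than its left neighbour and not less than
    its right one, so a flat-topped peak is counted once; endpoints are excluded.
    """
    peaks = [track[i] for i in range(1, len(track) - 1)
             if track[i - 1] < track[i] >= track[i + 1]]
    peaks.sort(reverse=True)
    return peaks[:2]
```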

In the present embodiment, the horizontal lines on the waveform may also be made removable, so the learner can deliberately delete them.

In the present embodiment, the pitch variation width of the teaching material is displayed. The learner's pitch variation width may be displayed instead, or both may be displayed. The maximum value and the next-largest value may also be displayed.

Next, waveform adjustment for real-time display is described. The processing examples above mainly adjust the waveforms of recorded speech. For real-time display, the following processing is preferable.

First, the real-time display itself is described. When the learner starts speaking and voice input begins, the voice extraction calculation processing unit 102 starts extracting pitch. Pitch is extracted step by step, with each step lasting 10 ms. The pitch of each step is supplied in turn to the waveform display processing unit 103. That unit appends each incoming pitch to the pitch waveform image, so the pitch waveform grows on the screen as the learner speaks. Real-time display strengthens visual feedback, which in turn reinforces auditory feedback. This can be expected to improve learning motivation and learning effectiveness.

In real-time display, the average pitch of the voice data being displayed is not yet available to the waveform display processing unit 103, because that data is still being input. The unit therefore reads the learner's average pitch from data accumulated in previous learning sessions. As shown in FIG. 1, this average is part of the voice-related information downloaded from the server terminal 20 to the learner terminal 10. It is stored in the voice-related information storage unit 106. As shown in FIG. 4, the voice-related information is also stored in the memory 1035.

The display position is then adjusted using this past average pitch. The past average is read, the average pitch of the teaching material is calculated, and the pitch waveform position is adjusted to absorb the difference between the two.
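The offset depends only on the teaching material's average and the learner's stored past average. It can therefore be fixed before the learner starts speaking and applied to each pitch as it arrives. The following is a sketch under those assumptions. The class name and the per-step `push` interface are hypothetical. Following the drawing order described for real-time display, the learner's points are shifted onto the already-drawn teaching material waveform.

```python
from statistics import fmean


class RealtimePitchAligner:
    """Shifts each incoming learner pitch by an offset fixed before speech starts:
    teaching-material mean minus the learner's stored past mean (both in Hz)."""

    def __init__(self, teacher_track, past_learner_mean_hz):
        self.offset = fmean(teacher_track) - past_learner_mean_hz
        self.drawn = []  # learner display values plotted so far

    def push(self, learner_pitch_hz):
        """Called once per step (e.g. every 10 ms); returns the value to plot."""
        y = learner_pitch_hz + self.offset
        self.drawn.append(y)
        return y
```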

In the waveform comparison image, for example, the entire teaching material waveform is displayed first. The learner waveform, with its display position adjusted, is then drawn over it. Because the display is real-time, the learner's waveform extends along the time axis as the learner speaks. Because its position has been adjusted, it is drawn at approximately the same vertical position (pitch height) as the teaching material waveform.

Next, another preferred waveform adjustment process for real-time display is described. In this process, the display position is adjusted using the pitch of the initial portion of the learner's input speech. This initial pitch serves as the representative pitch and as the reference for sliding the waveform.

The initial pitch may simply be the first extracted pitch. Alternatively, it may be, for example, the average pitch over a predetermined initial period. In that case, drawing of the learner waveform may be delayed until the initial average is obtained. Even with such a delay, the real-time display remains fully satisfactory as long as the delay is short.
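The initial-pitch variant can be sketched as a small buffer. The first few learner pitches are held back, and their average becomes the representative pitch. The buffered points are released once the offset is known, which is the short drawing delay described above. The warm-up length is an assumed parameter; `warmup_steps=1` corresponds to using the very first pitch. The class name is hypothetical.

```python
class InitialPitchAligner:
    """Uses the average of the first `warmup_steps` learner pitches as the
    representative pitch, then emits offset-corrected display values."""

    def __init__(self, teacher_mean_hz, warmup_steps=5):
        self.teacher_mean = teacher_mean_hz
        self.warmup_steps = warmup_steps
        self.buffer = []
        self.offset = None

    def push(self, pitch_hz):
        """Returns display values ready to draw (empty while warming up)."""
        if self.offset is None:
            self.buffer.append(pitch_hz)
            if len(self.buffer) < self.warmup_steps:
                return []  # drawing is delayed until the initial average exists
            self.offset = self.teacher_mean - sum(self.buffer) / len(self.buffer)
            ready, self.buffer = self.buffer, []
            return [p + self.offset for p in ready]
        return [pitch_hz + self.offset]
```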

The waveform adjustment described above for real-time display may also be applied to waveform display from recorded audio files.

(2) Sound pressure waveform adjustment
Next, sound pressure waveform adjustment is described. It uses the sound pressure variation width of the teaching material's voice and that of the learner's voice. The sound pressure waveforms are adjusted to absorb the difference between the two widths. This processing is performed by the sound pressure waveform drawing position processing unit 1033 of the waveform display processing unit 103.

In the present embodiment, the sound pressure variation width is the range between the maximum and minimum sound pressure of the voice, that is, its dynamic range. The sound pressure waveform drawing position processing unit 1033 calculates the variation width of the teaching material's sound pressure waveform. It then normalizes the waveform so this width matches a reference width, stretching or compressing it in the sound-pressure-level direction. It applies the same normalization to the learner's sound pressure waveform. As a result, the difference in sound pressure variation width between the two waveforms is absorbed.
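The following sketches the normalization. It assumes the waveform is a list of per-step levels (e.g. in dB). It also assumes that "matching the reference width" is a linear map sending the track's minimum and maximum onto a reference minimum and maximum. The text specifies only the width, so anchoring the minimum as well is an assumption. The function name is hypothetical.

```python
def normalize_range(levels, ref_min, ref_max):
    """Linearly rescale a sound-pressure track so that [min, max] maps onto [ref_min, ref_max]."""
    lo, hi = min(levels), max(levels)
    if hi == lo:
        # A flat track has no width to stretch; place it mid-range.
        return [(ref_min + ref_max) / 2] * len(levels)
    scale = (ref_max - ref_min) / (hi - lo)
    return [ref_min + (x - lo) * scale for x in levels]
```

For the FIG. 8 case, pass the teaching material's own minimum and maximum as the reference. The teaching material track then comes back unchanged, and only the learner's track is stretched.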

FIG. 8 shows an example of the sound pressure waveform adjustment process. In this example, the sound pressure change width of the teaching material waveform serves as the reference width for normalization, so the teaching material waveform is not deformed. The learner's waveform, in contrast, is deformed by the normalization. As shown on the right side of FIG. 8, the sound pressure waveforms of the teaching material and the learner are then superimposed. The resulting waveform comparison image is supplied from the waveform display processing unit 103 to the display unit 104 and shown on the display.

In the example of FIG. 8, the learner's voice is quieter overall than the teaching material, so the learner's sound pressure waveform is stretched along the sound pressure level axis (vertical axis). As a result, the difference in sound pressure change width is absorbed, and the teaching material and learner waveforms occupy substantially the same range on the vertical axis, which makes them easy to compare. Furthermore, superimposing the two waveforms makes clear how closely they match and where they differ.

In the example of FIG. 8, only the learner waveform is deformed. As described above, however, both the learner waveform and the teaching material waveform may be deformed, or only the teaching material waveform may be deformed.
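The three options (deform only the learner waveform, as in FIG. 8; deform only the teaching material waveform; or deform both toward a common reference) can be sketched as a choice of which dynamic range serves as the target. This is a hypothetical illustration assuming linear min-max rescaling; `align_sound_pressure` and its `reference` parameter do not come from the source.

```python
from typing import List, Sequence, Tuple

Range = Tuple[float, float]


def _rescale(levels: Sequence[float], src: Range, dst: Range) -> List[float]:
    """Map levels linearly from the src range onto the dst range."""
    (s_lo, s_hi), (d_lo, d_hi) = src, dst
    if s_hi == s_lo:  # flat waveform: only its level can be aligned
        return [d_lo for _ in levels]
    return [d_lo + (x - s_lo) * (d_hi - d_lo) / (s_hi - s_lo) for x in levels]


def align_sound_pressure(teacher: Sequence[float], learner: Sequence[float],
                         reference: str = "teacher",
                         fixed: Range = (0.0, 1.0)) -> Tuple[List[float], List[float]]:
    """Return (teacher', learner') with matched dynamic ranges.

    reference="teacher": only the learner is rescaled (the FIG. 8 case)
    reference="learner": only the teaching material is rescaled
    reference="fixed":   both are rescaled onto the fixed range
    """
    t_rng = (min(teacher), max(teacher))
    l_rng = (min(learner), max(learner))
    if reference == "teacher":
        return list(teacher), _rescale(learner, l_rng, t_rng)
    if reference == "learner":
        return _rescale(teacher, t_rng, l_rng), list(learner)
    if reference == "fixed":
        return _rescale(teacher, t_rng, fixed), _rescale(learner, l_rng, fixed)
    raise ValueError(f"unknown reference: {reference!r}")
```

With the FIG. 8 setting (`reference="teacher"`), a teaching-material waveform `[40, 70, 55]` is returned unchanged, while a quieter learner waveform `[45, 60, 50]` becomes `[40.0, 70.0, 50.0]`, stretched by a factor of 2 to fill the same vertical range.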

For the pitch waveform described earlier, as shown in FIG. 7, the pitch change width, or more specifically the maximum and minimum pitch values, was added to the waveform comparison image. The same processing is preferably performed for the sound pressure waveform; that is, the maximum and minimum sound pressure values are added to the image of FIG. 8. As in FIG. 7, lines may be drawn at the maximum and minimum values. The learner can then set reproducing the teaching material's loudness contrast (sound pressure level width) as a learning goal.

In this regard, the sound pressure waveform is considered to indicate mainly where stress falls. Practice in imitating the loudness contrast therefore amounts to practicing the level difference (sound pressure level width) between unstressed portions of an utterance (corresponding to the minimum sound pressure) and stressed portions (corresponding to the maximum sound pressure). In this practice, the learner need not attend to fine differences in the waveform, such as making the waveform outlines identical. For example, making the waveforms identical would require matching the teaching material's speaking rate, whereas focusing on the level difference does not require the learner to be as conscious of speed. Because the level difference can thus be added to the image as a suitable target, improvements in learning motivation and learning effectiveness can be expected.

The maximum and minimum sound pressure values described above are calculated automatically by the sound pressure waveform drawing position processing unit 1033 or by another processing function, and the maximum and minimum value lines are then displayed automatically.

Alternatively, the learner may specify the maximum and minimum values using the operation unit 110. The entered values are then accepted, and lines may be added at the positions corresponding to them. The maximum and minimum values may also be set by a teacher using another terminal.
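The maximum and minimum guide lines can be computed automatically from the teaching material, or taken from values entered by the learner or a teacher. A hypothetical sketch follows; `GuideLines` and `guide_lines` are illustrative names, and the check that the minimum does not exceed the maximum is an added assumption, not something the source specifies.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class GuideLines:
    maximum: float  # level of the upper line drawn on the comparison image
    minimum: float  # level of the lower line

    @property
    def width(self) -> float:
        """Sound pressure level width the learner is asked to reproduce."""
        return self.maximum - self.minimum


def guide_lines(teacher_levels: Sequence[float],
                user_max: Optional[float] = None,
                user_min: Optional[float] = None) -> GuideLines:
    """Compute the lines from the teaching material unless values were
    supplied explicitly (by the learner or by a teacher on another terminal)."""
    maximum = max(teacher_levels) if user_max is None else user_max
    minimum = min(teacher_levels) if user_min is None else user_min
    if minimum > maximum:
        raise ValueError("minimum line must not exceed maximum line")
    return GuideLines(maximum, minimum)
```

For a teaching-material waveform `[40, 70, 55]`, the automatic lines sit at 70 and 40 (width 30); passing `user_max=65` moves only the upper line.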

In the present embodiment, the sound pressure change width of the teaching material is displayed. Alternatively, the learner's sound pressure change width may be displayed, or the change widths of both may be displayed.

Next, real-time display will be described. As with the pitch waveform, the sound pressure waveform is also preferably adjusted during real-time display.

For real-time display, the waveform display processing unit 103 reads the learner's sound pressure data from the data accumulated during previous learning sessions. Referring to FIG. 1, the learner's sound pressure data is downloaded from the server terminal 20 to the learner terminal 10 as part of the voice-related information and stored in the voice-related information storage unit 106. As shown in FIG. 4, the voice-related information is also stored in the memory 1035. The sound pressure waveform drawing position processing unit 1033 obtains the learner's sound pressure change width from this past data and uses it to perform the normalization shown in FIG. 8.

Accordingly, in the waveform comparison image, the entire teaching material waveform is, for example, displayed first, and the normalized learner waveform is then drawn over it. Because the display is in real time, the learner waveform extends along the time axis as the learner's utterance progresses. And because the difference in sound pressure change width has been absorbed, the learner's waveform is drawn at substantially the same vertical position (sound pressure level) as the teaching material waveform.
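Because the learner's dynamic range is already known from past data, each incoming frame can be mapped onto the teaching material's range as soon as it arrives, with no buffering. A minimal sketch, assuming past recordings are available as lists of per-frame levels and that the mapping is linear; `range_from_past` and `normalize_stream` are hypothetical names.

```python
from typing import Iterable, Iterator, Sequence, Tuple

Range = Tuple[float, float]


def range_from_past(recordings: Sequence[Sequence[float]]) -> Range:
    """Learner's dynamic range over previously stored sound pressure data."""
    lo = min(min(r) for r in recordings)
    hi = max(max(r) for r in recordings)
    return lo, hi


def normalize_stream(frames: Iterable[float], learner_range: Range,
                     teacher_range: Range) -> Iterator[float]:
    """Map each learner frame onto the teacher's range as it arrives, so it can
    be drawn over the already displayed teaching material waveform at once."""
    (l_lo, l_hi), (t_lo, t_hi) = learner_range, teacher_range
    # Degenerate past range: fall back to a pure level shift (scale factor 1).
    k = (t_hi - t_lo) / (l_hi - l_lo) if l_hi != l_lo else 1.0
    for x in frames:
        yield t_lo + (x - l_lo) * k
```

Frames outside the stored range simply land outside the teaching material's range; whether to clip them for display is left open here.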

Next, real-time display without past sound pressure data will be described. In this case, the display position is adjusted using the sound pressure data from the initial stage of the learner's input speech. The initial-stage sound pressure may be the sound pressure over a predetermined period from the start of speech, and its dynamic range is used for normalization. Drawing of the learner waveform may be delayed until this initial-stage dynamic range has been obtained. Even with such a delay, a sufficiently satisfactory real-time display is achieved as long as the delay is short.
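Without past data, the learner's dynamic range has to be measured from the first frames of the utterance, which introduces the short drawing delay mentioned above. A hypothetical sketch, assuming a fixed number of initial frames and linear rescaling onto the teaching material's range; `InitialRangeNormalizer` and `initial_frames` are illustrative names.

```python
from typing import List, Optional, Tuple


class InitialRangeNormalizer:
    """Real-time normalizer that learns the learner's dynamic range from the
    first `initial_frames` frames after speech onset (hypothetical sketch)."""

    def __init__(self, teacher_range: Tuple[float, float], initial_frames: int = 20):
        self.t_lo, self.t_hi = teacher_range
        self.initial_frames = initial_frames
        self._buffer: List[float] = []
        self._scale: Optional[Tuple[float, float]] = None  # (learner_min, factor)

    def push(self, level: float) -> List[float]:
        """Feed one frame; return the normalized frames that are ready to draw."""
        if self._scale is not None:
            return [self._map(level)]
        self._buffer.append(level)
        if len(self._buffer) < self.initial_frames:
            return []  # delay drawing until the initial range is known
        lo, hi = min(self._buffer), max(self._buffer)
        factor = (self.t_hi - self.t_lo) / (hi - lo) if hi != lo else 1.0
        self._scale = (lo, factor)
        ready, self._buffer = self._buffer, []
        return [self._map(x) for x in ready]

    def _map(self, x: float) -> float:
        lo, factor = self._scale
        return self.t_lo + (x - lo) * factor
```

With a teaching-material range of (40, 70) and three initial frames 45, 60 and 50, the first call that returns anything yields `[40.0, 70.0, 50.0]`; a later frame at 52.5 is drawn at 55.0.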

The learning support system 1 according to the embodiment of the present invention has been described above. In the learning support system 1 of this embodiment, the pitch waveform display position is adjusted so as to absorb the difference between the representative pitch of the teaching material's speech and the representative pitch of the learner. A comparison image in which the teaching material and learner pitch waveforms are drawn at easily comparable positions can therefore be provided without the learner having to force their voice pitch to match the teaching material. This reduces the influence of individual differences between speakers and makes it easy to compare the speech waveforms of the teaching material and the learner.

Further, according to the present embodiment, using the average value as the representative pitch effectively absorbs individual differences between the teaching material speaker and the learner.
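Using the average as the representative pitch reduces the adjustment to a single offset. A minimal sketch, assuming per-frame pitch tracks with `None` for unvoiced frames and an additive shift in Hz (a semitone-scale shift would be an alternative); the function names are hypothetical.

```python
from statistics import mean
from typing import List, Optional, Sequence

Pitch = Optional[float]  # None marks an unvoiced frame (no pitch)


def representative_pitch(track: Sequence[Pitch]) -> float:
    """Average pitch over the voiced frames of a track."""
    voiced = [p for p in track if p is not None]
    if not voiced:
        raise ValueError("track has no voiced frames")
    return mean(voiced)


def shift_to_teacher(teacher: Sequence[Pitch], learner: Sequence[Pitch]) -> List[Pitch]:
    """Offset the learner track so that both averages coincide on the display."""
    offset = representative_pitch(teacher) - representative_pitch(learner)
    return [None if p is None else p + offset for p in learner]
```

A teaching-material track averaging 230 Hz and a learner track averaging 120 Hz differ by 110 Hz, so the learner's `[110, 130, None]` is drawn as `[220, 240, None]` without the learner changing voice pitch.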

Further, according to the present embodiment, superimposing the teaching material and learner waveforms makes the two easy to compare.

Further, according to the present embodiment, the display position is adjusted using a representative pitch obtained from the learner's previously recorded speech, so the pitch waveform display position can be adjusted in real time.

Further, according to the present embodiment, the display position is adjusted using the pitch at the initial stage of the learner's input speech, so the pitch waveform display position can be adjusted in real time.

Further, according to the present embodiment, information indicating the pitch change width is added to the pitch waveform image, so the learner can easily compare the pitch of the teaching material with their own by focusing on the change width.

Further, according to the present embodiment, information indicating the highest and lowest pitches is added to the pitch waveform image, making the pitch even easier to grasp.

Further, according to the present embodiment, the difference between the sound pressure change width of the teaching material's speech and that of the learner's speech is also absorbed for the sound pressure waveform. A comparison image in which the two sound pressure waveforms are easy to compare can therefore be provided without the learner having to force their loudness to match the teaching material. This reduces the influence of individual differences between speakers and makes it easy to compare the speech waveforms of the teaching material and the learner.

Further, according to the present embodiment, the range between the maximum and minimum sound pressure is used as the sound pressure change width, so a suitable comparison image that absorbs the difference in dynamic range can be provided.

Further, according to the present embodiment, superimposing the sound pressure waveforms of the teaching material and the learner makes the two easy to compare.

Further, according to the present embodiment, the sound pressure waveform is adjusted using the sound pressure change width obtained from the learner's previously recorded speech, so the sound pressure waveform can be adjusted in real time.

Further, according to the present embodiment, the sound pressure waveform is adjusted using the sound pressure at the initial stage of the learner's input speech, so the sound pressure waveform can be adjusted in real time.

Further, according to the present embodiment, information indicating the sound pressure change width is added to the sound pressure waveform image, so the learner can easily compare the sound pressure of the teaching material with their own by focusing on the change width.

Further, according to the present embodiment, information indicating the maximum and minimum sound pressure is added to the sound pressure waveform image, making the sound pressure even easier to grasp.

From another viewpoint, the learning support system 1 of this embodiment includes: teaching material waveform acquisition means for acquiring teaching material waveform information, which is a waveform of a speech parameter of the teaching material; learner waveform acquisition means for acquiring learner waveform information, which is a waveform of the learner's speech parameter; waveform comparison image generation means for generating a waveform comparison image for comparing the teaching material waveform information with the learner waveform information; and adjusting means, provided in the waveform comparison image generation means, for adjusting the waveform image so as to absorb the difference between the representative values of the speech parameter for the teaching material and the learner. The speech parameter is, for example, the pitch or sound pressure described above. This configuration also provides the advantages of the embodiment described above: the influence of individual differences between speakers is reduced, and the speech waveforms of the teaching material and the learner can be compared easily.
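The generalized configuration above, in which one comparison step accepts an adjusting means suited to the speech parameter, can be sketched with pluggable adjustment functions. This is a hypothetical architecture illustration: the "comparison image" is modeled as plain data rather than a rendered picture, unvoiced frames are assumed to have been removed, and all names are assumptions.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, List, Sequence

# An adjusting means takes (teacher, learner) and returns the adjusted learner waveform.
Adjuster = Callable[[Sequence[float], Sequence[float]], List[float]]


def offset_by_mean(teacher: Sequence[float], learner: Sequence[float]) -> List[float]:
    """Pitch-style adjustment: shift the learner by the difference of the means."""
    d = mean(teacher) - mean(learner)
    return [x + d for x in learner]


def scale_by_range(teacher: Sequence[float], learner: Sequence[float]) -> List[float]:
    """Sound-pressure-style adjustment: map the learner's dynamic range onto the teacher's."""
    t_lo, t_hi = min(teacher), max(teacher)
    l_lo, l_hi = min(learner), max(learner)
    if l_hi == l_lo:  # flat learner signal: align its level only
        return [t_lo for _ in learner]
    return [t_lo + (x - l_lo) * (t_hi - t_lo) / (l_hi - l_lo) for x in learner]


@dataclass
class ComparisonImage:
    teacher: List[float]
    learner: List[float]  # adjusted learner waveform, ready to overlay


def make_comparison(teacher: Sequence[float], learner: Sequence[float],
                    adjust: Adjuster) -> ComparisonImage:
    """Comparison image generation with the adjusting step injected."""
    return ComparisonImage(list(teacher), adjust(teacher, learner))
```

Injecting the adjuster keeps the comparison step independent of the parameter: `make_comparison(pitch_t, pitch_l, offset_by_mean)` for pitch and `make_comparison(level_t, level_l, scale_by_range)` for sound pressure.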

In the above embodiment, the learning support system 1 performs both pitch waveform adjustment and sound pressure waveform adjustment. Within the scope of the present invention, however, only some of these processes may be performed.

In the above embodiment, the learning support system 1 consists of the learner terminal 10 and the server terminal 20 connected via a network. The present invention is not limited to this; for example, the learning support system may be implemented on a single computer.

In the present embodiment, the case where the voice teaching material data storage unit 108 and the voice teaching material database 204 hold audio teaching materials has been described, but video teaching materials containing audio may also be used. Such a video teaching material is also an audio teaching material because it contains audio, and audio extracted from a video teaching material (such as MPEG) is likewise an audio teaching material; the extraction may be performed by the voice extraction calculation processing unit 102.

The present invention is not limited to the embodiment described above, and those skilled in the art can of course modify the embodiment within the scope of the present invention.

As described above, the learning support system according to the present invention reduces the influence of individual differences between speakers and makes it easy to compare the speech waveforms of the teaching material and the learner. It is useful as, for example, a learning support system using multimedia.

FIG. 1: Block diagram of the learning support system in the embodiment of the present invention
FIG. 2: Flowchart at startup of the learning support system
FIG. 3: Flowchart at termination of the learning support system
FIG. 4: Block diagram of the waveform display processing unit
FIG. 5: Block diagram of the voice extraction calculation processing unit
FIG. 6: Diagram showing the adjustment process for a pitch waveform comparison image
FIG. 7: Diagram showing the adjustment process for a pitch waveform comparison image
FIG. 8: Diagram showing the adjustment process for a sound pressure waveform comparison image

Explanation of symbols

10 Learner terminal
20 Server terminal
101 Voice input/output unit
102 Voice extraction calculation processing unit
103 Waveform display processing unit
104 Display unit
105 Authentication processing unit
106 Voice-related information storage unit
107 Sound pressure calculation processing unit
108 Voice teaching material database
109 Learner voice recording unit
110 Operation unit

Claims

Teaching material pitch acquisition means for acquiring teaching material pitch waveform information that is a pitch waveform of audio of the teaching material;
Learner pitch acquisition means for acquiring learner pitch waveform information which is a pitch waveform of a learner's voice;
Pitch waveform comparison image generating means for generating a pitch waveform comparison image for comparing the teaching material pitch waveform information and the learner pitch waveform information;
Display position adjusting means, provided in the pitch waveform comparison image generating means, for adjusting a pitch waveform display position so as to absorb a difference between a representative pitch of the teaching material's voice and a representative pitch of the learner;
A learning support system characterized by comprising

The learning support system according to claim 1, wherein the display position adjusting unit uses an average pitch value as the representative pitch.

The learning support system according to claim 1, wherein the pitch waveform comparison image generation unit superimposes the teaching material pitch waveform information and the learner pitch waveform information.

The pitch waveform comparison image generating means generates a pitch waveform image in real time from a learner's input voice,
The learning support system according to claim 1, wherein the display position adjusting unit adjusts the display position by using a representative pitch obtained from a learner's voice recorded in the past.

The pitch waveform comparison image generating means generates a pitch waveform image in real time from a learner's input voice,
The learning support system according to claim 1, wherein the display position adjusting unit adjusts the display position by using an initial pitch of the input voice of the learner.

The learning support system according to claim 1, wherein the pitch waveform comparison image generating means is provided with change width adding means for adding information indicating a pitch change width to the pitch waveform comparison image for at least one of the teaching material pitch waveform information and the learner pitch waveform information.

Teaching material pitch acquisition means for acquiring teaching material pitch waveform information that is a pitch waveform of audio of the teaching material;
Learner pitch acquisition means for acquiring learner pitch waveform information which is a pitch waveform of a learner's voice;
Pitch waveform comparison image generating means for generating a pitch waveform comparison image for comparing the teaching material pitch waveform information and the learner pitch waveform information;
Change width adding means, provided in the pitch waveform comparison image generating means, for adding information indicating a pitch change width to the pitch waveform comparison image for at least one of the teaching material pitch waveform information and the learner pitch waveform information;
A learning support system characterized by comprising

The learning support system according to claim 7, wherein the change width adding unit adds information indicating the highest pitch and the lowest pitch to the pitch waveform comparison image.

Teaching material sound pressure acquisition means for acquiring teaching material sound pressure waveform information which is a sound pressure waveform of the sound of the teaching material;
Learner sound pressure acquisition means for acquiring learner sound pressure waveform information which is a sound pressure waveform of a learner's voice;
Sound pressure waveform comparison image generating means for generating a sound pressure waveform comparison image for comparing the teaching material sound pressure waveform information and the learner sound pressure waveform information;
Waveform adjusting means, provided in the sound pressure waveform comparison image generating means, for adjusting the sound pressure waveform so as to absorb a difference between the sound pressure change width of the teaching material's voice and the sound pressure change width of the learner's voice;
A learning support system characterized by comprising

The learning support system according to claim 9, wherein the waveform adjusting means uses the range between the maximum sound pressure and the minimum sound pressure as the sound pressure change width.

The learning support system according to claim 9, wherein the sound pressure waveform comparison image generation unit superimposes the teaching material sound pressure waveform information and the learner sound pressure waveform information.

The sound pressure waveform comparison image generating means generates a sound pressure waveform image in real time from the input voice of the learner,
The learning support system according to claim 9, wherein the waveform adjusting unit adjusts a sound pressure waveform using a sound pressure change width obtained from a voice of a learner recorded in the past.

The sound pressure waveform comparison image generating means generates a sound pressure waveform image in real time from the input voice of the learner,
The learning support system according to claim 9, wherein the waveform adjusting means adjusts the sound pressure waveform using the sound pressure at the initial stage of the learner's input voice.

The learning support system according to claim 9, wherein the sound pressure waveform comparison image generating means is further provided with change width adding means for adding information representing a sound pressure change width to the sound pressure waveform comparison image for at least one of the teaching material sound pressure waveform information and the learner sound pressure waveform information.

Teaching material sound pressure acquisition means for acquiring teaching material sound pressure waveform information which is a sound pressure waveform of the sound of the teaching material;
Learner sound pressure acquisition means for acquiring learner sound pressure waveform information which is a sound pressure waveform of a learner's voice;
Sound pressure waveform comparison image generating means for generating a sound pressure waveform comparison image for comparing the teaching material sound pressure waveform information and the learner sound pressure waveform information;
Change width adding means, provided in the sound pressure waveform comparison image generating means, for adding information representing a sound pressure change width to the sound pressure waveform comparison image for at least one of the teaching material sound pressure waveform information and the learner sound pressure waveform information;
A learning support system characterized by comprising

16. The learning support system according to claim 15, wherein the change width adding means adds information indicating the maximum sound pressure and the minimum sound pressure of the voice to the sound pressure waveform comparison image.

Teaching material waveform acquisition means for acquiring teaching material waveform information which is a waveform of a speech parameter of the teaching material;
Learner waveform acquisition means for acquiring learner waveform information which is a waveform of a learner's voice parameter;
A waveform comparison image generating means for generating a waveform comparison image for comparing the teaching material waveform information and the learner waveform information;
An adjustment means provided in the waveform comparison image generation means, for adjusting a waveform image so as to absorb a difference between representative values of the teaching material and the learner's voice parameter;
A learning support system characterized by comprising

Obtaining teaching material pitch waveform information which is a pitch waveform of the teaching material voice;
Acquiring learner pitch waveform information which is a pitch waveform of a learner's voice;
Generating a pitch waveform comparison image for comparing the teaching material pitch waveform information and the learner pitch waveform information;
Adjusting the pitch waveform display position so as to absorb the difference between the representative pitch of the teaching material's voice and the representative pitch of the learner when generating the pitch waveform comparison image;
A voice information processing method for learning support, characterized by comprising:

Obtaining teaching material sound pressure waveform information that is a sound pressure waveform of the sound of the teaching material;
Obtaining learner sound pressure waveform information that is a sound pressure waveform of a learner's voice;
Generating a sound pressure waveform comparison image for comparing the teaching material sound pressure waveform information and the learner sound pressure waveform information;
Adjusting the sound pressure waveform so as to absorb the difference between the sound pressure change width of the sound of the teaching material and the sound pressure change width of the learner's voice when generating the sound pressure waveform comparison image;
A voice information processing method for learning support, characterized by comprising:

Obtaining teaching material pitch waveform information which is a pitch waveform of the teaching material voice;
Acquiring learner pitch waveform information which is a pitch waveform of a learner's voice;
Generating a pitch waveform comparison image for comparing the teaching material pitch waveform information and the learner pitch waveform information;
Adjusting the pitch waveform display position so as to absorb a difference between a representative pitch of the voice of the teaching material and a representative pitch of the learner when generating the pitch waveform comparison image;
A speech information processing program for learning support, characterized by causing a computer to execute the above steps.

Obtaining teaching material sound pressure waveform information that is a sound pressure waveform of the sound of the teaching material;
Obtaining learner sound pressure waveform information that is a sound pressure waveform of a learner's voice;
Generating a sound pressure waveform comparison image for comparing the teaching material sound pressure waveform information and the learner sound pressure waveform information;
Adjusting the sound pressure waveform so as to absorb the difference between the sound pressure change width of the sound of the teaching material and the sound pressure change width of the learner's voice when generating the sound pressure waveform comparison image;
A speech information processing program for learning support, characterized by causing a computer to execute the above steps.