JP2021043306A - Electronic apparatus, sound reproduction method, and program

Info

Publication number: JP2021043306A
Application number: JP2019164749A
Authority: JP
Inventors: 誠北地; Makoto Kitachi
Original assignee: Casio Computer Co Ltd
Current assignee: Casio Computer Co Ltd
Priority date: 2019-09-10
Filing date: 2019-09-10
Publication date: 2021-03-18
Anticipated expiration: 2039-09-10
Also published as: JP7379968B2

Abstract

PROBLEM TO BE SOLVED: To provide an electronic apparatus that lets a user naturally practice listening to pronounced portions containing phonemes the user finds difficult, without hindering the user from hearing the entire text.

SOLUTION: When, for example, arbitrarily selected English voice data is reproduced in a hard-to-catch pronunciation listening mode, a pronunciation element that is hard for the user to catch is specified, either by arbitrary selection or from a table in which such elements are registered in advance. The pronunciation timing corresponding to the hard-to-catch pronunciation element in the voice data to be reproduced is specified on the basis of the position of a phonetic symbol added to the text data corresponding to the voice data, or by voice recognition of the voice data, and a reproduction section of the voice data covering the pronunciation portion that includes that pronunciation timing is specified. In that reproduction section, the reproduction speed is switched to a slower speed and the voice data is reproduced with speech rate conversion.

SELECTED DRAWING: Figure 4

Description

The present invention relates to an electronic device, a voice reproduction method, and a program for reproducing, for example, voice data that includes pronunciation the user finds difficult.

For example, learning devices such as language learning machines have a function of reproducing voice data of pronunciation by a native speaker of the language being learned. The user can practice listening by selecting any text from among various texts, such as words, idioms, and sentences, and listening to the voice data corresponding to that text.

A conventional learning device changes the playback speed of the entire voice data (from beginning to end) corresponding to the selected text, so that the user can learn by listening to the voice data slowly or quickly.

In addition, a learning device for the English phonemes "L" and "R" has been proposed that lets a user practice listening to the pronunciation of "L" and "R", phonemes that Japanese speakers generally find difficult, by using dedicated voice data in which the playback speed is changed in the pronunciation sections containing those phonemes (see, for example, Patent Document 1).

Patent Document 1: Japanese Unexamined Patent Publication No. 2004-334164

Conventionally, a learning device for practicing listening to the pronunciation of difficult phonemes needs to use dedicated voice data containing those phonemes ("L", "R", and the like).

Therefore, with a conventional learning device, the user cannot practice listening to pronunciation portions containing phonemes the user finds difficult while, for example, playing back and listening to the voice data of an arbitrary text.

The present invention has been made in view of this problem, and an object thereof is to provide an electronic device, a voice reproduction method, and a program that enable a user to naturally practice listening to pronunciation portions containing phonemes the user finds difficult, without hindering the user from hearing the entire text.

The electronic device according to the present invention
includes a processor,
and the processor is configured to:
identify a pronunciation element to be learned;
identify, as a target section, a partial reproduction section of the voice data to be reproduced that includes the identified pronunciation element; and
change, during reproduction of the voice data, the reproduction state in the identified target section relative to the reproduction state in the other reproduction sections.
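The three operations above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation disclosed in the publication: the `Segment` layout, the function names, and the slow rate of 0.5 are all hypothetical, and the sketch assumes that each pronunciation element already carries start and end times.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """One timed pronunciation element in the voice data (hypothetical layout)."""
    symbol: str    # phonetic symbol of the element
    start_ms: int  # start time within the voice data
    end_ms: int    # end time within the voice data


def identify_target_sections(segments, target_symbols):
    """Return the (start, end) sections that contain a targeted element."""
    return [(s.start_ms, s.end_ms) for s in segments if s.symbol in target_symbols]


def build_playback_plan(total_ms, target_sections, slow_rate=0.5):
    """Split the timeline into (start, end, rate) pieces.

    Target sections are played at slow_rate (an assumed value);
    every other part of the voice data is played at rate 1.0.
    """
    plan, cursor = [], 0
    for start, end in sorted(target_sections):
        start = max(start, cursor)  # clip sections that overlap earlier ones
        if end <= start:
            continue
        if start > cursor:
            plan.append((cursor, start, 1.0))
        plan.append((start, end, slow_rate))
        cursor = end
    if cursor < total_ms:
        plan.append((cursor, total_ms, 1.0))
    return plan
```

Keeping the plan as a flat list of (start, end, rate) pieces means the voice data is still reproduced from beginning to end, with only the reproduction state of the target sections differing from the rest.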

FIG. 1 is a diagram showing the external configuration of a learning support device 10 according to an embodiment of the electronic device of the present invention.

FIG. 2 is a block diagram showing the configuration of the electronic circuit of the learning support device 10.

FIG. 3 is a diagram showing an example of the phonetic symbols of a plurality of phonemes that a user finds difficult, described in the weak pronunciation table (22g) and divided into three language levels.

FIG. 4 is a flowchart showing the voice reproduction process (1) of the first embodiment of the learning support device 10.

FIG. 5 is a flowchart showing the voice selection process (S1) included in the voice reproduction process (1).

FIG. 6 is a flowchart showing the weak pronunciation element identification process (S3) included in the voice reproduction process (1).

FIG. 7 is a flowchart showing the pronunciation timing identification process (S4) included in the voice reproduction process (1).

FIG. 8 is a flowchart showing the speech speed conversion section setting method identification process (S5) included in the voice reproduction process (1).

FIG. 9 is a diagram contrasting, for the voice reproduction process (1), the normal reproduction timing of the voice data to be reproduced with the reproduction timing when the playback speed is changed and the speech speed is converted in the pronunciation portion of a pronunciation element the user finds difficult.

FIG. 10 is a flowchart showing the voice reproduction process (2) of the second embodiment of the learning support device 10.

FIG. 11 is a flowchart showing the speech speed conversion pronunciation section identification process (A4) included in the voice reproduction process (2).

FIG. 12 is a diagram contrasting, for the voice reproduction process (2), the normal reproduction timing of the voice data of two words that each contain one of two similar phonemes with the reproduction timing when the playback speed is changed and the speech speed is converted in the pronunciation portions of the similar phonemes.

Embodiments of the present invention will be described below with reference to the drawings.

FIG. 1 is a diagram showing the external configuration of a learning support device 10 according to an embodiment of the electronic device of the present invention.

The electronic device is configured either as the dedicated learning support device 10 described below (an electronic dictionary in the embodiment) or as a device having a learning support function, such as a tablet-type PDA (personal digital assistant), a PC (personal computer), a mobile phone, an electronic book, or a portable game machine.

The learning support device 10 includes a foldable case in which a main body case 11 and a lid case 12 can be opened and closed via a hinge portion 13. The surface of the main body case 11 that is exposed when the foldable case is opened is provided with a key input unit (keyboard) 14, a voice output unit (including a speaker) 15, and a voice input unit (including a microphone) 16. The key input unit 14 includes a [Home] key 14a, function designation keys 14b, character input keys 14c, a [Translate/Enter] key 14d, a [Back/List] key 14e, cursor keys 14f, a [Shift] key 14g, a [Voice] key 14S, and the like.

A touch panel type display unit (display) 17 is provided on the surface of the lid case 12. The touch panel type display unit 17 integrates a display device with a touch position detection device that detects the position touched by the user with a pen, a finger, or the like, and is formed by overlaying a transparent touch panel on a backlit color liquid crystal display screen.

At the right end of the touch panel type display unit 17, a touch key area 17A is provided on which the labels of keys and functions ([Home], [Voice], [Translate/Enter], and the like) are permanently printed, so that pressing some keys of the key input unit 14 and designating some functions of the learning support device 10 can be performed by touch operation.

The function designation keys 14b of the key input unit 14 are keys for directly designating the items marked on them: dictionary content (such as [Large Dictionary]), dictionary content categories (such as [Japanese], [Classical Japanese], [Kanji-Japanese], and [English-Japanese]), learning content categories ([Learning 1] and [Learning 2]), [Content List], and one tool category, [Study Notebook].

When a key of the key input unit 14 is operated immediately after the [Shift] key 14g, it functions not as the key function printed unframed on its key top but as the key function printed inside a frame. For example, operating the [Translate/Enter] key 14d after the [Shift] key 14g (hereinafter written as the [Shift] + [Enter] key) makes it a [Register] key that starts the function of registering the data designated as the registration target. Likewise, the [Shift] + [Delete] key serves as the [Settings] key.

The [Voice] key 14S of the key input unit 14 and the [Voice] touch key BS of the touch key area 17A are both keys that start the voice reproduction function, which outputs voice data corresponding to the text or item displayed on the touch panel type display unit 17.

For example, as shown in FIG. 1, with the headword explanation screen GE for the headword "establish" displayed on the touch panel type display unit 17 following a headword search of the English-Japanese dictionary, the user starts the voice reproduction function by operating the [Voice] key 14S (or the [Voice] touch key BS). When the user then operates the [Translate/Enter] key 14d with the headword "establish" on the headword explanation screen GE selected as the reproduction target and highlighted (identification display) h, the voice data corresponding to the selected headword (for example, voice data of a native speaker reading "establish" aloud) is reproduced and output from the voice output unit 15.

When reproducing voice data with the voice reproduction function, the learning support device 10 of the present embodiment checks whether the voice data contains the pronunciation of a phoneme the user finds difficult (for example, the pronunciation corresponding to "sh" [ʃ] in "establish"). If it does, the device identifies the reproduction section of the voice data corresponding to the pronunciation portion containing that phoneme, changes (for example, slows) the playback speed of the voice data in the identified section, and reproduces the section with speech speed conversion.

As a result, while reproducing the whole of the voice data the user selected as the reproduction target, the learning support device 10 of the present embodiment can reproduce the voice data of pronunciation portions containing phonemes the user finds difficult at a changed playback speed so that the user can catch them more easily.

FIG. 2 is a block diagram showing the configuration of the electronic circuit of the learning support device 10.

The electronic circuit of the learning support device 10 includes a CPU (processor) 21, which is a computer.

The CPU 21 controls the operation of each circuit unit in accordance with one of the following: a program stored in advance in a storage unit (storage) 22 such as a flash ROM (including a learning support processing program 22a and a voice reproduction processing program 22b); a program read from an external recording medium 23 such as a memory card by a recording medium reading unit 24 and stored in the storage unit 22; or a program downloaded from a Web server (here, a program server) 30 on a communication network N via a communication unit 25 and stored in the storage unit 22.

The storage unit 22, the recording medium reading unit 24, and the communication unit 25, as well as the key input unit 14, the voice output unit 15, the voice input unit 16, and the display unit 17, are connected to the CPU 21 via a data and control bus.

The voice output unit 15 includes a main body speaker 15S that outputs sound based on voice data stored in the storage unit 22 or on recorded voice data.

The voice input unit 16 includes a main body microphone 16M for inputting the voice of the user or the like.

The voice output unit 15 and the voice input unit 16 share an external connection terminal (EX) 26, to which the user connects an earphone microphone 27 as needed.

The earphone microphone 27 has earphones and a remote control unit 27R provided with a microphone 27m.

The storage unit 22 includes a program storage unit that stores programs (including the learning support processing program 22a and the voice reproduction processing program 22b), as well as a learning content storage unit 22c, a dictionary data storage unit 22d, an other content storage unit 22e, a language level data storage unit 22f, a weak pronunciation table storage unit 22g, a pronunciation change idiom table storage unit 22h, a voice reproduction mode data storage unit 22i, a speech speed conversion section setting data storage unit 22j, and a speech speed conversion reproduction section data storage unit 22k.

The learning support processing program 22a includes a system program that governs the overall operation of the learning support device 10, a program for communicating with external electronic devices via the communication unit 25, and programs that, together with the voice reproduction processing program 22b, execute learning functions according to the various content data stored in the learning content storage unit 22c, the dictionary data storage unit 22d, and the other content storage unit 22e.

The voice reproduction processing program 22b includes a program for reproducing the voice data selected as the reproduction target in response to a user operation. It also includes a program that, when the voice data to be reproduced contains the pronunciation of a phoneme the user finds difficult, identifies the reproduction section of the voice data corresponding to the pronunciation portion containing that phoneme, changes the playback speed of the voice data in the identified section, and reproduces the section with speech speed conversion.

The learning content storage unit 22c stores learning content data such as listening lesson data 22c1 and speaking lesson data 22c2.

The listening lesson data 22c1 includes, for example, text data corresponding to model words and sentences for listening lessons (phonetic symbols are added to the text data) and voice data corresponding to the text data. It provides a function of displaying the text data of a word or sentence on the display unit 17 and outputting the voice data from the voice output unit 15.

The speaking lesson data 22c2 includes, for example, model text data for speaking lessons (phonetic symbols are added to the text data) and voice data corresponding to the text data. It provides a function of displaying the text data on the display unit 17, outputting the voice data from the voice output unit 15, then analyzing the user's voice data input from the voice input unit 16 and outputting a determination result, such as correct or incorrect, by display or voice.

The dictionary data storage unit 22d stores various kinds of dictionary content data, such as an English-Japanese dictionary, a Japanese-English dictionary, an English-English dictionary, and a Japanese dictionary. The dictionary content data provides a function of searching the dictionary for the explanatory information corresponding to a headword that the user enters by key input or voice input as the search target, and outputting that information by display or voice.

For each dictionary, the dictionary content data includes text data such as headwords, explanatory information covering the meaning and content of each headword, and example sentences containing the headword, together with voice data corresponding to the text data. Phonetic symbols are added to, for example, the text data of the headwords and example sentences.

The other content storage unit 22e stores content data other than the learning content data (22c) and the dictionary content data (22d), such as books, newspapers, and magazines. Each item of other content data includes its text data and the voice data corresponding to that text data.

The language level data storage unit 22f stores data on the user's language level, for example as beginner: 1, intermediate: 2, and advanced: 3. The language level is either entered by the user and stored, or automatically updated and stored with the level determined by a learning function when that function is executed on the learning content data (22c) or the dictionary content data (22d).

The weak pronunciation table storage unit 22g stores, as a table, the phonetic symbols of phonemes the user finds difficult and the voice data corresponding to those phonetic symbols, classified by, for example, three language levels (beginner: 1, intermediate: 2, advanced: 3) (see FIG. 3).

FIG. 3 is a diagram showing an example of the phonetic symbols of a plurality of phonemes that a user finds difficult, described in the weak pronunciation table (22g) and divided into three language levels.

The weak pronunciation table (22g) shown in FIG. 3 lists, as phonetic symbols of phonemes the user finds difficult, a plurality of pairs of phonetic symbols, each pair consisting of two similar phonemes that are difficult for the user to tell apart. For a user at language level 1 (beginner), all of the pairs in the table (16 pairs) are treated as similar phonemes that are hard to distinguish. For a user at language level 2 (intermediate), the bottom nine pairs in the table are treated as such, and for a user at language level 3 (advanced), the bottom four pairs.
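The level-dependent selection described for FIG. 3 can be sketched as a slice counted from the bottom of the table. The row labels below are placeholders, because the actual phonetic-symbol pairs of FIG. 3 are not reproduced in this text; the names `WEAK_PAIR_TABLE`, `ROWS_FROM_BOTTOM`, and `weak_pairs_for_level` are likewise hypothetical.

```python
# Placeholder labels only: the real table holds 16 pairs of similar phonemes,
# listed from top to bottom as in FIG. 3.
WEAK_PAIR_TABLE = [f"pair_{i:02d}" for i in range(1, 17)]

# Number of rows, counted from the bottom of the table, that apply per level:
# level 1 (beginner) uses all 16, level 2 the bottom 9, level 3 the bottom 4.
ROWS_FROM_BOTTOM = {1: 16, 2: 9, 3: 4}


def weak_pairs_for_level(level, table=WEAK_PAIR_TABLE):
    """Return the pairs treated as hard to distinguish at the given level."""
    if level not in ROWS_FROM_BOTTOM:
        raise ValueError(f"unknown language level: {level}")
    return table[-ROWS_FROM_BOTTOM[level]:]
```

Under this reading, the rows are ordered so that the pairs that remain difficult even for advanced users sit at the bottom, and a higher level simply narrows the slice.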

The pronunciation change idiom table storage unit 22h stores, as a table, data on a plurality of pronunciation change phrases: phrases such as idioms and set phrases that consist of several connected words and whose pronunciation differs from that of the individual words pronounced separately (for example, "there is", in which the separately pronounced "there" and "is" run together into a single linked form). The data for each pronunciation change phrase includes its text data and the voice data corresponding to the text data. Phonetic symbols are added to the text data over the range of text whose pronunciation changes, and are treated as phonetic symbols of phonemes the user finds difficult.
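As a rough illustration of how such a table could be used, the sketch below looks up registered phrases in a text and returns the character ranges they occupy. It is an assumption-laden example rather than the disclosed method: only the phrase "there is" comes from the description, and the table layout and the function name `find_changed_phrases` are hypothetical.

```python
import re

# Hypothetical table contents; only "there is" appears in the description.
PRONUNCIATION_CHANGE_PHRASES = ["there is"]


def find_changed_phrases(text, phrases=PRONUNCIATION_CHANGE_PHRASES):
    """Return sorted (start, end, phrase) character ranges of registered phrases.

    Matching is case-insensitive and limited to whole words.
    """
    hits = []
    for phrase in phrases:
        pattern = r"\b" + re.escape(phrase) + r"\b"
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            hits.append((match.start(), match.end(), phrase))
    return sorted(hits)
```

The character ranges found this way correspond to the range of text over which the pronunciation changes, which is where the table attaches its phonetic symbols.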

The voice reproduction mode data storage unit 22i stores data indicating the reproduction mode of the voice data (such as a normal reproduction mode, a weak pronunciation listening (practice) mode, or a similar phoneme distinction (practice) mode). The reproduction mode is selected, for example, in response to a user operation.

The speech speed conversion section setting data storage unit 22j stores data (speech speed conversion section setting data) indicating how to set the reproduction section of the voice data that is reproduced at a changed playback speed with speech speed conversion because it is a pronunciation portion containing a phoneme the user finds difficult: in units of phonemes, in units of words, or in units of sentences. The speech speed conversion section setting data (phoneme unit / word unit / sentence unit) is either specified arbitrarily by a user operation or specified according to the user's language level.
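The effect of the three units can be sketched as widening the time span of the difficult phoneme to the word or sentence that encloses it. This is a minimal sketch assuming that word and sentence boundaries are available as time spans; the `Span` type and the function names are hypothetical and do not appear in the description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """A time span within the voice data, in milliseconds (hypothetical type)."""
    start_ms: int
    end_ms: int


def enclosing(span, candidates):
    """Return the first candidate span that fully contains the given span."""
    for candidate in candidates:
        if candidate.start_ms <= span.start_ms and span.end_ms <= candidate.end_ms:
            return candidate
    raise ValueError("no enclosing span found")


def conversion_section(phoneme, words, sentences, unit):
    """Widen the span of the difficult phoneme to the configured unit."""
    if unit == "phoneme":
        return phoneme
    if unit == "word":
        return enclosing(phoneme, words)
    if unit == "sentence":
        return enclosing(phoneme, sentences)
    raise ValueError(f"unknown unit: {unit}")
```

For example, with a phoneme at 400 to 520 msec inside a word spanning 300 to 600 msec, the word-unit setting would slow the whole word rather than only the phoneme.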

The speech speed conversion reproduction section data storage unit 22k stores data on the reproduction timing (for example, N msec to M msec) corresponding to the speech speed conversion reproduction section specified on the basis of the speech speed conversion section setting data (22j). This timing lies within the reproduction timing of the voice data to be reproduced, which runs from its beginning (start time) to its end (end time) and is managed, for example, as the time elapsed up to the end with the beginning taken as 0 msec (see FIG. 9).
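The relationship between the normal reproduction timing and the converted one can be worked through numerically. The sketch below assumes a single section from N to M msec played at a constant rate; the function name `converted_timeline` and the example rate of 0.5 are illustrative assumptions, not values from the description.

```python
def converted_timeline(total_ms, section, rate):
    """Compute where a slowed section lands on the output timeline.

    total_ms: normal length of the voice data (beginning taken as 0 msec)
    section:  (N, M), the speech speed conversion section in normal timing
    rate:     playback rate inside the section (0.5 means half speed)
    """
    n, m = section
    if not (0 <= n < m <= total_ms) or rate <= 0:
        raise ValueError("invalid section or rate")
    stretched = (m - n) / rate  # duration of the section after conversion
    return {
        "section_out": (n, n + stretched),
        "total_out": total_ms - (m - n) + stretched,
    }
```

With 1000 msec of voice data and a section from 400 to 600 msec at half speed, the section occupies 400 to 800 msec on the output timeline and the total reproduction time becomes 1200 msec.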

In the learning support device 10 configured in this way, the CPU 21 controls the operation of each circuit unit in accordance with the instructions described in the learning support processing program 22a and the voice reproduction processing program 22b, and software and hardware operate in cooperation to realize the voice reproduction function described in the following explanation of the operation.

Next, the operation of the learning support device (electronic dictionary) 10 of the embodiment will be described.

(First Embodiment)
FIG. 4 is a flowchart showing the voice reproduction process (1) of the first embodiment of the learning support device 10.

FIG. 5 is a flowchart showing the voice selection process (S1) included in the voice reproduction process (1).

FIG. 6 is a flowchart showing the weak pronunciation element identification process (S3) included in the voice reproduction process (1).

FIG. 7 is a flowchart showing the pronunciation timing identification process (S4) included in the voice reproduction process (1).

FIG. 8 is a flowchart showing the speech speed conversion section setting method identification process (S5) included in the voice reproduction process (1).

FIG. 9 is a diagram contrasting, for the voice reproduction process (1), the normal reproduction timing of the voice data to be reproduced with the reproduction timing when the playback speed is changed and the speech speed is converted in the pronunciation portion of a pronunciation element the user finds difficult.

In the voice selection process (S1) for selecting the voice data to be reproduced (see FIG. 5), when a dictionary is selected, for example by the user operating a function designation key 14b (step S101 (Yes)), the CPU 21 causes the display unit 17 to display a headword input screen (not shown) for entering the headword to be searched for in the selected dictionary data (step S102).

When the headword to be searched for is entered on the headword input screen in response to a user operation, the CPU 21 searches the selected dictionary data for the entered headword (step S103) and causes the display unit 17 to display the headword explanation screen GE (see FIG. 1), which presents the retrieved headword together with its meaning and content (step S104).

When text is selected on the headword explanation screen GE, for example by the user's touch operation on the screen GE, and the selected text is highlighted (identification display) h (step S105 (Yes)), the CPU 21 sets the voice data corresponding to the selected text as the reproduction target (step S106).

On the other hand, when learning content of the listening lesson data 22c1 stored in the learning content storage unit 22c is selected in response to a user operation (step S107 (Yes)), the CPU 21 causes the display unit 17 to display a list of the items in the selected learning content, for example the words and sentences available for listening practice (step S108).

When an item is selected from the displayed list, for example by the user's touch operation (step S109 (Yes)), the CPU 21 sets the voice data for the text of the word or sentence corresponding to the selected item as the reproduction target (step S110).

Further, when other content stored in the other content storage unit 22e is selected in response to a user operation (step S111 (Yes)), the CPU 21 causes the display unit 17 to display the text data of the selected other content (step S112).

When any text, such as a word or a sentence, is selected from the displayed text data of the other content, for example by a touch operation by the user (step S113 (Yes)), the CPU 21 sets the voice data corresponding to the selected text as the reproduction target (step S114).

When the voice data to be reproduced has been selected in this way according to the voice selection process (S1), the CPU 21 determines, based on the reproduction mode data stored in the voice reproduction mode data storage unit 22i, whether the mode is the weak pronunciation listening mode or the normal reproduction mode (step S2).

If the normal reproduction mode is determined (step S2 (No)), the CPU 21 reproduces the voice data to be reproduced from beginning to end at the normal reproduction speed, following the normal reproduction timing (step S11).

On the other hand, if the weak pronunciation listening mode is determined (step S2 (Yes)), the CPU 21 proceeds to the weak pronunciation element identification process (S3) (see FIG. 6) for identifying the pronunciation elements (phonemes) that the user is not good at.

On entering the weak pronunciation element identification process, the CPU 21 causes the display unit 17 to display an item selection screen on which the user chooses how the pronunciation elements the user is not good at are to be identified: arbitrarily by the user, automatically according to the user's language level, or automatically based on the pronunciation change idiom table (22h).

When the item for having the user arbitrarily identify the weak pronunciation elements is selected on the item selection screen (step S31 (Yes)), the CPU 21 causes the display unit 17 to display a list of phonetic symbols read from, for example, English dictionary data (step S32).

When the phonetic symbols of one or more phonemes that the user is not good at are selected from the list of phonetic symbols in response to a user operation (step S33 (Yes)), the CPU 21 identifies the pronunciation elements of the selected phonetic symbols as weak pronunciation elements (step S34).

On the other hand, when the item for automatically identifying the weak pronunciation elements according to the user's language level is selected on the item selection screen (step S35 (Yes)), the CPU 21 acquires the user's language level (beginner: 1, intermediate: 2, or advanced: 3) from the language level data storage unit 22f (step S36) and identifies, from the weak pronunciation table (22g: see FIG. 3), the pronunciation elements of the plurality of phonetic symbols associated with that language level as weak pronunciation elements (step S37).

Further, when the item for automatically identifying the weak pronunciation elements based on the pronunciation change idiom table (22h) is selected on the item selection screen (step S35 (No)), the CPU 21 identifies as weak pronunciation elements the pronunciation elements of the phonetic symbols that are associated with the text data of each of the pronunciation change phrases in the pronunciation change idiom table (22h) and that correspond to the range of text whose pronunciation changes (step S38).
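The three identification routes of steps S31 to S38 can be sketched roughly as follows. This is only an illustration: the table contents, the function name `identify_weak_elements`, and the route labels are hypothetical, since the actual contents of the weak pronunciation table (22g) and the pronunciation change idiom table (22h) are not given in this passage.

```python
# Hypothetical stand-in for the weak pronunciation table (22g), keyed by
# language level (1: beginner, 2: intermediate, 3: advanced). The real
# contents appear in FIG. 3 and are not reproduced here.
WEAK_TABLE = {1: ["θ", "ð", "r", "l"], 2: ["θ", "ð"], 3: ["θ"]}

# Hypothetical stand-in for the pronunciation change idiom table (22h):
# phrase text -> phonetic symbols of the span whose pronunciation changes.
IDIOM_TABLE = {"want to": ["n", "ə"], "get up": ["ɾ", "ʌ"]}


def identify_weak_elements(route, user_symbols=(), level=None):
    """Return the set of weak pronunciation elements for the chosen route."""
    if route == "manual":        # steps S31 to S34: symbols picked by the user
        return set(user_symbols)
    if route == "by_level":      # steps S35 (Yes), S36, S37
        return set(WEAK_TABLE[level])
    if route == "idiom_table":   # steps S35 (No), S38
        return {s for symbols in IDIOM_TABLE.values() for s in symbols}
    raise ValueError(f"unknown route: {route}")
```

For instance, under these assumed tables, `identify_weak_elements("by_level", level=2)` yields the two symbols listed for the intermediate level.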

When the pronunciation elements (phonemes) that the user is not good at have been identified in this way according to the weak pronunciation element identification process (S3), the CPU 21 proceeds to the pronunciation timing identification process (S4) (see FIG. 7). This process identifies, within the voice data to be reproduced that was selected according to the voice selection process (S1), the pronunciation timing corresponding to the weak pronunciation elements identified according to the weak pronunciation element identification process (S3).

On entering the pronunciation timing identification process, the CPU 21 determines whether phonetic symbols are attached to the text data corresponding to the voice data to be reproduced (step S41).

When the voice data to be reproduced selected according to the voice selection process (S1) is, for example, voice data selected from the dictionary data (22d) or the learning content data (22c), and it is determined that phonetic symbols are attached to the text data corresponding to that voice data (step S41 (Yes)), the CPU 21 determines whether the weak pronunciation elements identified according to the weak pronunciation element identification process (S3) were identified using the pronunciation change idiom table (22h) (step S42).

If it is determined that the weak pronunciation elements identified according to the weak pronunciation element identification process (S3) were not identified using the pronunciation change idiom table (22h), that is, that they are pronunciation elements of phonetic symbols arbitrarily identified by the user or pronunciation elements of phonetic symbols identified from the weak pronunciation table (22g) according to the user's language level (step S42 (No)), the CPU 21 identifies the pronunciation timing in the voice data to be reproduced that corresponds to the phonetic symbols of those weak pronunciation elements, based on the positions of the phonetic symbols attached to the text data corresponding to the voice data (step S44).

If it is determined that the weak pronunciation elements identified according to the weak pronunciation element identification process (S3) were identified using the pronunciation change idiom table (22h) (step S42 (Yes)), the CPU 21 determines whether the text data corresponding to the voice data to be reproduced contains a portion that matches the text data of a pronunciation change idiom (pronunciation change phrase) in the pronunciation change idiom table (22h), that is, whether the voice data to be reproduced contains a portion whose pronunciation changes in the same way as in the pronunciation change phrase (step S43).

If it is determined that the text data corresponding to the voice data to be reproduced contains a portion that matches the text data of a pronunciation change idiom (pronunciation change phrase) (step S43 (Yes)), the CPU 21 identifies the pronunciation timing in the voice data to be reproduced that corresponds to the phonetic symbols of the weak pronunciation elements identified using the pronunciation change idiom table (22h), based on the positions of the phonetic symbols attached to the text data corresponding to the voice data (step S44).

On the other hand, when the voice data to be reproduced selected according to the voice selection process (S1) is, for example, voice data selected from the other content data (22e), and it is determined that no phonetic symbols are attached to the text data corresponding to that voice data (step S41 (No)), the CPU 21 performs voice recognition on the voice data to be reproduced and decomposes it into phoneme sections, one for each of the phonemes contained between its beginning (start time) and its end (end time) (step S45).

The CPU 21 then determines whether the weak pronunciation elements identified according to the weak pronunciation element identification process (S3) were identified using the pronunciation change idiom table (22h) (step S46).

If it is determined that the weak pronunciation elements identified according to the weak pronunciation element identification process (S3) were not identified using the pronunciation change idiom table (22h), that is, that they are pronunciation elements of phonetic symbols arbitrarily identified by the user or pronunciation elements of phonetic symbols identified from the weak pronunciation table (22g) according to the user's language level (step S46 (No)), the CPU 21 determines whether the voice data to be reproduced, decomposed into phoneme sections in step S45, contains a phoneme section that matches the voice data of a weak pronunciation element identified arbitrarily or from the weak pronunciation table (22g) (step S49).

If it is determined that the voice data to be reproduced contains a phoneme section that matches the voice data of a weak pronunciation element identified arbitrarily or from the weak pronunciation table (22g) (step S49 (Yes)), the CPU 21 identifies the pronunciation timing of the matching phoneme section in the voice data to be reproduced (step S48).

If it is determined that the weak pronunciation elements identified according to the weak pronunciation element identification process (S3) were identified using the pronunciation change idiom table (22h) (step S46 (Yes)), the CPU 21 determines whether the voice data to be reproduced, decomposed into phoneme sections in step S45, contains a phoneme section that matches the voice data of a weak pronunciation element identified from the pronunciation change idiom table (22h) (step S47).

If it is determined that the voice data to be reproduced contains a phoneme section that matches the voice data of a weak pronunciation element identified from the pronunciation change idiom table (22h) (step S47 (Yes)), the CPU 21 identifies the pronunciation timing of the matching phoneme section in the voice data to be reproduced (step S48).
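The matching of steps S45 to S49 can be illustrated with a small sketch. It assumes that voice recognition has already produced labelled phoneme sections and, as a simplification, compares phonetic symbol labels rather than the voice data itself. The `PhonemeSegment` type, the function name, and the 100 to 400 msec boundaries for "think" are hypothetical; only the 0 to 100 msec section for [θ] comes from the example in this description.

```python
from typing import NamedTuple


class PhonemeSegment(NamedTuple):
    symbol: str    # phonetic symbol assigned by voice recognition
    start_ms: int  # start of the phoneme section
    end_ms: int    # end of the phoneme section


def find_pronunciation_timings(segments, weak_elements):
    """Return (start_ms, end_ms) for every section matching a weak element.

    An empty list corresponds to the "No" branches of steps S47 and S49.
    """
    return [(s.start_ms, s.end_ms) for s in segments if s.symbol in weak_elements]


# "think" decomposed into phoneme sections (boundaries after 100 msec are assumed).
think = [
    PhonemeSegment("θ", 0, 100),
    PhonemeSegment("ɪ", 100, 200),
    PhonemeSegment("ŋ", 200, 300),
    PhonemeSegment("k", 300, 400),
]
timings = find_pronunciation_timings(think, {"θ"})
```

Returning a list rather than a single timing allows for voice data in which the weak element occurs more than once.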

When the pronunciation timing corresponding to the weak pronunciation elements (phonemes) in the voice data to be reproduced has been identified in this way according to the pronunciation timing identification process (S4), the CPU 21 proceeds to the speech speed conversion section setting method identification process (S5) (see FIG. 8). This process identifies whether the reproduction section that is reproduced with changed reproduction speed and speech speed conversion, as the pronunciation portion containing the pronunciation timing of the weak pronunciation element (phoneme), is to be set in phoneme units, in word units, or in sentence units.

On entering the speech speed conversion section setting method identification process, the CPU 21 causes the display unit 17 to display a setting method identification item selection screen on which the user chooses whether the setting method for the speech speed conversion section is to be identified arbitrarily by the user or automatically according to the user's language level.

When the item for having the user arbitrarily identify the setting method is selected on the setting method identification item selection screen (step S51 (Yes)), the CPU 21 causes the display unit 17 to display a list of setting methods, namely whether the reproduction section to be reproduced with speech speed conversion is set in phoneme units, in word units, or in sentence units (step S52).

When one of the setting methods (phoneme units, word units, or sentence units) is selected from the list in response to a user operation (step S53 (Yes)), the CPU 21 identifies the selected setting method as the setting method for the speech speed conversion section and stores it in the speech speed conversion section setting data storage unit 22j (step S54).

On the other hand, when the item for automatic identification according to the user's language level is selected on the setting method identification item selection screen (step S51 (No)), the CPU 21 acquires the user's language level (beginner: 1, intermediate: 2, or advanced: 3) from the language level data storage unit 22f (step S56).

The CPU 21 then identifies the setting method for the speech speed conversion section as <sentence unit> if the user's language level is (beginner: 1) (step S56 → S57a), as <word unit> if it is (intermediate: 2) (step S56 → S57b), and as <phoneme unit> if it is (advanced: 3) (step S56 → S57c), and stores the result in the speech speed conversion section setting data storage unit 22j.
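The selection logic of steps S51 to S57c reduces to a small lookup, sketched below. The level-to-unit mapping follows the description; the function and constant names are hypothetical, and storage in the setting data storage unit 22j is left out.

```python
# Language level -> setting method, as described for steps S57a to S57c.
UNIT_BY_LEVEL = {1: "sentence", 2: "word", 3: "phoneme"}
VALID_UNITS = ("phoneme", "word", "sentence")


def resolve_section_unit(user_choice=None, level=None):
    """Return the unit used to set the speech speed conversion section.

    A manual choice (steps S51 (Yes) to S54) takes precedence; otherwise
    the unit follows the user's language level (steps S51 (No), S56, S57).
    """
    if user_choice is not None:
        if user_choice not in VALID_UNITS:
            raise ValueError(f"unknown unit: {user_choice}")
        return user_choice
    return UNIT_BY_LEVEL[level]
```

The mapping widens the slowed section as the level drops: an advanced user hears only the phoneme slowed, while a beginner hears the whole sentence slowed.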

Based on the setting method for the speech speed conversion section identified according to the speech speed conversion section setting method identification process (S5), the CPU 21 identifies the reproduction section in the voice data to be reproduced that contains the pronunciation timing of the weak pronunciation element (phoneme) (step S6).

For example, suppose that, as shown in FIG. 9(A), the voice data to be reproduced selected in the voice selection process (S1) is the voice data corresponding to the English word "think" selected from the learning content data (22c) or the dictionary data (22d); that the weak pronunciation element identified according to the weak pronunciation element identification process (S3) is the pronunciation element (phoneme) of the phonetic symbol [θ] corresponding to the "th" of "think"; that the pronunciation timing identified according to the pronunciation timing identification process (S4) is 0 to 100 msec of the voice data "think"; and that the setting method identified according to the speech speed conversion section setting method identification process (S5) is <phoneme unit>. In this case, the CPU 21 identifies the reproduction section containing the weak pronunciation element (phoneme) [θ] in the voice data "think" as 0 to 100 msec, which is the <phoneme unit>, and stores it in the speech speed conversion reproduction section data storage unit 22k (step S6).
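Step S6 can be pictured as widening the identified pronunciation timing to the configured unit. The sketch below assumes that word and sentence boundaries are already available as millisecond spans, which this description does not spell out; the function name and the span lists are hypothetical.

```python
def section_for_unit(timing, unit, word_spans, sentence_spans):
    """Widen a pronunciation timing (start_ms, end_ms) to the chosen unit."""
    if unit == "phoneme":
        return timing  # the phoneme section itself, e.g. 0 to 100 msec
    if unit == "word":
        spans = word_spans
    elif unit == "sentence":
        spans = sentence_spans
    else:
        raise ValueError(f"unknown unit: {unit}")
    start, end = timing
    # Pick the word or sentence span that encloses the phoneme section.
    for span_start, span_end in spans:
        if span_start <= start and end <= span_end:
            return (span_start, span_end)
    raise ValueError("timing is not inside any known span")


# The single word "think" (0 to 400 msec) with [θ] at 0 to 100 msec.
phoneme_section = section_for_unit((0, 100), "phoneme", [(0, 400)], [(0, 400)])
word_section = section_for_unit((0, 100), "word", [(0, 400)], [(0, 400)])
```

With the <phoneme unit> setting the section stays at 0 to 100 msec, while the <word unit> setting widens it to the whole word.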

The CPU 21 starts normal reproduction of the voice data "think" from the voice output unit 15 (step S7) and determines whether the section of the voice data "think" currently being reproduced is the reproduction section of 0 to 100 msec stored in the speech speed conversion reproduction section data storage unit 22k as the target of speech speed conversion (step S8).

While the section of the voice data "think" being reproduced is determined to be the reproduction section of 0 to 100 msec that is the target of speech speed conversion (step S8 (Yes)), the CPU 21, as shown in FIGS. 9(A) and 9(B), switches the reproduction speed to a slower speed (here, 2.7 times slower) for the pronunciation portion corresponding to the "th" of the voice data "think" and reproduces it with speech speed conversion (step S9a).

While the section of the voice data "think" being reproduced is determined to be the reproduction section of 100 to 400 msec, which is not the reproduction section of 0 to 100 msec targeted for speech speed conversion (step S8 (No)), the CPU 21, as shown in FIGS. 9(A) and 9(B), switches the reproduction speed to the normal reproduction speed for the pronunciation portion corresponding to the "ink" of the voice data "think" and reproduces it without speech speed conversion (step S9b).

When it is determined that reproduction of the entire voice data "think" (0 to 400 msec) has finished (step S10 (Yes)), the CPU 21 ends the series of steps of the voice reproduction process (1).
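The loop of steps S7 to S10 can be summarised as a reproduction plan that alternates normal and slowed pieces. The following sketch only computes that plan and the resulting duration; it does not perform audio time stretching. It assumes that "2.7 times slower" means the slowed piece lasts 2.7 times as long and that the slow sections do not overlap. All names are hypothetical.

```python
SLOW_FACTOR = 2.7  # assumed meaning: a slowed piece lasts 2.7 times as long


def playback_plan(total_ms, slow_sections, factor=SLOW_FACTOR):
    """Split 0..total_ms into (start_ms, end_ms, stretch) pieces.

    stretch is `factor` inside a speech speed conversion section (step S9a)
    and 1.0 elsewhere (step S9b). Sections must not overlap.
    """
    plan = []
    cursor = 0
    for start, end in sorted(slow_sections):
        if cursor < start:
            plan.append((cursor, start, 1.0))
        plan.append((start, end, factor))
        cursor = end
    if cursor < total_ms:
        plan.append((cursor, total_ms, 1.0))
    return plan


def played_duration_ms(plan):
    """Total reproduction time once each piece is stretched."""
    return sum((end - start) * stretch for start, end, stretch in plan)


plan = playback_plan(400, [(0, 100)])  # "think" with [θ] at 0 to 100 msec
```

Under these assumptions the 400 msec recording of "think" takes 100 × 2.7 + 300 = 570 msec to reproduce. With an empty list of slow sections the plan is a single piece at normal speed.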

As a result, the voice data "think" is reproduced with the reproduction speed switched to a slower speed and with speech speed conversion only in the reproduction section of the pronunciation portion containing the pronunciation element (phoneme) of the phonetic symbol [θ] for "th", which the user is not good at. The user can therefore practice listening to the pronunciation portion containing the weak phoneme in a natural way, without being hindered from hearing the voice data "think" as a whole.

The example above describes the case in which the voice data to be reproduced is a word and the setting method identified by the speech speed conversion section setting method identification process (S5) is <phoneme unit>. When the voice data to be reproduced is a sentence consisting of a plurality of words (for example, the example sentence "Where do you think she lives?" containing the English word "think") and the setting method is identified as <word unit>, the reproduction section of "think" is identified as the word-unit pronunciation portion containing the weak pronunciation element (phoneme) [θ] in the voice data "Where do you think she lives?" (see FIG. 9(A)). Within the whole of the voice data "Where do you think she lives?", the reproduction section of the word "think" is reproduced with the reproduction speed switched to a slower speed (for example, 2.7 times slower) and with speech speed conversion, and the reproduction sections of the other words are reproduced at the normal reproduction speed without speech speed conversion (steps S8, S9a, S10).

Compared with the case in which the setting method for the speech speed conversion section is <phoneme unit>, this allows the voice data of the English word "think" containing the weak pronunciation element (phoneme) [θ] to be reproduced so that it is easier for the user to hear than the other words in the sentence being reproduced.

Further, when the voice data to be reproduced selected in the voice selection process (S1) is, for example, voice data of a passage made up of a plurality of consecutive sentences selected from the other content (22e), and the setting method for the speech speed conversion section is identified as <sentence unit>, the reproduction section of the sentence that contains a word with the weak pronunciation element (phoneme) [θ] is identified as the sentence-unit pronunciation portion in the voice data to be reproduced (here, the passage). Within the whole passage, the reproduction section of the sentence containing the word with the weak pronunciation element (phoneme) [θ] is reproduced with the reproduction speed switched to a slower speed (for example, 2.7 times slower) and with speech speed conversion, and the reproduction sections of the other sentences are reproduced at the normal reproduction speed without speech speed conversion (steps S8, S9a, S10).

Compared with the cases in which the setting method for the speech speed conversion section is <phoneme unit> or <word unit>, this allows the voice data of the sentence containing a word with the weak pronunciation element (phoneme) [θ] to be reproduced so that it is easier for the user to hear than the other sentences in the passage being reproduced.
In addition, for a phrase made up of a plurality of connected words in which the sound changes where the words run together, the pronunciation portion spanning those words can be made easier to hear.

If, in the pronunciation timing identification process (S4) described with reference to FIG. 7, it is determined that the voice data to be reproduced does not contain a pronunciation portion corresponding to a weak pronunciation element (step S43 (No) / S47 (No) / S49 (No)), no reproduction section targeted for speech speed conversion is identified in the voice data, so the voice data is reproduced at the normal reproduction speed over its entire reproduction section (steps S5 to S8 (No), S9b, S10, end).

According to the voice reproduction process (1) of the first embodiment of the learning support device 10 configured as described above, when reproducing voice data corresponding to, for example, English text data arbitrarily selected by the user from the dictionary data (22d), the learning content data (22c), or the like, and the voice reproduction mode (22i) is set to the weak pronunciation listening mode, the pronunciation elements of the phonetic symbols that the user is not good at are identified. These are either arbitrarily selected by the user from a list of phonetic symbols or registered in the weak pronunciation table (22g) or the pronunciation change idiom table (22h).

The pronunciation timing corresponding to the weak pronunciation element in the voice data to be reproduced is then identified either based on the positions of the phonetic symbols attached to the text data corresponding to the voice data, or by performing voice recognition on the voice data, decomposing it into its constituent phoneme sections, and determining the phoneme section that matches the voice data of the weak pronunciation element. The reproduction section of the voice data for the pronunciation portion containing the identified pronunciation timing of the weak pronunciation element is then identified.

Normal reproduction of the voice data to be reproduced is then started, and in the reproduction section of the pronunciation portion containing the pronunciation timing of the weak pronunciation element, the voice data is reproduced with the reproduction speed switched to a slower speed and with speech speed conversion.

As a result, the user can practice listening to the pronunciation portion containing the weak phoneme in a natural way, without being hindered from hearing the whole of the text selected by the user.

Further, according to the voice reproduction process (1) of the first embodiment of the learning support device 10, the reproduction section of the pronunciation portion containing the pronunciation timing of the weak pronunciation element in the voice data to be reproduced is identified according to the setting method for the speech speed conversion section (<phoneme unit>, <word unit>, or <sentence unit>), which is either arbitrarily selected by the user or identified according to the user's language level (22f).

Therefore, when the user is an advanced language learner, for example, only the pronunciation portion of the phoneme itself, containing the weak pronunciation element (phoneme), is identified in the voice data to be reproduced as the reproduction section targeted for speech speed conversion. When the user is an intermediate or beginner language learner, the whole word or the whole sentence is identified as the reproduction section targeted for speech speed conversion, as the pronunciation portion containing the weak pronunciation element (phoneme). The pronunciation portion of the voice data containing the weak pronunciation element (phoneme) can thus be identified and reproduced over a range that, according to the user's language level, is easy for the user to hear and effective for learning.

The voice reproduction process (1) of the first embodiment described above is an example in which, when the voice data to be reproduced is reproduced, the reproduction section of the pronunciation portion containing the weak pronunciation element (phoneme) is identified in the voice data, and the identified reproduction section is reproduced with the reproduction speed switched (changed) to a slower speed.

The following describes, as the voice reproduction process (2) of the second embodiment, an example in which the voice data of two words (which may also be compound words, set phrases, and the like), each containing one of two similar pronunciation elements (phonemes), is reproduced so that the user can practice telling the similar phonemes apart. In the voice data of each of the two words, the reproduction section of the pronunciation portion containing the similar pronunciation element (phoneme) is specified, and the specified section is reproduced with speech speed conversion at a slower reproduction speed.

(Second Embodiment)
FIG. 10 is a flowchart showing the voice reproduction process (2) of the second embodiment of the learning support device 10.

FIG. 11 is a flowchart showing the speech speed conversion pronunciation section specifying process (A4) included in the voice reproduction process (2).

FIG. 12 is a diagram comparing, for the voice data of two words that each contain one of two similar phonemes, the normal reproduction timing with the reproduction timing obtained when the voice reproduction process (2) changes the reproduction speed by speech speed conversion for the pronunciation portions of the similar phonemes.

When the voice reproduction process (2) is started in response to a user operation, the CPU 21 determines, based on the reproduction mode data stored in the voice reproduction mode data storage unit 22i, whether the reproduction mode is the similar-phoneme discrimination mode (step A1).

When the mode is determined to be the similar-phoneme discrimination mode (step A1 (Yes)), the CPU 21 displays on the display unit 17 a list of the sets (16 sets) of phonetic symbols of two similar phonemes that the user is weak at and finds difficult to tell apart, as described, for example, in the weak pronunciation table (22g: see FIG. 3), and has the user select the set of phonetic symbols of the two similar phonemes to be discriminated (step A2).

Here, it is assumed that the set of phonetic symbols ([∫]:[θ]), classified under language level 1 (beginner) in the weak pronunciation table (22g), is selected as the set of phonetic symbols of the two similar phonemes to be discriminated.

The CPU 21 selects voice data that respectively contains the two selected similar phonemes to be discriminated (step A3).

Here, it is assumed that, based on the phonetic symbols ([∫]:[θ]) of the two similar phonemes selected in step A2, the voice data corresponding to two words containing the respective phonemes (“sink” and “think”) is selected from the dictionary data (22d) or the learning content data (22c).

The CPU 21 then proceeds to the speech speed conversion pronunciation section specifying process (A4: see FIG. 11). For the voice data corresponding to the two selected words (“sink” and “think”), it specifies the pronunciation timing corresponding to the similar pronunciation elements ([∫]:[θ]) in each piece of voice data, and applies speech speed conversion at a slower reproduction speed to the voice data of the portion corresponding to the specified pronunciation timing.

That is, on entering the speech speed conversion pronunciation section specifying process (A4), the CPU 21 first determines whether the two pieces of voice data selected for discrimination are similar voices, for example based on the degree of matching between the phonetic symbols attached to the text corresponding to each piece of voice data (step A41).
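One way to picture the similarity judgment of step A41 is a ratio of matching phonetic symbols. The sketch below is an assumption, not the method of the embodiment: the text does not say how the degree of matching is computed, so Python's standard `difflib.SequenceMatcher` stands in for it, and the function name `is_similar`, the 0.7 threshold, and the phoneme lists (apart from the first symbol of each, which follows the text's notation) are hypothetical.

```python
from difflib import SequenceMatcher


def is_similar(phonemes_a, phonemes_b, threshold=0.7):
    """Judge two words similar when enough phonetic symbols match.

    SequenceMatcher.ratio() returns 2*M/T, where M is the number of
    matching elements and T is the total length of both sequences.
    The threshold is an assumed value.
    """
    ratio = SequenceMatcher(None, phonemes_a, phonemes_b).ratio()
    return ratio >= threshold


# Illustrative symbol sequences; only the first symbol of each list
# is taken from the text.
sink = ["∫", "ɪ", "ŋ", "k"]
think = ["θ", "ɪ", "ŋ", "k"]
cat = ["k", "æ", "t"]

print(is_similar(sink, think))  # True: 3 of 4 symbols match (ratio 0.75)
print(is_similar(cat, think))   # False: only one symbol matches
```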

When the two pieces of voice data selected for discrimination correspond to the two words (“sink” and “think”) and are determined to be similar voices based on the degree of matching of their phonetic symbols (step A41 (Yes)), the CPU 21 performs voice recognition on each of the two pieces of voice data and decomposes each into the pronunciation timings (0-100-170-270-400 msec) of the phoneme sections that make up the span from the start time of 0 msec to the end time of 400 msec, as shown, for example, in (A1) and (B1) of FIG. 12 (step A42).

The CPU 21 then compares the two pieces of voice data and specifies the pronunciation timing (0-100 msec) corresponding to the phoneme section where they differ (the “s” [∫] portion of “sink” and the “th” [θ] portion of “think”). As shown, for example, in (A2) and (B2) of FIG. 12, it applies speech speed conversion to the voice data of the portion corresponding to the specified pronunciation timing at a slower reproduction speed (here, slowed by a factor of 2.7) (step A43).

On the other hand, when the two pieces of voice data selected for discrimination are determined not to be similar voices (step A41 (No)), the CPU 21 specifies, in each of the two pieces of voice data, the pronunciation timing corresponding to the phonetic symbols ([∫]:[θ]) of the selected similar phonemes, and applies speech speed conversion at a slower reproduction speed to the voice data of the portion corresponding to the specified pronunciation timing (step A44).
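The timing arithmetic of steps A42 and A43 can be sketched as follows, using the boundary values given in the text (0-100-170-270-400 msec) and the factor of 2.7. Several points are assumptions made for illustration: the function names are hypothetical, the two words are taken to have the same number of phonemes aligned one to one, the symbols after the first are invented, and "slowed by a factor of 2.7" is read as multiplying the duration of the section by 2.7. The sketch computes timings only and performs no audio processing.

```python
def differing_sections(phonemes_a, phonemes_b, boundaries_ms):
    """Return (start, end) of each aligned phoneme slot where the words differ."""
    return [
        (boundaries_ms[i], boundaries_ms[i + 1])
        for i, (pa, pb) in enumerate(zip(phonemes_a, phonemes_b))
        if pa != pb
    ]


def stretched_boundaries(boundaries_ms, sections, factor):
    """Recompute the boundaries after lengthening the selected sections."""
    out = [boundaries_ms[0]]
    for start, end in zip(boundaries_ms, boundaries_ms[1:]):
        scale = factor if (start, end) in sections else 1.0
        out.append(out[-1] + round((end - start) * scale))
    return out


# Only the first symbol of each list is taken from the text.
sink = ["∫", "ɪ", "ŋ", "k"]
think = ["θ", "ɪ", "ŋ", "k"]
boundaries = [0, 100, 170, 270, 400]  # msec, as stated for step A42

sections = differing_sections(sink, think, boundaries)
print(sections)                                         # [(0, 100)]
print(stretched_boundaries(boundaries, sections, 2.7))  # [0, 270, 340, 440, 570]
```

Under these assumptions the differing 100 msec section grows to 270 msec and every later boundary shifts by the same 170 msec, while the lengths of the unchanged phoneme sections stay as they were.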

In accordance with the speech speed conversion pronunciation section specifying process (A4), the CPU 21 reproduces the two pieces of speech-speed-converted voice data (“sink” and “think”) in order, first “sink” and then “think”, as shown in (A2) and (B2) of FIG. 12 (steps A5 and A6).

Here, the CPU 21 may reproduce the voice data “sink” and the voice data “think” each time one of them is designated in turn by the user, or may reproduce them automatically in sequence.

According to the voice reproduction process (2) of the second embodiment of the learning support device 10 configured as described above, when the voice data of two words (which may also be compound words, set phrases, and the like), each containing one of two similar pronunciation elements (phonemes), is selected as the voice data to be discriminated, the reproduction section of the pronunciation portion containing the similar pronunciation element (phoneme) is specified in the voice data of each of the two words, and the specified reproduction section is reproduced with speech speed conversion at a slower reproduction speed.

As a result, when voice data such as two English words is reproduced, each word containing one of two similar pronunciation elements (phonemes) that the user has difficulty hearing, the user can easily catch the portions containing the similar pronunciation elements and can practice discriminating them effectively.

In the voice reproduction processes of the first and second embodiments by the learning support device 10 described above, the reproduction section of the pronunciation portion containing the pronunciation element (phoneme) that the user is weak at or that is to be discriminated is specified when the voice data is reproduced. The reproduction speed may be switched in stages both at the timing of switching to the slower speed for the specified section and at the timing of switching back to the original normal speed, so that the change does not sound unnatural to the user.
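Switching the speed in stages can be pictured as a short ramp of intermediate rates. The sketch below is an assumption: the text does not give the number of stages or the rates, so the function name `stepped_rates`, the three stages, and the rates 1.0 and 0.4 are all illustrative.

```python
def stepped_rates(start_rate, end_rate, steps):
    """Return the rates to apply in order when moving between two speeds.

    The change is divided into `steps` equal stages; the last value is
    the target rate itself.
    """
    if steps < 1:
        raise ValueError("steps must be at least 1")
    delta = (end_rate - start_rate) / steps
    return [round(start_rate + delta * i, 3) for i in range(1, steps + 1)]


# Entering the slowed section, then returning to normal speed.
print(stepped_rates(1.0, 0.4, 3))  # [0.8, 0.6, 0.4]
print(stepped_rates(0.4, 1.0, 3))  # [0.6, 0.8, 1.0]
```

With `steps=1` the function reduces to an immediate switch, which corresponds to the behaviour without staged switching.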

In the voice reproduction processes of the first and second embodiments, the reproduction speed is switched to (changed to) a slower speed in the specific reproduction section of the voice data that contains the pronunciation element (phoneme) the user is weak at or that is to be discriminated. This pronunciation portion is thereby reproduced in a way that is easy for the user to hear, so that the user can practice effectively.

Conversely, the reproduction speed of the specific reproduction section of the voice data may be switched to (changed to) a faster speed, so that the pronunciation portion containing the pronunciation element (phoneme) the user is weak at or that is to be discriminated is reproduced in a way that is harder to hear. This enables effective practice for, for example, a user with a high language level.

Furthermore, instead of changing the reproduction speed of the specific reproduction section of the voice data, the reproduction volume of that section may be raised or lowered to emphasize it, so that the user practices listening to the pronunciation portion containing the pronunciation element (phoneme) the user is weak at or that is to be discriminated.
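The volume variant can be sketched as a gain applied only to the samples inside the specified section. This is a minimal sketch under stated assumptions: the function name `emphasize` is hypothetical, samples are taken to be floats in the range -1.0 to 1.0, and results are clipped to that range, a detail the text does not mention.

```python
def emphasize(samples, sample_rate, start_ms, end_ms, gain):
    """Scale the samples inside [start_ms, end_ms) by `gain`.

    A gain above 1.0 raises the volume of the section and a gain below
    1.0 lowers it. Samples outside the section are left unchanged.
    """
    start = start_ms * sample_rate // 1000
    end = end_ms * sample_rate // 1000
    return [
        max(-1.0, min(1.0, s * gain)) if start <= i < end else s
        for i, s in enumerate(samples)
    ]


# Toy signal at 1000 samples per second, so one sample is one msec.
signal = [0.1] * 8
print(emphasize(signal, 1000, 2, 5, 2.0))
```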

Each of the processing methods of the electronic device (learning support device 10) described in the above embodiments, namely the voice reproduction process (1) of the first embodiment shown in the flowcharts of FIGS. 4 to 8 and the voice reproduction process (2) of the second embodiment shown in the flowcharts of FIGS. 10 and 11, can be stored and distributed as a computer-executable program on a medium of an external recording device such as a memory card (ROM card, RAM card, etc.), a magnetic disk (floppy (registered trademark) disk, hard disk, etc.), an optical disc (CD-ROM, DVD, etc.), or a semiconductor memory. The computer (CPU) of the electronic device reads the program recorded on the medium of the external recording device into a storage device, and its operation is controlled by the read program. It thereby realizes the voice reproduction function described in the embodiments and can execute the same processing by the methods described above.

The data of the program for realizing each of the above methods can also be transmitted over the communication network (N) in the form of program code. The program data can be loaded into the electronic device from a computer device (program server) connected to the communication network (N) and stored in the storage device to realize the voice reproduction function described above.

The present invention is not limited to the above embodiments and can be modified in various ways at the implementation stage without departing from its gist. Furthermore, the embodiments include inventions at various stages, and various inventions can be extracted by appropriately combining the disclosed constituent elements. For example, even if some constituent elements are deleted from all those shown in an embodiment, or some are combined in a different form, the resulting configuration can be extracted as an invention, provided that the problem described in the section on the problem to be solved by the invention can be solved and the effect described in the section on the effect of the invention can be obtained.

The inventions described in the claims of the present application as originally filed are appended below.

[Appendix 1]
An electronic device comprising a processor,
wherein the processor is configured to:
specify a pronunciation element to be learned;
specify, within voice data to be reproduced, a partial reproduction section containing the specified pronunciation element as a target section; and
change, during reproduction of the voice data, the reproduction state in the specified target section relative to the reproduction state in the other reproduction sections.

[Appendix 2]
The electronic device according to Appendix 1, wherein the processor is configured to:
reproduce voice data corresponding to text arbitrarily designated by the user as a reproduction target;
when a learning mode is set, change the reproduction state in the specified target section relative to the reproduction state in the other reproduction sections; and
when the learning mode is not set, reproduce the whole of the voice data in the same reproduction state.

[Appendix 3]
The electronic device according to Appendix 1 or Appendix 2, wherein the processor is configured to, during reproduction of the voice data, either emphasize the voice reproduced in the specified target section over the voice reproduced in the other reproduction sections, or make the reproduction speed in the specified target section slower than the reproduction speed in the other reproduction sections.

[Appendix 4]
The electronic device according to any one of Appendices 1 to 3, wherein the processor is configured to change the reproduction speed in the specified target section by speech speed conversion, which changes the reproduction speed without changing the pitch.

[Appendix 5]
The electronic device according to any one of Appendices 1 to 4, wherein the processor is configured to execute at least one of:
a first process of specifying, when the text to be reproduced contains a word, the pronunciation portion of some of the pronunciation elements contained in the word as the target section;
a second process of specifying, when the text to be reproduced contains a sentence, the pronunciation portion of some of the words contained in the sentence as the target section; and
a third process of specifying, when the text to be reproduced is a passage, the pronunciation portion of some of the sentences contained in the passage as the target section.

[Appendix 6]
The electronic device according to Appendix 5, wherein the processor has the user select one of the first process, the second process, and the third process.

[Appendix 7]
The electronic device according to any one of Appendices 1 to 6, further comprising:
a display; and
a storage,
wherein the processor is configured to specify the pronunciation element to be learned either by having the user select it from among a plurality of phonetic symbols displayed on the display, or based on data, stored in advance in the storage, on pronunciation elements that the user is weak at.

[Appendix 8]
The electronic device according to Appendix 7, wherein:
the storage stores text data associated with voice data;
the processor is configured to display the text of the text data stored in the storage on the display; and
the voice data to be reproduced is the voice data corresponding to text arbitrarily selected by the user from the text displayed on the display.

[Appendix 9]
The electronic device according to Appendix 7 or Appendix 8, wherein the processor is configured to specify the reproduction section of the voice data containing the specified weak pronunciation element as a reproduction section in phoneme units, word units, or sentence units that contains the phoneme serving as the weak pronunciation element.

[Appendix 10]
The electronic device according to Appendix 9, wherein the processor specifies the reproduction section in phoneme units, word units, or sentence units containing the phoneme serving as the weak pronunciation element either by having the user select from phoneme-unit, word-unit, and sentence-unit options displayed on the display, or according to the user's language level data stored in the storage.

[Appendix 11]
The electronic device according to Appendix 7 or Appendix 8, wherein the data on weak pronunciation elements stored in advance in the storage is data on the pronunciation element of the portion whose pronunciation changes in a pronunciation-change phrase, that is, a phrase formed by linking a plurality of words whose pronunciation changes compared with when each word is pronounced on its own.

[Appendix 12]
The electronic device according to any one of Appendices 1 to 11, wherein the processor is configured to make the reproduction speed of the voice data in the specified reproduction section slower than the reproduction speed outside the specified reproduction section.

[Appendix 13]
The electronic device according to any one of Appendices 1 to 12, wherein the processor is configured to:
specify two pronunciation elements to be the subject of discrimination practice;
specify, when reproducing the voice data of two words that respectively contain the two specified pronunciation elements, the reproduction sections of the two pieces of voice data that respectively contain the two pronunciation elements; and
change the reproduction speed of the two pieces of voice data in their respective specified reproduction sections.

[Appendix 14]
A voice reproduction method in which a processor of an electronic device:
specifies a pronunciation element to be learned;
specifies, within voice data to be reproduced, a partial reproduction section containing the specified pronunciation element as a target section; and
changes, during reproduction of the voice data, the reproduction state in the specified target section relative to the reproduction state in the other reproduction sections.

[Appendix 15]
A program for causing a processor of an electronic device to:
specify a pronunciation element to be learned;
specify, within voice data to be reproduced, a partial reproduction section containing the specified pronunciation element as a target section; and
change, during reproduction of the voice data, the reproduction state in the specified target section relative to the reproduction state in the other reproduction sections.

10 ... Learning support device (electronic device)
14 ... Key input unit (keyboard)
14S ... [Voice] key
BS ... [Voice] touch key
15 ... Audio output unit
15S ... Main body speaker
17 ... Touch-panel display unit (display)
21 ... CPU (processor)
22 ... Storage unit (storage)
22a ... Learning support processing program
22b ... Voice reproduction processing program
22c ... Learning content storage unit
22d ... Dictionary data storage unit
22e ... Other content storage unit
22f ... Language level data storage unit
22g ... Weak pronunciation table storage unit
22h ... Pronunciation change idiom table storage unit
22i ... Voice reproduction mode data storage unit
22j ... Speech speed conversion section setting data storage unit
22k ... Speech speed conversion reproduction section data storage unit
23 ... External recording medium
24 ... Recording medium reading unit
25 ... Communication unit
27 ... Earphone microphone
30 ... Web server (program server)
N ... Communication network (Internet)

Claims

1. An electronic device comprising a processor,
wherein the processor is configured to:
specify a pronunciation element to be learned;
specify, within voice data to be reproduced, a partial reproduction section containing the specified pronunciation element as a target section; and
change, during reproduction of the voice data, the reproduction state in the specified target section relative to the reproduction state in the other reproduction sections.

2. The electronic device according to claim 1, wherein the processor is configured to:
reproduce voice data corresponding to text arbitrarily designated by the user as a reproduction target;
when a learning mode is set, change the reproduction state in the specified target section relative to the reproduction state in the other reproduction sections; and
when the learning mode is not set, reproduce the whole of the voice data in the same reproduction state.

3. The electronic device according to claim 1 or 2, wherein the processor is configured to, during reproduction of the voice data, either emphasize the voice reproduced in the specified target section over the voice reproduced in the other reproduction sections, or make the reproduction speed in the specified target section slower than the reproduction speed in the other reproduction sections.

4. The electronic device according to any one of claims 1 to 3, wherein the processor is configured to change the reproduction speed in the specified target section by speech speed conversion, which changes the reproduction speed without changing the pitch.

5. The electronic device according to any one of claims 1 to 4, wherein the processor is configured to execute at least one of:
a first process of specifying, when the text to be reproduced contains a word, the pronunciation portion of some of the pronunciation elements contained in the word as the target section;
a second process of specifying, when the text to be reproduced contains a sentence, the pronunciation portion of some of the words contained in the sentence as the target section; and
a third process of specifying, when the text to be reproduced is a passage, the pronunciation portion of some of the sentences contained in the passage as the target section.

6. The electronic device according to claim 5, wherein the processor has the user select one of the first process, the second process, and the third process.

7. The electronic device according to any one of claims 1 to 6, further comprising:
a display; and
a storage,
wherein the processor is configured to specify the pronunciation element to be learned either by having the user select it from among a plurality of phonetic symbols displayed on the display, or based on data, stored in advance in the storage, on pronunciation elements that the user is weak at.

8. The electronic device according to claim 7, wherein:
the storage stores text data associated with voice data;
the processor is configured to display the text of the text data stored in the storage on the display; and
the voice data to be reproduced is the voice data corresponding to text arbitrarily selected by the user from the text displayed on the display.

9. The electronic device according to claim 7 or 8, wherein the processor is configured to specify the reproduction section of the voice data containing the specified weak pronunciation element as a reproduction section in phoneme units, word units, or sentence units that contains the phoneme serving as the weak pronunciation element.

10. The electronic device according to claim 9, wherein the processor specifies the reproduction section in phoneme units, word units, or sentence units containing the phoneme serving as the weak pronunciation element either by having the user select from phoneme-unit, word-unit, and sentence-unit options displayed on the display, or according to the user's language level data stored in the storage.

11. The electronic device according to claim 7 or 8, wherein the data on weak pronunciation elements stored in advance in the storage is data on the pronunciation element of the portion whose pronunciation changes in a pronunciation-change phrase, that is, a phrase formed by linking a plurality of words whose pronunciation changes compared with when each word is pronounced on its own.

12. The electronic device according to any one of claims 1 to 11, wherein the processor is configured to make the reproduction speed of the voice data in the specified reproduction section slower than the reproduction speed outside the specified reproduction section.

13. The electronic device according to any one of claims 1 to 12, wherein the processor is configured to:
specify two pronunciation elements to be the subject of discrimination practice;
specify, when reproducing the voice data of two words that respectively contain the two specified pronunciation elements, the reproduction sections of the two pieces of voice data that respectively contain the two pronunciation elements; and
change the reproduction speed of the two pieces of voice data in their respective specified reproduction sections.

14. A voice reproduction method in which a processor of an electronic device:
specifies a pronunciation element to be learned;
specifies, within voice data to be reproduced, a partial reproduction section containing the specified pronunciation element as a target section; and
changes, during reproduction of the voice data, the reproduction state in the specified target section relative to the reproduction state in the other reproduction sections.

15. A program for causing a processor of an electronic device to:
specify a pronunciation element to be learned;
specify, within voice data to be reproduced, a partial reproduction section containing the specified pronunciation element as a target section; and
change, during reproduction of the voice data, the reproduction state in the specified target section relative to the reproduction state in the other reproduction sections.