JP2010048930A

JP2010048930A - Voice data creation method, storage device, integrated circuit device, and voice reproduction system

Info

Publication number: JP2010048930A
Application number: JP2008211630A
Authority: JP
Inventors: Tsutomu Nonaka; 勉野中; Masayuki Murakami; 雅行村上
Original assignee: Seiko Epson Corp
Current assignee: Seiko Epson Corp
Priority date: 2008-08-20
Filing date: 2008-08-20
Publication date: 2010-03-04

Abstract

PROBLEM TO BE SOLVED: To provide a voice data creation method that enables good reproduction of voice data while reducing the total quantity of voice data, and to provide a storage device, an integrated circuit device, and a voice reproduction system.

SOLUTION: The voice data creation method, which creates a plurality of voice data by dividing a voice message at given dividing points, includes a dividing point selection procedure (Step S102) of selecting the dividing points of the voice data based on the voice frequencies included in the voice message, and a dividing procedure (Step S104) of dividing the voice data at the dividing points selected by the dividing point selection procedure.

COPYRIGHT: (C)2010, JPO&INPIT

Description

The present invention relates to a voice data creation method, a storage device, an integrated circuit device, a voice reproduction system, and the like.

Electronic devices are known that are equipped with a voice reproduction system which includes a voice IC and outputs voice messages.

In such a voice reproduction system, a configuration is known in which a voice message is stored in a storage device or the like as a plurality of voice data. For example, by preparing voice data obtained by dividing voice messages at word or phrase boundaries and sharing that voice data among a plurality of voice messages, the total amount of voice data required can be reduced.
JP 2004-240009 A; JP-A-9-102818

When a voice message is reproduced by combining voice data obtained by dividing voice messages at word or phrase boundaries, some combinations of voice data can sound unnatural to the listener.

The present invention has been made in view of the above technical problem. According to some aspects of the present invention, it is possible to provide a voice data creation method, a storage device, an integrated circuit device, and a voice reproduction system that enable good reproduction of voice data while reducing the total amount of voice data.

(1) A voice data creation method according to the present invention is
a voice data creation method for creating a plurality of voice data obtained by dividing a voice message at given division points, the method including:
a division point selection procedure of selecting the division points of the voice data based on the voice frequencies included in the voice message; and
a division procedure of dividing the voice data at the division points selected in the division point selection procedure.

According to the present invention, it is possible to create voice data that sounds more natural, with inconspicuous boundaries between voice data, when a voice message is reproduced by combining the voice data.

(2) In this voice data creation method,
at least one location where the intensity of the voice frequency components at or below a predetermined frequency included in the voice message is at or below a predetermined value may be selected as the division point.

(3) In any of these voice data creation methods,
in the division point selection procedure,
at least one location where the period during which the intensity of the voice frequency components at or below a predetermined frequency included in the voice message is at or below a predetermined value continues for a predetermined time or longer may be selected as the division point.

(4) In any of these voice data creation methods,
the predetermined frequency may be 1/2 of the sampling frequency of the voice data.

(5) In any of these voice data creation methods,
the voice message may include a message that reads out numbers.

(6) Any of these voice data creation methods may
include a division point candidate selection procedure of selecting, as division point candidates, the positions before and after a common syllable group contained in common in a plurality of the voice messages, and
in the division point selection procedure,
the division point may be selected from among the division point candidates.

In the case of Japanese voice messages, for example, a common syllable group is a portion of the voice messages that forms a common character string when the messages are written in hiragana.

(7) A storage device according to the present invention is
a storage device including a storage unit in which a plurality of voice data obtained by dividing a voice message are stored, wherein
the leading portion of at least one of the voice data is a syllable in the middle of a word included in the voice message, and the intensity of the voice frequency components at or below a predetermined frequency contained at the head of that syllable is at or below a predetermined value.

(8) A storage device according to the present invention is
a storage device including a storage unit in which a plurality of voice data obtained by dividing a voice message are stored, wherein
the trailing portion of at least one of the voice data is a syllable in the middle of a word included in the voice message, and the intensity of the voice frequency components at or below a predetermined frequency contained at the end of that syllable is at or below a predetermined value.

(9) An integrated circuit device according to the present invention is
an integrated circuit device including a storage unit in which a plurality of voice data obtained by dividing a voice message are stored, and
a voice reproduction unit that receives a voice reproduction command, reads voice data from the storage unit based on the received voice reproduction command, and reproduces and outputs the voice data, wherein
the leading portion of at least one of the voice data is a syllable in the middle of a word included in the voice message, and the intensity of the voice frequency components at or below a predetermined frequency contained at the head of that syllable is at or below a predetermined value.

(10) An integrated circuit device according to the present invention is
an integrated circuit device including a storage unit in which a plurality of voice data obtained by dividing a voice message are stored, and
a voice reproduction unit that receives a voice reproduction command, reads voice data from the storage unit based on the received voice reproduction command, and reproduces and outputs the voice data, wherein
the trailing portion of at least one of the voice data is a syllable in the middle of a word included in the voice message, and the intensity of the voice frequency components at or below a predetermined frequency contained at the end of that syllable is at or below a predetermined value.

(11) A voice reproduction system according to the present invention is
a voice reproduction system including a storage device in which a plurality of voice data obtained by dividing a voice message are stored, and
an integrated circuit device that receives a voice reproduction command and reproduces and outputs the voice data stored in the storage device based on the received voice reproduction command, wherein
the storage device is any one of the storage devices described above.

Embodiments to which the present invention is applied are described below with reference to the drawings. The present invention is not, however, limited to the following embodiments. The present invention also includes any free combination of the contents described below.

1. Voice data creation method
The voice data creation method according to the present embodiment creates a plurality of voice data by dividing a voice message at given division points. It includes a division point selection procedure of selecting the division points of the voice data based on the voice frequencies included in the voice message, and a division procedure of dividing the voice data at the division points selected in the division point selection procedure.

A voice message is, for example, a sentence or phrase read aloud, and the sentence may be one used as a voice guidance message for an electronic device or the like. In the present embodiment, the sentence "The fee is 700 yen." (in Japanese, "Ryokin wa, nanahyaku en desu.") is used as the example voice message.

FIG. 1 is a flowchart showing an example of the voice data creation method according to the present embodiment.

In the voice data creation method according to the present embodiment, voice data is first created from a voice message (step S100).

Any known technique can be used to create voice data from a voice message. For example, a human voice reading the voice message aloud may be sampled, or the voice data may be synthesized by a TTS (Text to Speech) system.

In the voice data creation method according to the present embodiment, after step S100, the division points of the voice data are selected based on the voice frequencies included in the voice message (step S102; corresponds to the division point selection procedure). In step S102, at least one location where the intensity of the voice frequency components at or below a predetermined frequency included in the voice message is at or below a predetermined value can be selected as a division point. Alternatively, in step S102, at least one location where the period during which that intensity is at or below the predetermined value continues for a predetermined time or longer can be selected as a division point. The predetermined time can be, for example, on the order of several tens of milliseconds.

The predetermined frequency can be, for example, 1/2 of the sampling frequency f of the voice data. This is because, among the voice frequency components included in the voice message, components exceeding 1/2 of the sampling frequency f cannot be reproduced from the voice data and are therefore not heard when the voice data is played back.
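The limit at half the sampling frequency can be checked numerically. The following Python sketch samples a tone above f/2 and shows that its samples coincide with those of a lower-frequency tone, so the original component cannot be recovered from the voice data. The 16 kHz sampling frequency and the tone frequencies are illustrative assumptions, not values taken from this document.

```python
import math

def sample_tone(freq_hz, fs_hz, n_samples):
    """Sample cos(2*pi*freq*t) at fs_hz for n_samples samples."""
    return [math.cos(2.0 * math.pi * freq_hz * n / fs_hz) for n in range(n_samples)]

FS = 16000          # assumed sampling frequency f (Hz)
ABOVE = 12000       # tone above f/2 = 8000 Hz
ALIAS = FS - ABOVE  # 4000 Hz: the frequency the samples actually represent

high = sample_tone(ABOVE, FS, 64)
low = sample_tone(ALIAS, FS, 64)

# The two sample sequences agree to floating-point precision.
max_diff = max(abs(a - b) for a, b in zip(high, low))
```

Because the sampled data cannot distinguish the two tones, only components at or below f/2 are meaningful when judging how audible a location will be on playback.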

Therefore, selecting as division points the locations where the intensity of the voice frequency components at or below the predetermined frequency is at or below the predetermined value means that the voice data is divided where the sound is faint or almost inaudible. As a result, when a voice message is reproduced by combining the voice data, the boundaries between voice data are inconspicuous, and voice data that sounds more natural can be created.
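As a rough illustration of the selection in step S102, the following Python sketch finds spans where the signal level stays at or below a threshold for at least a minimum duration. It assumes the predetermined frequency is f/2, in which case every component present in the sampled data already lies at or below it, so the per-frame RMS level is used as the intensity measure; a lower cutoff would require a low-pass filter or a spectrogram first. The function names, the 5 ms frame length, the threshold, and the choice of the span midpoint as the division point are all hypothetical.

```python
import math

def frame_rms(samples, frame_len):
    """RMS level of each non-overlapping frame of frame_len samples."""
    levels = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        levels.append(math.sqrt(sum(x * x for x in frame) / frame_len))
    return levels

def find_quiet_spans(samples, fs_hz, threshold, min_duration_s, frame_s=0.005):
    """Return (start_s, end_s) spans where frame RMS stays <= threshold
    for at least min_duration_s seconds."""
    frame_len = max(1, int(round(fs_hz * frame_s)))
    levels = frame_rms(samples, frame_len)
    spans, run_start = [], None
    for i, level in enumerate(levels + [float("inf")]):  # sentinel closes a trailing run
        if level <= threshold:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            start_s = run_start * frame_len / fs_hz
            end_s = i * frame_len / fs_hz
            if end_s - start_s >= min_duration_s:
                spans.append((start_s, end_s))
            run_start = None
    return spans

# Synthetic example: 0.1 s tone, 0.05 s silence, 0.1 s tone at 8 kHz sampling.
FS = 8000
tone = [0.5 * math.sin(2 * math.pi * 440 * n / FS) for n in range(800)]
gap = [0.0] * 400
signal = tone + gap + tone
spans = find_quiet_spans(signal, FS, threshold=0.01, min_duration_s=0.03)
division_points = [(s + e) / 2 for s, e in spans]  # one candidate per quiet span
```

With real speech the threshold and minimum duration would need tuning against the recording level and the "several tens of milliseconds" guideline given above.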

FIG. 2(A) is a graph showing an example of the relationship between time and amplitude for a voice message. The horizontal axis represents time and the vertical axis represents sound pressure. FIG. 2(B) is a graph showing an example of the relationship between time and voice frequency for the voice message. The horizontal axis represents time, the vertical axis represents frequency, and the intensity of each voice frequency component is represented by the darkness of the shading. FIGS. 2(A) and 2(B) show the portion around "ryokin wa" when the sentence "The fee is 700 yen." is read aloud as the voice message.

In the example shown in FIGS. 2(A) and 2(B), time t1 is selected as the division point from among the locations where the period during which the intensity of the voice frequency components at or below the predetermined frequency is at or below the predetermined value continues for the predetermined time or longer.

In the voice data creation method according to the present embodiment, after step S102, the voice data is divided at the division points selected in step S102 (step S104; corresponds to the division procedure). In the example shown in FIGS. 2(A) and 2(B), the voice data is divided at time t1, which produces voice data D11 corresponding to "ryo" and voice data D12 corresponding to "kin wa".
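The division in step S104 amounts to cutting the sample sequence at the sample index that corresponds to each selected time. A minimal Python sketch follows, assuming the voice data is held as a list of PCM samples; the function name, the sampling frequency, and the value of t1 are hypothetical.

```python
def split_at_times(samples, fs_hz, times_s):
    """Split samples into consecutive segments at the given times (seconds)."""
    cut_indices = sorted(int(round(t * fs_hz)) for t in times_s)
    segments, prev = [], 0
    for idx in cut_indices:
        idx = min(max(idx, prev), len(samples))  # clamp to a valid, non-decreasing index
        segments.append(samples[prev:idx])
        prev = idx
    segments.append(samples[prev:])
    return segments

FS = 8000                      # assumed sampling frequency (Hz)
message = list(range(2000))    # stand-in for the samples of "ryokin wa"
t1 = 0.125                     # assumed division time (s)
d11, d12 = split_at_times(message, FS, [t1])
```

Concatenating `d11` and `d12` gives back the original samples, so the split itself loses no data.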

In the voice data creation method of the present embodiment, at least one location where the period during which the intensity of the voice frequency components at or below the predetermined frequency is at or below the predetermined value continues for the predetermined time or longer is selected as a division point of the voice data. A location where the intensity of the voice frequency components at or below the predetermined frequency is low can be regarded as a location where the sound is faint or almost inaudible.

Therefore, by dividing the voice data at locations where the sound is faint or almost inaudible, it is possible to create voice data whose boundaries are inconspicuous and which sounds more natural when a voice message is reproduced by combining the voice data.

The description above covered the creation of voice data corresponding to "ryokin wa" when the sentence "The fee is 700 yen." is read aloud as the voice message. In the same way, as the voice data corresponding to "nanahyaku en desu", voice data D21 corresponding to "nanahya" and voice data D22 corresponding to "kuen desu" can be created.

Although the description above used the voice message "The fee is 700 yen." as the example, voice data can be created in the same way for other voice messages. For example, for the voice message "The fee is 900 yen.", voice data D31 corresponding to "ryo", voice data D32 corresponding to "kin wa", voice data D41 corresponding to "kyuhya", and voice data D42 corresponding to "kuen desu" can be created.

In this example, voice data D11 and D31 corresponding to "ryo", voice data D12 and D32 corresponding to "kin wa", and voice data D22 and D42 corresponding to "kuen desu" can each be shared as common voice data. This makes it possible to reduce the total amount of voice data, for example in a voice reproduction system that reproduces voice messages by combining voice data.

FIGS. 3(A) and 3(B) are graphs showing examples of combinations of voice data used to generate voice messages. The horizontal axis represents time and the vertical axis represents sound pressure. FIG. 3(A) shows the combination used to generate the voice message "The fee is 700 yen.", and FIG. 3(B) shows the combination used to generate the voice message "The fee is 900 yen.".

To generate the voice message "The fee is 700 yen.", voice data D11 corresponding to "ryo", voice data D12 corresponding to "kin wa", voice data D21 corresponding to "nanahya", and voice data D22 corresponding to "kuen desu" are reproduced in that order. A silent period is inserted between voice data D12 and voice data D21.

To generate the voice message "The fee is 900 yen.", voice data D11 corresponding to "ryo", voice data D12 corresponding to "kin wa", voice data D41 corresponding to "kyuhya", and voice data D22 corresponding to "kuen desu" are reproduced in that order. A silent period is inserted between voice data D12 and voice data D41.

In this way, generating the voice messages "The fee is 700 yen." and "The fee is 900 yen." requires only voice data D11, D12, D21, D22, and D41, so the total amount of voice data can be reduced compared with preparing all of the voice data separately for each message. Dividing and sharing voice data to reduce its total amount is particularly effective when, for example, the voice messages include messages that read out numbers (such as an amount of money, a duration, a time of day, a date, a temperature, or a humidity), because more voice data can then be shared.
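The saving from sharing can be made concrete with a small Python sketch. The segment IDs (D11, D12, D21, D22, D41) and playback orders follow the description above; the dictionary layout, the `SILENCE` marker, and the dummy sample values are assumptions made for illustration.

```python
SILENCE = "SIL"  # hypothetical marker for an inserted silent period

# Playback order per message; segment IDs follow the document.
MESSAGES = {
    "fee_700": ["D11", "D12", SILENCE, "D21", "D22"],
    "fee_900": ["D11", "D12", SILENCE, "D41", "D22"],
}

def required_segments(messages):
    """IDs of the distinct voice data that must be stored for all messages."""
    return sorted({seg for seq in messages.values() for seg in seq if seg != SILENCE})

def render(sequence, store, silence_len):
    """Concatenate stored samples in playback order, inserting zeros for silence."""
    out = []
    for seg in sequence:
        out.extend([0] * silence_len if seg == SILENCE else store[seg])
    return out

shared = required_segments(MESSAGES)
unshared_count = sum(1 for seq in MESSAGES.values() for seg in seq if seg != SILENCE)

store = {seg_id: [i + 1] * 3 for i, seg_id in enumerate(shared)}  # dummy samples
pcm_700 = render(MESSAGES["fee_700"], store, silence_len=2)
```

Here five stored segments cover two messages that would otherwise need eight, and each additional amount (for example 800 yen) would add only one new leading segment.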

Therefore, the voice data creation method of the present embodiment can create voice data that enables good reproduction while reducing the total amount of voice data.

[Modification]
The embodiment described with reference to the flowchart of FIG. 1 may further include a division point candidate selection procedure of selecting, as division point candidates, the positions before and after a common syllable group contained in common in a plurality of voice messages. In that case, the division points are selected from among the division point candidates in the division point selection procedure.

FIG. 4 is a flowchart showing another example of the voice data creation method according to the present embodiment. The example shown in FIG. 4 adds a division point candidate selection procedure to the embodiment described with reference to the flowchart of FIG. 1. Steps that are the same as in the flowchart of FIG. 1 are given the same reference numerals, and their detailed description is omitted.

After step S100, the positions before and after a common syllable group contained in common in a plurality of voice messages are selected as division point candidates (step S200; corresponds to the division point candidate selection procedure). In the case of Japanese voice messages, for example, a common syllable group is a portion of the voice messages that forms a common character string when the messages are written in hiragana.

For example, when "700 yen." ("nanahyaku en desu") and "900 yen." ("kyuhyaku en desu") are used as the voice messages, many combinations of common syllable groups are possible, such as "hyaku en desu"; "hyaku en de" and "su"; "hyaku" and "en desu"; "hya" and "kuen desu"; and "hya", "kuen", "desu", and "en". In step S200, the positions before and after these common syllable groups are selected as division point candidates.
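One way to enumerate such common syllable groups is sketched below in Python. It is an illustrative approach, not the procedure specified by this document: each message is written in hiragana, contracted sounds such as "hya" are kept together as one unit, and every contiguous run of units found in both messages is treated as a common group. The function names and the small-kana handling are assumptions.

```python
SMALL_KANA = "ゃゅょ"  # small kana that attach to the preceding character

def to_morae(text):
    """Split a hiragana string into units, keeping contracted sounds together."""
    morae = []
    for ch in text:
        if ch in SMALL_KANA and morae:
            morae[-1] += ch
        else:
            morae.append(ch)
    return morae

def common_groups(a, b):
    """Contiguous runs of units (as tuples) that occur in both lists a and b."""
    def runs(m):
        return {tuple(m[i:j]) for i in range(len(m)) for j in range(i + 1, len(m) + 1)}
    return runs(a) & runs(b)

def candidate_boundaries(morae, groups):
    """Unit offsets just before and just after each occurrence of a group."""
    points = set()
    for g in groups:
        n = len(g)
        for i in range(len(morae) - n + 1):
            if tuple(morae[i:i + n]) == g:
                points.update((i, i + n))
    return sorted(points)

MSG_700 = to_morae("ななひゃくえんです")    # "nanahyaku en desu"
MSG_900 = to_morae("きゅうひゃくえんです")  # "kyuhyaku en desu"

groups = common_groups(MSG_700, MSG_900)
longest = "".join(max(groups, key=len))
candidates_700 = candidate_boundaries(MSG_700, groups)
```

The offsets in `candidates_700` are positions in the unit sequence, not times; mapping them onto the waveform would require an alignment between the text and the recording, which is outside this sketch.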

In the voice data creation method according to the present embodiment, after step S200, the division points of the voice data are selected from among the division point candidates based on the voice frequencies included in the voice message (step S202; corresponds to the division point selection procedure). In step S202, at least one location where the intensity of the voice frequency components at or below a predetermined frequency included in the voice message is at or below a predetermined value can be selected as a division point. Alternatively, in step S202, at least one location where the period during which that intensity is at or below the predetermined value continues for a predetermined time or longer can be selected as a division point. The predetermined time can be, for example, on the order of several tens of milliseconds. The predetermined frequency can be, for example, 1/2 of the sampling frequency f of the voice data.

In the voice data creation method according to the present embodiment, after step S202, the voice data is divided at the division points selected in step S202 (step S104; corresponds to the division procedure).

Selecting division point candidates in advance in this way makes it easier to select the division points.
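The narrowing performed in step S202 can be pictured as a simple filter: of the candidate boundaries, keep those that fall inside a low-intensity span. The Python sketch below assumes that the candidates have already been mapped to times on the waveform and that the quiet spans have been measured separately; all numeric values and the function name are hypothetical.

```python
def select_division_points(candidate_times_s, quiet_spans_s):
    """Keep only the candidate times that fall inside a quiet span."""
    return [
        t for t in candidate_times_s
        if any(start <= t <= end for start, end in quiet_spans_s)
    ]

# Hypothetical values: candidate boundary times (s) from the common syllable
# groups, and quiet spans (s) from a low-intensity analysis of the waveform.
candidates = [0.42, 0.61, 0.78, 0.95]
quiet_spans = [(0.58, 0.64), (1.10, 1.20)]
selected = select_division_points(candidates, quiet_spans)
```

Only the candidate at 0.61 s survives here, since it is the only one that lies within a quiet span.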

2. Storage device
FIG. 5 is a functional block diagram showing an example of the configuration of the storage device according to the present embodiment.

The storage device 1 according to the present embodiment includes a storage unit 10. The storage unit 10 stores a plurality of voice data obtained by dividing voice messages.

The storage device 1 according to the present embodiment may include an interface unit 12. The interface unit 12 functions as the interface through which voice data and the like are input and output between the storage device 1 and other devices (not shown).

At least one of the voice data stored in the storage unit 10 of the storage device 1 according to the present embodiment can be voice data whose leading portion lies in the middle of a word included in the voice message and in which the intensity of the voice frequency components at or below a predetermined frequency contained at the head of the leading syllable is at or below a predetermined value.
In addition, at least one of the voice data stored in the storage unit 10 of the storage device 1 according to the present embodiment can be voice data whose trailing portion lies in the middle of a word included in the voice message and in which the intensity of the voice frequency components at or below the predetermined frequency contained at the end of the trailing syllable is at or below the predetermined value.

The predetermined frequency can be, for example, 1/2 of the sampling frequency of the voice data. This is because, among the voice frequency components included in the voice message, components exceeding 1/2 of the sampling frequency cannot be reproduced from the voice data and are therefore not heard when the voice data is played back.

FIG. 6 is a diagram showing an example of voice data. In the following description, the storage unit 10 of the storage device 1 according to the present embodiment is assumed to store the voice data for generating two voice messages: "The fee is 700 yen." and "The fee is 900 yen.".

To generate the voice message "The fee is 700 yen." from the voice data shown in FIG. 6, voice data D11 corresponding to "ryo", voice data D12 corresponding to "kin wa", voice data D21 corresponding to "nanahya", and voice data D22 corresponding to "kuen desu" are reproduced in that order.

To generate the voice message "The fee is 900 yen." from the voice data shown in FIG. 6, voice data D11 corresponding to "ryo", voice data D12 corresponding to "kin wa", voice data D41 corresponding to "kyuhya", and voice data D22 corresponding to "kuen desu" are reproduced in that order.

In the voice data shown in FIG. 6, voice data D12 corresponding to "kin wa" and voice data D22 corresponding to "kuen desu" are voice data whose leading portion lies in the middle of a word included in the voice message and in which the intensity of the voice frequency components at or below the predetermined frequency contained at the head of the leading syllable is at or below the predetermined value.

Likewise, in the voice data shown in FIG. 6, voice data D11 corresponding to "ryo", voice data D21 corresponding to "nanahya", and voice data D41 corresponding to "kyuhya" are voice data whose trailing portion lies in the middle of a word included in the voice message and in which the intensity of the voice frequency components at or below the predetermined frequency contained at the end of the trailing syllable is at or below the predetermined value.

These voice data can be created, for example, by the method described above in "1. Voice data creation method".

By storing voice data divided at locations where the sound is faint or almost inaudible in this way, it is possible to realize a storage device that holds voice data whose boundaries are inconspicuous and which sounds more natural when a voice message is reproduced by combining the voice data.
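Whether a stored segment has the quiet head or tail described in this section could be checked along the following lines. This Python sketch is only an assumption-laden illustration: it uses the RMS level of a short window at each edge as the intensity measure (reasonable when the predetermined frequency is half the sampling frequency), and the 5 ms window, the threshold, and the function names are hypothetical.

```python
import math

def rms(samples):
    """Root-mean-square level of a list of samples (0.0 for an empty list)."""
    return math.sqrt(sum(x * x for x in samples) / len(samples)) if samples else 0.0

def edges_are_quiet(segment, fs_hz, threshold, window_s=0.005):
    """Return (head_quiet, tail_quiet) for one stored voice data segment."""
    n = max(1, int(round(fs_hz * window_s)))
    return rms(segment[:n]) <= threshold, rms(segment[-n:]) <= threshold

FS = 8000  # assumed sampling frequency (Hz)
loud = [0.5 * math.sin(2 * math.pi * 440 * i / FS) for i in range(400)]
# Stand-in for a segment cut mid-word at a quiet point: quiet head, audible tail.
segment = [0.0] * 80 + loud
head_quiet, tail_quiet = edges_are_quiet(segment, FS, threshold=0.01)
```

A check like this could be run over every stored segment to confirm that segments beginning or ending mid-word were cut at low-intensity points.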

3. Integrated circuit device
FIG. 7 is a hardware block diagram showing an example of the configuration of the integrated circuit device according to the present embodiment.

The integrated circuit device 2 is mounted in an electronic device together with a host CPU 100 and operates as a companion chip whose host is the host CPU 100 of that electronic device. The integrated circuit device 2 is controlled by commands issued from the host CPU 100. The available commands include starting up the integrated circuit device 2, data transfer, and voice processing such as playback and stop.

ホストＣＰＵ１００は、電子機器に実装され電子機器の主制御や全体制御を行う電子機器組み込みのマイクロコンピュータ等である。 The host CPU 100 is a microcomputer incorporated in an electronic device that is mounted on the electronic device and performs main control and overall control of the electronic device.

集積回路装置２とホストＣＰＵ１００の通信は、例えばＳＰＩ（クロック同期式３線（ＲＥＱ、ＲＥＳ、ＩＮＤ）シリアル）転送、ＳＰＩ／ＵＡＲＴ転送により行われるように構成することができる。 The communication between the integrated circuit device 2 and the host CPU 100 can be configured to be performed by, for example, SPI (clock synchronous three-wire (REQ, RES, IND) serial) transfer or SPI / UART transfer.

The integrated circuit device 2 includes a storage unit 20. The storage unit 20 stores a plurality of audio data obtained by dividing a voice message. The audio data is stored in a compressed form (for example, ADPCM or AAC-LC) of audio data, such as PCM data, that can be reproduced by the audio reproduction unit 30 described later.

The integrated circuit device 2 includes an audio reproduction unit 30. The audio reproduction unit 30 receives an audio reproduction command and, based on the received command, reads audio data from the storage unit 20 and reproduces and outputs it. The audio reproduction command may include, for example, a built-in data reproduction command and a reproduction command with attached data, both described later.

The audio reproduction unit 30 can include, for example, a command processing unit 31, a reproduction target storage buffer 32, a decoding unit 33, a decoded data storage buffer 34, and a D/A converter 35.

The command processing unit 31 controls the exchange of commands and data with the host CPU 100. For example, it processes the various commands received from the host CPU 100 (activation of the integrated circuit device 2, data transfer, audio processing such as play and stop, and so on) and performs handshake processing for communication with the host CPU 100.

Based on a command received from the host CPU 100, the command processing unit 31 determines whether to read and reproduce audio data stored in the storage unit 20 or to reproduce audio data attached to the command, and, based on the result, performs control to store the audio data to be reproduced in the reproduction target storage buffer 32. For example, when the command processing unit 31 receives a reproduction command with attached data from the host CPU 100, it performs control to store the attached audio data in the reproduction target storage buffer 32. When it receives a built-in data reproduction command, it performs control to read the audio data to be reproduced from the storage unit 20 and store the read audio data in the reproduction target storage buffer 32.

The command processing unit 31 may also include a silent section setting control unit 310 that, when a reproduction command with attached data instructing that the attached data be reproduced with a silent section is received, performs control to set the silent section when the audio corresponding to the attached data is reproduced and output. The silent section setting control unit 310 may set the silent section before the output of the audio corresponding to the attached data.

The reproduction target storage buffer 32 is a buffer that stores the audio data to be reproduced.

The decoding unit 33 decodes the audio data stored in the reproduction target storage buffer 32. The supported playback audio formats may include, for example, ADPCM and AAC-LC. The decoded data storage buffer 34 is a buffer that stores the data decoded by the decoding unit 33. The D/A converter 35 performs D/A conversion on the data stored in the decoded data storage buffer 34 and outputs the result to the speaker 120.

The decoding unit 33 may be configured to start decoding upon receiving a decoding start signal 210 from the command processing unit 31 and to output a decoding end signal 220 to the command processing unit 31 after decoding ends.

The D/A converter 35 may be configured to output an output end signal 230 to the command processing unit 31 after the output of the output signal 250 to the speaker 120 has finished. The D/A converter 35 may also be configured to receive a silent section setting signal 240 from the silent section setting control unit 310 and to provide a silent period, during a predetermined period set based on the silent section setting signal 240, in which the output signal 250 is not output.
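The effect of a silent period placed before the reproduced audio can be pictured with a short sketch. It models the period as zero-valued PCM samples prepended to the decoded data, which is only one possible interpretation: in the configuration described above the D/A converter 35 simply withholds the output signal 250 for the set period. The function name and parameters are hypothetical.

```python
def with_leading_silence(pcm_samples, sample_rate_hz, silence_ms):
    """Prepend silence_ms of zero-valued samples to the decoded PCM data.

    Assumption for illustration: silence is represented as zero samples;
    the actual device may instead gate the D/A converter output.
    """
    silent_count = sample_rate_hz * silence_ms // 1000
    return [0] * silent_count + list(pcm_samples)
```

For example, at an assumed 8 kHz sampling rate a 10 ms silent period corresponds to 80 samples.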

FIG. 8 is a flowchart showing the processing flow of the integrated circuit device 2 according to the present invention.

When the integrated circuit device 2 receives a command from the host CPU 100, it performs the following processing.

First, the command processing unit 31 determines whether the received command is a built-in data reproduction command (step S10). If it is, the command processing unit 31 reads audio data from the storage unit 20 based on the address information specified by the command and outputs the read audio data to the reproduction target storage buffer 32 (step S20).

If the received command is not a built-in data reproduction command, the command processing unit 31 determines whether it is a reproduction command with attached data (step S30). If it is, the command processing unit 31 extracts the audio data attached to the command and outputs it to the reproduction target storage buffer 32 (step S40).

Next, the decoding unit 33 reads data from the reproduction target storage buffer 32, decodes the read data to generate decoded data, and stores the decoded data in the decoded data storage buffer 34 (step S50).

Next, the D/A converter 35 performs D/A conversion on the data in the decoded data storage buffer 34 and outputs the result to the speaker 120 (step S60).
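Steps S10 to S60 can be summarized in a minimal sketch. The command identifiers, the Command fields, the dictionary standing in for the storage unit 20, and the placeholder decoder are all assumptions for illustration; the actual command format and codec (for example, ADPCM or AAC-LC) are not reproduced here.

```python
from dataclasses import dataclass
from typing import Optional

BUILT_IN_PLAY = "built_in_play"    # hypothetical identifier: built-in data reproduction
ATTACHED_PLAY = "attached_play"    # hypothetical identifier: reproduction with attached data


@dataclass
class Command:
    kind: str
    address: Optional[int] = None    # address information (built-in reproduction)
    payload: Optional[bytes] = None  # attached audio data (attached-data reproduction)


def decode(encoded: bytes) -> list:
    """Placeholder decoder: treats each byte as an unsigned 8-bit sample.
    A real device would run its ADPCM or AAC-LC decoder here."""
    return [b - 128 for b in encoded]


def handle_command(cmd: Command, storage: dict) -> Optional[list]:
    """Return the decoded samples that would be handed to the D/A converter."""
    if cmd.kind == BUILT_IN_PLAY:        # S10 -> S20: read from the storage unit
        target_buffer = storage[cmd.address]
    elif cmd.kind == ATTACHED_PLAY:      # S30 -> S40: use the attached data
        target_buffer = cmd.payload
    else:
        return None                      # other commands are outside this sketch
    decoded_buffer = decode(target_buffer)   # S50: decode into the decoded buffer
    return decoded_buffer                    # S60: D/A conversion would follow
```

Keeping both command types on a single decode path mirrors the flowchart, in which the two branches differ only in where the data placed in the reproduction target storage buffer 32 comes from.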

At least one of the audio data stored in the storage unit 20 of the integrated circuit device 2 according to the present embodiment can be audio data whose head portion lies in the middle of a word included in the voice message, and in which the intensity of the audio frequency components at or below a predetermined frequency contained at the head of the syllable is at or below a predetermined value.
Likewise, at least one of the audio data stored in the storage unit 20 of the integrated circuit device 2 according to the present embodiment can be audio data whose tail portion lies in the middle of a word included in the voice message, and in which the intensity of the audio frequency components at or below a predetermined frequency contained at the end of the syllable is at or below a predetermined value.

Such audio data is, for example, the audio data shown in FIG. 6. These audio data can be created, for example, by the method described above in "1. Method for creating audio data".

By storing audio data divided in this way at locations where the sound is only faintly audible or almost inaudible, it is possible to realize an integrated circuit device whose output sounds more natural, because the boundaries between audio data are inconspicuous when the audio data are combined to reproduce a voice message.

4. Audio Reproduction System
FIG. 9 is a hardware block diagram showing an example of the configuration of the audio reproduction system according to the present embodiment.

The audio reproduction system according to the present embodiment includes a storage device 1 and an integrated circuit device 4. The storage device 1 stores a plurality of audio data obtained by dividing a voice message. In the present embodiment, it is the storage device described with reference to FIGS. 5 and 6. The integrated circuit device 4 receives an audio reproduction command and, based on the received command, reproduces and outputs the audio data stored in the storage device 1.

The integrated circuit device 4 is mounted in an electronic device together with the host CPU 100, operates according to control commands from the host CPU 100, and serves as a companion chip whose host is the host CPU 100 mounted in the electronic device. The integrated circuit device 4 is controlled by commands issued from the host CPU 100. The available commands include activation of the integrated circuit device 4, data transfer, and audio processing such as play and stop.

The host CPU 100 is, for example, a microcomputer embedded in the electronic device that performs main control and overall control of the electronic device. The host CPU 100 also reads audio data from the storage device 1 and transfers it to the integrated circuit device 4.

Communication between the integrated circuit device 4 and the host CPU 100 can be configured to be performed by, for example, SPI transfer (clock-synchronous three-wire (REQ, RES, IND) serial) or SPI/UART transfer.

The integrated circuit device 4 includes an audio reproduction unit 30. The audio reproduction unit 30 can include, for example, a command processing unit 31, a reproduction target storage buffer 32, a decoding unit 33, a decoded data storage buffer 34, and a D/A converter 35.

The command processing unit 31 controls the exchange of commands and data with the host CPU 100. For example, it processes the various commands received from the host CPU 100 (activation of the integrated circuit device 4, data transfer, audio processing such as play and stop, and so on) and performs handshake processing for communication with the host CPU 100.

When the command processing unit 31 receives a reproduction command with attached data from the host CPU 100, it performs control to store the attached audio data in the reproduction target storage buffer 32.

At least one of the audio data stored in the storage device 1 of the audio reproduction system 3 according to the present embodiment can be audio data whose head portion lies in the middle of a word included in the voice message, and in which the intensity of the audio frequency components at or below a predetermined frequency contained at the head of the syllable is at or below a predetermined value.
Likewise, at least one of the audio data stored in the storage device 1 of the audio reproduction system 3 according to the present embodiment can be audio data whose tail portion lies in the middle of a word included in the voice message, and in which the intensity of the audio frequency components at or below a predetermined frequency contained at the end of the syllable is at or below a predetermined value.

Such audio data is, for example, the audio data shown in FIG. 6. These audio data can be created, for example, by the method described above in "1. Method for creating audio data".

By storing audio data divided in this way at locations where the sound is only faintly audible or almost inaudible, it is possible to realize an audio reproduction system whose output sounds more natural, because the boundaries between audio data are inconspicuous when the audio data are combined to reproduce a voice message.

The present invention is not limited to the present embodiment, and various modifications are possible within the scope of the gist of the present invention.

The present invention includes configurations that are substantially the same as the configurations described in the embodiment (for example, configurations having the same function, method, and result, or configurations having the same object and effect). The present invention also includes configurations in which a non-essential part of the configuration described in the embodiment is replaced. The present invention also includes configurations that provide the same operational effects as, or can achieve the same object as, the configurations described in the embodiment. The present invention further includes configurations in which known art is added to the configurations described in the embodiment.

FIG. 1 is a flowchart showing an example of the audio data creation method according to the present embodiment.
FIG. 2(A) is a graph showing an example of the relationship between time and audio amplitude for a voice message, and FIG. 2(B) is a graph showing an example of the relationship between time and audio frequency for a voice message.
FIGS. 3(A) and 3(B) are graphs showing examples of combinations of audio data for generating voice messages.
FIG. 4 is a flowchart showing another example of the audio data creation method according to the present embodiment.
FIG. 5 is a functional block diagram showing an example of the configuration of the storage device according to the present embodiment.
FIG. 6 is a diagram showing an example of audio data.
FIG. 7 is a hardware block diagram showing an example of the configuration of the integrated circuit device according to the present embodiment.
FIG. 8 is a flowchart showing the processing flow of the integrated circuit device according to the present invention.
FIG. 9 is a hardware block diagram showing an example of the configuration of the audio reproduction system according to the present embodiment.

Explanation of symbols

1 storage device, 2 integrated circuit device, 3 audio reproduction system, 4 integrated circuit device, 10 storage unit, 12 interface unit, 20 storage unit, 30 audio reproduction unit, 31 command processing unit, 32 reproduction target storage buffer, 33 decoding unit, 34 decoded data storage buffer, 35 D/A converter, 100 host CPU, 120 speaker, 210 decoding start signal, 220 decoding end signal, 230 output end signal, 240 silent section setting signal, 250 output signal, 310 silent section setting control unit

Claims

An audio data creation method for creating a plurality of audio data obtained by dividing a voice message at given division locations, the method comprising:
a division location selection procedure of selecting division locations of the audio data based on the audio frequency included in the voice message; and
a division procedure of dividing the audio data at the division locations selected in the division location selection procedure.

The audio data creation method according to claim 1,
wherein at least one of the locations where the intensity of the audio frequency components at or below a predetermined frequency included in the voice message is at or below a predetermined value is selected as the division location.

The audio data creation method according to claim 1 or 2,
wherein, in the division location selection procedure,
the division location is selected from among at least one location where a period in which the intensity of the audio frequency components at or below a predetermined frequency included in the voice message is at or below a predetermined value continues for a predetermined time or longer.

The audio data creation method according to claim 2 or 3,
wherein the predetermined frequency is ½ of the sampling frequency of the audio data.

The audio data creation method according to any one of claims 1 to 4,
wherein the voice message includes a message that reads out a numerical value.

The audio data creation method according to any one of claims 1 to 5,
further comprising a division location candidate selection procedure of selecting the positions before and after a syllable group common to a plurality of the voice messages as division location candidates,
wherein, in the division location selection procedure,
the division location is selected from among the division location candidates.

A storage device including a storage unit that stores a plurality of audio data obtained by dividing a voice message,
wherein at least one of the audio data is audio data whose head portion lies in the middle of a word included in the voice message, and in which the intensity of the audio frequency components at or below a predetermined frequency contained at the head of a syllable is at or below a predetermined value.

A storage device including a storage unit that stores a plurality of audio data obtained by dividing a voice message,
wherein at least one of the audio data is audio data whose tail portion lies in the middle of a word included in the voice message, and in which the intensity of the audio frequency components at or below a predetermined frequency contained at the end of a syllable is at or below a predetermined value.

An integrated circuit device comprising:
a storage unit that stores a plurality of audio data obtained by dividing a voice message; and
an audio reproduction unit that receives an audio reproduction command, reads audio data from the storage unit based on the received audio reproduction command, and reproduces and outputs the audio data,
wherein at least one of the audio data is audio data whose head portion lies in the middle of a word included in the voice message, and in which the intensity of the audio frequency components at or below a predetermined frequency contained at the head of a syllable is at or below a predetermined value.

An integrated circuit device comprising:
a storage unit that stores a plurality of audio data obtained by dividing a voice message; and
an audio reproduction unit that receives an audio reproduction command, reads audio data from the storage unit based on the received audio reproduction command, and reproduces and outputs the audio data,
wherein at least one of the audio data is audio data whose tail portion lies in the middle of a word included in the voice message, and in which the intensity of the audio frequency components at or below a predetermined frequency contained at the end of a syllable is at or below a predetermined value.

An audio reproduction system comprising:
a storage device that stores a plurality of audio data obtained by dividing a voice message; and
an integrated circuit device that receives an audio reproduction command and reproduces and outputs the audio data stored in the storage device based on the received audio reproduction command,
wherein the storage device is the storage device according to claim 7.