JP2018072509A

JP2018072509A - Voice reading device, voice reading system, voice reading method and program

Info

Publication number: JP2018072509A
Application number: JP2016210652A
Authority: JP
Inventors: 尚志奥村; Hisashi Okumura; 隆史右田; Takashi Uda; 直紀竹内; Naoki Takeuchi
Original assignee: Toppan Forms Co Ltd
Current assignee: Toppan Edge Inc
Priority date: 2016-10-27
Filing date: 2016-10-27
Publication date: 2018-05-10

Abstract

PROBLEM TO BE SOLVED: To enable a user to easily recognize the content of a character string in a range selected by the user. SOLUTION: A voice reading device comprises: a display unit for displaying text information; an operation input unit for receiving an operation input that selects part or all of a character string included in the text information displayed by the display unit; and a voice reproduction unit for reproducing voice on the basis of character string data indicating the character string selected by the operation input. SELECTED DRAWING: Figure 1

Description

The present invention relates to a voice reading device, a voice reading system, a voice reading method, and a program.

In recent years, portable terminals equipped with small displays, such as smartphones, have become widespread. As displays become smaller, the characters of documents shown on them also become smaller, which can make the text too small to read comfortably, particularly for elderly people and others with diminished eyesight. For example, when a user goes through the procedure for contracting a service on a smartphone screen, documents such as confirmation items and terms and conditions related to the contract may be displayed, and because such documents contain many items, they are often displayed in a small character size. To address this, the insurance contract information providing system described in Patent Document 1, for example, divides the terms and conditions into a plurality of parts according to their content in advance. When an instruction to read out the terms and conditions is given, the system reads the electronic data of the part to be presented, out of the electronic data of the entire terms and conditions, from a terms and conditions database and displays it on a client terminal.

Patent Document 1: JP 2009-116610 A

When the user performs an enlargement operation on part of a screen displaying terms and conditions or the like, the enlarged image no longer fits on the display screen, and the user has to scroll while reading. The insurance contract information providing system described in Patent Document 1, on the other hand, divides the terms and conditions into a plurality of parts according to their content in advance, and can read the electronic data for each part from the terms and conditions database and provide it to the client terminal. The user therefore does not necessarily have to enlarge the entire part. However, even when the user wants to check only a portion of a part, data that the user does not need is displayed as well. In that case, the content is not necessarily easy for the user to check.

The present invention has been made in view of these circumstances, and its object is to provide a voice reading device, a voice reading system, a voice reading method, and a program that make it easier for a user to recognize the content of a character string in a range selected by the user.

To solve the above problem, one aspect of the present invention is a voice reading device comprising: a display unit that displays text information; an operation input unit that receives an operation input selecting part or all of a character string included in the text information displayed by the display unit; and a voice reproduction unit that reproduces voice based on character string data indicating the character string selected by the operation input.

Another aspect of the present invention is the voice reading device according to aspect (1) above, wherein the voice reproduction unit reproduces voice based on character string data indicating a character string obtained by linguistically analyzing, using dictionary information, the character string based on the character string data.

Another aspect of the present invention is the voice reading device according to aspect (1) above, wherein the voice reproduction unit reproduces voice based on character string data indicating a character string obtained by linguistically analyzing, using dictionary information, the character string based on the character string data and then translating the analyzed character string into another language.

Another aspect of the present invention is a voice reading system comprising a voice reading device and a voice conversion server. The voice reading device comprises: a display unit that displays text information; an operation input unit that receives an operation input selecting part or all of a character string included in the text information displayed by the display unit; a first communication unit that transmits character string data indicating the character string selected by the operation input to a second communication unit and receives voice data from the second communication unit; and a voice reproduction unit that reproduces voice based on the voice data. The voice conversion server comprises: a voice conversion unit that converts the character string data transmitted from the first communication unit into the voice data; and the second communication unit, which receives the character string data from the first communication unit and transmits the voice data converted by the voice conversion unit to the first communication unit.

Another aspect of the present invention is a voice reading method performed by a computer, the method comprising: a display step in which a display unit displays text information; an operation input step in which an operation input unit receives an operation input selecting part or all of a character string included in the text information displayed by the display unit; and a voice reproduction step in which a voice reproduction unit reproduces voice based on character string data indicating the character string selected by the operation input.

Another aspect of the present invention is a program that causes a computer to execute: a display step of displaying text information; an operation input step of receiving an operation input by a user selecting part or all of a character string included in the text information displayed in the display step; and a voice reproduction step of reproducing voice based on character string data indicating the character string selected by the operation input.

As described above, according to the present invention, the content of a character string in a range selected by the user can be made easier for the user to recognize.

FIG. 1 is a block diagram showing the functional configuration of the voice reading device and the voice conversion server of the voice reading system according to the first embodiment of the present invention.
FIG. 2 is a schematic diagram showing an example of character string selection in the voice reading device of the voice reading system according to the first embodiment of the present invention.
FIG. 3 is a flowchart showing the operation of the voice reading system according to the first embodiment of the present invention.
FIG. 4 is a block diagram showing the functional configuration of the voice conversion server of the voice reading system according to the second embodiment of the present invention.
FIG. 5 is a flowchart showing the operation of the voice reading system according to the second embodiment of the present invention.
FIG. 6 is a block diagram showing the functional configuration of the voice conversion server of the voice reading system according to the third embodiment of the present invention.
FIG. 7 is a flowchart showing the operation of the voice reading system according to the third embodiment of the present invention.
FIG. 8 is a block diagram showing the functional configuration of the voice reading device of the voice reading system according to the fourth embodiment of the present invention.

<First Embodiment>
A voice reading system according to a first embodiment of the present invention is described below with reference to the drawings.

[Configuration of the voice reading system]
FIG. 1 is a block diagram showing the functional configuration of the voice reading device and the voice conversion server of the voice reading system according to the first embodiment of the present invention. The voice reading system 1 shown in FIG. 1 comprises a voice reading device 10A, a voice conversion server 20A, and a communication network 50.

The voice reading device 10A displays text information, receives an operation input selecting part or all of a character string included in the displayed text information, and transmits character string data indicating the selected character string to the voice conversion server 20A via the communication network 50. The voice reading device 10A also receives voice data from the voice conversion server 20A via the communication network 50 and reproduces voice based on that voice data.
The voice reading device is a small portable information terminal, for example a smartphone.

The voice conversion server 20A receives the character string data transmitted from the voice reading device 10A via the communication network 50, converts the received character string data into voice data, and transmits the resulting voice data to the voice reading device 10A via the communication network 50.
The voice conversion server 20 includes a computer device, for example a general-purpose computer or a personal computer.

The communication network 50 is a communication network over which the voice reading device 10A and the voice conversion server 20 are communicatively connected. The communication network 50 consists of, for example, the Internet, a WAN (Wide Area Network), a LAN (Local Area Network), or any combination of these networks.
The functional configurations of the voice reading device 10A and the voice conversion server 20A are described in turn below.

Although the voice reading system 1 according to the present embodiment has been described as comprising the voice reading device 10A, the voice conversion server 20A, and the communication network 50, the configuration is not limited to this. For example, as in the voice reading system 1 according to the fourth embodiment described later, the voice reading device 10A may itself have the functions of the voice conversion server 20A. For example, a smartphone may display text information, receive an operation input selecting part or all of a character string included in the displayed text information, convert character string data indicating the selected character string into voice data, and reproduce voice based on that voice data. In other words, the smartphone alone may perform the voice reading.

[Configuration of the voice reading device]
As shown in FIG. 1, the voice reading device 10A includes a control unit 100, a storage unit 101, a communication unit 102, a display unit 103, an operation input unit 104, a selection range detection unit 105, and a voice reproduction unit 106.

The control unit 100 controls the processing performed by each functional block of the voice reading device 10A. The control unit 100 includes, for example, a CPU (Central Processing Unit).

The storage unit 101 stores various programs and various data used in the voice reading device 10A. The storage unit 101 also stores content data such as the text information (for example, a source file written in HTML format) included in images displayed by the display unit 103 described later.
The storage unit 101 consists of a storage medium, for example an HDD (Hard Disk Drive), flash memory, EEPROM (Electrically Erasable Programmable Read Only Memory), RAM (Random Access Memory), ROM (Read Only Memory), or any combination of these storage media.

The communication unit 102 is a communication interface that connects to the communication network 50. The communication unit 102 transmits and receives various data between the voice reading device 10A and the voice conversion server 20A via the communication network 50. For example, the communication unit 102 (first communication unit) transmits character string data indicating the character string selected by an operation input, via the communication network 50, to the communication unit 202 (second communication unit) of the voice conversion server 20A described later. The communication unit 102 (first communication unit) also receives voice data from the communication unit 202 (second communication unit) of the voice conversion server 20A via the communication network 50.

The display unit 103 is a user interface that outputs image information to the user. For example, the display unit 103 displays an image including text information. The text information may be text information stored in the storage unit 101, or text information acquired from an external web server (not shown) or the like via the communication network 50. The display unit 103 also displays, for example, an image indicating which character string within the text information is in the range selected by the user (for example, shading applied to the character string in the selected range). The display unit 103 includes a display, for example an LCD (Liquid Crystal Display) or an organic EL (electroluminescence) display.

The operation input unit 104 is a user interface that receives operation inputs from the user. For example, the operation input unit 104 receives an operation input selecting part or all of a character string included in the text information displayed by the display unit 103. The operation input unit 104 includes, for example, a touch-panel liquid crystal display in which the display constituting the display unit 103 is integrated with a sensor for detecting touch operations.

The selection range detection unit 105 detects the range of the character string selected by the operation input received by the operation input unit 104, and generates character string data indicating the character string included in the detected selection range. A specific example of character string selection is described later.

The voice reproduction unit 106 is a user interface that outputs audio information to the user. For example, the voice reproduction unit 106 reproduces voice based on the voice data received by the communication unit 102 (first communication unit) via the communication network 50. The voice reproduction unit 106 includes, for example, a speaker or headphones.

[Configuration of the voice conversion server]
As shown in FIG. 1, the voice conversion server 20A includes a control unit 200, a storage unit 201, a communication unit 202, a voice information storage unit 203, and a voice conversion unit 204.

The control unit 200 controls the processing performed by each functional block of the voice conversion server 20A. The control unit 200 includes, for example, a CPU.

The storage unit 201 stores various programs and various data used in the voice conversion server 20A. The storage unit 201 consists of a storage medium, for example an HDD, flash memory, EEPROM, RAM, ROM, or any combination of these storage media.

The communication unit 202 is a communication interface that connects to the communication network 50. The communication unit 202 transmits and receives various data between the voice conversion server 20A and the voice reading device 10A via the communication network 50. For example, the communication unit 202 (second communication unit) receives character string data indicating the character string selected by an operation input from the communication unit 102 (first communication unit) of the voice reading device 10A via the communication network 50. The communication unit 202 (second communication unit) also transmits the voice data into which the voice conversion unit 204, described later, has converted that character string, via the communication network 50, to the communication unit 102 (first communication unit) of the voice reading device 10A.

The voice information storage unit 203 stores voice information in which character string data and voice data are associated with each other. The voice information storage unit 203 consists of a storage medium, for example an HDD, flash memory, EEPROM, RAM, ROM, or any combination of these storage media.

The voice conversion unit 204 uses the character string data transmitted from the communication unit 102 (first communication unit) of the voice reading device 10A as a search key to search the character string data included in the voice information stored in the voice information storage unit 203, and acquires the voice data associated with the character string data found. In this way, the voice conversion unit 204 converts the character string data transmitted from the communication unit 102 (first communication unit) of the voice reading device 10A into voice data.
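The lookup performed by the voice conversion unit 204 can be sketched roughly as follows. This is only an illustration under stated assumptions, not the patent's implementation: the class name `VoiceInfoStore`, the function `convert_to_voice`, the use of byte strings as placeholder voice data, and the exact-match policy are all hypothetical. The source states only that the character string data is used as a search key against voice information that associates character string data with voice data.

```python
from typing import Dict, Optional


class VoiceInfoStore:
    """Hypothetical stand-in for the voice information storage unit 203."""

    def __init__(self) -> None:
        # Maps character string data to its associated voice data.
        self._table: Dict[str, bytes] = {}

    def register(self, text: str, voice: bytes) -> None:
        """Associate a character string with (placeholder) voice data."""
        self._table[text] = voice

    def find(self, text: str) -> Optional[bytes]:
        """Return the voice data for an exact match, or None if absent."""
        return self._table.get(text)


def convert_to_voice(store: VoiceInfoStore, text: str) -> Optional[bytes]:
    """Use the received character string data as the search key."""
    return store.find(text)


store = VoiceInfoStore()
store.register("hello", b"AUDIO-hello")
print(convert_to_voice(store, "hello"))
```

The source does not say what happens when no matching entry exists, so this sketch simply returns `None` in that case.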

[Specific example of character string selection]
An example of character string selection in the voice reading device 10A is described below.
FIG. 2 is a schematic diagram showing an example of character string selection in the voice reading device of the voice reading system according to the first embodiment of the present invention.

FIG. 2 shows a smartphone serving as the voice reading device 10A. The smartphone is equipped with a touch-panel liquid crystal display that serves as both the display unit 103 and the operation input unit 104. As illustrated, the liquid crystal display shows an image of a document beginning with the words "○○○○○株式会社は〜" ("○○○○○ Co., Ltd. ..."), and above the image of the document, an image of a pull-down menu pm1 and an image of a voice reading button bt1 are displayed.

As also illustrated, in the image of the document displayed on the liquid crystal display, the range of the character string from "技術や〜" ("technology and ...") to "〜を最適化" ("... optimize") is shaded. The shaded range is the character string selection range sa1, and the shading indicates that the user has traced with a finger the portion of the liquid crystal display where the character string from "技術や〜" to "〜を最適化" is displayed. In this way, a user who is reading the text on the liquid crystal display and finds a portion hard to read can select a partial range of the character string in the document by tracing that portion with a finger.

Suppose that, of the text displayed on the display unit 103 of the voice reading device 10A shown in FIG. 2, the user selects the character string "術やノウハウをベースに、お客様の情報伝", which begins partway through the word 技術 ("technology") and ends partway through the word 情報伝達 ("information transmission"). When the voice reading device 10A detects that only part of a word or compound has been selected in this way, as with "術" and "情報伝", it may automatically widen the selected range to the presumed word or compound boundaries, so that the selection runs from "技術" to "情報伝達".
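The automatic widening of a selection to word boundaries can be sketched as follows. The source does not say how the device estimates word boundaries, so this sketch assumes that a segmentation of the text into words is already available. The word list `WORDS` is hand-written for illustration (it is not the output of a real morphological analyzer), and the helper names `word_spans` and `expand_to_word_boundaries` are hypothetical.

```python
from typing import List, Sequence, Tuple

Span = Tuple[int, int]  # half-open [start, end) character offsets


def word_spans(words: Sequence[str]) -> List[Span]:
    """Turn an already segmented word list into character spans."""
    spans: List[Span] = []
    pos = 0
    for word in words:
        spans.append((pos, pos + len(word)))
        pos += len(word)
    return spans


def expand_to_word_boundaries(spans: Sequence[Span], start: int, end: int) -> Span:
    """Widen the selection [start, end) so it never cuts through a word."""
    new_start, new_end = start, end
    for w_start, w_end in spans:
        if w_start < start < w_end:  # selection starts inside this word
            new_start = w_start
        if w_start < end < w_end:    # selection ends inside this word
            new_end = w_end
    return new_start, new_end


# Hand-written segmentation (an assumption for this example only).
WORDS = ["技術", "や", "ノウハウ", "を", "ベース", "に", "、", "お客様", "の", "情報伝達"]
TEXT = "".join(WORDS)

# The partial selection from the example in the text.
partial = "術やノウハウをベースに、お客様の情報伝"
start = TEXT.index(partial)
end = start + len(partial)

new_start, new_end = expand_to_word_boundaries(word_spans(WORDS), start, end)
print(TEXT[new_start:new_end])
```

A selection that already starts and ends on word boundaries is returned unchanged, because neither condition in the loop matches.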

[Operation of the voice reading system]
The operation of the voice reading system 1 according to the first embodiment is described below.
FIG. 3 is a flowchart showing the operation of the voice reading system according to the first embodiment of the present invention. The processing of this flowchart starts when the control unit 100 of the voice reading device 10A outputs content data including text information to the display unit 103.

(Step S001) The display unit 103 of the voice reading device 10A acquires content data including text information. The process then proceeds to step S002.
The content data may be content data stored in advance in the storage unit 101 and output by the control unit 100 from the storage unit 101 to the display unit 103, or it may be content data acquired from an external content server (not shown) connected to the communication network 50 and output by the control unit 100 from the communication unit 102 to the display unit 103.

(Step S002) The display unit 103 of the voice reading device 10A displays an image showing the content based on the acquired content data. The process then proceeds to step S003.

(Step S003) The operation input unit 104 of the voice reading device 10A detects a signal indicating a voice reading instruction based on an operation input by the user. For example, when the user's finger touches the image of the voice reading button bt1 on the liquid crystal display shown in FIG. 2, a signal indicating a voice reading instruction is generated, and the operation input unit 104 detects that signal. On detecting the signal, the operation input unit 104 enters a mode in which it accepts an operation input indicating character string selection by the user. The process then proceeds to step S004.

(Step S004) The selection range detection unit 105 of the voice reading device 10A detects the range of the character string selected by the operation input received by the operation input unit 104. For example, on the liquid crystal display shown in FIG. 2, the touch operation detection sensor mounted on the display detects that the user's finger has traced the character string selection range sa1 (that is, the range of the character string from "技術や〜" to "〜を最適化"), and the selection range detection unit 105 thereby detects the character string selection range. When the touch operation detection sensor detects that the user's finger has left the liquid crystal display, the operation input unit 104 switches from the mode in which it accepts an operation input indicating character string selection back to the normal mode. The process then proceeds to step S005.
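The mode switching in steps S003 and S004 can be pictured as a small state machine. The sketch below is an assumption-laden illustration: the class `OperationInput`, its method names, and the idea that touch positions arrive already mapped to character indices are all hypothetical and are not taken from the source.

```python
from enum import Enum, auto
from typing import Optional, Tuple


class Mode(Enum):
    NORMAL = auto()
    SELECTING = auto()  # accepting a character string selection


class OperationInput:
    """Hypothetical model of the operation input unit 104."""

    def __init__(self) -> None:
        self.mode = Mode.NORMAL
        self._start: Optional[int] = None
        self._end: Optional[int] = None

    def press_read_aloud_button(self) -> None:
        """Step S003: the voice reading button switches to selection mode."""
        self.mode = Mode.SELECTING
        self._start = self._end = None

    def touch_move(self, char_index: int) -> None:
        """Step S004: record the characters traced by the finger."""
        if self.mode is not Mode.SELECTING:
            return  # traces are ignored in the normal mode
        if self._start is None:
            self._start = char_index
        self._end = char_index

    def touch_up(self) -> Optional[Tuple[int, int]]:
        """Finger lifted: return to normal mode and report the range."""
        if self.mode is not Mode.SELECTING:
            return None
        self.mode = Mode.NORMAL
        if self._start is None or self._end is None:
            return None
        low, high = sorted((self._start, self._end))
        return low, high + 1  # half-open [start, end) range


ui = OperationInput()
ui.press_read_aloud_button()
for index in (5, 6, 7):
    ui.touch_move(index)
selection = ui.touch_up()
print(selection, ui.mode)
```

Returning a half-open range keeps the result directly usable as a Python slice when the selected character string is extracted.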

(Step S005) The selection range detection unit 105 of the voice reading device 10A extracts the character string contained in the image of the character string selection range detected in step S004, and generates character string data by converting the extracted character string into text data. The control unit 100 transmits the character string data, via the communication unit 102 (first communication unit), to the communication unit 202 (second communication unit) of the voice conversion server 20A. The process then proceeds to step S006.
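One way to picture step S005 is to cut the selected range out of the displayed text and package it for transmission. The source does not specify any wire format, so the JSON payload, the field name `text`, the UTF-8 encoding, and the function names below are all assumptions made for illustration.

```python
import json


def build_request(document_text: str, start: int, end: int) -> bytes:
    """Extract the selected range [start, end) and encode it for sending."""
    if not (0 <= start < end <= len(document_text)):
        raise ValueError("selection range is empty or out of bounds")
    selected = document_text[start:end]
    # ensure_ascii=False keeps non-ASCII characters readable in the payload.
    return json.dumps({"text": selected}, ensure_ascii=False).encode("utf-8")


def parse_request(payload: bytes) -> str:
    """Server side: recover the character string data from the payload."""
    return json.loads(payload.decode("utf-8"))["text"]


payload = build_request("技術やノウハウをベースに", 0, 2)
print(parse_request(payload))
```

Validating the range before slicing avoids silently sending an empty string when the selection is degenerate.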

（ステップＳ００６）音声変換サーバ２０Ａの通信部２０２（第２の通信部）は、ステップＳ００５において音声読み上げ装置１０Ａの通信部１０２（第１の通信部）から送信された文字列データを、通信ネットワーク５０を介して受信する。音声変換サーバ２０Ａの制御部２００は、通信部２０２が受信した文字列データを音声変換部２０４へ出力する。
音声変換部２０４は、制御部２００により通信部２０２が受信した文字列データを取得すると、当該文字列データを検索キーとして、音声情報記憶部２０３に記憶された音声情報に含まれる文字列データを検索し、検索された文字列データに対応付けられた音声データを取得する。これにより、音声変換部２０４は、音声読み上げ装置１０Ａの通信部１０２（第１の通信部）から送信された文字列データを音声データに変換する。
音声変換サーバ２０Ａの制御部２００は、音声変換部２０４によって変換された音声データを、通信部２０２（第２の通信部）を介して、音声読み上げ装置１０Ａの通信部１０２（第１の通信部）へ送信する。その後、ステップＳ００７へ進む。 (Step S006) The communication unit 202 (second communication unit) of the voice conversion server 20A transmits the character string data transmitted from the communication unit 102 (first communication unit) of the voice reading device 10A in step S005 to the communication network. 50 through. The control unit 200 of the voice conversion server 20A outputs the character string data received by the communication unit 202 to the voice conversion unit 204.
When the voice conversion unit 204 acquires, from the control unit 200, the character string data received by the communication unit 202, it uses that character string data as a search key to search the character string data included in the voice information stored in the voice information storage unit 203, and acquires the voice data associated with the matching character string data. In this way, the voice conversion unit 204 converts the character string data transmitted from the communication unit 102 (first communication unit) of the voice reading device 10A into voice data.
The control unit 200 of the voice conversion server 20A transmits the voice data produced by the voice conversion unit 204 to the communication unit 102 (first communication unit) of the voice reading device 10A via the communication unit 202 (second communication unit). The process then proceeds to step S007.
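The lookup performed by the voice conversion unit 204 in step S006 can be sketched as follows. The specification only says that stored character string data is searched and the associated voice data is acquired; the longest-match segmentation, the skipping of unmatched characters, and the name `to_voice_data` are assumptions made for illustration, and the byte strings are placeholders rather than real audio.

```python
def to_voice_data(text, voice_info):
    """Concatenate stored audio clips for the longest matching substrings of text.

    voice_info maps character string data to voice data (bytes).
    Longest-match-first is an assumption; the specification does not name a strategy.
    """
    max_len = max((len(key) for key in voice_info), default=0)
    audio = bytearray()
    i = 0
    while i < len(text):
        for n in range(min(max_len, len(text) - i), 0, -1):
            clip = voice_info.get(text[i:i + n])
            if clip is not None:
                audio += clip
                i += n
                break
        else:
            i += 1  # no stored audio starts here; skip this character
    return bytes(audio)


# Placeholder voice information: keys are strings, values stand in for audio clips.
VOICE_INFO = {"ab": b"\x01", "a": b"\x02", "c": b"\x03"}
voice_data = to_voice_data("abc?a", VOICE_INFO)
```

Storing clips for longer strings lets a phrase be voiced as recorded, while shorter entries act as a fallback.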

（ステップＳ００７）音声読み上げ装置１０Ａの通信部１０２（第１の通信部）は、ステップＳ００６において音声変換サーバ２０Ａの通信部２０２（第２の通信部）から送信された音声データを、通信ネットワーク５０を介して受信する。音声読み上げ装置１０Ａの制御部１００は、通信部１０２が受信した音声データを音声再生部１０６へ出力する。
音声再生部１０６は、制御部１００により通信部１０２が受信した音声データを取得すると、当該音声データに基づく音声を再生する。
以上で本フローチャートの処理が終了する。 (Step S007) The communication unit 102 (first communication unit) of the voice reading device 10A receives, via the communication network 50, the voice data transmitted in step S006 from the communication unit 202 (second communication unit) of the voice conversion server 20A. The control unit 100 of the voice reading device 10A outputs the voice data received by the communication unit 102 to the audio reproduction unit 106.
When the audio reproduction unit 106 acquires, from the control unit 100, the voice data received by the communication unit 102, it reproduces audio based on that voice data.
Thus, the process of this flowchart is completed.

＜第２の実施形態＞
以下、本発明の第２の実施形態による音声読み上げシステムについて図面を参照して説明する。 <Second Embodiment>
Hereinafter, a speech reading system according to a second embodiment of the present invention will be described with reference to the drawings.

［音声変換サーバの構成］
図４は、この発明の第２の実施形態による音声読み上げシステムの音声変換サーバの機能構成を示すブロック図である。同図に示す音声変換サーバ２０Ｂは、制御部２００と、記憶部２０１と、通信部２０２と、音声情報記憶部２０３と、音声変換部２０４と、辞書記憶部２０５と、言語解析部２０６と、を含んで構成される。なお、同図において図１の各部に対応する部分には同一の符号を付け、その説明を省略する。 [Configuration of voice conversion server]
FIG. 4 is a block diagram showing the functional configuration of the voice conversion server of the voice reading system according to the second embodiment of the present invention. The voice conversion server 20B shown in the figure includes a control unit 200, a storage unit 201, a communication unit 202, a voice information storage unit 203, a voice conversion unit 204, a dictionary storage unit 205, and a language analysis unit 206. In the figure, parts corresponding to those in FIG. 1 are given the same reference numerals, and their description is omitted.

辞書記憶部２０５は、文字列データと、単語、熟語、意味、他のデータとの関係、起源、用途、フォーマット、およびイントネーション等の言語情報と、が対応付けられた辞書情報を、集中的に記憶する。辞書記憶部２０５は、記憶媒体、例えば、例えば、ＨＤＤ、フラッシュメモリ、ＥＥＰＲＯＭ、ＲＡＭ、ＲＯＭ、またはこれらの記憶媒体の任意の組み合わせによって構成される。 The dictionary storage unit 205 centrally stores dictionary information in which character string data is associated with language information such as words, idioms, meanings, relationships with other data, origin, usage, format, and intonation. The dictionary storage unit 205 is configured by a storage medium, for example, an HDD, a flash memory, an EEPROM, a RAM, a ROM, or any combination of these storage media.

言語解析部２０６は、音声読み上げ装置１０Ａの通信部１０２（第１の通信部）から送信された文字列データを検索キーとして、辞書記憶部２０５に記憶された辞書情報に含まれる文字列データを検索し、検索された文字列データに対応付けられた言語情報を取得する。これにより、言語解析部２０６は、音声読み上げ装置１０Ａの通信部１０２（第１の通信部）から送信された文字列データが示す文字列を解析して、当該文字列における文節や単語の区切り等を特定し、当該文字列に含まれる語句およびイントネーション等を特定する。 The language analysis unit 206 uses the character string data transmitted from the communication unit 102 (first communication unit) of the voice reading device 10A as a search key to search the character string data included in the dictionary information stored in the dictionary storage unit 205, and acquires the language information associated with the matching character string data. In this way, the language analysis unit 206 analyzes the character string indicated by the transmitted character string data, identifies phrase and word boundaries in the character string, and identifies the words, intonation, and the like contained in it.

音声変換部２０４は、言語解析部２０６による上記の解析によって特定された言語情報（例えば、語句およびイントネーション等を示す情報）を用いて、文字列データから音声データへ変換する。これにより、第２の実施形態における音声読み上げシステム１は、第１の実施形態における音声読み上げシステム１と比べて、より自然な発音での音声による読み上げをすることができる。 The voice conversion unit 204 converts character string data into voice data using the language information (for example, information indicating a phrase and intonation) specified by the analysis by the language analysis unit 206. Thereby, the voice reading system 1 in the second embodiment can read out with a voice with a more natural pronunciation as compared with the voice reading system 1 in the first embodiment.
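The dictionary-based analysis described above can be sketched as a small data structure plus a segmentation routine. The `LanguageInfo` fields, the longest-match strategy, and the accent values are all assumptions for illustration; the specification only states that the language information includes items such as words and intonation.

```python
from typing import NamedTuple


class LanguageInfo(NamedTuple):
    word: str     # surface form as it appears in the selected text
    reading: str  # pronunciation handed to the voice conversion step
    accent: int   # hypothetical intonation marker; the values below are placeholders


def analyze(text, dictionary):
    """Split text into dictionary words (longest match first) with their language info."""
    max_len = max((len(w) for w in dictionary), default=1)
    result, i = [], 0
    while i < len(text):
        for n in range(min(max_len, len(text) - i), 0, -1):
            info = dictionary.get(text[i:i + n])
            if info is not None:
                break
        else:
            # Unknown character: pass it through with no intonation data.
            n, info = 1, LanguageInfo(text[i], text[i], 0)
        result.append(info)
        i += n
    return result


DICTIONARY = {
    "技術": LanguageInfo("技術", "ぎじゅつ", 1),
    "最適化": LanguageInfo("最適化", "さいてきか", 0),
}
tokens = analyze("技術を最適化", DICTIONARY)
```

The resulting token list carries the word boundaries and readings that the voice conversion unit 204 would use in place of the raw character string.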

［音声読み上げシステムの動作］
以下、第２の実施形態に係る音声読み上げシステム１の動作について説明する。
図５は、この発明の第２の実施形態による音声読み上げシステムの動作を示すフローチャートである。本フローチャートの処理は、音声読み上げ装置１０Ａの制御部１００により表示部１０３へテキスト情報を含むコンテンツデータが出力された際に開始する。 [Operation of voice reading system]
Hereinafter, the operation of the speech reading system 1 according to the second embodiment will be described.
FIG. 5 is a flowchart showing the operation of the speech reading system according to the second embodiment of the present invention. The processing of this flowchart starts when content data including text information is output to the display unit 103 by the control unit 100 of the speech reading apparatus 10A.

ステップＳ１０１からステップＳ１０５までの動作は、第１の実施形態に係る音声読み上げシステム１の動作において説明したステップＳ００１からステップＳ００５までの動作と同様であるため、説明を省略する。 Since the operation from step S101 to step S105 is the same as the operation from step S001 to step S005 described in the operation of the speech reading system 1 according to the first embodiment, the description thereof is omitted.

（ステップＳ１０６）音声変換サーバ２０Ａの通信部２０２（第２の通信部）は、ステップＳ１０５において音声読み上げ装置１０Ａの通信部１０２（第１の通信部）から送信された文字列データを、通信ネットワーク５０を介して受信する。音声変換サーバ２０Ａの制御部２００は、通信部２０２が受信した文字列データを言語解析部２０６へ出力する。
言語解析部２０６は、制御部２００により通信部２０２が受信した文字列データを取得すると、当該文字列データを検索キーとして、辞書記憶部２０５に記憶された辞書情報に含まれる文字列データを検索し、検索された文字列データに対応付けられた言語情報を取得する。制御部２００は、言語解析部２０６が取得した言語情報を、音声変換部２０４へ出力する。その後、ステップＳ１０７へ進む。 (Step S106) The communication unit 202 (second communication unit) of the voice conversion server 20A receives, via the communication network 50, the character string data transmitted in step S105 from the communication unit 102 (first communication unit) of the voice reading device 10A. The control unit 200 of the voice conversion server 20A outputs the character string data received by the communication unit 202 to the language analysis unit 206.
When the language analysis unit 206 acquires, from the control unit 200, the character string data received by the communication unit 202, it uses that character string data as a search key to search the character string data included in the dictionary information stored in the dictionary storage unit 205, and acquires the language information associated with the matching character string data. The control unit 200 outputs the language information acquired by the language analysis unit 206 to the voice conversion unit 204. The process then proceeds to step S107.

（ステップＳ１０７）音声変換サーバ２０Ａの音声変換部２０４は、制御部２００により言語解析部２０６が取得した言語情報を取得すると、当該言語情報（例えば、語句およびイントネーション等を示す情報）を用いて、文字列データから音声データへ変換する。
これにより、音声変換部２０４は、音声読み上げ装置１０Ａの通信部１０２（第１の通信部）から送信された文字列データが、言語解析部２０６によって言語解析された文字列データを、音声データに変換する。
音声変換サーバ２０Ａの制御部２００は、音声変換部２０４によって変換された音声データを、通信部２０２（第２の通信部）を介して、音声読み上げ装置１０Ａの通信部１０２（第１の通信部）へ送信する。その後、ステップＳ１０８へ進む。 (Step S107) When the voice conversion unit 204 of the voice conversion server 20A acquires, from the control unit 200, the language information acquired by the language analysis unit 206, it uses that language information (for example, information indicating words and intonation) to convert the character string data into voice data.
In this way, the voice conversion unit 204 converts into voice data the character string data that was transmitted from the communication unit 102 (first communication unit) of the voice reading device 10A and linguistically analyzed by the language analysis unit 206.
The control unit 200 of the voice conversion server 20A transmits the voice data produced by the voice conversion unit 204 to the communication unit 102 (first communication unit) of the voice reading device 10A via the communication unit 202 (second communication unit). The process then proceeds to step S108.

ステップＳ１０８の動作は、第１の実施形態に係る音声読み上げシステム１の動作において説明したステップＳ００７の動作と同様であるため、説明を省略する。
以上で本フローチャートの処理が終了する。 Since the operation in step S108 is the same as the operation in step S007 described in the operation of the speech reading system 1 according to the first embodiment, a description thereof will be omitted.
Thus, the process of this flowchart is completed.

＜第３の実施形態＞
以下、本発明の第３の実施形態による音声読み上げシステムについて図面を参照して説明する。 <Third Embodiment>
Hereinafter, a speech reading system according to a third embodiment of the present invention will be described with reference to the drawings.

［音声変換サーバの構成］
図６は、この発明の第３の実施形態による音声読み上げシステムの音声変換サーバの機能構成を示すブロック図である。同図に示す音声変換サーバ２０Ｃは、制御部２００と、記憶部２０１と、通信部２０２と、音声情報記憶部２０３と、音声変換部２０４と、辞書記憶部２０５と、言語解析部２０６と、外国語辞書記憶部２０７と、翻訳部２０８と、を含んで構成される。なお、同図において図１および図４の各部に対応する部分には同一の符号を付け、その説明を省略する。 [Configuration of voice conversion server]
FIG. 6 is a block diagram showing a functional configuration of the voice conversion server of the voice reading system according to the third embodiment of the present invention. The voice conversion server 20C shown in the figure includes a control unit 200, a storage unit 201, a communication unit 202, a voice information storage unit 203, a voice conversion unit 204, a dictionary storage unit 205, a language analysis unit 206, A foreign language dictionary storage unit 207 and a translation unit 208 are included. In the figure, parts corresponding to those in FIGS. 1 and 4 are denoted by the same reference numerals, and description thereof is omitted.

外国語辞書記憶部２０７は、母国語の言語情報と外国語の言語情報とが対応付けられた外国語辞書情報を記憶する。外国語辞書記憶部２０７は、記憶媒体、例えば、例えば、ＨＤＤ、フラッシュメモリ、ＥＥＰＲＯＭ、ＲＡＭ、ＲＯＭ、またはこれらの記憶媒体の任意の組み合わせによって構成される。 The foreign language dictionary storage unit 207 stores foreign language dictionary information in which native language information and foreign language information are associated with each other. The foreign language dictionary storage unit 207 is configured by a storage medium, for example, an HDD, a flash memory, an EEPROM, a RAM, a ROM, or any combination of these storage media.

翻訳部２０８は、言語解析部２０６によって解析された母国語での言語情報を検索キーとして、外国語辞書記憶部２０７に記憶された外国語辞書情報に含まれる母国語の言語情報を検索し、検索された母国語の言語情報に対応付けられた外国語の言語情報を取得する。これにより、翻訳部２０８は、音声読み上げ装置１０Ａの通信部１０２（第１の通信部）から送信された文字列データが示す母国語の言語から、指定された外国語の言語へ翻訳する。
なお、図２に示したプルダウンメニューｐｍ１によってユーザは言語を選択することができる。翻訳部２０８は、母国語の言語から、プルダウンメニューｐｍ１によって指定された外国語の言語へ翻訳する。 The translation unit 208 searches the language information of the native language included in the foreign language dictionary information stored in the foreign language dictionary storage unit 207 using the language information in the native language analyzed by the language analysis unit 206 as a search key, The foreign language language information associated with the searched native language information is acquired. Thereby, the translation unit 208 translates the native language indicated by the character string data transmitted from the communication unit 102 (first communication unit) of the speech reading apparatus 10A into the designated foreign language.
Note that the user can select a language using the pull-down menu pm1 shown in FIG. The translation unit 208 translates from the native language into the foreign language specified by the pull-down menu pm1.
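The lookup performed by the translation unit 208 can be sketched as a dictionary keyed by native-language word and target language. This is a word-by-word illustration only: the key layout, the language codes, the entries, and the function name `translate` are assumptions, and a practical translator would also need to handle grammar and word order.

```python
# Hypothetical foreign language dictionary: (native word, target language) -> foreign word.
FOREIGN_DICTIONARY = {
    ("技術", "en"): "technology",
    ("最適化", "en"): "optimization",
    ("技術", "fr"): "technologie",
}


def translate(words, target_language, foreign_dictionary):
    """Replace each analyzed word with its entry for the selected language.

    Words without an entry are passed through unchanged.
    """
    return [foreign_dictionary.get((word, target_language), word) for word in words]


# The target language would come from the pull-down menu pm1 selection.
english = translate(["技術", "を", "最適化"], "en", FOREIGN_DICTIONARY)
french = translate(["技術"], "fr", FOREIGN_DICTIONARY)
```

Keying on the target language lets one stored dictionary serve every language offered in the pull-down menu.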

［音声読み上げシステムの動作］
以下、第３の実施形態に係る音声読み上げシステム１の動作について説明する。
図７は、この発明の第３の実施形態による音声読み上げシステムの動作を示すフローチャートである。本フローチャートの処理は、音声読み上げ装置１０Ａの制御部１００により表示部１０３へテキスト情報を含むコンテンツデータが出力された際に開始する。 [Operation of voice reading system]
Hereinafter, the operation of the speech reading system 1 according to the third embodiment will be described.
FIG. 7 is a flowchart showing the operation of the speech reading system according to the third embodiment of the present invention. The processing of this flowchart starts when content data including text information is output to the display unit 103 by the control unit 100 of the speech reading apparatus 10A.

ステップＳ２０１およびステップＳ２０２の動作は、第１の実施形態に係る音声読み上げシステム１の動作において説明したステップＳ００１およびステップＳ００２の動作とそれぞれ同様であるため、説明を省略する。 The operations in step S201 and step S202 are the same as the operations in step S001 and step S002 described in the operation of the speech reading system 1 according to the first embodiment, and thus the description thereof is omitted.

（ステップＳ２０３）音声読み上げ装置１０Ａの操作入力部１０４は、ユーザの操作入力（例えば、図２に示したプルダウンメニューｐｍ１による操作入力）に基づく言語選択指示を示す信号を検出する。操作入力部１０４は、当該信号を検知すると、制御部１００は、ユーザによって選択された言語を示す情報を、記憶部１０１に一時記憶させる。その後、ステップＳ２０４へ進む。 (Step S203) The operation input unit 104 of the voice reading device 10A detects a signal indicating a language selection instruction based on a user operation input (for example, an operation input via the pull-down menu pm1 shown in FIG. 2). When the operation input unit 104 detects the signal, the control unit 100 temporarily stores information indicating the language selected by the user in the storage unit 101. The process then proceeds to step S204.

ステップＳ２０４およびステップＳ２０５の動作は、第１の実施形態に係る音声読み上げシステム１の動作において説明したステップＳ００３およびステップＳ００４の動作とそれぞれ同様であるため、説明を省略する。 The operations in step S204 and step S205 are the same as the operations in step S003 and step S004 described in the operation of the speech reading system 1 according to the first embodiment, and thus the description thereof is omitted.

（ステップＳ２０６）音声読み上げ装置１０Ａの選択範囲検出部１０５は、ステップＳ２０５において検出した文字列選択範囲の画像に含まれる文字列を抽出し、抽出した文字列をテキストデータに変換した文字列データを生成する。制御部１００は、当該文字列データを、通信部１０２（第１の通信部）を介して、音声変換サーバ２０Ａの通信部２０２（第２の通信部）へ送信する。また、ステップＳ２０３において記憶部１０１に一時記憶された、ユーザによって選択された言語を示す情報も併せて送信される。その後、ステップＳ２０７へ進む。 (Step S206) The selection range detection unit 105 of the voice reading device 10A extracts the character string contained in the image of the character string selection range detected in step S205, and generates character string data by converting the extracted character string into text data. The control unit 100 transmits the character string data to the communication unit 202 (second communication unit) of the voice conversion server 20A via the communication unit 102 (first communication unit). The information indicating the language selected by the user, which was temporarily stored in the storage unit 101 in step S203, is transmitted together with it. The process then proceeds to step S207.

（ステップＳ２０７）音声変換サーバ２０Ａの通信部２０２（第２の通信部）は、ステップＳ２０６において音声読み上げ装置１０Ａの通信部１０２（第１の通信部）から送信された文字列データ、およびユーザによって選択された言語を示す情報を、通信ネットワーク５０を介して受信する。音声変換サーバ２０Ａの制御部２００は、通信部２０２が受信した文字列データを言語解析部２０６へ出力する。
言語解析部２０６は、制御部２００により通信部２０２が受信した文字列データを取得すると、当該文字列データを検索キーとして、辞書記憶部２０５に記憶された辞書情報に含まれる文字列データを検索し、検索された文字列データに対応付けられた言語情報を取得する。制御部２００は、言語解析部２０６が取得した言語情報を、翻訳部２０８へ出力する。その後、ステップＳ２０８へ進む。 (Step S207) The communication unit 202 (second communication unit) of the voice conversion server 20A receives, via the communication network 50, the character string data and the information indicating the language selected by the user, both transmitted in step S206 from the communication unit 102 (first communication unit) of the voice reading device 10A. The control unit 200 of the voice conversion server 20A outputs the character string data received by the communication unit 202 to the language analysis unit 206.
When the language analysis unit 206 acquires, from the control unit 200, the character string data received by the communication unit 202, it uses that character string data as a search key to search the character string data included in the dictionary information stored in the dictionary storage unit 205, and acquires the language information associated with the matching character string data. The control unit 200 outputs the language information acquired by the language analysis unit 206 to the translation unit 208. The process then proceeds to step S208.

（ステップＳ２０８）音声変換サーバ２０Ａの翻訳部２０８は、制御部２００により言語解析部２０６が取得した母国語の言語情報を取得すると、当該母国語の言語情報を検索キーとして、外国語辞書記憶部２０７に記憶された外国語辞書情報に含まれる母国語の言語情報を検索し、検索された母国語の言語情報に対応付けられた外国語の言語情報を取得する。ここで、翻訳部２０８は、ステップＳ２０７において通信部２０２より受信された、ユーザによって選択された言語を示す情報が示す外国語の言語情報を外国語辞書記憶部２０７から取得する。制御部２００は、翻訳部２０８が取得した外国語の言語情報を、音声変換部へ出力する。その後、ステップＳ２０９へ進む。 (Step S208) When the translation unit 208 of the speech conversion server 20A acquires the language information of the native language acquired by the language analysis unit 206 by the control unit 200, the foreign language dictionary storage unit using the language information of the native language as a search key. The language information of the native language included in the foreign language dictionary information stored in 207 is searched, and the language information of the foreign language associated with the searched language information of the native language is acquired. Here, the translation unit 208 acquires from the foreign language dictionary storage unit 207 the foreign language language information indicated by the information indicating the language selected by the user, received from the communication unit 202 in step S207. The control unit 200 outputs the language information of the foreign language acquired by the translation unit 208 to the voice conversion unit. Thereafter, the process proceeds to step S209.

（ステップＳ２０９）音声変換サーバ２０Ａの音声変換部２０４は、制御部２００により翻訳部２０８が取得した外国語の言語情報を取得すると、当該外国語の言語情報（例えば、語句およびイントネーション等を示す情報）を用いて、外国語の文字列データから外国語の音声データへ変換する。
これにより、音声変換部２０４は、音声読み上げ装置１０Ａの通信部１０２（第１の通信部）から送信された母国語の文字列データが、言語解析部２０６によって言語解析され、さらに翻訳部２０８による翻訳によって外国語に変換された外国語の文字列データを、音声データに変換する。
音声変換サーバ２０Ａの制御部２００は、音声変換部２０４によって変換された音声データを、通信部２０２（第２の通信部）を介して、音声読み上げ装置１０Ａの通信部１０２（第１の通信部）へ送信する。その後、ステップＳ２１０へ進む。 (Step S209) When the voice conversion unit 204 of the voice conversion server 20A acquires, from the control unit 200, the foreign-language language information acquired by the translation unit 208, it uses that language information (for example, information indicating words and intonation) to convert the foreign-language character string data into foreign-language voice data.
In this way, the native-language character string data transmitted from the communication unit 102 (first communication unit) of the voice reading device 10A is linguistically analyzed by the language analysis unit 206 and translated into the foreign language by the translation unit 208, and the voice conversion unit 204 converts the resulting foreign-language character string data into voice data.
The control unit 200 of the voice conversion server 20A transmits the voice data produced by the voice conversion unit 204 to the communication unit 102 (first communication unit) of the voice reading device 10A via the communication unit 202 (second communication unit). The process then proceeds to step S210.

ステップＳ２１０の動作は、第１の実施形態に係る音声読み上げシステム１の動作において説明したステップＳ００７の動作と同様であるため、説明を省略する。
以上で本フローチャートの処理が終了する。 The operation in step S210 is the same as the operation in step S007 described in the operation of the speech reading system 1 according to the first embodiment, and thus the description thereof is omitted.
Thus, the process of this flowchart is completed.

＜第４の実施形態＞
以下、本発明の第４の実施形態による音声読み上げシステムについて図面を参照して説明する。
上述した第１乃至第３の実施形態による音声読み上げシステム１は、図１に示したように、通信ネットワーク５０によって通信接続された音声読み上げ装置１０Ａと音声変換サーバ２０（２０Ａ、２０Ｂ、２０Ｃ）とによって構成された。第１乃至第３の実施形態による音声読み上げシステム１では、音声読み上げ装置１０Ａはユーザからの操作入力を受け付け、音声による読み上げを行い、音声変換サーバは文字列データから音声データへの変換を行う構成であるが、本発明はこの構成に限られない。例えば、この第４の実施形態による音声読み上げシステム１の音声読み上げ装置１０Ｂのように、上述した音声変換サーバ２０（２０Ａ、２０Ｂ、２０Ｃ）が有する機能を、音声読み上げ装置１０Ｂ自体が有するような構成であってもよい。 <Fourth Embodiment>
Hereinafter, a speech reading system according to a fourth embodiment of the present invention will be described with reference to the drawings.
The voice reading system 1 according to the first to third embodiments described above is composed, as shown in FIG. 1, of a voice reading device 10A and a voice conversion server 20 (20A, 20B, 20C) connected to each other via a communication network 50. In these embodiments, the voice reading device 10A receives operation input from the user and reads text aloud, while the voice conversion server converts character string data into voice data; however, the present invention is not limited to this configuration. For example, as in the voice reading device 10B of the voice reading system 1 according to the fourth embodiment, the voice reading device 10B itself may have the functions of the voice conversion server 20 (20A, 20B, 20C) described above.

［音声読み上げ装置の構成］
図８は、この発明の第４の実施形態による音声読み上げシステムの音声読み上げ装置の機能構成を示すブロック図である。同図に示す音声読み上げ装置１０Ｂは、制御部１００と、記憶部１０１と、通信部１０２と、表示部１０３と、操作入力部１０４と、選択範囲検出部１０５と、音声再生部１０６と、音声情報記憶部２０３と、音声変換部２０４と、辞書記憶部２０５と、言語解析部２０６と、外国語辞書記憶部２０７と、翻訳部２０８と、を含んで構成される。なお、同図において図１、図４、および図６の各部に対応する部分には同一の符号を付け、その説明を省略する。 [Configuration of voice reading device]
FIG. 8 is a block diagram showing the functional configuration of the voice reading device of the voice reading system according to the fourth embodiment of the present invention. The voice reading device 10B shown in the figure includes a control unit 100, a storage unit 101, a communication unit 102, a display unit 103, an operation input unit 104, a selection range detection unit 105, an audio reproduction unit 106, a voice information storage unit 203, a voice conversion unit 204, a dictionary storage unit 205, a language analysis unit 206, a foreign language dictionary storage unit 207, and a translation unit 208. In the figure, parts corresponding to those in FIGS. 1, 4, and 6 are given the same reference numerals, and their description is omitted.
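One way to picture the difference between the server-based embodiments and the fourth embodiment is to treat conversion as an interchangeable function that the device calls. The sketch below is an illustration under that assumption, not the configuration of FIG. 8 itself; `Converter`, `make_local_converter`, and `ReadingDevice` are hypothetical names, and the byte strings stand in for real audio.

```python
from typing import Callable

# A converter turns selected character string data into voice data.
Converter = Callable[[str], bytes]


def make_local_converter(voice_info: dict) -> Converter:
    """On-device conversion, as in the fourth embodiment (exact-match lookup only)."""
    def convert(text: str) -> bytes:
        return voice_info.get(text, b"")
    return convert


class ReadingDevice:
    """Device logic that does not care whether conversion is local or remote."""

    def __init__(self, convert: Converter, play: Callable[[bytes], None]):
        self.convert = convert
        self.play = play

    def read_aloud(self, selected_text: str) -> bytes:
        audio = self.convert(selected_text)
        if audio:
            self.play(audio)  # hand the voice data to the audio output
        return audio


played = []  # stands in for the speaker in this sketch
device = ReadingDevice(make_local_converter({"a": b"\x01"}), played.append)
result = device.read_aloud("a")
```

A server-backed converter with the same signature could be passed in instead, so the selection and playback logic would stay unchanged across embodiments.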

以上、説明したように、本発明の各実施形態による音声読み上げシステム１は、ユーザによって選択された範囲の文字列の内容を認識させやすくすることができる。 As described above, the voice reading system 1 according to each embodiment of the present invention can make it easier for the user to recognize the content of the character string in the range selected by the user.

本発明の各実施形態による音声読み上げシステム１によれば、ユーザは、ディスプレイに表示された文書の中で、読みにくい文字列の部分を指でなぞることによって音声による読み上げを行わせることができる。スマートフォンの画面は一般に手のひらに収まる程度の大きさであり、スマートフォンの表示部に長い文書（例えば、Ｗｅｂページやアプリの説明表示等）を表示させた場合、表示部の横幅の範囲に文書の横幅全体が収まるように表示されている状態において、表示されている一文字当たりのサイズが、例えば、３ミリメートル程度以下の大きさで表示され、小さすぎて読みにくい部分が生ずる。しかし、本発明の各実施形態による音声読み上げシステム１によれば、指定された文字列に対応した音声を出力するようにしたので、ユーザは拡大表示をさせる操作等を行うことなく、ユーザはディスプレイに表示された文書の内容を認識しやすい。 According to the voice reading system 1 according to each embodiment of the present invention, the user can have a hard-to-read portion of a document shown on the display read aloud by tracing that portion with a finger. A smartphone screen is generally about the size of a palm. When a long document (for example, a web page or an app's description screen) is displayed on the smartphone so that its full width fits within the width of the display unit, each character may be displayed at a size of, for example, about 3 millimeters or less, leaving parts that are too small to read easily. With the voice reading system 1 according to each embodiment of the present invention, however, audio corresponding to the designated character string is output, so the user can readily grasp the content of the displayed document without having to perform an operation such as enlarging the display.

また、本発明の各実施形態による音声読み上げシステム１によれば、ユーザは、ディスプレイに表示された文書の中で、読みにくい文字列の部分を指でなぞることにより任意の範囲を指定して、当該範囲に含まれる文字列について音声による読み上げを行わせることができる。これにより、本発明の各実施形態による音声読み上げシステム１は、従来技術とは異なり、予め決められた文書や文字列についてのみ音声による読み上げを行うことができるというような制限がない。 Further, according to the speech reading system 1 according to each embodiment of the present invention, the user designates an arbitrary range by tracing a difficult-to-read character string portion with a finger in the document displayed on the display, The character string included in the range can be read out by voice. Thus, unlike the conventional technique, the speech reading system 1 according to each embodiment of the present invention is not limited such that only a predetermined document or character string can be read out by speech.

また、本発明の各実施形態による音声読み上げシステム１は、選択された文字列を言語解析し、さらに翻訳して音声による読み上げを行うことができる。これにより、例えば、銀行窓口等において、外国人の顧客が、表示されている文書の中で訳すことができない文字列の部分を指でなぞって選択し、選択した文字列を当該外国人の母国語に翻訳した音声にして読み上げさせることができる。 In addition, the voice reading system 1 according to each embodiment of the present invention can perform language analysis on the selected character string, translate it, and read it aloud. Thus, for example, at a bank counter, a foreign customer can trace with a finger the part of a displayed document that he or she cannot translate, and have the selected character string read aloud as speech translated into the customer's native language.

上述した実施形態における音声読み上げ装置１０（１０Ａ、１０Ｂ）および音声変換サーバ２０（２０Ａ、２０Ｂ、２０Ｃ）をコンピュータで実現するようにしてもよい。その場合、この機能を実現するためのプログラムをコンピュータ読み取り可能な記録媒体に記録して、この記録媒体に記録されたプログラムをコンピュータシステムに読み込ませ、実行することによって実現してもよい。なお、ここでいう「コンピュータシステム」とは、ＯＳや周辺機器等のハードウェアを含むものとする。また、「コンピュータ読み取り可能な記録媒体」とは、フレキシブルディスク、光磁気ディスク、ＲＯＭ、ＣＤ−ＲＯＭ等の可搬媒体、コンピュータシステムに内蔵されるハードディスク等の記憶装置のことをいう。さらに「コンピュータ読み取り可能な記録媒体」とは、インターネット等のネットワークや電話回線等の通信回線を介してプログラムを送信する場合の通信線のように、短時間の間、動的にプログラムを保持するもの、その場合のサーバやクライアントとなるコンピュータシステム内部の揮発性メモリのように、一定時間プログラムを保持しているものも含んでもよい。また上記プログラムは、前述した機能の一部を実現するためのものであってもよく、さらに前述した機能をコンピュータシステムにすでに記録されているプログラムとの組み合わせで実現できるものであってもよく、ＦＰＧＡ（Field Programmable Gate Array）等のプログラマブルロジックデバイスを用いて実現されるものであってもよい。 The voice reading device 10 (10A, 10B) and the voice conversion server 20 (20A, 20B, 20C) in the above-described embodiments may be realized by a computer. In that case, a program for realizing these functions may be recorded on a computer-readable recording medium, and the functions may be realized by having a computer system read and execute the program recorded on the recording medium. Here, the "computer system" includes an OS and hardware such as peripheral devices. The "computer-readable recording medium" refers to a portable medium such as a flexible disk, a magneto-optical disk, a ROM, or a CD-ROM, or to a storage device such as a hard disk built into a computer system. Furthermore, the "computer-readable recording medium" may also include a medium that holds the program dynamically for a short time, such as a communication line used when the program is transmitted over a network such as the Internet or over a communication line such as a telephone line, and a medium that holds the program for a certain period of time, such as volatile memory inside the computer system that serves as the server or client in that case. The program may realize only part of the functions described above, may realize those functions in combination with a program already recorded in the computer system, or may be realized using a programmable logic device such as an FPGA (Field Programmable Gate Array).

以上、この発明の実施形態について図面を参照して詳述してきたが、具体的な構成はこの実施形態に限られるものではなく、この発明の要旨を逸脱しない範囲の設計等も含まれる。 The embodiment of the present invention has been described in detail with reference to the drawings. However, the specific configuration is not limited to this embodiment, and includes designs and the like that do not depart from the gist of the present invention.

１・・・音声読み上げシステム、１０（１０Ａ、１０Ｂ）・・・音声読み上げ装置、２０（２０Ａ、２０Ｂ、２０Ｃ）・・・音声変換サーバ、５０・・・通信ネットワーク、１００・・・制御部、１０１・・・記憶部、１０２・・・通信部（第１の通信部）、１０３・・・表示部、１０４・・・操作入力部、１０５・・・選択範囲検出部、１０６・・・音声再生部、２００・・・制御部、２０１・・・記憶部、２０２・・・通信部（第２の通信部）、２０３・・・音声情報記憶部、２０４・・・音声変換部、２０５・・・辞書記憶部、２０６・・・言語解析部、２０７・・・外国語辞書記憶部、２０８・・・翻訳部、ｂｔ１・・・音声読み上げボタン、ｓａ１・・・文字列選択範囲、ｐｍ１・・・プルダウンメニュー DESCRIPTION OF SYMBOLS: 1 ... voice reading system, 10 (10A, 10B) ... voice reading device, 20 (20A, 20B, 20C) ... voice conversion server, 50 ... communication network, 100 ... control unit, 101 ... storage unit, 102 ... communication unit (first communication unit), 103 ... display unit, 104 ... operation input unit, 105 ... selection range detection unit, 106 ... audio reproduction unit, 200 ... control unit, 201 ... storage unit, 202 ... communication unit (second communication unit), 203 ... voice information storage unit, 204 ... voice conversion unit, 205 ... dictionary storage unit, 206 ... language analysis unit, 207 ... foreign language dictionary storage unit, 208 ... translation unit, bt1 ... voice reading button, sa1 ... character string selection range, pm1 ... pull-down menu

Claims

A voice reading device comprising:
a display unit that displays text information;
an operation input unit that receives an operation input for selecting part or all of a character string included in the text information displayed by the display unit; and
an audio reproduction unit that reproduces audio based on character string data indicating the character string selected by the operation input.

The voice reading device according to claim 1, wherein the audio reproduction unit reproduces audio based on character string data indicating a character string obtained by performing language analysis, using dictionary information, on the character string based on the character string data.

The voice reading device according to claim 1, wherein the audio reproduction unit reproduces audio based on character string data indicating a character string obtained by translating, using dictionary information, the character string based on the character string data into another language.

A voice reading system comprising a voice reading device and a voice conversion server, wherein
the voice reading device comprises:
a display unit that displays text information;
an operation input unit that receives an operation input for selecting part or all of a character string included in the text information displayed by the display unit;
a first communication unit that transmits character string data indicating the character string selected by the operation input to a second communication unit and receives voice data from the second communication unit; and
an audio reproduction unit that reproduces audio based on the voice data,
and
the voice conversion server comprises:
a voice conversion unit that converts the character string data transmitted from the first communication unit into the voice data; and
the second communication unit, which receives the character string data from the first communication unit and transmits the voice data converted by the voice conversion unit to the first communication unit.

A voice reading method performed by a computer, the method comprising:
a display step in which a display unit displays text information;
an operation input step in which an operation input unit receives an operation input for selecting part or all of a character string included in the text information displayed by the display unit; and
an audio reproduction step of reproducing audio based on character string data indicating the character string selected by the operation input.

A program for causing a computer to execute:
a display step of displaying text information;
an operation input step of accepting an operation input by a user who selects part or all of a character string included in the text information displayed in the display step; and
an audio reproduction step of reproducing audio based on character string data indicating the character string selected by the operation input.