JP2020201435A

JP2020201435A - Information processing terminal and information processing method

Info

Publication number: JP2020201435A
Application number: JP2019109747A
Authority: JP
Inventors: 格黒澤; Itaru Kurosawa
Original assignee: Pony Canyon Inc
Current assignee: Pony Canyon Inc
Priority date: 2019-06-12
Filing date: 2019-06-12
Publication date: 2020-12-17
Anticipated expiration: 2039-06-12
Also published as: JP6773844B1

Abstract

PROBLEM TO BE SOLVED: To realize an information processing terminal that can better provide a visually impaired person with the experience of grasping a document composed of multiple pages through reading aloud and enjoying the text. SOLUTION: An information processing terminal (1) includes a camera (11) that captures each page of a document composed of a plurality of pages; a controller (12) that executes text data generation processing and audio signal generation processing; and one or both of an output unit that outputs the audio signal and a speaker that converts the audio signal into a sound wave and outputs it. SELECTED DRAWING: Figure 1

Description

The present invention relates to an information processing terminal. The present invention also relates to a support on which an information processing terminal is placed, and to an information processing method.

Known forms of documents used by visually impaired people include Braille books, in which printed books are converted into Braille; audio-transcribed books, in which printed books are read aloud as audio; large-print books, in which the characters and figures of printed books are enlarged and arranged for easier viewing before publication; and electronic books (Patent Document 1).

"About reading for the visually impaired", [online], Social Welfare Corporation Japan Federation of the Blind, [retrieved May 27, 2019 (Reiwa 1)], Internet <URL: http://nichimou.org/morebooks/reading/>

However, Braille books, audio-transcribed books, and large-print books take time to produce, so they become available to visually impaired people late. Furthermore, for cost reasons, Braille books and audio-transcribed books are produced in a way that relies on volunteers. In addition, large-print books and electronic books cannot be used by totally blind people. Moreover, these formats target commercially published books; a wide range of printed matter, such as academic papers, public relations magazines issued by local governments, newspapers, newspaper inserts, and pamphlets, is not comprehensively covered. There is therefore a demand for a more suitable tool that reads text aloud for visually impaired people.

One aspect of the present invention has been made in view of the above problems, and its object is to realize an information processing terminal that can better provide visually impaired people with the experience of grasping a document composed of a plurality of pages through reading aloud and enjoying the text.

To solve the above problems, an information processing terminal according to one aspect of the present invention includes: a camera that captures each page of a document composed of a plurality of pages; a controller that executes text data generation processing, which refers to image data obtained by the camera and generates text data representing the text written on the page, and audio signal generation processing, which refers to the text data and generates an audio signal representing speech reading the text aloud; and one or both of an output unit that outputs the audio signal and a speaker that converts the audio signal into sound waves and outputs them.

To solve the above problems, a support according to one aspect of the present invention is a support that holds an information processing terminal so that a page of a book and the camera lens of the information processing terminal directly face each other, the support including: a top plate on which the information processing terminal is placed, the top plate being open in the portion that overlaps the camera lens; and a side plate whose upper end is connected to the outer edge of the top plate and whose lower end is pressed against the outer edge of the page.

To solve the above problems, an information processing method according to one aspect of the present invention is a method for outputting a document composed of a plurality of pages as speech using an information processing terminal that includes a camera, a controller, and one or both of an output unit and a speaker, the method including: an imaging process in which the camera captures each page of the document composed of a plurality of pages; a text data generation process in which the controller refers to image data obtained by the camera and generates text data representing the text written on the page; an audio signal generation process in which the controller refers to the text data and generates an audio signal representing speech reading the text aloud; and an audio output process in which the output unit outputs the audio signal and/or the speaker converts the audio signal into sound waves and outputs them.

According to one aspect of the present invention, it is possible to realize an information processing terminal that can better provide visually impaired people with the experience of grasping a document composed of a plurality of pages through reading aloud and enjoying the text.

A diagram outlining how an information processing terminal according to one embodiment of the present invention is used.
A block diagram showing the configuration of an information processing terminal according to one embodiment of the present invention.
A flowchart showing the flow of an information processing method carried out using the information processing terminal shown in FIG. 2.
A perspective view showing the configuration of a support for an information processing terminal according to one embodiment of the present invention.
A developed (unfolded) view showing the configuration of a support for an information processing terminal according to one embodiment of the present invention.

[Basic concept]
The basic concept of this embodiment is described with reference to FIG. 1. FIG. 1 is a perspective view outlining how the information processing terminal 1 and the support 2 according to this embodiment are used. An example of the information processing terminal 1 is a smartphone with a camera.

First, as shown at 1001 in FIG. 1, the user opens a page of a document composed of a plurality of pages, such as a book. Examples of such documents include printed books such as newly published books and paperbacks, academic papers, public relations magazines issued by local governments, newspapers, newspaper inserts, and pamphlets. Next, as shown at 1002 in FIG. 1, the user places the support 2 on the opened page. The lower end of the side plate 22 of the support 2 is thereby pressed against the outer edge of the opened page, which flattens the page into a state suitable for imaging. Then, as shown at 1003 in FIG. 1, the user places the information processing terminal 1 on the top plate 21 of the support 2. The camera lens of the information processing terminal 1 then directly faces the opened page through the opening 21a provided in the top plate 21, and the distance from the camera lens to the opened page falls within the depth of field of the camera lens. In this state, the information processing terminal 1 captures an image of the opened page and reads aloud the text written on the page with reference to the obtained image data. The user can thus listen to the text written on the opened page as speech.

[Information processing terminal]
The configuration of the information processing terminal 1 according to this embodiment is described with reference to FIG. 2. FIG. 2 is a block diagram showing the configuration of the information processing terminal 1. The following description takes, as an example, the case where the document composed of a plurality of pages is a book.

An application program for reading aloud the text written on the pages of a book (hereinafter, the "reading application") is installed on the information processing terminal 1. As hardware resources used to execute the reading process in accordance with the reading application, the information processing terminal 1 includes a camera 11, a controller 12, an output terminal 13 (output unit), a speaker 14, a memory 15, and a communication interface 16. The information processing terminal 1 can be implemented as, for example, a smartphone. The reading application may be stored in the memory 15 or in another recording medium.

The camera 11 is a component for capturing images of the pages of a book. As the camera 11, for example, a digital camera composed of a camera lens, an image sensor, and an image signal processing circuit is used. As the memory 15, for example, one or more semiconductor RAMs (random access memories) are used.

The controller 12 is a component for executing the reading process in accordance with the reading application. As the controller 12, for example, one or more CPUs (Central Processing Units) are used. The reading process includes at least a text data generation process and an audio signal generation process. Here, the text data generation process refers to a process that refers to the image data obtained by the camera 11 and generates text data representing the text written on a page of the book. The audio signal generation process refers to a process that refers to the text data generated by the text data generation process and generates an audio signal representing speech reading aloud the text written on the page of the book. The details of the reading process are described later with reference to a different drawing.

The output terminal 13 is a component for outputting the audio signal generated by the audio signal generation process to the outside as an electric signal. An example of the output destination of the audio signal is the earphone 3. The speaker 14 is a component for outputting the audio signal generated by the audio signal generation process to the outside as sound waves. The information processing terminal 1 may include only the output terminal 13, only the speaker 14, or both the output terminal 13 and the speaker 14. The information processing terminal 1 may also include a short-range wireless communication interface instead of the output terminal 13, which is a wired interface, or may include a wireless interface in addition to the output terminal 13. An example of the short-range wireless communication interface is a Bluetooth (registered trademark) interface. When the information processing terminal 1 includes a short-range wireless communication interface, audio can be output to earphones, headphones, speakers, or the like that support short-range wireless communication, instead of to the earphone 3 connected to the output terminal 13.

The communication interface 16 is an interface for communicating with a server. By communicating with the server via the communication interface 16, the controller 12 can execute part of the reading process in cooperation with the server. Specific examples of processes that the controller 12 executes in cooperation with the server are described later with reference to a different drawing.

[Reading process]
The flow of the reading process S1 that the information processing terminal 1 executes in accordance with the reading application is described with reference to FIG. 3. FIG. 3 is a flowchart showing the flow of the reading process S1 (an information processing method for outputting a document composed of a plurality of pages as speech).

As shown in FIG. 3, the reading process S1 includes an imaging process S10, an inverted image determination process S11, an inverted image notification process S12, a cover determination process S13, a text data generation process S14, an audio signal generation process S15, an audio output process S16, a vertical/horizontal determination process S17, a text data generation process S18, a sparse-text page determination process S19, a sparse-text page notification process S20, an audio signal generation process S21, and an audio output process S22.
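As a reading aid, the branching among the steps of the reading process S1 can be sketched as follows. This is only an illustrative sketch: the function name, its parameters, and the default threshold are hypothetical. The assumption that S21 and S22 are skipped after the notification S20, and that the process ends after S12, is an interpretation of the description rather than something the text states.

```python
def reading_process_steps(is_inverted, is_cover, char_count, min_chars=20):
    """Return the step labels visited for one captured page (illustrative only)."""
    steps = ["S10", "S11"]              # imaging, inverted image determination
    if is_inverted:
        steps.append("S12")             # notify the user that the book is upside down
        return steps                    # assumption: the user re-captures the page
    steps.append("S13")                 # cover determination
    if is_cover:
        steps += ["S14", "S15", "S16"]  # title/author text, audio signal, audio output
        return steps
    steps += ["S17", "S18", "S19"]      # writing direction, body text, character count check
    if char_count < min_chars:
        steps.append("S20")             # assumption: body reading is skipped after this notice
    else:
        steps += ["S21", "S22"]         # audio signal, audio output
    return steps
```

For example, `reading_process_steps(False, True, 0)` follows the cover branch and returns the labels S10, S11, S13, S14, S15, S16.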

The imaging process S10 is a process of capturing an image of a page of the book. The imaging process S10 is executed by the camera 11 under the control of the controller 12. The imaging process S10 may be executed at the time the user instructs image capture by a voice command, or at the time the camera 11 succeeds in focusing on the page of the book.

The inverted image determination process S11 is a process of determining whether the image represented by the image data obtained in the imaging process S10 is an inverted image (or an upright image). Here, an inverted image refers to an image in which the top of the page is located at the bottom of the screen and the bottom of the page is located at the top of the screen. An upright image refers to an image in which the top of the page is located at the top of the screen and the bottom of the page is located at the bottom of the screen. The inverted image determination process S11 is executed by the controller 12.

If the inverted image determination process S11 determines that the image is an inverted image, the inverted image notification process S12 is executed. The inverted image notification process S12 is a process of notifying the user that the image data obtained in the imaging process S10 represents an inverted image. To execute the inverted image notification process S12, the controller 12 outputs, for example, an audio signal saying "The book is upside down" to the output terminal 13 or the speaker 14. When the controller 12 outputs the audio signal to the output terminal 13, the output terminal 13 outputs the audio signal to the earphone 3, and the earphone 3 converts the audio signal into sound waves and outputs them. When the controller 12 outputs the audio signal to the speaker 14, the speaker 14 converts the audio signal into sound waves and outputs them. This prompts the user to rotate the book by 180° into the correct orientation.

Alternatively, when the inverted image determination process S11 determines that the image is an inverted image, the controller 12 may rotate the inverted image by 180° to obtain an upright image and then proceed to the cover determination process S13 described later.
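The 180° rotation mentioned here can be illustrated with a minimal sketch. It assumes, purely for illustration, that the image is held as a list of pixel rows; the function name `rotate_180` is hypothetical and the description does not specify how the rotation is implemented.

```python
def rotate_180(image):
    """Rotate an image, given as a list of pixel rows, by 180 degrees.

    Reversing the order of the rows flips the image vertically, and
    reversing each row flips it horizontally; together these two flips
    are equivalent to a 180-degree rotation.
    """
    return [row[::-1] for row in image[::-1]]
```

Applying the function twice returns the original image, which is a quick sanity check for any rotation routine of this kind.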

If the inverted image determination process S11 determines that the image is an upright image, the cover determination process S13 is executed. The cover determination process S13 is a process of determining whether the page included as the subject in the image represented by the image data obtained in the imaging process S10 is the cover (or a page other than the cover). The cover determination process S13 is executed by the controller 12 of the information processing terminal 1.

If the cover determination process S13 determines that the page is the cover, the text data generation process S14 is executed. The text data generation process S14 is a process that refers to the image data obtained in the imaging process S10 and generates text data representing the text written on the page. The text data generation process S14 is executed by the controller 12 of the information processing terminal 1. In particular, in the text data generation process S14 executed when the page is the cover of a book, the controller 12 generates, as the text data, title data representing the title written on the cover and author name data representing the author name written on the cover. The title data and the author name data are generated by, for example, one of the following methods.

Method 1: Among the character strings written on the cover, the character string written in the largest font is identified as the title, and the character string written in the second-largest font is identified as the author name. The text data representing the identified title and author name are used as the title data and the author name data.
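Method 1 can be sketched as a ranking of recognized character strings by font size. The sketch below assumes, hypothetically, that an earlier recognition step has already produced each character string together with a measured font height; the `TextLine` type and the function name are illustrative and not part of the description.

```python
from typing import List, NamedTuple, Optional, Tuple


class TextLine(NamedTuple):
    text: str           # recognized character string
    font_height: float  # hypothetical measured font size, e.g. in pixels


def title_and_author(lines: List[TextLine]) -> Tuple[Optional[str], Optional[str]]:
    """Pick the largest-font string as the title and the second largest as the author."""
    ranked = sorted(lines, key=lambda line: line.font_height, reverse=True)
    title = ranked[0].text if len(ranked) >= 1 else None
    author = ranked[1].text if len(ranked) >= 2 else None
    return title, author
```

A cover with fewer than two recognized strings yields `None` for the missing item, so the caller can decide whether to fall back to another method.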

Method 2: A classifier (AI) is used that has been trained by machine learning to take image data representing the cover of a book as input and to output the title and author name of that book. In this case, the text data representing the title and author name that the classifier outputs when the image data obtained in the imaging process S10 is input to it are used as the title data and the author name data. The classifier may be implemented in the information processing terminal 1 or in a server that can communicate with the information processing terminal 1 via the communication interface 16. In the latter case, the information processing terminal 1 uses the classifier implemented in the server to identify the title and author name written on the cover.

Method 3: A title database in which the titles of various books are registered and an author name database in which the author names of various books are registered are used. In this case, among the character strings written on the cover, a character string that matches a title registered in the title database is identified as the title, and a character string that matches an author name registered in the author name database is identified as the author name. The text data representing the identified title and author name are used as the title data and the author name data. These databases may be implemented in the information processing terminal 1 or in a server that can communicate with the information processing terminal 1 via the communication interface 16. In the latter case, the information processing terminal 1 uses the databases implemented in the server to identify the title and author name written on the cover.

It is preferable that the author name database registers, together with each author name written in kanji, the same author name written in hiragana or katakana. For example, it is preferable that the hiragana author name "こさわりょうた" be registered together with the kanji author name "古沢良太". In this case, if the author name data is generated from the hiragana author name, the audio signal generation process S15 described later can generate audio data in which the author name is read aloud correctly. The same applies to the title database.
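The database lookup of Method 3, combined with the registered kana reading, can be sketched as follows. The in-memory dictionary stands in for the author name database purely for illustration; its structure and the function name are assumptions, and the single entry is the example given in the description.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical stand-in for the author name database:
# kanji form -> hiragana reading.
AUTHOR_DB: Dict[str, str] = {
    "古沢良太": "こさわりょうた",
}


def find_registered(cover_strings: List[str],
                    database: Dict[str, str]) -> Optional[Tuple[str, str]]:
    """Return (written form, kana reading) of the first cover string found in the database."""
    for candidate in cover_strings:
        if candidate in database:
            return candidate, database[candidate]
    return None
```

Passing the kana reading, rather than the kanji form, to the speech synthesis step is what lets the name be read aloud as registered. A title database could be queried with the same function.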

The controller 12 may refer to the image data obtained in the imaging process S10 and execute the text data generation process S14 after the page has left the imaging target area of the camera 11.

The audio signal generation process S15 is a process that refers to the text data generated in the text data generation process S14 and generates an audio signal representing speech reading aloud the text written on the page. The audio signal generation process S15 is executed by the controller 12 of the information processing terminal 1 after the text data generation process S14 has been executed. In particular, in the audio signal generation process S15 executed when the page is the cover, the controller 12 generates, as the audio signal, an audio signal in which speech reading the author name is added before or after speech reading the title.

In the audio signal generation process S15, the controller 12 may refer to the image data obtained in the imaging process S10 and generate the audio signal after the page has left the imaging target area of the camera 11. Also in the audio signal generation process S15, the controller 12 may generate, as the audio signal, an audio signal representing speech that reads aloud the text written on the page at a reading speed specified by the user.

The audio output process S16 is one or both of a process of outputting the audio signal generated in the audio signal generation process S15 from the output terminal 13 as an electric signal and a process of outputting the audio signal generated in the audio signal generation process S15 from the speaker 14 as sound waves. The audio output process S16 is carried out by the controller 12.

If the cover determination process S13 determines that the page is not the cover, the vertical/horizontal determination process S17 is executed. The vertical/horizontal determination process S17 is a process of determining whether the text written on the page is written vertically or horizontally. The vertical/horizontal determination process S17 is executed by the controller 12.

The text data generation process S18 is a process that refers to the result of the vertical/horizontal determination process S17 and the image data obtained in the imaging process S10 and generates text data representing the text written on the page. The text data generation process S18 is executed by the controller 12 of the information processing terminal 1. In particular, in the text data generation process S18 executed when the page is not the cover, the controller 12 generates, as the text data, page number data representing the page number written on the page and body text data representing the body text written on the page.

In the text data generation process S18, the controller 12 may generate, as the body text data, data representing the body text of the current page modified as follows: (1) if the body text of the previous page ends with an incomplete word, that incomplete word is added to the beginning of the body text of the current page; and (2) if the body text of the current page ends with an incomplete word, that incomplete word is deleted from the end of the body text of the current page. For example, if "ラー" appears at the end of the body text of the previous page and "メン" at the beginning of the body text of the current page, that is, if the word "ラーメン" (ramen) is split across two pages, the controller 12 generates body text data in which "ラー" is added to the beginning of the body text of the current page. Likewise, if "チャー" appears at the end of the body text of the current page and "ハン" at the beginning of the body text of the next page, that is, if the word "チャーハン" (fried rice) is split across two pages, the controller 12 generates body text data in which "チャー" is deleted from the end of the body text of the current page. The algorithm used in the text data generation process S18 to determine whether the body text ends with an incomplete word is not particularly limited, and a known algorithm can be used.
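The handling of a word split across two pages can be sketched as a small string operation. Because the description leaves the detection of an incomplete word to a known algorithm, the sketch below simply takes the already-detected fragments as inputs; the function and parameter names are hypothetical.

```python
def body_for_reading(page_body: str,
                     prev_fragment: str = "",
                     own_fragment: str = "") -> str:
    """Build the body text to be read aloud for the current page.

    prev_fragment: incomplete word left over at the end of the previous page
                   (empty if there is none); it is prepended to this page.
    own_fragment:  incomplete word at the end of this page (empty if there
                   is none); it is removed here so that it can be read
                   together with the next page.
    """
    if own_fragment and page_body.endswith(own_fragment):
        page_body = page_body[: -len(own_fragment)]
    return prev_fragment + page_body
```

With the examples from the description, a page whose body is "メンを食べた。そしてチャー", preceded by a page ending in "ラー", is read as "ラーメンを食べた。そして", and "チャー" is carried over to the next page.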

In the text data generation process S18, the controller 12 may also identify headings included in the text data. One way to identify a heading is, for example, to treat as a heading any character string in the text data that is set in bold or that has blank space around it.
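The heading rule described here (bold text, or text surrounded by blank space) can be sketched as a simple filter. The `OcrLine` structure, its fields, and the `min_gap` threshold are all hypothetical; the description does not say how boldness or surrounding space would be measured.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class OcrLine:
    text: str
    bold: bool = False
    gap_above: float = 0.0  # hypothetical blank space above, in line heights
    gap_below: float = 0.0  # hypothetical blank space below, in line heights


def find_headings(lines: List[OcrLine], min_gap: float = 1.0) -> List[str]:
    """Return the text of lines that are bold or have blank space both above and below."""
    return [
        line.text
        for line in lines
        if line.bold or (line.gap_above >= min_gap and line.gap_below >= min_gap)
    ]
```

Requiring space on both sides is one possible reading of "blank space around it"; a looser rule would accept space on either side.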

The sparse-text page determination process S19 is a process of determining whether the number of characters of the text written on the page is below a predetermined threshold. The sparse-text page determination process S19 is executed by the controller 12.

If the sparse-text page determination process determines that the number of characters is below the threshold, the sparse-text page notification process S20 is executed. The sparse-text page notification process S20 is a process of notifying the user that the page contains little text. To execute the sparse-text page notification process S20, the controller 12 outputs, for example, an audio signal saying "The entire page is a figure" to the output terminal 13 or the speaker 14. When the controller 12 outputs the audio signal to the output terminal 13, the output terminal 13 outputs the audio signal to the earphone 3, and the earphone 3 converts the audio signal into sound waves and outputs them. When the controller 12 outputs the audio signal to the speaker 14, the speaker 14 converts the audio signal into sound waves and outputs them. The user can thereby recognize that the page contains a figure, which prompts the user to move on to capturing the next page.
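The determination in S19 and the notification in S20 can be illustrated with a short sketch. The threshold value of 20, the choice to ignore whitespace when counting, and the function names are assumptions made for this example; the description only requires a predetermined threshold.

```python
from typing import Optional

DEFAULT_THRESHOLD = 20  # assumed value; the description only says "predetermined"


def count_characters(text: str) -> int:
    """Count the characters recognized on the page, ignoring whitespace."""
    return sum(1 for ch in text if not ch.isspace())


def is_sparse_page(text: str, threshold: int = DEFAULT_THRESHOLD) -> bool:
    """S19: True if the page has fewer characters than the threshold."""
    return count_characters(text) < threshold


def sparse_page_notice(text: str, threshold: int = DEFAULT_THRESHOLD) -> Optional[str]:
    """S20: message to speak for a sparse page, or None for a normal page."""
    if is_sparse_page(text, threshold):
        return "The entire page is a figure."
    return None
```

Returning `None` for a normal page lets the caller proceed directly to the audio signal generation process.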

The audio signal generation process S21 is a process of generating, with reference to the text data generated in the text data generation process S18, an audio signal representing speech that reads the text written on the page aloud. The audio signal generation process S21 is executed by the controller 12. In particular, in the audio signal generation process S21 executed when the page is not the cover, the controller 12 generates, as the audio signal, an audio signal in which speech reading the body aloud follows speech reading the page number aloud.

In the audio signal generation process S21, the controller 12 may generate, as the audio signal, an audio signal representing speech that reads the text aloud in such a way that headings can be told apart by ear. For example, the text "... it is. <line break> (3) Departure <line break> At last, I made up my mind. Today is the day ..." may be read as "it is (pause) heading (pause) three (short pause) departure (pause) at last I made up my mind (short pause) today is the day ...". As another example, the heading and the remaining text may be read in different voices (for example, a male voice for headings and a female voice for everything else).
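One way to picture the pause-based reading is as a script of utterances and silences that a speech synthesizer would then render. The sketch below is only an illustration under stated assumptions: the pause length, the spoken cue "Heading", and the block representation are hypothetical, and no particular text-to-speech engine is implied.

```python
from typing import List, Tuple, Union

LONG_PAUSE = 0.8  # assumed pause length in seconds

ScriptItem = Union[str, float]  # a string is spoken, a float is a silence


def reading_script(blocks: List[Tuple[str, str]]) -> List[ScriptItem]:
    """Turn (kind, text) blocks into utterances and pauses.

    kind is "heading" or "body". A heading is set off by pauses and
    announced with the spoken cue "Heading" so it can be told apart by ear.
    """
    script: List[ScriptItem] = []
    for kind, text in blocks:
        if kind == "heading":
            script += [LONG_PAUSE, "Heading", LONG_PAUSE, text, LONG_PAUSE]
        else:
            script.append(text)
    return script
```

The alternative mentioned in the description, reading headings in a different voice, would fit the same structure by attaching a voice name to each utterance instead of inserting pauses.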

The audio output process S22 is one or both of a process of outputting the audio signal generated in the audio signal generation process S21 from the output terminal 13 as an electrical signal, and a process of outputting the audio signal generated in the audio signal generation process S21 from the speaker 14 as sound waves. The audio output process S22 is executed by the controller 12.

In the audio output process S22, the controller 12 stops outputting the audio signal when it receives from the user a voice command instructing it to stop reading, and resumes outputting the audio signal when it receives from the user a voice command instructing it to resume reading. For example, the controller 12 stops outputting the audio signal when it receives the voice command "stop" from the user, and resumes outputting the audio signal when it receives the voice command "resume". Alternatively, the controller 12 may stop outputting the audio signal when the user taps the screen of the information processing terminal 1, and resume outputting the audio signal when the user taps the screen again. The tap may be on a specific GUI element (such as a button) provided on the screen, or anywhere on the screen.
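The stop and resume behaviour amounts to a two-state toggle. The following is a minimal sketch with hypothetical class and method names; speech recognition and touch handling are assumed to happen elsewhere and to deliver the recognized command string or the tap event.

```python
class PlaybackControl:
    """Tracks whether the audio signal is currently being output."""

    def __init__(self) -> None:
        self.playing = True

    def on_voice_command(self, command: str) -> None:
        """Handle an already-recognized voice command; unknown commands are ignored."""
        if command == "stop":
            self.playing = False
        elif command == "resume":
            self.playing = True

    def on_tap(self) -> None:
        """A tap stops output if it is running and resumes it if it is stopped."""
        self.playing = not self.playing
```

Voice commands are idempotent (saying "stop" twice leaves playback stopped), whereas taps alternate between the two states, matching the two control styles described above.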

As described above, the reading process S1 according to the present embodiment includes the text data generation processes S14 and S18, which generate, with reference to the image data obtained by the camera 11, text data representing the text written on the page, and the audio signal generation processes S15 and S21, which generate, with reference to the text data, an audio signal representing speech reading the text aloud. The reading process S1 according to the present embodiment can therefore better provide visually impaired persons with the experience of understanding a multi-page document through reading aloud and of enjoying the text. Specifically, reading is possible regardless of the degree of visual impairment: for persons with severe visual impairment, persons with mild visual impairment, elderly persons with declining eyesight, and sighted persons alike. The cost is also lower than that of Braille books, recorded (audio-transcribed) books, and audiobooks. Furthermore, because printed books can be used as they are, a wide variety of books can be read immediately. The information processing terminal 1 according to the present embodiment is thus designed to be thoroughly easy to use for visually impaired persons and other users.

In addition to the processes described above, the reading process S1 may include a maximum page number notification process. The maximum page number notification process is a process of announcing the largest of the page numbers represented by the page number data stored in the memory 15. The maximum page number notification process is executed by the controller 12 of the information processing terminal 1.

To execute the maximum page number notification process, the controller 12 outputs, for example, an audio signal saying "You have read up to page 30" to the output terminal 13 or the speaker 14. When the controller 12 outputs the audio signal to the output terminal 13, the output terminal 13 outputs the audio signal to the earphone 3, and the earphone 3 converts the audio signal into sound waves and outputs them. When the controller 12 outputs the audio signal to the speaker 14, the speaker 14 converts the audio signal into sound waves and outputs them. This lets the user know how far they have read.
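As a rough sketch, the notification reduces to taking the maximum of the stored page numbers. The function name, the use of plain integers for the page number data, and the English message wording are assumptions made for illustration.

```python
from typing import Iterable, Optional


def max_page_notice(page_numbers: Iterable[int]) -> Optional[str]:
    """Return the message to speak, or None if no page number has been stored yet."""
    pages = list(page_numbers)
    if not pages:
        return None
    return f"You have read up to page {max(pages)}."
```

Using the maximum rather than the most recently stored number means the message stays correct even if the user goes back and re-reads an earlier page.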

[Structure of the support]
The configuration of the support 2 according to the present embodiment will be described with reference to FIGS. 4 and 5. FIG. 4 is a perspective view showing the configuration of the support 2. FIG. 5 is a developed view showing the configuration of the support 2. How the support 2 is used is as described above.

The support 2 is a stand that supports the information processing terminal 1 so that the page of the book and the camera lens of the information processing terminal 1 directly face each other. The support 2 includes one top plate 21 and four side plates 22.

The top plate 21 is a plate-shaped member that forms the upper surface of the support 2. In the present embodiment, the top plate 21 is rectangular in plan view. The information processing terminal 1 is placed on the top plate 21. The top plate 21 has an opening 21a that leaves the portion overlapping the camera lens of the information processing terminal 1 uncovered. The shapes and sizes of the top plate 21 and the opening 21a are optimized for the type of information processing terminal 1 to be used. For example, the shape and size of the top plate 21 are set according to the shape and size of the back surface of the information processing terminal 1, and the shape and size of the opening 21a are set according to the position of the camera lens on the back surface of the information processing terminal 1.

Each side plate 22 is a plate-shaped member that forms a side surface of the support 2, and the upper end of the side plate 22 is connected to the outer edge of the top plate 21. In the present embodiment, the side plate 22 is trapezoidal in plan view. The lower end of the side plate 22 is pressed against the outer edge of the page of the book. The shape and size of the side plate 22 are optimized for the type of information processing terminal 1 and the type of book to be used. For example, they are set so that the distance from the top plate 21 to the lower end of the side plate 22, measured along the direction orthogonal to the top plate 21, falls within the depth of field of the camera lens of the target information processing terminal 1, and so that the lower end of the side plate 22 is pressed against the outer edge of the page of the target book.
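The depth-of-field condition can be checked numerically once the stand height is known. The sketch below rests on assumptions that are not in the description: the height is derived from the slant length of a side plate and the angle it forms with the top plate (180° meaning fully flattened), and the near and far limits of the depth of field are taken as given values for the camera in question.

```python
import math


def stand_height(slant_length_mm: float, fold_angle_deg: float) -> float:
    """Distance from the top plate to the lower end of a side plate,
    measured orthogonally to the top plate.

    Assumed geometry: a plate of the given slant length forming
    fold_angle_deg with the top plate (180 = flat, 90 = vertical).
    """
    return slant_length_mm * math.sin(math.radians(fold_angle_deg))


def within_depth_of_field(height_mm: float, near_mm: float, far_mm: float) -> bool:
    """True if a page at this distance from the lens plane can be in focus."""
    return near_mm <= height_mm <= far_mm
```

For instance, a 200 mm side plate folded to 90° gives a height of 200 mm, which passes the check for an assumed depth of field of 100 mm to 300 mm.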

In the present embodiment, each side plate 22 is connected to the outer edge of the top plate 21 so that it can be unfolded. When each side plate 22 is unfolded so that the angle between the side plate 22 and the top plate 21 is 180°, the support 2 becomes flat, as shown in FIG. 5. Conversely, when each side plate 22 is folded so that the angle between the side plate 22 and the top plate 21 is a predetermined angle (less than 180°), the support 2 takes on its three-dimensional form, as shown in FIG. 4. When the support 2 is sold or stored, it is preferable to unfold and flatten it to improve portability.

One or both of the top plate 21 and the side plates 22 are preferably made of a translucent material. External light transmitted through the top plate 21 or the side plates 22 can then be used to increase the illuminance of the page being imaged. When the user uses an information processing terminal 1 that has a light source on the same surface as the camera lens, that light source may be used to secure the illuminance of the page being imaged.

It is also preferable that one or both of the top plate 21 and the side plates 22 are provided with one or both of a structure 22a that indicates, in a form perceptible by touch, the type of book for which the support 2 is suited, and a structure 22b that indicates, in a form perceptible by touch, the type of information processing terminal for which the support 2 is suited. Even a visually impaired user can then tell by touch whether the support 2 is suited to the information processing terminal 1 and/or the book in question. The structures 22a and 22b can be realized, for example, as raised and recessed features formed on the surface of the top plate 21 or the side plates 22.

[Example of realization by software]
The controller 12 of the information processing terminal 1 may be realized by a logic circuit (hardware) formed on an integrated circuit (IC chip) or the like, or may be realized by software.

In the latter case, the information processing terminal 1 includes a computer that executes the instructions of a program, which is software that realizes each function. The computer includes, for example, at least one processor (control device) and at least one computer-readable recording medium that stores the program. The object of the present invention is achieved when the processor of the computer reads the program from the recording medium and executes it. A CPU (Central Processing Unit), for example, can be used as the processor. As the recording medium, a "non-transitory tangible medium" can be used, such as a ROM (Read Only Memory), a tape, a disk, a card, a semiconductor memory, or a programmable logic circuit. The computer may further include a RAM (Random Access Memory) or the like into which the program is loaded. The program may also be supplied to the computer via any transmission medium capable of transmitting it (such as a communication network or broadcast waves). One aspect of the present invention can also be realized in the form of a data signal embedded in a carrier wave, in which the program is embodied by electronic transmission.

[Summary]
An information processing terminal according to aspect 1 of the present invention includes: a camera that captures each page of a document composed of a plurality of pages; a controller that executes a text data generation process of generating, with reference to image data obtained by the camera, text data representing the text written on the page, and an audio signal generation process of generating, with reference to the text data, an audio signal representing speech reading the text aloud; and one or both of an output unit that outputs the audio signal and a speaker that converts the audio signal into sound waves and outputs them.

In the information processing terminal according to aspect 2 of the present invention, in aspect 1 above, the controller may, with reference to the image data, execute the text data generation process after the page has left the imaging target area of the camera, or execute the audio signal generation process after the page has left the imaging target area of the camera.

In the information processing terminal according to aspect 3 of the present invention, in aspect 1 or 2 above, in the audio signal generation process, the controller may generate, as the audio signal, an audio signal representing speech that reads the text written on the page aloud at a reading speed specified by the user.

In the information processing terminal according to aspect 4 of the present invention, in any one of aspects 1 to 3 above, in the text data generation process, the controller may generate, as the text data, page number data representing the page number written on the page and body text data representing the body written on the page, and in the audio signal generation process, the controller may generate, as the audio signal, an audio signal in which speech reading the body aloud follows speech reading the page number aloud.

In the information processing terminal according to aspect 5 of the present invention, in any one of aspects 1 to 4 above, in the text data generation process, the controller may generate, as the body text data, data representing the body in which (1) if the body of the previous page ends with an incomplete word, that incomplete word is prepended to the body of the current page, and (2) if the body of the current page ends with an incomplete word, that incomplete word is removed from the end of the body of the current page.

In the information processing terminal according to aspect 6 of the present invention, in any one of aspects 1 to 5 above, in the text data generation process, the controller may identify a heading included in the text data, and in the audio signal generation process, the controller may generate, as the audio signal, an audio signal representing speech that reads the text aloud in such a way that the heading can be told apart by ear.

In the information processing terminal according to aspect 7 of the present invention, in any one of aspects 1 to 6 above, when the page is the cover of the document composed of a plurality of pages, in the text data generation process, the controller may generate, as the text data, title data representing the title written on the cover and author name data representing the author name written on the cover, and in the audio signal generation process, the controller may generate, as the audio signal, an audio signal in which speech reading the author name aloud is placed before or after speech reading the title aloud.

In the information processing terminal according to aspect 8 of the present invention, in aspect 7 above, in the text data generation process, the controller may identify, among the character strings written on the cover, the character string set in the largest font as the title and the character string set in the second-largest font as the author name.
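The font-size heuristic of aspect 8 can be sketched as a simple ranking. The input format (a list of recognized strings paired with an estimated font size) and the function name are assumptions; how font sizes are estimated from the cover image is outside the scope of this example.

```python
from typing import List, Optional, Tuple


def pick_title_and_author(
    strings: List[Tuple[str, float]]
) -> Tuple[Optional[str], Optional[str]]:
    """Given (text, font_size) pairs from the cover, return (title, author).

    The largest string is taken as the title and the second largest as the
    author name; either is None if the cover has too few strings.
    """
    ranked = sorted(strings, key=lambda item: item[1], reverse=True)
    title = ranked[0][0] if ranked else None
    author = ranked[1][0] if len(ranked) > 1 else None
    return title, author
```

The sample values below (a publisher line, a title, and an author name) are made up for illustration. The heuristic can misfire on covers where, say, a series name is set larger than the author name, which is presumably why aspects 9 and 10 offer a classifier and an author database as alternatives.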

The information processing terminal according to aspect 9 of the present invention, in aspect 7 above, may further include a classifier that takes as input image data representing the cover of a document composed of a plurality of pages and outputs text data representing the title and author name of that document, or a communication interface for communicating with a server that includes the classifier, and in the text data generation process, the controller may input the image data obtained by the camera to the classifier and generate the title data and the author name data from the text data output by the classifier.

The information processing terminal according to aspect 10 of the present invention, in aspect 7 above, may further include a database in which a plurality of author names are registered in advance, or a communication interface for communicating with a server that includes the database, and in the text data generation process, the controller may use as the author name data the character string, among those written on the cover, that is registered in the database as an author name.

In the information processing terminal according to aspect 11 of the present invention, in any one of aspects 1 to 10 above, the controller may further execute an upside-down image determination process of determining whether the image represented by the image data obtained by the camera is an upside-down image of the page, and an upside-down image notification process of notifying the user when the upside-down image determination process determines that the image is an upside-down image.

In the information processing terminal according to aspect 12 of the present invention, in any one of aspects 1 to 11 above, the controller may further execute a sparse-text page determination process of determining whether the number of characters of the text written on the page is below a predetermined threshold, and a sparse-text page notification process of notifying the user when the sparse-text page determination process determines that the number of characters is below the threshold.

The information processing terminal according to aspect 13 of the present invention, in any one of aspects 1 to 12 above, may further include a memory that stores page number data representing the page number written on the page, and the controller may further execute a maximum page number notification process of announcing the largest of the page numbers represented by the page number data stored in the memory.

In the information processing terminal according to aspect 14 of the present invention, in any one of aspects 1 to 13 above, the controller may further execute a text orientation determination process of determining whether the text written on the page is vertical or horizontal text, and may execute the text data generation process with reference to the result of the text orientation determination process.

In the information processing terminal according to aspect 15 of the present invention, in any one of aspects 1 to 14 above, the controller may stop outputting the audio signal when it receives from the user a voice command instructing it to stop reading, and resume outputting the audio signal when it receives from the user a voice command instructing it to resume reading.

A support according to aspect 16 of the present invention is a support that supports an information processing terminal so that each page of a document composed of a plurality of pages and the camera lens of the information processing terminal directly face each other, the support including: a top plate on which the information processing terminal is placed and in which the portion overlapping the camera lens is open; and a side plate whose upper end is connected to the outer edge of the top plate and whose lower end is pressed against the outer edge of the page.

In the support according to aspect 17 of the present invention, in aspect 16 above, one or both of the top plate and the side plate may be translucent.

In the support according to aspect 18 of the present invention, in aspect 16 or 17 above, the side plate may be connected to the top plate so that it can be unfolded.

In the support according to aspect 19 of the present invention, in any one of aspects 16 to 18 above, the distance from the top plate to the lower end of the side plate, measured along the direction orthogonal to the top plate, may fall within the depth of field of the camera lens.

In the support for an information processing terminal according to aspect 20 of the present invention, in any one of aspects 16 to 19 above, one or both of the top plate and the side plate may be provided with one or both of a structure that indicates, in a form perceptible by touch, the type of multi-page document for which the support is suited, and a structure that indicates, in a form perceptible by touch, the type of information processing terminal for which the support is suited.

An information processing method according to aspect 21 of the present invention is a method of outputting a document composed of a plurality of pages as audio using an information processing terminal that includes a camera, a controller, and one or both of an output unit and a speaker, and may include: an imaging process in which the camera captures each page of the document composed of a plurality of pages; a text data generation process in which the controller generates, with reference to the image data obtained by the camera, text data representing the text written on the page; an audio signal generation process in which the controller generates, with reference to the text data, an audio signal representing speech reading the text aloud; and an audio output process in which the output unit outputs the audio signal and/or the speaker converts the audio signal into sound waves and outputs them.

[Additional notes]
The present invention is not limited to the embodiments described above, and various modifications are possible within the scope of the claims. Embodiments obtained by appropriately combining technical means disclosed in different embodiments are also included in the technical scope of the present invention.

1 Information processing terminal
2 Support
3 Earphone
11 Camera
12 Controller
13 Output terminal (output unit)
14 Speaker
15 Memory
21 Top plate
22 Side plate

Claims

An information processing terminal comprising:
a camera that captures each page of a document composed of a plurality of pages;
a controller that executes a text data generation process of generating, with reference to image data obtained by the camera, text data representing the text written on the page, and an audio signal generation process of generating, with reference to the text data, an audio signal representing speech reading the text aloud; and
one or both of an output unit that outputs the audio signal and a speaker that converts the audio signal into sound waves and outputs them.

The information processing terminal according to claim 1, wherein
the controller, with reference to the image data, executes the text data generation process after the page has left the imaging target area of the camera, or
the controller, with reference to the image data, executes the audio signal generation process after the page has left the imaging target area of the camera.

In the audio signal generation process, the controller generates, as the audio signal, an audio signal representing speech that reads the text written on the page aloud at a reading speed specified by the user.
The information processing terminal according to claim 1 or 2.

In the text data generation process, the controller generates, as the text data, page number data representing the page number written on the page and text data representing the text written on the page.
In the audio signal generation process, the controller generates, as the audio signal, an audio signal obtained by adding an audio signal representing speech that reads the text aloud to an audio signal representing speech that reads the page number aloud.
The information processing terminal according to any one of claims 1 to 3, characterized by the above.

In the text data generation process, the controller generates, as the text data, text data representing text obtained by (1) adding the unfinished word to the beginning of the text of the current page when the text of the previous page ends with an unfinished word, and (2) deleting the unfinished word from the end of the text of the current page when the text of the current page ends with an unfinished word.
The information processing terminal according to any one of claims 1 to 4, characterized by the above.
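The carry-over of an unfinished word from one page to the next can be sketched as follows. The claim does not say how an unfinished word is detected. This sketch assumes, purely for illustration, that a trailing hyphen marks a word broken across pages; Japanese text would instead need something like morphological analysis. The function names `split_unfinished_tail` and `stitch_pages` are hypothetical. The sketch requires Python 3.9 or later.

```python
def split_unfinished_tail(text: str) -> tuple[str, str]:
    """Split page text into (body, fragment).

    Assumption: a page ends with an unfinished word only when its last
    token ends with a hyphen. fragment is '' otherwise.
    """
    stripped = text.rstrip()
    if not stripped.endswith("-"):
        return stripped, ""
    head, _, tail = stripped.rpartition(" ")
    return head.rstrip(), tail[:-1]  # drop the trailing hyphen


def stitch_pages(raw_pages: list[str]) -> list[str]:
    """Move each unfinished trailing word to the start of the next page."""
    stitched: list[str] = []
    carried = ""
    for raw in raw_pages:
        body, fragment = split_unfinished_tail(raw)
        # (1) prepend the fragment carried over from the previous page
        stitched.append(carried + body.lstrip() if carried else body)
        # (2) the fragment removed from this page is carried forward
        carried = fragment
    if carried and stitched:
        # No following page: put the fragment back so no text is lost.
        stitched[-1] = (stitched[-1] + " " + carried + "-").strip()
    return stitched
```

For example, `stitch_pages(["The quick brown fox jum-", "ped over the lazy dog."])` returns `["The quick brown fox", "jumped over the lazy dog."]`, so the synthesizer never has to pronounce half a word.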

In the text data generation process, the controller identifies a heading included in the text data.
In the audio signal generation process, the controller generates, as the audio signal, an audio signal representing speech that reads the text aloud in such a way that the heading can be distinguished.
The information processing terminal according to any one of claims 1 to 5, characterized by the above.

When the page is the cover of the document composed of the plurality of pages,
In the text data generation process, the controller generates, as the text data, title data representing the title written on the cover and author name data representing the author name written on the cover.
In the audio signal generation process, the controller generates, as the audio signal, an audio signal in which an audio signal representing speech that reads the author name aloud is added before or after an audio signal representing speech that reads the title aloud.
The information processing terminal according to any one of claims 1 to 6, characterized by the above.

In the text data generation process, the controller identifies, among the character strings written on the cover, the character string written in the largest font as the title, and the character string written in the second largest font as the author name.
The information processing terminal according to claim 7, characterized by the above.
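The font-size rule for the cover can be sketched as a simple ranking. The sketch assumes that an earlier OCR step has already produced one record per character string together with a measured glyph height. The `TextRegion` type, its `font_height` field, and the function `pick_title_and_author` are hypothetical names introduced for illustration only.

```python
from typing import NamedTuple


class TextRegion(NamedTuple):
    """One character string found on the cover (hypothetical OCR output)."""
    text: str
    font_height: float  # glyph height in pixels, as measured by the OCR step


def pick_title_and_author(regions: list[TextRegion]) -> tuple[str, str]:
    """Largest font is taken as the title, second largest as the author name."""
    if len(regions) < 2:
        raise ValueError("need at least two character strings on the cover")
    ranked = sorted(regions, key=lambda r: r.font_height, reverse=True)
    return ranked[0].text, ranked[1].text
```

Ties in font height are resolved by input order here because Python's sort is stable; a real terminal would probably need a more deliberate tie-breaking rule.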

A classifier that receives as input image data representing the cover of a document composed of a plurality of pages and outputs text data representing the title and author name of that document, or a communication interface for communicating with a server equipped with the classifier, is further provided.
In the text data generation process, the controller inputs the image data obtained by the camera to the classifier and generates the title data and the author name data from the text data output by the classifier.
The information processing terminal according to claim 7, characterized by the above.

A database in which a plurality of author names are registered in advance, or a communication interface for communicating with a server equipped with the database, is further provided.
In the text data generation process, the controller uses, as the author name data, a character string that is registered as an author name in the database, among the character strings written on the cover.
The information processing terminal according to claim 7, characterized by the above.
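The database lookup can be illustrated with an in-memory set standing in for the author database. This is a sketch under stated assumptions: the function `pick_author`, the case-insensitive matching, and the sample names are all hypothetical, and a real terminal might instead query a remote server over its communication interface.

```python
from typing import Iterable, Optional


def pick_author(cover_strings: Iterable[str], known_authors: Iterable[str]) -> Optional[str]:
    """Return the first cover string that is registered as an author name.

    Matching ignores case and surrounding whitespace (an assumption of
    this sketch). Returns None when no cover string is registered.
    """
    registered = {name.strip().casefold() for name in known_authors}
    for candidate in cover_strings:
        if candidate.strip().casefold() in registered:
            return candidate.strip()
    return None
```

Unlike the font-size rule, this approach does not depend on cover layout, but it can only recognize authors who are already registered.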

The controller further executes an inverted image determination process of determining whether the image represented by the image data obtained by the camera is an inverted image of the page, and an inverted image notification process of notifying the user to that effect when the inverted image determination process determines that the image is an inverted image.
The information processing terminal according to any one of claims 1 to 10.

The controller further executes a low-character-count page determination process of determining whether the number of characters of the text written on the page falls below a predetermined threshold, and a low-character-count page notification process of notifying the user to that effect when the determination process determines that the number of characters falls below the threshold.
The information processing terminal according to any one of claims 1 to 11, characterized by the above.
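The threshold check on the number of characters can be sketched in a few lines. The default threshold of 20, the choice to ignore whitespace when counting, the function names, and the wording of the notification are assumptions made for this example. The claim only requires some predetermined threshold.

```python
from typing import Optional


def count_characters(text: str) -> int:
    """Count characters on the page, ignoring whitespace (an assumption)."""
    return sum(1 for ch in text if not ch.isspace())


def low_character_notice(text: str, threshold: int = 20) -> Optional[str]:
    """Return a notification message when the page has too few characters."""
    if count_characters(text) < threshold:
        return "This page contains very little text."
    return None
```

A page that triggers this notice may be an illustration, a blank page, or a page that was captured badly, so telling the user lets them decide whether to retake the image.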

A memory for storing page number data representing the page number described on the page is further provided.
The controller further executes a maximum page number notification process for notifying the maximum page number among the page numbers represented by the page number data stored in the memory.
The information processing terminal according to any one of claims 1 to 12, characterized by the above.

The controller further executes a vertical / horizontal determination process for determining whether the text described on the page is vertical text or horizontal text.
The controller executes the text data generation process with reference to the result of the vertical / horizontal determination process.
The information processing terminal according to any one of claims 1 to 13, characterized by the above.

The controller stops the output of the audio signal when it acquires from the user a voice command instructing it to stop reading aloud, and resumes the output of the audio signal when it acquires from the user a voice command instructing it to resume reading aloud.
The information processing terminal according to any one of claims 1 to 14, characterized by the above.
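The stop and resume behavior can be modeled as a small state machine. The sketch assumes that speech recognition has already turned the user's utterance into a plain command string. The class `ReadAloudPlayer`, the command words "stop" and "resume", and the division of the audio into chunks are hypothetical choices for illustration.

```python
from typing import Optional, Sequence


class ReadAloudPlayer:
    """Outputs audio chunks in order and honors stop/resume commands."""

    def __init__(self, chunks: Sequence[bytes]) -> None:
        self._chunks = list(chunks)
        self._position = 0
        self.paused = False

    def handle_command(self, command: str) -> None:
        """Apply an already-recognized voice command (assumed vocabulary)."""
        word = command.strip().lower()
        if word == "stop":
            self.paused = True
        elif word == "resume":
            self.paused = False
        # Unrecognized commands are ignored in this sketch.

    def next_chunk(self) -> Optional[bytes]:
        """Return the next chunk to output, or None while paused or finished."""
        if self.paused or self._position >= len(self._chunks):
            return None
        chunk = self._chunks[self._position]
        self._position += 1
        return chunk
```

Because the playback position is kept while paused, resuming continues from the point where reading stopped rather than from the start of the page.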

A support that supports an information processing terminal so that each page of a document composed of a plurality of pages and the camera lens of the information processing terminal face each other, the support comprising:
A top plate on which the information processing terminal is placed, the top plate having an opening that overlaps the camera lens, and
A side plate whose upper end is connected to the outer edge of the top plate and whose lower end is pressed against the outer edge of the page.
A support characterized by comprising the above.

One or both of the top plate and the side plate are translucent.
The support according to claim 16, characterized by the above.

The side plate is connected to the top plate so that it can be unfolded,
The support according to claim 16 or 17.

The distance from the top plate to the lower end of the side plate, measured along the direction orthogonal to the top plate, falls within the depth of field of the camera lens.
The support according to any one of claims 16 to 18, characterized by the above.
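Whether a given support height falls within the depth of field can be estimated with the standard thin-lens approximation. The sketch below is an illustration, not a design rule from the specification. It assumes the lens sits at the plane of the top plate, so the support height equals the subject distance. The lens values used in the usage note (4 mm focal length, f/1.8, 0.003 mm circle of confusion, focus at 250 mm) are invented for the example.

```python
def depth_of_field_limits(
    focal_length_mm: float,
    f_number: float,
    circle_of_confusion_mm: float,
    focus_distance_mm: float,
) -> tuple[float, float]:
    """Return the (near, far) limits of acceptable sharpness in millimetres.

    Thin-lens approximation:
        near = s*f^2 / (f^2 + N*c*(s - f))
        far  = s*f^2 / (f^2 - N*c*(s - f))
    """
    f2 = focal_length_mm ** 2
    blur = f_number * circle_of_confusion_mm * (focus_distance_mm - focal_length_mm)
    near = focus_distance_mm * f2 / (f2 + blur)
    # At or beyond the hyperfocal distance the far limit is infinite.
    far = focus_distance_mm * f2 / (f2 - blur) if f2 > blur else float("inf")
    return near, far


def page_is_in_focus(support_height_mm: float, **lens: float) -> bool:
    """True when the page plane lies between the near and far limits."""
    near, far = depth_of_field_limits(**lens)
    return near <= support_height_mm <= far
```

With the invented lens values above, the limits come out at roughly 231 mm and 273 mm, so a 250 mm support would keep the page in focus while a 100 mm support would not.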

One or both of the top plate and the side plate are provided with one or both of a structure that indicates, by touch, the type of document composed of a plurality of pages for which the support is suited, and a structure that indicates, by touch, the type of information processing terminal for which the support is suited.
The support according to any one of claims 16 to 19, characterized by the above.

An information processing method for outputting a document composed of a plurality of pages as audio, using an information processing terminal provided with a camera, a controller, and one or both of an output unit and a speaker, the method including:
An imaging process in which the camera captures each page of the document composed of a plurality of pages,
A text data generation process in which the controller refers to the image data obtained by the camera and generates text data representing the text written on the page,
An audio signal generation process in which the controller refers to the text data and generates an audio signal representing speech that reads the text aloud, and
An audio output process in which the output unit outputs the audio signal and/or the speaker converts the audio signal into sound waves and outputs them.
An information processing method characterized by including the above.