WO2019039591A1 - Read-out system and read-out method - Google Patents

Read-out system and read-out method

Info

Publication number
WO2019039591A1
Authority
WO
WIPO (PCT)
Prior art keywords
unit
user
imaging
input
output
Application number
PCT/JP2018/031366
Other languages
French (fr)
Japanese (ja)
Other versions
WO2019039591A4 (en)
Inventor
圭佑 島影
Original Assignee
株式会社オトングラス
Application filed by 株式会社オトングラス
Publication of WO2019039591A1
Publication of WO2019039591A4

Links

Images

Classifications

    • G PHYSICS
        • G06F 18/00 Pattern recognition
        • G09B 21/00 Teaching, or communicating with, the blind, deaf or mute
        • G10L 13/00 Speech synthesis; text-to-speech systems
            • G10L 13/02 Methods for producing synthetic speech; speech synthesisers
                • G10L 13/047 Architecture of speech synthesisers
        • G10L 21/04 Time compression or expansion
            • G10L 21/043 Time compression or expansion by changing speed
            • G10L 21/057 Time compression or expansion for improving intelligibility

Definitions

  • The present invention relates to a reading system and a reading method for converting text into speech and reading it aloud.
  • Patent Document 1 discloses a wearable display for low-vision persons that can capture and display the view ahead so that a low-vision person can walk outdoors at night and the like. According to the wearable display of Patent Document 1, the contrast and brightness of the captured image are converted before display. It also discloses that, when characters appear in a captured image, character recognition processing is performed and the characters are conveyed to the user by voice.
  • However, the wearable display of Patent Document 1 states only that recognized characters are conveyed to the low-vision person through a speaker, without disclosing specifically how the speech is presented; and since the way sound is heard differs from user to user, such a display lacks usability.
  • The present invention has been made in view of the above problems, and its object is to provide a reading system that is more convenient for the user than the wearable display for low-vision persons described in Patent Document 1.
  • A reading system according to one aspect of the present invention includes: an imaging unit provided in a wearing tool worn and used by a user, which images the user's front direction; an extraction unit that extracts characters from the image captured by the imaging unit; a conversion unit that converts the characters extracted by the extraction unit into speech; an output unit provided in the wearing tool, which outputs the speech; an input unit provided in the wearing tool, which receives input from the user; and a control unit provided in the wearing tool, which controls the playback speed of the speech output from the output unit based on the input from the user received via the input unit.
  • A reading method according to one aspect of the present invention includes: an imaging step of imaging the front direction of a user with an imaging unit provided in a wearing tool worn and used by the user; an extraction step of extracting characters from the image captured in the imaging step; a conversion step of converting the characters extracted in the extraction step into speech; an output step of outputting the speech from an output unit provided in the wearing tool; an input step of receiving input from the user with an input unit provided in the wearing tool; and a control step of controlling the playback speed of the speech output from the output unit based on the input from the user received via the input unit.
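  • As a rough illustration only (not part of the disclosure), the claimed steps map onto a simple pipeline. The sketch below assumes hypothetical callables `capture`, `extract_text`, `text_to_speech`, and `play`; the patent does not tie the method to any particular library or API.

```python
from dataclasses import dataclass

@dataclass
class PlaybackState:
    """State adjusted through the input unit (hypothetical)."""
    speed: float = 1.0      # reproduction speed (control step)
    paused: bool = False    # pause state (control step)

def read_out_once(capture, extract_text, text_to_speech, play, state: PlaybackState):
    """One pass of the claimed method: image -> characters -> speech -> output."""
    image = capture()                   # imaging step
    characters = extract_text(image)    # extraction step
    voice = text_to_speech(characters)  # conversion step
    if not state.paused:
        play(voice, speed=state.speed)  # output step at the controlled speed
```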
  • The control unit may pause the sound output from the output unit based on the input from the user received via the input unit.
  • The control unit may repeatedly reproduce the voice output from the output unit based on the input from the user received via the input unit.
  • The output unit may output a sound indicating that character conversion processing is in progress while the conversion unit is converting the characters into speech.
  • The reading system may include a server having the extraction unit and the conversion unit, and the wearing tool may include a transmission unit that transmits the image captured by the imaging unit to the server and a reception unit that receives the voice converted by the conversion unit.
  • The output unit may output a sound indicating that character conversion processing is in progress from when the transmission unit transmits the image until the reception unit receives the voice.
  • The wearing tool may include an acquisition unit that acquires environment information on the surrounding environment, and the imaging unit may change its imaging conditions based on the environment information.
  • The wearing tool may include a determination unit that determines whether characters are present in the image being captured by the imaging unit, and while the determination unit determines that characters are included in the captured image, the output unit may output a sound indicating that characters are present in the imaging direction of the imaging unit.
  • The reading system may include a log transmission unit that associates the captured image captured by the imaging unit with the voice obtained by the conversion unit from that image and transmits them to the user's information processing terminal.
  • The wearing tool may include a position information acquisition unit that acquires position information indicating the position of the device itself; the imaging unit may associate the captured image with the position information acquired at the time of imaging, and the log transmission unit may transmit the position information to the user's information processing terminal together with the captured image and the voice.
  • With the reading system according to an aspect of the present invention, the user can freely adjust the speed at which speech converted from characters is read out. It is therefore possible to provide a reading system with excellent usability.
  • FIG. 1(a) is a view showing an example of the appearance of a user wearing the wearing tool, and FIG. 1(b) is a view showing an example of imaging and reading aloud being performed using the wearing tool.
  • FIG. 2 is a view showing an example of the system configuration of the reading system.
  • FIG. 3(a) is a view showing a configuration example of the data that the wearing tool transmits to the server, and FIG. 3(b) is a view showing a configuration example of the read-out voice information that the server stores for each user.
  • FIG. 4 is a sequence diagram showing the exchange between the wearing tool and the server.
  • FIG. 5 is a flowchart showing the operation of the wearing tool.
  • FIG. 6 is a flowchart showing the operation of the server.
  • FIG. 7 is a view showing an example of a range from which characters are preferentially extracted from an image.
  • FIG. 8 is a view showing an example of a screen for playing back read-out voice using a map.
  • FIG. 9 is a view showing another example of the system configuration of the reading system.
  • As shown in FIGS. 1 and 2, the reading system 1 includes: an imaging unit 111 provided in the wearing tool 100 worn and used by the user, which images the user's front direction; an extraction unit 231 that extracts characters from the image captured by the imaging unit 111; a conversion unit 232 that converts the extracted characters into speech; an output unit 156 provided in the wearing tool 100, which outputs the speech; an input unit 154 provided in the wearing tool 100, which receives input from the user; and a control unit 155 provided in the wearing tool 100, which controls the playback speed of the speech output from the output unit 156 based on the input from the user received via the input unit 154.
  • Such a reading system 1 is described in detail below.
  • As shown in FIGS. 1(a) and 1(b), the user 10 wears and uses the wearable glass 110.
  • On the wearable glass 110, an imaging unit 111 is disposed at a position where it can image the user's front direction in accordance with an instruction from the user.
  • The imaging unit 111 is a so-called camera.
  • The wearable glass 110 is connected to the controller 150.
  • In FIG. 1(a), the wearable glass 110 is shown connected by the cord 120 to the earphone 130 and from there by the cable 140 to the controller 150; however, the wearable glass 110 and the controller 150 may be directly connected, in the same way as the earphone 130 and the controller 150.
  • The user 10 wears the earphone 130 in the ear and can listen to the read-out voice transmitted from the controller 150.
  • As shown in FIG. 1(a), the user 10 holds the controller 150 and can use it to issue an imaging instruction and instructions related to reproduction of the read-out voice.
  • As shown in FIG. 1(b), when the user issues an imaging instruction, the imaging unit 111 images the imaging range 160. Characters included in the imaging range 160 are then recognized, converted into machine-synthesized speech, and read aloud. The reading system 1 can therefore provide the information of hard-to-read characters to low-vision persons and others.
  • FIG. 2 shows an example of the system configuration of the reading system 1.
  • The reading system 1 includes the wearing tool 100 and the server 200.
  • The wearing tool 100 and the server 200 are configured to communicate via the network 300.
  • The wearing tool 100 communicates with the network 300 wirelessly; any communication protocol may be used as long as wireless communication can be performed.
  • The server 200 also communicates with the network 300; either wireless or wired communication may be used, with any protocol, as long as communication can be performed.
  • The wearing tool 100 includes the wearable glass 110, the earphone 130, and the controller 150. That is, in the present embodiment, as shown in FIG. 2, the wearable glass 110, the earphone 130, and the controller 150 are collectively referred to as the wearing tool 100. Further, although the wearable glass 110 is used here, the device is of course not limited to glasses as long as it can image the front direction (viewing direction) of the user 10.
  • The wearable glass 110 includes an imaging unit 111 and a communication I/F 112.
  • The imaging unit 111 is a camera capable of imaging the user's front direction.
  • The imaging unit 111 performs imaging upon receiving an imaging signal from the communication I/F 112.
  • The imaging unit 111 may be provided anywhere on the wearable glass 110 as long as it can image the user's front direction.
  • Although FIG. 1 illustrates an example in which the imaging unit 111 is provided at the left hinge portion of the wearable glass, it may instead be provided at the right hinge portion or at the bridge portion.
  • The imaging unit 111 transmits the captured image obtained by imaging to the communication I/F 112.
  • The imaging unit 111 may also have a detection function that captures images sequentially, analyzes them, and detects the presence or absence of characters in them; when characters are included in the captured image, it transmits to the communication I/F 112 a presence signal indicating that characters are present in the user's front direction.
  • The communication I/F 112 is a communication interface having a function of communicating with the controller 150.
  • The communication I/F 112 is communicably connected to the communication I/F 151 of the controller 150.
  • The communication I/F 112 transmits the imaging signal sent from the communication I/F 151 of the controller 150 to the imaging unit 111.
  • The communication I/F 112 transmits to the communication I/F 151 the captured image sent from the imaging unit 111 and the presence signal indicating that characters are present in the user's front direction.
  • The earphone 130 is connected to the output unit 156 of the controller 150 and has a function of outputting the audio signal transmitted from the output unit 156 as sound.
  • In FIG. 1, the earphone 130 is connected to the controller 150 by wire, but the connection may be wireless.
  • The earphone 130 outputs the read-out voice generated from characters detected in the captured image, the sound indicating that characters are being analyzed, or the sound indicating that characters are present in the front direction of the imaging unit 111.
  • The controller 150 includes a communication I/F 151, a communication unit 152, a storage unit 153, an input unit 154, a control unit 155, and an output unit 156, each connected to one another via a bus.
  • The communication I/F 151 is a communication interface having a function of communicating with the communication I/F 112 of the wearable glass 110.
  • When the communication I/F 151 receives an imaging signal from the control unit 155, it transmits the imaging signal to the communication I/F 112; when it receives a captured image or a presence signal from the communication I/F 112, it forwards it to the control unit 155.
  • The communication unit 152 is a communication interface having a function of communicating with the server 200 via the network 300.
  • The communication unit 152 functions as a transmission unit that transmits the captured image to the server 200 in accordance with an instruction from the control unit 155, and as a reception unit that receives from the server 200 the read-out voice obtained by converting the characters included in the captured image into speech.
  • The communication unit 152 passes the received read-out voice to the control unit 155.
  • The storage unit 153 has a function of storing the various programs and data required for the controller 150 to function.
  • The storage unit 153 can be realized by, for example, an HDD (Hard Disk Drive), an SSD (Solid State Drive), a flash memory, or the like, but is not limited thereto.
  • The storage unit 153 stores, for example, the reading program executed by the control unit 155, captured images captured by the imaging unit 111, and read-out voice information received by the communication unit 152.
  • The storage unit 153 stores the voice information for the sound that is output from the output unit 156 during the period from when the communication unit 152 transmits the captured image to the server 200 until the read-out voice is received, indicating that the characters are being converted into speech.
  • The storage unit 153 also stores voice information for informing the user 10 that characters are present in the front direction.
  • The input unit 154 has a function of receiving input from the user 10.
  • The input unit 154 can be realized by, for example, hard keys provided on the controller 150, but may also be realized by a touch panel or the like.
  • The input unit 154 may include an imaging button 154A with which the user 10 instructs imaging, a playback button 154B for instructing playback, pause, and replay, and an adjustment button 154C for adjusting the playback speed of the sound.
  • When a button is pressed, the input unit 154 transmits a signal indicating the pressed content to the control unit 155.
  • The control unit 155 is a processor having a function of controlling each unit of the controller 150.
  • The control unit 155 executes the various programs stored in the storage unit 153 to perform the functions of the controller 150.
  • When receiving an imaging instruction from the input unit 154, the control unit 155 instructs the communication I/F 151 to transmit an imaging signal to the wearable glass 110.
  • When a captured image is received, the control unit 155 instructs the communication unit 152 to transmit it to the server 200. After this instruction, the control unit 155 reads from the storage unit 153 the voice indicating that the characters included in the captured image are being converted into speech, and instructs the output unit 156 to output it.
  • When the read-out voice is received, the control unit 155 instructs the output unit 156 to stop outputting the voice indicating that conversion is in progress, and then instructs the output unit 156 to output the read-out voice.
  • When a presence signal is received, the control unit 155 reads from the storage unit 153 the voice indicating that characters are present in the front direction of the user 10, and instructs the output unit 156 to output it.
  • The control unit 155 also executes reproduction control processing of the read-out voice in accordance with instructions from the user 10 transmitted from the input unit 154. For example, when a pause instruction is received, it instructs the output unit 156 to pause reproduction of the read-out voice.
  • When a slow-playback instruction is received, the control unit 155 instructs the output unit 156 to play the read-out voice slowly.
  • The slow-playback instruction may be replaced by a playback speed adjustment instruction, with which the control unit 155 can increase or decrease the playback speed of the read-out voice.
  • When a replay instruction is received, the control unit 155 instructs the output unit 156 to reproduce again the read-out voice output so far.
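  • As an illustrative sketch only, the reproduction control described above (pause, resume, slow playback, speed adjustment, and replay) could be organized as follows; the `player` object is a hypothetical stand-in for the output unit 156 and is assumed to expose `play`, `pause`, and `resume`.

```python
class PlaybackController:
    """Sketch of the reproduction control performed by the control unit 155."""

    def __init__(self, player):
        self.player = player
        self.speed = 1.0
        self.last_voice = None          # kept so replay is possible

    def on_new_voice(self, voice):
        self.last_voice = voice
        self.player.play(voice, speed=self.speed)

    def on_pause(self):                 # pause instruction
        self.player.pause()

    def on_resume(self):                # playback (resume) instruction
        self.player.resume()

    def on_slow(self):                  # slow-playback instruction
        self.speed = 0.5

    def on_adjust_speed(self, delta):   # adjustment button 154C
        self.speed = max(0.5, min(2.0, self.speed + delta))

    def on_replay(self):                # replay the voice output so far
        if self.last_voice is not None:
            self.player.play(self.last_voice, speed=self.speed)
```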
  • The output unit 156 has a function of outputting the audio signal instructed by the control unit 155 to the earphone 130.
  • The output unit 156 outputs to the earphone 130 the read-out voice, the voice indicating that characters are being converted into speech, or the voice indicating that characters are present in the front direction of the user 10.
  • The server 200 includes a communication unit 210, a storage unit 220, and a control unit 230.
  • The communication unit 210, the storage unit 220, and the control unit 230 are connected to one another via a bus.
  • The communication unit 210 is a communication interface having a function of communicating with the wearing tool 100 (controller 150) via the network 300.
  • The communication unit 210 functions as a transmission unit that transmits the read-out voice to the wearing tool 100 in accordance with an instruction from the control unit 230, and as a reception unit that receives captured images.
  • The communication unit 210 passes received captured images to the control unit 230.
  • The storage unit 220 stores various programs and data that the server 200 needs in operation.
  • The storage unit 220 can be realized by, for example, a hard disk drive (HDD), a solid state drive (SSD), a flash memory, or the like, but is not limited thereto.
  • The storage unit 220 stores a character recognition program for extracting characters from images, a voice conversion program for converting recognized characters into speech, and read-out voice information; details of the read-out voice information are described later.
  • The control unit 230 is a processor having a function of controlling each unit of the server 200.
  • The control unit 230 executes the various programs stored in the storage unit 220 to perform the functions of the server 200.
  • The control unit 230 functions as the extraction unit 231 by executing the character recognition program, and as the conversion unit 232 by executing the voice conversion program.
  • The extraction unit 231 has a function of analyzing a captured image and extracting the characters it contains; existing character recognition processing can be used as the analysis technique.
  • The conversion unit 232 has a function of converting the characters extracted by the extraction unit 231 into speech (the read-out voice); existing conversion processing can be used as the conversion technique.
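  • For illustration, one possible realization of the extraction unit 231 and the conversion unit 232 with off-the-shelf libraries is sketched below; the pairing of pytesseract (OCR) and gTTS (speech synthesis) is an assumption, since the patent only states that existing recognition and conversion techniques may be used.

```python
import io

from PIL import Image
import pytesseract      # OCR; requires a local Tesseract installation
from gtts import gTTS   # TTS; requires network access

def extract_characters(image_bytes: bytes, lang: str = "jpn") -> str:
    """Extraction unit 231: pull text out of a captured image."""
    image = Image.open(io.BytesIO(image_bytes))
    return pytesseract.image_to_string(image, lang=lang).strip()

def convert_to_speech(text: str, lang: str = "ja") -> bytes:
    """Conversion unit 232: turn extracted characters into read-out audio."""
    buf = io.BytesIO()
    gTTS(text=text, lang=lang).write_to_fp(buf)
    return buf.getvalue()  # MP3 bytes to return to the wearing tool
```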
  • FIG. 3 is a view showing an example of the data configuration of data according to the reading system 1.
  • FIG. 3A is a view showing a data configuration example (format example) of the transmission data 310 (captured image) that the wearing tool 100 (controller 150) transmits to the server 200.
  • The transmission data 310 is information in which a user ID 311, captured image information 312, and imaging time information 313 are associated.
  • The user ID 311 is identification information that can uniquely identify the user 10 who uses the wearing tool 100.
  • With the user ID 311, the server 200 can identify which user a captured image came from and can manage the captured images and the generated read-out voices for each user.
  • The captured image information 312 is information indicating the actual data of the captured image captured by the imaging unit 111.
  • The imaging time information 313 is information indicating the date and time when the captured image indicated by the captured image information 312 was captured; it can be acquired, for example, from an internal clock (not shown) of the imaging unit 111.
  • FIG. 3B is a view showing an example of the data configuration of the read-out voice information stored in the storage unit 220 of the server 200 and managed for each user who uses the reading system 1.
  • This data is information for managing the read-out voices that the server 200 has generated by conversion in the past.
  • The read-out voice information 320 is information in which imaging time information 321, captured image information 322, and a read-out voice 323 are associated.
  • The imaging time information 321 is information indicating the date and time when the corresponding captured image was captured, and is the same information as the imaging time information 313.
  • The captured image information 322 is information indicating the actual data of the captured image, and is the same information as the captured image information 312.
  • The read-out voice 323 is the actual data of the read-out voice obtained by the extraction unit 231 extracting characters from the corresponding captured image information 322 and the conversion unit 232 converting them.
  • With this information, the server 200 can manage past read-out voices.
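  • For illustration only, the two records of FIG. 3 could be represented as the following data structures; the field names are hypothetical, as the patent specifies the associations but no concrete format.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class TransmissionData:       # transmission data 310
    user_id: str              # user ID 311
    captured_image: bytes     # captured image information 312
    captured_at: datetime     # imaging time information 313

@dataclass
class ReadOutVoiceRecord:     # read-out voice information 320 (kept per user)
    captured_at: datetime     # imaging time information 321
    captured_image: bytes     # captured image information 322
    read_out_voice: bytes     # read-out voice 323
```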
  • FIG. 4 is a sequence diagram showing the exchange between the wearing tool 100 and the server 200.
  • The wearing tool 100 performs imaging in the front direction of the user 10 (step S401), and transmits the obtained captured image to the server 200 (step S402).
  • The server 200 receives the captured image transmitted from the wearing tool 100 (step S403), extracts characters from it (step S404), and converts the extracted characters into speech to generate the read-out voice (step S405). When the read-out voice has been generated, the server 200 transmits it to the wearing tool 100 (step S406).
  • The wearing tool 100 receives the read-out voice transmitted from the server 200 (step S407) and outputs it (step S408). In this way, the reading system 1 can recognize characters present in the front direction (viewing direction) of the user 10 and convey them to the user 10 by sound.
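  • As a sketch of steps S402 and S407 from the controller side: the patent leaves the transport open, so the use of HTTP via the `requests` library and the endpoint URL below are assumptions for illustration.

```python
import requests

SERVER_URL = "https://example.com/read-out"  # placeholder endpoint

def send_image_and_receive_voice(user_id: str, image_bytes: bytes) -> bytes:
    """Send the captured image (S402) and receive the read-out voice (S407)."""
    response = requests.post(
        SERVER_URL,
        files={"captured_image": image_bytes},
        data={"user_id": user_id},
        timeout=30,
    )
    response.raise_for_status()
    return response.content  # audio bytes generated by the server (S406)
```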
  • FIG. 5 is a flowchart showing the operation of the wearing tool 100.
  • The input unit 154 of the wearing tool 100 determines whether there has been an input from the user based on whether any of the buttons has been pressed (step S501). If there is an input from the user (YES in step S501), the process proceeds to step S502; if not (NO in step S501), the process proceeds to step S512.
  • In step S502, the control unit 155 determines whether the input accepted by the input unit 154 is an imaging instruction. If it is an imaging instruction (YES in step S502), the process proceeds to step S503; if not (NO in step S502), the process proceeds to step S506.
  • When the input unit 154 receives an imaging instruction from the user, it transmits the imaging instruction to the control unit 155.
  • The control unit 155 instructs the communication I/F 151 to transmit an imaging signal to the wearable glass 110.
  • The communication I/F 151 transmits the imaging signal to the communication I/F 112 according to the instruction.
  • The communication I/F 112 transmits the imaging signal to the imaging unit 111, and the imaging unit 111 executes imaging (step S503).
  • The imaging unit 111 transmits the obtained captured image to the communication I/F 112, which transfers it to the communication I/F 151.
  • The communication I/F 151 passes the captured image to the control unit 155, and the control unit 155 instructs the communication unit 152 to transmit it to the server 200.
  • The communication unit 152 transmits the captured image to the server 200 via the network 300 (step S504).
  • The control unit 155 reads from the storage unit 153 the voice indicating that the characters in the captured image are being converted into speech, and instructs the output unit 156 to output it.
  • The output unit 156 outputs the voice to the earphone 130, the earphone 130 plays it (step S505), and the process returns to step S501.
  • When it is determined in step S502 that the input is not an imaging instruction (NO in step S502), it is determined whether the input is a voice pause instruction (step S506). If it is a pause instruction (YES in step S506), the control unit 155 instructs the output unit 156 to pause the voice being output; the output unit 156 pauses the output (step S507), and the process returns to step S501. The pause continues until a new playback instruction or a complete stop instruction is input.
  • When it is determined in step S506 that the input is not a pause instruction (NO in step S506), it is determined whether the input is a slow-playback instruction (step S508). If it is (YES in step S508), the control unit 155 instructs the output unit 156 to play the voice being output slowly; the output unit 156 starts slow playback (step S509), and the process returns to step S501.
  • Instead of slowing playback, the playback speed may also be increased; a higher playback speed is useful for shortening the time needed to grasp the outline of the characters included in the captured content.
  • When it is determined in step S508 that the input is not a slow-playback instruction (NO in step S508), it is determined whether the input is a playback or replay input (step S510). If it is not (NO in step S510), the process returns to step S501. If it is (YES in step S510), the control unit 155 instructs the output unit 156 to resume output of the paused voice or to replay the voice already output; the output unit 156 resumes or replays the voice (step S511), and the process returns to step S501. In this way, even if the user 10 misses the read-out voice, it can be read out again.
  • When there is no input from the user in step S501 (NO in step S501), the control unit 155 determines whether the read-out voice has been received from the server 200 (step S512). If it has not been received (NO in step S512), the process returns to step S501.
  • If the read-out voice has been received (YES in step S512), the control unit 155 first instructs the output unit 156 to stop outputting the voice indicating that the characters are being converted into speech; the output unit 156 stops that output (step S513).
  • The control unit 155 then instructs the output unit 156 to output the read-out voice passed from the communication unit 152.
  • The output unit 156 starts output of the read-out voice passed from the control unit 155 (step S514), and the process returns to step S501.
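  • The loop of FIG. 5 can be summarized in the following sketch; `buttons`, `camera_link`, `server`, and `player` are hypothetical stand-ins for the input unit 154, the wearable glass 110 link, the communication unit 152, and the output unit 156.

```python
def controller_loop(buttons, camera_link, server, player, wait_sound):
    """Compressed sketch of steps S501-S514 of FIG. 5."""
    while True:
        event = buttons.poll()                  # S501: any button pressed?
        if event == "imaging":                  # S502 -> S503
            image = camera_link.capture()       # S503: imaging
            server.send_async(image)            # S504: transmit to server
            player.play_loop(wait_sound)        # S505: "converting" sound
        elif event == "pause":                  # S506 -> S507
            player.pause()
        elif event == "slow":                   # S508 -> S509
            player.set_speed(0.5)
        elif event in ("play", "replay"):       # S510 -> S511
            player.resume_or_replay()
        elif server.voice_ready():              # S512: read-out voice received?
            player.stop_loop()                  # S513: stop waiting sound
            player.play(server.take_voice())    # S514: output read-out voice
```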
  • FIG. 6 is a flowchart showing the operation when the server 200 receives a captured image from the wearing tool 100.
  • The communication unit 210 of the server 200 receives a captured image from the wearing tool 100 via the network 300 (step S601).
  • The communication unit 210 passes the received captured image to the control unit 230.
  • The extraction unit 231 of the control unit 230 analyzes the captured image and extracts characters (step S602).
  • The extraction unit 231 passes the extracted character string to the conversion unit 232.
  • The conversion unit 232 converts the extracted character string into speech (step S603), generating a read-out voice that is machine-synthesized speech.
  • The conversion unit 232 passes the generated read-out voice to the communication unit 210.
  • The communication unit 210 transmits the synthesized speech as the read-out voice to the wearing tool 100 via the network 300 (step S604).
  • The control unit 230 registers the received captured image, its imaging date and time, and the read-out voice obtained from it in the read-out voice information as the captured image information 322, the imaging time information 321, and the read-out voice 323, respectively (step S605), and ends the processing.
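  • Reusing the hypothetical helpers and records sketched above, the server flow of FIG. 6 could look as follows; `reply` is an assumed callable that returns the audio to the wearing tool.

```python
def handle_captured_image(data, voice_log, reply):
    """Sketch of steps S601-S605; data is a TransmissionData record."""
    text = extract_characters(data.captured_image)      # S602: extraction
    voice = convert_to_speech(text)                     # S603: conversion
    reply(voice)                                        # S604: send read-out voice
    voice_log.setdefault(data.user_id, []).append(      # S605: register the log
        ReadOutVoiceRecord(data.captured_at, data.captured_image, voice)
    )
```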
  • As described above, the reading system 1 does not merely read out recognized characters, but can reproduce the voice in a way that is easy for the user to hear.
  • That is, the reading system 1 can recognize the characters included in a captured image and output them as sound. Since the user can perform operations such as slow playback, pause, and replay on the read-out voice, the voice can be reproduced in the way each user finds easiest to hear. It is therefore possible to provide a user-friendly reading system. Further, while the reading system 1 is generating the read-out voice from the captured image, it notifies the user 10 with a sound indicating that processing is in progress, so the user 10 can recognize the situation.
  • In the above embodiment, the voice is output using the earphone 130; however, the wearable glass 110 or the controller 150 may be provided with a speaker, and the output unit 156 may output the read-out voice from that speaker. With this configuration, even a user who finds it painful to wear the earphone 130 can hear the read-out voice. In this case, there is also the advantage that a plurality of users can listen to the voice simultaneously.
  • In the above embodiment, the wearing tool 100 includes the wearable glass 110, the earphone 130, and the controller 150, configured as separate devices; however, the wearable glass 110, the earphone 130, and the controller 150 may be formed integrally. That is, the wearable glass 110 may include a speaker as an alternative to the sound output function of the earphone 130, and may also hold the functions of the controller 150.
  • For example, the temple portion of the wearable glass 110 may have a hollow structure in which the processor, memory, communication module, and the like of the controller 150 are mounted, and various buttons for voice playback control and imaging instructions may be provided on the outside of the temple or rim of the wearable glass 110.
  • Also, the wearing tool 100 may have the functions of the server 200 (the functions of the extraction unit and the conversion unit); for example, the controller 150 may include a chip realizing the functions of the server 200. With this configuration, the wearing tool 100 can realize the reading system stand-alone, and the latency associated with transmitting the captured image and receiving the read-out voice can be suppressed.
  • In the above embodiment, the range for extracting characters from the captured image is determined in advance, but this is not a limitation.
  • For example, the wearable glass 110 may be provided with a camera that images the user's eyes; the gaze direction may be detected, a predetermined range centered on the gaze direction may be applied to the captured image, and characters within that range may be detected. In that case, the wearable glass 110 transmits the first captured image captured by the imaging unit 111 and a second captured image obtained by imaging the user's eyes to the controller 150, and the controller 150 transmits the first and second captured images to the server 200.
  • The extraction unit 231 of the server 200 may then identify the gaze direction of the user 10 from the second captured image, identify a predetermined range including the identified gaze direction, and extract characters from the location corresponding to that range in the first captured image.
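  • As an illustrative sketch of this gaze-guided variant, assuming a gaze point has already been estimated from the second captured image (the gaze estimator itself is not specified by the patent):

```python
from PIL import Image
import pytesseract

def extract_around_gaze(front_image: Image.Image, gaze_xy, half: int = 200) -> str:
    """Run OCR only in a window centered on the estimated gaze point."""
    x, y = gaze_xy
    box = (max(0, x - half), max(0, y - half),
           min(front_image.width, x + half), min(front_image.height, y + half))
    region = front_image.crop(box)  # predetermined range centered on the gaze
    return pytesseract.image_to_string(region, lang="jpn").strip()
```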
  • In the above embodiment, the imaging unit 111 performs imaging upon input of an imaging instruction to the controller 150, but the imaging trigger is not limited to this.
  • For example, the wearable glass 110 or the controller 150 may be provided with a microphone that picks up the voice uttered by the user, and imaging may be performed when the user utters a specific word; that is, imaging may be triggered by voice input.
  • Alternatively, a camera that images the user's eyes may be provided on the wearable glass 110, and a blink of the user's eyes may be used as the imaging trigger.
  • In the above embodiment, the input unit 154 is provided in the controller 150, but it may instead be provided partway along the cable 140.
  • The reading system 1 may also be provided with a setting unit capable of setting the language of the read-out voice, and a translation unit that translates the characters extracted by the extraction unit 231 into the language set in the setting unit; the conversion unit 232 may then convert the translated characters into speech.
  • In this way, the reading system 1 can function as a system that interprets written characters, which is useful not only for low-vision persons but also for users from abroad.
  • The extraction unit 231 may also limit the range for extracting characters to a predetermined range rather than the entire captured image.
  • FIG. 7 shows an example of a captured image 700; the extraction unit 231 may treat only the predetermined range 710 in the captured image 700 as the range for extracting characters.
  • Alternatively, the predetermined range 710 may be set as a range from which characters are preferentially extracted. A range from which characters are preferentially extracted means that characters are extracted from within the predetermined range 710 first, and the process of extracting characters from outside the predetermined range 710 is performed only when no characters can be extracted from within it.
  • The predetermined range 710 may be set by the user who uses the reading system 1. In general, users tend to look in a direction slightly lower than the front direction, so it is effective to set the predetermined range 710 toward the lower part of the captured image 700.
  • The control unit 230 may also set the predetermined range 710: specifically, for a large number of captured images received by the server 200, the ranges from which characters could be extracted are identified, and their average range may be set as the predetermined range 710.
  • Alternatively, various sensors may be provided on the wearable glass 110, and the predetermined range 710 may be determined based on the sensing data obtained from them.
  • For example, a gyro sensor may be mounted on the wearable glass 110, and the wearing tool 100 may transmit the sensing data of the gyro sensor to the server 200 together with the captured image.
  • The extraction unit 231 may then determine the predetermined range 710 based on the sensing data of the gyro sensor; for example, when it is estimated from the sensing data that the user 10 is facing downward, the predetermined range 710 may be set toward the lower part of the captured image 700.
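  • A minimal sketch of the priority-range behaviour around FIG. 7 follows: OCR is tried inside the predetermined range 710 first and falls back to the whole image only when nothing is found there. The lower-biased default box is an illustrative choice reflecting the tendency to look slightly downward.

```python
from PIL import Image
import pytesseract

def extract_with_priority(image: Image.Image, box=None) -> str:
    """Extract characters preferentially from a predetermined range."""
    w, h = image.size
    box = box or (0, int(h * 0.4), w, h)   # default: lower part of the image
    text = pytesseract.image_to_string(image.crop(box), lang="jpn").strip()
    if text:                               # found inside the range (710)
        return text
    # fall back to the entire captured image (700)
    return pytesseract.image_to_string(image, lang="jpn").strip()
```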
  • The server 200 may also be configured to transmit the corresponding read-out voice information 320 as a past log to an information processing apparatus, such as a PC, held by the user 10. With this configuration, the user 10 can listen to past read-out voices at any time.
  • In addition, the wearing tool 100 may be provided with a position information acquisition unit that acquires position information indicating where the device itself is located.
  • The position information acquisition unit can be realized using, for example, GPS or GNSS.
  • The position information acquisition unit acquires position information and associates the acquired position information with the captured image.
  • The wearing tool 100 transmits the captured image associated with the position information to the server 200.
  • The server 200 may then further associate and manage imaging position information indicating the imaging position as part of the read-out voice information 320.
  • By transmitting information including the position information from the server 200 to the information processing apparatus of the user 10 as the read-out voice information 320, the information processing apparatus can present the read-out voice together with a map application, as shown in FIG. 8. That is, the user 10 can recognize on the map when and where each read-out voice was acquired. Then, by positioning the cursor 803 on log information 801 or 802 on the map and clicking, the information processing apparatus may play back the corresponding read-out voice with audio playback software or the like. For example, as shown in the map 800 of FIG. 8, the log information 801 and 802 lets the user recognize where the captured images from which the read-out voices were obtained were captured.
  • In addition, the imaging unit 111 may perform imaging sequentially and detect whether characters are included in the obtained captured images. When it is detected that characters are included, that fact is transmitted to the controller 150, and the control unit 155 may notify the user 10 that characters are present in the front direction at that moment. The user 10 can then input an imaging instruction to the input unit 154 at that timing.
  • With this configuration, the user 10 can be made aware of the presence of characters even when the user 10 cannot visually recognize them, such as when the user 10 has low vision or, in particular, is blind, so a highly convenient reading system 1 can be provided.
  • The imaging unit 111 may also change its imaging conditions according to the environment in which the user (the wearable glass 110) is placed.
  • For example, the wearable glass 110 may include various sensors (for example, an illuminance sensor) and change the exposure time, the angle of view, and the like.
  • When the server 200 cannot extract characters from the image, cannot convert the extracted characters into speech, or determines that the image contains no characters, an error signal may be transmitted to the wearing tool 100, and the wearing tool 100 may receive the signal and output a sound indicating the error from the output unit 156.
  • In addition, a variety of sounds may be stored in the storage unit 153, such as an activation sound for when the wearing tool 100 is activated, an imaging sound (shutter sound) for when the imaging unit 111 performs imaging, a sound indicating a waiting state, and a cancellation sound for when the user inputs an instruction to cancel a process, and the corresponding sound may be output from the output unit 156 in each state.
  • By adopting a configuration that outputs sounds according to its various states, the wearing tool 100 can inform the user of the state of the device by sound alone.
  • The server 200 may also change the mode of the generated read-out voice according to the location in the captured image from which the characters were extracted, or according to the ratio of the extracted range to the captured image.
  • Changing the mode of the voice according to the location from which the characters were extracted means changing the direction from which the user hears the voice according to where in the captured image the characters were found. For example, when the characters were extracted from the right side of the captured image, the output unit 156 may output the read-out voice so that it is heard from the user's right side. With this configuration, the user can intuitively recognize in which direction, as seen from the user, the read-out characters are located.
  • Changing the mode of the read-out voice according to the ratio of the extracted range to the captured image means, for example, changing the volume of the read-out voice according to that ratio.
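  • For illustration, direction- and size-dependent output could be realized with simple stereo panning and gain control, as sketched below; mono samples are treated as plain float lists to keep the example dependency-free, and the gain curve is an arbitrary assumption.

```python
def spatialize(samples, region_center_x: float, image_width: float, area_ratio: float):
    """Pan the read-out voice toward the side where the characters were found.

    area_ratio: area of the extracted region divided by the image area.
    """
    pan = region_center_x / image_width   # 0.0 = left edge, 1.0 = right edge
    gain = min(1.0, 0.3 + area_ratio)     # louder for larger text regions
    left = [s * gain * (1.0 - pan) for s in samples]
    right = [s * gain * pan for s in samples]
    return left, right                    # stereo pair for the earphone 130
```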
  • In the above embodiment, the transmission data 310 associates the user ID 311, the captured image information 312, and the imaging time information 313, but various other information may also be associated with it. For example, as in the supplements above, sensing data from a gyro sensor, an acceleration sensor, or the like capable of identifying the posture of the wearing tool 100, and position information indicating where the wearing tool 100 is located, may also be associated.
  • Likewise, in the read-out voice information 320, the imaging time information 321, the captured image information 322, and the read-out voice 323 are associated, but text data of the characters obtained by analyzing the captured image, the position information included in the transmission data 310, sensing data, and the like may also be associated.
  • By accumulating these various kinds of information, the read-out voice information can be used as a life log of each user.
  • The server 200 may also be provided with a providing unit that provides specified information from among the stored information. For example, by accumulating position information, it is possible to provide information on the amount the user moved per unit time (for example, one day) and information on where the user went; by using gyro sensor information to identify the user's posture, it is possible to provide posture information (for example, whether the posture is good or bad).
  • The processors (the control unit 155 and the control unit 230) functioning as the functional units constituting the reading system 1 perform the reading process, as the method by which the reading system 1 reads aloud, by executing a reading program or the like.
  • Alternatively, the functional units may be realized by logic circuits (hardware) or dedicated circuits formed in integrated circuits (IC (Integrated Circuit) chips, LSI (Large Scale Integration)) or the like. These circuits may be realized by one or more integrated circuits, and the functions of the plurality of functional units shown in the above embodiment may be realized by a single integrated circuit.
  • An LSI may be called a VLSI, super LSI, ultra LSI, or the like depending on the degree of integration. That is, as shown in FIG. 9, each functional unit of the wearing tool 100 and the server 200 constituting the reading system 1 may be realized by physical circuits: the wearing tool 100 may comprise the wearable glass 110 including an imaging circuit 111a and a communication I/F circuit 112a, the earphone 130, and a controller including a communication I/F circuit 151a, a communication circuit 152a, a storage circuit 153a, an input circuit 154a, a control circuit 155a, and an output circuit 156a.
  • Similarly, the server 200 may be configured of a communication circuit 210a, a storage circuit 220a, and a control circuit 230a including an extraction circuit 231a and a conversion circuit 232a.
  • The above reading program may be recorded on a processor-readable recording medium; as the recording medium, a "non-transitory tangible medium" such as a tape, a disk, a card, a semiconductor memory, or a programmable logic circuit can be used. The reading program may also be supplied to the processor via any transmission medium (a communication network, broadcast waves, or the like) capable of transmitting it.
  • The present invention can also be realized in the form of a data signal embedded in a carrier wave, in which the above reading program is embodied by electronic transmission.
  • The above reading program can be implemented using, for example, a script language such as ActionScript or JavaScript (registered trademark), an object-oriented programming language such as Objective-C or Java (registered trademark), or a markup language such as HTML5.
  • Reference signs: 1 reading system, 100 wearing tool, 110 wearable glass, 111 imaging unit, 112 communication I/F, 130 earphone, 150 controller, 151 communication I/F, 152 communication unit, 153 storage unit, 154 input unit, 155 control unit, 156 output unit, 200 server, 210 communication unit, 220 storage unit, 230 control unit, 231 extraction unit, 232 conversion unit.

Abstract

This read-out system is equipped with: an imaging unit which is provided to a wearable device that is used worn on the body of a user, and which captures images of the forward direction of the user; an extraction unit which extracts written characters from the images captured by the imaging unit; a conversion unit which converts the characters extracted by the extraction unit into voice audio; an output unit which is provided to the wearable device and outputs the voice audio; an input unit which is provided to the wearable device and receives input from the user; and a control unit which, on the basis of the input from the user received via the input unit, controls the playback speed of the voice audio output from the output unit.

Description

読み上げシステム及び読み上げ方法Reading system and reading method
 本発明は、文章を音声に変換して読み上げる読み上げシステム及び読み上げ方法に関する。 The present invention relates to a reading system and a reading method for converting sentences into speech and reading them.
 近年、弱視者や文字を読むことが困難な読字障害者の視認を支援する機器の開発が行われている。例えば、特許文献1には、ロービジョン者が屋外で夜間等にも歩行ができるように、前方視界を撮像し表示することのできるウェアラブルディスプレイが開示されている。特許文献1のロービジョン者用ウェアラブルディスプレイによれば、撮像した画像のコントラスト及び明るさを変換して表示している。また、撮像画像に文字があった場合に文字認識処理を行ってその文字をユーザに音声で知らせることも開示している。 2. Description of the Related Art In recent years, devices have been developed to support visual recognition of people with low vision or people with reading disabilities who have difficulty reading letters. For example, Patent Document 1 discloses a wearable display capable of imaging and displaying a front view so that a low vision person can walk outdoors at night or the like. According to the low-vision person wearable display of Patent Document 1, the contrast and the brightness of the captured image are converted and displayed. It also discloses that when there is a character in a captured image, character recognition processing is performed to notify the user of the character by voice.
特開2014-165616号公報JP 2014-165616 A
 ところで、上記特許文献1に記載のロービジョン者用ウェアラブルディスプレイにおいては、文字認識処理によって、その文字をロービジョン者にスピーカにより伝達するとのみ記載しており具体的にどのように音声を伝えるかについては開示がない。また、ユーザによって音の聞こえ方は異なるため、特許文献1に記載のロービジョン者用ウェアラブルディスプレイの場合、ユーザビリティに欠けるという問題がある。 By the way, in the low-vision person wearable display described in Patent Document 1 described above, it is described only that the letter is transmitted to the low-vision person through the speaker by the character recognition processing, and specifically how the speech is transmitted There is no disclosure. In addition, since the way of hearing the sound differs depending on the user, in the case of the wearable display for low vision persons described in Patent Document 1, there is a problem that the usability is lacking.
 そこで、本発明は上記問題に鑑みて成されたものであり、使用するユーザにとって上記特許文献1に記載のロービジョン者用ウェアラブルディスプレイよりも利便性に優れた読み上げシステムを提供することを目的とする。 Therefore, the present invention has been made in view of the above problems, and it is an object of the present invention to provide a reading system that is more convenient for the user to use than the low vision wearable display described in Patent Document 1 above. Do.
 上記課題を解決するために、本発明の一態様に係る読み上げシステムは、ユーザが身に着けて使用する装着具に備えられ、ユーザの正面方向を撮像する撮像部と、撮像部が撮像した画像から文字を抽出する抽出部と、抽出部が抽出した文字を音声に変換する変換部と、装着具に備えられ、音声を出力する出力部と、装着具に備えられ、ユーザからの入力を受け付ける入力部と、装着具に備えられ、入力部を介して受け付けたユーザからの入力に基づいて、出力部から出力される音声の再生速度を制御する制御部とを備える。 In order to solve the above-mentioned subject, a reading system concerning one mode of the present invention is equipped with a wearing tool which a user wears and uses, and an image which an image pick-up part picturizes a user's front direction, and an image And a converter for converting characters extracted by the extracting unit into voice, an output unit for outputting voice and provided in the mounting tool, and a mounting tool for receiving an input from the user An input unit, and a control unit provided in the mounting tool and controlling the reproduction speed of the sound output from the output unit based on the input from the user received via the input unit.
 上記課題を解決するために、本発明の一態様に係る読み上げ方法は、ユーザが身に着けて使用する装着具に備えられた撮像部により、ユーザの正面方向を撮像する撮像ステップと、撮像ステップにおいて撮像した画像から文字を抽出する抽出ステップと、抽出ステップにおいて抽出した文字を音声に変換する変換ステップと、装着具に備えられた出力部から音声を出力する出力ステップと、装着具に備えられた入力部からユーザからの入力を受け付ける入力ステップと、入力部を介して受け付けたユーザからの入力に基づいて、出力部から出力する音声の再生速度を制御する制御ステップとを含む。 In order to solve the above problems, the reading method according to an aspect of the present invention includes an imaging step of imaging a front direction of a user by an imaging unit provided in a mounting tool worn and used by the user; An extraction step of extracting characters from the image captured in step, a conversion step of converting the characters extracted in the extraction step into voice, an output step of outputting voice from an output unit provided in the wearing tool, and The method further includes an input step of receiving an input from the user from the input unit, and a control step of controlling a reproduction speed of the sound output from the output unit based on the input from the user received through the input unit.
 また、上記読み上げシステムにおいて、制御部は、入力部を介して受け付けたユーザからの入力に基づいて、出力部から出力される音声を一時停止させることとしてもよい。 Further, in the above-mentioned reading system, the control unit may pause the sound output from the output unit based on the input from the user received via the input unit.
 また、上記読み上げシステムにおいて、制御部は、入力部を介して受け付けたユーザからの入力に基づいて、出力部から出力した音声を繰り返し再生することとしてもよい。 In the above-mentioned reading system, the control unit may repeatedly reproduce the voice output from the output unit based on the input from the user received via the input unit.
 また、上記読み上げシステムにおいて、出力部は、変換部が文字を音声に変換している間、文字の変換処理中であることを示す音を出力することとしてもよい。 Further, in the above-mentioned reading system, the output unit may output a sound indicating that the character conversion process is in progress while the conversion unit converts the characters into speech.
 また、上記読み上げシステムにおいて、読み上げシステムは、抽出部と変換部とを備えるサーバを含み、装着具は、撮像部が撮像した画像をサーバに送信する送信部と、変換部が変換した音声を受信する受信部とを備え、出力部は、送信部が画像を送信してから受信部が音声を受信するまでの間、文字の変換処理中であることを示す音を出力することとしてもよい。 Further, in the reading system, the reading system includes a server including an extraction unit and a conversion unit, and the mounting tool receives a transmission unit that transmits an image captured by the imaging unit to the server, and a voice converted by the conversion unit. The output unit may output a sound indicating that character conversion processing is in progress, from when the transmission unit transmits an image to when the reception unit receives an audio.
 また、上記読み上げシステムにおいて、装着具は、周囲の環境に関する環境情報を取得する取得部を備え、撮像部は、環境情報に基づいて撮像条件を変更することとしてもよい。 Further, in the above-mentioned reading system, the mounting tool may include an acquisition unit for acquiring environment information on the surrounding environment, and the imaging unit may change the imaging condition based on the environment information.
 また、上記読み上げシステムにおいて、装着具は、撮像部が撮像している撮像画像中に文字があるか否かを判定する判定部を備え、出力部は、判定部が撮像画像中に文字が含まれると判定しているときに、撮像部による撮像方向に文字が存在することを示す音声を出力することとしてもよい。 In the reading system, the mounting tool includes a determination unit that determines whether or not there is a character in the captured image captured by the imaging unit, and the output unit includes the character in the captured image. When it is determined that the character is present, a sound indicating that a character is present in the imaging direction by the imaging unit may be output.
 また、上記読み上げシステムにおいて、読み上げシステムは、撮像部が撮像した撮像画像と、当該撮像画像に基づいて変換部が変換して得られた音声とを対応付けて、ユーザの情報処理端末に送信するログ送信部を備えることとしてもよい。 Further, in the reading system, the reading system associates the captured image captured by the imaging unit with the voice obtained by converting the converting unit based on the captured image, and transmits the associated image to the user's information processing terminal A log transmission unit may be provided.
 また、上記読み上げシステムにおいて、装着具は、自端末の位置を示す位置情報を取得する位置情報取得部を備え、撮像部は、撮像した撮像画像に、撮像したときに位置情報が取得した位置情報を対応付け、ログ送信部は、撮像画像と音声と共に位置情報をユーザの情報処理端末に送信することとしてもよい。 In the reading system, the mounting tool includes a position information acquisition unit that acquires position information indicating the position of the own terminal, and the imaging unit acquires position information acquired when the imaging unit captures an image of the captured image. The log transmission unit may transmit position information to the user's information processing terminal together with the captured image and the sound.
 本発明の一態様に係る読み上げシステムは、文字から変換された音声の読み上げの速度を自由に調節することができる。したがって、ユーザビリティに優れた読み上げシステムを提供することができる。 The reading system according to an aspect of the present invention can freely adjust the speed of reading speech converted from characters. Therefore, it is possible to provide a reading system excellent in usability.
(a)は、装着具を装着しているユーザの外観例を示す図であり、(b)は、装着具を用いて撮像を行って読み上げを行う外観例を示す図である。(A) is a figure which shows the example of an external appearance of the user who mounts | wears with a mounting tool, (b) is a figure which shows the example of an external appearance which images by using a mounting tool and it reads aloud. 読み上げシステムのシステム構成例を示す図である。It is a figure showing an example of system configuration of a reading system. (a)は、装着具がサーバに送信するデータの構成例を示す図であり、(b)は、サーバがユーザ毎に記憶する読み上げ音声情報の構成例を示す図である。(A) is a figure which shows the structural example of the data which a mounting tool transmits to a server, (b) is a figure which shows the structural example of the read-out audio | voice information which a server memorize | stores for every user. 装着具とサーバとのやり取りを示すシーケンス図である。It is a sequence diagram showing exchange with a mounting tool and a server. 装着具の動作を示すフローチャートである。It is a flowchart which shows operation | movement of a mounting tool. サーバの動作を示すフローチャートである。It is a flowchart which shows operation | movement of a server. 画像から優先的に文字を抽出する範囲例を示す図である。FIG. 6 is a diagram illustrating an example of a range in which characters are preferentially extracted from an image. 地図を利用した読み上げ音声の再生を行うための画面例を示す図である。It is a figure which shows the example of a screen for reproducing the reading audio | voice using a map. 読み上げシステムのシステム構成の別例を示す図である。It is a figure which shows another example of a system configuration of a reading system.
 Hereinafter, a reading system according to one embodiment of the present invention will be described in detail with reference to the drawings.
<Embodiment>
<Configuration>
 FIG. 1(a) is a view showing an example of the appearance of a user wearing the wearing tool 100 of the reading system 1. FIG. 1(b) is a view showing an example of the appearance of imaging and reading aloud being performed using the wearing tool 100. FIG. 2 is a view showing an example of the system configuration of the reading system 1.
 As shown in FIGS. 1 and 2, the reading system 1 includes: an imaging unit 111 that is provided in the wearing tool 100 worn and used by a user and that images the front direction of the user; an extraction unit 231 that extracts characters from the image captured by the imaging unit 111; a conversion unit 232 that converts the characters extracted by the extraction unit 231 into speech; an output unit 156 that is provided in the wearing tool 100 and outputs the speech; an input unit 154 that is provided in the wearing tool 100 and receives input from the user; and a control unit 155 that is provided in the wearing tool 100 and controls the reproduction speed of the speech output from the output unit 156 based on the input received from the user via the input unit 154. The reading system 1 is described in detail below.
 As shown in FIGS. 1(a) and 1(b), the user 10 wears and uses the wearable glass 110. On the wearable glass 110, an imaging unit 111 is arranged at a position from which it can image the front direction of the user in accordance with an instruction from the user. The imaging unit 111 is a so-called camera. The wearable glass 110 is connected to the controller 150. In FIG. 1(a), the wearable glass 110 is shown connected by way of the earphone 130 via the cord 120 and the cable 140; however, the wearable glass 110 and the controller 150 may be directly connected, in the same way as the earphone 130 and the controller 150. The user 10 wears the earphone 130 in the ear and can listen to the read-out speech transmitted from the controller 150. As shown in FIG. 1(a), the user 10 holds the controller 150 and can use it to issue an imaging instruction and instructions related to reproduction of the read-out speech. As shown in FIG. 1(b), when the user issues an imaging instruction, the imaging unit 111 images the imaging range 160. The characters included in the imaging range 160 are then recognized, converted into machine-synthesized speech, and read aloud. The reading system 1 can therefore provide information on characters that are difficult to read to a person with low vision or the like.
 FIG. 2 shows an example of the system configuration of the reading system 1. The reading system 1 includes the wearing tool 100 and a server 200, which are configured to be able to communicate with each other via a network 300. The wearing tool 100 communicates with the network 300 by wireless communication; any communication protocol may be used as long as wireless communication can be performed. The server 200 also communicates with the network, by either wireless or wired communication, and likewise any communication protocol may be used as long as communication can be performed.
 As shown in FIG. 2, the wearing tool 100 includes the wearable glass 110, the earphone 130, and the controller 150. That is, in the present embodiment, as shown in FIG. 2, the wearable glass 110, the earphone 130, and the controller 150 are collectively referred to as the wearing tool 100. Although a wearable glass 110 is used here, it goes without saying that the device is not limited to eyeglasses as long as it can image the front direction (viewing direction) of the user 10.
 The wearable glass 110 includes the imaging unit 111 and a communication I/F 112.
 The imaging unit 111 is a camera capable of imaging the front direction of the user. The imaging unit 111 performs imaging upon receiving an imaging signal passed on from the communication I/F 112. The imaging unit 111 may be provided anywhere on the wearable glass 110 as long as it can image the front direction of the user: FIG. 1 shows an example in which it is provided at the left hinge of the wearable glass, but it may instead be provided at the right hinge or at the bridge. The imaging unit 111 transfers the captured image obtained by imaging to the communication I/F 112. The imaging unit 111 may also have a detection function that, while imaging sequentially, analyzes the captured images and detects the presence or absence of characters in them; in that case, when it determines that a captured image contains characters, it transfers to the communication I/F 112 a presence signal indicating that characters are present in the front direction of the user.
 The communication I/F 112 is a communication interface having a function of communicating with the controller 150. The communication I/F 112 is communicably connected to the communication I/F 151 of the controller 150. Here, as shown in FIG. 1, the connection is assumed to be wired, but it may be wireless. The communication I/F 112 transfers the imaging signal passed on from the communication I/F 151 of the controller 150 to the imaging unit 111. The communication I/F 112 also transfers to the communication I/F 151 the captured image passed on from the imaging unit 111 and the presence signal indicating that characters are present in the front direction of the user.
 The earphone 130 is connected to the output unit 156 of the controller 150 and has a function of outputting the audio signal transmitted from the output unit 156 as sound. Here, as shown in FIG. 1, the earphone 130 is connected to the controller 150 by wire, but the connection may be wireless. The earphone 130 outputs the read-out speech of characters detected from the captured image, a sound indicating that characters are being analyzed, and a sound indicating that characters are present in the front direction of the imaging unit 111.
 The controller 150 includes the communication I/F 151, a communication unit 152, a storage unit 153, the input unit 154, the control unit 155, and the output unit 156. As shown in FIG. 1, the units of the controller 150 are connected to one another by a bus.
 The communication I/F 151 is a communication interface having a function of communicating with the communication I/F 112 of the wearable glass 110. Upon receiving an imaging signal from the control unit 155, the communication I/F 151 transfers the imaging signal to the communication I/F 112. Upon receiving a captured image or a presence signal from the communication I/F 112, the communication I/F 151 passes it on to the control unit 155.
 The communication unit 152 is a communication interface having a function of communicating with the server 200 via the network 300. In accordance with instructions from the control unit 155, the communication unit 152 functions as a transmission unit that transmits the captured image to the server 200, and as a reception unit that receives from the server 200 the read-out speech obtained by converting the characters included in the captured image into speech. When the communication unit 152 receives read-out speech from the server 200, it passes the read-out speech on to the control unit 155.
 The storage unit 153 has a function of storing the various programs and data that the controller 150 needs in order to function. The storage unit 153 can be realized by, for example, an HDD (Hard Disc Drive), an SSD (Solid State Drive), or a flash memory, but is not limited to these. The storage unit 153 stores the reading program executed by the control unit 155, captured images captured by the imaging unit 111, the read-out speech information received by the communication unit 152, and the like. The storage unit 153 also stores audio information for the sound that the output unit 156 outputs between the time the communication unit 152 transmits a captured image to the server 200 and the time it receives the read-out speech, indicating that characters are being converted into speech. Furthermore, the storage unit 153 stores audio information for notifying the user 10 that characters are present in the imaging direction of the imaging unit 111.
 The input unit 154 has a function of receiving input from the user 10. The input unit 154 can be realized by, for example, hard keys provided on the controller 150, but it may also be realized by a touch panel or the like. The input unit 154 may include an imaging button 154A with which the user 10 instructs imaging, a playback button 154B for instructing playback, pause, and replay, and an adjustment button 154C for adjusting the playback speed of the speech. In response to each button press, the input unit 154 transfers a signal indicating which button was pressed to the control unit 155.
 The control unit 155 is a processor having a function of controlling each unit of the controller 150. By executing the various programs stored in the storage unit 153, the control unit 155 performs the functions to be executed as the controller 150.
 When an imaging instruction is passed on from the input unit 154, the control unit 155 instructs the communication I/F 151 to transmit an imaging signal to the wearable glass 110.
 When a captured image is passed on from the communication I/F 151, the control unit 155 instructs the communication unit 152 to transmit the captured image to the server 200. After that instruction, the control unit 155 reads from the storage unit 153 the sound indicating that the characters included in the captured image are being converted into speech, and instructs the output unit 156 to output it.
 When read-out speech is passed on from the communication unit 152, the control unit 155 instructs the output unit 156 to stop outputting the sound indicating that conversion is in progress. The control unit 155 then instructs the output unit 156 to output the read-out speech.
 When a presence signal is passed on from the communication I/F 151, the control unit 155 reads from the storage unit 153 the sound indicating that characters are present in the front direction of the user 10, and instructs the output unit 156 to output it.
 The control unit 155 also executes reproduction control processing of the read-out speech in accordance with instructions from the user 10 passed on from the input unit 154. For example, when a pause instruction is received, it instructs the output unit 156 to pause reproduction of the read-out speech.
 Also, for example, when a slow playback instruction is received, the control unit 155 instructs the output unit 156 to perform slow playback of the read-out speech. The slow playback instruction can be replaced by a playback speed adjustment instruction, and the control unit 155 can also make the playback speed of the read-out speech faster or slower. When a replay instruction is received, the control unit 155 instructs the output unit 156 to reproduce once more the read-out speech that has been output so far.
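 By way of illustration, the following is a minimal Python sketch of how such reproduction control might map input signals to pause, slow playback, speed adjustment, and replay. The AudioPlayer class and its members are assumptions introduced for this sketch only and are not part of the embodiment.

class AudioPlayer:
    """Illustrative stand-in for the output unit 156 under control of the control unit 155."""
    def __init__(self):
        self.speed = 1.0        # 1.0 = normal playback speed
        self.paused = False
        self.last_clip = None   # most recently played read-out speech

    def play(self, clip, speed=None):
        self.last_clip = clip
        if speed is not None:
            self.speed = speed
        self.paused = False
        # actual audio output to the earphone 130 would happen here

    def handle_input(self, signal):
        # 'signal' corresponds to a button press passed on by the input unit 154
        if signal == "pause":
            self.paused = True          # pause until playback is instructed again
        elif signal == "slow":
            self.speed = 0.5            # slow playback
        elif signal == "fast":
            self.speed = 1.5            # faster playback, e.g. to skim the content
        elif signal == "replay" and self.last_clip is not None:
            self.play(self.last_clip)   # replay the last read-out speech

 For example, handle_input("slow") corresponds to the slow playback instruction above, and handle_input("replay") to the replay instruction.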
 The output unit 156 has a function of outputting the audio signal instructed by the control unit 155 to the earphone 130. The output unit 156 outputs to the earphone 130 the read-out speech, the sound indicating that characters are being converted into speech, and the sound indicating that characters are present in the front direction of the user 10.
 This concludes the description of the configuration of the wearing tool 100.
 Next, the server 200 will be described. As shown in FIG. 2, the server 200 includes a communication unit 210, a storage unit 220, and a control unit 230, which are connected to one another via a bus.
 The communication unit 210 is a communication interface having a function of communicating with the wearing tool 100 (controller 150) via the network 300. In accordance with instructions from the control unit 230, the communication unit 210 functions as a transmission unit that transmits the read-out speech to the wearing tool 100, and as a reception unit that receives captured images. When the communication unit 210 receives a captured image from the wearing tool 100, it passes the captured image on to the control unit 230.
 The storage unit 220 stores the various programs and data that the server 200 needs for its operation. The storage unit 220 can be realized by, for example, an HDD (Hard Disc Drive), an SSD (Solid State Drive), or a flash memory, but is not limited to these. The storage unit 220 stores a character recognition program for extracting characters from an image, a speech conversion program for converting the recognized characters into speech, and read-out speech information. The read-out speech information will be described in detail later.
 The control unit 230 is a processor having a function of controlling each unit of the server 200. By executing the various programs stored in the storage unit 220, the control unit 230 performs the functions to be executed as the server 200. The control unit 230 functions as the extraction unit 231 by executing the character recognition program, and as the conversion unit 232 by executing the speech conversion program.
 The extraction unit 231 has a function of analyzing a captured image and extracting the characters included in it. Existing character recognition processing can be used for this analysis.
 The conversion unit 232 has a function of converting the characters extracted by the extraction unit 231 into speech (read-out speech). Existing conversion processing can be used for this conversion.
 This concludes the description of the configuration of the server 200.
<Data>
 FIG. 3 is a view showing an example of the data configuration of the data involved in the reading system 1.
 FIG. 3(a) is a view showing an example of the data configuration (format) of the transmission data 310 (captured image) that the wearing tool 100 (controller 150) transmits to the server 200.
 As shown in FIG. 3(a), the transmission data 310 is information in which a user ID 311, captured image information 312, and imaging time information 313 are associated with one another.
 The user ID 311 is identification information that uniquely identifies the user 10 who uses the wearing tool 100. This allows the server 200 to identify which user a captured image came from, and to manage captured images and generated read-out speech for each user.
 The captured image information 312 is information indicating the actual data of the captured image captured by the imaging unit 111.
 The imaging time information 313 is information indicating the date and time at which the captured image indicated by the captured image information 312 was captured. Although not shown, this information can be acquired from the internal clock of the imaging unit 111.
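 As an illustration of this format, the following Python sketch models the transmission data 310 as a structure with the three associated fields. The JSON serialization is an assumption, since the embodiment does not prescribe a particular wire format.

import base64
import json
from dataclasses import dataclass
from datetime import datetime

@dataclass
class TransmissionData:
    user_id: str           # user ID 311: uniquely identifies the user 10
    image: bytes           # captured image information 312: actual image data
    captured_at: datetime  # imaging time information 313: date and time of imaging

    def to_json(self) -> str:
        return json.dumps({
            "user_id": self.user_id,
            "image": base64.b64encode(self.image).decode("ascii"),
            "captured_at": self.captured_at.isoformat(),
        })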
 FIG. 3(b) is a view showing an example of the data configuration of the read-out speech information stored in the storage unit 220 of the server 200 and managed for each user of the reading system 1. This data is information for managing the read-out speech that the server 200 has obtained through past conversions.
 As shown in FIG. 3(b), the read-out speech information 320 is information in which imaging time information 321, captured image information 322, and read-out speech 323 are associated with one another.
 The imaging time information 321 is information indicating the date and time at which the corresponding captured image was captured, and is the same information as the imaging time information 313.
 The captured image information 322 is information indicating the actual data of the captured image, and is the same information as the captured image information 312.
 The read-out speech 323 is actual data representing the read-out speech obtained by the extraction unit 231 extracting characters from the corresponding captured image information 322 and the conversion unit 232 converting those characters.
 The read-out speech information 320 enables the server 200 to manage past read-out speech.
 This concludes the description of the information mainly involved in the reading system 1.
<Operation>
 The operation of the reading system 1 will now be described. First, the overall operation of the reading system 1 is explained using the sequence diagram shown in FIG. 4, after which the detailed operations of the wearing tool 100 and the server 200 are explained using the flowcharts of FIGS. 5 and 6, respectively.
 FIG. 4 is a sequence diagram showing the exchange between the wearing tool 100 and the server 200. As shown in FIG. 4, the wearing tool 100 performs imaging in the front direction of the user 10 (step S401). The wearing tool 100 then transmits the obtained captured image to the server 200 (step S402).
 The server 200 receives the captured image transmitted from the wearing tool 100 (step S403). The server 200 then extracts characters from the received captured image (step S404), converts the extracted characters into speech, and generates the read-out speech (step S405). Having generated the read-out speech, the server 200 transmits it to the wearing tool 100 (step S406).
 The wearing tool 100 receives the read-out speech transmitted from the server 200 (step S407) and outputs it (step S408). In this way, the reading system 1 can recognize characters present in the front direction (viewing direction) of the user 10 and convey them to the user 10 by sound.
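 From the wearing tool's side, this exchange could be sketched in Python as follows; the endpoint URL and the use of the requests library are assumptions for illustration only.

import requests

def read_aloud(image_bytes: bytes, user_id: str) -> bytes:
    # S402: transmit the captured image to the server
    resp = requests.post(
        "https://server.example/read",   # hypothetical endpoint of the server 200
        files={"image": image_bytes},
        data={"user_id": user_id},
        timeout=30,
    )
    resp.raise_for_status()
    # S407: the response body is the read-out speech (audio data)
    return resp.content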
 FIG. 5 is a flowchart showing the operation of the wearing tool 100.
 First, the input unit 154 of the wearing tool 100 determines whether there has been input from the user, based on whether any of the buttons has been pressed (step S501). If there has been input from the user (YES in step S501), the process proceeds to step S502; if not (NO in step S501), the process proceeds to step S512.
 In step S502, the control unit 155 determines whether the input received by the input unit 154 was an imaging instruction (step S502). If the input was an imaging instruction (YES in step S502), the process proceeds to step S503; if not (NO in step S502), the process proceeds to step S506.
 In step S503, when the input unit 154 receives an imaging instruction from the user, the imaging instruction is passed on to the control unit 155. In response, the control unit 155 instructs the communication I/F 151 to transfer an imaging signal to the wearable glass 110. The communication I/F 151 transfers the imaging signal to the communication I/F 112 in accordance with that instruction. The communication I/F 112 then transfers the imaging signal to the imaging unit 111, and the imaging unit 111 executes imaging (step S503).
 The imaging unit 111 transfers the obtained captured image to the communication I/F 112, which transfers it to the communication I/F 151. The communication I/F 151 passes the captured image on to the control unit 155, and the control unit 155 instructs the communication unit 152 to transmit it to the server 200. In response, the communication unit 152 transmits the captured image to the server 200 via the network 300 (step S504).
 After the captured image has been transmitted, the control unit 155 reads from the storage unit 153 the sound indicating that the characters in the captured image are being converted into speech, and instructs the output unit 156 to output it. In response, the output unit 156 outputs the sound to the earphone 130, the earphone 130 plays it (step S505), and the process returns to step S501. By playing a sound indicating that the characters included in the captured image are being converted into speech, the user 10 can recognize that the conversion process is currently underway, and can wait without irritation compared with a case in which no sound is played (and the user 10 receives no notification at all).
 On the other hand, if it is determined in step S502 that the input was not an imaging instruction (NO in step S502), it is determined whether the input was an instruction to pause the speech (step S506). If the input was a pause instruction (YES in step S506), the control unit 155 instructs the output unit 156 to pause the speech being output. Upon receiving the instruction, the output unit 156 stops outputting the speech (step S507), and the process returns to step S501. The pause continues until a new playback instruction is input or until a complete stop instruction is input.
 If it is determined in step S506 that the input was not a pause instruction (NO in step S506), it is determined whether the input was a slow playback instruction (step S508). If the input was a slow playback instruction (YES in step S508), the control unit 155 instructs the output unit 156 to play the speech being output slowly. In response, the output unit 156 starts slow playback of the speech (step S509), and the process returns to step S501. This allows even a user who has difficulty following fast speech to recognize the speech correctly. Although slow playback is used as the example here, the playback speed may instead be increased, as described above. Increasing the playback speed helps shorten the time needed to grasp the outline of the characters included in the captured content.
 If it is determined in step S508 that the input was not a slow playback instruction (NO in step S508), it is determined whether the input was a playback or replay input (step S510). If the input was not a playback or replay input (NO in step S510), the process returns to step S501. If it was (YES in step S510), the control unit 155 instructs the output unit 156 to resume output of the paused speech or to replay speech that has already been output. In response, the output unit 156 resumes or replays the speech (step S511), and the process returns to step S501. This allows the user 10 to listen to the read-out speech again even after missing it.
 If there is no input from the user in step S501 (NO in step S501), the control unit 155 determines whether read-out speech has been received from the server 200 (step S512). If no read-out speech has been received (NO in step S512), the process returns to step S501.
 If read-out speech has been received (YES in step S512), the control unit 155 first instructs the output unit 156 to stop outputting the sound indicating that characters are being converted into speech. Upon receiving the instruction, the output unit 156 stops outputting that sound (step S513).
 The control unit 155 then instructs the output unit 156 to output the read-out speech passed on from the communication unit 152. The output unit 156 starts outputting the read-out speech passed on from the control unit 155 (step S514), and the process returns to step S501.
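 Gathering the branches of this flowchart together, a compact Python sketch of the controller loop might look as follows. The helper objects (input_unit, comm, output_unit, player) are assumptions standing in for the units described above, with player as in the earlier playback sketch.

def controller_loop(input_unit, comm, output_unit, player):
    while True:
        signal = input_unit.poll()                     # S501: check for user input
        if signal == "capture":                        # S502-S505
            image = comm.request_capture()             # S503: imaging via the wearable glass
            comm.send_to_server(image)                 # S504: transmit to the server 200
            output_unit.play_converting_tone()         # S505: "converting" sound
        elif signal == "pause":                        # S506-S507
            player.handle_input("pause")
        elif signal == "slow":                         # S508-S509
            player.handle_input("slow")
        elif signal in ("play", "replay"):             # S510-S511: resume or replay
            player.handle_input("replay")
        else:
            speech = comm.receive_speech()             # S512: read-out speech received?
            if speech is not None:
                output_unit.stop_converting_tone()     # S513
                player.play(speech)                    # S514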
 This concludes the description of the operation of the wearing tool 100 (controller 150).
 FIG. 6 is a flowchart showing the operation of the server 200 when it receives a captured image from the wearing tool 100.
 First, the communication unit 210 of the server 200 receives a captured image from the wearing tool 100 via the network 300 (step S601). The communication unit 210 passes the received captured image on to the control unit 230.
 The control unit 230, acting as the extraction unit 231, analyzes the passed-on captured image and extracts characters (step S602). The extraction unit 231 passes the extracted character string on to the conversion unit 232.
 The conversion unit 232 converts the extracted character string into speech (step S603) and generates the read-out speech, which is machine-synthesized speech. The conversion unit 232 passes the generated read-out speech on to the communication unit 210.
 The communication unit 210 transmits the converted synthesized speech to the wearing tool 100 via the network 300 as the read-out speech (step S604).
 After that, the control unit 230 registers the received captured image, the date and time at which it was captured, and the read-out speech obtained from it in the read-out speech information as the captured image information 322, the imaging time information 321, and the read-out speech 323, respectively (step S605), and the processing ends.
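 As an illustration of this server-side flow, the following Python sketch uses pytesseract for character extraction and gTTS for speech conversion. These particular libraries are assumptions made for the sketch; the embodiment only requires that existing character recognition and speech conversion processing be used.

import io
from datetime import datetime

from PIL import Image
import pytesseract            # stands in for the character recognition program
from gtts import gTTS         # stands in for the speech conversion program

read_out_log = {}             # per-user read-out speech information 320

def handle_image(user_id: str, image_bytes: bytes, captured_at: datetime) -> bytes:
    # S602: extract characters from the received captured image
    text = pytesseract.image_to_string(Image.open(io.BytesIO(image_bytes)))
    # S603: convert the extracted character string into read-out speech
    buf = io.BytesIO()
    gTTS(text).write_to_fp(buf)
    speech = buf.getvalue()
    # S605: register image, time, and speech as read-out speech information
    read_out_log.setdefault(user_id, []).append((captured_at, image_bytes, speech))
    return speech             # S604: returned for transmission to the wearing tool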
 This concludes the operation of the server 200. By executing the operations described above, the reading system 1 does not merely read recognized characters aloud, but can reproduce the speech in a way that is easy for the user to listen to.
<Summary>
 The reading system 1 can recognize characters included in a captured image and output them as speech. In the reading system 1, the user can perform operations such as slow playback, pause, and replay on the read-out speech, so each user can reproduce the speech in the way that is easiest for them to hear. It is therefore possible to provide a reading system that is highly convenient for the user. Furthermore, while the reading system 1 is executing the process of generating read-out speech from a captured image, it plays a sound indicating that processing is in progress, allowing the user 10 to recognize the situation.
<Supplement>
 It goes without saying that the reading system according to the above embodiment is not limited to that embodiment and may be realized by other methods. Various modifications are described below.
 (1) In the above embodiment, speech is output using the earphone 130; however, the wearable glass 110 or the controller 150 may be provided with a speaker, and the output unit 156 may output the read-out speech from that speaker. With this configuration, even a user who finds wearing the earphone 130 uncomfortable can listen to the read-out speech. This case also has the advantage that multiple users can listen to the speech at the same time.
 (2) In the above embodiment, an example was shown in which the wearing tool 100 includes the wearable glass 110, the earphone 130, and the controller 150, each configured as a separate device. However, this is not a limitation, and the wearable glass 110, the earphone 130, and the controller 150 may be formed as a single unit. That is, the wearable glass 110 may include a speaker in place of the sound output function of the earphone 130 and may hold the functions of the controller 150. For example, the temple portion of the wearable glass 110 may have a hollow structure housing the processor, memory, communication module, and so on of the controller 150. Various buttons for speech playback control and imaging instructions may then be arranged on the outer side of the temple or rim of the wearable glass 110.
 (3) In the above embodiment, the wearing tool 100 and the server 200 were described as separate devices, but the wearing tool 100 may include the functions of the server 200 (the functions of the extraction unit and the conversion unit). For example, the controller 150 may be configured to include a chip that realizes the functions of the server 200. With this configuration, the wearing tool 100 can constitute a standalone reading system. It also suppresses the latency involved in transmitting captured images and receiving read-out speech.
 (4) In the above embodiment, the range from which characters are extracted from a captured image is determined in advance, but this is not the only option. For example, the wearable glass 110 may be provided with a camera that images the user's eyes; the gaze direction is detected, a predetermined range centered on that gaze direction is mapped onto the captured image, and characters within that predetermined range are detected. For example, the wearable glass 110 passes on to the controller 150 a first captured image captured by the imaging unit 111 and a second captured image of the user's eyes, and the controller 150 transmits both images to the server 200. The extraction unit 231 of the server 200 may be configured to identify the gaze direction of the user 10 from the second captured image, identify a predetermined range including that gaze direction, and extract characters from the portion of the first captured image corresponding to that predetermined range.
 (5) In the above embodiment, the imaging unit 111 performs imaging upon receiving an imaging instruction input to the controller 150, but the imaging trigger is not limited to this. For example, the wearable glass 110 or the controller 150 may be provided with a microphone that picks up the user's voice, and imaging may be performed based on a specific word uttered by the user. That is, imaging may be triggered by voice input.
 Alternatively, the wearable glass 110 may be provided with a camera that images the user's eyes, and a blink of the user's eyes may be used as the imaging trigger.
 (6) In the above embodiment, the input unit 154 is provided on the controller 150, but this is not a limitation; it may instead be provided partway along the cable 140.
 (7) Although not specifically described in the above embodiment, the reading system 1 may include a setting unit that can set the language of the read-out speech. It may then include a translation unit that translates the characters extracted by the extraction unit 231 into the language set in the setting unit, and the conversion unit 232 may convert the characters translated by the translation unit into speech. With this configuration, the reading system 1 can function as an interpretation system for written characters, making it useful not only for people with low vision but also for users in a foreign country.
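 A Python sketch of this pipeline order (extract, then translate, then convert) is shown below. The translate callable is deliberately left abstract, since the embodiment allows any existing machine translation processing to serve as the translation unit.

from typing import Callable

def read_aloud_translated(extract: Callable[[bytes], str],
                          translate: Callable[[str, str], str],
                          synthesize: Callable[[str], bytes],
                          image: bytes,
                          target_lang: str) -> bytes:
    text = extract(image)                      # extraction unit 231
    translated = translate(text, target_lang)  # translation unit of this supplement
    return synthesize(translated)              # conversion unit 232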
 (8) Although not specifically described in the above embodiment, the extraction unit 231 may restrict the range from which characters are extracted to a predetermined range rather than the entire captured image. FIG. 7 shows an example of a captured image 700; the extraction unit 231 may treat only a predetermined range 710 of this captured image 700 as the range from which characters are extracted. Alternatively, the predetermined range 710 may be treated as the range from which characters are preferentially extracted. Extracting characters preferentially from a range means first treating that range as the extraction range, and extracting characters from outside the predetermined range 710 only when no characters could be extracted from within it.
 Here, the predetermined range 710 may be set by the user of the reading system 1. In general, users tend to look in a direction slightly below straight ahead, so it is effective to set the predetermined range 710 toward the lower part of the captured image 700.
 The predetermined range 710 may also be set by the control unit 230. Specifically, for the large number of captured images received by the server 200, the ranges from which characters could be extracted are identified, and their average range may be used as the predetermined range 710 for extracting characters.
 Furthermore, the wearable glass 110 may be provided with various sensors, and the predetermined range 710 may be determined based on the sensing data obtained from them. For example, a gyro sensor may be mounted on the wearable glass 110, and the wearing tool 100 transmits the sensing data of the gyro sensor to the server 200 together with the captured image. The extraction unit 231 may then determine the predetermined range 710 based on the sensing data of the gyro sensor. For example, when it is estimated from the sensing data that the user 10 is looking somewhat downward, the predetermined range 710 may be set toward the lower part of the captured image 700.
 By not analyzing the entire captured image 700, the time required for the conversion into speech can be shortened.
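 By way of illustration, preferential extraction could be sketched in Python as follows. The fixed lower-centered crop box is an assumption; as described above, the range could instead come from user settings, usage statistics, or sensing data.

from PIL import Image
import pytesseract

def extract_with_priority_range(img: Image.Image) -> str:
    w, h = img.size
    # assumed predetermined range 710, biased toward the lower part of the image
    box = (int(w * 0.1), int(h * 0.4), int(w * 0.9), int(h * 0.95))
    text = pytesseract.image_to_string(img.crop(box))
    if text.strip():
        return text                            # characters found within range 710
    return pytesseract.image_to_string(img)    # fall back to the whole image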
 (9) Although not specifically described in the above embodiment, the server 200 may be configured to transmit the corresponding read-out speech information 320 as a past log to an information processing apparatus such as a PC held by the user 10. With this configuration, the user 10 can listen to past read-out speech at any time.
 Furthermore, the wearing tool 100 may include a position information acquisition unit for acquiring position information indicating where the device itself is located. The position information acquisition unit can be realized by using, for example, GPS or GNSS.
 Each time the imaging unit 111 obtains a captured image, the position information acquisition unit acquires position information and associates it with the captured image. The wearing tool 100 transmits the captured image with the associated position information to the server 200. The server 200 may further associate and manage imaging position information indicating the imaging position as part of the read-out speech information 320.
 Then, since information including the position information is transmitted from the server 200 as the read-out speech information 320 to the information processing apparatus of the user 10, the information processing apparatus can further present the read-out speech together with a map application, as shown in FIG. 8. That is, the user 10 can see on the map when and where each read-out speech was acquired. By positioning the cursor 803 over log information 801 or 802 on the map and clicking, the information processing apparatus may play the read-out speech with audio playback software or the like. For example, as shown in the map 800 of FIG. 8, the presence of log information 801 and log information 802 makes it possible to see where the captured image underlying each read-out speech was taken.
 (10) Although not described in detail as an operation of the wearing tool 100 in the above embodiment, the imaging unit 111 may perform imaging sequentially and detect whether the obtained captured images contain characters. When it detects that characters are included, it notifies the controller 150, and the control unit 155 may play a sound for making the user 10 aware that characters are present in the current front direction. The user 10 can then input an imaging instruction to the input unit 154 at that timing. With this configuration, when the user 10 has low vision, and in particular is blind, and cannot visually confirm even that characters exist, the user 10 can be made aware of their presence, providing a reading system 1 that is highly convenient for the user 10.
 (11) Although not specifically described in the above embodiment, the imaging unit 111 may change the imaging conditions according to the environment in which the user (wearable glass 110) is placed. For example, the wearable glass 110 may include various sensors (for example, an illuminance sensor) and change the exposure time and the angle of view.
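 For example, choosing an exposure time from an illuminance reading could be sketched as follows; the thresholds and values are illustrative assumptions only.

def exposure_time_ms(lux: float) -> float:
    # longer exposure in dim environments, shorter in bright ones
    if lux < 50:       # dim indoor scene
        return 100.0
    if lux < 500:      # ordinary indoor lighting
        return 33.0
    return 8.0         # bright or outdoor scene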
 (12) Although not specifically described in the above embodiment, when the server 200 cannot extract characters from an image, cannot convert the extracted characters into speech, or the image contains no characters, it may transmit an error signal to the wearing tool 100, and upon receiving it the wearing tool 100 may output a sound indicating the error from the output unit 156. In addition to the error sound and the sound indicating that conversion is in progress described in the above embodiment, various other sounds may be stored in the storage unit 153, for example a startup sound for when the wearing tool 100 is activated, an imaging sound (shutter sound) for when the imaging unit 111 performs imaging, a sound indicating standby, and a cancel sound for when the user inputs a cancellation of processing; the control unit 155 may then have the output unit 156 output the sound corresponding to the state of the wearing tool 100. Also, when the communication unit 152 cannot communicate (cannot connect to the network), a sound indicating this may be output from the output unit 156. By outputting sounds corresponding to these various states, the wearing tool 100 can notify the user of the state of the device by sound alone.
 (13) Although not specifically described in the above embodiment, the server 200 may change the manner of the generated read-out speech according to the location in the captured image from which the characters were extracted, or according to the proportion of the captured image occupied by the range from which the characters were extracted.
 Changing the manner of the speech according to the location of extraction means changing the direction from which the user hears the speech according to where in the captured image the characters were extracted. For example, when the characters were extracted from the right side of the captured image, the output unit 156 may output the read-out speech so that it is heard from the user's right side. This configuration allows the user to intuitively sense in which direction, as seen from the user, the characters being read aloud are located.
 Changing the manner of the generated read-out speech according to the proportion of the captured image occupied by the extraction range may mean changing the volume of the read-out speech according to that proportion. That is, percentages of the proportion may be stored in association with output volumes for the read-out speech; the output volume is determined by matching the percentage of the captured image occupied by the range from which the characters were extracted, and the read-out speech is output at the determined volume.
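 By way of illustration, both variations could be sketched in Python as follows; the linear mappings from character location to stereo pan and from area ratio to volume are assumptions.

def speech_rendering(char_box, image_size):
    # char_box: (x0, y0, x1, y1) bounding box of the extracted characters
    # image_size: (width, height) of the captured image
    (x0, y0, x1, y1), (w, h) = char_box, image_size
    center_x = (x0 + x1) / 2
    pan = (center_x / w) * 2 - 1                 # -1 = heard from the left, +1 = from the right
    area_ratio = ((x1 - x0) * (y1 - y0)) / (w * h)
    volume = min(1.0, 0.3 + 0.7 * area_ratio)    # larger extracted range -> louder speech
    return pan, volume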
 (14) In the above embodiment, the transmission data 310 associates the user ID 311, the captured image information 312, and the imaging time information 313, but various other information may also be associated with it. For example, as shown in the supplements above, position information indicating where the wearing tool 100 is located, and sensing data from a gyro sensor, acceleration sensor, or the like capable of identifying the posture of the wearing tool 100, may also be associated.
 Likewise, although the read-out speech information associates the imaging time information 321, the captured image information 322, and the read-out speech 323, other information may also be associated with it, such as the text data of the characters obtained by analyzing the captured image, and the position information and sensing data included in the transmission data 310.
 By accumulating and aggregating more of these various kinds of information, the read-out speech information can be used as a life log for each user. The server 200 may then include a provision unit that, upon request from a user, provides designated items from the accumulated information. For example, by accumulating position information, it is possible to provide information on the user's amount of movement per unit time (for example, per day) or on where the user has been; by using gyro sensor information to identify the user's posture, it is also possible to provide posture information (for example, whether the posture is good or bad).
 (15)上記実施の形態においては、読み上げシステム1が音声の読み上げを実行する手法として、読み上げシステム1を構成する各機能部として機能するプロセッサ(制御部155、制御部230)が読み上げプログラム等を実行することにより、読み上げ処理を実行することとしているが、これは装置に集積回路(IC(Integrated Circuit)チップ、LSI(Large Scale Integration))等に形成された論理回路(ハードウェア)や専用回路を組み込むことによって実現してもよい。また、これらの回路は、1または複数の集積回路により実現されてよく、上記実施の形態に示した複数の機能部の機能を1つの集積回路により実現されることとしてもよい。LSIは、集積度の違いにより、VLSI、スーパーLSI、ウルトラLSIなどと呼称されることもある。すなわち、図9に示すように、読み上げシステム1を構成する装着具100及びサーバ200における各機能部は、物理的な回路により実現されてもよい。即ち、図9に示すように、装着具100は、撮像回路111aと通信I/F回路112aとを備えるウェアラブルグラス110と、イヤホン130と、通信I/F回路151aと、通信部152aと、記憶回路153aと、入力回路154aと、制御回路155aと、出力回路156aとから構成されてよく、上記実施の形態において対応する各機能部と同様の機能を有することとしてよい。そして、同様に、サーバ200も、通信回路210aと、記憶回路220aと、抽出回路231a及び変換回路232aとを含む制御回路230aとから構成されてよい。 (15) In the above embodiment, the processor (the control unit 155, the control unit 230) functioning as each functional unit constituting the reading system 1 performs a reading program etc. as a method for the reading system 1 to read the voice. By performing this processing, the reading processing is performed. This is a logic circuit (hardware) or a dedicated circuit formed in an integrated circuit (IC (Integrated Circuit) chip, LSI (Large Scale Integration)) or the like. It may be realized by incorporating In addition, these circuits may be realized by one or more integrated circuits, and the functions of the plurality of functional units shown in the above embodiments may be realized by one integrated circuit. An LSI may be called a VLSI, a super LSI, an ultra LSI, or the like depending on the degree of integration. That is, as shown in FIG. 9, each functional unit in the mounting tool 100 and the server 200 constituting the reading system 1 may be realized by a physical circuit. That is, as shown in FIG. 9, the wearing tool 100 includes a wearable glass 110 including an imaging circuit 111a and a communication I / F circuit 112a, an earphone 130, a communication I / F circuit 151a, a communication unit 152a, and a memory. It may be composed of the circuit 153a, the input circuit 154a, the control circuit 155a, and the output circuit 156a, and may have the same function as that of the corresponding functional units in the above embodiment. Similarly, the server 200 may also be configured of a communication circuit 210a, a storage circuit 220a, and a control circuit 230a including an extraction circuit 231a and a conversion circuit 232a.
 The reading program may be recorded on a processor-readable recording medium, and as the recording medium, a "non-transitory tangible medium" such as a tape, a disk, a card, a semiconductor memory, or a programmable logic circuit can be used. The reading program may also be supplied to the processor via any transmission medium (a communication network, a broadcast wave, or the like) capable of transmitting it. The present invention can also be realized in the form of a data signal embedded in a carrier wave, in which the reading program is embodied by electronic transmission.
 The reading program can be implemented using, for example, a scripting language such as ActionScript or JavaScript (registered trademark), an object-oriented programming language such as Objective-C or Java (registered trademark), or a markup language such as HTML5.
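 Whatever the language chosen, the core flow of the reading program is the one described in the embodiment. The following minimal sketch (written in Python purely for brevity) summarizes one capture-to-playback pass; imaging_unit.capture, server.extract_characters, server.convert_to_voice, and output_unit.play are hypothetical stand-ins, not APIs defined by this disclosure.

    def read_out_once(imaging_unit, server, output_unit, speed=1.0):
        # One pass of the read-out flow: image the user's front direction,
        # extract characters, convert them to voice, and play the result
        # at the reproduction speed chosen via the user's input.
        image = imaging_unit.capture()             # imaging unit (111)
        text = server.extract_characters(image)    # extraction unit (231)
        if not text:
            return                                 # no characters to read out
        voice = server.convert_to_voice(text)      # conversion unit (232)
        output_unit.play(voice, speed=speed)       # output unit (156)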
 (16) The configurations described in the above embodiment and in each of the supplements may be combined as appropriate.
1   reading system
100 mounting tool
110 wearable glass
111 imaging unit
112 communication I/F
130 earphone
150 controller
151 communication I/F
152 communication unit
153 storage unit
154 input unit
155 control unit
156 output unit
200 server
210 communication unit
220 storage unit
230 control unit
231 extraction unit
232 conversion unit

Claims (10)

  1.  A reading system comprising:
     an imaging unit, provided in a mounting tool worn and used by a user, that images the front direction of the user;
     an extraction unit that extracts characters from an image captured by the imaging unit;
     a conversion unit that converts the characters extracted by the extraction unit into voice;
     an output unit, provided in the mounting tool, that outputs the voice;
     an input unit, provided in the mounting tool, that receives input from the user; and
     a control unit, provided in the mounting tool, that controls the reproduction speed of the voice output from the output unit based on the input from the user received via the input unit.
  2.  The reading system according to claim 1, wherein the control unit pauses the voice output from the output unit based on the input from the user received via the input unit.
  3.  The reading system according to claim 1, wherein the control unit repeatedly reproduces the voice output from the output unit based on the input from the user received via the input unit.
  4.  The reading system according to claim 1, wherein the output unit outputs a sound indicating that character conversion processing is in progress while the conversion unit is converting the characters into voice.
  5.  The reading system according to claim 4, wherein:
     the reading system includes a server comprising the extraction unit and the conversion unit;
     the mounting tool comprises a transmission unit that transmits the image captured by the imaging unit to the server, and a reception unit that receives the voice converted by the conversion unit; and
     the output unit outputs a sound indicating that the character conversion processing is in progress from when the transmission unit transmits the image until when the reception unit receives the voice.
  6.  The reading system according to claim 1, wherein the mounting tool comprises an acquisition unit that acquires environmental information on the surrounding environment, and the imaging unit changes its imaging conditions based on the environmental information.
  7.  The reading system according to claim 1, wherein the mounting tool comprises a determination unit that determines whether characters are present in the captured image being captured by the imaging unit, and the output unit outputs a voice indicating that characters are present in the imaging direction of the imaging unit when the determination unit determines that characters are included in the captured image.
  8.  The reading system according to claim 1, further comprising a log transmission unit that associates the captured image captured by the imaging unit with the voice obtained by the conversion unit based on that captured image, and transmits them to an information processing terminal of the user.
  9.  The reading system according to claim 8, wherein the mounting tool comprises a position information acquisition unit that acquires position information indicating the position of the mounting tool itself; the imaging unit associates each captured image with the position information acquired by the position information acquisition unit at the time of imaging; and the log transmission unit transmits the position information together with the captured image and the voice to the information processing terminal of the user.
  10.  A reading method comprising:
      an imaging step of imaging the front direction of a user with an imaging unit provided in a mounting tool worn and used by the user;
      an extraction step of extracting characters from the image captured in the imaging step;
      a conversion step of converting the characters extracted in the extraction step into voice;
      an output step of outputting the voice from an output unit provided in the mounting tool;
      an input step of receiving input from the user via an input unit provided in the mounting tool; and
      a control step of controlling the reproduction speed of the voice output from the output unit based on the input from the user received via the input unit.
PCT/JP2018/031366 2017-08-24 2018-08-24 Read-out system and read-out method WO2019039591A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2017160888A JP2019040005A (en) 2017-08-24 2017-08-24 Reading aloud system and reading aloud method
JP2017-160888 2017-08-24

Publications (2)

Publication Number Publication Date
WO2019039591A1 true WO2019039591A1 (en) 2019-02-28
WO2019039591A4 WO2019039591A4 (en) 2019-05-09

Family

ID=65440089

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2018/031366 WO2019039591A1 (en) 2017-08-24 2018-08-24 Read-out system and read-out method

Country Status (2)

Country Link
JP (1) JP2019040005A (en)
WO (1) WO2019039591A1 (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP6773844B1 (en) * 2019-06-12 2020-10-21 株式会社ポニーキャニオン Information processing terminal and information processing method
CN110991455B (en) * 2020-02-11 2023-05-05 上海肇观电子科技有限公司 Image text broadcasting method and equipment, electronic circuit and storage medium thereof
US11776286B2 (en) 2020-02-11 2023-10-03 NextVPU (Shanghai) Co., Ltd. Image text broadcasting
CN116997947A (en) * 2021-03-30 2023-11-03 升旗株式会社 Surrounding environment information transmission device

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2006227219A (en) * 2005-02-16 2006-08-31 Advanced Telecommunication Research Institute International Information generating device, information output device, and program
JP2011204190A (en) * 2010-03-26 2011-10-13 Nippon Telegr & Teleph Corp <Ntt> Document-processing method and document-processing system
JP2011209787A (en) * 2010-03-29 2011-10-20 Sony Corp Information processor, information processing method, and program
JP2014165616A (en) * 2013-02-23 2014-09-08 Hyogo Prefecture Wearable display for low vision person
JP2015125464A (en) * 2013-12-25 2015-07-06 Kddi株式会社 Wearable device
JP2016194612A (en) * 2015-03-31 2016-11-17 株式会社ニデック Visual recognition support device and visual recognition support program


Also Published As

Publication number Publication date
WO2019039591A4 (en) 2019-05-09
JP2019040005A (en) 2019-03-14

Similar Documents

Publication Publication Date Title
WO2019039591A4 (en) Read-out system and read-out method
US10318028B2 (en) Control device and storage medium
US10045110B2 (en) Selective sound field environment processing system and method
US20180124497A1 (en) Augmented Reality Sharing for Wearable Devices
JP6143975B1 (en) System and method for providing haptic feedback to assist in image capture
JP6574937B2 (en) COMMUNICATION SYSTEM, CONTROL METHOD, AND STORAGE MEDIUM
US20170303052A1 (en) Wearable auditory feedback device
JP2014115457A (en) Information processor and recording medium
WO2017130486A1 (en) Information processing device, information processing method, and program
US20230045237A1 (en) Wearable apparatus for active substitution
KR20090105531A (en) The method and divice which tell the recognized document image by camera sensor
WO2015068440A1 (en) Information processing apparatus, control method, and program
US20210350823A1 (en) Systems and methods for processing audio and video using a voice print
US20220148599A1 (en) Audio signal processing for automatic transcription using ear-wearable device
CN109257490B (en) Audio processing method and device, wearable device and storage medium
CN111314763A (en) Streaming media playing method and device, storage medium and electronic equipment
EP3113505A1 (en) A head mounted audio acquisition module
CN112836685A (en) Reading assisting method, system and storage medium
JP6766403B2 (en) Head-mounted display device, head-mounted display device control method, computer program
US20210405686A1 (en) Information processing device and method for control thereof
US11327576B2 (en) Information processing apparatus, information processing method, and program
CN109361727B (en) Information sharing method and device, storage medium and wearable device
JP2021033368A (en) Reading device
CN111149373B (en) Hearing device for assessing voice contact and related method
WO2022113189A1 (en) Speech translation processing device

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18848807

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18848807

Country of ref document: EP

Kind code of ref document: A1