WO2019039591A1 - Read-out system and read-out method - Google Patents

Read-out system and read-out method

Info

Publication number
WO2019039591A1
Authority
WO
WIPO (PCT)
Prior art keywords
unit
user
imaging
input
output
Application number
PCT/JP2018/031366
Other languages
French (fr)
Japanese (ja)
Other versions
WO2019039591A4 (en)
Inventor
圭佑 島影
Original Assignee
株式会社オトングラス
Application filed by 株式会社オトングラス
Publication of WO2019039591A1
Publication of WO2019039591A4

Links

Images

Classifications

    • G PHYSICS
        • G06F 18/00 Pattern recognition
        • G09B 21/00 Teaching, or communicating with, the blind, deaf or mute
        • G10L 13/00 Speech synthesis; text-to-speech systems
            • G10L 13/02 Methods for producing synthetic speech; speech synthesisers
                • G10L 13/047 Architecture of speech synthesisers
        • G10L 21/04 Time compression or expansion
            • G10L 21/043 Time compression or expansion by changing speed
            • G10L 21/057 Time compression or expansion for improving intelligibility

Definitions

  • The present invention relates to a reading system and a reading method for converting text into speech and reading it aloud.
  • Patent Document 1 discloses a wearable display for low-vision persons that can capture and display the view ahead so that a low-vision person can walk outdoors at night and the like. According to the wearable display of Patent Document 1, the contrast and brightness of the captured image are converted before display. It also discloses that, when characters appear in a captured image, character recognition processing is performed and the characters are conveyed to the user by voice.
  • However, the wearable display of Patent Document 1 states only that recognized characters are conveyed to the low-vision person through a speaker, without disclosing specifically how the speech is presented; and since the way sound is heard differs from user to user, such a display lacks usability.
  • The present invention has been made in view of the above problems, and its object is to provide a reading system that is more convenient for the user than the wearable display for low-vision persons described in Patent Document 1.
  • A reading system according to one aspect of the present invention includes: an imaging unit provided in a wearing tool worn and used by a user, which images the user's front direction; an extraction unit that extracts characters from the image captured by the imaging unit; a conversion unit that converts the characters extracted by the extraction unit into speech; an output unit provided in the wearing tool, which outputs the speech; an input unit provided in the wearing tool, which receives input from the user; and a control unit provided in the wearing tool, which controls the playback speed of the speech output from the output unit based on the input from the user received via the input unit.
  • A reading method according to one aspect of the present invention includes: an imaging step of imaging the front direction of a user with an imaging unit provided in a wearing tool worn and used by the user; an extraction step of extracting characters from the image captured in the imaging step; a conversion step of converting the characters extracted in the extraction step into speech; an output step of outputting the speech from an output unit provided in the wearing tool; an input step of receiving input from the user with an input unit provided in the wearing tool; and a control step of controlling the playback speed of the speech output from the output unit based on the input from the user received via the input unit.
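  • As a rough illustration only (not part of the disclosure), the claimed steps map onto a simple pipeline. The sketch below assumes hypothetical callables `capture`, `extract_text`, `text_to_speech`, and `play`; the patent does not tie the method to any particular library or API.

```python
from dataclasses import dataclass

@dataclass
class PlaybackState:
    """State adjusted through the input unit (hypothetical)."""
    speed: float = 1.0      # reproduction speed (control step)
    paused: bool = False    # pause state (control step)

def read_out_once(capture, extract_text, text_to_speech, play, state: PlaybackState):
    """One pass of the claimed method: image -> characters -> speech -> output."""
    image = capture()                   # imaging step
    characters = extract_text(image)    # extraction step
    voice = text_to_speech(characters)  # conversion step
    if not state.paused:
        play(voice, speed=state.speed)  # output step at the controlled speed
```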
  • The control unit may pause the sound output from the output unit based on the input from the user received via the input unit.
  • The control unit may repeatedly reproduce the voice output from the output unit based on the input from the user received via the input unit.
  • The output unit may output a sound indicating that character conversion processing is in progress while the conversion unit is converting the characters into speech.
  • The reading system may include a server having the extraction unit and the conversion unit, and the wearing tool may include a transmission unit that transmits the image captured by the imaging unit to the server and a reception unit that receives the voice converted by the conversion unit.
  • The output unit may output a sound indicating that character conversion processing is in progress from when the transmission unit transmits the image until the reception unit receives the voice.
  • The wearing tool may include an acquisition unit that acquires environment information on the surrounding environment, and the imaging unit may change its imaging conditions based on the environment information.
  • The wearing tool may include a determination unit that determines whether characters are present in the image being captured by the imaging unit, and while the determination unit determines that characters are included in the captured image, the output unit may output a sound indicating that characters are present in the imaging direction of the imaging unit.
  • The reading system may include a log transmission unit that associates the captured image captured by the imaging unit with the voice obtained by the conversion unit from that image and transmits them to the user's information processing terminal.
  • The wearing tool may include a position information acquisition unit that acquires position information indicating the position of the device itself; the imaging unit may associate the captured image with the position information acquired at the time of imaging, and the log transmission unit may transmit the position information to the user's information processing terminal together with the captured image and the voice.
  • With the reading system according to an aspect of the present invention, the user can freely adjust the speed at which speech converted from characters is read out. It is therefore possible to provide a reading system with excellent usability.
  • FIG. 1(a) is a view showing an example of the appearance of a user wearing the wearing tool, and FIG. 1(b) is a view showing an example of imaging and reading aloud being performed using the wearing tool.
  • FIG. 2 is a view showing an example of the system configuration of the reading system.
  • FIG. 3(a) is a view showing a configuration example of the data that the wearing tool transmits to the server, and FIG. 3(b) is a view showing a configuration example of the read-out voice information that the server stores for each user.
  • FIG. 4 is a sequence diagram showing the exchange between the wearing tool and the server.
  • FIG. 5 is a flowchart showing the operation of the wearing tool.
  • FIG. 6 is a flowchart showing the operation of the server.
  • FIG. 7 is a view showing an example of a range from which characters are preferentially extracted from an image.
  • FIG. 8 is a view showing an example of a screen for playing back read-out voice using a map.
  • FIG. 9 is a view showing another example of the system configuration of the reading system.
  • As shown in FIGS. 1 and 2, the reading system 1 includes: an imaging unit 111 provided in the wearing tool 100 worn and used by the user, which images the user's front direction; an extraction unit 231 that extracts characters from the image captured by the imaging unit 111; a conversion unit 232 that converts the extracted characters into speech; an output unit 156 provided in the wearing tool 100, which outputs the speech; an input unit 154 provided in the wearing tool 100, which receives input from the user; and a control unit 155 provided in the wearing tool 100, which controls the playback speed of the speech output from the output unit 156 based on the input from the user received via the input unit 154.
  • Such a reading system 1 is described in detail below.
  • As shown in FIGS. 1(a) and 1(b), the user 10 wears and uses the wearable glass 110.
  • On the wearable glass 110, an imaging unit 111 is disposed at a position where it can image the user's front direction in accordance with an instruction from the user.
  • The imaging unit 111 is a so-called camera.
  • The wearable glass 110 is connected to the controller 150.
  • In FIG. 1(a), the wearable glass 110 is shown connected by the cord 120 to the earphone 130 and from there by the cable 140 to the controller 150; however, the wearable glass 110 and the controller 150 may be directly connected, in the same way as the earphone 130 and the controller 150.
  • The user 10 wears the earphone 130 in the ear and can listen to the read-out voice transmitted from the controller 150.
  • As shown in FIG. 1(a), the user 10 holds the controller 150 and can use it to issue an imaging instruction and instructions related to reproduction of the read-out voice.
  • As shown in FIG. 1(b), when the user issues an imaging instruction, the imaging unit 111 images the imaging range 160. Characters included in the imaging range 160 are then recognized, converted into machine-synthesized speech, and read aloud. The reading system 1 can therefore provide the information of hard-to-read characters to low-vision persons and others.
  • FIG. 2 shows an example of the system configuration of the reading system 1.
  • The reading system 1 includes the wearing tool 100 and the server 200.
  • The wearing tool 100 and the server 200 are configured to communicate via the network 300.
  • The wearing tool 100 communicates with the network 300 wirelessly; any communication protocol may be used as long as wireless communication can be performed.
  • The server 200 also communicates with the network 300; either wireless or wired communication may be used, with any protocol, as long as communication can be performed.
  • The wearing tool 100 includes the wearable glass 110, the earphone 130, and the controller 150. That is, in the present embodiment, as shown in FIG. 2, the wearable glass 110, the earphone 130, and the controller 150 are collectively referred to as the wearing tool 100. Further, although the wearable glass 110 is used here, the device is of course not limited to glasses as long as it can image the front direction (viewing direction) of the user 10.
  • The wearable glass 110 includes an imaging unit 111 and a communication I/F 112.
  • The imaging unit 111 is a camera capable of imaging the user's front direction.
  • The imaging unit 111 performs imaging upon receiving an imaging signal from the communication I/F 112.
  • The imaging unit 111 may be provided anywhere on the wearable glass 110 as long as it can image the user's front direction.
  • Although FIG. 1 illustrates an example in which the imaging unit 111 is provided at the left hinge portion of the wearable glass, it may instead be provided at the right hinge portion or at the bridge portion.
  • The imaging unit 111 transmits the captured image obtained by imaging to the communication I/F 112.
  • The imaging unit 111 may also have a detection function that captures images sequentially, analyzes them, and detects the presence or absence of characters in them; when characters are included in the captured image, it transmits to the communication I/F 112 a presence signal indicating that characters are present in the user's front direction.
  • The communication I/F 112 is a communication interface having a function of communicating with the controller 150.
  • The communication I/F 112 is communicably connected to the communication I/F 151 of the controller 150.
  • The communication I/F 112 transmits the imaging signal sent from the communication I/F 151 of the controller 150 to the imaging unit 111.
  • The communication I/F 112 transmits to the communication I/F 151 the captured image sent from the imaging unit 111 and the presence signal indicating that characters are present in the user's front direction.
  • The earphone 130 is connected to the output unit 156 of the controller 150 and has a function of outputting the audio signal transmitted from the output unit 156 as sound.
  • In FIG. 1, the earphone 130 is connected to the controller 150 by wire, but the connection may be wireless.
  • The earphone 130 outputs the read-out voice generated from characters detected in the captured image, the sound indicating that characters are being analyzed, or the sound indicating that characters are present in the front direction of the imaging unit 111.
  • The controller 150 includes a communication I/F 151, a communication unit 152, a storage unit 153, an input unit 154, a control unit 155, and an output unit 156, each connected to one another via a bus.
  • The communication I/F 151 is a communication interface having a function of communicating with the communication I/F 112 of the wearable glass 110.
  • When the communication I/F 151 receives an imaging signal from the control unit 155, it transmits the imaging signal to the communication I/F 112; when it receives a captured image or a presence signal from the communication I/F 112, it forwards it to the control unit 155.
  • The communication unit 152 is a communication interface having a function of communicating with the server 200 via the network 300.
  • The communication unit 152 functions as a transmission unit that transmits the captured image to the server 200 in accordance with an instruction from the control unit 155, and as a reception unit that receives from the server 200 the read-out voice obtained by converting the characters included in the captured image into speech.
  • The communication unit 152 passes the received read-out voice to the control unit 155.
  • The storage unit 153 has a function of storing the various programs and data required for the controller 150 to function.
  • The storage unit 153 can be realized by, for example, an HDD (Hard Disk Drive), an SSD (Solid State Drive), a flash memory, or the like, but is not limited thereto.
  • The storage unit 153 stores, for example, the reading program executed by the control unit 155, captured images captured by the imaging unit 111, and read-out voice information received by the communication unit 152.
  • The storage unit 153 stores the voice information for the sound that is output from the output unit 156 during the period from when the communication unit 152 transmits the captured image to the server 200 until the read-out voice is received, indicating that the characters are being converted into speech.
  • The storage unit 153 also stores voice information for informing the user 10 that characters are present in the front direction.
  • The input unit 154 has a function of receiving input from the user 10.
  • The input unit 154 can be realized by, for example, hard keys provided on the controller 150, but may also be realized by a touch panel or the like.
  • The input unit 154 may include an imaging button 154A with which the user 10 instructs imaging, a playback button 154B for instructing playback, pause, and replay, and an adjustment button 154C for adjusting the playback speed of the sound.
  • When a button is pressed, the input unit 154 transmits a signal indicating the pressed content to the control unit 155.
  • The control unit 155 is a processor having a function of controlling each unit of the controller 150.
  • The control unit 155 executes the various programs stored in the storage unit 153 to perform the functions of the controller 150.
  • When receiving an imaging instruction from the input unit 154, the control unit 155 instructs the communication I/F 151 to transmit an imaging signal to the wearable glass 110.
  • When a captured image is received, the control unit 155 instructs the communication unit 152 to transmit it to the server 200. After this instruction, the control unit 155 reads from the storage unit 153 the voice indicating that the characters included in the captured image are being converted into speech, and instructs the output unit 156 to output it.
  • When the read-out voice is received, the control unit 155 instructs the output unit 156 to stop outputting the voice indicating that conversion is in progress, and then instructs the output unit 156 to output the read-out voice.
  • When a presence signal is received, the control unit 155 reads from the storage unit 153 the voice indicating that characters are present in the front direction of the user 10, and instructs the output unit 156 to output it.
  • The control unit 155 also executes reproduction control processing of the read-out voice in accordance with instructions from the user 10 transmitted from the input unit 154. For example, when a pause instruction is received, it instructs the output unit 156 to pause reproduction of the read-out voice.
  • When a slow-playback instruction is received, the control unit 155 instructs the output unit 156 to play the read-out voice slowly.
  • The slow-playback instruction may be replaced by a playback speed adjustment instruction, with which the control unit 155 can increase or decrease the playback speed of the read-out voice.
  • When a replay instruction is received, the control unit 155 instructs the output unit 156 to reproduce again the read-out voice output so far.
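  • As an illustrative sketch only, the reproduction control described above (pause, resume, slow playback, speed adjustment, and replay) could be organized as follows; the `player` object is a hypothetical stand-in for the output unit 156 and is assumed to expose `play`, `pause`, and `resume`.

```python
class PlaybackController:
    """Sketch of the reproduction control performed by the control unit 155."""

    def __init__(self, player):
        self.player = player
        self.speed = 1.0
        self.last_voice = None          # kept so replay is possible

    def on_new_voice(self, voice):
        self.last_voice = voice
        self.player.play(voice, speed=self.speed)

    def on_pause(self):                 # pause instruction
        self.player.pause()

    def on_resume(self):                # playback (resume) instruction
        self.player.resume()

    def on_slow(self):                  # slow-playback instruction
        self.speed = 0.5

    def on_adjust_speed(self, delta):   # adjustment button 154C
        self.speed = max(0.5, min(2.0, self.speed + delta))

    def on_replay(self):                # replay the voice output so far
        if self.last_voice is not None:
            self.player.play(self.last_voice, speed=self.speed)
```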
  • The output unit 156 has a function of outputting the audio signal instructed by the control unit 155 to the earphone 130.
  • The output unit 156 outputs to the earphone 130 the read-out voice, the voice indicating that characters are being converted into speech, or the voice indicating that characters are present in the front direction of the user 10.
  • The server 200 includes a communication unit 210, a storage unit 220, and a control unit 230.
  • The communication unit 210, the storage unit 220, and the control unit 230 are connected to one another via a bus.
  • The communication unit 210 is a communication interface having a function of communicating with the wearing tool 100 (controller 150) via the network 300.
  • The communication unit 210 functions as a transmission unit that transmits the read-out voice to the wearing tool 100 in accordance with an instruction from the control unit 230, and as a reception unit that receives captured images.
  • The communication unit 210 passes received captured images to the control unit 230.
  • The storage unit 220 stores various programs and data that the server 200 needs in operation.
  • The storage unit 220 can be realized by, for example, a hard disk drive (HDD), a solid state drive (SSD), a flash memory, or the like, but is not limited thereto.
  • The storage unit 220 stores a character recognition program for extracting characters from images, a voice conversion program for converting recognized characters into speech, and read-out voice information; details of the read-out voice information are described later.
  • The control unit 230 is a processor having a function of controlling each unit of the server 200.
  • The control unit 230 executes the various programs stored in the storage unit 220 to perform the functions of the server 200.
  • The control unit 230 functions as the extraction unit 231 by executing the character recognition program, and as the conversion unit 232 by executing the voice conversion program.
  • The extraction unit 231 has a function of analyzing a captured image and extracting the characters it contains; existing character recognition processing can be used as the analysis technique.
  • The conversion unit 232 has a function of converting the characters extracted by the extraction unit 231 into speech (the read-out voice); existing conversion processing can be used as the conversion technique.
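  • For illustration, one possible realization of the extraction unit 231 and the conversion unit 232 with off-the-shelf libraries is sketched below; the pairing of pytesseract (OCR) and gTTS (speech synthesis) is an assumption, since the patent only states that existing recognition and conversion techniques may be used.

```python
import io

from PIL import Image
import pytesseract      # OCR; requires a local Tesseract installation
from gtts import gTTS   # TTS; requires network access

def extract_characters(image_bytes: bytes, lang: str = "jpn") -> str:
    """Extraction unit 231: pull text out of a captured image."""
    image = Image.open(io.BytesIO(image_bytes))
    return pytesseract.image_to_string(image, lang=lang).strip()

def convert_to_speech(text: str, lang: str = "ja") -> bytes:
    """Conversion unit 232: turn extracted characters into read-out audio."""
    buf = io.BytesIO()
    gTTS(text=text, lang=lang).write_to_fp(buf)
    return buf.getvalue()  # MP3 bytes to return to the wearing tool
```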
  • FIG. 3 is a view showing an example of the data configuration of data according to the reading system 1.
  • FIG. 3A is a view showing a data configuration example (format example) of the transmission data 310 (captured image) that the wearing tool 100 (controller 150) transmits to the server 200.
  • The transmission data 310 is information in which a user ID 311, captured image information 312, and imaging time information 313 are associated.
  • The user ID 311 is identification information that can uniquely identify the user 10 who uses the wearing tool 100.
  • With the user ID 311, the server 200 can identify which user a captured image came from and can manage the captured images and the generated read-out voices for each user.
  • The captured image information 312 is information indicating the actual data of the captured image captured by the imaging unit 111.
  • The imaging time information 313 is information indicating the date and time when the captured image indicated by the captured image information 312 was captured; it can be acquired, for example, from an internal clock (not shown) of the imaging unit 111.
  • FIG. 3B is a view showing an example of the data configuration of the read-out voice information stored in the storage unit 220 of the server 200 and managed for each user who uses the reading system 1.
  • This data is information for managing the read-out voices that the server 200 has generated by conversion in the past.
  • The read-out voice information 320 is information in which imaging time information 321, captured image information 322, and a read-out voice 323 are associated.
  • The imaging time information 321 is information indicating the date and time when the corresponding captured image was captured, and is the same information as the imaging time information 313.
  • The captured image information 322 is information indicating the actual data of the captured image, and is the same information as the captured image information 312.
  • The read-out voice 323 is the actual data of the read-out voice obtained by the extraction unit 231 extracting characters from the corresponding captured image information 322 and the conversion unit 232 converting them.
  • With this information, the server 200 can manage past read-out voices.
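  • For illustration only, the two records of FIG. 3 could be represented as the following data structures; the field names are hypothetical, as the patent specifies the associations but no concrete format.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class TransmissionData:       # transmission data 310
    user_id: str              # user ID 311
    captured_image: bytes     # captured image information 312
    captured_at: datetime     # imaging time information 313

@dataclass
class ReadOutVoiceRecord:     # read-out voice information 320 (kept per user)
    captured_at: datetime     # imaging time information 321
    captured_image: bytes     # captured image information 322
    read_out_voice: bytes     # read-out voice 323
```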
  • FIG. 4 is a sequence diagram showing the exchange between the wearing tool 100 and the server 200.
  • The wearing tool 100 performs imaging in the front direction of the user 10 (step S401), and transmits the obtained captured image to the server 200 (step S402).
  • The server 200 receives the captured image transmitted from the wearing tool 100 (step S403), extracts characters from it (step S404), and converts the extracted characters into speech to generate the read-out voice (step S405). When the read-out voice has been generated, the server 200 transmits it to the wearing tool 100 (step S406).
  • The wearing tool 100 receives the read-out voice transmitted from the server 200 (step S407) and outputs it (step S408). In this way, the reading system 1 can recognize characters present in the front direction (viewing direction) of the user 10 and convey them to the user 10 by sound.
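  • As a sketch of steps S402 and S407 from the controller side: the patent leaves the transport open, so the use of HTTP via the `requests` library and the endpoint URL below are assumptions for illustration.

```python
import requests

SERVER_URL = "https://example.com/read-out"  # placeholder endpoint

def send_image_and_receive_voice(user_id: str, image_bytes: bytes) -> bytes:
    """Send the captured image (S402) and receive the read-out voice (S407)."""
    response = requests.post(
        SERVER_URL,
        files={"captured_image": image_bytes},
        data={"user_id": user_id},
        timeout=30,
    )
    response.raise_for_status()
    return response.content  # audio bytes generated by the server (S406)
```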
  • FIG. 5 is a flowchart showing the operation of the wearing tool 100.
  • The input unit 154 of the wearing tool 100 determines whether there has been an input from the user based on whether any of the buttons has been pressed (step S501). If there is an input from the user (YES in step S501), the process proceeds to step S502; if not (NO in step S501), the process proceeds to step S512.
  • In step S502, the control unit 155 determines whether the input accepted by the input unit 154 is an imaging instruction. If it is an imaging instruction (YES in step S502), the process proceeds to step S503; if not (NO in step S502), the process proceeds to step S506.
  • When the input unit 154 receives an imaging instruction from the user, it transmits the imaging instruction to the control unit 155.
  • The control unit 155 instructs the communication I/F 151 to transmit an imaging signal to the wearable glass 110.
  • The communication I/F 151 transmits the imaging signal to the communication I/F 112 according to the instruction.
  • The communication I/F 112 transmits the imaging signal to the imaging unit 111, and the imaging unit 111 executes imaging (step S503).
  • The imaging unit 111 transmits the obtained captured image to the communication I/F 112, which transfers it to the communication I/F 151.
  • The communication I/F 151 passes the captured image to the control unit 155, and the control unit 155 instructs the communication unit 152 to transmit it to the server 200.
  • The communication unit 152 transmits the captured image to the server 200 via the network 300 (step S504).
  • The control unit 155 reads from the storage unit 153 the voice indicating that the characters in the captured image are being converted into speech, and instructs the output unit 156 to output it.
  • The output unit 156 outputs the voice to the earphone 130, the earphone 130 plays it (step S505), and the process returns to step S501.
  • When it is determined in step S502 that the input is not an imaging instruction (NO in step S502), it is determined whether the input is a voice pause instruction (step S506). If it is a pause instruction (YES in step S506), the control unit 155 instructs the output unit 156 to pause the voice being output; the output unit 156 pauses the output (step S507), and the process returns to step S501. The pause continues until a new playback instruction or a complete stop instruction is input.
  • When it is determined in step S506 that the input is not a pause instruction (NO in step S506), it is determined whether the input is a slow-playback instruction (step S508). If it is (YES in step S508), the control unit 155 instructs the output unit 156 to play the voice being output slowly; the output unit 156 starts slow playback (step S509), and the process returns to step S501.
  • Instead of slowing playback, the playback speed may also be increased; a higher playback speed is useful for shortening the time needed to grasp the outline of the characters included in the captured content.
  • When it is determined in step S508 that the input is not a slow-playback instruction (NO in step S508), it is determined whether the input is a playback or replay input (step S510). If it is not (NO in step S510), the process returns to step S501. If it is (YES in step S510), the control unit 155 instructs the output unit 156 to resume output of the paused voice or to replay the voice already output; the output unit 156 resumes or replays the voice (step S511), and the process returns to step S501. In this way, even if the user 10 misses the read-out voice, it can be read out again.
  • When there is no input from the user in step S501 (NO in step S501), the control unit 155 determines whether the read-out voice has been received from the server 200 (step S512). If it has not been received (NO in step S512), the process returns to step S501.
  • If the read-out voice has been received (YES in step S512), the control unit 155 first instructs the output unit 156 to stop outputting the voice indicating that the characters are being converted into speech; the output unit 156 stops that output (step S513).
  • The control unit 155 then instructs the output unit 156 to output the read-out voice passed from the communication unit 152.
  • The output unit 156 starts output of the read-out voice passed from the control unit 155 (step S514), and the process returns to step S501.
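  • The loop of FIG. 5 can be summarized in the following sketch; `buttons`, `camera_link`, `server`, and `player` are hypothetical stand-ins for the input unit 154, the wearable glass 110 link, the communication unit 152, and the output unit 156.

```python
def controller_loop(buttons, camera_link, server, player, wait_sound):
    """Compressed sketch of steps S501-S514 of FIG. 5."""
    while True:
        event = buttons.poll()                  # S501: any button pressed?
        if event == "imaging":                  # S502 -> S503
            image = camera_link.capture()       # S503: imaging
            server.send_async(image)            # S504: transmit to server
            player.play_loop(wait_sound)        # S505: "converting" sound
        elif event == "pause":                  # S506 -> S507
            player.pause()
        elif event == "slow":                   # S508 -> S509
            player.set_speed(0.5)
        elif event in ("play", "replay"):       # S510 -> S511
            player.resume_or_replay()
        elif server.voice_ready():              # S512: read-out voice received?
            player.stop_loop()                  # S513: stop waiting sound
            player.play(server.take_voice())    # S514: output read-out voice
```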
  • FIG. 6 is a flowchart showing the operation when the server 200 receives a captured image from the wearing tool 100.
  • The communication unit 210 of the server 200 receives a captured image from the wearing tool 100 via the network 300 (step S601).
  • The communication unit 210 passes the received captured image to the control unit 230.
  • The extraction unit 231 of the control unit 230 analyzes the captured image and extracts characters (step S602).
  • The extraction unit 231 passes the extracted character string to the conversion unit 232.
  • The conversion unit 232 converts the extracted character string into speech (step S603), generating a read-out voice that is machine-synthesized speech.
  • The conversion unit 232 passes the generated read-out voice to the communication unit 210.
  • The communication unit 210 transmits the synthesized speech as the read-out voice to the wearing tool 100 via the network 300 (step S604).
  • The control unit 230 registers the received captured image, its imaging date and time, and the read-out voice obtained from it in the read-out voice information as the captured image information 322, the imaging time information 321, and the read-out voice 323, respectively (step S605), and ends the processing.
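  • Reusing the hypothetical helpers and records sketched above, the server flow of FIG. 6 could look as follows; `reply` is an assumed callable that returns the audio to the wearing tool.

```python
def handle_captured_image(data, voice_log, reply):
    """Sketch of steps S601-S605; data is a TransmissionData record."""
    text = extract_characters(data.captured_image)      # S602: extraction
    voice = convert_to_speech(text)                     # S603: conversion
    reply(voice)                                        # S604: send read-out voice
    voice_log.setdefault(data.user_id, []).append(      # S605: register the log
        ReadOutVoiceRecord(data.captured_at, data.captured_image, voice)
    )
```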
  • As described above, the reading system 1 does not merely read out recognized characters, but can reproduce the voice in a way that is easy for the user to hear.
  • That is, the reading system 1 can recognize the characters included in a captured image and output them as sound. Since the user can perform operations such as slow playback, pause, and replay on the read-out voice, the voice can be reproduced in the way each user finds easiest to hear. It is therefore possible to provide a user-friendly reading system. Further, while the reading system 1 is generating the read-out voice from the captured image, it notifies the user 10 with a sound indicating that processing is in progress, so the user 10 can recognize the situation.
  • In the above embodiment, the voice is output using the earphone 130; however, the wearable glass 110 or the controller 150 may be provided with a speaker, and the output unit 156 may output the read-out voice from that speaker. With this configuration, even a user who finds it painful to wear the earphone 130 can hear the read-out voice. In this case, there is also the advantage that a plurality of users can listen to the voice simultaneously.
  • In the above embodiment, the wearing tool 100 includes the wearable glass 110, the earphone 130, and the controller 150, configured as separate devices; however, the wearable glass 110, the earphone 130, and the controller 150 may be formed integrally. That is, the wearable glass 110 may include a speaker as an alternative to the sound output function of the earphone 130, and may also hold the functions of the controller 150.
  • For example, the temple portion of the wearable glass 110 may have a hollow structure in which the processor, memory, communication module, and the like of the controller 150 are mounted, and various buttons for voice playback control and imaging instructions may be provided on the outside of the temple or rim of the wearable glass 110.
  • Also, the wearing tool 100 may have the functions of the server 200 (the functions of the extraction unit and the conversion unit); for example, the controller 150 may include a chip realizing the functions of the server 200. With this configuration, the wearing tool 100 can realize the reading system stand-alone, and the latency associated with transmitting the captured image and receiving the read-out voice can be suppressed.
  • In the above embodiment, the range for extracting characters from the captured image is determined in advance, but this is not a limitation.
  • For example, the wearable glass 110 may be provided with a camera that images the user's eyes; the gaze direction may be detected, a predetermined range centered on the gaze direction may be applied to the captured image, and characters within that range may be detected. In that case, the wearable glass 110 transmits the first captured image captured by the imaging unit 111 and a second captured image obtained by imaging the user's eyes to the controller 150, and the controller 150 transmits the first and second captured images to the server 200.
  • The extraction unit 231 of the server 200 may then identify the gaze direction of the user 10 from the second captured image, identify a predetermined range including the identified gaze direction, and extract characters from the location corresponding to that range in the first captured image.
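  • As an illustrative sketch of this gaze-guided variant, assuming a gaze point has already been estimated from the second captured image (the gaze estimator itself is not specified by the patent):

```python
from PIL import Image
import pytesseract

def extract_around_gaze(front_image: Image.Image, gaze_xy, half: int = 200) -> str:
    """Run OCR only in a window centered on the estimated gaze point."""
    x, y = gaze_xy
    box = (max(0, x - half), max(0, y - half),
           min(front_image.width, x + half), min(front_image.height, y + half))
    region = front_image.crop(box)  # predetermined range centered on the gaze
    return pytesseract.image_to_string(region, lang="jpn").strip()
```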
  • In the above embodiment, the imaging unit 111 performs imaging upon input of an imaging instruction to the controller 150, but the imaging trigger is not limited to this.
  • For example, the wearable glass 110 or the controller 150 may be provided with a microphone that picks up the voice uttered by the user, and imaging may be performed when the user utters a specific word; that is, imaging may be triggered by voice input.
  • Alternatively, a camera that images the user's eyes may be provided on the wearable glass 110, and a blink of the user's eyes may be used as the imaging trigger.
  • In the above embodiment, the input unit 154 is provided in the controller 150, but it may instead be provided partway along the cable 140.
  • The reading system 1 may also be provided with a setting unit capable of setting the language of the read-out voice, and a translation unit that translates the characters extracted by the extraction unit 231 into the language set in the setting unit; the conversion unit 232 may then convert the translated characters into speech.
  • In this way, the reading system 1 can function as a system that interprets written characters, which is useful not only for low-vision persons but also for users from abroad.
  • The extraction unit 231 may also limit the range for extracting characters to a predetermined range rather than the entire captured image.
  • FIG. 7 shows an example of a captured image 700; the extraction unit 231 may treat only the predetermined range 710 in the captured image 700 as the range for extracting characters.
  • Alternatively, the predetermined range 710 may be set as a range from which characters are preferentially extracted. A range from which characters are preferentially extracted means that characters are extracted from within the predetermined range 710 first, and the process of extracting characters from outside the predetermined range 710 is performed only when no characters can be extracted from within it.
  • The predetermined range 710 may be set by the user who uses the reading system 1. In general, users tend to look in a direction slightly lower than the front direction, so it is effective to set the predetermined range 710 toward the lower part of the captured image 700.
  • The control unit 230 may also set the predetermined range 710: specifically, for a large number of captured images received by the server 200, the ranges from which characters could be extracted are identified, and their average range may be set as the predetermined range 710.
  • Alternatively, various sensors may be provided on the wearable glass 110, and the predetermined range 710 may be determined based on the sensing data obtained from them.
  • For example, a gyro sensor may be mounted on the wearable glass 110, and the wearing tool 100 may transmit the sensing data of the gyro sensor to the server 200 together with the captured image.
  • The extraction unit 231 may then determine the predetermined range 710 based on the sensing data of the gyro sensor; for example, when it is estimated from the sensing data that the user 10 is facing downward, the predetermined range 710 may be set toward the lower part of the captured image 700.
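  • A minimal sketch of the priority-range behaviour around FIG. 7 follows: OCR is tried inside the predetermined range 710 first and falls back to the whole image only when nothing is found there. The lower-biased default box is an illustrative choice reflecting the tendency to look slightly downward.

```python
from PIL import Image
import pytesseract

def extract_with_priority(image: Image.Image, box=None) -> str:
    """Extract characters preferentially from a predetermined range."""
    w, h = image.size
    box = box or (0, int(h * 0.4), w, h)   # default: lower part of the image
    text = pytesseract.image_to_string(image.crop(box), lang="jpn").strip()
    if text:                               # found inside the range (710)
        return text
    # fall back to the entire captured image (700)
    return pytesseract.image_to_string(image, lang="jpn").strip()
```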
  • The server 200 may also be configured to transmit the corresponding read-out voice information 320 as a past log to an information processing apparatus, such as a PC, held by the user 10. With this configuration, the user 10 can listen to past read-out voices at any time.
  • In addition, the wearing tool 100 may be provided with a position information acquisition unit that acquires position information indicating where the device itself is located.
  • The position information acquisition unit can be realized using, for example, GPS or GNSS.
  • The position information acquisition unit acquires position information and associates the acquired position information with the captured image.
  • The wearing tool 100 transmits the captured image associated with the position information to the server 200.
  • The server 200 may then further associate and manage imaging position information indicating the imaging position as part of the read-out voice information 320.
  • By transmitting information including the position information from the server 200 to the information processing apparatus of the user 10 as the read-out voice information 320, the information processing apparatus can present the read-out voice together with a map application, as shown in FIG. 8. That is, the user 10 can recognize on the map when and where each read-out voice was acquired. Then, by positioning the cursor 803 on log information 801 or 802 on the map and clicking, the information processing apparatus may play back the corresponding read-out voice with audio playback software or the like. For example, as shown in the map 800 of FIG. 8, the log information 801 and 802 lets the user recognize where the captured images from which the read-out voices were obtained were captured.
  • In addition, the imaging unit 111 may perform imaging sequentially and detect whether characters are included in the obtained captured images. When it is detected that characters are included, that fact is transmitted to the controller 150, and the control unit 155 may notify the user 10 that characters are present in the front direction at that moment. The user 10 can then input an imaging instruction to the input unit 154 at that timing.
  • With this configuration, the user 10 can be made aware of the presence of characters even when the user 10 cannot visually recognize them, such as when the user 10 has low vision or, in particular, is blind, so a highly convenient reading system 1 can be provided.
  • The imaging unit 111 may also change its imaging conditions according to the environment in which the user (the wearable glass 110) is placed.
  • For example, the wearable glass 110 may include various sensors (for example, an illuminance sensor) and change the exposure time, the angle of view, and the like.
  • When the server 200 cannot extract characters from the image, cannot convert the extracted characters into speech, or determines that the image contains no characters, an error signal may be transmitted to the wearing tool 100, and the wearing tool 100 may receive the signal and output a sound indicating the error from the output unit 156.
  • In addition, a variety of sounds may be stored in the storage unit 153, such as an activation sound for when the wearing tool 100 is activated, an imaging sound (shutter sound) for when the imaging unit 111 performs imaging, a sound indicating a waiting state, and a cancellation sound for when the user inputs an instruction to cancel a process, and the corresponding sound may be output from the output unit 156 in each state.
  • By adopting a configuration that outputs sounds according to its various states, the wearing tool 100 can inform the user of the state of the device by sound alone.
  • The server 200 may also change the mode of the generated read-out voice according to the location in the captured image from which the characters were extracted, or according to the ratio of the extracted range to the captured image.
  • Changing the mode of the voice according to the location from which the characters were extracted means changing the direction from which the user hears the voice according to where in the captured image the characters were found. For example, when the characters were extracted from the right side of the captured image, the output unit 156 may output the read-out voice so that it is heard from the user's right side. With this configuration, the user can intuitively recognize in which direction, as seen from the user, the read-out characters are located.
  • Changing the mode of the read-out voice according to the ratio of the extracted range to the captured image means, for example, changing the volume of the read-out voice according to that ratio.
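  • For illustration, direction- and size-dependent output could be realized with simple stereo panning and gain control, as sketched below; mono samples are treated as plain float lists to keep the example dependency-free, and the gain curve is an arbitrary assumption.

```python
def spatialize(samples, region_center_x: float, image_width: float, area_ratio: float):
    """Pan the read-out voice toward the side where the characters were found.

    area_ratio: area of the extracted region divided by the image area.
    """
    pan = region_center_x / image_width   # 0.0 = left edge, 1.0 = right edge
    gain = min(1.0, 0.3 + area_ratio)     # louder for larger text regions
    left = [s * gain * (1.0 - pan) for s in samples]
    right = [s * gain * pan for s in samples]
    return left, right                    # stereo pair for the earphone 130
```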
  • In the above embodiment, the transmission data 310 associates the user ID 311, the captured image information 312, and the imaging time information 313, but various other information may also be associated with it. For example, as in the supplements above, sensing data from a gyro sensor, an acceleration sensor, or the like capable of identifying the posture of the wearing tool 100, and position information indicating where the wearing tool 100 is located, may also be associated.
  • Likewise, in the read-out voice information 320, the imaging time information 321, the captured image information 322, and the read-out voice 323 are associated, but text data of the characters obtained by analyzing the captured image, the position information included in the transmission data 310, sensing data, and the like may also be associated.
  • By accumulating these various kinds of information, the read-out voice information can be used as a life log of each user.
  • The server 200 may also be provided with a providing unit that provides specified information from among the stored information. For example, by accumulating position information, it is possible to provide information on the amount the user moved per unit time (for example, one day) and information on where the user went; by using gyro sensor information to identify the user's posture, it is possible to provide posture information (for example, whether the posture is good or bad).
  • The processors (the control unit 155 and the control unit 230) functioning as the functional units constituting the reading system 1 perform the reading process, as the method by which the reading system 1 reads aloud, by executing a reading program or the like.
  • Alternatively, the functional units may be realized by logic circuits (hardware) or dedicated circuits formed in integrated circuits (IC (Integrated Circuit) chips, LSI (Large Scale Integration)) or the like. These circuits may be realized by one or more integrated circuits, and the functions of the plurality of functional units shown in the above embodiment may be realized by a single integrated circuit.
  • An LSI may be called a VLSI, super LSI, ultra LSI, or the like depending on the degree of integration. That is, as shown in FIG. 9, each functional unit of the wearing tool 100 and the server 200 constituting the reading system 1 may be realized by physical circuits: the wearing tool 100 may comprise the wearable glass 110 including an imaging circuit 111a and a communication I/F circuit 112a, the earphone 130, and a controller including a communication I/F circuit 151a, a communication circuit 152a, a storage circuit 153a, an input circuit 154a, a control circuit 155a, and an output circuit 156a.
  • Similarly, the server 200 may be configured of a communication circuit 210a, a storage circuit 220a, and a control circuit 230a including an extraction circuit 231a and a conversion circuit 232a.
  • The above reading program may be recorded on a processor-readable recording medium; as the recording medium, a "non-transitory tangible medium" such as a tape, a disk, a card, a semiconductor memory, or a programmable logic circuit can be used. The reading program may also be supplied to the processor via any transmission medium (a communication network, broadcast waves, or the like) capable of transmitting it.
  • The present invention can also be realized in the form of a data signal embedded in a carrier wave, in which the above reading program is embodied by electronic transmission.
  • The above reading program can be implemented using, for example, a script language such as ActionScript or JavaScript (registered trademark), an object-oriented programming language such as Objective-C or Java (registered trademark), or a markup language such as HTML5.
  • Reference signs: 1 reading system, 100 wearing tool, 110 wearable glass, 111 imaging unit, 112 communication I/F, 130 earphone, 150 controller, 151 communication I/F, 152 communication unit, 153 storage unit, 154 input unit, 155 control unit, 156 output unit, 200 server, 210 communication unit, 220 storage unit, 230 control unit, 231 extraction unit, 232 conversion unit.

Abstract

This read-out system is equipped with: an imaging unit which is provided to a wearable device that is used worn on the body of a user, and which captures images of the forward direction of the user; an extraction unit which extracts written characters from the images captured by the imaging unit; a conversion unit which converts the characters extracted by the extraction unit into voice audio; an output unit which is provided to the wearable device and outputs the voice audio; an input unit which is provided to the wearable device and receives input from the user; and a control unit which, on the basis of the input from the user received via the input unit, controls the playback speed of the voice audio output from the output unit.

Description

読み上げシステム及び読み上げ方法Reading system and reading method
 本発明は、文章を音声に変換して読み上げる読み上げシステム及び読み上げ方法に関する。 The present invention relates to a reading system and a reading method for converting sentences into speech and reading them.
 近年、弱視者や文字を読むことが困難な読字障害者の視認を支援する機器の開発が行われている。例えば、特許文献1には、ロービジョン者が屋外で夜間等にも歩行ができるように、前方視界を撮像し表示することのできるウェアラブルディスプレイが開示されている。特許文献1のロービジョン者用ウェアラブルディスプレイによれば、撮像した画像のコントラスト及び明るさを変換して表示している。また、撮像画像に文字があった場合に文字認識処理を行ってその文字をユーザに音声で知らせることも開示している。 2. Description of the Related Art In recent years, devices have been developed to support visual recognition of people with low vision or people with reading disabilities who have difficulty reading letters. For example, Patent Document 1 discloses a wearable display capable of imaging and displaying a front view so that a low vision person can walk outdoors at night or the like. According to the low-vision person wearable display of Patent Document 1, the contrast and the brightness of the captured image are converted and displayed. It also discloses that when there is a character in a captured image, character recognition processing is performed to notify the user of the character by voice.
特開2014-165616号公報JP 2014-165616 A
 ところで、上記特許文献1に記載のロービジョン者用ウェアラブルディスプレイにおいては、文字認識処理によって、その文字をロービジョン者にスピーカにより伝達するとのみ記載しており具体的にどのように音声を伝えるかについては開示がない。また、ユーザによって音の聞こえ方は異なるため、特許文献1に記載のロービジョン者用ウェアラブルディスプレイの場合、ユーザビリティに欠けるという問題がある。 By the way, in the low-vision person wearable display described in Patent Document 1 described above, it is described only that the letter is transmitted to the low-vision person through the speaker by the character recognition processing, and specifically how the speech is transmitted There is no disclosure. In addition, since the way of hearing the sound differs depending on the user, in the case of the wearable display for low vision persons described in Patent Document 1, there is a problem that the usability is lacking.
 そこで、本発明は上記問題に鑑みて成されたものであり、使用するユーザにとって上記特許文献1に記載のロービジョン者用ウェアラブルディスプレイよりも利便性に優れた読み上げシステムを提供することを目的とする。 Therefore, the present invention has been made in view of the above problems, and it is an object of the present invention to provide a reading system that is more convenient for the user to use than the low vision wearable display described in Patent Document 1 above. Do.
 上記課題を解決するために、本発明の一態様に係る読み上げシステムは、ユーザが身に着けて使用する装着具に備えられ、ユーザの正面方向を撮像する撮像部と、撮像部が撮像した画像から文字を抽出する抽出部と、抽出部が抽出した文字を音声に変換する変換部と、装着具に備えられ、音声を出力する出力部と、装着具に備えられ、ユーザからの入力を受け付ける入力部と、装着具に備えられ、入力部を介して受け付けたユーザからの入力に基づいて、出力部から出力される音声の再生速度を制御する制御部とを備える。 In order to solve the above-mentioned subject, a reading system concerning one mode of the present invention is equipped with a wearing tool which a user wears and uses, and an image which an image pick-up part picturizes a user's front direction, and an image And a converter for converting characters extracted by the extracting unit into voice, an output unit for outputting voice and provided in the mounting tool, and a mounting tool for receiving an input from the user An input unit, and a control unit provided in the mounting tool and controlling the reproduction speed of the sound output from the output unit based on the input from the user received via the input unit.
 上記課題を解決するために、本発明の一態様に係る読み上げ方法は、ユーザが身に着けて使用する装着具に備えられた撮像部により、ユーザの正面方向を撮像する撮像ステップと、撮像ステップにおいて撮像した画像から文字を抽出する抽出ステップと、抽出ステップにおいて抽出した文字を音声に変換する変換ステップと、装着具に備えられた出力部から音声を出力する出力ステップと、装着具に備えられた入力部からユーザからの入力を受け付ける入力ステップと、入力部を介して受け付けたユーザからの入力に基づいて、出力部から出力する音声の再生速度を制御する制御ステップとを含む。 In order to solve the above problems, the reading method according to an aspect of the present invention includes an imaging step of imaging a front direction of a user by an imaging unit provided in a mounting tool worn and used by the user; An extraction step of extracting characters from the image captured in step, a conversion step of converting the characters extracted in the extraction step into voice, an output step of outputting voice from an output unit provided in the wearing tool, and The method further includes an input step of receiving an input from the user from the input unit, and a control step of controlling a reproduction speed of the sound output from the output unit based on the input from the user received through the input unit.
 また、上記読み上げシステムにおいて、制御部は、入力部を介して受け付けたユーザからの入力に基づいて、出力部から出力される音声を一時停止させることとしてもよい。 Further, in the above-mentioned reading system, the control unit may pause the sound output from the output unit based on the input from the user received via the input unit.
 また、上記読み上げシステムにおいて、制御部は、入力部を介して受け付けたユーザからの入力に基づいて、出力部から出力した音声を繰り返し再生することとしてもよい。 In the above-mentioned reading system, the control unit may repeatedly reproduce the voice output from the output unit based on the input from the user received via the input unit.
 また、上記読み上げシステムにおいて、出力部は、変換部が文字を音声に変換している間、文字の変換処理中であることを示す音を出力することとしてもよい。 Further, in the above-mentioned reading system, the output unit may output a sound indicating that the character conversion process is in progress while the conversion unit converts the characters into speech.
 また、上記読み上げシステムにおいて、読み上げシステムは、抽出部と変換部とを備えるサーバを含み、装着具は、撮像部が撮像した画像をサーバに送信する送信部と、変換部が変換した音声を受信する受信部とを備え、出力部は、送信部が画像を送信してから受信部が音声を受信するまでの間、文字の変換処理中であることを示す音を出力することとしてもよい。 Further, in the reading system, the reading system includes a server including an extraction unit and a conversion unit, and the mounting tool receives a transmission unit that transmits an image captured by the imaging unit to the server, and a voice converted by the conversion unit. The output unit may output a sound indicating that character conversion processing is in progress, from when the transmission unit transmits an image to when the reception unit receives an audio.
 また、上記読み上げシステムにおいて、装着具は、周囲の環境に関する環境情報を取得する取得部を備え、撮像部は、環境情報に基づいて撮像条件を変更することとしてもよい。 Further, in the above-mentioned reading system, the mounting tool may include an acquisition unit for acquiring environment information on the surrounding environment, and the imaging unit may change the imaging condition based on the environment information.
 また、上記読み上げシステムにおいて、装着具は、撮像部が撮像している撮像画像中に文字があるか否かを判定する判定部を備え、出力部は、判定部が撮像画像中に文字が含まれると判定しているときに、撮像部による撮像方向に文字が存在することを示す音声を出力することとしてもよい。 In the reading system, the mounting tool includes a determination unit that determines whether or not there is a character in the captured image captured by the imaging unit, and the output unit includes the character in the captured image. When it is determined that the character is present, a sound indicating that a character is present in the imaging direction by the imaging unit may be output.
 また、上記読み上げシステムにおいて、読み上げシステムは、撮像部が撮像した撮像画像と、当該撮像画像に基づいて変換部が変換して得られた音声とを対応付けて、ユーザの情報処理端末に送信するログ送信部を備えることとしてもよい。 Further, in the reading system, the reading system associates the captured image captured by the imaging unit with the voice obtained by converting the converting unit based on the captured image, and transmits the associated image to the user's information processing terminal A log transmission unit may be provided.
 また、上記読み上げシステムにおいて、装着具は、自端末の位置を示す位置情報を取得する位置情報取得部を備え、撮像部は、撮像した撮像画像に、撮像したときに位置情報が取得した位置情報を対応付け、ログ送信部は、撮像画像と音声と共に位置情報をユーザの情報処理端末に送信することとしてもよい。 In the reading system, the mounting tool includes a position information acquisition unit that acquires position information indicating the position of the own terminal, and the imaging unit acquires position information acquired when the imaging unit captures an image of the captured image. The log transmission unit may transmit position information to the user's information processing terminal together with the captured image and the sound.
 本発明の一態様に係る読み上げシステムは、文字から変換された音声の読み上げの速度を自由に調節することができる。したがって、ユーザビリティに優れた読み上げシステムを提供することができる。 The reading system according to an aspect of the present invention can freely adjust the speed of reading speech converted from characters. Therefore, it is possible to provide a reading system excellent in usability.
(a)は、装着具を装着しているユーザの外観例を示す図であり、(b)は、装着具を用いて撮像を行って読み上げを行う外観例を示す図である。(A) is a figure which shows the example of an external appearance of the user who mounts | wears with a mounting tool, (b) is a figure which shows the example of an external appearance which images by using a mounting tool and it reads aloud. 読み上げシステムのシステム構成例を示す図である。It is a figure showing an example of system configuration of a reading system. (a)は、装着具がサーバに送信するデータの構成例を示す図であり、(b)は、サーバがユーザ毎に記憶する読み上げ音声情報の構成例を示す図である。(A) is a figure which shows the structural example of the data which a mounting tool transmits to a server, (b) is a figure which shows the structural example of the read-out audio | voice information which a server memorize | stores for every user. 装着具とサーバとのやり取りを示すシーケンス図である。It is a sequence diagram showing exchange with a mounting tool and a server. 装着具の動作を示すフローチャートである。It is a flowchart which shows operation | movement of a mounting tool. サーバの動作を示すフローチャートである。It is a flowchart which shows operation | movement of a server. 画像から優先的に文字を抽出する範囲例を示す図である。FIG. 6 is a diagram illustrating an example of a range in which characters are preferentially extracted from an image. 地図を利用した読み上げ音声の再生を行うための画面例を示す図である。It is a figure which shows the example of a screen for reproducing the reading audio | voice using a map. 読み上げシステムのシステム構成の別例を示す図である。It is a figure which shows another example of a system configuration of a reading system.
 Hereinafter, a reading system according to one embodiment of the present invention will be described in detail with reference to the drawings.
<Embodiment>
<Configuration>
 FIG. 1(a) is a view showing an example of the appearance of a user wearing the wearing tool 100 of the reading system 1. FIG. 1(b) is a view showing an example of the appearance of imaging and reading aloud being performed using the wearing tool 100. FIG. 2 is a view showing an example of the system configuration of the reading system 1.
 As shown in FIGS. 1 and 2, the reading system 1 includes: an imaging unit 111 that is provided in the wearing tool 100 worn and used by a user and that images the front direction of the user; an extraction unit 231 that extracts characters from the image captured by the imaging unit 111; a conversion unit 232 that converts the characters extracted by the extraction unit 231 into speech; an output unit 156 that is provided in the wearing tool 100 and outputs the speech; an input unit 154 that is provided in the wearing tool 100 and receives input from the user; and a control unit 155 that is provided in the wearing tool 100 and controls the reproduction speed of the speech output from the output unit 156 based on the input received from the user via the input unit 154. The reading system 1 is described in detail below.
 As shown in FIGS. 1(a) and 1(b), the user 10 wears and uses the wearable glass 110. On the wearable glass 110, an imaging unit 111 is arranged at a position from which it can image the front direction of the user in accordance with an instruction from the user. The imaging unit 111 is a so-called camera. The wearable glass 110 is connected to the controller 150. In FIG. 1(a), the wearable glass 110 is shown connected by way of the earphone 130 via the cord 120 and the cable 140; however, the wearable glass 110 and the controller 150 may be directly connected, in the same way as the earphone 130 and the controller 150. The user 10 wears the earphone 130 in the ear and can listen to the read-out speech transmitted from the controller 150. As shown in FIG. 1(a), the user 10 holds the controller 150 and can use it to issue an imaging instruction and instructions related to reproduction of the read-out speech. As shown in FIG. 1(b), when the user issues an imaging instruction, the imaging unit 111 images the imaging range 160. The characters included in the imaging range 160 are then recognized, converted into machine-synthesized speech, and read aloud. The reading system 1 can therefore provide information on characters that are difficult to read to a person with low vision or the like.
 FIG. 2 shows an example of the system configuration of the reading system 1. The reading system 1 includes the wearing tool 100 and a server 200, which are configured to be able to communicate with each other via a network 300. The wearing tool 100 communicates with the network 300 by wireless communication; any communication protocol may be used as long as wireless communication can be performed. The server 200 also communicates with the network, by either wireless or wired communication, and likewise any communication protocol may be used as long as communication can be performed.
 As shown in FIG. 2, the wearing tool 100 includes the wearable glass 110, the earphone 130, and the controller 150. That is, in the present embodiment, as shown in FIG. 2, the wearable glass 110, the earphone 130, and the controller 150 are collectively referred to as the wearing tool 100. Although a wearable glass 110 is used here, it goes without saying that the device is not limited to eyeglasses as long as it can image the front direction (viewing direction) of the user 10.
 The wearable glass 110 includes the imaging unit 111 and a communication I/F 112.
 The imaging unit 111 is a camera capable of imaging the front direction of the user. The imaging unit 111 performs imaging upon receiving an imaging signal passed on from the communication I/F 112. The imaging unit 111 may be provided anywhere on the wearable glass 110 as long as it can image the front direction of the user: FIG. 1 shows an example in which it is provided at the left hinge of the wearable glass, but it may instead be provided at the right hinge or at the bridge. The imaging unit 111 transfers the captured image obtained by imaging to the communication I/F 112. The imaging unit 111 may also have a detection function that, while imaging sequentially, analyzes the captured images and detects the presence or absence of characters in them; in that case, when it determines that a captured image contains characters, it transfers to the communication I/F 112 a presence signal indicating that characters are present in the front direction of the user.
 The communication I/F 112 is a communication interface having a function of communicating with the controller 150. The communication I/F 112 is communicably connected to the communication I/F 151 of the controller 150. Here, as shown in FIG. 1, the connection is assumed to be wired, but it may be wireless. The communication I/F 112 transfers the imaging signal passed on from the communication I/F 151 of the controller 150 to the imaging unit 111. The communication I/F 112 also transfers to the communication I/F 151 the captured image passed on from the imaging unit 111 and the presence signal indicating that characters are present in the front direction of the user.
 The earphone 130 is connected to the output unit 156 of the controller 150 and has a function of outputting the audio signal transmitted from the output unit 156 as sound. Here, as shown in FIG. 1, the earphone 130 is connected to the controller 150 by wire, but the connection may be wireless. The earphone 130 outputs the read-out speech of characters detected from the captured image, a sound indicating that characters are being analyzed, and a sound indicating that characters are present in the front direction of the imaging unit 111.
 The controller 150 includes the communication I/F 151, a communication unit 152, a storage unit 153, the input unit 154, the control unit 155, and the output unit 156. As shown in FIG. 1, the units of the controller 150 are connected to one another by a bus.
 The communication I/F 151 is a communication interface having a function of communicating with the communication I/F 112 of the wearable glass 110. Upon receiving an imaging signal from the control unit 155, the communication I/F 151 transfers the imaging signal to the communication I/F 112. Upon receiving a captured image or a presence signal from the communication I/F 112, the communication I/F 151 passes it on to the control unit 155.
 The communication unit 152 is a communication interface having a function of communicating with the server 200 via the network 300. In accordance with instructions from the control unit 155, the communication unit 152 functions as a transmission unit that transmits the captured image to the server 200, and as a reception unit that receives from the server 200 the read-out speech obtained by converting the characters included in the captured image into speech. When the communication unit 152 receives read-out speech from the server 200, it passes the read-out speech on to the control unit 155.
 The storage unit 153 has a function of storing the various programs and data that the controller 150 needs in order to function. The storage unit 153 can be realized by, for example, an HDD (Hard Disc Drive), an SSD (Solid State Drive), or a flash memory, but is not limited to these. The storage unit 153 stores the reading program executed by the control unit 155, captured images captured by the imaging unit 111, the read-out speech information received by the communication unit 152, and the like. The storage unit 153 also stores audio information for the sound that the output unit 156 outputs between the time the communication unit 152 transmits a captured image to the server 200 and the time it receives the read-out speech, indicating that characters are being converted into speech. Furthermore, the storage unit 153 stores audio information for notifying the user 10 that characters are present in the imaging direction of the imaging unit 111.
 The input unit 154 has a function of receiving input from the user 10. The input unit 154 can be realized by, for example, hard keys provided on the controller 150, but it may also be realized by a touch panel or the like. The input unit 154 may include an imaging button 154A with which the user 10 instructs imaging, a playback button 154B for instructing playback, pause, and replay, and an adjustment button 154C for adjusting the playback speed of the speech. In response to each button press, the input unit 154 transfers a signal indicating which button was pressed to the control unit 155.
 The control unit 155 is a processor having a function of controlling each unit of the controller 150. By executing the various programs stored in the storage unit 153, the control unit 155 performs the functions to be executed as the controller 150.
 When an imaging instruction is passed on from the input unit 154, the control unit 155 instructs the communication I/F 151 to transmit an imaging signal to the wearable glass 110.
 When a captured image is passed on from the communication I/F 151, the control unit 155 instructs the communication unit 152 to transmit the captured image to the server 200. After that instruction, the control unit 155 reads from the storage unit 153 the sound indicating that the characters included in the captured image are being converted into speech, and instructs the output unit 156 to output it.
 When read-out speech is passed on from the communication unit 152, the control unit 155 instructs the output unit 156 to stop outputting the sound indicating that conversion is in progress. The control unit 155 then instructs the output unit 156 to output the read-out speech.
 When a presence signal is passed on from the communication I/F 151, the control unit 155 reads from the storage unit 153 the sound indicating that characters are present in the front direction of the user 10, and instructs the output unit 156 to output it.
 The control unit 155 also executes reproduction control processing of the read-out speech in accordance with instructions from the user 10 passed on from the input unit 154. For example, when a pause instruction is received, it instructs the output unit 156 to pause reproduction of the read-out speech.
 Also, for example, when a slow playback instruction is received, the control unit 155 instructs the output unit 156 to perform slow playback of the read-out speech. The slow playback instruction can be replaced by a playback speed adjustment instruction, and the control unit 155 can also make the playback speed of the read-out speech faster or slower. When a replay instruction is received, the control unit 155 instructs the output unit 156 to reproduce once more the read-out speech that has been output so far.
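 By way of illustration, the following is a minimal Python sketch of how such reproduction control might map input signals to pause, slow playback, speed adjustment, and replay. The AudioPlayer class and its members are assumptions introduced for this sketch only and are not part of the embodiment.

class AudioPlayer:
    """Illustrative stand-in for the output unit 156 under control of the control unit 155."""
    def __init__(self):
        self.speed = 1.0        # 1.0 = normal playback speed
        self.paused = False
        self.last_clip = None   # most recently played read-out speech

    def play(self, clip, speed=None):
        self.last_clip = clip
        if speed is not None:
            self.speed = speed
        self.paused = False
        # actual audio output to the earphone 130 would happen here

    def handle_input(self, signal):
        # 'signal' corresponds to a button press passed on by the input unit 154
        if signal == "pause":
            self.paused = True          # pause until playback is instructed again
        elif signal == "slow":
            self.speed = 0.5            # slow playback
        elif signal == "fast":
            self.speed = 1.5            # faster playback, e.g. to skim the content
        elif signal == "replay" and self.last_clip is not None:
            self.play(self.last_clip)   # replay the last read-out speech

 For example, handle_input("slow") corresponds to the slow playback instruction above, and handle_input("replay") to the replay instruction.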
 The output unit 156 has a function of outputting the audio signal instructed by the control unit 155 to the earphone 130. The output unit 156 outputs to the earphone 130 the read-out speech, the sound indicating that characters are being converted into speech, and the sound indicating that characters are present in the front direction of the user 10.
 This concludes the description of the configuration of the wearing tool 100.
 Next, the server 200 will be described. As shown in FIG. 2, the server 200 includes a communication unit 210, a storage unit 220, and a control unit 230, which are connected to one another via a bus.
 The communication unit 210 is a communication interface having a function of communicating with the wearing tool 100 (controller 150) via the network 300. In accordance with instructions from the control unit 230, the communication unit 210 functions as a transmission unit that transmits the read-out speech to the wearing tool 100, and as a reception unit that receives captured images. When the communication unit 210 receives a captured image from the wearing tool 100, it passes the captured image on to the control unit 230.
 The storage unit 220 stores the various programs and data that the server 200 needs for its operation. The storage unit 220 can be realized by, for example, an HDD (Hard Disc Drive), an SSD (Solid State Drive), or a flash memory, but is not limited to these. The storage unit 220 stores a character recognition program for extracting characters from an image, a speech conversion program for converting the recognized characters into speech, and read-out speech information. The read-out speech information will be described in detail later.
 The control unit 230 is a processor having a function of controlling each unit of the server 200. By executing the various programs stored in the storage unit 220, the control unit 230 performs the functions to be executed as the server 200. The control unit 230 functions as the extraction unit 231 by executing the character recognition program, and as the conversion unit 232 by executing the speech conversion program.
 The extraction unit 231 has a function of analyzing a captured image and extracting the characters included in it. Existing character recognition processing can be used for this analysis.
 The conversion unit 232 has a function of converting the characters extracted by the extraction unit 231 into speech (read-out speech). Existing conversion processing can be used for this conversion.
 This concludes the description of the configuration of the server 200.
<Data>
 FIG. 3 is a view showing an example of the data configuration of the data involved in the reading system 1.
 FIG. 3(a) is a view showing an example of the data configuration (format) of the transmission data 310 (captured image) that the wearing tool 100 (controller 150) transmits to the server 200.
 As shown in FIG. 3(a), the transmission data 310 is information in which a user ID 311, captured image information 312, and imaging time information 313 are associated with one another.
 The user ID 311 is identification information that uniquely identifies the user 10 who uses the wearing tool 100. This allows the server 200 to identify which user a captured image came from, and to manage captured images and generated read-out speech for each user.
 The captured image information 312 is information indicating the actual data of the captured image captured by the imaging unit 111.
 The imaging time information 313 is information indicating the date and time at which the captured image indicated by the captured image information 312 was captured. Although not shown, this information can be acquired from the internal clock of the imaging unit 111.
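 As an illustration of this format, the following Python sketch models the transmission data 310 as a structure with the three associated fields. The JSON serialization is an assumption, since the embodiment does not prescribe a particular wire format.

import base64
import json
from dataclasses import dataclass
from datetime import datetime

@dataclass
class TransmissionData:
    user_id: str           # user ID 311: uniquely identifies the user 10
    image: bytes           # captured image information 312: actual image data
    captured_at: datetime  # imaging time information 313: date and time of imaging

    def to_json(self) -> str:
        return json.dumps({
            "user_id": self.user_id,
            "image": base64.b64encode(self.image).decode("ascii"),
            "captured_at": self.captured_at.isoformat(),
        })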
 FIG. 3(b) is a view showing an example of the data configuration of the read-out speech information stored in the storage unit 220 of the server 200 and managed for each user of the reading system 1. This data is information for managing the read-out speech that the server 200 has obtained through past conversions.
 As shown in FIG. 3(b), the read-out speech information 320 is information in which imaging time information 321, captured image information 322, and read-out speech 323 are associated with one another.
 The imaging time information 321 is information indicating the date and time at which the corresponding captured image was captured, and is the same information as the imaging time information 313.
 The captured image information 322 is information indicating the actual data of the captured image, and is the same information as the captured image information 312.
 The read-out speech 323 is actual data representing the read-out speech obtained by the extraction unit 231 extracting characters from the corresponding captured image information 322 and the conversion unit 232 converting those characters.
 The read-out speech information 320 enables the server 200 to manage past read-out speech.
 This concludes the description of the information mainly involved in the reading system 1.
<Operation>
 The operation of the reading system 1 will now be described. First, the overall operation of the reading system 1 is explained using the sequence diagram shown in FIG. 4, after which the detailed operations of the wearing tool 100 and the server 200 are explained using the flowcharts of FIGS. 5 and 6, respectively.
 FIG. 4 is a sequence diagram showing the exchange between the wearing tool 100 and the server 200. As shown in FIG. 4, the wearing tool 100 performs imaging in the front direction of the user 10 (step S401). The wearing tool 100 then transmits the obtained captured image to the server 200 (step S402).
 The server 200 receives the captured image transmitted from the wearing tool 100 (step S403). The server 200 then extracts characters from the received captured image (step S404), converts the extracted characters into speech, and generates the read-out speech (step S405). Having generated the read-out speech, the server 200 transmits it to the wearing tool 100 (step S406).
 The wearing tool 100 receives the read-out speech transmitted from the server 200 (step S407) and outputs it (step S408). In this way, the reading system 1 can recognize characters present in the front direction (viewing direction) of the user 10 and convey them to the user 10 by sound.
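 From the wearing tool's side, this exchange could be sketched in Python as follows; the endpoint URL and the use of the requests library are assumptions for illustration only.

import requests

def read_aloud(image_bytes: bytes, user_id: str) -> bytes:
    # S402: transmit the captured image to the server
    resp = requests.post(
        "https://server.example/read",   # hypothetical endpoint of the server 200
        files={"image": image_bytes},
        data={"user_id": user_id},
        timeout=30,
    )
    resp.raise_for_status()
    # S407: the response body is the read-out speech (audio data)
    return resp.content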
 FIG. 5 is a flowchart showing the operation of the wearing tool 100.
 First, the input unit 154 of the wearing tool 100 determines whether there has been input from the user, based on whether any of the buttons has been pressed (step S501). If there has been input from the user (YES in step S501), the process proceeds to step S502; if not (NO in step S501), the process proceeds to step S512.
 In step S502, the control unit 155 determines whether the input received by the input unit 154 was an imaging instruction (step S502). If the input was an imaging instruction (YES in step S502), the process proceeds to step S503; if not (NO in step S502), the process proceeds to step S506.
 In step S503, when the input unit 154 receives an imaging instruction from the user, the imaging instruction is passed on to the control unit 155. In response, the control unit 155 instructs the communication I/F 151 to transfer an imaging signal to the wearable glass 110. The communication I/F 151 transfers the imaging signal to the communication I/F 112 in accordance with that instruction. The communication I/F 112 then transfers the imaging signal to the imaging unit 111, and the imaging unit 111 executes imaging (step S503).
 The imaging unit 111 transfers the obtained captured image to the communication I/F 112, which transfers it to the communication I/F 151. The communication I/F 151 passes the captured image on to the control unit 155, and the control unit 155 instructs the communication unit 152 to transmit it to the server 200. In response, the communication unit 152 transmits the captured image to the server 200 via the network 300 (step S504).
 After the captured image has been transmitted, the control unit 155 reads from the storage unit 153 the sound indicating that the characters in the captured image are being converted into speech, and instructs the output unit 156 to output it. In response, the output unit 156 outputs the sound to the earphone 130, the earphone 130 plays it (step S505), and the process returns to step S501. By playing a sound indicating that the characters included in the captured image are being converted into speech, the user 10 can recognize that the conversion process is currently underway, and can wait without irritation compared with a case in which no sound is played (and the user 10 receives no notification at all).
 On the other hand, if it is determined in step S502 that the input was not an imaging instruction (NO in step S502), it is determined whether the input was an instruction to pause the speech (step S506). If the input was a pause instruction (YES in step S506), the control unit 155 instructs the output unit 156 to pause the speech being output. Upon receiving the instruction, the output unit 156 stops outputting the speech (step S507), and the process returns to step S501. The pause continues until a new playback instruction is input or until a complete stop instruction is input.
 If it is determined in step S506 that the input was not a pause instruction (NO in step S506), it is determined whether the input was a slow playback instruction (step S508). If the input was a slow playback instruction (YES in step S508), the control unit 155 instructs the output unit 156 to play the speech being output slowly. In response, the output unit 156 starts slow playback of the speech (step S509), and the process returns to step S501. This allows even a user who has difficulty following fast speech to recognize the speech correctly. Although slow playback is used as the example here, the playback speed may instead be increased, as described above. Increasing the playback speed helps shorten the time needed to grasp the outline of the characters included in the captured content.
 If it is determined in step S508 that the input was not a slow playback instruction (NO in step S508), it is determined whether the input was a playback or replay input (step S510). If the input was not a playback or replay input (NO in step S510), the process returns to step S501. If it was (YES in step S510), the control unit 155 instructs the output unit 156 to resume output of the paused speech or to replay speech that has already been output. In response, the output unit 156 resumes or replays the speech (step S511), and the process returns to step S501. This allows the user 10 to listen to the read-out speech again even after missing it.
 If there is no input from the user in step S501 (NO in step S501), the control unit 155 determines whether read-out speech has been received from the server 200 (step S512). If no read-out speech has been received (NO in step S512), the process returns to step S501.
 If read-out speech has been received (YES in step S512), the control unit 155 first instructs the output unit 156 to stop outputting the sound indicating that characters are being converted into speech. Upon receiving the instruction, the output unit 156 stops outputting that sound (step S513).
 The control unit 155 then instructs the output unit 156 to output the read-out speech passed on from the communication unit 152. The output unit 156 starts outputting the read-out speech passed on from the control unit 155 (step S514), and the process returns to step S501.
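 Gathering the branches of this flowchart together, a compact Python sketch of the controller loop might look as follows. The helper objects (input_unit, comm, output_unit, player) are assumptions standing in for the units described above, with player as in the earlier playback sketch.

def controller_loop(input_unit, comm, output_unit, player):
    while True:
        signal = input_unit.poll()                     # S501: check for user input
        if signal == "capture":                        # S502-S505
            image = comm.request_capture()             # S503: imaging via the wearable glass
            comm.send_to_server(image)                 # S504: transmit to the server 200
            output_unit.play_converting_tone()         # S505: "converting" sound
        elif signal == "pause":                        # S506-S507
            player.handle_input("pause")
        elif signal == "slow":                         # S508-S509
            player.handle_input("slow")
        elif signal in ("play", "replay"):             # S510-S511: resume or replay
            player.handle_input("replay")
        else:
            speech = comm.receive_speech()             # S512: read-out speech received?
            if speech is not None:
                output_unit.stop_converting_tone()     # S513
                player.play(speech)                    # S514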
 This concludes the description of the operation of the wearing tool 100 (controller 150).
 FIG. 6 is a flowchart showing the operation of the server 200 when it receives a captured image from the wearing tool 100.
 First, the communication unit 210 of the server 200 receives a captured image from the wearing tool 100 via the network 300 (step S601). The communication unit 210 passes the received captured image on to the control unit 230.
 The control unit 230, acting as the extraction unit 231, analyzes the passed-on captured image and extracts characters (step S602). The extraction unit 231 passes the extracted character string on to the conversion unit 232.
 The conversion unit 232 converts the extracted character string into speech (step S603) and generates the read-out speech, which is machine-synthesized speech. The conversion unit 232 passes the generated read-out speech on to the communication unit 210.
 The communication unit 210 transmits the converted synthesized speech to the wearing tool 100 via the network 300 as the read-out speech (step S604).
 After that, the control unit 230 registers the received captured image, the date and time at which it was captured, and the read-out speech obtained from it in the read-out speech information as the captured image information 322, the imaging time information 321, and the read-out speech 323, respectively (step S605), and the processing ends.
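 As an illustration of this server-side flow, the following Python sketch uses pytesseract for character extraction and gTTS for speech conversion. These particular libraries are assumptions made for the sketch; the embodiment only requires that existing character recognition and speech conversion processing be used.

import io
from datetime import datetime

from PIL import Image
import pytesseract            # stands in for the character recognition program
from gtts import gTTS         # stands in for the speech conversion program

read_out_log = {}             # per-user read-out speech information 320

def handle_image(user_id: str, image_bytes: bytes, captured_at: datetime) -> bytes:
    # S602: extract characters from the received captured image
    text = pytesseract.image_to_string(Image.open(io.BytesIO(image_bytes)))
    # S603: convert the extracted character string into read-out speech
    buf = io.BytesIO()
    gTTS(text).write_to_fp(buf)
    speech = buf.getvalue()
    # S605: register image, time, and speech as read-out speech information
    read_out_log.setdefault(user_id, []).append((captured_at, image_bytes, speech))
    return speech             # S604: returned for transmission to the wearing tool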
 This concludes the operation of the server 200. By executing the operations described above, the reading system 1 does not merely read recognized characters aloud, but can reproduce the speech in a way that is easy for the user to listen to.
<Summary>
 The reading system 1 can recognize characters included in a captured image and output them as speech. In the reading system 1, the user can perform operations such as slow playback, pause, and replay on the read-out speech, so each user can reproduce the speech in the way that is easiest for them to hear. It is therefore possible to provide a reading system that is highly convenient for the user. Furthermore, while the reading system 1 is executing the process of generating read-out speech from a captured image, it plays a sound indicating that processing is in progress, allowing the user 10 to recognize the situation.
<Supplement>
 It goes without saying that the reading system according to the above embodiment is not limited to that embodiment and may be realized by other methods. Various modifications are described below.
 (1) In the above embodiment, speech is output using the earphone 130; however, the wearable glass 110 or the controller 150 may be provided with a speaker, and the output unit 156 may output the read-out speech from that speaker. With this configuration, even a user who finds wearing the earphone 130 uncomfortable can listen to the read-out speech. This case also has the advantage that multiple users can listen to the speech at the same time.
 (2) In the above embodiment, an example was shown in which the wearing tool 100 includes the wearable glass 110, the earphone 130, and the controller 150, each configured as a separate device. However, this is not a limitation, and the wearable glass 110, the earphone 130, and the controller 150 may be formed as a single unit. That is, the wearable glass 110 may include a speaker in place of the sound output function of the earphone 130 and may hold the functions of the controller 150. For example, the temple portion of the wearable glass 110 may have a hollow structure housing the processor, memory, communication module, and so on of the controller 150. Various buttons for speech playback control and imaging instructions may then be arranged on the outer side of the temple or rim of the wearable glass 110.
 (3) In the above embodiment, the wearing tool 100 and the server 200 were described as separate devices, but the wearing tool 100 may include the functions of the server 200 (the functions of the extraction unit and the conversion unit). For example, the controller 150 may be configured to include a chip that realizes the functions of the server 200. With this configuration, the wearing tool 100 can constitute a standalone reading system. It also suppresses the latency involved in transmitting captured images and receiving read-out speech.
 (4) In the above embodiment, the range from which characters are extracted from a captured image is determined in advance, but this is not the only option. For example, the wearable glass 110 may be provided with a camera that images the user's eyes; the gaze direction is detected, a predetermined range centered on that gaze direction is mapped onto the captured image, and characters within that predetermined range are detected. For example, the wearable glass 110 passes on to the controller 150 a first captured image captured by the imaging unit 111 and a second captured image of the user's eyes, and the controller 150 transmits both images to the server 200. The extraction unit 231 of the server 200 may be configured to identify the gaze direction of the user 10 from the second captured image, identify a predetermined range including that gaze direction, and extract characters from the portion of the first captured image corresponding to that predetermined range.
 (5) In the above embodiment, the imaging unit 111 performs imaging upon receiving an imaging instruction input to the controller 150, but the imaging trigger is not limited to this. For example, the wearable glass 110 or the controller 150 may be provided with a microphone that picks up the user's voice, and imaging may be performed based on a specific word uttered by the user. That is, imaging may be triggered by voice input.
 Alternatively, the wearable glass 110 may be provided with a camera that images the user's eyes, and a blink of the user's eyes may be used as the imaging trigger.
 (6) In the above embodiment, the input unit 154 is provided on the controller 150, but this is not a limitation; it may instead be provided partway along the cable 140.
 (7) Although not specifically described in the above embodiment, the reading system 1 may include a setting unit that can set the language of the read-out speech. It may then include a translation unit that translates the characters extracted by the extraction unit 231 into the language set in the setting unit, and the conversion unit 232 may convert the characters translated by the translation unit into speech. With this configuration, the reading system 1 can function as an interpretation system for written characters, making it useful not only for people with low vision but also for users in a foreign country.
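 A Python sketch of this pipeline order (extract, then translate, then convert) is shown below. The translate callable is deliberately left abstract, since the embodiment allows any existing machine translation processing to serve as the translation unit.

from typing import Callable

def read_aloud_translated(extract: Callable[[bytes], str],
                          translate: Callable[[str, str], str],
                          synthesize: Callable[[str], bytes],
                          image: bytes,
                          target_lang: str) -> bytes:
    text = extract(image)                      # extraction unit 231
    translated = translate(text, target_lang)  # translation unit of this supplement
    return synthesize(translated)              # conversion unit 232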
 (8) Although not specifically described in the above embodiment, the extraction unit 231 may restrict the range from which characters are extracted to a predetermined range rather than the entire captured image. FIG. 7 shows an example of a captured image 700; the extraction unit 231 may treat only a predetermined range 710 of this captured image 700 as the range from which characters are extracted. Alternatively, the predetermined range 710 may be treated as the range from which characters are preferentially extracted. Extracting characters preferentially from a range means first treating that range as the extraction range, and extracting characters from outside the predetermined range 710 only when no characters could be extracted from within it.
 Here, the predetermined range 710 may be set by the user of the reading system 1. In general, users tend to look in a direction slightly below straight ahead, so it is effective to set the predetermined range 710 toward the lower part of the captured image 700.
 The predetermined range 710 may also be set by the control unit 230. Specifically, for the large number of captured images received by the server 200, the ranges from which characters could be extracted are identified, and their average range may be used as the predetermined range 710 for extracting characters.
 Furthermore, the wearable glass 110 may be provided with various sensors, and the predetermined range 710 may be determined based on the sensing data obtained from them. For example, a gyro sensor may be mounted on the wearable glass 110, and the wearing tool 100 transmits the sensing data of the gyro sensor to the server 200 together with the captured image. The extraction unit 231 may then determine the predetermined range 710 based on the sensing data of the gyro sensor. For example, when it is estimated from the sensing data that the user 10 is looking somewhat downward, the predetermined range 710 may be set toward the lower part of the captured image 700.
 By not analyzing the entire captured image 700, the time required for the conversion into speech can be shortened.
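 By way of illustration, preferential extraction could be sketched in Python as follows. The fixed lower-centered crop box is an assumption; as described above, the range could instead come from user settings, usage statistics, or sensing data.

from PIL import Image
import pytesseract

def extract_with_priority_range(img: Image.Image) -> str:
    w, h = img.size
    # assumed predetermined range 710, biased toward the lower part of the image
    box = (int(w * 0.1), int(h * 0.4), int(w * 0.9), int(h * 0.95))
    text = pytesseract.image_to_string(img.crop(box))
    if text.strip():
        return text                            # characters found within range 710
    return pytesseract.image_to_string(img)    # fall back to the whole image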
 (9) Although not specifically described in the above embodiment, the server 200 may be configured to transmit the corresponding read-out speech information 320 as a past log to an information processing apparatus such as a PC held by the user 10. With this configuration, the user 10 can listen to past read-out speech at any time.
 Furthermore, the wearing tool 100 may include a position information acquisition unit for acquiring position information indicating where the device itself is located. The position information acquisition unit can be realized by using, for example, GPS or GNSS.
 Each time the imaging unit 111 obtains a captured image, the position information acquisition unit acquires position information and associates it with the captured image. The wearing tool 100 transmits the captured image with the associated position information to the server 200. The server 200 may further associate and manage imaging position information indicating the imaging position as part of the read-out speech information 320.
 Then, since information including the position information is transmitted from the server 200 as the read-out speech information 320 to the information processing apparatus of the user 10, the information processing apparatus can further present the read-out speech together with a map application, as shown in FIG. 8. That is, the user 10 can see on the map when and where each read-out speech was acquired. By positioning the cursor 803 over log information 801 or 802 on the map and clicking, the information processing apparatus may play the read-out speech with audio playback software or the like. For example, as shown in the map 800 of FIG. 8, the presence of log information 801 and log information 802 makes it possible to see where the captured image underlying each read-out speech was taken.
 (10) Although not described in detail as an operation of the wearing tool 100 in the above embodiment, the imaging unit 111 may perform imaging sequentially and detect whether the obtained captured images contain characters. When it detects that characters are included, it notifies the controller 150, and the control unit 155 may play a sound for making the user 10 aware that characters are present in the current front direction. The user 10 can then input an imaging instruction to the input unit 154 at that timing. With this configuration, when the user 10 has low vision, and in particular is blind, and cannot visually confirm even that characters exist, the user 10 can be made aware of their presence, providing a reading system 1 that is highly convenient for the user 10.
 (11) Although not specifically described in the above embodiment, the imaging unit 111 may change the imaging conditions according to the environment in which the user (wearable glass 110) is placed. For example, the wearable glass 110 may include various sensors (for example, an illuminance sensor) and change the exposure time and the angle of view.
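 For example, choosing an exposure time from an illuminance reading could be sketched as follows; the thresholds and values are illustrative assumptions only.

def exposure_time_ms(lux: float) -> float:
    # longer exposure in dim environments, shorter in bright ones
    if lux < 50:       # dim indoor scene
        return 100.0
    if lux < 500:      # ordinary indoor lighting
        return 33.0
    return 8.0         # bright or outdoor scene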
 (12) Although not specifically described in the above embodiment, when the server 200 cannot extract characters from an image, cannot convert the extracted characters into speech, or the image contains no characters, it may transmit an error signal to the wearing tool 100, and upon receiving it the wearing tool 100 may output a sound indicating the error from the output unit 156. In addition to the error sound and the sound indicating that conversion is in progress described in the above embodiment, various other sounds may be stored in the storage unit 153, for example a startup sound for when the wearing tool 100 is activated, an imaging sound (shutter sound) for when the imaging unit 111 performs imaging, a sound indicating standby, and a cancel sound for when the user inputs a cancellation of processing; the control unit 155 may then have the output unit 156 output the sound corresponding to the state of the wearing tool 100. Also, when the communication unit 152 cannot communicate (cannot connect to the network), a sound indicating this may be output from the output unit 156. By outputting sounds corresponding to these various states, the wearing tool 100 can notify the user of the state of the device by sound alone.
 (13) Although not specifically described in the above embodiment, the server 200 may change the manner of the generated read-out speech according to the location in the captured image from which the characters were extracted, or according to the proportion of the captured image occupied by the range from which the characters were extracted.
 Changing the manner of the speech according to the location of extraction means changing the direction from which the user hears the speech according to where in the captured image the characters were extracted. For example, when the characters were extracted from the right side of the captured image, the output unit 156 may output the read-out speech so that it is heard from the user's right side. This configuration allows the user to intuitively sense in which direction, as seen from the user, the characters being read aloud are located.
 Changing the manner of the generated read-out speech according to the proportion of the captured image occupied by the extraction range may mean changing the volume of the read-out speech according to that proportion. That is, percentages of the proportion may be stored in association with output volumes for the read-out speech; the output volume is determined by matching the percentage of the captured image occupied by the range from which the characters were extracted, and the read-out speech is output at the determined volume.
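 By way of illustration, both variations could be sketched in Python as follows; the linear mappings from character location to stereo pan and from area ratio to volume are assumptions.

def speech_rendering(char_box, image_size):
    # char_box: (x0, y0, x1, y1) bounding box of the extracted characters
    # image_size: (width, height) of the captured image
    (x0, y0, x1, y1), (w, h) = char_box, image_size
    center_x = (x0 + x1) / 2
    pan = (center_x / w) * 2 - 1                 # -1 = heard from the left, +1 = from the right
    area_ratio = ((x1 - x0) * (y1 - y0)) / (w * h)
    volume = min(1.0, 0.3 + 0.7 * area_ratio)    # larger extracted range -> louder speech
    return pan, volume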
 (14) In the above embodiment, the transmission data 310 associates the user ID 311, the captured image information 312, and the imaging time information 313, but various other information may also be associated with it. For example, as shown in the supplements above, position information indicating where the wearing tool 100 is located, and sensing data from a gyro sensor, acceleration sensor, or the like capable of identifying the posture of the wearing tool 100, may also be associated.
 Likewise, although the read-out speech information associates the imaging time information 321, the captured image information 322, and the read-out speech 323, other information may also be associated with it, such as the text data of the characters obtained by analyzing the captured image, and the position information and sensing data included in the transmission data 310.
 By accumulating and aggregating more of these various kinds of information, the read-out speech information can be used as a life log for each user. The server 200 may then include a provision unit that, upon request from a user, provides designated items from the accumulated information. For example, by accumulating position information, it is possible to provide information on the user's amount of movement per unit time (for example, per day) or on where the user has been; by using gyro sensor information to identify the user's posture, it is also possible to provide posture information (for example, whether the posture is good or bad).
 (15)上記実施の形態においては、読み上げシステム1が音声の読み上げを実行する手法として、読み上げシステム1を構成する各機能部として機能するプロセッサ(制御部155、制御部230)が読み上げプログラム等を実行することにより、読み上げ処理を実行することとしているが、これは装置に集積回路(IC(Integrated Circuit)チップ、LSI(Large Scale Integration))等に形成された論理回路(ハードウェア)や専用回路を組み込むことによって実現してもよい。また、これらの回路は、1または複数の集積回路により実現されてよく、上記実施の形態に示した複数の機能部の機能を1つの集積回路により実現されることとしてもよい。LSIは、集積度の違いにより、VLSI、スーパーLSI、ウルトラLSIなどと呼称されることもある。すなわち、図9に示すように、読み上げシステム1を構成する装着具100及びサーバ200における各機能部は、物理的な回路により実現されてもよい。即ち、図9に示すように、装着具100は、撮像回路111aと通信I/F回路112aとを備えるウェアラブルグラス110と、イヤホン130と、通信I/F回路151aと、通信部152aと、記憶回路153aと、入力回路154aと、制御回路155aと、出力回路156aとから構成されてよく、上記実施の形態において対応する各機能部と同様の機能を有することとしてよい。そして、同様に、サーバ200も、通信回路210aと、記憶回路220aと、抽出回路231a及び変換回路232aとを含む制御回路230aとから構成されてよい。 (15) In the above embodiment, the processor (the control unit 155, the control unit 230) functioning as each functional unit constituting the reading system 1 performs a reading program etc. as a method for the reading system 1 to read the voice. By performing this processing, the reading processing is performed. This is a logic circuit (hardware) or a dedicated circuit formed in an integrated circuit (IC (Integrated Circuit) chip, LSI (Large Scale Integration)) or the like. It may be realized by incorporating In addition, these circuits may be realized by one or more integrated circuits, and the functions of the plurality of functional units shown in the above embodiments may be realized by one integrated circuit. An LSI may be called a VLSI, a super LSI, an ultra LSI, or the like depending on the degree of integration. That is, as shown in FIG. 9, each functional unit in the mounting tool 100 and the server 200 constituting the reading system 1 may be realized by a physical circuit. That is, as shown in FIG. 9, the wearing tool 100 includes a wearable glass 110 including an imaging circuit 111a and a communication I / F circuit 112a, an earphone 130, a communication I / F circuit 151a, a communication unit 152a, and a memory. It may be composed of the circuit 153a, the input circuit 154a, the control circuit 155a, and the output circuit 156a, and may have the same function as that of the corresponding functional units in the above embodiment. Similarly, the server 200 may also be configured of a communication circuit 210a, a storage circuit 220a, and a control circuit 230a including an extraction circuit 231a and a conversion circuit 232a.
 The reading program may be recorded on a processor-readable recording medium, and as the recording medium, a "non-transitory tangible medium" such as a tape, a disk, a card, a semiconductor memory, or a programmable logic circuit can be used. The reading program may also be supplied to the processor via any transmission medium (a communication network, a broadcast wave, or the like) capable of transmitting it. The present invention can also be realized in the form of a data signal embedded in a carrier wave, in which the reading program is embodied by electronic transmission.
 The reading program can be implemented using, for example, a scripting language such as ActionScript or JavaScript (registered trademark), an object-oriented programming language such as Objective-C or Java (registered trademark), or a markup language such as HTML5.
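 Whatever the language chosen, the core flow of the reading program is the one described in the embodiment. The following minimal sketch (written in Python purely for brevity) summarizes one capture-to-playback pass; imaging_unit.capture, server.extract_characters, server.convert_to_voice, and output_unit.play are hypothetical stand-ins, not APIs defined by this disclosure.

    def read_out_once(imaging_unit, server, output_unit, speed=1.0):
        # One pass of the read-out flow: image the user's front direction,
        # extract characters, convert them to voice, and play the result
        # at the reproduction speed chosen via the user's input.
        image = imaging_unit.capture()             # imaging unit (111)
        text = server.extract_characters(image)    # extraction unit (231)
        if not text:
            return                                 # no characters to read out
        voice = server.convert_to_voice(text)      # conversion unit (232)
        output_unit.play(voice, speed=speed)       # output unit (156)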
 (16) The configurations described in the above embodiment and in each of the supplements may be combined as appropriate.
1   reading system
100 mounting tool
110 wearable glass
111 imaging unit
112 communication I/F
130 earphone
150 controller
151 communication I/F
152 communication unit
153 storage unit
154 input unit
155 control unit
156 output unit
200 server
210 communication unit
220 storage unit
230 control unit
231 extraction unit
232 conversion unit

Claims (10)

  1.  A reading system comprising:
     an imaging unit, provided in a mounting tool worn and used by a user, that images the front direction of the user;
     an extraction unit that extracts characters from an image captured by the imaging unit;
     a conversion unit that converts the characters extracted by the extraction unit into voice;
     an output unit, provided in the mounting tool, that outputs the voice;
     an input unit, provided in the mounting tool, that receives input from the user; and
     a control unit, provided in the mounting tool, that controls the reproduction speed of the voice output from the output unit based on the input from the user received via the input unit.
  2.  The reading system according to claim 1, wherein the control unit pauses the voice output from the output unit based on the input from the user received via the input unit.
  3.  The reading system according to claim 1, wherein the control unit repeatedly reproduces the voice output from the output unit based on the input from the user received via the input unit.
  4.  The reading system according to claim 1, wherein the output unit outputs a sound indicating that character conversion processing is in progress while the conversion unit is converting the characters into voice.
  5.  The reading system according to claim 4, wherein:
     the reading system includes a server comprising the extraction unit and the conversion unit;
     the mounting tool comprises a transmission unit that transmits the image captured by the imaging unit to the server, and a reception unit that receives the voice converted by the conversion unit; and
     the output unit outputs a sound indicating that the character conversion processing is in progress from when the transmission unit transmits the image until when the reception unit receives the voice.
  6.  The reading system according to claim 1, wherein the mounting tool comprises an acquisition unit that acquires environmental information on the surrounding environment, and the imaging unit changes its imaging conditions based on the environmental information.
  7.  The reading system according to claim 1, wherein the mounting tool comprises a determination unit that determines whether characters are present in the captured image being captured by the imaging unit, and the output unit outputs a voice indicating that characters are present in the imaging direction of the imaging unit when the determination unit determines that characters are included in the captured image.
  8.  The reading system according to claim 1, further comprising a log transmission unit that associates the captured image captured by the imaging unit with the voice obtained by the conversion unit based on that captured image, and transmits them to an information processing terminal of the user.
  9.  The reading system according to claim 8, wherein the mounting tool comprises a position information acquisition unit that acquires position information indicating the position of the mounting tool itself; the imaging unit associates each captured image with the position information acquired by the position information acquisition unit at the time of imaging; and the log transmission unit transmits the position information together with the captured image and the voice to the information processing terminal of the user.
  10.  A reading method comprising:
      an imaging step of imaging the front direction of a user with an imaging unit provided in a mounting tool worn and used by the user;
      an extraction step of extracting characters from the image captured in the imaging step;
      a conversion step of converting the characters extracted in the extraction step into voice;
      an output step of outputting the voice from an output unit provided in the mounting tool;
      an input step of receiving input from the user via an input unit provided in the mounting tool; and
      a control step of controlling the reproduction speed of the voice output from the output unit based on the input from the user received via the input unit.
PCT/JP2018/031366 2017-08-24 2018-08-24 Read-out system and read-out method WO2019039591A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2017160888A JP2019040005A (en) 2017-08-24 2017-08-24 Reading aloud system and reading aloud method
JP2017-160888 2017-08-24

Publications (2)

Publication Number Publication Date
WO2019039591A1 true WO2019039591A1 (en) 2019-02-28
WO2019039591A4 WO2019039591A4 (en) 2019-05-09

Family

ID=65440089

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2018/031366 WO2019039591A1 (en) 2017-08-24 2018-08-24 Read-out system and read-out method

Country Status (2)

Country Link
JP (1) JP2019040005A (en)
WO (1) WO2019039591A1 (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP6773844B1 (en) * 2019-06-12 2020-10-21 株式会社ポニーキャニオン Information processing terminal and information processing method
CN110991455B (en) * 2020-02-11 2023-05-05 上海肇观电子科技有限公司 Image text broadcasting method and equipment, electronic circuit and storage medium thereof
US11776286B2 (en) 2020-02-11 2023-10-03 NextVPU (Shanghai) Co., Ltd. Image text broadcasting
CN116997947A (en) * 2021-03-30 2023-11-03 升旗株式会社 Surrounding environment information transmission device

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2006227219A (en) * 2005-02-16 2006-08-31 Advanced Telecommunication Research Institute International Information generating device, information output device, and program
JP2011204190A (en) * 2010-03-26 2011-10-13 Nippon Telegr & Teleph Corp <Ntt> Document-processing method and document-processing system
JP2011209787A (en) * 2010-03-29 2011-10-20 Sony Corp Information processor, information processing method, and program
JP2014165616A (en) * 2013-02-23 2014-09-08 Hyogo Prefecture Wearable display for low vision person
JP2015125464A (en) * 2013-12-25 2015-07-06 Kddi株式会社 Wearable device
JP2016194612A (en) * 2015-03-31 2016-11-17 株式会社ニデック Visual recognition support device and visual recognition support program


Also Published As

Publication number Publication date
WO2019039591A4 (en) 2019-05-09
JP2019040005A (en) 2019-03-14

Similar Documents

Publication Publication Date Title
WO2019039591A4 (en) Read-out system and read-out method
US10318028B2 (en) Control device and storage medium
US10045110B2 (en) Selective sound field environment processing system and method
US20180124497A1 (en) Augmented Reality Sharing for Wearable Devices
JP6143975B1 (en) System and method for providing haptic feedback to assist in image capture
JP6574937B2 (en) COMMUNICATION SYSTEM, CONTROL METHOD, AND STORAGE MEDIUM
US20170303052A1 (en) Wearable auditory feedback device
JP2014115457A (en) Information processor and recording medium
WO2017130486A1 (en) Information processing device, information processing method, and program
US20230045237A1 (en) Wearable apparatus for active substitution
KR20090105531A (en) The method and divice which tell the recognized document image by camera sensor
WO2015068440A1 (en) Information processing apparatus, control method, and program
US20210350823A1 (en) Systems and methods for processing audio and video using a voice print
US20220148599A1 (en) Audio signal processing for automatic transcription using ear-wearable device
CN109257490B (en) Audio processing method and device, wearable device and storage medium
CN111314763A (en) Streaming media playing method and device, storage medium and electronic equipment
EP3113505A1 (en) A head mounted audio acquisition module
CN112836685A (en) Reading assisting method, system and storage medium
JP6766403B2 (en) Head-mounted display device, head-mounted display device control method, computer program
US20210405686A1 (en) Information processing device and method for control thereof
US11327576B2 (en) Information processing apparatus, information processing method, and program
CN109361727B (en) Information sharing method and device, storage medium and wearable device
JP2021033368A (en) Reading device
CN111149373B (en) Hearing device for assessing voice contact and related method
WO2022113189A1 (en) Speech translation processing device

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18848807

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18848807

Country of ref document: EP

Kind code of ref document: A1