WO2019039591A1 - Système de lecture et procédé de lecture - Google Patents

Système de lecture et procédé de lecture (Reading system and reading method)

Info

Publication number
WO2019039591A1
WO2019039591A1 · PCT/JP2018/031366 · JP2018031366W
Authority
WO
WIPO (PCT)
Prior art keywords
unit
user
imaging
input
output
Prior art date
Application number
PCT/JP2018/031366
Other languages
English (en)
Japanese (ja)
Other versions
WO2019039591A4 (fr)
Inventor
圭佑 島影
Original Assignee
株式会社オトングラス
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 株式会社オトングラス
Publication of WO2019039591A1 (fr)
Publication of WO2019039591A4 (fr)

Classifications

    • G: PHYSICS
    • G09: EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09B: EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
    • G09B 21/00: Teaching, or communicating with, the blind, deaf or mute
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 13/00: Speech synthesis; Text to speech systems
    • G10L 13/02: Methods for producing synthetic speech; Speech synthesisers
    • G10L 13/04: Details of speech synthesis systems, e.g. synthesiser structure or memory management
    • G10L 13/047: Architecture of speech synthesisers
    • G10L 21/00: Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L 21/04: Time compression or expansion
    • G10L 21/043: Time compression or expansion by changing speed
    • G10L 21/057: Time compression or expansion for improving intelligibility

Definitions

  • The present invention relates to a reading system and a reading method for converting text into speech and reading it aloud.
  • Patent Document 1 discloses a wearable display that can capture and display a front view so that a person with low vision can walk outdoors at night or in similar conditions. According to the low-vision wearable display of Patent Document 1, the contrast and brightness of the captured image are converted before display. It also discloses that, when characters are present in a captured image, character recognition processing is performed and the characters are conveyed to the user by voice.
  • The present invention has been made in view of the above problems, and an object of the present invention is to provide a reading system that is more convenient for the user than the low-vision wearable display described in Patent Document 1.
  • A reading system according to one aspect of the present invention includes: a wearing tool worn and used by a user; an imaging unit, provided on the wearing tool, that captures an image in the user's front direction; an extraction unit that extracts characters from the image captured by the imaging unit; a conversion unit that converts the characters extracted by the extraction unit into speech; an output unit, provided on the wearing tool, that outputs the speech; an input unit, provided on the wearing tool, that receives input from the user; and a control unit, provided on the wearing tool, that controls the playback speed of the speech output from the output unit based on the input from the user received via the input unit.
  • A reading method according to one aspect of the present invention includes: an imaging step of capturing an image in the front direction of a user with an imaging unit provided on a wearing tool worn and used by the user; an extraction step of extracting characters from the image captured in the imaging step; a conversion step of converting the characters extracted in the extraction step into speech; an output step of outputting the speech from an output unit provided on the wearing tool; an input step of receiving input from the user via an input unit; and a control step of controlling the playback speed of the speech output from the output unit based on the input from the user received via the input unit. A rough code sketch of this flow is given below.
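  • As a purely illustrative aid, the following minimal Python sketch shows how the claimed steps chain together. Every identifier in it (wearing_tool, capture_front_image, and so on) is an assumption introduced for illustration; none of these names appear in the publication.

```python
# Minimal sketch of the claimed flow; all identifiers are illustrative
# assumptions, not the publication's implementation.

def read_aloud_once(wearing_tool, server):
    image = wearing_tool.capture_front_image()   # imaging step (imaging unit)
    text = server.extract_characters(image)      # extraction step (extraction unit)
    speech = server.convert_to_speech(text)      # conversion step (conversion unit)
    wearing_tool.play(speech)                    # output step (output unit)

def on_user_input(wearing_tool, command):
    # control step: input received via the input unit adjusts playback
    if command == "pause":
        wearing_tool.pause()
    elif command == "replay":
        wearing_tool.replay()
    elif command == "slower":
        wearing_tool.set_playback_speed(0.75)    # slow reproduction
    elif command == "faster":
        wearing_tool.set_playback_speed(1.5)     # fast reproduction
```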
  • The control unit may pause the speech output from the output unit based on the input from the user received via the input unit.
  • The control unit may repeatedly reproduce the speech output from the output unit based on the input from the user received via the input unit.
  • The output unit may output a sound indicating that character conversion processing is in progress while the conversion unit converts the characters into speech.
  • The reading system may include a server having the extraction unit and the conversion unit, and the wearing tool may include a transmission unit that transmits the image captured by the imaging unit to the server and a reception unit that receives the speech converted by the conversion unit.
  • The output unit may output a sound indicating that character conversion processing is in progress from when the transmission unit transmits the image until the reception unit receives the speech.
  • The wearing tool may include an acquisition unit that acquires environment information about the surrounding environment, and the imaging unit may change the imaging conditions based on the environment information.
  • The wearing tool may include a determination unit that determines whether characters are present in the captured image captured by the imaging unit, and when it is determined that characters are present in the captured image, the output unit may output a sound indicating that characters are present in the imaging direction of the imaging unit.
  • The reading system may include a log transmission unit that associates the captured image captured by the imaging unit with the speech converted by the conversion unit based on that captured image, and transmits them to the user's information processing terminal.
  • The wearing tool may include a position information acquisition unit that acquires position information indicating the position of its own terminal, the imaging unit may associate with the captured image the position information acquired at the time of imaging, and the log transmission unit may transmit the position information to the user's information processing terminal together with the captured image and the speech.
  • The reading system according to an aspect of the present invention can freely adjust the speed of the reading speech converted from characters. It is therefore possible to provide a reading system with excellent usability.
  • FIG. 1(a) shows an example of the appearance of a user wearing the wearing tool, and FIG. 1(b) shows an example of the wearing tool being used to capture an image and read it aloud.
  • FIG. 2 shows an example of the system configuration of the reading system.
  • FIG. 3(a) shows a configuration example of the data that the wearing tool transmits to the server, and FIG. 3(b) shows a configuration example of the read-aloud voice information.
  • FIG. 7 shows an example of a range from which characters are preferentially extracted from an image, and FIG. 8 shows an example of a screen for reproducing the read-aloud voice.
  • The reading system 1 includes: an imaging unit 111 that is provided on the wearing tool 100 worn and used by the user and captures an image in the user's front direction; an extraction unit 231 that extracts characters from the image captured by the imaging unit 111; a conversion unit 232 that converts the extracted characters into speech; an output unit 156 that outputs the speech; an input unit 154 that receives input from the user; and a control unit 155, provided on the wearing tool 100, that controls the playback speed of the speech output from the output unit 156 based on the input from the user received via the input unit 154.
  • Such a reading system 1 will be described in detail below.
  • As shown in FIG. 1, the user 10 wears and uses the wearable glass 110.
  • On the wearable glass 110, an imaging unit 111 is disposed at a position where it can capture an image in the user's front direction in response to an instruction from the user.
  • The imaging unit 111 is a so-called camera.
  • The wearable glass 110 is connected to the controller 150.
  • In FIG. 1, the wearable glass 110 is shown connected to the controller 150 via the cable 140, and the earphone 130 is connected to the controller 150 via its cord.
  • Alternatively, the wearable glass 110 and the controller 150 may be connected directly, in the same manner as the earphone 130 and the controller 150.
  • The user 10 can wear the earphone 130 on his or her ear and listen to the read-aloud voice transmitted from the controller 150.
  • The user 10 holds the controller 150 and can use it to issue an imaging instruction or instructions related to reproduction of the read-aloud voice.
  • As shown in FIG. 1, the imaging unit 111 captures an image of the imaging range 160. Characters included in the imaging range 160 are then recognized, converted into machine-synthesized speech, and read aloud. The reading system 1 can therefore convey the content of characters that are difficult to read to a person with low vision or the like.
  • FIG. 2 shows an example of the system configuration of the reading system 1.
  • The reading system 1 includes a wearing tool 100 and a server 200.
  • The wearing tool 100 and the server 200 are configured to be able to communicate via the network 300.
  • The wearing tool 100 communicates with the network 300 by wireless communication. Any communication protocol may be used as long as wireless communication can be performed.
  • The server 200 also communicates with the network; any communication mode, whether wireless or wired, and any communication protocol may be used as long as communication can be performed.
  • The wearing tool 100 includes the wearable glass 110, the earphone 130, and the controller 150. That is, in the present embodiment, as shown in FIG. 2, the wearable glass 110, the earphone 130, and the controller 150 are collectively referred to as the wearing tool 100. Although a wearable glass 110 is used here, the device is of course not limited to glasses as long as it can image the front direction (viewing direction) of the user 10.
  • The wearable glass 110 includes an imaging unit 111 and a communication I/F 112.
  • The imaging unit 111 is a camera capable of imaging the front direction of the user.
  • The imaging unit 111 performs imaging upon receiving an imaging signal from the communication I/F 112.
  • The imaging unit 111 may be provided anywhere on the wearable glass 110 as long as it can capture an image in the front direction of the user.
  • Although FIG. 1 illustrates an example in which the imaging unit 111 is provided at the left hinge portion of the wearable glass, it may be provided at the right hinge portion or at the bridge portion.
  • The imaging unit 111 transmits the captured image obtained by imaging to the communication I/F 112.
  • The imaging unit 111 may also have a detection function of sequentially capturing images, analyzing them, and detecting the presence or absence of characters in the captured images; when a captured image includes characters, it transmits to the communication I/F 112 a presence signal indicating that characters are present in the user's front direction.
  • The communication I/F 112 is a communication interface having a function of communicating with the controller 150.
  • The communication I/F 112 is communicably connected to the communication I/F 151 of the controller 150.
  • The communication I/F 112 transmits the imaging signal sent from the communication I/F 151 of the controller 150 to the imaging unit 111.
  • The communication I/F 112 also transmits to the communication I/F 151 the captured image sent from the imaging unit 111 and the presence signal indicating that characters are present in the user's front direction.
  • The earphone 130 is connected to the output unit 156 of the controller 150 and has a function of outputting the audio signal transmitted from the output unit 156 as audio.
  • The earphone 130 is connected to the controller 150 by wire here, but the connection may be wireless.
  • The earphone 130 outputs the read-aloud voice generated from characters detected in the captured image, a sound indicating that characters are being analyzed, or a sound indicating that characters are present in the imaging direction of the imaging unit 111.
  • The controller 150 includes a communication I/F 151, a communication unit 152, a storage unit 153, an input unit 154, a control unit 155, and an output unit 156. The units of the controller 150 are interconnected by a bus.
  • The communication I/F 151 is a communication interface having a function of communicating with the communication I/F 112 of the wearable glass 110.
  • When the communication I/F 151 receives an imaging signal from the control unit 155, it transmits the imaging signal to the communication I/F 112. When it receives a captured image or a presence signal from the communication I/F 112, it passes the received data to the control unit 155.
  • The communication unit 152 is a communication interface having a function of communicating with the server 200 via the network 300.
  • The communication unit 152 functions as a transmission unit that transmits the captured image to the server 200 in accordance with an instruction from the control unit 155, and also functions as a reception unit that receives from the server 200 the read-aloud voice obtained by converting the characters included in the captured image into speech.
  • The communication unit 152 passes the received read-aloud voice to the control unit 155.
  • The storage unit 153 has a function of storing the various programs and data required for the controller 150 to function.
  • The storage unit 153 can be realized by, for example, a hard disk drive (HDD), a solid state drive (SSD), or a flash memory, but is not limited to these.
  • The storage unit 153 stores, for example, the reading program executed by the control unit 155, captured images captured by the imaging unit 111, and information on the read-aloud voice received by the communication unit 152.
  • The storage unit 153 also stores voice information for the sound that is output from the output unit 156 during the period from when the communication unit 152 transmits the captured image to the server 200 until the read-aloud voice is received, indicating that characters are being converted into speech, as well as voice information for notifying the user 10 that characters are present in the front direction.
  • The input unit 154 has a function of receiving input from the user 10.
  • The input unit 154 can be realized by, for example, hard keys provided on the controller 150, but it may also be realized by a touch panel or the like.
  • The input unit 154 may include an imaging button 154A with which the user 10 instructs imaging, a playback button 154B for instructing playback, pause, and replay, and an adjustment button 154C for adjusting the playback speed of the voice.
  • When each button is pressed, the input unit 154 transmits a signal indicating the pressed content to the control unit 155, as sketched below.
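  • The following is a hedged sketch of how the input unit 154 might map hard-key presses to signals for the control unit 155. The signal names and the dictionary-based dispatch are assumptions made for illustration.

```python
# Hedged sketch of button-to-signal dispatch in the input unit 154;
# the signal names are assumptions, not disclosed identifiers.

BUTTON_TO_SIGNAL = {
    "154A": "IMAGING",             # imaging button 154A
    "154B": "PLAY_PAUSE_REPLAY",   # playback button 154B
    "154C": "ADJUST_SPEED",        # adjustment button 154C
}

def on_button_press(button_id: str, send_to_control_unit) -> None:
    """Translate a pressed button into a signal for the control unit 155."""
    signal = BUTTON_TO_SIGNAL.get(button_id)
    if signal is not None:
        send_to_control_unit(signal)

# Example: on_button_press("154A", print) would emit the IMAGING signal.
```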
  • The control unit 155 is a processor having a function of controlling each unit of the controller 150.
  • The control unit 155 executes the various programs stored in the storage unit 153 to carry out the functions of the controller 150.
  • When the control unit 155 receives an imaging instruction from the input unit 154, it instructs the communication I/F 151 to transmit an imaging signal to the wearable glass 110.
  • When a captured image is obtained, the control unit 155 instructs the communication unit 152 to transmit it to the server 200. After this instruction, the control unit 155 reads from the storage unit 153 the voice indicating that the characters included in the captured image are being converted into speech, and instructs the output unit 156 to output it.
  • When the read-aloud voice is received, the control unit 155 instructs the output unit 156 to stop outputting the voice indicating that conversion is in progress, and then instructs the output unit 156 to output the read-aloud voice.
  • When a presence signal is received, the control unit 155 reads from the storage unit 153 the voice indicating that characters are present in the front direction of the user 10 and instructs the output unit 156 to output it.
  • The control unit 155 executes reproduction control processing of the read-aloud voice in accordance with instructions from the user 10 transmitted from the input unit 154. For example, when a pause instruction is received, it instructs the output unit 156 to temporarily stop reproduction of the read-aloud voice.
  • When a slow reproduction instruction is received, the control unit 155 instructs the output unit 156 to reproduce the read-aloud voice slowly.
  • The slow reproduction instruction can be generalized to a playback speed adjustment instruction, and the control unit 155 can also increase or decrease the playback speed of the read-aloud voice.
  • When the control unit 155 receives a replay instruction, it instructs the output unit 156 to reproduce the read-aloud voice output so far once again.
  • The output unit 156 has a function of outputting the audio signal designated by the control unit 155 to the earphone 130.
  • The output unit 156 outputs to the earphone 130 the read-aloud voice, the voice indicating that characters are being converted into speech, or the voice indicating that characters are present in the front direction of the user 10. A sketch of such playback control follows.
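  • As a minimal illustration of the pause, speed-change, and replay behavior described above, the following sketch wraps a generic player object. The player interface and the speed bounds are assumptions; this is not the controller's actual API.

```python
# Playback-control sketch (pause, speed change, replay) over an assumed
# generic `player` object.

class PlaybackController:
    def __init__(self, player):
        self.player = player
        self.speed = 1.0
        self.last_clip = None      # kept so a replay instruction can repeat it

    def play(self, clip):
        self.last_clip = clip
        self.player.play(clip, speed=self.speed)

    def pause(self):
        self.player.pause()        # paused until playback is requested again

    def resume(self):
        self.player.resume()

    def set_speed(self, speed: float):
        # slow reproduction (< 1.0) or fast reproduction (> 1.0)
        self.speed = max(0.25, min(4.0, speed))
        self.player.set_speed(self.speed)

    def replay(self):
        if self.last_clip is not None:
            self.player.play(self.last_clip, speed=self.speed)
```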
  • The server 200 includes a communication unit 210, a storage unit 220, and a control unit 230.
  • The communication unit 210, the storage unit 220, and the control unit 230 are connected to one another via a bus.
  • The communication unit 210 is a communication interface having a function of communicating with the wearing tool 100 (controller 150) via the network 300.
  • The communication unit 210 functions as a transmission unit that transmits the read-aloud voice to the wearing tool 100 in accordance with an instruction from the control unit 230, and also functions as a reception unit that receives captured images.
  • The communication unit 210 passes the received captured image to the control unit 230.
  • The storage unit 220 stores the various programs and data that the server 200 needs in operation.
  • The storage unit 220 can be realized by, for example, a hard disk drive (HDD), a solid state drive (SSD), or a flash memory, but is not limited to these.
  • The storage unit 220 stores a character recognition program for extracting characters from an image, a voice conversion program for converting the recognized characters into speech, and the read-aloud voice information. Details of the read-aloud voice information will be described later.
  • The control unit 230 is a processor having a function of controlling each unit of the server 200.
  • The control unit 230 executes the various programs stored in the storage unit 220 to carry out the functions of the server 200.
  • The control unit 230 functions as the extraction unit 231 by executing the character recognition program, and functions as the conversion unit 232 by executing the voice conversion program.
  • The extraction unit 231 has a function of analyzing a captured image and extracting the characters included in it. Existing character recognition processing can be used as the analysis technique.
  • The conversion unit 232 has a function of converting the characters extracted by the extraction unit 231 into speech (the read-aloud voice). Existing conversion processing can be used as the conversion technique; an illustrative sketch using off-the-shelf components follows.
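  • Since the publication only requires "existing" recognition and conversion processing, the following sketch stands in pytesseract for the extraction unit 231 and pyttsx3 for the conversion unit 232. Choosing these particular libraries is an assumption made purely for illustration.

```python
# Illustrative stand-ins for the extraction unit 231 (OCR) and the
# conversion unit 232 (text-to-speech).

import pytesseract            # pip install pytesseract (requires Tesseract OCR)
import pyttsx3                # pip install pyttsx3
from PIL import Image         # pip install pillow

def extract_characters(image_path: str, lang: str = "jpn") -> str:
    """Extraction unit: recognize characters in the captured image."""
    return pytesseract.image_to_string(Image.open(image_path), lang=lang)

def convert_to_speech(text: str, out_path: str = "readout.wav") -> str:
    """Conversion unit: synthesize the extracted characters into speech."""
    engine = pyttsx3.init()
    engine.save_to_file(text, out_path)
    engine.runAndWait()       # blocks until the audio file has been written
    return out_path
```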
  • FIG. 3 shows examples of the configuration of data used by the reading system 1.
  • FIG. 3(a) shows a data configuration example (format example) of the transmission data 310 (captured image) that the wearing tool 100 (controller 150) transmits to the server 200.
  • The transmission data 310 is information in which a user ID 311, captured image information 312, and imaging time information 313 are associated with one another.
  • The user ID 311 is identification information that can uniquely identify the user 10 who uses the wearing tool 100.
  • By means of the user ID 311, the server 200 can identify which user a captured image came from and can manage captured images and generated read-aloud voices for each user.
  • The captured image information 312 is the actual data of the captured image captured by the imaging unit 111.
  • The imaging time information 313 indicates the date and time when the captured image indicated by the captured image information 312 was captured. Although not illustrated, it can be acquired from, for example, an internal clock of the imaging unit 111.
  • FIG. 3(b) shows a data configuration example of the read-aloud voice information that is stored in the storage unit 220 of the server 200 and managed for each user who uses the reading system 1.
  • This data is information for managing the read-aloud voices that the server 200 has converted in the past.
  • The read-aloud voice information 320 is information in which imaging time information 321, captured image information 322, and a read-aloud voice 323 are associated with one another.
  • The imaging time information 321 indicates the date and time when the corresponding captured image was captured, and is the same information as the imaging time information 313.
  • The captured image information 322 is the actual data of the captured image, and is the same information as the captured image information 312.
  • The read-aloud voice 323 is the actual data of the read-aloud voice obtained by the extraction unit 231 extracting characters from the corresponding captured image information 322 and the conversion unit 232 converting them.
  • With this information, the server 200 can manage past read-aloud voices. The data layouts are sketched below.
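  • For concreteness, the two FIG. 3 records can be expressed as Python dataclasses. The field types are assumptions; the publication names the fields but not their encodings.

```python
# The FIG. 3 records expressed as dataclasses; field types are assumptions.

from dataclasses import dataclass
from datetime import datetime

@dataclass
class TransmissionData:        # transmission data 310
    user_id: str               # user ID 311
    captured_image: bytes      # captured image information 312
    imaging_time: datetime     # imaging time information 313

@dataclass
class ReadAloudVoiceRecord:    # one row of read-aloud voice information 320
    imaging_time: datetime     # imaging time information 321
    captured_image: bytes      # captured image information 322
    readout_voice: bytes       # read-aloud voice 323
```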
  • FIG. 4 is a sequence diagram showing the exchange between the wearing tool 100 and the server 200.
  • The wearing tool 100 performs imaging in the front direction of the user 10 (step S401). The wearing tool 100 then transmits the obtained captured image to the server 200 (step S402).
  • The server 200 receives the captured image transmitted from the wearing tool 100 (step S403). The server 200 then extracts characters from the received captured image (step S404) and converts the extracted characters into speech to generate a read-aloud voice (step S405). When the read-aloud voice has been generated, the server 200 transmits it to the wearing tool 100 (step S406).
  • The wearing tool 100 receives the read-aloud voice transmitted from the server 200 (step S407) and outputs it (step S408). As a result, the reading system 1 can recognize characters present in the front direction (viewing direction) of the user 10 and convey them to the user 10 by sound.
  • FIG. 5 is a flowchart showing the operation of the wearing tool 100.
  • The input unit 154 of the wearing tool 100 determines whether there has been an input from the user, based on whether any of the various buttons has been pressed (step S501). If there is an input from the user (YES in step S501), the process proceeds to step S502; if there is no input (NO in step S501), the process proceeds to step S512.
  • In step S502, the control unit 155 determines whether the input accepted by the input unit 154 is an imaging instruction. If the input is an imaging instruction (YES in step S502), the process proceeds to step S503; if not (NO in step S502), the process proceeds to step S506.
  • In step S503, when the input unit 154 receives an imaging instruction from the user, it transmits the imaging instruction to the control unit 155.
  • The control unit 155 instructs the communication I/F 151 to transmit an imaging signal to the wearable glass 110.
  • The communication I/F 151 transmits the imaging signal to the communication I/F 112 in accordance with the instruction.
  • The communication I/F 112 transmits the imaging signal to the imaging unit 111, and the imaging unit 111 executes imaging (step S503).
  • The imaging unit 111 transmits the obtained captured image to the communication I/F 112, and the communication I/F 112 transfers the captured image to the communication I/F 151.
  • The communication I/F 151 passes the captured image to the control unit 155, and the control unit 155 instructs the communication unit 152 to transmit it to the server 200.
  • The communication unit 152 transmits the captured image to the server 200 via the network 300 (step S504).
  • The control unit 155 reads from the storage unit 153 the voice indicating that the characters in the captured image are being converted into speech, and instructs the output unit 156 to output it.
  • The output unit 156 outputs the voice to the earphone 130, the earphone 130 plays the notification (step S505), and the process returns to step S501.
  • When it is determined in step S502 that the input is not an imaging instruction (NO in step S502), it is determined whether the input is a voice pause instruction (step S506). If the input is a voice pause instruction (YES in step S506), the control unit 155 instructs the output unit 156 to pause the voice being output. Upon receiving the instruction, the output unit 156 pauses the output of the voice (step S507), and the process returns to step S501. The pause continues until a new reproduction instruction is input or until a complete stop instruction is input.
  • When it is determined in step S506 that the input is not a pause instruction (NO in step S506), it is determined whether the input is a slow reproduction instruction (step S508). If the input is a slow reproduction instruction (YES in step S508), the control unit 155 instructs the output unit 156 to reproduce the voice being output slowly. In response, the output unit 156 starts slow reproduction of the voice being output (step S509), and the process returns to step S501.
  • The reproduction speed may instead be increased. Increasing the reproduction speed can shorten the time needed to grasp an outline of the characters included in the captured content.
  • When it is determined in step S508 that the input is not a slow reproduction instruction (NO in step S508), it is determined whether the input is a playback or replay input (step S510). If the input is not a playback or replay input (NO in step S510), the process returns to step S501. If the input is a playback or replay input (YES in step S510), the control unit 155 instructs the output unit 156 to resume the output of the paused voice or to replay the voice that has already been output. In response, the output unit 156 resumes or replays the audio reproduction (step S511), and the process returns to step S501. Thus, even if the user 10 misses the read-aloud voice, it can be read aloud again.
  • When there is no input from the user in step S501 (NO in step S501), the control unit 155 determines whether the read-aloud voice has been received from the server 200 (step S512). If the read-aloud voice has not been received (NO in step S512), the process returns to step S501.
  • If the read-aloud voice has been received (YES in step S512), the control unit 155 first instructs the output unit 156 to stop outputting the voice indicating that characters are being converted into speech. In response to the instruction, the output unit 156 stops that output (step S513).
  • The control unit 155 then instructs the output unit 156 to output the read-aloud voice passed from the communication unit 152.
  • The output unit 156 starts output of the read-aloud voice passed from the control unit 155 (step S514), and the process returns to step S501. The loop is sketched below.
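  • The FIG. 5 flow can be compressed into a single event loop as follows. The event names and the objects passed in are assumptions made to keep the sketch compact; only the step correspondence comes from the publication.

```python
# Hedged sketch of the FIG. 5 loop (steps S501-S514).

def controller_loop(input_unit, glass, server, output_unit, storage):
    while True:
        event = input_unit.poll()                       # S501: user input?
        if event == "IMAGING":                          # S502: imaging instruction
            server.send(glass.capture())                # S503-S504
            output_unit.play(storage.converting_sound)  # S505: "converting" cue
        elif event == "PAUSE":                          # S506
            output_unit.pause()                         # S507
        elif event == "SLOW":                           # S508
            output_unit.set_speed(0.75)                 # S509
        elif event in ("PLAY", "REPLAY"):               # S510
            output_unit.resume_or_replay()              # S511
        elif event is None and server.has_readout():    # S512: voice received?
            output_unit.stop()                          # S513: stop the cue sound
            output_unit.play(server.receive_readout())  # S514: play the voice
```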
  • FIG. 6 is a flowchart showing the operation when the server 200 receives a captured image from the wearing tool 100.
  • The communication unit 210 of the server 200 receives a captured image from the wearing tool 100 via the network 300 (step S601).
  • The communication unit 210 passes the received captured image to the control unit 230.
  • The control unit 230 (extraction unit 231) analyzes the captured image and extracts characters (step S602).
  • The extraction unit 231 passes the extracted character string to the conversion unit 232.
  • The conversion unit 232 converts the extracted character string into speech (step S603), generating a read-aloud voice, that is, machine-synthesized speech.
  • The conversion unit 232 passes the generated read-aloud voice to the communication unit 210.
  • The communication unit 210 transmits the synthesized speech as the read-aloud voice to the wearing tool 100 via the network 300 (step S604).
  • The control unit 230 registers the received captured image, its imaging date and time, and the read-aloud voice obtained from it in the read-aloud voice information as the captured image information 322, the imaging time information 321, and the read-aloud voice 323, respectively (step S605), and the processing ends. A sketch of this flow follows.
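  • The server side of the exchange is small enough to show in one handler. The handler signature and the store/reply callables are assumptions; the step mapping follows FIG. 6.

```python
# Sketch of the FIG. 6 server flow (steps S601-S605); `store` and `reply`
# are assumed callables standing in for the storage and communication units.

def handle_captured_image(data, extraction_unit, conversion_unit, store, reply):
    text = extraction_unit.extract(data.captured_image)    # S602: extract characters
    readout = conversion_unit.to_speech(text)              # S603: generate voice
    reply(readout)                                         # S604: send to wearing tool
    store.register(                                        # S605: log the result
        imaging_time=data.imaging_time,                    # imaging time information 321
        captured_image=data.captured_image,                # captured image information 322
        readout_voice=readout,                             # read-aloud voice 323
    )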
  • In this way, the reading system 1 can reproduce the voice so that it is easy for the user to hear, rather than merely reading out the recognized characters.
  • The reading system 1 can recognize the characters included in a captured image and output them as sound. Since the user can perform operations such as slow reproduction, pause, and replay on the read-aloud voice, the user can have the voice reproduced in a way that is easy to hear according to his or her preference. It is therefore possible to provide a user-friendly reading system. Furthermore, while the process of generating the read-aloud voice from the captured image is in progress, the reading system 1 can let the user 10 recognize the situation by playing a voice indicating that processing is underway.
  • In the embodiment above, the voice is output using the earphone 130, but the wearable glass 110 or the controller 150 may instead be provided with a speaker, and the output unit 156 may output the read-aloud voice from that speaker. With this configuration, even a user who finds wearing the earphone 130 uncomfortable can hear the read-aloud voice. In this case, there is also the advantage that a plurality of users can listen to the voice simultaneously.
  • In the embodiment above, the wearing tool 100 consists of the wearable glass 110, the earphone 130, and the controller 150 configured as separate devices.
  • However, the wearable glass 110, the earphone 130, and the controller 150 may be formed as a single unit. That is, the wearable glass 110 may include a speaker as an alternative to the sound output function of the earphone 130, and may also incorporate the functions of the controller 150.
  • For example, the temple portion of the wearable glass 110 may have a hollow structure in which the processor, memory, communication module, and so on of the controller 150 are mounted. Various buttons for voice reproduction control and imaging instructions may then be provided on the outer side of the temple or rim of the wearable glass 110.
  • The wearing tool 100 may also have the functions of the server 200 (the functions of the extraction unit and the conversion unit).
  • In that case, the controller 150 may be configured to include a chip that realizes the functions of the server 200. With this configuration, the wearing tool 100 can implement the reading system stand-alone, and the latency associated with transmitting the captured image and receiving the read-aloud voice can be suppressed.
  • In the embodiment above, the range for extracting characters from the captured image is determined in advance, but the range is not limited to this.
  • For example, the wearable glass 110 may be provided with a camera that captures the user's eye, the gaze direction may be detected, a predetermined range centered on the gaze direction may be applied to the captured image, and characters within that predetermined range may be detected. In that case, the wearable glass 110 transmits to the controller 150 the first captured image captured by the imaging unit 111 and a second captured image obtained by capturing the user's eye, and the controller 150 transmits the first captured image and the second captured image to the server 200.
  • The extraction unit 231 of the server 200 may then be configured to identify the gaze direction of the user 10 from the second captured image, identify a predetermined range containing the identified gaze direction, and extract characters from the location in the first captured image corresponding to that predetermined range.
  • In the embodiment above, the imaging unit 111 performs imaging upon input of an imaging instruction to the controller 150, but the imaging trigger is not limited to this.
  • For example, the wearable glass 110 or the controller 150 may be provided with a microphone that picks up the voice uttered by the user, and imaging may be performed in response to a specific word spoken by the user. That is, imaging may be triggered by voice input.
  • Alternatively, a camera that captures the user's eye may be provided on the wearable glass 110, and a blink of the user's eye may be used as the trigger for imaging.
  • In the embodiment above, the input unit 154 is provided on the controller 150, but the input unit 154 may instead be provided partway along the cable 140.
  • The reading system 1 may also be provided with a setting unit capable of setting the language of the read-aloud voice, and a translation unit may be provided to translate the characters extracted by the extraction unit 231 into the language set in the setting unit, with the conversion unit 232 converting the characters translated by the translation unit into speech.
  • In this way, the reading system 1 can function as a system that interprets written characters, and can be useful not only for people with low vision but also for foreign users.
  • The extraction unit 231 may restrict the range for extracting characters from the captured image to a predetermined range rather than the entire captured image.
  • FIG. 7 shows an example of the captured image 700; the extraction unit 231 may treat only a predetermined range 710 within the captured image 700 as the range for extracting characters.
  • Alternatively, the predetermined range 710 may be set as a range from which characters are preferentially extracted. Here, a range from which characters are preferentially extracted means that characters are first extracted from inside the predetermined range 710, and extraction from outside the predetermined range 710 is performed only when no characters can be extracted from inside it. A code sketch of this fallback is given below.
  • The predetermined range 710 may be set by the user who uses the reading system 1. In general, users tend to look in a direction slightly lower than straight ahead, so it is effective to set the predetermined range 710 toward the lower part of the captured image 700.
  • Alternatively, the control unit 230 may set the predetermined range 710. Specifically, for the large number of captured images received by the server 200, the ranges from which characters could be extracted are identified, and their average range may be set as the predetermined range 710 for extracting characters.
  • Various sensors may also be provided on the wearable glass 110, and the predetermined range 710 may be determined based on sensing data obtained from those sensors.
  • For example, a gyro sensor may be mounted on the wearable glass 110, and the wearing tool 100 may transmit the sensing data of the gyro sensor to the server 200 together with the captured image.
  • The extraction unit 231 may then determine the predetermined range 710 based on the sensing data of the gyro sensor. For example, when it is estimated from the sensing data that the user 10 is facing downward, the predetermined range 710 may be set toward the lower side of the captured image 700.
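  • The preferential-extraction fallback mentioned above fits in a few lines. The `ocr` callable and the box format are assumptions; the crop-then-fall-back logic is what the publication describes.

```python
# Sketch of preferential extraction from the predetermined range 710:
# OCR inside the range first, fall back to the whole image only if empty.

def extract_with_priority(image, box, ocr):
    """`box` is (left, top, right, bottom) for the predetermined range 710."""
    text = ocr(image.crop(box))        # first, look only inside the range
    if text.strip():
        return text
    return ocr(image)                  # fall back to the entire captured image
```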
  • The server 200 may be configured to transmit the corresponding read-aloud voice information 320 as a past log to an information processing apparatus, such as a PC, held by the user 10. With this configuration, the user 10 can listen to past read-aloud voices at any time.
  • The wearing tool 100 may be provided with a position information acquisition unit that acquires position information indicating where the device itself is located.
  • The position information acquisition unit can be realized using, for example, GPS or another GNSS.
  • The position information acquisition unit acquires position information and associates the acquired position information with the captured image.
  • The wearing tool 100 transmits the captured image associated with the position information to the server 200.
  • The server 200 may then further associate imaging position information indicating the imaging position with the read-aloud voice information 320 and manage it.
  • When information including the position information is transmitted from the server 200 to the information processing apparatus of the user 10 as the read-aloud voice information 320, the information processing apparatus of the user 10 can present the read-aloud voice together with a map application, as shown in FIG. 8. That is, the user 10 can see on the map when and where each read-aloud voice was acquired. For example, as shown on the map 800 in FIG. 8, log information 801 and log information 802 indicate where the captured images on which the read-aloud voices are based were captured. By positioning the cursor 803 on the log information 801 or 802 on the map and clicking, the information processing apparatus may reproduce the corresponding read-aloud voice with voice reproduction software or the like.
  • The imaging unit 111 may sequentially capture images and detect whether characters are included in the obtained captured images. When it is detected that characters are included, that fact is transmitted to the controller 150, and the control unit 155 may notify the user 10 that characters are present in the front direction at that moment. The user 10 can then input an imaging instruction to the input unit 154 at that timing.
  • In this way, the user 10 can be made aware of the presence of characters even when the user 10 cannot visually recognize them, such as when the user 10 has low vision or, in particular, is blind. This makes it possible to provide a highly convenient reading system 1.
  • The imaging unit 111 may change the imaging conditions according to the environment in which the user (wearable glass 110) is placed.
  • For example, the wearable glass 110 may include various sensors (for example, an illuminance sensor) and change the exposure time and the angle of view accordingly, as sketched below.
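  • One simple way to realize this is a lookup from the illuminance reading to camera parameters. The thresholds and parameter names below are illustrative assumptions, not values from the publication.

```python
# Illustrative mapping from an illuminance-sensor reading (lux) to imaging
# conditions; thresholds and parameter names are assumptions.

def imaging_conditions(lux: float) -> dict:
    if lux < 50:                                   # dim, e.g. outdoors at night
        return {"exposure_ms": 100, "iso": 800}
    if lux < 500:                                  # typical indoor lighting
        return {"exposure_ms": 30, "iso": 400}
    return {"exposure_ms": 10, "iso": 100}         # daylight
```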
  • When the server 200 cannot extract characters from the image, cannot convert the extracted characters into speech, or determines that the image contains no characters, an error signal may be transmitted to the wearing tool 100, and the wearing tool 100 may receive the signal and output a sound indicating the error from the output unit 156.
  • A variety of sounds may be stored in the storage unit 153, such as an activation sound for when the wearing tool 100 is activated, an imaging sound (shutter sound) for when the imaging unit 111 performs imaging, a sound indicating that the system is waiting, and a cancellation sound for when the user inputs an instruction to cancel processing, and the corresponding sound may be output from the output unit 156 in each state.
  • By adopting a configuration that outputs a sound according to each of its various states, the wearing tool 100 can convey the state of the device to the user by sound alone.
  • The reading system 1 may change the mode of the generated read-aloud voice according to the location in the captured image from which the characters were extracted, or according to the ratio of the range from which the characters were extracted to the captured image as a whole.
  • Changing the mode of the voice according to the location from which the characters were extracted means changing the direction from which the user hears the voice according to where in the captured image the characters were extracted. For example, when the characters were extracted from the right side of the captured image, the output unit 156 may output the read-aloud voice so that it is heard from the user's right side. With this configuration, the user can intuitively recognize in which direction, as seen from the user, the read characters are located.
  • Changing the mode of the read-aloud voice according to the ratio of the extracted range to the captured image means, for example, changing the volume of the read-aloud voice according to that ratio. A sketch of both ideas follows.
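  • The following sketch maps the horizontal position of the extracted text to a stereo pan and the area ratio to a volume. The formulas and constants are illustrative assumptions only; the publication specifies the effect, not the computation.

```python
# Sketch: text position -> stereo pan, text area ratio -> volume.

def voice_mode(text_box, image_w: int, image_h: int) -> dict:
    left, top, right, bottom = text_box
    center_x = (left + right) / 2
    pan = 2.0 * center_x / image_w - 1.0           # -1.0 = far left, +1.0 = far right
    area_ratio = (right - left) * (bottom - top) / (image_w * image_h)
    volume = min(1.0, 0.4 + 0.6 * area_ratio)      # larger text region -> louder
    return {"pan": pan, "volume": volume}
```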
  • In the embodiment above, the transmission data 310 associates the user ID 311, the captured image information 312, and the imaging time information 313, but various other information may also be associated. For example, as in the supplements above, sensing data from a gyro sensor or acceleration sensor capable of specifying the posture of the wearing tool 100, or position information indicating the position of the wearing tool 100, may also be associated.
  • Likewise, in the read-aloud voice information 320, the imaging time information 321, the captured image information 322, and the read-aloud voice 323 are associated with one another, but in addition, the text data of the characters obtained by analyzing the captured image, and the position information and sensing data included in the transmission data 310, may also be associated.
  • By accumulating these various types of information, the read-aloud voice information can be used as a life log of each user.
  • The server 200 may be provided with a providing unit that provides specified information from among the stored information. For example, by accumulating position information, information on the user's amount of movement per unit time (for example, one day) or on where the user has been can be provided; and by specifying the user's posture from the gyro sensor information, posture information (for example, whether the user's posture is good or bad) can be provided.
  • As the method by which the reading system 1 reads aloud, the processors functioning as the functional units constituting the reading system 1 (the control unit 155 and the control unit 230) execute the reading program and so on to perform the reading processing.
  • Alternatively, these functional units may be realized by logic circuits (hardware) or dedicated circuits formed in an integrated circuit (an IC (Integrated Circuit) chip or LSI (Large Scale Integration)) or the like. These circuits may be realized by one or more integrated circuits, and the functions of the plurality of functional units shown in the above embodiments may be realized by a single integrated circuit.
  • An LSI may be called a VLSI, super LSI, ultra LSI, or the like depending on the degree of integration. That is, as shown in FIG. 9, each functional unit of the wearing tool 100 and the server 200 constituting the reading system 1 may be realized by physical circuits: the wearing tool 100 may comprise the wearable glass 110 including an imaging circuit 111a and a communication I/F circuit 112a, the earphone 130, and a controller including a communication I/F circuit 151a, a communication circuit 152a, and corresponding storage, input, control, and output circuits.
  • The server 200 may likewise be composed of a communication circuit 210a, a storage circuit 220a, and a control circuit 230a including an extraction circuit 231a and a conversion circuit 232a.
  • The above-mentioned reading program may be recorded on a processor-readable recording medium; a "non-transitory tangible medium" such as a tape, a disk, a card, a semiconductor memory, or a programmable logic circuit can be used as the recording medium. The reading program may also be supplied to the processor via any transmission medium (a communication network, a broadcast wave, or the like) capable of transmitting it.
  • The present invention can also be realized in the form of a data signal embedded in a carrier wave, in which the above-mentioned reading program is embodied by electronic transmission.
  • The above-mentioned reading program can be implemented using, for example, a script language such as ActionScript or JavaScript (registered trademark), an object-oriented programming language such as Objective-C or Java (registered trademark), or a markup language such as HTML5.
  • Reference signs: 1 reading system; 100 wearing tool; 110 wearable glass; 111 imaging unit; 112 communication I/F; 130 earphone; 150 controller; 151 communication I/F; 152 communication unit; 153 storage unit; 154 input unit; 155 control unit; 156 output unit; 200 server; 210 communication unit; 220 storage unit; 230 control unit; 231 extraction unit; 232 conversion unit

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Acoustics & Sound (AREA)
  • Human Computer Interaction (AREA)
  • Computational Linguistics (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Quality & Reliability (AREA)
  • Business, Economics & Management (AREA)
  • General Health & Medical Sciences (AREA)
  • Educational Administration (AREA)
  • Educational Technology (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • User Interface Of Digital Computer (AREA)
  • Studio Devices (AREA)
  • Character Discrimination (AREA)

Abstract

A reading system according to the invention comprises: an imaging unit that is disposed on a wearable device worn on a user's body and captures images in the user's forward direction; an extraction unit that extracts written characters from the images captured by the imaging unit; a conversion unit that converts the characters extracted by the extraction unit into speech audio; an output unit that is disposed on the wearable device and outputs the speech audio; an input unit that is disposed on the wearable device and receives input from the user; and a control unit that, based on the user input received via the input unit, controls the playback speed of the speech audio output from the output unit.
PCT/JP2018/031366 2017-08-24 2018-08-24 Système de lecture et procédé de lecture WO2019039591A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2017160888A JP2019040005A (ja) 2017-08-24 2017-08-24 読み上げシステム及び読み上げ方法
JP2017-160888 2017-08-24

Publications (2)

Publication Number Publication Date
WO2019039591A1 true WO2019039591A1 (fr) 2019-02-28
WO2019039591A4 WO2019039591A4 (fr) 2019-05-09

Family

ID=65440089

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2018/031366 WO2019039591A1 (fr) 2017-08-24 2018-08-24 Système de lecture et procédé de lecture

Country Status (2)

Country Link
JP (1) JP2019040005A (fr)
WO (1) WO2019039591A1 (fr)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP6773844B1 (ja) * 2019-06-12 2020-10-21 株式会社ポニーキャニオン 情報処理端末及び情報処理方法
US11776286B2 (en) 2020-02-11 2023-10-03 NextVPU (Shanghai) Co., Ltd. Image text broadcasting
CN110991455B (zh) * 2020-02-11 2023-05-05 上海肇观电子科技有限公司 图像文本播报方法及其设备、电子电路和存储介质
JPWO2022209043A1 (fr) * 2021-03-30 2022-10-06

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2006227219A (ja) * 2005-02-16 2006-08-31 Advanced Telecommunication Research Institute International 情報生成装置、情報出力装置およびプログラム
JP2011204190A (ja) * 2010-03-26 2011-10-13 Nippon Telegr & Teleph Corp <Ntt> 文書処理方法および文書処理システム
JP2011209787A (ja) * 2010-03-29 2011-10-20 Sony Corp 情報処理装置、および情報処理方法、並びにプログラム
JP2014165616A (ja) * 2013-02-23 2014-09-08 Hyogo Prefecture ロービジョン者用ウェアラブルディスプレイ
JP2015125464A (ja) * 2013-12-25 2015-07-06 Kddi株式会社 ウェアラブルデバイス
JP2016194612A (ja) * 2015-03-31 2016-11-17 株式会社ニデック 視覚認識支援装置および視覚認識支援プログラム

Also Published As

Publication number Publication date
JP2019040005A (ja) 2019-03-14
WO2019039591A4 (fr) 2019-05-09

Similar Documents

Publication Publication Date Title
WO2019039591A4 (fr) Système de lecture et procédé de lecture
US10582328B2 (en) Audio response based on user worn microphones to direct or adapt program responses system and method
US10448139B2 (en) Selective sound field environment processing system and method
US10318028B2 (en) Control device and storage medium
JP6094190B2 (ja) 情報処理装置および記録媒体
US20180124497A1 (en) Augmented Reality Sharing for Wearable Devices
US10856071B2 (en) System and method for improving hearing
JP6143975B1 (ja) 画像の取り込みを支援するためにハプティックフィードバックを提供するためのシステムおよび方法
JP6574937B2 (ja) 通信システム、制御方法、および記憶媒体
US20170303052A1 (en) Wearable auditory feedback device
WO2017130486A1 (fr) Dispositif de traitement d'informations, procédé de traitement d'informations, et programme
CN104509129A (zh) 耳机方位的自动检测
EP4085655A1 (fr) Systèmes de prothèse auditive et procédés
WO2015068440A1 (fr) Appareil de traitement d'informations, procédé de commande et programme associé
US20210350823A1 (en) Systems and methods for processing audio and video using a voice print
CN114115515A (zh) 用于帮助用户的方法和头戴式单元
US20220148599A1 (en) Audio signal processing for automatic transcription using ear-wearable device
CN109257490B (zh) 音频处理方法、装置、穿戴式设备及存储介质
EP3113505A1 (fr) Module d'acquisition audio monté sur la tête
CN112836685A (zh) 一种辅助阅读方法、系统及存储介质
US11327576B2 (en) Information processing apparatus, information processing method, and program
JP6766403B2 (ja) 頭部装着型表示装置、頭部装着型表示装置の制御方法、コンピュータープログラム
JP2021033368A (ja) 読み上げ装置
CN111149373B (zh) 用于评估语音接触的听力设备及相关方法
WO2022113189A1 (fr) Dispositif de traitement de traduction de parole

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18848807

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18848807

Country of ref document: EP

Kind code of ref document: A1