WO2016192762A1 - Augmented reality systems and methods to access optical information - Google Patents

Augmented reality systems and methods to access optical information

Info

Publication number
WO2016192762A1
WO2016192762A1 (PCT/EP2015/062019)
Authority
WO
WIPO (PCT)
Prior art keywords
image
text passage
unit
computing system
user
Prior art date
Application number
PCT/EP2015/062019
Other languages
French (fr)
Inventor
Pan Hui
Arailym BUTABAYEVA
Zhanpeng HUANG
Rui Zheng
Christoph Peylo
Original Assignee
Deutsche Telekom Ag
Priority date
Filing date
Publication date
Application filed by Deutsche Telekom Ag filed Critical Deutsche Telekom Ag
Priority to PCT/EP2015/062019
Priority to EP15733362.6A
Publication of WO2016192762A1

Classifications

    • G - PHYSICS
    • G09 - EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09B - EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
    • G09B21/00 - Teaching, or communicating with, the blind, deaf or mute
    • G09B21/001 - Teaching or communicating with blind persons
    • G09B21/008 - Teaching or communicating with blind persons using visual presentation of the information for the partially sighted
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/16 - Sound input; Sound output
    • G06F3/167 - Audio in a user interface, e.g. using voice commands for navigating, audio feedback

Abstract

The invention relates to a method of selective image enlargement comprising the steps of capturing an image using an image capture unit, recognizing at least one text passage in the captured image, enlarging at least one part of the recognized text passage and displaying the enlarged part on a display unit and/or converting at least one part of the recognized text passage into speech and outputting the at least one converted part through a speaker. The invention also relates to a computing system for selective image enlargement comprising an image capture unit configured to capture an image, and at least one processing unit configured to recognize at least one text passage in the captured image, and enlarge at least one part of the recognized text passage and/or convert at least one part of the recognized text passage into speech, wherein the computing system further comprises a display unit being configured, if the at least one part of the recognized text passage is enlarged, to display the enlarged part and/or wherein the computing system further comprises a speaker configured to output the at least one converted part through the speaker, if the at least one part of the recognized text passage is converted into speech. The invention also relates to a computing device for use in a computing system.

Description

Augmented reality systems and methods to access optical information
The invention relates to a method of selective image enlargement as well as a respective computing system and a respective computing device.
Visual impairment such as nearsightedness or farsightedness is a common vision deficiency that affects people's daily life.
Nowadays, wearable computing devices, for example in the form of eyewear, head-mounted displays or helmet-mounted displays, are publicly available to provide augmented reality experiences. A wearable computing device with augmented reality capabilities may be configured to allow visual perception of a real-world environment, and to display computer-generated information related to this perception. This computer-generated information may in fact be integrated with the user's perception of the real-world environment. For example, the computer-generated information may supplement a user's current perception of the physical world.
In some situations, the user of a wearable computing device may have difficulties in seeing. For instance, the user may be visually impaired, i.e., he/she may be short-sighted and/or long-sighted. Therefore, it may be difficult for such a user to see the surrounding environment clearly.
Thus, it may be beneficial to provide him/her with information, in particular text information, from the surrounding environment. Text information around people constitutes a valuable source of information, and visually impaired people may need assistance to access it in their daily life. Text can be printed in many media forms such as books, magazines, newspapers, advertisements, restaurant menus, flyers, product information, street signs, information boards, etc.
However, simple text recognition is not enough, since a user might face a variety of situations in everyday life, such as reading a book, taking a walk, attending a meeting or driving a car. Therefore, the system should adapt to users' needs in different situations.
Several systems have been proposed that are able to assist visually impaired individuals using mobile devices such as smartphones. However, holding a smartphone in the hand for a long time is inconvenient for the user. In the current state of the art, systems to retrieve text information cannot provide particular benefits to individuals suffering from visual impairment, such as those with short- and long-sighted vision.
To this extent, mobile wearable devices could be useful to overcome these drawbacks of the prior art. In order for an electronic device providing text recognition to fully benefit visually impaired individuals, the device should address certain criteria for user interaction. Among those criteria, it is desirable that the methods of interaction are natural, i.e., do not create discomfort for the user and do not require excessive user attention while using the device.
Moreover, it is desirable for the system to provide a flexible and user-friendly interface enabling the visually impaired user to easily activate and control the text recognition functionality of the device. Furthermore, in order to fully empower the visually impaired individuals, while also protecting their safety, high text recognition reliability and real-time functioning are desirable for the system.
It is an object of the invention to provide a method of selective image enlargement as well as a respective computing system and respective computing device.
The object of the invention is solved with the features of the independent claims. Dependent claims refer to preferred embodiments.
It is a gist of the invention to provide an augmented reality based method that can aid visually impaired individuals in accessing text information in their environments. With this method, visually impaired individuals are able to retrieve text content from their current view; the content of the text information can be provided in different ways according to their needs, in various scenarios.
According to the invention, on-demand services with a hands-free user experience can be supplied.
Preferably, a system according to the invention either directly overlays text information on the current view, and/or converts it into speech. In the latter case, the users can have a full view of the real world while hearing the information contained in the retrieved text. The sight of a visually impaired individual might be significantly improved by the text recognition capability of an electronic assistant device providing text recognition functionalities along with other functionalities.
The invention relates to a method of selective image enlargement comprising the steps of: capturing an image using an image capture unit, recognizing at least one text passage in the captured image, enlarging at least one part of the recognized text passage and displaying the enlarged part on a display unit and/or converting at least one part of the recognized text passage into speech and outputting the at least one converted part through a speaker.
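For illustration only (the patent itself prescribes no particular implementation), this pipeline can be sketched in a few lines of Python; the use of OpenCV, pytesseract and pyttsx3 as stand-ins for the image capture unit, the text recognition and the speech output is an assumption, and all function names are illustrative.

```python
# Minimal sketch of the claimed pipeline, not the patented implementation.
import cv2
import pytesseract
import pyttsx3

def capture_image(camera_index=0):
    """Capture a single frame from the image capture unit (here: a webcam)."""
    cap = cv2.VideoCapture(camera_index)
    ok, frame = cap.read()
    cap.release()
    if not ok:
        raise RuntimeError("image capture failed")
    return frame

def recognize_text(image):
    """Recognize at least one text passage in the captured image via OCR."""
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    return pytesseract.image_to_string(gray).strip()

def enlarge_and_show(image, region, factor=2.0):
    """Enlarge one part of the image (x, y, w, h) and display it."""
    x, y, w, h = region
    part = image[y:y + h, x:x + w]
    enlarged = cv2.resize(part, None, fx=factor, fy=factor,
                          interpolation=cv2.INTER_CUBIC)
    cv2.imshow("enlarged text passage", enlarged)
    cv2.waitKey(0)

def speak(text):
    """Convert the recognized text passage into speech and output it."""
    engine = pyttsx3.init()
    engine.say(text)
    engine.runAndWait()

frame = capture_image()
passage = recognize_text(frame)
if passage:
    speak(passage)  # and/or enlarge_and_show(frame, some_region)
```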
Preferably, the method further comprises the steps of recognizing a user gesture, localizing the position of the user gesture, and selecting the at least one part of the text passage on the basis of the localized position.
Preferably, the user gesture is a movement of the user's hand. Preferably, the at least one part of the text passage is selected by localizing where the user is pointing, preferably within the focus of the image capture unit. Preferably, the user gesture is a touch on a touch-sensitive display unit. Preferably, the user gesture is recognized within the captured image, i.e., preferably, the image is captured, the user gesture in the captured image is recognized, and, by localizing the position of the user gesture, the at least one part of the text passage is selected.
Preferably, the method further comprises the step of displaying the captured image on the display unit. Preferably, the display unit is an LCD display. Preferably, the display unit is touch-sensitive.
Preferably, if the at least one part of the recognized text passage is enlarged, the enlarged text passage is displayed as an image of a virtual magnifier overlaid on the captured image on the display.
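A virtual magnifier of this kind could, for example, be realized as in the following sketch; the lens radius, zoom factor and border drawing are illustrative assumptions rather than the disclosed implementation.

```python
import cv2

def overlay_magnifier(frame, center, radius=80, factor=2.0):
    """Overlay a 'virtual magnifier' showing an enlarged crop of the frame
    around `center` (x, y), clipped to the frame borders."""
    h, w = frame.shape[:2]
    x, y = center
    x0, x1 = max(x - radius, 0), min(x + radius, w)
    y0, y1 = max(y - radius, 0), min(y + radius, h)
    crop = frame[y0:y1, x0:x1]
    zoom = cv2.resize(crop, None, fx=factor, fy=factor,
                      interpolation=cv2.INTER_LINEAR)
    zh, zw = zoom.shape[:2]
    # Paste the zoomed patch back, centered on the pointing position.
    px0, py0 = max(x - zw // 2, 0), max(y - zh // 2, 0)
    px1, py1 = min(px0 + zw, w), min(py0 + zh, h)
    frame[py0:py1, px0:px1] = zoom[:py1 - py0, :px1 - px0]
    # Draw a lens-like border so the magnifier is visually distinct.
    cv2.circle(frame, (x, y), min(zw, zh) // 2, (255, 255, 255), 2)
    return frame
```

In the embodiments described below, `center` would be the localized fingertip position.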
Preferably, the captured image comprises two or more text passages to be recognized, and the method preferably further comprises the steps of: recognizing a first user gesture, localizing the position of the first user gesture, and selecting, on the basis of the first localized position, the at least one text passage of which at least one part is to be enlarged. Preferably, the user gesture is a hand gesture. Preferably, the recognition of a user gesture is based on a color-based detection algorithm.
Preferably the text passages to be recognized are highlighted, preferably with rectangles and/or colors overlaid on the displayed image on the display.
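Such highlighting could plausibly be derived from word-level OCR output grouped into blocks, as in the following sketch; the pytesseract-based grouping heuristic and the rectangle color are assumptions.

```python
import cv2
import pytesseract
from pytesseract import Output

def highlight_text_passages(frame):
    """Draw rectangles around detected text blocks so the user can see
    which passages are selectable; returns the frame and the boxes."""
    data = pytesseract.image_to_data(frame, output_type=Output.DICT)
    boxes = {}
    for i, txt in enumerate(data["text"]):
        if not txt.strip():
            continue
        key = data["block_num"][i]
        x, y = data["left"][i], data["top"][i]
        r, b = x + data["width"][i], y + data["height"][i]
        x0, y0, x1, y1 = boxes.get(key, (x, y, r, b))
        boxes[key] = (min(x0, x), min(y0, y), max(x1, r), max(y1, b))
    for (x0, y0, x1, y1) in boxes.values():
        cv2.rectangle(frame, (x0, y0), (x1, y1), (0, 255, 0), 2)
    return frame, list(boxes.values())
```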
Preferably the method further comprises the step of recognizing the characters of the at least one text passage and/or the at least one part of the recognized text passage.
Preferably, the characters of the at least one text passage also comprise emoticons, images and/or the like.
Preferably, the recognition of the characters comprises optical character recognition - OCR- techniques. Preferably, the OCR techniques also comprise intelligent character recognition (ICR) to recognize handwritten text. Preferably, the recognition of the characters also comprises the recognition of the combinations of computer readable characters that form at least one of words, phrases, sentences, paragraphs, addresses, phone numbers, dates, etc.
Preferably the position localization of the user gesture is performed by localizing the position of a user's hand and contouring the user's hand.
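A color-based realization of this localization step might look as follows; the HSV skin range and the topmost-contour-point fingertip heuristic are illustrative assumptions, not the patented algorithm (the OpenCV 4.x API is assumed).

```python
import cv2
import numpy as np

def locate_fingertip(frame):
    """Color-based hand detection: threshold skin-like pixels in HSV,
    contour the largest blob (the hand) and take its topmost point as
    the fingertip. The HSV range is an assumption and needs tuning."""
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, np.array([0, 40, 60]), np.array([25, 180, 255]))
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, np.ones((5, 5), np.uint8))
    # OpenCV 4.x: findContours returns (contours, hierarchy).
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None
    hand = max(contours, key=cv2.contourArea)
    if cv2.contourArea(hand) < 1000:   # ignore noise blobs
        return None
    # Topmost contour point approximates the fingertip when pointing upward.
    tip = tuple(hand[hand[:, :, 1].argmin()][0])
    return tip
```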
Preferably the steps of capturing an image using an image capture unit, recognizing at least one text passage in the captured image, enlarging at least one part of the recognized text passage and displaying the enlarged part on a display unit and/or converting at least one part of the recognized text passage into speech and outputting the at least one converted part through a speaker are repeated with a predetermined repetition time.
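Assuming the helper functions from the first sketch above, such a repetition could be expressed as a simple timed loop; the period is an illustrative value.

```python
import time

REPETITION_TIME = 0.5   # seconds; the "predetermined repetition time"

def run_loop():
    """Repeat the capture/recognize/present cycle at a fixed period,
    reusing capture_image, recognize_text and speak from the sketch above."""
    while True:
        frame = capture_image()
        passage = recognize_text(frame)
        if passage:
            speak(passage)
        time.sleep(REPETITION_TIME)
```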
Preferably the method further comprises the step of activating the selective image enlargement by recognizing a predetermined voice command through a microphone unit and/or recognizing a predetermined command through an input unit.
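A sketch of such voice activation using the SpeechRecognition package follows; the activation phrase and the use of the Google Web Speech backend are assumptions made for illustration.

```python
import speech_recognition as sr

ACTIVATION_PHRASE = "magnify"   # illustrative predetermined voice command

def wait_for_activation():
    """Listen on the microphone unit until the predetermined voice
    command is recognized, then return."""
    recognizer = sr.Recognizer()
    with sr.Microphone() as source:
        recognizer.adjust_for_ambient_noise(source)
        while True:
            audio = recognizer.listen(source, phrase_time_limit=3)
            try:
                heard = recognizer.recognize_google(audio).lower()
            except (sr.UnknownValueError, sr.RequestError):
                continue   # nothing intelligible heard; keep listening
            if ACTIVATION_PHRASE in heard:
                return
```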
The invention relates to a computing system for selective image enlargement comprising an image capture unit configured to capture an image and at least one processing unit. The at least one processing unit is configured to recognize at least one text passage in the captured image, and enlarge at least one part of the recognized text passage and/or convert at least one part of the recognized text passage into speech. If the at least one part of the recognized text passage is enlarged, the computing system further comprises a display unit being configured, to display the enlarged part. If the at least one part of the recognized text passage is converted into speech, the computing system further comprises a speaker configured to output the at least one converted part through the speaker.
Preferably, text recognition may be implemented locally on a personal mobile device. Text recognition may be powered by cloud computing powerful enough to perform the necessary text recognition in or near real time. Preferably computationally intensive methods may be offloaded to nearby mobile devices and/or remote cloud computing services to improve runtime performance.
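Offloading could, for instance, be a timed HTTP round trip with a local fallback, as sketched below; the endpoint URL and the JSON response format are hypothetical.

```python
import cv2
import requests

OCR_ENDPOINT = "https://example.com/ocr"   # hypothetical offloading service

def recognize_remotely(frame, timeout=2.0):
    """Offload OCR to a remote service; return None on failure so the
    caller can fall back to on-device text recognition."""
    ok, jpeg = cv2.imencode(".jpg", frame)
    if not ok:
        return None
    try:
        resp = requests.post(OCR_ENDPOINT, files={"image": jpeg.tobytes()},
                             timeout=timeout)
        resp.raise_for_status()
        return resp.json().get("text")
    except requests.RequestException:
        return None   # offloading failed or was too slow
```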
Preferably, the computing system further comprises a gesture recognition unit configured to recognize a user gesture, wherein the at least one processing unit is configured to localize the position of the user gesture and to select the at least one part of the text passage on the basis of the localized position.
Preferably, the display unit is configured to display the captured image.
Preferably the gesture recognition unit is configured to recognize the user gesture within the captured image.
Preferably, where the captured image comprises two or more text passages to be recognized, the gesture recognition unit is configured to recognize a first user gesture, and the at least one processing unit is configured to localize the position of the first user gesture and to select, on the basis of the first localized position, the at least one text passage of which at least one part is to be enlarged.
Preferably the at least one processing unit is configured to recognize the characters of the at least one text passage and/or the at least one part of the recognized text passage.
Preferably the gesture recognition unit is configured to localize the position of the user gesture by localizing the position of a user's hand and contouring the user's hand.
Preferably the computing device comprises a microphone unit, wherein the at least one processing unit is configured to activate the selective image enlargement by recognizing a predetermined voice command through the microphone unit. Preferably the computing device comprises an input unit, wherein the at least one processing unit is configured to activate the selective image enlargement by recognizing a predetermined command through the input unit.
Preferably the computing system comprises a computing device, preferably a portable computing device, wherein the computing device comprises at least one of the following: the image capture unit, the display unit and the processing unit.
Preferably the computing device is a wearable computing device, more preferably a virtual reality head-mounted display, an optical head-mounted display, eyewear, a head-mounted display, a helmet-mounted display.
Preferably, in case that the wearable computing devices have limited computational power and/or battery capacity, computational tasks, preferably text recognition, are offloaded to mobile and/or locally fixed computing devices.
Preferably the computing device comprises the gesture recognition unit.
The invention also relates to the computing device according to any of the preceding examples preferably for use in a computing system according to any of the preceding examples.
The methods and system described herein can facilitate providing text information present in the surrounding real-world environment.
Among other advantages, embodiments disclosed herein provide an intuitive and integrated user experience in physical browsing using augmented reality devices, thereby reducing time for navigation in real life scenarios.
The invention is of particular use to help visually impaired people to access information in non-crucial situations, for instance, shopping in a mall, walking in the streets or wandering around the campus of a university.
The present invention aids visually impaired people, especially in situations in which the user has difficulties seeing text on information boards, direction plates etc. In these cases, people require additional information to make their decisions. In such scenarios, the information might be helpful for making a current decision and can be discarded immediately after use or utilized further. After the user acquires the necessary information, it may be of no further use, or the user may want to manipulate and/or store the acquired data.
In the drawings:
Fig. 1a shows a schematic diagram of a method of selective image enlargement according to a first embodiment of the invention;
Fig. 1b shows another schematic diagram of a method of selective image enlargement according to the first embodiment of the invention;
Fig. 1c shows a flowchart diagram of a method of selective image enlargement according to the first embodiment of the invention;
Fig. 2a shows a schematic diagram of a method of selective image enlargement according to a second embodiment of the invention;
Fig. 2b shows another schematic diagram of a method of selective image enlargement according to the second embodiment of the invention;
Fig. 2c shows a flowchart diagram of a method of selective image enlargement according to the second embodiment of the invention;
Fig. 3a shows a schematic diagram of a method of selective image enlargement according to a third embodiment of the invention;
Fig. 3b shows another schematic diagram of a method of selective image enlargement according to the third embodiment of the invention;
Fig. 3c shows a flowchart diagram of a method of selective image enlargement according to the third embodiment of the invention; and Fig. 4 shows a schematic diagram of a computing system for selective image enlargement according to an embodiment of the invention.
Fig. 1a shows a schematic diagram of a method of selective image enlargement according to a first embodiment of the invention. In this embodiment, a wearable computing device comprises a head-mounted display, a camera unit and a display configured to display an image taken by the camera unit. An image 304 of a medium 301 is taken by the camera unit of the wearable computing device. The medium 301 includes text 302, and the wearable computing device uses the camera to capture the image of the medium 301 and to display the image 304 on the head-mounted display. Since the image 304 corresponds with the medium 301, the image 304 also at least partially includes the text 302. In this embodiment, the image 304 is considered to be a streaming video image that corresponds with what is currently being viewed by the wearable computing device's camera. In other words, if the wearable computing device is moved over to a different part of the text 302, then a different image of a different part of the text 302 will be displayed on the wearable computing device's display.
In Fig. 1b, a layer of electronically enlarged text is displayed overlaying the original image 304. The electronically enlarged region comprises the same characters or words as the text in the image 304, but enlarged and displayed as a magnifier 305. The position of the magnifier 305 can be moved/adjusted according to the recognized pointing of the user's finger 306.
Figure 1c shows a schematic flow chart diagram of a method of selective image enlargement according to the first embodiment of the invention. Specifically, Fig. 1c shows a flow diagram of instructions executable on a wearable computing device for displaying text when visually impaired users need assistance to read from objects such as books, newspapers, restaurant menus, product information or the like.
At block 308, the user of the wearable computing device performs a voice command to activate the selective image enlargement method according to the first embodiment of the present invention. It is understood by the person skilled in the art that such an activation step is not compulsory and/or that the method can also be activated by other means, e.g. by touching a touch-sensitive input unit. At block 309, the wearable computing device captures an image of the text 302 of the medium 301 using the built-in camera. At block 310, the gesture recognition algorithm localizes the position of the user's hand and contours the user's hand in order to identify the user's segment of interest in the text 302. At block 311, the enlarged version of the selected text is presented in a virtual magnifier 305 and displayed over the original image of the text 302.
In a further step, at block 312, images of the text 302 are continuously captured and automatically updated to display the user's segment of interest in accordance with the detected position of the user's finger. In other words, as soon as the user moves his finger to another segment of interest in the text 302, this other segment of interest will be selected and enlarged in the virtual magnifier 305.
Fig. 2a shows a schematic diagram of a method of selective image enlargement according to a second embodiment of the invention. This embodiment is designed, among other things, to help visually impaired people to access information in non-crucial situations, for instance, shopping in a mall, walking on the streets or wandering around the university campus.
In Fig. 2a, as in the first embodiment, a wearable computing device comprises a head-mounted display, a camera unit and a display configured to display an image taken by the camera unit. An image 404 of a medium 401 is taken by the camera unit of the wearable computing device. The medium 401 includes two text passages 402 and 403. The wearable computing device uses the camera to capture the image of the medium 401 and to display the image 404 on the head-mounted display. Since the image 404 corresponds with the medium 401, the image 404 also includes the two text passages 402 and 403. Also in this embodiment, the image 404 is considered to be a streaming video image that corresponds with what is currently being viewed by the wearable computing device's camera.
In Fig. 2b, as multiple text passages 402, 403 exist in the current view of the wearable computing device's camera and in the captured image respectively, the candidate text passages are highlighted with virtual rectangles 405, 406 and are converted into selectable items for the user. The user can then use his/her finger to select the text passage of interest, which is then preferably highlighted with a rectangle of an alternative color to indicate his/her selection. The selected text 405 or 406 is then recognized and synthesized into speech by the wearable computing device. Afterwards, the synthesized speech is outputted by a loudspeaker of the computing device. Figure 2c shows a schematic flow chart diagram of a method of selective image enlargement according to the second embodiment of the invention.
At block 408, the wearable computing device receives a voice command to activate the method of selective image enlargement. Also in this case, it is understood by the person skilled in the art that such an activation step is not compulsory and/or that the method can also be activated by other means, e.g. by touching a touch-sensitive input unit.
At block 409, the wearable computing device captures an original image of the scene, i.e. the board 401 including the two text passages 402 and 403, using the camera of the wearable computing device.
At block 410, the two text passages 402 and 403 are localized and segmented.
At block 411, the selectable text passages are displayed in a highlighted form, i.e. with virtual rectangles, on the wearable device's display.
At block 412, the user's gesture is recognized; to this end, the position of the user's hand is localized and contoured. With this information, the text passage 402 or 403 that the user is interested in is identified.
At block 413, the text of the text passage the user is interested in is recognized by pattern recognition.
At block 414, the recognized text is converted into speech and output through a loudspeaker of the wearable computing device.
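Taken together with the earlier sketches, blocks 412 to 414 essentially amount to a point-in-rectangle test over the highlighted passages, followed by OCR and speech synthesis for the chosen region; the following sketch assumes `fingertip` comes from the hand-localization sketch and `passage_boxes`/`boxes` from the highlighting sketch given earlier.

```python
def select_passage(fingertip, passage_boxes):
    """Return the index of the highlighted passage (x0, y0, x1, y1) that
    contains the localized fingertip position, or None if none does."""
    if fingertip is None:
        return None
    x, y = fingertip
    for i, (x0, y0, x1, y1) in enumerate(passage_boxes):
        if x0 <= x <= x1 and y0 <= y <= y1:
            return i
    return None

# Illustrative use with the earlier sketches:
#   frame, boxes = highlight_text_passages(frame)
#   idx = select_passage(locate_fingertip(frame), boxes)
#   if idx is not None:
#       x0, y0, x1, y1 = boxes[idx]
#       speak(recognize_text(frame[y0:y1, x0:x1]))
```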
Fig. 3a shows a schematic diagram of a method of selective image enlargement according to a third embodiment of the invention. This method, among other things, is designed for users to have access to text information from e.g. slide show presentations, e.g. in situations such as attending a class or participating in a meeting. In particular, the recognition region of the method can be automatically adapted according to the position and size of the projector screen. This embodiment is particularly suited for the case in which the text passages of interest and the user location are relatively fixed with respect to each other, and all the text passages located in the fixed region can be recognized, e.g. by OCR. However, as in the previous embodiments, dynamic recognition is still preferred since the presented slides change over time. As in the first and second embodiments, a wearable computing device comprises a head-mounted display, a camera unit and a display configured to display an image taken by the camera unit. An image 504 of the projected slide 501 is taken by the camera unit of the wearable computing device. The slide 501 includes a three-line text passage 502. The wearable computing device uses the camera to capture the image of the slide 501 and to display the image 504 on the head-mounted display. Since the image 504 corresponds with the slide, the image 504 also includes the bullet-numbered text passage 502. Also in this embodiment, the image 504 is preferably considered to be a streaming video image that corresponds with what is currently being viewed by the wearable computing device's camera.
In Fig. 3b, the three-line text passage 502 is enlarged and presented as recognized text 504 in a rectangle on the display below the original image of the three-line text passage 502.
Figure 3c shows a schematic flow chart diagram of a method of selective image enlargement according to the third embodiment of the invention.
At block 506, the wearable computing device receives a voice command to activate the method of selective image enlargement. Also in this case, it is understood by the person skilled in the art that such an activation step is not compulsory and/or that the method can also be activated by other means, e.g. by touching a touch-sensitive input unit.
At block 507, the wearable computing device captures an original image of the scene, i.e. the projected slide 501 including the bullet-numbered text passage 502, using the camera of the wearable computing device.
At block 508, the bullet-numbered text passage 502 is localized.
At block 509, the bullet-numbered text passage 502 is recognized by pattern recognition.
At block 510, the enlarged recognized text is displayed in a rectangle on the display below the bullet-numbered text passage 502.
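Blocks 508 to 510 can be sketched as OCR over a fixed recognition region followed by rendering the recognized lines, enlarged, below that region; the region coordinates, font and colors in this sketch are illustrative assumptions (clipping at the frame border is omitted for brevity).

```python
import cv2
import pytesseract

def render_recognized_below(frame, region, font_scale=1.2):
    """OCR the fixed recognition region (e.g. the projector screen) and
    draw the recognized lines, enlarged, in a rectangle below it."""
    x, y, w, h = region
    text = pytesseract.image_to_string(frame[y:y + h, x:x + w])
    lines = [ln for ln in text.splitlines() if ln.strip()]
    line_h = int(30 * font_scale)
    box_h = line_h * (len(lines) + 1)
    # White rectangle below the region, then the recognized lines in black.
    cv2.rectangle(frame, (x, y + h), (x + w, y + h + box_h),
                  (255, 255, 255), -1)
    for i, ln in enumerate(lines):
        cv2.putText(frame, ln, (x + 10, y + h + line_h * (i + 1)),
                    cv2.FONT_HERSHEY_SIMPLEX, font_scale, (0, 0, 0), 2)
    return frame
```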
Fig. 4 shows a schematic diagram of a computing system 100 for selective image enlargement according to an embodiment of the invention. In this embodiment, all the components of the computing system 100 are comprised in a wearable computing device. However, it is understood by the skilled person that the computing system 100 according to the invention can also comprise two or more entities. For example, the image capture unit 102 (see below) and the display unit 103 (see below) can be part of a head-mounted display unit, i.e. a wearable, and the processing unit 101 (see below) can be comprised in a mobile computing device, e.g. a smartphone.
The computing system/wearable computing device 100 comprises an image capture unit/camera unit 102 which is configured to capture an image. The computing system 100 further comprises a processing unit 101 which recognizes at least one text passage in the captured image of the image capture unit 102 and is further configured to enlarge at least one part of the identified text passage. In addition to or as an alternative to the enlargement of the at least one part of the identified text passage, the processing unit 101 converts the at least one part of the identified text passage into speech. The computing system 100 further comprises a display unit 103 which displays the captured image as well as the enlarged recognized text passage. The computing system 100 also comprises a speaker 104 which outputs the part of the text passage which is converted into speech.
The computing system 100 further comprises a gesture recognition unit 105 configured to recognize a user gesture. In this context, the processing unit 101 is further configured to localize the position of the gesture and to select the at least one part of the text passage on the basis of the localized position.
The computing system 100 further comprises an audio input unit/microphone unit 106. The microphone unit 106 is able to convert voice commands of the user into electrical signals and the processing unit 101 activates the selective image enlargement method by recognizing a predetermined voice command through the microphone unit 106.
As an alternative or in addition to the microphone unit 106, the computing system 100 comprises an input unit 107 which is touch-sensitive. The processing unit 101 activates the selective image enlargement method as soon as the user touches the input unit 107 in a predetermined manner, e.g. by pressing a predetermined button displayed on the input unit 107 and/or by performing a predetermined gesture on the touch-sensitive input unit 107. According to an embodiment, the computing system 100 further comprises several modules, for example a network interface, a memory unit, a storage unit, and/or several motion sensors such as a gyroscope, a magnetometer and an accelerometer (not shown). While the invention has been illustrated and described in detail in the drawings and the foregoing description, such illustration and description are to be considered illustrative or exemplary and not restrictive; the invention is thus not limited to the disclosed embodiments. Variations of the disclosed embodiments can be understood and effected by those skilled in the art in practicing the claimed invention, from a study of the drawings, the disclosure, and the appended claims. In the claims, the word "comprising" does not exclude other elements or steps, and the indefinite article "a" or "an" does not exclude a plurality and may mean "at least one".

Claims

1. Method of selective image enlargement comprising the steps of:
- capturing an image using an image capture unit,
- recognizing at least one text passage in the captured image,
- enlarging at least one part of the recognized text passage and displaying the enlarged part on a display unit and/or converting at least one part of the recognized text passage into speech and outputting the at least one converted part through a speaker.
2. Method according to claim 1, further comprising the steps of:
- recognizing a user gesture,
- localizing the position of the user gesture, and
- selecting the at least one part of the text passage on the basis of the localized position.
3. Method according to claim 1 or 2, further comprising the step of displaying the captured image on the display unit.
4. Method according to claim 2 or 3, wherein the user gesture is recognized within the captured image.
5. Method according to any of claims 1 to 4, wherein, if the at least one part is enlarged, the enlarged text passage is displayed as an image of a virtual magnifier overlaid on the captured image on the display.
6. Method according to any of claims 1 to 5, wherein the captured image comprises two or more text passages to be recognized, further comprising the steps of:
- recognizing a first user gesture,
- localizing the position of the first user gesture, and
- selecting, on the basis of the first localized position, the at least one text passage of which at least one part is to be enlarged.
7. Method according to claim 6, wherein the text passages to be recognized are highlighted, preferably with rectangles and/or colors overlaid on the displayed image on the display.
8. Method according to any of claims 1 to 7, further comprising the step of recognizing the characters of the at least one text passage and/or the at least one part of the recognized text passage.
9. Method according to any of claims 2 to 8, wherein the position localization of the user gesture is performed by localizing the position of a user's hand and contouring the user's hand.
10. Method according to any of claims 1 to 9, wherein the steps of capturing an image using an image capture unit, recognizing at least one text passage in the captured image, enlarging at least one part of the recognized text passage and displaying the enlarged part on a display unit and/or converting at least one part of the recognized text passage into speech and outputting the at least one converted part through a speaker are repeated with a predetermined repetition time.
11. Method according to any of claims 1 to 10, further comprising the step of activating the selective image enlargement by recognizing a predetermined voice command through a microphone unit and/or recognizing a predetermined command through an input unit.
12. Computing system for selective image enlargement comprising:
an image capture unit configured to capture an image, and
at least one processing unit configured to:
-recognize at least one text passage in the captured image, and
-enlarge at least one part of the recognized text passage and/or convert at least one part of the recognized text passage into speech,
wherein the computing system further comprises a display unit being configured, if the at least one part of the recognized text passage is enlarged, to display the enlarged part and/or wherein the computing system further comprises a speaker configured to output the at least one converted part through the speaker, if the at least one part of the recognized text passage is converted into speech.
13. Computing system according to claim 12, further comprising
a gesture recognition unit configured to recognize a user gesture, wherein the at least one processing unit is configured to localize the position of the user gesture and to select the at least one part of the text passage on the basis of the localized position.
14. Computing system according to claims 12 or 13, wherein the display unit is configured to display the captured image.
15. Computing system according to claims 13 or 14, wherein the gesture recognition unit is configured to recognize the user gesture within the captured image.
16. Computing system according to any of claims 12 to 15, wherein the captured image comprises two or more text passages to be recognized,
wherein a/the gesture recognition unit is configured to recognize a first user gesture, and wherein the at least one processing unit is configured to localize the position of the first user gesture and to select, on the basis of the first localized position, the at least one text passage of which at least one part is to be enlarged.
17. Computing system according to any of claims 12 to 16, wherein the at least one processing unit is configured to recognize the characters of the at least one text passage and/or the at least one part of the recognized text passage.
18. Computing system according to any of claims 12 to 17, wherein the gesture recognition unit is configured to localize the position of the user gesture by localizing the position of a user's hand and contouring the user's hand.
19. Computing system according to any of claims 12 to 18, further comprising a microphone unit, wherein the at least one processing unit is configured to activate the selective image enlargement by recognizing a predetermined voice command through the microphone unit.
20. Computing system according to any of claims 12 to 19, further comprising an input unit, wherein the at least one processing unit is configured to activate the selective image enlargement by recognizing a predetermined command through the input unit.
21. Computing system according to any of claims 12 to 20, further comprising a computing device, preferably a portable computing device, wherein the computing device comprises at least one of the following: the image capture unit, the display unit and the processing unit.
22. Computing system according to claim 21, if not depending on claim 12, wherein the computing device comprises the gesture recognition unit.
23. Computing device according to claim 21 or 22, preferably for use in a computing system according to any of claims 12 to 20.
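For illustration only, the following minimal sketch shows one way the claimed method could be realized in Python. The use of OpenCV for image capture, Tesseract (via pytesseract) for text recognition and pyttsx3 for text-to-speech, as well as the zoom factor and the word-level granularity of the recognized passages, are assumptions of the sketch; the claims do not prescribe any particular library or engine.

import cv2          # pip install opencv-python
import pytesseract  # pip install pytesseract (requires the Tesseract binary)
import pyttsx3      # pip install pyttsx3

MAGNIFICATION = 2.0  # assumed zoom factor of the virtual magnifier


def capture_image(camera_index: int = 0):
    """Capture a single frame from the image capture unit (here: a webcam)."""
    camera = cv2.VideoCapture(camera_index)
    ok, frame = camera.read()
    camera.release()
    if not ok:
        raise RuntimeError("image capture failed")
    return frame


def recognize_text_passages(image):
    """Return (bounding box, text) pairs recognized in the captured image.

    Word-level boxes are used for brevity; grouping words into larger
    passages is an implementation detail left open here.
    """
    data = pytesseract.image_to_data(image, output_type=pytesseract.Output.DICT)
    passages = []
    for i, word in enumerate(data["text"]):
        if word.strip():
            box = (data["left"][i], data["top"][i],
                   data["width"][i], data["height"][i])
            passages.append((box, word))
    return passages


def highlight_passages(image, passages):
    """Highlight the passages to be recognized with rectangles (cf. claim 7)."""
    out = image.copy()
    for (x, y, w, h), _ in passages:
        cv2.rectangle(out, (x, y), (x + w, y + h), (0, 0, 255), 2)
    return out


def overlay_virtual_magnifier(image, box):
    """Overlay an enlarged copy of the selected part on the captured image."""
    x, y, w, h = box
    zoomed = cv2.resize(image[y:y + h, x:x + w], None,
                        fx=MAGNIFICATION, fy=MAGNIFICATION,
                        interpolation=cv2.INTER_LINEAR)
    out = image.copy()
    zh = min(zoomed.shape[0], out.shape[0])
    zw = min(zoomed.shape[1], out.shape[1])
    out[:zh, :zw] = zoomed[:zh, :zw]              # magnifier anchored top-left
    cv2.rectangle(out, (0, 0), (zw, zh), (0, 255, 0), 2)
    return out


def speak(text: str) -> None:
    """Convert the selected part into speech and output it through a speaker."""
    engine = pyttsx3.init()
    engine.say(text)
    engine.runAndWait()

A host loop would repeat capture and recognition with a predetermined repetition time (cf. claim 10) and route the user's selection either to overlay_virtual_magnifier for display or to speak for audio output.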
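The gesture localization of claims 2, 9, 13 and 18 could, as one possibility, be sketched as follows. Skin-color segmentation in HSV space, the topmost contour point as the fingertip, and the nearest-box selection rule are illustrative assumptions; the claims require only that the user's hand be localized and contoured. The select_passage helper reuses the (box, text) pairs produced by the previous sketch.

import cv2
import numpy as np

# Assumed HSV skin-color range; the disclosure does not fix a segmentation method.
SKIN_LOW = np.array([0, 40, 60], dtype=np.uint8)
SKIN_HIGH = np.array([25, 255, 255], dtype=np.uint8)


def localize_gesture_position(image):
    """Localize a pointing gesture by contouring the user's hand.

    Returns the (x, y) position of the assumed fingertip, taken as the
    topmost point of the largest skin-colored contour, or None if no
    hand-like contour is found.
    """
    hsv = cv2.cvtColor(image, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, SKIN_LOW, SKIN_HIGH)
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, np.ones((5, 5), np.uint8))
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None
    hand = max(contours, key=cv2.contourArea)            # contour of the hand
    fingertip = tuple(hand[hand[:, :, 1].argmin()][0])   # topmost contour point
    return fingertip


def select_passage(passages, position):
    """Select the passage whose bounding-box centre is nearest the gesture."""
    px, py = position

    def squared_distance(item):
        (x, y, w, h), _ = item
        return (x + w / 2 - px) ** 2 + (y + h / 2 - py) ** 2

    return min(passages, key=squared_distance)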
PCT/EP2015/062019 2015-05-29 2015-05-29 Augmented reality systems and methods to access optical information WO2016192762A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/EP2015/062019 WO2016192762A1 (en) 2015-05-29 2015-05-29 Augmented reality systems and methods to access optical information
EP15733362.6A EP3304528A1 (en) 2015-05-29 2015-05-29 Augmented reality systems and methods to access optical information

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/EP2015/062019 WO2016192762A1 (en) 2015-05-29 2015-05-29 Augmented reality systems and methods to access optical information

Publications (1)

Publication Number Publication Date
WO2016192762A1 true WO2016192762A1 (en) 2016-12-08

Family

ID=53498956

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2015/062019 WO2016192762A1 (en) 2015-05-29 2015-05-29 Augmented reality systems and methods to access optical information

Country Status (2)

Country Link
EP (1) EP3304528A1 (en)
WO (1) WO2016192762A1 (en)

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2015044830A1 (en) * 2013-09-27 2015-04-02 Visuality Imaging Ltd Methods and system for improving the readability of text displayed on an electronic device screen

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060147197A1 (en) * 2004-12-08 2006-07-06 Bernd Spruck Method for improving vision of a low-vision person and viewing aid
US20130343601A1 (en) * 2012-06-22 2013-12-26 Charles Jia Gesture based human interfaces
US20140062962A1 (en) * 2012-08-28 2014-03-06 Samsung Electronics Co., Ltd. Text recognition apparatus and method for a terminal

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
COLIN DUNJOHN: "OrCam aims to improve quality of life for the visually impaired", 5 June 2013 (2013-06-05), XP055231367, Retrieved from the Internet <URL:https://web.archive.org/web/20131026011948/http://www.gizmag.com/orcam-aids-visually-impaired/27784/> [retrieved on 20151125] *
MATT MCGEE: "Google Glass Gains a Magnifying Glass App - Glass Almanac", 29 September 2014 (2014-09-29), XP055231735, Retrieved from the Internet <URL:https://web.archive.org/web/20150405015611/http://glassalmanac.com/google-glass-gains-magnifying-glass-app/6193> [retrieved on 20151126] *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9996524B1 (en) 2017-01-30 2018-06-12 International Business Machines Corporation Text prediction using multiple devices
US10223352B2 (en) 2017-01-30 2019-03-05 International Business Machines Corporation Text prediction using multiple devices
US10223351B2 (en) 2017-01-30 2019-03-05 International Business Machines Corporation Text prediction using multiple devices
US10255268B2 (en) 2017-01-30 2019-04-09 International Business Machines Corporation Text prediction using multiple devices
US10558749B2 (en) 2017-01-30 2020-02-11 International Business Machines Corporation Text prediction using captured image from an image capture device
US11231832B2 (en) 2017-09-13 2022-01-25 Google Llc Efficiently augmenting images with related content
US11747960B2 (en) 2017-09-13 2023-09-05 Google Llc Efficiently augmenting images with related content
EP3602321B1 (en) * 2017-09-13 2023-09-13 Google LLC Efficiently augmenting images with related content

Also Published As

Publication number Publication date
EP3304528A1 (en) 2018-04-11

Similar Documents

Publication Publication Date Title
US11747618B2 (en) Systems and methods for sign language recognition
US11914153B2 (en) Method and apparatus for processing screen using device
US10971188B2 (en) Apparatus and method for editing content
JP6439788B2 (en) Information processing apparatus, control method, program, and system
KR20190038900A (en) Word Flow Annotation
Boldu et al. FingerReader2.0: Designing and evaluating a wearable finger-worn camera to assist people with visual impairments while shopping
US11227494B1 (en) Providing transit information in an augmented reality environment
Jafri et al. Exploring the potential of eyewear-based wearable display devices for use by the visually impaired
EP3304528A1 (en) Augmented reality systems and methods to access optical information
CN114296627A (en) Content display method, device, equipment and storage medium
Migkotzidis et al. e-vision: an AI-powered system for promoting the autonomy of visually impaired
WO2024091266A1 (en) System and method for generating visual captions
Arasu et al. A Review on Augmented Reality Technology
Ivanoska Wearable Computers

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 15733362

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE