EP3218896A1

EP3218896A1 - Externally wearable treatment device for medical application, voice-memory system, and voice-memory-method

Info

Publication number: EP3218896A1
Application number: EP15771861.0A
Authority: EP
Inventors: Pan Hui; Bowen SHI; Zhanpeng HUANG; Christoph Peylo
Original assignee: Deutsche Telekom AG
Current assignee: Deutsche Telekom AG
Priority date: 2015-09-01
Filing date: 2015-09-01
Publication date: 2017-09-20
Also published as: WO2017036516A1

Abstract

A system and method are disclosed for extracting personal information from daily vocal dialogues for people wearing augmented reality glass. Designed to help people with impaired memory (like Amnesia, Alzheimer's disease, etc.), the system is able to automatically extract practical personal information from daily face-to-face conversation. Those information will be stored as private dataset for later searching and inquiring in related applications such as reminding and recommendation systems.

Description

Externally wearable treatment device for medical application, Voice-Memory System, and Voice-Memory-Method

This invention relates to a system and a method for helping memory impaired people memorize information by extracting information from daily vocal dialogues and providing the extracted information to the user when required.

Augmented reality (AR) is a technology to supplement the real world by overlaying computer-generated virtual contents on the view of real environment to create a new mixed environment in which a user can see both the real and virtual contents in his or her field of view. It is particularly applicable when users require informational support for a task while still focusing on that task. It has the potential to allow users to interact with information without getting distracted from the real world. With optical see-through or video see-through display terminals, users are able to interact with virtual contents without attention distraction from the real environment. AR glasses are important devices where augmented reality is displayed. Versions include eyewear that employ cameras to intercept the real world view and re-display its augmented view through the eye pieces and devices in which the AR imagery is projected through or reflected off the surfaces of the eyewear lens pieces. With the capability to integrate augmented reality, the AR glasses have the potential to do many things such as feeding peoples' live information during activities and even letting people manipulate 3D objects with ease.

Conventionally, there is little or no focus on the potential synergy of integrating augmented reality technology with vocal or visual information extraction techniques to assist people in obtaining or managing those information. In a conference, people tend to meet many new colleagues in a short time and it is hard to memorize all their personal details. In particular, the task is even more difficult to the elders among whom a large percentage is undergoing memory loss because of multiple diseases like Alzheimer's disease. Traditional smart devices like smartphones do not provide a mature interface to perform information extraction, as the capability of smartphone to extract information cannot be offered in real-time. The main reason being that the smartphone is not a tool that is capable of combining both video and audio information perfectly, therefore, information extraction using the smartphone is not precise enough. Summary

Embodiments of the present technology relate to a system and a method for helping memory impaired people memorize information. The system extracts inforaiation from self-introduction dialogues. The extracted information will be provided to the user in cases when he needs. The main type of information the voice-memory system, VMS, aims to obtain is the vocal information and the facial information, which is a major information source for AR glasses.

The VMS assists the user to memorize and recall information from daily self-introduction dialogues, which are typically daily speech sources and contain large amount of personal information. The obtained information is stored in the AR glass memory and will be offered to the user when he needs them. As an example, when people introduce themselves, the VMS will be triggered automatically and personal information like job and hobbies will be drawn if they are dealt with in the dialogue. In the future, when the user meets the target person again, both his photo and personal information will be displayed as a hint on the screen, which assists the user to recall him.

In accordance with another aspect of the present invention, an externally wearable treatment device for medical application is provided. The externally wearable treatment device comprises an augmented reality glass which may comprises a camera for capturing a live video stream or an image; a voice recorder to record speech concurrently with the camera; a central processing unit which may include a speech processor; an image processor; a natural language processor; wherein the central processing unit is adapted to generate and render the plurality of the people information contents for display on a screen of the augmented reality glass; a memory unit for storing a plurality of images captured by the camera and the plurality of people information contents; a display device for displaying a fused virtual and real contents; where the virtual content may comprises at least one of the plurality of people information contents.

In accordance with another aspect of the present invention, a Voice-Memory System, VMS, for assisting a user to memorize and recall a plurality of people information contents is provided. The system may comprise an augmented reality glass which may include a camera for capturing a live video stream or an image; a voice recorder to record speech concurrently with the camera; a central processing unit which may include a speech processor; an image processor; a natural language processor; wherein the central processing unit is adapted to generate and render the plurality of the people information contents for display on a screen of the augmented reality glass; a memory unit for storing a plurality of images captured by the camera and the plurality of people information contents; a display device for displaying a fused virtual and real contents; where the virtual content may comprises at least one of the plurality of people information contents.

The device or system according to a further aspect of the present invention may include a plurality of sensors adapted to gather information including position and orientation. The plurality of sensors may comprise at least a voice sensor, a position and orientation sensor and a motion sensor. Accordingly, an operation system of the central processing unit may be an Android-based system.

The speech processor may process a vocal signal of the speech. The speech processor may include a speech recognizer, an Android speech recognition API and a local speech recognizer. The Google server may be adapted to upload the vocal signal.

The image processor may include a face recognizer, a face detector and a snapdragon Software Development Kit. The image processor may be adapted to detect whether the human face exists in the image; and recognize and compare the human face in the image with the images stored in a database of the memory unit. The image processor may be adapted to process a plurality of human faces in the image and select a region of interest of the human face in the image. The image processor may be operable without connection to the internet.

The natural language processor may comprise a text classifier and an information extractor; wherein the information extractor is based on an Open Natural Language Processor Library. The natural language processor may be adapted to perform automatic summarization, preferably producing a readable summary of a chunk of text; discourse analysis, including identifying discourse structure of a connected text; Named Entity Recognition, NER, preferably determining which items in the text map to proper names such as people or places; parsing, preferably determining the parse tree of a given sentence. The natural language processor may be a self-designed NLP module; which utilizes the open source natural language processing tool kits as well as self-designed algorithms.

The device or system according to a further aspect of the present invention may include a motion evaluator; where the motion sensor may be adapted to measure the motion of augmented reality glass and the motion evaluator may be adapted to judge whether the augmented reality glass is in a static state or in a motion state.

The plurality of people information contents may comprise at least one of three categories of information; the three categories of information may comprise of a vocal information, a facial information or an extra information; and the three categories of information may be synthesized by an intention interpreter. The vocal information may be the vocal signal from the speech; the vocal information may be processed by a speech processor and translated into a plurality of text formed scripts; and the plurality of text formed scripts may be processed by the text classifier and information extractor in the natural language processor. The vocal information may comprise at least four types of information, preferably, name, job, company and age. The facial information may comprise of the human face, preferably a face of the person the user is talking to. The extra information may comprise a geographical and a date information.

The size of the memory unit may be at least 100Mb and stores a maximum of 150 people information contents. The memory unit may further comprise a Read-only memory and a Random-access memory, wherein the Read-only memory is a database. The augmented reality glass may further comprise an information retrieval system, and the information retrieval system may be adapted to transform the vocal information into a plurality of text and image form. The augmented reality glass may be connectable to the internet via an internet module and comprises GPS functionality. The device or system according to a further aspect of the present invention, may further comprise a battery and a power manager, the power manager may function as an interface between the battery and the system; the power manager may regulate the turning on or off of the system in accordance with an electricity level; the power manager may monitor the battery level; and when the battery is at a low level, the power manager turns the system off automatically to conserve the battery consumption.

The device or system according to a further aspect of the present invention may further comprise a storage manager. The storage manager may comprise a plurality of interfaces, where at least one of the plurality of interfaces may displays a plurality of storage information on an upper portion of the interface on the screen of the augmented reality glass, where the plurality of storage information may comprise at least a total number of the names of the people information contents stored in the database of the memory unit, and a percentage of free space available in the database of the memory unit, wherein at least one of the plurality of interfaces may display the names of the plurality of people information contents in an alphabetically order on the middle and lower portions of the interface on the screen of the augmented reality glass.

In accordance with another aspect of the present invention, a voice-memory method is provided. The method may be used for assisting a user to memorize and recall a plurality of people information contents in the voice-memory system according to the above disclosure; the method may further comprise the steps of operating a sleeping state; operating an inputting state; and operating an outputting state.

The step of operating a sleeping state may comprise running an operating system in the background; where the operating system may be surveilling the environment by analyzing the speech through the voice recorder, the image from the camera and the user head's attitude information through the motion sensor; and detecting if the speech is the self introduction dialogue or when the user requires a hint pertaining to the plurality of people information contents. The step of operating a sleeping state may further comprise determining a content of the speech via the intention interpreter and determining whether the operating system activates the inputting state or the outputting state or remains in the sleeping state. The step of operating an inputting state may comprise extracting a plurality of people information contents from a current environment and the speech; storing and classifying the plurality of people information contents in the database of the memory unit. The step of operating an outputting state may comprise recognizing via the image processor, the human face on the image captured by the camera; extracting the plurality of people information content from the database and displaying at least one of the plurality of people information contents on the screen of the augmented reality glass as long as the human face is captured by the camera of the augmented reality glass. The method according to a further aspect of the present invention, may further comprise activating the inputting state when the intention interpreter detects that the content of the speech is the self-introduction dialogue; the human face detected on the image captured by the camera is not in the database and the augmented reality glass is in the static state. The method according to a further aspect of the present invention, may further comprise activating the outputting state when the speech is the self-introduction dialogue; the human face detected on the image captured by the camera is in the database and the augmented reality glass is in the static state. The method according to a further aspect of the present invention may further comprise activating the outputting state when the intention interpreter detects that the user requires the hint on at least one of the plurality of people infomiation contents based on the content of the speech.

The method according to a further aspect of the present invention may further comprise activating the sleeping state when the human face is not detected on the image captured by the camera. The method according to a further aspect of the present invention may further comprise activating the sleeping state at the end of the inputting state or at the end of the outputting state.

The method according to a further aspect of the present invention may further comprise the step of operating a storage management state using a storage manager. The step of operating a storage management state using a storage manager may comprise activating the storage manager interface via a predefined vocal command, preferably, VMS Storage Manager; wherein when the storage manager interface is activated, displays the plurality of storage information on the upper portion of the storage manager interface, wherein the plurality of storage information comprises a total number of names of the plurality of people information contents stored in the database of the memory unit, and the percentage of free space available in the database of the memory unit, displaying the names of the plurality of people information contents in an alphabetically order on the middle and lower portions of the storage manager interface; managing the database of the memory unit via a plurality of predefined vocal commands, preferably, delete, new name, new job, new age, new time and/or new location.

The method according to a further aspect of the present invention, wherein managing the storage database may comprise selecting by vocally calling the name of the plurality of people information contents; retrieving the relevant plurality of people information contents; displaying on the storage manager interface, the relevant plurality of people infomiation contents, preferably, name, job, age, meeting time and/or meeting location; revising and/or deleting via a plurality of predefined vocal commands, at least one of the displayed plurality of people information contents. The method according to a further aspect of the present invention, when the database of the memory unit is full, displaying an alert information on the screen of the augmented reality glass, preferably, not enough storage.

In accordance with another aspect of the present invention, a computer program may comprise computer executable instructions which when run on the voice-memory system perform the method steps disclosed above.

In accordance with another aspect of the present invention, a wearable device wearable by a user is provided, wherein the wearable device comprises the voice memory system described above.

The system is preferably composed of three function modes, an information exploring, an information storing and an information display. In the information exploring mode, the system is mainly employed to detect whether it contains target information in the speech source. Information will be extracted from speech if it is detected. The system turns to information storing if the target information is extracted. It will be classified and stored in the database of AR glass. Information display is designed to detect situations when the user needs information. Information output takes several forms including voice hint, virtual screen display, etc. In terms of states, the three functional modes are realized by three certain states of the system respectively: an inputting state, a sleeping state and an outputting state. The three states are switched between each other automatically.

In accordance with an aspect of the present invention, the voice-memory system is embedded on AR glasses which may be equipped with a mobile central processing unit 603, a memory 604, a camera 301 , a display device 602 and a plurality of sensors 101. The camera 301 is used to capture a human face and its nearby environment. The mobile processing unit 603 extracts information from texts, recognizes contents from the speech which takes the form of electrical signals transmitted by sensor. The display device 602 serves as an interface for transmitting information to the user when both text and voice hint are employed. In accordance with an aspect of the present invention, the voice-memory system utilizes a series of newly-developed methods. The method comprises: (a) Detecting topic from speech including dialogues and monologues automatically; (b) Searching target named entity from a segmented text; (c) classifying and storing information in a light-weight database; (e) Representing and matching information from a light-weight database; (f) Detecting and Recognizing a human face in a given image or a live video stream; (g) Recognizing speech and transferring vocal signals into texts in an adapted way, using either internet or local speech recognizers. The summary is provided to introduce a selection of concepts in a simplified form that are further described as follows in the Detailed Description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter. Brief introduction of drawings

FIG.l is an illustration of an embodiment of the flowchart of the VMS.

FIG.2 is an illustration of an embodiment of the sleeping state of the VMS.

FIG.3 is an illustration of an embodiment of the inputting state of the VMS.

FIG.4 is an illustration of an embodiment of the outputting state of the VMS.

FIG.5 is an illustration of an embodiment of state flow of the VMS.

FIG.6 is an illustration of an embodiment of the structures of the VMS and the relationship between modules and peripherals of the glass.

FIG.7 is an illustration of an embodiment of the core processor structures of the VMS.

FIG.8 is an illustration of a self-introduction scene

FIG.9 is an illustration of a scene of the two men in Fig 8 meeting again

FIG.10 is an illustration of an embodiment of the VMS storage manager

FIG.l 1 is an illustration of an embodiment of the VMS storage warning information

Detailed description

Embodiments of the present technology will be explained and described with reference to

FIGs. 1 to 9, which in general relates to retrieving information from vocal speeches based on the augmented reality glass. The same components may be designed by the same reference numbers although they are illustrated in different figures. Detailed descriptions of constructions or processes known in the reality may be omitted to avoid obscuring the subject matter of the present invention. The system for implementing the augmented reality environment includes a mobile central processing unit, a display screen, a rear built-in camera, a microphone, a voice sensor, and a position and orientation sensor in the embodiments. A user wears an augmented reality glass with a built-in rear camera and a voice recorder that is switched on. The information retrieval system is run in the background of the operation system of augmented reality glass. The operation system may be confined to Android or Android-based systems. The glass must be connected to the internet and supports GPS. When the user encounters vocal information the system will analyze the vocal signal, extract the information and store it in its database. Afterwards the system will detect the user's intention and determine whether he needs the information and provide the information to the user if necessary. Useful information herein means the personal information (like name and job) in self-introduction dialogues. This type of information is originally in vocal forms. Through the information retrieval system, they will be transformed and offered to the user in text and image form.

The whole system works for self-introduction scenarios, which is defined as a situation where two people who do not know each other, introduce themselves to each other. The personal information may include name, face, job, age, meeting time and location.

Fig. 1 shows the structure of the VMS according to the invention. As illustrated in Fig. 1, the VMS (automatic information retrieval system) is composed of three states, namely the sleeping state 801, the inputting state 803 and the outputting state 804. In the sleeping state 801, the program runs in backstage and there is not much user-system interaction in this state. The aim of the system in this state is to detect whether a self-introduction dialogue occurs or if the user needs hints. As illustrated in Fig. 5, when the result is true, the system will turn to the inputting state (see Fig. 3) where information is detected, extracted and stored in a light-weight database in a section of ROM 601 in the augmented reality glass. If the user needs hint, the system will turn to the outputting state 804 (see Fig. 4) where the main target is to obtain information from the database and to show the information on the screen 602. In the sleeping state 801 (see Fig. 2), the system continues to surveil the environment where it obtains speech through a voice recorder 201, image view through a camera 301 and the user head's attitude information through a motion sensor 101. The three categories of information will be synthesized by an intention interpreter 802, which determines whether the system shall turn to the inputting state 803 or the outputting state 804 or remains in the sleeping state. The speech is recorded by the voice recorder 201 in the augmented reality glass. The vocal signal is processed through a speech processor 202. In the VMS the speech processor 202 is offered with two solutions— a local speech recognitizer interface 605 (see Fig. 6) and a VMS speech recognition interface 207. The VMS speech recognition interface 207 is designed by the Deutsche Telekom and it is employed by default. The VMS speech recognizer 207 (see Fig. 7) employs android speech recognition API 208, where the vocal signal is uploaded to the Google server 701 and the translated content is returned to the client in the form of texts. Therefore, the augmented reality glass must be able to connect to the internet via the internet module 606 (see Fig. 6). The local speech recognizer 605 can be employed as well if it is available in the glass. The translated content is processed by the text classifier belonging to the natural language processor 203. The natural language processor 203 is a self-designed NLP module, which utilizes the open source natural language processing tool kits as well as self-designed algorithms. The text classifier 204 enclosed, is based on Naive Bayesian algorithm and it is trained offline with a corpora which is obtained through a self-designed python crawler on more than 50 English language education websites. The text classifier 204 will return true if the topic of translated content is self-introduction. Image is drawn from each frame of the video stream from the camera. The image is passed through an image processor 302. The image processor 302 will perform face detection and face recognition. The image processor 302 is based on a snapdragon SDK 305 for android. The image processor 302 is run real-time under no-internet conditions. It distinguishes whether a human face appears in the current view and can recognize whether the face matches any face in the database. Multiple faces in the image can also serve as an input to the image processor 302. Another function of the image processor is selecting the face ROI (Region Of Interest) in a correspondent image. The motion sensor 101 measures the motion of the AR glass. It passes a 3d vector of linear acceleration, which means the acceleration along x axis, y axis and z axis. Motion evaluator 102 is a module using the formula of threshold to process the 3d vector, which is capable of judging whether the AR glass is in a static or in a motion state. The intention interpreter 802 collects the information of the above three aspects and employs the following rule to manage the system: if the speech topic is a self-introduction and a face is detected but not in the database and the AR glass is in a static state, the system activates the inputting state 803; if the speech topic is a self-introduction, a face is detected and existed in the database and the AR glass is in a static state, the system activates the outputting state 804; otherwise the system remains in the sleeping state 801.

In the inputting state (see Fig. 3), the mission of the system is to extract useful information from the current environment. Herein the information includes three types— vocal information, facial information and extra information. The vocal information is passed through the speech recognizer and translated into text formed scripts. The scripts are processed by an information extractor 205 in the Natural Language Processor, NLP, 203. The information extractor is based on an OpenNLP Library 206, which is an android developing tool kit for natural language processing tasks. The main components used are listed in the following: sentence detector, tokenizer, pos tagger and chunker. From the vocal signals four types of information will be extracted: name, job, company and age. They will be left blank if they are not dealt with in the dialogue. Facial information is the photo of the person when they are talking. The video frame is also passed through an image processor 302 and a photo comprising a human face will be extracted. Extra information includes the geographical and date information. The GPS 607 must be supported by the AR glass. The information is stored in a section of ROM 601 , and the ROM 601 should be at least 100Mb. The system supports at most 150 persons' infomiation. The system will return to the sleeping state at the end of the inputting state.

In the outputting state, the system extracts information from the database. The image processor will recognize the face in the video frame and then extracts the person's photo with all its related personal information from the database. The information will be shown on the screen as long as the person appears in the user's sight. System will return to the sleeping state at the end of outputting.

In addition, the VMS provides a storage manager 504 for users to manually manage the database. All management to the database is done by vocal commands. Users can call the vocal command "VMS Storage Manager" to start this interface. As is illustrated in Fig. 10, the storage management interface comprises two parts. On the top portion of the interface, the number of stored persons and the percentage of free space are displayed. The names of all stored persons are listed alphabetically below the top portion and users may overwrite the recorded personal infomiation. When the user wants to revise the personal information he only needs to call the person's name and all information of this person will be shown on the virtual screen. A series of vocal commands can be used in this interface like "Delete" to delete this person, "Name...." to overwrite his name. Complete vocal commands used in this interface are defined in the following:

"DELETE delete this person NEW NAME(...)— -revise name NEW JOB(...)— -revise job NEW AGE(... )— -revise age NEW TIME(...)— revise meeting time NEW

LOCATION(...)— revise meeting location SAVE Save the revision

CANCEL— Cancel the revision EXIT-— Exit the storage manager". When the storage runs out, a piece of alert information will be prompted on the screen the moment the user attempts to add a new person: "Not enough storage." The user has to access the storage manager in order to delete some of the previously stored persons if the user wants to continue adding information of new people.

Figure 6 illustrates the composition of the whole system in terms of core processors and their relationship with the peripheral. The power manager 402 is a module to regulate the turning on or off of the system in accordance with the electricity level. It functions as an interface between a battery 401 and the VMS. The system is run in the background and it keeps calling the peripheral devices (camera 301, voice recorder 201 and motion sensor 101) and exploiting the CPU 603 to realize complicate recognition algorithms. When the augmented reality glass is at a low battery level, the system will be turned off automatically to conserve the battery consumption. User can turn the system off manually with a voice command: "VMS, Off'. The power manager continues to monitor the battery level.

Figures 8 to 11 are illustrations of how the system works in a real self-introduction scenario. As clarified above, the system provides three functional modes: the self-introduction mode, the meeting mode and the method mode. In each mode the complete work flow includes the information detection and extraction as well as information prompting, which corresponds to inputting state and outputting state of the system.

Fig. 8 demonstrates a self-introduction scene. The first time person Al and B 1 meets, Al is introducing himself to Bl who is wearing augmented glass A2 with the VMS installed. During the self-introduction process, Al looks at Bl through the augmented reality glass A2. The conversation deals with at least one element of the following attributes: name, job, age. In the virtual screen A3, the following information are displayed: "self-introduction detected... (A4) Information being recorded ....(A5) Information extracted successfully! (A6)". Those hints (A4-A6) imply that the system has extracted information from the conversation of self-introduction scenario.

Fig. 9 shows a scene when person A 1 meets Bl in the future. When A 1 sees the face of Bl through the augmented reality glass A2, the system will detect B 1 automatically and print his personal information on the virtual screen A3. The personal information is listed in the formats of name, job, age, first-meeting time, first meeting venue and personal photo (A7-A11). The attribute will be left blank if that particular type of information did not appear in the first meeting scene.

Fig. 10 illustrates how the system works when the user wants to overwrite the database manually. A new window (CI) will appears when the user vocally calls "VMS storage manager". In the upper part of CI, information regarding number of persons added (C2) and percentage of free space (C3) are displayed. Below is a list containing all names of persons in the database, which are listed alphabetically. If the user calls a name from the list, all recorded information of that name will be displayed on the screen (C5). User may use pre-defined commands to revise or delete information in C6.

Fig. 11 illustrates the prompted warning information when there is not enough space to store personal information. When the storage space runs out, a piece of alert information will be prompted on the screen (C8) the moment, the user adds new persons: "Not enough storage"(C9). The user has to access the storage manager to delete some previously stored persons in order to continue adding new person.

Claims

1. An externally wearable treatment device for medical application comprising:

an augmented reality glass (A2) comprising:

a camera (301) for capturing a live video stream or an image;

a voice recorder (201 ) to record speech concurrently with the camera;

a central processing unit (603) comprising:

a speech processor (202);

an image processor (302);

a natural language processor (203);

wherein the central processing unit (603) is adapted to generate and render the plurality of the people information contents for display on a screen (A3) of the augmented reality glass (A2);

a memory unit (604) for storing a plurality of images captured by the camera (301 ) and the plurality of people information contents;

a display device (602) for displaying a fused virtual and real contents;

wherein the virtual content comprises at least one of the plurality of people information contents.

2. A Voice-memory system, VMS, for assisting a user (Al) to memorize and recall a plurality of people information contents, the system comprises:

an augmented reality glass (A2) comprising:

a camera (301) for capturing a live video stream or an image;

a voice recorder (201 ) to record speech concurrently with the camera;

a central processing unit (603) comprising:

a speech processor (202);

an image processor (302);

a natural language processor (203);

a memory unit (604) for storing a plurality of images captured by the camera (301) and the plurality of people information contents; a display device (602) for displaying a fused virtual and real contents; wherein the virtual content comprises at least one of the plurality of people information contents.

3. The device or system according to claim 1 or 2, further comprising a plurality of sensors

(101) adapted to gather information including position and orientation.

4. The device or system according to any one of claims 1 to 3, wherein an operation system of the central processing unit (603) is an Android-based system (607).

5. The device or system according to any one of claims 1 to 4, wherein the speech processor (202) processes a vocal signal of the speech.

6. The device or system according to any one of claims 1 to 5,

wherein the speech processor (202) comprises a speech recognizer (207), an Android speech recognition API (208) and a local speech recognizer (209).

7. The device or system according to claim 5 or 6,

wherein the Google server (701) is adapted to upload the vocal signal.

8. The device or system according to any one of claims 1 to 7,

wherein the image processor (302) comprises a face recognizer (303), a face detector (304) and a snapdragon Software Development Kit (305).

9. The device or system according to any one of claims 1 to 8,

wherein the image processor (302) is adapted to:

detect whether the human face exists in the image; and

recognize and compare the human face in the image with the images stored in a database of the memory unit (604).

10. The device or system according to any one of claims 1 to 9,

wherein the image processor (302) is adapted to process a plurality of human faces in the image and select a region of interest of the human face in the image.

11. The device or system according to any one of claims 1 to 10,

wherein the image processor (302) is operable without connection to the internet.

12. The device or system according to any one of claims 1 to 11 ,

wherein the natural language processor (203) comprises a text classifier (204) and an information extractor (205);

wherein the information extractor is based on an Open Natural Language Processor Library (206).

13. The device or system according to any one of claims 1 to 12, wherein the natural language processor (203) is adapted to perform:

automatic summarization, preferably producing a readable summary of a chunk of text; discourse Analysis, including identifying discourse structure of a connected text;

Named Entity Recognition, NER, preferably detennining which items in the text map to proper names such as people or places;

parsing, preferably determining the parse tree of a given sentence.

14. The device or system according to any one of claims 1 to 13,

wherein the natural language processor (203) is a self-designed NLP module; which utilizes the open source natural language processing tool kits as well as self-designed algorithms.

15. The device or system according to any one of claims 1 to 14,

wherein the plurality of sensors (101) comprises of at least a voice sensor, a position and orientation sensor and a motion sensor (101).

16. The device or system according to any one of claims 1 to 15, further comprising a motion evaluator (102);

wherein the motion sensor (101) is adapted to measure the motion of augmented reality glass (A2) and

wherein the motion evaluator ( 102) is adapted to judge whether the augmented reality glass (A2) is in a static state or in a motion state.

17. The device or system according to any one of claims 1 to 16, wherein the plurality of people information contents comprises at least one of three categories of information;

wherein the three categories of information comprises of a vocal information, a facial information or an extra information;

wherein the three categories of information are synthesized by an intention interpreter

(802).

18. The device or system according to claim 17, wherein the vocal information is the vocal signal from the speech;

wherein the vocal information is processed by a speech processor (202) and translated into a plurality of text formed scripts;

wherein the plurality of text formed scripts is processed by the text classifier (204) and information extractor (205) in the natural language processor (203).

19. The device or system according to claim 17 or 18,

wherein the vocal information comprises at least four types of information, preferably, name, job, company and age.

20. The device or system according to any one of claims 17 to 19,

wherein the facial information comprises of the human face, preferably a face of the person

(Bl) the user (Al) is talking to.

21. The device or system according to any one of claims 17 to 20,

wherein the extra information comprises of a geographical and a date information.

22. The device or system according to any one of claims 1 to 21 ,

wherein the memory unit (604) is at least 100Mb and stores a maximum of 150 people information contents.

23. The device or system according to any one of claims 1 to 22,

wherein the memory unit (604) comprises a Read-only memory (601) and a Random-access memory (604),

wherein the Read-only memory (601) is a database.

24. The device or system according to any one of claims 1 to 23,

wherein the augmented reality glass (A2) further comprises an information retrieval system (503), and

wherein the information retrieval system (503) is adapted to transform the vocal information into a plurality of text and image form.

25. The device or system according to any one of claims 1 to 24,

wherein the augmented reality glass (A2) is connectable to the internet via an internet module (606) and comprises GPS (607) functionality.

26. The device or system according to any one of claims 1 to 25, further comprises a battery (401) and a power manager (402),

wherein the power manager (402) functions as an interface between the battery (401) and the system;

wherein the power manager (402) regulates the turning on or off of the system in accordance with an electricity level;

wherein the power manager (402) monitors the batter}' level; and

wherein when the battery is at a low level, the power manager (402) turns the system off automatically to conserve the battery consumption.

27. The device or system according to any one of claims 1 to 26, further comprises a storage manager (504).

28. The device or system according to claim 27,

wherein the storage manager (504) comprises a plurality of interfaces,

wherein at least one of the plurality of interfaces displays a plurality of storage information on an upper portion of the interface on the screen (A3) of the augmented reality glass (A2), wherein the plurality of storage information comprises at least:

a total number of the names of the people information contents stored in the database of the memory unit (604), and

a percentage of free space available in the database of the memory unit (604),

wherein at least one of the plurality of interfaces displays the names of the plurality of people information contents in an alphabetically order on the middle and lower portions of the interface on the screen (A3) of the augmented reality glass (A2).

29. A voice-memory method for assisting a user (Al) to memorize and recall a plurality of people information contents in the voice-memory system according to any one of claims 2 to 27;

the method comprising the steps of:

operating a sleeping state (801);

operating an inputting state (803); and

operating an outputting state (804).

30. The method of claim 29, the step of operating a sleeping state (801) comprises:

running an operating system in the background;

wherein the operating system is surveilling the environment by analyzing the speech through the voice recorder (201), the image from the camera (301) and the user (Al) head's attitude information through the motion sensor (101); and

detecting if the speech is the self introduction dialogue or when the user (Al) requires a hint pertaining to the plurality of people information contents.

31. The method of claim 29 or 30, the step of operating a sleeping state (801) further comprises:

determining a content of the speech via the intention interpreter (802) and

determining whether the operating system activates the inputting state (803) or the outputting state (804) or remains in the sleeping state (801).

32. The method of any one of claims 29 to 31 , the step of operating an inputting state (803) comprises:

extracting a plurality of people information contents from a current environment and the speech;

storing and classifying the plurality of people information contents in the database of the memory unit (604).

33. The method of any one of claims 29 to 32, the step of operating an outputting state (804) comprises:

recognizing via the image processor (302), the human face on the image captured by the camera (301); extracting the plurality of people information content from the database and displaying at least one of the plurality of people information contents on the screen (A3) of the augmented reality glass (A2) as long as the human face is captured by the camera (301) of the augmented reality glass (A2).

34. The method of any one of claims 29 to 33, further comprises:

activating the inputting state (803) when the intention interpreter (802) detects that the content of the speech is the self-introduction dialogue; the human face detected on the image captured by the camera (301) is not in the database and the augmented reality glass (A2) is in the static state.

35. The method of any one of claims 29 to 34, further comprises:

activating the outputting state (804) when the speech is the self-introduction dialogue; the human face detected on the image captured by the camera (301) is in the database and the augmented reality glass (A2) is in the static state.

36. The method of any one of claims 29 to 35, further comprises:

activating the outputting state (804) when the intention interpreter (802) detects that the user (Al) requires the hint on at least one of the plurality of people information contents based on the content of the speech.

37. The method of any one of claims 29 to 36, further comprises:

activating the sleeping state (801) when the human face is not detected on the image captured by the camera (301).

38. The method of any one of claims 29 to 37, further comprises:

activating the sleeping state (801) at the end of the inputting state (803) or at the end of the outputting state (804).

39. The method of any one of claims 29 to 38, further comprises the step of operating a storage management state using a storage manager (504).

40. The method of claim 39, wherein the step of operating a storage management state using a storage manager (504) comprises: activating the storage manager interface (504) via a predefined vocal command, preferably, VMS Storage Manager;

wherein when the storage manager interface (504) is activated, displays the plurality of storage information on the upper portion of the storage manager interface (504),

wherein the plurality of storage information comprises

a total number of names of the plurality of people information contents stored in the database of the memory unit (604), and

the percentage of free space available in the database of the memory unit (604), displaying the names of the plurality of people information contents in an alphabetically order on the middle and lower portions of the storage manager interface (504);

managing the database of the memory unit (604) via a plurality of predefined vocal commands, preferably, delete, new name, new job, new age, new time and/or new location.

41. The method of claim 40, wherein managing the storage database comprises:

selecting by vocally calling the name of the plurality of people inforrnation contents; retrieving the relevant plurality of people information contents;

displaying on the storage manager interface (504), the relevant plurality of people information contents, preferably, name, job, age, meeting time and/or meeting location;

revising and/or deleting via a plurality of predefined vocal commands, at least one of the displayed plurality of people information contents; and

saving or cancelling the revision of the displayed plurality of people information contents and/or exiting the storage manager interface via a plurality of predefined vocal commands.

42. The method of any one of claims 29 to 41 ,

wherein when the database of the memory unit is full, displaying an alert information (C 8) on the screen (A3) of the augmented reality glass (A2), preferably, not enough storage (C9).

43. A computer program comprising computer executable instructions which when run on the voice-memory system perform the method steps of any one of the claims 27 to 36.

44. A wearable device wearable by a user, wherein the wearable device comprises the voice memory system according to any one of claims 2 to 26.