WO2016103415A1 - Head-mounted display system and operating method for head-mounted display device - Google Patents
Head-mounted display system and operating method for head-mounted display device
- Publication number: WO2016103415A1
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- character string
- language
- utterance
- mounted display
- head
- Prior art date
Classifications
- G06F3/167: Audio in a user interface, e.g. using voice commands for navigating, audio feedback
- G06F21/32: User authentication using biometric data, e.g. fingerprints, iris scans or voiceprints
- G06F21/84: Protecting input, output or interconnection devices; output devices, e.g. displays or monitors
- G06F40/53: Processing of non-Latin text
- G06F40/58: Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
- G10L15/22: Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L15/26: Speech to text systems
- G10L2015/223: Execution procedure of a spoken command
Definitions
- the present invention relates to a technique for operating a head-mounted display device (hereinafter abbreviated as “HMD device”).
- Patent Document 1 discloses an electronic dictionary comprising a spectacles unit with a camera capable of capturing a character string (words or sentences) within the wearer's field of view, a control unit that outputs the captured character-string image signal to a translation unit via a signal transmission line, a translation unit that performs OCR on the image signal and translates the recognized character string, and a control unit that returns the translation result to the spectacles unit, whose display element presents the result on the display unit (summary excerpt).
- Patent Document 2 describes a system that displays an answer to the content of the other party's remarks.
- Specifically, Patent Document 2 discloses a conversation support device that supports a conversation between a user and another person, comprising a source language expression input unit that accepts a source language expression, including a natural language sentence, in the user's first language; a source language conversion unit that converts the source language expression into another expression in a second language; and a display unit that shows the source language expression, the converted expression, and an answer screen for the other person on the same screen (summary excerpt).
- Using an HMD device as a foreign-language translator or conversation-assistance device has the advantage that, unlike a smartphone or electronic dictionary, no handheld device is needed, so it fits naturally into everyday situations such as traveling or carrying luggage.
- the present invention has been made in view of the above circumstances, and an object thereof is to provide a technique that can further improve the operability of a system using an HMD device.
- To achieve this object, the present invention accepts input of a talker's utterance and outputs voice information; converts the voice information into a character string to generate an utterance character string; refers to specific utterance information that associates at least one program or operation mode with a specific utterance for starting or stopping it; extracts the specific utterance contained in the utterance character string and generates a specific utterance extraction signal indicating the extraction result; and starts or stops the program or operation mode with reference to that signal.
- Hardware configuration diagram showing the HMD device in the third embodiment
- Diagram showing an example in which response character strings of three classifications (common, affirmative, and negative) are displayed as tags
- Diagram showing an example in which response character strings are displayed hierarchically with tags
- Diagram showing an example in which response character strings are displayed in order according to a certain criterion
- FIG. 1 is a perspective view showing an outline of an external configuration example of an HMD device equipped with a start / stop program according to the present embodiment.
- FIG. 2 is a diagram illustrating a hardware configuration of the HMD device.
- As shown in FIG. 1, the HMD system includes an HMD device 1 and an application control device 5 that are formed integrally.
- The HMD device 1 includes a mounting body (main body) 1a that keeps the HMD device 1 mounted on the user's head, a display screen 2 that displays images within the user's field of view, a camera 3 that outputs imaging information, and a microphone 4 that outputs voice information.
- The application control device 5 starts and stops application programs (including the display processing on the display screen 2) and operation modes based on the input imaging information and voice information.
- the wearing body 1a is constituted by a frame of glasses, and the display screen 2 is fitted in the frame and positioned in front of the user's eyes.
- the application control device 5 is attached to a frame of glasses.
- the camera 3 and the microphone 4 are arranged on the front surface of the application control device 5.
- in this embodiment, the application control device 5 is configured integrally with the mounting body 1a.
- alternatively, the application control device 5 may be configured separately from the mounting body 1a and connected by wire via a communication cable, or wirelessly using, for example, Bluetooth (registered trademark).
- when configured integrally, handling of the HMD device 1 is convenient; when configured separately, the application control device 5 is not restricted to a size that can be attached to the frame, which improves the degree of freedom in design.
- in particular, when a storage device for storing various dictionaries is required, the application control device 5 tends to become larger, so a separate configuration is preferred in that case.
- FIG. 2 is a diagram illustrating a hardware configuration of the application control device 5.
- the application control device 5 includes a CPU (Central Processing Unit) 51, a RAM (Random Access Memory) 52, a ROM (Read Only Memory) 53, an HDD (Hard Disk Drive) 54, an I/F 55, and a bus 58.
- the CPU 51, RAM 52, ROM 53, HDD 54, and I / F 55 are connected to each other via a bus 58.
- the ROM 53 and the HDD 54 may be replaced as appropriate with any storage medium that can store programs, such as an SSD (Solid State Drive), which makes it easier to downsize the application control device 5.
- Application control device 5 is connected to HMD device 1 including display screen 2, camera 3, and microphone 4 via I / F 55. Then, a video output signal is output from the application control device 5 to the display screen 2.
- the camera 3 outputs a captured image captured with a line of sight that is substantially the same as that of the user to the application control device 5.
- the microphone 4 collects sound around the user, but may have directivity so as to have higher sensitivity to sound in front of the user.
- FIG. 3 is a block diagram showing a functional configuration of the application control device 5.
- the application control device 5 includes a speaker identification unit 510, a character string generation unit 520, a specific utterance extraction unit 530, a controller 540, and application programs (hereinafter "applications") 1, 2, and 3.
- the application control device 5 includes a user voice information storage unit 511, a voice dictionary storage unit 521, and a specific utterance information storage unit 531.
- the user voice information storage unit 511 stores the voice identification information of the user that is referred to when identifying the user of the HMD device 1.
- the speech dictionary storage unit 521 stores a speech dictionary that associates speech information with phonograms or ideograms.
- the specific utterance information storage unit 531 stores specific utterance information that associates at least one program or operation mode (for example, application 1, application 2, operation mode 1) with the specific utterances for starting and stopping it. In the present embodiment, the specific utterance information also specifies the priority for starting each program or operation mode. The specific utterance information therefore includes activation rule information, and the specific utterance information storage unit 531 also functions as an activation rule information storage unit.
- the microphone 4 outputs the voice information generated by collecting the utterances of the user or the talker to the speaker specifying unit 510.
- the character string generation unit 520 generates a character string composed of phonetic characters (hereinafter "utterance character string") from the voice information and outputs it to the specific utterance extraction unit 530.
- the specific utterance extraction unit 530 performs a specific utterance extraction process for starting and stopping a program or an operation mode. When a specific utterance for activation is extracted, the specific utterance extraction unit 530 generates an activation specific utterance extraction signal indicating the result. In addition, when a specific utterance for stopping is extracted, the specific utterance extraction unit 530 generates a stop specific utterance extraction signal indicating the result.
- the specific utterance extraction unit 530 outputs the activation specific utterance extraction signal and the stop specific utterance extraction signal to the controller (corresponding to the control unit) 540.
- the controller 540 outputs a start signal for starting the program or the operation mode or a stop signal for stopping according to the start specific utterance extraction signal and the stop specific utterance extraction signal.
- FIG. 4 is a flowchart showing a flow of start and stop processing of the HMD device 1 according to the present embodiment.
- FIG. 5 shows an example of the specific utterance information table.
- the microphone 4 collects the utterance and generates voice information, and the speaker specifying unit 510 determines whether or not the speaker is the user (S01). If the speaker is not the user (S01/No), the speaker specifying unit 510 repeats the speaker specifying process without outputting the voice information to the character string generation unit 520. If the speaker is the user (S01/Yes), the speaker specifying unit 510 outputs the voice information to the character string generation unit 520.
- the speaker specifying unit 510 acquires the voice information from the microphone 4 and applies, for example, fast Fourier transform processing to it. It then judges whether the speaker is the user based on the consistency between the resulting frequency analysis and the voice identification information stored in the user voice information storage unit 511, or between the voiceprint of the voice information and that of the voice identification information.
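As an illustration only, the following Python sketch shows one way such a voiceprint check could be organized: an FFT-based spectral fingerprint of the incoming utterance is compared against a fingerprint enrolled by the user. The fingerprinting method and the similarity threshold are assumptions made for the sketch; the patent does not specify them.

```python
import numpy as np

def spectral_fingerprint(samples: np.ndarray, n_fft: int = 2048) -> np.ndarray:
    """Average magnitude spectrum over fixed-size frames (a crude voiceprint)."""
    frames = [samples[i:i + n_fft] for i in range(0, len(samples) - n_fft, n_fft)]
    spectra = [np.abs(np.fft.rfft(f * np.hanning(n_fft))) for f in frames]
    fingerprint = np.mean(spectra, axis=0)
    return fingerprint / (np.linalg.norm(fingerprint) + 1e-12)  # unit length

def is_registered_user(utterance: np.ndarray, enrolled: np.ndarray,
                       threshold: float = 0.85) -> bool:
    """Step S01: pass the utterance on only if it matches the enrolled voiceprint."""
    similarity = float(np.dot(spectral_fingerprint(utterance), enrolled))
    return similarity >= threshold
```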
- the character string generation unit 520 converts the voice information into an utterance character string (S02) and outputs it to the specific utterance extraction unit 530.
- the character string generation unit 520 refers to the voice dictionary and converts the voice information sent from the microphone 4 into an utterance character string made up of phonetic characters.
- the specific utterance extraction unit 530 extracts a specific utterance based on the consistency between the utterance character string and the specific utterance information stored in the specific utterance information storage unit 531 (S03).
- the specific utterance is an utterance associated with each of the starting operation and the stopping operation of each program.
- as shown in FIG. 5, the specific utterance information is data that associates each program name with the activation specific utterance for starting it and the stop specific utterance for stopping it. In the present embodiment, the specific utterance information also defines whether so-called exclusive control is performed while one program is running, that is, whether another program is kept from starting even if its activation specific utterance is extracted. In FIG. 5, the drive assist program is defined with exclusive control "present".
- although programs are used as the example here, when one program includes a plurality of operation modes, a specific utterance may be defined for each operation mode. Furthermore, instead of binary exclusive control, priorities may be set in multiple stages to rank the programs and operation modes to be started and stopped.
- if no specific utterance is extracted (S04/No), the process returns to step S01 and repeats. Because the activation and stop specific utterance extraction signals indicate for which program a start or stop utterance was extracted, the controller 540 can determine the program or operation mode to be started or stopped by referring to these signals.
- when the controller 540 receives an activation or stop specific utterance extraction signal, it outputs a start signal (S08) or a stop signal (S09) to the program or operation mode targeted by the operation. As a result, the target program or operation mode is started (S10) or stopped (S11).
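A minimal sketch of this extract-and-dispatch flow is given below, assuming a simple phrase-match table in the spirit of FIG. 5. The phrase strings, program names, and the dictionary layout are illustrative assumptions, not the concrete implementation of the patent.

```python
from dataclasses import dataclass

@dataclass
class SpecificUtterance:
    start_phrase: str        # activation specific utterance
    stop_phrase: str         # stop specific utterance
    exclusive: bool = False  # exclusive-control flag as in FIG. 5

# Hypothetical table in the spirit of FIG. 5
TABLE = {
    "translation":  SpecificUtterance("start translation", "stop translation"),
    "drive_assist": SpecificUtterance("start drive assist", "stop drive assist",
                                      exclusive=True),
}

running: set[str] = set()

def on_utterance(utterance_string: str) -> None:
    """Steps S03 to S11: extract a specific utterance, then start or stop the target."""
    exclusive_active = any(TABLE[p].exclusive for p in running)
    for program, entry in TABLE.items():
        if entry.stop_phrase in utterance_string and program in running:
            running.discard(program)       # S09/S11: stop signal
        elif entry.start_phrase in utterance_string and program not in running:
            if exclusive_active:
                continue                   # exclusive control blocks other starts
            running.add(program)           # S08/S10: start signal
```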
- as described above, operability is improved because the user only has to speak in order to start and stop the programs and operation modes to be executed on the HMD device 1.
- furthermore, because the start/stop processing is executed only after identifying that the utterance was made by the user, start and stop operations not intended by the user can be prevented even if an activation or stop specific utterance is contained in the speech of a person other than the user.
- FIG. 6 is a block diagram showing a functional configuration of a translation program control apparatus (hereinafter referred to as “translation control apparatus”) according to the second embodiment.
- FIG. 7 is a diagram illustrating an example of the language type information table.
- the HMD device 1a is configured by replacing the application control device 5 of the first embodiment with a translation control device 5a.
- the translation control device 5a includes a language type information storage unit 522, a response character string generation unit 610, a response sentence dictionary storage unit 611, an image processing unit 620, and a display control unit 630.
- the language type information storage unit 522 stores the language type information shown in FIG. 7.
- the language type information defines the user's comprehension ability (input ability) and speech ability (output ability) for each language. Each language is classified into language types according to comprehension ability and speaking ability.
- the language types are: a first language that the user normally uses in conversation; a second language whose characters the user can understand, but at a lower comprehension level than the first language; and a third language with an even lower comprehension level, whose characters the user cannot understand.
- on the speaking side, a fourth language is one the user can speak but with lower speaking ability than the first language, and a fifth language is one with even lower speaking ability that the user cannot speak. For example, Japanese corresponds to the first language, English to the second and fourth languages, and Chinese to the third and fifth languages in both comprehension and speaking ability.
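The language type table can be pictured as a small per-user data structure such as the sketch below; the enum and field names are illustrative assumptions, while the example assignments mirror the Japanese/English/Chinese example above.

```python
from dataclasses import dataclass
from enum import Enum

class Comprehension(Enum):
    FIRST = 1   # everyday conversational language
    SECOND = 2  # can read characters, lower comprehension
    THIRD = 3   # cannot read characters

class Speaking(Enum):
    FIRST = 1   # full speaking ability
    FOURTH = 4  # can speak, lower ability
    FIFTH = 5   # cannot speak

@dataclass
class LanguageType:
    comprehension: Comprehension
    speaking: Speaking

# Example from the text: Japanese as first language, English as second/fourth,
# Chinese as third/fifth in both comprehension and speaking ability.
language_types = {
    "ja": LanguageType(Comprehension.FIRST, Speaking.FIRST),
    "en": LanguageType(Comprehension.SECOND, Speaking.FOURTH),
    "zh": LanguageType(Comprehension.THIRD, Speaking.FIFTH),
}
```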
- the response character string generation unit 610 generates a response sentence (both sentences and words) for the utterance character string acquired from the character string generation unit 520, either by selecting one from the response sentence dictionary stored in the response sentence dictionary storage unit 611 or by generating one based on that dictionary.
- the image processing unit 620 acquires a captured image of the conversation partner from the camera 3 and, based on a characteristic image (a barcode or mark) provided on the HMD apparatus in advance, generates a conversation-partner wearing signal used to determine whether the partner is wearing the same type of HMD apparatus 1 as the user, and outputs it to the controller 540.
- the display control unit 630 displays the utterance character string acquired from the character string generation unit 520 and the response character string acquired from the response character string generation unit 610 on the display screen 2. There are various display modes of the response character string, and the response character string may be displayed as it is or may be displayed using a tag as in the fourth embodiment described later.
- FIG. 8 is a time chart of the start and stop processing of the translation program.
- in the translation control device 5a, the specific utterance extraction unit 530 generates an activation specific utterance extraction signal in step S06.
- in step S07, upon receiving the activation specific utterance extraction signal, the controller 540 transmits an activation signal to each of the response character string generation unit 610, the image processing unit 620, and the display control unit 630.
- each block is then activated; through these operations, the HMD device according to the present embodiment can automatically display a character string related to the conversation and a character string related to the response in accordance with the user's utterance.
- the specific utterance extraction unit 530 detects a specific utterance for stoppage, it sends a stop detection signal notifying the detection to the controller 540.
- triggered by the stop detection signal, the controller 540 sends a stop signal to the character string generation unit 520, the response character string generation unit 610, and the display control unit 630, and stops each block.
- FIG. 9 is a flowchart showing the flow of processing of the translation program according to the second embodiment.
- as described above, in the second embodiment the HMD device determines whether or not the conversation partner is also using an HMD device and automatically switches its generation operation accordingly. Activation of the HMD device is likewise triggered by the extraction of a specific utterance.
- the specific utterance extracted by the specific utterance extraction unit may be at least one of a greeting, a name, and a voiceprint of the utterance spoken in the second language or the third language.
- the microphone 4 again collects an utterance and generates voice information. When the speaker specifying unit 510 determines that the utterance is by a speaker other than the user of the HMD device 1a (S21/Yes), the controller 540 determines whether a conversation-partner use signal is present (S22). If the speaker is the user (S21/No), the device waits for an utterance from the conversation partner.
- as a process for determining the presence or absence of the conversation-partner use signal, there is, for example, a technique that uses the captured image output from the camera 3.
- a barcode or a special mark is attached to the HMD device 1a in advance.
- the image processing unit 620 extracts from the captured image the area where the barcode or mark appears, and performs pattern matching between the extracted area (feature image) and a reference image of the barcode or mark stored in advance.
- the image processing unit 620 outputs the result to the controller 540.
- the controller 540 determines whether the conversation person is wearing the HMD device based on the pattern matching result. In this case, a signal indicating the result of pattern matching used for the controller 540 to determine whether or not the HMD device 1a is attached corresponds to the conversation person use signal.
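As a rough illustration of this check, the sketch below uses OpenCV template matching to look for the known mark in a camera frame. The choice of cv2.matchTemplate and the match threshold are assumptions made for the sketch; the patent only requires some form of pattern matching against a stored reference image.

```python
import cv2

MATCH_THRESHOLD = 0.8  # assumed value; would be tuned to the actual mark and camera

def partner_wears_hmd(frame_gray, reference_mark_gray) -> bool:
    """Produce the conversation-partner wearing signal used in step S22:
    True if the reference barcode/mark is found in the captured image."""
    scores = cv2.matchTemplate(frame_gray, reference_mark_gray,
                               cv2.TM_CCOEFF_NORMED)
    _, max_score, _, _ = cv2.minMaxLoc(scores)
    return max_score >= MATCH_THRESHOLD
```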
- alternatively, the determination can be realized by providing the HMD device 1a with a communication unit 710 (for example, an RFID (Radio Frequency IDentification) tag and detector, or a mutual communication device using Bluetooth (registered trademark)) and having the two devices receive each other's IDs.
- when the controller 540 determines that the conversation partner is using an HMD device (S22/Yes), it outputs to the character string generation unit 520 a first language use signal instructing it to generate the utterance character string in the first language that the user uses in normal conversation, and outputs to the response character string generation unit 610 a stop signal that stops the response character string generation operation.
- based on the transmitted signals, the character string generation unit 520 generates the utterance character string in the user's first language, and the response character string generation unit 610 switches its operation so that generation of the response character string is stopped (S23).
- when the conversation partner is not using an HMD device (S22/No), the language used by the partner is determined (S24).
- in the case of the second language (S24/second language), the character string generation unit 520 generates the utterance character string in the second language (S25).
- in the case of a language other than the second language, that is, the first or third language (S24/first or third language), the character string generation unit 520 switches its operation to generate the utterance character string in the user's first language (S23).
- if the conversation partner's utterance continues for a predetermined time or longer, or the vocabulary used is highly difficult (S26/Yes), the character string generation unit 520 switches to the first language (S23).
- if the partner's utterance is shorter than the predetermined time and the difficulty of the vocabulary used is relatively low (S26/No), generation of the utterance character string continues in the second language.
- the predetermined time and the high-difficulty words are registered in advance.
- next, the response character string generation unit 610 determines the type of language the conversation partner used for the utterance. If it determines the language is the fourth language (S27/fourth language), it generates and displays a response character string in the fourth language (S28). If it determines the partner's utterance is in the fifth language (S27/fifth language), it generates and displays the response as a character string that renders the pronunciation of the fifth language in the first language (S29). For example, if the user's first language is Japanese, the fourth language is English, and the fifth language is Chinese, then when the partner speaks English, a response character string is generated in English; when the partner speaks Chinese, a character string for the Chinese response is generated in katakana or romaji.
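Putting steps S21 to S29 together, the mode-switching logic can be sketched as follows. The function and parameter names are illustrative, and language detection itself is treated as a black box because the patent does not prescribe how it is performed.

```python
def choose_display_languages(partner_uses_hmd: bool, detected_lang: str,
                             first: str = "ja", second: str = "en",
                             fourth: str = "en", fifth: str = "zh"):
    """Return (utterance_lang, response_lang); None for the response
    means response generation is stopped (S23)."""
    if partner_uses_hmd:
        # S22/Yes: transcribe in the user's first language, no recommended responses
        return first, None
    # S24/S25: transcribe in the second language only when the partner uses it
    utterance_lang = second if detected_lang == second else first
    # S27 to S29: choose the language of the recommended response
    if detected_lang == fourth:
        response_lang = fourth                                    # S28
    elif detected_lang == fifth:
        response_lang = f"{fifth} rendered in {first} phonetics"  # S29: katakana/romaji
    else:
        response_lang = first
    return utterance_lang, response_lang
```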
- as described above, the type of language used for the utterance character string and the response character string, that is, the operation mode, can be changed according to the conversation partner's utterance or according to whether the partner is wearing an HMD device. Because the user does not need to perform any operation input to set or change the operation mode, improved operability of the HMD device can be expected.
- in the above description, the utterance character string is generated and displayed in the second or first language; however, when the speaker identification unit 510 detects a plurality of conversation partners and the character string generation unit 520 determines that utterances are being made in a plurality of languages, the utterance character string may be generated in the first language regardless of the above processing.
- in the above description, the presence or absence of a conversation-partner use signal is determined in step S22, but this step is not essential. In that case, the character string generation unit 520 may instead determine in step S22 whether the conversation partner is speaking in the first language.
- the HMD device 1 also operates so as to register, interactively with the user, the user voice information to be stored in the user voice information storage unit. To this end, the controller 540 detects whether this is the user's first use, based on the user's utterance information obtained from the microphone 4 and the pre-registration information held in the user voice information storage unit 511. When the controller 540 detects first use, it controls each block to perform the operations specific to initial registration.
- specifically, the controller 540 first controls the character string generation unit 520 to output a suitable numeric character string together with instruction character strings in a plurality of languages asking the user to read the number aloud in his or her native language. This confirms the user's native language.
- next, control is performed to output an instruction character string and a plurality of options so that the user selects the first language.
- at this time, the character string generation unit 520 numbers the options and outputs a character string instructing the user to answer with a numeric value, so that the user can reply with a number. This establishes the user's first language.
- the second language, the third language, the fourth language, and the fifth language are determined in the same manner.
- next, the controller 540 registers the specific utterances for automatic activation. To do so, it controls the character string generation unit 520 to output character strings for predetermined greetings in the second and fourth languages together with a character string instructing the user to read them aloud. A specific utterance for automatic stop is registered in the same way. A personal name or a nickname in a language other than the user's first language may be added to the greeting character string.
- the controller 540 displays words, short sentences, and long sentences on the display screen 2 to verify the understanding level.
- the user may be instructed to read the displayed character string aloud in the first language, but whether the user understands the displayed string is left to the user's own judgment. Having the user state the proficiency level by utterance, or setting it from the time taken until the response utterance, allows the setting to be completed in a short time.
- in order to determine a character size suitable for the user, the character string generation unit 520 displays a character string used for character-size determination together with instructions on how to answer by utterance; the characters used for the determination are gradually enlarged from the minimum size, and the standard character size is fixed by detecting the user's utterance confirming the size.
- in the above description, the response character string generation unit 610 generates the response character string based on the character string converted by the character string generation unit 520, but the same effect is obtained even if it generates the response character string directly from the voice information obtained from the microphone 4.
- in the above description, pre-registration of information such as the language types and the voice identification information is performed on each individual HMD device, but the present invention is not limited to this.
- pre-registration information, once entered, may be stored in a data storage device such as a server in association with a user ID via a communication device.
- then, even when another HMD device is used for the first time, pre-registration becomes unnecessary because the pre-registration information can be retrieved and downloaded from the server.
- user IDs may be grouped in order to limit the searchable range of pre-registration information.
- in the above description, the character string generation unit 520 generates a character string based on the conversation partner's utterance, but the present invention is not limited to this. For example, when the user speaks into the microphone a specific utterance together with a word to be translated into the first language, the word may be displayed in the first language; or, when the user speaks a specific utterance, a word in the first language, and the desired target language, the word may be displayed in that language.
- when the character string generation unit 520 displays the utterance character string, a full-sentence translation in the first language may be displayed according to the degree of difficulty, or a translation may be displayed for each word.
- the series of operations related to initial setting may be performed as follows: when it is determined, based on the voice information and the voice identification information, that the person who uttered the voice information is not registered as a user, the character string generation unit 520 generates a setting character string used for initial setting, the display control unit 630 displays the setting character string on the display screen 2, and the controller 540 performs initial registration based on the voice information the user utters in response to the setting character string.
- the series of operations related to initial setting includes the registration of specific utterances.
- here, the "setting character string used for initial setting" means a question-format character string asking about the first language the user is fluent in for normal conversation, the second language whose characters the user can understand, the third language whose characters the user cannot understand, the fourth language the user can speak, and the fifth language the user cannot speak; or a character string prompting the user to speak a greeting or a name in a plurality of languages.
- the question-format character string is a question sentence that can be answered with "Yes" or "No", or a question sentence that can be answered with a number, obtained by adding a number to the beginning of each character string.
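A sketch of such a numbered, speech-answerable setup dialogue is given below; the prompt wording, the option list, and the console-style callback stand in for the HMD's display screen and microphone purely for illustration.

```python
QUESTIONS = [
    ("first",  "Which language do you use in everyday conversation?"),
    ("second", "Which language can you read, though less fluently?"),
    ("fourth", "Which language can you speak, though less fluently?"),
]
OPTIONS = ["Japanese", "English", "Chinese"]  # illustrative choices

def run_initial_setup(answer) -> dict:
    """Ask each setup question with numbered options so the user can reply
    with a single number (answer: callable taking a prompt, returning an int)."""
    profile = {}
    for key, question in QUESTIONS:
        numbered = "\n".join(f"{i + 1}. {opt}" for i, opt in enumerate(OPTIONS))
        choice = answer(f"{question}\n{numbered}\nAnswer with a number: ")
        profile[key] = OPTIONS[choice - 1]
    return profile

# Example with keyboard input standing in for speech recognition:
# profile = run_initial_setup(lambda prompt: int(input(prompt)))
```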
- FIG. 10 is an example of a hardware configuration diagram illustrating the HMD device according to the third embodiment.
- the HMD device 1b in FIG. 10 is different in that it obtains utterance information via the communication unit 710.
- the communication unit 710 converts the utterance information of the conversation person from a specific format to voice information, and outputs the converted voice information to the speaker identification unit 510.
- since both head-mounted display devices add their device IDs to the communication format of the communication unit 710, the controller 540 can determine whether or not the conversation partner is using a head-mounted display.
- for the information sent from the communication unit 710, the character string generation unit 520 either outputs the character string information converted by the communication unit 710 as it is, or simplifies the character string information before outputting it.
- similarly, for the information sent from the communication unit 710, the response character string generation unit 610 generates a response character string for the character string information converted by the communication unit 710 and sends it to the display screen 2.
- one of the main features of an HMD device is that it can display character strings, images, and figures superimposed on the scenery in front of the user. For this reason, if the area used to display characters, images, and figures is large, the scenery ahead becomes hard to see. By forming a virtual image, the displayed characters, images, and figures can be made to appear large several meters ahead, but even then the display area is limited. Moreover, when conversing with others in a language other than the one used daily, keeping the text information viewed at one time for translated text and recommended responses to the necessary minimum is easier for the user and often makes the conversation smoother.
- FIG. 11 is a diagram showing an example in which three types of response character strings of common, positive, and negative are displayed as tags.
- FIG. 12 is a diagram illustrating an example in which the display mode of the response sentence spoken by the user is changed in the example of FIG.
- FIG. 13 is a diagram illustrating a display example in which only tag items are displayed.
- FIG. 14 is a diagram illustrating an example of hierarchical display of response character strings using tags.
- FIG. 15 is a diagram illustrating an example in which response character strings are displayed in order according to a certain standard.
- in the present embodiment, the character string generation unit 520 generates the character string so that the number of characters in the displayed string is reduced. To this end, it generates the character string with honorific expressions (polite, humble, and respectful forms) omitted from the content of the conversation. Titles before and after names are also omitted. In addition, subjects, verbs, and nouns are given priority, while adjectives and adverbs are omitted or displayed in smaller characters.
- the character string generation unit 520 converts speech information into an utterance character string, and then performs part-of-speech decomposition processing and syntax analysis processing to generate a character string in which honorific expression is omitted.
- the response character string generation unit 610 selects from the database a plurality of keywords related to the conversation partner's utterance, classifies the selected keywords by a predetermined method, attaches a classification tag to each class, and displays the classes separately; or it arranges and displays the keywords in order based on a predetermined criterion.
- specifically, keywords relevant when the response is "Yes", keywords relevant when it is "No", and keywords common to both are selected from the database (response sentence dictionary).
- the selected keyword is output to the display screen 2 so as to be displayed in an individual area together with tags “Yes”, “No”, and “Common” (see FIG. 11).
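The sketch below illustrates this Yes/No/Common grouping; the miniature keyword dictionary is a stand-in for the response sentence dictionary, whose actual contents the patent does not enumerate.

```python
# Hypothetical miniature response sentence dictionary: keyword -> answers it fits
RESPONSE_DICT = {
    "sounds good": {"yes"}, "of course": {"yes"},
    "sorry": {"no"}, "another time": {"no"},
    "thank you": {"yes", "no"}, "by the way": {"yes", "no"},
}

def tagged_keywords(partner_utterance: str) -> dict[str, list[str]]:
    """Group candidate response keywords under Yes / No / Common tags (FIG. 11)."""
    groups = {"Yes": [], "No": [], "Common": []}
    for keyword, answers in RESPONSE_DICT.items():
        # a full implementation would first filter by relevance to partner_utterance
        if answers == {"yes", "no"}:
            groups["Common"].append(keyword)
        elif "yes" in answers:
            groups["Yes"].append(keyword)
        else:
            groups["No"].append(keyword)
    return groups
```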
- when the user utters one of the displayed keywords, the display mode of that keyword is changed (FIG. 12) to inform the user that the head-mounted display has recognized that the keyword was used. Keywords related to both the conversation partner's utterance and the keyword used by the user are then retrieved from the database and displayed.
- as highlighting methods for the selected keyword, bold type or a larger character size may be used, and its removal from the display may be delayed by a predetermined time relative to the other keywords.
- as a procedure for hiding already-displayed character strings and tags, deletion may start from the character strings or classifications least relevant to the selected keyword.
- the character string of the tag may be displayed in the first language.
- alternatively, only the tags 1301 may be displayed first, and when the user utters a character string displayed as a tag, the color of that tag's character string may be changed and displayed as shown in the figure.
- a procedure may also be used in which the other tags are hidden, and keywords related to the conversation partner's utterance and to the tag used by the user are retrieved from the database and displayed.
- keywords and tags at different hierarchy levels may differ in display-area color or character color.
- in the ordered display of FIG. 15, the response character string generation unit 610 first searches the database for keywords related to the conversation partner's utterance; then, while the user is responding, it searches the database (response sentence dictionary) for keywords related to both the partner's utterance and the user's utterance.
- a display rule may be shown on the tag 1501 as in FIG. 15. Further, point information may be added to used keywords, and the display priority may be determined according to the accumulated points.
- the above is realized by the response character string generation unit 610 extracting the necessary terms from the response sentence dictionary, selecting the tag types, the words to be posted on each tag, and the types of response sentences, and changing the display colors, and by the display control unit 630 executing the processing of arranging and displaying the response sentences within the tags.
- the utterance character string and the response character string are displayed using the tag, but a diagram or a video (moving image) may be displayed.
- the HMD device 1 may be provided with a speaker for outputting the response character string by voice.
- the user may speak the selection result of the response character string and output the selected response character string from the speaker.
- Reference numerals: 1: HMD device, 2: display screen, 3: camera, 4: microphone, 5: application control device
Abstract
Description
The first embodiment is one in which a program or operation mode executed on the HMD device is started and stopped based on an utterance. First, the schematic configuration of the HMD system is described with reference to FIGS. 1 and 2. FIG. 1 is a perspective view showing an overview of an external configuration example of an HMD device equipped with the start/stop program according to the present embodiment. FIG. 2 is a diagram showing the hardware configuration of the HMD device.
The second embodiment uses a translation program as the program of the first embodiment. First, the schematic configuration is described with reference to FIGS. 6 and 7. FIG. 6 is a block diagram showing the functional configuration of the control device for the translation program (hereinafter "translation control device") according to the second embodiment. FIG. 7 is a diagram showing an example of the language type information table.
The third embodiment describes an example of an HMD device that acquires the conversation partner's utterance via a communication device. FIG. 10 is an example of a hardware configuration diagram showing the HMD device in the third embodiment. The HMD device 1b of FIG. 10 differs in that it obtains utterance information via the communication unit 710. The communication unit 710 converts the partner's utterance information from a specific format into voice information and outputs the converted voice information to the speaker identification unit 510.
In general, one of the major features of an HMD device is that it can display character strings, images, and figures superimposed on the scenery in front of the user. For this reason, if the area for displaying characters, images, and figures is large, the scenery ahead becomes hard to see. By forming a virtual image, the displayed characters, images, and figures can be made to appear large several meters ahead, but even then the display area is limited. Moreover, when conversing with others in a language other than the one used daily, keeping the text information viewed at one time for translated text and recommended responses to the necessary minimum is easier for the user and often makes the conversation smoother.
Claims (15)
1. A head-mounted display system comprising: a microphone that accepts input of a talker's utterance and outputs voice information; a character string generation unit that converts the voice information into a character string and generates an utterance character string; a specific utterance information storage unit that stores specific utterance information associating at least one program or operation mode to be started or stopped with a specific utterance for starting or stopping each of those programs and operation modes; a specific utterance extraction unit that refers to the specific utterance information, extracts the specific utterance contained in the utterance character string, and generates a specific utterance extraction signal indicating the extraction result; and a control unit that starts or stops the program or operation mode with reference to the specific utterance extraction signal.
2. The head-mounted display system according to claim 1, further comprising: a user voice information storage unit that stores voice identification information uttered in advance by the user in order to identify the user of the head-mounted display device; and a speaker specifying unit that judges whether the talker is the user based on the consistency between the voice information output from the microphone and the voice identification information, wherein the program or operation mode is started or stopped when the speaker specifying unit judges that the talker is the user.
3. The head-mounted display system according to claim 1, further comprising an activation rule information storage unit that stores activation rule information defining a priority for starting the program or the operation mode, wherein, upon acquiring the specific utterance extraction signal, the control unit starts the program or the operation mode when the activation rule information permits starting it in accordance with the specific utterance extraction signal.
4. The head-mounted display system according to claim 1, further comprising: a conversation dictionary storage unit that stores a conversation dictionary of conversational sentences; a response character string generation unit that refers to the conversation dictionary and selects or generates a response character string corresponding to the utterance character string; a display screen arranged in front of the user's eyes; and a display control unit that performs control for displaying the response character string on the display screen.
5. The head-mounted display system according to claim 4, wherein the character string generation unit selects one of a plurality of languages according to the user's comprehension ability in each language and generates the utterance character string using the selected language, and the response character string generation unit selects one of the plurality of languages according to the user's speaking ability and generates the response character string using the selected language.
6. The head-mounted display system according to claim 5, wherein the plurality of languages include a first language that the user uses in ordinary conversation, a second language whose characters the user can understand but with lower comprehension than the first language, and a third language with still lower comprehension whose characters the user cannot understand, and the specific utterance extracted by the specific utterance extraction unit is at least one of a greeting, a name, and a voiceprint of an utterance spoken in the second or third language.
7. The head-mounted display system according to claim 6, wherein the plurality of languages include a fourth language that the user can speak but with lower speaking ability than the first language, and a fifth language with still lower speaking ability that the user cannot speak, and wherein, when the speaker specifying unit judges that the voice information is from a conversation partner different from the user: the character string generation unit generates the utterance character string using the second language if it judges the voice information to be in the second language, and generates the utterance character string using the first language if it judges the voice information to be in the first or third language; and the response character string generation unit generates the response character string using the fourth language if it judges the voice information to be in the fourth language, and, if it judges the voice information to be in the fifth language, generates a response character string in romaji or katakana corresponding to the pronunciation of the response utterance in the fifth language.
8. The head-mounted display system according to claim 7, wherein, when the voice information is judged to be in the second language, the character string generation unit changes the language of the utterance character string from the second language to the first language according to the length of the conversation partner's utterance or the difficulty of the words being spoken.
9. The head-mounted display system according to claim 7, wherein, upon acquiring a plurality of pieces of voice information in different languages, the character string generation unit generates the utterance character string in the first language for each of the plurality of pieces of voice information.
10. The head-mounted display system according to claim 4, further comprising: a camera that images the user's surroundings and generates a captured image; and an image processing unit that detects, in the captured image, a feature image indicating that another head-mounted display device of the same model as the one worn by the user appears in the image, wherein, when the feature image is detected, the control unit outputs to the character string generation unit a first language use signal instructing it to generate the utterance character string in the first language the user uses in ordinary conversation, and outputs to the response character string generation unit a stop signal that stops the response character string generation operation.
11. The head-mounted display system according to claim 4, further comprising a communication unit that establishes a communication connection with an external device, wherein, when the communication unit establishes communication with another head-mounted display device of the same model as the own device, the control unit outputs to the character string generation unit a first language use signal instructing it to generate the utterance character string in the first language the user uses in ordinary conversation, and outputs to the response character string generation unit a stop signal that stops the response character string generation operation.
12. The head-mounted display system according to claim 2, further comprising a display screen arranged in front of the user's eyes and a display control unit that performs control for displaying the response character string on the display screen, wherein, when the speaker specifying unit judges, based on the voice information and the voice identification information, that the person who uttered the voice information is not registered as a user, the character string generation unit generates a setting character string used for initial setting, the display control unit displays the setting character string on the display screen, and the control unit performs initial setting registration based on the voice information the user utters in response to the setting character string.
13. The head-mounted display system according to claim 1, wherein the character string generation unit generates the utterance character string in basic words, without honorific language or honorific titles.
14. The head-mounted display system according to claim 4, wherein the display control unit displays the utterance character string with adverbs and adjectives omitted, or displays the characters of the adverbs and adjectives in a size smaller than the standard character size.
15. An operating method for a head-mounted display device, comprising: accepting input of a talker's utterance and outputting voice information; converting the voice information into a character string and generating an utterance character string; referring to specific utterance information associating at least one program or operation mode to be started or stopped with a specific utterance for starting or stopping each of those programs and operation modes, extracting the specific utterance contained in the utterance character string, and generating a specific utterance extraction signal indicating the extraction result; and starting or stopping the program or operation mode with reference to the specific utterance extraction signal.
Priority Applications (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/JP2014/084372 WO2016103415A1 (ja) | 2014-12-25 | 2014-12-25 | Head-mounted display system and operating method for head-mounted display device |
CN201480083885.4A CN107003823B (zh) | 2014-12-25 | 2014-12-25 | Head-mounted display device and operating method therefor |
JP2016565770A JP6392374B2 (ja) | 2014-12-25 | 2014-12-25 | Head-mounted display system and operating method for head-mounted display device |
US15/538,830 US10613826B2 (en) | 2014-12-25 | 2014-12-25 | Head-mounted display system and operating method for head-mounted display device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/JP2014/084372 WO2016103415A1 (ja) | 2014-12-25 | 2014-12-25 | Head-mounted display system and operating method for head-mounted display device |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2016103415A1 true WO2016103415A1 (ja) | 2016-06-30 |
Family
ID=56149508
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2014/084372 WO2016103415A1 (ja) | 2014-12-25 | 2014-12-25 | Head-mounted display system and operating method for head-mounted display device |
Country Status (4)
Country | Link |
---|---|
US (1) | US10613826B2 (ja) |
JP (1) | JP6392374B2 (ja) |
CN (1) | CN107003823B (ja) |
WO (1) | WO2016103415A1 (ja) |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP6460286B2 (ja) * | 2016-08-25 | 2019-01-30 | ソニー株式会社 | Information presentation device and information presentation method |
US20190333273A1 (en) * | 2018-04-25 | 2019-10-31 | Igt | Augmented reality systems and methods for assisting gaming environment operations |
CN110459211B (zh) * | 2018-05-07 | 2023-06-23 | 阿里巴巴集团控股有限公司 | Human-machine dialogue method, client, electronic device, and storage medium |
CN110874201B (zh) * | 2018-08-29 | 2023-06-23 | 斑马智行网络(香港)有限公司 | Interaction method, device, storage medium, and operating system |
JP7196122B2 (ja) * | 2020-02-18 | 2022-12-26 | 株式会社東芝 | Interface providing apparatus, interface providing method, and program |
Family Cites Families (30)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6415258B1 (en) * | 1999-10-06 | 2002-07-02 | Microsoft Corporation | Background audio recovery system |
US7328409B2 (en) * | 2003-04-17 | 2008-02-05 | International Business Machines Corporation | Method, system, and computer program product for user customization of menu items |
JP2005222316A (ja) | 2004-02-05 | 2005-08-18 | Toshiba Corp | Conversation support device, conference support system, reception work support system, and program |
US7552053B2 (en) * | 2005-08-22 | 2009-06-23 | International Business Machines Corporation | Techniques for aiding speech-to-speech translation |
JP4640046B2 (ja) * | 2005-08-30 | 2011-03-02 | 株式会社日立製作所 | Digital content playback apparatus |
JP2007280163A (ja) | 2006-04-10 | 2007-10-25 | Nikon Corp | Electronic dictionary |
US8230332B2 (en) * | 2006-08-30 | 2012-07-24 | Compsci Resources, Llc | Interactive user interface for converting unstructured documents |
US20080082316A1 (en) * | 2006-09-30 | 2008-04-03 | Ms. Chun Yu Tsui | Method and System for Generating, Rating, and Storing a Pronunciation Corpus |
US8909532B2 (en) * | 2007-03-23 | 2014-12-09 | Nuance Communications, Inc. | Supporting multi-lingual user interaction with a multimodal application |
US9734858B2 (en) * | 2008-06-08 | 2017-08-15 | Utsunomiya University | Optical information recording/reproduction method and device |
US9111538B2 (en) * | 2009-09-30 | 2015-08-18 | T-Mobile Usa, Inc. | Genius button secondary commands |
JP2013521576A (ja) * | 2010-02-28 | 2013-06-10 | オスターハウト グループ インコーポレイテッド | Local advertising content on an interactive head-mounted eyepiece |
US8514263B2 (en) * | 2010-05-12 | 2013-08-20 | Blue Jeans Network, Inc. | Systems and methods for scalable distributed global infrastructure for real-time multimedia communication |
JP5124001B2 (ja) * | 2010-09-08 | 2013-01-23 | シャープ株式会社 | Translation apparatus, translation method, computer program, and recording medium |
US9122307B2 (en) * | 2010-09-20 | 2015-09-01 | Kopin Corporation | Advanced remote control of host application using motion and voice commands |
US9098488B2 (en) * | 2011-04-03 | 2015-08-04 | Microsoft Technology Licensing, Llc | Translation of multilingual embedded phrases |
US20120310622A1 (en) * | 2011-06-02 | 2012-12-06 | Ortsbo, Inc. | Inter-language Communication Devices and Methods |
US20130021374A1 (en) * | 2011-07-20 | 2013-01-24 | Google Inc. | Manipulating And Displaying An Image On A Wearable Computing System |
EP2860726B1 (en) * | 2011-12-30 | 2017-12-06 | Samsung Electronics Co., Ltd | Electronic apparatus and method of controlling electronic apparatus |
KR20130133629A (ko) * | 2012-05-29 | 2013-12-09 | 삼성전자주식회사 | Apparatus and method for executing voice commands in an electronic device |
ES2898981T3 (es) * | 2012-08-09 | 2022-03-09 | Tobii Ab | Rapid activation in a gaze-tracking system |
US8543834B1 (en) * | 2012-09-10 | 2013-09-24 | Google Inc. | Voice authentication and command |
US8761574B2 (en) * | 2012-10-04 | 2014-06-24 | Sony Corporation | Method and system for assisting language learning |
US20150199908A1 (en) * | 2013-02-08 | 2015-07-16 | Google Inc. | Translating content for learning a language |
US9262405B1 (en) * | 2013-02-28 | 2016-02-16 | Google Inc. | Systems and methods of serving a content item to a user in a specific language |
WO2014199602A1 (ja) * | 2013-06-10 | 2014-12-18 | パナソニック インテレクチュアル プロパティ コーポレーション オブ アメリカ | Speaker identification method, speaker identification device, and information management method |
CN103593051B (zh) * | 2013-11-11 | 2017-02-15 | 百度在线网络技术(北京)有限公司 | Head-mounted display device |
US9541996B1 (en) * | 2014-02-28 | 2017-01-10 | Google Inc. | Image-recognition based game |
US9324065B2 (en) * | 2014-06-11 | 2016-04-26 | Square, Inc. | Determining languages for a multilingual interface |
US9444773B2 (en) * | 2014-07-31 | 2016-09-13 | Mimecast North America, Inc. | Automatic translator identification |
- 2014-12-25: CN application CN201480083885.4A filed; granted as patent CN107003823B (active)
- 2014-12-25: PCT application PCT/JP2014/084372 filed (published as WO2016103415A1)
- 2014-12-25: US application US15/538,830 filed; granted as patent US10613826B2 (active)
- 2014-12-25: JP application JP2016565770A filed; granted as patent JP6392374B2 (active)
Patent Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH0981184A (ja) * | 1995-09-12 | 1997-03-28 | Toshiba Corp | Dialogue support device |
JPH1020883A (ja) * | 1996-07-02 | 1998-01-23 | Fujitsu Ltd | User authentication device |
JP2002507298A (ja) * | 1997-06-27 | 2002-03-05 | ルノー・アンド・オスピー・スピーチ・プロダクツ・ナームローゼ・ベンノートシャープ | Access-control computer system with automatic speech recognition |
JP2002244842A (ja) * | 2001-02-21 | 2002-08-30 | Japan Science & Technology Corp | Speech interpretation system and speech interpretation program |
JP2005031150A (ja) * | 2003-07-07 | 2005-02-03 | Canon Inc | Speech processing apparatus and method |
JP2014164537A (ja) * | 2013-02-26 | 2014-09-08 | Yasuaki Iwai | Virtual reality service providing system and virtual reality service providing method |
JP2014203454A (ja) * | 2013-04-02 | 2014-10-27 | 三星電子株式会社Samsung Electronics Co.,Ltd. | Electronic device and data processing method thereof |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2018105373A1 (ja) * | 2016-12-05 | 2018-06-14 | ソニー株式会社 | Information processing device, information processing method, and information processing system |
JPWO2018105373A1 (ja) * | 2016-12-05 | 2019-10-24 | ソニー株式会社 | Information processing device, information processing method, and information processing system |
US20200075015A1 (en) | 2016-12-05 | 2020-03-05 | Sony Corporation | Information processing device, information processing method, and information processing system |
US11189289B2 (en) | 2016-12-05 | 2021-11-30 | Sony Corporation | Information processing device, information processing method, and information processing system |
JPWO2018185830A1 (ja) * | 2017-04-04 | 2019-12-26 | 株式会社オプティム | Information processing system, information processing method, wearable terminal, and program |
JP2020194517A (ja) * | 2019-05-21 | 2020-12-03 | 雄史 高田 | Translation system and translation system set |
TWI816057B (zh) * | 2020-10-14 | 2023-09-21 | 財團法人資訊工業策進會 | Virtual-real image fusion method, virtual-real image fusion system, and non-transitory computer-readable medium |
KR20220161094A (ko) * | 2021-05-28 | 2022-12-06 | 주식회사 피앤씨솔루션 | Augmented reality glasses device with voice command translation in an offline environment |
KR102602513B1 (ko) * | 2021-05-28 | 2023-11-16 | 주식회사 피앤씨솔루션 | Augmented reality glasses device with voice command translation in an offline environment |
Also Published As
Publication number | Publication date |
---|---|
JP6392374B2 (ja) | 2018-09-19 |
JPWO2016103415A1 (ja) | 2017-11-09 |
US10613826B2 (en) | 2020-04-07 |
CN107003823A (zh) | 2017-08-01 |
CN107003823B (zh) | 2020-02-07 |
US20180011687A1 (en) | 2018-01-11 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
JP6392374B2 (ja) | Head-mounted display system and operating method for head-mounted display device | |
US9640181B2 (en) | Text editing with gesture control and natural speech | |
KR102002979B1 (ko) | Leveraging head-mounted displays to enable person-to-person interactions | |
KR101777807B1 (ko) | Sign language translator, system and method | |
US6377925B1 (en) | Electronic translator for assisting communications | |
KR20200059054A (ko) | Electronic device for processing user utterance and control method therefor | |
KR20210137118A (ko) | System and method for a context-rich attentive memory network with global and local encoding for dialogue breakdown detection | |
CN109543021B (zh) | Story data processing method and system for intelligent robots | |
EP3550449A1 (en) | Search method and electronic device using the method | |
KR20180116726A (ko) | Voice data processing method and electronic device supporting the same | |
Alkhalifa et al. | Enssat: wearable technology application for the deaf and hard of hearing | |
Priya et al. | Indian and English language to sign language translator-an automated portable two way communicator for bridging normal and deprived ones | |
JP2002244842A (ja) | Speech interpretation system and speech interpretation program | |
KR100949353B1 (ko) | Communication assistance device for people with language disabilities | |
JP4079275B2 (ja) | Conversation support device | |
JP2019082981A (ja) | Cross-language communication support device and system | |
Goetze et al. | Multimodal human-machine interaction for service robots in home-care environments | |
JP7468360B2 (ja) | Information processing apparatus and information processing method | |
JP2006301967A (ja) | Conversation support device | |
JP2002244841A (ja) | Speech display system and speech display program | |
US11657814B2 (en) | Techniques for dynamic auditory phrase completion | |
US20240347045A1 (en) | Information processing device, information processing method, and program | |
JP6509308B1 (ja) | Speech recognition device and system | |
US20240119930A1 (en) | Artificial intelligence device and operating method thereof | |
KR20210144443A (ko) | Text output method in artificial intelligence virtual assistant service and electronic device supporting the same | |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 14909022; Country of ref document: EP; Kind code of ref document: A1 |
| ENP | Entry into the national phase | Ref document number: 2016565770; Country of ref document: JP; Kind code of ref document: A |
| WWE | Wipo information: entry into national phase | Ref document number: 15538830; Country of ref document: US |
| NENP | Non-entry into the national phase | Ref country code: DE |
| 122 | Ep: pct application non-entry in european phase | Ref document number: 14909022; Country of ref document: EP; Kind code of ref document: A1 |