WO2018061743A1

WO2018061743A1 - Wearable terminal

Info

Publication number: WO2018061743A1
Application number: PCT/JP2017/032781
Authority: WO
Inventors: 軌行石井; 実矢口
Original assignee: コニカミノルタ株式会社
Priority date: 2016-09-28
Filing date: 2017-09-12
Publication date: 2018-04-05

Abstract

Provided is a wearable terminal capable of achieving a suitable action while compensating for the instability of voice recognition input by detecting a user operation. When a user's voice input to a voice input unit and detection of the user's operation by an operation detection unit take place within a predetermined time interval, a control unit of the wearable terminal compares a word resulting from conversion by a voice interpretation unit with the user's operation detected by the operation detection unit, and determines an action corresponding to the word if the control unit determines that the word and the user's operation have a predetermined relationship.

Description

Wearable terminal

The present invention relates to a wearable terminal capable of voice recognition.

In recent years, wearable terminals such as head-mounted displays, which have been developed rapidly, have many devices that can transmit information even when the user's hands are full, thereby improving work efficiency. There is an advantage that you can. Therefore, it can be said that hands-free input without using a hand is more suitable for a wearable terminal than manual input such as button operation in order to maximize the advantages. One type of hands-free input has been developed that uses a voice recognition function, but a head-mounted display can be used even under conditions where both hands of the user are blocked. It is said that the affinity is high.

However, in the case of the input method using the voice recognition function, not only the user's appropriate utterance but also the noise unrelated to the utterance and the user's monologue etc. are input via the microphone etc. As a result, there is a risk of causing an unintended operation by the user. In particular, utterances of short words such as “Yes” are relatively difficult to distinguish from noise by the speech recognition function.

In response to this, an input method based on gesture detection has been developed, in which a user's hand movement (gesture) is detected by a camera or the like and input is performed. However, even in the input method based on gesture detection, an erroneous input may be caused by an ambiguous gesture of the user, which may lead to an operation unintended by the user.

Patent Document 1 discloses a head-mounted display that can be worn on a user's head, detects an user's specific action, receives an instruction from the user, and an instruction input by the instruction input unit And a control means for performing a specific operation of the head-mounted display, and the instruction input means is a head-mounted display (hereinafter referred to as HMD) having a motion detection function for detecting the movement of the user's head. ) Is disclosed. Further, Patent Document 1 also includes a reference regarding speech recognition.

JP 2004-233909 A

Here, Patent Document 1 states that “a specific operation of the HMD may be performed according to the order of user actions. That is, two or more user actions are performed at different timings or in parallel. There may be a description that the HMD may be made to perform a specific operation of the HMD ”, and“ when the head movement and voice input are made simultaneously, the HMD performs a specific operation. You may make it do ". That is, Patent Document 1 mentions that a sensor and sound are individually detected and different operations are performed in accordance with the detection. However, if the voice recognition result is inappropriate, or if the sensor input is inappropriate, there still remains a possibility of causing a malfunction. That is, Patent Document 1 does not disclose or suggest that the speech recognition result and the sensor input are collated to determine whether or not the input is appropriate, and if the input is appropriate, the corresponding operation is determined and executed. .

The present invention has been made in view of the above circumstances, and provides a wearable terminal capable of realizing an appropriate action while compensating for input instability due to voice recognition by detecting a user's operation. Objective.

In order to achieve at least one of the above-described objects, a wearable terminal reflecting one aspect of the present invention is a wearable terminal used by being worn on a user's body,
A voice input unit for inputting the voice of the user;
A voice decoding unit that converts the voice of the user input by the voice input unit into a word;
An operation detection unit for detecting the operation of the user;
The words converted by the speech decoding unit when the input of the user's speech by the speech input unit and the detection of the user's motion by the motion detection unit occur within a predetermined time interval, and the motion detection And a control unit that determines an action corresponding to the word when it is determined that a predetermined relationship is established between the user's movements detected by the unit.

According to the present invention, it is possible to provide a wearable terminal capable of realizing an appropriate action while compensating for input instability due to voice recognition by detecting a user's action.

It is a perspective view of the head mounted display (HMD) concerning this embodiment. It is the figure which looked at HMD from the front. It is the figure which looked at HMD from the upper part. It is a block diagram of the main circuit of HMD. It is a front view when a user wears HMD. It is a figure which shows the state which the user US turned to the left, with HMD100 mounted | worn. It is a figure which shows the state which the user US faced the right with mounting | wearing with HMD100. It is a figure showing the state where user US who looked from the side turned up with wearing HMD100. It is a figure which shows the state which the user US seen from the side turned down, with HMD100 mounted | worn. It is an example of table TBL1 memorize | stored in RAM126. 5 is a flowchart illustrating an interrupt routine that is repeatedly executed in a control processing unit 121. 5 is a flowchart illustrating an interrupt routine that is repeatedly executed in a control processing unit 121. It is a figure which shows the example of a display which requests | requires confirmation in the image display part 104B visually recognized by the user US. It is a figure which shows the example in which 23 pages of all 104 pages of a document were displayed in the image display part 104B visually recognized by the user US. It is a figure which shows the state in which user US puts hand HD in front of the face with HMD100 mounted. It is a figure which shows the state in which the user US left the hand HD in front of the face while wearing the HMD100. It is a figure which shows the state in which user US puts hand HD in front of the face with wearing HMD100. It is a figure which shows the state in which user US put down hand HD in front of the face with HMD100 wearing. For example, it is an example of the table TBL2 stored in the RAM 126.

Embodiments of the present invention will be described below with reference to the drawings. FIG. 1 is a perspective view of a head mounted display (hereinafter, HMD) 100 which is a wearable terminal according to the present embodiment. FIG. 2 is a front view of the HMD 100 according to the present embodiment. FIG. 3 is a view of the HMD 100 according to the present embodiment as viewed from above. Hereinafter, the right side and the left side of the HMD 100 refer to the right side and the left side for the user wearing the HMD 100.

As shown in FIGS. 1 to 3, the HMD 100 of this embodiment has a frame 101 as a support member. A frame 101 that is U-shaped when viewed from above has a front part 101a to which two spectacle lenses 102 are attached, and

side parts

101b and 101c extending rearward from both ends of the front part 101a. The two spectacle lenses 102 attached to the frame 101 may or may not have refractive power.

A cylindrical main body 103 as a support member is fixed to the front portion 101a of the frame 101 on the upper side of the spectacle lens 102 on the right side (which may be on the left side depending on the user's dominant eye). The main body 103 is provided with a display unit 104. A display control unit 104DR (see FIG. 4 described later) that controls display of the display unit 104 based on an instruction from a control processing unit (control unit) 121 described later is disposed in the main body 103. If necessary, a display unit may be arranged in front of both eyes.

The display unit 104 includes an image forming unit (not shown) housed in the main body unit 103 and an image display unit 104B. The image display unit 104B, which is a so-called see-through type display member, has a generally plate shape that is disposed so as to extend downward from the main body unit 103 and to extend in parallel with one eyeglass lens 102 (see FIG. 1). is there. Based on the image data input from the display control unit 104DR to the image forming unit, the image display unit 104B displays a color image on the image display unit 104B using the image light that is modulated for each pixel and emitted. On the other hand, since the image display unit 104B transmits almost all the external light, the user can observe an external image (real image) through these. Therefore, the virtual image of the image displayed on the image display unit 104B is observed while overlapping a part of the external image. In this way, the user of the HMD 100 can simultaneously observe the image provided via the image display unit 104B and the external image. Note that when the display unit 104 is in the non-display state, the image display unit 104B is transparent, and only the external image can be observed.

Further, in FIGS. 1 and 2, on the front surface of the main body 103, the proximity sensor 105 disposed near the center of the frame 101, the lens 106 a of the camera 106 disposed near the side of the frame 101, and the proximity sensor 105. And the illuminance sensor 112 disposed between the lens 106a and the lens 106a so as to face each other.

The proximity sensor 105 exists in a detection region in the proximity range in front of the detection surface of the proximity sensor in order to detect that an object, for example, a part of a human body (such as a hand or a finger) is close to the user's eyes. It has a function of detecting whether or not the signal is output and outputting a signal. The proximity range may be set as appropriate according to the operator's characteristics and preferences. For example, the proximity range from the detection surface of the proximity sensor may be within a range of 200 mm. If the distance from the proximity sensor is 200 mm or less, the user can easily put the palm and fingers into and out of the user's field of view with the arm bent, so that the user can easily operate with gestures using the hands and fingers. It is also preferable because it reduces the risk of erroneous detection of a human body or furniture other than the user.

1 and 2, the right sub-body portion 107 is attached to the right side portion 101b of the frame 101, and the left sub-body portion 108 is attached to the left side portion 101c of the frame 101. The right sub-main body portion 107 and the left sub-main body portion 108 have an elongated plate shape, and have elongated

protrusions

107a and 108a on the inner side, respectively. By engaging the elongated protrusion 107 a with the elongated hole 101 d of the side portion 101 b of the frame 101, the right sub-body portion 107 is attached to the frame 101 in a positioned state, and the elongated protrusion 108 a is attached to the side of the frame 101. By engaging with the long hole 101e of the portion 101c, the left sub-main body portion 108 is attached to the frame 101 in a positioned state.

In the right sub-body portion 107, there are a geomagnetic sensor 109 (see FIG. 4 described later) for detecting geomagnetism, and an angular velocity sensor 110B and an acceleration sensor 110A (see FIG. 4 described later) that generate an output corresponding to the posture. The left sub-main unit 108 is provided with a speaker / earphone 111C and a microphone 111B (see FIG. 4 described later). The main main body 103 and the right sub main body 107 are connected so as to be able to transmit signals through a wiring HS, and the main main body 103 and the left sub main body 108 are connected so as to be able to transmit signals through a wiring (not shown). Yes. As schematically illustrated in FIG. 3, the right sub-main body 107 is connected to the control unit CTU via a cord CD extending from the rear end. A 6-axis sensor in which an angular velocity sensor and an acceleration sensor are integrated may be used. Furthermore, the HMD can be operated by sound based on an output signal generated from the microphone 111B according to the input sound. Further, the main main body 103 and the left sub main body 108 may be configured to be wirelessly connected. However, the provision of the color temperature sensor 113 and the temperature sensor 114 is optional. The position where the microphone 111B is provided is arbitrary, but is preferably a position suitable for recording the voice spoken by the user US.

FIG. 4 is a block diagram of main circuits of the HMD 100. The control unit CTU generates a control signal for the display unit 104 and other functional devices, a control processing unit 121, an operation unit 122, a GPS receiving unit 123 that receives radio waves from GPS satellites, and external and data. A communication unit 124 that exchanges programs, a ROM 125 that stores programs and the like, a RAM 126 that stores image data and the like, a power supply circuit 130 that converts the voltage applied from the battery 127 into appropriate voltages for each unit, and an SSD And a storage device 129 such as a flash memory and a voice recognition unit 111E. Although the control processor 121 can use an application processor used in a smartphone or the like, the type of the control processor 121 is not limited. For example, if an application processor includes hardware necessary for image processing such as GPU or Codec as a standard, it can be said that the processor is suitable for a small HMD.

Further, when the light receiving unit of the proximity sensor 105 detects invisible light as detection light emitted from the human body, the signal is input from the proximity sensor 105 and the ambient brightness is detected. A signal from the illuminance sensor 112 is input. Further, the control processing unit 121 controls image display on the display unit 104 via the display control unit 104DR.

The control processing unit 121 receives power from the power supply circuit 130, operates according to a program stored in at least one of the ROM 124 and the storage device 129, and receives an image from the camera 106 according to an operation input such as power-on from the operation unit 122. Data can be input and stored in the RAM 126, and can be communicated with the outside via the communication unit 124 as necessary.

The microphone 111B collects the voice spoken by the user US, converts it into a signal, and inputs the signal to the voice processing unit 111D. The voice processing unit 111D processes the signal output from the microphone 111B and controls it as a voice signal. The voice recognition unit 111E outputs the signal to the voice recognition unit 111E of the CTU, analyzes the voice signal output from the voice processing unit 111D, converts it into a word, and inputs the information to the control processing unit 121. It has become. Here, the microphone 111B and the voice processing unit 111D constitute a voice input unit, and the voice recognition unit 111E constitutes a voice decoding unit, but the microphone 111B is externally attached and receives a signal via a pin jack or the like. In such a case, only the voice processing unit 111D may constitute the voice input unit.

Next, an example in which the acceleration sensor 110A that can detect the movement of the head of the user US is used as the motion detection unit that detects the motion of the user will be described. Here, the “user operation” detected by the operation detection unit is an operation performed by the user in a non-contact manner on the HMD 100. FIG. 5 is a front view when the user US wears the HMD 100 of the present embodiment. FIG. 6 is a diagram illustrating a state in which the user US is facing left while wearing the HMD 100, and FIG. 7 is a diagram illustrating a state in which the user US is facing right while wearing the HMD 100. Further, FIG. 8 is a diagram showing a state in which the user US viewed from the side is facing upward while wearing the HMD 100, and FIG. It is a figure which shows the state which faced.

FIG. 10 is an example of the table TBL1 stored in the RAM 126, for example. Here, when the control processing unit 121 receives the signal output from the acceleration sensor 110A, based on the signal, the head of the user US faces upward as shown in FIG. If it is determined that the user has not headed, the flag is set to “up” and the head of the user US turns downward as shown in FIG. When the head of the user US turns to the right as shown in FIG. 7 and then determines that the head of the user US has not turned to the left, the flag is set to “right” and the head of the user US If it is determined that the user has not turned to the right after turning to the left as shown in FIG. 8, the flag is set to “left” and the head of the user US is moving up and down between FIGS. If it is determined, the flag is set to “up and down” and the head of the user US is shown in FIG. If it is determined to be moving to the left and right in between, the flag and the "left and right". When the above flag is set, the control processing unit 121 executes control (see FIG. 11) described later, assuming that a predetermined sensor input has been made. The types of flags are not limited to the above.

In addition, as a result of the voice recognition performed by the voice recognition unit 111E, the control processing unit 121 determines that “up”, “done”, “migi”, “hidari”, “yes”, “no”, “page”, “on” If it is determined that eight types of words “page” have been input, the control (see FIG. 12) described later is executed assuming that the prescribed voice recognition has been performed. The word is not limited to the above, but for example, the user swings his head vertically as an action corresponding to “Yes”, and the user swings his head horizontally as an action corresponding to “No”. It is preferable to have a meaning related to the operation of the US because it becomes a natural operation. In the table TBL1, it is assumed that combinations of flags and words whose action contents are defined in the columns are effectively associated with each other, and therefore a predetermined relationship is established between the two. On the other hand, it is assumed that there is no correspondence relationship between the combinations marked with “X” in the column, and a predetermined relationship is not established between them. Such association between the sensor input and the voice recognition result can be arbitrarily performed. The next action can also be set arbitrarily. The contents of the table can be arbitrarily changed. In this case, the contents can be input while operating the HMD 100, or the contents adjusted by an external PC or the like can be downloaded by wire or wirelessly.

11 and 12 are flowcharts showing an interrupt routine that is repeatedly executed in the control processing unit 121. FIG. When the specified sensor input is performed before the specified voice recognition, the control of the flowchart of FIG. 11 is executed. When the specified voice recognition is performed before the specified sensor input, the control of the flowchart of FIG. Executed.

First, in the interrupt routine of FIG. 11, when the control processing unit 121 determines in step S101 that the specified sensor input is not performed (determination NO), it immediately exits the interrupt routine, but determines that the specified sensor input is performed. If so (determination YES), the control processing unit 121 resets and starts the built-in timer in subsequent step S102.

Further, in step S103, the control processing unit 121 determines whether or not the user's utterance input and the prescribed voice recognition have been performed. Here, when it is determined that the user's speech input and the prescribed voice recognition are not performed, in step S104, the control processing unit 121 determines whether or not the time count of the built-in timer exceeds 1 second. If the time count of the built-in timer does not exceed 1 second, the flow returns to step S103 to wait for the user's speech input and the prescribed voice recognition. On the other hand, if the time of the built-in timer exceeds 1 second before the prescribed voice recognition is performed, the interrupt routine is immediately exited. In other words, when a specified sensor input (detection) is made and a user's US utterance is input within a predetermined time (within 1 second in this case) and a predetermined voice recognition is made, both inputs are within a predetermined time interval. In other cases, it is determined that the inputs of both have not occurred within a predetermined time interval. The predetermined time is not limited to 1 second, it may be fixed or variable, and it is desirable that it can be adjusted according to the characteristics of the device.

On the other hand, if it is determined that the specified voice recognition has been made within 1 second after the specified sensor input is made (YES in step S103), the control processing unit 121 stores the table TBL1 stored in the RAM 126 in step S105. With reference, the flag based on the sensor input is compared with the word of the voice recognition result to determine whether or not a predetermined relationship is established. If the predetermined relationship is not established, the flow immediately exits the interrupt routine. On the other hand, if the predetermined relationship is established between the flag and the word, the control processing unit 121 defines in the corresponding column of the table TBL1 in step S106. The next action taken is determined and executed, after which the flow exits the interrupt routine.

On the other hand, in the interrupt routine of FIG. 12, when the control processing unit 121 determines in step S201 that the user's utterance is not input or the user's utterance is input but the prescribed voice recognition is not performed ( (Determination NO), the process immediately exits the interrupt routine, but if it is determined that the user's US utterance has been input and the prescribed voice recognition has been performed (determination YES), then in step S202, the control processing unit 121 resets the built-in timer. And start.

Further, in step S203, the control processing unit 121 determines whether or not a predetermined sensor input has been performed. If it is determined that the specified sensor input has not been performed, the control processing unit 121 further determines whether or not the built-in timer has exceeded 1 second in step S204. If the time measured by the built-in timer does not exceed 1 second, the flow returns to step S203 to wait for a prescribed sensor input. On the other hand, if the time of the built-in timer exceeds 1 second before the specified sensor input is performed, the interrupt routine is immediately exited. In other words, when the user's US utterance is input and the specified voice recognition is performed and the specified sensor input (detection) is made within a predetermined time (within 1 second in this case), both inputs are set at a predetermined time interval. It is determined that the input has not occurred within a predetermined time interval. The predetermined time is not limited to 1 second.

On the other hand, if it is determined that the specified sensor input has been made within one second after the specified voice recognition is made (YES in step S203), the control processing unit 121 stores the table TBL1 stored in the RAM 126 in step S205. With reference, the flag based on the sensor input is compared with the word of the voice recognition result to determine whether or not a predetermined relationship is established. If the predetermined relationship is not established, the flow immediately exits the interrupt routine, whereas if the predetermined relationship is established between the flag and the word, the control processing unit 121 defines in the corresponding column of the table TBL1 in step S206. The next action taken is determined and executed, after which the flow exits the interrupt routine.

Here, as a specific example, a display requesting confirmation as shown in FIG. 13 is made on the image display unit 104B visually recognized by the user US, and a button B1 “Yes” and a button B2 “No” are displayed at the same time. It is assumed that For example, when the “up / down” flag is set by the sensor input in step S101 (or S203) and the word of the speech recognition result in step S103 (or S201) is “yes”, a predetermined relationship is obtained by collating the table TBL1. Therefore, the next action is “determine”, and in response to this, the control processing unit 121 turns on (highlights) the button B1. This confirms (determines) the confirmation. On the other hand, when the “left / right” flag is set by the sensor input in step S101 (or S203) and the word of the speech recognition result in step S103 (or S201) is “no”, a predetermined relationship is obtained by collating the table TBL1. Therefore, the next action is “cancel”, and in response thereto, the control processing unit 121 turns on (highlights) the button B2. As a result, the confirmation is denied (cancelled).

In the case of a combination that does not have a correspondence relationship in the table TBL1 (for example, when the “left and right” flag is set in the sensor input and the voice recognition result is “Yes”), the sensor input result and the voice recognition are verified by collating the table TBL1. Since the predetermined relationship with the result is not established (the inside of the column is x), the next routine is exited without deciding the next action, so neither of the buttons B1 and B2 is turned on. . As a result, the voice recognition result uttered by the user US can be supported by the swing motion of the user US, and an appropriate action desired by the user US can be determined, so that malfunction can be effectively prevented.

In the example shown in FIG. 14, it is assumed that 23 pages out of all 104 pages of the document are displayed on the image display unit 104B visually recognized by the user US. For example, when the “right” flag is set by the sensor input in step S101 (or S203) and the word of the speech recognition result in step S103 (or S201) is “page”, the control processing unit 121 performs the next action. Then, the page is turned to the next page (page 24). On the other hand, when the “left” flag is set by the sensor input in step S101 (or S203) and the word of the speech recognition result in step S103 (or S201) is “page”, the control processing unit 121 performs the next action. Then, the page is turned to the previous page (page 22).

Further, when the “right” flag is set by the sensor input in step S101 (or S203) and the word of the speech recognition result in step S103 (or S201) is “good”, the control processing unit 121 As an action, page up to 4 pages ahead. On the other hand, when the “left” flag is set by the sensor input at step S101 (or S203) and the word of the speech recognition result at step S103 (or S201) is “once page”, the control processing unit 121 As an action, return the page to the previous 4 pages.

When the “up” flag is set for sensor input and the voice recognition result is “up”, for example, the transition to the menu screen is performed as the next action, and the “down” flag is set for sensor input. When the voice recognition result is “done”, for example, a transition to the setting screen is performed as the next action. Similarly, when a “right” or “left” flag is set in the sensor input and the voice recognition result is “migi” or “hidari”, a predetermined action can be executed.

On the other hand, when the combination does not exist in the table TBL1 (for example, when the “up” flag is set in the sensor input and the speech recognition result is “page”), the sensor input is performed by checking the table TBL1. Since it can be seen that the predetermined relationship between the result and the voice recognition result is not established (the inside of the column is x), the next action is not decided and the interruption routine is exited, so that the action that the user US does not want is performed. There is nothing to do.

As a modification example described above, the control processing unit 121 performs confirmation display (see FIG. 13) on whether or not the page can be turned between step S205 and step S206 on the image display unit 104B. The page turning as the next action may be performed in response to the voice recognition result “yes” by the utterance of the user US and the input of the movement of the user US.

Next, an example in which the proximity sensor 105 that can detect the movement of the hand HD of the user US is used as the motion detection unit that detects the motion of the user will be described. FIG. 15 is a diagram showing a state in which the user US puts the hand HD on the right in front of the face while wearing the HMD 100. FIG. 16 shows a state in which the user US puts the hand HD in front of the face while wearing the HMD 100. It is a figure which shows the state put on the left. Further, FIG. 17 is a diagram illustrating a state in which the user US is wearing the HMD 100 and the hand HD is placed in front of the face. FIG. 18 is a diagram illustrating the state in which the user US is wearing the HMD 100 It is a figure which shows the state put down before. The proximity sensor 105 has, for example, a detection region divided into four parts, and can output a gesture signal by distinguishing which position the hand HD is in as shown in FIGS.

FIG. 19 shows an example of the table TBL2 stored in the RAM 126, for example. Here, when the control processing unit 121 receives the gesture signal output from the proximity sensor 105, based on the signal, after the hand HD of the user US moves above the face as shown in FIG. 17, When it is determined that it has not moved downward, the flag is set to “Up”, and after it is determined that the hand HD of the user US has not moved upward after moving below the face as shown in FIG. If the flag is set to “down” and the user's US hand HD moves to the right as shown in FIG. 15 and then does not move to the left, the flag is set to “right” When it is determined that the US hand HD has moved to the left of the face as shown in FIG. 16 and then has not moved to the right, the flag is set to “left” and the user's US hand HD is changed to FIGS. Moving up and down between If you cross, the flag is "vertical", if the hand of the user US HD is determined to be moving from side to side between 15 and 16, the flag is set to "right". When the above flag is set, the control processing unit 121 executes control (see FIG. 11) assuming that a predetermined sensor input has been made. The types of flags are not limited to the above.

In addition, as a result of the speech recognition by the speech recognition unit 111E, the control processing unit 121 determines six types of words “up”, “done”, “migi”, “hidari”, “yes”, “no”. If it is determined that the input has been made, control (see FIG. 12) is executed assuming that the prescribed voice recognition has been performed. The word is not limited to the above. In the table TBL1, the combinations whose action contents are described in the column are assumed to have a predetermined relationship between them, while the combinations marked with “X” in the column have no corresponding relationship. It is assumed that the predetermined relationship is not established. Also in this example, the control processing unit 121 can execute control according to the flowcharts shown in FIGS.

Specifically, referring to FIG. 13, for example, an “up / down” flag is set by the input of the proximity sensor 105 in step S101 (or S203), and the word of the speech recognition result in step S103 (or S201) is “Yes”. ”, It is determined that the predetermined relationship has been established by collating the table TBL2, and the next action is“ determine ”. In response, the control processing unit 121 turns on the button B1. Become. This confirms (determines) the confirmation. On the other hand, when the “left / right” flag is set by the input of the proximity sensor 105 in step S101 (or S203) and the word of the speech recognition result in step S103 (or S201) is “no”, the table TBL2 is collated. When it is determined that the predetermined relationship is established, the next action is “cancel”, and in response to this, the control processing unit 121 turns on the button B2. As a result, the confirmation is denied (cancelled).

In the case of a combination that does not have a correspondence relationship in the table TBL2 (for example, when a “left / right” flag is set at the input of the proximity sensor 105 and the voice recognition result is “yes”), the sensor input result is obtained by collating the table TBL2. Since the predetermined relationship is not established between the voice and the voice recognition result (the inside of the column is x), the interrupt routine is exited without determining the next action, so that both the buttons B1 and B2 are operated. There is no. Others are the same as in the above-described embodiment.

The present invention has been described above by taking the HMD as an example, but the present invention is not limited to the HMD and can be applied to all wearable terminals. In addition, the motion detection unit for detecting the user's motion is not limited to the above example. For example, the motion detection unit detects the line of sight from the motion of the user's eyeball or detects the motion of the lips according to the user's utterance. Also good.

The present invention is not limited to the embodiments described in the specification, and other embodiments and modifications are included for those skilled in the art from the embodiments and technical ideas described in the present specification. it is obvious. The description and the embodiments are for illustrative purposes only, and the scope of the present invention is indicated by the following claims.

100 HMD
101 Frame 101a Front part

101b Side part

101c Side part 101d Long hole 101e Long hole 102 Eyeglass lens 103 Main body part 104 Display unit 104B Image display part 104DR Display control part 105 Proximity sensor 106 Camera 106a Lens 107 Right sub-main part 107a Projection 108 Left sub-body portion 108a Protrusion 109 Geomagnetic sensor 110A Acceleration sensor 110B Angular velocity sensor 111B Microphone 111C Speaker / Earphone 111D Audio processing unit 111E Speech recognition unit 112 Illuminance sensor 113 Color temperature sensor 114 Temperature sensor 121 Control processing unit 122 Operation unit 123 Reception unit 124 Communication unit 127 Battery 129 Storage device Chair 130 Power supply circuit B1, B2 Button CD Code CTU Control unit HD Hand HS Wiring TBL1, TBL2 Table US User

Claims

A wearable device worn on the user's body,
A voice input unit for inputting the voice of the user;
A voice decoding unit that converts the voice of the user input by the voice input unit into a word;
An operation detection unit for detecting the operation of the user;
The words converted by the speech decoding unit when the input of the user's speech by the speech input unit and the detection of the user's motion by the motion detection unit occur within a predetermined time interval, and the motion detection A wearable terminal having a control unit that determines an action corresponding to the word when it is determined that a predetermined relationship has been established between the user's movements detected by the unit.
The wearable terminal according to claim 1, wherein the motion detection unit detects a movement of the user's head.
The wearable terminal according to claim 1, wherein the motion detection unit detects a motion of the user's hand.
The wearable terminal according to any one of claims 1 to 3, wherein the word has a meaning related to the operation of the user.
The control unit stores a table in which the word converted by the speech decoding unit and the user's motion detected by the motion detection unit are associated, and based on the table, the word and the user's motion are stored. The wearable terminal according to any one of claims 1 to 4, wherein when the operation is associated, it is determined that a predetermined relationship is established between the two.
The wearable terminal according to claim 5, wherein in the table, the association between the word and the user's action can be arbitrarily changed.
The wearable terminal according to any one of claims 1 to 6, wherein an action corresponding to the word can be arbitrarily changed.
The wearable terminal according to any one of claims 1 to 7, wherein the wearable terminal is a head mounted display mounted on a head of the user.