WO2018061743A1 - Wearable terminal - Google Patents

Wearable terminal

Info

Publication number
WO2018061743A1
Authority
WO
WIPO (PCT)
Prior art keywords
user
unit
input
word
wearable terminal
Prior art date
Application number
PCT/JP2017/032781
Other languages
French (fr)
Japanese (ja)
Inventor
軌行 石井
実 矢口
Original Assignee
コニカミノルタ株式会社 (Konica Minolta, Inc.)
Priority date
Filing date
Publication date
Application filed by コニカミノルタ株式会社 (Konica Minolta, Inc.)
Publication of WO2018061743A1 publication Critical patent/WO2018061743A1/en

Classifications

    • G - PHYSICS
    • G02 - OPTICS
    • G02B - OPTICAL ELEMENTS, SYSTEMS OR APPARATUS
    • G02B 27/00 - Optical systems or apparatus not provided for by any of the groups G02B 1/00-G02B 26/00, G02B 30/00
    • G02B 27/02 - Viewing or reading apparatus
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/01 - Input arrangements or combined input and output arrangements for interaction between user and computer
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/16 - Sound input; Sound output
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 - Speech recognition
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 - Speech recognition
    • G10L 15/28 - Constructional details of speech recognition systems
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 5/00 - Details of television systems
    • H04N 5/64 - Constructional details of receivers, e.g. cabinets or dust covers

Definitions

  • the present invention relates to a wearable terminal capable of voice recognition.
  • an input method based on gesture detection in which a user's hand movement (gesture) is detected by a camera or the like and input is performed.
  • a user's hand movement gesture
  • an erroneous input may be caused by an ambiguous gesture of the user, which may lead to an operation unintended by the user.
  • Patent Document 1 discloses a head-mounted display (hereinafter, HMD) that can be worn on the user's head and that has instruction input means for detecting a specific action of the user and accepting the user's instruction, and control means for causing the HMD to perform a specific operation in accordance with the instruction input by the instruction input means, the instruction input means having a motion detection function for detecting the movement of the user's head. Patent Document 1 also contains a reference to voice recognition.
  • HMD head-mounted display
  • Patent Document 1 states that "a specific operation of the HMD may be performed according to the order of the user's actions; that is, the HMD may be caused to perform a specific operation by distinguishing whether two or more user actions were performed at different timings or in parallel," and that "when the head movement and the voice input are made simultaneously, the HMD may be caused to perform a specific operation." In other words, Patent Document 1 mentions detecting a sensor output and a voice individually and executing different operations accordingly. However, if the voice recognition result is inappropriate, or if the sensor input is inappropriate, the possibility of causing a malfunction still remains. That is, Patent Document 1 neither discloses nor suggests collating the voice recognition result with the sensor input to determine whether the input is appropriate and, if the input is appropriate, determining and executing the corresponding operation.
  • the present invention has been made in view of the above circumstances, and an object thereof is to provide a wearable terminal that can realize an appropriate action while compensating for the instability of voice-recognition input by detecting the user's motion.
  • a wearable terminal reflecting one aspect of the present invention is a wearable terminal used while worn on a user's body, comprising: a voice input unit that inputs the user's voice; a voice decoding unit that converts the user's voice input by the voice input unit into a word; a motion detection unit that detects the user's motion; and a control unit that, when the input of the user's voice by the voice input unit and the detection of the user's motion by the motion detection unit occur within a predetermined time interval, compares the word converted by the voice decoding unit with the user's motion detected by the motion detection unit and, when it determines that a predetermined relationship is established between the two, determines an action corresponding to the word.
  • a wearable terminal capable of realizing an appropriate action while compensating for input instability due to voice recognition by detecting a user's action.
  • HMD head mounted display
  • an example of the table TBL1 stored in the RAM 126 (FIG. 10)
  • a flowchart showing an interrupt routine repeatedly executed in the control processing unit 121 (FIGS. 11 and 12)
  • a diagram showing an example of a display requesting confirmation on the image display unit 104B viewed by the user US (FIG. 13)
  • FIG. 1 is a perspective view of a head mounted display (hereinafter, HMD) 100 which is a wearable terminal according to the present embodiment.
  • FIG. 2 is a front view of the HMD 100 according to the present embodiment.
  • FIG. 3 is a view of the HMD 100 according to the present embodiment as viewed from above.
  • the right side and the left side of the HMD 100 refer to the right side and the left side for the user wearing the HMD 100.
  • the HMD 100 of this embodiment has a frame 101 as a support member.
  • a frame 101 that is U-shaped when viewed from above has a front part 101a to which two spectacle lenses 102 are attached, and side parts 101b and 101c extending rearward from both ends of the front part 101a.
  • the two spectacle lenses 102 attached to the frame 101 may or may not have refractive power.
  • a cylindrical main body 103 as a support member is fixed to the front portion 101a of the frame 101 on the upper side of the spectacle lens 102 on the right side (which may be on the left side depending on the user's dominant eye).
  • the main body 103 is provided with a display unit 104.
  • a display control unit 104DR (see FIG. 4 described later) that controls display of the display unit 104 based on an instruction from a control processing unit (control unit) 121 described later is disposed in the main body 103. If necessary, a display unit may be arranged in front of both eyes.
  • the display unit 104 includes an image forming unit (not shown) housed in the main body unit 103 and an image display unit 104B.
  • the image display unit 104B, a so-called see-through display member, is a generally plate-shaped element arranged to extend downward from the main body 103, parallel to one spectacle lens 102 (see FIG. 1).
  • based on the image data input from the display control unit 104DR to the image forming unit, image light that is modulated pixel by pixel and then emitted causes a color image to be displayed on the image display unit 104B.
  • since the image display unit 104B transmits almost all external light, the user can observe the outside scene (real image) through it.
  • the virtual image of the image displayed on the image display unit 104B is observed while overlapping a part of the external image.
  • the user of the HMD 100 can simultaneously observe the image provided via the image display unit 104B and the external image. Note that when the display unit 104 is in the non-display state, the image display unit 104B is transparent, and only the external image can be observed.
  • a proximity sensor 105 disposed near the center of the frame 101, a lens 106a of a camera 106 disposed near the side of the frame 101, and an illuminance sensor 112 disposed between the proximity sensor 105 and the lens 106a, each facing forward.
  • the proximity sensor 105 has a function of detecting whether an object, for example a part of the human body (such as a hand or a finger), is present within a detection region in the proximity range in front of its detection surface, and outputting a signal, in order to detect that the object is close to the user's eyes.
  • the proximity range may be set as appropriate according to the operator's characteristics and preferences; for example, it may be within 200 mm of the detection surface of the proximity sensor. If the distance from the proximity sensor is 200 mm or less, the user can bring the palm and fingers into and out of the field of view with the arm bent, so gestures using the hands and fingers are easy to perform; this is also preferable because it reduces the risk of erroneously detecting a human body other than the user, furniture, and the like.
  • the right sub-body portion 107 is attached to the right side portion 101b of the frame 101
  • the left sub-body portion 108 is attached to the left side portion 101c of the frame 101.
  • the right sub-main body portion 107 and the left sub-main body portion 108 have an elongated plate shape, and have elongated protrusions 107a and 108a on the inner side, respectively.
  • the right sub-body portion 107 is attached to the frame 101 in a positioned state
  • the elongated protrusion 108a engages with the elongated hole 101e in the side part 101c of the frame 101.
  • the left sub-main body portion 108 is attached to the frame 101 in a positioned state.
  • inside the right sub-body portion 107 are mounted a geomagnetic sensor 109 (see FIG. 4 described later) that detects geomagnetism, and an angular velocity sensor 110B and an acceleration sensor 110A (see FIG. 4 described later) that generate outputs corresponding to the posture.
  • the left sub-main unit 108 is provided with a speaker / earphone 111C and a microphone 111B (see FIG. 4 described later).
  • the main body 103 and the right sub-body 107 are connected so as to be able to transmit signals through a wiring HS, and the main body 103 and the left sub-body 108 are connected so as to be able to transmit signals through a wiring (not shown).
  • as schematically illustrated in FIG. 3, the right sub-body 107 is connected to the control unit CTU via a cord CD extending from its rear end.
  • a 6-axis sensor in which an angular velocity sensor and an acceleration sensor are integrated may be used.
  • the HMD can be operated by sound based on an output signal generated from the microphone 111B according to the input sound.
  • the main main body 103 and the left sub main body 108 may be configured to be wirelessly connected.
  • the provision of the color temperature sensor 113 and the temperature sensor 114 is optional.
  • the position where the microphone 111B is provided is arbitrary, but is preferably a position suitable for recording the voice spoken by the user US.
  • FIG. 4 is a block diagram of main circuits of the HMD 100.
  • the control unit CTU has a control processing unit 121 that generates control signals for the display unit 104 and the other functional devices, an operation unit 122, a GPS receiving unit 123 that receives radio waves from GPS satellites, a communication unit 124 that exchanges data with the outside, a ROM 125 that stores programs and the like, a RAM 126 that stores image data and the like, a power supply circuit 130 that converts the voltage supplied from the battery 127 into voltages appropriate for each unit, a storage device 129 such as an SSD or flash memory, and a voice recognition unit 111E.
  • the control processing unit 121 can be an application processor of the kind used in smartphones, but the type of processor is not limited. For example, an application processor that includes as standard the hardware needed for image processing, such as a GPU or codec, can be said to be suitable for a compact HMD.
  • the control processing unit 121 controls image display on the display unit 104 via the display control unit 104DR.
  • the control processing unit 121 receives power from the power supply circuit 130, operates according to a program stored in at least one of the ROM 125 and the storage device 129, inputs image data from the camera 106 in response to an operation input such as power-on from the operation unit 122 and stores it in the RAM 126, and can communicate with the outside via the communication unit 124 as necessary.
  • the microphone 111B collects the voice spoken by the user US, converts it into a signal, and inputs the signal to the voice processing unit 111D.
  • the voice processing unit 111D processes the signal output from the microphone 111B and outputs it as a voice signal to the voice recognition unit 111E of the control unit CTU.
  • the voice recognition unit 111E analyzes the voice signal output from the voice processing unit 111D, converts it into a word, and inputs that information to the control processing unit 121.
  • the microphone 111B and the voice processing unit 111D constitute a voice input unit
  • the voice recognition unit 111E constitutes a voice decoding unit; however, when the microphone 111B is externally attached and its signal is received via a pin jack or the like, the voice processing unit 111D alone may constitute the voice input unit.
  • FIG. 5 is a front view when the user US wears the HMD 100 of the present embodiment.
  • FIG. 6 is a diagram illustrating a state in which the user US is facing left while wearing the HMD 100
  • FIG. 7 is a diagram illustrating a state in which the user US is facing right while wearing the HMD 100.
  • FIG. 8 is a diagram showing a state in which the user US viewed from the side is facing upward while wearing the HMD 100
  • FIG. 9 is a diagram showing the user US, viewed from the side, facing downward while wearing the HMD 100.
  • FIG. 10 is an example of the table TBL1 stored in the RAM 126, for example.
  • when the control processing unit 121 receives the signal output from the acceleration sensor 110A, it sets a flag based on that signal: "up" if the head of the user US turned upward as shown in FIG. 8 and did not subsequently turn downward; "down" if the head turned downward as shown in FIG. 9 and did not subsequently turn upward; "right" if the head turned to the right as shown in FIG. 7 and did not subsequently turn to the left; "left" if the head turned to the left as shown in FIG. 6 and did not subsequently turn to the right; "up and down" if the head is judged to be moving up and down between FIGS. 8 and 9; and "left and right" if the head is judged to be moving left and right between FIGS. 6 and 7.
  • the control processing unit 121 executes control (see FIG. 11) described later, assuming that a predetermined sensor input has been made.
  • the types of flags are not limited to the above.
  • when the control processing unit 121 determines that one of the eight words "ue" (up), "shita" (down), "migi" (right), "hidari" (left), "hai" (yes), "iie" (no), "page", and "yon page" (four pages) has been input as the result of voice recognition, it executes the control described later (see FIG. 12), regarding the prescribed voice recognition as having been performed.
  • the words are not limited to the above, but it is preferable that they have meanings related to the motion of the user US, for example nodding the head vertically as the motion corresponding to "yes" and shaking the head horizontally as the motion corresponding to "no", because the motion then becomes natural.
  • FIGS. 11 and 12 are flowcharts showing interrupt routines that are repeatedly executed in the control processing unit 121.
  • when the prescribed sensor input occurs before the prescribed voice recognition, the control of the flowchart of FIG. 11 is executed; when the prescribed voice recognition occurs before the prescribed sensor input, the control of the flowchart of FIG. 12 is executed.
  • if the control processing unit 121 determines in step S101 that the prescribed sensor input has not occurred (determination NO), it exits the interrupt routine immediately; if it determines that the prescribed sensor input has occurred (determination YES), it resets and starts its built-in timer in the subsequent step S102.
  • in step S103, the control processing unit 121 determines whether the user's utterance has been input and the prescribed voice recognition has been performed.
  • in step S104, the control processing unit 121 determines whether the built-in timer has exceeded 1 second. If it has not, the flow returns to step S103 to wait for the user's utterance input and the prescribed voice recognition; if the built-in timer exceeds 1 second before the prescribed voice recognition is performed, the interrupt routine is exited immediately.
  • when the prescribed sensor input has been made and the user's utterance is input and the prescribed voice recognition is performed within the predetermined time, it is determined that both inputs occurred within the predetermined time interval; otherwise, it is determined that they did not occur within the predetermined time interval.
  • the predetermined time is not limited to 1 second, it may be fixed or variable, and it is desirable that it can be adjusted according to the characteristics of the device.
  • in step S105, the control processing unit 121 refers to the table TBL1 stored in the RAM 126 and collates the flag from the sensor input with the word of the voice recognition result to determine whether the predetermined relationship is established. If the predetermined relationship is not established, the flow exits the interrupt routine immediately; if the predetermined relationship is established between the flag and the word, the control processing unit 121, in step S106, determines the next action specified in the corresponding cell of the table TBL1 and executes it, after which the flow exits the interrupt routine.
  • if the control processing unit 121 determines in step S201 that the utterance of the user US has not been input, or that the user's utterance has been input but the prescribed voice recognition has not been performed (determination NO), it exits the interrupt routine immediately; if it determines that the utterance of the user US has been input and the prescribed voice recognition has been performed (determination YES), it resets and starts its built-in timer in the subsequent step S202.
  • in step S203, the control processing unit 121 determines whether the prescribed sensor input has occurred. If it determines that it has not, it then determines in step S204 whether the built-in timer has exceeded 1 second; if not, the flow returns to step S203 to wait for the prescribed sensor input, and if the built-in timer exceeds 1 second before the prescribed sensor input occurs, the interrupt routine is exited immediately.
  • when the utterance of the user US is input and the prescribed voice recognition is performed, and the prescribed sensor input occurs within the predetermined time, it is determined that both inputs occurred within the predetermined time interval; otherwise, it is determined that they did not occur within the predetermined time interval.
  • the predetermined time is not limited to 1 second.
  • in step S205, the control processing unit 121 refers to the table TBL1 stored in the RAM 126 and collates the flag from the sensor input with the word of the voice recognition result to determine whether the predetermined relationship is established.
  • if the predetermined relationship is not established, the flow exits the interrupt routine immediately; if the predetermined relationship is established between the flag and the word, the control processing unit 121, in step S206, determines the next action specified in the corresponding cell of the table TBL1 and executes it, after which the flow exits the interrupt routine.
  • a display requesting confirmation as shown in FIG. 13 is made on the image display unit 104B visually recognized by the user US, and a button B1 “Yes” and a button B2 “No” are displayed at the same time.
  • when the "up and down" flag is set by the sensor input in step S101 (or S203) and the word of the voice recognition result in step S103 (or S201) is "yes", collating the table TBL1 shows that the predetermined relationship is established, so the next action is "determine"; in response, the control processing unit 121 turns on (highlights) the button B1, and the confirmation is thereby accepted (determined).
  • otherwise, collating the table TBL1 shows that the predetermined relationship between the sensor input result and the voice recognition result is not established (the cell contains an ×), so the interrupt routine is exited without deciding the next action and neither button B1 nor B2 is operated. In this way, the voice recognition result of the utterance of the user US is backed up by the head movement of the user US, and an appropriate action desired by the user US can be determined, so malfunction can be effectively prevented.
  • as the action, the control processing unit 121 advances the page by four pages.
  • as the action, the control processing unit 121 returns the page by four pages.
  • otherwise, collating the table TBL1 shows that the predetermined relationship between the sensor input result and the voice recognition result is not established (the cell contains an ×), so the interrupt routine is exited without deciding the next action, and an action the user US does not want is never performed.
  • the control processing unit 121 may display a confirmation (see FIG. 13) on the image display unit 104B, between step S205 and step S206, asking whether the page may be turned.
  • the page turning as the next action may be performed in response to the voice recognition result “yes” by the utterance of the user US and the input of the movement of the user US.
  • FIG. 15 is a diagram showing a state in which the user US puts the hand HD on the right in front of the face while wearing the HMD 100.
  • FIG. 16 is a diagram showing the user US, wearing the HMD 100, holding the hand HD to the left in front of the face.
  • FIG. 17 is a diagram showing the user US, wearing the HMD 100, holding the hand HD up in front of the face.
  • FIG. 18 is a diagram showing the user US, wearing the HMD 100, holding the hand HD down in front of the face.
  • the proximity sensor 105 has, for example, a detection region divided into four parts, and can output a gesture signal by distinguishing which position the hand HD is in, as shown in FIGS. 15 to 18.
  • FIG. 19 shows an example of the table TBL2 stored in the RAM 126, for example.
  • when the control processing unit 121 receives the gesture signal output from the proximity sensor 105, it sets a flag based on that signal: "up" if the hand HD of the user US moved above the face as shown in FIG. 17 and did not subsequently move downward; "down" if the hand moved below the face as shown in FIG. 18 and did not subsequently move upward; "right" if the hand moved to the right of the face as shown in FIG. 15 and did not subsequently move to the left; "left" if the hand moved to the left of the face as shown in FIG. 16 and did not subsequently move to the right; "up and down" if the hand is judged to be moving up and down between FIGS. 17 and 18; and "left and right" if the hand is judged to be moving left and right between FIGS. 15 and 16.
  • the control processing unit 121 executes control (see FIG. 11) assuming that a predetermined sensor input has been made.
  • the types of flags are not limited to the above.
  • when the control processing unit 121 determines that one of the six words "ue" (up), "shita" (down), "migi" (right), "hidari" (left), "hai" (yes), and "iie" (no) has been input, it executes the control (see FIG. 12), regarding the prescribed voice recognition as having been performed.
  • the word is not limited to the above.
  • the combinations of flag and word whose action contents are described in the corresponding cell are treated as having the predetermined relationship, while the combinations marked with an × in the cell have no correspondence and the predetermined relationship is not established. Also in this example, the control processing unit 121 can execute control according to the flowcharts shown in FIGS. 11 and 12.
  • when the "up and down" flag is set by the input of the proximity sensor 105 in step S101 (or S203) and the word of the voice recognition result in step S103 (or S201) is "yes", collating the table TBL2 shows that the predetermined relationship is established, so the next action is "determine"; in response, the control processing unit 121 turns on the button B1, and the confirmation is thereby accepted (determined). On the other hand, when the "left and right" flag is set by the input of the proximity sensor 105 in step S101 (or S203) and the word of the voice recognition result in step S103 (or S201) is "no", collating the table TBL2 shows that the predetermined relationship is established, so the next action is "cancel"; in response, the control processing unit 121 turns on the button B2, and the confirmation is thereby denied (cancelled).
  • otherwise, collating the table TBL2 shows that the predetermined relationship between the sensor input result and the voice recognition result is not established (the cell contains an ×), so the interrupt routine is exited without determining the next action, and neither button B1 nor B2 is operated. The rest is the same as in the embodiment described above.
  • the present invention has been described above by taking the HMD as an example, but the present invention is not limited to the HMD and can be applied to all wearable terminals.
  • the motion detection unit for detecting the user's motion is not limited to the above example.
  • the motion detection unit may also detect the line of sight from the movement of the user's eyeball, or detect the movement of the lips according to the user's utterance.
  • 100 HMD, 101 Frame, 101a Front part, 101b Side part, 101c Side part, 101d Long hole, 101e Long hole, 102 Spectacle lens, 103 Main body, 104 Display unit, 104B Image display unit, 104DR Display control unit, 105 Proximity sensor, 106 Camera, 106a Lens, 107 Right sub-body, 107a Protrusion, 108 Left sub-body, 108a Protrusion, 109 Geomagnetic sensor, 110A Acceleration sensor, 110B Angular velocity sensor, 111B Microphone, 111C Speaker/earphone, 111D Voice processing unit, 111E Voice recognition unit, 112 Illuminance sensor, 113 Color temperature sensor, 114 Temperature sensor, 121 Control processing unit, 122 Operation unit, 123 GPS receiving unit, 124 Communication unit, 127 Battery, 129 Storage device, 130 Power supply circuit, B1, B2 Buttons, CD Cord, CTU Control unit, HD Hand, HS Wiring, TBL1, TBL2 Tables, US User

Abstract

Provided is a wearable terminal capable of achieving a suitable action while compensating for the instability of voice recognition input by detecting a user operation. When a user's voice input to a voice input unit and detection of the user's operation by an operation detection unit take place within a predetermined time interval, a control unit of the wearable terminal compares a word resulting from conversion by a voice interpretation unit with the user's operation detected by the operation detection unit, and determines an action corresponding to the word if the control unit determines that the word and the user's operation have a predetermined relationship.

Description

Wearable terminal
The present invention relates to a wearable terminal capable of voice recognition.
In recent years, wearable terminals typified by the rapidly developing head-mounted display often allow information to be conveyed even when the user's hands are occupied, which has the advantage of improving work efficiency. To make the most of this advantage, hands-free input is better suited to a wearable terminal than manual input such as button operation. As one type of hands-free input, input using a voice recognition function has been developed; the head-mounted display in particular is said to have a high affinity with voice recognition, because it can be used even under conditions in which both of the user's hands are occupied.
However, with an input method that uses a voice recognition function, not only the user's intended utterances but also noise unrelated to the utterance and the user's muttering to themselves are picked up through the microphone, which easily leads to erroneous input and may cause operations the user did not intend. In particular, utterances of short words such as "yes" are said to be relatively difficult for a voice recognition function to distinguish from noise.
In response, an input method based on gesture detection has also been developed, in which the user's hand movement (gesture) is detected by a camera or the like. However, even with gesture detection, an ambiguous gesture by the user may cause erroneous input and thereby lead to an operation the user did not intend.
Patent Document 1 discloses a head-mounted display (hereinafter, HMD) that can be worn on the user's head and that has instruction input means for detecting a specific action of the user and accepting the user's instruction, and control means for causing the HMD to perform a specific operation in accordance with the instruction input by the instruction input means, the instruction input means having a motion detection function for detecting the movement of the user's head. Patent Document 1 also contains a reference to voice recognition.
JP 2004-233909 A
Here, Patent Document 1 states that "a specific operation of the HMD may be performed according to the order of the user's actions; that is, the HMD may be caused to perform a specific operation by distinguishing whether two or more user actions were performed at different timings or in parallel," and that "when the head movement and the voice input are made simultaneously, the HMD may be caused to perform a specific operation." In other words, Patent Document 1 mentions detecting a sensor output and a voice individually and executing different operations accordingly. However, if the voice recognition result is inappropriate, or if the sensor input is inappropriate, the possibility of causing a malfunction still remains. That is, Patent Document 1 neither discloses nor suggests collating the voice recognition result with the sensor input to determine whether the input is appropriate and, if the input is appropriate, determining and executing the corresponding operation.
The present invention has been made in view of the above circumstances, and an object thereof is to provide a wearable terminal that can realize an appropriate action while compensating for the instability of voice-recognition input by detecting the user's motion.
In order to achieve at least one of the above-described objects, a wearable terminal reflecting one aspect of the present invention is a wearable terminal used while worn on a user's body, comprising:
a voice input unit that inputs the user's voice;
a voice decoding unit that converts the user's voice input by the voice input unit into a word;
a motion detection unit that detects the user's motion; and
a control unit that, when the input of the user's voice by the voice input unit and the detection of the user's motion by the motion detection unit occur within a predetermined time interval, compares the word converted by the voice decoding unit with the user's motion detected by the motion detection unit and, when it determines that a predetermined relationship is established between the two, determines an action corresponding to the word.
According to the present invention, it is possible to provide a wearable terminal that can realize an appropriate action while compensating for the instability of voice-recognition input by detecting the user's motion.
FIG. 1 is a perspective view of the head-mounted display (HMD) according to the present embodiment.
FIG. 2 is a front view of the HMD.
FIG. 3 is a view of the HMD from above.
FIG. 4 is a block diagram of the main circuits of the HMD.
FIG. 5 is a front view when the user wears the HMD.
FIG. 6 is a diagram showing the user US facing left while wearing the HMD 100.
FIG. 7 is a diagram showing the user US facing right while wearing the HMD 100.
FIG. 8 is a diagram showing the user US, viewed from the side, facing upward while wearing the HMD 100.
FIG. 9 is a diagram showing the user US, viewed from the side, facing downward while wearing the HMD 100.
FIG. 10 is an example of the table TBL1 stored in the RAM 126.
FIG. 11 is a flowchart showing an interrupt routine repeatedly executed in the control processing unit 121.
FIG. 12 is a flowchart showing an interrupt routine repeatedly executed in the control processing unit 121.
FIG. 13 is a diagram showing an example of a display requesting confirmation on the image display unit 104B viewed by the user US.
FIG. 14 is a diagram showing an example in which page 23 of a 104-page document is displayed on the image display unit 104B viewed by the user US.
FIG. 15 is a diagram showing the user US, wearing the HMD 100, holding the hand HD to the right in front of the face.
FIG. 16 is a diagram showing the user US, wearing the HMD 100, holding the hand HD to the left in front of the face.
FIG. 17 is a diagram showing the user US, wearing the HMD 100, holding the hand HD up in front of the face.
FIG. 18 is a diagram showing the user US, wearing the HMD 100, holding the hand HD down in front of the face.
FIG. 19 is an example of the table TBL2 stored, for example, in the RAM 126.
Embodiments of the present invention will be described below with reference to the drawings. FIG. 1 is a perspective view of a head-mounted display (hereinafter, HMD) 100, which is a wearable terminal according to the present embodiment. FIG. 2 is a front view of the HMD 100 according to the present embodiment. FIG. 3 is a view of the HMD 100 according to the present embodiment as seen from above. Hereinafter, the right side and left side of the HMD 100 refer to the right side and left side as seen by the user wearing the HMD 100.
As shown in FIGS. 1 to 3, the HMD 100 of this embodiment has a frame 101 as a support member. The frame 101, which is U-shaped when viewed from above, has a front part 101a to which two spectacle lenses 102 are attached, and side parts 101b and 101c extending rearward from both ends of the front part 101a. The two spectacle lenses 102 attached to the frame 101 may or may not have refractive power.
Above the spectacle lens 102 on the right side (or on the left side, depending on the user's dominant eye), a cylindrical main body 103 serving as a support member is fixed to the front part 101a of the frame 101. The main body 103 is provided with a display unit 104. A display control unit 104DR (see FIG. 4, described later), which controls the display of the display unit 104 based on instructions from a control processing unit (control unit) 121 described later, is disposed inside the main body 103. If necessary, display units may be arranged in front of both eyes.
The display unit 104 consists of an image forming unit (not shown) housed inside the main body 103 and an image display unit 104B. The image display unit 104B, a so-called see-through display member, is a generally plate-shaped element arranged to extend downward from the main body 103, parallel to one spectacle lens 102 (see FIG. 1). Image light that is modulated pixel by pixel based on the image data input from the display control unit 104DR to the image forming unit and then emitted causes a color image to be displayed on the image display unit 104B. Meanwhile, since the image display unit 104B transmits almost all external light, the user can observe the outside scene (real image) through it. The virtual image of the image displayed on the image display unit 104B is therefore observed overlapping a part of the outside scene. In this way, the user of the HMD 100 can simultaneously observe the image provided via the image display unit 104B and the outside scene. When the display unit 104 is in the non-display state, the image display unit 104B becomes transparent and only the outside scene can be observed.
Further, in FIGS. 1 and 2, a proximity sensor 105 arranged toward the center of the frame 101, a lens 106a of a camera 106 arranged toward the side of the frame 101, and an illuminance sensor 112 arranged between the proximity sensor 105 and the lens 106a are provided on the front face of the main body 103, each facing forward.
The proximity sensor 105 has the function of detecting whether an object, for example a part of the human body (such as a hand or finger), is present within a detection region in a proximity range in front of the detection surface of the proximity sensor, and outputting a signal, in order to detect that the object is close to the user's eyes. The proximity range may be set as appropriate according to the operator's characteristics and preferences; for example, it can be set to within 200 mm of the detection surface of the proximity sensor. If the distance from the proximity sensor is within 200 mm, the user can bring the palm and fingers into and out of the field of view with the arm bent, so gestures using the hand and fingers are easy to perform; this is also preferable because it reduces the risk of erroneously detecting a human body other than the user, furniture, and the like.
In FIGS. 1 and 2, a right sub-body 107 is attached to the right side part 101b of the frame 101, and a left sub-body 108 is attached to the left side part 101c of the frame 101. The right sub-body 107 and the left sub-body 108 have an elongated plate shape and have elongated protrusions 107a and 108a on their inner sides, respectively. By engaging the elongated protrusion 107a with the elongated hole 101d in the side part 101b of the frame 101, the right sub-body 107 is attached to the frame 101 in a positioned state; likewise, by engaging the elongated protrusion 108a with the elongated hole 101e in the side part 101c, the left sub-body 108 is attached to the frame 101 in a positioned state.
Inside the right sub-body 107 are mounted a geomagnetic sensor 109 (see FIG. 4, described later) that detects geomagnetism, and an angular velocity sensor 110B and an acceleration sensor 110A (see FIG. 4, described later) that generate outputs corresponding to the posture; inside the left sub-body 108 are provided a speaker/earphone 111C and a microphone 111B (see FIG. 4, described later). The main body 103 and the right sub-body 107 are connected by a wiring HS so that signals can be transmitted, and the main body 103 and the left sub-body 108 are connected by a wiring (not shown) so that signals can be transmitted. As schematically shown in FIG. 3, the right sub-body 107 is connected to the control unit CTU via a cord CD extending from its rear end. A 6-axis sensor in which the angular velocity sensor and the acceleration sensor are integrated may also be used. Furthermore, the HMD can also be operated by voice, based on the output signal generated by the microphone 111B according to the input sound. The main body 103 and the left sub-body 108 may also be configured to be connected wirelessly. However, the provision of the color temperature sensor 113 and the temperature sensor 114 is optional. The position at which the microphone 111B is provided is arbitrary, but a position suited to recording the voice spoken by the user US is preferable.
FIG. 4 is a block diagram of the main circuits of the HMD 100. The control unit CTU has a control processing unit 121 that generates control signals for the display unit 104 and the other functional devices, an operation unit 122, a GPS receiving unit 123 that receives radio waves from GPS satellites, a communication unit 124 that exchanges data with the outside, a ROM 125 that stores programs and the like, a RAM 126 that stores image data and the like, a power supply circuit 130 that converts the voltage supplied from the battery 127 into voltages appropriate for each unit, a storage device 129 such as an SSD or flash memory, and a voice recognition unit 111E. The control processing unit 121 can be an application processor of the kind used in smartphones, but the type of processor is not limited. For example, an application processor that includes as standard the hardware needed for image processing, such as a GPU or codec, can be said to be suitable for a compact HMD.
Further, when the light receiving part of the proximity sensor 105 detects invisible light radiated from a human body as detection light, the corresponding signal is input from the proximity sensor 105 to the control processing unit 121, and a signal from the illuminance sensor 112, which detects the ambient brightness, is also input. The control processing unit 121 also controls the image display of the display unit 104 via the display control unit 104DR.
The control processing unit 121 receives power from the power supply circuit 130, operates according to a program stored in at least one of the ROM 125 and the storage device 129, inputs image data from the camera 106 in response to an operation input such as power-on from the operation unit 122 and stores it in the RAM 126, and can communicate with the outside via the communication unit 124 as necessary.
The microphone 111B collects the voice spoken by the user US, converts it into a signal, and inputs it to the voice processing unit 111D; the voice processing unit 111D processes the signal output from the microphone 111B and outputs it as a voice signal to the voice recognition unit 111E of the control unit CTU; the voice recognition unit 111E analyzes the voice signal output from the voice processing unit 111D, converts it into a word, and inputs that information to the control processing unit 121. Here, the microphone 111B and the voice processing unit 111D constitute the voice input unit, and the voice recognition unit 111E constitutes the voice decoding unit; however, when the microphone 111B is an external unit that supplies its signal via a pin jack or the like, the voice processing unit 111D alone may constitute the voice input unit.
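As a rough illustration of the voice path just described (microphone 111B, voice processing unit 111D, voice recognition unit 111E, control processing unit 121), the following minimal Python sketch separates the voice input unit from the voice decoding unit; all function names are hypothetical, and recognize_word stands in for whatever recognizer the voice recognition unit 111E actually uses.

```python
# Minimal sketch of the voice path (hypothetical names, not from the patent).
# Raw microphone frame -> voice input unit -> voice decoding unit -> word -> controller.

def voice_input_unit(raw_frame: bytes) -> bytes:
    """Microphone 111B plus voice processing unit 111D: condition the signal."""
    return raw_frame  # placeholder for filtering, gain control, framing, etc.

def voice_decoding_unit(voice_signal: bytes, recognize_word) -> str | None:
    """Voice recognition unit 111E: convert the voice signal into a word, or None."""
    return recognize_word(voice_signal)

def handle_utterance(raw_frame: bytes, recognize_word, control_unit) -> None:
    """Hand a recognized word, if any, to the control processing unit 121."""
    word = voice_decoding_unit(voice_input_unit(raw_frame), recognize_word)
    if word is not None:
        control_unit(word)
```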
Next, an example will be described in which the acceleration sensor 110A, which can detect the movement of the head of the user US, is used as the motion detection unit that detects the user's motion. Here, the "user's motion" detected by the motion detection unit is assumed to be a motion that the user performs without contacting the HMD 100. FIG. 5 is a front view of the user US wearing the HMD 100 of the present embodiment. FIG. 6 shows the user US facing left while wearing the HMD 100, and FIG. 7 shows the user US facing right while wearing the HMD 100. FIG. 8 shows the user US, viewed from the side, facing upward while wearing the HMD 100, and FIG. 9 shows the user US, viewed from the side, facing downward while wearing the HMD 100.
FIG. 10 is an example of the table TBL1 stored, for example, in the RAM 126. When the control processing unit 121 receives the signal output from the acceleration sensor 110A, it sets a flag based on that signal as follows: if it determines that the head of the user US turned upward as shown in FIG. 8 and did not subsequently turn downward, the flag is set to "up"; if it determines that the head turned downward as shown in FIG. 9 and did not subsequently turn upward, the flag is set to "down"; if it determines that the head turned to the right as shown in FIG. 7 and did not subsequently turn to the left, the flag is set to "right"; if it determines that the head turned to the left as shown in FIG. 6 and did not subsequently turn to the right, the flag is set to "left"; if it determines that the head is moving up and down between the states of FIGS. 8 and 9, the flag is set to "up and down"; and if it determines that the head is moving left and right between the states of FIGS. 6 and 7, the flag is set to "left and right". When one of these flags is set, the control processing unit 121 regards a prescribed sensor input as having been made and executes the control described later (see FIG. 11). The types of flags are not limited to the above.
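The flag logic above can be pictured with a short sketch. The code below is not from the patent: it assumes a hypothetical upstream step that has already reduced the raw acceleration sensor 110A signal to coarse head movements ("up", "down", "left", "right") and only shows how such a burst of movements could be collapsed into one of the six flags described.

```python
def head_flag(events: list[str]) -> str | None:
    """Reduce a short, time-ordered burst of coarse head movements to one flag.

    `events` might be ["up"] for a single upward turn, or ["up", "down", "up"]
    for nodding; how the accelerometer signal becomes these coarse movements
    is outside the scope of this sketch.
    """
    if not events:
        return None
    kinds = set(events)
    if kinds == {"up", "down"} and len(events) > 1:
        return "up and down"        # nodding between the states of FIGS. 8 and 9
    if kinds == {"left", "right"} and len(events) > 1:
        return "left and right"     # shaking between the states of FIGS. 6 and 7
    if len(kinds) == 1:
        return events[0]            # single turn: "up", "down", "left", or "right"
    return None                     # ambiguous movement: no prescribed sensor input

# Example: a single upward turn with no following downward turn
assert head_flag(["up"]) == "up"
# Example: repeated up/down movement is treated as the "up and down" flag
assert head_flag(["up", "down", "up", "down"]) == "up and down"
```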
When the voice recognition unit 111E recognizes one of the eight words "ue" (up), "shita" (down), "migi" (right), "hidari" (left), "hai" (yes), "iie" (no), "page", and "yon page" (four pages), the control processing unit 121 regards the prescribed voice recognition as having been performed and executes the control described later (see FIG. 12). The words are not limited to these, but it is preferable that they have meanings related to the motion of the user US, for example the user nodding vertically as the motion corresponding to "yes" and shaking the head horizontally as the motion corresponding to "no", because the motion then becomes natural. In the table TBL1, a combination of a flag and a word for which an action is specified in the corresponding cell is treated as validly associated, that is, a predetermined relationship is established between the two; combinations whose cell is marked with an × have no correspondence, and the predetermined relationship is not established. The association between sensor inputs and voice recognition results can be set arbitrarily, and the next action can also be set arbitrarily. The contents of the table can be changed freely; in that case, they can be entered while operating the HMD 100, or prepared on an external PC or the like and downloaded by wire or wirelessly.
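Purely as an illustration, the relationship held in TBL1 can be represented as a mapping from (flag, word) pairs to actions, where a missing entry corresponds to an × cell. Only the ("up and down", "hai") pairing leading to a confirmation is explicitly described in the text; the other entries below are placeholders added for illustration and do not reproduce FIG. 10.

```python
# Sketch of one possible in-memory representation of the table TBL1.
TBL1: dict[tuple[str, str], str] = {
    ("up and down", "hai"): "confirm",        # nod + "hai" (yes): accept the confirmation
    ("left and right", "iie"): "cancel",      # placeholder entry for illustration
    ("right", "yon page"): "page_forward_4",  # placeholder entry for illustration
    ("left", "yon page"): "page_back_4",      # placeholder entry for illustration
}

def next_action(flag: str, word: str) -> str | None:
    """Return the action if the flag/word pair has the predetermined relationship
    (a filled cell); None corresponds to an 'x' cell, so nothing is executed."""
    return TBL1.get((flag, word))

# A pair with no entry (an 'x' cell) yields no action.
assert next_action("up and down", "iie") is None
```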
FIGS. 11 and 12 are flowcharts showing interrupt routines that are repeatedly executed in the control processing unit 121. When the prescribed sensor input occurs before the prescribed voice recognition, the control of the flowchart of FIG. 11 is executed; when the prescribed voice recognition occurs before the prescribed sensor input, the control of the flowchart of FIG. 12 is executed.
First, in the interrupt routine of FIG. 11, if the control processing unit 121 determines in step S101 that the prescribed sensor input has not occurred (determination NO), it exits the interrupt routine immediately; if it determines that the prescribed sensor input has occurred (determination YES), the control processing unit 121 resets and starts its built-in timer in the subsequent step S102.
Further, in step S103, the control processing unit 121 determines whether the user's utterance has been input and the prescribed voice recognition has been performed. If it determines that they have not, it then determines in step S104 whether the built-in timer has exceeded 1 second. If the built-in timer has not exceeded 1 second, the flow returns to step S103 and waits for the user's utterance input and the prescribed voice recognition. If, on the other hand, the built-in timer exceeds 1 second before the prescribed voice recognition is performed, the interrupt routine is exited immediately. In other words, when the prescribed sensor input (detection) has occurred and the utterance of the user US is input and the prescribed voice recognition is performed within the predetermined time (here, within 1 second), it is determined that both inputs occurred within the predetermined time interval; otherwise, it is determined that they did not occur within the predetermined time interval. The predetermined time is not limited to 1 second; it may be fixed or variable, and it is desirable that it can be adjusted according to the characteristics of the device.
 On the other hand, if it determines that the prescribed speech recognition occurred within 1 second after the prescribed sensor input (YES in step S103), the control processing unit 121 refers in step S105 to the table TBL1 stored in the RAM 126, collates the flag from the sensor input with the word from the speech recognition result, and determines whether the predetermined relationship is established. If the predetermined relationship is not established, the flow immediately exits the interrupt routine; if the predetermined relationship between the flag and the word is established, the control processing unit 121 determines the next action specified in the corresponding cell of the table TBL1 and executes that action in step S106, after which the flow exits the interrupt routine.
 In contrast, in the interrupt routine of FIG. 12, if the control processing unit 121 determines in step S201 that the user US's speech has not been input, or that the user's speech has been input but the prescribed speech recognition has not been performed (NO), it immediately exits the interrupt routine. If it determines that the user US's speech has been input and the prescribed speech recognition has been performed (YES), the control processing unit 121 resets and starts its built-in timer in the subsequent step S202.
 Further, in step S203, the control processing unit 121 determines whether the prescribed sensor input has occurred. If it determines that it has not, the control processing unit 121 then determines in step S204 whether the built-in timer has exceeded 1 second. If the timer has not exceeded 1 second, the flow returns to step S203 and waits for the prescribed sensor input. If, on the other hand, the timer exceeds 1 second before the prescribed sensor input occurs, the routine exits immediately. In other words, when the user US's speech is input and the prescribed speech recognition occurs, and the prescribed sensor input (detection) occurs within the predetermined time (here, within 1 second), the two inputs are judged to have occurred within the predetermined time interval; otherwise, they are judged not to have occurred within the predetermined time interval. The predetermined time is not limited to 1 second.
 On the other hand, if it determines that the prescribed sensor input occurred within 1 second after the prescribed speech recognition (YES in step S203), the control processing unit 121 refers in step S205 to the table TBL1 stored in the RAM 126, collates the flag from the sensor input with the word from the speech recognition result, and determines whether the predetermined relationship is established. If the predetermined relationship is not established, the flow immediately exits the interrupt routine; if the predetermined relationship between the flag and the word is established, the control processing unit 121 determines the next action specified in the corresponding cell of the table TBL1 and executes that action in step S206, after which the flow exits the interrupt routine.
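 The speech-first routine of FIG. 12 (steps S201 to S206) mirrors the previous sketch with the roles of the two inputs swapped; again, the helper names are assumptions, and WINDOW_SECONDS and lookup_action come from the sketches above.

```python
def speech_first_routine(recognize_word, prescribed_sensor_flag, execute):
    """Hypothetical sketch of the FIG. 12 interrupt routine."""
    word = recognize_word()                     # S201: prescribed speech recognized?
    if word is None:
        return                                  # no recognition -> exit immediately
    start = time.monotonic()                    # S202: reset and start the timer
    while time.monotonic() - start <= WINDOW_SECONDS:    # S204: still within 1 second?
        flag = prescribed_sensor_flag()         # S203: prescribed sensor input?
        if flag is not None:
            action = lookup_action(flag, word)  # S205: collate flag and word via TBL1
            if action is not None:
                execute(action)                 # S206: decide and execute next action
            return
    # timer expired before the prescribed sensor input -> exit without action
```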
 To give a concrete example, suppose that a display requesting confirmation, as shown in FIG. 13, is made on the image display unit 104B viewed by the user US, together with a "Yes" button B1 and a "No" button B2. For example, when the "up-down" flag is set by the sensor input in step S101 (or S203) and the word in the speech recognition result in step S103 (or S201) is "yes", collation against the table TBL1 determines that the predetermined relationship is established, the next action becomes "confirm", and the control processing unit 121 accordingly turns on (highlights) the button B1. The confirmation is thereby affirmed. On the other hand, when the "left-right" flag is set by the sensor input in step S101 (or S203) and the word in the speech recognition result in step S103 (or S201) is "no", collation against the table TBL1 determines that the predetermined relationship is established, the next action becomes "cancel", and the control processing unit 121 accordingly turns on (highlights) the button B2. The confirmation is thereby denied (cancelled).
 For a combination that has no association in the table TBL1 (for example, when the "left-right" flag is set by the sensor input but the speech recognition result is "yes"), collation against the table TBL1 shows that the predetermined relationship between the sensor input result and the speech recognition result is not established (the cell is marked "×"), so the routine exits without determining a next action and neither button B1 nor B2 is turned on. In this way, the speech recognition result of the user US's utterance can be corroborated by the user US's head motion, and the appropriate action desired by the user US can be determined, so erroneous operation can be effectively prevented.
 In the example shown in FIG. 14, page 23 of a 104-page document is displayed on the image display unit 104B viewed by the user US. For example, when the "right" flag is set by the sensor input in step S101 (or S203) and the word in the speech recognition result in step S103 (or S201) is "page", the control processing unit 121 turns to the next page (page 24) as the next action. On the other hand, when the "left" flag is set by the sensor input in step S101 (or S203) and the word in the speech recognition result in step S103 (or S201) is "page", the control processing unit 121 turns to the previous page (page 22) as the next action.
 Further, when the "right" flag is set by the sensor input in step S101 (or S203) and the word in the speech recognition result in step S103 (or S201) is "four pages", the control processing unit 121 advances the display by four pages as the next action. On the other hand, when the "left" flag is set by the sensor input in step S101 (or S203) and the word in the speech recognition result in step S103 (or S201) is "four pages", the control processing unit 121 moves the display back by four pages as the next action.
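 A minimal sketch of the page-turning actions described above, assuming a simple viewer that clamps to the document bounds; the class and method names are invented for illustration and are not part of the patent.

```python
class DocumentViewer:
    """Hypothetical page model for the FIG. 14 example (104 pages, page 23 shown)."""

    def __init__(self, total_pages: int = 104, current_page: int = 23):
        self.total_pages = total_pages
        self.current_page = current_page

    def turn(self, delta: int) -> int:
        """Move by delta pages, clamped to the valid range, and return the new page."""
        self.current_page = max(1, min(self.total_pages, self.current_page + delta))
        return self.current_page

viewer = DocumentViewer()
PAGE_ACTIONS = {
    "next_page": +1, "previous_page": -1,
    "forward_4_pages": +4, "back_4_pages": -4,
}

def execute(action: str) -> None:
    """Dispatch the next action decided from TBL1; only page actions are modeled here."""
    if action in PAGE_ACTIONS:
        viewer.turn(PAGE_ACTIONS[action])
```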
 Note that when the "up" flag is set by the sensor input and the speech recognition result is "up", the next action is, for example, a transition to the menu screen; when the "down" flag is set by the sensor input and the speech recognition result is "down", the next action is, for example, a transition to the settings screen. Similarly, when the "right" or "left" flag is set by the sensor input and the speech recognition result is "right" or "left", a predefined action can be executed.
 In contrast, for a combination that has no association in the table TBL1 (for example, when the "up" flag is set by the sensor input but the speech recognition result is "page"), collation against the table TBL1 shows that the predetermined relationship between the sensor input result and the speech recognition result is not established (the cell is marked "×"), so the routine exits without determining a next action and no action unwanted by the user US is performed.
 As a modification of the above, the control processing unit 121 may, between step S205 and step S206, display on the image display unit 104B a confirmation asking whether the page may be turned (see FIG. 13), and then perform the page turn as the next action in response to the speech recognition result "yes" from the user US's utterance and the input of the user US's nodding motion, as described above.
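 This modification could be sketched as an extra confirmation pass inserted before the page-turn action is executed. The wait_for_pair and show_confirmation helpers below are assumptions, and the check reuses the nod-plus-"yes" pairing from the TBL1 sketch.

```python
def execute_with_confirmation(action: str, show_confirmation, wait_for_pair) -> None:
    """Hypothetical variant: confirm page turns before executing them (FIG. 13 dialog)."""
    if action in PAGE_ACTIONS:
        show_confirmation("Turn the page?")             # shown between S205 and S206
        # wait_for_pair is assumed to return (flag, word), or (None, None) on timeout
        flag, word = wait_for_pair(timeout=WINDOW_SECONDS)
        if lookup_action(flag, word) != "confirm":      # expects nod ("up-down") + "yes"
            return                                      # not confirmed -> do nothing
    execute(action)
```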
 Next, an example will be described in which the proximity sensor 105, which can detect the movement of the hand HD of the user US, is used as the motion detection unit that detects the user's motion. FIG. 15 shows a state in which the user US, while wearing the HMD 100, holds the hand HD to the right in front of the face, and FIG. 16 shows a state in which the user US, while wearing the HMD 100, holds the hand HD to the left in front of the face. Further, FIG. 17 shows a state in which the user US, while wearing the HMD 100, holds the hand HD up in front of the face, and FIG. 18 shows a state in which the user US, while wearing the HMD 100, holds the hand HD down in front of the face. The proximity sensor 105 has, for example, a detection region divided into four parts and can output a gesture signal that distinguishes in which of the positions shown in FIGS. 15 to 18 the hand HD is located.
 FIG. 19 shows an example of the table TBL2 stored, for example, in the RAM 126. When the control processing unit 121 receives the gesture signal output from the proximity sensor 105, it sets a flag based on that signal as follows: "up" when it determines that the hand HD of the user US moved above the face as shown in FIG. 17 and did not subsequently move downward; "down" when it determines that the hand HD moved below the face as shown in FIG. 18 and did not subsequently move upward; "right" when it determines that the hand HD moved to the right of the face as shown in FIG. 15 and did not subsequently move to the left; "left" when it determines that the hand HD moved to the left of the face as shown in FIG. 16 and did not subsequently move to the right; "up-down" when it determines that the hand HD is moving up and down between the positions of FIGS. 17 and 18; and "left-right" when it determines that the hand HD is moving left and right between the positions of FIGS. 15 and 16. When one of these flags is set, the control processing unit 121 treats it as the prescribed sensor input and executes the control (see FIG. 11). The types of flags are not limited to the above.
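 One way to picture the flag assignment from the four-region gesture signal is the following sketch; the signal format (an ordered list of detected regions) and the function name are assumptions, since the patent only describes the resulting behavior.

```python
def classify_gesture(regions: list[str]) -> str | None:
    """Hypothetical mapping from a sequence of detected regions to a TBL2 flag.

    `regions` is the ordered list of quadrants ("up", "down", "left", "right")
    in which the proximity sensor 105 detected the hand HD.
    """
    if not regions:
        return None
    if set(regions) == {"up", "down"} and len(regions) >= 2:
        return "up-down"             # hand oscillating between FIG. 17 and FIG. 18
    if set(regions) == {"left", "right"} and len(regions) >= 2:
        return "left-right"          # hand oscillating between FIG. 15 and FIG. 16
    first = regions[0]
    opposite = {"up": "down", "down": "up", "left": "right", "right": "left"}
    if opposite.get(first) not in regions:
        return first                 # single move with no return in the opposite direction
    return None                      # ambiguous pattern -> no prescribed sensor input
```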
 In addition, when the control processing unit 121 determines, as a result of speech recognition by the speech recognition unit 111E, that one of the six words "up", "down", "right", "left", "yes", or "no" has been input, it treats this as the prescribed speech recognition and executes the control (see FIG. 12). The words are not limited to these. In the table TBL2, a combination whose cell specifies an action is treated as one for which the predetermined relationship is established, while a combination whose cell is marked "×" has no association and the predetermined relationship is not established. In this example as well, the control processing unit 121 can execute control according to the flowcharts of FIGS. 11 and 12.
 To describe this concretely with reference to FIG. 13: for example, when the "up-down" flag is set by the input of the proximity sensor 105 in step S101 (or S203) and the word in the speech recognition result in step S103 (or S201) is "yes", collation against the table TBL2 determines that the predetermined relationship is established, the next action becomes "confirm", and the control processing unit 121 accordingly turns on the button B1. The confirmation is thereby affirmed. On the other hand, when the "left-right" flag is set by the input of the proximity sensor 105 in step S101 (or S203) and the word in the speech recognition result in step S103 (or S201) is "no", collation against the table TBL2 determines that the predetermined relationship is established, the next action becomes "cancel", and the control processing unit 121 accordingly turns on the button B2. The confirmation is thereby denied (cancelled).
 For a combination that has no association in the table TBL2 (for example, when the "left-right" flag is set by the input of the proximity sensor 105 but the speech recognition result is "yes"), collation against the table TBL2 shows that the predetermined relationship between the sensor input result and the speech recognition result is not established (the cell is marked "×"), so the routine exits without determining a next action and neither button B1 nor B2 is operated. The rest is the same as in the embodiment described above.
 Although the present invention has been described above taking the HMD as an example, the present invention is not limited to the HMD and is applicable to wearable terminals in general. Moreover, the motion detection unit that detects the user's motion is not limited to the above examples; for example, it may detect the line of sight from the movement of the user's eyeballs, or detect the movement of the lips corresponding to the user's utterance.
 The present invention is not limited to the embodiments described in the specification, and it will be apparent to those skilled in the art from the embodiments and technical ideas described herein that other embodiments and modifications are also included. The description and the embodiments are for illustrative purposes only, and the scope of the present invention is indicated by the claims below.
100      HMD
101      Frame
101a     Front part
101b     Side part
101c     Side part
101d     Elongated hole
101e     Elongated hole
102      Eyeglass lens
103      Main body part
104      Display unit
104B     Image display unit
104DR    Display control unit
105      Proximity sensor
106      Camera
106a     Lens
107      Right sub-body part
107a     Protrusion
108      Left sub-body part
108a     Protrusion
109      Geomagnetic sensor
110A     Acceleration sensor
110B     Angular velocity sensor
111B     Microphone
111C     Speaker/earphone
111D     Audio processing unit
111E     Speech recognition unit
112      Illuminance sensor
113      Color temperature sensor
114      Temperature sensor
121      Control processing unit
122      Operation unit
123      Receiving unit
124      Communication unit
127      Battery
129      Storage device
130      Power supply circuit
B1, B2   Buttons
CD       Cord
CTU      Control unit
HD       Hand
HS       Wiring
TBL1, TBL2  Tables
US       User

Claims (8)

  1.  A wearable terminal worn on a user's body, the wearable terminal comprising:
     a voice input unit that inputs the user's voice;
     a voice decoding unit that converts the user's voice input by the voice input unit into a word;
     a motion detection unit that detects a motion of the user; and
     a control unit that, when the input of the user's voice by the voice input unit and the detection of the user's motion by the motion detection unit occur within a predetermined time interval, compares the word converted by the voice decoding unit with the user's motion detected by the motion detection unit and, when determining that a predetermined relationship is established between the word and the motion, determines an action corresponding to the word.
  2.  The wearable terminal according to claim 1, wherein the motion detection unit detects a movement of the user's head.
  3.  The wearable terminal according to claim 1, wherein the motion detection unit detects a movement of the user's hand.
  4.  The wearable terminal according to any one of claims 1 to 3, wherein the word has a meaning related to the motion of the user.
  5.  The wearable terminal according to any one of claims 1 to 4, wherein the control unit stores a table in which the word converted by the voice decoding unit is associated with the user's motion detected by the motion detection unit, and determines, based on the table, that the predetermined relationship is established between the word and the user's motion when the two are associated in the table.
  6.  The wearable terminal according to claim 5, wherein the association between the word and the user's motion in the table can be changed arbitrarily.
  7.  The wearable terminal according to any one of claims 1 to 6, wherein the action corresponding to the word can be changed arbitrarily.
  8.  The wearable terminal according to any one of claims 1 to 7, wherein the wearable terminal is a head mounted display worn on the head of the user.
PCT/JP2017/032781 2016-09-28 2017-09-12 Wearable terminal WO2018061743A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2016189413 2016-09-28
JP2016-189413 2016-09-28

Publications (1)

Publication Number Publication Date
WO2018061743A1 true WO2018061743A1 (en) 2018-04-05

Family

ID=61763498

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2017/032781 WO2018061743A1 (en) 2016-09-28 2017-09-12 Wearable terminal

Country Status (1)

Country Link
WO (1) WO2018061743A1 (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH08234789A (en) * 1995-02-27 1996-09-13 Sharp Corp Integrated recognition interactive device
JPH1173297A (en) * 1997-08-29 1999-03-16 Hitachi Ltd Recognition method using timely relation of multi-modal expression with voice and gesture
JP2004233909A (en) * 2003-01-31 2004-08-19 Nikon Corp Head-mounted display
JP2010511958A (en) * 2006-12-04 2010-04-15 韓國電子通信研究院 Gesture / voice integrated recognition system and method
US20110313768A1 (en) * 2010-06-18 2011-12-22 Christian Klein Compound gesture-speech commands
JP2015526753A (en) * 2012-06-15 2015-09-10 本田技研工業株式会社 Scene recognition based on depth

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
TAKAHASHI, FUMITADA ET AL.: "How to discern initiation of recognition and discard unintended actions and speech", NIKKEI ELECTRONICS, 30 April 2012 (2012-04-30), pages 48 - 49 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2022535250A (en) * 2019-06-10 2022-08-05 オッポ広東移動通信有限公司 Control method, wearable device and storage medium
JP7413411B2 (en) 2019-06-10 2024-01-15 オッポ広東移動通信有限公司 Control method, wearable device and storage medium

Similar Documents

Publication Publication Date Title
US10891953B2 (en) Multi-mode guard for voice commands
US11914835B2 (en) Method for displaying user interface and electronic device therefor
US10949057B2 (en) Position-dependent modification of descriptive content in a virtual reality environment
US20180210544A1 (en) Head Tracking Based Gesture Control Techniques For Head Mounted Displays
US9261700B2 (en) Systems and methods for performing multi-touch operations on a head-mountable device
US20150109191A1 (en) Speech Recognition
US20170115736A1 (en) Photo-Based Unlock Patterns
US11947728B2 (en) Electronic device for executing function based on hand gesture and method for operating thereof
US11073898B2 (en) IMU for touch detection
KR20220002605A (en) Control method, wearable device and storage medium
CN108369451B (en) Information processing apparatus, information processing method, and computer-readable storage medium
JP2018206080A (en) Head-mounted display device, program, and control method for head-mounted display device
WO2018061743A1 (en) Wearable terminal
CN117063142A (en) System and method for adaptive input thresholding
JP6790769B2 (en) Head-mounted display device, program, and control method of head-mounted display device
US20240046578A1 (en) Wearable electronic device displaying virtual object and method for controlling the same
US20230196765A1 (en) Software-based user interface element analogues for physical device elements
EP4369155A1 (en) Wearable electronic device and method for identifying controller by using wearable electronic device
US20230065008A1 (en) Electronic device for performing plurality of functions using stylus pen and method for operating same
JP2017157120A (en) Display device, and control method for the same
KR20220149191A (en) Electronic device for executing function based on hand gesture and method for operating thereof
KR20230063829A (en) Waearable electronic device displaying virtual object and method for controlling the same
CN115145035A (en) Head-mounted device, method for controlling head-mounted device, and recording medium
KR20230134961A (en) Electronic device and operating method thereof
JP2016212769A (en) Display device, control method for the same and program

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17855700

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 17855700

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: JP