JP2019220145A

JP2019220145A - Operation terminal, voice input method, and program

Info

Publication number: JP2019220145A
Application number: JP2019042991A
Authority: JP
Inventors: Kohei Tahara; Yusaku Ota; Hiroko Sugimoto
Original assignee: Panasonic Intellectual Property Corp of America
Current assignee: Panasonic Intellectual Property Corp of America
Priority date: 2018-06-13
Filing date: 2019-03-08
Publication date: 2019-12-26

Abstract

PROBLEM TO BE SOLVED: To put an operation terminal into a state in which it can accept voice input without troubling the user.

SOLUTION: An operation terminal comprises: an imaging unit that images a space; a person detection unit that detects a user from information on the imaged space; a voice input unit that accepts input of a voice uttered by the user; a coordinate detection unit that, when the person detection unit detects the user, detects, on the basis of information obtained by predetermined means, first coordinates of a predetermined first part included in an upper limb of the user and second coordinates of a predetermined second part included in the upper body of the user excluding the upper limb; and a condition determination unit that compares the positional relationship between the first coordinates and the second coordinates and puts the voice input unit into a state in which voice input can be accepted when the positional relationship satisfies a predetermined condition at least once.

SELECTED DRAWING: Figure 4

Description

The present disclosure relates to an operation terminal operated by a user's uttered voice, a voice input method for the operation terminal, and a program for causing a computer to execute the voice input method.

When a user operates a specific terminal by voice, the terminal must collect the user's voice. Methods for doing so fall broadly into two types. In the first, the terminal determines from a user operation that voice input is starting and then begins collecting sound. In the second, the terminal collects sound constantly and extracts speech from the collected sound. With the second method, the user feels that the terminal is always collecting sound and may therefore worry about privacy leaks. A method that collects sound only when the user has indicated an intention to perform voice input, as in the first method, is therefore effective.

In recent years, techniques for giving instructions to a robot by detecting a user's gesture have also become known. For example, Patent Document 1 discloses a pointing position detection device that, in order to allow a pointing motion to be made in a natural state and to detect the pointed position with high accuracy, detects the position of a person's head, the position of the hand tip, and the orientation of the hand from images captured by a plurality of cameras, detects the direction in which the person is pointing on the basis of these detection results, and detects the position the person is pointing at from the detected direction.

Patent Document 2 discloses a gesture management system that, in order to appropriately recognize an arm gesture performed at an arbitrary position, identifies, from among a plurality of range image sensors, the range image sensor that was able to appropriately recognize the arm gesture, and registers the gesture recognized using the identified range image sensor.

Patent Document 1: Japanese Patent No. 4149213
Patent Document 2: Japanese Patent No. 6303918

However, Patent Documents 1 and 2 require the user to perform a troublesome gesture, such as pointing an arm precisely in a specific direction in space, and further improvement is needed.

An object of the present disclosure is to provide an operation device and the like that put an operation terminal into a state in which voice input can be accepted without troubling the user.

An operation terminal according to one aspect of the present disclosure is an operation terminal operated by a user's uttered voice, the operation terminal comprising:
an imaging unit that images a space;
a person detection unit that detects the user from information on the imaged space;
a voice input unit that accepts input of the voice uttered by the user;
a coordinate detection unit that, when the person detection unit detects the user, detects, on the basis of information obtained by predetermined means, first coordinates of a predetermined first part included in an upper limb of the user and second coordinates of a predetermined second part included in the upper body of the user excluding the upper limb; and
a condition determination unit that compares the positional relationship between the first coordinates and the second coordinates and, when the positional relationship satisfies a predetermined first condition at least once, puts the voice input unit into a state in which voice input can be accepted.

According to the present disclosure, the operation terminal can be put into a state in which voice input can be accepted without troubling the user.

FIG. 1 is a diagram illustrating an example of the positional relationship between an operation terminal and a user according to Embodiment 1 of the present disclosure.
FIG. 2 is a diagram illustrating an example of the external configuration of the operation terminal.
FIG. 3 is a diagram illustrating an example of skeleton information of a user measured by an imaging device.
FIG. 4 is a block diagram illustrating an example of the configuration of the operation terminal according to Embodiment 1 of the present disclosure.
FIG. 5 is a flowchart illustrating an example of the processing of a start condition determination unit according to Embodiment 1 of the present disclosure.
FIG. 6 is a flowchart illustrating an example of the processing of a management unit according to the embodiment of the present disclosure.
FIG. 7 is a diagram illustrating skeleton information of a user for explaining the start condition.
FIG. 8 is a diagram illustrating an example of a gesture possible range.
FIG. 9 is a diagram illustrating a case where a plurality of users gesture toward the operation terminal.
FIG. 10 is a diagram illustrating a first example of state notification.
FIG. 11 is a diagram illustrating a second example of state notification.
FIG. 12 is a diagram illustrating a third example of state notification.
FIG. 13 is a diagram illustrating a fourth example of state notification.
FIG. 14 is a diagram illustrating a fifth example of state notification.
FIG. 15 is a block diagram of the operation terminal in which the display device and the playback device illustrated in FIGS. 10 to 14 are added to the block diagram of the operation terminal illustrated in FIG. 4.
FIG. 16 is a block diagram illustrating an example of the configuration of an operation terminal according to Embodiment 2.
FIG. 17 is a flowchart illustrating an example of the processing of an end condition determination unit according to Embodiment 2 of the present disclosure.
FIG. 18 is a diagram illustrating skeleton information of a user for explaining the end condition.
FIG. 19 is a flowchart illustrating an example of the processing of a timeout determination unit according to Embodiment 2 of the present disclosure.
FIG. 20 is a flowchart illustrating an example of the processing of a management unit according to Embodiment 2 of the present disclosure.
FIG. 21 is a diagram illustrating an example of a configuration in which the imaging device, the playback device, and the display device are devices separate from the operation terminal 1.

(Knowledge underlying the present disclosure)
The present inventors are studying, for example, an operation terminal that is propped against a wall inside a house and operates various electric appliances installed in the house by recognizing the user's voice. Such an operation terminal must recognize that the user is performing, or is about to perform, voice input. Most voice-operable terminals are configured to collect sound constantly so that a specific phrase can be recognized at any time, and to begin recognizing phrases other than the specific phrase once the specific phrase has been recognized. With this configuration, however, sound is collected constantly, so the user may worry about, for example, invasion of privacy. A mechanism is therefore needed that determines the user's intention to start voice input without constantly collecting sound.

Furthermore, if a configuration that requires the user to utter a specific phrase is applied as-is to the operation terminal, the user must utter the specific phrase every time a home appliance is operated, and must do so even while already facing the operation terminal. This is troublesome and feels unnatural to the user.

On the other hand, as in Patent Documents 1 and 2 above, there are techniques that operate a device such as a robot using some gesture by the user as a trigger.

However, the gestures detected in Patent Document 1 are gestures for having a robot pick up an object or for moving the robot; they are not gestures for indicating an intention to start voice recognition. Patent Document 1 therefore requires the user to point an arm in a specific direction in space. Consequently, if the technique of Patent Document 1 is applied to the operation terminal, the user must go to the trouble of pointing an arm in a specific direction just to start voice recognition.

Patent Document 2 is a technique for managing arm gestures performed by users at arbitrary positions in spaces such as shopping malls, museums, and exhibition halls; it is not a technique for managing gestures that indicate an intention to start voice recognition. Moreover, the gestures managed in Patent Document 2 are gestures of pointing an arm at an object such as an exhibit, so the direction of the arm matters, and gestures with different arm directions are judged to be different gestures. Consequently, if the technique of Patent Document 2 is applied as-is to the operation terminal, the user must perform a gesture with the arm pointing in the same direction as a managed gesture, which is troublesome for the user. In addition, if voice recognition is to be started in Patent Document 2 with a simple gesture that does not strictly depend on arm direction, the user must register in advance a wide variety of gestures with different arm directions that the user wants to use to start voice recognition, which is again troublesome.

The present inventors therefore found that a simple gesture that does not depend on the exact orientation of the arm is effective for starting voice recognition without troubling the user, and arrived at the present disclosure.

According to this configuration, the voice input unit is put into the state in which voice input can be accepted when the positional relationship between the first coordinates of the first part, which is included in the user's upper limb, and the second coordinates of the second part, which is included in the user's upper body excluding the upper limb, satisfies the predetermined first condition. This configuration can therefore make the voice input unit ready to accept voice input by having the user perform a simple gesture that does not depend on the direction of the arm, such as raising an arm slightly above the neck. As a result, the operation terminal can be put into a state in which voice input can be accepted without troubling the user.

In the above aspect, the operation terminal may further include a skeleton information extraction unit that extracts skeleton information of the user from the information on the space, and the information obtained by the predetermined means may be the skeleton information.

According to this aspect, the first coordinates and the second coordinates are detected on the basis of the skeleton information of the user, so the first coordinates and the second coordinates can be detected accurately.

In the above aspect, the imaging unit may be a visible light camera, an infrared camera, a TOF sensor, an ultrasonic sensor, or a radio wave sensor.

According to this configuration, the imaging unit is a visible light camera, an infrared camera, a TOF sensor, an ultrasonic sensor, or a radio wave sensor, so the spatial information includes distance information and a user present in the surrounding space can be detected accurately.

In the above aspect, the positional relationship may be the positional relationship between the first coordinates and the second coordinates in the vertical direction.

According to this configuration, the terminal enters the state in which voice input can be accepted (the receivable state) when the vertical positional relationship between the first coordinates and the second coordinates satisfies the first condition. The user can therefore bring about the receivable state with a simple gesture such as raising an upper limb in the vertical direction.
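The vertical comparison described above can be sketched as follows. This is a minimal illustration only: the function name, the convention that the second tuple element is the vertical axis pointing upward, the `margin` parameter, and the sample coordinates are all assumptions and do not come from the disclosure.

```python
from typing import Tuple

# (x, y, z) in metres; y is ASSUMED to be the vertical axis, pointing upward.
Point3D = Tuple[float, float, float]


def meets_vertical_condition(first: Point3D, second: Point3D,
                             margin: float = 0.0) -> bool:
    """Return True when the first coordinates are more than `margin`
    above the second coordinates in the vertical direction."""
    return first[1] - second[1] > margin


# Hypothetical example: a wrist raised slightly above the base of the neck.
wrist = (0.30, 1.45, 2.00)
neck_base = (0.00, 1.40, 2.00)
start_listening = meets_vertical_condition(wrist, neck_base)
```

A positive `margin` would require the upper limb to be raised clearly above the reference part before the condition is treated as met.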

In the above aspect, the positional relationship may be the positional relationship between the first coordinates and the second coordinates in the direction of the user's trunk axis.

According to this configuration, the receivable state is entered when the positional relationship between the first coordinates and the second coordinates satisfies the first condition, so the user can bring about the receivable state with a simple gesture such as raising an upper limb along the trunk axis. In addition, because the positional relationship is judged relative to the trunk axis, the user can bring about the receivable state by raising an upper limb along the trunk axis regardless of the current posture, for example lying down or standing.
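One way to evaluate the relationship along the trunk axis is to project the offset between the two coordinates onto a unit vector along the trunk. The sketch below assumes the trunk axis runs from a lower trunk point (for example the waist) to an upper trunk point (for example the base of the neck); that choice, and all names, are illustrative assumptions rather than details given in the disclosure.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]


def _sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def offset_along_trunk(first: Vec3, second: Vec3,
                       trunk_low: Vec3, trunk_high: Vec3) -> float:
    """Signed distance of `first` beyond `second`, measured along the
    trunk axis (ASSUMED here to run from trunk_low to trunk_high)."""
    axis = _sub(trunk_high, trunk_low)
    length = math.sqrt(_dot(axis, axis))
    if length == 0.0:
        raise ValueError("trunk axis has zero length")
    return _dot(_sub(first, second), axis) / length


# Hypothetical user lying down: the trunk runs along the x axis, and the
# wrist is extended past the neck in the head direction.
waist, neck_base, wrist = (0.0, 0.2, 2.0), (0.5, 0.2, 2.0), (0.7, 0.2, 2.0)
raised = offset_along_trunk(wrist, neck_base, waist, neck_base) > 0.0
```

Because the offset is measured relative to the body rather than to gravity, the same test applies whether the user is standing or lying down.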

In the above aspect, the coordinate detection unit may further detect third coordinates of a third part in the upper body, and the first condition may be that the angle formed by the first coordinates, the second coordinates, and the third coordinates exceeds a predetermined threshold, falls below the predetermined threshold, or falls within a predetermined range.

According to this configuration, the third coordinates of the third part in the upper body are additionally detected, and the positional relationship is determined to satisfy the first condition when the angle formed by the first, second, and third coordinates exceeds the predetermined threshold, falls below it, or falls within the predetermined range. The user can therefore bring about the receivable state with, for example, a gesture in which an upper limb forms a predetermined angle with the trunk axis.
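The angle test can be sketched as below. The passage does not say which of the three coordinates is the vertex of the angle; this sketch assumes the vertex is the second coordinates, and the function names and the `lower`/`upper` parameters are hypothetical.

```python
import math
from typing import Optional, Tuple

Vec3 = Tuple[float, float, float]


def angle_deg(first: Vec3, second: Vec3, third: Vec3) -> float:
    """Angle in degrees at `second` (ASSUMED vertex) between the rays
    toward `first` and toward `third`."""
    u = tuple(f - s for f, s in zip(first, second))
    v = tuple(t - s for t, s in zip(third, second))
    nu = math.sqrt(sum(c * c for c in u))
    nv = math.sqrt(sum(c * c for c in v))
    if nu == 0.0 or nv == 0.0:
        raise ValueError("coincident points: angle is undefined")
    cos = sum(a * b for a, b in zip(u, v)) / (nu * nv)
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, cos))))


def angle_condition(angle: float, lower: Optional[float] = None,
                    upper: Optional[float] = None) -> bool:
    """lower only: angle exceeds the threshold; upper only: angle is below
    the threshold; both: angle falls within the range."""
    if lower is not None and angle <= lower:
        return False
    if upper is not None and angle >= upper:
        return False
    return True
```

For example, `angle_condition(angle_deg(wrist, neck_base, waist), lower=100.0)` would express "the arm is raised more than 100 degrees away from the trunk", where the three part names and the threshold are placeholders.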

In the above aspect, the first part may include a plurality of parts included in the upper limb, and the first coordinates may be determined on the basis of the coordinates of one or more of the plurality of parts.

According to this configuration, the first coordinates are determined on the basis of the coordinates of the parts that make up the first part, so the first coordinates can be determined flexibly.

In the above aspect, the second part may include a plurality of parts included in the upper body excluding the upper limb, and the second coordinates may be determined on the basis of the coordinates of one or more of the plurality of parts.

According to this configuration, the second coordinates are determined on the basis of the coordinates of the parts that make up the second part, so the second coordinates can be determined flexibly.
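The passage leaves open how several part coordinates are combined into a single first or second coordinate. One simple possibility, shown here purely as an assumption, is to average whichever of the selected parts were actually detected; the part names and the function name are hypothetical.

```python
from typing import Dict, Iterable, Tuple

Point3D = Tuple[float, float, float]


def representative_coordinate(parts: Dict[str, Point3D],
                              names: Iterable[str]) -> Point3D:
    """Average the coordinates of the selected parts that are present.

    Averaging is an ASSUMED combination rule; taking the highest part or
    a single preferred part would fit the description equally well."""
    selected = [parts[name] for name in names if name in parts]
    if not selected:
        raise ValueError("none of the requested parts were detected")
    n = len(selected)
    return (sum(p[0] for p in selected) / n,
            sum(p[1] for p in selected) / n,
            sum(p[2] for p in selected) / n)


# Hypothetical detected parts of one upper limb.
detected = {"wrist": (0.0, 1.0, 2.0), "elbow": (2.0, 3.0, 2.0)}
first_coordinate = representative_coordinate(detected, ["wrist", "elbow"])
```

The same helper could be reused for the second coordinates by passing trunk parts instead of upper-limb parts.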

In the above aspect, the first condition may include a plurality of second conditions, and the condition determination unit may bring about the receivable state when the positional relationship satisfies at least one of the plurality of second conditions, or a third condition formed by combining some of the plurality of second conditions.

According to this configuration, whether the positional relationship satisfies the first condition can be determined flexibly.
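The two ways of evaluating the first condition can be sketched as follows, assuming each second condition has already been evaluated to a boolean. Interpreting "combining" as a logical AND over the chosen subset is an assumption, as are the condition names.

```python
from typing import Dict, Iterable, Optional


def first_condition_met(results: Dict[str, bool],
                        required: Optional[Iterable[str]] = None) -> bool:
    """Evaluate the first condition from named second-condition results.

    required=None : satisfied when at least one second condition holds.
    required=[...]: satisfied when every listed second condition holds
                    (ASSUMED reading of the combined "third condition")."""
    if required is None:
        return any(results.values())
    return all(results[name] for name in required)


# Hypothetical second conditions evaluated for the current frame.
frame = {"wrist_above_neck": True, "arm_angle_in_range": False}
loose = first_condition_met(frame)
strict = first_condition_met(frame, ["wrist_above_neck", "arm_angle_in_range"])
```

Here `loose` accepts the gesture because one second condition holds, while `strict` rejects it because the combined condition requires both.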

In the above aspect, the operation terminal may further include a display unit or a playback unit that outputs information indicating whether the voice input unit is in the receivable state.

According to this configuration, the user can be notified visually or audibly of whether the voice input unit is in the receivable state.

In the above aspect, the display unit may be a display.

According to this configuration, the user can be notified via the display of whether the voice input unit is in the receivable state.

In the above aspect, the information indicating whether the voice input unit is in the receivable state may be a color, text, or an icon.

According to this configuration, the user can be notified of whether the voice input unit is in the receivable state by means of a color, text, or an icon.

In the above aspect, the display unit may be a light emitting device that emits light indicating that the voice input unit is in the receivable state.

According to this configuration, the user can be notified of whether the voice input unit is in the receivable state by the light emitted from the light emitting device.

In the above aspect, the playback unit may output speech indicating whether the voice input unit is in the receivable state.

According to this configuration, the user can be notified by speech of whether the voice input unit is in the receivable state.

In the above aspect, the playback unit may output a sound indicating whether the voice input unit is in the receivable state.

According to this configuration, the user can be notified by a sound of whether the voice input unit is in the receivable state.

In the above aspect, the condition determination unit may compare the positional relationship only when the distance between the operation terminal and the user satisfies a predetermined fourth condition.

According to this configuration, the positional relationship between the first coordinates and the second coordinates is compared only when the distance between the operation terminal and the user satisfies the predetermined fourth condition. This prevents the comparison from being executed for users who have no intention of operating the operation terminal and thus reduces processing cost.
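A distance gate of this kind can be sketched as below. The passage does not state what the fourth condition is; treating it as a minimum and maximum distance from the terminal, with the terminal at the origin, is an assumption, and the numeric defaults are placeholders.

```python
import math
from typing import Tuple

Point3D = Tuple[float, float, float]


def within_gesture_range(user_pos: Point3D,
                         terminal_pos: Point3D = (0.0, 0.0, 0.0),
                         min_distance: float = 0.0,
                         max_distance: float = 3.0) -> bool:
    """ASSUMED form of the fourth condition: the user's distance from the
    terminal lies within [min_distance, max_distance] metres."""
    distance = math.dist(user_pos, terminal_pos)
    return min_distance <= distance <= max_distance


def maybe_compare(user_pos: Point3D, compare) -> bool:
    """Run the (possibly costly) positional comparison only for users
    inside the range; otherwise skip it and report 'not satisfied'."""
    if not within_gesture_range(user_pos):
        return False
    return compare()
```

`compare` stands for whatever check of the first and second coordinates is in use; it is passed as a callable so that it is never evaluated for users outside the range.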

In the above aspect, the condition determination unit may end the receivable state when a silent period continues for a fixed time in the receivable state.

According to this configuration, the receivable state ends when a silent period continues for a fixed time, which prevents the receivable state from continuing even though the user has no intention of operating the operation terminal. As a result, the user's privacy can be protected.

In the above aspect, the condition determination unit may continue the receivable state for as long as the positional relationship satisfies the first condition in the receivable state.

According to this configuration, the receivable state continues for as long as the positional relationship satisfies the first condition, so the user can indicate an intention to operate the operation terminal by voice by maintaining a gesture whose positional relationship satisfies the first condition.

In the above aspect, the condition determination unit may end the receivable state when, in the receivable state, a state in which the positional relationship does not satisfy the first condition continues for a predetermined timeout period.

According to this configuration, the receivable state ends when the positional relationship has failed to satisfy the first condition for the duration of the timeout period, which prevents the receivable state from continuing even though the user has no intention of operating the operation terminal. As a result, the user's privacy can be protected.

In the above aspect, the condition determination unit may extend the timeout period when it determines, during the timeout period, that the positional relationship satisfies the first condition.

According to this configuration, the user can keep the terminal in the receivable state by again performing, during the timeout period, a gesture whose positional relationship satisfies the first condition.

In the above aspect, the condition determination unit may continue the receivable state if voice input is detected at the end of the timeout period.

According to this aspect, even if the positional relationship has failed to satisfy the first condition throughout the timeout period, the receivable state continues if voice input is detected at the end of the timeout period. This prevents the receivable state from ending while the user is still speaking to operate the operation terminal.
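The timeout-related aspects above can be combined into a small state holder, sketched below. The class name, the five-second default, and in particular the choice to restart a full timeout period when voice is detected at expiry are assumptions; the passage only states that the receivable state continues in that case. The silent-period rule is omitted to keep the sketch short.

```python
from typing import Optional


class ReceivableState:
    """Tracks whether voice input is currently accepted (hypothetical sketch)."""

    def __init__(self, timeout_s: float = 5.0) -> None:
        self.timeout_s = timeout_s
        self.accepting = False
        self.deadline: Optional[float] = None

    def update(self, now: float, condition_met: bool,
               voice_detected: bool = False) -> bool:
        if condition_met:
            # Gesture present: start accepting, or continue and extend
            # the timeout period.
            self.accepting = True
            self.deadline = now + self.timeout_s
        elif self.accepting and now >= self.deadline:
            if voice_detected:
                # User is still speaking at expiry: keep accepting.
                # Restarting a full period here is an ASSUMPTION.
                self.deadline = now + self.timeout_s
            else:
                self.accepting = False
                self.deadline = None
        return self.accepting
```

Calling `update` once per processed frame with the current time, the result of the first-condition check, and a voice-activity flag yields the current accept/reject decision.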

In the above aspect, the condition determination unit may end the receivable state when the positional relationship satisfies a predetermined fifth condition different from the first condition.

According to this configuration, the user can end the receivable state by performing a gesture whose positional relationship satisfies the fifth condition.

In the above aspect, when the person detection unit detects a plurality of users, the condition determination unit may recognize one specific user as the operator of the operation terminal.

According to this configuration, when the person detection unit detects a plurality of users, one specific user is recognized as the operator of the operation terminal, so that when several users are around the operation terminal, the right to operate it can be given to a single user. As a result, the operator's utterances concerning operations can be recognized accurately.

In the above aspect, the operator may be the user closest to the operation terminal among the plurality of users.

According to this configuration, when the person detection unit detects a plurality of users, the user closest to the operation terminal is identified as the operator, so a single operator can be identified from among the plurality of users by simple processing.
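Selecting the nearest user as the operator can be sketched in a few lines. The user identifiers, the use of one representative position per user, and the terminal being at the origin are illustrative assumptions.

```python
import math
from typing import Dict, Optional, Tuple

Point3D = Tuple[float, float, float]


def select_operator(user_positions: Dict[str, Point3D],
                    terminal_pos: Point3D = (0.0, 0.0, 0.0)) -> Optional[str]:
    """Return the id of the detected user closest to the terminal,
    or None when no user was detected."""
    if not user_positions:
        return None
    return min(user_positions,
               key=lambda uid: math.dist(user_positions[uid], terminal_pos))


# Hypothetical positions (metres) of two detected users.
operator = select_operator({"U1": (0.0, 0.0, 1.0), "U2": (0.5, 0.0, 2.5)})
```

Only the selected user's coordinates would then be passed to the positional-relationship check.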

The present disclosure can also be realized as a program that causes a computer to execute the characteristic components of such an operation terminal, or as a voice input method that operates according to that program. Needless to say, such a program can be distributed via a non-transitory computer-readable recording medium such as a CD-ROM or via a communication network such as the Internet.

Each of the embodiments described below shows one specific example of the present disclosure. The numerical values, shapes, components, steps, order of steps, and the like shown in the following embodiments are examples and are not intended to limit the present disclosure. Among the components in the following embodiments, those not recited in the independent claims, which represent the broadest concept, are described as optional components. The contents of all the embodiments can also be combined with one another.

(Embodiment 1)
FIG. 1 is a diagram illustrating an example of the positional relationship between the operation terminal 1 and the user U1 according to Embodiment 1 of the present disclosure. The operation terminal 1 is a device that is installed, for example, in a building such as the house where the user U1 lives, and that accepts operations from the user U1 by collecting the voice uttered by the user U1 and performing voice recognition on it. The operations accepted by the operation terminal 1 are, for example, operations on electric appliances installed in the building and operations on the operation terminal 1 itself. The electric appliances are, for example, household appliances such as washing machines, refrigerators, microwave ovens, and air conditioners, and AV devices such as televisions, audio devices, and recorders. To operate a certain electric appliance, the user U1 approaches the operation terminal 1 and speaks to operate that appliance. The operation terminal 1 then performs voice recognition on the uttered voice, determines the electric appliance to be operated and the operation to be performed on it, and transmits a control command corresponding to the operation to the electric appliance to be operated. The operation terminal 1 is communicably connected to the electric appliances, wirelessly or by wire, via a network. The network is, for example, a wireless LAN or a wired LAN. The network may include the Internet.

FIG. 2 is a diagram illustrating an example of the external configuration of the operation terminal 1. As illustrated in FIG. 2, the operation terminal 1 includes an imaging device 301 (an example of an imaging unit) and a sound collection device 307 (an example of a voice input unit). The imaging device 301 may have a person detection function for detecting that the user U1 is present around the operation terminal 1, a position detection function for detecting the position of the user U1 in the space, and a skeleton detection function for detecting skeleton information 201 of the user U1 as illustrated in FIG. 3. The sound collection device 307 has a function of collecting the voice uttered by the user U1 toward the operation terminal 1.

FIG. 3 is a diagram illustrating an example of the skeleton information 201 of the user U1 measured by the imaging device 301. The skeleton information 201 includes part coordinates 202, which contain the three-dimensional coordinates in space of each of a plurality of parts of the user U1, and links 203, which connect the part coordinates 202 along the body of the user U1. The part coordinates include the coordinates of joints such as the wrists, elbows, and shoulders, and the coordinates of body extremities such as the fingertips, toes, and head. The part coordinates may also include the coordinates of characteristic body parts other than joints and extremities, such as the center of the chest and the navel.

In the example of FIG. 3, the following are adopted as the part coordinates 202, in order from the top: the top of the head, the junction of the neck and the face (neck tip), the junction of the neck and the torso (neck base), the left and right elbows, the left and right wrists, the waist, the left and right knees, and the left and right ankles.

The three-dimensional coordinates indicating the part coordinates 202 are defined, for example, in an orthogonal coordinate system set with reference to the operation terminal 1, an orthogonal coordinate system set with reference to the user U1, a polar coordinate system set with reference to the operation terminal 1, or a polar coordinate system set with reference to the user U1. This is only an example, however, and the coordinate system defining the three-dimensional coordinates is not limited to these. Each link 203 is, for example, a three-dimensional vector connecting two of the part coordinates 202.
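As an illustration only, the skeleton information 201 described above could be held in a structure like the following Python sketch. The class name SkeletonInfo, the part names, the terminal-referenced orthogonal coordinate system, and the use of meters are all assumptions made for this example; the disclosure does not prescribe any particular data layout.

```python
from dataclasses import dataclass, field

# Hypothetical representation: (x, y, z) in meters, orthogonal coordinate
# system referenced to the operation terminal (an assumption of this sketch).
Point3 = tuple[float, float, float]


@dataclass
class SkeletonInfo:
    # Part coordinates 202: part name -> three-dimensional coordinates.
    parts: dict[str, Point3]
    # Links 203: pairs of part names that are connected along the body.
    links: list[tuple[str, str]] = field(default_factory=list)

    def link_vector(self, start: str, end: str) -> Point3:
        """Return the three-dimensional vector from part `start` to part `end`."""
        p, q = self.parts[start], self.parts[end]
        return (q[0] - p[0], q[1] - p[1], q[2] - p[2])


skeleton = SkeletonInfo(
    parts={"neck_base": (0.0, 0.0, 1.5), "head_top": (0.0, 0.0, 1.75)},
    links=[("neck_base", "head_top")],
)
```

Storing the links as name pairs and computing the vector on demand keeps the structure valid whichever coordinate system is chosen.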

The configuration of the imaging device 301 is not particularly limited as long as it has a function of acquiring information on the surrounding space. For example, the imaging device 301 is configured by a sensor that measures spatial information indicating three-dimensional information on the space around the operation terminal 1, such as a visible light camera, an infrared camera, a TOF sensor, an ultrasonic sensor, or a radio wave sensor. The imaging device 301 may realize the person detection function and the skeleton information detection function by combining any two or more of a visible light camera, an infrared camera, a TOF sensor, an ultrasonic sensor, and a radio wave sensor.

The visible light camera is, for example, a color or monochrome camera. The infrared camera measures the reflection time of emitted infrared light for each of a plurality of pixels. The TOF (Time of Flight) sensor measures the reflection time of emitted pulsed light for each of a plurality of pixels. The ultrasonic sensor is, for example, an ultrasonic sensor array. The radio wave sensor is, for example, a radio wave sensor array.

FIG. 4 is a block diagram illustrating an example of the configuration of the operation terminal 1 according to Embodiment 1 of the present disclosure.

The operation terminal 1 includes a processor 300, the imaging device 301, the sound collection device 307, a collected sound recording unit 308, and a memory 309. The processor 300 is configured by an electronic circuit such as a CPU, and includes a person detection unit 302, a skeleton information extraction unit 303, a gesture extraction unit 304, a start condition determination unit 305, and a management unit 306. The memory 309 includes the collected sound recording unit 308.

The imaging device 301 acquires spatial information at, for example, a predetermined frame rate, and outputs it to the person detection unit 302 and the skeleton information extraction unit 303. The spatial information is, for example, data in which a plurality of pixel data, each including RGB color components and a depth component, are arranged in a matrix. The pixel data constituting the spatial information need only include at least one of the depth component and the color components, and may include a luminance component instead of the color components.

The person detection unit 302 acquires the spatial information from the imaging device 301, detects whether or not a user is present in the space around the operation terminal 1, and outputs a detection result indicating the presence or absence of a user to the gesture extraction unit 304. Here, the person detection unit 302 may detect the user from the spatial information using any of various person detection techniques. For example, the person detection unit 302 may extract one or more objects from the spatial information and determine that a user is present if any one of those objects represents a person.

The skeleton information extraction unit 303 acquires the spatial information from the imaging device 301, extracts the skeleton information 201 of the user from the acquired spatial information, and outputs it to the gesture extraction unit 304. The skeleton information extraction unit 303 may extract the skeleton information every time it acquires spatial information, or, as described later, may extract the skeleton information of the user when triggered by a skeleton information extraction request from the gesture extraction unit 304. In the latter case, the skeleton information extraction unit 303 acquires from the gesture extraction unit 304, for example, an extraction request that includes the number of users in the spatial information and the region of the spatial information in which each user is located. This allows the skeleton information extraction unit 303 to extract the skeleton information 201 from the spatial information within the region where the user is located, which reduces the processing load compared with extracting the skeleton information 201 from the entire spatial information. When the person detection unit 302 detects a plurality of users, the skeleton information extraction unit 303 may acquire the region in which each user is located from the gesture extraction unit 304.

The skeleton information extraction unit 303 extracts the skeleton information in real time using a technique such as skeleton tracking or motion capture. When a plurality of users are present in the space, the skeleton information extraction unit 303 may extract the skeleton information 201 of each user in real time.

The gesture extraction unit 304 (an example of a coordinate detection unit) extracts the first coordinates and the second coordinates based on the detection result acquired from the person detection unit 302 and the skeleton information 201 acquired from the skeleton information extraction unit 303, and outputs gesture extraction information including the first coordinates and the second coordinates to the start condition determination unit 305.

For example, when the gesture extraction unit 304 acquires from the person detection unit 302 a detection result indicating that a user has been detected, it acquires the skeleton information from the skeleton information extraction unit 303. Alternatively, when the gesture extraction unit 304 acquires from the person detection unit 302 a detection result indicating that a person has been detected, it may output a skeleton information extraction request to the skeleton information extraction unit 303 and thereby acquire the skeleton information from the skeleton information extraction unit 303. In this case, the gesture extraction unit 304 may include in the extraction request the number of users and the regions where the users are located in the spatial information, as indicated by the detection result of the person detection unit 302, and output the request to the skeleton information extraction unit 303.

The first coordinates are the coordinates of a first part constituting an upper limb. The second coordinates are the coordinates of a second part constituting the upper body excluding the upper limbs. The upper limb refers to the portion from the shoulder joint to the fingertips. The lower limb refers to the portion from the waist to the toes. The upper body refers to the portion from the waist to the top of the head. The second part is therefore a specific part within the upper body excluding the upper limbs, that is, within the torso, neck, and face. For example, the second part is the neck tip, the neck base, or the top of the head. The first part is, for example, the wrist, the elbow, or the shoulder.

In the present embodiment, the gesture extraction unit 304 adopts, for example, any one of the wrist, the elbow, and the shoulder (for example, the wrist) as the first part. The gesture extraction unit 304 also adopts, for example, any one of the neck tip, the neck base, and the top of the head (for example, the neck base) as the second part.

This is merely an example, however. The gesture extraction unit 304 may adopt, for example, two or more of the wrist, the elbow, and the shoulder as first parts, and may adopt, for example, two or more of the neck base, the neck tip, and the top of the head as second parts. In this case, the gesture extraction unit 304 may calculate the average or the sum of the coordinates of all or some of the two or more first parts as the first coordinates. Likewise, the gesture extraction unit 304 may calculate the average or the sum of the coordinates of all or some of the two or more second parts as the second coordinates.
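The averaging variant mentioned above can be sketched as follows. This is a minimal illustration, assuming each part coordinate is an (x, y, z) tuple; the function name mean_coordinate is hypothetical and not taken from the disclosure.

```python
def mean_coordinate(points):
    """Average several (x, y, z) part coordinates into a single coordinate.

    `points` is a non-empty sequence of 3-tuples, for example the wrist and
    elbow coordinates when both are adopted as first parts.
    """
    count = len(points)
    if count == 0:
        raise ValueError("at least one part coordinate is required")
    # zip(*points) groups the x values, the y values, and the z values.
    return tuple(sum(axis) / count for axis in zip(*points))


wrist = (0.0, 0.0, 1.0)
elbow = (0.5, 1.0, 1.5)
first_coordinates = mean_coordinate([wrist, elbow])
```

Using the sum instead of the average, as the text also allows, would only scale the result; in that case any threshold compared against it would need to be scaled accordingly.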

Furthermore, the gesture extraction unit 304 may extract a third part of the upper body other than the first part and the second part. The third part is, for example, the waist, the navel, or the chest. In this case, the gesture extraction unit 304 may include the third coordinates of the third part in the gesture extraction information in addition to the first coordinates and the second coordinates.

The start condition determination unit 305 compares the positional relationship between the first coordinates and the second coordinates included in the gesture extraction information acquired from the gesture extraction unit 304, and outputs to the management unit 306 a determination result indicating whether or not the positional relationship satisfies a start condition for voice input (an example of a first condition). The start condition is, for example, a condition indicating that the user is making a predetermined gesture that expresses the intention to start voice input, such as raising an upper limb. Specifically, if the first coordinates are those of the wrist and the second coordinates are those of the neck base, one example of the start condition is that the first coordinates are located above the second coordinates in the vertical direction.

When the management unit 306 acquires from the start condition determination unit 305 a determination result indicating that the start condition is satisfied, it sets the state flag stored in the memory 309 and outputs a start instruction to the sound collection device 307, thereby placing the sound collection device 307 in a state in which voice input can be accepted (the input-ready state). On the other hand, when the management unit 306 acquires from the start condition determination unit 305 a determination result indicating that the start condition is not satisfied, it outputs an end instruction to the sound collection device 307, thereby ending the input-ready state. At this time, the management unit 306 records that the input-ready state has ended by clearing the state flag stored in the memory 309. As a result, the input-ready state continues for as long as the user keeps making a gesture that satisfies the start condition. Setting the state flag means changing its status to the input-ready state, and clearing the state flag means changing its status to a state other than the input-ready state (the standby state).
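The flag handling described above can be pictured with the following Python sketch. It is an illustration under assumptions, not the disclosed implementation: the class names ManagementUnit and RecordingMic, the method names, and the choice to send an instruction only when the state actually changes are all hypothetical.

```python
class RecordingMic:
    """Stand-in for the sound collection device; it only logs instructions."""

    def __init__(self):
        self.log = []

    def start(self):
        self.log.append("start")

    def stop(self):
        self.log.append("end")


class ManagementUnit:
    """Keeps the state flag and forwards start/end instructions."""

    def __init__(self, mic):
        self.mic = mic
        self.input_ready = False  # state flag: False means standby

    def on_determination(self, start_condition_met):
        if start_condition_met and not self.input_ready:
            self.input_ready = True   # set the state flag
            self.mic.start()          # start instruction
        elif not start_condition_met and self.input_ready:
            self.input_ready = False  # clear the state flag
            self.mic.stop()           # end instruction


mic = RecordingMic()
unit = ManagementUnit(mic)
for result in (True, True, False):
    unit.on_determination(result)
```

In this sketch a repeated positive result leaves the microphone running without issuing a second start instruction, which matches the idea that input stays ready while the gesture is held.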

The sound collection device 307 includes a microphone that picks up surrounding sound and an electric circuit that controls the microphone. When the sound collection device 307 acquires a start instruction from the management unit 306, it causes the microphone to collect the surrounding sound and records an audio signal representing the collected sound in the collected sound recording unit 308. The sound collection device 307 thereby enters the input-ready state. On the other hand, when the sound collection device 307 acquires an end instruction from the management unit 306, it stops collecting sound. The sound collection device 307 thereby enters the standby state.

The collected sound recording unit 308 realizes the voice recording function of the operation terminal 1 by recording the audio signal acquired from the sound collection device 307. The collected sound recording unit 308 may be configured by, for example, a nonvolatile memory or a volatile memory.

FIG. 5 is a flowchart illustrating an example of the processing of the start condition determination unit 305 according to Embodiment 1 of the present disclosure.

In step S401, the start condition determination unit 305 acquires the gesture extraction information from the gesture extraction unit 304.

In step S402, the start condition determination unit 305 determines, based on the gesture extraction information acquired in step S401, whether or not the user is within the gesture possible range around the operation terminal 1. If it determines that the user is not within the gesture possible range (NO in step S402), the start condition determination unit 305 returns the process to step S401 and acquires the gesture extraction information again. If it determines that the user is within the gesture possible range (YES in step S402), it executes step S403. The details of the gesture possible range will be described later with reference to FIG. 8. Here, the start condition determination unit 305 may determine that the user is within the gesture possible range if the coordinates indicating the position of the user, such as the first, second, and third coordinates included in the gesture extraction information, are within the gesture possible range, and may determine that the user is not within the gesture possible range if those coordinates are outside it.

In step S403, the start condition determination unit 305 determines whether or not the positional relationship among the first, second, and third coordinates included in the gesture extraction information satisfies the start condition. If it determines that the start condition is satisfied (YES in step S403), the user has made a gesture expressing the intention to start voice input, so the start condition determination unit 305 advances the process to step S404. If it determines that the start condition is not satisfied (NO in step S403), the start condition determination unit 305 returns the process to step S401 and acquires the gesture extraction information again.

In step S404, the start condition determination unit 305 outputs to the management unit 306 a determination result indicating that the start condition is satisfied. When step S404 ends, the start condition determination unit 305 returns the process to step S401 and acquires the gesture extraction information again.
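One pass through steps S401 to S404 can be summarized in the following Python sketch. It is only an outline under assumptions: the function name start_condition_pass and the two callables in_gesture_range and meets_start_condition are hypothetical placeholders for the range check and the start condition check described in the text.

```python
def start_condition_pass(gesture_info, in_gesture_range, meets_start_condition):
    """Run one pass of the flow for already acquired gesture extraction info (S401).

    Returns True when a positive determination result would be output (S404),
    and False when the flow would return to S401 without any output.
    """
    if not in_gesture_range(gesture_info):       # S402: NO
        return False
    if not meets_start_condition(gesture_info):  # S403: NO
        return False
    return True                                  # S404


# Hypothetical usage with simple stand-in checks on a dictionary.
info = {"distance_m": 1.0, "wrist_z": 1.75, "neck_z": 1.5}
result = start_condition_pass(
    info,
    in_gesture_range=lambda g: 0.5 < g["distance_m"] < 3.0,
    meets_start_condition=lambda g: g["wrist_z"] - g["neck_z"] >= 0.20,
)
```

Checking the range first means the start condition is never evaluated for a user outside the range, which is the ordering the flowchart describes.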

FIG. 6 is a flowchart illustrating an example of the processing of the management unit 306 according to the embodiment of the present disclosure. In step S601, the management unit 306 determines whether or not the sound collection device 307 is in the input-ready state. In this case, the management unit 306 may determine that the device is in the input-ready state if the state flag stored in the memory 309 is set, and that it is not in the input-ready state if the state flag is not set.

If the management unit 306 determines that the device is in the input-ready state (YES in step S601), it returns the process to step S601. If it determines that the device is not in the input-ready state (NO in step S601), it advances the process to step S602.

In step S602, if the management unit 306 has acquired from the start condition determination unit 305 a determination result indicating that the start condition is satisfied (YES in step S602), it advances the process to step S603. If it has not acquired such a determination result (NO in step S602), it returns the process to step S601.

In step S603, the management unit 306 outputs a start instruction to the sound collection device 307, thereby placing the sound collection device 307 in the input-ready state, and sets the state flag stored in the memory 309. When the processing of step S603 ends, the management unit 306 returns the process to step S601.

Next, the start condition will be described. FIG. 7 is a diagram illustrating the skeleton information 201 of the user U1 for the purpose of explaining the start condition. FIG. 7 shows examples of the first, second, and third coordinates that are compared against the start condition in step S403. In the example of FIG. 7, the wrist coordinates H are adopted as the first coordinates, the neck base coordinates N as the second coordinates, and the waist coordinates W as the third coordinates.

A first example of the start condition is that the wrist coordinates H are higher than the neck base coordinates N in the vertical direction by a first threshold (for example, 20 cm) or more. The vertical direction is the direction orthogonal to the ground. In this case, the user U1 can place the sound collection device 307 in the input-ready state by making a gesture of raising an upper limb so that the wrist coordinates H are higher than the neck base coordinates N in the vertical direction by the first threshold or more. The upper limb may be either the right arm or the left arm.
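The first example reduces to a single comparison along the vertical axis, as in the following Python sketch. The function name, the use of meters, and the assumption that index 2 (z) of each coordinate is the vertical axis are choices made for this illustration only.

```python
def wrist_above_neck(wrist, neck_base, threshold_m=0.20, up_axis=2):
    """First example: wrist H is at least `threshold_m` above neck base N.

    `wrist` and `neck_base` are (x, y, z) tuples in meters; `up_axis` is the
    index of the vertical axis (assumed to be z in this sketch).
    """
    return wrist[up_axis] - neck_base[up_axis] >= threshold_m


neck = (0.0, 0.0, 1.5)
raised = wrist_above_neck((0.25, 0.0, 1.75), neck)   # 0.25 m above the neck base
lowered = wrist_above_neck((0.25, 0.0, 1.0), neck)   # below the neck base
```

Only the vertical component is compared, so the horizontal position of the wrist does not affect the result.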

A second example of the start condition is that the wrist coordinates H and the neck base coordinates N lie within a predetermined range of each other in the vertical direction. For example, the condition that the wrist coordinates H are located within a predetermined range above or below the neck base coordinates N in the vertical direction (for example, about plus or minus 10 cm) can be adopted as the second example. In this case, the user U1 can establish the input-ready state by making a gesture of bending the elbow to raise the wrist coordinates H to the vicinity of the chest, or a gesture of swinging the entire upper limb out to the side of the torso without bending the elbow to raise the wrist coordinates H to the vicinity of the chest.

A third example of the start condition is that, along the trunk axis direction connecting the neck base coordinates N and the waist coordinates W, the wrist coordinates H are beyond the neck base coordinates N by a first threshold (for example, 10 cm) or more. In this case, the user U1 can establish the input-ready state by making a gesture of raising an upper limb so that the wrist coordinates H are higher than the neck base coordinates N by the first threshold or more. Here, regardless of the current posture, such as lying down or standing, the user U1 can establish the input-ready state by raising the upper limb along the trunk axis direction without being conscious of the vertical direction.
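One way to evaluate the third example is to project the wrist position onto the trunk axis, as in the following Python sketch. The function name and the convention that the axis points from the waist W toward the neck base N are assumptions of this illustration; the disclosure states the condition only in words.

```python
import math


def wrist_beyond_neck_along_trunk(wrist, neck_base, waist, threshold_m=0.10):
    """Third example: along the trunk axis (waist W -> neck base N), wrist H
    lies at least `threshold_m` beyond N. All points are (x, y, z) in meters.
    """
    axis = [n - w for n, w in zip(neck_base, waist)]
    length = math.sqrt(sum(c * c for c in axis))
    if length == 0.0:
        return False  # trunk axis undefined; treat as not satisfied
    # Signed distance of H from N measured along the unit trunk axis.
    offset = sum((h - n) * a for h, n, a in zip(wrist, neck_base, axis)) / length
    return offset >= threshold_m


# User lying along the x axis: waist at x = 0, neck base at x = 0.5.
waist, neck = (0.0, 0.0, 0.25), (0.5, 0.0, 0.25)
along_body = wrist_beyond_neck_along_trunk((0.75, 0.0, 0.25), neck, waist)
toward_ceiling = wrist_beyond_neck_along_trunk((0.5, 0.0, 0.75), neck, waist)
```

Because the comparison is made along the body axis, a lying user who stretches an arm past the head satisfies the condition, while lifting the hand straight toward the ceiling does not.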

A fourth example of the start condition is that the wrist coordinates H and the neck base coordinates N lie within a predetermined range of each other in the trunk axis direction. For example, the condition that the wrist coordinates H are located within a predetermined range on either side of the neck base coordinates N along the trunk axis direction (for example, about plus or minus 10 cm) can be adopted as the fourth example. In this case, the user U1 can establish the input-ready state by, for example, while lying down, making a gesture of bending the elbow to raise the wrist coordinates H to the vicinity of the chest, or a gesture of swinging the entire upper limb out to the side of the torso without bending the elbow to raise the wrist coordinates H to the vicinity of the chest.

A fifth example of the start condition is that the angle between the line segment indicating the upper limb direction, which connects the wrist coordinates H and the neck base coordinates N, and the line segment indicating the trunk axis direction, which connects the waist coordinates W and the neck base coordinates N, is equal to or greater than a predetermined second threshold (for example, 100 degrees or 80 degrees). In this case, regardless of the current posture, such as standing or lying down, the user U1 can establish the input-ready state by making a gesture of raising a hand relative to the trunk axis direction without being conscious of the vertical direction.
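The angle in the fifth example can be computed from a dot product, as in the following Python sketch. It assumes the angle is measured at the neck base N between the vector toward the wrist H and the vector toward the waist W; the text does not state the vertex explicitly, so this reading and the function names are assumptions of the example.

```python
import math


def angle_at_neck_deg(wrist, neck_base, waist):
    """Angle in degrees at N between the vectors N->H and N->W (assumed reading)."""
    u = [h - n for h, n in zip(wrist, neck_base)]
    v = [w - n for w, n in zip(waist, neck_base)]
    norm_u = math.sqrt(sum(c * c for c in u))
    norm_v = math.sqrt(sum(c * c for c in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("wrist or waist coincides with the neck base")
    cosine = sum(a * b for a, b in zip(u, v)) / (norm_u * norm_v)
    cosine = max(-1.0, min(1.0, cosine))  # guard against rounding error
    return math.degrees(math.acos(cosine))


def meets_angle_condition(wrist, neck_base, waist, threshold_deg=100.0):
    """Fifth example: the angle is at or above the second threshold."""
    return angle_at_neck_deg(wrist, neck_base, waist) >= threshold_deg


neck, waist = (0.0, 0.0, 1.5), (0.0, 0.0, 1.0)
horizontal_arm = angle_at_neck_deg((0.5, 0.0, 1.5), neck, waist)
```

Under this reading an arm hanging beside the torso gives a small angle and a raised arm gives a large one, so the same threshold works for a standing or a lying user.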

A sixth example of the start condition is that the angle between the line segment indicating the upper limb direction, which connects the wrist coordinates H and the neck base coordinates N, and the line segment indicating the trunk axis direction, which connects the waist coordinates W and the neck base coordinates N, falls within a predetermined angle range. The predetermined angle range is, for example, plus or minus 10 degrees or plus or minus 20 degrees around 100 degrees. In this case, regardless of the current posture, such as standing or lying down, the user U1 can establish the input-ready state by making a gesture of raising an upper limb so that the angle between the upper limb direction and the trunk axis direction falls within the predetermined angle range, without being conscious of the vertical direction.

The start condition may be a combination of any two or more of the first to sixth examples. For example, the start condition may be that any two or more of the first to sixth examples are all satisfied together (an example of a third condition). Alternatively, the start condition may be that any one, or at least two, of the first to sixth examples is satisfied (an example of a second condition). The first to sixth examples all assume a gesture of raising an upper limb, but this is only an example. Various gestures, such as lowering an upper limb or spreading the left and right upper limbs, can be adopted as the start condition, and the gesture to be detected is not particularly limited. Gestures of spreading the left and right upper limbs include, for example, raising both upper limbs, lowering both upper limbs, and raising one upper limb while lowering the other.
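The combinations described above can be expressed by counting how many individual conditions hold, as in the following Python sketch. The function name start_condition_met and the `minimum` parameter are hypothetical; each check is assumed to be a callable that takes the gesture extraction information and returns a boolean.

```python
def start_condition_met(checks, gesture_info, minimum=1):
    """Return True when at least `minimum` of the given checks are satisfied.

    minimum=1 accepts any single example; minimum=len(checks) requires all
    of the selected examples to hold together.
    """
    satisfied = sum(1 for check in checks if check(gesture_info))
    return satisfied >= minimum


# Hypothetical checks on a dictionary of precomputed values.
checks = [
    lambda g: g["height_diff_m"] >= 0.20,  # in the spirit of the first example
    lambda g: g["angle_deg"] >= 100.0,     # in the spirit of the fifth example
]
info = {"height_diff_m": 0.25, "angle_deg": 90.0}
any_one = start_condition_met(checks, info, minimum=1)
all_together = start_condition_met(checks, info, minimum=len(checks))
```

Requiring more conditions to hold together makes accidental activation less likely, at the cost of demanding a more precise gesture from the user.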

Next, an example of the processing of step S402 will be described. FIG. 8 is a diagram illustrating an example of the gesture possible range 901. As illustrated in FIG. 8, the gesture possible range 901 lies between a gesture impossible range 902 and a gesture impossible range 903.

By comparing the position of the user U1 with the gesture possible range 901 and the gesture impossible ranges 902 and 903, the start condition determination unit 305 limits the range in which gestures of the user U1 are detected according to the distance between the user U1 and the operation terminal 1.

The gesture impossible range 902 is a circular or fan-shaped region centered on the operation terminal 1 whose radius is the lower limit D1 of the gesture possible range 901. The gesture possible range 901 is a donut-shaped region obtained by removing the gesture impossible range 902 from a circular or fan-shaped region whose radius is the upper limit D2. The gesture impossible range 903 is the region farther from the operation terminal 1 than the upper limit D2.

Therefore, the start condition determination unit 305 detects a gesture of the user U1 when the user U1 is located between the lower limit D1 and the upper limit D2 from the operation terminal 1, that is, when the condition that the user U1 is within the gesture possible range 901 (an example of a fourth condition) is satisfied. Conversely, the start condition determination unit 305 does not detect a gesture of the user U1 when the user U1 is located at or within the lower limit D1 from the operation terminal 1, or at or beyond the upper limit D2.
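The range check described above can be sketched as follows. This is a minimal illustration only: the function name `in_gesture_possible_range`, the use of 2-D floor coordinates, and the sample values of D1 and D2 are assumptions, not part of the disclosure.

```python
import math

def in_gesture_possible_range(user_xy, terminal_xy, d1, d2):
    """Return True when the user is inside the gesture possible range 901.

    Assumption: positions are 2-D floor coordinates in metres.
    Gestures are not detected at a distance <= D1 or >= D2, so both
    bounds are treated as exclusive here.
    """
    distance = math.dist(user_xy, terminal_xy)
    return d1 < distance < d2

# Hypothetical limits: D1 = 0.5 m, D2 = 3.0 m.
print(in_gesture_possible_range((1.0, 0.0), (0.0, 0.0), 0.5, 3.0))
```

Treating both bounds as exclusive follows the text, which skips detection at the lower limit D1 or less and at the upper limit D2 or more.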

If the user U1 is too close to the operation terminal 1, the gesture of the user U1 may not be detected well. In addition, the user U1 may have no intention of operating the operation terminal 1, for example because the user U1 merely happens to be doing some work near it. If the user U1 is too far from the operation terminal 1, it is highly likely that the user U1 has no intention of operating it. Therefore, in the present embodiment, the start condition determination unit 305 performs the process of detecting a gesture of the user U1, that is, the process of determining whether the start condition is satisfied, only while the user U1 is within the gesture possible range 901. This prevents a decrease in gesture detection accuracy, prevents the gesture detection process from running when the user U1 has no intention of operating the operation terminal 1, and thereby reduces the processing load on the operation terminal 1.

In the above description, the gesture extraction unit 304 includes one first coordinate and one second coordinate in the gesture extraction information and outputs it to the start condition determination unit 305, but the present disclosure is not limited to this. The gesture extraction unit 304 may include one or more first coordinates and one or more second coordinates in the gesture extraction information and output it to the start condition determination unit 305.

For example, when the gesture extraction information includes a plurality of first coordinates and one second coordinate, the start condition determination unit 305 may determine that the start condition is satisfied if at least one of the first coordinates (for example, the wrist coordinate H, the elbow coordinate, and the shoulder coordinate) is greater than the second coordinate (for example, the neck coordinate N) by the first threshold or more in the vertical direction or the trunk axis direction. Likewise, when the gesture extraction information includes one first coordinate and a plurality of second coordinates, the start condition determination unit 305 may determine that the start condition is satisfied if the first coordinate (for example, the wrist coordinate H) is greater than at least one of the second coordinates (for example, the torso coordinate, the neck coordinate N, and the top-of-head coordinate) by the first threshold or more in the vertical direction or the trunk axis direction.
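The comparison with multiple coordinates can be sketched as below. This is a hedged illustration: the function name `start_condition_met` and the representation of each coordinate as a single height value along the vertical or trunk axis (larger meaning higher) are assumptions made for the example.

```python
def start_condition_met(first_heights, second_heights, first_threshold):
    """Return True if any first coordinate exceeds any second coordinate
    by at least the first threshold.

    Assumption: each coordinate has already been projected onto the
    vertical (or trunk) axis, so it is one number, with larger = higher.
    """
    return any(
        first - second >= first_threshold
        for first in first_heights
        for second in second_heights
    )

# Hypothetical heights in metres: wrist, elbow, shoulder versus neck.
print(start_condition_met([1.5, 1.2, 1.35], [1.4], 0.05))
```

The same function covers the one-to-many case by passing a single first coordinate and several second coordinates.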

FIG. 9 is a diagram illustrating a case where a plurality of users make gestures toward the operation terminal 1. When a plurality of users make gestures toward the operation terminal 1, as exemplified by the user U1 and the user U2 in FIG. 9, the gesture extraction unit 304 may identify one operator and output the gesture extraction information for the identified operator to the start condition determination unit 305. In this case, the gesture extraction unit 304 may identify, as the operator, the user located closest to the operation terminal 1 among the plurality of users.

Alternatively, the gesture extraction unit 304 may continue to identify the first detected user among the plurality of users as the operator until that user leaves the gesture possible range 901. For example, if the user U1 enters the gesture possible range 901 first and the user U2 enters it afterwards, the gesture extraction unit 304 identifies the user U1 as the operator as long as the user U1 remains within the gesture possible range 901. When the user U1 leaves the gesture possible range 901, the gesture extraction unit 304 identifies the user U2 as the operator if the user U2 is within the gesture possible range 901. At this time, if a user U3 is also within the gesture possible range 901 in addition to the user U2, the gesture extraction unit 304 may identify, as the operator, whichever of the users U2 and U3 is closer to the operation terminal 1.

However, these are merely examples, and the method of identifying one operator from among a plurality of users is not limited to the methods described above.
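One of the operator selection policies described above (keep the current operator while that user stays in range, otherwise pick the nearest user in range) can be sketched as follows. The function name `select_operator` and the dictionary of user IDs to distances are hypothetical and serve only to illustrate the policy.

```python
def select_operator(current_operator, users_in_range):
    """Choose the operator among users inside the gesture possible range 901.

    users_in_range: dict mapping a user id to that user's distance from
    the operation terminal (assumed to contain only users in range).
    """
    # Keep the current operator for as long as that user stays in range.
    if current_operator in users_in_range:
        return current_operator
    # Nobody is in range, so there is no operator.
    if not users_in_range:
        return None
    # Otherwise fall back to the user closest to the terminal.
    return min(users_in_range, key=users_in_range.get)

# U1 has left the range; U3 is closer to the terminal than U2.
print(select_operator("U1", {"U2": 2.5, "U3": 1.5}))
```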

Next, a modification of Embodiment 1 will be described. In this modification, a status notification indicating whether or not the operation terminal is in the receivable state is output.

FIG. 10 is a diagram illustrating a first example of the status notification. In the first example, the operation terminal 1 includes a display 501 on its front surface and is attached to, for example, a wall in a room. The imaging device 301 is provided on the outer frame of the display 501, for example on its upper side. Two sound collection devices 307 are provided on the upper side of the outer frame of the display 501, one on each side of the imaging device 301. The same applies to FIGS. 11 and 12.

In the first example of the status notification, the display 501 displays the status notification as text 502. In this example, because the sound collection device 307 is in the receivable state for voice input, "Accepting voice input" is used as the text 502. This allows the user who made the gesture to recognize that the operation terminal 1 is in the receivable state for voice input. When the receivable state ends, the display 501 may hide the text 502, or may display text 502 indicating that the operation terminal 1 is not in the receivable state, such as "Waiting for voice input". The text 502 illustrated in FIG. 10 is an example, and any other message may be used as long as it lets the user recognize that the operation terminal 1 is in the receivable state. The locations and numbers of the imaging device 301 and the sound collection devices 307 shown in FIG. 10 are also merely examples. The same applies to FIGS. 11 and 12.

FIG. 11 is a diagram illustrating a second example of the status notification. In the second example, the display 501 displays the status notification as an icon 503. In this example, because the sound collection device 307 is in the receivable state for voice input, an icon depicting a microphone is used as the icon 503. This allows the user who made the gesture to recognize that the operation terminal 1 is in the receivable state for voice input. When the receivable state ends, the display 501 may hide the icon 503, or may display an icon indicating that voice input is in the standby state. Alternatively, the display 501 may display the icon 503 in a predetermined first color in the receivable state, and in a predetermined second color different from the first color in the standby state. The icon 503 illustrated in FIG. 11 is an example, and any other icon may be used as long as it lets the user recognize the receivable state.

FIG. 12 is a diagram illustrating a third example of the status notification. In the third example, the display 501 displays the status notification as a color 504 covering the entire display area. The color 504 is the background color displayed over the entire display area. In this example, because the sound collection device 307 is in the receivable state for voice input, a first color indicating the receivable state (for example, red, blue, or yellow) is used as the color 504. This allows the user who made the gesture to recognize that the operation terminal 1 is in the receivable state for voice input. When the receivable state ends, the display 501 may display a second color, different from the first color, that indicates the standby state. The second color may be, for example, the default background color of the display 501, such as white or black. The color 504 illustrated in FIG. 12 is an example, and any color may be used as long as it lets the user recognize the receivable state.

FIG. 13 is a diagram illustrating a fourth example of the status notification. In the fourth example, the operation terminal 1 has, for example, two sound collection devices 307 and one imaging device 301 on its front surface, and, for example, four light emitting devices 505 on its top surface. In the fourth example, the operation terminal 1 is a stationary device placed on a desk, a floor, or the like, such as a smart speaker. Each light emitting device 505 is, for example, a light emitting diode.

In the fourth example, the status notification is displayed by the light emitting devices 505. For example, when the operation terminal 1 is in the receivable state, the light emitting devices 505 emit light. When the operation terminal 1 is in the standby state, the light emitting devices 505 are turned off. This allows the user who made the gesture to recognize the receivable state. However, this is merely an example, and any display mode of the light emitting devices 505 may be used as long as it lets the user recognize the receivable state. Display modes for the receivable state include, for example, lighting continuously, blinking, and changing the emitted color over time. The light emitting devices 505 may also be lit continuously and blink on entering the receivable state, or the reverse. Alternatively, the emitted color may differ between the receivable state and the standby state.

Although FIG. 13 shows four light emitting devices 505, this is an example, and there may be three or fewer, or five or more. The light emitting devices 505 are arranged on the top surface, but this too is merely an example, and they may be arranged on the front surface, a side surface, the rear surface, or elsewhere. The numbers and locations of the imaging device 301 and the sound collection devices 307 are likewise not particularly limited.

FIG. 14 is a diagram illustrating a fifth example of the status notification. The operation terminal 1 of the fifth example additionally has a speaker 506 on the front surface compared with the operation terminal 1 of the fourth example. In the fifth example, the status notification is output as sound from the speaker 506. In FIG. 14, the number and arrangement of speakers 506 are not particularly limited as long as the user can recognize the receivable state. In the fifth example, when the operation terminal 1 is in the receivable state, the speaker 506 may output a voice message indicating the receivable state, such as "Please perform voice input". Alternatively, in the receivable state, the speaker 506 may output a sound effect or a beep. As these examples show, the sound output pattern of the speaker 506 is not limited to a specific pattern. In the standby state, the speaker 506 may simply stop outputting sound.

The components of the operation terminal 1 illustrated in FIGS. 10 to 14 for notifying the user whether or not the terminal is in the receivable state, namely display devices such as the display 501 and the light emitting devices 505 and playback devices such as the speaker 506, may be combined arbitrarily. For example, the operation terminal 1 may be configured by arbitrarily combining one or more types of display devices and one or more types of playback devices.

FIG. 15 is a block diagram of the operation terminal 1 in which the display device 602 and the playback device 603 illustrated in FIGS. 10 to 14 are added to the block diagram of the operation terminal 1 illustrated in FIG. 4.

The operation terminal 1 illustrated in FIG. 15 further includes a playback device 603 and a display device 602 in addition to the configuration of FIG. 4. In FIG. 15, the operation terminal 1 only needs to include at least one of the playback device 603 and the display device 602.

In FIG. 15, components identical to those in FIG. 4 are given the same reference numerals, and their description is omitted. However, in FIG. 15 the management unit is given the reference numeral 601 instead of 306.

As in FIG. 4, the start condition determination unit 305 determines whether the positional relationship among the first coordinate, the second coordinate, and the third coordinate included in the gesture extraction information acquired from the gesture extraction unit 304 satisfies the start condition, and outputs the determination result to the management unit 601. The details of this processing are the same as the flow illustrated in FIG. 5, except that in step S404 the determination result is output to the management unit 601 instead of the management unit 306.

The management unit 601 has the following function in addition to the functions of the management unit 306. When the management unit 601 obtains from the start condition determination unit 305 a determination result indicating that the start condition is satisfied, it outputs an output command for the status notification illustrated in FIGS. 10 to 14 to the playback device 603 and the display device 602.

When the sound collection device 307 obtains a start instruction from the management unit 601, it causes its microphone to collect ambient sound and records an audio signal representing the collected sound in the collected voice recording unit 308.

The playback device 603 includes the speaker 506 illustrated in FIG. 14, a playback circuit for reproducing sound, and the like. When it obtains the output command for the status notification from the management unit 601, it reads a predetermined playback sound from the memory 309 and reproduces it. The sound reproduced from the speaker 506 is the sound effect, beep, or voice message illustrated for FIG. 14. In this way, the status notification is conveyed to the user audibly.

The display device 602 is composed of at least one of the display 501 illustrated in FIGS. 10 to 14 and the light emitting devices 505 illustrated in FIG. 13. When it obtains the output command for the status notification from the management unit 601, it outputs the status notification illustrated in FIGS. 10 to 14. In this way, the status notification is conveyed to the user visually by a message, a color, an icon, or the like.

Thus, according to the present embodiment, the receivable state can be entered not by a cumbersome gesture such as pointing an upper limb at a specific position in space, but by a simple gesture toward the operation terminal 1, such as raising a hand or spreading both hands.

(Embodiment 2)
Embodiment 1 mainly illustrated how the user starts the receivable state with a gesture. Embodiment 2 builds on Embodiment 1 and describes in detail how the sound collection device 307 ends the receivable state.

FIG. 16 is a block diagram illustrating an example of the configuration of the operation terminal 1 according to Embodiment 2. In the present embodiment, components identical to those in Embodiment 1 are given the same reference numerals, and their description is omitted.

The operation terminal 1 of FIG. 16 includes, in addition to the configuration of FIG. 15, a timeout determination unit 702, an end condition determination unit 703, and a silent section detection unit 705. Because functions have been added to the gesture extraction unit, the start condition determination unit, and the management unit relative to Embodiment 1, they are given the reference numerals 700, 701, and 704, respectively. In FIG. 16, the operation terminal 1 does not need to include the playback device 603 and the display device 602.

The gesture extraction unit 700 extracts the first coordinate, the second coordinate, and the third coordinate based on the detection result obtained from the person detection unit 302 and the skeleton information obtained from the skeleton information extraction unit 303. It outputs gesture extraction information including the first, second, and third coordinates to the start condition determination unit 701 and, in addition, to the end condition determination unit 703. The details of the processing of the gesture extraction unit 700 are the same as in Embodiment 1.

The start condition determination unit 701 determines whether the positional relationship among the first coordinate, the second coordinate, and the third coordinate included in the gesture extraction information acquired from the gesture extraction unit 700 satisfies the start condition. If so, it outputs a determination result indicating that the start condition is satisfied to the management unit 704 and the timeout determination unit 702. In Embodiment 2, the details of determining whether the positional relationship satisfies the start condition are the same as in the flow of FIG. 5.

When the timeout determination unit 702 obtains from the start condition determination unit 701 a determination result indicating that the start condition is satisfied, it starts counting down a predetermined timeout period (for example, 10 seconds). When the countdown completes, it outputs a determination result indicating that the timeout period has elapsed to the management unit 704. However, if the timeout determination unit 702 obtains a determination result indicating that the start condition is satisfied during the countdown, that is, within the timeout period, it resets the timeout period and restarts the countdown from the beginning. As a result, even if the user leaves the gesture possible range 901 in the receivable state without making a gesture indicating the intention to end voice input, the receivable state is prevented from continuing. The receivable state is therefore also prevented from continuing when the user forgets to make the gesture indicating the intention to end voice input. This protects the user's privacy.
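The resettable countdown can be sketched as a deadline that is pushed back each time the start condition is reported. This is an illustrative sketch only: the class name `TimeoutJudge`, its method names, and the explicit `now` argument (seconds from any monotonic clock) are assumptions.

```python
class TimeoutJudge:
    """Sketch of a timeout countdown that restarts on each start-condition hit."""

    def __init__(self, timeout_s=10.0):
        self.timeout_s = timeout_s
        self.deadline = None  # no countdown is running yet

    def on_start_condition(self, now):
        # Start the countdown, or restart it if one is already running.
        self.deadline = now + self.timeout_s

    def expired(self, now):
        # True once the timeout period has fully elapsed.
        return self.deadline is not None and now >= self.deadline

judge = TimeoutJudge(timeout_s=10.0)
judge.on_start_condition(now=0.0)
judge.on_start_condition(now=9.0)   # gesture seen again: countdown restarts
print(judge.expired(now=10.0), judge.expired(now=19.0))
```

Passing the time in explicitly keeps the sketch easy to test. A real implementation would more likely read a monotonic clock itself.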

The end condition determination unit 703 determines whether the positional relationship among the first coordinate, the second coordinate, and the third coordinate included in the gesture extraction information acquired from the gesture extraction unit 700 satisfies the end condition (an example of a fifth condition). If so, it outputs a determination result indicating that the end condition is satisfied to the management unit 704. The details of the processing of the end condition determination unit 703 are described later using the flow of FIG. 17.

When the management unit 704, while in the standby state, obtains from the start condition determination unit 701 a determination result indicating that the start condition is satisfied, it sets the state flag stored in the memory 309 and outputs a start instruction to the sound collection device 307, thereby putting the sound collection device 307 into the receivable state.

When the management unit 704, while in the receivable state, obtains from the timeout determination unit 702 a determination result indicating that the timeout period has elapsed, it clears the state flag stored in the memory 309 and outputs an end instruction to the sound collection device 307, thereby ending the receivable state and entering the standby state.

Likewise, when the management unit 704, while in the receivable state, obtains from the end condition determination unit 703 a determination result indicating that the end condition is satisfied, it clears the state flag stored in the memory 309 and outputs an end instruction to the sound collection device 307, thereby ending the receivable state and entering the standby state. This allows the user to end the receivable state by making a gesture indicating the intention to end voice input. In Embodiment 1, the management unit 306 ends the receivable state when it obtains from the start condition determination unit 305 a determination result indicating that the start condition is not satisfied. In Embodiment 2, by contrast, the management unit 704 basically ends the receivable state when it obtains from the end condition determination unit 703 a determination result indicating that the end condition is satisfied.

When the management unit 704, while in the receivable state, obtains from the timeout determination unit 702 a determination result indicating that the timeout period has elapsed, it may nevertheless continue the receivable state if the silent section detection unit 705 has detected a voiced section. This avoids a situation in which the receivable state ends automatically on expiry of the timeout period even though the user is still speaking to operate the operation terminal 1.

On the other hand, when the management unit 704, while in the receivable state, obtains from the timeout determination unit 702 a determination result indicating that the timeout period has elapsed, and the silent section detection unit 705 has detected a silent section, it ends the receivable state.
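The state transitions of the management unit 704 described above can be summarized as a small state machine. The sketch below is a hedged illustration: the class name `ManagementUnitSketch`, the method names, and the boolean `accepting` flag (standing in for the state flag in the memory 309) are assumptions, and the start and end instructions to the sound collection device 307 are omitted.

```python
class ManagementUnitSketch:
    """Sketch of the standby / receivable transitions of management unit 704."""

    def __init__(self):
        self.accepting = False  # stands in for the state flag in memory 309

    def on_start_condition(self):
        # Start condition satisfied: enter the receivable state.
        self.accepting = True

    def on_end_condition(self):
        # End gesture detected: return to standby.
        self.accepting = False

    def on_timeout(self, voiced):
        # Timeout elapsed: end only if the user is not currently speaking.
        if self.accepting and not voiced:
            self.accepting = False

unit = ManagementUnitSketch()
unit.on_start_condition()
unit.on_timeout(voiced=True)    # still speaking: stay receivable
print(unit.accepting)
unit.on_timeout(voiced=False)   # silent: go back to standby
print(unit.accepting)
```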

The silent section detection unit 705 detects whether the latest audio signal recorded in the collected voice recording unit 308 contains a silent section. The silent section detection unit 705 may determine that the audio signal contains a silent section when the input level has remained at or below a predetermined threshold for a predetermined time (for example, 300 milliseconds), but the method of detecting a silent section is not limited to a specific technique. When the silent section detection unit 705 detects a silent section, it sets the current sound collection state to silent and outputs that state to the management unit 704. When it detects a voiced section, it sets the current sound collection state to voiced and outputs that state to the management unit 704.
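The level-based silence check can be sketched as follows. This is an illustration under stated assumptions: the function name `has_silent_section`, the per-frame level list, the 10 ms frame length, and the threshold value are hypothetical. Only the rule "level at or below a threshold for a predetermined time, for example 300 ms" comes from the text.

```python
def has_silent_section(levels, frame_ms, level_threshold, min_silence_ms=300):
    """Return True if the input level stays at or below the threshold
    for at least min_silence_ms of consecutive frames.

    Assumption: levels holds one input level per fixed-length frame.
    """
    run_ms = 0
    for level in levels:
        if level <= level_threshold:
            run_ms += frame_ms
            if run_ms >= min_silence_ms:
                return True
        else:
            run_ms = 0  # a loud frame interrupts the quiet run
    return False

# One loud frame followed by 300 ms of quiet 10 ms frames.
print(has_silent_section([0.5] + [0.01] * 30, frame_ms=10, level_threshold=0.05))
```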

実施の形態２において、ジェスチャー抽出部７００、開始条件判定部７０１、タイムアウト判定部７０２、終了条件判定部７０３、及び管理部７０４は条件判定部の一例に相当する。 In Embodiment 2, the gesture extraction unit 700, the start condition determination unit 701, the timeout determination unit 702, the end condition determination unit 703, and the management unit 704 correspond to an example of a condition determination unit.

図１６において、操作端末１は、タイムアウト判定部７０２、終了条件判定部７０３、及び無音区間検出部７０５の全て備える必要はなく、少なくとも１つを備えていればよい。 In FIG. 16, the operation terminal 1 does not need to include all of the timeout determination unit 702, the end condition determination unit 703, and the silent section detection unit 705, and may include at least one.

図１７は、本開示の実施の形態２に係る終了条件判定部７０３の処理の一例を示すフローチャートである。 FIG. 17 is a flowchart illustrating an example of a process of the termination condition determination unit 703 according to Embodiment 2 of the present disclosure.

ステップＳ８０１では、終了条件判定部７０３は、ジェスチャー抽出部７００からジェスチャー抽出情報を取得する。 In step S801, the termination condition determination unit 703 acquires gesture extraction information from the gesture extraction unit 700.

ステップＳ８０２では、終了条件判定部７０３は、ステップＳ８０１で取得したジェスチャー抽出情報に基づいて、操作端末１の周囲のジェスチャー可能範囲９０１内にユーザが存在しているか否かを判定する。ジェスチャー可能範囲９０１内にユーザが存在しないと判定した場合（ステップＳ８０２でＮＯ）、処理はステップＳ８０１に戻され、ジェスチャー抽出情報が取得される。一方、終了条件判定部７０３は、ジェスチャー可能範囲内にユーザが存在すると判定した場合（ステップＳ８０２でＹＥＳ）、ステップＳ８０３を実行する。ここで、終了条件判定部７０３は、ジェスチャー抽出情報に含まれる第一座標及び第二座標などのユーザの位置を示す座標がジェスチャー可能範囲９０１内にあれば、ユーザはジェスチャー可能範囲９０１内に居ると判定し、前記座標がジェスチャー可能範囲９０１内になければ、ユーザはジェスチャー可能範囲９０１内に居ないと判定すればよい。 In step S802, the termination condition determination unit 703 determines, based on the gesture extraction information acquired in step S801, whether the user is within the gesture possible range 901 around the operation terminal 1. If it determines that the user is not within the gesture possible range 901 (NO in step S802), the process returns to step S801 and gesture extraction information is acquired again. If it determines that the user is within the gesture possible range 901 (YES in step S802), the termination condition determination unit 703 executes step S803. Here, the termination condition determination unit 703 may determine that the user is within the gesture possible range 901 if coordinates indicating the user's position, such as the first and second coordinates included in the gesture extraction information, are within the gesture possible range 901, and that the user is not within the gesture possible range 901 otherwise.

ステップＳ８０３では、終了条件判定部７０３は、ジェスチャー抽出情報に含まれる第一座標、第二座標及び第三座標の位置関係が所定の終了条件（第５条件の一例）を満たしているか否かを判定する。そして、位置関係が終了条件を満たしていると判定した場合（ステップＳ８０３でＹＥＳ）、終了条件判定部７０３は、ユーザが音声入力の終了の意思表示を示すジェスチャーを行ったため、処理をステップＳ８０４に進める。一方、位置関係が終了条件を満たしていないと判定した場合（ステップＳ８０３でＮＯ）、終了条件判定部７０３は処理をステップＳ８０１に戻し、ジェスチャー抽出情報を取得する。 In step S803, the termination condition determination unit 703 determines whether the positional relationship among the first, second, and third coordinates included in the gesture extraction information satisfies a predetermined termination condition (an example of the fifth condition). If it determines that the positional relationship satisfies the termination condition (YES in step S803), the user has made a gesture indicating the intention to end voice input, so the termination condition determination unit 703 advances the process to step S804. If it determines that the positional relationship does not satisfy the termination condition (NO in step S803), the termination condition determination unit 703 returns the process to step S801 and acquires gesture extraction information.

ステップＳ８０４では、終了条件判定部７０３は、終了条件が満たされていることを示す判定結果を管理部７０４に出力する。ステップＳ８０４が終了すると、終了条件判定部７０３は、処理をステップＳ８０１に戻し、ジェスチャー抽出情報を取得する。 In step S804, the termination condition determination unit 703 outputs a determination result indicating that the termination condition is satisfied to the management unit 704. When step S804 ends, the termination condition determination unit 703 returns the process to step S801 and acquires gesture extraction information.
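The loop of steps S801 to S804 can be sketched as a generator. The names `termination_results`, `in_range`, and `meets_end_condition` are hypothetical, and the dictionary layout of each extraction record is an assumption made purely for illustration.

```python
def termination_results(extractions, in_range, meets_end_condition):
    """Yield each gesture extraction record for which a 'termination
    condition satisfied' result would be output to the management unit."""
    for info in extractions:              # S801: acquire extraction info
        if not in_range(info):            # S802 NO: back to S801
            continue
        if meets_end_condition(info):     # S803 YES
            yield info                    # S804: output the result


# Hypothetical records: distance to the terminal and a precomputed flag.
frames = [
    {"distance_m": 5.0, "hand_lowered": True},   # outside the range
    {"distance_m": 2.0, "hand_lowered": False},  # in range, hand still up
    {"distance_m": 2.0, "hand_lowered": True},   # in range, hand lowered
]
hits = list(termination_results(
    frames,
    in_range=lambda f: 0.5 < f["distance_m"] < 4.0,
    meets_end_condition=lambda f: f["hand_lowered"],
))
```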

次に、終了条件について説明する。図１８は、終了条件を説明するために、ユーザＵ１の骨格情報２０１を例示した図である。図１８は、ステップＳ８０３において、終了条件の比較対象となる第一座標、第二座標、及び第三座標が例示されている。図１８の例では、第一座標として手首座標Ｈが採用され、第二座標として首元座標Ｎが採用され、第三座標として腰座標Ｗが採用されている。 Next, the termination condition will be described. FIG. 18 is a diagram illustrating the skeleton information 201 of the user U1 in order to explain the termination condition. FIG. 18 shows the first, second, and third coordinates that are compared against the termination condition in step S803. In the example of FIG. 18, the wrist coordinate H is used as the first coordinate, the neck coordinate N as the second coordinate, and the waist coordinate W as the third coordinate.

終了条件の第一例は、開始条件の第一例に対応するものであり、手首座標Ｈが首元座標Ｎよりも鉛直方向に対して第一閾値（例えば２０ｃｍ）以上、小さい（低い）という条件である。この場合、ユーザＵ１は、手首座標Ｈが首元座標Ｎよりも鉛直方向に対して第一閾値以上低くなるように上肢を下げるジェスチャーを行うことによって、受付可能状態を終了できる。なお、上肢は右腕であってもよいし、左腕であってもよい。 The first example of the termination condition corresponds to the first example of the start condition: the wrist coordinate H is lower than the neck coordinate N by a first threshold (for example, 20 cm) or more in the vertical direction. In this case, the user U1 can end the receivable state by lowering the upper limb so that the wrist coordinate H is lower than the neck coordinate N by the first threshold or more in the vertical direction. The upper limb may be either the right arm or the left arm.

終了条件の第二例は、開始条件の第二例に対応するものであり、手首座標Ｈと首元座標Ｎとが鉛直方向に対して所定範囲内に収まらなくなるという条件である。この場合、ユーザＵ１は、胸元付近まで上げた手首座標Ｈを鉛直方向に閾値範囲外の位置まで降ろす又は上げるジェスチャーを行うことで、受付可能状態を終了できる。 The second example of the end condition corresponds to the second example of the start condition, and is a condition that the wrist coordinates H and the neck coordinates N do not fall within a predetermined range in the vertical direction. In this case, the user U1 can end the acceptable state by performing a gesture of lowering or raising the wrist coordinate H raised near the chest to a position outside the threshold range in the vertical direction.

終了条件の第三例は、開始条件の第三例に対応するものであり、首元座標Ｎと腰座標Ｗとをつないだ体幹軸方向において、手首座標Ｈが首元座標Ｎよりも第一閾値以上小さいという条件である。この場合、ユーザＵ１は、手首座標Ｈが首元座標Ｎに対して体幹軸方向に対して第一閾値以上低くなるように上肢を下げるジェスチャーを行うことによって、受付可能状態を終了できる。 The third example of the termination condition corresponds to the third example of the start condition: in the trunk axis direction connecting the neck coordinate N and the waist coordinate W, the wrist coordinate H is lower than the neck coordinate N by the first threshold or more. In this case, the user U1 can end the receivable state by lowering the upper limb so that the wrist coordinate H is lower than the neck coordinate N by the first threshold or more in the trunk axis direction.

終了条件の第四例は、開始条件の第四例に対応するものであり、手首座標Ｈと首元座標Ｎとが体幹軸方向に対して所定範囲内に収まらなくなるという条件である。この場合、ユーザＵ１は、胸元付近まで上げた手首座標Ｈを体幹軸方向に閾値範囲外の位置まで上げる又は下げるジェスチャーを行うことによって受け付け可能状態を終了できる。 The fourth example of the end condition corresponds to the fourth example of the start condition, and is a condition that the wrist coordinates H and the neck base coordinates N do not fall within a predetermined range in the trunk axis direction. In this case, the user U1 can end the receivable state by performing a gesture of raising or lowering the wrist coordinate H raised near the chest to a position outside the threshold range in the trunk axis direction.
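The first to fourth examples differ only in the direction along which the wrist is compared with the neck (vertical or trunk axis) and in whether a one-sided threshold or a range is used. A minimal sketch follows, assuming 3D coordinates in metres with the y axis pointing up; the helper names, the coordinate convention, and the sample positions are all assumptions.

```python
import math

VERTICAL = (0.0, 1.0, 0.0)  # assumed "up" direction of the coordinate system


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def trunk_axis(neck, waist):
    """Direction from waist W to neck N (must not be zero length)."""
    return _sub(neck, waist)


def offset_along(wrist, neck, axis):
    """Signed offset of wrist H relative to neck N along the given axis."""
    return _dot(_sub(wrist, neck), axis) / math.sqrt(_dot(axis, axis))


def end_below(wrist, neck, axis, first_threshold=0.20):
    """Examples 1 and 3: H is lower than N by the first threshold or more."""
    return offset_along(wrist, neck, axis) <= -first_threshold


def end_outside_range(wrist, neck, axis, lo, hi):
    """Examples 2 and 4: H is no longer within [lo, hi] relative to N."""
    off = offset_along(wrist, neck, axis)
    return off < lo or off > hi


# Standing user: the vertical direction and the trunk axis coincide.
neck, waist = (0.0, 1.4, 0.0), (0.0, 1.0, 0.0)
lowered = end_below((0.2, 1.1, 0.0), neck, VERTICAL)

# User lying down: only the trunk-axis variant sees the hand as lowered.
neck_l, waist_l, wrist_l = (1.0, 0.2, 0.0), (0.6, 0.2, 0.0), (0.7, 0.3, 0.0)
lowered_lying = end_below(wrist_l, neck_l, trunk_axis(neck_l, waist_l))
```

Passing the axis as a parameter keeps the vertical and trunk-axis variants in one function, which mirrors how the third and fourth examples restate the first and second.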

終了条件の第五例は、開始条件の第五例に対応するものであり、手首座標Ｈ及び首元座標Ｎを結んだ上肢方向を示す線分と、腰座標Ｗ及び首元座標Ｎを結んだ体幹軸方向を示す線分との成す角度が所定の第二閾値（１００度、８０度など）未満であるという条件である。この場合、ユーザＵ１は、起立状態又は寝ころんだ状態であるかといった現在の姿勢に拘わらず、鉛直方向を意識せずに、体幹軸方向に対して手を下げるジェスチャーを行うことによって受付可能状態を終了できる。 The fifth example of the termination condition corresponds to the fifth example of the start condition: the angle formed by the line segment indicating the upper limb direction, which connects the wrist coordinate H and the neck coordinate N, and the line segment indicating the trunk axis direction, which connects the waist coordinate W and the neck coordinate N, is less than a predetermined second threshold (100 degrees, 80 degrees, or the like). In this case, regardless of the current posture, such as standing or lying down, the user U1 can end the receivable state by lowering the hand relative to the trunk axis direction, without being conscious of the vertical direction.

終了条件の第六例は、開始条件の第六例に対応するものであり、手首座標Ｈ及び首元座標Ｎを結んだ上肢方向を示す線分と、腰座標Ｗ及び首元座標Ｎを結んだ体幹軸方向を示す線分との成す角度が所定の角度範囲内に収まっているという条件である。この場合、ユーザＵ１は、起立状態又は寝ころんだ状態であるかといった現在の姿勢に拘わらず、鉛直方向を意識せずに、体幹軸方向に対して手を下げるジェスチャーを行うことによって受付可能状態を終了できる。 The sixth example of the termination condition corresponds to the sixth example of the start condition: the angle formed by the line segment indicating the upper limb direction, which connects the wrist coordinate H and the neck coordinate N, and the line segment indicating the trunk axis direction, which connects the waist coordinate W and the neck coordinate N, falls within a predetermined angle range. In this case, regardless of the current posture, such as standing or lying down, the user U1 can end the receivable state by lowering the hand relative to the trunk axis direction, without being conscious of the vertical direction.
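The angle used in the fifth and sixth examples can be computed from the two segments that meet at the neck coordinate N. The sketch below assumes 3D coordinate tuples; the function names, the default threshold, and the sample positions are illustrative assumptions.

```python
import math


def limb_trunk_angle_deg(wrist, neck, waist):
    """Angle at N between segment N-H (upper limb) and N-W (trunk axis)."""
    u = tuple(h - n for h, n in zip(wrist, neck))
    v = tuple(w - n for w, n in zip(waist, neck))
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))


def end_angle_below(wrist, neck, waist, second_threshold_deg=80.0):
    """Fifth example: the angle is less than the second threshold."""
    return limb_trunk_angle_deg(wrist, neck, waist) < second_threshold_deg


def end_angle_in_range(wrist, neck, waist, lo_deg, hi_deg):
    """Sixth example: the angle lies within a predetermined range."""
    return lo_deg <= limb_trunk_angle_deg(wrist, neck, waist) <= hi_deg


neck, waist = (0.0, 1.4, 0.0), (0.0, 1.0, 0.0)
arm_down = end_angle_below((0.1, 1.0, 0.0), neck, waist)    # small angle
arm_up = end_angle_below((0.1, 1.8, 0.0), neck, waist)      # large angle
```

Because the angle depends only on the relative positions of H, N, and W, the result is the same whether the user is standing or lying down.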

終了条件は、第一例〜第六例のうちいずれか２以上を組み合わせた条件であってもよい。例えば、終了条件は、第一例〜第六例のうちいずれか２以上の条件が共に成立したという条件が採用できる。或いは、終了条件は、第一例〜第六例のうちいずれか１つが成立したという条件であってもよい。ここでは、終了条件の第一例〜第六例は共に上肢を下げるジェスチャーが想定されているが、これは一例である。例えば、開始条件として上肢を下げる又は両手を下げるジェスチャーが採用されているのであれば、終了条件として上肢を上げる又は両手を上げるジェスチャーが終了条件として採用されてもよい。すなわち、終了条件は、開始条件と重複しないという制約が満たされているのであれば、どのような条件が採用されてもよい。 The termination condition may be a condition combining any two or more of the first to sixth examples. For example, as the end condition, a condition that any two or more conditions among the first to sixth examples are satisfied can be adopted. Alternatively, the termination condition may be a condition that any one of the first to sixth examples is satisfied. Here, a gesture of lowering the upper limb is assumed in each of the first to sixth examples of the end condition, but this is an example. For example, if a gesture of lowering the upper limb or lowering both hands is adopted as the start condition, a gesture of raising the upper limb or raising both hands may be adopted as the end condition. That is, any condition may be adopted as the end condition as long as the condition that the end condition does not overlap with the start condition is satisfied.

次に、ステップＳ８０２の処理の一例について図８を用いて説明する。終了条件判定部７０３は、開始条件判定部７０１と同様、ユーザＵ１の位置が操作端末１に対して下限値Ｄ１から上限値Ｄ２までの範囲内に位置する、すなわち、ジェスチャー可能範囲９０１にユーザが位置する場合、ジェスチャーを検出する。一方、終了条件判定部７０３は、ユーザＵ１の位置が操作端末に対して下限値Ｄ１以下に位置する場合、又は、ユーザＵ１の位置が操作端末１に対して上限値Ｄ２以上の範囲に位置する場合、ユーザＵ１のジェスチャーを検出しない。 Next, an example of the process of step S802 will be described with reference to FIG. 8. Like the start condition determination unit 701, the termination condition determination unit 703 detects the gesture when the user U1 is located within the range from the lower limit D1 to the upper limit D2 relative to the operation terminal 1, that is, within the gesture possible range 901. On the other hand, the termination condition determination unit 703 does not detect the gesture of the user U1 when the user U1 is located at or below the lower limit D1 relative to the operation terminal 1, or at or above the upper limit D2.
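The range test of step S802 reduces to a distance comparison against D1 and D2. In this sketch the positions are assumed to be 3D points in metres, the function name is hypothetical, and the values 0.5 m and 4.0 m are placeholders since the description gives no concrete limits.

```python
import math


def in_gesture_range(user_pos, terminal_pos, d1, d2):
    """True when the user is farther than D1 and nearer than D2.

    Distances at or below D1, or at or above D2, are treated as outside
    the gesture possible range, matching the description above.
    """
    return d1 < math.dist(user_pos, terminal_pos) < d2


terminal = (0.0, 0.0, 0.0)
ok = in_gesture_range((0.0, 0.0, 2.0), terminal, d1=0.5, d2=4.0)
```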

ユーザＵ１の位置が操作端末１に対して近すぎる場合、ユーザＵ１のジェスチャーをうまく検出できない可能性があることに加えてユーザＵ１が音声入力の終了の意思表示を示すジェスチャーをし忘れてジェスチャー可能範囲９０１をフェードアウトした可能性がある。また、ユーザＵ１の位置が操作端末１に対して遠すぎる場合、ユーザＵ１が音声入力の終了の意思表示を示すジェスチャーをし忘れてジェスチャー可能範囲９０１をフェードアウトした可能性がある。そこで、本実施の形態では、終了条件判定部７０３は、ユーザＵ１がジェスチャー可能範囲９０１に居る場合にユーザＵ１のジェスチャーを検出する処理、すなわち、終了条件を満たすか否かを判定する処理を実施することにした。これにより、ジェスチャーの検出精度の低下を防止できると共にユーザＵ１が音声入力の終了の意思表示を示すジェスチャーをし忘れている場合において、受付可能状態が継続されることを防止できる。 If the user U1 is too close to the operation terminal 1, the gesture of the user U1 may not be detected reliably; in addition, the user U1 may have left the gesture possible range 901 after forgetting to make the gesture indicating the intention to end voice input. Likewise, if the user U1 is too far from the operation terminal 1, the user U1 may have left the gesture possible range 901 after forgetting to make that gesture. In the present embodiment, therefore, the termination condition determination unit 703 performs the process of detecting the gesture of the user U1, that is, the process of determining whether the termination condition is satisfied, when the user U1 is within the gesture possible range 901. This prevents a drop in gesture detection accuracy and also prevents the receivable state from continuing when the user U1 has forgotten to make the gesture indicating the intention to end voice input.

また、終了条件判定部７０３は、開始条件判定部７０１と同様、１又は複数の第一座標と１又は複数の第二座標とがジェスチャー抽出情報に含まれている場合は、これらの座標を用いて、終了条件を判定してもよい。 When one or a plurality of first coordinates and one or a plurality of second coordinates are included in the gesture extraction information, the end condition determining unit 703 uses these coordinates similarly to the start condition determining unit 701. Thus, the termination condition may be determined.

例えば、ジェスチャー抽出情報に複数の第一座標と１つの第二座標とが含まれている場合、終了条件判定部７０３は、複数の第一座標（例えば、手首座標Ｈ、肘座標、及び肩座標）のうち少なくとも１つの第一座標が１つの第二座標（例えば、首元座標Ｎ）に対して鉛直方向又は体幹軸方向に第一閾値以上、小さければ、終了条件を満たすと判定すればよい。また、ジェスチャー抽出情報に１つの第一座標と複数の第二座標とが含まれる場合、終了条件判定部７０３は、１つの第一座標（例えば、手首座標Ｈ）が複数の第二座標（例えば、胴体座標、首元座標Ｎ、頭の先の座標）の少なくとも１つの第二座標に対して鉛直方向又は体幹軸方向に第一閾値以上、小さければ、終了条件を満たすと判定すればよい。但し、開始条件と終了条件とは重複していてはならない。 For example, when the gesture extraction information includes a plurality of first coordinates and one second coordinate, the termination condition determination unit 703 may determine that the termination condition is satisfied if at least one of the first coordinates (for example, the wrist coordinate H, the elbow coordinate, and the shoulder coordinate) is lower than the one second coordinate (for example, the neck coordinate N) by the first threshold or more in the vertical direction or the trunk axis direction. When the gesture extraction information includes one first coordinate and a plurality of second coordinates, the termination condition determination unit 703 may determine that the termination condition is satisfied if the one first coordinate (for example, the wrist coordinate H) is lower than at least one of the second coordinates (for example, the torso coordinate, the neck coordinate N, and the coordinate of the top of the head) by the first threshold or more in the vertical direction or the trunk axis direction. However, the start condition and the termination condition must not overlap.
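Both cases above amount to an "at least one pair" test. The following sketch assumes each coordinate has already been reduced to a scalar height along the chosen direction (vertical or trunk axis); the function name and the sample heights are hypothetical.

```python
def end_condition_any(first_heights, second_heights, first_threshold=0.20):
    """True if at least one first coordinate is lower than at least one
    second coordinate by first_threshold or more."""
    return any(
        second - first >= first_threshold
        for first in first_heights
        for second in second_heights
    )


# Several first coordinates (wrist, elbow, shoulder) against one neck height.
many_first = end_condition_any([1.1, 1.3, 1.4], [1.45])

# One wrist height against several second coordinates (torso, neck, head top).
many_second = end_condition_any([1.3], [1.0, 1.45, 1.7])
```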

次に、タイムアウト期間が延長される処理の詳細について説明する。図１９は、本開示の実施の形態２に係るタイムアウト判定部７０２の処理の一例を示すフローチャートである。 Next, details of the processing for extending the timeout period will be described. FIG. 19 is a flowchart illustrating an example of a process of the timeout determination unit 702 according to Embodiment 2 of the present disclosure.

Ｓ１６０１では、タイムアウト判定部７０２は、タイムアウト期間のカウントダウン中であるか否かを判定する。カウントダウン中であれば（ステップＳ１６０１でＹＥＳ）、タイムアウト判定部７０２は、処理をＳ１６０２に進め、カウントダウン中でなければ（ステップＳ１６０１でＮＯ）、処理をステップＳ１６０１に戻す。 In step S1601, the timeout determination unit 702 determines whether the timeout period is counting down. If the timer is counting down (YES in step S1601), the timeout determination unit 702 advances the process to step S1602. If the timer is not counting down (NO in step S1601), the process returns to step S1601.

ステップＳ１６０２では、タイムアウト判定部７０２は、開始条件判定部７０１から開始条件が満たされることを示す判定結果を取得したか否かを判定する。この判定結果を取得した場合（ステップＳ１６０２でＹＥＳ）、タイムアウト判定部７０２は、処理をステップＳ１６０３に進め、この判定結果を取得しない場合（ステップＳ１６０２でＮＯ）、処理をステップＳ１６０１に戻す。 In step S1602, the timeout determination unit 702 determines whether a determination result indicating that the start condition is satisfied is obtained from the start condition determination unit 701. If this determination result has been obtained (YES in step S1602), timeout determination unit 702 proceeds with the process to step S1603, and if this determination result has not been obtained (NO in step S1602), returns the process to step S1601.

ステップＳ１６０３では、タイムアウト判定部７０２は、タイムアウト期間を初期値に戻して、再度カウントダウンを開始することで、タイムアウト期間を延長する。ステップＳ１６０３の処理が終了すると、処理はステップＳ１６０１に戻る。 In step S1603, the timeout determination unit 702 extends the timeout period by returning the timeout period to the initial value and starting the countdown again. When the process in step S1603 ends, the process returns to step S1601.

以上により、ジェスチャー可能範囲９０１内でユーザが音声入力の意思表示を示すジェスチャーを行っている限り、タイムアウト期間は延長され、操作端末１を操作する発話が収音されない事態を回避できる。 As described above, as long as the user performs the gesture indicating the intention of voice input within the gesture possible range 901, the timeout period is extended, and it is possible to avoid a situation in which the utterance for operating the operation terminal 1 is not collected.
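The extension logic of steps S1601 to S1603 can be sketched as a small countdown object. The class name, the `tick` interface, and the explicit elapsed-time decrement are assumptions added for illustration; the flowchart itself only describes the reset.

```python
class TimeoutCountdown:
    """Countdown that is reset whenever the start condition is met again."""

    def __init__(self, initial_s):
        self.initial_s = initial_s
        self.remaining_s = initial_s
        self.counting = False

    def start(self):
        self.remaining_s = self.initial_s
        self.counting = True

    def tick(self, dt_s, start_condition_met):
        """Advance by dt_s seconds; return True when the timeout elapses."""
        if not self.counting:                  # S1601 NO
            return False
        if start_condition_met:                # S1602 YES -> S1603
            self.remaining_s = self.initial_s  # back to the initial value
            return False
        self.remaining_s -= dt_s               # assumed timekeeping step
        if self.remaining_s <= 0:
            self.counting = False
            return True
        return False


timer = TimeoutCountdown(3.0)
timer.start()
timer.tick(1.0, False)
timer.tick(1.0, False)
timer.tick(1.0, True)   # gesture seen again: countdown restarts at 3.0 s
```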

次に、管理部７０４の処理について説明する。図２０は、本開示の実施の形態２に係る管理部７０４が受付可能状態を終了させるときの処理の一例を示すフローチャートである。なお、管理部７０４が受付可能状態を開始させるときの処理は図６と同じであるため、ここでは、説明を省略する。ステップＳ１７０１では、管理部７０４は、メモリ３０９に記憶された状態フラグを参照することで、受付可能状態であるか否かを判定する。受付可能状態であれば（ステップＳ１７０１でＹＥＳ）、処理はステップＳ１７０２に進み、受付可能状態でなければ（ステップＳ１７０１でＮＯ）、処理はステップＳ１７０１に戻る。 Next, the processing of the management unit 704 will be described. FIG. 20 is a flowchart illustrating an example of a process when the management unit 704 according to Embodiment 2 of the present disclosure terminates the receivable state. Note that the processing when the management unit 704 starts the receivable state is the same as that in FIG. 6, and thus the description is omitted here. In step S1701, the management unit 704 refers to the status flag stored in the memory 309 to determine whether the status is the acceptable status. If it is in the receivable state (YES in step S1701), the process proceeds to step S1702. If it is not in the receivable state (NO in step S1701), the process returns to step S1701.

ステップＳ１７０２では、管理部７０４は、終了条件判定部７０３から終了条件が満たされていることを示す判定結果を取得できたか否かを判定する。この判定結果が取得された場合（ステップＳ１７０２でＹＥＳ）、処理はステップＳ１７０５に進み、この判定結果が取得されなかった場合（ステップＳ１７０２でＮＯ）、処理はステップＳ１７０３に進む。 In step S1702, the management unit 704 determines whether the determination result indicating that the termination condition is satisfied has been obtained from the termination condition determination unit 703. If this determination result has been obtained (YES in step S1702), the process proceeds to step S1705. If this determination result has not been obtained (NO in step S1702), the process proceeds to step S1703.

ステップＳ１７０３では、管理部７０４は、タイムアウト判定部７０２からタイムアウト期間が経過したことを示す判定結果を取得したか否かを判定する。この判定結果が取得された場合（ステップＳ１７０３でＹＥＳ）、処理はステップＳ１７０４に進み、この判定結果が取得されない場合（ステップＳ１７０３でＮＯ）、処理はステップＳ１７０１に戻る。 In step S1703, the management unit 704 determines whether a determination result indicating that the timeout period has elapsed has been acquired from the timeout determination unit 702. If this determination result has been obtained (YES in step S1703), the process proceeds to step S1704. If this determination result has not been obtained (NO in step S1703), the process returns to step S1701.

ステップＳ１７０４では、管理部７０４は、無音区間検出部７０５から出力された収音状態が無音であることを示すか否かを判定する。収音状態が無音であることを示せば（ステップＳ１７０４でＹＥＳ）、処理はステップＳ１７０５に進み、収音状態が有音であることを示せば（ステップＳ１７０４でＮＯ）、処理はステップＳ１７０１に戻る。これにより、タイムアウト期間の経過時に無音区間が検出されていれば、受付可能状態が終了され、有音区間が検出されていれば、受付可能状態は継続される。 In step S1704, the management unit 704 determines whether the sound collection state output from the silent section detection unit 705 indicates silence. If the sound collection state indicates silence (YES in step S1704), the process proceeds to step S1705; if it indicates sound (NO in step S1704), the process returns to step S1701. Thus, if a silent section has been detected when the timeout period elapses, the receivable state is ended, and if a sound section has been detected, the receivable state is continued.

ステップＳ１７０５では、管理部７０４は、受付可能状態を終了し、処理をステップＳ１７０１に戻す。 In step S1705, the management unit 704 ends the receivable state, and returns the process to step S1701.
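The decision made in steps S1701 to S1705 can be summarised in one function. The function name and the string values used for the sound collection state are assumptions for illustration.

```python
def should_end_receivable(receivable, end_condition_met, timed_out, sound_state):
    """Return True when the management unit would end the receivable state."""
    if not receivable:                 # S1701 NO: nothing to end
        return False
    if end_condition_met:              # S1702 YES -> S1705
        return True
    if not timed_out:                  # S1703 NO -> back to S1701
        return False
    return sound_state == "silent"     # S1704: end only if silent


# Timeout elapsed while the user is still speaking: the state is kept.
keep = not should_end_receivable(True, False, True, "sound")
```

The ordering matters: the termination gesture ends the state immediately, whereas a timeout ends it only when no speech is currently being collected.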

以上、実施の形態２によれば、ユーザは操作端末１に対して、例えば、手を上げるというような簡易なジェスチャーによって受付可能状態を開始させ、手を下げるというような簡易なジェスチャーによって受付可能状態を終了させることができる。 As described above, according to Embodiment 2, the user can cause the operation terminal 1 to start the receivable state with a simple gesture such as raising a hand, and to end the receivable state with a simple gesture such as lowering the hand.

次に、実施の形態２の変形例について説明する。実施の形態２においても、実施の形態１と同様、図９に示すように、操作端末１に対して複数のユーザがジェスチャーを行う場合、ジェスチャー抽出部７００は、１人の操作者を特定してもよい。この場合、ジェスチャー抽出部７００は、実施の形態１と同様、最も近くにいるユーザを操作者として特定してもよいし、最初に検出されたユーザがジェスチャー可能範囲９０１を出るまで、そのユーザを操作者として特定し続けてもよい。 Next, a modification of Embodiment 2 will be described. In Embodiment 2, as in Embodiment 1, when a plurality of users make gestures toward the operation terminal 1 as shown in FIG. 9, the gesture extraction unit 700 may identify one operator. In this case, as in Embodiment 1, the gesture extraction unit 700 may identify the nearest user as the operator, or may continue to identify the first detected user as the operator until that user leaves the gesture possible range 901.

図１６の例では、撮像装置３０１、収音装置３０７、再生装置６０３、及び表示装置６０２は、操作端末１が備えていたが、本開示はこれに限定されない。例えば、図２１に示すように、撮像装置３０１、再生装置６０３、及び表示装置６０２は、操作端末１とは別の装置で構成されてもよい。 In the example of FIG. 16, the operation terminal 1 includes the imaging device 301, the sound collection device 307, the playback device 603, and the display device 602, but the present disclosure is not limited to this. For example, as illustrated in FIG. 21, the imaging device 301, the playback device 603, and the display device 602 may be configured as devices different from the operation terminal 1.

図２１は、撮像装置３０１、再生装置６０３、及び表示装置６０２を操作端末１とは別の装置で構成した場合の構成の一例を示す図である。図２１において、撮像装置３０１、再生装置６０３、及び表示装置６０２はＬＡＮ等のネットワークを介して相互に通信可能に接続されている。 FIG. 21 is a diagram illustrating an example of a configuration in a case where the imaging device 301, the playback device 603, and the display device 602 are configured by devices different from the operation terminal 1. In FIG. 21, an imaging device 301, a playback device 603, and a display device 602 are communicably connected to each other via a network such as a LAN.

図２１の例では、ユーザＵ１が収音装置３０７を有した操作端末１に対して行ったジェスチャーが、操作端末１とは別体の撮像装置３０１により撮像され、撮像装置３０１は得られた空間情報から音声入力の開始の意思表示を示すジェスチャー又は音声入力の終了の意思表示を示すジェスチャーとを検出し、検出結果を操作端末１に送信する。操作端末１は、撮像装置３０１による検出結果に応じて、収音装置３０７を受付可能状態又は待機状態にする。そして、操作端末１は、状態通知を表示装置６０２及び再生装置６０３に送信し、表示装置６０２及び再生装置６０３から図１０〜図１４に示すような状態通知を出力させる。 In the example of FIG. 21, a gesture made by the user U1 toward the operation terminal 1, which has the sound collection device 307, is imaged by the imaging device 301, which is separate from the operation terminal 1. The imaging device 301 detects, from the obtained spatial information, a gesture indicating the intention to start voice input or a gesture indicating the intention to end voice input, and transmits the detection result to the operation terminal 1. The operation terminal 1 sets the sound collection device 307 to the receivable state or the standby state according to the detection result from the imaging device 301. The operation terminal 1 then transmits a status notification to the display device 602 and the playback device 603, and causes them to output status notifications such as those shown in FIGS. 10 to 14.

また、図２１において、操作端末１、撮像装置３０１、表示装置６０２、及び再生装置６０３は、それぞれ、複数の装置で構成されてもよいし、一つの装置に一体的に構成されていてもよい。また、図２１の例では、収音装置３０７は操作端末１が備えているが、操作端末１とは別体の装置で構成されていてもよい。 In FIG. 21, the operation terminal 1, the imaging device 301, the display device 602, and the playback device 603 may each be composed of a plurality of devices, or may be integrated into a single device. In the example of FIG. 21, the sound collection device 307 is provided in the operation terminal 1, but it may instead be a device separate from the operation terminal 1.

さらに、実施の形態２において、管理部７０４は、受付可能状態において、無音区間が検出された場合、ユーザがジェスチャー可能範囲９０１に居るか否かに拘わらず、音声入力の終了の意思表示を示すジェスチャーをするか否かに拘わらず、或いは、タイムアウト期間が経過するか否かに拘わらず、受付可能状態を終了させてもよい。 Further, in the second embodiment, when a silent section is detected in the receivable state, the management unit 704 indicates the intention to end the voice input regardless of whether the user is in the gesture possible range 901. The receivable state may be ended regardless of whether a gesture is made or whether a timeout period elapses.

本開示によって実現される、操作端末の音声入力の開始および終了をユーザのジェスチャーによって判定する技術は、ユーザが厳密なジェスチャーを覚える必要がなくなり、ジェスチャーによる簡易な音声入力の開始および終了を実現する手法として有用である。 The technology implemented by the present disclosure to determine the start and end of the voice input of the operation terminal by the gesture of the user eliminates the need for the user to memorize a strict gesture, and realizes the simple start and end of the voice input by the gesture. It is useful as a technique.

１：操作端末 1: operation terminal
３００：プロセッサ 300: processor
３０１：撮像装置 301: imaging device
３０２：人検出部 302: human detection unit
３０３：骨格情報抽出部 303: skeleton information extraction unit
３０４：ジェスチャー抽出部 304: gesture extraction unit
３０５：開始条件判定部 305: start condition determination unit
３０６：管理部 306: management unit
３０７：収音装置 307: sound collection device
３０８：収音音声記録部 308: collected voice recording unit
３０９：メモリ 309: memory
５０１：ディスプレイ 501: display
５０２：テキスト 502: text
５０３：アイコン 503: icon
５０４：色 504: color
５０５：発光装置 505: light emitting device
５０６：スピーカ 506: speaker
６０１：管理部 601: management unit
６０２：表示装置 602: display device
６０３：再生装置 603: playback device
７００：ジェスチャー抽出部 700: gesture extraction unit
７０１：開始条件判定部 701: start condition determination unit
７０２：タイムアウト判定部 702: timeout determination unit
７０３：終了条件判定部 703: termination condition determination unit
７０４：管理部 704: management unit
７０５：無音区間検出部 705: silent section detection unit

Claims

An operation terminal operated by a user's uttered voice, comprising:
An imaging unit that images a space;
A person detection unit that detects the user from information on the imaged space;
A voice input unit that receives input of the user's uttered voice;
A coordinate detection unit that, when the person detection unit detects the user, detects, based on information obtained by predetermined means, a first coordinate of a predetermined first part included in an upper limb of the user and a second coordinate of a predetermined second part included in the upper body of the user excluding the upper limb; and
A condition determination unit that compares the positional relationship between the first coordinate and the second coordinate and, when the positional relationship satisfies a predetermined first condition at least once, sets the voice input unit to a receivable state in which voice input can be accepted.

A skeleton information extraction unit configured to extract skeleton information of the user from the information of the space;
The information obtained by the predetermined means is the skeleton information,
The operation terminal according to claim 1.

The operation terminal according to claim 1, wherein the imaging unit is a visible light camera, an infrared camera, a TOF sensor, an ultrasonic sensor, or a radio wave sensor.

The positional relationship is a positional relationship between the first coordinates and the second coordinates in the vertical direction,
The operation terminal according to claim 1.

The positional relationship is a positional relationship between the first coordinates and the second coordinates in the trunk axis direction of the user,
The operation terminal according to claim 1.

The coordinate detection unit further detects a third coordinate of a third part in the upper body,
The first condition is that the angle formed by the first coordinate, the second coordinate, and the third coordinate exceeds a predetermined threshold, is lower than the predetermined threshold, or falls within a predetermined range.
The operation terminal according to claim 1.

The first portion includes a plurality of portions included in the upper limb,
The first coordinates are determined based on coordinates of any one or more of the plurality of parts,
The operation terminal according to claim 1.

The second part includes a plurality of parts included in the upper body excluding the upper limb,
The second coordinates are determined based on any one or more coordinates of the plurality of parts,
The operation terminal according to claim 1.

The first condition includes a plurality of second conditions,
The condition determination unit sets the receivable state when the positional relationship satisfies at least one of the plurality of second conditions, or a third condition obtained by combining a part of the plurality of second conditions.
An operation terminal according to any one of claims 1 to 8.

Further comprising a display unit or a playback unit that outputs information indicating whether the voice input unit is in the receivable state,
The operation terminal according to claim 1.

The display unit is a display,
The operation terminal according to claim 10.

The operation terminal according to claim 11, wherein the information indicating whether the voice input unit is in the receivable state is a color, a text, or an icon.

The display unit is a light emitting device that emits light indicating that the voice input unit is in the acceptable state,
The operation terminal according to claim 10.

The playback unit outputs a sound indicating whether the sound input unit is in the receivable state,
The operation terminal according to claim 10.

The playback unit outputs a sound indicating whether the voice input unit is in the receivable state,
The operation terminal according to claim 10.

The condition determination unit compares the positional relationship only when the distance between the operation terminal and the user satisfies a predetermined fourth condition,
The operation terminal according to claim 1.

The condition determination unit terminates the receivable state when a silent section continues for a predetermined time in the receivable state,
An operation terminal according to any one of claims 1 to 16.

The condition determination unit continues the receivable state as long as the positional relationship satisfies the first condition in the receivable state,
An operation terminal according to any one of claims 1 to 17.

The condition determination unit, when the state in which the positional relationship does not satisfy the first condition in the receivable state continues for a predetermined timeout period, ends the receivable state,
An operation terminal according to any one of claims 1 to 18.

The condition determination unit extends the timeout period if it determines, during the timeout period, that the positional relationship satisfies the first condition,
The operation terminal according to claim 19.

The condition determination unit continues the receivable state if a voice input is detected at the end of the timeout period,
The operation terminal according to claim 18.

The condition determination unit ends the receivable state when the positional relationship satisfies a predetermined fifth condition different from the first condition,
The operation terminal according to claim 1.

The condition determination unit, when the human detection unit detects a plurality of users, recognizes a specific one as an operator of the operation terminal,
An operation terminal according to claim 1.

The operator is a user closest to the operation terminal among the plurality of users,
The operation terminal according to claim 23.

A voice input method in an operation terminal operated by a user's uttered voice, comprising:
Obtaining information on a space imaged by an imaging device;
Detecting the user from the information on the space;
When the user is detected, detecting, based on information obtained by predetermined means, a first coordinate of a predetermined first part included in an upper limb of the user and a second coordinate of a predetermined second part included in the upper body of the user excluding the upper limb; and
Comparing the positional relationship between the first coordinate and the second coordinate and, when the positional relationship satisfies a predetermined first condition at least once, setting a voice input unit to a receivable state in which voice input can be accepted.

A program for causing a computer to execute the voice input method according to claim 25.