JP2017011321A

JP2017011321A - Detection device, detection system, detection method, and program

Info

Publication number: JP2017011321A
Application number: JP2015121246A
Authority: JP
Inventors: 健士岩本; Takeshi Iwamoto
Original assignee: Casio Computer Co Ltd
Current assignee: Casio Computer Co Ltd
Priority date: 2015-06-16
Filing date: 2015-06-16
Publication date: 2017-01-12
Anticipated expiration: 2035-06-16
Also published as: JP6500625B2

Abstract

PROBLEM TO BE SOLVED: To detect the timing of a user's utterance in advance.
SOLUTION: A detection unit 121 of a detection device 100 detects a physiological behavior of the user that precedes the utterance. Specifically, the detection unit 121 detects the throat tremor that precedes the user's utterance by comparing a frequency waveform, obtained by frequency conversion of a waveform indicating the change over time in the magnitude of the user's throat tremor, with a feature waveform, learned in advance, that indicates the frequency characteristics preceding utterance. As predetermined processing executed on the basis of the detection result, a specifying unit 122 specifies time information indicating the time remaining until the user's utterance.
SELECTED DRAWING: Figure 2

Description

The present invention relates to a detection device, a detection system, a detection method, and a program.

In general, cameras and similar devices use techniques that track a user's face by image recognition.
More recently, cameras and similar devices have also used techniques that, triggered by a sound, track the source of that sound (for example, a user who has spoken) (see, for example, Patent Document 1).

Patent Document 1: Japanese Unexamined Patent Application Publication No. H9-135432 (JP-A-9-135432)

When a camera captures a user speaking in real time, however, the tracking techniques described above may capture only what happens after the user has already started speaking. In other words, tracking that is triggered by the utterance itself can miss the moment at which the user starts to speak.
For this reason, a technique for detecting the timing of a user's utterance in advance is desired.

The present invention has been made in view of the circumstances described above, and an object thereof is to detect the timing of a user's utterance in advance.

To achieve the above object, one aspect of the present invention provides a detection device comprising:
detection means for detecting a physiological behavior that precedes a user's utterance; and
execution means for executing predetermined processing based on a detection result of the detection means.

According to the present invention, the timing of a user's utterance can be detected in advance.

FIG. 1 is a diagram showing a usage example of the detection system according to the embodiment.
FIG. 2 is a block diagram showing the configuration of the detection device according to the embodiment.
FIG. 3 is a waveform diagram showing an example of the change over time in the magnitude of the throat tremor before and after utterance.
FIG. 4 is a diagram obtained by Fourier transforming the waveform diagram of FIG. 3.
FIG. 5 is a block diagram showing the configuration of the tracking device according to the embodiment.
FIG. 6 is a diagram showing an example of the tracking table.
FIG. 7 is a diagram showing an example of a flowchart of the detection process.
FIG. 8 is a diagram showing an example of a flowchart of the tracking process.
FIGS. 9(A) and 9(B) are diagrams showing examples of the user position before tracking.
FIG. 10 is a diagram showing an example of the user position after tracking.
FIG. 11 is a diagram showing another usage example of the imaging system according to the embodiment.
FIG. 12 is a diagram showing another example of the tracking table.
FIG. 13 is a block diagram showing the configuration of the detection device according to Modification 1.
FIG. 14 is a block diagram showing the configuration of the tracking device according to Modification 2.
FIG. 15 is a diagram showing a usage example of the detection system according to Modification 3.
FIG. 16 is a block diagram showing the configuration of the recording device according to Modification 3.
FIG. 17 is a block diagram showing the configuration of the detection device according to Modification 4.

Embodiments of the present invention will be described below with reference to the drawings.
FIG. 1 is a diagram showing the configuration of a detection system 10 according to an embodiment of the present invention. The detection system 10 includes a detection device (sensing device) 100 and a tracking device (web camera) 200. The detection device 100 is a wearable device worn by the user. In this embodiment, because the device detects the throat tremor that precedes the user's utterance, the detection device 100 is an accessory worn around the neck, for example a necklace.

The basic technical idea of the detection system 10 is that the detection device 100 detects the throat tremor that precedes the user's utterance. Any detection method may be used. In this embodiment, as one example, the throat tremor is detected by comparing a frequency waveform, obtained by frequency conversion (Fourier transform) of a waveform indicating the change over time in the magnitude of the user's throat tremor, with a feature waveform, learned in advance, that indicates the frequency characteristics before utterance.
After this detection, the detection device 100 wirelessly transmits a user ID. The tracking device 200 then tracks the user identified by the received user ID. This is the flow of automatic tracking.

To simplify the explanation, this embodiment is described for the case of a single user. It is also assumed that the user stays within the angle of view of the tracking device 200, that is, the user is not out of frame.
It is further assumed that the video captured by the tracking device 200 is transferred in real time to a PC (Personal Computer) 300 and that a live view is displayed on a display 301 of the PC 300. The user in FIG. 1 is referred to as user A; where user A need not be specifically identified, the generic term "user" is used.

Each device constituting the detection system 10 (the detection device 100 and the tracking device 200) is described below in turn.

First, the configuration of the detection device 100 is described with reference to FIG. 2. The detection device 100 includes a gyro sensor 110, a control unit 120, a storage unit 130, and a wireless communication I/F (interface) 140.

The gyro sensor 110 is a three-axis gyro sensor for detecting rotation angles (pitch, yaw, and roll). That is, the gyro sensor 110 can detect the tilt of the detection device 100 relative to its stationary state, and in this embodiment it detects the angle by which the detection device 100 tilts in response to the throat tremor.

The control unit 120 includes, for example, a CPU (Central Processing Unit), a ROM (Read Only Memory), and a RAM (Random Access Memory). By performing control in accordance with a program stored in the ROM, the control unit 120 implements the functions of its units (the detection unit 121, the specifying unit 122, and the transmission unit 123). The hardware of the control unit 120 is of course not limited to a CPU and the like, and may instead be a small IC (Integrated Circuit) or similar component sized to fit the necklace.

The storage unit 130 is a non-volatile memory (for example, a flash memory) and stores a pre-utterance waveform 131. The pre-utterance waveform 131 is a waveform learned in advance in order to detect the throat tremor that precedes the user's utterance. Specifically, the pre-utterance waveform 131 is the frequency waveform shown in FIG. 4, obtained by Fourier transforming the waveform shown in FIG. 3, which indicates the change over time in the magnitude of the throat tremor before and after utterance.

In the waveform of FIG. 3, the horizontal axis represents time (t) and the vertical axis represents the magnitude of the throat tremor (dB). During the period before utterance (t2 - t1), the magnitude of the throat tremor is smaller than after utterance. Because the magnitude of the throat tremor is proportional to sound pressure, it is expressed in dB. Fourier transforming the waveform of FIG. 3 yields the pre-utterance waveform 131 (frequency waveform) shown in FIG. 4. In the pre-utterance waveform 131 of FIG. 4, the horizontal axis represents frequency (Hz) and the vertical axis represents the magnitude of the sound pressure, that is, of the throat tremor (dB). As shown in the figure, before utterance a feature waveform is obtained whose magnitude (dB) is smaller than the magnitude after utterance across a specific frequency band T (the band from T1 to T2). This feature waveform indicates the frequency characteristics before utterance.
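
The step from a time waveform such as FIG. 3 to a frequency waveform such as FIG. 4 can be sketched as follows. This is only an illustration under assumptions: the sampling rate, the window length, the plain discrete Fourier transform, and the names `magnitude_spectrum_db` and `ref` do not come from the embodiment.

```python
import cmath
import math

def magnitude_spectrum_db(samples, sample_rate_hz, ref=1.0):
    """Return (frequencies_hz, magnitudes_db) for a real-valued window.

    Plain O(n^2) DFT over the non-negative frequency bins; a real device
    would more likely use an FFT routine. `ref` is an assumed reference
    amplitude for the dB scale.
    """
    n = len(samples)
    freqs, mags_db = [], []
    for k in range(n // 2 + 1):
        # k-th DFT coefficient of the window
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(samples))
        amplitude = abs(coeff) / n
        freqs.append(k * sample_rate_hz / n)
        # Floor tiny amplitudes so that log10 never receives zero
        mags_db.append(20.0 * math.log10(max(amplitude, 1e-12) / ref))
    return freqs, mags_db

# Example: a 50 Hz tremor sampled at 400 Hz for 0.1 s (40 samples)
window = [math.sin(2 * math.pi * 50 * i / 400) for i in range(40)]
freqs, mags = magnitude_spectrum_db(window, 400)
peak_hz = freqs[max(range(len(mags)), key=mags.__getitem__)]
print(peak_hz)  # 50.0
```

With these assumed numbers the bins are 10 Hz apart, so the 50 Hz component falls exactly on one bin and dominates the spectrum.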

The throat tremor that precedes the user's utterance is detected by monitoring for the presence of the feature waveform of this previously learned pre-utterance waveform 131. In addition to the pre-utterance waveform 131, the storage unit 130 stores the device ID of its own device (the detection device 100). In this embodiment, the detection device 100 is worn by the user and is associated with the user on a one-to-one basis, so the device ID is effectively a user ID. In the following description, the user ID of user A in FIG. 1 is "A". The storage unit 130 may of course store the user ID directly instead of the device ID.

Returning to FIG. 2, the wireless communication I/F 140 is a communication interface for wireless communication with an external device. In this embodiment, short-range wireless communication based on Bluetooth (registered trademark) is used. In particular, from the viewpoint of the battery life of the detection device 100, short-range wireless communication based on BLE (Bluetooth Low Energy), which has low power consumption, is preferable.

Next, the functions of the control unit 120 are described.
The control unit 120 functionally includes the detection unit 121, the specifying unit 122, and the transmission unit 123.

The detection unit 121 detects a physiological behavior that precedes the user's utterance (in this embodiment, as one example, the throat tremor before utterance). Specifically, the detection unit 121 detects the throat tremor before utterance by comparing a frequency waveform, obtained by Fourier transforming the measured values detected in real time by the gyro sensor 110 (measured values of the magnitude of the throat tremor), with the feature waveform, learned in advance, that indicates the frequency characteristics before utterance.
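
The comparison performed by the detection unit 121 might look like the sketch below. The description does not specify the comparison rule, so the mean absolute difference in dB inside the band T1 to T2, the tolerance value, and the function name `matches_feature_waveform` are all assumptions made for illustration.

```python
def matches_feature_waveform(freqs_hz, measured_db, feature_db,
                             band_hz, tolerance_db=3.0):
    """Return True if the measured spectrum follows the learned feature
    waveform inside the band (T1, T2).

    Assumption: similarity is the mean absolute difference in dB over the
    bins inside the band; the source does not specify the comparison rule.
    """
    t1, t2 = band_hz
    diffs = [abs(m - f)
             for hz, m, f in zip(freqs_hz, measured_db, feature_db)
             if t1 <= hz <= t2]
    if not diffs:  # no bins inside the band: nothing to compare
        return False
    return sum(diffs) / len(diffs) <= tolerance_db

freqs = [0, 10, 20, 30, 40, 50]
feature = [-60, -30, -28, -29, -60, -60]      # learned pre-utterance shape
quiet = [-60, -60, -61, -60, -60, -60]        # throat at rest
pre_speech = [-58, -31, -27, -30, -59, -61]   # close to the learned shape

print(matches_feature_waveform(freqs, quiet, feature, (10, 30)))       # False
print(matches_feature_waveform(freqs, pre_speech, feature, (10, 30)))  # True
```

Restricting the comparison to the band keeps bins outside T1 to T2 from influencing the decision; other similarity measures would fit the description equally well.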

Next, the specifying unit 122 executes predetermined processing based on the detection result of the detection unit 121. In this embodiment, as the predetermined processing, when the detection unit 121 detects the throat tremor before utterance, the specifying unit 122 specifies time information indicating the time remaining until utterance. Specifically, the specifying unit 122 specifies, as the time information, the pre-utterance period t2 - t1 (seconds) shown in FIG. 3 during which the throat trembles. For example, the time information is 0.5 seconds. The value of the time information may be corrected based on the time the detection unit 121 took to make the detection. For example, if detection took 0.1 seconds, the corrected time information is 0.4 seconds (0.5 - 0.1). The specifying unit 122 functions as the execution means.
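
The correction of the time information can be written as a one-line calculation. The function name and the clamp at zero are assumptions added for this sketch; the numbers are the ones used in the example above.

```python
def corrected_time_to_utterance(pre_utterance_period_s, detection_latency_s):
    """Time remaining until utterance once the detection latency is deducted.

    Clamped at zero (an assumption) so that a slow detection never yields
    a negative waiting time.
    """
    return max(0.0, pre_utterance_period_s - detection_latency_s)

remaining = corrected_time_to_utterance(0.5, 0.1)
print(round(remaining, 3))  # 0.4
```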

Next, when the detection unit 121 detects the throat tremor before utterance, the transmission unit 123 wirelessly transmits the user's user ID and the time information specified by the specifying unit 122 over BLE via the wireless communication I/F 140. In this embodiment, the transmission unit 123 wirelessly transmits the user ID "A" of user A and the time information indicating the time remaining until utterance (for example, 0.5 seconds).
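
One way to package the user ID and the time information into a small payload is sketched below. The byte layout is hypothetical and is not taken from the description or from any Bluetooth profile; the actual BLE transmission is outside the scope of the sketch.

```python
import struct

# Hypothetical wire format: 1-byte ID length, UTF-8 user ID, then the
# time to utterance in milliseconds as an unsigned 16-bit integer.
def encode_notification(user_id: str, time_to_utterance_s: float) -> bytes:
    id_bytes = user_id.encode("utf-8")
    if len(id_bytes) > 255:
        raise ValueError("user ID too long for a 1-byte length prefix")
    millis = round(time_to_utterance_s * 1000)
    if not 0 <= millis <= 0xFFFF:
        raise ValueError("time to utterance out of range")
    return struct.pack("<B", len(id_bytes)) + id_bytes + struct.pack("<H", millis)

def decode_notification(payload: bytes):
    """Inverse of encode_notification: return (user_id, seconds)."""
    id_len = payload[0]
    user_id = payload[1:1 + id_len].decode("utf-8")
    (millis,) = struct.unpack_from("<H", payload, 1 + id_len)
    return user_id, millis / 1000.0

packet = encode_notification("A", 0.5)
print(decode_notification(packet))  # ('A', 0.5)
```

A compact payload of this kind suits a low-power link, which is in keeping with the preference for BLE stated above.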

The configuration of the detection device 100 has been described above. The configuration of the tracking device 200 is described next. The tracking device 200 includes a wireless communication I/F 210, a control unit 220, an imaging unit 230, a storage unit 240, a servo motor 250, an operation unit 260, an external I/F (interface) 270, and a microphone 280.

The wireless communication I/F 210 is a communication interface for wireless communication with an external device (in this embodiment, the detection device 100).
The control unit 220 includes, for example, a CPU, a ROM, and a RAM. By performing control in accordance with a program stored in the ROM, the control unit 220 implements the functions of its units (the receiving unit 221, the tracking unit 222, and the execution unit 223).

The imaging unit 230 is a camera that includes an image sensor for photographing a subject, such as a CCD (Charge Coupled Device) or CMOS (Complementary Metal Oxide Semiconductor) sensor, and an optical system (lens, diaphragm, shutter, and so on).

The storage unit 240 is a non-volatile memory or the like, and stores recorded video and audio, the tracking table described later, and other data.
The servo motor 250 is a motor whose position and speed are controlled by a servo mechanism. The servo motor 250 allows the imaging unit 230 of the tracking device 200 to rotate up, down, left, and right.
The operation unit 260 consists of various buttons, such as a power button.

The external I/F 270 includes a USB (Universal Serial Bus) connector or the like. The external I/F 270 transfers the video being captured in real time to the PC 300, which is an external device, via a USB cable.
The microphone 280 is an audio input unit that picks up external sound.

Next, the functions of the control unit 220 are described.
The control unit 220 functionally includes the receiving unit 221, the tracking unit 222, and the execution unit 223.

The receiving unit 221 receives the user ID and the time information wirelessly transmitted from the transmission unit 123 of the detection device 100. In this embodiment, the receiving unit 221 receives the wirelessly transmitted user ID "A" and time information.

The tracking unit 222 tracks the user identified by the user ID received by the receiving unit 221. In this embodiment, the tracking unit 222 tracks user A, whose user ID is "A". Any tracking method may be used; in this embodiment, as one example, the user is tracked by matching against a face image. In this case, as shown in the tracking table of FIG. 6, user IDs and users' face images are stored in advance in association with each other. For example, a face image A1 of user A is stored in association with the user ID "A" as the template image used for matching.

Here, the tracking unit 222 refers to the tracking table and identifies the face image A1 associated with the user ID "A". Using the face image A1, the tracking unit 222 recognizes the face of user A through the imaging unit 230 and then controls the servo motor 250 so that the imaging unit 230 points toward the face of user A, thereby tracking user A.
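
The lookup in the tracking table can be sketched as a simple mapping from user ID to template image. The class name, the string stand-ins for images, and the second entry "B1" are hypothetical; only the pairing of user ID "A" with face image A1 comes from the description.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class TrackingEntry:
    user_id: str
    face_template: str  # stands in for the stored face image, e.g. "A1"

# Tracking table in the spirit of FIG. 6; the second entry is illustrative
TRACKING_TABLE = {
    "A": TrackingEntry("A", "A1"),
    "B": TrackingEntry("B", "B1"),
}

def template_for(user_id: str) -> Optional[str]:
    """Return the face template registered for user_id, or None if unknown."""
    entry = TRACKING_TABLE.get(user_id)
    return entry.face_template if entry else None

print(template_for("A"))  # A1
print(template_for("Z"))  # None
```

Returning `None` for an unregistered ID lets the caller ignore notifications from devices that the tracking device does not know.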

Based on the time remaining until utterance indicated by the time information, the execution unit 223 executes, at the timing of the utterance, predetermined processing related to the user being tracked by the tracking unit 222. As one example of the predetermined processing, the execution unit 223 starts recording video of user A as the subject at the timing of the utterance. As a result, recording of the tracked user A begins in the live view shown on the display 301 of the PC 300. The recorded video may be stored in the storage unit 240 of the tracking device 200 or in a storage unit of the PC 300.
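
Starting the recording at the announced utterance timing amounts to a delayed call. The sketch below uses a standard-library timer; `schedule_recording` and the callback are hypothetical names, and a real device would drive its own recorder rather than append to a list.

```python
import threading

def schedule_recording(time_to_utterance_s, start_recording):
    """Call start_recording() once the announced time to utterance elapses.

    Returns the timer so that the caller can cancel it if needed.
    """
    timer = threading.Timer(max(0.0, time_to_utterance_s), start_recording)
    timer.daemon = True
    timer.start()
    return timer

events = []
timer = schedule_recording(0.05, lambda: events.append("recording started"))
timer.join()  # wait for the timer in this demo only
print(events)  # ['recording started']
```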

The devices constituting the detection system 10 (the detection device 100 and the tracking device 200) have been described above with reference to FIGS. 2 to 6. The processing performed by each device is described below in turn.

First, the detection process performed by the detection device 100 is described with reference to FIG. 7. The detection process starts when the user wearing the detection device 100 turns on the power of the gyro sensor 110. User A is used as the example below where appropriate.

First, the detection unit 121 determines whether a physiological behavior has been detected (step S11). Specifically, in the manner described above, the detection unit 121 detects the throat tremor of user A by comparing a frequency waveform, obtained by frequency conversion (Fourier transform) of a waveform indicating the change over time in the magnitude of user A's throat tremor, with the feature waveform, learned in advance, that indicates the frequency characteristics before utterance.

The detection unit 121 waits until it detects a physiological behavior (step S11; No). When it detects a physiological behavior, that is, the throat tremor of user A (step S11; Yes), the process proceeds to step S12.

In step S12, the specifying unit 122 specifies the time remaining until utterance. Specifically, in the manner described above, the specifying unit 122 specifies, as the time information, the period t2 - t1 (seconds) during which the throat trembles in the waveform shown in FIG. 3.

Next, the transmission unit 123 wirelessly transmits the user ID and the specified time information (step S13). Specifically, in the manner described above, the transmission unit 123 wirelessly transmits the user ID "A" of user A and the time information specified by the specifying unit 122 over BLE via the wireless communication I/F 140. After step S13, the detection process ends.
The detection device 100 repeats the above detection process each time a physiological behavior is detected.
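
Steps S11 to S13 can be summarized as a loop over incoming sensor windows. In this sketch the detector and the transmitter are passed in as plain callables; those parameters, the numeric frames, and the toy threshold are hypothetical stand-ins for the detection unit 121 and the transmission unit 123.

```python
def detection_process(frames, is_pre_utterance, user_id,
                      time_to_utterance_s, send):
    """Run steps S11 to S13 for each frame of sensor data.

    frames: iterable of sensor windows (hypothetical representation)
    is_pre_utterance: predicate standing in for the detection unit 121
    send: callable standing in for the transmission unit 123
    """
    for frame in frames:
        if not is_pre_utterance(frame):   # S11: No, keep waiting
            continue
        time_info = time_to_utterance_s   # S12: specify the time information
        send(user_id, time_info)          # S13: transmit ID and time

sent = []
detection_process(
    frames=[0.1, 0.2, 0.9, 0.1, 0.8],
    is_pre_utterance=lambda level: level > 0.5,  # toy threshold detector
    user_id="A",
    time_to_utterance_s=0.5,
    send=lambda uid, t: sent.append((uid, t)),
)
print(sent)  # [('A', 0.5), ('A', 0.5)]
```

Two of the five frames exceed the toy threshold, so two notifications are sent, which mirrors the statement that the process repeats each time the behavior is detected.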

Next, the tracking process performed by the tracking device 200 is described with reference to FIG. 8. The tracking process waits until the receiving unit 221 receives a user ID and time information (step S21; No) and starts when the user ID and the time information are received (step S21; Yes).
When the user ID and the time information are received, the tracking unit 222 tracks the user identified by the user ID (step S22). Specifically, in the manner described above, the tracking unit 222 identifies the face image A1 corresponding to the user ID "A" from the tracking table and tracks user A based on the identified face image A1.

Suppose that, before tracking, user A appears at the position shown in FIG. 9(A) or FIG. 9(B) on the display 301 of the PC 300. In this case, as shown in FIG. 10, the tracking unit 222 controls the servo motor 250 and the optical system of the imaging unit 230 so that user A appears at a predetermined size in the center of the display 301. In this way, the tracking unit 222 tracks the movement of user A before user A speaks.
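
Bringing the user to the center of the frame at a predetermined size can be reduced to computing pan, tilt, and zoom corrections from the detected face position. The normalized units, the target size ratio, and the function name below are assumptions; the description does not state how the servo motor 250 or the optical system is commanded.

```python
def framing_correction(face_box, frame_size, target_height_ratio=0.4):
    """Return (pan, tilt, zoom) needed to centre a face.

    face_box: (x, y, width, height) in pixels, origin at top-left
    frame_size: (frame_width, frame_height) in pixels
    pan and tilt are normalised offsets in [-1, 1]; positive pan means the
    face is right of centre, positive tilt means it is below centre.
    zoom > 1 means the face should be enlarged.
    """
    x, y, w, h = face_box
    frame_w, frame_h = frame_size
    face_cx, face_cy = x + w / 2, y + h / 2
    pan = (face_cx - frame_w / 2) / (frame_w / 2)
    tilt = (face_cy - frame_h / 2) / (frame_h / 2)
    zoom = (target_height_ratio * frame_h) / h
    return pan, tilt, zoom

# Face in the upper-left corner of a 640x480 frame
print(framing_correction((0, 0, 160, 96), (640, 480)))  # (-0.75, -0.8, 2.0)
```

A controller would apply these corrections repeatedly until pan and tilt are close to zero and zoom is close to one.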

Next, the execution unit 223 executes the predetermined processing at the timing of the utterance (step S23). Specifically, in the manner described above, the execution unit 223 starts recording video of user A as the subject at the timing of the utterance, based on the time remaining until utterance indicated by the time information. After step S23, the tracking process ends. The tracking unit 222 continues to track user A even after the tracking process ends.

As described above, in the detection system 10 of this embodiment, when the detection device 100 detects the throat tremor that precedes the user's utterance, it wirelessly transmits that user's user ID, and the tracking device 200 that receives the user ID tracks the user identified by it. Tracking therefore does not start later than the timing of the user's utterance, and the system avoids missing the recording of the moment at which the user starts to speak.
In addition, the detection device 100 detects the user's throat tremor by comparing the feature waveform of the previously learned pre-utterance waveform 131 with the frequency waveform obtained by frequency conversion in real time. The physiological behavior before utterance can therefore be detected accurately.

In this embodiment, video recording starts at the timing of the user's utterance, but the invention is not limited to this. For example, the user's voice may be recorded at the timing of the user's utterance. This avoids starting the audio recording after user A has already spoken, so the recording does not miss the moment at which the user speaks.

The device may also switch to a shooting mode specific to the user at the timing of the user's utterance. For example, user A registers in advance, in the tracking table of the tracking device 200, a setting that associates the user ID "A" with a shooting mode. This makes it possible to switch, at the timing of user A's utterance, to the shooting mode that user A prefers (for example, a makeup mode that applies image correction to make user A's face look brighter and smoother). Because the system can be customized to each user's personal preferences, user satisfaction can be improved.

The embodiment above has been described for the case of a single user, but the invention is not limited to this. A usage example of the detection system 10 with multiple users is described below with reference to FIG. 11.

FIG. 11 assumes a scene in which several people (three users, A to C) hold a live chat such as a meeting. User A and user B are at the same location, user C is at a remote location, and they hold a live chat using the detection system 10 over the Internet or the like. The video captured by a tracking device 200A is shown in real time on a PC 400 of user C.

Suppose that user A and user B speak alternately. In this case, a detection device 100A detects the throat tremor that precedes user A's utterance and wirelessly transmits the user ID "A". The tracking device 200A receives the user ID "A" and tracks user A. User C can therefore watch, on the display of the PC 400, the moment at which user A starts to speak without missing it.

Similarly, when a detection device 100B detects the throat tremor that precedes user B's utterance, it wirelessly transmits the user ID "B". The tracking device 200A receives the user ID "B" and tracks user B. User C can therefore likewise watch, on the display of the PC 400, the moment at which user B starts to speak without missing it.
In this usage example, the participants in a live chat such as a meeting can see the moment at which each of them starts to speak, which reduces miscommunication in the conversation. The detection system 10 is thus well suited to use by several people. In the example of FIG. 11, it is not essential to execute the predetermined processing (such as starting video recording in the embodiment described above) at the timing of the utterance.

Use is not limited to the example of FIG. 11. For example, if each performer in a live television show wears a detection device 100, a video camera serving as the tracking device 200 can automatically begin tracking a performer before that performer starts to speak and capture the moment of speaking.

In the above-described embodiment, the tracking device 200 identifies a face image by referring to the tracking table and then tracks the user by face tracking, but the invention is not limited to this. For example, as shown in FIG. 12, a shape image of the necklace may be stored in the tracking table, and tracking may be performed based on that shape image.

In this case, the necklace shape differs for each detection device 100. For example, the detection device 100A has an elliptical shape A2, the detection device 100B a star shape B2, and the detection device 100C a heart shape C2. Suppose that the receiving unit 221 of the tracking device 200 receives the user IDA. The tracking unit 222 then refers to the tracking table to identify the elliptical necklace shape A2 and tracks user A by tracking A2.
With this arrangement, even if half of user A's face is out of the frame, user A can be tracked as long as the necklace is in view, and tracking accuracy can be improved by combining the necklace shape with the face image.
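The tracking-table lookup described above can be illustrated with a minimal Python sketch. It is not the implementation in this publication: the key format "IDA", the entries B1 and C1, the class and function names, and the two visibility flags are all assumptions introduced for illustration. Only the face image A1 and the necklace shapes A2, B2, and C2 come from the description.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TrackingEntry:
    face_image: str      # identifier of a stored face image (e.g. "A1")
    necklace_shape: str  # identifier of a stored necklace shape image (e.g. "A2")


# Hypothetical tracking table keyed by the received user ID.
TRACKING_TABLE = {
    "IDA": TrackingEntry("A1", "A2"),  # elliptical necklace
    "IDB": TrackingEntry("B1", "B2"),  # star-shaped necklace
    "IDC": TrackingEntry("C1", "C2"),  # heart-shaped necklace
}


def select_targets(user_id: str, face_visible: bool, necklace_visible: bool) -> List[str]:
    """Return the templates usable for tracking the user with this ID.

    Both cues are returned when both are in view, so that they can be
    combined; an unknown user ID yields an empty list.
    """
    entry = TRACKING_TABLE.get(user_id)
    if entry is None:
        return []
    targets = []
    if face_visible:
        targets.append(entry.face_image)
    if necklace_visible:
        targets.append(entry.necklace_shape)
    return targets
```

With this sketch, a call such as select_targets("IDA", False, True) falls back to the necklace shape alone, which corresponds to the case where the face is partly out of the frame.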

This concludes the description of the embodiment. The above embodiment is only an example, and the configurations of the detection device 100 and the tracking device 200 and the processing performed by each device are of course not limited to those described above.

(Modification 1)
The above-described embodiment assumed that the user is within the angle of view of the tracking device 200, that is, that the user has not moved out of the frame. In practice, however, the user may move out of the frame. Modification 1 therefore describes a detection system that handles this case. FIG. 13 shows the configuration of a detection device 100' according to Modification 1. The detection device 100' differs from the detection device 100 of the above-described embodiment in that it includes a GPS (Global Positioning System) 111 and in that the specifying unit 122 specifies a position. The following description focuses on these differences.

The GPS 111 is a GPS receiver that acquires position information such as latitude and longitude.
When the detection unit 121 detects the user's throat tremor, the specifying unit 122 specifies the user's position information based on the GPS 111. For example, the specifying unit 122 specifies the latitude X and longitude Y of user A in FIG. 1 based on the GPS 111. The transmission unit 123 wirelessly transmits the user IDA and the specified position information.

Meanwhile, the tracking unit 222 of the tracking device 200 controls the orientation of the imaging unit 230 based on the user's position information to search for the user, and then tracks the user it has found. For example, when user A is out of the frame, the tracking unit 222 points the imaging unit 230 toward the latitude and longitude indicated by the received position information and searches for user A. After the search, the tracking unit 222 tracks user A based on user A's face image A1.
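One way to turn a received latitude and longitude into a pan command is sketched below in Python. The description does not say how the orientation is computed, so this is only an illustration under stated assumptions: the tracking device is assumed to know its own latitude, longitude, and current compass heading, and the function names are hypothetical. The bearing uses the standard great-circle initial-bearing formula.

```python
import math


def initial_bearing_deg(cam_lat, cam_lon, user_lat, user_lon):
    """Compass bearing (0 = north, 90 = east) from the camera to the user."""
    phi1, phi2 = math.radians(cam_lat), math.radians(user_lat)
    dlon = math.radians(user_lon - cam_lon)
    x = math.sin(dlon) * math.cos(phi2)
    y = math.cos(phi1) * math.sin(phi2) - math.sin(phi1) * math.cos(phi2) * math.cos(dlon)
    return (math.degrees(math.atan2(x, y)) + 360.0) % 360.0


def pan_offset_deg(current_heading_deg, target_bearing_deg):
    """Signed pan angle in [-180, 180); positive means turn clockwise."""
    return (target_bearing_deg - current_heading_deg + 180.0) % 360.0 - 180.0
```

The signed offset keeps the pan on the shorter side: a camera heading of 350 degrees and a target bearing of 10 degrees gives a 20 degree clockwise turn rather than a 340 degree counterclockwise one.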

As described above, according to Modification 1, even when the user has moved out of the frame, the user can be searched for and tracked based on the position information, which improves tracking accuracy. The specifying unit 122 may specify the position information in addition to the time information indicating the time remaining until the utterance. This makes it possible to execute the predetermined processing at the utterance timing while searching for and tracking the user.

Modification 1 was described for a single user, but it is not limited to this. For example, the detection device 100' according to Modification 1 may be applied when there are multiple users, such as in a meeting. In this case, each of the users (for example, users A to C) wears a detection device 100', which wirelessly transmits the position information and the user ID before the utterance. The tracking device 200 can then search for each user based on the position information, even if that user is out of the frame, and start tracking before the utterance.

(Modification 2)
In Modification 1 described above, the detection device 100' specifies the position information and transmits it wirelessly in case the user moves out of the frame, but the invention is not limited to this. Modification 2 describes a tracking device 200' that estimates the user's position before the utterance and searches for the user. FIG. 14 shows the tracking device 200' according to Modification 2. The tracking device 200' differs from the tracking device 200 of the above-described embodiment in that it includes an estimation unit 224. The following description focuses on this difference.

The estimation unit 224 estimates the position information of the user identified by a user ID based on the received signal strength (RSSI: Received Signal Strength Indicator) of the user ID received by the receiving unit 221 and the direction from which the user ID was wirelessly transmitted. Specifically, the estimation unit 224 estimates, from the received signal strength, the distance to user A wearing the detection device 100. For this estimation, the tracking device 200' may store in advance a table that associates RSSI values with distances and refer to that table.
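The table-based distance estimate can be sketched as follows. The RSSI and distance values are hypothetical placeholders, not values from this publication, and linear interpolation between table rows is an assumption; the description only states that a table associating RSSI values with distances is stored in advance.

```python
# Hypothetical calibration table: (RSSI in dBm, distance in metres),
# ordered from the strongest signal (nearest) to the weakest (farthest).
RSSI_TO_DISTANCE = [(-40.0, 1.0), (-55.0, 3.0), (-70.0, 10.0), (-85.0, 30.0)]


def estimate_distance_m(rssi_dbm: float) -> float:
    """Estimate the distance to the transmitter from a received RSSI value.

    Values outside the table are clamped to the nearest or farthest entry;
    values between two rows are linearly interpolated.
    """
    table = RSSI_TO_DISTANCE
    if rssi_dbm >= table[0][0]:
        return table[0][1]
    if rssi_dbm <= table[-1][0]:
        return table[-1][1]
    for (r_hi, d_near), (r_lo, d_far) in zip(table, table[1:]):
        if r_lo <= rssi_dbm <= r_hi:
            frac = (r_hi - rssi_dbm) / (r_hi - r_lo)
            return d_near + frac * (d_far - d_near)
    raise AssertionError("unreachable: table rows cover the clamped range")


print(estimate_distance_m(-47.5))  # 2.0 with the hypothetical table above
```

In practice such a table would need to be calibrated for the radio and the environment in use, since RSSI varies with obstacles and antenna orientation.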

At the same time, the estimation unit 224 estimates the direction of the detection device 100, which is the transmission source, from the direction in which the receiving unit 221 received the user ID. Any direction estimation method may be used; for example, the direction of arrival of the radio waves may be detected with a directional antenna or the like.

The estimation unit 224 can thereby estimate the distance and direction to user A wearing the detection device 100. The tracking unit 222 then controls the servo motor 250 to point the imaging unit 230 in the estimated direction, controls the optical system according to the estimated distance (zooming in, zooming out, and so on), and searches for user A. After the search, the tracking unit 222 tracks user A.

As described above, according to Modification 2, even when the user has moved out of the frame, the user can be searched for and tracked based on the position information estimated by the tracking device 200'. In addition to improving tracking accuracy, this reduces the processing load because the detection device 100 does not need to transmit the user's position information.

(Modification 3)
The above-described embodiment and modifications were described for the case where the tracking device 200 (a web camera) tracks the user before the utterance, but the invention is not limited to this. For example, a recording device 500 (an IC recorder) may be used instead of the tracking device 200. FIG. 15 shows the detection system 20 in this case.
Because a recording device also tracks sound, the recording device is one form of tracking device.

As shown in the figure, the detection system 20 consists of the detection device 100 and the recording device 500. The configuration of the detection device 100 is the same as in the above-described embodiment. The recording device 500 differs from the tracking device 200 of the above-described embodiment in that the components related to tracking (the tracking unit 222, the execution unit 223, the imaging unit 230, and the servo motor 250) are omitted and a recording unit 225 is newly provided. The following description focuses on these differences. FIG. 15 assumes a scene in which the user records his or her own voice.

First, suppose that the detection unit 121 of the detection device 100 detects the throat tremor that precedes user A's utterance. The specifying unit 122 then specifies time information indicating the time remaining until user A's utterance, and the transmission unit 123 wirelessly transmits the specified time information.
Meanwhile, the receiving unit 221 of the recording device 500 receives the time information wirelessly transmitted from the transmission unit 123. Based on the time until the utterance indicated by the received time information, the recording unit 225 starts recording user A's voice at the utterance timing. Specifically, the recording unit 225 turns on the microphone 280 at the utterance timing, picks up user A's voice, and stores it in the storage unit 240. The recording unit 225 then turns off the microphone 280 once a predetermined time has elapsed after the audio signal from the microphone 280 ceases.
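The microphone control described above amounts to a small state machine, sketched here in Python. The class name, method names, and the 2-second silence timeout are assumptions for illustration; the description only specifies that the microphone turns on at the utterance timing and turns off a predetermined time after the audio signal ceases. Time is passed in explicitly so that the logic can be exercised without real timers or audio hardware.

```python
class AutoRecorder:
    """Turns a microphone on at the predicted utterance time and off after silence."""

    def __init__(self, silence_timeout_s: float = 2.0):
        self.silence_timeout_s = silence_timeout_s  # hypothetical "predetermined time"
        self.mic_on = False
        self._start_at = None    # absolute time at which to switch the mic on
        self._last_audio = None  # last time audio was present while recording

    def on_time_info(self, now: float, seconds_until_utterance: float) -> None:
        """Handle received time information: schedule the microphone start."""
        self._start_at = now + seconds_until_utterance

    def tick(self, now: float, audio_present: bool) -> bool:
        """Advance the state machine; returns True while the microphone is on."""
        if not self.mic_on and self._start_at is not None and now >= self._start_at:
            self.mic_on = True
            self._start_at = None
            self._last_audio = now
        if self.mic_on:
            if audio_present:
                self._last_audio = now
            elif now - self._last_audio >= self.silence_timeout_s:
                self.mic_on = False
        return self.mic_on
```

A caller would invoke on_time_info when the time information arrives and then call tick periodically with the current time and a flag indicating whether the microphone is currently picking up sound.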

In this way, in the detection system 20 of FIG. 15, the microphone 280 turns on and recording proceeds automatically before user A speaks. Here, the detection device 100, triggered by the throat tremor, transmits the time information indicating the time until the utterance to the recording device 500, and the recording device 500 starts recording at the utterance timing based on that time information. Recording therefore never starts after user A has already begun speaking. In addition, because the microphone is turned on and off automatically, user A can record his or her own voice without omission and without having to think about recording. The detection system 20 according to Modification 3 is well suited as a life-logging tool for automatically recording data such as a person's voice, which is an emerging trend.

The detection system 20 is not limited to the example of FIG. 15 and can be applied to various situations. For example, assume a scene in which the recording device 500 is a smartphone rather than an IC recorder and a voice search is performed using a search engine (for example, Google (registered trademark)).
In this case, the smartphone's microphone turns on automatically ahead of the user's utterance, without the user giving an instruction to start the voice search. That is, the detection device 100 worn by the user detects the user's throat tremor and wirelessly transmits the time information indicating the time until the utterance. Based on the received time information, the smartphone automatically turns on the microphone at the timing of the user's utterance and starts recording the user's voice.
The recording device 500 of this modification is therefore well suited to voice recorders such as IC recorders and smartphones.

Thus, according to this other example of the detection system 20, the user does not need to turn on the microphone before every voice search, which reduces the burden on the user. Note, for the avoidance of doubt, that the detection system 10 according to the above-described embodiment and the detection system 20 according to Modification 3 share common technical features: the detection device 100 detects a physiological behavior (throat tremor) that precedes the user's utterance, and the receiving-side device (the tracking device 200 or the recording device 500) performs a predetermined operation (tracking or recording) before the user's utterance. In other words, the detection system 10 and the detection system 20 have unity of invention.

(Modification 4)
The above-described embodiment and modifications assumed that the detection device 100 detects the user's utterance timing, but the invention is not limited to this. For example, when the detection device 100 (necklace) detects with the gyro sensor 110 that the user's throat has begun to tremble, it may immediately transmit a signal indicating the detection, and the receiving-side device (the tracking device 200 or the recording device 500) may detect the user's utterance timing. Because the receiving-side device detects the user's utterance timing based on the signal from the transmitting-side detection device 100, it can also be regarded as a detection device. That is, in Modification 4, the tracking device 200 or the recording device 500 substantially functions as a detection device.

FIG. 17 shows a detection device 100" according to Modification 4. The detection device 100" shown in FIG. 17 differs from the tracking device 200 according to the embodiment in that it includes a detection unit 290 and in that the storage unit 240 holds a pre-utterance waveform 241. The following description focuses on these differences.

First, the transmitting-side detection device 100 includes in the above signal the measured values of the magnitude of the throat tremor detected by the gyro sensor 110, and transmits the signal. The receiving unit 221 of the receiving-side detection device 100" receives the signal indicating that the throat has trembled.

Next, the detection unit 290 applies a Fourier transform to the measured values contained in the signal to obtain a frequency waveform, and determines from the frequency and magnitude of that frequency waveform whether the user is about to speak. That is, if the frequency of the frequency waveform is within the pre-utterance frequency band indicated by the feature waveform and the magnitude of the frequency waveform is the same as that of the feature waveform, the detection unit 290 determines that the user is about to speak. The detection unit 290 then specifies the time until the utterance, (t2 - t1) seconds, and detects the utterance timing.
The execution unit 223 then executes the predetermined processing (for example, starting to record a video of the user as the subject or starting to record the user's voice) at the utterance timing detected by the detection unit 290.
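The frequency-domain check described above can be sketched in Python as follows. This is a simplified illustration, not the method of this publication: it compares only the strongest spectral component against a band and a reference amplitude, the band limits and reference amplitude are hypothetical inputs standing in for the learned feature waveform, and the relative tolerance is an assumption (the description says the magnitudes are "the same", which measured data would rarely satisfy exactly). A plain discrete Fourier transform is used so that the sketch needs only the standard library.

```python
import cmath
import math


def dominant_component(samples, sample_rate_hz):
    """Return (frequency_hz, amplitude) of the strongest non-DC DFT bin."""
    n = len(samples)
    best_freq, best_amp = 0.0, 0.0
    for k in range(1, n // 2):  # skip DC (k = 0) and the Nyquist bin
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(samples))
        amp = 2 * abs(coeff) / n  # amplitude of a real sinusoid at bin k
        if amp > best_amp:
            best_freq, best_amp = k * sample_rate_hz / n, amp
    return best_freq, best_amp


def is_pre_utterance(samples, sample_rate_hz, band_hz, feature_amp, rel_tol=0.2):
    """True if the dominant component lies in band_hz and its amplitude is
    within rel_tol of feature_amp (both taken from a learned feature waveform)."""
    freq, amp = dominant_component(samples, sample_rate_hz)
    in_band = band_hz[0] <= freq <= band_hz[1]
    similar = abs(amp - feature_amp) <= rel_tol * feature_amp
    return in_band and similar
```

For a real sensor stream, a fast Fourier transform over a sliding window would be the usual choice; the direct transform here is quadratic in the window length and is kept only for clarity.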

As described above, the detection device 100" according to Modification 4 makes it possible to simplify the configuration of the wearable detection device 100 on the transmitting side. This reduces the processing load on the detection device 100 and allows the detection device 100 to be made smaller.
When the receiving unit 221 receives the signal indicating that the throat has trembled, the execution unit 223 of the detection device 100" may put shooting or recording on standby. Standby for shooting means, for example, that the tracking unit 222 tracks the user. Standby for recording means, for example, switching on the microphone 280.

The above-described embodiment and modifications assumed that the physiological behavior is a throat tremor, but the invention is not limited to this. For example, the detection device 100 may detect the movement of the lungs before the utterance instead of the throat tremor. In this case, the detection device 100 need not be a necklace and may be, for example, a band that can be worn near the chest so that lung movement can be detected.

The functions of the detection device 100 and the tracking device 200 that make up the detection system 10 of the present invention can also be implemented by a computer such as an ordinary PC.
Specifically, the above embodiment was described on the assumption that the programs for the processes performed by each device (the detection process and the tracking process) are stored in advance in the ROM of the control units 120 and 220 of the respective devices. However, the programs may be stored and distributed on a computer-readable recording medium such as a flexible disk, a CD-ROM (Compact Disc Read Only Memory), a DVD (Digital Versatile Disc), or an MO (Magneto-Optical Disc), and a computer capable of realizing the functions of the above-described units may be configured by installing the programs on it.

Preferred embodiments of the present invention have been described above, but the present invention is not limited to these specific embodiments and includes the inventions described in the claims and their equivalents. The inventions described in the claims as originally filed in the present application are appended below.

(Appendix 1)
A detection device comprising:
detection means for detecting a physiological behavior that precedes a user's utterance; and
execution means for executing predetermined processing based on a detection result of the detection means.

(Appendix 2)
The detection device according to Appendix 1, wherein the detection means detects the throat tremor that precedes the user's utterance based on a comparison between a frequency waveform, obtained by frequency conversion of a waveform indicating the change over time in the magnitude of the user's throat tremor, and a feature waveform, learned in advance, indicating the frequency characteristics before an utterance.

(Appendix 3)
The detection device according to Appendix 1 or 2, wherein the execution means outputs, as the predetermined processing, any one of user identification information of the user, time information indicating the time until the user's utterance, and position information of the user.

(Appendix 4)
A detection device comprising:
receiving means for receiving a signal indicating a physiological behavior that precedes a user's utterance; and
execution means for executing predetermined processing based on the signal received by the receiving means.

(Appendix 5)
The detection device according to Appendix 4, wherein the execution means puts shooting or recording on standby as the predetermined processing.

(Appendix 6)
The detection device according to Appendix 4, further comprising detection means for detecting the timing of the user's utterance based on the signal received by the receiving means,
wherein the execution means executes the predetermined processing at the utterance timing detected by the detection means.

(Appendix 7)
A detection system comprising:
a detection device including detection means for detecting a physiological behavior that precedes a user's utterance, and transmission means for transmitting user identification information of the user when the detection means detects the physiological behavior; and
a tracking device including receiving means for receiving the user identification information transmitted from the transmission means, and tracking means for tracking the user identified by the user identification information received by the receiving means.

(Appendix 8)
The detection system according to Appendix 7, wherein the detection means detects the throat tremor that precedes the user's utterance based on a comparison between a frequency waveform, obtained by frequency conversion of a waveform indicating the change over time in the magnitude of the user's throat tremor, and a feature waveform, learned in advance, indicating the frequency characteristics before an utterance.

(Appendix 9)
The detection system according to Appendix 7 or 8, wherein the tracking means tracks the user by recognizing, with imaging means, the user's face or the shape of the detection device worn by the user and then controlling the orientation of the imaging means so that it points toward the face or the shape.

(Appendix 10)
The detection system according to any one of Appendices 7 to 9, wherein
the detection device further includes time specifying means for specifying, when the detection means detects the physiological behavior, time information indicating the time until the utterance,
the transmission means transmits the time information in addition to the user identification information, and
the tracking device further includes execution means for executing, at the timing of the utterance and based on the time until the utterance indicated by the time information, predetermined processing related to the user being tracked by the tracking means.

(Appendix 11)
The detection system according to Appendix 10, wherein the execution means starts, as the predetermined processing, recording a video of the user as the subject or recording the user's voice at the timing of the utterance.

(Appendix 12)
The detection system according to Appendix 10, wherein the execution means switches, as the predetermined processing, to a shooting mode suited to the user at the timing of the utterance.

(Appendix 13)
The detection system according to any one of Appendices 7 to 12, wherein
the detection device further includes position specifying means for specifying position information of the user when the detection means detects the physiological behavior,
the transmission means transmits the position information of the user in addition to the user identification information, and
the tracking means controls the orientation of imaging means based on the position information of the user to search for the user, and then tracks the user found by the search.

(Appendix 14)
The detection system according to any one of Appendices 7 to 12, wherein
the tracking device further includes estimation means for estimating position information of the user identified by the user identification information based on the received signal strength of the user identification information received by the receiving means and the direction from which the user identification information was transmitted, and
the tracking means controls the orientation of imaging means based on the position information of the user to search for the user, and then tracks the user found by the search.

(Appendix 15)
The detection system according to any one of Appendices 7 to 14, wherein the detection device is a wearable device worn on the user's body.

(Appendix 16)
A detection system comprising:
a detection device including detection means for detecting a physiological behavior that precedes a user's utterance, time specifying means for specifying, when the detection means detects the physiological behavior, time information indicating the time until the utterance, and transmission means for transmitting the time information specified by the time specifying means; and
a recording device including receiving means for receiving the time information transmitted from the transmission means, and recording means for starting to record the user's voice at the timing of the utterance based on the time until the utterance indicated by the time information received by the receiving means.

(Appendix 17)
A detection method comprising:
a detection step of detecting a physiological behavior that precedes a user's utterance; and
an execution step of executing predetermined processing based on a detection result of the detection step.

(Appendix 18)
A program for causing a computer to function as:
detection means for detecting a physiological behavior that precedes a user's utterance; and
execution means for executing predetermined processing based on a detection result of the detection means.

Description of reference signs: 10, 20: detection system; 100, 100A to 100C, 100', 100": detection device; 110: gyro sensor; 111: GPS; 120, 220: control unit; 121, 290: detection unit; 122: specifying unit; 123: transmission unit; 130, 240: storage unit; 131, 241: pre-utterance waveform; 140, 210: wireless communication I/F; 200, 200A, 200B, 200': tracking device; 221: receiving unit; 222: tracking unit; 223: execution unit; 224: estimation unit; 225: recording unit; 230: imaging unit; 250: servo motor; 260: operation unit; 270: external I/F; 280: microphone; 300, 400: PC; 301: display; 500: recording device

Claims

A detection device comprising:
detection means for detecting a physiological behavior that precedes a user's utterance; and
execution means for executing a predetermined process based on a detection result of the detection means.

The detection device according to claim 1, wherein
the detection means detects a tremor of the throat preceding the user's utterance based on a comparison between a frequency waveform, obtained by frequency-converting a waveform indicating a temporal change in the magnitude of the tremor of the user's throat, and a characteristic waveform, learned in advance, indicating a frequency characteristic preceding utterance.
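
By way of non-limiting illustration, the comparison recited above can be sketched as follows. The claim does not specify the transform or the similarity measure; the naive discrete Fourier transform, the cosine similarity, the threshold of 0.9, and all function and variable names below are assumptions chosen only for this sketch.

```python
import cmath
import math


def magnitude_spectrum(samples):
    """Naive DFT magnitude spectrum (lower half of the bins) of a real signal."""
    n = len(samples)
    spectrum = []
    for k in range(n // 2):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(samples))
        spectrum.append(abs(acc) / n)
    return spectrum


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def precedes_utterance(samples, learned_spectrum, threshold=0.9):
    """True when the window's spectrum resembles the learned pre-utterance spectrum.

    The threshold is an assumed tuning parameter, not a value from the source.
    """
    return cosine_similarity(magnitude_spectrum(samples), learned_spectrum) >= threshold


# Hypothetical data: 64-sample windows of throat-tremor magnitude.
N = 64
# Stand-in for the characteristic waveform learned in advance (a pure tone in bin 8).
template = magnitude_spectrum([math.sin(2 * math.pi * 8 * t / N) for t in range(N)])
similar = [0.5 * math.sin(2 * math.pi * 8 * t / N + 0.3) for t in range(N)]
different = [math.sin(2 * math.pi * 20 * t / N) for t in range(N)]
```

Comparing magnitude spectra makes the check insensitive to the amplitude and phase of the tremor, so `similar` matches the template while `different`, whose energy sits in another frequency bin, does not.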

The detection device according to claim 1, wherein
the execution means outputs, as the predetermined process, any one of user identification information of the user, time information indicating the time remaining until the user's utterance, or position information of the user.

A detection device comprising:
receiving means for receiving a signal indicating a physiological behavior that precedes a user's utterance; and
execution means for executing a predetermined process based on the signal received by the receiving means.

The detection device according to claim 4, wherein
the execution means performs, as the predetermined process, standby for shooting or recording.

The detection device according to claim 4, further comprising
detecting means for detecting the timing of the user's utterance based on the signal received by the receiving means, wherein
the execution means executes the predetermined process at the utterance timing detected by the detecting means.

A detection system comprising:
a detection device comprising: detection means for detecting a physiological behavior that precedes a user's utterance; and transmission means for transmitting user identification information of the user when the detection means detects the physiological behavior; and
a tracking device comprising: receiving means for receiving the user identification information transmitted from the transmission means; and tracking means for tracking the user identified by the user identification information received by the receiving means.

The detection system according to claim 7, wherein
the detection means detects a tremor of the throat preceding the user's utterance based on a comparison between a frequency waveform, obtained by frequency-converting a waveform indicating a temporal change in the magnitude of the tremor of the user's throat, and a characteristic waveform, learned in advance, indicating a frequency characteristic preceding utterance.

The detection system according to claim 7 or 8, wherein
the tracking means, after recognizing via imaging means the face of the user or the shape of the detection device worn by the user, tracks the user by controlling the imaging means so that the imaging means is oriented toward the face or the shape.

The detection system according to any one of claims 7 to 9, wherein
the detection device further comprises time specifying means for specifying, when the detection means detects the physiological behavior, time information indicating the time remaining until the utterance,
the transmission means transmits the time information in addition to the user identification information, and
the tracking device further comprises execution means for executing, at the timing of the utterance determined from the time indicated by the time information, a predetermined process relating to the user being tracked by the tracking means.

The detection system according to claim 10, wherein
the execution means starts, as the predetermined process, recording of a moving image or of the voice of the user as the subject at the timing of the utterance.

The detection system according to claim 10, wherein
the execution means switches, as the predetermined process, to a shooting mode corresponding to the user at the timing of the utterance.

The detection system according to any one of claims 7 to 12, wherein
the detection device further comprises position specifying means for specifying position information of the user when the detection means detects the physiological behavior,
the transmission means transmits the position information of the user in addition to the user identification information, and
the tracking means controls the orientation of the imaging means based on the position information of the user to search for the user, and then tracks the user thus found.

The detection system according to any one of claims 7 to 12, wherein
the tracking device further comprises estimation means for estimating position information of the user identified by the user identification information, based on the received signal strength of the user identification information received by the receiving means and the direction from which the user identification information was transmitted, and
the tracking means controls the orientation of the imaging means based on the position information of the user to search for the user, and then tracks the user thus found.
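
By way of non-limiting illustration, estimating a position from a received signal strength and an arrival direction can be sketched as follows. The claim does not state how the strength is converted to a distance; the log-distance path-loss model, the reference strength at 1 m, the path-loss exponent, and the function name are assumptions made only for this sketch.

```python
import math


def estimate_position(rssi_dbm, bearing_deg, rssi_at_1m_dbm=-50.0, path_loss_exponent=2.0):
    """Estimate (x, y) in metres relative to the tracking device.

    Assumes a log-distance path-loss model:
        rssi = rssi_at_1m - 10 * n * log10(distance)
    Both calibration parameters are hypothetical and would need measuring.
    bearing_deg is the direction from which the signal arrived,
    measured counter-clockwise from the x axis.
    """
    distance_m = 10 ** ((rssi_at_1m_dbm - rssi_dbm) / (10.0 * path_loss_exponent))
    bearing = math.radians(bearing_deg)
    return distance_m * math.cos(bearing), distance_m * math.sin(bearing)


# A signal 20 dB weaker than the 1 m reference, arriving from 90 degrees.
x, y = estimate_position(-70.0, 90.0)
```

With the assumed parameters, a 20 dB drop corresponds to a distance of 10 m, so the estimate lies about 10 m along the y axis; the imaging means could then be oriented toward that point before the search begins.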

The detection system according to any one of claims 7 to 14, wherein
the detection device is a wearable device worn on the user's body.

A detection system comprising:
a detection device comprising: detection means for detecting a physiological behavior that precedes a user's utterance; time specifying means for specifying, when the detection means detects the physiological behavior, time information indicating the time remaining until the utterance; and transmission means for transmitting the time information specified by the time specifying means; and
a recording device comprising: receiving means for receiving the time information transmitted from the transmission means; and recording means for starting recording of the user's voice at the timing of the utterance, based on the time remaining until the utterance indicated by the time information received by the receiving means.
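
By way of non-limiting illustration, starting the recording at the timing of the utterance from the received time information can be sketched as follows. Subtracting the time that elapsed between detection and reception, the use of a timer thread, and all names are assumptions of this sketch and are not taken from the source.

```python
import threading


def recording_start_delay(time_until_utterance_s, detected_at_s, received_at_s):
    """Seconds to wait after reception before starting to record.

    Assumes both timestamps come from clocks that agree with each other;
    the delay is clamped at zero if the utterance timing has already passed.
    """
    elapsed_s = received_at_s - detected_at_s
    return max(0.0, time_until_utterance_s - elapsed_s)


def schedule_recording(delay_s, start_recording):
    """Call start_recording() after delay_s seconds; returns the timer thread."""
    timer = threading.Timer(delay_s, start_recording)
    timer.start()
    return timer


# Hypothetical usage: the utterance is expected 0.5 s after detection,
# and the time information arrived 0.1 s after detection.
events = []
delay = recording_start_delay(0.5, detected_at_s=10.0, received_at_s=10.1)
timer = schedule_recording(0.01, lambda: events.append("recording started"))
timer.join()  # wait for the callback in this example only
```

Clamping the delay at zero means a late-arriving message starts the recording immediately rather than never.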

A detection method comprising:
a detection step of detecting a physiological behavior that precedes a user's utterance; and
an execution step of executing a predetermined process based on a detection result of the detection step.

A program for causing a computer to function as:
detection means for detecting a physiological behavior that precedes a user's utterance; and
execution means for executing a predetermined process based on a detection result of the detection means.