JP4660592B2

JP4660592B2 - Camera control apparatus, camera control method, camera control program, and recording medium

Info

Publication number: JP4660592B2
Application number: JP2008521277A
Authority: JP
Inventors: 裕昭柴▲崎▼
Original assignee: Pioneer Corp
Current assignee: Pioneer Corp
Priority date: 2006-06-16
Filing date: 2007-06-15
Publication date: 2011-03-30
Anticipated expiration: 2027-06-15
Also published as: WO2007145331A1; JPWO2007145331A1

Description

The present invention relates to a camera control device, a camera control method, a camera control program, and a recording medium for controlling the shooting direction of a camera. However, application of the present invention is not limited to the camera control device, camera control method, camera control program, and recording medium described above.

Conventionally, some video conference systems that use the Internet or telephone lines point a camera in the direction from which a voice is emitted and photograph the speaker, so that the conference can proceed naturally. In such a video conference system, for example, when the direction of the sound source is not within the camera's current angle of view, the system determines whether that direction can be brought within the angle of view by changing the camera direction with the pan head. If it can, the system drives the pan head so that the direction falls within the angle of view and displays the image. If it cannot, the system widens the angle of view and drives the pan head so that the direction falls within the angle of view, and then displays the image (see, for example, Patent Document 1 below).

Japanese Patent Laid-Open No. 2000-244885

However, one example of a problem with the conventional technology described above is that shooting cannot take the content of the speaker's remarks into account. In the conventional technology, the camera photographs the speaker who is uttering the voice, but depending on the content of the utterance, the speaker is not necessarily the appropriate subject. For example, when a person related to the content of the remark is within the camera's shootable range, it may be preferable to photograph that person.

Another example of a problem with the conventional technology described above is that, when someone other than the speaker is to be photographed, an operator must change the shooting direction of the camera manually. In that case, an operator is needed to operate the camera, and shooting becomes cumbersome.

To solve the problems described above and achieve the object, a camera control device according to the invention of claim 1 comprises: acquisition means for acquiring sound around a camera; determination means for determining, from the sound acquired by the acquisition means, a phrase that specifies a subject to be photographed by the camera (hereinafter, "specific phrase"); control means for controlling the shooting direction of the camera based on the specific phrase determined by the determination means; input means for receiving input of information on candidates for the subject; and detection means for detecting the positions of the candidates for the subject. The determination means determines, as the specific phrase, a phrase that substantially matches the information on a candidate for the subject input to the input means. When the determination means determines a phrase that substantially matches the information on a candidate for the subject, the control means points the shooting direction of the camera at the position of that candidate detected by the detection means.

A camera control method according to the invention of claim 6 comprises: an acquisition step of acquiring sound around a camera; a determination step of determining, from the sound acquired in the acquisition step, a phrase that specifies a subject to be photographed by the camera (hereinafter, "specific phrase"); a control step of controlling the shooting direction of the camera based on the specific phrase determined in the determination step; an input step of receiving input of information on candidates for the subject; and a detection step of detecting the positions of the candidates for the subject. The determination step determines, as the specific phrase, a phrase that substantially matches the information on a candidate for the subject input in the input step. When the determination step determines a phrase that substantially matches the information on a candidate for the subject, the control step points the shooting direction of the camera at the position of that candidate detected in the detection step.

A camera control program according to the invention of claim 7 causes a computer to execute the camera control method according to claim 6.

A recording medium according to the invention of claim 8 is a computer-readable recording medium on which the camera control program according to claim 7 is recorded.

Preferred embodiments of a camera control device, a camera control method, a camera control program, and a recording medium according to the present invention are described in detail below with reference to the accompanying drawings.

(Embodiment)
First, the functional configuration of the camera control device 100 according to the embodiment is described. FIG. 1 is a block diagram showing the functional configuration of the camera control device. The camera control device 100 includes an acquisition unit 101, a determination unit 102, a control unit 103, an input unit 104, and a detection unit 105.

The acquisition unit 101 acquires sound around the camera 110. The sound around the camera 110 is, for example, an utterance by a person located near the camera 110. The acquisition unit 101 acquires the sound around the camera 110 with, for example, a microphone.

The determination unit 102 determines, from the sound acquired by the acquisition unit 101, a phrase that specifies a subject to be photographed by the camera 110 (hereinafter, "specific phrase"). For example, the determination unit 102 determines, as the specific phrase, information on a candidate subject that has been input to the input unit 104 described later.

The control unit 103 controls the shooting direction of the camera 110 based on the specific phrase determined by the determination unit 102. For example, when the determination unit 102 determines a phrase that substantially matches the information on a candidate subject, the control unit 103 points the shooting direction of the camera 110 at the position of that candidate detected by the detection unit 105 described later. A substantially matching phrase is a phrase that is identical or similar to a phrase input as information on a candidate subject.
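The "substantially matching" test described above can be illustrated with a short sketch. The patent does not say how similarity between phrases is measured; the use of Python's `difflib.SequenceMatcher`, the function name `find_specific_phrase`, and the 0.8 threshold are all assumptions made for illustration only.

```python
from difflib import SequenceMatcher


def find_specific_phrase(recognized_words, candidate_terms, threshold=0.8):
    """Return (recognized word, registered term) for the best match whose
    similarity is at or above `threshold`, or None if nothing qualifies.

    `recognized_words` stands in for the output of speech recognition and
    `candidate_terms` for the names or nicknames entered via the input unit.
    Both names and the threshold are hypothetical.
    """
    best = None
    best_score = threshold
    for word in recognized_words:
        for term in candidate_terms:
            # ratio() is 1.0 for identical strings and falls toward 0.0
            # as the strings diverge.
            score = SequenceMatcher(None, word.lower(), term.lower()).ratio()
            if score >= best_score:
                best, best_score = (word, term), score
    return best
```

An identical word matches with a ratio of 1.0, and a near miss such as a slightly misrecognized name can still clear the threshold, which is one way to read "identical or similar".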

The input unit 104 receives input of information on candidate subjects. Information on a candidate subject is, for example, name information (full name, nickname, and so on) or attribute information of the candidate. An image or voice of the candidate subject may also be input to the input unit 104.

The detection unit 105 detects the positions of the candidate subjects. For example, the detection unit 105 compares the image of a candidate subject input to the input unit 104 with video captured by the camera 110 and detects the position of the candidate. Here, when the camera 110 is installed in a vehicle, for example, the position of a subject is the position of the seat in which the subject is sitting, and the detection unit 105 detects which seat each passenger has taken. The position of a subject may also be a relative direction or relative bearing from the camera 110. Specifically, for example, when the video captured by the camera 110 shows an object whose similarity to the image of a candidate subject is equal to or greater than a predetermined value, the detection unit 105 takes the position of that object as the position of the candidate.
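The threshold rule in the preceding paragraph can be sketched as follows. The patent only requires a similarity "equal to or greater than a predetermined value"; representing images as feature vectors, scoring them with cosine similarity, and the names `cosine_similarity` and `locate_candidate` are assumptions chosen to keep the example self-contained.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length feature vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def locate_candidate(reference, detections, threshold=0.9):
    """Return the position of the detected object most similar to `reference`.

    `reference` is a feature vector for the registered image of the candidate.
    `detections` is a list of (position, feature_vector) pairs for objects
    found in the camera video. Returns None when no object reaches the
    threshold. The feature extraction itself is outside this sketch.
    """
    best_position, best_similarity = None, threshold
    for position, features in detections:
        similarity = cosine_similarity(reference, features)
        if similarity >= best_similarity:
            best_position, best_similarity = position, similarity
    return best_position
```

The same shape would apply to the voice-based variant, with voiceprint features in place of image features.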

The detection unit 105 may also detect the position of a candidate subject by, for example, comparing the voice of the candidate input to the input unit 104 with the sound acquired by the acquisition unit 101. Specifically, for example, when the sound acquired by the acquisition unit 101 includes a voice whose similarity to the voice of a candidate subject is equal to or greater than a predetermined value, the position from which that voice is uttered is taken as the position of the candidate.

The camera 110 may be installed inside a vehicle. In this case, for example, the acquisition unit 101 acquires utterances of the vehicle's passengers, the determination unit 102 determines, as the specific phrase, information on the passenger to be photographed, and the control unit 103 points the shooting direction of the camera 110 at the seating position of the passenger identified by the specific phrase. In addition, for example, information on the passengers is input to the input unit 104, and the detection unit 105 detects which seat each passenger has taken.

Next, the control processing of the camera 110 by the camera control device 100 is described. FIG. 2 is a flowchart showing the procedure of camera control processing by the camera control device. In the flowchart of FIG. 2, first, information on candidate subjects is input to the input unit 104 (step S201). The detection unit 105 then detects the positions of the candidate subjects (step S202).

Next, the acquisition unit 101 acquires sound around the camera 110 (step S203). The determination unit 102 then determines a specific phrase from the sound acquired in step S203 (step S204). The specific phrase here is information on a candidate subject. In this way, the subject to be photographed is identified from among the candidates. The control unit 103 then controls the shooting direction of the camera 110 so that it points at the position of the subject detected in step S202 (step S205), and the processing of this flowchart ends.
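Steps S201 to S205 can be sketched as a single function. This is only an outline under stated assumptions: the callables `detect_position`, `acquire_words`, and `point_camera` are hypothetical stand-ins for the detection unit, the acquisition unit plus speech recognition, and the camera drive, and exact string equality is used in place of the "substantially matching" test for brevity. What happens when no specific phrase is found is not specified in the text; here the camera is simply left where it is.

```python
def run_camera_control(candidates, detect_position, acquire_words, point_camera):
    """One pass through steps S201 to S205; returns the chosen subject or None."""
    # S201: `candidates` holds the candidate information that was input.
    # S202: detect the position of every candidate subject.
    positions = {name: detect_position(name) for name in candidates}

    # S203: acquire sound around the camera (already split into words here).
    words = acquire_words()

    # S204: determine the specific phrase, i.e. the first spoken word that
    # names a registered candidate.
    target = next((word for word in words if word in positions), None)

    # S205: point the camera at the position detected in S202.
    if target is not None:
        point_camera(positions[target])
    return target
```

Passing the three units in as callables keeps the control flow testable without a microphone or a camera attached.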

As described above, the camera control device 100 identifies the subject from the sound around the camera and controls the shooting direction of the camera so that it points at the subject. The shooting direction of the camera 110 can therefore be changed in consideration of the content of the sound around the camera.

In addition, because the camera control device 100 identifies the subject by using previously input information on candidate subjects as the specific phrases, it can identify the subject more accurately. Furthermore, because the position of the subject is detected from the image or voice of the candidate subject, the shooting direction of the camera 110 can be controlled more accurately.

Next, an example of the camera control device 100 according to the embodiment described above is explained. The following example describes a case in which the camera control device 100 is applied to a navigation device 300 mounted on a vehicle.

(Peripheral device configuration of navigation device 300)
First, the peripheral device configuration of the navigation device 300 is described. FIG. 3 is an explanatory diagram showing the area around the dashboard of a vehicle in which the navigation device is installed. The navigation device 300 is installed on the dashboard of the vehicle. The navigation device 300 consists of a main body M and a display unit (display) D. The display unit D shows the current location of the vehicle, map information, the current time, and so on.

An in-vehicle camera 311 installed near the rearview mirror and an in-vehicle microphone 312 installed on the sun visor are connected to the navigation device 300. The shooting direction of the in-vehicle camera 311 can be changed, and the camera photographs the area ahead of the vehicle as well as each part of the vehicle interior (passengers and so on). In the following, the in-vehicle camera 311 is assumed to be capable of capturing both moving images and still images, but it may be a camera that captures only still images.

The in-vehicle microphone 312 receives sound inside the vehicle and is used, for example, for operating the navigation device 300 by voice input and for recording what is happening inside the vehicle. The position of the in-vehicle microphone 312 is not limited to the sun visor; any position from which sound inside the vehicle can be picked up efficiently will do. A plurality of in-vehicle cameras 311 and in-vehicle microphones 312 may be installed in the vehicle, and they may be movable rather than fixed. In this example, an in-vehicle microphone 312 is assumed to be provided for each passenger seat.

In addition to searching for a route to a destination and recording information, the navigation device 300 has an in-vehicle shooting function that records what is happening inside the vehicle during a drive. The in-vehicle shooting function records video and audio inside the vehicle with the in-vehicle camera 311 and the in-vehicle microphone 312. The video and audio recorded by the in-vehicle shooting function are stored on a recording medium of the navigation device 300 (the magnetic disk 405 or the optical disk 407 described later). The recorded video and audio may also be written to an external recording medium so that they can be enjoyed on a television at home, for example.

(Hardware configuration of navigation device 300)
Next, the hardware configuration of the navigation device 300 is described. FIG. 4 is a block diagram showing the hardware configuration of the navigation device. In FIG. 4, the navigation device 300 includes a CPU 401, a ROM 402, a RAM (memory) 403, a magnetic disk drive 404, a magnetic disk 405, an optical disk drive 406, an optical disk 407, an audio I/F (interface) 408, a microphone 409, a speaker 410, an input device 411, a video I/F 412, a camera 413, a display 414, a communication I/F 415, a GPS unit 416, various sensors 417, and an external connection I/F 418. The components 401 to 418 are connected to one another by a bus 420.

The CPU 401 governs overall control of the navigation device 300. The ROM 402 stores programs such as a boot program, a communication program, a database creation program, and a data analysis program. The RAM 403 is used as a work area for the CPU 401.

The magnetic disk drive 404 controls reading and writing of data on the magnetic disk 405 under the control of the CPU 401. The magnetic disk 405 stores data written under the control of the magnetic disk drive 404. An HD (hard disk) or an FD (flexible disk), for example, can be used as the magnetic disk 405.

The optical disk drive 406 controls reading and writing of data on the optical disk 407 under the control of the CPU 401. The optical disk 407 is a removable recording medium from which data is read under the control of the optical disk drive 406. A writable recording medium can also be used as the optical disk 407. Besides the optical disk 407, the removable recording medium may be an MO, a memory card, or the like.

One example of the information stored on the magnetic disk 405 or the optical disk 407 is map data used for route search and route guidance. The map data includes background data representing features such as buildings, rivers, and the ground surface, and road shape data representing the shapes of roads, and it is drawn in two or three dimensions on the screen of the display 414. While the navigation device 300 is providing route guidance, the map data is displayed overlaid with the current location of the vehicle acquired by the GPS unit 416 described later.

The audio I/F 408 is connected to the microphone 409 for audio input (for example, the in-vehicle microphone 312 in FIG. 3) and the speaker 410 for audio output. Sound received by the microphone 409 is A/D converted in the audio I/F 408. Sound is output from the speaker 410. Sound input from the microphone 409 can be stored as audio data on the magnetic disk 405 or the optical disk 407.

Examples of the input device 411 include a remote control with a plurality of keys for entering characters, numerical values, and various instructions, as well as a keyboard, a mouse, and a touch panel. The input device 411 can also be connected to another information processing terminal, such as a digital camera or a mobile phone, to exchange data with it.

The video I/F 412 is connected to the camera 413 for video input (for example, the in-vehicle camera 311 in FIG. 3) and the display 414 for video output. Specifically, the video I/F 412 consists of, for example, a graphic controller that controls the display 414 as a whole, a buffer memory such as a VRAM (Video RAM) that temporarily stores image information ready for immediate display, and a control IC that controls the display 414 based on image data output from the graphic controller.

The camera 413 captures images (including moving images) inside and outside the vehicle and outputs them as image data. Images captured by the camera 413 can be stored as image data on the magnetic disk 405 or the optical disk 407. Besides being shown on the display 414, this image data can be used on other information processing terminals by storing it on a recording medium or transmitting it over a network.

The display 414 shows icons, cursors, menus, windows, and various data such as characters and images. A CRT, a TFT liquid crystal display, or a plasma display, for example, can be used as the display 414.

The communication I/F 415 is connected wirelessly to a communication network such as the Internet and functions as an interface between that network and the CPU 401. Communication networks include LANs, WANs, public line networks, and mobile phone networks.

The GPS unit 416 receives radio waves from GPS satellites and outputs information indicating the current location of the vehicle (the current location of the navigation device 300). The output of the GPS unit 416 is used, together with the output values of the various sensors 417 described later, when the CPU 401 calculates the current location of the vehicle. The information indicating the current location is information that identifies a single point on the map data, such as latitude, longitude, and altitude.

The various sensors 417, such as a vehicle speed sensor, an acceleration sensor, and an angular velocity sensor, output information from which the position and behavior of the vehicle can be determined. The output values of the various sensors 417 are used by the CPU 401 to calculate the current location and to measure changes in speed and heading.

The external connection I/F 418 is a set of interfaces for connecting to external equipment such as an audio system or the vehicle's air conditioner. The external connection I/F 418 consists of, for example, a port for a dedicated connection cable and an infrared communication port.

Among the components of the camera control device 100 according to the embodiment, the function of the acquisition unit 101 is realized by the audio I/F 408 and the microphone 409, the functions of the determination unit 102 and the detection unit 105 by the CPU 401, the function of the control unit 103 by the CPU 401 and the video I/F 412, and the function of the input unit 104 by the input device 411.

(In-vehicle shooting processing by navigation device 300)
Next, the in-vehicle shooting processing by the navigation device 300 is described. As noted above, the vehicle in which the navigation device 300 is installed is equipped with the in-vehicle camera 311 and the in-vehicle microphone 312, so what is happening inside the vehicle during a drive can be recorded. However, footage that merely films the vehicle interior aimlessly has little appeal as content, and the recorded video is less likely to be watched.

For this reason, the navigation device 300 identifies the subject based on the content of the passengers' utterances and changes the shooting direction of the in-vehicle camera 311. Specifically, when a speaker's utterance contains a word associated with a particular passenger, the navigation device 300 points the in-vehicle camera 311 at that passenger and photographs them. This makes it possible to film the vehicle interior with the focus on the passenger who is the center of the conversation, which makes the recorded video more appealing as content.

To identify the passenger to be photographed, the navigation device 300 registers information on passengers in advance and controls the in-vehicle camera 311 based on the registered information. Specifically, the navigation device 300 creates a cumulative database (the prospective passenger database), in which information on people who may or are scheduled to ride in the vehicle (hereinafter, "prospective passengers") is accumulated, and an updatable database (the current passenger database), which is updated each time a passenger gets in or out of the vehicle.

FIG. 5 is an explanatory diagram schematically showing the contents of the databases created by the navigation device. The prospective passenger database 510 stores, for each prospective passenger of the vehicle, text information 511 such as name information 521, nickname information 522, attribute information 523, and keywords 524 associated with that prospective passenger, together with face image data 512 and voiceprint data 513 of that person.

The current passenger database 530 stores text information 541 for each passenger on the current trip, together with that passenger's seating position information 542. As described later, the navigation device 300 identifies the passengers on the current trip from their images or voiceprints. It then copies the information for each such passenger (text information 541) from the prospective passenger database 510 into the current passenger database 530.

As in the prospective passenger database 510, the text information 541 consists of name information 551, nickname information 552, attribute information 553, keywords 554 associated with the passenger, and so on. The seating position information 542 indicates where a passenger on the current trip is sitting, for example "passenger seat" or "right rear seat".
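The relationship between the two databases can be sketched with plain data classes. The field layout follows the items named in the text (name, nickname, attributes, keywords, face image, voiceprint, seating position), but the class names, the use of dictionaries keyed by name, and the `board` helper are hypothetical choices for illustration and are not taken from the patent.

```python
from dataclasses import dataclass, field


@dataclass
class TextInfo:
    """Text information (511 / 541): name, nickname, attributes, keywords."""
    name: str
    nickname: str = ""
    attributes: list[str] = field(default_factory=list)
    keywords: list[str] = field(default_factory=list)


@dataclass
class ProspectivePassenger:
    """One record of the cumulative prospective passenger database (510)."""
    text: TextInfo
    face_image: bytes = b""   # face image data (512), format unspecified
    voiceprint: bytes = b""   # voiceprint data (513), format unspecified


@dataclass
class CurrentPassenger:
    """One record of the current passenger database (530)."""
    text: TextInfo
    seat: str                 # seating position information (542)


def board(prospective_db, current_db, name, seat):
    """Copy a passenger's text information into the current passenger database."""
    record = prospective_db[name]
    current_db[name] = CurrentPassenger(text=record.text, seat=seat)
    return current_db[name]
```

The prospective passenger record is left in place when someone boards, which mirrors the description of one database as cumulative and the other as updated per trip.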

In connection with the seating position information 542, the ROM 402 of the navigation device 300 stores a control table 560 for the in-vehicle camera 311 that corresponds to the position of each seat. The control table 560 defines the amount by which the shooting direction of the in-vehicle camera 311 is changed when each seat is photographed. Specifically, for example, if the reference shooting direction (the 0° direction) is perpendicular to the rear window, the driver's seat is 45° to the left of the reference direction, the passenger seat is 45° to the right, the right rear seat is 30° to the left, and the left rear seat is 30° to the right.
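The example angles in the control table can be written down as a simple lookup. The four values come from the paragraph above; the sign convention (negative for left, positive for right of the reference direction), the seat labels used as keys, and the function names are assumptions for this sketch.

```python
# Pan angle in degrees relative to the reference shooting direction
# (0 degrees, perpendicular to the rear window).
# Assumed convention: negative = left, positive = right.
CONTROL_TABLE = {
    "driver's seat": -45,
    "passenger seat": 45,
    "right rear seat": -30,
    "left rear seat": 30,
}


def pan_angle_for(seat):
    """Return the target pan angle for a seat; raises KeyError if unknown."""
    return CONTROL_TABLE[seat]


def pan_change(current_angle, seat):
    """Return how far the camera must turn from `current_angle` to face `seat`."""
    return pan_angle_for(seat) - current_angle
```

For instance, a camera currently facing the driver's seat would need to turn 90° to the right to face the passenger seat under this convention.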

Using the databases described above, the navigation device 300 controls the shooting direction of the in-vehicle camera 311.

FIGS. 6 and 7 are flowcharts showing the procedure of the in-vehicle shooting process performed by the navigation device. In the flowchart of FIG. 6, the navigation device 300 first has the scheduled passengers of the vehicle in which it is installed enter scheduled passenger information (step S601). The scheduled passenger information corresponds to the text information 511 in FIG. 5 and includes each passenger's name, nickname, attributes, keywords, and the like. The information may be entered with the input device 411 of the navigation device 300, or over a network from a PC (personal computer) at each scheduled passenger's home, a mobile phone terminal, or the like.

FIG. 8 is an explanatory diagram showing an example of the input screen for scheduled passenger information. FIG. 8 is described taking as an example the case where a scheduled passenger is registered using the input device 411 of the navigation device 300. In FIG. 8, the display 414 of the navigation device 300 shows a new registration screen 800 for a scheduled passenger. The new registration screen 800 contains a name input section 811 for entering the scheduled passenger's name, a nickname input section 812 for entering the nickname, an attribute input section 813 for entering the attributes, and a keyword input section 814 for entering keywords related to the scheduled passenger.

A scheduled passenger registering for the first time enters his or her information in these input sections, then presses the shooting button 821 to photograph his or her own face with the in-vehicle camera 311, and presses the sound collection button 822 to record his or her own voice with the in-vehicle microphone 312. Only one of the face image and the voice need be captured. Those registered as scheduled passengers are not limited to humans and may be animals such as dogs or cats. In that case, a human performs the registration in the scheduled passenger information database 510 on the animal's behalf.

Instead of displaying the new registration screen and having the scheduled passenger information entered as text as described above, the information may be entered by voice, for example by having the scheduled passenger speak it in a voice dialogue. With this method, voice data can be acquired at the same time. The face image of a scheduled passenger may also be captured automatically, triggered for example by the opening or closing of a vehicle door.

Returning to FIG. 6, the navigation device 300 stores the passenger information entered in step S601 in the scheduled passenger database 510 (step S602). Specifically, text information 511 such as the scheduled passenger's name, nickname, and attributes is associated with the passenger's face image data 512 and with voiceprint data 513 extracted from the passenger's voice (either one alone is sufficient) and stored in the scheduled passenger database.

Next, the navigation device 300 waits until passengers board the vehicle (step S603: No loop). Whether passengers have boarded is judged, for example, by whether the engine has been started or whether a door has been opened and closed. Once passengers have boarded (step S603: Yes), the in-vehicle camera 311 captures images of the entire vehicle interior and the in-vehicle microphone 312 collects the voice of the passenger in each seat (step S604). Specifically, the shooting direction of the in-vehicle camera 311 is controlled so as to cover the range in which the faces of the passengers in all seats are located, and the passengers are instructed to speak toward the in-vehicle microphone 312 provided at each seat. Only one of image capture and sound collection need be performed.

Next, the navigation device 300 extracts the face images and voiceprints of the passengers on the current trip from the images and sound captured in step S604 (step S605). It then collates any one of the face images and voiceprints extracted in step S605 against the scheduled passenger database 510 (step S606) and determines whether face image data 512 or voiceprint data 513 similar to it is stored there (step S607). Specifically, the feature points of the passenger's face image or voiceprint are compared with the feature points of the face image data 512 or voiceprint data 513 stored in the scheduled passenger database, and the device determines whether any stored face image data 512 or voiceprint data 513 has a degree of similarity equal to or greater than a predetermined value.
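
The threshold comparison of steps S606 and S607 can be sketched as follows. The specification does not say how feature points are represented or how the degree of similarity is computed, so the fixed-length feature vectors, the cosine similarity measure, the threshold of 0.9, and all function names here are assumptions chosen for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length feature vectors (assumed measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def find_match(features, stored_features, threshold=0.9):
    """Return the id of the most similar stored entry whose similarity is at
    least `threshold`, or None if no entry qualifies (step S607: No)."""
    best_id, best_score = None, threshold
    for person_id, stored in stored_features.items():
        score = cosine_similarity(features, stored)
        if score >= best_score:
            best_id, best_score = person_id, score
    return best_id
```

For example, with stored vectors `{"taro": [1.0, 0.0], "hanako": [0.0, 1.0]}` (hypothetical entries), the query `[0.9, 0.1]` matches `"taro"`, while `[1.0, 1.0]` falls below the threshold for both entries and yields `None`.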

If similar face image data 512 or voiceprint data 513 is stored (step S607: Yes), the text information 511 of the scheduled passenger corresponding to that data is stored in the current passenger database 530 as text information 541, together with that passenger's boarding position information 542 (step S608). The text information 511 held in the scheduled passenger database 510 is retained as it is.

If, on the other hand, no similar face image data 512 or voiceprint data 513 is stored (step S607: No), the passenger is asked to enter scheduled passenger information (step S609). The entered information is stored in the scheduled passenger database 510 as scheduled passenger information (step S610). The passenger information entered in step S609 is then stored in the current passenger database 530 as text information 541, together with that passenger's boarding position information 542 (step S611).

Until all face images and voiceprints have been collated (step S612: No), the process returns to step S606 and repeats the subsequent steps. When all face images and voiceprints have been collated (step S612: Yes), the process moves to step S613 in FIG. 7. The processing up to this point creates the scheduled passenger database 510 and the current passenger database 530.
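
The loop formed by steps S606 to S612 can be summarized in a short sketch. The record layout (dictionaries with "text" and "features" keys), the callback parameters `match` and `ask_for_info`, and the function name are hypothetical and serve only to make the control flow concrete.

```python
def build_current_passenger_db(detections, scheduled_db, match, ask_for_info):
    """Build the current passenger list from (seat, features) detections.

    match(features, scheduled_db) returns a person id or None (S606/S607).
    ask_for_info(seat) returns (person_id, text_info) typed in by the
    passenger (S609). Both callbacks are assumptions of this sketch.
    """
    current_db = []
    for seat, features in detections:              # repeat until S612: Yes
        person_id = match(features, scheduled_db)
        if person_id is None:                      # S607: No
            person_id, text = ask_for_info(seat)   # S609
            scheduled_db[person_id] = {"text": text, "features": features}  # S610
        # S608 / S611: store the text information with the boarding position
        current_db.append({"text": scheduled_db[person_id]["text"], "seat": seat})
    return current_db
```

Note that the scheduled passenger records are left in place after being copied, in line with the statement that the text information 511 is retained.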

Because the seating arrangement may change during a trip, the current passenger database 530 may be updated at predetermined times during the trip when passengers are detected getting in or out, for example when a door is opened or closed after the vehicle has stopped. In this case, the navigation device 300 updates the current passenger database 530 by performing the processing from step S604 onward.

Turning to FIG. 7, the navigation device 300 monitors the sound inside the vehicle (step S613) and determines whether a passenger's utterance contains a phrase (specific phrase) included in the text information 541 stored in the current passenger database 530 (step S614). The monitoring uses commonly available speech recognition technology. Specifically, the sound collected by the in-vehicle microphone 312 is converted to text, and the device determines whether a phrase included in the text information 541 appears in the passenger's utterance.
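
Once the utterance has been converted to text, the check in step S614 amounts to searching the recognized text for registered phrases. The sketch below assumes plain case-sensitive substring matching and a hypothetical record layout in which each current passenger entry carries a list of "phrases" and a "seat"; the specification leaves the matching method open.

```python
def find_spoken_seats(utterance_text, current_db):
    """Return the seats of all passengers for whom at least one registered
    phrase appears in the recognized utterance (empty list = S614: No)."""
    seats = []
    for entry in current_db:
        if any(phrase in utterance_text for phrase in entry["phrases"]):
            seats.append(entry["seat"])
    return seats
```

A production system would likely need tolerant matching to absorb recognition errors; exact substring search is used here only to keep the example short.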

If a phrase included in the text information 541 is uttered (step S614: Yes), the device looks up the current passenger database 530 and obtains the boarding position information 542 of the passenger corresponding to the uttered phrase (step S615). The navigation device 300 then determines whether there is a camera operation that takes priority (step S616). A prioritized camera operation is a process that uses the in-vehicle camera 311 and should be performed in preference to the passenger shooting process, such as the drive recorder function or passenger recognition processing. For example, when the traveling speed of the vehicle is at or above a predetermined speed, or when the current position of the vehicle is within a predetermined distance of an intersection, an accident is considered more likely than usual and the drive recorder function is given priority. The conditions for prioritized camera operations may be fixed in advance or may be configurable by the user. Up to this point, the in-vehicle camera 311 may be in a stopped (powered off) state.
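
The example condition given for prioritizing the drive recorder function can be expressed as a simple predicate. The specification only says "predetermined speed" and "predetermined distance", so the default thresholds (60 km/h and 50 m), the units, and the function name below are placeholder assumptions.

```python
def drive_recorder_has_priority(speed_kmh, distance_to_intersection_m,
                                speed_threshold_kmh=60.0,
                                intersection_radius_m=50.0):
    """Return True when the drive recorder should keep the camera (S616: Yes).

    Priority applies when the vehicle is at or above the speed threshold,
    or within the given distance of an intersection.
    """
    return (speed_kmh >= speed_threshold_kmh
            or distance_to_intersection_m <= intersection_radius_m)
```

Exposing the thresholds as parameters reflects the statement that the conditions may be preset or set by the user.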

If there is a prioritized camera operation (step S616: Yes), the device waits until that operation ends. If there is none (step S616: No), the device refers to the control table 560 recorded in the ROM 402, turns the shooting direction of the in-vehicle camera 311 toward the boarding position of the passenger corresponding to the uttered phrase (step S617), and photographs the passenger (step S618). If, on the other hand, no phrase included in the text information 541 is uttered (step S614: No), the process returns to step S613 and sound monitoring continues.

The passenger need not be photographed every time a phrase included in the text information 541 is uttered (see step S614). Instead, shooting may be performed only when a specific keyword (shooting instruction keyword) is uttered in addition to a phrase included in the text information 541. Shooting instruction keywords include phrases that instruct shooting directly, such as "shoot" or "take a picture", as well as phrases that instruct it indirectly, such as "look at (name)" or "point at (name)". The shooting instruction keywords may be fixed in advance or may be registered by the user.

Alternatively, for example, after the shooting direction has been turned toward the passenger in step S617, shooting may start at the point when an utterance containing a shooting instruction keyword is made. In this case, if no shooting instruction keyword is uttered within a predetermined time after the keyword identifying the subject was uttered, the device may treat this as a timeout and return the shooting direction of the in-vehicle camera 311 to its initial state.
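
The timeout variant described here can be modeled as a small decision function evaluated while the camera is already pointed at the passenger. The 10-second default, the action labels, and the function name are assumptions for illustration; the specification speaks only of a "predetermined time".

```python
def next_action(seconds_since_target_keyword, shoot_keyword_heard,
                timeout_s=10.0):
    """Decide what to do after the camera has been turned to the passenger.

    Returns "start_shooting" once a shooting instruction keyword is heard,
    "reset_direction" when the timeout expires without one, else "wait".
    """
    if shoot_keyword_heard:
        return "start_shooting"
    if seconds_since_target_keyword >= timeout_s:
        return "reset_direction"
    return "wait"
```

In this sketch a shooting instruction keyword wins even if it arrives exactly at the timeout boundary; the source does not specify that edge case.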

Conversely, shooting may start when an utterance containing a shooting instruction keyword is made, and if a phrase included in the text information 541 is uttered during shooting, the camera may be turned toward the passenger corresponding to that phrase. As a further alternative, the shooting direction may first be turned toward the speaker when an utterance containing a shooting instruction keyword is made, and then, when a phrase included in the text information 541 is uttered, turned toward the corresponding passenger, at which point shooting starts.

The subject of shooting is not limited to a single passenger. For example, when an attribute shared by several passengers, such as "everyone" or "men", is uttered, the matching passengers are photographed one after another. Likewise, when an utterance contains several phrases included in the text information 541, the passengers corresponding to each phrase are photographed one after another. A shooting priority may also be assigned to each passenger in advance.
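
Selecting several passengers by a shared attribute and ordering them by shooting priority might look like the following sketch. The record fields ("attributes", "priority", "seat"), the convention that a lower number means earlier shooting, and the literal "everyone" as the catch-all attribute are all assumptions of the example.

```python
def shooting_order(spoken_attribute, passengers):
    """Return the seats to photograph, in order, for a spoken attribute.

    "everyone" selects all passengers; any other value selects passengers
    whose attribute set contains it. Lower "priority" is shot first.
    """
    targets = [
        p for p in passengers
        if spoken_attribute == "everyone" or spoken_attribute in p["attributes"]
    ]
    return [p["seat"] for p in sorted(targets, key=lambda p: p["priority"])]
```

The camera would then be directed to each returned seat in turn.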

The navigation device 300 ends the shooting of the passenger (step S621) when an instruction to end shooting is given (step S619: Yes) or when a predetermined time has elapsed since shooting started (step S620: Yes). Here, an instruction to end shooting is, for example, a phrase that directly instructs the end of shooting, such as "stop" or "end", the utterance of a specific keyword, a button operation, or the like. While no instruction to end shooting has been given (step S619: No) and the predetermined time has not elapsed (step S620: No), the process returns to step S618 and shooting continues.
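
The loop over steps S618 to S621 has two exit conditions, which the following sketch simulates with a sequence of per-frame stop flags. The fixed frame interval, the representation of the end instruction as a boolean per frame, and the function name are simplifying assumptions; real code would poll the speech recognizer and a clock.

```python
def run_shooting(stop_requests, max_duration_s, frame_interval_s=1.0):
    """Simulate the shooting loop and return the number of frames captured.

    stop_requests: iterable of booleans, one per loop iteration, True when
    an end-of-shooting instruction has been detected (S619: Yes).
    """
    elapsed = 0.0
    frames = 0
    for stop in stop_requests:
        if stop:                       # S619: Yes -> end shooting (S621)
            break
        if elapsed >= max_duration_s:  # S620: Yes -> end shooting (S621)
            break
        frames += 1                    # S618: keep photographing
        elapsed += frame_interval_s
    return frames
```

For instance, an end instruction on the third iteration stops the loop after two frames, and a 3-second limit with a 1-second interval stops it after three frames.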

After shooting ends in step S621, the shooting direction of the in-vehicle camera 311 is returned to its original position (step S622). Until the vehicle finishes traveling (step S623: No), the process returns to step S613 and continues. When the vehicle finishes traveling (step S623: Yes), the current passenger database is erased (step S624) and the processing of this flowchart ends.

In the description above, scheduled passenger information is accumulated in the scheduled passenger information database 510. Alternatively, the scheduled passenger information database 510 may be omitted, and the current passenger database 530 may simply be generated by having passenger information entered on every trip. That is, a registration screen such as that shown in FIG. 8 is displayed on every trip, all passengers on board at that time enter their passenger information, and the information is stored in the current passenger database 530 together with the boarding position information. Because the current passenger database 530 is erased each time a trip ends, passengers must enter their information every time they board, but the navigation device 300 does not need to retain passenger information.

Although this example photographs the interior of a vehicle, it can be applied in the same way to, for example, a video conference system. In that case, the scheduled passengers of this example correspond to the expected conference attendees, and the boarding positions correspond to the seating positions in the conference room.

As described above, the navigation device 300 identifies the subject to be photographed from the sound inside the vehicle and controls the shooting direction of the in-vehicle camera 311 toward that subject. The shooting direction of the in-vehicle camera 311 can therefore be changed according to the content of the passengers' utterances. For example, by calling out to a child sitting in the rear seat, the driver can turn the in-vehicle camera 311 toward the child and have the video output to the display 414. This makes it possible to check on a child in the rear seat without turning around, even while driving.

In addition, the navigation device 300 extracts the positions of the passengers from video captured by the in-vehicle camera 311 and sound collected by the in-vehicle microphone 312. The boarding position of each passenger can thus be determined without asking passengers to enter it. Furthermore, because scheduled passenger information is accumulated in the scheduled passenger database, shooting can be performed without requiring passengers to enter their information each time or to carry out cumbersome operations.

The camera control method described in this embodiment can be implemented by executing a program prepared in advance on a computer such as a personal computer or a workstation. The program is recorded on a computer-readable recording medium such as a hard disk, flexible disk, CD-ROM, MO, or DVD, and is executed by being read from the recording medium by the computer. The program may also be a transmission medium that can be distributed via a network such as the Internet.

FIG. 1 is a block diagram showing the functional configuration of the camera control device.
FIG. 2 is a flowchart showing the procedure of the camera control process performed by the camera control device.
FIG. 3 is an explanatory diagram showing the area around the dashboard of a vehicle in which the navigation device is installed.
FIG. 4 is a block diagram showing the hardware configuration of the navigation device.
FIG. 5 is an explanatory diagram schematically showing the contents of the databases created by the navigation device.
FIG. 6 is a flowchart showing the procedure of the in-vehicle shooting process of the navigation device.
FIG. 7 is a flowchart showing the procedure of the in-vehicle shooting process of the navigation device.
FIG. 8 is an explanatory diagram showing an example of the input screen for scheduled passenger information.

Explanation of symbols

100 Camera control device
101 Acquisition unit
102 Discrimination unit
103 Control unit
104 Input unit
105 Detection unit
110 Camera

Claims

A camera control device comprising:
acquisition means for acquiring sound around a camera;
discrimination means for discriminating, from the sound acquired by the acquisition means, a phrase that identifies a subject to be photographed by the camera (hereinafter referred to as "specific phrase");
control means for controlling the shooting direction of the camera based on the specific phrase discriminated by the discrimination means;
input means for accepting input of information on a candidate for the subject; and
detection means for detecting the position of the candidate for the subject,
wherein the discrimination means discriminates, as the specific phrase, a phrase that substantially matches the information on the candidate input to the input means, and
wherein the control means, when a phrase that substantially matches the information on the candidate is discriminated by the discrimination means, directs the shooting direction of the camera to the position of the candidate detected by the detection means.

The camera control device according to claim 1, wherein the input means receives, as the information on the candidate for the subject, at least one of name information of the candidate and attribute information of the candidate.

The camera control device according to claim 1, wherein the input means receives an image of the candidate for the subject, and
the detection means detects the position of the candidate by comparing the image with video captured by the camera.

The camera control device according to claim 1, wherein the input means receives the voice of the candidate for the subject, and
the detection means detects the position of the candidate by comparing the voice with the sound acquired by the acquisition means.

The camera control device according to claim 1, wherein the camera is installed inside a vehicle,
the acquisition means acquires utterances of passengers of the vehicle,
the discrimination means discriminates, as the specific phrase, information on the passenger to be photographed, and
the control means directs the shooting direction of the camera to the boarding position of the passenger identified by the specific phrase.

A camera control method comprising:
an acquisition step of acquiring sound around a camera;
a discrimination step of discriminating, from the sound acquired in the acquisition step, a phrase that identifies a subject to be photographed by the camera (hereinafter referred to as "specific phrase");
a control step of controlling the shooting direction of the camera based on the specific phrase discriminated in the discrimination step;
an input step of accepting input of information on a candidate for the subject; and
a detection step of detecting the position of the candidate for the subject,
wherein, in the discrimination step, a phrase that substantially matches the information on the candidate input in the input step is discriminated as the specific phrase, and
wherein, in the control step, when a phrase that substantially matches the information on the candidate is discriminated in the discrimination step, the shooting direction of the camera is directed to the position of the candidate detected in the detection step.

A camera control program for causing a computer to execute the camera control method according to claim 6.

A computer-readable recording medium on which the camera control program according to claim 7 is recorded.