JP2006343829A

JP2006343829A - Device and method for photographing around vehicle

Info

Publication number: JP2006343829A
Application number: JP2005166905A
Authority: JP
Inventors: Yoshiyuki Matsubara; 慶幸松原
Original assignee: Denso Corp
Current assignee: Denso Corp
Priority date: 2005-06-07
Filing date: 2005-06-07
Publication date: 2006-12-21

Abstract

PROBLEM TO BE SOLVED: To make it easy to specify an object intended by a user from images obtained by photographing the surroundings of a vehicle, even if the direction of the vehicle changes during voice recognition.

SOLUTION: A method for photographing the surroundings of a vehicle captures images photographed by a plurality of cameras around the vehicle at the start of voice recognition (S106); calculates the amount of change in the direction of the own vehicle between the start time and the end time of the voice recognition (S112); determines, using a table that defines priorities on the basis of the amount of change in the direction of the own vehicle and the timing of image capture, the priorities of the images estimated to contain the object recognized through the voice recognition (S114); and, when more than one image area estimated to be the object is detected, displays the image area candidates on a display part in accordance with the determined priorities so that the user can select the intended object (S212).

COPYRIGHT: (C)2007, JPO&INPIT

Description

The present invention relates to a vehicle periphery photographing apparatus and a vehicle periphery photographing method, each of which photographs the surroundings of a vehicle using a plurality of cameras mounted on the vehicle.

Conventionally, as a vehicle periphery photographing apparatus that photographs the surroundings of a vehicle using a camera mounted on the vehicle, an apparatus has been proposed that recognizes the user's speech and thereby controls the photographing direction and the operation of the camera (see, for example, Patent Document 1).
Patent Document 1: JP-A-8-297736 (paragraphs 0019 to 0021)

However, when the surroundings of the vehicle are photographed by camera through recognition of the user's voice, as described in Patent Document 1, the following problem arises.

For example, suppose the user sees a sign ahead before an intersection and turns right at the intersection while saying "What is the sign ahead?", voice recognition ends when the right turn is complete, and images of the vehicle's surroundings are captured in response to the end of voice recognition. In that case, the "sign" may appear not in the image from the camera photographing the front of the vehicle but in the image from the camera photographing the left side of the vehicle.

Thus, when the surroundings of the vehicle are photographed by camera through voice recognition, a change in the direction of the vehicle while the user's utterance is being recognized can make it impossible to specify the object the user intends.

One conceivable approach is as follows: when a plurality of image areas estimated to be the object recognized by voice recognition are found in the captured images of the vehicle's surroundings, these image areas are displayed on the display screen and the user selects the intended object. However, if the image areas are simply displayed on the screen without any ordering, it is difficult to specify the intended object.

The present invention has been made in view of the above problem, and its object is to make it easy to specify the object intended by the user from image areas obtained by photographing the surroundings of the vehicle.

To achieve the above object, the invention according to claim 1 calculates the amount of change in the direction of the own vehicle between the start and the end of a voice recognition function that recognizes speech uttered by the user; captures images of the vehicle's surroundings photographed by a plurality of cameras at a predetermined timing based on the user's utterance; determines the priorities of the images captured by the image capturing means, in the order in which they are estimated to contain the object recognized by the voice recognition, using a table that defines the priority of the images estimated to contain the recognized object on the basis of the amount of change in the direction of the own vehicle, the direction of the object recognized by the voice recognition, and the timing at which the images are captured; detects, from the captured images, image areas estimated to be the object recognized by the voice recognition; and, when a plurality of image areas are detected, displays the candidate image areas on a display unit in accordance with the determined priorities and has the user select the intended object from among the displayed image areas.

Accordingly, candidate image areas estimated to contain the object recognized by voice recognition are displayed on the display unit in accordance with the priorities, which makes it easy to specify the object intended by the user.

The predetermined timing based on the user's utterance may be the start of voice recognition or the end of voice recognition, as in the invention according to claim 2.

Here, the images of the vehicle's surroundings captured at the start of voice recognition include not only images captured from the plurality of cameras at the moment voice recognition starts, but also images obtained by storing moving images from the cameras in a storage medium from before the start of voice recognition and retrieving, at the end of voice recognition, the images corresponding to the start of voice recognition from the storage medium.
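The second variant described above (buffering video from before recognition starts and retrieving the start-time frame once recognition ends) can be sketched as a bounded frame buffer. This is only an illustration: the class name `FrameBuffer`, the buffer size, and the nearest-timestamp lookup are assumptions, not details taken from the patent.

```python
from collections import deque
from typing import Deque, Optional, Tuple

Frame = Tuple[float, str]  # (timestamp in seconds, placeholder for image data)


class FrameBuffer:
    """Keeps the most recent frames so an earlier frame can be recovered later."""

    def __init__(self, max_frames: int) -> None:
        # Oldest frames are discarded automatically once the buffer is full.
        self._frames: Deque[Frame] = deque(maxlen=max_frames)

    def push(self, timestamp: float, image: str) -> None:
        self._frames.append((timestamp, image))

    def frame_at(self, timestamp: float) -> Optional[Frame]:
        """Return the buffered frame whose timestamp is closest to `timestamp`."""
        if not self._frames:
            return None
        return min(self._frames, key=lambda f: abs(f[0] - timestamp))


# Frames are buffered continuously; recognition started at about t = 2.2 s.
buf = FrameBuffer(max_frames=100)
for i in range(10):
    buf.push(float(i), f"frame-{i}")
start_frame = buf.frame_at(2.2)
```

One buffer per camera would be needed in practice, since the text describes several cameras recording in parallel.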

The inventions according to claims 4 to 6 acquire the conditions under which the vehicle's surroundings are photographed, correct the detection range of the color recognized by voice recognition according to those conditions, and detect, from the images captured by the image capturing means, the image area estimated to be the object recognized by voice recognition on the basis of the corrected color detection range.

Accordingly, the object intended by the user can be specified even when the conditions at the time of photographing cause the colors in the captured image to differ from the actual colors.

The invention according to claim 7 transmits the image data of the image area of the object to a server via communication means, acquires detailed information on the object retrieved by the server on the basis of the transmitted image data, and provides the acquired detailed information to the user.

Because the detailed information is retrieved by the server from the transmitted image data, it can be obtained from the server and provided to the user even if it is not stored in a storage medium inside the vehicle periphery photographing apparatus.

(First Embodiment)
FIG. 1 shows a schematic configuration of an information providing system according to a first embodiment of the present invention. The information providing system consists of a navigation device 1, which serves as the vehicle periphery imaging device, and a server 2. In other words, in this embodiment the vehicle periphery imaging device is implemented using the navigation device 1.

The navigation device 1 includes a remote control sensor 10, a touch panel 11, a voice recognition unit 12, a voice synthesis unit 13, a display device 14, a display 14a, a position detector 15, an external storage device 16, a communication device connection device 17, and a control circuit 18.

An in-vehicle camera 19 that photographs the surroundings of the vehicle is connected to the control circuit 18, a microphone 12a is connected to the voice recognition unit 12, and a speaker 13a is connected to the voice synthesis unit 13.

The remote control sensor 10 outputs to the control circuit 18 the signal it receives from a remote control 10a, which transmits a wireless signal (infrared or the like) in response to user operation.

The touch panel 11 is overlaid on the display screen of the display 14a and outputs a signal corresponding to the user's touch operation to the control circuit 18.

The voice recognition unit 12 recognizes the user's speech from the audio signal input from the microphone 12a, extracts the words contained in the speech, and outputs them to the control circuit 18.

The voice synthesis unit 13 generates an audio signal for voice output on the basis of the signal input from the control circuit 18 and outputs the generated audio signal to the speaker 13a.

The display device 14 displays video on the display unit of the display 14a (an LED display, LCD, or the like) in accordance with the video signal from the control circuit 18.

The position detector 15 has sensors such as a GPS receiver, a gyro sensor, a vehicle speed sensor, and an acceleration sensor, and outputs to the control circuit 18 positioning signals, based on the characteristics of each sensor, for determining the position and heading (orientation) of the own vehicle.

The external storage device 16 controls the reading of data from, and where possible the writing of data to, external storage media such as CD-ROMs, DVDs, and memory cards. The information stored on these media includes map matching data and map data.

The communication device connection device 17 is a device for connecting a communication device 3. With the communication device 3 connected, bidirectional communication takes place between the navigation device 1 and the server 2 via the communication device 3. A mobile phone is used as the communication device 3.

The control circuit 18 is a computer having memory such as ROM and RAM, a CPU, and so on, and executes various processes according to programs stored in the ROM. Specifically, these include a current position calculation process that calculates the current position from the signals input from the position detector 15, and a map display process that displays the own-vehicle position mark superimposed on the current position on the map. The control circuit 18 also has a timer for timekeeping and uses it to manage the date and time.

The in-vehicle camera 19 consists of three cameras that photograph the front, the right side, and the left side of the vehicle. The image data from these cameras is input to the control circuit 18 as moving images, and the control circuit 18 captures still images into memory when needed.

The navigation device 1 of this embodiment recognizes the user's utterance with the voice recognition unit 12, photographs the vehicle's surroundings with the in-vehicle camera 19 when voice recognition ends, identifies in the captured images the image area of the object recognized by voice recognition, and transmits the identified image data to the server 2 via the communication device 3 connected to the communication device connection device 17.

The server 2 has a database (DB) 20 that stores detailed information on animals and plants, signs, automobiles, and the like. For animals and plants, for example, the database 20 stores images, names, places of origin, and similar details; for automobiles, it stores images, model names, manufacturer names, and model years.

On receiving image data from the navigation device 1, the server 2 searches the database 20 for an image that matches the object contained in the received image data and transmits the detailed information on the retrieved object to the navigation device 1. For example, on receiving image data of an oncoming vehicle, the server searches the database 20 for an image matching the oncoming vehicle and, if a matching vehicle is found, transmits its model name, manufacturer name, model year, and similar details to the navigation device 1.

The navigation device 1 displays the detailed information on the object provided by the server 2 on the display 14a.

Next, voice recognition by the voice recognition unit 12 is described. The voice recognition unit 12 extracts the words contained in the user's speech by natural-language morphological analysis and, for each extracted word, recognizes the relative orientation (front, front right, upper left, lower right, rear left, etc.), features of the target (color, shape, size, etc.), target names (proper nouns, facilities, signboards, animals and plants, etc.), interrogatives (what, when, where, who, why, how), and so on.

The external storage medium of the external storage device 16 stores various databases such as a color database (color DB), a shape database (shape DB), and a relative orientation database (relative orientation DB), as shown in FIG. 2. The voice recognition unit 12 uses these databases to recognize the relative orientation, target features, target names, interrogatives, and so on.

For example, in response to the user's question "What is the name of the blue oncoming car?", the unit extracts "blue", "oncoming", "car", "name", and "what". Using the databases shown in FIG. 2, it then recognizes the relative orientation, target features, target name, interrogative, and so on: "blue" is a color expression contained in the color database and is classified as "blue", and "oncoming" is a relative orientation expression contained in the relative orientation database and is classified as "front".
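The database lookup in this example can be sketched as a simple dictionary classification. The dictionary contents below are hypothetical stand-ins for the databases of FIG. 2; only the "blue" → "blue" and "oncoming" → "front" entries mirror the example in the text, and the function name `classify` is an assumption.

```python
from typing import Dict, List, Tuple

# Hypothetical database contents; only "blue" and "oncoming" follow the text.
COLOR_DB: Dict[str, str] = {"blue": "blue", "navy": "blue", "white": "white"}
RELATIVE_ORIENTATION_DB: Dict[str, str] = {"oncoming": "front", "ahead": "front"}
# Interrogatives listed in the description of the voice recognition unit.
INTERROGATIVES = {"what", "when", "where", "who", "why", "how"}


def classify(tokens: List[str]) -> List[Tuple[str, str, str]]:
    """Tag each token as (token, category, class); unknown tokens are 'other'."""
    result = []
    for tok in tokens:
        if tok in COLOR_DB:
            result.append((tok, "color", COLOR_DB[tok]))
        elif tok in RELATIVE_ORIENTATION_DB:
            result.append((tok, "relative_orientation", RELATIVE_ORIENTATION_DB[tok]))
        elif tok in INTERROGATIVES:
            result.append((tok, "interrogative", tok))
        else:
            result.append((tok, "other", tok))
    return result


# Words extracted from "What is the name of the blue oncoming car?"
tags = classify(["blue", "oncoming", "car", "name", "what"])
```

The morphological analysis that produces the word list is outside the scope of this sketch.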

The voice recognition unit 12 starts voice recognition processing in response to the user's voice input and ends the series of voice recognition processes once no voice input has been received for a fixed period (for example, 5 seconds) or longer. When the user speaks again, it performs another series of voice recognition processes until voice input has again been absent for the fixed period or longer.
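The start and end rule above can be illustrated by grouping voice-input timestamps into recognition sessions. This is a minimal sketch under the assumption that voice activity is available as a sorted list of timestamps; the function name `split_sessions` is illustrative, and the 5-second value is the example given in the text.

```python
SILENCE_TIMEOUT_S = 5.0  # example value from the text


def split_sessions(voice_times, timeout=SILENCE_TIMEOUT_S):
    """Group sorted voice-input timestamps into (start, end) recognition sessions.

    A session ends once no voice input has arrived for `timeout` seconds,
    so its end time is the last input time plus the timeout.
    """
    sessions = []
    start = last = None
    for t in voice_times:
        if start is None:
            start = last = t
        elif t - last >= timeout:
            # Silence lasted long enough: close the session, open a new one.
            sessions.append((start, last + timeout))
            start = last = t
        else:
            last = t
    if start is not None:
        sessions.append((start, last + timeout))
    return sessions


sessions = split_sessions([0.0, 1.0, 2.5, 10.0, 11.0])
```

The session start and end times are the two instants at which the vehicle heading is sampled later in the flow.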

As described above, when the surroundings of the vehicle are photographed by camera through voice recognition, the heading of the own vehicle may change between the start and the end of voice recognition, in which case the image taken in the direction (relative orientation) recognized by voice recognition may not contain the object the user intends. The navigation device 1 therefore calculates the amount of change in the heading of the own vehicle from the start to the end of voice recognition, determines the priorities of the images estimated to contain the object intended by the user, and displays the candidate image areas estimated to contain the recognized object on the display unit in accordance with those priorities, which makes it easier to specify the intended object among the candidates.

The voice recognition unit 12, the voice synthesis unit 13, and the control circuit 18 of the navigation device 1 are implemented by a computer. The processing of this computer is described next with reference to FIG. 3. The navigation device 1 starts operating when power is supplied from the battery via the ignition switch, and the computer periodically performs the processing shown in FIG. 3 in parallel with processes such as the current position calculation described above. In addition, at regular intervals the computer connects to the Internet from the communication device connection device 17 via the mobile phone, acquires the current weather from a site on the Internet, and stores it in memory.

First, the absolute heading (D1) of the own vehicle is calculated from the positioning signals input from the position detector 15, and the calculated absolute heading (D1) is stored in memory (S100). In this embodiment, the absolute heading is calculated as a value from 0° to 360° measured counterclockwise from north.

Next, whether there is voice input is determined from whether an audio signal is being input from the microphone 12a (S102).

If no audio signal is input from the microphone 12a, the result of S102 is "no", and processing returns to S100, where the latest absolute heading (D1) is calculated again and stored in memory.

If an audio signal is input from the microphone 12a, the result of S102 is "yes", and voice recognition of the user's speech starts. When voice recognition starts in this way, the absolute heading (D1) stored in memory is the heading of the own vehicle at the start of voice recognition.

Next, the orientation, features (color, shape), interrogatives, and so on of the object are extracted by voice recognition. Specifically, they are recognized from the words extracted from the user's speech (S104).

When silence without voice input has continued for the fixed period or longer and the series of voice recognition processes ends, the images of the vehicle's surroundings photographed by the in-vehicle camera 19 are captured and stored in memory (S106).

Next, the conditions around the vehicle at the time of image capture are acquired, specifically the date and time and the weather. The date and time are obtained from the timer, and the current weather is read from memory (S108).

Next, the absolute heading (D2) of the own vehicle is calculated from the positioning signals input from the position detector 15, and the calculated absolute heading (D2) is stored in memory (S110).

Next, the amount of change in the heading of the own vehicle between the utterance and the image capture, that is, from the start to the end of voice recognition, is calculated. As shown in FIG. 4, this amount is calculated as the difference (D1 − D2) between the absolute heading (D1) calculated in S100 and the absolute heading (D2) calculated in S110 (S112).
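The difference D1 − D2 can be sketched as below. The wrap-around into [−180°, 180°) is an added assumption that the text does not describe; it is included because headings are stated to lie between 0° and 360°, so a raw difference across north (for example 10° and 350°) would otherwise look like a near-full turn. The function name is illustrative.

```python
def heading_change(d1: float, d2: float) -> float:
    """Return D1 - D2 in degrees, wrapped into [-180, 180).

    d1: absolute heading at the start of voice recognition (S100)
    d2: absolute heading at the end of voice recognition (S110)
    The wrap-around step is an assumption, not part of the source text.
    """
    return (d1 - d2 + 180.0) % 360.0 - 180.0


change_a = heading_change(90.0, 0.0)    # no wrap needed
change_b = heading_change(10.0, 350.0)  # crosses north
change_c = heading_change(350.0, 10.0)  # crosses north the other way
```

If the counterclockwise convention of S100 is read literally, a right (clockwise) turn lowers the heading, so D1 − D2 would come out positive for a right turn and negative for a left turn.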

Next, based on the relative orientation of the object recognized by voice recognition and the amount of change in heading calculated in S112, the priorities of the images (camera images) estimated to contain the object are determined from among the captured images of the vehicle's surroundings (S114).

As shown in FIG. 5, the external storage medium of the external storage device 16 stores a table that defines, for each combination of the amount of change in the heading of the own vehicle and the orientation of the object recognized by voice recognition, the priority of the images estimated to contain the recognized object. This table is used to determine the priorities of the images that may contain the object the user intends.
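A table lookup of this kind might be sketched as follows. The actual entries of FIG. 5 are not reproduced in the text, so every value here is a hypothetical placeholder: the 45° threshold, the three turn classes, the sign convention (positive = right turn), and the table rows. Only the right-turn row for "front" is motivated by the text, which notes that a sign seen ahead may end up in the left camera image after a right turn.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for the table of FIG. 5. The real entries and the
# 45-degree threshold are NOT given in the text and are assumptions.
TURN_THRESHOLD_DEG = 45.0

PRIORITY_TABLE: Dict[Tuple[str, str], List[str]] = {
    ("straight", "front"): ["front", "right", "left"],
    ("right_turn", "front"): ["left", "front", "right"],
    ("left_turn", "front"): ["right", "front", "left"],
}


def turn_class(change_deg: float) -> str:
    """Bucket a heading change; positive = right turn (assumed convention)."""
    if change_deg >= TURN_THRESHOLD_DEG:
        return "right_turn"
    if change_deg <= -TURN_THRESHOLD_DEG:
        return "left_turn"
    return "straight"


def camera_priority(change_deg: float, spoken_direction: str) -> List[str]:
    """Return camera names ordered from highest to lowest priority."""
    return PRIORITY_TABLE[(turn_class(change_deg), spoken_direction)]


# The user said "front" and the vehicle turned right by about 90 degrees.
order = camera_priority(90.0, "front")
```

A full table would also need rows for other spoken directions ("right", "left", and so on) and, per claim 1, for the image capture timing.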

For example, an entry of "front > right, left" means that the image from the camera photographing the front of the vehicle has the highest priority, and that the images from the cameras photographing the right and left sides of the vehicle have equal priority.

An entry of "front > right > left" means that the image from the camera photographing the front of the vehicle has the highest priority, the image from the camera photographing the right side has the next highest, and the image from the camera photographing the left side has the lowest.
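The notation used in these table entries can be read mechanically: ">" separates priority tiers and "," separates cameras of equal priority. The parser below is a small sketch of that reading; the function names are illustrative and the English camera names stand in for the Japanese labels in the table.

```python
from typing import Dict, List


def parse_priority(expr: str) -> List[List[str]]:
    """Parse 'front > right, left' into tiers: [['front'], ['right', 'left']]."""
    return [[name.strip() for name in tier.split(",")] for tier in expr.split(">")]


def rank_of(expr: str) -> Dict[str, int]:
    """Map each camera name to its rank (0 = highest); tied cameras share a rank."""
    return {
        name: rank
        for rank, tier in enumerate(parse_priority(expr))
        for name in tier
    }


tiers = parse_priority("front > right, left")
ranks = rank_of("front > right > left")
```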

Next, an image area detection process is performed that detects the image area of the object in each image, using the conditions at the time of image capture acquired in S108, the priorities determined in S114, and the features of the object recognized by voice recognition (S200).

FIG. 6 shows a flowchart of the image area detection process. In this process, the color range is corrected using color range conversion tables stored in advance on the external storage medium, so that the object intended by the user can be specified even when the time of day, the weather, or similar factors cause the colors of the captured image to differ from the actual colors.

FIG. 7 shows the color range conversion table (time of day) used to correct the color range for the effect of the time of day. For example, at dawn and in the evening, even white objects can look reddish under the rising or setting sun, so the color detection range is corrected by widening the red range relative to the color recognized by voice recognition. At night, the surroundings are dark and even white objects may look gray, so the color detection range is corrected by reducing the brightness relative to the color recognized by voice recognition.

FIG. 8 shows the color range conversion table (weather) used to correct the color range for the effect of the weather. As shown in the figure, in cloudy, rainy, or snowy weather the surroundings are darker, so the color detection range is corrected by reducing the brightness relative to the color recognized by voice recognition.

In the image area detection process shown in FIG. 6, the image with the highest priority is read first (S202).

Next, the correction to apply to the matching color range is determined from the conditions at the time of image capture acquired in S108 (date and time, and weather) and the color range conversion tables, and the color range is corrected accordingly for the color of the object recognized by voice recognition. Specifically, the correction for the time of day at image capture is determined from the color range conversion table (time of day), the correction for the weather at image capture is determined from the color range conversion table (weather), and the color range for the recognized object color is corrected on the basis of these corrections (S204).

For example, even if the color of the object recognized by voice recognition is white, in the evening the red range is widened relative to white, so the range is corrected to include pink as well as white. Likewise, even if the recognized color is white, in rain the brightness is reduced, so the range is corrected to be closer to gray than to white.
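The two corrections described here (widening toward red, and reducing brightness) can be sketched on per-channel RGB ranges. Everything numeric below is hypothetical: the tables of FIG. 7 and FIG. 8 are not reproduced in the text, so the white range, the offsets `RED_WIDEN` and `DARKEN`, and the choice of RGB boxes as the range representation are all assumptions made for illustration.

```python
from typing import Dict, Tuple

Range = Dict[str, Tuple[int, int]]  # channel -> (min, max), values 0..255

# Hypothetical detection range for "white" and hypothetical offsets.
WHITE: Range = {"r": (200, 255), "g": (200, 255), "b": (200, 255)}
RED_WIDEN = 60  # how far the green/blue minimums are lowered (assumption)
DARKEN = 80     # how far the whole range is shifted down (assumption)


def widen_toward_red(rng: Range) -> Range:
    """Admit reddish variants by lowering the green and blue minimums."""
    out = dict(rng)
    for ch in ("g", "b"):
        lo, hi = rng[ch]
        out[ch] = (max(0, lo - RED_WIDEN), hi)
    return out


def darken(rng: Range) -> Range:
    """Shift every channel range toward darker values."""
    return {ch: (max(0, lo - DARKEN), max(0, hi - DARKEN)) for ch, (lo, hi) in rng.items()}


def correct_range(rng: Range, time_of_day: str, weather: str) -> Range:
    """Apply the time-of-day correction, then the weather correction."""
    if time_of_day in ("dawn", "evening"):
        rng = widen_toward_red(rng)
    elif time_of_day == "night":
        rng = darken(rng)
    if weather in ("cloudy", "rain", "snow"):
        rng = darken(rng)
    return rng


def contains(rng: Range, rgb: Tuple[int, int, int]) -> bool:
    """Check whether an (r, g, b) color lies inside the range."""
    return all(lo <= v <= hi for (lo, hi), v in zip((rng["r"], rng["g"], rng["b"]), rgb))


PINK = (255, 182, 193)
GRAY = (150, 150, 150)
evening_white = correct_range(WHITE, "evening", "clear")
rainy_white = correct_range(WHITE, "day", "rain")
```

With these placeholder numbers, pink falls outside the uncorrected white range but inside the evening range, and mid-gray falls inside the rainy range, which matches the two examples in the text qualitatively.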

Next, the read image is matched for image areas estimated to be the object, based on the conditions at the time of image capture and the features of the object recognized by voice recognition. Specifically, the read image is searched for image areas whose color falls within the color range corrected in S204. If, for example, "sign" was recognized by voice recognition, shapes matching the features of signs, such as circles and triangles, are added to the search conditions for the matching (S206).
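The color part of this matching step can be illustrated with a toy search that returns the bounding box of all pixels inside a color range. This is a deliberately simplified sketch: the image is a nested list of RGB tuples, the range is a hypothetical per-channel box, and the shape condition mentioned in the text is not implemented.

```python
from typing import List, Optional, Sequence, Tuple

Pixel = Tuple[int, int, int]
ChannelRange = Tuple[int, int]  # (min, max) for one channel


def in_range(px: Pixel, rng: Sequence[ChannelRange]) -> bool:
    """True if every channel of the pixel lies inside its (min, max) range."""
    return all(lo <= v <= hi for (lo, hi), v in zip(rng, px))


def find_region(
    image: List[List[Pixel]], rng: Sequence[ChannelRange]
) -> Optional[Tuple[int, int, int, int]]:
    """Return (top, left, bottom, right) of the pixels inside `rng`, or None."""
    hits = [
        (y, x)
        for y, row in enumerate(image)
        for x, px in enumerate(row)
        if in_range(px, rng)
    ]
    if not hits:
        return None
    ys = [y for y, _ in hits]
    xs = [x for _, x in hits]
    return (min(ys), min(xs), max(ys), max(xs))


K = (0, 0, 0)        # black background
G = (150, 150, 150)  # gray object
image = [
    [K, K, K, K],
    [K, G, G, K],
    [K, G, G, K],
]
gray_range = [(120, 175), (120, 175), (120, 175)]  # hypothetical corrected range
box = find_region(image, gray_range)
```

A practical implementation would separate disconnected regions and apply the shape test; this sketch treats all matching pixels as one area.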

If a matching image area is detected in the read image, the detected image area is stored in RAM as a candidate for the object (S208), and it is then determined whether matching has been performed for all images (S210).

In this way, steps S202 to S208 are repeated for the images in descending order of priority. When it is determined in S210 that matching has been performed on all images (YES in S210), it is then determined whether a plurality of object candidates are stored in the memory. If there are a plurality of candidates, they are displayed on the display 14a in accordance with the priority so that the user can select the intended object, and the candidate that the user selects from those displayed on the display 14a by operating the touch panel 11 or the remote controller 10a is identified as the object intended by the user (S212).

For example, numbers are assigned in descending order of priority, and the candidates are displayed side by side from left to right on the display 14a in that order so that the user can select the intended object. When a single image contains a plurality of object candidates, the candidate stored in the memory earlier is given the higher priority, and the candidates are displayed on the display 14a in descending order of priority.

If only one object candidate is stored in the memory in S212, that object is displayed on the display 14a and is identified as the object intended by the user.

If no object candidate is stored in the memory in S212, a message indicating that the object intended by the user was not found is displayed on the display 14a, and this process ends.
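The three cases of S212 (no candidate, one candidate, several candidates) can be sketched as below. The function name `resolve_candidates`, the `(camera, region)` candidate format, and the `ask_user` callback standing in for the touch panel or remote controller are assumptions made for this illustration.

```python
def resolve_candidates(candidates, image_priority, ask_user):
    """Pick the object intended by the user.

    candidates:     list of (camera, region) in the order they were stored
    image_priority: camera names, highest priority first
    ask_user:       callback that receives [(number, candidate), ...] in
                    display order and returns the selected number
    """
    if not candidates:
        return None                      # caller shows a "not found" message
    if len(candidates) == 1:
        return candidates[0]             # a single candidate is taken as is
    rank = {camera: i for i, camera in enumerate(image_priority)}
    # sorted() is stable, so candidates from the same image keep their
    # storage order: the one stored earlier ranks higher.
    ordered = sorted(candidates, key=lambda c: rank[c[0]])
    numbered = list(enumerate(ordered, start=1))   # 1 = leftmost on screen
    return ordered[ask_user(numbered) - 1]
```

Relying on a stable sort keeps the rule that, within one image, the candidate stored earlier is shown first, without needing a second sort key.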

Returning to the description of FIG. 3, in S300 the image area detected by the image area detection process of S200, that is, the image data of the object identified by the user's operation, is transmitted to the server 2 together with the content of the user's inquiry. Before the image data is transmitted, the image is cropped to the smallest image that contains the object, and the cropped image is transmitted to the server 2. The content of the user's inquiry is transmitted to the server 2 as the words recognized by voice recognition of the user's speech (S300).
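The cropping step before transmission can be illustrated as follows, assuming the image is a list of pixel rows and the object is described by an inclusive bounding box (top, left, bottom, right). The names `crop_to_region` and `build_inquiry`, and the dictionary layout of the inquiry, are hypothetical; the description does not specify the actual data format sent to the server 2.

```python
def crop_to_region(image, box):
    """Cut out the smallest sub-image that contains the bounding box."""
    top, left, bottom, right = box
    return [row[left:right + 1] for row in image[top:bottom + 1]]


def build_inquiry(image, box, recognized_text):
    """Bundle the recognized words with the cropped image (assumed format)."""
    return {"query": recognized_text, "image": crop_to_region(image, box)}
```

Sending only the cropped area keeps the transmitted image data to the part that the server actually needs for its search.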

サーバ２は、ユーザからの問い合わせ内容と画像データを受信すると、受信したユーザからの問い合わせ内容と画像データに基づいてマッチングする画像をデータベース２０から検索し、検索結果をナビゲーション装置１へ送信する。具体的には、受信したユーザからの問い合わせ内容と画像データに基づいてマッチングする対象物をデータベース２０から検索し、検索された対象物の詳細情報をナビゲーション装置１へ送信する。 When the server 2 receives the inquiry content and image data from the user, the server 2 searches the database 20 for a matching image based on the received inquiry content and image data from the user, and transmits the search result to the navigation device 1. Specifically, an object to be matched is searched from the database 20 based on the received inquiry content and image data from the user, and detailed information of the searched object is transmitted to the navigation device 1.

When the computer of the navigation device 1 receives the detailed information on the retrieved object from the server 2 (S302), it displays the received detailed information on the display 14a and outputs it as speech from the speaker 13a (S304), and then returns to the process of S100.

According to the configuration described above, images of the vehicle periphery photographed by the plurality of cameras are captured at the start of the voice recognition function that recognizes the user's speech (S106), and the amount of change in the direction of the vehicle between the start and the end of the voice recognition function is calculated (S112). Using a table that defines, on the basis of the amount of change in the direction of the vehicle, the direction of the object recognized by voice recognition, and the predetermined timing at which the images were captured by the image capturing means, the priority of the images estimated to contain the recognized object, the priority of the captured images is determined in the order in which they are estimated to contain the recognized object (S114). An image area estimated to be the recognized object is detected from the captured images (S200), and when a plurality of image areas are detected, the candidate image areas are displayed on the display unit in accordance with the determined priority and the user selects the intended object from the displayed image areas (S212).

したがって、表示部には、優先度に従って音声認識によって認識された対象物が含まれると推定される画像領域の候補が表示されるので、ユーザの意図する対象物を特定し易くすることができる。 Therefore, the display unit displays image area candidates that are estimated to include the object recognized by the voice recognition according to the priority, so that the object intended by the user can be easily specified.

(Second Embodiment)
Next, an information providing system according to a second embodiment will be described. The navigation device 1 of the first embodiment captures an image of objects around the vehicle at the end of voice recognition, whereas the navigation device 1 of the present embodiment keeps a fixed duration of moving images of the vehicle periphery in the memory and reads out the stored images at the end of voice recognition. Parts that are the same as in the first embodiment are given the same reference numerals and their description is omitted; the description below focuses on the differences.

図９に、本実施形態に係るナビゲーション装置１のコンピュータの処理を示す。 FIG. 9 shows the processing of the computer of the navigation device 1 according to this embodiment.

まず、車載カメラ１９によって撮影された車両周辺の画像を取り込みメモリに記憶する（Ｓ４００）。 First, an image around the vehicle taken by the in-vehicle camera 19 is captured and stored in the memory (S400).

次に、画像取り込み時の車両周辺の状況を取得する。具体的には、画像取り込み時における日時情報、天気等の車両周辺の状況を取得する（Ｓ４０２）。 Next, the situation around the vehicle at the time of image capture is acquired. Specifically, the situation around the vehicle such as date information and weather at the time of image capture is acquired (S402).

次に、位置検出器１５から入力される測位信号に基づいて自車の絶対方位（Ｄ３）を算出するとともに算出した絶対方位（Ｄ３）をメモリに記憶する（Ｓ４０４）。 Next, the absolute azimuth (D3) of the host vehicle is calculated based on the positioning signal input from the position detector 15, and the calculated absolute azimuth (D3) is stored in the memory (S404).

次に、マイク１２ａから音声信号の入力があるか否かに基づいて音声入力があるか否かを判定する（Ｓ４０６）。 Next, it is determined whether there is an audio input based on whether there is an audio signal input from the microphone 12a (S406).

If no audio signal is input from the microphone 12a, the determination in S406 is “None” and the process returns to S400, after which the latest absolute direction (D3) is calculated again and stored in the memory.

また、マイク１２ａから音声信号が入力されると、Ｓ４０６において「あり」と判定され、ユーザの音声を認識する音声認識を開始する。なお、メモリに記憶された絶対方位（Ｄ３）が、音声認識の開始時の自車の方位となる。 When a voice signal is input from the microphone 12a, “Yes” is determined in S406, and voice recognition for recognizing the user's voice is started. Note that the absolute direction (D3) stored in the memory is the direction of the vehicle at the start of speech recognition.

Next, the direction, features (color, shape), interrogative word, and so on of the object are extracted by voice recognition. Specifically, the direction, features (color, shape), interrogative word, and so on of the object contained in the user's speech are recognized from the words extracted by voice recognition (S408).

When a silent state with no voice input continues for a certain time or longer and the series of voice recognition ends, the absolute direction (D4) of the vehicle is calculated on the basis of the positioning signal input from the position detector 15, and the calculated absolute direction (D4) is stored in the memory (S410).
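The end-of-recognition condition (silence lasting a certain time or longer) can be sketched as follows. The function `recognition_end_time`, its input format (ascending timestamps, in seconds, at which speech was detected), and the timeout value used in the example are assumptions for illustration only.

```python
def recognition_end_time(voice_times, silence_timeout):
    """Return the time at which voice recognition is considered finished.

    voice_times:     non-empty, ascending timestamps at which speech was heard
    silence_timeout: length of silence (same unit) that ends the recognition
    """
    for current, following in zip(voice_times, voice_times[1:]):
        if following - current >= silence_timeout:
            # The gap is long enough: recognition ended during this gap.
            return current + silence_timeout
    # No long gap between samples: it ends after the last detected speech.
    return voice_times[-1] + silence_timeout
```

For instance, with speech detected at 0.0, 0.5, 1.0, and 4.0 seconds and a 2-second timeout, recognition is taken to end at 3.0 seconds, and that is the moment at which the second direction would be sampled.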

Next, the amount of change in the direction of the vehicle from the utterance until the image is captured, that is, from the start of voice recognition to its end, is calculated. As shown in FIG. 4, the amount of change in direction is calculated as the difference (D4−D3) between the absolute direction (D4) of the vehicle calculated in S410 and the absolute direction (D3) of the vehicle calculated in S404 (S412).

Next, the priority of the image areas estimated to contain the object intended by the user is determined on the basis of the table shown in FIG. 5, which defines, for each combination of the amount of change in the direction of the vehicle and the direction of the object recognized by voice recognition, the priority of the images estimated to contain the recognized object (S414).

In the first embodiment, an image of the vehicle periphery is captured at the end of voice recognition, whereas in the present embodiment an image of the vehicle periphery from the start of voice recognition is captured. The priority could therefore be determined using a table different from the one shown in FIG. 5; instead, the method of calculating the amount of change in direction is changed so that the same table as in the first embodiment (shown in FIG. 5) can be used to determine the priority.

すなわち、第１実施形態では、図３のＳ１１２の処理において、音声認識の開始時の絶対方位（Ｄ１）と音声認識の終了時の絶対方位（Ｄ２）の差分（Ｄ１−Ｄ２）として方位の変化量を算出したが、本実施形態では、音声認識の終了時の絶対方位（Ｄ４）と音声認識の開始時の絶対方位（Ｄ３）の差分（Ｄ４−Ｄ３）として方位の変化量を算出することにより、第１実施形態と同じテーブル（図５に示す）を用いて優先度の決定を行うようにしている。 That is, in the first embodiment, in the process of S112 in FIG. 3, the azimuth changes as the difference (D1-D2) between the absolute azimuth (D1) at the start of speech recognition and the absolute azimuth (D2) at the end of speech recognition. In this embodiment, the amount of change in orientation is calculated as the difference (D4-D3) between the absolute orientation (D4) at the end of speech recognition and the absolute orientation (D3) at the start of speech recognition. Thus, the priority is determined using the same table (shown in FIG. 5) as in the first embodiment.
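The two ways of calculating the amount of change in direction can be compared with a short sketch. The function names are hypothetical, and wrapping the result into the interval (−180°, 180°] is an assumption added so that headings near 0°/360° behave sensibly; the description itself only gives the differences D1−D2 and D4−D3.

```python
def wrap_degrees(angle):
    """Map an angle in degrees to the interval (-180, 180]."""
    wrapped = (angle + 180.0) % 360.0 - 180.0
    return 180.0 if wrapped == -180.0 else wrapped


def change_first_embodiment(d_start, d_end):
    """First embodiment: start direction minus end direction (D1 - D2)."""
    return wrap_degrees(d_start - d_end)


def change_second_embodiment(d_start, d_end):
    """Second embodiment: end direction minus start direction (D4 - D3)."""
    return wrap_degrees(d_end - d_start)
```

For the same pair of start and end directions the two functions return values of opposite sign, which is what allows a single priority table to serve both capture timings.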

次に、画像取り込み時の状況、画像領域の優先度および音声認識によって認識された対象物の特徴に基づいて対象物と推定される画像領域の検出を行う（Ｓ２００）。 Next, an image area estimated as an object is detected based on the situation at the time of image capture, the priority of the image area, and the characteristics of the object recognized by voice recognition (S200).

以下、図３のＳ３００〜Ｓ３０４と同様の処理を行い、対象物の詳細情報をユーザ提供する。 Thereafter, processing similar to S300 to S304 in FIG. 3 is performed, and detailed information on the object is provided to the user.

Thus, for example, suppose that, as shown in FIG. 10(a), the user begins speaking before an intersection with “What is that blue car? ...”, says “... seen on the left ...” partway through the right turn, and the series of voice recognition ends when the vehicle has finished turning right at the right-angled intersection, as shown in FIG. 10(b). In that case, the direction of the vehicle when the user began speaking before the intersection is stored in the memory as the direction (D3) of the vehicle at the start of voice recognition, and the direction of the vehicle when it has finished turning right at the end of voice recognition is stored in the memory as the direction (D4). The amount of change in direction θ is then calculated as θ = D4−D3 = −90°.

図５に示すように、相対方位が「左」、方位の変化量が「−６０°以上」の場合の優先度は「前＞右、左」の順となる。 As shown in FIG. 5, when the relative azimuth is “left” and the azimuth change amount is “−60 ° or more”, the priorities are in the order of “front> right, left”.

したがって、図１０（ａ）に示すように、車両前方を撮影するカメラの撮影画像から優先的に対象物の画像を検出する画像領域検出処理を行う。 Therefore, as shown in FIG. 10A, an image area detection process is performed for preferentially detecting the image of the object from the captured image of the camera that captures the front of the vehicle.

Similarly, suppose that, as shown in FIG. 11(a), the user begins speaking before an intersection, says “... seen on the right ...” partway through the right turn, and the series of voice recognition ends when the vehicle has finished turning right at the right-angled intersection, as shown in FIG. 11(b). In that case, the amount of change in direction θ is calculated as θ = D4−D3 = −90°.

図５に示すように、相対方位が「右」、方位の変化量が「−６０°以上」の場合の優先度は「右＞前＞左」の順となる。 As shown in FIG. 5, when the relative azimuth is “right” and the azimuth change amount is “−60 ° or more”, the priorities are in the order of “right> front> left”.

Therefore, as shown in FIG. 11(a), the image area detection process searches for the image of the object in the image from the camera photographing the right side of the vehicle, then in the image from the camera photographing the front of the vehicle, and then in the image from the camera photographing the left side of the vehicle.

また、図１２（ａ）に示すように、右側に４５°曲がった道路に差し掛かる手前でユーザが発話を開始し、交差点の右折途中で「・・・、前の方に見える・・・」と発話し、図１２（ｂ）に示すように車両が右側に４５°曲がった道路の走行中に一連の音声認識が終了するような場合、方位の変化量θは、θ＝Ｄ４−Ｄ３＝−４５°として算出される。 Also, as shown in FIG. 12 (a), the user starts speaking just before approaching a road that is 45 ° to the right, and “... is visible in the front ...” in the middle of the right turn at the intersection. When a series of voice recognition ends while the vehicle is traveling on a road that is 45 ° to the right as shown in FIG. 12B, the direction change amount θ is θ = D4−D3 = Calculated as -45 °.

図５に示すように、相対方位が「前」、方位の変化量が「−３０°以上、−６０°未満」の場合の優先度は「前＞左＞右」の順となる。 As shown in FIG. 5, when the relative azimuth is “front” and the azimuth change amount is “−30 ° or more and less than −60 °”, the priorities are in the order of “front> left> right”.

Therefore, as shown in FIG. 12(a), the image area detection process searches for the image of the object in the image from the camera photographing the front of the vehicle, then in the image from the camera photographing the left side of the vehicle, and then in the image from the camera photographing the right side of the vehicle.

また、図１３（ａ）に示すように、右側に４５°曲がった道路に差し掛かる手前でユーザが発話を開始し、交差点の右折途中で「・・・、右の方に見える・・・」と発話し、図１３（ｂ）に示すように車両が右側に４５°曲がった道路の走行中に一連の音声認識が終了するような場合、方位の変化量θは、θ＝Ｄ４−Ｄ３＝−４５°として算出される。 Also, as shown in FIG. 13A, the user starts speaking just before reaching the road that is 45 ° to the right, and in the middle of the right turn at the intersection, “... looks to the right ...” When a series of voice recognition ends while the vehicle is traveling on a road that is 45 ° to the right as shown in FIG. 13B, the direction change amount θ is θ = D4−D3 = Calculated as -45 °.

Therefore, as shown in FIG. 13(a), the image area detection process searches for the image of the object in the image from the camera photographing the front of the vehicle, then in the image from the camera photographing the left side of the vehicle, and then in the image from the camera photographing the right side of the vehicle.
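The table lookup used in these examples can be sketched as below. Only the three table entries quoted in the text are included; the rest of the FIG. 5 table is not reproduced here. The bucket boundaries in `bucket` are an assumed reading of “−60° or more” and “−30° or more and less than −60°” as magnitudes of a negative change, and the names `PRIORITY_TABLE`, `bucket`, and `camera_order` are hypothetical.

```python
# (relative direction of the object, change bucket) -> camera search order.
# Only the entries quoted in the description; other cells are unknown here.
PRIORITY_TABLE = {
    ("left", "-60 or more"): ["front", "right", "left"],   # right/left tie
    ("right", "-60 or more"): ["right", "front", "left"],
    ("front", "-30 to -60"): ["front", "left", "right"],
}


def bucket(theta):
    """Classify a direction change in degrees (assumed boundaries)."""
    if theta <= -60:
        return "-60 or more"
    if theta <= -30:
        return "-30 to -60"
    return None   # buckets not quoted in the description


def camera_order(relative_direction, theta):
    """Return the camera search order, or None for an unknown table cell."""
    return PRIORITY_TABLE.get((relative_direction, bucket(theta)))
```

For the FIG. 10 example (object said to be on the left, θ = −90°), `camera_order("left", -90)` yields the front camera first, matching the search order described above.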

As described above, in the present embodiment a fixed duration of moving images of the vehicle periphery is kept in the memory, and the stored images are read out at the end of voice recognition. If the surroundings were photographed only after voice recognition ended, then, for example, when the user asks “What is that oncoming car?”, the oncoming car might already have passed and would not appear in the image taken by the in-vehicle camera 19. Keeping a fixed duration of moving images in the memory and reading them out at the end of voice recognition, as in the present embodiment, ensures that the image of the oncoming car is not missed.
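One way to keep a fixed duration of frames is a bounded queue that discards the oldest frame when a new one arrives. The sketch below uses Python's `collections.deque` with `maxlen`; the class `FrameBuffer`, its frame rate parameter, and the lookup by timestamp are assumptions made for illustration, not details of the embodiment.

```python
from collections import deque


class FrameBuffer:
    """Keeps only the most recent `seconds * fps` frames."""

    def __init__(self, seconds, fps):
        # A deque with maxlen drops the oldest entry automatically.
        self._frames = deque(maxlen=seconds * fps)

    def push(self, timestamp, frame):
        self._frames.append((timestamp, frame))

    def frame_at_or_before(self, timestamp):
        """Return the newest stored frame taken at or before `timestamp`,
        or None if every such frame has already been discarded."""
        best = None
        for ts, frame in self._frames:
            if ts <= timestamp:
                best = frame
        return best
```

When voice recognition ends, the frame stored at the time recognition started can then be read back, provided the buffer duration is longer than the utterance.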

なお、本実施形態におけるナビゲーション装置１は、一定時間分の車両周辺の動画像をメモリに記憶する例を示したが、車両周辺の画像を定期的に撮影した静止画像を随時メモリに記憶するようにしてもよい。 Although the navigation apparatus 1 according to the present embodiment has shown an example in which a moving image around the vehicle for a certain time is stored in the memory, a still image obtained by periodically capturing images around the vehicle is stored in the memory as needed. It may be.

(Third Embodiment)
Next, an information providing system according to a third embodiment will be described. The navigation device 1 of the second embodiment keeps a fixed duration of moving images of the vehicle periphery in the memory and reads out the stored images at the end of voice recognition, whereas the navigation device 1 of the present embodiment captures an image of objects around the vehicle at the start of voice recognition. Parts that are the same as in the second embodiment are given the same reference numerals and their description is omitted; the description below focuses on the differences.

図１４に、本実施形態に係るナビゲーション装置１のコンピュータの処理を示す。 FIG. 14 shows the processing of the computer of the navigation device 1 according to this embodiment.

まず、マイク１２ａから音声信号の入力があるか否かに基づいて音声入力があるか否かを判定する（Ｓ５００）。 First, it is determined whether there is an audio input based on whether there is an audio signal input from the microphone 12a (S500).

マイク１２ａから音声信号の入力がない場合、Ｓ５００において「なし」と判定され、Ｓ５００の処理へ戻る。 If no audio signal is input from the microphone 12a, it is determined as “none” in S500, and the process returns to S500.

When an audio signal is input from the microphone 12a as the user speaks, the determination in S500 is “Yes”, voice recognition of the user's speech is started, and an image of the vehicle periphery photographed by the in-vehicle camera 19 is captured and stored in the memory (S502).

また、音声認識の開始に応じて最新の絶対方位（Ｄ５）を算出し、算出した最新の絶対方位（Ｄ５）をメモリに記憶する（Ｓ５０６）。 Also, the latest absolute direction (D5) is calculated in response to the start of voice recognition, and the calculated latest absolute direction (D5) is stored in the memory (S506).

Next, the direction, features (color, shape), interrogative word, and so on of the object are extracted by voice recognition. Specifically, the direction, features (color, shape), interrogative word, and so on of the object contained in the user's speech are recognized from the words extracted by voice recognition (S508).

When a silent state with no voice input continues for a certain time or longer and the series of voice recognition ends, the latest absolute direction (D6) is calculated and stored in the memory (S510).

次に、発話してから画像を取り込むまでの自車の方位の変化量を算出する。方位の変化量は、図４に示したように、Ｓ５１０において算出した自車の絶対方位（Ｄ６）とＳ５０６において算出した自車の絶対方位（Ｄ５）の差分（Ｄ６−Ｄ５）として算出される（Ｓ５１２）。 Next, the amount of change in the direction of the own vehicle from when the utterance is taken until the image is captured is calculated. As shown in FIG. 4, the direction change amount is calculated as a difference (D6−D5) between the absolute direction (D6) of the own vehicle calculated in S510 and the absolute direction (D5) of the own vehicle calculated in S506. (S512).

以下、図９のＳ４１４〜Ｓ３０４と同様の処理を行い、対象物の詳細情報をユーザ提供する。 Thereafter, the same processing as S414 to S304 in FIG. 9 is performed, and detailed information of the object is provided to the user.

上記したように、本実施形態では、音声認識の開始時に車両周辺の対象物を撮影した画像を取り込むようにしている。第２実施形態では、一定時間分の車両周辺の動画像を記憶するために比較的大容量のメモリを備える必要があり、また、コンピュータに負荷をかけ続けることになるが、本実施形態のように音声認識の開始時に車両周辺の対象物を撮影した静止画像をメモリへ取り込むことで、メモリ容量を少なくし、また、コンピュータへの負荷を低減することができる。 As described above, in this embodiment, an image obtained by capturing an object around the vehicle is captured at the start of voice recognition. In the second embodiment, it is necessary to provide a relatively large-capacity memory in order to store a moving image around the vehicle for a certain period of time, and a load is continuously applied to the computer. In addition, a still image obtained by capturing an object around the vehicle at the start of voice recognition is taken into the memory, so that the memory capacity can be reduced and the load on the computer can be reduced.

(Other Embodiments)
In the embodiments above, the object candidates are displayed side by side on the display 14a in descending order of priority in the process of S212; alternatively, for example, the candidates may be displayed on the display 14a one after another at staggered times in descending order of priority.

Although the embodiments above use a mobile phone as the communication device 3, a communication device that communicates by VICS, DSRC, or the like may be used instead. The communication device 3 may also be built into the navigation device 1.

Although the embodiments above capture the images of the vehicle periphery photographed by the plurality of cameras at the start or end of voice recognition on the basis of the user's utterance, the images may instead be captured partway through voice recognition.

In the embodiments above, the method of calculating the amount of change in direction differs between S112 and S412, so that the same table (shown in FIG. 5) can be used to determine the priority even though the images of the vehicle periphery are captured at different timings, namely at the end and at the start of voice recognition. Alternatively, the calculation method may be unified and separate tables used to determine the priority, one for the case where the images are captured at the end of voice recognition and one for the case where they are captured at the start.

The correspondence between the configuration of the embodiments above and the configuration in the claims is as follows. The processes of S112, S412, and S512 correspond to the direction change amount calculation means; the processes of S106, S400, and S502 correspond to the image capturing means; the processes of S114, S414, and S514 correspond to the priority determining means; the external storage medium of the external storage device 16 corresponds to the storage means; the process of S200 corresponds to the image area detecting means; the process of S212 corresponds to the selection means; the processes of S100, S404, and S506 correspond to the first azimuth calculating means; the processes of S110, S410, and S510 correspond to the second azimuth calculating means; the processes of S108, S402, and S504 correspond to the situation acquisition means; the process of S204 corresponds to the color correction means; the communication device connection device 17 and the communication device 3 correspond to the communication means; the process of S300 corresponds to the transmitting means; the process of S302 corresponds to the acquisition means; and the process of S304 corresponds to the information providing means.

FIG. 1 is a diagram showing a schematic configuration of an information providing system according to an embodiment of the present invention.
FIG. 2 is a chart for explaining the color database, the shape database, and the relative direction database.
FIG. 3 is a flowchart showing the processing of the computer of the navigation device 1 in the first embodiment.
FIG. 4 is an explanatory diagram of the amount of change in direction.
FIG. 5 is a diagram showing the table that defines the priority of the image areas estimated to contain the object.
FIG. 6 is a flowchart of the image area detection process performed by the computer of the navigation device 1.
FIG. 7 is a chart showing an example of the color range conversion table (time zone).
FIG. 8 is a chart showing an example of the color range conversion table (weather).
FIG. 9 is a flowchart showing the processing of the computer of the navigation device 1 in the second embodiment.
FIG. 10 is a diagram for explaining the operation in the second embodiment.
FIG. 11 is a diagram for explaining the operation in the second embodiment.
FIG. 12 is a diagram for explaining the operation in the second embodiment.
FIG. 13 is a diagram for explaining the operation in the second embodiment.
FIG. 14 is a flowchart showing the processing of the computer of the navigation device 1 in the third embodiment.

Explanation of symbols

1 ... navigation device, 2 ... server, 10 ... remote control sensor, 11 ... touch panel,
12 ... voice recognition unit, 12a ... microphone, 13 ... voice synthesis unit, 13a ... speaker,
14 ... display device, 14a ... display, 15 ... position detector, 16 ... external storage device,
17 ... communication device connection device, 18 ... control circuit, 19 ... in-vehicle camera, 20 ... database.

Claims

A vehicle periphery photographing device that photographs a vehicle periphery using a plurality of cameras mounted on the vehicle,
Direction change amount calculation means for calculating the amount of change in the direction of the vehicle at the start and end of the voice recognition function for recognizing the voice spoken by the user;
Image capturing means for capturing images obtained by photographing the periphery of the vehicle with a plurality of cameras at a predetermined timing based on the user's utterance;
The object recognized by the voice recognition based on the amount of change in the direction of the own vehicle, the direction of the object recognized by the voice recognition, and the predetermined timing at which the image is captured by the image capturing means. Storage means for storing a table in which priority levels of images estimated to be included are defined;
Priority determining means for determining, using the table stored in the storage means, the priority of the images captured by the image capturing means in the order in which they are estimated to contain the object recognized by the voice recognition;
Image area detecting means for detecting an image area estimated as an object recognized by the voice recognition from the images captured by the image capturing means;
and selection means for, when a plurality of image areas are detected by the image area detecting means, displaying the plurality of image area candidates on a display unit according to the priority determined by the priority determining means, and having the user select the intended object from the image areas displayed on the display unit.

2. The vehicle periphery photographing apparatus according to claim 1, wherein the predetermined timing is when voice recognition starts or when voice recognition ends.

3. The vehicle periphery photographing apparatus according to claim 1, further comprising:
first direction calculating means for calculating the direction of the own vehicle at the start of the voice recognition; and
second direction calculating means for calculating the direction of the own vehicle at the end of the voice recognition,
wherein the direction change amount calculation means calculates the amount of change in the direction of the own vehicle between the start and the end of the voice recognition based on the direction calculated by the first direction calculating means and the direction calculated by the second direction calculating means.
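
As a minimal sketch, the change amount can be computed from the two headings as a signed difference that accounts for wrap-around at 360 degrees. The function name, the use of degrees, and the clockwise-positive sign convention are assumptions of this example; the source does not specify them.

```python
def heading_change_deg(start_deg: float, end_deg: float) -> float:
    """Signed change from the start heading to the end heading.

    The result is wrapped into [-180, 180), so a turn from 350 degrees to
    10 degrees is reported as +20 rather than -340.
    """
    return (end_deg - start_deg + 180.0) % 360.0 - 180.0
```

Wrapping the difference keeps small turns across north from being misread as near-full rotations.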

4. The vehicle periphery photographing apparatus according to any one of claims 1 to 3, further comprising:
situation acquisition means for acquiring the situation at the time the vehicle periphery is photographed; and
color correction means for correcting the detection range of the color recognized by the voice recognition in accordance with the situation acquired by the situation acquisition means,
wherein the image area detecting means detects, from the images captured by the image capturing means, the image area estimated to be the object recognized by the voice recognition based on the detection range of the color corrected by the color correction means.
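
For illustration only, detection of an image area from a color detection range might look like the following Python sketch. It assumes that the image is given as a grid of hue values in degrees and that the detected area is the bounding box of all matching pixels; the source does not specify the detection algorithm, and the names `in_range` and `detect_area` are hypothetical. The range passed in is taken to be the already corrected one.

```python
from typing import List, Optional, Tuple

HueRange = Tuple[int, int]  # inclusive (low, high) hue in degrees, 0-359


def in_range(hue: int, rng: HueRange) -> bool:
    """Return True when the hue lies inside the detection range."""
    low, high = rng
    if low <= high:
        return low <= hue <= high
    # The range wraps past 359 degrees (for example red spanning 350..10).
    return hue >= low or hue <= high


def detect_area(hues: List[List[int]], rng: HueRange) -> Optional[Tuple[int, int, int, int]]:
    """Return the bounding box (top, left, bottom, right) of matching pixels, or None."""
    rows = [r for r, row in enumerate(hues) if any(in_range(h, rng) for h in row)]
    cols = [c for row in hues for c, h in enumerate(row) if in_range(h, rng)]
    if not rows:
        return None
    return (min(rows), min(cols), max(rows), max(cols))
```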

5. The vehicle periphery photographing apparatus according to claim 4, wherein the situation acquisition means acquires the time of day at which the object around the vehicle is photographed, and
the color correction means corrects the detection range of the color recognized by the voice recognition in accordance with the time of day acquired by the situation acquisition means.

6. The vehicle periphery photographing apparatus according to claim 4, wherein the situation acquisition means acquires the weather at the time the object around the vehicle is photographed, and
the color correction means corrects the detection range of the color recognized by the voice recognition in accordance with the weather acquired by the situation acquisition means.
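
The corrections of claims 5 and 6 could be sketched, purely as an example, by relaxing the lower bounds of the color detection range when the scene is darker or less saturated. The `ColorRange` fields, the situation labels, and all offset values below are hypothetical; the source states only that the range is corrected according to the time of day or the weather.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ColorRange:
    hue_low: int            # degrees, 0-359
    hue_high: int           # degrees, 0-359
    min_saturation: float   # 0.0-1.0
    min_value: float        # 0.0-1.0 (brightness)


# Hypothetical offsets: darker scenes lower the brightness floor,
# overcast or rainy scenes lower the saturation floor.
TIME_VALUE_OFFSET = {"day": 0.0, "dusk": -0.15, "night": -0.30}
WEATHER_SATURATION_OFFSET = {"clear": 0.0, "cloudy": -0.10, "rain": -0.20}


def correct_range(base: ColorRange, time_of_day: str, weather: str) -> ColorRange:
    """Return a copy of the range adjusted for the photographing situation."""
    value = max(0.0, base.min_value + TIME_VALUE_OFFSET.get(time_of_day, 0.0))
    saturation = max(0.0, base.min_saturation + WEATHER_SATURATION_OFFSET.get(weather, 0.0))
    return replace(base, min_saturation=saturation, min_value=value)
```

Unknown situation labels leave the range unchanged, so the detector falls back to the uncorrected range rather than failing.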

7. The vehicle periphery photographing apparatus according to claim 1, further comprising:
communication means for communicating with a server having a search function for searching for detailed information;
transmitting means for transmitting image data of the image area of the object selected by the selection means to the server via the communication means;
acquiring means for acquiring, via the communication means, detailed information on the object retrieved by the server based on the image data transmitted by the transmitting means; and
information providing means for providing the user with the detailed information on the object acquired by the acquiring means.

8. A vehicle periphery photographing method for photographing the periphery of a vehicle using a plurality of cameras mounted on the vehicle, the method comprising:
calculating the amount of change in the direction of the own vehicle between the start and the end of voice recognition that recognizes speech uttered by a user;
capturing, at a predetermined timing based on the user's utterance, images of the vehicle periphery photographed by the plurality of cameras;
determining, using a table that defines priorities of the images estimated to contain the object recognized by the voice recognition based on the amount of change in the direction of the own vehicle, the direction of the object recognized by the voice recognition, and the timing at which the images are captured, the priorities of the captured images in order of the estimated likelihood that each image contains the object recognized by the voice recognition;
detecting, from the captured images, an image area estimated to be the object recognized by the voice recognition; and
when a plurality of image areas are detected, displaying the plurality of image area candidates on a display unit in accordance with the determined priorities and having the user select the intended object from the image areas displayed on the display unit.
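
The final display and selection step can be illustrated with the following hedged Python sketch. It assumes that each detected image area records the camera whose image it came from and that the priorities are given as an ordered list of camera names; the `Candidate` type and both function names are hypothetical and do not appear in the source.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Candidate:
    camera: str                      # camera whose image contains the area
    box: Tuple[int, int, int, int]   # (top, left, bottom, right)


def order_candidates(candidates: Sequence[Candidate],
                     camera_priority: Sequence[str]) -> List[Candidate]:
    """Order candidates so that areas from higher-priority images come first."""
    rank = {camera: i for i, camera in enumerate(camera_priority)}
    # sorted() is stable, so candidates from the same image keep their detection order.
    return sorted(candidates, key=lambda c: rank.get(c.camera, len(rank)))


def select(ordered: Sequence[Candidate], user_choice: int) -> Candidate:
    """Return the candidate the user picked; user_choice is the 1-based number on the display."""
    if not 1 <= user_choice <= len(ordered):
        raise ValueError("choice out of range")
    return ordered[user_choice - 1]
```

Candidates from cameras that are missing from the priority list are placed last instead of being dropped, so the user can still select them.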

9. The vehicle periphery photographing method according to claim 8, wherein the predetermined timing is when voice recognition starts or when voice recognition ends.