JP7349072B2

JP7349072B2 - Voice recognition system for elevators

Info

Publication number: JP7349072B2
Application number: JP2022019655A
Authority: JP
Inventors: 行宏宮川
Original assignee: Fujitec Co Ltd
Current assignee: Fujitec Co Ltd
Priority date: 2022-02-10
Filing date: 2022-02-10
Publication date: 2023-09-22
Anticipated expiration: 2042-02-10
Also published as: JP2023117120A

Description

TECHNICAL FIELD

The present invention relates to a voice recognition system for elevators that recognizes the voices of elevator passengers.

Conventionally, some elevators are equipped with a voice recognition system that recognizes the voices of passengers. For example, the elevator of Patent Document 1 is equipped with a voice recognition system including a voice recognition device that recognizes voice collected by an in-car microphone and recognizes the situation inside the car based on the recognized voice.

Patent Document 1: Japanese Patent Application Publication No. 2011-073819

However, the in-car microphone of such a conventional voice recognition system picks up not only the voice of the passenger targeted for voice recognition but also the voices of passengers who are not targeted and environmental sounds, so the voice recognition device may fail to recognize the voice correctly.

SUMMARY OF THE INVENTION

In view of these circumstances, an object of the present invention is to provide a voice recognition system for elevators that can improve the accuracy of voice recognition.

The voice recognition system for elevators of the present invention includes:
a sound processing means that executes processing to extract voice data of a passenger from sound data collected in the car and to identify the position of the source of the voice data;
an image processing means that executes processing to identify the position of the passenger based on image data captured inside the car; and
a voice recognition means that performs voice recognition on, among the voice data extracted by the sound processing means, the voice data whose source position is the same as the passenger position identified by the image processing means.

In the voice recognition system for elevators configured as above, the voice recognition means performs voice recognition on voice data for which the position of the voice source corresponds to the standing position of a passenger. This suppresses voice recognition being performed on voice data other than the voice data that should be recognized, and thereby improves the accuracy of voice recognition.

In the voice recognition system for elevators of the present invention, the sound processing means may be configured to include a voice feature extraction means that extracts, based on the voice data, voice feature information indicating the characteristics of the passenger who uttered the voice.

This makes information indicating the passenger's characteristics available, which improves both the determination of whether the passenger is a target for voice recognition and the accuracy of voice recognition.

In the voice recognition system for elevators of the present invention, the image processing means may be configured to include a subject feature extraction means that extracts subject feature information indicating the characteristics of the passenger appearing in the image data.

In this case as well, information indicating the passenger's characteristics is available, which improves both the determination of whether the passenger is a target for voice recognition and the accuracy of voice recognition.

The voice recognition system for elevators of the present invention may include a behavior determination means that determines, based on the behavior of a passenger appearing in the image data, whether the passenger should be a target for voice recognition,
and the voice recognition means may be configured to perform voice recognition on the voice data whose source position is the same as the position, indicated by the image data, of the passenger determined by the behavior determination means to be a target for voice recognition.

In this way, the passenger's movements can also be used to determine whether the passenger is a target for voice recognition, which improves the accuracy of that determination.

The voice recognition system for elevators of the present invention may include a word determination means that determines, based on words extracted from the voice data, whether the passenger should be a target for voice recognition,
and the voice recognition means may perform voice recognition on the voice data for which the word determination means has determined that the passenger should be a target for voice recognition.

In this way, the passenger's words can be used to determine whether the passenger is a target for voice recognition, which improves the accuracy of that determination.

As described above, the voice recognition system for elevators of the present invention provides the excellent effect of improving the accuracy of voice recognition.

FIG. 1 is a block diagram schematically showing the configuration of a voice recognition system for elevators according to an embodiment of the present invention.
FIG. 2 is an explanatory diagram of passenger information used in the voice recognition system for elevators according to the embodiment.
FIG. 3 is an explanatory diagram of image data used in the voice recognition system for elevators according to the embodiment.
FIG. 4 is a main flowchart of the voice recognition system for elevators according to the embodiment.
FIG. 5 is a sub-flowchart of the voice recognition system for elevators according to the embodiment, showing the flow of processing for creating voice-related information.
FIG. 6 is a sub-flowchart of the voice recognition system for elevators according to the embodiment, showing the flow of processing in which the image processing means creates subject-related information.
FIG. 7 is a sub-flowchart of the voice recognition system for elevators according to the embodiment, showing the flow of processing for creating passenger information.
FIG. 8 is a sub-flowchart of the voice recognition system for elevators according to the embodiment, showing the flow of processing for determining whether a passenger is a contact person.

DESCRIPTION OF THE PREFERRED EMBODIMENTS

A voice recognition system for elevators (hereinafter referred to as the voice recognition system) according to an embodiment of the present invention will be described below with reference to the accompanying drawings.

The voice recognition system is configured to perform voice recognition on the voices of passengers in the car. This embodiment describes, as an example, a voice recognition system configured to identify, among the passengers in the car, a passenger who is trying to contact the outside (referred to as the contact person in this embodiment) and then recognize that passenger's voice.

The car targeted by the voice recognition system need only be one in which, for example, as shown in FIG. 1, a sound collection device M that collects sound in the car, an imaging device C that images the inside of the car, and an external communication device T (see FIG. 3) for contacting the outside of the car are installed.

The voice recognition system 1 of the present invention includes:
a sound processing means 2 that, based on sound data collected in the car, executes processing to create, for each passenger, voice-related information including that passenger's voice data;
an image processing means 3 that, based on image data captured inside the car, executes processing to create, for each passenger, subject-related information including subject data in which that passenger appears;
a passenger information creation means 4 that creates passenger information by associating the voice-related information and the subject-related information of the same passenger;
a target selection means 5 that selects, based on the passenger information, the passenger to be subjected to voice recognition; and
a voice recognition means 6 that performs voice recognition on the voice data of the passenger selected by the target selection means 5 as the target for voice recognition.

The sound processing means 2 includes: a sound data acquisition means 20 that acquires sound data; a voice creation means 21 that creates voice data for each passenger from the sound data acquired by the sound data acquisition means 20; a voice position deriving means 22 that derives, based on the voice data created by the voice creation means 21, the source of the voice (information indicating the position of the passenger, referred to as voice position information in this embodiment); and a voice feature deriving means 23 that derives, based on the voice data extracted by the voice creation means 21, information indicating the characteristics of the passenger who uttered the voice (referred to as voice feature information in this embodiment).

The sound data acquired by the sound data acquisition means 20 is the sound data collected by the sound collection device M installed in the car. The sound data acquisition means 20 may acquire the sound data directly from the sound collection device M, or may be configured to acquire sound data that was collected by the sound collection device M and then stored in a storage means (that is, it may be configured to acquire the sound data indirectly from the sound collection device M).

The voice creation means 21 is configured to execute a noise removal process that removes noise from the sound data, and a voice extraction process that creates voice data for each passenger from the sound data from which noise has been removed by the noise removal process.

The noise removed from the sound data in the noise removal process is sound that can be collected in the car when no passengers are present, such as the operating sound of the car, announcements in the car, and external environmental sound entering from outside the car.

Further, the noise removal process need only be configured to acquire noise data created in advance and, based on that noise data, remove the noise components (sound components identical or substantially identical to those contained in the noise data) from the sound data.
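As a non-limiting illustration, the removal of noise components based on pre-created noise data could be sketched with spectral subtraction on a single frame. The embodiment does not name a particular algorithm; the choice of spectral subtraction, the naive transform, and the function names `dft`, `idft`, and `subtract_noise` are all assumptions made for this example.

```python
import cmath
import math


def dft(frame):
    """Naive discrete Fourier transform of a real-valued frame."""
    n = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Inverse of dft(); returns the real part of each sample."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
        for t in range(n)
    ]


def subtract_noise(sound_frame, noise_frame):
    """Remove from sound_frame the magnitude present in the pre-created noise_frame.

    Each frequency bin keeps the phase of the collected sound; its magnitude is
    reduced by the noise magnitude and floored at zero (an assumed rule).
    """
    cleaned = []
    for snd, nse in zip(dft(sound_frame), dft(noise_frame)):
        magnitude = max(abs(snd) - abs(nse), 0.0)
        cleaned.append(cmath.rect(magnitude, cmath.phase(snd)))
    return idft(cleaned)
```

In this sketch a frame containing a voice tone plus a hum, together with a noise frame containing only the hum, yields a frame close to the voice tone alone. A practical system would process overlapping windowed frames and use a fast transform.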

The voice extraction process may create voice data by cutting out the portion of the sound data corresponding to a passenger's voice, or by separating the passenger's voice component from the sound data. In other words, the voice extraction process need only be configured to create voice data containing the voice of a single passenger.

The voice position deriving means 22 need only be configured to derive the sound source position information (that is, the voice position information) based on, for example, a plurality of sound data collected by a plurality of sound collection devices M installed in the car. In this case, the voice position deriving means 22 compares the plurality of sound data to derive differences in how fast the sound reaches each device and differences in sound pressure, and derives the position of the voice source based on these differences and on information such as the installation position of each sound collection device M.
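As a non-limiting illustration, one way to combine arrival differences with the installation positions is to compare the measured arrival-time differences with those expected for each candidate area of the car and select the best match. The microphone coordinates, the four-area layout, the speed of sound value, and all names below are assumptions; sound-pressure differences are not used in this sketch.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def expected_delays(source, mics):
    """Arrival-time differences (s) of each microphone relative to the first one."""
    times = [math.dist(source, mic) / SPEED_OF_SOUND for mic in mics]
    return [t - times[0] for t in times[1:]]


def locate_area(measured_delays, mics, area_centers):
    """Return the name of the area whose expected delays best match the measurement."""
    def error(center):
        expected = expected_delays(center, mics)
        return sum((e - m) ** 2 for e, m in zip(expected, measured_delays))

    return min(area_centers, key=lambda name: error(area_centers[name]))


# Hypothetical 2 m x 2 m car with three microphones, divided into four areas
# by the front-rear and left-right directions.
MICS = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0)]
AREAS = {
    "front-left": (0.5, 0.5),
    "front-right": (1.5, 0.5),
    "rear-left": (0.5, 1.5),
    "rear-right": (1.5, 1.5),
}
```

Calling `locate_area(measured, MICS, AREAS)` with delays measured between the microphones returns the area name that can then serve as area-based position information. In practice the delays themselves would first have to be estimated from the recorded signals, for example by cross-correlation.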

The voice position information may be information indicating the position of an area of predetermined extent within the car (for example, one of a plurality of areas set by dividing the car in the front-rear and left-right directions), or information indicating coordinates in the front-rear and left-right directions within the car.

The voice feature deriving means 23 is configured to execute a voice component derivation process that derives the characteristics of the voice component (the passenger's voice component) based on the voice data, and a voice feature derivation process that derives voice feature information indicating the passenger's characteristics based on the voice component extracted in the voice component derivation process.

The voice component derivation process need only be configured to derive the passenger's voice component from the voice data by, for example, cepstrum analysis. The voice components derived in this process are, for example, the loudness, pitch, and timbre of the sound.

The voice feature derivation process need only be configured to derive, based on the voice component extracted in the voice component derivation process, characteristics related to the passenger's appearance, such as gender and age group.

Here, as shown in FIG. 2, in the sound processing means 2 of this embodiment, the sound source position information D11 extracted by the voice position deriving means 22 is associated with the voice data D10, and the voice feature information D12 derived by the voice feature deriving means 23 in the voice feature derivation process is also associated with the voice data D10. The voice-related information D1 therefore includes the voice data D10, the sound source position information D11, and the voice feature information D12.

The image processing means 3 includes: an image data acquisition means 30 that acquires image data; a subject extraction means 31 that extracts subject data for each passenger from the image data acquired by the image data acquisition means 30; a subject position deriving means 32 that derives position information of the passenger (hereinafter referred to as subject position information) based on the subject data extracted by the subject extraction means 31; a behavior deriving means 33 that derives the behavior of the subject (passenger) based on the subject data extracted by the subject extraction means 31; and a subject feature deriving means 34 that derives information indicating the characteristics of the subject (passenger) (referred to as subject feature information in this embodiment) based on the subject data extracted by the subject extraction means 31.

The image data acquired by the image data acquisition means 30 is an image captured by the imaging device C (for example, a camera) installed in the car. The image data acquisition means 30 may acquire the image data directly from the imaging device C, or may be configured to acquire image data that was captured by the imaging device C and then stored in a storage means (that is, it may be configured to acquire the image data indirectly from the imaging device C).

The subject extraction means 31 designates the region of the image data in which a passenger appears. In this embodiment, as shown in FIG. 3, a plurality of partition regions P1 are set for the image data P, and the subject extraction means 31 is configured to designate, among the partition regions P1, each partition region P1 in which a passenger (H1, H2) appears and to use that partition region P1 as subject data.

In FIG. 3, the passenger who is the contact person is labeled "H1", and a passenger who is not the contact person is labeled "H2".

The image data P may be a moving image or still images. When the image data P consists of still images, a plurality of still images that are consecutive in time series may, for example, be treated as one piece of image data.

The subject position deriving means 32 derives the passenger's position information (information indicating the position of the passenger within the car) based on the subject data extracted by the subject extraction means 31.

The subject position deriving means 32 may be configured to derive the passenger position information based on, for example, the position of the subject data (partition region) P1 within the image data P. In this case, each partition region of the image data is associated in advance with the corresponding position in the car, and the subject position deriving means 32 need only be configured to use the position associated with the subject data (partition region) P1 extracted by the subject extraction means 31 as the passenger position information.
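As a non-limiting illustration, the advance association between partition regions and positions in the car can be held in a lookup table. The 2 x 2 grid, the area names, and the function names below are hypothetical; the embodiment does not specify the number or layout of the partition regions.

```python
# Hypothetical layout: the image is split into a 2 x 2 grid of partition regions,
# and each region is associated in advance with an area of the car.
REGION_TO_CAR_AREA = {
    (0, 0): "rear-left",
    (0, 1): "rear-right",
    (1, 0): "front-left",
    (1, 1): "front-right",
}


def region_of(pixel_x, pixel_y, image_width, image_height, rows=2, cols=2):
    """Return the (row, col) partition region that contains the given pixel."""
    col = min(pixel_x * cols // image_width, cols - 1)
    row = min(pixel_y * rows // image_height, rows - 1)
    return (row, col)


def passenger_position(pixel_x, pixel_y, image_width, image_height):
    """Look up the car area associated with the region where the passenger appears."""
    return REGION_TO_CAR_AREA[region_of(pixel_x, pixel_y, image_width, image_height)]
```

Using the same area names for the image side and the sound side makes the later position comparison a simple equality check.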

As shown in FIG. 1, the behavior deriving means 33 is configured to execute: a behavior derivation process that derives the passenger's behavior; a contact person determination process that determines, based on the behavior derived in the behavior derivation process, whether the passenger is a contact person trying to contact the outside of the car; and a contact person information assignment process that, based on the result of the contact person determination process, associates with the subject data either contact person information indicating that the passenger is a contact person trying to contact the outside of the car, or non-contact person information indicating that the passenger is not.

The behavior derivation process of this embodiment is configured to determine whether the passenger exhibits any behavior and, if so, to derive the type of the passenger's behavior and the passenger's orientation.

The behavior derivation process need only be configured to derive, as the type of the passenger's behavior, for example, a speaking motion in which the passenger is recognized as speaking or a non-speaking motion in which the passenger is recognized as not speaking.

The behavior derivation process need only be configured to derive, as the passenger's orientation, for example, whether the passenger is facing the external communication device or facing away from it.

The contact person determination process is configured to determine that the passenger is the contact person if the type of behavior derived in the behavior derivation process is a speaking motion and the passenger is facing the external communication device. It determines that the passenger is not the contact person if the behavior derivation process finds that the passenger exhibits no behavior, if the type of behavior is derived to be a non-speaking motion, or if the passenger's orientation is derived to be away from the external communication device.
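As a non-limiting illustration, the contact person determination reduces to a conjunction of the two derived attributes. The `Behavior` record and its field names are assumptions introduced for this sketch; `None` stands for the case in which no behavior was detected.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Behavior:
    """Result of the behavior derivation process for one passenger (names assumed)."""
    speaking: bool                # True: speaking motion, False: non-speaking motion
    facing_external_device: bool  # True if facing the external communication device T


def is_contact_person(behavior: Optional[Behavior]) -> bool:
    """Contact person determination; None means no behavior was detected."""
    if behavior is None:
        return False
    return behavior.speaking and behavior.facing_external_device
```

The result would then decide whether contact person information or non-contact person information is associated with the subject data.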

As shown in FIG. 2, the contact person information assignment process is configured to associate the contact person information D22 with the subject data D20 when the contact person determination process determines that the passenger is the contact person, and to associate the non-contact person information D23 with the subject data D20 when it determines that the passenger is not the contact person.

The passenger characteristics extracted by the subject feature extraction means are likewise characteristics related to the passenger's appearance, such as gender and age group.

Here, in the image processing means 3 of this embodiment, the subject position information D21 extracted by the subject position deriving means 32 is associated with the subject data D20, the subject feature information D24 derived by the subject feature deriving means 34 is also associated with the subject data D20, and, as described above, the contact person information D22 or the non-contact person information D23 is also associated with the subject data D20. The image-related information D2 (the subject-related information) therefore includes the subject data D20, the subject position information D21, either the contact person information D22 or the non-contact person information D23, and the subject feature information D24.

As shown in FIG. 1, the passenger information creation means 4 is configured to compare the voice position information associated with the voice data with the subject position information associated with the subject data and, when the two indicate the same position, to create passenger information by associating the voice-related information with the subject-related information.

The passenger information creation means 4 may instead be configured to create passenger information by associating the voice data with the subject data when the voice position information associated with the voice data and the subject position information associated with the subject data indicate the same position and, in addition, the passenger characteristics indicated by the voice feature information correspond to those indicated by the subject feature information.
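As a non-limiting illustration, the creation of passenger information can be sketched as pairing records by position, with an optional feature check for the stricter variant. The record types, field names, and the use of area names compared by equality are assumptions; coordinate-based positions would need a distance tolerance instead.

```python
from dataclasses import dataclass


@dataclass
class VoiceInfo:          # voice-related information D1
    voice_data: bytes     # D10
    position: str         # D11
    features: dict        # D12, e.g. {"age_group": "adult"}


@dataclass
class SubjectInfo:        # subject-related information D2
    subject_data: bytes   # D20
    position: str         # D21
    is_contact: bool      # D22 (True) or D23 (False)
    features: dict        # D24


@dataclass
class PassengerInfo:
    voice: VoiceInfo
    subject: SubjectInfo


def create_passenger_info(voices, subjects, check_features=False):
    """Pair each voice with the subject at the same position.

    With check_features=True, a pair is kept only if every feature present
    in both records has the same value (the optional stricter variant).
    """
    result = []
    for voice in voices:
        for subject in subjects:
            if voice.position != subject.position:
                continue
            if check_features:
                shared = voice.features.keys() & subject.features.keys()
                if any(voice.features[k] != subject.features[k] for k in shared):
                    continue
            result.append(PassengerInfo(voice, subject))
    return result
```

Voice records with no subject at the same position produce no passenger information, so such voice data never reaches voice recognition.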

The target selection means 5 is configured to determine, based on the passenger's behavior and the words uttered by the passenger, whether the passenger should be a target for voice recognition.

More specifically, the target selection means 5 includes: a selection means 50 that selects passenger information; a behavior determination means 51 that determines, based on the passenger's behavior information, whether the passenger should be a target for voice recognition (the contact person in this embodiment); and a word determination means 52 that extracts the words uttered by the passenger from the voice data of the passenger information and determines, based on the extracted words, whether the passenger should be a target for voice recognition.

The behavior determination means 51 is configured to determine that the passenger is not a target for voice recognition (not the contact person) when non-contact person information is associated with the subject data.

The word determination means 52 is configured to determine whether the words extracted from the voice data include any of the judgment words set in advance for determining whether a passenger is the contact person. If the extracted words include a judgment word, the passenger is determined to be the contact person; if not, the passenger is determined not to be the contact person. A judgment word is, for example, a word indicating an abnormality of the car.
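As a non-limiting illustration, the word determination is a membership test against the preset judgment words. The word list below is hypothetical; the embodiment states only that such words are set in advance and gives words indicating an abnormality of the car as an example.

```python
# Hypothetical judgment words indicating a car abnormality.
JUDGMENT_WORDS = {"stuck", "stopped", "help", "trapped"}


def is_contact_by_words(extracted_words, judgment_words=JUDGMENT_WORDS):
    """Return True if any word extracted from the voice data is a judgment word."""
    return any(word.lower() in judgment_words for word in extracted_words)
```

The input is assumed to be a list of words already extracted from the voice data; how the words are extracted is outside this sketch.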

When no contact person has been identified by the behavior determination means 51 and the word determination means 52, the voice recognition means 6 performs voice recognition on the voice data included in the passenger information selected by the selection means 50; when a contact person has been identified, it performs voice recognition on the voice data included in the contact person's passenger information.

The configuration of the voice recognition system 1 according to this embodiment is as described above. Next, the operation of the voice recognition system 1 will be described.

As shown in FIG. 4, in the voice recognition system 1, the sound processing means 2 creates voice-related information (S1) and the image processing means 3 creates subject-related information (S2). If passenger information is to be created, that is, if the voice position information and the subject position information match (Yes in S3), the passenger information creation means 4 creates passenger information based on the voice-related information and the subject-related information (S4).

Next, the target selection means 5 determines whether the selected passenger information belongs to the contact person (S5), and if the target selection means 5 determines that the passenger information belongs to the contact person (Yes in S6), the voice recognition means 6 performs voice recognition on the voice data (S7).

After the voice recognition means 6 finishes voice recognition of the voice data, or when passenger information is not created (No in S3), or when the target selection means 505 determines that the passenger information does not belong to the contact person (No in S6), the voice recognition system of this embodiment determines whether to end processing (Yes in S8) or to continue processing (No in S8).
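
One pass through the flow of FIG. 4 (S1 to S7) might be sketched as below. This is an illustrative sketch under stated assumptions, not the implementation of the embodiment: `run_cycle`, the dictionary keys, and the four callables passed in are hypothetical stand-ins for the respective means, and the end-or-continue decision of S8 is left to the caller.

```python
def run_cycle(make_voice_info, make_subject_info, is_contact_person, recognize):
    """Run one pass of S1 to S7 and return the recognition result, or None."""
    voice_info = make_voice_info()        # S1: sound processing means 2
    subject_info = make_subject_info()    # S2: image processing means 3

    # S3: passenger information is created only when the positions match.
    if voice_info["position"] != subject_info["position"]:
        return None                       # No in S3

    passenger_info = {"voice": voice_info, "subject": subject_info}  # S4

    # S5/S6: is this passenger information that of the contact person?
    if not is_contact_person(passenger_info):
        return None                       # No in S6

    return recognize(voice_info["data"])  # S7: voice recognition means 6
```

A caller would invoke `run_cycle` repeatedly and stop when its own end condition (S8) is met.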

Although FIG. 4 shows the processing by the image processing means 3 after the processing by the sound processing means 2, this order is not fixed. As long as both are completed before the passenger information creation means 4 executes its processing, the processing by the sound processing means 2 may be executed after the processing by the image processing means 3, or the two may be executed in parallel.
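
The parallel variant mentioned above could look roughly like this sketch, which uses a thread pool from the Python standard library. The function name `gather_inputs` and the two callables are hypothetical; the only point illustrated is that both results are available before passenger information is created.

```python
from concurrent.futures import ThreadPoolExecutor


def gather_inputs(make_voice_info, make_subject_info):
    """Run sound processing and image processing in parallel and wait for both."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        voice_future = pool.submit(make_voice_info)      # sound processing means 2
        subject_future = pool.submit(make_subject_info)  # image processing means 3
        # result() blocks, so both steps are complete before the function returns.
        return voice_future.result(), subject_future.result()
```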

As shown in FIG. 5, the sound processing means 2 repeats its processing until the sound data acquisition means 20 acquires sound data (No in S10). When the sound data acquisition means 20 acquires sound data (Yes in S10), the voice creation means 21 creates voice data for each passenger from that sound data (S11), the voice position derivation means 22 derives sound source position information based on the created voice data (S12), and the voice feature derivation means 23 derives voice feature information indicating the characteristics of the passenger based on the same voice data (S13).

The derived sound source position information and voice feature information are then associated with the voice data, thereby creating the voice-related information (S14).
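
The association made in S12 to S14 can be pictured as building one record per passenger. The sketch below is only an illustration: the class and field names are hypothetical, the position is assumed to be an index of a partitioned area, and the two derivation steps are passed in as callables because the description does not specify how they are computed.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class VoiceRelatedInfo:
    """Voice-related information D1 (field names are illustrative)."""
    voice_data: bytes      # voice data D10
    source_position: int   # sound source position information D11 (assumed area index)
    voice_features: dict   # voice feature information D12


def build_voice_related_info(
    voice_data: bytes,
    derive_position: Callable[[bytes], int],
    derive_features: Callable[[bytes], dict],
) -> VoiceRelatedInfo:
    position = derive_position(voice_data)   # S12
    features = derive_features(voice_data)   # S13
    return VoiceRelatedInfo(voice_data, position, features)  # S14
```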

As shown in FIG. 6, the image processing means 3 repeats its processing until the image data acquisition means 30 acquires image data (No in S20). When the image data acquisition means 30 acquires image data (Yes in S20), the subject extraction means 31 extracts subject data for each passenger from that image data (S21). Based on the extracted subject data, the subject position derivation means 32 derives subject position information (S22), the behavior derivation means 33 derives the behavior of the subject (passenger) and, based on that behavior, associates either contact person information or non-contact person information with the subject data (S23), and the subject feature derivation means 34 derives subject feature information of the subject (passenger) (S24).

Image-related information is then created by associating the subject position information and the subject feature information with the subject data with which either the contact person information or the non-contact person information has been associated (S25).

As shown in FIG. 7, the passenger information creation means 4 compares the voice position information associated with the voice data with the subject position information associated with the subject data (S30), and repeats this comparison while the two do not indicate the same position (No in S31). When the voice position information and the subject position information indicate the same position (Yes in S31), it creates passenger information by associating the voice-related information with the subject-related information (S32).
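
The position comparison of S30 to S32 can be sketched as a join on position. This is a simplified illustration that assumes positions are discrete values (for example, indices of partitioned areas) and that at most one subject occupies each position; the function name and dictionary keys are hypothetical.

```python
def create_passenger_info(voice_infos, subject_infos):
    """Pair each voice record with the subject record at the same position."""
    subject_by_position = {s["position"]: s for s in subject_infos}
    passengers = []
    for voice in voice_infos:
        subject = subject_by_position.get(voice["position"])  # S30/S31
        if subject is not None:                               # Yes in S31
            passengers.append({"voice": voice, "subject": subject})  # S32
    return passengers
```

Voice records with no subject at the same position, such as ambient noise, yield no passenger information and therefore never reach voice recognition.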

As shown in FIG. 8, in the target selection means 505, the selection means 50 selects passenger information (S50). If the behavior determination means 51 determines that the passenger is not the contact person (No in S51) and the word determination means 52 also determines that the passenger is not the contact person (No in S52), a determination result indicating that the passenger is not the contact person is output (S53).

On the other hand, if the behavior determination means 51 determines that the passenger is the contact person (Yes in S51), or the word determination means 52 determines that the passenger is the contact person (Yes in S52), a determination result indicating that the passenger is the contact person is output (S54).
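
The branching of FIG. 8 amounts to a logical OR of the two determinations, evaluated in order. A minimal sketch, assuming each determination is supplied as a callable that returns a boolean (the function and parameter names are hypothetical):

```python
def judge_contact_person(passenger_info, judged_by_behavior, judged_by_words):
    """Return True for 'contact person' (S54), False for 'not contact person' (S53)."""
    if judged_by_behavior(passenger_info):  # Yes in S51
        return True                         # S54
    if judged_by_words(passenger_info):     # Yes in S52
        return True                         # S54
    return False                            # No in S51 and No in S52: S53
```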

Then, as shown in FIG. 4, when a determination result indicating that the passenger is the contact person has been output (Yes in S6), voice recognition is performed by the voice recognition means 6.

As described above, according to the voice recognition system 1 of this embodiment, the voice recognition means performs voice recognition on voice data for which the position of the voice source corresponds to the standing position of a passenger. This reduces the chance of performing voice recognition on voice data other than the voice data that should be recognized.

In this way, the voice recognition system 1 of this embodiment improves the accuracy of voice recognition by making it possible to select the target of voice recognition appropriately.

In addition, information indicating the characteristics of the passenger who produced the voice is available, such as the voice feature information included in the voice-related information and the subject feature information included in the subject-related information. Using this information can improve both the determination of whether a passenger should be a target of voice recognition and the accuracy of the voice recognition itself.

In particular, the voice feature information and the subject feature information of this embodiment are features related to the passenger's appearance, so using them makes it easier to improve both the determination of whether a passenger should be a target of voice recognition and the accuracy of voice recognition.

As described above, the passenger information creation means 4 may be configured to create passenger information by associating the voice data with the subject data when the voice position information associated with the voice data and the subject position information associated with the subject data indicate the same position and, in addition, the passenger characteristics indicated by the voice feature information correspond to those indicated by the subject feature information. In that case, errors in identifying the contact person can be reduced.
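
The stricter association condition (same position and corresponding features) might be sketched as follows. The description does not say how the correspondence of features is evaluated, so the age-group comparison here is purely a hypothetical example, as are the function names and dictionary keys.

```python
def same_age_group(voice_features, subject_features):
    """Hypothetical correspondence rule: both estimates name the same age group."""
    age_group = voice_features.get("age_group")
    return age_group is not None and age_group == subject_features.get("age_group")


def should_link(voice_info, subject_info, features_correspond=same_age_group):
    """Link voice and subject only if position and features both agree."""
    same_position = voice_info["position"] == subject_info["position"]
    return same_position and features_correspond(
        voice_info["features"], subject_info["features"]
    )
```

With this rule, a voice estimated as an adult is not linked to a child standing at the same position, which is the kind of misidentification the stricter condition is meant to suppress.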

Furthermore, the voice recognition system 1 of this embodiment determines whether a passenger is the contact person based not only on the correspondence between the position of the voice source (the voice position information of the voice-related information) and the position of the passenger (the subject position information of the subject-related information) but also on the passenger's behavior. This improves the accuracy of selecting the target of voice recognition.

The voice recognition system 1 of this embodiment also determines whether a passenger is the contact person based on the words uttered by the passenger, so the target of voice recognition can be selected flexibly.

The voice recognition system for elevators according to the present invention is not limited to the above embodiment, and various modifications can of course be made without departing from the gist of the present invention.

Although not specifically mentioned in the above embodiment, the external communication device T is, for example, an intercom or a portable information terminal (for example, a smartphone).

The voice position derivation means 22 of the above embodiment is configured to use a plurality of sound collection devices M, but it may instead be configured to use a single sound collection device M. Using a plurality of sound collection devices M, however, increases the accuracy of deriving the position of the voice source.

Although not specifically mentioned in the above embodiment, when voice recognition by the voice recognition means 6 is performed repeatedly, the series of processes for identifying the contact person (by the sound processing means 2, the image processing means 3, the passenger information creation means 4, and the target selection means 5) may be performed again before each voice recognition. Alternatively, information already derived when the contact person was identified (for example, voice feature information or subject feature information) may be used so that voice recognition by the voice recognition means 6 is performed without repeating that series of processes.

The above embodiment describes the voice recognition system 1 using, as an example, a configuration that identifies the contact person among the passengers in the car and then recognizes that person's voice, but the system is not limited to this configuration. For example, the voice recognition system 1 may be configured to identify a type of passenger other than the contact person and then recognize that passenger's voice.

Although not specifically mentioned in the above embodiment, the voice recognition system 1 may include, for example, a storage device for storing information and a control microcomputer, both installed in the car. In this case, the storage device stores image data of the inside of the car, information indicating the judgment words (keywords) used by the word determination means 52, and the like, and the processing of the sound processing means 2, the image processing means 3, the passenger information creation means 4, the target selection means 5, and the voice recognition means 6 is executed by the control microcomputer. The information stored in the storage device may be in a database format, for example.

The above embodiment describes, as an example, the sound collection device M being constituted by the external communication device T. The external communication device T may be any device that, like an intercom, includes a sound collection unit that collects sound in the car and an output unit that outputs sound into the car (more specifically, announcements to passengers, voice for dialogue with passengers, and the like).

The sound collection device M may also be constituted by a device other than the external communication device T, for example a microphone installed in the car. In this case, if a speaker is installed in the car and used in place of the output unit of the external communication device T, the external communication device T becomes unnecessary and the sound collection function is not duplicated.

Although not specifically mentioned in the above embodiment, the voice recognition system 1 may be configured to determine the situation in the car based on the sound data collected in the car and the image data captured in the car, and to take a necessary action according to the determination result (for example, controlling the operation of the elevator or asking the passengers a question). This makes it possible to decide correctly what action to take based on the determined situation in the car.

DESCRIPTION OF SYMBOLS 1... voice recognition system, 2... sound processing means, 3... image processing means, 4... passenger information creation means, 5... target selection means, 6... voice recognition means, 20... sound data acquisition means, 21... voice creation means, 22... voice position derivation means, 23... voice feature derivation means, 30... image data acquisition means, 31... subject extraction means, 32... subject position derivation means, 33... behavior derivation means, 34... subject feature derivation means, 50... selection means, 51... behavior determination means, 52... word determination means, 505... target selection means, C... imaging device, D1... voice-related information, D10... voice data, D11... sound source position information, D12... voice feature information, D2... image-related information, D20... subject data, D21... subject position information, D22... contact person information, D23... non-contact person information, D24... subject feature information, D4... subject feature information, M... sound collection device, P... image data, P1... partitioned area, T... external communication device

Claims

A voice recognition system for an elevator, comprising:
sound processing means that extracts voice data of a passenger from sound data collected by a sound collection device installed in a car, and executes processing to identify the position of the source of the voice data, the sound collection device being constituted by an external communication device that includes a sound collection unit that collects sound in the car and an output unit that outputs sound for dialogue with passengers in the car;
image processing means that executes processing to identify the position of the passenger based on image data captured inside the car; and
voice recognition means that performs voice recognition on voice data that, among the voice data extracted by the sound processing means, has a source position that is the same as the position of the passenger identified by the image processing means and was uttered by a contact person attempting to contact the outside.

The voice recognition system for an elevator according to claim 1,
wherein the sound processing means includes voice feature extraction means that extracts, based on the voice data, voice feature information that is associated with the voice data and indicates the characteristics of the passenger who is the source of the voice.

The voice recognition system for an elevator according to claim 1 or 2,
wherein the image processing means includes subject feature extraction means that extracts subject feature information that is associated with subject data and indicates the characteristics of the passenger appearing in the image data.

The voice recognition system for an elevator according to any one of claims 1 to 3, further comprising behavior determination means that determines whether the passenger is a target of voice recognition based on the behavior of the passenger appearing in the image data,
wherein the voice recognition means performs voice recognition on the voice data whose source position is the same as the position, indicated by the image data, of the passenger determined by the behavior determination means to be a target of voice recognition.

The voice recognition system for an elevator according to any one of claims 1 to 4, further comprising word determination means that determines whether the passenger is a target of voice recognition based on words extracted from the voice data,
wherein the voice recognition means performs voice recognition on the voice data of the passenger determined by the word determination means to be a target of voice recognition.