JP5252119B2

JP5252119B2 - Elevator voice call registration device

Info

Publication number: JP5252119B2
Application number: JP2012504246A
Authority: JP
Inventors: 絢子永田 (Ayako Nagata)
Original assignee: Mitsubishi Electric Corp
Current assignee: Mitsubishi Electric Corp
Priority date: 2010-03-12
Filing date: 2010-03-12
Publication date: 2013-07-31
Anticipated expiration: 2030-03-12
Also published as: CN102762477B; JPWO2011111221A1; CN102762477A; WO2011111221A1

Description

The present invention relates to an elevator voice call registration device that registers elevator calls by voice input.

As a voice call registration device for registering elevator calls by voice input, a device has been proposed that includes a plurality of voice recognition processing units, each performing voice recognition by a different recognition method. This device increases recognition accuracy by evaluating the recognition results of the plurality of processing units together, and changes its response to the user according to that accuracy. This provides the user with a comfortable operating environment (see, for example, Patent Document 1).

However, with the device of Patent Document 1, voice recognition is performed by the same methods for all users. Speech that does not match any of the recognition methods provided in the voice recognition processing units therefore cannot be recognized, and a user who speaks in such a way can never register an elevator call, such as a destination call, by voice input. Moreover, the device of Patent Document 1 cannot register an elevator call unless the speech is recognized by a plurality of voice recognition processing units at the same time. In other words, the device of Patent Document 1 uniformly raises the threshold for confirming an elevator call registration for all users, which makes it inconvenient to use.

In contrast, a voice call registration device has been proposed that accepts only the voices of users registered in advance, which reduces voice misrecognition (see, for example, Patent Document 2). Devices have also been proposed that accumulate speech analysis results and recognize speaker attributes based on the accumulated results. These devices improve the accuracy of identifying speaker attributes and further reduce misrecognition (see, for example, Patent Documents 3 and 4).

However, applying the devices of Patent Documents 2 to 4 to a voice call registration device requires limiting users to specific speakers, which narrows their range of use. An elevator voice call registration device, on the other hand, must recognize the voices of an unspecified number of users. It is therefore difficult to apply the devices of Patent Documents 2 to 4 to a voice call registration device.

Devices have also been proposed that include a plurality of speech recognition dictionaries (recognition words and acoustic models) tailored to environmental attributes, expressed by the intended use and the ambient noise, and to personal attributes, expressed by the speaker's gender and age. Given the actual environmental and personal attributes, such a device can select an appropriate speech recognition dictionary, and thus an acoustic model that matches the speaker's characteristics, without limiting the speakers (see, for example, Patent Document 5).

However, applying the device of Patent Document 5 to a voice call registration device is cumbersome, because the actual environmental attributes and user attributes must be supplied every time the device is used.

A display device has also been proposed that determines the attributes of users (adult man, adult woman, child, and so on) and their number from information input by a camera device installed in the elevator, and displays information suited to the usage situation. If this discrimination method were applied to acoustic model selection, an acoustic model likely to suit the user could easily be selected without limiting the users (see, for example, Patent Document 6).

Patent Document 1: Japanese Patent No. 3082618
Patent Document 2: Japanese Patent No. 2557939
Patent Document 3: Japanese Unexamined Patent Publication No. H10-240287
Patent Document 4: Japanese Translation of PCT International Application Publication No. 2003-524805
Patent Document 5: Japanese Unexamined Patent Publication No. 2002-229584
Patent Document 6: Japanese Unexamined Patent Publication No. 2007-261722

However, the discrimination method of Patent Document 6 is not always accurate, so an acoustic model that does not match the user's characteristics may be selected.

The present invention has been made to solve the above problems. Its object is to provide an elevator voice call registration device that, by a simple method and without limiting users, increases the likelihood of selecting an acoustic model that matches the user's characteristics when an elevator call is registered by voice input.

An elevator voice call registration device according to the present invention includes: a voice input unit that captures speech input to a voice input device provided in an elevator car or at a landing; an acoustic model storage unit that stores a plurality of acoustic models having mutually different acoustic characteristics; a user information extraction unit that extracts user information on the characteristics of a user in the car or at the landing where the voice input device is provided; an elevator information extraction unit that extracts elevator information on the state of the elevator, including the position of the car or landing where the voice input device is provided; an acoustic model selection unit that selects, from among the plurality of acoustic models and based on the user information and the elevator information, the acoustic model used to recognize the elevator call from the input speech; and a user information storage unit that, each time the user information extraction unit extracts user information, stores the extracted user information in association with the elevator information, thereby accumulating user tendency information on the tendencies of user characteristics. The acoustic model selection unit selects, based on the user tendency information, the acoustic model used to recognize the elevator call when the elevator is in the state corresponding to the elevator information.

According to the present invention, the likelihood of selecting an acoustic model that matches the user's characteristics when an elevator call is registered by voice input can be increased by a simple method, without limiting users.

BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a configuration diagram of the elevator voice call registration device and the elevator control unit according to Embodiment 1 of the present invention.
FIG. 2 is a flowchart illustrating the operation when the elevator voice call registration device according to Embodiment 1 of the present invention registers an elevator call.
FIG. 3 is a flowchart illustrating the operation when the elevator voice call registration device according to Embodiment 1 of the present invention selects an acoustic model.
FIG. 4 is a configuration diagram of the elevator voice call registration device and the elevator control unit according to Embodiment 2 of the present invention.
FIG. 5 is a configuration diagram of the elevator voice call registration device and the elevator control unit according to Embodiment 3 of the present invention.

Embodiments of the invention are described below with reference to the accompanying drawings. In each figure, identical or corresponding parts are given the same reference signs, and duplicate descriptions are simplified or omitted as appropriate.

Embodiment 1.
Generally, an elevator is installed in a structure such as a building. Call registration devices are provided in the elevator car and at the landings, and users register elevator calls with them. The elevator car travels up and down in response to the registered calls.

As one type of call registration device, a voice call registration device that registers elevator calls by voice input has been proposed. It recognizes speech uttered by a user in the car or at a landing and registers the corresponding elevator call. With such a device, even a user whose hands are full can easily register an elevator call.

In such a voice call registration device, if the user's speech is misrecognized, the elevator call the user wants cannot be registered. The voice call registration device of this embodiment is therefore designed to improve speech recognition accuracy. It is described in detail below.

FIG. 1 is a configuration diagram of the elevator voice call registration device and the elevator control unit according to Embodiment 1 of the present invention.
The voice call registration device of FIG. 1 is provided in the elevator car or at a landing. It includes a voice input unit 1, an A/D conversion unit 2, a speech segmentation unit 3, an acoustic analysis unit 4, an information input unit 5, a user information extraction unit 6, a building information storage unit 7, an acoustic model selection unit 8, a recognition dictionary 9, a plurality of acoustic models 10, and a speech recognition unit 11.

The voice input unit 1 captures speech input to a voice input device (not shown), such as a microphone, provided in the elevator car or at the landing. The A/D conversion unit 2 converts the input speech captured by the voice input unit 1 into digital data. The speech segmentation unit 3 detects silent intervals in the digital data passed from the A/D conversion unit 2 and, based on this silent-interval information, extracts each interval bounded by silent intervals as an utterance interval to be recognized. The acoustic analysis unit 4 converts the utterance interval extracted by the speech segmentation unit 3 into feature data for speech recognition by computation such as a Fourier transform.
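The work of units 3 and 4 can be illustrated with a minimal sketch. It assumes a simple frame-energy threshold as the silence detector and uses a naive discrete Fourier transform in place of an FFT; the frame length, the threshold, and all function names are hypothetical and are not specified in the patent.

```python
import cmath
import math


def frame_energies(samples, frame_len):
    """Mean squared amplitude of each non-overlapping frame."""
    return [
        sum(x * x for x in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def cut_utterances(samples, frame_len=4, threshold=0.01):
    """Return (start, end) sample ranges of non-silent runs.

    A frame counts as silent when its energy is at or below the
    (hypothetical) threshold; each run of non-silent frames bounded by
    silence becomes one utterance interval.
    """
    segments, start = [], None
    energies = frame_energies(samples, frame_len)
    for i, energy in enumerate(energies):
        if energy > threshold and start is None:
            start = i * frame_len
        elif energy <= threshold and start is not None:
            segments.append((start, i * frame_len))
            start = None
    if start is not None:  # utterance runs to the end of the buffer
        segments.append((start, len(energies) * frame_len))
    return segments


def magnitude_spectrum(frame):
    """Naive DFT magnitudes for bins 0..N/2 (an FFT would be used in practice)."""
    n = len(frame)
    return [
        abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n // 2 + 1)
    ]
```

For example, `cut_utterances([0.0] * 8 + [1.0, -1.0] * 4 + [0.0] * 8)` returns `[(8, 16)]`, and `magnitude_spectrum([1.0, -1.0, 1.0, -1.0])` puts all of its energy in the highest bin.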

The information input unit 5 acquires information on user characteristics detected by equipment normally installed in an elevator, such as sensor devices, weighing devices, and camera devices. The user information extraction unit 6 extracts, from the information input to the information input unit 5, user information on the user's characteristics, such as the user's height and build and the speed at which the user boards the car. The building information storage unit 7 stores building information. This building information consists of information on the characteristics of each floor of the structure in which the elevator is installed, such as information on the tenants occupying each floor. That is, the building information storage unit 7 functions as a structure information storage unit that stores structure information.

The acoustic model selection unit 8 extracts user tendency information, associated with the building information, on the tendencies of user characteristics. This user tendency information may be stored in the acoustic model selection unit 8 or in another storage unit such as the building information storage unit 7. The acoustic model selection unit 8 also estimates the user attribute based on the user information extracted by the user information extraction unit 6, the user tendency information associated with the building information, and so on.

The types of user attribute can be defined in various ways, taking into account what the sensor devices, weighing devices, camera devices, and so on can detect about users and how accurately. For example, user attributes may be defined to distinguish adults from children, to distinguish age groups such as teens and twenties, to distinguish gender, or to distinguish combinations of conditions such as age group and gender. The acoustic model selection unit 8 also allows the weight given to each piece of information, such as user information and building information, to be set as appropriate when estimating the user attribute.
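The weighted estimation described above can be sketched as a weighted sum of per-attribute scores. This is a minimal illustration that assumes the user information and the first user tendency information have each already been expressed as a score per user attribute; the floor-to-tenant table, the tendency values, the attribute labels, and the weights are all hypothetical.

```python
# Hypothetical building information: tenant category on each floor.
BUILDING_INFO = {1: "lobby", 2: "pediatric_clinic", 3: "office"}

# Hypothetical user tendency information per tenant category.
FIRST_TENDENCY = {
    "lobby": {"adult": 0.7, "child": 0.3},
    "pediatric_clinic": {"adult": 0.4, "child": 0.6},
    "office": {"adult": 0.95, "child": 0.05},
}


def first_user_tendency(floor):
    """Look up the tendency scores for the tenant on the given floor."""
    return FIRST_TENDENCY[BUILDING_INFO[floor]]


def estimate_attribute(sources, weights):
    """Return the attribute with the largest weighted sum of scores.

    sources: name -> {attribute: score}; weights: name -> weight.
    Sources missing from `weights` contribute nothing.
    """
    totals = {}
    for name, scores in sources.items():
        weight = weights.get(name, 0.0)
        for attribute, score in scores.items():
            totals[attribute] = totals.get(attribute, 0.0) + weight * score
    return max(totals, key=totals.get)
```

With a camera-based user score of `{"adult": 0.45, "child": 0.55}` at a landing on floor 3, equal weights yield `"adult"` because the office tenant tips the balance, while raising the user-information weight to 10 yields `"child"`. Exposing the weights as parameters mirrors the adjustable weighting described in the text.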

The recognition dictionary 9 stores the words to be recognized, such as "Ikkai" (first floor) and "Main Floor". The plurality of acoustic models 10 are stored in respective acoustic model storage devices (not shown). Each acoustic model 10 consists of acoustic data containing a complete set of feature data for each phoneme. The acoustic data correspond to user attributes estimated by the acoustic model selection unit 8, such as age group and gender, and have mutually different acoustic characteristics. Each acoustic model 10 is tagged in advance to indicate which user attribute it corresponds to.

The speech recognition unit 11 searches the plurality of acoustic models 10 for the tag corresponding to the user attribute estimated by the acoustic model selection unit 8, and selects the acoustic model 10 with that tag as the model whose characteristics are closest to the estimated user attribute. In effect, the speech recognition unit 11 thus retrieves the acoustic model 10 chosen by the acoustic model selection unit 8 from among the plurality of acoustic models 10. The speech recognition unit 11 then uses the selected acoustic model 10 to recognize, in the speech input to the voice input unit 1, the words described in the recognition dictionary 9.
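Tag-based retrieval followed by matching against the recognition dictionary can be sketched as follows. Real acoustic models hold phoneme-level statistics (for example, hidden Markov models); here, purely as a hypothetical stand-in, each model stores one feature-vector template per dictionary word, and recognition picks the word whose template is nearest in Euclidean distance. The class, tags, and vectors are illustrative assumptions.

```python
import math
from dataclasses import dataclass


@dataclass
class AcousticModel:
    tag: str         # user attribute the model was built for
    templates: dict  # dictionary word -> feature-vector template (toy stand-in)


def find_model(models, attribute):
    """Return the model whose tag matches the estimated user attribute."""
    for model in models:
        if model.tag == attribute:
            return model
    raise LookupError(f"no acoustic model tagged {attribute!r}")


def recognize(features, model, dictionary):
    """Return (word, distance) for the dictionary word nearest to the features."""
    best = min(dictionary, key=lambda w: math.dist(features, model.templates[w]))
    return best, math.dist(features, model.templates[best])
```

For a child's utterance with features `(1.8, 0.6)`, the model tagged `"child"` with templates `{"ikkai": (2.0, 0.5), "nikai": (0.5, 2.0)}` matches "ikkai" at a distance of about 0.22, whereas an adult-tagged model would match the same word less closely.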

FIG. 1 also shows an elevator control unit 12, which includes a call registration unit 13. The call registration unit 13 stores in advance the words of the recognition dictionary 9 in association with the floors of the building in which the elevator is installed. For example, "Ikkai" in the recognition dictionary 9 is, literally, associated with the first floor, and "Main Floor" is also associated with the first floor. When the call registration unit 13 identifies the floor associated with the word recognized by the speech recognition unit 11, it registers a destination call for that floor as the elevator call.
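The association held by the call registration unit 13 amounts to a lookup table from recognized words to floors. A minimal sketch, assuming lower-case romanized words and a set of registered destination floors; the "nikai" entry and the function name are hypothetical.

```python
# Words from the recognition dictionary mapped to floors. "ikkai" and
# "main floor" both map to floor 1 as in the description; "nikai" is a
# hypothetical extra entry.
WORD_TO_FLOOR = {"ikkai": 1, "main floor": 1, "nikai": 2}


def register_destination_call(word, registered_calls):
    """Register a destination call for the floor linked to `word`.

    Returns the floor number, or None when the word has no associated
    floor, in which case nothing is registered.
    """
    floor = WORD_TO_FLOOR.get(word.lower())
    if floor is not None:
        registered_calls.add(floor)
    return floor
```

Because several words can point to the same floor, the table stays a plain many-to-one mapping, and an unknown word simply leaves the registered calls unchanged.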

The elevator control unit 12 also includes an elevator information management unit 14, which detects and manages various elevator states. For example, it manages elevator information such as the position of the car or landing where the voice input device is provided, the travel direction of the car, and the door open/closed state. In particular, the elevator information management unit 14 reliably detects and manages the current position (current floor) of the car, which changes from moment to moment.

In this embodiment, the acoustic model selection unit 8 also functions as an elevator information extraction unit that extracts elevator information from the elevator information management unit 14. The acoustic model selection unit 8 then extracts user tendency information, associated with each item of elevator information, on the tendencies of user characteristics. This user tendency information is obtained by surveying elevator users over a certain period, and may be stored in the acoustic model selection unit 8 or in another storage unit.
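User tendency information tied to elevator information can be pictured as a table keyed by elevator state. The sketch below assumes that the relevant elevator information is the car's current floor and travel direction, and that a survey produced attribute counts for each such state; the record type, keys, and counts are hypothetical. Counts are normalized into shares so that they can be combined with other scores.

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen makes instances hashable dictionary keys
class ElevatorInfo:
    floor: int      # current floor of the car (or floor of the landing)
    direction: str  # "up", "down", or "idle"


# Hypothetical survey counts of user attributes observed in each state.
SURVEY_COUNTS = {
    ElevatorInfo(1, "up"): {"adult": 80, "child": 20},
    ElevatorInfo(5, "down"): {"adult": 30, "child": 70},
}


def second_user_tendency(info):
    """Return attribute shares for the current state, or {} if unsurveyed."""
    counts = SURVEY_COUNTS.get(info, {})
    total = sum(counts.values())
    return {attr: n / total for attr, n in counts.items()} if total else {}
```

Returning an empty mapping for an unsurveyed state lets a caller fall back on the other information sources rather than failing.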

The acoustic model selection unit 8 treats the user tendency information extracted from the building information as first user tendency information, and the user tendency information extracted from the elevator information as second user tendency information. It estimates the user attribute taking into account not only the user information and the first user tendency information but also the second user tendency information.

Next, the operation of the voice call registration device of this embodiment when registering an elevator call is described with reference to FIGS. 2 and 3.
FIG. 2 is a flowchart illustrating the operation when the elevator voice call registration device according to Embodiment 1 of the present invention registers an elevator call. FIG. 3 is a flowchart illustrating the operation when the elevator voice call registration device according to Embodiment 1 of the present invention selects an acoustic model.

First, the outline of the procedure for registering an elevator call is described with reference to FIG. 2.
In step S1, when speech is input to the voice input device in the car or at the landing, the voice input unit 1 captures it, and the process proceeds to step S2. In step S2, the A/D conversion unit 2 converts the speech into digital data, and the process proceeds to step S3.

In step S3, the speech segmentation unit 3 detects and extracts the utterance interval in the digital data, and the process proceeds to step S4. In step S4, the acoustic analysis unit 4 performs acoustic analysis, converting the utterance interval into feature data for speech recognition, and the process proceeds to step S5. In step S5, the acoustic model selection unit 8 effectively selects an acoustic model 10, and the process proceeds to step S6.

In step S6, the speech recognition unit 11 compares the feature data produced by the acoustic analysis unit 4 with the acoustic data of the acoustic model 10 selected by the acoustic model selection unit 8, thereby recognizing the input speech captured by the voice input unit 1, and the process proceeds to step S7. In step S7, the speech recognition unit 11 outputs the recognition result to the call registration unit 13 of the elevator control unit 12, and the process proceeds to step S8.

In step S8, the call registration unit 13 determines whether it has identified the floor associated with the recognized word. If no such floor is identified, the operation ends without registering a destination call. If the floor is identified, the process proceeds to step S9, in which the call registration unit 13 registers a destination call for that floor, and the operation ends.
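The flow of FIG. 2 can be sketched as a chain of stages. In this minimal illustration each stage is passed in as a function, so all signatures are hypothetical; the point is the control flow, in particular that nothing is registered when step S8 finds no floor for the recognized word.

```python
def register_call_by_voice(raw_audio, digitize, cut, analyze, select_model,
                           recognize, word_to_floor, registered_calls):
    """Run steps S2-S9 on audio captured in step S1; return the floor or None."""
    samples = digitize(raw_audio)       # S2: A/D conversion
    utterance = cut(samples)            # S3: extract the utterance interval
    features = analyze(utterance)       # S4: acoustic analysis
    model = select_model()              # S5: acoustic model selection
    word = recognize(features, model)   # S6-S7: recognize and output the word
    floor = word_to_floor.get(word)     # S8: is there an associated floor?
    if floor is None:
        return None                     # no destination call is registered
    registered_calls.add(floor)         # S9: register the destination call
    return floor
```

Injecting the stages keeps the sketch independent of any particular signal-processing or recognition library.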

In step S5 (FIG. 2), as described above, the acoustic model 10 used to recognize the elevator call from the input speech is selected taking into account not only the user information and the first user tendency information but also the second user tendency information. The selection procedure is described in detail below with reference to FIG. 3.

First, in step S11, the acoustic model selection unit 8 refers to the building information in the building information storage unit 7 and extracts the first user tendency information associated with it. The process then proceeds to step S12, in which the acoustic model selection unit 8 refers to the elevator information in the elevator information management unit 14 and extracts the second user tendency information associated with the current elevator information.

The process then proceeds to step S13, in which the user information extraction unit 6 extracts user information; the acoustic model selection unit 8 refers to this user information, and the process proceeds to step S14. In step S14, the acoustic model selection unit 8 estimates the user attribute taking into account not only the user information and the first user tendency information but also the second user tendency information, and the process proceeds to step S15. In step S15, the acoustic model selection unit 8 effectively selects the acoustic model 10 that matches the estimated user attribute, and the operation ends.

According to Embodiment 1 described above, the acoustic model 10 used to recognize an elevator call is selected taking into account not only user information and building information but also elevator information. This increases, by a simple method and without limiting users, the likelihood of selecting an acoustic model 10 that matches the user's characteristics when an elevator call is registered by voice input.

Selecting the acoustic model 10 in this way improves recognition accuracy for the user's speech while making the device less responsive to idle chatter from speakers whose characteristics differ from the user's and to audio played by announcement equipment. This increases the likelihood of preventing misrecognition of the user's speech and the erroneous call registration it would cause.

In addition, the user information extraction unit 6 extracts user information from user characteristics detected by equipment commonly installed in elevators, such as sensor devices, weighing devices, and camera devices. The likelihood of selecting an acoustic model 10 that matches the user's characteristics can therefore be increased by a simple method, without adding special equipment.

In Embodiment 1, speech recognition is performed only once with the selected acoustic model 10, and the elevator call is then registered. Alternatively, a recognition likelihood threshold may be set, and the speech recognition unit 11 may output both the recognized word and its likelihood as the recognition result, so that whether to register the elevator call can be decided. If the recognition likelihood obtained with the acoustic model 10 selected as matching the user's characteristics is low, the acoustic model 10 used for speech recognition may then be changed repeatedly until the recognition likelihood exceeds the threshold.
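The likelihood-threshold variant can be sketched as trying models in order of preference until one yields a likelihood above the threshold. This assumes a recognizer that returns a (word, likelihood) pair and a candidate ordering with the selected model first; both, and the choice to return None when every model falls short, are assumptions rather than details given in the text.

```python
def recognize_with_fallback(features, models_in_order, recognize, threshold):
    """Return (word, likelihood, model) from the first model whose likelihood
    exceeds `threshold`, or None if no model does.

    `models_in_order` starts with the model selected for the user; the
    remaining entries are the models switched to when the likelihood is low.
    """
    for model in models_in_order:
        word, likelihood = recognize(features, model)
        if likelihood > threshold:
            return word, likelihood, model
    return None  # no model was confident enough; do not register a call
```

Bounding the loop by the list of available models guarantees termination even when no model ever reaches the threshold.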

Also, in Embodiment 1, the input speech is recognized using the acoustic model 10 selected based on the user information, building information, and elevator information, and that recognition result is output to register the elevator call. Alternatively, the input speech may be recognized using all of the acoustic models 10, and, of those results, the one obtained with the acoustic model 10 selected based on the user information, building information, and elevator information may be output to register the elevator call.
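This variant runs recognition with every acoustic model and then outputs only the result for the selected model. A minimal sketch, assuming a hypothetical recognizer interface that returns a (word, likelihood) pair and models identified by their tags.

```python
def recognize_all_then_pick(features, models, recognize, selected_tag):
    """Recognize with every model, then return the selected model's result.

    `models` maps a tag to a model. All results are computed, but only the
    one for `selected_tag` is passed on for call registration.
    """
    results = {tag: recognize(features, model) for tag, model in models.items()}
    return results[selected_tag]
```

The selection by tag is applied after recognition here, which is the only difference from the single-model flow.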

Embodiment 2.
FIG. 4 is a configuration diagram of the elevator voice call registration device and the elevator control unit according to Embodiment 2 of the present invention. Parts identical or corresponding to those of Embodiment 1 are given the same reference signs, and their description is omitted.

The voice call registration device of Embodiment 2 adds a user information storage unit 15 to the voice call registration device of Embodiment 1. Each time the user information extraction unit 6 extracts user information, the user information storage unit 15 stores it in association with the elevator information, thereby accumulating the second user tendency information.
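The accumulation performed by the user information storage unit 15 can be sketched as a counter per elevator state, updated every time user information is extracted. This minimal sketch assumes the extracted user information has already been reduced to a single attribute label and that the elevator state is a hashable key such as (floor, direction); the class and method names are hypothetical.

```python
from collections import Counter, defaultdict


class UserInfoStore:
    """Accumulates second user tendency information per elevator state."""

    def __init__(self):
        self._counts = defaultdict(Counter)

    def record(self, elevator_state, attribute):
        """Store one extracted user observation under the current state."""
        self._counts[elevator_state][attribute] += 1

    def amount(self, elevator_state):
        """Number of observations learned so far for this state."""
        return sum(self._counts.get(elevator_state, Counter()).values())

    def tendency(self, elevator_state):
        """Attribute shares observed in this state ({} if none yet)."""
        counts = self._counts.get(elevator_state, Counter())
        total = sum(counts.values())
        return {a: n / total for a, n in counts.items()} if total else {}
```

Lookups use `.get` so that querying an unseen state does not create an empty entry as a side effect.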

In other words, in the second embodiment the user information storage unit 15 learns the second user tendency information in conjunction with the elevator information, and the learning result is reflected in the selection of the acoustic model 10. Specifically, the acoustic model selection unit 8 selects the acoustic model 10 used to recognize the elevator call by taking into account not only the user information and the first user tendency information but also the second user tendency information accumulated automatically during actual operation of the elevator.

As the accumulated amount of second user tendency information increases, the acoustic model selection unit 8 of this embodiment increases the weight given to the second user tendency information and decreases the weight given to the first user tendency information. For example, the acoustic model selection unit 8 is configured to decrease the weight of the first user tendency information in inverse proportion to the amount of second user tendency information learned.
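One way to realize the inverse-proportional weighting is sketched below under stated assumptions: `c` is a hypothetical constant up to which only the first (building-based) tendency is used, beyond which its weight falls as `c / n` for `n` learned samples, and the second tendency takes the remaining weight. How the blended tendency is finally combined with the current user information is not specified in the source and is left out here.

```python
from typing import Dict, Tuple


def tendency_weights(n_learned: int, c: float = 50.0) -> Tuple[float, float]:
    """Return (w_first, w_second) for a given amount of learned data.

    w_first is 1 until n_learned exceeds the hypothetical constant c, then
    falls in inverse proportion to n_learned; w_second is the remainder.
    """
    w_first = 1.0 if n_learned <= c else c / n_learned
    return w_first, 1.0 - w_first


def blend_tendencies(
    first: Dict[str, float],
    second: Dict[str, float],
    n_learned: int,
    c: float = 50.0,
) -> Dict[str, float]:
    # Weighted sum of the building-based and the learned tendency per category.
    w1, w2 = tendency_weights(n_learned, c)
    labels = set(first) | set(second)
    return {k: w1 * first.get(k, 0.0) + w2 * second.get(k, 0.0) for k in labels}
```

With `c = 50`, after 200 learned samples the building-based tendency contributes only a quarter of the blend, so a stale building-based tendency (for example, after a tenant change) gradually loses influence.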

In the voice call registration device configured in this way, each time the elevator information changes, the speech recognition unit 11 switches the acoustic model 10 used to recognize the elevator call to the acoustic model 10 effectively selected by the acoustic model selection unit 8. The speech recognition unit 11 then recognizes the input speech captured by the voice input unit 1 using the acoustic model 10 switched in on that change of elevator information.
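The switching behaviour can be sketched as a small wrapper that re-runs model selection only when the elevator information changes. `select_model` stands in for the acoustic model selection unit 8 and `recognize_fn` for the actual recognizer; both are hypothetical callables, not interfaces given in the source.

```python
from typing import Callable, Hashable, Optional

_UNSET = object()  # sentinel: no elevator information received yet


class SpeechRecognizer:
    """Keeps the active acoustic model in step with the elevator information."""

    def __init__(self, select_model: Callable[[Hashable], str]) -> None:
        self._select_model = select_model
        self._elevator_info: object = _UNSET
        self.current_model: Optional[str] = None

    def on_elevator_info(self, elevator_info: Hashable) -> None:
        # Switch the acoustic model only when the elevator information changes.
        if elevator_info != self._elevator_info:
            self._elevator_info = elevator_info
            self.current_model = self._select_model(elevator_info)

    def recognize(self, audio: bytes, recognize_fn: Callable[[bytes, str], str]) -> str:
        # Recognize the captured input speech with the currently active model.
        if self.current_model is None:
            raise RuntimeError("no elevator information received yet")
        return recognize_fn(audio, self.current_model)
```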

According to the second embodiment described above, the acoustic model 10 used while the elevator is in the state corresponding to each item of elevator information is selected by also taking into account the second user tendency information, which is associated with that elevator information and accumulated during actual operation of the elevator. This further increases the chance of selecting an acoustic model 10 that matches the user's characteristics.

If user tendencies change because, for example, the tenants of the building change, and the building information is not updated, the chance of selecting an acoustic model 10 suited to the users drops immediately after the change. In the second embodiment, however, as the accumulated amount of second user tendency information corresponding to each item of elevator information increases, the weight of the second user tendency information increases while the weight of the first user tendency information, which corresponds to the building information, decreases. Therefore, as the elevator continues to operate and the accumulated second user tendency information grows, the chance of selecting an acoustic model 10 suited to the users rises again, even if someone forgets to update the building information manually.

Embodiment 3.
FIG. 4 is a configuration diagram of an elevator voice call registration device and an elevator control unit according to Embodiment 3 of the present invention. Parts identical or equivalent to those of Embodiment 1 or 2 are given the same reference numerals, and their description is omitted.
In the voice call registration device of the third embodiment, a speech feature extraction unit 16, a speech feature storage unit 17, and an input speech learning unit 18 are provided in place of the information input unit 5, the user information extraction unit 6, and the building information storage unit 7 of the first embodiment.

The speech feature extraction unit 16 extracts speech features from the feature data produced by the acoustic analysis unit 4. These features distinguish, for example, adult voices, children's voices, female voices, and male voices. In other words, in the third embodiment the speech feature extraction unit 16 functions as a user information extraction unit that extracts user information from the characteristics of the user's input speech.
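The source does not say how adult, child, female, and male voices are told apart. Purely as an illustration, the sketch below swaps in a common heuristic: estimate the fundamental frequency (F0) of a voiced frame by autocorrelation and map it to a category using fixed pitch bands. It assumes access to a raw audio frame rather than the acoustic analysis unit's feature data, and the band edges (165 Hz, 255 Hz) are hypothetical round figures; real voices overlap these bands considerably.

```python
import numpy as np


def estimate_f0(frame: np.ndarray, sample_rate: int,
                fmin: float = 60.0, fmax: float = 500.0) -> float:
    """Estimate F0 (Hz) of one voiced frame by picking the autocorrelation peak."""
    frame = frame - frame.mean()  # remove DC offset
    corr = np.correlate(frame, frame, mode="full")[len(frame) - 1:]  # lags >= 0
    lag_min = int(sample_rate / fmax)  # shortest period searched
    lag_max = int(sample_rate / fmin)  # longest period searched (exclusive)
    lag = lag_min + int(np.argmax(corr[lag_min:lag_max]))
    return float(sample_rate / lag)


def classify_voice(mean_f0_hz: float) -> str:
    # Hypothetical pitch bands, for illustration only.
    if mean_f0_hz < 165.0:
        return "adult_male"
    if mean_f0_hz < 255.0:
        return "adult_female"
    return "child"
```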

Each time the speech feature extraction unit 16 extracts a speech feature, the speech feature storage unit 17 stores the user's speech feature in association with the elevator information, thereby accumulating second user tendency information. In other words, in the third embodiment the speech feature storage unit 17 plays the role of the user information storage unit of the second embodiment.

In the input speech learning unit 18, a speaker-independent model is set as the initial acoustic model 10 to be used while the elevator is in the state corresponding to each item of elevator information. On the basis of the second user tendency information accumulated in the speech feature storage unit 17, the input speech learning unit 18 then learns which speech features are most common among users in each elevator state. Through this learning, the input speech learning unit 18 gradually changes the acoustic model 10 assigned to each elevator state.

The acoustic model selection unit 8 of the third embodiment selects, from the plurality of acoustic models 10, the acoustic model 10 used to recognize the elevator call, on the basis of the second user tendency information learned by the input speech learning unit 18 for each item of elevator information.
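The storage, learning, and selection steps of the third embodiment can be sketched together as below. Assumptions: each elevator state starts on a speaker-independent model and switches to the model for the most frequent stored voice feature once a hypothetical minimum number of samples (`min_samples`) has been collected. This majority rule is a simplification of the gradual learning the source describes, and the feature-to-model mapping is illustrative.

```python
from collections import Counter, defaultdict
from typing import DefaultDict, Dict, Hashable

SPEAKER_INDEPENDENT = "speaker_independent"  # initial model for every elevator state


class InputSpeechLearner:
    """Hypothetical combination of units 17 (storage) and 18 (learning)."""

    def __init__(self, feature_to_model: Dict[str, str], min_samples: int = 20) -> None:
        self._feature_to_model = feature_to_model
        self._min_samples = min_samples
        # elevator state -> counts of observed voice features
        self._history: DefaultDict[Hashable, Counter] = defaultdict(Counter)

    def store(self, elevator_info: Hashable, voice_feature: str) -> None:
        # Store each extracted voice feature against the current elevator state.
        self._history[elevator_info][voice_feature] += 1

    def model_for(self, elevator_info: Hashable) -> str:
        # Keep the speaker-independent model until enough data has accumulated.
        counts = self._history.get(elevator_info)
        if not counts or sum(counts.values()) < self._min_samples:
            return SPEAKER_INDEPENDENT
        feature, _ = counts.most_common(1)[0]
        return self._feature_to_model.get(feature, SPEAKER_INDEPENDENT)
```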

According to the third embodiment described above, the chance of selecting an acoustic model 10 that matches the user's characteristics can be increased even when neither user information from a sensor device, scale device, or camera device nor first user tendency information associated with building information is extracted.

As described above, the elevator voice call registration device according to the present invention can be used in elevators to increase, by a simple method and without restricting who the users are, the chance of selecting an acoustic model that matches the user's characteristics when an elevator call is registered by voice input.

DESCRIPTION OF SYMBOLS
1 Voice input unit
2 A/D conversion unit
3 Voice extraction unit
4 Acoustic analysis unit
5 Information input unit
6 User information extraction unit
7 Building information storage unit
8 Acoustic model selection unit
9 Recognition dictionary
10 Acoustic model
11 Speech recognition unit
12 Elevator control unit
13 Call registration unit
14 Elevator information management unit
15 User information storage unit
16 Speech feature extraction unit
17 Speech feature storage unit
18 Input speech learning unit

Claims

An elevator voice call registration device comprising:
a voice input unit that captures input speech directed to a voice input device provided in an elevator car or at a landing;
an acoustic model storage unit that stores a plurality of acoustic models having different acoustic characteristics;
a user information extraction unit that extracts user information relating to the characteristics of a user at the car or landing provided with the voice input device;
an elevator information extraction unit that extracts elevator information relating to the state of the elevator, including the position of the car or landing provided with the voice input device;
an acoustic model selection unit that selects, from the plurality of acoustic models and on the basis of the user information and the elevator information, an acoustic model to be used when recognizing an elevator call from the input speech; and
a user information storage unit that, each time the user information extraction unit extracts user information, stores the extracted user information in association with the elevator information, thereby storing user tendency information relating to tendencies in user characteristics,
wherein the acoustic model selection unit selects, on the basis of the user tendency information, the acoustic model to be used when recognizing the elevator call while the elevator is in the state corresponding to the elevator information.

The elevator voice call registration device according to claim 1, further comprising a speech recognition unit that outputs a recognition result of the input speech obtained using the acoustic model selected by the acoustic model selection unit.

The elevator voice call registration device according to claim 1, further comprising a speech recognition unit that, from among the recognition results of the input speech obtained using the plurality of acoustic models, outputs the recognition result obtained using the acoustic model selected by the acoustic model selection unit.

The elevator voice call registration device according to any one of claims 1 to 3, wherein the user information extraction unit extracts the user information from user characteristics detected by at least one of a sensor device, a scale device, and a camera device provided in the elevator.

The elevator voice call registration device according to any one of claims 1 to 4, further comprising a building information storage unit that stores building information relating to the characteristics of each floor of the building in which the elevator is installed,
wherein the acoustic model selection unit selects the acoustic model to be used when recognizing the elevator call on the basis of the user tendency information and the building information.

The elevator voice call registration device according to claim 5, wherein, as the accumulated amount of the user tendency information increases, the acoustic model selection unit increases the weight of the user tendency information and decreases the weight of the building information when selecting an acoustic model.

The elevator voice call registration device according to any one of claims 1 to 6, wherein the user information extraction unit extracts the user information from the characteristics of the input speech.