JP2023176156A

JP2023176156A - Information processing device

Info

Publication number: JP2023176156A
Application number: JP2022088290A
Authority: JP
Inventors: 敬滋堀; Takashige Hori; 浩司西山; Koji Nishiyama
Original assignee: Toyota Motor Corp
Current assignee: Toyota Motor Corp
Priority date: 2022-05-31
Filing date: 2022-05-31
Publication date: 2023-12-13

Abstract

To provide an information processing device that can improve the convenience of speech recognition. SOLUTION: An information processing device 100 is composed of an electronic device into which a user can input speech and an information server connected to it via a network. It includes a voice input means 110, a schedule acquisition means 120, an intention estimation means 130, and an output means 140. The voice input means 110 acquires the content of the user's utterance. The schedule acquisition means 120 acquires the user's action schedule information registered in a schedule. When searching for facility information based on the utterance content, the intention estimation means 130 refers to facility information related to the action schedule information registered in the schedule for the period from the present to a predetermined time ahead, and performs intention understanding of the utterance content. The output means 140 outputs the result of the intention understanding. SELECTED DRAWING: Figure 2

Description

The present invention relates to an information processing device.

An agent (voice dialogue service) responds by voice when spoken to and searches for destinations and information. Because every operation, up to and including setting the destination, can be performed by voice, an agent is comfortable and convenient to use while driving a car, when manual input via a keyboard or the like is not possible.

Various techniques have been developed to improve the accuracy of speech recognition by agents. For example, Patent Document 1 discloses a speech recognition system and the like that uses past utterance history data for intention understanding, so that recognition accuracy can be improved even when the user provides only a small amount of speech input.

Patent Document 1: Japanese Patent Application Publication No. 2016-24652

As with the technique disclosed in Patent Document 1, past utterance history data is actively used for intention understanding, whereas data about the future is rarely used. In a facility search, for example, if a search based on speech recognition returns multiple matching facilities, the user must be asked additional questions to narrow down the facility. The resulting need for multiple utterances can be troublesome for the user.

The present invention has been made in view of this problem, and its object is to provide an information processing device that can improve the convenience of speech recognition.

An information processing device according to one aspect of the present invention includes a voice input means, a schedule acquisition means, an intention estimation means, and an output means. The voice input means acquires the content of the user's utterance. The schedule acquisition means acquires the user's action schedule information registered in a schedule. When searching for facility information based on the utterance content, the intention estimation means refers to facility information related to the action schedule information registered in the schedule for the period from the present to a predetermined time ahead, and performs intention understanding of the utterance content. The output means outputs the result of the intention understanding.

According to this aspect, the user's future action schedule is taken into account so that the utterance is understood in a way that specifically identifies the intended facility, which improves the convenience of speech recognition.

In the above aspect, when multiple pieces of action schedule information are candidates for intention understanding, the device is preferably configured to give priority to the action schedule information for the nearer future.

According to this aspect, facility information related to the action schedule information that the user most probably intends is displayed at the top, which further improves the accuracy of intention understanding by speech recognition.

According to the present invention, it is possible to provide an information processing device that can improve the convenience of speech recognition.

FIG. 1 is a diagram showing an example of the schematic configuration of an information processing device according to an embodiment of the present invention.
FIG. 2 is a block diagram showing an example of the functional configuration of the information processing device according to the embodiment of the present invention.
FIG. 3 is a diagram illustrating an example of speech recognition using schedule information by the information processing device according to the embodiment of the present invention.

Preferred embodiments of the present invention are described in detail below with reference to the accompanying drawings. In each figure, elements denoted by the same reference numerals have the same or similar configurations.

FIG. 1 is a diagram showing an example of the schematic configuration of the information processing device according to an embodiment of the present invention. The information processing device 100 has the function of an agent (voice dialogue service) that, when spoken to, responds by voice and searches for destinations and information. The information processing device 100 is composed of, for example, an electronic device 1 into which the user can input speech and an information server 2 connected to it via a network N. The electronic device 1 may be an in-vehicle device that can be mounted on a vehicle, such as a head unit, or a mobile terminal such as a smartphone.

The electronic device 1 has, as its functional configuration, a control unit 11, a storage unit 12, and so on. Similarly, the information server 2 has a control unit 21, a storage unit 22, and so on. The components of the electronic device 1 and the information server 2 are not limited to the illustrated example; any components may be added as necessary.

The electronic device 1 functions as a car navigation device. In addition to navigation processing, the control unit 11 executes various processes in the electronic device 1, such as those of the voice input means 110 described later. The storage unit 12 stores various data processed by the control unit 11.

The control unit 21 executes various processes in the information server 2. For example, the control unit 21 receives data transmitted from the vehicle and executes various processes for the electronic device 1. The storage unit 22 stores various data. The control units 11 and 21 and the storage units 12 and 22 of the electronic device 1 and the information server 2 together constitute the voice input means 110, the schedule acquisition means 120, the intention estimation means 130, the output means 140, and so on, described later.

FIG. 2 is a block diagram showing an example of the functional configuration of the information processing device 100 according to the embodiment of the present invention. As shown in FIG. 2, the information processing device 100 includes a voice input means 110, a schedule acquisition means 120, an intention estimation means 130, and an output means 140.

The voice input means 110 acquires the content of the user's utterance by, for example, converting the user's voice captured by a microphone or the like of the electronic device 1 into an electrical speech waveform signal. The schedule acquisition means 120 refers to a schedule DB in which the user's schedule can be registered, and acquires the user's action schedule information registered in the schedule. The schedule DB may be included in the storage unit 22 of the information server 2 constituting the information processing device 100, or in a server other than the information server 2.

The intention estimation means 130 performs intention understanding of the utterance content acquired by the voice input means 110. For example, when searching for facility information based on the utterance content, the intention estimation means 130 refers to facility information related to the action schedule information registered in the schedule for the period from the present to a predetermined time ahead (for example, within one week), and performs intention understanding of the utterance content. The intention estimation means 130 is configured so that the user can freely set the range of the schedule to be referenced. For example, the user can use the voice input means 110 to set the predetermined time, that is, how far ahead of the present the schedule is referenced.
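The time-window lookup performed by the intention estimation means 130 can be illustrated with a short Python sketch. It assumes a hypothetical `ScheduleEntry` record holding a start time and a facility name, uses the one-week example as the default predetermined time, and substitutes a plain case-insensitive substring test for real intention understanding. The publication specifies none of these details.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List

# Example default for the "predetermined time" (the text mentions one week).
DEFAULT_HORIZON = timedelta(weeks=1)


@dataclass(frozen=True)
class ScheduleEntry:
    """One item of action schedule information (hypothetical structure)."""
    start: datetime
    facility: str  # facility name as registered, e.g. "A Izakaya B store"


def entries_in_window(entries: List[ScheduleEntry], now: datetime,
                      horizon: timedelta = DEFAULT_HORIZON) -> List[ScheduleEntry]:
    """Keep entries whose start lies between now and now + horizon (inclusive)."""
    end = now + horizon
    return [e for e in entries if now <= e.start <= end]


def estimate_intention(utterance: str, entries: List[ScheduleEntry], now: datetime,
                       horizon: timedelta = DEFAULT_HORIZON) -> List[str]:
    """Return facility names from upcoming entries that the utterance partially matches.

    A case-insensitive substring test stands in for real intention understanding.
    """
    query = utterance.casefold()
    return [e.facility for e in entries_in_window(entries, now, horizon)
            if query in e.facility.casefold()]


if __name__ == "__main__":
    now = datetime(2022, 12, 30, 12, 0)
    schedule = [
        ScheduleEntry(datetime(2023, 1, 1, 18, 0), "A Izakaya B store"),
        ScheduleEntry(datetime(2023, 2, 1, 18, 0), "A Izakaya D store"),
    ]
    print(estimate_intention("izakaya", schedule, now))  # ['A Izakaya B store']
```

Passing a different `horizon` corresponds to the user-set predetermined time. An utterance such as "drinking party" finds nothing in this sketch because related keywords are not modeled here.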

The output means 140 outputs the result of intention understanding via the electronic device 1. For example, the result can be displayed on the monitor of the electronic device 1 or spoken through its speaker.

FIG. 3 is a diagram illustrating an example of speech recognition using schedule information by the information processing device 100 according to an embodiment of the present invention. As shown in FIG. 3, one characteristic of the information processing device 100 functioning as an agent is that, when it executes a facility search by speech recognition, it refers to the data registered in the schedule and uses that data for intention understanding.

For example, suppose an izakaya chain operates multiple stores (B store, C store, D store, and so on) under the trade name "A Izakaya", and "A Izakaya B store at 18:00 on January 1" is registered in the schedule as action schedule information. When the user says "A Izakaya", "izakaya", or "drinking party", "A Izakaya B store" is displayed at the top of the results.

The proper noun "A Izakaya" is a keyword that partially matches "A Izakaya B store" registered in the schedule. The common noun "izakaya" is likewise a keyword that partially matches "A Izakaya B store".

"Izakaya" and "drinking party" are examples of keywords related to "A Izakaya B store". "Izakaya" falls under subdivision 7651, "Bars and beer halls", of the Japan Standard Industrial Classification, which is the industry to which the A Izakaya B store facility belongs; it is therefore a keyword closely related to "A Izakaya B store". "Drinking party" is the main purpose for which the A Izakaya B store facility registered in the schedule is used, and is therefore also a keyword closely related to "A Izakaya B store".
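The three kinds of keyword relation described above (partial match with the registered name, industry category, and purpose of use) might be classified as in the following Python sketch. The `CATEGORY_KEYWORDS` table, the `ScheduledFacility` record, and its `purpose` field are hypothetical; only the subdivision number 7651 comes from the text.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical lookup from spoken common nouns to JSIC subdivision numbers.
# 7651 ("Bars and beer halls") is the subdivision named in the text.
CATEGORY_KEYWORDS: Dict[str, str] = {"izakaya": "7651"}


@dataclass(frozen=True)
class ScheduledFacility:
    name: str               # registered name, e.g. "A Izakaya B store"
    jsic_code: str          # industry subdivision the facility belongs to
    purpose: Optional[str]  # main purpose of use, e.g. "drinking party" (hypothetical field)


def match_kind(utterance: str, facility: ScheduledFacility) -> Optional[str]:
    """Classify how an utterance relates to a scheduled facility; None if unrelated."""
    q = utterance.casefold().strip()
    if q and q in facility.name.casefold():
        return "partial_match"        # "A Izakaya" or "izakaya" within "A Izakaya B store"
    if CATEGORY_KEYWORDS.get(q) == facility.jsic_code:
        return "industry_category"    # keyword belongs to the facility's JSIC subdivision
    if facility.purpose is not None and q == facility.purpose.casefold():
        return "purpose_of_use"       # e.g. "drinking party"
    return None


def find_related(utterance: str, facilities: List[ScheduledFacility]) -> List[str]:
    """Names of scheduled facilities that the utterance relates to in any of the three ways."""
    return [f.name for f in facilities if match_kind(utterance, f) is not None]
```

Partial matches are checked first, so "izakaya" is reported as a partial match for "A Izakaya B store" even though it also maps to subdivision 7651; the category path only matters for facilities whose names do not contain the spoken word.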

With the information processing device 100 of the present embodiment configured as described above, the user's future action schedule is taken into account so that the utterance is understood in a way that specifically identifies the intended facility, which improves the convenience of speech recognition. Such a device is comfortable and convenient to use while driving a car, when manual input via a keyboard or the like is not possible.

When multiple pieces of action schedule information are candidates for intention understanding, the device is preferably configured to give priority to the action schedule information for the nearer future. For example, suppose "A Izakaya B store at 18:00 on January 1" and "A Izakaya C store at 17:00 on January 2" are both registered in the schedule within the period from the present to the predetermined time. When the user says "A Izakaya", "izakaya", or "drinking party", both the January 1 and the January 2 action schedule information become candidates for intention understanding.

In such a case, the device is preferably configured to give priority to the January 1 action schedule information, which is in the nearer future, and to display "A Izakaya B store" above "A Izakaya C store". According to this aspect, facility information related to the action schedule information that the user most probably intends is displayed at the top, which further improves the accuracy of intention understanding by speech recognition.
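The nearer-future preference can be sketched as a re-ranking step over ordinary facility search results. In this hypothetical Python sketch, `schedule` maps each facility name found in the schedule window to its start time; facilities tied to the schedule move to the top, earliest first, and all other results keep their original order. How the actual device merges schedule hits with general search results is not stated in the text.

```python
from datetime import datetime
from typing import Dict, List


def rerank(search_results: List[str], schedule: Dict[str, datetime]) -> List[str]:
    """Move schedule-related facilities to the top, nearest start time first.

    `schedule` maps facility names found in the schedule window to the start
    time of their (earliest) entry; other results keep their original order.
    """
    scheduled = sorted((f for f in search_results if f in schedule),
                       key=lambda f: schedule[f])
    others = [f for f in search_results if f not in schedule]
    return scheduled + others


if __name__ == "__main__":
    schedule = {
        "A Izakaya B store": datetime(2023, 1, 1, 18, 0),
        "A Izakaya C store": datetime(2023, 1, 2, 17, 0),
    }
    results = ["A Izakaya D store", "A Izakaya C store", "A Izakaya B store"]
    print(rerank(results, schedule))
    # ['A Izakaya B store', 'A Izakaya C store', 'A Izakaya D store']
```

Because `sorted` is stable and the unscheduled results are filtered without sorting, the search engine's own ordering is preserved wherever the schedule gives no signal.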

The information processing device 100 may be configured to select which action schedule information is used for intention understanding, and which is not, by referring to attributes assigned to the action schedule information registered in the schedule. For example, if action schedule information is classified into a "public" attribute, which other users of the schedule DB can also view, and a "private" attribute, which only the user can view, the device may be configured to use only information with the "private" attribute for intention understanding.

Similarly, if the user has assigned a "work" or "off" attribute to action schedule information, for example by tagging, the device may be configured to use only information with the "off" attribute for intention understanding. As shown in FIG. 2, the information processing device 100 may further include a determination means 150 for determining the attributes of proper nouns.
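Attribute-based selection could be as simple as a subset test on tags attached to each entry. The sketch below assumes, hypothetically, that attributes such as "private", "public", "work", and "off" are stored as a set of strings on each `TaggedEntry`. By default only "private" entries are used, and passing `frozenset({"off"})` gives the work/off variant.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List


@dataclass(frozen=True)
class TaggedEntry:
    facility: str
    attributes: FrozenSet[str] = frozenset()  # e.g. frozenset({"private", "off"})


def usable_for_intention(entries: Iterable[TaggedEntry],
                         required: FrozenSet[str] = frozenset({"private"})) -> List[TaggedEntry]:
    """Keep only entries that carry every attribute in `required`."""
    return [e for e in entries if required <= e.attributes]


if __name__ == "__main__":
    entries = [
        TaggedEntry("A Izakaya B store", frozenset({"private", "off"})),
        TaggedEntry("A Izakaya C store", frozenset({"public", "off"})),
    ]
    print([e.facility for e in usable_for_intention(entries)])  # ['A Izakaya B store']
    print([e.facility for e in usable_for_intention(entries, frozenset({"off"}))])
```

Using a subset test rather than a single tag lets both conditions be combined (for example, `frozenset({"private", "off"})`) without changing the function.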

The embodiments described above are intended to facilitate understanding of the present invention and are not to be construed as limiting it. The elements of the embodiments and their arrangement, materials, conditions, shapes, sizes, and so on are not limited to those illustrated and can be changed as appropriate. Configurations shown in different embodiments can also be partially replaced or combined.

Reference signs: 1: electronic device; 2: information server; 11, 21: control unit; 12, 22: storage unit; 100: information processing device; 110: voice input means; 120: schedule acquisition means; 130: intention estimation means; 140: output means; 150: determination means.

Claims

An information processing device comprising:
a voice input means for acquiring the content of a user's utterance;
a schedule acquisition means for acquiring action schedule information of the user registered in a schedule;
an intention estimation means for, when facility information is searched for based on the content of the utterance, performing intention understanding of the content of the utterance by referring to facility information related to the action schedule information registered in the schedule for a period from the present to a predetermined time ahead; and
an output means for outputting the result of the intention understanding.