JP2020160781A

JP2020160781A - Voice recognition order system and voice recognition order method

Info

Publication number: JP2020160781A
Application number: JP2019059274A
Authority: JP
Inventors: 千洋江波戸; Chihiro Ebato
Original assignee: Echigoya Co Ltd
Current assignee: Echigoya Co Ltd
Priority date: 2019-03-26
Filing date: 2019-03-26
Publication date: 2020-10-01

Abstract

To provide a voice recognition order system and a voice recognition order method in which a smart speaker is used to process orders such as food orders. SOLUTION: A voice recognition order system S includes a smart speaker 31A and an order server 10 connected to the smart speaker 31A via a network so that information can be exchanged. The smart speaker 31A receives voice data obtained when a user speaks in a trigger-activated state, recognizes the voice data, authenticates that a specified access keyword has been spoken, and generates text information that identifies the user's order content based on the voice data. When the utterance of the access keyword has been successfully authenticated, order information including the text information is transmitted to the order server 10. SELECTED DRAWING: Figure 1

Description

The present invention relates to a voice recognition order system and a voice recognition order method for performing order processing using a smart speaker.

In recent years, eating establishments such as restaurants have adopted systems in which orders are placed from a touch-panel terminal device installed at the table. When a customer selects a dish and a quantity from the menu displayed on the terminal's screen and touches the "Order" button, the order information is sent from the terminal device over a wireless network to an order receiving device in the kitchen. The order receiving device presents the order information to the cook, either by printing it out or by showing it on its display screen, and cooking begins. Some touch-panel ordering systems of this kind are configured to work with a conventionally known POS system so that accounting and tallying can also be handled smoothly. Furthermore, a technique has been disclosed for a store management system in which a customer converses with a robot installed at a table seat, either directly or indirectly via a tablet or the like, the customer's situation is judged from the content of the conversation, and operational instructions are issued to store staff (Patent Document 1).

Patent Document 1: JP-A-2018-161711

In a conventional touch-panel ordering system, the customer performs the necessary key touches on the touch panel to open the menu page for a dish, selects the dish, selects the quantity, and finally confirms the order.
Meanwhile, smart speakers that accept and carry out operations by the user's voice, such as searching with a search engine, reading online news aloud, and playing music and videos, have recently come onto the market. With a smart speaker, the user can perform various operations by speaking, without operating a touch panel.
However, with the smart speakers currently on the market and the systems built on them, simply placing a smart speaker at a customer seat does not make it possible to open a desired menu page on a touch-panel display or to order a desired dish from the kitchen. For example, when the user says "beer", the smart speaker connects to the AI assistant server to which it normally connects and merely returns, as the best answer, a general search result associated with "beer", such as "Beer is a type of alcoholic beverage. It can be made in various ways, but it is mainly made by fermenting malt, that is, germinated barley whose starch has been saccharified by an enzyme (amylase), with brewer's yeast ... (rest omitted)". This does not lead to an order for food.

In view of these problems, an object of the present invention is to provide a voice recognition order system and a voice recognition order method that perform order processing, such as ordering food, using a smart speaker.

The voice recognition order system of the present invention comprises a smart speaker having at least a voice input means and a voice output means, and a first computer connected to the smart speaker via a network so that information can be exchanged. The smart speaker has: a voice data receiving means that receives voice data obtained when a user speaks while the smart speaker is in a trigger-activated state; an authentication means that recognizes the voice data received by the voice data receiving means and authenticates that a predetermined access keyword has been spoken; a text information generating means that generates, based on the voice data received by the voice data receiving means, text information identifying the user's order content; and an order information transmitting means that, when the authentication means has successfully authenticated the utterance of the access keyword, transmits order information including the text information generated by the text information generating means to the first computer. The first computer has a reception processing means that, upon receiving the order information transmitted by the order information transmitting means of the smart speaker, executes an order reception process based on the received order information.
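
The division of roles on the smart speaker side can be sketched roughly as follows. This is only an illustration under stated assumptions, not the implementation described in the specification: the names `OrderInfo`, `authenticate_keyword`, and `process_utterances` are hypothetical, the utterances are assumed to have already been recognized as text, the access keyword "order please" is a made-up example, and the choice not to forward the keyword utterance itself is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class OrderInfo:
    """Order information sent to the first computer (field is hypothetical)."""
    text: str


def authenticate_keyword(recognized_text: str, access_keyword: str) -> bool:
    """Authentication means: True if the access keyword appears in the utterance."""
    return access_keyword.lower() in recognized_text.lower()


def process_utterances(
    utterances: List[str],
    access_keyword: str,
    send: Callable[[OrderInfo], None],
) -> int:
    """Forward utterances as order information once the keyword is authenticated.

    `utterances` stands in for voice data already recognized as text while the
    speaker is in the trigger-activated state. Returns the number of orders sent.
    """
    authenticated = False
    sent = 0
    for text in utterances:
        if not authenticated:
            # Nothing is forwarded until the access keyword has been spoken.
            authenticated = authenticate_keyword(text, access_keyword)
            continue
        # Text information generating means + order information transmitting means.
        send(OrderInfo(text=text))
        sent += 1
    return sent


# Usage: `received` plays the role of the first computer's inbox.
received: List[OrderInfo] = []
count = process_utterances(
    ["two beers", "order please", "two beers", "one salad"],
    access_keyword="order please",
    send=received.append,
)
```

In this sketch the first "two beers" is dropped because it precedes the access keyword, which mirrors the condition that order information is transmitted only when authentication has succeeded.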

The first computer may have an instruction information generating means that generates, based on the received order information, order execution instruction information instructing that processing corresponding to the order content be executed, and an instruction information transmitting means that transmits the order execution instruction information generated by the instruction information generating means to a second computer connected to the first computer via a network so that information can be exchanged. The second computer may have an instruction information receiving means that receives the order execution instruction information transmitted by the instruction information transmitting means of the first computer, and an order execution means that executes processing corresponding to the order content based on the order execution instruction information received by the instruction information receiving means.

In a voice recognition order system that includes a plurality of the smart speakers and terminal devices, each associated with one of the smart speakers and operable by a user, the order execution instruction information may include ID information identifying the smart speaker from which the order information was sent, and the second computer may have a terminal control means that controls, based on the order execution instruction information, the terminal device associated with the smart speaker identified by that ID information as a control target terminal device.
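
As a rough illustration of how the second computer might pick the control target terminal device from the ID information, consider the sketch below. All identifiers (`OrderExecutionInstruction`, `SPEAKER_TO_TERMINAL`, `select_control_target`, `build_control_info`) and the ID strings are hypothetical; the specification does not prescribe any data format.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class OrderExecutionInstruction:
    smart_speaker_id: str  # ID information of the smart speaker that sent the order
    order_content: str     # what the terminal should be asked to do or display


# Hypothetical association between smart speakers and their paired terminals.
SPEAKER_TO_TERMINAL: Dict[str, str] = {
    "speaker-001": "tablet-001",
    "speaker-002": "tablet-002",
}


def select_control_target(
    instruction: OrderExecutionInstruction, mapping: Dict[str, str]
) -> str:
    """Return the terminal associated with the originating smart speaker."""
    try:
        return mapping[instruction.smart_speaker_id]
    except KeyError:
        raise LookupError(
            f"no terminal is associated with {instruction.smart_speaker_id!r}"
        ) from None


def build_control_info(
    instruction: OrderExecutionInstruction, mapping: Dict[str, str]
) -> Dict[str, str]:
    """Assemble control information addressed to the control target terminal."""
    return {
        "terminal_id": select_control_target(instruction, mapping),
        "display": instruction.order_content,
    }
```

Keeping the speaker-to-terminal association in one lookup table means an order spoken at one seat can only ever drive the tablet paired with that seat's speaker.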

The second computer may be configured to transmit, to the control target terminal device, control information instructing it to show on its display screen a display corresponding to the order content indicated by the order execution instruction information.

The voice recognition order method of the present invention is executed by a smart speaker having at least a voice input means and a voice output means, and a first computer connected to the smart speaker via a network so that information can be exchanged. The method includes the following steps performed by the smart speaker: a voice data receiving step of receiving voice data obtained when a user speaks while the smart speaker is in a trigger-activated state; an authentication step of recognizing the received voice data and authenticating that a predetermined access keyword has been spoken; a text information generating step of generating, based on the voice data received in the voice data receiving step, text information identifying the user's order content; and an order information transmitting step of transmitting order information including the text information generated in the text information generating step to the first computer when the utterance of the access keyword has been successfully authenticated in the authentication step. The method further includes a reception processing step in which the first computer, upon receiving the order information transmitted from the smart speaker, executes an order reception process based on that order information.

The method may include an instruction information generating step in which the first computer, upon receiving the order information transmitted from the smart speaker, generates, based on that order information, order execution instruction information instructing that processing corresponding to the order content be executed, and an instruction information transmitting step of transmitting the order execution instruction information generated in the instruction information generating step to a second computer connected to the first computer via a network so that information can be exchanged. The method may further include an instruction information receiving step in which the second computer receives the order execution instruction information transmitted from the first computer, and an order execution step of executing processing corresponding to the order content based on the order execution instruction information received in the instruction information receiving step.

The order execution instruction information may include ID information identifying the smart speaker from which the order information was sent, and the method may include a terminal control step in which the second computer controls, as a control target terminal device and based on the order execution instruction information, the terminal device that, among terminal devices each associated with one of a plurality of the smart speakers and operable by a user, is associated with the smart speaker identified by the ID information included in the order execution instruction information.

The second computer may be configured to transmit, to the control target terminal device, control information instructing it to show on its display screen a display corresponding to the order content indicated by the order execution instruction information.

According to the present invention, it is possible to provide a voice recognition order system and a voice recognition order method that perform order processing, such as ordering food, using a smart speaker.

An explanatory diagram showing a configuration example of the voice recognition order system S.
A block diagram schematically showing the configuration of the order server 10.
A block diagram schematically showing the configuration of the store server 30.
A block diagram schematically showing the configuration of the smart speaker 31A.
A block diagram schematically showing the configuration of the tablet terminal 31B.
An example of the store information DB 1021.
An example of the utterance scenario DB 1022.
An example of the seat terminal management DB 3021.
A sequence chart for explaining a processing operation example.
A sequence chart for explaining a processing operation example.
A sequence chart for explaining a processing operation example.
A sequence chart for explaining a processing operation example.
A sequence chart for explaining a processing operation example.
A sequence chart for explaining a processing operation example.
A sequence chart for explaining a processing operation example.
An example of the hotel information DB 1023.
An example of the utterance scenario DB 1024.
A sequence chart for explaining a processing operation example.
A sequence chart for explaining a processing operation example.

An embodiment of the present invention will be described with reference to the drawings. The embodiment described below is merely an example, and there is no intention to exclude the application of various modifications and techniques not explicitly stated below. That is, the present invention can be implemented with various modifications (such as combining the examples) as long as its effect is obtained. In the following description of the drawings, the same or similar parts are denoted by the same or similar reference numerals. The drawings are schematic and do not necessarily match actual dimensions, ratios, and the like. The drawings may also differ from one another in their dimensional relationships and ratios.

FIG. 1 is an explanatory diagram showing a configuration example of the voice recognition order system S. Here, the system consists of a seat terminal 31 provided at each customer seat in a restaurant or the like, an order server 10 as the first computer, a store server 30 as the second computer, a service provider device 20 installed by the business operator that runs the voice recognition order system S, and an AI assistant server 40 for the smart speaker 31A, each connected to a network by wire or wirelessly. The voice recognition order system S can be applied to a plurality of restaurants, each of which has a plurality of customer seats.

The seat terminal 31 consists of a smart speaker 31A and a seat tablet terminal used as a pair with that smart speaker 31A (hereinafter referred to as the tablet terminal 31B). The seat terminal 31 connects to the network wirelessly via an access point (not shown), while the other system components connect to the network by wire. The access point (not shown) is a wireless device that connects wireless terminals to one another and to a network such as a wired network.

In the present embodiment, the order server 10, the service provider device 20, and the AI assistant server 40 are configured separately on the server side, but they may be implemented on a single computer, and each server may itself be made up of a plurality of computers.

The voice recognition order system S includes a smart speaker 31A having at least a microphone as an example of the voice input means and a speaker as an example of the voice output means, and an order server 10, as an example of the first computer, connected to the smart speaker 31A via a network so that information can be exchanged.

The smart speaker 31A has: a voice data receiving means that receives voice data obtained when a user speaks while the smart speaker is in a trigger-activated state; an authentication means that recognizes the voice data received by the voice data receiving means and authenticates that a predetermined access keyword has been spoken; a text information generating means that generates, based on the voice data received by the voice data receiving means, text information identifying the user's order content; and an order information transmitting means that, when the authentication means has successfully authenticated the utterance of the access keyword, transmits order information including the text information generated by the text information generating means to the order server 10. The order server 10 has a reception processing means that, upon receiving the order information transmitted by the order information transmitting means of the smart speaker 31A, executes an order reception process based on the received order information.

FIG. 2 is a block diagram schematically showing the configuration of the order server 10, FIG. 3 is a block diagram schematically showing the configuration of the store server 30, FIG. 4 is a block diagram schematically showing the configuration of the smart speaker 31A, and FIG. 5 is a block diagram schematically showing the configuration of the tablet terminal 31B.

<Order server 10>
The order server 10 is mainly a device that receives order information from the smart speaker 31A at each customer seat of each restaurant and executes the necessary order reception processing.
The order server 10 comprises: a control unit 101 serving as the first computer of the present invention and composed of a CPU with arithmetic functions, a working RAM, a ROM storing various data and programs, and the like; a storage unit 102 comprising a semiconductor memory element such as a RAM or flash memory, or a storage device such as a hard disk or optical disk; a display unit 103 having a display screen such as a monitor; an input unit 104 (for example, a keyboard, a mouse, or an operation panel (including a touch panel)) that accepts instructions from the business operator running the voice recognition order system S and supplies corresponding instruction signals to the control unit 101; and a communication unit 105 for communicating with the seat terminal 31, the service provider device 20, the store server 30, and the like via various networks (including a LAN (Local Area Network)). These components are interconnected via a bus.

The control unit 101 includes a reception processing means 1011, an instruction information generating means 1012, and an instruction information transmitting means 1013, and functions as each means of the present invention in cooperation with the other components.
The storage unit 102 stores a store information database (DB) 1021, an utterance scenario database (DB) 1022, and the like.

The store information DB 1021 registers (stores) store information for the stores (for example, restaurants) operated by companies that have introduced the voice recognition order system S. FIG. 6A shows an example of the store information DB 1021. In this example, "store information", which includes destination information for the store server 30, and an "access keyword" are registered in association with a "store ID" that uniquely identifies the store. Every store is assigned a different store ID.

The "store information" registers, in association with one another, a "tablet terminal ID" that uniquely identifies the tablet terminal 31B of each seat terminal 31 installed in the store, "tablet terminal destination information" for sending the necessary control information to the tablet terminal 31B, a "smart speaker ID" as an example of ID information that uniquely identifies the smart speaker 31A, and "smart speaker destination information" that serves as the destination for responses to the smart speaker 31A.
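
One way to picture the records held in the store information DB 1021 is the sketch below. It is a minimal in-memory model under assumptions: the class names, field names, and sample values (including the `.example` destinations and the keyword "order please") are hypothetical, and the actual layout shown in FIG. 6A may differ.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class SeatTerminalRecord:
    tablet_terminal_id: str          # "tablet terminal ID"
    tablet_destination: str          # "tablet terminal destination information"
    smart_speaker_id: str            # "smart speaker ID"
    smart_speaker_destination: str   # "smart speaker destination information"


@dataclass
class StoreRecord:
    store_id: str                    # unique per store
    store_server_destination: str    # destination of the store server 30
    access_keyword: str
    seat_terminals: List[SeatTerminalRecord] = field(default_factory=list)

    def find_by_speaker(self, smart_speaker_id: str) -> Optional[SeatTerminalRecord]:
        """Return the seat terminal entry paired with the given smart speaker."""
        for record in self.seat_terminals:
            if record.smart_speaker_id == smart_speaker_id:
                return record
        return None


# Hypothetical contents keyed by store ID.
store_db: Dict[str, StoreRecord] = {
    "S0001": StoreRecord(
        store_id="S0001",
        store_server_destination="store-server.example",
        access_keyword="order please",
        seat_terminals=[
            SeatTerminalRecord("T-01", "tablet-01.example", "SP-01", "speaker-01.example"),
        ],
    )
}
```

With this shape, `store_db["S0001"].find_by_speaker("SP-01")` yields the entry whose tablet should receive control information for orders spoken into that speaker.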

The utterance scenario DB 1022 stores a plurality of pieces of utterance scenario information for conversations held with restaurant users through the smart speaker 31A. An utterance scenario is the storyline of a conversation exchanged between the user and the order server 10. A user who enters a restaurant passes through a number of stay states between entering and leaving. The utterance scenario DB 1022 therefore stores a separate utterance scenario for each of these states.

Each utterance scenario stores, in association with one another, user-side scenario data that the user uses in the dialogue (that is, an utterance directed at the smart speaker 31A or a selection on the tablet terminal 31B), server-side scenario data that the order server 10 uses in the dialogue, and, for each "predetermined utterance", "information on the process to be performed next" when that utterance is made. For example, after the "predetermined utterance: an utterance conveying the number of people", the "process to be performed next" is "transmit order execution instruction information so that the menu is displayed on the display screen of the tablet terminal 31B". The specific processing is described later.
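
The association between a predetermined utterance and the process to be performed next can be sketched as a per-state lookup table. Everything here is illustrative: the state name `"seated"`, the utterance label `"party_size"`, the reply text, and the process label are assumptions rather than values taken from the utterance scenario DB 1022.

```python
from typing import Dict, NamedTuple, Optional


class ScenarioStep(NamedTuple):
    server_reply: str   # server-side scenario data (what the order server says)
    next_process: str   # information on the process to be performed next


# One scenario per stay state; each maps a predetermined utterance to a step.
SCENARIOS: Dict[str, Dict[str, ScenarioStep]] = {
    "seated": {
        "party_size": ScenarioStep(
            server_reply="Thank you. Please look at the menu on the tablet.",
            next_process="send_instruction:display_menu",
        ),
    },
}


def next_step(state: str, utterance_type: str) -> Optional[ScenarioStep]:
    """Look up the step for an utterance in the scenario of the current state."""
    return SCENARIOS.get(state, {}).get(utterance_type)
```

Returning `None` for an unknown state or utterance leaves it to the caller to decide how to respond, for example by asking the user to repeat the request.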

FIG. 6B shows an example of the utterance scenario DB 1022. In this example, an "utterance scenario" and a "product list" are registered in the utterance scenario DB 1022 in association with the "store ID" that uniquely identifies the store. The store information DB 1021 and the utterance scenario DB 1022 are configured so that the business operator running the voice recognition order system S can update them via the input unit 104, or from the service provider device 20 connected via the network.

<Store server 30>
The store server 30 is mainly a device that receives order execution instruction information from the order server 10 and executes processing corresponding to the order content, such as controlling the tablet terminal 31B. It is installed, for example, in a back-of-house area such as the restaurant's kitchen. It may also function, for example, as the server of a POS system that manages the order status of each customer seat in the store.

The store server 30 comprises: a control unit 301 serving as the second computer of the present invention and composed of a CPU with arithmetic functions, a working RAM, a ROM storing various data and programs, and the like; a storage unit 302 comprising a semiconductor memory element such as a RAM or flash memory, or a storage device such as a hard disk or optical disk; a display unit 303 having a display screen such as a monitor; an input unit 304 (for example, a keyboard, a mouse, or an operation panel (including a touch panel)) that accepts instructions from store staff and supplies corresponding instruction signals to the control unit 301; and a communication unit 305 for communicating with the order server 10, the seat terminal 31, the service provider device 20, and the like via various networks (including a LAN (Local Area Network)). These components are interconnected via a bus.

The control unit 301 includes an instruction information receiving means 3011, an order execution means 3012, and a terminal control means 3013, and functions as each means of the present invention in cooperation with the other components.
The storage unit 302 stores a seat terminal management database (DB) 3021 and the like. FIG. 6C shows an example of the seat terminal management DB 3021. In this example, the following are registered in association with one another: a "tablet terminal ID" that uniquely identifies the tablet terminal 31B of each seat terminal 31 installed in the store where the store server 30 is located, "tablet terminal destination information" for sending the necessary control information to the tablet terminal 31B, a "smart speaker ID" as an example of ID information that uniquely identifies the smart speaker 31A, and "smart speaker destination information" that serves as the destination for responses to the smart speaker 31A.

<Seat terminal 31>
The seat terminal 31 is installed at each customer seat, for example on the table, and consists of a smart speaker 31A and a tablet terminal 31B.
<Smart speaker 31A>
The smart speaker 31A is a so-called IoT (Internet of Things) device and performs various kinds of information processing in cooperation with the AI assistant server 40. For example, the smart speaker 31A is a device that can use voice analysis technology to identify the content of words spoken by the user and, according to the identified content, provide information and content, and even place orders with various online shopping malls.

The smart speaker 31A includes a control unit 311A, a storage unit 312A, a speaker 313A as a voice output means, a microphone 314A as a voice input means, an operation unit 315A, and a communication unit 316A. In addition, the housing of the smart speaker 31A is provided with an LED (light emitting element) that visually indicates the status of the smart speaker 31A.

The communication unit 316A is implemented by a NIC or the like and is connected to the network by wire or wirelessly. It transmits and receives information to and from the order server 10, the store server 30, and the AI assistant server 40.

The operation unit 315A is an input device that accepts various operations from the user. For example, the operation unit 315A is implemented by operation keys or the like provided on the smart speaker 31A.
The smart speaker 31A does not have to have a physical operation unit 315A. For example, instead of physical operation keys, the operation unit 315A may accept the voice detected by the microphone 314A as input.

The microphone 314A detects various kinds of information relating to the smart speaker 31A. Specifically, the microphone 314A functions as a sensor that detects the voice uttered by the user and the environmental sound around the smart speaker 31A, and acquires them as voice data.

The storage unit 312A stores various kinds of information, such as connection destination information for the order server 10, connection destination information for the AI assistant server 40, a "smart speaker ID" as an example of ID information that uniquely identifies the smart speaker 31A, and a "store ID" that uniquely identifies the store in which the smart speaker is installed. The storage unit 312A is implemented by, for example, a semiconductor memory element such as a RAM or a flash memory, or a storage device such as a hard disk or an optical disk.

The storage unit 312A stores a voice recognition database (DB) 3121A and the like. The voice recognition DB 3121A stores the trigger word that puts the smart speaker 31A into the activated state. When the smart speaker 31A receives the trigger utterance from the user through the microphone 314A, it changes from the standby state to the activated state, and the LED on the housing lights up in the color that indicates the activated state.

The voice recognition DB 3121A also stores predetermined access keyword information used for authentication when connecting to the order server 10. The authentication means 3112A of the control unit 311A, described later, checks whether the voice data received through the microphone 314A begins with the "access keyword". If it does, order information including the text information is transmitted to the order server 10.

The "access keyword" is, for example, the store name, such as "○○ Restaurant". If the access keyword is absent, the smart speaker connects to the AI assistant server 40 and conventional, publicly known information processing is performed. In addition, the storage unit 312A may, for example, store the voice data detected by the microphone 314A in association with the date and time at which it was detected.
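The routing rule described above (utterances that begin with the access keyword go to the order server 10, all others go to the AI assistant server 40) might be sketched as follows. This is a minimal illustration that operates on already-recognized text; the function name, the `Route` type, and the target labels are assumptions, and the speech recognition itself is outside the sketch.

```python
from typing import NamedTuple

# Example keyword from the description; a real deployment would load it
# from the voice recognition DB 3121A.
ACCESS_KEYWORD = "○○ Restaurant"


class Route(NamedTuple):
    target: str  # "order_server" or "ai_assistant_server" (hypothetical labels)
    text: str    # recognized text forwarded to the target


def route_utterance(recognized_text: str, access_keyword: str = ACCESS_KEYWORD) -> Route:
    """Route by whether the utterance begins with the access keyword."""
    text = recognized_text.strip()
    if text.startswith(access_keyword):
        # Authentication succeeds: the full text (keyword included) becomes
        # part of the order information sent to the order server.
        return Route("order_server", text)
    # No keyword at the beginning: fall back to the ordinary assistant service.
    return Route("ai_assistant_server", text)
```

Note that the check is a prefix match, so an utterance that mentions the store name only in the middle of a sentence is treated as an ordinary assistant request.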

The control unit 311A is composed of a CPU having a calculation function, a work RAM, a ROM that stores various data and programs, and the like. The control unit 311A is a controller and is implemented by, for example, an integrated circuit such as an ASIC or an FPGA.

The control unit 311A includes a voice data receiving means 3111A, an authentication means 3112A, a text information generating means 3113A, and an order information transmitting means 3114A, and functions as each means of the present invention in cooperation with other members.

<Tablet terminal 31B>
The tablet terminal 31B performs processing according to, for example, control information from the store server 30. For example, it is a device that displays a menu or the like on its touch panel in accordance with control information from the store server 30. It is also a device that the user can operate: it accepts the user's operations on the touch panel and can perform screen transitions, accept orders, and send checkout instructions to the store server 30.

The tablet terminal 31B includes: a control unit 311B, which is a computer composed of a CPU having a calculation function, a work RAM, a ROM that stores various data and programs, and the like; a storage unit 312B having a semiconductor memory element such as a RAM or a flash memory, or a storage device such as a hard disk or an optical disk; a display unit 313B having a display screen such as a touch panel; an input unit 314B that accepts the user's operation instructions via the touch panel and gives the control unit 311B an instruction signal corresponding to each instruction; and a communication unit 315B for communicating with the store server 30, the order server 10, the service provider device 20, and the like via various networks (including a LAN (Local Area Network)). These components are connected to one another via a bus.

The storage unit 312B stores an orderable product database (DB) 3121B and the like. The orderable product DB 3121B holds menu list information for the dishes that can be ordered (photographs of the dishes, dish information such as place of origin, prices, and so on), and the menus and dish information extracted from the orderable product DB 3121B are shown on the display screen. The menu list information in the orderable product DB 3121B can be updated from the store server 30 based on control information.
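To make the role of the orderable product DB 3121B concrete, the following Python sketch shows one possible in-memory shape for the menu list information, a page extraction by category, and a wholesale update of the kind the store server could trigger. The field names, categories, item names, and prices are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class MenuItem:
    """One orderable item; the fields are assumed for illustration."""
    name: str
    category: str        # e.g. "drink", "meal", "dessert"
    price_yen: int
    origin: str = ""     # optional dish information such as place of origin


# Hypothetical contents of the orderable product DB.
MENU_DB: List[MenuItem] = [
    MenuItem("Draft beer", "drink", 500),
    MenuItem("Oolong tea", "drink", 300),
    MenuItem("Grilled fish set", "meal", 980, origin="Niigata"),
]


def items_for_page(category: str) -> List[MenuItem]:
    """Extract the items shown on one menu page (for example the drink page)."""
    return [item for item in MENU_DB if item.category == category]


def replace_menu(new_items: Iterable[MenuItem]) -> None:
    """Overwrite the local menu list, as an update from the store server might."""
    MENU_DB[:] = list(new_items)
```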

<Processing operation example 1 of voice recognition order system S>
FIGS. 7 to 12 are sequence charts illustrating an example of the processing performed when a user enters the store, various order information is transmitted to the order server 10 via the smart speaker 31A, and the order server 10 executes order acceptance processing based on the received order information.

<From entering the store to menu display>
The smart speaker 31A receives the trigger utterance from the user and changes from the standby state to the activated state (step S1). Following the trigger utterance, the user says something, and the voice data receiving means 3111A of the control unit 311A, in cooperation with the microphone 314A, receives the utterance as voice data (step S2).

Next, the authentication means 3112A of the control unit 311A recognizes the received voice data and authenticates whether the access keyword was spoken at the beginning of the utterance (step S3). Authentication is performed by collating the utterance with the access keyword stored in advance in the voice recognition DB of the storage unit 312A. FIG. 7 shows an example in which "○○ Restaurant" is the access keyword. In this example, the utterance begins with the access keyword "○○ Restaurant", so authentication succeeds (OK), and the text information generating means 3113A generates text information that identifies the content of the user's order based on the voice data received in step S2 (step S4).

Then, referring to the connection destination information of the order server 10 stored in the storage unit 312A, the smart speaker 31A transmits order information including the text information generated in step S4 to the order server 10 (step S5). At this time, the "store ID" and the "smart speaker ID" stored in the storage unit 312A are transmitted together with it.
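The order information sent in step S5 carries the recognized text together with the store ID and the smart speaker ID. The publication does not specify a wire format, so the following Python sketch simply assumes a JSON object; the class name, field names, and sample IDs are hypothetical.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class OrderInformation:
    """Payload sent from the smart speaker to the order server (assumed shape)."""
    store_id: str
    smart_speaker_id: str
    text: str  # text information generated from the user's utterance


def to_payload(order: OrderInformation) -> str:
    """Serialize the order information; JSON is an assumption, not from the source."""
    # ensure_ascii=False keeps non-ASCII text such as Japanese readable.
    return json.dumps(asdict(order), ensure_ascii=False)


def from_payload(payload: str) -> OrderInformation:
    """Rebuild the order information on the receiving side."""
    return OrderInformation(**json.loads(payload))
```

Because both IDs travel with every message, the receiver can identify the originating store and the originating speaker without keeping per-connection state.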

Next, when the reception processing means 1011 of the order server 10 receives the order information transmitted by the smart speaker 31A (step S6), it executes order acceptance processing based on the received order information (step S7). Specifically, the order acceptance processing is executed based on the text information included in the order information.

In the example shown in the first part of FIG. 7, order information including the text "○○ Restaurant, hello" is received from the smart speaker 31A. The control means 101 then consults the utterance scenario DB 1022 (FIG. 6B) in the storage unit 102 and, from the "utterance scenario" associated with the "store ID" received together with the order information, extracts the conversational reply to "hello", that is, to the text information with the access keyword removed, and sends that reply to the smart speaker 31A that transmitted the order information. The "store ID" identifies the originating store among multiple stores, and the "smart speaker ID" identifies the originating smart speaker 31A among the smart speakers 31A of the multiple customer seat terminals 31.

In the example in the first part of FIG. 7, the reply is "Welcome to ○○ Restaurant! How many in your party?". Specifically, the reception processing means 1011 refers to the store information DB 1021 in the storage unit 102 and sends the reply to the destination information of the smart speaker 31A stored in association with the "store ID" and "smart speaker ID" transmitted together with the order information. Alternatively, the destination information of the smart speaker 31A may be included in the order information transmitted to the order server 10 in step S5, and the reply in step S7 may be sent to the destination information contained in the order information.
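The reply selection described here (remove the access keyword, then look up the remaining utterance in the scenario registered for the store) can be sketched in Python as below. The layout of the utterance scenario DB 1022 is not detailed in this passage, so the nested dictionary, the exact-match lookup, and the sample store ID are assumptions made for illustration only.

```python
from typing import Dict, Optional

# Hypothetical scenario table: store ID -> {utterance: reply}.
SCENARIO_DB: Dict[str, Dict[str, str]] = {
    "STORE-001": {
        "hello": "Welcome to ○○ Restaurant! How many in your party?",
    },
}


def strip_access_keyword(text: str, access_keyword: str) -> str:
    """Drop a leading access keyword and any separator that follows it."""
    text = text.strip()
    if text.startswith(access_keyword):
        text = text[len(access_keyword):]
    return text.strip(" ,、")


def lookup_reply(store_id: str, text: str, access_keyword: str) -> Optional[str]:
    """Return the scripted reply for this store, or None if nothing matches."""
    utterance = strip_access_keyword(text, access_keyword).lower()
    return SCENARIO_DB.get(store_id, {}).get(utterance)
```

A production system would presumably match intents rather than exact strings; the exact-match dictionary is used here only to keep the example short.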

On receiving the reply from the order server 10, the smart speaker 31A outputs it from the speaker 313A (step S8).
Next, when the user says "○○ Restaurant, four people", the control unit 311A receives the voice data (step S9), authenticates the access keyword (step S10), generates text information (step S11), and transmits order information including the text information (step S12). The processing in steps S9 to S12 is the same as in steps S2 to S5, respectively, so its description is omitted.

The authenticated state established by the access keyword can be set to persist for a predetermined time. Therefore, if the authenticated state established in step S3 is still in effect after step S9, the access keyword authentication in step S10 is unnecessary. If the smart speaker 31A is not in the activated state before it receives the utterance as voice data in step S9, it first receives the trigger utterance from the user and changes from the standby state to the activated state, and then the processing from step S9 onward is performed. So that the user can tell whether the smart speaker 31A is in the standby state, the activated state, or the authenticated state, the status LED (light emitting element) emits a different color for each state.
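The three states and the time-limited authentication can be pictured as a small state machine. The Python sketch below is one possible reading under stated assumptions: the LED colors, the 60-second default hold time, and the class and method names are hypothetical, the caller supplies the current time in seconds, and the return to standby is not modeled because this passage does not describe it.

```python
from enum import Enum
from typing import Optional


class State(Enum):
    STANDBY = "standby"
    ACTIVATED = "activated"
    AUTHENTICATED = "authenticated"


# Hypothetical color assignment; the text only says each state has its own color.
LED_COLORS = {State.STANDBY: "off", State.ACTIVATED: "blue", State.AUTHENTICATED: "green"}


class SpeakerSession:
    def __init__(self, access_keyword: str, auth_hold_seconds: float = 60.0) -> None:
        self.access_keyword = access_keyword
        self.auth_hold_seconds = auth_hold_seconds
        self.state = State.STANDBY
        self._auth_expires_at: Optional[float] = None

    def on_trigger(self) -> None:
        """Trigger utterance: leave standby and start listening."""
        if self.state is State.STANDBY:
            self.state = State.ACTIVATED

    def accept(self, text: str, now: float) -> bool:
        """Return True if this utterance should be sent to the order server."""
        if self.state is State.STANDBY:
            return False  # nothing is processed before the trigger utterance
        if self._auth_expires_at is not None and now >= self._auth_expires_at:
            # The hold time has elapsed, so the keyword is required again.
            self.state = State.ACTIVATED
            self._auth_expires_at = None
        if self.state is State.AUTHENTICATED:
            return True  # still authenticated: keyword check is skipped
        if text.strip().startswith(self.access_keyword):
            self.state = State.AUTHENTICATED
            self._auth_expires_at = now + self.auth_hold_seconds
            return True
        return False

    def led_color(self) -> str:
        return LED_COLORS[self.state]
```

In this sketch the hold time is measured from the moment of authentication and is not extended by later utterances; whether the actual system refreshes the timer is not stated in the text.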

When the reception processing means 1011 of the order server 10 receives the order information transmitted by the smart speaker 31A (step S13), it executes order acceptance processing based on the received order information (step S14). In the example shown in the latter part of FIG. 7, order execution instruction information, which instructs execution of processing according to the order content, is generated and transmitted to the store server 30 of the store in which the smart speaker 31A that sent the order information is installed.

With reference to the utterance scenario DB 1022 (FIG. 6B) in the storage unit 102, the utterance that remains after the access keyword is removed from the text information is taken as input, and the "next process to be performed" is executed. For example, when order information including the text "○○ Restaurant, four people" is received, as shown in the latter part of FIG. 7, the "next process to be performed" after an utterance stating the party size is to display the menu on the display screen of the tablet terminal 31B paired with the smart speaker 31A that reported the party size. Accordingly, the instruction information generating means 1012 generates order execution instruction information indicating that instruction, and the instruction information transmitting means 1012 transmits it to the store server 30 of the store in which the originating smart speaker 31A is installed (FIG. 8: step S15).

Specifically, referring to the store information DB 1021 (FIG. 6A) in the storage unit 102, the order execution instruction information is transmitted to the destination information of the store server 30 contained in the "store information" stored in association with the "store ID" and "smart speaker ID" that were transmitted together with the order information in step S12. At this time, the "smart speaker ID", which is the ID information identifying the smart speaker 31A that sent the order information in step S12, is transmitted together with it.
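Steps S14 and S15 can be read as two lookups on the order server: decide the next process from the utterance, then resolve the store server's destination from the store ID and smart speaker ID. The Python sketch below illustrates that flow under loudly stated assumptions: the rule that recognizes a party-size utterance, the action name, the dictionary layout standing in for the store information DB 1021, and the sample addresses are all hypothetical.

```python
from typing import Dict, Optional, Tuple

# Hypothetical stand-in for the store information DB (FIG. 6A),
# keyed by (store ID, smart speaker ID).
STORE_INFO_DB: Dict[Tuple[str, str], Dict[str, str]] = {
    ("STORE-001", "SP-001"): {"store_server_destination": "198.51.100.5:9000"},
}


def decide_next_action(utterance: str) -> Optional[str]:
    """Toy rule: '<number> people' means the menu should be displayed next."""
    words = utterance.lower().split()
    if len(words) == 2 and words[0].isdigit() and words[1] == "people":
        return "show_menu_cover"
    return None


def build_instruction(store_id: str, smart_speaker_id: str, utterance: str) -> Optional[dict]:
    """Build order execution instruction information, or None if not applicable."""
    action = decide_next_action(utterance)
    store_info = STORE_INFO_DB.get((store_id, smart_speaker_id))
    if action is None or store_info is None:
        return None
    return {
        "destination": store_info["store_server_destination"],
        # The speaker ID is forwarded so the store server can find the paired tablet.
        "smart_speaker_id": smart_speaker_id,
        "action": action,
    }
```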

The instruction information receiving means 3011 of the store server 30 receives the order execution instruction information from the order server 10 (step S16). The order executing means 3012 of the store server 30 then executes processing according to the order content based on the order execution instruction information.

In the example of FIG. 8, the terminal controlling means 3013 of the store server 30 treats the tablet terminal 31B associated with the smart speaker 31A identified by the smart speaker ID in the order execution instruction information as the terminal device to be controlled, and controls it based on the order execution instruction information.

Specifically, referring to the customer seat terminal management DB 3021 (FIG. 6C) in the storage unit 302, control information instructing that the menu be displayed on the display screen of the tablet terminal 31B is transmitted to the "tablet terminal destination information" stored in association with the "smart speaker ID" received together with the order execution instruction information in step S16 (step S17).

The tablet terminal 31B receives the control information from the store server 30 (step S18) and performs processing according to that control information (step S19). In the example of FIG. 8, the cover of the menu is displayed on the display screen of the display unit 313B of the tablet terminal 31B. Specifically, it is extracted from the menu list information stored in the orderable product DB of the storage unit 312B of the tablet terminal 31B and displayed.

In step S15, the "smart speaker ID" of the smart speaker 31A that sent the order information is transmitted. However, if the order execution instruction information generated by the instruction information generating means 1012 concerns the tablet terminal 31B, the store information DB 1021 (FIG. 6A) may be consulted and, in place of that "smart speaker ID", the "tablet terminal ID" or "tablet terminal destination information" associated with it may be transmitted. For example, when the "tablet terminal destination information" is received, the terminal controlling means 3013 of the store server 30 no longer needs to refer to the customer seat terminal management DB 3021 (FIG. 6C) in step S17.

To move from the menu cover shown on the display screen of the tablet terminal 31B to other displays, such as the drink menu page, the meal menu page, the dessert menu page, or the order status page, the user can touch the touch panel that serves as the display screen of the tablet terminal 31B to turn pages or change the display.
In the present embodiment, a case is described in which the display of the tablet terminal 31B is changed and a product is ordered through the smart speaker 31A.

<Display of desired page>
The smart speaker 31A receives the trigger utterance from the user and changes from the standby state to the activated state (step S20). Following the trigger utterance, the user says something, and the voice data receiving means 3111A of the control unit 311A, in cooperation with the microphone 314A, receives the utterance as voice data ("○○ Restaurant, open the drink menu") (step S21).

Next, the control unit 311A authenticates the access keyword (step S22), generates text information (step S23), and transmits order information including the text information (step S24). The processing in steps S21 to S24 is the same as in steps S2 to S5, respectively, so its description is omitted.

When the reception processing means 1011 of the order server 10 receives the order information transmitted by the smart speaker 31A (step S25), it executes order acceptance processing based on the received order information (step S26). Specifically, the order acceptance processing is executed based on the text information included in the order information.
In the example shown in FIG. 9, order execution instruction information, which instructs execution of processing according to the order content, is generated and transmitted to the store server 30 of the store in which the smart speaker 31A that sent the order information is installed.

With reference to the utterance scenario DB 1022 (FIG. 6B) in the storage unit 102, the utterance that remains after the access keyword is removed from the text information is taken as input, and the "next process to be performed" is executed. For example, when order information including the text "○○ Restaurant, open the drink menu" is received, as shown in FIG. 9, the "next process to be performed" after an utterance requesting that a given menu page be opened is to display the requested menu page on the display screen of the tablet terminal 31B paired with the requesting smart speaker 31A. Accordingly, the instruction information generating means 1012 generates order execution instruction information indicating that instruction, and the instruction information transmitting means 1012 transmits it to the store server 30 of the store in which the originating smart speaker 31A is installed (step S27). The specific processing is the same as in step S15, so its description is omitted.

The instruction information receiving means 3011 of the store server 30 receives the order execution instruction information from the order server 10 (step S28). The order executing means 3012 of the store server 30 then executes processing according to the order content based on the order execution instruction information.

In the example of FIG. 9, the terminal controlling means 3013 of the store server 30 treats the tablet terminal 31B associated with the smart speaker 31A identified by the smart speaker ID in the order execution instruction information as the terminal device to be controlled, and controls it based on the order execution instruction information.

Specifically, referring to the customer seat terminal management DB 3021 (FIG. 6C) in the storage unit 302, control information instructing that the drink menu page be displayed on the display screen of the tablet terminal 31B is transmitted to the "tablet terminal destination information" stored in association with the "smart speaker ID" received together with the order execution instruction information in step S28 (step S29).

The tablet terminal 31B receives the control information from the store server 30 (step S30) and performs processing according to that control information (step S31). In the example of FIG. 9, the drink menu page is displayed on the display screen of the display unit 313B of the tablet terminal 31B. Specifically, it is extracted from the menu list information stored in the orderable product DB of the storage unit 312B of the tablet terminal 31B and displayed.

From the drink menu page shown on the display screen of the tablet terminal 31B, the user can order a desired drink by touching the touch panel.
The present embodiment continues with a case in which a desired product is ordered through the smart speaker 31A.

<Ordering a product>
Next, the voice data receiving means 3111A of the control unit 311A, in cooperation with the microphone 314A, receives the user's utterance ("○○ Restaurant, four draft beers!") as voice data (step S32).

Next, the control unit 311A authenticates the access keyword (step S33), generates text information (step S34), and transmits order information including the text information (step S35). If the authenticated state established in step S21 is still in effect, the access keyword authentication in step S33 is unnecessary. The processing in steps S32 to S35 is the same as in steps S2 to S5, respectively, so its description is omitted.

When the reception processing means 1011 of the order server 10 receives the order information transmitted by the smart speaker 31A (step S36), it executes order acceptance processing based on the received order information (step S37). Specifically, the order acceptance processing is executed based on the text information included in the order information.

In the example shown in the first part of FIG. 10, order information including the text "○○ Restaurant, four draft beers!" is received from the smart speaker 31A. The control means 101 then consults the utterance scenario DB 1022 (FIG. 6B) in the storage unit 102 and, from the "utterance scenario" associated with the "store ID" received together with the order information, carries out both the "conversational reply" and the "next process to be performed" for "four draft beers!", that is, for the text information with the access keyword removed.

As the conversational reply, a response is sent to the smart speaker 31A that transmitted the order information. The next process to be performed is to display the order content on the display screen of the tablet terminal 31B.
First, in the example in the first part of FIG. 10, the reply is "Will that be all?". The specific processing is the same as in step S7, so its description is omitted.
On receiving the reply from the order server 10, the smart speaker 31A outputs it from the speaker 313A (step S38).

The instruction information generating means 1012 generates order execution instruction information indicating an instruction to display the order content on the display screen of the tablet terminal 31B, and the instruction information transmitting means 1012 transmits it to the store server 30 of the store in which the smart speaker 31A that sent the order information is installed (step S39). The specific processing is the same as in step S15, so its description is omitted.

The instruction information receiving means 3011 of the store server 30 receives the order execution instruction information from the order server 10 (step S40). The order executing means 3012 of the store server 30 then executes processing according to the order content based on the order execution instruction information.

In the example in the first part of FIG. 10, the terminal control means 3013 of the store server 30 treats the tablet terminal 31B associated with the smart speaker 31A identified by the smart speaker ID in the order execution instruction information as the control target terminal device, and controls it based on the order execution instruction information.

Specifically, the store server 30 refers to the customer seat terminal management DB 3021 (FIG. 6C) of the storage unit 302 and transmits control information instructing that the order contents be displayed on the display screen of the tablet terminal 31B to the "tablet terminal destination information" stored in association with the "smart speaker ID" received together with the order execution instruction information in step S40 (step S41).
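
The lookup in step S41 can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the implementation disclosed here: the table contents, the field names, and the `build_control_message` helper are all hypothetical, and the actual layout of the customer seat terminal management DB 3021 is the one shown in FIG. 6C.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SeatTerminal:
    """One row of a hypothetical customer seat terminal management table."""
    smart_speaker_id: str
    tablet_terminal_id: str
    tablet_destination: str  # assumed to be a network address


# Hypothetical contents; keyed by smart speaker ID for direct lookup.
SEAT_TERMINAL_DB = {
    "spk-001": SeatTerminal("spk-001", "tab-001", "10.0.0.11"),
    "spk-002": SeatTerminal("spk-002", "tab-002", "10.0.0.12"),
}


def build_control_message(instruction: dict) -> tuple:
    """Return (destination, control_info) for the tablet paired with the speaker."""
    speaker_id = instruction["smart_speaker_id"]
    terminal = SEAT_TERMINAL_DB.get(speaker_id)
    if terminal is None:
        raise KeyError(f"no tablet registered for smart speaker {speaker_id!r}")
    control_info = {
        "action": instruction["action"],
        "items": instruction.get("items", []),
    }
    return terminal.tablet_destination, control_info
```

Keying the table by smart speaker ID mirrors the text, in which the ID received with the order execution instruction information is the only value needed to find the paired tablet.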

The tablet terminal 31B receives the control information from the store server 30 (step S42) and performs processing according to that control information (step S43). In the example in the first part of FIG. 10, the order contents are displayed on the display screen of the display unit 313B of the tablet terminal 31B. Specifically, they are extracted from the menu list information stored in the orderable product DB of the storage unit 312B of the tablet terminal 31B and displayed.
After the utterance output in step S38, the user may continue ordering (steps S32 to S43).

The voice data receiving means 3111A of the control unit 311A, in cooperation with the microphone 314A, receives the user's utterance as voice data ("That's all") (step S44).
Next, the control unit 311A generates text information (step S45) and transmits order information including the text information (step S46).

The processing in steps S44 to S46 is the same as that in steps S2, S4, and S5, respectively, so its description is omitted. If the smart speaker 31A is not in the activated state before it receives the utterance as voice data in step S44, it first receives a trigger utterance from the user, switches from the standby state to the activated state, and then performs the processing from step S44 onward. If the authenticated state established in step S33 no longer holds, the access keyword is authenticated following step S44. If the smart speaker is not in the authenticated state and the user does not speak the access keyword, the utterance is treated as addressed to the AI assistant server 40. The exchange with the AI assistant server 40 is described in detail later with reference to FIG. 13.

Next, when the reception processing means 1011 of the order server 10 receives the order information transmitted by the smart speaker 31A (step S47), it executes the order reception process based on the received order information (step S48). Specifically, the order reception process is executed based on the text information included in the order information.

In the example shown in the latter part of FIG. 10, order execution instruction information instructing execution of processing according to the order contents is generated and transmitted to the store server 30 of the store where the smart speaker 31A that sent the order information is installed.

The utterance scenario DB 1022 (FIG. 6B) of the storage unit 102 is referred to, and the "process to be performed next" is executed in response to the utterance in the text information, excluding the access keyword. For example, as when order information containing the text "That's all" shown in the latter part of FIG. 10 is received, after an utterance requesting that the order be finalized, the "process to be performed next" is to show an order confirmation display on the display screen of the tablet terminal 31B paired with the requesting smart speaker 31A. Accordingly, the instruction information generating means 1012 generates order execution instruction information indicating that instruction, and the instruction information transmitting means 1013 transmits it to the store server 30 of the store where the smart speaker 31A that sent the order information is installed (step S49). The specific processing is the same as in step S15, so its description is omitted.
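
The scenario lookup described above can be sketched as follows. This is an illustrative Python sketch only: the scenario table, the access keyword "echigoya", the action names, the fallback reply, and the helper functions are all hypothetical stand-ins, since the actual contents of the utterance scenario DB 1022 are defined by FIG. 6B and are not reproduced in this passage.

```python
# Hypothetical stand-in for the utterance scenario DB 1022:
# store ID -> recognized utterance -> spoken reply and "process to be performed next".
SCENARIO_DB = {
    "store-01": {
        "two beers": {"reply": "Will that be all?", "next_process": "show_order_contents"},
        "that's all": {"reply": None, "next_process": "show_order_confirmation"},
    }
}

# Hypothetical access keyword registered for each store.
ACCESS_KEYWORDS = {"store-01": "echigoya"}


def strip_access_keyword(text: str, keyword: str) -> str:
    """Drop a leading access keyword and surrounding punctuation, then lowercase."""
    stripped = text.strip()
    if stripped.lower().startswith(keyword.lower()):
        stripped = stripped[len(keyword):]
    return stripped.strip(" ,.!?").lower()


def accept_order(order_info: dict) -> dict:
    """Return the reply for the speaker and the instruction for the store server."""
    store_id = order_info["store_id"]
    utterance = strip_access_keyword(order_info["text"], ACCESS_KEYWORDS[store_id])
    scenario = SCENARIO_DB[store_id].get(utterance)
    if scenario is None:
        # Fallback wording is an assumption, not taken from the description.
        return {"reply": "Sorry, I did not catch that.", "instruction": None}
    instruction = {
        "smart_speaker_id": order_info["smart_speaker_id"],
        "action": scenario["next_process"],
    }
    return {"reply": scenario["reply"], "instruction": instruction}
```

The instruction carries the smart speaker ID so that the receiving store server can identify the paired tablet terminal without any other context.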

The instruction information receiving means 3011 of the store server 30 receives the order execution instruction information from the order server 10 (step S50). The order execution means 3012 of the store server 30 then executes processing according to the order contents based on the order execution instruction information.

In the example in the latter part of FIG. 10, the terminal control means 3013 of the store server 30 treats the tablet terminal 31B associated with the smart speaker 31A identified by the smart speaker ID in the order execution instruction information as the control target terminal device, and controls it based on the order execution instruction information.

Specifically, the store server 30 refers to the customer seat terminal management DB 3021 (FIG. 6C) of the storage unit 302 and transmits control information instructing that an order confirmation display be shown on the display screen of the tablet terminal 31B to the "tablet terminal destination information" stored in association with the "smart speaker ID" received together with the order execution instruction information in step S50 (step S51).

The tablet terminal 31B receives the control information from the store server 30 (step S52) and performs processing according to that control information (step S53). In the example in the latter part of FIG. 10, the order confirmation display is shown on the display screen of the display unit 313B of the tablet terminal 31B.

A confirmation button may be displayed together with the order confirmation display, and the formal order may be deemed placed when the user selects that button. In that case, the order information is transmitted to an order terminal device in the kitchen, which presents the order to the cook by displaying it on a monitor or printing it out so that cooking can begin. The store server 30 may also function as the order terminal device. Alternatively, the formal order may be deemed placed when the order execution instruction information is received in step S50. Whether to ask the user for confirmation is a design choice that can be changed as appropriate.

<Various orders>
The voice recognition order system S is not limited to food orders as described above and can handle various kinds of orders.
As shown in the sequence chart of FIG. 11, when the user asks in steps S100 to S104 to be shown "this month's recommendations", the reception processing means 1011 of the order server 10 receives the order information transmitted by the smart speaker 31A (step S105) and executes the order reception process based on the received order information (step S106).

A recommendation video is then played on the display screen of the tablet terminal 31B paired with the requesting smart speaker 31A, via the store server 30 (steps S107 to S111). The recommendation video is extracted from the menu list information stored in the orderable product DB of the storage unit 312B of the tablet terminal 31B and displayed.

As shown in the sequence chart of FIG. 12, when the user asks in steps S120 to S124 for an explanation of the "fresh fish platter", the reception processing means 1011 of the order server 10 receives the order information transmitted by the smart speaker 31A (step S125) and executes the order reception process based on the received order information (step S126).

The control means 101 then refers to the utterance scenario DB 1022 (FIG. 6B) of the storage unit 102, extracts the explanation of the "fresh fish platter" from the "utterance scenario" associated with the "store ID" received together with the order information, and returns it to the smart speaker 31A that sent the order information (step S127).

If the control means 101 refers to the utterance scenario DB 1022 (FIG. 6B) of the storage unit 102 and the product for which an explanation was requested is not in the "product list" associated with the "store ID" received together with the order information, it returns a reply such as "There is no such product."
As described above, the voice recognition order system S can handle various kinds of orders. The orders are not limited to the cases above; an order cancellation, a request for the bill, and the like can also be ordered.
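
The product explanation request and its fallback reply can be sketched as follows. This is a hedged Python sketch: the per-store table, the placeholder explanation text, and the `explain_product` helper are hypothetical, and only the fallback wording "There is no such product." comes from the description.

```python
# Hypothetical per-store product list with the explanation registered for each product.
PRODUCT_LISTS = {
    "store-01": {
        "fresh fish platter": "(explanation text registered by the store)",
    }
}

NOT_FOUND_REPLY = "There is no such product."


def explain_product(store_id: str, product: str) -> str:
    """Return the registered explanation, or the fallback reply if the product is unknown."""
    products = PRODUCT_LISTS.get(store_id, {})
    return products.get(product.strip().lower(), NOT_FOUND_REPLY)
```

Returning a fixed reply for an unknown product keeps the conversation going instead of failing silently, which matches the behavior described for a product that is not in the store's list.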

<When authentication fails>
In the embodiment described above, the authentication means 3112A of the control unit 311A of the smart speaker 31A authenticates whether a predetermined access keyword has been spoken in the voice data received by the voice data receiving means 3111A, and the case where authentication succeeds was taken as the example. The case where authentication fails (authentication not possible) is described below with reference to the sequence chart of FIG. 13.

The smart speaker 31A receives a trigger utterance from the user and switches from the standby state to the activated state (step S130). The user follows the trigger utterance with some utterance, and the voice data receiving means 3111A of the control unit 311A, in cooperation with the microphone 314A, receives it as voice data ("What time is the last train at Ebisu Station on the □□ line?") (step S131).

Next, the control unit 311A authenticates the access keyword (step S132). The utterance is checked against the access keyword stored in advance in the voice recognition DB of the storage unit 312A. If authentication does not succeed, that is, if authentication is not possible (NG), the received voice data is transmitted to the AI assistant server 40 (step S133). The AI assistant server 40 performs conventionally known information processing: it receives the voice data (step S134), performs voice analysis (step S135), and returns the best answer to the smart speaker 31A (steps S136 and S137).

Thus, the smart speaker 31A of this embodiment connects to the order server 10 if authentication of the access keyword succeeds, and connects to the ordinary AI assistant server 40 if the access keyword is not authenticated (authentication not possible). The smart speaker can therefore be incorporated into the voice recognition order system S while retaining its function as a conventionally known smart speaker.
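
The routing decision summarized above can be sketched as follows. This is a minimal Python sketch under stated assumptions: the description authenticates the access keyword on recognized voice data, whereas this sketch matches on already recognized text, and the keyword "echigoya", the function names, and the destination labels are hypothetical.

```python
def contains_access_keyword(text: str, keyword: str) -> bool:
    """Case-insensitive check for the access keyword in recognized text."""
    return keyword.lower() in text.lower()


def route_utterance(text: str, keyword: str, authenticated: bool = False) -> str:
    """Pick the destination server for one utterance.

    `authenticated` models an authenticated state that still holds from an
    earlier utterance, in which case the keyword need not be repeated.
    """
    if authenticated or contains_access_keyword(text, keyword):
        return "order_server"
    return "ai_assistant_server"
```

For example, `route_utterance("Echigoya, two beers", "echigoya")` selects the order server, while a question about train times with no keyword and no active authenticated state falls through to the AI assistant server.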

<Other usage examples>
In the embodiment described above, the voice recognition order system S was applied to a restaurant, but the voice recognition order system S of the present invention can also be applied in other settings. Next, a case where the voice recognition order system S of the present invention is applied to a hotel, an inn, or the like is described.

The customer seat terminal 31 and the store server 30 differ only in name; their configuration is the same as described above. That is, the system consists of the guest room terminals 31 provided in each guest room of each hotel, the order server 10 as the first computer, the front server 30 as the second computer, the service provider device 20 installed by the business operator that runs the voice recognition order system S, and the AI assistant server 40 for the smart speaker 31A, each connected to the network by wire or wirelessly.

The order server 10 is a device that mainly receives order information from the smart speaker 31A installed in each guest room of each hotel and executes the necessary order reception process.
The storage unit 102 stores a hotel information database (DB) 1023, an utterance scenario database (DB) 1024, and the like. FIG. 14A shows an example of the hotel information DB 1023. In this example, "hotel information", which includes the destination information of the front server 30, and an "access keyword" are registered in association with a "hotel ID" that uniquely identifies the hotel. Every hotel is assigned a different hotel ID.

In the "hotel information", the following are registered in association with one another: a "tablet terminal ID" that uniquely identifies the tablet terminal 31B of the guest room terminal 31 installed in each guest room of the hotel, "tablet terminal destination information" for transmitting the necessary control information to each tablet terminal 31B, a "smart speaker ID" as an example of ID information that uniquely identifies the smart speaker 31A, and "smart speaker destination information" that serves as the destination for replies to the smart speaker 31A.

The utterance scenario DB 1024 stores information on a plurality of utterance scenarios for conversations held, through the smart speaker 31A, with a user staying at the hotel. An utterance scenario is the storyline of a conversation exchanged between the user and the order server 10. A user in a hotel guest room passes through a plurality of stay states between entering the room and leaving it for check-out. The utterance scenario DB 1024 stores a separate utterance scenario for each of these states.

FIG. 14B shows an example of the utterance scenario DB 1024. In this example, an "utterance scenario" and a "request list" are registered in the utterance scenario DB 1024 in association with the "hotel ID".
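
The records described for the hotel information DB 1023 and the utterance scenario DB 1024 could be modeled as follows. This is one possible Python sketch, not the schema of FIG. 14A or FIG. 14B: the class names, field names, and sample values are hypothetical, and the two databases are folded into a single record per hotel ID purely for brevity.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class RoomTerminal:
    """Per-room entry inside the "hotel information"."""
    tablet_terminal_id: str
    tablet_destination: str
    smart_speaker_id: str
    smart_speaker_destination: str


@dataclass
class HotelRecord:
    """Everything registered under one hotel ID in this simplified model."""
    hotel_id: str
    front_server_destination: str
    access_keyword: str
    room_terminals: list = field(default_factory=list)
    # stay state -> utterance scenario for that state
    utterance_scenarios: dict = field(default_factory=dict)
    request_list: list = field(default_factory=list)


def find_speaker_destination(hotel: HotelRecord, smart_speaker_id: str) -> str:
    """Look up where replies for a given smart speaker should be sent."""
    for terminal in hotel.room_terminals:
        if terminal.smart_speaker_id == smart_speaker_id:
            return terminal.smart_speaker_destination
    raise KeyError(f"unknown smart speaker {smart_speaker_id!r}")
```

Keeping the scenarios in a mapping keyed by stay state reflects the statement that a separate utterance scenario is stored for each state the guest passes through.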

The front server 30 operates in the same way as the store server 30. It is a device that mainly receives order execution instruction information from the order server 10 and executes processing according to the order contents, such as controlling the tablet terminal 31B. It is installed, for example, at the hotel front desk or in a concierge room.

A guest room terminal 31 is installed in each guest room and consists of a smart speaker 31A and a tablet terminal 31B.
The storage unit 312B of the tablet terminal 31B stores a request content database (DB) (not shown) and the like. The request content DB 3121B registers (stores) information that can be requested, such as room service items, dinner reservations, reservation or change of dinner time, breakfast reservations, reservation or change of breakfast time, and private bath reservations. The request content read from the request content DB is displayed on the display screen.

<Processing operation example 2 of the voice recognition order system S>
FIGS. 15 and 16 are sequence charts illustrating an example of the processing operation when the voice recognition order system S of the present invention is applied to a hotel.

When the user orders a "dinner reservation" in steps S200 to S204, the reception processing means 1011 of the order server 10 receives the order information transmitted by the smart speaker 31A (step S205) and executes the order reception process based on the received order information (step S206).

The available reservation times are then displayed on the display screen of the tablet terminal 31B paired with the requesting smart speaker 31A, via the front server 30 (steps S207 to S211). For example, in step S209 the front server 30 transmits control information that includes the available reservation time information for each restaurant in the hotel. The tablet terminal 31B then extracts the restaurant information stored in the request content DB of the storage unit 312B and displays it in association with the received available reservation time information.
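
The tablet-side step of matching locally stored restaurant information with the received available reservation times can be sketched as follows. This is an illustrative Python sketch: the restaurant IDs, names, time strings, and the `build_reservation_rows` helper are hypothetical, and the actual format of the control information is not specified in the description.

```python
# Hypothetical restaurant information held in the tablet's request content DB.
RESTAURANT_INFO = {
    "washoku": {"name": "Japanese restaurant"},
    "grill": {"name": "Grill restaurant"},
}


def build_reservation_rows(available_times: dict) -> list:
    """Join received available times with locally stored restaurant names."""
    rows = []
    for restaurant_id, times in available_times.items():
        info = RESTAURANT_INFO.get(restaurant_id)
        if info is None:
            continue  # skip restaurants the tablet has no local record for
        rows.append(f"{info['name']}: {', '.join(times)}")
    return rows
```

Under this split, only the time slots that change from day to day travel over the network, while static restaurant details stay on the tablet.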

Subsequently, when the user orders "Reserve the Japanese restaurant from 17:00" in steps S212 to S215, the reception processing means 1011 of the order server 10 receives the order information transmitted by the smart speaker 31A (step S216) and executes the order reception process based on the received order information (step S217).

In the example in the first part of FIG. 16, the control means 101 uses the "utterance scenario" associated with the "hotel ID" received together with the order information, in the utterance scenario DB 1024 (FIG. 14B) of the storage unit 102, to return a conversational reply to "Reserve the Japanese restaurant from 17:00" (the text information in the order information, excluding the access keyword) and to execute the process to be performed next.

As the conversational reply, a response is returned to the smart speaker 31A that sent the order information. The process to be performed next is to display the order contents on the display screen of the tablet terminal 31B.
First, in the example in the first part of FIG. 16, the reply "Is there anything else I can help you with?" is returned and output from the smart speaker 31A (step S218).

In addition, the reservation details are displayed on the display screen of the tablet terminal 31B paired with the requesting smart speaker 31A, via the front server 30 (steps S219 to S223).

Further, when an order is received from the user through the smart speaker 31A in steps S224 to S226, the reception processing means 1011 of the order server 10 receives the order information transmitted by the smart speaker 31A (step S227) and executes the order reception process based on the received order information (step S228). The processing in steps S229 to S234 is the same as the processing described above, so its description is omitted.

As described above, the voice recognition order system S of this embodiment authenticates whether a predetermined access keyword has been spoken, transmits order information to the order server 10 when authentication succeeds, and has the order server 10 execute the order reception process based on that order information. This makes it possible to realize a voice recognition order system S that processes orders using the smart speaker 31A in restaurants, hotels, and the like.

When the order content is such that the order reception process is carried out by returning a conversational reply, the order server 10 can send a predetermined reply to the smart speaker 31A. When there is a "process to be performed next" corresponding to the order content, that process can be executed.

Further, if the process to be performed as the order reception process based on the order information is one that should be delegated to the store server 30, the order execution instruction information is transmitted to the store server 30 so that the store server 30 can execute processing according to the order contents. The store server 30 can then identify the tablet terminal 31B whose tablet terminal ID is associated with the smart speaker ID (ID information) included in the order execution instruction information, and control that tablet terminal 31B as the control target terminal device.

The scope of application of the present invention is not limited to the configuration described above. The present invention can be applied widely to voice recognition order systems and voice recognition order methods that use a smart speaker.

10 Order server (first computer)
101 Control unit
1011 Reception processing means
1012 Instruction information generating means
1013 Instruction information transmitting means
102 Storage unit
1021 Store information database (DB)
1022 Utterance scenario database (DB)
103 Display unit, 104 Input unit, 105 Communication unit
30 Store server (second computer)
301 Control unit
3011 Instruction information receiving means
3013 Terminal control means
302 Storage unit
3021 Customer seat terminal management database (DB)
303 Display unit, 304 Input unit, 305 Communication unit
31 Customer seat terminal
31A Smart speaker
311A Control unit
3111A Voice data receiving means
3112A Authentication means
3113A Text information generating means
3114A Order information transmitting means
312A Storage unit
3121A Voice recognition database (DB)
313A Speaker (voice output means)
314A Microphone (voice input means)
315A Operation unit, 316A Communication unit
31B Tablet terminal (terminal device, control target terminal device)
311B Control unit
312B Storage unit
3121B Orderable product database (DB)
313B Display unit, 314B Input unit, 315B Communication unit
40 AI assistant server

Claims

A voice recognition order system comprising a smart speaker having at least a voice input means and a voice output means, and a first computer connected to the smart speaker via a network so as to be able to exchange information, wherein
the smart speaker has:
a voice data receiving means for receiving voice data obtained when a user speaks while the smart speaker is in a trigger-activated state;
an authentication means for performing recognition on the voice data received by the voice data receiving means and authenticating that a predetermined access keyword has been spoken;
a text information generating means for generating text information that specifies the user's order content based on the voice data received by the voice data receiving means; and
an order information transmitting means for transmitting order information including the text information generated by the text information generating means to the first computer when the authentication means has successfully authenticated that the access keyword was spoken, and
the first computer has
a reception processing means for executing, upon receiving the order information transmitted by the order information transmitting means of the smart speaker, an order reception process based on the received order information.

The voice recognition order system according to claim 1, wherein
the first computer has:
an instruction information generating means for generating, based on the received order information, order execution instruction information instructing execution of a process according to the order content; and
an instruction information transmitting means for transmitting the order execution instruction information generated by the instruction information generating means to a second computer connected to the first computer via a network so as to be able to exchange information, and
the second computer has:
an instruction information receiving means for receiving the order execution instruction information transmitted by the instruction information transmitting means of the first computer; and
an order execution means for executing a process according to the order content based on the order execution instruction information received by the instruction information receiving means.

The voice recognition order system according to claim 2, comprising a plurality of the smart speakers and terminal devices, each associated with one of the smart speakers and capable of accepting operation instructions from a user, wherein
the order execution instruction information includes ID information identifying the smart speaker that is the source of the order information, and
the second computer has a terminal control means for controlling, based on the order execution instruction information, the terminal device associated with the smart speaker identified by the ID information included in the order execution instruction information, as a control target terminal device.

The voice recognition order system according to claim 3, wherein the second computer transmits, to the control target terminal device, control information instructing that the display screen of the control target terminal device show a display according to the order content indicated by the order execution instruction information.

A voice recognition ordering method executed by a smart speaker having at least a voice input means and a voice output means, and a first computer connected to the smart speaker via a network so that information can be exchanged, wherein
the smart speaker performs:
a voice data reception step of receiving voice data obtained when a user speaks while the trigger is activated;
an authentication step of recognizing the received voice data and authenticating that a specified access keyword has been spoken;
a text information generation step of generating text information that identifies the user's order content based on the voice data received in the voice data reception step; and
an order information transmission step of transmitting order information including the text information generated in the text information generation step to the first computer when the authentication that the access keyword has been spoken succeeds in the authentication step, and
the first computer performs:
a reception processing step of executing an order reception process based on the order information upon receiving the order information transmitted from the smart speaker.
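
As a rough sketch of the steps in claim 5, the following Python models the smart speaker side and the reception process of the first computer. It rests on several assumptions that are not in the claims: speech recognition is replaced by decoding UTF-8 text, the access keyword is the hypothetical phrase "open sesame", the order text is whatever remains after removing the keyword, and the order information carries a speaker ID.

```python
from dataclasses import dataclass
from typing import List, Optional

ACCESS_KEYWORD = "open sesame"  # hypothetical access keyword


@dataclass
class OrderInfo:
    speaker_id: str  # assumed field; not recited in claim 5
    text: str        # text information identifying the order content


def recognize(voice_data: bytes) -> str:
    """Stand-in for speech recognition: the 'voice data' is UTF-8 text."""
    return voice_data.decode("utf-8").strip().lower()


def handle_utterance(speaker_id: str, voice_data: bytes,
                     trigger_active: bool) -> Optional[OrderInfo]:
    # Voice data reception step: only accept speech while the trigger is active.
    if not trigger_active:
        return None
    transcript = recognize(voice_data)
    # Authentication step: the access keyword must have been spoken.
    if ACCESS_KEYWORD not in transcript:
        return None
    # Text information generation step: drop the keyword, keep the order.
    order_text = transcript.replace(ACCESS_KEYWORD, "", 1).strip(" ,")
    # Order information to be transmitted to the first computer.
    return OrderInfo(speaker_id, order_text)


class OrderServer:
    """Stand-in for the first computer's reception processing step."""

    def __init__(self) -> None:
        self.accepted: List[OrderInfo] = []

    def receive(self, info: OrderInfo) -> int:
        self.accepted.append(info)
        return len(self.accepted)  # simple running receipt number


server = OrderServer()
info = handle_utterance("spk-1", b"Open sesame, two coffees", True)
if info is not None:
    server.receive(info)
```

An utterance without the keyword, or one spoken while the trigger is inactive, produces no order information, so nothing reaches the server.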

The voice recognition ordering method according to claim 5, wherein
the first computer performs:
an instruction information generation step of generating, upon receiving the order information transmitted from the smart speaker, order execution instruction information instructing execution of a process according to the order content based on the order information; and
an instruction information transmission step of transmitting the order execution instruction information generated in the instruction information generation step to a second computer connected to the first computer via a network so that information can be exchanged, and
the second computer performs:
an instruction information receiving step of receiving the order execution instruction information transmitted from the first computer; and
an order execution step of executing a process according to the order content based on the order execution instruction information received in the instruction information receiving step.
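
The relay between the first and second computers in claim 6 could be sketched as below. This is a minimal illustration under stated assumptions: the network link is replaced by a direct function call, the field names are hypothetical, and "executing a process according to the order content" is reduced to recording a line in a list.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class OrderInfo:
    speaker_id: str
    text: str


@dataclass
class ExecInstruction:
    """Hypothetical shape of the order execution instruction information."""
    speaker_id: str
    order_content: str


class SecondComputer:
    def __init__(self) -> None:
        self.executed: List[str] = []

    def receive(self, instruction: ExecInstruction) -> None:
        # Instruction information receiving step, then order execution step.
        self.executed.append(
            f"{instruction.speaker_id}: {instruction.order_content}")


class FirstComputer:
    def __init__(self, send: Callable[[ExecInstruction], None]) -> None:
        self.send = send  # stands in for transmission over the network

    def on_order(self, info: OrderInfo) -> ExecInstruction:
        # Instruction information generation step.
        instruction = ExecInstruction(info.speaker_id, info.text)
        # Instruction information transmission step.
        self.send(instruction)
        return instruction


second = SecondComputer()
first = FirstComputer(second.receive)
first.on_order(OrderInfo("spk-1", "two coffees"))
print(second.executed)
```

Passing the transmit function into `FirstComputer` keeps the two roles separate, so a real transport could be swapped in without changing either class.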

The voice recognition ordering method according to claim 6, wherein
the order execution instruction information includes ID information identifying the smart speaker that is the source of the order information, and
the second computer, based on the order execution instruction information, controls as a control target terminal device, from among a plurality of terminal devices each associated with one of a plurality of the smart speakers and operable by a user, the terminal device associated with the smart speaker specified by the ID information included in the order execution instruction information.

The voice recognition ordering method according to claim 7, wherein the second computer transmits, to the control target terminal device, control information instructing that the display screen of the control target terminal device show a display corresponding to the order content indicated by the order execution instruction information.