JP2019191795A

JP2019191795A - Customer service support system

Info

Publication number: JP2019191795A
Application number: JP2018082131A
Authority: JP
Inventors: 和夫金子; Kazuo Kaneko
Original assignee: Individual
Current assignee: Individual
Priority date: 2018-04-23
Filing date: 2018-04-23
Publication date: 2019-10-31
Anticipated expiration: 2038-04-23
Also published as: JP6535783B1; WO2019208327A1

Abstract

PROBLEM TO BE SOLVED: To quickly grasp customer requests. SOLUTION: An information processing apparatus (terminal 100) according to the present invention includes imaging means (imaging unit 180) for capturing an image of the scene around a user, audio acquisition means (audio input unit 190) for acquiring what the user says, analysis means (analysis unit 130, information generation unit 110) for analyzing the acquired audio on the basis of the captured image, and output means (output unit 120) for outputting the result of the analysis. SELECTED DRAWING: Figure 3

Description

The present invention relates to a technology for supporting customer service.

Systems exist that support taking product orders from customers, checkout, and similar tasks. With such a system, ordered items can be served to a customer without the customer having to tell a member of staff the details of the order verbally.

JP 2014-63516 A

However, when a customer is unfamiliar with operating the device, or when a request is by its nature difficult to handle through the terminal device, the customer ends up calling a member of staff. In practice, though, budget constraints, labor shortages, and similar factors make it difficult to assign enough customer-service staff. As a result, customer satisfaction may fall. Moreover, many customers who are dissatisfied with the service hesitate to call staff in the first place, and in that case their dissatisfaction cannot be resolved.
Furthermore, even if staff respond appropriately and the customer's request is eventually met, the customer has often already built up some dissatisfaction or stress by the time they call for staff, so the poor impression of the store's service is not necessarily dispelled completely.

Conventional customer service support systems therefore leave room for improvement. An object of the present invention is to grasp a customer's request quickly.

In one aspect, the present invention provides an information processing apparatus comprising: image acquisition means for acquiring an image showing the scene around a user; audio acquisition means for acquiring audio relating to an utterance of the user; analysis means for analyzing the utterance on the basis of the acquired image; and output means for outputting the result of the analysis.

According to the present invention, a customer's request can be grasped quickly.

FIG. 1 is a schematic diagram of the customer service support system 10.
FIG. 2 shows how the terminal 100 is installed.
FIG. 3 is a functional block diagram of the terminal 100.
FIG. 4 is a functional block diagram of the server 200.
FIG. 5 shows an example of the operation of the terminal 100.
FIG. 6 shows an example of content displayed on the screen of the terminal 100 (part 1).
FIG. 7 shows an example of content displayed on the screen of the terminal 100 (part 2).
FIG. 8 shows an example of content displayed on the screen of the terminal 100 (part 3).
FIG. 9 shows an example of content displayed on the screen of the terminal 100 (part 4).
FIG. 10 shows an example of content displayed on the screen of the terminal 100 (part 5).
FIG. 11 shows an example of content displayed on the screen of the terminal 100 (part 6).
FIG. 12 shows an example of content displayed on the screen of the terminal 100 (part 7).
FIG. 13 shows an example of content displayed on the screen of the terminal 100 (part 8).

FIG. 1 shows an overview of the customer service support system 10. The customer service support system 10 includes terminals 100, a server 200, and employee terminals 300. Staff members STF1 and STF2 are employees who work on the floor and serve customers. Such employees include waitresses and waiters who carry dishes to the tables, and floor managers who supervise them. Staff member STF3 is a cook who prepares food in the kitchen. The employees' roles need not be clearly defined; hereinafter they are simply referred to as employees.

The terminals 100 (terminals 100-1, 100-2, 100-3) are each provided for one or more customers receiving service. The number of terminals 100 shown is only an example.
The server 200 communicates with the terminals 100 and the employee terminals 300 (employee terminals 300-1, 300-2, 300-3).
Each employee terminal 300 (employee terminals 300-1, 300-2, 300-3) is an information processing terminal that is carried by an employee or installed in the store. It can communicate wirelessly with the server 200 and can give notifications by image or voice, and it uses these functions to notify employees such as customer-service staff and cooks. When an employee terminal 300 receives a customer-service or other instruction from the server 200, it notifies the employee. The number of employee terminals 300 shown in the figure is only an example.

FIG. 2 shows how the terminal 100 is installed. At least one terminal 100 is provided per table, per counter, or per user or user group receiving service. In this example, the terminal 100 is a tablet-type information terminal placed on a table TBL1 at which four customers CST1 to CST4 (hereinafter, users) receiving service are seated. The terminal 100 includes a microphone camera 101, a speaker 102, and a touch panel 103. The form and installation position of the terminal 100 are only examples; it may, for example, be wall-mounted.

The microphone camera 101 captures a scene that includes the table TBL1 and the customers CST, and picks up ambient sound that includes the customers' utterances.
The speaker 102 notifies the user of information by voice.
The touch panel 103 is a display device with a touch panel. The customer CST operates it to enter information and instructions into the terminal 100 in order to receive services such as ordering and calling staff, and it displays the service content and other information to the customer CST as text and images.

The table, the food and drink served, the manner in which food and drink are provided, the seating, the seat positions, and the number of people are all examples.
The microphone camera 101 and/or the speaker 102 may be provided separately from the touch panel 103. The speaker 102 may also be omitted, in which case information is conveyed to the user by text and images only. In short, it suffices that the terminal 100 lets the user perform input operations for placing orders and making other service requests and browse information about the service, such as the menu, while the terminal 100 acquires the voice of at least one user and an image showing the scene around the users. The scene around the users includes at least one of the following: an image of the table reflecting, for example, the dishes served, their number, and how they are being consumed; and an image showing how the users are receiving the service (eating and drinking), that is, the users' situation (what they are doing, such as eating or drinking, the manner in which they do it, such as their eating speed, their facial expressions, and so on). The acquired image may be a still image or a moving image.

FIG. 3 is a functional block diagram of the terminal 100. The terminal 100 includes an information generation unit 110, an output unit 120, an analysis unit 130, an operation unit 140, a timekeeping unit 150, a storage unit 160, an imaging unit 180, an image processing unit 181, an audio input unit 190, and an audio signal processing unit 191.

The operation unit 140 is implemented, for example, as the touch panel 103 and is operated by the user.

The timekeeping unit 150 is implemented by a clock circuit or the like and provides current-time information, which the information generation unit 110 refers to as needed.

The imaging unit 180 and the image processing unit 181 are implemented as the microphone camera 101. The imaging unit 180 includes an optical system, lenses, an imaging mechanism, a light-receiving element, and a control mechanism, and it captures the scene around the terminal 100 (in other words, around the users). The image processing unit 181 is implemented by an image processor. It uses a known algorithm to extract features of the captured scene from the image data captured by the imaging unit 180. The features include, for example, whether there are plates on the table, how much of the food on the plates has been eaten, the users' facial expressions, and the content and timing of various movements, such as movements of the users' mouths and chopsticks. Feature extraction is performed, for example, once every predetermined number of frames. The imaging unit 180 and the image processing unit 181 function as image acquisition means for acquiring an image showing the scene around the users. The extracted features are supplied to the analysis unit 130 in real time or at predetermined time intervals.
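The per-frame sampling described above can be sketched as follows, assuming a feature extractor is supplied as a callable. `SceneFeatures`, its fields, `sample_features`, and the default interval of 30 frames are hypothetical illustrations and do not come from the patent. The "known algorithm" itself is left abstract.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, Tuple


@dataclass
class SceneFeatures:
    """Hypothetical container for the kinds of features named in the text."""
    plates_on_table: int      # plates detected on the table
    consumed_ratio: float     # 0.0 = untouched, 1.0 = plates empty
    expression: str           # e.g. "smile", "neutral", "tense"
    chopsticks_moving: bool   # chopstick movement detected in this frame


def sample_features(
    frames: Iterable[object],
    extract: Callable[[object], SceneFeatures],
    every_n: int = 30,
) -> Iterator[Tuple[int, SceneFeatures]]:
    """Run the (assumed) feature extractor on every `every_n`-th frame only.

    Yields (frame_index, features) pairs, i.e. what would be passed on to
    the analysis unit 130.
    """
    if every_n < 1:
        raise ValueError("every_n must be at least 1")
    for index, frame in enumerate(frames):
        if index % every_n == 0:
            yield index, extract(frame)
```

With a 30 fps camera and `every_n=30`, features would be extracted roughly once per second. The actual interval is not specified in the source.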

The audio input unit 190 and the audio signal processing unit 191 are implemented as the microphone camera 101. The audio input unit 190 is a condenser microphone. The audio signal processing unit 191 includes an audio signal processor that applies noise removal and speech recognition to the waveform data acquired by the audio input unit 190 to generate text data. The audio input unit 190 and the audio signal processing unit 191 function as audio acquisition means for acquiring audio relating to the users' utterances. That is, the users' utterances are picked up, and information characterizing their content (words, phrases, and the like) is generated. This information is supplied to the analysis unit 130.

The analysis unit 130 is implemented by one or more general-purpose or dedicated processors. It functions as analysis means for analyzing a user's utterance using the image acquired by the image acquisition means and the audio acquired by the audio acquisition means. Here, analyzing an utterance means more than identifying the dictionary meaning of the words extracted by speech analysis: it also means determining the background and circumstances of the utterance, its intent, and so on. In general, even when it is detected that a certain word was uttered, its meaning may not be determinable unambiguously (as with a demonstrative pronoun). Even when its dictionary meaning is clear, it may be impossible to tell the circumstances or background, for example whether the speaker was talking to themselves or giving an instruction to someone else.

In view of this, the analysis unit 130 uses the image showing the scene to identify the content of the utterance, and also attempts to extract deeper-level information such as intent and background.
For example, the analysis unit 130 analyzes the acquired audio on the basis of the captured image and determines, from the captured action of the user, whether the acquired utterance represents the content of an order. The analysis result is supplied to the information generation unit 110. Alternatively, the analysis unit 130 determines whether the utterance represents the content of an order according to the progress of the meal, as determined from the captured image. In doing so, when a word extracted by speech analysis is a meaningful word (that is, not a demonstrative pronoun, an interjection, or the like), the utterance may be analyzed using the meaning of that word in addition to the image.
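The following is a minimal heuristic sketch of the order determination described above. It assumes that the user's action arrives as a label derived from the image and that meal progress is a number from 0 to 1. The vocabularies, action labels, threshold, and the rule that a bare demonstrative counts only when the user is pointing at the menu are all hypothetical choices made for illustration, not the patent's method.

```python
from dataclasses import dataclass
from typing import FrozenSet, List

# Hypothetical vocabularies; a deployed system would use the store's menu
# and a proper morphological analyser instead of hard-coded sets.
MENU_ITEMS: FrozenSet[str] = frozenset({"tuna", "salmon", "squid", "tea"})
DEMONSTRATIVES: FrozenSet[str] = frozenset({"this", "that", "these", "those"})
ORDER_ACTIONS: FrozenSet[str] = frozenset(
    {"looking_at_terminal", "pointing_at_menu", "facing_staff"}
)


@dataclass
class Observation:
    words: List[str]       # words recognised in the utterance
    action: str            # user action label derived from the image
    meal_progress: float   # 0.0 = just served, 1.0 = plates empty


def is_order(obs: Observation, progress_threshold: float = 0.7) -> bool:
    """Heuristic sketch: does this utterance represent an order?"""
    names_item = any(w in MENU_ITEMS for w in obs.words)
    only_demonstratives = bool(obs.words) and all(
        w in DEMONSTRATIVES for w in obs.words
    )
    # "That one" carries no meaning by itself; only the image can resolve it.
    if only_demonstratives:
        return obs.action == "pointing_at_menu"
    # A meaningful word is taken into account in addition to the image.
    if not names_item:
        return False
    # Naming a menu item counts as an order only if the user's action or the
    # progress of the meal supports it (otherwise it is probably conversation).
    return obs.action in ORDER_ACTIONS or obs.meal_progress >= progress_threshold
```

For example, "tuna" said while chatting early in the meal is treated as conversation. The same word said while looking at the terminal is treated as an order.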

When the utterance is determined to represent the content of an order, flag information marking it as an ordered item is attached to the text information indicating the ordered item, and the result is supplied to the information generation unit 110. The information generation unit 110 generates order information from the supplied information.
When the utterance is determined not to represent the content of an order, it is then determined whether it expresses the user's impressions, evaluations, dissatisfaction, requests, or the like regarding the products and/or services (hereinafter, "impressions"). If so, the audio data for that part of the utterance is clipped out, or converted into text information indicating what was said. Flag information indicating impression information is then attached, and the result is supplied to the information generation unit 110.
When the utterance is determined to be neither an order nor an impression, no information is supplied to the information generation unit 110.
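The three-way branch above can be sketched as a small routing function. It assumes that classification has already produced one of three labels. The enum, the `Payload` shape, and the flag strings `"order_item"` and `"impression"` are hypothetical names chosen for illustration.

```python
from enum import Enum
from typing import Optional, TypedDict


class Kind(Enum):
    ORDER = "order"
    IMPRESSION = "impression"
    OTHER = "other"


class Payload(TypedDict):
    flag: str   # marks the payload as an ordered item or an impression
    text: str   # ordered item, or the recognised text of the utterance


def route_utterance(kind: Kind, text: str) -> Optional[Payload]:
    """Attach the appropriate flag, or return None for other utterances."""
    if kind is Kind.ORDER:
        return {"flag": "order_item", "text": text}
    if kind is Kind.IMPRESSION:
        return {"flag": "impression", "text": text}
    # Neither an order nor an impression: nothing goes to unit 110.
    return None
```

Returning `None` models the case in which nothing is passed on to the information generation unit 110.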

The information generation unit 110 is implemented by one or more general-purpose or dedicated processors and controls each part of the terminal 100 by reading a control program from the storage unit 160 and executing it. It works from the information (analysis results) supplied by the analysis unit 130 and the signals supplied by the operation unit 140 and the timekeeping unit 150, using information read from the storage unit 160 as necessary. From these inputs it generates information containing a notification execution command and the content of the notification, and outputs that information to the output unit 120.
More specifically, this information is generated from at least one of the following: one or more analysis results generated by the analysis unit 130; the operation content acquired from the operation unit 140; an analysis result from the analysis unit 130 combined with an operation signal input from the operation unit 140; the timing information acquired from the timekeeping unit 150; and the user attributes read from the storage unit 160.

The information generated by the information generation unit 110 is of four types. The first is information about an order accepted via the operation unit 140 or the audio input unit 190 (the ordered items and when they are to be served). It is provided mainly to the employee terminal 300-3 (hereinafter, order information). The second is information transmitted mainly to the employee terminals 300-1 and 300-2. It describes whether an employee needs to attend to the table and what specific service is required (for example, asking the user what they want, bringing water, or clearing plates), and it is generated either from a signal input via the operation unit 140 or by the information generation unit 110 from the information supplied by the analysis unit 130 (hereinafter, response-need information). The third is information supplied to the display unit 121 or the audio output unit 122 for giving the user various notifications, suggestions, requests, and the like (hereinafter, user support information). The fourth is information on the user's impressions, evaluations, dissatisfaction, requests, and the like regarding the products and/or services (hereinafter, feedback information).
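As a compact summary of the four information types and their primary recipients, the following sketch may help. The enum and the destination strings are hypothetical identifiers. The feedback destination (server 200) is taken from the description of the communication unit 123. Information bound for employee terminals is assumed to travel via the server 200.

```python
from enum import Enum
from typing import Dict, Tuple


class InfoType(Enum):
    ORDER = "order information"
    RESPONSE_NEED = "response-need information"
    USER_SUPPORT = "user support information"
    FEEDBACK = "feedback information"


# Primary recipients as described in the text. Information bound for the
# employee terminals travels via the server 200; the string labels are
# hypothetical identifiers.
DESTINATIONS: Dict[InfoType, Tuple[str, ...]] = {
    InfoType.ORDER: ("employee_terminal_300-3",),
    InfoType.RESPONSE_NEED: ("employee_terminal_300-1", "employee_terminal_300-2"),
    InfoType.USER_SUPPORT: ("display_unit_121", "audio_output_unit_122"),
    InfoType.FEEDBACK: ("server_200",),
}


def destinations(info: InfoType) -> Tuple[str, ...]:
    """Return where a piece of generated information is primarily sent."""
    return DESTINATIONS[info]
```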

Response-need information and user support information are generated, for example, as follows.
Several situations suggest that the user has some dissatisfaction or request. The captured user actions may suggest that an ordering operation is not going well (for example, the operation unit 140 detects that the user is repeating the same operation). The analysis unit 130 may detect utterances such as "This isn't working...", "Huh?", or "I can't find the tuna...". Or the analysis unit 130 may find in the captured image a scene such as empty plates piling up on the table, a user's tense facial expression, or a user looking around restlessly (frequent, abrupt changes in face direction). In these cases, the information generation unit 110 infers that the user has some dissatisfaction or request, for example that the operation is not going well, that they want plates cleared, or that they do not know how to pay. Based on what it has inferred, it determines whether to generate a screen that assists the operation and whether an employee needs to attend, and it generates response-need information and user support information according to the result.
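The rules above map observable cues to inferred needs. A minimal rule-based sketch follows. The phrase list, thresholds, and need labels are hypothetical, chosen only to show the shape of the mapping. A real system would take these cues from the operation unit 140 and the analysis unit 130.

```python
from dataclasses import dataclass, field
from typing import List, Set

# Hypothetical phrase list and thresholds, chosen only for illustration.
TROUBLE_PHRASES = ("isn't working", "huh?", "can't find")
EMPTY_PLATE_LIMIT = 5
RESTLESS_TURNS_PER_MIN = 20.0


@dataclass
class Cues:
    repeated_operation: bool = False       # same touch operation repeated
    utterances: List[str] = field(default_factory=list)
    empty_plates: int = 0                  # empty plates seen on the table
    tense_expression: bool = False
    head_turns_per_min: float = 0.0        # changes of face direction


def infer_needs(c: Cues) -> Set[str]:
    """Map observed cues to inferred needs (sketch of the rules above)."""
    needs: Set[str] = set()
    said_trouble = any(p in u.lower() for u in c.utterances for p in TROUBLE_PHRASES)
    if c.repeated_operation or said_trouble:
        needs.add("operation_help")    # -> user support information (help screen)
    if c.empty_plates >= EMPTY_PLATE_LIMIT:
        needs.add("clear_plates")      # -> response-need information for floor staff
    if c.tense_expression or c.head_turns_per_min >= RESTLESS_TURNS_PER_MIN:
        needs.add("staff_attention")   # -> ask an employee to check on the table
    return needs
```

Each returned label would then decide whether the terminal shows a help screen, sends response-need information to floor staff, or both.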

The user support information may include information identifying a product or service that suits the captured scene (such as a recommended product), information that assists order-related operations, and a message prompting the user to share impressions or an evaluation. The user support information may also include an image corresponding to that information. Such an image may show, for example, information about a product or service chosen according to the content (quantity or type) of the user's order or the speed or timing of ordering. It may also be unrelated to any product or service, such as a character or avatar image. Displaying such images makes ordering more entertaining.

As for product suggestions within the user support information, the analysis results may indicate that the user is laughing, in a good mood, or satisfied, or that plates are being emptied (the pace of eating and drinking) faster than the standard. In that case, the information generation unit 110 may determine that the user should be encouraged to repeat the immediately preceding order. The pace of eating and drinking is determined, for example, by storing in the storage unit 160 the standard time needed to eat or drink one item and comparing that standard time with the time calculated from the analysis results of the analysis unit 130.
The user attributes stored in the storage unit 160 for the photographed user may include information on the products the user usually orders or on when they usually visit. If the current order differs from the usual one, the information generation unit 110 may determine that the usual products should be suggested, or that products should be suggested based on the order histories of other users with similar user attributes.
The analysis unit 130 may also identify an utterance presumed to express an impression or evaluation of a product or service, such as "This is really delicious!", "Let's come to this place again", or "What was with that employee's attitude just now...". In that case, the information generation unit 110 may judge that an opportunity to ask for an evaluation has arrived and determine that the request should be made to the user.
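The pace comparison reduces to a ratio of the stored standard time to the observed time per item. The following is a minimal sketch assuming both are given in seconds. The mood labels and the 1.2 margin that counts as "faster than the standard" are hypothetical.

```python
POSITIVE_MOODS = frozenset({"laughing", "good_mood", "satisfied"})


def pace_ratio(standard_seconds: float, observed_seconds: float) -> float:
    """Ratio of standard to observed time per item; > 1.0 means faster."""
    if observed_seconds <= 0:
        raise ValueError("observed_seconds must be positive")
    return standard_seconds / observed_seconds


def suggest_repeat_order(
    standard_seconds: float,
    observed_seconds: float,
    mood: str,
    margin: float = 1.2,
) -> bool:
    """Suggest repeating the previous order if the user seems pleased
    or is finishing items noticeably faster than the standard pace."""
    fast = pace_ratio(standard_seconds, observed_seconds) >= margin
    return mood in POSITIVE_MOODS or fast
```

For instance, a five-minute standard against a 200-second observation gives a ratio of 1.5, which this sketch treats as a fast pace.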

As for order information, the information generation unit 110 may adjust when food and drink are served based on the timing of the order accepted from the operation unit 140 and the scene analysis results generated by the analysis unit 130. For example, the analysis unit 130 may find that the users are talking a lot, that the food on the table is being eaten slowly, or that chopsticks or forks are held only briefly. In that case, for an order accepted via the operation unit 140, the information generation unit 110 generates an instruction to deliberately delay serving and includes it in the order information. Conversely, when it judges that the users are eating and drinking quickly, it expedites serving of the items in that order by generating flag information indicating that the order takes priority over other users' orders. The serving timing may also be determined based on the types of dishes already served and on user attributes.
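The timing rule above can be sketched as a three-way decision. It assumes that talk time and utensil-holding time are measured as fractions of an observation window and that the eating pace is expressed relative to the standard pace. Every threshold here is hypothetical, and checking the slow-down signals first is one arbitrary way to resolve conflicting cues.

```python
from enum import Enum


class Serving(Enum):
    DELAY = "delay"          # intentionally hold the dish back
    PRIORITY = "priority"    # flag the order ahead of other tables
    NORMAL = "normal"


def serving_timing(
    talk_ratio: float,         # share of time the table spends talking
    consumption_pace: float,   # observed pace / standard pace (1.0 = standard)
    utensil_ratio: float,      # share of time chopsticks/forks are held
) -> Serving:
    """Sketch of the timing rule; all thresholds are hypothetical."""
    slow = talk_ratio >= 0.6 or consumption_pace <= 0.5 or utensil_ratio <= 0.2
    if slow:
        return Serving.DELAY
    if consumption_pace >= 1.5:
        return Serving.PRIORITY
    return Serving.NORMAL
```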

The feedback information may include the utterance itself (audio data) or video (such as facial expressions) as captured. Alternatively, the information generation unit 110 may generate information by processing the audio and image data. For example, it may generate, linked to the order information, the identification ID of the ordered item, the order time, and a satisfaction index (for example, "2" on a five-point scale).
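The processed feedback item in the example (item ID, order time, five-point satisfaction) maps naturally onto a small record type. The following sketch uses hypothetical field names and an ISO-8601 serialisation, neither of which is specified in the source.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Union


@dataclass(frozen=True)
class FeedbackRecord:
    item_id: str           # identification ID of the ordered item
    ordered_at: datetime   # order time
    satisfaction: int      # satisfaction index on a five-point scale

    def __post_init__(self) -> None:
        if not 1 <= self.satisfaction <= 5:
            raise ValueError("satisfaction must be between 1 and 5")

    def to_dict(self) -> Dict[str, Union[str, int]]:
        """Serialisable form, e.g. for sending to the server 200."""
        return {
            "item_id": self.item_id,
            "ordered_at": self.ordered_at.isoformat(),
            "satisfaction": self.satisfaction,
        }
```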

The storage unit 160 is a storage device such as a hard disk or semiconductor memory. It stores an OS program for realizing the operation of the terminal 100 described later, together with a program that generates the screens (user interface) used to display menus and accept orders.
The storage unit 160 also stores user attributes. User attributes may include information such as the user's age and sex, as well as the user's relationship with products and services (order history, visit history, visit frequency, and the like).

The storage unit 160 also stores image and audio data acquired by the imaging unit 180 and the audio input unit 190 that reflect the scene, or the analysis results generated by the analysis unit 130 (including the content of the users' utterances). Data generated from the imaging unit 180 and the audio input unit 190 is likely to contain information relating to the users' privacy. It is therefore preferably deleted after a certain period, or stored only after the analysis unit 130 has converted it into information from which individuals cannot be identified.
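The preferred privacy handling can be sketched as a retention pass that drops raw media after a fixed period and keeps only anonymised summaries. It assumes each stored item carries a capture time, optional raw bytes, and an optional anonymised summary. The 30-minute window and the `StoredItem` type are hypothetical.

```python
from dataclasses import dataclass, replace
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass(frozen=True)
class StoredItem:
    captured_at: datetime
    raw: Optional[bytes]       # raw image/audio that may identify individuals
    summary: Optional[str]     # anonymised analysis result, if any


def enforce_retention(
    items: List[StoredItem],
    now: datetime,
    max_age: timedelta = timedelta(minutes=30),
) -> List[StoredItem]:
    """Strip raw media older than max_age; keep only anonymised summaries."""
    kept: List[StoredItem] = []
    for item in items:
        if now - item.captured_at > max_age:
            if item.summary is None:
                continue                       # nothing non-identifying to keep
            item = replace(item, raw=None)     # drop the raw media
        kept.append(item)
    return kept
```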

The output unit 120 includes a display unit 121, an audio output unit 122, and a communication unit 123. It outputs the information supplied from the information generation unit 110, which reflects the result of the analysis by the analysis unit 130. Specifically, the display unit 121 and the audio output unit 122 give notifications to the user, and the communication unit 123 provides information to the server 200 or the employee terminals 300.

The display unit 121 includes a liquid crystal panel and a drive circuit and is implemented as the touch panel 103. The display unit 121 notifies the user of information by text and images. The audio output unit 122 includes a D/A conversion circuit and the like and is implemented as the speaker 102. The audio output unit 122 notifies the user of information by voice. The display unit 121 and the audio output unit 122 may give the same information or different information; in the former case, one of them may be omitted.

The communication unit 123 is implemented as a communication module that transmits information to the server 200 in accordance with a predetermined wireless communication standard, such as an IEEE standard. It transmits order information, response-need information, and feedback information to the server 200. Each piece of information is transmitted together with information identifying the terminal 100 that sent it, and is stored in the server 200.
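Tagging each message with its sending terminal and filing it by that tag on the server can be sketched as follows. The source does not specify a wire format. JSON, the field names, and the `kind` values below are assumptions made for illustration.

```python
import json
from typing import Any, Dict, List

ALLOWED_KINDS = {"order", "response_need", "feedback"}


def make_envelope(terminal_id: str, kind: str, body: Dict[str, Any]) -> str:
    """Wrap a payload with the ID of the sending terminal 100."""
    if kind not in ALLOWED_KINDS:
        raise ValueError(f"unknown kind: {kind}")
    return json.dumps({"terminal_id": terminal_id, "kind": kind, "body": body})


def store(db: Dict[str, List[Dict[str, Any]]], message: str) -> None:
    """Server side: file each message under the terminal that sent it."""
    msg = json.loads(message)
    db.setdefault(msg["terminal_id"], []).append(msg)
```

Keying storage by terminal ID lets the server trace every order, call, or piece of feedback back to a specific table.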

FIG. 6 shows a screen SC1 displayed on the touch panel 103 by the output unit 120. In this example, the terminal 100 is installed in a conveyor-belt sushi restaurant and can accept orders through user operations. Specifically, the screen SC1 consists of objects OB10, OB11, OB12, OB13, OB20, OB30, OB31, OB40, OB41, and OB42. The user selects a desired object by touching it.

The objects OB10, OB11, OB12, and OB13 are used to select a product category. The object OB20 displays a list of products (sushi, in this example) from which the user selects what they want to order. The object OB30 displays the products selected in the object OB20. When the object OB31 is selected, the order for the products displayed in the object OB30 is confirmed.

The objects OB40 and OB41 are objects (soft buttons) for switching screens. The object OB42 is selected when the user wants to call an employee.
To place an order, the user selects an object displayed in the object OB20, drags it into the object OB30, and then selects the object OB31. The information generation unit 110 then generates order information, which is transmitted to the server 200 via the communication unit 123. When the object OB42 is selected, the information generation unit 110 generates an employee call instruction, which is transmitted to the server 200 via the communication unit 123.
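The touch-panel flow of screen SC1 (drag an item from OB20 into OB30, confirm with OB31, or call an employee with OB42), combined with the communication unit 123 tagging every message with the sending terminal's identifier, can be sketched as below. This is a minimal illustration under assumptions: the class name `OrderPanel`, the message fields, and the `send` callback are hypothetical and do not come from the specification.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class OrderPanel:
    """Hypothetical model of screen SC1: cart (OB30), confirm (OB31), staff call (OB42)."""
    terminal_id: str
    send: Callable[[Dict], None]  # stands in for the communication unit 123
    cart: List[str] = field(default_factory=list)

    def drop_into_cart(self, item: str) -> None:
        # Dragging an item from the product list (OB20) into OB30.
        self.cart.append(item)

    def confirm(self) -> None:
        # Selecting OB31: build order information tagged with the terminal ID.
        if self.cart:
            self.send({"terminal_id": self.terminal_id, "type": "order",
                       "items": list(self.cart)})
            self.cart.clear()

    def call_staff(self) -> None:
        # Selecting OB42: send an employee call instruction.
        self.send({"terminal_id": self.terminal_id, "type": "staff_call"})


outbox: List[Dict] = []
panel = OrderPanel(terminal_id="T-12", send=outbox.append)
panel.drop_into_cart("tuna")
panel.drop_into_cart("salmon")
panel.confirm()
panel.call_staff()
```

Because every outgoing message carries `terminal_id`, the server can store and route it per table without any further context.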

FIG. 4 is a functional block diagram of the server 200. The server 200 includes a communication unit 210, a control unit 220, and a storage unit 240. The communication unit 210 is a communication interface for exchanging information with each terminal 100 and each employee terminal 300. The storage unit 240 is an information storage device such as a hard disk or semiconductor memory, and stores a program for implementing the operations of the server 200 described below.

The control unit 220 includes a staff management unit 221, an order management unit 222, and a feedback management unit 223. When notification information received from a terminal 100 contains information indicating that the user has called for staff, or other information indicating that a response by an employee is required, the staff management unit 221 transmits an instruction to at least one of the employee terminals 300-1, 300-2, and 300-3 that corresponds to the notification information.

Upon receiving order information from a terminal 100, the staff management unit 221 transmits it to the employee terminal 300-3. The staff management unit 221 may also rewrite the information on order timing contained in the order information, taking into account the status of orders received from other tables.

The feedback management unit 223 transmits to the employee terminals 300 information on the content of orders received from the terminal 100 and on the timing of their delivery, as well as information on employee calls. Specifically, order information is transmitted to the employee terminal 300-3, and response necessity information is transmitted to at least one of the employee terminals 300-1 and 300-2.
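The routing rules above amount to a dispatch on message type: order information goes to the employee terminal 300-3, and calls or response necessity information go to 300-1 and/or 300-2. The sketch below assumes a hypothetical message format with a `type` field, and uses round-robin to choose between 300-1 and 300-2; the specification leaves that choice open, so round-robin is only one possible policy.

```python
from itertools import cycle
from typing import Dict, List

KITCHEN = "300-3"                         # receives order information
FLOOR_STAFF = cycle(["300-1", "300-2"])   # assumption: alternate between floor terminals


def route(message: Dict) -> List[str]:
    """Return the employee terminals that should receive a message from a terminal 100."""
    kind = message["type"]
    if kind == "order":
        return [KITCHEN]
    if kind in ("staff_call", "response_needed"):
        return [next(FLOOR_STAFF)]
    return []  # e.g. feedback is only stored on the server 200


print(route({"type": "order"}))            # ['300-3']
print(route({"type": "staff_call"}))       # ['300-1']
print(route({"type": "response_needed"}))  # ['300-2']
```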

FIG. 5 shows an example of the operation of the terminal 100. At predetermined timings, the terminal 100 intermittently acquires images (S502) and audio (S504), and analyzes the acquired image data and audio data (S506). The audio acquisition timing need not coincide with the video acquisition timing; it suffices that the acquired images and audio are associated with each other in time.
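Since images and audio are captured on independent schedules and only need to be associated in time, one simple way to make that association is to pair each audio segment with the image frame whose timestamp is closest. The sketch assumes timestamps in seconds and a sorted list of frame times; nearest-neighbour matching is our choice for illustration, not a method stated in the text.

```python
from bisect import bisect_left
from typing import List, Tuple


def nearest_frame(frame_times: List[float], t: float) -> int:
    """Index of the frame whose timestamp is closest to t (frame_times must be sorted)."""
    i = bisect_left(frame_times, t)
    if i == 0:
        return 0
    if i == len(frame_times):
        return len(frame_times) - 1
    # Choose the closer neighbour; an exact tie goes to the earlier frame.
    return i if frame_times[i] - t < t - frame_times[i - 1] else i - 1


def align(frame_times: List[float], audio_times: List[float]) -> List[Tuple[float, float]]:
    """Pair each audio segment start time with the nearest image frame time."""
    return [(a, frame_times[nearest_frame(frame_times, a)]) for a in audio_times]


frames = [0.0, 1.0, 2.0, 3.0]   # image acquisition times (S502)
audio = [0.4, 1.6, 2.5, 7.0]    # audio acquisition times (S504)
print(align(frames, audio))     # [(0.4, 0.0), (1.6, 2.0), (2.5, 2.0), (7.0, 3.0)]
```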

Based on the result of the analysis by the analysis unit 130, the information generation unit 110 determines whether order support is needed, for example whether the user is having trouble entering an order (S508). If it determines that support is needed (S508: YES), it generates a screen for supporting the order operation (S510).

FIG. 7 shows a screen SC2 displayed on the touch panel 103 in S510. On the screen SC2, objects OB50 to OB53 are displayed overlaid on the normal screen. The objects OB50 to OB53 are displayed when it is estimated that the user probably wants to order tuna, for example when the user keeps browsing the same page and selecting the same item without dragging it to the object OB30, or when the user keeps looking at the screen for a predetermined period or longer while saying "tuna".
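The two cues given for FIG. 7 (repeatedly selecting the same item without dragging it to OB30, or naming an item while looking at the screen for a predetermined period) can be written as a simple rule. This is an illustrative sketch only: the `Observation` structure, the repeat count of 3, and the 5-second gaze threshold are assumptions, not values from the specification.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List, Optional

# Assumed thresholds; the text only speaks of repetition and a "predetermined period".
REPEAT_THRESHOLD = 3
GAZE_SECONDS = 5.0


@dataclass
class Observation:
    selected_items: List[str] = field(default_factory=list)  # tapped in OB20, never dragged to OB30
    spoken_words: List[str] = field(default_factory=list)    # words recognized from speech
    gaze_seconds: float = 0.0                                 # continuous time looking at the screen


def estimate_wanted_item(obs: Observation, menu: List[str]) -> Optional[str]:
    """Return the item the user probably wants (trigger for screen SC2), or None."""
    counts = Counter(obs.selected_items)
    if counts:
        item, n = counts.most_common(1)[0]
        if n >= REPEAT_THRESHOLD:
            return item
    if obs.gaze_seconds >= GAZE_SECONDS:
        for word in obs.spoken_words:
            if word in menu:
                return word
    return None


menu = ["tuna", "salmon", "egg"]
print(estimate_wanted_item(Observation(selected_items=["tuna"] * 3), menu))              # tuna
print(estimate_wanted_item(Observation(spoken_words=["tuna"], gaze_seconds=6.0), menu))  # tuna
print(estimate_wanted_item(Observation(spoken_words=["tuna"], gaze_seconds=1.0), menu))  # None
```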

When the user, having read the content of the object OB50, selects the object OB51, the tuna order is confirmed without any operation on the objects OB30 and OB31. Specifically, the information generation unit 110 generates order information indicating that tuna has been ordered.

When the user selects the object OB52, a screen explaining the operation (not shown) is displayed. When the user selects the object OB53, display of the objects OB50 to OB53 ends and the display returns to the screen of FIG. 6.

If order support is not needed (S508: NO), the information generation unit 110 determines whether it is appropriate to propose a product (S512). If it is (S512: YES), the information generation unit 110 generates proposal content, which is displayed on the touch panel 103 (S514).

The information generation unit 110 determines that a drink should be proposed when it estimates that the user wants a drink, for example when the beverage in a glass on the table is running low, when several items other than soups have been ordered in succession, when no order has been placed for a predetermined time since the previous order, or when the user has ordered an item likely to cause thirst, such as dried fish.
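The drink-proposal triggers listed above form a disjunction of independent cues, so a rule-based check is a natural first sketch. The `TableState` structure, the numeric thresholds (25 % fill, 3 consecutive items, 10 minutes), and the item sets are all assumptions for illustration; the text only says "low", "several", and "a predetermined time".

```python
from dataclasses import dataclass, field
from typing import List

# All thresholds and item sets are illustrative assumptions.
LOW_FILL = 0.25              # glass counts as "running low" below 25 % full
CONSECUTIVE_ITEMS = 3        # "several items in succession"
IDLE_MINUTES = 10.0          # "a predetermined time" since the previous order
THIRST_INDUCING = {"dried fish"}
SOUPS = {"miso soup", "clam soup"}


@dataclass
class TableState:
    glass_fill: float                                  # 0.0 (empty) to 1.0 (full), from image analysis
    minutes_since_last_order: float
    orders: List[str] = field(default_factory=list)    # oldest first


def should_propose_drink(s: TableState) -> bool:
    """True if any cue suggests the user would like a drink (S512: YES)."""
    recent = s.orders[-CONSECUTIVE_ITEMS:]
    run_without_soup = (len(recent) == CONSECUTIVE_ITEMS
                        and not any(item in SOUPS for item in recent))
    return (s.glass_fill < LOW_FILL
            or run_without_soup
            or s.minutes_since_last_order >= IDLE_MINUTES
            or any(item in THIRST_INDUCING for item in s.orders))
```

For example, `should_propose_drink(TableState(0.9, 1.0, ["tuna", "salmon", "egg"]))` is true because of the run of non-soup items, while the same order with `"miso soup"` in the middle is not.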

FIG. 8 shows a screen SC3 displayed on the touch panel 103. On the screen SC3, objects OB54, OB55, OB56, and OB57 are displayed overlaid. This example shows a case where it has been determined that a drink should be proposed.
When a user interested in the proposal in the object OB54 selects the object OB55, a menu (not shown) is displayed. When a user who has already decided what to order selects the object OB56, the terminal 100 accepts an utterance, determines that the item mentioned in it is the order, and confirms that order. When a user not interested in the proposal selects the object OB57, the objects OB54 to OB57 are cleared and the display returns to the previous screen.

Alternatively, the information generation unit 110 may decide which item to propose based on the tendency of the items ordered so far and/or preference information described in the user attributes.
FIG. 9 shows a screen SC4 displayed on the terminal 100. On the screen SC4, objects OB58, OB59, and OB60 are displayed overlaid. When a user interested in the message in the object OB58 selects the object OB59, a list of recommended items (not shown) is displayed. When a user not interested in the message selects the object OB60, the objects OB58 to OB60 are cleared and the display returns to the previous screen.
Alternatively, instead of merely announcing that recommended items exist, the recommendation itself may be presented, as shown in FIG. 10 (object OB61).
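Choosing a recommendation from order tendency and stated preferences can be sketched as a simple scoring rule. This is one possible approach, not the method of the specification: the category-count score, the +1 preference bonus, and the `menu` mapping of item to category are all hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set


def recommend(history: List[str], menu: Dict[str, str],
              preferred_categories: Set[str], k: int = 2) -> List[str]:
    """Rank items not yet ordered.

    Score = number of past orders in the item's category (order tendency)
    plus 1 if that category appears in the user's preference attributes."""
    category_counts = Counter(menu[item] for item in history if item in menu)
    candidates = [item for item in menu if item not in history]

    def score(item: str) -> int:
        category = menu[item]
        return category_counts[category] + (1 if category in preferred_categories else 0)

    # sorted() is stable even with reverse=True, so ties keep menu order.
    return sorted(candidates, key=score, reverse=True)[:k]


menu = {"tuna": "fish", "salmon": "fish", "yellowtail": "fish",
        "egg": "other", "cucumber roll": "roll"}
print(recommend(["tuna", "salmon"], menu, {"roll"}))  # ['yellowtail', 'cucumber roll']
```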

If the information generation unit 110 determines that a proposal is not appropriate at present (S512: NO), it determines whether the utterance acquired by the audio input unit 190 expresses an impression or opinion about the products or services provided (S516). Specifically, the information generation unit 110 determines from the utterance content whether a remark has been made that is presumed to be an impression of, for example, the staff's attitude or an ordered dish. If it determines that the utterance is such an impression, it asks the user for consent to provide the audio and images capturing that impression, acquired by the terminal 100, to the operator of the customer service support system 10 or to the management of the store in which the terminal 100 is installed (S518).

If the user consents (S520: YES), the information generation unit 110 transmits the utterance content, temporarily stored in the storage unit 160, to the server 200 in association with the time at which the utterance was acquired and with the attributes of the user who made it (or of users sharing the same scene as that user). The server 200 stores the received utterance content in the storage unit 240 (S522).

FIG. 11 shows a screen SC6 displayed on the terminal 100 in S518. On the screen SC6, objects OB62, OB63, and OB64 are displayed overlaid. When a user who has read the content of the object OB62 and agrees to provide the impression selects the object OB63, the information generation unit 110 reads from the storage unit 160 the audio or video data representing the impression, or data generated from that audio or video data, generates feedback information, and transmits it to the server 200 via the communication unit 123. When a user who does not agree selects the object OB64 (S520: NO), the objects OB62 to OB64 are cleared and the display returns to the previous screen.
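The consent step (S518 to S522) means an impression is held locally and only leaves the terminal after the user agrees. A minimal sketch under stated assumptions: the `Impression` and `FeedbackBuffer` names and the upload payload are hypothetical, raw audio/video is omitted, and discarding the impression on refusal is our assumption (the text only says the screen is cleared).

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class Impression:
    text: str                     # recognized utterance; raw audio/video omitted here
    captured_at: float            # when the utterance was acquired
    user_attributes: Dict[str, str] = field(default_factory=dict)


class FeedbackBuffer:
    """Holds one impression locally (cf. storage unit 160) until the user
    answers the consent prompt (cf. screen SC6)."""

    def __init__(self, upload: Callable[[Dict], None]) -> None:
        self.upload = upload      # stands in for transmission to the server 200
        self.pending: Optional[Impression] = None

    def hold(self, impression: Impression) -> None:
        self.pending = impression

    def answer(self, consented: bool) -> None:
        if self.pending is None:
            return
        if consented:             # OB63 selected (S520: YES)
            p = self.pending
            self.upload({"text": p.text, "captured_at": p.captured_at,
                         "attributes": p.user_attributes})
        # Assumption: on refusal (OB64, S520: NO) the held impression is discarded.
        self.pending = None


sent = []
buf = FeedbackBuffer(sent.append)
buf.hold(Impression("the tuna was great", 1200.0, {"gender": "female"}))
buf.answer(True)
buf.hold(Impression("the staff was slow", 1300.0))
buf.answer(False)
```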

If no impression has been detected (S516: NO), the information generation unit 110 next determines whether an employee needs to attend to the user (S524). For example, the information generation unit 110 determines that an employee response is needed when the utterance contains an expression asking for an employee, or when loud voices between users or behavior resembling a quarrel are detected.
If it determines that a response is needed (S524: YES), the information generation unit 110 generates response necessity information and transmits it to the server 200 via the communication unit 123. If it determines that no response is needed (S524: NO), the process returns to S502.
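The S524 check combines a text cue (a request for staff in the utterance) with acoustic and visual cues (loud voices, quarrel-like behavior). The sketch measures loudness as RMS level in dBFS; the trigger phrases, the -10 dBFS threshold, and the `quarrel_seen` flag (standing in for an image-analysis result that is not modeled here) are hypothetical.

```python
import math
from typing import Sequence

# Hypothetical trigger phrases and an assumed loudness threshold.
STAFF_PHRASES = ("excuse me", "can someone come", "call the staff")
LOUD_DBFS = -10.0


def rms_dbfs(samples: Sequence[float]) -> float:
    """RMS level of samples normalized to [-1, 1], in dB relative to full scale."""
    if not samples:
        return float("-inf")
    rms = math.sqrt(sum(x * x for x in samples) / len(samples))
    return 20 * math.log10(rms) if rms > 0 else float("-inf")


def needs_staff(utterance: str, samples: Sequence[float],
                quarrel_seen: bool = False) -> bool:
    """S524: True if an employee should attend to the table."""
    text = utterance.lower()
    asks_for_staff = any(phrase in text for phrase in STAFF_PHRASES)
    too_loud = rms_dbfs(samples) >= LOUD_DBFS
    return asks_for_staff or too_loud or quarrel_seen
```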

FIG. 12 shows a screen SC7 displayed on the terminal 100 when it has been determined that a response is needed. On the screen SC7, objects OB65, OB66, and OB67 are displayed overlaid. When the user selects the object OB66, response necessity information is generated and provided to the server 200 (S526).

When the information generation unit 110 has identified the specific request to the employee, a screen SC8 such as that shown in FIG. 13 may be displayed. When the object OB66 is selected on the screen SC8, information instructing that the table be cleared is transmitted via the server 200 to the employee terminal 300-1 or 300-2. When the object OB68 is selected, the objects OB68 to OB70 are cleared and the display returns to the previous screen.
Thereafter, steps S502 to S526 are repeated.

According to the embodiment described above, the scene is analyzed from both audio and video. Based on the result of this analysis, the customer's wishes regarding products and services can be estimated, and a service matching the estimated wishes can be provided. When these wishes concern customer service, the need for a response is determined automatically, and an employee is instructed to attend only when necessary. This avoids having to station excess staff to maintain service quality.
In addition, according to the embodiment, the customer's impressions of the service are acquired only after obtaining the customer's prior consent. Because the request for consent is made at a timing determined to be appropriate based on the scene, the risk of offending the customer is reduced and the customer is more likely to cooperate with the survey.

The determination steps S508, S512, S516, and S524 in FIG. 5 need not all be executed; it suffices that at least one of them is executed. The timing and order in which the determination steps are executed are also arbitrary.
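Because any subset of the determination steps may be enabled and in any order, the loop of FIG. 5 maps naturally onto an ordered list of check functions. A minimal sketch, assuming each check reads the analyzed scene and returns an action or `None`, and that the first check to fire ends the pass; the check functions and scene flags are hypothetical stand-ins for the real analyses.

```python
from typing import Callable, Dict, List, Optional, Tuple

# A check inspects the analyzed scene and returns an action, or None if it does not fire.
Check = Callable[[Dict[str, bool]], Optional[str]]


def run_cycle(scene: Dict[str, bool],
              checks: List[Tuple[str, Check]]) -> Optional[Tuple[str, str]]:
    """One pass over the enabled determination steps.

    Returns (step, action) for the first check that fires, or None, in which
    case the caller goes back to image/audio acquisition (S502)."""
    for step, check in checks:
        action = check(scene)
        if action is not None:
            return step, action
    return None


# Hypothetical stand-ins for the real analyses; each reads one flag.
def order_support(scene: Dict[str, bool]) -> Optional[str]:
    return "show SC2" if scene.get("stuck_on_item") else None


def drink_proposal(scene: Dict[str, bool]) -> Optional[str]:
    return "show SC3" if scene.get("glass_low") else None


def impression(scene: Dict[str, bool]) -> Optional[str]:
    return "show SC6" if scene.get("said_impression") else None


def staff_needed(scene: Dict[str, bool]) -> Optional[str]:
    return "show SC7" if scene.get("asked_for_staff") else None


checks = [("S508", order_support), ("S512", drink_proposal),
          ("S516", impression), ("S524", staff_needed)]
print(run_cycle({"glass_low": True, "asked_for_staff": True}, checks))  # ('S512', 'show SC3')
print(run_cycle({}, checks))                                            # None
```

Reordering or dropping entries in `checks` changes which steps run and their priority, which is the flexibility the paragraph above allows.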

The terminal 100 may determine whether the user is ready to place an order, for example when the user has taken a seat, and, if so, notify the user accordingly.
Specifically, the analysis unit 130 detects the arrival of a user from the acquired video data and also determines the user's attribute information (number of people, gender, age, etc.). The information generation unit 110 controls the display unit 121 and the audio output unit 122 based on the result of this determination. For example, it selects and displays either a pre-stored image of a waitress or a pre-stored image of a waiter according to the user's gender.
Then, according to the user's age, it selects either a polite guidance message such as "Welcome. Please speak to me when you have decided on your order. You can also use the touch panel." or a casual one such as "Hi there! Just talk to this terminal when you've decided what to order. Using the touch panel instead is fine too!", and outputs the selected message as synthesized speech. This reassures the user that the service provider has indeed noticed them, lets even first-time visitors place orders smoothly, and allows the guidance method to be varied according to the user's attributes.
When the captured scene contains multiple users with different attributes such as age and gender, a single attribute may be determined based on the commonality or predominance of the attribute values (for example, treating a group of three men and two women as male), or by statistical processing (for example, calculating the average age).
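Reducing a mixed group to one attribute set, as described above, can be sketched with a majority vote for gender and a mean for age. The greeting selection uses an assumed age cut-off of 20 for the casual message; the specification only says the message depends on age, so both the cut-off and its direction are illustrative assumptions.

```python
from collections import Counter
from statistics import mean
from typing import Any, Dict, List

YOUNG_AGE = 20  # assumed cut-off for the casual greeting

POLITE = ("Welcome. Please speak to me when you have decided on your order. "
          "You can also use the touch panel.")
CASUAL = ("Hi there! Just talk to this terminal when you've decided what to order. "
          "Using the touch panel instead is fine too!")


def group_attributes(people: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Reduce per-person attributes to a single attribute set for the group."""
    genders = Counter(p["gender"] for p in people)
    return {
        "count": len(people),
        "gender": genders.most_common(1)[0][0],  # predominant value; ties keep first seen
        "age": mean(p["age"] for p in people),    # statistical processing: average age
    }


def pick_greeting(attrs: Dict[str, Any]) -> str:
    return CASUAL if attrs["age"] < YOUNG_AGE else POLITE


group = [{"gender": "male", "age": 30}, {"gender": "male", "age": 40},
         {"gender": "male", "age": 35}, {"gender": "female", "age": 20},
         {"gender": "female", "age": 25}]
attrs = group_attributes(group)   # three men and two women -> "male", mean age 30
```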

Further, while no user is detected, the terminal 100 may operate in a power saving mode in which the functions of the display unit 121 and the audio output unit 122 needed to accept orders are disabled, and may activate the display unit 121 and the audio output unit 122 when a user is detected. In this case, the user does not need to perform any operation to start the terminal 100.
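The power-saving behavior is a two-state machine driven by person detection. In the sketch below, the terminal wakes as soon as a person is detected; returning to sleep only after a number of consecutive frames without a person is our assumption (a simple hysteresis so the screen does not flicker), and `TerminalPower` is a hypothetical name.

```python
class TerminalPower:
    """Display unit 121 and audio output unit 122 stay off until a person is detected."""

    def __init__(self, idle_frames_to_sleep: int = 30) -> None:
        self.active = False
        self.idle_frames = 0
        # Assumption: sleep again only after this many consecutive empty frames.
        self.idle_frames_to_sleep = idle_frames_to_sleep

    def on_frame(self, person_detected: bool) -> bool:
        """Update the state for one analyzed frame and return whether outputs are on."""
        if person_detected:
            self.idle_frames = 0
            self.active = True   # wake the display and audio output
        else:
            self.idle_frames += 1
            if self.idle_frames >= self.idle_frames_to_sleep:
                self.active = False
        return self.active
```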

The server 200 may accumulate order information in the storage unit 240, and the control unit 220 may generate, from the accumulated information, information on order tendencies for each season or day of the week, and use it to decide when to purchase ingredients and how to plan staffing. This is expected to contribute to fresher dishes, less waste, fewer sold-out items, and more efficient staffing (for example, adjusting the number of part-time workers).
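A day-of-week order tendency can be as simple as the average quantity per item on past occurrences of that weekday. This sketch assumes a hypothetical record format `(business day, item, quantity)`; grouping by season would work the same way with a month-to-season key instead of `date.weekday()`.

```python
from collections import Counter
from datetime import date
from typing import Dict, Iterable, Set, Tuple

Order = Tuple[date, str, int]  # hypothetical record: (business day, item, quantity)


def expected_demand(orders: Iterable[Order], weekday: int) -> Dict[str, float]:
    """Average quantity per item on the given weekday (Monday == 0)."""
    totals: Counter = Counter()
    days: Set[date] = set()
    for day, item, qty in orders:
        if day.weekday() == weekday:
            totals[item] += qty
            days.add(day)
    if not days:
        return {}
    return {item: qty / len(days) for item, qty in totals.items()}


history = [
    (date(2018, 4, 23), "tuna", 40),   # Monday
    (date(2018, 4, 30), "tuna", 60),   # Monday
    (date(2018, 4, 24), "tuna", 10),   # Tuesday
]
print(expected_demand(history, 0))     # {'tuna': 50.0}
```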

In the embodiment above, order content is specified mainly by touch operation, with voice ordering playing a supporting role. Alternatively, voice orders may be accepted as the primary input, with input via the touch panel 103 prompted only when needed. Alternatively, only one of touch-operation orders and voice-input orders may be accepted.

Alternatively, the user may be allowed to choose touch operation, voice input, or both. For example, an object for enabling ordering by touch operation, by voice input, or both is displayed on the screen, and the user selects or switches the desired ordering method. When the terminal 100 is operating mainly with voice ordering and the analysis unit 130 cannot make out the meaning of an utterance (the order content), the user is prompted to speak again. If the order content still cannot be determined after several utterances, for example because of unclear articulation or because the spoken word is not registered in the audio signal processing unit 191, the information generation unit 110 uses the display unit 121 and the audio output unit 122 to prompt the user to enter the order on the touch panel 103. When the information generation unit 110 determines that voice ordering is not working well, it may also transmit response information to the server 200 instructing an employee to go to the user and take the order verbally.
As a result, user satisfaction is expected to improve.
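The voice-ordering fallback is a bounded retry loop: accept the first recognizable menu word, otherwise re-prompt, and after several failures switch to the touch panel or ask for an employee. The retry limit of 3 and the return values are assumptions for illustration, and word matching is deliberately naive (single-word menu items only).

```python
from typing import Iterable, Optional, Set, Tuple

MAX_ATTEMPTS = 3  # assumed number of tries before falling back


def take_voice_order(utterances: Iterable[str], menu: Set[str],
                     escalate_to_staff: bool = False) -> Tuple[str, Optional[str]]:
    """Try to read an order from successive utterances.

    Returns ("voice", item) once a menu word is recognized. After MAX_ATTEMPTS
    failures it returns ("touch", None) to prompt for touch-panel input, or
    ("staff", None) if an employee should be sent to take the order."""
    for attempt, text in enumerate(utterances, start=1):
        matches = [word for word in text.lower().split() if word in menu]
        if matches:
            return "voice", matches[0]
        if attempt >= MAX_ATTEMPTS:
            break
        # Otherwise the terminal would prompt the user to speak again here.
    return ("staff", None) if escalate_to_staff else ("touch", None)
```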

The services to which the present invention applies are not limited to serving food and drink; they may include, for example, the sale of goods. Nor is the place of service limited to a store; it may be, for example, a passenger aircraft or a train. For example, the terminal 100 may be installed on the front of each seat in an aircraft or railway car (the back of the seatback in front) and used for food and drink service provided by the crew in the cabin or car.

The terminal 100 may be provided with a function for paying the charge for the service received. For example, the terminal 100 is equipped with a credit card reader and lets the user choose one of several payment methods. For a user who wishes to pay in cash, it generates a screen directing them to the cashier; for a user who wishes to pay by card, it generates a screen guiding them through inserting the card.

The timing for guiding the user to, or assisting with, payment may be determined based on scene information. For example, the information generation unit 110 determines that the meal has ended when no order has been entered for a predetermined time since the last order was received, when no movement of chopsticks or forks has been observed for a certain period, when a dish typically ordered last, such as dessert, has been ordered, or when an utterance suggesting the end of ordering, such as "I'm full, so full," "I guess that's it," or "Well, shall we pay...," is detected. Alternatively, the meal may be determined to have ended when the analysis unit 130 detects that the user has stood up from the chair. In that case, if an ordered item has not yet been served, food remains on the table, or the user's belongings such as a bag are still detected, it may instead be determined that the user has left the seat only temporarily.
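The end-of-meal logic above, including the temporary-absence exception, can be sketched as a small classifier over scene features. The `MealScene` fields are hypothetical outputs of the analysis unit 130, and the 15-minute and 5-minute thresholds are assumptions standing in for "a predetermined time" and "a certain period".

```python
from dataclasses import dataclass

# Assumed thresholds; the text says only "a predetermined time" / "a certain period".
NO_ORDER_MINUTES = 15.0
UTENSILS_IDLE_MINUTES = 5.0


@dataclass
class MealScene:
    minutes_since_last_order: float = 0.0
    minutes_utensils_idle: float = 0.0   # no chopstick/fork movement observed
    ordered_last_course: bool = False    # e.g. dessert was ordered
    said_closing_phrase: bool = False    # e.g. "I'm full", "shall we pay?"
    stood_up: bool = False
    unserved_items: int = 0
    food_on_table: bool = False
    belongings_left: bool = False        # e.g. a bag still detected at the seat


def meal_state(s: MealScene) -> str:
    """Return "finished", "temporary_absence" or "eating"."""
    if s.stood_up:
        if s.unserved_items > 0 or s.food_on_table or s.belongings_left:
            return "temporary_absence"
        return "finished"
    if (s.minutes_since_last_order >= NO_ORDER_MINUTES
            or s.minutes_utensils_idle >= UTENSILS_IDLE_MINUTES
            or s.ordered_last_course
            or s.said_closing_phrase):
        return "finished"
    return "eating"
```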

When it determines that the meal has ended, the information generation unit 110 controls the output unit 120 to display an image of a waitress or waiter on the screen and to play synthesized speech such as "Thank you very much for coming today. We look forward to seeing you again." Alternatively, the display unit 121 may show a message such as "Would you like some tea to finish?" This lets the service provider convey to the user that it recognizes that ordering has finished.

Next, the terminal 100 performs processing to guide the user to payment. For example, when the user accepts the offer of tea, or when the tea is served, the terminal 100 causes the display unit 121 and/or the audio output unit 122 to output, by video or audio, a message supporting the payment process, such as "You can pay with the card reader at this table or at the cashier."

If payment at the cashier is selected, the terminal 100 may, for example, announce "Please give your table number at the cashier." The order content and amount due accepted by the terminal 100 associated with that table number are then obtained from the server 200, and a bill is generated from the obtained information and presented to the user. Alternatively, the terminal 100 announces "Please pay at the cashier," extracts an image of the user's face, or information identifying the user from that image, associates it with the identification information of the terminal 100, and transmits it wirelessly to a register terminal (a cash register or other payment terminal installed in the store and operated by an employee; not shown).

When this user then visits the cashier, it is determined whether an image captured by the camera of the register terminal installed there matches the user information that the register terminal has received. If they match, the server 200 is queried with the identifier of the terminal 100 received in association with that user's authentication information, and the order content (the content of the services provided) and the amount due stored in the server 200 in association with that terminal 100 are obtained from the server 200.
This eliminates the need for an employee to bring a bill to the table and for the user to carry the bill to the cashier when paying.
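The cashier-side lookup can be sketched as: match the face features captured at the register against those sent by each terminal 100, then total the orders stored for the matched terminal. The feature vectors are assumed to come from some face-embedding model that is not shown, and cosine similarity with a 0.9 threshold is an illustrative choice, not the matching method of the specification; all names are hypothetical.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

MATCH_THRESHOLD = 0.9  # assumed minimum cosine similarity for a match


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def find_terminal(captured: Sequence[float],
                  registered: Dict[str, Sequence[float]]) -> Optional[str]:
    """Terminal ID whose registered face features best match the register camera image."""
    best_id, best_sim = None, MATCH_THRESHOLD
    for terminal_id, features in registered.items():
        sim = cosine(captured, features)
        if sim >= best_sim:
            best_id, best_sim = terminal_id, sim
    return best_id


def amount_due(captured: Sequence[float], registered: Dict[str, Sequence[float]],
               orders_by_terminal: Dict[str, List[Tuple[str, int]]]) -> Optional[Tuple[str, int]]:
    """Look up the matched terminal's orders (as stored on the server 200) and total them."""
    terminal_id = find_terminal(captured, registered)
    if terminal_id is None:
        return None
    return terminal_id, sum(price for _, price in orders_by_terminal.get(terminal_id, []))


registered = {"T-12": [0.9, 0.1, 0.4], "T-07": [0.1, 0.9, 0.2]}
orders = {"T-12": [("tuna", 240), ("beer", 500)], "T-07": [("egg", 120)]}
print(amount_due([0.88, 0.12, 0.41], registered, orders))  # ('T-12', 740)
print(amount_due([0.0, 0.0, 1.0], registered, orders))     # None
```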

Some of the functions of the terminal 100 shown in FIG. 3 may also be executed by the server 200. For example, the server 200 may be given the function of the information generation unit 110. In this case, the information generated by the analysis unit 130 is transmitted successively to the server 200, which generates the order information, user support information, response necessity information, and feedback information; the terminal 100 receives the user support information from the server 200 and notifies the user.
In short, it suffices that an information processing system made up of one or more information processing apparatuses executes a step of capturing a scene around the user, a step of acquiring the content spoken by the user, a step of analyzing the acquired audio based on the captured image, and a step of outputting the result of the analysis.

DESCRIPTION OF SYMBOLS: 100 ... terminal, 101 ... microphone camera, 102 ... speaker, 103 ... touch panel, 180 ... imaging unit, 181 ... image processing unit, 190 ... audio input unit, 191 ... audio signal processing unit, 130 ... analysis unit, 140 ... operation unit, 150 ... timer unit, 110 ... information generation unit, 160 ... storage unit, 120 ... output unit, 121 ... display unit, 122 ... audio output unit, 123 ... communication unit, 200 ... server, 210 ... communication unit, 240 ... storage unit, 220 ... control unit, 221 ... staff management unit, 223 ... feedback management unit, 222 ... order management unit

In one aspect, the present invention provides an information processing apparatus comprising: image acquisition means for acquiring an image showing a scene around a user; audio acquisition means for acquiring audio of the user's utterance and identifying the content of the utterance; analysis means for determining the intent of the user's utterance based on the user's actions shown in the acquired image; and output means for outputting information according to the result of the determination.

In another aspect, the present invention provides an information processing apparatus comprising: image acquisition means for acquiring an image showing a scene around a user; audio acquisition means for acquiring audio of the user's utterance and identifying the content of the utterance; analysis means for determining the intent of the user's utterance based on the user's actions shown in the acquired image and, based on the result of the determination, generating response necessity information indicating whether, or how, an employee needs to attend to the user; and output means for outputting the response necessity information to an employee terminal.

Claims

Image acquisition means for acquiring an image showing a scene around the user;
Voice acquisition means for acquiring voice related to the user's utterance;
Analyzing means for analyzing the utterance based on the acquired image;
An information processing apparatus comprising: output means for outputting a result of the analysis.

The image acquisition means captures a place where a product is placed or where a service is provided to the user;
The information processing apparatus according to claim 1.

3. The information processing apparatus according to claim 1 or 2, wherein
the analysis means determines, on the basis of the user's action shown in the captured image, whether or not the user's utterance represents the content of an order.

4. The information processing apparatus according to claim 3, further comprising
notification means for notifying the user,
wherein the notification means requests the user to approve the content of the order so determined.
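Claims 3 and 4 together describe an order flow. First, the apparatus judges from the user's action whether the utterance is an order. Second, it asks the user to approve that order. The minimal sketch below assumes a hypothetical menu with prices, hypothetical action labels, and a plain yes/no reply as the form of approval.

```python
from typing import Dict, List, Optional

# Hypothetical menu (item -> price in yen); a real system would load store data.
MENU = {"salmon": 1200, "beer": 600, "rice": 300}

# Hypothetical action labels taken as evidence that the user is ordering.
ORDERING_ACTIONS = {"looking_at_menu", "pointing_at_menu"}


def extract_order(action: str, utterance: str) -> Optional[List[str]]:
    """Return the menu items named in the utterance if it is judged to be an order."""
    if action not in ORDERING_ACTIONS:
        return None  # the image does not suggest ordering
    words = utterance.lower().replace(",", " ").split()
    items = [w for w in words if w in MENU]
    return items or None


def approval_request(items: List[str]) -> str:
    """Message the notification means could show to obtain the user's approval."""
    total = sum(MENU[i] for i in items)
    return f"Order {', '.join(items)} for {total} yen? (yes/no)"


def confirm(items: List[str], answer: str, orders: Dict[str, int]) -> bool:
    """Register the order only if the user approves it."""
    if answer.strip().lower() not in {"yes", "y"}:
        return False
    for item in items:
        orders[item] = orders.get(item, 0) + 1
    return True
```

No item is added to `orders` until the user answers yes. As a result, a misjudged utterance costs the user one confirmation prompt rather than producing a wrong order.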

5. The information processing apparatus according to any one of claims 1 to 4, wherein
the analysis means determines whether or not the acquired utterance represents the content of an order according to the progress of the user's eating and drinking, as determined from the captured image.
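One reading of claim 5 is that an ambiguous request such as "one more" is more likely to be an order when the plate or glass in the image is nearly empty. The sketch assumes an upstream image analyser that reports the fraction of each item still remaining. The cue phrase and the 0.8 progress threshold are hypothetical.

```python
from typing import Dict, Optional

# Hypothetical threshold: an item counts as "nearly finished" once at least
# 80% of it has been consumed.
FINISHED_THRESHOLD = 0.8


def progress(remaining_fraction: float) -> float:
    """Eating/drinking progress in [0, 1] from the fraction left on the plate or in the glass."""
    return 1.0 - min(max(remaining_fraction, 0.0), 1.0)


def reorder_target(utterance: str, remaining: Dict[str, float]) -> Optional[str]:
    """If an ambiguous request like 'one more' is judged to be an order, return the item."""
    if "one more" not in utterance.lower():
        return None
    # Consider the item closest to being finished; default covers an empty table.
    item, left = min(remaining.items(), key=lambda kv: kv[1], default=(None, 1.0))
    if item is not None and progress(left) >= FINISHED_THRESHOLD:
        return item
    return None  # nothing is nearly finished, so do not treat it as an order
```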

6. The information processing apparatus according to any one of claims 1 to 5, wherein
the analysis means further takes into account the meaning of the words in the acquired utterance when determining whether or not the utterance represents the content of an order.
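Claim 6 adds the meaning of the words themselves as a second source of evidence. A simple way to combine the two sources is a weighted score. The sketch below uses cue words, weights and a threshold that are all hypothetical. Under these values, an explicit word such as "order" is enough on its own. A weaker cue such as "please" counts as an order only when the image also suggests ordering.

```python
# Hypothetical cue words and the weight each contributes as lexical evidence.
ORDER_CUES = {"order": 0.6, "another": 0.3, "please": 0.3, "have": 0.2}
VISUAL_WEIGHT = 0.4  # hypothetical weight of the image-based evidence
THRESHOLD = 0.6      # hypothetical decision threshold


def lexical_score(utterance: str) -> float:
    """Score in [0, 1] from the meaning of the words (here: simple cue words)."""
    words = set(utterance.lower().replace(",", " ").replace(".", " ").split())
    return min(1.0, sum(w for cue, w in ORDER_CUES.items() if cue in words))


def is_order(action_suggests_order: bool, utterance: str) -> bool:
    """Combine visual and lexical evidence into a single order decision."""
    visual = VISUAL_WEIGHT if action_suggests_order else 0.0
    return visual + lexical_score(utterance) >= THRESHOLD
```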

7. The information processing apparatus according to any one of claims 4 to 6, further comprising:
notification means for notifying the user; and
storage means for storing the content of the utterance,
wherein, when the analysis means determines that the utterance is an impression of the food, drink, or customer service, the notification means requests the user's consent to store the impression in the storage means.
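The consent step in claim 7 can be sketched as follows. The sketch assumes a hypothetical keyword list for recognising impressions and an `ask` callback that stands in for the notification means, for example a prompt on the table terminal. Nothing is written to storage unless the utterance is an impression and the user agrees.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical cue words marking an utterance as an impression of the food,
# drink or service.
IMPRESSION_WORDS = {"delicious", "tasty", "salty", "cold", "slow", "friendly", "rude"}


@dataclass
class ImpressionStore:
    """Storage means: keeps only impressions the user agreed to have stored."""
    records: List[str] = field(default_factory=list)


def is_impression(utterance: str) -> bool:
    words = set(utterance.lower().replace(",", " ").replace(".", " ").replace("!", " ").split())
    return bool(words & IMPRESSION_WORDS)


def consent_prompt(utterance: str) -> str:
    return f'May we save your comment "{utterance}"? (yes/no)'


def handle(utterance: str, ask: Callable[[str], str], store: ImpressionStore) -> bool:
    """Ask for consent and store the utterance only if it is an impression and consent is given."""
    if not is_impression(utterance):
        return False  # not an impression: do not even ask
    reply = ask(consent_prompt(utterance))
    if reply.strip().lower() not in {"yes", "y"}:
        return False
    store.records.append(utterance)
    return True
```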

8. The information processing apparatus according to any one of claims 1 to 7, wherein
the output means notifies an employee of an instruction to provide customer service corresponding to the result of the analysis.

9. The information processing apparatus according to claim 1, further comprising
display means for displaying an image corresponding to the scene.

10. The information processing apparatus according to claim 9, wherein
the display means displays a product or service corresponding to the scene.
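For claims 9 and 10, the displayed products could be chosen by simple rules over the objects detected in the scene image. The object labels, the rule table and the product names below are hypothetical. A real system might use a learned recommender instead.

```python
from typing import Dict, FrozenSet, List, Set

# Hypothetical rules: if every object in the key is detected in the scene,
# the listed products are shown to the user.
RECOMMENDATIONS: Dict[FrozenSet[str], List[str]] = {
    frozenset({"empty_glass"}): ["beer refill", "soft drink"],
    frozenset({"empty_plate", "coffee_cup"}): ["dessert"],
    frozenset({"child_seat"}): ["kids' menu"],
}


def products_for_scene(detected: Set[str]) -> List[str]:
    """Return products to display for the objects detected in the scene image."""
    shown: List[str] = []
    for required, products in RECOMMENDATIONS.items():
        if required <= detected:  # all required objects are present
            for p in products:
                if p not in shown:  # avoid listing a product twice
                    shown.append(p)
    return shown
```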

11. A program for causing a computer to execute the steps of:
capturing an image of a scene around a user;
acquiring the content spoken by the user;
analyzing the acquired voice on the basis of the captured image; and
outputting a result of the analysis.