JP7091890B2 - Photobook production system and server equipment

Info

Publication number: JP7091890B2
Application number: JP2018127701A
Authority: JP
Inventors: 大輔宮本; 章小川
Original assignee: Dai Nippon Printing Co Ltd
Current assignee: Dai Nippon Printing Co Ltd
Priority date: 2018-07-04
Filing date: 2018-07-04
Publication date: 2022-06-28
Anticipated expiration: 2038-07-04
Also published as: JP2020009012A

Description

The present invention relates to a photobook production system and a server device.

Images captured with digital cameras, smartphones, and similar devices are commonly uploaded to and stored on a network. Services are also known that print uploaded images to produce printed matter such as photobooks and posters (see, for example, Patent Document 1).

Conventionally, to order a photobook using uploaded images, the user operated a personal computer or smartphone to select and arrange images, enter comments, and so on.

In recent years, smart speakers that provide an AI assistant supporting interactive voice operation have become widespread. By talking to a smart speaker, a user can perform various actions, such as looking things up with a search engine, having news read aloud, playing music or video, and operating home appliances.

Japanese Unexamined Patent Publication No. 2005-339214

An object of the present invention is to provide a photobook production system and a server device with which a photobook can be edited and ordered by talking to a smart speaker.

A photobook production system according to the present invention includes a server device that stores image data received from a user terminal, and a smart speaker that is communicably connected to the server device, outputs audio, and picks up the user's speech. The server device has: a dialogue processing unit that understands the user's voice input via the smart speaker, generates a response sentence for the user, and outputs the response sentence to the user via the smart speaker; an image selection unit that, when the user's voice is a photobook production instruction, selects a plurality of images from among the stored images; and an editing processing unit that generates photobook data using the images selected by the image selection unit. The server device transmits a preview screen of the photobook data to the user terminal. When the user's voice is an instruction to change an image in the photobook data, the server device transmits an image list screen using the stored images to the user terminal. When the user's voice is an image search instruction including a search condition, the server device extracts images matching the search condition from the stored images and transmits an image list screen using the extracted images to the user terminal.

In a photobook production system according to one aspect of the present invention, the search condition includes a date, and the server device extracts images whose shooting date is that date.

In a photobook production system according to one aspect of the present invention, the server device stores schedule information in which the dates of events or anniversaries are registered. When the search condition includes information on an event or anniversary, the server device refers to the schedule information and extracts images shot on the date of that event or anniversary.
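The date-based and event-based searches described in these two aspects can be sketched as follows. This is only an illustration under assumptions: the `StoredImage` record, the `search_images` function, and the dictionary-shaped schedule are hypothetical names chosen here, not structures defined in the specification.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Optional


@dataclass(frozen=True)
class StoredImage:
    """Hypothetical record for an uploaded image and its shooting date."""
    image_id: str
    shot_on: date


def search_images(
    images: List[StoredImage],
    schedule: Dict[str, date],
    query_date: Optional[date] = None,
    event: Optional[str] = None,
) -> List[StoredImage]:
    """Return images shot on the requested date.

    If an event or anniversary name is given, its date is looked up in the
    schedule information first (assumed here to be a name -> date mapping).
    """
    if event is not None:
        query_date = schedule.get(event)
    if query_date is None:
        return []  # no date given, or the event is not registered
    return [image for image in images if image.shot_on == query_date]
```

An unregistered event simply yields an empty result in this sketch; how the actual system responds in that case is not stated in the text.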

In a photobook production system according to one aspect of the present invention, the server device performs control so that the images included in the image list screen are displayed together with identification numbers on the display unit of the user terminal. When the user's voice is an image selection instruction including one of the identification numbers, the server device changes the image in the photobook data using the selected image and updates the photobook data and the preview screen.

In a photobook production system according to one aspect of the present invention, each image on the image list screen is assigned an identification number corresponding to its display position on the display unit.

In a photobook production system according to one aspect of the present invention, of the plurality of images included in the image list screen, identification numbers are assigned only to the images currently displayed on the display unit.
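One way to picture position-based numbering of only the visible images is the sketch below. It assumes the list screen is a scrollable sequence and that the visible window is known as a start index and a count; both function names are hypothetical.

```python
from typing import Dict, List, Optional


def number_visible_images(
    image_ids: List[str], first_visible: int, visible_count: int
) -> Dict[int, str]:
    """Map identification numbers (1, 2, ...) to the images currently on screen.

    Numbers follow display position, so they are reassigned whenever the
    visible window changes.
    """
    visible = image_ids[first_visible:first_visible + visible_count]
    return {number: image_id for number, image_id in enumerate(visible, start=1)}


def pick_by_number(numbering: Dict[int, str], spoken_number: int) -> Optional[str]:
    """Return the image for a spoken number, or None if it is not on screen."""
    return numbering.get(spoken_number)
```

Because numbers are tied to what is on screen, the user only ever needs to say small numbers, regardless of how many images are stored.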

A server device according to the present invention is communicably connected to a smart speaker that outputs audio and picks up the user's speech. The server device has: a storage unit that stores image data received from a user terminal; a dialogue processing unit that understands the user's voice input via the smart speaker, generates a response sentence for the user, and outputs the response sentence to the user via the smart speaker; an image selection unit that, when the user's voice is a photobook production instruction, selects a plurality of images from among the stored images; and an editing processing unit that generates photobook data using the images selected by the image selection unit. The server device transmits a preview screen of the photobook data to the user terminal. When the user's voice is an instruction to change an image in the photobook data, the server device transmits an image list screen using the stored images to the user terminal. When the user's voice is an image search instruction including a search condition, the server device extracts images matching the search condition from the stored images and transmits an image list screen using the extracted images to the user terminal.

A server device according to the present invention is communicably connected to a smart speaker that outputs audio and picks up the user's speech, and includes a storage unit that stores image data received from a user terminal and a dialogue processing unit that understands the user's voice input via the smart speaker, generates a response sentence for the user, and outputs the response sentence to the user via the smart speaker. When the user's voice is a photobook production instruction, the server device notifies the user terminal of the photobook production instruction. When the user's voice is an instruction to change an image in photobook data that the user terminal generated, in response to the photobook production instruction, using the images stored in the storage unit, the server device performs control so that an image list screen using the images stored in the storage unit is displayed on the display unit of the user terminal. When the user's voice is an image search instruction including a search condition, the server device performs control so that an image list screen using images matching the search condition is displayed on the display unit of the user terminal.

According to the present invention, a photobook can be edited and ordered by talking to a smart speaker.

The drawings show, in order: a schematic configuration diagram of a photobook production system according to an embodiment of the present invention; an example of a photobook production instruction; an example of a photobook preview screen; three examples of photobook editing instructions; an example of a photobook order instruction; a block diagram of the photobook production system; an example of a user terminal according to a modification; two further examples of photobook editing instructions; and an example of image search keywords.

Embodiments of the present invention are described below with reference to the drawings.

As shown in FIG. 1, the photobook production system according to the embodiment of the present invention includes a server device 1 and a smart speaker 2. The server device 1 can communicate with the smart speaker 2 and a user terminal 3 via a communication network such as the Internet.

The user terminal 3 is a smartphone, a tablet terminal, or the like. The server device 1 stores image data uploaded by the user. The smart speaker 2 is a speaker with a communication function and an assistant function for interactive voice operation, and is installed in the user's home or a similar location.

The server device 1 assigns the smart speaker 2 and the user terminal 3 the same identification information (user ID) identifying the user. The server device 1 therefore recognizes that the smart speaker 2 and the user terminal 3 are used by the same user. The image upload described above may be performed from the user terminal 3 or from another terminal logged in with the same user ID.

The server device 1 has an image analysis function and analyzes the images uploaded by the user. Image analysis includes, for example, detecting objects and text in an image, detecting images that may offend public order and morals, and detecting images that may infringe copyright, such as images showing celebrities or famous characters. For example, the server device 1 searches the web for similar images to determine whether there is copyright infringement. The server device 1 classifies images that offend public order and morals and images that may infringe copyright as inappropriate images that should not be used in a photobook, and classifies all other images as usable in a photobook.

In the present embodiment, the user edits the photobook by talking to the smart speaker 2 and checks the preview screen displayed on the user terminal 3.

For example, as shown in FIG. 2, the user calls out to the smart speaker 2 with a predetermined wake word (command word) such as "OK, speaker" to activate the assistant function. The user then tells the smart speaker 2 to produce a photobook, for example by saying "Make a photobook from the photos of the December trip."

The smart speaker 2 transmits the audio data of the user's utterance following the wake word to the server device 1. The server device 1 interprets the user's utterance and starts the photobook production process. The server device 1 selects images from those uploaded by this user and places them in a photobook template. For example, the server device 1 refers to the EXIF data of the stored images, narrows them down to photos of the December trip based on the shooting date and shooting location, and selects images suitable for the photobook. The selected images are images that the image analysis determined to be usable in a photobook.
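The narrowing step can be illustrated with a short sketch. It assumes each stored image is represented as a dictionary holding an `id`, the EXIF `DateTimeOriginal` string (which uses the `YYYY:MM:DD HH:MM:SS` form), and a `usable` flag set by the image analysis; this record layout and the function name are assumptions for illustration, and filtering by shooting location is omitted.

```python
from datetime import datetime
from typing import Dict, List

# EXIF stores timestamps with colons in the date part, e.g. "2017:12:24 10:15:00".
EXIF_DATETIME_FORMAT = "%Y:%m:%d %H:%M:%S"


def candidates_for_month(images: List[Dict], month: int) -> List[str]:
    """Return ids of usable images whose EXIF shooting date falls in the month."""
    selected = []
    for image in images:
        if not image["usable"]:
            continue  # skip images flagged as inappropriate by image analysis
        shot_at = datetime.strptime(image["DateTimeOriginal"], EXIF_DATETIME_FORMAT)
        if shot_at.month == month:
            selected.append(image["id"])
    return selected
```

Choosing which of the candidates are "suitable" for the photobook is a separate step that the text does not detail, so it is left out here.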

After laying out the selected images and creating the photobook data, the server device 1 generates a response sentence (audio data) and transmits it to the smart speaker 2. The smart speaker 2 outputs a voice prompting the user to check the photobook data on the user terminal 3. For example, the smart speaker 2 outputs the voice "Please check the notification on your smartphone."

The user terminal 3 receives a message from the server device 1 notifying it that the photobook data has been generated. When the user taps the message, the user terminal 3 acquires the photobook preview screen from the server device 1. For example, as shown in FIG. 3, the photobook preview screen is displayed on the touch panel 3A of the user terminal 3.

For example, thumbnails of multiple pages are displayed in the lower part of the preview screen, and the selected page is displayed enlarged in the upper part. The pages shown as thumbnails can be switched by pressing the page switching buttons B1 and B2.

The selected page can also be switched by speech. For example, the user says "Show me page 5" to the smart speaker 2. The server device 1 acquires and interprets the user's utterance via the smart speaker 2 and switches the selected page of the preview screen to page 5.

The page order can also be changed by speech. For example, as shown in FIG. 4, the user says "Swap pages 2 and 3" to the smart speaker 2. The server device 1 acquires and interprets the user's utterance via the smart speaker 2, swaps the image on page 2 with the image on page 3, and updates the preview screen. A voice instruction to swap pages must contain at least the word "swap" and the two page numbers to be swapped.

After swapping the pages, the server device 1 generates a response sentence and transmits it to the smart speaker 2. The smart speaker 2 outputs a voice announcing the swap, such as "The pages have been swapped."

When the user simply says "Swap the pages" to the smart speaker 2, the server device 1 generates, based on predetermined action selection rules, a response sentence asking which pages to swap and transmits it to the smart speaker 2. Alternatively, the smart speaker 2 may output a voice prompting the user to give the instruction in a predetermined word order, such as: Please give an instruction like "Swap pages 3 and 4."
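The swap rule (a keyword plus exactly two page numbers, otherwise a follow-up question) can be sketched as below. This is a simplified illustration that works on an English transcript with a plain keyword match and a digit regex; the function name, the return shape, and the out-of-range reply are assumptions, not part of the specification.

```python
import re
from typing import List, Optional, Tuple


def handle_swap_command(
    utterance: str, pages: List[str], keyword: str = "swap"
) -> Tuple[List[str], Optional[str]]:
    """Return (pages, reply). The reply is None when this is not a swap command."""
    if keyword not in utterance.lower():
        return pages, None
    numbers = [int(n) for n in re.findall(r"\d+", utterance)]
    if len(numbers) != 2:
        # Keyword present but page numbers missing: ask the user which pages.
        return pages, "Which pages would you like to swap?"
    a, b = numbers
    if not (1 <= a <= len(pages) and 1 <= b <= len(pages)):
        return pages, "Those page numbers are not in the photobook."  # assumed reply
    swapped = list(pages)  # leave the caller's list untouched
    swapped[a - 1], swapped[b - 1] = swapped[b - 1], swapped[a - 1]
    return swapped, "The pages have been swapped."
```

Returning a new list rather than mutating in place keeps the previous page order available, for example if the preview update fails.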

The page order can also be changed by pressing the page swap button B3 on the preview screen and dragging and dropping the pages to be swapped.

To change an image used in the photobook, the user instructs the change by speech. For example, as shown in FIG. 5, the user says "Change the photo on page 6" to the smart speaker 2. The server device 1 acquires and interprets the user's utterance via the smart speaker 2 and transmits an image list screen to the user terminal 3. After transmitting the image list screen, the server device 1 generates a response sentence and transmits it to the smart speaker 2. The smart speaker 2 outputs a voice such as "Please select a photo on the terminal screen."

When the user simply says "Change the photo" to the smart speaker 2, the server device 1 generates, based on predetermined action selection rules, a response sentence asking which page's image should be changed and transmits it to the smart speaker 2.

When the user operates the user terminal 3 and selects an image from the image list screen, the server device 1 changes the image and updates the photobook data and the preview screen.

The image list screen includes all images uploaded by the user (or, when a purpose such as "December trip" has been set for the photobook, all images matching that purpose). It therefore also includes inappropriate images that the image analysis of the server device 1 determined should not be used in a photobook. When the user selects an inappropriate image from the list screen, the server device 1 generates a response sentence asking about the source of the image and transmits it to the smart speaker 2. For example, the smart speaker 2 outputs a voice such as "The photo on page 6 contains character content. Did you take this photo yourself?"

When the user gives an affirmative reply such as "Yes, I did," the server device 1 generates a response sentence approving use of the selected image and transmits it to the smart speaker 2. When the user gives a negative reply such as "No," the server device 1 generates a response sentence explaining that using this image is problematic, has the smart speaker 2 output it as voice, and returns the display of the user terminal 3 to the image list screen.
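The confirmation flow for a flagged image can be summarized as a small decision function. The sketch below is an assumption-laden illustration: the `flagged` field, the action labels, and the reply wording are hypothetical, and recognizing whether a spoken reply is affirmative is assumed to have happened upstream.

```python
from typing import Dict, Optional, Tuple


def resolve_image_choice(
    image: Dict, user_confirms_own_photo: Optional[bool] = None
) -> Tuple[str, Optional[str]]:
    """Return (action, spoken_reply) for an image picked from the list screen.

    action is one of "use", "ask", or "back_to_list" (labels assumed here).
    """
    if not image["flagged"]:
        return "use", None  # images judged usable need no confirmation
    if user_confirms_own_photo is None:
        return "ask", "This photo contains character content. Did you take it yourself?"
    if user_confirms_own_photo:
        return "use", "Understood. The photo will be used."
    return "back_to_list", "This photo cannot be used. Please choose another one."
```

The function is called once when the image is selected and, if it returns "ask", again after the user's reply has been interpreted.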

An image can also be changed by pressing the image change button B4 on the preview screen (see FIG. 3) to display the image list screen and selecting an image from it.

When selecting images and creating photobook data, the server device 1 can generate a comment (or title) and attach it to a selected image. The server device 1 generates comments based on the objects and text detected in the image by image analysis, the purpose of the photobook, and so on. For example, the server device 1 generates and attaches a comment to an image in which a distinctive object has been detected.

As shown in FIG. 6, when a page with a comment is displayed, the server device 1 generates a response sentence announcing that a comment has been generated and attached, together with the content of the comment, and transmits it to the smart speaker 2. The smart speaker 2 reads the comment aloud. The user checks the comment by listening to the voice output from the smart speaker 2.

To change a comment, the user instructs the change by speech. For example, as shown in FIG. 6, the user says "Change comment" to the smart speaker 2 and then speaks the new comment. The server device 1 acquires and interprets the user's utterance via the smart speaker 2, changes the comment of the currently displayed image, and updates the photobook data and the preview screen.

After changing the comment, the server device 1 generates a response sentence and transmits it to the smart speaker 2. The smart speaker 2 outputs a voice announcing the change, such as "The comment has been changed."

A comment can also be changed by pressing the comment change button B6 on the preview screen and entering the comment with the touch keys of the user terminal 3.

When the user has looked through the photobook preview screen and either speaks a predetermined phrase such as "I'll order it" or presses the confirm button B5 on the preview screen (see FIG. 3), the server device 1 generates a final confirmation screen and transmits it to the user terminal 3, and generates a response sentence and transmits it to the smart speaker 2. The smart speaker 2 outputs a voice such as "Please make a final check of the photos and comments on the terminal screen."

The user terminal 3 displays a final confirmation screen FC as shown in FIG. 7. For example, the lower part of the screen shows an edit list containing the image selector, comment creator, and comment content for each page, and the upper part shows the selected page. The entries "AI automatic selection" and "AI automatic creation" in the image selector and comment creator columns of the edit list refer to the server device 1.

As described above, the server device 1 automatically selects and arranges images, creates comments, and so on. Therefore, as shown in FIG. 7, the edit list highlights the page number of any page whose image was selected by the server device 1 and whose comment was either created by the server device 1 or is absent. This is because the user may have overlooked these pages.

Pages on which the user selected (changed or swapped) an image or wrote a comment have already been looked at by the user and therefore do not need to be highlighted.
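The highlighting rule for the edit list reduces to a single predicate. The following sketch assumes an edit-list entry records who selected the image and who created the comment, with `None` meaning the page has no comment; the `PageEdit` type and the `AI`/`USER` labels are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

AI = "AI"      # stands for the server device's automatic selection or creation
USER = "user"


@dataclass
class PageEdit:
    """Hypothetical edit-list entry for one page."""
    page: int
    image_selected_by: str
    comment_created_by: Optional[str]  # None when the page has no comment


def needs_highlight(entry: PageEdit) -> bool:
    """Highlight pages the user has not touched: AI-selected image and
    a comment that is either AI-created or absent."""
    return entry.image_selected_by == AI and entry.comment_created_by in (AI, None)


def highlighted_pages(edit_list: List[PageEdit]) -> List[int]:
    """Return the page numbers to highlight on the final confirmation screen."""
    return [entry.page for entry in edit_list if needs_highlight(entry)]
```

Any user action on a page, whether on the image or the comment, removes the highlight, which matches the reasoning that such pages have already been reviewed.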

When the user has checked the highlighted pages and either speaks a predetermined phrase such as "I'll order it" or presses the confirm button B7 on the final confirmation screen, the server device 1 generates a payment screen and transmits it to the user terminal 3, and generates a response sentence and transmits it to the smart speaker 2. The smart speaker 2 outputs a voice such as "Proceeding to payment. From here on, please enter the information on the terminal." The payment screen requires the photobook delivery address and a credit card number, so manual entry on the user terminal 3 is preferable to voice input via the smart speaker 2.

When payment information has been entered by the user (user terminal 3) and the photobook order has been accepted, the server device 1 transmits the photobook data and order details to a factory 5.

A printer (not shown) installed in the factory 5 performs printing based on the received photobook data to produce a photobook 6. The photobook 6 produced in the factory 5 is delivered to the user.

The photobook data to be sent to the factory 5 may also be sent to an examination terminal 4, where it is examined manually. This makes it possible to catch inappropriate images that the image analysis of the server device 1 missed.

Thus, according to the present embodiment, production of a photobook can be instructed by voice via the smart speaker 2. Swapping the page order of automatically laid-out images and changing automatically created comments can also be done by voice instruction via the smart speaker 2. Changing an image requires selecting from the image list screen, so the user terminal 3 is used for that.

The server device 1 analyzes the images uploaded by the user and identifies in advance inappropriate images, such as images that may offend public order and morals or may infringe copyright. When producing a photobook, the server device 1 can therefore select from images that are not inappropriate and pose no problem for use in a photobook.

In the image change process, when the user selects an inappropriate image from the image list screen, the server device 1 asks the user about the source of the image via the smart speaker 2 and uses the image in the photobook once it is confirmed that there is no problem.

When the user's voice instruction to produce a photobook includes a date, the server device 1 checks the shooting date information included in the image data and selects images shot on the specified date to produce the photobook. This prevents images the user did not intend from being included in the photobook.

FIG. 8 is a block diagram of the photobook production system. As shown in FIG. 8, the smart speaker 2 has a control unit 20, a sound collecting unit (microphone) 21, an audio output unit (speaker) 22, and a communication unit 23.

The control unit 20 has a speech recognition function. When a predetermined wake word is input via the sound collecting unit 21, the control unit 20 transmits the speech following the wake word to the server device 1 using the communication unit 23.
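The wake-word gating can be illustrated on text as follows. The real device operates on audio, so treating the input as an already-recognized transcript is a simplifying assumption, as are the function name and the default wake word taken from the example utterance.

```python
from typing import Optional


def speech_after_wake_word(transcript: str, wake_word: str = "OK, speaker") -> Optional[str]:
    """Return the part of the transcript after the wake word.

    Returns None when the wake word is absent, in which case nothing
    would be forwarded to the server.
    """
    _, found, tail = transcript.partition(wake_word)
    if not found:
        return None
    return tail.strip(" ,.")  # drop the separator between wake word and command
```

Only the returned remainder would be sent on, which mirrors the statement that speech preceding or lacking the wake word is not transmitted.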

The audio output unit 22 outputs the audio data of the response sentence received from the server device 1 via the communication unit 23.

The server device 1 includes a dialogue processing unit 10 and a photobook editing unit 100.

The dialogue processing unit 10 understands voice instructions from the user and generates appropriate response sentences for the user. It has an input understanding unit 11, a dialogue management unit 12, and an output generation unit 13. The input understanding unit 11 has an intent estimation function, which estimates the user's intent (task) from the user's utterance received from the smart speaker 2, and a named entity extraction function, which extracts expressions such as proper nouns (personal names, place names, and the like), dates, and times from the utterance.

The dialogue management unit 12 has an internal state update function, which writes the result information received from the input understanding unit 11 into an internal state (corresponding to a database) to update it, and an action selection function, which selects the next action based on the internal state and a dialogue strategy (action selection rules).

The output generation unit 13 generates a response sentence matching the instruction issued by the action selection of the dialogue management unit 12 and transmits it to the smart speaker 2.
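The three-stage structure (input understanding, dialogue management, output generation) can be sketched as a toy pipeline. Everything below is a simplified illustration under assumptions: keyword matching stands in for intent estimation, a digit regex stands in for named entity extraction, and the intent names, action names, and response table are hypothetical.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Understanding:
    intent: str
    pages: List[int]


def understand(utterance: str) -> Understanding:
    """Input understanding: estimate the intent and extract page numbers."""
    text = utterance.lower()
    pages = [int(n) for n in re.findall(r"\d+", text)]
    if "swap" in text:
        return Understanding("swap_pages", pages)
    if "change" in text and "photo" in text:
        return Understanding("change_image", pages)
    if "order" in text:
        return Understanding("order", pages)
    return Understanding("unknown", pages)


@dataclass
class DialogueManager:
    """Dialogue management: internal state update plus action selection."""
    state: Dict = field(default_factory=dict)

    def update(self, result: Understanding) -> None:
        self.state["intent"] = result.intent
        self.state["pages"] = result.pages

    def select_action(self) -> str:
        intent, pages = self.state.get("intent"), self.state.get("pages", [])
        if intent == "swap_pages":
            return "do_swap" if len(pages) == 2 else "ask_pages_to_swap"
        if intent == "change_image":
            return "show_image_list" if len(pages) == 1 else "ask_page_to_change"
        if intent == "order":
            return "show_final_confirmation"
        return "ask_to_repeat"


RESPONSES = {
    "do_swap": "The pages have been swapped.",
    "ask_pages_to_swap": "Which pages would you like to swap?",
    "show_image_list": "Please select a photo on the terminal screen.",
    "ask_page_to_change": "Which page's photo would you like to change?",
    "show_final_confirmation": "Please make a final check on the terminal screen.",
    "ask_to_repeat": "Sorry, could you say that again?",
}


def respond(manager: DialogueManager, utterance: str) -> str:
    """Run one turn: understand, update state, select an action, generate output."""
    manager.update(understand(utterance))
    return RESPONSES[manager.select_action()]
```

Keeping action selection separate from response wording means the same rules could drive both the spoken reply and the screen sent to the user terminal.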

対話処理部１０は、フォトブック編集部１００と連携し、フォトブック編集部１００の処理結果を応答文に反映させることができる。 The dialogue processing unit 10 can cooperate with the photo book editing unit 100 and reflect the processing result of the photo book editing unit 100 in the response sentence.

フォトブック編集部１００は、画像ＤＢ、解析結果ＤＢ、フォトブックデータＤＢ、及び注文内容ＤＢを有する記憶部１１０を備える。 The photobook editing unit 100 includes a storage unit 110 having an image DB, an analysis result DB, a photobook data DB, and an order content DB.

ＣＰＵ（中央処理装置）が記憶部１１０に記憶されているプログラムを実行することで、画像受信部１０１、画像解析部１０２、画像選択部１０３、編集処理部１０４、コメント生成部１０５及び注文処理部１０６の機能が実現される。 When a CPU (central processing unit) executes a program stored in the storage unit 110, the functions of the image receiving unit 101, the image analysis unit 102, the image selection unit 103, the editing processing unit 104, the comment generation unit 105, and the order processing unit 106 are realized.

画像受信部１０１は、ユーザ端末３からアップロードされた画像データを受信し、画像ＤＢに格納する。 The image receiving unit 101 receives the image data uploaded from the user terminal 3 and stores it in the image DB.

画像解析部１０２は、ユーザ端末３からアップロードされた画像の解析を行う。画像解析は、例えば、画像内の物体やテキストの検出、公序良俗に反するおそれのある画像の検出、著名人や有名キャラクター等が写った著作権を侵害する可能性のある画像の検出である。画像解析部１０２は、公序良俗に反する画像や著作権を侵害する可能性のある画像については、フォトブックに使用すべきでない不適切画像と判定し、判定結果を画像ＤＢ内の画像データに紐付ける。画像解析部１０２は、画像毎に、画像から検出した物体やテキスト、画像データに含まれる撮影日情報、撮影場所情報等をタグとして解析結果ＤＢに格納する。 The image analysis unit 102 analyzes the images uploaded from the user terminal 3. Image analysis includes, for example, detecting objects and text in an image, detecting images that may offend public order and morals, and detecting images that may infringe copyright, such as images showing celebrities or famous characters. The image analysis unit 102 determines that an image that offends public order and morals or an image that may infringe copyright is an inappropriate image that should not be used in a photobook, and associates the determination result with the image data in the image DB. For each image, the image analysis unit 102 stores the objects and text detected from the image, the shooting date information and shooting location information included in the image data, and the like as tags in the analysis result DB.

画像選択部１０３は、対話処理部１０がユーザの発話からフォトブックの作製というタスクを抽出すると、画像ＤＢ内の画像（不適切画像を除く）からフォトブックに好適な画像を選択する。ユーザの発話に日時が含まれている場合は、画像データに含まれる撮影日情報を参照し、画像を選択する。 When the dialogue processing unit 10 extracts the task of creating a photobook from the user's utterance, the image selection unit 103 selects an image suitable for the photobook from the images (excluding inappropriate images) in the image DB. If the user's utterance includes the date and time, the shooting date information included in the image data is referred to and an image is selected.
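As a rough sketch of this selection step, the function below excludes images flagged as inappropriate and optionally filters by shooting date. The `StoredImage` record, the `limit` parameter, and the oldest-first ordering are assumptions made for illustration; the patent does not say how "suitable" images are ranked.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class StoredImage:
    image_id: str
    shot_on: date                 # shooting date taken from the image data
    inappropriate: bool = False   # determination result from image analysis


def select_images(
    images: List[StoredImage],
    start: Optional[date] = None,
    end: Optional[date] = None,
    limit: int = 20,              # hypothetical cap on images per photobook
) -> List[StoredImage]:
    """Pick photobook candidates, skipping inappropriate images."""
    candidates = [img for img in images if not img.inappropriate]
    if start is not None:
        candidates = [img for img in candidates if img.shot_on >= start]
    if end is not None:
        candidates = [img for img in candidates if img.shot_on <= end]
    # Assumed ordering: oldest first, so the photobook reads chronologically.
    candidates.sort(key=lambda img: img.shot_on)
    return candidates[:limit]
```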

編集処理部１０４は、画像選択部１０３により選択された画像を所定のテンプレートに配置し、フォトブックデータを生成し、プレビュー画面をユーザ端末３へ送信する。生成したフォトブックデータは、フォトブックデータＤＢに格納される。 The editing processing unit 104 arranges the image selected by the image selection unit 103 in a predetermined template, generates photobook data, and transmits the preview screen to the user terminal 3. The generated photobook data is stored in the photobook data DB.

編集処理部１０４は、対話処理部１０がユーザの発話からページの入れ替えというタスクを抽出すると、ページを入れ替えて、フォトブックデータ及びプレビュー画面を更新する。 When the dialogue processing unit 10 extracts the task of page replacement from the user's utterance, the editing processing unit 104 replaces the pages and updates the photobook data and the preview screen.
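Template placement and page replacement might look like the sketch below. The fixed number of slots per page and the list-of-lists page model are simplifying assumptions; the actual template format is not described in the source.

```python
from typing import List


def layout_pages(image_ids: List[str], slots_per_page: int = 2) -> List[List[str]]:
    """Place images into a template with a fixed number of slots per page."""
    return [
        image_ids[i:i + slots_per_page]
        for i in range(0, len(image_ids), slots_per_page)
    ]


def swap_pages(pages: List[List[str]], first: int, second: int) -> List[List[str]]:
    """Return a new page list with two pages exchanged (page numbers start at 1)."""
    if not (1 <= first <= len(pages) and 1 <= second <= len(pages)):
        raise ValueError("page number out of range")
    updated = list(pages)
    updated[first - 1], updated[second - 1] = updated[second - 1], updated[first - 1]
    return updated
```

Returning a new list from `swap_pages` keeps the previous photobook data intact, which would make it straightforward to regenerate the preview screen from the updated copy.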

編集処理部１０４は、対話処理部１０がユーザの発話から画像の変更というタスクを抽出すると、画像ＤＢ内の画像を用いて画像一覧画面を作成し、ユーザ端末３へ送信する。画像一覧画面を介して画像が選択されると、編集処理部１０４は、画像を変更してフォトブックデータ及びプレビュー画面を更新する。 When the dialogue processing unit 10 extracts the task of changing an image from the user's utterance, the editing processing unit 104 creates an image list screen using the image in the image DB and transmits it to the user terminal 3. When an image is selected via the image list screen, the editing processing unit 104 changes the image and updates the photobook data and the preview screen.

画像一覧画面から不適切画像が選択された場合、不適切画像が選択されたことを対話処理部１０に通知する。対話処理部１０は、不適切画像の出所等を問う質問文を生成し、スマートスピーカ２から出力させる。 When an inappropriate image is selected from the image list screen, the dialogue processing unit 10 is notified that the inappropriate image has been selected. The dialogue processing unit 10 generates a question sentence asking about the source of the inappropriate image and outputs it from the smart speaker 2.

編集処理部１０４は、対話処理部１０がユーザの発話からコメントの変更というタスクを抽出すると、ユーザから音声入力されたコメントに変更し、フォトブックデータ及びプレビュー画面を更新する。 When the dialogue processing unit 10 extracts the task of changing the comment from the user's utterance, the editing processing unit 104 changes the comment to the comment input by the user and updates the photobook data and the preview screen.

また、編集処理部１０４は、図７に示すような最終確認画面ＦＣを作成する。 Further, the editing processing unit 104 creates a final confirmation screen FC as shown in FIG. 7.

コメント生成部１０５は、フォトブックに使用している画像に対して、コメントを自動生成し、付与する。各画像には、画像解析により画像から検出された物体やテキスト、画像データに含まれる撮影日情報、撮影場所情報等がタグとして付与されている。コメント生成部１０５はこれらのタグを用いて、コメントを生成する。 The comment generation unit 105 automatically generates and adds a comment to the image used in the photo book. Objects and texts detected from the images by image analysis, shooting date information included in the image data, shooting location information, and the like are attached to each image as tags. The comment generation unit 105 uses these tags to generate a comment.

ユーザからのフォトブック作製開始の音声コマンドに「〇△□のフォトブック」のようにフォトブックのテーマ（シーン）に相当する文言「〇△□」が含まれている場合、目的に応じて、コメント生成に参照するタグの優先順位を設定する。 If the user's voice command to start creating a photobook contains wording "○△□" corresponding to the theme (scene) of the photobook, as in "a photobook of ○△□", the priority of the tags referred to for comment generation is set according to that purpose.

例えば、フォトブックのテーマが旅行である場合、以下の優先順位でタグを用いてコメントを生成する。 For example, if the theme of the photobook is travel, comments are generated using tags in the following order of priority.
１：地名（位置情報） 1: Place name (location information)
２：ランドマーク（有名な建造物など） 2: Landmarks (famous buildings, etc.)
３：乗り物 3: Vehicles
４：食べ物 4: Food

また、例えばフォトブックのテーマがスポーツである場合、以下の優先順位でタグを用いてコメントを生成する。 Also, for example, when the theme of the photobook is sports, comments are generated using tags in the following order of priority.
１：撮影部 1: Shooting part
２：表情（うれしい、苦しい、楽しい、悲しいなど） 2: Facial expressions (happy, pained, joyful, sad, etc.)
３：道具（サッカーボール、バット、ラケットなど） 3: Equipment (soccer balls, bats, rackets, etc.)

優先順位はユーザからの指示で変更可能となっていてもよい。 The priority may be changeable by an instruction from the user.
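The theme-dependent tag priority described above can be sketched as a lookup table plus a first-match search. The tag kind names, the `pick_comment_tag` helper, and the comment wording are hypothetical; only the priority orders for travel and sports come from the description.

```python
from typing import Dict, List, Optional, Tuple

# Priority orders taken from the description; the key names are illustrative.
TAG_PRIORITY: Dict[str, List[str]] = {
    "travel": ["place", "landmark", "vehicle", "food"],
    "sports": ["shooting_part", "expression", "equipment"],
}


def pick_comment_tag(
    theme: str,
    tags: Dict[str, str],
    priority: Optional[List[str]] = None,
) -> Optional[Tuple[str, str]]:
    """Return the highest-priority tag present on the image, if any.

    `priority` lets a caller override the default order, mirroring the
    statement that the priority may be changed by the user.
    """
    order = priority if priority is not None else TAG_PRIORITY.get(theme, [])
    for kind in order:
        if kind in tags:
            return kind, tags[kind]
    return None


def generate_comment(theme: str, tags: Dict[str, str]) -> str:
    """Build a comment from the chosen tag (the wording is a placeholder)."""
    chosen = pick_comment_tag(theme, tags)
    if chosen is None:
        return ""
    kind, value = chosen
    return f"{value} ({kind})"
```

With a travel theme, an image tagged with both a landmark and a food item gets its comment from the landmark tag, because landmarks rank above food in the travel order.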

注文処理部１０６は、ユーザ端末３から、フォトブックのプリント冊数等の注文を受け付ける。注文処理部１０６が決済情報の入力を受け付けて決済処理を行うと、プリント注文が完了する。注文処理部１０６はフォトブックのプリント冊数や配送先住所等を含む注文内容を注文内容ＤＢに格納する。 The order processing unit 106 accepts an order such as the number of prints of a photo book from the user terminal 3. When the order processing unit 106 accepts the input of the payment information and performs the payment processing, the print order is completed. The order processing unit 106 stores the order contents including the number of prints of the photo book and the delivery address in the order contents DB.

プリント注文されたフォトブックデータと、注文内容とが工場５へ送信され、フォトブック６が製造される。 The photobook data ordered for print and the order details are transmitted to the factory 5, and the photobook 6 is manufactured.

このように、本実施形態によれば、スマートスピーカ２に話しかけることで、フォトブックの編集や注文を行うことができる。 As described above, according to the present embodiment, the photo book can be edited or ordered by talking to the smart speaker 2.

ユーザ端末３はスマートフォンやタブレット端末に限定されず、図９に示すような大画面のテレビ（ディスプレイ）３Ａ及びコントローラ３Ｂであってもよい。これにより、複数人で大型ディスプレイに表示されたプレビュー画面を確認しながら、スマートスピーカ２に話しかけてフォトブックの編集を行うことができる。 The user terminal 3 is not limited to a smartphone or a tablet terminal, and may be a large-screen television (display) 3A and a controller 3B as shown in FIG. As a result, a plurality of people can talk to the smart speaker 2 and edit the photo book while checking the preview screen displayed on the large display.

図５に示すような画像変更処理では、サーバ装置１に保存されている画像だけでなく、ユーザ端末３に保存されている画像を利用・選択できるようにしてもよい。例えば、サーバ装置１に保存されている画像の一覧画面がユーザ端末３に表示されている状態で、ユーザが「スマホの写真から選ぶ」とスマートスピーカ２に話しかける。サーバ装置１は、スマートスピーカ２を介してユーザの発話文を取得して解釈し、ユーザ端末３内の画像の一覧画面を作成して表示するように、ユーザ端末３に指示する。 In the image change process as shown in FIG. 5, not only the image stored in the server device 1 but also the image stored in the user terminal 3 may be used and selected. For example, while the list screen of the images stored in the server device 1 is displayed on the user terminal 3, the user talks to the smart speaker 2 to "select from the pictures of the smartphone". The server device 1 instructs the user terminal 3 to acquire and interpret the user's utterance text via the smart speaker 2 and to create and display a list screen of images in the user terminal 3.

ユーザが、ユーザ端末３を操作して、端末内画像の一覧画面から画像を選択すると、ユーザ端末３は選択された画像の画像データをサーバ装置１へ送信（アップロード）する。サーバ装置１は、受信した画像の解析を行い、フォトブックに使用可能な画像か、又はフォトブックに使用すべきでない不適切画像であるかを判定する。フォトブックに使用可能な画像と判定した場合、サーバ装置１は、フォトブックデータ及びプレビュー画面を更新する。 When the user operates the user terminal 3 to select an image from the image list screen in the terminal, the user terminal 3 transmits (uploads) the image data of the selected image to the server device 1. The server device 1 analyzes the received image and determines whether it is an image that can be used in the photobook or an inappropriate image that should not be used in the photobook. When it is determined that the image can be used for the photo book, the server device 1 updates the photo book data and the preview screen.

不適切画像と判定した場合、サーバ装置１は、画像の出所を質問する応答文を生成してスマートスピーカ２へ送信する。ユーザからの返答の結果、問題無いことが確認されると、サーバ装置１は、この画像を利用してフォトブックデータ及びプレビュー画面を更新する。 If it is determined that the image is inappropriate, the server device 1 generates a response sentence asking the source of the image and sends it to the smart speaker 2. When it is confirmed that there is no problem as a result of the response from the user, the server device 1 updates the photobook data and the preview screen using this image.
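The handling of an image uploaded from the terminal can be summarized as a small decision function. The action names are hypothetical, and the `"reject"` branch is an assumption: the description only covers the case where the user's answer confirms that there is no problem.

```python
from typing import Optional


def next_action(is_inappropriate: bool, user_confirmed_ok: Optional[bool] = None) -> str:
    """Decide what the server does with an image selected on the terminal.

    user_confirmed_ok is None until the user has answered the question
    about the source of the image.
    """
    if not is_inappropriate:
        return "update_photobook"
    if user_confirmed_ok is None:
        return "ask_source"  # have the smart speaker ask where the image came from
    # Assumed behavior when the answer does not clear the image.
    return "update_photobook" if user_confirmed_ok else "reject"
```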

上記実施形態では、画像を変更する際、図５に示すように、ユーザ端末３のタッチパネルをタッチして、画像一覧画面から画像を選択する例について説明したが、スマートスピーカ２を用いた音声入力により画像を選択できるようにしてもよい。 The above embodiment described an example in which, when changing an image, the user touches the touch panel of the user terminal 3 to select an image from the image list screen, as shown in FIG. 5. However, the image may instead be selected by voice input using the smart speaker 2.

例えば、図１０に示すように、ユーザが「６ページの写真を変更」とスマートスピーカ２に話しかける。サーバ装置１は、スマートスピーカ２を介してユーザの発話文を取得して解釈する。このとき、６ページに複数の画像が含まれている場合、サーバ装置１の編集処理部１０４が、各画像に識別番号を付与してユーザ端末３の画面を更新し、対話処理部１０が、どの番号の画像を変更するか質問する応答文を生成してスマートスピーカ２へ送信する。各画像に付与される識別番号は、発話が簡単なものであることが好ましい。 For example, as shown in FIG. 10, the user says "change the photo on page 6" to the smart speaker 2. The server device 1 acquires and interprets the user's utterance via the smart speaker 2. If page 6 contains a plurality of images, the editing processing unit 104 of the server device 1 assigns an identification number to each image and updates the screen of the user terminal 3, and the dialogue processing unit 10 generates a response sentence asking which numbered image to change and transmits it to the smart speaker 2. The identification number given to each image is preferably one that is easy to say.

ユーザが変更対象となる画像の識別番号（例えば“６－１（ろくのいち）”）をスマートスピーカ２に話しかけると、サーバ装置１は、スマートスピーカ２を介してユーザの発話文を取得して解釈し、変更対象の画像を特定し、画像の一覧画面をユーザ端末３へ送信する。 When the user speaks the identification number of the image to be changed (for example, "6-1 (Rokunoichi)") to the smart speaker 2, the server device 1 acquires and interprets the user's spoken text via the smart speaker 2. Then, the image to be changed is specified, and the image list screen is transmitted to the user terminal 3.

画像一覧画面では、表示されている各画像に識別番号が付与されている。各画像に付与される識別番号は、発話が容易なものであることが好ましい。ユーザ端末３を操作し、画面をスクロールして、表示部（例えばタッチパネル等のディスプレイ）に表示される画像が変わった場合でも、同じ表示位置の画像には、同じ識別番号が付与される。図１０に示す例では、ユーザ端末３には画像一覧画面のうちの６枚の画像が表示され、それぞれ１～６の連番の識別番号が付与される。画面をスクロールして表示される画像を変えても、表示中の画像には常に１～６の識別番号が付与される。 On the image list screen, an identification number is assigned to each displayed image. It is preferable that the identification number given to each image is easy to speak. Even when the image displayed on the display unit (for example, a display such as a touch panel) changes by operating the user terminal 3 and scrolling the screen, the same identification number is assigned to the image at the same display position. In the example shown in FIG. 10, six images from the image list screen are displayed on the user terminal 3, and identification numbers of serial numbers 1 to 6 are assigned to each. Even if the displayed image is changed by scrolling the screen, the displayed image is always given an identification number of 1 to 6.

画像一覧には多数の画像が含まれており、全ての画像に識別番号を付与した場合、画面をスクロールしていくと、画像に付与される識別番号が大きくなり、桁数が増え、ユーザが発話する番号の認識が困難になる。ユーザ端末３に表示中の画像にのみ識別番号を付与し、スクロールで表示内容が変わった時も同じ表示位置の画像には同じ識別番号を割り当てることで、識別番号は小さい値のままになり、桁数は少なく、ユーザが発話する番号の認識が容易になる。 The image list contains a large number of images. If identification numbers were assigned to all of them, the numbers would grow as the screen is scrolled, the number of digits would increase, and the number spoken by the user would become difficult to recognize. By assigning identification numbers only to the images currently displayed on the user terminal 3, and assigning the same identification number to the image at the same display position even when the displayed content changes by scrolling, the identification numbers stay small, the number of digits stays low, and the number spoken by the user is easy to recognize.
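The position-based numbering can be sketched as a mapping from display slot to image that is rebuilt whenever the visible window changes. The function names and the default of six slots (matching the FIG. 10 example) are illustrative assumptions.

```python
from typing import Dict, List, Optional


def visible_labels(image_ids: List[str], first_visible: int, slots: int = 6) -> Dict[int, str]:
    """Map identification numbers 1..slots to the images currently on screen.

    The number depends only on the display position, so it stays small
    no matter how far the list has been scrolled.
    """
    window = image_ids[first_visible:first_visible + slots]
    return {position + 1: image_id for position, image_id in enumerate(window)}


def resolve_spoken_number(labels: Dict[int, str], number: int) -> Optional[str]:
    """Look up which image the user meant by a spoken identification number."""
    return labels.get(number)
```

If the list holds twenty images, "number 2" refers to the second image before scrolling and to the eighth image after scrolling down by six, but the spoken number never exceeds 6.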

識別番号の付与は、ユーザ端末３が行ってもよいし、サーバ装置１が行ってもよい。 The identification number may be assigned by the user terminal 3 or the server device 1.

サーバ装置１は、画像一覧画面の送信後、応答文を生成してスマートスピーカ２へ送信する。スマートスピーカ２は「写真一覧の番号を教えてください」等の音声を出力する。 After transmitting the image list screen, the server device 1 generates a response sentence and transmits it to the smart speaker 2. The smart speaker 2 outputs a voice such as "Please tell me the number of the photo list".

ユーザは、使用したい画像に付与されている識別番号を発話する。例えば、ユーザは「２番の写真を使う」とスマートスピーカ２に話しかける。サーバ装置１は、スマートスピーカ２を介してユーザの発話文を取得して解釈し、２番の識別番号が付与されている画像に変更し、フォトブックデータ及びプレビュー画面を更新する。識別番号の付与をユーザ端末３が行っている場合、サーバ装置１は、ユーザが発話した識別番号がどの画像に付与されているかをユーザ端末３に問い合わせる。 The user speaks the identification number given to the image to be used. For example, the user talks to the smart speaker 2 "use the second photo". The server device 1 acquires and interprets the user's utterance text via the smart speaker 2, changes the image to an image to which the second identification number is assigned, and updates the photobook data and the preview screen. When the user terminal 3 assigns the identification number, the server device 1 inquires the user terminal 3 which image the identification number spoken by the user is assigned to.

画像一覧画面において、識別番号は画像に重ねて表示してもよいし、画像の近傍に表示してもよい。また、ユーザが「番号オフ」「番号オン」とスマートスピーカ２に話しかけると、ユーザ端末３に表示されている画像一覧画面において、識別番号の表示のオフ／オンが切り替えられるようになっていてもよい。ユーザ端末３を操作し、画面をタッチして画像を選択するユーザには、識別番号の表示は不要なためである。ユーザ端末３が識別番号を付与している場合、サーバ装置１は、ユーザの発話文から識別番号の表示のオフ／オンをユーザ端末３に指示する。 On the image list screen, the identification number may be displayed superimposed on the image or in the vicinity of the image. Further, the display of the identification numbers on the image list screen displayed on the user terminal 3 may be switched off and on when the user says "number off" or "number on" to the smart speaker 2. This is because a user who operates the user terminal 3 and selects an image by touching the screen does not need the identification numbers displayed. When the user terminal 3 assigns the identification numbers, the server device 1 instructs the user terminal 3 to turn the display of the identification numbers off or on based on the user's utterance.

画像変更の際、ユーザが発話した条件で絞り込んだ画像を一覧画面に表示してもよい。 When changing the image, the images narrowed down according to the conditions spoken by the user may be displayed on the list screen.

例えば、図１１に示すように、ユーザが「６ページの写真を変更」とスマートスピーカ２に話しかける。サーバ装置１は、スマートスピーカ２を介してユーザの発話文を取得して解釈し、画像の一覧画面をユーザ端末３へ送信する。この一覧画面には、サーバ装置１に登録されている全ての画像が含まれる。 For example, as shown in FIG. 11, the user talks to the smart speaker 2 "change the photo on page 6". The server device 1 acquires and interprets the user's utterance text via the smart speaker 2, and transmits the image list screen to the user terminal 3. This list screen includes all the images registered in the server device 1.

続いて、ユーザが「今年の６月１０日に撮影した写真を表示」とスマートスピーカ２に話しかける。サーバ装置１は、スマートスピーカ２を介してユーザの発話文を取得して解釈し、編集処理部１０４が、記憶部１１０に格納されている画像のＥＸＩＦデータを参照し、撮影日が２０１８年６月１０日の画像を検索して抽出し、抽出した画像の一覧画面をユーザ端末３へ送信する。図１０に示した例と同様に、ユーザ端末３に表示される画像一覧画面では、表示されている各画像に識別番号が付与される。 Next, the user says "show the photos taken on June 10 of this year" to the smart speaker 2. The server device 1 acquires and interprets the user's utterance via the smart speaker 2, and the editing processing unit 104 refers to the EXIF data of the images stored in the storage unit 110, searches for and extracts the images whose shooting date is June 10, 2018, and transmits a list screen of the extracted images to the user terminal 3. As in the example shown in FIG. 10, an identification number is assigned to each image displayed on the image list screen of the user terminal 3.

ユーザは、２０１８年６月１０日に撮影した画像の一覧画面から、使用したい画像に付与されている識別番号を発話する。例えば、ユーザは「２番の写真を使う」とスマートスピーカ２に話しかける。サーバ装置１は、スマートスピーカ２を介してユーザの発話文を取得して解釈し、２番の識別番号が付与されている画像に変更し、フォトブックデータ及びプレビュー画面を更新する。 The user speaks the identification number assigned to the image to be used from the list screen of the images taken on June 10, 2018. For example, the user talks to the smart speaker 2 "use the second photo". The server device 1 acquires and interprets the user's utterance text via the smart speaker 2, changes the image to an image to which the second identification number is assigned, and updates the photobook data and the preview screen.

検索条件を音声入力し、画像一覧画面に表示する画像を絞り込むことで、好みの画像が探しやすくなる。画像の絞り込み（検索）を行う際のキーワード（音声）の例を図１２に示す。 By inputting search conditions by voice and narrowing down the images to be displayed on the image list screen, it becomes easier to find your favorite image. FIG. 12 shows an example of a keyword (voice) when narrowing down (searching) an image.

例えばユーザが「２０１７年１２月１７日の写真」と発話することで、特定の日付の画像が検索される。撮影日がわかっている場合は、このように日付を指定して画像を検索できる。午前中、午後、夜のように、日付に加えて時間帯を指定できるようにしてもよい。 For example, when the user says "photos from December 17, 2017", images from that specific date are searched. If the shooting date is known, images can be searched by specifying the date in this way. The user may also be allowed to specify a time of day, such as morning, afternoon, or night, in addition to the date.

「２０１７年１２月の写真」のように日付の範囲を指定して画像を検索できるようにしてもよい。 You may be able to search for images by specifying a date range, such as "Photos of December 2017".
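A search condition covering a specific date, a whole month or year, and an optional time of day could be checked as sketched below. The hour boundaries for morning, afternoon, and night are assumptions (the source does not define them), and the shooting timestamp is assumed to have already been read from the image's EXIF data.

```python
from datetime import datetime
from typing import Optional

# Hypothetical hour bands; the description only names the categories.
PART_OF_DAY = {
    "morning": range(5, 12),
    "afternoon": range(12, 18),
    "night": tuple(range(18, 24)) + tuple(range(0, 5)),
}


def matches(
    shot_at: datetime,
    year: int,
    month: Optional[int] = None,
    day: Optional[int] = None,
    part_of_day: Optional[str] = None,
) -> bool:
    """Check a shooting timestamp against a spoken date condition.

    Omitting `day` gives a month-wide range; omitting `month` as well
    gives a year-wide range.
    """
    if shot_at.year != year:
        return False
    if month is not None and shot_at.month != month:
        return False
    if day is not None and shot_at.day != day:
        return False
    if part_of_day is not None and shot_at.hour not in PART_OF_DAY[part_of_day]:
        return False
    return True
```

Under these assumptions, "photos from December 17, 2017" corresponds to `matches(t, 2017, 12, 17)` and "photos from December 2017" to `matches(t, 2017, 12)`.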

画像変更の際に、変更前の画像と同じ日に撮影された画像を一覧画面で確認したい場合、ユーザは「この写真と同じ撮影日の写真」と発話する。撮影日を覚えていなくても、同じ日付の画像を検索できる。 When changing an image, if you want to check the image taken on the same day as the image before the change on the list screen, the user says "a photo on the same shooting date as this photo". You can search for images with the same date without having to remember the shooting date.

サーバ装置１に誕生日等の記念日を事前にスケジュール情報に登録しておくことで、記念日に基づいて画像を検索できる。例えば、ユーザが「お母さんの誕生日の写真」と発話することで、サーバ装置１は、事前に登録されている誕生日と同じ日付の写真を複数年分（過去に遡って）検索する。これにより、毎年の記念日の写真を確認できる。 By registering anniversaries such as birthdays in the schedule information in advance in the server device 1, images can be searched based on the anniversaries. For example, when the user speaks "photograph of mother's birthday", the server device 1 searches for a photo of the same date as the pre-registered birthday for a plurality of years (backward to the past). This allows you to see photos of each year's anniversary.

スケジュール情報には、記念日だけでなく、旅行や花火大会などのイベントを登録してもよい。例えば、ユーザが「去年の花火大会の写真」と発話すると、サーバ装置１は、スケジュール情報に基づいて、１年前の花火大会の日に撮影された画像を検索し、画像一覧画面に表示する。イベントに時間帯が登録されている場合は、その時間帯に撮影された画像を検索する。 In addition to anniversaries, events such as trips and fireworks displays may be registered in the schedule information. For example, when the user says "photos of last year's fireworks display", the server device 1 searches, based on the schedule information, for images taken on the day of the fireworks display one year earlier and displays them on the image list screen. If a time slot is registered for the event, images taken during that time slot are searched.
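Searching by anniversary or registered event might be sketched as follows. The dictionary shapes for `photos` and `schedule` and both function names are hypothetical; the description only says that the schedule information holds dates for anniversaries and events.

```python
from datetime import date
from typing import Dict, List, Tuple


def photos_on_anniversary(photos: Dict[str, date], month: int, day: int) -> List[str]:
    """Return photos taken on the same month and day in any year, oldest first."""
    hits = [(shot_on, photo_id) for photo_id, shot_on in photos.items()
            if (shot_on.month, shot_on.day) == (month, day)]
    return [photo_id for _, photo_id in sorted(hits)]


def photos_for_event(
    photos: Dict[str, date],
    schedule: Dict[Tuple[str, int], date],
    event: str,
    year: int,
) -> List[str]:
    """Return photos taken on the registered date of an event in a given year."""
    event_date = schedule.get((event, year))
    if event_date is None:
        return []
    return sorted(photo_id for photo_id, shot_on in photos.items()
                  if shot_on == event_date)
```

A request such as "photos of last year's fireworks display" would then resolve to `photos_for_event(photos, schedule, "fireworks", current_year - 1)`, where `current_year` is supplied by the caller.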

ユーザが「日付指定クリア」と発話することで、サーバ装置１は検索条件をクリアし、画像一覧画面で全ての画像が確認できるようにする。 When the user speaks "Clear date designation", the server device 1 clears the search condition so that all the images can be confirmed on the image list screen.

上記実施形態では、サーバ装置１が対話処理部１０及びフォトブック編集部１００を備える構成について説明したが、対話処理部１０を有する第１サーバと、フォトブック編集部１００を有する第２サーバとを設けてもよい。スマートスピーカ２及びユーザ端末３が、同一のユーザが使用するものであることを認識する処理は、第１サーバが行ってもよいし、第２サーバが行ってもよい。 The above embodiment described a configuration in which the server device 1 includes the dialogue processing unit 10 and the photobook editing unit 100. However, a first server having the dialogue processing unit 10 and a second server having the photobook editing unit 100 may be provided instead. The process of recognizing that the smart speaker 2 and the user terminal 3 are used by the same user may be performed by either the first server or the second server.

ユーザ端末３のＣＰＵがプログラムを実行することで、フォトブック編集部１００の画像選択部１０３及び編集処理部１０４の機能がユーザ端末３に実現されるようにしてもよい。この場合、サーバ装置１は、ユーザの発話から抽出したタスク、検索条件、ユーザが選択した画像の識別番号等をユーザ端末３へ通知する。ユーザ端末３は、通知されたタスクに応じて、フォトブックに好適な画像の選択、フォトブックデータ生成、プレビュー画面の作成・表示等を行う。 By executing the program by the CPU of the user terminal 3, the functions of the image selection unit 103 and the editing processing unit 104 of the photobook editing unit 100 may be realized in the user terminal 3. In this case, the server device 1 notifies the user terminal 3 of the task extracted from the user's utterance, the search condition, the identification number of the image selected by the user, and the like. The user terminal 3 selects an image suitable for the photobook, generates photobook data, creates and displays a preview screen, and the like according to the notified task.

画像の変更処理では、サーバ装置１は、スマートスピーカ２を介してユーザの発話文を取得して解釈し、変更対象の画像を特定し、画像の一覧画面をユーザ端末３へ送信する。ユーザ端末３が、サーバ装置１内の画像を用いて、一覧画面を作成してもよい。ユーザ端末３で表示される画像一覧画面では、表示中の画像に識別番号が付与されている。識別番号の付与は、ユーザ端末３が行ってもよいし、サーバ装置１が行ってもよい。 In the image change process, the server device 1 acquires and interprets the user's utterance text via the smart speaker 2, identifies the image to be changed, and transmits the image list screen to the user terminal 3. The user terminal 3 may create a list screen using the image in the server device 1. On the image list screen displayed on the user terminal 3, an identification number is assigned to the displayed image. The identification number may be assigned by the user terminal 3 or the server device 1.

ユーザの発話に検索条件が含まれている場合、サーバ装置１は検索条件をユーザ端末３に通知する。ユーザ端末３は、サーバ装置１から通知された検索条件に基づいて、サーバ装置１内の画像を検索して抽出し、抽出した画像の一覧画面を作成して表示する。 When the search condition is included in the user's utterance, the server device 1 notifies the user terminal 3 of the search condition. The user terminal 3 searches for and extracts images in the server device 1 based on the search conditions notified from the server device 1, and creates and displays a list screen of the extracted images.

サーバ装置１は、スマートスピーカ２を介してユーザの発話文を取得して解釈し、ユーザが選択した画像の識別番号をユーザ端末３に通知する。ユーザ端末３は、通知された識別番号に基づいて、画像を変更し、フォトブックデータ及びプレビュー画面を更新する。 The server device 1 acquires and interprets the user's utterance text via the smart speaker 2, and notifies the user terminal 3 of the identification number of the image selected by the user. The user terminal 3 changes the image and updates the photobook data and the preview screen based on the notified identification number.
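In this variant the server only forwards what it extracted from the utterance, so the notification could be a small structured message. The sketch below uses JSON with hypothetical field names (`task`, `search_condition`, `selected_number`); the patent does not define a message format.

```python
import json
from typing import Optional


def build_notification(
    task: str,
    search_condition: Optional[dict] = None,
    selected_number: Optional[int] = None,
) -> str:
    """Serialize the items the server notifies to the user terminal."""
    message = {"task": task}
    if search_condition is not None:
        message["search_condition"] = search_condition
    if selected_number is not None:
        message["selected_number"] = selected_number
    return json.dumps(message, sort_keys=True)


def handle_notification(payload: str) -> str:
    """Terminal side: decide which local step to run for a notification."""
    message = json.loads(payload)
    if "selected_number" in message:
        return "replace_image_and_update_preview"
    if "search_condition" in message:
        return "search_and_show_list"
    return "run_task:" + message["task"]
```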

なお、このような構成にした場合、ユーザの発話から抽出したタスク等をサーバ装置１からユーザ端末３へ通知するタイムラグが生じるため、サーバ装置１がフォトブック編集部１００の機能を有する方がスムーズに処理を進めることができる。 In addition, in such a configuration, since there is a time lag in which the server device 1 notifies the user terminal 3 of the task extracted from the user's utterance, it is smoother if the server device 1 has the function of the photobook editing unit 100. You can proceed with the process.

本発明は上記実施形態そのままに限定されるものではなく、実施段階ではその要旨を逸脱しない範囲で構成要素を変形して具体化できる。また、上記実施形態に開示されている複数の構成要素の適宜な組み合わせにより、種々の発明を形成できる。例えば、実施形態に示される全構成要素から幾つかの構成要素を削除してもよい。さらに、異なる実施形態にわたる構成要素を適宜組み合わせてもよい。 The present invention is not limited to the above embodiment as it is, and at the implementation stage, the components can be modified and embodied within a range that does not deviate from the gist thereof. In addition, various inventions can be formed by an appropriate combination of the plurality of components disclosed in the above-described embodiment. For example, some components may be removed from all the components shown in the embodiments. Furthermore, components over different embodiments may be combined as appropriate.

１サーバ装置 1 Server device
２スマートスピーカ 2 Smart speaker
３ユーザ端末 3 User terminal
４考査端末 4 Examination terminal
５工場 5 Factory
６フォトブック 6 Photobook

Claims

A photobook production system comprising:
a server device that stores image data received from a user terminal; and
a smart speaker that is communicably connected to the server device, outputs voice, and collects a user's utterances,
wherein the server device has:
a dialogue processing unit that understands the user's voice input via the smart speaker, generates a response sentence for the user, and outputs the response sentence to the user via the smart speaker;
an image selection unit that selects a plurality of images from the stored images when the user's voice is a photobook creation instruction; and
an editing processing unit that generates photobook data using the images selected by the image selection unit,
wherein the same identification information is given to the smart speaker and the user terminal, and it is recognized that the smart speaker and the user terminal are used by the same user, and
wherein the server device:
transmits a preview screen of the photobook data to the user terminal;
when the user's voice is an instruction to change an image in the photobook data, transmits an image list screen using the stored images to the user terminal; and
when the user's voice is an image search instruction including a search condition, extracts images matching the search condition from the stored images and transmits an image list screen using the extracted images to the user terminal.

The photobook production system according to claim 1, wherein the search condition includes a date, and the server device extracts an image whose shooting date is the date.

The photobook production system according to claim 1 or 2, wherein the server device stores schedule information in which the date of an event or anniversary is registered, and, when the search condition includes information on the event or anniversary, refers to the schedule information and extracts images taken on the date of the event or anniversary included in the search condition.

The photobook production system according to any one of claims 1 to 3, wherein the server device controls the display unit of the user terminal so that the images included in the image list screen are displayed together with identification numbers, and, when the user's voice is an image selection instruction including an identification number, changes the image in the photobook data using the selected image and updates the photobook data and the preview screen.

The photobook production system according to claim 4, wherein an identification number is assigned to the image on the image list screen according to the display position on the display unit.

The photobook production system according to claim 5, wherein an identification number is assigned only to the image displayed on the display unit among the plurality of images included in the image list screen.

A server device communicably connected to a smart speaker that outputs voice and collects a user's utterances, the server device comprising:
a storage unit that stores image data received from a user terminal;
a dialogue processing unit that understands the user's voice input via the smart speaker, generates a response sentence for the user, and outputs the response sentence to the user via the smart speaker;
an image selection unit that selects a plurality of images from the stored images when the user's voice is a photobook creation instruction; and
an editing processing unit that generates photobook data using the images selected by the image selection unit,
wherein the same identification information is given to the smart speaker and the user terminal, and it is recognized that the smart speaker and the user terminal are used by the same user, and
wherein the server device:
transmits a preview screen of the photobook data to the user terminal;
when the user's voice is an instruction to change an image in the photobook data, transmits an image list screen using the stored images to the user terminal; and
when the user's voice is an image search instruction including a search condition, extracts images matching the search condition from the stored images and transmits an image list screen using the extracted images to the user terminal.

A server device communicably connected to a smart speaker that outputs voice and collects a user's utterances, the server device comprising:
a storage unit that stores image data received from a user terminal; and
a dialogue processing unit that understands the user's voice input via the smart speaker, generates a response sentence for the user, and outputs the response sentence to the user via the smart speaker,
wherein the same identification information is given to the smart speaker and the user terminal, and it is recognized that the smart speaker and the user terminal are used by the same user, and
wherein the server device:
when the user's voice is a photobook production instruction, notifies the user terminal of the photobook production instruction;
when the user's voice is an instruction to change an image in photobook data that the user terminal generated, in response to the photobook production instruction, using the images stored in the storage unit, performs control so that an image list screen using the images stored in the storage unit is displayed on the display unit of the user terminal; and
when the user's voice is an image search instruction including a search condition, performs control so that an image list screen using images satisfying the search condition is displayed on the display unit of the user terminal.