JP7390891B2

JP7390891B2 - Client device, server, program, and information processing method

Info

Publication number: JP7390891B2
Application number: JP2019552691A
Authority: JP
Inventors: 雅人小助川; 和高橋; 重太郎望月; 悠人公文
Original assignee: Shiseido Co Ltd
Current assignee: Shiseido Co Ltd
Priority date: 2017-11-07
Filing date: 2018-10-22
Publication date: 2023-12-04
Anticipated expiration: 2038-10-22
Also published as: WO2019093105A1; JPWO2019093105A1; TW201922186A

Description

The present invention relates to a client device, a server, and a program.

Audio guidance is a well-known way of assisting visually impaired people in their activities.
For example, Japanese Patent Application Laid-Open No. 2004-016578 discloses a technique that measures the distance between a transmitter installed at a destination and a receiver carried by a visually impaired person, and announces by voice the distance from the current location to the target.

However, with the technique of Japanese Patent Application Laid-Open No. 2004-016578, a visually impaired person cannot receive audio guidance in places where no transmitter is installed. The range in which audio guidance is available is therefore limited, which in turn narrows the range in which visually impaired people can act with peace of mind.

An object of the present invention is to eliminate the restriction on the range in which visually impaired people can receive audio guidance.

One aspect of the present invention is
a client device connected to a server capable of generating audio output data regarding an object included in an image, the client device comprising:
means for acquiring image data of an image that includes at least one attachment worn on a user's finger and at least one object;
means for transmitting the image data to the server;
means for receiving, from the server, audio output data for outputting audio regarding the object included in the image; and
means for outputting audio based on the audio output data.

According to the present invention, the restriction on the range in which visually impaired people can receive audio guidance can be eliminated.

FIG. 1 is a schematic diagram of the information processing system of the present embodiment.
FIG. 2 is a block diagram showing the configuration of the information processing system of FIG. 1.
FIG. 3 is a diagram showing the configuration of the camera unit 50 of FIG. 1.
FIG. 4 is a diagram showing the configuration of the nail cap of FIG. 1.
FIG. 5 is an explanatory diagram of an overview of the present embodiment.
FIG. 6 is a sequence diagram of the information processing of the present embodiment.
FIG. 7 is an explanatory diagram of S500 in FIG. 6.
FIG. 8 is an explanatory diagram of S100 in FIG. 6.
FIG. 9 is an explanatory diagram of S502 in FIG. 6.
FIG. 10 is a diagram showing the data structure of the gesture database of the modification.
FIG. 11 is a sequence diagram of the information processing of the modification.
FIG. 12 is a diagram showing an example of gestures of the modification.

An embodiment of the present invention is described in detail below with reference to the drawings. In the drawings used to describe the embodiment, the same components are in principle given the same reference numerals, and repeated description of them is omitted.

(1) Configuration of the information processing system
The configuration of the information processing system is described. FIG. 1 is a schematic diagram of the information processing system of the present embodiment. FIG. 2 is a block diagram showing the configuration of the information processing system of FIG. 1.

As shown in FIG. 1, the information processing system 1 includes a client device 10, a server 30, and a camera unit 50.
The client device 10 and the server 30 are connected via a network NW (for example, the Internet or an intranet).
The client device 10 and the camera unit 50 are connected via wireless communication.

The client device 10 is an example of an information processing device that sends requests to the server 30. The client device 10 is, for example, a smartphone, a tablet terminal, or a personal computer.

The server 30 is an example of an information processing device that provides the client device 10 with a response to a request sent from the client device 10. The server 30 is, for example, a web server.

The camera unit 50 is configured to capture an image and to generate image data of the captured image.

A user (for example, a visually impaired person) accesses the server 30 by wearing a nail cap NC (an example of an "attachment") on a finger and carrying the client device 10.

(1-1) Configuration of the client device
The configuration of the client device 10 is described.

As shown in FIG. 2, the client device 10 includes a storage device 11, a processor 12, an input/output interface 13, and a communication interface 14.

The storage device 11 is configured to store programs and data. The storage device 11 is, for example, a combination of ROM (Read Only Memory), RAM (Random Access Memory), and storage (for example, flash memory or a hard disk).

The programs include, for example, the following:
・an OS (Operating System) program
・a program of an application that executes information processing (for example, a web browser)

The data includes, for example, the following:
・databases referenced in the information processing
・data obtained by executing the information processing (that is, execution results of the information processing)

The processor 12 is configured to implement the functions of the client device 10 by running the programs stored in the storage device 11. The processor 12 is an example of a computer.

The input/output interface 13 is configured to obtain user instructions from an input device connected to the client device 10 and to output information to an output device connected to the client device 10.
The input device is, for example, a keyboard, a pointing device, a touch panel, a microphone, or a combination thereof.
The output device is, for example, a display, a speaker, or a combination thereof.

The communication interface 14 is configured to control communication between the client device 10 and the server 30.

(1-2) Configuration of the server
The configuration of the server 30 is described.

As shown in FIG. 2, the server 30 includes a storage device 31, a processor 32, an input/output interface 33, and a communication interface 34.

The storage device 31 is configured to store programs and data. The storage device 31 is, for example, a combination of ROM, RAM, and storage (for example, flash memory or a hard disk).

The programs include, for example, the following:
・an OS program
・a program of an application that executes information processing
・a learning dataset on the relationship between feature values of image objects and language (for example, object names)

The data includes, for example, the following:
・databases referenced in the information processing
・execution results of the information processing

The processor 32 is configured to implement the functions of the server 30 by running the programs stored in the storage device 31. The processor 32 is an example of a computer.

The input/output interface 33 is configured to obtain user instructions from an input device connected to the server 30 and to output information to an output device connected to the server 30.
The input device is, for example, a keyboard, a pointing device, a touch panel, or a combination thereof.
The output device is, for example, a display.

The communication interface 34 is configured to control communication between the server 30 and the client device 10.

(1-3) Configuration of the camera unit
The configuration of the camera unit 50 is described. FIG. 3 is a diagram showing the configuration of the camera unit 50 of FIG. 1.

FIG. 3A is a front view of the camera unit 50. FIG. 3B is a top view of the camera unit 50. FIG. 3C is a side view of the camera unit 50.

As shown in FIG. 3, the camera unit 50 includes a lens 50a, a speaker 50b, a clip 50c, an image sensor 50d, and a camera controller 50e.

As shown in FIGS. 3A to 3C, the lens 50a is arranged on the front surface (the Z- side surface) of the camera unit 50.

As shown in FIGS. 3B and 3C, the speaker 50b is arranged on the top surface (the Y+ side) of the camera unit 50.

The clip 50c is arranged on the back surface (the Z+ side surface) of the camera unit 50. In other words, the clip 50c is arranged on the surface opposite the lens 50a.
By hooking the clip 50c onto his or her clothing, the user can wear the camera unit 50 so that the lens 50a faces the user's front (that is, the direction of the user's line of sight).

As shown in FIGS. 3A and 3C, the image sensor 50d is arranged inside the camera unit 50. Light that has passed through the lens 50a forms an image on the image sensor 50d. The image sensor 50d is configured to convert that image into an electrical signal, thereby generating image data based on the light that has passed through the lens 50a.

The camera controller 50e is arranged inside the camera unit 50. The camera controller 50e is a processor that controls the camera unit 50 as a whole.

(1-4) Configuration of the nail cap
The configuration of the nail cap NC is described. FIG. 4 is a diagram showing the configuration of the nail cap of FIG. 1.

As shown in FIG. 4A, the nail caps NC include five right-hand nail caps NCR and five left-hand nail caps NCL (that is, ten nail caps in total). A different pattern (for example, the text "L1" to "L5" and "R1" to "R5") is formed on each of the ten nail caps NCL and NCR. The pattern formed on each nail cap NC distinguishes that nail cap from the others.

As shown in FIG. 4B, each nail cap NC can be worn on a finger of the user.
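
The pattern scheme described above can be pictured as a small lookup table. The following is only an illustrative sketch: the names `FINGERS`, `build_pattern_table`, and `PATTERN_TABLE` are hypothetical, and the finger order is an assumption. The description ties "R1" to the right thumb and "R2" to "R4" to the right index, middle, and ring fingers; the remaining assignments (little finger and the whole left hand) are assumed by analogy.

```python
# Hypothetical finger order; only R1..R4 are confirmed by the description,
# the remaining assignments are assumed by analogy.
FINGERS = ["thumb", "index", "middle", "ring", "little"]


def build_pattern_table():
    """Map each nail-cap pattern text ("L1".."L5", "R1".."R5") to (hand, finger)."""
    table = {}
    for hand, prefix in (("left", "L"), ("right", "R")):
        for number, finger in enumerate(FINGERS, start=1):
            table[f"{prefix}{number}"] = (hand, finger)
    return table


PATTERN_TABLE = build_pattern_table()
```

Because every cap carries a distinct pattern, recognizing the text alone is enough to tell which finger of which hand is in view.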

(2) Overview of the present embodiment
An overview of the present embodiment is described. FIG. 5 is an explanatory diagram of an overview of the present embodiment.

As shown in FIG. 5, when a nail cap NC worn on a nail of the user (for example, a visually impaired person) enters the imaging range, the client device 10 acquires image data of an image IMG that includes an object OBJ1 (an apple) and an object OBJ2 (a banana) located around the nail cap NC.
The client device 10 transmits the acquired image data to the server 30.

The server 30 performs image analysis on the image data transmitted from the client device 10, thereby identifying the position of the nail cap NC and the positions of the objects OBJ1 and OBJ2 in the image IMG.
Based on the identified positions, the server 30 identifies, among the objects OBJ1 and OBJ2, the object OBJ1 closest to the nail cap NC.
The server 30 refers to the learning dataset stored in the storage device 31 and estimates the object name (that is, "apple") from the feature values of the identified object OBJ1.
The server 30 generates audio output data for outputting audio of the estimated object name.
The server 30 transmits the generated audio output data to the client device 10.

The client device 10 outputs the audio "apple" based on the audio output data transmitted from the server 30.

From the audio output by the client device 10, the user can learn the object name "apple" of the object OBJ1 closest to his or her finger.

In this way, the user (for example, a visually impaired person) can receive audio guidance by using a finger on which a nail cap NC is worn. In other words, the restriction on the range in which visually impaired people can receive audio guidance can be eliminated.

(3) Information processing
The information processing of the present embodiment is described. FIG. 6 is a sequence diagram of the information processing of the present embodiment. FIG. 7 is an explanatory diagram of S500 in FIG. 6. FIG. 8 is an explanatory diagram of S100 in FIG. 6. FIG. 9 is an explanatory diagram of S502 in FIG. 6.

The camera unit 50 executes imaging (S500).
Specifically, the image sensor 50d converts the image formed by the light that has passed through the lens 50a into an electrical signal, thereby generating image data corresponding to that light (FIG. 7A).
The camera controller 50e transmits the image data generated by the image sensor 50d to the client device 10.

After step S500, the client device 10 executes an image analysis request (S100).
Specifically, the processor 12 determines whether the image IMG corresponding to the image data transmitted in step S500 includes a pattern formed on a nail cap NC. As an example, as shown in FIG. 8, when the user makes a thumbs-up gesture with the right hand within the angle of view of the lens 50a, the image data transmitted in step S500 includes an image of the pattern (for example, the text "R1") of the nail cap NC worn on the right thumb. In this case, the processor 12 determines that the image IMG includes a pattern formed on a nail cap NC.
When the processor 12 determines that the image IMG includes a pattern formed on a nail cap NC, it transmits image analysis request data to the server 30.
The image analysis request data includes the image data of the image IMG that includes the pattern formed on the nail cap NC.
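
The gating logic of step S100 can be sketched as follows. This is a minimal illustration, not the patented implementation: the description does not say how patterns are recognized, so the detector is passed in as a hypothetical callable `detect_patterns`, and the names `KNOWN_PATTERNS`, `ImageAnalysisRequest`, and `build_request` are likewise assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Set

# Pattern texts assumed from the description: "L1".."L5" and "R1".."R5".
KNOWN_PATTERNS = {f"{prefix}{n}" for prefix in "LR" for n in range(1, 6)}


@dataclass
class ImageAnalysisRequest:
    """Hypothetical container for the image analysis request data."""
    image_data: bytes


def build_request(
    image_data: bytes,
    detect_patterns: Callable[[bytes], Set[str]],
) -> Optional[ImageAnalysisRequest]:
    """Return a request only if the image contains a known nail-cap pattern.

    detect_patterns is a stand-in for whatever recognizer the client uses;
    it is not specified in the description.
    """
    found = detect_patterns(image_data) & KNOWN_PATTERNS
    if not found:
        return None  # no nail cap in view, so nothing is sent to the server
    return ImageAnalysisRequest(image_data=image_data)
```

Filtering on the client in this way means that frames without a nail cap never reach the server.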

After step S100, the server 30 executes image analysis (S300).
Specifically, the processor 32 applies feature analysis to the image data included in the image analysis request data, thereby identifying the coordinates of the following items in the image IMG: the pixels of the nail cap NC and the pixels of the objects OBJ1 and OBJ2.
Based on the identified coordinates, the processor 32 identifies, among the objects OBJ1 and OBJ2 included in the image IMG, the object OBJ1 closest to the nail cap NC.
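
The nearest-object selection in step S300 can be illustrated with a short sketch. It assumes that each item has already been reduced to a single representative image coordinate (for example, a centroid) and that "closest" means smallest Euclidean distance; neither detail is stated in the description, and the function name `nearest_object` is hypothetical.

```python
import math
from typing import Dict, Tuple

Point = Tuple[float, float]


def nearest_object(nail_cap: Point, objects: Dict[str, Point]) -> str:
    """Return the key of the object whose coordinate is closest to the nail cap.

    Assumes one representative (x, y) coordinate per object and Euclidean
    distance in image space.
    """
    if not objects:
        raise ValueError("at least one object is required")
    return min(objects, key=lambda name: math.dist(nail_cap, objects[name]))
```

For instance, with the nail cap at (100, 100), an object at (120, 110) is selected over one at (300, 90).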

After step S300, the server 30 executes object estimation (S301).
Specifically, the processor 32 refers to the learning dataset stored in the storage device 31 and estimates the object name corresponding to the feature values of the pixels of the object OBJ1 identified in step S300.

After step S301, the server 30 executes text data generation (S302).
Specifically, the processor 32 generates text data of a sentence (for example, "This is an apple.") composed of the object name estimated in step S301 and predetermined words (for example, a subject and a predicate).
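
Step S302 amounts to filling a fixed sentence template with the estimated object name. The sketch below shows one way to do this for English output; the function name `build_sentence` and the simple vowel-based article rule are assumptions made for illustration and are not taken from the description.

```python
def build_sentence(object_name: str) -> str:
    """Wrap an estimated object name in a fixed subject/predicate template."""
    name = object_name.strip()
    if not name:
        raise ValueError("object_name must not be empty")
    # Simplified article choice (assumption): "an" before a vowel letter.
    article = "an" if name[0].lower() in "aeiou" else "a"
    return f"This is {article} {name}."
```

Calling `build_sentence("apple")` yields the example sentence used in the description, "This is an apple."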

After step S302, the server 30 executes an image analysis response (S303).
Specifically, the processor 32 transmits image analysis response data to the client device 10.
The image analysis response data includes the text data generated in step S302.

After step S303, the client device 10 executes audio data generation (S101).
Specifically, the processor 12 converts the text data included in the image analysis response data into audio data corresponding to that text data.
The processor 12 transmits the converted audio data to the camera unit 50.

After step S101, the camera unit 50 executes audio output (S502).
Specifically, the camera controller 50e plays back the audio corresponding to the audio data transmitted in step S101.
As shown in FIG. 9, the speaker 50b outputs the played-back audio (for example, "This is an apple.").

According to the present embodiment, the name of the object OBJ1 closest to the nail cap NC in the image captured in step S500 is read aloud. Through the audio output from the speaker 50b, the user (for example, a visually impaired person) can recognize the object OBJ1 closest to the nail cap NC.

In particular, because the thumbnail is the largest nail and a thumbs-up gesture gives a positive impression, it is preferable to execute step S100 when the pattern formed on a thumb nail cap NC (for example, the text "R1") is recognized.

(4) Modification
A modification is described. The modification is an example in which information processing is executed according to a gesture of the user.

(4-1) Database
The database of the modification is described. FIG. 10 is a diagram showing the data structure of the gesture database of the modification.

As shown in FIG. 10, the gesture information database of the modification includes a "pattern" field, a "gesture" field, and an "action" field. The fields are associated with one another.

The "pattern" field stores information that identifies the pattern formed on a nail cap NC.

The "gesture" field stores information on the displacement of the position of the nail cap NC per unit time (for example, a motion vector). A motion vector of 0 means that the nail cap NC is stationary. A non-zero motion vector indicates the direction and speed of the movement of the nail cap NC.
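
A motion vector of this kind can be computed from two successive positions of a nail cap. The sketch below is an assumption-laden illustration: positions are taken to be 2D image coordinates, the unit time `dt` is in seconds, and the names `motion_vector` and `is_stationary` as well as the optional `tolerance` parameter are hypothetical.

```python
import math
from typing import Tuple

Vector = Tuple[float, float]


def motion_vector(prev_pos: Vector, curr_pos: Vector, dt: float) -> Vector:
    """Displacement of the nail cap per unit time between two frames."""
    if dt <= 0:
        raise ValueError("dt must be positive")
    return ((curr_pos[0] - prev_pos[0]) / dt, (curr_pos[1] - prev_pos[1]) / dt)


def is_stationary(vector: Vector, tolerance: float = 0.0) -> bool:
    """A zero vector means the nail cap is stationary.

    tolerance is an added assumption that lets small jitter count as zero.
    """
    return math.hypot(vector[0], vector[1]) <= tolerance
```

The direction of the resulting vector gives the direction of movement, and its length gives the speed.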

The "action" field stores information about the content of the information processing that the processor 12 executes. The information processing includes, for example, the following:
・transmitting the image sent from the camera unit 50 in step S500 to the server 30
・transmitting the image sent from the camera unit 50 in step S500 to the server 30, and sending an email with that image attached
・sending an email with the image sent from the camera unit 50 in step S500 attached
・storing the image sent from the camera unit 50 in step S500 in the storage device 11 (that is, saving the image)
・launching a predetermined application

(4-2) Information processing
The information processing of the modification is described. FIG. 11 is a sequence diagram of the information processing of the modification. FIG. 12 is a diagram showing an example of gestures of the modification.

As shown in FIG. 11, after step S500 (FIG. 6), the client device 10 executes gesture identification (S110).
Specifically, the processor 12 identifies the motion vector of the pattern formed on the nail cap NC included in the image IMG corresponding to the image data transmitted in step S500.

As an example, as shown in FIG. 12A, when the user holds the index and middle fingers of the right hand upright and still within the angle of view of the lens 50a, the image data transmitted in step S500 includes images of the patterns (for example, the text "R2" and "R3") formed on the nail caps NC of the right index and middle fingers, and the motion vector is 0. In this case, the processor 12 determines that "the patterns of the nail caps NC worn on the index and middle fingers of the right hand are stationary."
As shown in FIG. 12B, when the user holds the index, middle, and ring fingers of the right hand upright and still within the angle of view of the lens 50a, the image data transmitted in step S500 includes images of the patterns (for example, the text "R2" to "R4") formed on the nail caps NC of the right index, middle, and ring fingers, and the motion vector is 0. In this case, the processor 12 determines that "the patterns of the nail caps NC worn on the index, middle, and ring fingers of the right hand are stationary."
As shown in FIG. 12C, when the user raises the index and middle fingers of the right hand and moves them from top to bottom within the angle of view of the lens 50a, the image data transmitted in step S500 includes images of the patterns (for example, the text "R2" and "R3") formed on the nail caps NC of the right index and middle fingers, together with a motion vector indicating that those images move from top to bottom. In this case, the processor 12 determines that "the patterns of the nail caps NC worn on the index and middle fingers of the right hand are moving from top to bottom."
As shown in FIG. 12D, when the user raises the index, middle, and ring fingers of the right hand and moves them from bottom to top within the angle of view of the lens 50a, the image data transmitted in step S500 includes images of the patterns (for example, the text "R2" to "R4") formed on the nail caps NC of the right index, middle, and ring fingers, together with a motion vector indicating that those images move from bottom to top. In this case, the processor 12 determines that "the patterns of the nail caps NC worn on the index, middle, and ring fingers of the right hand are moving from bottom to top."

After step S110, the client device 10 executes an action (S111).
Specifically, the processor 12 refers to the gesture information database (FIG. 10) and identifies the information in the "action" field associated with the motion vector identified in step S110.
The processor 12 executes the processing corresponding to the identified "action" field information.

As an example, when the gesture of FIG. 12A is identified in step S110, image transmission and email transmission are executed (FIG. 10).
When the gesture of FIG. 12B is identified in step S110, email transmission is executed (FIG. 10).
In this way, the client device 10 executes processing according to the combination of patterns recognized in S100.

As another example, when the gesture of FIG. 12C is identified in step S110, the image is saved (FIG. 10).
When the gesture of FIG. 12D is identified in step S110, a predetermined application is launched (FIG. 10).
In this way, the client device 10 executes processing according to the combination of the patterns recognized in S100 and the movement of the nail caps NC.
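
The four examples above can be expressed as a lookup keyed on the set of recognized patterns and the kind of motion. The following is a sketch under stated assumptions: the table `GESTURE_DB`, the action labels, the functions `classify_motion` and `lookup_actions`, the `threshold` parameter, and the convention that the image y axis grows downward are all hypothetical; only the four pattern, gesture, and action pairings come from the description.

```python
from typing import Dict, FrozenSet, Iterable, List, Tuple

# (recognized patterns, motion label) -> actions, following FIGS. 12A to 12D.
GESTURE_DB: Dict[Tuple[FrozenSet[str], str], List[str]] = {
    (frozenset({"R2", "R3"}), "still"): ["send_image_to_server", "send_email_with_image"],
    (frozenset({"R2", "R3", "R4"}), "still"): ["send_email_with_image"],
    (frozenset({"R2", "R3"}), "down"): ["save_image"],
    (frozenset({"R2", "R3", "R4"}), "up"): ["launch_application"],
}


def classify_motion(vector: Tuple[float, float], threshold: float = 1.0) -> str:
    """Reduce a motion vector to a coarse label.

    Assumes image coordinates in which y grows downward, so a positive
    vertical component means top-to-bottom movement.
    """
    dx, dy = vector
    if abs(dx) < threshold and abs(dy) < threshold:
        return "still"
    if dy >= threshold:
        return "down"
    if dy <= -threshold:
        return "up"
    return "other"  # purely horizontal motion has no entry in this sketch


def lookup_actions(patterns: Iterable[str], vector: Tuple[float, float]) -> List[str]:
    """Return the actions registered for this pattern set and motion, if any."""
    key = (frozenset(patterns), classify_motion(vector))
    return GESTURE_DB.get(key, [])
```

Keying on a frozen set makes the lookup independent of the order in which the patterns were detected, and an unregistered combination simply yields no action.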

According to the modification, the client device 10 executes an action in response to a gesture made with fingers on which nail caps NC are worn. The user can thus give user instructions to the client device 10 using only the fingers wearing the nail caps NC. This is particularly useful when the user is visually impaired, because the user can give a variety of instructions to the client device 10 by finger movement alone, without looking at a display.

(5) Summary of the present embodiment
The present embodiment is summarized below.

A first aspect of the present embodiment is
a client device 10 connected to a server 30 capable of generating audio output data regarding an object included in an image, the client device 10 comprising:
means (for example, the processor 12 that executes the processing of step S100) for acquiring image data of an image IMG that includes at least one attachment (for example, a nail cap NC) worn on a finger of a user (for example, a visually impaired person) and at least one object;
means (for example, the processor 12 that executes the processing of step S100) for transmitting the image data to the server 30;
means (for example, the processor 12 that executes the processing of step S101) for receiving, from the server 30, audio output data (for example, text data) for outputting audio regarding the object included in the image IMG; and
means (for example, the processor 12 that executes the processing of step S101) for outputting audio based on the audio output data.

According to the first aspect, the client device 10 transmits, to the server 30, image data of an image including an object OBJ and an attachment (for example, a nail cap NC) worn on a finger of a user (for example, a visually impaired person), and outputs audio regarding the object OBJ. This eliminates the restrictions on the range in which a visually impaired person can receive audio guidance.

In particular, because the acquired image data includes the nail cap NC worn on the finger, audio guidance can be provided for an image closer to the user's line of sight.
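
The flow of the first aspect can be sketched as follows. This is a minimal illustration in Python under stated assumptions, not the implementation of the embodiment: `Frame`, `run_client_once`, and the three callables are hypothetical names, and the camera, the server 30, and the speaker are replaced by local stubs so that the sketch runs on its own.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Frame:
    """Stand-in for image data; the fields are hypothetical labels, not pixels."""
    attachments: List[str]  # patterns of nail caps visible in the image
    objects: List[str]      # objects visible in the image


def run_client_once(
    acquire: Callable[[], Frame],
    send_to_server: Callable[[Frame], str],
    speak: Callable[[str], None],
) -> str:
    """One round trip: acquire image data, send it, receive text, output it."""
    frame = acquire()             # acquire image data (step S100)
    text = send_to_server(frame)  # transmit (step S100) and receive (step S101)
    speak(text)                   # output audio based on the audio output data
    return text


# Usage with stubs in place of the camera, the server 30, and the speaker.
spoken: List[str] = []
result = run_client_once(
    acquire=lambda: Frame(attachments=["A"], objects=["cup"]),
    send_to_server=lambda f: f"This is a {f.objects[0]}.",
    speak=spoken.append,
)
```

Passing the camera, network, and audio output as callables is only a convenience for this sketch; it keeps the order of the steps visible without assuming any particular hardware or protocol.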

The second aspect of this embodiment is
the client device 10, wherein the output audio includes the name of the object.

According to the second aspect, the user can learn, through the output audio, the name of an object in the vicinity of the nail cap NC.
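
One way to pick the object "in the vicinity of" the nail cap NC is to compare positions in the image. The sketch below is an assumption-laden illustration: it supposes that the positions of the nail cap and of each detected object are already available as pixel coordinates, and it uses Euclidean distance, which the text does not prescribe. `nearest_object` and the sample coordinates are hypothetical.

```python
import math
from typing import List, Optional, Tuple

Point = Tuple[float, float]


def nearest_object(cap: Point, objects: List[Tuple[str, Point]]) -> Optional[str]:
    """Return the name of the object whose position is closest to the nail cap."""
    if not objects:
        return None  # nothing was detected in the image
    name, _ = min(objects, key=lambda item: math.dist(cap, item[1]))
    return name


# Hypothetical detections: (name, center position in pixels).
detected = [("cup", (120.0, 80.0)), ("book", (300.0, 210.0))]
closest = nearest_object((130.0, 90.0), detected)
```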

The third aspect of this embodiment is
a client device 10 connected to a server 30 capable of executing image analysis, the client device 10 comprising:
means for acquiring image data (for example, the processor 12 that executes the process of step S100);
means for determining whether the image data includes an image of at least one attachment (for example, a nail cap NC) worn on a finger of a user (for example, the processor 12 that executes the process of step S100);
means for transmitting the image data to the server 30 if the image data includes an image of the attachment (for example, the processor 12 that executes the process of step S100); and
means for receiving an analysis result for the image data from the server 30 (for example, the processor 12 that executes the process of step S101).

According to the third aspect, when the client device 10 recognizes an image including an attachment (for example, a nail cap NC) worn on a finger of a user (for example, a visually impaired person), the client device 10 transmits the image data of that image to the server 30. The user can thus easily give a user instruction for transmitting image data to the server 30.
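
The conditional transmission of the third aspect can be sketched as follows. This is only an illustration: `send_if_attachment_present`, `detect_attachments`, and `fake_send` are hypothetical names, and the detector and the server 30 are replaced by stubs.

```python
from typing import Callable, List, Optional, Sequence


def send_if_attachment_present(
    image_data: bytes,
    detect_attachments: Callable[[bytes], Sequence[str]],
    send_to_server: Callable[[bytes], str],
) -> Optional[str]:
    """Send image data only when at least one attachment is found in it."""
    if not detect_attachments(image_data):
        return None  # no nail cap in the image, so nothing is transmitted
    return send_to_server(image_data)  # analysis result returned by the server


sent: List[bytes] = []


def fake_send(data: bytes) -> str:
    """Stub for the server 30: records what was sent and returns a fixed result."""
    sent.append(data)
    return "cup"


with_cap = send_if_attachment_present(b"img1", lambda d: ["A"], fake_send)
without_cap = send_if_attachment_present(b"img2", lambda d: [], fake_send)
```

In this sketch the presence of the nail cap itself acts as the user instruction: image data without an attachment never leaves the client.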

The fourth aspect of this embodiment is
the client device 10, wherein the acquiring means acquires the image data from a camera connected to the client device 10 (for example, the camera unit 50) or from a camera arranged in the client device 10.

The fifth aspect of this embodiment is
the client device 10, wherein the acquiring means acquires the image data when the attachment is recognized.

According to the fifth aspect, the user can give a user instruction for generating image data simply by holding a finger wearing the nail cap NC within the angle of view of the lens 50a.

The sixth aspect of this embodiment is
the client device 10, wherein the acquiring means acquires the image data when a pattern formed on the attachment is recognized.

The seventh aspect of this embodiment is
the client device 10, further comprising means for executing processing according to a combination of recognized patterns (for example, the processor 12 that executes steps S110 to S111).

According to the seventh aspect, the user can give a user instruction to the client device 10 through the combination of fingers captured through the lens 50a.
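
Selecting processing from a combination of recognized patterns can be sketched as a table lookup. The pattern labels ("A", "B", "C") and the action names below are hypothetical and chosen only for illustration; the sketch assumes that the order in which patterns are recognized does not matter, which is why a `frozenset` is used as the key.

```python
from typing import Dict, FrozenSet, Iterable, Optional

# Hypothetical mapping from a combination of recognized patterns to an action.
ACTIONS: Dict[FrozenSet[str], str] = {
    frozenset({"A"}): "describe_object",
    frozenset({"A", "B"}): "repeat_last_audio",
    frozenset({"A", "B", "C"}): "stop_guidance",
}


def action_for(patterns: Iterable[str]) -> Optional[str]:
    """Look up the action for the set of patterns recognized in one image."""
    return ACTIONS.get(frozenset(patterns))


# The same two fingers give the same action regardless of recognition order.
first = action_for(["A", "B"])
second = action_for(["B", "A"])
```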

The eighth aspect of this embodiment is
the client device 10, wherein the executing means executes processing according to a combination of the recognized pattern and the movement of the attachment.

According to the eighth aspect, the user can give a user instruction to the client device 10 through the combination of fingers captured through the lens 50a and the movement of those fingers.
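
Combining a recognized pattern with the movement of the attachment can be sketched as follows. The movement classes (`hold`, `swipe_left`, `swipe_right`), the 20-pixel threshold, the pattern labels, and the action names are all assumptions made for this illustration; the text does not specify how movement is classified.

```python
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]


def classify_movement(track: List[Point], threshold: float = 20.0) -> str:
    """Classify a nail cap's track as a horizontal swipe or a hold."""
    if len(track) < 2:
        return "hold"
    dx = track[-1][0] - track[0][0]  # net horizontal displacement in pixels
    if dx > threshold:
        return "swipe_right"
    if dx < -threshold:
        return "swipe_left"
    return "hold"


# Hypothetical table keyed by (pattern, movement).
GESTURE_ACTIONS: Dict[Tuple[str, str], str] = {
    ("A", "hold"): "describe_object",
    ("A", "swipe_right"): "next_object",
    ("B", "swipe_left"): "previous_object",
}


def gesture_action(pattern: str, track: List[Point]) -> Optional[str]:
    """Return the action for a pattern and its observed movement, if any."""
    return GESTURE_ACTIONS.get((pattern, classify_movement(track)))


swipe = gesture_action("A", [(0.0, 0.0), (50.0, 0.0)])
hold = gesture_action("A", [(0.0, 0.0), (5.0, 0.0)])
```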

The ninth aspect of this embodiment is
a server 30 connected to a client device 10, the server 30 comprising:
means for acquiring image data from the client device 10 (for example, the processor 32 that executes the process of step S300);
means for estimating the name of an object included in an image corresponding to the acquired image data (for example, the processor 32 that executes the process of step S301);
means for generating audio output data for outputting audio including the estimated name of the object (for example, the processor 32 that executes the process of step S302); and
means for transmitting the generated audio output data to the client device 10 (for example, the processor 32 that executes the process of step S303).
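
The server-side steps S300 to S303 can be sketched as a single function. This is a minimal illustration under assumptions: `handle_image` and `estimate_name` are hypothetical names, the image-recognition step is replaced by a stub, and the audio output data is represented as a small dictionary carrying text, in line with the text data mentioned as an example in the first aspect.

```python
from typing import Callable, Dict


def handle_image(
    image_data: bytes,
    estimate_name: Callable[[bytes], str],
) -> Dict[str, str]:
    """Build audio output data (as text) for image data received from a client."""
    # Step S300: image_data has been acquired from the client device.
    name = estimate_name(image_data)  # step S301: estimate the object's name
    audio_output_data = {             # step S302: generate audio output data
        "format": "text",
        "text": f"The object is: {name}",
    }
    return audio_output_data          # step S303: sent back to the client


# Usage with a stub estimator in place of a real image-recognition model.
response = handle_image(b"\x89PNG...", lambda data: "cup")
```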

(6) Other modifications

The storage device 11 may be connected to the client device 10 via the network NW. The storage device 31 may be connected to the server 30 via the network NW.

Each step of the information processing described above can be executed by either the client device 10 or the server 30.

The camera unit 50 may be built into the client device 10.

The pattern formed on each nail cap NC is not limited to letters. In particular, forming a highly aesthetic pattern on the nail cap NC can motivate the user to wear the nail cap NC and to use the audio guidance of this embodiment. The pattern may include the following:
・Figures
・Geometric patterns
・Surface unevenness
・Color variations

Although the embodiments of the present invention have been described in detail above, the scope of the present invention is not limited to these embodiments. Various improvements and changes can be made to the embodiments without departing from the spirit of the present invention. The embodiments and modifications described above can also be combined.

1: Information processing system
10: Client device
11: Storage device
12: Processor
13: Input/output interface
14: Communication interface
30: Server
31: Storage device
32: Processor
33: Input/output interface
34: Communication interface
50: Camera unit
50a: Lens
50b: Speaker
50c: Clip
50d: Image sensor
50e: Camera controller

Claims

1. A client device connected to a server configured to generate audio output data regarding an object included in an image, the client device comprising:
means for acquiring image data of an image including the object;
means for determining whether the image data includes an image of at least one attachment worn on a user's finger;
means for transmitting the image data to the server if the image data includes an image of the attachment; and
means for receiving, from the server, audio output data regarding the object included in the image.

2. The client device according to claim 1, wherein the acquiring means acquires the image data from a camera connected to the client device or from a camera arranged in the client device.

3. The client device according to claim 1 or claim 2, wherein the transmitting means transmits the image data when the attachment is recognized.

4. The client device according to claim 3, wherein the transmitting means transmits the image data when a pattern formed on the attachment is recognized.

5. The client device according to claim 1, further comprising means for executing, when patterns formed on the attachment are recognized, processing according to a combination of the recognized patterns.

6. The client device according to claim 5, wherein the executing means executes processing according to a combination of the recognized pattern and the movement of the attachment.

7. The client device according to any one of claims 1 to 6, wherein the acquiring means acquires image data of a plurality of attachments worn on respective fingers of the user.

8. The client device according to claim 7, wherein the acquiring means acquires image data of attachments worn on the right hand and the left hand of the user.

9. The client device according to claim 8, wherein the acquiring means acquires image data of attachments worn on each finger of the right hand and each finger of the left hand of the user.

10. The client device according to any one of claims 7 to 9, wherein the plurality of attachments are formed with mutually different patterns, and the acquiring means acquires image data including the patterns.

11. The client device according to any one of claims 7 to 10, wherein the acquiring means acquires image data of an attachment worn on a specific finger.

12. The client device according to any one of claims 1 to 11, wherein the user is a visually impaired person.

13. A server connected to a client device, the client device comprising means for acquiring image data, means for determining whether the image data includes an image of at least one attachment worn on a user's finger, means for transmitting the image data to the server if the image data includes an image of the attachment, and means for receiving an analysis result for the image data from the server, the server comprising:
means for acquiring the image data from the client device;
means for estimating the name of an object included in an image corresponding to the acquired image data;
means for generating audio output data for outputting audio including the estimated name of the object; and
means for transmitting the generated audio output data to the client device.

14. The server according to claim 13, further comprising means for specifying the position of the attachment and the position of the object in the image, wherein the estimating means estimates the name of the object closest to the attachment.

15. A program for causing a computer to function as each of the means according to any one of claims 1 to 14.

16. An information processing method for generating, using a computer, audio output data regarding an object included in an image, the method comprising:
a step of acquiring image data of an image including a plurality of nail caps worn on a user's fingers and at least one object;
a step of transmitting the image data to a server;
a step of receiving, from the server, audio output data for outputting audio regarding the object included in the image;
a step of outputting audio based on the audio output data; and
a step of executing, when the patterns formed on the plurality of nail caps are recognized, processing according to the combination of the recognized patterns.