JP7382847B2

JP7382847B2 - Information processing method, program, and information processing device

Info

Publication number: JP7382847B2
Application number: JP2020025389A
Authority: JP
Inventors: 慧 ▲柳▼澤
Original assignee: Mercari Inc
Current assignee: Mercari Inc
Priority date: 2020-02-18
Filing date: 2020-02-18
Publication date: 2023-11-17
Anticipated expiration: 2040-02-18
Also published as: JP2021131617A

Description

本開示は、情報処理方法、プログラム、及び情報処理装置に関する。 The present disclosure relates to an information processing method, a program, and an information processing device.

以前から、ＣｔｏＣ（Customer To Customer）マーケットプレイスなどの電子商取引プラットフォームにおいて、個人売買を仲介するシステムが公開されている（例えば、特許文献１参照）。 Systems that mediate sales between individuals on electronic commerce platforms such as CtoC (Customer To Customer) marketplaces have been publicly available for some time (see, for example, Patent Document 1).

特開２００１－１６７１６３号公報 Japanese Patent Application Publication No. 2001-167163

しかしながら、従来技術では、電子商取引などのユーザインタフェースに慣れないユーザにとって、どのように設定、登録等したら利用可能になるのかを理解することが難しい場合に、説明文などの情報しか存在せず、設定方法等を適切に理解することができなかった。 However, with conventional technology, when users who are not accustomed to user interfaces such as those of electronic commerce find it difficult to understand how to set up, register, and so on in order to start using a service, only information such as explanatory text is available, and such users have been unable to properly understand the setting method and the like.

本開示は、所定画面からユーザインタフェースを用いて設定等がされる場合に、適切なガイドを行うことを可能にする仕組みを提供する情報処理方法、プログラム、及び情報処理装置を提供することを目的の一つとする。 One object of the present disclosure is to provide an information processing method, a program, and an information processing device that provide a mechanism enabling appropriate guidance when settings and the like are made from a predetermined screen using a user interface.

本開示の一実施形態に係る情報処理方法は、情報処理装置に含まれる１又は複数のプロセッサが、撮影装置により撮影中の画像を表示制御することと、前記画像内に表示される他の情報処理装置の画面を特定することと、特定された画面内に表示される文字が認識され、当該文字の文字情報を含む認識結果を取得することと、前記認識結果に対応するガイド情報を取得することと、前記ガイド情報を、対応する文字情報に関連付けて前記撮影中の画像に重畳して表示制御することと、を実行する。 In an information processing method according to an embodiment of the present disclosure, one or more processors included in an information processing device execute: controlling the display of an image being photographed by a photographing device; identifying a screen of another information processing device displayed in the image; acquiring a recognition result that is obtained by recognizing characters displayed in the identified screen and that includes character information of the characters; acquiring guide information corresponding to the recognition result; and controlling the display of the guide information by associating it with the corresponding character information and superimposing it on the image being photographed.

開示の技術によれば、所定画面からユーザインタフェースを用いて設定等がされる場合に、適切なガイドを行うことを可能にする仕組みを提供することができる。 According to the disclosed technology, it is possible to provide a mechanism that allows appropriate guidance to be provided when settings are made from a predetermined screen using a user interface.

実施形態における情報処理システム１の各構成例を示す図である。 FIG. 1 is a diagram illustrating configuration examples of an information processing system 1 in an embodiment.
実施形態に係るユーザ端末１０Ａの一例を示すブロック図である。 FIG. 2 is a block diagram illustrating an example of a user terminal 10A according to the embodiment.
実施形態に係るサーバ２０の一例を示すブロック図である。 FIG. 3 is a block diagram illustrating an example of a server 20 according to the embodiment.
実施形態に係る物体データ２３３の一例を示す図である。 FIG. 4 is a diagram illustrating an example of object data 233 according to the embodiment.
実施形態に係る文字認識データ２３４の一例を示す図である。 FIG. 5 is a diagram illustrating an example of character recognition data 234 according to the embodiment.
実施形態に係るガイド格納先データ２３５の一例を示す図である。 FIG. 6 is a diagram illustrating an example of guide storage location data 235 according to the embodiment.
実施形態に係る情報処理システム１の登録処理の一例を示すシーケンス図である。 FIG. 7 is a sequence diagram illustrating an example of registration processing of the information processing system 1 according to the embodiment.
実施形態に係る情報処理システム１の表示処理の一例を示すシーケンス図である。 FIG. 8 is a sequence diagram illustrating an example of display processing of the information processing system 1 according to the embodiment.
実施形態に係るユーザ端末１０Ａにおける画面遷移の一例を示す図である。 FIG. 9 is a diagram illustrating an example of screen transitions on the user terminal 10A according to the embodiment.

以下、本開示の実施形態について図面を参照しつつ詳細に説明する。なお、同一の要素には同一の符号を付し、重複する説明を省略する。 Hereinafter, embodiments of the present disclosure will be described in detail with reference to the drawings. Note that the same elements are given the same reference numerals and redundant explanations will be omitted.

［実施形態］ [Embodiment]
実施形態では、情報処理装置の撮影装置により撮影中の画像を表示し、他の情報処理装置の表示部に表示された所定画面を特定し、所定画面に含まれる各項目に対してユーザが設定する場合に、各項目の入力方法をガイドするためのガイド情報を重畳表示する方法、プログラム、装置、システム等について説明する。 In the embodiment, a method, program, device, system, and the like will be described in which an image being photographed by a photographing device of an information processing device is displayed, a predetermined screen displayed on a display unit of another information processing device is identified, and, when a user makes settings for each item included in the predetermined screen, guide information for guiding the input method of each item is displayed in a superimposed manner.

実施形態では、情報処理装置としてウェラブル端末のスマートグラスを例にし、他の情報処理装置としてスマートフォンを例にして説明するが、この例に限られないことはいうまでもない。例えば、情報処理装置は、撮影装置を内蔵する又は外付け可能な装置でもよく、他の情報処理装置は、画面からユーザが操作するような装置であればよい。 In the embodiment, smart glasses, which are a wearable terminal, are used as an example of the information processing device, and a smartphone is used as an example of the other information processing device, but it goes without saying that the present disclosure is not limited to this example. For example, the information processing device may be any device that has a built-in photographing device or to which a photographing device can be externally attached, and the other information processing device may be any device that a user operates from a screen.

ユーザは、所定画面の設定方法に関するガイド情報が表示部に重畳表示されるので、適切なガイドを容易に把握することができる。また、ガイド情報が重畳表示されることにより、ユーザは、別途説明文を読んだりせずにすむ。 Since the guide information regarding the method of setting the predetermined screen is superimposed on the display unit, the user can easily understand the appropriate guide. Furthermore, since the guide information is displayed in a superimposed manner, the user does not have to read a separate explanatory text.

＜システムの適用例＞ <System application example>
図１は、実施形態における情報処理システム１の各構成例を示す図である。図１に示す例では、ユーザが利用する各情報処理装置１０Ａ、１０Ｂ・・・と、物体認識処理を実行したり、文字認識処理を実行したり、ガイド情報を記憶したりするサーバ２０とが、ネットワークＮを介して接続される。なお、サーバ２０は、複数のサーバやデータベースで構成されてもよく、機能ごとに１つのサーバで処理されたり、各データを１つのデータベースで保存したりしてもよい。 FIG. 1 is a diagram illustrating configuration examples of an information processing system 1 in an embodiment. In the example shown in FIG. 1, the information processing devices 10A, 10B, and so on used by users are connected via a network N to a server 20 that executes object recognition processing, executes character recognition processing, and stores guide information. Note that the server 20 may be composed of a plurality of servers and databases; each function may be handled by its own server, and each type of data may be stored in its own database.

情報処理装置１０Ａは、例えば、ウェアラブル端末（限定でなく例として、メガネ型デバイスなど）である。ウェアラブル端末は、ユーザが装着する電子デバイスである。ウェアラブル端末は、例えば、メガネ型端末（スマートグラス）、コンタクトレンズ型端末（スマートコンタクトレンズ）、拡張現実（ＡＲ: Augmented Reality）技術を用いたヘッドマウントディスプレイ、義眼、ブレイン・マシン・インタフェース等であってもよい。また、ウェアラブル端末はスマートスピーカー、ロボット等、ユーザが装着できない端末でもよい。本実施形態においては、上述したように、ウェアラブル端末がメガネ型端末（スマートグラス）である場合を例に説明する。なお、情報処理装置１０Ａは、ウェアラブル端末に限らず、スマートフォンやタブレット端末などの情報処理端末であってもよい。また、以下、情報処理装置１０Ａは、ユーザ端末１０Ａとも呼ばれる。 The information processing device 10A is, for example, a wearable terminal (by way of example and not limitation, a glasses-type device). A wearable terminal is an electronic device worn by a user. The wearable terminal may be, for example, a glasses-type terminal (smart glasses), a contact-lens-type terminal (smart contact lens), a head-mounted display using augmented reality (AR) technology, an artificial eye, or a brain-machine interface. The wearable terminal may also be a terminal that the user cannot wear, such as a smart speaker or a robot. In this embodiment, as described above, the case where the wearable terminal is a glasses-type terminal (smart glasses) will be described as an example. Note that the information processing device 10A is not limited to a wearable terminal and may be an information processing terminal such as a smartphone or a tablet terminal. Hereinafter, the information processing device 10A is also referred to as the user terminal 10A.

情報処理装置１０Ｂは、例えば、スマートフォン、携帯電話（フィーチャーフォン）、コンピュータ、ＰＤＡ（Personal Digital Assistant）、券売機、宅配ロッカーや宅配ボックス、テレビ、家電のリモートコントローラー、スクリーンを含む表示装置などである。また、情報処理装置１０Ｂは、所定画面を用いてユーザに操作を行わせるような装置であればよい。また、以下、情報処理装置１０Ｂは、ユーザ端末１０Ｂとも呼ばれる。 The information processing device 10B is, for example, a smartphone, a mobile phone (feature phone), a computer, a PDA (Personal Digital Assistant), a ticket vending machine, a delivery locker or delivery box, a television, a remote controller for home appliances, a display device including a screen, etc. . Further, the information processing device 10B may be any device that allows the user to perform operations using a predetermined screen. Further, the information processing device 10B is also referred to as the user terminal 10B hereinafter.

情報処理装置２０は、例えばサーバであり、１又は複数の装置により構成されてもよい。また、情報処理装置２０は、物体認識処理を実行したり、文字認識処理を実行したり、ガイド情報を記憶したり、電子商取引プラットフォームを管理したりするサーバである。 The information processing device 20 is, for example, a server, and may be composed of one or more devices. Further, the information processing device 20 is a server that executes object recognition processing, character recognition processing, stores guide information, and manages an electronic commerce platform.

図１に示す例では、ユーザ端末１０Ｂは、所定画面を表示し、例えば電子商取引プラットフォームに会員登録するための登録画面を表示するとする。ユーザ端末１０Ａは、内蔵又は外付けの撮影装置（例えばカメラ）を用いて、ユーザ端末１０Ｂの画面に表示された登録画面を撮影する。 In the example shown in FIG. 1, the user terminal 10B displays a predetermined screen, for example, a registration screen for registering as a member of an electronic commerce platform. The user terminal 10A uses a built-in or external photographing device (for example, a camera) to photograph the registration screen displayed on the screen of the user terminal 10B.

このとき、ユーザは、表示された登録画面に対してジェスチャを行ったり、音声で登録画面が撮影されていることを指示したりすると、ユーザ端末１０Ａは、登録画面を含む画像を取得し、取得した画像をサーバ２０に送信する。 At this time, when the user performs a gesture on the displayed registration screen or indicates by voice that the registration screen is being photographed, the user terminal 10A acquires an image including the registration screen and transmits the acquired image to the server 20.

サーバ２０は、取得した画像から物体（例えばスマートフォンや、券売機など）を認識し、認識した物体を識別する物体識別情報（物体ＩＤ）と、画像内における画面の位置情報を取得する。サーバ２０は、取得した物体ＩＤと位置情報とをユーザ端末１０Ｂに送信する。 The server 20 recognizes an object (for example, a smartphone, a ticket vending machine, etc.) from the acquired image, and acquires object identification information (object ID) for identifying the recognized object and screen position information within the image. The server 20 transmits the acquired object ID and position information to the user terminal 10B.

ユーザ端末１０Ｂは、サーバ２０から取得した位置情報に基づき、画像から文字認識する領域を特定し、特定した領域の画像（領域画像）から文字認識を行う。文字認識処理について、領域画像がサーバ２０に送信され、ユーザ端末１０Ｂは、サーバ２０による文字の認識結果を取得するようにしてもよい。認識結果は、例えば、「登録画面」の画面名、「氏名」、「メールアドレス」などの各入力項目である。 The user terminal 10B specifies a region for character recognition from the image based on the position information acquired from the server 20, and performs character recognition from the image of the specified region (region image). Regarding the character recognition process, the region image may be transmitted to the server 20, and the user terminal 10B may acquire the character recognition result by the server 20. The recognition results are, for example, each input item such as the screen name of the "registration screen", "name", and "email address".

ユーザ端末１０Ｂは、認識結果に基づいて、「登録画面」に対応するガイド情報をサーバ２０から取得し、ガイド情報を表示部に重畳表示する。これにより、物体認識によりガイド情報を特定しつつ、適切なガイド情報を重畳表示することで、ユーザ端末１０Ｂの実際の登録画面はそのまま表示し、ユーザ端末１０Ｂとは異なるユーザ端末１０Ａを用いて、登録画面の入力や設定をアシストすることができる。 Based on the recognition result, the user terminal 10B acquires guide information corresponding to the "registration screen" from the server 20 and displays the guide information in a superimposed manner on the display unit. As a result, by identifying the guide information through object recognition and superimposing the appropriate guide information, the actual registration screen of the user terminal 10B is displayed as it is, and input and settings on the registration screen can be assisted using the user terminal 10A, which is different from the user terminal 10B.
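The flow described above (identify the screen, recognize its characters, look up guide information, and attach each guide to its item) can be sketched as follows. This is only an illustrative sketch: the names `TextItem`, `Overlay`, and `build_overlays`, the callback signatures, and the dictionary used for guide lookup are all assumptions introduced here, and the recognizers are passed in as functions because the source does not specify the server's API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in image pixels


@dataclass
class TextItem:
    text: str  # recognized string, e.g. an item name such as "Name"
    box: Box   # where the string sits in the captured frame


@dataclass
class Overlay:
    text: str   # item the guide belongs to
    guide: str  # guide content (plain text here; could be a video reference)
    box: Box    # anchor position for the superimposed guide


def build_overlays(
    frame: object,
    detect_screen: Callable[[object], Tuple[str, Box]],
    recognize_text: Callable[[object, Box], List[TextItem]],
    guides: Dict[str, str],
) -> List[Overlay]:
    """Identify the screen, read its text, and attach a guide to each item."""
    _object_id, screen_box = detect_screen(frame)  # object recognition step
    items = recognize_text(frame, screen_box)      # character recognition step
    return [
        Overlay(item.text, guides[item.text], item.box)
        for item in items
        if item.text in guides                     # skip items with no guide
    ]
```

In a real system the two callbacks would presumably wrap requests to the server 20, and the returned overlays would be handed to whatever component draws on the display.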

＜構成の一例＞ <Example of configuration>
図２は、実施形態に係るユーザ端末１０Ａの一例を示すブロック図である。ユーザ端末１０Ａは典型的には、１つ又は複数の処理装置（ＣＰＵ）１１０、１つ又は複数のネットワーク又は他の通信インタフェース１２０、メモリ１３０、ユーザインタフェース１５０、撮影装置１６０、及びこれらの構成要素を相互接続するための１つ又は複数の通信バス１７０を含む。 FIG. 2 is a block diagram showing an example of the user terminal 10A according to the embodiment. The user terminal 10A typically includes one or more processing units (CPUs) 110, one or more network or other communication interfaces 120, a memory 130, a user interface 150, a photographing device 160, and one or more communication buses 170 for interconnecting these components.

ユーザインタフェース１５０は、例えば、ディスプレイ装置１５１及び入力装置（キーボード及び／又はマウス、又は他の何らかのポインティングデバイス、音を入力可能なマイク等）１５２を備えるユーザインタフェース１５０である。また、ユーザインタフェース１５０は、タッチパネルでもよい。また、ユーザ端末１０Ａがウェアラブル端末１０Ａの場合、ディスプレイ装置１５１はレンズ、入力装置１５２はマイク等でもよい。 The user interface 150 is, for example, a user interface 150 that includes a display device 151 and an input device (a keyboard and/or a mouse, or some other pointing device, a microphone capable of inputting sound, etc.) 152. Further, the user interface 150 may be a touch panel. Furthermore, when the user terminal 10A is a wearable terminal 10A, the display device 151 may be a lens, the input device 152 may be a microphone, or the like.

撮影装置１６０は、画像（静止画像及び動画像を含む）を撮影するためのデバイスである。例えば、撮影装置１６０は、ＣＣＤイメージセンサ、ＣＭＯＳイメージセンサ、レンズ等の撮影素子を含んでいてもよい。 The photographing device 160 is a device for photographing images (including still images and moving images). For example, the photographing device 160 may include a photographing element such as a CCD image sensor, a CMOS image sensor, and a lens.

メモリ１３０は、例えば、ＤＲＡＭ、ＳＲＡＭ、ＤＤＲＲＡＭ又は他のランダムアクセス固体記憶装置などの高速ランダムアクセスメモリであり、また、１つ又は複数の磁気ディスク記憶装置、光ディスク記憶装置、フラッシュメモリデバイス、又は他の不揮発性固体記憶装置などの不揮発性メモリでもよい。 Memory 130 is, for example, a high speed random access memory such as DRAM, SRAM, DDR RAM or other random access solid state storage, and may also include one or more magnetic disk storage, optical disk storage, flash memory devices, or It may also be a nonvolatile memory such as another nonvolatile solid state storage device.

また、メモリ１３０の他の例として、ＣＰＵ１１０から遠隔に設置される１つ又は複数の記憶装置でもよい。ある実施形態において、メモリ１３０は次のプログラム、モジュール及びデータ構造、又はそれらのサブセットを格納する。 Further, as another example of the memory 130, one or more storage devices installed remotely from the CPU 110 may be used. In some embodiments, memory 130 stores the following programs, modules and data structures, or a subset thereof.

オペレーティングシステム１３１は、例えば、様々な基本的なシステムサービスを処理するとともにハードウェアを用いてタスクを実行するためのプロシージャを含む。 Operating system 131 includes, for example, procedures for handling various basic system services and performing tasks using hardware.

ネットワーク通信モジュール１３２は、例えば、ユーザ端末１０Ａを他のコンピュータに、１つ又は複数のネットワーク通信インタフェース１２０及び、インターネット、他の広域ネットワーク、ローカルエリアネットワーク、メトロポリタンエリアネットワークなどの１つ又は複数の通信ネットワークを介して接続するために使用される。 The network communications module 132 may, for example, connect the user terminal 10A to another computer via one or more network communications interfaces 120 and one or more communications networks, such as the Internet, other wide area networks, local area networks, metropolitan area networks, etc. Used to connect over a network.

画像関連データ１３３は、撮影中に撮影された画像データに関連して取得可能なデータである。例えば、画像関連データ１３３は、画像データを識別するための画像ＩＤ、画像内の物体の物体ＩＤ、画面の位置情報、領域画像などを含む。これらのデータは、画像データから物体認識に関連して取得可能である。 The image-related data 133 is data that can be obtained in relation to image data taken during shooting. For example, the image-related data 133 includes an image ID for identifying image data, an object ID of an object within the image, screen position information, a region image, and the like. These data can be obtained in connection with object recognition from image data.

テキスト関連データ１３４は、撮影中の画像から文字認識して抽出された文字を含むテキストデータや、ユーザにより入力されたテキストデータに関連するデータを含む。また、テキスト関連データは、ユーザの音声データを音声認識して取得されたりしてもよい。ユーザにより入力装置１５２を操作されることで取得されたりする。 The text-related data 134 includes text data including characters extracted by character recognition from the image being photographed, and data related to text data input by the user. Further, the text-related data may be acquired by voice recognition of the user's voice data. The information may be acquired by operating the input device 152 by the user.

メモリ１３０には、ガイド情報を表示する表示処理を行うモジュールと、ガイド情報を登録する登録処理を行うモジュールとを有する。まず、表示処理を行うモジュールについて説明する。 The memory 130 includes a module that performs display processing to display guide information, and a module that performs registration processing to register guide information. First, a module that performs display processing will be explained.

＜表示処理＞ <Display processing>
特定モジュール１３５は、撮影中の画像内に表示される他の情報処理装置（例えばユーザ端末１０Ｂ）の画面を特定する。例えば、特定モジュール１３５は、撮影中の画像から一の画像を取得し、サーバ２０に物体認識をリクエストする。特定モジュール１３５は、サーバ２０から、認識された物体の画面情報（例えば画像内における画面の位置情報）を取得する。具体例として、特定モジュール１３５は、サーバ２０側が公開する物体認識ＡＰＩを用いて、認識された物体の物体ＩＤや位置情報を取得してもよい。 The identification module 135 identifies the screen of another information processing device (for example, the user terminal 10B) that is displayed within the image being photographed. For example, the identification module 135 acquires one image from among the images being photographed and requests object recognition from the server 20. The identification module 135 acquires screen information of the recognized object (for example, position information of the screen within the image) from the server 20. As a specific example, the identification module 135 may use an object recognition API published by the server 20 to obtain the object ID and position information of the recognized object.

取得モジュール１３６は、特定された画面内に表示される文字が認識され、この文字の文字情報を含む認識結果を取得する。例えば、取得モジュール１３６は、画面の位置情報に基づいて切り出した領域画像をサーバ２０に送信し、サーバ２０側で領域画像内の文字列が認識されて、その文字列の文字情報を含む認識結果をサーバ２０から取得する。具体例として、取得モジュール１３６は、サーバ２０側が公開する文字認識ＡＰＩを用いて、領域画像内の文字列（テキスト）や文字列の位置情報（例えば氏名、メールアドレスなど）を取得してもよい。 The acquisition module 136 acquires a recognition result that is obtained by recognizing characters displayed in the identified screen and that includes character information of the characters. For example, the acquisition module 136 transmits to the server 20 a region image cut out based on the position information of the screen; the server 20 recognizes the character strings in the region image, and the acquisition module 136 acquires from the server 20 a recognition result including the character information of those character strings. As a specific example, the acquisition module 136 may use a character recognition API published by the server 20 to acquire the character strings (text) in the region image and the position information of the character strings (for example, name, email address, etc.).

また、取得モジュール１３６は、文字の認識結果に対応するガイド情報を取得する。例えば、取得モジュール１３６は、認識結果の文字情報を項目ごとにサーバ２０に送信し、サーバ２０側で各項目の文字情報に対応するガイド情報が特定され、特定されたガイド情報をサーバ２０から取得する。 The acquisition module 136 also acquires guide information corresponding to the character recognition result. For example, the acquisition module 136 transmits the character information of the recognition result to the server 20 item by item; the server 20 identifies the guide information corresponding to the character information of each item, and the acquisition module 136 acquires the identified guide information from the server 20.

表示制御モジュール１３７は、撮影装置１６０により撮影中の画像をディスプレイ１５１（表示部）に表示制御する。また、表示制御モジュール１３７は、取得されたガイド情報を、対応する項目又は文字情報に関連付けて、撮影中の画像に重畳して表示制御する。例えば、表示制御モジュール１３７は、取得されたガイド情報を、対応する文字情報の位置に関連付けて、入力項目の表示を維持するようにガイド情報を、ＡＲ（Augmented Reality）技術を用いて重畳して表示制御する。表示制御モジュール１３７は、入力項目の表示を維持するため、ガイド情報を透明化したり、入力項目の位置に重複しないように重畳表示したりしてもよい。また、以下においてガイド情報が重畳して表示される場合は、ＡＲ技術等が用いられてもよい。 The display control module 137 controls the display of the image being photographed by the photographing device 160 on the display 151 (display unit). The display control module 137 also controls the display of the acquired guide information by associating it with the corresponding item or character information and superimposing it on the image being photographed. For example, the display control module 137 associates the acquired guide information with the position of the corresponding character information and uses AR (Augmented Reality) technology to superimpose the guide information in such a way that the input items remain visible. To keep the input items visible, the display control module 137 may make the guide information transparent or may superimpose it at a position that does not overlap the input items. In the following, whenever guide information is displayed in a superimposed manner, AR technology or the like may be used.

これにより、ユーザは、所定画面の設定方法に関するガイド情報が表示部に重畳表示されるので、適切なガイドを容易に把握することができる。また、物体認識によりガイド情報を特定しつつ、適切なガイド情報が表示部に重畳表示されることで、ユーザ端末１０Ｂの所定画面はそのまま表示し、ユーザ端末１０Ｂとは異なるユーザ端末１０Ａを用いて、所定画面の入力や設定をアシストすることができる。 As a result, the user can easily grasp the appropriate guidance because the guide information on how to make settings on the predetermined screen is displayed in a superimposed manner on the display unit. In addition, because the guide information is identified through object recognition and the appropriate guide information is superimposed on the display unit, the predetermined screen of the user terminal 10B is displayed as it is, and input and settings on the predetermined screen can be assisted using the user terminal 10A, which is different from the user terminal 10B.
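One way to superimpose a guide without covering its input item is to try a few positions adjacent to the item and take the first that fits inside the frame. The sketch below illustrates that idea only; the function name `place_guide`, the candidate order (right, below, above, left), and the `gap` parameter are assumptions made for illustration and are not taken from the source. When no position fits, the function returns `None`, in which case a caller might fall back to drawing the guide semi-transparently over the item.

```python
from typing import Optional, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


def place_guide(item: Box, guide_w: int, guide_h: int,
                frame_w: int, frame_h: int, gap: int = 4) -> Optional[Box]:
    """Return a box next to `item` that fits in the frame, or None."""
    left, top, right, bottom = item
    candidates = [
        (right + gap, top),           # to the right of the item
        (left, bottom + gap),         # below the item
        (left, top - gap - guide_h),  # above the item
        (left - gap - guide_w, top),  # to the left of the item
    ]
    for x, y in candidates:
        fits_horizontally = x >= 0 and x + guide_w <= frame_w
        fits_vertically = y >= 0 and y + guide_h <= frame_h
        if fits_horizontally and fits_vertically:
            return (x, y, x + guide_w, y + guide_h)
    return None  # no non-overlapping position fits
```

Each candidate is offset from the item by at least `gap` pixels on one axis, so a returned box never overlaps the item itself. Overlap with other input items is not checked in this sketch.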

取引制御モジュール１３８は、電子商取引プラットフォームにおいて商品の売買を制御し、例えば、出品や購入の手続き処理を制御する。なお、ガイド情報は、電子商取引プラットフォームにおけるデータベースに格納されているガイド情報から検索されてもよい。 The transaction control module 138 controls buying and selling of products on the electronic commerce platform, and controls, for example, listing and purchasing procedures. Note that the guide information may be searched from guide information stored in a database in the electronic commerce platform.

検知モジュール１３９は、撮影された画像からユーザのジェスチャを検知する。例えば、検知モジュール１３９は、タップなどの所定のジェスチャを検知する。また、検知モジュール１３９により検知されたモジュールに対応する処理が実行されてもよい。 The detection module 139 detects the user's gesture from the captured image. For example, the detection module 139 detects a predetermined gesture such as a tap. Furthermore, processing corresponding to the module detected by the detection module 139 may be executed.

また、表示制御モジュール１３７は、ガイド情報を重畳して表示制御することに、
ユーザのハンドの位置に対応する文字情報を特定すること、
特定された文字情報に対応するガイド情報を特定すること、
特定されたガイド情報を、特定された文字情報に関連付けて撮影中の画像に重畳して表示制御すること、
を含んでもよい。
In addition, controlling the display of the superimposed guide information by the display control module 137 may include:
identifying character information corresponding to the position of the user's hand;
identifying guide information corresponding to the identified character information; and
controlling the display of the identified guide information by associating it with the identified character information and superimposing it on the image being photographed.

例えば、表示制御モジュール１３７は、物体認識処理によりユーザの指先が認識され、認識された指先の位置が所定範囲内にある文字情報を特定することを含む。例えば、表示制御モジュール１３７は、特定された文字情報に対応付けられたガイド情報をサーバ２０から取得して特定することを含む。例えば、表示制御モジュール１３７は、特定されたガイド情報を、特定された文字情報の位置に関連付けて撮影中の画像に重畳して表示制御することを含む。 For example, the display control module 137 includes recognizing the user's fingertip through object recognition processing and identifying character information in which the position of the recognized fingertip is within a predetermined range. For example, the display control module 137 includes obtaining and specifying guide information associated with the specified character information from the server 20. For example, the display control module 137 includes controlling the display of the specified guide information in association with the position of the specified text information and superimposing it on the image being photographed.

これにより、ユーザが指定した位置に対応するガイド情報を個別に表示することができ、ユーザの選択順に応じて各文字情報に対応するガイド情報を表示することが可能になる。 This makes it possible to individually display the guide information corresponding to the position specified by the user, and to display the guide information corresponding to each piece of character information according to the user's selection order.
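The step of finding the character information that lies within a predetermined range of the recognized fingertip can be sketched as a nearest-box search. This is a minimal illustration under stated assumptions: the function name `find_pointed_item`, the representation of items as `(text, box)` pairs, and the use of Euclidean point-to-rectangle distance are choices made here, not details given in the source.

```python
import math
from typing import List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


def find_pointed_item(fingertip: Tuple[int, int],
                      items: List[Tuple[str, Box]],
                      max_distance: float) -> Optional[str]:
    """Return the text whose box is closest to the fingertip, if close enough."""
    fx, fy = fingertip
    best_text: Optional[str] = None
    best_dist = max_distance
    for text, (left, top, right, bottom) in items:
        # Distance from a point to a rectangle; zero when the point is inside.
        dx = max(left - fx, 0, fx - right)
        dy = max(top - fy, 0, fy - bottom)
        dist = math.hypot(dx, dy)
        if dist <= best_dist:
            best_text, best_dist = text, dist
    return best_text  # None when nothing lies within max_distance
```

The returned text would then serve as the key for requesting the matching guide information.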

また、特定モジュール１３５は、情報処理装置（ユーザ端末１０Ｂ）の画面が識別され、この画面の識別情報と、この画面の位置情報とを取得し、この位置情報に基づき文字認識する領域を特定してもよい。例えば、特定モジュール１３５は、物体ＩＤにより示される物体の画面部分の位置情報により示される領域を特定する。位置情報は、画像内の位置情報を含む。 Further, the identification module 135 identifies the screen of the information processing device (user terminal 10B), acquires the identification information of this screen and the position information of this screen, and identifies the area for character recognition based on this position information. It's okay. For example, the identification module 135 identifies the area indicated by the position information of the screen portion of the object indicated by the object ID. The position information includes position information within the image.

また、取得モジュール１３６は、特定された領域内の画像から文字が認識され、この文字の文字情報を含む認識結果を取得してもよい。例えば、取得モジュール１３６は、各入力項目又は設定項目に対応する項目名（文字列）の文字情報を含む文字認識結果をサーバ２０から取得する。 Further, the acquisition module 136 may acquire a recognition result that is obtained by recognizing characters from the image within the identified area and that includes character information of the characters. For example, the acquisition module 136 acquires from the server 20 a character recognition result including character information of the item name (character string) corresponding to each input item or setting item.

これにより、認識された物体により文字認識する領域を特定し、特定された領域内の画像について文字認識すればよいため、処理負荷を軽減することができる。 Thereby, it is sufficient to specify a character recognition area based on the recognized object and perform character recognition on an image within the specified area, so that the processing load can be reduced.
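Restricting character recognition to the screen region amounts to cropping the frame with the position information before sending it for recognition. The sketch below shows this on an image represented as a list of pixel rows; the function name `crop_region` and the `(left, top, right, bottom)` box convention are assumptions for illustration, and a real implementation would more likely use an imaging library.

```python
from typing import List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom), right/bottom exclusive


def crop_region(image: Sequence[Sequence[int]], box: Box) -> List[List[int]]:
    """Cut the screen region out of the frame, clamping the box to the image."""
    height = len(image)
    width = len(image[0]) if height else 0
    left, top, right, bottom = box
    left, right = max(0, left), min(width, right)
    top, bottom = max(0, top), min(height, bottom)
    return [list(row[left:right]) for row in image[top:bottom]]
```

Only the cropped rows would be passed to character recognition, so pixels outside the screen are never analyzed.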

また、ガイド情報は、各項目名を含む文字情報に対応する他のユーザのハンド操作に関する動画を含んでもよい。例えば、ハンド操作に関する動画は、熟練者による同じ項目への入力又は設定方法を示すハンドの動きを含む動画である。また、ハンド操作に関する動画は、実際に撮影された平面動画でも立体動画（Volumetric Video）でも、実際に撮影された動画に基づく疑似のハンドが動くアニメーションでもよい。 Further, the guide information may include a video related to another user's hand operation corresponding to the text information including each item name. For example, a video related to hand operation is a video that includes hand movements showing a method of inputting or setting the same item by an expert. Further, the video related to hand operation may be an actually shot two-dimensional video, a three-dimensional video (volumetric video), or an animation in which a pseudo hand moves based on an actually shot video.

これにより、ユーザは、実際のハンド操作を確認しつつ、所定画面への入力又は設定を容易に行うことができる。 Thereby, the user can easily perform input or settings on the predetermined screen while checking the actual hand operation.

また、警告モジュール１４０は、項目に対して入力される文字が認識され、この文字の認識結果が、この項目に関連付けられた入力条件を満たさない場合、警告を出力する。例えば、警告モジュール１４０は、メールアドレスの入力欄に入力される文字情報を認識し、認識結果が、所定のメールアドレスの形式の条件を満たすか否かを判定する。所定の条件を満たさない場合、警告モジュール１４０は、ポップアップや音声などで入力内容や設定内容が条件を満たさないことをユーザに通知する。所定の条件は、例えば＠がメールアドレスに含まれていないなどである。 Further, the warning module 140 outputs a warning when a character input for an item is recognized and the recognition result of this character does not satisfy the input condition associated with this item. For example, the warning module 140 recognizes character information input into an email address input field, and determines whether the recognition result satisfies a predetermined email address format condition. If the predetermined conditions are not met, the warning module 140 notifies the user that the input content or setting content does not meet the conditions, using a pop-up or voice. The predetermined condition is, for example, that @ is not included in the email address.

これにより、ユーザは、所定画面内の項目に入力や設定をする際に警告を報知され、入力ミスや設定ミスに気付くことができる。 Thereby, the user is notified of a warning when inputting or setting an item in a predetermined screen, and can notice input errors or setting errors.
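The check performed before a warning is issued can be sketched as a lookup of a per-item input condition followed by a test of the recognized value. The table `INPUT_CONDITIONS`, the item name "Email address", the function names, and the warning wording are hypothetical; the source only gives the example that a missing @ in an email address fails the condition, and the additional checks below (non-empty local part, a dot in the domain) are illustrative additions.

```python
from typing import Callable, Dict, Optional


def _looks_like_email(value: str) -> bool:
    """Very loose format check: text before '@' and a dot somewhere after it."""
    local, sep, domain = value.partition("@")
    return bool(sep) and bool(local) and "." in domain


# Hypothetical table associating item names with their input conditions.
INPUT_CONDITIONS: Dict[str, Callable[[str], bool]] = {
    "Email address": _looks_like_email,
}


def check_input(item_name: str, recognized_value: str) -> Optional[str]:
    """Return a warning message if the value violates the item's condition."""
    condition = INPUT_CONDITIONS.get(item_name)
    if condition is None or condition(recognized_value.strip()):
        return None  # no condition registered, or the condition is satisfied
    return f"The value entered for '{item_name}' does not satisfy its input condition."
```

A non-`None` result would be surfaced to the user as a pop-up or spoken message.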

また、表示制御モジュール１３７は、ガイド情報を重畳して表示制御することに、認識結果に含まれる各文字情報を表示制御し、ユーザにより選択された文字情報に対応するガイド情報を撮影中の画像に重畳して表示制御することを含んでもよい。例えば、表示制御モジュール１３７は、所定画面内に表示される文字列を全て認識し、認識された各文字列の文字情報を選択可能にする一覧情報（リスト）を表示制御する。ユーザにより所定の文字情報が選択された場合、表示制御モジュール１３７は、選択された文字情報に対応するガイド情報を撮影中の画像に重畳して表示制御する。選択について、音声認識により認識された音声の音声情報と、文字情報とが一致する場合に選択と判断されたり、所定の文字情報の位置でタップ処理が行われた場合に選択と判断されたりする。 In addition, controlling the display of the superimposed guide information by the display control module 137 may include controlling the display of each piece of character information included in the recognition result and controlling the display of the guide information corresponding to the character information selected by the user so that it is superimposed on the image being photographed. For example, the display control module 137 recognizes all character strings displayed in the predetermined screen and controls the display of list information (a list) from which the character information of each recognized character string can be selected. When the user selects a piece of character information, the display control module 137 controls the display of the guide information corresponding to the selected character information so that it is superimposed on the image being photographed. A selection is determined to have been made, for example, when the voice information recognized by voice recognition matches a piece of character information, or when a tap is performed at the position of a piece of character information.
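The two selection rules mentioned above, a match between recognized speech and an item's text, or a tap at an item's position, can be sketched as two small lookups. The function names, the case-insensitive comparison, and the `(text, box)` item representation are assumptions made for this illustration.

```python
from typing import List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels
Item = Tuple[str, Box]


def select_by_voice(spoken: str, items: List[Item]) -> Optional[str]:
    """Select the item whose text matches the recognized speech."""
    wanted = spoken.strip().casefold()
    for text, _box in items:
        if text.strip().casefold() == wanted:
            return text
    return None


def select_by_tap(tap: Tuple[int, int], items: List[Item]) -> Optional[str]:
    """Select the item whose box contains the tap position."""
    x, y = tap
    for text, (left, top, right, bottom) in items:
        if left <= x <= right and top <= y <= bottom:
            return text
    return None
```

Either function yields the selected item name, which can then be used to look up the guide information to superimpose.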

これにより、認識された文字情報の一覧情報が表示されることで、ユーザは認識結果を確認することができ、また、一覧情報から文字情報（項目名）を選択してガイド情報を確認することができる。 As a result, because the list of recognized character information is displayed, the user can check the recognition results and can also select character information (an item name) from the list to check the guide information.

＜登録処理＞ <Registration process>

次に、ガイド情報が登録される処理について説明する。登録処理の場合、ユーザは、熟練者であり、自身のハンド操作をガイド情報としてサーバ２０に記録して登録する。 Next, a process for registering guide information will be explained. In the case of the registration process, the user is an expert and records and registers his or her hand operations as guide information in the server 20.

表示制御モジュール１３７は、撮影装置１６０により撮影中の画像を表示制御する。特定モジュール１３５は、画像内に表示される他の情報処理装置（ユーザ端末１０Ｂ）の画面を特定する。画面の特定の仕方は上述したとおりである。取得モジュール１３６は、特定された画面内に表示される文字が認識され、この文字の文字情報を含む認識結果を取得する。 The display control module 137 controls the display of the image being photographed by the photographing device 160. The identification module 135 identifies the screen of another information processing device (user terminal 10B) displayed within the image. The method of specifying the screen is as described above. The acquisition module 136 recognizes a character displayed within the specified screen and acquires a recognition result including character information of this character.

また、撮影装置１６０は、認識された各文字情報に対応する、ユーザ（熟練者）のハンド操作を含む各ガイド情報を撮影する。ネットワーク通信モジュール１３２は、認識された各項目名を含む各文字情報と、各項目名に対応する各ガイド情報とをサーバ２０に送信する。これにより、サーバ２０側では、項目名を含む文字情報と、その文字情報又は項目名に対応するガイド情報とを関連付けて保存することができる。 Further, the photographing device 160 photographs each piece of guide information including the user's (expert) hand operation corresponding to each piece of recognized character information. The network communication module 132 transmits each character information including each recognized item name and each guide information corresponding to each item name to the server 20. Thereby, on the server 20 side, the text information including the item name and the guide information corresponding to the text information or the item name can be stored in association with each other.

また、ガイド情報は、ユーザ（熟練者）による文字入力を含んでもよい。これにより、初心者であるユーザは、熟練者が実際に何を入力したかを参考にして、所定画面内の項目に入力等することが可能になる。 Further, the guide information may include character input by the user (expert). This allows a novice user to input items on a predetermined screen by referring to what an expert has actually input.

また、熟練者であるユーザの文字入力により個人情報が含まれる場合、ガイド情報は、個人情報を含まないようにする。例えば、サーバ２０側で、メールアドレスや氏名、ニックネームなどの文字情報にはぼかしを入れたり、他の記号に置き換えたりしてもよい。これにより、個人情報保護の観点からセキュリティを向上させることができる。 Furthermore, if personal information is included due to character input by an expert user, the guide information is configured not to include personal information. For example, on the server 20 side, text information such as email addresses, names, nicknames, etc. may be blurred out or replaced with other symbols. Thereby, security can be improved from the viewpoint of personal information protection.
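Replacing personal information with other symbols before a guide is stored can be sketched as follows. The set `PERSONAL_ITEMS`, the English item names, and the function name `mask_inputs` are hypothetical; the source only states that text such as email addresses, names, and nicknames may be blurred or replaced.

```python
from typing import Dict

# Hypothetical set of item names whose values count as personal information.
PERSONAL_ITEMS = {"Name", "Email address", "Nickname"}


def mask_inputs(inputs: Dict[str, str], mask_char: str = "*") -> Dict[str, str]:
    """Replace the values of personal items with a run of mask characters."""
    return {
        item: (mask_char * len(value) if item in PERSONAL_ITEMS else value)
        for item, value in inputs.items()
    }
```

For recorded video, the equivalent step would be blurring the image region of each personal item rather than rewriting text.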

また、特定モジュール１３５は、ユーザ端末１０Ｂの画面が識別され、この画面の識別情報と、この画面の位置情報とを取得し、この位置情報に基づき文字認識する領域を特定することを含んでもよい。また、取得モジュール１３６は、特定された領域内の画像から文字が認識され、この文字の文字情報を含む認識結果を取得してもよい。これにより、文字認識する領域を特定することができるので、処理負荷の軽減や、不要な情報を分析せずにプライバシーの尊重を図ることができる。 Further, once the screen of the user terminal 10B has been identified, the identification module 135 may acquire identification information and position information of this screen and specify the region to be subjected to character recognition based on the position information. The acquisition module 136 may then acquire a recognition result that includes character information of the characters recognized from the image within the specified region. Because the character recognition region is limited in this way, the processing load is reduced, and privacy is respected because unnecessary information is not analyzed.
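Limiting character recognition to the screen region can be illustrated with a small cropping sketch. It assumes the position information is an axis-aligned rectangle in pixel coordinates and that the image is a list of pixel rows; the names `Box`, `clamp_box`, and `crop` are hypothetical and do not appear in the disclosure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Assumed form of the position information: a rectangle in pixels."""
    x: int
    y: int
    width: int
    height: int


def clamp_box(box, image_width, image_height):
    """Clip the rectangle so that it lies inside the image."""
    left = max(0, min(box.x, image_width))
    top = max(0, min(box.y, image_height))
    right = max(left, min(box.x + box.width, image_width))
    bottom = max(top, min(box.y + box.height, image_height))
    return Box(left, top, right - left, bottom - top)


def crop(pixels, box):
    """Return only the rows and columns of `pixels` covered by `box`."""
    height = len(pixels)
    width = len(pixels[0]) if pixels else 0
    b = clamp_box(box, width, height)
    return [row[b.x:b.x + b.width] for row in pixels[b.y:b.y + b.height]]


image = [[r * 10 + c for c in range(4)] for r in range(3)]
region = crop(image, Box(1, 1, 2, 5))  # the box overshoots the bottom edge
```

Only `region` would then be sent for character recognition, so pixels outside the identified screen are never analyzed.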

なお、１つ又は複数の処理装置（ＣＰＵ）１１０は、メモリ１３０から、必要に応じて各モジュールを読み出して実行する。例えば、１つ又は複数の処理装置（ＣＰＵ）１１０は、メモリ１３０に格納されているネットワーク通信モジュール１３２を実行することで、通信部（送信部、受信部を含む）を構成してもよい。また、１つ又は複数の処理装置（ＣＰＵ）１１０は、メモリ１３０に格納されている特定モジュール１３５、取得モジュール１３６、表示制御モジュール１３７、取引制御モジュール１３８、検知モジュール１３９、警告モジュール１４０をそれぞれ実行することで、特定部、取得部、表示制御部、検知部、警告部を構成してもよい。 Note that the one or more processing units (CPUs) 110 read each module from the memory 130 and execute it as necessary. For example, the one or more processing units (CPUs) 110 may configure a communication section (including a transmitting section and a receiving section) by executing the network communication module 132 stored in the memory 130. In addition, the one or more processing units (CPUs) 110 may configure an identification section, an acquisition section, a display control section, a detection section, and a warning section by executing the identification module 135, the acquisition module 136, the display control module 137, the transaction control module 138, the detection module 139, and the warning module 140 stored in the memory 130, respectively.

他の実施形態において、特定モジュール１３５、取得モジュール１３６、表示制御モジュール１３７、取引制御モジュール１３８、検知モジュール１３９、警告モジュール１４０は、ユーザ端末１０Ａのメモリ１３０に格納されるスタンドアロンアプリケーションであってもよい。スタンドアロンアプリケーションとしては、限定はされないが、特定アプリケーション、取得アプリケーション、表示制御アプリケーション、取引制御アプリケーション、検知アプリケーション、警告アプリケーションが挙げられる。さらに他の実施形態において、特定モジュール１３５、取得モジュール１３６、表示制御モジュール１３７、取引制御モジュール１３８、検知モジュール１３９、警告モジュール１４０は別のアプリケーションへのアドオン又はプラグインであってもよい。 In other embodiments, the identification module 135, the acquisition module 136, the display control module 137, the transaction control module 138, the detection module 139, and the warning module 140 may be standalone applications stored in the memory 130 of the user terminal 10A. Standalone applications include, but are not limited to, an identification application, an acquisition application, a display control application, a transaction control application, a detection application, and a warning application. In yet other embodiments, the identification module 135, the acquisition module 136, the display control module 137, the transaction control module 138, the detection module 139, and the warning module 140 may be add-ons or plug-ins to another application.

上記に示した要素の各々は、先述の記憶装置の１つ又は複数に格納され得る。上記に示したモジュールの各々は、上述される機能を実行するための命令のセットに対応する。上記に示したモジュール又はプログラム（すなわち、命令のセット）は別個のソフトウェアプログラム、プロシージャ又はモジュールとして実装される必要はないとともに、従ってこれらのモジュールの様々なサブセットは様々な実施形態で組み合わされるか、或いは再構成されてもよい。ある実施形態において、メモリ１３０は上記に示されるモジュール及びデータ構造のサブセットを格納し得る。さらには、メモリ１３０は上述されない追加的なモジュール及びデータ構造を格納し得る。 Each of the elements shown above may be stored in one or more of the aforementioned storage devices. Each of the modules shown above corresponds to a set of instructions for performing the functions described above. The modules or programs (i.e., sets of instructions) shown above need not be implemented as separate software programs, procedures, or modules; accordingly, various subsets of these modules may be combined or otherwise rearranged in various embodiments. In some embodiments, the memory 130 may store a subset of the modules and data structures shown above. Furthermore, the memory 130 may store additional modules and data structures not described above.

図３は、実施形態に係るサーバ２０の一例を示すブロック図である。サーバ２０は典型的には、１つ又は複数の処理装置（ＣＰＵ）２１０、１つ又は複数のネットワーク又は他の通信インタフェース２２０、メモリ２３０、及びこれらの構成要素を相互接続するための１つ又は複数の通信バス２７０を含む。図３に示すサーバ２０は、図１に示すサーバ２０として説明するが、少なくとも１つの機能を有する別個のサーバとして構成されてもよい。 FIG. 3 is a block diagram showing an example of the server 20 according to the embodiment. The server 20 typically includes one or more processing units (CPUs) 210, one or more network or other communication interfaces 220, a memory 230, and one or more communication buses 270 for interconnecting these components. Although the server 20 shown in FIG. 3 is described as the server 20 shown in FIG. 1, it may be configured as a separate server having at least one of the functions.

サーバ２０は、場合によりユーザインタフェース２５０を含んでもよく、これとしては、ディスプレイ装置（図示せず）、及びキーボード及び／又はマウス（又は他の何らかのポインティングデバイス等の入力装置。図示せず）を挙げることができる。 The server 20 may optionally include a user interface 250, examples of which include a display device (not shown) and a keyboard and/or mouse (or another input device such as some other pointing device; not shown).

メモリ２３０は、例えば、ＤＲＡＭ、ＳＲＡＭ、ＤＤＲＲＡＭ又は他のランダムアクセス固体記憶装置などの高速ランダムアクセスメモリであり、また、１つ又は複数の磁気ディスク記憶装置、光ディスク記憶装置、フラッシュメモリデバイス、又は他の不揮発性固体記憶装置などの不揮発性メモリでもよい。 Memory 230 is, for example, a high speed random access memory such as DRAM, SRAM, DDR RAM or other random access solid state storage, and may also include one or more magnetic disk storage, optical disk storage, flash memory devices, or It may also be a nonvolatile memory such as another nonvolatile solid state storage device.

また、メモリ２３０の他の例は、ＣＰＵ２１０から遠隔に設置される１つ又は複数の記憶装置を挙げることができる。ある実施形態において、メモリ２３０は次のプログラム、モジュール及びデータ構造、又はそれらのサブセットを格納する。 Other examples of the memory 230 include one or more storage devices installed remotely from the CPU 210. In some embodiments, memory 230 stores the following programs, modules and data structures, or a subset thereof.

オペレーティングシステム２３１は、例えば、様々な基本的なシステムサービスを処理するとともにハードウェアを用いてタスクを実行するためのプロシージャを含む。 Operating system 231 includes, for example, procedures for handling various basic system services and performing tasks using hardware.

ネットワーク通信モジュール２３２は、例えば、サーバ２０を他のコンピュータに、１つ又は複数の通信ネットワークインタフェース２２０及びインターネット、他の広域ネットワーク、ローカルエリアネットワーク、メトロポリタンエリアネットワークなどの１つ又は複数の通信ネットワークを介して接続するために使用される。 The network communication module 232 is used, for example, to connect the server 20 to other computers via one or more communication network interfaces 220 and one or more communication networks such as the Internet, other wide area networks, local area networks, and metropolitan area networks.

物体データ２３３は、認識対象となり得る物体の情報が格納される。例えば、物体データ２３３は、物体を特定するための物体ＩＤ、物体の名称、物体を含む画像データ等を含む（例えば、図４参照）。物体の画像データは、認識精度を上げるため、画面を含む様々な角度からの画像データを含んでもよい。 The object data 233 stores information about objects that can be recognition targets. For example, the object data 233 includes an object ID for identifying the object, a name of the object, image data including the object, etc. (see, for example, FIG. 4). The image data of the object may include image data from various angles including the screen in order to improve recognition accuracy.

文字認識データ２３４は、認識対象となりうる画面画像の文字情報を含む情報が格納される。例えば、文字認識データ２３４は、どの物体かを示すための物体ＩＤ、どの画面かを示すための画面ＩＤ、画面名、画面に含まれる項目名を含む各文字情報、及びその文字情報の位置を示す位置情報等を含む。位置情報は、例えば画面内における文字情報や項目の位置を示す情報を含む。 The character recognition data 234 stores information including the character information of screen images that can be recognition targets. For example, the character recognition data 234 includes an object ID indicating which object is concerned, a screen ID indicating which screen is concerned, a screen name, each piece of character information including the item names contained in the screen, and position information indicating the position of that character information. The position information includes, for example, information indicating the position of the character information or the item within the screen.

ガイド格納先データ２３５は、所定画面の入力や設定にガイドを要するユーザ向けの情報が格納される。例えば、ガイド格納先データ２３５は、画面ＩＤ、文字情報（項目名を含む）、その文字情報に対応するガイドの内容を含むガイド情報等を含む。ガイド情報は、熟練者のハンドの動きを示す動画、熟練者のハンドの動きに基づくアニメーション、または音声ガイド等の少なくとも１つを含む。例えば、ガイド情報は、画面ＩＤが「配送画面」を示し、文字情報が「配送選択」を示す場合、ガイド情報は、熟練者が配送をどのように選択するかを示す動画等を含む。 The guide storage location data 235 stores information for users who require guidance for inputting or setting a predetermined screen. For example, the guide storage location data 235 includes a screen ID, text information (including item names), guide information including the content of the guide corresponding to the text information, and the like. The guide information includes at least one of a moving image showing the movement of an expert's hand, an animation based on the movement of the expert's hand, an audio guide, and the like. For example, when the screen ID indicates "delivery screen" and the text information indicates "delivery selection," the guide information includes a video showing how an expert selects delivery.

物体認識モジュール２３６は、ユーザ端末１０Ａから送信された画面を含む画像データを、物体認識ＡＰＩ（Application Programming Interface）を介して取得し、この画像データに対し、物体認識処理を実行し、物体認識の結果データを、画像データの送信元のユーザ端末１０Ａに送信する。結果データには、認識された物体の物体ＩＤと、認識された物体内の画面の位置を示す位置情報が含まれてもよい。 The object recognition module 236 acquires image data including a screen transmitted from the user terminal 10A via an object recognition API (Application Programming Interface), performs object recognition processing on this image data, and performs object recognition. The result data is transmitted to the user terminal 10A that is the source of the image data. The result data may include the object ID of the recognized object and position information indicating the position of the screen within the recognized object.

文字認識モジュール２３７は、ユーザ端末１０Ａから送信された画面を含む画像データ（例えば領域画像データ）に対して、文字認識ＡＰＩを介して取得し、この画像データに対して文字認識処理を実行し、認識結果データを、画像データの送信元のユーザ端末１０Ａに送信する。文字認識処理は、公知のＯＣＲ（Optical Character Recognition）技術が用いられればよい。その際、文字認識モジュール２３７は、文字認識した文字列の位置情報を、認識結果に関連づけておいてもよい。 The character recognition module 237 acquires image data including a screen (for example, area image data) transmitted from the user terminal 10A via the character recognition API, performs character recognition processing on this image data, The recognition result data is transmitted to the user terminal 10A that is the source of the image data. For the character recognition process, a known OCR (Optical Character Recognition) technique may be used. At this time, the character recognition module 237 may associate the position information of the recognized character string with the recognition result.

ガイド制御モジュール２３８は、ガイド情報を登録したり、送信したりするための処理を実行する。例えば、ガイド制御モジュール２３８は、ネットワーク通信モジュール２３２を介して、ユーザ端末１０Ａから、様々なデータや情報、リクエストを取得し、ガイド情報の制御を行う。 The guide control module 238 executes processing for registering and transmitting guide information. For example, the guide control module 238 acquires various data, information, and requests from the user terminal 10A via the network communication module 232, and controls guide information.

具体例として、ガイド制御モジュール２３８は、ガイド情報の取得リクエストや、ガイド情報のアップロードのリクエストや、物体認識のための画像データを学習させるリクエストなどを取得する。また、ガイド制御モジュール２３８は、各リクエストに基づいて、ガイド情報をデータベースにアップロードしたり、リクエストされたガイド情報をデータベース（ＤＢ）から取得したりする。 As a specific example, the guide control module 238 obtains a request to obtain guide information, a request to upload guide information, a request to learn image data for object recognition, and the like. Further, the guide control module 238 uploads guide information to a database or acquires requested guide information from a database (DB) based on each request.

電子商取引モジュール２３９は、商品やサービスの売買処理を実行する。例えば、電子商取引モジュール２３９は、商品やサービスの出品処理を実行したり、販売処理を実行したりする。また、電子商取引モジュール２３９は、ユーザ端末１０Ｂにおいて起動されるアプリケーションの所定画面を用いて、会員登録、販売対象の商品の登録、売買の取引、配送の設定、評価などを制御する。 The electronic commerce module 239 executes the buying and selling process of products and services. For example, the electronic commerce module 239 executes listing processing of products and services, and executing sales processing. Further, the electronic commerce module 239 controls membership registration, registration of products to be sold, purchase and sale transactions, delivery settings, evaluation, etc. using a predetermined screen of an application activated on the user terminal 10B.

音声認識モジュール２４０は、ユーザ端末１０Ａから送信された音声データを、音声認識ＡＰＩを介して取得し、この音声データに対して音声認識し、認識結果のテキストデータを、音声データの送信元のユーザ端末１０に送信したり、データベースに登録したりする。 The voice recognition module 240 acquires the voice data transmitted from the user terminal 10A via the voice recognition API, performs voice recognition on this voice data, and either transmits the resulting text data to the user terminal 10 that sent the voice data or registers it in the database.

上記に示した要素の各々は先述される記憶装置の１つ又は複数に格納され得る。上記に示したモジュールの各々は、上述される機能を実行するための命令のセットに対応する。上記に示したモジュール又はプログラム（すなわち、命令のセット）は別個のソフトウェアプログラム、プロシージャ又はモジュールとして実装される必要はないとともに、従ってこれらのモジュールの様々なサブセットが様々な実施形態で組み合わされるか、或いは再構成され得る。ある実施形態において、メモリ２３０は上記に示されるモジュール及びデータ構造のサブセットを格納し得る。さらには、メモリ２３０は上述されない追加的なモジュール及びデータ構造を格納し得る。 Each of the elements shown above may be stored in one or more of the storage devices mentioned above. Each of the modules shown above corresponds to a set of instructions for performing the functions described above. The modules or programs (i.e., sets of instructions) shown above need not be implemented as separate software programs, procedures, or modules; accordingly, various subsets of these modules may be combined or otherwise rearranged in various embodiments. In some embodiments, the memory 230 may store a subset of the modules and data structures shown above. Furthermore, the memory 230 may store additional modules and data structures not described above.

なお、１つ又は複数の処理装置（ＣＰＵ）２１０は、メモリ２３０から、必要に応じて各モジュールを読み出して実行する。例えば、１つ又は複数の処理装置（ＣＰＵ）２１０は、メモリ２３０に格納されているネットワーク通信モジュール２３２を実行することで、通信部（送信部、受信部を含む）を構成してもよい。また、１つ又は複数の処理装置（ＣＰＵ）２１０は、メモリ２３０に格納されている物体認識モジュール２３６、文字認識モジュール２３７、ガイド制御モジュール２３８、電子商取引モジュール２３９、音声認識モジュール２４０をそれぞれ実行することで、物体認識部、文字認識部、ガイド制御部、電子商取引部、音声認識部を構成してもよい。 Note that one or more processing units (CPUs) 210 read and execute each module from the memory 230 as necessary. For example, one or more processing units (CPUs) 210 may configure a communication unit (including a transmitting unit and a receiving unit) by executing a network communication module 232 stored in the memory 230. In addition, one or more processing units (CPUs) 210 execute an object recognition module 236, a character recognition module 237, a guide control module 238, an electronic commerce module 239, and a voice recognition module 240, which are stored in the memory 230, respectively. This may constitute an object recognition section, a character recognition section, a guide control section, an electronic commerce section, and a voice recognition section.

図３は「サーバ」を示すが、図３は、本明細書に記載される実施形態の構造的な概略としてよりも、サーバのセットに存在し得る様々な特徴についての説明が意図されている。実際には、当業者により認識されるとおり、別個に示される項目が組み合わされ得るであろうとともに、ある項目が別個にされ得るであろう。例えば、図３において別個に示される項目は単一サーバ上に実装され得るであろうとともに、単一の項目が１台又は複数のサーバにより実装され得るであろう。 Although FIG. 3 depicts a "server," FIG. 3 is intended less as a structural overview of the embodiments described herein and more as an illustration of the various features that may be present in a set of servers. . In fact, items shown separately could be combined, as well as certain items could be made separate, as will be recognized by those skilled in the art. For example, items shown separately in FIG. 3 could be implemented on a single server, and a single item could be implemented by one or more servers.

＜データ構造の一例＞
図４は、実施形態に係る物体データ２３３の一例を示す図である。図４に示す例では、物体データ２３３は、物体ＩＤに関連付けて、物体名、物体の画像データなどを含む。これらのデータは、機械学習の学習データとして用いられてもよい。物体データ２３３の一例として、物体ＩＤ「Ｔ－０００１００」には、物体名「スマートフォン」、物体の画像データ「画像Ａ」などのデータが関連付けられる。なお、他にも、物体の画面の位置を示す位置情報などが関連付けられてもよい。 <Example of data structure>
FIG. 4 is a diagram showing an example of the object data 233 according to the embodiment. In the example shown in FIG. 4, the object data 233 includes an object name, image data of the object, and the like in association with an object ID. These data may be used as learning data for machine learning. As an example of the object data 233, the object ID "T-000100" is associated with data such as the object name "smartphone" and the image data of the object "image A." In addition, position information indicating the position of the object's screen may also be associated.

図５は、実施形態に係る文字認識データ２３４の一例を示す図である。図５に示す例では、文字認識データ２３４は、物体ＩＤに関連付けて、画面ＩＤ、画面名、文字情報１、位置情報１、文字情報２などのデータを含む。文字認識データ２３４の一例として、物体ＩＤ「Ｔ－０００１００」には、画面ＩＤ「Ｄ１」、画面名「登録画面」、項目名を含む文字情報「氏名」、その項目の位置を示す位置情報「（ｘ１、ｙ１）」などが関連付けられる。 FIG. 5 is a diagram showing an example of character recognition data 234 according to the embodiment. In the example shown in FIG. 5, the character recognition data 234 includes data such as a screen ID, a screen name, character information 1, position information 1, and character information 2 in association with the object ID. As an example of the character recognition data 234, the object ID "T-000100" includes the screen ID "D1", the screen name "registration screen", the character information including the item name "name", and the position information indicating the position of the item " (x1, y1)" etc. are associated.

図６は、実施形態に係るガイド格納先データ２３５の一例を示す図である。図６に示す例では、ガイド格納先データ２３５は、画面ＩＤに関連付けて、文字情報、ガイド情報の格納先情報などが関連付けられる。ガイド格納先データ２３５の一例として、画面ＩＤ「Ｄ１」に、文字情報「氏名」、この氏名の入力に関するガイド情報の格納先のＵＲＬ「ＵＲＬ１」などが関連付けられる。 FIG. 6 is a diagram showing an example of the guide storage location data 235 according to the embodiment. In the example shown in FIG. 6, the guide storage location data 235 includes text information, guide information storage location information, and the like in association with the screen ID. As an example of the guide storage location data 235, the screen ID "D1" is associated with text information "name" and the URL "URL1" of the storage location of guide information related to input of this name.

上述したデータ構造は、あくまでも一例であって、この例に限られない。例えば図６に示すガイド格納先データ２３５は、文字情報に関連する項目にＩＤを設けて、この項目ＩＤにガイド情報の格納先情報が関連付けられてもよい。 The data structure described above is just an example, and is not limited to this example. For example, in the guide storage location data 235 shown in FIG. 6, an ID may be provided for an item related to character information, and the guide information storage location information may be associated with this item ID.
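The three data structures of FIGS. 4 to 6 could be modeled roughly as follows. This is a sketch under assumptions: the class and field names are hypothetical, the coordinates `(10, 20)` are placeholders standing in for the symbolic position (x1, y1), and the lookup helper `find_guide_url` is only one way the screen ID might link the character recognition data to the guide storage location data.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class ObjectRecord:
    """Rough counterpart of the object data 233 (FIG. 4)."""
    object_id: str
    name: str
    image_ref: str


@dataclass
class ScreenRecord:
    """Rough counterpart of the character recognition data 234 (FIG. 5)."""
    object_id: str
    screen_id: str
    screen_name: str
    # item name -> assumed (x, y) position of that item within the screen
    items: Dict[str, Tuple[int, int]] = field(default_factory=dict)


def find_guide_url(screens, guides, screen_name, item_name) -> Optional[str]:
    """Resolve (screen name, item name) to a guide URL via the screen ID."""
    for screen in screens:
        if screen.screen_name == screen_name and item_name in screen.items:
            return guides.get((screen.screen_id, item_name))
    return None


# Example values taken from the figures described above.
objects = [ObjectRecord("T-000100", "smartphone", "image A")]
screens = [ScreenRecord("T-000100", "D1", "registration screen",
                        {"name": (10, 20)})]
# Rough counterpart of the guide storage location data 235 (FIG. 6).
guides = {("D1", "name"): "URL1"}

print(find_guide_url(screens, guides, "registration screen", "name"))
```

Keying the guide table by an item ID instead of the item text, as the paragraph above allows, would only change the second element of the dictionary key.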

＜動作説明＞
次に、実施形態に係る情報処理システム１の動作について図７及び図８を用いて説明する。図７及び図８に示す例では、サーバ２０が機能ごとに分かれている。例えば、サーバ２０Ａが物体認識モジュール２３６を有するサーバであり、サーバ２０Ｂが文字認識モジュール２３７を有するサーバであり、ＤＢ１が物体の画像データ等の学習データの格納先やガイド情報の格納先であり、ＤＢ２は、物体データ２３３、文字認識データ２３４、ガイド格納先データ２３５等を保存し、ガイド制御モジュール２３８の機能を有する。図７は、実施形態に係る情報処理システム１の登録処理の一例を示すシーケンス図である。 <Operation explanation>
Next, the operation of the information processing system 1 according to the embodiment will be explained using FIGS. 7 and 8. In the example shown in FIGS. 7 and 8, the server 20 is divided by function. For example, the server 20A is a server that includes the object recognition module 236, the server 20B is a server that includes the character recognition module 237, DB1 is the storage destination for learning data such as object image data and for guide information, and DB2 stores the object data 233, the character recognition data 234, the guide storage location data 235, and the like, and has the function of the guide control module 238. FIG. 7 is a sequence diagram illustrating an example of the registration process of the information processing system 1 according to the embodiment.

（ステップＳ１０２）
ユーザは、ユーザ端末１０Ａを用いて、所定のアプリケーション（以下、「Ａアプリ」とも称する。）の登録起動ボタンを押下する。また、ユーザは、ユーザ端末１０Ａに向かって、「Ｈｅｙ ○○、Ａアプリを起動して」等と発話し、Ａアプリを起動する。このとき、ユーザは、「Ａアプリの登録機能を起動して」等と発話し、Ａアプリの登録機能を起動するようにしてもよい。 (Step S102)
The user uses the user terminal 10A to press the registration activation button of a predetermined application (hereinafter also referred to as "A application"). Further, the user speaks to the user terminal 10A, such as "Hey XXX, start the A app," and starts the A app. At this time, the user may say something like "Start the registration function of the A app" to activate the registration function of the A app.

（ステップＳ１０４）
ユーザ端末１０Ａは、Ａアプリの起動に伴い、撮影装置１６０を起動し、撮影中のカメラ画像をディスプレイ１５１に表示する。 (Step S104)
Upon activation of the A app, the user terminal 10A activates the photographing device 160 and displays the camera image being photographed on the display 151.

（ステップＳ１０６）
ユーザは、ディスプレイ１５１越しに見えるユーザ端末１０Ｂ（物体）をタップする。ユーザ端末１０Ａの検知モジュール１３９は、タップのジェスチャを検知する。また、ユーザは、音声等でユーザ端末１０Ｂの存在をユーザ端末１０Ａに知らせてもよい。 (Step S106)
The user taps the user terminal 10B (object) visible through the display 151. The detection module 139 of the user terminal 10A detects the tap gesture. Further, the user may notify the user terminal 10A of the existence of the user terminal 10B by voice or the like.

（ステップＳ１０８）
ユーザ端末１０Ａの特定モジュール１３５は、撮影装置１６０からのカメラ画像内における、タップした位置情報と、カメラ画像に基づき、物体認識用の学習データを生成する。 (Step S108)
The identification module 135 of the user terminal 10A generates learning data for object recognition based on the camera image from the photographing device 160 and the position information of the tap within that camera image.

（ステップＳ１１０）
ユーザ端末１０Ａの特定モジュール１３５は、学習データをＤＢ１にアップロードする。ＤＢ１は、学習データにＩＤを付与し、格納先のＵＲＬを取得する。 (Step S110)
The identification module 135 of the user terminal 10A uploads the learning data to DB1. DB1 assigns an ID to the learning data and obtains the URL of the storage location.

（ステップＳ１１２）
ユーザ端末１０Ａの特定モジュール１３５は、ＤＢ１からアップロードした学習データのＩＤと、学習データの格納先を示すＵＲＬを取得する。 (Step S112)
The identification module 135 of the user terminal 10A obtains, from DB1, the ID of the uploaded learning data and the URL indicating where the learning data is stored.

（ステップＳ１１４）
ユーザ端末１０Ａの特定モジュール１３５は、学習データのＩＤとＵＲＬとをＤＢ２にアップロードする。 (Step S114)
The identification module 135 of the user terminal 10A uploads the ID and URL of the learning data to DB2.

（ステップＳ１１６）
ユーザ端末１０Ａの特定モジュール１３５は、ＤＢ２から学習データのＩＤとＵＲＬとの格納が完了した旨の通知を取得する。 (Step S116)
The identification module 135 of the user terminal 10A obtains a notification from DB2 that storage of the ID and URL of the learning data has been completed.

（ステップＳ１１８）
ユーザ端末１０Ａの特定モジュール１３５は、カメラ画像を基に、物体の物体認識を行うリクエストをサーバ２０Ａに送信する。 (Step S118)
The identification module 135 of the user terminal 10A sends the server 20A a request to perform object recognition on the object based on the camera image.

（ステップＳ１２０）
サーバ２０Ａの物体認識モジュール２３６は、取得されたカメラ画像の画像データに基づき、画像データ内の物体に最も類似する物体の情報を取得するようＤＢ２にリクエストする。なお、ＤＢ２は、カメラ画像の画像データと、画像内の物体ＩＤとを含む学習データを用いて学習された学習済みモデルを保持しており、サーバ２０Ａからカメラ画像の画像データが入力されると、この画像データに対応する物体ＩＤを出力してもよい。 (Step S120)
The object recognition module 236 of the server 20A requests DB2 to obtain, based on the image data of the acquired camera image, information on the object most similar to the object in the image data. Note that DB2 may hold a trained model that has been trained using learning data including image data of camera images and the object IDs in those images, and may output the object ID corresponding to the image data of a camera image when that image data is input from the server 20A.
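The idea of finding the "most similar" registered object can be illustrated with a nearest-neighbor comparison. Note that this swaps in a different technique from the trained model mentioned above: it assumes each image has already been reduced to a numeric feature vector and compares vectors by cosine similarity. The function names, the feature vectors, and the object ID "T-000200" are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def most_similar_object(query, registered):
    """Return the object ID whose features are closest to `query`.

    `registered` maps object IDs to assumed feature vectors; None is
    returned when nothing has been registered.
    """
    best_id, best_score = None, None
    for object_id, features in registered.items():
        score = cosine_similarity(query, features)
        if best_score is None or score > best_score:
            best_id, best_score = object_id, score
    return best_id


registered = {"T-000100": [1.0, 0.0], "T-000200": [0.0, 1.0]}
print(most_similar_object([0.9, 0.1], registered))
```

A production system would more likely use the learned model directly; the sketch only shows what "most similar" can mean once objects are represented as vectors.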

（ステップＳ１２２）
サーバ２０Ａは、ＤＢ２から、物体ＩＤと、物体の画面内の位置を含む位置情報とを取得する。 (Step S122)
The server 20A acquires the object ID and position information including the position of the object within the screen from the DB2.

（ステップＳ１２４）
サーバ２０Ａの物体認識モジュール２３６は、物体ＩＤと位置情報とをユーザ端末１０Ａに送信する。 (Step S124)
The object recognition module 236 of the server 20A transmits the object ID and position information to the user terminal 10A.

（ステップＳ１２６）
ユーザ端末１０Ａの特定モジュール１３５は、物体の位置情報に基づき、カメラ画像の所定領域をクロップする（切り出す）。 (Step S126)
The identification module 135 of the user terminal 10A crops (cuts out) a predetermined area of the camera image based on the position information of the object.

（ステップＳ１２８）
ユーザ端末１０Ａの特定モジュール１３５は、クロップ済み画像内の文字列を認識するようリクエストをサーバ２０Ｂに送信する。 (Step S128)
The identification module 135 of the user terminal 10A sends a request to the server 20B to recognize the character strings in the cropped image.

（ステップＳ１３０）
サーバ２０Ｂの文字認識モジュール２３７は、クロップ済み画像内から文字列を認識し、画像内における文字列の位置情報を取得する。サーバ２０Ｂの文字認識モジュール２３７は、認識された文字列の文字情報と位置情報とを含む認識結果をユーザ端末１０Ａに送信する。 (Step S130)
The character recognition module 237 of the server 20B recognizes a character string from within the cropped image and obtains position information of the character string within the image. The character recognition module 237 of the server 20B transmits a recognition result including character information and position information of the recognized character string to the user terminal 10A.

（ステップＳ１３２）
ユーザ端末１０Ａの取得モジュール１３６は、認識結果に含まれる文字列の一番上の行（文字情報）を、画面の名称（画面名）とし、以下の行（文字情報）を画面の内容（項目名）として保存する。 (Step S132)
The acquisition module 136 of the user terminal 10A saves the top line (character information) of the character strings in the recognition result as the name of the screen (screen name), and saves the lines below it (character information) as the contents of the screen (item names).
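The rule in this step, top line as the screen name and the remaining lines as item names, can be sketched as follows. The sketch assumes the recognition result is a list of text lines with pixel positions and that a smaller y value means a higher position on the screen; `RecognizedLine` and `split_screen_name_and_items` are hypothetical names, and the item "address" is made up for the example.

```python
from typing import NamedTuple


class RecognizedLine(NamedTuple):
    """Assumed shape of one recognized character string with its position."""
    text: str
    x: int
    y: int  # smaller y is assumed to mean higher on the screen


def split_screen_name_and_items(lines):
    """Treat the topmost line as the screen name and the rest as item names."""
    ordered = sorted(lines, key=lambda line: (line.y, line.x))
    if not ordered:
        return None, []
    return ordered[0].text, [line.text for line in ordered[1:]]


recognized = [
    RecognizedLine("address", 0, 80),
    RecognizedLine("registration screen", 0, 10),
    RecognizedLine("name", 0, 40),
]
screen_name, item_names = split_screen_name_and_items(recognized)
```

Sorting by position rather than trusting the order of the recognition result keeps the rule stable even if the OCR service returns lines in a different order.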

（ステップＳ１３４）
ユーザ端末１０Ａの表示制御モジュール１３７は、録画中であることをディスプレイ１５１に表示制御し、ユーザ（例えば熟練者）に録画中であることを報知する。 (Step S134)
The display control module 137 of the user terminal 10A controls the display 151 to display that recording is in progress, and notifies the user (for example, an expert) that recording is in progress.

（ステップＳ１３６）
ユーザは、ユーザ端末１０Ｂの画面上に対して手を用いて設定、入力等の作業を行うことで、撮影装置１６０は、作業中の手と、認識対象の物体（ユーザ端末１０Ｂ）を撮影する。 (Step S136)
The user performs operations such as settings and input by hand on the screen of the user terminal 10B, and the photographing device 160 photographs the working hand together with the object to be recognized (user terminal 10B).

（ステップＳ１３８）
ユーザ端末１０Ａは、例えば、手の動きに基づいてアニメーションを作成する作成モジュール（不図示）を有してもよい。手のアニメーションは、ガイド情報に含まれる。 (Step S138)
The user terminal 10A may include, for example, a creation module (not shown) that creates animation based on hand movements. The hand animation is included in the guide information.

（ステップＳ１４０）
ユーザは、作業終了ボタンを押下する。作業終了ボタンは、物理的なボタンでもよいし、ディスプレイ１５１上に表示されたボタンでもよい。表示されたボタンの場合、ユーザの手のタップがボタン上で検知されれば、ユーザ端末１０Ａの作成モジュールは、作業終了を検知してもよい。 (Step S140)
The user presses the work end button. The work end button may be a physical button or a button displayed on the display 151. In the case of a displayed button, the creation module of the user terminal 10A may detect the end of the task if a tap of the user's hand is detected on the button.

（ステップＳ１４２）
ユーザ端末１０Ａは、ガイド情報をＤＢ１にアップロードする。ＤＢ１は、ガイド情報の格納先を示すＵＲＬを取得する。 (Step S142)
The user terminal 10A uploads guide information to the DB1. DB1 acquires the URL indicating the storage location of the guide information.

（ステップＳ１４４）
ユーザ端末１０Ａの取得モジュール１３６は、ガイド情報の格納先を示すＵＲＬと、画面の内容（項目名又は文字情報）とをＤＢ１から取得する。 (Step S144)
The acquisition module 136 of the user terminal 10A acquires the URL indicating the storage location of the guide information and the contents of the screen (item name or character information) from the DB1.

（ステップＳ１４６）
ユーザ端末１０Ａの取得モジュール１３６は、ガイド情報の格納先を示すＵＲＬと、画面の内容（項目名又は文字情報）とを関連付けて、ＤＢ２に送信する。ＤＢ２は、ガイド格納先データとして、ガイド情報のＵＲＬと、格納先データとを保存する。 (Step S146)
The acquisition module 136 of the user terminal 10A associates the URL indicating the storage location of the guide information with the contents of the screen (item name or character information) and transmits the associated URL to the DB2. The DB 2 stores the guide information URL and storage location data as guide storage location data.

（ステップＳ１４８）
ユーザ端末１０Ａの取得モジュール１３６は、ＤＢ２から保存が完了した旨の通知を取得する。 (Step S148)
The acquisition module 136 of the user terminal 10A acquires a notification from the DB2 that the storage has been completed.
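The division of roles between DB1 (file storage) and DB2 (lookup table) in steps S142 to S148 could be sketched with two in-memory stand-ins. Everything here is an assumption made for illustration: the class names `FileStore` and `GuideIndex`, the helper `register_guide`, and the base URL `https://db1.example/guides/` do not come from the disclosure.

```python
import itertools


class FileStore:
    """Stand-in for DB1: keeps uploaded files and issues a URL for each."""

    def __init__(self, base_url="https://db1.example/guides/"):
        self._base_url = base_url  # hypothetical URL prefix
        self._files = {}
        self._ids = itertools.count(1)

    def upload(self, data):
        url = f"{self._base_url}{next(self._ids)}"
        self._files[url] = data
        return url

    def download(self, url):
        return self._files[url]


class GuideIndex:
    """Stand-in for DB2: maps (screen name, item name) to a guide URL."""

    def __init__(self):
        self._urls = {}

    def register(self, screen_name, item_name, url):
        self._urls[(screen_name, item_name)] = url

    def lookup(self, screen_name, item_name):
        return self._urls.get((screen_name, item_name))


def register_guide(db1, db2, screen_name, item_name, guide_bytes):
    """Upload the guide file first, then record where it was stored."""
    url = db1.upload(guide_bytes)              # cf. steps S142 to S144
    db2.register(screen_name, item_name, url)  # cf. steps S146 to S148
    return url


db1, db2 = FileStore(), GuideIndex()
url = register_guide(db1, db2, "registration screen", "name", b"hand animation")
```

The display process can then resolve a guide in two hops: `db2.lookup(...)` returns the URL, and `db1.download(url)` returns the file.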

図８は、実施形態に係る情報処理システム１の表示処理の一例を示すシーケンス図である。
（ステップＳ２０２）
ユーザは、ユーザ端末１０Ａを用いて、Ａアプリの表示起動ボタンを押下する。また、ユーザは、ユーザ端末１０Ａに向かって、「Ｈｅｙ ○○、Ａアプリを起動して」等と発話し、Ａアプリを起動する。このとき、ユーザは、「Ａアプリの表示機能を起動して」等と発話し、Ａアプリの表示機能を起動するようにしてもよい。 FIG. 8 is a sequence diagram illustrating an example of display processing of the information processing system 1 according to the embodiment.
(Step S202)
The user uses the user terminal 10A to press the display start button for the A app. Further, the user speaks to the user terminal 10A, such as "Hey XXX, start the A app," and starts the A app. At this time, the user may say something such as "start the display function of the A app" to start the display function of the A app.

（ステップＳ２０４）
ユーザ端末１０Ａは、Ａアプリの起動に伴い、撮影装置１６０を起動し、撮影中のカメラ画像をディスプレイ１５１に表示する。 (Step S204)
Upon activation of the A app, the user terminal 10A activates the photographing device 160 and displays the camera image being photographed on the display 151.

（ステップＳ２０６）
ユーザは、ディスプレイ１５１越しに見えるユーザ端末１０Ｂ（物体）をタップする。ユーザ端末１０Ａの検知モジュール１３９は、タップのジェスチャを検知する。また、ユーザは、音声等でユーザ端末１０Ｂの存在をユーザ端末１０Ａに知らせてもよい。 (Step S206)
The user taps the user terminal 10B (object) visible through the display 151. The detection module 139 of the user terminal 10A detects the tap gesture. Further, the user may notify the user terminal 10A of the existence of the user terminal 10B by voice or the like.

（ステップＳ２０８）
ユーザ端末１０Ａの特定モジュール１３５は、撮影装置１６０からのカメラ画像内における、タップした位置情報と、カメラ画像とに基づき、物体認識を行うようサーバ２０Ａにリクエストする。 (Step S208)
The identification module 135 of the user terminal 10A requests the server 20A to perform object recognition based on the camera image from the photographing device 160 and the position information of the tap within that camera image.

（ステップＳ２１０）
サーバ２０Ａの物体認識モジュール２３６は、取得されたカメラ画像の画像データに基づき、画像データ内の物体に最も類似する物体の情報を取得するようＤＢ２にリクエストする。なお、ＤＢ２は、カメラ画像の画像データと、画像内の物体ＩＤとを含む学習データを用いて学習された学習済みモデルを保持しており、サーバ２０Ａからカメラ画像の画像データが入力されると、この画像データに対応する物体ＩＤを出力してもよい。 (Step S210)
The object recognition module 236 of the server 20A requests DB2 to obtain, based on the image data of the acquired camera image, information on the object most similar to the object in the image data. Note that DB2 may hold a trained model that has been trained using learning data including image data of camera images and the object IDs in those images, and may output the object ID corresponding to the image data of a camera image when that image data is input from the server 20A.

（ステップＳ２１２）
サーバ２０Ａの物体認識モジュール２３６は、ＤＢ２から、物体ＩＤと、物体の画面内の位置を含む位置情報とを取得する。 (Step S212)
The object recognition module 236 of the server 20A acquires the object ID and position information including the position of the object within the screen from the DB2.

（ステップＳ２１４）
サーバ２０Ａの物体認識モジュール２３６は、物体ＩＤと位置情報とをユーザ端末１０Ａに送信する。 (Step S214)
The object recognition module 236 of the server 20A transmits the object ID and position information to the user terminal 10A.

（ステップＳ２１６）
ユーザ端末１０Ａの特定モジュール１３５は、物体の位置情報に基づき、カメラ画像の所定領域をクロップする（切り出す）。 (Step S216)
The identification module 135 of the user terminal 10A crops (cuts out) a predetermined area of the camera image based on the position information of the object.

（ステップＳ２１８）
ユーザ端末１０Ａの特定モジュール１３５は、クロップ済み画像内の文字列を認識するようリクエストをサーバ２０Ｂに送信する。 (Step S218)
The identification module 135 of the user terminal 10A sends a request to the server 20B to recognize the character strings in the cropped image.

（ステップＳ２２０）
サーバ２０Ｂの文字認識モジュール２３７は、クロップ済み画像内から文字列を認識し、画像内における文字列の位置情報を取得する。サーバ２０Ｂの文字認識モジュール２３７は、認識された文字列の文字情報と位置情報とを含む認識結果をユーザ端末１０Ａに送信する。 (Step S220)
The character recognition module 237 of the server 20B recognizes a character string from within the cropped image and obtains position information of the character string within the image. The character recognition module 237 of the server 20B transmits a recognition result including character information and position information of the recognized character string to the user terminal 10A.

（ステップＳ２２２）
ユーザ端末１０Ａの取得モジュール１３６は、認識結果に含まれる文字列の一番上の行（文字情報）を、画面の名称（画面名）とし、以下の行（文字情報）を画面の内容（項目名）として保存する。 (Step S222)
The acquisition module 136 of the user terminal 10A saves the top line (character information) of the character strings in the recognition result as the name of the screen (screen name), and saves the lines below it (character information) as the contents of the screen (item names).

（ステップＳ２２４）
ユーザ端末１０Ａの取得モジュール１３６は、画面の名称（画面名）と画面の内容（項目名又は文字情報）に基づいて、ガイド情報をダウンロードするためのＵＲＬを取得するようＤＢ２にリクエストする。 (Step S224)
The acquisition module 136 of the user terminal 10A requests the DB2 to acquire the URL for downloading the guide information based on the name of the screen (screen name) and the contents of the screen (item name or character information).

（ステップＳ２２６）
ユーザ端末１０Ａの取得モジュール１３６は、ＤＢ２からガイド情報のダウンロードＵＲＬを取得する。 (Step S226)
The acquisition module 136 of the user terminal 10A acquires the guide information download URL from the DB2.

（ステップＳ２２８）
ユーザ端末１０Ａの取得モジュール１３６は、ガイド情報のダウンロードＵＲＬを用いて、ＤＢ１に格納されたガイド情報のファイルを取得するよう、ＤＢ１にリクエストする。 (Step S228)
The acquisition module 136 of the user terminal 10A requests the DB1 to acquire the guide information file stored in the DB1 using the guide information download URL.

(Step S230)
The acquisition module 136 of the user terminal 10A downloads the guide information file from DB1.
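Steps S224 to S230 amount to a two-stage lookup: DB2 resolves a screen name and item name to a download URL, and DB1 returns the file stored at that URL. The following sketch replaces both databases with in-memory dictionaries. The keys, the example.com URLs, the file contents, and the function names are all hypothetical. A real system would issue network requests instead.

```python
# Stand-in for DB2: (screen name, item name) -> download URL (hypothetical data).
DB2_GUIDE_URLS = {
    ("Profile settings", "Name"): "https://example.com/guides/profile-name.anim",
    ("Profile settings", "Address"): "https://example.com/guides/profile-address.anim",
}

# Stand-in for DB1: download URL -> guide information file (hypothetical data).
DB1_GUIDE_FILES = {
    "https://example.com/guides/profile-name.anim": b"hand-animation-for-name",
    "https://example.com/guides/profile-address.anim": b"hand-animation-for-address",
}


def get_guide_url(screen_name, item_name):
    """Steps S224/S226: ask DB2 for the download URL, or None if not registered."""
    return DB2_GUIDE_URLS.get((screen_name, item_name))


def download_guide(url):
    """Steps S228/S230: fetch the guide file from DB1, or None if missing."""
    return DB1_GUIDE_FILES.get(url)


def fetch_guides(screen_name, item_names):
    """Collect the guide files that exist for the given screen, keyed by item name."""
    guides = {}
    for item_name in item_names:
        url = get_guide_url(screen_name, item_name)
        if url is None:
            continue  # no guide registered for this item
        data = download_guide(url)
        if data is not None:
            guides[item_name] = data
    return guides


guides = fetch_guides("Profile settings", ["Name", "Address", "Phone"])
print(sorted(guides))  # ['Address', 'Name']
```

Items without a registered guide, such as "Phone" here, are simply skipped, so the guide can still be shown for the remaining items.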

(Step S232)
The display control module 137 of the user terminal 10A controls playback of the guide information, for example a hand animation set on a 3D model of a hand, based on the position information of the character string (character information) of the item name.
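One way to use the position information in step S232 is to convert the text position, which is expressed in cropped-image coordinates, back into camera-frame coordinates and play the animation at that point. The sketch below assumes an axis-aligned crop without scaling or rotation, which is a simplification. The function name and the tuple layouts are hypothetical.

```python
def anchor_in_frame(crop_origin, text_box):
    """Return the center of a text box in camera-frame coordinates.

    crop_origin: (x, y) of the crop's top-left corner in the camera frame.
    text_box: (x, y, width, height) of the item name inside the cropped image.
    Assumption: the crop is axis-aligned and not scaled.
    """
    crop_x, crop_y = crop_origin
    x, y, width, height = text_box
    return (crop_x + x + width / 2, crop_y + y + height / 2)


# The hand animation would be played so that the fingertip lands on this point.
print(anchor_in_frame((100, 50), (40, 120, 200, 30)))  # (240.0, 185.0)
```

If the screen of the user terminal 10B appears tilted in the camera image, a perspective transform would be needed in place of the simple offset used here.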

(Step S234)
The user operates the predetermined screen of the user terminal 10B, viewed through the display 151, in accordance with the guide information displayed on the display 151 of the user terminal 10A. The user can also utter a phrase such as "next," make a predetermined gesture indicating that they want to proceed, or make the same movement at the same hand position as the hand animation (whether captured as two-dimensional or three-dimensional data). The user terminal 10 recognizes these words and actions and displays the guide information corresponding to the character string of the item name on the next line.
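The progression in step S234 can be sketched as a small sequencer that moves to the next item name whenever one of the recognized triggers arrives. The event labels and the class name are hypothetical. How speech or gestures are actually recognized is outside the scope of this sketch.

```python
class GuideSequencer:
    """Tracks which item's guide is shown and advances on a recognized trigger."""

    # Hypothetical event labels for the three triggers described in step S234.
    NEXT_TRIGGERS = {"voice:next", "gesture:next", "gesture:mimic"}

    def __init__(self, item_names):
        self.item_names = list(item_names)
        self.index = 0

    @property
    def current(self):
        """Item whose guide is currently displayed, or None when finished."""
        if self.index < len(self.item_names):
            return self.item_names[self.index]
        return None

    def on_event(self, event):
        """Advance to the next line if the event is a recognized trigger."""
        if event in self.NEXT_TRIGGERS and self.index < len(self.item_names):
            self.index += 1
        return self.current


sequencer = GuideSequencer(["Name", "Address"])
print(sequencer.current)                    # Name
print(sequencer.on_event("voice:next"))     # Address
print(sequencer.on_event("noise"))          # Address
print(sequencer.on_event("gesture:mimic"))  # None
```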

As described above, the disclosed technology can provide a mechanism that enables appropriate guidance when settings or the like are made from a predetermined screen using a user interface. In addition, by identifying the guide information for the predetermined screen through object recognition and superimposing the appropriate guide information on a different device, the predetermined screen of the user terminal 10B remains displayed as is, and the user terminal 10A, which is separate from the user terminal 10B, can be used to assist with input and settings on the predetermined screen.

<Screen example>
Next, examples of screens displayed on the display 151 of the user terminal 10A will be described. FIG. 9 is a diagram showing an example of screen transitions on the user terminal 10A according to the embodiment. Screen H10 shown in FIG. 9 is an example in which the display control module 137 displays a mobile phone frame G10 to indicate to the user where the user terminal 10B should be positioned. The frame G10 need not be displayed.

Next, screen H12 shows an example in which the user has brought the user terminal 10B, on which the predetermined screen is displayed, to the position of the frame G10. On screen H12, when the user's hand UH taps the user terminal 10B (step S206 shown in FIG. 8), the detection module 139 detects the tap gesture, and the guide function starts on the user terminal 10A.

Next, screen H14 shows an example in which object recognition and character recognition have been performed and the guide information is superimposed. Upon acquiring the guide information corresponding to the predetermined screen, the display control module 137 controls the display 151 to superimpose the guide information. In the example shown on screen H14, a 3D animation GH is superimposed using AR technology.

Note that the disclosed technology is not limited to the embodiments described above and can be implemented in various other forms without departing from its gist. The above embodiments are therefore merely illustrative in all respects and should not be interpreted restrictively. For example, the processing steps described above may be reordered arbitrarily or executed in parallel, as long as no inconsistency arises in the processing.

The program of each embodiment of the present disclosure may be provided stored in a computer-readable storage medium. The storage medium can store the program in a "non-transitory tangible medium." Programs include, by way of example and not limitation, software programs and computer programs.

[Modifications]
Modifications of the embodiments described above are shown below.

<Modification 1>
In Modification 1, the following techniques may be combined with object recognition: for example, position information from a VPS (Visual Positioning System) or GPS (Global Positioning System), network information for determining communication status from signal strength and the like, and ultrasound. For example, when there are multiple similar objects, such as ticket vending machines, parcel lockers, or delivery boxes, the specific object may be identified based on position information and an image.
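As an illustration of combining position information with image recognition, the sketch below assumes that image recognition has already produced a label such as "parcel locker" and that a registry of known objects with coordinates is available. Among the objects with that label, it picks the one closest to the device's GPS position using the haversine distance. The registry entries, identifiers, and coordinates are hypothetical.

```python
import math


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two latitude/longitude points."""
    earth_radius_m = 6371000.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * earth_radius_m * math.asin(math.sqrt(a))


def pick_object(label, device_position, registry):
    """Among registered objects with the recognized label, return the nearest one."""
    candidates = [obj for obj in registry if obj["label"] == label]
    if not candidates:
        return None
    lat, lon = device_position
    return min(candidates,
               key=lambda obj: haversine_m(lat, lon, obj["lat"], obj["lon"]))


# Hypothetical registry: two visually similar lockers at different locations.
REGISTRY = [
    {"id": "locker-station-east", "label": "parcel locker", "lat": 35.6581, "lon": 139.7017},
    {"id": "locker-station-west", "label": "parcel locker", "lat": 35.6590, "lon": 139.6990},
    {"id": "ticket-machine-1", "label": "ticket vending machine", "lat": 35.6581, "lon": 139.7016},
]

chosen = pick_object("parcel locker", (35.6582, 139.7016), REGISTRY)
print(chosen["id"])  # locker-station-east
```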

<Modification 2>
The image used to identify and recognize an object need not be rectangular. In object recognition, training data prepared in advance may be used instead of data generated dynamically. For example, object recognition may be performed using Semantic Segmentation, Instance Segmentation, three-dimensional recognition, or the like.

<Modification 3>
Voice or the like may be used as a trigger for the user to move on to the next operation. In this case, the user terminal 10A is provided with a speaker and a microphone. Image recognition may also be combined as appropriate when judging the situation. The guide need not be advanced by voice recognition. For example, hand gestures, analysis by image recognition of the items likely to be set, or a tap on the screen may be used.

<Modification 4>
In addition to hand motion, the guide information may include the following: for example, audio, text, images such as moving images and still images, tactile information such as heat, olfactory information such as the smell of smoke, and gustatory information such as the taste of food. The guide information may also be played by applying a hand motion stored in the user terminal 10A, as an animation, to the 3D model of the hand.

<Modification 5>
Information processing devices (objects) for which guide information can be registered include smartphones, posters and notebooks with text written on them, personal computers, large stadium screens, multimedia terminals in convenience stores, post office and courier waybills (Yu-Pri Touch, etc.), parcel lockers and delivery boxes (Pudo, etc.), remote controls (for air conditioners, etc.), ticket vending machines, and televisions.

<Modification 6>
The method of analyzing a screen using text obtained by character recognition, including recognition of Braille and the like used by visually impaired people, may be combined with image recognition of icons and the like. The importance of each piece of text may be determined from the font size, weight, position, and color, and the display order of the guide information may be changed based on that importance.
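A minimal sketch of importance-based ordering follows. The disclosure does not specify a scoring formula, so the weights below are arbitrary assumptions chosen only for illustration, and color is omitted for brevity. The `TextItem` type and the function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TextItem:
    """A recognized piece of text with its visual attributes (hypothetical)."""
    text: str
    font_px: float  # font size in pixels
    bold: bool      # font weight
    y: float        # top edge on the screen, in pixels


def importance(item, screen_height):
    """Score a text item; larger, bolder, and higher text scores more.

    The weights are arbitrary assumptions, not values from the disclosure.
    """
    score = item.font_px
    if item.bold:
        score += 10
    score += 5 * (1 - item.y / screen_height)
    return score


def order_guides(items, screen_height):
    """Return the items sorted from most to least important."""
    return sorted(items, key=lambda item: importance(item, screen_height), reverse=True)


items = [
    TextItem("Terms", font_px=12, bold=False, y=700),
    TextItem("Payment method", font_px=18, bold=True, y=300),
    TextItem("Name", font_px=14, bold=False, y=100),
]
print([item.text for item in order_guides(items, screen_height=800)])
# ['Payment method', 'Name', 'Terms']
```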

<Modification 7>
Devices that display guide information include smartphones, tablet terminals, smart glasses, HMDs (Head Mounted Displays), smart contact lenses, invasive brain devices, non-invasive brain devices, and robots.

<Modification 8>
At least some of the functions of the server 20 may be provided in the user terminal 10A. Likewise, at least some of the data stored by the server 20 may be provided in the user terminal 10A.

1 Information processing system
10A, 10B Information processing device (user terminal)
20, 20A, 20B Information processing device (server)
110, 210 Processing unit (CPU)
120, 220 Network communication interface
130, 230 Memory
131, 231 Operating system
132, 232 Network communication module
133 Image-related data
134 Text-related data
135 Identification module
136 Acquisition module
137 Display control module
138 Transaction control module
139 Detection module
140 Warning module
150 User interface
160 Photographing device
170, 270 Communication bus
233 Object data
234 Character recognition data
235 Guide storage destination data
236 Object recognition module
237 Character recognition module
238 Guide control module
239 Electronic commerce module
240 Voice recognition module

Claims

An information processing method in which one or more processors included in an information processing device execute:
controlling display of an image being photographed by a photographing device;
identifying a screen of another information processing device displayed in the image;
acquiring a recognition result in which each character string displayed within the identified screen has been recognized, the recognition result including character information of each character string;
acquiring guide information corresponding to each piece of character information included in the recognition result;
controlling display of list information from which each piece of character information included in the recognition result can be selected; and
controlling display of the guide information corresponding to one piece of character information selected by a user from the list information, by associating it with the one piece of character information and superimposing it on the image being photographed.

The information processing method according to claim 1, wherein superimposing and controlling display of the guide information includes:
identifying the one piece of character information corresponding to the position of the user's hand;
identifying the guide information corresponding to the identified one piece of character information; and
superimposing the identified guide information on the image being photographed in association with the identified one piece of character information.

The information processing method according to claim 1, wherein
identifying the screen includes:
identifying a screen of the information processing device and acquiring identification information of the screen and position information of the screen; and
specifying a character recognition area based on the position information, and
acquiring the recognition result of the character string includes:
recognizing a character string from the image within the specified area and acquiring a recognition result including character information of the character string.

The information processing method according to any one of claims 1 to 3, wherein the guide information includes a video related to another user's hand operation corresponding to each item related to each character information.

The information processing method according to any one of claims 1 to 4, wherein the one or more processors further execute:
outputting a warning when a character input to an item related to the character information is recognized and the recognition result of the character does not satisfy an input condition associated with the item.

A program that causes one or more processors included in an information processing device to execute:
controlling display of an image being photographed by a photographing device;
identifying a screen of another information processing device displayed in the image;
acquiring a recognition result in which each character string displayed within the identified screen has been recognized, the recognition result including character information of each character string;
acquiring guide information corresponding to each piece of character information included in the recognition result;
controlling display of list information from which each piece of character information included in the recognition result can be selected; and
controlling display of the guide information corresponding to one piece of character information selected by a user from the list information, by associating it with the one piece of character information and superimposing it on the image being photographed.

An information processing device including one or more processors, wherein
the one or more processors execute:
controlling display of an image being photographed by a photographing device;
identifying a screen of another information processing device displayed in the image;
acquiring a recognition result in which each character string displayed within the identified screen has been recognized, the recognition result including character information of each character string;
acquiring guide information corresponding to each piece of character information included in the recognition result;
controlling display of list information from which each piece of character information included in the recognition result can be selected; and
controlling display of the guide information corresponding to one piece of character information selected by a user from the list information, by associating it with the one piece of character information and superimposing it on the image being photographed.

An information processing method in which one or more processors included in an information processing device execute:
controlling display of an image being photographed by a photographing device;
identifying a screen of another information processing device displayed in the image;
recognizing characters displayed within the identified screen and acquiring a recognition result including character information of the characters;
photographing, using the photographing device, guide information including a user's hand operation corresponding to each piece of recognized character information; and
transmitting each piece of character information and each piece of guide information to a server.

The information processing method according to claim 8, wherein the guide information includes character input by the user.

The information processing method according to claim 9, wherein, if personal information is included in the character input, the guide information does not include the personal information.

The information processing method according to any one of claims 8 to 10, wherein
identifying the screen includes:
identifying a screen of the information processing device and acquiring identification information of the screen and position information of the screen; and
specifying a character recognition area based on the position information, and
acquiring the recognition result of the characters includes:
recognizing a character from the image within the specified area and acquiring a recognition result including character information of the character.

An information processing method in which one or more processors included in an information processing device execute:
controlling display of an image being photographed by a photographing device;
identifying a screen of another information processing device displayed in the image;
recognizing characters displayed within the identified screen and acquiring a recognition result including character information of the characters;
photographing, using the photographing device, guide information including a user's hand operation corresponding to each piece of recognized character information; and
transmitting each piece of character information and each piece of guide information to a server.

An information processing device including one or more processors, wherein
the one or more processors execute:
controlling display of an image being photographed by a photographing device;
identifying a screen of another information processing device displayed in the image;
recognizing characters displayed within the identified screen and acquiring a recognition result including character information of the characters;
photographing, using the photographing device, guide information including a user's hand operation corresponding to each piece of recognized character information; and
transmitting each piece of character information and each piece of guide information to a server.