JP6695454B1

JP6695454B1 - Information processing apparatus, information processing method, and program

Info

Publication number: JP6695454B1
Application number: JP2019006893A
Authority: JP
Inventors: 重人斉藤
Original assignee: 株式会社パン・パシフィック・インターナショナルホールディングス
Priority date: 2019-01-18
Filing date: 2019-01-18
Publication date: 2020-05-20
Anticipated expiration: 2039-01-18
Also published as: JP2020119001A

Abstract

Problem to be solved: To improve recognition accuracy when a product is recognized from a product image. Solution: An apparatus includes: a first recognition processing unit that performs product recognition processing based on a first model, generated by learning from a plurality of product images labeled with the product identification information of each product, and on group identification information of a group containing a plurality of similar products; a second recognition processing unit that performs recognition processing of similar products based on a second model generated by learning from product images of the plurality of similar products; an image acquisition unit that acquires a first product image; a first acquisition unit that acquires the group identification information of the group containing the similar product if the recognized product is a similar product, and acquires the product identification information of the recognized product if it is not; a second acquisition unit that, when group identification information has been acquired, acquires the product identification information of the recognized product from the second recognition processing unit; and an output unit that outputs the product identification information. Selected drawing: FIG. 5

Description

The present disclosure relates to an information processing apparatus, an information processing method, and a program.

In recent years, systems that recognize products from product images have been under development. For example, Patent Document 1 discloses a technique that recognizes whether a product identification code is attached to a product image and, if no product identification code is attached, suspends recognition of the product.

Patent Document 1: JP 2018-45494 A

However, even if product identification codes are attached to all products, there is a problem that recognition accuracy falls, for example when a large number of products are handled, because of factors such as the angle at which the product image was captured and obstructions (for example, the background color or a hand).

The present invention has been made in view of the circumstances described above, and one of its objects is to provide an information processing apparatus, an information processing method, and a program capable of improving recognition accuracy when a product is recognized from a product image.

An information processing apparatus according to an embodiment of the present disclosure includes: a first recognition processing unit having one or more first recognition units that perform product recognition processing based on a first model for product recognition, generated by learning from a plurality of product images labeled with the product identification information of each product, and on group identification information of a group containing a plurality of similar products; a second recognition processing unit having one or more second recognition units that perform recognition processing of similar products based on a second model for similar product recognition, generated by learning from product images of the plurality of similar products; an image acquisition unit that acquires a first product image to be recognized; a first acquisition unit that acquires, from the first recognition processing unit that has recognized the product in the first product image using the first model, the group identification information of the group containing the similar product if the recognized product is one of the similar products, and the product identification information of the recognized product if it is not; a second acquisition unit that, when the group identification information has been acquired, acquires the product identification information of the recognized product from the second recognition processing unit that has recognized the product in the first product image using the second model; and an output unit that outputs the product identification information acquired by the first acquisition unit or the second acquisition unit.

According to the disclosed technique, recognition accuracy can be improved when a product is recognized from a product image.

FIG. 1 is a diagram showing the schematic configuration of the recognition system 1 according to the present embodiment.
FIG. 2 is a diagram showing an example of the hardware configuration of the information processing device 20 according to the present embodiment.
FIG. 3 is a diagram showing an example of the functional configuration of the learning device 104 according to the present embodiment.
FIG. 4 is a diagram showing an example of image correction.
FIG. 5 is a diagram showing an example of the functional configuration of the recognition device 106 in the present embodiment.
FIG. 6 is a diagram showing the learning target portions of similar products.
FIG. 7 is a flowchart showing an example of the overall processing of the system in the embodiment.
FIG. 8 is a flowchart showing an example of the learning processing in the embodiment.
FIG. 9 is a flowchart showing an example of the recognition processing in the embodiment.

Embodiments of the present invention are described in detail below with reference to the drawings. Identical elements are given identical reference numerals, and redundant description is omitted.

A. Present Embodiment
The present embodiment assumes a system that recognizes products from product images in a large-scale mass retailer, a supermarket, or the like, but it is applicable to any store that sells products (a convenience store, a specialty store, and so on). It is also applicable not only to product recognition but to recognizing any object contained in a captured image. In the following, a product is used as the example object, but the object is not limited to a product.

(1) Configuration
<System configuration>
FIG. 1 is a diagram showing the schematic configuration of the recognition system 1 according to the present embodiment. As shown in FIG. 1, the recognition system 1 includes a first imaging device 102, a learning device 104, a recognition device 106, and a second imaging device 108. All or some of these devices are connected through a communication network so that they can communicate with one another. The communication network may be, for example, any of the Internet, a LAN, a dedicated line, a telephone line, an intra-company network, a mobile communication network, Bluetooth (registered trademark), WiFi (Wireless Fidelity), another communication line, or a combination of these, and may be wired or wireless. The recognition system 1 may also be configured with the first imaging device 102 and the learning device 104 forming a first system for the learning phase, and the recognition device 106 and the second imaging device 108 forming a second system for the recognition phase.

The first imaging device 102 is a device capable of imaging a product from any angle. The captured product images are sent to the learning device 104. Even if the first imaging device 102 cannot itself image from an arbitrary angle, a user may capture images from arbitrary angles. The first imaging device 102 may image the product from various angles and acquire a large number of captured images so that the learning device 104 can create a three-dimensional model with which the product can be recognized from any angle. For example, when the learning device 104 generates a three-dimensional model, about 500 images are captured per product.

The learning device 104 acquires captured images (product images) of a plurality of products from the first imaging device 102, performs learning on those images, and generates a learning model for product recognition. Deep learning is preferably used so that a large number of products can be learned accurately, but the learning method is not limited to this. The learning device 104 has learning units, each of which generates a learning model for a predetermined number of products; for example, one learning unit generates a learning model for 10,000 products. A learning unit can be implemented with a GPU (Graphics Processing Unit) or the like. In each product image used for learning, the product identification information of the product (hereinafter also called the "product ID") is attached as the correct-answer label, and the image is corrected as necessary. The product ID is, for example, a JAN code.

The learning device 104 also creates a similar product list in which similar products are grouped. In the similar product list, the product IDs of similar products are grouped, and group identification information (hereinafter also called the "group ID") is assigned to each group. Similar products can be identified automatically, for example by checking whether the similarity between the products' images is at or above a threshold. For each unit of learning, the learning device 104 outputs the generated learning model and the similar product list to the recognition device 106. The learning device 104 also performs machine learning, preferably deep learning, on the similar portions of similar products and generates a learning model for similar products. The detailed functions of the learning device 104 are described with reference to FIG. 3.
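As a rough illustration of how a similar product list could be built from pairwise image similarity, the sketch below groups product IDs whose similarity is at or above a threshold and assigns a group ID to each group. The `similarity` callable, the threshold value, the `G001`-style group IDs, and the transitive (union-find) grouping are all assumptions made for the example; the text only states that a similarity threshold can be used.

```python
from itertools import combinations


def build_similar_product_list(product_ids, similarity, threshold=0.9):
    """Group products whose pairwise similarity is >= threshold.

    `similarity` is a hypothetical callable (id_a, id_b) -> float.
    Returns {group_id: sorted product IDs}; products with no similar
    partner are left out of the list.
    """
    parent = {pid: pid for pid in product_ids}

    def find(x):
        # Follow parent links to the representative, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(product_ids, 2):
        if similarity(a, b) >= threshold:
            parent[find(a)] = find(b)  # merge the two groups

    clusters = {}
    for pid in product_ids:
        clusters.setdefault(find(pid), []).append(pid)

    groups = {}
    for members in clusters.values():
        if len(members) > 1:
            groups[f"G{len(groups) + 1:03d}"] = sorted(members)
    return groups
```

Because the grouping is transitive, two products can end up in the same group without being directly similar to each other; a stricter rule (for example, requiring every pair in a group to pass the threshold) would be an equally plausible reading.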

The recognition device 106 has a recognition unit for each unit of learning performed by the learning device 104, and each recognition unit stores a learning model and a similar product list. A recognition unit can be implemented, for example, with a combination of a GPU and a CPU (Central Processing Unit). The recognition device 106 performs primary recognition processing on the product image of the target product acquired from the second imaging device 108, and acquires the product ID of the product recognized using the plurality of recognition units. If the acquired product ID is included in the similar product list, the recognition device 106 performs secondary recognition processing for similar products to recognize the product with higher accuracy. The detailed functions of the recognition device 106 are described with reference to FIG. 5.

The second imaging device 108 is, for example, a camera; it images a product and outputs the captured image to the recognition device 106. The second imaging device 108 is used, for example, to image products at a cash register or to manage products displayed on shelves.

<Hardware configuration>
FIG. 2 is a diagram showing an example of the hardware configuration of the information processing device 20 according to the present embodiment. The information processing device 20 can be used as the hardware of the learning device 104 or of the recognition device 106. As shown in FIG. 2, the information processing device 20 includes a processor 202, a memory 204, a storage 206, an input/output interface (input/output I/F) 208, and a communication interface (communication I/F) 210. The hardware components of the information processing device 20 are connected to one another, for example through a bus B.

The information processing device 20 realizes the functions and/or methods described in the present embodiment through cooperation among the processor 202, the memory 204, the storage 206, the input/output I/F 208, and the communication I/F 210.

The processor 202 executes the functions and/or methods implemented by the code or instructions contained in a program stored in the storage 206. The processor 202 includes, for example, a central processing unit (CPU), an MPU (Micro Processing Unit), a GPU, a microprocessor, a processor core, a multiprocessor, an ASIC (Application-Specific Integrated Circuit), or an FPGA (Field Programmable Gate Array).

The memory 204 temporarily stores the program loaded from the storage 206 and provides a work area for the processor 202. Various data generated while the processor 202 is executing the program are also stored temporarily in the memory 204. The memory 204 includes, for example, RAM (Random Access Memory) and ROM (Read Only Memory).

The storage 206 stores, among other things, the programs executed by the processor 202. The storage 206 includes, for example, an HDD (Hard Disk Drive), an SSD (Solid State Drive), or flash memory.

The input/output I/F 208 includes an input device for entering various operations on the information processing device 20 and an output device for outputting the results of processing performed by the information processing device 20. In the input/output I/F 208, the input device and the output device may be integrated or may be separate.

The input device is realized by any type of device, or combination of devices, that can accept input from a user and pass information about that input to the processor 202. The input device includes, for example, hardware keys such as a touch panel, a touch display, or a keyboard; a pointing device such as a mouse; a camera (operation input through images); and a microphone (operation input by voice).

The output device is realized by any type of device, or combination of devices, that can output the results of processing performed by the processor 202. When the results are output as video and/or moving images, the output device is realized by any type of device, or combination of devices, that can display the display data written to a frame buffer. The output device includes, for example, a device capable of displaying images, text, and the like on a touch panel, a touch display, or a monitor (for example, a liquid crystal display or an OELD (Organic Electroluminescence Display)), and a speaker (audio output).

The communication I/F 210 sends and receives various data over a network. The communication may be wired or wireless, and any communication protocol may be used as long as the parties can communicate with each other. The communication I/F 210 has the function of communicating with other information processing devices over the network. It sends various data to other information processing devices in accordance with instructions from the processor 202, and it receives various data sent from other information processing devices and passes them to the processor 202.

The program of the present embodiment may be provided stored on a computer-readable storage medium. The storage medium can store the program on a "non-transitory tangible medium". The program includes, for example, a software program or a computer program.

At least part of the processing in the information processing device 20 may be realized by cloud computing made up of one or more computers. At least part of the processing in the information processing device 20 may also be performed by another information processing device. In that case, at least part of the processing of each functional unit realized by the processor 202 may be performed by the other information processing device.

<Functional configuration of the learning device>
FIG. 3 is a diagram showing an example of the functional configuration of the learning device 104 according to the present embodiment. In the example shown in FIG. 3, the learning device 104 includes an image correction unit 302 and a deep learning unit 304. The image correction unit 302 and the deep learning unit 304 can be realized, for example, by the processor 202 shown in FIG. 2 and the memory 204 serving as a work area.

The image correction unit 302 performs correction processing, as necessary, on the product images acquired from the first imaging device 102. The correction processing includes cutting out the image of the product portion from the product image and labeling the resulting partial image with product identification information such as a JAN code as the correct answer for learning. The correction processing may also include correcting the background, adding an image of a hand, and combining the product with other products. The image correction unit 302 outputs the corrected images to the deep learning unit 304.

(Example of image correction)
An example of image correction in the present embodiment is described here. FIG. 4 is a diagram showing an example of image correction. The product image A1 is a partial image (hereinafter also called "partial image A1") obtained by cutting out the product portion from a captured product image.

Images C1 to C3 are product images in which the partial image A1 is combined with different background colors. For example, image C1 combines the partial image A1 with a black background, image C2 combines it with a red background, and image C3 combines it with a green background. As a result, the product can be recognized appropriately whatever the background color of the captured product image, and recognition accuracy can be improved.

Images H1 to H3 are product images in which the partial image A1 is combined with an image of a hand. For example, in image H1 the hand holds the lower part of the partial image A1, in image H2 the hand holds the middle, and in image H3 the hand holds the upper part. Combining the product image with an image of a hand in this way allows the product to be recognized appropriately even when it is imaged while a user is holding it, and recognition accuracy can be improved.

Images P1 to P3 are product images in which the partial image A1 is combined with images of other products. For example, image P1 combines the partial image A1 with two units of another product (hereinafter "product M"), image P2 combines it with two units of a different product (hereinafter "product N"), and image P3 combines it with one unit of product M and one unit of product N. As a result, the product can be recognized appropriately even when the captured product image also contains other products, and recognition accuracy can be improved.

As described above, the corrections performed by the image correction unit 302 make it possible to create the learning model in advance with a variety of imaging situations in mind. This improves product recognition accuracy even when products are imaged in differing situations.
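The background-color correction described for images C1 to C3 can be sketched as a simple alpha composite of the cut-out over solid colors, producing one labeled training image per background. This is a minimal sketch in plain Python, assuming the cut-out is available as rows of RGBA tuples; the function names, the image representation, and the sample product ID used in testing are hypothetical and are not taken from the disclosure.

```python
# Background colours corresponding to images C1 (black), C2 (red), C3 (green).
BACKGROUNDS = {"black": (0, 0, 0), "red": (255, 0, 0), "green": (0, 255, 0)}


def composite_on_background(cutout, background_rgb):
    """Place an RGBA cut-out over a solid background colour.

    `cutout` is a list of rows of (r, g, b, a) tuples with a in 0..255.
    Returns rows of (r, g, b) tuples.
    """
    out = []
    for row in cutout:
        out_row = []
        for r, g, b, a in row:
            alpha = a / 255.0
            out_row.append(tuple(
                round(alpha * fg + (1.0 - alpha) * bg)
                for fg, bg in zip((r, g, b), background_rgb)
            ))
        out.append(out_row)
    return out


def augment_with_backgrounds(cutout, product_id):
    """Return (image, label) training pairs, one per background colour."""
    return [(composite_on_background(cutout, rgb), product_id)
            for rgb in BACKGROUNDS.values()]
```

The hand and other-product variants (images H1 to H3 and P1 to P3) could follow the same pattern by compositing an additional RGBA layer instead of a solid color.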

Returning to FIG. 3, the deep learning unit 304 performs deep learning on the product images acquired from the image correction unit 302 and generates a learning model for product recognition. The deep learning unit 304 includes, for example, a GPU control unit 310 and a learning processing unit 312 containing a plurality of learning units (1, 2, 3, ...). Each learning unit can be realized, for example, by a GPU.

The GPU control unit 310 controls the learning performed by each learning unit. As described above, each learning unit receives the product images of, for example, 10,000 products and generates its own learning model (hereinafter also called the "first model"). A similar product list is attached to each generated learning model. The similar product list can be generated by the deep learning unit 304, which identifies similar products using, for example, the similarity between product images, groups the similar products, and assigns a group ID to each group.
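The division of the catalog among learning units can be pictured as a simple partition of product IDs into fixed-size chunks, each chunk yielding one first model. The sketch below assumes consecutive chunking and a default of 10,000 products per unit (the figure the text gives as an example); the function name and the chunking order are illustrative assumptions.

```python
def partition_into_learning_units(product_ids, unit_size=10_000):
    """Split product IDs into consecutive chunks, one chunk per learning unit."""
    if unit_size <= 0:
        raise ValueError("unit_size must be positive")
    ids = list(product_ids)
    return [ids[i:i + unit_size] for i in range(0, len(ids), unit_size)]
```

With 25,000 products this yields three chunks, the last holding the remaining 5,000, so three first models would be trained.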

The GPU control unit 310 also causes a learning unit to perform deep learning using the portions of the product images of similar products that resemble other products. Using these partial images of similar products, the learning processing unit 312 generates a learning model for similar products (hereinafter also called the "second model"). Each partial image of a similar product is labeled with its product ID. The GPU control unit 310 outputs each first model generated by the learning units, or the second model generated by the learning processing unit 312, to the recognition device 106.

<Functional configuration of the recognition device>
FIG. 5 is a diagram showing an example of the functional configuration of the recognition device 106 in the present embodiment. In the example shown in FIG. 5, the recognition device 106 includes a first control unit 502, a first recognition processing unit 504, and a second recognition processing unit 506. These units can be realized, for example, by the processor 202 shown in FIG. 2 and the memory 204 serving as a work area.

The first control unit 502 includes an image acquisition unit 512, a first acquisition unit 514, a determination unit 516, a second acquisition unit 518, and an output unit 520. The image acquisition unit 512 acquires the first product image to be recognized from the second imaging device 108 and outputs it to the first recognition processing unit 504.

The first recognition processing unit 504 has a plurality of recognition units that perform product recognition processing based on a first model for product recognition, generated by learning from a plurality of product images labeled with the product identification information of each product, and on the group ID of a group containing a plurality of similar products. Each recognition unit may be realized by a combination of a GPU and a CPU. Each recognition unit may also be implemented in software, in which case a single GPU can run several recognition units and processing capacity can be extended by adding GPUs. The number of recognition units being run also becomes controllable.

Each recognition unit performs recognition processing (primary recognition processing) on the acquired first product image using the first model that it holds. As its recognition result, each recognition unit obtains the product ID of the product closest to the first product image together with its similarity. If that product ID is included in the similar product list, the group ID of the group containing the product ID is included in the recognition result. Any known similarity measure may be used, for example one based on the squared error between images. The first recognition processing unit 504 outputs the product ID or group ID with the highest similarity to the first acquisition unit 514.
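The selection step of the primary recognition can be sketched as follows: each recognition unit reports its best product ID and similarity, a group ID is attached when the product appears in the similar product list, and the highest-scoring result across all units is passed on. The data class, function names, and the convention that a larger score means a closer match are assumptions for illustration; an error-based measure such as squared error would first have to be converted so that larger values indicate closer matches.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Tuple


@dataclass(frozen=True)
class RecognitionResult:
    product_id: str
    similarity: float                 # assumed: larger means more similar
    group_id: Optional[str] = None    # set only for products in the similar product list


def to_result(product_id: str, similarity: float,
              product_to_group: Dict[str, str]) -> RecognitionResult:
    """Attach the group ID when the product appears in the similar product list."""
    return RecognitionResult(product_id, similarity, product_to_group.get(product_id))


def select_primary_output(results: Iterable[RecognitionResult]) -> Tuple[str, str]:
    """Pick the highest-similarity result across all recognition units.

    Returns ("group", group_id) for a similar product,
    otherwise ("product", product_id).
    """
    best = max(results, key=lambda r: r.similarity)
    if best.group_id is not None:
        return ("group", best.group_id)
    return ("product", best.product_id)
```

Tagging the returned ID with its kind is one way to let a later stage tell product IDs and group IDs apart; the disclosure does not say how the two are distinguished.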

The first acquisition unit 514 acquires from the first recognition processing unit 504 the group ID of the similar product if the recognized product is a similar product, and the product ID of the recognized product if it is not. For example, the first acquisition unit 514 acquires from the first recognition processing unit 504 the product ID or group ID with the highest similarity among the recognition results output by the recognition units.

判定部５１６は、第１取得部５１４により取得されたＩＤは、商品ＩＤであるかグループＩＤであるかを判定する。判定部５１６は、判定結果が商品ＩＤであれば、出力部５２０に商品ＩＤを出力する。また、判定部５１６は、判定結果がグループＩＤであれば、商品画像を第２認識処理部５０６に出力する。 The determination unit 516 determines whether the ID acquired by the first acquisition unit 514 is a product ID or a group ID. If the determination result is the product ID, the determination unit 516 outputs the product ID to the output unit 520. Further, the determination unit 516 outputs the product image to the second recognition processing unit 506 if the determination result is the group ID.

第２認識処理部５０６は、複数の類似商品の商品画像を用いる学習によって生成された類似商品認識のための第２モデルに基づき、類似商品の認識処理（二次認識処理）を行う第２認識部５３４（認識部Ｘ１，Ｘ２，・・・）を１又は複数有する。また、第２認識処理部５０６は、類似商品用の認識部を制御する第２制御部５３２を有する。第２制御部５３２は、各認識部から認識結果を取得する。認識結果は、商品ＩＤや、商品認識が適切に行われなかったときのエラーを示す情報（以下、「エラー情報」とも称す。）を含む。 The second recognition processing unit 506 performs second product recognition processing (secondary recognition processing) based on a second model for similar product recognition generated by learning using product images of a plurality of similar products. It has one or more units 534 (recognition units X1, X2, ...). The second recognition processing unit 506 also includes a second control unit 532 that controls the recognition unit for similar products. The second control unit 532 acquires the recognition result from each recognition unit. The recognition result includes a product ID and information indicating an error when the product recognition is not properly performed (hereinafter, also referred to as “error information”).

(Learning target for similar products)
FIG. 6 is a diagram showing the learning target portions of similar products. In the example shown in FIG. 6, the products S1 to S3 are almost identical except for the partial images R102, R104, and R106. The products S1 to S3 are assigned to one group as similar products, and the group is given a group ID. The second recognition unit 534 has a learning model trained using, for example, the partial images R102, R104, and R106.

Here, for the first product image to be recognized, the second control unit 532 identifies, from the group ID or the like, which part of the product is the similar part, and cuts out a partial image of that part. The second recognition unit 534 performs product recognition using the partial image to be recognized and the second model, and includes in the recognition result the product ID of the product with the highest similarity together with the similarity. The second recognition unit 534 for similar products can thus perform recognition processing using the partial image, that is, the portion in which the distinguishing feature appears, which further improves recognition accuracy.
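The cropping step can be illustrated with a small sketch. The lookup table `GROUP_REGIONS`, the `(top, left, height, width)` region format, and the function `crop_for_group` are assumptions made for illustration; the text only says that the part to cut out is identified from the group ID or the like.

```python
from typing import Dict, List, Tuple

Image = List[List[int]]              # rows of pixel values
Region = Tuple[int, int, int, int]   # (top, left, height, width)

# Hypothetical table: group ID -> region that holds the partial image.
GROUP_REGIONS: Dict[str, Region] = {"G1": (1, 1, 2, 2)}


def crop_for_group(image: Image, group_id: str) -> Image:
    """Cut out the partial image registered for the given group ID."""
    top, left, height, width = GROUP_REGIONS[group_id]
    if top + height > len(image) or left + width > len(image[0]):
        raise ValueError("region lies outside the image")
    return [row[left:left + width] for row in image[top:top + height]]


image = [
    [0, 0, 0, 0],
    [0, 5, 6, 0],
    [0, 7, 8, 0],
    [0, 0, 0, 0],
]
partial = crop_for_group(image, "G1")  # the 2x2 block in the center
```

Only `partial` would then be passed to the second model, so the comparison concentrates on the region registered for the group.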

Returning to FIG. 5, the second control unit 532 outputs error information to the second acquisition unit 518 if the similarity is less than a predetermined threshold, and outputs the matching product ID to the second acquisition unit 518 if the similarity is greater than or equal to the predetermined threshold. The similarity falls below the predetermined threshold when, for example, the partial image is hidden by a hand or the like, so that appropriate recognition processing cannot be performed and the similarity drops. Because appropriate recognition processing is not performed in that case, the second control unit 532 returns error information.
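The threshold rule can be written compactly as below. The names `SecondaryResult` and `control_output` and the threshold value 0.8 are hypothetical; the text specifies only that a predetermined threshold exists, not its value.

```python
from typing import NamedTuple, Optional


class SecondaryResult(NamedTuple):
    product_id: Optional[str]  # None when recognition failed
    error: bool                # True corresponds to "error information"


SIMILARITY_THRESHOLD = 0.8  # hypothetical value


def control_output(
    product_id: str,
    similarity: float,
    threshold: float = SIMILARITY_THRESHOLD,
) -> SecondaryResult:
    """Return the product ID at or above the threshold, otherwise an error."""
    if similarity < threshold:
        return SecondaryResult(None, True)
    return SecondaryResult(product_id, False)
```

Note that a similarity exactly equal to the threshold is accepted, matching the "greater than or equal to" wording of the text.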

The second acquisition unit 518 acquires the recognition result for the similar products from the second recognition processing unit 506. The recognition result includes a product ID or error information. The second acquisition unit 518 outputs the product ID or the error information to the output unit 520.

The output unit 520 outputs the product ID or error information acquired by the first acquisition unit 514 or the second acquisition unit 518. For example, when the recognition device 106 is a cash register device, the output unit 520 outputs the product ID to a settlement unit (not shown), so that the amount corresponding to the product ID is displayed or added to the amount payable by the user. The output unit 520 may also output the error information to an output device to call an attendant. For a product that could not be recognized, the attendant can then register the product in the usual way using a scanning device or the like.

(2) Description of operation
Next, the operation of the recognition system 1 according to the embodiment will be described.
<Overall processing of the system>
FIG. 7 is a flowchart showing an example of the overall processing of the system in the embodiment. In the example shown in FIG. 7, in step S102, the first imaging device 102 of the recognition system 1 captures product images of a product to be learned. The product is preferably imaged many times from various angles so that a three-dimensional model can be created in the subsequent learning process.

In step S104, the learning device 104 executes learning processing, for example deep learning, on the many product images acquired from the first imaging device 102.

In step S106, the recognition device 106 executes recognition processing for recognizing the product on the product image captured by the second imaging device 108, using the model learned by the learning device 104. The product can thus be identified from the captured product image using a model generated by deep learning.

<Learning process>
Next, the learning process of the learning device 104 in this embodiment will be described. FIG. 8 is a flowchart showing an example of the learning process in the embodiment.

In step S202, the image correction unit 302 acquires a product image from the first imaging device 102.

In step S204, the image correction unit 302 performs correction processing on the acquired product image. The correction processing includes, for example, cutting out the image of the product portion from the product image and labeling that partial image with product identification information such as a JAN code. The correction processing may also include correcting the background, adding an image of a hand, combining the product with other products, and the like.
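One way to picture the correction processing is as a compositing step that pastes the cut-out product onto several backgrounds and attaches the same label to each result. The sketch below assumes grayscale images stored as nested lists, uses `None` to mark transparent pixels of the cut-out, and places the product at the top-left corner; the function names `composite` and `augment` and the label value are hypothetical and are not taken from the embodiment.

```python
from typing import List, Optional, Tuple

Image = List[List[int]]
Cutout = List[List[Optional[int]]]  # None marks a transparent pixel


def composite(product: Cutout, background: Image, top: int, left: int) -> Image:
    """Paste the product cut-out over a copy of the background."""
    out = [row[:] for row in background]  # leave the original untouched
    for r, row in enumerate(product):
        for c, pixel in enumerate(row):
            if pixel is not None:
                out[top + r][left + c] = pixel
    return out


def augment(
    product: Cutout, backgrounds: List[Image], label: str
) -> List[Tuple[Image, str]]:
    """Create one labeled training sample per background image."""
    return [(composite(product, bg, 0, 0), label) for bg in backgrounds]


cutout = [[9, None]]                  # 1x2 cut-out, right pixel transparent
plain = [[1, 1], [1, 1]]              # stands in for a plain background
other = [[2, 2], [2, 2]]              # stands in for a hand or another product
samples = augment(cutout, [plain, other], "0000000000000")  # placeholder code
```

Each entry of `samples` pairs a composited image with the product label, which is the form a supervised learner would consume.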

In step S206, the deep learning unit 304 performs deep learning on the product images acquired from the image correction unit 302 and generates a learning model for product recognition. The deep learning unit 304 executes primary learning on the corrected images and secondary learning on the partial images of similar products.

In step S207, the deep learning unit 304 includes in the learned model a similar product list in which similar products are grouped.
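A similar product list of this kind could be represented as a lookup from product ID to group ID that is stored alongside the learned model. The sketch below is illustrative only: `build_similar_product_list`, the `model_bundle` dictionary, and its keys are hypothetical, and the model weights are left as a placeholder.

```python
from typing import Dict, Iterable, Mapping


def build_similar_product_list(
    groups: Mapping[str, Iterable[str]],
) -> Dict[str, str]:
    """Invert {group ID: product IDs} into {product ID: group ID}."""
    lookup: Dict[str, str] = {}
    for group_id, product_ids in groups.items():
        for product_id in product_ids:
            if product_id in lookup:
                raise ValueError(f"{product_id} is assigned to two groups")
            lookup[product_id] = group_id
    return lookup


# Hypothetical bundle: the learned model plus its similar product list.
model_bundle = {
    "weights": None,  # placeholder for the learned parameters
    "similar_product_list": build_similar_product_list(
        {"G1": ["S1", "S2", "S3"]}
    ),
}
```

Keeping the list inside the bundle lets a recognition unit check group membership with a single dictionary lookup after it has found the closest product.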

Performing deep learning on the corrected images in this way makes it possible to handle a variety of imaging situations and improves recognition accuracy. For similar products, cutting out the partial images in which the distinguishing features appear and performing deep learning on those partial images produces a learning model that concentrates on the portions that differ, which also improves recognition accuracy.

<Recognition process>
Next, the recognition process of the recognition device 106 in this embodiment will be described. FIG. 9 is a flowchart showing an example of the recognition process in the embodiment.

In step S302, the second imaging device 108 images the product to be recognized and generates a first product image.

In step S304, the image acquisition unit 512 acquires the first product image to be recognized from the second imaging device 108.

In step S306, the first recognition processing unit 504 outputs the first product image to each recognition unit (1, 2, 3, ...) and acquires a recognition result from each recognition unit. Each recognition result includes the product ID of the product with the highest similarity, together with that similarity. When the product ID of the product with the highest similarity belongs to a group of similar products, the recognition unit also includes the group ID of that group in the recognition result. The first recognition processing unit 504 then identifies the product ID or group ID with the highest similarity among the similarities included in the recognition results.

In step S308, the first acquisition unit 514 acquires the recognition result from the first recognition processing unit 504. The recognition result includes a product ID or a group ID.

In step S310, the determination unit 516 determines whether the ID acquired by the first acquisition unit 514 is a product ID or a group ID. If it is a product ID (step S310-NO), the process proceeds to step S318. If it is a group ID (step S310-YES), the process proceeds to step S312.

In step S312, the second recognition processing unit 506 performs recognition processing of the similar products using the second model for identifying a product from the product images of a plurality of similar products. At this point, the portion in which the distinguishing feature appears may be cut out from the first product image as a partial image. The portion in which the feature appears may be held in the second recognition processing unit 506 in association with the group ID.

In step S314, the second acquisition unit 518 acquires the recognition result for the similar products from the second recognition processing unit 506.

In step S316, the second acquisition unit 518 determines whether the information included in the acquired recognition result is a product ID or error information. If it is a product ID (step S316-YES), the process proceeds to step S318. If it is error information (step S316-NO), the process proceeds to step S320.

In step S318, the output unit 520 outputs the product ID acquired by the first acquisition unit 514 or the second acquisition unit 518.

In step S320, the output unit 520 outputs the error information.

This makes it possible to execute two-stage recognition processing, consisting of normal product recognition processing and product recognition processing that uses a partial image in which the distinguishing portion appears, and thereby improves product recognition accuracy.
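The branching of steps S306 to S320 can be summarized in a short control-flow sketch. The two recognizers are passed in as plain callables so that the example stays self-contained; the function name `recognize`, the callable signatures, and the `("product", ...)` / `("error", None)` return convention are assumptions made for illustration.

```python
from typing import Callable, Optional, Tuple

# Primary recognizer: image -> (ID, True if the ID is a group ID).
Primary = Callable[[object], Tuple[str, bool]]
# Secondary recognizer: (image, group ID) -> product ID, or None on error.
Secondary = Callable[[object, str], Optional[str]]


def recognize(
    image: object, primary: Primary, secondary: Secondary
) -> Tuple[str, Optional[str]]:
    """Return ("product", product_id) or ("error", None)."""
    recognized_id, is_group = primary(image)        # S306 to S308
    if not is_group:                                # S310: NO
        return ("product", recognized_id)           # S318
    product_id = secondary(image, recognized_id)    # S312 to S314
    if product_id is None:                          # S316: NO
        return ("error", None)                      # S320
    return ("product", product_id)                  # S318


# Stub recognizers, used only to exercise the three paths.
direct = recognize("img", lambda i: ("P9", False), lambda i, g: None)
resolved = recognize("img", lambda i: ("G1", True), lambda i, g: "S2")
failed = recognize("img", lambda i: ("G1", True), lambda i, g: None)
```

The secondary recognizer is consulted only when the primary stage returns a group ID, so products outside any similar group incur a single model evaluation.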

As described above, with the recognition system 1 according to this embodiment, products whose product images are similar can be grouped in advance in the learning stage, and learning processing can be executed using the partial images in which the distinguishing features appear. In the recognition stage, recognition accuracy is improved by performing primary recognition processing using the product image and, when the primary recognition processing cannot properly recognize the product, secondary recognition processing using the partial image of the similar products.

In addition, in the learning stage, the partial image of the product portion is cut out from the product image and combined with various backgrounds for learning, so that the product can be recognized properly even when the product to be recognized is imaged under a variety of conditions.

B. Others
The present invention is not limited to the embodiment described above and can be implemented in various other forms without departing from the gist of the invention. The embodiment is therefore merely illustrative in every respect and should not be construed as limiting. For example, the processing steps described above may be reordered arbitrarily or executed in parallel as long as the processing contents remain consistent.

Although the embodiment has been described using a system that recognizes products as an example, the invention can equally be applied to any system that identifies an object from a captured image. For example, it is applicable to biometric authentication such as face authentication and fingerprint authentication.

The recognition system 1 of the embodiment described above is applicable to a real-time cash register system. For example, the recognition device 106 recognizes a product by processing the product image captured by the second imaging device 108 in parallel in the recognition units. The price of the product can be determined using the product ID of the recognized product. In addition, of the learning models generated by the learning units, only those that are needed may be transmitted to the recognition device 106. When a store carries only a small number of items, this avoids sending unnecessary models and reduces the processing load.
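Selecting which models to send to a given store could look like the following sketch. It assumes that each learning model is known to cover a set of product IDs and that a store's assortment is available as a list of product IDs; the function `select_models` and the model names are hypothetical, since the text does not specify how the necessary models are determined.

```python
from typing import Iterable, List, Mapping, Set


def select_models(
    model_coverage: Mapping[str, Set[str]],
    store_products: Iterable[str],
) -> List[str]:
    """Return the names of models covering at least one stocked product."""
    stocked = set(store_products)
    return sorted(
        name for name, covered in model_coverage.items() if covered & stocked
    )


coverage = {
    "model_snacks": {"P1", "P2"},
    "model_drinks": {"P3", "P4"},
    "model_tools": {"P5"},
}
needed = select_models(coverage, ["P2", "P3"])  # a store with a small assortment
```

Here `model_tools` is left out because the store stocks none of the products it covers.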

The embodiment described above is also applicable where a store has a single recognition device 106 and a plurality of cash register devices each equipped with a second imaging device 108. The recognition device 106 acquires a product image from each cash register device and returns the product ID, or the price associated with the product ID, to that cash register device. This reduces the cost of introducing the recognition system 1 into a store.

In the embodiment described above, the recognition device 106 was described as a device inside the store, but the processing described above can also be executed with the recognition device 106 installed on a cloud server and, on the client side, the second imaging device 108 mounted on a cash register device or the like.

1 ... recognition system, 20 ... information processing device, 102 ... first imaging device, 104 ... learning device, 106 ... recognition device, 108 ... second imaging device, 202 ... processor, 204 ... memory, 206 ... storage, 208 ... input/output I/F, 210 ... communication I/F, 302 ... image correction unit, 304 ... deep learning unit, 310 ... GPU control unit, 312 ... learning processing unit, 502 ... first control unit, 504 ... first recognition processing unit, 506 ... second recognition processing unit, 512 ... image acquisition unit, 514 ... first acquisition unit, 516 ... determination unit, 518 ... second acquisition unit, 520 ... output unit, 532 ... second control unit, 534 ... second recognition unit.

Claims

An information processing apparatus comprising:
a first recognition processing unit having one or a plurality of first recognition units that perform product recognition processing based on a first model for product recognition, generated by learning using a plurality of product images labeled with product identification information of each product, and on group identification information of a group including a plurality of similar products;
a second recognition processing unit having one or a plurality of second recognition units that perform recognition processing of similar products based on a second model for similar product recognition, generated by learning using product images of the plurality of similar products;
an image acquisition unit that acquires a first product image to be recognized;
a first acquisition unit that acquires, from the first recognition processing unit that has recognized the product in the first product image using the first model, the group identification information of the group including the similar product if the recognized product is one of the similar products, and the product identification information of the recognized product if the recognized product is not one of the similar products;
a second acquisition unit that, when the group identification information is acquired, acquires product identification information of the recognized product from the second recognition processing unit that has recognized the product in the first product image using the second model; and
an output unit that outputs the product identification information acquired by the first acquisition unit or the second acquisition unit.

The information processing apparatus according to claim 1, wherein the first model is a model learned by a combination of an image of a product portion in the product image and an arbitrary background image of a plurality of background images.

The information processing apparatus according to claim 2, wherein the plurality of background images include at least one of a background image having a different color, an image of a hand, and images of other products.

The information processing apparatus according to claim 1, wherein the second model is a model learned by using a partial image similar to another product in the product image of the similar product.

The information processing apparatus according to any one of claims 1 to 4, wherein the first model includes a three-dimensional model capable of recognizing a product from an arbitrary angle.

An information processing method executed by an information processing apparatus comprising:
a first recognition processing unit having one or a plurality of first recognition units that perform product recognition processing based on a first model for product recognition, generated by learning using a plurality of product images labeled with product identification information of each product, and on group identification information of a group including a plurality of similar products;
a second recognition processing unit having one or a plurality of second recognition units that perform recognition processing of similar products based on a second model for similar product recognition, generated by learning using product images of the plurality of similar products; and
a control unit,
wherein the control unit:
acquires a first product image to be recognized;
acquires, from the first recognition processing unit that has recognized the product in the first product image using the first model, the group identification information of the group including the similar product if the recognized product is one of the similar products, and the product identification information of the recognized product if the recognized product is not one of the similar products;
when the group identification information is acquired, acquires product identification information of the recognized product from the second recognition processing unit that has recognized the product in the first product image using the second model; and
outputs the acquired product identification information.

A program to be executed by an information processing apparatus comprising:
a first recognition processing unit having one or a plurality of first recognition units that perform product recognition processing based on a first model for product recognition, generated by learning using a plurality of product images labeled with product identification information of each product, and on group identification information of a group including a plurality of similar products;
a second recognition processing unit having one or a plurality of second recognition units that perform recognition processing of similar products based on a second model for similar product recognition, generated by learning using product images of the plurality of similar products; and
a control unit,
the program causing the control unit to execute processing of:
acquiring a first product image to be recognized;
acquiring, from the first recognition processing unit that has recognized the product in the first product image using the first model, the group identification information of the group including the similar product if the recognized product is one of the similar products, and the product identification information of the recognized product if the recognized product is not one of the similar products;
when the group identification information is acquired, acquiring product identification information of the recognized product from the second recognition processing unit that has recognized the product in the first product image using the second model; and
outputting the acquired product identification information.