JP2023169922A

JP2023169922A - Information processing system and control method thereof, and program

Info

Publication number: JP2023169922A
Application number: JP2022081259A
Authority: JP
Inventors: 亮高見澤; Akira Takamizawa; 竜一布施; Ryuichi Fuse; 新片岡; Shin Kataoka; 賢太郎田路; Kentaro Taji
Original assignee: Canon Marketing Japan Inc; Canon IT Solutions Inc
Current assignee: Canon Marketing Japan Inc; Canon IT Solutions Inc
Priority date: 2022-05-18
Filing date: 2022-05-18
Publication date: 2023-12-01
Anticipated expiration: 2042-05-18
Also published as: JP7299542B1; JP2023171366A

Abstract

To provide a mechanism for more accurately recognizing an object to be recognized that may contain something other than that object inside it. SOLUTION: An information processing system comprises acquisition means for acquiring, from an image, a container area including a container; processing means for applying, to a partial area inside the container area acquired by the acquisition means, specific processing that reduces the difference from other images; and control means for executing control so that learning processing is performed using the image processed by the processing means. SELECTED DRAWING: Figure 6

Description

The present invention relates to a technique for using an image to recognize an object to be recognized that is included in the image.

Conventionally, there is a known technique in which a trained model is generated by machine learning using images (training images, teacher data) that contain an object to be identified, and an object contained in an image is recognized by inputting that image to the generated trained model.

Prior Art Document 1 proposes improving the detection accuracy for unknown objects (such as objects that do not appear in the training images) by generating a mask image in which an image is separated into an object region and a background region. It discloses that, when the mask image is generated, each pixel of a frame image is assigned "1" (a value corresponding to white) or "0" (a value corresponding to black), so that the pixels are classified into the object region (white area) and the background region (black area).

Japanese Unexamined Patent Application Publication No. 2022-14263

One conceivable use case is to recognize tableware from an image at checkout after a meal in a cafeteria and to perform accounting processing according to the recognized tableware. When tableware is the target of image recognition in this way, leftover food in the tableware lowers the recognition accuracy. That is, when something other than the object to be recognized exists inside that object, it can reduce recognition accuracy. Prior Art Document 1 does not consider the possibility that something other than the object to be recognized exists inside that object.

Accordingly, an object of the present invention is to provide a mechanism that makes it possible to more accurately recognize an object to be recognized that may contain something other than that object inside it.

The present invention is characterized by comprising:
acquisition means for acquiring, from an image, a container area including a container;
processing means for applying, to a partial area inside the container area acquired by the acquisition means, specific processing that reduces the difference from other images; and
control means for executing control so that learning processing is performed using the image processed by the processing means.

According to the present invention, an object to be recognized that may contain something other than that object inside it can be recognized more accurately.

FIG. 1 is a diagram illustrating a system to which the information processing apparatus according to the present embodiment can be applied.
FIG. 2 is a diagram showing an example of the hardware configuration of the various apparatuses.
FIG. 3 is a flowchart showing an example of processing at the time of AI learning.
FIG. 4 is a flowchart showing an example of processing at the time of AI inference.
FIG. 5 is a diagram explaining an example of tableware detection.
FIG. 6 is a diagram explaining an example of a method of blacking out tableware.

Embodiments of the present invention will be described in detail below with reference to the drawings.

First, an example of the configuration of an information processing system according to an embodiment of the present invention will be described with reference to FIG. 1.

In the information processing system of the present invention, a cafeteria payment lane 102, which consists of a camera 103, a display 104, and a checkout table 105, is communicably connected to a client terminal 101 from a predetermined controller 106 (for example, a PoE hub) via a network 107 (for example, Ethernet). A plurality of cafeteria payment lanes 102 may be connected to the client terminal 101.

The camera 103 is installed at a position from which it can photograph an area covering the entire tray on the checkout table 105.

A tray carrying the tableware left after a meal is placed on the checkout table 105 for payment. The tray carrying the tableware may instead be in its pre-meal state.

The client terminal 101 is, for example, a personal computer (hereinafter, PC); it identifies tableware from an image captured by the camera 103 and performs processing such as payment. The client terminal 101 uses deep metric learning to identify the type of tableware placed on the checkout table 105.

Deep metric learning is a technique that extracts only the features of an image, computes a feature vector of the image from the extracted features by means of an algorithm, and measures distances between feature vectors to determine which product the image is closest to. Sample images are prepared in advance, and a feature vector is extracted from each of them. For an input image, the distance between its feature vector and that of each sample image is measured, and the input is determined to be of the same type as the sample at the smallest distance. This embodiment is described using deep metric learning, but other techniques such as deep learning classification may be used.
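The nearest-sample decision described above can be illustrated with a minimal sketch. It assumes that feature vectors have already been extracted by some embedding model and that Euclidean distance is used; the source does not specify the distance measure, and the function name, vectors, and labels below are hypothetical.

```python
import numpy as np

def nearest_sample_label(query_vec, sample_vecs, sample_labels):
    """Return the label of the sample whose feature vector is closest to query_vec."""
    query = np.asarray(query_vec, dtype=float)
    samples = np.asarray(sample_vecs, dtype=float)
    # Euclidean distance from the query to every sample vector (an assumed metric).
    distances = np.linalg.norm(samples - query, axis=1)
    return sample_labels[int(np.argmin(distances))]

# Hypothetical 2-D feature vectors; real embeddings would have many more dimensions.
sample_vecs = [[0.9, 0.1], [0.1, 0.9]]
sample_labels = ["rice bowl", "grilled fish plate"]
result = nearest_sample_label([0.8, 0.2], sample_vecs, sample_labels)
```

In this toy example the query lies much closer to the first sample, so it is assigned that sample's type.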

The display 104 displays the payment information processed by the client terminal 101 and prompts the payer who had the meal to settle the bill. The display 104 may also show the video from the camera 103.

Next, an example of the configuration of the client terminal 101, as an example of an apparatus to which the present invention can be applied, will be described with reference to FIG. 2.

In FIG. 2, a CPU 201, a memory 202, a nonvolatile memory 203, an image processing unit 204, a display 205, an operation unit 206, a recording medium I/F 207, an external I/F 209, and a communication I/F 210 are connected to an internal bus 250. The units connected to the internal bus 250 can exchange data with one another via the internal bus 250.

The memory 202 consists of, for example, a RAM (such as a volatile memory using semiconductor elements). The CPU 201 controls each unit of the client terminal 101 according to a program stored in, for example, the nonvolatile memory 203, using the memory 202 as a work memory. The nonvolatile memory 203 stores image data, audio data, other data, and the various programs that the CPU 201 runs. The nonvolatile memory 203 consists of, for example, a hard disk (HD) or a ROM.

Under the control of the CPU 201, the image processing unit 204 applies various kinds of image processing to image data stored in the nonvolatile memory 203 or the recording medium 208, video signals acquired via the external I/F 209, image data acquired via the communication I/F 210, captured images, and the like. The image processing performed by the image processing unit 204 includes A/D conversion, D/A conversion, encoding of image data, compression, decoding, enlargement/reduction (resizing), noise reduction, and color conversion. The image processing unit 204 may be configured as a dedicated circuit block for performing specific image processing. Depending on the type of image processing, the CPU 201 may also perform the image processing according to a program without using the image processing unit 204. The processing of recognizing the object to be recognized (tableware) from an image is performed by the CPU 201 in cooperation with the image processing unit 204.

Under the control of the CPU 201, the display 205 displays images, GUI screens constituting a GUI (Graphical User Interface), and the like. The CPU 201 generates a display control signal according to a program and controls each unit of the client terminal 101 so as to generate a video signal for display on the display 205 and output it to the display 205. The display 205 displays video based on the output video signal. The client terminal 101 itself may include only the interface for outputting the video signal to be displayed on the display 205, in which case the display 205 may be an external monitor (such as a television).

The operation unit 206 is an input device for accepting user operations, and includes a character input device such as a keyboard, a pointing device such as a mouse or touch panel, buttons, dials, a joystick, a touch sensor, a touch pad, and the like. The touch panel is an input device that is laid flat over the display 205 and outputs coordinate information corresponding to the touched position.

The recording medium I/F 207 accepts a recording medium 208 such as a memory card, CD, or DVD and, under the control of the CPU 201, reads data from the attached recording medium 208 and writes data to it. The external I/F 209 is an interface for connecting to external equipment by wired cable or wirelessly and for inputting and outputting video and audio signals. The communication I/F 210 is an interface for communicating with external equipment, the Internet 211, and the like, and for transmitting and receiving various data such as files and commands.

The camera unit 212 is a camera unit composed of, for example, an image sensor such as a CCD or CMOS element that converts an optical image into an electrical signal.

Next, the basic flow of the learning processing for tableware recognition (learning processing using AI: artificial intelligence) in the embodiment of the present invention will be described with reference to FIG. 3. The processing of each step is executed by the CPU 201 of each apparatus. The processing of FIG. 3 starts when the client terminal 101 learns images, as processing performed before cafeteria customers use the cafeteria payment lane 102.

In S301, the CPU 201 cuts out and acquires an image of each piece of tableware, using a circumscribed rectangle, from an image containing tableware captured by the camera 103, and saves it in the recording medium 208. Specifically, tableware areas are detected from the image captured by the camera 103. This detection differs from the detection of the tableware type described later: the type remains unknown, and the detection only establishes that a piece of tableware (or an object that is not the tray) is present. For each tableware area found by this detection, a rectangle touching the outline of the tableware (hereinafter, circumscribed rectangle) is set. The partial image within the circumscribed rectangle (that is, an image containing a single piece of tableware) is cut out from the original image and saved in the recording medium 208.
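The circumscribed-rectangle cutout of S301 can be sketched as follows. The sketch assumes the detector has already produced a boolean mask marking the pixels of one piece of tableware; the source does not say how the tableware area is detected, and the helper names here are hypothetical.

```python
import numpy as np

def circumscribed_rect(mask):
    """Return (top, left, bottom, right) of the tightest box around True pixels.

    bottom and right are exclusive. Returns None when the mask is empty.
    """
    rows = np.flatnonzero(mask.any(axis=1))
    cols = np.flatnonzero(mask.any(axis=0))
    if rows.size == 0:
        return None
    return int(rows[0]), int(cols[0]), int(rows[-1]) + 1, int(cols[-1]) + 1

def crop_to_rect(image, rect):
    """Cut the partial image inside the rectangle out of the original image."""
    top, left, bottom, right = rect
    return image[top:bottom, left:right].copy()

# Toy 6x8 "image" and a mask covering rows 2..4 and columns 3..6.
image = np.arange(6 * 8).reshape(6, 8)
mask = np.zeros((6, 8), dtype=bool)
mask[2:5, 3:7] = True
rect = circumscribed_rect(mask)
crop = crop_to_rect(image, rect)
```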

In S302, the CPU 201 sets the ratio at which a portion of the inside of each cutout image acquired and saved in S301 is to be blacked out. The ratio can be set freely, but setting it to 100% would fill in the entire image, so it is set to less than 100%. The fill color is not limited to black; other colors may be used.

In S303, based on the ratio set in S302, the CPU 201 blacks out a portion of the inside of each cutout image (an area including tableware) acquired and saved in S301, and saves the result in the recording medium 208. This blackout processing is the same as the blackout in the inference phase described later with reference to FIG. 6. However, the processing of S303, which is part of creating the teacher data, may use images of the tableware alone, with no food or leftovers. Even with images of tableware alone, training on versions whose inside is blacked out makes it possible to train the model not to treat as a feature the inner part of the tableware, where leftovers rather than the tableware itself may appear in the inference phase. In other words, the processing of S303 applies, to a portion of the inside of the container area containing the tableware, processing that reduces the difference from other images.

In S304, the CPU 201 performs learning processing using both the original images cut out with the circumscribed rectangle (the cutout images acquired in S301 before blackout, used as teacher images) and the blacked-out images saved in S303 (also teacher images), together with label information indicating the tableware type of each image, and creates a trained model. The created trained model is recorded in the recording medium 208. Alternatively, only the blacked-out images may be used as teacher images, without the original images. In that case, it is preferable that only blacked-out images be used in the inference phase as well.
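How the teacher data of S304 might be assembled can be sketched as below. This is only an illustration under assumptions: `build_training_set` and its `blackout` argument are hypothetical names, the blackout function is passed in rather than defined, and the actual training of the model is outside the sketch.

```python
def build_training_set(crops, labels, blackout, include_originals=True):
    """Pair each cutout image with its label, adding a blacked-out copy.

    blackout is any callable that returns a blacked-out version of a cutout.
    With include_originals=False, only the blacked-out images are used.
    """
    samples = []
    for crop, label in zip(crops, labels):
        if include_originals:
            samples.append((crop, label))
        samples.append((blackout(crop), label))
    return samples

# Strings stand in for images so the pairing logic is easy to inspect.
crops = ["crop_a", "crop_b"]
labels = ["soup bowl", "rice bowl"]
both = build_training_set(crops, labels, blackout=lambda c: c + "_masked")
masked_only = build_training_set(
    crops, labels, blackout=lambda c: c + "_masked", include_originals=False
)
```

The `include_originals` switch corresponds to the two variants described for S304: training on originals plus blacked-out images, or on blacked-out images only.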

This concludes the description of FIG. 3.

Next, an example of the tableware recognition processing in this embodiment will be described with reference to FIG. 4. This processing is the inference phase that uses the trained model generated by the learning processing of FIG. 3, and is performed when a cafeteria customer uses the cafeteria payment lane 102. The processing of each step is executed by the CPU 201. An example in which the same client terminal 101 performs both the processing of FIG. 3 and the processing of FIG. 4 is described here, but as long as the trained model generated by the processing of FIG. 3 is used, the processing of FIG. 4 may be executed by an information processing apparatus (for example, a PC) separate from the client terminal 101 that performs the processing of FIG. 3.

In S401, the CPU 201 photographs the area of the checkout table 105 with the camera 103. The camera 103 may photograph the checkout table continuously, or may start photographing when some moving object is detected within the photographing range.

In S402, the CPU 201 executes tray placement determination processing, which determines from the captured image whether a tray is placed within a predetermined range. If it is determined in S403 that a tray is placed, the tableware position detection processing of S404 is performed; if it is determined that no tray is placed, the tray placement determination of S402 is executed again.

In S404, the CPU 201 performs photographing with the camera 103 and, as in S301, cuts out and acquires an image of each piece of tableware from the captured image using a circumscribed rectangle. FIG. 5 shows an example of an image captured by the camera 103. The captured image 501 shows a tray 502 placed on the checkout table 105 and tableware 503a to 503d. The positions of the tableware on the tray are detected, and circumscribed rectangles 504a to 504d are calculated for the respective pieces of tableware. FIG. 5 is an example of an image of tableware without leftovers; when there are leftovers, they appear inside each piece of tableware.

In S405, the CPU 201 blacks out the inside of the partial image of each piece of tableware acquired in S404, at a ratio set in advance.

FIG. 6 shows an example of a method of blacking out tableware. Cutout images 601a to 601c, cut out with the circumscribed rectangle (container area), show tableware 602a to 602c together with leftovers 603a to 603c and dirt, respectively. Because leftovers and dirt affect the accuracy of tableware identification, portions that are unnecessary for image recognition, such as leftovers and dirt, are blacked out. This embodiment describes the case where the blackout ratio is set to 50%. For each of the cutout images 601a to 601c, an ellipse (a perfect circle when the circumscribed rectangle is a square) is blacked out from the center of the image at a ratio of 50% horizontally and 50% vertically, generating blacked-out areas 604a to 604c. For the horizontally long tableware 602b and the rectangular tableware 602c, blacking out 50% horizontally and 50% vertically from the centers (C2, C3) of the circumscribed rectangles (cutout images 601b and 601c) produces horizontally long elliptical blacked-out areas 604b and 604c, as illustrated.

Blacking out unnecessary portions such as leftovers and dirt in this way prevents leftovers and the like from being detected erroneously, and improved recognition accuracy can be expected. In addition, the blacked-out area changes (a perfect circle, a horizontally long ellipse, and so on) according to the shape of the area in which tableware was recognized (detected), that is, the shape of the cutout images 601a to 601c. Because the area where something other than the detection target, such as food, is likely to be placed is blacked out in accordance with the shape of the tableware, the type of tableware that is the detection target can be detected more accurately. The unnecessary portion is not limited to leftovers or dirt; it may be any portion that carries no distinguishing feature for detecting a given target.
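The elliptical blackout can be sketched with NumPy as follows. This is a minimal illustration, not the implementation of the embodiment: it assumes that a ratio of 50% means the ellipse axes span 50% of the width and 50% of the height of the cutout image, which is one reading of the description, and the function name is hypothetical.

```python
import numpy as np

def black_out_center_ellipse(crop, ratio=0.5, fill=0):
    """Return a copy of crop with a centered ellipse filled with `fill`.

    The ellipse spans `ratio` of the width and `ratio` of the height
    (an assumed interpretation), so a square crop yields a circle.
    """
    if not 0.0 < ratio < 1.0:
        raise ValueError("ratio must be greater than 0 and less than 1")
    height, width = crop.shape[:2]
    cy, cx = (height - 1) / 2.0, (width - 1) / 2.0   # center of the cutout
    ry, rx = height * ratio / 2.0, width * ratio / 2.0  # semi-axes
    yy, xx = np.ogrid[:height, :width]
    inside = ((yy - cy) / ry) ** 2 + ((xx - cx) / rx) ** 2 <= 1.0
    out = crop.copy()
    out[inside] = fill
    return out

# A white 100x200 (height x width) cutout produces a horizontally long ellipse.
crop = np.full((100, 200, 3), 255, dtype=np.uint8)
masked = black_out_center_ellipse(crop, ratio=0.5)
```

Passing a different `fill` value corresponds to using a fill color other than black, and rejecting a ratio of 1.0 reflects the requirement that the ratio be less than 100%.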

In S406, the CPU 201 uses AI to determine the type of tableware. Specifically, the processed cutout image created in S405 is input to the trained model created in S304 (the trained model stored in the recording medium 208), and inference processing is performed. If a plurality of cutout images were acquired in S404, inference processing is performed on each of them. As the result of the inference processing, a score for each of a plurality of tableware types (the likelihood of the corresponding tableware type) is output for each cutout image. The CPU 201 extracts the types whose scores exceed a predetermined threshold and treats them as the candidate types of the determination result. The number of types extracted as candidate types may be zero, one, or more than one.
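The threshold-based extraction of candidate types in S406 can be sketched as below. The scores, the threshold value, and the function name are hypothetical; the source only states that types whose scores exceed a predetermined threshold become candidates.

```python
def extract_candidates(scores, threshold):
    """Keep only the tableware types whose score exceeds the threshold."""
    return {kind: score for kind, score in scores.items() if score > threshold}

# Hypothetical per-type scores for one cutout image.
scores = {
    "soup bowl": 0.81,
    "rice bowl": 0.77,
    "grilled fish plate": 0.64,
    "mug": 0.12,
}
candidates = extract_candidates(scores, threshold=0.6)
no_candidates = extract_candidates(scores, threshold=0.9)
```

The second call shows the case in which no type exceeds the threshold, so zero candidate types are extracted.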

In S407, it is determined whether any candidate type was extracted as a result of the inference processing in S406. If one or more candidate types were extracted, the processing proceeds to S408; otherwise, that is, if there were zero candidate types (no type had a score exceeding the threshold), the processing proceeds to S414.

The processing of S408 to S412 is performed for each candidate type in turn. As an example, the following describes a case in which three candidate types, a soup bowl, a rice bowl, and a grilled fish plate, were extracted in S406 for one cutout image. In this case, the processing of S408 to S412 is performed for each of the soup bowl, the rice bowl, and the grilled fish plate.

In S408, the CPU 201 acquires a sample image corresponding to the candidate type, among those extracted in S406, that is currently being processed. A sample image is an image included in the correct-answer data (teacher data) for tableware that can appear as a detection result, and was recorded in advance in the recording medium 208 in S301.

In S409, the CPU 201 compares the aspect ratio of the cutout image (circumscribed rectangle), which is the recognition target image from which the candidate type was obtained, with the aspect ratio of the sample image acquired in S408.

In S410, the CPU 201 determines, from the comparison in S409, whether the difference in aspect ratio is within an allowable range. If it is within the allowable range, the processing proceeds to S411; if it is outside the allowable range, the processing proceeds to S414. For example, suppose that in the sample image of the grilled fish plate the circumscribed rectangle of the tableware has a horizontally long aspect ratio of 2:3. If the cutout image (circumscribed rectangle) from which the candidate type "grilled fish plate" was obtained has an aspect ratio of 1:1, the aspect ratio is outside the allowable range for the grilled fish plate, so the determination in this step is No and the grilled fish plate is excluded from the candidate types.
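The aspect-ratio check of S409 and S410 can be sketched as follows. The source does not define how the difference is measured or how large the allowable range is, so the absolute difference of width/height ratios and the tolerance of 0.2 are assumptions, as are the function names and pixel sizes.

```python
def aspect_ratio(width, height):
    """Width divided by height; 1.0 for a square, above 1.0 when horizontally long."""
    return width / height

def aspect_within_tolerance(crop_size, sample_size, tolerance=0.2):
    """Compare (width, height) sizes by aspect ratio (the tolerance is assumed)."""
    crop_ar = aspect_ratio(*crop_size)
    sample_ar = aspect_ratio(*sample_size)
    return abs(crop_ar - sample_ar) <= tolerance

square_crop = (100, 100)        # 1:1 cutout image
fish_plate_sample = (300, 200)  # horizontally long sample, height:width = 2:3
keeps_fish_plate = aspect_within_tolerance(square_crop, fish_plate_sample)
```

With these hypothetical values the square cutout differs too much from the horizontally long sample, so the grilled fish plate would be dropped from the candidates.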

In S411, the CPU 201 compares the size of the cutout image (circumscribed rectangle), which is the recognition target image from which the candidate type was obtained, with the size of the sample image acquired in S408. Specifically, the areas (numbers of pixels) are compared: the area of the circumscribed rectangle detected by the tableware position detection in S404 is compared with the area of the candidate sample images narrowed down in S410.

In S412, the CPU 201 determines, from the comparison in S411, whether the difference in size is within an allowable range. If it is within the allowable range, the processing proceeds to S413; if it is outside the allowable range, the processing proceeds to S414. For example, suppose that the sample image of the rice bowl has size 2, which is larger than size 1 of the sample image of the soup bowl. If the cutout image (circumscribed rectangle) from which the candidate type "rice bowl" was obtained has size 1, and the difference between size 1 and size 2 exceeds the allowable range, the determination in this step is No and the rice bowl is excluded from the candidate types. Because tableware of similar shape can differ in size, the sizes are compared and tableware of a different size is excluded from the candidates. Rice bowls, for example, range from large to small, and comparing the areas of the tableware images makes it possible to narrow down the candidates and tell them apart.
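The size check of S411 and S412 can be sketched in the same way. The relative tolerance of 15% and the pixel sizes are assumptions for illustration; the source only states that areas (numbers of pixels) are compared against an allowable range.

```python
def area_within_tolerance(crop_size, sample_size, rel_tolerance=0.15):
    """Compare (width, height) sizes by pixel area, relative to the sample area."""
    crop_area = crop_size[0] * crop_size[1]
    sample_area = sample_size[0] * sample_size[1]
    return abs(crop_area - sample_area) <= rel_tolerance * sample_area

crop = (100, 100)              # cutout image, "size 1"
soup_bowl_sample = (100, 100)  # sample of size 1
rice_bowl_sample = (130, 130)  # larger sample, "size 2"
keeps_soup_bowl = area_within_tolerance(crop, soup_bowl_sample)
keeps_rice_bowl = area_within_tolerance(crop, rice_bowl_sample)
```

Here the cutout matches the soup bowl sample in area but is too small for the rice bowl sample, so only the rice bowl would be excluded.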

In S413, the CPU 201 determines whether all of the candidate types have been processed. If all have been processed, the processing proceeds to S415; otherwise, it returns to S408 to process the next candidate type.

In S414, the CPU 201 excludes the candidate type being processed from the candidates. That is, that type is not finalized as the recognition result.

In S415, the CPU 201 determines whether any of the candidate types extracted in S406 remains without having been excluded in the processing of S408 to S414. If one remains, the processing proceeds to S416; if none remains (all types were excluded), the processing proceeds to S417.

In S416, from among the candidate types extracted in S406 that were not excluded in the processing of S408 to S412, the CPU 201 identifies the one tableware type with the highest score and finalizes it as the recognition result. That is, one tableware type is identified for one container area.

In S417, on the other hand, the CPU 201 determines that the tableware being detected is unregistered tableware (an unregistered article). In that case, processing is performed so that the unregistered article is not included in the bill. For example, if something other than tableware, such as a towel, has been placed on the tray, it is recognized as an unregistered article and excluded from the bill. A notification such as "Unknown" may also be presented for the article so that it can be identified as unregistered.
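The branch from S415 to S416 or S417 can be sketched as follows. This is an illustration under stated assumptions: `decide_type`, the `(type_name, score)` tuple layout, and the `is_excluded` callback (standing in for the aspect-ratio and size checks of S408 to S414) are hypothetical names, and the type names and scores are made-up sample values.

```python
def decide_type(candidates, is_excluded):
    """Pick the highest-scoring candidate that survived the exclusion checks.

    candidates: list of (type_name, score) pairs, as extracted in S406.
    is_excluded: callable returning True for a type removed in S414.
    Returns "Unknown" when every candidate was excluded (S417).
    """
    remaining = [(name, score) for name, score in candidates
                 if not is_excluded(name)]
    if not remaining:
        return "Unknown"  # S417: treated as an unregistered article
    best_name, _ = max(remaining, key=lambda item: item[1])
    return best_name  # S416: one type per container region


candidates = [("rice bowl", 0.91), ("soup bowl", 0.88), ("small plate", 0.40)]
excluded = {"rice bowl"}  # e.g. rejected by the size check
result = decide_type(candidates, lambda name: name in excluded)
nothing_left = decide_type(candidates, lambda name: True)
```

Here `result` is the soup bowl even though the rice bowl scored higher, because the rice bowl failed a check; `nothing_left` falls through to the unregistered case.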

Once the type of tableware has been identified in S416 or S417 in this way, the CPU 201 refers to that day's menu information and acquires the dish and price corresponding to the identified tableware. After acquiring the dishes and prices for all the tableware in one tray image, it performs control so that the display 104 shows each dish name, its price, and the total amount as the detection result. Thereafter, in response to a payment operation by the user, payment is settled for the displayed total amount.
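The menu lookup and totaling described here might look like the sketch below. The function `build_bill`, the dictionary layout of the menu, and the dish names and prices are hypothetical; the description only states that a dish and price are obtained per tableware type and that unregistered articles are left out of the bill.

```python
def build_bill(recognized_types, menu_of_the_day):
    """Map recognized tableware types to (dish, price) lines and a total.

    menu_of_the_day: dict of tableware type -> (dish_name, price).
    Types with no menu entry, including "Unknown", are not billed.
    """
    lines = []
    for tableware_type in recognized_types:
        entry = menu_of_the_day.get(tableware_type)
        if entry is None:
            continue  # unregistered article: excluded from the bill
        lines.append(entry)
    total = sum(price for _, price in lines)
    return lines, total


# Hypothetical menu and recognition results for one tray image.
menu = {"rice bowl": ("Rice", 150), "soup bowl": ("Miso soup", 100)}
lines, total = build_bill(["rice bowl", "Unknown", "soup bowl"], menu)
```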

The above is the description of FIG. 4.

As described above, according to the present embodiment, images in which a partial area inside the tableware, where something other than the tableware to be recognized (food) is likely to appear, has been blacked out are used for learning and also for inference. As a result, the blacked-out area has the same feature (solid black) in the image of every type of tableware. In the learning phase, the blacked-out area therefore does not yield data showing features (differences) that are useful for discriminating the type of tableware, so the trained model that is generated relies only slightly on that area as a basis for judging the type of tableware. With a trained model generated in this way, whatever foreign matter is present in the area of the tableware corresponding to the area that was blacked out during learning, its influence on the judgment of the tableware type is small. That is, the possibility of an erroneous judgment caused by leftover food is reduced, and the type of tableware can be discriminated more accurately.

Even if an image in which the area containing leftover food is left as is, without being blacked out, is input to this trained model as the detection target image for inferring the tableware type, the area corresponding to the area that was blacked out during learning contributes little as a basis for judging the type of tableware. That is, the image of the leftover food causes no reduction in recognition accuracy, or only a limited one. Accordingly, recognition accuracy can be improved merely by performing learning, as shown in FIG. 3, with teacher images in which part of the inside of the tableware has been blacked out, without performing blacking out at inference time. The blacking-out process of S405 in FIG. 4 therefore need not be performed. This reduces the processing time and processing load at inference time, so that inference results can be reported with high responsiveness. Of course, higher accuracy can be expected if the process of S405 is performed.
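The blacking-out of the inner area can be sketched as follows. This is a minimal illustration using plain Python lists as a grayscale image; the function name `black_out_inner_ellipse`, the `ratio` parameter value, and the choice of a centered ellipse inscribed in a scaled copy of the circumscribed rectangle are assumptions for the example, not details confirmed by the description.

```python
def black_out_inner_ellipse(image, ratio=0.6, fill=0):
    """Return a copy of `image` with a centered ellipse filled with `fill`.

    image: list of rows of grayscale pixel values, already cut out by the
           circumscribed rectangle of the tableware.
    ratio: hypothetical fraction of the width/height covered by the ellipse.
    """
    result = [row[:] for row in image]  # leave the input untouched
    if ratio <= 0:
        return result
    height, width = len(image), len(image[0])
    center_y, center_x = (height - 1) / 2, (width - 1) / 2
    radius_y, radius_x = ratio * height / 2, ratio * width / 2
    for y in range(height):
        for x in range(width):
            inside = (((x - center_x) / radius_x) ** 2
                      + ((y - center_y) / radius_y) ** 2) <= 1.0
            if inside:
                result[y][x] = fill
    return result


# A 10x10 all-white cutout: the center becomes black, the rim is kept.
cutout = [[255] * 10 for _ in range(10)]
teacher_image = black_out_inner_ellipse(cutout)
```

Because the rim of the tableware is left intact, the same function could be applied either to teacher images only or to both teacher and inference images, matching the two options discussed above.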

In the above embodiment, an example was described in which the proportion of the tableware interior to be blacked out is set for an image cut out along the circumscribed rectangle of the area containing the tableware, but the invention is not limited to this. AI may be used to extract only the area of leftover food on the detected tableware, and only that area may be blacked out.

Alternatively, the following processing may be performed. After S404, S405 is omitted, and the tableware type determination of S406 is executed on the image of the container region detected in S404 from the recognition target image, without blacking out. Next, differences are extracted between each of a plurality of sample images corresponding to the respective candidate types (pre-stored images of tableware carrying no foreign matter such as leftover food) and the non-blacked-out image of the container region detected in S404. Then, for each candidate type, an image is created in which the difference area (that is, an area that is not tableware and is presumed to be leftover food or the like) of the non-blacked-out container-region image is blacked out, and the tableware type determination of S406 is executed again on these images. The single type with the highest score among the resulting types may be confirmed as the type of the detected tableware. In other words, it is also possible to identify the leftover portion from the difference between the detection target image (recognition target image) and the candidate image, black out that portion, and then perform the inference processing.
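The difference-based variant can be sketched as follows. Everything here is a simplified assumption: images are same-sized grayscale lists, the difference is a per-pixel absolute difference against a hypothetical `threshold`, and `classify` is a stand-in for the trained model of S406 that returns a `(type, score)` pair. Alignment and scaling between the sample image and the cutout, which a real system would need, are not addressed.

```python
def black_out_difference(target, sample, threshold=30, fill=0):
    """Black out pixels of `target` that differ from `sample` by more than
    `threshold` (presumed leftover food); other pixels are kept."""
    return [[fill if abs(t - s) > threshold else t
             for t, s in zip(target_row, sample_row)]
            for target_row, sample_row in zip(target, sample)]


def reclassify_with_difference_masks(target, samples_by_type, classify):
    """Run `classify` once per candidate type on a difference-masked image
    and return the (type, score) result with the highest score."""
    best = None
    for sample in samples_by_type.values():
        masked = black_out_difference(target, sample)
        result = classify(masked)
        if best is None or result[1] > best[1]:
            best = result
    return best


def stub_classifier(image):
    """Hypothetical stand-in for the trained model; not a real classifier."""
    brightness = sum(sum(row) for row in image)
    return ("plate" if brightness > 300 else "bowl", brightness / 1000)


target = [[200, 200], [50, 200]]          # one dark "leftover" pixel
samples = {"plate": [[200, 200], [200, 200]],
           "bowl": [[50, 50], [50, 50]]}
masked_for_plate = black_out_difference(target, samples["plate"])
best = reclassify_with_difference_masks(target, samples, stub_classifier)
```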

In the above embodiment, an example was described in which the tableware is blacked out in an elliptical shape, as shown in FIG. 6, but the invention is not limited to this. The shape of the blacked-out area may be changed to match the shape of the tableware. For example, if the tableware is rectangular, as with the rectangle 606, the blacked-out area may also be made rectangular.

Furthermore, an example has been described in which a partial area inside the tableware, where something other than the tableware to be recognized (food) is likely to appear, is blacked out. However, the processing is not limited to this, as long as it reduces the image-to-image features (differences) in that area and thereby lowers the likelihood that the area becomes a basis for recognition. For example, the area described as being blacked out may be filled with any single color, such as solid white or solid blue. It may also be replaced with a simple pattern image, such as one in which black and gray alternate every few dots.
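The rectangular shape and the alternative fills mentioned above can be sketched together. This is an illustrative assumption, not the described implementation: `fill_inner_rectangle`, the `ratio` value, the diagonal striping rule, and the `period` of two dots are all hypothetical choices. A one-element `pattern` gives a solid fill; a two-element pattern alternates the two values.

```python
def fill_inner_rectangle(image, ratio=0.6, pattern=(0,), period=2):
    """Return a copy of `image` whose centered rectangle is replaced by a
    repeating pattern of grayscale values.

    pattern: values to cycle through, e.g. (0,) for solid black,
             (255,) for solid white, (0, 128) for black/gray stripes.
    period:  number of dots after which the pattern value changes.
    """
    height, width = len(image), len(image[0])
    mask_height, mask_width = round(height * ratio), round(width * ratio)
    top, left = (height - mask_height) // 2, (width - mask_width) // 2
    result = [row[:] for row in image]  # leave the input untouched
    for y in range(top, top + mask_height):
        for x in range(left, left + mask_width):
            result[y][x] = pattern[((x + y) // period) % len(pattern)]
    return result


cutout = [[200] * 10 for _ in range(10)]
solid_white = fill_inner_rectangle(cutout, pattern=(255,))
striped = fill_inner_rectangle(cutout, pattern=(0, 128))
```

Either variant keeps the outline of the tableware unchanged, which is the property the description relies on when it compares filling with cutting a hole in the image.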

The area described as being blacked out may instead be made a no-image area containing no image. An image with a hole in the center, that is, the circumscribed rectangle of the tableware excluding the area described as being blacked out, may be used for learning and inference. Note, however, that if the shape of the tableware differs between the image used for learning and the image used for inference, tableware that should be of the same type may be determined to be of a different type. For example, if images of tableware with a hole in the center are learned, an image without a hole in the center is more likely to be determined at inference time to be different tableware. In this respect, filling with a single color or a pattern during learning leaves the shape itself unchanged, which gives more freedom in the images used at inference time (effective inference is possible even with an unmodified image that has not been blacked out). If a no-image area is used instead of blacking out, the type of tableware in the detection target image is not accurately known before inference, so it is difficult to cut out a no-image area at inference time with the same shape as during learning. Replacement with a featureless pattern such as solid black is therefore expected to be more effective than creating a no-image area.

Although the above embodiment describes an example of recognizing the type of tableware, the invention is not limited to this and is applicable whenever something other than the detection target may be present inside the detection target. For example, it can be applied to generating a trained model that discriminates the type of a pot itself from an image regardless of the food or dish inside the pot, and to performing inference with such a model. It is also applicable to recognizing (detecting) containers used in experiments such as beakers and petri dishes, containers holding medicines, cosmetics, food, beverages, and the like (bottles, cups, etc.), and cargo containers (wooden boxes, cardboard boxes, resin containers, etc.). In each case, it contributes to accurately recognizing (detecting) the container itself regardless of its contents.

The present invention can be embodied as, for example, a system, an apparatus, a method, a program, or a recording medium. Specifically, it may be applied to a system made up of a plurality of devices, or to an apparatus consisting of a single device.

The various kinds of control described above as being performed by the CPU 201 may be performed by a single piece of hardware, or the processing may be shared among multiple pieces of hardware (for example, multiple processors or circuits) to control the apparatus as a whole.

Although the present invention has been described in detail based on its preferred embodiments, the present invention is not limited to these specific embodiments, and various forms that do not depart from the gist of the invention are also included in the present invention. Furthermore, each of the embodiments described above merely represents one embodiment of the present invention, and the embodiments may be combined as appropriate.

In the above embodiment, the case where the present invention is applied to a PC has been described as an example, but the invention is not limited to this example and is applicable to any device capable of generating a blacked-out image. That is, the present invention is applicable to PDAs, mobile phone terminals (smartphones), tablet terminals, and the like.

(Other Embodiments)
The present invention is also realized by executing the following processing: software (a program) that realizes the functions of the embodiment described above is supplied to a system or apparatus via a network or various storage media, and a computer (or a CPU, MPU, or the like) of the system or apparatus reads out and executes the program code. In this case, the program and the storage medium storing the program constitute the present invention.

101 Client terminal
107 Network

Claims

An information processing system comprising:
acquisition means for acquiring, from an image, a container region including a container;
processing means for performing, on a partial area inside the container region acquired by the acquisition means, specific processing that reduces differences from other images; and
control means for performing control so that a learning process is performed using the image processed by the processing means.

The information processing system according to claim 1, wherein the control means performs control so that the learning process is performed using an image processed by the processing means and an image not processed by the processing means.

The information processing system according to claim 1, wherein the control means performs control so that the learning process is performed using images processed by the processing means, without using images not processed by the processing means.

The information processing system according to claim 1, wherein the partial area is an area corresponding to an object different from the container, the area being based on the difference between an image in which the object is inside the container region and an image in which no object different from the container is inside the container region.

The information processing system according to claim 1, wherein the learning process is a process of generating a trained model for identifying the type of container from an image.

The information processing system according to claim 5, wherein the control means inputs a recognition target image showing a container to the trained model generated in the learning process, and performs control to perform an inference process that determines the type of container included in the image.

The information processing system according to claim 6, wherein the processing means also performs the specific processing on a container region including the container in the recognition target image, and the control means performs control so that the image processed by the processing means is input to the trained model to perform the inference process.

An information processing system comprising:
acquisition means for acquiring, from an image, a container region including a container;
processing means for performing, on a partial area inside the container region acquired by the acquisition means, specific processing that reduces differences from other images; and
control means for performing control so that inference processing is performed using the image processed by the processing means.

The information processing system according to claim 6, wherein, when the type of container identified by the inference process satisfies at least one of a condition related to aspect ratio and a condition related to size, the control means performs control to output information indicating the type of container identified by the inference process as the type of container included in the recognition target image.

2. The processing means performs the same processing as the specific processing on each partial region of the plurality of container regions acquired by the acquisition means from one or more images. 8. The information processing system according to any one of 8.

The information processing system according to claim 1, wherein the specific processing is a process of filling with a specific color or pattern.

The information processing system according to claim 11, wherein the specific color is black.

The information processing system according to claim 1, wherein the processing means performs the specific processing on a partial area within a predetermined range from the center of the container region.

The information processing system according to claim 1, wherein the processing means performs the specific processing on a partial area whose shape differs depending on the shape of the container region.

A control method for an information processing system, comprising:
an acquisition step of acquiring, from an image, a container region including a container;
a processing step of performing, on a partial area inside the container region acquired in the acquisition step, specific processing that reduces differences from other images; and
a control step of performing control so that a learning process is performed using the image processed in the processing step.

A control method for an information processing system, comprising:
an acquisition step of acquiring, from an image, a container region including a container;
a processing step of performing, on a partial area inside the container region acquired in the acquisition step, specific processing that reduces differences from other images; and
a control step of performing control so that inference processing is performed using the image processed in the processing step.

A program for causing at least one computer to function as each means of the information processing system according to any one of claims 1 to 8.