JP2024011398A

JP2024011398A - Program, image processing method, image processing device, model generation method, and image processing system

Info

Publication number: JP2024011398A
Application number: JP2022113343A
Authority: JP
Inventors: 雄一郎四元; Yuichiro Yotsumoto
Original assignee: Dai Nippon Printing Co Ltd
Current assignee: Dai Nippon Printing Co Ltd
Priority date: 2022-07-14
Filing date: 2022-07-14
Publication date: 2024-01-25

Abstract

PROBLEM TO BE SOLVED: To provide a program and the like that can sort images with high precision without requiring time for sorting processing.

SOLUTION: A computer acquires a captured image obtained by photographing a subject and generates a subject image by removing the background region from the acquired captured image. The computer inputs the subject image into a learning model trained to output information regarding the suitability of a captured image when a subject image obtained by removing the background region from that captured image is input, and obtains the suitability information output by the learning model. The computer then sorts the captured image as suitable or unsuitable based on the suitability information output from the learning model.

SELECTED DRAWING: Figure 2

Description

The present application relates to a program, an image processing method, an image processing device, a model generation method, and an image processing system.

Patent Document 1 discloses a sales system that sells images taken by photographers at sports tournaments, events, and the like as photographs and image data. In a sales system such as that disclosed in Patent Document 1, the large number of images taken by photographers are sorted into images to be sold and images not to be sold, and this sorting is often done manually.

Patent Document 1: Japanese Unexamined Patent Application Publication No. 2021-18534

When images are sorted manually, sorting may take a long time depending on the number of images, and the results may differ depending on the subjective sensibility of the person doing the sorting. It is therefore conceivable to sort images using a learning model generated by machine learning. However, if the images used to train the learning model are biased, it may not be possible to generate a learning model that sorts images with high accuracy. Thus, there is a problem in that it is difficult to sort images with high accuracy without the sorting process taking a long time. Patent Document 1 does not mention any process for sorting images.

An object of the present disclosure is to provide a program and the like that can sort images with high accuracy without the sorting process taking a long time.

A program according to one aspect of the present invention causes a computer to execute processing of: acquiring a captured image of a subject; removing a background region from the acquired captured image; inputting the resulting subject image into a learning model trained to output information regarding the suitability of a captured image when a subject image obtained by removing the background region from that captured image is input, and thereby outputting information regarding the suitability of the captured image from the learning model; and sorting the captured image as suitable or unsuitable based on the information regarding its suitability.
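The processing flow described above can be sketched as follows. This is only an illustrative outline, not the applicant's implementation: the function names, the list-of-lists image type, the toy background remover and toy model, and the 0.5 threshold are all assumptions introduced for the example.

```python
from typing import Callable, List, Sequence, Tuple

Image = List[List[int]]  # hypothetical grayscale image: rows of pixel values


def sort_images(
    images: Sequence[Image],
    remove_background: Callable[[Image], Image],
    model: Callable[[Image], float],
    threshold: float = 0.5,  # assumed cut-off, chosen only for illustration
) -> Tuple[List[int], List[int]]:
    """Return (suitable, unsuitable) index lists for the captured images."""
    suitable: List[int] = []
    unsuitable: List[int] = []
    for idx, captured in enumerate(images):
        subject = remove_background(captured)  # subject image without background
        score = model(subject)                 # suitability information from the model
        (suitable if score >= threshold else unsuitable).append(idx)
    return suitable, unsuitable


# Toy stand-ins so the sketch runs: "background" is any pixel below 10,
# and the "model" scores an image by its fraction of non-zero pixels.
def toy_remove_background(img: Image) -> Image:
    return [[p if p >= 10 else 0 for p in row] for row in img]


def toy_model(img: Image) -> float:
    pixels = [p for row in img for p in row]
    return sum(1 for p in pixels if p > 0) / len(pixels)


good, bad = sort_images(
    [[[200, 180], [5, 150]], [[3, 4], [2, 90]]],
    toy_remove_background,
    toy_model,
)
```

Passing the background remover and the model as callables keeps the sorting step independent of how either is realized.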

In one aspect of the present invention, images can be sorted with high accuracy without the sorting process taking a long time.

FIG. 1 is an explanatory diagram showing a configuration example of an image processing system.
FIG. 2 is a block diagram showing a configuration example of a server and an administrator terminal.
FIG. 3 is an explanatory diagram showing an example of the record layout of the DBs stored in the server.
FIG. 4 is an explanatory diagram of a learning model.
FIG. 5 is a flowchart showing an example of a learning model generation procedure.
FIG. 6 is a flowchart showing an example of a captured image sorting procedure.
FIG. 7 is an explanatory diagram showing a screen example of the administrator terminal.
FIG. 8 is a flowchart showing an example of the sorting procedure of Embodiment 2.

The program, image processing method, image processing device, model generation method, and image processing system of the present disclosure are described in detail below with reference to the drawings showing their embodiments.

(Embodiment 1)
FIG. 1 is an explanatory diagram showing a configuration example of an image processing system. This embodiment describes an image processing system that sorts images captured by a photographer according to whether or not they are to be sold, and sells the captured images sorted as sales targets. The image processing system of this embodiment is applicable to a system in which images taken by the event organizer's photographers at events such as professional sports matches and concerts are sold to event participants and others as photographs and image data. The image processing system of this embodiment includes a server 10, a camera 20, an administrator terminal 30, a photo vending machine 40, a user terminal 50, and the like, and these devices are communicatively connected via a network N. The network N may be the Internet or a public telephone network, or may be a LAN (Local Area Network) built within the facility where the image processing system is installed. The server 10 and any of the camera 20, the administrator terminal 30, and the photo vending machine 40 may also be configured to exchange information directly by wired communication over a cable or by wireless communication.

The camera 20 is an imaging device that includes an imaging unit having a lens, an image sensor, and the like, and a communication unit for connecting to the network N. In response to operation of the shutter button, the camera 20 captures an image with the imaging unit to acquire image data (hereinafter referred to as a captured image), and transmits the acquired captured image from the communication unit to the server 10. The camera 20 is configured to acquire a single image (still image) per operation of the shutter button and also to acquire, for example, 30 or 15 images per second (video). The camera 20 may transmit captured images to the server 10 sequentially as they are acquired, or may accumulate captured images in a storage unit and transmit them to the server 10 in response to an operation by the photographer. The camera 20 may be a camera held by the photographer while shooting, or a camera whose shooting position is fixed using a tripod or a fixture, and a plurality of cameras 20 may be provided at one event venue.

The server 10 is an image processing device capable of various kinds of information processing and of transmitting and receiving information, and is a server computer, a personal computer, or the like. The server 10 acquires the captured images taken by the camera 20 and sorts the acquired captured images according to whether or not they are to be sold. The server 10 of this embodiment uses a learning model 12M (see FIG. 2) when sorting captured images in this way. The administrator terminal 30 is a terminal used by an administrator who manages the captured images sold by the server 10, and is a personal computer, a tablet terminal, a smartphone, or the like. The administrator further judges whether the captured images that the server 10 has sorted as sales targets using the learning model 12M should really be sold.

The server 10 also sells, via the network N, the captured images that the server itself or the administrator has sorted as sales targets. Captured images sold by the server 10 are purchased via the photo vending machine 40 or the user terminal 50. The photo vending machine 40 is a terminal installed at, for example, the venue of the event photographed by the camera 20, and includes a communication unit for communicating with the server 10, a touch panel, a payment processing unit, a printing unit, and the like. The photo vending machine 40 displays the captured images sold by the server 10 on the touch panel and accepts selection of a captured image to purchase. The photo vending machine 40 also performs payment processing for the purchase with the payment processing unit, prints the paid-for captured image with the printing unit, and provides the print to the purchaser. The payment may be made by any method, such as cash, electronic money, credit card, or app payment.
The user terminal 50 is a terminal used by a user who purchases captured images, and is a smartphone, a tablet terminal, or the like. A browser for viewing websites via the network N is installed on the user terminal 50, and the browser is used to view the captured images sold by the server 10 and to perform payment processing for their purchase. When a captured image is purchased using the user terminal 50, the captured image is downloaded from the server 10 to the user terminal 50 and can be printed by sending it to a printer with which the user terminal 50 can communicate. Instead of being downloaded to the user terminal 50, a captured image purchased using the user terminal 50 may be downloaded from the server 10 to, for example, a printer installed at a convenience store and printed there, or may be sent to a designated printing company, with the printed photograph delivered to the user's home or the like.

FIG. 2 is a block diagram showing a configuration example of the server 10 and the administrator terminal 30. The server 10 includes a control unit 11, a storage unit 12, a communication unit 13, an input unit 14, a display unit 15, and the like, which are connected via a bus. The control unit 11 includes one or more processors such as a CPU (Central Processing Unit), an MPU (Micro-Processing Unit), a GPU (Graphics Processing Unit), or an AI chip (a semiconductor for AI). The control unit 11 carries out the information processing and control processing to be performed by the server 10 by executing, as appropriate, a program 12P stored in the storage unit 12.

The storage unit 12 includes a RAM (Random Access Memory), a flash memory, a hard disk, an SSD (Solid State Drive), and the like. The storage unit 12 stores the program 12P (program product) executed by the control unit 11 and various data. The storage unit 12 also temporarily stores data and the like generated when the control unit 11 executes the program 12P. The program 12P and the various data may be written into the storage unit 12 when the server 10 is manufactured, or may be downloaded by the control unit 11 from another device via the communication unit 13 and stored in the storage unit 12. The storage unit 12 also stores a learning model 12M that has been trained on training data by, for example, machine learning. The learning model 12M is a trained model that, when a captured image taken by the camera 20 (strictly speaking, a subject image obtained by removing the background region from the captured image) is input, outputs information indicating whether or not the captured image is suitable for sale. An image suitable for sale is an image that people would want to purchase, such as an image in which the subject is in focus, an image with good composition or angle, or an image that people generally like.
In the following, an image that is suitable for sale is referred to as a "good image" and an image that is not suitable for sale as a "bad image"; whether an image is good or bad means whether or not it is suitable for sale. The learning model 12M is assumed to be used as a program module forming part of artificial intelligence software. The learning model 12M performs a predetermined computation on its input values and outputs the result, and the storage unit 12 stores data such as the coefficients and thresholds of the functions that define this computation as the learning model 12M.

The storage unit 12 also stores a captured image DB 12a, a sorted image DB 12b, a determination result DB 12c, and a sales image DB 12d. The captured image DB 12a is a database in which the captured images taken by the camera 20 are accumulated. The sorted image DB 12b is a database that stores the results of sorting the captured images into good images and bad images using the learning model 12M. The determination result DB 12c is a database that stores the results of the administrator's judgment, good or bad, of the captured images that were sorted as good images using the learning model 12M. The sales image DB 12d is a database that stores the captured images that the administrator judged to be good images and that have therefore become sales targets. Some or all of the learning model 12M, the captured image DB 12a, the sorted image DB 12b, the determination result DB 12c, and the sales image DB 12d may be stored in another storage device connected to the server 10, or in another storage device with which the server 10 can communicate.

The communication unit 13 is a communication module for performing processing related to wired or wireless communication, and transmits and receives information to and from other devices via the network N. The input unit 14 accepts operation input from a user and sends a control signal corresponding to the operation to the control unit 11. The display unit 15 is a liquid crystal display, an organic EL display, or the like, and displays various information in accordance with instructions from the control unit 11. Part of the input unit 14 and the display unit 15 may be integrated as a touch panel, and the touch panel may be externally attached to the server 10.

In this embodiment, the server 10 may be a multicomputer made up of a plurality of computers, a virtual machine constructed virtually by software, or a cloud server. The input unit 14 and the display unit 15 are not essential to the server 10; the server 10 may accept operations through a connected computer and may output information to be displayed to an external display device. The server 10 may also include a reading unit that reads a non-transitory computer-readable portable storage medium 10a, and may use the reading unit to read the program 12P from the portable storage medium 10a and store it in the storage unit 12. The program 12P may be executed on a single computer or on a plurality of computers interconnected via the network N.

The administrator terminal 30 includes a control unit 31, a storage unit 32, a communication unit 33, an input unit 34, a display unit 35, and the like, which are connected via a bus. The control unit 31, storage unit 32, communication unit 33, input unit 34, and display unit 35 have the same configurations as the control unit 11, storage unit 12, communication unit 13, input unit 14, and display unit 15 of the server 10, respectively, so their description is omitted. The user terminal 50 has the same configuration as the administrator terminal 30, so its illustration and description are omitted. The photo vending machine 40 has a payment processing unit and a printing unit in addition to the same configuration as the administrator terminal 30, but a detailed description of its configuration is omitted.

FIG. 3 is an explanatory diagram showing an example of the record layout of the DBs 12a to 12d stored in the server 10. FIG. 3A shows the captured image DB 12a, FIG. 3B shows the sorted image DB 12b, FIG. 3C shows the determination result DB 12c, and FIG. 3D shows the sales image DB 12d. The captured image DB 12a, the sorted image DB 12b, the determination result DB 12c, and the sales image DB 12d are each provided for every event to be photographed, and are stored in the storage unit 12 in association with the event ID assigned to each event.

The captured image DB 12a shown in FIG. 3A includes an image ID column, a file name column, a shooting date and time column, and a shooting location column, and stores information about each captured image in association with its image ID. The image ID column stores the identification information (image ID) uniquely assigned to each captured image taken by the camera 20. The file name column stores the folder name and file name used to read the captured image stored in the storage unit 12. Captured images acquired from the camera 20 are stored in a predetermined area (image folder) of the storage unit 12. The shooting date and time column stores the date and time at which the captured image was taken, and the shooting location column stores information on the shooting location. The shooting location information may be the address of the shooting location, the name of the building at the shooting location, the name of the photographed event, information indicating a location within the event venue, or the like. The contents of the captured image DB 12a are not limited to the example shown in FIG. 3A; for example, information about the camera 20 and the photographer who took the image may also be stored.

The sorted image DB 12b shown in FIG. 3B includes an image ID column and a sorting result column, and stores, in association with the image ID of each captured image registered in the captured image DB 12a, the result (good or bad) of the server 10 sorting that captured image as a good image or a bad image using the learning model 12M. The determination result DB 12c shown in FIG. 3C includes an image ID column and an administrator determination result column. For each captured image whose sorting result stored in the sorted image DB 12b is "good", that is, each captured image sorted as a good image by the server 10, it stores, in association with the image ID, the result (good or bad) of the administrator sorting (judging) the captured image as a good image or a bad image. The sales image DB 12d shown in FIG. 3D includes an image ID column and a file name column. For each captured image whose administrator determination result stored in the determination result DB 12c is "good", that is, each captured image judged to be a good image by the administrator, it stores, in association with the image ID, the folder name and file name used to read the captured image from the storage unit 12. The contents of the sorted image DB 12b, the determination result DB 12c, and the sales image DB 12d are not limited to the examples shown in FIGS. 3B to 3D.
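The relationship between the four DBs can be illustrated with a small sketch. The field names, the use of plain dictionaries in place of database tables, and the sample records are assumptions made for this example; the source specifies only the columns and the filtering from one DB to the next.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List


@dataclass
class CapturedImageRecord:
    """One row of the captured image DB 12a (FIG. 3A); field names are hypothetical."""
    image_id: str
    file_name: str     # folder name and file name used to read the image
    shot_at: datetime  # shooting date and time
    location: str      # shooting location


def build_sales_db(
    captured_db: List[CapturedImageRecord],
    sorted_db: Dict[str, str],         # image_id -> "good"/"bad" from the model (FIG. 3B)
    determination_db: Dict[str, str],  # image_id -> "good"/"bad" from the administrator (FIG. 3C)
) -> Dict[str, str]:
    """Return image_id -> file_name for images judged good by model and administrator."""
    sales_db: Dict[str, str] = {}
    for rec in captured_db:
        if sorted_db.get(rec.image_id) != "good":
            continue  # only model-approved images reach the administrator
        if determination_db.get(rec.image_id) == "good":
            sales_db[rec.image_id] = rec.file_name
    return sales_db


captured = [
    CapturedImageRecord("P001", "event01/P001.jpg", datetime(2022, 7, 14, 13, 0), "Gate A"),
    CapturedImageRecord("P002", "event01/P002.jpg", datetime(2022, 7, 14, 13, 1), "Gate A"),
    CapturedImageRecord("P003", "event01/P003.jpg", datetime(2022, 7, 14, 13, 2), "Stage"),
]
sorted_results = {"P001": "good", "P002": "bad", "P003": "good"}
admin_results = {"P001": "good", "P003": "bad"}  # only model-approved images appear here
sales = build_sales_db(captured, sorted_results, admin_results)
```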

FIG. 4 is an explanatory diagram of the learning model 12M; FIG. 4A shows a configuration example of the learning model 12M, and FIG. 4B shows an example of a subject image that is input data for the learning model 12M. The learning model 12M shown in FIG. 4A takes as input a subject image (see the right side of FIG. 4B) obtained by removing the background region from a captured image such as that shown on the left side of FIG. 4B. It has been trained to perform, based on the input subject image, a computation that determines whether or not the captured image is suitable for sale, and to output the result. The subject extracted as the subject image can be a main subject, such as a subject that appears large in the captured image or a subject located near the center of the captured image. The learning model 12M may be configured using an algorithm such as a CNN (Convolutional Neural Network), an SVM (Support Vector Machine), or a Transformer, or using a combination of a plurality of algorithms.

The learning model 12M has an input layer into which a subject image is input, an intermediate layer that extracts feature values from the input subject image, and an output layer that outputs, based on the computation results of the intermediate layer, information regarding whether or not the captured image containing the subject image is suitable for sale. The input layer has input nodes into which the pixel value of each pixel in the subject image is input. The intermediate layer calculates output values from the pixel values received from the input layer using various functions, thresholds, and the like. The output layer (output unit) has two output nodes, one associated with images suitable for sale (that is, good images) and one with images not suitable for sale (that is, bad images). Output node 0 outputs the probability (confidence) that the captured image should be determined to be a good image, and output node 1 outputs the probability (confidence) that the captured image should be determined to be a bad image. The output value from each output node is, for example, a value from 0 to 1, and the probabilities output from the output nodes sum to 1.0 (100%). In this embodiment, the output value from output node 0 is used as a score indicating the degree to which the captured image is a good image, that is, its suitability as a sales target.

With the configuration described above, when a subject image is input, the learning model 12M outputs values (confidences) indicating whether the captured image before removal of the background region is a good image or a bad image as a sales target. The server 10 acquires the output value from output node 0 of the learning model 12M as a score indicating the degree to which the captured image is a good image for sale. The output layer of the learning model 12M may have only output node 0 instead of two output nodes.
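The two-node output described above can be sketched as follows. The source states only that the two outputs lie between 0 and 1 and sum to 1.0; normalizing with a softmax function, and using a sigmoid for the single-node variant, are common choices assumed here for illustration. The function names and sample values are hypothetical.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Normalize raw output-layer values so they lie in [0, 1] and sum to 1."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def suitability_score(logits: List[float]) -> float:
    """Score = output of node 0 (good image) in the two-node configuration."""
    return softmax(logits)[0]


def single_node_score(logit: float) -> float:
    """Single-node variant: a sigmoid maps one raw value to a score in [0, 1]."""
    return 1.0 / (1.0 + math.exp(-logit))


probs = softmax([2.0, 0.5])         # [P(good), P(bad)] for hypothetical raw outputs
score = suitability_score([2.0, 0.5])
```

Under these assumptions the two configurations are equivalent: the softmax output of node 0 for raw values (z0, z1) equals the sigmoid of z0 - z1, which is one reason a single output node is sufficient.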

The learning model 12M can be generated by machine learning using training data that includes subject images for training and, for each subject image, information (a correct label) indicating whether the captured image before removal of the background region is a good image or a bad image. The training data is generated, for example, by having the administrator judge whether or not a captured image is suitable for sale (whether or not it is a good image) and attaching a correct label indicating that judgment to the subject image obtained by removing the background region from the captured image.

The learning model 12M is trained so that, when a subject image included in the training data is input, the output value from the output node corresponding to the correct label (good image or bad image) included in the training data approaches 1 and the output value from the other output node approaches 0. In the learning process, the learning model 12M performs the computations of the intermediate layer and the output layer on the input subject image and calculates the output value of each output node. The learning model 12M then compares the calculated output value of each output node with the value corresponding to the correct label (1 for the output node corresponding to the correct label and 0 for the other output node), and optimizes the parameters used in the computations of the intermediate layer and the output layer so that the two become close. These parameters include the weights (coupling coefficients) between nodes in the intermediate layer and the output layer. The method of optimizing the parameters is not particularly limited; error backpropagation, steepest descent, or the like can be used. This yields a learning model 12M that, when a subject image is input, predicts whether the captured image before removal of the background region is a good image or a bad image and outputs the prediction result.
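The update rule described above can be sketched with a deliberately tiny model. This is an illustrative assumption, not the model of the embodiment: a single linear layer with two output nodes stands in for the intermediate and output layers, the two-element feature vectors stand in for subject images, and softmax with cross-entropy loss and a fixed learning rate are choices made for the example.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(weights: List[List[float]], bias: List[float], x: List[float]) -> List[float]:
    """Return [P(good), P(bad)] for feature vector x."""
    logits = [sum(w * xi for w, xi in zip(weights[k], x)) + bias[k] for k in range(2)]
    return softmax(logits)


def train_step(weights, bias, x, label, lr=0.1) -> float:
    """One steepest-descent update; label 0 = good, 1 = bad. Returns the loss before the update."""
    probs = predict(weights, bias, x)
    loss = -math.log(probs[label])        # cross-entropy against the correct label
    for k in range(2):
        target = 1.0 if k == label else 0.0
        grad = probs[k] - target          # gradient of the loss w.r.t. raw output k
        for i, xi in enumerate(x):
            weights[k][i] -= lr * grad * xi
        bias[k] -= lr * grad
    return loss


# Hypothetical training data: (features of a subject image, correct label)
data = [([1.0, 0.2], 0), ([0.1, 0.9], 1)]
weights = [[0.0, 0.0], [0.0, 0.0]]
bias = [0.0, 0.0]
losses = []
for _ in range(200):
    losses.append(sum(train_step(weights, bias, x, y) for x, y in data))
```

After training, the output node matching each correct label moves toward 1 and the other toward 0, which is the behavior the paragraph above describes.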

The server 10 prepares such a learning model 12M in advance and uses it when sorting images captured by the camera 20 into good images and bad images. The learning model 12M may instead be trained by a separate learning device. In that case, the trained learning model 12M is downloaded from the learning device to the server 10, for example via the network N or the portable storage medium 10a, and stored in the storage unit 12.

The process of generating the learning model 12M by learning from the training data described above is explained below. FIG. 5 is a flowchart showing an example of the procedure for generating the learning model 12M. The following processing is executed by the control unit 11 of the server 10 in accordance with the program 12P stored in the storage unit 12, but it may also be executed by another learning device. In this processing, the control unit 11 first generates training data from the captured images stored in the storage unit 12 and then trains the learning model 12M using the generated training data. It is assumed that the captured images used for the training data have already been judged good or bad by an administrator or the like, and that each judgment result has been attached to its captured image and stored in a predetermined area (a predetermined DB) of the storage unit 12.

The control unit 11 of the server 10 reads a captured image stored in the storage unit 12 together with the result of the good/bad judgment made on that image by the administrator or the like (information regarding the suitability of the captured image) (S11). The control unit 11 then performs background removal processing on the read captured image to remove its background region (S12). Any background removal method may be used: for example, processing that uses OpenCV's background subtraction class (BackgroundSubtractor), processing that uses the image background removal tool (remove.bg) provided by Kaleido, or processing that uses a deep learning model such as U2-Net. When a learning model is used, it is a model trained on pairs of captured images and the corresponding foreground images (subject images) with the background removed, so that it outputs a subject image when a captured image is input. Alternatively, when an image of the background alone (a background image) is available, the subject image may be generated by using the background image to remove the background region from the captured image. Through such background removal processing, the control unit 11 generates a subject image in which the background region has been removed from the captured image.
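
The background-image variant mentioned above can be sketched as a simple per-pixel difference. This is only an illustrative sketch, assuming grayscale images stored as nested lists; the function name `remove_background`, the `tolerance` value, and the `fill` value are hypothetical and do not come from the text.

```python
def remove_background(captured, background, tolerance=10, fill=0):
    """Return a subject image: pixels close to the background image are replaced by `fill`.

    `captured` and `background` are same-sized grayscale images (lists of rows).
    """
    subject = []
    for cap_row, bg_row in zip(captured, background):
        subject.append([
            pixel if abs(pixel - bg_pixel) > tolerance else fill
            for pixel, bg_pixel in zip(cap_row, bg_row)
        ])
    return subject


background = [[100, 100, 100],
              [100, 100, 100]]
captured = [[100, 220, 103],
            [98, 215, 100]]

# Only the middle column differs from the background by more than the tolerance.
subject = remove_background(captured, background)
```

A real system would more likely use one of the tools named in the text; the sketch only shows what "removing the background region using a background image" amounts to at the pixel level.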

The control unit 11 generates training data by attaching to the generated subject image a correct label corresponding to the good/bad judgment result read in step S11, and stores the training data in the storage unit 12 (S13). Specifically, the control unit 11 attaches a "good" correct label to the subject image when the judgment result is good, and a "bad" correct label when the judgment result is bad. The control unit 11 stores the generated training data, for example, in a training DB (not shown) prepared in the storage unit 12.

The control unit 11 determines whether the captured images stored in the storage unit 12 include any unprocessed captured image that has not yet been used in the training data generation process (S14). If it determines that an unprocessed captured image exists (S14: YES), the control unit 11 returns to step S11 and performs steps S11 to S13 on that image. The control unit 11 repeats steps S11 to S14 until it determines that no unprocessed captured image remains. In this way, the training data used to train the learning model 12M is generated from the captured images stored in the storage unit 12 and their judgment results, and is accumulated in the training DB. Although the captured images used to generate the training data are stored in the storage unit 12 in this example, the control unit 11 of the server 10 may instead acquire each captured image and its judgment result from another device, for example via the network N.
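
Steps S11 to S14 can be sketched as a loop over the stored images. This is a minimal sketch, assuming an in-memory dictionary in place of the storage unit 12; `TrainingExample`, `build_training_db`, and the placeholder `strip_border` (which stands in for real background removal) are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class TrainingExample:
    image_id: str
    subject_image: list  # captured image with the background region removed
    label: str           # correct label: "good" or "bad"


def build_training_db(captured_db, remove_background):
    """Turn (captured image, judgment) records into labeled subject images."""
    training_db = []
    # S11: read each image and its judgment; the loop ends when none is left (S14).
    for image_id, (captured, judgment) in captured_db.items():
        subject = remove_background(captured)  # S12
        training_db.append(TrainingExample(image_id, subject, judgment))  # S13
    return training_db


def strip_border(image):
    """Placeholder for background removal: drops a one-pixel border."""
    return [row[1:-1] for row in image[1:-1]]


captured_db = {
    "img001": ([[0, 0, 0], [0, 9, 0], [0, 0, 0]], "good"),
    "img002": ([[0, 0, 0], [0, 3, 0], [0, 0, 0]], "bad"),
}
training_db = build_training_db(captured_db, strip_border)
```

Note that the label is judged on the captured image but stored with the subject image, which matches the pairing described for the training data.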

When the control unit 11 determines that no unprocessed captured image remains (S14: NO), it trains the learning model 12M using the training data accumulated in the training DB as described above. The control unit 11 reads one item of training data from the training DB (S15) and performs the learning process of the learning model 12M based on it (S16). Here, the control unit 11 inputs the subject image included in the training data into the learning model 12M and acquires the output values that the learning model 12M produces in response. The control unit 11 compares the output value of each output node with the value determined by the correct label included in the training data (1 for the output node corresponding to the correct label, 0 for the other output node) and trains the learning model 12M so that the two approach each other. In the learning process, the learning model 12M optimizes the parameters used in the intermediate-layer and output-layer calculations. For example, the control unit 11 optimizes parameters such as the weights (coupling coefficients) between nodes in the intermediate layer and the output layer using error backpropagation, which updates the parameters sequentially from the output layer of the learning model 12M toward the input layer.

The control unit 11 determines whether the training data stored in the training DB includes any unprocessed training data on which the learning process has not yet been performed (S17). If it determines that unprocessed training data exists (S17: YES), the control unit 11 returns to step S15 and performs steps S15 to S16 on that training data. If it determines that no unprocessed training data remains (S17: NO), the control unit 11 ends the series of processes.
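
The loop of steps S15 to S17 can be sketched with a deliberately tiny stand-in model. The sketch assumes a two-output linear model with softmax trained by per-sample gradient descent on hand-made feature vectors; the actual learning model 12M operates on images and has intermediate layers, and `train`, `predict`, the learning rate, and the epoch count are all hypothetical.

```python
import math


def softmax(logits):
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(weights, features):
    """Two output values: node 0 = good, node 1 = bad (assumed order)."""
    logits = [sum(w * x for w, x in zip(row, features)) for row in weights]
    return softmax(logits)


def train(training_db, n_features, learning_rate=0.5, epochs=200):
    weights = [[0.0] * n_features for _ in range(2)]
    for _ in range(epochs):  # repeating the whole pass refines the model further
        for features, label in training_db:  # S15: read one item; ends when none left (S17)
            target = [1.0, 0.0] if label == "good" else [0.0, 1.0]
            outputs = predict(weights, features)  # S16: forward calculation
            for node in range(2):
                error = outputs[node] - target[node]  # gap between output and target
                for i in range(n_features):
                    weights[node][i] -= learning_rate * error * features[i]
    return weights


# Each item: ([bias term, toy "sharpness" feature], correct label).
training_db = [([1.0, 0.9], "good"), ([1.0, 0.8], "good"),
               ([1.0, 0.2], "bad"), ([1.0, 0.1], "bad")]
weights = train(training_db, n_features=2)
good_score = predict(weights, [1.0, 0.85])[0]
bad_score = predict(weights, [1.0, 0.15])[0]
```

After training, the node 0 output is high for the sharp example and low for the blurry one, which is the behavior the learning process aims for.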

The learning process described above generates a learning model 12M that, when a subject image is input, outputs one value indicating the likelihood that the captured image before background removal is a good image and another value indicating the likelihood that it is a bad image. The server 10 can therefore obtain, from the output values of the learning model 12M, information on whether a captured image is a good image or a bad image (information regarding suitability). In the above processing, the training data generation in steps S11 to S14 and the generation of the learning model 12M in steps S15 to S17 may be performed by separate devices. The learning model 12M can be optimized further by repeating the learning process with training data as described above. An already trained learning model 12M can also be retrained by the same learning process to generate a learning model 12M with further improved discrimination accuracy.

The process by which the image processing system of this embodiment sorts images captured with the camera 20 into images to be sold and images not to be sold is described below. FIG. 6 is a flowchart showing an example of the procedure for sorting captured images, and FIG. 7 is an explanatory diagram showing an example of a screen of the administrator terminal 30. In FIG. 6, the processing performed by the camera 20 is shown on the left, the processing performed by the server 10 in the center, and the processing performed by the administrator terminal 30 on the right.

In the image processing system of this embodiment, a photographer uses the camera 20 to photograph performers, participants, and others at an event and thereby obtains captured images. The camera 20 may also be configured to shoot automatically at preset timings. The camera 20 captures an image with its imaging unit (S21) and transmits the captured image to the server 10 through its communication unit (sending unit) (S22). The camera 20 may transmit each captured image to the server 10 as soon as it is taken, or it may transmit multiple captured images together. The following describes a configuration in which steps S22 to S35 are performed each time one image is captured; however, steps S22 to S35 may instead be performed after multiple images have been captured, in which case each step is performed on each of the captured images.

The control unit 11 (acquisition unit) of the server 10 acquires the captured image transmitted from the camera 20 and stores it in the storage unit 12 (S23). At this time, the control unit 11 stores the captured image in a predetermined area (image folder) of the storage unit 12 and stores information about the captured image in the captured image DB 12a. The shooting date and time and the shooting location may be acquired from the camera 20 together with the captured image or may be registered in advance. Next, the control unit 11 (removal unit) performs background removal processing on the acquired captured image (S24) and generates a subject image in which the background region has been removed. The background removal processing can be the same as in step S12 in FIG. 5. Based on the generated subject image, the control unit 11 calculates a score indicating the degree to which the captured image before background removal is a good image (S25). Specifically, the control unit 11 inputs the subject image into the learning model 12M and acquires the output value from output node 0 as the score for the captured image. The control unit 11 (sorting unit) then sorts the captured image as a good image or a bad image based on the acquired score (S26). For example, the control unit 11 sorts the captured image as a good image when the score is equal to or greater than a predetermined threshold (for example, 0.7) and as a bad image when the score is less than the threshold. The threshold is set in advance and stored in the storage unit 12, and it can be changed by an operation via the input unit 14. For example, when the control unit 11 receives an instruction to change the threshold via the input unit 14, it updates the threshold stored in the storage unit 12 to the instructed value.
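
The threshold comparison in step S26 can be written as a short sketch. The function name `sort_by_score` and the sample scores are hypothetical; the 0.7 default is the example threshold given in the text.

```python
def sort_by_score(score, threshold=0.7):
    """S26: good when the score is at or above the threshold, otherwise bad."""
    return "good" if score >= threshold else "bad"


# Hypothetical scores obtained in S25 for three captured images.
scores = {"img001": 0.91, "img002": 0.70, "img003": 0.42}
sorted_db = {image_id: sort_by_score(score) for image_id, score in scores.items()}

# Changing the threshold changes the result for the borderline image.
stricter = sort_by_score(scores["img002"], threshold=0.8)
```

Because the comparison is "equal to or greater than", an image scoring exactly 0.7 is sorted as good under the default threshold.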

The control unit 11 stores the sorting result (good or bad) in the sorted image DB 12b in association with the image ID of the captured image (S27). Next, based on the contents of the sorted image DB 12b, the control unit 11 reads the captured images sorted as good images from the storage unit 12 (S28) and transmits them to the administrator terminal 30 (S29). The control unit 31 of the administrator terminal 30 displays the captured images received from the server 10, that is, those sorted as good images, on the display unit 35 (S30). For example, the control unit 31 displays a screen such as that shown in FIG. 7 on the display unit 35 and presents the captured image acquired from the server 10 to the administrator. The server 10 may transmit the subject image obtained by removing the background region from the captured image to the administrator terminal 30, either together with or instead of the captured image, and the administrator terminal 30 may likewise display the subject image together with or instead of the captured image. The screen shown in FIG. 7 is configured to accept a judgment on whether the displayed captured image is a good image or a bad image, and it provides a "good" button, a "bad" button, and a "hold" button. On this screen, the administrator judges whether the captured image is good or bad by operating the "good" button or the "bad" button via the input unit 34. If the administrator cannot decide whether the captured image is good or bad, the administrator operates the "hold" button to indicate that no judgment can be made. When one of the buttons is operated, the control unit 31 accepts the administrator's judgment result (S31) and transmits it to the server 10 (S32).

The control unit 11 of the server 10 receives the administrator's judgment result transmitted from the administrator terminal 30 and stores it (good or bad) in the judgment result DB 12c in association with the image ID of the captured image (S33). Based on the contents of the judgment result DB 12c, the control unit 11 identifies the captured images that the administrator judged to be good images (S34) and stores the image ID and file name of each identified captured image in association with each other in the sales image DB 12d (S35). The captured images registered in the sales image DB 12d are sold by the server 10 over the network N through the photo vending machine 40 or the user terminal 50. The server 10 transmits a thumbnail list of the captured images for sale to the photo vending machine 40 or the user terminal 50, receives from it the purchase requests made through the thumbnail list, and outputs the requested captured images to the photo vending machine 40 or the user terminal 50. The photo vending machine 40 provides a requested captured image to the purchaser by printing it with its printing unit. Since the sales processing for captured images is conventional, its description is omitted here.
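
Steps S34 and S35 amount to filtering the judgment results and recording the selected images. The following is a minimal sketch, assuming the databases are plain dictionaries; `build_sale_db` and the sample IDs and file names are hypothetical.

```python
def build_sale_db(judgment_db, file_names):
    """S34 to S35: keep only images the administrator judged good, mapped to file names."""
    return {
        image_id: file_names[image_id]
        for image_id, verdict in judgment_db.items()
        if verdict == "good"
    }


judgment_db = {"img001": "good", "img002": "bad", "img004": "good"}
file_names = {"img001": "img001.jpg", "img002": "img002.jpg", "img004": "img004.jpg"}
sale_db = build_sale_db(judgment_db, file_names)
```

Only the image ID and file name are recorded, as in the text; the image files themselves stay in the image folder of the storage unit.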

The administrator's judgment results for captured images, stored in the judgment result DB 12c by the above processing, can be used to retrain the learning model 12M. Specifically, the control unit 11 of the server 10 executes the processing shown in FIG. 5 on the captured images recorded in the judgment result DB 12c, thereby generating training data from the contents of the judgment result DB 12c and performing the learning process of the learning model 12M. This further improves the discrimination accuracy of the learning model 12M. In this embodiment, the learning model 12M is trained using, as training data, subject images from which the background region has been removed and the good/bad judgment results for the corresponding captured images, but the configuration is not limited to this. For example, the learning model 12M may be generated by taking a learning model trained on training data consisting of captured images and their good/bad judgment results and fine-tuning it (transfer learning) with training data consisting of subject images and the good/bad judgment results for the corresponding captured images.

Through the above processing, the image processing system of this embodiment uses the learning model 12M to calculate a score indicating the degree to which an image captured by the camera 20 is suitable for sale, and sorts the captured image as a good image or a bad image based on that score. This reduces the manual image sorting work and the associated workload. When images are sorted manually, the results may vary with the sensibility of the person doing the sorting; in this embodiment, sorting is based on the score calculated by the learning model 12M, so objective sorting results can be obtained. In addition, because the learning model 12M calculates the score for a captured image from the subject image with the background region removed, it is not influenced by the background region, which improves discrimination accuracy. When a model is trained on captured images as they are, bias in the training images can reduce learning accuracy. For example, if a learning model is trained on data in which captured images judged good contain a large amount of a certain first color and captured images judged bad contain a large amount of a certain second color, an image containing a large amount of the first color may be judged good even if it is out of focus. In this embodiment, the quality of an image is judged from the subject image with the background region removed, so such a decrease in discrimination accuracy is suppressed.

In this embodiment, the threshold used to sort captured images into good images and bad images can be changed. By changing the threshold according to the number of cameras 20, the number of images captured by the cameras 20, the number of administrators performing the sorting, the time available for sorting, and so on, the number of images that administrators must sort can be adjusted. Also, in this embodiment, the server 10 sorts the captured images using the learning model 12M, and the administrator then sorts only the captured images that were sorted as good images. Captured images whose scores from the learning model 12M are below the threshold are therefore excluded from sale without being checked by the administrator. This reduces the number of images the administrator must sort and lightens the sorting workload. The captured images sorted as good images using the learning model 12M may also be put on sale as they are, without sorting by the administrator, which reduces the administrator's workload further. For example, when the threshold is set to a high value (for example, 0.8 or 0.9), captured images sorted as good images based on the scores output by the learning model 12M are likely to be judged good by the administrator as well, so the system may be configured to skip sorting by the administrator in that case.
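
The effect of the threshold on the administrator's workload can be seen in a small sketch. The scores below are made-up sample values and `images_for_review` is a hypothetical helper; only images at or above the threshold are passed on for review.

```python
def images_for_review(scores, threshold):
    """Images sorted as good, which are the only ones the administrator has to check."""
    return sorted(image_id for image_id, score in scores.items() if score >= threshold)


scores = {"img001": 0.95, "img002": 0.82, "img003": 0.71,
          "img004": 0.55, "img005": 0.30}

# Number of images sent to the administrator for each candidate threshold.
workload = {t: len(images_for_review(scores, t)) for t in (0.5, 0.7, 0.9)}
```

Raising the threshold from 0.5 to 0.9 shrinks the review queue in this sample from four images to one, at the cost of excluding more images without human review.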

In this embodiment, the sorting of captured images into those to be sold and those not to be sold is based on the score for the subject image obtained by removing the background region from the captured image. Alternatively, the sorting may take into account the score for the captured image before background removal in addition to the score for the subject image. For example, the control unit 11 determines whether the score for the subject image obtained with the learning model 12M is less than a predetermined value (for example, 0.6) and, if so, uses a learning model to obtain a score for the captured image before the background region was removed. The control unit 11 then determines whether the score for the captured image is equal to or greater than a predetermined value (for example, 0.8) and, if so, may treat the captured image as an image to be sold. The learning model that calculates the score for the captured image may be the same model as the learning model 12M that calculates the score for the subject image, or it may be a different model.
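
The two-stage variation can be sketched as follows. The text does not state how a subject image score at or above 0.6 is handled in this variation, so the sketch assumes, purely for illustration, that such an image passes directly; `is_for_sale` and `score_captured_image` are hypothetical names.

```python
def is_for_sale(subject_score, score_captured_image,
                subject_threshold=0.6, captured_threshold=0.8):
    """Fall back to the captured image score only when the subject image score is low.

    `score_captured_image` is a zero-argument callable so that the second
    model is run only when it is actually needed.
    """
    if subject_score >= subject_threshold:
        return True  # assumed: a sufficient subject score passes on its own
    return score_captured_image() >= captured_threshold


# Low subject score rescued by a high score for the image including its background.
rescued = is_for_sale(0.5, lambda: 0.85)
# Low subject score and an insufficient captured image score.
rejected = is_for_sale(0.5, lambda: 0.7)
```

Passing the second score as a callable reflects the order in the text, where the captured image score is obtained only after the subject image score is found to be below the predetermined value.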

In this embodiment, the administrator terminal 30 may be configured to perform locally one or more of the following: the training data generation process, the learning process of the learning model 12M using the training data, and the processing performed by the server 10 in FIG. 6. For example, the administrator terminal 30 may execute the processing shown in FIG. 5 to generate training data, generate the learning model 12M using that training data, and store the model in the storage unit 32. The administrator terminal 30 can then execute the processing performed by the server 10 in FIG. 6. Even with such a configuration, the same processing as in this embodiment is possible and the same effects are obtained.

(Embodiment 2)
An image processing system that generates still images from a moving image captured by the camera 20 and sorts the generated still images into those to be sold and those not to be sold will be described. Since the image processing system of this embodiment is realized using the same devices as the image processing system of Embodiment 1 shown in FIGS. 1 and 2, description of the configuration of each device is omitted.

FIG. 8 is a flowchart showing an example of the sorting procedure of Embodiment 2. The processing shown in FIG. 8 is the processing shown in FIG. 6 with steps S41 to S42 added before step S23. Description of the steps that are the same as in FIG. 6 is omitted.

In the image processing system of this embodiment, the camera 20 executes the same processing as steps S21 to S22 in FIG. 6. The image captured by the camera 20 may be a still image or a moving image. When the control unit 11 of the server 10 acquires a captured image transmitted from the camera 20, it determines whether the acquired captured image is a moving image (S41). If it determines that the captured image is not a moving image (S41: NO), that is, if it is a still image, the control unit 11 proceeds to step S23 and executes the same processing as in Embodiment 1. If it determines that the captured image is a moving image (S41: YES), the control unit 11 generates still images from the moving image (S42). For example, if the captured image is a moving image containing 30 frames per second, the control unit 11 generates a still image from each frame, yielding 30 still images for every second of video. The control unit 11 does not need to generate still images from all frames; for example, it may extract a frame at predetermined intervals (every 0.1 seconds, every 0.5 seconds, and so on) and generate still images from those frames.
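
The frame selection in step S42 can be sketched as an index calculation. This is a minimal sketch that only computes which frame indices to extract; decoding the video itself is outside its scope, and `frame_indices` is a hypothetical helper.

```python
def frame_indices(total_frames, fps, interval_seconds=None):
    """Indices of the frames to turn into still images.

    With no interval, every frame is used; otherwise one frame is taken
    per interval (for example every 0.1 or 0.5 seconds).
    """
    if interval_seconds is None:
        return list(range(total_frames))
    step = max(1, round(fps * interval_seconds))
    return list(range(0, total_frames, step))


# A two-second clip at 30 frames per second has 60 frames.
every_frame = frame_indices(60, fps=30)
every_half_second = frame_indices(60, fps=30, interval_seconds=0.5)
every_tenth_second = frame_indices(60, fps=30, interval_seconds=0.1)
```

For this two-second clip, a 0.5 second interval selects four frames and a 0.1 second interval selects twenty, compared with sixty when every frame is used.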

The control unit 11 stores the still images generated from the moving image in the storage unit 12 as captured images (S23) and executes the processing from step S24 onward. Since multiple captured images (still images) are generated from one moving image, each process in steps S23 to S35 is performed on each of those captured images. In this embodiment as well, images captured by the camera 20 can therefore be sorted according to whether they should be sold, and when the captured image is a moving image, the still images generated from it can be sorted in the same way.

In this embodiment, the still images generated from a moving image are offered for sale, but the moving image itself may be offered for sale instead. For example, the control unit 11 may be configured to generate a plurality of still images from a moving image, input each generated still image to the learning model 12M to obtain a score for each still image, and offer the moving image for sale when the average of the scores of the still images is equal to or greater than a predetermined value (for example, 0.8). Alternatively, whether each moving image is to be offered for sale may be determined using a learning model trained to output, when the frames of a moving image are input, information indicating whether the moving image is suitable for sale (a score indicating the degree to which the moving image is suitable for sale). In this case, the control unit 11 may be configured to input the moving image to that learning model, acquire a score for the moving image from the learning model, and offer the moving image for sale when the acquired score is equal to or greater than a predetermined value (for example, 0.8).
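The first variant above (averaging the per-frame scores and comparing the average with a predetermined value) might look like the following sketch. The function name `video_is_for_sale` is hypothetical, the scores are assumed to have already been obtained from the learning model 12M, and treating a video with no scored frames as not for sale is an assumption of this sketch rather than something the embodiment specifies.

```python
from statistics import mean
from typing import Sequence


def video_is_for_sale(frame_scores: Sequence[float], threshold: float = 0.8) -> bool:
    """Decide whether a video is offered for sale from its per-frame scores.

    `frame_scores` holds one score per still image generated from the video.
    The default threshold 0.8 is the example value given in the text.
    """
    if not frame_scores:
        # Assumption: a video with no scored frames is not offered for sale
        return False
    return mean(frame_scores) >= threshold
```

Averaging lets a few weak frames be outweighed by strong ones; a stricter policy (for example, requiring every frame to pass) would be a different design choice not described here.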

This embodiment provides the same effects as Embodiment 1 described above. In addition, in this embodiment, still images generated from a moving image captured by the camera 20 can be offered for sale. Therefore, by shooting a moving image with the camera 20, still images generated from the frames of the moving image can be offered for sale, so a large number of items for sale can be collected. The modifications described as appropriate in Embodiment 1 are also applicable to this embodiment.

Regarding the embodiments including Embodiments 1 and 2 above, the following supplementary notes are further disclosed.

(Supplementary Note 1)
A program that causes a computer to execute processing of:
acquiring a captured image of a subject;
removing a background region from the acquired captured image;
inputting a subject image, obtained by removing the background region from the acquired captured image, to a learning model trained to output information regarding the suitability of a captured image when a subject image from which the background region of the captured image has been removed is input, and outputting the information regarding the suitability of the captured image from the learning model; and
sorting the captured image as suitable or unsuitable based on the information regarding the suitability of the captured image.
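The processing listed above can be sketched as a small pipeline. This is a minimal illustration under stated assumptions, not the disclosed implementation: images are plain nested lists of RGB tuples, the subject mask is assumed to be supplied by some separate segmentation step, the learning model is represented by an arbitrary callable returning a score, and the names `remove_background` and `sort_images` are hypothetical.

```python
from typing import Callable, Sequence

Pixel = tuple[int, int, int]
Image = list[list[Pixel]]
Mask = list[list[bool]]  # True where the pixel belongs to the subject


def remove_background(image: Image, subject_mask: Mask,
                      fill: Pixel = (0, 0, 0)) -> Image:
    """Keep subject pixels and replace background pixels with `fill`."""
    return [
        [px if keep else fill for px, keep in zip(row, mask_row)]
        for row, mask_row in zip(image, subject_mask)
    ]


def sort_images(images: Sequence[Image], masks: Sequence[Mask],
                model: Callable[[Image], float],
                threshold: float) -> tuple[list[int], list[int]]:
    """Return (indices sorted as suitable, indices sorted as unsuitable).

    `model` stands in for the trained learning model (assumption): it takes
    a background-removed subject image and returns a suitability score.
    """
    suitable: list[int] = []
    unsuitable: list[int] = []
    for i, (image, mask) in enumerate(zip(images, masks)):
        subject_image = remove_background(image, mask)
        score = model(subject_image)
        (suitable if score >= threshold else unsuitable).append(i)
    return suitable, unsuitable
```

Because the model only ever sees the background-removed subject image, the score cannot depend on background content, which is the point of the removal step.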

(Supplementary Note 2)
The program according to Supplementary Note 1, which causes the computer to execute processing of:
outputting the captured image that has been sorted as suitable;
accepting a judgment of suitability for the output captured image; and
acquiring training data including the captured image and the accepted information regarding suitability.
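The feedback step above, in which judgments accepted for output images become training data, might be represented as follows. The names `TrainingExample` and `collect_training_data`, the use of string image identifiers, and the boolean label are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Mapping, Sequence


@dataclass(frozen=True)
class TrainingExample:
    image_id: str   # hypothetical identifier of a captured image
    suitable: bool  # judgment accepted from the reviewer


def collect_training_data(output_image_ids: Sequence[str],
                          judgments: Mapping[str, bool]) -> list[TrainingExample]:
    """Pair each output image with the judgment accepted for it.

    Images for which no judgment was given are skipped (assumption).
    """
    return [
        TrainingExample(image_id, judgments[image_id])
        for image_id in output_image_ids
        if image_id in judgments
    ]
```

The resulting examples could then be used for retraining, with the background-removed subject image looked up by its identifier.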

(Supplementary Note 3)
The program according to Supplementary Note 1 or 2, wherein
the information regarding the suitability of the captured image is a score indicating the degree of suitability of the captured image, and
the program causes the computer to execute processing of:
accepting a threshold for sorting the captured image as suitable or unsuitable; and
sorting the acquired captured image as suitable or unsuitable based on the accepted threshold.

(Supplementary Note 4)
The program according to any one of Supplementary Notes 1 to 3, which causes the computer to execute processing of:
acquiring a moving image of a subject;
removing a background region from a frame included in the acquired moving image; and
inputting a subject image, obtained by removing the background region from the frame, to the learning model, and outputting information regarding the suitability of the frame from the learning model.

(Supplementary Note 5)
An image processing method in which a computer executes processing of:
acquiring a captured image of a subject;
removing a background region from the acquired captured image;
inputting a subject image, obtained by removing the background region from the acquired captured image, to a learning model trained to output information regarding the suitability of a captured image when a subject image from which the background region of the captured image has been removed is input, and outputting the information regarding the suitability of the captured image from the learning model; and
sorting the captured image as suitable or unsuitable based on the information regarding the suitability of the captured image.

(Supplementary Note 6)
An image processing device comprising:
an acquisition unit that acquires a captured image of a subject;
a removal unit that removes a background region from the acquired captured image;
an output unit that inputs a subject image, obtained by removing the background region from the acquired captured image, to a learning model trained to output information regarding the suitability of a captured image when a subject image from which the background region of the captured image has been removed is input, and outputs the information regarding the suitability of the captured image from the learning model; and
a sorting unit that sorts the captured image as suitable or unsuitable based on the information regarding the suitability of the captured image.

(Supplementary Note 7)
A model generation method in which a computer executes processing of:
acquiring training data including a subject image, obtained by removing a background region from a captured image of a subject, and information regarding the suitability of the captured image; and
generating, using the acquired training data, a learning model that outputs information regarding the suitability of the captured image when the subject image is input.

(Supplementary Note 8)
The model generation method according to Supplementary Note 7, in which the computer executes processing of:
acquiring a captured image of a subject and information regarding the suitability of the captured image;
removing a background region from the acquired captured image; and
acquiring training data including a subject image, obtained by removing the background region from the captured image, and the acquired information regarding the suitability of the captured image.

(Supplementary Note 9)
An image processing system including a photographing device and an image processing device, wherein
the photographing device comprises
a sending unit that sends a captured image of a subject to the image processing device, and
the image processing device comprises:
an acquisition unit that acquires the captured image from the photographing device;
a removal unit that removes a background region from the acquired captured image;
an output unit that inputs a subject image, obtained by removing the background region from the acquired captured image, to a learning model trained to output information regarding the suitability of a captured image when a subject image from which the background region of the captured image has been removed is input, and outputs the information regarding the suitability of the captured image from the learning model; and
a sorting unit that sorts the captured image as suitable or unsuitable based on the information regarding the suitability of the captured image.

The embodiments disclosed herein should be considered illustrative in all respects and not restrictive. The scope of the present invention is indicated by the claims rather than by the description above, and is intended to include all modifications within the meaning and scope equivalent to the claims.

10 Server
11 Control unit
12 Storage unit
13 Communication unit
14 Input unit
15 Display unit
20 Camera
30 Administrator terminal
31 Control unit
32 Storage unit
33 Communication unit
40 Photo vending machine
50 User terminal
12M Learning model

Claims

1. A program that causes a computer to execute processing of:
acquiring a captured image of a subject;
removing a background region from the acquired captured image;
inputting a subject image, obtained by removing the background region from the acquired captured image, to a learning model trained to output information regarding the suitability of a captured image when a subject image from which the background region of the captured image has been removed is input, and outputting the information regarding the suitability of the captured image from the learning model; and
sorting the captured image as suitable or unsuitable based on the information regarding the suitability of the captured image.

2. The program according to claim 1, which causes the computer to execute processing of:
outputting the captured image that has been sorted as suitable;
accepting a judgment of suitability for the output captured image; and
acquiring training data including the captured image and the accepted information regarding suitability.

3. The program according to claim 1 or 2, wherein
the information regarding the suitability of the captured image is a score indicating the degree of suitability of the captured image, and
the program causes the computer to execute processing of:
accepting a threshold for sorting the captured image as suitable or unsuitable; and
sorting the acquired captured image as suitable or unsuitable based on the accepted threshold.

4. The program according to claim 1 or 2, which causes the computer to execute processing of:
acquiring a moving image of a subject;
removing a background region from a frame included in the acquired moving image; and
inputting a subject image, obtained by removing the background region from the frame, to the learning model, and outputting information regarding the suitability of the frame from the learning model.

5. An image processing method in which a computer executes processing of:
acquiring a captured image of a subject;
removing a background region from the acquired captured image;
inputting a subject image, obtained by removing the background region from the acquired captured image, to a learning model trained to output information regarding the suitability of a captured image when a subject image from which the background region of the captured image has been removed is input, and outputting the information regarding the suitability of the captured image from the learning model; and
sorting the captured image as suitable or unsuitable based on the information regarding the suitability of the captured image.

6. An image processing device comprising:
an acquisition unit that acquires a captured image of a subject;
a removal unit that removes a background region from the acquired captured image;
an output unit that inputs a subject image, obtained by removing the background region from the acquired captured image, to a learning model trained to output information regarding the suitability of a captured image when a subject image from which the background region of the captured image has been removed is input, and outputs the information regarding the suitability of the captured image from the learning model; and
a sorting unit that sorts the captured image as suitable or unsuitable based on the information regarding the suitability of the captured image.

7. A model generation method in which a computer executes processing of:
acquiring training data including a subject image, obtained by removing a background region from a captured image of a subject, and information regarding the suitability of the captured image; and
generating, using the acquired training data, a learning model that outputs information regarding the suitability of the captured image when the subject image is input.

8. The model generation method according to claim 7, in which the computer executes processing of:
acquiring a captured image of a subject and information regarding the suitability of the captured image;
removing a background region from the acquired captured image; and
acquiring training data including a subject image, obtained by removing the background region from the captured image, and the acquired information regarding the suitability of the captured image.

9. An image processing system including a photographing device and an image processing device, wherein
the photographing device comprises
a sending unit that sends a captured image of a subject to the image processing device, and
the image processing device comprises:
an acquisition unit that acquires the captured image from the photographing device;
a removal unit that removes a background region from the acquired captured image;
an output unit that inputs a subject image, obtained by removing the background region from the acquired captured image, to a learning model trained to output information regarding the suitability of a captured image when a subject image from which the background region of the captured image has been removed is input, and outputs the information regarding the suitability of the captured image from the learning model; and
a sorting unit that sorts the captured image as suitable or unsuitable based on the information regarding the suitability of the captured image.