JP6927540B1

JP6927540B1 - Information processing equipment, information processing system, information processing method and program

Info

Publication number: JP6927540B1
Application number: JP2020149204A
Authority: JP
Inventors: 啓成島; 兼太郎山口
Original assignee: Celsys Inc
Current assignee: Celsys Inc
Priority date: 2020-09-04
Filing date: 2020-09-04
Publication date: 2021-09-01
Anticipated expiration: 2040-09-04
Also published as: JP2022043749A

Abstract

PROBLEM TO BE SOLVED: To provide an information processing device, an information processing system, an information processing method, and a program that reduce the possibility that a region different from a desired region is erroneously detected from a target image. SOLUTION: An information processing device 1 includes: a limited region extraction unit that executes estimation of key point positions in a target image and/or region classification by semantic segmentation, and uses the estimated key point positions and/or the result of the region classification to extract, from the target image, a limited region including a desired region; and a desired region output unit that executes a process of recognizing the desired region on the extracted limited region, and extracts the desired region from the limited region and outputs it. [Selected drawing] FIG. 1

Description

本発明は、情報処理装置、情報処理システム、情報処理方法及びプログラムに関する。 The present invention relates to an information processing device, an information processing system, an information processing method and a program.

Conventionally, image recognition techniques have been used to extract a specific region from an image. For example, it has been proposed to perform, on a human body, human body detection, face detection, face recognition, face direction detection, facial organ detection, recognition of age, gender, and facial expression, and recognition of human body parts such as the shoulders, feet, and height (human body feature recognition), and to perform, on an object, analysis of its size and shape and detection of object categories such as a chair or an automobile (see, for example, Patent Document 1).

On the other hand, Non-Patent Document 1 discloses a technique for estimating key points from a photograph of a person and thereby estimating the person's pose. Here, key points are joint points (shoulders, elbows, wrists, hips, knees, ankles, etc.) and feature points (eyes, nose, mouth, ears, etc.).
Further, Non-Patent Document 2 discloses estimating individual regions in a photograph, such as a person region, a bicycle region, and an animal region, by semantic segmentation.

Patent Document 1: Japanese Unexamined Patent Publication No. 2015-61239

Non-Patent Document 1: Zhe Cao, Tomas Simon, Shih-En Wei, Yaser Sheikh, "Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields", https://arxiv.org/abs/1611.08050
Non-Patent Document 2: Evan Shelhamer, Jonathan Long, Trevor Darrell, "Fully Convolutional Networks for Semantic Segmentation", https://arxiv.org/abs/1605.06211

One conventional method detects a face from a photographic image of a person. However, if the target image shows not only a face but also an object that resembles a face without being one (for example, an electrical outlet), a region that is not a face may be erroneously detected and output. In this way, a region different from the desired region that the user wants (for example, a person's face or a dog's face) may be erroneously detected from the target image and output.

The present invention has been made in view of the above problem, and an object thereof is to provide an information processing device, an information processing system, an information processing method, and a program that make it possible to reduce the possibility that a region different from the desired region is erroneously detected from a target image.

An information processing device according to a first aspect of the present invention includes: a limited region extraction unit that executes estimation of key point positions in a target image and/or region classification by semantic segmentation, and uses the estimated key point positions and/or the result of the region classification to extract, from the target image, a limited region including a desired region; and a desired region output unit that executes a process of recognizing the desired region on the extracted limited region, and extracts the desired region from the limited region and outputs it.

この構成によれば、キーポイントの推定、及び／又はセマンティックセグメンテーションによる対象画像内の画像領域の推定を用いることにより、所望領域を高精度に検出することができる。このため、対象画像から、所望領域とは異なる領域が誤って検出される可能性を低減することができる。 According to this configuration, the desired region can be detected with high accuracy by using the estimation of the key point and / or the estimation of the image region in the target image by semantic segmentation. Therefore, it is possible to reduce the possibility that a region different from the desired region is erroneously detected from the target image.

An information processing device according to a second aspect of the present invention is the information processing device according to the first aspect, wherein the limited region extraction unit divides the target image into a plurality of regions by the semantic segmentation, classifies each region into the category that the region represents, and extracts, as the limited region, a region that substantially includes a region classified into the category corresponding to the type of the desired region, and the desired region output unit executes the process of recognizing the desired region in the selected limited region and outputs at least one desired region.

この構成によれば、複数のカテゴリの被写体が写った画像であっても、ユーザが所望する所望領域の種類に対応するカテゴリの画像領域だけを抽出することができる。 According to this configuration, even if the image shows a plurality of categories of subjects, only the image area of the category corresponding to the type of desired area desired by the user can be extracted.

An information processing device according to a third aspect of the present invention is the information processing device according to the first aspect, wherein the limited region extraction unit extracts the limited region so that it includes one or more of the estimated key points.

According to this configuration, the limited region can be created reliably.

An information processing device according to a fourth aspect of the present invention is the information processing device according to the first aspect, wherein the limited region extraction unit divides the target image into a plurality of regions by the semantic segmentation, classifies each region into the category that the region represents, estimates key point positions in a region that substantially includes a region classified into the category corresponding to the type of the desired region, and extracts the limited region so that it includes one or more of the estimated key points.

According to this configuration, by extracting in two stages, even from an image showing subjects of a plurality of categories, it is possible to extract with high accuracy only an image region that belongs to the category corresponding to the type of desired region wanted by the user (for example, the image region of a person) and that is of the type of the desired region (for example, the image region of a person's face).

An information processing device according to a fifth aspect of the present invention is the information processing device according to any one of the first to fourth aspects, further including: a display control unit that, when there are a plurality of limited regions extracted by the limited region extraction unit, controls a display to show the plurality of limited regions so that the user can select among them; and a reception unit that receives the limited region selected by the user, wherein the desired region output unit executes the process of recognizing the desired region on the limited region selected by the user, and extracts the desired region from that limited region and outputs it.

According to this configuration, when the user selects one or more limited regions from among the plurality of limited regions, the desired region is output from the limited region selected by the user, so the output accuracy of the desired region can be improved.

An information processing device according to a sixth aspect of the present invention is the information processing device according to any one of the first to fifth aspects, wherein the limited region extraction unit extracts a plurality of limited regions, the desired region output unit outputs a plurality of desired regions from the selected plurality of limited regions, and the information processing device includes a display control unit that controls a display to show the output plurality of desired regions so that the user can select at least one of them.

この構成によれば、ユーザが複数の所望領域から、１以上の領域を選択することができる。 According to this configuration, the user can select one or more regions from a plurality of desired regions.

An information processing device according to a seventh aspect of the present invention is the information processing device according to any one of the first to sixth aspects, further including: a reception unit that, when there are a plurality of desired regions output by the desired region output unit, receives one or more desired regions selected by the user from among the plurality of desired regions; and a storage processing unit that stores the one or more desired regions selected by the user in a storage.

この構成によれば、ユーザが選択した所望の画像を活用することができる。 According to this configuration, a desired image selected by the user can be utilized.

An information processing device according to an eighth aspect of the present invention is the information processing device according to any one of the first to seventh aspects, including an acquisition unit that acquires, based on a priority designation by the user, the target to be prioritized in outputting the desired region, wherein the limited region extraction unit extracts a plurality of limited regions and determines the priority of each of the plurality of limited regions according to the target to be prioritized, and the desired region output unit outputs the desired region according to the priorities.

この構成によれば、ユーザが優先するもの（例えば、手前にいる人物）の画像領域を取得することができる。 According to this configuration, it is possible to acquire an image area of a user's priority (for example, a person in front).

An information processing system according to a ninth aspect of the present invention includes: a limited region extraction unit that executes estimation of key point positions in a target image and/or region classification by semantic segmentation, and uses the estimated key point positions and/or the result of the region classification to extract, from the target image, a limited region including a desired region; and a desired region output unit that executes a process of recognizing the desired region on the extracted limited region, and extracts the desired region from the limited region and outputs it.

An information processing method according to a tenth aspect of the present invention includes: a limited region extraction procedure of executing estimation of key point positions in a target image and/or region classification by semantic segmentation, and using the estimated key point positions and/or the result of the region classification to extract, from the target image, a limited region including a desired region; and a desired region output procedure of executing a process of recognizing the desired region on the extracted limited region, and extracting the desired region from the limited region and outputting it.

A program according to an eleventh aspect of the present invention is a program for causing a computer to execute: a limited region extraction procedure of executing estimation of key point positions in a target image and/or region classification by semantic segmentation, and using the estimated key point positions and/or the result of the region classification to extract, from the target image, a limited region including a desired region; and a desired region output procedure of executing a process of recognizing the desired region on the extracted limited region, and extracting the desired region from the limited region and outputting it.

本発明の一態様によれば、キーポイントの推定、及び／又はセマンティックセグメンテーションによる対象画像内の画像領域の推定を用いることにより、所望領域を高精度に検出することができる。このため、対象画像から、所望領域とは異なる領域が誤って検出される可能性を低減することができる。 According to one aspect of the present invention, the desired region can be detected with high accuracy by using the estimation of the key point and / or the estimation of the image region in the target image by semantic segmentation. Therefore, it is possible to reduce the possibility that a region different from the desired region is erroneously detected from the target image.

A schematic configuration diagram of the information processing device according to the first embodiment.
An example of screen transitions displayed on the information processing device.
A diagram for explaining the processing of Example 1.
A diagram for explaining the processing of Example 2.
An example of screen transitions in a modification.
A diagram for explaining the processing of the modification.
A flowchart showing an example of the processing of the modification.
A schematic configuration diagram of the information processing system according to the second embodiment.
A schematic configuration diagram of the computer system according to the present embodiment.

以下、各実施形態について、図面を参照しながら説明する。但し、必要以上に詳細な説明は省略する場合がある。例えば、既によく知られた事項の詳細説明や実質的に同一の構成に対する重複説明を省略する場合がある。これは、以下の説明が不必要に冗長になるのを避け、当業者の理解を容易にするためである。 Hereinafter, each embodiment will be described with reference to the drawings. However, more detailed explanation than necessary may be omitted. For example, detailed explanations of already well-known matters and duplicate explanations for substantially the same configuration may be omitted. This is to avoid unnecessary redundancy of the following description and to facilitate the understanding of those skilled in the art.

In addition to the above problem, the present embodiment also addresses the problem of making it possible to acquire an image of the desired region of a desired person when the target image shows not only the desired person but also other people.

第１の実施形態に係る情報処理装置１は、例えば多機能携帯電話（いわゆるスマートフォン）などの携帯電話、タブレット、ノートパソコンなどのモバイルデバイス、またはデスクトップパソコンなどである。本実施形態では、一例として、多機能携帯電話であるものとして説明する。 The information processing device 1 according to the first embodiment is, for example, a mobile phone such as a multifunctional mobile phone (so-called smartphone), a mobile device such as a tablet or a laptop computer, or a desktop personal computer. In the present embodiment, as an example, it will be described as a multifunctional mobile phone.

FIG. 1 is a schematic configuration diagram of the information processing device according to the first embodiment. As shown in FIG. 1, the information processing device 1 includes, for example, an input interface 11, a communication module 12, a storage 13, a memory 14, a display 15, a processor 16, and a camera 17.
The input interface 11 accepts a user's operation and outputs an input signal corresponding to the accepted operation to the processor 16. In this embodiment, the input interface 11 is a touch panel as an example.
The communication module 12 is connected to a communication network and communicates with other computers connected to that network. This communication may be wired or wireless.

The storage 13 stores an application program to be read and executed by the processor 16, as well as various data. This application is, for example, downloaded via a server or the cloud and then installed.
The memory 14 temporarily holds data and programs. The memory 14 is a volatile memory, for example a RAM (Random Access Memory).
The display 15 displays information according to instructions from the processor 16.

The processor 16 loads the program of the application according to the first embodiment from the storage 13 into the memory 14 and executes the series of instructions included in the program, thereby functioning as the acquisition unit 161, the limited region extraction unit 162, the desired region output unit 163, the display control unit 164, the reception unit 165, and the storage processing unit 166. Details of the processing of each unit will be described later.

カメラ１７は、例えばディスプレイ１５側に設けられた背面カメラであり、被写体を撮像可能である。なお、情報処理装置１は、これに加えてまたはこれに替えて、ディスプレイ１５側に設けられた前面カメラを備えてもよい。 The camera 17 is, for example, a rear camera provided on the display 15 side, and can take an image of a subject. In addition to or in place of this, the information processing device 1 may include a front camera provided on the display 15 side.

FIG. 2 is an example of screen transitions displayed on the information processing device. Screens G1 and G2 in FIG. 2 are, for example, screens displayed after the application is launched. Screen G1 is an example of a screen on which a target image selected by the user is displayed. Screen G1 shows a file selection button B1 for selecting the target image, an input box B2 for entering the type of desired region that the user wants to extract, an extraction start button B3 for instructing the start of desired region extraction, and the target image F1. Here, an image showing two men is displayed as the target image F1. Instead of the input box B2, a selection-type control may be used, such as a select box or selection from a plurality of tags. Here, the desired region is, for example, an image region of the type that the user wants to extract (for example, a human face). The type of the desired region is described here as being set by the user, but this is not a limitation; the type may be set in advance, in which case the desired region is, for example, an image region of the preset type (for example, a human face). For example, when the information processing device is dedicated to face detection, the type of the desired region may be preset to a human face.

For example, when the user enters "human face" in answer to "What do you want to find?" (the type of desired region to extract) and presses the extraction start button B3, the processing is executed and the display transitions to screen G2. On screen G2, for example, the image region of the face of one of the two men is displayed as the "found region" (desired region).

Next, the processing executed during the transition from screen G1 to screen G2 in FIG. 2 will be described. When the extraction start button B3 is pressed on screen G1, the acquisition unit 161 acquires the type of desired region that the user wants, which is a region within the target image (a human face in the example of FIG. 2). If the type of the desired region is fixed in advance to a specific object (for example, a human face), the acquisition unit 161 may be omitted. The limited region extraction unit 162 extracts a "limited region" from the target image using one or more of the following methods (the methods of <Processing of Example 1> to <Processing of Example 3> below). The desired region output unit 163 then executes a process of recognizing the desired region (for example, face detection) on the extracted limited region, and extracts the desired region from the limited region and outputs it.

<Processing of Example 1>
First, Example 1 of the processing uses key point estimation. Example 1 will be described with reference to FIG. 3, which is a diagram for explaining the processing of Example 1. In FIG. 3, the limited region extraction unit 162 estimates key point positions in the target image H1 using a technique such as that of Non-Patent Document 1. In image H2, the estimated key point positions are indicated by white circles. In this way, joint points (shoulders, elbows, wrists, hips, knees, ankles, etc.) and feature points (eyes, nose, mouth, ears, etc.) are extracted as key points. In one aspect, the limited region extraction unit 162 sets one face key point bounding box per person, namely a single bounding box that includes all of that person's face key points (for example, the key points of the eyes, nose, mouth, and ears). The example of FIG. 3 shows the face key point bounding boxes H21 and H22. Here, as an example, the face key point bounding box is a rectangle of exactly the size needed to enclose all of a person's face key points. Also in one aspect, the limited region extraction unit 162 sets one body key point bounding box per person, namely a single bounding box that includes all of that person's body key points, that is, the joint points (for example, the key points of the shoulders, elbows, wrists, hips, knees, and ankles).
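The bounding box described above can be sketched as follows. This is only an illustrative sketch: the key point names in `FACE_KEYS`, the `(left, top, right, bottom)` box layout, and the function name `bounding_box` are assumptions made for the example and do not come from the patent or from any particular pose estimator.

```python
from typing import Dict, Optional, Set, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (left, top, right, bottom)

# Hypothetical key point names; a real pose estimator defines its own set.
FACE_KEYS = {"nose", "left_eye", "right_eye", "left_ear", "right_ear", "mouth"}


def bounding_box(keypoints: Dict[str, Point], names: Set[str]) -> Optional[Box]:
    """Smallest axis-aligned rectangle enclosing the named key points that were detected."""
    pts = [keypoints[n] for n in names if n in keypoints]
    if not pts:
        return None  # none of the requested key points were detected
    xs = [x for x, _ in pts]
    ys = [y for _, y in pts]
    return (min(xs), min(ys), max(xs), max(ys))


# One person's detected key points (made-up coordinates).
person = {
    "nose": (50, 40),
    "left_eye": (45, 35),
    "right_eye": (55, 35),
    "left_shoulder": (30, 80),
}
face_box = bounding_box(person, FACE_KEYS)
```

The same function would give the body key point bounding box if it were called with a set of joint names instead of `FACE_KEYS`.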

When key points for a plurality of people are detected, as in image H2 of FIG. 3, the limited region extraction unit 162 narrows them down to one person using one of the following methods (or a combination of them).
(1) Select the person for whom more face key points were detected.
(2) Select the person for whom more body key points were detected.
(3) Select the person whose face key point bounding box is larger.
(4) Select the person whose body key point bounding box is larger.

In the example of FIG. 3, methods (1) and (2) above do not distinguish the two people (six face key points and two body key points were detected for each), so the limited region extraction unit 162 uses method (3) and selects the person whose face key point bounding box is larger. As a result, when a plurality of people appear in the target image, an image of the main person's face can be acquired.
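One way to combine selection rules (1) to (4) is to apply them in order as tie-breakers, which reproduces the FIG. 3 case where rules (1) and (2) tie and rule (3) decides. The sketch below assumes that ordering; the text only says that one method or a combination may be used. The `Person` record and the function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (left, top, right, bottom)


def area(box: Optional[Box]) -> float:
    """Area of a box, or 0 when no box could be formed."""
    if box is None:
        return 0.0
    left, top, right, bottom = box
    return max(0.0, right - left) * max(0.0, bottom - top)


@dataclass
class Person:
    face_count: int           # number of detected face key points
    body_count: int           # number of detected body key points
    face_box: Optional[Box]   # face key point bounding box
    body_box: Optional[Box]   # body key point bounding box


def select_person(people: List[Person]) -> Person:
    """Pick one person; rules (1) to (4) are applied in order as tie-breakers (an assumption)."""
    return max(
        people,
        key=lambda p: (p.face_count, p.body_count, area(p.face_box), area(p.body_box)),
    )


# Two people with equal key point counts, as in the FIG. 3 example (made-up boxes).
a = Person(6, 2, (0, 0, 10, 10), None)
b = Person(6, 2, (0, 0, 20, 20), None)
chosen = select_person([a, b])  # b: its face key point bounding box is larger
```

`select_person` raises `ValueError` on an empty list, so a caller would need to handle the case where no person was detected.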

Next, the limited region extraction unit 162 determines the "limited region" from the key points of the one person, for example by one of the following methods.
(1) The region obtained by expanding the face key point bounding box by a predetermined amount or a predetermined ratio is taken as the "limited region". Here, the "predetermined amount or predetermined ratio" may specifically be any of the following.
(A) A predetermined ratio of the size of the face key point bounding box.
(B) A predetermined ratio of the size of the body key point bounding box.
(C) A predetermined ratio of the size of the source image.
(D) A predetermined number of pixels.

(2) A rectangular region of a predetermined size centered on the centroid of the face key points may be taken as the "limited region". Here, the "predetermined size" may specifically be any of the following.
(A) A predetermined ratio of the size of the face key point bounding box.
(B) A predetermined ratio of the size of the body key point bounding box.
(C) A predetermined ratio of the size of the source image.
(D) A predetermined number of pixels.
If no face key points are detected, a "limited region" likely to contain the face may be set based on the positions of the key points of neighboring parts (for example, the neck and shoulders).
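Methods (1) and (2) for determining the limited region can be sketched as two small functions. This is a minimal illustration under stated assumptions: the margin is taken as a ratio of the face key point bounding box size (option (A)), the result is clamped to the image bounds (a choice made for this sketch that the text does not specify), and all names are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (left, top, right, bottom)
Size = Tuple[float, float]               # (width, height)


def clamp(box: Box, image_size: Size) -> Box:
    """Clip a box to the image bounds (an assumption of this sketch)."""
    left, top, right, bottom = box
    width, height = image_size
    return (max(0.0, left), max(0.0, top), min(float(width), right), min(float(height), bottom))


def expand_box(face_box: Box, ratio: float, image_size: Size) -> Box:
    """Method (1): widen the face key point bounding box by `ratio` of its own size on each side."""
    left, top, right, bottom = face_box
    dx = (right - left) * ratio
    dy = (bottom - top) * ratio
    return clamp((left - dx, top - dy, right + dx, bottom + dy), image_size)


def centered_region(face_points: List[Point], size: Size, image_size: Size) -> Box:
    """Method (2): rectangle of a given size centered on the centroid of the face key points."""
    cx = sum(x for x, _ in face_points) / len(face_points)
    cy = sum(y for _, y in face_points) / len(face_points)
    half_w, half_h = size[0] / 2, size[1] / 2
    return clamp((cx - half_w, cy - half_h, cx + half_w, cy + half_h), image_size)


region_1 = expand_box((40, 40, 60, 60), 0.5, (100, 100))
region_2 = centered_region([(40, 40), (60, 60)], (20, 20), (100, 100))
```

For the fallback case where no face key points are detected, `centered_region` could be called with neighboring key points such as the neck or shoulders; how far to shift the rectangle toward the expected face position is left open here.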

In the example of FIG. 3, the limited region extraction unit 162 extracts, for example, the region obtained by expanding the face key point bounding box by a predetermined ratio as the "limited region". The limited region H3 is thereby extracted. The desired region output unit 163 then executes face detection on the extracted limited region H3, and the desired region H3 is output. A known method may be used for this face detection.
In this way, the limited region extraction unit 162 estimates the key point positions in the target image and uses the estimated key point positions to extract, from the target image, a limited region including the desired region.
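The second stage, in which recognition runs only inside the limited region, can be sketched as a crop followed by a detector call. The detector is passed in as a plain callable because the text only says that a known face detection method may be used; the callable's signature, the row-major list-of-lists image, and the integer box layout with exclusive right and bottom edges are all assumptions of this sketch.

```python
from typing import Callable, List, Sequence, Tuple

IntBox = Tuple[int, int, int, int]   # (left, top, right, bottom), right/bottom exclusive
Image = Sequence[Sequence[int]]      # rows of pixel values
Detector = Callable[[List[List[int]]], List[IntBox]]  # hypothetical detector interface


def crop(image: Image, box: IntBox) -> List[List[int]]:
    """Cut the limited region out of the target image."""
    left, top, right, bottom = box
    return [list(row[left:right]) for row in image[top:bottom]]


def detect_in_limited_region(image: Image, region: IntBox, detector: Detector) -> List[IntBox]:
    """Run the detector on the cropped region and map results back to image coordinates."""
    patch = crop(image, region)
    left, top, _, _ = region
    return [
        (x0 + left, y0 + top, x1 + left, y1 + top)
        for (x0, y0, x1, y1) in detector(patch)
    ]
```

Because the detector never sees pixels outside the limited region, a face-like object elsewhere in the target image (such as the electrical outlet mentioned earlier) cannot be returned as a detection.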

<Processing of Example 2>
Next, the processing of Example 2 uses semantic segmentation. Example 2 will be described with reference to FIG. 4, which is a diagram for explaining the processing of Example 2. In FIG. 4, the limited region extraction unit 162 classifies the image regions of the target image H1 using a semantic segmentation technique such as that of Non-Patent Document 2, and extracts a person region as the "limited region".

In FIG. 4, the limited area extraction unit 162 executes semantic segmentation on the target image H11, whereby the person areas R11 and R12 are extracted as shown in the image H12.
When a plurality of person areas are extracted, the limited area extraction unit 162 may extract, as the limited area, the person area closer to the center of the image, the largest person area, or the person area that is both closer to the center of the image and the largest.
In the case of FIG. 4, for example, the limited area extraction unit 162 extracts, of the person areas R11 and R12, the person area R11, which is closer to the center and/or larger, as the limited area. In the example of FIG. 4, the limited area H13 is, as one example, an area cut out of the image with margins added above and to the left and right of the person area R11. The limited area H13 is not limited to this: it need not have margins above and to the left and right of the person area R11, and may be a rectangular area that exactly circumscribes the person area R11. The desired area output unit 163 then executes face detection on the extracted limited area H13 and outputs the desired area H14. A known method may be used for this face detection.
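
The selection among several person areas described above can be sketched as follows. This is only an illustration, and it assumes that each person area has already been reduced to its circumscribing rectangle. The names `select_person_area`, `box_area`, and `center_distance_sq` are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom)


def box_area(box: Box) -> int:
    """Area of a rectangle in pixels."""
    left, top, right, bottom = box
    return max(0, right - left) * max(0, bottom - top)


def center_distance_sq(box: Box, image_size: Tuple[int, int]) -> float:
    """Squared distance between the rectangle center and the image center."""
    left, top, right, bottom = box
    width, height = image_size
    cx, cy = (left + right) / 2, (top + bottom) / 2
    return (cx - width / 2) ** 2 + (cy - height / 2) ** 2


def select_person_area(
    boxes: List[Box], image_size: Tuple[int, int], criterion: str = "center"
) -> Box:
    """Pick one person area: the one nearest the image center, or the largest."""
    if not boxes:
        raise ValueError("no person area was extracted")
    if criterion == "center":
        return min(boxes, key=lambda b: center_distance_sq(b, image_size))
    if criterion == "largest":
        return max(boxes, key=box_area)
    raise ValueError(f"unknown criterion: {criterion}")
```

How the two criteria should be combined when they disagree is not specified in the description, so the sketch leaves them as separate options.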

As described above, in Example 2 the limited area extraction unit 162, for example, executes area classification by semantic segmentation and uses the result of the area classification to extract, from the target image, a limited area that includes the desired area.
For example, the limited area extraction unit 162 may divide the target image into a plurality of areas by semantic segmentation, classify each area into the category it represents, and select, as the limited area, an area that substantially includes the area classified into the category (for example, the human body) corresponding to the type of desired area that the user wants (for example, a human face). The desired area output unit 163 may execute processing for recognizing the desired area (here, face detection) on the selected limited area, and extract and output the desired area (for example, the image area of a human face) from the limited area.

This is explained using a specific example: an image showing a person and a dog, where the type of desired area that the user wants is a "human face". In this case, the limited area extraction unit 162 may divide the target image into a plurality of image areas by semantic segmentation, ignore the image area of the dog and the other image areas, and select only the image area of the person as the limited area. The desired area output unit 163 may then recognize the person's "face" in the image area of the person and output the image area of the person's "face" as the desired area. By extracting in two stages in this way, even from an image showing subjects of a plurality of categories, only the image area of the desired type (for example, the image area of a human face) within the image area of the category corresponding to that type (for example, the image area of a person) can be extracted with high accuracy.
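
The first stage of this person-and-dog example can be sketched as follows. The sketch assumes that the segmentation result is a per-pixel label map stored as a list of rows; the label values, the function name `limited_area_for_category`, and the uniform `margin` parameter are hypothetical.

```python
from typing import List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom); right/bottom exclusive


def limited_area_for_category(
    label_map: List[List[int]], category: int, margin: int = 0
) -> Optional[Box]:
    """Return the rectangle circumscribing all pixels labeled `category`.

    Pixels of other categories (for example, a dog) are ignored. Returns None
    when the category does not appear in the label map.
    """
    rows = [y for y, row in enumerate(label_map) if category in row]
    if not rows:
        return None
    cols = [x for row in label_map for x, label in enumerate(row) if label == category]
    height, width = len(label_map), len(label_map[0])
    # An optional margin is added on every side and clamped to the image bounds.
    return (
        max(0, min(cols) - margin),
        max(0, min(rows) - margin),
        min(width, max(cols) + 1 + margin),
        min(height, max(rows) + 1 + margin),
    )
```

The second stage, face detection inside the returned rectangle, would use any known detector and is not shown here.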

<Processing of Example 3>
Next, Example 3 uses both semantic segmentation and key point estimation. In this case, the limited area extraction unit 162 first executes semantic segmentation on the target image and extracts a person area. The limited area extraction unit 162 may then apply the method of the above-described example using key point estimation to this person area to extract the "limited area". In this way, the limited area extraction unit 162 divides the target image into a plurality of areas by semantic segmentation, classifies each area into the category it represents, estimates key point positions in an area that substantially includes the area classified into the category corresponding to the type of the desired area (for example, a person area), and forms the limited area so as to include one or more of the estimated key points. This prevents key point estimation from being applied to something other than a person (for example, an electrical outlet).
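
The combination in Example 3 can be sketched as follows. This is a minimal illustration that assumes the person area is available as a rectangle and that a key point estimator is supplied by the caller. The names `crop` and `keypoints_in_person_area` and the `estimate_keypoints` callable are hypothetical stand-ins for whatever estimator is actually used.

```python
from typing import Callable, List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom); right/bottom exclusive
Point = Tuple[int, int]
Image = List[List[int]]  # rows of pixel values, for illustration only


def crop(image: Image, box: Box) -> Image:
    """Cut the rectangle `box` out of the image."""
    left, top, right, bottom = box
    return [row[left:right] for row in image[top:bottom]]


def keypoints_in_person_area(
    image: Image,
    person_box: Box,
    estimate_keypoints: Callable[[Image], Sequence[Point]],
) -> List[Point]:
    """Run key point estimation only inside the person area.

    Objects outside the person area never reach the estimator. The returned
    points are translated back to full-image coordinates.
    """
    left, top, _, _ = person_box
    local_points = estimate_keypoints(crop(image, person_box))
    return [(x + left, y + top) for x, y in local_points]
```

The resulting key points can then be passed to the bounding-box expansion of Example 1 to form the limited area.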

<Modification: Example of extracting a plurality of limited areas>
In the above examples, the "limited area" was narrowed down to one (one person), but several (a plurality of people) may be retained. Alternatively, the number of "limited areas" may be narrowed down to a predetermined number (or to a predetermined number or fewer).

Next, a modification is described with reference to FIGS. 5 and 6. FIG. 5 shows an example of the screen transitions of the modification. The screens G11, G12, and G13 in FIG. 5 are, for example, screens displayed after an application is launched. The screen G11 is an example of a screen on which a target image selected by the user is displayed. The screen G11 displays a file selection button B11 for selecting the target image, an input box B12 for entering the type of desired area that the user wants to extract, an extraction start button B13 for instructing the start of desired area extraction, and the target image F11. Here, an image showing one man and one woman is displayed as the target image F11. Instead of the input box B12, a selection-type control such as a select box or a choice among a plurality of tags may be used.

For example, when the user enters "human face" as the desired area to be extracted on the screen G11 and presses the extraction start button B13, the processing is executed and the screen G12 is displayed. On the screen G12, for example, both the image area F12 of the man's face and the image area F13 of the woman's face are displayed as desired areas.

As the processing in this case, the limited area extraction unit 162 extracts a plurality of limited areas. The desired area output unit 163 may then output a desired area from each of the selected limited areas. The display control unit 164 may control the display 15 to display the plurality of output desired areas so that the user can select at least one of them. This allows the user to select one or more areas from among the plurality of desired areas.

For example, when the user selects the image area F12 of the man's face on the screen G12 and presses the save button B14 on the screen G12, the reception unit 165 accepts the image area F12 of the man's face, which the user selected from among the plurality of desired areas. The storage processing unit 166 saves the image area F12 of the man's face in the storage 13, and the display transitions to the screen G13. In this way, the reception unit 165 accepts one or more desired areas selected by the user from among the plurality of desired areas, and the storage processing unit 166 saves the one or more desired areas selected by the user in the storage 13. This allows the desired image selected by the user to be put to use.

The area selected by the user is saved in the storage 13, and the user uses that image. For example, the user may save only his or her own face from a single group photograph and use it as a thumbnail image.

Next, an example of the processing executed during the transition from the screen G11 to the screen G12 in FIG. 5 is described. FIG. 6 is a diagram for explaining the processing of the modification. In FIG. 6, the limited area extraction unit 162 executes semantic segmentation on the target image H21 using a technique such as that of Non-Patent Document 2, whereby the person areas R21 and R22 are extracted as shown in the image H22.

The limited area extraction unit 162 then extracts the limited area H23, which includes the person area R21, and the limited area H24, which includes the person area R22. The desired area output unit 163 then executes face detection and outputs the desired areas H25 and H26.

FIG. 7 is a flowchart showing an example of the processing of the modification.
(Step S110) First, the processor 16 determines whether the extraction start button has been pressed with a target image selected and the type of desired area specified.

(Step S120) If, in step S110, the extraction start button has been pressed with a target image selected and the type of desired area specified, the limited area extraction unit 162 extracts a limited area from the target image.

(Step S130) Next, the desired area output unit 163 executes processing for recognizing the desired area on the limited area, and extracts and outputs the desired area from the limited area.

(Step S140) Next, the processor 16 determines whether the save button has been pressed with the desired area to be saved selected by the user.

(Step S150) If, in step S140, the save button has been pressed with the desired area to be saved selected by the user, the desired area selected by the user is saved in the storage 13. This completes the processing of this flowchart.
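
The flow of steps S110 to S150 can be outlined as follows. This is a sketch under stated assumptions: the two callables stand in for the limited area extraction unit 162 and the desired area output unit 163, and the function names `extract_desired_areas` and `save_selected`, as well as the use of a plain list as storage, are hypothetical.

```python
from typing import Any, Callable, Iterable, List, Optional


def extract_desired_areas(
    target_image: Any,
    desired_type: str,
    extract_limited_areas: Callable[[Any, str], Iterable[Any]],
    recognize_desired_area: Callable[[Any, str], Optional[Any]],
) -> List[Any]:
    """Steps S110 to S130: return the desired areas found in the target image."""
    # S110: proceed only when a target image and a desired-area type are given.
    if target_image is None or not desired_type:
        return []
    desired_areas = []
    # S120: extract the limited areas from the target image.
    for limited_area in extract_limited_areas(target_image, desired_type):
        # S130: recognize the desired area inside each limited area.
        found = recognize_desired_area(limited_area, desired_type)
        if found is not None:
            desired_areas.append(found)
    return desired_areas


def save_selected(
    desired_areas: List[Any], selected: Iterable[int], storage: List[Any]
) -> None:
    """Steps S140 and S150: save only the desired areas the user selected."""
    for index in selected:
        storage.append(desired_areas[index])
```

Passing the two units in as callables keeps the control flow independent of which segmentation, key point, or face detection method is used.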

Furthermore, priorities may be assigned to the plurality of "limited areas". To assign priorities, the above-described method of narrowing down to one of a plurality of persons may be used. For example, a higher priority may be given to the "limited area" of a person for whom more facial key points are detected. When the user specifies that persons whose whole body is shown should be prioritized, the priority of the "limited area" of a person whose whole body is shown may be raised so that such a person is extracted preferentially. In this case, the limited area extraction unit 162 may set a priority based on how much of the body is shown, according to which of the body key points are detected. For example, the limited area extraction unit 162 may assign a higher priority when the upper body including the face is shown than when only the face is shown, and a higher priority when the whole body is shown than when only the upper body including the face is shown.
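
The ordering described above (face only, then upper body including the face, then whole body) can be sketched as follows. The key point names, the grouping into three sets, and the numeric priority values are assumptions made for illustration; the description does not fix them.

```python
from typing import Dict, List, Set

# Hypothetical key point groups; real estimators use their own naming.
FACE = {"nose", "left_eye", "right_eye"}
UPPER_BODY = {"left_shoulder", "right_shoulder"}
LOWER_BODY = {"left_ankle", "right_ankle"}


def body_coverage_priority(detected: Set[str]) -> int:
    """Higher value means more of the body is shown."""
    has_face = bool(detected & FACE)
    has_upper = bool(detected & UPPER_BODY)
    has_lower = bool(detected & LOWER_BODY)
    if has_face and has_upper and has_lower:
        return 3  # whole body
    if has_face and has_upper:
        return 2  # upper body including the face
    if has_face:
        return 1  # face only
    return 0


def rank_by_coverage(people: Dict[str, Set[str]]) -> List[str]:
    """Order person identifiers from highest to lowest priority."""
    return sorted(
        people, key=lambda name: body_coverage_priority(people[name]), reverse=True
    )
```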

What is prioritized may be selectable by the user, and the acquisition unit 161 may be able to acquire the target to be prioritized in the output of the desired area (for example, a large image area) based on the user's priority specification. In this case, the limited area extraction unit 162 extracts a plurality of limited areas and determines a priority for each of them according to the target to be prioritized. For example, when extracting the image area of a person in the foreground, the limited area extraction unit 162 may assign a higher priority to a limited area with a larger image area. The desired area output unit 163 outputs the desired area according to the priority. For example, the desired area output unit 163 extracts and outputs the desired area from a limited area whose priority is at or above a reference value. This makes it possible to obtain the image area of what the user prioritizes (for example, the person in the foreground).
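
The size-based priority and the reference threshold can be sketched as follows. The sketch assumes that the priority of a limited area is simply its pixel area; the function names and the threshold parameter `min_priority` are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom)


def box_area(box: Box) -> int:
    """Area of a rectangle in pixels, used here as the priority."""
    left, top, right, bottom = box
    return max(0, right - left) * max(0, bottom - top)


def prioritized_limited_areas(boxes: List[Box], min_priority: int) -> List[Box]:
    """Keep limited areas whose priority is at or above the reference value,
    ordered from highest to lowest priority."""
    scored = [(box_area(box), box) for box in boxes]
    kept = [(priority, box) for priority, box in scored if priority >= min_priority]
    kept.sort(key=lambda item: item[0], reverse=True)
    return [box for _, box in kept]
```

Desired-area recognition would then be run only on the returned limited areas.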

In the above examples, the part whose area is to be extracted is the face, but it may be another part (for example, a hand). This makes it possible to extract, for example, a person's hand. Likewise, the subject of area extraction is a person in the above examples, but it may be another animal (for example, a dog or a cat). This makes it possible to extract, for example, the face of a dog or a cat.

As described above, the information processing device 1 according to the first embodiment includes: a limited area extraction unit 162 that executes estimation of key point positions in a target image and/or area classification by semantic segmentation, and uses the estimated key point positions and/or the result of the area classification to extract, from the target image, a limited area that includes a desired area; and a desired area output unit 163 that executes processing for recognizing the desired area on the extracted limited area, and extracts and outputs the desired area from the limited area.

With this configuration, the desired area can be detected with high accuracy by using key point estimation and/or estimation of image areas in the target image by semantic segmentation. This reduces the possibility that an area different from the desired area is erroneously detected from the target image.
When the limited area extraction unit 162 extracts a plurality of limited areas, the display control unit 164 may control the display 15 to display the plurality of limited areas so that the user can select among them. The reception unit 165 accepts the limited area selected by the user. The desired area output unit 163 executes processing for recognizing the desired area on the limited area selected by the user, and extracts and outputs the desired area from that limited area. Because the user selects one or more limited areas from among the plurality of limited areas and the desired area is output from the selected limited area, the output accuracy of the desired area can be improved.

<Second embodiment>
Next, the second embodiment is described. In the first embodiment, the processing is executed by the information processing device 1 used by the user; in the second embodiment, it is executed by a computer system to which the terminal device used by the user is connected via a communication network.
FIG. 8 is a schematic configuration diagram of the information processing system according to the second embodiment. As shown in FIG. 8, the information processing system S includes, as an example, terminal devices 3-1, ..., 3-N (N is a natural number) and a computer system 2 connected to each of the terminal devices 3-1 to 3-N via a communication network NW. The computer system 2 executes processing in response to requests from the terminal devices 3-1, ..., 3-N. Here, the computer system 2 is described as a single server as an example, but it is not limited to this and may consist of a plurality of computers, as in a cloud service.

The terminal devices 3-1 to 3-N are terminal devices used by different users and are, for example, mobile devices such as mobile phones (including multifunctional mobile phones, so-called smartphones), tablets, e-book readers, and notebook computers, or desktop computers. The terminal devices 3-1 to 3-N may display the information transmitted from the computer system 2 using, for example, a web browser, or in an application installed on the terminal devices 3-1 to 3-N.

FIG. 9 is a schematic configuration diagram of the computer system according to the present embodiment. As shown in FIG. 9, the computer system 2 includes, for example, an input interface 21, a communication module 22, a storage 23, a memory 24, and a processor 25.
The input interface 21 accepts operations by the administrator of the computer system 2 and outputs input signals corresponding to the accepted operations to the processor 25.
The communication module 22 is connected to the communication network NW and communicates with the terminal devices 3-1 to 3-N connected to the communication network NW. This communication may be wired or wireless.

The storage 23 stores programs to be read and executed by the processor 25, and various data.
The memory 24 temporarily holds data and programs. The memory 24 is a volatile memory, for example, a RAM (Random Access Memory).

The processor 25 loads the program according to the first embodiment from the storage 23 into the memory 24 and executes the series of instructions contained in the program, thereby functioning as the acquisition unit 161, the limited area extraction unit 162, the desired area output unit 163, the display control unit 164, the reception unit 165, and the storage processing unit 166. These functions are the same as in the first embodiment, so their description is omitted.

Some of the functions of the computer system 2 may be implemented in the terminal devices 3-1 to 3-N.

At least a part of the information processing device 1 described in the above embodiments may be implemented in hardware or in software. When implemented in software, a program that realizes at least some of the functions of the information processing device 1 may be stored on a recording medium such as a flexible disk or a CD-ROM and read and executed by a computer. The recording medium is not limited to removable media such as magnetic disks and optical disks, and may be a fixed recording medium such as a hard disk device or a memory.

A program that realizes at least some of the functions of the information processing device 1 may also be distributed via a communication line (including wireless communication) such as the Internet. Furthermore, the program may be encrypted, modulated, or compressed and then distributed via a wired or wireless line such as the Internet, or stored on a recording medium.

Furthermore, the information processing device 1 may be made to function by one or more pieces of information processing equipment. When a plurality of pieces of information processing equipment are used, one of them may be a computer, and the function of at least one means of the information processing device 1 may be realized by that computer executing a predetermined program.

In the method invention, all of the steps may be realized under automatic control by a computer. Alternatively, each step may be performed by a computer while a person controls the progression between steps. Furthermore, at least some of the steps may be performed by a person.

The present invention is not limited to the above embodiments as they are; at the implementation stage, the components can be modified and embodied without departing from the gist of the invention. Various inventions can also be formed by appropriately combining the plurality of components disclosed in the above embodiments. For example, some components may be removed from the full set of components shown in an embodiment. Furthermore, components of different embodiments may be combined as appropriate.

1 Information processing device
11 Input interface
12 Communication module
13 Storage
14 Memory
15 Display
16 Processor
161 Acquisition unit
162 Limited area extraction unit
163 Desired area output unit
164 Display control unit
165 Reception unit
166 Storage processing unit
17 Camera
2 Computer system
21 Input interface
22 Communication module
23 Storage
24 Memory
25 Processor
3-1 to 3-N Terminal devices
S Information processing system

Claims

An information processing device comprising:
a limited area extraction unit that executes estimation of key point positions in a target image and/or area classification by semantic segmentation, and uses the estimated key point positions and/or the result of the area classification to extract, from the target image, a limited area that includes a desired area;
a desired area output unit that executes processing for recognizing the desired area on the extracted limited area, and extracts and outputs the desired area from the limited area; and
an acquisition unit that acquires a target to be prioritized in the output of the desired area based on a priority specification by a user,
wherein the limited area extraction unit extracts a plurality of limited areas and determines a priority for each of the plurality of limited areas according to the target to be prioritized,
the desired area output unit outputs the desired area according to the priority, and
when determining the priority of each of the plurality of limited areas, the limited area extraction unit raises the priority of an animal for which more key points of a specific part are detected.

An information processing device comprising:
a limited area extraction unit that executes estimation of key point positions in a target image and/or area classification by semantic segmentation, and uses the estimated key point positions and/or the result of the area classification to extract, from the target image, a limited area that includes a desired area;
a desired area output unit that executes processing for recognizing the desired area on the extracted limited area, and extracts and outputs the desired area from the limited area; and
an acquisition unit that acquires a target to be prioritized in the output of the desired area based on a priority specification by a user,
wherein the limited area extraction unit extracts a plurality of limited areas and determines a priority for each of the plurality of limited areas according to the target to be prioritized,
the desired area output unit outputs the desired area according to the priority, and
when determining the priority of each of the plurality of limited areas, the limited area extraction unit raises the priority of an animal for which more key points of the body are detected.

An information processing device comprising:
a limited area extraction unit that executes estimation of key point positions in a target image and/or area classification by semantic segmentation, and uses the estimated key point positions and/or the result of the area classification to extract, from the target image, a limited area that includes a desired area;
a desired area output unit that executes processing for recognizing the desired area on the extracted limited area, and extracts and outputs the desired area from the limited area; and
an acquisition unit that acquires a target to be prioritized in the output of the desired area based on a priority specification by a user,
wherein the limited area extraction unit extracts a plurality of limited areas and determines a priority for each of the plurality of limited areas according to the target to be prioritized,
the desired area output unit outputs the desired area according to the priority, and
when determining the priority of each of the plurality of limited areas, the limited area extraction unit raises the priority of an animal for which the bounding box of the key points of a specific part is larger.

An information processing device comprising:
a limited area extraction unit that executes estimation of key point positions in a target image and/or area classification by semantic segmentation, and uses the estimated key point positions and/or the result of the area classification to extract, from the target image, a limited area that includes a desired area;
a desired area output unit that executes processing for recognizing the desired area on the extracted limited area, and extracts and outputs the desired area from the limited area; and
an acquisition unit that acquires a target to be prioritized in the output of the desired area based on a priority specification by a user,
wherein the limited area extraction unit extracts a plurality of limited areas and determines a priority for each of the plurality of limited areas according to the target to be prioritized,
the desired area output unit outputs the desired area according to the priority, and
when determining the priority of each of the plurality of limited areas, the limited area extraction unit raises the priority of an animal for which the bounding box of the key points of the body is larger.

An information processing device comprising:
a limited area extraction unit that executes estimation of key point positions in a target image and/or area classification by semantic segmentation, and uses the estimated key point positions and/or the result of the area classification to extract, from the target image, a limited area that includes a desired area;
a desired area output unit that executes processing for recognizing the desired area on the extracted limited area, and extracts and outputs the desired area from the limited area; and
an acquisition unit that acquires a target to be prioritized in the output of the desired area based on a priority specification by a user,
wherein the limited area extraction unit extracts a plurality of limited areas and determines a priority for each of the plurality of limited areas according to the target to be prioritized,
the desired area output unit outputs the desired area according to the priority, and
when determining the priority of each of the plurality of limited areas, the limited area extraction unit raises the priority of an animal whose whole body is shown.

An information processing apparatus comprising:
a limited region extraction unit that estimates key point positions in a target image and/or performs region classification by semantic segmentation, and extracts, from the target image, a limited region including a desired region by using the estimated key point positions and/or the result of the region classification;
a desired region output unit that executes a process of recognizing the desired region on the extracted limited region, extracts the desired region from the limited region, and outputs the desired region; and
an acquisition unit that acquires a priority target for the output of the desired region based on a priority specification by a user,
wherein
the limited region extraction unit extracts a plurality of limited regions and determines a priority for each of the plurality of limited regions according to the priority target,
the desired region output unit outputs the desired region according to the priority, and
when determining the priority of each of the plurality of limited regions, the limited region extraction unit sets the priority according to the proportion of the body that is shown, as determined by which of the body key points are detected.
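
One possible reading of this rule, given only as a sketch, is to treat the fraction of detected body key points as the proportion of the body that is shown. The key point names in `BODY_KEYPOINTS`, the equal weighting of key points, and the function `body_ratio` are assumptions introduced for this example; an implementation could instead weight key points differently depending on which ones are detected.

```python
from typing import Dict, Optional, Tuple

Point = Tuple[float, float]

# Hypothetical set of body key points; a real pose estimator defines its own.
BODY_KEYPOINTS = (
    "head", "neck",
    "left_shoulder", "right_shoulder",
    "left_hip", "right_hip",
    "left_ankle", "right_ankle",
)


def body_ratio(detected: Dict[str, Optional[Point]]) -> float:
    """Fraction of the body key points that were detected (0.0 to 1.0).

    A key point counts as detected when it is present and not None.
    """
    found = sum(1 for name in BODY_KEYPOINTS if detected.get(name) is not None)
    return found / len(BODY_KEYPOINTS)


# Only head and neck detected: a small part of the body is shown.
face_only = {"head": (40.0, 20.0), "neck": (40.0, 35.0)}
# Every key point detected: the whole body is shown.
whole_body = {name: (0.0, 0.0) for name in BODY_KEYPOINTS}

priorities = {
    "face_only": body_ratio(face_only),
    "whole_body": body_ratio(whole_body),
}
```

Under this sketch the region with the higher ratio receives the higher priority.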

An information processing apparatus comprising:
a limited region extraction unit that estimates key point positions in a target image and/or performs region classification by semantic segmentation, and extracts, from the target image, a limited region including a desired region by using the estimated key point positions and/or the result of the region classification;
a desired region output unit that executes a process of recognizing the desired region on the extracted limited region, extracts the desired region from the limited region, and outputs the desired region; and
an acquisition unit that acquires a priority target for the output of the desired region based on a priority specification by a user,
wherein
the limited region extraction unit extracts a plurality of limited regions and determines a priority for each of the plurality of limited regions according to the priority target,
the desired region output unit outputs the desired region according to the priority, and
when determining the priority of each of the plurality of limited regions, the limited region extraction unit assigns a higher priority when a half body including a specific part is shown than when only the specific part is shown.

An information processing apparatus comprising:
a limited region extraction unit that estimates key point positions in a target image and/or performs region classification by semantic segmentation, and extracts, from the target image, a limited region including a desired region by using the estimated key point positions and/or the result of the region classification;
a desired region output unit that executes a process of recognizing the desired region on the extracted limited region, extracts the desired region from the limited region, and outputs the desired region; and
an acquisition unit that acquires a priority target for the output of the desired region based on a priority specification by a user,
wherein
the limited region extraction unit extracts a plurality of limited regions and determines a priority for each of the plurality of limited regions according to the priority target,
the desired region output unit outputs the desired region according to the priority, and
when determining the priority of each of the plurality of limited regions, the limited region extraction unit assigns a higher priority when the whole body is shown than when a half body including a specific part is shown.
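
The two preceding ordering rules (specific part only, then half body including the specific part, then whole body) can be illustrated together as a three-level ranking. This is a sketch under stated assumptions: the specific part is taken to be the face and is assumed to be present in every limited region, and the key point names, the `Coverage` enumeration, and the functions `classify_coverage` and `rank_by_coverage` are hypothetical.

```python
from enum import IntEnum
from typing import Dict, List, Set


class Coverage(IntEnum):
    """How much of the body is shown; a larger value means higher priority."""
    PART_ONLY = 1
    HALF_BODY = 2
    WHOLE_BODY = 3


# Hypothetical key point names used to decide which body portions are shown.
UPPER_BODY = {"left_shoulder", "right_shoulder"}
LOWER_BODY = {"left_ankle", "right_ankle"}


def classify_coverage(detected: Set[str]) -> Coverage:
    """Classify a limited region by the key points detected in it.

    Assumes the specific part (here, the face) is already present, so a
    region without upper-body key points counts as showing the part only.
    """
    has_upper = UPPER_BODY <= detected
    has_lower = LOWER_BODY <= detected
    if has_upper and has_lower:
        return Coverage.WHOLE_BODY
    if has_upper:
        return Coverage.HALF_BODY
    return Coverage.PART_ONLY


def rank_by_coverage(regions: Dict[str, Set[str]]) -> List[str]:
    """Return region ids ordered from highest to lowest priority."""
    return sorted(
        regions,
        key=lambda rid: classify_coverage(regions[rid]),
        reverse=True,
    )


regions = {
    "face_only": {"nose"},
    "bust": {"nose", "left_shoulder", "right_shoulder"},
    "full": {"nose", "left_shoulder", "right_shoulder",
             "left_ankle", "right_ankle"},
}
ranking = rank_by_coverage(regions)
```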

An information processing apparatus comprising:
a limited region extraction unit that estimates key point positions in a target image and/or performs region classification by semantic segmentation, and extracts, from the target image, a limited region including a desired region by using the estimated key point positions and/or the result of the region classification;
a desired region output unit that executes a process of recognizing the desired region on the extracted limited region, extracts the desired region from the limited region, and outputs the desired region; and
an acquisition unit that acquires a priority target for the output of the desired region based on a priority specification by a user,
wherein
the limited region extraction unit extracts a plurality of limited regions and determines a priority for each of the plurality of limited regions according to the priority target,
the desired region output unit outputs the desired region according to the priority, and
when determining the priority of each of the plurality of limited regions, the limited region extraction unit assigns a higher priority as the area shown in the image becomes larger.
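
The area-based rule can be sketched, for illustration only, by counting pixels per region in a label map. The label map format (a 2D list of integer labels with 0 as background, one label per limited region) and the functions `region_areas` and `rank_by_area` are assumptions for this example; the claims do not specify how the area is measured.

```python
from collections import Counter
from typing import List


def region_areas(label_map: List[List[int]], background: int = 0) -> Counter:
    """Count the pixels belonging to each non-background label."""
    counts: Counter = Counter()
    for row in label_map:
        for label in row:
            if label != background:
                counts[label] += 1
    return counts


def rank_by_area(label_map: List[List[int]]) -> List[int]:
    """Return labels ordered from largest to smallest pixel area."""
    return [label for label, _ in region_areas(label_map).most_common()]


# Hypothetical 3 x 4 label map: label 1 covers 4 pixels, label 2 covers 2.
label_map = [
    [0, 1, 1, 0],
    [2, 1, 1, 0],
    [2, 0, 0, 0],
]
areas = region_areas(label_map)
ranking = rank_by_area(label_map)
```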

An information processing method comprising executing the processing performed by each unit of the information processing apparatus according to any one of claims 1 to 9.

A program for causing a computer to function as the information processing apparatus according to any one of claims 1 to 9.