JP2018128930A - Image processing device and control method thereof

Info

Publication number: JP2018128930A
Application number: JP2017022491A
Authority: JP
Inventors: Tomoya Honjo; Yoshinori Ito
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2017-02-09
Filing date: 2017-02-09
Publication date: 2018-08-16

Abstract

PROBLEM TO BE SOLVED: To provide an image division technique that enables more efficient processing to be performed by a plurality of processing units.

SOLUTION: An image processing device performing processing using a plurality of processing units includes: division means for regionally dividing an input image into a plurality of subregions; determination means for determining a second division number on the basis of the number of the plurality of processing units, a first division number which is the number of the plurality of subregions, and an influence in the case of adjusting the number of the plurality of subregions from the first division number to the second division number; adjustment means for adjusting a part of the subregions of the first division number such that the input image is regionally divided by the second division number; and control means for exerting control such that predetermined processing on subregions of the second division number is executed by the plurality of processing units.

SELECTED DRAWING: Figure 2

Description

The present invention relates to a technique for dividing an image into a plurality of partial regions.

In recent years, techniques for recognizing the people and objects in a photograph, or the scene of the photograph, have been actively developed. One such method divides an image into partial regions, extracts a feature amount from each region, and performs recognition using those feature amounts. In such a method, processing after the division is usually performed region by region, so the regions can be processed in parallel with one partial region as the processing unit. When the processing system has many processing cores, assigning the processing of each partial region to a processing core makes the processing correspondingly more efficient.

To perform parallel processing more efficiently, techniques have been proposed that divide a process into multiple parts and adjust the processing unit. Patent Document 1 discloses a technique that, when a three-dimensional shape expressed in a certain format is read into a shape processing system, detects the number of CPUs that execute the system and divides the shape object according to the number of CPUs.

Patent Document 1: Japanese Patent No. 3847020

In general, when the number of divisions changes, the feature amounts of the partial regions change, which may in turn affect the image recognition result. For this reason, a method that determines the number of divisions from the number of CPUs alone, as in the technique disclosed in Patent Document 1, may be undesirable in image recognition processing.

The present invention has been made in view of this problem, and an object thereof is to provide a technique for performing image division that enables more efficient processing.

To solve the above-described problem, an image processing apparatus according to the present invention has the following configuration. That is, an image processing apparatus that performs processing using a plurality of processing units includes: dividing means for dividing an input image into a plurality of partial regions; determining means for determining a second division number on the basis of the number of the plurality of processing units, a first division number which is the number of the plurality of partial regions, and the influence of adjusting the number of the plurality of partial regions from the first division number to the second division number; adjusting means for adjusting some of the partial regions of the first division number so that the input image is divided into regions by the second division number; and control means for performing control so that the plurality of processing units execute predetermined processing on the partial regions of the second division number.

According to the present invention, it is possible to provide a technique for performing image division that enables more efficient processing.

FIG. 1 is a block diagram of an information processing apparatus according to the first embodiment.
FIG. 2 is a flowchart showing recognition processing in the first embodiment.
FIG. 3 is a diagram illustrating how an input image is processed.
FIG. 4 is a diagram showing a list of partial region pairs scheduled for integration.
FIG. 5 is a flowchart showing recognition processing in the second embodiment.
FIG. 6 is a diagram explaining the division and integration of partial regions.
FIG. 7 is a flowchart showing recognition processing in the third embodiment.
FIG. 8 is a diagram showing a table that associates dictionary numbers with numbers of classes.

Hereinafter, preferred embodiments of the present invention are described in detail with reference to the drawings. The following embodiments are merely examples and are not intended to limit the scope of the present invention.

(First Embodiment)
As a first embodiment of the image processing apparatus according to the present invention, a smartphone that performs image recognition processing using a GPU (Graphics Processing Unit) having a plurality of processing units (processing cores, PEs) is described below as an example.

Specifically, the following describes an image processing method in which a user uses a smartphone to perform recognition processing in real time on images captured by its camera. More specifically, a photograph is divided into partial regions, a feature amount is extracted from each region, and image recognition is performed using those feature amounts. In particular, the number of divisions is adjusted based on the number of processing units that execute the recognition processing and on the recognition accuracy of the recognition processing, which is treated as the influence on the processing as a whole.

<Prerequisites>
First, the preconditions of the first embodiment are described.

The recognition result represents the proportion of each category in the region concerned. Specifically, regions are classified into four categories (living things, nature, objects, and other), and the recognition result is expressed as the likelihood of each category, for example "living things : nature : objects : other = 0.8 : 0.1 : 0.1 : 0". When the recognition result is presented to the user, the category with the highest likelihood among the categories is selected and displayed.
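As an illustration of how a likelihood vector of this kind can be reduced to a single displayed category, the following is a minimal sketch. The function name `top_category` and the English category strings are assumptions made for this example and do not appear in the original.

```python
# Illustrative English renderings of the four categories (assumed names).
CATEGORIES = ("living things", "nature", "objects", "other")


def top_category(likelihoods):
    """Return the name of the category with the highest likelihood."""
    if len(likelihoods) != len(CATEGORIES):
        raise ValueError("expected one likelihood per category")
    # Index of the largest likelihood; ties resolve to the earliest category.
    best = max(range(len(CATEGORIES)), key=lambda i: likelihoods[i])
    return CATEGORIES[best]
```

With the example likelihoods from the text, `top_category([0.8, 0.1, 0.1, 0.0])` selects the first category.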

The commonly used PA (Pixel Accuracy) is used as the recognition accuracy of the recognition processing. PA is the proportion of pixels for which the recognition result matches the correct answer data when the two are compared pixel by pixel.
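The definition of PA can be sketched as follows, assuming that the recognition result and the correct answer data are given as equally sized 2-D lists of category labels. The function name `pixel_accuracy` is hypothetical.

```python
def pixel_accuracy(predicted, ground_truth):
    """Fraction of pixels whose predicted label equals the ground-truth label."""
    if len(predicted) != len(ground_truth):
        raise ValueError("label maps must have the same height")
    total = 0
    matches = 0
    for pred_row, true_row in zip(predicted, ground_truth):
        if len(pred_row) != len(true_row):
            raise ValueError("label maps must have the same width")
        total += len(pred_row)
        matches += sum(p == t for p, t in zip(pred_row, true_row))
    if total == 0:
        raise ValueError("label maps must not be empty")
    return matches / total
```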

The recognition processing is performed by the GPU (Graphics Processing Unit) mounted on the user's smartphone. Here, the number of processing cores (PEs) mounted on the GPU is assumed to be "16". The inputs of the recognition processing are an image and a dictionary, described later, and the output is a category likelihood map for each region in the image.

The number of cores on the GPU is used as the number of processes that the processing system executing the recognition processing can run at once. In this embodiment, the recognition accuracy of the recognition processing is used as the influence on the processing as a whole. That is, the number of divisions is chosen so that, from the viewpoint of processing efficiency, it is close to an integer multiple of the number of GPU cores and, from the viewpoint of recognition accuracy, the degradation of recognition accuracy is suppressed. The specific method is given in the description of the flowchart.

The commonly used k-means algorithm is used to divide the input image into partial regions. Clustering by the k-means algorithm is well known, so its description is omitted. In this embodiment, RGB color information is first extracted as a feature amount from each pixel of the input image, and k-means is used to cluster the pixels into 50 clusters in the feature space. Each cluster is then mapped onto the two-dimensional image. At this point, two pixels that belong to the same cluster in the feature space may, for example, be spatially separated on the two-dimensional image. Therefore, labeling processing is performed on the clusters mapped onto the two-dimensional image, and spatially separated clusters are split, to obtain the final clusters, that is, the partial regions. Consequently, the number of partial regions finally obtained is expected to be 50 or more.
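The labeling step that splits spatially separated clusters can be sketched as a connected-component labeling over the cluster map. The sketch below assumes that the k-means cluster index of every pixel is already available as a 2-D list and that pixels are connected through their four neighbors (up, down, left, right). The connectivity and the function name `label_regions` are assumptions, since the text does not specify them.

```python
from collections import deque


def label_regions(cluster_map):
    """Split each cluster into spatially connected regions.

    cluster_map: 2-D list holding the cluster index of each pixel.
    Returns (labels, count), where labels holds region numbers 1..count.
    """
    height = len(cluster_map)
    width = len(cluster_map[0]) if height else 0
    labels = [[0] * width for _ in range(height)]  # 0 means "not yet labeled"
    count = 0
    for start_y in range(height):
        for start_x in range(width):
            if labels[start_y][start_x]:
                continue
            # Start a new region and flood-fill it breadth-first.
            count += 1
            cluster = cluster_map[start_y][start_x]
            labels[start_y][start_x] = count
            queue = deque([(start_y, start_x)])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and not labels[ny][nx]
                            and cluster_map[ny][nx] == cluster):
                        labels[ny][nx] = count
                        queue.append((ny, nx))
    return labels, count
```

In the cluster map `[[0, 1, 0], [0, 1, 0]]`, cluster 0 occupies two separate columns, so two clusters become three regions. This is why the final number of regions can exceed the k given to k-means.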

The commonly used SVM (Support Vector Machine) is used for the learning and recognition processing that recognizes the category of a partial region. This learning and classification method is well known, so its description is omitted. The parameters used at recognition time (hereinafter referred to as the dictionary) are considered to depend on the number of divisions of the input images used during learning. In general, therefore, the closer the number of divisions at recognition time is to the number used during learning, the better the recognition accuracy is expected to be. In this embodiment, the number of divisions during learning is "50".

A histogram of the color distribution of the pixels in a partial region is used as the feature amount input to the SVM for recognizing that region. Specifically, the luminance (0 to 255) of each of the R, G, and B channels of the pixels in the region is divided evenly into four ranges (0 to 63, 64 to 127, 128 to 191, and 192 to 255) to create a histogram. That is, each of R, G, and B contributes 4 dimensions, giving a 12-dimensional feature amount in total. The histogram is normalized so that its values lie between 0 and 1.
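The 12-dimensional feature can be sketched as follows. The text only states that the histogram is normalized to the range 0 to 1. Dividing each bin by the number of pixels in the region is one way to achieve this and is an assumption here, as is the function name `color_histogram`.

```python
def color_histogram(pixels):
    """Build the 12-dimensional color feature of one partial region.

    pixels: iterable of (r, g, b) tuples with integer values in 0..255.
    Returns 12 floats: four bins for R, then four for G, then four for B.
    """
    pixels = list(pixels)
    if not pixels:
        raise ValueError("a region must contain at least one pixel")
    bins = [0] * 12
    for r, g, b in pixels:
        for channel, value in enumerate((r, g, b)):
            # 0-63 -> bin 0, 64-127 -> bin 1, 128-191 -> bin 2, 192-255 -> bin 3
            bins[channel * 4 + value // 64] += 1
    # Assumed normalization: divide by the pixel count so each value is in [0, 1].
    return [count / len(pixels) for count in bins]
```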

<Device configuration>
FIG. 1 is a block diagram of an information processing apparatus according to the first embodiment. The memory 101 stores the data required for each process. For example, it stores a program that implements the recognition processing and the dictionary. The camera 102 captures two-dimensional images and outputs the captured images to the memory 101.

The input device 103 receives various inputs from the user. For example, it receives from the user a command to take a photograph with the camera 102. The display 104 displays input images and recognition processing results.

The CPU 105 is a functional unit that executes the programs stored in the memory 101 and implements various processes. It also provides the function of controlling each device. For example, it performs the partial region division processing that divides a captured image, and it transfers images stored in the memory 101 to the display 104.

The GPU 106 executes in parallel the processing instructed by the CPU 105. For example, it performs feature generation processing and recognition processing for the partial regions. The GPU 106 includes a plurality of PEs (processing cores) 107, and processing is distributed to the processing cores and executed there. The code (kernel) of each process is loaded from the memory 101 into the GPU 106 before it is executed.

The configuration of the image processing apparatus in this embodiment is not limited to the one described above. For example, although the CPU 105 has been described as controlling each functional unit, a dedicated processor for controlling each device may be used instead. In addition, another parallel processing accelerator such as a DSP may be mounted in place of the GPU.

<Operation of the device>
FIG. 2 is a flowchart showing the recognition processing in the first embodiment. The following processing is implemented by the CPU 105 executing a predetermined control program.

In step S201, the CPU 105 acquires the number of processing cores mounted on the GPU and records it in the memory 101. With the OpenCL API, for example, the number of processing cores can be obtained using clGetDeviceInfo. Here, the value "number of processing cores = 16" is stored in the memory 101.

In step S202, the CPU 105 divides the input image into partial regions using the k-means algorithm described in the preconditions. Here, the input image is an image captured in response to a photographing command entered through the input device 103.

That is, clustering is performed with k, the number of k-means classes, set to the value "50" given in the preconditions. Then, as described in the preconditions, the clusters are mapped onto the two-dimensional image and labeling processing is performed. As a result, a number is assigned to each partial region.

FIG. 3 illustrates how an input image is processed. FIG. 3(a) shows an example of the input image before division, and FIG. 3(b) shows an example of the input image after division. Each pixel of the input image is assumed to hold the number of the partial region to which it belongs. The number of partial regions finally obtained is assumed to be 54. That is, each partial region shown in FIG. 3(b) is assigned one of the numbers 1 to 54.

In step S203, the CPU 105 derives, for each partial region, the 12-dimensional feature amount described in the preconditions. That is, the RGB values of each pixel in a partial region are counted into the corresponding bins of that region's 12-dimensional histogram, and normalization is performed at the end.

In step S204, the CPU 105 determines a new division number (denoted x) using the number of processing cores stored in the memory 101 (denoted p) and the number of divisions obtained in S202 (denoted y). Here, the new division number is determined according to the following condition.

x = max(n × p) (n: natural number) and x ≤ y
Here, max(X) means the maximum of the values that X can take. In other words, x is the largest multiple of p that does not exceed y.

Determining x in this way yields a division number that is close to the number of divisions used when the dictionary was created (here, "50") and that allows the plurality of processing cores to be used more efficiently than when the number of divisions is not adjusted. That is, the division number can be set so that the processing speed improves while comparable accuracy is maintained. Here, since p = 16 and y = 54, x = 48.
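The rule for choosing the new division number can be sketched in a few lines. The function name `choose_division_count` is hypothetical, and returning y unchanged when y is smaller than p (no natural number n satisfies the condition) is an assumption, since the text does not cover that case.

```python
def choose_division_count(num_cores, current_count):
    """Largest multiple of num_cores that does not exceed current_count."""
    if num_cores <= 0 or current_count <= 0:
        raise ValueError("both arguments must be positive")
    if current_count < num_cores:
        # Assumed fallback: no multiple of num_cores fits, so leave the count as is.
        return current_count
    return (current_count // num_cores) * num_cores
```

For the values in the text, `choose_division_count(16, 54)` gives 48.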

In step S205, the CPU 105 determines which partial regions to integrate in the partial region integration processing, which is the adjustment processing needed to change the current division number to the new division number determined in S204. That is, to go from the current division number "54" to the new division number "48", six partial regions (some of the partial regions) must be eliminated, and the processing selects those six. Here, six pairs of partial regions that are adjacent and have similar feature amounts are selected, in order of similarity.

Specifically, the partial regions adjacent to each partial region are listed first. One way to do this is to shift the partial region by one pixel up, down, left, and right, check for each pixel whether the partial region number changes, and, when a number other than that of the partial region appears, treat the region with that number as adjacent. Then, in the 12-dimensional feature space extracted in S203, the Euclidean distance between every two partial regions that are adjacent on the image is calculated, the six pairs with the shortest Euclidean distances are selected, and the pairs of partial region numbers are recorded in the memory 101. FIG. 4 shows a list of the six partial region pairs scheduled for integration that were determined in this way.
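The selection of pairs to integrate can be sketched as follows, assuming that the region numbers are held in a 2-D list and that the feature vector of each region is available in a dictionary keyed by region number. The function names are hypothetical. Comparing each pixel with its lower and right neighbors finds the same unordered pairs as shifting in all four directions.

```python
import math


def adjacent_pairs(labels):
    """Collect unordered pairs of region numbers that touch on the image."""
    height = len(labels)
    width = len(labels[0]) if height else 0
    pairs = set()
    for y in range(height):
        for x in range(width):
            a = labels[y][x]
            # Lower and right neighbors cover every 4-neighbor pair exactly once.
            for ny, nx in ((y + 1, x), (y, x + 1)):
                if ny < height and nx < width and labels[ny][nx] != a:
                    b = labels[ny][nx]
                    pairs.add((min(a, b), max(a, b)))
    return pairs


def select_merge_pairs(labels, features, count):
    """Return the `count` adjacent pairs closest in feature space."""
    ranked = sorted(
        adjacent_pairs(labels),
        # Euclidean distance first; the pair itself breaks ties deterministically.
        key=lambda pair: (math.dist(features[pair[0]], features[pair[1]]), pair),
    )
    return ranked[:count]
```

In the embodiment, `count` would be 6 and each feature vector would have 12 dimensions.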

In step S206, the CPU 105 integrates the partial region pairs determined in S205. Here, the partial region number registered for each pixel in the two partial regions is updated to the smaller of the two numbers. For example, if partial region numbers "4" and "8" form a pair in FIG. 4, the partial region number of each pixel in partial region "8" is rewritten to "4". FIG. 3(c) shows an example after integration. At this stage, the input image is divided by the new division number "48".
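The integration step can be sketched as a relabeling of the region map. The text describes only the simple case of rewriting one region's number to the smaller number of its pair. The handling of pairs that share a region (for example "4"-"8" and "8"-"9") through a parent lookup is an assumption added here, and the function name `merge_regions` is hypothetical.

```python
def merge_regions(labels, pairs):
    """Relabel the region map so that each pair shares the smaller number."""
    parent = {}  # maps an absorbed region number to the number that absorbed it

    def find(label):
        # Follow the chain of absorptions to the surviving region number.
        while label in parent:
            label = parent[label]
        return label

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[max(root_a, root_b)] = min(root_a, root_b)
    return [[find(value) for value in row] for row in labels]
```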

In step S207, the CPU 105 causes the 16 processing cores to execute the recognition processing of the 48 partial regions. That is, each processing core uses the SVM described in the preconditions and the parameters of the SVM dictionary held in the memory 101 to output the category inferred from the features. Here, the partial regions obtained by the division are distributed equally among the processing cores. Each processing core therefore performs the recognition processing of 3 (= 48/16) partial regions.
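The equal distribution, and the reason a multiple of the core count is convenient, can be sketched as follows. The round-robin assignment and the function names are assumptions for illustration. The text does not state how the GPU actually schedules the regions.

```python
def distribute_regions(region_ids, num_cores):
    """Assign region numbers to cores in round-robin order."""
    batches = [[] for _ in range(num_cores)]
    for index, region_id in enumerate(region_ids):
        batches[index % num_cores].append(region_id)
    return batches


def rounds_needed(num_regions, num_cores):
    """Number of passes when each core handles one region per pass."""
    return -(-num_regions // num_cores)  # ceiling division
```

With 48 regions on 16 cores, every core receives 3 regions and 3 passes suffice. With the unadjusted 54 regions, a fourth pass would be needed in which only 6 of the 16 cores have work.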

The category likelihood map obtained as the output is output, for example through the display 104, with the category having the highest value taken as the category of each partial region. FIG. 3(d) shows the output recognition result.

In step S208, the CPU 105 determines whether evaluation of all input images has been completed. If it determines that input images that have not yet been recognized remain, the processing returns to S202. If it determines that recognition has been completed for all input images, the processing ends.

Although the number of cores on the GPU is used above as the number of processes that the processing system can execute, the number of threads or the maximum number of processes executable at that point in time, for example, may be used instead. As the processing system, it is of course also possible to select a system other than a GPU that is capable of parallel processing, such as a general multi-core CPU or SIMD.

In the above description, the number of divisions is made an integer multiple of the number of GPU processing cores. However, a threshold may be set, for example, so that some deviation is allowed as long as it stays within a certain range. For example, with the threshold set to 80% of the number of processing cores (p) and the number of divisions before adjustment denoted y, it is also effective to adjust the number of divisions when y mod p < 0.8p and not to adjust it otherwise. Here, Y mod X is the remainder when Y is divided by X. This makes it possible to select the number of divisions while balancing recognition accuracy and processing speed.
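The threshold variant can be sketched as follows. The function name and the `ratio` parameter are hypothetical, and keeping y unchanged when y is smaller than p is an assumption. The idea is that when the last pass is already nearly full (the remainder is at least 80% of p), merging regions gains little speed, so the division number is left alone.

```python
def adjust_with_threshold(num_cores, current_count, ratio=0.8):
    """Round down to a multiple of num_cores only if the remainder is small."""
    remainder = current_count % num_cores
    if current_count >= num_cores and remainder < ratio * num_cores:
        return current_count - remainder
    # Remainder is large enough (or the count is below num_cores): keep as is.
    return current_count
```

For p = 16, a count of 54 (remainder 6) is adjusted to 48, whereas a count of 63 (remainder 15) is kept.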

In addition, the above description assumes that recognition accuracy improves as the number of divisions approaches the number used when the dictionary was created. Even with an algorithm that does not use a dictionary, however, an appropriate number of divisions can be specified in advance and a number of divisions close to it can be selected.

Furthermore, the number of divisions can be determined in consideration of the required processing time. For example, if performing the recognition processing with a predetermined number of divisions turns out not to satisfy the required processing time because there are too many divisions, the number of divisions can be set to the largest number that can be processed within the required processing time. For example, if pixel-by-pixel recognition processing would not satisfy the required processing time, pixels may be integrated to reduce their number until the number of divisions satisfies the required processing time.

In the above description, the k-means algorithm is used to divide the image into partial regions, but other clustering methods can also be used. For example, the mean-shift algorithm can be used. In addition, using a division algorithm for which the number of image divisions can be specified in advance makes it possible to divide the image into the number of divisions best suited to the processing system at the input image division stage, so that the steps from the division number setting step through the adjacent partial region integration step can be skipped.

In addition, in the above description, the feature extraction processing for the partial regions is performed before the processing that determines the partial region candidates to be integrated, but it may be performed afterwards. In that case, the candidates may be determined using features different from those extracted by the feature extraction processing. For example, even when a color distribution histogram is used, a 6-dimensional feature with half the number of bins could be used to reduce the amount of calculation. This is particularly effective when the feature extraction processing for the partial regions is heavy and time-consuming.

Furthermore, the above description has considered the "number of divisions", but when a division algorithm that divides an image into regions of approximately the same size is used, for example, the "area of a partial region" can be considered in correspondence with the number of divisions. For example, setting the area of a partial region to 10 × 10 for a 100 × 100 image corresponds to a division number of 100.

As described above, according to the first embodiment, the number of divisions is determined based on the number of processes that the processing system executing the recognition processing can run and on the recognition accuracy of the recognition processing, which is treated as the influence on the processing as a whole. With this configuration, an image can be divided into a number of divisions that allows a plurality of processing cores to be used efficiently while reducing the possibility that recognition accuracy is lowered.

Although image recognition processing has been used as the example in the above description, the technique can also be applied to other processing that uses the features of partial regions. For example, it can be applied to various kinds of processing, such as image quality enhancement processing and main subject detection processing.

(Second Embodiment)
The second embodiment describes a mode in which the number of divisions is adjusted based on the number of processes that the processing system executing the recognition processing can run and on the change in processing time that results from adjusting the number of divisions. In the following, processing that is the same as in the first embodiment is not described again, and only processing that differs from the first embodiment is described.

<Prerequisites>
First, the preconditions of the second embodiment are described. Preconditions not mentioned below are the same as in the first embodiment.

As in the first embodiment, the number of cores on the GPU is used as the number of processes that the processing system executing the recognition processing can run. In this embodiment, the influence on the processing as a whole is taken to be the increase in processing time that occurs when the number of divisions is adjusted and the increase in processing time that occurs when it is not adjusted. That is, the number of divisions is chosen so that, from the viewpoint of processing efficiency, it is an integer multiple of the number of GPU cores and, from the viewpoint of processing time, the overall processing time is shortened.

As the images input to generate partial regions, the input image with a Gaussian filter applied and the input image with a sharpness filter applied are used in addition to the original input image. That is, partial regions are generated for each of the three images. The conditions for dividing into partial regions are the same for every image. The feature extraction processing and the recognition processing for each partial region are also performed under the same conditions regardless of which image the partial region comes from.

The average processing time for adjusting the number of divisions is measured in advance; the average processing time required to integrate one set of partial regions is taken to be 0.1 (milliseconds / 1 GHz · set). In this embodiment, the division number adjustment is performed by the CPU, and the CPU clock frequency is 1 GHz.

The average processing time of the recognition process is also measured in advance; the average processing time required to recognize one partial region is taken to be 1 (millisecond / 1 GHz · piece). Because there are 16 GPU cores, up to 16 partial regions can be processed at once in parallel. That is, any number of partial regions up to 16 takes the same processing time. The clock frequency of the GPU that performs the recognition process is 1 GHz.

<Operation of the device>
FIG. 5 is a flowchart showing the recognition process in the second embodiment. Steps S501 and S513 are the same as S201 and S208 in the first embodiment, so their description is omitted.

In step S502, the CPU 105 applies a Gaussian filter and a sharpness filter to the input image and outputs the two resulting images.

In step S503, the CPU 105 divides the original input image and the two images output in step S502 into partial regions in the same manner as S202 in the first embodiment. Here, the number of partial regions finally obtained is assumed to be "54" for the original input image, "51" for the Gaussian-filtered image, and "58" for the sharpness-filtered image. In step S504, the CPU 105 extracts features from each region obtained in S503 in the same manner as S203 in the first embodiment.

In step S505, the CPU 105 sums the numbers of partial regions obtained in S503. The current counts are 54, 51, and 58, so the total is 163, and this value is recorded in the memory 101.

In step S506, the CPU 105 determines a candidate number of divisions in the same manner as S204 in the first embodiment. With the number of processing cores p = 16 and the number of divisions y = 163, the new number of divisions is x = 160. In the first embodiment this value was used as is to adjust the number of divisions. In the second embodiment, the processing loads (specifically the processing times) are compared, and the number of divisions is adjusted only if using the value obtained in this step shortens the processing time.
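The candidate selection can be sketched as follows. The rule used in S204 is not restated in this section, so rounding down to the largest multiple of the core count is an assumption; it does reproduce both worked examples in this description (163 to 160, and 54 to 48 in the third embodiment). The function name is hypothetical.

```python
def candidate_division_count(y: int, p: int) -> int:
    """Return the largest multiple of p that does not exceed y.

    Assumption: S204 rounds down; this matches the worked examples only.
    """
    if p <= 0 or y < p:
        raise ValueError("need p > 0 and y >= p")
    return (y // p) * p


y, p = 163, 16                      # total partial regions, GPU cores
x = candidate_division_count(y, p)  # candidate number of divisions
sets_to_integrate = y - x           # regions that would have to be merged away
```

With these inputs, `x` is 160 and three partial regions would need to be integrated.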

In step S507, the CPU 105 calculates (predicts) both the increase in processing time that occurs when the number of divisions is adjusted (hereinafter, the adjusted processing time) and the increase that occurs when it is not adjusted (hereinafter, the unadjusted processing time).

First, for the adjusted processing time: going from 163 to 160 requires removing three partial regions, that is, integrating three sets of partial regions. From the preconditions, the adjusted processing time t1 is
t1 = 0.1 (milliseconds / 1 GHz · set) × 1 (GHz) × 3 (sets) = 0.3 (ms)
For the unadjusted processing time, three extra partial regions must be processed on the GPU. From the preconditions, the unadjusted processing time t2 is
t2 = 1 (millisecond / 1 GHz · piece) × 1 (GHz) × ceil(3 (pieces) / 16) = 1 (ms)
Here ceil(X) denotes rounding X up to the nearest integer (the ceiling operation).

In step S508, the CPU 105 compares the adjusted processing time with the unadjusted processing time. If the adjusted processing time < the unadjusted processing time, adjusting the number of divisions reduces the overall processing time, and the process proceeds to step S509. If the adjusted processing time > the unadjusted processing time, leaving the number of divisions unchanged gives the shorter overall processing time, and the process proceeds to step S512. Here t1 < t2, so the process proceeds to step S509.
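As a rough sketch of steps S507 and S508, the two time increases and the branch can be computed as below. The constants come from the preconditions of this embodiment; the function and variable names are hypothetical. The text does not say what happens when the two times are equal, so sending that case to S512 (no adjustment) is an assumption.

```python
import math

GPU_CORES = 16
MERGE_MS_PER_SET = 0.1     # CPU at 1 GHz, one set of regions at a time
RECOG_MS_PER_REGION = 1.0  # GPU at 1 GHz, up to 16 regions per parallel round


def adjusted_increase_ms(sets_to_integrate: int) -> float:
    """Extra time spent merging regions serially on the CPU (t1)."""
    return MERGE_MS_PER_SET * sets_to_integrate


def unadjusted_increase_ms(extra_regions: int) -> float:
    """Extra GPU time for the leftover regions, in whole parallel rounds (t2)."""
    return RECOG_MS_PER_REGION * math.ceil(extra_regions / GPU_CORES)


t1 = adjusted_increase_ms(3)    # 163 -> 160 means three sets to integrate
t2 = unadjusted_increase_ms(3)  # three regions beyond ten full rounds
# Assumption: a tie goes to S512, since the source only covers < and >.
next_step = "S509" if t1 < t2 else "S512"
```

Because the GPU cost is counted in whole rounds, even a single leftover region costs a full 1 ms round, which is why merging three regions at 0.1 ms each wins here.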

In step S509, the CPU 105 determines which partial regions to integrate in order to go from the current number of divisions to the new number determined in S506. That is, it selects the three partial regions to be removed and integrated so that the current number of divisions "163" becomes the new number "160". Here, the three smallest partial regions are selected, and their region numbers are recorded in the memory 101.

In step S510, the CPU 105 splits each partial region selected in S509 into as many pieces as there are partial regions surrounding it, so that the pieces can be absorbed into those surrounding regions. The surrounding partial regions can be identified with the adjacent-region listing technique described for S205 in the first embodiment. Once the number of adjacent regions is known, the selected partial region is split using, for example, the k-means algorithm with that number as the number of clusters.
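The split in S510 might look like the following minimal sketch. It is an illustration only: the description names k-means but does not say which features are clustered, so clustering on pixel coordinates, the deterministic seeding, and the function name are all assumptions.

```python
def kmeans_split(pixels, k, iterations=20):
    """Split a region's (row, col) pixels into k clusters with Lloyd's algorithm.

    Assumption: clustering uses pixel coordinates only.
    Returns one cluster label (0..k-1) per input pixel.
    """
    if not 1 <= k <= len(pixels):
        raise ValueError("k must be between 1 and the number of pixels")
    ordered = sorted(pixels)
    # Deterministic seeding: k evenly spaced pixels in sorted order.
    centers = [ordered[(i * len(ordered)) // k] for i in range(k)]
    labels = [0] * len(pixels)
    for _ in range(iterations):
        # Assignment step: nearest center by squared Euclidean distance.
        labels = [
            min(range(k),
                key=lambda c: (r - centers[c][0]) ** 2 + (q - centers[c][1]) ** 2)
            for r, q in pixels
        ]
        # Update step: move each center to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(pixels, labels) if lab == c]
            if members:
                new_centers.append((sum(m[0] for m in members) / len(members),
                                    sum(m[1] for m in members) / len(members)))
            else:
                new_centers.append(centers[c])  # keep the center of an empty cluster
        if new_centers == centers:
            break
        centers = new_centers
    return labels


# A 4 x 12 region surrounded by three neighbouring regions is split into three pieces.
region = [(r, c) for r in range(4) for c in range(12)]
labels = kmeans_split(region, k=3)
```

Each resulting piece would then be handed to S511, which decides which neighbouring region absorbs it.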

FIG. 6 is a diagram explaining the splitting and integration of partial regions. FIG. 6(a) shows an example before splitting, and FIG. 6(b) an example after splitting. If the splitting produces a piece that is isolated like an enclave and adjoins none of the neighboring partial regions, it is integrated into the partial region with the closest feature amount, as in the first embodiment.

In step S511, the CPU 105 integrates each piece produced in S510 into the adjacent partial region whose features are closest. That is, the partial region numbers are updated in the same manner as S206 in the first embodiment. FIG. 6(c) shows an example of the updated partial regions.

In step S512, the CPU 105 causes the 16 processing cores to execute the recognition process on the 160 partial regions. Each processing core outputs the category inferred from the features, using the SVM given in the preconditions and the SVM dictionary parameters held in the memory 101. Each processing core therefore recognizes 10 (= 160 / 16) partial regions. The recognition results of the 160 partial regions are aggregated per pixel; for a pixel that has two or three recognition results (category likelihoods), the average of those category likelihoods is used as the category likelihood of that pixel.
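The per-pixel aggregation at the end of S512 can be illustrated with a small sketch. The likelihood values and the category order are made up for illustration, and the function name is hypothetical.

```python
def average_likelihoods(results):
    """Element-wise mean of the category likelihood vectors that cover one pixel."""
    if not results:
        raise ValueError("at least one recognition result is required")
    n = len(results)
    return [sum(values) / n for values in zip(*results)]


# Hypothetical pixel covered by one region from each of the three images
# (original, Gaussian-filtered, sharpness-filtered); the values are invented.
pixel_results = [
    [0.6, 0.3, 0.1, 0.0],
    [0.5, 0.4, 0.1, 0.0],
    [0.7, 0.2, 0.1, 0.0],
]
pixel_likelihood = average_likelihoods(pixel_results)

regions_per_core = 160 // 16  # each of the 16 cores handles 10 regions
```

Averaging keeps the likelihoods on the same scale no matter how many results a pixel has.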

The description above considered processing time as the processing load affecting the overall process, but other indicators may be used. For example, the power consumption of the division number adjustment and of the recognition process may be calculated in advance, and the number of divisions may be chosen so that the overall power consumption is lowest. Data transfer time may also be used as an indicator.

As described above, according to the second embodiment, the number of divisions is determined based on the number of executable processes of the processing system that executes the recognition process and, as the influence on the overall process, the change in processing time that occurs when the number of divisions is adjusted. With this configuration, an image can be divided into a number of regions that lets multiple processing cores be used efficiently while reducing the chance of increasing the processing time.

(Third embodiment)
The third embodiment describes another mode of adjusting the number of divisions based on the number of executable processes of the processing system that executes the recognition process and the change in processing time that occurs when the number of divisions is adjusted. In the following, processes that are the same as in the first and second embodiments are not described again; only the differences are described.

<Prerequisites>
First, the preconditions of the third embodiment are described. Preconditions not stated below are the same as in the first and second embodiments.

Assume that a dictionary is created in advance for each number of divisions. Specifically, dictionaries trained on partial regions obtained with the number of k-means classes set to 25, 50, 75, ... in steps of 25 are held in advance.

FIG. 8 shows a table that associates dictionary numbers with numbers of classes. Here, the number of classes "25" corresponds to dictionary number "1", the number of classes "50" to dictionary number "2", and so on.

As in the first embodiment, the number of cores on the GPU is used as the number of executable processes of the processing system that executes the recognition process. In this embodiment, the influence on the overall process is measured by the increase in processing time that occurs when the number of divisions is adjusted and the increase that occurs when it is not adjusted. That is, the number of divisions is chosen to be an integer multiple of the number of GPU cores from the viewpoint of processing efficiency, and so that the overall processing time becomes shorter from the viewpoint of processing time.

The content of the recognition process changes depending on whether the number of divisions has been adjusted. Specifically, when the number of divisions is not adjusted, the recognition process uses the dictionary trained with the same number of classes. When the number of divisions is adjusted, the result from that dictionary is combined with the result from the dictionary whose number of classes is one level smaller. Integrating regions may have reduced the number of classes relative to before the integration, so also using a dictionary with fewer classes aims to improve recognition accuracy. For example, if the image was divided with "50" classes and the final number of divisions is "48", recognition with the dictionary trained for "50" classes (dictionary number "2" in FIG. 8) is supplemented by recognition with the dictionary trained for "25" classes (dictionary number "1"). The two results are combined proportionally. In the above case, the results are weighted as
recognition result for "25" classes : recognition result for "50" classes = (50 - 48) : (48 - 25) = 2 : 23
and then integrated. When the number of divisions is adjusted, recognition must be run with two dictionaries, so the processing time is longer than with one dictionary (1 (millisecond / 1 GHz · piece), as in the second embodiment). Here, the processing time required to recognize one partial region using two dictionaries is taken to be 1.1 (milliseconds / 1 GHz · piece). Because there are 16 GPU cores, up to 16 partial regions can be processed at once in parallel, all in the same processing time. As in the second embodiment, the clock frequency of the GPU that performs the recognition process is 1 GHz.

Unlike the second embodiment, the division number adjustment is performed on the GPU. Its processing time is 0.1 (milliseconds / 1 GHz · set). Because there are 16 GPU cores, up to 16 sets of partial regions can be processed at once in parallel, all in the same processing time.

<Operation of the device>
FIG. 7 is a flowchart showing the recognition process in the third embodiment. Steps S701 to S704, S706 to S707, and S711 are the same as S201 to S204, S205 to S206, and S208 in the first embodiment, so their description is omitted. Here, the number of partial regions finally obtained in S702 is assumed to be "54", and the number of divisions obtained in S704 is assumed to be "48".

In step S705, the CPU 105 calculates and compares the processing time when the number of divisions is adjusted (hereinafter, the adjusted processing time) and the processing time when it is not adjusted (hereinafter, the unadjusted processing time).

If the comparison finds that the adjusted processing time < the unadjusted processing time, adjusting the number of divisions reduces the overall processing time, and the process proceeds to step S706. If it finds that the adjusted processing time > the unadjusted processing time, leaving the number of divisions unchanged gives the shorter overall processing time, and the process proceeds to step S708.

Specifically, consider first the increase in processing time when the number of divisions is adjusted. The adjustment itself reduces 54 partial regions to 48, that is, six sets of partial regions must be integrated. From the preconditions, the increase t11 in processing time due to the division number adjustment is

t11 = 0.1 (milliseconds / 1 GHz · set) × 1 (GHz) × ceil(6 (sets) / 16) = 0.1 (ms)

In addition, in this embodiment the recognition process uses two dictionaries when the number of divisions is adjusted. From the preconditions, the increase t12 in recognition processing time is

t12 = (recognition time with two dictionaries - recognition time with one dictionary) = (1.1 - 1) (milliseconds / 1 GHz · piece) × 1 (GHz) × ceil(48 (pieces) / 16) = 0.3 (ms)

The total adjusted processing time is therefore

t1 = t11 + t12 = 0.1 (ms) + 0.3 (ms) = 0.4 (ms)

On the other hand, if the number of divisions is not adjusted, six extra partial regions must be processed on the GPU. From the preconditions, the resulting increase t2 in processing time is

t2 = 1 (millisecond / 1 GHz · piece) × 1 (GHz) × ceil(6 (pieces) / 16) = 1 (ms)

Thus t1 < t2, which means that adjusting the number of divisions reduces the overall processing time. In this case the process proceeds to step S706. A flag indicating that the number of divisions is adjusted is also set and recorded in the memory 101.
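The comparison in S705 can be sketched as follows, using the preconditions of the third embodiment (merging also runs on the GPU, and the two-dictionary recognition costs 1.1 ms per region instead of 1 ms). The names are hypothetical, and floating-point results are only approximately equal to the hand-computed values.

```python
import math

GPU_CORES = 16


def rounds(items: int) -> int:
    """Number of parallel rounds needed to process `items` work units on the GPU."""
    return math.ceil(items / GPU_CORES)


MERGE_MS = 0.1           # per set of regions, per parallel round
ONE_DICT_MS = 1.0        # recognition with one dictionary
TWO_DICT_MS = 1.1        # recognition with two dictionaries

t11 = MERGE_MS * rounds(6)                      # merge 54 -> 48 regions
t12 = (TWO_DICT_MS - ONE_DICT_MS) * rounds(48)  # extra cost of the second dictionary
t1 = t11 + t12                                  # adjusted increase, about 0.4 ms
t2 = ONE_DICT_MS * rounds(6)                    # unadjusted increase, 1 ms

adjust_flag = t1 < t2  # corresponds to the flag recorded in the memory 101
```

Unlike the second embodiment, the merge cost is counted in parallel rounds, so six merges cost the same as one.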

In step S708, the CPU 105 determines whether the division number adjustment has been performed. That is, it checks whether the flag indicating that the number of divisions is adjusted is set in the memory 101. If the flag is not set, the process proceeds to step S709. If the flag is set, the process proceeds to step S710.

Step S709 performs recognition with one dictionary (the dictionary trained for "50" classes, that is, dictionary number "2" in FIG. 8) and is the same as S512 in the second embodiment.

In step S710, the recognition process is performed with two dictionaries. Because the final number of divisions is "48", the dictionary trained for "50" classes (dictionary number "2") and the dictionary trained for "25" classes (dictionary number "1") are used, and their recognition results are combined proportionally.

That is, in the above case, the results are integrated with the weights "recognition result for "25" classes : recognition result for "50" classes = 2 : 23". For example, suppose that for a certain partial region the result for "25" classes is "organism : nature : object : other = 0.7 : 0.2 : 0.1 : 0" and the result for "50" classes is "organism : nature : object : other = 0.4 : 0.4 : 0.2 : 0". Combining the two results proportionally gives "organism : nature : object : other = 0.424 : 0.384 : 0.192 : 0".
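The proportional combination can be checked with a short sketch. The weight rule follows the (50 - 48) : (48 - 25) example given in the preconditions; the function names are hypothetical.

```python
def blend_weights(trained_classes: int, lower_classes: int, final_regions: int):
    """Weights for (lower-class dictionary, trained-class dictionary).

    The closer the final region count is to a dictionary's class count,
    the larger that dictionary's weight.
    """
    w_lower = trained_classes - final_regions
    w_trained = final_regions - lower_classes
    return w_lower, w_trained


def blend(lower_result, trained_result, w_lower, w_trained):
    """Weighted average of two category likelihood vectors."""
    total = w_lower + w_trained
    return [(w_lower * a + w_trained * b) / total
            for a, b in zip(lower_result, trained_result)]


w25, w50 = blend_weights(trained_classes=50, lower_classes=25, final_regions=48)
result_25 = [0.7, 0.2, 0.1, 0.0]  # organism, nature, object, other
result_50 = [0.4, 0.4, 0.2, 0.0]
combined = blend(result_25, result_50, w25, w50)
```

Here the weights come out as 2 and 23, and `combined` is approximately 0.424, 0.384, 0.192, 0, matching the figures in the text.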

As described above, according to the third embodiment, the number of divisions is determined based on the number of executable processes of the processing system that executes the recognition process and, as the influence on the overall process, the change in processing time that occurs when the number of divisions is adjusted. With this configuration, more accurate image recognition can be performed efficiently.

(Fourth embodiment)
The fourth embodiment is a modification of the second embodiment in which the number of divisions is determined by additionally taking into account, as an influence on the overall process, the accuracy degradation of the recognition process. In the following, processes that are the same as in the second embodiment are not described again; only the differences are described.

In this embodiment, the accuracy lost through division number adjustment is first defined as the accuracy degradation cost e. The cost e has the same dimension as processing time (ms) and is assumed to be proportional to the number m of sets of partial regions integrated by the adjustment.

e = a (ms / piece) × m (pieces) = am (ms)

The proportionality constant a is chosen in advance so that the accuracy degradation cost e at the point where the accuracy after adjustment falls outside the allowable range equals the processing time when the number of divisions is not adjusted. For example, if it is known statistically that the required accuracy is no longer met at m = 10 on average, and that the average processing time without adjustment at that point is 3 (ms), then

am = a × 10 = 3, and therefore a = 0.3 (ms / piece)

A total cost f is then defined, and the final adjustment of the number of divisions is based on it. The total cost f has the same dimension as processing time (ms).

f = t (increase in processing time) (ms) + e (ms)

A larger total cost f represents a greater influence on the overall process.

Applying these formulas to the second embodiment: adjusting the number of divisions integrates three sets of partial regions, so the accuracy degradation cost e is

e = 0.3 × 3 = 0.9 (ms)

From the second embodiment, the increase in processing time is 0.3 (ms), so the total cost is

f1 = 0.9 + 0.3 = 1.2 (ms)

When the number of divisions is not adjusted, the accuracy degradation cost e is 0 (ms). From the second embodiment, the increase in processing time is 1 (ms), so the total cost is

f2 = 1 + 0 = 1 (ms)

Since f1 > f2, in this case the process continues without adjusting the number of divisions (that is, with the number of divisions left at "163").
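The total-cost decision can be sketched as below, using the calibration example (m = 10, 3 ms) and the time increases from the second embodiment. The function names are hypothetical, and treating a tie as "do not adjust" is an assumption, since the text only covers f1 > f2.

```python
def degradation_coefficient(m_limit: int, unadjusted_time_ms: float) -> float:
    """Proportionality constant a, chosen so that a * m_limit equals the unadjusted time."""
    return unadjusted_time_ms / m_limit


def total_cost_ms(time_increase_ms: float, merged_sets: int, a: float) -> float:
    """f = t + e, where e = a * m is the accuracy degradation cost."""
    return time_increase_ms + a * merged_sets


a = degradation_coefficient(m_limit=10, unadjusted_time_ms=3.0)  # 0.3 ms per set

f1 = total_cost_ms(time_increase_ms=0.3, merged_sets=3, a=a)  # adjust: about 1.2 ms
f2 = total_cost_ms(time_increase_ms=1.0, merged_sets=0, a=a)  # do not adjust: 1 ms

# Assumption: adjust only when it is strictly cheaper overall.
adjust = f1 < f2
```

With the accuracy penalty included, the adjustment that was worthwhile in the second embodiment is no longer chosen.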

The formulas for the accuracy degradation cost and the total cost given above are only examples, and other definitions are possible. For example, terms such as data transfer cost and power consumption may be added to the total cost.

As described above, according to the fourth embodiment, the number of divisions is determined based on the number of executable processes of the processing system that executes the recognition process and, as the influence on the overall process, the change in processing time that occurs when the number of divisions is adjusted. With this configuration, it is possible to determine a number of divisions that enables efficient recognition processing while balancing processing time against accuracy degradation.

(Other examples)
The present invention can also be realized by supplying a program that implements one or more functions of the above embodiments to a system or apparatus via a network or a storage medium, and having one or more processors in a computer of that system or apparatus read and execute the program. It can also be realized by a circuit (for example, an ASIC) that implements one or more functions.

101 memory; 102 camera; 103 input device; 104 display; 105 CPU; 106 GPU; 107 processing core

Claims

An image processing apparatus that performs processing using a plurality of processing units, comprising:
dividing means for dividing an input image into a plurality of partial regions;
determining means for determining a second division number based on the number of the plurality of processing units, a first division number that is the number of the plurality of partial regions, and an influence of adjusting the number of the plurality of partial regions from the first division number to the second division number;
adjusting means for adjusting some of the partial regions of the first division number so that the input image is divided into the second division number of regions; and
control means for performing control so that the plurality of processing units execute predetermined processing on the partial regions of the second division number.

The image processing apparatus according to claim 1, wherein the determination unit determines the second division number so as to be an integral multiple of the number of the plurality of processing units.

The image processing apparatus according to claim 2, wherein the control unit equally distributes the second divided number of partial areas to the plurality of processing units.

The image processing apparatus according to claim 1, further comprising a derivation unit for deriving a feature amount of each partial region,
wherein the adjustment unit integrates adjacent partial regions having similar feature amounts among the partial regions of the first division number.

The image processing apparatus according to claim 4, wherein the predetermined process is an image recognition process based on a feature amount of a partial region.

The image processing apparatus according to claim 5, wherein the image recognition process includes a plurality of processes under different conditions and a process of proportionally combining the results of the plurality of processes.

The image processing apparatus according to claim 1, wherein the influence is a degree of deterioration of accuracy in the predetermined processing by the plurality of processing units.

The image processing apparatus according to claim 1, wherein the influence is an increase in the sum of the time required for adjustment by the adjusting unit and the time required for the predetermined processing by the plurality of processing units.

The image processing apparatus according to claim 1, further comprising:
first predicting means for predicting a first processing load, which is the processing load when the plurality of processing units execute the predetermined processing on the partial regions of the first division number; and
second predicting means for predicting a second processing load, which is the sum of the processing load of the adjusting means and the processing load when the plurality of processing units execute the predetermined processing on the partial regions of the second division number,
wherein, when the first processing load is smaller than the second processing load, the adjusting means does not perform the adjustment, and the control means performs control so that the plurality of processing units execute the predetermined processing on the partial regions of the first division number.

The image processing apparatus according to claim 9, wherein the processing load includes any one of processing time, power consumption, and data transfer time.

A control method of an image processing apparatus that performs processing using a plurality of processing units, comprising:
a division step of dividing an input image into a plurality of partial regions;
a determination step of determining a second division number based on the number of the plurality of processing units, a first division number that is the number of the plurality of partial regions, and an influence of adjusting the number of the plurality of partial regions from the first division number to the second division number;
an adjustment step of adjusting some of the partial regions of the first division number so that the input image is divided into the second division number of regions; and
a control step of performing control so that the plurality of processing units execute predetermined processing on the partial regions of the second division number.

A program for causing a computer to function as each means of the image processing apparatus according to any one of claims 1 to 10.