JP2013141199A

JP2013141199A - Image processing apparatus, program, image processing method, and imaging device

Info

Publication number: JP2013141199A
Application number: JP2012206298A
Authority: JP
Inventors: Takeshi Matsuo; 武史松尾; Hiroko Kobayashi; 寛子小林
Original assignee: Nikon Corp
Current assignee: Nikon Corp
Priority date: 2011-12-06
Filing date: 2012-09-19
Publication date: 2013-07-18

Abstract

PROBLEM TO BE SOLVED: To composite text into an image so that a viewer can easily read the text. SOLUTION: An image processing apparatus includes: an image input unit that inputs image data; an edge detection unit that detects edges in the image data input by the image input unit; a text input unit that inputs text data; a region determination unit that determines a compositing region for the text data in the image data on the basis of the edges detected by the edge detection unit; and a compositing unit that composites the text data into the compositing region determined by the region determination unit.

Description

The present invention relates to an image processing apparatus, a program, an image processing method, and an imaging apparatus.

Techniques for superimposing text related to a captured image on that image have been disclosed (see, for example, Patent Document 1). In the technique described in Patent Document 1, a composite image is generated by superimposing text on a non-important region of the captured image, that is, a region other than the important region in which a relatively important subject appears. Specifically, a region in which a person appears is classified as an important region, and text is superimposed within a non-important region that does not include the center of the image.

Patent Document 1: JP 2007-96816 A

However, the technique described in Patent Document 1 does not consider readability when text is superimposed on an image. For example, when text is superimposed on a region containing a complex texture, the outlines of the font used to display the text overlap the edges of the texture, and the readability of the text may deteriorate. That is, the text may become difficult to read.

The present invention has been made in view of the above points, and its object is to provide an image processing apparatus, a program, an image processing method, and an imaging apparatus capable of compositing text into an image so that a viewer can easily read the text.

The present invention has been made to solve the above problem. One aspect of the present invention is an image processing apparatus including: an acquisition unit that acquires image data and text data; a detection unit that detects edges in the image data acquired by the acquisition unit; a region determination unit that determines, based on the edges detected by the detection unit, a region of the image data in which the text data is to be placed; and an image generation unit that generates an image in which the text data is placed in the region determined by the region determination unit.

Another aspect of the present invention is an image processing apparatus including: an image input unit that inputs image data; an edge detection unit that detects edges in the image data input by the image input unit; a text input unit that inputs text data; a region determination unit that determines, based on the edges detected by the edge detection unit, a compositing region for the text data in the image data; and a compositing unit that composites the text data into the compositing region determined by the region determination unit.

Another aspect of the present invention is a program for causing a computer to execute: a step of inputting image data; a step of inputting text data; a step of detecting edges in the input image data; a step of determining, based on the detected edges, a compositing region for the text data in the image data; and a step of compositing the text data into the determined compositing region.

Another aspect of the present invention is an image processing method including: a step in which an image processing apparatus inputs image data; a step in which the image processing apparatus inputs text data; a step in which the image processing apparatus detects edges in the input image data; a step in which the image processing apparatus determines, based on the detected edges, a compositing region for the text data in the image data; and a step in which the image processing apparatus composites the text data into the determined compositing region.

Another aspect of the present invention is an imaging apparatus including the image processing apparatus described above.

An image processing apparatus according to another aspect of the present invention includes: a detection unit that detects edges in image data; a region determination unit that determines, based on the positions of the edges detected by the detection unit, a placement region of the image data in which characters are to be placed; and an image generation unit that generates an image in which the characters are placed in the placement region determined by the region determination unit.

According to the present invention, text can be composited into an image so that a viewer can easily read the text.

A functional block diagram showing an example of the imaging apparatus according to the first embodiment of the present invention.
A block diagram showing the functional configuration of the image processing unit according to the first embodiment.
An image diagram showing an example of an input image, cost images, and a composite image according to the first embodiment.
A flowchart showing the procedure of still image compositing processing according to the first embodiment.
A flowchart showing the procedure of moving image compositing processing according to the first embodiment.
A block diagram showing the functional configuration of the image processing unit according to the second embodiment.
A flowchart showing the procedure of compositing processing according to the second embodiment.
A block diagram showing the functional configuration of the image processing unit according to the third embodiment.
A flowchart showing the procedure of compositing processing according to the third embodiment.
An image diagram showing a method of calculating the sum of costs within the text rectangular region.
A diagram schematically showing an example of a process for extracting feature quantities of a captured image.
A diagram schematically showing another example of a process for extracting feature quantities of a captured image.
A flowchart schematically showing a method of determining a smile level.
A diagram showing an example of an output image from the image processing apparatus.
A diagram showing another example of an output image from the image processing apparatus.
A schematic block diagram showing the internal configuration of the image processing unit of the imaging apparatus.
A flowchart showing the flow of determining a representative color.
A conceptual diagram showing an example of processing in the image processing unit.
A conceptual diagram showing an example of processing in the image processing unit.
A conceptual diagram showing the result of clustering performed on the main region shown in FIG. 18.
An example of an image to which text has been added by the text addition unit.
Another example of an image to which text has been added by the text addition unit.
A diagram showing an example of a correspondence table between colors and words.
A diagram showing an example of a correspondence table for distant view images (second scene images).
A diagram showing an example of a correspondence table for other images (third scene images).

Embodiments of the present invention are described in detail below with reference to the drawings.
[First Embodiment]
FIG. 1 is an example of a functional block diagram of the imaging apparatus 100 according to the present embodiment.
As shown in FIG. 1, the imaging apparatus 100 includes an imaging unit 110, a buffer memory unit 130, an image processing unit 140, a display unit 150, a storage unit 160, a communication unit 170, an operation unit 180, and a CPU (central processing unit) 190.

The imaging unit 110 includes an optical system 111, an image sensor 119, and an A/D conversion unit 120. The optical system 111 includes one or more lenses.

The image sensor 119, for example, converts an optical image formed on its light receiving surface into an electrical signal and outputs the signal to the A/D conversion unit 120.

When a still image capture instruction is received via the operation unit 180, the image sensor 119 stores the resulting image data in the storage medium 200, via the A/D conversion unit 120 and the image processing unit 140, as captured image data of a still image. When a moving image capture instruction is received via the operation unit 180, the image sensor 119 stores the image data of a moving image captured continuously at predetermined intervals in the storage medium 200, via the A/D conversion unit 120 and the image processing unit 140, as captured image data of a moving image. While no capture instruction has been received via the operation unit 180, the image sensor 119, for example, continuously outputs the image data it obtains to the display unit 150, via the A/D conversion unit 120 and the image processing unit 140, as through-image data (captured images).

Note that the optical system 111 may be attached to and integrated with the imaging apparatus 100, or may be detachably attached to the imaging apparatus 100.

The A/D conversion unit 120 performs analog-to-digital conversion on the electrical signal produced by the image sensor 119 and outputs the resulting digital signal as captured image data (a captured image).

That is, the imaging unit 110 is controlled by the CPU 190 based on the set imaging conditions (for example, aperture value and exposure value), forms an optical image on the image sensor 119 via the optical system 111, and generates a captured image based on that optical image after it has been converted into a digital signal by the A/D conversion unit 120.

The operation unit 180 includes, for example, a power switch, a shutter button, a cross key, a confirm button, and other operation keys. It accepts the user's operation input when operated by the user and outputs the input to the CPU 190.

The image processing unit 140 performs image processing on the image data stored in the buffer memory unit 130, based on the image processing conditions stored in the storage unit 160. Details of the image processing unit 140 are described later. The image data stored in the buffer memory unit 130 is the image data input to the image processing unit 140: for example, the captured image data or through-image data described above, or captured image data read from the storage medium 200.

The display unit 150 is, for example, a liquid crystal display, and displays image data, operation screens, and the like. For example, the display unit 150 displays a captured image to which text has been added by the image processing unit 140.

The storage unit 160 stores various kinds of information.
The buffer memory unit 130 temporarily stores image data captured by the imaging unit 110. The communication unit 170 is connected to a removable storage medium 200 such as a card memory, and writes captured image data to, reads it from, or erases it from the storage medium 200.

The storage medium 200 is a storage unit detachably connected to the imaging apparatus 100, and stores, for example, captured image data generated by the imaging unit 110. The CPU 190 controls each component of the imaging apparatus 100. The bus 300 is connected to the imaging unit 110, the CPU 190, the operation unit 180, the image processing unit 140, the display unit 150, the storage unit 160, the buffer memory unit 130, and the communication unit 170, and transfers the image data, control signals, and the like output from each unit.

FIG. 2 is a block diagram showing the functional configuration of the image processing unit 140 according to the present embodiment.
The image processing unit (image processing apparatus) 140 includes an image input unit 11, a text input unit 12, a first position input unit 13, an edge detection unit 14, a face detection unit 15, a character size determination unit 16, a cost calculation unit 17, a region determination unit 18, and a compositing unit 19.

The image input unit 11 inputs image data of a still image or a moving image. The image input unit 11 outputs the input image data to the edge detection unit 14 and the character size determination unit 16. The image input unit 11 may input image data via, for example, a network or a storage medium. Hereinafter, the image represented by the image data input to the image input unit 11 is referred to as the input image. An XY coordinate system is defined with the width direction of the rectangular image format of the input image as the X-axis direction and the direction orthogonal to it (the height direction) as the Y-axis direction.

The text input unit 12 inputs text data corresponding to the input image. The text data corresponding to the input image is data about the text to be superimposed on the input image, and includes the text itself, an initial character size, line break positions, the number of lines, the number of columns, and the like. The initial character size is the initial value of the character size of the text, and is the character size designated by the user. The text input unit 12 outputs the input text data to the character size determination unit 16.

The first position input unit 13 receives input of an important position in the input image (hereinafter, the important position (first position)). For example, the first position input unit 13 displays the input image on the display unit 150 and takes the position designated by the user on a touch panel installed on the display unit 150 as the important position. Alternatively, the first position input unit 13 may directly receive the coordinate values (x_0, y_0) of the important position. The first position input unit 13 outputs the coordinate values (x_0, y_0) of the important position to the cost calculation unit 17. If the user does not input an important position, the first position input unit 13 uses a preset position (for example, the center of the input image) as the important position.

The edge detection unit 14 detects edges in the image data input from the image input unit 11 using, for example, the Canny algorithm. The edge detection unit 14 then outputs the image data and data indicating the positions of the edges detected in it to the cost calculation unit 17. Although this embodiment detects edges with the Canny algorithm, other methods may be used, such as edge detection with a differential filter or detection based on the high-frequency components of a two-dimensional Fourier transform.
The face detection unit 15 detects human faces in the image data input from the image input unit 11 by pattern matching or the like. The face detection unit 15 then outputs the image data and data indicating the positions of the human faces detected in it to the cost calculation unit 17.
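As an illustration of the differential-filter alternative mentioned for the edge detection unit 14, the following is a minimal sketch and not the method of the embodiment, which uses the Canny algorithm. The function name detect_edges and the threshold parameter are hypothetical; the threshold is assumed to be a fraction of the largest gradient magnitude in the image.

```python
import numpy as np

def detect_edges(gray, threshold=0.25):
    """Return a boolean edge mask from central-difference gradients.

    gray: 2-D array (height x width) of intensities.
    threshold: hypothetical cut-off, as a fraction of the maximum
               gradient magnitude found in the image.
    """
    gray = np.asarray(gray, dtype=float)
    gy, gx = np.gradient(gray)            # derivatives along Y (rows) and X (columns)
    magnitude = np.hypot(gx, gy)
    peak = magnitude.max()
    if peak == 0.0:                       # flat image: no edges anywhere
        return np.zeros(gray.shape, dtype=bool)
    return magnitude >= threshold * peak

# A dark left half and a bright right half produce a vertical edge
# along the boundary between columns 3 and 4.
image = np.zeros((6, 8))
image[:, 4:] = 1.0
edges = detect_edges(image)
```

The resulting mask plays the role of the "data indicating the positions of the edges" that is passed to the cost calculation unit 17.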

The character size determination unit 16 determines the character size of the text data based on the image size (width and height) of the image data input from the image input unit 11 and the number of lines and number of columns of the text data input from the text input unit 12. Specifically, so that all of the text in the text data can be composited into the image data, the character size determination unit 16 takes as the character size a value f that satisfies the following expression (1).

Here, m is the number of columns of the text data, and l is the number of lines of the text data. L (≥ 0) is a parameter indicating the ratio of the line spacing to the character size. w is the width of the image region in the image data, and h is the height of the image region in the image data. Expression (1) states that the width of the text is smaller than the width of the image region in the image data and that the height of the text is smaller than the height of the image region in the image data.

For example, if the initial character size included in the text data does not satisfy expression (1), the character size determination unit 16 gradually reduces the character size until expression (1) is satisfied. If the initial character size included in the text data does satisfy expression (1), the character size determination unit 16 uses the initial character size as the character size of the text data. The character size determination unit 16 then outputs the text data and its character size to the region determination unit 18.
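The shrinking procedure of the character size determination unit 16 can be sketched as follows. The formula of expression (1) is not given in this text, so the sketch assumes the form f·m < w and f·(l + (l − 1)·L) < h, which matches the stated meaning (text width smaller than image width, text height smaller than image height). The function names and the step parameter are hypothetical.

```python
def fits(f, m, l, L, w, h):
    """Assumed form of expression (1): the text block is strictly
    smaller than the image region in both directions."""
    text_width = f * m                      # m characters of size f per line
    text_height = f * (l + (l - 1) * L)     # l lines plus (l - 1) gaps of L * f
    return text_width < w and text_height < h

def determine_char_size(initial_size, m, l, L, w, h, step=1):
    """Reduce the character size from initial_size until fits() holds.

    step is a hypothetical decrement; the search stops at `step`
    even if the text still does not fit.
    """
    f = initial_size
    while f > step and not fits(f, m, l, L, w, h):
        f -= step
    return f

# 20 columns x 3 lines, line spacing ratio 0.5, on a 640 x 480 image.
size = determine_char_size(40, m=20, l=3, L=0.5, w=640, h=480)
```

With these inputs the width condition is the binding one: a size of 32 gives a text width of exactly 640, which is not strictly smaller than the image width, so the search settles on 31.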

The cost calculation unit 17 calculates the cost of each coordinate position (x, y) in the image data based on the positions of the edges, the positions of human faces, and the important position in the image data. The cost represents importance within the image data. For example, the cost calculation unit 17 calculates the cost of each position so that positions where the edge detection unit 14 detected an edge have a high cost. The cost calculation unit 17 also makes the cost higher the closer a position is to the important position and lower the farther it is from the important position. In addition, the cost calculation unit 17 makes the cost high in regions containing a human face.

Specifically, the cost calculation unit 17 first generates a global cost image c_g(x, y) indicating the cost based on the important position (x_0, y_0), using, for example, the Gaussian function shown in the following expression (2).

Here, x_0 is the X coordinate value of the important position, and y_0 is the Y coordinate value of the important position. S_1 (> 0) is a parameter that determines how the cost spreads in the width direction (X-axis direction), and S_2 (> 0) is a parameter that determines how the cost spreads in the height direction (Y-axis direction). The parameters S_1 and S_2 can be set by the user, for example on a setting screen. Changing S_1 and S_2 adjusts the shape of the distribution in the global cost image. Although this embodiment generates the global cost image with a Gaussian function, the global cost image may be generated with any function whose value increases toward the center, such as a cosine function ((cos(πx) + 1)/2, where −1 ≤ x ≤ 1), a triangular (peaked) piecewise-linear function that takes its maximum at the origin x = 0, or a Lorentzian function (1/(ax^2 + 1), where a is a constant).
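A minimal sketch of a Gaussian global cost image follows. The formula of expression (2) is not given in this text, so the form exp(−(x − x_0)²/S_1 − (y − y_0)²/S_2) used here is an assumption chosen to be consistent with the roles described for S_1 and S_2; the function name global_cost is hypothetical.

```python
import numpy as np

def global_cost(width, height, x0, y0, s1, s2):
    """Global cost image c_g(x, y), indexed as c_g[y, x].

    Assumed form: exp(-(x - x0)^2 / s1 - (y - y0)^2 / s2),
    which equals 1 at the important position and decays toward 0.
    """
    ys, xs = np.mgrid[0:height, 0:width]
    return np.exp(-((xs - x0) ** 2) / s1 - ((ys - y0) ** 2) / s2)

# Important position at the center of a 9 x 7 image.
c_g = global_cost(width=9, height=7, x0=4, y0=3, s1=10.0, s2=5.0)
```

Larger values of s1 or s2 widen the distribution along the X or Y axis, respectively, which is the adjustment the text attributes to the parameters S_1 and S_2.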

Next, the cost calculation unit 17 generates a face cost image c_f(x, y) indicating the cost based on the positions of human faces, using the following expressions (3) and (4).

Here, (x^(i), y^(i)) is the center position of the i-th face (1 ≤ i ≤ n) among the n detected faces, and s^(i) is the size of the i-th face. That is, the cost calculation unit 17 generates a face cost image in which pixels in human face regions have the value 1 and pixels outside face regions have the value 0.
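The face cost image can be sketched as a binary mask. Expressions (3) and (4) are not given in this text; the sketch assumes each face region is a square of side s^(i) centered on (x^(i), y^(i)), which is one reading of "center position" and "size". The function name face_cost and the tuple layout are hypothetical.

```python
import numpy as np

def face_cost(width, height, faces):
    """Face cost image c_f(x, y), indexed as c_f[y, x].

    faces: iterable of (cx, cy, size) tuples, one per detected face.
    Pixels inside any assumed square face region are 1, others 0.
    """
    ys, xs = np.mgrid[0:height, 0:width]
    mask = np.zeros((height, width), dtype=bool)
    for cx, cy, size in faces:
        half = size / 2.0
        mask |= (np.abs(xs - cx) <= half) & (np.abs(ys - cy) <= half)
    return mask.astype(float)

# One face of size 4 centered at (5, 4) in a 10 x 8 image.
c_f = face_cost(width=10, height=8, faces=[(5, 4, 4)])
```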

Next, the cost calculation unit 17 generates an edge cost image c_e(x, y) indicating the cost based on the edges, using the following expression (5).

That is, the cost calculation unit 17 generates an edge cost image in which pixels in edge portions have the value 1 and pixels in regions other than edges have the value 0. An edge portion may be just the position of an edge, or may be a region that includes the position of the edge and its surroundings.
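The two readings of "edge portion" (the edge positions alone, or the edge positions plus their surroundings) can be sketched as follows. The function name edge_cost and the margin parameter, which sets how many pixels of surroundings to include, are hypothetical; the square neighborhood is an assumption.

```python
import numpy as np

def edge_cost(edge_mask, margin=0):
    """Edge cost image c_e(x, y): 1 on edge portions, 0 elsewhere.

    margin = 0 uses the edge positions only; margin > 0 also marks
    every pixel within `margin` pixels (square neighborhood) of an edge.
    """
    mask = np.asarray(edge_mask, dtype=bool)
    if margin > 0:
        h, w = mask.shape
        padded = np.pad(mask, margin, mode="constant", constant_values=False)
        grown = np.zeros_like(mask)
        # OR together all shifted copies of the mask within the margin.
        for dy in range(2 * margin + 1):
            for dx in range(2 * margin + 1):
                grown |= padded[dy:dy + h, dx:dx + w]
        mask = grown
    return mask.astype(float)

single = np.zeros((5, 5), dtype=bool)
single[2, 2] = True
c_e_exact = edge_cost(single)             # edge position only
c_e_wide = edge_cost(single, margin=1)    # edge position and its 3 x 3 surroundings
```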

The cost calculation unit 17 then generates a final cost image c(x, y) based on the global cost image, the face cost image, and the edge cost image, using the following expression (6).

Here, C_g (≥ 0) is the weighting coefficient for the global cost image, C_f (≥ 0) is the weighting coefficient for the face cost image, and C_e (≥ 0) is the weighting coefficient for the edge cost image. The user can change the ratio of the parameters C_g, C_e, and C_f on a setting screen or the like. The final cost image c(x, y) given by expression (6) is normalized so that 0 ≤ c(x, y) ≤ 1. The cost calculation unit 17 outputs the image data and its final cost image to the region determination unit 18. The parameters C_g, C_e, and C_f may each be 1 or less.
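One way to combine the three cost images so that the result stays within 0 ≤ c(x, y) ≤ 1 is a weighted average, sketched below. The formula of expression (6) is not given in this text; dividing the weighted sum by C_g + C_f + C_e is an assumption that satisfies the stated normalization when each input image lies in [0, 1]. The function name final_cost is hypothetical.

```python
import numpy as np

def final_cost(c_g, c_f, c_e, C_g=1.0, C_f=1.0, C_e=1.0):
    """Final cost image as a weighted average of the three cost images.

    Assumed form: (C_g*c_g + C_f*c_f + C_e*c_e) / (C_g + C_f + C_e).
    """
    total = C_g + C_f + C_e
    if total <= 0:
        raise ValueError("at least one weighting coefficient must be positive")
    return (C_g * np.asarray(c_g, dtype=float)
            + C_f * np.asarray(c_f, dtype=float)
            + C_e * np.asarray(c_e, dtype=float)) / total

# Two pixels: one where all costs are maximal, one with only a global cost of 0.5.
c = final_cost(c_g=[[1.0, 0.5]], c_f=[[1.0, 0.0]], c_e=[[1.0, 0.0]],
               C_g=2.0, C_f=1.0, C_e=1.0)
```

Because only the ratio of the coefficients matters in this form, it is consistent with the statement that the user adjusts the ratio of C_g, C_e, and C_f.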

The image processing unit 140 may automatically change the ratio of the parameters C_g, C_e, and C_f according to the input image. For example, when the input image is a landscape image, the parameter C_g is made larger than the other parameters. When the input image is a portrait (an image of a person), the parameter C_f is made larger than the other parameters. When the input image is a building image containing many buildings, the parameter C_e is made larger than the other parameters. Specifically, when the face detection unit 15 detects a human face, the cost calculation unit 17 determines that the input image is a portrait and makes the parameter C_f larger than the other parameters. When the face detection unit 15 does not detect a human face, the cost calculation unit 17 determines that the input image is a landscape image and makes the parameter C_g larger than the other parameters. When the amount of edges detected by the edge detection unit 14 is greater than a predetermined value, the cost calculation unit 17 determines that the input image is a building image and makes the parameter C_e larger than the other parameters.
Alternatively, the image processing unit 140 may have a landscape image mode, a portrait mode, and a building image mode, and may change the ratio of the parameters C_g, C_e, and C_f according to the mode currently set in the image processing unit 140.
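The automatic weighting described above can be sketched as a simple rule. The text does not say which rule takes precedence when an image contains both a face and many edges, nor how the amount of edges is measured; the sketch assumes that a detected face wins, and that the amount of edges is the fraction of edge pixels. The function name and the parameters edge_threshold and boost are hypothetical.

```python
def choose_weights(num_faces, edge_ratio, edge_threshold=0.2, boost=2.0):
    """Return (C_g, C_f, C_e) for the detected scene type.

    num_faces: number of faces found by the face detection unit.
    edge_ratio: assumed measure of edge amount (edge pixels / all pixels).
    """
    if num_faces > 0:                  # portrait: emphasize the face cost
        return (1.0, boost, 1.0)
    if edge_ratio > edge_threshold:    # building image: emphasize the edge cost
        return (1.0, 1.0, boost)
    return (boost, 1.0, 1.0)           # landscape: emphasize the global cost

portrait = choose_weights(num_faces=2, edge_ratio=0.05)
building = choose_weights(num_faces=0, edge_ratio=0.35)
landscape = choose_weights(num_faces=0, edge_ratio=0.05)
```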

When the image data is a moving image, the cost calculation unit 17 calculates, for each coordinate position, the average of the costs of a plurality of frame images included in the moving image data. Specifically, the cost calculation unit 17 acquires frame images of the moving image at predetermined time intervals (for example, every 3 seconds) and generates a final cost image for each acquired frame image. The cost calculation unit 17 then generates an average final cost image by averaging the final cost images of the frame images. The pixel value at each position in the average final cost image is the average of the pixel values at that position in the individual final cost images.
Although this embodiment calculates the average of the costs of the plurality of frame images, the sum may be calculated instead, for example.
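The per-position averaging over sampled frames can be sketched as follows. The function names are hypothetical, and selecting frames by index from a frame rate is an assumption about how the "predetermined time interval" would be applied.

```python
import numpy as np

def sample_frame_indices(num_frames, fps, interval_s=3.0):
    """Indices of the frames taken every interval_s seconds."""
    step = max(int(round(fps * interval_s)), 1)
    return list(range(0, num_frames, step))

def average_cost_image(cost_images):
    """Per-position mean of the final cost images of the sampled frames."""
    stack = np.stack([np.asarray(c, dtype=float) for c in cost_images])
    return stack.mean(axis=0)

indices = sample_frame_indices(num_frames=300, fps=30.0)   # a 10-second clip
avg = average_cost_image([[[0.0, 1.0]], [[1.0, 1.0]]])
```

Replacing mean with sum gives the total-value variant mentioned in the text; the position of the minimum is the same either way, because the two differ only by a constant factor.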

The region determination unit 18 determines the compositing region in which the text is composited into the image data, based on the final cost image input from the cost calculation unit 17 and the character size of the text data input from the character size determination unit 16. Specifically, the region determination unit 18 first calculates the width w_text and height h_text of the text rectangular region, which is the rectangular region in which the text is displayed, based on the number of lines, the number of columns, and the character size of the text data. The text rectangular region corresponds to the compositing region. The region determination unit 18 then calculates, for each coordinate position (x, y), the sum c*_text(x, y) of the costs within the text rectangular region, using the following expression (7).

The area determination unit 18 then takes the coordinate position (x, y) at which the total cost c*_text(x, y) within the text rectangular area is smallest as the text synthesis position. That is, the area determination unit 18 sets, as the text synthesis area, the text rectangular area whose upper-left vertex is the coordinate position (x, y) at which c*_text(x, y) is smallest. The area determination unit 18 outputs the image data, the text data, and data indicating the text synthesis area to the synthesis unit 19. In the present embodiment, the area determination unit 18 determines the synthesis area based on the total cost within the text rectangular area; however, the area with the smallest average cost within the text rectangular area may be used as the synthesis area instead, for example. Alternatively, the area determination unit 18 may use, as the synthesis area, the area with the smallest weighted average cost, where the weights are larger toward the center of the text rectangular area.
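
The search described above can be sketched as follows. This is a minimal illustration rather than the implementation of the embodiment: the function name find_text_area, the use of a summed-area table to speed up the rectangle sums, and the restriction to positions where the rectangle lies entirely inside the image are all assumptions made for this example.

```python
def find_text_area(cost, w_text, h_text):
    """Return (x, y, total) for the upper-left corner with the smallest summed cost."""
    height, width = len(cost), len(cost[0])
    if w_text > width or h_text > height:
        raise ValueError("text rectangle does not fit inside the image")
    # Summed-area table: sat[y][x] holds the sum of cost over rows < y and columns < x.
    sat = [[0.0] * (width + 1) for _ in range(height + 1)]
    for y in range(height):
        for x in range(width):
            sat[y + 1][x + 1] = cost[y][x] + sat[y][x + 1] + sat[y + 1][x] - sat[y][x]
    best = None
    for y in range(height - h_text + 1):
        for x in range(width - w_text + 1):
            # Sum of the w_text x h_text rectangle whose upper-left corner is (x, y).
            total = (sat[y + h_text][x + w_text] - sat[y][x + w_text]
                     - sat[y + h_text][x] + sat[y][x])
            if best is None or total < best[2]:
                best = (x, y, total)
    return best


# Toy final cost image: the low-cost 2x2 block sits in the middle.
cost = [
    [1, 1, 1, 1],
    [1, 0, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 1, 1],
]
x, y, total = find_text_area(cost, w_text=2, h_text=2)
```

Dividing each rectangle sum by w_text * h_text would give the average-cost variant mentioned above; because the rectangle size is fixed, it selects the same position.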

The synthesis unit 19 receives the image data, the text data, and the data indicating the text synthesis area as input. The synthesis unit 19 generates and outputs image data of a composite image in which the text of the text data is superimposed on the synthesis area of the image data.

FIG. 3 is a conceptual diagram illustrating an example of an input image, cost images, and a composite image according to the present embodiment.
FIG. 3A shows the input image. FIG. 3B shows the global cost image. In the example shown in FIG. 3B, the center of the input image is the important position. As shown in FIG. 3B, the pixel value of the global cost image is closer to “1” nearer the center and closer to “0” farther from the center. FIG. 3C shows the face cost image. As shown in FIG. 3C, the pixel value of the face cost image is “1” in the region of a person's face and “0” elsewhere. FIG. 3D shows the edge cost image. As shown in FIG. 3D, the pixel value of the edge cost image is “1” at edge portions and “0” elsewhere.
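
The three cost images described for FIG. 3 can be sketched as below. The Gaussian-style falloff of the global cost, the spread parameters s1 and s2, the (left, top, width, height) face boxes, and the edge-strength threshold are assumptions chosen to reproduce the pixel values described above; they are not taken from the embodiment.

```python
import math


def global_cost_image(width, height, px, py, s1, s2):
    """Cost is 1 at the important position (px, py) and falls toward 0 with distance."""
    return [
        [math.exp(-((x - px) ** 2 / s1 + (y - py) ** 2 / s2)) for x in range(width)]
        for y in range(height)
    ]


def face_cost_image(width, height, face_boxes):
    """Cost is 1 inside any face box (left, top, box_width, box_height), 0 elsewhere."""
    image = [[0] * width for _ in range(height)]
    for left, top, bw, bh in face_boxes:
        for y in range(max(top, 0), min(top + bh, height)):
            for x in range(max(left, 0), min(left + bw, width)):
                image[y][x] = 1
    return image


def edge_cost_image(edge_strength, threshold):
    """Cost is 1 where the edge strength exceeds the threshold, 0 elsewhere."""
    return [[1 if value > threshold else 0 for value in row] for row in edge_strength]


g = global_cost_image(5, 5, px=2, py=2, s1=4.0, s2=4.0)
f = face_cost_image(5, 5, [(1, 1, 2, 2)])
e = edge_cost_image([[0.1, 0.9], [0.8, 0.2]], threshold=0.5)
```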

FIG. 3E shows the final cost image obtained by combining the global cost image, the face cost image, and the edge cost image. FIG. 3F shows the composite image obtained by superimposing the text on the input image. As shown in FIG. 3F, the text of the text data is superimposed on an area where the total cost in the final cost image is small.

Next, the still image synthesis processing performed by the image processing unit 140 will be described with reference to FIG. 4.
FIG. 4 is a flowchart illustrating the procedure of the still image synthesis processing according to the present embodiment.
First, in step S101, the image input unit 11 accepts input of image data of a still image (hereinafter referred to as still image data).
Next, in step S102, the text input unit 12 accepts input of text data corresponding to the input still image data.
Next, in step S103, the first position input unit 13 accepts input of an important position in the input still image data.

Subsequently, in step S104, the character size determination unit 16 determines the character size of the text data based on the size of the input still image data and the number of rows and columns of the input text data.
Next, in step S105, the face detection unit 15 detects the position of a person's face in the input still image data.
Next, in step S106, the edge detection unit 14 detects the positions of edges in the input still image data.

Subsequently, in step S107, the cost calculation unit 17 generates a global cost image based on the designated important position. That is, the cost calculation unit 17 generates a global cost image in which the cost is higher closer to the important position and lower farther from it.
Next, in step S108, the cost calculation unit 17 generates a face cost image based on the detected position of the person's face. That is, the cost calculation unit 17 generates a face cost image in which the cost is high in the region of a person's face and low elsewhere.
Next, in step S109, the cost calculation unit 17 generates an edge cost image based on the detected edge positions. That is, the cost calculation unit 17 generates an edge cost image in which the cost is high at edge portions and low elsewhere.

Subsequently, in step S110, the cost calculation unit 17 generates a final cost image by combining the generated global cost image, face cost image, and edge cost image.
Next, in step S111, the area determination unit 18 determines the text synthesis area in the still image data based on the generated final cost image and the determined character size of the text data.
Finally, in step S112, the synthesis unit 19 superimposes the text of the text data on the determined synthesis area, thereby synthesizing the still image data and the text data.

Next, the moving image synthesis processing performed by the image processing unit 140 will be described with reference to FIG. 5. FIG. 5 is a flowchart illustrating the procedure of the moving image synthesis processing according to the present embodiment.
First, in step S201, the image input unit 11 accepts input of image data of a moving image (hereinafter referred to as moving image data).
Next, in step S202, the text input unit 12 accepts input of text data corresponding to the input moving image data.
Next, in step S203, the first position input unit 13 accepts designation of an important position in the input moving image data.

Subsequently, in step S204, the character size determination unit 16 determines the character size of the text data based on the size of the moving image data and the number of rows and columns of the text data.
Next, in step S205, the cost calculation unit 17 acquires the first frame image from the moving image data.

Subsequently, in step S206, the face detection unit 15 detects the position of a person's face in the acquired frame image.
Next, in step S207, the edge detection unit 14 detects the positions of edges in the acquired frame image.

Subsequently, in step S208, the cost calculation unit 17 generates a global cost image based on the input important position.
Next, in step S209, the cost calculation unit 17 generates a face cost image based on the detected position of the person's face.
Next, in step S210, the cost calculation unit 17 generates an edge cost image based on the detected edge positions.

Subsequently, in step S211, the cost calculation unit 17 generates a final cost image by combining the generated global cost image, face cost image, and edge cost image.
Next, in step S212, the cost calculation unit 17 determines whether the current frame image is the last frame image in the moving image data.
If the current frame image is not the last frame image (step S212: No), in step S213 the cost calculation unit 17 acquires from the moving image data the frame image that is a predetermined time of t seconds (for example, 3 seconds) after the current frame image, and the processing returns to step S206.

On the other hand, if the current frame image is the last frame image in the moving image data (step S212: Yes), in step S214 the cost calculation unit 17 generates an average final cost image by averaging the final cost images of the frame images. The pixel value at each coordinate position in the average final cost image is the average of the pixel values at that coordinate position in the final cost images of the frame images.

Next, in step S215, the area determination unit 18 determines the text synthesis area in the moving image data based on the generated average final cost image and the determined character size of the text data.
Finally, in step S216, the synthesis unit 19 superimposes the text of the text data on the determined synthesis area, thereby synthesizing the moving image data and the text data.
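
The frame loop of steps S205 to S214 can be sketched as follows. This is a simplified illustration under stated assumptions: frames are addressed by index at a known frame rate, the per-frame final cost image is produced by a caller-supplied function, and the names sampled_frame_indices and average_final_cost are hypothetical.

```python
def sampled_frame_indices(frame_count, fps, interval_seconds):
    """Indices of the first frame and of every frame interval_seconds after it."""
    step = max(1, round(fps * interval_seconds))
    return list(range(0, frame_count, step))


def average_final_cost(frames, indices, final_cost_fn):
    """Average the final cost images of the sampled frames, pixel by pixel."""
    if not indices:
        raise ValueError("at least one frame must be sampled")
    total = None
    for i in indices:
        cost = final_cost_fn(frames[i])
        if total is None:
            total = [row[:] for row in cost]  # copy so the source image is not modified
        else:
            for y, row in enumerate(cost):
                for x, value in enumerate(row):
                    total[y][x] += value
    return [[value / len(indices) for value in row] for row in total]


# Toy "video": 300 one-pixel frames whose cost equals the frame index.
frames = [[[float(i)]] for i in range(300)]
indices = sampled_frame_indices(len(frames), fps=30, interval_seconds=3)
avg = average_final_cost(frames, indices, lambda frame: frame)
```

With 30 frames per second and a 3 second interval, every 90th frame is sampled, and the averaged image is then passed to the area search in place of a single-frame final cost image.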

In the present embodiment, the synthesis area for the entire moving image data is determined based on the average final cost image; however, the synthesis area may instead be determined for each predetermined time span of the moving image data. For example, the image processing unit 140 uses the synthesis area r_1 based on the first frame image as the synthesis area for the frame images from 0 seconds to t-1 seconds, uses the synthesis area r_2 based on the frame image at t seconds as the synthesis area for the frame images from t seconds to 2t-1 seconds, and determines the synthesis area for subsequent frame images in the same manner. This allows the text to be synthesized at an optimum position in accordance with the movement of the subject in the moving image data.
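
A small sketch of this per-interval variant: given one synthesis area per sampled frame, the area for any playback time is looked up by interval index. The function name region_for_time, the (left, top, width, height) tuples, and the clamping to the last area at the end of the video are assumptions for illustration.

```python
def region_for_time(regions, time_seconds, interval_seconds):
    """Pick the synthesis area for a playback time; regions[k] covers [k*t, (k+1)*t)."""
    index = int(time_seconds // interval_seconds)
    return regions[min(index, len(regions) - 1)]


# r_1 is used for 0 s up to (but not including) t, r_2 from t onward.
regions = [(0, 0, 100, 20), (50, 80, 100, 20)]
early = region_for_time(regions, 2.9, interval_seconds=3)
later = region_for_time(regions, 3.0, interval_seconds=3)
```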

As described above, according to the present embodiment, the image processing unit 140 determines the synthesis area for the text based on an edge cost image indicating a cost related to edges in the image data. The text can therefore be synthesized in an area with few edges (that is, an area without complex texture). Because this prevents the outlines of the font used to display the text from overlapping the edges of a texture, the text can be synthesized into the input image so that a viewer can read it easily.

Further, when the position at which text is displayed is fixed, the text may, depending on the content of the input image and the amount of text, overlap the subject, a person or object of interest, the background, or the like, and may spoil the original impression of the input image. Because the image processing unit 140 according to the present embodiment determines the synthesis area for the text based on a face cost image indicating a cost related to a person's face in the image data, the text can be synthesized in an area other than a person's face. In addition, because the image processing unit 140 determines the synthesis area based on a global cost image indicating a cost related to the important position in the image data, the text can be synthesized in an area away from the important position. For example, in many images the subject is in the central portion, so setting the central portion as the important position allows the text to be synthesized in an area other than the subject. Furthermore, because the user can designate the important position in the image processing unit 140 according to the present embodiment, the important position can be changed for each input image; for example, the central portion may be the important position for an input image A and an edge portion may be the important position for an input image B.

Further, according to the present embodiment, the image processing unit 140 determines the synthesis area for the text based on the final cost image obtained by combining the global cost image, the face cost image, and the edge cost image, so the text can be synthesized at a position that is optimal overall.

Incidentally, when the character size is fixed, the size of the text relative to the image data may change drastically depending on the image size of the input image, resulting in a text display that is unsuitable for the viewer. For example, when the character size of the text data is large relative to the input image, the text does not fit entirely within the input image and the sentence cannot be read. According to the present embodiment, the image processing unit 140 changes the character size of the text data in accordance with the image size of the input image, so the entire text can be fitted within the input image.

Further, according to the present embodiment, the image processing unit 140 can also synthesize text into the image data of a moving image. This can be applied, for example, to a service that dynamically displays comments submitted by users within the image while a moving image distributed by broadcast, over the Internet, or the like is being played back. In addition, because the image processing unit 140 determines the synthesis area using the average final cost image of a plurality of frame images, the text can be synthesized in an area that is optimal overall, taking into account the movement of the subject throughout the moving image.

[Second Embodiment]
Next, an image processing unit (image processing apparatus) 140a according to a second embodiment of the present invention will be described.
FIG. 6 is a block diagram illustrating the functional configuration of the image processing unit 140a according to the present embodiment. In this figure, parts identical to those of the image processing unit 140 shown in FIG. 2 are denoted by the same reference numerals, and their description is omitted. The image processing unit 140a includes a second position input unit 21 in addition to the configuration of the image processing unit 140 shown in FIG. 2.
The second position input unit 21 accepts input of a position at which the text is to be synthesized in the image data (hereinafter referred to as the text position (second position)). For example, the second position input unit 21 displays the image data input to the image input unit 11 on the display unit 150 and takes the position designated by the user on a touch panel provided on the display unit 150 as the text position. Alternatively, the second position input unit 21 may directly accept input of the coordinate values (x_1, y_1) of the text position. The second position input unit 21 outputs the coordinate values (x_1, y_1) of the text position to the cost calculation unit 17a.

The cost calculation unit 17a calculates the cost of each coordinate position (x, y) in the image data based on the text position (x_1, y_1) input from the second position input unit 21, the positions of edges in the image data, the position of a person's face, and the important position. Specifically, the cost calculation unit 17a generates the final cost image by combining a text position cost image, which indicates a cost based on the text position (x_1, y_1), with the global cost image, the face cost image, and the edge cost image. The global cost image, the face cost image, and the edge cost image are generated in the same way as in the first embodiment.

The cost calculation unit 17a generates the text position cost image c_t(x, y) by the following equation (8).

Here, S_3 (> 0) is a parameter that determines how the cost spreads in the width direction (X-axis direction), and S_4 (> 0) is a parameter that determines how the cost spreads in the height direction (Y-axis direction). The text position cost image is an image in which the cost is lower closer to the text position (x_1, y_1) and higher farther from it.
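
Equation (8) is not reproduced in this text. The sketch below assumes one form that matches the description, namely c_t(x, y) = 1 - exp(-((x - x_1)^2 / S_3 + (y - y_1)^2 / S_4)), so that the cost is 0 at the text position and approaches 1 far from it. Both this inverted-Gaussian form and the function name are assumptions, not the equation of the embodiment.

```python
import math


def text_position_cost_image(width, height, x1, y1, s3, s4):
    """Cost is 0 at the text position (x1, y1) and rises toward 1 with distance."""
    return [
        [1.0 - math.exp(-((x - x1) ** 2 / s3 + (y - y1) ** 2 / s4)) for x in range(width)]
        for y in range(height)
    ]


# A larger s3 than s4 makes the cost rise more slowly along X than along Y.
c_t = text_position_cost_image(5, 3, x1=1, y1=1, s3=4.0, s4=1.0)
```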

The cost calculation unit 17a then generates the final cost image c(x, y) by the following equation (9).

Here, C_t (≥ 0) is a parameter giving the weighting coefficient of the text position cost image.
Equation (9) is obtained from equation (6) by adding C_t to the denominator and adding C_t c_t(x, y) to the numerator. When no text position is designated through the second position input unit 21, the cost calculation unit 17a does not generate a text position cost image and generates the final cost image by equation (6) described above. Alternatively, when no text position is designated through the second position input unit 21, the cost calculation unit 17a sets the parameter C_t = 0.
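
Equations (6) and (9) are not reproduced in this text. The description of the numerator and denominator suggests a weighted average of the cost images, and the sketch below assumes that structure. The function name and the unit weights used for the global, face, and edge cost images are placeholders, not values from the embodiment.

```python
def final_cost_image(cost_images, weights):
    """Weighted average of same-sized cost images; the weights must sum to more than 0."""
    weight_sum = sum(weights)
    if weight_sum <= 0:
        raise ValueError("the weights must sum to a positive value")
    height, width = len(cost_images[0]), len(cost_images[0][0])
    return [
        [
            sum(w * img[y][x] for w, img in zip(weights, cost_images)) / weight_sum
            for x in range(width)
        ]
        for y in range(height)
    ]


# Toy 1x2 cost images: global, face, edge, and text position.
c_g = [[1.0, 0.0]]
c_f = [[0.0, 0.0]]
c_e = [[0.0, 1.0]]
c_t = [[1.0, 0.0]]
with_text = final_cost_image([c_g, c_f, c_e, c_t], [1.0, 1.0, 1.0, 1.0])
# Setting the text position weight to 0 removes its term from numerator and denominator.
without_text = final_cost_image([c_g, c_f, c_e, c_t], [1.0, 1.0, 1.0, 0.0])
```

Under this assumed form, a zero weight for the text position cost image gives the same result as combining only the other three images, which is consistent with the two fallbacks described above.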

When the image data is a moving image, the cost calculation unit 17a calculates, for each coordinate position, the average of the costs of a plurality of frame images included in the moving image data. Specifically, the cost calculation unit 17a acquires frame images of the moving image at a predetermined time interval (for example, 3 seconds) and generates a final cost image for each acquired frame image. The cost calculation unit 17a then generates an average final cost image by averaging the final cost images of the frame images.

Next, the synthesis processing performed by the image processing unit 140a will be described with reference to FIG. 7. FIG. 7 is a flowchart illustrating the procedure of the synthesis processing according to the present embodiment.
The processing in steps S301 to S303 is the same as the processing in steps S101 to S103 described above.
Following step S303, in step S304, the second position input unit 21 accepts designation of the text position in the input image data.
The processing in steps S305 to S307 is the same as the processing in steps S104 to S106 described above.

Following step S307, in step S308, the cost calculation unit 17a generates a text position cost image based on the designated text position.
The processing in steps S309 to S311 is the same as the processing in steps S107 to S109 described above.

Following step S311, in step S312, the cost calculation unit 17a generates a final cost image by combining the text position cost image, the global cost image, the face cost image, and the edge cost image.
Next, in step S313, the area determination unit 18 determines the text synthesis area in the image data based on the generated final cost image and the determined character size of the text data.
Finally, in step S314, the synthesis unit 19 superimposes the text of the text data on the determined synthesis area, thereby synthesizing the image data and the text data.

In the present embodiment, a text position is designated through the second position input unit 21; however, an area in which the text is to be synthesized may be designated instead, for example. In this case, the cost calculation unit 17a generates a text position cost image in which the pixel value is “0” in the designated area and “1” elsewhere. That is, the cost calculation unit 17a lowers the cost of the designated area.
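
The area-based variant can be sketched as a binary mask. The representation of the designated area as a (left, top, width, height) rectangle and the function name are assumptions; the embodiment does not state how the area is specified.

```python
def region_cost_image(width, height, region):
    """Text position cost image: 0 inside the designated rectangle, 1 elsewhere."""
    left, top, rw, rh = region
    return [
        [0 if left <= x < left + rw and top <= y < top + rh else 1 for x in range(width)]
        for y in range(height)
    ]


# A 4x3 image with a designated 2x2 area whose upper-left corner is (1, 0).
mask = region_cost_image(4, 3, (1, 0, 2, 2))
```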

As described above, according to the present embodiment, the user can designate the position at which the text is to be synthesized, and the image processing unit 140a determines the synthesis area with the cost of the designated text position lowered. This provides the same effects as the first embodiment and, in addition, allows the position designated by the user to be selected preferentially as the synthesis area for the text data.

[Third Embodiment]
Next, an image processing unit (image processing apparatus) 140b according to a third embodiment of the present invention will be described.
FIG. 8 is a block diagram illustrating the functional configuration of the image processing unit 140b according to the present embodiment. In this figure, parts identical to those of the image processing unit 140 shown in FIG. 2 are denoted by the same reference numerals, and their description is omitted. The image processing unit 140b includes a second position input unit 31 in addition to the configuration of the image processing unit 140 shown in FIG. 2.
The second position input unit 31 accepts input of a text position (second position) in either the X-axis direction (width direction) or the Y-axis direction (height direction). The text position is the position at which the text is to be synthesized in the image data. For example, the second position input unit 31 displays the image data input to the image input unit 11 on the display unit 150 and takes the position designated by the user on a touch panel provided on the display unit 150 as the text position. Alternatively, the second position input unit 31 may directly accept input of the X coordinate value x_2 or the Y coordinate value y_2 of the text position. The second position input unit 31 outputs the X coordinate value x_2 or the Y coordinate value y_2 of the text position to the area determination unit 18b.

When the position x_2 in the width direction is designated through the second position input unit 31, the area determination unit 18b fixes the X coordinate value at x_2 in equation (7) described above and finds the Y coordinate value y_min that minimizes c*_text(x_2, y). The area determination unit 18b then takes the position (x_2, y_min) as the synthesis position.

Similarly, when the position y_2 in the height direction is designated through the second position input unit 31, the area determination unit 18b fixes the Y coordinate value at y_2 in equation (7) described above and finds the X coordinate value x_min that minimizes c*_text(x, y_2). The area determination unit 18b then takes the position (x_min, y_2) as the synthesis position.
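
The one-dimensional search with a fixed coordinate can be sketched as follows. The function names, the direct (non-optimized) rectangle sum, and the requirement that the text rectangle stay inside the image are assumptions made for this illustration.

```python
def rectangle_cost(cost, x, y, w_text, h_text):
    """Sum of the cost over the text rectangle whose upper-left corner is (x, y)."""
    return sum(cost[v][u] for v in range(y, y + h_text) for u in range(x, x + w_text))


def find_position_fixed_axis(cost, w_text, h_text, x2=None, y2=None):
    """Fix exactly one coordinate and search the other for the smallest rectangle cost."""
    if (x2 is None) == (y2 is None):
        raise ValueError("specify exactly one of x2 or y2")
    height, width = len(cost), len(cost[0])
    if x2 is not None:
        if not 0 <= x2 <= width - w_text:
            raise ValueError("x2 places the text rectangle outside the image")
        candidates = [(x2, y) for y in range(height - h_text + 1)]
    else:
        if not 0 <= y2 <= height - h_text:
            raise ValueError("y2 places the text rectangle outside the image")
        candidates = [(x, y2) for x in range(width - w_text + 1)]
    return min(candidates, key=lambda p: rectangle_cost(cost, p[0], p[1], w_text, h_text))


cost = [
    [0, 0, 5, 5],
    [9, 9, 1, 1],
    [9, 9, 0, 0],
]
fixed_x = find_position_fixed_axis(cost, w_text=2, h_text=1, x2=2)  # searches y only
fixed_y = find_position_fixed_axis(cost, w_text=2, h_text=1, y2=0)  # searches x only
```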

Next, the synthesis processing performed by the image processing unit 140b will be described with reference to FIG. 9. FIG. 9 is a flowchart illustrating the procedure of the synthesis processing according to the present embodiment.
The processing in steps S401 to S403 is the same as the processing in steps S101 to S103 described above.
Following step S403, in step S404, the second position input unit 31 accepts input of the X coordinate value x_2 or the Y coordinate value y_2 of the text position.
The processing in steps S405 to S411 is the same as the processing in steps S104 to S110 described above.

Following step S411, in step S412, the area determination unit 18b determines the text synthesis area in the image data based on the designated X coordinate value x_2 or Y coordinate value y_2 of the text position, the character size of the text data, and the final cost image.
Finally, in step S413, the synthesis unit 19 superimposes the text of the text data on the determined synthesis area, thereby synthesizing the image data and the text data.

As described above, according to the present embodiment, the width-direction or height-direction coordinate of the position at which the text is to be synthesized can be designated. The image processing unit 140b takes, as the synthesis area, the optimum area based on the final cost image among the positions at the designated width-direction or height-direction coordinate. This allows the text to be superimposed on an area that is both desired by the user and optimal (for example, an area where the text is highly readable, an area without a person's face, or an area other than the important position).

A program for realizing the steps shown in FIG. 4, FIG. 5, FIG. 7, or FIG. 9 may be recorded on a computer-readable recording medium, and the processing of synthesizing the image data and the text data may be performed by causing a computer system to read and execute the program recorded on the recording medium. The “computer system” referred to here may include an OS and hardware such as peripheral devices.
When a WWW system is used, the “computer system” also includes a homepage providing environment (or display environment).
The “computer-readable recording medium” refers to a flexible disk, a magneto-optical disk, a ROM, a writable nonvolatile memory such as a flash memory, a portable medium such as a CD-ROM, or a storage device such as a hard disk built into a computer system.

さらに「コンピュータ読み取り可能な記録媒体」とは、インターネット等のネットワークや電話回線等の通信回線を介してプログラムが送信された場合のサーバやクライアントとなるコンピュータシステム内部の揮発性メモリ（例えばＤＲＡＭ（Dynamic Random Access Memory））のように、一定時間プログラムを保持しているものも含むものとする。
また、上記プログラムは、このプログラムを記憶装置等に格納したコンピュータシステムから、伝送媒体を介して、あるいは、伝送媒体中の伝送波により他のコンピュータシステムに伝送されてもよい。ここで、プログラムを伝送する「伝送媒体」は、インターネット等のネットワーク（通信網）や電話回線等の通信回線（通信線）のように情報を伝送する機能を有する媒体のことをいう。
また、上記プログラムは、前述した機能の一部を実現するためのものであっても良い。
さらに、前述した機能をコンピュータシステムにすでに記録されているプログラムとの組み合わせで実現できるもの、いわゆる差分ファイル（差分プログラム）であっても良い。 Further, the “computer-readable recording medium” also includes a medium that holds a program for a certain period of time, such as a volatile memory (for example, DRAM (Dynamic Random Access Memory)) inside a computer system that serves as a server or a client when the program is transmitted via a network such as the Internet or a communication line such as a telephone line.
The program may be transmitted from a computer system storing the program in a storage device or the like to another computer system via a transmission medium or by a transmission wave in the transmission medium. Here, the “transmission medium” for transmitting the program refers to a medium having a function of transmitting information, such as a network (communication network) such as the Internet or a communication line (communication line) such as a telephone line.
The program may be for realizing a part of the functions described above.
Furthermore, the program may be a so-called difference file (difference program), which realizes the functions described above in combination with a program already recorded in the computer system.

以上、図面を参照してこの発明の一実施形態について詳しく説明してきたが、具体的な構成は上述のものに限られることはなく、この発明の要旨を逸脱しない範囲内において様々な設計変更等をすることが可能である。
例えば、上述した実施形態では、撮像装置１００が画像処理部１４０，１４０ａ，１４０ｂを備えているが、例えば、パーソナルコンピュータ、タブレットＰＣ（Personal Computer）、デジタルカメラや携帯電話機等の端末装置が、画像処理装置である画像処理部１４０，１４０ａ，１４０ｂを備えてもよい。 An embodiment of the present invention has been described in detail above with reference to the drawings. However, the specific configuration is not limited to the one described above, and various design changes and the like can be made without departing from the gist of the present invention.
For example, in the above-described embodiment, the imaging apparatus 100 includes the image processing units 140, 140a, and 140b. However, a terminal device such as a personal computer, a tablet PC (Personal Computer), a digital camera, or a mobile phone may include the image processing units 140, 140a, and 140b, which are image processing devices.

また、上述した実施形態では、画像データにおける全ての領域を合成領域の候補としたが、画像データの余白を考慮し、余白以外の領域を合成領域の候補としてもよい。この場合、文字サイズ決定部１６は、以下の式（１０）を満たすｆを文字サイズとする。 In the above-described embodiment, all the regions in the image data are set as the synthesis region candidates. However, in consideration of the margin of the image data, a region other than the margin may be set as the synthesis region candidate. In this case, the character size determination unit 16 sets f satisfying the following expression (10) as the character size.

ただし、Ｍ_１は幅方向の余白の大きさを示すパラメータであり、Ｍ_２は高さ方向の余白の大きさを示すパラメータである。なお、パラメータＭ_１とパラメータＭ_２とは同じ値（Ｍ_１＝Ｍ_２＝Ｍ）でもよい。コスト算出部１７，１７ａは、画像データにおいて余白を除く領域の最終コスト画像を生成する。また、領域決定部１８，１８ｂは、余白を除く領域（Ｍ_１＜ｘ＜ｗ−Ｍ_１，Ｍ_２＜ｙ＜ｈ−Ｍ_２）から合成領域を選択する。 Here, M_1 is a parameter indicating the size of the margin in the width direction, and M_2 is a parameter indicating the size of the margin in the height direction. The parameters M_1 and M_2 may have the same value (M_1 = M_2 = M). The cost calculation units 17 and 17a generate a final cost image of the area of the image data excluding the margins. In addition, the region determination units 18 and 18b select the composite region from the region excluding the margins (M_1 < x < w − M_1, M_2 < y < h − M_2).

また、本実施形態では、第１の位置入力部１３により重要位置を入力しているが、予め設定された所定位置（例えば、画像データの中央）を重要位置としてグローバルコスト画像を生成してもよい。例えば、画像データの中央を重要位置とする場合、コスト算出部１７，１７ａは、次の式（１１）によりグローバルコスト画像を生成する。 In the present embodiment, the important position is input by the first position input unit 13, but the global cost image may instead be generated with a preset predetermined position (for example, the center of the image data) as the important position. For example, when the center of the image data is the important position, the cost calculation units 17 and 17a generate the global cost image by the following equation (11).

ただし、Ｓ（＞０）はコストの広がり方を決めるパラメータである。 However, S (> 0) is a parameter that determines how the cost spreads.

また、重要位置が予め決っている場合、グローバルコスト画像は画像サイズによって決るため、画像サイズ毎に予めグローバルコスト画像を用意して記憶部１６０に記憶しておいてもよい。コスト算出部１７，１７ａは、入力画像の画像サイズに応じたグローバルコスト画像を記憶部１６０から読み出して最終コスト画像を生成する。これにより、テキストデータを画像データに合成する処理毎にグローバルコスト画像を生成する必要がなくなるため、全体の処理時間が短縮される。 Further, when the important position is determined in advance, the global cost image is determined by the image size. Therefore, a global cost image may be prepared in advance for each image size and stored in the storage unit 160. The cost calculation units 17 and 17a read a global cost image corresponding to the image size of the input image from the storage unit 160 and generate a final cost image. This eliminates the need to generate a global cost image for each process of combining text data with image data, thereby reducing the overall processing time.
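
The reuse of a global cost image per image size can be sketched as follows. This is a minimal illustration under stated assumptions: equation (11) is not reproduced in this text, so the Gaussian-shaped cost centered on the image is only a placeholder, the in-memory cache stands in for the storage unit 160, and the names `global_cost_image` and `spread` are hypothetical.

```python
import math
from functools import lru_cache


@lru_cache(maxsize=None)
def global_cost_image(width, height, spread=0.25):
    """Return a placeholder global cost image for the given image size.

    Assumption: the cost is highest at the image center and decays outward
    with a Gaussian shape. This is NOT equation (11), which is not shown here.
    The lru_cache decorator keeps one image per (width, height, spread), so the
    image is computed only once per image size.
    """
    cx, cy = (width - 1) / 2, (height - 1) / 2
    sx, sy = spread * width, spread * height
    return tuple(
        tuple(
            math.exp(-(((x - cx) / sx) ** 2 + ((y - cy) / sy) ** 2) / 2)
            for x in range(width)
        )
        for y in range(height)
    )


first = global_cost_image(5, 5)   # computed
second = global_cost_image(5, 5)  # read back from the cache
print(first is second)
```

A persistent store keyed by image size would serve the same purpose; the cache here only illustrates that the generation step is skipped on later calls.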

また、上述した実施形態では、人物の顔の領域に基づく顔コスト画像を生成しているが、任意の特徴量（例えば、物体や動物等）に基づくコスト画像を生成してもよい。この場合、コスト算出部１７，１７ａは、特徴量の領域のコストが高い特徴量コスト画像を生成する。例えば、コスト算出部１７，１７ａは、物体認識等により検出した特徴量の領域の画素値を「１」とし、その他の領域の画素値を「０」とする特徴量コスト画像を生成する。そして、コスト算出部１７は、特徴量コスト画像に基づいて最終コスト画像を生成する。 In the above-described embodiment, a face cost image based on a human face region is generated. However, a cost image based on an arbitrary feature amount (for example, an object or an animal) may be generated. In this case, the cost calculation units 17 and 17a generate a feature cost image having a high cost in the feature value region. For example, the cost calculation units 17 and 17 a generate a feature amount cost image in which the pixel value of the feature amount area detected by object recognition or the like is set to “1” and the pixel values of other regions are set to “0”. Then, the cost calculation unit 17 generates a final cost image based on the feature amount cost image.

また、領域決定部１８，１８ｂは、テキスト矩形領域内のコストの総和ｃ^＊ _ｔｅｘｔ（ｘ，ｙ）を算出する前に、次の式（１２）により、予め全ての座標位置（ｘ，ｙ）に対して微分画像を生成しておいてもよい。 Further, before calculating the total cost c*_text(x, y) in the text rectangular area, the region determination units 18 and 18b may generate a differential image for all coordinate positions (x, y) in advance by the following equation (12).

この場合、領域決定部１８，１８ｂは、テキスト矩形領域内のコストの総和ｃ^＊ _ｔｅｘｔ（ｘ，ｙ）を次の式（１３）により算出する。 In this case, the region determination units 18 and 18b calculate the total cost c*_text(x, y) in the text rectangular area by the following equation (13).

図１０は、テキスト矩形領域内のコストの総和の算出方法を示すイメージ図である。
本図に示すように、式（１３）を用いると、４回の演算でテキスト矩形領域内のコストの総和ｃ^＊ _ｔｅｘｔ（ｘ，ｙ）を算出することができる。これにより、上述した式（７）によりテキスト矩形領域内のコストの総和ｃ^＊ _ｔｅｘｔ（ｘ，ｙ）を算出する場合と比べて処理時間を短縮することができる。 FIG. 10 is an image diagram showing a method of calculating the total cost in the text rectangular area.
As shown in this figure, when equation (13) is used, the total cost c*_text(x, y) in the text rectangular area can be calculated with four operations. As a result, the processing time can be shortened compared to the case where the total cost c*_text(x, y) in the text rectangular area is calculated by the above-described equation (7).
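
The idea of precomputing an image so that a rectangle sum needs only four lookups can be sketched as follows. Equations (12) and (13) are not reproduced in this text, so this minimal sketch assumes that the precomputed image is a cumulative-sum table (often called a summed-area table), which is one common way to obtain such a sum. The names `build_cumulative_image`, `rect_sum`, and `cost` are hypothetical.

```python
def build_cumulative_image(cost):
    """Return table T where T[y][x] is the sum of cost over rows < y and columns < x.

    Assumption: this cumulative table stands in for the image precomputed by
    equation (12), which is not reproduced here.
    """
    h = len(cost)
    w = len(cost[0]) if h else 0
    table = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += cost[y][x]
            table[y + 1][x + 1] = table[y][x + 1] + row_sum
    return table


def rect_sum(table, x, y, rect_w, rect_h):
    """Sum of cost inside the rectangle whose top-left corner is (x, y).

    Only four table lookups are needed, regardless of the rectangle size.
    """
    return (table[y + rect_h][x + rect_w] - table[y][x + rect_w]
            - table[y + rect_h][x] + table[y][x])


cost = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]
table = build_cumulative_image(cost)
print(rect_sum(table, 1, 1, 2, 2))  # 5 + 6 + 8 + 9 = 28
```

With such a table, the cost of every candidate text rectangle can be evaluated in constant time, instead of summing every pixel inside the rectangle for each candidate position.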

図１１は、画像上に配置される文章を決定するために用いられる撮像画像の特徴量を抽出するプロセスの一例を模式的に示す図である。図１１の例において、画像処理装置の判定部は、撮像画像のシーンを人物画像又は風景画像に分類する。次に、画像処理装置は、そのシーンに応じて、撮像画像の特徴量を抽出する。特徴量は、人物画像の場合には、顔の数（被写体の人数）及び平均色（配色パターン）とすることができ、風景画像の場合には、平均色（配色パターン）とすることができる。これらの特徴量を基に、人物画像用テンプレート又は風景画像用テンプレートに挿入される単語（形容詞等）が決定される。 FIG. 11 is a diagram schematically illustrating an example of a process for extracting a feature amount of a captured image used for determining a sentence arranged on an image. In the example of FIG. 11, the determination unit of the image processing apparatus classifies the scene of the captured image as a person image or a landscape image. Next, the image processing apparatus extracts a feature amount of the captured image according to the scene. The feature amount can be the number of faces (number of subjects) and the average color (color arrangement pattern) in the case of a person image, and can be the average color (color arrangement pattern) in the case of a landscape image. Based on these feature amounts, the words (adjectives and the like) to be inserted into the person image template or the landscape image template are determined.

ここで、図１１の例では、配色パターンは、撮像画像を構成する代表的な複数の色の組み合わせで構成されている。したがって、配色パターンは、撮像画像の平均的な色（平均色）を表すことができる。一例において、配色パターンとして、「第１色」、「第２色」、「第３色」を規定し、これら３種類の色の組み合わせ、すなわち３種類の平均的な色に基づいて、人物画像用、又は風景画像用の文章テンプレートに挿入される単語（形容詞）を決定することができる。 Here, in the example of FIG. 11, the color arrangement pattern is composed of a combination of a plurality of representative colors constituting the captured image. Therefore, the color arrangement pattern can represent the average color of the captured image. In one example, “first color”, “second color”, and “third color” are defined as the color arrangement pattern, and a word (adjective) to be inserted into the text template for a person image or for a landscape image can be determined based on the combination of these three colors, that is, on three average colors.

図１１の例において、撮像画像のシーンは２種類（人物画像及び風景画像）に分類される。他の例において、撮像画像のシーンは、３種類以上（３、４、５、６、７、８、９、又は１０種類以上）に分類することができる。 In the example of FIG. 11, the scene of the captured image is classified into two types (a person image and a landscape image). In another example, the scene of the captured image can be classified into three or more types (3, 4, 5, 6, 7, 8, 9, or 10 types or more).

図１２は、画像上に配置される文章を決定するために用いられる撮像画像の特徴量を抽出するプロセスの別の一例を模式的に示す図である。図１２の例において、撮像画像のシーンを３種類以上に分類することができる。 FIG. 12 is a diagram schematically illustrating another example of a process for extracting a feature amount of a captured image used for determining a sentence arranged on an image. In the example of FIG. 12, the scene of the captured image can be classified into three or more types.

図１２の例において、画像処理装置の判定部は、撮像画像が人物画像（第１モード画像）、遠景画像（第２モード画像）、又はその他の画像（第３モード画像）いずれであるかを判定する。まず、判定部は、図１１の例と同様に、撮像画像が人物画像であるか、人物画像とは異なる画像であるかを判定する。 In the example of FIG. 12, the determination unit of the image processing apparatus determines whether the captured image is a person image (first mode image), a distant view image (second mode image), or another image (third mode image). First, as in the example of FIG. 11, the determination unit determines whether the captured image is a person image or an image different from a person image.

次に、撮像画像が人物画像とは異なる画像である場合、判定部は、撮像画像が遠景画像（第２モード画像）又はその他の画像（第３モード画像）のうちいずれであるか、を判定する。この判定は、例えば、撮像画像に付与された画像識別情報の一部を用いて行うことができる。 Next, when the captured image is an image different from a person image, the determination unit determines whether the captured image is a distant view image (second mode image) or another image (third mode image). This determination can be performed using, for example, a part of the image identification information attached to the captured image.

具体的には、撮像画像が遠景画像かどうかを判定するために、画像識別情報の一部である焦点距離を用いることができる。判定部は、焦点距離が、あらかじめ設定された基準距離以上である場合、撮像画像を遠景画像と判定し、焦点距離が基準距離未満である場合、撮像画像をその他の画像と判定する。以上により、撮像画像が、人物画像（第１モード画像）、遠景画像（第２モード画像）、又はその他の画像（第３モード画像）の３種類にシーン分類される。なお、遠景画像（第２モード画像）の例は、海や山などの風景画像等を含み、その他の画像（第３モード画像）の例は、花及びペット等を含む。 Specifically, in order to determine whether the captured image is a distant view image, the focal length, which is a part of the image identification information, can be used. The determination unit determines that the captured image is a distant view image when the focal length is greater than or equal to a preset reference distance, and determines that the captured image is another image when the focal length is less than the reference distance. In this way, the scene of the captured image is classified into one of three types: a person image (first mode image), a distant view image (second mode image), or another image (third mode image). Note that examples of distant view images (second mode images) include landscape images such as the sea and mountains, and examples of other images (third mode images) include flowers and pets.
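
The three-way scene classification above can be sketched as follows. This is only an illustration: the function name `classify_scene`, the argument names, the millimeter unit, and the example reference value are hypothetical, and the result of the person determination is simply passed in as a flag.

```python
def classify_scene(is_person_image, focal_length_mm, reference_mm):
    """Classify a captured image as a person, distant view, or other image.

    is_person_image: result of the person / non-person determination.
    focal_length_mm: focal length taken from the image identification information.
    reference_mm:    preset reference distance (hypothetical unit and value).
    """
    if is_person_image:
        return "person"        # first mode image
    if focal_length_mm >= reference_mm:
        return "distant view"  # second mode image
    return "other"             # third mode image


print(classify_scene(True, 35.0, 50.0))
print(classify_scene(False, 200.0, 50.0))
print(classify_scene(False, 24.0, 50.0))
```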

図１２の例においても、撮像画像のシーンが分類された後、画像処理装置は、そのシーンに応じて、撮像画像の特徴量を抽出する。 Also in the example of FIG. 12, after the scene of the captured image is classified, the image processing apparatus extracts the feature amount of the captured image according to the scene.

図１２の例において、撮像画像が人物画像（第１シーン画像）の場合、画像上に配置される文章を決定するために用いられる撮像画像の特徴量として、顔の数（被写体の人数）及び／又は笑顔レベルを用いることができる。すなわち、撮像画像が人物画像の場合、顔の数（被写体の人数）の判定結果に加え、又は代えて笑顔レベルの判定結果に基づいて、人物画像用テンプレートに挿入される単語を決定することができる。以下、笑顔レベルの判定方法の一例について、図１３を用いて説明する。 In the example of FIG. 12, when the captured image is a person image (first scene image), the number of faces (number of subjects) and/or the smile level can be used as the feature amount of the captured image used to determine the text arranged on the image. That is, when the captured image is a person image, a word to be inserted into the person image template can be determined based on the determination result of the smile level, in addition to or instead of the determination result of the number of faces (number of subjects). Hereinafter, an example of a smile level determination method will be described with reference to FIG. 13.

図１３の例において、画像処理装置の判定部は、人物画像に対して、顔認識などの方法により顔領域を検出する（ステップＳ５００１）。一例において、口角部分の上り具合を数値化することにより、人物画像の笑顔度が算出される。なお、笑顔度の算出には例えば、顔認識にかかる公知の様々な技術を用いることができる。 In the example of FIG. 13, the determination unit of the image processing apparatus detects a face area from a person image by a method such as face recognition (step S5001). In one example, the degree of smile of a person image is calculated by digitizing the degree of ascending of the mouth corner. For example, various known techniques for face recognition can be used for calculating the smile level.

次に、判定部は、あらかじめ設定された第１の笑顔閾値αと、笑顔度を比較する（ステップＳ５００２）。笑顔度がα以上と判定された場合、判定部は、この人物画像の笑顔レベルは、「笑顔：大」であると判定する。 Next, the determination unit compares the smile level with the first smile threshold α set in advance (step S5002). When it is determined that the smile level is greater than or equal to α, the determination unit determines that the smile level of the person image is “smile: large”.

一方、笑顔度がα未満と判定された場合、判定部は、あらかじめ設定された第２の笑顔閾値βと笑顔度を比較する（ステップＳ５００３）。笑顔度がβ以上と判定された場合、判定部は、この人物画像の笑顔レベルは、「笑顔：中」であると判定する。さらに、笑顔度がβ未満と判定された場合、判定部は、この人物画像の笑顔レベルは、「笑顔：小」であると判定する。 On the other hand, when it is determined that the smile level is less than α, the determination unit compares the smile level with a second smile threshold value β set in advance (step S5003). When it is determined that the smile level is β or more, the determination unit determines that the smile level of this person image is “smile: medium”. Furthermore, when it is determined that the smile level is less than β, the determination unit determines that the smile level of the person image is “smile: small”.
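
The two-threshold determination in steps S5002 and S5003 can be sketched as follows. The function name `classify_smile_level` and the example threshold values are hypothetical; the sketch also assumes that α is larger than β.

```python
def classify_smile_level(smile_degree, alpha, beta):
    """Return the smile level for a smile degree, given thresholds alpha > beta."""
    if smile_degree >= alpha:  # step S5002: compare with the first threshold
        return "large"
    if smile_degree >= beta:   # step S5003: compare with the second threshold
        return "medium"
    return "small"


# Hypothetical thresholds on a 0-to-1 smile degree scale.
ALPHA, BETA = 0.7, 0.4
for degree in (0.9, 0.5, 0.1):
    print(degree, classify_smile_level(degree, ALPHA, BETA))
```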

人物画像の笑顔レベルの判定結果に基づき、人物画像用テンプレートに挿入される単語が決定される。ここで、「笑顔：大」の笑顔レベルに対応する単語の例としては、「喜びいっぱいの」、「とてもいい」等が挙げられる。「笑顔：中」の笑顔レベルに対応する単語の例としては、「嬉しそうな」、「いい穏やかな」等が挙げられる。「笑顔：小」の笑顔レベルに対応する単語の例としては、「真剣そうな」、「クールな」等が挙げられる。 A word to be inserted into the person image template is determined based on the determination result of the smile level of the person image. Here, examples of words corresponding to the smile level of “smile: large” include “full of joy” and “very good”. Examples of words corresponding to the smile level of “smile: medium” include “happy-looking” and “nice and calm”. Examples of words corresponding to the smile level of “smile: small” include “serious-looking” and “cool”.

なお、上記では、人物画像用テンプレートに挿入される単語が、連体形である場合について説明したが、これに限ることはなく、例えば終止形であってもよい。この場合、「笑顔：大」の笑顔レベルに対応する単語の例としては、「笑顔が素敵」、「すごくいい笑顔だね」等が挙げられる。「笑顔：中」の笑顔レベルに対応する単語の例としては、「にこやかだね」、「いい表情」等が挙げられる。「笑顔：小」の笑顔レベルに対応する単語の例としては、「真剣そうです」、「真面目そうです」等が挙げられる。 In the above description, the word inserted into the person image template is in the attributive form. However, the word is not limited to this and may be, for example, in the terminal form. In this case, examples of words corresponding to the smile level of “smile: large” include “nice smile” and “that is a really good smile”. Examples of words corresponding to the smile level of “smile: medium” include “cheerful, isn't it” and “good expression”. Examples of words corresponding to the smile level of “smile: small” include “looks earnest” and “looks serious”.

図１４Ａは、画像処理装置の動作結果を示す出力画像の一例であり、この出力画像は、図１１の例に基づいて決定された文章を有する。図１４Ａの例において、撮像画像は人物画像であると判定され、特徴量としては被写体の人数、及び配色パターン（平均色）が抽出されている。また、配色パターンに応じて、人物画像用テンプレートに挿入される単語が、「重厚な」と決定されている。その結果、図１４Ａに示す出力結果が得られている。すなわち、図１４Ａの例では、撮像画像の平均色に基づいて、「重厚な」の単語（形容詞、連体形）が決定されている。 FIG. 14A is an example of an output image showing the operation result of the image processing apparatus, and this output image has a sentence determined based on the example of FIG. 11. In the example of FIG. 14A, the captured image is determined to be a person image, and the number of subjects and the color arrangement pattern (average color) are extracted as the feature amounts. Further, the word to be inserted into the person image template is determined to be “heavy” according to the color arrangement pattern. As a result, the output result shown in FIG. 14A is obtained. That is, in the example of FIG. 14A, the word “heavy” (adjective, attributive form) is determined based on the average color of the captured image.

図１４Ｂは、画像処理装置の動作結果を示す出力画像の別一例であり、この出力画像は、図１２の例に基づいて決定された文章を有する。図１４Ｂの例において、撮像画像は人物画像であると判定され、特徴量としては被写体の人数、及び笑顔レベルが抽出されている。また、笑顔レベルに応じて、人物画像用テンプレートに挿入される単語が、「いい表情」と決定されている。その結果、図１４Ｂに示す出力結果が得られている。すなわち、図１４Ｂの例では、撮像画像における人物の笑顔レベルに基づいて、「いい表情」の単語（終止形）が決定されている。図１４Ｂの出力結果のように、人物画像に対して笑顔レベルを用いた単語出力を用いることで、画像から受ける印象に比較的近い文字情報を添付することができる。 FIG. 14B is another example of an output image showing the operation result of the image processing apparatus, and this output image has a sentence determined based on the example of FIG. 12. In the example of FIG. 14B, the captured image is determined to be a person image, and the number of subjects and the smile level are extracted as the feature amounts. Further, the word to be inserted into the person image template is determined to be “good expression” according to the smile level. As a result, the output result shown in FIG. 14B is obtained. That is, in the example of FIG. 14B, the word “good expression” (terminal form) is determined based on the smile level of the person in the captured image. As in the output result of FIG. 14B, selecting the word from the smile level of a person image makes it possible to attach character information that is relatively close to the impression received from the image.

図１２に戻り、撮像画像が風景画像（第２シーン画像）又はその他の画像（第３シーン画像）の場合、画像上に配置される文章を決定するために用いられる撮像画像の特徴量として、平均色に代えて、代表色を用いることができる。代表色としては、配色パターンにおける「第１色」、すなわち撮像画像において最も頻度の多い色を用いることができる。あるいは、代表色は、以下に説明するように、クラスタリングを用いて決定することができる。 Returning to FIG. 12, when the captured image is a landscape image (second scene image) or another image (third scene image), as the feature amount of the captured image used to determine the text arranged on the image, A representative color can be used instead of the average color. As the representative color, the “first color” in the color arrangement pattern, that is, the most frequently used color in the captured image can be used. Alternatively, the representative color can be determined using clustering as described below.

図１５は、撮像装置に含まれる画像処理部の内部構成を表す概略ブロック図である。図１５の例において、画像処理装置の画像処理部５０４０は、画像データ入力部５０４２と、解析部５０４４と、文章作成部５０５２と、文章付加部５０５４とを有する。画像処理部５０４０は、撮像部等で生成された画像データについて、各種の解析処理を行うことにより、画像データの内容に関する各種の情報を取得し、画像データの内容と整合性の高いテキストを作成し、画像データにテキストを付加することができる。 FIG. 15 is a schematic block diagram illustrating an internal configuration of an image processing unit included in the imaging apparatus. In the example of FIG. 15, the image processing unit 5040 of the image processing apparatus includes an image data input unit 5042, an analysis unit 5044, a text creation unit 5052, and a text addition unit 5054. The image processing unit 5040 performs various types of analysis processing on the image data generated by the imaging unit or the like, thereby acquiring various types of information regarding the content of the image data, and creating text that is highly consistent with the content of the image data. Then, text can be added to the image data.

解析部５０４４は、色情報抽出部５０４６、領域抽出部５０４８、クラスタリング部５０５０を有しており、画像データに対して解析処理を行う。色情報抽出部５０４６は、画像データから、画像データに含まれる各画素の色情報に関する第１情報を抽出する。典型的には、第１情報は、画像データに含まれる全ての画素のＨＳＶ値を、集計したものである。ただし、第１情報は、類似性が関連づけられた（例えば所定の色空間に関連付けされた）所定の色について、この所定の色が画像中に表れる頻度（画素単位での頻度、面積割合等）を示す情報であればよく、色の解像度や、色空間の種類は限定されない。 The analysis unit 5044 includes a color information extraction unit 5046, a region extraction unit 5048, and a clustering unit 5050, and performs analysis processing on the image data. The color information extraction unit 5046 extracts, from the image data, first information regarding the color information of each pixel included in the image data. Typically, the first information is a tally of the HSV values of all the pixels included in the image data. However, the first information need only be information indicating, for predetermined colors whose similarity is defined (for example, colors associated with a predetermined color space), the frequency at which each predetermined color appears in the image (frequency in pixel units, area ratio, and the like); the color resolution and the type of color space are not limited.

例えば、第１情報は、ＨＳＶ空間ベクトル（ＨＳＶ値）やＲＧＢ値で表されるそれぞれの色について、それぞれの色の画素が、画像データに幾つずつ含まれるか、を表す情報であっても良い。ただし、第１情報における色解像度は、演算処理の負担等を考慮して適宜変更すれば良く、また、色空間の種類もＨＳＶやＲＧＢに限られず、ＣＭＹ、ＣＭＹＫ等であっても良い。 For example, the first information may be information indicating how many pixels of each color are included in the image data for each color represented by an HSV space vector (HSV value) or RGB value. . However, the color resolution in the first information may be changed as appropriate in consideration of the burden of calculation processing, and the type of color space is not limited to HSV or RGB, and may be CMY, CMYK, or the like.

図１６は、解析部５０４４において行われる代表色の決定の流れを表すフローチャートである。図１６のステップＳ５１０１では、画像処理装置が、具体的な画像データ５０６０（撮像画像、図１７参照）の代表色の算出を開始する。 FIG. 16 is a flowchart showing the flow of representative color determination performed in the analysis unit 5044. In step S5101, the image processing apparatus starts calculating the representative color of specific image data 5060 (captured image, see FIG. 17).

ステップＳ５１０２では、画像処理装置の画像データ入力部５０４２が、画像データを解析部５０４４に出力する。次に、解析部５０４４の色情報抽出部５０４６は、画像データに含まれる各画素の色情報に関する第１情報５０６２を算出する（図１７参照）。 In step S5102, the image data input unit 5042 of the image processing apparatus outputs the image data to the analysis unit 5044. Next, the color information extraction unit 5046 of the analysis unit 5044 calculates first information 5062 regarding the color information of each pixel included in the image data (see FIG. 17).

図１７は、ステップＳ５１０２において色情報抽出部５０４６が実施する第１情報５０６２の算出処理を表す概念図である。色情報抽出部５０４６は、画像データ５０６０に含まれる色情報を、各色毎（例えば２５６階調の各階調毎）に集計し、第１情報５０６２を得る。図１７の下図に示すヒストグラムは、色情報抽出部５０４６によって算出された第１情報５０６２のイメージを表している。図１７のヒストグラムの横軸は色であり、縦軸は、画像データ５０６０中に、所定の色の画素がいくつ含まれるかを表している。 FIG. 17 is a conceptual diagram illustrating a calculation process of the first information 5062 performed by the color information extraction unit 5046 in step S5102. The color information extraction unit 5046 aggregates the color information included in the image data 5060 for each color (for example, for each gradation of 256 gradations) to obtain first information 5062. The histogram shown in the lower part of FIG. 17 represents an image of the first information 5062 calculated by the color information extraction unit 5046. The horizontal axis of the histogram in FIG. 17 is color, and the vertical axis represents how many pixels of a predetermined color are included in the image data 5060.
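
The tally performed in step S5102 can be sketched as follows. This minimal illustration assumes that each pixel has already been reduced to a single integer color value between 0 and 255 (for example, a quantized hue); the function name `color_histogram` and this pixel representation are hypothetical.

```python
def color_histogram(pixels, levels=256):
    """Count how many pixels have each color value (0 .. levels - 1)."""
    counts = [0] * levels
    for value in pixels:
        counts[value] += 1
    return counts


# A tiny "image" flattened into a list of color values.
pixels = [0, 0, 3, 255, 3, 3]
first_information = color_histogram(pixels)
print(first_information[0], first_information[3], first_information[255])
```

The resulting list corresponds to the histogram of FIG. 17: the index is the color and the stored value is the number of pixels of that color.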

図１６のステップＳ５１０３では、解析部５０４４の領域抽出部５０４８が、画像データ５０６０における主要領域を抽出する。例えば、領域抽出部５０４８は、図１７に示す画像データ５０６０の中からピントが合っている領域を抽出し、画像データ５０６０の中央部分を主要領域であると認定する（図１８における主要領域５０６４参照）。 In step S5103 of FIG. 16, the region extraction unit 5048 of the analysis unit 5044 extracts the main region in the image data 5060. For example, the region extraction unit 5048 extracts a focused region from the image data 5060 shown in FIG. 17, and recognizes the central portion of the image data 5060 as the main region (see the main region 5064 in FIG. 18). ).

図１６のステップＳ５１０４では、解析部５０４４の領域抽出部５０４８が、ステップＳ５１０５で実施されるクラスタリングの対象領域を決定する。例えば、領域抽出部５０４８は、図１８の上部に示すように、ステップＳ５１０３において画像データ５０６０の一部を主要領域５０６４であると認識し、主要領域５０６４を抽出した場合、クラスタリングの対象を、主要領域５０６４に対応する第１情報５０６２（主要第１情報５０６６）とする。図１８の下図に示すヒストグラムは、主要第１情報５０６６のイメージを表している。 In step S5104 of FIG. 16, the region extraction unit 5048 of the analysis unit 5044 determines the target region of the clustering performed in step S5105. For example, as shown in the upper part of FIG. 18, when the region extraction unit 5048 recognizes in step S5103 that a part of the image data 5060 is the main region 5064 and extracts the main region 5064, it sets the first information 5062 corresponding to the main region 5064 (main first information 5066) as the clustering target. The histogram shown in the lower part of FIG. 18 represents an image of the main first information 5066.

一方、領域抽出部５０４８が、ステップS５１０３において画像データ５０６０における主要領域５０６４を抽出しなかった場合、領域抽出部５０４８は、図１７に示すように、画像データ５０６０の全領域に対応する第１情報５０６２を、クラスタリングの対象に決定する。なお、クラスタリングの対象領域が異なることを除き、主要領域５０６４が抽出された場合と抽出されなかった場合とで、その後の処理に違いはないため、以下では、主要領域が抽出された場合を例に説明を行う。 On the other hand, when the region extraction unit 5048 has not extracted the main region 5064 from the image data 5060 in step S5103, the region extraction unit 5048 sets the first information 5062 corresponding to the entire region of the image data 5060 as the clustering target, as shown in FIG. 17. Except for the difference in the clustering target region, the subsequent processing is the same whether or not the main region 5064 is extracted, so the following description takes the case where the main region is extracted as an example.

図１６のステップＳ５１０５では、解析部５０４４のクラスタリング部５０５０が、ステップＳ５１０４で決定された領域の第１情報５０６２である主要第１情報５０６６に対して、クラスタリングを実施する。図１９は、図１８に示す主要領域５０６４の主要第１情報５０６６について、クラスタリング部５０５０が実施したクラスタリングの結果を表す概念図である。 In step S5105 of FIG. 16, the clustering unit 5050 of the analysis unit 5044 performs clustering on the main first information 5066 that is the first information 5062 of the area determined in step S5104. FIG. 19 is a conceptual diagram showing the result of clustering performed by the clustering unit 5050 on the main first information 5066 of the main region 5064 shown in FIG.

クラスタリング部５０５０は、例えば、２５６階調の主要第１情報５０６６（図１８参照）を、ｋ−ｍｅａｎｓ法によって複数のクラスタに分類する。なお、クラスタリングは、ｋ−ｍｅａｎｓ法（ｋ平均法）に限定されない。他の例において、最短距離法等の他の方法を用いることができる。 For example, the clustering unit 5050 classifies the main first information 5066 (see FIG. 18), which has 256 gradations, into a plurality of clusters by the k-means method. Note that the clustering is not limited to the k-means method. In other examples, other methods such as the nearest neighbor (single linkage) method can be used.

図１９の上部は、各画素がどのクラスタに分類されたかを表しており、図１９の下部に示すヒストグラムは、各クラスタに属する画素の数を示したものである。クラスタリング部５０５０によるクラスタリングによって、２５６階調の主要第１情報５０６６（図１８）は、２５６より少ない（図１９に示す例では３つの）クラスタに分類されている。クラスタリングの結果は、各クラスタの大きさに関する情報と、各クラスタの色（クラスタの色空間上の位置）に関する情報とを含むことができる。 The upper part of FIG. 19 shows the cluster into which each pixel is classified, and the histogram shown in the lower part of FIG. 19 shows the number of pixels belonging to each cluster. Through the clustering performed by the clustering unit 5050, the 256-gradation main first information 5066 (FIG. 18) is classified into fewer than 256 clusters (three in the example shown in FIG. 19). The result of clustering can include information about the size of each cluster and information about the color of each cluster (the position of the cluster in the color space).

ステップＳ５１０６は、解析部５０４４のクラスタリング部５０５０が、クラスタリングの結果に基づき、画像データ５０６０の代表色を決定する。一例において、クラスタリング部５０５０は、図１７に示すようなクラスタリング結果を得た場合、算出された複数のクラスタのうち最も多くの画素を含む最大クラスタ５０７４に属する色を、画像データ５０６０の代表色とする。 In step S5106, the clustering unit 5050 of the analysis unit 5044 determines the representative color of the image data 5060 based on the clustering result. In one example, when the clustering unit 5050 obtains a clustering result as illustrated in FIG. 17, it sets the color belonging to the maximum cluster 5074, which contains the most pixels among the calculated clusters, as the representative color of the image data 5060.
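
Steps S5105 and S5106 can be sketched as follows for a one-dimensional, 256-level color histogram. This is a minimal k-means illustration under stated assumptions: the evenly spaced initial centers, the fixed iteration limit, and the choice of the largest cluster's center as the representative color are assumptions made for the example, and the names `kmeans_histogram` and `representative_color` are hypothetical.

```python
def kmeans_histogram(hist, k=3, iterations=20):
    """Cluster the bins of a color histogram with 1-D k-means.

    Each bin index is a color value and each bin count is its weight.
    Returns (centers, sizes): cluster centers and pixel counts per cluster.
    """
    levels = len(hist)
    # Deterministic initialization: centers spread evenly over the color axis.
    centers = [(i + 0.5) * levels / k for i in range(k)]
    sizes = [0] * k
    for _ in range(iterations):
        sums = [0.0] * k
        sizes = [0] * k
        for value, count in enumerate(hist):
            if count == 0:
                continue
            # Assign the bin to the nearest cluster center.
            nearest = min(range(k), key=lambda c: abs(value - centers[c]))
            sums[nearest] += value * count
            sizes[nearest] += count
        # Move each non-empty cluster center to the weighted mean of its bins.
        new_centers = [sums[c] / sizes[c] if sizes[c] else centers[c]
                       for c in range(k)]
        if new_centers == centers:
            break
        centers = new_centers
    return centers, sizes


def representative_color(hist, k=3):
    """Return the center of the cluster that contains the most pixels."""
    centers, sizes = kmeans_histogram(hist, k)
    largest = max(range(k), key=lambda c: sizes[c])
    return round(centers[largest])


hist = [0] * 256
hist[10], hist[12] = 5, 5      # a small dark cluster
hist[128], hist[130] = 30, 30  # the dominant cluster
hist[240] = 8                  # a small bright cluster
print(representative_color(hist))  # 129
```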

代表色の算出が終了すると、文章作成部５０５２は、代表色に関する情報を用いてテキストを作成し、画像データ５０６０に付与する。 When the calculation of the representative color is completed, the sentence creation unit 5052 creates a text using information on the representative color and assigns the text to the image data 5060.

文章作成部５０５２は、例えば風景画像用の文章テンプレートを読み出し、文章テンプレートの｛日時｝に、画像データ５０６０の生成日時に対応する単語（例えば「２０１２／０３／１０」）を適用する。この場合、解析部５０４４は、画像データ５０６０の生成日時に関する情報を記憶媒体等から検索し、文章作成部５０５２に出力することができる。 The text creation unit 5052 reads a text template for a landscape image, for example, and applies a word (for example, “2012/03/10”) corresponding to the generation date and time of the image data 5060 to {date and time} of the text template. In this case, the analysis unit 5044 can retrieve information related to the generation date and time of the image data 5060 from the storage medium and output the information to the text creation unit 5052.

また、文章作成部５０５２は、文章テンプレートの｛形容詞｝に、画像データ５０６０の代表色に対応する単語を適用する。文章作成部５０５２は、記憶部５０２８から対応情報を読み出して、文章テンプレートに適用する。一例において、記憶部５０２８には、シーン毎に色と単語とが関連付けられたテーブルが保存されている。文章作成部５０５２は、そのテーブルから読み出した単語を用いて文章（例えば「とてもきれいなものを見つけた」）を作成することができる。 In addition, the sentence creation unit 5052 applies a word corresponding to the representative color of the image data 5060 to the {adjective} of the sentence template. The sentence creation unit 5052 reads the correspondence information from the storage unit 5028 and applies it to the sentence template. In one example, the storage unit 5028 stores a table in which colors and words are associated with each scene. The sentence creation unit 5052 can create a sentence (for example, “I found a very beautiful thing”) using words read from the table.

図２０は、上述した一連の処理によってテキストを付与された画像データ５０８０を表示したものである。 FIG. 20 shows image data 5080 to which text is given by the series of processes described above.

図２１は、シーンが遠景画像の場合に、上述と同様の一連の処理によってテキストを付与された画像データの例を示したものである。この場合、シーンが遠景画像に分類され、かつ代表色は青と判定されている。例えば、シーン毎に色と単語とが関連付けられたテーブルにおいて、代表色の「青」に対して単語「爽やかな」等が対応付けられている。 FIG. 21 shows an example of image data to which text is given by a series of processes similar to the above when the scene is a distant view image. In this case, the scene is classified as a distant view image, and the representative color is determined to be blue. For example, in a table in which colors and words are associated with each scene, the word “fresh” is associated with the representative color “blue”.

図２２は、色と単語との対応情報を有するテーブルの一例を示す図である。図２２のテーブルにおいて、人物画像（第１シーン画像）、遠景画像（第２シーン画像）、及びその他の画像（第３シーン画像）、のシーンごとに、色と単語とが関連付けられている。一例において、画像データの代表色が「青」であり、シーンがその他の画像（第３シーン画像）であるとき、文章作成部５０５２は、テーブルの対応情報から、代表色に対応する単語（例えば「上品な」）を選択し、文章テンプレートの｛形容詞｝に適用する。 FIG. 22 is a diagram illustrating an example of a table having correspondence information between colors and words. In the table of FIG. 22, a color and a word are associated with each other for each scene: a person image (first scene image), a distant view image (second scene image), and another image (third scene image). In one example, when the representative color of the image data is “blue” and the scene is another image (third scene image), the sentence creation unit 5052 selects the word corresponding to the representative color (for example, “classy”) from the correspondence information in the table and applies it to {adjective} of the sentence template.
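
The table lookup and template filling can be sketched as follows. Only the two table entries shown are taken from the description (blue with a distant view image, blue with another image); the template wording, the fallback word, and the names `WORD_TABLE`, `TEMPLATE`, and `create_text` are hypothetical.

```python
# Correspondence table keyed by (scene, representative color).
WORD_TABLE = {
    ("distant view", "blue"): "fresh",  # example given for FIG. 21
    ("other", "blue"): "classy",        # example given for FIG. 22
}

# Hypothetical sentence template with {datetime} and {adjective} slots.
TEMPLATE = "{datetime}: I found something {adjective}"


def create_text(scene, representative_color, datetime_text, default="nice"):
    """Fill the template with the word registered for the scene and color.

    The default word is a hypothetical fallback for pairs not in the table.
    """
    adjective = WORD_TABLE.get((scene, representative_color), default)
    return TEMPLATE.format(datetime=datetime_text, adjective=adjective)


print(create_text("other", "blue", "2012/03/10"))
print(create_text("distant view", "blue", "2012/03/10"))
```

Keying the table by scene as well as color lets the same representative color map to different words depending on the scene, as in the blue example above.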

色と単語との対応テーブルは、例えば、ＰＣＣＳ表色系、ＣＩＣＣ表色系、又はＮＣＳ表色系などのカラーチャートに基づき設定することができる。 The correspondence table between colors and words can be set based on a color chart such as a PCCS color system, CICC color system, or NCS color system.

図２３は、ＣＣＩＣ表示系のカラーチャートを用いた、遠景画像（第２シーン画像）用の対応テーブルの一例を示す。図２４は、ＣＣＩＣ表示系のカラーチャートを用いた、その他の画像（第３シーン画像）用の対応テーブルの一例を示す。 FIG. 23 shows an example of a correspondence table for a distant view image (second scene image) using a color chart of the CCIC display system. FIG. 24 shows an example of a correspondence table for other images (third scene images) using a CCIC display color chart.

図２３において、横軸は、代表色の色相に、縦軸は代表色のトーンに対応している。単語の決定に図２３のテーブルを用いることにより、代表色の色相の情報だけでなく、代表色のトーンの情報も併せて単語を決定し、人間が生じる感性に比較的近いテキストを付与することが可能となる。以下、図２３のテーブルを用いた、遠景画像（第２シーン画像）の場合の具体的なテキストの設定例を説明する。なお、その他の画像（第３シーン画像）の場合、図２４のテーブルを用いて同様に設定することができる。 In FIG. 23, the horizontal axis corresponds to the hue of the representative color, and the vertical axis corresponds to the tone of the representative color. By using the table of FIG. 23, a word can be determined from not only the hue information but also the tone information of the representative color, which makes it possible to add text that is relatively close to human sensibility. Hereinafter, a specific text setting example for a distant view image (second scene image) using the table of FIG. 23 will be described. For other images (third scene images), the text can be set in the same way using the table of FIG. 24.

In FIG. 23, when the representative color is determined to fall in area A5001, the name of the representative color (red, orange, yellow, blue, and so on) is applied as is to the word in the text. For example, when the hue of the representative color is "red (R)" and the tone is "vivid tone (V)", an adjective expressing that color, such as "bright red", is selected.

When the representative color is determined to be a color in area A5002, A5003, A5004, or A5005, an adjective associated with that color is applied to the word in the text. For example, when the representative color is determined to be the color of area A5003 (green), adjectives associated with green, such as "pleasant" and "refreshing", are applied.

When the representative color is determined to be a color in areas A5001 to A5005 and its tone is vivid tone (V), strong tone (S), bright tone (B), or pale tone (LT), an adverb expressing degree (for example, "very" or "quite") is placed before the adjective.

When the representative color is determined to be in area A5006, that is, "white tone (white)", a word associated with white, such as "pure" or "clear", is selected. When the representative color is determined to be in area A5007, that is, a gray color (light gray tone: ltGY, medium gray tone: mGY, or dark gray tone: dkGY), a neutral adjective such as "beautiful" or "lovely" is selected. An image whose representative color is white or gray, that is, achromatic, often contains a wide variety of colors overall. Using a word with little connection to color therefore prevents text with an off-target meaning from being attached and allows text relatively close to the impression given by the image to be attached.

When the representative color does not belong to any of areas A5001 to A5007, that is, when the representative color is a low tone (dark grayish tone) or black (black tone), characters (a word or a sentence) with a predetermined meaning can be selected as the text. Examples include "Where am I?" and "Ah!". These words and sentences can be stored in the storage unit of the image processing apparatus as a "tweet dictionary".

In other words, when the representative color is determined to be a low tone or black, it can be difficult to determine the hue of the image as a whole. Even in such a case, using characters with little connection to color, as described above, prevents text with an off-target meaning from being attached and allows text close to the impression given by the image to be attached.
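The selection rules for FIG. 23 can be summarized in a short sketch. It assumes that the chart area of the representative color has already been determined (the mapping from hue and tone to an area is defined by the figure and is not reproduced here) and passes `None` when the color belongs to no area. The function name `choose_text`, the placeholder strings for areas whose adjectives the text does not list, the English glosses, and the choice of always taking the first candidate and the adverb "very" are illustrative assumptions.

```python
# Adjective candidates per chart area. Entries for A5003, A5006 and A5007 are
# English glosses of the words in the description; the others are placeholders
# because the text does not list their adjectives.
AREA_WORDS = {
    "A5002": ["<adjective for A5002>"],
    "A5003": ["pleasant", "refreshing"],  # green
    "A5004": ["<adjective for A5004>"],
    "A5005": ["<adjective for A5005>"],
    "A5006": ["pure", "clear"],           # white tone
    "A5007": ["beautiful", "lovely"],     # gray tones
}
CHROMATIC_AREAS = {"A5001", "A5002", "A5003", "A5004", "A5005"}
EMPHASIS_TONES = {"V", "S", "B", "LT"}  # tones that add a degree adverb
TWEET_DICTIONARY = ["Where am I?", "Ah!"]


def choose_text(area, tone, color_name=""):
    """Return a word or phrase for the given chart area and tone code."""
    if area is None:
        # Low tone or black: use a phrase unrelated to color.
        return TWEET_DICTIONARY[0]
    if area == "A5001":
        word = color_name             # the color's own name is used as is
    else:
        word = AREA_WORDS[area][0]    # first candidate, for determinism
    if area in CHROMATIC_AREAS and tone in EMPHASIS_TONES:
        word = f"very {word}"
    return word
```

For example, `choose_text("A5003", "V")` yields "very pleasant", while `choose_text(None, "Bk")` falls back to the dictionary phrase.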

The above example describes the case where the sentence and the word are uniquely determined by the scene and the representative color, but the invention is not limited to this; exception processing can occasionally be performed when selecting the sentence and the word. For example, once in every several selections (for example, once in 10), the text may be extracted from the "tweet dictionary" described above. Because the displayed text then does not always follow a fixed pattern, the user can be kept from growing tired of it.
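One way to realize the "once in 10" exception is a simple call counter, sketched below. The description does not say whether the exception is triggered by a counter or at random, so the counter, the class name `TextSelector`, and the cycling through dictionary entries are assumptions.

```python
# Phrases named in the description as "tweet dictionary" entries.
TWEET_DICTIONARY = ["Where am I?", "Ah!"]


class TextSelector:
    """Returns the regular text, except on every `period`-th call."""

    def __init__(self, period: int = 10):
        self.period = period
        self.calls = 0

    def select(self, regular_text: str) -> str:
        self.calls += 1
        if self.calls % self.period == 0:
            # Exception processing: cycle through the dictionary entries.
            index = (self.calls // self.period - 1) % len(TWEET_DICTIONARY)
            return TWEET_DICTIONARY[index]
        return regular_text
```

A random draw with probability 1/10 would serve the same purpose; the counter is used here only because it makes the behavior easy to verify.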

The above example describes the case where the sentence addition unit places the text generated by the sentence creation unit at the top or bottom of the image, but the invention is not limited to this; for example, the text can also be placed outside the image (outside the frame).

The above example also describes the case where the position of the text is fixed within the image, but the invention is not limited to this; for example, the text can be displayed so that it flows across the display unit of the image processing apparatus. As a result, the input image is less affected by the text, or the visibility of the text is improved.

The above example describes the case where text is always pasted onto the image, but the invention is not limited to this; for example, the text may be omitted for a person image and pasted for a distant view image or other image.

The above example describes the case where the sentence addition unit determines the display method (font, color, display position, and so on) of the text generated by the sentence creation unit by a predetermined method, but the invention is not limited to this; the text display method can be determined in a wide variety of ways. Some examples follow.

In one example, the user can modify the text display method (font, color, display position) via the operation unit of the image processing apparatus. The user can also change or delete the content (words) of the text. In addition, the user can set the entire text not to be displayed, that is, choose whether the text is shown or hidden.

In one example, the size of the text can be changed according to the scene of the input image. For example, the text can be made smaller when the scene of the input image is a person image and larger when it is a distant view image or other image.

In one example, the text can be emphasized when it is combined with the image data. For example, when the input image is a person image, a speech balloon can be attached to the person and the text placed inside the balloon.

In one example, the display color of the text can be set with reference to the representative color of the input image. Specifically, a color with the same hue as the representative color of the input image but a different tone can be used as the display color of the text. This makes it possible to attach text that harmonizes well with the input image without being overly assertive.

In particular, when the representative color of the input image is white, exception processing may be performed in determining the display color of the text. In this exception processing, for example, the text color can be set to white and the periphery of the text set to black.
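The display-color rule and its white exception can be sketched with Python's standard `colorsys` module. This is only an approximation under stated assumptions: "tone" is modeled here as HLS lightness rather than a tone of a PCCS-style chart, and the function name `text_style`, the near-white threshold, and the size of the lightness shift are illustrative values not found in the description.

```python
import colorsys


def text_style(rep_rgb, white_threshold=0.95, lightness_shift=0.3):
    """Derive a text fill and outline color from a representative RGB color."""
    r, g, b = (c / 255 for c in rep_rgb)
    h, l, s = colorsys.rgb_to_hls(r, g, b)
    if l >= white_threshold:
        # Exception for a (near-)white representative color:
        # white text with a black periphery.
        return {"fill": (255, 255, 255), "outline": (0, 0, 0)}
    # Same hue, different tone: shift lightness away from the representative
    # color while staying inside the valid range [0, 1].
    new_l = l - lightness_shift if l > 0.5 else l + lightness_shift
    new_rgb = colorsys.hls_to_rgb(h, new_l, s)
    return {"fill": tuple(round(c * 255) for c in new_rgb), "outline": None}
```

For a pure blue representative color, the sketch returns a lighter blue with the same hue, which keeps the text related to the image while still distinguishable from the dominant color.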

DESCRIPTION OF SYMBOLS
11: image input unit
12: text input unit
13: first position input unit
14: edge detection unit
15: face detection unit
16: character size determination unit
17, 17a: cost calculation unit
18, 18b: area determination unit
19: synthesis unit
21, 31: second position input unit
100: imaging device
110: imaging unit
140, 140a, 140b: image processing unit
160: storage unit

Claims

An image processing apparatus comprising:
an acquisition unit that acquires image data and text data;
a detection unit that detects an edge in the image data acquired by the acquisition unit;
an area determination unit that determines, based on the edge detected by the detection unit, an area of the image data in which the text data is to be arranged; and
an image generation unit that generates an image in which the text data is arranged in the area determined by the area determination unit.

The image processing apparatus according to claim 1, wherein
the area determination unit determines an area of the image data with few edges as the area in which the text data is to be arranged.

An image processing apparatus comprising:
an image input unit that inputs image data;
an edge detection unit that detects an edge in the image data input by the image input unit;
a text input unit that inputs text data;
an area determination unit that determines a synthesis area for the text data in the image data based on the edge detected by the edge detection unit; and
a synthesis unit that combines the text data into the synthesis area determined by the area determination unit.

The image processing apparatus according to claim 3, wherein
the area determination unit determines an area of the image data with few edges as the synthesis area.

The image processing apparatus according to claim 3 or 4, further comprising
a cost calculation unit that calculates a cost representing the importance of each position in the image data such that the cost is high at positions where an edge is detected by the edge detection unit, wherein
the area determination unit determines, as the synthesis area, an area for which the corresponding cost is low, based on the cost calculated by the cost calculation unit.

The image processing apparatus according to claim 5, further comprising
a first position input unit that inputs a first position in the image data, wherein
the cost calculation unit sets the cost higher at positions closer to the first position input by the first position input unit and lower at positions farther from the first position.

The image processing apparatus according to claim 5 or 6, further comprising
a face detection unit that detects a human face in the image data, wherein
the cost calculation unit raises the cost of an area containing a face detected by the face detection unit.

The image processing apparatus according to any one of claims 5 to 7, further comprising
a second position input unit that inputs a second position at which the text data is to be combined, wherein
the cost calculation unit lowers the cost at the second position input by the second position input unit.

The image processing apparatus according to any one of claims 3 to 8, further comprising
a character size determination unit that determines the character size of the text data such that all of the text of the text data can be combined within the image area of the image data.

The image processing apparatus according to any one of claims 3 to 9, wherein
the image input unit inputs image data of a moving image, and
the area determination unit determines the synthesis area for the text data based on a plurality of frame images included in the image data of the moving image.

A program that causes a computer to execute:
a step of inputting image data;
a step of inputting text data;
a step of detecting an edge in the input image data;
a step of determining a synthesis area for the text data in the image data based on the detected edge; and
a step of combining the text data into the determined synthesis area.

An image processing method comprising:
a step in which an image processing apparatus inputs image data;
a step in which the image processing apparatus inputs text data;
a step in which the image processing apparatus detects an edge in the input image data;
a step in which the image processing apparatus determines a synthesis area for the text data in the image data based on the detected edge; and
a step in which the image processing apparatus combines the text data into the determined synthesis area.

An imaging apparatus comprising: the image processing apparatus according to claim 3.

An image processing apparatus comprising:
a detection unit that detects an edge in image data;
an area determination unit that determines, based on the position of the edge detected by the detection unit, an arrangement area of the image data in which characters are to be arranged; and
an image generation unit that generates an image in which the characters are arranged in the arrangement area determined by the area determination unit.