WO2013042768A1

WO2013042768A1 - Image processing device, program, image processing method, and imaging device

Info

Publication number: WO2013042768A1
Application number: PCT/JP2012/074230
Authority: WO
Inventors: 寛子小林; 司村田; 武史松尾
Original assignee: 株式会社ニコン
Priority date: 2011-09-21
Filing date: 2012-09-21
Publication date: 2013-03-28

Abstract

This image processing device has: a decision unit that decides on a character having a predetermined meaning from a captured image; a judgment unit that judges whether the captured image is an image of a person, or an image differing from the image of a person; a recording unit that records a first syntax that is the syntax of a text used in the image of a person and a second syntax that is the syntax of a text used in the image differing from the image of a person; and an output unit that, when the judgment unit judges that the captured image is the image of a person, outputs a text of the first syntax using the character having the predetermined meaning, and when the judgment unit judges that the captured image is the image differing from the image of a person, outputs a text of the second syntax using the character having the predetermined meaning.

Description

Image processing apparatus, program, image processing method, and imaging apparatus

The present invention relates to an image processing device, a program, an image processing method, and an imaging device.
The present application claims priority based on Japanese Patent Application No. 2011-266143 filed on December 5, 2011, Japanese Patent Application No. 2011-206024 filed on September 21, 2011, Japanese Patent Application No. 2011-266805 filed on December 6, 2011, Japanese Patent Application No. 2011-267882 filed on December 7, 2011, and Japanese Patent Application Nos. 2012-206296, 2012-206297, 2012-206298, and 2012-206299, each filed on September 19, 2012, the contents of which are incorporated herein.

Conventionally, a technique has been disclosed in which the birthday of a specific person, the date of an event, and the like are registered in advance, so that character information such as the name of the person whose birthday falls on the imaging date or the name of an event corresponding to the imaging date is given to a captured image (for example, see Patent Document 1).

Further, in a conventional image processing apparatus that classifies images, the image is divided into predetermined pattern areas, and a histogram of the distribution regarding the color of each area is created. In the conventional image processing apparatus, the most frequently appearing color exceeding a specific threshold is determined as the representative region color of the region. Further, in the conventional image processing apparatus, the feature amount of the region is extracted, and based on the determined feature amount of the region and the representative color, an image from which the feature amount is extracted is defined, and an image dictionary is constructed.
In such a conventional image processing apparatus, for example, the representative color of a large area at the top of an image is extracted, and an image dictionary is constructed by defining “blue sky”, “cloudy sky”, “night sky”, and so on based on the extracted representative color (for example, see Patent Document 2).

Currently, a technique for superimposing a text related to a captured image on the captured image is disclosed (for example, see Patent Document 3). In the technique described in Patent Document 3, a composite image is generated by superimposing text on a non-important area other than an important area in which a relatively important subject is captured in a captured image. Specifically, an area in which a person is shown is classified as an important area, and text is superimposed on a non-important area that does not include the center of the image.

Further, a technique for performing predetermined color conversion on image data is disclosed (for example, see Patent Document 4). In the technique described in Patent Document 4, when image data subjected to predetermined color conversion is sent to a printer, the image data is classified into image-type image data, character image data, and non-image image data other than characters. The first color conversion is applied to the image-type image data, and either the first color conversion or the second color conversion is applied to the character image data and to the non-image image data other than characters.

Patent Document 1: JP-A-2-303282
Patent Document 2: JP 2001-160057 A
Patent Document 3: JP 2007-96816 A
Patent Document 4: JP 2008-293082 A

However, the conventional technique described in Patent Document 1 can only add character information registered in advance by a user to a captured image.

Further, in the prior art described in Patent Document 2, the classification is performed based on the feature amount extracted for each predetermined region and the representative color, which is the most frequently appearing color, so the burden of the arithmetic processing for classifying (labeling) the image was great.

In the prior art described in Patent Document 3, readability when text is superimposed on an image is not considered. For this reason, for example, when text is superimposed on an area where a complex texture exists, the outline of the font used for text display and the edge of the texture may overlap and the readability of the text may deteriorate. That is, the text may be difficult to read.

In the prior art described in Patent Document 4, when text related to an image is superimposed on the image, sufficient consideration has not been given to controlling the font color of the text.

For example, when the font color is fixed, depending on the content of the given image, there may be almost no contrast between the font color of the text and the color of the image area in which the text is drawn, and the readability of the text is significantly reduced.
In addition, when the font color is fixed or a complementary color calculated from image information is used as the font color, the impression of the image may be greatly changed.

An object of one embodiment of the present invention is to provide a technique capable of giving more flexible character information to a captured image.

Another object is to provide an image processing device, an imaging device, and a program that can reduce the load of arithmetic processing for labeling an image.

Another object is to provide an image processing device, a program, an image processing method, and an imaging device that can synthesize text in an image so that a viewer can easily read the text.

Another object is to provide an image processing device, a program, an image processing method, and an imaging device that can synthesize text into an image with an appropriate font color.

An image processing apparatus according to one aspect of the present invention includes: an image input unit that inputs a captured image; a storage unit that stores, as sentence templates that complete a sentence when a word is inserted into a predetermined blank portion, a person image template used for creating a sentence for a person image whose subject is a person and a landscape image template used for creating a sentence for a landscape image whose subject is a landscape; a determination unit that determines whether the captured image is the person image or the landscape image; and a sentence creation unit that reads out, from the storage unit, either the person image template or the landscape image template according to the determination result of the determination unit for the captured image, and creates a sentence for the captured image by inserting a word corresponding to a feature amount or an imaging condition of the captured image into the blank portion of the read sentence template.

An image processing apparatus according to another aspect of the present invention includes: an image input unit to which a captured image is input; a decision unit that decides a text corresponding to at least one of a feature amount of the captured image and an imaging condition of the captured image; a determination unit that determines whether the captured image is a first type image or a second type image different from the first type; a storage unit that stores a first syntax, which is a sentence syntax used for the first type, and a second syntax, which is a sentence syntax used for the second type; and a sentence creation unit that creates a sentence of the first syntax using the text decided by the decision unit when the determination unit determines that the captured image is the first type image, and creates a sentence of the second syntax using the text decided by the decision unit when the determination unit determines that the captured image is the second type image.

An imaging apparatus according to another aspect of the present invention includes: an imaging unit that images a subject and generates a captured image; a storage unit that stores, as sentence templates that complete a sentence when a word is inserted into a predetermined blank portion, a person image template used for creating a sentence for a person image whose subject is a person and a landscape image template used for creating a sentence for a landscape image whose subject is a landscape; a determination unit that determines whether the captured image is the person image or the landscape image; and a sentence creation unit that reads out, from the storage unit, either the person image template or the landscape image template according to the determination result of the determination unit for the captured image, and creates a sentence for the captured image by inserting a word corresponding to a feature amount or an imaging condition of the captured image into the blank portion of the read sentence template.

A program according to another aspect of the present invention causes a computer of an image processing apparatus, the image processing apparatus including a storage unit that stores, as sentence templates that complete a sentence when a word is inserted into a predetermined blank portion, a person image template used for creating a sentence for a person image whose subject is a person and a landscape image template used for creating a sentence for a landscape image whose subject is a landscape, to execute: an image input step of inputting a captured image; a determination step of determining whether the captured image is the person image or the landscape image; and a sentence creation step of reading out, from the storage unit, either the person image template or the landscape image template according to the determination result of the determination step for the captured image, and creating a sentence for the captured image by inserting a word corresponding to a feature amount or an imaging condition of the captured image into the blank portion of the read sentence template.

An image processing apparatus according to another aspect of the present invention includes: a decision unit that decides a character having a predetermined meaning from a captured image; a determination unit that determines whether the captured image is a person image or an image different from the person image; a storage unit that stores a first syntax, which is a syntax of a sentence used for the person image, and a second syntax, which is a syntax of a sentence used for an image different from the person image; and an output unit that outputs a sentence of the first syntax using the character having the predetermined meaning when the determination unit determines that the captured image is the person image, and outputs a sentence of the second syntax using the character having the predetermined meaning when the determination unit determines that the captured image is an image different from the person image.

An image processing apparatus according to another aspect of the present invention includes: an image acquisition unit that acquires captured image data; a scene determination unit that determines a scene from the acquired image data; a main color extraction unit that extracts a main color from the acquired image data based on a frequency distribution of color information; a storage unit in which color information and a first label are stored in association with each other in advance for each scene; and a first label generation unit that reads, from the storage unit, the first label stored in advance in association with the extracted main color and the determined scene, and generates the read first label as a label of the acquired image data.
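As an illustration of this aspect, the lookup of a first label from the main color and the scene can be sketched as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the coarse color names, the `quantize` rule, the scene names, and every entry of `LABEL_TABLE` are hypothetical and do not come from this document.

```python
from collections import Counter

# Hypothetical table: (scene, coarse main color) -> first label.
# None of these entries come from the patent text.
LABEL_TABLE = {
    ("landscape", "blue"): "refreshing",
    ("landscape", "red"): "dramatic",
    ("portrait", "blue"): "calm",
}


def quantize(pixel):
    """Map an (r, g, b) pixel to a coarse color name (illustrative rule only)."""
    r, g, b = pixel
    if r >= g and r >= b:
        return "red"
    if g >= r and g >= b:
        return "green"
    return "blue"


def main_color(pixels):
    """Return the most frequent coarse color in the frequency distribution."""
    histogram = Counter(quantize(p) for p in pixels)
    return histogram.most_common(1)[0][0]


def first_label(scene, pixels, table=LABEL_TABLE):
    """Read the label stored for the (scene, main color) pair, or None."""
    return table.get((scene, main_color(pixels)))
```

Because the table is keyed by scene as well as by color, the same main color can yield different labels for different scenes, which is the point of associating labels with color information for each scene.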

An imaging apparatus according to another aspect of the present invention includes the image processing apparatus described above.

A program according to another aspect of the present invention is a program for causing a computer to execute image processing of an image processing apparatus having an imaging unit, the program causing the computer to execute: an image acquisition procedure for acquiring captured image data; a scene determination procedure for determining a scene from the acquired image data; a main color extraction procedure for extracting a main color from the acquired image data based on a frequency distribution of color information; and a first label generation procedure for reading, from a storage unit in which color information and a first label are stored in association with each other in advance for each scene, the first label stored in association with the extracted main color and the determined scene, and generating the read first label as a label of the acquired image data.

An image processing apparatus according to another aspect of the present invention includes: a scene determination unit that determines whether or not a scene is a person photographing scene; a color extraction unit that extracts color information from the image data when the scene determination unit determines that the scene is not a person photographing scene; a storage unit that stores color information and characters having a predetermined meaning in association with each other; and a reading unit that, when the scene determination unit determines that the scene is not a person photographing scene, reads from the storage unit the characters having the predetermined meaning corresponding to the color information extracted by the color extraction unit.

An image processing apparatus according to another aspect of the present invention includes: an acquisition unit that acquires image data and text data; a detection unit that detects an edge of the image data acquired by the acquisition unit; an area determination unit that determines, based on the edge detected by the detection unit, an area in the image data in which the text data is arranged; and an image generation unit that generates an image in which the text data is arranged in the area determined by the area determination unit.

An image processing apparatus according to another aspect of the present invention includes: an image input unit that inputs image data; an edge detection unit that detects an edge in the image data input by the image input unit; a text input unit that inputs text data; a region determination unit that determines a synthesis region of the text data in the image data based on the edge detected by the edge detection unit; and a synthesis unit that synthesizes the text data in the synthesis region determined by the region determination unit.

A program according to another aspect of the present invention causes a computer to execute: a step of inputting image data; a step of inputting text data; a step of detecting an edge in the input image data; a step of determining a synthesis region of the text data in the image data based on the detected edge; and a step of synthesizing the text data in the determined synthesis region.

An image processing method according to another aspect of the present invention includes: a step in which an image processing apparatus inputs image data; a step in which the image processing apparatus inputs text data; a step in which the image processing apparatus detects an edge in the input image data; a step in which the image processing apparatus determines a synthesis region of the text data in the image data based on the detected edge; and a step in which the image processing apparatus synthesizes the text data in the determined synthesis region.

An imaging apparatus according to another aspect of the present invention includes the above-described image processing apparatus.

An image processing apparatus according to another aspect of the present invention includes: a detection unit that detects an edge of image data; a region determination unit that determines, based on the position of the edge detected by the detection unit, an arrangement region in the image data in which characters are arranged; and an image generation unit that generates an image in which the characters are arranged in the arrangement region determined by the region determination unit.
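The idea of choosing an arrangement region from edge positions can be sketched as follows. This is only an illustrative sketch: the simple difference-based edge detector, the `threshold` value, and the rule of picking the window that contains the fewest edge pixels are assumptions made for the example and are not stated in this section.

```python
def edge_map(gray, threshold):
    """Mark a pixel as an edge when the intensity difference to its right or
    lower neighbor exceeds the threshold (a simple stand-in edge detector)."""
    height, width = len(gray), len(gray[0])
    edges = [[0] * width for _ in range(height)]
    for y in range(height - 1):
        for x in range(width - 1):
            dx = abs(gray[y][x + 1] - gray[y][x])
            dy = abs(gray[y + 1][x] - gray[y][x])
            if max(dx, dy) > threshold:
                edges[y][x] = 1
    return edges


def choose_text_region(gray, box_h, box_w, threshold=32):
    """Return (top, left) of the box_h x box_w window with the fewest edge
    pixels. Assumes the window fits inside the image."""
    edges = edge_map(gray, threshold)
    height, width = len(gray), len(gray[0])
    best = None
    for top in range(height - box_h + 1):
        for left in range(width - box_w + 1):
            cost = sum(
                edges[y][x]
                for y in range(top, top + box_h)
                for x in range(left, left + box_w)
            )
            # Keep the first window with the strictly lowest edge count.
            if best is None or cost < best[0]:
                best = (cost, top, left)
    return best[1], best[2]
```

For a grayscale image whose left half is strongly textured and whose right half is flat, the function returns a window in the flat half, where the outlines of the characters are less likely to overlap texture edges.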

An image processing apparatus according to another aspect of the present invention includes: an image input unit that inputs image data; a text setting unit that sets text data; a text composition area setting unit that sets, in the image data input by the image input unit, a text composition area, which is an area in which the text data set by the text setting unit is synthesized; a font setting unit that, based on the image data input by the image input unit and the text composition area set by the text composition area setting unit, sets a font color in which the tone is changed while the hue is kept unchanged, and sets a font including at least the font color; and a composite image generation unit that generates composite image data, which is image data obtained by synthesizing the text data set by the text setting unit, using the font including at least the font color set by the font setting unit, in the text composition area set by the text composition area setting unit in the image data input by the image input unit.

A program according to another aspect of the present invention causes a computer to execute: a step of inputting image data; a step of setting text data; a step of setting, in the input image data, a text composition area, which is an area in which the set text data is synthesized; a step of setting, based on the input image data and the set text composition area, a font color in which the tone is changed while the hue is kept unchanged with respect to the tone and hue of the PCCS color system, and setting a font including at least the font color; and a step of generating composite image data, which is image data obtained by synthesizing the set text data, using the font including at least the set font color, in the set text composition area in the input image data.

An image processing method according to another aspect of the present invention includes: a step in which an image processing apparatus inputs image data; a step in which the image processing apparatus sets text data; a step in which the image processing apparatus sets, in the input image data, a text composition area, which is an area in which the set text data is synthesized; a step in which the image processing apparatus sets, based on the input image data and the set text composition area, a font color in which the tone is changed while the hue is kept unchanged with respect to the tone and hue of the PCCS color system, and sets a font including at least the font color; and a step in which the image processing apparatus generates composite image data, which is image data obtained by synthesizing the set text data, using the font including at least the set font color, in the set text composition area in the input image data.

An image processing apparatus according to another aspect of the present invention includes: an acquisition unit that acquires image data and text data; an area determination unit that determines a text arrangement area in the image data in which the text data is arranged; a color setting unit that sets a predetermined color for the text data; and an image generation unit that generates an image in which the text data of the predetermined color is arranged in the text arrangement area, wherein the ratio of the hue value of the text arrangement area of the image data to the hue value of the text data is closer to 1 than the ratio of the tone value of the text arrangement area of the image data to the tone value of the text data.

An image processing apparatus according to another aspect of the present invention includes: a determination unit that determines an arrangement region in image data in which characters are arranged; a color setting unit that sets a predetermined color for the characters; and an image generation unit that generates an image in which the characters are arranged in the arrangement region, wherein the color setting unit sets the predetermined color so that the ratio of the hue value of the arrangement region to the hue value of the characters is closer to 1 than the ratio of the tone value of the arrangement region to the tone value of the characters.
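The condition on the two ratios can be checked numerically as in the following minimal sketch. It assumes that hue and tone are each represented by a single positive number; the function name and the sample values are hypothetical, and how such values are derived from an image is not specified at this point in the document.

```python
def hue_ratio_closer_to_one(region_hue, region_tone, text_hue, text_tone):
    """Return True when the hue ratio (region / text) is closer to 1 than the
    tone ratio (region / text).

    Assumes all four values are positive numbers on whatever numeric scale is
    used for hue and tone (an assumption of this sketch).
    """
    hue_ratio = region_hue / text_hue
    tone_ratio = region_tone / text_tone
    return abs(hue_ratio - 1.0) < abs(tone_ratio - 1.0)
```

For example, a character color with the same hue value as the arrangement region but a different tone value satisfies the condition, while a character color with the same tone value and a different hue value does not.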

According to the aspect of the present invention, it is possible to flexibly add character information to a captured image.

In addition, according to the aspect of the present invention, it is possible to realize labeling suitable for an image.

Further, according to the aspect of the present invention, the text can be synthesized in the image so that the viewer can easily read the text.

Further, according to the aspect of the present invention, it is possible to synthesize text into an image with an appropriate font color.

It is an example of a functional block diagram of an image processing apparatus according to one embodiment of the present invention. It is an example of a sentence template stored in the storage unit. It is an example of a sentence template stored in the storage unit. It is an example of a sentence template stored in the storage unit. It is an example of a sentence template stored in the storage unit. It is an example of words stored in the storage unit. It is an example of words stored in the storage unit. It is an explanatory diagram for explaining extraction of the color arrangement pattern of a captured image. It is an explanatory diagram for explaining extraction of the color arrangement pattern of a captured image. It is an explanatory diagram for explaining extraction of the color arrangement pattern of a captured image. It is an explanatory diagram for explaining extraction of the color arrangement pattern of a captured image. It is a flowchart showing an example of the operation of the image processing apparatus. It is a flowchart showing an example of the operation of the image processing apparatus. It is an example of a captured image to which a sentence has been added by the sentence addition unit. It is an example of a captured image to which a sentence has been added by the sentence addition unit. It is an example of a captured image to which a sentence has been added by the sentence addition unit. It is an example of a captured image to which a sentence has been added by the sentence addition unit. It is an example of a captured image to which a sentence has been added by the sentence addition unit. It is an example of a functional block diagram of an imaging apparatus according to another embodiment. It is a schematic block diagram showing the configuration of an imaging system according to another embodiment. It is a block diagram of an image processing unit.
It is a diagram explaining an example of image identification information stored in association with image data on a storage medium. It is a diagram explaining an example of combinations of main colors and first labels stored in the table storage unit. It is a diagram explaining an example of the main colors of image data. It is a diagram explaining an example of labeling of the main colors extracted in FIG. It is a diagram explaining an example of labeling of the main colors extracted in FIG. It is an example of sports image data. It is a diagram showing the color vectors of the sports image data of FIG. 15A. It is an example of portrait image data. It is a diagram showing the color vectors of the portrait image data of FIG. 16A. It is an example of landscape image data. It is a diagram showing the color vectors of the landscape image data of FIG. 17A. It is a diagram explaining an example of first labels for combinations of main colors for each scene. It is a diagram explaining an example of first labels by time, season, and color vector. It is a flowchart of label generation performed by the imaging apparatus. It is a block diagram of an image processing unit according to another embodiment. It is a block diagram of an image processing unit according to another embodiment. It is a flowchart of label generation performed by the imaging apparatus. It is a diagram explaining an example of extracting a plurality of color vectors from image data according to another embodiment. It is a block diagram showing the functional configuration of an image processing unit. It is an image diagram showing an example of an input image. It is an image diagram showing an example of a global cost image. It is an image diagram showing an example of a face cost image. It is an image diagram showing an example of an edge cost image.
It is an image diagram showing an example of a final cost image. It is an image diagram showing an example of a composite image. It is a flowchart showing the procedure of still image synthesis processing. It is a flowchart showing the procedure of moving image synthesis processing. It is a block diagram showing the functional configuration of an image processing unit according to another embodiment. It is a flowchart showing the procedure of synthesis processing. It is a block diagram showing the functional configuration of an image processing unit according to another embodiment. It is a flowchart showing the procedure of synthesis processing. It is an image diagram showing a method of calculating the total cost within a text rectangular area. It is a block diagram showing the functional configuration of an image processing unit according to another embodiment. It is a diagram showing contrast harmony relationships by tone in the PCCS color system. It is a flowchart showing the procedure of processing performed in the image processing unit. It is a flowchart showing the procedure of processing performed in the font setting unit. It is a diagram showing an example of image data as an image. It is a diagram showing an example of composite image data as an image. It is a diagram showing an example of composite image data as an image. It is a diagram showing an example of the hue circle of the PCCS color system in gray scale. It is a diagram showing an example of the tones of the PCCS color system in gray scale. It is a diagram showing 12 types of chromatic tones. It is a diagram showing five types of achromatic tones. It is a diagram schematically showing an example of processing for extracting a feature amount of a captured image.
It is a diagram schematically showing another example of processing for extracting a feature amount of a captured image. It is a flowchart schematically showing a method of determining a smile level. It is a diagram showing an example of an output image from the image processing apparatus. It is a diagram showing another example of an output image from the image processing apparatus. It is a schematic block diagram showing the internal configuration of the image processing unit of an imaging apparatus. It is a flowchart showing the flow of determining a representative color. It is a conceptual diagram showing an example of processing in the image processing unit. It is a conceptual diagram showing an example of processing in the image processing unit. FIG. 53 is a conceptual diagram illustrating a result of clustering performed on the main region illustrated in FIG. 52. It is an example of an image to which a sentence has been added by the sentence addition unit. It is another example of an image to which a sentence has been added by the sentence addition unit. It is a diagram showing an example of a correspondence table of colors and words. It is a diagram showing an example of a correspondence table for distant view images (second scene images). It is a diagram showing an example of a correspondence table for other images (third scene images).

(First embodiment)
Hereinafter, a first embodiment of the present invention will be described with reference to the drawings. FIG. 1 is an example of a functional block diagram of an image processing apparatus 1001 according to the first embodiment of the present invention. FIGS. 2A to 2D are examples of sentence templates stored in the storage unit 1090. FIGS. 3A and 3B are examples of words stored in the storage unit 1090. FIGS. 4A to 4D are explanatory diagrams for explaining extraction of a color arrangement pattern of a captured image.

The image processing apparatus 1001 includes an image input unit 1010, a determination unit 1020, a sentence creation unit 1030, a sentence addition unit 1040, and a storage unit 1090, as shown in FIG. The image input unit 1010 inputs a captured image via, for example, a network or a storage medium. The image input unit 1010 outputs the captured image to the determination unit 1020.

The storage unit 1090 stores sentence templates, each of which completes a sentence when a word is inserted into a predetermined blank portion. Specifically, the storage unit 1090 stores, as sentence templates, a person image template used to create a sentence for an image in which a person is the subject (hereinafter referred to as a person image) and a landscape image template used to create a sentence for an image in which a landscape is the subject (hereinafter referred to as a landscape image). A landscape is also referred to as a second type. An example of a person image is a portrait (also referred to as a first type).

For example, the storage unit 1090 stores two types of person image templates as shown in FIGS. 2A and 2B. The person image templates shown in FIGS. 2A and 2B each have a blank portion for inserting a word corresponding to the number of subjects (denoted as blank portion {number of people}) and a blank portion for inserting a word corresponding to the color arrangement pattern of the captured image (denoted as blank portion {adjective}).

For example, the storage unit 1090 stores two types of landscape image templates as shown in FIGS. 2C and 2D. The landscape image template shown in FIG. 2C has a blank portion for inserting a word corresponding to the imaging condition (date and time) of the captured image (denoted as blank portion {date}) and a blank portion for inserting a word corresponding to the color arrangement pattern of the captured image. The landscape image template shown in FIG. 2D has a blank portion for inserting a word corresponding to the imaging condition (location) of the captured image (denoted as blank portion {place}) and a blank portion for inserting a word corresponding to the color arrangement pattern of the captured image.

The person image template described above is a sentence template that focuses on the person captured as the subject, that is, a sentence template in which blank portions are set in a sentence written from the viewpoint of the person captured as the subject. For example, the word “spent” in the person image template in FIG. 2A and the word “pose” in the person image template in FIG. 2B express the viewpoint of the person who is captured. The landscape image template described above is a sentence template that reflects the impression given by the captured image as a whole, that is, a sentence template in which blank portions are set in a sentence written from the viewpoint of the photographer who captured the subject. For example, the wording “one piece” in the landscape image template in FIG. 2C and the wording “scenery” in the landscape image template in FIG. 2D express the viewpoint of the photographer.

Further, the storage unit 1090 stores a word to be inserted into each blank portion of the sentence template in addition to the sentence template (person image template, landscape image template). For example, as illustrated in FIG. 3A, the storage unit 1090 stores a word related to the number of people as a word to be inserted into the blank portion {number of people} in association with the number of subjects of the captured image.

For example, when a person image template is used and the number of subjects is “1”, the word “one person” is inserted into the blank portion {number of people} of the person image template. Note that the sentence creation unit 1030 reads the sentence template to be used from the storage unit 1090 and inserts the words into the blank portions (described later).
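The template-and-blank mechanism described above can be sketched in Python as follows. This is only an illustration, not the stored format of the storage unit 1090: the template wording is pieced together from an example sentence given later in this description, and the names `number_word`, `fill_blank`, and `PERSON_TEMPLATE`, as well as the wording for counts other than 1 and 10, are assumptions.

```python
def number_word(num_subjects: int) -> str:
    """Hypothetical stand-in for the table of FIG. 3A (subjects -> word).

    Only "one person" (for 1) and "many people" (for 10) come from the
    examples in this description; the cut-off at 10 and the wording for
    the remaining counts are assumptions.
    """
    if num_subjects == 1:
        return "one person"
    if num_subjects >= 10:
        return "many people"
    return f"{num_subjects} people"


def fill_blank(template: str, blank: str, word: str) -> str:
    """Insert `word` into the blank portion written as {blank}."""
    return template.replace("{" + blank + "}", word)


# Assumed wording of a person image template with two blank portions.
PERSON_TEMPLATE = "{adjective} feeling? Pose with {number of people}!!"

# Filling one blank leaves the other untouched until its word is known.
partly_filled = fill_blank(PERSON_TEMPLATE, "number of people", number_word(10))
```

Plain string replacement is used here instead of `str.format` so that blank names containing spaces need no special handling.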

Further, as shown in FIG. 3B, the storage unit 1090 stores adjectives for person images and adjectives for landscape images, in association with color arrangement patterns of the captured image, as words to be inserted into the blank portion {adjective} of the person image template or the blank portion {adjective} of the landscape image template.

For example, when a person image template is used and the color arrangement pattern of the entire area of the captured image is the first color “color 1”, the second color “color 2”, and the third color “color 3” shown in FIG. 4A, the word “cool” is inserted into the blank portion {adjective} of the person image template. Further, when a landscape image template is used and the color arrangement pattern of the entire area of the captured image is the first color “color 2”, the second color “color 1”, and the third color “color 4” shown in FIG. 4B, the word “lively” is inserted into the blank portion {adjective} of the landscape image template.

The above-described colors 1 to 5 are obtained by classifying individual colors actually expressed in the captured image into five colors (five representative colors) based on a standard such as a warm color / cold color. In other words, the above-described colors 1 to 5 are obtained by classifying the pixel values of each pixel of the captured image into five colors based on, for example, a warm color / cold color standard.
The first color constituting the color arrangement pattern is the color most expressed in the captured image among the colors 1 to 5, the second color is the color second most expressed in the captured image among the colors 1 to 5, and the third color is the color third most expressed in the captured image among the colors 1 to 5. In other words, when the pixel values are classified into the colors 1 to 5, the color into which the largest number of pixels is classified is the first color, the color into which the second largest number of pixels is classified is the second color, and the color into which the third largest number of pixels is classified is the third color.
Note that the text creation unit 1030 extracts a color arrangement pattern from the captured image.
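The ranking of representative colors described above can be sketched in Python as follows. The description only states that pixel values are classified into five representative colors on a basis such as warm/cold; the hue boundaries in `classify_pixel`, the assignment of hues to the labels 1 to 5, and both function names are assumptions made for this sketch.

```python
from collections import Counter
import colorsys


def classify_pixel(rgb):
    """Map an (R, G, B) pixel to a label 1..5 (hypothetical warm/cold rule)."""
    r, g, b = (c / 255.0 for c in rgb)
    h, s, _ = colorsys.rgb_to_hsv(r, g, b)
    if s < 0.15:
        return 3          # near-neutral pixels (assumed class)
    hue = h * 360.0
    if hue < 30 or hue >= 330:
        return 5          # reds: treated as the warmest class
    if hue < 90:
        return 4          # oranges and yellows
    if hue < 210:
        return 2          # greens and cyans
    return 1              # blues and violets: treated as the coldest class


def color_arrangement_pattern(pixels):
    """Return the first, second, and third colors ranked by pixel count.

    The tuple is shorter than three if fewer than three classes occur.
    """
    counts = Counter(classify_pixel(p) for p in pixels)
    return tuple(color for color, _ in counts.most_common(3))


# Mostly blue, then green, then gray, with a single red pixel.
sample = [(0, 0, 255)] * 5 + [(0, 255, 0)] * 3 + [(128, 128, 128)] * 2 + [(255, 0, 0)]
pattern = color_arrangement_pattern(sample)
```

For the sample above the pattern is first color 1, second color 2, third color 3; the single red pixel falls outside the top three.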

Note that a color arrangement pattern of a partial area of the captured image may be used instead of the color arrangement pattern of the entire area of the captured image. That is, the sentence creation unit 1030 may insert, into the blank portion, an adjective corresponding to the color arrangement pattern of a partial area of the captured image. Specifically, the sentence creation unit 1030 may determine a predetermined area of the captured image according to whether the captured image is a person image or a landscape image, and insert into the blank portion an adjective corresponding to the color arrangement pattern of the determined area.
For example, when the captured image is a person image as shown in FIG. 4C, the sentence creation unit 1030 may determine the central area of the person image as the predetermined area, extract the color arrangement pattern of the central area, and insert an adjective corresponding to the extracted color arrangement pattern into the blank portion. When the captured image is a landscape image as shown in FIG. 4D, the sentence creation unit 1030 may determine the upper area of the landscape image as the predetermined area, extract the color arrangement pattern of the upper area, and insert an adjective corresponding to the extracted color arrangement pattern into the blank portion.
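The choice of area can be illustrated with a small Python helper. The description does not say how large the central or upper area is, so the fractions used below (the middle half in each direction, and the top half) are assumptions, as is the function name `pattern_region`.

```python
def pattern_region(width, height, is_person_image):
    """Return (left, top, right, bottom) of the area used for the pattern."""
    if is_person_image:
        # Central area: the middle half in each direction (assumed size).
        return (width // 4, height // 4,
                width - width // 4, height - height // 4)
    # Upper area: the top half of the image (assumed size).
    return (0, 0, width, height // 2)


person_box = pattern_region(400, 300, True)       # central area
landscape_box = pattern_region(400, 300, False)   # upper area
```

Only the pixels inside the returned box would then be passed to the color arrangement pattern extraction.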

Although not shown, the storage unit 1090 stores words related to the date and time (for example, the time, “good morning”, “dusk”, “midsummer!!”, ...) in association with the imaging date and time, as words to be inserted into the blank portion {date and time}. In addition, the storage unit 1090 stores words related to the location (for example, “North Country”, “Old Capital”, “Mt. Fuji”, “Kaminarimon”, ...) in association with the imaging location, as words to be inserted into the blank portion {place}.

The determination unit 1020 acquires a captured image from the image input unit 1010. The determination unit 1020 determines whether the acquired captured image is a person image or a landscape image. Hereinafter, the determination of the person image / landscape image by the determination unit 1020 will be described in detail. Note that the first threshold value (also referred to as “Flow”) is smaller than the second threshold value (also referred to as “Fhigh”).

The determination unit 1020 attempts to recognize a face area in the captured image.
(When face area = 0)
If no face area is recognized in the captured image, the determination unit 1020 determines that the captured image is a landscape image.

(When face area = 1)
When the determination unit 1020 recognizes one face area in the captured image, the determination unit 1020 calculates a ratio R of the size of the face area to the size of the captured image according to the following equation (1).
R = Sf / Sp (1)
Sp in the above formula (1) is the size of the captured image; specifically, the length of the captured image in its longitudinal direction is used. Sf in the above formula (1) is the size of the face area; specifically, the length in the longitudinal direction of the rectangle circumscribing the face area (or the length of the major axis (major diameter) of the ellipse surrounding the face area) is used.
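Formula (1) can be written as a short Python function. The representation of a face area as an `(x, y, width, height)` circumscribing rectangle and the name `face_ratio` are assumptions; face detection itself is outside the scope of this sketch.

```python
def face_ratio(face_box, image_size):
    """Compute R = Sf / Sp using longitudinal (longer-side) lengths."""
    _, _, face_w, face_h = face_box
    image_w, image_h = image_size
    sf = max(face_w, face_h)     # longer side of the circumscribing rectangle
    sp = max(image_w, image_h)   # longer side of the captured image
    return sf / sp


# A 120 x 160 face rectangle in a 640 x 480 image gives 160 / 640.
r = face_ratio((10, 20, 120, 160), (640, 480))
```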

The determination unit 1020 that calculated the ratio R compares the ratio R with the first threshold value Flow. If the determination unit 1020 determines that the ratio R is less than the first threshold value Flow, the determination unit 1020 determines that the captured image is a landscape image. On the other hand, when the determination unit 1020 determines that the ratio R is equal to or greater than the first threshold value Flow, the determination unit 1020 compares the ratio R with the second threshold value Fhigh.

If the determination unit 1020 determines that the ratio R is equal to or greater than the second threshold Fhigh, the determination unit 1020 determines that the captured image is a person image. On the other hand, when the determination unit 1020 determines that the ratio R is less than the second threshold value Fhigh, the determination unit 1020 determines that the captured image is a landscape image.

(If face area ≧ 2)
When recognizing a plurality of face areas in the captured image, the determination unit 1020 calculates a ratio R(i) of the size of each face area to the size of the captured image according to the following equation (2).
R(i) = Sf(i) / Sp (2)
Sp in the above formula (2) is the same as that in the above formula (1). Sf(i) in the above formula (2) is the size of the i-th face area; specifically, the length in the longitudinal direction of the rectangle circumscribing the i-th face area (or the length of the major axis (major diameter) of the ellipse surrounding the i-th face area) is used.

The determination unit 1020 that calculated R (i) calculates the maximum value (Rmax) of R (i). That is, the determination unit 1020 calculates the ratio Rmax of the maximum face area size to the size of the captured image.

The determination unit 1020 that has calculated the ratio Rmax compares the ratio Rmax with the first threshold value Flow. If the determination unit 1020 determines that the ratio Rmax is less than the first threshold value Flow, the determination unit 1020 determines that the captured image is a landscape image. On the other hand, when the determination unit 1020 determines that the ratio Rmax is greater than or equal to the first threshold value Flow, the ratio Rmax is compared with the second threshold value Fhigh.

When the determination unit 1020 determines that the ratio Rmax is equal to or greater than the second threshold value Fhigh, the captured image is determined to be a person image. On the other hand, when the determination unit 1020 determines that the ratio Rmax is less than the second threshold value Fhigh, the determination unit 1020 calculates the standard deviation σ of R (i). The following formula (3) is a formula for calculating the standard deviation σ.
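Formula (3) is not reproduced in this text. Assuming the usual population form of the standard deviation, with n the number of recognized face areas and \bar{R} the mean of the ratios R(i) (both symbols are introduced here for illustration), it would read:

```latex
\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\bigl(R(i)-\bar{R}\bigr)^{2}},
\qquad
\bar{R} = \frac{1}{n}\sum_{i=1}^{n} R(i)
\tag{3}
```

The sample form, which divides by n - 1 instead of n, is an equally plausible reading; the choice only affects the value against which the third threshold is tuned.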

The determination unit 1020 that has calculated the standard deviation σ compares the standard deviation σ with a third threshold value (also referred to as Fstdev). If the determination unit 1020 determines that the standard deviation σ is less than the third threshold Fstdev, the determination unit 1020 determines that the captured image is a person image. On the other hand, when the determination unit 1020 determines that the standard deviation σ is greater than or equal to the third threshold value Fstdev, the determination unit 1020 determines that the captured image is a landscape image.

As described above, when the determination unit 1020 recognizes a plurality of face areas in the captured image, it determines that the captured image is a person image if the ratio Rmax of the size of the largest face area to the size of the captured image is equal to or greater than the second threshold Fhigh. In addition, even if the ratio Rmax is less than the second threshold Fhigh, the determination unit 1020 determines that the captured image is a person image if the ratio Rmax is equal to or greater than the first threshold Flow and the standard deviation σ of the ratios R(i) of the plurality of face areas is less than the third threshold Fstdev.
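The determination for zero, one, and multiple face areas can be gathered into one Python function. This is a sketch under stated assumptions: the function name, the string return values, the threshold values in the usage lines, and the use of the population standard deviation (`statistics.pstdev`) are all illustrative choices, since this description gives no concrete threshold values.

```python
from statistics import pstdev


def classify_image(ratios, f_low, f_high, f_stdev):
    """Return "person" or "landscape" from the face-area ratios R(i)."""
    if not ratios:
        return "landscape"        # no face area recognized
    r_max = max(ratios)
    if r_max < f_low:
        return "landscape"        # even the largest face is too small
    if r_max >= f_high:
        return "person"           # at least one large face
    if len(ratios) < 2:
        return "landscape"        # a single mid-sized face
    # Several faces, none large: person image only if sizes are uniform.
    return "person" if pstdev(ratios) < f_stdev else "landscape"


# Hypothetical thresholds satisfying Flow < Fhigh.
F_LOW, F_HIGH, F_STDEV = 0.1, 0.3, 0.05

uniform_group = classify_image([0.20, 0.19, 0.21], F_LOW, F_HIGH, F_STDEV)
mixed_sizes = classify_image([0.25, 0.12, 0.02], F_LOW, F_HIGH, F_STDEV)
```

With these values, the uniform group is classified as a person image and the image with widely varying face sizes as a landscape image.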

Note that, instead of the determination using the standard deviation σ of the ratios R(i) of the plurality of face areas and the third threshold Fstdev, the determination unit 1020 may make the determination using the variance λ of the ratios R(i) of the plurality of face areas and a threshold for the variance λ. Further, the determination unit 1020 may use the standard deviation (or variance) of the sizes Sf(i) of the plurality of face areas instead of the standard deviation (or variance) of the ratios R(i) of the plurality of face areas (in this case, a threshold for the face area size Sf(i) is used).

Further, when determining that the captured image is a person image, the determination unit 1020 determines (counts) the number of subjects based on the number of face areas having a ratio R(i) equal to or greater than the first threshold Flow. That is, the determination unit 1020 regards each face area having a ratio R(i) equal to or greater than the first threshold Flow as one subject, and takes the number of such face areas as the number of subjects.
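As a brief illustration of this counting rule, the following Python sketch counts only the face areas whose ratio reaches the first threshold. The function name and the sample values are hypothetical.

```python
def count_subjects(ratios, f_low):
    """Count the face areas whose ratio R(i) is at least Flow."""
    return sum(1 for r in ratios if r >= f_low)


# Faces at 0.30, 0.12 and 0.10 count; the small face at 0.04 does not.
num_subjects = count_subjects([0.30, 0.12, 0.04, 0.10], 0.1)
```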

The determination unit 1020 outputs the determination result to the sentence creation unit 1030. Specifically, when the determination unit 1020 determines that the captured image is a person image, it outputs image determination result information indicating that the captured image is a person image, and number determination result information indicating the determination result of the number of subjects, to the sentence creation unit 1030. On the other hand, if the determination unit 1020 determines that the captured image is a landscape image, it outputs image determination result information indicating that the captured image is a landscape image to the sentence creation unit 1030.
Also, the determination unit 1020 outputs the captured image acquired from the image input unit 1010 to the text creation unit 1030.

The sentence creation unit 1030 acquires a determination result and a captured image from the determination unit 1020. The sentence creation unit 1030 reads either a person image template or a landscape image template from the storage unit 1090 according to the acquired determination result. Specifically, when the sentence creation unit 1030 acquires image determination result information indicating that the image is a person image, it reads one person image template randomly selected from the two types of person image templates stored in the storage unit 1090. When the sentence creation unit 1030 acquires image determination result information indicating that the image is a landscape image, it reads one landscape image template randomly selected from the two types of landscape image templates stored in the storage unit 1090.

The sentence creation unit 1030 creates a sentence for the captured image by inserting a word corresponding to the feature amount or the imaging condition of the captured image into the blank part of the read sentence template (person image template or landscape image template). The word corresponding to the feature amount is an adjective corresponding to the color arrangement pattern of the captured image, or a word corresponding to the number of subjects (word related to the number of subjects). The word corresponding to the imaging condition of the captured image is a word corresponding to the imaging date and time (word related to the date and time) or a word corresponding to the imaging location (word related to the location).

As an example, when the person image template shown in FIG. 2A is read, the sentence creation unit 1030 acquires the number of subjects of the captured image from the number determination result information, reads the word stored in association with that number (the word related to the number of people) from the storage unit 1090, and inserts it into the blank portion {number of people}. It then extracts the color arrangement pattern of the captured image, reads the word stored in association with the extracted color arrangement pattern (the adjective for person images) from the storage unit 1090, and inserts it into the blank portion {adjective}, thereby creating a sentence for the captured image. Specifically, if the number of subjects is “1” and the color arrangement pattern is the first color “color 1”, the second color “color 2”, and the third color “color 3”, the sentence creation unit 1030 creates the sentence “Cool memories spent alone”.

As another example, when the person image template shown in FIG. 2B is read, the sentence creation unit 1030, as in the case of FIG. 2A, reads the word related to the number of people from the storage unit 1090 and inserts it into the blank portion {number of people}, and reads the adjective for person images from the storage unit 1090 and inserts it into the blank portion {adjective}, thereby creating a sentence for the captured image. Specifically, if the number of subjects is “10” and the color arrangement pattern is the first color “color 5”, the second color “color 4”, and the third color “color 2”, the sentence creation unit 1030 creates the sentence “Hot feeling? Pose with many people!!”.

As another example, when the landscape image template shown in FIG. 2C is read, the sentence creation unit 1030 acquires the imaging date and time from the additional information of the captured image (for example, Exif; Exchangeable Image File Format), reads the word stored in association with the acquired imaging date and time (the word related to the date and time) from the storage unit 1090, and inserts it into the blank portion {date and time}. It then extracts the color arrangement pattern of the captured image, reads the word stored in association with the extracted color arrangement pattern (the adjective for landscape images) from the storage unit 1090, and inserts it into the blank portion {adjective}, thereby creating a sentence for the captured image.
Specifically, when the word “Midsummer!!” is stored in the storage unit 1090 in association with August, the imaging date and time is August 10, 2011, and the color arrangement pattern is the first color “color 5”, the second color “color 4”, and the third color “color 2”, the sentence creation unit 1030 creates the sentence “Midsummer!! One piece that feels hot.”

As another example, when the landscape image template shown in FIG. 2D is read, the sentence creation unit 1030 acquires the imaging location from the additional information of the captured image, reads the word stored in association with the acquired imaging location (the word related to the location) from the storage unit 1090, and inserts it into the blank portion {place}. It then extracts the color arrangement pattern of the captured image, reads the word stored in association with the extracted color arrangement pattern (the adjective for landscape images) from the storage unit 1090, and inserts it into the blank portion {adjective}, thereby creating a sentence for the captured image.
Specifically, when the word “old capital” is stored in the storage unit 1090 in association with Kyoto Station, the imaging location is in front of Kyoto Station, and the color arrangement pattern is the first color “color 1”, the second color “color 2”, and the third color “color 5”, the sentence creation unit 1030 creates the sentence “Old capital. The soft scenery at that time!”
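The table lookups behind these examples can be sketched in Python. The five adjective entries below are the pattern/adjective pairs named in this description; the dictionary layout, the key format of `PLACE_WORDS`, the capitalization of the stored place word, and the template wording are assumptions, and a full table in the style of FIG. 3B would cover many more patterns.

```python
# Adjectives keyed by (image type, (first, second, third) color).
ADJECTIVES = {
    ("person", (1, 2, 3)): "cool",
    ("person", (5, 4, 2)): "hot",
    ("landscape", (2, 1, 4)): "lively",
    ("landscape", (5, 4, 2)): "hot",
    ("landscape", (1, 2, 5)): "soft",
}

# Hypothetical place table: imaging location -> word related to the place.
PLACE_WORDS = {"Kyoto Station": "Old capital"}

# Assumed wording of the landscape image template of FIG. 2D.
TEMPLATE_2D = "{place}. The {adjective} scenery at that time!"


def create_place_sentence(place, pattern):
    """Fill the {place} and {adjective} blanks of the FIG. 2D-style template."""
    sentence = TEMPLATE_2D.replace("{place}", PLACE_WORDS[place])
    adjective = ADJECTIVES[("landscape", pattern)]
    return sentence.replace("{adjective}", adjective)


sentence = create_place_sentence("Kyoto Station", (1, 2, 5))
```

Keying the adjective table on the image type as well as the pattern lets the same color arrangement pattern map to different adjectives for person images and landscape images.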

The sentence creation unit 1030 that created the sentence outputs the created sentence and the captured image to the sentence addition unit 1040. The sentence adding unit 1040 acquires a sentence and a captured image from the sentence creating unit 1030. The sentence adding unit 1040 adds (synthesizes) the sentence to the captured image.

Subsequently, the operation of the image processing apparatus 1001 will be described. FIGS. 5 and 6 are flowcharts showing an example of the operation of the image processing apparatus 1001.

In FIG. 5, the image input unit 1010 inputs a captured image (step S1010). The image input unit 1010 outputs the captured image to the determination unit 1020. The determination unit 1020 determines whether or not there is one or more face areas in the captured image (step S1012). If the determination unit 1020 determines that there is one or more face areas in the captured image (step S1012: Yes), it calculates, for each face area, the ratio of the size of the face area to the size of the captured image (step S1014), and calculates the maximum value of the ratios (step S1016).

Subsequent to step S1016, the determination unit 1020 determines whether or not the maximum value calculated in step S1016 is greater than or equal to the first threshold (step S1020). If the determination unit 1020 determines that the maximum value calculated in step S1016 is equal to or greater than the first threshold (step S1020: Yes), it determines whether the maximum value is equal to or greater than the second threshold (step S1022). If the determination unit 1020 determines that the maximum value is greater than or equal to the second threshold (step S1022: Yes), it determines that the captured image is a person image (step S1030). Subsequent to step S1030, the determination unit 1020 counts the number of face areas having a ratio equal to or greater than the first threshold as the number of subjects (step S1032). Subsequent to step S1032, the determination unit 1020 outputs the determination result (image determination result information indicating that the image is a person image, and number determination result information indicating the determination result of the number of subjects) and the captured image to the sentence creation unit 1030.

On the other hand, when it is determined in step S1022 that the maximum value is less than the second threshold (step S1022: No), the determination unit 1020 determines whether or not there are two or more face areas in the captured image (step S1040). If the determination unit 1020 determines that there are two or more face areas in the captured image (step S1040: Yes), it calculates the standard deviation of the ratios calculated in step S1014 (step S1042), and determines whether or not the standard deviation is less than the third threshold (step S1044). If the determination unit 1020 determines that the standard deviation is less than the third threshold (step S1044: Yes), the process proceeds to step S1030.

On the other hand, if it is determined in step S1012 that there is no face area in the captured image (step S1012: No), if it is determined in step S1020 that the maximum value is less than the first threshold (step S1020: No), if it is determined in step S1040 that there is only one face area in the captured image (step S1040: No), or if it is determined in step S1044 that the standard deviation is greater than or equal to the third threshold (step S1044: No), the determination unit 1020 determines that the captured image is a landscape image (step S1050). Subsequent to step S1050, the determination unit 1020 outputs a determination result (image determination result information indicating that the image is a landscape image) to the sentence creation unit 1030.

Note that step S1040 described above is a process for preventing a captured image having only one face area from always being determined to be a person image. With step S1040 as described above, however, if there are a very large number of very small face areas of similar size in addition to the face area having the largest ratio of face area size to captured image size, the standard deviation becomes small, and the captured image may therefore be determined to be a person image. To reduce such determinations as far as possible, the determination unit 1020 may instead determine whether there are two or more face areas of at least a predetermined size. For example, the determination unit 1020 may determine whether there are two or more face areas for which the above-described ratio is equal to or greater than the first threshold.
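The effect of the stricter check in step S1040 can be illustrated in Python. The function names, the `require_sizable` flag, the threshold values, and the use of the population standard deviation are assumptions made for this sketch; the standard deviation is still taken over all ratios, as in step S1042.

```python
from statistics import pstdev


def passes_s1040(ratios, f_low, require_sizable):
    """Step S1040: are there two or more (optionally sizable) face areas?"""
    if require_sizable:
        return sum(1 for r in ratios if r >= f_low) >= 2
    return len(ratios) >= 2


def mid_range_decision(ratios, f_low, f_stdev, require_sizable):
    """Decide the case Flow <= Rmax < Fhigh (steps S1040 to S1044)."""
    if not passes_s1040(ratios, f_low, require_sizable):
        return "landscape"
    return "person" if pstdev(ratios) < f_stdev else "landscape"


# One mid-sized face plus thirty tiny faces of equal size.
crowd = [0.2] + [0.01] * 30

plain_check = mid_range_decision(crowd, 0.1, 0.05, require_sizable=False)
strict_check = mid_range_decision(crowd, 0.1, 0.05, require_sizable=True)
```

With the plain check, the many equal-sized tiny faces pull the standard deviation below the threshold and the image is treated as a person image; with the stricter check, only one face reaches the first threshold and the image is treated as a landscape image.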

Subsequent to step S1032 or step S1050, the sentence creation unit 1030 reads either a person image template or a landscape image template from the storage unit 1090 according to the determination result acquired from the determination unit 1020. A word corresponding to the feature amount or the imaging condition of the captured image is inserted into the blank portion of the read sentence template to create a sentence for the captured image (step S1100).

FIG. 6 shows details of step S1100. In FIG. 6, the sentence creation unit 1030 determines whether or not the captured image is a person image (step S1102). Specifically, the sentence creation unit 1030 determines that the captured image is a person image when it has acquired, as the determination result from the determination unit 1020, image determination result information indicating that the image is a person image, and determines that the captured image is not a person image when it has acquired image determination result information indicating that the image is a landscape image.

When the sentence creation unit 1030 determines that the captured image is a person image (step S1102: Yes), the sentence creation unit 1030 reads a person image template from the storage unit 1090 (step S1104). Specifically, the sentence creation unit 1030 reads one person image template randomly selected from the two types of person image templates stored in the storage unit 1090.

Subsequent to step S1104, the sentence creation unit 1030 inserts a word corresponding to the number of subjects in the blank portion {number of people} of the person image template (step S1110). Specifically, the text creation unit 1030 acquires the number of subjects from the number determination result information, reads words stored in association with the number of people (words related to the number of people) from the storage unit 1090, and reads the person image. Insert it into the blank field {number of people} of the template.

Subsequent to step S1110, the sentence creation unit 1030 inserts a word corresponding to the color arrangement pattern of the captured image (person image) into the blank portion {adjective} of the person image template (step S1120). Specifically, the sentence creation unit 1030 extracts the color arrangement pattern of the central area of the captured image (person image), reads the word stored in association with that color arrangement pattern (the adjective for person images) from the storage unit 1090, and inserts it into the blank portion {adjective} of the person image template.

On the other hand, when it is determined in step S1102 that the captured image is a landscape image (step S1102: No), the text creation unit 1030 reads a landscape image template from the storage unit 1090 (step S1106). Specifically, the text creation unit 1030 reads one landscape image template randomly selected from the two types of landscape image templates stored in the storage unit 1090.

Subsequent to step S1106, the sentence creation unit 1030 inserts a word corresponding to the color arrangement pattern of the captured image (landscape image) into the blank portion {adjective} of the landscape image template (step S1130). Specifically, the sentence creation unit 1030 extracts the color arrangement pattern of the upper area of the captured image (landscape image), reads the word stored in association with that color arrangement pattern (the adjective for landscape images) from the storage unit 1090, and inserts it into the blank portion {adjective} of the landscape image template.

Following step S1120 or step S1130, the sentence creation unit 1030 determines whether or not a blank portion {date and time} exists in the read sentence template (step S1132). In the present embodiment, as shown in FIGS. 2A to 2D, the landscape image template in FIG. 2C has a blank portion {date and time}, but the person image templates in FIGS. 2A and 2B and the landscape image template in FIG. 2D do not. Therefore, the sentence creation unit 1030 determines that a blank portion {date and time} exists if the landscape image template of FIG. 2C was read in step S1106, and determines that no blank portion {date and time} exists if the person image template of FIG. 2A or FIG. 2B was read in step S1104 or the landscape image template of FIG. 2D was read in step S1106.

If the sentence creation unit 1030 determines that a blank portion {date and time} exists in the read sentence template (step S1132: Yes), it inserts a word corresponding to the imaging condition (date and time) of the captured image into the blank portion {date and time} of the sentence template (step S1140). Specifically, the sentence creation unit 1030 acquires the imaging date and time from the additional information of the captured image (landscape image), reads the word stored in association with that imaging date and time (the word related to the date and time) from the storage unit 1090, and inserts it into the blank portion {date and time} of the landscape image template. On the other hand, when the sentence creation unit 1030 determines that no blank portion {date and time} exists in the read sentence template (step S1132: No), the process skips step S1140 and proceeds to step S1142.

Following step S1132 (No) or step S1140, the sentence creation unit 1030 determines whether or not a blank portion {place} exists in the read sentence template (step S1142). In the present embodiment, as shown in FIGS. 2A to 2D, the landscape image template in FIG. 2D has a blank portion {place}, but the person image templates in FIGS. 2A and 2B and the landscape image template in FIG. 2C do not. Accordingly, the sentence creation unit 1030 determines that a blank portion {place} exists if the landscape image template of FIG. 2D was read in step S1106, and determines that no blank portion {place} exists if the person image template of FIG. 2A or FIG. 2B was read in step S1104 or the landscape image template of FIG. 2C was read in step S1106.

When the sentence creation unit 1030 determines that a blank portion {place} exists in the read sentence template (step S1142: Yes), it inserts a word corresponding to the imaging condition (location) of the captured image into the blank portion {place} of the sentence template (step S1150). Specifically, the sentence creation unit 1030 acquires the imaging location from the additional information of the captured image (landscape image), reads the word stored in association with that imaging location (the word related to the location) from the storage unit 1090, and inserts it into the blank portion {place} of the landscape image template. Then, the flowchart shown in FIG. 6 ends, and the process returns to the flowchart shown in FIG. 5. On the other hand, when the sentence creation unit 1030 determines that no blank portion {place} exists in the read sentence template (step S1142: No), step S1150 is skipped and the process returns to the flowchart shown in FIG. 5.
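The flow of FIG. 6 can be summarized in a short Python sketch. The template wordings, the `words` dictionary of already looked-up words, and the function name are assumptions; the step numbers in the comments refer to the steps described above.

```python
import random

# Assumed wordings standing in for the templates of FIGS. 2A to 2D.
PERSON_TEMPLATES = [
    "{adjective} memories spent with {number of people}",
    "{adjective} feeling? Pose with {number of people}!!",
]
LANDSCAPE_TEMPLATES = [
    "{date and time} One piece that feels {adjective}.",
    "{place}. The {adjective} scenery at that time!",
]


def create_sentence(is_person_image, words, rng=random):
    """Fill the blanks of a randomly chosen template (FIG. 6 flow)."""
    if is_person_image:                                    # S1102
        text = rng.choice(PERSON_TEMPLATES)                # S1104
        text = text.replace("{number of people}",
                            words["number of people"])     # S1110
    else:
        text = rng.choice(LANDSCAPE_TEMPLATES)             # S1106
    text = text.replace("{adjective}", words["adjective"])  # S1120 / S1130
    if "{date and time}" in text:                          # S1132
        text = text.replace("{date and time}",
                            words["date and time"])        # S1140
    if "{place}" in text:                                  # S1142
        text = text.replace("{place}", words["place"])     # S1150
    return text


EXAMPLE_WORDS = {
    "number of people": "many people",
    "adjective": "hot",
    "date and time": "Midsummer!!",
    "place": "Old capital",
}
example = create_sentence(False, EXAMPLE_WORDS, random.Random(0))
```

Checking for the presence of each blank, rather than branching on which template was read, mirrors steps S1132 and S1142 and keeps the function unchanged if further templates are added.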

Returning to FIG. 5, the sentence creation unit 1030 that created the sentence outputs the created sentence and the captured image to the sentence addition unit 1040. The sentence adding unit 1040 acquires a sentence and a captured image from the sentence creating unit 1030. The text adding unit 1040 adds (synthesizes) the text acquired from the text creating unit 1030 to the captured image acquired from the text creating unit 1030. Then, the flowchart shown in FIG. 5 ends.

FIGS. 7A to 7E are examples of captured images to which sentences are added by the sentence adding unit 1040. The captured image in FIG. 7A is determined to be a person image because the face of one person appears large. That is, in this captured image, it is determined that the maximum value of the ratio of the size of the face area to the size of the captured image (the ratio of this one face area) is greater than or equal to the second threshold (step S1022 (Yes)). The captured image in FIG. 7B is determined to be a person image because the faces of two people appear large. That is, in this captured image, it is determined that the maximum value of the ratio of the size of the face area to the size of the captured image is greater than or equal to the second threshold (step S1022 (Yes)).

The captured image in FIG. 7C is determined to be a person image because it contains faces of at least a certain size and the faces are uniform in size. That is, in this captured image, although the maximum value of the ratio of the size of the face area to the size of the captured image is greater than or equal to the first threshold and less than the second threshold (step S1022 (No)), the standard deviation is determined to be less than the third threshold (step S1044 (Yes)).

The captured image in FIG. 7D is determined to be a landscape image because, although it contains faces of at least a certain size, the faces are not uniform in size. That is, in this captured image, although the maximum value of the ratio of the size of the face area to the size of the captured image is greater than or equal to the first threshold and less than the second threshold (step S1022 (No)), the standard deviation is determined to be greater than or equal to the third threshold (step S1044 (No)). The captured image in FIG. 7E is determined to be a landscape image because no face is captured (step S1012 (No)).

As described above, according to the image processing apparatus 1001, more flexible character information can be given to the captured image. In other words, the image processing apparatus 1001 classifies captured images into person images and landscape images. For person images, it creates a sentence for person images using a person image template stored in advance, and for landscape images, it creates a sentence for landscape images using a landscape image template stored in advance. More flexible character information can therefore be given according to the captured content.

In the above-described embodiment, an example in which the image input unit 1010 outputs the captured image to the determination unit 1020 when a captured image is input has been described, but the manner in which the determination unit 1020 acquires the captured image is not limited thereto. For example, the image input unit 1010 may store the captured image in the storage unit 1090 when the captured image is input, and the determination unit 1020 may read and acquire a desired captured image from the storage unit 1090 when necessary.

In the above-described embodiment, an example has been described in which five colors, colors 1 to 5, are used as the first color constituting the color arrangement pattern. However, this is for convenience of explanation, and six or more colors may be used. The same applies to the second color and the third color. In addition, although an example using a color arrangement pattern composed of the first to third colors has been described, the number of colors constituting the color arrangement pattern is not limited to this. For example, a color arrangement pattern composed of two colors, or of four or more colors, may be used.

In the above-described embodiment, when the captured image is a person image, the sentence creation unit 1030 reads one person image template randomly selected from the two types stored in the storage unit 1090. However, the manner of selecting which of the two types of person image templates to read is not limited to this. For example, the sentence creation unit 1030 may select one person image template designated by the user via the operation unit (not shown). Similarly, the sentence creation unit 1030 may select one landscape image template designated by the user via the designation receiving unit.

In the above-described embodiment, an example has been described in which a word to be inserted into a blank portion of the selected template can always be obtained from the storage unit 1090. However, when a word to be inserted into a blank portion of the selected template cannot be obtained, another template may be selected instead. For example, when the landscape image template of FIG. 2D, which has the blank portion {location}, is selected for creating a sentence for a certain captured image, but the imaging location cannot be acquired from the additional information of the captured image, the landscape image template of FIG. 2C, which does not have the blank portion {location}, may be selected instead.
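The reselection of a template when a blank portion cannot be filled can be sketched as below. This is an illustrative sketch only: the template wordings, the function names, and the ordering of the candidates are hypothetical and do not reproduce the templates of FIGS. 2C and 2D.

```python
import re

# Hypothetical template texts. Blank portions use the {name} notation.
LANDSCAPE_TEMPLATES = [
    "{location}: one {adjective} shot",  # needs the imaging location
    "{date}: one {adjective} shot",      # needs only the imaging date
]


def blanks(template):
    """Return the names of the blank portions in a template."""
    return re.findall(r"\{(\w+)\}", template)


def create_sentence(templates, words):
    """Fill the first template whose blank portions can all be filled.

    words maps a blank-portion name to the word obtained for it. A template
    with a blank portion that has no word is skipped and the next is tried.
    """
    for template in templates:
        if all(name in words for name in blanks(template)):
            return template.format(**words)
    return None  # no template can be completed


# The imaging location is unavailable, so the {date} template is used instead.
print(create_sentence(LANDSCAPE_TEMPLATES, {"date": "2011/09/21", "adjective": "calm"}))
```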

In the above-described embodiment, an example has been described in which the image processing apparatus 1001 stores in the storage unit 1090 a person image template having the blank portion {number of people} and the blank portion {adjective}. However, the number and type of blank portions in the person image template are not limited to this. For example, the person image template may have one or both of the blank portion {date} and the blank portion {location} in addition to the blank portion {number of people} and the blank portion {adjective}. Further, when the image processing apparatus 1001 includes various sensors, the person image template may have a blank portion {illuminance} for inserting a word according to an imaging condition (illuminance) of the captured image, or a blank portion {temperature} for inserting a word according to an imaging condition (temperature) of the captured image.

Further, the person image template does not necessarily have the blank portion {number of people}. One example of a case where the person image template does not have the blank portion {number of people} is a case where a sentence including a word corresponding to the number of subjects is not created for the person image. In that case, the image processing apparatus 1001 naturally does not need to store a person image template having the blank portion {number of people} in the storage unit 1090.
Another example of a case where the person image template does not have the blank portion {number of people} is a case where a plurality of person image templates, each corresponding to a number of subjects, are stored in the storage unit 1090. In that case, instead of creating a sentence including a word corresponding to the number of subjects by inserting that word into the blank portion {number of people}, the image processing apparatus 1001 reads the person image template corresponding to the number of subjects from the storage unit 1090 and uses it to create a sentence including a word corresponding to the number of subjects.

In the above-described embodiment, an example has also been described in which the image processing apparatus 1001 stores in the storage unit 1090 a landscape image template having the blank portion {date} and the blank portion {adjective}, and a landscape image template having the blank portion {location} and the blank portion {adjective}. However, the number and type of blank portions in the landscape image template are not limited to this. For example, when the image processing apparatus 1001 includes various sensors, the landscape image template may have the above-described blank portion {illuminance}, blank portion {temperature}, and the like.

In the above-described embodiment, an example has been described in which the image processing apparatus 1001 stores two types of person image templates in the storage unit 1090, but one type, or three or more types, of person image templates may be stored in the storage unit 1090. Similarly, the image processing apparatus 1001 may store one type, or three or more types, of landscape image templates in the storage unit 1090.

Further, in the above-described embodiment, an example has been described in which the image processing apparatus 1001 adds the sentence to the captured image when the sentence for the captured image is created. However, the sentence may instead be stored in the storage unit 1090 in association with the captured image.

The storage unit 1090 may also store a first syntax, which is the syntax of a sentence used for an image of a first type (for example, a portrait), and a second syntax, which is the syntax of a sentence used for an image of a second type (for example, a landscape).

When the first syntax and the second syntax are stored in the storage unit 1090, the sentence creation unit 1030 may create a sentence of the first syntax using a predetermined text when the determination unit 1020 determines that the captured image is the first type image (that is, a person image), and may create a sentence of the second syntax using a predetermined text when the determination unit 1020 determines that the captured image is the second type image (that is, a landscape image).

The image processing apparatus 1001 may also include a determination unit (not shown), separate from the determination unit 1020, that determines text corresponding to at least one of the feature amount of the captured image and the imaging condition. For example, when the image input unit 1010 inputs (acquires) a captured image, this determination unit determines text corresponding to the feature amount and/or the imaging condition of the captured image as the predetermined text used for sentence creation. More specifically, for example, a plurality of texts are stored in advance in the storage unit 1090 in association with feature amounts and imaging conditions, and the determination unit selects, from the plurality of texts in the storage unit 1090, the text corresponding to the feature amount and/or the imaging condition.

That is, when the determination unit 1020 determines that the captured image is the first type image, the sentence creation unit 1030 creates a sentence of the first syntax using the text determined by the determination unit (not shown) as described above. When the determination unit 1020 determines that the captured image is the second type image, the sentence creation unit 1030 creates a sentence of the second syntax using the text determined in the same way.
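As a rough illustration of how a text selected from stored associations might be combined with the syntax for each image type, consider the sketch below. The stored texts, the two syntaxes, the key names, and the function names are all hypothetical. The description only states that texts are stored in association with feature amounts and imaging conditions and that one syntax is stored per image type.

```python
# Hypothetical stored data: texts keyed by a feature-amount or
# imaging-condition label, and one syntax per image type.
TEXTS = {"warm_colors": "cheerful", "night": "quiet"}
SYNTAXES = {
    "person": "A {text} moment with everyone",  # stands in for the first syntax
    "landscape": "A {text} view",               # stands in for the second syntax
}


def decide_text(feature=None, condition=None):
    """Select the stored text matching the feature amount and/or imaging condition."""
    for key in (feature, condition):
        if key in TEXTS:
            return TEXTS[key]
    return None


def create_sentence_for_type(image_type, feature=None, condition=None):
    """Create a sentence in the syntax for image_type using the decided text."""
    text = decide_text(feature, condition)
    if text is None:
        return None  # no stored text matches
    return SYNTAXES[image_type].format(text=text)


print(create_sentence_for_type("person", feature="warm_colors"))
print(create_sentence_for_type("landscape", condition="night"))
```

The same decided text is reused across both syntaxes, so only the sentence structure changes with the image type.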

(Second Embodiment)
Subsequently, a second embodiment of the present invention will be described with reference to the drawings. FIG. 8 is an example of a functional block diagram of an imaging apparatus 1100 according to the second embodiment of the present invention.
As illustrated in FIG. 8, the imaging device 1100 according to the present embodiment includes an imaging unit 1110, a buffer memory unit 1130, an image processing unit (image processing device) 1140, a display unit 1150, a storage unit 1160, a communication unit 1170, an operation unit 1180, a CPU (Central Processing Unit) 1190, and a bus 1300.

The imaging unit 1110 includes an optical system 1111, an imaging element 1119, and an A/D (Analog to Digital) conversion unit 1120. The optical system 1111 includes one or more lenses.

The image sensor 1119 converts, for example, an optical image formed on the light receiving surface into an electric signal and outputs the electric signal to the A/D conversion unit 1120.

Further, the image sensor 1119 outputs the image data (electrical signal) obtained when a still image capturing instruction is received via the operation unit 1180 to the A/D conversion unit 1120 as captured image data (electrical signal) of the captured still image, or stores it in the storage medium 1200 via the A/D conversion unit 1120 and the image processing unit 1140.
In addition, the image sensor 1119 outputs the image data (electrical signal) of a moving image continuously captured at predetermined intervals, obtained when a moving image capturing instruction is received via the operation unit 1180, to the A/D conversion unit 1120 as captured image data (electrical signal) of the captured moving image, or stores it in the storage medium 1200 via the A/D conversion unit 1120 and the image processing unit 1140.
In addition, in a state where no imaging instruction is received via the operation unit 1180, the image sensor 1119 outputs, for example, the continuously obtained image data (electrical signal) to the A/D conversion unit 1120 as through image data (captured image) (electrical signal), or continuously outputs it to the display unit 1150 via the A/D conversion unit 1120 and the image processing unit 1140.

Note that the optical system 1111 may be attached to and integrated with the imaging device 1100, or may be detachably attached to the imaging device 1100.

The A/D conversion unit 1120 performs analog-to-digital conversion on the electrical signal (analog signal) of the image converted by the image sensor 1119, and outputs captured image data (a captured image), which is the digital signal obtained by this conversion.

Here, the imaging unit 1110 is controlled by the CPU 1190 based on the instruction content received from the user through the operation unit 1180 and on the set imaging conditions. The imaging unit 1110 forms an optical image on the imaging element 1119 via the optical system 1111, and generates a captured image based on this optical image, converted into a digital signal by the A/D conversion unit 1120.

Note that the imaging conditions define conditions at the time of imaging such as an aperture value and an exposure value, for example.
The imaging conditions can be stored in the storage unit 1160 and referred to by the CPU 1190, for example.

The image data output from the A/D conversion unit 1120 is input, based on the set image processing flow conditions, to one or more of, for example, the image processing unit 1140, the display unit 1150, the buffer memory unit 1130, and (via the communication unit 1170) the storage medium 1200.
Note that the image processing flow conditions define the flow (path) by which image data is processed, for example, outputting the image data output from the A/D conversion unit 1120 to the storage medium 1200 via the image processing unit 1140. The image processing flow conditions can be stored in the storage unit 1160 and referred to by the CPU 1190, for example.

Specifically, when the imaging element 1119 outputs the electrical signal of an image, obtained when a still image capturing instruction is received via the operation unit 1180, to the A/D conversion unit 1120 as the electrical signal of the captured still image, the image data of the still image output from the A/D conversion unit 1120 is stored in the storage medium 1200 via the image processing unit 1140.
In addition, when the imaging element 1119 outputs the electrical signal of a moving image continuously captured at predetermined intervals, obtained when a moving image capturing instruction is received via the operation unit 1180, to the A/D conversion unit 1120 as the electrical signal of the captured moving image, the image data of the moving image output from the A/D conversion unit 1120 is stored in the storage medium 1200 via the image processing unit 1140.
Further, when the imaging element 1119, in a state where no imaging instruction is received via the operation unit 1180, outputs the electrical signal of continuously obtained images to the A/D conversion unit 1120 as the electrical signal of a through image, the through image data output from the A/D conversion unit 1120 is continuously output to the display unit 1150 via the image processing unit 1140.
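The three flows just described can be summarized in a small routing sketch. The enum, the function name, and the string labels are illustrative assumptions. The sketch only records which units the image data passes through after the A/D conversion unit 1120 in each case, and does not model the image processing flow conditions themselves.

```python
from enum import Enum


class Capture(Enum):
    STILL = "still"      # still image capturing instruction received
    MOVIE = "movie"      # moving image capturing instruction received
    THROUGH = "through"  # no imaging instruction received


def route(capture):
    """Return the units that image data passes through after A/D conversion."""
    if capture in (Capture.STILL, Capture.MOVIE):
        # Captured data is stored via the image processing unit.
        return ["image processing unit 1140", "storage medium 1200"]
    # Through image data is continuously shown on the display.
    return ["image processing unit 1140", "display unit 1150"]


for capture in Capture:
    print(capture.value, "->", " -> ".join(route(capture)))
```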

As a configuration for passing the image data output from the A/D conversion unit 1120 through the image processing unit 1140, for example, the image data output from the A/D conversion unit 1120 may be input directly to the image processing unit 1140. Alternatively, the image data output from the A/D conversion unit 1120 may be stored in the buffer memory unit 1130, and the image data stored in the buffer memory unit 1130 may then be input to the image processing unit 1140.

The image processing unit 1140 executes image processing on the image data stored in the buffer memory unit 1130 based on the image processing conditions stored in the storage unit 1160. Details of the image processing unit 1140 will be described later. Note that the image data stored in the buffer memory unit 1130 is the image data input to the image processing unit 1140, for example, the above-described captured image data, through image data, or captured image data read from the storage medium 1200.

The image processing unit 1140 performs predetermined image processing on the input image data.
Here, one example of image data input to the image processing unit 1140 is image data output from the A/D conversion unit 1120. As another example, image data stored in the buffer memory unit 1130 can be read out and input. As yet another example, image data stored in the storage medium 1200 can be read out and input via the communication unit 1170.

The operation unit 1180 includes, for example, a power switch, a shutter button, a cross key, a confirmation button, and other operation keys. The operation unit 1180 receives a user operation input by being operated by the user, and outputs it to the CPU 1190.

The display unit 1150 is a liquid crystal display, for example, and displays image data, an operation screen, and the like. For example, the display unit 1150 displays a captured image to which text is added by the image processing unit 1140.

Further, for example, the display unit 1150 can receive and display image data that has been subjected to predetermined image processing by the image processing unit 1140, and can also receive and display image data output from the A/D conversion unit 1120, image data read from the buffer memory unit 1130, or image data read from the storage medium 1200.

The storage unit 1160 stores various information.
The buffer memory unit 1130 temporarily stores image data captured by the imaging unit 1110.
The buffer memory unit 1130 temporarily stores the image data read from the storage medium 1200.

The communication unit 1170 is connected to a removable storage medium 1200 such as a card memory, and writes (stores) captured image data to the storage medium 1200, reads image data from the storage medium 1200, or erases image data stored in the storage medium 1200.
The storage medium 1200 is a storage unit that is detachably connected to the imaging device 1100, and stores, for example, image data (captured image data) generated by the imaging unit 1110.

The CPU 1190 controls each component included in the imaging device 1100. The bus 1300 is connected to the imaging unit 1110, the CPU 1190, the operation unit 1180, the image processing unit 1140, the display unit 1150, the storage unit 1160, the buffer memory unit 1130, and the communication unit 1170, and transfers the image data, control signals, and the like output from each unit.

Note that the image processing unit 1140 of the imaging device 1100 corresponds to the determination unit 1020, the sentence creation unit 1030, and the sentence addition unit 1040 of the image processing apparatus 1001 according to the first embodiment.
The storage unit 1160 of the imaging device 1100 corresponds to the storage unit 1090 of the image processing device 1001 according to the first embodiment.

For example, the image processing unit 1140 executes the processes of the determination unit 1020, the sentence creation unit 1030, and the sentence addition unit 1040 of the image processing apparatus 1001 according to the first embodiment.
Specifically, the storage unit 1160 stores at least information stored in the storage unit 1090 of the image processing apparatus 1001 according to the first embodiment.

A program for executing each process of the image processing apparatus 1001 according to the first embodiment may be recorded on a computer-readable recording medium, and the above-described processes of the image processing apparatus 1001 may be performed by causing a computer system to read and execute the program recorded on the recording medium. Here, the “computer system” may include an OS (Operating System) and hardware such as peripheral devices. The “computer system” also includes a homepage providing environment (or display environment) if a WWW system is used. The “computer-readable recording medium” refers to a flexible disk, a magneto-optical disk, a ROM (Read Only Memory), a writable nonvolatile memory such as a flash memory, a portable medium such as a CD (Compact Disc)-ROM, a USB memory connected via a USB (Universal Serial Bus) I/F (interface), or a storage device such as a hard disk built into a computer system.

Further, the “computer-readable recording medium” also includes a medium that holds the program for a certain period of time, such as a volatile memory (for example, DRAM (Dynamic Random Access Memory)) in a computer system that serves as a server or a client when the program is transmitted via a network such as the Internet or a communication line such as a telephone line. The program may be transmitted from a computer system storing the program in a storage device or the like to another computer system via a transmission medium or by a transmission wave in the transmission medium. Here, the “transmission medium” for transmitting the program refers to a medium having a function of transmitting information, such as a network (communication network) such as the Internet or a communication line such as a telephone line. The program may realize only a part of the functions described above. Furthermore, the program may be a so-called difference file (difference program), which realizes the functions described above in combination with a program already recorded in the computer system.

(Third embodiment)
FIG. 9 is a schematic block diagram illustrating a configuration of the imaging system 2001 according to the present embodiment.
The imaging device 2100 illustrated in FIG. 9 includes an imaging unit 2002, a camera control unit 2003, an image processing unit 2004, a storage unit 2005, a buffer memory unit 2006, a display unit 2007, an operation unit 2011, a communication unit 2012, a power supply unit 2013, and a bus 2015.

The imaging unit 2002 includes a lens unit 2021, an imaging element 2022, and an AD conversion unit 2023. The imaging unit 2002 images a subject and generates image data. The imaging unit 2002 is controlled by the camera control unit 2003 based on the set imaging conditions (for example, aperture value, exposure, etc.), and forms the optical image of the subject input via the lens unit 2021 on the imaging surface of the imaging element 2022. In addition, the imaging unit 2002 converts the analog signal output from the imaging element 2022 into a digital signal in the AD conversion unit 2023, and generates image data.
Note that the lens unit 2021 described above may be attached to and integrated with the imaging device 2100, or may be detachably attached to the imaging device 2100.

The imaging element 2022 outputs an analog signal obtained by photoelectrically converting the optical image formed on the imaging surface to the AD conversion unit 2023. The AD conversion unit 2023 converts the analog signal input from the image sensor 2022 into a digital signal, and outputs image data that is the converted digital signal.

For example, the imaging unit 2002 outputs image data of a captured still image in response to a still image shooting operation in the operation unit 2011. Further, the imaging unit 2002 outputs image data of moving images continuously captured at a predetermined interval in accordance with a moving image shooting operation in the operation unit 2011. Then, still image data and moving image data captured by the imaging unit 2002 are recorded in the storage medium 2200 via the buffer memory unit 2006 and the image processing unit 2004 under the control of the camera control unit 2003. Further, the imaging unit 2002 outputs image data obtained continuously at a predetermined interval as through image data (through image) in a shooting standby state in which no shooting operation is performed in the operation unit 2011. The through image data obtained by the imaging unit 2002 is displayed on the display unit 2007 via the buffer memory unit 2006 and the image processing unit 2004 under the control of the camera control unit 2003.

The image processing unit 2004 executes image processing on the image data stored in the buffer memory unit 2006 based on the image processing conditions stored in the storage unit 2005. Here, the image data stored in the buffer memory unit 2006 or the storage medium 2200 is, for example, still image data, through image data, or moving image data captured by the imaging unit 2002, or image data read out from the storage medium 2200.

The storage unit 2005 stores predetermined shooting conditions, image processing conditions, reproduction control conditions, display control conditions, recording control conditions, output control conditions, and the like for controlling the imaging apparatus 2100. For example, the storage unit 2005 is a ROM.
Note that the storage unit 2005 may record image data of captured moving images and image data of still images. In this case, for example, the storage unit 2005 may be a flash memory or the like.

The buffer memory unit 2006 is used as a work area when the camera control unit 2003 controls the imaging device 2100. Still image data, through image data, or moving image data captured by the imaging unit 2002, or image data read from the storage medium 2200, is temporarily stored in the buffer memory unit 2006 in the course of image processing under the control of the camera control unit 2003. The buffer memory unit 2006 is, for example, a RAM (Random Access Memory).

The display unit 2007 is, for example, a liquid crystal display, and displays an image based on image data captured by the imaging unit 2002, an image based on image data read from the storage medium 2200, a menu screen, information about the operating status and settings of the imaging device 2100, and the like.

The operation unit 2011 includes operation switches for an operator to input operations to the imaging device 2100. For example, the operation unit 2011 includes a power switch, a release switch, a mode switch, a menu switch, an up/down/left/right selection switch, a confirmation switch, a cancel switch, and other operation switches. Each of the switches provided in the operation unit 2011 outputs, in response to being operated, an operation signal corresponding to that operation to the camera control unit 2003.

A removable storage medium 2200 such as a card memory is inserted into the communication unit 2012.
The image data is written to, read from, or deleted from the storage medium 2200 via the communication unit 2012.
The storage medium 2200 is a storage unit that is detachably connected to the imaging device 2100. For example, image data generated by being captured by the imaging unit 2002 is recorded therein. In the present embodiment, the image data recorded on the storage medium 2200 is, for example, an Exif file.

The power supply unit 2013 supplies power to each unit included in the imaging apparatus 2100. The power supply unit 2013 includes, for example, a battery, and converts the voltage of power supplied from the battery into the operating voltage in each of the above-described units. The power supply unit 2013 supplies the converted power of the operating voltage to the above-described units under the control of the camera control unit 2003 based on the operation mode (for example, the shooting operation mode or the sleep mode) of the imaging device 2100.

The bus 2015 is connected to an imaging unit 2002, a camera control unit 2003, an image processing unit 2004, a storage unit 2005, a buffer memory unit 2006, a display unit 2007, an operation unit 2011, and a communication unit 2012, and image data output from each unit. And transfer control signals.

The camera control unit 2003 controls each unit included in the imaging device 2100.

FIG. 10 is a block diagram of the image processing unit 2004 according to the present embodiment.
As shown in FIG. 10, the image processing unit 2004 includes an image acquisition unit 2041, an image identification information acquisition unit 2042 (scene determination unit), a color space vector generation unit 2043, a main color extraction unit 2044, a table storage unit 2045, a first label generation unit 2046, a second label generation unit 2047, and a label output unit 2048.

The image acquisition unit 2041 reads the image data captured by the imaging unit 2002 and the image identification information stored in association with the image data from the storage medium 2200 via the bus 2015. Image data read by the image acquisition unit 2041 is image data selected by the user of the imaging system 2001 by operating the operation unit 2011. The image acquisition unit 2041 outputs the acquired image data to the color space vector generation unit 2043. The image acquisition unit 2041 outputs the acquired image identification information to the image identification information acquisition unit 2042.

FIG. 11 is a diagram illustrating an example of image identification information stored in association with image data in the storage medium 2200 according to the present embodiment.
In FIG. 11, the left column shows example items, and the right column shows example information. As shown in FIG. 11, the items stored in association with image data are the imaging date and time, overall image resolution, shutter speed, aperture value (F value), ISO sensitivity, photometry mode, presence or absence of flash use, scene mode, still image or moving image, and the like. These pieces of image identification information are information set by the photographer using the operation unit 2011 of the imaging system 2001 at the time of imaging and information automatically set by the imaging device 2100. Further, the Exif standard information stored in association with the image data may be used as the image identification information.
In the item, “scene” (also referred to as a shooting mode) is a combination pattern such as shutter speed, F value, ISO sensitivity, and focal length preset in the imaging apparatus 2100. These combination patterns are preset according to the object to be imaged, stored in the storage medium 2200, and manually selected by the user from the operation unit 2011. The scene is, for example, portrait, landscape, sport, night view portrait, party, beach, snow, sunset, night view, close-up, cooking, museum, fireworks, backlight, children, pets, and the like.

Returning to FIG. 10, the image identification information acquisition unit 2042 extracts the shooting information set in the captured image data from the image identification information output by the image acquisition unit 2041, and outputs the extracted shooting information to the first label generation unit 2046. The shooting information is information necessary for the first label generation unit 2046 to generate the first label, such as the scene and the shooting date and time.

The color space vector generation unit 2043 converts the image data output from the image acquisition unit 2041 into vectors in a predetermined color space. The predetermined color space is, for example, HSV (Hue, Saturation, Brightness).
The color space vector generation unit 2043 classifies all pixels of the image data for each color vector, detects the frequency for each color vector, and generates a color vector frequency distribution. The color space vector generation unit 2043 outputs information indicating the frequency distribution of the generated color vector to the main color extraction unit 2044.
When the image data is HSV, the color vector is expressed as the following equation (4).

(color vector) = (H, S, V) = (i, j, k)   ... (4)

In equation (4), i, j, and k are each integers from 0 to 100, obtained when hue, saturation, and brightness are each normalized to 0 to 100%.

The main color extraction unit 2044 extracts three colors as main colors, in descending order of frequency, from the information indicating the frequency distribution of the color vectors output from the color space vector generation unit 2043, and outputs information indicating the extracted main colors to the first label generation unit 2046. A color with a high frequency is a color for which there are many pixels with the same color vector. The information indicating the main colors is the color vector of equation (4) and the frequency (number of pixels) for each color vector.
In the present embodiment, the color space vector generation unit 2043 and the main color extraction unit 2044 may together constitute a main color extraction unit.
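The frequency distribution and main color extraction described above can be sketched as follows. This is a minimal sketch under stated assumptions: pixels are given as 8-bit RGB triples, each HSV component is rounded to an integer from 0 to 100, and the function names are hypothetical. The embodiment does not specify the input pixel format or the rounding.

```python
import colorsys
from collections import Counter


def color_vector(r, g, b):
    """Map an 8-bit RGB pixel to an HSV color vector (i, j, k), each 0 to 100."""
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    return (round(h * 100), round(s * 100), round(v * 100))


def main_colors(pixels, count=3):
    """Return the most frequent color vectors with their pixel counts.

    The result is a list of (color_vector, frequency) pairs in descending
    order of frequency, corresponding to the first, second, and third colors.
    """
    histogram = Counter(color_vector(*pixel) for pixel in pixels)
    return histogram.most_common(count)


# Tiny synthetic image: 5 red, 3 green, 2 blue pixels and 1 white pixel.
pixels = [(255, 0, 0)] * 5 + [(0, 255, 0)] * 3 + [(0, 0, 255)] * 2 + [(255, 255, 255)]
print(main_colors(pixels))
```

Counting identical color vectors with `Counter` matches the statement that a high-frequency color is one shared by many pixels. A practical implementation would likely use coarser bins than 101 levels per component.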

In the table storage unit 2045 (storage unit), a first label is stored in association with each scene and each combination of main colors.

FIG. 12 is a diagram illustrating an example of combinations of main colors and first labels stored in the table storage unit 2045 according to the present embodiment.
As shown in FIG. 12, a first label is defined in advance for each scene and for each combination of three of the main colors extracted from the image data: the first color, which has the highest frequency; the second color, which has the next highest frequency after the first color; and the third color, which has the next highest frequency after the second color. These first labels are stored in the table storage unit 2045. For example, for the combination in which the first color is color 1, the second color is color 2, and the third color is color 3, the first label of scene 1 is label (1, 1), and the first label of scene n is label (1, n). Similarly, for the combination in which the first color is color m, the second color is color m, and the third color is color m, the first label of scene 1 is label (m, 1), and the first label of scene n is label (m, n).
In this way, labels for each scene and each combination of the three main colors are defined in advance by experiments, questionnaires, and the like, and are stored in the table storage unit 2045. Here, the frequency ratio of the first color, the second color, and the third color is assumed to be 1:1:1.

Returning to FIG. 10, the first label generation unit 2046 reads, from the table storage unit 2045, the first label stored in association with the shooting information output by the image identification information acquisition unit 2042 and the information indicating the main colors output by the main color extraction unit 2044. The first label generation unit 2046 outputs information indicating the read first label and the information indicating the main colors output by the main color extraction unit 2044 to the second label generation unit 2047. The first label generation unit 2046 determines the scene using, for example, information included in Exif, which is the shooting information.

The second label generation unit 2047 extracts the frequency of each color vector from the information indicating the main colors output from the main color extraction unit 2044, normalizes the frequencies of the three color vectors using the extracted frequencies, and calculates the ratio of the three main colors. The second label generation unit 2047 generates a modification label (third label) that modifies the first label based on the calculated ratio of the three main colors, and generates a second label for the image data by modifying the first label output by the first label generation unit 2046 with the generated modification label. The second label generation unit 2047 outputs information indicating the generated second label to the label output unit 2048.

The label output unit 2048 stores information indicating the second label output from the second label generation unit 2047 in the table storage unit 2045 in association with the image data. Alternatively, the label output unit 2048 stores information indicating the label output from the second label generation unit 2047 in the storage medium 2200 in association with the image data.

FIG. 13 is a diagram illustrating an example of main colors of image data according to the present embodiment.
In FIG. 13, the horizontal axis is a color vector, and the vertical axis is the frequency of a color vector (color information).
The example of FIG. 13 is a graph of the frequency distribution of the color vectors obtained when the color space vector generation unit 2043 decomposes the image data into HSV color vectors (HSV = (iₘ, jₘ, kₘ), where m is a natural number from 0 to 100). In FIG. 13, the color vectors are arranged in order from H (hue) = 0, S (saturation) = 0, V (lightness) = 0 at the left end to H = 100, S = 100, V = 100 at the right end, and the result of calculating the frequency for each color vector is shown schematically. In the example of FIG. 13, the first color c2001, which has the highest frequency, is the vector HSV = (i₁, j₆₉, k₁₀₀) and is rose. The second color c2002, which has the next highest frequency after the first color, is the vector HSV = (i₁₃, j₅₂, k₁₀₀) and is light yellow (sulfur). The third color c2003, which has the next highest frequency after the second color, is the vector HSV = (i₄₀, j₆₅, k₈₀) and is deep celadon (emerald).

FIGS. 14A and 14B are diagrams illustrating an example of labeling of the main colors extracted in FIG. 13. The color vectors in FIGS. 13, 14A, and 14B are described assuming, for example, image data whose scene mode is portrait.
FIG. 14A shows an example of the first color, the second color, and the third color extracted in FIG. 13. In FIG. 14A, the colors are shown schematically from the left in the order of the color vectors shown in FIG. 13. The first label generation unit 2046 reads, from the table storage unit 2045, the first label stored in association with the combination of the first color, the second color, and the third color extracted by the main color extraction unit 2044. In this case, the first label for this combination is stored as “fun”. As shown in FIG. 14A, the widths of the first color, the second color, and the third color before normalization are L2001, L2002, and L2003, which are equal in length. The length L2010 is the sum of the widths L2001, L2002, and L2003.

FIG. 14B is a diagram after the extracted first color, second color, and third color have been normalized by frequency and their widths have been corrected to L2001′, L2002′, and L2003′. The total width L2010 is the same as in FIG. 14A. In the example of FIG. 14B, since the frequency of the first color is higher than the frequencies of the second and third colors, the second label generation unit 2047 generates, based on a predetermined rule, the modification label “very” to modify the first label “fun” read by the first label generation unit 2046. The predetermined rule is that, when the frequency of the first color exceeds those of the second color and the third color by more than a predetermined threshold, the second label generation unit 2047 generates the modification label “very” and modifies the first label “fun” with the generated modification label to generate the second label “very fun”. The modification label is, for example, a word that emphasizes the first label.

Next, examples of modification labels will be described.
As shown in FIG. 14A, before normalization, the widths or areas of the three colors extracted by the main color extraction unit 2044 are in the ratio 1:1:1. After normalization based on the frequencies of the color vectors, the widths or areas of the three colors are corrected as shown in FIG. 14B. For example, when the ratio of the first color is larger than about 67% of the entire L2010, the second label generation unit 2047 modifies the first label with the modification label “very” and sets the result as the second label. When the ratio of the first color is larger than about 50% and smaller than 67% of the entire L2010, the second label generation unit 2047 determines that there is no modification label. That is, the second label generation unit 2047 sets the first label as the second label without modifying it. When the ratio of the first color is about 33% of the entire L2010, the second label generation unit 2047 modifies the first label with the modification label “first” and sets the result as the second label.
As described above, the second label generation unit 2047 generates the modification label according to the first label. For example, for each first label, the modification labels that can modify it may be stored in advance in the table storage unit 2045 in association with that first label.
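The normalization and the threshold rule described above could be sketched in Python roughly as follows. This is a minimal sketch and not the embodiment's implementation: the function names, the exact threshold value 0.67, and the sample frequencies are assumptions, and only the "larger than about 67%" case is modeled; every other ratio leaves the first label unmodified here.

```python
def color_ratios(frequencies):
    """Normalize the pixel counts of the main colors so they sum to 1."""
    total = sum(frequencies)
    return [f / total for f in frequencies]

def make_second_label(first_label, frequencies, threshold=0.67, emphasis="very"):
    """Prefix an emphasizing modification label when the first color dominates.

    `threshold` and `emphasis` are illustrative defaults (assumptions).
    """
    first_ratio = color_ratios(frequencies)[0]
    if first_ratio > threshold:
        return f"{emphasis} {first_label}"
    # No modification label: the first label becomes the second label as is.
    return first_label

print(make_second_label("fun", [700, 200, 100]))  # very fun
print(make_second_label("fun", [550, 300, 150]))  # fun
```

Keeping the threshold and the emphasizing word as parameters reflects the remark that the applicable modification labels may be stored per first label.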

Next, examples of main colors for each scene will be described with reference to FIGS. 15A to 17B.
FIGS. 15A and 15B are diagrams of sports image data and color vectors according to the present embodiment. FIG. 15A shows sports image data, and FIG. 15B is a graph of the sports color vectors. FIGS. 16A and 16B are diagrams of portrait image data and color vectors according to the present embodiment. FIG. 16A shows portrait image data, and FIG. 16B is a graph of the portrait color vectors. FIGS. 17A and 17B are diagrams of landscape image data and color vectors according to the present embodiment. FIG. 17A shows landscape image data, and FIG. 17B is a graph of the landscape color vectors. In FIGS. 15B, 16B, and 17B, the horizontal axis is the color vector, and the vertical axis is the frequency (number of pixels).

As shown in FIGS. 15A and 15B, the image data of FIG. 15A is decomposed into color vectors for each pixel, and the frequency (number of pixels) of each color vector is graphed as shown in FIG. 15B. The main color extraction unit 2044 extracts, for example, three colors c2011, c2012, and c2013 having a large number of pixels from such color vector information.

As shown in FIGS. 16A and 16B, the image data of FIG. 16A is decomposed into color vectors for each pixel, and the frequency (number of pixels) of each color vector is graphed as shown in FIG. 16B. The main color extraction unit 2044 extracts, for example, three colors c2021, c2022, and c2023 having a large number of pixels from such color vector information.
As shown in FIGS. 17A and 17B, the image data of FIG. 17A is decomposed into color vectors for each pixel, and the frequency (number of pixels) of each color vector is graphed as shown in FIG. 17B. The main color extraction unit 2044 extracts, for example, three colors c2031, c2032, and c2033 having a large number of pixels from such color vector information.

FIG. 18 is a diagram for explaining an example of a first label of a combination of main colors for each scene according to the present embodiment. In FIG. 18, rows represent scenes, and columns represent color vectors.
In FIG. 18, when the image data is HSV, the hue, saturation, and lightness of the colors in the color combination (color 1, color 2, color 3) are, for example, (94, 100, 25) for color 1 (maroon), (8, 100, 47) for color 2 (tobacco color, coffee brown), and (81, 100, 28) for color 3 (deep purple, dusky violet).
The hue, saturation, and lightness of the colors in the color combination (color 4, color 5, color 6) are, for example, (1, 69, 100) for color 4 (rose), (13, 25, 100) for color 5 (ivory), and (52, 36, 91) for color 6 (light blue, aqua blue).
Further, the hue, saturation, and lightness of the colors in the color combination (color 7, color 8, color 9) are, for example, (40, 65, 80) for color 7 (deep celadon, emerald), (0, 0, 100) for color 8 (white), and (59, 38, 87) for color 9 (salvia blue).

As shown in FIG. 18, for example, when the color combination is (color 1, color 2, color 3), the first label for a portrait scene is stored in the table storage unit 2045 as “dandy”. For the same color combination (color 1, color 2, color 3), the first label for a landscape scene is stored in the table storage unit 2045 as “interesting”. Further, for the same color combination (color 1, color 2, color 3), the first label for a sports scene is stored in the table storage unit 2045 as “(Rugby style) masculine”.
Also, as shown in FIG. 18, for example, when the color combination is (color 4, color 5, color 6), the first label for a portrait scene is stored in the table storage unit 2045 as “childish”. For the same color combination (color 4, color 5, color 6), the first label for a landscape scene is stored in the table storage unit 2045 as “soft”. In addition, for the same color combination (color 4, color 5, color 6), the first label for a sports scene is stored in the table storage unit 2045 as “(tennis style) lively”.
Also, as shown in FIG. 18, for example, when the color combination is (color 7, color 8, color 9), the first label for a portrait scene is stored in the table storage unit 2045 as “youthful”. For the same color combination (color 7, color 8, color 9), the first label for a landscape scene is stored in the table storage unit 2045 as “(fresh green image) refreshing”.
In addition, for the same color combination (color 7, color 8, color 9), the first label for a sports scene is stored in the table storage unit 2045 as “(Sea sports style) refreshing”.
Also, as shown in FIG. 18, the table storage unit 2045 may store, in association with each color combination, not only the first label, which is an adjective or adverb, but also a word representing an image. The word representing an image is, for example, “rugby style” or “fresh green image”.
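One way to picture the table of FIG. 18 is as a mapping keyed by scene and color combination. The following Python sketch uses the HSV values and labels given above for two of the combinations; the dictionary layout, the constant names, and the function `read_first_label` are assumptions made for illustration and do not describe how the table storage unit 2045 is actually organized.

```python
# HSV color vectors (0 to 100) for colors 1 to 3 and 7 to 9.
MAROON = (94, 100, 25)
COFFEE_BROWN = (8, 100, 47)
DUSKY_VIOLET = (81, 100, 28)
EMERALD = (40, 65, 80)
WHITE = (0, 0, 100)
SALVIA_BLUE = (59, 38, 87)

COMBO_1_2_3 = (MAROON, COFFEE_BROWN, DUSKY_VIOLET)
COMBO_7_8_9 = (EMERALD, WHITE, SALVIA_BLUE)

# Hypothetical in-memory form of the table: (scene, combination) -> first label.
FIRST_LABELS = {
    ("portrait", COMBO_1_2_3): "dandy",
    ("landscape", COMBO_1_2_3): "interesting",
    ("sports", COMBO_1_2_3): "masculine",
    ("portrait", COMBO_7_8_9): "youthful",
    ("landscape", COMBO_7_8_9): "refreshing",
    ("sports", COMBO_7_8_9): "refreshing",
}

def read_first_label(scene, main_colors):
    """Return the first label for a scene and an ordered color combination,
    or None when no label is stored for that key."""
    return FIRST_LABELS.get((scene, tuple(main_colors)))

print(read_first_label("portrait", [MAROON, COFFEE_BROWN, DUSKY_VIOLET]))  # dandy
print(read_first_label("sports", COMBO_7_8_9))  # refreshing
```

Because the key includes the scene, the same three main colors yield different first labels for portrait, landscape, and sports image data.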

FIG. 19 is a diagram illustrating an example of first labels based on time, season, and color vector according to the present embodiment. In FIG. 19, the color vectors are for HSV image data, and the color combination is (color 7, color 8, color 9) described with reference to FIG. 18. In FIG. 19, the columns represent times and seasons, and the rows give the labels for each time and season for the color combination (color 7, color 8, color 9).
As shown in FIG. 19, the first label of the color combination (color 7, color 8, color 9) is stored in the table storage unit 2045 as “fresh” when the time is morning, “rainy” when the time is noon, and “dawn is near” when the time is night.
As shown in FIG. 19, the first label of the color combination (color 7, color 8, color 9) is stored in the table storage unit 2045 as “chilly” when the season is spring, “cool” when the season is summer, “cool” when the season is autumn, and “cold” when the season is winter.
For such information regarding the time and season, the first label generation unit 2046 reads the first label from the table storage unit 2045 based on the shooting date and time included in the image identification information acquired by the image identification information acquisition unit 2042.
Further, as shown in FIG. 19, the first label may be the same in spring and autumn for the same color combination (color 7, color 8, and color 9).
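Reading a time-based and a season-based first label from the shooting date and time might look like the following Python sketch. The labels are those listed above for the color combination (color 7, color 8, color 9); the hour boundaries, the month-to-season mapping (northern hemisphere), and all function names are assumptions, since the embodiment does not define them. The date string format is the usual Exif "YYYY:MM:DD HH:MM:SS" form.

```python
from datetime import datetime

# First labels for (color 7, color 8, color 9) by time and by season.
TIME_LABELS = {"morning": "fresh", "noon": "rainy", "night": "dawn is near"}
SEASON_LABELS = {"spring": "chilly", "summer": "cool", "autumn": "cool", "winter": "cold"}

def time_of_day(hour):
    # Assumed boundaries; not specified in the embodiment.
    if 5 <= hour < 11:
        return "morning"
    if 11 <= hour < 17:
        return "noon"
    return "night"

def season_of(month):
    # Assumed northern-hemisphere seasons by calendar month.
    if 3 <= month <= 5:
        return "spring"
    if 6 <= month <= 8:
        return "summer"
    if 9 <= month <= 11:
        return "autumn"
    return "winter"

def labels_from_shooting_datetime(text):
    """Return (time label, season label) for an Exif-style date string."""
    shot = datetime.strptime(text, "%Y:%m:%d %H:%M:%S")
    return TIME_LABELS[time_of_day(shot.hour)], SEASON_LABELS[season_of(shot.month)]

print(labels_from_shooting_datetime("2012:09:21 08:30:00"))  # ('fresh', 'cool')
```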

Next, the label generation processing performed by the imaging apparatus 2100 will be described with reference to FIG. 20. FIG. 20 is a flowchart of label generation performed by the imaging apparatus 2100 according to the present embodiment.

(Step S2001) The imaging unit 2002 of the imaging apparatus 2100 captures an image based on the control of the camera control unit 2003. Next, the imaging unit 2002 converts the captured image data into digital data by the AD conversion unit 2023, and stores the converted image data in the storage medium 2200.
Next, the camera control unit 2003 stores, in the storage medium 2200 in association with the captured image data, image identification information including the imaging conditions set or selected by the user via the operation unit 2011 at the time of imaging and the information automatically set or acquired by the imaging device 2100 at the time of imaging. After step S2001 is completed, the process proceeds to step S2002.

(Step S2002) Next, the image acquisition unit 2041 of the image processing unit 2004 reads, from the storage medium 2200 via the bus 2015, the image data captured by the imaging unit 2002 and the image identification information stored in association with the image data. Note that the image data read by the image acquisition unit 2041 is image data selected by the user of the imaging system 2001 by operating the operation unit 2011.
Next, the image acquisition unit 2041 outputs the acquired image data to the color space vector generation unit 2043. Next, the image acquisition unit 2041 outputs the acquired image identification information to the image identification information acquisition unit 2042. After step S2002, the process proceeds to step S2003.

(Step S2003) Next, the image identification information acquisition unit 2042 extracts, from the image identification information output by the image acquisition unit 2041, the shooting information set for the captured image data, and outputs the extracted shooting information to the first label generation unit 2046. After step S2003 ends, the process proceeds to step S2004.

(Step S2004) Next, the color space vector generation unit 2043 converts the image data output from the image acquisition unit 2041 into a vector in a predetermined color space. The predetermined color space is, for example, HSV. Next, the color space vector generation unit 2043 classifies all pixels of the image data for each generated color vector, detects the frequency for each color vector, and generates a color vector frequency distribution. Next, the color space vector generation unit 2043 outputs information indicating the frequency distribution of the generated color vector to the main color extraction unit 2044. After step S2004 ends, the process proceeds to step S2005.
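The conversion and classification in step S2004 could be sketched as follows in Python. The sketch assumes 8-bit RGB input pixels and uses the standard library `colorsys` module for the RGB to HSV conversion; rounding each component to an integer from 0 to 100 and the function names are assumptions made here, not details given in the embodiment.

```python
import colorsys
from collections import Counter

def to_hsv_vector(r, g, b):
    """Convert an 8-bit RGB pixel to an (H, S, V) vector with values 0 to 100."""
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    return (round(h * 100), round(s * 100), round(v * 100))

def color_vector_histogram(rgb_pixels):
    """Classify all pixels by color vector and count the frequency of each."""
    return Counter(to_hsv_vector(r, g, b) for r, g, b in rgb_pixels)

# Tiny made-up "image": three white pixels, two red pixels, one black pixel.
pixels = [(255, 255, 255)] * 3 + [(255, 0, 0)] * 2 + [(0, 0, 0)]
histogram = color_vector_histogram(pixels)
print(histogram[(0, 0, 100)])    # 3 (white)
print(histogram[(0, 100, 100)])  # 2 (red)
```

The resulting mapping from color vector to pixel count plays the role of the frequency distribution passed on to the main color extraction unit 2044.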

(Step S2005) Next, the main color extraction unit 2044 extracts three colors as the main colors, in descending order of frequency, from the information indicating the frequency distribution of the color vectors output from the color space vector generation unit 2043, and outputs information indicating the extracted main colors to the first label generation unit 2046. After step S2005, the process proceeds to step S2006.

(Step S2006) Next, the first label generation unit 2046 reads, from the table storage unit 2045, the first label stored in association with the shooting information output by the image identification information acquisition unit 2042 and the information indicating the main colors output by the main color extraction unit 2044. Next, the first label generation unit 2046 outputs information indicating the read first label and the information indicating the main colors output by the main color extraction unit 2044 to the second label generation unit 2047.
When no first label is stored in the table storage unit 2045 in association with the shooting information output by the image identification information acquisition unit 2042 and the information indicating the main colors output by the main color extraction unit 2044, the first label generation unit 2046 determines, for example, whether a first label of another scene is recorded for the same main colors. When it determines that a first label of another scene is recorded for the same main colors, the first label generation unit 2046 may read the first label of the other scene for the same main colors from the table storage unit 2045. On the other hand, when it determines that no first label of another scene is recorded for the same main colors, the first label generation unit 2046 may read, from the table storage unit 2045, the label stored in association with the color vectors that are closest to the main colors and have the same scene.
After step S2006 ends, the process proceeds to step S2007.
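The fallback order described for step S2006 (exact match, then another scene with the same main colors, then the closest colors within the same scene) might be sketched as follows. The table layout, the function names, and in particular the distance measure (the sum of Euclidean distances between corresponding HSV vectors, ignoring the circular nature of hue) are assumptions chosen for illustration; the embodiment does not specify how "closest" is measured.

```python
import math

def combo_distance(combo_a, combo_b):
    """Assumed distance between two ordered combinations of color vectors."""
    return sum(math.dist(a, b) for a, b in zip(combo_a, combo_b))

def read_first_label_with_fallback(table, scene, main_colors):
    """`table` maps (scene, color combination) to a first label."""
    main_colors = tuple(main_colors)
    # 1) Label stored for this scene and these main colors.
    if (scene, main_colors) in table:
        return table[(scene, main_colors)]
    # 2) Label of another scene recorded for the same main colors.
    for (_, colors), label in table.items():
        if colors == main_colors:
            return label
    # 3) Label of the same scene whose colors are closest to the main colors.
    same_scene = [(colors, label) for (s, colors), label in table.items() if s == scene]
    if not same_scene:
        return None
    return min(same_scene, key=lambda item: combo_distance(item[0], main_colors))[1]

combo_a = ((94, 100, 25), (8, 100, 47), (81, 100, 28))
combo_b = ((40, 65, 80), (0, 0, 100), (59, 38, 87))
near_b = ((41, 64, 80), (0, 0, 100), (60, 38, 86))
table = {("portrait", combo_a): "dandy", ("landscape", combo_b): "refreshing"}

print(read_first_label_with_fallback(table, "sports", combo_a))    # dandy
print(read_first_label_with_fallback(table, "landscape", near_b))  # refreshing
```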

(Step S2007) Next, the second label generation unit 2047 normalizes the frequency of each color vector from the information indicating the main color output by the main color extraction unit 2044, and calculates the ratio of the three main colors. After step S2007 ends, the process proceeds to step S2008.

(Step S2008) Next, the second label generation unit 2047 generates a modification label that modifies the first label output by the first label generation unit 2046 based on the calculated ratio of the three main colors, and generates a second label by modifying the first label with the generated modification label. Next, the second label generation unit 2047 outputs information indicating the generated second label to the label output unit 2048. After step S2008 ends, the process proceeds to step S2009.

(Step S2009) Next, the label output unit 2048 stores information indicating the second label output by the second label generation unit 2047 in the table storage unit 2045 in association with the image data.
In step S2006, when no first label is stored in the table storage unit 2045 in association with the information indicating the scene and the information indicating the main colors, the label output unit 2048 may associate the first label read in step S2006 with the extracted main colors and newly store them in the table storage unit 2045.
This completes the label generation processing performed by the image processing unit 2004.

As described above, the imaging apparatus 2100 according to the present embodiment can extract the main colors, which are feature amounts of the image data, with a smaller amount of calculation than the conventional technique. Furthermore, the imaging apparatus 2100 according to the present embodiment performs scene determination using information included in Exif and selects from the per-scene entries stored in the table storage unit 2045 based on the determination result, so that the scene can be determined with a small amount of computation. As a result, the imaging apparatus 2100 of the present embodiment can generate many labels for image data with less arithmetic processing and fewer options than in the related art.
That is, the image processing unit 2004 extracts three main colors having a high frequency from the color vectors obtained by converting the image data into a color space, and reads the first label stored in advance in association with the extracted main colors. As shown in FIG. 18 and FIG. 19, since the first label is stored in association with the main colors for each scene, each time, and each season, the image processing unit 2004 can generate different first labels for each scene, time, and season even if the main colors extracted from the image data are the same, and can therefore generate an optimum label for the image data for each scene.
Further, the image processing unit 2004 normalizes the frequencies of the three main colors, generates a modification label that modifies the generated first label according to the ratio of the first color, which has the highest frequency, and generates a second label by modifying the first label with the generated modification label.
As a result, the image processing unit 2004 generates the second label by modifying the first label with the modification label based on the ratio of the main colors in the color arrangement of the image data. Therefore, compared with the case where a label is generated only by extracting the main colors from the image data, a more suitable label can be generated for the image data for each scene.

In the present embodiment, the example in which the color space vector generation unit 2043 generates color vectors of image data in the HSV color space has been described. However, other color spaces may be used, such as RGB (red, green, blue), YCrCb or YPbPr consisting of a luminance signal and two color difference signals, HLS based on hue, saturation, and lightness, Lab, which is a kind of complementary color space, or a color space based on the Practical Color Co-ordinate System (PCCS) of the Japan Color Research Institute.
In the present embodiment, an example has been described in which the color space vector generation unit 2043 generates the frequency distribution of the color vectors and outputs information indicating the generated frequency distribution to the main color extraction unit 2044. However, the color space vector generation unit 2043 may detect the frequency of each color vector and output information indicating the detected frequency of each color vector to the main color extraction unit 2044. Also in this case, for example, each RGB value stored in the table storage unit 2045 may be a color selected by the table creator from values at intervals of 1, 10, or the like.

In the present embodiment, an example has been described in which the label output unit 2048 stores the information indicating the label in the table storage unit 2045 in association with the image data. However, the label output by the label output unit 2048 may be superimposed, as text data, on the image data selected by the user and displayed on the display unit 2007.
In the present embodiment, examples in which the first label and the second label are adjectives or adverbs have been described. However, the first label and the second label may be nouns, for example. In this case, the first label is, for example, “exhilarating”, “rejuvenation”, “dandy”, or the like.

In the present embodiment, an example in which the main colors are calculated from the image data has been described. However, the main color extraction unit 2044 may instead extract three colors that are separated from one another by a predetermined distance between color vectors. Adjacent color vectors in FIG. 15B are, for example, the color vectors (50, 50, 50) and (50, 50, 51) when the image data is HSV. The distance between adjacent colors may be set based on a known threshold at which human vision can distinguish colors. For example, the 256 web colors recommended for use on the Web, or the 256 monotone levels that can be expressed in black and white, may be used.
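Extracting main colors that are at least a predetermined distance apart could be done, for instance, with a greedy pass over the colors in descending order of frequency. The Python sketch below is one possible reading and not the embodiment's method: the greedy strategy, the Euclidean distance in HSV (which ignores hue wraparound), the threshold value, and the function name are all assumptions.

```python
import math

def extract_separated_main_colors(histogram, min_distance, n_colors=3):
    """Pick frequent colors, skipping any color closer than `min_distance`
    to a color that has already been selected."""
    selected = []
    by_frequency = sorted(histogram.items(), key=lambda kv: kv[1], reverse=True)
    for color, count in by_frequency:
        if all(math.dist(color, chosen) >= min_distance for chosen, _ in selected):
            selected.append((color, count))
        if len(selected) == n_colors:
            break
    return selected

# Made-up frequencies; (50, 50, 51) is adjacent to (50, 50, 50) and is skipped.
histogram = {(50, 50, 50): 10, (50, 50, 51): 9, (10, 80, 90): 5, (70, 20, 30): 4}
print(extract_separated_main_colors(histogram, min_distance=5))
# [((50, 50, 50), 10), ((10, 80, 90), 5), ((70, 20, 30), 4)]
```

With this kind of selection, two nearly indistinguishable colors do not both occupy places among the three main colors.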

Also, the main color extraction unit 2044 may perform a smoothing process, using a known method, on the frequency distribution of the color vectors generated by the color space vector generation unit 2043 before calculating the main colors. Alternatively, the color space vector generation unit 2043 may perform a color reduction process using a known method before generating the color space vectors. For example, the color space vector generation unit 2043 may reduce the image data to web colors.
In the present embodiment, an example has been described in which the main color extraction unit 2044 extracts the three most frequent colors from the image data as the main colors. However, the number of colors to be extracted is not limited to three, and it suffices that two or more colors are extracted.

In the present embodiment, an example in which HSV is used for the color vectors has been described. When storing combinations of three colors as shown in FIG. 12 in the table storage unit 2045, the table creator may select the colors from color vectors at intervals of 1 in each value, that is, HSV = (0, 0, 0), (1, 0, 0), (1, 1, 0), ..., (100, 100, 99), (100, 100, 100). Alternatively, the table creator may select the colors from color vectors at intervals of 10 in each value, that is, HSV = (0, 0, 0), (10, 0, 0), (10, 10, 0), ..., (100, 100, 90), (100, 100, 100). By setting the interval of each value in the color vector to a predetermined value such as 10 in this way, the capacity required of the table storage unit 2045 can be reduced, and the amount of calculation can also be reduced.
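The effect of the interval on the number of possible color vectors can be checked with a short Python sketch. The helper names and the round-to-nearest quantization are assumptions for illustration; the embodiment only states that the table creator selects colors at a given interval.

```python
def axis_values(step):
    """Values from 0 to 100 that a single H, S, or V component can take."""
    return list(range(0, 101, step))

def quantize(color, step=10):
    """Snap each component to the nearest multiple of `step` (halves round up)."""
    return tuple(((c + step // 2) // step) * step for c in color)

# Number of distinct color vectors for each interval.
print(len(axis_values(1)) ** 3)   # 1030301
print(len(axis_values(10)) ** 3)  # 1331

print(quantize((1, 69, 100)))  # (0, 70, 100)
print(quantize((40, 65, 80)))  # (40, 70, 80)
```

Going from an interval of 1 to an interval of 10 reduces the candidate color vectors from 101³ to 11³, which is the source of the reduction in storage capacity and calculation mentioned above.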

(Fourth embodiment)
In the third embodiment, the example in which the scene of the image data selected by the user is determined based on the image identification information stored in the storage medium 2200 in association with the image data has been described. In the present embodiment, an example will be described in which an image processing apparatus discriminates a scene from selected image data and generates a label based on the discrimination result.

FIG. 21 is a block diagram of the image processing unit 2004a according to the present embodiment.
As shown in FIG. 21, the image processing unit 2004a includes an image acquisition unit 2041a, an image identification information acquisition unit 2042, a color space vector generation unit 2043, a main color extraction unit 2044, a table storage unit 2045, a first label generation unit 2046a, A second label generation unit 2047, a label output unit 2048, a feature amount extraction unit 2241, and a scene determination unit 2242 are provided. Note that functional units having the same functions as those of the third embodiment are denoted by the same reference numerals, and description thereof is omitted.

The image acquisition unit 2041a reads the image data captured by the imaging unit 2002 and the image identification information stored in association with the image data from the storage medium 2200 via the bus 2015. The image acquisition unit 2041a outputs the acquired image data to the color space vector generation unit 2043 and the feature amount extraction unit 2241. The image acquisition unit 2041a outputs the acquired image identification information to the image identification information acquisition unit 2042.

The feature amount extraction unit 2241 extracts feature amounts from the image data output by the image acquisition unit 2041a by a known method. Known methods include, for example, image binarization, smoothing, edge detection, and contour detection. The feature amount extraction unit 2241 outputs information indicating the extracted feature amounts to the scene determination unit 2242.

The scene determination unit 2242 determines the scene of the image data acquired by the image acquisition unit 2041a using a known method, based on the information indicating the feature amounts output by the feature amount extraction unit 2241. As the known method, for example, as in the prior art described in Patent Document 2, the scene determination unit 2242 divides the image data into a plurality of predetermined regions and determines, based on the feature amounts of each region, whether a person appears in the image data, whether the sky appears in it, and so on. Based on the determination result, the scene determination unit 2242 determines the scene of the image data.
The scene determination unit 2242 outputs information indicating the determined scene to the first label generation unit 2046a.
In the present embodiment, the feature amount extraction unit 2241 and the scene determination unit 2242 may together be configured as a single scene determination unit.

The first label generation unit 2046a reads, from the table storage unit 2045, the first label stored in association with the information indicating the scene output from the scene determination unit 2242 and the information indicating the main colors output from the main color extraction unit 2044. The first label generation unit 2046a outputs the information indicating the read first label and the information indicating the main colors output by the main color extraction unit 2044 to the second label generation unit 2047.

Next, the label generation processing performed by the image processing unit 2004a of the imaging device 2100 will be described with reference to FIG. 20. The imaging device 2100 performs step S2001 and step S2002 in the same manner as in the third embodiment.

(Step S2003) Next, the feature amount extraction unit 2241 extracts feature amounts from the image data output by the image acquisition unit 2041a by a known method, and outputs information indicating the extracted feature amounts to the scene determination unit 2242.
Next, the scene determination unit 2242 determines the scene, which is shooting information of the image data acquired by the image acquisition unit 2041a, using a known method based on the information indicating the feature amounts output by the feature amount extraction unit 2241, and outputs information indicating the determined scene to the first label generation unit 2046a. After step S2003 ends, the process proceeds to step S2004.

The image processing unit 2004a performs steps S2004 and S2005 in the same manner as in the third embodiment. After step S2005, the process proceeds to step S2006.

(Step S2006) Next, the first label generation unit 2046a reads, from the table storage unit 2045, the first label stored in association with the information indicating the scene output by the scene determination unit 2242 and the information indicating the main colors output by the main color extraction unit 2044. Next, the first label generation unit 2046a outputs information indicating the read first label and the information indicating the main colors output by the main color extraction unit 2044 to the second label generation unit 2047. After step S2006 is completed, the image processing unit 2004a performs steps S2007 to S2009 in the same manner as in the third embodiment.

As described above, the image processing unit 2004a determines the scene of the captured image data using a predetermined method, and generates a label in the same manner as in the third embodiment based on the determined scene and the three main colors extracted from the image data. As a result, the image processing unit 2004a can generate an optimum label for the image data even when the image identification information is not stored in the storage medium 2200 in association with the image data.

In the present embodiment, the example in which the image processing unit 2004a generates a label based on the scene determined from the image data and the extracted main colors has been described. However, as in the third embodiment, the shooting information may also be used to determine the scene. For example, the image processing unit 2004a may extract information indicating the date and time of image capture from the image identification information, and generate a label based on the extracted capture date and time and the scene determined from the image data. More specifically, when the scene is “landscape” and the capture date is in “autumn”, the first label stored in association with the scene “landscape” and the main colors and the first label stored in association with “autumn” and the main colors may be read out, and a label may be generated based on the two first labels read out.
Alternatively, the main color and the first label may be stored in the table storage unit 2045 with the scene as “autumn scenery”.

(Fifth embodiment)
In the third and fourth embodiments, an example has been described in which the label is generated based on the main colors extracted from the entire image data selected by the user. In the present embodiment, an example will be described in which a scene is determined from the selected image data, the main colors are extracted from a predetermined region of the image data based on the determined scene, and a label is generated from the extracted main colors.

FIG. 22 is a block diagram of the image processing unit 2004b according to the present embodiment.
As shown in FIG. 22, the image processing unit 2004b includes an image acquisition unit 2041b, an image identification information acquisition unit 2042b, a color space vector generation unit 2043b, a main color extraction unit 2044, a table storage unit 2045, a first label generation unit 2046, a second label generation unit 2047, a label output unit 2048, and a region extraction unit 2341. Note that functional units having the same functions as those of the third embodiment are denoted by the same reference numerals, and description thereof is omitted.

The image acquisition unit 2041b reads the image data captured by the imaging unit 2002 and the image identification information stored in association with the image data from the storage medium 2200 via the bus 2015. The image acquisition unit 2041b outputs the acquired image data to the region extraction unit 2341 and the color space vector generation unit 2043b. The image acquisition unit 2041b outputs the acquired image identification information to the image identification information acquisition unit 2042b.

The image identification information acquisition unit 2042b extracts the shooting information set for the captured image data from the image identification information output by the image acquisition unit 2041b, and outputs the extracted shooting information to the first label generation unit 2046 and the region extraction unit 2341.

The region extraction unit 2341 determines, based on the shooting information output from the image identification information acquisition unit 2042b, the region of the image data output from the image acquisition unit 2041b from which the main colors are to be extracted. The region extraction unit 2341 then extracts the image data of the determined region from the image data output from the image acquisition unit 2041b, and outputs the image data of the extracted region to the color space vector generation unit 2043b.
Note that, as a technique for determining the region from which the main colors are extracted, the region to be extracted from the entire image may, for example, be set in advance for each scene. For example, when the scene is “landscape”, the region is the upper two-thirds of the image data. When the scene is “portrait”, the region is an area of a predetermined size at the center of the image data.
Alternatively, in combination with the fourth embodiment, the region from which a feature amount was extracted from the image data may be used as the region for extracting the main colors. In this case, a plurality of regions may be extracted from the image data. For example, when it is determined that the scene of the captured image data is a portrait, the scene determination unit 2242 in FIG. 21 performs face detection using a technique such as feature amount extraction. When a plurality of face regions are detected, the scene determination unit 2242 detects the main colors from each of the detected regions. The first label generation unit 2046 and the second label generation unit 2047 may then generate a plurality of labels, one for each set of detected main colors. Alternatively, the scene determination unit 2242 may output the determination result to the main color extraction unit 2044 so that a region containing all of the detected face regions is used as the region for extracting the main colors.
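The preset per-scene regions described above can be sketched as follows. This is only an illustrative sketch: the function name `main_color_region`, the `center_fraction` parameter standing in for the unspecified "predetermined size", and the whole-image fallback for other scenes are assumptions, not part of the described apparatus.

```python
def main_color_region(width, height, scene, center_fraction=0.5):
    """Return (left, top, right, bottom) of the region used for main-color extraction.

    The scene names follow the examples in the text; center_fraction is an
    assumed value for the "predetermined size" of the portrait region.
    """
    if scene == "landscape":
        # Upper two-thirds of the image, as in the example in the text.
        return (0, 0, width, (2 * height) // 3)
    if scene == "portrait":
        # Centered rectangle covering center_fraction of each dimension.
        region_w = int(width * center_fraction)
        region_h = int(height * center_fraction)
        left = (width - region_w) // 2
        top = (height - region_h) // 2
        return (left, top, left + region_w, top + region_h)
    # Assumption: scenes without a preset region use the whole image.
    return (0, 0, width, height)
```

For a 600 x 300 image, the "landscape" region covers rows 0 to 199 and the "portrait" region is the centered 300 x 150 rectangle.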

Returning to FIG. 22, the color space vector generation unit 2043b converts the image data output from the region extraction unit 2341 into a vector of a predetermined color space. The predetermined color space is, for example, HSV. The color space vector generation unit 2043b classifies all pixels of the image data for each generated color vector, detects the frequency for each color vector, and generates a color vector frequency distribution.
The color space vector generation unit 2043b outputs information indicating the frequency distribution of the generated color vector to the main color extraction unit 2044.
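The conversion to HSV color vectors and the frequency distribution can be sketched as below. The quantization into `h_bins`, `s_bins`, and `v_bins` bins is an assumption made for illustration (the text does not state how color vectors are binned), and the function names are hypothetical.

```python
import colorsys
from collections import Counter


def color_vector_histogram(pixels, h_bins=12, s_bins=4, v_bins=4):
    """Count pixels per quantized HSV color vector.

    pixels: iterable of (r, g, b) tuples with integer components in 0..255.
    The bin counts are assumed values, not taken from the text.
    """
    hist = Counter()
    for r, g, b in pixels:
        h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
        # Clamp so that a component equal to 1.0 falls in the last bin.
        key = (
            min(int(h * h_bins), h_bins - 1),
            min(int(s * s_bins), s_bins - 1),
            min(int(v * v_bins), v_bins - 1),
        )
        hist[key] += 1
    return hist


def main_colors(hist, n=3):
    """Return the n most frequent color vectors, most frequent first."""
    return [color for color, _ in hist.most_common(n)]
```

The main color extraction unit 2044 would then correspond to `main_colors`, which simply picks the most frequent bins from the distribution.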

Next, label generation processing performed by the image processing unit 2004b of the imaging device 2100 will be described with reference to FIG. 23. FIG. 23 is a flowchart of label generation performed by the imaging device 2100 according to the present embodiment. The imaging device 2100 performs step S2001 as in the third embodiment. After step S2001, the process proceeds to step S2101.

(Step S2101) Next, the image acquisition unit 2041b of the image processing unit 2004b reads, from the storage medium 2200 via the bus 2015, the image data captured by the imaging unit 2002 and the image identification information stored in association with the image data.
Next, the image acquisition unit 2041b outputs the acquired image data to the region extraction unit 2341 and the color space vector generation unit 2043b. Next, the image acquisition unit 2041b outputs the acquired image identification information to the image identification information acquisition unit 2042b. After step S2101, the process proceeds to step S2003.

(Step S2003) The image processing unit 2004b performs step S2003 in the same manner as in the third embodiment. After step S2003 is completed, the process proceeds to step S2102.

(Step S2102) Next, the region extraction unit 2341 determines, by a predetermined method based on the shooting information output from the image identification information acquisition unit 2042b, the region of the image data output from the image acquisition unit 2041b from which the main colors are to be extracted.
Next, the region extraction unit 2341 extracts the image data of the determined region from the image data output from the image acquisition unit 2041b, and outputs the image data of the extracted region to the color space vector generation unit 2043b. After step S2102, the process proceeds to step S2103.

(Step S2103) Next, the color space vector generation unit 2043b converts the image data of the region output from the region extraction unit 2341 into a vector of a predetermined color space. Next, the color space vector generation unit 2043b classifies all pixels of the image data for each generated color vector, detects the frequency for each color vector, and generates a color vector frequency distribution. Next, the color space vector generation unit 2043b outputs information indicating the frequency distribution of the generated color vector to the main color extraction unit 2044. After step S2103 ends, the process proceeds to step S2005.

Thereafter, the image processing unit 2004b performs steps S2005 to S2009 in the same manner as in the third embodiment.

As described above, the image processing unit 2004b determines the region from which the main colors are extracted in the captured image data based on shooting information such as the scene. Then, the image processing unit 2004b generates a label in the same manner as in the third embodiment, based on the three main colors extracted from the image data of that region. As a result, the image processing unit 2004b extracts the main colors from the image data of the region corresponding to the scene and generates a label based on the main colors of that region. Therefore, compared with the third and fourth embodiments, it is possible to generate an optimum label for the image data that is better suited to the scene.

(Sixth embodiment)
In the third to fifth embodiments, an example has been described in which three colors are selected as the main colors from the image data selected by the user. In the present embodiment, an example in which three or more colors are selected from the selected image data will be described. Note that the description assumes that the configuration of the image processing unit 2004 is the same as in the third embodiment (FIG. 10).

FIG. 24 is a diagram illustrating an example of extracting a plurality of color vectors from image data according to the present embodiment. In FIG. 24, the horizontal axis represents the color vector, and the vertical axis represents the frequency.
In FIG. 24, description will be made assuming that the main color extraction unit 2044 has extracted the first color vector c2021, the second color vector c2022, and the third color vector c2023 as in FIG. 16B.
In FIG. 24, when the frequency of the color vectors c2024, c2025, c2026 is within a predetermined range, the main color extraction unit 2044 extracts the color vectors c2024, c2025, c2026 as the fourth main color. In this case, the table storage unit 2045 stores labels for each scene including the fourth color in addition to the first to third colors described in FIG.
When the fourth color is extracted, the main color extraction unit 2044 reads and extracts the first label stored in the table storage unit 2045 for the combination of the first to fourth colors. When a plurality of first labels are stored for the combination of the first to fourth colors, the main color extraction unit 2044 may, for example, select the first label that is read first from the table storage unit 2045, or may select one at random.

Further, the main color extraction unit 2044 may select three colors as main colors from the four extracted colors. In this case, the main color extraction unit 2044 may calculate the degree of approximation among the four extracted colors, and select the three colors having a low degree of approximation as the main colors. The degree of color approximation will be described for the case where, in FIG. 24, the four color vectors c2022 to c2025 are extracted as the first to fourth colors. The main color extraction unit 2044 reduces the four extracted colors from an 8-bit color space to, for example, a 7-bit color space. If, after the color reduction, the color vectors c2024 and c2025 are determined to be the same color, the main color extraction unit 2044 determines that the color vectors c2024 and c2025 are approximate colors. Then, the main color extraction unit 2044 selects one of the color vectors c2024 and c2025 as the third main color. In this case, the main color extraction unit 2044 may select, in the frequency distribution of FIG. 24, the color vector that is farther in the horizontal axis direction from the first color vector c2022 and the second color vector c2023, or may select one at random.
In addition, if the four color vectors remain separated even when the color is reduced to the 7-bit color space, the color space vector generation unit 2043 performs the color reduction until the four color vectors are integrated into the three color vectors.
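The repeated color reduction described above can be sketched as follows. This is a minimal sketch under stated assumptions: colors are 3-component tuples with 8-bit components, color reduction is modeled as dropping low-order bits, and when two colors merge the more frequent one (the one listed first) is kept, whereas the text selects by distance or at random. The function name `reduce_until` is hypothetical.

```python
def reduce_until(colors, target=3, max_bits=8):
    """Drop low-order bits until at most `target` distinct colors remain.

    colors: list of 3-tuples with 8-bit components, ordered by descending
    frequency. Returns (representative_colors, bits_used).
    """
    for bits in range(max_bits - 1, 0, -1):
        shift = max_bits - bits
        merged = {}
        for color in colors:
            key = tuple(component >> shift for component in color)
            # Assumption: keep the first (most frequent) color of each merged
            # group as its representative.
            merged.setdefault(key, color)
        if len(merged) <= target:
            return list(merged.values()), bits
    # Assumption: if the colors never merge, fall back to the most frequent ones.
    return colors[:target], 0
```

For example, `(10, 10, 200)` and `(11, 10, 201)` become identical after reduction to 7 bits, so four colors containing this pair are integrated into three at the first step.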

As described above, four or more main colors and first labels are stored in advance in the table storage unit 2045 for each scene as shooting information, and four or more main colors are extracted from the image data. Since the label is generated based on the extracted main color and the scene, it is possible to generate a label more optimal for the image data than in the third to fifth embodiments.
That is, in the present embodiment, the image processing unit 2004 extracts the four most frequent colors from the color vectors obtained by converting the image data into a color space, and extracts the first label stored in advance in association with the four extracted colors. Since the first label is stored in advance in association with the four main color vectors for each piece of shooting information, for example for each scene, time, or season, the image processing unit 2004 can generate different first labels for each scene, time, or season even if the main colors extracted from the image data are the same. Further, the image processing unit 2004 normalizes the frequencies of the four main colors and generates a label by adding, to the generated first label, a second label that emphasizes the first label according to the ratio of the most frequent first color. As a result, the image processing unit 2004 can generate a label based on the four main colors that is better suited to the image data than in the third to fifth embodiments.
In addition, the image processing unit 2004 extracts three main colors from the four extracted main colors by color reduction or the like, and performs label generation processing on the three extracted main colors as in the third embodiment. As a result, the image processing unit 2004 can generate an optimum label for the image data even when the differences among the color vector frequencies in the image data are small.

In this embodiment, an example in which four main colors are extracted from image data has been described. However, the number of main colors to be extracted is not limited to four, and may be larger. In this case, first labels corresponding to the number of extracted main colors may be stored in the table storage unit 2045. For example, when extracting five main colors, the main color extraction unit 2044 may, as described above, perform color reduction to merge approximate colors, and extract three main colors again from the plurality of extracted main colors. Further, for example, when extracting six main colors, the main color extraction unit 2044 first separates them, in descending order of frequency, into a first group consisting of the first to third colors and a second group consisting of the remaining fourth to sixth colors. The fourth color has fewer pixels than the third color and more pixels than the fifth color, and the fifth color has fewer pixels than the fourth color.
Then, the first label generation unit 2046 extracts a first label corresponding to the first group and a first label corresponding to the second group. For the two first labels extracted in this way, the first label generation unit 2046 may, as in the third embodiment, modify each first label with a modification label according to the frequency of the first color or the fourth color, thereby generating a plurality of labels. Alternatively, the second label generation unit 2047 may integrate the plurality of labels generated in this way into one label. Specifically, when the label from the first group is “very refreshing” and the label from the second group is “a little childish”, the second label generation unit 2047 may generate the label “very refreshing and a little childish”. When one label is generated from two labels in this way, a processing function unit (not shown) that performs language analysis may be provided in the second label generation unit 2047 to determine which of the two labels should be placed first so that an appropriate label is generated.

In the third to sixth embodiments, an example in which one label is generated for one piece of image data has been described. However, two or more labels may be generated. In this case, for example, the color space vector generation unit 2043 (including 2043b) divides the image data of FIG. 17A into an upper half and a lower half, and generates a color vector frequency distribution for each divided region. The main color extraction unit 2044 extracts three main colors from the color vector frequency distribution of each divided region. The first label generation unit 2046 may extract a label for each region from the table storage unit 2045. The label output unit 2048 may store the plurality of labels generated in this manner in the storage medium 2200 in association with the image data.

In the third to fifth embodiments, an example has been described in which the three main colors and the first label are associated with each other for each scene and stored in the table storage unit 2045. However, for example, a single color and a first label may be associated with each other for each scene and stored in the table storage unit 2045. In this case, the table storage unit 2045 may store the three main colors and the first label in association with each other for each scene as described in the third embodiment, and may further store a single color and a first label in association with each other for each scene.
By such processing, an appropriate label can be generated even for monotone image data from which only one main color can be extracted. In this case, for example, the image processing unit 2004 (2004a, 2004b) may detect four colors as main colors as in the sixth embodiment, and read from the table storage unit 2045 a label for the first group consisting of the first to third colors and a label for the remaining fourth color treated as a single color.
Further, when the color tone of the image data is monotonous and only two main colors can be extracted, for example, the first label generation unit 2046 reads the first label of each of the two extracted main colors (the first color and the second color). Next, the second label generation unit 2047 may normalize the two main colors based on their frequencies, generate a modification label for the label of the first color based on the ratio of the first color, and generate the label of the first color by modifying the first label of the first color with the generated modification label. Alternatively, the second label generation unit 2047 may generate two labels, namely the label of the first color generated as described above and the first label of the second color, or may integrate the label of the first color and the first label of the second color into one label.

In the third to sixth embodiments, an example has been described in which the image data selected by the user is read from the storage medium 2200. However, when both RAW data and JPEG (Joint Photographic Experts Group) data are recorded in the storage medium 2200 as the image data used for the label generation process, either the RAW data or the JPEG data may be used. Further, when thumbnail image data reduced for display on the display unit 2007 is stored in the storage medium 2200, a label may be generated using the thumbnail image data. Even if thumbnail image data is not stored in the storage medium 2200, the color space vector generation unit 2043 (including 2043b) may generate image data in which the image data output from the image acquisition unit 2041 (including 2041a and 2041b) is reduced to a predetermined resolution, and the color vector frequencies and main colors may be extracted from the reduced image data.

Note that a program for realizing the function of each unit of the image processing unit 2004 in FIG. 10, the image processing unit 2004a in FIG. 21, or the image processing unit 2004b in FIG. 22 may be recorded on a computer-readable recording medium, and the program recorded on the recording medium may be read into a computer system and executed to perform the processing of each unit. The program may realize only a part of the functions described above, or may realize the functions described above in combination with a program already recorded in the computer system.

(Seventh embodiment)
The functional block diagram of the imaging apparatus according to the present embodiment is the same as that shown in FIG. 8 according to the second embodiment.
Hereinafter, parts different from those of the second embodiment will be described in detail.

FIG. 25 is a block diagram showing a functional configuration of an image processing unit 3140 (image processing unit 1140 in FIG. 8) according to the present embodiment.
An image processing unit (image processing apparatus) 3140 includes an image input unit 3011, a text input unit 3012, a first position input unit 3013, an edge detection unit 3014, a face detection unit 3015, a character size determination unit 3016, a cost calculation unit 3017, a region determination unit 3018, and a synthesis unit 3019.

The image input unit 3011 inputs still image data or moving image data. The image input unit 3011 outputs the input image data to the edge detection unit 3014 and the character size determination unit 3016. Note that the image input unit 3011 may input image data via a network or a storage medium, for example. Hereinafter, an image indicated by the image data input to the image input unit 3011 is set as an input image. Also, an XY coordinate system is defined with the width direction of the rectangular image format in the input image as the X-axis direction and the direction (height direction) orthogonal to the X-axis direction as the Y-axis direction.

The text input unit 3012 inputs text data corresponding to the input image. The text data corresponding to the input image is data relating to the text to be superimposed on the input image, and includes text, initial character size, line feed position, number of rows, number of columns, and the like. The initial character size is an initial value of the character size of the text, and is the character size designated by the user. The text input unit 3012 outputs the input text data to the character size determination unit 3016.

The first position input unit 3013 receives an input of an important position in the input image (hereinafter referred to as the important position (first position)). For example, the first position input unit 3013 displays the input image on the display unit 1150 and sets a position designated by the user on the touch panel installed on the display unit 1150 as the important position. Alternatively, the first position input unit 3013 may directly receive an input of the coordinate values (x₀, y₀) of the important position. The first position input unit 3013 outputs the coordinate values (x₀, y₀) of the important position to the cost calculation unit 3017. Note that, when the user does not input an important position, the first position input unit 3013 sets a predetermined position set in advance (for example, the center of the input image) as the important position.

The edge detection unit 3014 detects edges in the image data input from the image input unit 3011 using, for example, the Canny algorithm. Then, the edge detection unit 3014 outputs the image data and data indicating the positions of the edges detected from the image data to the cost calculation unit 3017. Although edges are detected using the Canny algorithm in this embodiment, other methods may be used, for example an edge detection method using a differential filter, or a method that detects edges based on high-frequency components in the result of a two-dimensional Fourier transform.
The face detection unit 3015 detects a human face in the image data input from the image input unit 3011 by pattern matching or the like. Then, the face detection unit 3015 outputs the image data and data indicating the position of the person's face detected from the image data to the cost calculation unit 3017.

The character size determination unit 3016 determines the character size of the text data based on the image size (width and height) of the image data input from the image input unit 3011 and the number of rows and columns of the text data input from the text input unit 3012. Specifically, the character size determination unit 3016 sets f satisfying the following expression (5) as the character size so that all of the text in the text data can be combined with the image data.

Here, f is the character size, m is the number of columns of the text data, and l is the number of rows of the text data. L (≧ 0) is a parameter indicating the ratio of the line spacing to the character size. Further, w is the width of the image area in the image data, and h is the height of the image area in the image data. Expression (5) represents that the width of the text is smaller than the width of the image area in the image data, and that the height of the text is smaller than the height of the image area in the image data.

For example, if the initial character size included in the text data does not satisfy Expression (5), the character size determination unit 3016 gradually decreases the character size until Expression (5) is satisfied. On the other hand, when the initial character size included in the text data satisfies Expression (5), the character size determining unit 3016 sets the initial character size included in the text data as the character size of the text data. Then, the character size determining unit 3016 outputs the text data and the character size of the text data to the region determining unit 3018.
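The gradual reduction of the character size can be sketched as below. Expression (5) is not reproduced in this text, so the condition used here, f*m < w and f*(l + (l - 1)*L) < h, is an assumed form that merely matches the verbal description (text width and height smaller than the image width and height). The function names and the `step` parameter are hypothetical.

```python
def fits(f, m, l, w, h, L):
    """Assumed form of expression (5).

    Text width  : f * m                  (m columns of size f)
    Text height : f * (l + (l - 1) * L)  (l rows plus l - 1 line gaps of f * L)
    """
    return f * m < w and f * (l + (l - 1) * L) < h


def determine_character_size(initial_size, m, l, w, h, L=0.5, step=1):
    """Decrease the character size from initial_size until the text fits.

    Assumption: the size is reduced by `step` each time and never drops
    below `step`, even if the text still does not fit at that size.
    """
    f = initial_size
    while f > step and not fits(f, m, l, w, h, L):
        f -= step
    return f
```

With 10 columns, 2 rows, and a 320 x 240 image, an initial size of 40 is reduced to 31, while an initial size of 20 already satisfies the condition and is used unchanged.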

The cost calculation unit 3017 calculates the cost of each coordinate position (x, y) in the image data based on the position of the edge in the image data, the position of the person's face, and the important position. The cost represents importance in the image data. For example, the cost calculation unit 3017 calculates the cost of each position so that the cost of the position where the edge is detected by the edge detection unit 3014 is high. The cost calculation unit 3017 increases the cost as it is closer to the important position, and lowers the cost as it is farther from the important position. In addition, the cost calculation unit 3017 increases the cost of the area having the human face.

Specifically, first, the cost calculation unit 3017 generates a global cost image c_g(x, y) indicating the cost based on the important position (x₀, y₀), using, for example, a Gaussian function expressed by the following equation (6).

Here, x₀ is the X coordinate value of the important position, and y₀ is the Y coordinate value of the important position. S₁ (> 0) is a parameter that determines how the cost spreads in the width direction (X-axis direction), and S₂ (> 0) is a parameter that determines how the cost spreads in the height direction (Y-axis direction). The parameters S₁ and S₂ can be set by the user, for example, on a setting screen. By changing the parameters S₁ and S₂, the shape of the distribution in the global cost image can be adjusted. Although the global cost image is generated by a Gaussian function in this embodiment, another function whose value increases toward the center may be used to generate the global cost image, for example a cosine function ((cos(πx) + 1) / 2, where −1 ≦ x ≦ 1), a function represented by a triangular (mountain-shaped) line that takes its maximum value at the origin x = 0, or a Lorentz-type function (1 / (ax² + 1), where a is a constant).
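A global cost image of this kind can be sketched as follows. Equation (6) is not reproduced in this text, so the form exp(-(x - x0)^2 / S1 - (y - y0)^2 / S2) used below is an assumption chosen only to be consistent with the description (value 1 at the important position, decreasing with distance, with S1 and S2 controlling the spread along each axis). The function name is hypothetical.

```python
import math


def global_cost_image(width, height, x0, y0, s1, s2):
    """Return a height x width list of lists holding an assumed Gaussian cost.

    s1 and s2 (> 0) control the spread in the X and Y directions.
    The exact scaling inside the exponent is an assumption.
    """
    return [
        [math.exp(-((x - x0) ** 2) / s1 - ((y - y0) ** 2) / s2)
         for x in range(width)]
        for y in range(height)
    ]
```

Increasing `s1` flattens the distribution horizontally, so positions to the left and right of the important position keep a higher cost.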

Next, the cost calculation unit 3017 generates a face cost image c_f(x, y) indicating the cost based on the position of the person's face by the following equations (7) and (8).

Here, (x⁽ⁱ⁾, y⁽ⁱ⁾) is the center position of the i-th (1 ≦ i ≦ n) face among the n detected faces, and s⁽ⁱ⁾ is the size of the i-th face. That is, the cost calculation unit 3017 generates a face cost image in which the pixel value in the human face regions is “1” and the pixel value in the other regions is “0”.

Next, the cost calculation unit 3017 generates an edge cost image c_e(x, y) indicating the cost based on the edges by the following equation (9).

That is, the cost calculation unit 3017 generates an edge cost image in which the pixel value of the edge portion is “1” and the pixel value of the region other than the edge is “0”. Note that the edge portion may be a position where the edge is present, or may be a region including the position where the edge is present and its periphery.

Then, the cost calculation unit 3017 generates a final cost image c(x, y) based on the global cost image, the face cost image, and the edge cost image by the following equation (10).

Here, C_g (≧ 0) is the weighting coefficient of the global cost image, C_f (≧ 0) is the weighting coefficient of the face cost image, and C_e (≧ 0) is the weighting coefficient of the edge cost image. The ratio of the parameters C_g, C_e, and C_f can be changed by the user on a setting screen or the like. Further, the final cost image c(x, y) shown in equation (10) is normalized so that 0 ≦ c(x, y) ≦ 1. The cost calculation unit 3017 outputs the image data and the final cost image of the image data to the region determination unit 3018. The parameters C_g, C_e, and C_f may each be 1 or less.
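The face cost image, the edge cost image, and their weighted combination can be sketched as follows. Equations (7) to (10) are not reproduced in this text, so several points here are assumptions: each face region is modeled as a square of side s centered on the face position, the edge portion is modeled as exactly the detected edge pixels, and normalization to the range 0 to 1 is done by dividing the weighted sum by the sum of the weights. All function and parameter names are hypothetical.

```python
def face_cost_image(width, height, faces):
    """faces: list of (cx, cy, size). Pixels inside any face square get 1.0.

    Assumption: a face region is the square of side `size` centered on (cx, cy).
    """
    return [
        [1.0 if any(abs(x - cx) <= s / 2 and abs(y - cy) <= s / 2
                    for cx, cy, s in faces) else 0.0
         for x in range(width)]
        for y in range(height)
    ]


def edge_cost_image(width, height, edge_positions):
    """edge_positions: iterable of (x, y) edge pixels. Edge pixels get 1.0."""
    edges = set(edge_positions)
    return [[1.0 if (x, y) in edges else 0.0 for x in range(width)]
            for y in range(height)]


def final_cost_image(c_g, c_f, c_e, w_g=1.0, w_f=1.0, w_e=1.0):
    """Weighted sum of the three cost images, normalized into 0..1.

    Assumption: normalization divides by the sum of the weights, which keeps
    the result in 0..1 because every input image is already in 0..1.
    """
    total = w_g + w_f + w_e
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    return [
        [(w_g * g + w_f * f + w_e * e) / total
         for g, f, e in zip(row_g, row_f, row_e)]
        for row_g, row_f, row_e in zip(c_g, c_f, c_e)
    ]
```

Raising one weight relative to the others makes the corresponding cost image dominate the final cost, which is how the ratio of C_g, C_f, and C_e steers where text is allowed to go.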

Note that the image processing unit 3140 may automatically change the ratio of the parameters C_g, C_e, and C_f according to the input image. For example, when the input image is a landscape image, the parameter C_g is made larger than the other parameters. When the input image is a portrait (person image), the parameter C_f is made larger than the other parameters. When the input image is a building image showing many buildings or the like, the parameter C_e is made larger than the other parameters. Specifically, when a human face is detected by the face detection unit 3015, the cost calculation unit 3017 determines that the input image is a portrait and makes the parameter C_f larger than the other parameters. On the other hand, when no human face is detected by the face detection unit 3015, the cost calculation unit 3017 determines that the input image is a landscape image and makes the parameter C_g larger than the other parameters. Also, when the number of edges detected by the edge detection unit 3014 is larger than a predetermined value, the cost calculation unit 3017 determines that the input image is a building image and makes the parameter C_e larger than the other parameters.
Alternatively, the image processing unit 3140 may have a landscape image mode, a portrait mode, and a building image mode, and may change the ratio of the parameters C_g, C_e, and C_f according to the mode currently set in the image processing unit 3140.
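The automatic choice of weights can be sketched as below. The concrete weight values, the function name, and the order in which the conditions are checked (face first, then edge count) are assumptions; the text does not say which rule takes precedence when a face is detected in an image with many edges.

```python
def select_weights(face_count, edge_count, edge_threshold):
    """Return (C_g, C_f, C_e) for the input image.

    Assumed precedence: portrait if any face is detected, otherwise building
    if the edge count exceeds the threshold, otherwise landscape.
    The numeric weights are illustrative values only.
    """
    if face_count > 0:
        return (0.5, 1.0, 0.5)  # portrait: emphasize the face cost
    if edge_count > edge_threshold:
        return (0.5, 0.5, 1.0)  # building image: emphasize the edge cost
    return (1.0, 0.5, 0.5)      # landscape: emphasize the global cost
```

A mode-based variant would simply map the currently selected mode to one of these three tuples instead of inspecting the detection results.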

In addition, when the image data is a moving image, the cost calculation unit 3017 calculates an average value of costs of a plurality of frame images included in the moving image data for each coordinate position. Specifically, the cost calculation unit 3017 acquires a frame image of a moving image at a predetermined time (for example, 3 seconds) interval, and generates a final cost image for each acquired frame image. Then, the cost calculation unit 3017 generates an average final cost image obtained by averaging the final cost images of the respective frame images. The pixel value at each position in the average final cost image is an average value of the pixel values at each position in each final cost image.
In the present embodiment, the average value of costs of a plurality of frame images is calculated, but for example, a total value may be calculated.
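The averaging over sampled frames can be sketched as follows. The helper `sample_frame_indices` and its frame-rate parameter are assumptions added for illustration; the text only states that frames are taken at a predetermined time interval such as 3 seconds.

```python
def sample_frame_indices(frame_count, fps, interval_seconds=3.0):
    """Indices of the frames taken every interval_seconds (assumed helper)."""
    step = max(1, int(round(fps * interval_seconds)))
    return list(range(0, frame_count, step))


def average_cost_image(cost_images):
    """Per-pixel mean of several final cost images of identical size."""
    if not cost_images:
        raise ValueError("at least one cost image is required")
    n = len(cost_images)
    height = len(cost_images[0])
    width = len(cost_images[0][0])
    return [
        [sum(image[y][x] for image in cost_images) / n for x in range(width)]
        for y in range(height)
    ]
```

Replacing the division by `n` with a plain sum gives the total-value variant mentioned above; the position of the minimum is the same in both cases.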

The region determination unit 3018 determines the synthesis region in the image data in which the text is to be combined, based on the final cost image input from the cost calculation unit 3017 and the character size of the text data input from the character size determination unit 3016. Specifically, first, the region determination unit 3018 calculates the width w_text and the height h_text of a text rectangular region, which is a rectangular region for displaying the text, based on the number of rows and columns of the text data and the character size. The text rectangular region is a region corresponding to the synthesis region. Subsequently, the region determination unit 3018 calculates the total cost c*_text(x, y) in the text rectangular region at each coordinate position (x, y) by the following equation (11).

Then, the area determination unit 3018 sets the coordinate position (x, y) at which the total cost c*_text(x, y) in the text rectangular area is minimum as the text synthesis position. In other words, the area determination unit 3018 sets, as the text synthesis area, the text rectangular area whose upper left vertex is the coordinate position (x, y) at which the total cost c*_text(x, y) is minimum. The area determination unit 3018 outputs the image data, the text data, and data indicating the text synthesis area to the synthesis unit 3019. In the present embodiment, the area determination unit 3018 determines the synthesis area based on the total cost (total value) in the text rectangular area. However, for example, the area in which the average cost in the text rectangular area is smallest may be used as the synthesis area. Alternatively, the area determination unit 3018 may set, as the synthesis area, the area with the smallest weighted average cost, where the weight is increased toward the center of the text rectangular area.
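The search for the text synthesis position can be sketched in Python as follows. This is an illustrative brute-force version, not the embodiment's implementation. It assumes the final cost image is a NumPy array indexed as `cost[y, x]` and that the text rectangular area must lie entirely inside the image; the function name `find_text_region` is hypothetical.

```python
import numpy as np


def find_text_region(cost, w_text, h_text):
    """Return (x, y, total) for the upper-left vertex with the smallest total cost.

    cost: 2-D final cost image indexed as cost[y, x].
    Returns None if the text rectangle does not fit in the image.
    """
    h, w = cost.shape
    best = None
    for y in range(h - h_text + 1):
        for x in range(w - w_text + 1):
            # Total cost inside the text rectangle whose upper-left vertex is (x, y).
            total = float(cost[y:y + h_text, x:x + w_text].sum())
            if best is None or total < best[2]:
                best = (x, y, total)
    return best
```

Dividing `total` by `w_text * h_text` would give the average-cost variant; because the rectangle size is fixed during the search, it selects the same position.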

The composition unit 3019 receives image data, text data, and data indicating a text composition area. The synthesizing unit 3019 generates and outputs image data of a synthesized image obtained by superimposing the text of the text data on the image data synthesis area.

FIGS. 26A to 26F are image diagrams showing examples of the input image, the cost image, and the composite image according to the present embodiment.
FIG. 26A shows an input image. FIG. 26B shows a global cost image. In the example shown in FIG. 26B, the center of the input image is the important position. As shown in FIG. 26B, the pixel value of the global cost image is closer to “1” as it is closer to the center, and closer to “0” as it is farther from the center. FIG. 26C shows a face cost image. As shown in FIG. 26C, the pixel value of the face cost image is “1” in the area of the human face and “0” in the area other than the human face. FIG. 26D shows an edge cost image. As illustrated in FIG. 26D, the pixel value of the edge cost image is “1” in the edge portion and “0” in the region other than the edge portion.

FIG. 26E shows a final cost image obtained by combining a global cost image, a face cost image, and an edge cost image. FIG. 26F shows a synthesized image obtained by superimposing text on the input image. As shown in FIG. 26F, the text of the text data is superimposed on an area where the total cost in the final cost image is small.

Next, with reference to FIG. 27, the still image composition processing by the image processing unit 3140 will be described.
FIG. 27 is a flowchart illustrating a procedure of still image composition processing according to the present embodiment.
First, in step S3101, the image input unit 3011 accepts input of image data of a still image (hereinafter referred to as still image data).
Next, in step S3102, the text input unit 3012 receives input of text data corresponding to the input still image data.
Next, in step S3103, the first position input unit 3013 receives input of an important position in the input still image data.

In step S3104, the character size determination unit 3016 determines the character size of the text data based on the size of the input still image data and the number of rows and columns of the input text data.
In step S3105, the face detection unit 3015 detects the position of the person's face in the input still image data.
Next, in step S3106, the edge detection unit 3014 detects the position of the edge in the input still image data.

In step S3107, the cost calculation unit 3017 generates a global cost image based on the designated (input) important position. That is, the cost calculation unit 3017 generates a global cost image that has a higher cost as it is closer to the important position and a lower cost as it is farther from the important position.
In step S3108, the cost calculation unit 3017 generates a face cost image based on the detected face position of the person. That is, the cost calculation unit 3017 generates a face cost image in which the cost of the human face region is high and the cost of the region other than the human face is low.
In step S3109, the cost calculation unit 3017 generates an edge cost image based on the detected edge position. That is, the cost calculation unit 3017 generates an edge cost image in which the cost of the edge portion is high and the cost of the region other than the edge is low.

Subsequently, in step S3110, the cost calculation unit 3017 generates a final cost image by combining the generated global cost image, face cost image, and edge cost image.
In step S3111, the area determination unit 3018 determines a text synthesis area in the still image data based on the generated final cost image and the determined character size of the text data.
Finally, in step S3112, the synthesis unit 3019 superimposes the text data on the determined synthesis area to synthesize still image data and text data.
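Steps S3107 to S3109 can be illustrated with the following Python sketch. The pixel values follow the description of FIGS. 26B to 26D (a global cost near "1" at the important position and near "0" far from it, and binary face and edge costs). The Gaussian form of the global cost, the spread parameters `spread_x` and `spread_y`, the face box format, and all function names are assumptions for illustration.

```python
import numpy as np


def global_cost(h, w, x0, y0, spread_x, spread_y):
    """Cost equal to 1 at the important position (x0, y0), decaying toward 0.

    Assumption: a Gaussian-shaped falloff; the exact form is not taken from the source.
    """
    ys, xs = np.mgrid[0:h, 0:w]
    return np.exp(-((xs - x0) ** 2) / spread_x - ((ys - y0) ** 2) / spread_y)


def face_cost(h, w, face_boxes):
    """1 inside each detected face box (x, y, width, height), 0 elsewhere."""
    cost = np.zeros((h, w))
    for x, y, bw, bh in face_boxes:
        cost[y:y + bh, x:x + bw] = 1.0
    return cost


def edge_cost(edge_mask):
    """1 on edge pixels, 0 elsewhere; edge_mask is a boolean array."""
    return edge_mask.astype(float)
```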

Next, with reference to FIG. 28, the moving image composition processing by the image processing unit 3140 will be described. FIG. 28 is a flowchart showing the procedure of the moving image composition process according to this embodiment.
First, in step S3201, the image input unit 3011 receives input of image data of a moving image (hereinafter referred to as moving image data).
Next, in step S3202, the text input unit 3012 receives input of text data corresponding to the input moving image data.
Next, in step S3203, the first position input unit 3013 accepts designation of an important position in the input moving image data.

Subsequently, in step S3204, the character size determination unit 3016 determines the character size of the text data based on the size of the moving image data and the number of rows and columns of the text data.
In step S3205, the cost calculation unit 3017 acquires the first frame image from the moving image data.

In step S3206, the face detection unit 3015 detects the position of the person's face in the acquired frame image.
Next, in step S3207, the edge detection unit 3014 detects the position of the edge in the acquired frame image.

Subsequently, in steps S3208 to S3211, the cost calculation unit 3017 performs the same processing as in steps S3107 to S3110 described above.
In step S3212, the cost calculation unit 3017 determines whether the current frame image is the last frame image in the moving image data.
If the current frame image is not the last frame image (step S3212: No), in step S3213, the cost calculation unit 3017 acquires from the moving image data the frame image that is a predetermined time t seconds (e.g., 3 seconds) after the current frame image, and the process returns to step S3206.

On the other hand, when the current frame image is the last frame image in the moving image data (step S3212: Yes), in step S3214, the cost calculation unit 3017 generates an average final cost image by averaging the final cost images of the respective frame images. The pixel value at each coordinate position in the average final cost image is the average value of the pixel values at that coordinate position in the final cost images of the respective frame images.

Next, in step S3215, the region determination unit 3018 determines a text synthesis region in the moving image data based on the generated average final cost image and the determined character size of the text data.
Finally, in step S3216, the synthesizer 3019 synthesizes the moving image data and the text data by superimposing the text data on the determined synthesis area.

In the present embodiment, the synthesis area for the entire moving image data is determined based on the average final cost image, but the synthesis area may instead be determined for every predetermined time period of the moving image data. For example, the image processing unit 3140 sets the synthesis area r₁ based on the first frame image as the synthesis area of the frame images from 0 seconds to t−1 seconds, sets the synthesis area r₂ based on the frame image at t seconds as the synthesis area of the frame images from t seconds to 2t−1 seconds, and determines the synthesis area of each subsequent frame image in the same manner. As a result, the text can be synthesized at an optimal position in accordance with the movement of the subject in the moving image data.
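The per-interval variant can be sketched as a simple lookup. This is a minimal illustration that assumes the synthesis areas r₁, r₂, and so on have already been determined from the frames sampled every t seconds and are held in a list; the function name `region_for_time` is hypothetical.

```python
def region_for_time(time_s, interval_s, regions):
    """Return the synthesis area for the frame displayed at time_s seconds.

    regions[k] is the area determined from the frame sampled at k * interval_s seconds.
    Times past the last sampled frame reuse the last area (an assumption).
    """
    index = min(int(time_s // interval_s), len(regions) - 1)
    return regions[index]
```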

As described above, according to the present embodiment, the image processing unit 3140 determines a synthesis area in which text is synthesized based on an edge cost image indicating a cost related to an edge in image data. Therefore, it is possible to synthesize text in a region with few edges (that is, a region where no complex texture exists). Thereby, since it is possible to prevent the outline of the font used for text display and the texture edge from overlapping, the text can be synthesized in the input image so that the viewer can easily read the text.

In addition, when the position where the text is displayed is fixed, depending on the content of the input image and the amount of text, the text may overlap the subject, a person of interest, an object, the background, or the like, and the original impression of the input image may be impaired. The image processing unit 3140 according to the present embodiment determines the synthesis area for the text based on a face cost image indicating a cost related to a person's face in the image data, and can therefore synthesize the text in an area other than the person's face. In addition, the image processing unit 3140 determines the synthesis area for the text based on a global cost image indicating a cost related to an important position in the image data, and can therefore synthesize the text in an area away from the important position. For example, in many images the subject is present in the central portion, so the text can be synthesized in an area other than the subject by setting the central portion as the important position. Further, in the image processing unit 3140 according to the present embodiment, the user can designate the important position, so the important position can be changed for each input image. For example, the central portion can be set as the important position in an input image A, and an end portion can be set as the important position in an input image B.

In addition, according to the present embodiment, the image processing unit 3140 determines a synthesis region in which text is synthesized based on a final cost image obtained by combining a global cost image, a face cost image, and an edge cost image. Therefore, it is possible to synthesize text at an optimal position comprehensively.

By the way, when the character size is fixed, the relative size of the text with respect to the image data may change drastically depending on the image size of the input image, resulting in a text display unsuitable for the viewer. For example, when the character size of text data is relatively large with respect to the input image, all text may not fit in the input image and the sentence may not be read. According to the present embodiment, the image processing unit 3140 changes the character size of the text data in accordance with the image size of the input image, so that the entire text can be stored in the input image.

Further, according to the present embodiment, the image processing unit 3140 can synthesize text with moving image data. Thereby, for example, it can be applied to a service or the like that dynamically displays a comment received from a user in an image while a moving image is distributed and reproduced by broadcasting or the Internet. In addition, since the image processing unit 3140 determines the synthesis area using the average final cost image of a plurality of frame images, it can synthesize the text in an area that is comprehensively optimal in view of the movement of the subject over the entire moving image.

(Eighth embodiment)
Next, an image processing unit (image processing apparatus) 3140a according to an eighth embodiment of the present invention will be described.
FIG. 29 is a block diagram illustrating a functional configuration of the image processing unit 3140a according to the present embodiment. In this figure, the same parts as those of the image processing unit 3140 shown in FIG. 25 are denoted by the same reference numerals, and the description thereof is omitted. The image processing unit 3140a includes a second position input unit 3021 in addition to the configuration of the image processing unit 3140 shown in FIG. 25.
The second position input unit 3021 receives an input of a position (hereinafter referred to as a text position (second position)) where text is combined in the image data. For example, the second position input unit 3021 displays the image data input to the image input unit 3011 on the display unit 1150, and sets the position specified by the user on the touch panel installed on the display unit 1150 as the text position. Alternatively, the second position input unit 3021 may directly accept input of coordinate values (x ₁ , y ₁ ) of the text position. The second position input unit 3021 outputs the coordinate value (x ₁ , y ₁ ) of the text position to the cost calculation unit 3017a.

Based on the text position (x₁, y₁) input by the second position input unit 3021, the position of the edge in the image data, the position of the person's face, and the important position, the cost calculation unit 3017a calculates the cost of each coordinate position (x, y) in the image data. Specifically, the cost calculation unit 3017a combines a text position cost image indicating a cost based on the text position (x₁, y₁), the global cost image, the face cost image, and the edge cost image to generate a final cost image. The generation method of the global cost image, the face cost image, and the edge cost image is the same as that in the seventh embodiment.

The cost calculation unit 3017a generates a text position cost image c _t (x, y) by the following equation (12).

Here, S₃ (> 0) is a parameter that determines how the cost spreads in the width direction (X-axis direction), and S₄ (> 0) is a parameter that determines how the cost spreads in the height direction (Y-axis direction). The text position cost image is an image that has a lower cost as it is closer to the text position (x₁, y₁), and has a higher cost as it is farther from the text position.
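The behavior of the text position cost image can be illustrated with the following Python sketch. The inverted Gaussian form used here is an assumption chosen only because it matches the stated behavior (cost 0 at the text position, rising toward 1 with distance, with separate spreads in the X and Y directions); it should not be read as equation (12) itself. The function name `text_position_cost` is hypothetical.

```python
import numpy as np


def text_position_cost(h, w, x1, y1, s3, s4):
    """Cost image that is low near the text position (x1, y1) and high far from it.

    s3 and s4 (> 0) control the spread in the X and Y directions.
    Assumption: inverted Gaussian form, for illustration only.
    """
    ys, xs = np.mgrid[0:h, 0:w]
    return 1.0 - np.exp(-((xs - x1) ** 2) / s3 - ((ys - y1) ** 2) / s4)
```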

Then, the cost calculation unit 3017a generates a final cost image c (x, y) by the following equation (13).

However, C _t (≧ 0) is a parameter of the weighting coefficient of the text position cost image.
Expression (13) is obtained from Expression (10) by adding C_t to the denominator and C_t c_t(x, y) to the numerator. Note that, when the text position is not designated by the second position input unit 3021, the cost calculation unit 3017a generates the final cost image according to the above equation (10) without generating the text position cost image. Alternatively, the cost calculation unit 3017a sets the parameter C_t = 0 when the text position is not designated by the second position input unit 3021.
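The relation between Expression (10) and Expression (13) can be sketched in Python as follows. The sketch assumes that Expression (10) is a weighted average of the global, face, and edge cost images with weights C_g, C_f, and C_e, which is inferred from the statement that C_t is added to the denominator and C_t c_t(x, y) to the numerator; the function name `final_cost` is hypothetical.

```python
import numpy as np


def final_cost(c_g, c_f, c_e, C_g, C_f, C_e, c_t=None, C_t=0.0):
    """Weighted combination of cost images (assumed weighted-average form).

    With c_t omitted or C_t == 0, this reduces to the three-term combination,
    matching the case in which no text position is designated.
    """
    numerator = C_g * c_g + C_f * c_f + C_e * c_e
    denominator = C_g + C_f + C_e
    if c_t is not None and C_t > 0:
        numerator = numerator + C_t * c_t   # extra term in the numerator
        denominator = denominator + C_t     # extra term in the denominator
    return numerator / denominator
```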

Further, when the image data is a moving image, the cost calculation unit 3017a calculates an average value of the costs of a plurality of frame images included in the moving image data for each coordinate position. Specifically, the cost calculation unit 3017a acquires a frame image of a moving image at a predetermined time (for example, 3 seconds) interval, and generates a final cost image for each acquired frame image. Then, the cost calculation unit 3017a generates an average final cost image obtained by averaging the final cost images of the respective frame images.

Next, with reference to FIG. 30, the composition processing by the image processing unit 3140a will be described. FIG. 30 is a flowchart showing the procedure of the composition processing according to this embodiment.
The processing shown in steps S3301 to S3303 is the same as the processing shown in steps S3101 to S3103 described above.
Subsequent to step S3303, in step S3304, the second position input unit 3021 accepts designation of a text position in the input image data.
The processing shown in steps S3305 to S3307 is the same as the processing shown in steps S3104 to S3106 described above.

Subsequent to step S3307, in step S3308, the cost calculation unit 3017a generates a text position cost image based on the designated text position.
The processing shown in steps S3309 to S3311 is the same as the processing shown in steps S3107 to S3109 described above.

Subsequent to step S3311, in step S3312, the cost calculation unit 3017a generates a final cost image by combining the text position cost image, the global cost image, the face cost image, and the edge cost image.
Next, in step S3313, the area determination unit 3018 determines a text synthesis area in the image data based on the generated final cost image and the determined character size of the text data.
Finally, in step S3314, the synthesis unit 3019 superimposes the text data on the determined synthesis area, and synthesizes the image data and the text data.

In the present embodiment, the text position is specified in the second position input unit 3021. However, for example, an area in which text is to be synthesized may be specified. In this case, the cost calculation unit 3017a generates a text position cost image in which the pixel value of the designated area is “0” and the pixel value of the other area is “1”. That is, the cost calculation unit 3017a reduces the cost of the designated area.

As described above, according to the present embodiment, the user can designate the position where the text is to be synthesized, and the image processing unit 3140a determines the synthesis area by reducing the cost of the designated text position. Thereby, not only the same effect as in the seventh embodiment but also a position designated by the user can be preferentially selected as a text data synthesis region.

(Ninth embodiment)
Next, an image processing unit (image processing apparatus) 3140b according to a ninth embodiment of the present invention will be described.
FIG. 31 is a block diagram illustrating a functional configuration of the image processing unit 3140b according to the present embodiment. In this figure, the same parts as those of the image processing unit 3140 shown in FIG. 25 are denoted by the same reference numerals, and the description thereof is omitted. The image processing unit 3140b includes a second position input unit 3031 in addition to the configuration of the image processing unit 3140 shown in FIG. 25.
The second position input unit 3031 accepts input of a text position (second position) in either the X-axis direction (width direction) or the Y-axis direction (height direction). The text position is a position where the text is synthesized in the image data. For example, the second position input unit 3031 displays the image data input to the image input unit 3011 on the display unit 1150, and sets the position specified by the user on the touch panel installed on the display unit 1150 as the text position. Alternatively, the second position input unit 3031 may directly accept input of the X coordinate value x₂ or the Y coordinate value y₂ of the text position. The second position input unit 3031 outputs the X coordinate value x₂ or the Y coordinate value y₂ of the text position to the region determination unit 3018b.

When the position x₂ in the width direction is designated by the second position input unit 3031, the region determination unit 3018b fixes the X coordinate value to x₂ in the above-described equation (11) and obtains the Y coordinate value y_min that minimizes c*_text(x₂, y). Then, the region determination unit 3018b sets the position (x₂, y_min) as the synthesis position.

In addition, when the position y₂ in the height direction is designated by the second position input unit 3031, the region determination unit 3018b fixes the Y coordinate value to y₂ in the above-described equation (11) and obtains the X coordinate value x_min that minimizes c*_text(x, y₂). Then, the region determination unit 3018b sets the position (x_min, y₂) as the synthesis position.
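The constrained search can be sketched in Python as follows. The sketch first tabulates the total cost of the text rectangular area for every upper-left vertex and then minimizes along one axis while the other coordinate is held fixed. The array layout `cost[y, x]` and the function names are assumptions for illustration.

```python
import numpy as np


def rect_totals(cost, w_text, h_text):
    """totals[y, x] = total cost of the text rectangle with upper-left vertex (x, y)."""
    h, w = cost.shape
    totals = np.empty((h - h_text + 1, w - w_text + 1))
    for y in range(totals.shape[0]):
        for x in range(totals.shape[1]):
            totals[y, x] = cost[y:y + h_text, x:x + w_text].sum()
    return totals


def position_with_fixed_x(totals, x2):
    """Fix X to x2 and return (x2, y_min) minimizing the total cost."""
    return x2, int(np.argmin(totals[:, x2]))


def position_with_fixed_y(totals, y2):
    """Fix Y to y2 and return (x_min, y2) minimizing the total cost."""
    return int(np.argmin(totals[y2, :])), y2
```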

Next, with reference to FIG. 32, the composition processing by the image processing unit 3140b will be described. FIG. 32 is a flowchart showing the procedure of the synthesis process according to this embodiment.
The processing from step S3401 to S3403 is the same as the processing from step S3101 to S3103 described above.
Following step S3403, in step S3404, the second position input unit 3031 accepts input of the X coordinate value x₂ or the Y coordinate value y₂ of the text position.
The processing from step S3405 to S3411 is the same as the processing from step S3104 to S3110 described above.

Following step S3411, in step S3412, the region determination unit 3018b determines a text synthesis area in the image data based on the X coordinate value x₂ or the Y coordinate value y₂ of the designated text position, the character size of the text data, and the final cost image.
Finally, in step S3413, the synthesis unit 3019 superimposes the text data on the determined synthesis area to synthesize the image data and the text data.

As described above, according to the present embodiment, it is possible to specify the coordinate in the width direction or the height direction of the position where the text is synthesized. The image processing unit 3140b sets the optimum region based on the final cost image among the designated positions in the width direction or height direction as the synthesis region. As a result, the text can be superimposed on an optimum area (for example, an area with high text readability, an area without a human face, or an area other than the important position) which is an area desired by the user.

Further, a program for realizing each step shown in FIG. 27, FIG. 28, FIG. 30, or FIG. 32 may be recorded on a computer-readable recording medium, and the image data and the text data may be combined by causing a computer system to read and execute the program recorded on this recording medium.

The program may be transmitted from a computer system storing the program in a storage device or the like to another computer system via a transmission medium or by a transmission wave in the transmission medium.
The program may be for realizing a part of the functions described above.
Furthermore, the program may be a so-called difference file (difference program) that realizes the above-described functions in combination with a program already recorded in the computer system.

In the above-described embodiment, all the regions in the image data are set as the synthesis region candidates. However, in consideration of the margin of the image data, a region other than the margin may be set as the synthesis region candidate. In this case, the character size determination unit 3016 sets f satisfying the following expression (14) as the character size.

Here, M₁ is a parameter indicating the size of the margin in the width direction, and M₂ is a parameter indicating the size of the margin in the height direction. The parameter M₁ and the parameter M₂ may be the same value (M₁ = M₂ = M). The cost calculation units 3017 and 3017a generate a final cost image of the area excluding the margin in the image data. Further, the region determination units 3018 and 3018b select the synthesis region from the region excluding the margins (M₁ < x < w−M₁, M₂ < y < h−M₂).

In the present embodiment, the important position is input by the first position input unit 3013. However, a global cost image may be generated using a predetermined position (for example, the center of the image data) as the important position. For example, when the center of the image data is the important position, the cost calculation units 3017 and 3017a generate a global cost image according to the following equation (15).

However, S (> 0) is a parameter that determines how the cost spreads.

If the important position is determined in advance, the global cost image is determined by the image size. Therefore, a global cost image may be prepared in advance for each image size and stored in the storage unit 1160. The cost calculation units 3017 and 3017a read the global cost image corresponding to the image size of the input image from the storage unit 1160 and generate a final cost image. This eliminates the need to generate a global cost image each time text data is combined with image data, thereby reducing the overall processing time.

In the above-described embodiment, a face cost image based on a person's face region is generated. However, a cost image based on an arbitrary feature amount (for example, an object or an animal) may be generated. In this case, the cost calculation units 3017 and 3017a generate a feature amount cost image in which the cost of the feature amount region is high. For example, the cost calculation units 3017 and 3017a generate a feature amount cost image in which the pixel value of the region of the feature amount detected by object recognition or the like is “1” and the pixel value of the other region is “0”. Then, the cost calculation unit 3017 generates a final cost image based on the feature amount cost image.

Further, before calculating the total cost c*_text(x, y) in the text rectangular area, the area determination units 3018 and 3018b may generate in advance a differential image for all coordinate positions (x, y) by the following equation (16).

In this case, the area determination units 3018 and 3018b calculate the total cost c*_text(x, y) in the text rectangular area by the following equation (17).

FIG. 33 is an image diagram showing a method of calculating the total cost in the text rectangular area.
As shown in this figure, when the equation (17) is used, the total cost c ^* _text (x, y) in the text rectangular area can be calculated by four operations. As a result, the processing time can be shortened compared to the case where the total cost c ^* _text (x, y) in the text rectangular area is calculated by the above-described equation (11).
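A total obtained from four operations on a precomputed image is characteristic of a summed-area table (integral image), and the following Python sketch illustrates that technique. It is offered on the assumption that the precomputed image of equation (16) plays this role; the zero-padded layout, the array indexing `cost[y, x]`, and the function names are illustrative choices, not taken from equations (16) and (17).

```python
import numpy as np


def integral_image(cost):
    """Zero-padded summed-area table: ii[y, x] is the sum of cost[:y, :x]."""
    ii = np.zeros((cost.shape[0] + 1, cost.shape[1] + 1))
    ii[1:, 1:] = cost.cumsum(axis=0).cumsum(axis=1)
    return ii


def rect_total(ii, x, y, w_text, h_text):
    """Total cost of the rectangle with upper-left vertex (x, y), from four lookups."""
    return (ii[y + h_text, x + w_text] - ii[y, x + w_text]
            - ii[y + h_text, x] + ii[y, x])
```

The table is built once per final cost image, after which each candidate position costs a constant number of operations regardless of the size of the text rectangular area.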

(Tenth embodiment)
The functional block diagram of the imaging apparatus according to the present embodiment is the same as that shown in FIG. 8 according to the second embodiment.
Hereinafter, parts different from those of the second embodiment will be described in detail.
FIG. 34 is a block diagram showing a functional configuration of an image processing unit (image processing apparatus) 4140 (the image processing unit 1140 in FIG. 8) according to the tenth embodiment of the present invention.
As shown in FIG. 34, the image processing unit 4140 according to this embodiment includes an image input unit 4011, a text setting unit 4012, a text composition region setting unit 4013, a font setting unit 4014, a composite image generation unit 4015, and a storage unit 4016.
The font setting unit 4014 includes a font color setting unit 4021.

The image input unit 4011 inputs image data of a still image, a moving image, or a through image. The image input unit 4011 outputs the input image data to the text setting unit 4012.
Here, the image input unit 4011 inputs, for example, image data output from the A / D conversion unit 1120, image data stored in the buffer memory unit 1130, or image data stored in the storage medium 1200.
As another example, a configuration in which the image input unit 4011 inputs image data via a network (not shown) may be used.

A text setting unit 4012 receives image data from the image input unit 4011 and sets text data to be superimposed (synthesized) on the image data. The text setting unit 4012 outputs the image data and the set text data to the text composition area setting unit 4013.
The text data may include, for example, information on the size of characters constituting the text.

Here, any method may be used as a method of setting text data to be superimposed on the image data.
As an example, text data fixedly determined in advance may be stored in the storage unit 4016, and the text setting unit 4012 may read the text data from the storage unit 4016 and set it.
As another example, the text setting unit 4012 may detect and set text data designated by the user by operating the operation unit 1180.

As another example, a rule for determining text data based on image data may be stored in the storage unit 4016, and the text setting unit 4012 may read the rule from the storage unit 4016 and determine and set the text data from the image data according to the rule. As this rule, for example, a rule that defines the correspondence between a predetermined feature or a predetermined feature amount of image data and text data can be used. In this case, the text setting unit 4012 detects the predetermined feature or the predetermined feature amount in the image data, and determines the text data corresponding to the detection result according to the rule (the correspondence).

The text composition area setting unit 4013 receives the image data and the set text data from the text setting section 4012, and sets an area (text composition area) for synthesizing the text data with the image data. The text composition area setting unit 4013 outputs the image data, the set text data, and information for specifying the set text composition area to the font setting unit 4014.

Here, any method may be used as a method of setting a region (text combining region) in which text data is combined with image data.
As an example, a fixed text synthesis area may be stored in the storage unit 4016, and the text synthesis area setting unit 4013 may read the text synthesis area from the storage unit 4016 and set it.
As another example, the text composition region setting unit 4013 may detect and set a text composition region designated by the user operating the operation unit 1180.

As another example, a rule for determining a text composition area based on image data is stored in the storage unit 4016, and the text composition area setting unit 4013 reads the rule from the storage unit 4016, and according to the rule, The text synthesis area may be determined and set from the image data. As this rule, for example, a rule for determining a text synthesis area so that text is superimposed on a non-important area other than an important area where a relatively important subject is captured in an image can be used. As a specific example, it is possible to use a configuration in which an area in which a person is captured is classified as an important area, and text is superimposed on a non-important area that does not include the center of the image. Various other rules may also be used.

In the present embodiment, when the preset size of the set text is so large that the entire text does not fit in the text composition area, the text composition area setting unit 4013 reduces the text size so that the entire set text fits in the text composition area.
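The size reduction can be illustrated with a minimal Python sketch. It assumes square character cells with no line or character spacing, and a step of one size unit; these simplifications and the function name `fit_char_size` are assumptions for illustration only.

```python
def fit_char_size(char_size, rows, cols, area_w, area_h, min_size=1):
    """Reduce the character size until rows x cols characters fit in the area.

    Assumption: each character occupies a square cell of side `size`.
    Returns min_size if the text cannot fit even at the smallest size.
    """
    size = char_size
    while size > min_size and (cols * size > area_w or rows * size > area_h):
        size -= 1
    return size
```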

Here, various shapes of regions may be used as the text composition region, and for example, an internal region surrounded by a rectangular frame such as a rectangle or a square can be used. As another example, an internal area surrounded by a frame partially or entirely made of a curve may be used as the text synthesis area.

The font setting unit 4014 receives the image data, the set text data, and the information specifying the set text composition region from the text composition region setting unit 4013, and, based on one or more of these, sets the font (including at least the font color) of the text data. The font setting unit 4014 outputs the image data, the set text data, the information specifying the set text composition region, and information specifying the set font to the composite image generation unit 4015.

Here, in the present embodiment, the font setting unit 4014 mainly sets the font color of the text data by means of the font color setting unit 4021. In the present embodiment, the font color is treated as one attribute of the font.
For this reason, in this embodiment, font attributes other than the font color may be arbitrary and, for example, may be fixedly set in advance.

The font color setting unit 4021 sets the font color of the text data input from the text composition region setting unit 4013 to the font setting unit 4014, based on the image data and the text composition region input from the text composition region setting unit 4013 to the font setting unit 4014.
Note that, when the font color setting unit 4021 sets the font color, the text data input from the text composition region setting unit 4013 to the font setting unit 4014 may, for example, also be taken into account.

The composite image generation unit 4015 receives the image data, the set text data, the information specifying the set text composition area, and the information specifying the set font from the font setting unit 4014, and generates image data (composite image data) in which the text data is combined, in the set font (including at least the font color), into the text composition area of the image data.

Then, the composite image generation unit 4015 outputs the generated composite image data to, for example, one or more of the storage unit 1200 (via the communication unit 1170), the display unit 1150, the buffer memory unit 1130, and the like.
As another example, a configuration in which the composite image generation unit 4015 outputs data of the generated composite image to a network (not shown) may be used.

The storage unit 4016 stores various types of information. For example, in the present embodiment, the storage unit 4016 stores information referred to by the text setting unit 4012, information referred to by the text composition region setting unit 4013, and information referred to by the font setting unit 4014 (including the font color setting unit 4021).

Next, processing performed in the font setting unit 4014 will be described in detail.
In the present embodiment, only the font color is set as the font, and the other font attributes may be arbitrary. Therefore, the font color setting process performed by the font color setting unit 4021 will be described.

First, the PCCS color system (Practical Color Co-ordinate System) of the Japan Color Research Institute, which is one method of expressing color, will be briefly described.
The PCCS color system is a color system in which hue, lightness, and saturation are determined based on human sensitivity.
The PCCS color system also has the concept of a tone, which is determined from lightness and saturation, so that a color can be expressed by the two parameters of tone and hue.

As described above, in the PCCS color system, in addition to expressing the color by the three attributes of the color (hue, lightness, and saturation), the concept of tone can also be defined to express the color by tone and hue.
There are 12 types of tones for chromatic colors and 5 types for achromatic colors.
Twenty-four or twelve hues are determined depending on the tone.

FIG. 41 is a diagram illustrating an example of a hue circle of the PCCS color system in gray scale.
FIG. 42 is a diagram illustrating an example of a PCCS color system tone in gray scale. As a rule, the horizontal axis of the tone corresponds to the saturation, and the vertical axis of the tone corresponds to the lightness.
Color versions of FIGS. 41 and 42 are published, for example, on the web page of DIC Color Design Co., Ltd.

In the example of the hue circle shown in FIG. 41, 24 hues are defined: warm colors 1 to 8, neutral colors 9 to 12, cool colors 13 to 19, and neutral colors 20 to 24.

In the example of the tone (PCCS tone map) shown in FIG. 42, 12 types of tones are defined for chromatic colors and 5 types are defined for achromatic colors. In this example, 12 types of hues are defined for each chromatic color tone.

FIG. 43 is a diagram illustrating twelve chromatic color tones.
In this example, the correspondence between tone names and tone symbols is shown.
Specifically, as shown in FIG. 43, the twelve chromatic tones are the vivid tone (symbol v), strong tone (symbol s), bright tone (symbol b), light tone (symbol lt), pale tone (symbol p), soft tone (symbol sf), light grayish tone (symbol ltg), dull tone (symbol d), grayish tone (symbol g), deep tone (symbol dp), dark tone (symbol dk), and dark grayish tone (symbol dkg).

FIG. 44 is a diagram showing five types of achromatic tones.
In this example, the correspondence between the tone name, tone symbol, PCCS number, R (red) value, G (green) value, and B (blue) value is shown.
Specifically, as shown in FIG. 44, the five achromatic tones are the white tone (symbol W), light gray tone (symbol ltGy), medium gray tone (symbol mGy), dark gray tone (symbol dkGy), and black tone (symbol Bk).

Note that the correspondence between the PCCS color system number and the RGB value in the achromatic color tone is in accordance with the color table of the web page “http://www.wsj21.net/ghp/ghp0c — 03.html”.

Next, processing performed by the font color setting unit 4021 will be described.
Using the PCCS color system, the font color setting unit 4021 sets the font color of the text data input from the text composition region setting unit 4013 to the font setting unit 4014, based on the image data and the text composition region input from the text composition region setting unit 4013 to the font setting unit 4014.

Here, in this embodiment, by the time the font color for displaying text in the image is set, the text composition area setting unit 4013 has already optimized the position at which the text is displayed in the image, so that the position in the image at which the text is displayed (the text composition area) has been determined.

The font color setting unit 4021 first determines the average color of the text composition area in the image data (the average color of the image area in which the text is displayed), based on the image data and the text composition area input from the text composition area setting unit 4013 to the font setting unit 4014.

Specifically, based on the image data and the text composition region input from the text composition region setting unit 4013 to the font setting unit 4014, the font color setting unit 4021 calculates, over the pixels inside the text composition region in the image data, the average R value, the average G value, and the average B value, and obtains the combination of these R, G, and B average values as the RGB average color. The font color setting unit 4021 then converts the obtained RGB average color into a tone and hue of the PCCS color system, based on the information 4031 of the conversion table from the RGB system to the PCCS color system stored in the storage unit 4016. The tone and hue of the PCCS color system obtained by this conversion are used as the average color in the PCCS color system.

Here, each pixel inside the text composition area in the image data has R, G, and B values (for example, values from 0 to 255). These values are summed separately for R, G, and B over all pixels inside the text composition area, and each sum is divided by the number of those pixels to give the average values for R, G, and B. The combination of these average values is the RGB average color.
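
The averaging described above can be sketched as follows. The image layout (a list of rows of (R, G, B) tuples), the rectangular `(left, top, width, height)` region format, and the function name `average_rgb` are assumptions made for illustration only.

```python
def average_rgb(image, region):
    """Return the RGB average color of a rectangular text composition area.

    image:  list of rows, each row a list of (R, G, B) tuples (0 to 255).
    region: (left, top, width, height) in pixel coordinates (assumed format).
    """
    left, top, width, height = region
    sum_r = sum_g = sum_b = 0
    for y in range(top, top + height):
        for x in range(left, left + width):
            r, g, b = image[y][x]
            sum_r += r
            sum_g += g
            sum_b += b
    count = width * height  # number of pixels inside the area
    return (sum_r / count, sum_g / count, sum_b / count)
```

A non-rectangular area would need a pixel mask instead of a rectangle, but the per-channel sum and division are the same.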

The conversion table specified by the information 4031 of the conversion table from the RGB system to the PCCS color system, which is referred to when converting the RGB average color into the tone and hue of the PCCS color system, defines the correspondence between RGB average colors and the tones and hues of the PCCS color system.
Conversion tables with various contents may be used. Since RGB can usually take more values than the PCCS color system, the correspondence between RGB values and PCCS color system values is many-to-one. In this case, several different RGB values are converted into the same PCCS color system value, which serves as their representative value.
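
One possible way to realize such a many-to-one correspondence is a nearest-representative lookup, sketched below. The contents of the actual conversion table are not given in this description, so the entries in `REPRESENTATIVES` (both the RGB values and the tone and hue assignments) are placeholders, and the nearest-neighbor rule itself is an assumption.

```python
# Placeholder representatives: (RGB value, (tone symbol, hue number or None)).
# These are NOT the contents of conversion table information 4031.
REPRESENTATIVES = [
    ((255, 255, 255), ("W", None)),
    ((128, 128, 128), ("mGy", None)),
    ((0, 0, 0), ("Bk", None)),
    ((255, 0, 0), ("v", 2)),
    ((0, 0, 255), ("v", 18)),
]

def rgb_to_pccs(rgb):
    """Map an RGB average color to the (tone, hue) of the nearest representative."""
    def squared_distance(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))
    return min(REPRESENTATIVES, key=lambda entry: squared_distance(entry[0], rgb))[1]
```

In this sketch, slightly different reds such as (240, 10, 20) and (220, 5, 35) both map to the same representative, which illustrates the many-to-one behavior.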

In the present embodiment, the RGB average color is converted into the tone and hue of the PCCS color system based on the conversion table. As another example, however, a configuration may be used in which the storage unit 4016 stores information on a conversion formula that defines the conversion from the RGB average color to the tone and hue of the PCCS color system, and the font color setting unit 4021 reads the conversion formula information from the storage unit 4016 and evaluates the conversion formula to convert the RGB average color into the tone and hue of the PCCS color system.

Next, the font color setting unit 4021 sets the font color of the text data input from the text composition region setting unit 4013 to the font setting unit 4014, based on the tone and hue of the PCCS color system that constitute the obtained average color in the PCCS color system.
Specifically, for the tone and hue of the PCCS color system that constitute the obtained average color, the font color setting unit 4021 keeps the hue as it is and changes only the tone, based on the information 4032 of the tone conversion table stored in the storage unit 4016, and sets the resulting color as the font color of the text data input from the text composition region setting unit 4013 to the font setting unit 4014.
The information specifying the font color set in this way is included by the font setting unit 4014 in the information specifying the font and is output to the composite image generation unit 4015.

Here, when the tone and hue of the PCCS color system that constitute the average color calculated by the font color setting unit 4021 are denoted by t and h, respectively, the tone t^{*} and the hue h^{*} of the font color set by the font color setting unit 4021 are expressed by formula (18).

t^{*} \in \{\text{tones different from } t\}
h^{*} = h \qquad (18)
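
Formula (18) can be sketched in code as follows. The table `TONE_CONVERSION` is a hypothetical excerpt standing in for the tone conversion table information 4032; only the assignment of white or light gray to dark tones is suggested by this description, and the remaining pairs are illustrative.

```python
# Hypothetical excerpt of a tone conversion table (tone symbol -> tone symbol).
TONE_CONVERSION = {"dk": "W", "dkg": "ltGy", "b": "dp", "p": "dp"}

def font_color_from_average(tone, hue):
    """Apply formula (18): replace the tone, keep the hue unchanged."""
    new_tone = TONE_CONVERSION[tone]
    if new_tone == tone:
        raise ValueError("the converted tone must differ from the original tone")
    # For an achromatic result such as "W", the hue is carried along but unused.
    return new_tone, hue
```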

In the present embodiment, the colors of the image input and given by the image input unit 4011 have n gradations per channel and thus n³ possible colors, whereas the font colors are limited to the N colors defined by the PCCS color system (usually N < n³). Consequently, a certain color difference already arises at this point, and a certain degree of font outline is obtained.
Note that with n = 256 gradations, as used in a general digital image, the number of image colors is 256³ = 16777216.
Further, as an example, if the number of hues per tone is generously estimated at 24, the number of font colors is N = 12 × 24 + 5 = 293.
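
The two counts above can be checked with a few lines of arithmetic. The variable names are chosen for illustration only.

```python
n = 256                      # gradations per channel in a general digital image
image_colors = n ** 3        # number of colors an image pixel can take

chromatic_tones = 12         # chromatic tones in the PCCS color system
achromatic_tones = 5         # achromatic tones in the PCCS color system
hues_per_tone = 24           # generous estimate used in the example above
font_colors = chromatic_tones * hues_per_tone + achromatic_tones

print(image_colors, font_colors)  # 16777216 293
```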

As described above, in this embodiment, a font color obtained by changing only the PCCS tone of the average color of the text composition area, in which the text data is arranged in the image data, is applied to the text data. Thus, for example, when an image obtained by combining the image data and the text data is displayed, a font color with sufficient contrast to make the text easy to read can be set without changing the impression of the image.

Here, processing for changing the tone of the PCCS color system performed by the font color setting unit 4021 will be described.
FIG. 35 is a diagram showing the relationship of the harmony of the contrast by the tone in the PCCS color system.
Note that the contents of FIG. 35 are disclosed, for example, on the web page of DIC Color Design Co., Ltd.

In this embodiment, information 4032 of the tone conversion table that defines the correspondence between the tone before conversion and the tone after conversion is stored in the storage unit 4016.
Various contents may be set and used as the contents of the tone conversion table (the correspondence between the tone before conversion and the tone after conversion). As an example, the contents are set in consideration of the relationship of the harmony of contrast by tone in the PCCS color system shown in FIG. 35.

Specifically, for example, a white tone or a light gray tone is assigned to a dark tone.
For the bright tone, for example, another tone having a harmonious relationship of contrast shown in FIG. 35 is assigned. Alternatively, a chromatic color having a harmonious relationship with a contrast can be assigned.

If, based on the relationship of the harmony of contrast, there are two or more candidate tones after conversion for a given tone before conversion, a chromatic candidate, for example, is adopted from among them. Furthermore, the brighter tone (for example, the most vivid tone) is adopted.
For example, in the relationship of the harmony of contrast shown in FIG. 35, colors become darker toward the lower left and brighter toward the right. As a specific example of adopting a vivid tone, the candidate closer to dp (which may be dp itself) is adopted.
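
The candidate selection described above can be sketched as follows. The numeric `VIVIDNESS` ranking is an assumption introduced for illustration (a rough ordering of the twelve chromatic tones by saturation); the embodiment does not define such a ranking, and the rule of choosing the candidate closer to dp is not modeled here.

```python
# Assumed ranking of chromatic tones by vividness (higher is more vivid).
VIVIDNESS = {
    "v": 4,
    "b": 3, "s": 3, "dp": 3,
    "lt": 2, "sf": 2, "d": 2, "dk": 2,
    "p": 1, "ltg": 1, "g": 1, "dkg": 1,
}

def pick_tone(candidates):
    """Prefer chromatic candidates; among those, take the most vivid one."""
    chromatic = [t for t in candidates if t in VIVIDNESS]
    pool = chromatic or candidates  # fall back to achromatic candidates
    # On ties, max() keeps the first candidate in the given order.
    return max(pool, key=lambda t: VIVIDNESS.get(t, 0))
```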

Next, a processing procedure in the present embodiment will be described.
With reference to FIG. 36, a procedure of processing performed in the image processing unit 4140 according to the present embodiment will be described.
FIG. 36 is a flowchart illustrating a procedure of processing performed in the image processing unit 4140 according to the present embodiment.

First, in step S4001, the image input unit 4011 inputs image data.
In step S4002, the text setting unit 4012 sets text data.
In step S4003, the text composition area setting unit 4013 sets a text composition area when the text data is synthesized with the image data.
In step S4004, the font setting unit 4014 sets a font including a font color when the text data is combined with the text composition area set in the image data.
In step S4005, the composite image generation unit 4015 applies the set font to the text data and combines the text data into the text composition area set in the image data, thereby generating the composite image data.
Finally, in step S4006, the composite image generation unit 4015 outputs the generated composite image data to, for example, another component unit via the bus 1300.

With reference to FIG. 37, a procedure of processing performed in the font setting unit 4014 according to the present embodiment will be described.
FIG. 37 is a flowchart showing a procedure of processing performed in the font setting unit 4014 according to this embodiment.
The procedure of this process shows the details of the process of step S4004 shown in FIG. 36.

First, in step S4011, the font color setting unit 4021 in the font setting unit 4014 obtains, in RGB, the average color of the text composition area (the image area for displaying text) set in the image data for displaying the text data, based on the image data, the text data, and the text composition area to be processed.
Next, in step S4012, the font color setting unit 4021 in the font setting unit 4014 obtains the corresponding PCCS color system tone and hue from the obtained average RGB color.

In step S4013, the font color setting unit 4021 in the font setting unit 4014 changes the obtained tone to another tone.
Next, in step S4014, the font color setting unit 4021 in the font setting unit 4014 sets, as the font color, the color of the PCCS color system determined by the combination of the changed tone (the other tone) and the obtained hue, which is left unchanged.
Finally, in step S4015, the font setting unit 4014 sets a font including the font color set by the font color setting unit 4021 for the text data.
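
Steps S4011 to S4015 can be sketched as a single function that chains the individual stages. The stage functions are passed in as parameters, and the names `set_font`, `average_rgb`, `rgb_to_pccs`, and `change_tone`, as well as the dictionary representation of a font, are assumptions made for this sketch.

```python
def set_font(image, region, average_rgb, rgb_to_pccs, change_tone, base_font=None):
    """Chain the font setting steps; each stage is supplied by the caller."""
    avg = average_rgb(image, region)   # S4011: RGB average color of the area
    tone, hue = rgb_to_pccs(avg)       # S4012: PCCS tone and hue
    new_tone = change_tone(tone)       # S4013: change only the tone
    font_color = (new_tone, hue)       # S4014: changed tone + unchanged hue
    font = dict(base_font or {})       # S4015: font including the font color
    font["color"] = font_color
    return font
```

Passing the stages as parameters keeps the sketch independent of how the conversion tables are actually stored.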

A specific example of image processing will be described with reference to FIGS. 38 and 39.
FIG. 38 is a diagram illustrating an example of the image data 4901.
Consider a case where the image data 4901 shown in FIG. 38 is input by the image input unit 4011 of the image processing unit 4140.

FIG. 39 is a diagram showing an example of the composite image data 4911 in this case.
The composite image data 4911 illustrated in FIG. 39 is output from the composite image generation unit 4015, and thus from the image processing unit 4140.

Here, the composite image data 4911 shown in FIG. 39 is obtained by combining the image data 4901 and the text data 4922 such that, in the same image as the image data 4901 shown in FIG. 38, the text data 4922 set by the text setting unit 4012 (in the example of FIG. 39, the character data “Memories spent on weekdays with everyone (2010/10/06)”) is displayed, in the font set by the font setting unit 4014 (including at least the font color), in the text composition region 4921 set by the text composition region setting unit 4013.

In FIG. 39, the text composition area 4921 is drawn in the composite image data 4911 so that it can be easily identified. In the present embodiment, however, the text composition area 4921 (a rectangular frame in the example of FIG. 39) is not shown in the actual display, and only the text data 4922 is combined with the original image data 4901 and displayed.

As described above, according to the image processing unit 4140 according to this embodiment, the font color of the text is set using the color information of the image area (text composition area) in which the text is displayed in the image. Specifically, the image processing unit 4140 according to the present embodiment sets, for the color information based on the text composition area, a font color in which only the tone is changed without changing the hue in the PCCS color system. As a result, for example, displaying the text can be prevented from changing the impression of the original image.

Therefore, according to the image processing unit 4140 according to the present embodiment, when text is displayed in a digital image such as a still image or a moving image, an optimum font color can be obtained, in consideration of the color information of the image area in which the text is displayed (the text composition area), so that the viewer can easily read the text.

Here, the present embodiment has described the case where, for the image data of one image frame that is a still image, or of one image frame that constitutes a moving image (for example, one image frame selected to represent a plurality of image frames), the following are set: the text data to be superimposed on (combined with) the image data, the text composition area in which the text data is combined with the image data, and the font, including the font color, of the text data to be combined with the image data. As another example, these settings can be made for the image data of two or more image frames constituting a moving image. In this case, as an example, for two or more continuous image frames or two or more intermittent image frames constituting the moving image, the values (for example, RGB values) of corresponding pixels in the frames are averaged, and the same processing as in the present embodiment can be performed on the image data of the single image frame formed from the averaged result (averaged image data).
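
The frame averaging mentioned above can be sketched as follows, assuming that all frames have the same size and that each frame is a list of rows of (R, G, B) tuples. The function name `average_frames` is hypothetical.

```python
def average_frames(frames):
    """Average corresponding pixels of equally sized frames into one frame."""
    count = len(frames)
    height = len(frames[0])
    width = len(frames[0][0])
    return [
        [
            # Average each of the three channels over all frames at (x, y).
            tuple(sum(frame[y][x][c] for frame in frames) / count for c in range(3))
            for x in range(width)
        ]
        for y in range(height)
    ]
```

The resulting averaged image data can then be processed in the same way as a single still image.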

As another configuration example, the font color setting unit 4021 may set the font color such that the ratio between the hue value of the area of the image data in which the text is arranged (the text arrangement area) and the hue value of the text data is closer to 1 than the ratio between the tone value of the text arrangement area of the image data and the tone value of the text data.
Here, the text arrangement area corresponds to the text composition area.
As one aspect, an image processing apparatus (the image processing unit 4140 in the example of FIG. 34) can be configured that includes: an acquisition unit (the image input unit 4011 and the text setting unit 4012 in the example of FIG. 34) that acquires image data and text data; an area determination unit (the text composition region setting unit 4013 in the example of FIG. 34) that determines a text arrangement area in which the text data is arranged in the image data; a color setting unit (the font color setting unit 4021 of the font setting unit 4014 in the example of FIG. 34) that sets a predetermined color for the text data; and an image generation unit (the composite image generation unit 4015 in the example of FIG. 34) that generates an image in which the text data of the predetermined color is arranged in the text arrangement area, wherein the ratio between the hue value of the text arrangement area of the image data and the hue value of the text data is closer to 1 than the ratio between the tone value of the text arrangement area of the image data and the tone value of the text data.
Further, as one aspect, an image processing apparatus can be configured in which, in the image processing apparatus described above (the image processing unit 4140 in the example of FIG. 34), the color setting unit (the font color setting unit 4021 of the font setting unit 4014 in the example of FIG. 34) obtains the tone value and hue value of the PCCS color system from the average RGB color of the text arrangement area, and changes only the tone value of the PCCS color system without changing the hue of the PCCS color system.
Note that, provided the ratio between the hue value of the text arrangement area of the image data and the hue value of the text data is closer to 1 than the ratio between the tone value of the text arrangement area of the image data and the tone value of the text data, various values may be used for each of the ratios.
Even in such a configuration, it is possible to obtain the same effect as in the present embodiment.
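
The ratio condition above can be expressed as a simple check. This description does not define how hue values and tone values are encoded numerically, so the sketch assumes that both are given as positive numbers; the function name is hypothetical.

```python
def hue_ratio_closer_to_one(area_hue, text_hue, area_tone, text_tone):
    """Return True if the hue ratio is closer to 1 than the tone ratio.

    Assumption: hue and tone are encoded as positive numbers.
    """
    hue_ratio = area_hue / text_hue
    tone_ratio = area_tone / text_tone
    return abs(hue_ratio - 1) < abs(tone_ratio - 1)
```

When the hue is left unchanged and only the tone is changed, as in the tenth embodiment, the hue ratio is exactly 1 and the condition holds whenever the tone values differ.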

(Eleventh embodiment)
The functional block diagram of the imaging apparatus according to the present embodiment is the same as that shown in FIG. 8 according to the second embodiment.
The block diagram showing the functional configuration of the image processing unit according to the present embodiment is the same as that shown in FIG. 34 according to the tenth embodiment.
Hereinafter, portions different from the second and tenth embodiments will be described in detail.
In the description of the present embodiment, the same reference numerals as those used in FIGS. 8, 34, 36, and 37 are used.

In the present embodiment, the font setting unit 4014 receives the image data, the set text data, and the information specifying the set text composition region from the text composition region setting unit 4013 and, when setting the font of the text data, sets the font color by means of the font color setting unit 4021 and also sets a predetermined outline as one attribute of the font of the text data, based on the outline information 4033 stored in the storage unit 4016.

Here, as the predetermined outline, for example, a shadow or an edging (border) can be used.
As an example, the type of the predetermined outline (for example, a shadow or an edging) is fixedly set in advance.
As another example, when two or more types of predetermined outlines can be switched and used, the font setting unit 4014 can switch the type of outline to be used in accordance with a switching instruction that the operation unit 1180 receives from the user, for example, when the user operates the operation unit 1180.

Further, as the color of the predetermined outline, for example, black or a color whose tone is darker than the tone of the font color can be used.
As an example, the color of the predetermined outline is fixedly set in advance.
As another example, when two or more colors can be switched and used as the color of the predetermined outline, the font setting unit 4014 can switch the outline color to be used in accordance with a switching instruction that the operation unit 1180 receives from the user, for example, when the user operates the operation unit 1180.

As the outline information 4033 stored in the storage unit 4016, information that the font setting unit 4014 refers to when setting an outline for text is used. For example, information specifying one or more usable outline types or colors is used.

FIG. 40 is a diagram illustrating an example of the composite image data 4931.
The composite image data 4931 shown in FIG. 40 is obtained by combining original image data (not shown; the image excluding the text data 4941) with the text data 4941 such that, in the same image as the original image data, the text data 4941 set by the text setting unit 4012 (in the example of FIG. 40, the character data “Like”) is displayed, in the font set by the font setting unit 4014 (including at least the font color and the outline), in a text composition area (not shown) set by the text composition region setting unit 4013.
Here, the example of FIG. 40 shows a case where a shadow is used as an outline.

In the present embodiment, in the process of step S4015 shown in FIG. 37, which is part of the process of step S4004 shown in FIG. 36, the font setting unit 4014, when setting for the text data a font including the font color set by the font color setting unit 4021, sets a font that has the predetermined outline.

As described above, according to the image processing unit 4140 according to the present embodiment, the font color of the text is set using the color information of the image area (text composition area) in which the text is displayed in the image, and an outline is set as part of the font.

Therefore, according to the image processing unit 4140 according to the present embodiment, the same effects as those of the tenth embodiment can be obtained. In addition, by giving the text in the set font color an outline such as a shadow, the outline of the text is emphasized and the color contrast can be increased. Such an outline is particularly effective when, for example, the font color set for the text is white.
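
A shadow-type outline can be sketched as drawing the text twice: first in the outline color at a small offset, then in the font color at the original position. The glyph mask representation (rows of 0 and 1 marking text pixels), the RGB colors, and the function name `draw_text_with_shadow` are assumptions made for illustration.

```python
def draw_text_with_shadow(image, glyph_mask, origin, font_color,
                          shadow_color=(0, 0, 0), offset=(1, 1)):
    """Draw a glyph mask with a shadow into `image` (modified in place).

    image:      list of rows of (R, G, B) tuples.
    glyph_mask: rows of 0/1 values marking the pixels covered by the text.
    origin:     (x, y) of the top-left corner of the mask in the image.
    """
    def stamp(dx, dy, color):
        for my, row in enumerate(glyph_mask):
            for mx, on in enumerate(row):
                x, y = origin[0] + mx + dx, origin[1] + my + dy
                if on and 0 <= y < len(image) and 0 <= x < len(image[0]):
                    image[y][x] = color

    stamp(offset[0], offset[1], shadow_color)  # shadow first, shifted
    stamp(0, 0, font_color)                    # text on top of the shadow
    return image
```

With white text on a light background, the dark shadow pixels next to each stroke supply the contrast that the font color alone lacks.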

(Twelfth embodiment)
The functional block diagram of the imaging apparatus according to the present embodiment is the same as that shown in FIG. 8 according to the second embodiment.
The block diagram showing the functional configuration of the image processing unit according to the present embodiment is the same as that shown in FIG. 34 according to the tenth embodiment.
Hereinafter, portions different from the second and tenth embodiments will be described in detail.
In the description of this embodiment, the same reference numerals as those used in FIGS. 8, 34, and 37 are used.

In the present embodiment, the font setting unit 4014 receives the image data, the set text data, and the information specifying the set text composition region from the text composition region setting unit 4013. When the font color setting unit 4021 sets the font color of the text data, it determines, based on the color change determination condition information 4034 stored in the storage unit 4016, whether or not the color change in the text composition area in which the text is displayed is greater than or equal to a predetermined value. If it determines that the color change in the text composition area is greater than or equal to the predetermined value, it sets two or more font colors in the text composition area.
When the font color setting unit 4021 determines that the color change in the text composition area is less than the predetermined value, it sets one font color for the entire text composition area, as in the tenth embodiment.

Specifically, the font color setting unit 4021 divides the text composition area for displaying text into a plurality of areas (referred to as divided areas in the present embodiment), and executes, for each divided area, the processing for obtaining the average RGB color (the same processing as step S4011 shown in FIG. 37).
The font color setting unit 4021 then determines whether or not there is a difference greater than or equal to a predetermined value among the RGB average color values of the plurality of divided areas. When it determines that there is such a difference, it determines that the color change in the text composition area is greater than or equal to the predetermined value. On the other hand, when the font color setting unit 4021 determines that there is no difference greater than or equal to the predetermined value among the RGB average color values of the plurality of divided areas, it determines that the color change in the text composition area is less than the predetermined value.

Here, various methods may be used as a method for determining whether or not there is a difference of a predetermined value or more with respect to the average color values of RGB in a plurality of divided regions.
As an example, a method can be used in which it is determined that there is a difference of the predetermined value or more among the RGB average color values of the plurality of divided areas when the difference between the RGB average color values of any two of the divided areas is greater than or equal to the predetermined value.
As another example, a method can be used in which it is determined that there is a difference of the predetermined value or more among the RGB average color values of the plurality of divided areas when the difference between the RGB average color values of the divided area with the smallest RGB average color value and the divided area with the largest RGB average color value is greater than or equal to the predetermined value.
As another example, a method can be used in which the variance of the RGB average color values is obtained over all of the divided areas, and it is determined that there is a difference of the predetermined value or more among the RGB average color values of the plurality of divided areas when this variance is greater than or equal to a predetermined value.

In these cases, when the RGB average color values are compared, as an example, only one of R, G, and B can be compared. As another example, two or three of R, G, and B can be combined into a single value for comparison.
As another example, two or more of R, G, and B can be compared separately.
Here, when two or more of R, G, and B are compared separately, a method can be used in which it is determined that there is a difference of the predetermined value or more as a whole when any one of the comparisons (for any of R, G, and B) shows a difference of the predetermined value or more. Alternatively, a method can be used in which it is determined that there is a difference of the predetermined value or more as a whole only when all of the comparisons show a difference of the predetermined value or more.
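
The determination methods above can be sketched as follows. The function names, the use of the channel mean as the single combined value, and the use of the population variance are assumptions made for illustration. Note that, for scalar values, the any-pair test and the minimum-maximum test give the same result.

```python
from itertools import combinations
from statistics import pvariance

def any_pair_differs(values, threshold):
    """True if any two divided areas differ by at least the threshold."""
    return any(abs(a - b) >= threshold for a, b in combinations(values, 2))

def min_max_differs(values, threshold):
    """True if the smallest and largest values differ by at least the threshold."""
    return max(values) - min(values) >= threshold

def variance_exceeds(values, threshold):
    """True if the (population) variance is at least the threshold."""
    return pvariance(values) >= threshold

def to_scalar(rgb):
    """Combine R, G, and B into a single value (the mean; an assumption)."""
    return sum(rgb) / 3

def channels_differ(avg_colors, threshold, require_all=False):
    """Compare R, G, and B separately across divided areas."""
    per_channel = [
        min_max_differs([color[i] for color in avg_colors], threshold)
        for i in range(3)
    ]
    return all(per_channel) if require_all else any(per_channel)
```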

Various methods may be used as a method of dividing a text composition region for displaying text into a plurality of regions (divided regions).
As an example, a technique can be used in which, for the characters included in the text displayed in the text composition area, the area delimiting each character is used as a divided area. In this case, for example, a rectangular area including the periphery of each character is set in advance, and the entire text composition area is formed by combining the areas of all the characters included in the text. The rectangular area for each character may differ, for example, according to the size of the character.
As another example, areas obtained by dividing the text composition area by a predetermined number of divisions or into a predetermined size (for example, a horizontal length, a vertical length, or a block size such as that of a rectangle) can be used as the divided areas.
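
Division by a predetermined number of divisions can be sketched as follows for a rectangular text composition area split horizontally. The `(left, top, width, height)` region format and the function name `split_region` are assumptions.

```python
def split_region(region, columns):
    """Split a rectangular area into `columns` side-by-side divided areas."""
    left, top, width, height = region
    # Integer boundaries; any remainder is absorbed by the later columns.
    bounds = [left + width * i // columns for i in range(columns + 1)]
    return [
        (bounds[i], top, bounds[i + 1] - bounds[i], height)
        for i in range(columns)
    ]
```

Splitting per character works the same way if the boundaries are taken from the character rectangles instead of from an even division.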

In the present embodiment, based on the average color values of RGB in a plurality of divided areas, it is determined whether or not the color change is greater than or equal to a predetermined value in the text synthesis area composed of the plurality of divided areas. As another example, based on the PCCS color system values of a plurality of divided areas (for example, values specifying the tone and hue of the PCCS color system), whether the color change is greater than or equal to a predetermined value in the text composition area A configuration for determining whether or not may be used.

When, in setting the font color of the text data, the font color setting unit 4021 determines that the color change in the text composition area for displaying the text is greater than or equal to the predetermined value, it executes the following for each divided area, as in the tenth embodiment: the processing for obtaining the average RGB color (the same processing as step S4011 shown in FIG. 37), the processing for obtaining the tone and hue of the PCCS color system (the same processing as step S4012 shown in FIG. 37), the processing for changing the tone (the same processing as step S4013 shown in FIG. 37), and the processing for setting the font color (the same processing as step S4014 shown in FIG. 37). It thereby sets a font color for each divided area.
Note that if, for example, the processing for obtaining the average RGB color (the same processing as step S4011 shown in FIG. 37) has already been performed, it need not be performed again.

In this embodiment, the font colors set for the plurality of divided areas, taken together, are used as the font color set for the text data.
Here, when the font color is set for each of the plurality of divided areas and two or more of the divided areas differ from one another in average RGB color by less than a predetermined value, the font color may, for example, be obtained for only one of those divided areas, and the same font color may be set for all of them.
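The reuse of a font color across divided areas with similar average colors can be sketched as below. The function names, the distance measure, and the threshold of 16 are assumptions, and `pick_font_color` is a hypothetical stand-in for the per-area processing (the steps similar to S4011 to S4014), which is not reproduced here.

```python
def color_distance(a, b):
    """Largest per-channel gap between two (R, G, B) averages (assumed measure)."""
    return max(abs(x - y) for x, y in zip(a, b))


def assign_font_colors(area_averages, pick_font_color, threshold=16):
    """Return one font color per divided area, reusing results for similar areas.

    pick_font_color is called only for an area whose average color is not
    within threshold of an area that has already been processed.
    """
    representatives = []  # (average color, font color) pairs computed so far
    result = []
    for avg in area_averages:
        for rep_avg, rep_font in representatives:
            if color_distance(avg, rep_avg) < threshold:
                result.append(rep_font)  # reuse the already computed color
                break
        else:
            font = pick_font_color(avg)
            representatives.append((avg, font))
            result.append(font)
    return result
```

With three areas of which two are nearly identical, `pick_font_color` runs twice rather than three times, which is the saving the paragraph above describes.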

Further, as another configuration example, after setting the font color for each of the plurality of divided areas, the font color setting unit 4021 may adjust the tone and hue of the PCCS color system in those settings so that the font color of the entire text composition area forms a gradation in a certain direction.

The color change determination condition information 4034 stored in the storage unit 4016 is information referenced when the font color setting unit 4021 determines whether or not the color change is greater than or equal to a predetermined value in the text composition area where the text is displayed. It includes, for example, information specifying a method for dividing the text synthesis area into a plurality of divided areas, information specifying a method for determining whether the average color values of the plurality of divided areas differ by a predetermined value or more, and information specifying the predetermined values (threshold values) used for the various determinations.

As described above, according to the image processing unit 4140 according to the present embodiment, when there is a large change in color in the image area (text synthesis area) for displaying text, two or more font colors are set within that image area accordingly.
Further, according to the image processing unit 4140 according to the present embodiment, as a configuration example, the tone and hue of the PCCS color system are adjusted so that the font color of the entire text becomes a gradation in a certain direction.

Therefore, according to the image processing unit 4140 according to the present embodiment, the readability of the text can be improved even when there is a large color change in the image area (text synthesis area) where the text is displayed. For example, when there is a large change in color in the image area that displays text, calculating the font color from a single average color of that image area may leave part of the text without sufficient contrast, making the text difficult to read. The image processing unit 4140 according to the present embodiment can solve this problem.

It should be noted that, in the present embodiment, as in the eleventh embodiment, a font with a predetermined outline may be set by the font setting unit 4014.

Here, a program for realizing the processing procedures (processing steps) performed in the above embodiments, such as the steps shown in FIG. 36 and FIG. 37, may be recorded on a computer-readable recording medium, and the processing may be performed by causing a computer system to read and execute the program recorded on the recording medium.

Further, the program may be transmitted from a computer system storing the program in a recording device or the like to another computer system via a transmission medium or by a transmission wave in the transmission medium. The program may realize only a part of the functions described above. Furthermore, the program may be a so-called difference file (difference program), which realizes the functions described above in combination with a program already recorded in the computer system.

(Other embodiments)
FIG. 45 is a diagram schematically illustrating an example of a process for extracting a feature amount of a captured image used for determining a sentence to be arranged on an image. In the example of FIG. 45, the determination unit of the image processing apparatus classifies the scene of the captured image as a person image or a landscape image. Next, the image processing apparatus extracts a feature amount of the captured image according to the scene. The feature amount can be the number of faces (number of subjects) and the average color (color arrangement pattern) in the case of a person image, and the average color (color arrangement pattern) in the case of a landscape image. Based on these feature amounts, the words (adjectives and the like) to be inserted into the person image template or the landscape image template are determined.

Here, in the example of FIG. 45, the color arrangement pattern is composed of a combination of a plurality of representative colors that constitute the captured image. Therefore, the color arrangement pattern can represent the average color of the captured image. In one example, a “first color”, a “second color”, and a “third color” are defined as the color arrangement pattern, and a word (adjective) to be inserted into the text template for a person image or a landscape image is determined based on the combination of these three colors, that is, based on three average colors.

In the example of FIG. 45, the scene of the captured image is classified into two types (a person image and a landscape image). In another example, the scene of the captured image can be classified into three or more types (3, 4, 5, 6, 7, 8, 9, or 10 types or more).

FIG. 46 is a diagram schematically illustrating another example of the process of extracting the feature amount of the captured image used for determining the text arranged on the image. In the example of FIG. 46, the scene of the captured image can be classified into three or more types.

In the example of FIG. 46, the determination unit of the image processing apparatus determines whether the captured image is a person image (first mode image), a distant view image (second mode image), or another image (third mode image). First, as in the example of FIG. 45, the determination unit determines whether the captured image is a person image or an image different from a person image.

Next, when the captured image is an image different from a person image, the determination unit determines whether the captured image is a distant view image (second mode image) or another image (third mode image). This determination can be performed using, for example, a part of the image identification information given to the captured image.

Specifically, in order to determine whether the captured image is a distant view image, the focal length that is a part of the image identification information can be used. The determination unit determines that the captured image is a distant view image when the focal distance is greater than or equal to a preset reference distance, and determines the captured image as another image when the focal distance is less than the reference distance. As described above, the captured image is classified into three types of scenes: a person image (first mode image), a distant view image (second mode image), or another image (third mode image). Note that examples of distant view images (second mode images) include landscape images such as the sea and mountains, and examples of other images (third mode images) include flowers and pets.
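The three-way classification described above can be sketched as follows. The function name and the string labels are hypothetical, and the person/non-person decision is taken as an input here because its method is described elsewhere in the specification.

```python
def classify_scene(is_person_image, focal_distance, reference_distance):
    """Classify a captured image into one of three scenes.

    is_person_image: result of the person / non-person determination.
    focal_distance: value taken from the image identification information.
    reference_distance: preset reference distance (value is an assumption
    supplied by the caller).
    """
    if is_person_image:
        return "person"         # first mode image
    if focal_distance >= reference_distance:
        return "distant_view"   # second mode image
    return "other"              # third mode image
```

For instance, a non-person image whose focal distance equals the reference distance is treated as a distant view image, since the comparison is "greater than or equal to".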

In the example of FIG. 46, after the scene of the captured image is classified, the image processing apparatus extracts the feature amount of the captured image according to the scene.

In the example of FIG. 46, when the captured image is a person image (first scene image), the number of faces (number of subjects) and the smile level can be used as the feature amount of the captured image used to determine the text arranged on the image. That is, when the captured image is a person image, the word to be inserted into the person image template can be determined based on the determination result of the smile level, in addition to or instead of the determination result of the number of faces (number of subjects). An example of a smile level determination method is described below.

In the example of FIG. 47, the determination unit of the image processing apparatus detects a face area from a person image by a method such as face recognition (step S5001). In one example, the smile level of the person image is calculated by quantifying the degree to which the corners of the mouth are raised. Various known face recognition techniques can be used for calculating the smile level.

Next, the determination unit compares the smile level with a preset first smile threshold value α (step S5002). When it is determined that the smile level is greater than or equal to α, the determination unit determines that the smile level of the person image is “smile: large”.

On the other hand, when it is determined that the smile level is less than α, the determination unit compares the smile level with a second smile threshold value β set in advance (step S5003). When it is determined that the smile level is β or more, the determination unit determines that the smile level of this person image is “smile: medium”. Furthermore, when it is determined that the smile level is less than β, the determination unit determines that the smile level of the person image is “smile: small”.

The word to be inserted into the person image template is determined based on the determination result of the smile level of the person image. Here, examples of the word corresponding to the smile level of “smile: large” include “full of joy” and “very good”. Examples of words that correspond to the smile level of “smile: medium” include “joyful” and “good calm”. Examples of words corresponding to the smile level of “smile: small” include “seriously seems” and “cool”.
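The two-threshold comparison of steps S5002 and S5003 and the subsequent word selection can be sketched as below. The function names and the numeric threshold values in the usage are assumptions; the sketch assumes α is larger than β, and the words are the first example given in the text for each level.

```python
# Example words taken from the description, one per smile level.
SMILE_WORDS = {
    "smile: large": "full of joy",
    "smile: medium": "joyful",
    "smile: small": "cool",
}


def classify_smile(smile_level, alpha, beta):
    """Map a numeric smile level to large / medium / small (assumes alpha > beta)."""
    if smile_level >= alpha:       # step S5002
        return "smile: large"
    if smile_level >= beta:        # step S5003
        return "smile: medium"
    return "smile: small"


def smile_word(smile_level, alpha, beta):
    """Return the word to insert into the person image template."""
    return SMILE_WORDS[classify_smile(smile_level, alpha, beta)]
```

A call such as `smile_word(0.5, alpha=0.7, beta=0.3)` falls between the two thresholds and therefore yields the "smile: medium" word.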

In the above description, the case where the word to be inserted into the person image template is a continuous form has been described. However, the present invention is not limited to this, and may be an end form, for example. In this case, examples of words corresponding to the smile level of “smile: large” include “smile is nice”, “it is a very good smile”, and the like. Examples of words corresponding to the smile level of “smile: medium” include “smiley” and “good expression”. Examples of words corresponding to a smile level of “smile: small” include “looks serious” and “looks serious”.

FIG. 48A is an example of an output image showing the operation result of the image processing apparatus, and this output image has a sentence determined based on the example of FIG. In the example of FIG. 48A, the captured image is determined to be a person image, and the number of subjects and the color arrangement pattern (average color) are extracted as the feature amount. Further, the word inserted into the person image template is determined as “heavy” according to the color arrangement pattern. As a result, the output result shown in FIG. 48A is obtained. That is, in the example of FIG. 48A, the word “heavy” (adjective, combined form) is determined based on the average color of the captured image.

FIG. 48B is another example of an output image showing the operation result of the image processing apparatus, and this output image has a sentence determined based on the example of FIG. In the example of FIG. 48B, the captured image is determined to be a person image, and the number of subjects and the smile level are extracted as the feature amount. Further, according to the smile level, the word inserted into the person image template is determined as “good expression”. As a result, the output result shown in FIG. 48B is obtained. That is, in the example of FIG. 48B, the word (end form) of “good expression” is determined based on the smile level of the person in the captured image. As in the output result of FIG. 48B, by using the word output using the smile level for the person image, it is possible to attach character information that is relatively close to the impression received from the image.

Returning to FIG. 46, when the captured image is a landscape image (second scene image) or another image (third scene image), as a feature amount of the captured image used to determine the text arranged on the image, A representative color can be used instead of the average color. As the representative color, the “first color” in the color arrangement pattern, that is, the most frequently used color in the captured image can be used. Alternatively, the representative color can be determined using clustering as described below.

FIG. 49 is a schematic block diagram showing an internal configuration of an image processing unit included in the imaging apparatus. In the example of FIG. 49, the image processing unit 5040 of the image processing apparatus includes an image data input unit 5042, an analysis unit 5044, a text creation unit 5052, and a text addition unit 5054. The image processing unit 5040 performs various types of analysis processing on the image data generated by the imaging unit or the like, thereby acquiring various types of information regarding the content of the image data, and creating text that is highly consistent with the content of the image data. Then, text can be added to the image data.

The analysis unit 5044 includes a color information extraction unit 5046, a region extraction unit 5048, and a clustering unit 5050, and performs analysis processing on the image data. The color information extraction unit 5046 extracts, from the image data, first information regarding the color information of each pixel included in the image data. Typically, the first information is an aggregate of the HSV values of all the pixels included in the image data. However, the first information may be any information on the frequency (frequency in pixel units, area ratio, etc.) at which predetermined colors appear in the image, where the predetermined colors are colors for which similarity is defined (for example, colors associated with a predetermined color space); the color resolution and the type of color space are not limited.

For example, the first information may be information indicating how many pixels of each color are included in the image data, for each color represented by an HSV space vector (HSV value) or an RGB value. However, the color resolution in the first information may be changed as appropriate in consideration of the burden of calculation processing, and the type of color space is not limited to HSV or RGB, and may be CMY, CMYK, or the like.

FIG. 50 is a flowchart showing the flow of representative color determination performed in the analysis unit 5044. In step S5101, the image processing apparatus starts calculating the representative color of specific image data 5060 (captured image, see FIG. 51).

In step S5102, the image data input unit 5042 of the image processing apparatus outputs the image data to the analysis unit 5044. Next, the color information extraction unit 5046 of the analysis unit 5044 calculates first information 5062 regarding the color information of each pixel included in the image data (see FIG. 51).

FIG. 51 is a conceptual diagram showing the calculation process of the first information 5062 performed by the color information extraction unit 5046 in step S5102. The color information extraction unit 5046 aggregates the color information included in the image data 5060 for each color (for example, for each gradation of 256 gradations) to obtain the first information 5062. The histogram in FIG. 51 represents an image of the first information 5062 calculated by the color information extraction unit 5046. The horizontal axis of the histogram is color, and the vertical axis represents how many pixels of each color are included in the image data 5060.
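The aggregation into a per-color histogram can be sketched as follows. This minimal sketch assumes that each pixel has already been reduced to a single color index in the range 0 to 255 (for example, a quantized hue); the function name and that reduction are assumptions, not part of the description.

```python
def color_histogram(color_indices, levels=256):
    """Count how many pixels fall on each color gradation.

    color_indices: iterable of integers in range(levels), one per pixel.
    Returns a list whose i-th entry is the number of pixels of color i.
    """
    histogram = [0] * levels
    for index in color_indices:
        histogram[index] += 1
    return histogram
```

The returned list plays the role of the first information: its index corresponds to the horizontal axis (color) and its values to the vertical axis (pixel count).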

In step S5103 of FIG. 50, the region extraction unit 5048 of the analysis unit 5044 extracts the main region in the image data 5060. For example, the region extraction unit 5048 extracts the in-focus region from the image data 5060 shown in FIG. 51 and recognizes the central portion of the image data 5060 as the main region (see the main region 5064 in FIG. 52).

In step S5104 of FIG. 50, the region extraction unit 5048 of the analysis unit 5044 determines the target region for the clustering performed in step S5105. For example, as shown in the upper part of FIG. 52, when the region extraction unit 5048 has recognized in step S5103 that part of the image data 5060 is the main region 5064 and has extracted the main region 5064, the first information 5062 corresponding to the main region 5064 (main first information 5066) is set as the clustering target. The histogram shown in the lower part of FIG. 52 represents an image of the main first information 5066.

On the other hand, if the region extraction unit 5048 has not extracted the main region 5064 in the image data 5060 in step S5103, the region extraction unit 5048 determines the first information 5062 corresponding to the entire region of the image data 5060 as the clustering target. Note that, apart from the target region for clustering, the subsequent processing is the same whether or not the main region 5064 is extracted. The following description therefore takes the case where the main region 5064 is extracted as an example.

In step S5105 of FIG. 50, the clustering unit 5050 of the analysis unit 5044 performs clustering on the main first information 5066, which is the first information 5062 of the region determined in step S5104. FIG. 53 is a conceptual diagram showing the result of the clustering performed by the clustering unit 5050 on the main first information 5066 of the main region 5064.

For example, the clustering unit 5050 classifies the main first information 5066 having 256 gradations (see FIG. 52) into a plurality of clusters by the k-means method. Note that the clustering is not limited to the k-means method. In other examples, other methods such as the shortest distance method can be used.

The upper part of FIG. 53 shows the cluster into which each pixel is classified, and the histogram in the lower part of FIG. 53 shows the number of pixels belonging to each cluster. Through the clustering by the clustering unit 5050, the main first information 5066 with 256 gradations (FIG. 52) is classified into fewer than 256 clusters (three in the example shown in FIG. 53). The result of clustering can include information about the size of each cluster and information about the color of each cluster (the position of the cluster in the color space).

In step S5106, the clustering unit 5050 of the analysis unit 5044 determines the representative color of the image data 5060 based on the clustering result. In one example, when the clustering unit 5050 obtains a clustering result as shown in FIG. 53, it sets the color belonging to the maximum cluster 5074, which includes the most pixels among the plurality of calculated clusters, as the representative color of the image data 5060.
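Steps S5105 and S5106 can be sketched with a small weighted k-means over a one-dimensional color histogram. This is an illustration under stated assumptions rather than the described implementation: the colors are assumed to lie on a single 256-gradation axis, the initial centers are spread evenly, the iteration count is fixed, and the representative color is taken to be the rounded center of the largest cluster. All function names are hypothetical.

```python
def kmeans_1d(histogram, k=3, iterations=20):
    """Weighted 1-D k-means over histogram bins.

    Returns (centers, sizes), where sizes[j] is the number of pixels
    assigned to cluster j in the final assignment pass.
    """
    levels = len(histogram)
    # Assumption: initial centers are spread evenly along the color axis.
    centers = [(i + 0.5) * levels / k for i in range(k)]
    sizes = [0] * k
    for _ in range(iterations):
        sums = [0.0] * k
        sizes = [0] * k
        for color, count in enumerate(histogram):
            if count == 0:
                continue
            nearest = min(range(k), key=lambda j: abs(color - centers[j]))
            sums[nearest] += color * count
            sizes[nearest] += count
        # A cluster that received no pixels keeps its previous center.
        centers = [sums[j] / sizes[j] if sizes[j] else centers[j]
                   for j in range(k)]
    return centers, sizes


def representative_color(histogram, k=3):
    """Return the (rounded) center of the cluster containing the most pixels."""
    centers, sizes = kmeans_1d(histogram, k)
    largest = max(range(k), key=lambda j: sizes[j])
    return round(centers[largest])
```

Choosing the largest cluster rather than the single most frequent gradation means that several neighboring gradations of a similar color are counted together when deciding the representative color.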

When the calculation of the representative color is completed, the text creation unit 5052 creates a text using information on the representative color and assigns the text to the image data 5060.

The text creation unit 5052 reads a text template for a landscape image, for example, and applies a word (for example, “2012/03/10”) corresponding to the generation date and time of the image data 5060 to {date / time} of the text template. In this case, the analysis unit 5044 can retrieve information related to the generation date and time of the image data 5060 from the storage medium and output the information to the text creation unit 5052.

Also, the text creation unit 5052 applies a word corresponding to the representative color of the image data 5060 to {adjective} of the text template. The text creation unit 5052 reads the correspondence information between colors and words from the storage unit 5028 and applies it to the text template. In one example, the storage unit 5028 stores a table in which colors and words are associated with each scene. The text creation unit 5052 can create a sentence (for example, “I found a very beautiful thing”) using words read from the table.

FIG. 54 shows image data 5080 to which text is given by the series of processes described above.

FIG. 55 shows an example of image data to which text is given by a series of processes similar to the above when the scene is a distant view image. In this case, the scene is classified as a distant view image, and the representative color is determined to be blue. For example, in a table in which colors and words are associated with each scene, the word “fresh” is associated with the representative color “blue”.

FIG. 56 is a diagram showing an example of a table having correspondence information between colors and words. In the table of FIG. 56, colors and words are associated for each scene: a person image (first scene image), a distant view image (second scene image), and another image (third scene image). In one example, when the representative color of the image data is “blue” and the scene is another image (third scene image), the text creation unit 5052 selects a word corresponding to the representative color (for example, “classy”) from the correspondence information in the table and applies it to {adjective} of the text template.
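The table lookup and template filling described above can be sketched as follows. The two table entries are the ones mentioned in the description; the template wording, the placeholder name `{date}`, the scene labels, and the function name are assumptions made for illustration, since the actual templates and table of FIG. 56 are not reproduced here.

```python
# Correspondence table keyed by (scene, representative color).
# Only the two entries named in the description are included.
WORD_TABLE = {
    ("distant_view", "blue"): "fresh",
    ("other", "blue"): "classy",
}

# Hypothetical template with two blank portions.
TEMPLATE = "{date}: I found a very {adjective} thing"


def create_text(template, scene, representative_color, date):
    """Fill the blank portions of a text template."""
    adjective = WORD_TABLE[(scene, representative_color)]
    return template.replace("{date}", date).replace("{adjective}", adjective)
```

Keying the table on the pair of scene and color is what allows the same representative color to yield different words for a distant view image and for another image.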

The color-word correspondence table can be set based on a color chart such as that of the PCCS color system, the CCIC color system, or the NCS color system.

FIG. 57 shows an example of a correspondence table for a distant view image (second scene image) using a color chart of the CCIC color system. FIG. 58 shows an example of a correspondence table for other images (third scene images) using a color chart of the CCIC color system.

In FIG. 57, the horizontal axis corresponds to the hue of the representative color, and the vertical axis corresponds to the tone of the representative color. By using the table shown in FIG. 57 to determine the word, not only the hue of the representative color but also its tone is taken into account, which makes it possible to give a text that is relatively close to human sensibility. Hereinafter, a specific text setting example for a distant view image (second scene image) using the table of FIG. 57 will be described. In the case of other images (third scene images), the same setting can be made using the corresponding table for other images.

In FIG. 57, when the representative color is determined to be the area A5001, the name of the representative color (red, orange, yellow, blue, etc.) is applied to the word in the text as it is. For example, if the hue of the representative color is “red (R)” and the tone is “Vivid Tone (V)”, the adjective “crimson” representing the color is selected.

Also, when the representative color is determined to be the color of the area A5002, A5003, A5004 or A5005, the adjective associated with the color is applied to the word in the text. For example, when the representative color is determined to be the color (green) of the area A5003, the adjectives associated with green, such as “comfortable” and “fresh”, are applied.

When the representative color is determined to be a color of the areas A5001 to A5005 and the tone is a vivid tone (V), strong tone (S), bright tone (B), or pale tone (LT), an adverb indicating degree (e.g., very, pretty, etc.) is placed before the adjective.

When it is determined that the representative color falls within the area A5006, that is, “white tone (white)”, words that are associated with white, such as “clean” and “clear”, are selected. If the representative color is determined to fall within the area A5007, that is, a gray-based color (light gray tone: ltGY, medium gray tone: mGY, or dark gray tone: dkGY), a safe adjective such as “clean” or “nice” is selected. In an image whose representative color is white or gray, that is, an achromatic color, the image as a whole often includes various colors. Therefore, by using words that are less related to color, it is possible to prevent text with inappropriate meanings from being added, and to add text that is relatively close to the impression received from the image.

In addition, when the representative color does not belong to any of the areas A5001 to A5007, that is, when the representative color is a low tone (dark grayish tone) or black (black tone), characters having a predetermined meaning (a word or a sentence) can be selected as the text. Characters having a predetermined meaning include, for example, “where is here”, “a”, and the like. These words and sentences can be stored in the storage unit of the image processing apparatus as a “tweet dictionary”.

In other words, when the representative color is determined to be a low tone or black, it may be difficult to determine the hue of the entire image. Even in such a case, using characters that have little relation to color as described above makes it possible to prevent text with an inappropriate meaning from being added, and to add text close to the impression received from the image.

In the above example, the case where the sentence and the word are uniquely determined according to the scene and the representative color has been described. However, the present invention is not limited to this, and exception processing can sometimes be performed in the selection of the sentence and the word. For example, the text may be extracted from the “tweet dictionary” once every several times (for example, once every 10 times). As a result, the display content of the text does not always follow a fixed pattern, so that the user can be prevented from getting bored with the display content.
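The periodic exception processing can be sketched as a simple counter. The class name, the cycling order through the dictionary, and the default interval are assumptions; the two dictionary entries are the examples given in the description.

```python
# Example entries taken from the description of the "tweet dictionary".
TWEET_DICTIONARY = ["where is here", "a"]


class TextSelector:
    """Return the normal text, except on every interval-th call."""

    def __init__(self, interval=10):
        self.interval = interval
        self.count = 0

    def select(self, normal_text):
        self.count += 1
        if self.count % self.interval == 0:
            # Assumption: cycle through the dictionary in order.
            index = (self.count // self.interval - 1) % len(TWEET_DICTIONARY)
            return TWEET_DICTIONARY[index]
        return normal_text
```

A random draw from the dictionary would serve the same purpose of breaking the pattern; a deterministic cycle is used here only to keep the sketch easy to verify.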

In the above example, the case where the text addition unit arranges the text generated by the text creation unit at the upper part or the lower part of the image has been described. However, the present invention is not limited to this, and the text can also be arranged at other positions.

In the above example, the case where the position of the text is fixed in the image has been described. However, the present invention is not limited to this. For example, the text can be displayed so as to flow on the display unit of the image processing apparatus. Thereby, the input image is not easily affected by the text, or the text visibility is improved.

In the above example, the case where the text is always pasted on the image has been described. However, the present invention is not limited to this. For example, the text may be omitted in the case of a person image and pasted in other cases.

In the above example, the case where the text addition unit determines the display method (font, color, display position, etc.) of the text generated by the text creation unit by a predetermined method has been described. However, the present invention is not limited to this, and the text display method can be determined in various ways. Hereinafter, some examples of these methods will be described.

In one example, the user can correct the text display method (font, color, display position) via the operation unit of the image processing apparatus. Alternatively, the user can change or delete the contents (words) of the text. In addition, the user can select not to display the entire text, that is, display / non-display of the text.

In one example, the size of the text can be changed according to the scene of the input image. For example, when the scene of the input image is a person image, the text can be reduced, and when the scene of the input image is a distant view image or other images, the text can be increased.

In one example, text can be highlighted and combined with image data. For example, when the input image is a person image, a balloon can be given to the person and text can be placed in the balloon.

In one example, the display color of the text can be set based on the representative color of the input image. Specifically, a color having the same hue as the representative color of the input image and a different tone can be used as a text display color. As a result, it is possible to give a text that is in harmony with the input image without excessively claiming the text.

In particular, when the representative color of the input image is white, exception processing may be performed in determining the text display color. Here, in the exception processing, for example, the text color can be set to white and the peripheral portion of the text can be set to black.
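The choice of text display color, including the exception for a white representative color, can be sketched as below. PCCS tone combines lightness and saturation; this sketch approximates "same hue, different tone" by keeping the HLS hue and saturation and shifting only the lightness, which is an assumption, as are the function name, the shift amount, and the returned dictionary layout.

```python
import colorsys


def text_display_style(rep_rgb, lightness_shift=0.3):
    """Return fill and outline colors for text, given a representative color.

    rep_rgb: (R, G, B) with components in 0..255.
    """
    if tuple(rep_rgb) == (255, 255, 255):
        # Exception processing: white text with a black peripheral portion.
        return {"fill": (255, 255, 255), "outline": (0, 0, 0)}
    r, g, b = (c / 255 for c in rep_rgb)
    h, l, s = colorsys.rgb_to_hls(r, g, b)
    # Keep the hue; move the lightness away from the representative color.
    new_l = l - lightness_shift if l >= 0.5 else l + lightness_shift
    fill = colorsys.hls_to_rgb(h, new_l, s)
    return {"fill": tuple(round(c * 255) for c in fill), "outline": None}
```

For a pure blue representative color this yields a darker blue, so the text stays in the same hue family as the image while remaining distinguishable from it.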

As described above, the embodiments of the present invention have been described in detail with reference to the drawings. However, the specific configuration is not limited to the above, and various design changes and the like can be made without departing from the scope of the present invention.
In the above-described embodiment, for example, the imaging device 1100 includes the image processing units (image processing devices) 3140, 3140a, 3140b, and 4140. However, a personal computer, a tablet PC (Personal Computer), a digital camera, a mobile phone, or the like may instead include the image processing units 3140, 3140a, 3140b, and 4140, which are image processing apparatuses.

DESCRIPTION OF SYMBOLS 1001 ... Image processing apparatus, 1010 ... Image input part, 1020 ... Determination part, 1030 ... Text preparation part, 1040 ... Text addition part, 1090 ... Storage part, 1100 ... Imaging device, 1110 ... Imaging unit, 1111 ... Optical system, 1119 ... Image sensor, 1120 ... AD conversion unit, 1130 ... Buffer memory unit, 1140 ... Image processing unit, 1150 .. Display unit, 1160... Storage unit, 1170... Communication unit, 1180... Operation unit, 1190... CPU, 1200.

2100 ... Imaging device, 2001 ... Imaging system, 2002 ... Imaging unit, 2003 ... Camera control unit, 2004, 2004a, 2004b ... Image processing unit, 2005 ... Storage unit, 2006 ... Buffer memory unit, 2007 ... Display unit, 2011 ... Operation unit, 2012 ... Communication unit, 2013 ... Power supply unit, 2015 ... Bus, 2021 ... Lens unit, 2022 ... Image sensor, 2023 ... AD converter, 2041, 2041b ... Image acquisition unit, 2042, 2042b ... Image identification information acquisition unit, 2043, 2043b ... Color space vector generation unit, 2044 ... Main color extraction unit, 2045 ... Table storage unit, 2046, 2046a ... First label generation unit, 2047 ... Second label generation unit, 2048 ... Label output unit, 2241 ... Feature amount extraction unit, 2242 ... Scene discrimination unit.

3011 ... Image input unit, 3012 ... Text input unit, 3013 ... First position input unit, 3014 ... Edge detection unit, 3015 ... Face detection unit, 3016 ... Character size determination unit, 3017, 3017a ... Cost calculation unit, 3018, 3018b ... Area determination unit, 3019 ... Synthesis unit, 3021, 3031 ... Second position input unit, 3140, 3140a, 3140b ... Image processing unit.

4011 ... Image input unit, 4012 ... Text setting unit, 4013 ... Text composition area setting unit, 4014 ... Font setting unit, 4015 ... Composite image generation unit, 4016 ... Storage unit, 4021 ... Font color setting unit, 4031 ... Conversion table information from RGB system to PCCS color system, 4032 ... Tone conversion table information, 4033 ... Outline information, 4034 ... Color change Information on determination conditions, 4140... Image processing unit.

Claims

An image input unit for inputting a captured image;
As a sentence template for completing a sentence by inserting a word into a predetermined blank space, a person image template used for creating a sentence for a person image with a person as a subject and a sentence for a landscape image with a landscape as a subject A storage unit for storing a landscape image template to be used;
A determination unit that determines whether the captured image is the person image or the landscape image;
A sentence creation unit that reads either the person image template or the landscape image template from the storage unit according to the determination result of the determination unit for the captured image, and creates a sentence for the captured image by inserting, into the blank portion of the read sentence template, a word corresponding to a feature amount or an imaging condition of the captured image;
An image processing apparatus comprising:
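The template selection and blank filling recited in the claim above can be illustrated with a minimal sketch. The template wording, the blank names (`adjective`, `count`, `date`), and the function name `create_sentence` are all hypothetical; the claim only requires that one of two stored templates be read according to the determination result and that a word derived from a feature amount or imaging condition be inserted into its blank portion.

```python
# Hypothetical stored templates; named "{...}" fields stand in for the blank portions.
TEMPLATES = {
    "person": "We spent a {adjective} day, {count} of us together.",
    "landscape": "A {adjective} view, taken on {date}.",
}


def create_sentence(image_kind, words):
    """Read the template for the determined image kind and fill its blanks.

    image_kind: "person" or "landscape" (the determination result).
    words: mapping from blank name to the word to insert.
    """
    template = TEMPLATES[image_kind]
    return template.format(**words)
```

For example, `create_sentence("person", {"adjective": "fun", "count": "three"})` fills the person image template, while the same call with `"landscape"` and a `date` word fills the landscape image template.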
The image processing apparatus according to claim 1, wherein
The storage unit
Stores, as the person image template, the sentence template in which the blank portion is set in a sentence written from the viewpoint of a person imaged as a subject, and
Stores, as the landscape image template, the sentence template in which the blank portion is set in a sentence written from the viewpoint of a photographer who photographed the subject.
The image processing apparatus according to claim 1 or 2, wherein
The determination unit
Further determines, for the person image, the number of subjects as the feature amount, and
The sentence creation unit
Creates a sentence for the person image by inserting a word corresponding to the number of subjects into the blank portion.
The image processing apparatus according to claim 3, wherein
The determination unit,
In a case where a plurality of face areas are recognized in the captured image,
When the ratio of the size of the largest face area to the size of the captured image is greater than or equal to a first threshold and less than a second threshold that is greater than or equal to the first threshold, and the standard deviation or variance of the ratios of the plurality of face areas, or the standard deviation or variance of the sizes of the plurality of face areas, is less than a third threshold,
Or when the ratio of the size of the largest face area is greater than or equal to the second threshold,
Determines that the captured image is the person image, and determines the number of subjects based on the number of face areas whose ratio is greater than or equal to the first threshold.
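The threshold conditions of the claim above can be sketched as follows. This is only an illustration under stated assumptions: the function name `classify_by_faces` and the parameter names `t1`, `t2`, `t3` are hypothetical, and the spread is measured here as the population standard deviation of the face-to-image ratios, which is one of the alternatives the claim allows.

```python
from statistics import pstdev


def classify_by_faces(face_areas, image_area, t1, t2, t3):
    """Return (is_person_image, subject_count) for the multi-face case.

    face_areas: sizes of the recognized face areas (same unit as image_area).
    t1 <= t2: thresholds on the ratio of face size to image size.
    t3: upper bound on the spread of the ratios.
    """
    if len(face_areas) < 2:
        raise ValueError("this sketch covers only the case of a plurality of face areas")
    ratios = [area / image_area for area in face_areas]
    largest = max(ratios)
    # Assumption: spread is the population standard deviation of the ratios.
    uniform = pstdev(ratios) < t3
    is_person = largest >= t2 or (t1 <= largest < t2 and uniform)
    if not is_person:
        return False, 0
    # The number of subjects counts only faces whose ratio reaches the first threshold.
    return True, sum(1 for r in ratios if r >= t1)
```

Small background faces fall below the first threshold and are therefore excluded from the subject count even when the image is judged to be a person image.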
The image processing apparatus according to any one of claims 1 to 4, wherein
The sentence creation unit
Creates a sentence by inserting, into the blank portion, an adjective corresponding to a color arrangement pattern of the captured image as the word corresponding to the feature amount of the captured image.
The image processing apparatus according to claim 5, wherein
The sentence creation unit
Creates a sentence by inserting, into the blank portion, an adjective corresponding to the color arrangement pattern of a predetermined area of the captured image, the predetermined area being determined according to whether the captured image is the person image or the landscape image.
An image input unit for inputting a captured image;
A decision unit that decides text corresponding to at least one of a feature amount of the captured image and an imaging condition of the captured image;
A determination unit that determines whether the captured image is a first type image or a second type image different from the first type;
A storage unit for storing a first syntax that is a syntax of a sentence used for the first type and a second syntax that is a syntax of a sentence used for the second type;
A sentence creation unit that creates a sentence of the first syntax using the text decided by the decision unit when the determination unit determines that the captured image is the first type image, and creates a sentence of the second syntax using the text decided by the decision unit when the determination unit determines that the captured image is the second type image;
An image processing apparatus comprising:
The image processing apparatus according to claim 7, wherein
The first type is a portrait, and the second type is a landscape.
An imaging unit that images a subject and generates a captured image;
A storage unit that stores, as sentence templates each of which completes a sentence when a word is inserted into a predetermined blank portion, a person image template used to create a sentence for a person image having a person as its subject and a landscape image template used to create a sentence for a landscape image having a landscape as its subject;
A determination unit that determines whether the captured image is the person image or the landscape image;
A sentence creation unit that reads either the person image template or the landscape image template from the storage unit according to the determination result of the determination unit for the captured image, and creates a sentence for the captured image by inserting, into the blank portion of the read sentence template, a word corresponding to a feature amount or an imaging condition of the captured image;
An imaging apparatus comprising:
A program that causes a computer of an image processing apparatus, the apparatus being provided with a storage unit that stores, as sentence templates each of which completes a sentence when a word is inserted into a predetermined blank portion, a person image template used to create a sentence for a person image having a person as its subject and a landscape image template used to create a sentence for a landscape image having a landscape as its subject, to execute:
An image input step of inputting a captured image;
A determination step of determining whether the captured image is the person image or the landscape image; and
A sentence creation step of reading either the person image template or the landscape image template from the storage unit according to the determination result of the determination step for the captured image, and creating a sentence for the captured image by inserting, into the blank portion of the read sentence template, a word corresponding to a feature amount or an imaging condition of the captured image.
A determining unit that determines a character having a predetermined meaning from the captured image;
A determination unit that determines whether the captured image is a person image or an image different from the person image;
A storage unit that stores a first syntax that is a syntax of a sentence used for the person image and a second syntax that is a syntax of a sentence used for an image different from the person image;
An output unit that outputs a sentence of the first syntax using the character having the predetermined meaning when the determination unit determines that the captured image is the person image, and outputs a sentence of the second syntax using the character having the predetermined meaning when the determination unit determines that the captured image is an image different from the person image;
An image processing apparatus comprising:
An image acquisition unit for acquiring captured image data;
A scene discriminating unit for discriminating a scene from the acquired image data;
A main color extraction unit for extracting a main color from the acquired image data based on a frequency distribution of color information;
A storage unit in which color information and a first label are associated and stored in advance for each scene;
A first label generation unit that reads, from the storage unit, the first label stored in advance in association with the extracted main color and the determined scene, and generates the read first label as a label of the acquired image data;
An image processing apparatus comprising:
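The main color extraction and first label lookup of the claim above can be sketched as below. The table contents, the use of quantized color names as color information, and the names `LABEL_TABLE`, `extract_main_color`, and `first_label` are assumptions made for illustration; the claim only states that the main color is taken from a frequency distribution of color information and that the label is read from a table keyed by scene and color.

```python
from collections import Counter

# Hypothetical table: (scene, color information) -> first label.
LABEL_TABLE = {
    ("landscape", "blue"): "refreshing",
    ("landscape", "green"): "calm",
    ("portrait", "red"): "lively",
}


def extract_main_color(pixel_colors):
    """Return the most frequent color in the frequency distribution.

    pixel_colors: one quantized color name per pixel (an assumed representation).
    """
    return Counter(pixel_colors).most_common(1)[0][0]


def first_label(scene, pixel_colors):
    """Look up the label stored for the determined scene and the extracted main color."""
    main_color = extract_main_color(pixel_colors)
    return LABEL_TABLE.get((scene, main_color))  # None when no entry is stored
```

Keying the table by scene as well as color lets the same main color map to different labels in different scenes, which is the point of storing the association "for each scene".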
The image processing apparatus according to claim 12, further comprising a second label generation unit that normalizes the ratio of the main color based on the frequency of the extracted main color, and generates a second label by correcting the first label based on the normalized ratio of the main color.
The image processing apparatus according to claim 12 or 13, wherein
In the storage unit,
Combination information of a plurality of pieces of color information and a label are stored in association with each other for each determined scene.
The image processing apparatus according to claim 12, wherein
The scene discrimination unit
Acquires image identification information from the acquired image data, extracts information indicating the scene from the acquired image identification information, and determines the scene of the image data based on the extracted information indicating the scene.
The image processing apparatus according to claim 15, wherein
The scene discrimination unit
Extracts a feature amount from the acquired image data and determines the scene of the image data based on the extracted feature amount.
The image processing apparatus according to any one of claims 12 to 15, further comprising
An area extraction unit that extracts, from the acquired image data, an area from which the main color is to be extracted, based on the determined scene, wherein
The main color extraction unit
Extracts the main color from the image data of the extracted area.
The image processing apparatus according to claim 13, wherein the first label and the second label generated by correcting the first label, or information based on the first label or the second label, is stored in the storage unit in association with the acquired image data.
An imaging apparatus comprising the image processing apparatus according to any one of claims 12 to 18.
A program for causing a computer to execute image processing of an image processing apparatus having an imaging unit,
An image acquisition procedure for acquiring captured image data;
A scene determination procedure for determining a scene from the acquired image data;
A main color extraction procedure for extracting a main color from the acquired image data based on a frequency distribution of color information;
A first label generation procedure for reading, from a storage unit in which color information and a first label are stored in advance in association with each other for each scene, the first label associated with the extracted main color and the determined scene, and generating the read first label as a label of the acquired image data;
A program that causes a computer to execute.
A scene discrimination unit for discriminating whether or not a scene is a person-photographed scene;
A color extraction unit that extracts color information from image data when the scene discrimination unit determines that the scene is not a person-photographed scene;
A storage unit in which color information and characters having a predetermined meaning are stored in advance in association with each other;
A reading unit that reads out, from the storage unit, the characters having the predetermined meaning corresponding to the color information extracted by the color extraction unit when the scene discrimination unit determines that the scene is not a person-photographed scene;
An image processing apparatus comprising:
An acquisition unit for acquiring image data and text data;
A detection unit for detecting an edge of the image data acquired by the acquisition unit;
An area determination unit that determines an area in which the text data is arranged in the image data based on the edge detected by the detection unit;
An image generation unit that generates an image in which the text data is arranged in an area determined by the area determination unit;
An image processing apparatus comprising:
The image processing apparatus according to claim 22, wherein
The area determination unit determines an area with few edges in the image data as an area where the text data is arranged.
An image input unit for inputting image data;
An edge detection unit for detecting edges in the image data input by the image input unit;
A text input section for inputting text data;
An area determination unit that determines a synthesis area of the text data in the image data based on the edge detected by the edge detection unit;
A synthesis unit that synthesizes the text data in the synthesis area determined by the area determination unit;
An image processing apparatus comprising:
The image processing apparatus according to claim 24, wherein
The area determination unit determines an area with few edges in the image data as the synthesis area.
The image processing apparatus according to claim 24 or 25, further comprising
A cost calculation unit that calculates a cost representing the importance of each position of the image data such that the cost is high at positions where an edge is detected by the edge detection unit, wherein
The area determination unit determines, based on the cost calculated by the cost calculation unit, an area in which the cost is low as the synthesis area.
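The edge-based cost and the choice of a low-cost synthesis area can be sketched as follows. This is an illustrative simplification, not the method of the specification: the edge detector is a plain neighbor-difference threshold, the cost is binary, and the area is a fixed-size rectangle found by exhaustive search. The names `edge_cost_map`, `lowest_cost_region`, and `edge_threshold` are hypothetical.

```python
def edge_cost_map(gray, edge_threshold):
    """Assign cost 1 where the step to the left or upper neighbor exceeds the threshold, else 0.

    gray: 2D list of grayscale intensities.
    """
    height, width = len(gray), len(gray[0])
    cost = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            dx = abs(gray[y][x] - gray[y][x - 1]) if x > 0 else 0
            dy = abs(gray[y][x] - gray[y - 1][x]) if y > 0 else 0
            if max(dx, dy) > edge_threshold:
                cost[y][x] = 1
    return cost


def lowest_cost_region(cost, region_h, region_w):
    """Return (top, left) of the window with the smallest summed cost (first one on ties)."""
    height, width = len(cost), len(cost[0])
    best_total, best_pos = None, None
    for top in range(height - region_h + 1):
        for left in range(width - region_w + 1):
            total = sum(cost[y][x]
                        for y in range(top, top + region_h)
                        for x in range(left, left + region_w))
            if best_total is None or total < best_total:
                best_total, best_pos = total, (top, left)
    return best_pos
```

Because edges tend to mark subject detail, a window with a low summed cost is a flat part of the image where overlaid text hides little.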
The image processing apparatus according to claim 26, further comprising
A first position input unit for inputting a first position in the image data, wherein
The cost calculation unit raises the cost as a position is closer to the first position input by the first position input unit, and lowers the cost as the position is farther from the first position.
The image processing apparatus according to claim 26 or 27, further comprising
A face detection unit for detecting a human face from the image data, wherein
The cost calculation unit raises the cost of an area in which a face is detected by the face detection unit.
The image processing apparatus according to any one of claims 26 to 28, further comprising
A second position input unit for inputting a second position at which the text data is to be synthesized, wherein
The cost calculation unit lowers the cost of the second position input by the second position input unit.
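The three cost adjustments recited in the preceding dependent claims (raising the cost near a first position, raising it inside detected face areas, and lowering it at a second position) can be combined in one sketch. The specific formulas are assumptions: the claims do not say how much the cost changes, so an inverse-distance term, a fixed face penalty, and a reset to zero are used here purely for illustration, and `adjust_cost` and its parameters are hypothetical names.

```python
import math


def adjust_cost(cost, first_pos=None, face_boxes=(), second_pos=None,
                distance_weight=1.0, face_cost=10.0):
    """Return a new cost map with position-based adjustments applied.

    first_pos, second_pos: (y, x) positions or None.
    face_boxes: iterable of (top, left, bottom, right), end-exclusive.
    """
    height, width = len(cost), len(cost[0])
    out = [[float(c) for c in row] for row in cost]
    for y in range(height):
        for x in range(width):
            if first_pos is not None:
                d = math.hypot(y - first_pos[0], x - first_pos[1])
                # Assumed form: the added cost decays with distance from first_pos.
                out[y][x] += distance_weight / (1.0 + d)
            for top, left, bottom, right in face_boxes:
                if top <= y < bottom and left <= x < right:
                    out[y][x] += face_cost
    if second_pos is not None:
        # Assumption: "lowering" the cost is modeled as setting it to zero.
        out[second_pos[0]][second_pos[1]] = 0.0
    return out
```

The first position acts as a "keep clear" hint and the second as a "place here" hint, so a user can steer the otherwise automatic area selection in either direction.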
The image processing apparatus according to any one of claims 24 to 29, further comprising
A character size determination unit that determines the character size of the text data such that all of the text of the text data can be synthesized within the image area of the image data.
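One simple way to choose a character size so that the whole text fits in the image is sketched below. The layout model is an assumption: each character is treated as a square cell of the given size and the text wraps at the image width. The function name `fit_character_size` and its parameters are hypothetical.

```python
def fit_character_size(text, image_w, image_h, max_size, min_size=1):
    """Return the largest size (in pixels) at which all of the text fits, or None.

    Assumes square character cells and wrapping at the image width.
    """
    for size in range(max_size, min_size - 1, -1):
        chars_per_line = image_w // size
        if chars_per_line == 0:
            continue  # a single character is already wider than the image
        lines_needed = -(-len(text) // chars_per_line)  # ceiling division
        if lines_needed * size <= image_h:
            return size
    return None
```

Searching from the largest size downward returns the most legible size that still satisfies the fit condition.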
The image processing apparatus according to any one of claims 24 to 30, wherein
The image input unit inputs image data of a moving image,
The area determination unit determines the synthesis area of the text data based on a plurality of frame images included in the image data of the moving image.
A step of inputting image data;
A step of inputting text data;
A step of detecting an edge in the input image data;
A step of determining a synthesis area for the text data in the image data based on the detected edge; and
A step of synthesizing the text data in the determined synthesis area;
A program that causes a computer to execute.
A step in which an image processing apparatus inputs image data;
A step in which the image processing apparatus inputs text data;
A step in which the image processing apparatus detects an edge in the input image data;
A step in which the image processing apparatus determines a synthesis area for the text data in the image data based on the detected edge; and
A step in which the image processing apparatus synthesizes the text data in the determined synthesis area;
An image processing method comprising:
An imaging apparatus comprising the image processing apparatus according to any one of claims 24 to 31.
A detection unit for detecting an edge of image data;
An area determination unit that determines an arrangement area in which characters in the image data are arranged based on the position of the edge detected by the detection unit;
An image generation unit that generates an image in which the characters are arranged in the arrangement area determined by the area determination unit;
An image processing apparatus comprising:
An image input unit for inputting image data;
A text setting unit for setting text data;
A text composition area setting unit for setting a text composition area, which is an area of the image data input by the image input unit in which the text data set by the text setting unit is to be synthesized;
A font setting unit that includes a font color setting unit and sets a font including at least a font color, the font color setting unit setting, with respect to the tone and hue of the PCCS color system obtained from the image data input by the image input unit and the text composition area set by the text composition area setting unit, a font color in which the tone is changed while the hue is left unchanged;
A composite image generation unit that generates composite image data, which is image data in which the text data set by the text setting unit is synthesized, using the font including at least the font color set by the font setting unit, in the text composition area set by the text composition area setting unit in the image data input by the image input unit;
An image processing apparatus comprising:
The image processing apparatus according to claim 36, wherein
The font color setting unit obtains the RGB average color of the text composition area set by the text composition area setting unit in the image data input by the image input unit, obtains the tone and hue of the PCCS color system from the obtained RGB average color, and sets a font color in which, of the obtained tone and hue of the PCCS color system, only the tone is changed,
An image processing apparatus.
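The idea of deriving the font color from the average color of the text area while keeping its hue can be sketched as below. An important caveat: the claims rely on a conversion from RGB to the PCCS color system, for which no conversion table is available here, so this sketch substitutes the HLS model from Python's standard `colorsys` module and treats lightness as a stand-in for tone. The function names and the lightness values 0.9 and 0.2 are assumptions for illustration only.

```python
import colorsys


def average_rgb(pixels):
    """Return the mean of (r, g, b) tuples whose components are in 0..255."""
    n = len(pixels)
    return tuple(sum(p[i] for p in pixels) / n for i in range(3))


def font_color_same_hue(region_pixels):
    """Keep the hue of the region's average color and change only the lightness.

    HLS lightness is used as a stand-in for PCCS tone (a substitution, not PCCS itself).
    """
    r, g, b = (c / 255.0 for c in average_rgb(region_pixels))
    h, l, s = colorsys.rgb_to_hls(r, g, b)
    # Assumption: dark backgrounds get light text and light backgrounds get dark text.
    new_l = 0.9 if l < 0.5 else 0.2
    nr, ng, nb = colorsys.hls_to_rgb(h, new_l, s)
    return tuple(round(c * 255) for c in (nr, ng, nb))
```

Keeping the hue ties the text visually to the background, while the change in lightness supplies the contrast needed for legibility.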
The image processing apparatus according to claim 36 or 37, wherein
In the PCCS color system, the font color setting unit changes a dark tone to a white tone or a light gray tone.
An image processing apparatus.
The image processing apparatus according to any one of claims 36 to 38,
In the PCCS color system, the font color setting unit changes a bright tone to another tone that is chromatic and is in a relationship of harmony of contrast with the bright tone.
An image processing apparatus.
The image processing apparatus according to claim 39, wherein
In the PCCS color system, when there are a plurality of other tones that are chromatic and are in a relationship of harmony of contrast with the bright tone, the font color setting unit changes the bright tone to the tone that is the most vivid among those other tones,
An image processing apparatus.
In the image processing device according to any one of claims 36 to 40,
The font setting unit sets the font color set by the font color setting unit and also sets an outline for the font.
An image processing apparatus.
In the image processing device according to any one of claims 36 to 41,
The font color setting unit determines whether the color change in the text composition area set by the text composition area setting unit in the image data input by the image input unit is equal to or greater than a predetermined value, and, when it is determined that the color change in the text composition area is equal to or greater than the predetermined value, sets two or more font colors in the text composition area.
An image processing apparatus.
A step of inputting image data;
A step of setting text data;
A step of setting a text composition area, which is an area of the input image data in which the set text data is to be synthesized;
A step of setting, with respect to the tone and hue of the PCCS color system obtained from the input image data and the set text composition area, a font color in which the tone is changed while the hue is left unchanged, and setting a font including at least the font color; and
A step of generating composite image data, which is image data in which the set text data is synthesized, using the font including at least the set font color, in the set text composition area of the input image data;
A program that causes a computer to execute.
A step in which an image processing apparatus inputs image data;
A step in which the image processing apparatus sets text data;
A step in which the image processing apparatus sets a text composition area, which is an area of the input image data in which the set text data is to be synthesized;
A step in which the image processing apparatus sets, with respect to the tone and hue of the PCCS color system obtained from the input image data and the set text composition area, a font color in which the tone is changed while the hue is left unchanged, and sets a font including at least the font color; and
A step in which the image processing apparatus generates composite image data, which is image data in which the set text data is synthesized, using the font including at least the set font color, in the set text composition area of the input image data;
An image processing method comprising:
An imaging apparatus comprising the image processing apparatus according to any one of claims 36 to 42.
An acquisition unit for acquiring image data and text data;
An area determination unit for determining a text arrangement area in which the text data is arranged in the image data;
A color setting unit for setting a predetermined color in the text data;
An image generation unit that generates an image in which the text data of the predetermined color is arranged in the text arrangement area;
An image processing apparatus comprising the above units, wherein the ratio between the hue value of the text arrangement area of the image data and the hue value of the text data is closer to 1 than the ratio between the tone value of the text arrangement area of the image data and the tone value of the text data.
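The condition in the claim above compares two ratios by their distance from 1. A minimal sketch of that comparison follows; it assumes that hue and tone are available as positive numeric values (the claim does not say how they are encoded), and the function name `hue_ratio_closer_to_one` is hypothetical.

```python
def hue_ratio_closer_to_one(area_hue, text_hue, area_tone, text_tone):
    """Return True when the hue ratio is closer to 1 than the tone ratio.

    All four values are assumed to be positive numbers.
    """
    hue_ratio = area_hue / text_hue
    tone_ratio = area_tone / text_tone
    return abs(hue_ratio - 1.0) < abs(tone_ratio - 1.0)
```

In other words, the text color stays near the background in hue and departs from it mainly in tone, which is the numeric counterpart of "change the tone, keep the hue".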
The image processing apparatus according to claim 46, wherein
The color setting unit obtains the tone value and hue value of the PCCS color system from the RGB average color of the text arrangement area, and changes only the tone value of the PCCS color system without changing the hue of the PCCS color system.
A determination unit that determines an arrangement area in which characters in image data are arranged;
A color setting section for setting a predetermined color for the character;
An image generation unit that generates an image in which the characters are arranged in the arrangement region,
An image processing apparatus comprising the above units, wherein the color setting unit sets the predetermined color such that the ratio between the hue value of the arrangement area and the hue value of the character is closer to 1 than the ratio between the tone value of the arrangement area and the tone value of the character.